Estimating $F_2$ norm, Sketching, Johnson-Lindenstrauss Lemma

You can read the notes from the previous lecture of Chandra Chekuri's course on Estimating F_2 norm, Sketching, Johnson-Lindenstrauss Lemma here.

1. Sketch for $F_p$ Estimation when $0 < p \leq 2$

We have seen a linear sketch for $F_2$ estimation that uses $O(\log n)$ space. Indyk [1] obtained a technically sophisticated and interesting sketch for $F_p$ estimation where $0 < p \leq 2$ (note that $p$ can be a real number) which uses polylog$(n)$ space. Since the details are rather technical we will only give the high-level approach and refer the reader to the paper and related notes for more details. Note that for $p > 2$ there is a lower bound of $\Omega(n^{1-2/p})$ on the space required.
To describe the sketch for $0 < p \leq 2$ we will revisit the $F_2$ estimate via the JL Lemma approach that uses properties of the normal distribution.
$\underline{F_2\text{-Estimate:}}$
Let $Y_1, Y_2, \ldots, Y_n$ be sampled independently from the $\mathcal{N}(0,1)$ distribution
$z \leftarrow 0$
While (stream is not empty) do
    $(i_j, \Delta_j)$ is current token
    $z \leftarrow z + \Delta_j \cdot Y_{i_j}$
endWhile
Output $z^2$
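As an illustration, here is a small Python sketch of this estimator. It is not part of the original notes: the update format, the explicit storage of $Y_1, \ldots, Y_n$ (which takes $O(n)$ space), and the averaging of repeated sketches to make the concentration visible are all choices made for the example.

```python
import numpy as np

def f2_estimate(stream, n, seed=0):
    """One linear sketch: z = sum_j Delta_j * Y_{i_j}; output z**2.

    `stream` is an iterable of (i, delta) updates with i in {0, ..., n-1}.
    Storing Y explicitly takes O(n) space and is only for illustration.
    """
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal(n)          # Y_1, ..., Y_n ~ N(0, 1), independent
    z = 0.0
    for i, delta in stream:
        z += delta * Y[i]
    return z ** 2

# Frequency vector x = (3, 0, 2), so F_2 = 9 + 0 + 4 = 13.  A single sketch
# is an unbiased estimate; averaging independent sketches shows concentration.
stream = [(0, 1), (2, 1), (0, 2), (2, 1)]
print(np.mean([f2_estimate(stream, n=3, seed=s) for s in range(2000)]))  # ~13
```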
Let $Z = \sum_{i \in [n]} x_i Y_i$ be the random variable that represents the value of $z$ at the end of the stream. The variable $Z$ is a sum of independent normal variables and by the properties of the normal distribution $Z \sim \sqrt{\sum_i x_i^2} \cdot \mathcal{N}(0,1)$. The normal distribution is called 2-stable for this reason. More generally, a distribution $\mathcal{D}$ is said to be $p$-stable if the following property holds: let $Z_1, Z_2, \ldots, Z_n$ be independent random variables distributed according to $\mathcal{D}$. Then $\sum_i x_i Z_i$ has the same distribution as $\|x\|_p Z$ where $Z \sim \mathcal{D}$. Note that a $p$-stable distribution is symmetric around $0$.
It is known that $p$-stable distributions exist for all $p \in (0, 2]$ and do not exist for any $p > 2$. In general, $p$-stable distributions do not have an analytical formula except in a few cases. We have already seen that the standard normal distribution is 2-stable. The 1-stable distribution is the Cauchy distribution, which is the distribution of the ratio of two independent standard normal random variables. The density function of the Cauchy distribution is $\frac{1}{\pi(1+x^2)}$; note that the Cauchy distribution does not have a finite mean or variance. We use $\mathcal{D}_p$ to denote a $p$-stable distribution.
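As a quick sanity check of 1-stability (not part of the original notes; the vector $x$ and the sample size are arbitrary choices), one can compare the empirical quantiles of $\sum_i x_i Z_i$ with those of $\|x\|_1 Z$ for independent standard Cauchy $Z_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([3.0, -1.0, 2.0])                 # arbitrary fixed vector, ||x||_1 = 6
N = 200_000

# sum_i x_i Z_i with Z_i independent standard Cauchy ...
lhs = rng.standard_cauchy((N, x.size)) @ x
# ... should match ||x||_1 * Z with Z standard Cauchy.
rhs = np.sum(np.abs(x)) * rng.standard_cauchy(N)

# Compare quantiles (means and variances are undefined for the Cauchy distribution).
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(lhs, qs))
print(np.quantile(rhs, qs))
```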
Although a general $p$-stable distribution does not have an analytical formula, it is known that one can sample from $\mathcal{D}_p$. The Chambers-Mallows-Stuck method is the following:
  • Sample $\theta$ uniformly from $[-\pi/2, \pi/2]$.
  • Sample $r$ uniformly from $[0, 1]$.
  • Output
$$\frac{\sin(p\theta)}{(\cos\theta)^{1/p}} \left(\frac{\cos((1-p)\theta)}{\ln(1/r)}\right)^{(1-p)/p}.$$
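A direct Python transcription of the Chambers-Mallows-Stuck recipe might look as follows (a sketch, not from the original notes, assuming the formula above yields a standardized symmetric $p$-stable variate; for $p = 1$ it reduces to $\tan\theta$, i.e., a standard Cauchy):

```python
import numpy as np

def sample_p_stable(p, size, rng):
    """Chambers-Mallows-Stuck sampler for a symmetric p-stable variate, 0 < p <= 2."""
    theta = rng.uniform(-np.pi / 2, np.pi / 2, size)   # theta ~ Uniform[-pi/2, pi/2]
    r = rng.uniform(0.0, 1.0, size)                    # r ~ Uniform[0, 1]
    return (np.sin(p * theta) / np.cos(theta) ** (1.0 / p)
            * (np.cos((1.0 - p) * theta) / np.log(1.0 / r)) ** ((1.0 - p) / p))

# Sanity check: for p = 1 the formula reduces to tan(theta), a standard Cauchy,
# whose quartiles are (-1, 0, 1).
rng = np.random.default_rng(1)
print(np.quantile(sample_p_stable(1.0, 100_000, rng), [0.25, 0.5, 0.75]))
```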
We need one more definition.
Definition 1 The median of a distribution $\mathcal{D}$ is $\mu$ if for $Y \sim \mathcal{D}$, $\operatorname{Pr}[Y \leq \mu] = 1/2$. If $\phi(x)$ is the probability density function of $\mathcal{D}$ then we have $\int_{-\infty}^{\mu} \phi(x)\, dx = 1/2$.
Note that a median may not be uniquely defined for a distribution. The distribution $\mathcal{D}_p$ has a unique median and so we will use the notation $\operatorname{median}(\mathcal{D}_p)$ to denote this quantity. For a distribution $\mathcal{D}$ we will use $|\mathcal{D}|$ to denote the distribution of the absolute value of a random variable drawn from $\mathcal{D}$. If $\phi(x)$ is the density function of $\mathcal{D}$ then the density function of $|\mathcal{D}|$ is given by $\psi$, where $\psi(x) = 2\phi(x)$ if $x \geq 0$ and $\psi(x) = 0$ if $x < 0$.
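As a concrete example (relevant to the $F_2$ case), the median of $|\mathcal{N}(0,1)|$ can be read off from the standard normal CDF $\Phi$:
$$\operatorname{Pr}[\,|Z| \leq \mu\,] = 2\Phi(\mu) - 1 = \frac{1}{2} \;\Longrightarrow\; \Phi(\mu) = \frac{3}{4} \;\Longrightarrow\; \operatorname{median}(|\mathcal{N}(0,1)|) = \Phi^{-1}(3/4) \approx 0.6745.$$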
$\underline{F_p\text{-Estimate:}}$
$k \leftarrow \Theta\left(\frac{1}{\epsilon^2} \log \frac{1}{\delta}\right)$
Let $M$ be a $k \times n$ matrix where each $M_{ij} \sim \mathcal{D}_p$
$\mathbf{y} \leftarrow M\mathbf{x}$
Output $Y \leftarrow \dfrac{\operatorname{median}(|y_1|, |y_2|, \ldots, |y_k|)}{\operatorname{median}(|\mathcal{D}_p|)}$
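The following Python sketch puts the pieces together. It is illustrative and not from the original notes: the matrix $M$ is stored explicitly, and $\operatorname{median}(|\mathcal{D}_p|)$ is estimated empirically since it is not known analytically in general (the overall scale of the sampler cancels in the ratio).

```python
import numpy as np

def cms(p, size, rng):
    # Chambers-Mallows-Stuck sampler (same formula as in the earlier sketch)
    th = rng.uniform(-np.pi / 2, np.pi / 2, size)
    r = rng.uniform(0.0, 1.0, size)
    return (np.sin(p * th) / np.cos(th) ** (1 / p)
            * (np.cos((1 - p) * th) / np.log(1 / r)) ** ((1 - p) / p))

def fp_estimate(stream, n, p, k, seed=0):
    """Estimate ||x||_p (F_p is its p-th power) from (i, delta) updates."""
    rng = np.random.default_rng(seed)
    M = cms(p, (k, n), rng)                  # k x n matrix with entries ~ D_p
    y = np.zeros(k)
    for i, delta in stream:                  # maintains y = Mx under updates
        y += delta * M[:, i]
    med_Dp = np.median(np.abs(cms(p, 200_000, rng)))   # empirical median(|D_p|)
    return np.median(np.abs(y)) / med_Dp

# Example: x = (3, 0, -2), so ||x||_1 = 5.
stream = [(0, 3), (2, -2)]
print(fp_estimate(stream, n=3, p=1.0, k=400))   # roughly 5
```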
By the $p$-stability property we see that each $y_i \sim \|x\|_p Y$ where $Y \sim \mathcal{D}_p$. First, consider the case that $k = 1$. Then the output $|y_1| / \operatorname{median}(|\mathcal{D}_p|)$ is distributed according to $c\,|\mathcal{D}_p|$ where $c = \|x\|_p / \operatorname{median}(|\mathcal{D}_p|)$. It is not hard to verify that the median of this distribution is $\|x\|_p$. Thus, the algorithm takes $k$ samples from this distribution and outputs the sample median as the estimator. The lemma below shows that the sample median has good concentration properties.
Lemma 1 Let $\epsilon > 0$ and let $\mathcal{D}$ be a distribution with density function $\phi$ and a unique median $\mu > 0$. Suppose $\phi$ is absolutely continuous on $[(1-\epsilon)\mu, (1+\epsilon)\mu]$ and let $\alpha = \min\{\phi(x) \mid x \in [(1-\epsilon)\mu, (1+\epsilon)\mu]\}$. Let $Y = \operatorname{median}(Y_1, Y_2, \ldots, Y_k)$ where $Y_1, \ldots, Y_k$ are independent samples from the distribution $\mathcal{D}$. Then
$$\operatorname{Pr}[|Y - \mu| \geq \epsilon\mu] \leq 2 e^{-\frac{2}{3}\epsilon^2 \mu^2 \alpha^2 k}.$$
We sketch the proof to upper bound $\operatorname{Pr}[Y \leq (1-\epsilon)\mu]$. The other direction is similar. Note that by the definition of the median, $\operatorname{Pr}[Y_j \leq \mu] = 1/2$. Hence
$$\operatorname{Pr}[Y_j \leq (1-\epsilon)\mu] = 1/2 - \int_{(1-\epsilon)\mu}^{\mu} \phi(x)\, dx.$$
Let $\gamma = \int_{(1-\epsilon)\mu}^{\mu} \phi(x)\, dx$. It is easy to see that $\gamma \geq \alpha\epsilon\mu$.
Let $I_j$ be the indicator of the event $Y_j \leq (1-\epsilon)\mu$; we have $\mathbf{E}[I_j] = \operatorname{Pr}[Y_j \leq (1-\epsilon)\mu] = 1/2 - \gamma$. Let $I = \sum_j I_j$; we have $\mathbf{E}[I] = k(1/2 - \gamma)$. Since $Y$ is the median of $Y_1, Y_2, \ldots, Y_k$, the event $Y \leq (1-\epsilon)\mu$ happens only if more than $k/2$ of the $I_j$ are true, that is, only if $I > (1+\delta)\mathbf{E}[I]$ where $1+\delta = \frac{1}{1-2\gamma}$. Now, via Chernoff bounds, this probability is at most $e^{-\gamma^2 k/3}$ for sufficiently small $\gamma$.
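For completeness (this algebra is not spelled out in the original notes), the Chernoff step is the standard multiplicative bound, valid for $\delta \leq 1$, i.e., $\gamma \leq 1/4$:
$$\operatorname{Pr}[I > (1+\delta)\mathbf{E}[I]] \leq e^{-\delta^2 \mathbf{E}[I]/3}, \qquad \text{where } \delta = \frac{2\gamma}{1-2\gamma} \text{ and } \mathbf{E}[I] = \frac{k(1-2\gamma)}{2},$$
so the exponent is $\frac{\delta^2 \mathbf{E}[I]}{3} = \frac{2\gamma^2 k}{3(1-2\gamma)} \geq \frac{2}{3}\gamma^2 k \geq \frac{2}{3}\epsilon^2\mu^2\alpha^2 k$ (using $\gamma \geq \alpha\epsilon\mu$), which matches the exponent in Lemma 1.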
We can now apply the lemma to the estimator output by the algorithm. We let $\phi$ be the density of the distribution $c|\mathcal{D}_p|$. Recall that the median of this distribution is $\|x\|_p$ and the output of the algorithm is the median of $k$ independent samples from this distribution. Thus, from the lemma,
$$\operatorname{Pr}\left[\left|Y - \|x\|_p\right| \geq \epsilon \|x\|_p\right] \leq 2 e^{-\epsilon^2 k \mu^2 \alpha^2 / 3}.$$
Let $\phi'$ be the density of $|\mathcal{D}_p|$ and let $\mu'$ be the median of $|\mathcal{D}_p|$. Then it can be seen that $\mu\alpha = \mu'\alpha'$ where $\alpha' = \min\{\phi'(x) \mid (1-\epsilon)\mu' \leq x \leq (1+\epsilon)\mu'\}$. Thus $\mu'\alpha'$ depends only on $\mathcal{D}_p$ and $\epsilon$. Letting this quantity be $c_{p,\epsilon}$ we have
$$\operatorname{Pr}\left[\left|Y - \|x\|_p\right| \geq \epsilon \|x\|_p\right] \leq 2 e^{-\epsilon^2 k c_{p,\epsilon}^2 / 3} \leq \delta,$$
provided $k = \Omega\left(\frac{1}{c_{p,\epsilon}^2} \cdot \frac{1}{\epsilon^2} \log \frac{1}{\delta}\right)$.
Technical Issues: There are several technical issues that need to be addressed to obtain a proper algorithm from the preceding description. First, the algorithm as described requires one to store the entire matrix $M$, which is too large for streaming applications. Second, setting $k$ requires knowing the constant $c_{p,\epsilon}$, which is not explicitly known since $\mathcal{D}_p$ is not well understood for general $p$. To obtain a streaming algorithm, the very high-level idea is to derandomize the algorithm via the use of pseudorandom generators for small space due to Nisan. See [1] for more details.

2. Counting Frequent Items

We have seen various algorithms for estimating the $F_p$ norms for $p \geq 0$. Note that $F_0$ corresponds to the number of distinct elements. In the limit, as $p \rightarrow \infty$, the $\ell_p$ norm of a vector $\mathbf{x}$ tends to the maximum of the absolute values of the entries of $\mathbf{x}$. Thus, we can define the $F_\infty$ norm to correspond to the maximum frequency in $\mathbf{x}$. More generally, we would like to find the frequent items in a stream, which are also called "heavy hitters". In general, it is not feasible to estimate the maximum frequency with limited space if it is too small relative to $m$.

2.1. Misra-Gries algorithm for frequent items

Suppose we have a stream $\sigma = a_1, a_2, \ldots, a_m$ where $a_j \in [n]$ (the simple setting), and we want to find all elements $i \in [n]$ such that $f_i > m/k$. Note that there can be fewer than $k$ such elements. The simplest case is $k = 2$, when we want to know whether there is a "majority" element. There is a simple deterministic algorithm for $k = 2$ that perhaps you have all seen in an algorithms class. The algorithm uses an associative array data structure of size $k$.
$\underline{\text{MisraGries}(k)}$:
$D$ is an empty associative array
While (stream is not empty) do
    $a_j$ is current item
    If ($a_j$ is in $\textit{keys}(D)$)
        $D[a_j] \leftarrow D[a_j] + 1$
    Else if ($|\textit{keys}(D)| < k-1$) then
        $D[a_j] \leftarrow 1$
    Else
        for each $\ell \in \textit{keys}(D)$ do
            $D[\ell] \leftarrow D[\ell] - 1$
        Remove elements from $D$ whose counter values are $0$
endWhile
For each $i \in \textit{keys}(D)$ set $\hat{f}_i = D[i]$
For each $i \notin \textit{keys}(D)$ set $\hat{f}_i = 0$
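A direct Python transcription of the pseudocode (illustrative; a dictionary plays the role of the associative array $D$):

```python
def misra_gries(stream, k):
    """Misra-Gries summary using at most k - 1 counters.

    With f_hat[i] = D.get(i, 0), Lemma 2 below gives f_i - m/k <= f_hat[i] <= f_i.
    """
    D = {}
    for a in stream:
        if a in D:
            D[a] += 1
        elif len(D) < k - 1:
            D[a] = 1
        else:
            for key in list(D):            # decrement every counter
                D[key] -= 1
                if D[key] == 0:            # drop counters that hit zero
                    del D[key]
    return D

# Example: m = 10 and k = 3, so any element with f_i > 10/3 must survive in D.
stream = [1, 2, 1, 3, 1, 4, 1, 5, 1, 2]    # f_1 = 5 > 10/3
print(misra_gries(stream, k=3))            # 1 is among the keys
```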
We leave the following as an exercise to the reader.
Lemma 2 For each $i \in [n]$:
$$f_i - \frac{m}{k} \leq \hat{f}_i \leq f_i.$$
The lemma implies that if $f_i > m/k$ then $i \in \textit{keys}(D)$ at the end of the algorithm. Thus one can use a second pass over the data to compute the exact $f_i$ only for the (at most $k$) items in $\textit{keys}(D)$. This gives an $O(kn)$ time two-pass algorithm for finding all items which have frequency greater than $m/k$.
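A minimal two-pass version built on the `misra_gries` sketch above (illustrative; it assumes the stream can be read a second time, e.g., it is stored or can be replayed):

```python
from collections import Counter

def heavy_hitters_two_pass(stream, k):
    """Return exactly the elements with f_i > m/k, using two passes."""
    candidates = set(misra_gries(stream, k))                 # pass 1: O(k) candidates
    exact = Counter(a for a in stream if a in candidates)    # pass 2: exact counts
    m = len(stream)
    return {i: f for i, f in exact.items() if f > m / k}

stream = [1, 2, 1, 3, 1, 4, 1, 5, 1, 2]
print(heavy_hitters_two_pass(stream, k=3))    # {1: 5}
```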
Bibliographic Notes: For more details on $F_p$ estimation when $0 < p \leq 2$ see the original paper of Indyk [1], the notes of Amit Chakrabarti (Chapter 7), and Lecture 4 of Jelani Nelson's course.
You can read the notes from the next lecture of Chandra Chekuri's course on Count and Count-Min Sketches here.

  1. Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307-323, 2006.
