You can read the notes from the previous lecture of Chandra Chekuri's course on Estimating the Number of Distinct Elements in a Stream here.
1. AMS Sampling
We have seen reservoir sampling and the related weighted sampling technique to obtain independent samples from a stream without the algorithm knowing the length of the stream. We now discuss a technique to sample from a stream $\sigma = a_1, a_2, \ldots, a_m$, where the tokens $a_j$ are integers from $[n]$, and we wish to estimate a function of the form

$$g(\sigma) = \sum_{i \in [n]} g(f_i),$$

where $f_i$ is the frequency of $i$ and $g$ is a real-valued function such that $g(0) = 0$. A natural example is estimating the frequency moments $F_k = \sum_{i \in [n]} f_i^k$; here we have $g(x) = x^k$, a convex function for $k \geq 1$. Another example is the empirical entropy of $\sigma$, defined as $\sum_{i \in [n]} p_i \log \frac{1}{p_i}$ where $p_i = \frac{f_i}{m}$ is the empirical probability of $i$; here $g(x) = x \log x$.[1]
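As a tiny concrete instance (not from the notes), consider the stream $\sigma = (1, 2, 1, 3, 1)$ with $m = 5$ and $n = 3$: the frequencies are $f_1 = 3$ and $f_2 = f_3 = 1$, so the second frequency moment is

$$F_2 = 3^2 + 1^2 + 1^2 = 11.$$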
AMS sampling, from the famous paper of Alon, Matias, and Szegedy, gives an unbiased estimator for $g(\sigma)$. The estimator is based on a random variable $Y$ defined as follows. Let $J$ be a uniformly random sample from $[m]$. Let $R = |\{j \mid a_j = a_J,\ J \leq j \leq m\}|$. That is, $R$ is the number of tokens from position $J$ onwards (including $J$ itself) that equal $a_J$. Then let $Y$ be the estimate defined as:
$$Y = m\,(g(R) - g(R-1)).$$
The lemma below shows that $Y$ is an unbiased estimator of $g(\sigma)$.

Lemma 1. $\mathbf{E}[Y] = \sum_{i \in [n]} g(f_i) = g(\sigma)$.
Proof: The probability that $a_J = i$ is exactly $f_i/m$ since $J$ is a uniform sample. Moreover, if $a_J = i$ then $R$ is distributed as a uniform random variable over $[f_i]$. Hence

$$\mathbf{E}[Y] = \sum_{i \in [n]} \frac{f_i}{m} \sum_{r=1}^{f_i} \frac{1}{f_i} \cdot m\,\bigl(g(r) - g(r-1)\bigr) = \sum_{i \in [n]} \bigl(g(f_i) - g(0)\bigr) = \sum_{i \in [n]} g(f_i),$$

where the second equality uses the telescoping of the inner sum and the last uses $g(0) = 0$.
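To see the lemma concretely, here is a minimal Python sketch (not part of the notes; the function names are illustrative) that, for a small stream, enumerates every equally likely choice of the index $J$ and averages the resulting estimate exactly, recovering $\sum_i g(f_i)$:

```python
from collections import Counter

def exact_expectation(stream, g):
    """Compute E[Y] exactly by enumerating all m equally likely choices of J."""
    m = len(stream)
    total = 0.0
    for J in range(m):  # J is 0-indexed here; each position has probability 1/m
        # R = number of occurrences of stream[J] from position J to the end
        R = sum(1 for j in range(J, m) if stream[j] == stream[J])
        total += m * (g(R) - g(R - 1))
    return total / m

stream = [1, 2, 1, 3, 1]
g = lambda x: x ** 2                                 # g(x) = x^k with k = 2
print(exact_expectation(stream, g))                  # 11.0
print(sum(g(f) for f in Counter(stream).values()))   # F_2 = 11
```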
One can compute $Y$ in small space in the streaming setting via the reservoir sampling idea for generating a uniform sample. The algorithm is given below; the count $R$ is reset whenever a new sample is picked.
"AMSEstimate:"_\underline{\text{AMSEstimate:}}
s larr"null"s\leftarrow\text{null}
m larr0m\leftarrow0
R larr0R\leftarrow0
While (stream is not done)
quad m larr m+1\quad m\leftarrow m+1
quada_(m)\quad a_{m} is current item
quad\quad Toss a biased coin that is heads with probability 1//m1/m
quad\quad If (coin turns up heads)
quadquad s larra_(m)\quad\quad s\leftarrow a_{m}
quadquad R larr1\quad\quad R\leftarrow1
quad\quad Else If (a_(m)==s)(a_{m}==s)
quadquad R larr R+1\quad\quad R\leftarrow R+1
endWhile
Output m(g(R)-g(R-1))m(g(R)-g(R-1))
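For concreteness, a minimal Python sketch of the pseudocode above (not from the notes; it assumes the stream is any iterable of tokens and $g$ is any real-valued function with $g(0) = 0$):

```python
import random

def ams_estimate(stream, g):
    """One pass of AMSEstimate: reservoir-sample a position J and count
    occurrences of the sampled token from position J onwards."""
    s = None   # currently sampled token a_J
    m = 0      # number of tokens seen so far
    R = 0      # occurrences of s since it was sampled
    for a in stream:
        m += 1
        if random.random() < 1.0 / m:  # heads with probability 1/m
            s, R = a, 1                # resample: the sampled position J becomes m
        elif a == s:
            R += 1
    return m * (g(R) - g(R - 1))

# A single run has high variance; averaging many runs approaches F_2 = 11.
runs = [ams_estimate([1, 2, 1, 3, 1], lambda x: x ** 2) for _ in range(10000)]
print(sum(runs) / len(runs))
```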
To obtain an $(\epsilon, \delta)$-approximation via the estimator $Y$ we need to bound $\mathbf{Var}[Y]$ and apply standard tools. We do this now for frequency moments.
1.1. Application to estimating frequency moments
We now apply AMS sampling to estimate $F_k$, the $k$'th frequency moment, for $k \geq 1$. We have already seen that $Y$ is an unbiased estimator for $F_k$ when we set $g(x) = x^k$. We now estimate the variance of $Y$ in this setting.
Lemma 2. When $g(x) = x^k$ and $k \geq 1$,

$$\mathbf{Var}[Y] \leq k F_1 F_{2k-1} \leq k n^{1-\frac{1}{k}} F_k^2.$$
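A sketch of the standard calculation behind the first inequality (assuming the distribution of $(a_J, R)$ from the proof of Lemma 1):

$$\mathbf{Var}[Y] \leq \mathbf{E}[Y^2] = \sum_{i \in [n]} \frac{f_i}{m} \sum_{r=1}^{f_i} \frac{1}{f_i}\, m^2 \bigl(r^k - (r-1)^k\bigr)^2 = m \sum_{i \in [n]} \sum_{r=1}^{f_i} \bigl(r^k - (r-1)^k\bigr)^2.$$

Since $r^k - (r-1)^k \leq k r^{k-1} \leq k f_i^{k-1}$ for $1 \leq r \leq f_i$, the inner sum is at most $k f_i^{k-1} \sum_{r=1}^{f_i} (r^k - (r-1)^k) = k f_i^{2k-1}$, and therefore $\mathbf{Var}[Y] \leq k m F_{2k-1} = k F_1 F_{2k-1}$.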
Using the preceding inequality, and the inequality $\frac{1}{n}\sum_{i=1}^{n} x_i \leq \left(\frac{1}{n}\sum_{i=1}^{n} x_i^k\right)^{\frac{1}{k}}$ for all $k \geq 1$ (due to the convexity of the function $g(x) = x^k$), we obtain $F_1 \leq n^{1-1/k} F_k^{1/k}$; combined with $F_{2k-1} \leq F_k^{(2k-1)/k}$, this gives

$$F_1 F_{2k-1} \leq n^{1-\frac{1}{k}} F_k^{\frac{1}{k}} \cdot F_k^{2-\frac{1}{k}} = n^{1-\frac{1}{k}} F_k^2.$$

Thus we have $\mathbf{E}[Y] = F_k$ and $\mathbf{Var}[Y] \leq k n^{1-1/k} F_k^2$. We now apply the trick of reducing the variance by averaging and then the median trick to obtain a high-probability bound. If we take $h$ independent estimators distributed as $Y$ and take their average, the variance goes down by a factor of $h$. We let $h = \frac{c}{\epsilon^2} k n^{1-1/k}$ for some fixed constant $c$. Let $Y'$ be the resulting averaged estimator. We have $\mathbf{E}[Y'] = F_k$ and $\mathbf{Var}[Y'] \leq \mathbf{Var}[Y]/h \leq \frac{\epsilon^2}{c} F_k^2$. Now, using Chebyshev's inequality, we have

$$\Pr\left[\,|Y' - F_k| \geq \epsilon F_k\,\right] \leq \frac{\mathbf{Var}[Y']}{\epsilon^2 F_k^2} \leq \frac{1}{c}.$$
We can choose $c = 3$ to obtain an $(\epsilon, 1/3)$-approximation. By using the median trick with $\Theta(\log \frac{1}{\delta})$ independent estimators we can obtain an $(\epsilon, \delta)$-approximation. The overall number of estimators we run independently is $O(\log \frac{1}{\delta} \cdot \frac{1}{\epsilon^2} \cdot n^{1-1/k})$. Each estimator requires $O(\log n + \log m)$ space since we keep track of one index from $[m]$, one count from $[m]$, and one item from $[n]$. Thus the space usage to obtain an $(\epsilon, \delta)$-approximation is $O(\log \frac{1}{\delta} \cdot \frac{1}{\epsilon^2} \cdot n^{1-1/k} \cdot (\log m + \log n))$. The time to process each stream element is of the same order.
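As an illustration of how the pieces combine, here is a hypothetical Python driver (not from the notes) that averages $h$ independent copies of the estimator and then takes the median of $\Theta(\log \frac{1}{\delta})$ such averages. It reuses the `ams_estimate` sketch above and, for simplicity, rereads the stream once per copy; in the streaming setting all copies would instead be maintained in parallel over a single pass.

```python
import math
from statistics import median

# Assumes ams_estimate from the earlier sketch is in scope.
def estimate_Fk(stream, k, n, eps, delta, c=3):
    """Mean-of-h, median-of-t estimator for F_k built from ams_estimate."""
    g = lambda x: x ** k
    h = math.ceil((c / eps ** 2) * k * n ** (1 - 1 / k))  # copies per average
    t = math.ceil(math.log(1 / delta))                    # number of averages
    averages = [sum(ams_estimate(stream, g) for _ in range(h)) / h
                for _ in range(t)]
    return median(averages)

# Example: for this stream f = (60, 20, 20), so F_2 = 60^2 + 20^2 + 20^2 = 4400.
print(estimate_Fk([1, 2, 1, 3, 1] * 20, k=2, n=3, eps=0.5, delta=0.1))
```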
The space complexity of $\tilde{O}(n^{1-1/k})$ is not optimal for estimating $F_k$. One can achieve $\tilde{O}(n^{1-2/k})$, which is optimal for $k > 2$, and one can in fact achieve poly-logarithmic space for $1 \leq k \leq 2$. We will see these results later in the course.
Bibliographic Notes: See Chapter 1 of the draft book by McGregor and Muthukrishnan, in particular the application of AMS sampling to estimating the entropy. See Chapter 5 of Amit Chakrabarti's lecture notes for the special case of frequency moments explained in detail; in particular, he states a clean lemma that bundles the variance reduction technique and the median trick.
You can read the notes from the next lecture of Chandra Chekuri's course, on Estimating the $F_2$ Norm, Sketching, and the Johnson-Lindenstrauss Lemma, here.
In the context of entropy, by convention, $x \log x = 0$ for $x = 0$. ↩︎