You can read the notes from the previous lecture of Chandra Chekuri's course, on estimating the $F_2$ norm, sketching, and the Johnson-Lindenstrauss lemma, here.
The Misra-Gries deterministic counting algorithm guarantees that all items with frequency $> F_1/k$ can be found using $O(k)$ counters and an update time of $O(\log k)$. Setting $k = 1/\epsilon$, one can view the algorithm as providing an additive $\epsilon F_1$ approximation for each $f_i$. However, the algorithm does not provide a sketch. One advantage of linear sketching algorithms is the ability to handle deletions. We now discuss two sketching algorithms that have found a number of applications. These sketches can be used for estimating point queries: after seeing a stream $\sigma$ over items in $[n]$ we would like to estimate $f_i$, the frequency of $i \in [n]$. More generally, in the turnstile model, we would like to estimate $x_i$ for a given $i \in [n]$. We can only guarantee the estimate with an additive error.
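For concreteness, here is a minimal Python sketch of the Misra-Gries algorithm for an insert-only stream; the function name is an illustrative choice, not from the notes.

```python
def misra_gries(stream, k):
    """Maintain at most k-1 counters. Every item with true frequency
    > F_1/k survives, with its count underestimated by at most F_1/k."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # decrement every counter; drop counters that reach zero
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters
```

With a balanced-tree or hash-table dictionary each update touches $O(k)$ counters only on a decrement step; an amortized/charging argument gives the $O(\log k)$ update time mentioned above.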
1. CountMin Sketch
We first describe the simpler CountMin sketch. The sketch maintains several counters, best visualized as a rectangular array $C$ of width $w$ and depth $d$. With each row $\ell$ we have a hash function $h_\ell : [n] \rightarrow [w]$ that maps elements to one of $w$ buckets. On an update $(i_t, \Delta_t)$ the sketch adds $\Delta_t$ to the counter $C[\ell, h_\ell(i_t)]$ in each row $\ell$, and the estimate for $x_i$ is $\tilde{x}_i = \min_{\ell} C[\ell, h_\ell(i)]$.
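A minimal Python sketch of this data structure; the class name and the modular hash construction (a standard pairwise-independent family over a Mersenne prime) are my own illustrative choices. Each update $(i, \Delta)$ adds $\Delta$ to one counter per row, and a point query returns the minimum over rows.

```python
import random

class CountMin:
    """CountMin sketch: a d x w array of counters, one hash function
    per row. Linear in the input, so it supports deletions."""

    _P = (1 << 61) - 1  # a Mersenne prime for the hash family

    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.C = [[0] * w for _ in range(d)]
        # h_l(x) = ((a*x + b) mod P) mod w, a pairwise-independent family
        self.ab = [(rng.randrange(1, self._P), rng.randrange(self._P))
                   for _ in range(d)]

    def _h(self, l, x):
        a, b = self.ab[l]
        return ((a * x + b) % self._P) % self.w

    def update(self, i, delta=1):
        # turnstile update (i, delta): add delta to one counter per row
        for l in range(self.d):
            self.C[l][self._h(l, i)] += delta

    def estimate(self, i):
        # in the strict turnstile model every row overestimates x_i,
        # so the minimum over rows is the estimate
        return min(self.C[l][self._h(l, i)] for l in range(self.d))
```

Note that `update` works for negative `delta` as well, which is exactly the linearity that Misra-Gries lacks.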
Exercise: CountMin is a linear sketch. What are the entries of the projection matrix?
We will analyze the sketch in the strict turnstile model where $x_i \geq 0$ for all $i \in [n]$; note that $\Delta_t$ may still be negative.
**Lemma 1.** *Let $d = \Omega(\log \frac{1}{\delta})$ and $w > \frac{2}{\epsilon}$. Then for any fixed $i \in [n]$, $x_i \leq \tilde{x}_i$ and*

$$\Pr[\tilde{x}_i > x_i + \epsilon \|\mathbf{x}\|_1] \leq \delta.$$
Proof: Fix $i \in [n]$. Let $Z_\ell = C[\ell, h_\ell(i)]$ be the value of the counter in row $\ell$ to which $i$ is hashed. We have

$$\mathbf{E}[Z_\ell] = x_i + \sum_{i' \neq i} x_{i'} \Pr[h_\ell(i') = h_\ell(i)] = x_i + \frac{1}{w} \sum_{i' \neq i} x_{i'} \leq x_i + \frac{\epsilon}{2} \|\mathbf{x}\|_1.$$
Note that we used pairwise independence of $h_\ell$ to conclude that $\Pr[h_\ell(i') = h_\ell(i)] = 1/w$.
By Markov's inequality (here we are using the non-negativity of $\mathbf{x}$),

$$\Pr[Z_\ell - x_i > \epsilon \|\mathbf{x}\|_1] \leq \frac{\mathbf{E}[Z_\ell - x_i]}{\epsilon \|\mathbf{x}\|_1} \leq \frac{1}{\epsilon w} < \frac{1}{2}.$$

Since $\tilde{x}_i = \min_\ell Z_\ell$ and the rows are independent, $\Pr[\tilde{x}_i > x_i + \epsilon \|\mathbf{x}\|_1] \leq 2^{-d} \leq \delta$. $\Box$
Remark: By choosing $d = \Theta(\log n)$, that is, $\delta = 1/\operatorname{poly}(n)$, we can ensure with probability at least $1 - 1/\operatorname{poly}(n)$ that $\tilde{x}_i - x_i \leq \epsilon \|\mathbf{x}\|_1$ for all $i \in [n]$.
Exercise: For general turnstile streams, where $\mathbf{x}$ can have negative entries, we can take the median of the counters: $\tilde{x}_i = \operatorname{median}_\ell C[\ell, h_\ell(i)]$. For this estimate you should be able to prove that with $w > 3/\epsilon$ and $d = \Omega(\log \frac{1}{\delta})$, $\Pr[|\tilde{x}_i - x_i| > \epsilon \|\mathbf{x}\|_1] \leq \delta$.
2. Count Sketch

The Count sketch uses the same $d \times w$ array of counters, but each row $\ell$ now has, in addition to $h_\ell : [n] \rightarrow [w]$, a pairwise-independent sign function $g_\ell : [n] \rightarrow \{-1, +1\}$. On an update $(i_t, \Delta_t)$ the sketch adds $g_\ell(i_t) \Delta_t$ to $C[\ell, h_\ell(i_t)]$ in each row $\ell$. For $i \in [n]$ set

$$\tilde{x}_i = \operatorname{median}\{g_1(i) C[1, h_1(i)], g_2(i) C[2, h_2(i)], \ldots, g_d(i) C[d, h_d(i)]\}.$$
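This median-of-signed-counters estimator can be implemented compactly. In the Python sketch below the class name and hash constructions are illustrative choices; in particular, deriving the sign $g_\ell$ from the parity of a second modular hash is a stand-in for a proper pairwise-independent $\pm 1$ family.

```python
import random
import statistics

class CountSketch:
    """Count sketch: like CountMin, but each row also applies a random
    sign g_l(i); estimates are medians over rows, not minima."""

    _P = (1 << 61) - 1  # a Mersenne prime for the hash families

    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.C = [[0] * w for _ in range(d)]
        self.h_ab = [(rng.randrange(1, self._P), rng.randrange(self._P))
                     for _ in range(d)]
        self.g_ab = [(rng.randrange(1, self._P), rng.randrange(self._P))
                     for _ in range(d)]

    def _h(self, l, x):
        a, b = self.h_ab[l]
        return ((a * x + b) % self._P) % self.w

    def _g(self, l, x):
        # illustrative +-1 sign derived from a second hash's parity
        a, b = self.g_ab[l]
        return 1 if ((a * x + b) % self._P) % 2 == 0 else -1

    def update(self, i, delta=1):
        for l in range(self.d):
            self.C[l][self._h(l, i)] += self._g(l, i) * delta

    def estimate(self, i):
        return statistics.median(self._g(l, i) * self.C[l][self._h(l, i)]
                                 for l in range(self.d))
```

Multiplying the counter by $g_\ell(i)$ again at query time makes the contribution of $i$ itself positive, while colliding items keep random signs and cancel in expectation.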
Exercise: Count sketch is also a linear sketch. What are the entries of the projection matrix?
**Lemma 2.** *Let $d = \Omega(\log \frac{1}{\delta})$ and $w > \frac{3}{\epsilon^2}$. Then for any fixed $i \in [n]$, $\mathbf{E}[\tilde{x}_i] = x_i$ and*

$$\Pr[|\tilde{x}_i - x_i| > \epsilon \|\mathbf{x}\|_2] \leq \delta.$$
Proof: Fix an $i \in [n]$. Let $Z_\ell = g_\ell(i) C[\ell, h_\ell(i)]$. For $i' \in [n]$ let $Y_{i'}$ be the indicator random variable that is $1$ if $h_\ell(i) = h_\ell(i')$, that is, if $i$ and $i'$ collide under $h_\ell$. Note that $\mathbf{E}[Y_{i'}] = \mathbf{E}[Y_{i'}^2] = 1/w$ from the pairwise independence of $h_\ell$. We have

$$\mathbf{E}[Z_\ell] = \mathbf{E}\Big[g_\ell(i)\Big(g_\ell(i) x_i + \sum_{i' \neq i} g_\ell(i') Y_{i'} x_{i'}\Big)\Big] = x_i + \sum_{i' \neq i} x_{i'}\, \mathbf{E}[g_\ell(i) g_\ell(i')]\, \mathbf{E}[Y_{i'}] = x_i$$
because $\mathbf{E}[g_\ell(i) g_\ell(i')] = 0$ for $i \neq i'$ from the pairwise independence of $g_\ell$, and $Y_{i'}$ is independent of $g_\ell(i)$ and $g_\ell(i')$. Now we upper bound the variance of $Z_\ell$:

$$\operatorname{Var}[Z_\ell] \leq \sum_{i' \neq i} x_{i'}^2\, \mathbf{E}[Y_{i'}^2] \leq \frac{\|\mathbf{x}\|_2^2}{w} < \frac{\epsilon^2 \|\mathbf{x}\|_2^2}{3}.$$

By Chebyshev's inequality, $\Pr[|Z_\ell - x_i| \geq \epsilon \|\mathbf{x}\|_2] \leq 1/3$ for each row, and a Chernoff bound for the median of the $d$ independent rows gives the claimed probability. $\Box$
Thus choosing $d = O(\log n)$ and taking the median guarantees the desired bound with high probability.
Remark: By choosing $d = \Theta(\log n)$, that is, $\delta = 1/\operatorname{poly}(n)$, we can ensure with probability at least $1 - 1/\operatorname{poly}(n)$ that $|\tilde{x}_i - x_i| \leq \epsilon \|\mathbf{x}\|_2$ for all $i \in [n]$.
3. Applications
Count and CountMin sketches have found a number of applications. Note that they have a similar structure though the guarantees are different. Consider the problem of estimating frequencies. Count sketch outputs an estimate $\tilde{f}_i$ for $f_i$ with an additive error of $\epsilon \|\mathbf{f}\|_2$ while CountMin guarantees an additive error of $\epsilon \|\mathbf{f}\|_1$, which is always larger since $\|\mathbf{f}\|_2 \leq \|\mathbf{f}\|_1$. CountMin provides a one-sided error when $\mathbf{x} \geq 0$, which has some benefits. CountMin uses $O(\frac{1}{\epsilon} \log \frac{1}{\delta})$ counters while Count sketch uses $O(\frac{1}{\epsilon^2} \log \frac{1}{\delta})$ counters. Note that the Misra-Gries algorithm uses $O(1/\epsilon)$ counters.
3.1. Heavy Hitters
We will call an index $i$ an $\alpha$-HH (for heavy hitter) if $x_i \geq \alpha \|\mathbf{x}\|_1$, where $\alpha \in (0, 1]$. We would like to find $S_\alpha$, the set of all $\alpha$-heavy hitters. We will relax this requirement and instead output a set $S$ such that
$$S_\alpha \subseteq S \subseteq S_{\alpha - \epsilon}.$$
Here we will assume that $\epsilon < \alpha$, for otherwise the approximation does not make sense.
Suppose we use a CountMin sketch with $w = 2/\epsilon$ and $\delta = 1/n^c$ for a sufficiently large constant $c$. Then, as we saw, with probability at least $1 - 1/\operatorname{poly}(n)$, for all $i \in [n]$,

$$x_i \leq \tilde{x}_i \leq x_i + \epsilon \|\mathbf{x}\|_1.$$
Once the sketch is computed we can simply go over all $i$ and add $i$ to $S$ if $\tilde{x}_i \geq \alpha \|\mathbf{x}\|_1$. It is easy to see that $S$ is the desired set: every $\alpha$-heavy hitter $i$ satisfies $\tilde{x}_i \geq x_i \geq \alpha \|\mathbf{x}\|_1$, and any $i$ added to $S$ satisfies $x_i \geq \tilde{x}_i - \epsilon \|\mathbf{x}\|_1 \geq (\alpha - \epsilon) \|\mathbf{x}\|_1$.
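A self-contained Python sketch of this brute-force extraction, with an inlined CountMin structure (the function name, hash construction, and the fixed depth are illustrative choices). Note that scanning only the items actually seen requires remembering them, which a true streaming algorithm cannot afford; it stands in here for the scan over all of $[n]$.

```python
import random

def countmin_heavy_hitters(stream, alpha, eps, seed=0):
    """One pass to build a CountMin sketch with w ~ 2/eps buckets,
    then report every item whose estimate is >= alpha * ||x||_1."""
    w, d = int(2 / eps) + 1, 5
    p = (1 << 61) - 1
    rng = random.Random(seed)
    ab = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(d)]
    C = [[0] * w for _ in range(d)]
    total, seen = 0, set()
    for i in stream:
        seen.add(i)
        total += 1  # ||x||_1 for an insert-only stream
        for l, (a, b) in enumerate(ab):
            C[l][((a * i + b) % p) % w] += 1

    def est(i):
        return min(C[l][((a * i + b) % p) % w]
                   for l, (a, b) in enumerate(ab))

    # since estimates never underestimate, every true alpha-HH is kept
    return {i for i in seen if est(i) >= alpha * total}
```

Because the estimates are one-sided, the returned set always contains $S_\alpha$; the $\epsilon$-slack only controls which borderline items can sneak in.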
Unfortunately the computation of $S$ is expensive. The sketch has $O(\frac{1}{\epsilon} \log n)$ counters and processing each $i$ takes time proportional to the number of counters, hence the total time is $O(\frac{1}{\epsilon} n \log n)$ to output a set $S$ of size $O(\frac{1}{\alpha})$. It turns out that by keeping additional information in the sketch in a hierarchical fashion one can cut the time down to $O(\frac{1}{\alpha} \operatorname{polylog}(n))$.
3.2. Range Queries
In several applications the range $[n]$ corresponds to an actual total ordering of the items. For instance, $[n]$ could represent a discretization of time, with $\mathbf{x}$ the corresponding signal. In databases, $[n]$ could represent ordered numerical attributes such as a person's age, height, or salary. In such settings range queries are very useful. A range query is an interval of the form $[i, j]$ where $i, j \in [n]$ and $i \leq j$. The goal is to output $\sum_{i \leq \ell \leq j} x_\ell$. Note that there are $O(n^2)$ potential queries.
There is a simple trick to solve this using the sketches we have seen. An interval $[i, j]$ is a dyadic interval/range if $j - i + 1 = 2^k$ for some $k$ and $2^k$ divides $i - 1$. Assume $n$ is a power of $2$. Then the dyadic intervals of length $1$ are $[1,1], [2,2], \ldots, [n,n]$. Those of length $2$ are $[1,2], [3,4], \ldots$ and those of length $4$ are $[1,4], [5,8], \ldots$.
**Claim 3.** *Every range $[i, j]$ can be expressed as a disjoint union of at most $2 \log n$ dyadic ranges.*
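The decomposition in Claim 3 can be computed greedily: repeatedly take the largest dyadic interval that starts at the current left endpoint and fits inside what remains. A Python sketch (the function name is an illustrative choice):

```python
def dyadic_decompose(i, j, n):
    """Decompose [i, j] (1-indexed, inclusive; n a power of two) into
    disjoint dyadic intervals. Returns at most 2*log2(n) pieces."""
    lo, hi = i - 1, j  # work with the half-open 0-indexed range [lo, hi)
    pieces = []
    while lo < hi:
        # largest 2^k dividing lo (alignment constraint); n if lo == 0
        size = (lo & -lo) if lo else n
        while lo + size > hi:  # shrink until the piece fits
            size //= 2
        pieces.append((lo + 1, lo + size))  # back to 1-indexed inclusive
        lo += size
    return pieces
```

The piece sizes first increase and then decrease, with each size used at most once per phase, which is one way to see the $2 \log n$ bound.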
Thus it suffices to maintain accurate point queries for the dyadic ranges. Note that there are at most $2n$ dyadic ranges. They fall into $O(\log n)$ groups based on length; the ranges of a given length partition the entire interval. We can keep a separate CountMin sketch for the $n/2^i$ dyadic intervals of length $2^i$ (where $i = 0$ corresponds to the sketch for point queries). Using these $O(\log n)$ CountMin sketches we can answer any range query with an additive error of $\epsilon \|\mathbf{x}\|_1$. Note that a range $[i, j]$ is expressed as the sum of at most $2 \log n$ point queries, each of which has an additive error. So the $\epsilon'$ for the sketches has to be chosen to be $\epsilon/(2 \log n)$ to ensure an additive error of $\epsilon \|\mathbf{x}\|_1$ for the range queries.
By choosing $d = O(\log n)$ the error probability for all point queries in all sketches will be at most $1/\operatorname{poly}(n)$. This guarantees that all range queries are answered to within an additive $\epsilon \|\mathbf{x}\|_1$. The total space will be $O(\frac{1}{\epsilon} \log^3 n)$.
3.3. Sparse Recovery
Let $\mathbf{x} \in \mathbb{R}^n$ be a vector. Can we approximate $\mathbf{x}$ by a sparse vector $\mathbf{z}$? By sparse we mean that $\mathbf{z}$ has at most $k$ non-zero entries for some given $k$ (this is the same as saying $\|\mathbf{z}\|_0 \leq k$). A reasonable way to model this is to ask for computing the error

$$\operatorname{err}_p^k(\mathbf{x}) = \min_{\mathbf{z} : \|\mathbf{z}\|_0 \leq k} \|\mathbf{x} - \mathbf{z}\|_p$$
for some $p$. A typical choice is $p = 2$. It is easy to see that the optimum $\mathbf{z}$ is obtained by restricting $\mathbf{x}$ to its $k$ largest coordinates (in absolute value). The question we ask here is whether we can estimate $\operatorname{err}_2^k(\mathbf{x})$ efficiently in a streaming fashion. For this we use the Count sketch. Recall that by choosing $w = 3/\epsilon^2$ and $d = \Theta(\log n)$ the sketch ensures that, with high probability,
$$\forall i \in [n], \quad |\tilde{x}_i - x_i| \leq \epsilon \|\mathbf{x}\|_2.$$
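As an aside, offline (with the full vector in hand) the quantity $\operatorname{err}_2^k(\mathbf{x})$ is easy to compute by exactly this top-$k$ restriction, which is useful as a baseline when experimenting with the sketch; the function name below is an illustrative choice.

```python
def err2k(x, k):
    """err_2^k(x): the l2 norm of the tail left after keeping the k
    largest-magnitude coordinates (the best k-sparse approximation)."""
    tail = sorted((abs(v) for v in x), reverse=True)[k:]
    return sum(v * v for v in tail) ** 0.5
```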
One can in fact show a generalization.
**Lemma 4.** *Count sketch with $w = 3k/\epsilon^2$ and $d = O(\log n)$ ensures that, with high probability,*
$$\forall i \in [n], \quad |\tilde{x}_i - x_i| \leq \frac{\epsilon}{\sqrt{k}} \operatorname{err}_2^k(\mathbf{x}).$$
Proof: Let $S = \{i_1, i_2, \ldots, i_k\}$ be the indices of the $k$ largest coordinates of $\mathbf{x}$ and let $\mathbf{x}'$ be obtained from $\mathbf{x}$ by setting the entries with indices in $S$ to zero. Note that $\operatorname{err}_2^k(\mathbf{x}) = \|\mathbf{x}'\|_2$. Fix a coordinate $i$. Consider row $\ell$ and let $Z_\ell = g_\ell(i) C[\ell, h_\ell(i)]$ as before. Let $A_\ell$ be the event that there exists an index $t \in S$ such that $h_\ell(i) = h_\ell(t)$; that is, some "big" coordinate collides with $i$ under $h_\ell$. Note that $\Pr[A_\ell] \leq \sum_{t \in S} \Pr[h_\ell(i) = h_\ell(t)] \leq |S|/w \leq \epsilon^2/3$ by pairwise independence of $h_\ell$. Now we estimate the conditional variance: given $\bar{A}_\ell$, only coordinates outside $S$ can collide with $i$, so

$$\mathbf{E}[(Z_\ell - x_i)^2 \mid \bar{A}_\ell] \leq \frac{\|\mathbf{x}'\|_2^2}{w} \leq \frac{\epsilon^2}{3k} \operatorname{err}_2^k(\mathbf{x})^2.$$

By Chebyshev's inequality, $\Pr[|Z_\ell - x_i| > \frac{\epsilon}{\sqrt{k}} \operatorname{err}_2^k(\mathbf{x}) \mid \bar{A}_\ell] \leq 1/3$, so each row fails with probability at most $1/3 + \epsilon^2/3 < 1/2$. Taking the median over $d = O(\log n)$ rows gives the bound for all $i$ with high probability. $\Box$
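The recovery step analyzed next, restricting an estimate vector to its $k$ largest-magnitude coordinates and zeroing the rest, is simple to state in code (the function name is an illustrative choice):

```python
def topk_restrict(y, k):
    """Return z: keep the k largest-magnitude coordinates of y,
    set every other coordinate to zero."""
    idx = sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)[:k]
    keep = set(idx)
    return [y[i] if i in keep else 0 for i in range(len(y))]
```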
Now let $\tilde{\mathbf{x}}$ be the approximation to $\mathbf{x}$ obtained from the sketch. We can take the $k$ largest coordinates of $\tilde{\mathbf{x}}$ to form the vector $\mathbf{z}$ and output $\mathbf{z}$. We claim that $\|\mathbf{x} - \mathbf{z}\|_2$ is a good approximation to $\operatorname{err}_2^k(\mathbf{x})$. To see this we prove the following lemma.
**Lemma 5.** *Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ be such that $\|\mathbf{x} - \mathbf{y}\|_\infty \leq \frac{\epsilon}{\sqrt{k}} \operatorname{err}_2^k(\mathbf{x})$. Then*

$$\|\mathbf{x} - \mathbf{z}\|_2 \leq (1 + 5\epsilon) \operatorname{err}_2^k(\mathbf{x}),$$

*where $\mathbf{z}$ is the vector obtained as follows: $z_i = y_i$ for $i \in T$, where $T$ is the set of the $k$ largest (in absolute value) indices of $\mathbf{y}$, and $z_i = 0$ for $i \notin T$.*
Proof: Let $t = \frac{1}{\sqrt{k}} \operatorname{err}_2^k(\mathbf{x})$ to ease the notation. Let $S$ be the index set of the $k$ largest coordinates of $\mathbf{x}$. We have
$$\begin{aligned}
\|\mathbf{x}-\mathbf{z}\|_{2}^{2} &=\sum_{i \in T}\left|x_{i}-z_{i}\right|^{2}+\sum_{i \in S \setminus T}\left|x_{i}-z_{i}\right|^{2}+\sum_{i \in[n] \setminus(S \cup T)} x_{i}^{2} \\
&=\sum_{i \in T}\left|x_{i}-y_{i}\right|^{2}+\sum_{i \in S \setminus T} x_{i}^{2}+\sum_{i \in[n] \setminus(S \cup T)} x_{i}^{2}.
\end{aligned}$$
We treat each term separately. The first one is easy to bound.
$$\sum_{i \in T}\left|x_{i}-y_{i}\right|^{2} \leq \sum_{i \in T} \epsilon^{2} t^{2} \leq \epsilon^{2} k t^{2}.$$
The third term is common to $\|\mathbf{x}-\mathbf{z}\|_2^2$ and $\operatorname{err}_2^k(\mathbf{x})^2$. The second term is the one to worry about.
Note that $S$ is the set of the $k$ largest coordinates of $\mathbf{x}$ and $T$ is the set of the $k$ largest coordinates of $\mathbf{y}$. Thus $|S \setminus T| = |T \setminus S|$; say their common cardinality is $\ell$, and assume $\ell \geq 1$ (if $\ell = 0$ then $S = T$ and the second term vanishes). Since $\mathbf{x}$ and $\mathbf{y}$ are close in the $\ell_\infty$ norm (that is, close in each coordinate), the coordinates in $S \setminus T$ and $T \setminus S$ must have roughly the same value in $\mathbf{x}$. More precisely, let $a = \max_{i \in S \setminus T}|x_i|$ and $b = \min_{i \in T \setminus S}|x_i|$. We leave it as an exercise to the reader to argue that $a \leq b + 2\epsilon t$ since $\|\mathbf{x} - \mathbf{y}\|_\infty \leq \epsilon t$.
Thus,
$$\sum_{i \in S \setminus T} x_{i}^{2} \leq \ell a^{2} \leq \ell(b+2 \epsilon t)^{2} \leq \ell b^{2}+4 \epsilon \ell t b+4 \ell \epsilon^{2} t^{2}.$$
But we have
$$\sum_{i \in T \setminus S} x_{i}^{2} \geq \ell b^{2}.$$
Since $\ell b^2 \leq \sum_{i \in T \setminus S} x_i^2 \leq \operatorname{err}_2^k(\mathbf{x})^2 = k t^2$, we have $b \leq t\sqrt{k/\ell}$ and hence $4\epsilon \ell t b \leq 4\epsilon \sqrt{\ell k}\, t^2 \leq 4\epsilon k t^2$. Putting the three bounds together, $\|\mathbf{x} - \mathbf{z}\|_2^2 \leq (1 + 9\epsilon) \operatorname{err}_2^k(\mathbf{x})^2$ for $\epsilon \leq 1$. The lemma follows from the fact that, for sufficiently small $\epsilon$, $\sqrt{1 + 9\epsilon} \leq 1 + 5\epsilon$. $\Box$
Bibliographic Notes: Count sketch is by Charikar, Chen, and Farach-Colton [1]. CountMin sketch is due to Cormode and Muthukrishnan [2]; see the papers for several applications. Cormode's survey on sketching in [3] has a nice perspective. See [4] for a comparative analysis (theoretical and experimental) of algorithms for finding frequent items. A deterministic variant of CountMin called CR-Precis is interesting; see http://polylogblog.wordpress.com/2009/09/22/bite-sized-streams-cr-precis/ for a blog post with pointers and some comments. The applications are taken from the first chapter of the draft book by McGregor and Muthukrishnan.
You can read the notes from the next lecture of Chandra Chekuri's course, on $\ell_0$ sampling and priority sampling, here.
Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3-15, 2004. ↩︎
Graham Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58-75, 2005. ↩︎
Graham Cormode, Minos N. Garofalakis, Peter J. Haas, and Chris Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases, 4(1-3):1-294, 2012. ↩︎
Graham Cormode and Marios Hadjieleftheriou. Methods for finding frequent items in data streams. VLDB J., 19(1):3-20, 2010. ↩︎