Approximation Algorithms: Clustering and Facility Location

These are lecture notes from Chandra Chekuri's CS583 course on Approximation Algorithms: Chapter 9, Clustering and Facility Location. Chapter 10 (Introduction to Network Design) and Chapter 8 (Introduction to Local Search) are available separately.
Chapter 9
Clustering and Facility Location are two widely studied topics with a vast literature. Facility location problems have been well-studied in Operations Research and logistics. Clustering is ubiquitous with many applications in data analysis and machine learning. We confine attention to a few central problems and provide some pointers as needed to other topics. These problems have also played an important role in approximation algorithms and their study has led to a variety of interesting techniques. Research on these topics is still quite active.
For both classes of problems a key assumption that we will make is that we are working with points in some underlying metric space. Recall that a space $(V, d)$ where $d: V \times V \rightarrow \mathbb{R}_{+}$ is a metric space if the distance function $d$ satisfies the metric properties: (i) $d(u, v)=0$ iff $u=v$ (reflexivity), (ii) $d(u, v)=d(v, u)$ for all $u, v \in V$ (symmetry), and (iii) $d(u, v)+d(v, w) \geq d(u, w)$ for all $u, v, w \in V$ (triangle inequality). We will abuse the notation and use $d(A, B)$ for two sets $A, B \subseteq V$ to denote the quantity $\min_{p \in A, q \in B} d(p, q)$. Similarly $d(p, A)$ for $p \in V$ and $A \subseteq V$ will denote $\min_{q \in A} d(p, q)$.
Center based clustering: In center based clustering we are given $n$ points $P=\{p_{1}, p_{2}, \ldots, p_{n}\}$ in a metric space $(V, d)$, and an integer $k$. The goal is to cluster/partition $P$ into $k$ clusters $C_{1}, C_{2}, \ldots, C_{k}$ which are induced by choosing $k$ centers $c_{1}, c_{2}, \ldots, c_{k}$ from $V$. Each point $p_{i}$ is assigned to its nearest center from $c_{1}, c_{2}, \ldots, c_{k}$ and this induces a clustering. The nature of the clustering is controlled by an objective function that measures the quality of the clusters. Typically we phrase the problem as choosing $c_{1}, c_{2}, \ldots, c_{k}$ to minimize the clustering objective $\sum_{i=1}^{n} d(p_{i},\{c_{1}, \ldots, c_{k}\})^{q}$ for some $q$. The three most well-studied problems are special cases obtained by choosing an appropriate $q$.
  • $k$-Center is the problem when $q=\infty$, which can be equivalently phrased as $\min_{c_{1}, c_{2}, \ldots, c_{k} \in V} \max_{i=1}^{n} d(p_{i},\{c_{1}, \ldots, c_{k}\})$. In other words we want to minimize the maximum distance of the input points to the cluster centers.
  • $k$-Median is the problem when $q=1$: $\min_{c_{1}, c_{2}, \ldots, c_{k} \in V} \sum_{i=1}^{n} d(p_{i},\{c_{1}, \ldots, c_{k}\})$.
  • $k$-Means is the problem when $q=2$: $\min_{c_{1}, c_{2}, \ldots, c_{k} \in V} \sum_{i=1}^{n} d(p_{i},\{c_{1}, \ldots, c_{k}\})^{2}$.
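To make the three objectives concrete, here is a small illustrative sketch (not from the notes) that evaluates them for a given set of centers; the point representation and the metric `dist(p, q)` supplied by the caller are assumptions.

```python
import math

def d_point_set(p, A, dist):
    """d(p, A) = min over a in A of dist(p, a)."""
    return min(dist(p, a) for a in A)

def clustering_cost(points, centers, dist, q):
    """sum_i d(p_i, centers)^q; q = math.inf gives the k-Center (max) objective."""
    nearest = [d_point_set(p, centers, dist) for p in points]
    if q == math.inf:
        return max(nearest)                  # k-Center
    return sum(r ** q for r in nearest)      # q = 1: k-Median, q = 2: k-Means
```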
We will mainly focus on the discrete versions of the problems where $V=\{p_{1}, p_{2}, \ldots, p_{n}\}$, which means that the centers are to be chosen from the input points themselves. However, in many data analysis applications the points lie in $\mathbb{R}^{d}$ for some $d$ and the centers can be chosen anywhere in the ambient space. In fact this makes the problems more difficult in a certain sense since the center locations now come from an infinite set. One can argue that limiting centers to the input points does not lose more than a constant factor in the approximation; this may be reasonable from a first-cut point of view but perhaps not ideal from a practical point of view. In some settings there may be a requirement or advantage in choosing the cluster centers from the input set.
Facility Location: In facility location we typically have two finite sets $\mathcal{F}$ and $\mathcal{D}$, where $\mathcal{F}$ represents a set of locations where facilities can be located and $\mathcal{D}$ is a set of client/demand locations. We will assume that $V=\mathcal{F} \uplus \mathcal{D}$ and that there is a metric $d$ over $V$. There are several variants, but one of the simplest is the Uncapacitated Facility Location (UCFL) problem. In UCFL we are given $(\mathcal{F} \uplus \mathcal{D}, d)$ as well as auxiliary information which specifies the cost $f_{i}$ of opening a facility at location $i \in \mathcal{F}$. The goal is to open a subset of facilities in $\mathcal{F}$ to minimize the sum of the cost of the opened facilities and the total distance traveled by the clients to their nearest open facility. In other words we want to solve $\min_{\mathcal{F}^{\prime} \subseteq \mathcal{F}}\left(\sum_{i \in \mathcal{F}^{\prime}} f_{i}+\sum_{j \in \mathcal{D}} d(j, \mathcal{F}^{\prime})\right)$. The problem has close connections to the $k$-Median problem. The term "uncapacitated" refers to the fact that we do not limit the number of clients that can be assigned to an open facility. In several OR applications that motivate facility location (opening warehouses or distribution facilities), capacity constraints are likely to be important. For this reason there are several capacitated versions.

9.1 $k$-Center

Recall that in $k$-Center we are given $n$ points $p_{1}, \ldots, p_{n}$ in a metric space and an integer $k$, and we need to choose $k$ cluster centers $C=\{c_{1}, c_{2}, \ldots, c_{k}\}$ such that we minimize $\max_{i} d(p_{i}, C)$. An alternative view is that we wish to find the smallest radius $R$ such that there are $k$ balls of radius $R$ that together cover all the input points. Given a fixed $R$ this can be seen as a Set Cover problem. In fact there is an easy reduction from Dominating Set to $k$-Center establishing the NP-Hardness. Moreover, as we saw already in Chapter 1, $k$-Center has no $(2-\epsilon)$-approximation unless $P=NP$, via a reduction from Dominating Set. Here we will see two 2-approximation algorithms that are quite different and have their own advantages. The key lemma for their analysis is common and is stated below.
Lemma 9.1. Suppose there are $k+1$ points $q_{1}, q_{2}, \ldots, q_{k+1} \in P$ such that $d(q_{i}, q_{j})>2R$ for all $i \neq j$. Then $\text{OPT}>R$.
Proof. Suppose $\text{OPT} \leq R$. Then there are $k$ centers $C=\{c_{1}, c_{2}, \ldots, c_{k}\}$ which induce $k$ clusters $C_{1}, \ldots, C_{k}$ such that for each cluster $C_{h}$ and each $p \in C_{h}$ we have $d(p, c_{h}) \leq R$. By the pigeonhole principle some $q_{i}, q_{j}$, $i \neq j$, are in the same cluster $C_{h}$, but this implies that $d(q_{i}, q_{j}) \leq d(q_{i}, c_{h})+d(q_{j}, c_{h}) \leq 2R$, which contradicts the assumption of the lemma.
Note that the lemma holds even if the centers can be chosen from outside the given point set $P$.

9.1.1 Gonzalez's algorithm and nets in metric spaces

The algorithm starts with an empty set of centers, and in each iteration picks a new center which is the farthest point from the set of centers chosen so far.
Gonzalez-$k$-Center$(P, k)$:
1. $C \leftarrow \emptyset$
2. For $i=1$ to $k$ do
   A. Let $c_{i}=\arg\max_{c \in P} d(c, C)$
   B. $C \leftarrow C \cup\{c_{i}\}$.
3. Output $C$
Note that $c_{1}$ is chosen arbitrarily.
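A minimal Python sketch of the algorithm above; the list-of-points representation and the metric `dist(p, q)` are assumptions made for illustration. Running it with $k=n$ produces the permutation discussed in the remark below.

```python
def gonzalez_k_center(points, k, dist):
    """points: list; dist(p, q): metric. Returns k centers and the resulting radius."""
    centers = [points[0]]                    # c_1 is chosen arbitrarily
    d_to_C = [dist(p, centers[0]) for p in points]   # distance of each point to the center set
    for _ in range(1, k):
        idx = max(range(len(points)), key=lambda t: d_to_C[t])   # farthest point from C
        c = points[idx]
        centers.append(c)
        d_to_C = [min(d_to_C[t], dist(points[t], c)) for t in range(len(points))]
    return centers, max(d_to_C)              # max(d_to_C) <= 2 * OPT by Theorem 9.1
```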
Theorem 9.1. Let $R=\max_{p \in P} d(p, C)$ where $C$ is the set of centers chosen by Gonzalez's algorithm. Then $R \leq 2 R^{*}$ where $R^{*}$ is the optimum $k$-Center radius for $P$.
Proof. Suppose not. Then there is a point $p \in P$ such that $d(p, C)>2 R^{*}$, which implies that $p \notin C$. Since the algorithm chose the farthest point in each iteration and could have chosen $p$ in each of the $k$ iterations but did not, we have the property that $d(c_{i},\{c_{1}, \ldots, c_{i-1}\})>2 R^{*}$ for $i=2$ to $k$. This implies that the distance between each pair of points in the set $\{c_{1}, c_{2}, \ldots, c_{k}, p\}$ is more than $2 R^{*}$. By Lemma 9.1, the optimum radius must be larger than $R^{*}$, a contradiction.
Exercise 9.1. Construct an instance to demonstrate that the algorithm's worst-case performance is 2.
Remark 9.1. Gonzalez's algorithm can be extended in a simple way to create a permutation of the points $P$; we simply run the algorithm with $k=n$. It is easy to see from the proof above that for any $k \in[n]$, the prefix of the permutation consisting of the first $k$ points provides a 2-approximation for that choice of $k$. Thus, one can compute the permutation once and reuse it for all $k$.

9.1.2 Hochbaum-Shmoys bottleneck approach

A second algorithmic approach for $k$-Center is due to Hochbaum and Shmoys and has played an influential role in variants of this problem.
For a point $v$ and radius $r$ let $B(v, r)=\{u \mid d(u, v) \leq r\}$ denote the ball of radius $r$ around $v$.
HS-$k$-Center$(P, k)$:
1. Guess $R^{*}$, the optimum radius
2. $C \leftarrow \emptyset$, $S \leftarrow P$
3. While $(S \neq \emptyset)$ do
   A. Let $c$ be an arbitrary point in $S$
   B. $C \leftarrow C \cup\{c\}$
   C. $S \leftarrow S \setminus B(c, 2 R^{*})$
4. Output $C$
Theorem 9.2. Let $C$ be the output of the HS algorithm for a guess $R$. Then for all $p \in P$, $d(p, C) \leq 2R$, and moreover if $R \geq R^{*}$ then $|C| \leq k$.
Proof. The first property is easy to see since we only remove a point $p$ from $S$ if we add a center $c$ to $C$ such that $p \in B(c, 2R)$. Let $c_{1}, c_{2}, \ldots, c_{h}$ be the centers chosen by the algorithm. We observe that $d(c_{i},\{c_{1}, \ldots, c_{i-1}\})>2R$. Thus, if the algorithm outputs $h$ points then the pairwise distance between any two of them is more than $2R$. By Lemma 9.1, if $h \geq k+1$ the optimum radius is $>R$. Hence, if the guess $R \geq R^{*}$ the algorithm outputs at most $k$ centers.
The guessing of $R^{*}$ can be implemented by binary search in various ways. We omit these routine details.
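Since the optimum radius is one of the $O(n^2)$ pairwise distances, one standard implementation of the guessing is to binary search over that sorted list. The sketch below is illustrative (one reasonable implementation under that assumption, not the notes' code); by Theorem 9.2 every candidate radius at least $R^*$ succeeds, so the search returns a radius at most $R^*$ and hence a solution of radius at most $2R^*$.

```python
def hs_cover(points, R, dist):
    """Greedily pick uncovered points as centers and remove their 2R-balls."""
    centers = []
    remaining = list(points)
    while remaining:
        c = remaining[0]                     # an arbitrary point of S
        centers.append(c)
        remaining = [p for p in remaining if dist(p, c) > 2 * R]
    return centers

def hs_k_center(points, k, dist):
    """Binary search the guess R over the sorted pairwise distances."""
    cand = sorted({dist(p, q) for p in points for q in points})
    lo, hi, best = 0, len(cand) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        centers = hs_cover(points, cand[mid], dist)
        if len(centers) <= k:                # any cand[mid] >= R* succeeds (Theorem 9.2)
            best, hi = centers, mid - 1
        else:
            lo = mid + 1
    return best                              # every point is within 2 * OPT of some center
```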
Exercise 9.2. Describe an example where the algorithm uses exactly $k$ centers even with guess $R^{*}$. Describe an example where the algorithm outputs fewer than $k$ centers with a guess of $R^{*}$.
The $k$-Center problem is natural in geometric settings and one can see from the proof that the 2-approximation for the two algorithms holds even when centers are allowed to be chosen outside the input point set. A surprising result of Feder and Greene [1] shows that even in two dimensions (the Euclidean plane) one cannot improve the factor of 2 unless $P=NP$.
The $k$-Supplier problem is closely related to $k$-Center and is motivated by facility location considerations. Here we are given $\mathcal{F} \cup \mathcal{D}$ in a metric space. We need to choose $k$ centers $C \subseteq \mathcal{F}$ to minimize $\max_{p \in \mathcal{D}} d(p, C)$. Note that we do not have to cover the facilities. Hochbaum and Shmoys gave a variant of their algorithm that obtains a 3-approximation for $k$-Supplier and moreover showed that unless $P=NP$ one cannot improve upon 3 [2]. Interestingly, in Euclidean spaces 3 is not tight [3]. Several generalizations of $k$-Center which constrain the choice of centers have been considered; see [4] for a general framework that also considers outliers.
One can also consider a weighted version of $k$-Center, relabeled as the priority version in subsequent work. We refer to the work of Plesnik [5] and a recent paper [6] on this variant, which has found applications in fair clustering.
We finally mention that $k$-Center clustering has nice connections to the notion of $r$-nets in metric spaces. Given a set of points $P$ in a metric space and a radius $r$, an $r$-net is a set of centers $C$ such that (i) for all $p \in P$ we have $d(p, C) \leq r$ (that is, the points are covered by balls of radius $r$) and (ii) for any distinct $c, c^{\prime} \in C$ the balls $B(c, r/2)$ and $B(c^{\prime}, r/2)$ are disjoint (the packing property, or the property that no two centers are too close). $r$-nets provide a concise summary of distances in a metric space at scale $r$. One can use $r$-nets to obtain nearest-neighbor data structures and other applications, especially in low-dimensional settings. We refer the reader to [7],[8].
LP Relaxation: The two $k$-Center algorithms we described are combinatorial. One can also consider an LP relaxation. Since it is a bottleneck problem, writing an LP relaxation involves issues similar to what we saw with unrelated machine scheduling. Given a guess $R$ we can write an LP to check whether a radius $R$ is feasible and then find the smallest $R$ for which it is feasible. The feasibility LP can be written as follows. Let $x_{i}$ be an indicator for whether we open a center at point $p_{i}$.
$$\begin{aligned} \sum_{i=1}^{n} x_{i} &= k \\ \sum_{p_{i} \in B(p_{j}, R)} x_{i} & \geq 1 \quad \forall p_{j} \in P \\ x_{i} & \geq 0 \quad \forall p_{i} \in P \end{aligned}$$
Exercise 9.3. Prove that if $R$ is feasible for the preceding LP then one can obtain a solution with $k$ centers with maximum radius $2R$.
Exercise 9.4. Generalize the LP for the $k$-Supplier problem and prove that one can obtain a 3-approximation with respect to the lower bound provided via the LP approach.
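For concreteness, here is a minimal sketch of the $k$-Center feasibility LP above, checked with `scipy.optimize.linprog` and a zero objective; the matrix-based setup is an assumption made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def kcenter_lp_feasible(dist, k, R):
    """dist: symmetric n x n metric matrix. Returns a fractional opening vector x or None."""
    n = dist.shape[0]
    # coverage constraint for each point j: sum of x_i over i with d(p_i, p_j) <= R is >= 1
    A_ub = -(dist <= R).T.astype(float)      # row j, column i
    b_ub = -np.ones(n)
    A_eq = np.ones((1, n))                   # sum_i x_i = k
    b_eq = np.array([float(k)])
    res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, 1), method="highs")
    return res.x if res.success else None
```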

9.2 Uncapacitated Facility Location

We now discuss UCFL. One can obtain a constant factor approximation for UCFL via several techniques: LP rounding, primal-dual, local search and greedy. The best known approximation bound is 1.488 due to Li [9], while it is known that one cannot obtain a ratio better than 1.463 [10]. We will not describe these more involved algorithms and instead focus on the high-level approaches that yield some constant factor approximation.
It is common to use $i$ to denote a facility in $\mathcal{F}$ and $j$ to denote a demand/client.

9.2.1 LP Rounding

The first constant factor approximation for UCFL was via LP rounding by Aardal, Shmoys and Tardos, using a filtering technique of Lin and Vitter. We start with the LP relaxation. We use a variable $y_{i}$ for $i \in \mathcal{F}$ to indicate whether $i$ is opened or not. We use a variable $x_{i, j}$ to indicate whether $j$ is connected to $i$ (or assigned to $i$). One set of constraints is natural here: each client has to be assigned/connected to a facility. The other constraint requires that $j$ is assigned to $i$ only if $i$ is open.
$$\begin{aligned} \min \sum_{i \in \mathcal{F}} f_{i} y_{i}+\sum_{j \in \mathcal{D}} \sum_{i \in \mathcal{F}} d(i, j) x_{i, j} & \\ \sum_{i} x_{i, j} &=1 \quad \forall j \in \mathcal{D} \\ x_{i, j} & \leq y_{i} \quad \forall i \in \mathcal{F}, j \in \mathcal{D} \\ x, y & \geq 0 \end{aligned}$$
Given a feasible solution $x, y$ to the LP, the question is how to round. We note that the LP relaxation does not "know" whether $d$ is a metric or not. In fact when $d$ is arbitrary (but non-negative) we obtain the non-metric facility location problem, which is as hard as the Set Cover problem but not much harder; one can obtain an $O(\log n)$-approximation. However, we can obtain a constant factor when $d$ is a metric, and the rounding will exploit this property.
Given the fractional solution $x, y$, for each $j$ we define $\alpha_{j}$ to be the distance cost paid for by the LP: therefore $\alpha_{j}=\sum_{i \in \mathcal{F}} d(i, j) x_{i, j}$. Note that the LP cost is $\sum_{i} f_{i} y_{i}+\sum_{j} \alpha_{j}$.
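As a concrete illustration, the following sketch builds and solves the LP relaxation with `scipy.optimize.linprog` and computes the values $\alpha_j$; the array-based representation (`dist` as an $|\mathcal{F}| \times |\mathcal{D}|$ matrix, `fcost` as the vector of opening costs) is an assumption made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def solve_ucfl_lp(fcost, dist):
    """fcost: length-|F| array of opening costs; dist: |F| x |D| array of distances."""
    nF, nD = dist.shape
    nvar = nF + nF * nD                      # variables: y_0..y_{nF-1}, then x_{i,j} row-major
    c = np.concatenate([fcost, dist.reshape(-1)])
    A_eq = np.zeros((nD, nvar))              # sum_i x_{i,j} = 1 for every client j
    for j in range(nD):
        for i in range(nF):
            A_eq[j, nF + i * nD + j] = 1.0
    b_eq = np.ones(nD)
    A_ub = np.zeros((nF * nD, nvar))         # x_{i,j} - y_i <= 0
    for i in range(nF):
        for j in range(nD):
            r = i * nD + j
            A_ub[r, nF + r] = 1.0
            A_ub[r, i] = -1.0
    b_ub = np.zeros(nF * nD)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    y = res.x[:nF]
    x = res.x[nF:].reshape(nF, nD)
    alpha = (dist * x).sum(axis=0)           # alpha_j = sum_i d(i, j) x_{i,j}
    return x, y, alpha
```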
Lemma 9.2. For each $j$ and each $\delta \in(0,1)$ there is a total facility value of at least $(1-\delta)$ in $B(j, \alpha_{j} / \delta)$. That is, $\sum_{i \in B(j, \alpha_{j} / \delta)} y_{i} \geq 1-\delta$. In particular $\sum_{i \in B(j, 2 \alpha_{j})} y_{i} \geq 1 / 2$.
Proof. This essentially follows from Markov's inequality or averaging. Note that $\alpha_{j}=\sum_{i} d(i, j) x_{i, j}$ and $\sum_{i} x_{i, j}=1$. Suppose $\sum_{i \in B(j, \alpha_{j} / \delta)} y_{i}<1-\delta$. Since $x_{i, j} \leq y_{i}$ for all $i, j$, more than $\delta$ of the assignment mass of $j$ goes to facilities at distance greater than $\alpha_{j} / \delta$, and hence $\alpha_{j}>\delta \cdot \alpha_{j} / \delta=\alpha_{j}$, which is impossible.
We say that two clients $j$ and $j^{\prime}$ intersect if there is some $i \in \mathcal{F}$ such that $i \in B(j, 2 \alpha_{j}) \cap B(j^{\prime}, 2 \alpha_{j^{\prime}})$. The rounding algorithm is described below.
UCFL-primal-rounding:
1. Solve the LP and let $(x, y)$ be a feasible LP solution
2. For each $j$ compute $\alpha_{j}=\sum_{i} d(i, j) x_{i, j}$
3. Renumber clients such that $\alpha_{1} \leq \alpha_{2} \leq \ldots \leq \alpha_{h}$ where $h$ is the number of clients
4. For $j=1$ to $h$ do
   A. If $j$ is already assigned, continue
   B. Open the cheapest facility $i$ in $B(j, 2 \alpha_{j})$ and assign $j$ to $i$
   C. For each remaining client $j^{\prime}>j$ that intersects with $j$, assign $j^{\prime}$ to $i$
5. Output the list of open facilities and the client assignment
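Here is a hedged sketch of the rounding procedure, assuming the fractional solution and the values $\alpha_j$ are given as numpy arrays (for instance from the `solve_ucfl_lp` sketch above); it follows the pseudocode step by step and is not optimized.

```python
import numpy as np

def ucfl_round(fcost, dist, x, y, alpha):
    """Round a fractional UCFL solution; fcost, dist, alpha as numpy arrays."""
    nF, nD = dist.shape
    order = np.argsort(alpha)                # consider clients in increasing order of alpha_j
    assigned = {}                            # client -> open facility
    opened = []
    for j in order:
        if j in assigned:
            continue
        ball_j = np.where(dist[:, j] <= 2 * alpha[j])[0]   # nonempty by Lemma 9.2
        i = ball_j[np.argmin(fcost[ball_j])]               # cheapest facility in B(j, 2 alpha_j)
        opened.append(i)
        assigned[j] = i
        for jp in order:                     # assign every intersecting unassigned client to i
            if jp in assigned:
                continue
            if np.any((dist[:, jp] <= 2 * alpha[jp])[ball_j]):
                assigned[jp] = i
    return opened, assigned
```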
It is not hard to see that every client is assigned to an open facility. The main issue is to bound the total cost. Let $F$ be the total facility opening cost, and let $C$ be the total connection cost. We will bound these separately.
Lemma 9.3. $F \leq 2 \sum_{i} f_{i} y_{i}$.
Proof. Note that a client $j$ opens a new facility only if it has not been assigned when it is considered by the algorithm. Let $j_{1}, j_{2}, \ldots, j_{k}$ be the clients that open facilities. Let $A_{j}=\mathcal{F} \cap B(j, 2 \alpha_{j})$ be the set of facilities in the ball of radius $2 \alpha_{j}$ around $j$. From the algorithm and the definition of intersection of clients, we see that the sets $A_{j_{1}}, A_{j_{2}}, \ldots, A_{j_{k}}$ are pairwise disjoint. The algorithm opens the cheapest facility in $A_{j_{\ell}}$ for $1 \leq \ell \leq k$. Note that $\sum_{i \in A_{j_{\ell}}} y_{i} \geq 1 / 2$ by Lemma 9.2. Hence the cost of the cheapest facility in $A_{j_{\ell}}$ is at most $2 \sum_{i \in A_{j_{\ell}}} f_{i} y_{i}$ (why?). By the disjointness of the sets $A_{j_{1}}, \ldots, A_{j_{k}}$ we see that the total cost of the facilities opened is at most $2 \sum_{i} f_{i} y_{i}$.
Lemma 9.4. $C \leq 6 \sum_{j} \alpha_{j}$.
Proof. Consider a client $j$ that opens a facility $i$ in $B(j, 2 \alpha_{j})$. In this case $j$ is assigned to $i$ and $d(i, j) \leq 2 \alpha_{j}$. Now consider a client $j$ that does not open a facility. This implies that there was a client $j^{\prime}<j$ that opened a facility $i^{\prime}$, that $j^{\prime}$ and $j$ intersect, and that $j$ was assigned to $i^{\prime}$. What is $d(i^{\prime}, j)$? We claim that $d(i^{\prime}, j) \leq 6 \alpha_{j}$. To see this we note that $j^{\prime}$ and $j$ intersect because there is some facility $i \in B(j^{\prime}, 2 \alpha_{j^{\prime}}) \cap B(j, 2 \alpha_{j})$. By considering the path $j \rightarrow i \rightarrow j^{\prime} \rightarrow i^{\prime}$, via the triangle inequality, $d(i^{\prime}, j) \leq 2 \alpha_{j}+2 \alpha_{j^{\prime}}+2 \alpha_{j^{\prime}} \leq 6 \alpha_{j}$ since $\alpha_{j^{\prime}} \leq \alpha_{j}$. Thus the distance traveled by each client $j$ to its assigned facility is at most $6 \alpha_{j}$. The lemma follows.
The two preceding lemmas give us the following which implies that the algorithm yields a 6-approximation.
Theorem 9.3. $C+F \leq 6\, \text{OPT}_{LP}$.
It should be clear to the reader that the algorithm and analysis are not optimized for the approximation ratio. The goal here was simply to outline the basic scheme that led to the first constant factor approximation.

9.2.2 Primal-Dual

Jain and Vazirani developed an elegant and influential primal-dual algorithm for UCFL [11]. It was influential since it allowed new algorithms for $k$-Median and several generalizations of UCFL in a clean way. Moreover, the primal-dual algorithm is simple and efficient to implement. On the other hand, we should mention that one advantage of having an LP solution is that it gives an explicit lower bound on the optimum value, while a primal-dual method yields a lower bound via a feasible dual which may not be optimal. We need some background to describe the primal-dual method in approximation.
Complementary slackness: To understand primal-dual we need some basic background on complementary slackness. Suppose we have a primal LP (P) of the form $\min c^{T} x$ s.t. $A x \geq b, x \geq 0$, which we intentionally wrote in this standard form as a covering LP. Its dual (D) is a packing LP: $\max b^{T} y$ s.t. $A^{T} y \leq c, y \geq 0$. We will assume that both primal and dual are feasible and hence the optimum values are finite, and via strong duality we know that the optimum values are the same.
Definition 9.4. A feasible solution $x$ to (P) and a feasible solution $y$ to (D) satisfy the primal complementary slackness condition with respect to each other if the following is true: for each $i$, $x_{i}=0$ or the corresponding dual constraint is tight, that is $\sum_{j} A_{j, i} y_{j}=c_{i}$. They satisfy the dual complementary slackness condition if the following is true: for each $j$, $y_{j}=0$ or the corresponding primal constraint is tight, that is $\sum_{i} A_{j, i} x_{i}=b_{j}$.
One of the consequences of the duality theory of LP is the following.
Theorem 9.5. Suppose (P) and (D) are a primal and dual pair of LPs that both have finite optima. A feasible solution $x$ to (P) and a feasible solution $y$ to (D) satisfy the primal and dual complementary slackness properties with respect to each other if and only if they are both optimum solutions to the respective LPs.
We illustrate the use of complementary slackness in the context of approximation via Vertex Cover. Recall the primal covering LP; we also write its dual below.
$$\begin{aligned} \min \sum_{v \in V} w_{v} x_{v} & \\ x_{u}+x_{v} &\geq 1 \quad \forall e=(u, v) \in E \\ x_{v} &\geq 0 \quad \forall v \in V \end{aligned}$$

$$\begin{aligned} \max \sum_{e \in E} y_{e} & \\ \sum_{e \in \delta(v)} y_{e} & \leq w_{v} \quad \forall v \in V \\ y_{e} & \geq 0 \quad \forall e \in E \end{aligned}$$
Recall that we described a simple rounding algorithm: given a feasible primal solution $x$, we let $S=\{v \mid x_{v} \geq 1 / 2\}$ and showed that (i) $S$ is a vertex cover for the given graph $G$ and (ii) $w(S) \leq 2 \sum_{v} w_{v} x_{v}$. Now suppose we have an optimum solution $x^{*}$ to the primal rather than an arbitrary feasible solution. We can prove an interesting and stronger statement via complementary slackness.
Lemma 9.5. Let $x^{*}$ be an optimum solution to the primal covering LP. Then $S=\{v \mid x_{v}^{*}>0\}$ is a feasible vertex cover for $G$ and moreover $w(S) \leq 2 \sum_{v} w_{v} x_{v}^{*}$.
Proof. It is easy to see that $S$ is a vertex cover via the same argument that we have seen before. How do we bound the cost, now that we may be rounding $x_{v}^{*}$ to 1 even though $x_{v}^{*}$ may be tiny? Let $y^{*}$ be any optimum solution to the dual; one exists (why?). Via strong duality we have $\sum_{v} w_{v} x_{v}^{*}=\sum_{e} y_{e}^{*}$. Via primal complementary slackness we have the property that if $x_{v}^{*}>0$ then $\sum_{e \in \delta(v)} y_{e}^{*}=w_{v}$. Hence
$$w(S)=\sum_{v: x_{v}^{*}>0} w_{v}=\sum_{v: x_{v}^{*}>0} \sum_{e \in \delta(v)} y_{e}^{*}.$$
Interchanging the order of summation we obtain that
$$w(S)=\sum_{v: x_{v}^{*}>0} \sum_{e \in \delta(v)} y_{e}^{*} \leq 2 \sum_{e \in E} y_{e}^{*}=2 \sum_{v} w_{v} x_{v}^{*}$$
where we use the fact that an edge $e$ has only two endpoints in the inequality.
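A small code illustration of Lemma 9.5: solve the Vertex Cover LP with `scipy.optimize.linprog` and output the support of an optimum solution. The edge-list graph representation and the thresholding tolerance are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def vc_lp_support(n, edges, w):
    """Solve the VC LP and return S = {v : x*_v > 0} for an optimum solution x*."""
    A_ub = np.zeros((len(edges), n))
    for r, (u, v) in enumerate(edges):
        A_ub[r, u] = -1.0                    # x_u + x_v >= 1  <=>  -x_u - x_v <= -1
        A_ub[r, v] = -1.0
    b_ub = -np.ones(len(edges))
    res = linprog(np.asarray(w, dtype=float), A_ub=A_ub, b_ub=b_ub,
                  bounds=(0, 1), method="highs")
    return [v for v in range(n) if res.x[v] > 1e-9]   # w(S) <= 2 OPT_LP by Lemma 9.5
```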
Primal-dual for approximating Vertex Cover: We will first study a primal-dual approximation algorithm for the Vertex Cover problem; this algorithm, due to Bar-Yehuda and Even [12], can perhaps be considered the first primal-dual algorithm in approximation. Primal-dual is a classical method in optimization and is applicable in both continuous and discrete settings. The basic idea, in the context of LPs (the method applies more generally), is to obtain a solution $x$ to a primal LP and a solution $y$ to the dual LP together and certify the optimality of each solution via complementary slackness. It is beyond the scope of these notes to give a proper treatment. One could argue that understanding the method in the setting of approximation is easier than in the classical setting where one seeks exact algorithms, since our goals are more modest.
Typically one starts with one of $x, y$ being feasible and the other infeasible, and evolves them over time. In discrete optimization, this method has been successfully applied to LPs that are known to define integral polytopes. Examples include shortest paths, matchings, and others. In approximation the LP relaxations are typically not integral. In such a setting the goal is to produce a primal and dual pair $x, y$ where $x$ is an integral feasible solution and $y$ is a fractional feasible solution. The goal is to approximately bound the value of $x$ via the dual $y$, and for this purpose we will enforce only the primal complementary slackness condition for the pair $(x, y)$. To make the algorithm and analysis manageable, the primal-dual algorithm is typically designed in a simple but clever fashion; there have been several surprisingly strong and powerful approximation results via this approach.
We illustrate this in the context of Vertex Cover first. It is useful to interpret the dual as we did already in the context of the dual-fitting technique for Set Cover. We think of the edges $e \in E$ as agents that wish to be covered by the vertices in $G$ at minimum cost. The dual variable $y_{e}$ can be thought of as the amount that edge $e$ is willing to pay to be covered. The dual packing constraint $\sum_{e \in \delta(v)} y_{e} \leq w_{v}$ says that for any vertex $v$, the total payment of all edges incident to $v$ cannot exceed its weight. This can be understood game theoretically as follows: the set of edges $\delta(v)$ can together pay $w_{v}$ and get covered, and hence we cannot charge them more. The dual objective is to maximize the total payment that can be extracted from the edges subject to these natural and simple constraints. With this interpretation in mind we wish to produce a feasible dual (payments) and a corresponding feasible integral primal (vertex cover). The basic scheme is to start with an infeasible primal $x=0$ and a feasible dual $y=0$ and increase $y$ while maintaining feasibility; during the process we maintain primal complementary slackness, which means that if a dual constraint for a vertex $v$ becomes tight we set $x_{v}=1$. Note that we are producing an integer primal solution in this process. How should we increase the $y$ values? We will do it in the naive fashion, which is to uniformly increase $y_{e}$ for all $e$ that are not already part of a tight constraint (and hence not covered yet).
VC-Primal-Dual$(G=(V, E), w: V \rightarrow \mathbb{R}_{+})$:
1. $x \leftarrow 0$, $y \leftarrow 0$ // initialization: primal infeasible, dual feasible
2. $U \leftarrow E$ // uncovered edges that are active
3. While $(U \neq \emptyset)$ do
   A. Increase $y_{e}$ uniformly for each $e \in U$ until the constraint $\sum_{e \in \delta(a)} y_{e}=w_{a}$ becomes tight for some vertex $a$
   B. Set $x_{a}=1$ // maintain primal complementary slackness
   C. $U \leftarrow U \setminus \delta(a)$ // remove all edges covered by $a$
4. Output integer solution $x$ and dual certificate $y$
Remark 9.2. Note that when checking whether a vertex $v$ is tight we count the payments from all $e \in \delta(v)$, even those that are no longer active.
Lemma 9.6. At the end of the algorithm $x$ is a feasible vertex cover for $G$ and $\sum_{v} w_{v} x_{v} \leq 2\, \text{OPT}$.
Proof. By induction on the iterations one can prove that (i) $y$ remains dual feasible throughout, (ii) $ab \in U$ at the start of an iteration iff $e=ab$ is not covered yet, and (iii) each iteration adds at least one more vertex, hence the algorithm terminates in $\leq n$ iterations and outputs a feasible vertex cover. The main issue is the cost of $x$.
For this we note that the algorithm maintains primal complementary slackness: either $x_{v}=0$, or $x_{v}=1$ and $\sum_{e \in \delta(v)} y_{e}=w_{v}$. Thus, we have
$$\sum_{v} w_{v} x_{v}=\sum_{v: x_{v}>0} \sum_{e \in \delta(v)} y_{e} \leq 2 \sum_{e} y_{e} \leq 2\, \text{OPT}_{LP}.$$
We used the fact that $e$ has at most two endpoints in the first inequality and the fact that $y$ is dual feasible in the second inequality. In terms of payments, this says that edge $uv$'s payment of $y_{uv}$ may be charged twice, once toward opening $u$ and once toward opening $v$, while the dual collects it only once.
As the reader can see, the algorithm is very simple to implement.
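Here is a minimal implementation sketch in Python, assuming the graph is given as an edge list with vertex weights; the uniform raise is computed in one shot per iteration (the largest increase before some vertex becomes tight) rather than continuously.

```python
def vc_primal_dual(n, edges, w):
    """n vertices 0..n-1, edges: list of (u, v) tuples, w: list of vertex weights."""
    y = {(u, v): 0.0 for (u, v) in edges}    # dual values, start at the feasible y = 0
    slack = [float(wv) for wv in w]          # w_v minus total payment into v
    in_cover = [False] * n
    uncovered = set(y)                       # active (not yet covered) edges
    while uncovered:
        deg = [0] * n                        # degree in the uncovered subgraph
        for (u, v) in uncovered:
            deg[u] += 1
            deg[v] += 1
        # largest uniform raise of y_e on active edges before some vertex goes tight
        delta = min(slack[v] / deg[v] for v in range(n) if deg[v] > 0)
        for (u, v) in uncovered:
            y[(u, v)] += delta
            slack[u] -= delta
            slack[v] -= delta
        for v in range(n):
            if deg[v] > 0 and slack[v] <= 1e-12:
                in_cover[v] = True           # primal complementary slackness: tight => pick v
        uncovered = {(u, v) for (u, v) in uncovered
                     if not in_cover[u] and not in_cover[v]}
    cover = [v for v in range(n) if in_cover[v]]
    return cover, y                          # w(cover) <= 2 * sum(y.values()) by Lemma 9.6
```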
Exercise 9.5. Describe an example to show that the primal-dual algorithm's worst-case performance is 2. Describe an example to show that the dual value constructed by the algorithm is $\simeq \text{OPT}/2$. Are these two parts the same?
Remark 9.3. The algorithm generalizes to give an $f$-approximation for Set Cover where $f$ is the maximum frequency of any element. There are examples showing that the performance of this algorithm in the worst case can indeed be a factor of $f$. We saw earlier that the integrality gap of the LP is at most $1+\ln d$ where $d$ is the maximum set size. There is no contradiction here since the specific primal-dual algorithm that we developed need not achieve the tight integrality gap.
Primal-Dual for UCFL: Now we consider a primal-dual algorithm for UCFL. Recall the primal LP that we discussed previously; the dual LP is written below. The dual has two types of variables. For each client $j$ there is a variable $\alpha_{j}$ corresponding to the primal constraint that client $j$ needs to be connected to a facility. For each facility $i$ and client $j$ there is a variable $\beta_{i, j}$ corresponding to the constraint that $y_{i} \geq x_{i, j}$.
$$\begin{aligned} \max \sum_{j \in \mathcal{D}} \alpha_{j} & \\ \sum_{j} \beta_{i, j} & \leq f_{i} \quad \forall i \in \mathcal{F} \\ \alpha_{j}-\beta_{i, j} & \leq d(i, j) \quad \forall i \in \mathcal{F}, j \in \mathcal{D} \\ \alpha, \beta & \geq 0 \end{aligned}$$
It is important to interpret the dual variables. There is a similarity to Set Cover since Non-Metric Facility Location is a generalization, and the LP formulation does not distinguish between metric and non-metric facility location; it is only in the rounding that we can take advantage of metric properties. The variable $\alpha_{j}$ can be interpreted as the amount of payment client $j$ is willing to make. This comes in two parts: the payment to travel to a facility, which it cannot share with any other clients, and the payment it is willing to make to open a facility, which it can share with other clients. The variable $\beta_{i, j}$ corresponds to the amount client $j$ is willing to pay to facility $i$ to open it. (In Set Cover there is no need to distinguish between $\alpha_{j}$ and $\beta_{i, j}$ since there are no distances, or they can be assumed to be 0 or $\infty$.) The first set of constraints in the dual says that for any facility $i$, the total payment from all the clients, $\sum_{j} \beta_{i, j}$, cannot exceed the cost $f_{i}$. The second set of constraints specifies that $\alpha_{j}-\beta_{i, j}$ is at most $d(i, j)$. One way to understand this is that if $\alpha_{j}<d(i, j)$ then client $j$ will not even be able to reach $i$ and hence will not contribute to opening $i$. Via this interpretation it is convenient to assume that $\beta_{i, j}=\max\{0, \alpha_{j}-d(i, j)\}$.
The primal-dual algorithm for UCFL has a growth stage that is similar to what we saw for Vertex Cover. We increase $\alpha_{j}$ for each "uncovered" client $j$ uniformly. A facility $i$ will receive payment $\beta_{i, j}=\max\{0, \alpha_{j}-d(i, j)\}$ from $j$. To maintain dual feasibility, as soon as the constraint for facility $i$ becomes tight we open facility $i$ (in the primal we set $y_{i}=1$); note that any client $j$ such that $\beta_{i, j}>0$ cannot increase its $\alpha_{j}$ any further and hence will stop growing since it is connected to an open facility. This process is very similar to that in Set Cover. The main issue is that we will get a weak approximation (a factor of $|\mathcal{F}|$) since a client can contribute payments to a large number of facilities. Note that the process so far has not taken advantage of the fact that we have a metric facility location problem. Therefore, in the second phase we will close some facilities, which means that a client may need to get connected to a facility that it did not contribute to; however, we will use the metric property to show that a client does not need to travel too far to reach a facility that remains open.
With the above discussion in place, we describe the two-phase primal-dual algorithm below. The algorithm also creates a bipartite graph $G$ with vertex set $\mathcal{F} \cup \mathcal{D}$, which initially has no edges.
JV-primal-dual$((\mathcal{F} \cup \mathcal{D}, d), f_{i}, i \in \mathcal{F})$:
1. $\alpha_{j} \leftarrow 0$ for all $j \in \mathcal{D}$, $\beta_{i, j} \leftarrow 0$ for all $i, j$ // initialize dual values to 0
2. $A \leftarrow \mathcal{D}$ // active clients that are unconnected
3. $O \leftarrow \emptyset$ // temporarily opened facilities
4. While $(A \neq \emptyset)$ do // growth phase
   A. Increase $\alpha_{j}$ uniformly for each $j \in A$ while maintaining the invariant $\beta_{i, j}=\max\{0, \alpha_{j}-d(i, j)\}$, until one of the following conditions holds: (i) $\alpha_{j}=d(i, j)$ for some $j \in A$ and $i \in O$, or (ii) $\sum_{j} \beta_{i, j}=f_{i}$ for some $i \in \mathcal{F} \setminus O$
   B. If (i) happens then add edge $(i, j)$ to $G$ and set $A \leftarrow A-\{j\}$ // $j$ is connected to the already open facility $i \in O$
   C. Else if (ii) happens then
      1. $O \leftarrow O \cup\{i\}$ // temporarily open $i$, which became tight
      2. For each $j \in \mathcal{D}$ such that $\beta_{i, j}>0$ add edge $(i, j)$ to $G$ // note: clients not in $A$ may also get edges to $i$
      3. $A \leftarrow A \setminus\{j: \beta_{i, j}>0\}$ // make active clients connected to $i$ inactive
5. Create graph $H$ with vertex set $O$ and edge set $Q$ where $(i, i^{\prime}) \in Q$ iff there exists a client $j$ such that $\beta_{i, j}>0$ and $\beta_{i^{\prime}, j}>0$
6. Let $O^{\prime}$ be any maximal independent set in $H$
7. Close the facilities in $O \setminus O^{\prime}$ // pruning phase
8. Assign each client $j$ to the nearest facility in $O^{\prime}$
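The following is a simplified, illustrative simulation of the growth and pruning phases (a discrete-event version of the pseudocode above). The dictionary-based input format, the tolerance `eps`, and the greedy choice of the maximal independent set are assumptions; the code favors clarity over efficiency.

```python
import math

def jv_ucfl(F, D, d, f, eps=1e-9):
    """F, D: lists of facility/client ids; d[i][j]: distance; f[i]: opening cost."""
    alpha = {j: 0.0 for j in D}              # dual value of each client
    active = set(D)                          # clients not yet connected
    opened = []                              # temporarily opened facilities, in order
    special = {i: set() for i in F}          # clients j with beta_{i,j} > 0 when i opens

    def beta(i, j):
        return max(0.0, alpha[j] - d[i][j])

    while active:
        # next time an active client reaches a facility it has not reached yet
        t1 = min((d[i][j] - alpha[j] for j in active for i in F
                  if d[i][j] > alpha[j] + eps), default=math.inf)
        # next time an unopened facility becomes fully paid for
        t2 = math.inf
        for i in F:
            if i in opened:
                continue
            paid = sum(beta(i, j) for j in D)
            rate = sum(1 for j in active if alpha[j] >= d[i][j] - eps)
            if rate > 0:
                t2 = min(t2, (f[i] - paid) / rate)
        dt = max(0.0, min(t1, t2))
        for j in active:
            alpha[j] += dt
        # temporarily open every facility that became tight
        for i in F:
            if i not in opened and sum(beta(i, j) for j in D) >= f[i] - eps:
                opened.append(i)
                special[i] = {j for j in D if beta(i, j) > eps}
        # deactivate active clients that can now reach an open facility
        for j in list(active):
            if any(alpha[j] >= d[i][j] - eps for i in opened):
                active.discard(j)

    # pruning: greedily keep a maximal independent set of the conflict graph H
    kept = []
    for i in opened:                         # two facilities conflict if a client paid both
        if all(not (special[i] & special[i2]) for i2 in kept):
            kept.append(i)
    assign = {j: min(kept, key=lambda i: d[i][j]) for j in D}
    return kept, assign, alpha
```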
Example: The example in Figure 9.1 illustrates the need for the pruning phase. There are $2n$ clients and $n$ facilities, and the opening costs of the facilities are $n+2$ except for the first one, which has an opening cost of $n+1$. The first group of clients, shown at the top of the figure, have a connection cost of 1 to each facility. The second group of clients have the following property: $d(i_{h}, j_{h}^{\prime})=1$ and $d(i_{\ell}, j_{h}^{\prime})=3$ when $\ell \neq h$. The rest of the distances are induced by these. One can verify that in the growth phase all the facilities will be opened. However, the total dual value after the growth phase is $5n-1$ while the total cost of the opened facilities is $\Theta(n^{2})$, and hence pruning is necessary to obtain a good approximation.

Figure 9.1: Example to illustrate the need for the pruning phase.

We will do a high-level analysis of the algorithm skipping a few fine details. A formal treatment with full details can be found in [13].
The algorithm maintains the property that the $\alpha_{j}$ and $\beta_{i, j}$ variables are dual feasible. Consider the graph $G$ created by the algorithm. We call an edge $(i, j) \in G$ a witness edge for $j$ if it is added when $j$ is removed from $A$ (this happens exactly once for each $j$). We call an edge $(i, j) \in G$ special if $\beta_{i, j}>0$, which means that $j$ paid to temporarily open $i$. We remark that a client $j$ may have a special edge to $i$ even though $(i, j)$ is not its witness edge; this can happen if $i$ is temporarily opened after $j$ was already removed from $A$ (due to other clients contributing to opening $i$ later). One can associate a notion of time with the progression of the algorithm since the dual variables are monotonically increased. This can be used to order events.
A basic property maintained by the algorithm is the following.
Claim 9.2.1. If $(i, j)$ is an edge in $G$ then $\alpha_{j}-\beta_{i, j}=d(i, j)$ and hence $\alpha_{j} \geq d(i, j)$.
Proof. The algorithm adds an edge $(i, j)$ only if $(i, j)$ is a witness edge for $j$ or if $\beta_{i, j}>0$ (in which case $(i, j)$ is special). The algorithm maintains the invariant $\beta_{i, j}=\max\{0, \alpha_{j}-d(i, j)\}$, and hence if $\beta_{i, j}>0$ the claim is clear. If $\beta_{i, j}=0$ then $(i, j)$ is a witness edge for $j$ and case (i) happens when $j$ is removed from $A$, in which case $\alpha_{j}=d(i, j)$.
Analysis: We upper bound the cost of opening the facilities in $O^{\prime}$ and the connection cost of the clients.
We leave the proof of the following lemma as an exercise.
Lemma 9.7. For each facility $i \in O$ we have $\sum_{(i, j) \text{ special}} \beta_{i, j}=f_{i}$.
Since a client $j$ can pay for multiple facilities, which we cannot afford, the pruning phase removes facilities so that no client $j$ is connected to two facilities in $O^{\prime}$ with special edges (otherwise those two facilities would have an edge in $H$). We say that a client $j$ is directly connected to a facility $i \in O^{\prime}$ if $(i, j)$ is a special edge. We call all such clients directly connected clients and the rest of the clients are called indirectly connected. For $i \in O^{\prime}$ let $Z_{i}$ be the set of clients directly connected to $i$. By the pruning rule, a client $j$ is directly connected to at most one facility in $O^{\prime}$. We let $\mathcal{D}_{1}=\cup_{i \in O^{\prime}} Z_{i}$ be the set of all directly connected clients and let $\mathcal{D}_{2}$ be the set of all indirectly connected clients. We show that each facility in $O^{\prime}$ can be paid for by its directly connected clients.
Lemma 9.8. For each $i \in O^{\prime}$, $\sum_{j \in Z_{i}} \beta_{i, j}=f_{i}$.
Proof. This follows from Lemma 9.7 and the fact that if $i \in O^{\prime}$ then every client $j$ with a special edge $(i, j)$ must be directly connected to $i$.
From Claim 9.2.1 we see that if $j$ is directly connected to $i$ then $\alpha_{j}-\beta_{i, j}=d(i, j)$, and hence $j$ can pay both its connection cost to $i$ and its contribution to opening $i$. What about indirectly connected clients? The next lemma bounds their connection cost.
Lemma 9.9. Suppose $j \in \mathcal{D}_{2}$, that is, $j$ is an indirectly connected client. Let $i$ be its closest facility in $O^{\prime}$; then $d(i, j) \leq 3 \alpha_{j}$.
Proof. Let $(i, j)$ be the witness edge for $j$. Note that $i \in O$. Since $j$ is an indirectly connected client there is no facility $i^{\prime} \in O^{\prime}$ such that $(i^{\prime}, j)$ is a special edge. Since $i \notin O^{\prime}$, it must be that $i$ was closed in the pruning phase, and hence there must be a facility $i^{\prime} \in O^{\prime}$ such that $(i, i^{\prime})$ is an edge in $H$ (otherwise $O^{\prime}$ would not be a maximal independent set). Therefore there is some client $j^{\prime} \neq j$ such that $(i^{\prime}, j^{\prime})$ and $(i, j^{\prime})$ are both special edges. We claim that $\alpha_{j} \geq \alpha_{j^{\prime}}$. Assuming this claim, via the triangle inequality and Claim 9.2.1,
$$d(i^{\prime}, j) \leq d(i, j)+d(i, j^{\prime})+d(i^{\prime}, j^{\prime}) \leq \alpha_{j}+2 \alpha_{j^{\prime}} \leq 3 \alpha_{j}.$$
Since $i^{\prime} \in O^{\prime}$, the nearest facility to $j$ in $O^{\prime}$ is within distance $3 \alpha_{j}$.
We now prove the claim that $\alpha_j \geq \alpha_{j'}$. Let $t = \alpha_j$ be the time when $j$ connects to its witness facility $i$. Consider two cases. In the first case $d(i, j') \leq t$, which means that $j'$ had already reached $i$ at or before time $t$; in this case $\alpha_{j'} \leq t$ since $i$ was open at time $t$. In the second case $d(i, j') > t$; this means that $j'$ reached $i$ strictly after time $t$. Since $i$ was already open at time $t$, $j'$ would not pay to open $i$, which implies that $\beta_{i, j'} = 0$; but then $(i, j')$ would not be a special edge and hence this case cannot arise. This finishes the proof of the claim.
With the preceding two lemmas in place we can bound the total cost of opening facilities in O O O^(')O^{\prime} and connecting clients to them. We will provide a refined statement that turns out to be useful in some applications.
Theorem 9.6.
j D d ( O , j ) + 3 i O f i 3 j D α j 3 OPT L P . j D d O , j + 3 i O f i 3 j D α j 3  OPT L P . sum_(j inD)d(O^('),j)+3sum_(i inO^('))f_(i) <= 3sum_(j inD)alpha_(j) <= 3" OPT"_(LP).\sum_{j \in \mathcal{D}} d\left(O^{\prime}, j\right)+3 \sum_{i \in O^{\prime}} f_{i} \leq 3 \sum_{j \in \mathcal{D}} \alpha_{j} \leq 3 \text{ OPT}_{L P}.
In particular the algorithm yields a 3 3 33-approximation.
Proof. Consider directly connected clients D 1 D 1 D_(1)\mathcal{D}_{1}. We have D 1 = i O Z i D 1 = i O Z i D_(1)=⊎_(i inO^('))Z_(i)\mathcal{D}_{1}=\uplus_{i \in O^{\prime}} Z_{i} where Z i Z i Z_(i)Z_{i} are directly connected to i i ii. Via Lemma 9.8 and Claim 9.2.1
$$\begin{aligned} \sum_{j \in \mathcal{D}_{1}} \alpha_{j} &= \sum_{i \in O^{\prime}} \sum_{j \in Z_{i}} \alpha_{j} \\ &= \sum_{i \in O^{\prime}} \sum_{j \in Z_{i}}\left(d(i, j)+\beta_{i, j}\right) \\ &= \sum_{i \in O^{\prime}}\left(f_{i}+\sum_{j \in Z_{i}} d(i, j)\right) \\ &\geq \sum_{i \in O^{\prime}} f_{i}+\sum_{j \in \mathcal{D}_{1}} d\left(O^{\prime}, j\right). \end{aligned}$$
For indirectly connected clients, via Lemma 9.9, we have 3 j D 2 α j 3 j D 2 α j 3sum_(j inD_(2))alpha_(j) >=3 \sum_{j \in \mathcal{D}_{2}} \alpha_{j} \geq j D 2 d ( O , j ) j D 2 d O , j sum_(j inD_(2))d(O^('),j)\sum_{j \in \mathcal{D}_{2}} d\left(O^{\prime}, j\right). Thus
3 j D α j = 3 j D 1 α j + 3 j D 2 α j 3 i O f i + 3 j D 1 d ( O , j ) + j D 2 d ( O , j ) 3 i O f i + j D d ( O , j ) . 3 j D α j = 3 j D 1 α j + 3 j D 2 α j 3 i O f i + 3 j D 1 d O , j + j D 2 d O , j 3 i O f i + j D d O , j . {:[3sum_(j inD)alpha_(j)=3sum_(j inD_(1))alpha_(j)+3sum_(j inD_(2))alpha_(j)],[ >= 3sum_(i inO^('))f_(i)+3sum_(j inD_(1))d(O^('),j)+sum_(j inD_(2))d(O^('),j)],[ >= 3sum_(i inO^('))f_(i)+sum_(j inD)d(O^('),j).]:}\begin{aligned} 3 \sum_{j \in \mathcal{D}} \alpha_{j} &=3 \sum_{j \in \mathcal{D}_{1}} \alpha_{j}+3 \sum_{j \in \mathcal{D}_{2}} \alpha_{j} \\ & \geq 3 \sum_{i \in O^{\prime}} f_{i}+3 \sum_{j \in \mathcal{D}_{1}} d\left(O^{\prime}, j\right)+\sum_{j \in \mathcal{D}_{2}} d\left(O^{\prime}, j\right) \\ & \geq 3 \sum_{i \in O^{\prime}} f_{i}+\sum_{j \in \mathcal{D}} d\left(O^{\prime}, j\right) . \end{aligned}
The algorithm is easy and efficient to implement. One of the main advantages of the stronger property that we saw in the theorem is that it leads to a nice algorithm for the k k kk-median problem; we refer the reader to Chapter 25 in [13:1] for a detailed description. In addition the flexibility of the primal-dual algorithm has led to algorithms for several other variants; see [14] for one such example.
Local search has been shown to be a very effective heuristic for facility location and clustering problems and there is an extensive literature on this topic. The first paper that proved constant factor approximation bounds for UCFL is by Korupolu, Plaxton and Rajaraman [15] and it provided a useful template for many future papers. We refer the reader to Chapter 9 in [16] for a local search analysis for UCFL.

9.3 k k kk-Median

$k$-Median has been extensively studied in approximation algorithms due to its simplicity and connection to UCFL. The first constant factor approximation was obtained in [17] via LP rounding. We will consider a slight generalization of $k$-Median where the medians are to be selected from a facility set $\mathcal{F}$. We describe the LP relaxation, which is closely related to the one for UCFL that we have already seen. The variables are the same: $y_i$ indicates whether a center is opened at location $i \in \mathcal{F}$ and $x_{i,j}$ indicates whether client $j$ is connected to facility $i$. The objective and constraints change since the problem requires one to open at most $k$ facilities but there is no cost to opening them.
min j D i F d ( i , j ) x i , j i x i , j = 1 j D x i , j y i i F , j D i y i k x , y 0 min j D i F d ( i , j ) x i , j i x i , j = 1 j D x i , j y i i F , j D i y i k x , y 0 {:[minsum_(j inD)sum_(i inF)d(i","j)x_(i,j)],[sum_(i)x_(i,j)=1quad AA j inD],[x_(i,j) <= y_(i)quad i inF","j inD],[sum_(i)y_(i) <= k],[x","y >= 0]:}\begin{aligned} \min \sum_{j \in \mathcal{D}} \sum_{i \in \mathcal{F}} d(i, j) x_{i, j} & \\ \sum_{i} x_{i, j} &=1 \quad \forall j \in \mathcal{D} \\ x_{i, j} & \leq y_{i} \quad i \in \mathcal{F}, j \in \mathcal{D} \\ \sum_{i} y_{i} & \leq k \\ x, y & \geq 0 \end{aligned}
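To make the relaxation concrete, here is a minimal sketch of how it might be written with the PuLP modeling library; the facility set `F`, client set `D`, distance dictionary `d`, and bound `k` are assumed inputs and are not part of the notes.

```python
import pulp

def k_median_lp(F, D, d, k):
    """Build the k-Median LP relaxation.
    F: facilities, D: clients, d: dict mapping (i, j) to distance, k: number of medians."""
    prob = pulp.LpProblem("k_median_relaxation", pulp.LpMinimize)
    y = {i: pulp.LpVariable(f"y_{i}", lowBound=0) for i in F}
    x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", lowBound=0) for i in F for j in D}

    # objective: total fractional connection cost
    prob += pulp.lpSum(d[i, j] * x[i, j] for i in F for j in D)
    # each client is fully (fractionally) assigned
    for j in D:
        prob += pulp.lpSum(x[i, j] for i in F) == 1
    # a client can use facility i only to the extent that it is opened
    for i in F:
        for j in D:
            prob += x[i, j] <= y[i]
    # at most k facilities opened in total
    prob += pulp.lpSum(y[i] for i in F) <= k
    return prob, x, y
```

Solving the returned problem with any LP solver supported by PuLP gives the fractional optimum that the rounding discussion below compares against.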
Rounding the LP for $k$-Median is not as simple as it is for UCFL. This is mainly because one needs a global argument. To understand this, consider the following example. Suppose we have $k+1$ points where the distance between each pair is $1$ (a uniform metric space). The optimum integer solution has cost $1$: we place $k$ medians at $k$ of the points and the remaining point travels a distance of $1$. Now consider a fractional solution where $y_i = 1 - 1/(k+1)$ for each point. The cost of this fractional solution is also $1$; however, each client now pays a fractional cost of only $1/(k+1)$. Any rounding algorithm will make at least one of the clients pay a cost of $1$, which is much larger than its fractional cost; thus the analysis cannot be based on preserving the cost of each client within a constant factor of its fractional cost.
There are several LP rounding algorithms known. An advantage of the LP-based approach is that it leads to a constant factor approximation for the Matroid Median problem, a nice and powerful generalization of the $k$-Median problem; here a matroid is defined over the facilities and the constraint is that the set of facilities chosen must be independent in the given matroid. One can write a natural LP relaxation for this problem and prove that the LP has a constant integrality gap by appealing to matroid intersection! It showcases the power of classical results in combinatorial optimization. We refer the reader to [18], [19].
Local search for center based clustering problems is perhaps one of the most natural heuristics. In particular we will consider the p p pp-swap heuristic. Given the current set of k k kk centers S S SS, the p p pp-swap heuristic will consider swapping out up to p p pp centers from S S SS with p p pp new centers. It is easy to see that this local search algorithm can be implemented in n O ( p ) n O ( p ) n^(O(p))n^{O(p)} time for each iteration. When p = 1 p = 1 p=1p=1 we simply refer to the algorithm as (basic) local search. We will ignore the convergence time. As we saw for Max Cut, one can use standard tricks to make the algorithm run in polynomial time with a loss of a ( 1 + o ( 1 ) ) ( 1 + o ( 1 ) ) (1+o(1))(1+o(1))-factor in the approximation bound guaranteed by the worst-case local optimum. Thus the main focus will be on the quality of the local optimum. The following is a very nice result.
Theorem 9.7 (Arya et al. [20]). For any fixed p 1 p 1 p >= 1p \geq 1 the p p pp-swap local search heuristic has a tight worst-case approximation ratio of ( 3 + 2 / p ) ( 3 + 2 / p ) (3+2//p)(3+2 / p) for k k kk-Median. In particular the basic local search algorithm yields a 5 5 55-approximation.
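Before turning to the analysis, the following is a minimal sketch of the single-swap ($p=1$) heuristic, assuming the candidate facilities, the clients, and the metric $d$ are given as plain Python objects; the `eps` parameter is one way to realize the relative-improvement stopping rule mentioned above (setting it to $0$ recovers an exact local optimum).

```python
def cost(S, clients, d):
    """Sum of distances from each client to its nearest center in S."""
    return sum(min(d(j, i) for i in S) for j in clients)

def local_search_1swap(facilities, clients, d, k, eps=0.0):
    """Single-swap local search for k-Median.
    facilities: candidate centers, clients: points to serve,
    d: metric d(p, q), k: number of centers, eps: relative improvement threshold."""
    S = set(list(facilities)[:k])            # arbitrary initial solution
    current = cost(S, clients, d)
    improved = True
    while improved:
        improved = False
        for r in list(S):                    # center to swap out
            for f in facilities:             # center to swap in
                if f in S:
                    continue
                T = (S - {r}) | {f}
                c = cost(T, clients, d)
                if c < (1 - eps) * current:  # accept only sufficiently large improvements
                    S, current, improved = T, c, True
                    break
            if improved:
                break
    return S
```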
Here we give a proof/analysis of the preceding theorem for p = 1 p = 1 p=1p=1, following the simplified analysis presented in [21]. See also [16:1] and the notes from CMU. Given any set of centers S S SS we define cost ( S ) = j D d ( j , S ) cost ( S ) = j D d ( j , S ) cost(S)=sum_(j inD)d(j,S)\operatorname{cost}(S)=\sum_{j \in \mathcal{D}} d(j, S) to be the sum of the distances of the clients to the centers. Let S S SS be a local optimum and let S S S^(**)S^{*} be some fixed optimum solution to the given k k kk-Median instance. To compare cost ( S ) cost ( S ) cost(S)\operatorname{cost}(S) with cost ( S ) cost S cost(S^(**))\operatorname{cost}\left(S^{*}\right) the key idea is to set up a clever set of potential swaps between the centers in S S SS and centers in S S S^(**)S^{*}. That is, we consider a swap pair ( r , f ) ( r , f ) (r,f)(r, f) where r S r S r in Sr \in S and f S f S f inS^(**)f \in S^{*}. Since S S SS is a local optimum it must be the case that cost ( S r + f ) cost ( S ) cost ( S r + f ) cost ( S ) cost(S-r+f) <= cost(S)\operatorname{cost}(S-r+f) \leq \operatorname{cost}(S). The analysis upper bounds the potential increase in the cost in some interesting fashion and sums up the resulting series of inequalities. This may seem magical, and indeed it is not obvious why the analysis proceeds in this fashion. The short answer is that the analysis ideas required a series of developments with the somewhat easier case of UCFL coming first.
We set up some notation. Let $\phi: \mathcal{D} \rightarrow S$ be the mapping of clients to facilities in $S$ based on nearest distance. Similarly let $\phi^{*}: \mathcal{D} \rightarrow S^{*}$ be the mapping to facilities in the optimum solution $S^{*}$. Thus $j$ connects to facility $\phi(j)$ in the local optimum and to $\phi^{*}(j)$ in the optimum solution. We let $N(i)$ denote the set of all clients assigned to a facility $i \in S$ and $N^{*}(i)$ the set of all clients assigned to a facility $i \in S^{*}$. Let $A_j = d(j, S)$ and $O_j = d(j, S^{*})$ be the cost paid by $j$ in the local optimum and the optimal solution respectively. To reinforce the notation we express $\operatorname{cost}(S)$ as follows.
cost ( S ) = j D A j = i S j N ( i ) d ( j , i ) . cost ( S ) = j D A j = i S j N ( i ) d ( j , i ) . cost(S)=sum_(j inD)A_(j)=sum_(i in S)sum_(j in N(i))d(j,i).\operatorname{cost}(S)=\sum_{j \in \mathcal{D}} A_{j}=\sum_{i \in S} \sum_{j \in N(i)} d(j, i) .
Similarly
cost ( S ) = j D O j = i S j N ( i ) d ( j , i ) . cost S = j D O j = i S j N ( i ) d ( j , i ) . cost(S^(**))=sum_(j inD)O_(j)=sum_(i inS^(**))sum_(j inN^(**)(i))d(j,i).\operatorname{cost}\left(S^{*}\right)=\sum_{j \in \mathcal{D}} O_{j}=\sum_{i \in S^{*}} \sum_{j \in N^{*}(i)} d(j, i) .
Setting up the swap pairs: We create a set of pairs $\mathcal{P}$ as follows. We will assume without loss of generality that $|S| = |S^{*}| = k$. For technical convenience we also assume $S \cap S^{*} = \emptyset$; we can always create dummy co-located centers to make this assumption hold. Consider the mapping $\rho: S^{*} \rightarrow S$ where each $i \in S^{*}$ is mapped to its closest center in $S$. Let $R_1$ be the set of centers in $S$ that have exactly one center of $S^{*}$ mapped to them, and let $R_0$ be the set of centers in $S$ with no center of $S^{*}$ mapped to them. Thus each $i \in S \setminus (R_0 \cup R_1)$ has two or more centers of $S^{*}$ mapped to it. Let $S_1^{*} \subseteq S^{*}$ be the centers mapped to $R_1$. See Figure 9.2. By a simple averaging argument we have the following claim.
Claim 9.3.1. 2 | R 0 | | S S 1 | 2 R 0 S S 1 2|R_(0)| >= |S^(**)\\S_(1)^(**)|2\left|R_{0}\right| \geq\left|S^{*} \backslash S_{1}^{*}\right|.
We create a set of pairs P P P\mathcal{P} as follows. There will be exactly k k kk pairs. For each f S 1 f S 1 f^(**)inS_(1)^(**)f^{*} \in S_{1}^{*} we add the pair ( r , f ) r , f (r,f^(**))\left(r, f^{*}\right) where ρ ( f ) = r ρ f = r rho(f^(**))=r\rho\left(f^{*}\right)=r. For each f S S 1 f S S 1 f^(**)inS^(**)\\S_(1)^(**)f^{*} \in S^{*} \backslash S_{1}^{*} we add a pair ( r , f ) r , f (r,f^(**))\left(r, f^{*}\right) where r r rr is any arbitrary center from R 0 R 0 R_(0)R_{0} - however we make sure that a center r R 0 r R 0 r inR_(0)r \in R_{0} is in at most two pairs in P P P\mathcal{P}; this is possible because of Claim 9.3.1.
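For illustration only, here is a small sketch of this pairing construction, assuming the mapping $\rho$ is available as a dictionary from $S^{*}$ to $S$; the pairs are purely an analysis device, since the algorithm itself never sees $S^{*}$.

```python
from collections import defaultdict

def build_swap_pairs(S, S_star, rho):
    """Construct the swap pairs described above.
    rho: dict mapping each f* in S_star to its closest center in S."""
    preimage = defaultdict(list)
    for f_star in S_star:
        preimage[rho[f_star]].append(f_star)

    R1 = [r for r in S if len(preimage[r]) == 1]   # exactly one optimal center mapped
    R0 = [r for r in S if len(preimage[r]) == 0]   # no optimal center mapped
    S1_star = [preimage[r][0] for r in R1]

    pairs = [(rho[f_star], f_star) for f_star in S1_star]

    # remaining optimal centers are paired with centers of R0,
    # using each r in R0 at most twice (possible by Claim 9.3.1)
    remaining = [f_star for f_star in S_star if f_star not in S1_star]
    slots = [r for r in R0 for _ in range(2)]
    for f_star, r in zip(remaining, slots):
        pairs.append((r, f_star))
    return pairs
```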
The pairs satisfy the following property.
Claim 9.3.2. If ( r , f ) P r , f P (r,f^(**))inP\left(r, f^{*}\right) \in \mathcal{P} then for any facility f ^ f , ρ ( f ^ ) r f ^ f , ρ f ^ r hat(f)^(**)!=f^(**),rho( hat(f)^(**))!=r\hat{f}^{*} \neq f^{*}, \rho\left(\hat{f}^{*}\right) \neq r.
The intuition for the pairs is as follows. If $\rho^{-1}(r) = \{f^{*}\}$ then we are essentially forced to consider the pair $(r, f^{*})$, since $r$ could be the only center near $f^{*}$, with all other centers of $S$ very far away. When considering the swap $(r, f^{*})$ we can move the clients connected to $r$ to $f^{*}$. On the other hand, if $|\rho^{-1}(r)| > 1$ then $r$ is close to several centers in $S^{*}$ and may be serving many clients; thus we do not use such an $r$ in the swap pairs.
Figure 9.2: Mapping between S S S^(**)S^{*} and S S SS with each f S f S f^(**)inS^(**)f^{*} \in S^{*} mapped to its closest center in S S SS.
The main technical claim about the swap pairs is the following.
Lemma 9.10. Let ( r , f ) r , f (r,f^(**))\left(r, f^{*}\right) be a pair in P P P\mathcal{P}. Then
0 cost ( S + f r ) cost ( S ) j N ( f ) ( O j A j ) + j N ( r ) 2 O j . 0 cost S + f r cost ( S ) j N f O j A j + j N ( r ) 2 O j . 0 <= "cost"(S+f^(**)-r)-"cost"(S) <= sum_(j inN^(**)(f^(**)))(O_(j)-A_(j))+sum_(j in N(r))2O_(j).0 \leq \textit{cost}\left(S+f^{*}-r\right)-\textit{cost}(S) \leq \sum_{j \in N^{*}\left(f^{*}\right)}\left(O_{j}-A_{j}\right)+\sum_{j \in N(r)} 2 O_{j} .
We defer the proof of the lemma for now and use it to show that $\operatorname{cost}(S) \leq 5\operatorname{cost}(S^{*})$. We sum over all pairs $(r, f^{*}) \in \mathcal{P}$ and note that each $f^{*} \in S^{*}$ occurs in exactly one pair and each $r \in S$ occurs in at most two pairs. Note that $O_j - A_j$ can be negative while $O_j$ is always non-negative.
0 ( r , f ) P ( j N ( f ) ( O j A j ) + j N ( r ) 2 O j ) f S j N ( f ) ( O j A j ) + 2 r S j N ( r ) 2 O j cost ( S ) cost ( S ) + 4 cost ( S ) . 0 r , f P j N f O j A j + j N ( r ) 2 O j f S j N f O j A j + 2 r S j N ( r ) 2 O j cost S cost ( S ) + 4 cost S . {:[0 <= sum_((r,f^(**))inP)(sum_(j inN^(**)(f^(**)))(O_(j)-A_(j))+sum_(j in N(r))2O_(j))],[ <= sum_(f^(**)inS^(**))sum_(j inN^(**)(f^(**)))(O_(j)-A_(j))+2sum_(r in S)sum_(j in N(r))2O_(j)],[ <= cost(S^(**))-cost(S)+4cost(S^(**)).]:}\begin{aligned} 0 & \leq \sum_{\left(r, f^{*}\right) \in \mathcal{P}}\left(\sum_{j \in N^{*}\left(f^{*}\right)}\left(O_{j}-A_{j}\right)+\sum_{j \in N(r)} 2 O_{j}\right) \\ & \leq \sum_{f^{*} \in S^{*}} \sum_{j \in N^{*}\left(f^{*}\right)}\left(O_{j}-A_{j}\right)+2 \sum_{r \in S} \sum_{j \in N(r)} 2 O_{j} \\ & \leq \operatorname{cost}\left(S^{*}\right)-\operatorname{cost}(S)+4 \operatorname{cost}\left(S^{*}\right) . \end{aligned}
This implies the desired inequality.
Figure 9.3: Two cases in the proof of Lemma 9.10 for the swap pair $(r, f^{*})$. In the figure on the left the client $j \in N^{*}(f^{*})$ is assigned to $f^{*}$. In the figure on the right the client $j \in N(r) \setminus N^{*}(f^{*})$ is assigned to $r' = \rho(\hat{f}^{*})$.
Proof of Lemma 9.10. Since S S SS is a local optimum swapping r r rr with f f f^(**)f^{*} cannot improve the cost and hence we obtain cost ( S + f r ) cost ( S ) 0 cost S + f r cost ( S ) 0 cost(S+f^(**)-r)-cost(S) >= 0\operatorname{cost}\left(S+f^{*}-r\right)-\operatorname{cost}(S) \geq 0. We focus on the more interesting inequality. To bound the increase in cost by removing r r rr and adding f f f^(**)f^{*} to S S SS, we consider a specific assignment of the clients. Any client j N ( f ) j N f j inN^(**)(f^(**))j \in N^{*}\left(f^{*}\right) is assigned to f f f^(**)f^{*} (even if it is suboptimal to do so). See Figure 9.3. For such a client j j jj the change in cost is O j A j O j A j O_(j)-A_(j)O_{j}-A_{j}. Now consider any client j N ( r ) N ( f ) j N ( r ) N f j in N(r)\\N^(**)(f^(**))j \in N(r) \backslash N^{*}\left(f^{*}\right); since r r rr is no longer available we need to assign it to another facility. Which one? Let ϕ ( j ) = f ^ ϕ ( j ) = f ^ phi^(**)(j)= hat(f)^(**)\phi^{*}(j)=\hat{f}^{*} be the facility that j j jj is assigned to in the optimum solution. Note that f ^ f f ^ f hat(f)^(**)!=f^(**)\hat{f}^{*} \neq f^{*}. We assign j j jj to r = ρ ( f ^ ) r = ρ ( f ^ ) r^(')=rho( hat(f)^(**))r^{\prime}=\rho(\hat{f}^{*}); from Claim 9.3.2, ρ ( f ^ ) r ρ ( f ^ ) r rho( hat(f)^(**))!=r\rho(\hat{f}^{*}) \neq r and hence r S r + f r S r + f r^(')in S-r+f^(**)r^{\prime} \in S-r+f^{*}. See Figure 9.3. The change in the cost for such a client j j jj is d ( j , r ) d ( j , r ) d j , r d ( j , r ) d(j,r^('))-d(j,r)d\left(j, r^{\prime}\right)-d(j, r). We bound it as follows
d ( j , r ) d ( j , r ) d ( j , f ^ ) + d ( f ^ , r ) d ( j , r ) (via triangle inequality) d ( j , f ^ ) + d ( f ^ , r ) d ( j , r ) ( since r is closest to f ^ in S ) d ( j , f ^ ) + d ( j , f ^ ) ( via triangle inequality ) = 2 O j . d j , r d ( j , r ) d j , f ^ + d f ^ , r d ( j , r ) (via triangle inequality) d j , f ^ + d f ^ , r d ( j , r ) ( since  r  is closest to  f ^  in  S ) d j , f ^ + d j , f ^ ( via triangle inequality ) = 2 O j . {:[d(j,r^('))-d(j","r) <= d(j, hat(f)^(**))+d( hat(f)^(**),r^('))-d(j","r)quad(via triangle inequality)],[ <= d(j, hat(f)^(**))+d( hat(f)^(**),r)-d(j","r)quad("since "r^(')" is closest to " hat(f)^(**)" in "S)],[ <= d(j, hat(f)^(**))+d(j, hat(f)^(**))quad("via triangle inequality")],[=2O_(j).]:}\begin{aligned} d\left(j, r^{\prime}\right)-d(j, r) & \leq d\left(j, \hat{f}^{*}\right)+d\left(\hat{f}^{*}, r^{\prime}\right)-d(j, r) \quad \text {(via triangle inequality)} \\ &\leq d\left(j, \hat{f}^{*}\right)+d\left(\hat{f}^{*}, r\right)-d(j, r) \quad (\text {since } r^{\prime} \text { is closest to } \hat{f}^{*} \text { in } S) \\ & \leq d\left(j, \hat{f}^{*}\right)+d\left(j, \hat{f}^{*}\right) \quad(\text{via triangle inequality})\\ &=2 O_{j} . \end{aligned}
Every other client is assigned to its existing center in S S SS. Thus the total increase in the cost is obtained as
j N ( f ) ( O j A j ) + j N ( r ) N ( f ) 2 O j j N ( f ) ( O j A j ) + j N ( r ) 2 O j . j N f O j A j + j N ( r ) N f 2 O j j N f O j A j + j N ( r ) 2 O j . sum_(j inN^(**)(f^(**)))(O_(j)-A_(j))+sum_(j in N(r)\\N^(**)(f^(**)))2O_(j) <= sum_(j inN^(**)(f^(**)))(O_(j)-A_(j))+sum_(j in N(r))2O_(j).\sum_{j \in N^{*}\left(f^{*}\right)}\left(O_{j}-A_{j}\right)+\sum_{j \in N(r) \backslash N^{*}\left(f^{*}\right)} 2 O_{j} \leq \sum_{j \in N^{*}\left(f^{*}\right)}\left(O_{j}-A_{j}\right)+\sum_{j \in N(r)} 2 O_{j} .
See [20:1] for a description of the tight example. The example in the conference version of the paper is buggy.

9.4 k k kk-Means

The k k kk-Means problem is very popular in practice for a variety of reasons. In terms of center-based clustering the goal is to choose k k kk centers C = { c 1 , c 2 , , c k } C = c 1 , c 2 , , c k C={c_(1),c_(2),dots,c_(k)}C=\left\{c_{1}, c_{2}, \ldots, c_{k}\right\} to minimize p d ( p , C ) 2 p d ( p , C ) 2 sum_(p)d(p,C)^(2)\sum_{p} d(p, C)^{2}. In the discrete setting one can obtain constant factor approximation algorithms via several techniques that follow the approach of k k kk-Median. We note that the squared distance does not satisfy triangle inequality, however, it satisfies a relaxed triangle inequality and this is sufficient to generalize certain LP rounding and local search techniques.
In practice the continuous version is popular for clustering applications. The input points are in Euclidean space $\mathbb{R}^d$ where $d$ is typically large. Let $X$ be the set of input points, where each $x \in X$ is now a $d$-dimensional vector. The centers are now allowed to be anywhere in the ambient space. This is called Euclidean $k$-Means. Here the squared distance actually helps in a certain sense. For instance if $k = 1$ then the optimum center is simply $\frac{1}{|X|}\sum_{x \in X} x$, in other words the "average" (centroid) of the points. One can see this by viewing the choice of the center as an optimization problem: $\min_{y \in \mathbb{R}^d} \sum_{x \in X} \|x - y\|_2^2 = \min_{y \in \mathbb{R}^d} \sum_{x \in X} \sum_{i=1}^{d} (x_i - y_i)^2$. One can optimize each coordinate separately, and the optimum in coordinate $i$ is $y_i^{*} = \frac{1}{|X|}\sum_{x \in X} x_i$. Surprisingly, hardness results for Euclidean $k$-Means were only established in the last decade: NP-hardness even when $d = 2$ is established in [22], and APX-hardness for high dimensions is shown in [23].
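As a quick sanity check of this centroid fact, the following hypothetical numpy snippet compares the cost of the mean against nearby perturbed centers on random data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 points in R^5 (arbitrary test data)

def kmeans_cost_1center(X, y):
    """Sum of squared Euclidean distances from the rows of X to a single center y."""
    return np.sum(np.linalg.norm(X - y, axis=1) ** 2)

mean = X.mean(axis=0)
# perturbing the mean in any direction should never decrease the cost
for _ in range(5):
    y = mean + 0.1 * rng.normal(size=X.shape[1])
    assert kmeans_cost_1center(X, mean) <= kmeans_cost_1center(X, y)
```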

9.5 Lloyd's algorithm, D 2 D 2 D^(2)D^{2}-sampling and k k kk-Means + + + + ++++

Lloyd's algorithm is a very well-known and widely used heuristic for the Euclidean $k$-Means problem. It can be viewed as a local search algorithm with an alternating optimization flavor. The algorithm starts with a set of centers $c_1, c_2, \ldots, c_k$ which are typically chosen randomly in some fashion. The centers define clusters $C_1, C_2, \ldots, C_k$ based on assigning each point to its nearest center. That is, $C_i$ is the set of all points in $X$ that are closest to $c_i$ (ties broken in some fashion). Once we have the clusters, via the observation above, one can find the best center for each cluster by taking the mean of its points. That is, for each $C_i$ we find a new center $c_i' = \frac{1}{|C_i|}\sum_{x \in C_i} x$ (if $C_i$ is empty we simply set $c_i' = c_i$). Thus we have a new set of centers and we repeat the process until convergence or some time limit. It is clear that the cost can only improve by recomputing the centers since we know the optimum center for $k = 1$ is obtained by using the average of the points.
Lloyds- k -Means ( X , k ) _ Lloyds- k -Means ( X , k ) _ "Lloyds-"k"-Means"(X,k)_\underline{\text{Lloyds-}k\text{-Means}(X,k)}
1. Seeding: Pick k k kk centers c 1 , c 2 , , c k c 1 , c 2 , , c k c_(1),c_(2),dots,c_(k)c_{1}, c_{2}, \ldots, c_{k}
2. repeat
quad\quad A. Find clusters C 1 , C 2 , , C k C 1 , C 2 , , C k C_(1),C_(2),dots,C_(k)C_{1}, C_{2}, \ldots, C_{k} where
C i = { x X c i C i = x X c i qquadC_(i)={x in X∣c_(i):}\qquad C_{i}=\left\{x \in X \mid c_{i}\right. is closest center to x } x {:x}\left.x\right\}
quad\quad B. $\mathrm{cost} = \sum_{i=1}^{k} \sum_{x \in C_{i}} d(x, c_{i})^{2}$.
quad\quad C. For i = 1 i = 1 i=1i=1 to k k kk do c i = 1 | C i | x C i x c i = 1 C i x C i x c_(i)=(1)/(|C_(i)|)sum_(x inC_(i))xc_{i}=\frac{1}{\left|C_{i}\right|} \sum_{x \in C_{i}} x
3. Until cost improvement is too small
4. Output clusters C 1 , C 2 , , C k C 1 , C 2 , , C k C_(1),C_(2),dots,C_(k)C_{1}, C_{2}, \ldots, C_{k}
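A direct numpy translation of the pseudocode above might look as follows; uniform random seeding is used here as a placeholder (the $D^2$-sampling seeding appears in the next subsection), and the tolerance-based stopping rule is one possible reading of step 3.

```python
import numpy as np

def lloyds_kmeans(X, k, max_iter=100, tol=1e-6, rng=None):
    """Lloyd's algorithm for Euclidean k-Means.
    X: (n, d) array of points, k: number of clusters."""
    rng = np.random.default_rng() if rng is None else rng
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # step 1: seeding
    prev_cost = np.inf
    for _ in range(max_iter):
        # step A: assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step B: current cost
        cur_cost = np.sum(dists[np.arange(len(X)), labels] ** 2)
        # step C: recompute each center as the mean of its cluster
        for i in range(k):
            if np.any(labels == i):           # keep the old center if the cluster is empty
                centers[i] = X[labels == i].mean(axis=0)
        if prev_cost - cur_cost < tol * prev_cost:  # step 3: stop when improvement is small
            break
        prev_cost = cur_cost
    return centers, labels
```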
There are two issues with the algorithm. The first issue is that the algorithm can, in the worst-case, run for an exponential number of iterations. This issue is common for many local search algorithms and as we discussed, it can be overcome by stopping when cost improvement is too small in a relative sense. The second issue is the more significant one. The algorithm can get stuck in a local optimum which can be arbitrarily bad when compared to the optimum solution. See figure below for a simple example.
Figure 9.4: Example demonstrating that a local optimum for Lloyd's algorithm can be arbitrarily bad compared to the optimum clustering. The green clusters are the optimum ones and the red ones are the local optimum.
D 2 D 2 D^(2)D^{2}-sampling and k k kk-Means++: To overcome the bad local optima it is common to run the algorithm with random starting centers. Arthur and Vassilvitskii [24] suggested a specific random sampling scheme to initialize the centers that is closely related to independent work in [25]. This is called D 2 D 2 D^(2)D^{2} sampling.
D 2 -sampling- k -Means + + ( X , k ) _ D 2 -sampling- k -Means + + ( X , k ) _ D^(2)"-sampling-"k"-Means"++(X,k)_\underline{D^{2} \text {-sampling-} k \text {-Means}++(X, k)}
1. S = { c 1 } S = c 1 S={c_(1)}S=\left\{c_{1}\right\} where c 1 c 1 c_(1)c_{1} is chosen uniformly from X X XX
2. for i = 2 i = 2 i=2i=2 to k k kk do
quad\quad A. Choose $c_i$ randomly where $\mathbf{P}[c_i = x] \propto d(x, S)^{2}$
quad\quad B. S S { c i } S S c i S larr S uu{c_(i)}S \leftarrow S \cup\left\{c_{i}\right\}
3. Output S S SS
$k$-Means++ is Lloyd's algorithm initialized with $k$ centers obtained from $D^2$ sampling.
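A short numpy sketch of the seeding step is given below (the function name is ours); its output can be fed to a Lloyd's-style refinement such as the sketch in the previous subsection.

```python
import numpy as np

def d2_sample_centers(X, k, rng=None):
    """D^2 (k-Means++) seeding: pick k initial centers from the rows of X."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    centers = [X[rng.integers(n)]]                 # first center uniformly at random
    for _ in range(1, k):
        # squared distance of every point to its nearest chosen center
        d2 = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2) ** 2,
            axis=1,
        )
        probs = d2 / d2.sum()                      # P[c_i = x] proportional to d(x, S)^2
        centers.append(X[rng.choice(n, p=probs)])
    return np.array(centers)
```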
Theorem 9.8 ([24:1]). Let S S SS be the output of Lloyd's algorithm initialized with D 2 D 2 D^(2)D^{2} sampling. Then E [ cost ( S ) ] 8 ( ln k + 2 ) OPT E [ cost ( S ) ] 8 ( ln k + 2 ) OPT E["cost"(S)] <= 8(ln k+2)"OPT"\mathbf{E}[\textit{cost}(S)] \leq 8(\ln k+2)\text{OPT}. Moreover there are examples showing that it is no better than 2 ln k 2 ln k 2ln k2 \ln k competitive.
The analysis establishes that the seeding already creates a good approximation, so in a sense the local search is only refining the initial approximation. [26], [27] show that if one uses O ( k ) O ( k ) O(k)O(k) centers, initialized according to D 2 D 2 D^(2)D^{2} sampling, then the local optimum will yield a constant factor approximation with constant probability; note that this is a bicriteria approximation where the number of centers is a constant factor more than k k kk and the cost is being compared with respect to the optimum cost with k k kk centers. The authors also show that there is a subset of k k kk centers from the output of the algorithm that yields a constant factor approximation. One can then run a discrete optimization algorithm using the centers. Another interesting result based on D 2 D 2 D^(2)D^{2}-sampling ideas yields a PTAS but the running time is of the form O ( n d 2 O ~ ( k 2 / ϵ ) ) O ( n d 2 O ~ ( k 2 / ϵ ) ) O(nd2^( tilde(O)(k^(2)//epsilon)))O(n d 2^{\tilde{O}(k^{2} / \epsilon)}) [28]. See [29] for a scalable version of k k kk-Means + + + + ++++.

  1. Tomás Feder and Daniel Greene. “Optimal algorithms for approximate clustering”. In: Proceedings of the twentieth annual ACM symposium on Theory of computing. 1988, pp. 434–444. ↩︎
  2. Dorit S Hochbaum and David B Shmoys. “A unified approach to approximation algorithms for bottleneck problems”. In: Journal of the ACM (JACM) 33.3 (1986), pp. 533–550. ↩︎
  3. Viswanath Nagarajan, Baruch Schieber, and Hadas Shachnai. “The Euclidean k-supplier problem”. In: Mathematics of Operations Research 45.1 (2020), pp. 1–14. ↩︎
  4. Deeparnab Chakrabarty and Maryam Negahbani. “Generalized center problems with outliers”. In: ACM Transactions on Algorithms (TALG) 15.3 (2019), pp. 1–14. ↩︎
  5. Ján Plesník. “A heuristic for the p-center problems in graphs”. In: Discrete Applied Mathematics 17.3 (1987), pp. 263–268. ↩︎
  6. Tanvi Bajpai, Deeparnab Chakrabarty, Chandra Chekuri, and Maryam Negahbani. “Revisiting Priority k k kk-Center: Fairness and Outliers”. In: arXiv preprint arXiv:2103.03337 (2021). ↩︎
  7. Sariel Har-Peled and Manor Mendel. “Fast construction of nets in low-dimensional metrics and their applications”. In: SIAM Journal on Computing 35.5 (2006), pp. 1148–1184. ↩︎
  8. Sariel Har-Peled and Benjamin Raichel. “Net and prune: A linear time algorithm for euclidean distance problems”. In: Journal of the ACM (JACM) 62.6 (2015), pp. 1–35. ↩︎
  9. Shi Li. “A 1.488 approximation algorithm for the uncapacitated facility location problem”. In: Information and Computation 222 (2013), pp. 45–58. ↩︎
  10. Sudipto Guha and Samir Khuller. “Greedy strikes back: Improved facility location algorithms”. In: Journal of algorithms 31.1 (1999), pp. 228–248. ↩︎
  11. Kamal Jain and Vijay V Vazirani. “Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation”. In: Journal of the ACM (JACM) 48.2 (2001), pp. 274–296. ↩︎
  12. Reuven Bar-Yehuda and Shimon Even. “A linear-time approximation algorithm for the weighted vertex cover problem”. In: Journal of Algorithms 2.2 (1981), pp. 198–203. ↩︎
  13. Vijay V Vazirani. Approximation algorithms. Springer Science & Business Media, 2013. ↩︎ ↩︎
  14. Yair Bartal, Moses Charikar, and Danny Raz. “Approximating Min-Sum k-Clustering in Metric Spaces”. In: Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing. STOC ’01. Hersonissos, Greece: Association for Computing Machinery, 2001, pp. 11–20. isbn: 1581133499. doi: 10.1145/380752.380754. url: https://doi.org/10.1145/380752.380754. ↩︎
  15. Madhukar R Korupolu, C Greg Plaxton, and Rajmohan Rajaraman. “Analysis of a local search heuristic for facility location problems”. In: Journal of algorithms 37.1 (2000), pp. 146–188. ↩︎
  16. David P Williamson and David B Shmoys. The design of approximation algorithms. Cambridge university press, 2011. ↩︎ ↩︎
  17. Moses Charikar, Sudipto Guha, Éva Tardos, and David B Shmoys. “A constant-factor approximation algorithm for the k-median problem”. In: Journal of Computer and System Sciences 65.1 (2002), pp. 129–149. ↩︎
  18. Ravishankar Krishnaswamy, Amit Kumar, Viswanath Nagarajan, Yogish Sabharwal, and Barna Saha. “The matroid median problem”. In: Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms. SIAM. 2011, pp. 1117–1130. ↩︎
  19. Chaitanya Swamy. “Improved approximation algorithms for matroid and knapsack median problems and applications”. In: ACM Transactions on Algorithms (TALG) 12.4 (2016), pp. 1–22. ↩︎
  20. Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. “Local search heuristics for k-median and facility location problems”. In: SIAM Journal on computing 33.3 (2004), pp. 544–562. ↩︎ ↩︎
  21. Anupam Gupta and Kanat Tangwongsan. Simpler Analyses of Local Search Algorithms for Facility Location. 2008. arXiv: 0809.2554 [cs.DS]. ↩︎
  22. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi Varadarajan. “The planar k-means problem is NP-hard”. In: Theoretical Computer Science 442 (2012), pp. 13–21. ↩︎
  23. Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. “The Hardness of Approximation of Euclidean k-Means”. In: 31st International Symposium on Computational Geometry (SoCG 2015). Ed. by Lars Arge and János Pach. Vol. 34. Leibniz International Proceedings in Informatics (LIPIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2015, pp. 754–767. isbn: 978-3-939897-83-5. doi: 10.4230/LIPIcs.SOCG.2015.754. url: http://drops.dagstuhl.de/opus/volltexte/2015/5117. ↩︎
  24. David Arthur and Sergei Vassilvitskii. “k-means++: The advantages of careful seeding”. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 2007, pp. 1027–1035. ↩︎ ↩︎
  25. Rafail Ostrovsky, Yuval Rabani, Leonard J Schulman, and Chaitanya Swamy. “The effectiveness of Lloyd-type methods for the k-means problem”. In: Journal of the ACM (JACM) 59.6 (2013), pp. 1–22. ↩︎
  26. Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. “Adaptive sampling for k-means clustering”. In: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Springer, 2009, pp. 15–28. ↩︎
  27. Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. “Streaming k-means approximation.” In: NIPS. Vol. 22. 2009, pp. 10–18. ↩︎
  28. Ragesh Jaiswal, Amit Kumar, and Sandeep Sen. “A simple D²-sampling-based PTAS for k-means and other clustering problems”. In: Algorithmica 70.1 (2014), pp. 22–46. ↩︎
  29. Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, and Sergei Vassilvitskii. “Scalable K-Means++”. In: Proc. VLDB Endow. 5.7 (Mar. 2012), pp. 622–633. issn: 2150-8097. doi: 10.14778/2180912.2180915. url: https://doi.org/10.14778/2180912.2180915. ↩︎
