Approximation Algorithms: Introduction to Local Search

These are lecture notes from Chandra Chekuri's CS583 course on Approximation Algorithms, Chapter 8: Introduction to Local Search. Related chapters include Chapter 9 (Clustering and Facility Location) and Chapter 7 (Congestion Minimization in Networks).
Chapter 8
Local search is a powerful and widely used heuristic method (with various extensions). In this lecture we introduce this technique in the context of approximation algorithms. The basic outline of local search is as follows. For an instance $I$ of a given problem let $\mathcal{S}(I)$ denote the set of feasible solutions for $I$. For a solution $S$ we use the term (local) neighborhood of $S$ for the set of all solutions $S'$ that can be obtained from $S$ via some local moves. We let $N(S)$ denote the neighborhood of $S$.
LocalSearch:
Find a "good" initial solution $S_0 \in \mathcal{S}(I)$
$S \leftarrow S_0$
repeat
    If ($\exists S' \in N(S)$ such that $\operatorname{val}(S')$ is strictly better than $\operatorname{val}(S)$)
        $S \leftarrow S'$
    Else
        $S$ is a local optimum
        return $S$
    EndIf
Until (True)
For minimization problems $S'$ is strictly better than $S$ if $\operatorname{val}(S') < \operatorname{val}(S)$, whereas for maximization problems it is the case if $\operatorname{val}(S') > \operatorname{val}(S)$.
The running time of the generic local search algorithm depends on several factors. First, we need an algorithm that, given a solution $S$, either declares that $S$ is a local optimum or finds a solution $S' \in N(S)$ such that $\operatorname{val}(S')$ is strictly better than $\operatorname{val}(S)$. A standard and easy approach for this is to ensure that the local moves are defined in such a way that $|N(S)|$ is polynomial in the input size $|I|$ and $N(S)$ can be enumerated efficiently; thus one can check each $S' \in N(S)$ to see if any of them is an improvement over $S$. However, in some more advanced settings, $N(S)$ may be exponential in the input size but one may still be able to find a solution $S' \in N(S)$ that improves on $S$ in polynomial time. Second, the running time of the algorithm also depends on the number of iterations it takes to go from $S_0$ to a local optimum. In the worst case the number of iterations could be $|\mathrm{OPT} - \operatorname{val}(S_0)|$, which can be exponential in the input size. One can often use a standard scaling trick to overcome this issue: we stop the algorithm unless the improvement over the current $S$ is a significant fraction of $\operatorname{val}(S)$. Finally, the quality of the initial solution $S_0$ also factors into the running time.
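To make the template concrete, here is a minimal Python sketch of the generic loop. The callables `initial_solution`, `neighbors`, and `val` are placeholders for the problem-specific ingredients; they are not part of the notes and would have to be supplied for each problem.

```python
def local_search(instance, initial_solution, neighbors, val, maximize=True):
    """Generic local search following the template above.

    initial_solution(instance) -> a feasible starting solution S0
    neighbors(instance, S)     -> iterable of solutions reachable by one local move
    val(instance, S)           -> objective value of S
    """
    better = (lambda a, b: a > b) if maximize else (lambda a, b: a < b)
    S = initial_solution(instance)
    while True:
        # Look for any strictly improving neighbor; if none exists, S is a local optimum.
        improving = next((T for T in neighbors(instance, S)
                          if better(val(instance, T), val(instance, S))), None)
        if improving is None:
            return S
        S = improving
```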
Remark 8.1. There is a distinction sometimes made in the literature between oblivious and non-oblivious local search. In oblivious local search the algorithm uses $f$ as a black box when comparing $S$ with a candidate solution $S'$; the analysis and the definition of local moves are typically based on properties of $f$. In non-oblivious local search one may use an auxiliary function $g$ derived from $f$ in comparing $S$ with $S'$. Typically the function $g$ is some kind of potential function that aids the analysis. This allows one to move to a solution $S'$ even though $f(S')$ may not be an improvement over $f(S)$.
Remark 8.2. There are several heuristics that are (loosely) connected to local search. These include simulated annealing, tabu search, genetic algorithms and others. These fall under the broad terminology of metaheuristics.

8.1 Local Search for Max Cut

We illustrate local search for the well-known Max Cut problem. In Max Cut we are given an undirected graph $G=(V,E)$ and the goal is to partition $V$ into $(S, V \setminus S)$ so as to maximize the number of edges crossing $S$, that is, $|\delta_G(S)|$; we will use $\delta(S)$ when $G$ is clear. For a vertex $v$ we use $\delta(v)$ instead of $\delta(\{v\})$ to simplify notation. In the weighted version each edge $e$ has a non-negative weight $w(e)$ and the goal is to maximize the weight of the edges crossing $S$, that is, $w(\delta(S))$; here $w(A)$ for a set $A$ of edges denotes the quantity $\sum_{e \in A} w(e)$.
We consider a simple local search algorithm for Max Cut that starts with an arbitrary set $S \subseteq V$ and in each iteration either adds a vertex to $S$ or removes a vertex from $S$, as long as the move improves the cut value $w(\delta(S))$.
LocalSearch for Max Cut:
$S \leftarrow \emptyset$
repeat
    If ($\exists v \in V \setminus S$ such that $w(\delta(S+v)) > w(\delta(S))$)
        $S \leftarrow S+v$
    Else If ($\exists v \in S$ such that $w(\delta(S-v)) > w(\delta(S))$)
        $S \leftarrow S-v$
    Else
        $S$ is a local optimum
        return $S$
    EndIf
Until (True)
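For concreteness, here is a short Python sketch of this single-vertex-flip local search for weighted Max Cut. The graph representation (a dictionary mapping vertex pairs to edge weights) is my own choice for illustration, not something fixed by the notes.

```python
def cut_weight(S, edges):
    """w(delta(S)): total weight of edges with exactly one endpoint in S.
    edges maps a pair (u, v) to the non-negative weight w(e)."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

def local_search_max_cut(vertices, edges):
    """Single-vertex flips starting from S = empty set, as in the pseudocode above."""
    S = set()
    improved = True
    while improved:
        improved = False
        current = cut_weight(S, edges)
        for v in vertices:
            T = S - {v} if v in S else S | {v}
            if cut_weight(T, edges) > current:
                S, improved = T, True
                break
    return S  # local optimum: w(delta(S)) >= w(E)/2 by the corollary below

# Tiny example: a triangle with unit weights; any local optimum cuts 2 of the 3 edges.
S = local_search_max_cut([1, 2, 3], {(1, 2): 1.0, (2, 3): 1.0, (1, 3): 1.0})
```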
We will first focus on the quality of the solution output by the local search algorithm.
Lemma 8.1. Let $S$ be the output of the local search algorithm. Then for each vertex $v$, $w(\delta(S) \cap \delta(v)) \geq w(\delta(v))/2$.
Proof. Let $\alpha_v = w(\delta(S) \cap \delta(v))$ be the weight of the edges incident to $v$ that cross the cut $S$, and let $\beta_v = w(\delta(v)) - \alpha_v$.
We claim that $\alpha_v \geq \beta_v$ for each $v$. If $v \in V \setminus S$ and $\alpha_v < \beta_v$ then moving $v$ to $S$ will strictly increase $w(\delta(S))$, so $S$ cannot be a local optimum. Similarly, if $v \in S$ and $\alpha_v < \beta_v$, we would have $w(\delta(S-v)) > w(\delta(S))$ and $S$ would not be a local optimum.
Corollary 8.1. If $S$ is a local optimum then $w(\delta(S)) \geq w(E)/2 \geq \mathrm{OPT}/2$.
Proof. Since each edge is incident to exactly two vertices, we have $w(\delta(S)) = \frac{1}{2}\sum_{v \in V} w(\delta(S) \cap \delta(v))$. Applying the above lemma,
$$\begin{aligned} w(\delta(S)) &= \frac{1}{2}\sum_{v \in V} w(\delta(S) \cap \delta(v)) \\ &\geq \frac{1}{2}\sum_{v \in V} w(\delta(v))/2 \\ &\geq \frac{1}{2} w(E) \\ &\geq \frac{1}{2}\,\mathrm{OPT}, \end{aligned}$$
since $\mathrm{OPT} \leq w(E)$.
The running time of the local search algorithm depends on the number of local improvement iterations; checking whether there is a local move that results in an improvement can be done by trying all vertices. If the graph is unweighted then the algorithm terminates in at most $|E|$ iterations. However, in the weighted case, it is known that the algorithm can take exponential time in $|V|$ when the weights are large. A very interesting question is whether one can find a local optimum efficiently; note that we do not need to find a local optimum by doing local search! It is believed that finding a local optimum for weighted Max Cut is hard. There is a complexity class called PLS, defined by Johnson, Papadimitriou and Yannakakis, for which weighted Max Cut is known to be a complete problem. PLS has played an important role in algorithmic game theory in recent years. We refer the reader to the Wikipedia page on PLS for many pointers.
Many local search algorithms can be modified slightly to terminate with an approximate local optimum such that (i) the running time of the modified algorithm is strongly polynomial in the input size and (ii) the quality of the solution is very similar to that given by the original local search. We illustrate these ideas for Max Cut. Consider the following algorithm where $\epsilon > 0$ is a parameter that can be chosen. Let $n$ be the number of nodes in $G$.
Modified LocalSearch for Max Cut($\epsilon$):
$S \leftarrow \{v^*\}$ where $v^* = \arg\max_{v \in V} w(\delta(v))$
repeat
    If ($\exists v \in V \setminus S$ such that $w(\delta(S+v)) > (1+\frac{\epsilon}{n})\, w(\delta(S))$)
        $S \leftarrow S+v$
    Else If ($\exists v \in S$ such that $w(\delta(S-v)) > (1+\frac{\epsilon}{n})\, w(\delta(S))$)
        $S \leftarrow S-v$
    Else
        return $S$
    EndIf
Until (True)
The above algorithm terminates unless some local move improves the current solution's value by a relative factor of more than $(1+\frac{\epsilon}{n})$. Thus the final output $S$ is an approximate local optimum.
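A sketch of the modified algorithm in the same style, reusing `cut_weight` from the earlier Max Cut sketch; the threshold $(1+\epsilon/n)$ is exactly the one in the pseudocode, while the data representation remains my own.

```python
def approx_local_search_max_cut(vertices, edges, eps=0.1):
    """Accept a single-vertex flip only if it improves w(delta(S)) by a factor
    of more than (1 + eps/n); start from the vertex of maximum weighted degree."""
    n = len(vertices)
    weighted_degree = {v: sum(w for (a, b), w in edges.items() if v in (a, b))
                       for v in vertices}
    S = {max(vertices, key=lambda v: weighted_degree[v])}
    threshold = 1.0 + eps / n
    improved = True
    while improved:
        improved = False
        current = cut_weight(S, edges)  # from the earlier sketch
        for v in vertices:
            T = S - {v} if v in S else S | {v}
            if cut_weight(T, edges) > threshold * current:
                S, improved = T, True
                break
    return S  # approximate local optimum: w(delta(S)) >= w(E) / (2 (1 + eps/4)) by Lemma 8.2 below
```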
Remark 8.3. An alert reader may wonder why the improvement is measured with respect to the global value $w(\delta(S))$ rather than with respect to $w(\delta(v))$. One reason is to illustrate the general idea when one may not have fine-grained information about the function like we do here in the specific case of Max Cut. The global analysis will also play a role in the running time analysis as we will see shortly.
Lemma 8.2. Let $S$ be the output of the modified local search algorithm for Max Cut. Then $w(\delta(S)) \geq \frac{1}{2(1+\epsilon/4)}\, w(E)$.
Proof. As before let $\alpha_v = w(\delta(S) \cap \delta(v))$ and $\beta_v = w(\delta(v)) - \alpha_v$. Since $S$ is an approximate local optimum, we claim that for each $v$
$$\beta_v - \alpha_v \leq \frac{\epsilon}{n}\, w(\delta(S)).$$
Otherwise the local move that flips $v$ would improve $w(\delta(S))$ by a factor of more than $(1+\epsilon/n)$. (The formal proof is left as an exercise to the reader.)
We have,
$$\begin{aligned} w(\delta(S)) &= \frac{1}{2}\sum_{v \in V} \alpha_v \\ &= \frac{1}{2}\sum_{v \in V} \big((\alpha_v+\beta_v) - (\beta_v-\alpha_v)\big)/2 \\ &\geq \frac{1}{4}\sum_{v \in V}\Big(w(\delta(v)) - \frac{\epsilon}{n}\, w(\delta(S))\Big) \\ &\geq \frac{1}{2} w(E) - \frac{1}{4}\sum_{v \in V}\frac{\epsilon}{n}\, w(\delta(S)) \\ &\geq \frac{1}{2} w(E) - \frac{1}{4}\,\epsilon\, w(\delta(S)). \end{aligned}$$
Therefore $(1+\epsilon/4)\, w(\delta(S)) \geq w(E)/2$ and the lemma follows.
Now we argue about the number of iterations of the algorithm.
Lemma 8.3. The modified local search algorithm terminates in $O(\frac{1}{\epsilon} n \log n)$ iterations of the improvement step.
Proof. We observe that $w(\delta(S_0)) = w(\delta(v^*)) \geq \frac{2}{n} w(E)$ (why?). Each local improvement iteration improves $w(\delta(S))$ by a multiplicative factor of more than $(1+\epsilon/n)$. Therefore, if $k$ is the number of iterations that the algorithm performs, then $(1+\epsilon/n)^k\, w(\delta(S_0)) \leq w(\delta(S))$ where $S$ is the final output. However, $w(\delta(S)) \leq w(E)$. Hence
$$(1+\epsilon/n)^k \cdot \frac{2}{n}\, w(E) \leq w(E),$$
which implies that $k = O(\frac{1}{\epsilon} n \log n)$.
A tight example for local optimum: Does the local search algorithm do better than $1/2$? Here we show that a local optimum is no better than a $1/2$-approximation. Consider the complete bipartite graph $K_{2n,2n}$ with $2n$ vertices in each part. If $L$ and $R$ are the two parts, then any set $S$ with $|S \cap L| = n = |S \cap R|$ is a local optimum with $|\delta(S)| = |E|/2$. The optimum value for this instance is $|E|$ (take $S = L$).
Max Directed Cut: A problem related to Max Cut is Max Directed Cut in which we are given a directed edge-weighted graph $G=(V,E)$ and the goal is to find a set $S \subseteq V$ that maximizes $w(\delta^+_G(S))$, that is, the weight of the directed edges leaving $S$. One can apply a local search similar to the one for Max Cut. However, the following example shows that the output $S$ can be arbitrarily bad. Let $G=(V,E)$ be a directed in-star with center $v$ and arcs connecting each of $v_1, \ldots, v_n$ to $v$. Then $S = \{v\}$ is a local optimum with $\delta^+(S) = \emptyset$ while $\mathrm{OPT} = n$. However, a minor tweak to the algorithm gives a $1/3$-approximation! Instead of returning the local optimum $S$, return the better of $S$ and $V \setminus S$. This step is needed because directed cuts are not symmetric.
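The in-star example is easy to check in code. The sketch below (arc weights in a dictionary, my own representation) shows that $S = \{v\}$ has directed cut value $0$ while its complement achieves $n$, which is why returning the better of $S$ and $V \setminus S$ is the right fix.

```python
def dicut_weight(S, arcs):
    """w(delta^+(S)): total weight of arcs (u, v) with u in S and v not in S."""
    return sum(w for (u, v), w in arcs.items() if u in S and v not in S)

# Directed in-star: arcs v_1 -> v, ..., v_n -> v, all with unit weight.
n = 5
center = "v"
leaves = {f"v{i}" for i in range(1, n + 1)}
arcs = {(leaf, center): 1.0 for leaf in leaves}

S = {center}                        # a local optimum for single-vertex flips
print(dicut_weight(S, arcs))        # 0.0
print(dicut_weight(leaves, arcs))   # 5.0 = OPT; the better of S and V \ S
```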

8.2 Local Search for Submodular Function Maximization

In this section we consider the utility of local search for maximizing non-negative submodular functions. Let $f: 2^V \rightarrow \mathbb{R}_+$ be a non-negative submodular set function on a ground set $V$. Recall that $f$ is submodular if $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq V$. Equivalently, $f$ is submodular if $f(A+v) - f(A) \geq f(B+v) - f(B)$ for all $A \subset B$ and $v \notin B$. $f$ is monotone if $f(A) \leq f(B)$ for all $A \subseteq B$. $f$ is symmetric if $f(A) = f(V \setminus A)$ for all $A \subseteq V$. Submodular functions arise in a number of settings in combinatorial optimization. Two important examples are the following.
Example 8.1. Coverage in set systems. Let $S_1, S_2, \ldots, S_n$ be subsets of a set $\mathcal{U}$. Let $V = \{1, 2, \ldots, n\}$ and define $f: 2^V \rightarrow \mathbb{R}_+$ where $f(A) = |\cup_{i \in A} S_i|$. $f$ is a monotone submodular function. One can also associate weights to the elements of $\mathcal{U}$ via a function $w: \mathcal{U} \rightarrow \mathbb{R}_+$; the function $f$ defined as $f(A) = w(\cup_{i \in A} S_i)$ is also monotone submodular.
Example 8.2. Cut functions in graphs. Let $G=(V,E)$ be an undirected graph with non-negative edge weights $w: E \rightarrow \mathbb{R}_+$. The cut function $f: 2^V \rightarrow \mathbb{R}_+$ defined as $f(S) = \sum_{e \in \delta_G(S)} w(e)$ is a symmetric submodular function; it is not monotone unless the graph is trivial. If $G$ is directed and we define $f$ as $f(S) = \sum_{e \in \delta^+_G(S)} w(e)$ then $f$ is submodular but not necessarily symmetric.
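Both examples are naturally presented as value oracles, i.e. plain functions that return $f(A)$ for a query set $A$, which is exactly the access model in Problem 8.2 below. A minimal sketch (the data representations are my own choices):

```python
def coverage_oracle(sets, weights):
    """Weighted coverage function of Example 8.1.
    sets: list of subsets S_1, ..., S_n of the universe (indexed from 0 here);
    weights: dictionary element -> w(e)."""
    def f(A):
        covered = set().union(*(sets[i] for i in A)) if A else set()
        return sum(weights[e] for e in covered)
    return f

def cut_oracle(edges):
    """Undirected cut function of Example 8.2; edges maps (u, v) -> w(e)."""
    def f(A):
        return sum(w for (u, v), w in edges.items() if (u in A) != (v in A))
    return f
```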
The following problem generalizes Max Cut and Max Directed Cut, which we have already seen.
Problem 8.2. Max Submod Func. Given a non-negative submodular set function $f$ on a ground set $V$ via a value oracle[1], find $\max_{S \subseteq V} f(S)$.
Note that if $f$ is monotone then the problem is trivial since $V$ is an optimum solution. Therefore, the problem is interesting (and NP-Hard) only when $f$ is not necessarily monotone. We consider a simple local search algorithm for Max Submod Func and show that it gives a $1/3$-approximation, and a $1/2$-approximation when $f$ is symmetric. This was shown in [2].
Remark 8.4. Given a graph $G=(V,E)$ consider the submodular function $f: 2^V \rightarrow \mathbb{R}$ where $f(S) = |\delta(S)| - B$ for a fixed number $B$. Is there a polynomial time algorithm to decide whether there is a set $S$ such that $f(S) \geq 0$?
LocalSearch for Max Submod Func:
$S \leftarrow \emptyset$
repeat
    If ($\exists v \in V \setminus S$ such that $f(S+v) > f(S)$)
        $S \leftarrow S+v$
    Else If ($\exists v \in S$ such that $f(S-v) > f(S)$)
        $S \leftarrow S-v$
    Else
        $S$ is a local optimum
        return the better of $S$ and $V \setminus S$
    EndIf
Until (True)
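A direct Python transcription of the pseudocode, accessing $f$ only through a value oracle such as the ones sketched after Example 8.2; the scan order over elements is an arbitrary choice of mine.

```python
def local_search_submodular(ground_set, f):
    """Local search for Max Submod Func: add or remove single elements while f
    strictly improves, then return the better of S and V \\ S."""
    V = set(ground_set)
    S = set()
    improved = True
    while improved:
        improved = False
        for v in V - S:                      # try adding an element
            if f(S | {v}) > f(S):
                S, improved = S | {v}, True
                break
        if not improved:
            for v in set(S):                 # try removing an element
                if f(S - {v}) > f(S):
                    S, improved = S - {v}, True
                    break
    return S if f(S) >= f(V - S) else V - S

# Example: maximizing the cut function from Example 8.2 recovers Max Cut local search.
f = cut_oracle({(1, 2): 1.0, (2, 3): 2.0, (1, 3): 1.0})
S = local_search_submodular({1, 2, 3}, f)
```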
We start the analysis of the algorithm with a basic lemma on submodularity.
Lemma 8.4. Let $f: 2^V \rightarrow \mathbb{R}_+$ be a submodular set function on $V$. Let $A \subset B \subseteq V$. Then
  • If $f(B) > f(A)$ then there is an element $v \in B \setminus A$ such that $f(A+v) - f(A) > 0$. More generally, there is an element $v \in B \setminus A$ such that $f(A+v) - f(A) \geq \frac{1}{|B \setminus A|}(f(B) - f(A))$.
  • If $f(A) > f(B)$ then there is an element $v \in B \setminus A$ such that $f(B-v) - f(B) > 0$. More generally, there is an element $v \in B \setminus A$ such that $f(B-v) - f(B) \geq \frac{1}{|B \setminus A|}(f(A) - f(B))$.
Exercise 8.1. Prove the preceding lemma.
We obtain the following corollary.
Corollary 8.3. Let $S$ be a local optimum for the local search algorithm and let $S^*$ be an optimum solution. Then $f(S) \geq f(S \cap S^*)$ and $f(S) \geq f(S \cup S^*)$.
Theorem 8.4. The local search algorithm is a $1/3$-approximation and is a $1/2$-approximation if $f$ is symmetric.
Proof. Let $S$ be the local optimum and $S^*$ be a global optimum for the given instance. From the previous corollary we have that $f(S) \geq f(S \cap S^*)$ and $f(S) \geq f(S \cup S^*)$. Note that the algorithm outputs the better of $S$ and $V \setminus S$. By submodularity, we have
$$f(V \setminus S) + f(S \cup S^*) \geq f(S^* \setminus S) + f(V) \geq f(S^* \setminus S),$$
where we used the non-negativity of $f$ in the second inequality. Putting together the inequalities,
$$\begin{aligned} 2 f(S) + f(V \setminus S) &= f(S) + f(S) + f(V \setminus S) \\ &\geq f(S \cap S^*) + f(S^* \setminus S) \\ &\geq f(S^*) + f(\emptyset) \\ &\geq f(S^*) = \mathrm{OPT}. \end{aligned}$$
Thus $2 f(S) + f(V \setminus S) \geq \mathrm{OPT}$ and hence $\max\{f(S), f(V \setminus S)\} \geq \mathrm{OPT}/3$.
If $f$ is symmetric we argue as follows. Using Lemma 8.4 we claim that $f(S) \geq f(S \cap S^*)$ as before, but also that $f(S) \geq f(S \cup \bar{S}^*)$, where $\bar{A}$ is shorthand notation for the complement $V \setminus A$. Since $f$ is symmetric, $f(S \cup \bar{S}^*) = f(V \setminus (S \cup \bar{S}^*)) = f(\bar{S} \cap S^*) = f(S^* \setminus S)$. Thus,
$$\begin{aligned} 2 f(S) &\geq f(S \cap S^*) + f(S \cup \bar{S}^*) \\ &\geq f(S \cap S^*) + f(S^* \setminus S) \\ &\geq f(S^*) + f(\emptyset) \\ &\geq f(S^*) = \mathrm{OPT}. \end{aligned}$$
Therefore $f(S) \geq \mathrm{OPT}/2$.
The running time of the local search algorithm may not be polynomial, but one can modify the algorithm as we did for Max Cut to obtain a strongly polynomial time algorithm that gives a $(1/3 - o(1))$-approximation (and a $(1/2 - o(1))$-approximation for symmetric $f$). See [2] for more details. There has been much work on submodular function maximization, including work on variants with additional constraints, and local search has been a powerful tool for these problems. See [3], [4], [5] for some of the results based on local search, and [6] for a survey on submodular function maximization.

  1. A value oracle for a set function $f: 2^V \rightarrow \mathbb{R}$ provides access to the function by giving the value $f(A)$ when presented with the set $A$.
  2. Uriel Feige, Vahab S. Mirrokni, and Jan Vondrák. "Maximizing nonmonotone submodular functions". In: SIAM Journal on Computing 40.4 (2011), pp. 1133–1153.
  3. Simon Bruggmann and Rico Zenklusen. "Submodular maximization through the lens of linear programming". In: Mathematics of Operations Research 44.4 (2019), pp. 1221–1244.
  4. Moran Feldman, Joseph Seffi Naor, Roy Schwartz, and Justin Ward. "Improved approximations for k-exchange systems". In: European Symposium on Algorithms. Springer, 2011, pp. 784–798.
  5. Jon Lee, Maxim Sviridenko, and Jan Vondrák. "Matroid matching: the power of local search". In: SIAM Journal on Computing 42.1 (2013), pp. 357–379.
  6. Niv Buchbinder and Moran Feldman. "Submodular functions maximization problems". In: Handbook of Approximation Algorithms and Metaheuristics, Second Edition. Chapman and Hall/CRC, 2018, pp. 753–788.
