These are lecture notes from Chandra Chekuri's CS583 course on Approximation Algorithms. Chapter 8: Introduction to Local Search.
Chapter 8
Local search is a powerful and widely used heuristic method (with various extensions). In this lecture we introduce this technique in the context of approximation algorithms. The basic outline of local search is as follows. For an instance $I$ of a given problem let $\mathcal{S}(I)$ denote the set of feasible solutions for $I$. For a solution $S$ we use the term (local) neighborhood of $S$ for the set of all solutions $S'$ such that $S'$ can be obtained from $S$ via some local moves. We let $N(S)$ denote the neighborhood of $S$.
LocalSearch:
    Find a "good" initial solution $S_0 \in \mathcal{S}(I)$
    $S \leftarrow S_0$
    repeat
        If ($\exists S' \in N(S)$ such that $\operatorname{val}(S')$ is strictly better than $\operatorname{val}(S)$)
            $S \leftarrow S'$
        Else
            $S$ is a local optimum
            return $S$
        EndIf
    Until (True)
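The template above can be sketched in code. In this minimal sketch the names `initial_solution`, `neighbors`, and `val` are our own placeholders for the problem-specific ingredients, not names from the notes; `maximize` selects the direction of improvement.

```python
# Generic local search skeleton: repeatedly move to a strictly better
# neighbor; return the current solution once it is a local optimum.
def local_search(initial_solution, neighbors, val, maximize=True):
    def better(a, b):
        return a > b if maximize else a < b

    S = initial_solution
    while True:
        improved = False
        for Sp in neighbors(S):          # assumes N(S) can be enumerated
            if better(val(Sp), val(S)):  # strict improvement
                S = Sp
                improved = True
                break
        if not improved:
            return S                     # S is a local optimum
```

For instance, maximizing $-(x-3)^2$ over the integers with neighborhood $\{x-1, x+1\}$ from the start point $0$ climbs to the (here global) local optimum $x = 3$.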
For minimization problems $S'$ is strictly better than $S$ if $\operatorname{val}(S') < \operatorname{val}(S)$, whereas for maximization problems it is the case if $\operatorname{val}(S') > \operatorname{val}(S)$.
The running time of the generic local search algorithm depends on several factors. First, we need an algorithm that, given a solution $S$, either declares that $S$ is a local optimum or finds a solution $S' \in N(S)$ such that $\operatorname{val}(S')$ is strictly better than $\operatorname{val}(S)$. A standard and easy approach is to define the local moves in such a way that $|N(S)|$ is polynomial in the input size $|I|$ and $N(S)$ can be enumerated efficiently; one can then check each $S' \in N(S)$ to see whether any of them improves on $S$. However, in some more advanced settings, $N(S)$ may be exponential in the input size, yet one may still be able to find an improving solution $S' \in N(S)$ in polynomial time. Second, the running time depends on the number of iterations it takes to go from $S_0$ to a local optimum. In the worst case the number of iterations could be $|\mathrm{OPT} - \operatorname{val}(S_0)|$, which can be exponential in the input size. One can often use a standard scaling trick to overcome this issue: stop the algorithm unless the improvement over the current $S$ is a significant fraction of $\operatorname{val}(S)$. Finally, the quality of the initial solution $S_0$ also factors into the running time.
Remark 8.1. There is a distinction made sometimes in the literature between oblivious and non-oblivious local search. In oblivious local search the algorithm uses $f$ as a black box when comparing $S$ with a candidate solution $S'$; the analysis and the definition of local moves are typically based on properties of $f$. In non-oblivious local search one may use an auxiliary function $g$ derived from $f$ when comparing $S$ with $S'$. Typically the function $g$ is some kind of potential function that aids the analysis. This allows one to move to a solution $S'$ even though $f(S')$ may not be an improvement over $f(S)$.
Remark 8.2. There are several heuristics that are (loosely) connected to local search. These include simulated annealing, tabu search, genetic algorithms and others. These fall under the broad terminology of metaheuristics.
8.1 Local Search for Max Cut
We illustrate local search for the well-known Max Cut problem. In Max Cut we are given an undirected graph $G=(V,E)$ and the goal is to partition $V$ into $(S, V \setminus S)$ so as to maximize the number of edges crossing $S$, that is, $|\delta_G(S)|$; we will use $\delta(S)$ when $G$ is clear. For a vertex $v$ we use $\delta(v)$ instead of $\delta(\{v\})$ to simplify notation. In the weighted version each edge $e$ has a non-negative weight $w(e)$ and the goal is to maximize the weight of the edges crossing $S$, that is, $w(\delta(S))$; here $w(A)$ for a set $A$ of edges denotes the quantity $\sum_{e \in A} w(e)$.
We consider a simple local search algorithm for Max Cut that starts with an arbitrary set $S \subseteq V$ and in each iteration either adds a vertex to $S$ or removes a vertex from $S$, as long as the move improves the cut value $w(\delta(S))$.
LocalSearch for Max Cut:
    $S \leftarrow \emptyset$
    repeat
        If ($\exists v \in V \setminus S$ such that $w(\delta(S+v)) > w(\delta(S))$)
            $S \leftarrow S+v$
        Else If ($\exists v \in S$ such that $w(\delta(S-v)) > w(\delta(S))$)
            $S \leftarrow S-v$
        Else
            $S$ is a local optimum
            return $S$
        EndIf
    Until (True)
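The algorithm above is short enough to write out directly. This is a minimal sketch assuming the graph is given as a list of weighted edges `(u, v, w)`; the helper names are ours, not from the notes.

```python
# Weight of the cut (S, V \ S): sum of weights of edges with one endpoint in S.
def cut_weight(edges, S):
    return sum(w for (u, v, w) in edges if (u in S) != (v in S))

# Local search for Max Cut: add or remove a single vertex while it strictly
# improves the cut weight; stop at a local optimum.
def local_search_max_cut(vertices, edges):
    S = set()  # start from the empty cut, as in the pseudocode
    while True:
        base = cut_weight(edges, S)
        move = None
        for v in vertices:
            T = S - {v} if v in S else S | {v}
            if cut_weight(edges, T) > base:
                move = T
                break
        if move is None:
            return S  # local optimum: no single-vertex move improves the cut
        S = move
```

On a 4-cycle with unit weights the algorithm finds a cut of weight $4 = |E|$, comfortably meeting the $w(E)/2$ guarantee proved below.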
We will first focus on the quality of the solution output by the local search algorithm.
Lemma 8.1. Let $S$ be the output of the local search algorithm. Then for each vertex $v$, $w(\delta(S) \cap \delta(v)) \geq w(\delta(v))/2$.
Proof. Let $\alpha_v = w(\delta(S) \cap \delta(v))$ be the weight of the edges incident to $v$ that cross the cut $S$, and let $\beta_v = w(\delta(v)) - \alpha_v$.

We claim that $\alpha_v \geq \beta_v$ for each $v$. If $v \in V \setminus S$ and $\alpha_v < \beta_v$ then moving $v$ to $S$ would strictly increase $w(\delta(S))$, so $S$ cannot be a local optimum. Similarly, if $v \in S$ and $\alpha_v < \beta_v$, we would have $w(\delta(S-v)) > w(\delta(S))$ and $S$ would not be a local optimum.
Corollary 8.1. If $S$ is a local optimum then $w(\delta(S)) \geq w(E)/2 \geq \mathrm{OPT}/2$.
Proof. Since each edge is incident to exactly two vertices we have $w(\delta(S)) = \frac{1}{2} \sum_{v \in V} w(\delta(S) \cap \delta(v))$. Applying the above lemma, $w(\delta(S)) \geq \frac{1}{2} \sum_{v \in V} w(\delta(v))/2 = \frac{1}{4} \cdot 2w(E) = w(E)/2$.
The running time of the local search algorithm depends on the number of local improvement iterations; checking whether there is a local move that results in an improvement can be done by trying all possible vertices. If the graph is unweighted then the algorithm terminates in at most $|E|$ iterations. However, in the weighted case, it is known that the algorithm can take exponential time in $|V|$ when the weights are large. A very interesting question is whether one can find a local optimum efficiently; note that we do not need to find a local optimum by doing local search! It is believed that finding a local optimum for Max Cut in the weighted case is hard. There is a complexity class called PLS, defined by Johnson, Papadimitriou and Yannakakis, for which Max Cut is known to be a complete problem. PLS has played an important role in algorithmic game theory in recent times. We refer the reader to the Wikipedia page on PLS for many pointers.
Many local search algorithms can be modified slightly to terminate with an approximate local optimum such that (i) the running time of the modified algorithm is strongly polynomial in the input size and (ii) the quality of the solution is very similar to that given by the original local search. We illustrate these ideas for Max Cut. Consider the following algorithm, where $\epsilon > 0$ is a parameter that can be chosen. Let $n$ be the number of nodes in $G$.
Modified LocalSearch for Max Cut($\epsilon$):
    $S \leftarrow \{v^*\}$ where $v^* = \arg\max_{v \in V} w(\delta(v))$
    repeat
        If ($\exists v \in V \setminus S$ such that $w(\delta(S+v)) > (1+\frac{\epsilon}{n}) w(\delta(S))$)
            $S \leftarrow S+v$
        Else If ($\exists v \in S$ such that $w(\delta(S-v)) > (1+\frac{\epsilon}{n}) w(\delta(S))$)
            $S \leftarrow S-v$
        Else
            return $S$
        EndIf
    Until (True)
The above algorithm makes a local move only if it improves the current solution's value by a relative factor of more than $(1+\frac{\epsilon}{n})$; it terminates once no such move exists. Thus the final output $S$ is an approximate local optimum.
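The modification is a one-line change to the improvement test, plus the greedy choice of the initial vertex. A sketch in code (graph as a list of `(u, v, w)` edges; helper names are ours):

```python
# Weight of the cut (S, V \ S).
def cut_weight(edges, S):
    return sum(w for (u, v, w) in edges if (u in S) != (v in S))

# Modified local search: a move must beat the current value by a factor of
# (1 + eps/n); start from the single vertex of maximum incident weight.
def modified_local_search_max_cut(vertices, edges, eps=0.5):
    n = len(vertices)
    # S_0 = {v*} where v* maximizes w(delta(v))
    start = max(vertices, key=lambda v: sum(w for (a, b, w) in edges if v in (a, b)))
    S = {start}
    while True:
        base = cut_weight(edges, S)
        move = None
        for v in vertices:
            T = S - {v} if v in S else S | {v}
            if cut_weight(edges, T) > (1 + eps / n) * base:
                move = T
                break
        if move is None:
            return S  # approximate local optimum
        S = move
```

On small instances it typically returns the same answer as the unmodified algorithm, but the thresholded test is what bounds the number of iterations, as Lemma 8.3 shows.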
Remark 8.3. An alert reader may wonder why the improvement is measured with respect to the global value $w(\delta(S))$ rather than with respect to $w(\delta(v))$. One reason is to illustrate the general idea when one may not have fine-grained information about the function like we do here in the specific case of Max Cut. The global analysis will also play a role in the running time analysis, as we will see shortly.
Lemma 8.2. Let $S$ be the output of the modified local search algorithm for Max Cut. Then $w(\delta(S)) \geq \frac{1}{2(1+\epsilon/4)} w(E)$.
Proof. As before let $\alpha_v = w(\delta(S) \cap \delta(v))$ and $\beta_v = w(\delta(v)) - \alpha_v$. Since $S$ is an approximate local optimum, we claim that for each $v$
$$\alpha_v \geq \beta_v - \frac{\epsilon}{n} w(\delta(S)).$$
Otherwise a local move using $v$ would improve $w(\delta(S))$ by more than a $(1+\epsilon/n)$ factor. (The formal proof is left as an exercise to the reader.)
Summing over all $v$, and using $\sum_{v} \alpha_v = 2w(\delta(S))$ and $\sum_{v} (\alpha_v + \beta_v) = 2w(E)$, we obtain $2w(\delta(S)) \geq 2w(E) - 2w(\delta(S)) - \epsilon\, w(\delta(S))$. Therefore $w(\delta(S))(1+\epsilon/4) \geq w(E)/2$ and the lemma follows.
Now we argue about the number of iterations of the algorithm.
Lemma 8.3. The modified local search algorithm terminates in $O(\frac{1}{\epsilon} n \log n)$ iterations of the improvement step.
Proof. We observe that $w(\delta(S_0)) = w(\delta(v^*)) \geq \frac{2}{n} w(E)$ (why?). Each local improvement iteration increases $w(\delta(S))$ by a multiplicative factor of more than $(1+\epsilon/n)$. Therefore if $k$ is the number of iterations taken by the algorithm, then $(1+\epsilon/n)^k w(\delta(S_0)) \leq w(\delta(S))$ where $S$ is the final output. However, $w(\delta(S)) \leq w(E)$. Hence
$$(1+\epsilon/n)^k \cdot \frac{2}{n} w(E) \leq w(E),$$
which implies that $k = O(\frac{1}{\epsilon} n \log n)$.
A tight example for local optimum: Does the local search algorithm do better than $1/2$? Here we show that a local optimum can be no better than a $1/2$-approximation. Consider the complete bipartite graph $K_{2n,2n}$ with $2n$ vertices in each part $L$ and $R$. Any set $S$ with $|S \cap L| = n = |S \cap R|$ is a local optimum with $|\delta(S)| = |E|/2$, whereas the optimum value for this instance is $|E|$ (take $S = L$).
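The tight example is easy to verify numerically. This sketch builds $K_{4,4}$ (the case $n=2$) with unit weights and checks that a half-and-half set cuts exactly half the edges and admits no improving single-vertex move:

```python
from itertools import product

# K_{2n,2n} for n = 2: parts L and R of size 2n each, all L-R edges, weight 1.
n = 2
L = [("L", i) for i in range(2 * n)]
R = [("R", i) for i in range(2 * n)]
edges = [(u, v, 1) for u, v in product(L, R)]

def cut_weight(S):
    return sum(w for (u, v, w) in edges if (u in S) != (v in S))

# A set with n vertices from each side cuts exactly half the edges...
S = set(L[:n]) | set(R[:n])
assert cut_weight(S) == len(edges) // 2
# ...and no single add/remove improves it, so S is a local optimum.
for v in L + R:
    T = S - {v} if v in S else S | {v}
    assert cut_weight(T) <= cut_weight(S)
```

Every single-vertex move here leaves the cut value unchanged (a vertex gains and loses the same number of crossing edges), which is exactly why local search gets stuck at $|E|/2$.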
Max Directed Cut: A problem related to Max Cut is Max Directed Cut in which we are given a directed edge-weighted graph $G=(V,E)$ and the goal is to find a set $S \subseteq V$ that maximizes $w(\delta_G^+(S))$; that is, the weight of the directed edges leaving $S$. One can apply a local search similar to the one for Max Cut. However, the following example shows that the output $S$ can be arbitrarily bad. Let $G=(V,E)$ be a directed in-star with center $v$ and arcs connecting each of $v_1, \ldots, v_n$ to $v$. Then $S = \{v\}$ is a local optimum with $\delta^+(S) = \emptyset$ while $\mathrm{OPT} = n$. However, a minor tweak to the algorithm gives a $1/3$-approximation! Instead of returning the local optimum $S$, return the better of $S$ and $V \setminus S$. This step is needed because directed cuts are not symmetric.
8.2 Local Search for Submodular Function Maximization
In this section we consider the utility of local search for maximizing non-negative submodular functions. Let $f: 2^V \rightarrow \mathbb{R}_+$ be a non-negative submodular set function on a ground set $V$. Recall that $f$ is submodular if $f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$ for all $A, B \subseteq V$. Equivalently, $f$ is submodular if $f(A+v) - f(A) \geq f(B+v) - f(B)$ for all $A \subset B$ and $v \notin B$. $f$ is monotone if $f(A) \leq f(B)$ for all $A \subseteq B$. $f$ is symmetric if $f(A) = f(V \setminus A)$ for all $A \subseteq V$. Submodular functions arise in a number of settings in combinatorial optimization. Two important examples are the following.
Example 8.1. Coverage in set systems. Let $S_1, S_2, \ldots, S_n$ be subsets of a set $\mathcal{U}$. Let $V = \{1, 2, \ldots, n\}$ and define $f: 2^V \rightarrow \mathbb{R}_+$ by $f(A) = |\cup_{i \in A} S_i|$. $f$ is a monotone submodular function. One can also associate weights with the elements of $\mathcal{U}$ via a function $w: \mathcal{U} \rightarrow \mathbb{R}_+$; the function $f$ defined as $f(A) = w(\cup_{i \in A} S_i)$ is also monotone submodular.
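Submodularity of coverage can be sanity-checked by brute force on a toy instance (the specific sets below are our own illustration, not from the notes):

```python
from itertools import combinations

# A tiny set system over ground set V = {1, 2, 3}.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}

# Coverage function: f(A) = size of the union of the chosen sets.
def f(A):
    return len(set().union(*(sets[i] for i in A)))

# Check f(A) + f(B) >= f(A | B) + f(A & B) over all pairs of subsets.
V = list(sets)
subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
for A in subsets:
    for B in subsets:
        assert f(A) + f(B) >= f(A | B) + f(A & B)
```

Of course an exhaustive check on one instance is an illustration and not a proof; the general argument follows from the diminishing-returns characterization of submodularity.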
Example 8.2. Cut functions in graphs. Let $G=(V,E)$ be an undirected graph with non-negative edge weights $w: E \rightarrow \mathbb{R}_+$. The cut function $f: 2^V \rightarrow \mathbb{R}_+$ defined as $f(S) = \sum_{e \in \delta_G(S)} w(e)$ is a symmetric submodular function; it is not monotone unless the graph is trivial. If $G$ is directed and we define $f$ as $f(S) = \sum_{e \in \delta_G^+(S)} w(e)$ then $f$ is submodular but not necessarily symmetric.
The following problem generalizes Max Cut and Max Directed Cut that we have already seen.
Problem 8.2. Max Submod Func. Given a non-negative submodular set function $f$ on a ground set $V$ via a value oracle[1], find $\max_{S \subseteq V} f(S)$.
Note that if $f$ is monotone then the problem is trivial since $V$ is an optimum solution. Therefore the problem is interesting (and NP-Hard) only when $f$ is not necessarily monotone. We consider a simple local search algorithm for Max Submod Func and show that it gives a $1/3$-approximation, and a $1/2$-approximation when $f$ is symmetric. This was shown in [2].
Remark 8.4. Given a graph $G=(V,E)$ consider the submodular function $f: 2^V \rightarrow \mathbb{R}$ where $f(S) = |\delta(S)| - B$ for a fixed number $B$. Is there a polynomial time algorithm to decide whether there is a set $S$ such that $f(S) \geq 0$?
LocalSearch for Max Submod Func:
    $S \leftarrow \emptyset$
    repeat
        If ($\exists v \in V \setminus S$ such that $f(S+v) > f(S)$)
            $S \leftarrow S+v$
        Else If ($\exists v \in S$ such that $f(S-v) > f(S)$)
            $S \leftarrow S-v$
        Else
            $S$ is a local optimum
            return the better of $S$ and $V \setminus S$
        EndIf
    Until (True)
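A sketch of the algorithm above in code, where `f` is the value oracle (any callable on sets) and `V` a list of ground-set elements; the function name is ours:

```python
# Local search for Max Submod Func: add/remove a single element while it
# strictly improves f; at a local optimum return the better of S and V \ S.
def local_search_submod(V, f):
    S = set()
    while True:
        move = None
        for v in V:
            T = S - {v} if v in S else S | {v}
            if f(T) > f(S):
                move = T
                break
        if move is None:
            comp = set(V) - S
            return S if f(S) >= f(comp) else comp  # better of S and V \ S
        S = move
```

Run with a directed-cut oracle on the in-star from the previous section, the final comparison with the complement is what recovers a good solution when the local optimum itself is poor.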
We start the analysis of the algorithm with a basic lemma on submodularity.
Lemma 8.4. Let $f: 2^V \rightarrow \mathbb{R}_+$ be a submodular set function on $V$ and let $A \subset B \subseteq V$. Then:
- If $f(B) > f(A)$ then there is an element $v \in B \setminus A$ such that $f(A+v) - f(A) > 0$. More generally there is an element $v \in B \setminus A$ such that $f(A+v) - f(A) \geq \frac{1}{|B \setminus A|}(f(B) - f(A))$.
- If $f(A) > f(B)$ then there is an element $v \in B \setminus A$ such that $f(B-v) - f(B) > 0$. More generally there is an element $v \in B \setminus A$ such that $f(B-v) - f(B) \geq \frac{1}{|B \setminus A|}(f(A) - f(B))$.
Exercise 8.1. Prove the preceding lemma.
We obtain the following corollary.
Corollary 8.3. Let $S$ be a local optimum of the local search algorithm and let $S^*$ be an optimum solution. Then $f(S) \geq f(S \cap S^*)$ and $f(S) \geq f(S \cup S^*)$.
Theorem 8.4. The local search algorithm is a $1/3$-approximation, and a $1/2$-approximation if $f$ is symmetric.
Proof. Let $S$ be the local optimum and $S^*$ be a global optimum for the given instance. From the previous corollary we have $f(S) \geq f(S \cap S^*)$ and $f(S) \geq f(S \cup S^*)$. Note that the algorithm outputs the better of $S$ and $V \setminus S$. By submodularity, we have
$$2f(S) + f(V \setminus S) \geq f(S \cap S^*) + f(S \cup S^*) + f(V \setminus S) \geq f(S \cap S^*) + f(S^* \setminus S) + f(V) \geq f(S^*) = \mathrm{OPT},$$
where the second inequality applies submodularity to $S \cup S^*$ and $V \setminus S$ (whose union is $V$ and intersection is $S^* \setminus S$), and the last applies it to $S \cap S^*$ and $S^* \setminus S$ together with non-negativity. Hence the better of $S$ and $V \setminus S$ has value at least $\mathrm{OPT}/3$.
If $f$ is symmetric we argue as follows. Using Lemma 8.4 we claim that $f(S) \geq f(S \cap S^*)$ as before, but also that $f(S) \geq f(S \cup \bar{S}^*)$ where $\bar{A}$ is shorthand for the complement $V \setminus A$. Since $f$ is symmetric, $f(S \cup \bar{S}^*) = f(V \setminus (S \cup \bar{S}^*)) = f(\bar{S} \cap S^*) = f(S^* \setminus S)$. Thus
$$2f(S) \geq f(S \cap S^*) + f(S^* \setminus S) \geq f(S^*) + f(\emptyset) \geq \mathrm{OPT},$$
and hence $f(S) \geq \mathrm{OPT}/2$.
The running time of the local search algorithm may not be polynomial, but one can modify the algorithm as we did for Max Cut to obtain a strongly polynomial time algorithm that gives a $(1/3 - o(1))$-approximation (a $(1/2 - o(1))$-approximation for symmetric $f$). See [2] for more details. There has been much work on submodular function maximization, including work on variants with additional constraints. Local search has been a powerful tool for these problems. See [3], [4], [5] for some of the results on local search based methods, and [6] for a survey on submodular function maximization.
[1] A value oracle for a set function $f: 2^V \rightarrow \mathbb{R}$ provides access to the function by giving the value $f(A)$ when presented with the set $A$.
[2] Uriel Feige, Vahab S. Mirrokni, and Jan Vondrák. "Maximizing non-monotone submodular functions". In: SIAM Journal on Computing 40.4 (2011), pp. 1133–1153.
[3] Simon Bruggmann and Rico Zenklusen. "Submodular maximization through the lens of linear programming". In: Mathematics of Operations Research 44.4 (2019), pp. 1221–1244.
[4] Moran Feldman, Joseph Seffi Naor, Roy Schwartz, and Justin Ward. "Improved approximations for k-exchange systems". In: European Symposium on Algorithms. Springer, 2011, pp. 784–798.
[5] Jon Lee, Maxim Sviridenko, and Jan Vondrák. "Matroid matching: the power of local search". In: SIAM Journal on Computing 42.1 (2013), pp. 357–379.
[6] Niv Buchbinder and Moran Feldman. "Submodular functions maximization problems". In: Handbook of Approximation Algorithms and Metaheuristics, Second Edition. Chapman and Hall/CRC, 2018, pp. 753–788.