Approximation Algorithms: Load Balancing and Bin Packing

These are lecture notes from Chandra Chekuri's CS583 course on Approximation Algorithms. Chapter 5: Load Balancing and Bin Packing.
Chapter 5
This chapter is based on notes first scribed by Rachit Agarwal.
In the last lecture, we studied the Knapsack problem. The Knapsack problem is NP-hard but admits a pseudo-polynomial time algorithm: it can be solved efficiently when the numbers in the input are small. We used this pseudo-polynomial time algorithm to obtain an FPTAS for Knapsack. In this lecture, we study another class of problems, known as strongly NP-hard problems.
Definition 5.1 (Strongly NP-hard Problems). An NPO problem $\Pi$ is said to be strongly NP-hard if it is NP-hard even when the numbers in the input are polynomially bounded in the combinatorial size of the problem[1].
Many NP-hard problems are in fact strongly NP-hard. If a problem $\Pi$ is strongly NP-hard, then $\Pi$ does not admit a pseudo-polynomial time algorithm unless P = NP. We study two such problems in this lecture: Multiprocessor Scheduling and Bin Packing.

5.1 Load Balancing / MultiProcessor Scheduling

A central problem in scheduling theory is to design a schedule such that the finishing time of the last job (also called the makespan) is minimized. This problem is often referred to as Load Balancing, Minimum Makespan Scheduling, or Multiprocessor Scheduling.

5.1.1 Problem Description

In the Multiprocessor Scheduling problem, we are given $m$ identical machines $M_1, \ldots, M_m$ and $n$ jobs $J_1, J_2, \ldots, J_n$. Job $J_i$ has a processing time $p_i \geq 0$ and the goal is to assign jobs to the machines so as to minimize the maximum load[2].

5.1.2 Greedy Algorithm

Consider the following greedy algorithm for the Multiprocessor Scheduling problem which we will call Greedy Multiprocessor Scheduling.
Greedy Multiprocessor Scheduling:
Order (list) the jobs arbitrarily
For $i = 1$ to $n$ do
    Assign job $J_i$ to the machine with the least current load
    Update the load of the machine that receives job $J_i$
This algorithm is also called a list scheduling algorithm, following the terminology of Graham's 1966 paper [3]. The list is the order in which the jobs are processed, and changing it yields different schedules. We prove that the Greedy Multiprocessor Scheduling algorithm gives a $(2 - 1/m)$-approximation.
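List scheduling can be implemented in $O(n \log m)$ time with a min-heap of machine loads. The following is a minimal sketch in Python (our own code and names, not from the notes):

```python
import heapq

def list_schedule(jobs, m):
    """Greedy list scheduling: assign each job (in list order) to the
    currently least-loaded of m identical machines; return the makespan."""
    loads = [0.0] * m              # min-heap of machine loads
    heapq.heapify(loads)
    for p in jobs:
        least = heapq.heappop(loads)   # machine with least current load
        heapq.heappush(loads, least + p)
    return max(loads)
```

For example, with $m = 3$, six unit jobs followed by one job of length 3 give makespan 5, while the reverse order gives makespan 3.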
Theorem 5.2. The Greedy Multiprocessor Scheduling algorithm gives a $\left(2-\frac{1}{m}\right)$-approximation for any list.
To prove the theorem, we will make use of two lower bounds on the length of the optimal schedule, which we denote by $\mathrm{OPT}$.
Observation 5.3. $\mathrm{OPT} \geq \frac{\sum_i p_i}{m}$.
Observation 5.4. $\mathrm{OPT} \geq \max_i p_i$.
We leave the proofs of the observations as easy exercises.
Proof of Theorem 5.2. Fix the list and let $L$ denote the makespan of the Greedy Multiprocessor Scheduling algorithm. Let $L_i$ denote the load of machine $M_i$ and let $M_{i^*}$ be the most heavily loaded machine in the schedule produced by the algorithm.
Let $J_k$ be the last job assigned to $M_{i^*}$. Since the algorithm assigns each job to the machine that is least loaded, every machine must have load at least $L - p_k$ at the time $J_k$ is assigned. Hence, we have:
$$\left(\sum_{i=1}^{n} p_i\right) - p_k \geq m(L - p_k) \tag{5.1}$$
which implies:
$$L - p_k \leq \frac{\left(\sum_{i=1}^{n} p_i\right) - p_k}{m}$$
hence
$$L \leq \frac{\sum_{i=1}^{n} p_i}{m} + p_k\left(1 - \frac{1}{m}\right) \leq \mathrm{OPT} + \mathrm{OPT}\left(1 - \frac{1}{m}\right) = \mathrm{OPT}\left(2 - \frac{1}{m}\right)$$
where the step bounding the two terms uses the two lower bounds on $\mathrm{OPT}$.
The above analysis is tight, i.e., there exist instances where the greedy algorithm produces a schedule whose makespan is $(2-1/m)$ times the optimal. Consider the following instance: $m(m-1)$ jobs with unit processing time and a single job with processing time $m$. If the greedy algorithm schedules all the short jobs before the long job, the makespan of the resulting schedule is $(2m-1)$, while the optimal makespan is $m$. Hence the algorithm gives a schedule whose makespan is $2-1/m$ times the optimal.
It may seem from the tight example above that an approximation ratio $\alpha < (2-1/m)$ could be achieved if the jobs are sorted before processing, which indeed is the case. The following algorithm, due to [4], sorts the jobs in decreasing order of processing time prior to running the Greedy Multiprocessor Scheduling algorithm.
Modified Greedy Multiprocessor Scheduling:
Sort the jobs in decreasing order of processing times
For $i = 1$ to $n$ do
    Assign job $J_i$ to the machine with the least current load
    Update the load of the machine that receives job $J_i$
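The sorted variant (often called LPT, Longest Processing Time first) can be sketched as follows, again with a min-heap of loads (our own code, not from the notes):

```python
import heapq

def lpt_schedule(jobs, m):
    """Longest Processing Time first: sort jobs in decreasing order of
    processing time, then run greedy list scheduling; return the makespan."""
    loads = [0.0] * m
    heapq.heapify(loads)
    for p in sorted(jobs, reverse=True):
        least = heapq.heappop(loads)   # least-loaded machine
        heapq.heappush(loads, least + p)
    return max(loads)
```

On the tight instance for the arbitrary-order greedy ($m(m-1)$ unit jobs plus one job of length $m$), LPT schedules the long job first and achieves the optimal makespan $m$.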
Graham [4] proved the following tight bound.
Theorem 5.5. The Modified Greedy Multiprocessor Scheduling algorithm gives a $(4/3 - 1/(3m))$-approximation for the Multiprocessor Scheduling problem.
We will not prove the preceding theorem, which requires some careful case analysis. Instead we will show how one can obtain an easier bound of $3/2$ via the following claim.
Claim 5.1.1. Suppose $p_i \geq p_j$ for all $i < j$ and $n > m$. Then $\mathrm{OPT} \geq p_m + p_{m+1}$.
Proof. Since $n > m$ and the processing times are sorted in decreasing order, some two of the $(m+1)$ largest jobs must be scheduled on the same machine. Notice that the load of this machine is at least $p_m + p_{m+1}$.
Exercise 5.1. Prove that Modified Greedy Multiprocessor Scheduling gives a $(3/2 - 1/(2m))$-approximation using the preceding claim and the other two lower bounds on $\mathrm{OPT}$ that we have seen already.
Before describing a PTAS for the Multiprocessor Scheduling problem, we discuss the case when the processing times of the jobs are bounded from above.
Claim 5.1.2. If $p_i \leq \epsilon \cdot \mathrm{OPT}$ for all $i$, then Modified Greedy Multiprocessor Scheduling gives a $(1+\epsilon)$-approximation.

5.1.3 A PTAS for Multiprocessor Scheduling

We will now give a PTAS for the problem of scheduling jobs on identical machines. We would like to use the same set of ideas that were used for the Knapsack problem (see Lecture 4): that is, given a target time $T$, we round the job lengths and use dynamic programming to see if the rounded jobs fit within time $T$. The unrounded job lengths should then fit within time $T(1+\epsilon)$.
Big Jobs, Small Jobs and Rounding Big Jobs: For the discussion that follows, we assume that all the processing times have been scaled so that $\mathrm{OPT} = 1$ and hence $p_{\max} \leq 1$.
Given all the jobs, we partition them into two sets: big jobs and small jobs. We call a job $J_i$ "big" if $p_i \geq \epsilon$. Let $\mathcal{B}$ and $\mathcal{S}$ denote the set of big jobs and small jobs respectively, i.e., $\mathcal{B} = \{J_i : p_i \geq \epsilon\}$ and $\mathcal{S} = \{J_i : p_i < \epsilon\}$. The significance of this partition is that once we pack the jobs in $\mathcal{B}$, the jobs in $\mathcal{S}$ can be packed greedily using list scheduling.
Claim 5.1.3. If there is an assignment of the jobs in $\mathcal{B}$ to the machines with load $L$, then greedily scheduling the jobs of $\mathcal{S}$ on top gives a schedule of value no greater than $\max\{L, (1+\epsilon)\,\mathrm{OPT}\}$.
Proof. Consider scheduling the jobs in $\mathcal{S}$ after all the jobs in $\mathcal{B}$ have been scheduled (with load $L$). If all the jobs in $\mathcal{S}$ finish processing by time $L$, the total load is clearly no greater than $L$.
Otherwise, consider the last job to finish after the small jobs have been scheduled, and suppose this job starts at time $T'$. All the machines must be fully loaded up to time $T'$, which gives $\mathrm{OPT} \geq T'$. Since $p_i \leq \epsilon \cdot \mathrm{OPT}$ for every job in $\mathcal{S}$, this job finishes by time $T' + \epsilon \cdot \mathrm{OPT}$. Hence the makespan is at most $T' + \epsilon \cdot \mathrm{OPT} \leq (1+\epsilon)\,\mathrm{OPT}$, settling the claim.
Scheduling Big Jobs: We concentrate on scheduling the jobs in $\mathcal{B}$. We round the sizes of all jobs in $\mathcal{B}$ using geometrically increasing interval sizes via the following procedure:
Rounding Jobs:
For each job $i$ do
    If $p_i \in (\epsilon(1+\epsilon)^j, \epsilon(1+\epsilon)^{j+1}]$
        Set $p_i = \epsilon(1+\epsilon)^{j+1}$
Let $\mathcal{B}'$ be the set of rounded jobs.
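The rounding step can be sketched as follows, assuming (as in the text) that sizes are scaled so that $\epsilon \leq p_i \leq 1$; the function name is our own:

```python
import math

def round_big_job(p, eps):
    """Round p in (eps*(1+eps)^j, eps*(1+eps)^(j+1)] up to eps*(1+eps)^(j+1)."""
    if p <= eps:
        return eps                                  # p == eps stays at eps
    # smallest j with eps*(1+eps)^j >= p
    j = math.ceil(math.log(p / eps, 1 + eps))
    return eps * (1 + eps) ** j
```

By construction each rounded size is at least the original and at most a $(1+\epsilon)$ factor larger, and only $O(\log_{1+\epsilon}(1/\epsilon))$ distinct values occur.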
Claim 5.1.4. If the jobs in $\mathcal{B}$ can be scheduled with load 1, then the rounded jobs in $\mathcal{B}'$ can be scheduled with load $(1+\epsilon)$.
Claim 5.1.5. The number of distinct big job sizes after rounding is $O\left(\frac{\ln(1/\epsilon)}{\epsilon}\right)$.
Proof. Notice that due to scaling, we have $p_i \leq 1$ for all jobs $J_i$. Since the job sizes lie between $\epsilon$ and 1, the number of geometric powers of $(1+\epsilon)$ required is $k$ where
$$\epsilon(1+\epsilon)^k \leq 1 \;\Rightarrow\; k \leq \log_{1+\epsilon}\frac{1}{\epsilon} = O\left(\frac{\ln(1/\epsilon)}{\epsilon}\right).$$
Lemma 5.1. If the number of distinct job sizes is $k$, then there is an exact algorithm that returns a schedule (if one exists) and runs in time $O(n^{2k})$.
Proof. Use Dynamic Programming.
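One way to realize this dynamic program is over vectors of remaining job counts per distinct size: a state has at most $(n+1)^k$ possibilities, which gives the $n^{O(k)}$ bound. The sketch below (our own code and names) checks whether the jobs fit on $m$ machines with load at most $T$ each:

```python
from functools import lru_cache
from itertools import product

def feasible(sizes, counts, m, T):
    """True iff counts[i] jobs of size sizes[i] fit on m machines with
    load at most T each.  States are tuples of remaining counts."""
    # enumerate every nonempty single-machine configuration with load <= T
    configs = [c for c in product(*(range(x + 1) for x in counts))
               if any(c) and sum(s * y for s, y in zip(sizes, c)) <= T]

    @lru_cache(maxsize=None)
    def machines_needed(rem):
        if not any(rem):
            return 0
        return 1 + min(
            (machines_needed(tuple(r - y for r, y in zip(rem, c)))
             for c in configs if all(y <= r for y, r in zip(c, rem))),
            default=float("inf"))

    return machines_needed(tuple(counts)) <= m
```

For example, two jobs of size 3 and two of size 2 fit on two machines with load bound 5 (pair each 3 with a 2) but not with load bound 4.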
Corollary 5.6. The big jobs can be scheduled (if possible) with load $(1+\epsilon)$ in time $n^{O\left(\frac{\ln(1/\epsilon)}{\epsilon}\right)}$.
Once we have scheduled the jobs in $\mathcal{B}$, using Claim 5.1.3 we can pack the small jobs on top of them using greedy list scheduling. The overall algorithm is then:
PTAS Multiprocessor Scheduling
1. Guess $\mathrm{OPT}$
2. Define $\mathcal{B}$ and $\mathcal{S}$
3. Round $\mathcal{B}$ to $\mathcal{B}'$
4. If the jobs in $\mathcal{B}'$ can be scheduled with load $(1+\epsilon)\,\mathrm{OPT}$
        Greedily pack $\mathcal{S}$ on top
    Else
        Modify the guess and repeat.
In the following subsection, we comment on the guessing process.
Guessing: We define a $(1+\epsilon)$-relaxed decision procedure:
Definition 5.7. Given $\epsilon > 0$ and a time $T$, a $(1+\epsilon)$-relaxed decision procedure either:
  • outputs a schedule with makespan at most $(1+\epsilon) \cdot T$ (whenever a schedule with makespan at most $T$ exists), or
  • outputs correctly that there is no schedule with makespan at most $T$.
Define
$$L = \max\left\{\max_j p_j,\; \frac{1}{m}\sum_j p_j\right\}$$
$L$ is a lower bound on $\mathrm{OPT}$ as we saw earlier. Furthermore, the Greedy Multiprocessor Scheduling algorithm gives an upper bound on $\mathrm{OPT}$ of $2L$. Consider running the decision procedure with guess $L + i\epsilon L$ for each integer $i \in [\lceil 2/\epsilon \rceil]$, and choose the schedule with the best makespan among all the successful runs. If $L^*$ is the optimum load, then the algorithm will try some guess $T$ with $L^* \leq T \leq L^* + \epsilon L \leq (1+\epsilon)L^*$. For this guess we are guaranteed a solution, and the decision procedure will output a schedule with load at most $(1+\epsilon)(1+\epsilon)L^* \leq (1+3\epsilon)L^*$ for sufficiently small $\epsilon$. We run the decision procedure $O(1/\epsilon)$ times. This gives us the desired PTAS.
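The guessing loop can be sketched as follows. Here `relaxed_decide` is a placeholder for the $(1+\epsilon)$-relaxed decision procedure (its name and interface are our own assumptions): it returns a schedule of makespan at most $(1+\epsilon)T$, or `None` if no schedule of makespan $T$ exists.

```python
import math

def ptas_makespan(jobs, m, eps, relaxed_decide):
    """Try guesses L, L + eps*L, L + 2*eps*L, ... up to roughly 2L and
    return the schedule found for the smallest successful guess."""
    L = max(max(jobs), sum(jobs) / m)   # lower bound on OPT; greedy shows OPT <= 2L
    for i in range(math.ceil(2 / eps) + 1):
        schedule = relaxed_decide(jobs, m, L + i * eps * L, eps)
        if schedule is not None:
            return schedule             # guesses increase, so first success is best
    return None
```

The loop makes only $O(1/\epsilon)$ calls, so the overall running time is dominated by the decision procedure.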
Remark 5.1. A PTAS indicates that the problem can be approximated arbitrarily well in polynomial time. However, a running time of the form $n^{f(\epsilon)}$ is typically not very interesting. We have seen that an FPTAS is ruled out for the makespan minimization problem. However, it does admit what is now called an Efficient PTAS (EPTAS) whose running time is $2^{O(1/\epsilon^2 \cdot (\log(1/\epsilon))^3)} + \mathrm{poly}(n)$. See [5].

5.1.4 Section Notes

Multiprocessor Scheduling is NP-hard, as we can reduce 2-Partition to Multiprocessor Scheduling on two machines. Note that this reduction only proves that Multiprocessor Scheduling is weakly NP-hard. When $m$ is a fixed constant, Horowitz and Sahni [6] give an FPTAS. However, Multiprocessor Scheduling is strongly NP-hard when $m$ is part of the input (by a reduction from 3-Partition [7]). Thus, there cannot exist an FPTAS for the Multiprocessor Scheduling problem in general, unless $P = NP$. However, Hochbaum and Shmoys [8] gave a PTAS, which is the one we described. EPTASes have been developed for several problems, and a key technique is the use of integer linear programming solvers with a small number of variables.

5.2 Bin Packing

5.2.1 Problem Description

In the Bin Packing problem, we are given a set of $n$ items $\{1, 2, \ldots, n\}$. Item $i$ has size $s_i \in (0, 1]$. The goal is to pack all the items into a minimum number of bins of capacity 1.
One could also formulate the problem as partitioning $\{1, 2, \ldots, n\}$ into $k$ sets $\mathcal{B}_1, \mathcal{B}_2, \ldots, \mathcal{B}_k$ such that $\sum_{i \in \mathcal{B}_j} s_i \leq 1$ for each $j$, with $k$ minimum.

5.2.2 Greedy Approaches

Consider the following greedy algorithm for bin packing:
Greedy Bin Packing:
Order items in some way
For $i = 1$ to $n$
    If item $i$ can be packed in some open bin
        Pack it
    Else
        Open a new bin and pack $i$ in the new bin
In the Greedy Bin Packing algorithm, a new bin is opened only if the item cannot be packed in any of the already open bins. However, there might be several open bins into which item $i$ could be packed. Several rules can be used in such a scenario:
  • First Fit: Pack item in the earliest opened bin
  • Last Fit: Pack item in the last opened bin
  • Best Fit: Pack item in the bin that would have least amount of space left after packing the item
  • Worst Fit: Pack item in the bin that would have most amount of space left after packing the item
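First Fit, for instance, can be sketched in a few lines (our own code and names, not from the notes):

```python
def first_fit(sizes):
    """Pack items in the given order, each into the earliest open bin with
    enough residual capacity; open a new bin if none fits.  Returns the
    number of bins used."""
    bins = []                      # bins[j] = total size packed in bin j
    for s in sizes:
        for j, load in enumerate(bins):
            if load + s <= 1.0:    # item fits in the earliest such bin
                bins[j] = load + s
                break
        else:
            bins.append(s)         # no open bin fits: open a new one
    return len(bins)
```

The other rules differ only in which fitting bin is selected (last opened, fullest, or emptiest).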
Irrespective of what strategy is chosen to pack an item in the opened bins, one could get the following result:
Theorem 5.8. Any greedy rule yields a 2-approximation.
Observation 5.9. $\mathrm{OPT} \geq \sum_i s_i$.
We call a bin $\alpha$-full if the items in it occupy space at most $\alpha$.
Claim 5.2.1. Greedy has at most one bin that is $\frac{1}{2}$-full.
Proof. For the sake of contradiction, assume that there are two bins $B_i$ and $B_j$ that are $\frac{1}{2}$-full. WLOG, assume that the Greedy Bin Packing algorithm opened bin $B_i$ before $B_j$. Then the first item that the algorithm packed into $B_j$ must have size at most $\frac{1}{2}$. However, this item could have been packed into $B_i$, since $B_i$ is $\frac{1}{2}$-full. This contradicts the fact that the Greedy Bin Packing algorithm opens a new bin only if the item cannot be packed in any of the open bins.
Proof of Theorem 5.8. Let $m$ be the number of bins opened by the Greedy Bin Packing algorithm. From Claim 5.2.1, we have:
$$\sum_i s_i > \frac{m-1}{2}$$
Using the observation that $\mathrm{OPT} \geq \sum_i s_i$, we get:
$$\mathrm{OPT} > \frac{m-1}{2}$$
which gives us:
$$m < 2 \cdot \mathrm{OPT} + 1 \;\Rightarrow\; m \leq 2 \cdot \mathrm{OPT}$$

5.2.3 (Asymptotic) PTAS for Bin Packing

A natural question follows from the discussion above: can Bin Packing have a PTAS? In this subsection, we settle this question in the negative. In particular, we give a reduction from an NP-complete problem to the Bin Packing problem and show that a good approximation for Bin Packing would give us an exact solution for the NP-complete problem in polynomial time. We consider the Partition problem:
In the Partition problem, we are given a set of items $\{1, 2, \ldots, n\}$. Item $i$ has a size $s_i$. The goal is to decide whether $\{1, 2, \ldots, n\}$ can be partitioned into two sets $\mathcal{A}$ and $\mathcal{B}$ such that $\sum_{i \in \mathcal{A}} s_i = \sum_{j \in \mathcal{B}} s_j$.
Claim 5.2.2. If Bin Packing has a $(\frac{3}{2}-\epsilon)$-approximation for some $\epsilon > 0$, then the Partition problem can be solved exactly in polynomial time.
Proof. Given an instance $I$ of the Partition problem, we construct an instance of the Bin Packing problem as follows: scale the sizes of the items so that $\sum_i s_i = 2$, and take the scaled sizes as an instance $I'$ of Bin Packing. If all the items of $I'$ can be packed in 2 bins, then the answer to $I$ is "yes". Otherwise, the items of $I'$ need at least 3 bins and the answer to $I$ is "no".
Thus $\mathrm{OPT}$ for $I'$ is either 2 or 3. Hence, if there is a $(\frac{3}{2}-\epsilon)$-approximation algorithm for Bin Packing, we can determine the value of $\mathrm{OPT}$, which in turn means we can solve $I$. Thus there cannot exist a $(\frac{3}{2}-\epsilon)$-approximation algorithm for the Bin Packing problem, unless $P = NP$.
Recall the scaling property, which we used earlier to argue that many optimization problems do not admit additive approximations. The Bin Packing problem does not have the scaling property, so it may still admit an additive approximation algorithm. We state some of the results in this context:
Theorem 5.10 (Johnson '74 [9]). There exists a polynomial time algorithm $\mathcal{A}_J$ such that:
$$\mathcal{A}_J(I) \leq \frac{11}{9}\,\mathrm{OPT}(I) + 4$$
for all instances $I$ of the Bin Packing problem.
Theorem 5.11 (de la Vega, Lueker '81 [10]). For any fixed $\epsilon > 0$, there exists a polynomial time algorithm $\mathcal{A}_{FL}$ such that:
$$\mathcal{A}_{FL}(I) \leq (1+\epsilon)\,\mathrm{OPT}(I) + 1$$
for all instances $I$ of the Bin Packing problem.
Theorem 5.12 (Karmarkar, Karp '82 [11]). There exists a polynomial time algorithm $\mathcal{A}_{KK}$ such that:
$$\mathcal{A}_{KK}(I) \leq \mathrm{OPT}(I) + O\left(\log^2(\mathrm{OPT}(I))\right)$$
for all instances $I$ of the Bin Packing problem.
This has been improved recently.
Theorem 5.13 (Hoberg and Rothvoss 2017 [12]). There exists a polynomial time algorithm $\mathcal{A}_{HR}$ such that:
$$\mathcal{A}_{HR}(I) \leq \mathrm{OPT}(I) + O(\log(\mathrm{OPT}(I)))$$
for all instances $I$ of the Bin Packing problem.
A major open problem is the following.
Open Question 5.14. Is there a polynomial-time algorithm $\mathcal{A}$ such that $\mathcal{A}(I) \leq \mathrm{OPT}(I) + c$ for some fixed constant $c$? In particular, is $c = 1$ possible?
Exercise 5.2. Show that the First Fit greedy rule yields a packing with at most $\frac{3}{2}\,\mathrm{OPT} + 1$ bins.

5.2.4 Asymptotic PTAS for Bin Packing

A recurring theme in the last two lectures has been the rounding of jobs/tasks/items. To construct an asymptotic PTAS for the Bin Packing problem, we use the same set of ideas with modifications that are simple in retrospect but non-obvious. In particular, we divide the set of items into big and small items and concentrate on packing the big items first. We show that this technique yields an asymptotic PTAS for the Bin Packing problem.
Consider the set of items with sizes $s_1, s_2, \ldots, s_n$. We divide the items into two sets, $\mathcal{B} = \{i : s_i \geq \epsilon\}$ and $\mathcal{S} = \{j : s_j < \epsilon\}$. As in the Multiprocessor Scheduling problem, where we rounded up the processing times of the jobs, we round up the sizes of the items. Again, we concentrate only on the items in $\mathcal{B}$. Let $n' = |\mathcal{B}|$ be the number of big items.
We observed earlier that $\mathrm{OPT} \geq \sum_i s_i$ and hence $\mathrm{OPT} \geq \epsilon \cdot n'$ by considering just the big items.
Claim 5.2.3. Suppose $n' > 4/\epsilon^2$. Then $\mathrm{OPT} \geq \epsilon n' > 4/\epsilon$.
If there are very few big items one can solve the problem by brute force.
Claim 5.2.4. Suppose $n' < 4/\epsilon^2$. Then an optimal packing of the big items can be computed in $2^{O(1/\epsilon^4)}$ time.
Proof Sketch. If the number of big items is small, one can find the optimal solution using brute force search.
The following procedure rounds up the items in $\mathcal{B}$:
Rounding Item Sizes:
Sort the items so that $s_1 \geq s_2 \geq \cdots \geq s_{n'}$
Group the items into $k = 2/\epsilon^2$ groups $\mathcal{B}_1, \ldots, \mathcal{B}_k$ such that each group has $\lfloor n'/k \rfloor$ items
Round the size of every item in group $\mathcal{B}_i$ up to the size of the smallest item in $\mathcal{B}_{i-1}$
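This "linear grouping" step can be sketched as follows (our own code and names; the first group is returned separately because the analysis packs it one item per bin):

```python
def round_item_sizes(sizes, eps):
    """Linear grouping: sort big item sizes in decreasing order, split them
    into k = 2/eps^2 groups, and round every item in group i (i >= 2) up to
    the smallest size in group i-1.  Returns (first_group, rounded_rest)."""
    k = int(2 / eps ** 2)
    s = sorted(sizes, reverse=True)
    g = max(1, len(s) // k)                  # items per group
    groups = [s[j:j + g] for j in range(0, len(s), g)]
    rounded = [min(groups[i - 1])            # smallest size in previous group
               for i in range(1, len(groups))
               for _ in groups[i]]
    return groups[0], rounded
```

Each rounded size dominates the original it replaces, and the rounded items take only $k-1$ distinct values, which is what Lemma 5.2 below exploits.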
Lemma 5.2. Consider the restriction of the Bin Packing problem to instances in which the number of distinct item sizes is $k$. There is an $n^{O(k)}$-time algorithm that outputs an optimum solution.
Proof Sketch. Use Dynamic Programming.
Claim 5.2.5. The items in $\mathcal{B}$ can be packed into $\mathrm{OPT} + |\mathcal{B}_1|$ bins in time $n^{O(1/\epsilon^2)}$.
Proof. After Rounding Item Sizes, all items except those in $\mathcal{B}_1$ have one of $k-1$ distinct sizes. By Lemma 5.2, these items can be packed into at most $\mathrm{OPT}$ bins. Furthermore, the items in $\mathcal{B}_1$ can always be packed into $|\mathcal{B}_1|$ bins (one per bin). Hence the total number of bins is $\mathrm{OPT} + |\mathcal{B}_1|$. The running time follows since $k = O(1/\epsilon^2)$.
Lemma 5.3. Let $\epsilon > 0$ be fixed. Consider the restriction of the Bin Packing problem to instances in which each item has size at least $\epsilon$. There is a polynomial time algorithm that solves this restricted problem within a factor of $(1+\epsilon)$.
Proof. Using Claim 5.2.5, we can pack $\mathcal{B}$ into $\mathrm{OPT} + |\mathcal{B}_1|$ bins. Recall that $|\mathcal{B}_1| = \lfloor n'/k \rfloor \leq \epsilon^2 n'/2 \leq \epsilon \cdot \mathrm{OPT}/2$, where the last inequality uses $\mathrm{OPT} \geq \epsilon n'$. Hence the number of bins used is at most $(1+\epsilon/2)\,\mathrm{OPT} \leq (1+\epsilon)\,\mathrm{OPT}$.
Theorem 5.15. For any $\epsilon$ with $0 < \epsilon < 1/2$, there is an algorithm $\mathcal{A}_\epsilon$ that runs in time polynomial in $n$ and finds a packing using at most $(1+2\epsilon)\,\mathrm{OPT} + 1$ bins.
Proof. Assume that the number of bins used to pack the items in $\mathcal{B}$ is $m$ and that the total number of bins used after packing the items in $\mathcal{S}$ is $m'$. Clearly
$$m' \leq \max\left\{m, \left\lceil \frac{\sum_i s_i}{1-\epsilon} \right\rceil\right\}$$
since, by the argument used for Greedy Bin Packing, at most one bin is less than $(1-\epsilon)$-full after the small items (each of size less than $\epsilon$) are added. Furthermore,
$$\left\lceil \frac{\sum_i s_i}{1-\epsilon} \right\rceil \leq \left(\sum_i s_i\right)(1+2\epsilon) + 1$$
for $\epsilon < 1/2$. Since $\sum_i s_i \leq \mathrm{OPT}$, this gives the required bound.
The algorithm is summarized below:
Asymptotic PTAS Bin Packing:
Split the items into $\mathcal{B}$ (big items) and $\mathcal{S}$ (small items)
Round the sizes of the items in $\mathcal{B}$ to obtain a constant number of distinct item sizes
Find an optimal packing for the items with rounded sizes
Use this packing for the original items in $\mathcal{B}$
Pack the items in $\mathcal{S}$ using Greedy Bin Packing.

5.2.5 Section Notes

An excellent but perhaps somewhat dated survey on approximation algorithms for the Bin Packing problem is [13]. See [12] for pointers to more recent work. There has also been substantial recent work on various generalizations to multiple dimensions.

  1. An alternative definition: a problem $\Pi$ is strongly NP-hard if every problem in NP can be polynomially reduced to $\Pi$ in such a way that numbers in the reduced instance are always written in unary.
  2. The load of a machine is defined as the sum of the processing times of the jobs that are assigned to that machine.
  3. Ronald L. Graham. "Bounds for certain multiprocessing anomalies". In: Bell System Technical Journal 45.9 (1966), pp. 1563–1581.
  4. Ronald L. Graham. "Bounds on multiprocessing timing anomalies". In: SIAM Journal on Applied Mathematics 17.2 (1969), pp. 416–429.
  5. Klaus Jansen. "An EPTAS for scheduling jobs on uniform processors: using an MILP relaxation with a constant number of integral variables". In: SIAM Journal on Discrete Mathematics 24.2 (2010), pp. 457–485.
  6. Ellis Horowitz and Sartaj Sahni. "Exact and approximate algorithms for scheduling nonidentical processors". In: Journal of the ACM (JACM) 23.2 (1976), pp. 317–327.
  7. Michael R. Garey and David S. Johnson. Computers and Intractability. Vol. 174. Freeman, San Francisco, 1979.
  8. Dorit S. Hochbaum and David B. Shmoys. "A polynomial approximation scheme for scheduling on uniform processors: Using the dual approximation approach". In: SIAM Journal on Computing 17.3 (1988), pp. 539–551.
  9. David S. Johnson. "Approximation algorithms for combinatorial problems". In: Journal of Computer and System Sciences 9.3 (1974), pp. 256–278.
  10. W. Fernandez de la Vega and George S. Lueker. "Bin packing can be solved within $1+\varepsilon$ in linear time". In: Combinatorica 1.4 (1981), pp. 349–355.
  11. Narendra Karmarkar and Richard M. Karp. "An efficient approximation scheme for the one-dimensional bin-packing problem". In: 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982). IEEE. 1982, pp. 312–320.
  12. Rebecca Hoberg and Thomas Rothvoss. "A logarithmic additive integrality gap for bin packing". In: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM. 2017, pp. 2616–2625.
  13. E. G. Coffman Jr., M. R. Garey, and D. S. Johnson. "Approximation algorithms for bin packing: A survey". In: Approximation Algorithms for NP-hard Problems (1996), pp. 46–93.
