Approximation Algorithms: Load Balancing and Bin Packing
This is the lecture notes from Chandra Chekuri's CS583 course on Approximation Algorithms. Chapter 5: Load Balancing and Bin Packing.
Chapter 5
This chapter is based on notes first scribed by Rachit Agarwal.
In the last lecture, we studied the Knapsack problem. Knapsack is NP-hard but admits a pseudo-polynomial time algorithm, and hence can be solved efficiently when the numbers in the input are small. We used this pseudo-polynomial time algorithm to obtain an FPTAS for Knapsack. In this lecture, we study another class of problems, known as strongly NP-hard problems.
Definition 5.1 (Strongly NP-hard Problems). An NPO problem is said to be strongly NP-hard if it is NP-hard even when the numbers in the input are polynomially bounded in the combinatorial size of the input[1].
Many NP-hard problems are in fact strongly NP-hard. If a problem is strongly NP-hard, then it does not admit a pseudo-polynomial time algorithm unless P = NP. We study two such problems in this lecture: Multiprocessor Scheduling and Bin Packing.
5.1 Load Balancing / MultiProcessor Scheduling
A central problem in scheduling theory is to design a schedule such that the finishing time of the last job (also called the makespan) is minimized. This problem is often referred to as the Load Balancing, Minimum Makespan Scheduling, or Multiprocessor Scheduling problem.
5.1.1 Problem Description
In the Multiprocessor Scheduling problem, we are given m identical machines and n jobs. Job j has a processing time p_j, and the goal is to assign jobs to the machines so as to minimize the maximum load[2].
5.1.2 Greedy Algorithm
Consider the following greedy algorithm for the Multiprocessor Scheduling problem, which we will call Greedy Multiprocessor Scheduling.

Greedy Multiprocessor Scheduling:
1. Order (list) the jobs arbitrarily.
2. For j = 1 to n: assign job j to the machine with the least current load.
This algorithm is also called a list scheduling algorithm, following Graham's terminology from his 1966 paper [3]. The list is the order in which the jobs are processed, and changing it creates different schedules. We prove that the Greedy Multiprocessor Scheduling algorithm gives a (2 − 1/m)-approximation.
Theorem 5.2. The Greedy Multiprocessor Scheduling algorithm gives a (2 − 1/m)-approximation for any list.
To prove the theorem, we will make use of two lower bounds on the length of the optimal schedule, which we denote by OPT.
Observation 5.3. OPT ≥ max_j p_j.
Observation 5.4. OPT ≥ (1/m) Σ_j p_j.
We leave the proofs of the observations as easy exercises.
Proof of Theorem 5.2. Fix the list and let L denote the makespan of the Greedy Multiprocessor Scheduling algorithm. Let L_i denote the load of machine i and let i* be the most heavily loaded machine in the schedule produced by the algorithm, so L = L_{i*}.

Let j be the last job assigned to i*. Since the Greedy Multiprocessor Scheduling algorithm assigns a job to the machine that is least loaded, all machines must have load at least L_{i*} − p_j at the time of assigning j. Hence, we have:

Σ_{j'} p_{j'} ≥ m (L_{i*} − p_j) + p_j,

which implies:

L = L_{i*} ≤ (1/m) Σ_{j'} p_{j'} + (1 − 1/m) p_j ≤ OPT + (1 − 1/m) OPT = (2 − 1/m) OPT,

where the third step follows from the two lower bounds on OPT.
The above analysis is tight, i.e., there exist instances where the greedy algorithm produces a schedule whose makespan is (2 − 1/m) times the optimal. Consider the following instance: m(m − 1) jobs with unit processing time and a single job with processing time m. Suppose the greedy algorithm schedules all the short jobs before the long job; then each machine first receives m − 1 unit jobs, and the long job brings the makespan of the schedule obtained to (m − 1) + m = 2m − 1, while the optimal makespan is m (one machine runs the long job, and the unit jobs are spread evenly over the remaining m − 1 machines). Hence the algorithm gives a schedule whose makespan is (2 − 1/m) times the optimal.
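The greedy rule and the tight example above can be checked with a short sketch (a minimal implementation; the function name is my own):

```python
import heapq

def list_schedule(jobs, m):
    """Graham's list scheduling: assign each job, in list order,
    to the machine with the least current load; return the makespan."""
    loads = [0] * m                     # min-heap over machine loads
    for p in jobs:
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

# Tight instance: m(m-1) unit jobs listed before one job of length m.
m = 5
jobs = [1] * (m * (m - 1)) + [m]
print(list_schedule(jobs, m))           # 2m - 1 = 9, while OPT = m = 5
```

Listing the long job first instead gives makespan m, the optimum for this instance, which is exactly why the list matters.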
It may seem from the tight example above that a better approximation ratio could be achieved if the jobs are sorted before processing, which indeed is the case. The following algorithm, due to Graham [4], sorts the jobs in decreasing order of processing time prior to running the Greedy Multiprocessor Scheduling algorithm.
Modified Greedy Multiprocessor Scheduling:
1. Sort the jobs in decreasing order of processing times, so that p_1 ≥ p_2 ≥ … ≥ p_n.
2. For j = 1 to n: assign job j to the machine with the least current load.
Graham [4:1] proved the following tight bound.
Theorem 5.5. The Modified Greedy Multiprocessor Scheduling algorithm gives a (4/3 − 1/(3m))-approximation for the Multiprocessor Scheduling problem.
We will not prove the preceding theorem, which requires some careful case analysis. Instead we will show how one can obtain an easier bound of 3/2 via the following claim.
Claim 5.1.1. Suppose p_1 ≥ p_2 ≥ … ≥ p_n and n > m. Then OPT ≥ 2 p_{m+1}.
Proof. Since n > m and the processing times are sorted in decreasing order, some two of the m + 1 largest jobs must be scheduled on the same machine. Each of these two jobs has processing time at least p_{m+1}, so the load of this machine is at least 2 p_{m+1}.
Exercise 5.1. Prove that Modified Greedy Multiprocessor Scheduling gives a 3/2-approximation using the preceding claim and the other two lower bounds on OPT that we have seen already.
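A minimal sketch of the sorted variant: on the tight instance from the previous subsection, sorting first recovers the optimum, while the classic instance {3, 3, 2, 2, 2} on two machines attains the 4/3 − 1/(3m) bound of Theorem 5.5:

```python
import heapq

def lpt_schedule(jobs, m):
    """Modified greedy (longest processing time first): sort jobs in
    decreasing order of processing time, then run list scheduling."""
    loads = [0] * m                     # min-heap over machine loads
    for p in sorted(jobs, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

m = 5
print(lpt_schedule([1] * (m * (m - 1)) + [m], m))  # 5 = OPT
print(lpt_schedule([3, 3, 2, 2, 2], 2))            # 7, while OPT = 6
```

The second instance gives ratio 7/6 = 4/3 − 1/(3·2), showing Theorem 5.5 is tight for m = 2.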
Before describing a PTAS for the Multiprocessor Scheduling problem, we discuss the case when the processing times of the jobs are bounded from above.
Claim 5.1.2. If max_j p_j ≤ ε · OPT, then Modified Greedy Multiprocessor Scheduling gives a (1 + ε)-approximation.
5.1.3 A PTAS for Multiprocessor Scheduling
We will now give a PTAS for the problem of scheduling n jobs on m identical machines. We would like to use the same set of ideas that were used for the Knapsack problem (see Lecture 4): that is, given a target time T, we would like to round the job lengths and use dynamic programming to see if the rounded jobs fit within time T. Then the unrounded job lengths should fit within time (1 + ε)T.
Big Jobs, Small Jobs and Rounding Big Jobs: For the discussion that follows, we assume that all the processing times have been scaled so that the target makespan is 1 and hence p_j ≤ 1 for all j.
Given all the jobs, we partition the jobs into two sets: big jobs and small jobs. We call a job "big" if p_j > ε. Let B and S denote the set of big jobs and small jobs respectively, i.e., B = {j : p_j > ε} and S = {j : p_j ≤ ε}. The significance of such a partition is that once we pack the jobs in B, the jobs in S can be greedily packed using list scheduling.
Claim 5.1.3. If there is an assignment of the jobs in B to the machines with load L, then greedily scheduling the jobs of S on top gives a schedule of value no greater than max(L, 1 + ε).
Proof. Consider scheduling the jobs in S after all the jobs in B have been scheduled (with load L). If all of the jobs in S finish processing by time L, the total load is clearly no greater than L.
If the jobs in S cannot all be scheduled within time L, consider the last job to finish (after scheduling the small jobs). Suppose this job starts at time t. All the machines must have been fully loaded up to t, which gives t ≤ (1/m) Σ_j p_j ≤ 1. Since p_j ≤ ε for all jobs in S, this job finishes by t + ε ≤ 1 + ε. Hence, the makespan of the schedule is no more than max(L, 1 + ε), settling the claim.
Scheduling Big Jobs: We concentrate on scheduling the jobs in B. We round the sizes of all jobs in B using geometrically increasing interval sizes via the following procedure:
Rounding Jobs:
1. For each job j in B, round p_j up to the nearest power of (1 + ε), i.e., replace p_j by the smallest power of (1 + ε) that is at least p_j.
2. Let B′ be the set of new jobs.
Claim 5.1.4. If the jobs in B can be scheduled with load 1, then the rounded jobs in B′ can be scheduled with load at most 1 + ε.
Claim 5.1.5. The number of distinct big job sizes after rounding is O((1/ε) ln(1/ε)).
Proof. Notice that due to scaling, we have ε < p_j ≤ 1 for all jobs j in B. Since the job sizes are between ε and 1, the number of geometric powers of (1 + ε) required is k where (1 + ε)^k ≥ 1/ε, i.e., k = ⌈log_{1+ε}(1/ε)⌉ = O((1/ε) ln(1/ε)).
Lemma 5.1. If the number of distinct job sizes is k, then there is an exact algorithm that returns a schedule with a given load bound (if there is one) and runs in time n^{O(k)}.
Proof. Use dynamic programming: the state is the vector (n_1, …, n_k) recording how many jobs of each distinct size remain, and the table stores the minimum number of machines needed to schedule them. There are at most (n + 1)^k states, which gives the claimed running time.
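A sketch of such a dynamic program, assuming the jobs are given as counts over the k distinct sizes (the function name and the count-vector interface are my own, and the enumeration of machine configurations is deliberately naive):

```python
from functools import lru_cache
from itertools import product

def schedulable(sizes, counts, m, load):
    """Decide whether counts[i] jobs of size sizes[i] (k distinct
    sizes) fit on m machines with per-machine load <= load."""
    # All nonempty single-machine configurations respecting the load.
    configs = [c for c in product(*(range(n + 1) for n in counts))
               if any(c)
               and sum(ci * si for ci, si in zip(c, sizes)) <= load]

    @lru_cache(maxsize=None)
    def machines_needed(rem):
        # Minimum number of machines for the remaining count vector.
        if not any(rem):
            return 0
        best = float("inf")
        for c in configs:
            if all(ci <= ri for ci, ri in zip(c, rem)):
                rest = tuple(ri - ci for ri, ci in zip(rem, c))
                best = min(best, 1 + machines_needed(rest))
        return best

    return machines_needed(tuple(counts)) <= m

print(schedulable([3, 2], [2, 3], 2, 6))  # True: {3,3} and {2,2,2}
print(schedulable([3, 2], [2, 3], 2, 5))  # False: total 12 > 2 * 5
```

The state space is the set of count vectors, of which there are at most (n + 1)^k, matching the n^{O(k)} bound in the lemma.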
Corollary 5.6. The big jobs can be scheduled (if possible) with load at most 1 + ε in time n^{O((1/ε) ln(1/ε))}.
Once we have scheduled the jobs in B, using Claim 5.1.3, we can pack the small jobs on top of them using greedy list scheduling. The overall algorithm is then given as:
PTAS Multiprocessor Scheduling:
1. Guess the optimal makespan T (see below) and scale the processing times so that T = 1.
2. Define B = {j : p_j > ε} and S = {j : p_j ≤ ε}.
3. Round the sizes of the jobs in B up to powers of (1 + ε).
4. If the jobs in B can be scheduled with load at most 1 + ε (via the dynamic program), do so, and then schedule the jobs in S on top using list scheduling.
Next, we comment on the guessing process.
Guessing: We define a (1 + ε)-relaxed decision procedure:

Definition 5.7. Given ε > 0 and a time T, a (1 + ε)-relaxed decision procedure returns:

- a schedule (if there is one) with makespan at most (1 + ε)T,

or

- a correct declaration that there is no schedule with makespan at most T.

The algorithm described above is such a procedure. Define L = max(max_j p_j, (1/m) Σ_j p_j). By the two lower bounds and Theorem 5.2, OPT lies in [L, 2L], so we can binary search over this interval with O(log(1/ε)) calls to the relaxed decision procedure: the search interval shrinks to width at most εL ≤ ε·OPT, and the best schedule found has makespan at most (1 + O(ε))·OPT.
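The guessing step can be sketched as a binary search driven by any relaxed decision procedure; `decide` below is a stand-in parameter, and the demo uses LPT (a 4/3-approximation, hence a valid (1 + ε)-relaxed decider whenever ε ≥ 1/3) rather than the dynamic program:

```python
import heapq

def search_makespan(jobs, m, decide, eps):
    """Binary search for the makespan.  decide(T) must either return
    the makespan of a schedule of length <= (1+eps)*T, or None to
    certify that no schedule of makespan <= T exists."""
    L = max(max(jobs), sum(jobs) / m)   # lower bound; OPT lies in [L, 2L]
    lo, hi, best = L, 2 * L, None
    while hi - lo > eps * L:
        T = (lo + hi) / 2
        result = decide(T)
        if result is not None:
            hi, best = T, result
        else:
            lo = T
    return best if best is not None else decide(hi)

def lpt_makespan(jobs, m):
    loads = [0] * m
    for p in sorted(jobs, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

jobs, m, eps = [1] * 20 + [5], 5, 0.5

def decide(T):
    # LPT <= (4/3) * OPT, so if OPT <= T then LPT <= (4/3)T <= (1+eps)T.
    ms = lpt_makespan(jobs, m)
    return ms if ms <= (1 + eps) * T else None

print(search_makespan(jobs, m, decide, eps))  # 5 = OPT here
```

Plugging in the dynamic program of Lemma 5.1 in place of LPT gives the PTAS for any ε > 0.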
Remark 5.1. A PTAS indicates that the problem can be approximated arbitrarily well in polynomial time. However, a running time of the form n^{O(f(1/ε))} is typically not very interesting. We have seen that an FPTAS is ruled out for the makespan minimization problem. However, it does admit what is now called an Efficient PTAS (EPTAS) whose running time is of the form f(1/ε) · poly(n). See [5].
5.1.4 Section Notes
Multiprocessor Scheduling is NP-hard, as we can reduce 2-Partition to Multiprocessor Scheduling on two machines. Note that this reduction only proves that Multiprocessor Scheduling is weakly NP-hard. When m is a fixed constant, Horowitz and Sahni [6] give an FPTAS. However, the Multiprocessor Scheduling problem is strongly NP-hard when m is part of the input (by a reduction from 3-Partition [7]). Thus, there cannot exist an FPTAS for the Multiprocessor Scheduling problem in general, unless P = NP. However, Hochbaum and Shmoys [8] gave a PTAS, which is the one we described. EPTASes have been developed for several problems, and a key technique is the use of integer linear programming solvers with a small number of variables.
5.2 Bin Packing
5.2.1 Problem Description
In the Bin Packing problem, we are given a set of n items. Item i has size s_i in (0, 1]. The goal is to find a minimum number of bins of capacity 1 into which all the items can be packed.
One could also formulate the problem as partitioning the items into sets S_1, …, S_k such that Σ_{i in S_j} s_i ≤ 1 for each j, and k is minimum.
5.2.2 Greedy Approaches
Consider the following greedy algorithm for bin packing:
Greedy Bin Packing:
1. Order the items in some way.
2. For i = 1 to n: if item i fits in some open bin, pack it there; otherwise open a new bin and pack item i into it.
In the Greedy Bin Packing algorithm, a new bin is opened only if the item cannot be packed in any of the already opened bins. However, there might be several open bins in which the item could be packed. Several rules could be formulated for such a scenario:
- First Fit: Pack the item in the earliest opened bin.
- Last Fit: Pack the item in the last opened bin.
- Best Fit: Pack the item in the bin that would have the least amount of space left after packing the item.
- Worst Fit: Pack the item in the bin that would have the most amount of space left after packing the item.
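As an illustration, a minimal First Fit sketch (the other rules differ only in how the open bin is chosen; the function name is my own):

```python
def first_fit(items):
    """Pack items of size in (0, 1] into unit-capacity bins:
    each item goes into the earliest opened bin where it fits;
    a new bin is opened only if no open bin has room."""
    bins = []                       # bins[i] = total load of bin i
    for s in items:
        for i, load in enumerate(bins):
            if load + s <= 1:
                bins[i] = load + s
                break
        else:                       # no open bin fits the item
            bins.append(s)
    return len(bins)

print(first_fit([0.5, 0.7, 0.5, 0.3, 0.4, 0.6]))  # 3 (= OPT: total size 3.0)
```

With a list kept sorted by residual capacity this runs in O(n log n); the naive double loop above is O(n²).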
Irrespective of which strategy is chosen to pack an item into the open bins, one can prove the following result:
Theorem 5.8. Any greedy rule yields a 2-approximation.
Observation 5.9. OPT ≥ ⌈Σ_i s_i⌉.
We call a bin α-full if the items in it occupy space at most α.
Claim 5.2.1. Greedy has at most one bin that is 1/2-full.
Proof. For the sake of contradiction, assume that there are two bins B_i and B_j that are 1/2-full. WLOG, assume that the Greedy Bin Packing algorithm opened bin B_i before B_j. Then the first item that the algorithm packed into B_j must be of size at most 1/2, since B_j is 1/2-full. However, this item could have been packed into B_i, since B_i is 1/2-full and thus has at least 1/2 free space. This contradicts the fact that the Greedy Bin Packing algorithm opens a new bin if and only if the item cannot be packed in any of the open bins.
Proof of Theorem 5.8. Let k be the number of bins opened by the Greedy Bin Packing algorithm. From Claim 5.2.1, at least k − 1 bins are more than 1/2-full, so:

Σ_i s_i > (k − 1)/2.

Using the observation that OPT ≥ Σ_i s_i, we get:

OPT > (k − 1)/2,

which gives us:

k < 2·OPT + 1, i.e., k ≤ 2·OPT.
5.2.3 (Asymptotic) PTAS for Bin Packing
A natural question follows the discussion above: can Bin Packing have a PTAS? In this subsection, we settle this question in the negative. In particular, we give a reduction from an NP-complete problem to the Bin Packing problem and show that a (3/2 − ε)-approximation for the Bin Packing problem would give us an exact solution for the NP-complete problem in polynomial time. We consider the Partition problem:
In the Partition problem, we are given a set of n items. Item i has a size s_i. The goal is to decide whether the items can be partitioned into two sets A_1 and A_2 such that Σ_{i in A_1} s_i = Σ_{i in A_2} s_i.
Claim 5.2.2. If Bin Packing has a (3/2 − ε)-approximation for some fixed ε > 0, then the Partition problem can be solved exactly in polynomial time.
Proof. Given an instance I′ of the Partition problem, we construct an instance I of the Bin Packing problem as follows: scale the sizes of the items so that Σ_i s_i = 2, and consider the scaled sizes as an instance of Bin Packing. If all items of I can be packed in 2 bins, then we have a "yes" answer to I′. Otherwise, the items of I need at least 3 bins and the answer to I′ is "no". A (3/2 − ε)-approximation algorithm must return a packing with at most (3/2 − ε)·2 < 3 bins, i.e., a 2-bin packing, whenever one exists, so running it distinguishes the two cases.
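The reduction is easy to exercise with any exact solver standing in for the (hypothetical) good approximation; the brute-force `min_bins` below is for illustration only and runs in exponential time:

```python
def min_bins(items):
    """Exact bin packing by brute force: try placing each item in
    every open bin or a new one (exponential; illustration only)."""
    best = [len(items)]                 # one item per bin upper bound

    def rec(i, bins):
        if len(bins) >= best[0]:        # cannot improve: prune
            return
        if i == len(items):
            best[0] = len(bins)
            return
        s = items[i]
        for b in range(len(bins)):      # place item i in an open bin
            if bins[b] + s <= 1 + 1e-9:
                bins[b] += s
                rec(i + 1, bins)
                bins[b] -= s
        bins.append(s)                  # or open a new bin
        rec(i + 1, bins)
        bins.pop()

    rec(0, [])
    return best[0]

def partition(nums):
    """Reduction from Partition: scale so the sizes sum to 2; the
    answer is 'yes' iff the scaled items pack into 2 unit bins."""
    items = [2 * x / sum(nums) for x in nums]
    return min_bins(items) <= 2

print(partition([3, 1, 1, 2, 2, 1]))    # True: {3, 2} vs {1, 1, 2, 1}
print(partition([2, 2, 3]))             # False: the sum 7 is odd
```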
Recall the scaling property, from our discussion of why many optimization problems do not admit additive approximations. Notice that the Bin Packing problem does not have the scaling property. Hence it may be possible to find additive approximation algorithms. We state some of the results in this context:
Theorem 5.10 (Johnson '74 [9]). There exists a polynomial time algorithm A such that:

A(I) ≤ (11/9)·OPT(I) + 4

for all instances I of the Bin Packing problem.
Theorem 5.11 (de la Vega, Lueker '81 [10]). For any fixed ε > 0, there exists a polynomial time algorithm A_ε such that:

A_ε(I) ≤ (1 + ε)·OPT(I) + 1

for all instances I of the Bin Packing problem.
Theorem 5.12 (Karmarkar, Karp '82 [11]). There exists a polynomial time algorithm A such that:

A(I) ≤ OPT(I) + O(log² OPT(I))

for all instances I of the Bin Packing problem.
This has been improved recently.
Theorem 5.13 (Hoberg and Rothvoss 2017 [12]). There exists a polynomial time algorithm A such that:

A(I) ≤ OPT(I) + O(log OPT(I))

for all instances I of the Bin Packing problem.
A major open problem is the following.
Open Question 5.14. Is there a polynomial-time algorithm A such that A(I) ≤ OPT(I) + c for some fixed constant c? In particular, is A(I) ≤ OPT(I) + 1 achievable?
Exercise 5.2. Show that the First Fit greedy rule yields a 7/4-approximation.
5.2.4 Asymptotic PTAS for Bin Packing
A recurring theme in the last two lectures has been the rounding of jobs/tasks/items. To construct an asymptotic PTAS for the Bin Packing problem, we use the same set of ideas with simple-in-retrospect but non-obvious modifications. In particular, we divide the set of items into big and small items and concentrate on packing the big items first. We show that such a technique results in an asymptotic PTAS for the Bin Packing problem.
Consider the set of items and fix ε > 0. We divide the items into two sets, B = {i : s_i ≥ ε} and S = {i : s_i < ε}. Similar to the Multiprocessor Scheduling problem, where we rounded up the processing times of the jobs, we round up the sizes of the items in the Bin Packing problem. Again, we concentrate only on the items in B. Let h = |B| be the number of big items.
We observed earlier that OPT ≥ Σ_i s_i, and hence OPT ≥ εh by just considering the big items.
Claim 5.2.3. Suppose h ≥ 1/ε². Then OPT ≥ 1/ε.
If there are very few big items one can solve the problem by brute force.
Claim 5.2.4. Suppose h < 1/ε². Then an optimal packing of the items in B can be computed in time that depends only on ε (constant for fixed ε).
Proof Sketch. If the number of big items is less than 1/ε², one can find the optimal packing of B by brute-force search over all ways of assigning the (at most 1/ε²) items to (at most 1/ε²) bins.
The following procedure rounds up the sizes of the items in B:
Rounding Item Sizes:
1. Sort the items in B so that s_1 ≥ s_2 ≥ … ≥ s_h.
2. Group the items in B, in sorted order, into k = ⌈1/ε²⌉ groups G_1, …, G_k, each containing at most ⌈ε²h⌉ items.
3. Round the size of every item in group G_i up to the size of the largest item in G_i.
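The grouping step above can be sketched as follows (a hypothetical helper of my own; it returns the rounded sizes together with the first group, which is later packed one item per bin):

```python
import math

def linear_grouping(big_sizes, eps):
    """Round big item sizes (all >= eps) by linear grouping: sort
    decreasingly, cut into ceil(1/eps^2) groups of (almost) equal
    cardinality, and round every size up to its group's maximum."""
    sizes = sorted(big_sizes, reverse=True)
    h = len(sizes)
    k = math.ceil(1 / eps ** 2)          # number of groups
    q = math.ceil(h / k)                 # items per group, <= ceil(eps^2 h)
    groups = [sizes[i:i + q] for i in range(0, h, q)]
    rounded = [g[0] for g in groups for _ in g]   # g[0] is the group max
    return rounded, groups[0]

rounded, g1 = linear_grouping([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], 0.5)
print(rounded)  # [0.9, 0.9, 0.7, 0.7, 0.5, 0.5] -- at most k distinct sizes
```

After grouping, at most k distinct sizes remain, which is what makes the dynamic program of Lemma 5.2 applicable.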
Lemma 5.2. Consider the restriction of the Bin Packing problem to instances in which the number of distinct item sizes is k. There is an n^{O(k)}-time algorithm that outputs an optimum solution.
Proof Sketch. Use Dynamic Programming.
Claim 5.2.5. The items in B can be packed in at most (1 + ε)·OPT + 1 bins in time n^{O(1/ε²)}.
Proof. Using Rounding Item Sizes, we have restricted all items but those in G_1 to have one of at most k − 1 distinct sizes. Moreover, the rounded size of each item in G_i (for i ≥ 2) is at most the original size of every item in G_{i−1}, so the rounded items of G_2, …, G_k can be packed in at most OPT bins, and this packing can be found via Lemma 5.2. Furthermore, the items in G_1 can always be packed in |G_1| ≤ ⌈ε²h⌉ bins (one per bin). Hence, the total number of bins is at most OPT + ε²h + 1 ≤ OPT + ε·OPT + 1 = (1 + ε)·OPT + 1, using OPT ≥ εh.

The running time of the algorithm follows since k = ⌈1/ε²⌉.
Lemma 5.3. Let ε in (0, 1/2] be fixed. Consider the restriction of the Bin Packing problem to instances in which each item has size at least ε. There is a polynomial time algorithm that solves this restricted problem within a factor of (1 + 2ε).
Proof. If h < 1/ε², we solve the instance exactly using Claim 5.2.4. Otherwise, using Claim 5.2.5, we can pack B in at most (1 + ε)·OPT + 1 bins. Recall that OPT ≥ 1/ε by Claim 5.2.3, so the additive 1 is at most ε·OPT, and the total is at most (1 + 2ε)·OPT, which is the final expression.
Theorem 5.15. For any ε in (0, 1/2], there is an algorithm that runs in time polynomial in n and finds a packing using at most (1 + 2ε)·OPT(I) + 1 bins.
Proof. Assume that the number of bins used to pack the items in B is k_B and the total number of bins used after greedily packing the items in S on top is k. If no new bins are opened for S, then k = k_B ≤ (1 + 2ε)·OPT(B) + 1 ≤ (1 + 2ε)·OPT(I) + 1 by Lemma 5.3. Otherwise, every bin except possibly the last one is filled to more than 1 − ε, since a small item (of size less than ε) opens a new bin only when it fits in no open bin, by the argument in Greedy Bin Packing. Hence:

Σ_i s_i > (1 − ε)(k − 1),

and therefore:

k < Σ_i s_i/(1 − ε) + 1 ≤ OPT/(1 − ε) + 1 ≤ (1 + 2ε)·OPT + 1,

for ε ≤ 1/2, since 1/(1 − ε) ≤ 1 + 2ε in that range. This gives the required expression.
The algorithm is summarized below:
Asymptotic PTAS Bin Packing:
1. Split the items into B = {i : s_i ≥ ε} and S = {i : s_i < ε}.
2. Round the sizes of the items in B using Rounding Item Sizes.
3. Find an optimal packing for the items with rounded sizes (Lemma 5.2), packing the items of G_1 one per bin.
4. Use this packing for the original items in B.
5. Pack the items in S greedily on top, opening a new bin only when an item fits in no open bin.
5.2.5 Section Notes
An excellent but perhaps somewhat dated survey on approximation algorithms for Bin Packing problem is [13]. See [12:1] for some pointers to more recent work. There has also been substantial recent work on various generalizations to multiple dimensions.
- An alternative definition: A problem is strongly NP-hard if every problem in NP can be polynomially reduced to it in such a way that the numbers in the reduced instance are always written in unary. ↩︎
- The load of a machine is defined as the sum of the processing times of the jobs that are assigned to that machine. ↩︎
- Ronald L. Graham. "Bounds for certain multiprocessing anomalies". In: Bell System Technical Journal 45.9 (1966), pp. 1563–1581. ↩︎
- Ronald L. Graham. "Bounds on multiprocessing timing anomalies". In: SIAM Journal on Applied Mathematics 17.2 (1969), pp. 416–429. ↩︎
- Klaus Jansen. “An EPTAS for scheduling jobs on uniform processors: using an MILP relaxation with a constant number of integral variables”. In: SIAM Journal on Discrete Mathematics 24.2 (2010), pp. 457–485. ↩︎
- Ellis Horowitz and Sartaj Sahni. “Exact and approximate algorithms for scheduling nonidentical processors”. In: Journal of the ACM (JACM) 23.2 (1976), pp. 317–327. ↩︎
- Michael R Garey and David S Johnson. Computers and intractability. Vol. 174. Freeman, San Francisco, 1979. ↩︎
- Dorit S Hochbaum and David B Shmoys. “A polynomial approximation scheme for scheduling on uniform processors: Using the dual approximation approach”. In: SIAM journal on computing 17.3 (1988), pp. 539– 551. ↩︎
- David S Johnson. “Approximation algorithms for combinatorial problems”. In: Journal of computer and system sciences 9.3 (1974), pp. 256–278. ↩︎
- W. Fernandez de la Vega and George S. Lueker. "Bin packing can be solved within 1 + ε in linear time". In: Combinatorica 1.4 (1981), pp. 349–355. ↩︎
- Narendra Karmarkar and Richard M. Karp. "An efficient approximation scheme for the one-dimensional bin-packing problem". In: 23rd Annual Symposium on Foundations of Computer Science (FOCS 1982). IEEE. 1982, pp. 312–320. ↩︎
- Rebecca Hoberg and Thomas Rothvoss. "A logarithmic additive integrality gap for bin packing". In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017). 2017. ↩︎
- E. G. Coffman Jr., M. R. Garey, and D. S. Johnson. "Approximation algorithms for bin packing: A survey". In: Approximation Algorithms for NP-hard Problems (1996), pp. 46–93. ↩︎