The Subset Sum Problem
Definition:
A knapsack vector is a set A of positive and pairwise different integers:
A = (a1, a2, ..., an), ai
!= aj if i != j
Definition:
The subset sum problem is the following:
- Given
- A knapsack vector A = (a1, a2,
..., an) and a positive integer s, called the sum.
- The question is then
- Is there a subset A' of A with SUMa' in A'(a') = s, or equivalent:
Does there exist a vector X = (x1, x2, ..., xn), xi in {0, 1}, with AX = s?
From now on AX means a1x1 + a2x2 + ... + anxn
The problem just described is a decision problem. If a solution is desired, the problem becomes a functional problem: the task is to compute a solution vector X = (x1, x2, ..., xn), xi in {0, 1} such that AX = s, provided a solution exists.
The functional problem of a knapsack vector A together with a sum s is denoted by (A, s).
Example:
Let A be the set (15, 22, 14, 26, 32, 9, 16, 8) and let s be 53. Then a solution to (A, 53) is given by the vector X = (1, 1, 0, 0, 0, 0, 1, 0), because AX = a1 + a2 + a7 = 15 + 22 +16 = 53. The corresponding subset is A' = (15, 22, 16).
In general there may exist several solutions to (A, s) as in this example. Another solution for (A, 53) is X = (0, 1, 1, 0, 0, 1, 0, 1) with the subset A' = (22, 14, 9, 8).
No polynomial algorithm is known solving the general subset sum problem. The subset sum problem is in the complexity class NP. The decision problem is NP-complete and the corresponding functional problem is NP-hard. The decision problem and the functional problem are equivalent with respect to the complexity meaning if a polynomial algorithm is known solving the decision problem, this algorithm can also be used for solving the functional problem and vice versa.
One method of finding a possible solution vector for a given knapsack set A and a given sum s is the trivial one: for all possible 0-1 vectors X = (x1, ..., xn) the sum s' = AX is computed. If s' = s, one solution is the vector X.
Because in worst case this algorithm is computing the sum of all 2n different X vectors, this algorithm has an exponential running time of O(2n).
One of the more efficient algorithms is the meet-in-the-middle algorithm.
This algorithm divides the original knapsack vector A in two equal parts A1 = (a1, ..., at) and A2 = (at+1, ..., an). In a precomputation step, all possible sums s1 = A1X1 are computed. Here t is n/2, if n is even, else (n-1)/2. All s1 are stored in a table, together with the corresponding X-vector. This table then is sorted by the sum.
In a second step, for the knapsack A2 all possible sums s2 are built, the difference L = s - s2 is computed and the table is searched for an entry having L as first component. If so, then L = s - s2 or s = L + s2 = A1X1 + A2X2 and thus one solution is X1||X2.
Input: a knapsack vector A = (a1, ..., an) and a sum s.
Output: a vector X with AX = s, provided a solution exists.
-
Divide the knapsack A = (a1, ..., an) into A1 = (a1, ..., at) and A2 = (at+1, ..., an).
For every vector X1 = (x1, ..., xt)
- compute s1 = A1X1
- store s1 togeter with the corresponding x-vector in a table T
- sort this table by the sum
-
Now for all possible vectors X2 = (xt+1, ..., xn)
- compute s2 = A2X2
- get the difference L = s - s2
- search L in T. If found, a solution vector is X = X1||X2.
The running time of this algorithm is O(n2n/2). In the first step, all 2n/2 sums are computed. Sorting a set of t elements can be done in time O(t log(t)). Therefore a set of 2n/2 elements is sorted in O((n/2)2n/2).
The worst case of step two is, when all 2n/2 sums s2 are computed. The search for an element in an ordered set of t elements is done in O(log t) and thus searching after 2n/2 elements takes O((n/2)2n/2) steps.
Added together, step one and step two result in a running time of O(n2n/2). But additionally, this algorithm needs storage space of O(2n/2) to hold the table.
Start applet