The Subset Sum Problem

Definition:
A knapsack vector is a set A of positive and pairwise different integers:

A = (a₁, a₂, ..., a_n), a_i != a_j if i != j

Definition:
The subset sum problem is the following:

Given: A knapsack vector A = (a₁, a₂, ..., a_n) and a positive integer s, called the sum.
The question is then: Is there a subset A' of A with SUM_{a' in A'}(a') = s, or equivalent:
Does there exist a vector X = (x₁, x₂, ..., x_n), x_i in {0, 1}, with AX = s?

From now on AX means a₁x₁ + a₂x₂ + ... + a_nx_n

The problem just described is a decision problem. If a solution is desired, the problem becomes a functional problem: the task is to compute a solution vector X = (x₁, x₂, ..., x_n), x_i in {0, 1} such that AX = s, provided a solution exists.
The functional problem of a knapsack vector A together with a sum s is denoted by (A, s).

Example:
Let A be the set (15, 22, 14, 26, 32, 9, 16, 8) and let s be 53. Then a solution to (A, 53) is given by the vector X = (1, 1, 0, 0, 0, 0, 1, 0), because AX = a₁ + a₂ + a₇ = 15 + 22 +16 = 53. The corresponding subset is A' = (15, 22, 16).
In general there may exist several solutions to (A, s) as in this example. Another solution for (A, 53) is X = (0, 1, 1, 0, 0, 1, 0, 1) with the subset A' = (22, 14, 9, 8).

No polynomial algorithm is known solving the general subset sum problem. The subset sum problem is in the complexity class NP. The decision problem is NP-complete and the corresponding functional problem is NP-hard. The decision problem and the functional problem are equivalent with respect to the complexity meaning if a polynomial algorithm is known solving the decision problem, this algorithm can also be used for solving the functional problem and vice versa.

One method of finding a possible solution vector for a given knapsack set A and a given sum s is the trivial one: for all possible 0-1 vectors X = (x₁, ..., x_n) the sum s' = AX is computed. If s' = s, one solution is the vector X.
Because in worst case this algorithm is computing the sum of all 2ⁿ different X vectors, this algorithm has an exponential running time of O(2ⁿ).

One of the more efficient algorithms is the meet-in-the-middle algorithm. This algorithm divides the original knapsack vector A in two equal parts A₁ = (a₁, ..., a_t) and A₂ = (a_t+1, ..., a_n). In a precomputation step, all possible sums s₁ = A₁X₁ are computed. Here t is n/2, if n is even, else (n-1)/2. All s₁ are stored in a table, together with the corresponding X-vector. This table then is sorted by the sum.
In a second step, for the knapsack A₂ all possible sums s₂ are built, the difference L = s - s₂ is computed and the table is searched for an entry having L as first component. If so, then L = s - s₂ or s = L + s₂ = A₁X₁ + A₂X₂ and thus one solution is X₁||X₂.

Input: a knapsack vector A = (a₁, ..., a_n) and a sum s.
Output: a vector X with AX = s, provided a solution exists.

Divide the knapsack A = (a₁, ..., a_n) into A₁ = (a₁, ..., a_t) and A₂ = (a_t+1, ..., a_n).
For every vector X₁ = (x₁, ..., x_t)

compute s₁ = A₁X₁
store s₁ togeter with the corresponding x-vector in a table T
sort this table by the sum

Now for all possible vectors X₂ = (x_t+1, ..., x_n)

compute s₂ = A₂X₂
get the difference L = s - s₂
search L in T. If found, a solution vector is X = X₁||X₂.

The running time of this algorithm is O(n2^n/2). In the first step, all 2^n/2 sums are computed. Sorting a set of t elements can be done in time O(t log(t)). Therefore a set of 2^n/2 elements is sorted in O((n/2)2^n/2).
The worst case of step two is, when all 2^n/2 sums s₂ are computed. The search for an element in an ordered set of t elements is done in O(log t) and thus searching after 2^n/2 elements takes O((n/2)2^n/2) steps.
Added together, step one and step two result in a running time of O(n2^n/2). But additionally, this algorithm needs storage space of O(2^n/2) to hold the table.

Start applet

Superincreasing knapsacks