The above picture shows a newly developed schedule for
gossiping on an butterfly network. Butterflies are a class of powerful
networks, that are used as interconnection structure for parallel
computers. Butterfly_k has k * 2^k (k times 2 to the power of k)
processors and twice as many connections. The example gives Butterfly_3
with 24 processors and 48 connections. Gossiping is an important and
time consuming communication problem. Initially each processor holds
one data item. By exchanging data items, the processors must reach the
situation in which they all know all information. We assume that a
processor can only exchange data with one other processor at a time.
Thus, after one round of communication, a processor can know at most
two data items, after two rounds at most four, after three rounds at
most eight and so on. Thus, for a network with 24 processors at least
five communication rounds are required. In the picture each data item
is represented by a small black square. Its position within the larger
rectangles, which represent the processors, corresponds to the position
within the network of the processor form which it is originally coming.
We see that initially, in the highest picture, each processor holds its
own data item. The connections are indicated with thin dark green
lines. The fat red lines indicate the connections that are used for the
exchange of data. We see how, round after round, the set of data items
in each processor grows: 1, 2, 4, 6 or 8, 14, 20. The sixth
communication round will complete the gossiping. So, this schedule
requires one round more than the absolute minimum. We can also count
how many data items are transferred at most by one processor in every
round when the processors always send all data items that are not yet
known by the processor it is communicating with. This we call the
number of steps. In round 1, all processors send 1 data item, so round
1 takes one step. Round 2 takes 2 steps, then 4 and 8. In round 5,
however, no processor needs to send more than 6 data items. Round 6
takes only 4 steps. Summing up these numbers gives that the total
number of steps equals 25. As clearly each processor must receive 23
data items, which takes at least 23 steps, this is at most two more
than absolutely necessary. It is not known whether the obtained bounds
can be improved. Particularly the combination of an almost minimal
number of rounds and an almost minimal number of steps is very good.
The amazing thing is that this schedule was found by a general
gossiping heuristic in less than one second (joint work with Rene
Beier). In this area i am looking for someone who wants to do a
programming project.
External Computing
A second direction, gaining in importance in my research,
is the development of algorithms for computing with large data sets,
external-memory computation. Until now, I have been designing
algorithms for selection, list-ranking, shortest-paths, connected
components and minimum spanning trees. It is well-known that there is
a strong relation between parallel and external-memory computation.
Together with Michael Kaufmann I have designed a paradigm to translate
parallel (BSP) algorithms into external ones. In the future we will
refine this method and use it to obtain novel external-memory
algorithms.
Cluster Computing
In between external and parallel computation, there is the field
of computing on cluster machines. These are particularly interesting
for large problems, because they allow for parallel access to many
hard-disks in a trivial way. Systems of this kind still have not yet
been modeled in sufficient detail. We have investigated the suitability
of the cluster at the MPI for solving very large list-ranking and
connected-components problems.