Preface and Notation

This document contains the text of a first course in computer science.

As HTML is not suited for mathematical formulas, some additional notation is used (as in the typesetting package LaTeX). a_i denotes a with subscript i. a^i denotes a to the power i. <= and >= are used as in most computer languages. Curly brackets, "{" and "}", are used to group things. sum_{i = 0}^j denotes the sum for i running from 0 to j. The same may also be written sum_{0 <= i <= j}. ~= stands for "approximately equal" and ~ for "proportional to". Greek letters are written out. For logical expressions we either use the notation from C or write the operators out in text. So, "a && !b" is the same as "a and not b". sqrt stands for the square root function, and log without a specified base stands for the logarithm to base 2. There are a few more notations, but they should be understood easily. New notions are printed in bold where they are defined. Inside the chapters, particularly important ideas and short key notes are highlighted without introductory text. There are many pictures. These are intended to be self-explanatory. Typically they are placed just after the text fragment to which they belong; generally there is no direct reference to them in the text.

There are very few references to the literature. Clearly most of the presented material is not new. A large part is even common knowledge. More directly this text is based on material found in the following books:

Several students have contributed by pointing out errors and spots which required better explanation.

Notice: the following text may contain more than is required. At the examination, students are expected to know only what has been presented during the lectures.

Table of Contents

  1. Computer Science
  2. Structured Programming: C
  3. Program Execution
  4. Object Oriented Programming: Java
  5. Graphical User Interfaces and Applets
  6. Programming Paradigms
  7. Functional Programming: Haskell
  8. Logic Programming: Prolog
  9. Finite Automata
  10. Grammars
  11. Laws and Rules
  12. Algebras




Computer Science

In this introductory chapter we first try to give an idea of what the term computer science means. Then we will sketch the nature of computer science and the impact of technological progress on this science.

Definition

What is computer science? In the book by Aho and Ullman we find
Computer science is the mechanization of abstraction.
Rather cryptical at first glance.

On the website of the Computer Science Department in Halle we find:

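Informatik ist die Wissenschaft von der theoretischen Analyse, der organisatorischen und technischen Gestaltung und der konkreten Realisierung von (komplexen) Systemen, die menschliche Fachkenntnisse und Kommunikation in technischen, wirtschaftlichen und sozialen Bereichen unterstützen sollen.
(In English, roughly: Computer science is the science of the theoretical analysis, the organizational and technical design, and the concrete realization of (complex) systems that are meant to support human expertise and communication in technical, economic and social domains.)
Even if you can read German, this is hard to understand.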

"Mechanization of abstraction" may sound cryptical, but at least it is short. The word "computer science" is maybe too narrow. In many other languages the word "information" is hidden in the name. So, it is a science working with computers (or some other mechanical device) on information. This information is often obtained by constructing an abstract (that is mathematical) model of the world. For example, if we want to compute the optimal assignment of green/red durations to traffic lights in a city, then we must model this problem in such a way that it can be handled by the mechanical equipment at hand, our computer. Abstraction does not mean to say easy things in a hard way, but rather to make it simpler by not paying attention to details which most likely do not matter for the problem under consideration. In our example, it is probably not important to keep track of the colors or brands of the cars, but probably we should keep track of their length.

On the website of the Computer Science Department we also find:

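Die Einordnung der Informatik in das Wissenschaftsgebäude ist schwer. Viele Grundlagen und Methoden der Informatik sind mathematischer Natur: Abstraktion, Logik, Fragen der Berechenbarkeit. Ihre naturwissenschaftliche-experimentelle Komponente wird deutlich beim Erforschen, Modellieren und Simulieren von schwer überschaubaren Strukturen. Informatik ist auch Ingenieurwissenschaft, denn ausgehend von konstruktiven Methoden geht es ihr um hard- und software-technische Realisierungen von Systemen; hier sieht sie ihre Aufgabe in der Begriffsbildung wie in der Entwicklung generalisierbarer Methoden. Ihr Leitbild ist die Erstellung korrekter Systeme. Damit paßt die Informatik nicht in die klassische Einteilung der Wissenschaften. C.F. von Weizsäcker bezeichnete aus diesem Grunde die Informatik als Strukturwissenschaft.
(In English, roughly: Placing computer science within the edifice of the sciences is hard. Many foundations and methods of computer science are mathematical in nature: abstraction, logic, questions of computability. Its natural-scientific, experimental component becomes apparent in the exploration, modeling and simulation of structures that are hard to survey. Computer science is also an engineering science, because, starting from constructive methods, it is concerned with hardware and software realizations of systems; here it sees its task in forming concepts as well as in developing generalizable methods. Its guiding principle is the construction of correct systems. Thus computer science does not fit into the classical division of the sciences. For this reason C.F. von Weizsäcker called computer science a structural science.)
This is not very helpful as a definition, and I do not agree with the main claim, namely that computer science is something very special. I think that computer science is not different in character from physics, chemistry or geology: based on a solid mathematical foundation, aspects of the real world are modeled with the goal of obtaining answers to practical and theoretical questions in their respective domains. On the more practical side of these sciences we find people developing flat-screen TVs and high-temperature superconductors, better paints and more efficient production methods for plastics, searching for oil or gold. All these sciences derive their importance from the fact that ultimately they lead to engineering.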

The term "computer science" also contains the word "science". This is to underline that it is not like learning a foreign language. "Science" means research, development and especially an attitude of openness to change. Changes occur in the underlying conditions and the domains of application.

Underlying Conditions

The underlying conditions have been changing dramatically. Just as science fiction writers play with the technological possibilities of their own day, the imagination of computer scientists is strongly influenced by the available technology.

One of the important pioneers is Charles Babbage (1791-1871). He designed and built machines for computing differences. He even designed a much more ambitious machine, the "analytical engine", which would have had many features of a computer. These are great achievements, but we clearly see the relation to the time in which he worked: 1850 was the heyday of mechanical thinking. The Jacquard looms (Joseph Marie Jacquard, 1752-1834), which weave complex patterns based on information stored in the form of punchcards, were in full production in those days.

Design of an analytical machine

The first true computer scientist, for whom the fundamental aspects were more important than the engineering aspects of building a useful machine, is Alan Turing (1912-1954). He thought in a purely abstract way about the computational possibilities of an imaginary machine, now called a "Turing machine", with reading and writing capability on one or more tapes. He did this at a time when such machines were not yet constructible, but it appears that his thoughts were influenced by the fact that in his time telegraph machines were rattling in every office.

The next great person is John von Neumann (1903-1957). Among many other things he came up with a cost model, now called the "von Neumann model". In this model all computer details are abstracted away, which allows one to focus on the essential, namely the number of operations performed. This model makes it easy to compare two programs: just count the number of operations for each of them, and the one that performs the fewest operations is assumed to be the best. This model was formulated at a time when most operations indeed had approximately the same cost.

John von Neumann with computer

Now we have left the mechanical era far behind us, and punchcards are no longer in use. We still know what tapes are, but the most common storage media are round. More importantly, we see that processors become faster at a much higher rate than memories: on a modern processor, fetching a number from the main memory costs as much as several hundred multiplications. This means that comparing two programs by considering only the number of operations performed does not necessarily lead to the right conclusion. This observation leads to a gradual shift of interest.

Current fantasies go in the direction of bio-computing and quantum computing. Bio-computing means that one uses the binding properties of DNA to do computation. This method can really be applied, and in principle it even makes it possible to solve complex problems far faster than with conventional computers. However, there are many practical problems, the most important being that one needs enormous amounts of molecules to solve any non-trivial problem. Quantum computing appears to have a larger potential. So far we have accepted that storing a bit requires a transistor. A transistor is a switch. Nowadays transistors are constructed by adding some other substances to a piece of silicon. These silicon-based transistors have become extremely small, but will always need a substantial number of molecules. Quantum computing might bring the next jump in scale: the idea is to store a bit by exploiting the fact that the spin of quantum mechanical objects (electrons, atoms, ...) is discrete. So, potentially memory might be realized at an atomic scale.

Domains of Application

Before 1960, computers were affordable only to some state research centers. The usage reflected this: mainly research with military applications (breaking codes, atomic programs, missile guidance).

During the 10 years up to 1970, computers became available to big companies and some academic institutions. The application area was now dominated by financial administration and number crunching for physical and chemical research. The leading languages for these applications are (still) Cobol and Fortran.

During the 10 years up to 1980, computers became much cheaper and smaller and started to enter the offices of banks and many other companies. The main application gradually became text processing: computers were more and more used as glorified typewriters. This was the time of IBM's dominance.

Around 1980 the first home computers appeared, based on the Z80 processor. The capability of the Sinclair Spectrum, which cost about 200 Euro, was extremely limited (the first versions had a few KB of RAM and a very "special" keyboard, though later a "luxury" version appeared with a more conventional keyboard and 48 KB of RAM). The next step was the Commodore 64, which had the feel of a real computer (64 KB of RAM and quite an appealing design). Then Atari launched a machine with an incredible 1 MB of RAM, the first machine which offered a low-price alternative even for office use. Anyone who wanted could now have a computer, which was in most cases used for playing games.

The internet, which had existed for a long time as a means of sending email among scientists, gradually became more important. Its use exploded with the development of browsers and the concept of websites which are reachable at all times. Nowadays the most striking usage of computers is as a device for surfing the web.

This does not mean that the other applications have disappeared: business administration is still very important; text processing is still what many people use their computers for every day; in science computers are used everywhere; games are still played massively. However, games have in part moved from being played on general purpose computers to special purpose game computers. Also for surfing the web a dedicated device might be cheaper and better. Possibly the number of home computers has now reached its peak.

Invisible Computers

There are far more computers than those considered above. In all kinds of equipment, such as washing machines, cars and elevators, there is some kind of simple processor controlling the process it is dedicated to. The price of these embedded computers is mostly only a fraction of that of the "real" computers. Designing and programming these computers is done by specialists. The design is the work of engineers, but programming them might very well be done by computer scientists. It is an important topic, but does not play a great role in the usual curricula.

On normal computers we have become used to 32-bit arithmetic (and there are even 64-bit processors around). This means that in a single clock cycle two 32-bit numbers can be compared and added and even, this is really impressive, multiplied. Embedded computers often still work with 4-bit arithmetic. For computational purposes they would be terribly slow, but for processing some simple signals this is often enough. A 4-bit arithmetic unit requires much less hardware than a 32-bit one, thereby reducing the size of the chip and thus its price. As should be clear from the above, an embedded computer, including its memory, normally consists of a single chip.

Programming embedded computers is somewhat different from programming usual computers. In the first place, such programs must be extremely space efficient, because every extra byte of storage increases the cost of the device. At a time when the memory of home PCs is approaching 1 GB, this may sound trivial, but depending on the application we may talk about "computers" whose price may lie in the penny range.

For programs for embedded computers, correctness is even more important than for normal computers. If a bug shows up in a normal program, fixing it may be time consuming, but it is mostly possible to make a patch. On the other hand, imagine what happens if, due to a programming error which rarely shows up, all cars of a certain model must be recalled to exchange some module hidden somewhere deep inside.

Correctness in the case of an embedded computer means more than doing what it should do. There may also be the requirement that it performs a task within a certain number of clock cycles: a signal must have been processed before the next signal may come in. The number of signals a signal processor can process per second is part of its specification. This number must be guaranteed; it is not good enough if it mostly makes it. The same applies to the space consumption: due to the fixed amount of memory, which is part of the specification of the chip the program is designed for, the maximum space consumption must be guaranteed.

A similar situation can be found on the programmable pocket calculator TI 59 produced by Texas Instruments around 1980. It has an accessible user memory of 4000 bits (not bytes). 4 bits (a unit which is also called a nibble) are required for a single decimal digit. This means that in total 1000 decimal digits can be stored. On this calculator, the user can freely choose whether to have more space for storing 8-digit numbers or to have more space for the program. Every instruction was encoded by a two-digit opcode (clear = 25, store = 42, recall = 43, - = 75, label = 76, reset = 81, + = 85, ...). So, four program instructions could be traded against one possibility to store a number. The space for storing numbers started at one end of the memory, the space for storing the program at the other end. Of course the boundaries of these partitions were not tested at runtime, and strange things happened when overwriting part of the program ...





Structured Programming: C

In this chapter we consider some of the most important aspects of the programming language C. Some of these are typical for C; most of them appear explicitly or implicitly in all other modern programming languages. Both C++ and Java can be viewed as extensions of C, though in these languages there is a different conception of good style. We do not strive for completeness in any way. We mention only the most important data types and commands. The purpose of this chapter is to provide a basic subset of C which allows one to write simple programs. Complete programs are provided as examples, and these can be modified to obtain programs of one's own.

First Program: Basics

We start with a very simple program which reads two values from input, computes their product and prints the result. New features are the following:
  #include "stdio.h"
  int main() {
    int a, b;                       /* the two factors */
    printf("\nGive the value of a   >>>   ");
    scanf("%d", &a);                /* read a; & passes the address of a */
    printf("Give the value of b   >>>   ");
    scanf("%d", &b);
    a = a * b;                      /* the product overwrites a */
    printf("The product is %1d\n\n", a);
    return 1; }

The first line states that, before anything is done, routines from the library stdio.h must be loaded. This library contains IO routines which are needed for reading and writing data.

The actual program always begins in the section called "main". The piece of code "int main()" is called the program header. It tells us that the return value of the program is an int, and, in this case, that the program does not have external parameters. The "{" indicates the beginning of the text. The end is indicated by "}". In between we find lines of code, giving several instructions. These form the program body. Each instruction is ended with a ";". Except for the (conditional) jump statements and calls to subroutines which are discussed later, a program is executed in linear order, processing the instructions line-by-line. Instead of the word instruction we will prefer the word statement.

Following the header, it is stated which variables are going to be used: "int a, b;". Such a statement is called a declaration. Here we declare two integer variables. Declaring a variable means that some storage is allocated which can be used to store a value. In C and all other modern computer languages, each variable has a type. This means that for each variable it must be specified what kind of value it holds. Examples of types are char (characters), int (integers) and float (floating-point numbers). Specifying the type is essential because the amount of storage to allocate depends on the type. Specifying the type also allows the system to perform type checking: it is probably an error if a character is compared with an integer. C is exceptional in the sense that almost no checking is performed at all. This is done to make the execution faster, but it also implies that programming errors can go undetected for a long time, making debugging, the process of finding the errors, much harder in programs written in C than in stricter languages such as Java. There are different views on where to write the declarations. In Pascal and in ANSI C the declarations must come before all other statements. In C++ and Java they can appear at any place before their first usage. In Java it is actually considered good style to make a declaration upon first usage. Both views are defensible.

After the variable declaration, we find a statement producing a line of output. The format of IO statements varies strongly from language to language. C is quite convenient. In a print statement, the text to print is enclosed between a pair of "-symbols. Possibly we want to print the value of a variable. In that case a combination like "%1d" indicates what kind of variable it is and how much space may be used for it. Here the "%" indicates "take care, here a variable value must be inserted". The "1" is the minimum number of positions to use: the number is always printed with at least as many positions as it actually requires. The "d" indicates that here we are going to print an integer (for a float one uses "f", for a char "c" and for a string "s"). The combination "\n" generates a line feed.

Reading a value is analogous. Only now we must precede the name of the variable by the symbol "&". In the section on procedures we will see that this symbol means that we are passing the address of the variable, which allows scanf to store a value in it. Several values can be read in a single statement: "scanf("%d%f", &a, &x);" can be used to read an integer a and a float x. It is equivalent to "scanf("%d", &a); scanf("%f", &x);".

The statement "a = a * b;" is an example of a simple computation followed by an assignment. In an assignment the value of an expression on the right is assigned to a variable on the left. The original value of the variable on the left is overwritten and cannot be retrieved anymore. In our case we did not further need the original value of a. Alternatively we might have used a third int variable c, writing "c = a * b;". In some other languages (such as Algol and Pascal) the symbol for assignment is ":=".

The last statement of the program is "return 1;". Not all compilers require this, but some of them insist that main returns an integer value. That is why we have written "int main()". As far as the compiler is concerned, any value will do. The value can be used as a flag. For example, if the program may be terminated in several ways, this value can be used by the calling instance to test for errors or the like. By convention, a return value of 0 signals successful termination and any other value signals an error.

The execution of any C program starts at "main". The usage of any variable must be preceded by a declaration.

Composing, Compiling and Running

Every program is created inside an editor; it does not matter which one is used. It is common, maybe even required, to give programs names which end in ".c", so for example "my_program.c". Then, within a unix-like environment, the program is compiled by calling one of the available compilers followed by the name of the program and compiler options. A compiler is a program which tests the source code and, when it does not find obvious errors, translates it into machine code.

Available compilers are at least gcc and cc. gcc is the GNU C compiler. cc is the traditional name of the system's default C compiler, and which compiler it actually invokes depends on the installation. These compilers do not generate the same code and there is no guarantee that a program which runs when compiled with gcc also runs when compiled with cc or vice versa. There are two main reasons for this:

Compiler options can tell the compiler all kinds of useful things: the degree of optimization, the file the code should be written to, the amount of warnings that should be printed, how tolerant the compilation should be, which libraries should be loaded, etc. A possible compile command looks like

gcc my_program.c -O3 -Wall -ansi -o my_executable
Here "-O3" means that we want optimized code, "-Wall" means that we want to hear all warnings, "-ansi" means that we want strict enforcement of the ANSI rules, "-o my_executable" indicates that the compiled code should be written to the file my_executable.

We have seen that the IO library may be loaded by the instruction #include "stdio.h". Such an instruction is executed before the actual compilation starts. Other libraries which one may need are

stdlib.h:
many useful routines such as for generating random numbers.
math.h:
mathematical routines such as exp, cos and sqrt.
sys/time.h:
routines and types for determining and setting clocks.
string.h:
routines for manipulating strings.
Using mathematical routines requires not only the inclusion of math.h, but also the compiler option -lm, because these routines are kept in a separate library which must be linked explicitly. So, the compilation command might then look like "gcc my_prog.c -lm".

Of course, most programs, especially those of beginning programmers, contain syntactical errors when they are compiled for the first time. Syntactical errors are deviations from the syntax. The compiler checks whether all syntactical rules have been obeyed, and only when there are no violations is an executable generated. The default name of this executable is a.out. As the word suggests, the "executable" can be executed directly. That is, execution of the program can be started by simply typing

a.out
In some environments one has to type "./a.out", because the current directory is not among the directories searched for executables.

If there are no syntactical errors, there may nevertheless be all kinds of other errors. Possibly the program crashes at runtime, because it performs a division by zero or runs off the end of an array. In this case we say that the program has runtime errors. But even if the program has neither syntactical nor runtime errors, it need not be correct. Turning on the warnings with -Wall will find some of the non-syntactical errors, for example the use of an uninitialized variable, but no software, however clever, can detect that the program does not fulfill the specification (unless it is fed with the specification).

With help of a compiler the source code of a program is translated to executable machine code.

Primitive Data Types

In the program example above we already encountered variables of the type int. More formally, we will also speak of the data type of a variable. Data types come in two kinds: primitive and structured. The primitive data types are the predefined unstructured data types which can contain a single value. The most important primitive data types are
char:
one byte, can also be used in a numerical way. The constants of this type, that is the symbols, are denoted between a pair of '-symbols: 'f', 'g', '8', ... .
int:
integral numbers in two's complement, mostly of size four bytes. The constants of this type are written as usual numbers: 567227, -12, 0, ... . With special symbols it can be specified that numbers are given in a different number system than decimal.
long:
integral numbers in two's complement, of size four or eight bytes. The constants are written in the same way as ints.
float:
floating point numbers, typically of size four bytes. The constants of this type can be written in many different formats. For example as 1.2345, 1., 786.34e-25, ... . If one wants that the number is interpreted as a float (in a division for example) it should not look like a correct int, so it must contain a "." or an "e".
double:
floating point numbers, typically of size eight bytes. The constants are written in the same way as floats.

Each type has a certain size. This size is not specified exactly: on larger systems / more powerful processors, a long may be longer than on a small system / basic processor. This is handy. It is guaranteed, however, that a long (double) is at least as long as an int (float). The range of numbers that can be stored into a variable of a certain type depends on the number of available bits. For signed chars the range goes from -128 to +127, for 4-byte ints from -2^31 to 2^31 - 1. A nice feature of C is that there are also unsigned versions of the types char, int and long. These can be used for numbers of which it is known that they are never negative. This doubles the available range: unsigned chars can assume any value from 0 to 255, unsigned ints any value from 0 to 2^32 - 1. The following is correct: "unsigned char c = 178;", whereas "char c = 178;" gives an overflow of the number c (the program will not be interrupted, but the value of the variable c will probably not be the expected one). The words overflow and underflow are used to indicate the situation that a variable gets assigned a too large or too small value. Overflow might also be used to generally designate that a value outside the allowed range is assigned.

In many programming languages there is also a type "boolean" or "bool" for storing logical values (true and false). In C any of the numerical types can be used for this; char, interpreted as a number, is most suitable. False corresponds to 0, true to all other values.

C provides several primitive numerical types and characters, but no primitive type for booleans.

Operators

All of the usual operators are available in C: +, -, *, /, % for manipulating numbers; |, & and ^ for bitwise operations; ||, && and ! for boolean operations; <, <=, ==, !=, >= and > for comparison. The first four numerical operators are defined for all numerical types, but the division has a different meaning on integers: there "/" is division while throwing away the non-integral part of the result. So, 11 / 3 == 3. "%" is only defined for integers. It returns the remainder of the division, 11 % 3 == 2. Said otherwise, "%" corresponds to a modulo computation. Finally there are the shift operators >> and <<. They work on integers. a >> b returns the value of a when shifting its bit pattern rightwards over b positions, throwing away the least significant b bits. Thus, 55 >> 3 == 6, because 55 == 110111 in binary and 110 == 6. For non-negative a, a >> b is equivalent to a / 2^b, but it can be performed faster because it is one of the primitive operations of most processors. In the same way a << b returns the value of a shifted leftwards over b positions. It is equivalent to a * 2^b. 6 << 3 == 48, because 6 == 110 and 110000 == 48. Notice that (a >> b) << b in general does not give a again. On the other hand, (a << b) >> b == a, provided that no overflow occurs.

There are even some special operators which are merely shorthands for combinations of the above operators:

  i++;      <->  i = i  + 1;
  i--;      <->  i = i  - 1;
  i  += j;  <->  i = i  + j;
  i  -= j;  <->  i = i  - j;
  i  *= j;  <->  i = i  * j;
  i  /= j;  <->  i = i  / j;
  i  %= j;  <->  i = i  % j;
  i  &= j;  <->  i = i  & j;
  i  ^= j;  <->  i = i  ^ j;
  i  |= j;  <->  i = i  | j;
  i <<= j;  <->  i = i << j;
  i >>= j;  <->  i = i >> j;
One may have different opinions on whether it is good style or not to use these. However, one should never believe that writing the source code more compactly will lead to shorter and faster compiled code: for the compiler there is really no difference between "i += 4;" and "i = i + 4;", and this will result in exactly the same code.

All operators belong to 1 of 15 priority levels. Any book on C provides a complete table. Here we give a shortened version:

Priority   Class                Operators                                      Order
15         bracketlike          ()  []  .                                      normal
14         unary operators      ++  --  !  +  -  *  &                          reversed
13         multiplicationlike   *  /  %                                        normal
12         additionlike         +  -                                           normal
11         shifts               <<  >>                                         normal
10         comparisons          <  <=  >  >=                                   normal
9          equalitylike         ==  !=                                         normal
8 ... 4    logical              &  ^  |  &&  ||                                normal
2          assignment           =  +=  -=  *=  /=  %=  &=  ^=  |=  <<=  >>=    reversed
1          comma                ,                                              normal
Here in every row we first indicate the priority (higher priority operators are executed first), then the operators and finally the execution order for operators in the same priority class. "Normal" indicates execution from left to right, "reversed" indicates execution from right to left. One need not know all this. In case of doubt one should rather use brackets than look it up: the compiled code does not become longer because of this, but it becomes much easier to understand the program!
In expressions it is essential to assure the correct execution order.

Second Program: Loops

The second program asks for a set of SIZE numbers and then determines their maximum value. New features are the following:
  #include "stdio.h"
  #define  SIZE  10

  int main() {
    int i, m;                       /* loop index and current maximum */
    int a[SIZE];                    /* the numbers a[0] ... a[SIZE - 1] */
    printf("\n");
    for (i = 0; i < SIZE; i++) {    /* read all SIZE numbers */
      printf("Give the value of a[%2d]   >>>   ", i);
      scanf("%d", &a[i]); }
    for (m = a[0], i = 1; i < SIZE; i++)   /* m is the maximum so far */
      if (a[i] > m)
        m = a[i];
    printf("\nThe maximum value is %1d\n\n", m);
    return 1; }

In the second line we find "#define SIZE 10". Upon encountering such an instruction, the preprocessor, the program that processes the program text before the compilation, replaces all occurrences of SIZE by 10. In the program text SIZE can be used as a constant, but in the running program it does not physically exist, because it has been replaced by the number 10. The great advantage of using a defined constant instead of the value itself is that this value has to be changed in a single place to make the program suitable for arrays of another size. This is particularly useful for constants which are used for tuning the program, such as the maximum size, the maximum number, the number of files that may be used, the size of certain blocks, etc. Notice that we did not write "#define SIZE 10;". This would replace SIZE by "10;".

The statement "int a[SIZE];" is the declaration of an array. In this case it is specified that a[] is an array with space for SIZE integers. The storage positions in an array are called fields. The fields can be accessed by indexing the array. That is, position i of the array is accessed by writing a[i]. In C (and Java) an array of length size starts at position 0 and ends at position size - 1. It is a common error to try to access the non-existing position a[size]. Java tests for this and other array-bound errors, and the computation is interrupted with an adequate error message if they occur. In C the program will mostly run on, possibly computing a wrong result or crashing at a point from where it is hard to trace back to the original error.

The usage of arrays would be cumbersome without instructions for repeated execution. The for statement is the major example of such a statement. In the program we find "for (i = 0; i < SIZE; i++) { ... }". This means that for all i, running from 0 to SIZE - 1 the statements inside the curly brackets have to be executed. The for statement is the most natural choice for any loop with a fixed number of repetitions. The entire section of the program "for( ... ) { ... }" is called a for loop. For loops are provided by any imperative programming language, though the format may be somewhat different.

Inside the second for loop, we find an if statement. Here it appears in its simplest variant: a condition is tested, and if this condition is satisfied, the following statement is executed. Here this means that if a[i] happens to be larger than the current value of m, then a[i] is assigned to m.

The format of the second for loop is slightly different from the first one. In "for (m = a[0], i = 1; i < SIZE; i++)", there are two initializations: a[0] is assigned to m and 1 to i. In this loop the maximum of all values is computed. We first verify the correctness of this computation for the case a[] = (12, 45, 67, 16, 65, 89, 13, 44, 92, 55). The values are given for the situation at the end of the pass through the loop for the indicated value of i.

   i   a[i]    m
   0    12    12
   1    45    45
   2    67    67
   3    16    67
   4    65    67
   5    89    89
   6    13    89
   7    44    89
   8    92    92
   9    55    92

For the end of the pass with i = 0, we took the values after the initialization. The above looks good.

Let us have a closer look at the correctness. We claim that at all times, m = max{a[j] | 0 <= j < i}, that is, m is the maximum over all numbers considered so far. An important point is that the claim is obviously true at the beginning, because then i = 1 and m = a[0] = max{a[j] | 0 <= j < 1}. If this still holds at the end for i = n (here n = SIZE), then we have m = max{a[j] | 0 <= j < n}, which is the value to compute. So, it remains to verify that the property does not get lost at an intermediate step. This is realized by the conditional execution of the statement "m = a[i]". If we assume that for the current value of i, i >= 1, m = max{a[j] | 0 <= j < i}, then there are two cases to distinguish:

  1. a[i] > m. Then a[i] is larger than all of a[0], ..., a[i - 1], and m is set to a[i], so afterwards m = max{a[j] | 0 <= j < i + 1}.
  2. a[i] <= m. Then m is not changed, and because a[i] does not exceed it, again m = max{a[j] | 0 <= j < i + 1}.

Thus, in both cases the claimed situation for i + 1 can be deduced from the situation for i. Together with the fact that the claim is ok when i = 1, this means that it holds for i = 1, 2, 3, ... , n. The given argument is slightly informal but already contains all the important ingredients of a real correctness proof.
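The claim used in this argument can also be tested while the program runs. The following sketch repeats the loop of the program and checks the claim before every pass and once more at the end. The names is_prefix_max and max_checked are hypothetical, and the check itself is much slower than the computation, so this is meant for experimenting only.

```c
#include <assert.h>

/* Hypothetical helper: 1 if m = max{a[j] | 0 <= j < i}, that is, if no
   a[j] with j < i is larger than m and at least one of them equals m. */
int is_prefix_max(const int* a, int i, int m) {
  int j, found = 0;
  for (j = 0; j < i; j++) {
    if (a[j] > m)
      return 0;
    if (a[j] == m)
      found = 1; }
  return found; }

/* Same loop as in the program, with the claim checked before every pass
   and once more after the loop. Requires n >= 1. */
int max_checked(const int* a, int n) {
  int i, m;
  for (m = a[0], i = 1; i < n; i++) {
    assert(is_prefix_max(a, i, m));   /* the claim holds for i */
    if (a[i] > m)
      m = a[i]; }
  assert(is_prefix_max(a, n, m));     /* the claim holds for i = n */
  return m; }
```

If the test "a[i] > m" is replaced by a wrong one, for example "a[i] < m", one of the assert statements stops the program for most inputs.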

In the first for loop we have written curly brackets, "{" and "}", around the statements to be repeated, in the second for loop we did not do so. Why? The function of curly brackets is to group statements together into a single compound statement. After a for or an if or any of the other instructions of this type, there can either come a single statement, or a compound statement. So, if there is more than one statement to iterate or to execute conditionally, these must be enclosed in curly brackets. A single statement may also be enclosed in curly brackets. What happens when writing the following by mistake?

    for (i = 0; i < SIZE; i++)
      printf("Give the value of a[%2d]   >>>   ", i);
      scanf("%d", &a[i]);
The compiler will not find anything to complain about. For a human reader the grouping seems clear, but the compiler assumes that only the printf statement has to be repeated. The lay-out of the program is of no importance to the compiler! After the loop a single value is read, and it is assigned to a[SIZE], because at the end of the loop i = SIZE. At this point there is a real error.

There are programmers who always write the curly brackets, even if there is only a single statement to execute. Their (quite strong) argument is that it is harder to forget something which you always do than something which you sometimes do. Furthermore, due to corrections the number of statements may change from one to more than one, and then it is very common to forget that this also means that brackets must be added. If they are already there, there is no such risk.

Where should these brackets be placed? There are at least three conventions. All of them are good as long as they are applied consistently. Saving lines in the program code means that more code fits on a single screen of the monitor, thereby making it easier to grasp what it is about. Adding extra lines may help to highlight the structure.
Arrays can be used to store a fixed number of elements under a common name. A loop based on the for statement is the natural way of processing arrays. Conditional execution is obtained with the if statement.

Third Program: Pointers

Further aspects of C are illustrated with a program which asks for a number n, then allocates an array of size n, reads n numbers, rearranges these until they stand sorted in increasing order, and finally prints them. New features are the following:
  #include <stdio.h>
  #include <stdlib.h>
  #define  true  1
  #define  false 0

  typedef  char bool;

  int main() {
    int i, x, r, s, n;
    bool ok = false;
    int* a;
    printf("\nGive n   >>>   ");
    scanf("%d", &n);
    a = (int*) malloc(n * sizeof(int));
    for (i = 0; i < n; i++) {
      printf("Give the value of a[%2d]   >>>   ", i);
      scanf("%d", &a[i]); }
    r = s = 0;
    while (!ok) {
      r++;
      ok = true;
      for (i = 1; i < n; i++)
        if (a[i - 1] > a[i]) {
          ok = false;
          s++;
          x = a[i - 1]; a[i - 1] = a[i]; a[i] = x; } }
    printf("\nSorted after %1d rounds with %1d exchanges:\n", r, s);
    for (i = 0; i < n; i++)
      printf("a[%2d] = %10d\n", i, a[i]);
    printf("\n");
    free(a);
    return 1; }

With the instruction "typedef char bool;" it is settled that the type which in the program is called bool, is actually a char. The reason to introduce a type bool like this, is that we may not want to know about the internal representation, as this distracts from what is important about booleans, namely that they can have two values: true and false. If at the beginning the values of true and false have been correctly settled, there is no need to remember in the rest of the program whether 0 corresponds to the logical value true or to false. This reduces the risk of making errors, makes the program easier to read and hides irrelevant details.

The body of the program starts with declaring some int variables and then a bool which is initialized upon declaration. The statement "bool ok = false;" is equivalent to the two statements "bool ok; ok = false;". After compilation the generated code will be the same, so this shorter way of writing does not make the program faster, but it may make it easier to read.

More interesting is the declaration "int* a;". What is this "*"? It indicates that a is not a normal int but a pointer to an int. This is a crucial distinction, which is not so easy to grasp. Before we consider what a pointer is, it is important to better understand the nature of variables. For every variable there is an entry in a list of variables. This entry is a memory address. For an int x this address gives the beginning of a section of (typically) 4 bytes which are reserved for storing the int value of x. When a statement "x = y;" is executed, y is looked up in the table. This gives an address. Then the value found at the memory position corresponding to this address is copied into the processor. Then x is looked up in the table and finally the value of y is written to the position given by the returned address. Thus, a variable is actually a reference to a box for storing a value of the appropriate type. For a variable of type int*, in the table we find the address of a storage space for the address of an integer. To such a variable we may assign the address of an int rather than the value of an int.

Variables of pointer type serve several purposes. In the statement "a = (int*) malloc(n * sizeof(int));" we encounter a first example. The system procedure malloc allocates memory space and returns the address of the first byte of this space. Here space for n ints is allocated. In addition it is specified that the result must have the type int*. This result, an address, is assigned to a. So, a now points to the beginning of a stretch of n * sizeof(int) bytes, enough for n ints. This is precisely what we need for an int array of length n. In C we can hereafter work with a as if it were an array. So, it is correct to write a[7]. Internally this is handled by looking up the value of a, the address where the stretch of memory starts. Then 7 * sizeof(int) is added to this value. This gives the address of the position where a[7] is stored. This construction is not very clean but works conveniently.
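The relation between indexing and addresses can be made visible with a small experiment. The following is a sketch only; the names read_both_ways and byte_offset are hypothetical. Note that in C the expression "a + k" already counts in units of ints, so the factor sizeof(int) only shows up when the distance is measured in bytes.

```c
#include <stdlib.h>

/* Fills a freshly allocated array with 0, 10, 20, ... and reads position k
   in two equivalent ways. Returns the value read, or -1 if the two ways
   differ or if the allocation fails. Requires 0 <= k < n. */
int read_both_ways(int n, int k) {
  int i, via_index, via_pointer;
  int* a = (int*) malloc(n * sizeof(int));
  if (a == NULL)
    return -1;
  for (i = 0; i < n; i++)
    a[i] = 10 * i;
  via_index   = a[k];        /* array notation */
  via_pointer = *(a + k);    /* start address plus k ints, then the value */
  free(a);
  return via_index == via_pointer ? via_index : -1; }

/* Distance in bytes between a[0] and a[k]. */
long byte_offset(const int* a, int k) {
  return (long) ((const char*) (a + k) - (const char*) a); }
```

For an int array b of length at least 8, byte_offset(b, 7) gives 7 * sizeof(int), which is 28 on a machine with 4-byte ints.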

Then we find "r = s = 0;". This is equivalent to "s = 0; r = s;". Again, this is only a more compact way of writing something that just as well might have been written differently. More generally: the lay-out of the program is essential for the human reader, but of no importance to the compiler. Statements may be packed together on a single line, blanks may be added or omitted (as long as there remains one separating element between any two variable names or keywords).

Then we find "while (!ok) { ... }". The semantics is that as long as the condition inside the round brackets holds, the statements inside the curly brackets are executed. In this case the condition is "!ok", which means that the loop is executed as long as the situation is not ok, whatever this may mean. The while statement is the natural choice for a loop with an a priori unknown number of repetitions. The while statement together with the following iterated statements is called a while loop. In principle there is no need to use both for and while statements. All can be done with either of the two. However, using them in the suggested way gives the reader of the program some extra support. Of course there are cases in between, such as when processing an array from its beginning until its end, unless some exception occurs. In that case a for loop with an additional condition is probably clearest. For example, we might have "for (i = 0; i < n && a[i] >= 0; i++) { ... }" if we want the execution to be interrupted if there happens to be a negative value in the array.
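A for loop with an additional condition might look as in the following sketch. The name sum_until_negative is hypothetical. Because "&&" evaluates its right side only if the left side is true, a[i] is never read for i = n.

```c
/* Adds up a[0], a[1], ... and stops at the first negative value or at the
   end of the array. Returns the sum of the values before that point. */
int sum_until_negative(const int* a, int n) {
  int i, s = 0;
  for (i = 0; i < n && a[i] >= 0; i++)
    s += a[i];
  return s; }
```

For the array (3, 4, -1, 5) the result is 7: the loop is left when it encounters -1, and the value 5 is never considered.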

The attentive reader will have noticed that the while loop in the given program is going to be executed at least once: ok is initialized to false, and therefore the first time !ok is certainly true. For a loop whose body is going to be executed at least once, it is more natural to use an alternative construction. The loop can be rewritten as follows (omitting the counters):

    do {
      ok = true;
      for (i = 1; i < n; i++)
        if (a[i - 1] > a[i]) {
          ok = false;
          x = a[i - 1]; a[i - 1] = a[i]; a[i] = x; } }
    while (!ok);
So, we are using a do-while loop instead of a while loop. Computationally, the advantages are that there is no need to initialize ok and that a test is saved. More important is that it more clearly expresses the structure of the program.

Between the round brackets of a while statement (and analogously for if and for statements), there must stand a boolean expression. For example, "i < n && a[i] >= 0". By substituting the values of the occurring variables i, n and a[i], this expression can be evaluated to a result true or false. This works in the same way as evaluating an arithmetic expression such as "n - i * a[i]", which gives a number. The simplest expression of a given type is a constant or a single variable of that type. Many beginning programmers do not realize this and write "while (ok != true)" instead of the simpler "while (!ok)". Optimizers will discover this and save the unnecessary instruction, but it nevertheless looks quite unprofessional. In C and Java and many other languages the symbol "==" is used for testing whether the left and right side are equal. The result is a boolean. It is a very common error to write "while (a = b) { ... }" where "while (a == b) { ... }" is intended. In Java this leads to an error message from the compiler, because the type of the assignment is not boolean. In C, however, 0 is false and all other values are true. As said before, there is nothing like a type boolean, and at the place of the while condition anything which can somehow be interpreted as a number will do. It often takes quite some time to find this kind of error. The mentioned C feature can even be exploited. The following is a correct but ugly way of shifting all values of an array one position back: "for (i = n - 1; i; i--) a[i] = a[i - 1];". Much clearer is "for (i = n - 1; i > 0; i--) a[i] = a[i - 1];".
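The two ways of writing a shifting loop can be compared in the following sketch. It assumes that shifting "back" means moving every value to the next higher index; the names shift_back and shift_back_terse are hypothetical.

```c
/* Moves a[0..n-2] to a[1..n-1]; a[0] keeps its old value. Running i
   downwards makes sure that no value is overwritten before it is read. */
void shift_back(int* a, int n) {
  int i;
  for (i = n - 1; i > 0; i--)
    a[i] = a[i - 1]; }

/* Same effect for n >= 1, using that in C any non-zero int counts as true.
   For n = 0 the loop starts with i = -1, which also counts as true, so
   this variant must not be called with n = 0. */
void shift_back_terse(int* a, int n) {
  int i;
  for (i = n - 1; i; i--)
    a[i] = a[i - 1]; }
```

Both turn the array (1, 2, 3, 4) into (1, 1, 2, 3). The restriction on n in the second variant is one more argument for writing the condition "i > 0" explicitly.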

Inside the while loop we find the sequence of statements "x = a[i - 1]; a[i - 1] = a[i]; a[i] = x;". This is used to exchange the values of a[i - 1] and a[i]. This operation is called a swap. Swapping two values is not a basic operation. It requires a dummy variable which is used to temporarily store the value of one of the two variables.

Close to the end of the program we find "free(a)". Free is the counterpart of malloc: it tells the system that the memory space allocated to a is no longer needed and can be reused. At the end of the program this would be done anyway, but in other programs new memory is allocated frequently. Never freeing anything would soon lead to a situation in which all available memory has been consumed. This will be worked out in more detail further down.

It was claimed that the given algorithm sorts the numbers of the input in increasing order. Trying several inputs appears to support the claim. This is a good start, but does not prove anything. Another question is how long the algorithm may take. As long as we are typing in the numbers by hand this will not be an issue, but it is still interesting to know whether we can sort 100 or 1,000,000 numbers in a second. Even more important is to make sure that the algorithm terminates. For the previous two programs this was not an issue: the first program contains no loops, so the execution simply runs from the first to the last line. Such programs terminate in a time which is proportional to the number of statements. The second program contains a for loop. The instructions inside the loop are performed at most SIZE times. Again, it is easy to put an upper bound on the number of executed statements, and the running time is proportional to this. For the third program, the situation is much harder: the number of iterations of the while loop is not known beforehand, and it is a priori not even clear that it is finite. Inside the while loop there is a for loop which executes a constant number of statements at most n times. So, the total time consumption is proportional to r * n. n is a user-specified parameter, but r depends on the problem.

If a[i - 1] <= a[i], for all i, 1 <= i < n, then the statements conditioned by the if are never executed. Particularly, the value of ok will not be changed in the for loop, so at the end of the pass through the while loop it will still have the value true which it was given at the beginning of the pass. In that case the while loop is left. On the other hand, if a[i - 1] > a[i], for some i, 1 <= i < n, then the statements conditioned by the if are executed. Particularly, the value of ok will be set to false. Because inside the for loop it cannot be set to true again, it will be false at the end. Therefore, the while loop will not be left. We conclude that the while loop is left if and only if the numbers in the array stand in sorted order. In general termination does not imply correctness, but for this particular problem we now know that if the computation terminates, it is correct as well.

We claim that after r rounds, 0 <= r <= n, the r largest numbers have reached their final positions. For r = 0, the claim is void, so it clearly holds. If it holds for r = n, then it states that all n numbers have reached their final position, which means that the array has been sorted. Now consider any intermediate value of r. Assume the claim holds for r - 1. That is, the largest r - 1 numbers are standing in the r - 1 positions with the highest indices. Consider the number x whose final position is in a[n - r]. This number must stand in some a[j] with 0 <= j <= n - r, because the other positions are occupied by the largest numbers. x is the largest of the remaining numbers (if several numbers have the same value x, then j should be the highest index containing this value). If j < n - r, then in the for loop, when i = j + 1, x is swapped to position i. Thereafter, x bubbles further until it stands in a[n - r]. This shows that the correctness of the claim for r can be deduced from its correctness for r - 1. Together with the correctness for r = 0, this implies that it holds for r = 0, 1, ..., n. Notice the similarity of this argument with the correctness proof of the algorithm computing the maximum.
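The claim implies that for n >= 1 the loop is left after at most n rounds: after n - 1 rounds everything is in place, and one more round may be needed to notice this. This can be tried out with the following sketch, which repeats the sorting loop of the program in do-while form and returns the number of rounds. The name sort_rounds is hypothetical.

```c
/* Sorts a[0..n-1] in increasing order as in the program above and returns
   the number of rounds, that is, the number of passes of the outer loop. */
int sort_rounds(int* a, int n) {
  int i, x, ok, r = 0;
  do {
    r++;
    ok = 1;
    for (i = 1; i < n; i++)
      if (a[i - 1] > a[i]) {
        ok = 0;
        x = a[i - 1]; a[i - 1] = a[i]; a[i] = x; } }
  while (!ok);
  return r; }
```

An array which is already sorted needs a single round. The array (5, 4, 3, 2, 1) needs all 5 rounds, because in every round only the largest remaining number reaches its final position.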

Arrays are closely related to pointers. The while loop is the natural way of expressing a conditional iteration.

Structured Data Types

A structured data type is a data type whose variables (may) contain more than one value of another type.

Arrays are the most important structured data type. Arrays can be defined over any previously defined type. The general format is "type_name array_name[array_size]". For reasons of convenience C even offers a construct for higher-dimensional arrays. A two-dimensional array can be used as a matrix or a tensor. This is declared like "type_name array_name[array_size_1][array_size_2]". This may be imagined as a block of array_size_1 x array_size_2 elements, the first number giving the number of rows, the second the number of columns. In the memory these are arranged row-wise. So for a two-dimensional array a[][] declared by "int a[100][80];", a[17][20] stands just before a[17][21], but a[18][20] stands 80 positions further, because every row consists of 80 elements.
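The row-wise arrangement can be checked with the following sketch. The names ROWS, COLS, flat_position and check_layout are hypothetical; the sizes are those of the example "int a[100][80];".

```c
#define ROWS 100
#define COLS  80

/* Position of a[r][c], counted in ints from the start of the array, for
   the row-wise arrangement: r complete rows of COLS ints, then c more. */
int flat_position(int r, int c) {
  return r * COLS + c; }

/* Returns 1 if a[r][c] really is stored at that position. The addresses
   are compared as byte addresses. Requires 0 <= r < ROWS, 0 <= c < COLS. */
int check_layout(int r, int c) {
  static int a[ROWS][COLS];
  char* start = (char*) a;
  return (char*) &a[r][c] == start + flat_position(r, c) * sizeof(int); }
```

Here flat_position(17, 21) - flat_position(17, 20) equals 1, while flat_position(18, 20) - flat_position(17, 20) equals 80.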

One may think that if we have two int arrays a[] and b[] of length n, the values from b[] may be copied to a[] by writing "a = b;". This is not true. Above we pointed out that arrays are very similar to pointers, and such an assignment (if it were allowed) would let a[] point to the same address as b[]. All manipulations on an array are done by manipulating the individual positions. So, copying b[] to a[] is achieved with a for loop:

  int i;
  for (i = 0; i < n; i++)
    a[i] = b[i];

Strings are text sequences. String constants look like "hello what is your name?". In C there is not really a string type. Instead we can either use arrays of char, or a char*. The first works fine when the strings are assigned upon declaration, but after the declaration, for type reasons, a string constant cannot simply be assigned to an array of char. It is more convenient to define a string variable as a char*. Then the following works correctly:

  char* name;
  name = "Michiel de Ruyter";
  printf("Admiral %s was fighting many sea battles\n", name);
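If an array of char is used instead, a new text can be stored in it after the declaration by copying the characters one-by-one, for which the standard library offers procedures declared in string.h. The following is a sketch under the assumption that all names fit into a fixed maximum size; NAME_SIZE and set_name are hypothetical names.

```c
#include <string.h>

#define NAME_SIZE 32

/* Copies src into the char array dst, which must have room for NAME_SIZE
   chars. A text which is too long is cut off. The result always ends with
   the character '\0', which marks the end of a string in C. */
void set_name(char* dst, const char* src) {
  strncpy(dst, src, NAME_SIZE - 1);
  dst[NAME_SIZE - 1] = '\0'; }
```

After "char name[NAME_SIZE];" the call set_name(name, "Michiel de Ruyter") stores the text in the array, and name can be printed with %s just like the char* above.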

Primitive data types consist of a single element of a given type. Arrays contain a parametrizable number of elements of a given type. There are also compound types, which contain a fixed number of elements of possibly different types. In C such compound types are called structs, in other languages these may also be called records. We do not want to spend much time on structs, because we think that for the typical C application they are of limited interest. There are three reasons to use C:

If we want to be ready fast, then structs are not the thing to use. If speed is critical, then it is mostly better not to use structs but rather pack everything in arrays. For problems requiring non-trivial data structures an object-oriented language, such as C++ or Java, is much more suitable.

The classical example of a struct is the personal record of an employee. An employee has, for example, a number, a name and a salary. This can be achieved as follows:

  struct {
    int   number;
    char* name;
    float salary; } assistant;
This declares a variable assistant with three fields: an int, a char* and a float. A correct assignment is "assistant.salary = 3212.67;", setting the salary field of assistant to 3212.67. Accessing the fields of a compound variable is done in most languages with help of the dot operator ".".

This idea becomes much more useful, when we define a new compound type which can be reused for several declarations:

  struct staff {
    int   number;
    char* name; 
    float salary; };
Hereafter, we can write:
  struct staff assistant;
  struct staff secretary;

We can go one step further by defining struct staff as a type. This can be integrated into the struct definition, but there is no need to do so. If we write

  typedef struct staff staff_type;
then we can write later in the program
  staff_type assistant;
  staff_type secretary;
  staff_type workers[10];
The last defines a working crew of size ten. Its fields are accessed with a double indirection, first an index and then the dot operator: "workers[8].name = "Jan Becker";"

What output will be produced by the following program? It is suggested that you try it before reading on.

  #include <stdio.h>

  int main() {

    typedef struct {
      int   number;
      char* name;
      float salary; } staff_type;

    staff_type assistent, secretary;

    assistent.number    = 101;
    assistent.name      = "Bertina";
    assistent.salary    = 3212.67;

    secretary.number    = 107;
    secretary.name      = "Hannelore";
    secretary.salary    = 2145.18;

    printf("assistent = (%4d, %10s, %8.2f)\n", 
      assistent.number, assistent.name, assistent.salary);
    printf("secretary = (%4d, %10s, %8.2f)\n", 
      secretary.number, secretary.name, secretary.salary);

    secretary = assistent;

    printf("assistent = (%4d, %10s, %8.2f)\n", 
      assistent.number, assistent.name, assistent.salary);
    printf("secretary = (%4d, %10s, %8.2f)\n", 
      secretary.number, secretary.name, secretary.salary);

    secretary.number    = 134;
    secretary.name      = "Birgit";
    secretary.salary    = 2456.56;

    printf("assistent = (%4d, %10s, %8.2f)\n", 
      assistent.number, assistent.name, assistent.salary);
    printf("secretary = (%4d, %10s, %8.2f)\n", 
      secretary.number, secretary.name, secretary.salary);

    return 1; }

One might expect that, in analogy with arrays, structs are pointers. In that case, after writing "secretary = assistent;" both would point to the same memory address, and changing any value of either of the two would also change the value of the other. This is what happens in an analogous situation in Java. C is different: when executing "secretary = assistent;" the computer copies the fields of the struct one-by-one. The later changes to the values of secretary have no impact on the values of assistent.
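The copying of the fields can be observed in the following sketch, which uses the same type as the program above. The names copy_is_independent and names_share_storage are hypothetical. The second procedure shows a detail which is easily overlooked: the field name is a char*, so the assignment copies the address of the text, not the characters themselves.

```c
#include <string.h>

typedef struct {
  int   number;
  char* name;
  float salary; } staff_type;

/* Returns 1 if changing the copy leaves the original untouched. */
int copy_is_independent(void) {
  staff_type assistent, secretary;
  assistent.number = 101;
  assistent.name   = "Bertina";
  assistent.salary = 3212.67f;
  secretary = assistent;          /* copies all three fields */
  secretary.number = 134;
  secretary.name   = "Birgit";    /* lets secretary.name point elsewhere */
  return assistent.number == 101 && strcmp(assistent.name, "Bertina") == 0; }

/* Returns 1 if, directly after the assignment, both name fields hold the
   same address: the pointer is copied, not the text it points to. */
int names_share_storage(void) {
  staff_type assistent, secretary;
  assistent.number = 101;
  assistent.name   = "Bertina";
  assistent.salary = 3212.67f;
  secretary = assistent;
  return secretary.name == assistent.name; }
```

Both procedures return 1. As long as the names are string constants which are never modified, the shared address does no harm.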

Explicitly defining a new type does not only save typing, but is also an essential structuring step. By doing this, it is made explicit that we are dealing with staff members, not just with a bunch of unrelated variables. Going beyond this, packing even the functionality of the type into the definition is the principal idea of object-oriented programming.

Arrays and structs are the most important structured data types. One should be very careful with assignments of structured types. Defining own types is a major structuring step.

Fourth Program: Procedures

We consider an alternative version of the earlier sorting program in order to expose some further central programmatic concepts. New features are the following:
  #include <stdio.h>
  #include <stdlib.h>
  #define  true  1
  #define  false 0

  typedef  char bool;

  void swap(int* x, int* y) {
    int z = *x; *x = *y; *y = z; }

  void read_array(int* a, int n) {
    int i;
    for (i = 0; i < n; i++) {
      printf("Give the value of a[%2d]   >>>   ", i);
      scanf("%d", &a[i]); } }

  int sort_array(int* a, int n) {
    int i, r = 0;
    bool ok = false;
    while (!ok) {
      r++;
      ok = true;
      for (i = 1; i < n; i++)
        if (a[i - 1] > a[i]) {
          ok = false;
          swap(&a[i - 1], &a[i]); } }
    return r; }

  void print_array(int* a, int n) {
    int i;
    for (i = 0; i < n; i++)
      printf("a[%2d] = %10d\n", i, a[i]); }

  int main() {
    int  n, r;
    int* a;

    printf("\nGive n   >>>   ");
    scanf("%d", &n);
    a = (int*) malloc(n * sizeof(int));

    read_array(a, n);
    r = sort_array(a, n);
    printf("\nSorted in %1d rounds\n", r);
    print_array(a, n);

    printf("\n");
    free(a);
    return 1; }

The program is structured quite differently from before. As always the execution starts in main. n is read as before, then we find "read_array(a, n);", a procedure call. When encountering such a procedure call, the execution continues at the beginning of the procedure with the corresponding name. A procedure, also called subroutine, is a subsection of the program with its own header and body just like main. Main itself is also a procedure, the procedure where by default the execution of the program starts. Other procedures which we encountered before are the system-provided IO routines printf and scanf.

The procedure call consists of the name of the procedure and a list of arguments. The values of these arguments are copied into the corresponding parameters in the header of the procedure. In this case the names inside the procedure are the same as in main, but there is no need for this. The procedure read_array is of type void. This means that it is not returning any value. The procedure which is called next, sort_array, is different in this respect: it is of type int, just like main itself, which means that it must end by returning some int value. A procedure returning a value might also be called a function. The procedure print_array is again of type void. Inside sort_array we find a call to the procedure swap which swaps the values of the two variables whose addresses are passed as arguments. We see that procedures may contain calls to procedures which ... . The calling depth can in principle be arbitrary, but in practice it is limited by the space which is reserved for storing the values of the variables and the other data of the pending calls.

It is important to be aware of the visibility of variables. A variable is visible, that is, it can be used, only within a certain scope. A variable which is declared within a procedure can be used anywhere within this procedure, but not in another procedure: it is a local variable. This also means that it is no problem to use the same variable names in several procedures. This is essential! Otherwise, it would be very hard to add a new procedure at a later time to an existing program: one would need a complete overview of all names used anywhere in the program. In principle variables can be even more local than the procedure: in many languages new variables can be defined inside an if, for, while or compound statement (in C at the beginning of any compound statement, and since C99 also in the header of a for statement). These are invisible outside it. Even these variables may have the same names as other variables within the same procedure. In that case we say that these other variables are shielded. In general shielding a variable is not a good idea, because it is confusing. The other extreme are variables which are not local to any procedure: global variables. In C these are declared outside all procedures, usually at the beginning of the program. These are visible everywhere. Because the compiler knows right from the start that these variables will be there, these are allocated in a different way. Except for very large arrays, this goes unnoticed though. It is a good idea to declare variables as local as possible.

There is an essential difference between the way the parameter n is passed to sort_array and the way sort_array passes a[i - 1] and a[i] to swap. In the first case, the value of n is copied into the variable n which lives only locally inside sort_array. Changing the value of n in sort_array has no impact on the value of n in main. This way of calling is known as call by value. Of course this is not what we want to happen in swap: if we would copy the values of a[i - 1] and a[i] into local variables x and y, then swapping the values of x and y would not change the values of a[i - 1] and a[i]. So, in this case x and y should not be local values, they should just be other names for the same objects. That is, they should be pointers, pointing to the same memory space as a[i - 1] and a[i]. That is why x and y are of type int*. This is also why we are not passing a[i - 1] and a[i], but rather the addresses of these. The address of a variable can be accessed with the address operator "&". This second way of calling is known as call by reference, because the address or reference is passed.

Because it is so important, we consider two variants of a swap procedure in more detail:

  #include <stdio.h>

  void local_swap(int x, int y) {
    int z;
    z = x;
    x = y;
    y = z;
    printf("Values in local_swap:\n");
    printf("  x = %10d,   y = %10d\n",  x,  y);
    /* addresses are converted to unsigned long to match the format %lX */
    printf(" ax = %10lX,  ay = %10lX\n",
           (unsigned long) &x, (unsigned long) &y); }

  void global_swap(int* x, int* y) {
    int z;
     z = *x;
    *x = *y;
    *y =  z;
    printf("Values in global_swap:\n");
    printf("  x = %10d,   y = %10d\n", *x, *y);
    printf(" ax = %10lX,  ay = %10lX\n",
           (unsigned long) x, (unsigned long) y); }

  int main() {
    int x, y;
    x = 17;
    y = 23;
    printf("Values in main at beginning:\n");
    printf("  x = %10d,   y = %10d\n",  x,  y);
    printf(" ax = %10lX,  ay = %10lX\n",
           (unsigned long) &x, (unsigned long) &y);
    local_swap(x, y);
    printf("Values in main after local_swap:\n");
    printf("  x = %10d,   y = %10d\n",  x,  y);
    printf(" ax = %10lX,  ay = %10lX\n",
           (unsigned long) &x, (unsigned long) &y);
    global_swap(&x, &y);
    printf("Values in main after global_swap:\n");
    printf("  x = %10d,   y = %10d\n",  x,  y);
    printf(" ax = %10lX,  ay = %10lX\n",
           (unsigned long) &x, (unsigned long) &y);
    return 1; }

Running the program gives the following output (integer values are given in decimal notation, addresses are printed hexadecimally):
  Values in main at beginning:
   x =         17,  y =         23
  ax =   FFBEED2C, ay =   FFBEED28
  Values in local_swap:
   x =         23,  y =         17
  ax =   FFBEED0C, ay =   FFBEED10
  Values in main after local_swap:
   x =         17,  y =         23
  ax =   FFBEED2C, ay =   FFBEED28
  Values in global_swap:
   x =         23,  y =         17
  ax =   FFBEED2C, ay =   FFBEED28
  Values in main after global_swap:
   x =         23,  y =         17
  ax =   FFBEED2C, ay =   FFBEED28

Values of the variables x, x and x

What are procedures good for? In the current example, you may think it only complicates something simple. For very short programs this is true, but there are very good reasons to use procedures:

Recursion

Principle of Recursion

Now we will see one more very important use of procedures. Consider the following procedure for computing factorials:
  int fac(int n) {
    if (n == 0) 
      return 1;
    else
      return n * fac(n - 1); }
Here we see something new: the procedure calls itself. This is called recursion and the procedure is said to be recursive. Recursion is no special problem: to the compiler it is not really different from any other procedure call. Recursion is allowed in practically all modern programming languages. It may also happen that several procedures mutually call each other. In that case, they are said to be mutually recursive.
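A standard small illustration of mutual recursion, not taken from this text, is the following pair of procedures, which decide whether a number n >= 0 is even or odd by calling each other. In C the second procedure must be declared before the first one can use it.

```c
int is_odd(int n);   /* declared first, because is_even calls it */

/* Returns 1 if n is even, 0 otherwise. Requires n >= 0. */
int is_even(int n) {
  if (n == 0)
    return 1;
  else
    return is_odd(n - 1); }

/* Returns 1 if n is odd, 0 otherwise. Requires n >= 0. */
int is_odd(int n) {
  if (n == 0)
    return 0;
  else
    return is_even(n - 1); }
```

A call is_even(4) leads to is_odd(3), is_even(2), is_odd(1) and finally is_even(0), which returns 1.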

What happens exactly if we call fac(5)? At the highest level, n = 5 != 0, so the second alternative applies and the returned value is 5 * fac(4). At the second level, n = 4 != 0, so 4 * fac(3) is returned. This goes on until calling fac(0), which returns 1 without further recursion. Then all pending calls to fac return one-by-one. At the highest level fac(4) comes back with value 24, and fac(5) returns 120.

Any recursive algorithm must have at least one non-recursive alternative, otherwise it will not terminate. Furthermore, the programmer must somehow assure that eventually such a non-recursive alternative will be reached. In the above case this is obvious for n >= 0, because at recursion depth d the value n_d of n equals n_0 - d, where n_0 is the original value of n, so after exactly n_0 recursive calls we will have n_d = 0. This can be proven formally: we claim that d + n_d = n_0 for all 0 <= d <= n_0. For d = 0, the claim obviously holds. For all other d we use that when going one level deeper, d increases by 1 and n decreases by 1. So, the sum d + n_d does not change when d increases: d + n_d is an invariant of the procedure.
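The invariant can be made visible by passing the depth along as an extra parameter. The following is a sketch only; fac_checked and its parameters d and n0 are hypothetical additions which serve no other purpose than the check.

```c
#include <assert.h>

/* fac with two extra parameters which are used only for checking:
   d is the recursion depth and n0 the original argument.
   Requires n >= 0. */
int fac_checked(int n, int d, int n0) {
  assert(d + n == n0);            /* the invariant d + n_d = n_0 */
  if (n == 0)
    return 1;
  else
    return n * fac_checked(n - 1, d + 1, n0); }

/* Starts the recursion at depth 0. */
int fac(int n) {
  return fac_checked(n, 0, n); }
```

A call fac(5) passes the check at the depths 0, 1, ..., 5 and returns 120.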

Recursion does not only lead to very compact programs, but it also gives rise to programs whose correctness often can be proven more easily than alternative formulations with a loop. In the above, if we assume that fac(n - 1) = (n - 1)!, it follows immediately that fac(n) = n * fac(n - 1) = n * (n - 1)! = n!. In the following non-recursive version there are additional variables and instructions other than just assignments and computations:

  int fac(int n) {
    int i, f = 1;
    for (i = 2; i <= n; i++)
      f *= i;
    return f; }

Maximum-Subsequence-Sum Problem

Possibly you are not impressed by the above recursive procedure. Therefore we now consider a problem that is far from trivial. For an array a[] of length n whose values may be positive and negative the task is to compute
maxsubsum(0, n) = max{subsum(l, h) | 0 <= l < h <= n},
where subsum(l, h) = sum_{l <= i < h} a[i]. This value is called the maximum subsequence sum. This particular value may not be that important in itself, but variants of the problem, which can be tackled in a similar way, have important applications in bioinformatics.

The definition immediately suggests the following procedure:

  int maxsubsum(int* a, int n) {
    int i, l, h, s, m;
    for (m = l = 0; l < n; l++)
      for (h = l + 1; h <= n; h++) {
        for (s = 0, i = l; i < h; i++)
          s += a[i];
        if (s > m)
          m = s; }
    return m; }
Here we find three nested loops. This might be time consuming! Let us try to estimate how many times the statement "s += a[i];" is executed. This is the statement at the deepest level and is therefore executed most often. For larger values of n this implies that the time for executing this statement will be a constant fraction of the total running time. The number is sum_{0 <= l < n} sum_{l + 1 <= h <= n} (h - l). It is not very hard to see, for example by using a geometric argument, that this sum is proportional to n^3 (inserting a counter in the program will confirm this). So, the time consumption is proportional to n^3. That means that, for sufficiently large n, doubling the problem size multiplies the time consumption by a factor of 8. Even though computers are very fast, this means that the problem cannot be solved for very large n. On a fast computer n = 10,000 is about the limit.
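The counting experiment mentioned above can be sketched as follows. The names count_inner_steps and closed_form are chosen here for illustration; the closed form n * (n + 1) * (n + 2) / 6 is what the double sum evaluates to.

```c
/* Counts how often the innermost statement "s += a[i];" of the
   triple loop is executed for an array of length n. The array
   itself is not needed for counting. */
long count_inner_steps(int n) {
  int i, l, h;
  long c = 0;
  for (l = 0; l < n; l++)
    for (h = l + 1; h <= n; h++)
      for (i = l; i < h; i++)
        c++;
  return c;
}

/* Closed form of the same sum: n * (n + 1) * (n + 2) / 6. */
long closed_form(long n) {
  return n * (n + 1) * (n + 2) / 6;
}
```

For n = 100 both give 171700 and for n = 200 both give 1353400, a factor of about 7.9, which approaches 8 as n grows.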

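There is an alternative recursive formulation which is quite simple and much more efficient. The underlying idea is general and very important: try to apply a divide-and-conquer approach. For our problem we split the array in the middle and distinguish three mutually exclusive cases: the maximum subsequence lies entirely in the left half, it lies entirely in the right half, or it extends across the middle.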

Each of these cases is considerably simpler than the whole problem. This implies that it is rather efficient to compute each of the three values and then to take the maximum of the three resulting values. The three procedures can be worked out as follows (max denotes a helper, not shown, which returns the largest of its three arguments):
  int maxleftsum(int* a, int l, int h) {
    int i, s, m;
    for (i = h - 1, s = m = 0; i >= l; i--) {
      s += a[i];
      if (s > m)
        m = s;  }
    return m; }
    
  int maxrightsum(int* a, int l, int h) {
    int i, s, m;
    for (i = l, s = m = 0; i < h; i++) {
      s += a[i];
      if (s > m)
        m = s;  }
    return m; }

  int recmaxsubsum(int* a, int l, int h) {
    if (h - l == 1)
      return a[l];
    else
      return max(
        recmaxsubsum(a, l, (l + h) / 2), 
        recmaxsubsum(a, (l + h) / 2, h),
        maxleftsum(a, l, (l + h) / 2) + maxrightsum(a, (l + h) / 2, h)); }
It can be shown that the number of performed operations is now proportional to n * log n. Even for n = 10^6, this is not a very large number and indeed, the program solves a problem of this size in less than a second.
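The procedures above call max with three arguments, which standard C does not provide. The following self-contained sketch repeats them with a helper named max3 (a name chosen here, not taken from the text), so that the fragment compiles on its own. As in the procedures above, a crossing part with negative sum is counted as 0.

```c
/* Largest of three ints; this helper is an assumption of the sketch. */
static int max3(int x, int y, int z) {
  int m = x;
  if (y > m) m = y;
  if (z > m) m = z;
  return m;
}

/* Largest sum of a[i], ..., a[h - 1] over l <= i < h, or 0. */
static int maxleftsum(const int* a, int l, int h) {
  int i, s, m;
  for (i = h - 1, s = m = 0; i >= l; i--) {
    s += a[i];
    if (s > m) m = s;
  }
  return m;
}

/* Largest sum of a[l], ..., a[i] over l <= i < h, or 0. */
static int maxrightsum(const int* a, int l, int h) {
  int i, s, m;
  for (i = l, s = m = 0; i < h; i++) {
    s += a[i];
    if (s > m) m = s;
  }
  return m;
}

/* Divide and conquer on a[l], ..., a[h - 1]; requires h - l >= 1. */
int recmaxsubsum(const int* a, int l, int h) {
  int mid = (l + h) / 2;
  if (h - l == 1)
    return a[l];
  return max3(recmaxsubsum(a, l, mid),
              recmaxsubsum(a, mid, h),
              maxleftsum(a, l, mid) + maxrightsum(a, mid, h));
}
```

For the array {-2, 11, -4, 13, -5, 2} the call recmaxsubsum(a, 0, 6) returns 20, the sum of 11, -4 and 13, which extends across the middle.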

A complete program containing both variants together with routines for generating an input and measuring the time consumption can be downloaded.

Basic Instructions

In the above examples we already encountered the most important instructions. Here these and a few others are treated more systematically.

Structure of Programs

A program consists of a list of declarations of variables followed by operations performed on them. Except for the IO almost anything can be done with just three kinds of instructions: assignments, conditional statements and jumps. The classical conditional statement is the if, the classical jumping statement is the goto. With these two, it is easy to write a loop.
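As an illustration of how a loop can be built from an if and a goto, here is a small sketch. The procedure name sum_with_goto and the labels test and done are chosen here for illustration.

```c
/* Sums the numbers 0, 1, ..., n - 1 using only assignments,
   an if and gotos. It computes the same value as
   for (i = s = 0; i < n; i++) s += i; */
int sum_with_goto(int n) {
  int i = 0, s = 0;
test:
  if (i >= n)
    goto done;      /* conditional jump out of the loop */
  s += i;
  i++;
  goto test;        /* jump back to the test */
done:
  return s;
}
```

The jump back to the test is exactly what a for or while statement performs implicitly at the end of each pass.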

The goto, which jumps to a position indicated by some label, has been banned for good reasons: if employed in a careless way, it leads to spaghetti code which might be close to impossible to understand and debug. Instead certain kinds of jumps can be made with the if, the for and the while statements. In an if statement, if the value of the expression is false, the execution of the following statements is skipped. In a loop, the point of execution jumps back from the end of the iterated statements to the test.

Why are some kinds of jumps ok, while others are considered to be a sign of bad programming style? For this we should consider the structure of a program. Any statement is enclosed by a certain number of curly brackets. This number is unique if the statements after every if, for and while are enclosed in brackets and no superfluous brackets are written. This number is called the bracketing depth of the statement. Using #d to indicate the depth of statements appearing at a certain position of the program, the structure of the presented sorting program is as follows:

  int main() {
    #1
    for () {
      #2 }
    #1
    while () {
      #2
      for () {
        #3
        if () {
          #4 }
        #3 }
      #2 }
    #1
    for () {
      #2 }
    #1 }
The essential feature of the structure of this program, and all programs with these instructions, is that the difference between the depths of consecutive statements is at most one. There are no radical jumps into or out of a substructure. Said otherwise, there is a clean nested structure, a hierarchy of levels. Even calls to procedures respect the nested structure: after the procedure the execution continues at the same place. So, the procedure-calling mechanism just adds another way of going one level deeper. It is generally accepted that this kind of structuring leads to programs which are easier to understand. Gotos are banned because they can be used in a way that upsets the nested structure.

Conditional Statements

If Statement

The if statement has two forms. The first only contains a then-part:
  if (a == 1000)
    a = 0;
The keyword "then" is not written (C likes to save writing, even when this goes at the expense of clarity). The boolean condition is written between round brackets. The operational semantics of the if statement is as follows: if evaluating the boolean condition results in true (in the example this happens when the variable a equals 1000), then the following statement is executed; otherwise this statement is skipped. The if statement only applies to the statement immediately following it. If one wants to apply it to several statements, these should be turned into a compound statement by writing them between curly brackets:
  if (a == 1000) {
    a = 0;
    b++; }

The second form of the if statement also has an else-part.

  if (a == 1000)
    a = 0;
  else {
    a += 4;
    s += a; }
The statements following the else are executed if the boolean expression evaluates to false.

Consider the following code fragment:

  if (i >= 0)
    if (i > 0)
      i--;
  else
    i++;
Starting it for i = 3, the instruction i-- is executed and afterwards i = 2. What happens when i = 0 or i < 0? The layout suggests that nothing should be done if i = 0, and that i should be increased by 1 if it is negative. But, this is not what happens. The compiler ignores the layout and assumes that the else belongs to the if which is closest. Thus, if i = 0, it will be 1 afterwards. If i < 0, it will remain unchanged. There are two ways to resolve the problem. The first is by adding an extra else:
  if (i >= 0)
    if (i > 0)
      i--;
    else
      ;
  else
    i++;
This is ugly, but nicely shows that even an empty statement is a statement. The normal way of solving the problem is by adding protective brackets:
  if (i >= 0) {
    if (i > 0)
      i--; }
  else
    i++;
The brackets here enclose the second if, so that the else clearly is at a different bracketing depth. Actually, in this case the intended functionality can be expressed more clearly by rewriting the code fragment, eliminating the problem at the same time:
  if      (i > 0)
    i--;
  else if (i < 0)
    i++;

Switch Statement

Sometimes there are not two alternatives but more. For example, if the variable d indicates the day of the week, and we want to print it in words, then we may do the following:
  if      (d == 1)
    printf("Sunday");
  else if (d == 2)
    printf("Monday");
  else if (d == 3)
    printf("Tuesday");
  else if (d == 4)
    printf("Wednesday");
  else if (d == 5)
    printf("Thursday");
  else if (d == 6)
    printf("Friday");
  else if (d == 7)
    printf("Saturday");
  else
    printf("wrong day number!");
Here we have chosen an alternative layout to underline that there are several equivalent alternatives rather than a nesting of ifs. This way of writing is ok, but C also offers a special statement for such multiple choices: the switch statement.

With the switch statement, the fact that there are equivalent alternatives is underlined even more. Then the above looks like

  switch (d) {
    case 1: printf("Sunday"); break;
    case 2: printf("Monday"); break;
    case 3: printf("Tuesday"); break;
    case 4: printf("Wednesday"); break;
    case 5: printf("Thursday"); break;
    case 6: printf("Friday"); break;
    case 7: printf("Saturday"); break;
    default: printf("wrong day number!"); }
The default part captures all non-listed values; it is not compulsory. Each alternative ends with a break, because otherwise the execution simply continues with the statements of the next case. Instead of a single variable d, we may have any integer expression to switch on. The switch statement can also be used with characters (which are converted to numbers).
The conditional statements are the if, the if-else and the switch statements.
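One detail of the switch statement deserves a small experiment: in C, execution enters at the matching case label and continues through all following labels until a break is encountered. The sketch below counts how many case bodies are executed with and without break; the procedure names are chosen here for illustration.

```c
/* Without break: execution falls through all labels below the
   matching one, so several bodies are executed. */
int bodies_run_without_break(int d) {
  int count = 0;
  switch (d) {
    case 1: count++;   /* falls through */
    case 2: count++;   /* falls through */
    case 3: count++;   /* falls through */
    case 4: count++;   /* falls through */
    case 5: count++;   /* falls through */
    case 6: count++;   /* falls through */
    case 7: count++;   /* falls through */
    default: count++;
  }
  return count;
}

/* With break: exactly one body is executed for every d. */
int bodies_run_with_break(int d) {
  int count = 0;
  switch (d) {
    case 1: case 2: case 3: case 4:
    case 5: case 6: case 7:
      count++; break;
    default:
      count++; break;
  }
  return count;
}
```

For d = 5 the first procedure returns 4 (the bodies of 5, 6, 7 and default), the second returns 1. When printing day names, the variant without break would print several names in a row.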

Repetitive Statements

There are three repetitive statements, the for, the while and the do-while statement. All three have been encountered before. Here a few more details are added.

The for statement in general looks like for (part 1; part 2; part 3). Parts 1 and 3 may contain zero or more assignments. Part 1 is intended for the initialization of variables relevant to the loop. Part 3 is intended for updating them at the end of a pass. Part 2 is a boolean expression.

Consider the problem of determining whether a value x occurs among the values stored in an array a[] of length n. This can either be done with a for or with a while loop, but the first appears to be the choice that most clearly exposes the structure of the computation: In a context where a[], n and x are given and i is an int, we can write

  for (i = 0; i < n && a[i] != x; i++);
  if (i < n)
    printf("%1d occurs\n", x);
  else
    printf("%1d does not occur\n", x);
Here the statement following the for statement is the empty one. This is marked by the semicolon following the closing bracket. It is a common error (which is quite hard to find!) to write this semicolon where it is not intended. The above is correct even if x does not occur. In that case i runs until it is n. Then the expression i < n && a[i] != x is evaluated. At this point it is essential that such an expression is evaluated from left to right in a lazy way. Lazy evaluation (also called short-circuit evaluation) means that the evaluation stops as soon as the result is known. If i = n, then i < n is false and the value of a[i] != x does not matter anymore. This is fortunate, because for i = n we should not access a[i], as this might lead to a segmentation fault. A segmentation fault occurs when trying to access a memory position which lies outside the space allocated to the program. So, testing with a[i] != x && i < n might have led to an error. Also in the test after the loop we were careful to avoid accessing a[n]. It would be risky to write "if (a[i] == x) ... ". Actually, with this alternative test something worse than a segmentation fault might happen: the program might produce the wrong answer without crashing. If position a[n] lies within the space of the program, then in C the test will be performed in the normal way. If by coincidence position a[n] has value x, then the conclusion will be that x occurs in a[].
The for and while statements offer elegant ways to realize a conditional repetition.
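The search can be packaged as a procedure. This is a minimal sketch; the name find and the convention of returning -1 when x does not occur are assumptions made here.

```c
/* Returns the smallest index i with a[i] == x, or -1 if x does
   not occur among a[0], ..., a[n - 1]. */
int find(const int a[], int n, int x) {
  int i;
  /* i < n is tested first; for i == n the && stops there,
     so a[n] is never read. */
  for (i = 0; i < n && a[i] != x; i++)
    ;
  return i < n ? i : -1;
}
```

The empty statement is written on a line of its own here, which makes it harder to overlook than a semicolon directly behind the closing bracket.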

Return Statements

Any non-void procedure must end with a return statement. More precisely, it should be assured that any possible path in the computation ends with a return. In other languages, such as Java, the rules are stricter: there the compiler must be able to verify this. This is not always the same: every path may end with a return without the compiler being able to establish it.

Computing the maximum of two numbers is easy:

  int maximum(int i, int j) {
    int m;
    if (i >= j)
      m = i;
    else 
      m = j; 
    return m; }
Here there is only one return statement at the very end of the procedure. This clearly exposes the structure of the computation and it is clear which value is going to be returned. So, if there is something wrong, then we know that we must figure out how m got its value.

The maximum can also be computed as follows:

  int maximum(int i, int j) {
    if (i >= j)
      return i;
    else 
      return j; }
This procedure is shorter and possibly faster. Here we use that return statements may appear anywhere in the program. If we look at the changes in bracketing depth, then we see that the return statements skip over a level. So, this way of using return gives a violation of the pure nested structure. On the other hand, this is not a very serious violation because it goes upwards, like closing several brackets in one stroke.

Actually, realizing that the execution is immediately interrupted once a return is encountered, the above can be written even shorter:

  int maximum(int i, int j) {
    if (i >= j)
      return i;
    return j; }
This is shorter but probably not faster, because a good compiler will generate the same code for both alternatives. On the other hand, this does obscure the structure of the procedure: it is no longer clear that there are two equivalent alternatives. It may appear that the default return value is j and that sometimes i is returned instead. In this three-line procedure this is no issue, but in a longer procedure this kind of "improvement" should be avoided.

Comments

The last important type of instructions are comments. They are not important for the functioning of the program (they are simply removed at some stage of the compilation, and therefore they have no influence on the length of the compiled code or its speed), but they are essential for understanding the program. This is particularly important for programs which you or someone else is going to use again after some time: even the best code is hard to understand without hints on its purpose and functioning. In C comments are enclosed between matching pairs of "/*" and "*/" symbols. Inside you can write anything except "*/". Thus, we might have
  /* Written by Jop Sibeyn, 25.10.2002 
     This program computes the average value of the elements 
     of an array */

  #include <stdio.h>

  #define SIZE 100 /* The length of the used array */

  int main() {
    int i, a[SIZE], sum; 

    /* Initialization */
    for (i = 0; i < SIZE; i++)
      a[i] = i;

    /* The main part of the program */
    for (i = 0, sum = 0; i < SIZE; i++)
      sum += a[i];

    /* Printing the results */
    printf("The average value is %5.2f\n", (float) sum / SIZE);

    /* Finishing */
    return 0; }

You should not write books, but concise explaining phrases are essential. Any complete program of more than 50 lines should be commented.

Comments are essential for understanding a program.

Working with Arrays

In C there is no very strict type management. This is certainly one of the reasons why C leads to somewhat faster programs than most other languages, and it also creates opportunities. At the same time it leads to errors which are hard to detect. A clever programmer therefore handles the offered possibilities very carefully. This section gives a detailed discussion of arrays and their relation with pointers, summarizing and extending what has been said before.

Address Arithmetic

An array in C is closely related to a pointer. There are some differences though, which we will discuss here. The typical way of defining an array is as follows:
  int a[100];
This instruction creates space for storing 100 integers: 400 bytes on a system where an int occupies 4 bytes, as is assumed in the following. This space can be accessed with the help of the name a. Actually there is an internal mechanism, called address arithmetic, which is used to access the stored values: if we write "b = a[23]", then the value of a is fetched (this is a memory address, namely the address of a[0]), then 23 * 4 is added to this value, and the integer at this address (the value represented by the 4 bytes starting at the specified address) is returned. Therefore, the above assignment is equivalent to writing "b = *(a + 23)". The most remarkable feature is that in this notation we only need to add 23 and not 4 * 23, because internally it is known that a is an array of integers.
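The address arithmetic can be made visible with a small sketch. The procedure names same_position and byte_offset are chosen here for illustration; sizeof(int) is used instead of the constant 4, so that the sketch does not depend on the size of an int.

```c
#include <stddef.h>

/* Returns 1 if a[i] and *(a + i) denote the same memory position. */
int same_position(int* a, int i) {
  return &a[i] == a + i;
}

/* Distance in bytes between a[i] and a[0]. Converting to char*
   makes the arithmetic count single bytes instead of ints. */
ptrdiff_t byte_offset(int* a, int i) {
  return (char*) (a + i) - (char*) a;
}
```

For an array declared as "int a[100];", same_position(a, 23) returns 1 and byte_offset(a, 23) equals 23 * sizeof(int), which is 92 on a system with 4-byte integers.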

The header of a procedure with an integer array argument may be written in two ways. At first glance the most natural way is to write

  int sum(int a[], int n) {
    int i, s;
    for (i = s = 0; i < n; i++)
      s += a[i]; 
    return s; }
In this way it is explicitly told that the argument must be an integer array. Alternatively, one may write
  int sum(int* a, int n) {
    int i, s;
    for (i = s = 0; i < n; i++)
      s += a[i]; 
    return s; }
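Both procedures are correct C and have exactly the same meaning: in a parameter list, "int a[]" is just another way of writing "int* a".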

Exchanging the Values of Arrays

Suppose now that we want to exchange two arrays a[] and b[]. We do not care where the data are stored, we only want that afterwards a[i] has the original value of b[i] and vice versa, for all 0 <= i < n.

A possible way of doing this is by simply exchanging all values:

  #include <stdio.h>

  void initialize(int a[], int b[], int n) {
    int i;
    for (i = 0; i < n; i++) {
      a[i] =  i; 
      b[i] = -i; } }

  void simple_exchange(int a[], int b[], int n) {
    int i, c;
    for (i = 0; i < n; i++) {
      c    = a[i];
      a[i] = b[i];
      b[i] = c; } }

  void print_arrays(int a[], int b[], int n) {
    int i;
    printf("\n");
    for (i = 0; i < n; i++)
      printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }

  int main() {
    int n = 10, a[n], b[n];
    initialize(a, b, n);
    print_arrays(a, b, n);
    simple_exchange(a, b, n);
    print_arrays(a, b, n); }
This procedure takes time proportional to n. One might fear that it has the same problem as local_swap above, but that is not the case. The reason is that the array variables a and b are actually pointers and that these pointers are handed over as parameters. The output of the program is:
  a[ 0] =    0, b[ 0] =    0
  a[ 1] =    1, b[ 1] =   -1
  a[ 2] =    2, b[ 2] =   -2
  a[ 3] =    3, b[ 3] =   -3
  a[ 4] =    4, b[ 4] =   -4
  a[ 5] =    5, b[ 5] =   -5
  a[ 6] =    6, b[ 6] =   -6
  a[ 7] =    7, b[ 7] =   -7
  a[ 8] =    8, b[ 8] =   -8
  a[ 9] =    9, b[ 9] =   -9
  
  a[ 0] =    0, b[ 0] =    0
  a[ 1] =   -1, b[ 1] =    1
  a[ 2] =   -2, b[ 2] =    2
  a[ 3] =   -3, b[ 3] =    3
  a[ 4] =   -4, b[ 4] =    4
  a[ 5] =   -5, b[ 5] =    5
  a[ 6] =   -6, b[ 6] =    6
  a[ 7] =   -7, b[ 7] =    7
  a[ 8] =   -8, b[ 8] =    8
  a[ 9] =   -9, b[ 9] =    9

This is nice, but not very efficient. This operation can also be performed in constant time: we do not have to exchange all elements, it is sufficient to exchange the values of a and b. So, afterwards a will point to the first position of b and vice versa. That is, we want to perform a procedure like global_exchange with parameters of type "array of integer". For more complex operations like this, it becomes much more convenient not to mix the array notation with the pointer notation:

  #include <stdio.h>
  #include <stdlib.h>

  void initialize(int a[], int b[], int n) {
    int i;
    for (i = 0; i < n; i++) {
      a[i] =  i; 
      b[i] = -i; } }

  void fast_exchange(int** aa, int** ab) {
    int* c ;
      c = *aa;
    *aa = *ab;
    *ab =   c; }

  void print_arrays(int a[], int b[], int n) {
    int i;
    printf("\n");
    for (i = 0; i < n; i++)
      printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }

  int main() {
    int n = 10; 
    int* a = (int*) malloc(n * sizeof(int));
    int* b = (int*) malloc(n * sizeof(int));
    initialize(a, b, n);
    print_arrays(a, b, n);
    fast_exchange(&a, &b);
    print_arrays(a, b, n);
    free(b);
    free(a); }
This produces the same output as the program above.

Values and addresses of arrays

Lifespan of Allocated Memory

It is important to understand the above program in detail. This time a and b are declared of type int*. Exchanging two variables of type int* is performed entirely analogously to exchanging two variables of type int in global_swap. The only difference is that here all variables have one extra *.

Different from an array declaration, declaring a variable of type int* does not immediately allocate a whole lot of memory. A variable of type int* has a size of four (possibly eight) bytes. The standard procedure malloc is used to allocate memory. The number of bytes is passed as an argument. We might write 4 * n, but then we would explicitly use that integers are four bytes long. Doing this, the program would not work on a system where integers consist of eight bytes. The procedure malloc returns a typeless pointer, void*. In C such a pointer is converted automatically when it is assigned to an int*, so a cast is not strictly required (in C++ it is). Here we nevertheless precede the call by "(int*)", making the type conversion of the result explicit. Said otherwise, the result is cast to int*.

At the end of the program we find the calls to the standard procedure free. This procedure deallocates the memory a pointer is pointing to. Of course, at the end of a program all memory is deallocated, so in this case these statements are superfluous. However, in general it is important to carefully manage the memory making sure that the program does not create garbage: allocated memory which cannot be reached anymore by following any of the pointers. Garbage would be created if at some stage in the program we would write "a = b;" or if we would have a second malloc statement for a. Forgetting to free memory is an important source of problems. Suppose that instead of simple_exchange we were doing the following:

  void stupid_exchange(int a[], int b[], int n) {
    int i;
    int* c = (int*) malloc(n * sizeof(int));
    for (i = 0; i < n; i++)
      c[i] = a[i];
    for (i = 0; i < n; i++)
      a[i] = b[i];
    for (i = 0; i < n; i++)
      b[i] = c[i]; }
Constructions of this kind, reducing the number of different variables in each loop, might have advantages when the "cache associativity" is low (stupid_exchange is guaranteed to work fine already for a "two-way associative cache"). However, this procedure leaves behind n * sizeof(int) bytes of garbage. If we call this procedure many times, we will run out of memory, even when the program actually needs only a small fraction of it. A disciplined programmer matches each malloc (or similar operation) with a corresponding free, in the same way as any "{" is matched by a corresponding "}".
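A variant which leaves no garbage might look as follows. This is a sketch only: the name buffered_exchange and the returned status value are assumptions made here, and the check of the result of malloc is an addition.

```c
#include <stdlib.h>

/* Exchanges the values of a[] and b[] via a temporary buffer
   which is freed before returning. Returns 1 on success and 0
   if the buffer could not be allocated. */
int buffered_exchange(int a[], int b[], int n) {
  int i;
  int* c;
  if (n <= 0)
    return 1;                       /* nothing to exchange */
  c = (int*) malloc((size_t) n * sizeof(int));
  if (c == NULL)
    return 0;                       /* allocation failed */
  for (i = 0; i < n; i++)
    c[i] = a[i];
  for (i = 0; i < n; i++)
    a[i] = b[i];
  for (i = 0; i < n; i++)
    b[i] = c[i];
  free(c);                          /* matches the malloc above */
  return 1;
}
```

Because the free is reached on every path on which the malloc succeeded, the procedure can be called arbitrarily often without using up memory.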

Now one might become afraid that we have the same problem for the following procedure:

  void not_so_stupid_exchange(int a[], int b[], int n) {
    int i;
    int c[n];
    for (i = 0; i < n; i++)
      c[i] = a[i];
    for (i = 0; i < n; i++)
      a[i] = b[i];
    for (i = 0; i < n; i++)
      b[i] = c[i]; }
But here we touch on the counterpart of the automatic memory allocation of an array: just as the memory is allocated implicitly, it is also automatically deallocated at the end of the procedure.

In particular this means that one should not assign a local array to a pointer variable which is going to be used outside the procedure. Consider the following program:

  void initialize(int** a, int n) {
    int i;
    int b[n];
    for (i = 0; i < n; i++)
      b[i] = i;
    *a = b; }

  void print_array(int* a, int n) {
    int i;
    printf("\n");
    for (i = 0; i < n; i++)
      printf("a[%2d] = %4d\n", i, a[i]); }

  int main() {
    int n = 10;
    int* a;
    initialize(&a, n);
    print_array(a, n); }
This program is syntactically correct. However, when running it, it may crash or it may produce nonsense: after initialize has returned, a points to memory that is no longer reserved for b. So, notwithstanding its syntactic correctness, this program has a runtime error. If memory which is allocated in a subroutine should continue to exist after its completion, then this memory should be allocated with malloc as in the following modified procedure:
  void initialize(int** a, int n) {
    int i;
    int* b = (int*) malloc(n * sizeof(int));
    for (i = 0; i < n; i++)
      b[i] = i;
    *a = b; }

Allocating memory in a subroutine which survives after the subroutine has finished makes it hard to keep track of the allocated memory. It is a good practice to assure that subroutines leave no garbage: at the end of any subroutine there should be one free for each of its mallocs. So, it is suggested that instead of allocating the memory in initialize, it is allocated in main. Just like when closing brackets, it is also advisable that the frees are performed in reversed order: the memory that was allocated latest should be freed first.
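The suggested division of labor can be sketched as follows. The names fill and sum_of_filled_array are hypothetical; the second procedure plays the role of main, so that allocation, use and deallocation stand together in one place.

```c
#include <stdlib.h>

/* Stores 0, 1, ..., n - 1 in a[]. The memory belongs to the
   caller; nothing is allocated or freed here. */
void fill(int* a, int n) {
  int i;
  for (i = 0; i < n; i++)
    a[i] = i;
}

/* Allocates an array, lets fill initialize it, sums the values
   and frees the array again. Returns -1 if malloc failed.
   Assumes n >= 1. */
long sum_of_filled_array(int n) {
  int i;
  long s = 0;
  int* a = (int*) malloc((size_t) n * sizeof(int));
  if (a == NULL)
    return -1;
  fill(a, n);
  for (i = 0; i < n; i++)
    s += a[i];
  free(a);          /* the only malloc has its matching free */
  return s;
}
```

With this arrangement the subroutine fill cannot create garbage, and the single malloc and its free can be checked at one glance.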

The difference between allocating memory by writing "int c[n];" and "int* c = (int*) malloc(n * sizeof(int));" also becomes visible when running the procedures stupid_exchange and not_so_stupid_exchange for large n. The variant with "int c[n];", not_so_stupid_exchange, will typically not work for n on the order of some millions, even though there is sufficient memory available. The variant with malloc, stupid_exchange, works up to the limit of the memory. The reason is that for each kind of allocation a separate region of memory is reserved, and the region for local variables (the stack) is not particularly large.

There is a close but imperfect relation between arrays and pointers. Arrays are convenient, but often it is better to use pointers and allocate the memory with malloc and deallocate it with free.

Program Design

So far we have been talking about how to express certain concepts concisely with the help of a programming language. Of course, before it is possible to express things there must be a plan. In general such a plan is born in several stages, going top-down: starting at the highest level of abstraction and gradually filling in the details. The bigger the problem is, the more seriously the modeling must be taken. For real-life problems the principal task is to model the problem so that it can be tackled with the help of a computer. The typical programming exercises are easier: these are already formulated in computer-science terminology. Nevertheless, even for such problems one should work top-down. It is a good approach to first write the subroutine main() in a version which only contains a few declarations and calls to subroutines. From this it becomes clear which functionality must be supported. Then one may decide how this functionality can be achieved.

For programs there are several quality measures. Obviously, correctness is crucial: a program which is not correct is worthless. For a serious program, readability and extensibility are also of great importance, because if the requirements change, another programmer must be able to modify and extend it. A final main issue is efficiency: a program which can solve only small problems may be good enough today, but it should also work for tomorrow's larger problems. Above we have seen how changing the approach had a tremendous impact for the maximum-subsequence-sum problem: for an array of length n the running time of the most basic approach was proportional to n^3 while it was quite easy to reduce this to n * log n.

Instead of approach we will use the word algorithm. An algorithm is a stepwise description of how to tackle a problem. This problem may be how to make an apple pie, or how to repair a bicycle tire, but it may also be how to compute the maximum subsequence sum. If we say that something is an algorithmic issue, we mean that it has to do with the algorithm. This will sometimes be contrasted with a programmatic issue, which means that it has to do with the program. Whereas programming is considered to be a rather basic skill, algorithm design is considered to be a more intellectual activity.

The programmatic details can have a substantial impact on the efficiency of a program, but the difference with an optimized version will be bounded by a factor which does not increase with the problem size. On the other hand, as we have seen for the maximum-subsequence-sum problem, the factor between the running time of two programs based on different underlying algorithms can grow arbitrarily large with increasing problem size.

A problem should be solved in a top-down way: first modeling, then algorithm design and finally programming. The programming again should be done in a top-down way.

Exercises

  1. Write a program which asks for integers n and k, determines whether n is divisible by k and prints the result.

  2. Whether a number n is odd or even can be determined without division, by looking only at the last bit. How are even numbers characterized? Write a program which asks for an integer n and prints its parity.

  3. Integers can also be used to manage small sets: bit i has value 1 if i is an element of the set, and otherwise 0. Under this convention, the size of the represented set is given by the number of ones in the integer. This shows the importance of an operation which determines the number of ones. Write a program which asks for an integer n, determines the number of ones in its binary representation and prints this number as output.

  4. Write a program which asks for a character c. First it tests whether c is a letter. If it is not, it prints c unmodified. If c is a lower-case letter, it prints the corresponding upper-case letter and vice-versa.

  5. The normal way of swapping two numbers is by using a dummy variable and three assignments. Using some more arithmetic this can also be done without additional storage. Show how.

  6. Consider the program computing the maximum number in an array of length SIZE. For which inputs is the number of instructions performed in the while loop maximized/minimized?

  7. Consider the program sorting n numbers. Modify the program in two ways: Measure the values of c for n = 1000, 2000, 4000, 8000. For each value of n compute the average of 10 experiments. How does c develop as a function of n?

    In the current algorithm the for loop inside the while loop starts with i = 1. As shown in the analysis, this has the effect that the largest numbers are transported to the end of the array. A similar for loop starting with i = n - 1 could transport the smallest numbers to the beginning of the array. There is not really a difference between the two. However, it is interesting to alternate these two for loops. Change the while loop so that for even r the original for loop is performed, while for odd r the other one is performed. Perform the same measurements as before and formulate again how c develops as a function of n. Draw a conclusion from your results.

  8. In many applications it is needed to compute the maximum and minimum. For small numbers it may be worthwhile to design an optimized procedure.

  9. The median of three numbers i, j and k is the number m so that at least one of the other numbers is less than or equal to m and at least one number is larger than or equal to m.

  10. The product a * b of two positive integers a and b can be defined as follows:
    a * b = a, if b = 1,
    a * b = a * (b - 1) + a, if b > 1.
    Write a recursive procedure for computing products based on this definition.

  11. Write a recursive procedure for computing a^x for integers a and x, with x >= 0.

  12. Consider the problem of inverting an integer array a[] of length n. For example, an array with values [12, 4, 6, 33, 5] should be turned into an array with values [5, 33, 6, 4, 12]. The task is to write a void procedure invert which takes as arguments an array and its length. The procedure should be correct for all n >= 0. Write two variants:

  13. Consider the problem of finding the most frequent value in an array a[] of length n. Assume 0 <= a[i] < m for all i, 0 <= i < n. Write a correct C procedure solving this task. The procedure should return an int, giving the most frequent value. Hint: use an additional array which must be declared in a correct way. Integrate the routine in a program and test its correctness for n = 50 and m = 20. The values in a[] should be chosen at random. Measure the time consumption as a function of n for inputs with n = m = 2^k, for k = 10, 11, ... , k_max. Take k_max as large as possible. Does the time increase linearly? If the answer is no, then explain why this was to be expected.

  14. Consider the testing whether a specified value x occurs in an n x n integer matrix a[][]. Write a correct C procedure returning 1 if there are i and j so that a[i][j] = x and 0 otherwise. Integrate the routine in a program and test its correctness.

  15. Consider an array a[] of type unsigned int. The length of the array is given by a parameter n, which is read at the beginning of the program. Initialize the array according to the following rule: a[0] = 0; a[i] = (a[i - 1] + x) % n, for all i > 0. Algebra tells us that whenever n and x are relatively prime, that is gcd(x, n) == 1, the whole pattern of a[] values is a permutation of the numbers 0, ..., n - 1. That is, all numbers occur exactly once. Taking x a prime number, this is guaranteed to hold for all n which are not a multiple of x.

    The task is to verify this property for various n and x and to measure the time consumption. This is done by using a second unsigned integer array b[], which is used for counting the frequencies of the numbers in a[]. It is initialized at zero and in a final pass the maximum of all values in b[] is determined and printed.

    Times can be measured with the following procedure:

          long dclock() {
            /* Returns the time in milliseconds */
            struct timeval  tp;
            struct timezone tzp;
            gettimeofday(&tp, &tzp);
            return 1000 * (tp.tv_sec % 1000000) + tp.tv_usec / 1000; } 
        
    This is not the most scientific way of measuring times, but it is simple and works quite well. In order to be able to use this routine it is necessary to include the system library "sys/time.h", which is done in the same way as the inclusion of "stdio.h".

    For n you must test n = 2^k, for all k >= 12 as far as the computer allows you to solve the problem in less than 1 minute. For x you must test x = 1, 2, 4, 11, 19, 1007, 99991. The time measurement should only reflect the time for counting the frequencies of the numbers in a[], not the initialization or finding the maximum. To get stable measurements, the experiments should be repeated until the sum of the measured times exceeds 1000 ms (and then of course you must divide by the number of experiments to get the average time per experiment). Plot the resulting average time consumptions as a function of n using a doubly logarithmic scale (that is, both along the x-axis and along the y-axis the scaling is so that each factor two is one unit distance) connecting the points belonging to the same x value. Consider the developments and the differences and explain them.

  16. Write a program for converting numbers from one number system to another. The program repeatedly asks for the number, the initial radix and the final radix. Both radices can be any number up to 10. Negative numbers and zero should also be treated correctly. The resulting converted number is printed.

  17. In Chapter 1 it was specified how a 32-bit floating-point number is composed: 1 sign bit, 8 exponent bits in excess-127 representation, 23 mantissa bits giving an unsigned int. Write a program which asks for a floating-point number, for example -456.123E16 and composes all bits into the 32 bits of a single unsigned int x. In the program you also declare a float* variable y, which is set to point to the same address as x (setting y = (float*) &x;). Then you print the value *y to check the correctness of your conversion.

  18. Write a program for efficiently performing set operations using a boolean for every element of the set, packing 32 booleans (which indicate whether an element is present in the set or not) in an unsigned int. The elements in the sets have indices from 0 to n - 1, for some value n which is read at the beginning of the program. The supported operations should be:

    Generate three random sets of size 100.000.000 each: S_1 gives the lotto prizes for the first draw, the probability that a number wins a prize is 0.05. S_2 gives the lotto prizes for the second draw, again a fraction 0.05 of the entries is 1. S_3 gives the lotto bets, the probability that a number is selected is 0.2. Now compute the number of bets resulting in a prize (each bet gets at most one prize). That is, you should first compute the union of S_1 and S_2, then intersect with S_3 and finally compute the size of the resulting set. Print this resulting number (if it does not lie between 1.940.000 and 1.960.000, then probably there is something wrong with your program).

    Random numbers can be generated with help of the function random. See the online manual for the details (type "man random" inside a Unix or Linux environment).

  19. Write a program for determining all primes up to a certain maximum value n which is read at the beginning of the program. Use the Eratosthenes sieve method. The idea of this method is the following:

    This algorithm can be made more efficient by explicitly dealing with some special cases. For example, it is not necessary to ever test even numbers: these can also be thrown out by a modified initialization. Larger improvements can be achieved by not testing multiples of 2, 3, 5, 7, ... , either. And, when one is not testing them, why should one have storage for these numbers? You need not implement any of these improvements, it is just pointed out here that this fast algorithm for finding prime numbers can be improved further.

    Program two variants with the following features:

    For each of the two versions determine the largest power of 2 for which the program runs in less than 1 minute. Which variant is best?

    The program should also produce some output. After performing the sieving, you should determine for each number 2 <= k == 2^i <= n / 2 the number of primes between k and 2 * k and the resulting average distance between two primes in these intervals. For each of these intervals the program should also print the maximum distance between any two consecutive primes.

    We want to know how efficient the Eratosthenes algorithm for computing primes is. Not in a concrete sense by measuring seconds, but in an abstract sense by counting some specific operations which give a good measure for the amount of work performed. In our case, such a measure is given by the number of visited multiples of the prime numbers. This does not account for the initialization and the testing, but this amount of work is easy to estimate: it is proportional to n.

    Determine this number for several values of n and speculate how it develops as a function of n. You can choose from simple functions of the following types: c * n^2, c * n^{3/2}, c * n * log n, c * n * loglog n, and variants. Of course you do not need to speculate: using your measurement of the development of the average distance between primes, it is not hard to derive this development.

  20. Write a program for converting a text to caps_format, all letters must be replaced by capitals, while the other characters and the layout remain unchanged. The original text is found on the file input, the converted text is written to the file output. Both files stand in the same directory as the program.

  21. Write a program for reformatting a text so that all lines are left- and right-aligned. The program first asks for the line width w. Then it determines which words fit on a line, loading the characters into a buffer. Then between all words that fit on the line a certain number of blanks is added (so that the spacing becomes as even as possible). In this context, a "word" means any sequence of characters not containing blanks, so non-letters standing connected to a word (commas, dots, etc.) are treated as being part of the word and do not get separated from it. Additional blanks and empty lines in the original text are ignored, so the output is a single block of text of width w. Only the last line should be treated in a special way: here no additional spaces are added. The original text is found on the file input, the converted text is written to the file output. Both files stand in the same directory as the program.

  22. Write a program for computing matrix products in three different ways. For computing the product C of n x n matrices A and B, the following methods should be tried: Here the transpose of a matrix A is the matrix A' with A'_{ij} = A_{ji}.

    The matrices should be initialized as follows:

    A_{ij} = 1, for all i, j with i + j even
    A_{ij} = -1, for all i, j with i + j odd
    B_{ij} = i, for all i, j
    For C = A * B this gives a simple regular pattern, which can be used to check that the three procedures all compute the same product.

    Measure the time for each of these methods for n = 2^k, for k = 4, 5, ..., 10 or 11. The time for possibly transposing the matrix must also be taken into account, but not the time for allocating and initializing the matrices. The first time you are using C after allocating it, all its fields must be accessed once to make sure that C is actually loaded into the cache/memory. For the small matrices the experiments must be repeated many times to get stable time measurements.

    Plot the results in a suitable way: along the x-axis you should give the k values, along the y-axis you should give log_2 T(2^k), where T(2^k) gives the time for an experiment with n = 2^k. The graphs should be approximately straight lines. Explain the irregularities in the development and the differences between the methods. Which method is best?





Program Execution

Computers

The previous chapter introduced the computer language C more or less independently of computers. It was only mentioned that a compiler is needed to translate the program and that a.out should be typed to execute it. In this chapter we first discuss the essential features of computers. Then we consider how data types are represented internally. This discussion partially overlaps with the earlier presentation, but now it is given independently of the applied programming language. Then we consider in some detail the translation stages of a program and the virtual computer levels. Finally we look at some efficiency aspects.

Composition

Abstractly, a computer consists of one or more processors, a hierarchy of storage devices, a device for handling input and output, abbreviated IO, and connections between all these. These connections are called buses. On the one hand it is good and necessary to abstract. On the other hand it is also important to have a basic understanding of the operation of a computer, because this determines the conditions under which we are working. In the following, in order to make it more concrete, some numbers will be mentioned. Clearly these numbers are changing rapidly. Memories become larger, processors become faster. The main features however do not really change: most of the aspects mentioned in the following were already valid in 1980.

Two-Processor Computer System

The processor or central processing unit or CPU is the heart of a computer. The processor is a chip which consists of several parts. Historically the most important parts of this chip were the control unit, the arithmetic logical unit or ALU and the registers, but to improve performance caches have been added. The control unit is the coordinator. It fetches instructions and data from the memory and guides their execution. The ALU contains the hardware for arithmetic and logical operations. The registers are used for storing data involved in the computations, but there is also a register reserved for the program counter, containing the address of the next instruction to execute, and an instruction register, containing the current instruction.

The processor is the heart, but without other components, it would not be very useful. The main memory is the normal place to store data and instructions. For larger amounts of data and more permanent data, there are secondary and tertiary storage media. The hard disk is the most common form of secondary storage. For even larger amounts of data there exist drums and tapes, but nowadays they are only used for special applications and they are losing importance because of the rapid development of hard disks.

The IO media make it possible to get data into the system and out of it. The most important input media are the keyboard, the mouse and all kinds of reading devices. Punch card readers, once the prime way to feed data into a computer, have died out. The most important output media are the screen and the printer. Of course for the communication with the outside world, the computer contains several other chips which in the case of the video card might be considered to be a special-purpose computer itself.

If a computer is equipped with more than one processor, then each of them may execute instructions. The easiest situation is when there are several programs running in parallel. This is always the case: in addition to your game/browser/compiler/programming task, there may also be a clock, a screen background, and many system routines running. On a single-processor machine, these are all running `at the same time', at least it looks that way. Actually, each of these processes is occasionally switched on for a very short time and then switched off again. The clock is maybe active for a millisecond per second, taking on average 0.1% of the total time, while your game requires more computation and may be active for 98% of the time. This whole idea is called time sharing. If there are several processors, then the processes demanding time are scheduled over all of them. Alternatively, but this requires special programs, it is also possible to process one task on several processors.

A computer consists of a processor, a hierarchy of memories and IO devices interconnected by buses.

Memory Hierarchy

As was pointed out above a computer is not equipped with a single amorphous memory, but with a whole hierarchy of memory devices, which are different in size and in nature. A good understanding of how this memory hierarchy works is of rapidly increasing importance for understanding the performance of programs. For the same problem it is often possible to write a program which takes the features of the memory hierarchy into account while some simpler solution does not. In that case, the refined algorithm will mostly be considerably faster, even if it performs some more operations. For the programmer the memory hierarchy is much more important than the details of the instruction execution, because a memory-aware program can mostly not be obtained by optimizing a carelessly designed program: memory-optimization is an algorithmic rather than a programmatic issue.

The registers are located in the immediate vicinity of the ALU. There are a small number of them, mostly 16, 32 or 64. Most instructions require that (part of) the data to work on are located in the registers. During the computation data are loaded into the registers and written away again when the place they are occupying is needed for other data.

At the next levels of the hierarchy there are one, two or three caches. These are fast and small storages which can be accessed rapidly. They are located on the same chip as the processor. Nowadays, a two-level cache is most common. The size of the smaller of the two, the first-level cache, mostly lies between 16 and 64 KB. Data in the first-level cache can mostly still be accessed by the processor in one clock cycle. The second-level cache is considerably larger, mostly between 256 KB and 1 MB. Accessing data stored in the second-level cache takes several clock-cycles.

The main memory is the primary place where data are stored. It consists of so-called random-access memory or RAM. The size of the RAM has gone up dramatically. In 1975 a mainframe might have as little as 64 KB, now most PCs have between 256 MB and 4 GB of RAM, a yearly increase by a factor of about 1.5. The main memory is not on-chip, but off-chip. That means that it is not located on the same chip as the processor. Typically the main memory is even composed of a whole bank of chips: a 1 GB main memory may physically consist of 4 memory chips of 256 MB each. This has many implications. In the first place one can choose the main memory somewhat independently from the processor. Also one can upgrade a computer by adding or replacing a few memory chips at a later time. But it is of course precisely this off-chip nature of the main memory which causes the high cost of accessing it: the distances are much longer, the wires and contacts take more time to load, the request must be guided through several switches, ... . Accessing the main memory takes hundreds of clock cycles.
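
The cost of main-memory accesses can be made visible with a small experiment. The following is only a sketch, not taken from the original text; the names sum_row_wise and sum_column_wise are chosen here for illustration. Both procedures sum an n x n matrix which is stored row by row in a one-dimensional array. They perform exactly the same additions, but the first visits consecutive memory positions, while the second jumps n positions at a time. For matrices which do not fit in the caches, the second is therefore expected to cause many more accesses to the main memory.

```c
#include <stddef.h>

/* Sums an n x n matrix stored row by row in a[].
   Successive accesses go to consecutive memory positions. */
long sum_row_wise(const int* a, size_t n) {
  long s = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      s += a[i * n + j];
  return s; }

/* Computes the same sum, but successive accesses lie n positions apart. */
long sum_column_wise(const int* a, size_t n) {
  long s = 0;
  for (size_t j = 0; j < n; j++)
    for (size_t i = 0; i < n; i++)
      s += a[i * n + j];
  return s; }
```

Measuring both procedures for a large n (for example n = 4096) typically shows a clear difference in favor of the row-wise version, though the exact factor depends on the cache sizes and on the compiler.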

The name random-access memory was originally given to distinguish it from tapes and other more serial kinds of memory. Any position of the RAM can be accessed in about the same time. With a tape this is very different: it can only be accessed at the position which is currently at the reading/writing head, and to get to another position requires winding through the tape. In everyday life we find the same difference when comparing the access possibilities of a music tape and a music CD. Currently RAM is realized by having a certain type of transistors which can be switched between two states. This makes it possible to record one unit of information, a bit, in every transistor. Keeping this state requires some electrical power, and therefore all RAM information will be lost when the power is interrupted for more than a second. In earlier days, RAM was realized by having a large number of small magnetizable iron (ferrite) rings which also could be put in two states. The major disadvantage of these was that they were very large for the amount of memory they could store. They nevertheless constituted a great improvement over the earlier used radio tubes, which were much larger still, consumed tremendous amounts of electricity and were highly unreliable.

4 KB of Core Storage From Around 1970

The picture shows 4 KB of ferrite-based core storage from around 1970. Each small rectangle contains 1024 small rings, each of which can be used to store one bit. The electronics around the actual storage can be used to write and read the stored values. The white rod is a pencil which was added to indicate the size of the card.

The secondary memory nowadays consists of so-called hard disks, rigid disks with a magnetizable coating on which data can be written and read. Unlike RAM, a hard disk does not need sustained electrical power: data can be reused even if the computer was turned off for years. This is an essential feature, but not interesting for our considerations of the running computer. The size of hard disks has increased from a few MB to hundreds of GB, growing as fast as the main memory or even slightly faster. Accessing the secondary memory may take as much as 10 ms.

Memory Hierarchy

Counting per GB of storage, secondary storage is much cheaper than primary (RAM) storage: RAM costs 200 euro per GB, hard disks cost 2 euro per GB. This feature makes it cost-efficient to sometimes also use the hard disk for storing data which are needed in a computation: we have a problem to solve, but unfortunately, we do not have enough RAM for all data. Then, part of the data can be written away on the hard disk, while the processor is working on another part.

Not only the access time increases dramatically when using the larger memory elements. Even the bandwidth decreases strongly. The bandwidth of a bus, a connection in a computer system, is the maximum number of bytes which can be transferred over it per second. A high access time does not necessarily imply a small bandwidth, because the bandwidth is computed from the time for transferring a large amount of data placed in consecutive memory positions. The bandwidth of the bus to the caches is not a limiting factor. The bandwidth of the memory bus is more than one GB/s, which is considerable, but in very memory intensive applications, such as copying the values of one large array to another array, it nevertheless determines the execution time. A normal hard disk works at a rate of around 10 MB/s, about 100 times less than the speed of the main memory. On the one hand, this is bad, on the other hand, 10 MB/s is much more than one might expect from the access time of 10 ms. The actual amount of useful data that may be accessed per second depends on how these accesses are organized.
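
How strongly the organization of the accesses matters can be estimated with a simple model. The following sketch assumes (a simplification which is not made in the text) that every separate request costs the full access time and that the data are then transferred at the full bandwidth. The procedure name transfer_time_ms and its parameters are chosen here for illustration.

```c
/* Estimated time in milliseconds for reading total_bytes from a device
   in the given number of separate requests.
   Assumed model: each request costs access_ms once, after which the
   data are transferred at bytes_per_s. */
double transfer_time_ms(double total_bytes, double requests,
                        double access_ms, double bytes_per_s) {
  return requests * access_ms + 1000.0 * total_bytes / bytes_per_s; }
```

With the numbers mentioned above (10 ms access time, 10 MB/s, taking 1 MB as 10^6 bytes), reading 10 MB as one consecutive block is estimated at 10 + 1000 = 1010 ms. Reading the same 10 MB in 10000 scattered pieces of 1 KB each is estimated at 100000 + 1000 = 101000 ms, which is a hundred times slower.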

There is a memory hierarchy ranging from tiny and rapidly accessible to huge and slowly accessible.

Internal Representations

Binary Numbers

Internally, the computer is working with numbers written in binary, that is, with sequences of zeroes and ones. The binary number system works just as the decimal system: the value of a digit depends on its position. A binary number (a_{k - 1}, ..., a_1, a_0) has value sum_{0 <= j < k} a_j * 2^j. The number 2 is called the radix of the number system. For us the decimal system (the system with radix 10) is most convenient, but this is only because we have been working with it since we were young. The binary system has the great advantage that the addition and multiplication tables are almost trivial. There is no need to remember that 7 + 6 = 13 and that 9 * 9 = 81. The rules of addition and multiplication can be captured in two 2 x 2 tables. The only case which must be treated with some care is 1 + 1 = 10. In an addition of numbers x and y consisting of several digits this means that there is a carry of 1 to the next position. Possibly this may mean that the sum z has more digits than x and y. In the common case that the number of digits is bounded, this may even mean that overflow occurs, a common source of run-time errors.

Binary Arithmetic
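
The addition rule with carry can be written out as a short procedure. This is a minimal sketch, assuming that both numbers are given as arrays of k bits with the least significant bit at index 0; the name binary_add is chosen here for illustration.

```c
/* Adds the k-bit numbers x[] and y[] and stores the k-bit result in z[].
   Index 0 holds the least significant bit.
   Returns the final carry: 1 means that the sum does not fit in k bits. */
int binary_add(const int* x, const int* y, int* z, int k) {
  int carry = 0;
  for (int i = 0; i < k; i++) {
    int s = x[i] + y[i] + carry; /* 0, 1, 2 or 3 */
    z[i] = s & 1;                /* The digit which stays at position i */
    carry = s >> 1; }            /* The carry to position i + 1 */
  return carry; }
```

For example, with k = 4, adding 0011 and 0101 gives 1000 with final carry 0, while adding 1111 and 0001 gives 0000 with final carry 1, signaling overflow.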

The value of a binary number given as an array a[] of length k can be computed with the above sum formula. For a given non-negative integer n, it is also easy to compute the entries of this array. In C this may be done as follows:

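  #include <stdlib.h>

  int* int_to_binary(int n, int k) {
    // Returns the k least significant bits of n >= 0; a[0] is the least significant bit
    int i = 0;
    int* a = (int*) malloc(k * sizeof(int));
    while (n > 0 && i < k) { // Never write beyond the k allocated positions
      a[i] = n & 1; // The least significant bit
      n = n >> 1;   // Shifting right one position
      i++; }
    while (i < k) { // The remaining positions are zero
      a[i] = 0;
      i++; }
    return a; }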
Here k gives an upper bound on the number of required bits (on a 32-bit machine k = 32 will always be large enough). Instead of an array of ints, we might also use a type requiring less space.

A clear disadvantage of the binary system from the human point of view is that the numbers get very long and it is hard to see whether there are five or six consecutive zeroes. A convenient middle way is to express numbers in the hexadecimal system, that is with radix 16. In the hexadecimal system we use the symbols "0", "1", ..., "9" to denote the first ten values, and then "A", "B", "C", "D", "E", "F" to denote the values 10, ..., 15. Thus, the hexadecimal number 3AB75E corresponds to the decimal number 14 * 16^0 + 5 * 16^1 + 7 * 16^2 + 11 * 16^3 + 10 * 16^4 + 3 * 16^5. Conversion from binary to hexadecimal and vice versa is much easier: just divide the binary numbers in groups of four digits and convert each of them individually: 0011.1010.1011.0111.0101.1110 = 3AB75E (here the dots are only added to make the separations clear). This is like writing a large decimal number with a dot after each group of three, which can be viewed as writing the number in a system with radix 1000.
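
The conversion in groups of four bits can be sketched as follows. The procedure name to_hex is chosen here for illustration, and it is assumed that an unsigned int has 32 bits, so that there are exactly eight hexadecimal digits.

```c
/* Writes the eight hexadecimal digits of x into s, most significant
   digit first, followed by a terminating '\0'.
   Assumes that unsigned int has 32 bits. */
void to_hex(unsigned int x, char s[9]) {
  const char digits[] = "0123456789ABCDEF";
  for (int i = 7; i >= 0; i--) {
    s[i] = digits[x & 0xF]; /* The group of four least significant bits */
    x = x >> 4; }           /* Shifting right one hexadecimal position */
  s[8] = '\0'; }
```

Working out the sum given above, 3AB75E is the decimal number 3848030, and calling to_hex with this value produces the string "003AB75E".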

The largest positive integral number which can be used on most current computers is 2^31 - 1 = 2147483647. If one needs this number, it is not very convenient to write it in decimal because it is an ugly number and it does not even fit on most pocket calculators. As a hexadecimal number it is very simple: 0111.1111. ... .1111 = 7FFFFFFF.

Internally computers work with binary numbers. Externally they can handle numbers given in several number systems. Number conversion is simple and can be performed in a time which is proportional to the number of digits.

Radix Conversion

The given procedure for computing the bits of a binary number is an efficient implementation of a more general algorithm for writing numbers with respect to a specified radix r. We are so familiar with the radix-10 system, that it is hard to distinguish the notation 7683 from the number it stands for. But what is the meaning of this notation? Assuming that we know the meaning of the individual symbols, and that we know how to handle elementary arithmetic, 7683 stands for the number with value 3 + 8 * 10 + 6 * 100 + 7 * 1000. More generally, a decimal l-digit number with digits d_i, 0 <= i < l, where digit 0 is the least significant one, has value sum_{0 <= i < l} d_i * 10^i. In full generality, if the number is given with respect to radix r, its value is sum_{0 <= i < l} d_i * r^i.
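
The value formula can be evaluated without computing the powers of r explicitly, by starting at the most significant digit and repeatedly multiplying by r. The following is a minimal sketch; the name digits_to_value is chosen here for illustration, and it is assumed that the value fits in a long.

```c
/* Returns sum_{0 <= i < l} d[i] * r^i, where d[0] is the least
   significant digit. Assumes that the value fits in a long. */
long digits_to_value(const int* d, int l, int r) {
  long x = 0;
  for (int i = l - 1; i >= 0; i--)
    x = x * r + d[i]; /* The digits handled so far move up one position */
  return x; }
```

For the digits 3, 8, 6, 7 (least significant first) and r = 10 this computes ((7 * 10 + 6) * 10 + 8) * 10 + 3 = 7683.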

Now we know how to compute the value of a number given its digits, but how to compute the digits? That is, for a given number x, we want to find numbers d_i, i >= 0, with 0 <= d_i < r, so that sum_{i >= 0} d_i * r^i = x. There are two approaches: either first computing the least significant or first computing the most significant digit. The first is slightly simpler. We need some basic rules of modular arithmetic. Integer division is denoted by '/', the modulo operator by '%'.

  1. (a / r) * r + a % r = a.
  2. (a * r) % r = 0.
  3. If 0 <= a < r, then a % r = a.
  4. If a = b, then a / r = b / r.
  5. If a = b, then a % r = b % r.
  6. If a % r = 0, then (a + b) / r = a / r + b / r.
  7. (a + b) % r = (a % r + b % r) % r.
Rule 1 expresses that a % r is the remainder of the division of a by r. This implies rules 2 and 3. Rules 4 and 5 are trivial observations. Rules 6 and 7 express the partial linearity of the operators.

Using these rules it follows that if x = sum_{i >= 0} d_i * r^i, then x % r = (sum_{i >= 0} d_i * r^i) % r = (sum_{i >= 0} (d_i * r^i) % r) % r = d_0 % r = d_0. For the last equality we used that 0 <= d_0 < r. Knowing d_0, we observe that x - d_0 is a multiple of r, and can thus be written as x' * r. In other words, x' = x / r = sum_{i >= 1} d_i * r^{i - 1}. Thus, d_1 = x' % r. Continuing, all digits can be determined. Each digit requires a constant number of operations.
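
The derivation translates directly into a procedure which computes the least significant digit first. This is a sketch under the assumptions that x >= 0 and r >= 2; the name value_to_digits and the parameter max_digits, which guards against writing beyond the array, are chosen here for illustration.

```c
/* Stores the radix-r digits of x >= 0 in d[], least significant first.
   Returns the number of digits (at least 1), or -1 if more than
   max_digits digits would be needed. */
int value_to_digits(long x, int r, int* d, int max_digits) {
  int l = 0;
  do {
    if (l == max_digits)
      return -1;
    d[l] = (int) (x % r); /* The current least significant digit */
    x = x / r;            /* x' = (x - d[l]) / r */
    l++;
  } while (x > 0);
  return l; }
```

For x = 7683 and r = 10 the procedure returns 4 and stores the digits 3, 8, 6, 7. For x = 6 and r = 2 it returns 3 and stores 0, 1, 1.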

Data Types

The computer is working with several kinds of data. We say type for a well-defined kind of data. In everyday life one should not compare apples with pears, but one can compare red apples with yellow apples. Likewise, in computer life one should not compare an integral number with a character, but one can compare 15 and 23. In most computer languages there is a typing mechanism: any constant or variable has a well-defined type, and in a single expression one should only encounter objects of the same type (though some languages are very tolerant and perform implicit type conversions to reach this situation).

A primitive type is a type which provides space for a single simple variable and which constitutes the basis for the construction of all derived types. Primitive types in most programming languages (with possibly small variations in the naming) are at least the following:

Boolean:
truth values, "false" or "true"
Character:
the keyboard symbols, special characters and control characters
Integer:
integral numbers in a finite range
Float:
floating point numbers with finite precision and range

An array is an indexed sequence of elements of a certain type, all elements being of the same type. In mathematics arrays are denoted by a subscript: "x_i" denotes element i of the array x. In computer science it is common to denote arrays with square brackets: "x[i]". Arrays can be defined over any previously defined element type, particularly of booleans, characters, integers and floats. There are two common ways of speaking about an array over some type xxx: one can either say "an xxx array" or "an array of xxx".

The array construction is the most elementary way of obtaining so-called derived types: types which are obtained out of previously defined types by any of the type construction mechanisms provided by the programming language under consideration. The limitation of the array construction is that all elements must be of the same (underlying) type. The power of the array construction is that the number of fields is specified by a parameter. The type of an array with integer elements is "integer array", not "integer array of length 26". This implies that we can write a single procedure for finding the minimum of integer arrays (a procedure is a fragment of a program which can be called from outside with certain arguments and which returns some value). We do not need a separate procedure for each possible length of the arrays (though some older programming languages have this stricter view).
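
In C such a procedure may look as follows. This is a minimal sketch which assumes that the array has at least one element; the length is passed as a separate argument, so the same procedure works for integer arrays of any length.

```c
/* Returns the smallest of the n >= 1 elements of a[]. */
int minimum(const int* a, int n) {
  int m = a[0];
  for (int i = 1; i < n; i++)
    if (a[i] < m)
      m = a[i];
  return m; }
```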

A second main mechanism to obtain derived types is to create a compound type of two or more existing types. This construction is known under many other names ("record", "structure") as well, but it always works in the same way: a compound type has several fields for elements of possibly different types. These fields are accessed by a name, not by using an indexing mechanism. An example of a compound type is "xyz_coordinate", which has three fields "x_coordinate", "y_coordinate" and "z_coordinate", each of them a floating point number. Another compound type may be "personal_record", with fields "name", some kind of array of characters, "age", a small positive integral number, for which one might use a single byte, "personal_number", a large positive integral number, for which one might use an integer. The details of defining and accessing the individual fields differ from language to language.
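
In C the two example types could be written as structs. This is only a sketch: the choice of double for the coordinates, the fixed length of 32 characters for the name and the helper procedure squared_length are assumptions made here for illustration.

```c
struct xyz_coordinate {
  double x_coordinate;
  double y_coordinate;
  double z_coordinate; };

struct personal_record {
  char name[32];         /* The length 32 is an arbitrary choice */
  unsigned char age;     /* A single byte suffices for an age */
  int personal_number; };

/* The fields are accessed by name: returns x^2 + y^2 + z^2. */
double squared_length(struct xyz_coordinate p) {
  return p.x_coordinate * p.x_coordinate
       + p.y_coordinate * p.y_coordinate
       + p.z_coordinate * p.z_coordinate; }
```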

Arrays and Compounds

Of course one can have arrays of compounds, compounds of arrays and arrays of compounds of arrays ... . One of the most important two-step constructions is the matrix. Mathematically a matrix is a square or rectangular block of numbers. For a matrix "A", "A_{i, j}" denotes the element at position (i, j) of the block (i denotes the row and j the column, the indexing starts in the upper-left corner with A_{0, 0}). This is such a fundamental mathematical object, that we will also frequently encounter it in computer programs. The cleanest way of obtaining matrix objects is by defining a matrix as an "array of row", where "row" is defined as an "array of xxx", where "xxx" stands for the type of the elements.
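
In C the "array of row" construction can be sketched as follows for integer elements. The type name row and the procedures new_matrix and free_matrix are chosen here for illustration; other ways of storing a matrix are possible as well.

```c
#include <stdlib.h>

typedef int* row; /* A row is an array of int */

/* Allocates an m x n matrix as an array of m rows with all elements 0.
   Returns NULL if the memory cannot be allocated. */
row* new_matrix(int m, int n) {
  row* a = (row*) malloc(m * sizeof(row));
  if (a == NULL)
    return NULL;
  for (int i = 0; i < m; i++) {
    a[i] = (row) calloc(n, sizeof(int));
    if (a[i] == NULL) {
      while (i > 0)
        free(a[--i]); /* Release the rows allocated so far */
      free(a);
      return NULL; } }
  return a; }

/* Releases a matrix with m rows obtained from new_matrix. */
void free_matrix(row* a, int m) {
  for (int i = 0; i < m; i++)
    free(a[i]);
  free(a); }
```

After row* a = new_matrix(m, n), the element A_{i, j} is written as a[i][j].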

Most computer languages work with a typing mechanism. The array and compound construction are the most important mechanisms for constructing arbitrarily complex derived types out of a small number of primitive types.

Booleans

Booleans are truth values, true or false, which we abbreviate to "T" or "F". In this section we give a rather informal description of how to work with booleans.

Basic Operators

In primary school we have learned how to compute with integers and fractional numbers, the main operators being "+" and "*". Special rules had to be observed which, because all of them are derived from generalized counting arguments, are not too unnatural. In a similar manner there are rules for computing within the much simpler world of the booleans. There are three basic operators: "or", "and" and "not". The operators "or" and "and" are so-called binary operators. Binary means that they need two arguments to be evaluated, like "+" and "*". "not" is a unary operator, requiring only a single argument. Boolean expressions are commonly called expressions.

Because there are only two boolean values and the operators take only one or two arguments, there are very few combinations, which (unlike computation with integers) makes it possible to define the complete evaluation rules by small tables, which are called truth tables:

Truth tables for not, or, and

The value of the first argument is given on the left, the value of the second argument on the top. For example, here we find that if we have two boolean variables x and y, and x is true and y is false, then "x or y" is true and "x and y" is false. Because "not y" is true, we also have that "x and not y" is true. We have added the truth table for the operator "implies", for which we use the symbol "->". This truth table follows if one thinks what implication should mean. "x -> y" is clearly false if x is true and y is false. If both are true, then it clearly should be true. The other two cases are debatable, but if x is false and y is true, then this does not contradict the implication, so the most reasonable choice is true. The same applies when both are false.

Let us introduce symbols for the operators. This makes it easier to distinguish the operators from English words. "or", "and" and "not" are denoted by "||", "&&" and "!", respectively. Like in numerical expressions, brackets can be used to enclose subexpressions of a boolean expression which should be evaluated first. So, when we write (x && y) || ((!y) && z), it is clear what is meant: first determine the values of the subexpressions s_1 = x && y and s_2 = (!y) && z, which requires that one first evaluates !y, and then compute s_1 || s_2. It is good practice to write brackets to prevent any kind of doubt, but there is no obligation to do so: x && y || !y && z is a correct expression as well. But what does it mean? The value of such an expression with many operators without any brackets is determined by the priority rules, telling which operators to evaluate first. Just like we know that a * b + -b * c (here "-" is the unary operator) is to be evaluated as (a * b) + ((-b) * c), it is also defined that ! binds strongest, and that && comes before ||. In general, there are differences between programming languages, but this order always holds: ! comes before &&, and && comes before ||.

Next to the above standard operators there are several others. The most important is the "exor" operator, denoted "^". It gives the exclusive or of two variables, that is, "x ^ y" is true if and only if exactly one of the two variables is true. Other operators, most important in hardware design, are "nor", not or, and "nand", not and. The truth tables of these operators are as follows:

Truth tables for exor, nor, nand

The truth tables for the operators are the foundation of all laws on boolean expressions.
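The additional operators and the priority rules can be tried out directly in C. The sketch below represents booleans by the integers 0 and 1; the function names are hypothetical. The function priorities_agree() compares the expression without brackets with its fully bracketed form for all eight combinations of values (a compiler may warn about the missing brackets, which is intended here).

```c
// Booleans are represented by int: 0 is F, 1 is T
int implies(int x, int y) { return !x || y; }
int exor(int x, int y)    { return x != y; }
int nor(int x, int y)     { return !(x || y); }
int nand(int x, int y)    { return !(x && y); }

// Returns 1 if the expression without brackets and the fully
// bracketed expression agree for all 8 values of (x, y, z)
int priorities_agree(void) {
  int x, y, z;
  for (x = 0; x <= 1; x++)
    for (y = 0; y <= 1; y++)
      for (z = 0; z <= 1; z++)
        if ((x && y || !y && z) != ((x && y) || ((!y) && z)))
          return 0;
  return 1; }
```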

Relation with Numbers

There are two boolean constants, T and F. There is also an algebraic object, called C_2 or Z / 2 Z, which has very similar properties. C_2 has two elements, 0 and 1. There are two basic operations: "+" and "*". The product works as one expects: 0 * 0 == 0 * 1 == 1 * 0 == 0 and 1 * 1 == 1, and "+" is to be taken modulo 2, so 0 + 0 == 1 + 1 == 0 and 1 + 0 == 0 + 1 == 1. If we identify 0 with F and 1 with T, then the multiplication and addition tables correspond exactly to the truth tables for "&&" and "^". The operation ! corresponds to + 1. || is slightly more complex: a || b should be replaced by a + b + a * b. a -> b should be replaced by 1 + a + a * b.

Applying this correspondence allows us to rewrite boolean expressions as expressions in C_2. Computing in C_2 is very simple because many simplifications can be made: both operators are associative and commutative, * distributes over +, a + a == 0 and a * a == a. This is particularly useful for proving equalities. For example, let us check the first distributive law: x && (y || z) == (x && y) || (x && z). The left side corresponds to x * (y + z + y * z) == x * y + x * z + x * y * z, because * distributes over + in C_2. The right side corresponds to x * y + x * z + x * y * x * z == x * y + x * z + x * y * z, because multiplication in C_2 is commutative (x * y == y * x) and idempotent (x * x == x). We also check the first of DeMorgan's laws: !(x && y) == !x || !y. Rewriting the sides gives 1 + x * y and (1 + x) + (1 + y) + (1 + x) * (1 + y) == 1 + x + 1 + y + 1 + x + y + x * y == 1 + x * y.
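Because there are only two values, the correspondence with C_2 can also be checked exhaustively by a program. The following sketch (the function name is hypothetical) identifies F with 0 and T with 1 and computes modulo 2 with the % operator.

```c
// Exhaustively checks the translation of boolean operators
// into arithmetic modulo 2, identifying F with 0 and T with 1.
// Returns 1 if all identities hold, 0 otherwise.
int c2_correspondence_holds(void) {
  int a, b;
  for (a = 0; a <= 1; a++)
    for (b = 0; b <= 1; b++) {
      if ((a && b)  != (a * b) % 2)         return 0; // and
      if ((a != b)  != (a + b) % 2)         return 0; // exor
      if ((!a)      != (a + 1) % 2)         return 0; // not
      if ((a || b)  != (a + b + a * b) % 2) return 0; // or
      if ((!a || b) != (1 + a + a * b) % 2) return 0; // implies
      // first of DeMorgan's laws: !(x && y) == !x || !y
      if ((!(a && b)) != (!a || !b))        return 0; }
  return 1; }
```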

Another alternative for working with booleans is to map the non-negative integral numbers to the booleans: 0 maps to F, all other numbers correspond to T. In this way (not considering the possibility of overflow), "+" and "*" perfectly correspond to "||" and "&&".

A boolean contains little information, but sometimes one does not want to know more. Assume we have two vectors, v and w, each of length n. Assume that the entries are booleans. Let us say, v[i] == T denotes that something is possible and likewise for w[]. Possibly v[] is the row of a matrix and w[] is a column of a second matrix, but v[] may also be true for all apartments which cost at most 300 euro and w[] may be true for all apartments with at least 70 m^2. We want to determine whether v[] and w[] have a common hit (an affordable, sufficiently spacious apartment), that is, whether there is an i for which v[i] == w[i] == T. So, we want to compute the following value x:

x = or_{i = 0}^{n - 1} v[i] && w[i].
This can also be reformulated in 0-1 terms: v'[] and w'[] are arrays with 0-1 values, v'[i] == 1 if and only if v[i] == T and likewise for w'[] and w[]. For v'[] and w'[] we can compute the following value x':
x' = sum_{i = 0}^{n - 1} v'[i] * w'[i].
Clearly x == T if and only if x' > 0. So, computing x' also computes x, but not vice versa: x' gives the number of common hits, while x only gives the existence of hits.
Booleans, || and && quite closely correspond to numbers, + and *, a correspondence which often offers computational possibilities.
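The two computations can be sketched in C as follows. The arrays hold the values 0 and 1, so the same arrays serve as v[] and v'[]; the function names are hypothetical.

```c
// x: is there an index i with v[i] and w[i] both true?
int common_hit(const int* v, const int* w, int n) {
  int i, x = 0;
  for (i = 0; i < n; i++)
    x = x || (v[i] && w[i]);
  return x; }

// x': the number of indices i with v[i] and w[i] both 1
int count_hits(const int* v, const int* w, int n) {
  int i, x = 0;
  for (i = 0; i < n; i++)
    x += v[i] * w[i];
  return x; }
```

For v = {1, 0, 1, 1} and w = {0, 0, 1, 1}, common_hit() returns 1 and count_hits() returns 2.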

Internal Representation

Because booleans are bi-valued, variables of this type can in principle be stored in a single bit. The most common convention is that false corresponds to 0 and true to 1.

In practice it is inconvenient that a byte is the smallest addressable memory unit. There is no way to address individual bits. Therefore, if one declares a boolean variable, this boolean will internally correspond to a byte (or more), wasting 7 of 8 bits in the byte (or worse). For a single variable this does not matter, but if one works with a long array of booleans, then it makes a big difference.

Arrays of booleans are one of several possible ways of working with sets: if S[i] is true, then the element with index i is an element of the set S, otherwise it is not. Arrays of booleans can also be used for marking purposes: suppose you are searching your way in a maze (labyrinth), then it is a useful idea to mark the places you have already visited. If you are trying to solve this kind of searching problem on a computer it is even more important to mark the visited nodes. Because sets or graphs (a set of nodes connected by a set of edges) can be arbitrarily large, it is a good idea not to waste memory unnecessarily.

The idea is to view a byte not as a number from 0 to 255, but as 8 bits packed together. So, for storing an array of n booleans, we use an array of n / 8 (rounded up) bytes, and store 8 booleans in each of them. The values of the individual bits can be set and read in a constant number (one or two) of clock cycles using the bitwise operations: in most programming languages there are not only instructions to perform operations on booleans, characters and numbers, but one can also perform 8-, 32- or even 64-bit operations in one stroke. Because there is a one-to-one correspondence between bit operations and boolean operations, this feature allows us to perform 32 or even 64 boolean operations in one clock cycle, provided that all these operations are of the same kind. The language C provides such bitwise operations: bitwise-and (&), bitwise-or (|), bitwise-exor (^) and bitwise-not (~). Bitwise-not can also be obtained by computing bitwise-exor with a word consisting of ones only, which is FFFFFFFF for 32 bits.
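As a small illustration, the following sketch applies the bitwise operators of C to 32-bit words. The type uint32_t from stdint.h is used to fix the word length; the function names are hypothetical.

```c
#include <stdint.h>

// Each call performs 32 boolean operations at once, one per bit
uint32_t and32(uint32_t a, uint32_t b) { return a & b; }
uint32_t or32(uint32_t a, uint32_t b)  { return a | b; }
uint32_t xor32(uint32_t a, uint32_t b) { return a ^ b; }

// Bitwise-not, written as exor with a word of all ones
uint32_t not32(uint32_t a) { return a ^ 0xFFFFFFFFu; }
```

For a 32-bit word a, not32(a) gives the same result as the C expression ~a.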

Packing 8 bits of information into a byte is fine, but how do we set the bits and how do we get the information out again? The following procedures can be used for this:

  typedef int boolean; // boolean is not a built-in C type: 0 is false, 1 is true

  void set_to_zero(int* x, int i) { // Sets bit i of x to 0, for 0 <= i < 8
    *x = *x & (255 - (1 << i)); }

  void set_to_one(int* x, int i) { // Sets bit i of x to 1
    *x = *x | (1 << i); }

  void flip_value(int* x, int i) { // Flips value of bit i of x
    *x = *x ^ (1 << i); }

  boolean is_zero(int x, int i) { // Returns true if bit i of x is zero
    return (x & (1 << i)) == 0; }

  boolean is_one(int x, int i) { // Returns true if bit i of x is one
    return (x & (1 << i)) != 0; }

Using 2^k - 1 instead of 255 = 2^8 - 1 in set_to_zero, the same procedures can also be used if k bits of information are packed together. All these procedures take just a few clock cycles.
Memory-efficient computation requires that arrays of booleans are packed with 8 booleans per byte. This feature is normally not provided by default.

Packing 8 booleans in one byte
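The procedures above work on a single packed value. For a whole array of n booleans, one additionally has to select the right byte: boolean i is stored in bit i % 8 of byte i / 8. A minimal sketch, in which the names bits_create(), bits_set() and bits_get() are hypothetical:

```c
#include <stdlib.h>

// Allocates a packed array for n booleans, all initially false.
// Returns NULL if the allocation fails.
unsigned char* bits_create(int n) {
  return (unsigned char*) calloc((n + 7) / 8, 1); }

// Sets boolean i to value b (0 or 1)
void bits_set(unsigned char* s, int i, int b) {
  if (b)
    s[i / 8] |= (unsigned char) (1 << (i % 8));
  else
    s[i / 8] &= (unsigned char) (255 - (1 << (i % 8))); }

// Returns the value (0 or 1) of boolean i
int bits_get(const unsigned char* s, int i) {
  return (s[i / 8] >> (i % 8)) & 1; }
```

An array created with bits_create(20) occupies 3 bytes instead of at least 20; it should be released with free() when it is no longer needed.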

An important point in computer science is the relation between the time for solving a problem and the amount of memory for doing this. There are problems for which we find a real space-time trade-off. This means that if one has more memory the problem can be solved faster. A good example is the management of a set. If we want to maintain a set with values in the range from 0 to some maximum value n - 1, then as said above, a convenient way to do this is to use an array S[] of n booleans. Assume that we get an initially empty set, that is, all values of the array are false (there is a trick, called "virtual initialization", which removes the need for this assumption). Then we can enter an element i by setting S[i] = true, we can test whether an element i is there by looking at the value of S[i] and we can remove an element by setting S[i] = false. Each of these is just a single instruction; said more formally: these operations can be performed in constant time. Of course here we use the property of the RAM: we use that setting S[i] or checking it can be done in a constant number of clock cycles for all i and that the value of i does not matter much. Clearly, if we use this idea for managing the "Matrikelnummer" of the students in the course Informatik I, this is not a very good idea: we would need storage for 99.999.999 booleans of which only 100 are used. In that case, one should rather use an alternative approach requiring an amount of memory proportional to the number of elements in the set and not proportional to the value of the largest possible key. Then we cannot assure constant time for each of the mentioned operations, so here we see the trade-off:

Much memory -> constant time for all operations,
Less memory -> more than constant time for some operations.

One may think that we have the same situation when we are packing booleans in a byte: instead of directly accessing the boolean we want, we need several operations. However, in the section on caching further down, we will see that it is expensive (possibly costing hundreds of clock cycles) to fetch a cache line from the main memory. Packing the data more densely also means that we may expect to reduce the number of cache misses. Generally the reduction of the time consumption because of reduced caching costs is far larger than the increase due to the unpacking.

Space-time trade-offs exist, but packing data more densely generally gives a reduction of both space and time.

Characters

It is most common to use 8 bits for a character. This offers 256 possibilities. The most important standard in this domain is ASCII. The lower 128 characters are exactly specified, the upper 128 can be adapted to local needs. The normal 26 letters of the alphabet start at character 65. These are the capitals. Starting at character 97 the small letters are given. Because 97 = 65 + 32, the code for a small letter has the same bit pattern, except for bit 5, which is a 1 instead of a 0. This makes it trivial to convert a text from lower to upper case and vice-versa.

ASCII Code Table

Clearly 256 possibilities are not enough for all languages of the world: Chinese alone needs more. Therefore, there are several initiatives for extended character sets. Currently the most used is "Unicode", which originally used 16 bits, offering 65536 possibilities, and which has later been extended beyond this to offer space for Chinese and all other languages of the world.

In many computer languages, there are ways to access the number of a character. Though this is not very elegant, this offers efficient ways to determine whether a character is a small letter, a capital letter, or a symbol. Of course, any program working this way will not work anymore if the coding is changed.

Characters are internally represented numerically, mostly using one byte per character.
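The following sketch shows how the number of a character can be used in C to classify letters and to convert between the two cases by changing bit 5. It assumes that the characters are coded in ASCII; the function names are hypothetical.

```c
// Classification by character number, assuming ASCII coding
int is_capital(char c) { return c >= 65 && c <= 90; }
int is_small(char c)   { return c >= 97 && c <= 122; }

// Letters differ only in bit 5 (value 32) between the two cases
char to_capital(char c) { return is_small(c) ? (char) (c & ~32) : c; }
char to_small(char c)   { return is_capital(c) ? (char) (c | 32) : c; }
```

Characters that are not letters are returned unchanged. The standard header ctype.h offers toupper() and tolower(), which do not depend on the coding in this way.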

Integers

Various types of integers are used for handling integral numbers. Most computer languages offer types with names like shortshort, short, integer, long, longlong. One can assume that the number of bits used for these is weakly increasing. So, an integer will have at least as many bits as a short, but not necessarily more. The precise number of bits used for any of these types depends on the language and sometimes also the system. Typically an integer is 32 bits, that is 4 bytes. 2^{32} ~= 4 * 10^9. These can either be used for positive numbers only, or for both positive and negative numbers.

If there are positive numbers only, a data type which is often indicated by some prefix like "unsigned", then we have a very simple situation: the value of bit_i (starting to count from the last bit which is bit_0) has value 2^i. Using unsigned integers is useful if one wants to have the maximum range, or for the sake of correctness: why should one use numbers that can be negative for numbers which for some reason always must be positive?

If one also needs negative numbers, then typically about half of the possibilities are reserved for negative numbers. Actually, this is mostly the default. There are various ways of achieving this.

The first idea one would come up with is to use the leading bit as a sign bit. So, if this bit is 0, the number is positive, otherwise negative. For 1-byte numbers, we thus would have 00100111 = 39, and 10100111 = -39. Computation with these numbers is (for the computer) slightly more complicated than with the other formats. Another disadvantage is that there are two representations of the number 0: 10000000 = 00000000 = 0.

The most widely used method for realizing negative numbers is the so-called two's complement. In this case for numbers with b bits, the first b - 1 bits count as usual, while the leading bit, bit b - 1, counts as - 2^{b - 1}. So, for b = 8, 10000000 = -128, 10000001 = -127, ..., 11111111 = -1, 00000000 = 0, 00000001 = 1, ..., 01111110 = 126, 01111111 = 127. The disadvantage of this construction is that it is asymmetric: there is a negative number (-128) without a positive counterpart. The main advantage is that at the bit level the subtraction x - y can be performed as x + (-y). The bit pattern of a number -y can be obtained from the bit pattern of the number y using that -y = (-2^b + ((2^b - 1) - y)) + 1, that is, by taking the number that has ones where y has zeroes and vice versa and then adding 1 to the obtained number. This implies that when using two's complement there is no need for special subtraction hardware: 12 - 9 = 00001100 - 00001001 = 00001100 + (11110110 + 00000001) = 00001100 + 11110111 = 00000011 = 3. Notice that in the last addition, there is an overflow which must be ignored to get the correct result. The case that the second number is larger than the first is also treated correctly: 12 - 17 = 00001100 - 00010001 = 00001100 + (11101110 + 00000001) = 00001100 + 11101111 = 11111011 = -128 + 64 + 32 + 16 + 8 + 2 + 1 = -5.
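The two example computations can be reproduced in C on 8-bit patterns. The sketch below uses the type uint8_t from stdint.h to hold the bit patterns; the function names are hypothetical.

```c
#include <stdint.h>

// Bit pattern of -y for an 8-bit number: invert all bits, add 1
uint8_t negate8(uint8_t y) {
  return (uint8_t) (~y + 1); }

// x - y computed as x + (-y); the overflow out of bit 7 is dropped
uint8_t subtract8(uint8_t x, uint8_t y) {
  return (uint8_t) (x + negate8(y)); }

// Value of an 8-bit pattern in two's complement:
// bit 7 counts as -128, the other bits count as usual
int value8(uint8_t p) {
  return (p & 127) - (p & 128); }
```

Here subtract8(12, 17) gives the pattern 11111011, for which value8() returns -5.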

A third way of realizing negative numbers is the so-called excess representation. In this case, all numbers are shifted by a fixed constant, which is called the offset. This means that for an offset c a number x gets the bit pattern belonging to x + c. For example, for 8-bit numbers, the offset might be 2^7 - 1 = 127. Then we get -127 = 00000000, -126 = 00000001, ..., 0 = 01111111, 1 = 10000000, 2 = 10000001, ..., 127 = 11111110, 128 = 11111111. This representation is clumsy for performing additions and subtractions, but it is used for the exponents of floating point numbers.
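Converting to and from the excess representation is a single addition or subtraction. A minimal sketch for 8-bit numbers with offset 127 (the names OFFSET, to_excess() and from_excess() are hypothetical):

```c
#include <stdint.h>

#define OFFSET 127 // 2^7 - 1

// Bit pattern of x in excess-127; x must lie in -127 ... 128
uint8_t to_excess(int x) { return (uint8_t) (x + OFFSET); }

// Value represented by bit pattern p in excess-127
int from_excess(uint8_t p) { return (int) p - OFFSET; }
```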

Integral numbers can be mapped onto a bit pattern in various ways and can have various lengths.

Floating-Point Numbers

Internally, real numbers are maintained as so-called floating-point numbers. For example, all numbers are maintained as numbers of the form x.xxxx * 10^{yyy}, where yyy may be positive or negative. The number x.xxxx is called the mantissa, the number yyy is called exponent.

In single-precision, that is using 32 bits for a number, a standard format is to divide the bits as [s | e_7 ... e_0 | m_22 ... m_0]. Here s is the sign bit, the bits e_i give the exponent and the bits m_i the mantissa. For double-precision numbers, that is, using 64 bits for a number, the most common lay-out is [s | e_10 ... e_0 | m_51 ... m_0]. We see that double-precision numbers have a somewhat larger range (using 11 instead of 8 bits for the exponent) and a much larger precision (using 52 bits instead of 23 for the mantissa). In any numerical computation, one should be aware that on a computer there are no numbers in the mathematical sense: integers can overflow and underflow, and when working with floating-point numbers one must even take inaccuracies into account: only with luck one will find that (a + b) * (a - b) - a^2 + b^2 = 0.

How does it look exactly? The number is written as a binary number of the form 1.xxxx * 2^yyy, preceded by the sign bit. The leading 1 is omitted. So, -49.3125 = -1 * (32 + 16 + 1 + 1/4 + 1/16) = -1 * 110001.0101 = -1 * 1.100010101 * 2^5. So, the sign bit is 1, the exponent is 5 and the mantissa (without the leading one) is 100010101. The exponent is given with an offset of 127 (1023 for double-precision), so 5 gets the bit pattern of 132: 10000100. The complete 32 bit number then looks like: [1 | 10000100 | 10001010100000000000000]. Of course the separators are not there, all bits are packed together.
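The three parts of a single-precision number can be inspected in C by copying its bytes into an unsigned 32-bit number. The sketch below assumes that the type float is stored in the format described here, which holds on most current systems but is not guaranteed by the language; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Returns the 32 bits of f as an unsigned number
// (assumes that float is an IEEE 754 single-precision number)
uint32_t float_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u; }

// Sign bit, exponent as stored (with offset 127), 23 mantissa bits
int sign_bit(float f)           { return (int) (float_bits(f) >> 31); }
int stored_exponent(float f)    { return (int) ((float_bits(f) >> 23) & 0xFF); }
uint32_t mantissa_bits(float f) { return float_bits(f) & 0x7FFFFF; }
```

For -49.3125f this gives the sign bit 1, the stored exponent 132 and the mantissa bits 10001010100000000000000 (454000 in hexadecimal notation).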

Because 10 is not a power of 2, there are many numbers which can be written with a short decimal fraction, which cannot be written exactly as a binary fraction. The number 0.1 is an example. 0.1 = 3/32 * 16/15 = (1/16 + 1/32) * sum_{0 <= i} 1 / 16^i = 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + ... . So, the binary representation of 0.1 is 0.00011001100110011001100 ... . Shifting the point so that there is a leading 1 gives 1.1001100110011001100 ... * 2^{-4}. So, as a floating-point number we get [0 | 01111011 | 10011001100110011001100]. This is not exactly the same value as the decimal value 0.1! When converting a decimal fraction to a floating-point number, some rounding is performed. Actually, the rounding is not performed by simply truncating. Instead the last bit is obtained by a real rounding process (but a different approach may be chosen on another processor). This means that 0.1 is not converted as above indicated but to [0 | 01111011 | 10011001100110011001101]. This is slightly more accurate, but does not solve the fundamental problem that inaccuracy not only arises as a result of under and overflow, but even because of an incompatibility of representation. For example, setting f = 0.1 and g = 10, gives f * g - 1 = 1.490... * 10^{-8} (but a different value may be found on another processor).
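The rounded bit pattern of 0.1 and the resulting error can be made visible with a few lines of C. The sketch assumes IEEE 754 single precision with rounding to nearest, as described above; the function names are hypothetical. The product is evaluated in double precision, because a product rounded back to single precision would hide the error.

```c
#include <stdint.h>
#include <string.h>

// Bit pattern of the float nearest to 0.1
// (assumes IEEE 754 single precision with round-to-nearest)
uint32_t bits_of_one_tenth(void) {
  float f = 0.1f;
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u; }

// f * g - 1 for f = 0.1 and g = 10, evaluated in double precision
// so that the error in f is not hidden by a second rounding
double one_tenth_error(void) {
  float f = 0.1f;
  return (double) f * 10.0 - 1.0; }
```

Under these assumptions bits_of_one_tenth() returns 3DCCCCCD in hexadecimal notation, which is the pattern [0 | 01111011 | 10011001100110011001101], and one_tenth_error() returns 2^{-26} ~= 1.49 * 10^{-8}.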

In an earlier section an algorithm is given for computing the binary expansion of a given positive integral number. But how to compute the bits of a number smaller than one? It was also pointed out that there are two approaches for computing the digits of a number with respect to a certain radix: either starting from the least significant, or starting from the most significant digit. For the integral part the first approach is most suited, because a priori the number of digits is not known. For a similar, but even more serious reason, the second approach is most suitable for computing the digits of the fractional part. This is not a spectacular observation, as this is precisely what we are doing when computing a decimal expression for 4 / 7. In the following we focus on computing the binary digits, the bits, of the fractional part y of a number x. That is, we must find values d_i, i >= 1, with d_i in {0, 1}, so that sum_{i >= 1} d_i * 2^{-i} = sum_{i >= 1} d_i / 2^i = y.

The fact that the d_i can only be 0 or 1 facilitates the computation considerably. For some numbers this sum will be finite, but in general the number of terms is infinite, as for 0.1. This makes clear that there is no way of starting with the least significant bit. The following two rules are the basis for computing the d_i:

  1. Because all terms are positive, it follows that d_i = 0, if 2^{-i} > y.
  2. Because 2^{-i} >= sum_{j > i} d_j * 2^{-j}, it follows that d_i = 1, if 2^{-i} <= y < 2 * 2^{-i}.
Starting with a number y < 1, these rules allow us to determine d_1. More generally, let y_j = sum_{i > j} d_i / 2^i. Notice that y = y_0 and that y_j = y_{j - 1} - d_j / 2^j. Thus, if d_j is known, y_j can be computed from y_{j - 1}. By induction it can easily be proven that y_j < 1 / 2^j. Thus, the second condition of rule 2 is automatically satisfied. Hence, one of the two rules can be applied to determine bit j of y_{j - 1}, which equals bit j of y_0 = y.

As an example, we compute the first 10 bits of y = 0.1.

    y_0 = y               = 0.1         <  1 / 2      -->   d_1  = 0
    y_1 = y_0 - d_1 / 2   = 0.1         <  1 / 4      -->   d_2  = 0
    y_2 = y_1 - d_2 / 4   = 0.1         <  1 / 8      -->   d_3  = 0
    y_3 = y_2 - d_3 / 8   = 0.1         >= 1 / 16     -->   d_4  = 1
    y_4 = y_3 - d_4 / 16  = 0.0375      >= 1 / 32     -->   d_5  = 1
    y_5 = y_4 - d_5 / 32  = 0.00625     <  1 / 64     -->   d_6  = 0
    y_6 = y_5 - d_6 / 64  = 0.00625     <  1 / 128    -->   d_7  = 0
    y_7 = y_6 - d_7 / 128 = 0.00625     >= 1 / 256    -->   d_8  = 1
    y_8 = y_7 - d_8 / 256 = 0.00234375  >= 1 / 512    -->   d_9  = 1
    y_9 = y_8 - d_9 / 512 = 0.000390625 <  1 / 1024   -->   d_10 = 0
The same pattern is emerging as above. Knowing that the pattern of zeros and ones must be periodic, it is not hard to guess the pattern. Once the pattern has been guessed, the correctness of the guess can be verified. With the appropriate notation, this allows to compute an infinite pattern in finite time.
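The computation in the table can be sketched in C as follows. The function name fraction_bits() is hypothetical. Note that a variable of type double does not hold 0.1 exactly, so this program computes the bits of a number very close to 0.1; the first ten bits are the same.

```c
// Writes the first k bits of the fractional part y (0 <= y < 1)
// into d[1] ... d[k]; d[0] is not used.
void fraction_bits(double y, int k, int* d) {
  double p = 0.5;          // p = 2^{-i}
  int i;
  for (i = 1; i <= k; i++) {
    if (y >= p) {          // rule 2: 2^{-i} <= y_{i - 1}
      d[i] = 1;
      y -= p; }            // y_i = y_{i - 1} - d_i / 2^i
    else                   // rule 1: 2^{-i} > y_{i - 1}
      d[i] = 0;
    p /= 2; } }
```

Calling fraction_bits(0.1, 10, d) with an array d of at least 11 integers fills d[1] ... d[10] with 0, 0, 0, 1, 1, 0, 0, 1, 1, 0.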

How about zero? There is no way of shifting its binary representation so that a leading 1 occurs. One solution is to work with a leading 0 instead of a leading 1. However, this wastes a bit (because for all numbers except zero, the first bit then would be a 1). Therefore, it has been decided (at least according to the IEEE-P754 standard) to have a special representation for zero: [0 | 00000000 | 00000000000000000000000]. In principle the value of this number would be 2^{-127} ~= 5.9 * 10^{-39}, which is small, but not zero. Because zero is such an important number, this representation is treated in a special way so that at least this number can be represented exactly. Actually, even [1 | 00000000 | 00000000000000000000000] is exactly zero. Having +0 and -0 facilitates the product routines working with these numbers (-7 * +0 = -0 and -7 * -0 = +0).

Internally real numbers are maintained as floating point numbers. The available bits are divided over a sign bit, an exponent and the mantissa. The limited accuracy leads to rounding errors.

In fact, it is not hard to perform exact computations with fractional numbers. There is no need to expand 0.1 = 1/10 in a binary way: 4 / 7 * 1 / 10 = 4 / 70. This is easy. Square (and other) roots are harder, but even with these, computations can be performed in a symbolic way. However, many numbers, the so-called transcendental numbers, can really not be represented in a convenient way. e and pi are two such numbers. Nevertheless, computers may be instructed to work even with these in an accurate way (given that a scheme is provided for evaluating an arbitrary number of digits): at any stage of the computation only those digits are evaluated that have an impact on the output. This means that the computation time becomes a function of the accuracy required to guarantee a correct answer.

For the computer bits are bits, and it cannot tell whether some byte 11110110 stands for the letter "ö", for an unsigned byte with value 246, for a signed byte in two's-complement representation with value -10, for some byte inside an integer or some byte inside a floating-point number. The higher levels should therefore make sure that data in the memory are accessed in the right way. In some programming languages this is done in a very strict way; in the language C almost anything is possible, and correctly handling the memory access is the responsibility of the programmer.

From Program to Execution

Virtual Machine Levels

Conceptually a computer system consists of a hierarchy of machine levels. The lowest of these are realized in hardware, the higher ones in software. Anything that can be done in software can also be done in hardware and vice-versa, so the distinction between the levels cannot be made on the basis of their hardware/software realization. Only at the very bottom there must be one level which is executed directly on some kind of hardware (which also might be a human being). In the following we sketch the principal layers of the hierarchy.

Virtual Machine Levels

Digital Logic Level

At the lowest level digital computers are composed of switches. The gates are built from these switches. A gate is an electronic device which can compute functions of bi-valued signals. These gates form the hardware basis from which all digital computers are built. Not all computers are digital, there are also analog computers, which work quite differently. In modern computers the switches are realized with transistors built from some semi-conducting material (mostly silicon). In earlier days, they were made from vacuum tubes.

A transistor has three connections called base, collector and emitter. In the most common type of transistor, the collector is connected to plus and the emitter to minus. As long as no voltage is applied to the base, almost no current will run from the collector to the emitter: the switch is off. If a positive voltage is applied to the base, a small current runs from the base to the emitter. The semi-conducting properties make that this in a certain way turns the switch on, so that a much larger (about 100-fold) current can run from the collector to the emitter.

The sketched properties of the transistor can be used to build basic gates. All logical functions can be built from NOT and NOR (= not-or) gates and also from NOT and NAND (= not-and) gates. All three can be constructed using two transistors and one resistor.

Basic Gates

In digital computers all computation is performed digitally. Arithmetic operations can be defined in terms of logical gates, but dedicated circuits require fewer transistors. Gates are packed together in a single chip, also called integrated circuit or IC. The simplest ICs consist of just a few gates, but there are also ICs with millions of gates, being capable of performing 64-bit arithmetic or of storing 2 GB of data.

Microprogramming Level

The digital logic level, the real hardware, provides some basic instructions. There will certainly be circuits for performing logical operations. Possibly there are even arithmetic circuits, but this is not necessarily so. In earlier computers more instructions were realized in hardware than in more modern computers. It is not common that there is a special division chip. Instead division is somehow performed using more primitive operations. All instructions available at the conventional machine level are carried out step-by-step by an interpreter running at the microprogramming level. We are talking here about instructions for arithmetic, shifts and comparisons. At this level one has to deal with all technical details, particularly also with timing details. At the microprogramming level the buses, registers and memories are controlled.

Conventional Machine Level

The next higher level is the conventional machine level. It is a conventional machine level, because it is not realized in hardware. On the earliest computers, this was the only programmable level. On many computers it is still the lowest level for which the user can write programs because the microprogram is stored in read-only memory (ROM). At this level instructions are byte and word based and not bit based as on the microprogramming level. An extensive set of instructions is available, including instructions for comparison, arithmetic and memory management.

It is also at the conventional machine level that the procedure calling mechanism is implemented. Calling a procedure means making a jump to another position in the program, passing some arguments, allocating local variables, computing, and returning to the calling instance, possibly passing back some value. After returning, the computation should go on as if nothing has happened: the local variables of the calling instance must be available again. This is achieved by managing the data on a stack. A stack is a last-in-first-out data structure: data are added and removed at its top.

When calling a subroutine, a new workspace on top of the stack is created, this workspace is called a stack frame. After returning this space is made available again (there is no need to explicitly destroy the stored information). The address of the top of the stack is kept in a special register called stack pointer, abbreviated SP. Addressing with respect to the stack pointer is clumsy, because in the course of a subroutine new variables may be allocated, changing the value of SP. Therefore it is common that a second register is used as a fixed reference point, abbreviated LB, for all parameters and variables in a subroutine. The stack space may either grow from low to high memory addresses, upwards, or from high to low memory addresses, downwards. We assume the second which is most common.

When calling a procedure, all parameters are written on the stack in the order of their occurrence. Then the return address is added, then LB is added. After these additions, the assignment LB = SP is performed, giving LB its new value. Hereafter the procedure is entered, and space for the local variables is allocated. Thus, on a downwards growing stack, the parameters have a positive and the local variables a negative offset with respect to LB. When returning from a subroutine, execution continues at the return address, which is stored at address LB + 4 (assuming that each value on the stack takes four bytes; the return address was written just before the old value of LB and therefore lies at the next higher address), and LB is restored by assigning it the value which is found at address LB itself. The above is only one of several possibilities: LB may also point to the bottom or the top of the stack frame.
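The bookkeeping of SP and LB can be imitated in C with an array playing the role of the memory. This is only a simulation under simplifying assumptions: the array is indexed by stack entries instead of bytes, so one entry corresponds to four bytes in the description above, the stack grows downwards from the end of the array, and the names push(), enter() and leave() are hypothetical.

```c
#define STACK_WORDS 64

int stack[STACK_WORDS];   // simulated memory, one int per stack entry
int SP = STACK_WORDS;     // stack pointer: index of the top entry
int LB = STACK_WORDS;     // fixed reference point of the current frame

void push(int value) { stack[--SP] = value; }

// Builds a stack frame: parameters, return address, old LB, locals
void enter(const int* params, int n_params, int return_address, int n_locals) {
  int k;
  for (k = 0; k < n_params; k++)
    push(params[k]);      // parameters in order of occurrence
  push(return_address);
  push(LB);               // save the LB of the calling instance
  LB = SP;                // LB now points at the saved LB
  SP -= n_locals; }       // allocate the local variables

// Removes the frame and returns the stored return address
int leave(int n_params) {
  int return_address = stack[LB + 1];
  SP = LB + 2 + n_params; // release locals, LB, return address, parameters
  LB = stack[LB];         // restore the LB of the calling instance
  return return_address; }
```

After enter(), the return address lies one entry above LB, the parameters lie above it, and the local variables occupy stack[LB - 1], stack[LB - 2], ... . After the matching leave(), SP and LB have their old values again.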

So, every procedure call implies a certain amount of copying and management of SP and LB. The speed with which this is performed is important for the speed with which a well-structured program, delegating subtasks to subroutines, can be executed. Fortunately the amount of copying is proportional to the number of variables and parameters of the called procedure, not of the calling procedure. Therefore, it may be assumed that calling a small function with one or two parameters and one or two local variables causes almost no delay. Experiments confirm this. Calling a subroutine with many parameters is more expensive, but such a subroutine mostly also contains many instructions. So, eliminating subroutine calls, at the expense of readability, by copying the code of the subroutine at the position where it is called, a process called inlining, will normally save at most a few percent of the execution time. Inlining may even make the execution slower: if a procedure is called from several places, then inlining makes the code longer. For large programs, this means that a smaller fraction of the program fits in the program cache. Because a cache miss is far more expensive than a few operations on the program stack, this may have a negative impact.

The described calling mechanism is illustrated with the following simple program:

  #include <stdio.h>
  #include <stdlib.h>

  void init(int*a , int n) {
    int i;
    for (i = 0; i < n; i++)
      a[i] = (i * 4) % n; }

  int count(int* a, int l, int h, int x) {
    // Count the number of occurrences of x in a[l] ... a[h - 1]
    int m;
    if (h == l + 1) // Subarray has length 1
      return (a[l] == x ? 1: 0);
    else { // Recurse in each half of the subarray
      m = (l + h) / 2;
      return count(a, l, m, x) + count(a, m, h, x); } }

  int main() {
    int n, x;
    int* a;
    printf("\nGive the value of n   >>>   ");
    scanf("%d", &n);
    a = (int*) malloc(n * sizeof(int));
    init(a, n);
    printf("Give the value of x   >>>   ");
    scanf("%d", &x);
    printf("x occurs %1d times in a[]\n\n", count(a, 0, n, x));
    return 1; }

The program asks for n and reads its value from input. An array a[] of length n is allocated and somehow initialized. The value of x is read from input and then the subroutine count() is called, which computes the number of occurrences of x in a[]. The given implementation is recursive and exploits two observations: in a subarray of length one, x occurs once if the single element equals x and not at all otherwise; in a longer subarray, the number of occurrences of x is the sum of the numbers of occurrences in its two halves. Of course this problem can also be solved efficiently with a single loop, but the given implementation gives a nicer calling sequence. It also provides a clean example of the divide-and-conquer programming paradigm:
  1. The problem is split in subproblems;
  2. The subproblems are solved;
  3. The results of the subproblems are combined to obtain the solution for the whole problem.
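
For comparison, the single-loop solution mentioned above might look as follows. This is only a sketch; the name count_loop does not occur in the original program.

```c
/* Count the number of occurrences of x in a[0] ... a[n - 1] with one loop. */
int count_loop(int* a, int n, int x) {
  int i, c;
  c = 0;
  for (i = 0; i < n; i++)
    if (a[i] == x)
      c++;
  return c;
}
```

It needs a single stack frame, whatever the value of n.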

If n = 7, count is called from main with l = 0 and h = 7. From count a call is made with l = 0 and h = 3. The following call has l = 0 and h = 1. This does not result in further calls: the procedure is left, returning some value. Back at the call with l = 0 and h = 3, the second call to count is made. It has l = 1 and h = 3. In total count is called 13 times (in general the number of calls equals 2 * n - 1). Notice that recursion results in a depth-first execution order. This means that when drawing the execution schedule in a tree-like fashion with the root at the top, all operations in one branch of the tree are performed before executing any instruction of the next branch.

Calling Scheme for n = 7

We continue to assume that the stack grows downwards and that the smallest address of the stack frame of main() is 880. According to our basic considerations, each stack frame of count() consists of seven values: one for each of the four parameters, one for the local variable m, one for the return address and one for LB. So, during call 1 the stack frame spans the 28 bytes ranging from address 852 to 879. During call 2 it ranges from 824 to 851 and during call 3 from 796 to 823. Here the recursion does not go deeper: it returns one level up before performing call 4. Therefore, during call 4 the same space is used as during call 3. During call 5 it goes one level deeper again. Before performing call 6, the recursion first returns from call 5.

Development of Program Stack
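
The calling scheme and the development of the stack can be replayed with a small tracing routine. The following is a sketch under the assumptions made in the text (28 bytes per frame, the frame of main() starting at address 880); the names trace, calls and max_depth are introduced here for illustration only, and the addresses are those of the model, not of a real machine.

```c
#include <stdio.h>

#define FRAME_BYTES 28   /* seven values of four bytes each */
#define MAIN_LOW    880  /* smallest address of the frame of main() */

static int calls = 0;      /* number of calls made so far */
static int max_depth = 0;  /* deepest nesting level reached */

/* Follows the recursion of count() without looking at the array.
   depth is 1 for the call made from main(). */
void trace(int l, int h, int depth) {
  int low, m;
  low = MAIN_LOW - depth * FRAME_BYTES;
  calls++;
  if (depth > max_depth)
    max_depth = depth;
  printf("call %2d: l = %d, h = %d, frame %d .. %d\n",
         calls, l, h, low, low + FRAME_BYTES - 1);
  if (h > l + 1) { // Recurse in each half of the subarray
    m = (l + h) / 2;
    trace(l, m, depth + 1);
    trace(m, h, depth + 1); } }
```

Calling trace(0, 7, 1) reports 13 calls. Call 1 uses the addresses 852 to 879, and calls 3 and 4 both use 796 to 823, as described above.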

In this context the notion of recursion depth is important. A program is said to have recursion depth d if the tree corresponding to the calling structure of its execution has depth d. The depth of a tree is the maximum distance of any leaf, a node at the bottom of the tree, from the root. For a program with recursion depth d, up to d + 1 stack frames must be allocated on the program stack. Before execution of a program starts, a fixed amount of space is reserved for the program stack and SP should never become smaller than the first address of this space. If this happens nevertheless, the program execution is interrupted with a message like "stack overflow". If a program contains an infinite recursion, for example because the terminal case without further recursion was forgotten, it allocates one stack frame after the other and this will be the consequence. However, this may even happen for programs which are logically correct if there is a very deep recursion. With the given implementation of count(), the recursion depth of the above program is logarithmic in n, and this will not be a problem, but the following is also correct:

  int count(int* a, int n, int x) {
    // Count the number of occurrences of x in a[0] ... a[n - 1]
    if (n == 0) // Subarray has length 0
      return 0;
    else // Recurse for first n - 1 elements of array
      return  count(a, n - 1, x) + (a[n - 1] == x ? 1 : 0); }

Here the recursion depth is linear in n, and for large n (how large depends on the space reserved for the program stack) this will lead to a crash.

Operating System Level

When talking about the operating system, most people will think of Linux or Windows and their external features. From the point of view of program execution, this is of little importance. Here we consider the virtual machine defined by a set of instructions which is built on top of the conventional machine level.

Most instructions of the operating system level, such as those for arithmetic and logic, are already present at the microprogramming level. These instructions might be handed down level-by-level, but for efficiency reasons they are directly interpreted by the microprogram. In addition there are specific operating-system machine-level instructions which do not exist at the lower levels. These are instructions which have to do with the management of the computer.

The operating system manages the processes running on the computer. Here the time-sharing mechanism is realized. Time sharing means that on a single computer many processes can run, each getting the impression of being served continuously. This is done by allocating short time slots to each of the processes depending on their priority and need for computing power. The fact that several processes are running in a time-shared fashion implies that the running time of a program cannot be accurately determined by measuring the clock time at the beginning and end of its execution.
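
The difference between elapsed time and the time actually used by a process can be observed with the standard C library. The following sketch uses time() for the clock time and clock() for the processor time consumed by the program; the routines busy and measure are made up for this illustration.

```c
#include <time.h>

/* Some busy work; the result is returned so that it is really computed. */
unsigned long busy(unsigned long n) {
  unsigned long i, s;
  s = 0;
  for (i = 0; i < n; i++)
    s += i % 7;
  return s; }

/* Runs busy(n) and reports, in seconds, the elapsed clock time in *wall
   and the processor time used by this program in *cpu. */
unsigned long measure(unsigned long n, double* wall, double* cpu) {
  time_t  w0 = time(NULL);
  clock_t c0 = clock();
  unsigned long r = busy(n);
  *cpu  = (double) (clock() - c0) / CLOCKS_PER_SEC;
  *wall = difftime(time(NULL), w0);
  return r; }
```

On a heavily loaded machine the value in *wall may be considerably larger than the one in *cpu, because the process has to wait for its time slots.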

The operating system is also in charge of I/O and the secondary memory management. If the data of a process do not fit into the main memory, then part of the data is paged out onto the hard disk and paged in again when needed. This paging is coordinated by the operating system, which decides which data to maintain and which data to overwrite by new data (after writing the old data back if they were changed). Virtual memory is the additional space which becomes available through this paging mechanism: on a computer with 1 GB of main memory, one may work, at the price of a possibly large slowdown, as if there were 2 or 4 GB of memory.

All data have a logical and a physical address. The logical address is used inside the lower-level programs. The physical address gives the actual address in the main memory or on the hard disk. This distinction is convenient because it gives a program a contiguous view of the memory, even though physically this cannot always be assured: the memory may be fragmented, because memory is allocated and deallocated dynamically in the course of the program and because other processes are using memory as well. The operating system maintains a table for looking up the physical addresses of the pages on the hard disk.
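
How such a table turns a logical address into a physical one can be sketched in a few lines. The page size of 4096 bytes, the table contents and all names below are assumptions chosen for this illustration; a real operating system uses more elaborate structures with hardware support.

```c
#define PAGE_SIZE 4096L  /* assumed page size in bytes */
#define NUM_PAGES 8      /* assumed size of the logical address space in pages */
#define ON_DISK   (-1L)  /* marks a page which is currently paged out */

/* frame_of_page[p] is the number of the memory frame holding logical page p. */
static const long frame_of_page[NUM_PAGES] = {5, 2, ON_DISK, 7, 0, ON_DISK, 1, 3};

/* Returns the physical address belonging to a logical address, or -1 if
   the address is invalid or its page must first be paged in. */
long translate(long logical) {
  long page, offset;
  page = logical / PAGE_SIZE;
  offset = logical % PAGE_SIZE;
  if (logical < 0 || page >= NUM_PAGES)
    return -1;
  if (frame_of_page[page] == ON_DISK)
    return -1;
  return frame_of_page[page] * PAGE_SIZE + offset; }
```

Logical address 4100 lies at offset 4 in page 1, which is kept in frame 2, so its physical address is 2 * 4096 + 4 = 8196. Neighboring logical pages may lie far apart physically.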

Assembly Language Level

The assembly language stands just below the higher programming languages. Its instructions are in one-to-one correspondence to the machine instructions, and at this level there is still access to all features of the machine. For example, at this level program addresses may be maintained in an array-like fashion, whereas in most programming languages labels cannot be stored in an array. At this level there is also access to the execution flags, such as the overflow bit.

A compiler translates programs written in higher languages to assembly language. For each pair of languages there must be a different compiler. Writing compilers which lead to efficient assembly code takes time, and therefore running a program written in a popular language on a processor with a widely used assembly language often will go faster than when using an uncommon language on some obscure processor, because in the latter case the program will be translated in a basic way not fully exploiting the features of the machine.

The assembly code is translated by the assembler to machine code. This is a rather simple process because these instructions are in one-to-one correspondence. For each assembly-language instruction its operation code must be looked up in a table. Slightly harder is the translation of symbolic addresses, which are used in conditional and unconditional jump statements. In the machine code all these must be replaced by addresses. Therefore, the assembler usually makes two passes over the assembly code. In the first pass all labels are collected and stored in a table together with the address of the following instruction. In the second pass this table is used to substitute addresses for all labels.
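
The two passes can be sketched as follows. This is a strongly simplified model, not the code of a real assembler: each line is given as a record with an optional label and an optional jump target, every instruction is assumed to take four bytes, and all names are hypothetical.

```c
#include <string.h>

#define INSTR_BYTES 4   /* assumed size of one machine instruction */
#define MAX_LABELS  16

struct line   { const char* label; const char* op; const char* target; };
struct symbol { const char* name; int address; };

static struct symbol table[MAX_LABELS];
static int num_symbols = 0;

/* Looks up a label in the table; returns -1 if it is unknown. */
static int lookup(const char* name) {
  int i;
  for (i = 0; i < num_symbols; i++)
    if (strcmp(table[i].name, name) == 0)
      return table[i].address;
  return -1; }

/* resolved[i] becomes the address of the jump target of line i,
   or -1 if line i has no target. */
void assemble(const struct line* prog, int n, int* resolved) {
  int i;
  num_symbols = 0;
  for (i = 0; i < n; i++) // Pass 1: collect the labels and their addresses
    if (prog[i].label != NULL && num_symbols < MAX_LABELS) {
      table[num_symbols].name = prog[i].label;
      table[num_symbols].address = i * INSTR_BYTES;
      num_symbols++; }
  for (i = 0; i < n; i++) // Pass 2: substitute addresses for the labels
    resolved[i] = (prog[i].target != NULL ? lookup(prog[i].target) : -1); }
```

The second pass is needed because of forward jumps: when a jump to a label further down is read, the address of this label is not yet known.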

Assembly Language

Every processor type has its own instruction set. The language of correct expressions using these instructions is called assembly language. The program which translates a program in assembly language to lower-level code is called the assembler.

Importance

The good thing about a program in C or Java or any other higher language is that, provided there is a compiler for this language-processor pair available, it is machine independent. In contrast, the assembly language is machine dependent. Another great disadvantage of assembly language is that it makes many details explicit which one normally does not want to deal with. So, assembly languages are characterized by the following two points: they are machine dependent, and they make many low-level details explicit.

From a practical point of view, except for system programmers and hardware designers, the lower machine levels are of little importance. Most people with a degree in computer science are somehow dealing with software. This software is written in some high-level language; often it is even composed of standardized and reusable components which are glued together by a few lines of one's own code. The assembly-language level is nevertheless interesting and worth a closer look for several reasons:

The first point is the most important. It is good to have a basic understanding of how a computer works and how a program is executed. The assembly code tells everything except for the scheduling of the pipelines, the memory management and the parallel execution of steps.

There is one more reason why one would like to have some understanding of assembly language. In most programs there are only a few lines in which 95% of the time is spent, for example, the innermost loop in a matrix product computation. If for such a program speed is of great importance, then in a final stage one might even consider the generated assembly code. As we will see, the optimization tools which are built into the compiler can do a lot, but they apply general rules, and do not understand what is going on. So, it might be possible to eliminate a few lines or to arrange them in a better way.

The above discussion makes clear that it would be a waste of time to learn all about a particular assembly language. On the other hand, because most of these languages are structured in more or less the same way, knowing something about any particular language will allow one to understand another language with little effort. In the following we will consider some examples from the SPARC 7 assembly language. These will be treated with the above considerations in mind: it is not necessary to understand every single detail, as long as we are able to figure out what is going on. In the following we consider some simple C programs and the corresponding assembly code, both after a simple translation and using optimization.

Programming assembly code is tedious and of limited use. Understanding assembly code provides insight and offers opportunities for final optimizations.

First Example

The following program reads two numbers x and y, computes z = 32 * (x + y) and prints the result.
  #include "stdio.h"
  int main() {
    int i, x, y, z;
    x = 123456789;
    printf("Give x   >>>   ");
    scanf("%d", &x);
    printf("Give y   >>>   ");
    scanf("%d", &y);
    z = 32 * (x + y);
    printf("z = %d\n", z);
    return 1; }

Using the option -S when compiling with gcc, the compiler generates the following assembly code instead of an executable:
	.file	"assembly_ex1.c"
	.section	".rodata"
	.align 8
.LLC0:
	.asciz	"Give x   >>>   "
	.align 8
.LLC1:
	.asciz	"%d"
	.align 8
.LLC2:
	.asciz	"Give y   >>>   "
	.align 8
.LLC3:
	.asciz	"z = %d\n"
	.section	".text"
	.align 4
	.global main
	.type	main, #function
	.proc	04
main:
	!#PROLOGUE# 0
	save	%sp, -128, %sp
	!#PROLOGUE# 1
	sethi	%hi(123456512), %g1
	or	%g1, 277, %g1
	st	%g1, [%fp-24]
	sethi	%hi(.LLC0), %g1
	or	%g1, %lo(.LLC0), %o0
	call	printf, 0
	 nop
	add	%fp, -24, %o5
	sethi	%hi(.LLC1), %g1
	or	%g1, %lo(.LLC1), %o0
	mov	%o5, %o1
	call	scanf, 0
	 nop
	sethi	%hi(.LLC2), %g1
	or	%g1, %lo(.LLC2), %o0
	call	printf, 0
	 nop
	add	%fp, -28, %o5
	sethi	%hi(.LLC1), %g1
	or	%g1, %lo(.LLC1), %o0
	mov	%o5, %o1
	call	scanf, 0
	 nop
	ld	[%fp-24], %o5
	ld	[%fp-28], %g1
	add	%o5, %g1, %g1
	sll	%g1, 5, %g1
	st	%g1, [%fp-32]
	sethi	%hi(.LLC3), %g1
	or	%g1, %lo(.LLC3), %o0
	ld	[%fp-32], %o1
	call	printf, 0
	 nop
	mov	1, %g1
	mov	%g1, %i0
	ret
	restore
	.size	main, .-main
	.ident	"GCC: (GNU) 3.3.3"

In the first lines we find the format strings of the IO statements. It is interesting that the compiler has noticed that the two scanf statements have the same format: the label .LLC1 is used twice.

Any procedure, including main, starts with a save and ends with a restore. This saves the state of the calling procedure and ultimately restores it.

Corresponding to the instruction "x = 123456789;" we find

  sethi   %hi(123456512), %g1
  or      %g1, 277, %g1
  st      %g1, [%fp-24]
Here we encounter the various registers. In total there are 32 integer registers, eight of each of four types. These are %g0, ..., %g7, %o0, ..., %o7, %l0, ..., %l7 and %i0, ..., %i7. "sethi" sets the upper 22 bits of the register given as its second argument to the value of its first argument and sets the lowest 10 bits to zero. "or" performs a bitwise logical or of its first two arguments and stores the result in the third argument. The number 123456789 is split into 123456512 (which has its last 10 bits equal to zero) and 277, and assigned in two steps to register %g1. Then the value of %g1 is stored away at memory position %fp-24. %hi() is not the name of a register but the operation of extracting the highest 22 bits of a number. There is an analogous operation %lo(), which extracts the lowest 10 bits.
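
The split of the constant can be checked with two bit masks. The sketch below assumes that an unsigned int has 32 bits; the names hi22 and lo10 are chosen here and are not SPARC notation.

```c
/* The value a register holds after sethi: upper 22 bits of x, lowest 10 bits zero. */
unsigned hi22(unsigned x) {
  return x & ~0x3FFu; }

/* The lowest 10 bits of x, which are added by the following or instruction. */
unsigned lo10(unsigned x) {
  return x & 0x3FFu; }
```

For x = 123456789 this gives 123456512 and 277, and the bitwise or of the two parts is x again.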

This clumsy construction is necessary for several reasons: only the values of registers can be copied into memory, therefore the number must first be assigned to a register. Why can 123456789 not be assigned directly to %g1? The reason is that any instruction with all its arguments has to fit in 32 bits. 32 bits are not much!

The IO statements all translate in a similar way. For "printf("Give x >>> ");" we find

        sethi   %hi(.LLC0), %g1
        or      %g1, %lo(.LLC0), %o0
        call    printf, 0
The first two lines are now easy to understand: the address of the string labelled .LLC0 is copied in two steps into %o0. This is somehow passed as an argument to the procedure printf which is called in the third line.

Reading a number is more complicated because also the address of the variable has to be passed to the subroutine. For "scanf("%d", &x);" we find

        add     %fp, -24, %o5
        sethi   %hi(.LLC1), %g1
        or      %g1, %lo(.LLC1), %o0
        mov     %o5, %o1
        call    scanf, 0
The value of %fp, which is the reference point of the memory space, is added to -24 and stored in %o5. In two steps the address of .LLC1 is copied to %o0. Then %o5 is moved to %o1; it is not clear why %o1 was not used directly in the addition. Then %o0 and %o1 are passed as arguments to scanf.

The numerical computation "z = 32 * (x + y);" shows how variables are handled:

        ld      [%fp-24], %o5
        ld      [%fp-28], %g1
        add     %o5, %g1, %g1
        sll     %g1, 5, %g1
        st      %g1, [%fp-32]
First x is loaded into %o5 and y into %g1. Then these two are added into %g1. The compiler is clever and has noticed that 32 = 2^5. Therefore it does not use multiplication, which requires calling a subroutine, but uses the operation sll which is a mnemonic for "shift left logical". So, %g1 is taken, shifted 5 positions leftwards and then stored again in %g1. This is stored at address %fp-32, which corresponds to z.
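
The same replacement can be written in C. The sketch uses unsigned values, because shifting negative signed numbers to the left is not well defined in C, whereas the machine simply shifts the 32-bit pattern; the name times32 is made up here.

```c
/* Multiplication by 32 = 2^5 written as a shift over five positions. */
unsigned times32(unsigned v) {
  return v << 5; }
```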

The final instruction "return 1;" is translated as follows:

        mov     1, %g1
        mov     %g1, %i0
        ret
Here we see that a small value can be copied directly into a register. This is copied into %i0, which holds the return value of the procedure.

Each C instruction translates to several lines of assembly code. Particular complications arise because an entire assembly instruction has to be packed in 32 bits.

Second Example

Now we are prepared for a slightly more interesting example, covering many other interesting features:
  #include "stdio.h"
  #define  MAXINT 0X7FFFFFFF

  void initialize(int* a, int n, int c) {
    int i;
    for (i = 0; i < n; i++)
      a[i] = (c * i) % n; }

  void swap(int* a, int* b, int n) {
    int i, x;
    for (i = 0; i < n; i++) {
      x = a[i]; a[i] = b[i]; b[i] = x; } }
  
  int span(int* a, int n) {
    int i, m, M;
    m = MAXINT;
    for (i = 0; i < n; i++)
      if (a[i] < m)
        m = a[i];
    M = 0; 
    for (i = 0; i < n; i++)
      if (a[i] > M)
        M = a[i];
    return M - m; }

  int main() {
    int i, s, n;
    int* a;
    int* b;
    printf("\nGive n   >>>   ");
    scanf("%d", &n);
    a = (int*) malloc(n * sizeof(int));
    b = (int*) malloc(n * sizeof(int));
    initialize(a, n, 173);
    initialize(b, n,  93);
    swap(a, b, n);
    s = span(a, n);
    printf("Span of a[] = %1d\n\n", s);
    return 1; }

Typing "gcc assembly_ex2.c -S" generates the following assembly code:
	.file	"assembly_ex2.c"
	.global .umul
	.global .rem
	.section	".text"
	.align 4
	.global initialize
	.type	initialize, #function
	.proc	020
initialize:
	!#PROLOGUE# 0
	save	%sp, -120, %sp
	!#PROLOGUE# 1
	st	%i0, [%fp+68]
	st	%i1, [%fp+72]
	st	%i2, [%fp+76]
	st	%g0, [%fp-20]
.LL2:
	ld	[%fp-20], %o5
	ld	[%fp+72], %g1
	cmp	%o5, %g1
	bl	.LL5
	nop
	b	.LL1
	 nop
.LL5:
	ld	[%fp-20], %g1
	sll	%g1, 2, %o5
	ld	[%fp+68], %g1
	add	%o5, %g1, %l0
	ld	[%fp+76], %o0
	ld	[%fp-20], %o1
	call	.umul, 0
	 nop
	mov	%o0, %g1
	mov	%g1, %o0
	ld	[%fp+72], %o1
	call	.rem, 0
	 nop
	mov	%o0, %g1
	st	%g1, [%l0]
	ld	[%fp-20], %g1
	add	%g1, 1, %g1
	st	%g1, [%fp-20]
	b	.LL2
	 nop
.LL1:
	ret
	restore
	.size	initialize, .-initialize
	.align 4
	.global swap
	.type	swap, #function
	.proc	020
swap:
	!#PROLOGUE# 0
	save	%sp, -120, %sp
	!#PROLOGUE# 1
	st	%i0, [%fp+68]
	st	%i1, [%fp+72]
	st	%i2, [%fp+76]
	st	%g0, [%fp-20]
.LL7:
	ld	[%fp-20], %i5
	ld	[%fp+76], %g1
	cmp	%i5, %g1
	bl	.LL10
	nop
	b	.LL6
	 nop
.LL10:
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+68], %g1
	add	%i5, %g1, %g1
	ld	[%g1], %g1
	st	%g1, [%fp-24]
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+68], %g1
	add	%i5, %g1, %i4
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+72], %g1
	add	%i5, %g1, %g1
	ld	[%g1], %g1
	st	%g1, [%i4]
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+72], %g1
	add	%i5, %g1, %i5
	ld	[%fp-24], %g1
	st	%g1, [%i5]
	ld	[%fp-20], %g1
	add	%g1, 1, %g1
	st	%g1, [%fp-20]
	b	.LL7
	 nop
.LL6:
	ret
	restore
	.size	swap, .-swap
	.align 4
	.global span
	.type	span, #function
	.proc	04
span:
	!#PROLOGUE# 0
	save	%sp, -128, %sp
	!#PROLOGUE# 1
	st	%i0, [%fp+68]
	st	%i1, [%fp+72]
	sethi	%hi(2147482624), %g1
	or	%g1, 1023, %g1
	st	%g1, [%fp-24]
	st	%g0, [%fp-20]
.LL12:
	ld	[%fp-20], %i5
	ld	[%fp+72], %g1
	cmp	%i5, %g1
	bl	.LL15
	nop
	b	.LL13
	 nop
.LL15:
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+68], %g1
	add	%i5, %g1, %g1
	ld	[%g1], %i5
	ld	[%fp-24], %g1
	cmp	%i5, %g1
	bge	.LL14
	nop
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+68], %g1
	add	%i5, %g1, %g1
	ld	[%g1], %g1
	st	%g1, [%fp-24]
.LL14:
	ld	[%fp-20], %g1
	add	%g1, 1, %g1
	st	%g1, [%fp-20]
	b	.LL12
	 nop
.LL13:
	st	%g0, [%fp-28]
	st	%g0, [%fp-20]
.LL17:
	ld	[%fp-20], %i5
	ld	[%fp+72], %g1
	cmp	%i5, %g1
	bl	.LL20
	nop
	b	.LL18
	 nop
.LL20:
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+68], %g1
	add	%i5, %g1, %g1
	ld	[%g1], %i5
	ld	[%fp-28], %g1
	cmp	%i5, %g1
	ble	.LL19
	nop
	ld	[%fp-20], %g1
	sll	%g1, 2, %i5
	ld	[%fp+68], %g1
	add	%i5, %g1, %g1
	ld	[%g1], %g1
	st	%g1, [%fp-28]
.LL19:
	ld	[%fp-20], %g1
	add	%g1, 1, %g1
	st	%g1, [%fp-20]
	b	.LL17
	 nop
.LL18:
	ld	[%fp-28], %i5
	ld	[%fp-24], %g1
	sub	%i5, %g1, %g1
	mov	%g1, %i0
	ret
	restore
	.size	span, .-span
	.section	".rodata"
	.align 8
.LLC0:
	.asciz	"\nGive n   >>>   "
	.align 8
.LLC1:
	.asciz	"%d"
	.align 8
.LLC2:
	.asciz	"Span of a[] = %1d\n\n"
	.section	".text"
	.align 4
	.global main
	.type	main, #function
	.proc	04
main:
	!#PROLOGUE# 0
	save	%sp, -136, %sp
	!#PROLOGUE# 1
	sethi	%hi(.LLC0), %g1
	or	%g1, %lo(.LLC0), %o0
	call	printf, 0
	 nop
	add	%fp, -28, %o5
	sethi	%hi(.LLC1), %g1
	or	%g1, %lo(.LLC1), %o0
	mov	%o5, %o1
	call	scanf, 0
	 nop
	ld	[%fp-28], %g1
	sll	%g1, 2, %g1
	mov	%g1, %o0
	call	malloc, 0
	 nop
	mov	%o0, %g1
	st	%g1, [%fp-32]
	ld	[%fp-28], %g1
	sll	%g1, 2, %g1
	mov	%g1, %o0
	call	malloc, 0
	 nop
	mov	%o0, %g1
	st	%g1, [%fp-36]
	ld	[%fp-32], %o0
	ld	[%fp-28], %o1
	mov	173, %o2
	call	initialize, 0
	 nop
	ld	[%fp-36], %o0
	ld	[%fp-28], %o1
	mov	93, %o2
	call	initialize, 0
	 nop
	ld	[%fp-32], %o0
	ld	[%fp-36], %o1
	ld	[%fp-28], %o2
	call	swap, 0
	 nop
	ld	[%fp-32], %o0
	ld	[%fp-28], %o1
	call	span, 0
	 nop
	mov	%o0, %g1
	st	%g1, [%fp-24]
	sethi	%hi(.LLC2), %g1
	or	%g1, %lo(.LLC2), %o0
	ld	[%fp-24], %o1
	call	printf, 0
	 nop
	mov	1, %g1
	mov	%g1, %i0
	ret
	restore
	.size	main, .-main
	.ident	"GCC: (GNU) 3.3.3"

Here we really start to appreciate the conciseness of C: 39 lines of source code become 261 lines of assembly code!

The new program reveals the general format of procedures: they all start with a prologue. Here the stack pointer %sp is decreased by a certain amount, allocating space for the variables of the subroutine. After that, the parameters, which are passed in the registers %i0, %i1, ..., are copied into memory, where they are treated like local variables.

The C instruction "for(i = 0; i < n; i++)" is translated into

        st      %g0, [%fp-20]
.LL2:
        ld      [%fp-20], %o5
        ld      [%fp+72], %g1
        cmp     %o5, %g1
        bl      .LL5
        nop
        b       .LL1
         nop
.LL5:

        ...
        ...

        ld      [%fp-20], %g1
        add     %g1, 1, %g1
        st      %g1, [%fp-20]
        b       .LL2
         nop
.LL1:
First i, which is stored at address %fp-20, is initialized with the value of %g0, a register which on the SPARC always contains zero. Then the value of i is loaded into %o5 and the value of n, which is stored at address %fp+72, is loaded into %g1. These two values are compared with the cmp instruction. bl is the mnemonic for "branch on less". This means that when in the comparison the first argument is smaller than the second, the execution continues at the label given as operand, in this case .LL5. Otherwise the execution proceeds with the next line of code and comes to the unconditional jump to label .LL1, jumping beyond the loop. At the end of the loop i is loaded again, 1 is added to the register %g1 containing i and this value is stored again. Then there is an unconditional jump to .LL2, where the condition is tested again.
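
The structure of this translation can be imitated in C itself by writing the loop of initialize() with tests and gotos only. This is a sketch for illustration; the function name init_goto is made up, and the labels are named after those in the assembly code.

```c
void init_goto(int* a, int n, int c) {
  int i;
  i = 0;                // st %g0, [%fp-20]
LL2:
  if (i < n) goto LL5;  // cmp followed by bl .LL5
  goto LL1;             // b .LL1: leave the loop
LL5:
  a[i] = (c * i) % n;   // the body of the loop
  i = i + 1;            // ld, add, st
  goto LL2;             // b .LL2: test the condition again
LL1:
  return; }
```

Each pass through the loop executes one conditional and one unconditional jump.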

The numerical computation involving arrays "a[i] = (c * i) % n;" is turned into the following:

        ld      [%fp-20], %g1
        sll     %g1, 2, %o5
        ld      [%fp+68], %g1
        add     %o5, %g1, %l0
        ld      [%fp+76], %o0
        ld      [%fp-20], %o1
        call    .umul, 0
         nop
        mov     %o0, %g1
        mov     %g1, %o0
        ld      [%fp+72], %o1
        call    .rem, 0
         nop
        mov     %o0, %g1
        st      %g1, [%l0]
First i is loaded into %g1. Because each position of the array takes four bytes, i is multiplied by four by shifting two positions to the left. The result is stored in %o5. Then the starting address of a[], which was stored at address %fp+68, is loaded into %g1. These two values are added and the result is stored in %l0. This is the address of a[i]. Now c and i are loaded and multiplied. Stupidly the result, which stands in %o0, is moved to %g1 and then %g1 is moved again to %o0. n is loaded into %o1 and the modulo subroutine .rem is called. The result is moved again from %o0 to %g1 and finally it is stored at the previously computed address %l0 of a[i].
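
The address computation performed by the sll and add instructions corresponds to the following C fragment. It is a sketch which assumes, as in the text, that an int takes four bytes; the name element_address is chosen for this illustration.

```c
#include <stddef.h>

/* Computes the address of a[i] by hand: the starting address of a[]
   plus i shifted two positions to the left, that is, plus 4 * i bytes. */
int* element_address(int* a, int i) {
  return (int*) ((char*) a + ((size_t) i << 2)); }
```

In C this is hidden behind the notation a[i]; the compiler generates the shift and the addition for every array access.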

Most of the rest is similar. An interesting point is still the exchange of the two array values "x = a[i]; a[i] = b[i]; b[i] = x;". How is this done at assembly level? Here is the code.

        ld      [%fp-20], %g1
        sll     %g1, 2, %i5
        ld      [%fp+68], %g1
        add     %i5, %g1, %g1
        ld      [%g1], %g1
        st      %g1, [%fp-24]
        ld      [%fp-20], %g1
        sll     %g1, 2, %i5
        ld      [%fp+68], %g1
        add     %i5, %g1, %i4
        ld      [%fp-20], %g1
        sll     %g1, 2, %i5
        ld      [%fp+72], %g1
        add     %i5, %g1, %g1
        ld      [%g1], %g1
        st      %g1, [%i4]
        ld      [%fp-20], %g1
        sll     %g1, 2, %i5
        ld      [%fp+72], %g1
        add     %i5, %g1, %i5
        ld      [%fp-24], %g1
        st      %g1, [%i5]
After computing the address, a[i] is loaded into %g1. This value is stored at %fp-24, the address of x. Then the address of a[i] is computed again and the result is left in %i4. Then the address of b[i] is computed and b[i] is loaded into %g1. This value is then stored at the address which is found in %i4, that is, it is assigned to a[i]. Finally the address of b[i] is computed again and the value of x, which is first loaded from address %fp-24 into %g1, is stored at this position. Quite a lot of work for a simple exchange of two variables!

Each procedure starts with a save and ends with a restore. All ifs, fors and whiles are translated using tests and gotos.

Code Optimization

Modern compilers can do much more than translating syntactically correct code of a high programming language into syntactically correct code of an assembly language: The first point is important, but not an issue here: it has no impact on the assembly code.

What possibilities does a compiler have to optimize the code? We list the most important ideas:

All this and much more is provided. The third point has far-reaching consequences: if there is no output at all, an optimizing compiler may omit all code. This means that one has to be careful when measuring running times: maybe the very good results are due to the fact that almost nothing is done!
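
The danger when measuring running times can be illustrated with two nearly identical routines. This is only a sketch with made-up names; whether the first loop is really removed depends on the compiler and on the optimization options.

```c
/* The result is never used: an optimizing compiler may remove the loop,
   or even the whole call, so timing this routine says little. */
void discarded(unsigned long n) {
  unsigned long i, s;
  s = 0;
  for (i = 0; i < n; i++)
    s += i * i; }

/* The result is returned. If the caller prints it, the value has to be
   produced somehow and the measured time is more meaningful. */
unsigned long used(unsigned long n) {
  unsigned long i, s;
  s = 0;
  for (i = 0; i < n; i++)
    s += i * i;
  return s; }
```

A simple precaution is therefore to print a value which depends on all the computations that are to be measured.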

Of course these optimization tools come at a price: the compilation takes much longer. It also, at least partially, explains why modern compilers are so much larger than those in the early days. In 1984 a complete Pascal compiler took about 20 KB of storage, leaving some space for the program on a computer with 48 KB of memory. A modern C compiler requires 30 MB of storage!

Optimizers are integrated in compilers. They help to generate assembly code which can be executed several times faster than non-optimized code.

First Example Optimized

We consider the same program as before, but now it is compiled with optimization option -O3, which turns on most optimizations. As a result we get the following:
	.file	"assembly_ex1_opt.c"
	.section	".rodata"
	.align 8
.LLC0:
	.asciz	"Give x   >>>   "
	.align 8
.LLC1:
	.asciz	"%d"
	.align 8
.LLC2:
	.asciz	"Give y   >>>   "
	.align 8
.LLC3:
	.asciz	"z = %d\n"
	.section	".text"
	.align 4
	.global main
	.type	main, #function
	.proc	04
main:
	!#PROLOGUE# 0
	save	%sp, -120, %sp
	!#PROLOGUE# 1
	sethi	%hi(123456512), %o5
	sethi	%hi(.LLC0), %o3
	or	%o5, 277, %o4
	or	%o3, %lo(.LLC0), %o0
	call	printf, 0
	st	%o4, [%fp-20]
	add	%fp, -20, %o1
	sethi	%hi(.LLC1), %i0
	call	scanf, 0
	or	%i0, %lo(.LLC1), %o0
	sethi	%hi(.LLC2), %o2
	call	printf, 0
	or	%o2, %lo(.LLC2), %o0
	add	%fp, -24, %o1
	call	scanf, 0
	or	%i0, %lo(.LLC1), %o0
	ld	[%fp-20], %o0
	ld	[%fp-24], %o1
	sethi	%hi(.LLC3), %g1
	add	%o0, %o1, %o2
	mov	1, %i0
	or	%g1, %lo(.LLC3), %o0
	call	printf, 0
	sll	%o2, 5, %o1
	ret
	restore
	.size	main, .-main
	.ident	"GCC: (GNU) 3.3.3"

We first consider the translation of the fragment "x = 123456789; printf("Give x >>> "); scanf("%d", &x);". This translates to
        sethi   %hi(123456512), %o5
        sethi   %hi(.LLC0), %o3
        or      %o5, 277, %o4
        or      %o3, %lo(.LLC0), %o0
        call    printf, 0
        st      %o4, [%fp-20]
        add     %fp, -20, %o1
        sethi   %hi(.LLC1), %i0
        call    scanf, 0
        or      %i0, %lo(.LLC1), %o0
The instructions have been rearranged. Memory is no longer allocated for the useless variable i, so x is now stored at address %fp-20. The useless move from %o5 to %o1 has been eliminated. Also the nops (nop is the mnemonic for "no operation") after the calls are no longer there. Without knowing more about the precise timing of the instructions one cannot tell whether they were necessary in the non-optimized code or not. It is common that there are limitations on the instructions like "instruction xyz should not immediately follow instruction pqr". Altogether, the optimized code is considerably better than the original one: 13 lines have been reduced to 10.

On the other hand, the code is still far from optimal. It may come as a surprise that the useless initialization of x is performed. However, this is beyond the horizon of the compiler: the compiler does not know that the value of the parameter x which is passed to scanf has no importance inside scanf. Also it is somewhat surprising that the value of x is stored to memory before calling scanf. It might also have been kept in one of the many registers. This is relevant, because the instruction st costs at least 3 clock cycles (this number is given in a description of the SPARC instruction set) while copying a value to a register takes only 1 clock cycle. If x is not standing in the cache yet, it costs even much more. Nevertheless, this is not an error of the compiler. The reason is similar: the compiler does not know what is done inside scanf. Thus, the value cannot be left in a register because the registers might be used in the subroutine. The fact that the address of x is passed to the subroutine suggests that it will be written there, but this is not necessarily so. Therefore, in order to be on the safe side, the value of x must be written to memory before entering the subroutine.

The optimizer eliminates useless instructions and rearranges them, but it is not omniscient and cannot always attain the optimal code.

Second Example Optimized

For the second program the difference with the non-optimized version is at least as large. The number of instructions is reduced from 261 to 220. Of course this does not tell much about the performance improvement, because most time is spent inside the loops. For n = 20 * 10^6, on a (rather slow) SPARC workstation the non-optimized version takes 25 seconds, the optimized version 15 seconds.

We only consider the following subroutine in some detail:

  int span(int* a, int n) {
    int i, m, M;
    m = MAXINT;
    for (i = 0; i < n; i++)
      if (a[i] < m)
        m = a[i];
    M = 0; 
    for (i = 0; i < n; i++)
      if (a[i] > M)
        M = a[i];
    return M - m; }
Here there is a large potential for optimization. One might wonder whether the compiler decides to merge the two loops into one. This is profitable in all respects, because it reduces the number of loop operations, it improves the usage of the pipeline, and the elements of a[] have to be loaded from memory only once. Altogether, this will reduce the cost by almost a factor of two.
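
Written by hand, a version of span() with the two loops merged into one could look as follows. It is a sketch of the transformation which is hoped for, not something the compiler produced; the name span_fused is chosen here.

```c
#define MAXINT 0x7FFFFFFF

int span_fused(int* a, int n) {
  int i, m, M;
  m = MAXINT;
  M = 0;
  for (i = 0; i < n; i++) { // One pass: every a[i] is loaded only once
    if (a[i] < m)
      m = a[i];
    if (a[i] > M)
      M = a[i]; }
  return M - m; }
```

For every input it returns the same value as the original span(), because the two loops do not influence each other.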

The optimized assembly code is as follows:

span:
        !#PROLOGUE# 0
        !#PROLOGUE# 1
        mov     %o0, %o3
        mov     0, %o5
        sethi   %hi(2147482624), %o0
        cmp     %o5, %o1
        bge     .LL78
        or      %o0, 1023, %o4
        sll     %o5, 2, %o0
.LL81:
        ld      [%o3+%o0], %g1
        cmp     %g1, %o4
        bge     .LL67
        add     %o5, 1, %o5
        mov     %g1, %o4
.LL67:
        cmp     %o5, %o1
        bl      .LL81
        sll     %o5, 2, %o0
.LL78:
        mov     0, %o0
        cmp     %o0, %o1
        bge     .LL80
        mov     0, %o5
        sll     %o5, 2, %g1
.LL82:
        ld      [%o3+%g1], %g1
        cmp     %g1, %o0
        ble     .LL73
        add     %o5, 1, %o5
        mov     %g1, %o0
.LL73:
        cmp     %o5, %o1
        bl      .LL82
        sll     %o5, 2, %g1
.LL80:
        retl
        sub     %o0, %o4, %o0
        .size   span, .-span
        .ident  "GCC: (GNU) 3.3.3"

In the whole subroutine not a single value is stored to memory. Everything is done within the registers. This is possible because no further subroutines are called. The starting address of a[] is moved to %o3 and the value of n is left in %o1. %o5 is used for the counter i. In the earlier examples there were many nop operations. One of the reasons for this is, as becomes clear from the above code fragment, that on the SPARC the first instruction after a branch instruction is still executed (it stands in the so-called delay slot). This leaves some more time for actually performing the jump. In the optimized version the nops are replaced by useful instructions. To reduce the number of jumps, the loop condition is now tested in two places: once at the beginning before the first pass, and once at the end, before each further pass. This is all quite clever, but the two loops are not squeezed together. Apparently, for such a drastic rewriting of the code one of the many other optimization options must be specified.

In the optimized code variables are stored to memory only if necessary, normally they are kept in the registers.
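
For comparison, the two loops of span can also be squeezed together by hand. The following is only a sketch: the name span_fused is made up for this illustration, and INT_MAX from <limits.h> is assumed to play the role of MAXINT. As in the original, M starts at 0, so the result should be the same as that of the two-loop version.

```c
#include <limits.h>

/* Hypothetical hand-fused variant of span(): a single pass over a[]
   computes both the minimum m and the maximum M. INT_MAX is assumed
   to correspond to MAXINT in the original code. */
int span_fused(const int* a, int n) {
  int i, m, M;
  m = INT_MAX;
  M = 0;
  for (i = 0; i < n; i++) {
    int v = a[i];      /* each element is loaded only once */
    if (v < m)
      m = v;
    if (v > M)
      M = v;
  }
  return M - m;
}
```

Whether this version is really faster depends on the machine and on the compiler options; it is best checked by measuring.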

Efficiency Aspects

Ever Faster

A machine instruction is executed in several smaller steps:
  1. Fetch the instruction pointed to by the program counter into the instruction register.
  2. Update the program counter.
  3. Determine the type of the instruction.
  4. If the instruction requires data, then determine where they are.
  5. If the instruction requires data which are not yet standing in one of the registers, fetch them from the memory and load them into a register.
  6. Execute the instruction.
  7. Store the result, if any.
  8. Go to step 1.
Most of these substeps can be executed in one clock cycle, the basic time unit of a computer system. Modern systems have been designed so that the substeps of consecutive instructions can be performed at the same time, in a pipelined fashion. Cutting an instruction into a large number of very simple substeps is one of the reasons why the clock frequency, which equals one divided by the time per clock cycle, could be increased so dramatically. If these substeps were executed one after another, nothing would be gained by this. By pipelining the substeps, the increased clock frequency is turned into faster execution. Of course this only works if the pipeline is kept busy. There are several reasons why this is sometimes hard to achieve.

If an instruction requires data computed by a previous instruction, this computation cannot proceed and the pipeline stalls. The compiler will discover this problem and, if possible, it will try to rearrange some instructions to alleviate it without changing the computed result. This is called instruction scheduling; when the processor itself reorders the instructions while the program is running, one speaks of out-of-order execution. Conditional instructions are another problem for pipelined execution: it takes some time to figure out how the computation will continue. In the case of if statements, the main trick is that the processor, possibly guided by hints from the compiler, tries to predict which of the two branches is most likely and tentatively continues with this branch. This is called branch prediction, and it is claimed to be highly effective. It may also be that by default the first of several alternatives is taken, and therefore performance may sometimes improve if the most likely alternative is put in the if-clause. In the case of loops with few instructions, such as a for loop initializing an array, after each pass there is a test whether to continue or not. In this case the compiler (or the user) can partially unroll the loop, performing several passes of the loop before testing the condition again. Loop unrolling is a major way to reduce execution time.
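
Loop unrolling can be illustrated with the loop initializing an array mentioned above. The following is a minimal sketch of unrolling by hand; the name clear_unrolled and the unrolling factor four are chosen only for this example.

```c
/* Sets a[0], ..., a[n - 1] to zero. The loop body is unrolled four
   times, so the loop condition is tested only once per four
   assignments. */
void clear_unrolled(int* a, int n) {
  int i;
  for (i = 0; i + 3 < n; i += 4) {
    a[i]     = 0;
    a[i + 1] = 0;
    a[i + 2] = 0;
    a[i + 3] = 0;
  }
  for (; i < n; i++)   /* the remaining at most three elements */
    a[i] = 0;
}
```

The second loop is needed because n does not have to be a multiple of four. A good compiler may perform this transformation itself, so unrolling by hand pays off only if measurements show a gain.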

A pipeline-conscious program may be several times faster than a program which was written without paying attention to the features of the instruction execution. Fortunately, this performance improvement can mostly be achieved by minor changes, most of which can be found by a good compiler. Therefore, there is normally no need to take the pipeline into account during program development: pipeline optimization is a programmatic rather than an algorithmic issue.

Recently it has become hard to continue doubling the clock frequency every 18 months, as was achieved for the last 30 or 40 years. This exponential increase of the processor speed is often called Moore's law (strictly speaking, Moore's law concerns the number of transistors on a chip). For physical reasons it becomes harder to strongly reduce the size of the circuits. High clock frequencies also require high voltage, and high voltage implies high energy consumption (V^2 / R Watt is dissipated when putting V volts over a resistor with resistance R, and the actual energy consumption increases even more than quadratically with V). For mobile devices this is not acceptable because it reduces the time one can work without being connected. For desktop devices, power consumption itself is not yet the major concern, but it becomes hard to sufficiently cool the processor using air cooling. Liquid cooling is reasonable only for expensive supercomputers. At the same time, competition forces the producers to come up with more powerful processors. One way out is to increase parallelism, that is, the possibility to execute several instructions at the same level of the pipeline at the same time. For example, in the ALU there may be independent units for addition and multiplication. This may mean that in a single clock cycle an addition and a multiplication can be performed. In the case of branching statements, it may even be possible to follow both paths until it becomes clear which one is the right one.

Computations can only be performed on data which are located in one of the registers. In older days, when the clock frequency was much lower, the missing data could be fetched from the main memory in a single clock cycle. However, at a frequency of 10^10 Hz, a signal can travel less than 3 centimeters in the time of a clock cycle (even light, at about 3 * 10^8 m/s, covers only 3 * 10^8 / 10^10 = 0.03 m). Therefore, there are several levels of cache storage located on the processor chip. Because these are much closer to the processor, and because there is no additional delay due to going off-chip, these can be accessed relatively fast. Making the caches larger is a final way of improving processor performance.

The continuing performance improvement is mainly due to large-scale integration, deeper pipelines, on-chip parallelism and larger caches.

Interaction of Cache and Main Memory

The data in the caches are only local copies of the data in the main memory. If the processor needs a data item which is not available in the cache, a request goes to the main memory. Then a cache line is copied from the main memory to some position in the cache. A cache line consists of a consecutive fraction of the memory of about 64 bytes (a byte consists of 8 consecutive bits) containing the requested item. If the place in the cache where some data are stored is needed by some other data, then it is checked whether the current data were changed since they were copied from the main memory. If yes, then the corresponding data in the main memory are overwritten, otherwise this is not necessary.

The motivation for copying 64 bytes, when we were asking only for one or four, is that it is very expensive to fetch data from the main memory, and often an application will also need the neighbors. In this way one hopes to amortize the access cost. Everything becomes much faster every year, but the speed of processors is increasing faster than the speed at which such requests for data can be handled. The gap has been growing continuously and is developing into a serious problem.

Data cannot be placed freely at any position in the cache. The position is computed in a very simple way (basically by just taking the memory position modulo the size of the cache). Sometimes the system can choose among several positions. The number of positions from which it can choose is called the associativity of the cache. In this context one may say: "the cache is one-way/two-way/four-way/fully associative".

We give an example of a typical case: a four-way associative cache of size 256 KB (when talking about memories, K = 2^10 = 1024, M = 2^20, G = 2^30 and T = 2^40, but 1 GHz really means 10^9 Hz) and cache lines of 64 bytes. This cache is best viewed as consisting of 2^18 / 64 = 4096 cache lines. Suppose that the data item stored at position i = 39506151 of the memory is needed by the processor. Written in binary we have i = 10.010.110.101.101.000.011.100.111. Then first the beginning j of the stretch of 64 = 2^6 bytes containing this item is computed. This can be done in one operation by setting the last 6 bits of i to zero: j = 10.010.110.101.101.000.011.000.000. Then the possible cache positions are computed. There are several strategies, but the simplest is to consider the four possibilities given by k_l = (j / 64) modulo 1024 + 1024 * l, for l = 0, 1, 2, 3. Here we use 1024 = 4096 / 4, the number of cache lines which fit into the cache, divided by the associativity. In our example (j / 64) modulo 1024 = 1.101.000.011 (835 in decimal), which gives the positions 001.101.000.011, 011.101.000.011, 101.101.000.011 and 111.101.000.011. Computing these numbers may appear hard, but this can be done in one clock cycle using bitwise operations for masking and shifting. Then it must be checked whether this cache line is already there. If not, then it must be copied from the main memory and positioned at one of the possible places. This implies that some earlier cached information must be thrown out. Which one? A good heuristic is to throw out the cache line which has not been used for the longest time: the LRU, least-recently used, strategy. Of course, applying this strategy requires that for every cache line some extra information is maintained telling when this cache line was accessed for the last time. Finally, the 64 bytes of the main memory starting at position j are copied to position k_l for the chosen l, possibly after copying the information which was stored in this section of the cache back to the main memory.

Possible placement of a memory line in the cache
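
The computation of the example can be written down in a few lines of C. This is only a sketch of the simple mapping strategy described above; the names line_start and cache_position and the constants are chosen for this illustration and do not describe any particular processor.

```c
#define LINE_SIZE   64     /* bytes per cache line */
#define CACHE_LINES 4096   /* 256 KB / 64 bytes */
#define WAYS        4      /* associativity */

/* Start address j of the 64-byte stretch containing byte i:
   the last 6 bits of i are set to zero by masking. */
unsigned long line_start(unsigned long i) {
  return i & ~(unsigned long)(LINE_SIZE - 1);
}

/* Possible cache position k_l of byte i, for l = 0, ..., WAYS - 1:
   k_l = (j / 64) modulo 1024 + 1024 * l. */
unsigned long cache_position(unsigned long i, int l) {
  unsigned long per_way = CACHE_LINES / WAYS;        /* 1024 */
  unsigned long line = line_start(i) / LINE_SIZE;    /* j / 64 */
  return (line & (per_way - 1)) + per_way * l;       /* mask = modulo */
}
```

For i = 39506151 this gives line_start(i) = 39506112 and the positions 835, 1859, 2883 and 3907, which are the four binary numbers listed in the text.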

Clearly, having caches makes everything much more complicated. There is also a considerable overhead in computing the positions, checking for available copies and so on. In particular, it also costs extra time to fetch 64 bytes from the memory instead of 1 or 4. All this effort is wasted if out of a cache line a single number is used a single time before this cache line is overwritten by another cache line. That caches are nevertheless used in all modern processors is due to the fact that in most computations there is either temporal or spatial locality. Temporal locality means that within a certain stretch of time the computation stays local in the sense that it is working only on a small subset of the data. Spatial locality means that in consecutive time steps a computation tends to work on data which are stored close together in the memory. In programs without any form of locality it would be better not to have caches. The locality of a program can normally not be improved by rearranging a few instructions: a cache-aware program requires a cache-aware algorithm to begin with. This is a major issue, because a program with poor cache behavior can be 10 times slower than a cache-aware program. Furthermore, this factor has been growing rapidly and it appears that it will continue to grow in the future.

The limited choice of where to position required data in the cache is needed to make it simple to find the information again. At the same time, it implies that in a k-way associative cache there may be sets of k + 1 elements which are continuously kicking each other out of the cache, because they are mapped to the same set of k cache positions. If this happens, we say that the program is thrashing. If one is measuring the performance of a program, then it may happen that a minimal change (like reordering the variable declarations) gives a considerable performance change. This may be caused by the difference between thrashing or not.
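
Such a conflict can be made concrete for the four-way associative cache of the example above. The sketch below assumes the simple modulo mapping described there; the names set_index and all_conflict are made up for this illustration.

```c
#define LINE_SIZE     64      /* bytes per cache line */
#define LINES_PER_WAY 1024    /* 4096 cache lines / associativity 4 */

/* Index of the group of cache positions to which the byte at memory
   position i can be mapped (the value k_0 of the example). */
unsigned long set_index(unsigned long i) {
  return (i / LINE_SIZE) % LINES_PER_WAY;
}

/* Returns 1 if all n memory positions compete for the same group of
   cache positions, and 0 otherwise. */
int all_conflict(const unsigned long* pos, int n) {
  int t;
  for (t = 1; t < n; t++)
    if (set_index(pos[t]) != set_index(pos[0]))
      return 0;
  return 1;
}
```

Memory positions which lie a multiple of 1024 * 64 = 65536 bytes apart get the same index. Five such positions compete for only four places in the cache, so accessing them in turn forces a line out again and again.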

Another complication with having caches arises when several processors are working on the same data. Processor 1 is assigning the value of variable x to variable y, processor 2 is assigning the value of y to variable z. Which value does z get? In general this is unclear, because we should specify more exactly the times at which this happens. But even if we enforce that the first assignment is performed before the second, this does not necessarily mean that finally z equals x. It may happen that after the assignment y = x, processor 1 does not write the new value of y back to the main memory but keeps it in the cache (in general this is clever, because if y gets assigned a new value once more, we have saved an access to the main memory). It may also happen that processor 2 already had y in the cache and does not feel obliged to get a fresh value from the main memory. In all these cases it may happen that finally z equals the original value of y, or another value. In technical language, this is called the problem of cache coherence. Because the caches must be kept coherent in order not to encounter any of the mentioned incorrect computation sequences, it is hard to efficiently solve a problem on several processors of a computer accessing a shared main memory, each having its own caches.

Understanding the cache behavior is the key to understanding the performance of programs in practice.

Interaction of Main Memory and Secondary Memory

We consider how the secondary memory can be used to perform computations which require more storage than is available in the main memory.

A basic such mechanism is built directly into the operating system, and a program can use it without any explicit instructions from the side of the programmer. Normally, there is a so-called swap partition, a special section of the hard disk, reserved for this purpose. This is also called the swap space. Mostly the size of the swap space is of the same order of magnitude as the main memory. If we are running an application which requires 800 MB on a computer with 512 MB RAM, then all data which do not fit into the main memory are paged-out automatically. If these data are needed, then they are paged-in again. In this way, if there is for example 600 MB swap space, we can work on a computer with 512 MB main memory as if there were (512 + 600) MB of memory. Therefore this additional memory is also called virtual memory.

Unfortunately virtual is not as good as real. The reason is that accessing the secondary memory is several orders of magnitude (an order of magnitude is a factor 10) slower than accessing the main memory. There are several reasons for this. In the first place, the hard disk is even more remote from the processor than the main memory. But, more importantly, this is due to the very different nature of the storage medium. A hard disk consists of one or more rotating disks. Data are stored more or less like on an old-fashioned music LP: in circular tracks. The only difference is that the tracks do not run in a spiral. Such a track is divided into sectors which can be used for data storage. These sectors are quite large, offering space for several KB. For each of the one or more (sides of) disks there is one reading/writing head (but typically at most one of them can be used at a time). Now if we want to read the data in sector 17 on track 193, first the head must be moved to stand over track 193, and then it must wait until sector 17 comes by. The delay for moving to the specified track is called seek time, the delay for waiting until the sector comes by is called rotational delay. Hard disks rotate at around 7200 rpm (rotations per minute), so one rotation takes 60 / 7200 seconds. On average one has to wait for half a rotation, so the average rotational delay is 60 / (2 * 7200) ~= 0.004 seconds. The seek time is similar, thus a random access takes about 0.008 seconds: performing random accesses to the hard disk, we can on average perform on the order of 100 accesses per second. That is not much!

Hard Disk
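
The arithmetic for the access times can be checked with a few lines of C. This is only a sketch: the function names are chosen for this illustration, and the seek time must be supplied by the caller (the text only says that it is similar to the rotational delay).

```c
/* Average rotational delay in seconds for a disk rotating at the
   given number of rotations per minute: the time of half a rotation. */
double rotational_delay(double rpm) {
  return 60.0 / (2.0 * rpm);
}

/* Number of random accesses per second, if every access costs one
   seek and one rotational delay (both given in seconds). */
double accesses_per_second(double seek_time, double rot_delay) {
  return 1.0 / (seek_time + rot_delay);
}
```

With rotational_delay(7200.0) and an assumed seek time of 0.004 seconds, accesses_per_second gives a little more than 120, in agreement with the estimate of about 100 accesses per second.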

Fortunately, if we do not only want to read sector 17 on track 193 but also 18, 19, ..., then the hard-disk controller is clever enough to organize this reading so that we have to position the reading head only once. There are many other clever ideas in this domain. We just mention two of them. Suppose we want to read all sectors of track 193. Suppose that when the reading head arrives at the track, it is standing over sector 14. Then it can wait until sector 0 passes by, but it can also start to read immediately, temporarily store the data in a buffer and return them in the correct order. The second idea applies when we want to read several sectors scattered over the hard disk. Then these can be accessed in the specified order: the reading head will move back and forth over the disk. Alternatively one can try to schedule the accesses so that the distance covered by the reading head is minimized. This is achieved by sorting the requests according to track number: the reading head will move once from the inside outwards or vice versa. If one also takes into account the rotational delays, then the problem to solve becomes much harder.
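
The effect of sorting the requests according to track number can be sketched as follows. The function names are made up for this illustration, and it is assumed that the head starts at one edge of the disk (track 0); rotational delays are ignored.

```c
#include <stdlib.h>

/* Comparison function for qsort: ascending track numbers. */
static int compare_tracks(const void* p, const void* q) {
  int x = *(const int*)p, y = *(const int*)q;
  return (x > y) - (x < y);
}

/* Number of tracks crossed by the head if it starts at track 'start'
   and serves the n requests in the given order. */
long head_movement(const int* tracks, int n, int start) {
  long total = 0;
  int pos = start, i;
  for (i = 0; i < n; i++) {
    total += labs((long)tracks[i] - pos);
    pos = tracks[i];
  }
  return total;
}

/* Sorts the requests by track number, so that the head sweeps over
   the disk only once. */
void sort_requests(int* tracks, int n) {
  qsort(tracks, n, sizeof(int), compare_tracks);
}
```

For the requests 193, 12, 150, 40, 180 and a head starting at track 0, the given order costs 762 track crossings, while the sorted order costs only 193.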

In this way we can often read much more than 100 sectors per second. If we are reading very long sections at a time, then the transfer rate between memory and hard disk increases to several MB per second. We find exactly the same phenomenon when considering the transfer of data between main memory and cache: a single cache miss (the event of not finding a requested data item in the cache) costs several hundreds of clock cycles. But, if we are requesting many data items which are stored consecutively in the main memory, then these are delivered at a much higher rate, reducing the average cost to a few clock cycles per delivered item. At both levels, it is attempted to improve the performance by applying pre-fetching: this means that if the program is accessing data items i, i + 1, i + 2, then it is guessed that i + 3 will be needed as well, and while the processor is still working on i + 2 the system already fetches i + 3. Possibly this effort is wasted, but there was not much else to do anyway.

Random accesses to the hard disk are terribly expensive. Acceptable transfer rates between primary and secondary memory are obtained only when accessing large stretches of the secondary memory consecutively.

If a data item which is needed by the processor does not reside in the cache, we speak of a cache miss; when a requested data item does not reside in the main memory, we speak of a page miss (the analogy would have been better if cache misses were called "line misses"). In the case of a cache miss, the line containing the requested data item is copied from the main memory to one of the few possible places. This replacement strategy is kept simple because a cache miss should cost only a few clock cycles (which unfortunately is not achieved any longer, as was discussed above). In the case of a page miss, the page containing the requested item is copied from the secondary memory to the main memory. A page is the unit of reading and writing on the hard disk. Unlike the size of the cache lines, which is a system parameter, the size of the pages is user defined (though down in the system there is some system-specific page size as well). A typical default size is 8 KB. Because this operation is `infinitely' expensive, there is time to try to optimize the replacement strategy. So, the location of the page to copy is not determined by a simple modulo computation. In this case the LRU strategy is applied in full generality: the page replaces the page which was least-recently used (this requires the implementation of a data structure known under the name "priority queue").

The LRU strategy is simple and intuitive. It is the most used strategy for deciding which page to throw out of the main memory.
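
The LRU strategy can be simulated with a small C routine. This is a simplified sketch: pages are named by single characters, the name lru_page_loads and the bound MAX_FRAMES are made up for this illustration, and the least-recently used page is found by a linear search over time stamps instead of with a priority queue.

```c
#define MAX_FRAMES 16   /* assumed upper bound on the memory size in pages */

/* Simulates the LRU strategy for a sequence of n page accesses on a
   main memory with space for 'frames' pages (frames <= MAX_FRAMES).
   Returns the number of pages loaded, that is, the number of misses. */
int lru_page_loads(const char* pattern, int n, int frames) {
  char page[MAX_FRAMES];       /* page currently held in each frame */
  int last_used[MAX_FRAMES];   /* time of the last access to that page */
  int used = 0, loads = 0, t, f;

  for (t = 0; t < n; t++) {
    int slot = -1;
    for (f = 0; f < used; f++)          /* is the page in memory? */
      if (page[f] == pattern[t])
        slot = f;
    if (slot < 0) {                     /* page miss */
      loads++;
      if (used < frames)
        slot = used++;                  /* a frame is still free */
      else {
        slot = 0;                       /* find least-recently used */
        for (f = 1; f < frames; f++)
          if (last_used[f] < last_used[slot])
            slot = f;
      }
      page[slot] = pattern[t];
    }
    last_used[slot] = t;
  }
  return loads;
}
```

For example, for the access pattern A, B, C, A, B, D, A, C and space for 3 pages, the call lru_page_loads("ABCABDAC", 8, 3) counts 5 loaded pages.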

The system-provided virtual-memory management is convenient: it allows one to go into the `red' without need for a new program. However, whether this works or not depends very strongly on which problem one is solving and which program one is using. We have seen that every time one wants to access a data item which is not available in the main memory, a whole page containing thousands of data items is brought into the main memory. If all these items are useful, then the performance will be worse than if one had sufficient main memory, but the factor is not large. If, on the other hand, there are only a few useful data items on each page loaded, then the performance breaks down entirely. In many cases one will not have sufficient patience to wait for completion: increasing the size of the problem by 10% may increase the computation time by a factor 100 or more. Saying it more formally, the performance degradation strongly depends on the locality of the program. For example, when computing a[i] = b[i] + c[i], for i = 0, 1, ..., n - 1, then there is good spatial locality and the computation will still go quite fast even if n becomes so large that the arrays no longer fit into the main memory. On the other hand, computing a[i] = b[c[i]] may go very slowly for large n: a[] and c[] are still accessed in a consecutive way, but, depending on the values in c[], the access pattern of b[] may be chaotic.
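
Written as C subroutines, the two computations look almost the same, although their memory behavior can be very different. The names add_arrays and gather are chosen only for this illustration.

```c
/* Good spatial locality: a[], b[] and c[] are all traversed
   consecutively. */
void add_arrays(int* a, const int* b, const int* c, int n) {
  int i;
  for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];
}

/* Possibly poor locality: a[] and c[] are traversed consecutively,
   but b[] is accessed at the positions given by c[], which may be
   scattered. All values in c[] must be valid indices of b[]. */
void gather(int* a, const int* b, const int* c, int n) {
  int i;
  for (i = 0; i < n; i++)
    a[i] = b[c[i]];
}
```

If c[] happens to contain 0, 1, ..., n - 1 in order, gather also accesses b[] consecutively; the bad case arises when the values in c[] jump around.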

There is no need to rely on the virtual-memory management. If one is specifically designing programs for solving very large problems, then it is better to take over the control. This is done by making sure that the whole application fits into the main memory and explicitly writing away to a file the data which one temporarily does not need. The biggest theoretical advantage of this user-controlled memory management is that one knows exactly which data are in the main memory. This makes it possible to give performance guarantees, because no unexpected paging occurs. More practically, it turns out that doing it yourself is faster: if you need many data at once, this can be specified; data which do not need to be saved are not written away; no time is wasted on finding out which page should be removed. The really compelling argument to apply explicit memory handling is that the mentioned swap partition is mostly relatively small (though one can declare it to be bigger from the start). Some of the other partitions (typically with names such as "/tmp", "/var", "/var/tmp" or "/scratch") are much bigger, which allows one to solve huge problems on a 1000-euro PC.

The virtual-memory manager provides a convenient but limited extension of the possibilities. For serious computations with large data sets, one should deal with the memory management in an explicit way.
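
Writing data away to a file and reading them back can be done with the standard C library. The following is a minimal sketch; the names page_out and page_in are made up for this illustration, and a real application would add its own bookkeeping of which block is stored where.

```c
#include <stdio.h>

/* Writes n ints from buf to the scratch file f, starting at the
   current file position. Returns 1 on success and 0 otherwise. */
int page_out(FILE* f, const int* buf, size_t n) {
  return fwrite(buf, sizeof(int), n, f) == n;
}

/* Reads n ints into buf, starting at position 'offset' (counted in
   ints) of the scratch file f. Returns 1 on success and 0 otherwise. */
int page_in(FILE* f, int* buf, size_t offset, size_t n) {
  if (fseek(f, (long)(offset * sizeof(int)), SEEK_SET) != 0)
    return 0;
  return fread(buf, sizeof(int), n, f) == n;
}
```

The scratch file can, for example, be obtained with tmpfile(), or with fopen() on one of the large partitions mentioned above. Reading and writing large blocks at a time keeps the number of expensive random disk accesses small.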

Exercises

  1. Mention several possible causes why accessing the main memory takes much longer than accessing the cache. Speculate about the possible future development of each of these contributions.

  2. Write the decimal number 123456789 as a binary and as a hexadecimal number. Write the decimal number 1234 as a number with radix 7. Explain how you found your results or show your computation.

  3. In the text you can find a C procedure int_to_binary(int n, int k) for converting an integer into an array containing the bits of its binary representation.
    • Generalize the idea of this procedure to a procedure int_to_digits(int n, int r, int k), converting an int to an array of digits with respect to radix r.
    • Give the procedure digitsToInt(int* a, int r, int k) which converts an array of digits of a number given with respect to radix r into the corresponding int.
    • Give a procedure radix_conversion(int* a , int r1, int r2, int k) which converts a number, given as an array of digits from a representation with respect to radix r1 to a representation with respect to radix r2. This procedure should call the other two procedures, it should not contain much code itself.
    • Embed the three procedures into a small program which does the following: It asks for a number n, for r1 and for r2. It converts n to the array representation with respect to radix r1 and prints the result. It converts this representation to a representation with respect to r2 and prints the result. It converts back to an integer and prints the result.

  4. In the text procedures were given for manipulating the bits in a byte. Rewrite these to manipulate the bits in an array of unsigned chars. For example, set_to_one should have parameters a and i, where a is the name of an array and i the index of the bit which has to be set to 1.

  5. Write a C program which asks for a line of text, reads it, and outputs it again. In the output all capitals should be replaced by the corresponding small letters, while all other letters should remain unchanged. Hint: use the subroutine getchar() to read a single character from input. The character '\n' indicates the end of a line.

  6. We consider 8-bit numbers. Give the binary representation of the numbers -100 and +100 in the format with sign bit, in two's-complements format and in the excess representation with offset 127.

  7. Give the bit representation of the following two numbers as floating-point numbers for the subdivision of a 32-bit number specified in the text above: -127.390625 and 56.47.

  8. The two special representations of 0 as floating point numbers were discussed. Write a C program to perform the following tests:
    • Initialize a float f with 0 and print its 32 bits.
    • Multiply f with a large positive number, for example 10^20, and print its bits. If the bit representation has changed, then compare f with 0 and print whether it is still considered to have value 0.
    • Multiply f with a negative number and print its bits. If the bit representation has changed, then compare f with 0 and print whether it is still considered to have value 0.
    • Assign -0 to f and print its 32 bits. If the bit representation is not the same, then compare f with 0 and print whether it is still considered to have value 0.
    • Assign 0 to f. Then change some bits of the mantissa of f from 0 to 1. Compare f with 0 and print whether it is still considered to have value 0.
    • In conclusion, how many different representations are there for the number 0?

  9. Show how the OR, AND and EXOR functions can be written as a combination of NOT and NOR. Do the same using only NOT and NAND.

  10. Create a two-bit adder circuit using NOT, OR, AND, NAND, NOR and EXOR gates. For two numbers composed of two bits each, (a_1, a_0) and (b_1, b_0), this adder should compute the two-bit sum of these numbers and the value of a possible carry. So, the circuit should have four inputs and three outputs.

  11. Design a gate computing the exor function for the two input signals a and b. The goal is to minimize the number of used transistors: it is not correct to compute the function using that a ^ b = (a || b) && ! (a && b), which would require five transistors.

  12. Design a gate computing the implication function for the two input signals a and b. The goal is to minimize the number of used transistors. It is not hard to construct a gate computing a -> b with three transistors. The question is whether two transistors are sufficient.

  13. Download the program and change the position of the variable i in the declaration "int i, x, y, z;": put it at position 1, 2, 3 and 4. Compile the code with gcc without optimization. Check the generated assembly code. For each of the variants give the addresses of the variables x, y and z. Conclude from this in which order the variables are arranged. What does this mean for running programs on a computer with limited cache associativity?

  14. Download the optimized assembly code of the second example. Describe how the exchange of two numbers in the subroutine swap() is realized. Is there a great difference with the implementation in the non-optimized version?

  15. Write a C program along the following lines. It has a subroutine swap_local(int x, int y) which exchanges x and y. It has a subroutine swap_global(int* x, int* y), which really exchanges x and y. In main variables x and y are read. Then swap_local is called and the values are printed. Then the same is repeated with swap_global. Compile the program with gcc -S. Point out the essential difference in the assembly code which makes that in the first case the values of x and y in main remain unchanged while in the second case they are exchanged.

  16. In most higher programming languages there is a conditional statement for jumping to one of many equivalent alternatives. In C this statement is called "switch". Such a statement can be translated in two essentially different ways:
    • It can be turned into a chain of the form if ... else if ... else if ... else ... . In this case, the time to perform the statement is proportional to the number of listed alternatives.
    • If the key value x which is used to index the alternatives only assumes values in a small range, then an array l[] of labels might be created, accessing the alternatives by jumping to l[x - m], where m is the smallest value of x. In this case, the correct alternative can be chosen in constant time, independently of the number of listed alternatives.
    Write a program which asks for an integer between 1 and 12 and then, using a switch, prints the name of the corresponding month. Translate this program with gcc -S and with gcc -S -O3. Consider which of the above two implementations is chosen.

  17. Consider a four-way associative cache of size 512 KB. Consider a character stored in byte 442,202,882 of the memory. If we apply the described mapping strategy, then indicate where in the cache this byte can be mapped.

  18. Consider an array a[] consisting of n integers of four bytes. We want to analyze how many cache lines arise when accessing all positions of a[] in order. This depends on how a[] is arranged in the main memory. In practice the memory may be fragmented, but this has little impact, if any at all. In the following it may be assumed that a[] is stored in 4 * n consecutive memory positions. Let r(n) denote an upper bound on the number of cache lines in which values of a[] may be stored.
    • Give the exact value of r(n) for n = 1, ..., 10. Notice that a[0] need not be stored at the beginning of a line.
    • Give an exact expression for r(n) for general n.
    • Alignment means to arrange data objects (most noticeably arrays and files) in memory so that their beginning lies at an address which is a suitably large power of two. In the current context, aligning a[] would be to take care that the address of a[0] is a multiple of 64. Why might it be attractive to have a[] aligned?
    • On many systems aligned memory allocation is provided by the library function memalign() (which is not part of standard C). Write your own subroutine void* aligned_malloc(int size, int alignment), which returns a pointer to an aligned block of at least size bytes. Hint: allocate some extra memory and then place the pointer appropriately.

    Non-Aligned Array in Memory

  19. If we have an array a[] consisting of n integers of four bytes each, then the element a[i] is stored at position x + 4 * i, where x indicates the address of a[0]. Consider the following operations and indicate the number of resulting cache misses in case the cache lines have length 64 bytes.
    • All values of a[] are increased by 1: a[i] = a[i] + 1.
    • All values of a[] are increased by the value of a variable d: a[i] = a[i] + d.
    • The values of two arrays a[] and b[] are added together in a[]: a[i] = a[i] + b[i].
    • The values of two arrays a[] and b[] are added together in c[]: c[i] = a[i] + b[i].
    The correct answers to the above questions may depend on the associativity. Distinguish various cases, always indicating the number of cache faults in the most unlucky situation.

  20. Assume we have a main memory with space for 4 pages. Initially all data are stored on the hard disk. Let the access pattern of the pages be given by A, B, A, C, D, E, F, A, C, D, A, B, D, F, where the letters indicate the indices of pages on which the requested data can be found. Assume that the LRU strategy is applied for determining which page to throw out if the main memory is full and a new page must be loaded. Indicate the sequence of pages which can be found in the main memory after any of the 14 accesses. How many pages are loaded in total? Possibly the number of page faults can be reduced by not applying the LRU strategy. Give the best schedule you can find. Are you sure it is optimal?

  21. Assume we have a main memory with space for x pages. Assume that the LRU strategy is applied for determining which page to throw out. Construct an access pattern, for which applying the LRU strategy results in many more page faults than applying an optimal strategy. Considering very long patterns, how big can the factor between LRU and optimal become expressed as a function of x? Prove that this bound is tight (except for possibly a constant factor). In the theory of "online" algorithms, this worst possible ratio is called the performance ratio of the strategy.





Object Oriented Programming: Java

In this chapter a high-level view of Java is presented. It is not intended to provide a complete description of the language. Particularly, it is assumed that the reader already knows how to program in C or a similar language. Here we point out the main features of object-oriented programming and illustrate the introduced concepts with examples taken from Java. There are many good textbooks and reference books. For specific information on classes an overview of all standard classes is provided online.

Introduction

As in C, the execution of a Java program starts in the procedure called "main". In Java it is common to say method for procedure. This method must be found inside some class. What we called types before are called classes in Java. In object-oriented languages, the classes are put in the foreground, and therefore any method, including "main", must belong to some class. The name of the text file in which a program is stored is determined by the name of the class in which main is given: it must be "MainClassName.java".

Java was originally designed as an interpreted language. However, it is not the source code that is interpreted, but something called byte-code. This byte-code is generated by a program (one could say a compiler) called "javac". So, once the program is written, one types "javac MainClassName.java". If there are no syntactical errors, then a new file with name "MainClassName.class" is generated. The program can now be executed by typing "java MainClassName".

The following gives a very simple program computing the average value of the fields of an array:

  class ArrayAverage
  {
    static final int SIZE = 100;

    public static void main(String ps[])
    {
      int i, sum;
      int[] a = new int[SIZE];
      
      for (i = 0; i < SIZE; i++)
        a[i] = i;
      for (i = 0, sum = 0; i < SIZE; i++)
        sum += a[i];
      System.out.println("\nThe average value is " + 
        (float) sum / SIZE + "\n");
    }
  }

Comparing with the C program doing the same, we see that details have changed, but that the program as a whole is more or less the same. The differences are

For trivial programs as in the example, Java is a burden: C is simpler.

In Java there are quite generally applied name-giving conventions:

At the most basic level, the difference between Java and C is small.

Classes, Objects and Methods

Definitions

The central notion in object-oriented languages is the class (though in other languages other names may be used). A class is an extension of a structured data type. Namely, it is a structured data type together with the operations that can be performed on it. In Java, these operations, which in non-object-oriented languages would be called procedures, are called methods. A variable of a class type is called an object.
Classes, that is, structured data types defined together with the operations that can be performed on them (their functionality), are the central notion of object-oriented programming languages.

Class Examples

We give some simple examples of classes. The first class may be useful when writing a program for the administration of a company: the class "Employee". The second class can be useful in a linear algebra program: the class "IntegerMatrix". The third class is useful in creating data structures: the class "Chain". The classes Employee and Chain are, slightly modified, appearing again in complete programs in the section "Program Examples" hereafter. Extending class IntegerMatrix to a program is one of the exercises.

Class Employee

  class Employee
  {
    String name;
    int number;
    double salary;

    public Employee(String theName, int theNumber, double theSalary)
    {
      name   = theName;
      number = theNumber;
      salary = theSalary;
    }

    public void increaseSalary(double salaryIncrease)
    {
      salary += salaryIncrease;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary + ")";
    }

    public void setName(String newName)
    {
      name = newName;
    }
  }

Here we see many important aspects of classes. First the header. A class header always consists of the word "class" followed by the name of the class, in our case "Employee". In the basic case we are considering here, we then get a "{", which is matched by a "}" at the end of the definition of a class. It should be noticed that here we only describe a class, we do not create an object of this class.

Then we see a list of variables: a String, an int and a double. These variables will be called instance variables. Other names are in use as well. Any Employee object (= instance) has the instance variables name, number and salary. So far a class is just like a struct in C.

The difference with a struct is that the definition of a class contains also the definition of the methods that are working on objects (= variables) of this class. Further down we will see how this goes, but here we can notice already that there are four such methods: Employee, increaseSalary, toString and setName.

The latter three resemble procedures in C: they have a name, parameters and a return type. In addition we find the word "public", which is an example of an access modifier. "public" means that these methods are accessible from outside. If we had written "private" instead, these methods could only have been called from inside the class itself. In total there are four of these access modifiers; they will be discussed further down.

The method Employee is more special. It is a so-called constructor. When this method is called in combination with the keyword new, memory is allocated and the instructions in the constructor are executed. Typically a constructor contains instructions to initialize the instance variables, but it may also do more or less. Objects can only be generated by calling a constructor. If a class definition contains no constructor at all, the Java compiler supplies a default constructor without parameters, which leaves the instance variables at their default values (0, false or null). The name of a constructor is always the same as the name of the class, and one does not indicate a return type: the expression "new Employee( ... )" as a whole yields a pointer to the new object.

Inside a class a list of instance variables together with their types is followed by all methods. New objects are created by calling a constructor; if a class definition contains none, the compiler supplies a default constructor without parameters.
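To illustrate how a constructor is used in combination with new, the following sketch creates one Employee object and calls its methods. The class EmployeeDemo and the values used are hypothetical and not part of the course material; Employee is repeated in shortened form only so that the fragment can be compiled on its own.

```java
// Shortened copy of Employee, repeated so that the sketch is self-contained
class Employee
{
  String name;
  int number;
  double salary;

  public Employee(String theName, int theNumber, double theSalary)
  {
    name   = theName;
    number = theNumber;
    salary = theSalary;
  }

  public void increaseSalary(double salaryIncrease)
  {
    salary += salaryIncrease;
  }

  public String toString()
  {
    return "(" + name + ", " + number + ", " + salary + ")";
  }
}

// Hypothetical test class, only meant as an illustration
class EmployeeDemo
{
  public static void main(String[] args)
  {
    // new allocates the memory, the constructor initializes it
    Employee e = new Employee("Example, Eva", 1, 1000.0);
    e.increaseSalary(50.0);
    // Concatenating an object with a String calls toString() implicitly
    System.out.println("Employee: " + e);
  }
}
```

Running "java EmployeeDemo" after compiling should print "Employee: (Example, Eva, 1, 1050.0)".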

Class IntegerMatrix

  class IntegerMatrix
  {
    int n;
    int[][] a;

    public IntegerMatrix(int size)
    // Initializes all positions with 0
    {
      n = size;
      a = new int[n][n];
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          a[i][j] = 0;
    }

    public IntegerMatrix(IntegerMatrix matrix)
    // Creates a copy of matrix
    {
      n = matrix.n;
      a = new int[n][n];
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          a[i][j] = matrix.a[i][j];
    }

    public int trace()
    // Computes sum of diagonal elements
    {
      int s = 0;
      for (int i = 0; i < n; i++)
        s += a[i][i];
      return s;
    }

    public boolean findValue(int x)
    // Checks whether the value x occurs
    {
      int i;
      for (int j = i = 0; i < n; i++)
      {
        for (j = 0; j < n && a[i][j] != x; j++);
        if (j < n) // x was found in row i: stop without increasing i
          break;
      }
      return i < n;
    }
  }

Here we see all the things we saw above plus some new features. Most noticeable is that there are two constructors. They have the same name inside the same scope (both names are visible inside and outside the class). Nevertheless this is correct: the names are the same, but their signatures are not. The signature of a method consists of its name and its parameter list; the return type is not part of the signature, so two methods of a class may not differ in their return type only. For two parameter lists to be the same, parameters of the same types must appear in the same order. Here the first constructor has an int parameter, the second has an IntegerMatrix parameter. When these methods are called from outside, the compiler has no problem figuring out which of the two is meant: it just has to check the types of the arguments and match them with one of the specified methods. This is a first example of polymorphism, about which we will hear more further down.
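How the choice between equally named constructors and methods is made can be tried out with a small sketch. The class Box and its members are hypothetical names, introduced only for this illustration; the comments indicate which definition is expected to match each call.

```java
// Hypothetical class with two constructors and two methods named add
class Box
{
  int value;

  Box(int v)          // matches "new Box(5)"
  {
    value = v;
  }

  Box(Box other)      // matches "new Box(someBox)"
  {
    value = other.value;
  }

  void add(int x)     // matches a call with an int argument
  {
    value += x;
  }

  void add(Box other) // matches a call with a Box argument
  {
    value += other.value;
  }
}

class OverloadDemo
{
  public static void main(String[] args)
  {
    Box a = new Box(5); // Box(int)
    Box b = new Box(a); // Box(Box): b.value is 5
    b.add(2);           // add(int): b.value is 7
    b.add(a);           // add(Box): b.value is 12
    System.out.println("b.value = " + b.value);
  }
}
```

The selection depends only on the types of the arguments, not on the names of the parameters.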

Now it is also time to notice that inside the class the methods can work with the instance variables. In general, in a method there are four kinds of variables: instance variables, class variables, parameters and local variables.

Class variables will be introduced later, the other three we have encountered already. In findValue we see examples of each category: a[][] and n are instance variables, x is a parameter and i and j are local variables.

Local variables need not be declared at the beginning of a method. On the contrary: in Java it is considered good style to declare a variable locally, close to where it is used. It is also considered good to initialize a variable upon declaration. One must be careful with the scope of a local variable. By scope we mean the "visibility range" of a variable. The scope of a variable stretches from its declaration to the end of the level at which it was declared. So, a variable declared at the beginning of a method is visible anywhere in the method, but not outside the method. That is why we call it a local variable. The variable i in findValue is of this type. A variable which is declared in the header of a for loop is visible inside this loop, but not outside of it. The variable j in findValue is of this type. The reason that i was not declared in the header of the first loop is that we wanted to use it in the final comparison. A variable declared inside a compound statement is visible only within this compound statement.

It is correct and perfectly accepted to use the same variable name in many different methods. This possibility assures that program fragments can be combined without extensive effort to trace all common variable names. If from a method in which a variable x is used another method is called in which also a variable x is used, then of course this latter x is the valid one, because the scope of the x in the calling method is limited to its own method.

One might expect that the following is also correct:

  int i = 10;
  for (int i = 0; i < 1000; i++)
    a[i] = 2 * i;
  System.out.println("i = " + i);
In C and C++ the corresponding fragment is accepted and prints 10: outside the for loop the locally defined variable i does not exist, because its scope is limited to the loop, while inside the loop the original variable i is not visible: it is shielded by the more local variable. The compiler creates its own list of variables and has no problem keeping the two apart. In Java, however, the compiler rejects this fragment: a local variable may not be declared again within the scope of another local variable or parameter with the same name. Shielding does exist in Java between a local variable or parameter and an instance variable with the same name. Even where shielding is allowed, there is rarely a good reason to rely on it, and a confusing style of programming should be avoided.
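Shielding can also be observed between an instance variable and a local variable with the same name. The following is a minimal sketch; the class ShieldDemo and its methods are hypothetical, and it uses the prefix "this.", which is discussed further down together with the class Node.

```java
// Hypothetical class: a local variable shields an instance variable
class ShieldDemo
{
  int i;   // instance variable

  ShieldDemo(int start)
  {
    i = start;
  }

  int localValue()
  {
    int i = 3;      // local variable with the same name
    return i;       // refers to the local variable
  }

  int instanceValue()
  {
    int i = 3;      // shields the instance variable again
    return this.i;  // "this." reaches the shielded instance variable
  }

  public static void main(String[] args)
  {
    ShieldDemo d = new ShieldDemo(10);
    System.out.println(d.localValue() + " " + d.instanceValue());
  }
}
```

The program should print "3 10": inside the methods the plain name i denotes the local variable, while the instance variable keeps its value.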

In the second constructor, we see that there may be a parameter of the same type as the class. One might fear that this leads to confusion. However, the instance variables of such a parameter are accessed like the instance variables of any other object with help of the dot-operator, ".", just like in C. So, n is the instance variable of the instance under consideration, while matrix.n denotes the corresponding instance variable of the parameter matrix.

In both constructors we see how an array is allocated. In C there is no distinction between declaring an array variable and the allocation of its memory. In Java, writing int a[][] creates an array variable without allocating memory (which would be hard, because the compiler still does not know how big it should be). An array variable is actually a reserved memory location in which pointers to arrays of the appropriate type can be stored. The call "new int[n][n]" allocates space for n * n integers and returns a pointer to this space. This pointer is assigned to the array variable a. All this is very clean. A similar construction we have in C when we use int** a and malloc to allocate memory, but this is quite ugly.

Now that we speak about memory allocation: in Java one does not have to bother about cleaning up (though it is possible to do so): the system runs a garbage collector in the background. A garbage collector is a program which checks for objects to which no pointers are pointing anymore and then deallocates their memory.

A variable of a class type is actually only a pointer to an object of this type. This object can be generated calling a constructor and then it can be assigned to the variable.
Not all data types are classes: the primitive data types, several numerical types, characters and booleans, are not classes. All derived types are classes. Only the variables of class types are called objects. This distinction is important: when calling a method, objects are passed "by reference", while non-objects (= normal variables) are passed "by value". Actually this is not an entirely correct view: a variable of a class type is actually a pointer. So, if we pass a class variable as parameter in a method call, then the value of this pointer variable, an address, is copied into the corresponding parameter. This is the same as in C, the only difference is that for variables which are not objects, variables of the primitive types, there is no way to specify that we want to pass their address.
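The effect of copying either a value or a pointer into a parameter can be made visible with a small experiment. This is only a sketch: the classes Counter and PassingDemo and their methods are hypothetical names chosen for this illustration.

```java
// Hypothetical class with a single instance variable
class Counter
{
  int count;

  Counter(int start)
  {
    count = start;
  }
}

class PassingDemo
{
  static void changeInt(int x)
  {
    x = 99;               // changes only the local copy of the value
  }

  static void changeObject(Counter c)
  {
    c.count = 99;         // changes the object the caller points to as well
  }

  static void replaceObject(Counter c)
  {
    c = new Counter(42);  // redirects only the local copy of the pointer
  }

  public static void main(String[] args)
  {
    int n = 1;
    Counter k = new Counter(1);

    changeInt(n);
    System.out.println("n = " + n);              // still 1
    changeObject(k);
    System.out.println("k.count = " + k.count);  // now 99
    replaceObject(k);
    System.out.println("k.count = " + k.count);  // still 99
  }
}
```

The last call shows that the pointer itself is copied: assigning a new object to the parameter has no effect on the variable k of the caller.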

At this point Java is rigid, and sometimes this makes it hard to do easy things. In C it is trivial to write a procedure "swap" for exchanging the values of two variables which are passed as parameters: one passes their addresses instead of their values, which is done with help of the address operator "&". In the procedure one can access the values of these parameters with help of the value-of operator "*". In Java this simple and common task can only be realized in a quite elaborate way, using a so-called wrapper class: a class with a single instance variable of a primitive type, which thus obtains object status. The following example gives a possible work-out of this idea. Java also provides predefined wrapper classes: Integer, Float, Boolean, ... . The value wrapped in an object of these predefined classes cannot be changed after construction, however, so for swapping a self-defined wrapper class is needed.

  class Int // A self-defined wrapper class
  {
    public int v; // The wrapped value
  
    public Int(int x)
    {
      v = x;
    }
  
    static public void swap(Int a, Int b)
    {
      int c;
      c   = a.v;
      a.v = b.v;
      b.v = c;
    }
  }
  
  public class Swap
  {
    static public void swap(int a, int b)
    {
      int c;
      c = a;
      a = b;
      b = c;
    }
  
    public static void main(String[] args)
    {
      int a = 4;
      int b = 7;
      System.out.print("a = " + a + ", b = " + b + "\n");
  
      swap(a, b); // Swapping without effect
      System.out.print("a = " + a + ", b = " + b + "\n");
  
      Int aWrap = new Int(a); Int bWrap = new Int(b); // Wrapping
      Int.swap(aWrap, bWrap);                         // Swapping
      a = aWrap.v; b = bWrap.v;                       // Unwrapping
      System.out.print("a = " + a + ", b = " + b + "\n");
    }
  }

When calling methods, variables of class types (objects) are passed by reference, while variables of primitive types are passed by value. Wrapper classes grant object status to primitive types, which allows passing values of primitive types by reference.

Class Chain

In this section we consider classes which can be used to construct a linked list of nodes. The class Node, corresponding to the nodes of the list, contains no methods except for constructors. The class Chain has a single Node as instance variable. This is the access point to the chain. The methods provide the required functionality, allowing for searching for a specified key, printing, insertions and concatenation.
  class Node
  {
    int  key;
    Node next;

    Node(int key, Node next)
    {
      this.key  = key;
      this.next = next;
    }

    Node(int key)
    {
      this(key, null);
    }
  }

  class Chain
  {
    Node first;

    public Chain()
    {
      first = null;
    }

    private Node getLast()
    // Return the last node of a chain
    {
      if (first == null)
        return null;
      Node node = first;
      while (node.next != null)
        node = node.next;
      return node;
    }

    public void addFirst(int key)
    // Add a new node at the beginning of the chain
    {
      first = new Node(key, first);
    }

    public void addLast(int key)
    // Add a new node at the end of the chain
    {
      if (first == null)
        first = new Node(key);
      else
        getLast().next = new Node(key);
    }

    public void concatenate(Chain chain)
    // Attach the Chain chain at the end of the considered chain
    {
      if (first == null)
        first = chain.first;
      else
        getLast().next = chain.first;
      chain.first = null;
    }

    public boolean findValue(int x)
    // Test whether there is a node with key value x
    {
      Node node = first;
      while (node != null && node.key != x)
        node = node.next;
      return node != null;
    }

    public void print()
    // Print all the keys together with their position in the list
    {
      int counter = 0;
      Node node = first;
      while (node != null)
      {
        System.out.println("Node " + counter + " has key " + node.key);
        counter++;
        node = node.next;
      }
    }
  }

In Node there are two instance variables: key and next. key is a simple integer instance variable as we have seen before. The exciting thing is that next is of type Node. Is this possible? What does it mean? Here it is crucial that an object (and any variable of type Node is an object, because Node is a class) is only a pointer and not the thing itself (otherwise we would get an explosion). So, upon calling one of the constructors with "new Node( ... )", space is allocated for one integer (4 bytes) and for one pointer to a Node object (4 or 8 bytes), and a pointer to this space is returned.
Linked structures can be defined by defining a class with an instance variable with the data type of the class itself. Because memory for an object is only allocated when explicitly calling a constructor, this does not lead to a recursive explosion.

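The constructors of Node contain the special word this. this has several related meanings: it means either "this class" or "the current object". In our example we find both applications: in "this.key = key" and "this.next = next" it denotes the current object, so that this.key is the instance variable and key the parameter; in "this(key, null)" it denotes the class, and the statement calls the other constructor of Node.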

When calling in a class a method of the same class, then by default it is assumed that this call is to be performed with the current object. Therefore, even though this is not wrong, it is superfluous to write, for example, "this.getLast()" in the method addLast() of Chain.

The first constructor of Node is of a conventional type: two parameters are passed and assigned to the instance variables. Slightly problematic might be the assignment "this.next = next". Here a Node object is assigned to another Node object. What does it mean? If one realizes that an object is a pointer, the answer is clear: afterwards this.next points to the same object as next. This is general: an assignment "x = y" can always be performed when x and y are variables (y may also be a constant) of the same type (or more generally when the type of y may be converted to the type of x). In case x and y are of a primitive type, then afterwards x has the same value as y. In case x and y are objects, then afterwards x points to the same object as y (even in this case one can say that x has the same value as y, namely the same address).
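The two meanings of an assignment "x = y" can be compared in a short sketch. The classes Cell and AssignDemo are hypothetical and serve only as an illustration.

```java
// Hypothetical class holding one integer
class Cell
{
  int v;

  Cell(int v)
  {
    this.v = v;
  }
}

class AssignDemo
{
  public static void main(String[] args)
  {
    int x = 1;
    int y = x;      // y gets a copy of the value of x
    y = 2;          // x is not affected
    System.out.println("x = " + x);      // x = 1

    Cell p = new Cell(1);
    Cell q = p;     // q gets a copy of the pointer: same object as p
    q.v = 2;        // the change is also visible through p
    System.out.println("p.v = " + p.v);  // p.v = 2
    System.out.println(p == q);          // true: both hold the same address
  }
}
```

For objects the comparison "p == q" compares the pointers, so it tests whether both variables point to the same object.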

The class Chain has only one instance variable: the Node first. The single constructor is trivial: no parameters, first is set to null. null is a constant value which can be assigned to any pointer variable (that is, object). It means something like "to_nothing". The important thing is that it can be used in tests. If first (or any other object) has value null, then it would be fatal (that is, leading to a runtime error) to use first.key: first == null means that the pointer of first has no specific value, in particular it is not pointing to a storage space of a Node. Thus, first.key, which means so much as the value of the int stored in the storage space first is pointing to, is not defined. Errors of this kind are very common. In the above example we were careful not to run into it.

The other methods of Chain are for adding nodes either at the beginning or at the end, for checking whether an element exists or for printing all keys in the order they appear in the list. Further methods can be added to make it more useful; here we only give an example. The method getLast is declared private. This is because we decided that it should be for internal usage only: we may not want to guarantee that it will always be there, or that it will keep exactly this form. Declaring it private prevents users from using features which they are not supposed to use. This is a first example of encapsulation, about which we will hear more further down.

Now that we are presenting the Chain, we should also try to understand how exactly it works under addition of nodes. Initially we have an empty structure: first == null. Then, the first addition (it does not matter which of the additions is used) creates a new initialized Node by calling "new Node(key)" and assigns the returned value, a pointer to a Node to first. The later additions are of two kinds.

addFirst performs

  first = new Node(key, first);
Here many things are happening! First the value of first, a pointer to a Node or null, is looked up and together with the new key it is passed to the Node constructor. This creates a new Node object with the same next value as first had so far. Then the resulting pointer is assigned to first.

addLast performs

  getLast().next = new Node(key);
Here a new Node object with the new key value is created. Its next value is set to null. Then the resulting pointer is assigned to the next field of the Node which is found by calling getLast. Here getLast walks along the chain until coming to the last node and returns this object (of course it would be handy to have a second instance variable "last" in order to access this position faster, but this would be less instructive).

Operations on a Chain
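The remark that a second instance variable "last" would give faster access to the end of the chain can be worked out as follows. This is only a sketch of one possible variant: the class name FastChain and the method toString are additions made for this illustration, and Node is repeated so that the fragment compiles on its own.

```java
class Node
{
  int  key;
  Node next;

  Node(int key, Node next)
  {
    this.key  = key;
    this.next = next;
  }
}

// Hypothetical variant of Chain that also remembers its last node
class FastChain
{
  Node first;
  Node last;

  public FastChain()
  {
    first = null;
    last  = null;
  }

  public void addFirst(int key)
  {
    first = new Node(key, first);
    if (last == null)    // the chain was empty
      last = first;
  }

  public void addLast(int key)
  {
    Node node = new Node(key, null);
    if (first == null)   // the chain was empty
      first = node;
    else
      last.next = node;  // no walk along the chain is needed
    last = node;
  }

  public String toString()
  {
    String s = "";
    for (Node node = first; node != null; node = node.next)
      s += node.key + " ";
    return s.trim();
  }
}

class FastChainDemo
{
  public static void main(String[] args)
  {
    FastChain chain = new FastChain();
    chain.addLast(2);
    chain.addLast(3);
    chain.addFirst(1);
    System.out.println(chain);
  }
}
```

The program should print "1 2 3". The price of the faster addLast is that every method which changes the chain must keep first and last consistent.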

It was pointed out that one should be very careful not to access the instance variables of a null-object. Is this not exactly what we are doing in the following loop in findValue?

  while (node != null && node.key != x)
    node = node.next;
No! The reason is that in an expression involving && the left-hand side is evaluated first. If node == null, it is certain that the whole evaluation will result with false and therefore it is interrupted. On the other hand, it would have been fatal to write
  while (node.key != x && node != null)
    node = node.next;
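The loop in findValue works in Java and in C, because these languages guarantee that && evaluates its left-hand side first and skips the right-hand side if the result is already known. Not every programming language gives such a guarantee. Therefore relying on the order of evaluation is a possibly risky programming style, which might better be avoided. In this case this goes at little extra cost by rewriting findValue() as follows: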
  public boolean findValue(int x)
  {
    if (first == null)
      return false;
    Node node = first;
    while (node.next != null && node.key != x)
      node = node.next;
    return node.key == x;
  }
Using linked structures in an object-oriented way requires that one or more objects of some node-type occur as instance variables in the definition of another class. These instance variables give the access points to the structure.
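The difference between the two orders of the operands of && discussed above can be observed directly. In the following sketch the class ShortCircuitDemo and its two methods are hypothetical; the runtime error raised by Java when a null pointer is followed is a NullPointerException, which is caught here only to keep the program running.

```java
class Node
{
  int  key;
  Node next;

  Node(int key, Node next)
  {
    this.key  = key;
    this.next = next;
  }
}

class ShortCircuitDemo
{
  // Safe: node.key is only evaluated if node != null holds
  static boolean hasKey(Node node, int x)
  {
    return node != null && node.key == x;
  }

  // Unsafe: node.key is evaluated first and fails for node == null
  static boolean hasKeyReversed(Node node, int x)
  {
    return node.key == x && node != null;
  }

  public static void main(String[] args)
  {
    System.out.println(hasKey(null, 5));               // false
    System.out.println(hasKey(new Node(5, null), 5));  // true
    try
    {
      hasKeyReversed(null, 5);
    }
    catch (NullPointerException e)
    {
      System.out.println("Runtime error: null pointer was followed");
    }
  }
}
```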

Program Examples

It is now considered how the above classes can be integrated in a program which can be tested and adapted. In Java the various classes may be stored in several files. For simple small programs there is no need to do so, but for larger programs this is actually recommended. Each class may be located in its own file with the appropriate name: NameOfClass.java. If, as in the example of Chain, a class uses objects from another class, then javac is clever enough to first trace all needed classes and to translate them as well. So, storing Node and Chain in files Node.java and Chain.java, it is sufficient to write "javac Chain.java": this generates both Node.class and Chain.class, just as if they had been stored in the same file.

Program Employee

The class Employee is now extended to a complete program based on it. We introduce one extra class Company and a trivial class containing main.
  class Employee
  {
    protected String name;
    protected int number;
    protected double salary;

    public Employee(String theName, int theNumber, double theSalary) 
    {
      name   = theName;
      number = theNumber;
      salary = theSalary;
    }

    public double getSalary()
    {
      return salary;
    }

    public int getNumber()
    {
      return number;
    }

    public void increaseSalary(double salaryIncrease)
    {
      salary += salaryIncrease;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary + ")";
    }

    public void setName(String newName)
    {
      name = newName;
    }
  }
  class Company
  {
    protected int size;
    protected int maxSize;
    protected Employee staff[];

    public Company(int theMaxSize)
    {
      size    = 0;
      maxSize = theMaxSize;
      staff   = new Employee[maxSize];
    }

    public int getSize()
    {
      return size;
    }

    public int getMaxSize()
    {
      return maxSize;
    }

    public void setName(int number, String name)
    {
      int i = 0;
      while (i < size && staff[i].getNumber() != number)
        i++;
      if (i == size)
        System.out.print("Number not found, ignoring instruction!\n");
      else
        staff[i].setName(name);
    }

    public void addEmployee(String name, int number, double salary)
    {
      if (size == maxSize)
        System.out.print("No space left, ignoring instruction!\n");
      else
      {
        staff[size] = new Employee(name, number, salary);
        size++;
      }
    }

    public void increaseSalary(double factor, double leastIncrease)
    {
      for (int i = 0; i < size; i++)
      {
        double increase = factor * staff[i].getSalary();
        if (increase < leastIncrease)
          staff[i].increaseSalary(leastIncrease);
        else
          staff[i].increaseSalary(increase);
      }
    }

    public void print()
    {
      System.out.print("\nOverview of employees:\n");
      for (int i = 0; i < size; i++)
        System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
    }
  }
  class CompanyTest
  {
    public static void main(String ps[])
    {
      Company myCompany = new Company(100);

      myCompany.addEmployee("Becker, Boris",    235521, 4500.00);
      myCompany.addEmployee("Hecht, Edgar",     878722, 6500.00);
      myCompany.addEmployee("Albers, Marianne", 456212, 1554.00);
      myCompany.addEmployee("Krauser, Angela",  426578, 1954.00);
      myCompany.addEmployee("Noack, Christina", 663738, 5646.00);
      myCompany.print();

      myCompany.increaseSalary(0.04, 50.0);
      myCompany.addEmployee("Brauer, Harald", 568900, 2200.00);
      myCompany.setName(456212, "Becker, Marianne");
      myCompany.print();

      System.out.print("\n");
    }
  }
Here we have slightly changed even Employee. We have made the instance variables "protected" in order to restrict the access from outside the class (in Java "protected" still permits access from classes in the same package and from subclasses; only "private" limits access to the class itself). Instead special access methods are supplied. These methods are typically given names like "getNumber" and "setName". Of course this means that many extra calls to methods are made, but errors in future extensions are worse! Never forget that in Java the prime consideration is correctness, not speed. If speed is really critical (as it is in programs solving very large problems and in games), then in some small well-documented sections in which most of the computation is performed you may do ugly things. However, if speed really matters, then it is better to write a hack in C.

One should also notice that once we have defined Employee how amazingly simple it is to build Company on top of it: we just declare an array of Employee and add a few methods for performing operations on the Company as a whole. Then the main program is more or less an empty shell. The good thing is that even without knowing about the underlying organization, any reader who understands the format immediately grasps what is going on. This is partially because of the names that were chosen, but even more because of the usage of powerful subroutines and the object-oriented programming style.

Here we touch on the most important new point. What does writing "myCompany.addEmployee( ... )" or "staff[i].increaseSalary( ... )" mean? Here we see the second usage of the dot-operator. Before we have seen that it can be used for accessing the instance variables of an object. Here we use it to connect an object with a method from its class. The semantics of this is that the system first determines the class of the object, then searches for a method with matching signature in the class and then executes the method working on the instance variables of the object. In object-oriented languages, this is the major way of calling methods.

An exception are the static methods. A static method is any method which in its definition is preceded by the keyword "static". Static methods can be called without passing an object of the class on which it works. This implies that inside a static method there are no instance variables to use. A static method corresponds to a procedure in C and other non-object-oriented languages. The non-static methods are something new, the static ones we already know! Even in Java we already know one important example: main. Of course main should be callable without object, because by the time it is called there is not yet any object!

Static methods are encapsulated inside their classes. That is, if they are called from outside the class, it is not obvious where to find such a method (there might be static methods with the same name in several classes). Therefore, when calling a static method it is necessary to indicate where they can be found. This is done by prefixing the name of a static method with the name of the class connected by ".": another usage of the dot operator.

In principle it is possible to program in Java as in C: make one big class without instance variables and declare all methods to be static. This is against the whole concept of object-oriented programming, and therefore considered to be extremely bad style. Sometimes it is very handy though to have static methods, sometimes a static method expresses more clearly what is going on (a call with an object puts one object in the foreground, but maybe the operation uses several objects as arguments in a symmetric way), and sometimes there is no alternative: as we mentioned before, variables of the primitive data types are not objects. So, how should one compute e^x for a double x? The exponential function, and many other mathematical functions alike, are therefore static. This allows calling them the conventional way, without first converting a double into a Double (Double is the class with a double as an instance variable). Therefore, inside the class Math the method exp is defined as

  public static double exp(double a)
It can be called by writing Math.exp(x).

The constructors are a somewhat strange case. They are called, after the keyword new, by giving only the name of the method, but because this name is identical to the name of the class, it is clear where to find them. No object is passed, and in this sense a constructor resembles a static method, but the object is allocated as part of the call, and therefore the instance variables are available as in non-static methods.

In object-oriented programming, the default way of calling methods is by connecting an object of the appropriate class to the method with the help of the dot operator. The method works on this object. Static methods are called by specifying the class without passing an object.
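
The two ways of calling can be put next to each other in a small sketch. The class Temperature and its method names are made up for this illustration and are not part of the programs of this course. The non-static method works on the object in front of the dot, the static method only on its parameter.

```java
class Temperature
{
  private double celsius;

  Temperature(double celsius)
  {
    this.celsius = celsius;
  }

  double inFahrenheit()
  // Non-static method: uses the instance variable of the current object.
  {
    return celsius * 9.0 / 5.0 + 32.0;
  }

  static double celsiusToFahrenheit(double celsius)
  // Static method: there is no current object, only the parameter.
  {
    return celsius * 9.0 / 5.0 + 32.0;
  }
}

class TemperatureTest
{
  public static void main(String[] args)
  {
    Temperature t = new Temperature(100.0);
    System.out.println(t.inFahrenheit());                      // object.method()
    System.out.println(Temperature.celsiusToFahrenheit(0.0));  // Class.method()
    System.out.println(Math.exp(0.0));                         // static method of Math
  }
}
```

Which of the two forms is preferable depends on whether the operation naturally belongs to one object. Here both are shown only to compare the notation.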

Program FibonacciTest

Above it was considered how to write a Java method that swaps the values of two integers. The problem that primitive data types cannot be passed by reference also arises when trying to compute Fibonacci numbers efficiently in a recursive way. Here we consider how to handle this problem in an elegant and object-oriented way.

Computing fib(n), the n-th Fibonacci number, by directly using fib(n) = fib(n - 1) + fib(n - 2) gives an algorithm whose time consumption increases exponentially with n. Of course Fibonacci numbers can easily be computed in an iterative way, but that is not the point here: this problem stands for a whole class of problems. An efficient recursive algorithm can be obtained by not only computing fib(n), but also fib(n - 1). From these two values fib(n + 1) and fib(n) can be computed in constant time, and thus the time for computing fib(n) increases linearly with n, as it should.

In C the two computed values may be handed over using variables of type int*. In Java each variable can be individually wrapped, but doing that means ignoring the structure of the problem: if the method should return a pair of values, then we should use objects of some class which can hold a pair of integers. This can easily be turned into a correct and efficient program, but it leads to a functional rather than an object-oriented approach. In an object-oriented context, it is cleaner to let the method work on objects of some class than to let a static method return objects of this class. Taking all these considerations into account, we get the following program:

import java.io.*;

class IO
{

  public static int readInt() 
  // Reads an int from standard input.
  {
    String input = "";
    try 
    {
      BufferedReader bufRead = new BufferedReader
        (new InputStreamReader (System.in));
      input = bufRead.readLine();
    } 
    catch (java.io.IOException e) 
    {
      System.out.print("Error while reading input line!\n");
    }
    return Integer.valueOf(input).intValue();
  }
} 

class Fibonacci
{
  private int x, y;

  private Fibonacci()
  {
    x = y = 0;
  }

  private void recFib(int n)
  // Requires n >= 1. Afterwards x == fib(n - 1) and y == fib(n).
  {
    if (n == 1)
    {
      x = 0;
      y = 1;
    }
    else
    {
      recFib(n - 1);
      y += x;
      x =  y - x;
    }
  }

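  static int fib(int n)
  // Returns the n-th Fibonacci number for n >= 0.
  {
    // Without this test a negative n would lead to an endless recursion.
    if (n < 0)
      throw new IllegalArgumentException("n must not be negative");
    if (n == 0)
      return 0;
    Fibonacci p = new Fibonacci();
    p.recFib(n);
    return p.y;
  }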
}

class FibonacciTest
{
  public static void main(String[] args)
  {
    System.out.print("\nGive n       >>>   ");
    int n = IO.readInt();
    System.out.println("Computed value = " + Fibonacci.fib(n) + "\n");
  }
}

Here we will not try to understand the method readInt(). In Java even IO is handled in a clean object-oriented way, but one might prefer C's basic but convenient routines: unformatted writing is easy, but reading and formatted writing require quite elaborate methods. More interesting is the class Fibonacci. It contains a static method fib(), which is called from main(). We see how, when readInt() and fib() are called, the place to find them is indicated by prefixing the method names with the respective class names.

In fib() an object p of the class Fibonacci is created. recFib() is called with this object. In recFib() recursive calls are made. Recall that the statement "recFib(n - 1)" is equivalent to "this.recFib(n - 1)", and in this way p, or more correctly a reference to the object p, is handed all the way down until the bottom of the recursion is reached. There x and y are given their initial values. Recall that whenever one works with the instance variables inside a class, these are the instance variables of the current object. In our case this is the object p. Then the recursion returns step by step, eventually computing fib(n - 1) and fib(n). The second of these values is returned by fib().
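
For comparison, the functional variant mentioned before the program might look as follows. This is only a sketch: the class names IntPair and FunctionalFibonacci are chosen here for illustration. A static method returns an object holding the pair (fib(n - 1), fib(n)).

```java
class IntPair
{
  final int first, second;

  IntPair(int first, int second)
  {
    this.first  = first;
    this.second = second;
  }
}

class FunctionalFibonacci
{
  private static IntPair recFib(int n)
  // Requires n >= 1. Returns the pair (fib(n - 1), fib(n)).
  {
    if (n == 1)
      return new IntPair(0, 1);
    IntPair p = recFib(n - 1);
    return new IntPair(p.second, p.first + p.second);
  }

  static int fib(int n)
  // Returns the n-th Fibonacci number for n >= 0.
  {
    if (n == 0)
      return 0;
    return recFib(n).second;
  }
}

class FunctionalFibonacciTest
{
  public static void main(String[] args)
  {
    for (int n = 0; n <= 10; n++)
      System.out.print(FunctionalFibonacci.fib(n) + " ");
    System.out.println();
  }
}
```

The running time is again linear in n, but this variant creates a new object at every level of the recursion, whereas the program above lets all recursive calls work on the single object p.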

Program Chain

In this section Node and Chain, with minimal modifications, are combined into the following program.

  class Node
  {
    static int totalSize = 0;

    int  key;
    Node next;

    Node(int key, Node next)
    {
      this.key  = key;
      this.next = next;
      totalSize++;
    }

    Node(int key)
    {
      this(key, null);
    }

    protected void finalize()
    {
      totalSize--;
    }
  }

  class Chain
  {
    Node first;

    public Chain()
    {
      first = null;
    }

    private Node getLast()
    // Return the last node of a chain
    {
      if (first == null)
        return null;
      Node node = first;
      while (node.next != null)
        node = node.next;
      return node;
    }

    public void addFirst(int key)
    // Add a new node at the beginning of the chain
    {
      first = new Node(key, first);
    }

    public void addLast(int key)
    // Add a new node at the end of the chain
    {
      if (first == null)
        first = new Node(key);
      else
        getLast().next = new Node(key);
    }

    public void concatenate(Chain chain)
    // Attach the Chain chain at the end of the considered chain
    {
      if (first == null)
        first = chain.first;
      else
        getLast().next = chain.first;
      chain.first = null;
    }

    public boolean findValue(int x)
    // Test whether there is a node with key value x
    {
      Node node = first;
      while (node != null && node.key != x)
        node = node.next;
      return node != null;
    }

    public void print()
    // Print all the keys together with their position in the list
    {
      int counter = 0;
      Node node = first;
      while (node != null)
      {
        System.out.println("Node " + counter + " has key " + node.key);
        counter++;
        node = node.next;
      }
    }
  }

  class ChainTest
  {
    public static void main(String ps[])
    {
      Chain c1 = new Chain();
      Chain c2 = new Chain();

      System.out.println("\nCreating chain 1\n");
      c1.addFirst(12);
      c1.addFirst(22);
      c1.addFirst(16);
      c1.addFirst(14);
      c1.addFirst(20);
      c1.addFirst(18);
      for (int i = 0; i < 100; i++)
        if (c1.findValue(i))
          System.out.print(i + " is among the stored values\n");
      c1.print();
      System.out.println("Total number of nodes = " + Node.totalSize);

      System.out.println("\nCreating chain 2\n");
      c2.addLast(11);
      c2.addLast(23);
      c2.addLast(19);
      c2.addLast(37);
      c2.addLast(21);
      for (int i = 0; i < 100; i++)
        if (c2.findValue(i))
          System.out.print(i + " is among the stored values\n");
      c2.print();
      System.out.println("Total number of nodes = " + Node.totalSize);

      System.out.println("\nConcatenating chains\n");
      c1.concatenate(c2);
      for (int i = 0; i < 100; i++)
        if (c1.findValue(i))
          System.out.print(i + " is among the stored values\n");
      System.out.println("\nChain 1:\n");
      c1.print();
      System.out.println("\nChain 2:\n");
      c2.print();
      System.out.println("Total number of nodes = " + Node.totalSize);

      System.out.println("\nRemoving chain 1\n");
      c1 = null;
      System.gc();
      System.out.println("Total number of nodes = " + Node.totalSize);
    }
  }

The class Chain is unchanged. Node is augmented by a static variable. Static variables are a fourth kind of variable, next to instance variables, parameters and local variables. They might best be called class variables, because they belong to the class and not to an instance: for all the objects of a class there is only one copy of a static variable. This is the ideal way to maintain information pertaining to the class as a whole. The prime example is a counter which keeps track of the number of objects in existence. For example, one may count how many external ports are in use, so that some special action can be taken when a new port is requested while the maximum number is already in use. Inside the class these variables can be accessed just like the instance variables. Outside the class they are accessed analogously to the way a static method is called: the name of the static variable is prefixed with the name of the class, connected by ".". An example is found in the instruction

  System.out.println("Total number of nodes = " + Node.totalSize);
Of course this access is possible only if the variable is not private.
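
The port example can be sketched as follows. The class Port, the limit MAX_PORTS and the method names are assumptions made for this illustration. The counter inUse is a static variable, so there is one copy for all Port objects; in this sketch it is decreased by an explicit call of release().

```java
class Port
{
  static final int MAX_PORTS = 2;  // assumed limit, chosen for the example
  static int inUse = 0;            // class variable: one copy for the whole class

  private boolean open = true;     // instance variable: one copy per object

  private Port()
  {
  }

  static Port request()
  // Returns a new Port, or null if the maximum number is already in use.
  {
    if (inUse == MAX_PORTS)
      return null;
    inUse++;
    return new Port();
  }

  void release()
  // Gives the port back; releasing the same port twice has no further effect.
  {
    if (open)
    {
      open = false;
      inUse--;
    }
  }
}

class PortTest
{
  public static void main(String[] args)
  {
    Port a = Port.request();
    Port b = Port.request();
    Port c = Port.request();       // limit reached: the special action is "return null"
    System.out.println("Ports in use = " + Port.inUse);
    System.out.println("Third request granted = " + (c != null));
    a.release();
    System.out.println("Ports in use = " + Port.inUse);
  }
}
```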

In Node we now also find a new method called finalize. The method finalize is called automatically by the system when an object is removed by the garbage collector, once for each removed object. By default it is (invisibly) part of any class definition and does nothing, but one can choose to give it some functionality. Especially when one uses static variables to count occupied resources, it is important to also decrease their value when these resources are freed again. In recent versions of Java finalize is deprecated, so the technique shown here should be seen as an illustration.

Now one might think that in our example the value printed in the last line is 0 in any case: because there are no references anymore to the chain, all nodes in it have become garbage, inaccessible allocated parts of the memory. So, they could be removed. However, garbage collection is done in a lazy way: typically it is only performed when the need arises or when the processor is waiting anyway. Therefore, without further measures the printed value will most likely be 11. If one wants the garbage collector to run at a certain moment, then one should add a call to the static method gc from System, as is done in main just before the last print statement. Strictly speaking this call is only a request to the system, so even then the value 0 is not guaranteed:

  System.gc();
Notice that the final instruction in concatenate removes the link from the attached Chain object to its nodes. Without this instruction, the second half of the chain would still have been reachable through c2, and the garbage collector would only throw away 6 of the nodes. Readers are strongly advised to actually try these variants of the program and to understand what is happening.

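In total we have encountered four different uses of the dot operator:

  1. Class.variable: access to a static variable
  2. Class.method(): call of a static method
  3. object.variable: access to an instance variable
  4. object.method(): call of a non-static method

In the program above we see examples of each type: Node.totalSize, System.gc(), node.next and c1.addFirst().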

The main distinction is between static and non-static, not between variables and methods. In an object-oriented language, the methods are considered to belong to the classes and objects just as much as the variables. To underline this, a non-static method might be called an instance method, and a static method a class method. Only internally is there a difference between variables and methods: whereas every object has its own copy of the instance variables, after compilation the difference between the two types of methods disappears, and any method, static or not, is stored only once.

Static variables are class variables: one copy exists for all objects of a given class. This is particularly useful for counters. In order to keep the counting up to date in the context of automatic garbage collection one should override the method finalize().

Inheritance, Polymorphism and Encapsulation

General Idea of Inheritance

In software development it is a very common situation that an existing software package is extended. For example, we have one of the above classes and decide that we actually need an extra instance variable or an extra method. Of course one could edit the old package and add the new instance variable or method. This requires that one finds one's way through declarations which might have been made long ago or by someone else.

It is harder if we do not want to add an instance variable or a method but to change one. For example, we may want to change the type of the node first in Chain from Node to BetterNode, or we may want to replace a method which is good in the general case by a method which is better in a special case. Of course we can give it a new name and add it nevertheless. This is however quite ugly and confusing. It would at least require very good documentation to make sure that later updates indeed choose the right methods. In any case it increases the number of variables and methods unnecessarily.

Now assume that we want to maintain objects with slightly different features in a common structure, for example an array. One can think of a shop having all kinds of things to sell. For food articles there is a sell-by date, for non-food articles there may be seasons to respect. But all of them have a price. So, it makes sense to maintain all objects in one array and to call a method for a price increase on each of them. In C this is really hard to realize.

All the aspects mentioned are dealt with in a simple way by the idea of inheritance. Inheritance means that one defines a new class as an extension of an existing class. Such a new class is called a derived class; the class which it extends is called its mother class or base class.

A derived class inherits all the instance variables and methods of its mother class. In addition, new instance variables and methods may be added. Instance variables from the mother class may even be declared again, hiding the variable from the mother class. Methods can be overridden. Frequently a method in a derived class is merely a small modification of a method in the mother class. In that case it is natural and possible to reuse the code from the mother class by a special calling mechanism.

Inheritance is the key concept of object-oriented programming. It makes it possible to add, extend and adapt the functionality of methods and to add instance variables to a class in a hierarchical way.
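
The shop example can be sketched as follows. All names (Article, FoodArticle, increasePrice, daysUntilSellBy) and the rule for food articles are assumptions made for this illustration. The derived class adds an instance variable, replaces a method, and reuses the code of the mother class with the help of super.

```java
class Article
{
  protected String name;
  protected double price;

  Article(String name, double price)
  {
    this.name  = name;
    this.price = price;
  }

  void increasePrice(double fraction)
  // General rule: raise the price by the given fraction.
  {
    price += price * fraction;
  }

  double getPrice()
  {
    return price;
  }
}

class FoodArticle extends Article
{
  protected int daysUntilSellBy;   // extra instance variable of the derived class

  FoodArticle(String name, double price, int daysUntilSellBy)
  {
    super(name, price);
    this.daysUntilSellBy = daysUntilSellBy;
  }

  void increasePrice(double fraction)
  // Assumed special rule: food close to its sell-by date keeps its price,
  // otherwise the general rule of the mother class is reused.
  {
    if (daysUntilSellBy > 3)
      super.increasePrice(fraction);
  }
}

class ShopTest
{
  public static void main(String[] args)
  {
    Article[] stock = {
      new Article("Umbrella", 20.0),
      new FoodArticle("Cheese", 10.0, 30),
      new FoodArticle("Milk", 2.0, 1)
    };
    for (int i = 0; i < stock.length; i++)
    {
      stock[i].increasePrice(0.5);
      System.out.println(stock[i].name + ": " + stock[i].getPrice());
    }
  }
}
```

All three objects are kept in one array of type Article, and the same call is applied to each of them. The price of the umbrella and of the cheese increases, the price of the milk stays the same.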

Class BetterCompany

As an example we consider again the earlier defined classes Employee and Company. In this company there is only one kind of employee: all employees have the same stored features and are treated in the same way by the methods. However, in most companies there are many kinds of employees: they can be divided both according to their domain of activity and according to their hierarchical level. We will consider the latter, and distinguish director, staff and worker. Different rules apply to them regarding salary increases, vacation days, absence due to illness, etc. They might also have different relevant features to store: for the director it might not be counted how much vacation he/she takes, but for all others this variable must be there; only the director has a budget to take care of.

Everything mentioned so far holds true for any object-oriented language, possibly with some differences in terminology. The concrete example brings us back to Java. The class definitions of Employee and Company are not repeated; these classes are considered to be fixed. All of the following classes are built on top of these two. It turns out that while designing the original classes we might have been more extension oriented: one method is formulated in an unsuitable way, another is not defined at all, even though it will arise in all derived classes. Therefore the following construction is slightly more complex than necessary, which may make it a more realistic example.

Overview of Classes in Company Example

  class FixedEmployee extends Employee
  {
    public FixedEmployee(String name, int number, double salary)
    {
      super(name, number, salary);
    }

    public void endOfYear()
    {
    }
  }

  class Director extends FixedEmployee
  {
    private double yearlyBudget;
    private double budget;
   
    public Director(String name, int number, double salary,
             double theBudget)
    {
      super(name, number, salary);
      yearlyBudget = budget = theBudget;
    }

    public void endOfYear()
    {
      budget = budget / 2 + yearlyBudget;
    }

    public void expense(double amount)
    {
      budget -= amount;
    }

    public void increaseSalary(double salaryIncrease)
    {
      if (budget >= 0)
        super.increaseSalary(2.0 * salaryIncrease);
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary +
             ", director, " + yearlyBudget + ", " + budget + ")";
    }
  }
FixedEmployee is only used to add the method endOfYear, which is defined in all derived classes.

Director is defined as an extension of FixedEmployee. Director has two additional instance variables: "yearlyBudget" and "budget". The new constructor has one more parameter. It first performs a call super( ... ). In this case this means a call to the constructor of the mother class. However, the usage of super is not limited to this case: it generally denotes methods or instance variables in the mother class. Its counterpart is this, which we encountered already in class Node. It generally denotes the current object or a method, particularly a constructor, of the current class.

"endOfYear" and "expense" are new methods. More interesting are the methods which already existed before: "increaseSalary" and "toString". These override the methods with the same name in the mother class. increaseSalary calls the method in the mother class by specifying this with super.

  class LowerEmployee extends FixedEmployee
  {
    protected int vacationDays;
    protected int yearlyVacationDays;

    public LowerEmployee(String name, int number, double salary, 
      int theYearlyVacationDays)
    {
      super(name, number, salary);
      vacationDays = 0;
      yearlyVacationDays = theYearlyVacationDays;
    }

    public int applyVacation(int numberOfDays)
    {
      if (numberOfDays > vacationDays)
        numberOfDays = vacationDays;
      vacationDays -= numberOfDays;
      return numberOfDays;
    }

    public void endOfYear()
    {
      vacationDays = vacationDays / 2 
                   + yearlyVacationDays;
    }
  }
The class LowerEmployee has the same features as Director: a few new instance variables and methods. In its constructor the constructor of the mother class is again called. It is a requirement that such a call is the first statement of the constructor of a derived class; if it is omitted, the compiler inserts a call to the constructor without parameters of the mother class.

  class Staff extends LowerEmployee
  {
    private int overTime;

    public Staff(String name, int number, double salary,
      int yearlyVacationDays)
    {
      super(name, number, salary, yearlyVacationDays);
      overTime = 0;
    }

    public void addOvertime(int hours)
    {
      overTime += hours;
    }

    public void endOfYear()
    {
      super.endOfYear();
      vacationDays += overTime / 10;
      overTime = 0;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary +
             ", staff, " + yearlyVacationDays + ", " + vacationDays + ")";
    }
  }
  class Worker extends LowerEmployee
  {
    private static int shiftVacationDays  = 5;
    private boolean shiftDuty;

    public Worker(String name, int number, double salary,
      int yearlyVacationDays, boolean theShiftDuty)
    {
      super(name, number, salary, yearlyVacationDays);
      shiftDuty = theShiftDuty;
    }

    public void increaseSalary(double salaryIncrease)
    {
      if (shiftDuty)
        super.increaseSalary(1.1 * salaryIncrease);
    }

    public void endOfYear()
    {
      super.endOfYear();
      if (shiftDuty)
        vacationDays += shiftVacationDays;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary +
             ", worker, " + yearlyVacationDays + ", " + vacationDays +
             ", " + shiftDuty + ")";
    }
  }
The variable shiftVacationDays is static. This means that this is not an individual quantity, but common to all members of the class.

  class BetterCompany extends Company
  {
    public BetterCompany(int maxSize)
    {
      super(maxSize);
    }

    public void addEmployee(FixedEmployee newEmployee)
    {
      if (size == maxSize)
        System.out.print("No space left, ignoring instruction!\n");
      else
      {
        staff[size] = newEmployee;
        size++;
      }
    }

    public void endOfYear()
    {
      for (int i = 0; i < size; i++)
        if (staff[i] instanceof FixedEmployee)
          ((FixedEmployee) staff[i]).endOfYear();
    }

    public void expense(int number, double amount)
    {
      int i = 0;
      while (i < size && staff[i].getNumber() != number)
        i++;
      if (i == size)
        System.out.print("Number not found, ignoring instruction!\n");
      else
        if (staff[i] instanceof Director)
          ((Director) staff[i]).expense(amount);
        else
          System.out.print("Employee with number " + number + 
            " is not a director, ignoring instruction!\n");
    }
  }
The class BetterCompany corrects an omission in Company: a method addEmployee which takes an employee object as its parameter. Notice that in this case we do not say that addEmployee is overriding the method with the same name in the mother class: the signatures of these methods are not the same. Here we rather encounter polymorphic variants: the method name is overloaded.
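
The difference between the two situations can be made visible with a small sketch. The classes Base and Derived are hypothetical and serve only this purpose. A method with the same signature replaces the inherited one, a method with another parameter type exists next to it.

```java
class Base
{
  String name()
  {
    return "Base";
  }

  String describe(Object o)
  {
    return "describe(Object) in Base";
  }
}

class Derived extends Base
{
  String name()
  // Same signature as in Base: this method overrides Base.name().
  {
    return "Derived";
  }

  String describe(String s)
  // Other parameter type than in Base: an additional, overloaded variant.
  {
    return "describe(String) in Derived";
  }
}

class OverloadTest
{
  public static void main(String[] args)
  {
    Derived d = new Derived();
    Base    b = d;                          // same object, declared type Base

    System.out.println(b.name());           // Derived
    System.out.println(d.describe("x"));    // describe(String) in Derived
    System.out.println(b.describe("x"));    // describe(Object) in Base
  }
}
```

In the last call the compiler only knows the declared type Base, in which describe(String) does not exist. Therefore describe(Object) is selected, although b refers to an object of class Derived.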

The new method endOfYear makes it possible to apply endOfYear to all employees, in the same way as increaseSalary in the original version. The new method expense makes it possible to call the method expense in Director in the same way as we could call changeName before.

In endOfYear we see the operator "instanceof". The reason for this is that staff[] is an array of Employee objects. Even though we might believe that these are actually of type FixedEmployee, for which the method endOfYear is defined, there might also be a derived class TemporaryAid for which endOfYear is not defined. At this point it is important to introduce the difference between the declared type and the actual type of a variable. The declared type of staff[i] is Employee, the actual type may be any of the derived classes. instanceof determines at runtime the actual type of the object a variable refers to, and returns true if this is the specified type or a class derived from it.

Even though we now are sure that the application of endOfYear is correct, it still does not work to simply write

          staff[i].endOfYear();
The problem is that endOfYear is not mentioned in class Employee. Thus, at compile time, this looks wrong. Therefore it is required to add a so-called cast. A cast is a forced type conversion. So, we convert staff[i] into a FixedEmployee, on our own responsibility. Notwithstanding the cast, at runtime the actual type determines which method is selected.
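
The interplay of instanceof and a cast can be tried in isolation with the following sketch. The classes Animal and Dog are made up for this purpose. It also shows what "on our own responsibility" means: a cast which does not match the actual type is accepted by the compiler, but fails at runtime.

```java
class Animal
{
  String name;

  Animal(String name)
  {
    this.name = name;
  }
}

class Dog extends Animal
{
  Dog(String name)
  {
    super(name);
  }

  String fetch()
  {
    return name + " fetches the stick";
  }
}

class CastTest
{
  static String tryFetch(Animal a)
  // The declared type of a is Animal, the actual type is tested at runtime.
  {
    if (a instanceof Dog)
      return ((Dog) a).fetch();    // safe: the actual type is Dog
    return a.name + " cannot fetch";
  }

  public static void main(String[] args)
  {
    Animal[] zoo = { new Dog("Rex"), new Animal("Tom") };
    for (int i = 0; i < zoo.length; i++)
      System.out.println(tryFetch(zoo[i]));

    try
    {
      Dog d = (Dog) zoo[1];        // compiles, but the actual type is Animal
      System.out.println(d.fetch());
    }
    catch (ClassCastException e)
    {
      System.out.println("Cast failed at runtime");
    }
  }
}
```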

Now we have obtained all we need to get a main program with considerably larger functionality. The changes to make are small. If Company had been designed better, with a method addEmployee with an Employee parameter, even fewer changes would have been needed.

  class CompanyTest
  {
    public static void main(String ps[])
    {
      BetterCompany myCompany = new BetterCompany(100);
      myCompany.addEmployee(
        new Staff("Becker, Boris",    235521, 4500.00, 28));
      myCompany.addEmployee(
        new Director("Hecht, Edgar",     878722, 6500.00, 10000000));
      myCompany.addEmployee(
        new Worker("Albers, Marianne", 456212, 1554.00, 23, false));
      myCompany.print();
      myCompany.endOfYear();
      myCompany.print();
      myCompany.addEmployee(
        new Worker("Krauser, Angela",  426578, 1954.00, 25, true));
      myCompany.addEmployee(
        new Staff("Noack, Christina", 663738, 5646.00, 32));
      myCompany.print();
      myCompany.increaseSalary(0.04, 50.0);
      myCompany.addEmployee(
        new Worker("Brauer, Harald", 568900, 2200.00, 25, true));
      myCompany.setName(456212, "Becker, Marianne");
      myCompany.expense(878722, 73000);
      myCompany.print();
      System.out.print("\n");
    }
  }
Running the program gives the following output, clearly showing the result of the more individual treatment.
Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 0)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.0E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 0, false)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false)
Employee[3] = (Krauser, Angela, 426578, 1954.0, worker, 25, 0, true)
Employee[4] = (Noack, Christina, 663738, 5646.0, staff, 32, 0)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4680.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 7020.0, director, 1.0E7, 1.4927E7)
Employee[2] = (Becker, Marianne, 456212, 1554.0, worker, 23, 23, false)
Employee[3] = (Krauser, Angela, 426578, 2039.976, worker, 25, 0, true)
Employee[4] = (Noack, Christina, 663738, 5871.84, staff, 32, 0)
Employee[5] = (Brauer, Harald, 568900, 2200.0, worker, 25, 0, true)

Polymorphism

This program may look rather unspectacular, but it illustrates the killer application of object-oriented programming. Notice what we are doing: we are handling objects of different classes within one common structure. In methods like print and increaseSalary, we are calling an old inherited method from Company and nevertheless we get for each object the increased functionality of the class these objects actually belong to: a Director gets a larger salary increase than the others, the shift workers get five extra days of vacation.

The above gives an example of polymorphism in the stricter sense: polymorphism means that variables can actually stand for different kinds of objects. This implies that parts of the program which are formulated in general terms can be applied to different kinds of objects. This notion is closely linked to the notion of dynamic binding: the phenomenon described above, that at runtime the actual type is used to determine which of the methods with identical signature is going to be used.

At compile time, it is checked that any method is connected by the dot operator to an object of a class in which this method is defined. This is done by checking the declared type of the object. At run time, the method to execute is chosen by looking at the actual type of the object connected to the method.

Now it is time to mention that any class is implicitly defined as an extension of class Object. Object is at the top of the class hierarchy. Without knowing this, we have already been using this fact implicitly. Consider a print statement of the following kind:

  System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
How does this work? First the expression between the round brackets is evaluated. Here we use that the operator "+" is polymorphic, although for operators we rather say that they are overloaded. So, depending on the types of the arguments, "+" has a different effect.

This is nothing new: we already know that 3 / 4 < 0.5, while 3.0 / 4 > 0.5. The reason is that in the first case "/" is evaluated as an integer operation, while in the second case it is evaluated as an operation between doubles. The rule for "/" is that it is evaluated as an integer operator if both its arguments are integers. If one of the arguments is a float or a double, then the other argument is converted to this type as well before the division is performed between floats or doubles. Notice that the type of the variable receiving the result has no impact: if x is a double, then "x = 3 / 4" is equivalent to writing "x = 0". Slightly more tricky is that "x = 3 / 4 * 10.0" has the same effect. The reason is that operators with the same priority, such as "/" and "*", are evaluated from left to right.

The rules for "+" are different but similar. "+" between two String objects performs a concatenation of these. If only one of the arguments is a String, then the other argument is first converted to a String; for an object of another class this is done by calling the method toString. Because toString is defined in Object, this always works. Not overriding toString results in a standard layout. Overriding toString, as is done in Employee, allows the output to be tuned. Only when both arguments of "+" are of a numerical type is it assumed that an addition is to be performed. Therefore we have

  "Value = " + i + i    !=     "Value = " + (i + i)
  i + i + "= Value"     !=     i + (i + "= Value")
Casts are sometimes needed to obtain a forced type conversion.
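
The rules for "/" and "+" can be checked with the following small program. The class name OperatorTest is chosen here for illustration; the values in the comments follow from the rules given above.

```java
class OperatorTest
{
  public static void main(String[] args)
  {
    int i = 3;
    double x;

    x = 3 / 4;                 // integer division, converted afterwards: 0.0
    System.out.println(x);
    x = 3 / 4 * 10.0;          // 3 / 4 is evaluated first: 0.0
    System.out.println(x);
    x = 3.0 / 4;               // division between doubles: 0.75
    System.out.println(x);
    x = (double) i / 4;        // the cast forces a division between doubles: 0.75
    System.out.println(x);

    System.out.println("Value = " + i + i);     // Value = 33
    System.out.println("Value = " + (i + i));   // Value = 6
    System.out.println(i + i + "= Value");      // 6= Value
    System.out.println(i + (i + "= Value"));    // 33= Value
  }
}
```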

Encapsulation

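We have seen several of the access modifiers. These are part of a hierarchy which allows the programmer to specify in which classes and packages the methods and instance variables of a class can be accessed. From most to least accessible the levels are: "public" (accessible everywhere), "protected" (accessible in the own package and in all derived classes), no modifier (accessible in the own package only) and "private" (accessible in the own class only).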

Unfortunately there is no modifier for "the own class + all derived classes". The only way to obtain this is to define a method / instance variable as "protected" and not to put any non-derived classes in the same package.

A careful choice of the applied modifiers is of great importance: making everything public is convenient, but implies that external applications may come to depend on features of the internal realization of a class. If one later wants to change this internal realization, then it may happen that these applications do not run correctly anymore.

It is good practice to fix a well-defined interface between the class and the outside world: that is, to fix which instance variables of an object should be visible and which methods should be callable. Less visibility gives more flexibility! Classes should be defined according to their functionality, not according to how it is realized. For example: a Chain has the functionality of a special kind of (multi) set, with two insert operations and the possibility to unite two Chain objects. The general idea of limiting the access is called encapsulation; it is one of the cornerstones of object-oriented programming.

The above argument should have made clear that it is wrong to only use public. But only using "private" or "public" is not good either. Sometimes classes are designed with the explicit purpose that other classes are going to be derived from them. One can consider Employee to be of this type. One may consider that the structure of Employee is so reasonable that the need to modify it will never arise. At the same time derivations are considerably facilitated if the instance variables and methods are accessible from the derived classes. Therefore, we have chosen to use "protected" for the instance variables in Employee.

The access modifiers allow the programmer to fix the degree of encapsulation of classes, objects and methods. In most cases instance variables are private or protected and can be accessed only by special access methods.
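
A minimal sketch of this idea, using a hypothetical class Account: the instance variable is private, and the outside world only sees two access methods. Because the internal realization (an amount in cents) is invisible, it could later be replaced without changing any program that uses the class.

```java
class Account
{
  private long cents;        // internal realization, invisible outside the class

  public Account(double amount)
  {
    setBalance(amount);
  }

  public double getBalance()
  // Access method: the outside world sees an amount, not the representation.
  {
    return cents / 100.0;
  }

  public void setBalance(double amount)
  // Access method: the only way to change the balance, so the value can be checked.
  {
    if (amount < 0)
      throw new IllegalArgumentException("negative balance");
    cents = Math.round(amount * 100.0);
  }
}

class AccountTest
{
  public static void main(String[] args)
  {
    Account a = new Account(12.5);
    a.setBalance(20.25);
    System.out.println(a.getBalance());   // 20.25
    // a.cents = -1;   would not compile: cents is private in Account
  }
}
```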

Further Important Aspects

By now Java has expanded enormously and few people will have an overview of all classes and methods defined. Here we do not even attempt to provide a complete overview. However, there are still many very fundamental aspects which have not been mentioned above. Some of them are briefly discussed here.

Final and Abstract

We have encountered the keyword final in front of a variable. However, final is a general modifier. When final stands in front of a variable, this means that the variable cannot be changed after it has received its value. In other words, the variable is actually a constant. This also means that the variable must be given its value exactly once, normally upon declaration. Final for a method means that this method cannot be overridden in a derived class. Final for a class means that no classes can be derived from it. Possible advantages of the use of "final" are speed and security. If a method is final, then the compiler may "inline" it. Declaring classes final apparently also makes it harder to maliciously operate on a piece of code.

Abstract is more or less the opposite of final: an "abstract" class must be derived from before it can be used. One cannot create any objects of an abstract class. Likewise, an abstract method must be overridden. To make this consistent, the designers of Java have decided that abstract methods can only appear in abstract classes (but an abstract class can have methods that are not abstract).
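
The following sketch shows the modifiers side by side. The classes Shape and Circle are hypothetical and serve only as illustration. Note that radius is a final instance variable that receives its single value in the constructor, which Java permits.

```java
// Hypothetical classes for illustration only.
abstract class Shape
{
  static final double PI = 3.14159;   // constant: cannot be reassigned

  abstract double area();             // must be implemented in a derived class

  // final: derived classes cannot override this method
  final String describe()
  {
    return "Shape with area " + area();
  }
}

final class Circle extends Shape      // final: no class can extend Circle
{
  final double radius;                // assigned exactly once, in the constructor

  Circle(double radius)
  {
    this.radius = radius;
  }

  double area()
  {
    return PI * radius * radius;
  }
}
```

An instruction such as "new Shape()" is rejected by the compiler, whereas "Shape s = new Circle(1.0)" is fine: the variable has the abstract type, the object has a concrete one.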

Polyinheritance

In many object-oriented languages, there are almost no limitations on ways to inherit. In Java there is a strict limitation: any class inherits from at most one other class. This implies that the complete structure is like a tree: there are no cycles. An example of this we have seen for the classes derived from Employee. Because any class is an extension of Object, there is in fact only one tree, with Object at the top connecting to all classes which are not derived from other classes themselves.

There are good reasons to allow polyinheritance (usually called multiple inheritance): many objects incorporate aspects of several more general classes. A person can be both an Employee and a ClubMember, an article in a shop may be both a FoodArticle and a LuxuryArticle. However, polyinheritance may also lead to consistency problems: if BClass and CClass each extend AClass and DClass extends both BClass and CClass, then methods from AClass are inherited in two possible ways. If a method from AClass has been overridden in BClass and/or CClass, then at runtime it would not be clear which one to take.

To exclude this kind of problem, polyinheritance is generally forbidden in Java. In other languages this problem is addressed differently. One might generally allow polyinheritance, but forbid inheritances which result in having equally valid variants of methods. Alternatively, one might allow any kind of inheritance and, in case a method is inherited several times, always select the variant from the first listed class in which it is defined. Java has chosen the most restrictive approach, assuring correctness and facilitating the task of the compiler, at the expense of programming possibilities.

Polyinheritance of Classes

Each class extends at most one other class, assuring that the inheritance hierarchy has a tree structure. But, classes may implement many interfaces, telling which methods certainly exist.

Interfaces

Definition

An interface is something like a class, but different. It is close to being a fully-abstract class: it has only abstract methods (because this is the default, one does not have to declare them as such). An interface does not have instance variables. The only thing it may have is interface constants: static final variables. For the rest an interface is like a normal class: one can define variables, parameters and return values of an interface type.

The properties of interfaces can easily be summarized:

Because all methods listed in an interface are abstract, in any class which claims to implement an interface, implementations of all these methods must be provided. This fact makes interfaces of great importance when designing larger software packages. Specifying that a class implements an interface gives a synopsis of its functionality. Other classes, possibly designed by other programmers, may build on this. Thus, when working with objects of a class implementing an interface, the methods of this interface (in the sense of Java) may be used as an interface (in the conventional sense).

Because interfaces have neither instance variables nor worked-out methods, there are no problems related to having implementations of several interfaces and therefore a class may implement any number of interfaces.

Polyinheritance of Interfaces

Classes may implement many interfaces, telling which methods certainly exist. Specifying that a class implements an interface gives its minimal guaranteed functionality.
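
A minimal sketch of a class implementing two interfaces is given below. The names Payable, ClubMember and Person and the returned numbers are hypothetical; they only echo the example of a person who is at the same time an employee and a club member.

```java
// Hypothetical interfaces and class for illustration only.
interface Payable
{
  int salary();
}

interface ClubMember
{
  int membershipFee();
}

class Person implements Payable, ClubMember
{
  public int salary()
  {
    return 3000;
  }

  public int membershipFee()
  {
    return 50;
  }
}
```

A Person object can now be assigned to a variable of type Payable as well as to a variable of type ClubMember. Through each variable only the methods of the corresponding interface are available.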

EndOfYearable

Interfaces are typically used to express properties. Therefore, it is customary to give an interface a name ending in "able". In the earlier example "BetterCompany", the class FixedEmployee was only defined to obtain a common platform for all the derived classes implementing the method endOfYear. This could better have been done in the following way using an interface:
  interface EndOfYearable
  {
    public void endOfYear();
  }

  class Director extends Employee implements EndOfYearable
  {
    ...

    public void endOfYear()
    {
      budget = budget / 2 + yearlyBudget;
    }

    ...
  }

  class LowerEmployee extends Employee implements EndOfYearable
  {
    ...

    public void endOfYear()
    {
      vacationDays = vacationDays / 2
                   + yearlyVacationDays;
    }

    ...
  }

  class Staff extends LowerEmployee
  {
    ...

    public void endOfYear()
    {
      super.endOfYear();
      vacationDays += overTime / 10;
      overTime = 0;
    }

    ...
  }

  class Worker extends LowerEmployee
  {
    ...

    public void endOfYear()
    {
      super.endOfYear();
      if (shiftDuty)
        vacationDays += shiftVacationDays;
    }

    ...
  }

  class BetterCompany extends Company
  {
    ...

    public void endOfYear()
    {
      for (int i = 0; i < size; i++)
        if (staff[i] instanceof EndOfYearable)
          ((EndOfYearable) staff[i]).endOfYear();
    }

    ...
  }

Comparable

There are many predefined interfaces. One of them is Comparable. Comparable is defined as follows:
  public interface Comparable
  {
    int compareTo(Object o);
  }

So, any class which implements Comparable promises to provide an implementation of the method compareTo() returning an int. Of course, this method can be realized in any way, but some ways make more sense than others. The intended semantics of compareTo() is that it returns a negative integer, zero, or a positive integer if this object is less than, equal to, or greater than the specified object, respectively.

Consider the following classes:

  class CompString implements Comparable
  {
    char[] characters;

    CompString(String string)
    {
      characters = string.toCharArray();
    }

    public int compareTo(Object o)
    {
      return characters.length - ((CompString) o).characters.length;
    }
  }


  class CompMatrix extends IntegerMatrix implements Comparable
  {
    CompMatrix(int size)
    {
      super(size);
    }

    CompMatrix(IntegerMatrix matrix)
    {
      super(matrix);
    }

    public int compareTo(Object o)
    {
      return trace() - ((IntegerMatrix) o).trace();
    }
  }

These classes have nothing to do with each other except that they both implement compareTo() in one of many possible ways (one could think of more sensible ways to compare strings and matrices). The great thing is now that arrays of instances of the classes CompString and CompMatrix and any other class which implements the Comparable interface can be sorted with the following sorting method:
  class CompSort
  {
    static void sort(Comparable[] a, int n)
    {
      for (int r = n - 1; r > 0; r--)
        for (int i = 0; i < r; i++)
          if (a[i].compareTo(a[i + 1]) > 0)
            {
              Comparable x = a[i];
              a[i]         = a[i + 1];
              a[i + 1]     = x;
            }
    }
  }

The underlying sorting algorithm (known as bubble sort) is not particularly efficient, but that is not the issue here. The purpose of the code example is to demonstrate how interfaces can be used in an effective way. Here again the dynamic binding is crucial: in the sorting routine, the actual type of the objects to compare is determined at runtime and then the appropriate implementation of compareTo() is selected. Without dynamic binding, we would need a different sorting routine for each class whose objects we would like to sort.
The most exciting feature of interfaces is that they allow objects of different classes that share a single aspect to be processed with a single method.
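
To see this at work without the classes CompString and CompMatrix, the sketch below applies a copy of the same bubble sort to arrays of the predefined classes Integer and String, which both implement Comparable. The class name CompSortDemo is an assumption made for this illustration only.

```java
import java.util.Arrays;

// Self-contained copy of the bubble sort from the text, applied to two
// unrelated predefined classes that both implement Comparable.
class CompSortDemo
{
  @SuppressWarnings({"rawtypes", "unchecked"})  // Comparable without type parameter, as in the text
  static void sort(Comparable[] a, int n)
  {
    for (int r = n - 1; r > 0; r--)
      for (int i = 0; i < r; i++)
        if (a[i].compareTo(a[i + 1]) > 0)
        {
          Comparable x = a[i];
          a[i]         = a[i + 1];
          a[i + 1]     = x;
        }
  }

  public static void main(String[] args)
  {
    Integer[] numbers = { 5, 2, 9, 1 };
    String[]  words   = { "pear", "apple", "fig" };

    sort(numbers, numbers.length);   // calls Integer.compareTo at runtime
    sort(words, words.length);       // calls String.compareTo at runtime

    System.out.println(Arrays.toString(numbers));  // prints [1, 2, 5, 9]
    System.out.println(Arrays.toString(words));    // prints [apple, fig, pear]
  }
}
```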

Exceptions

Description

The final thing to know about Java is the notion and handling of exceptions. As remarked before, Java pays utmost attention to preventing programming errors as far as possible. Therefore it provides a system to test for unexpected situations. One can enclose a fragment of program text in a try clause, which is followed by a catch clause.

How does this work? If an error occurs, then one of the following things happens:

  1. The faulty line is surrounded by a try-catch of the corresponding type. In that case the instructions specified in the catch part are executed and the computation goes on as specified (possibly exiting with an adequate error message).
  2. There is no such try-catch. In that case the exception is passed upwards to the method from which this fragment of code was called, and it is tested again for try-catch. And so on, until a matching try-catch is found or the program exits with an error.
When we think of Java also as the language for internet applications (mainly in the form of applets), then an error condition does not necessarily mean that the program is wrong: it may have asked for a connection to be opened which was impossible because the other side was not replying. Instead of crashing, one might want to try something else, or just go on and ignore the thing.
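
The second case, passing an exception upwards, can be sketched as follows. The class PropagationDemo and its methods are hypothetical. The error is a division by zero, which arises two method calls below the place where it is finally caught.

```java
// Hypothetical demo: class and method names are made up for illustration.
class PropagationDemo
{
  // No try-catch here: an ArithmeticException leaves this method ...
  static int divide(int a, int b)
  {
    return a / b;
  }

  // ... and also this one, which has no matching try-catch either ...
  static int average(int sum, int count)
  {
    return divide(sum, count);
  }

  // ... until it reaches a method with a matching catch clause.
  static int safeAverage(int sum, int count)
  {
    try
    {
      return average(sum, count);
    }
    catch (ArithmeticException e)
    {
      System.out.println("Cannot average zero values, returning 0");
      return 0;
    }
  }
}
```

Calling safeAverage(10, 2) returns 5 in the normal way. Calling safeAverage(10, 0) prints the message and returns 0 instead of terminating the program.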

There are two types of exceptions: runtime and general exceptions. General exceptions must be dealt with, runtime exceptions may be dealt with. An example of a general exception arises when reading: Java obliges the programmer to be aware of the possibility that an input operation fails with an IOException. Thus, every read must be surrounded by a try-catch, unless the enclosing method passes the exception on with the keyword throws discussed further down. The following piece of code gives a class which contains a static method for reading an integer. In case something goes wrong while reading, the user is informed, and 0 is returned from the method.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  
  class IntReader
  {
    static int readInt() 
    // Reads an integer from input
    {
      try 
      {
         return Integer.valueOf(
           (new BufferedReader(
              new InputStreamReader(System.in)).readLine())).intValue();
      } 
      catch (java.io.IOException e) 
      {
        System.out.print("IO Exception occurred, returning 0");
        return 0;
      }
    }
  }

  class ExceptionTest
  {
    public static void main(String ps[])
    {
      int i;
      System.out.print("Give i   >>>   ");
      i = IntReader.readInt();
      System.out.print("i = " + i + "\n");
    }
  }

An example of a runtime exception is division-by-zero: the programmer may test for this and choose an appropriate reaction, but this is not required. Testing for all possible errors would make programs long and slow, so this freedom is good. The keyword throws allows exceptions to be handled at a higher level. Using throws indicates that one is aware of the possibility that something might go wrong, but that one does not want to deal with it at this level. Using throws may help to save many try-catch pairs.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  
  class IntReader
  {
    static int readInt() throws java.io.IOException
    // Reads an integer from input
    {
       return Integer.valueOf(
         (new BufferedReader(
            new InputStreamReader(System.in)).readLine())).intValue();
    }
  }

  class ExceptionTest
  {
    public static void main(String ps[])
    {
      int i;
      System.out.print("Give i   >>>   ");
      try 
      {
        i = IntReader.readInt();
      } 
      catch (java.io.IOException e) 
      {
        System.out.print("IO Exception occurred, continuing with i == 0");
        i = 0;
      }
      System.out.print("i = " + i + "\n");
    }
  }
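
Both versions of readInt() deal only with the IOException. The conversion with Integer.valueOf() may in addition throw a NumberFormatException when the input is not a number. This is a runtime exception, so the compiler does not insist on a try-catch. Furthermore, readLine() returns null when the end of the input has been reached. The following sketch handles both situations; the class SafeIntReader and the method parseOrZero are hypothetical names, and returning 0 is merely one possible reaction.

```java
// Hypothetical helper: converts one line of input to an int, returning 0
// when the line is missing or does not contain an integer.
class SafeIntReader
{
  static int parseOrZero(String line)
  {
    if (line == null)                   // readLine() gives null at end of input
      return 0;
    try
    {
      return Integer.parseInt(line.trim());
    }
    catch (NumberFormatException e)     // runtime exception: catching it is optional
    {
      System.out.println("Not an integer, returning 0");
      return 0;
    }
  }
}
```

In readInt() one could then write "return SafeIntReader.parseOrZero(reader.readLine());", where reader stands for the BufferedReader created there.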

User-Defined Exceptions

Everything is classes, and so are exceptions. By deriving from the class Exception, the user can define his/her own exception classes, which might be useful to test for and react to non-standard exceptions in a convenient way.

In the section on interfaces it was considered how a single sorting method could be used to sort all kinds of arrays, as long as their objects were comparable. This works really fine, unless the provided array contains objects of different comparable types: apples can be compared with apples, and pears with pears, but ... . This will rarely be a problem, because the programmer knows that the array only contains objects that are mutually comparable. However, it may nevertheless be handy to deal with this and similar other exceptional circumstances.

A convenient way of doing this is to define a NonComparableException which is thrown when an attempt is made to compare non-comparable objects. Otherwise the instruction "a[i].compareTo(a[i + 1])" will lead to an error: because of the dynamic binding, the method compareTo() of the class of a[i] will be called. In this method the parameter is cast to the class of a[i] in order to access its instance variables. If a[i + 1] is an instance of a different class, this cast fails at runtime with a ClassCastException. The solution is simple. In compareTo(), before casting o to the class of this, it should be tested whether o is an instance of the same class using the operator instanceof. If this is the case, the cast is safe. Otherwise a NonComparableException is thrown. Using throws this exception may be guided all the way up. In the following program, which is run as a batch job, this is not that useful: without the added tests the program would have crashed, we would have corrected the error, and tried again. However, an interactive job such as an online banking applet should not crash. Instead it should tell the user what went wrong and how to proceed.

  interface SafeComparable
  {
    int compareTo(Object o) throws NonComparableException;
  }

  class NonComparableException extends Exception
  {
    NonComparableException(String string)
    {
      super(string);
    }
  }

  class SafeCompString implements SafeComparable 
  {
    char[] characters;

    SafeCompString(String string)
    {
      characters = string.toCharArray();
    }

    public int compareTo(Object o) throws NonComparableException
    {
      if (o instanceof SafeCompString)
        return characters.length - 
          ((SafeCompString) o).characters.length;
      throw new NonComparableException("Object not a SafeCompString");
    }

  
    public String toString()
    {
      return new String(characters);
    }
  }

  class SafeCompMatrix extends IntegerMatrix implements SafeComparable
  {
    SafeCompMatrix(int size)
    {
      super(size);
    }

    SafeCompMatrix(IntegerMatrix matrix)
    {
      super(matrix);
    }

    SafeCompMatrix(int size, int[] array)
    {
      super(size);
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          a[i][j] = array[i * n + j];
    }

    public int compareTo(Object o) throws NonComparableException
    {
      if (o instanceof SafeCompMatrix)
        return trace() - ((SafeCompMatrix) o).trace();
      throw new NonComparableException("Object not a SafeCompMatrix");
    }

    public String toString()
    {
      String s = "( ";
      for (int i = 0; i < n; i++)
      {
        for (int j = 0; j < n; j++)
          s += a[i][j] + " ";
        if (i != n - 1)
          s += "| ";
      }
      return s + ")";
    }
  }

  class SafeCompSort
  {
    static void sort(SafeComparable[] a, int n) 
      throws NonComparableException
    {
      for (int r = n - 1; r > 0; r--)
        for (int i = 0; i < r; i++)
          if (a[i].compareTo(a[i + 1]) > 0)
            {
              SafeComparable x = a[i];
              a[i]             = a[i + 1];
              a[i + 1]         = x;
            }
    }
  }

class SafeCompSortTest
{
  static void sort(SafeComparable[] s, int n)
  {
    try
    {
      SafeCompSort.sort(s, n);
      for (int i = 0; i < n; i++)
        System.out.println(s[i]);
    }
    catch (NonComparableException e)
    {
      System.out.println("Array NOT sorted, " + e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println("-------------------------------------");
    SafeComparable[] s = new SafeComparable[10];
    int a[];

    System.out.println();
    s[0] = new SafeCompString("abcde");
    s[1] = new SafeCompString("cde");
    s[2] = new SafeCompString("abcde");
    s[3] = new SafeCompString("cdav");
    s[4] = new SafeCompString("abxcde");
    s[5] = new SafeCompString("xx");
    sort(s, 6);
  
    System.out.println();
    a = new int[4]; a[0] = 10; a[1] = 5; a[2] = 7; a[3] = 8;
    s[0] = new SafeCompMatrix(2, a);
    a = new int[1]; a[0] = 6;
    s[2] = new SafeCompMatrix(1, a);
    a = new int[4]; a[0] = 11; a[1] = 6; a[2] = 17; a[3] = 5;
    s[3] = new SafeCompMatrix(2, a);
    sort(s, 4);
  
    System.out.println();
    a = new int[1]; a[0] = 43;
    s[1] = new SafeCompMatrix(1, a);
    sort(s, 4);

    System.out.println("\n-------------------------------------");
  }
}

Exceptions are there to assure that in case something goes wrong a decent output is produced and resources are freed before crashing or going on in an alternative way.
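
For freeing resources Java offers the finally clause, which has not been discussed above: the instructions in a finally part are executed whether or not an exception occurred in the try part. The sketch below uses a hypothetical class DemoConnection standing in for a file or network connection; all names in it are assumptions made for this illustration.

```java
// Hypothetical resource class, standing in for a file or network connection.
class DemoConnection
{
  boolean open = true;

  int fetch(boolean fail)
  {
    if (fail)
      throw new IllegalStateException("other side is not replying");
    return 42;
  }

  void close()
  {
    open = false;
  }
}

class FinallyDemo
{
  static int fetchOrDefault(DemoConnection c, boolean fail)
  {
    try
    {
      return c.fetch(fail);
    }
    catch (IllegalStateException e)
    {
      System.out.println("Fetch failed: " + e.getMessage());
      return -1;
    }
    finally
    {
      c.close();     // executed in both cases, so the resource is always freed
    }
  }
}
```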

Summary

Java is an object-oriented language. The guiding idea in the design of Java has been to assure correctness, even if this goes at the expense of speed or flexibility.

At a superficial level, the object-orientedness of Java is expressed by the way methods are called: an object is connected to a method of its class with the dot-operator, putting the object in the foreground. Much more important are the following general concepts of object-oriented programming:

Inheritance:
A class is defined to be an extension of an existing class, inheriting all its instance variables and methods, with the possibility to add instance variables and to add, extend or override methods.
Encapsulation:
Details of the implementation are made invisible externally: a class is defined by its external functionality and not by its internal realization. This allows a high level of abstraction and the flexibility to later modify the details as long as the external functionality remains unchanged.
Polymorphism:
Methods can have the same name as long as they have a different signature. More importantly, an array (or another object containing other objects), defined to hold objects of a certain type, may also hold objects of any derived type. This makes it possible to store objects with different features in a common data structure.
Dynamic Binding:
When an object is connected to a method, then at compile time it is tested whether this method exists in the class of the declared type of the object. However, at runtime, it is the actual type of the object that determines the method to be called; because of the described polymorphism the actual type does not need to be the same as the declared type. Especially in the context of an array (or another object containing other objects) with polymorphic objects, this makes it possible to give a specific treatment to objects with different features stored in a common data structure.
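
The interplay of inheritance, polymorphism and dynamic binding can be condensed into a few lines. The classes Animal, Dog, Cat and BindingDemo in the following sketch are hypothetical and independent of the examples in this chapter.

```java
// Hypothetical classes for illustration only.
class Animal
{
  String sound()
  {
    return "...";
  }
}

class Dog extends Animal       // inheritance
{
  String sound()               // overrides Animal.sound()
  {
    return "Woof";
  }
}

class Cat extends Animal
{
  String sound()
  {
    return "Meow";
  }
}

class BindingDemo
{
  // The array is declared to hold Animals, but may hold objects of
  // any derived type (polymorphism).
  static String chorus(Animal[] animals)
  {
    String s = "";
    for (int i = 0; i < animals.length; i++)
      s += animals[i].sound() + " ";   // method chosen by the actual type at runtime
    return s.trim();
  }
}
```

For an array holding a Dog, a Cat and an Animal, in this order, chorus() returns the string "Woof Meow ...".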

Further Code Examples

The following code examples illustrate many aspects of Java, even some more than discussed above, inside working programs. By modification these programs can be used for most common non-graphical programming tasks in Java.

Exercises

Some of the exercises are (almost) identical to the exercises from the chapter on C. This is not a mistake. It is instructive to make the similarities and differences explicit.
  1. Define a class IntArray. The class has two instance variables: an int "length" and an int[] "a". IntArray has one constructor; a method for printing all values in a; and two inversion methods. Inversion is the operation after which a[i] has the value which initially was found in a[length - i - 1]. The first inversion method works with a dummy array b[]. The second method performs the inversion in-situ, that is, without using much extra memory. Embed this class in a program. In main an IntArray object is created of length 20. The fields of the array are initialized with a[i] = 2 * i, for all i. Then the array is inverted with each of the inversion methods. After each big change the array is printed.

  2. The example class IntegerMatrix from above can be downloaded here. Augment this class with a method for adding matrices (a_{i, j} = b_{i, j} + c_{i, j}, for all i, j) where the two matrices to add are passed as parameters, while the matrix in which the sum is computed is passed as object. Add a similar method for multiplying matrices (a_{i, k} = sum_j b_{i, j} * c_{j, k}, for all i, k). In all operations you may assume that the matrices fit: they are all n x n matrices for some fixed n. On the other hand, you must take care that the product method even computes correctly when the involved matrices are not all different, for example when computing A = A * A. Should you also take extra care with the method for computing the sums? Further you should add methods for setting the value of a specified position of the matrix in an IntegerMatrix and for printing all values of the matrix.

    IntegerMatrix should also have a static variable totalSize keeping track of the sum of all sizes of all matrices, and the constructors should refuse to allocate new memory when totalSize would exceed MAX_TOTAL_SIZE, for some constant. In that case some output is produced; ideally this is handled by a self-defined exception, but this is not required. The method finalize() should be overridden to assure that totalSize remains accurate even when IntegerMatrix objects are removed by the garbage collector.

    Integrate class IntegerMatrix into a program which creates several matrices, makes some assignments and performs some operations. More concretely, we want you to create matrices A, B and C, as specified below, and to compute A = A . (B + C).

               ( 1  7  2)      ( 3 -7 -3)      ( 0  4 -2)
           A = (-1  2  7)  B = (-4  2  3)  C = ( 0 -1 -5)
               ( 1  4 -5)      (-6 -1  3)      (10  5 -2)
        
    The initial, intermediate and final matrices should be printed. Check that the computed results make sense:
                   ( 3 -3 -5)                  (-17  12 -17)
           B + C = (-4  1 -2)    A . (B + C) = ( 17  33   8)
                   ( 4  4  1)                  (-33 -19 -18)
        

  3. Consider the problem of sorting pairs (x, y) on the value of x. The first position of such a pair is called its key, the second position its name. The key is an integer in a finite range: 0 <= x < m for some reasonably small value m. For the sake of simplicity, even the names are assumed to be integers, but these could be arbitrary. The class of these objects is called Pair. Pair has methods getKey(), setKey(), getName() and setName(). The method toString() should be overridden so as to produce a pretty output: a Pair with key 12 and name 245 should be converted to the string (12, 245). In main() your program should ask for the number of pairs n and m, and then create an array of Pairs with random values (bounding the key values to m).

    This array should be sorted. To this end you should define a class Sort which has a static method sort() which has an array of Pairs as parameter. Sort has another parameter which is used to pass the value of m. Here we are not so much interested in efficiency but in handling classes. Define a further class, called Node. A Node has two instance variables: a Node and a Pair. The class NodeArray mainly consists of an array of Nodes. In our application this array has length m. Because the Nodes will be linked to each other so that they form lists, an object of NodeArray can be viewed as a set of m linked lists. In NodeArray there is a method which allows to insert a Node at the beginning of a list at a specified position of the array. NodeArray also has methods which allow to enumerate all Nodes in all lists in a systematic way, starting with the list at array position 0. The sorting can now be performed by sort() as follows:

    1. A new empty NodeArray is created.
    2. The array of Pairs is traversed and each Pair (x, y) is enveloped in a Node which is added at the beginning of the list starting at position x of the array.
    3. Repeatedly call for the next Node from which the enveloped Pair is extracted (by calling a method in Node). Insert the Pairs in this new order in the array of Pairs.

    Fill in the details yourself and work this out to a running program. Test it for m = 10 and n = 20.

    In the current version, if there are several Pairs with the same key, then the order of these Pairs will get reversed. This is undesirable: in many applications it is required that a sorting subroutine is stable. With a minimal change the above sorting method can be made stable. How?

    What is the running time of your algorithm expressed in terms of n and m? What do you get for m = O(n)?

  4. Write a program for efficiently performing set operations using a boolean for every element of the set, packing 31 booleans (which indicate whether an element is present in the set or not) in an int. The elements in the sets have indices from 0 to n - 1, for some value n which is read at the beginning of the program. Use the class IntReader for this. It can be downloaded here. The supported operations must be:
    • remove_all: sets all bits to zero,
    • add_all: sets all bits to one,
    • choose_random: sets all bits to one or zero with a specified probability p,
    • insert: sets a specified bit to one,
    • delete: sets a specified bit to zero,
    • find: returns the value of a specified bit,
    • overview: a routine printing a concise overview of all bits,
    • intersection: computes the intersection of two sets,
    • union: computes the union of two sets,
    • size: the total number of ones.

    Of course you should define a class Set for this. The implementation of all operations should be completely hidden and the instance variables should not be visible outside the class. All calls to the methods of Set must be performed in an object-oriented way; none of the mentioned methods may be static.

    Random numbers can be generated with help of the methods in the class Random in java.util. Use this to generate three random sets of size 100.000.000 each:

    • S1 gives the numbers on which a lotto prize is falling in the first round. A fraction 0.05 (that is 5%) of the numbers is selected.
    • S2 gives the numbers on which a lotto prize is falling in the second round. A fraction 0.05 (that is 5%) of the numbers is selected.
    • S3 gives the numbers on which someone has put a bet. A fraction 0.20 (that is 20%) of the numbers is selected.

    The task is to compute the number of bets resulting in a prize (each bet gets at most one prize). That is, you should first compute the union of S1 and S2, then intersect with S3 and finally compute the size of the resulting set. Print this resulting number (if it does not lie between 1.940.000 and 1.960.000, then probably something is wrong with your program).

  5. The class Chain allows to insert elements at the beginning and end, but in the latter case the whole chain has to be traversed. Define a derived class LastChain of Chain, which has one additional instance variable Node last. Of course LastChain has its own constructor, which should also call the constructor of the mother class. The method addLast should be overwritten. Even the methods addFirst and concatenate may have to be adapted: the method from the mother class should be called, only the new operations should be specified.

    Integrate LastChain into a program: take the program ChainTest from the text above and change the type of c1 and c2 from Chain to LastChain. The text of the program can be downloaded here.

  6. In program Chain a Node has instance variables int key and Node next. One can also define Node with instance variables int key, Node left and Node right. This gives nodes that can be used to construct a tree. Like the chain a tree is a linked structure but in a tree the nodes may have degree larger than one, in our case they have degree 0, 1 or 2. A tree in which the degree of the nodes is at most two is called a binary tree. The node pointed to by the instance variable left is called the left child, the node pointed to by right the right child.

    In a search tree, the nodes are not arranged arbitrarily, but so that for any node the key of its left child (if existing) is smaller than its own key, and that the key in its right child is larger. This arrangement makes it easy to perform the operation find: determining whether an element with a specified key exists or not. This is done in the following way: If the value x is smaller than the key y of the current node, then, if x occurs at all in the tree, it must occur in the left child or the nodes which can be reached from there. If the current node has no left child, then x does not occur. In case x > y, we must go right. If x is equal to the key, then we have found the value.

    Binary Search Tree

    Create a class SearchTree implementing the above ideas. The class has an instance variable Node root. "root" corresponds to "first" in Chain: this is the node from which the structure is entered. There must be a trivial constructor, a method find along the above guidelines and a method print. The return type of find should be Node: it returns null when the value x we were looking for does not occur, otherwise it returns the Node with key equal to x. "print" should print all nodes in some systematic way. A very good idea is to do it recursively. A method is called recursive when it works by calling itself again (with a certain stopping condition). This recursive printing should however be handed over to a method within the class Node or an extension thereof (after testing that root != null). It has a structure of the following kind:

           void print() 
           {
             if (left != null)
             {
               System.out.print("Going left\n");
               left.print();
             }
             System.out.print("Key value = " + key + "\n");
             if (right != null)
             {
               System.out.print("Going right\n");
               right.print();
             }
           }
        
    It is a good idea, but not required, to also hand over find to a method in the class Node.

    Create the search tree from the picture "by hand", that is, by creating nodes with appropriate keys one by one and hooking them in the correct way. Then call print for the tree.

    Inserting a node with key x in a search tree is also easy: Search for x. If x already occurs, we do not insert it again. Otherwise the search ends in a node with key y != x; if y > x, a new Node with key x is added as left child, otherwise as right child. Delete can be performed by marking the deleted nodes in a special way; if the value is inserted again later on, the marking must be undone.

    Create a derived class MarkNode of Node which has one additional instance variable: boolean deleted. Of course this class also needs a constructor. The class Dictionary is a derived class of SearchTree. It has additional instance variables int size and int realSize. "size" indicates the number of non-deleted nodes, while "realSize" indicates how many nodes are physically there. Methods insert and delete are added. The actual work should best be done at the level of MarkNode.

    Now create the same tree again by inserting the elements in appropriate order. For two trees to be the same the structure and the keys in corresponding nodes must be the same.

    Create an empty Dictionary. Generate 100,000 random values in the range 0, ... , 199,999 and insert these in the order they are generated. Print the size of the tree. It should lie around 78600. Generate 100,000 random numbers in the same range and count how many of them occur in the tree. It should be about 39300. Generate 100,000 random numbers in the range 0, ... , 199,999 and perform a delete for all of them. Print again the number of remaining nodes, now print both size and realSize. size should lie around 47500, realSize should be the same as before.

    Create an empty Dictionary. Insert the numbers 0, 1, ..., 99,999 in this given order. What do you notice? What is the reason? Why did this not happen before? What is your conclusion about the suggested data structure Dictionary?

  7. Write a small Java program consisting of main and a method swap. swap exchanges the values of two integers passed as parameters. Think of the simplest way to realize this. Hint: embed the variables into an object of the class Integer, or of a class of your own of this kind.

  8. Write a program for converting a text to caps format: all letters must be replaced by capitals, while the other characters and the layout remain unchanged. The original text is found in the file input, the converted text is written to the file output. Both files are located in the same directory as the program.

  9. Write an applet performing the above exercise on converting a text to caps format. There should be two text boxes, and a submit button. In the first text box the input text is entered, to the second text box the program writes the modified text after pressing the button. The arrangement should be: input text at the top, submit button in the middle and output text at the bottom. The labels of these should be "INPUT TEXT", "CONVERT", "OUTPUT TEXT", respectively.

  10. Write an applet simulating the operations in a post office. The details of the task are specified here.

  11. Define a class of complex numbers. Complex numbers are pairs of two doubles with certain arithmetic rules, which means that they can be added, subtracted and multiplied (and more). The rules are
    (a, b) + (c, d) = (a + c, b + d),
    (a, b) - (c, d) = (a - c, b - d),
    (a, b) * (c, d) = (a * c - b * d, b * c + a * d).
    Here the symbols +, - and * inside the brackets denote the operations on doubles. Define a class ComplexRing implementing these operations as methods. The methods should be called add, subtract and multiply. They should be non-static: the result is stored in the object the method is called with, so computing x = y + z is performed by calling x.add(y, z). The method isZero is also non-static. It returns a boolean which is true when the complex number it is called with is equal to zero. A complex number (a, b) is zero when a == 0 and b == 0.

    Add a constructor which can be called with two double arguments. Also add a method (called with an object connected to the method name with the dot operator) readComplex. readComplex asks for two doubles which are read with help of the class DoubleReader. It can be downloaded here. Override the method toString from Object to enable printing complex numbers in a decent way. The instance variables should be "protected" and the methods "public".

    We build on the class ComplexRing. Define a derived class ComplexField. This class has one extra instance variable and some extra methods. The private double instance variable "norm" at all times gives the norm of the complex number, which for a number (a, b) is defined as a^2 + b^2.

    The method isZero is overridden: the test on zero is simplified to norm == 0. reciprocal is a private static method returning the reciprocal of the complex number passed as an argument. This is defined as follows: for a number (a, b) the reciprocal is given by the number

    (a, b)^{-1} = (a / norm, -b / norm).
    Use this method to define a method divide which returns x / y = x * y^{-1} in the object z with which it is called for two complex numbers x and y passed as arguments. So, the method can be called as z.divide(x, y).

    In all methods, both those of ComplexRing and those of ComplexField, one should take care that the correct result is computed even if the arguments overlap with the calling object. Either one tests for this and writes a special strategy, or one works with some dummy variable.
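
    The overlap problem can be illustrated with a minimal sketch. The class name Pair and its fields re and im are hypothetical and are not the ComplexRing asked for above. The sketch only shows the second strategy: both components are computed into local dummy variables before any field of the calling object is overwritten, so that a call such as x.multiply(x, y) still gives the correct product.

```java
// Hypothetical helper class, used only to illustrate the overlap issue.
class Pair {
    double re, im;

    Pair(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Stores y * z in this object. The local variables keep the result
    // correct even when y or z is the same object as this.
    void multiply(Pair y, Pair z) {
        double newRe = y.re * z.re - y.im * z.im;
        double newIm = y.im * z.re + y.re * z.im;
        re = newRe;
        im = newIm;
    }

    // Naive version: re is overwritten before its old value is used
    // for im, which goes wrong when y or z is this object.
    void multiplyNaive(Pair y, Pair z) {
        re = y.re * z.re - y.im * z.im;
        im = y.im * z.re + y.re * z.im;
    }

    public static void main(String[] args) {
        Pair x = new Pair(1, 2);
        Pair y = new Pair(3, 4);
        x.multiply(x, y);   // x overlaps with the first argument
        System.out.println("(" + x.re + ", " + x.im + ")");   // prints (-5.0, 10.0)
    }
}
```

    Calling multiplyNaive in the same situation leaves (-5.0, -14.0) in x, because the second line already sees the new value of re.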

    Define an exception divisionByZero. This exception is thrown by the method divide when the second argument equals zero. Read the above text on exceptions to see how this is done and consider the example in class Seven.

    Create a program with main embedded in a class called ComplexTest. In main five complex numbers are created: u, v, x, y, z. u, x and y are read in. v is initialized at zero. Then compute z = x + y - z, and subsequently z = z * v. Print the results. Then compute x / x, x / y and z / z and print the results.

  12. In the considered class Chain, the keys were ints. However, chains can have data fields of arbitrary type; the essence is the linked structure. If the keys are comparable in the sense of the interface Comparable, then chains can even be sorted in a natural way, just like arrays: a chain is said to be sorted if, for all nodes u with v = u.next != null, u.data is smaller than or equal to v.data.
    • Define a class CompNode implementing Comparable. The keys of these nodes should be comparable and the definition of compareTo() should be induced by the comparability of the keys.
    • Define an interface Linkable which expresses the essential functionality of a chain.
    • Define a class CompChain implementing Linkable for nodes of type CompNode.
    • How much memory does a chain consisting of n nodes require in addition to the memory needed for the objects stored in it?
    • Add a second static sorting method to the class CompSort for sorting chains.
    • Complement the above with a class containing main. Create a list with six CompString objects and a list with four CompMatrix objects. Sort each of them.
    • Concatenate the sorted lists and sort again. What happens? How should such an abusive usage of sort be prevented / dealt with?





Graphical User Interfaces and Applets

In this chapter a short introduction is given on how to create a graphical user interface (GUI) for a Java program. Once this has been done, for most programs it is a small step to turn the program into an applet. An applet is a Java program whose GUI is integrated into an html-page. An applet cannot be executed in a direct way, but can only be activated by going through the page in which it appears. For testing purposes and for faster execution, however, there is the program appletviewer. Typing "appletviewer name_of_page" will activate the program without showing the contents of the surrounding webpage. The word applet is used in opposition to application, which is used for conventional autonomous programs.

An applet integrated in an html-page is executed on the computer of the user who is visiting it, not on the computer which is hosting the page. This has important consequences. Because an applet should run anywhere, independent of the available system resources, it cannot read or write any files. This is desirable anyway, because we would not like an unknown applet to leave possibly dangerous rubbish on our computer. In principle executing applets is safe, because our own browser continuously checks that the applet is not doing anything dangerous.

Class AWT

AWT (the package java.awt) is one of several Java class libraries providing tools for creating a GUI. It is convenient and simple, and for basic applications there is no need to know more.

Outline

Java provides classes for the different types of building blocks of a picture. These classes can be distinguished according to their functions in several categories. The most important of these are:
Containers:
A container can contain other graphical objects. The most important container classes are the following:
  • Frame gives a window. This window looks exactly like a normal window; its appearance depends on the operating system. All graphical objects are placed on a frame.
  • Panel can be used to group together other graphical objects. Panels can be located on a frame or on other panels.
Components:
These are the actual graphical objects. The most important component classes are the following:
  • Button: gives a clickable button, providing a specific instruction to the program.
  • TextField: for providing a textual input to the program.
  • Label: can be used for textual output by the program.
  • Checkbox: for setting a flag, which may be used to provide one bit of status information to the program.
Layout Managers:
Classes which can be used to arrange components and containers within a larger container.
Event-Listeners:
Classes which can be used to attach a listener to buttons and checkboxes. They make it possible to specify how the program reacts to movements and clicks of the mouse and to keyboard input.
Canvas:
A class which can be used for drawing graphs.
More detailed information on the methods of all these classes can be found online and in any book on Java. Here we highlight only some of the most general aspects.

Everything works by creating a class which is derived from Frame. For example, in the following example we will have a class Manager which extends Frame. In main(), or somewhere else, an object of this class is created by calling the constructor of this class. In this constructor or elsewhere all components are added to the frame. Then the methods pack() and show() are applied to this object. pack() determines an appropriate size, show() puts the new window on the screen (in newer Java versions show() is deprecated and setVisible(true) is used instead).

The following layout managers are available:

Before adding any component to a container p, it must first be specified which layout manager applies to p. This is done with the method setLayout(), writing p.setLayout(new XxxxLayout()), where Xxxx stands for any of the listed layout managers. Of course, in the constructor of the class derived from Frame, p stands for "this" and can be omitted as usual.

Mostly some input is desired as well. In that case an event listener must be attached to some of the graphical objects, and the class which extends Frame, or one of the other classes, must implement the interface ActionListener. This interface contains the method actionPerformed(), which must be implemented. actionPerformed() has a single argument, an object e of the class ActionEvent. If something happens, such as a click of the mouse, the operating system notifies the program, which causes the method actionPerformed() to be called with a corresponding ActionEvent. In actionPerformed() it can be figured out what has happened, for example by applying the method getSource() to e.
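
The steps described above can be put together in a minimal sketch. The class names ClickCounter and CounterFrame are hypothetical and chosen only for this illustration, and FlowLayout is just one possible choice of layout manager. The sketch uses setVisible(true), because show() is deprecated in newer Java versions. The counting itself is kept in a separate small class, so that it does not depend on a window being available.

```java
import java.awt.Button;
import java.awt.FlowLayout;
import java.awt.Frame;
import java.awt.Label;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;

// Hypothetical helper: keeps track of the number of clicks.
class ClickCounter {
    private int clicks = 0;

    void increment() { clicks++; }
    int getClicks() { return clicks; }
    String text() { return clicks + " clicks"; }
}

// Hypothetical frame with one button and one label.
class CounterFrame extends Frame implements ActionListener {
    private final ClickCounter counter = new ClickCounter();
    private final Button button = new Button("Click");
    private final Label label = new Label(counter.text());

    CounterFrame() {
        super("Counter");
        setLayout(new FlowLayout());      // choose the layout manager first
        add(button);                      // then add the components
        add(label);
        button.addActionListener(this);   // this frame listens to the button
        addWindowListener(new WindowAdapter() {
            @Override
            public void windowClosing(WindowEvent e) {
                dispose();                // closing the window releases it
            }
        });
    }

    // Called by the system whenever the button is clicked.
    @Override
    public void actionPerformed(ActionEvent e) {
        if (e.getSource() == button) {
            counter.increment();
            label.setText(counter.text());
        }
    }

    public static void main(String[] args) {
        CounterFrame frame = new CounterFrame();
        frame.pack();                     // determine an appropriate size
        frame.setVisible(true);           // put the window on the screen
    }
}
```

Because the frame is registered as listener of only one button, the test on getSource() is not strictly needed here; it becomes necessary as soon as the same listener is attached to several components.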

Class Applet

Applet is the Java class that allows a program to be executed within an html-page. To create an applet, one defines a class of one's own which extends Applet.

Embedding Applets

Once an applet has been created it is a triviality to integrate it in an html-page. html, which is an abbreviation of "hypertext markup language", is a small language which is used to format webpages. There are the typical instructions for specifying fonts, for headers, for tables and for enumerations. Of course there is also a way to link pages to other pages. This is an essential feature but does not interest us here. More interesting at this point is that html also supports the integration of pictures and, in a similar way, of applets. An applet is not so much different from an animated picture.

A minimal page containing an applet looks as follows:

  <html>
  <head>
    <title>Nice Applet</title>
  </head>

  <body>
  <h1>Title_of_the_Page</h1>
  Surrounding text, maybe telling what the applet is doing.

  <p>

  <center>
    <applet code   = ProgramName
            width  = 700 
            height = 580
            alt    = "Substitute text">
    </applet>
  </center>

  </body>
  </html>

Once an applet and a webpage of the above type have been created, the applet can be viewed. An applet is intended to be viewed by opening the page with a browser. Thereupon the compiled code of the applet is transferred from the computer hosting the webpage to the client computer and started. This is a rather slow process and when making changes to the applet it is not always easy to convince the computer to reload the code. Therefore, during the development it may be convenient to view an applet with the program appletviewer by typing "appletviewer name_of_webpage".

Detailed Example

For an example with a detailed description of all that is needed we refer to an earlier programming exercise dealing with the customer handling in a Modern Post Office. Several above mentioned points are repeated, but in a more concrete context.

Further Applet Examples

The following examples show some of the possibilities with applets. They are all created with very simple graphical means. They therefore lack sophistication, but they serve their purpose well, they were easy to program and, most importantly, they run with most browsers. It is a very bad practice of many "professional" programmers to design applications that only run with Explorer. In a commercial environment this simply costs customers, in a social context this reduces interest.



Programming Paradigms

Imperative Programming

The first computers were extremely expensive, and it was no problem to have highly specialized staff to program them. The first computers were programmed directly in binary.

Then a more understandable shorthand was developed, which is called assembler. Assembler is in very close correspondence to the machine code. It consists of basic instructions for loading and storing registers, for arithmetic operations, comparisons and jumps within the program. An essential feature of assembler is that each instruction is simply substituted by the corresponding machine code; this is done before the program is executed by a small translation program, which is also called assembler. Because there is such a close correspondence, this is no big issue.

The next step in the development came in the late fifties with Fortran (Formula Translation). Fortran offers many more instructions and allows defining derived types, such as arrays. Fortran is one of the first compiled languages: before the execution starts, the program is fed to a special program called a compiler, which translates the program into assembler.

The idea of compiling a program is crucial, because it allows programs to be written in a more or less human language, while they can still be executed almost as efficiently as dedicated assembler code (a good compiler is even better than an assembler programmer who is not familiar with the details of the underlying hardware). Even more important is that the compiler can be hardware specific. This implies that programs written in Fortran for a Vax computer from 1980 can now be translated to run on a Pentium IV, even though the assembler instructions are probably quite different. Thus, with every new processor model, we need a new compiler for each of the supported languages, but there is no need to rewrite all programs.

Once this idea was around, many languages were developed. Fortran (in a modernized version) is still used in scientific computing. Cobol already came into use during the early sixties and is still used for administrative purposes. Algol 60 and Algol 68 have died, but they inspired Pascal and C. The older languages are less structured, and they lead to code which is often very hard to follow. Algol 68 was a beautiful language with many nice features, very pure and rather abstract, and probably for that reason it never became very popular.

C on the other hand has become terribly popular, though later much of its popularity was taken over by C++ and Java. It is one of the earlier languages from the heyday of structured programming. The main point of the structured-programming paradigm is that a program should be structured by the extensive usage of subroutines, which can also be hierarchically nested. This avoids jumping around, as is common in Fortran: the only supported jumps (though some kind of goto statement still exists in most languages) are in conditional statements, loops and procedure calls. C allows writing very compact code. Because of the possibilities of the language and because there are good compilers, C and C++ are still the best choices when speed is crucial.

Pascal, which was designed by Niklaus Wirth around 1970, is simpler and more structured than C. Many handy but risky constructions which are allowed in C are forbidden in Pascal. With Pascal the structured-programming paradigm reached its full development. Another important aspect which gradually evolved over time is the typing mechanism: in Pascal there is very strict typing. Assignments can be made only between variables whose type is exactly the same. Strict typing prevents sloppy programming, and can help to reduce the number of errors.

Next to these mainstream languages, several more special-purpose languages have been developed. Simula, for example, was designed for simulation purposes. For working with text there are other languages.

Object-Oriented Programming

We have already mentioned structured programming as a guiding idea (paradigm) for the design of programs and for the design of programming languages. Pascal and C and their predecessors focus on the ongoing action, which makes them classical imperative languages: a program is viewed in the first place as a sequence of statements to execute.

For the typical needs of scientific computing, C is all one needs: it is easy to learn, simple to use and leads to fast programs. For software projects there are different needs. Speed is only one of many important aspects, and typically not one of the most important ones. Correctness, understandability for outsiders and extensibility are often much more important. This suggests a modular programming style, such as employed by Modula: a program package consists of boxes with shielded interior.

One step further brings us to object-oriented programming: no longer is the action in the center, but the information which is manipulated. The actions which can be performed on an object of a certain type are an integral part of the type definition. Of course one can ignore this, just like one can ignore the ideas of structured programming when programming in Pascal or C, but the way object-oriented languages are designed very strongly suggests a certain style of programming. Any program using non-trivial data structures profits enormously from the object-oriented style, leading much faster to running programs. Another crucial feature of object-oriented languages is the concept of inheritance. This means that a type is defined as an extension of a previously defined type, inheriting its structure and its functionality. This is precisely the feature one needs for creating data structure libraries: users can extend supplied types to fit their needs.
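
As a small illustration of inheritance, consider the following Java sketch. The classes Counter, BoundedCounter and InheritanceDemo are hypothetical and were invented for this example. Counter plays the role of a supplied type, and BoundedCounter that of a user extension which inherits the instance variable and the methods, refines one of them and adds a new one.

```java
// Hypothetical supplied type: a counter that can only be incremented.
class Counter {
    protected int value = 0;

    void increment() { value++; }
    int get() { return value; }
}

// Hypothetical user extension: inherits value, get() and increment(),
// but refines increment() so that the counter never exceeds max.
class BoundedCounter extends Counter {
    private final int max;

    BoundedCounter(int max) { this.max = max; }

    @Override
    void increment() {
        if (value < max) {
            super.increment();
        }
    }

    boolean isFull() { return value == max; }
}

class InheritanceDemo {
    public static void main(String[] args) {
        // A BoundedCounter can be used wherever a Counter is expected.
        Counter c = new BoundedCounter(2);
        c.increment();
        c.increment();
        c.increment();                   // has no effect, the bound is reached
        System.out.println(c.get());     // prints 2
    }
}
```

Note that the code of Counter does not have to be changed, or even be available in source form, in order to define BoundedCounter.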

Many object-oriented programming languages have been developed. The most important two are C++ and Java. C++ is a follow-up to C, as the name suggests. But, it is much more than that. At the elementary level it contains almost all C instructions and therefore, one can use C++ compilers for compiling C code. However, it is extended by the whole set of object-oriented possibilities, and in addition it has a huge set of standard functions. C++ is huge.

Java is the more recent development. When designing Java the designers tried to learn from earlier mistakes. One of the main focuses in Java is failure prevention. That is why many things are simply forbidden in Java. There are no explicit pointer types, because these are a major source of errors. All pointers are implicit. Java has a much stricter typing mechanism than C and C++. It actually helps: quite complex programs often run correctly once the last syntactical error has been removed. At first Java was nice and small, but as soon as it became popular, the set of standard types (called classes in Java) exploded just as in C++, and now probably no one dares to claim to know the whole language. Java has considerable built-in support for creating graphical applications, and of course Java is the language for animating the internet because it is only one step from a Java program to an applet. Because Java was designed with the internet in mind, Java is in principle an interpreted language (though compilers also exist): the source code is first translated into an intermediate code, which is then interpreted. During the first years this made Java considerably slower than C and C++. This was not too serious, because speed is the main concern of software development only in special cases. Furthermore, this speed difference has disappeared almost entirely. A more important disadvantage of Java, for beginners and experienced programmers alike, is that sometimes easy tasks can only be solved in a seemingly unnecessarily complicated way.

Other Programming Paradigms

There are (at least) two more programming paradigms which do not fit so directly in the line of development which started with assembler and came all the way to Java. One is functional programming. Here one expresses the whole operation of a program by defining it as a function to evaluate. The other is logic programming, in which the operation of the program is expressed as a logical expression to evaluate. These paradigms will be discussed in more detail in dedicated chapters.



Functional Programming: Haskell

Introduction

We have already encountered two programming paradigms: structured programming (in C) and object-oriented programming (in Java). Now we will encounter a third one: functional programming. The discussion in this chapter is based on the functional language Haskell, but most of the discussed aspects can be found in similar ways in other functional programming languages (such as ML and Miranda).

The idea of functional programming is to express the desired action of a program in terms of a function. What does a program do? It somehow transforms input into output. In imperative programming (be it structured or object-oriented), the focus is on the action, so on how this transformation is performed. In functional programming, the focus lies on the transformation itself. A simple example is a function which accepts a set of numbers and returns the one with the smallest value. Functions can be combined to realize more complex functions, just like in mathematics one can write sqrt(e^(cos(x^2))), which for any argument x first applies the square function, then the cosine function, then the exponential function and finally the square-root function.

Neither arguments nor return values have to be numerical, they can be of any of the many types that can be declared. A type is a collection of values which have their most essential aspects in common, so that the same functions can be applied to them. Examples of trivial types are integers, floats, booleans, chars. Examples of non-trivial types are strings, sets, pictures, graphs, ... .

Haskell is a recent functional programming language, dating back to around 1987. The name Haskell is the first name of H.B. Curry, who contributed noticeably to the lambda-calculus theory, which is a mathematical theory of functions and the foundation of functional programming. There are several implementations of Haskell; we will use the Hugs interpreter. Hugs and much more information are freely available on the Haskell homepage at http://www.haskell.org.

If Hugs has been installed on the computer, one first types "hugs" to enter the Hugs environment. Then program lines can be typed in directly or they can be loaded from files. The latter is more convenient. Loading a file is done by typing ":l file_name" (a list of commands can be obtained by typing ":?"). Hereafter the defined functions can be called by simply typing their name followed by their parameters. The answer is printed immediately on the screen. For example, if the factorial function has been defined with name "fac", typing "fac 6" returns 720.

We have learned to perform simple computations in primary school and got so used to these operations that we take them as identities: for many people the expression "(2 + 3) * 4" is considered to be a synonym for the number twenty. However, this is only by convention. Somewhere at the bottom it has been defined what the numbers stand for (1 is one more than 0, 2 is one more than 1, ...), and then it has been defined what addition is (namely by counting) and then what the product operation is (namely by telling that it is a repeated addition). So, it is important to distinguish between the expression, its evaluation and its value.

In a pocket calculator many simple operations (functions) are built in, and pushing the "=" button evaluates the pending expression (though typically intermediate results are evaluated before this). An important point is that we do not have to care about how the evaluation is performed internally.

The implementation of a functional programming language can be viewed as a pocket calculator for which one can define one's own functions. A functional program consists of a number of definitions of types and functions, which ultimately are combined to compute the desired result. The result of a function may either be turned into output (by converting it to a printable string), or it may be used as input to another function.

Like on the pocket calculator, it is irrelevant how the computation below the level of function application is performed. For example, there is no such thing as an array or a for loop. However, there are list data types, and there are numerous standard functions working on lists (like copying, throwing away the first or last n elements, ...). Thus, instead of using a for loop to manipulate an array, we use a list operation to manipulate a list. This makes the formulation much shorter and more abstract. The correctness of a functional program can be proven in a formal way much more easily than the correctness of an imperative program. This is the main argument in favor of using functional programming. On the other hand, designing functional programs requires a special, more abstract way of thinking which takes some time to develop. We tend to think and argue in a dynamic way: "we start at the first element, to which we add i, and then we take the next element and so on, each time adding one less than before".

The text in this chapter is to a large extent extracted from two sources: the lecture notes "Functioneel Programmeren" by Jeroen Fokker (in Dutch) and the book "Haskell, the Craft of Functional Programming" by Thompson. The first is far more systematic; the second is maybe more motivating, with a running example which is gradually developed, presenting the theory in small, easily digestible bits.

Basics of Haskell and Functional Programming

Variables and Functions

Because function application is so common, it is not necessary to write brackets around the argument of a function. So, if we have a function "findMin" and a set "s1", then it is correct to write
  findMin s1
If we also have a function "unify" and a second set "s2", then it is also correct to write
  findMin (unify s1 s2)
In this case the brackets are necessary, because otherwise findMin would interpret unify as an argument, but unify is not a set but a function mapping two sets into one set.

In Haskell a name or identifier is associated with values of a certain type by a declaration involving the operator "::". Such declarations have the following form:

  name    :: type
For example, if we have already defined a type "Set", then we may write
  s1      :: Set
  s2      :: Set
This is nothing special: here variables are declared in terms of a known type, just like we would write "Set s1" in C or Java. Note that in Haskell the names of variables and functions must start with a lowercase letter, whereas the names of types start with a capital.

Much more interesting is the following:

  findMin :: Set -> Int
  unify   :: Set -> Set -> Set
This is different from anything we have seen before: here functions are declared.

Once we have defined the sets and the functions, it is possible to work with these functions without knowing anything about the details of the underlying realization. This important idea, which is called type abstraction, is fundamental to functional programming just as to object-oriented programming.

The general format of function declaration is

  name :: t_1 -> t_2 -> ... -> t_k -> t
Here t_1, t_2, ..., t_k give the types of the formal parameters of the function, t gives the return type. There is no need for a type void, because a function without return value does not need to be called and a void parameter can just as well be omitted.

So, functions are declared just as variables. We will also see how to assign a value, that is, a certain functionality, to a function. Of course this is essential in anything called functional programming, but the elegance and importance of this concept are worth taking an extra moment to think about.

Assigning values is done with the assignment operator "=". For the sake of concreteness we will now give some integer examples, because we do not yet know how to give values to a variable of type Set.

  square :: Int -> Int
  square n = n * n
  i :: Int
  i = 5
  j :: Int
  j = square i + 2
  k :: Int
  k = square (i + 2)

The name of a formal parameter is arbitrary. In particular, it does not need to be different from the names of other variables, as it is strictly local. In a strict sense it is wrong to speak of variables: i, j and k are no variables, as they can be given a value only once. In Haskell there are only constants. The distinction between constants and functions is artificial: a constant is a function with zero parameters which always returns the same value, whereas a function with parameters may return different values depending on the values of its parameters. This is also expressed by the analogous type declarations. In the following we will nevertheless continue to speak of variables, and in the case of a variable we will mostly use the word assignment for the action of giving it a value. For functions it sounds better to speak of definition for the action of allocating a certain functionality to it.

The order of the statements is not important at all, though a logical ordering may improve the readability. It is good practice to specify the type of variables and functions, but for most of them this is clear anyway, and in those cases the formal declaration may be omitted. So, the above fragment of Haskell may also be written as:

  k = square (i + 2)
  i = 5
  square n = n * n
  j = square i + 2

In this text we distinguish between declarations, assignment and definitions, and if we mean just any of them we will speak of statements. In the context of Haskell, for example in the book by Thompson, it is customary though to use the word definition in this extended meaning.

The definition of square tells that any argument of integer type, passed as an actual parameter, is to be substituted for the formal parameter n. In Haskell function application has highest priority. Thus, the value of j is evaluated as (i * i) + 2, which after substitution of 5 for i gives 27. The value of k is evaluated as (i + 2) * (i + 2), which after substitution of 5 for i gives 7 * 7, which is evaluated to 49.

Slightly more complicated is the following function with two parameters:

  i :: Int
  i = 3
  j :: Int
  j = 4
  k :: Int
  squareSum :: Int -> Int -> Int
  squareSum i j = square i + square j
  k = squareSum i j
Here there are two formal parameters i and j. These have nothing to do with the variables i and j. When calling the function, the actual parameters, which by coincidence are also called i and j, are substituted for the formal parameters. The function square is called for each of them and the results are added together, returning 25, which is assigned to k.

The general format for a function definition is as follows:

  name x_1 x_2 ... x_k = e
Here x_1, x_2, ..., x_k are the formal parameters and the returned value in terms of these formal parameters is given by the expression e.

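A first glimpse of the power of functional languages we get by considering the possibility to compose functions. Assume that a function "intSqrt :: Int -> Int" has been defined, which for a number n returns the largest Int m so that m^2 <= n. Then, we can define

  intNorm :: Int -> Int -> Int
  intNorm i = intSqrt . squareSum i
Here the operator "." combines two functions: "squareSum i" is itself a function of type Int -> Int, which still expects the second parameter, and its output is fed into intSqrt. This is the mathematical functional composition. Writing "intSqrt . squareSum" without the parameter i would be rejected, because the result of applying squareSum to a single argument is a function and not an Int.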

In mathematics it is not uncommon to apply operators to functions: f + g, f^2, f . g are all well-defined. In all cases the definition of operators on functions is given by telling what the value of the resulting function is for an argument, for example, the function f + g is defined by saying that for all x, (f + g) (x) = f(x) + g(x). Likewise, f . g is the function so that for all x, (f . g) (x) = f (g (x)).
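
The rule (f . g) (x) = f (g (x)) can be tried out directly. The sketch below assumes one possible implementation of intSqrt (a simple upward search, chosen only for illustration) and uses a hypothetical function "rootOfSquare", which does not appear elsewhere in this text.

```haskell
square :: Int -> Int
square n = n * n

-- Assumed implementation: search upwards for the largest m with
-- m^2 <= n. Only meaningful for n >= 0.
intSqrt :: Int -> Int
intSqrt n = search 0
  where
    search m
      | (m + 1) * (m + 1) > n = m
      | otherwise             = search (m + 1)

-- Composition: the output of square is fed into intSqrt.
rootOfSquare :: Int -> Int
rootOfSquare = intSqrt . square

-- The same function written out for an explicit argument.
rootOfSquare' :: Int -> Int
rootOfSquare' x = intSqrt (square x)
```

Both variants return 7 for the argument 7, and intSqrt 10 gives 3.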

Defining Functions

Basic Function Definitions

We have already seen the following two ways of defining functions: by an expression in terms of the formal parameters, as for square and squareSum, and by composition of functions, as for intNorm. The composition of functions is to be distinguished from the above definition of squareSum, which was also given in terms of the function square, but at a lower level: there the added numbers are integers, not functions.

These two possibilities of defining functions are analogous to the two possibilities of assigning values to a variable:

Both kinds of definitions are in terms of earlier defined functions. For numerical functions this may work, but for functions over self-defined data types we must be able to start somewhere.

Definition by Enumeration

Definitions can also be made by enumeration. This is most important for functions with parameters of Boolean type, which only has a finite number of different values:
  not :: Bool -> Bool
  not True  = False
  not False = True

  or :: Bool -> Bool -> Bool
  or False False = False
  or False True  = True 
  or True  False = True 
  or True  True  = True 

  and :: Bool -> Bool -> Bool
  and False False = False
  and False True  = False
  and True  False = False
  and True  True  = True 

  exor :: Bool -> Bool -> Bool
  exor False False = False
  exor False True  = True 
  exor True  False = True 
  exor True  True  = False
When executing a Haskell program containing a line "or x y", the values of x and y are looked up, and the lines defining the or function are checked from top to bottom for one which matches. If there is one, the appropriate value is returned; otherwise an error message is produced.

The usage of literals in the definition of a function can be combined with parameters. Further savings can be made by using other functions. The above function definitions may also be written shorter:

  or :: Bool -> Bool -> Bool
  or False False = False
  or x     True  = True 
  or True  x     = True 

  and :: Bool -> Bool -> Bool
  and x     False = False
  and False x     = False
  and True  True  = True 

  exor :: Bool -> Bool -> Bool
  exor x y = and (or x y) (not (and x y))
In the definition of exor the brackets are necessary because a line is processed from left to right, and otherwise the function "and" would consider "or" to be its first argument.
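
That the short definition of exor agrees with the enumeration can be checked for all four combinations of arguments. The following is a sketch under some assumptions: the prelude versions of not, or and and are hidden to avoid name clashes, the wildcard pattern "_" (which matches any value) is used to shorten the enumerations, and the names "exorEnum" and "agree" are introduced here for illustration.

```haskell
import Prelude hiding (not, or, and)

not :: Bool -> Bool
not True  = False
not False = True

or :: Bool -> Bool -> Bool
or False False = False
or _     _     = True

and :: Bool -> Bool -> Bool
and True True = True
and _    _    = False

-- Definition by enumeration.
exorEnum :: Bool -> Bool -> Bool
exorEnum False False = False
exorEnum False True  = True
exorEnum True  False = True
exorEnum True  True  = False

-- Definition in terms of other functions.
exor :: Bool -> Bool -> Bool
exor x y = and (or x y) (not (and x y))

-- True if both definitions agree on the given arguments.
agree :: Bool -> Bool -> Bool
agree x y = exor x y == exorEnum x y
```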

Definition by Case Distinction

The most important additional feature we need is a more subtle mechanism to distinguish cases. In Haskell this is realized by a construction involving so-called guards. Consider the following:
  absDif :: Int -> Int -> Int
  absDif x y 
    | x >= y    = x - y
    | otherwise = y - x
This function returns the absolute value of the difference of x and y: from two arguments x and y it returns x - y if x >= y and otherwise it returns y - x. Of course even the boolean functions can be defined using guards, but this makes the definitions neither shorter nor clearer:
  not :: Bool -> Bool
  not x 
    | x == True  = False
    | x == False = True

So, instead of one expression we find several expressions which are conditioned by the boolean expressions called guards appearing after the "|" symbols. For any set of actual parameters substituted for the formal parameters, the guards are evaluated in the order they appear and the returned value is the one given by the expression standing after the first guard evaluating to True. A similar approach is followed for a function definition by enumeration: the first matching alternative is chosen. For functions it is not tested whether there is a definition for all possible values of the parameters, nor whether definitions are conflicting.
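
That the order of the guards matters can be seen when the guards overlap. The following sketch uses two hypothetical functions, "grade" and "gradeWrong", which differ only in the order of their first two guards.

```haskell
-- The guards overlap: 95 satisfies both p >= 90 and p >= 50.
-- The first guard that evaluates to True decides the result.
grade :: Int -> String
grade p
  | p >= 90   = "excellent"
  | p >= 50   = "pass"
  | otherwise = "fail"

-- Same guards in a different order: the first guard now catches
-- every p >= 50, so the second alternative is never reached.
gradeWrong :: Int -> String
gradeWrong p
  | p >= 50   = "pass"
  | p >= 90   = "excellent"
  | otherwise = "fail"
```

For the argument 95 the first function returns "excellent" and the second "pass", without any warning about the unreachable alternative.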

The general form for a function definition with guards is as follows:

  name x_1 x_2 ... x_k 
    | g1        = e1
    | g2        = e2
          ...
    | otherwise = e

Alternatively, this may also be done with help of an if-then-else construction:

  absDif :: Int -> Int -> Int
  absDif x y = if x >= y then x - y else y - x
The variant with guards makes a more `functional' impression and is usually preferred.

Guards allow us to formulate recursive functions:

  fac :: Int -> Int
  fac n
    | n == 0    = 1
    | otherwise = fac (n - 1) * n
Given that in a functional language there are no loop-statements, recursion is the only way to assure a repeated execution while not having to write the iteration out in code explicitly.

The above works fine for all n > 0 and even for n == 0 the correct value is returned. But how about a call "fac (-3)"? (The brackets are needed: "fac -3" would be read as the subtraction "fac - 3".) It calls "fac (-4)", which calls "fac (-5)", which ... . Of course the function "fac" should not be called with negative values, but it is better to catch this erroneous behavior in a decent way:

  fac :: Int -> Int
  fac n
    | n == 0     = 1
    | n > 0      = fac (n - 1) * n
    | otherwise  = error "fac only defined for non-negative numbers"
Haskell has a built-in error mechanism. It is more convenient than the error mechanism in Java: if an "error" is encountered, the execution is interrupted with some notice which may also contain a user-specified string.

We consider one more recursive function. The famous Fibonacci numbers (named after a 13th century mathematician from Pisa) are given by

fib(0) = 0, fib(1) = 1, fib(n) = fib(n - 1) + fib(n - 2), for all n > 1.
The sequence starts as follows: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ... .

This definition immediately leads to the following recursive Haskell function, showing how easily mathematical formulations can be turned into Haskell scripts:

  fib :: Int -> Integer
  fib n
    | n == 0     = 0
    | n == 1     = 1
    | n > 1      = fib (n - 1) + fib (n - 2)
    | otherwise  = error "fib only defined for non-negative numbers"

This function is correct, but leads to a very inefficient computation: it is easy to show that the number of recursive calls to the function fib is exponential in n. The general risk with recursion is that it leads to elegant short formulations which are extremely inefficient. Further down we will see how to compute fib(n) in a time that is linear in n.

To be precise, in the above case the number of calls to fib equals 2 * fib(n + 1) - 1, which is proportional to g^n, for g = (1 + sqrt(5)) / 2 ~= 1.62. This number g is called the golden ratio and already the ancient Greeks (Pythagoras) were fascinated by it.
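
The growth of the number of calls can be made visible by counting them. The following sketch introduces a hypothetical helper "calls", which mirrors the recursion of fib and counts one for every call, including the first one.

```haskell
fib :: Int -> Integer
fib n
  | n == 0    = 0
  | n == 1    = 1
  | n > 1     = fib (n - 1) + fib (n - 2)
  | otherwise = error "fib only defined for non-negative numbers"

-- Hypothetical helper: the number of calls to fib made when
-- evaluating fib n, for n >= 0.
calls :: Int -> Integer
calls n
  | n <= 1    = 1
  | otherwise = 1 + calls (n - 1) + calls (n - 2)
```

Evaluating calls 10 gives 177, and calls 20 already gives 21891, which in both cases equals 2 * fib (n + 1) - 1.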

Operators and Functions

The notions operator and function are to be distinguished. An operator is a binary function which is written between its two arguments (so operators are "infix"). Functions appear before their arguments (so functions are "prefix").

In most computer languages one can define functions, but the possibility to define operators is less common. Operators can have names which are composed of the following symbols: !, #, $, %, &, *, +, ., /, <, =, >, ?, @, \, ^, |, :, - and ~.

An operator is nothing but a binary infix function. An operator with name "&&&&" may be turned into a binary prefix function by enclosing it in round brackets: "(&&&&)". This even holds for the predefined operators such as "+". For example one can write

  (+) 2 3
These brackets can be used to define an operator just as any other function. For example:
  (&&&&) :: Int -> Int -> Int
  i &&&& j = square i + square j

It is also possible to use binary functions like an operator in an infix way: a binary function is turned into an operator by enclosing it in back quotes. For example, for a binary function "max" we may write

  k = i `max` j
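
Both conversions can be tried in a few lines. This is a minimal sketch; the names "five", "seven" and "sameResult" are chosen here for illustration, and "max" is the function from the prelude.

```haskell
-- An operator used as a prefix function.
five :: Int
five = (+) 2 3

-- A binary function used as an infix operator.
seven :: Int
seven = 3 `max` 7

-- Both notations denote the same function.
sameResult :: Bool
sameResult = (2 + 3 == (+) 2 3) && (max 3 7 == 3 `max` 7)
```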

It is well-defined how to execute a line with several functions, variables and numbers: anything within brackets is to be executed first. After eliminating all brackets, the line is read from left to right, where every function looks for the arguments it needs to the immediate right of it.

Operators are more flexible in their behavior than functions. Using brackets any desired execution order can be completely specified. However, relying on their associativity and fixity most brackets can be saved. Fixity is what is also called binding power or priority in other languages. It tells us which operator to apply first when in an expression operators with different fixities appear (such as "*" and "+").

The associativity tells us how to evaluate an expression involving operators with the same fixity. For example, 3 - 4 + 5 == (3 - 4) + 5 == 4 and not 3 - 4 + 5 == 3 - (4 + 5) == -6. The reason is that "+" and "-" have the same fixity and that they are left-associative (only in the Netherlands "+" has a higher priority than "-" and there the second answer is correct). An operator is left-associative if the evaluation goes from left to right, and right-associative if it goes from right to left. Many operators are left-associative, but not all. In C and Java the assignment operator "=" is right-associative. Also the power function "^" is commonly, and in Haskell, right-associative: 3^3^3 == 3^(3^3) == 3^27 and not 3^3^3 == (3^3)^3 == 3^9.

One might think that there are only two categories of associativity, but there are four! It is namely also possible that an operator is non-associative, which means that it is simply forbidden to have an expression of the form "a ~ b ~ c", where "~" stands for an operator. Examples of non-associative operators are the comparison operators "==", "<", ">", ... . So, in Haskell it is forbidden to write True == False == False; this expression must be clarified by using brackets. The division operator "/", by contrast, is left-associative: 45 / 3 / 5 == (45 / 3) / 5 == 3.

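On the other hand, operators may be simply associative. Here it does not matter how the expression is evaluated: (3 + 4) + 5 == 3 + (4 + 5), so it cannot matter whether we first compute 3 + 4 and then add 5 or first compute 4 + 5 and then add 3. This is a mathematical property of the operation; in Haskell such an operator is nevertheless declared left- or right-associative, so that the evaluation order is fixed ("+" is declared left-associative).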
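
The associativity of the predefined operators can be observed with a few definitions. The following is a small sketch; the names a, b and c are chosen here for illustration.

```haskell
-- "-" and "+" have the same fixity and are left-associative.
a :: Int
a = 3 - 4 + 5          -- read as (3 - 4) + 5

-- "^" is right-associative.
b :: Integer
b = 2 ^ 3 ^ 2          -- read as 2 ^ (3 ^ 2)

-- Brackets force the other order.
c :: Integer
c = (2 ^ 3) ^ 2
```

Here a is 4, b is 2^9 == 512 and c is 8^2 == 64.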

For all predefined operators the fixity and associativity is fixed. In case of doubt it is the best to use brackets, alternatively it can be looked up in a table. Essentially the rules are the same as for C and Java. The fixity ranges from 0 (lowest) to 9 (highest).

The default fixity of self-defined operators is 9. However, the fixity and associativity can be set in a simple way. For example

  infixl 6 &&&&
attributes fixity 6 and left-associativity to the operator "&&&&". Using "infixr" would have made it right-associative.

Fixity and associativity can even be attributed to binary functions:

  infixl 8 `max`
assigns fixity 8 and left-associativity to the binary function "max" in case it is used as an operator.
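
The effect of such a declaration shows up as soon as the operator is mixed with others. The following sketch repeats the operator "&&&&" from above with fixity 6; the names p and q are chosen here for illustration.

```haskell
square :: Int -> Int
square n = n * n

-- Fixity 6 (the same as "+") and left-associative.
infixl 6 &&&&

(&&&&) :: Int -> Int -> Int
i &&&& j = square i + square j

-- "*" has fixity 7 and is applied first: 2 &&&& (3 * 2).
p :: Int
p = 2 &&&& 3 * 2

-- Left-associativity: (1 &&&& 2) &&&& 3.
q :: Int
q = 1 &&&& 2 &&&& 3
```

Here p is 4 + 36 == 40 and q is 25 + 9 == 34. With the default fixity 9 the first expression would be read as (2 &&&& 3) * 2, which gives 26.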

Basic Types

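In Haskell there are two numerical types: Int for integers and Float for floating point numbers. There is no automatic type conversion, but the function "floor" makes an Int from a Float by rounding down (the function "truncate" throws away the decimal fraction) and "fromIntegral" (older versions of Hugs also offer "fromInt") converts an Int into a Float with the same value. Haskell is very strict. Therefore, for a variable i of type Int it is not correct to write "i + 5.6": the operator "+" is defined only when both arguments are Float or both are Int. Literals are more flexible: "4 + 5.6" is accepted, because the literal 4 can also stand for a Float.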

In fact, there are two types for integral numbers. "Int" is a fixed-size type as in other languages: in Hugs it has 32 bits, positive and negative, thus the largest positive number (in two's complement) is 2^31 - 1. The language definition only guarantees 30 bits, and compilers on 64-bit machines typically use 64 bits. There is also the type "Integer", which can be used to exactly represent arbitrarily large numbers (within the limits of the available memory).

There are also two floating point number types: besides Float there is, just as in C and Java, a type Double, which can be used for double precision numbers.

Because Haskell has strong typing and no automatic type conversion, it may be important to explicitly denote the type of an expression or a literal: it is not clear what "2 + 3" stands for, this might be an Int or an Integer, because both literals can have either of the types and the operator "+" is overloaded. Fixing a type is done in the same way it was done in declarations: with help of the operator "::". Thus, we might write "(2 + 3) :: Int". The brackets stress that we mean the type of the whole expression and not just of the final number; they may be omitted, because "::" binds weaker than any operator.
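
The conversion functions and the two integral types can be tried out as follows. This is a minimal sketch using the standard functions fromIntegral, floor and truncate; the names x, down, cut and big are chosen here for illustration.

```haskell
i :: Int
i = 4

-- Explicit conversion from Int to Float before adding.
x :: Float
x = fromIntegral i + 0.5

-- floor rounds down, truncate drops the fraction.
down :: Int
down = floor (-2.5 :: Float)

cut :: Int
cut = truncate (-2.5 :: Float)

-- Integer is exact for arbitrarily large values.
big :: Integer
big = 2 ^ 100
```

Here x is 4.5, down is -3, cut is -2 and big is 1267650600228229401496703205376, a number that does not fit into an Int.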

The type Bool is used for Booleans. It has two literals: True and False.

Characters have the type Char. Literals of type Char are written in single quotes: 'a', 'b', 'c', ... . There are the usual conversion functions between Int and Char (in current Haskell systems they must be imported from the module Data.Char):

  ord :: Char -> Int
  chr :: Int  -> Char
With help of these functions it is easy to perform operations on characters, for example testing whether a character is a digit:
  isDigit :: Char -> Bool
  isDigit c = (ord '0' <= ord c) && (ord c <= ord '9')
However, in this case we can save the calls to ord, because these are built into the system: the comparison operators "<", "<=", ">=" and ">" can be applied directly to characters, comparing the underlying numbering of these. Thus, we can also write more simply:
  isDigit :: Char -> Bool
  isDigit c = ('0' <= c) && (c <= '9')
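
The following sketch puts the character functions together in one script. It assumes a current Haskell system, where ord and chr are imported from the module Data.Char; the helpers "digitValue" and "next" are hypothetical and only serve as illustration.

```haskell
import Data.Char (ord, chr)

isDigit :: Char -> Bool
isDigit c = ('0' <= c) && (c <= '9')

-- Hypothetical helper: the numerical value of a digit character.
digitValue :: Char -> Int
digitValue c = ord c - ord '0'

-- Hypothetical helper: the character following c in the numbering.
next :: Char -> Char
next c = chr (ord c + 1)
```

Here isDigit '7' is True, isDigit 'a' is False, digitValue '7' is 7 and next 'a' is 'b'.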

Another predefined type is String. It is not a primitive type (but a list of Char), but it has special operators. The literals of this type are enclosed in double quotes: "aabb", "", "a", ... . Here we were showing a string of length 4, composed of the characters 'a', 'a', 'b' and 'b', an empty string, containing zero characters and a string of length 1 containing the single character 'a'. A string of length 1 is to be distinguished from a character, because they have different types. Therefore there are different functions and operators that can be applied to them. For example writing "ord "a"" results in a type error:

  Main> ord "a"
  ERROR - Type error in application
  *** Expression     : ord "a"
  *** Term           : "a"
  *** Type           : String
  *** Does not match : Char
Further down, after discussing lists in general, we will come back to strings.

Structure of a Script

Every language has its own idiosyncrasies. For example in Java procedures are called methods. In Haskell a program is rather called a script.

Traditional and Literate Scripts

The developers of Haskell wanted to underline the importance of comments and to facilitate their usage. That is why there are two types of scripts: traditional scripts and literate scripts. In a traditional script, comments are enclosed between "{-" and "-}" pairs of brackets. A one-line comment can also be written after a double dash "--". In a literate script, the comment text is put in the foreground. So, here everything is comment except when it is indicated that a line is code, which is done by starting it with the symbol ">". For the interpreter to know what kind of script it is dealing with, a traditional script should be written in a file with name xxxx.hs, a literate script in a file with name xxxx.lhs.

For example, we might store the following script in a file myScript.hs

  {- Here may come a longer comment telling the name of the author 
     and the date the script was written or updated. It should also 
     be mentioned what the program is doing, how the input, if any,
     should be given and what output is produced. -}

  -- Two ints are declared and initialized
  i :: Int
  i = 3
  j :: Int
  j = 4

  -- The function squareSum with two arguments is declared and defined
  squareSum :: Int -> Int -> Int
  squareSum i j = square i + square j

  -- A third integer is declared and gets assigned the square sum 
  -- of i and j by calling the function squareSum.
  k :: Int
  k = squareSum i j

The same script can also be given in a literate form, storing it in a file myScript.lhs. Code lines must be separated from the comment text by blank lines:

  Here may come a longer comment telling the name of the author
  and the date the script was written or updated. It should also
  be mentioned what the program is doing, how the input, if any,
  should be given and what output is produced.

  Two ints are declared and initialized

  > i :: Int
  > i = 3
  > j :: Int
  > j = 4

  The function squareSum with two arguments is declared and defined

  > squareSum :: Int -> Int -> Int
  > squareSum i j = square i + square j

  A third integer is declared and gets assigned the square sum
  of i and j by calling the function squareSum.

  > k :: Int
  > k = squareSum i j

Considering the mathematical and abstract nature of Haskell, the literate variant of scripts is not unnatural at all: the actual code will be relatively short because it is based on powerful hidden operations which allow one to express ideas concisely, but at the same time it may require some more explanation than well-written code in an imperative language.

The goal is to come away from hacking and to write code fragments with a clearly specified behavior which is also testable. So, even though the only difference between a traditional and a literate script is in the layout, which one can consider a minor detail, it marks an important conceptual difference.

An elegant, but irrelevant, feature of Haskell is that comments may be nested: a comment can stand inside a comment. In C, the following does not work

  /* This is a long comment /* here we find a nested comment */ the
     comment continues over two lines. */
After finding the "/*" the (pre)compiler only searches for the closing bracket "*/". It does not count the number of "/*" standing open. In Haskell this is done, so there it is correct to write
  {- This is a long comment {- here we find a nested comment -} the
     comment continues over two lines. -}

Modules

Haskell has a module concept. A module is a component of program text. So, for a large software project, a certain number of definitions can be packed together in a module. A module is declared as follows:
  module ElementaryFunctions where
    -- here follow the definitions 
    ...

There is the possibility of importing the definitions from other modules by using the keyword "import" in combination with the name of the other module. Inside a module it can be specified which definitions are available for export to other modules. For the purpose of this lecture the module concept is of no importance, but if need arises one should know that it exists. There is a certain analogy between modules and classes in object-oriented languages: classes can be inherited, and the access modifiers (private, ...) allow one to control the access to variables and methods.

Layout Rules

Haskell economizes on additional symbols: there is no need to place brackets around a function argument; there is no need to mark the end of a command by a semicolon; if a function has several arguments these are simply separated by blanks and so on.

Normally everything on a line belongs to a single declaration, assignment or definition, and therefore there is no need for special separator symbols. However, one may write several statements on the same line. In that case the special end symbol ";" must be used. For example, one may write

  i :: Int ; i = 3
  j :: Int ; j = 4

An important rule for saving brackets is the offside rule. This rule tells what belongs to the definition of a function. Above we have written

  squareSum i j = square i + square j
  k = squareSum i j
However, this might also have been formatted as follows:
  squareSum i j = 
    square i + 
    square j
  k = squareSum i j
or in many other ways.

The rule is that a definition ends when closing an invisible box: the box contains the line on which it starts and all following lines as long as they begin to the right of the first letter of the definition. So, in the example the two lines containing "square i" and "square j" belong to the box opened by the word "squareSum", but the line "k = squareSum i j" does not belong to this box because the k stands again as far left as the word "squareSum".

This rule also implies that it is wrong to write the following:

  squareSum i j = 
  square i + 
  square j
Here we will get an error message telling that an unexpected ";" is encountered. The reason is that internally at the end of every box a ";" is placed. Here the first box ends at the "=" symbol. However, "=" should not have an empty right-hand side. The next box ends after the "+" symbol. However, "+" should not have an empty right-hand side.

Of course many things are correct, which are nevertheless not good. It is a good idea to align alternatives vertically and to use fixed indentation rules. Also, a ";" should be separated by blanks rather than written connected to the last letter of the definition: even though formally it is an end symbol, it is more natural to perceive it as a separator.

Naming Rules

In Java there are conventional naming rules, in Haskell these are enforced.

Names of the following categories must start with a small letter: variables, functions, formal parameters and type variables.

The last we have not encountered yet.

Names of the following categories must start with a capital: types, constructors, modules and type classes.

A constructor is a word like True. Type classes we have not seen yet.

Of course some words are so-called reserved words, which are used by the language and which cannot be used for denoting own defined symbols. Examples are "case", "do", "if", "import", "type", "where".

More interesting is the problem that one imports a module in which a name appears that one would like to reuse. This is not an advanced problem, because by default, without explicitly requiring this, the module "Prelude.hs" is imported. In this prelude there are definitions of many elementary functions, such as

  min :: Int -> Int -> Int
  max :: Int -> Int -> Int
which compute and return the minimum and maximum of two integers.

If one uses the word "min" or "max" as a name for an own defined function then an error message complaining about a clash with the imported function is the result. So, there is no automatic overwriting or hiding mechanism as in Java. In this case one can make the import of the prelude explicit and specify that min and max are not to be imported, using the keyword "hiding":

  import Prelude hiding (max, min)
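
A complete script using this import might look as follows. This is a sketch: the definitions of max and min are written out here as an example, and "spread" is a hypothetical function that uses both.

```haskell
import Prelude hiding (max, min)

-- Own definitions; without the "hiding" clause these names would
-- clash with the versions from the prelude.
max :: Int -> Int -> Int
max x y
  | x >= y    = x
  | otherwise = y

min :: Int -> Int -> Int
min x y
  | x <= y    = x
  | otherwise = y

-- Hypothetical example function using both.
spread :: Int -> Int -> Int
spread x y = max x y - min x y
```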

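It is not allowed to give the same name to a variable and a function defined at the same level, because both live in the same name space. A formal parameter may reuse the name of a global variable, as in squareSum above (for the interpreter the distinction is clear), but this is usually confusing and therefore discouraged.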

Tuples, Lists and Strings

In Haskell there are no for-loops and no arrays. These concepts, the workhorses of imperative programming, do not fit well in the functional framework. Instead there are several type constructions which allow one to collect several other objects together, and operations which work on all elements. Internally these collections may very well be realized as arrays, but this realization is invisible.

These collections appear in two categories: the tuples collect a specified number of objects which may have different types (like records in Pascal or classes in Java), the lists collect an unspecified number of items of the same type (like linked lists). Strings are nothing more than lists of characters which have gotten their own type as a concession to convenient programming.

Tuples

Definition and Usage

The general specification of a tuple type has the following form:
  (t_1, t_2, ..., t_n)
The instances of such a tuple type look like
  (v_1, v_2, ..., v_n)
where v_i must be of type t_i, for all i.

As examples of tuples we consider the following:

  ("Umeå", -12.7, -4.0, 12)
  ("Halle", 2.2, 4.5, 1)
These tuples are of type (String, Float, Float, Int), and might give the weather data of a city: name, minimum temperature, maximum temperature and precipitation.

Of course one can define variables of tuple types:

  weatherData :: (String, Float, Float, Int)
  weatherData = ("Umeå", -12.7, -4.0, 12)

As a second example we consider a personal record, consisting of last name, first name and age. These tuples have type (String, String, Int). A literal of this type is given by ("de Boer", "Henk", 32).

Functions can work on tuples and return them as results:

  increaseAge :: (String, String, Int) -> (String, String, Int)
  increaseAge (x, y, z) = (x, y, z + 1)

Using explicit type definitions helps to do this in a more concise way and allows for data abstraction as well. Therefore, it is a very good practice to explicitly define types and use the name of the type instead of writing out the structure in all usages. For our example we could define the following:

  type Person = (String, String, Int)

Then a small script might look as follows:

  increaseAge :: Person -> Person
  increaseAge (x, y, z) = (x, y, z + 1)

  myFriend :: Person
  myFriend = ("de Boer", "Henk", 32)
  -- evaluating "increaseAge myFriend" gives ("de Boer", "Henk", 33)

Functions like "increaseAge" returning a tuple offer a new possibility: We already have encountered functions with several parameters (x, y and z could also have been passed as three separate parameters instead of as one tuple), but before we were not able to return more than a single value. Here the function returns a tuple which contains several values.

The above function call works with pattern matching: when providing the actual parameter ("de Boer", "Henk", 32) of type Person, the fields in this parameter are matched against the fields in the formal parameter (x, y, z). One should realize that this is one more level of unpacking than what we have seen before.
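
Returning several values and unpacking them again can be sketched with two small functions. Both "minMax" and "range" are hypothetical examples that do not appear elsewhere in this text.

```haskell
-- Hypothetical example: return the smaller and the larger of two
-- numbers as one pair.
minMax :: Int -> Int -> (Int, Int)
minMax x y
  | x <= y    = (x, y)
  | otherwise = (y, x)

-- The pattern (lo, hi) unpacks the pair given as actual parameter.
range :: (Int, Int) -> Int
range (lo, hi) = hi - lo
```

Evaluating minMax 7 3 gives (3, 7), and range (minMax 7 3) gives 4.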

Selector Functions

Using so-called selector functions the fields can also be accessed without this deeper unpacking. A selector function applied to a tuple returns the value of one field. In the prelude these are only defined for two-tuples. The function "fst" returns the first field of a two-tuple, the function "snd" returns the second field. This works over any underlying type, a first example of true polymorphism in Haskell.

So, if we had written:

  type Name   = (String, String)
  type Person = (Name, Int)
we could define
  lastName    :: Person -> String
  lastName    p = fst (fst p)
  firstName   :: Person -> String
  firstName   p = snd (fst p)
  increaseAge :: Person -> Person
  increaseAge p = (fst p, snd p + 1)
However, it appears that the equivalent definitions with pattern matching are easier to formulate and understand, and therefore these are mostly to be preferred.
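
For comparison, both styles can be put side by side for the nested Person type. This is a sketch: the primed name "increaseAge'" and the pattern variables are chosen here for illustration, and the wildcard "_" stands for a field that is not used.

```haskell
type Name   = (String, String)
type Person = (Name, Int)

-- With selector functions.
increaseAge :: Person -> Person
increaseAge p = (fst p, snd p + 1)

-- With pattern matching; the nested pattern unpacks both levels.
increaseAge' :: Person -> Person
increaseAge' ((lastN, firstN), age) = ((lastN, firstN), age + 1)

lastName :: Person -> String
lastName ((lastN, _), _) = lastN

myFriend :: Person
myFriend = (("de Boer", "Henk"), 32)
```

Both versions of increaseAge return (("de Boer", "Henk"), 33) for myFriend.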

Example: Bottom-Up Computation

We consider how to compute the Fibonacci numbers without a recursive explosion.

In C or Java it is easy to compute the Fibonacci numbers:

  int fib(int n) {
    int i, x, y, z;
    if (n == 0)
      return 0;
    if (n == 1)
      return 1;
    for (x = 0, y = 1, i = 1; i < n; i++) {
      z = x + y; x = y; y = z; }
    return y; }

In the following we show how this behavior can be imitated in a functional way. This is done by a trick that allows us to simulate the efficient bottom-up computation, starting at the low side, rather than a conventional recursive top-down computation starting at the high side.

The key operation is the following function "fibStep" which performs a single step of the computation. This function is called in an indirect way with one recursive call for each level instead of two, thus preventing the exponential behavior:

  fibStep :: (Integer, Integer) -> (Integer, Integer)
  fibStep (x, y) = (y, x + y)

  fibPair :: Int -> (Integer, Integer)
  fibPair n
    | n == 0      = (0, 1)
    | n >  0      = fibStep (fibPair (n - 1))
    | otherwise   = error "No negative values!"

  fastFib :: Int -> Integer
  fastFib = fst . fibPair

How does this work? When calling fastFib 6, it first computes fibPair 6 and then takes the first component of the returned tuple. So, the most important is to understand how fibPair 6 works. From the function definition we can see that it unrolls as follows:

  fibPair 6
  fibStep (fibPair 5 )
  fibStep (fibStep (fibPair 4 ))
  fibStep (fibStep (fibStep (fibPair 3 )))
  fibStep (fibStep (fibStep (fibStep (fibPair 2 ))))
  fibStep (fibStep (fibStep (fibStep (fibStep (fibPair 1 )))))
  fibStep (fibStep (fibStep (fibStep (fibStep (fibStep (fibPair 0 ))))))
  fibStep (fibStep (fibStep (fibStep (fibStep (fibStep (0, 1) )))))
  fibStep (fibStep (fibStep (fibStep (fibStep (1, 1) ))))
  fibStep (fibStep (fibStep (fibStep (1, 2) )))
  fibStep (fibStep (fibStep (2, 3) ))
  fibStep (fibStep (3, 5) )
  fibStep (5, 8)
  (8, 13)

So, first we only go down, `counting' how far we still have to go. For each further level down, one function call is added. When reaching the bottom of the recursive definition, some result is substituted and the `saved' function calls are executed, just as in an imperative program with a loop repeatedly calling the function.
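
The comparison with a loop can be made even more direct. The following sketch is a variation, not the formulation used above: a hypothetical helper "steps" carries the current pair along as a parameter and applies fibStep k times, much like the loop body in the C program.

```haskell
fibStep :: (Integer, Integer) -> (Integer, Integer)
fibStep (x, y) = (y, x + y)

-- Hypothetical helper: apply fibStep k times to the pair p.
-- For k <= 0 the pair is returned unchanged.
steps :: Int -> (Integer, Integer) -> (Integer, Integer)
steps k p
  | k <= 0    = p
  | otherwise = steps (k - 1) (fibStep p)

-- Hypothetical variant of fastFib built on steps.
loopFib :: Int -> Integer
loopFib n = fst (steps n (0, 1))
```

Evaluating steps 6 (0, 1) gives (8, 13), so loopFib 6 is 8, and loopFib 100 gives 354224848179261915075 without noticeable delay.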

Proving Correctness and Time Consumption

The function "fastFib" is a typical example of a functional program. Somehow it looks good, but the formulation is so intricate that it takes a while to understand what it is doing. On the other hand, this is also a positive example: the formulation is so mathematical, that we can prove that when calling fastFib with parameter n, the returned value is indeed equal to the number fib(n) defined above. In the following argument we will denote by f(n) the value returned by a function when calling it with actual parameter n.

Of course such a proof goes by complete induction. What do we want to prove exactly? We want to prove that fastFib(n) == fib(n). This appears to be quite analogous to the proof that the function "fac" computes the factorial function. Let us prove this first, as an exercise so to say. We recall that "fac" was defined as follows:

  fac n
    | n == 0    = 1
    | otherwise = fac (n - 1) * n

We must show that fac(n) == n!. The basis is given by the case n == 0: fac(0) =alg= 1 =def= 0!. So, assume that fac(i) == i! for all i < n. Then, fac(n) =alg= fac(n - 1) * n =ind= (n - 1)! * n =def= n!. Here "=alg=" denotes an equality because of the algorithm, "=ind=" denotes an equality because of the induction assumption, while "=def=" indicates an equality because of a mathematical definition. The argument is short and convincing.

Let us now try to apply the same method to proving that fastFib(n) == fib(n), where fib(n) denotes the mathematically defined n-th Fibonacci number. fastFib(0) == fst(0, 1) == 0 == fib(0). So, this gives a good basis. However, if we want to go on, we have no foundation for an argument: the algorithm works with the pairs and not with the numbers fastFib(n). So, accordingly we must make a stronger claim, from which the claim we want to prove will follow. This stronger claim is that fibPair(n) == (fib(n), fib(n + 1)). If this holds, then clearly (fst . fibPair)(n) =def= fst(fibPair(n)) =ass= fst(fib(n), fib(n + 1)) =def= fib(n).

So, it suffices to prove that fibPair(n) == (fib(n), fib(n + 1)) for all n >= 0. Let us try to do this now. For n == 0, we have fibPair(0) =alg= (0, 1) =def= (fib(0), fib(1)). This is our basis. So, assume that fibPair(i) == (fib(i), fib(i + 1)), for all i < n, for some given n. Then, fibPair(n) =alg= fibStep(fibPair(n - 1)) =ind= fibStep((fib(n - 1), fib(n))) =alg= (fib(n), fib(n - 1) + fib(n)) =def= (fib(n), fib(n + 1)). This completes the proof. The reader should consider how much harder it would have been to prove the same correctness of the computation in an imperative programming language.
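
A proof cannot be replaced by testing, but the stronger claim can at least be checked for small values of n. The following sketch repeats the definitions from above and adds two hypothetical helpers, "claimHolds" and "claimUpTo".

```haskell
fib :: Int -> Integer
fib n
  | n == 0    = 0
  | n == 1    = 1
  | n > 1     = fib (n - 1) + fib (n - 2)
  | otherwise = error "fib only defined for non-negative numbers"

fibStep :: (Integer, Integer) -> (Integer, Integer)
fibStep (x, y) = (y, x + y)

fibPair :: Int -> (Integer, Integer)
fibPair n
  | n == 0    = (0, 1)
  | n >  0    = fibStep (fibPair (n - 1))
  | otherwise = error "No negative values!"

-- The stronger claim of the proof, for one value of n.
claimHolds :: Int -> Bool
claimHolds n = fibPair n == (fib n, fib (n + 1))

-- The claim for all n from 0 up to m.
claimUpTo :: Int -> Bool
claimUpTo m
  | m < 0     = True
  | otherwise = claimHolds m && claimUpTo (m - 1)
```

Evaluating claimUpTo 15 returns True.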

In a similar way we can also prove that the number of calls to the functions fibPair and fibStep is linear in n. Because each of these functions contains only a constant number of instructions this immediately implies that calling fastFib(n) has a time consumption that is linear in n (counting every addition as one instruction). This is dramatically faster than the earlier formulation. With fastFib it is no problem to compute fastFib 10000. With fib it already requires considerable patience to compute fib 30.

As in any inductive proof, we must first formulate a precise claim. In this case our claim is that calling fastFib with parameter n >= 0 results in exactly n calls to the function fibStep and n + 1 calls to fibPair. To allow us to talk about these numbers in a more mathematical way, we must introduce corresponding functions (in the context of grammars such functions were called attributes). So, let them be given by T_step(n) and T_pair(n). The claims can now be formulated concisely as

T_pair(n) == n + 1, T_step(n) == n, for all n >= 0.

The case n == 0 constitutes the basis of the inductive proof. For n == 0 no recursive calls are made. The result is one call to fibPair and zero to fibStep: T_pair(0) == 1 and T_step(0) == 0, as it should be. So, assume the claim holds for all i < n, for some n > 0. Then we see from the definition of fibPair that calling fibPair with parameter n results in exactly one call to fibStep plus one call to fibPair with parameter n - 1. This gives

T_pair(n) =def= 1 + T_pair(n - 1) =ind= 1 + n =cmp= n + 1,
T_step(n) =def= 1 + T_step(n - 1) =ind= 1 + (n - 1) =cmp= n.
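
Both claims can also be checked experimentally. The following sketch restates fastFib, fibPair and fibStep in the form the proofs assume (the definitions appear earlier in the text; their exact wording here is an assumption). The function countCalls is a hypothetical helper that follows the recursion of fibPair and returns the pair (T_pair(n), T_step(n)).

```haskell
-- Mathematical definition, used as the reference.
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

-- One step: (fib k, fib (k + 1)) becomes (fib (k + 1), fib (k + 2)).
fibStep :: (Integer, Integer) -> (Integer, Integer)
fibStep (a, b) = (b, a + b)

fibPair :: Integer -> (Integer, Integer)
fibPair 0 = (0, 1)
fibPair n = fibStep (fibPair (n - 1))

fastFib :: Integer -> Integer
fastFib = fst . fibPair

-- Hypothetical helper: (number of calls to fibPair, number of calls to fibStep).
countCalls :: Integer -> (Integer, Integer)
countCalls 0 = (1, 0)
countCalls n = (p + 1, s + 1)
  where (p, s) = countCalls (n - 1)
```

Typing "countCalls 10" gives (11, 10), in agreement with T_pair(10) == 11 and T_step(10) == 10, and "fastFib 10" gives 55, the same as "fib 10". All functions assume n >= 0.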

Lists

Lists can be defined over any earlier defined type. They are used to collect objects of the same type in a single object. In lists the number of these objects does not need to be specified upon definition. For any type "t", "[t]" is the corresponding list type. Functions may use objects of a list type as argument and may return these as result.

Defining Lists

Primitive lists are obtained by enumerating the members. We give some examples:
  [1, 2, 4, 4, 2, 1]      :: [Int]
  [True]                  :: [Bool]
  ['a', 'b', 'b', 'a']    :: [Char]
  [abs, negate, fac]      :: [Int -> Int]
  [[3, 4, 5], [], [4, 7]] :: [[Int]]
  []                      :: [Int]
  []                      :: [Bool]

The examples show that the concept is not limited to simple types: lists can also be formed of functions, as long as these all have the same type, and of lists. The empty list "[]" is also a list. Often this is an important special case in inductive algorithms. The empty list can have any list type, because it may contain zero objects of any previously defined type.

The examples also show that elements may appear several times, so a list is definitely not a set in the mathematical sense. Furthermore, the order in a list is relevant, so a list is not a multiset either, but a sequence:

  [1, 2, 4, 4, 2, 1] != [2, 1, 4, 4, 1, 2]
  [abs, negate, fac] != [negate, fac, abs]

Two lists are equal if and only if they have the same type of objects, the same number of these and if the objects at corresponding places are equal. (The inequalities above are meant mathematically: in Haskell the operator "==" can only be applied to lists whose elements can themselves be compared, which is not the case for functions.)

For enumerated types, that is types which have a natural ordering such as numbers and characters, lists can be written shorter by indicating ranges as in [m .. n]. The default increase value is 1 and if n comes before m, then the list is empty. We give some examples. The short-hands on the left, the equivalent full definition on the right:

  [5 .. 9]       ==   [5, 6, 7, 8, 9]
  [9 .. 5]       ==   []
  [3.1 .. 6.8]   ==   [3.1, 4.1, 5.1, 6.1]
  ['a' .. 'f']   ==   ['a', 'b', 'c', 'd', 'e', 'f']

One can also specify the increase (which actually might be a decrease) by indicating the first two values according to the format [m, n .. p]. The following examples make this notion clear:

  [5, 7 .. 12]       ==   [5, 7, 9, 11]
  [5, 7 .. 13]       ==   [5, 7, 9, 11, 13]
  [9, 8 .. 5]        ==   [9, 8, 7, 6, 5]
  [3.1, 3.2 .. 3.6]  ==   [3.1, 3.2, 3.3, 3.4, 3.5, 3.6]
  ['a', 'c' .. 'f']  ==   ['a', 'c', 'e']

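It is important to notice that all values within the range, including the bounds, are included, but that the end value itself is not necessarily an element of the list, because it may be jumped over by the increase. For a list of Float this may even happen when the increase is 1, as in the above example [3.1 .. 6.8]. Ranges over Float should be used with care for a second reason: according to the Haskell 98 report such a range runs up to the end value plus half the increase, so an implementation following the report also includes 7.1 in this list.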

Functions on Lists

Lists can be used as parameters and return values in functions just as tuples. This opens many possibilities. Particularly, there are also many predefined functions on lists. We list the most important ones:
  :         a -> [a] -> [a]          add a single element to list
  ++        [a] -> [a] -> [a]        join two lists
  !!        [a] -> Int -> a          element with specified index
  concat    [[a]] -> [a]             concatenate constituting lists
  length    [a] -> Int               length of list
  head      [a] -> a                 first element of list
  last      [a] -> a                 last  element of list
  tail      [a] -> [a]               list with first element removed
  init      [a] -> [a]               list with last  element removed
  replicate Int -> a -> [a]          list with specified number of copies
  take      Int -> [a] -> [a]        list with    first so many elements
  drop      Int -> [a] -> [a]        list without first so many elements
  splitAt   Int -> [a] -> ([a], [a]) split list at specified index
  reverse   [a] -> [a]               reverse order of list
  zip       [a] -> [b] -> [(a, b)]   turn two lists into list of pairs
  unzip     [(a, b)] -> ([a], [b])   turn list of pairs into pair of lists

Notice that the first three are operators (as should be clear from the fact that they are written with `operator letters').

The following examples illustrate what is meant:

  'x' : ['a', 'b', 'c']          == ['x', 'a', 'b', 'c']
  [7, 9, 11] ++ [9, 8, 7]        == [7, 9, 11, 9, 8, 7]
  [1, 2, 7, 4, 8, 5] !! 3        == 4
  concat [[3, 4], [6], [7, 1]]   == [3, 4, 6, 7, 1]
  length [3, 4, 6, 7, 1]         == 5
  head [3, 4, 6, 7, 1]           == 3
  last [3, 4, 6, 7, 1]           == 1
  tail [3, 4, 6, 7, 1]           == [4, 6, 7, 1]
  init [3, 4, 6, 7, 1]           == [3, 4, 6, 7]
  replicate 4 'x'                == ['x', 'x', 'x', 'x']
  take 2 [3, 4, 8, 6, 7, 1]      == [3, 4]
  drop 2 [3, 4, 8, 6, 7, 1]      == [8, 6, 7, 1]
  splitAt 2 [3, 4, 8, 6, 7, 1]   == ([3, 4], [8, 6, 7, 1])
  reverse [3, 4, 8, 6, 7, 1]     == [1, 7, 6, 8, 4, 3]
  zip [2, 3, 4, 5] [4, 2, 3]     == [(2, 4), (3, 2), (4, 3)]
  unzip [(2, 4), (3, 2), (4, 3)] == ([2, 3, 4], [4, 2, 3])

Notice that the first position of a list has, as in arrays in C and Java, index 0. This is important to be aware of when using the operator "!!". When zipping lists of unequal length, the tail of the longer list is automatically discarded. So, zip and unzip are not really inverse operations. Furthermore, zip takes two lists as arguments, unzip produces one pair of lists as result.

These functions offer almost all one could desire: both typical array functions such as "!!" and typical list functions such as "drop" and "splitAt". Using a combination of splitAt, drop 1 and concat, it is possible to delete an element at any specified position. With some more tricks one can delete an element with a specified value. This is a true list operation.
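
The deletion just described might be written as follows. This is only a sketch: the names deleteAt and deleteValue are our own, and the use of the prelude function span for the second function is one of several possible "tricks".

```haskell
-- Delete the element at index i (indices start at 0, i >= 0 is assumed).
-- An index at or beyond the length of the list leaves the list unchanged.
deleteAt :: Int -> [a] -> [a]
deleteAt i s = concat [front, drop 1 back]
  where (front, back) = splitAt i s

-- Delete the first element equal to v, if there is one.
-- span splits the list just before the first element that fails the test.
deleteValue :: Eq a => a -> [a] -> [a]
deleteValue v s = concat [front, drop 1 back]
  where (front, back) = span (/= v) s
```

For example, "deleteAt 2 [3, 4, 8, 6, 7, 1]" gives [3, 4, 6, 7, 1] and "deleteValue 8 [3, 4, 8, 6, 8]" gives [3, 4, 6, 8].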

On the other hand, one should never be fooled by the fact that it is possible to write an operation with just one instruction: we do not know (and mostly should not want to know) how lists are realized at the lower level. Most likely lists are implemented as linked lists, and in that case, if there is no extra pointer there, it takes time proportional to the length of the list to perform the operation "last". If on the other hand lists are implemented as arrays, then an operation like "tail", which on linked lists can be performed with a constant number of computer instructions, requires that all elements are shifted one position, which takes time proportional to the length of the array.

The above functions are defined for any list type [a]. This polymorphism is discussed further down. There are also some useful functions defined only when the list is of a specified type:

  and     [Bool] -> Bool    conjunction of all booleans in list
  or      [Bool] -> Bool    disjunction of all booleans in list
  sum     [Int]    -> Int   sum of all values in list
          [Float]  -> Float
  product [Int]    -> Int   product of all values in list
          [Float]  -> Float

List Comprehensions

The above is nice to have, but we also need to define functions on lists ourselves. A powerful instrument for doing so is constituted by the so-called list comprehensions. In a list comprehension a list is described in terms of the values of another list. The idea is to generate elements from a list, to test whether these satisfy a certain property, to transform all those that pass the test and to output this in a new list.

The basic format of a comprehension is

  [  expression_in_terms_of_variable_n | n <- some_list ]
As a concrete example consider
  doubleValue :: [Int] -> [Int]
  doubleValue list_par = [ 2 * n | n <- list_par ]
Calling "doubleValue [3, 4, 5]" returns the list [6, 8, 10]. In a comprehension, the part "n <- some_list" is called a generator, because it generates the elements on which the expression on the left is working.

Comprehensions can be combined with tests. Tests are Boolean expressions, which are added, separated by commas, on the right-hand side of the generator. So, the general format is

  [  expression_in_terms_of_variable_n | 
       n <- some_list , test_1 , test_2 , ... ]
As a concrete example consider
  doubleSpecial :: [Int] -> [Int]
  doubleSpecial list_par = [ 2 * n | n <- list_par , isEven n , n > 5 ]
Applying this function to [3, 6, 7, 4, 8, 2] returns just [12, 16], because the other elements are either odd or too small.
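
For trying this out it may help to have a version that does not depend on other definitions. The sketch below assumes that isEven behaves like the prelude function "even" and uses that function instead.

```haskell
-- Doubles all elements that are even and larger than 5.
doubleSpecial :: [Int] -> [Int]
doubleSpecial list_par = [ 2 * n | n <- list_par, even n, n > 5 ]

-- The input list from the text.
example :: [Int]
example = doubleSpecial [3, 6, 7, 4, 8, 2]
```

Here only 6 and 8 pass both tests, so "example" evaluates to [12, 16].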

Example

A nice example is the construction of a list of prime numbers up to a certain specified value. Let us fix that a prime number is any number n >= 2 with no divisors except for 1 and n. In Haskell, using list comprehensions this is an easy task. We define a function that computes the list of divisors. Then, testing the length of this list, we can test for primality. This test is used as a filter when constructing a list of primes:
  divisors :: Int -> [Int]
  divisors n = [i | i <- [1 .. n] , n `rem` i == 0]

  isTwo :: Int -> Bool
  isTwo = (== 2)
  
  isPrime :: Int -> Bool
  isPrime = isTwo . length . divisors

  primes :: Int -> [Int]
  primes n = [ i | i <- [2 .. n] , isPrime i]

This is super elegant. It is also super inefficient. For n == 1000, it takes almost one minute, while an efficient imperative algorithm can find all primes up to 10^7 in one second.

There are several things to remark about this script. In the definition of divisors we are using the binary function "rem" as an operator by enclosing it in backquotes. The function "isTwo" gives an example of an operator section that will be discussed further down. Because we have created this function, we can write the function isPrime without parameters as a function composition.

In the functions "divisors" and "primes" we see that the list which is handled in a list comprehension does not need to be passed as a parameter as it was done above in "doubleValue". Here the list is created on the spot, and then filtered by a filter that depends on the passed parameter n.

Running the script within Hugs by typing "primes 1000" automatically prints a list: as pointed out before, Hugs works like a multi-type pocket calculator, immediately outputting computed results.

The inefficiency has three main reasons: the algorithm itself, the implementation of list comprehensions, and the use of an interpreter.

About the implementation of list comprehensions: we do not know how these are realized, but possibly every new insertion to the list under construction implies traversing the whole list. What is certainly true is that running an interpreter causes a considerable slowdown: every instruction of Haskell is turned into machine code again and again.

Algorithmic Improvements

Consider first the algorithm. The above algorithm tries for all numbers i <= n all numbers j <= i as divisors. So, the amount of work is proportional to sum_{i <= n} i ~= n^2 / 2. However, in order to test whether a number n is prime it suffices to test the numbers up to the square root of n: if n can be written n == x * y, then either x or y is at most sqrt(n). In a very simple way, this gives a considerable improvement: now the time is proportional to sum_{i <= n} sqrt(i) ~= 2/3 * n * sqrt(n).
  intSqrt :: Int -> Int
  intSqrt = floor . sqrt . fromIntegral

  smallDivisors :: Int -> [Int]
  smallDivisors n = [i | i <- [2 .. intSqrt n] , n `rem` i == 0]

  emptyList :: [Int] -> Bool
  emptyList s = s == []

  isPrime :: Int -> Bool
  isPrime = emptyList . smallDivisors

  fprimes :: Int -> [Int]
  fprimes n = [ i | i <- [2 .. n] , isPrime i]
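
That the restriction to divisors up to the square root does not change the result can be checked by comparing both tests on a range of numbers. The names isPrimeSlow and isPrimeFast below are our own, chosen so that both versions fit in one script; the conversion to Double inside intSqrt is an assumption that is harmless for numbers of this size.

```haskell
-- Integer part of the square root.
intSqrt :: Int -> Int
intSqrt m = floor (sqrt (fromIntegral m :: Double))

-- Original test: a prime has exactly two divisors.
isPrimeSlow :: Int -> Bool
isPrimeSlow n = length [i | i <- [1 .. n], n `rem` i == 0] == 2

-- Improved test: no divisor between 2 and the square root.
-- Only meaningful for n >= 2, as in the comprehension below.
isPrimeFast :: Int -> Bool
isPrimeFast n = null [i | i <- [2 .. intSqrt n], n `rem` i == 0]

primesSlow, primesFast :: Int -> [Int]
primesSlow n = [i | i <- [2 .. n], isPrimeSlow i]
primesFast n = [i | i <- [2 .. n], isPrimeFast i]
```

Typing "primesSlow 200 == primesFast 200" gives True, and "primesFast 20" gives [2, 3, 5, 7, 11, 13, 17, 19].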

A further saving can be obtained by realizing that one can stop testing a number as soon as the first divisor is found. For numbers with several divisors this means that one can stop even earlier. It is not so easy though to estimate how much earlier.

  intSqrt :: Int -> Int
  intSqrt = floor . sqrt . fromIntegral

  isPrime :: Int -> Int -> Int -> Bool
  isPrime d b i
    | d > b     = True
    | otherwise = i `rem` d /= 0 && isPrime (d + 1) b i

  ffprimes :: Int -> [Int]
  ffprimes n = [ i | i <- [2 .. n] , isPrime 2 (intSqrt i) i]

A final standard improvement is to consider as divisors only the prime numbers up to the square root: if none of the prime numbers divides a number n, then no product of them divides n either.

  intSqrt :: Int -> Int
  intSqrt = floor . sqrt . fromIntegral

  isPrime :: Int -> Int -> Int -> [Int] -> Bool
  isPrime d b i x
    | y > b     = True
    | otherwise = i `rem` y /= 0 && isPrime (d + 1) b i x
      where y = x !! d

  fffprimes :: Int -> [Int]
  fffprimes n 
    | n == 2                    = [2]
    | isPrime 0 (intSqrt n) n x = x ++ [n]
    | otherwise                 = x
      where x = fffprimes (n - 1)

A really different algorithm, Eratosthenes' sieve method (invented by a Greek mathematician living around 200 BC in northern Africa), is much more efficient still, both in theory and in practice. To program this method in Haskell is one of the exercises.

Strings

Strings are lists of characters:
  type String = [Char]
This implies that all list functions also work on strings. A string can be printed out to the screen by the following command:
  putStr :: String -> IO ()
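To output values of types other than String, one can use the function "show", which converts a value to a string. The function "read" does the opposite: it takes a string and converts it to a value of another type. Because the string alone does not always determine this type, it may be necessary to indicate it explicitly, as in the last example.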
  show (6 + 7)           prints "13" on the screen
  show (True)            prints "True"
  read "True"            reads True
  read "37"              reads 37
  (read "37") :: Int     reads 37 with explicit type indication
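
In a script the target type of "read" is usually fixed by a type declaration. The following sketch shows this; the names shown, parsedInt, parsedBool and printSum are our own.

```haskell
-- show turns a value into a String.
shown :: String
shown = show (6 + 7 :: Int)

-- read turns a String into a value; the declared type tells it which one.
parsedInt :: Int
parsedInt = read "37"

parsedBool :: Bool
parsedBool = read "True"

-- putStr only accepts Strings, so other values go through show first.
printSum :: IO ()
printSum = putStr (show (parsedInt + 5) ++ "\n")
```

Here "shown" is the string "13", "parsedInt" is the number 37, and executing "printSum" writes 42 to the screen.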

Parameters, Patterns and Variables

Pattern Matching

Defining and Calling Functions

Above we have considered several types of function definitions. One of them was called `definition by enumeration'. This is actually the simplest example of a much more general concept, namely of a definition by specifying a pattern. The discussion of this mechanism was postponed because to describe all possibilities we also need the notions of lists and tuples. These further possibilities will occasionally show up in the examples provided in the remainder of this chapter.

In most other computer languages, the only allowed type of pattern in a function definition is a name. That is, when defining a function, one of the things to do is to provide a listing of its formal parameters. Such a function is called by specifying its name together with actual parameters. These actual parameters are expressions, which are evaluated upon calling the function. The computed values are substituted for the formal parameters in the order they are specified. This mechanism even applies when a call is performed `by reference': in that case the value of the actual parameter is an address. In this simple case the compiler or interpreter only needs to check that the actual parameters have the correct type and that the number of actual parameters equals the number of formal parameters. This can indeed be checked at compile time, because the number of parameters and their types are clear from the provided code.

As we have seen, in Haskell a pattern may also involve literals. A literal is an expression like True or False or 2 or 3. The value of a literal is the literal itself. More generally a function can be defined by specifying one or more patterns. These patterns are mostly chosen to be mutually exclusive, but they do not need to deal with all possibilities. For example, the square root function is not defined for negative numbers and division is not defined for 0. It is the responsibility of the programmer to assure that a function is called only for values for which it is defined.

What happens in Haskell when a function is called? As in any other language a function is called by specifying its name together with actual parameters. Then it is checked which of the provided patterns matches the actual parameters. If such a pattern is found the definition of the function after it is chosen. If no matching pattern is found the program is terminated with an error notice.

In the above case of the "not" function this was very simple: the function has a single Boolean argument, and both alternatives for this value are listed. Calling "not b", where "b" stands for a Boolean expression, the value of "b" is determined. If its value is true, the first definition is chosen, if it is false the second.

The "exor" function has two Boolean arguments and is called with "exor b1 b2", where "b1" and "b2" are Boolean expressions. To find the matching pattern "b1" is evaluated. If its value is true, the first definition is chosen, if it is false the second. "b2" is not yet evaluated in the pattern matching stage. This `lazy evaluation' is discussed further down.
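
The following sketch makes this selection process concrete. The definitions of myNot and exor are an assumed form of the definitions discussed above (the name myNot only avoids a clash with the prelude), and firstOnly is a hypothetical function added to show that an argument which is not needed for choosing the alternative is not evaluated.

```haskell
-- Both alternatives for the single Boolean argument are listed.
myNot :: Bool -> Bool
myNot True  = False
myNot False = True

-- Only the first argument is inspected to choose the alternative.
exor :: Bool -> Bool -> Bool
exor True  b = myNot b
exor False b = b

-- The second argument is never evaluated, neither for matching nor later.
firstOnly :: Bool -> Bool -> Bool
firstOnly True  _ = True
firstOnly False _ = False
```

Typing "firstOnly True undefined" gives True: although "undefined" would stop the program with an error when evaluated, it is never looked at.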

Patterns in Haskell

If pattern matching with names and literals were all there is, it would be good enough, but the possibilities offered by Haskell reach much further. Legal patterns include names, literals, the wild card "_", tuples of patterns such as (a, b), list patterns such as [] and (x : s), and patterns of the form (n + 1). Notice that pattern matching might have been stretched much further. In principle one could allow any pattern from which the occurring formal parameters can be resolved, like n / 2, n * n, replicate 7 c, x && not y, ... . What is provided is all that one needs to conveniently define recursive functions in the way one also finds such definitions in mathematics. In mathematics it is very common to have a definition like "f(n + 1) = ...".

Pattern matching must be performed at run time: at compile time the values are unknown, and therefore it cannot be determined at compile time which of the alternatives to choose. Particularly, before knowing the values it cannot be checked whether for the arising values there will always be a matching alternative.

The @-Notation

Suppose we want to define a function which for any list constructs a list of lists consisting of all the suffixes of the input list. Thus, for the list [5, 7, 0, 3, 4] the result would be [[5, 7, 0, 3, 4], [7, 0, 3, 4], [0, 3, 4], [3, 4], [4], []]. Using pattern matching this function is easy to define:
  tails [] = [[]]
  tails (x : s) = (x : s) : tails s

This definition requires that the provided list, if it is non-empty, is decomposed into x and s. To compute the result, these are combined in the expression "(x : s)". However, this is a waste of effort, because the result is the same as the provided actual parameter. To make such computations more efficient, Haskell uses the symbol "@" to attach a name to a pattern. Using this symbol, the definition looks as follows:

  tails []          = [[]]
  tails xs@(x : s)  = xs : tails s

Examples

The standard function "length", computing the length of a list, is considered to illustrate some pattern matching aspects. Without resorting to the pattern matching mechanism this function can be defined as follows:
  length s
    | s == []   = 0
    | otherwise = 1 + length (tail s)

Using the construction of a pattern with help of the operator ":" this function can also be defined more directly as follows:

  length []      = 0
  length (x : s) = 1 + length s
Notice that here only one of the two possibilities can match: an empty list cannot be interpreted as some list to which x is added at the beginning.

In definitions using pattern matching it is common that at most one of the patterns matches. In that case the order is immaterial. However, it is not forbidden to have patterns which are not mutually exclusive. The following is correct:

  fac 0 = 1
  fac n = n * fac (n - 1)

When calling "fac 0", in principle both definitions are matching. In this case, just as in a definition with guards, the first matching alternative is chosen. Changing the order of the definitions is not correct in this case: the recursion will not stop and after some waiting the execution crashes with a segmentation fault.

Surprisingly, the following is correct:

  fac (n + 1) = (n + 1) * fac n
  fac 0       = 1
Why is this not the same? To understand this, one should know how n is solved in equations like these. The only matching n are natural numbers, that is, integers with value at least 0. When calling "fac 0" the only solution would be n == -1, so this is not considered to be a match. Therefore another alternative is sought and found. (Patterns of the form n + k belong to Haskell 98; they were removed in Haskell 2010, and GHC accepts them only with the language extension NPlusKPatterns.)

In the second definition of "length" we are actually not interested in the value of "x"; it is only important that the list can be decomposed into a first element and a tail. In such cases, a parameter in a pattern may be replaced by a place holder or wild card, for which Haskell uses the symbol "_". So, "length" can be further simplified:

  length []      = 0
  length (_ : s) = 1 + length s

In this case the practical difference will be small, but in general the usage of a place holder may make the computation more efficient. This might be the case for the following definition of the function "isEmpty":

  isEmpty []      = True
  isEmpty (_ : _) = False

Local Variables

Local variables can be declared behind the keyword "where". This may be useful to make the definition of a function clearer, and it can also make an evaluation faster by preventing that a value is computed several times.

As an example we consider a function which computes (n - 1)! * n!. Of course we can write:

  facFac :: Integer -> Integer
  facFac n = (fac (n - 1)) * (fac n)
This computation requires 2 * n products. This function should therefore better be written as follows:
  facFac :: Integer -> Integer
  facFac n = x * y
    where
      x = fac (n - 1)
      y = x * n
Here the number of products is only n + 1.
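
That both versions compute the same value is easily checked. In the sketch below the first version is renamed facFacSlow (our own name) so that both fit in one script, and a definition of fac is assumed as given earlier.

```haskell
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)

-- Direct version: the recursion of fac is run twice.
facFacSlow :: Integer -> Integer
facFacSlow n = fac (n - 1) * fac n

-- Version with local definitions: the recursion of fac is run once.
facFac :: Integer -> Integer
facFac n = x * y
  where
    x = fac (n - 1)
    y = x * n
```

For n == 3 both give 2! * 3! == 12. Both functions assume n >= 1, because fac is not defined for negative arguments.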

When using "where", the layout is free, except for the fact that the offside rule applies. This means that the "where" must begin to the right of the beginning of the function name, that the local definitions belonging to it must all begin in the same column, and that the first line beginning further to the left ends its scope.

As a further, more convincing example, we give an alternative, shorter version of the efficient computation of the Fibonacci numbers:

  altFib :: Integer -> Integer
  altFib 0 = 0
  altFib n = snd (fibPair n)
    where 
      fibPair 1       = (0, 1)
      fibPair (n + 1) = (b, a + b)
        where
          (a, b) = fibPair n

This example reveals several interesting aspects: The local `variables' do not need to be variables (local constants would be a better name anyway). It is better to speak of local definitions, which may also include local function definitions. In that case it makes sense to have several locality levels, with a local definition containing even more local definitions. In the example we find two of these levels, each initiated with their own "where".

Even more interesting are the pattern matchings. One non-trivial pattern matching occurs in the line "fibPair (n + 1) = (b, a + b)". When fibPair is called with some value x, the system matches x to (n + 1) and substitutes x - 1 for all occurrences of the formal parameter n. A second pattern matching occurs in the line "(a, b) = fibPair n". Here the value of fibPair n, which is a pair of integers, is matched component by component with (a, b). This latter type is called conformal pattern matching, because the expression on the right-hand side has to conform to the pattern on the left.
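
For implementations that do not accept patterns like (n + 1), the same function can be written with an ordinary name as pattern and an explicit subtraction. The following is a sketch of such a variant; the name k for the local parameter is our own choice, and the second alternative of fibPair relies on being tried only after the first.

```haskell
altFib :: Integer -> Integer
altFib 0 = 0
altFib n = snd (fibPair n)
  where
    -- fibPair k == (fib (k - 1), fib k) for k >= 1
    fibPair :: Integer -> (Integer, Integer)
    fibPair 1 = (0, 1)
    fibPair k = (b, a + b)
      where
        (a, b) = fibPair (k - 1)
```

Typing "map altFib [0 .. 10]" gives [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55].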

Types and Partial Parametrization

Polymorphism

Functions such as "length", "!!" and "head" are not limited in their application to a single type of arguments. Rather they can be applied to any list type. So, how can functions like these be defined? What is the type of these functions?

It would be a lot of work to list all possibilities, and this is even forbidden. Trying the following piece of code:

  myfun :: Bool -> Bool
  myfun x = not x

  myfun :: Int -> Int
  myfun x = x + 1
returns the following error:
  Repeated type signature for "myfun"

So, how to define functions like "length"? The solution is to use a type variable. In the following the variable "a" is a type variable:

  length :: [a] -> Int

This definition states that for a list over a type "a" the result is an integer. That "a" is not a type but a variable is clear because of the naming rules: all types begin with a capital. The above really describes all cases in which the function length can be used. Thus, "[a] -> Int" is what is called the most general type for the function length.

There are more general and less general cases. The definition of ":" involves only a single type variable:

  (:) :: a -> [a] -> [a]

The function "zip" is more general, with a definition involving two type variables:

  zip :: [a] -> [b] -> [(a, b)]

Of course definitions of this kind are not limited to functions from the prelude: one can also define new functions using type variables. For example:

  repAdd :: Int -> a -> [a] -> [a]
  repAdd i x s
    | i == 0 = s
    | i  > 0 = repAdd (i - 1) x (x : s)
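
A short sketch shows the same definition used at two different element types. The names stars and flags are our own.

```haskell
-- Adds i copies of x to the front of s (defined only for i >= 0).
repAdd :: Int -> a -> [a] -> [a]
repAdd i x s
  | i == 0 = s
  | i  > 0 = repAdd (i - 1) x (x : s)

-- Here the type variable a stands for Char ...
stars :: String
stars = repAdd 3 '*' "abc"

-- ... and here for Bool.
flags :: [Bool]
flags = repAdd 2 True [False]
```

The value of "stars" is "***abc" and that of "flags" is [True, True, False]. In general "repAdd i x s" gives the same list as "replicate i x ++ s".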

Overloading

Overloading is not the same as polymorphism. A function or operator is said to be overloaded if it works with arguments of several types, but not for all. Examples are the operators "+", "==" and "<". The first is defined only for numerical arguments, the second only for types which somehow can be compared, the last only for types which are ordered.

It is nevertheless important to allocate a single type to every function and operator. Therefore, types are arranged in type classes. The most important of these are:

Num:
the class of numerical types.
Ord:
the class of ordered types.
Eq:
the class of types for which the elements can be tested on equality.
Integral:
the class of integral numbers.

The type definitions of the above functions now read:

  (+)  :: Num a => a -> a -> a
  (<)  :: Ord a => a -> a -> Bool
  (==) :: Eq  a => a -> a -> Bool
Here we encounter the new symbol "=>". The above expressions must be read as "(+)" is of type "a -> a -> a" provided that a is an element from the type class "Num".

Just like polymorphic functions and operators, the user can also define overloaded functions and operators.

  squareSum :: Num a => a -> a -> a
  squareSum x y = x * x + y * y

Notice that the polymorphism of self-defined polymorphic functions was `inherited' from the polymorphism of earlier defined functions. In the same way "squareSum" derives its overloaded character from the overloadedness of the operators "+" and "*".
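
The effect of the class constraint can be seen by using squareSum at two numerical types. In the sketch below the names intResult and doubleResult are our own.

```haskell
squareSum :: Num a => a -> a -> a
squareSum x y = x * x + y * y

-- Here the type variable a stands for Int ...
intResult :: Int
intResult = squareSum 3 4

-- ... and here for Double.
doubleResult :: Double
doubleResult = squareSum 1.5 2.0

-- An application like "squareSum True False" is rejected by the type
-- checker, because Bool does not belong to the class Num.
```

The value of "intResult" is 25 and that of "doubleResult" is 6.25.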

Currying

Partial Parametrization

A function with several parameters can also be interpreted as a function with a single parameter whose result is a function with one fewer parameter. The value of a function without parameters is a constant. This is the reason that the symbol -> is used several times in a type definition and not only to indicate the type of the return value. Interpreting functions of several parameters as functions to functions is called currying.

This interpretation causes no problems because function application is left-associative and the operator "->" is right-associative. This means that

  f :: A -> B -> C -> D -> E

  f a b c d    ...
are equivalent to
  f :: A -> (B -> (C -> (D -> E)))

  (((f a) b) c) d    ...

So, a function with four parameters can be interpreted as a function which maps an argument to a function which maps an argument to a function which maps an argument to a function of one variable. This corresponds to the function f `swallowing' its arguments one by one, starting from the left: when no brackets are written, the implicit bracketing is exactly the one that makes currying work.
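
The following sketch shows this swallowing of arguments for a function with three parameters. The function volume and the other names are hypothetical, chosen only for illustration.

```haskell
-- A function with three parameters ...
volume :: Int -> Int -> Int -> Int
volume a b c = a * b * c

-- ... after swallowing its first argument: a function with two parameters,
withLength :: Int -> Int -> Int
withLength = volume 2

-- ... and after swallowing the second: a function with one parameter.
withLengthAndWidth :: Int -> Int
withLengthAndWidth = withLength 3
```

The expressions "volume 2 3 4", "((volume 2) 3) 4" and "withLengthAndWidth 4" all denote the same value, namely 24.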

In the context of higher-order functions, treated further down, we will see that sometimes it is nevertheless necessary to use brackets in function definitions.

Currying is not just a theoretical issue. It can also be used practically. Consider the following functions:

  fac :: Integer -> Integer
  fac n 
    | n == 0 = 1
    | n >  0 = n * fac (n - 1)

  choose :: Integer -> Integer -> Integer
  choose n k 
    | n < k     = error "In choose: k larger than n"
    | otherwise = fac n `div` (fac (n - k) * fac k)

It computes the number of subsets of size k of a set with n elements, a function which is often pronounced as "n choose k".

Suppose that we are working with a set of fixed size and that we want to define the function which computes the number of subsets of a size still to be specified. This function can be written by partial parametrization as follows:

  chooseFromTen :: Integer -> Integer
  chooseFromTen = choose 10

The most important application of partial parametrization is when one wants to pass a partially parametrized function as an argument to a higher-order function (see further down). In that case it is not necessary to explicitly give the function a name, we can simply write "choose n".
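
As a small foretaste, the sketch below passes the partially parametrized function "choose 4" to the higher-order function map, which is treated further down. The name pascalRow4 is our own; fac, choose and chooseFromTen are repeated so that the script is complete.

```haskell
fac :: Integer -> Integer
fac n
  | n == 0 = 1
  | n >  0 = n * fac (n - 1)

choose :: Integer -> Integer -> Integer
choose n k
  | n < k     = error "In choose: k larger than n"
  | otherwise = fac n `div` (fac (n - k) * fac k)

chooseFromTen :: Integer -> Integer
chooseFromTen = choose 10

-- "choose 4" is a function with one parameter; map applies it to 0, ..., 4.
pascalRow4 :: [Integer]
pascalRow4 = map (choose 4) [0 .. 4]
```

The value of "pascalRow4" is [1, 4, 6, 4, 1], and "chooseFromTen 3" gives 120.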

Operator Sections

Because operators are binary, it makes some sense to have a more symmetric notion of partial parametrization. In Haskell operators can be partially parametrized on either side. The result is a function with one parameter, called an operator section.

When applying one of these operator sections to an argument y, the returned value is the one given by substituting y at the single free position. Thus, for an operator "&&&" with

  (&&&) :: A -> B -> C
we have
  (&&& x) :: A -> C
  (&&& x) y = y &&& x

  (x &&&) :: B -> C
  (x &&&) y = x &&& y

Of course for commutative operators there is no difference between left and right sections, but in general there is. For example, "( / 3)" is the function "divide by three" and "(3 / )" is the function "divide three by". We give some examples of potentially useful definitions:

  next :: Integral a => a -> a
  next = ( + 1)

  square :: Num a => a -> a
  square = ( ^2)

  twoPower :: Integral a => a -> a
  twoPower = (2^ )

  isZero :: Integral a => a -> Bool
  isZero = ( == 0)

  reciprocal :: Float -> Float
  reciprocal = (1.0 / )
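
The difference between left and right sections can be tried out with the following sketch. The names half, twoOver and previous are our own. The last definition points at a known exception: the operator "-" cannot be used in a right section, because "(- 1)" is read as the number minus one.

```haskell
-- Right operand fixed: "divide by two".
half :: Double -> Double
half = (/ 2)

-- Left operand fixed: "divide two by".
twoOver :: Double -> Double
twoOver = (2 /)

-- "(- 1)" is not a section but a number;
-- the prelude function subtract fills the gap.
previous :: Int -> Int
previous = subtract 1
```

Here "half 8" gives 4.0, "twoOver 8" gives 0.25 and "previous 5" gives 4.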

Higher-Order Functions

Definition

We have defined lists and used them both as arguments and as output to functions. An important construction was the list comprehension, which allows us to transform a list in a functional way, conditioned by one or more Boolean expressions. With these tools one can program almost anything one likes, but not always in a very general or particularly appealing way. In this section we will learn how to employ higher-order functions to increase our expressive power.

A higher-order function is a function with a function as argument. The possibility of defining higher-order functions is only consistent: functions are not really different from values. They have a type, they can be the result of a function application, for example by currying, and now we will even consider them as parameters. In the following we will present a number of these functions which are defined already in the prelude. It is instructive to consider their definitions.

Examples

Map

"map" realizes the idea of doing something with all elements of a list. So, it is equivalent to an unfiltered list comprehension. A possible definition is (the actual definition is slightly more general):
  map :: (a -> b) -> [a] -> [b]
  map f []      = []
  map f (x : s) = f x : map f s

Here we see several things. First, "map" is a function which takes as arguments a function from type a to type b and a list over type a, and produces a list over type b. Here we encounter another example of a highly polymorphic definition. Clearly it is important to write the brackets in "(a -> b)": without them, calling "map f s" for some function "f" and a list "s" would not work, because map would expect a first argument of type "a", a second of type "b" and a third argument that is a list over "a". So, here the brackets are used to correct the default bracketing given by the right-associativity of "->".

Another point is the definition of a recursive function using pattern matching. Alternatively, we could have written

  map :: (a -> b) -> [a] -> [b]
  map f xs 
    | length xs == 0  = []
    | otherwise       = f x : map f s
      where
        x = head xs
        s = tail xs
It would have been more elegant to write "xs == []", but testing the equality of two lists is performed by comparing the elements, and the elements of a polymorphic type can not necessarily be compared.
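
As an aside, the prelude function "null" tests for emptiness by pattern matching and therefore needs no equality on the elements. The following is a minimal sketch of the guard-based variant using it; the name "mapGuard" is ours and only chosen to avoid a clash with the prelude's "map":

```haskell
-- Guard-based variant of map; "mapGuard" is a hypothetical name.
mapGuard :: (a -> b) -> [a] -> [b]
mapGuard f xs
  | null xs   = []                                  -- needs no Eq on the elements
  | otherwise = f (head xs) : mapGuard f (tail xs)

main :: IO ()
main = print (mapGuard (2 *) [1, 2, 3 :: Int])      -- [2,4,6]
```

The pattern-matching definition remains the more readable one; this variant only shows that the guard style does not force an equality test.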

In the first definition of "map", when the function is called with a list "xs" as actual parameter, it is checked which of the defining equations fits. In this case there are no conflicts possible: a list is either empty, or it can be written as a first element plus some rest. Because of the type of the operator ":", it is clear to the interpreter that xs must be split into head and tail, and not in any other way.

Therefore the following is not supported:

  map :: (a -> b) -> [a] -> [b]
  map f []         = []
  map f (s1 ++ s2) = (map f s1) ++ (map f s2)
This would give an interesting but undesirable ambiguity: the list could be cut anywhere (the possibilities include splitting off an empty list, which would result in an infinite recursion).

A simple application of mapping is in the following:

  map (2 *) [1, 2, 3]      ----> [2, 4, 6]

Along this line we might even define our own function "doubleAll"

  doubleAll :: Num a => [a] -> [a]
  doubleAll = map (2 *)
Here the higher-order function map is Curried with a function, which itself is an operator section, and turned thereby into a function mapping lists of a numerical type into lists of the same type.

Filter

"filter" realizes the idea of constructing a sublist of a list by testing for each of them a Boolean expression:
  filter :: (a -> Bool) -> [a] -> [a]
  filter p []      = []
  filter p (x : s) 
    | p x       = x : filter p s
    | otherwise =     filter p s

It is now possible to give very concise formulations like the following:

  (//) :: Int -> Int -> Bool
  (//) x y = x `rem` y == 0

  multiples :: Int -> Int -> [Int]
  multiples i n = filter ( // i) [0 .. n]

But of course for a list of integers "s" one can also use the following one-liner, which keeps the numbers < 10 and thus filters out all numbers >= 10:
  filter ( < 10) s

A list comprehension is a shorthand for zero or more filters plus an application of map. Consider the following:

  myComp :: [Int] -> [Int]
  myComp s = [2 * i | i <- s, i // 3]
This can be rewritten as a composition of the function "filter with ( // 3)" and the function "map with (2 * )":
  myFilterMap :: [Int] -> [Int]
  myFilterMap = map (2 * ) . filter ( // 3)

Filter and map are more general, but list comprehensions are clearer. So, if the desired functionality can be obtained with a comprehension, then this is the preferred way of writing it.
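
The equivalence of the two formulations can be tried out directly. The following sketch repeats the definitions from above in one script and applies both functions to the same list; it is a spot check on one input, not a proof:

```haskell
-- x // y holds if y divides x (the operator defined in the text).
(//) :: Int -> Int -> Bool
x // y = x `rem` y == 0

myComp :: [Int] -> [Int]
myComp s = [2 * i | i <- s, i // 3]

myFilterMap :: [Int] -> [Int]
myFilterMap = map (2 *) . filter (// 3)

main :: IO ()
main = do
  print (myComp [0 .. 10])        -- [0,6,12,18]
  print (myFilterMap [0 .. 10])   -- [0,6,12,18]
```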

Takewhile and Dropwhile

The above defined filtering is only one of many possibilities to conditionally traverse a list. An alternative is to traverse a list and keep all the elements until a certain condition is violated. Or, to throw away all elements as long as a certain condition is satisfied. These functions are defined in the prelude under the names "takeWhile" and "dropWhile":
  takeWhile :: (a -> Bool) -> [a] -> [a]
  takeWhile p [] = []
  takeWhile p (x : s) 
    | p x       = x : takeWhile p s
    | otherwise = []

  dropWhile :: (a -> Bool) -> [a] -> [a]
  dropWhile p [] = []
  dropWhile p xs@(x : s) 
    | p x       = dropWhile p s
    | otherwise = xs

A combination of takeWhile and dropWhile can be used to process a text in a very simple but slightly inefficient way: assume that we want to decompose a text, presented as a single long string, into words. The words are terminated by some well-defined symbols such as blanks. This task is performed by the following script:

  endMarker :: Char -> Bool
  endMarker c = c == ' '

  takeWord :: String -> String
  takeWord = takeWhile (not . endMarker)

  dropWord :: String -> String
  dropWord = dropWhile (not . endMarker)

  dropSeps :: String -> String
  dropSeps [] = []
  dropSeps (x : s) 
    | endMarker x = dropSeps s
    | otherwise   = x : s

  getWords :: String -> [String]
  getWords s
    | x == []   = []
    | otherwise =  (takeWord x) : (getWords (dropWord x))
      where
        x = dropSeps s
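
The inefficiency of the script above lies in the fact that every word is traversed twice: once by "takeWord" and once by "dropWord". A possible remedy, sketched here under the assumption that blanks are the only separators, is the prelude function "span", which returns the prefix and the rest in one traversal. The name "wordsBySpan" is ours:

```haskell
-- Split a text into words, traversing each word only once.
-- "wordsBySpan" is a hypothetical name; span and dropWhile come from the prelude.
wordsBySpan :: String -> [String]
wordsBySpan s =
  case dropWhile (== ' ') s of                      -- skip leading separators
    ""   -> []
    rest -> let (w, remainder) = span (/= ' ') rest -- word and what follows it
            in  w : wordsBySpan remainder

main :: IO ()
main = print (wordsBySpan "  the quick  brown fox ")
-- ["the","quick","brown","fox"]
```

The prelude also offers "words", which splits on any white space, so in practice one would rarely write this function oneself.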

Foldl and Foldr

The folding operations are useful when one wants to combine all elements of a list into a single value by repeatedly applying a binary operator, for example when one wants to sum up all the values of a list of a numeric type. Consider
  foldr :: (a -> b -> b) -> b -> [a] -> b
  foldr op e []      = e
  foldr op e (x : s) = x `op` foldr op e s

Here the operator is denoted "`op`", the neutral element of the operator is denoted by "e" and the third argument is the list of elements of type "a".

For example, one can now type

  foldr (+) 0 [0 .. 10]     ----> 55
Other examples are given by functions like "sum" and "product" and "or":
  sum :: [Int] -> Int
  sum = foldr (+) 0

  product :: [Int] -> Int
  product = foldr (*) 1

  or :: [Bool] -> Bool
  or = foldr (||) False

For non-associative operators it makes sense to also consider folding from the other end. The formulation is slightly less elegant:

  foldl :: (a -> b -> a) -> a -> [b] -> a
  foldl op e []      = e
  foldl op e (x : s) = foldl op (e `op` x) s

Here the evaluation is performed starting from the left, accumulating the result in the value that is returned once the list s is empty.
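
The difference between the two folds becomes visible with a non-associative operator such as subtraction. The following small sketch uses the prelude versions of "foldr" and "foldl"; the comments show how the brackets are placed:

```haskell
main :: IO ()
main = do
  -- foldr nests to the right: 1 - (2 - (3 - 0))
  print (foldr (-) 0 [1, 2, 3 :: Int])   -- 2
  -- foldl nests to the left: ((0 - 1) - 2) - 3
  print (foldl (-) 0 [1, 2, 3 :: Int])   -- -6
```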

Until

There is a strong need for an elegant functional mechanism for obtaining iteration. This is offered by the function "until":
  until :: (a -> Bool) -> (a -> a) -> a -> a
  until p f x 
    | p x       = x
    | otherwise = until p f (f x)

Here "p" is a predicate for type "a": it maps values of this type to Booleans. "f" is the function which is to be repeatedly executed, and "x" is the starting value. The execution stops as soon as the current value satisfies the condition given by "p".

This definition of "until" is by far not as versatile as a for-loop in C, but it can be used for operations such as finding the smallest power of two exceeding a given number:

  until (> 1000) (2 *) 1

Here we must be more careful than before: a priori there is no guarantee that an until-loop terminates. This was different for "map", "filter" and "foldr": these had a recursion which was defined over the length of the list. This length is strictly decreasing and eventually becomes zero.
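
A loop with a counter and an accumulator can be imitated by letting "until" work on a pair. The following is only a sketch of this idea; the function name "sumTo" and the local names "done" and "step" are ours. Termination is guaranteed here because the counter grows by one in every step:

```haskell
-- Sum of 1 .. n, written as a loop over a pair (counter, accumulator).
-- "sumTo" is a hypothetical name used only for this illustration.
sumTo :: Int -> Int
sumTo n = snd (until done step (1, 0))
  where
    done (i, _)   = i > n              -- stop once the counter has passed n
    step (i, acc) = (i + 1, acc + i)   -- loop body and increment in one

main :: IO ()
main = do
  print (until (> 1000) (2 *) 1)   -- 1024
  print (sumTo 10)                 -- 55
```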

Infinite Lists and Lazy Evaluation

In Haskell it is possible to define and use infinite lists. The point is that it is not always easy to know beforehand how many elements or answers are needed; infinite lists are a convenient option for such cases.

Of course, in finite time it is not possible to compute an infinite number of results. Fortunately, in Haskell any answer is printed as soon as it is known. If we have seen sufficiently many answers the computation can be interrupted with "control c".

Lazy Evaluation

Even more important is the usage of infinite lists as intermediate results in the computation of a finite result. Here again, infinitely many results cannot be computed in finite time, but here we profit from the lazy evaluation mechanism in Haskell. This means that intermediate results are evaluated only once it has become clear that they are needed for computing the output values. The opposite of lazy evaluation is greedy evaluation (also called eager evaluation). Greedy evaluation means that as soon as all actual parameters are known a function is evaluated completely. Lazy evaluation is a necessary condition for working with infinite lists.

Even lazy evaluation does not mean that one can work with infinite lists in a careless way: applying functions like "sum" or "length" on an infinite list would imply that the computer first should compute all values, before being able to compute the result. The interpreter does not detect such programming errors, but executes the task, taking infinite time.
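
The following sketch illustrates both sides: functions which need only a finite part of an infinite list return normally, whereas a function which needs the whole list does not. The name "squares" is ours:

```haskell
squares :: [Integer]
squares = map (^ 2) [1 ..]            -- infinite, but never built completely

main :: IO ()
main = do
  print (take 5 squares)              -- [1,4,9,16,25]
  print (takeWhile (< 50) squares)    -- [1,4,9,16,25,36,49]
  -- print (length squares)           -- would never return
```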

We are already familiar with a basic kind of lazy evaluation: if we write a Boolean expression, it is evaluated from left to right, and as soon as it is certain whether the result is true or false, the remainder of the expression is not evaluated. In C this behavior of "&&" and "||" is guaranteed by the language definition, but in some other languages it is merely an optimization which a compiler may or may not perform. In such a language, if we write the equivalent of "for (i = 0; i < 10 && bestValue(a[i], &x, y); i++)", then we cannot be sure whether "bestValue" is executed for "i == 10" or not. If it is executed this may lead to array-bound errors. If it is not executed this may mean that at the end "x", which may be changed as a result of the action of "bestValue", has an unexpected value. Finding programming errors of this kind is very hard.

Fortunately in functional languages there is no such thing as side effects of function calls: actual parameters are constants within the scope of their functions. They do not change as a result of calls to functions with them. Therefore, the order of the function calls is irrelevant for the computed value or correctness (but it may have an impact on the time consumption, because lazy evaluation may mean that not all of them have to be executed).

The operator "&&" is defined as

  (&&) :: Bool -> Bool -> Bool
  False && y = False
  True  && y = y
This means that when calling "x && y", for "x == False", the result does not depend on the value of "y", and therefore "y" is not computed.

With a different definition both actual parameters must be known in order to perform the pattern matching. In that case not even lazy evaluation can help us:

  (&&&) :: Bool -> Bool -> Bool
  False &&& False = False
  False &&& True  = False
  True  &&& False = False
  True  &&& True  = True
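
The difference between the two styles of definition can be observed by passing "undefined" as second argument. The sketch below uses the hypothetical names "lazyAnd" and "strictAnd" to avoid clashes with the prelude, and it relies on "try" and "evaluate" from Control.Exception (part of GHC's base library) to catch the error instead of aborting the program:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Defined like the prelude's (&&): the second argument is only passed on.
lazyAnd :: Bool -> Bool -> Bool
lazyAnd False _ = False
lazyAnd True  y = y

-- Defined by a full truth table: matching inspects both arguments.
strictAnd :: Bool -> Bool -> Bool
strictAnd False False = False
strictAnd False True  = False
strictAnd True  False = False
strictAnd True  True  = True

main :: IO ()
main = do
  print (lazyAnd False undefined)    -- False, undefined is never touched
  r <- try (evaluate (strictAnd False undefined)) :: IO (Either SomeException Bool)
  case r of
    Left _  -> putStrLn "strictAnd had to evaluate its second argument"
    Right b -> print b
```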

The definition of equality of lists is (except for details of the typing) as follows:

  (==) :: [a] -> [a] -> Bool
  []      == []      = True
  []      == (y : t) = False
  (x : s) == []      = False
  (x : s) == (y : t) = x == y && s == t

The important thing is that "==" is defined in terms of "&&". We know that due to its clever formulation, lazy evaluation implies that as soon as the first argument does not evaluate to true, the second argument is not evaluated. But then this implies for "==" on lists that the comparison is terminated once the first non-matching pair of values is encountered.

This observation of the operation of "==" on lists is important. It implies that often comparisons are much cheaper than they appear. Consider the following:

  divisors :: Int -> [Int]
  divisors n = [i | i <- [1 .. n] , n `rem` i == 0]

  isPrime :: Int -> Bool
  isPrime n = divisors n == [1, n]

How far is "divisors" evaluated? Only as long as this is needed to verify whether the result equals "[1, n]" or not. This means only as long as the lists, starting from the beginning, are pairwise identical. So, as soon as the first non-trivial divisor is found, it is compared with "n" and the conclusion is that the lists are not the same. So, the value of "isPrime n" is known, and the rest of the list of divisors is left unevaluated. Clearly it is even better to only test the numbers "[2 .. floor (sqrt (fromIntegral n))]" and to compare with the empty list.
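
The improvement suggested in the last sentence may be sketched as follows. The name "isPrimeFast" is ours, and instead of comparing with the empty list we use the prelude function "null", which amounts to the same test. The conversion through floating-point numbers is only reliable for moderately large values of n:

```haskell
-- Primality test that only tries divisors up to the square root.
-- "isPrimeFast" is a hypothetical name.
isPrimeFast :: Int -> Bool
isPrimeFast n = n > 1 && null [i | i <- [2 .. limit], n `rem` i == 0]
  where
    limit = floor (sqrt (fromIntegral n :: Double))

main :: IO ()
main = print (filter isPrimeFast [1 .. 30])
-- [2,3,5,7,11,13,17,19,23,29]
```

Again laziness helps: "null" only needs to know whether a first divisor exists, so for a composite number the search stops at its smallest non-trivial divisor.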

Infinite Lists

Standard Constructions

Infinite lists can arise in various ways. The following denotes the list of all integers starting with n, in increasing order:
  [n .. ]

The following function also gives an infinite list:

  iterate :: (a -> a) -> a -> [a]
  iterate f x = x : iterate f (f x)
Calling "iterate (+ 1) 3" gives the same list as "[3 .. ]".

Simpler infinite lists are obtained by "repeat", which simply gives the same result again and again:

  repeat :: a -> [a]
  repeat x = x : repeat x

The simplest polymorphic function is the identity function:

  id :: a -> a
  id x = x
Partially parametrizing "iterate" with "id" gives "repeat", so the higher-order function "iterate" is strictly more general than "repeat".
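
These relations can be checked on finite prefixes with "take". This is a small sketch using only prelude functions:

```haskell
main :: IO ()
main = do
  print (take 4 (iterate (+ 1) 3))   -- [3,4,5,6], the start of [3 ..]
  print (take 4 (iterate id 'x'))    -- "xxxx"
  print (take 4 (repeat 'x'))        -- "xxxx"
```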

Small Powers

There are many handy applications of infinite lists. Assume we want to compute all powers of a number "r" which are smaller than "m". An easy possibility is to estimate the number "n" so that r^n < m <= r^{n + 1} and then to perform
  map (r ^) [0 .. n]
In this particular case, this is not very efficient, and the number n can indeed be computed, but if we want to determine more generally the values f(0), ..., f(n), where f(i) < m for all 0 <= i <= n and f(n + 1) >= m, then we do not know which n to take.

This problem can be solved in Haskell with the methods we have encountered before, but the following is much easier:

  takeWhile ( < m) (map f [0 .. ])

If in a call like "takeWhile ( < 1000) (map (3^) [0 ..])" first all powers of three would be computed, this would never lead to a result, but the whole process is directed by "takeWhile": it repeatedly calls for a new value from the list. This value is then produced by applying "f" to the next element from the list. This is repeated until the first value that is not smaller than 1000 is produced.

We consider an alternative variant. Above we were generating all values so that f(i) < m for all 0 <= i <= n and f(n + 1) >= m. We can also consider the values so that f(f( ... f(0))) < m. This is a typical example where one may use "iterate":

  takeWhile ( < m) (iterate f 0)

The values 1, 3, 9, 27, ... can be interpreted as the sequence of functional values f(i) for the function f = (3^), but just as well as the values f(f( ... f(1))) for the function f = (3 * ), starting from 1 instead of 0. This latter interpretation allows us to generate the sequence more efficiently:

  takeWhile ( < m) (iterate (3 * ) 1)
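
For a concrete bound both variants can be run side by side. In the sketch below the value 1000 for "m" is just an example:

```haskell
main :: IO ()
main = do
  let m = 1000 :: Integer
  print (takeWhile (< m) (map (3 ^) [0 ..]))    -- [1,3,9,27,81,243,729]
  print (takeWhile (< m) (iterate (3 *) 1))     -- [1,3,9,27,81,243,729]
```

The first variant computes every power from scratch, the second obtains each value from its predecessor with a single multiplication.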

Converting an Integer to a String

Assume we want to define a function which converts an integer to the string that represents its value in decimal notation. First one needs a function which converts a decimal digit to the corresponding character; then one needs some construction to extract the digits from the number one by one and to pack them in a string.

The first task is easy (actually this is also achieved by the function "digitChar" from the prelude):

  digitToChar :: Int -> Char
  digitToChar = chr . (ord '0' +)
Here we see an operator section of "+" composed with the function "chr". The function does not work correctly when the provided argument is not a digit. Providing other numbers will result either in nonsense or in an error.

For converting an entire positive number, we can proceed as follows:

  digitsToString :: Int -> String
  digitsToString n
    | n == 0 = []
    | n >  0 = digitToChar x : digitsToString y
      where
        x = n `rem` 10
        y = n `div` 10

It is a minor problem that now the number gets reversed and we should also deal with the case n == 0 in a special way:

  intToString :: Int -> String
  intToString n
    | n == 0    = "0"
    | otherwise = reverse (digitsToString n)

There is nothing wrong with the above method, but it does not taste very `functional', it stays very close to a conventional C implementation, with a barely hidden loop. Furthermore, we were once again programming the recursion ourselves, not using all the provided mechanisms. This is as if we would program a for loop in C with an if and a goto.

We now describe a more sophisticated alternative which only uses functions from the prelude. A number is viewed as having an infinite number of leading zeros. This interpretation allows us to apply "iterate (`div` 10)". Remember: "iterate" gives an unconditional repetition. Applying this function to 64582 generates [64582, 6458, 645, 64, 6, 0, 0, ...]. This infinite list can be made finite by combining it with "takeWhile (/= 0)". As soon as the first number with value 0 is encountered in the list, the list is truncated. Because of the lazy evaluation, this means that "iterate" is not asked to provide further numbers. The remaining task is easy: from each of the provided numbers we must take the last digit and convert it to a character. These should be glued together to a string and reversed to get the answer (only in the case that the specified value equals 0 would we get an empty string, so this case is treated separately):

  intToString :: Int -> String
  intToString 0 = "0"
  intToString n = (reverse . map (digitToChar . (`rem` 10)) 
                           . takeWhile (/= 0) . iterate (`div` 10)) n
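
To see the intermediate list, the pipeline can be split into two stages. The following is a sketch for non-negative numbers only (for a negative argument the list produced by "iterate" never reaches 0). The helper name "prefixes" is ours, and "chr" and "ord" are imported from Data.Char, as required in standard Haskell:

```haskell
import Data.Char (chr, ord)

digitToChar :: Int -> Char
digitToChar = chr . (ord '0' +)

-- The numbers obtained by repeatedly dropping the last digit.
-- "prefixes" is a hypothetical helper name.
prefixes :: Int -> [Int]
prefixes = takeWhile (/= 0) . iterate (`div` 10)

intToString :: Int -> String
intToString 0 = "0"
intToString n = reverse (map (digitToChar . (`rem` 10)) (prefixes n))

main :: IO ()
main = do
  print (prefixes 64582)         -- [64582,6458,645,64,6]
  putStrLn (intToString 64582)   -- 64582
  putStrLn (intToString 0)       -- 0
```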

Sieving Primes

A classical method for finding prime numbers is by `sieving' out the non-primes. In the conventional imperative algorithms, one uses an array of Booleans for this (using one bit to represent the status of each number) exploiting the random-access feature of arrays to rapidly access the numbers across the array. To program this original variant of the algorithm is one of the exercises. If random access is provided, this is very efficient: to determine all prime numbers up to a certain number n takes an amount of time that is hardly more than linear in n, using n bits of memory.

Here we discuss an even simpler variant. In an implementation with lists it is quite an efficient prime number generating method, but it is incomparably slower than the method working with an array of Booleans: the time for generating all primes up to a certain number n is almost quadratic in n. The infiniteness is here pushed quite far: we do not work with just one infinite list, but with infinitely many infinite lists. Of course none of these is entirely traversed.

The first list contains all numbers larger than 1. Its first element, 2, is a prime number. In the second list we take out all multiples of 2, these are certainly no prime numbers. The remaining list consists of all odd numbers larger than 1. Its first element, 3, is a prime number. Taking out all multiples of 3 gives a list consisting of all numbers larger than 1, which are no multiples of 2 and 3. Its first element, 5, is a prime number again.

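More generally, the construction goes as follows: list 0 consists of all integers larger than 1, and list i + 1 is obtained from list i by removing the first element of list i together with all its multiples.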

Let us denote the i-th prime number by p_i. Thus p_0 = 2, p_1 = 3, ... . We claim that with the above construction the first element of list i is just p_i, for all i >= 0. Notice how important the precise definitions are for clearly formulating a claim like this. It is proven by complete induction. The case i == 0 forms the basis of the induction. The first element of list 0 is 2, which is indeed p_0. Now assume the claim holds for all j <= i. Which elements do we find in list i + 1? This list is obtained in i + 1 steps from list 0 by filtering out, in each step, the multiples of the first element of the current list. Because of the induction assumption we know that these first elements are the numbers p_0, ..., p_i. Consider p_{i + 1}, the (i + 1)-st prime number. Because p_{i + 1} >= 2, it is an element of list 0. Because p_{i + 1} is not a multiple of any other prime number, it is not filtered out, and therefore it is also an element of list i + 1. It remains to show that p_{i + 1} is the first element of list i + 1. Consider any number q, 2 <= q < p_{i + 1}. Any number, so q as well, can be written as a product of prime factors. Because q < p_{i + 1}, q cannot have p_{i + 1} or any larger prime number as a factor. Thus q can be written as a product of prime numbers smaller than p_{i + 1}, that is, of numbers from p_0, ..., p_i. But this implies that q was removed from the lists, at the latest when the multiples of its smallest prime factor were filtered out.

The above gives a formal proof that taking the first elements of all generated lists precisely gives us the set of prime numbers, ordered from small to large. It remains to turn this idea into Haskell. Notice that in the proof, even though all involved lists are infinite, this played no role. Just in the same way, the infiniteness will not bother an interpreter with lazy evaluation. On the other hand, without lazy evaluation, it would not be possible to directly translate the mathematical idea into computer code.

The iterate construction produces an infinite list of results by applying a function again and again. Until now the function was exactly the same in all applications: in the first example we used (3 * ), in the second (`div` 10). "iterate" allows us to pass only one function, but the behavior of this function can be tuned by its argument: here each application filters out the multiples of a different number, namely the first element of the list it receives. That is what we will do to construct the filtered lists:

  multiple :: Int -> Int -> Bool
  multiple x y = y `rem` x == 0

  sieve :: [Int] -> [Int]
  sieve (x : s) = filter (not . multiple x) s

  makeLists :: [[Int]]
  makeLists = iterate sieve [2 .. ]

  primeNumbers :: [Int]
  primeNumbers = map head makeLists

Here we encounter some new aspects of Haskell. The functions "makeLists" and "primeNumbers" do not have any parameters: out of nothing they generate a list of lists of ints and a list of ints, respectively. The function "map" takes two arguments: a function and a list. In our case the list is actually a list of lists. We see that "makeLists" can be treated just as any other variable of type [[Int]].

In the definition of "sieve" we see that in the composed function "not . multiple x" no additional brackets are needed around "multiple x". The reason is that function application binds more strongly than any operator, even than "." with its fixity 9: the expression is read as "not . (multiple x)", so first "multiple" is applied to "x" and then the resulting predicate is composed with "not". The sieve function is parametrized with the first element of the list to which it is applied. This first element is removed. Notice that "sieve" is only defined for non-empty lists: for an empty list the pattern-matching mechanism will fail.

For this last reason, it is better to not define a top-level function "sieve", but to make it local: this assures that it will never be called in an inappropriate way. Observing that there is no need to define "makeLists" explicitly, we get the following:

  primeNumbers :: [Int]
  primeNumbers = map head (iterate sieve [2 .. ])
    where
      sieve (x : s) = filter (not . multiple x) s
        where
          multiple x y = y `rem` x == 0
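
A finite prefix of the list can be inspected with "take". The sketch below repeats the definition from above (with an extra equation for the empty list, which is never reached but keeps the pattern match complete) and compares it with a directly recursive formulation of the same idea; the name "primes" for this second variant is ours:

```haskell
-- The definition from the text.
primeNumbers :: [Int]
primeNumbers = map head (iterate sieve [2 ..])
  where
    sieve (x : s) = filter (not . multiple x) s
    sieve []      = []                  -- unreachable for an infinite list
    multiple x y  = y `rem` x == 0

-- A directly recursive variant; "primes" is a hypothetical name.
primes :: [Int]
primes = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `rem` p /= 0]
    sieve []       = []                 -- unreachable for an infinite list

main :: IO ()
main = do
  print (take 10 primeNumbers)   -- [2,3,5,7,11,13,17,19,23,29]
  print (take 10 primes)         -- [2,3,5,7,11,13,17,19,23,29]
```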

Here we omitted most of the type information. In general there are three good reasons for providing type information:

On the other hand, it does not make sense to double the length of a script providing information that is obvious anyway.

Efficiency and Correctness

Time Consumption

Counting Reductions

Typing ":s +s" in the Hugs environment turns on the measurement of time and memory usage. After the completion of a call to a function, the interpreter then prints the number of reductions and cells. The reductions give the number of function calls as counted by the interpreter. The cells give a measure for the memory used. In addition the interpreter may inform us that it was performing garbage collection; this means that the interpreter was running out of memory and reclaimed space that was no longer in use.

It may happen that calling the same function with the same arguments does not result in the same number of reductions and cells. The reason is that even the procedure which performs the syntactical analysis of the script (upon loading it, or any time after leaving the editor) is lazy. This means that if a constant is defined by an expression, its value is not computed when the script is loaded, but only when it is needed for the first time. When the function is called again, this value is still available. This happens when making two calls to "f" in the following script:

  k = 3 + 4
  f x = k + x

Order of Growth

The development of the time consumption as a function of the input size is more important than the exact number of reductions. The reason is that this tells more about the underlying applied methods and is less dependent on the details of the implementation. This order of growth is the main topic of any course on algorithms. Here we will only recall some given examples and consider a few further examples to show the importance of choosing the right approach.

For the computation of the Fibonacci numbers we were first computing fib(n) by calling fib(n - 1) and fib(n - 2). The result was a time consumption that is exponential in n. The alternative method based on making steps with a pair of numbers had a time consumption which was linear in n. The difference is huge: with the first method n == 30 is about the largest value for which fib(n) can be computed, with the second method there is almost no limit on n.
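
As a reminder, the two methods may look as follows. This is a reconstruction under assumptions: the names "fibSlow" and "fibFast" are ours, and we assume the convention fib(0) = 0 and fib(1) = 1, which may differ from the definitions given earlier:

```haskell
-- Two recursive calls per step: exponential time.
fibSlow :: Int -> Integer
fibSlow n
  | n < 2     = toInteger n
  | otherwise = fibSlow (n - 1) + fibSlow (n - 2)

-- Stepping with the pair (fib i, fib (i + 1)): linear number of steps.
fibFast :: Int -> Integer
fibFast n = fst (iterate step (0, 1) !! n)
  where
    step (a, b) = (b, a + b)

main :: IO ()
main = do
  print (map fibSlow [0 .. 10])   -- [0,1,1,2,3,5,8,13,21,34,55]
  print (fibFast 30)              -- 832040
```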

For the computation of prime numbers we have seen how to reduce the number of computations from quadratic (dividing any number j <= n by all numbers < j), to something much smaller. Here we encountered the problem that Haskell does not provide random access memory. This implies that the discussed sieving method (of which a variant is the subject of one of the exercises) has to traverse the lists step by step. In a language like C we would use an array which would be traversed with ever increasing steps.

Performance Improvements

Here we do not present all possibilities to solve problems better by using better algorithms and data structures. We only make a Haskell-specific observation related to the implementation of lists.

Lists are implemented as linked lists (see the example in the chapter on Java). This means that insertions can be made easily at the beginning of the list, but that it is costly to access an element at a specific position or to add something at the end of a list. This is the reason that the preferred way of adding elements to a list is with help of the operator ":". Independently of the length of the list "s", it takes only constant time to compute "x : s".

How should one concatenate two or more lists? Suppose we have lists called "s1", "s2", "s3" and "s4". Their lengths are l1, l2, l3 and l4. If we perform ((s1 ++ s2) ++ s3) ++ s4, then the cost for finding the end of the first list in each ++-clause equals l1 + (l1 + l2) + (l1 + l2 + l3) = 3 * l1 + 2 * l2 + l3. If on the other hand we perform s1 ++ (s2 ++ (s3 ++ s4)), the cost is only l3 + l2 + l1. So, it matters how "++" and similar operations are performed. Realizing this, the Haskell designers have chosen to make "++" right-associative.
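
The two bracketings correspond to the two folds. In the following sketch both functions compute the same list, but according to the cost analysis above only the right-nested variant avoids traversing the front part again and again. The names "concatRight" and "concatLeft" are ours:

```haskell
-- s1 ++ (s2 ++ (s3 ++ [])): every list is traversed once.
concatRight :: [[a]] -> [a]
concatRight = foldr (++) []

-- (([] ++ s1) ++ s2) ++ s3: the front part is traversed repeatedly.
concatLeft :: [[a]] -> [a]
concatLeft = foldl (++) []

main :: IO ()
main = do
  let parts = [[1, 2], [3], [4, 5, 6 :: Int]]
  print (concatRight parts)   -- [1,2,3,4,5,6]
  print (concatLeft parts)    -- [1,2,3,4,5,6]
```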

Proving Laws

Logical and Numerical Laws

In an earlier chapter we have seen laws from propositional logic: simple ones such as "x & y == y & x" (& is commutative) and less simple ones such as "x -> y & y -> z => x -> z" (-> is transitive). For computing with numbers there are also laws. Examples of such laws are "a * b + a * c == a * (b + c)" (* distributes over +) and "x * a / (x * b) == a / b" (simplification of fractions by a common factor).

The essence of a law is that it holds independently of its context, that is, the values of a, b, c, x, y and z in the formulation of the laws do not matter: the law always holds. The reason to formulate laws is that they allow us to reason with them. This can be used when proving the equivalence of two more complicated logical expressions.

As an example we give a slightly shortened version of the proof of the law "(a + b) * (a - b) == a * a - b * b":

  (a + b) * (a - b)             =distributivity= 
  a * a + b * a - a * b - b * b =commutativity of *=
  a * a + a * b - a * b - b * b =distributivity=
  a * a + a * (b - b) - b * b   =definition of -=
  a * a + a * 0 - b * b         =0 annihilates *=
  a * a + 0 - b * b             =0 is neutral element of +=
  a * a - b * b
Once we have proven this new law we can use it whenever we encounter a pattern "(a + b) * (a - b)". Here "a" and "b" are not necessarily numbers, they can themselves be numerical expressions.

The main reason to apply laws is to simplify expressions, whatever simplification exactly means. In one context one should rather write 13 / 20, in another context 0.65, which by definition is nothing but 65 / 100. Typically one wants to reduce the number of symbols or the size of the involved numbers, but the goal may also be to come up with an equivalent formulation which can more easily be evaluated.

Haskell Laws

In an analogous way we can formulate and prove laws in Haskell. Examples of such laws, which will be proven hereafter, are the following:
  f . (g . h)    == (f . g) . h             -- associativity of .
  map f . (x :)  == ((f x) :) . map f       
  map (f . g)    == map f . map g           -- map distributes over .
  map f (s1 ++ s2) == map f s1 ++ map f s2  -- map f distributes over ++
  map f . concat == concat . map (map f)

Just as with logical and mathematical laws, these Haskell laws can be used to prove more complex functional equalities. The purpose of this is to be able to perform program transformation, which means that in a stepwise way a piece of code is transformed to another piece of code which is provably equivalent. As in the mathematical and logical context, the reason that one may want to do this is to `simplify' the code, which in a programming context typically means to obtain a program which can be executed more efficiently by the computer. The great point with having laws is that they can be handled in an automated way. This implies that program optimization can be performed by the computer. Even in imperative languages one can perform program transformations, but there one always has to take care of unexpected side-effects. Because the value of a function does not depend on a context, and because computing a function does not change any other values, we do not have this problem when applying laws in a functional language.
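
Before proving a law it can be reassuring to try it on some sample values. The following sketch does this for four of the laws above; the functions and lists are arbitrary choices of ours, and of course such a check on a few inputs does not replace a proof:

```haskell
main :: IO ()
main = do
  let f   = (+ 1) :: Int -> Int
      g   = (* 2) :: Int -> Int
      s1  = [1, 2, 3]
      s2  = [4, 5]
      xss = [s1, [], s2]
  print (map f (0 : s1)       == f 0 : map f s1)            -- True
  print (map (f . g) s1       == (map f . map g) s1)        -- True
  print (map f (s1 ++ s2)     == map f s1 ++ map f s2)      -- True
  print ((map f . concat) xss == (concat . map (map f)) xss) -- True
```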

In order to start proving, we need some foundations. We will use that by definition "f == g" means that for all x from the domain over which f and g are defined "f x == g x". Also we use that the operator ".", which gives functional composition, is defined as "(f . g) x == f (g x)". And, of course, we will need the definitions of the involved functions such as "map", "++" and "concat".

We prove the law "f . (g . h) == (f . g) . h", by showing that for any x from the domain of h we have "(f . (g . h)) x == ((f . g) . h) x":

  (f . (g . h)) x                     =def .=
  f ((g . h) x)                       =def .=
  f (g (h x))                         =def .=
  (f . g) (h x)                       =def .=
  ((f . g) . h) x

The law "map f . (x :) == ((f x) :) . map f" is proven by showing that for any argument s, which clearly must be a list of the type of x, we have "(map f . (x :)) s == (((f x) :) . map f) s":

  (map f . (x :)) s                   =def .=
  map f ((x :) s)                     =def of operator sections=
  map f (x : s)                       =def of map=
  (f x) : (map f s)                   =def of operator sections=
  ((f x) :) (map f s)                 =def .=
  (((f x) :) . map f) s

The above proofs were straightforward. It is slightly harder to prove laws on functions with a recursive definition. In that case we should give a proof by complete induction, and distinguish several cases, which together cover all possible inputs. Mostly there are two cases, one basis case and the general case, but sometimes it is necessary to distinguish more than two cases. Because of the inductive assumption, when proving the general case, we may assume that the law holds for shorter lists or smaller numbers.

We prove that "map (f . g) == map f . map g". This law tells that there is no need to traverse the list twice: we can directly apply the composed function. First we show that "(map (f . g)) [] == (map f . map g) []", and then we show that "(map (f . g)) (x : s) == (map f . map g) (x : s)" under the assumption that "(map (f . g)) s == (map f . map g) s".

  (map (f . g)) []                    =def map=
  []                                  =def map=
  map f []                            =def map=
  map f (map g [])                    =def .=
  (map f . map g) []

  (map (f . g)) (x : s)               =def map=
  (f . g) x : map (f . g) s           =induction assumption=
  (f . g) x : (map f . map g) s       =def .=
  f (g x) : map f (map g s)           =def map=
  map f (g x : map g s)               =def map=
  map f (map g (x : s))               =def .=
  (map f . map g) (x : s)

We prove that "map f (s1 ++ s2) == map f s1 ++ map f s2" for all lists s1 and s2 of appropriate type. First the case "s1 == []" is checked, then the case "s1 == (x : s)", assuming that the law holds for "s1 == s":

  map f ([] ++ s2)                    =def ++=
  map f s2                            =def ++=
  [] ++ map f s2                      =def map=
  map f [] ++ map f s2

  map f ((x : s) ++ s2)               =def ++=
  map f (x : (s ++ s2))               =def map=
  f x : map f (s ++ s2)               =induction assumption=
  f x : (map f s ++ map f s2)         =def ++=
  (f x : map f s) ++ map f s2         =def map=
  map f (x : s) ++ map f s2

The inductive proof that "map f . concat == concat . map (map f)" goes analogously, first checking the empty list, and then proving that "(map f . concat) (x : s) == (concat . map (map f)) (x : s)" under the assumption that the law holds for "s". This law, too, may be used to reduce the number of list traversals by one. It is a direct generalization of the law above to the case that there is a list of lists:

  (map f . concat) []                 =def .=
  map f (concat [])                   =def concat=
  map f []                            =def map=
  []                                  =def concat=
  concat []                           =def map=
  concat (map (map f) [])             =def .=
  (concat . map (map f)) []

  (map f . concat) (x : s)            =def .=
  map f (concat (x : s))              =def concat=
  map f (x ++ concat s)               =distribution of map f over ++=
  map f x ++ map f (concat s)         =def .=
  map f x ++ (map f . concat) s       =induction assumption=
  map f x ++ (concat . map (map f)) s =def .=
  map f x ++ concat (map (map f) s)   =def concat=
  concat (map f x : map (map f) s)    =def map=
  concat (map (map f) (x : s))        =def .=
  (concat . map (map f)) (x : s)

In the above proof we use the distributive law we have proven before. The more laws we know, the shorter the proofs that can be given. Proofs with several cases can often also be shortened by performing the case distinction only where it is necessary, while performing transformations which hold for all inputs only once. We would have saved some lines if we had realized that for all xs (empty or not), "(map f . concat) xs =def .= map f (concat xs)" and "(concat . map (map f)) xs =def .= concat (map (map f) xs)". Then by induction it only remains to prove that "map f (concat []) == concat (map (map f) [])" and that "map f (concat (x : s)) == concat (map (map f) (x : s))" under the assumption that "map f (concat s) == concat (map (map f) s)".

When trying to construct a proof like those given above, it may not always be easy to stay on the path leading from the function at the beginning to the function at the end. Often it helps to start from both ends and to try to meet in the middle (in inductive proofs this may be the point where the inductive assumption can be applied).

Exercises

  1. The function "nand" has two boolean arguments and returns True when not both arguments are true. Give two definitions of nand: one in terms of other logic functions and one by enumeration.

  2. The function "nand" is universal in the sense that the other (we mean "not", "and", and "or") Boolean functions can be given in terms of it. Do this.

    The functions nand, and and or are binary. Construct corresponding operators, which should be denoted !&&, &&& and |||, respectively. Make them left-associative and give them a sensible fixity.

  3. Define a function
          sortedFour :: Int -> Int -> Int -> Int -> Bool
        
    which returns True if and only if the four specified values stand in sorted order. That is, they must be (weakly) increasing.

  4. Define a function
          differentFour :: Int -> Int -> Int -> Int -> Bool
        
    which returns True if and only if all four specified values are different. Use guards and minimize the number of comparisons (6 comparisons is enough).

  5. Define a function
          numDifFour :: Int -> Int -> Int -> Int -> Int
        
    which returns a decent error message if the numbers are not sorted (you may use the function sortedFour from above for testing this) and otherwise returns the number of different values among the four arguments.

  6. Define a function
          charToNum :: Char -> Int
        
    which converts a character which represents a digit to the corresponding digit. So, '5' is converted to 5. If the character is not a digit an error message should be produced.

    Give two variants of the function: the first uses guards to distinguish all cases, the second uses the functions "ord" and "chr".

    Now create a function

          stringToNum :: String -> Int
        
    which converts a string consisting of digits only to an integer. This function should use charToNum and it should also be possible to enter negative numbers.

  7. Define a function
          rangeProduct:: Int -> Int -> Int
        
    For arguments low and hgh it is defined as follows: if low > hgh, the returned value is 0. Otherwise the range product is given by low * (low + 1) * ... * hgh.

    Give a definition of the function "fac", computing the factorial, in terms of the function "rangeProduct".

  8. The famous Euclidean algorithm for computing the greatest common divisor, gcd, of two positive integers proceeds as follows: for numbers x and y (assume x >= y) it equals y if the remainder of dividing x by y equals 0, otherwise it equals the gcd of y and the remainder of the division of x by y.

    Using the function "rem" computing the remainder of the division of the first by the second argument defined in the prelude, define a recursive function "gcd" which for any pair of positive (that is, larger than 0) integers computes the gcd. Concretely: your script should also handle the case x < y.

  9. Define a function
          stringReplicate :: Int -> String -> String
        
    For an integer n and string s, it creates a string with n copies of s after each other. So,
          stringReplicate 3 "ape"    ==   "apeapeape"
        

  10. Define a function "charFilter" which for any input string takes out all alphabetical letters, small letters and capital, and turns the small letters into capitals. All other symbols are to be discarded. Use list comprehensions and the functions "chr" and "ord", but do not use any of the other prelude functions such as "toUpper" or "isAlpha". Define these yourself instead.

  11. How are lists implemented? If they are implemented in arrays, then it matters where an insertion is performed: at the end or the beginning. Define the following four functions:
          pshRght :: Int -> String -> String
          addRght :: Int -> String -> String
          addLeft :: Int -> String -> String
          pshLeft :: Int -> String -> String
        
    "pshRght n s" takes string s and adds n times a blank at the front of s. So, "pshRght 3 "ape"" gives "   ape". Notice that this is not done by adding n blanks in one stroke. "pshRght" is based on the operator ":". "addRght n s" does the same, but adds n times " " using "++". "addLeft" is similar to "addRght", but adds the strings " " at the end using "++". In order to define pshLeft, first define an operator ":::" adding an element to a list at the end of the list. It should be left-associative and have fixity 5. Now define "pshLeft" analogously to "pshRght".

    Determine for each of the functions the smallest n so that it takes more than a minute to execute them.

  12. We have seen how to compute prime numbers. The program was short but inefficient. A really good method to compute all prime numbers up to a certain number is with help of the so-called sieve of Eratosthenes. In the text a version using an array of integers is presented. For computing all primes up to n, it is potentially much more efficient (provided there would be an array implementation of lists) to work with an array of n + 1 Booleans.

    Initially all positions are set to true except for positions 0 and 1. Then the array is traversed from the small indices to the large. When at position i we encounter true, we know that i is a prime number. All multiples of i are not prime numbers, so their values must be set to false.

    In an imperative language, using a for loop an if and a second for loop, this program can be programmed in 20 minutes. It can be downloaded here. Now turn this idea into a Haskell script: define a function

           eratosthenes :: Int -> [Int]
        
    which for an integer value computes all prime numbers in a list.

  13. The Catalan numbers are defined by
    cat(1) = 1,
    cat(n) = sum_{i = 1}^{n - 1} cat(i) * cat(n - i), for all n > 1.
    Give a simple 3 line function computing the Catalan numbers. Hint: use a list comprehension to simulate the sum.

    How much is cat(15)? What is the problem? Let T(i) denote the time for computing cat(i). Prove that if T(1) == 1, then T(n) >= 2^n. Hint: use induction. Actually the time consumption is even worse, and therefore you can make quite coarse estimates.

    A major improvement can be obtained by not computing the same numbers again and again. This can be achieved by first computing cat(i), for all i < n and packing these somehow in a list. This list is used when computing cat(n). Hint: the list can be accessed with help of the function "!!". How long does it take now to compute cat(15)?

  14. An interesting problem is to compute the minimum number of coins required for paying a certain amount of money, assuming that there are sufficiently many coins of each value. We consider a system in which the coins have values v_0 = 1, v_1 = 4, v_2 = 6 and generally v_{3 * k + i} = v_i * 10^k, for all 0 <= i < 3 and k > 0. In this system it is not trivial to perform the task: always taking the largest still fitting coin first is not good for 88.

    Let coin(n, l) be the minimum number of coins needed to pay an amount n using only the coins with values v_0, ..., v_l. Then, it is not hard to see that

    coin(n, l) = infinity, if n < 0
    coin(n, l) = 0, if n == 0
    coin(n, l) = min{coin(n, l - 1), 1 + coin(n - v_l, l)}, if n > 0

    Let minCoin(n) be the minimum number of coins needed to pay an amount n using any of the coins. Clearly the largest potentially useful coin is the largest coin with value not exceeding n. Write a script for evaluating the function minCoin according to the given specification. How much is minCoin(400)?

    The inefficiency of the program comes from the fact that certain values are recomputed again and again. The worst are the values coin(n, 0) and coin(n, 1). Now define a function fastMinCoin, which computes the same values in a more efficient way: any value should be evaluated only once and kept in appropriate lists.

    Hint: compute the values "row by row", that is, first compute all values coin(n, 0), then the values coin(n, 1). Of course, you may just as well work "column by column", first computing all values coin(0, l), then all values coin(1, l). It may help to first consider an imperative implementation to better understand the task to solve. An implementation in C can be downloaded here.

  15. Define the function "concat" as a call to "foldr".

  16. Define the function "length" as a call to "foldr", defining a suitable binary operator (with arguments of different types).

  17. Give a non-recursive definition of "repeat" using "iterate".

  18. Define the function "until" using "iterate", "dropWhile" and one more function.

  19. Prove that "foldr (:) [] == id". Here "(:)" is the binary function corresponding to the operator ":".

  20. Prove that "length . (x:) == (1 +) . length".

  21. Prove that "length . map f == length", for any function "f".

  22. Prove that "length (s1 ++ s2) == length s1 + length s2", for all lists s1 and s2. Using this, prove that "length . concat == sum . map length".





Logic Programming: Prolog





Finite Automata

Definitions

A pattern is a set of objects with a recognizable property. Most common is to consider patterns in strings of characters. An example of a pattern is that of a legal name of a C identifier. Not all strings give legal identifier names: such a name should begin with a letter or '_' and the following characters should all be letters, digits or '_'. The two central questions in this context are:

The whole domain dealing with these two questions is called automata theory or language theory. In the chapter on grammars we have already seen how many patterns can be generated, for example all strings with equally many a's and b's. Parsing is one way of recognizing strings. In this chapter a more limited machinery is considered.

Accepting and Rejecting

A program often progresses through a certain number of states: situations with well-defined properties. For example, one can think of a program for recognizing whether a string contains some codeword as a subsequence. We consider the codeword "tomorrow". The program starts in the initial state. The program starts to traverse the supplied string and finds the first 't'. Then it progresses to the next state, looking for an 'o'. In this way, the program progresses from state to state and eventually comes to the final state after finding the final 'w'. In this case the whole string is said to be accepted, that is, it fits the imposed requirements. Alternatively, the letters 't', 'o', 'm', 'o', 'r', 'r', 'o', 'w' do not occur in the string in this order, and the program comes to the end of the string without reaching the final state. In that case the string is said to be rejected.

The above way of progressing from state to state is thought to be performed by a device called a state machine. Of course this process can also be described graphically. In the graph, the states correspond to the nodes, and the edges give the possible transitions. Labels next to the edges indicate the conditions under which the transitions are made. The initial state is called the starting state, the final state is called the accepting state. The accepting states will be drawn with double circles. Such a graph is called a finite automaton or just automaton. The default assumption is that the state machine stops upon reaching an accepting state. In that case it may be thought to output "yes"; if it stops without reaching an accepting state, the machine may be thought to output "no".

Finite State Machine Searching for a String

The important point with finite state machines is that they can be translated in a mechanical way into a piece of code. One can write a procedure for each state. In each procedure the next character is scanned and the program continues with one of the specified alternatives, which is selected depending on the value of the character. An essential feature of these finite state automata is that they traverse the string from left to right only once. The program, which can be downloaded here, corresponding to the recognition of "tomorrow" as a subsequence looks as follows:

  #include <stdio.h>
  #define TRUE  1
  #define FALSE 0
  typedef char boolean;

  /* Each function is one state: it skips input until its character is
     found and then hands over to the next state. The variable c is an
     int, so that EOF can be distinguished from every character. */

  boolean first_w(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 'w');
    if (c == EOF) return FALSE;
    return TRUE; }

  boolean third_o(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 'o');
    if (c == EOF) return FALSE;
    return first_w(input); }

  boolean second_r(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 'r');
    if (c == EOF) return FALSE;
    return third_o(input); }

  boolean first_r(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 'r');
    if (c == EOF) return FALSE;
    return second_r(input); }

  boolean second_o(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 'o');
    if (c == EOF) return FALSE;
    return first_r(input); }

  boolean first_m(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 'm');
    if (c == EOF) return FALSE;
    return second_o(input); }

  boolean first_o(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 'o');
    if (c == EOF) return FALSE;
    return first_m(input); }

  boolean first_t(FILE* input) {
    int c;
    while ((c = getc(input)) != EOF && c != 't');
    if (c == EOF) return FALSE;
    return first_o(input); }

  boolean accept_string(FILE* input) {
    return first_t(input); }

  int main() {
    FILE* input = fopen("input", "r");
    if (input == NULL) {
      printf("Cannot open the file input\n");
      return 1; }
    if (accept_string(input))
      printf("Tomorrow we are starting\n");
    else
      printf("We still have to wait\n");
    fclose(input); return 0; }

The program is rather long, but trivial. With the help of a routine find_char the text could be shortened, but the given variant stays closer to the operation of the finite state machine. This could have been done even more explicitly by replacing the while loops by further recursive calls to the function itself. The reason that we have not done this is that very deep recursion leads to stack overflow. The given program is guaranteed to have bounded recursion depth, independently of the length of the input text.
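The shortening with a routine find_char mentioned above might look as follows. This is only a sketch: the text names find_char but does not give it, so its signature and return convention are assumptions made here, as is the loop over the codeword in accept_string.

```c
#include <stdio.h>

/* Hypothetical helper (signature assumed): skips input up to and including
   the next occurrence of target. Returns 1 if target was found and 0 if
   the end of the input came first. */
int find_char(FILE *input, char target) {
  int c;  /* int, so that EOF can be distinguished from every character */
  while ((c = getc(input)) != EOF && c != (unsigned char) target);
  return c != EOF;
}

/* One call of find_char per state: the index i plays the role of the
   current state of the machine. */
int accept_string(FILE *input) {
  const char *codeword = "tomorrow";
  for (int i = 0; codeword[i] != '\0'; i++)
    if (!find_char(input, codeword[i])) return 0;
  return 1;
}
```

The function accept_string can be called from main exactly like the one in the program above. The price for the shorter text is that the states are no longer visible as separate pieces of code.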

Turning the action of a finite state machine into a program is a rare example of a context in which the usage of "goto" is defensible. The reason why one normally should not use goto is that it makes it hard to trace the execution. Here this is no problem: the history is irrelevant, the only two points that matter are the current state and the remaining string to process. The labels of the states should be used as labels in the program.

Using goto, there is no need for while loops, because no data are accumulated as is the case with subroutine calls. So, for the simple machines considered here, goto allows us to stay closest to the operation of the finite state machine. The alternative program, which can be downloaded here, looks as follows:

  #include <stdio.h>

  int main() {
    FILE* input = fopen("input", "r");
    int c;  /* int, not char: EOF must be distinguishable from every character */
    if (input == NULL) return 1;

    first_t:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 't')                    goto first_o;
      else                                  goto first_t;
  
    first_o:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 'o')                    goto first_m;
      else                                  goto first_o;
  
    first_m:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 'm')                    goto second_o;
      else                                  goto first_m;
  
    second_o:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 'o')                    goto first_r;
      else                                  goto second_o;
  
    first_r:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 'r')                    goto second_r;
      else                                  goto first_r;
  
    second_r:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 'r')                    goto third_o;
      else                                  goto second_r;
  
    third_o:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 'o')                    goto first_w;
      else                                  goto third_o;
  
    first_w:
      if ((c = getc(input)) == EOF)         goto reject;
      else if (c == 'w')                    goto accept;
      else                                  goto first_w;
  
    accept:
      printf("Tomorrow we are starting\n"); goto stop;
  
    reject:
      printf("We still have to wait\n");    goto stop;
   
    stop:
      fclose(input); return 0; }

Signal Processing

One of the most important applications of automata is in signal processing: a good copying machine should not copy the tiny speckles on the paper and enlarge them in subsequent copies finally blurring the whole text. This can be done by some kind of edge-detection. A one-dimensional variant of this is that in a string of 0's and 1's any isolated 0 or 1 is suppressed.

The machine has four states:

  0. Zero after a zero
  1. One after a zero
  2. One after a one
  3. Zero after a one

In this case the point of the processing is not so much to accept or reject a string, but rather to traverse it and perform some specified actions depending on the state of the machine. Therefore, the machine does not need to have accepting states. Alternatively one might add one accepting state to which the automaton transits at the end of the processed string.

The transitions follow from the descriptions of the states. For example, if the machine is in state 0, it transits to state 1 when the next bit is a one, otherwise it stays in state 0. In states 0 and 1 the machine outputs a zero, in states 2 and 3 the machine outputs a one.

Finite State Machine Filtering Isolated 0's and 1's

The given machine has four states for two output values. These extra states endow the machine with a memory of its most recent history. In general a finite-state machine can be given a finite memory. However, if the input is not just bi-valued as in this example, the number of states needed may grow fast.

We mentioned that an important feature of finite automata is that they can be translated into code in a mechanical way. At least as important is that they can be realized at a much lower hardware level. The given speckle-suppressor can be realized by a small number of gates, which switch depending on the next bit: a zero gives a pulse on input channel zero, a one on input channel one. This means that each step can be performed by a small constant number of gate switches, in a tiny circuit. Both factors together make such switching several orders of magnitude faster than handling the signals by a general-purpose processor, in which signals have to travel through the whole chip and where a single operation at the level of C involves many operations at the level of the gates.

Paths and Labels

For a given automaton, one can follow for a string of symbols S = s_1 ... s_n, the path starting from the start state through the automaton (which is defined as a graph) choosing the edges which correspond to the symbols s_i. This path can be represented by the sequence of visited states. Constructing this path or, equivalently, this sequence of states, is called simulating the automaton on the sequence s_1 ... s_n. The string S is said to be a label of the path. Because an edge can correspond to many different characters, a path may have many different labels.

As an example we consider the string 01101001101101. If the four-state automaton for suppressing isolated 0's and 1's is applied to this string, then the sequence of visited states is given by 01232301232232. So, 01101001101101 is a label (in this case it is unique) of the path 01232301232232. The output is 00111100111111.
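The simulation in this example can be sketched in C with a transition table. Only the transitions out of state 0 are spelled out in the text; the other rows of the table are our reading of the state descriptions, chosen so that they reproduce the example. The name filter_bits and its interface are assumptions made for this sketch.

```c
#include <string.h>

/* next_state[q][b] is the successor of state q on input bit b. */
static const int next_state[4][2] = {
  {0, 1},  /* state 0: zero after a zero */
  {0, 2},  /* state 1: one after a zero  */
  {3, 2},  /* state 2: one after a one   */
  {0, 2},  /* state 3: zero after a one  */
};

/* Hypothetical helper: runs the machine from state 0 over the bit string
   in and writes the visited states and the output bits as digit strings.
   Characters other than '1' are read as a zero. Both result buffers must
   have room for strlen(in) + 1 characters. */
void filter_bits(const char *in, char *states, char *out) {
  size_t n = strlen(in);
  int q = 0;
  for (size_t i = 0; i < n; i++) {
    q = next_state[q][in[i] == '1'];
    states[i] = (char) ('0' + q);
    out[i] = (q >= 2) ? '1' : '0';  /* states 2 and 3 output a one */
  }
  states[n] = '\0';
  out[n] = '\0';
}
```

For the input 01101001101101 this fills states with 01232301232232 and out with 00111100111111, as in the example.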

Deterministic and Non-Deterministic Automata

Definitions

Until now we only considered automata in which the transitions were well-defined in the sense that for any given input character in any state there was a unique outgoing edge labeled with this character. Such automata are called deterministic automata.

However, there is no requirement that automata are constructed like this. In other words, it is allowed that the same character appears in the lists of symbols of several edges leaving the same state. If an automaton is in some state and the next symbol is x, while x occurs next to more than one outgoing edge, then the automaton may proceed over any of these edges. Such automata are called non-deterministic automata. Without further specification, one should assume that an automaton is non-deterministic.

For a deterministic automaton, one path might be labeled with many different strings, but for any string the path it labels is unique. On a non-deterministic automaton, a string may label many different paths. The most important point is that these paths need not all end in the same kind of state: some may end in accepting states and others in rejecting states. The convention is to say that a non-deterministic automaton accepts a string if at least one of the paths ends in an accepting state.

A non-deterministic automaton can be viewed as a process in which at certain stages guessing is allowed. Not only is guessing allowed, it is even assumed that the process always guesses right. So, it is not correct to replace a non-deterministic automaton by a deterministic one by simply fixing one of the alternatives and excluding the others: this possibly reduces the number of accepted strings, because the necessary alternative for reaching an accepting state might be eliminated.

Detecting 0100

It is relatively easy to construct an automaton to detect a finite number of consecutive 0's or 1's (designing an automaton which recognizes four consecutive 1's is an exercise). It is slightly more tricky to design an automaton which detects a substring like 0100. Consider the following automaton:

Attempt to Detect Substring 0100

Clearly it transits to the accepting state, state 4, only if the digits 0100 occur consecutively: if any wrong bit is encountered, the automaton transits to the starting state, state 0, again. But does it accept all strings which have 0100 as a consecutive substring? The answer is no. For example, 00100 and 010100 are rejected, but this is not correct. The reason is that even a non-matching string can end with a non-empty prefix of the string we are looking for.

In this simple case, it is not hard to repair this mistake with an alternative deterministic automaton:

Deterministic Automaton Detecting Substring 0100
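The difference between the failed attempt and a repaired deterministic automaton can be made concrete with two transition tables. The pictures are not reproduced in this text, so both tables are reconstructions made for this sketch: the first follows the description "on any wrong bit go back to state 0", the second is one possible repair. The name detects_0100 is an assumption as well.

```c
/* Tables are indexed as table[state][bit]; state 4 is the accepting state. */
static const int naive[4][2] = {
  {1, 0},  /* state 0: nothing matched                               */
  {0, 2},  /* state 1: matched 0;   a wrong 0 leads back to state 0  */
  {3, 0},  /* state 2: matched 01                                    */
  {4, 0},  /* state 3: matched 010; a wrong 1 leads back to state 0  */
};

static const int repaired[4][2] = {
  {1, 0},  /* state 0: nothing matched                               */
  {1, 2},  /* state 1: a second 0 is itself the prefix 0: stay here  */
  {3, 0},  /* state 2: matched 01                                    */
  {4, 2},  /* state 3: 0101 ends with the prefix 01: go to state 2   */
};

/* Returns 1 if the machine reaches state 4 while reading s, otherwise 0.
   Reading stops at the first character that is not '0' or '1'. */
int detects_0100(const int table[4][2], const char *s) {
  int q = 0;
  for (; *s == '0' || *s == '1'; s++) {
    q = table[q][*s - '0'];
    if (q == 4) return 1;
  }
  return 0;
}
```

With these tables, detects_0100(naive, "00100") and detects_0100(naive, "010100") return 0, while the same calls with the table repaired return 1. Both tables accept 0100 itself.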

However, it is much easier to do this in a non-deterministic way:

Non-Deterministic Automaton Detecting Substring 0100

With an extra convention, the automaton can be simplified further:
If an automaton is in a certain state and the next symbol does not occur in the list of labels of any of the outgoing edges of this state, this branch of the evaluation dies.
This convention is equivalent to an implicit transition labeled with all possible symbols except for those listed along the other outgoing edges to a rejecting state without exit.

This convention allows the automaton to be simplified:

Simplified Automaton Detecting Substring 0100

The semantics are very simple now: the automaton can wait any number of characters looping in state 0, before starting to run, detecting the string 0100 if it occurs. If one would like to use this automaton for actually testing a string, the non-determinism in state 0 when the next bit is 0 should be interpreted as a point where two alternative continuations are to be considered: the search is branching. In other words, the search proceeds along a tree structure instead of a simple path. For the complexity of this search process, it is essential that the branching is limited to the minimum, which is achieved by letting superfluous branches die.

Simulating a Non-Deterministic Automaton

A deterministic automaton can be simulated for a given string by simply following the unique path labeled by the string. If at some stage an accepting state is reached, the string is accepted. If we come to the end of the string before reaching an accepting state, or if the path dead-ends because at some stage there is no transition from the actual state labeled with the current character, the string is rejected.

For a non-deterministic automaton we must proceed along several paths. The following gives the complete simulation of the non-deterministic automaton for detecting the substring 0100:

Simulation for String 010100100

Above we have seen that turning deterministic automata into programs is trivial. For non-deterministic machines this is harder: even though we assume that the automaton always guesses right, we cannot assume that our program does. So, it is not sufficient to just follow any path and reject the string if no accepting state is reached. This implies that somehow we must keep track of all alive paths.

A first idea is to keep track of the state and the reached position in the string of each alive path. So, our data base might look like (2, 12), (1, 20), (3, 18), meaning that there is one alive path which has reached state 2 after reading 12 characters, one with state 1 and character 20 and one with state 3 and character 18. This requires that a position of the string may be accessed several times, but practically this is no problem: the string can be loaded into an array or the command fseek() can be used.
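A minimal sketch of this first idea, for the simplified automaton detecting 0100, keeps the alive paths as (state, position) pairs on a stack. The edges of the automaton are read off from its description (state 0 loops on every bit, the edges 0, 1, 0, 0 lead from state 0 to state 4, and missing edges let a path die). The names Path, successors and accepts_by_search are assumptions made for this sketch, and the string is assumed to be loaded into an array.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int state; size_t pos; } Path;  /* one alive path */

/* Writes the successors of state q on bit b to succ and returns how many
   there are (0, 1 or 2). */
static int successors(int q, int b, int succ[2]) {
  int n = 0;
  if (q == 0) succ[n++] = 0;  /* state 0 loops on every bit */
  if ((q == 0 && b == 0) || (q == 1 && b == 1) ||
      (q == 2 && b == 0) || (q == 3 && b == 0)) succ[n++] = q + 1;
  return n;
}

/* Returns 1 if some path reaches state 4, 0 if all paths die or reach the
   end of the string, and -1 if no memory is available. */
int accepts_by_search(const char *s) {
  size_t len = strlen(s), top = 0;
  int found = 0;
  /* For this automaton at most two paths are pending at any time, so
     len + 2 entries are more than enough. */
  Path *stack = malloc((len + 2) * sizeof *stack);
  if (stack == NULL) return -1;
  stack[top++] = (Path){0, 0};
  while (top > 0) {
    Path p = stack[--top];
    if (p.state == 4) { found = 1; break; }
    if (p.pos == len) continue;  /* end of the string: this path dies */
    int succ[2];
    int n = successors(p.state, s[p.pos] - '0', succ);
    for (int i = 0; i < n; i++)
      stack[top++] = (Path){succ[i], p.pos + 1};
  }
  free(stack);
  return found;
}
```

Each path is followed until it dies before the next one is taken from the stack, so the same position of the string is read several times, as noted above.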

Alternatively, all alive paths can be pushed forwards in a synchronized way as follows: The computation is divided in supersteps. In each superstep the next character is read and for all alive paths the corresponding transition is made. In each superstep paths may die and new paths may be spawned.

The synchronized processing has the disadvantage that the expansion of a promising path is retarded by the others. However, the advantages are tremendous: if the string is infinite (for example the string of digits of the number pi), any given branch can remain unsuccessful for ever, even though others might reach an accepting state. Synchronized expansion assures that any path of finite length is traversed in finite time.

Another great advantage of synchronous expansion of the paths is that if two paths reach the same state, we do not have to continue with both of them: if they are in the same state at the same input character, then they will either both reach an accepting state, or neither of them. Because we are speaking about finite state machines, this means that in any given superstep, there are at most a finite number of alive paths to expand. This means that simulating the non-deterministic machine this way is slower by at most a constant factor in comparison with the best we could do: guess right at every branch and run towards the accepting state at full speed.

Because the synchronous expansion assures that all paths have reached the same position in the string, there is no need to store this position along with the states: it suffices to store the current set of states, which is a subset of all states. Because the automaton is finite, this requires only a constant amount of memory.

Here we touch on a great point of finite automata: if a problem can be formulated in terms of acceptance by a finite automaton (deterministic or non-deterministic) at all, then the problem can be solved by a simple program in a time that is proportional to the size of the input requiring only constant memory. Of course this only holds for the kind of automata we were considering so far: they are traversing the input string only once.

Detecting 01*0 or 1010

A similar example, but slightly more complex, is to detect whether a string contains either the substring 01*0 or the substring 1010. Here * denotes a wildcard, in our case this means either 0 or 1. It is no big problem to design a deterministic automaton, but a non-deterministic automaton is again much easier. In words its functionality is: wait any number of characters before finding 01*0 or finding 1010. There is no need to take into account the multiple possibly useful prefixes:

Detecting 01*0 or 1010

This automaton can be simulated as before. Like in the previous example, in the simulation two matching substrings are found. An actual evaluation would probably halt as soon as the first accepting state is reached. But maybe the task is to find all matches: when searching for a string pattern in an editor, there is mostly an option "next match".

Simulating for String 01011010

This example makes clear that non-determinism makes it easy to formulate automata for string matching problems of the kind "look for occurrences of string S_1 followed by S_2 or S_3 followed by S_4 and S_5". This explains why string matching is one of the most important application areas of automata: a request can be translated by a simple program into an automaton, which subsequently can be turned into a piece of code. Not only can this be done, the resulting code is even efficient.

Eliminating Non-Determinism

We have seen how to simulate a non-deterministic automaton by following all possible paths in a synchronized way. Doing this, we only need to store the current (finite) set of states, from which the next set of states can be computed in constant time. This is the key to the insight that any non-deterministic automaton can be turned into a deterministic one, which is one of the most important ideas from the theory of finite automata.

Towards a Deterministic Automaton

Example 1: the 0100 Automaton

Consider again the automaton detecting 0100. Performing a synchronous expansion, following the transitions dictated by the characters of the input string, the set of current states develops as follows:
  {            0} --0->
  {         1, 0} --1->
  {      2,    0} --0->
  {   3,    1, 0} --1->
  {      2,    0} --0->
  {   3,    1, 0} --0->
  {4,       1, 0} --1->
  {      2,    0} --0->
  {   3,    1, 0} --0->
  {4,       1, 0}

Because there are finitely many states, it is practical to manage the current set of states with an array b[] of bits, b[i] == 1 meaning that state i is in the set and b[i] == 0 meaning that it is not. With this convention, the array b[] develops as follows:

  (0, 0, 0, 0, 1) --0->
  (0, 0, 0, 1, 1) --1->
  (0, 0, 1, 0, 1) --0->
  (0, 1, 0, 1, 1) --1->
  (0, 0, 1, 0, 1) --0->
  (0, 1, 0, 1, 1) --0->
  (1, 0, 0, 1, 1) --1->
  (0, 0, 1, 0, 1) --0->
  (0, 1, 0, 1, 1) --0->
  (1, 0, 0, 1, 1)
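
The development of the bit array can be reproduced with a few lines of C. The following is only a sketch: packing the array b[] into the bits of an unsigned integer (bit i stands for b[i]) and the function names step_0100 and run_0100 are choices made here for illustration. The transitions are those of the 0100 automaton as they can be read off from the trace above.

```c
#include <assert.h>

/* Hypothetical helper: one synchronous step of the 0100 automaton.
   Bit i of the mask is 1 iff state i is in the current set. */
unsigned step_0100(unsigned b, char c) {
    unsigned next = 0;
    assert(c == '0' || c == '1');
    if (b & 1u)                  /* state 0: loop on 0 and 1, and 0 --0-> 1 */
        next |= (c == '0') ? 3u : 1u;
    if ((b & 2u) && c == '1')    /* state 1 --1-> state 2 */
        next |= 4u;
    if ((b & 4u) && c == '0')    /* state 2 --0-> state 3 */
        next |= 8u;
    if ((b & 8u) && c == '0')    /* state 3 --0-> state 4 */
        next |= 16u;
    return next;                 /* state 4 has no outgoing transitions */
}

/* Runs the simulation from the set {0} and returns the final mask. */
unsigned run_0100(const char *text) {
    unsigned b = 1u;
    for (; *text != '\0'; text++)
        b = step_0100(b, *text);
    return b;
}
```

For the input 010100100 the function run_0100 returns 19, the number whose binary representation 10011 corresponds to the final vector (1, 0, 0, 1, 1).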

Graphically these transitions can be indicated as follows:

Simulating for String 010100100

Example 2: the 01*0-or-1010 Automaton

Consider again the automaton detecting 01*0 or 1010. Performing a synchronous expansion, following the transitions dictated by the characters of the input string, the set of current states develops as follows:
  {                  0} --0->
  {               1, 0} --1->
  {         3, 2,    0} --0->
  {   5, 4,       1, 0} --1->
  {   5,    3, 2,    0} --1->
  {   5,       2,    0} --0->
  {6,    4,       1, 0} --1->
  {   5,    3, 2,    0} --0->
  {6, 5, 4,       1, 0}

Because there are finitely many states, it is practical to manage the current set of states with an array b[] of bits, b[i] == 1 meaning that state i is in the set and b[i] == 0 meaning that it is not. With this convention, the array b[] develops as follows:

  (0, 0, 0, 0, 0, 0, 1) --0->
  (0, 0, 0, 0, 0, 1, 1) --1->
  (0, 0, 0, 1, 1, 0, 1) --0->
  (0, 1, 1, 0, 0, 1, 1) --1->
  (0, 1, 0, 1, 1, 0, 1) --1->
  (0, 1, 0, 0, 1, 0, 1) --0->
  (1, 0, 1, 0, 0, 1, 1) --1->
  (0, 1, 0, 1, 1, 0, 1) --0->
  (1, 1, 1, 0, 0, 1, 1)

Graphically these transitions can be indicated as follows:

Simulating for String 01011010

General Idea

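More generally, for an automaton with s states, we maintain an array b[] of s bits. The next set of states, as a function of the current input character c, is computed with the help of a second array b_next[] as follows (the two innermost tests are pseudocode, because they depend on how the transitions are stored):
  void transition(char* b, int s, char c) {
    char b_next[s];  /* the new set of states */
    int i;
    for (i = 0; i < s; i++)  /* start with the empty set */
      b_next[i] = 0;
    for (i = 0; i < s; i++)
      if (b[i] == 1)  /* state i is in the current set */
        for (each transition e leading from state i to state j)
          if (c appears in the list of labels of e)
            b_next[j] = 1;
    for (i = 0; i < s; i++)  /* overwrite the old set by the new one */
      b[i] = b_next[i]; }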
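
Once a representation for the transitions is fixed, the procedure becomes ordinary C. The sketch below is one possible way of doing this and not the only one: it assumes a fixed number of states S, drops the parameter s accordingly, and stores the transitions in a hypothetical table delta[][], in which delta[i][c] is a bit mask of the states reachable from state i with character c. The table is filled with the transitions of the 01*0-or-1010 automaton as they can be read off from the traces of the second example.

```c
#include <assert.h>
#include <string.h>

#define S 7  /* number of states of the example automaton */

/* delta[i][c]: bit j is 1 iff there is a transition from state i to
   state j whose labels include c (index 0 for '0', index 1 for '1'). */
static const unsigned char delta[S][2] = {
    {0x03, 0x05},  /* state 0: 0 -> {0, 1}, 1 -> {0, 2} */
    {0x00, 0x08},  /* state 1: 1 -> {3}                 */
    {0x10, 0x00},  /* state 2: 0 -> {4}                 */
    {0x20, 0x20},  /* state 3: 0 or 1 -> {5}            */
    {0x00, 0x20},  /* state 4: 1 -> {5}                 */
    {0x40, 0x00},  /* state 5: 0 -> {6}                 */
    {0x00, 0x00}   /* state 6: accepting, no successors */
};

/* Replaces the bit array b[0..S-1] by the set of successor states. */
void transition(char *b, char c) {
    char b_next[S];
    int i, j;
    assert(c == '0' || c == '1');
    memset(b_next, 0, sizeof b_next);
    for (i = 0; i < S; i++)
        if (b[i] == 1)
            for (j = 0; j < S; j++)
                if (delta[i][c - '0'] & (1u << j))
                    b_next[j] = 1;
    memcpy(b, b_next, sizeof b_next);
}
```

Starting with only b[0] equal to 1 and calling transition for the characters of 01011010 leaves the states 0, 1, 4, 5 and 6 in the set, in agreement with the last line of the second example. Note that b[] is indexed by state number here, while the vectors in the text are printed with the highest state first.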

This procedure is quite good: s is finite and there are only finitely many transitions from each state, so it runs in constant time (at worst the time is proportional to s^2). However, one can do much better than this. How many different bit-vectors of length s are there? 2^s. For large s, S = 2^s is a large number, but for constant s it is finite. These vectors are in a trivial one-one correspondence with the numbers 0, 1, ..., S - 1.

For any vector b, the resulting vector b_c after encountering a character c is defined by the procedure transition. This is independent of the history: any time the current states are described by b and the input character is c, the next set of states is described by b_c. So, we might just as well precompute all these: for an alphabet of size r and an automaton of size s, we create r arrays of size S = 2^s: b_c[x], 0 <= c < r, 0 <= x < S, indicates the result when applying procedure transition to the vector corresponding to the number x for character c.

The constructed arrays b_c[] define a finite automaton: there are S states, and the transition from state x, 0 <= x < S, upon encountering character c, 0 <= c < r, is given by b_c[x]. Because for any input character a unique transition is specified, this is a deterministic automaton.

We still have to fix the starting state and the final states of the new automaton. If the starting state of the non-deterministic automaton is state i, then the starting state of this deterministic automaton is state x, with x = 2^i. If state i is an accepting state of the non-deterministic automaton, then any state x of the deterministic automaton with the i-th bit of x equal to 1 is an accepting state of the deterministic automaton.

The construction given above, which turns a given non-deterministic automaton into an equivalent deterministic one, is known in the literature as the subset construction.
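
The precomputation can be written down directly. The sketch below does so for the 0100 automaton (s = 5, r = 2, S = 32). The names prim[][], build_tables and contains_0100 are chosen here for illustration, and the automaton halts at the first accepting state, following the default convention of this text.

```c
#include <assert.h>

#define NSTATES 5                  /* s: states of the 0100 automaton */
#define NSUBSETS (1u << NSTATES)   /* S = 2^s                         */

/* prim[i][c]: bit mask of the states reachable from state i with c. */
static const unsigned prim[NSTATES][2] = {
    {0x03, 0x01}, {0x00, 0x04}, {0x08, 0x00}, {0x10, 0x00}, {0x00, 0x00}
};

static unsigned b_c[2][NSUBSETS];  /* the r = 2 precomputed arrays */

/* Fills b_c[c][x] for every bit vector x and every character c. */
void build_tables(void) {
    unsigned x, c, i;
    for (c = 0; c < 2; c++)
        for (x = 0; x < NSUBSETS; x++) {
            b_c[c][x] = 0;
            for (i = 0; i < NSTATES; i++)
                if (x & (1u << i))
                    b_c[c][x] |= prim[i][c];
        }
}

/* Runs the deterministic automaton.  Returns 1 iff a state with bit 4
   equal to 1 (an accepting state) is reached somewhere in the text.
   build_tables() must have been called before. */
int contains_0100(const char *text) {
    unsigned x = 1u << 0;          /* starting state x = 2^0 */
    for (; *text != '\0'; text++) {
        assert(*text == '0' || *text == '1');
        x = b_c[*text - '0'][x];   /* one array lookup per character */
        if (x & (1u << 4))
            return 1;
    }
    return 0;
}
```

After a single call of build_tables(), every input character costs just one array lookup, independently of s.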

Equivalence of Deterministic and Non-Deterministic Automata

We now prove that the constructed deterministic automaton is equivalent to the non-deterministic automaton. Two automata A_1 and A_2 are said to be equivalent if they accept the same strings, that is, if for any string S over the alphabet, A_1 accepts S if and only if A_2 accepts S.

Because the constructed deterministic automaton does nothing more than offer an efficient implementation of a simulation of the non-deterministic automaton, and because the starting and accepting states were defined sensibly, this equivalence is not surprising. Nevertheless, in the following we check it quite formally.

The main step is proving the following claim:

For any t, a state i of the non-deterministic automaton is an element of the set of states reached after processing t characters if and only if the constructed automaton, after processing t characters, is in a state x which has its i-th bit equal to 1.

Once this claim is proven, we are done, because then it follows that the non-deterministic automaton reaches an accepting state i after t characters if and only if the deterministic automaton reaches a state x with i-th bit equal to 1, and such a state was defined to be accepting.

The claim is proven by complete induction over the number t of processed characters: for t == 0, it is true because of the definition of the starting state of the deterministic automaton: before processing any characters, the set of reached states of the non-deterministic automaton only consists of the starting state i. The starting state of the deterministic automaton has only bit i equal to 1.

It remains to show that, assuming that the claim holds after processing t characters, the claim holds after processing t + 1 characters. Let c be character t + 1. Denote the states of the deterministic automaton before and after processing c by x and x', respectively.

Assume that state j is an element of the set of states reached by the non-deterministic automaton after processing t + 1 characters. This is only possible if there is a state i, reached after processing t characters, with a transition e to j which has c in its list of labels. Because i was reached after processing t characters, the induction hypothesis gives that bit i of state x equals 1. But then the procedure transition given above, which defines the transitions of the deterministic automaton, sets bit j of x' to 1 as well.

For the other direction, assume that bit j of x' equals 1. This only happens when there is some bit i of x which is equal to 1 so that there is a transition e from state i to state j of the non-deterministic automaton which contains c in its list of labels. However, if bit i of x equals 1, then state i is among the states reached after processing t characters, and thus state j is among the states reached after processing t + 1 characters.

Importance of Construction

Now that we have proven that the constructed automaton is equivalent to the original one, it is time to consider what we have achieved: a non-deterministic automaton with s states has been replaced by an, in principle, exponentially larger deterministic automaton. Is this an improvement?

The answer is yes. Even though the non-deterministic automaton can be simulated quite efficiently, by an algorithm taking time quadratic in the number of states for every character of the input, this simulation requires a procedure which cannot easily be expressed in a small automaton itself. So, from the perspective of the automaton this is like external magic.

As automata are also supposed to operate as embedded systems, being built as a small piece of hardware, this is not what we want: non-determinism does not lead to the straightforward flow of operations we are used to, which can be handled by the hardware as a flow of signals going from one device to the next. This is the practical reason why non-determinism is undesirable. More theoretically, in many domains it is a question whether deterministic approaches are as powerful as non-deterministic ones. In the case of finite automata, the given construction shows that the answer is affirmative.

The reason to work with non-deterministic automata at all is that they often make it much easier to achieve the desired functionality than deterministic automata do. One need only consider the example of the non-deterministic automaton for detecting 01*0 or 1010. The given construction then allows us to turn this non-deterministic automaton into an equivalent deterministic one. So, a deterministic automaton for a given task can often most easily be obtained by first constructing a non-deterministic one.

The major disadvantage of the construction is the tremendous increase in the number of states. However, in practice it turns out that this number often can be strongly reduced. Further down we will present a method for reducing the number of states to the minimum possible. We will see that for our example problems, the required number of states in the deterministic automaton is not so much larger than the number in the non-deterministic one.

Deterministic Automaton for Detecting 0100

A Proof by Induction

The non-deterministic automaton for detecting substring 0100 has 5 states. So, in principle we should construct a deterministic automaton with 32 states, (0, 0, 0, 0, 0), (0, 0, 0, 0, 1), ..., (1, 1, 1, 1, 1). However, it is a waste of effort to incorporate states which are not reachable from the starting state, which is (0, 0, 0, 0, 1).

It is immediately clear that any reachable state (b_4, ..., b_1, b_0) has b_0 == 1. This can easily be proven by complete induction over the number of processed characters. Proving this in full detail is a nice exercise of how to prove facts for the deterministic automaton exploiting knowledge about the non-deterministic automaton and using the way the deterministic automaton is constructed.

Denote by B_t the state of the deterministic automaton reached after processing t characters. B_0 == (0, 0, 0, 0, 1), which clearly has b_0 == 1, so the claim that b_0 == 1 for all t holds for t == 0. Now assume that b_0 == 1 in B_t, for some t >= 0. Then we should show that b_0 == 1 even in B_{t + 1}. B_{t + 1} depends on B_t and character t + 1. Let B_{t + 1, 0} be the resulting state when this character is 0, and B_{t + 1, 1} when it is 1.

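Because state 0 of the non-deterministic automaton has a transition to itself with label 0 and b_0 == 1 in B_t, we have
0 element_of f(0, 0) subset_of union_{i | b_i == 1 in B_t} f(i, 0)
where f(i, c) denotes the set of states of the non-deterministic automaton which are reachable from state i by a transition with label c. Thus, the definition of the transitions in the deterministic automaton gives that b_0 == 1 in B_{t + 1, 0}. Analogously, the following implies that b_0 == 1 in B_{t + 1, 1} as well.
0 element_of f(0, 1) subset_of union_{i | b_i == 1 in B_t} f(i, 1)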

So, given that b_0 == 1 in B_t, b_0 == 1 also in B_{t + 1}, whatever the value of character t + 1 is. This completes the proof by complete induction: we have checked both the basis and the step.

Working with the Subset Construction

The normal way of proceeding is to start with the starting state (0, ..., 0, 1) of the deterministic automaton and to consider all states which are reachable from there. Then one considers all states which are reachable from these. This process is repeated until no further states can be reached from any of the reachable states.

One should work systematically in order not to forget any reachable state. A good idea is to write any newly discovered state on a list and to take as the next state to work out the one at the end or at the beginning of the list, removing it from the list. Initially only the starting state is on the list; as soon as the list is empty, we are done.

Computing the transitions from the states given by bitvectors of the deterministic automaton can be performed by directly executing the above procedure "transition". However, this is facilitated by first computing the transitions from the primitive bitvectors: for a non-deterministic automaton with s states the bitvectors with a single 1 at position i, 0 <= i < s, are denoted by p_i and called primitive bitvectors.

Denote the state that is reached when transiting from p_i with character c by p_{i, c}. p_{i, c} has a 1 in all positions that correspond to states of the non-deterministic automaton that are reachable from state i by a transition with label c and 0's in all other positions.

When transiting with character c from a state of the deterministic automaton given by bitvector (b_{s - 1}, ..., b_1, b_0), a state is reached with bitvector Or_{0 <= i < s | b_i == 1} p_{i, c}. Here the or operation is performed bitwise: for two bitvectors u and v, w == u or v has a 1 precisely in those positions where u or v has a 1.
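
The list-based exploration and the bitwise or of primitive bitvectors fit together in a short program. The following sketch handles the 5-state automaton for 0100. The names p[][], or_step and reachable are chosen here, bitvectors are stored as unsigned integers (bit i for b_i), and newly discovered states are appended to the end of the list while the next state to work out is taken from the front, which is one of the two orders suggested above.

```c
#include <assert.h>

#define MAXSETS 32   /* 2^5 bit vectors for the 5-state 0100 automaton */

/* p[i][c]: primitive transition p_{i, c} as a bit mask (bit j = state j). */
static const unsigned p[5][2] = {
    {0x03, 0x01}, {0x00, 0x04}, {0x08, 0x00}, {0x10, 0x00}, {0x00, 0x00}
};

/* Bitwise or of p_{i, c} over all i with b_i == 1 in x. */
unsigned or_step(unsigned x, int c) {
    unsigned y = 0;
    int i;
    assert(c == 0 || c == 1);
    for (i = 0; i < 5; i++)
        if (x & (1u << i))
            y |= p[i][c];
    return y;
}

/* Explores all states reachable from the starting state 00001.  Writes
   them to found[] in order of discovery and returns their number. */
int reachable(unsigned found[MAXSETS]) {
    char seen[MAXSETS] = {0};
    int n = 0, next = 0, c;
    found[n++] = 1u;               /* initially only the starting state */
    seen[1] = 1;
    while (next < n) {             /* found[next..n-1] are still to do  */
        unsigned x = found[next++];
        for (c = 0; c < 2; c++) {
            unsigned y = or_step(x, c);
            if (!seen[y]) {        /* newly discovered: put it on the list */
                seen[y] = 1;
                found[n++] = y;
            }
        }
    }
    return n;
}
```

For example, or_step(11, 0), that is the vector (0, 1, 0, 1, 1) with character 0, yields 19, the vector (1, 0, 0, 1, 1). The call reachable(found) returns 5, and every state found has its bit 0 equal to 1.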

Application to a Concrete Problem

These ideas are applied to the considered automaton. For the primitive states we have:
  (0, 0, 0, 0, 1) --0-> (0, 0, 0, 1, 1)
                  --1-> (0, 0, 0, 0, 1)

  (0, 0, 0, 1, 0) --0-> (0, 0, 0, 0, 0)
                  --1-> (0, 0, 1, 0, 0)

  (0, 0, 1, 0, 0) --0-> (0, 1, 0, 0, 0)
                  --1-> (0, 0, 0, 0, 0)

  (0, 1, 0, 0, 0) --0-> (1, 0, 0, 0, 0)
                  --1-> (0, 0, 0, 0, 0)

  (1, 0, 0, 0, 0) --0-> (0, 0, 0, 0, 0)
                  --1-> (0, 0, 0, 0, 0)

Here --0-> denotes the transition with label 0, which is taken when the next character equals 0, and --1-> denotes the transition with label 1.

We can now easily compute all states which are reachable from the starting state (0, 0, 0, 0, 1) of the deterministic automaton by computing the appropriate bitwise ors of these vectors:

  -- Distance 0 From Starting State --

  (0, 0, 0, 0, 1) --0-> (0, 0, 0, 1, 1)
                  --1-> (0, 0, 0, 0, 1)

  -- Distance 1 From Starting State --

  (0, 0, 0, 1, 1) --0-> (0, 0, 0, 1, 1)
                  --1-> (0, 0, 1, 0, 1)

  -- Distance 2 From Starting State --

  (0, 0, 1, 0, 1) --0-> (0, 1, 0, 1, 1)
                  --1-> (0, 0, 0, 0, 1)

  -- Distance 3 From Starting State --

  (0, 1, 0, 1, 1) --0-> (1, 0, 0, 1, 1)
                  --1-> (0, 0, 1, 0, 1)

  -- Distance 4 From Starting State --

  (1, 0, 0, 1, 1) --0-> (0, 0, 0, 1, 1)
                  --1-> (0, 0, 1, 0, 1)


As an example we consider how the transition from (0, 1, 0, 1, 1) for character 0 is computed. There are 1's at positions 0, 1 and 3. So, we must compute the bitwise or of p_{0, 0}, p_{1, 0} and p_{3, 0}. That gives (0, 0, 0, 1, 1) or (0, 0, 0, 0, 0) or (1, 0, 0, 0, 0) == (1, 0, 0, 1, 1).

In the given very simple example, there was always just one state on the list of states to work out, but this is an exceptional case. We were also lucky that the number of resulting states is so small: in general this number may lie close to 2^s, for a non-deterministic automaton with s states. Actually, in this case the number of states is as small as it can be: the deterministic automaton has five states, just as the non-deterministic one, and detecting 0100 is not possible with fewer, because the five situations "0, 1, 2, 3 or all 4 characters of the pattern have been matched" must be told apart.

The resulting automaton is (except for the transitions from the accepting state) identical to the earlier given deterministic automaton for this problem. Of course in the final automaton the states can be renumbered contiguously starting from 0: the numbers of the states have no external meaning, and for processing purposes it is handy if they are as small as possible.

Deterministic Automaton for Detecting 0100

Deterministic Automaton for Detecting 01*0 or 1010

The non-deterministic automaton has 7 states. In principle this might give up to 128 == 2^7 states in the constructed deterministic automaton, but fortunately only a small fraction of these is actually reachable from the starting state (0, 0, 0, 0, 0, 0, 1). Without further analysis, which was only performed as an exercise after all, we immediately proceed with determining the reachable states and the transitions from there.

For the primitive states we have:

  (0, 0, 0, 0, 0, 0, 1) --0-> (0, 0, 0, 0, 0, 1, 1)
                        --1-> (0, 0, 0, 0, 1, 0, 1)

  (0, 0, 0, 0, 0, 1, 0) --0-> (0, 0, 0, 0, 0, 0, 0)
                        --1-> (0, 0, 0, 1, 0, 0, 0)

  (0, 0, 0, 0, 1, 0, 0) --0-> (0, 0, 1, 0, 0, 0, 0)
                        --1-> (0, 0, 0, 0, 0, 0, 0)

  (0, 0, 0, 1, 0, 0, 0) --0-> (0, 1, 0, 0, 0, 0, 0)
                        --1-> (0, 1, 0, 0, 0, 0, 0)

  (0, 0, 1, 0, 0, 0, 0) --0-> (0, 0, 0, 0, 0, 0, 0)
                        --1-> (0, 1, 0, 0, 0, 0, 0)

  (0, 1, 0, 0, 0, 0, 0) --0-> (1, 0, 0, 0, 0, 0, 0)
                        --1-> (0, 0, 0, 0, 0, 0, 0)

  (1, 0, 0, 0, 0, 0, 0) --0-> (0, 0, 0, 0, 0, 0, 0)
                        --1-> (0, 0, 0, 0, 0, 0, 0)

These are used to simply determine the following set of reachable states. To facilitate the construction of the picture given hereafter and to check the correct drawing of it, we have already indicated the indices to which the states of the deterministic machine are mapped.

  -- Distance 0 From Starting State --

   0 ~ (0, 0, 0, 0, 0, 0, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
                             --1-> (0, 0, 0, 0, 1, 0, 1) ~ 2

  -- Distance 1 From Starting State --

   1 ~ (0, 0, 0, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
                             --1-> (0, 0, 0, 1, 1, 0, 1) ~ 3

   2 ~ (0, 0, 0, 0, 1, 0, 1) --0-> (0, 0, 1, 0, 0, 1, 1) ~ 4
                             --1-> (0, 0, 0, 0, 1, 0, 1) ~ 2

  -- Distance 2 From Starting State --


   3 ~ (0, 0, 0, 1, 1, 0, 1) --0-> (0, 1, 1, 0, 0, 1, 1) ~ 5
                             --1-> (0, 1, 0, 0, 1, 0, 1) ~ 6

   4 ~ (0, 0, 1, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
                             --1-> (0, 1, 0, 1, 1, 0, 1) ~ 7

  -- Distance 3 From Starting State --


   5 ~ (0, 1, 1, 0, 0, 1, 1) --0-> (1, 0, 0, 0, 0, 1, 1) ~ 8
                             --1-> (0, 1, 0, 1, 1, 0, 1) ~ 7

   6 ~ (0, 1, 0, 0, 1, 0, 1) --0-> (1, 0, 1, 0, 0, 1, 1) ~ 9
                             --1-> (0, 0, 0, 0, 1, 0, 1) ~ 2

   7 ~ (0, 1, 0, 1, 1, 0, 1) --0-> (1, 1, 1, 0, 0, 1, 1) ~ A
                             --1-> (0, 1, 0, 0, 1, 0, 1) ~ 6

  -- Distance 4 From Starting State --

   8 ~ (1, 0, 0, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
                             --1-> (0, 0, 0, 1, 1, 0, 1) ~ 3

   9 ~ (1, 0, 1, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
                             --1-> (0, 1, 0, 1, 1, 0, 1) ~ 7

   A ~ (1, 1, 1, 0, 0, 1, 1) --0-> (1, 0, 0, 0, 0, 1, 1) ~ 8
                             --1-> (0, 1, 0, 1, 1, 0, 1) ~ 7

Even here we are very lucky: out of the potentially 128 states, only 11 are reachable from the starting state. Three of these states are accepting states. So, the resulting deterministic automaton remains relatively simple. At the same time, it appears that it would have been far from easy to find this deterministic automaton, including all transitions, without the presented general technique for turning non-deterministic automata into deterministic automata.

Deterministic Automaton Detecting 01*0 or 1010

Fusing Terminal States

For all automata we are considering, the default is that an automaton stops as soon as an accepting state is reached (though it is sometimes handy to work with automata for which acceptance depends on the status of the state in which they reside at the end of the text). Applying the subset construction will typically result in a deterministic automaton with several accepting states and transitions from the accepting states among each other and back to non-accepting states. The given examples show that this happens even if the non-deterministic automaton has a single accepting state without outgoing transitions.

If the only purpose is to determine whether a specified pattern occurs or not, all accepting states can be fused to a single accepting state without outgoing transitions. Only if one wants to use the automaton to detect all occurrences matching the pattern must these accepting states and their outgoing transitions be preserved. This only works when, against our default assumption, we assume that the automaton does not halt when reaching an accepting state, but rather produces some output (further down we will see that we could better speak of a Moore automaton in this case).

If one is eventually going to fuse the accepting states anyway, then there is no need to generate all of them to start with: any state of the deterministic automaton of the form (1, *, ..., *) is an accepting state, and there is no need to consider their outgoing transitions. This may help saving quite some work during the construction.
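
How much work this saves can be checked on the automaton for 01*0 or 1010. The following sketch repeats the exploration of reachable states for this automaton, with one addition: when the hypothetical parameter fuse is non-zero, every vector of the form (1, *, ..., *) is counted as one common accepting state and is not worked out further. The names p[][] and count_states are chosen here, and the primitive transitions are copied from the table of primitive states given above.

```c
#include <assert.h>

#define NSETS 128              /* 2^7 bit vectors for the 7-state automaton */
#define ACCEPT_BIT (1u << 6)   /* bit of the accepting state 6              */

/* p[i][c]: primitive transition p_{i, c} as a bit mask (bit j = state j). */
static const unsigned p[7][2] = {
    {0x03, 0x05}, {0x00, 0x08}, {0x10, 0x00}, {0x20, 0x20},
    {0x00, 0x20}, {0x40, 0x00}, {0x00, 0x00}
};

/* Counts the states of the deterministic automaton which are reachable
   from 0000001.  With fuse != 0 all accepting vectors count as a single
   state whose outgoing transitions are not explored. */
int count_states(int fuse) {
    char seen[NSETS] = {0};
    unsigned list[NSETS];
    int n = 0, next = 0, fused_seen = 0, c, i;
    list[n++] = 1u;
    seen[1] = 1;
    while (next < n) {
        unsigned x = list[next++];
        for (c = 0; c < 2; c++) {
            unsigned y = 0;
            for (i = 0; i < 7; i++)
                if (x & (1u << i))
                    y |= p[i][c];
            if (fuse && (y & ACCEPT_BIT)) {
                fused_seen = 1;    /* reached the single fused state */
                continue;
            }
            if (!seen[y]) {
                seen[y] = 1;
                list[n++] = y;
            }
        }
    }
    return n + fused_seen;
}
```

Here count_states(0) returns 11, the number of states found in the table above, and count_states(1) returns 9: the eight non-accepting states plus the fused accepting state.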

Minimizing the Number of States

Goal and Approach

It is interesting to wonder how many states are really needed for a given task. This is not only an interesting puzzle: once it comes to building an automaton as a small piece of hardware, it may be a cost advantage to reduce the number of states from, say, 41354 to 20980.

So, the goal is to find among all equivalent automata (in the sense that they accept the same strings) the one which has the fewest states. We are lucky: in the case of deterministic automata, there is a unique minimum-state automaton within any class of equivalent automata, and it can be found quite easily.

The idea is to consider which states are equivalent. Two states x and y are said to be equivalent if for any legal input string the automaton reaches an accepting state starting from x if and only if it reaches an accepting state starting from state y. If x and y are equivalent in this sense, then there is no need to keep both of them. One of them, for example y, can be eliminated: all transitions leading to y are replaced by transitions leading to x, and all transitions leading out of y are simply deleted.

Notice that the claimed unique and easy-to-construct minimal automaton is guaranteed to exist only for deterministic automata. In the example further down, we will see equivalent non-deterministic automata. Even though they have different sizes, they do not have any equivalent states, so they cannot be reduced.

Algorithm

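It is rather hard to tell which states are equivalent in a direct way, but it is quite easy to tell which states are non-equivalent. An accepting state and a non-accepting state are never equivalent. And if, for some character c, the transitions with label c from two states x and y lead to states which are known to be non-equivalent, then x and y are non-equivalent as well.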

The idea is now to initially construct two subsets: the accepting states and the non-accepting states. Then, repeatedly we look for a subset S with states in it which have transitions labeled with the same character c to states in different sets. S is split accordingly. The process stops once for all subsets S, for all states in S all transitions with the same label lead to states in the same set.

The proof that any two states which are classified as non-equivalent according to the above criteria are indeed not equivalent can be performed by induction over the number of performed splitting operations. After performing zero splitting operations, there are two subsets: the accepting states and the non-accepting states. These are not equivalent, because starting in an accepting state, an accepting state is reached when the input string is empty, whereas from a non-accepting state we do not reach an accepting state with the empty string.

Now assume that states A and B belong to the same subset of states until some step t and that they belong to different subsets after step t. By induction we may assume that any two states which were split into different sets before step t are indeed not equivalent. A and B are split into different sets in step t only when the procedure has found a character c for which there is a transition from A to some state A_c and from B to some state B_c so that A_c and B_c have been classified as non-equivalent before. By the induction assumption, this means that there is a string S so that starting in A_c we reach an accepting state A_t and starting in B_c we reach a state B_t which is not accepting, or vice versa. Assume without loss of generality that A_t is accepting and B_t is not. Then, starting from A with string cS (c followed by the symbols from S) we reach the accepting state A_t, and starting with cS from B we reach the non-accepting state B_t. So, indeed A and B are not equivalent.

We do not prove the converse: once the process stops, all states which are not equivalent have been split into different subsets. Together with the above this implies that the procedure precisely finds all classes of equivalent states. Thus merging all states which lie in the same subset gives a minimal automaton (because no unnecessary splits are performed), and this automaton is equivalent to the original one (because no non-equivalent states end up in the same subset).

The claim is even stronger: because for any given problem the minimal deterministic automaton is unique (except for the irrelevant numbering of the states), the obtained automaton is unique. Particularly, starting with different non-deterministic automata for a problem, it may happen that the subset construction gives us very different deterministic automata. However, after minimizing each of them, we have the guarantee to obtain the same minimal deterministic automaton. This means that there is no need to choose the non-deterministic automaton in a special way: in the end we will always find the same (only the amount of work in the intermediate steps may be different).

In order to perform this procedure, one may have to add a special dead state: for any character not appearing in the list of labels of any of the transitions out of a state x, a transition to this dead state is added. The dead state has a transition to itself for all characters of the alphabet. After running the minimization procedure, the dead state can be removed again. If all characters appear as a label on one of the transitions out of each state, then there is no need for a dead state. This latter case arises in particular for deterministic automata which were obtained as a result of the subset construction. Also one should remove all states which have no ingoing links. The subset construction never produces such states if only earlier reached states are handled.
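
The splitting procedure can be sketched in C as follows. This is a simple version for illustration, not an efficient one: it assumes a complete deterministic automaton over the alphabet {0, 1} (so a dead state, if needed, has already been added), at most MAXN states, and in every round it regroups all states by their own class and the classes of their two successors. The names minimize, next[][], accepting[] and cls[] are chosen here.

```c
#include <assert.h>

#define MAXN 16   /* assumed upper bound on the number of states */

/* next[x][c] is the successor of state x for character c, accepting[x]
   is 1 for accepting states.  On return cls[x] is the number of the
   class of state x; the function returns the number of classes. */
int minimize(int n, const int next[][2], const char accepting[], int cls[]) {
    int newcls[MAXN];
    int k = 0, newk, x, y;
    assert(n >= 1 && n <= MAXN);
    for (x = 0; x < n; x++)            /* initial split: accepting or not */
        cls[x] = accepting[x] ? 1 : 0;
    for (;;) {
        newk = 0;
        for (x = 0; x < n; x++) {
            newcls[x] = -1;
            for (y = 0; y < x; y++)    /* same class and same target classes? */
                if (cls[y] == cls[x]
                    && cls[next[y][0]] == cls[next[x][0]]
                    && cls[next[y][1]] == cls[next[x][1]]) {
                    newcls[x] = newcls[y];
                    break;
                }
            if (newcls[x] < 0)         /* x opens a new class */
                newcls[x] = newk++;
        }
        for (x = 0; x < n; x++)
            cls[x] = newcls[x];
        if (newk == k)                 /* no class was split in this round */
            return k;
        k = newk;
    }
}
```

As a made-up example, not taken from this text, consider five states with successors 0: (1, 3), 1: (2, 3), 2: (2, 2), 3: (4, 3), 4: (2, 0), written as (successor for 0, successor for 1), and with state 2 as the only accepting state. For this automaton minimize returns 3: states 0 and 3 end up in one class and states 1 and 4 in another.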

First Example

As an example we try to determine the non-equivalent states of the following automaton:

Deterministic Automaton

After removing state 9, which has no ingoing transitions, and adding a dead state D, we distinguish two equivalence classes as indicated in the following picture. In the further description, the (preliminary) equivalence classes will be designated by the index of the node with the smallest index in it. So, we will speak of class 0 and class A.

Initial Classes

In a first round of testing we discover the difference between the states 7 and 8 and the other nodes in the class of node 0: from state 7 and 8, there is a transition with label 0 to class A and from the other nodes there is no such transition. This gives the following situation with three classes:

After One Round of Testing

In the second round of testing we discover the difference between the states 3, 4, 5 and 6 and the other nodes in the class of node 0: from these states, there is a transition with label 0 to class 7 and from the other nodes in class 0 there is no such transition. This gives the following situation with four classes:

After Two Rounds of Testing

In the third round of testing we discover the difference between state 3 and the other states in class 3: from state 3, there is a transition with label 1 to a state in class 3, while for the others the transition with label 1 leads to a state in class 0. We also discover that states 1 and 2 are not equivalent to states 0 and D, because from 1 and 2 there are transitions to states in class 3.

After Three Rounds of Testing

In the fourth round of testing we discover the difference between state 1 and state 2: from state 1 there is a transition with label 0 to a state in class 3 and for state 2 the transition with label 0 leads to a state in class 4. Likewise we discover the difference between state 0 and state D. This gives the following situation:

After Four Rounds of Testing

Hereafter there is nothing to discover anymore. Identifying all equivalent states and removing the dead state gives the following much simpler automaton. It is globally minimal and equivalent to the original automaton.

After Identifying Equivalent States

Second Example

We consider automata for detecting the following substrings
  0{2}0 | 
  0{2}1 | 
  0{2}2 | 
  1{2}0 | 
  1{2}1 | 
  2{2}0 | 
  2{2}2
Here consecutive symbols indicate a substring, "|" separates alternatives and "{x}" denotes zero or more repetitions of symbol x. Round brackets, "( ... )", may be used to group subexpressions. The given pattern can be written as
  0{2}(0|1|2) | 
  1{2}(0|1) | 
  2{2}(0|2)
and as
  (0|2){2}(0|2) | 
  (0|1){2}(0|1)

These reformulations can immediately be translated into equivalent non-deterministic automata. Each of these automata is minimal, in the sense that the number of states cannot be reduced by fusing equivalent states. This shows that there is no unique minimal non-deterministic automaton for a task.

Equivalent Non-Deterministic Automata

Constructing the corresponding deterministic automata gives two different automata. However, in the upper automaton the states 5, 5' and 5" are equivalent and likewise 1 and 1'. Fusing the equivalent states gives the lower automaton. So, here we find a unique minimal deterministic automaton.

Equivalent Deterministic Automata

If the final states are fused, which one should do if one only wants to detect whether the substring occurs or not instead of finding all occurrences, then only five states remain, which is hardly more than the number in the smallest non-deterministic automaton. The rather surprising conclusion is that any string of at least 4 characters is accepted.
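
The last observation can be checked by brute force, without any automaton. The sketch below tests the pattern directly from its definition: a match consists of a symbol a, any number of 2's, and a symbol b, where the seven listed alternatives allow every pair (a, b) except (1, 2) and (2, 1). The names pair_ok, matches and all_match are chosen here, and the input is assumed to consist of the characters 0, 1 and 2 only.

```c
#include <assert.h>
#include <stddef.h>

/* The seven alternatives allow every pair of end symbols except
   1 ... 2 and 2 ... 1. */
static int pair_ok(char a, char b) {
    return !((a == '1' && b == '2') || (a == '2' && b == '1'));
}

/* Returns 1 iff text contains a substring matching the pattern. */
int matches(const char *text) {
    size_t i, j;
    for (i = 0; text[i] != '\0'; i++)
        for (j = i + 1; text[j] != '\0'; j++) {
            if (pair_ok(text[i], text[j]))
                return 1;
            if (text[j] != '2')
                break;             /* only 2's may stand between a and b */
        }
    return 0;
}

/* Returns 1 iff every string of length n over {0, 1, 2} matches. */
int all_match(int n) {
    char text[16];
    long code, total = 1;
    int k;
    assert(n >= 0 && n < 16);
    for (k = 0; k < n; k++)
        total *= 3;
    for (code = 0; code < total; code++) {
        long rest = code;
        for (k = 0; k < n; k++) {  /* write code in base 3 */
            text[k] = (char)('0' + rest % 3);
            rest /= 3;
        }
        text[n] = '\0';
        if (!matches(text))
            return 0;
    }
    return 1;
}
```

With these definitions all_match(4) returns 1, while all_match(3) returns 0, because the string 212 contains no match.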

Deterministic Automaton with Terminal States Fused

If one is going to fuse the terminal states, then one should do this before running the minimization procedure, because nodes with transitions to non-equivalent terminals may become equivalent once the terminals are fused. Going through this whole elaborate process one eventually often finds quite a small deterministic automaton even for relatively complicated searches.

Types of Finite Automata

There are several kinds of closely related finite automata. In the above text we have not been very precise about this. The main distinction is between automata which are only supposed to test whether a provided string is in the language or not and automata which are used for signal processing.

The basic definition says that an automaton halts as soon as it reaches an accepting state. However, it might be handy to consider the variant in which a string is accepted if the automaton is in an accepting state at the end of the string. In the exercises we will consider the problem of determining whether the number of 1's in a string of 0's and 1's is even. It is handy to let the accepting state correspond to the case "so far the number of encountered 1's was even". With the conventional definition of accepting, we need an extra state to which the automaton jumps at the end of the string. This makes the automaton less elegant and forces us to specify which character is used to mark the end of the string.
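
With acceptance decided at the end of the string, the parity problem needs only two states. A minimal sketch in C, in which the function name even_ones and the numbering of the states are chosen here:

```c
#include <assert.h>

/* Returns 1 iff the number of 1's in text is even.  State 0 means "even
   so far" and is both the starting state and the accepting state. */
int even_ones(const char *text) {
    int state = 0;
    for (; *text != '\0'; text++) {
        assert(*text == '0' || *text == '1');
        if (*text == '1')
            state = 1 - state;     /* a 1 toggles between the two states */
    }
    return state == 0;             /* acceptance is decided at the end */
}
```

The terminating '\0' of the C string plays the role of the end marker here, but it does not cause a transition to an extra state.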

Automata which are used to produce some output (more than one bit) in reaction to the provided string can be distinguished in two categories:

Moore Automata:
Every reached state produces some (possibly empty) output. In the diagram, this output is written next to, or instead of, the label of the state.
Mealy Automata:
Every transition produces some (possibly empty) output. In the diagram, this output is written next to the label of the transition, separated from it by a slash, '/'.

Without explicitly calling it that way, we have encountered a Moore automaton in the example on suppressing isolated 0's and 1's. There, states 0 and 1 produce a 0 as output, while states 2 and 3 produce a 1. We have not encountered Mealy automata yet.

Moore and Mealy automata are equivalent. This means that for any Moore automaton there is a corresponding finite Mealy automaton producing exactly the same output and vice versa. A minor problem, implying that often quite a few extra states are required, is that the number of visited states exceeds the number of transitions by one.

A Moore automaton can be turned into a Mealy automaton by allocating the output of a state to all transitions leading to this state (alternatively one might allocate this output to the transitions leading out of this state). In order to ensure that even the output from the starting state is produced, one should add an extra starting state, which produces extra output on the transitions out of it. Only for empty strings no output is produced: the automata produce the same output for all non-empty strings.

Moore Automaton and an Equivalent Mealy Automaton

A Mealy automaton can be turned into a Moore automaton, by replacing a state s by as many states s_1, ..., s_k as there are different outputs on the transitions leading to state s. The transitions from these states are the same as for s. Because the number of transitions to a state is limited by the size of the alphabet and the number of states, this replacement increases the number of states by at most a constant factor on a finite automaton. Because the Mealy automaton does not produce output before the first transition, one should add a special start state which does not produce output. The following example gives a Mealy automaton which inverts the bits of a string and the equivalent Moore automaton:

Mealy Automaton and an Equivalent Moore Automaton
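The bit-inverting example can also be written down as a small C sketch. The state names START, SAW0 and SAW1 and the function names are hypothetical, the input is assumed to consist of the characters '0' and '1' only, and the caller must provide an output buffer of at least strlen(in) + 1 characters.

```c
#include <stddef.h>

/* Mealy automaton: a single state; the transition on an input bit
 * emits the inverted bit. */
void mealy_invert(const char *in, char *out)
{
    size_t i;
    for (i = 0; in[i] != '\0'; i++)
        out[i] = (in[i] == '0') ? '1' : '0';
    out[i] = '\0';
}

/* Moore automaton: three states. START emits nothing; SAW0 is entered
 * on input 0 and emits 1; SAW1 is entered on input 1 and emits 0. */
enum moore_state { START, SAW0, SAW1 };

void moore_invert(const char *in, char *out)
{
    enum moore_state state = START;   /* the start state produces no output */
    size_t i;
    for (i = 0; in[i] != '\0'; i++) {
        state = (in[i] == '0') ? SAW0 : SAW1;   /* take the transition */
        out[i] = (state == SAW0) ? '1' : '0';   /* output of the state reached */
    }
    out[i] = '\0';
}
```

The one state of the Mealy automaton has transitions with two different outputs leading to it, so it is split into two states in the Moore version, and a start state without output is added.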

Exercises

  1. Design an automaton which decides whether a string ending with a blank, ' ', is a correct C identifier. A correct C identifier begins with a letter or '_', followed by letters, digits or '_'. The automaton should accept strings which may be identifiers and reject all others.

  2. We consider strings over an alphabet consisting of three characters: 0, 1 and L. L is used only once and marks the end of the string.
    1. Design an automaton which accepts a string if and only if its parity is even. The parity of a string is given by the parity of the number of 1's in the string: it is even if there is an even number of 1's, otherwise it is odd.
    2. Compare this automaton with an automaton which at a first glance is very similar: it accepts a string if and only if it has equally many 0's and 1's. What is the problem?
    3. Design an automaton which accepts a string if and only if its prefix balance lies between -2 and +2. Here the prefix balance is defined as the number of 1's minus the number of 0's computed from the beginning of the string until the current position. So, 001011110101L is acceptable but 00100100L is not.

  3. On pocket-calculators there is often a limitation on the number of brackets that may stand open at the same time. Assume that 4 is the maximum. So, (()((()())())) is correct, but ((((())))) is not. Of course, ())( and (() are also incorrect. Design an automaton which accepts a string consisting of "(" and ")" intermixed with other non-blank characters and ending with a blank, ' ', if and only if it has a bracketing structure which is correct according to the above specification.

  4. Turn the above automaton for testing the correctness of the bracketing structure into a C program. In this case the string may be assumed to be terminated with EOF. Hint: modify the program for recognizing a hidden keyword, which can be downloaded here.

  5. Consider strings over an alphabet consisting of three characters: 0, 1 and 2.
    1. Design a non-deterministic automaton which detects all occurrences of substrings of the form 0*2 or 1*0. Here '*' denotes any of the characters from the alphabet, in this case that is 0, 1 or 2.
    2. Construct an equivalent deterministic automaton. Hint: the correct answer consists of an automaton with at least 15 states (at least 9 non-accepting and at least 6 accepting states).

  6. Consider strings over an alphabet consisting of two characters: 0 and 1.
    1. Design a non-deterministic automaton which accepts a string if and only if it contains as substrings both 01 and 10, which should not overlap. Notice that the order in which the substrings should occur is not fixed. So, 0110, 1001 and 110000100 are acceptable but 010, 101 and 00111 are not.
    2. Construct an equivalent deterministic automaton.
    3. Is the constructed automaton minimal? Determine all equivalent states and then construct the minimal deterministic automaton solving the task. Hint: when fusing the accepting states, the minimal deterministic automaton has 9 states of which 1 is accepting.

  7. Consider strings over an alphabet consisting of three characters: 0, 1 and 2.
    1. Design a non-deterministic automaton which accepts a string if and only if it contains as substrings both 01 and 10, which should not overlap. So 2012210 and 10220220122 are acceptable but 0221210 and 102221021 are not.
    2. Construct an equivalent deterministic automaton.
    3. Is the constructed automaton minimal? Determine all equivalent states and then construct the minimal deterministic automaton solving the task. Hint: when fusing the accepting states, the minimal deterministic automaton has 13 states of which 1 is accepting.

  8. Consider strings over an alphabet consisting of three characters: 0, 1 and L. L is used only once and marks the end of the string.
    1. Design an automaton which accepts a string if and only if it does not contain the substring 11111.
    2. Design an automaton which accepts a string if and only if it does not contain the substring 10010. Why should one be very careful with non-determinism?

  9. Consider strings over an alphabet consisting of four characters: A, X, Y and L. L is used only once and marks the end of the string. Design a deterministic automaton which accepts a string if and only if X occurs, but Y does not occur.

  10. Consider strings over an alphabet consisting of three characters: 0, 1 and L. L is used only once and marks the end of the string. Design an automaton which accepts a string if and only if 0010 occurs as a substring but 1001 does not.

  11. Consider the given four-state automaton for filtering out isolated 0's and 1's. This automaton has the undesirable property that a sequence 000111 is rendered as 000011. That is, even a perfectly sharp transition from 0's to 1's is mutilated. If we insist that every state outputs one bit, 0 or 1, this appears to be an inevitable consequence of the fact that a sequence 000100 should be rendered as 000000. However, there is no need to do so. If we allow that some states output no bits, some states one bit and some states two bits, then a finite state automaton can be constructed which leaves clear transitions unchanged.
    1. Design such an automaton (which is called a Moore automaton). The output of a state can be written in the nodes of the automaton next to or instead of the labels.
    2. The same improved behavior can be achieved more easily with a Mealy automaton, that is, an automaton in which the output is generated by the transitions instead of the states. Design a Mealy automaton for this task. The output of a transition is written next to it, separated from the label by a "/".

  12. Consider the given four-state automaton for filtering out isolated 0's and 1's. This automaton has the undesirable property that even long alternating sequences are completely turned into 0's or 1's, depending on the state in which the machine was at the beginning of the sequence. If one interprets a 0 as a black pixel and 1 as a white pixel, this means that all gray-tones are eliminated. Extend the automaton so that still any isolated 0 or 1 is suppressed, but that a sequence 00010101011011 is rendered as 00000101011111. So, the automaton should have three output states: black, gray and white: after any two consecutive 0's, the automaton should be in a black state, after any two 1's, the automaton should be in a white state. If the automaton was in a black state, it should be in a gray state after encountering 101; if the automaton was in a white state, it should be in a gray state after encountering 010. In every black state one 0 is output, in every white state one 1. In a gray state the bits are output without change.

  13. The above two exercises suggested improvements to the four-state automaton for filtering out isolated 0's and 1's. The task is now to design a Mealy automaton achieving all we want. As before a sequence 00100 should be rendered as 00000, a sequence 11011 as 11111. However, 00101 and 11010 should be rendered without change. That is, starting from a black context, a 1 should be suppressed only when neither the next, nor the next-next is a 1. The rule for suppressing a 0 is analogous. In a Mealy automaton, the output is generated by the transitions instead of the states. The output of a transition is written next to it, separated from the label by a "/".

  14. We consider strings over an alphabet consisting of four characters: 0, 1, 2 and 3. Design a Mealy automaton producing output according to the following rules. Denote by j the previous character and by i the current character of the input string. Let k = (4 + i - j) % 4 (where "%" denotes the modulo operation). So, for j == 1 and i == 3, k == 2, for j == 2 and i == 1, k == 3, for i == j, k == 0. The transition due to encountering i produces as output k consecutive i's. So, 31222013 is converted to 33311200133. Convert this automaton to an equivalent Moore automaton.

  15. A dead state is a non-accepting state with transitions to itself for each character from the alphabet. Adding a dead state to an automaton, together with a transition labeled c to the dead state from every node x which has no outgoing transition labeled c, gives an equivalent automaton. Prove this. Hint: use the definition of equivalent.

  16. Applying the subset construction to a deterministic automaton, which if necessary has been completed with a dead state, constructs the same automaton, except for a renumbering of the nodes. Prove this. Hint: use some kind of induction.

  17. The task is to design a finite deterministic automaton modeling the movement of a simplified elevator. There is a single elevator, there are four floors. At floor zero there is a button for "up", at floor 1 and 2 there are buttons "up" and "down", at floor three there is a button "down". In addition at each floor there are contacts registering that the elevator reaches and leaves a floor. In practice there are even buttons inside the elevator, but it is not necessary to take these into the model. It is only required to model the states and the state transitions, but not any timing aspects. It may be assumed that no two events (generated by the buttons and the contacts) happen at exactly the same time. The movement rules are simple and widely applied:
    • If no one is waiting, the elevator stays at the floor where it last delivered someone.
    • If the elevator is standing still, and someone clicks at some floor, the elevator goes there.
    • If the elevator is moving upwards it stops at all floors where an "up" button was pressed and at the highest floor where a "down" button was pressed.
    • If the elevator is moving downwards it stops at all floors where a "down" button was pressed and at the lowest floor where an "up" button was pressed.
    A time step corresponds to the time the elevator needs to stop or to move to an adjacent floor. Work out the modeling further if necessary and draw the resulting automaton. If the automaton becomes unreasonably big, analogous parts need not be worked out in full detail. The physical state of the elevator (such as "moving upwards from floor 2 to 3") should be indicated. Your automaton must support this. What is the name of this type of automaton?





Grammars

Definitions

Speaking about languages and about the correctness of an expression requires a formalization: we need to define a grammar. For natural languages, the grammar should reflect the properties of the language. Some languages are more complex than others in the sense that they require more rules to describe all grammatically correct sentences (in this case one commonly says that these are exceptions to the rules). For constructed languages, the rules are defined first: all expressions which are constructed according to the rules are correct, all others are wrong. Such constructed languages are called formal languages. Prime examples of formal languages are computer languages, which must be specified very precisely because they are translated automatically to lower-level code, so that any misunderstanding is excluded.

It is important to distinguish two kinds of symbols: the symbols from the language itself and the meta-symbols which are used to speak about the language. In principle we might use any symbol as a meta-symbol, but for practical reasons one mostly uses words (article, substantive, adjective, verb, ...). The symbols from the language itself (cat, dog, walk, long, ...) are also called terminal symbols because these stand at the bottom of the construction of the language; the meta-symbols are often called non-terminal symbols.

Syntax diagrams are a handy way of formulating grammatical rules. Assume that we want to construct simple sentences consisting of a subject part, a verb and an object part. The rules for this might be formulated with the help of the following diagrams:

Syntax Diagram for Sentences

Here the symbols in rounded boxes are terminal symbols, the symbols in rectangular boxes are meta-symbols. SNT stands for "sentence", FRM for "substantive form", ART for "article", ADJ for "adjective", SUB for "substantive" and VRB for "verb".

How to read the diagram? One starts at the left side and ends at the right side. Any path gives a correct expression. If a line forks, this indicates several legal alternatives. In the definition of FRM we see two different kinds of alternatives: a form consists of zero or one articles and zero or more adjectives. This follows from the rule that one should always follow smooth curves. So, to get an adjective one takes the second turn-off and then one may loop several times.

Notice that the formulation of SNT in terms of FRM and VRB is handy, but not unique. In particular, it is always correct to replace any non-terminal symbol by its definition, although this obscures the underlying structure. That is, in the above diagram, the rectangular boxes might be replaced by the defining diagrams, until the whole diagram only consists of terminal symbols, lines and curves.

The above grammatical rules are called production rules: each rule tells what can be produced from a meta-symbol. The meta-symbol at which the production starts, in our case SNT, is called the starting symbol.

In the example we specified a few terminal symbols of each category. It would also have made sense not to specify these at all, knowing that there are thousands of words and that our intention is to express the grammatical rules, not the set of words. A grammar which is not worked out until the level of the terminal symbols is called an abstract grammar.

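We summarize the notions and come to a formal definition of a grammar. A grammar is a quadruple consisting of:

  • a finite set of terminal symbols;
  • a finite set of non-terminal symbols (meta-symbols), disjoint from the set of terminal symbols;
  • a finite set of production rules;
  • a starting symbol, which is one of the non-terminal symbols.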

Recursion

The language constructed by the above English grammar is very small and simple. One can construct sentences like "the big red dog sees a nice big cow" and "man eats dog". Both natural and formal languages are more interesting than this. We will now extend the definition of FRM to FRM'. The definition of SNT is accordingly extended to SNT':

Syntax Diagram for Extended Sentences

Here we have further added the meta-symbols PGR, which stands for "preposition group", PRP for "preposition" and REL for "relative pronoun". Of course we have very few terminal symbols, but, assuming that we would have specified more verbs, substantives, adjectives etc., with the production rules now given one can already construct quite complicated sentences like "the ugly tall man which sees the brown dog which bites the black fat cow in the tail throws a sandwich to the other man which wears a blue hat ... ".

The exciting thing about this is that in the definition of FRM' we find FRM' itself again. Earlier we saw how a FRM could have an arbitrary number of ADJ in it by a looping construction, but the phenomenon here is more intricate. It is called recursion. How can one define something in terms of itself? Don't we get an explosion? No, just as in the loop, there is a possibility to terminate by not choosing a recursive alternative.

It is essential that any recursive definition contains at least one non-recursive alternative. Such an alternative is called a basis of the recursion. A recursive definition without a basis is called circular. In the above example, there is one direct recursion: FRM' appears in FRM' again, but there is also an instance of a more indirect recursion: PGR appears in FRM' and FRM' appears in PGR. In the latter case we will say that the notions FRM' and PGR are mutually recursive.

Examples

Integer Constants

In C an integer constant, ICN, is defined to be either a decimal, an octal or a hexadecimal constant followed by an optional integer suffix. These are denoted DCN, OCN, HCN and ISF, respectively. As a syntax diagram, the complete set of definitions can be depicted as follows:

Syntax Diagram for Integer Constants

The advantages and disadvantages of syntax diagrams become clear from this example: a diagram is simple and clear, but the pictures may become quite large and take much time to draw. In the following section we will therefore consider alternative ways to formulate rules.

Chains

In the chapter on object-oriented programming we have been working with chains, which were built from nodes. A chain could be enlarged by adding a node at the beginning, at the end or by gluing two chains together. All this can also be formulated in terms of diagrams, albeit without expressing the details of how to perform the linking.

Syntax Diagram for Chains

Palindromes

A palindrome is a word (or a piece of text) which remains the same when it is reversed. A palindrome is either empty, one arbitrary letter, or a palindrome extended by the same letter on each side. If the alphabet, the set of terminal symbols, we are working with is small, then the structure of palindromes can be defined easily with the help of a syntax diagram. For larger alphabets it would be very nice if we could express the idea of adding the same letter on each side with help of a variable. However, with the limited means of the formalism this is not possible.

Syntax Diagram for Palindromes
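The recursive definition of a palindrome translates almost literally into a recursive test. The following C sketch assumes an alphabet consisting of the two letters a and b; the function names are chosen here for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tests whether the characters s[lo], ..., s[hi - 1] form a palindrome. */
static bool is_palin_range(const char *s, size_t lo, size_t hi)
{
    if (hi - lo == 0)                       /* the empty palindrome */
        return true;
    if (hi - lo == 1)                       /* one arbitrary letter */
        return s[lo] == 'a' || s[lo] == 'b';
    /* a palindrome extended by the same letter on each side */
    if ((s[lo] == 'a' || s[lo] == 'b') && s[lo] == s[hi - 1])
        return is_palin_range(s, lo + 1, hi - 1);
    return false;
}

bool is_palin(const char *s)
{
    return is_palin_range(s, 0, strlen(s));
}
```

The first two cases form the basis of the recursion; the third case is the only recursive alternative, and it is applied to a strictly shorter string, so the test always terminates.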

Backus-Naur Form (BNF)

Syntax diagrams are not the only way of formulating rules. A very common alternative textual way is by using the so-called Backus-Naur form (BNF). In books on computer languages there will typically be an appendix with the rules of the language formulated according to this formalism or a close variant of it.

The choice of the symbols in the BNF is somewhat old fashioned, using only type-writer symbols, but now that we are using html to write this text, this is convenient. The symbols are:

::=
defines the notion on the left in terms of the formulation on the right.
|
separates alternative possibilities.
{ }
encloses a symbol which is repeated zero or more times.

In the syntax-diagrams, the distinction between terminal and non-terminal symbols was expressed by having two kinds of boxes. In the BNF, the non-terminal symbols are enclosed between sharp brackets: "<" and ">".

The expression on the right of a "::=" is read from left to right, just as in the syntax diagrams, the default connection for several consecutively listed symbols being "and". There are no brackets for delimiting subexpressions. Thus, the "|" symbol can only be used at the top-level. If one wants to define alternatives at a lower level, a new non-terminal symbol like "letter_or_digit" must be introduced, where the "|" can be used on several listed alternatives at the top-level.

These symbols used in the BNF are neither terminal symbols from the language, nor meta-symbols used for grammatical notions. These are called meta-syntactical symbols: they are used in order to describe the syntax, but are not part of the syntax themselves.

To show how the formalism works, we reformulate the above examples, except for SNT' which is given as an exercise. For integer constants we have the following:

  <int constant> ::= <numb constant> <int suffix>
  <int suffix> ::= <u part> | <l part> | <u part> <l part>
  <l part> ::= <empty> | <one l part> | <two l part>
  <two l part> ::= ll | LL
  <one l part> ::= l | L
  <empty> ::=
  <u part> ::= <empty> | u | U
  <numb constant> ::= <dec constant> | <oct constant> | <hex constant>
  <hex constant> ::= 0 <x part> <hex digit> { <hex digit> }
  <hex digit> ::= <dec digit> | A | B | C | D | E | F 
  <dec digit> ::= 0 | <non-zero dec digit>
  <non-zero dec digit> ::= <non-zero oct digit> | 8 | 9
  <non-zero oct digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7
  <x part> ::= x | X
  <oct constant> ::= 0 { <oct digit> }
  <oct digit> ::= 0 | <non-zero oct digit>
  <dec constant> ::= 0 | <non-zero dec constant>
  <non-zero dec constant> ::= <non-zero dec digit> { <dec digit> }

For chains we have the following rather clumsy way of saying that a chain consists of zero or more nodes connected by links. However, it clearly expresses the several ways chains can be constructed:

  <chain> ::= <empty> | node | 
              <node chain> | <chain node> | <chain chain>
  <chain chain> ::= <chain> link <chain>
  <chain node> ::= <chain> link node
  <node chain> ::= node link <chain>
  <empty> ::=

For palindromes we have the following:

  <palin> ::= <empty> | <letter> | <letter palin letter>
  <letter palin letter> ::= a <palin> a | b <palin> b
  <letter> ::= a | b
  <empty> ::=

The given example of integer constants shows that one may need quite a lot of non-terminal symbols to formulate a relatively easy concept. Additional meta-syntactical symbols make the formalism more complex but also more powerful, allowing shorter formulations.
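To see that the rules for integer constants are precise enough to be checked mechanically, they can be turned into a small recognizer. The following C sketch follows the rules as written above, so only upper-case letters count as hex digits and the u part must precede the l part (real C also accepts lower-case hex digits and the suffix parts in the other order). The function names are chosen here for illustration.

```c
#include <stdbool.h>
#include <string.h>

static bool is_oct_digit(char c) { return c >= '0' && c <= '7'; }
static bool is_dec_digit(char c) { return c >= '0' && c <= '9'; }
/* As in the rules above, only upper-case letters count as hex digits. */
static bool is_hex_digit(char c) { return is_dec_digit(c) || (c >= 'A' && c <= 'F'); }

/* Returns true if s can be derived from <int constant>. */
bool is_int_constant(const char *s)
{
    /* <numb constant> */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        /* <hex constant>: at least one hex digit must follow */
        s += 2;
        if (!is_hex_digit(*s))
            return false;
        while (is_hex_digit(*s))
            s++;
    } else if (s[0] == '0') {
        /* <oct constant>, which also covers the decimal constant 0 */
        s++;
        while (is_oct_digit(*s))
            s++;
    } else if (s[0] >= '1' && s[0] <= '9') {
        /* <non-zero dec constant> */
        s++;
        while (is_dec_digit(*s))
            s++;
    } else {
        return false;
    }
    /* <int suffix>: an optional u part followed by an optional l part */
    if (*s == 'u' || *s == 'U')
        s++;
    if (strncmp(s, "ll", 2) == 0 || strncmp(s, "LL", 2) == 0)
        s += 2;
    else if (*s == 'l' || *s == 'L')
        s++;
    return *s == '\0';   /* nothing may be left over */
}
```

For example, 067252ll and 0xFFul are accepted, while 08 and 0x are rejected.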

Parsing

Notion and Problem

Until now we were producing expressions with the help of a set of production rules. This is what we are doing when we are speaking (in practice many spoken sentences are ungrammatical because this is a complex process and because the listener will probably understand it anyway). The other side is to give meaning to an expression. A first step is to determine the function of every word in the sentence. This is done by trying to match sequences of words to higher concepts until finally it is understood (or not) how the whole sentence is composed. This process is called parsing.

The most natural way of parsing is to work bottom-up. This means that one starts at the bottom, in our case that is at the terminal symbols, and works upwards until reaching the top, in our case that means at the starting symbol. As an example we will consider a sentence, a chain and a palindrome. A priori it is not even clear that these are grammatically correct, that is whether they have been composed according to their respective production rules. Even more interesting is the question how to perform the parsing and if the parsing is successful whether the resulting decomposition is unique.

Parsing is not easy. The basic idea is that the parser (the program executing the parsing task) is continuously looking for a replacement to make in the hope to finally reach the starting symbol. If the parser does not work sufficiently carefully, it may reach a dead end or get caught in a loop. As its result it constructs a tree-like structure, indicating which symbols were taken together and replaced by one symbol. This tree-like structure is called a parse tree.

Alternatively, one can try to apply top-down parsing. This means that one starts at the top and tries to reach the bottom. This is done by starting at the start symbol of the grammar. One chooses a path in the diagram of the start symbol. For all meta symbols on this path a path in their diagrams is chosen. This procedure is continued until only terminal symbols are left. The resulting sequence of terminal symbols should be identical to the sentence we wanted to parse.

If the parser complains about syntax errors, this means that it was not able to construct a parse tree of the given sentence, computer program or whatever other expression which is supposedly constructed according to a grammar. In spoken language it is quite common that the speaker somehow ends a sentence in a way that does not fit the way he/she has started it.

Parsing Chains

A chain has a very simple structure, and parsing it is so simple that it is almost confusing. Fortunately the above formulation of the syntax facilitates the parsing. It is very convenient that the meta-symbols NLC, CLN and CLC have been introduced explicitly. The following gives two different parse trees for the same chain of length three. In both cases the parsing is successful: the chain is indeed reducible to the start symbol.

Parse Trees for a Chain of Length 3

A grammar, like the grammar for chains, which can generate sentences that can be parsed in several ways, is called ambiguous. The sentences for which several parse trees exist are also called ambiguous themselves.

Most natural languages are ambiguous. Consider the English sentence "the man is throwing at the dog with a ball". One may assume that the man is throwing a ball at the dog, until the sentence is extended to "the man is throwing a stone at the dog with a ball". This shows that in the first sentence it is not clear who has the ball. In spoken language ambiguity is often overcome by the use of intonation; in written language this support is missing.

Parsing Palindromes

Parsing a palindrome is no problem. The parser should first determine whether the number of letters is even or odd. If it is even, the process starts with the empty palindrome and then it repeatedly tries to match characters at equal distance from the middle. If the number of letters is odd, the process starts with the palindrome consisting of the middle letter.

Parse Trees for Palindromes

Parsing a Sentence

Our sentences had quite a complex structure. Parsing is therefore harder as well. We parse the sentence "the gray red dog on the roof sees a man that hits a girl". In the context of a natural language it becomes clearer that parsing amounts to completely determining the function of each terminal symbol in the sentence.

Nevertheless, we still do not know what the sentence means: parsing can be performed by a computer with a dictionary (provided that each word belongs to a unique class), but attributing meaning to it requires more. Here we encounter the difference between a grammatical analysis and determining the semantics of an expression. We find the same situation with numbers: 067252ll can be parsed and the conclusion is that it is an integer constant. However, the interpretation that this is an octal long long number with decimal value 2 + 5 * 8 + 2 * 64 + 7 * 512 + 6 * 4096 = 28330 is left to be done.
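The interpretation step for an octal constant can be sketched in C as follows. The function name octal_value is chosen here; it is assumed that the suffix has already been stripped, and overflow is not checked.

```c
/* Value of a string of octal digits, for example the digits of an octal
 * constant without its suffix. Returns -1 if a character is found that
 * is not an octal digit. Overflow is not checked in this sketch. */
long long octal_value(const char *digits)
{
    long long value = 0;
    for (; *digits != '\0'; digits++) {
        if (*digits < '0' || *digits > '7')
            return -1;
        /* each further digit multiplies what was read so far by 8 */
        value = value * 8 + (*digits - '0');
    }
    return value;
}
```

Calling octal_value("067252") gives 28330, in agreement with the sum of powers of 8 written out above.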

Parse Tree for a Sentence

Structural Induction

Attributes

An attribute is a property of non-terminal symbols. These properties are mostly expressed numerically. In that case an attribute assigns some value to a non-terminal symbol. For example, an attribute of a sentence might be the number of words or the number of verbs in the sentence. An attribute of a palindrome is the number of letters or the number of a's.

An attribute is defined formally, by indicating for each production rule the resulting value of the attribute for the non-terminal symbol on the left in terms of the values of the attribute for the non-terminal symbols on the right and by specifying the value of the attribute for the terminal symbols.

As a first example, consider the number of a's in a palindrome. We list the rules of the grammar again, now all possible productions are numbered:

  <palin>       ::=                         (1) 
                  | a                       (2) 
                  | b                       (3) 
                  | <let_pal_let>           (4)
  <let_pal_let> ::= a <palin> a             (5)
                  | b <palin> b             (6)
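
Now the attribute can be defined as follows, where the symbol on the right-hand side of each equation refers to the symbol occurring on the right of the corresponding production:

  (1) number_of_a(<palin>) = 0
  (2) number_of_a(<palin>) = 1
  (3) number_of_a(<palin>) = 0
  (4) number_of_a(<palin>) = number_of_a(<let_pal_let>)
  (5) number_of_a(<let_pal_let>) = number_of_a(<palin>) + 2
  (6) number_of_a(<let_pal_let>) = number_of_a(<palin>)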

The definition is given so that the defined attribute corresponds to the number of a's, but there was no need to do so: we can define whatever we want, but some definitions are more useful than others.

If one now wants to determine the number of a's in a palindrome, then it can be determined by constructing a parse tree and working one's way up from the terminal symbols to the starting symbol, applying the given rules.
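The evaluation of such an attribute along the productions can be sketched in C. The sketch below does not build the parse tree explicitly: the recursion of the function follows the tree, and the comments refer to the numbered productions above. The function names and the convention of returning -1 for a string that is not a palindrome over a and b are assumptions made here.

```c
#include <stddef.h>
#include <string.h>

/* Number of a's in the palindrome s[lo], ..., s[hi - 1], evaluated by
 * following the productions; returns -1 if no production applies. */
static int count_a_range(const char *s, size_t lo, size_t hi)
{
    int inner;
    if (hi == lo)                                  /* rule (1): empty */
        return 0;
    if (hi - lo == 1) {
        if (s[lo] == 'a') return 1;                /* rule (2) */
        if (s[lo] == 'b') return 0;                /* rule (3) */
        return -1;
    }
    if (s[lo] != s[hi - 1] || (s[lo] != 'a' && s[lo] != 'b'))
        return -1;                                 /* no production applies */
    inner = count_a_range(s, lo + 1, hi - 1);      /* rule (4), then (5) or (6) */
    if (inner < 0)
        return -1;
    return (s[lo] == 'a') ? inner + 2 : inner;     /* (5) adds 2, (6) adds 0 */
}

int count_a(const char *s)
{
    return count_a_range(s, 0, strlen(s));
}
```

For the palindrome abba the inner palindrome bb has value 0, and production (5) adds 2, so count_a("abba") is 2.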

As a second example we consider a restricted grammar for numerical expressions:

  <expr>   ::= <part>                       (1)
             | <part> <operator> <expr>     (2)
  <part>   ::= <number>                     (3)
             | ( <expr> )                   (4)

Here the non-terminal symbol <number> is not further specified. So, this is an abstract grammar. <number> might for example be an <int constant>.

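The attribute we consider is the number of operators. It is defined as follows, where <expr>' denotes the <expr> on the right-hand side of rule (2):

  (1) number_of_operators(<expr>) = number_of_operators(<part>)
  (2) number_of_operators(<expr>) = number_of_operators(<part>) + 1 + number_of_operators(<expr>')
  (3) number_of_operators(<part>) = 0
  (4) number_of_operators(<part>) = number_of_operators(<expr>)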

In this way the attributes of the non-terminal symbols higher in the parse tree are defined in terms of the attributes of the non-terminal symbols at lower levels. Such a definition is called inductive. Because this has to do with the structure of the tree, this kind of induction is called structural induction, and we say "the attribute is defined by structural induction".

Of course, in the simple cases of computing the number of a's in a palindrome or the number of operators in an expression, the problem can be solved more easily by determining the values in a direct way, but for more general attributes the definition of an attribute by structural induction and its evaluation with the help of a parse tree makes sense.
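As an illustration, the number of operators can be evaluated while parsing an expression top-down, with one C function per non-terminal symbol. The sketch makes several assumptions that are not part of the grammar above: a number is a sequence of decimal digits, an operator is one of the characters + - * /, and the expression contains no blanks. A syntax error is reported by returning -1.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Both parse functions advance *p past what they recognize and return
 * the number of operators found, or -1 on a syntax error. */
static int parse_expr(const char **p);

static int parse_part(const char **p)
{
    int n;
    if (isdigit((unsigned char)**p)) {             /* rule (3): a number */
        while (isdigit((unsigned char)**p))
            (*p)++;
        return 0;
    }
    if (**p == '(') {                              /* rule (4): ( expr ) */
        (*p)++;
        n = parse_expr(p);
        if (n < 0 || **p != ')')
            return -1;
        (*p)++;
        return n;
    }
    return -1;
}

static int parse_expr(const char **p)
{
    int left = parse_part(p);
    int right;
    if (left < 0)
        return -1;
    if (**p != '\0' && strchr("+-*/", **p) != NULL) {   /* rule (2) */
        (*p)++;
        right = parse_expr(p);
        if (right < 0)
            return -1;
        return left + 1 + right;
    }
    return left;                                   /* rule (1) */
}

int number_of_operators(const char *s)
{
    int n = parse_expr(&s);
    return (n >= 0 && *s == '\0') ? n : -1;
}
```

For example, number_of_operators("(1+2)*3") is 2. The value of each non-terminal symbol is computed from the values of the symbols it is composed of, which is exactly the bottom-up evaluation in the parse tree.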

Proving Properties

The most important reason for using structural induction is for proving relations between attributes. For example, we might want to prove that the number of numbers in an expression always equals the number of operators plus one. For any given expression, this claim can be verified by counting. Checking many expressions might give the feeling that the claim holds for all expressions, but one cannot be sure: there might be a special case which one did not check. For example, if the grammar had been extended with the unary operator "-" to be applied to numbers, the claim would no longer be true, but if we checked only examples with positive numbers, we would never detect this. Therefore, proving such a claim on attributes must be handled in a formalized way, that is, by using the inductive definition of the attributes.

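The attribute number_of_numbers is defined by structural induction as follows, where <expr>' denotes the <expr> on the right-hand side of rule (2):

  (1) number_of_numbers(<expr>) = number_of_numbers(<part>)
  (2) number_of_numbers(<expr>) = number_of_numbers(<part>) + number_of_numbers(<expr>')
  (3) number_of_numbers(<part>) = 1
  (4) number_of_numbers(<part>) = number_of_numbers(<expr>)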

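Both attributes are now defined formally. Proving the claim can then be done by performing the following two steps:

  1. Check that the claim holds for the result of every production rule which has no non-terminal symbols with attributes on its right-hand side. This is the basis.
  2. Prove for every other production rule that, assuming the claim holds for the non-terminal symbols on the right, it also holds for the non-terminal symbol on the left.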

This approach of proving is not limited to this particular case, but is a general proof method. Such a proof is called a proof by structural induction. That this is a legal way of proving cannot be proven itself. It is an axiom of mathematics, part of the belief so to say. It is not unnatural though: if something is true for all basic cases, and it remains true when applying any of the possible ways to obtain a more involved case, then it should be true for all cases.

It remains to write down the proof by structural induction of the claim that number_of_numbers = number_of_operators + 1 for all constructible expressions. We should first check that it holds for all numbers. For these the claim is ok: one number, zero operators, as expressed by the statement on the result of a production using rule (3).

Now we consider the other production rules. We use O(symbol) to denote the number of operators in the denoted symbol, N(symbol) denotes the number of numbers. The basis of the proof by induction is given by rule (3):

N(<number>) = 1 = 0 + 1 = O(<number>) + 1.
Here "=def=" denotes "equal because of the definition", "=ass=" denotes "equal due to the induction assumption" and "=cmp=" denotes "equal due to computation".
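To make the two attributes concrete, they can be evaluated by structural recursion over a parse tree. The following Python sketch assumes a hypothetical tree representation which is not taken from the text: a number is the tuple ("num", value), a parenthesized expression is ("par", e) and two expressions joined by a binary operator are ("op", symbol, left, right). The function names N and O mirror the notation used above.

```python
# Parse trees as nested tuples (a hypothetical representation):
#   ("num", 7)                  a number
#   ("par", e)                  a parenthesized expression
#   ("op", "+", left, right)    two expressions joined by a binary operator

def N(tree):
    """Number of numbers, defined by structural induction."""
    kind = tree[0]
    if kind == "num":
        return 1                          # basis: one number
    if kind == "par":
        return N(tree[1])                 # parentheses add no numbers
    if kind == "op":
        return N(tree[2]) + N(tree[3])    # numbers of both operands
    raise ValueError("unknown node: %r" % (kind,))

def O(tree):
    """Number of operators, defined by structural induction."""
    kind = tree[0]
    if kind == "num":
        return 0                          # basis: zero operators
    if kind == "par":
        return O(tree[1])
    if kind == "op":
        return O(tree[2]) + 1 + O(tree[3])
    raise ValueError("unknown node: %r" % (kind,))

# The expression (3 + 4) * 5
example = ("op", "*", ("par", ("op", "+", ("num", 3), ("num", 4))), ("num", 5))
print(N(example), O(example))  # prints "3 2"
```

Evaluating the attributes for one tree only confirms the claim for that tree. The proof by structural induction is what covers all constructible expressions.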

Complete Induction

Structural induction is a generalization of the proof method of complete induction. Alternatively it is also called "mathematical induction", and often it is just called "induction".

This method states that, for proving a claim on a function f from the natural numbers, it is sufficient to do the following:

  1. Check that the claim holds for n = 0, that is, check that f(0) has the value it is claimed to have for 0. This is called checking the basis.
  2. Prove, assuming that the claim holds for an arbitrary value n >= 0, it also holds for n + 1. This is called showing that a step can be made.

A well-known example of a fact which can be proven by complete induction is that for S(n) = sum_{i = 0}^n i and P(n) = n * (n + 1) / 2, we have S(n) = P(n), for all n >= 0. Possibly this might be proven in some direct way, but for proving facts like these induction is the first method to try. So, what should we do? Clear: check the basis and show that a step can be made.

Checking the basis means showing that S(0) = P(0) using the definitions of S() and P(). Showing that a step can be made means showing that S(n + 1) = P(n + 1), using the definitions of S() and P() and using that S(n) = P(n). This can be worked out as follows:

  1. For n = 0,
    S(0) =def=
    sum_{i = 0}^0 i =cmp=
    0 =cmp=
    0 * (0 + 1) / 2 =def=
    P(0).
  2. For n >= 0,
    S(n + 1) =def=
    sum_{i = 0}^{n + 1} i =cmp=
    (sum_{i = 0}^n i) + (n + 1) =def=
    S(n) + (n + 1) =ass=
    P(n) + (n + 1) =def=
    n * (n + 1) / 2 + (n + 1) =cmp=
    (n + 2) * (n + 1) / 2 =cmp=
    (n + 1) * ((n + 1) + 1) / 2 =def=
    P(n + 1).
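The two parts of this proof can also be checked numerically for small values. The following Python sketch computes S() literally from its definition and P() from the closed formula. It is only a sanity check under the definitions given above: it tests finitely many n, whereas the induction covers all n >= 0.

```python
def S(n):
    """S(n) = sum_{i = 0}^n i, computed literally from the definition."""
    return sum(range(n + 1))

def P(n):
    """P(n) = n * (n + 1) / 2; the product is always even, so // is exact."""
    return n * (n + 1) // 2

# Basis: S(0) = P(0).
assert S(0) == P(0) == 0

# Step, checked for a few values of n:
# S(n + 1) = S(n) + (n + 1) and P(n) + (n + 1) = P(n + 1).
for n in range(100):
    assert S(n + 1) == S(n) + (n + 1)
    assert P(n) + (n + 1) == P(n + 1)

print(S(10), P(10))  # prints "55 55"
```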

Complete induction is really a special case of structural induction when one considers the natural numbers to be sentences in the language generated by the following grammar:

  <number> ::= <zero>
             | <number> 1
  <zero>   ::= 
Here the numbers are given in unary notation: the number n is written as n ones. So, proving a claim on attributes defined over this grammar by structural induction, means checking the claim for <zero> and proving it holds for 1 ... 11 assuming it holds for 1 ... 1. This is precisely what complete induction does.

The correctness of the method of proving by induction and the fact that the above grammar corresponds to the set of natural numbers constitute the set of axioms of natural numbers which were formulated in 1889 by Peano. The properties of the grammar are more conventionally formulated as:

Grammar of a Property

Above we have dealt with the issue of testing whether a property P holds for all sentences of a language generated by a given grammar G. The other side of the coin is that one would like to show that G generates all sentences satisfying P. In this case one may say

One possible way of proving that all sequences satisfying P are generated is by indicating, for any possible sequence S satisfying P, by which production rule it arises out of smaller sequences S_1, ... satisfying P. In the construction of S_1, ... satisfying P, it may explicitly be used that S satisfies P. So, here the structural induction is reversed. It is essential that the new sequence(s) are smaller, otherwise a circular argument cannot be excluded. Equivalently one may also perform complete induction over the length of the sequences. This point will be addressed in more detail after some examples.

Equal Numbers of a's and b's

As an example we consider the following simple grammar:
  <sent> ::= a <sent> b           (1)
           | b <sent> a           (2)
           | <sent> <sent>        (3)
           |                      (4)

We want to prove that this grammar generates precisely those sentences which have the same number of a's and b's. Denote the number of a's by A() and the number of b's by B(). Then

In order to prove that all generated sentences S have the property that A(S) = B(S), one should proceed by structural induction. The proof is entirely analogous to the above proof that the number of numbers in an expression exceeds the number of operators by one and is left as an exercise.

Now we consider all possible sequences of a's and b's with an equal number of each of them. There is only one such sequence with 0 a's and 0 b's: the empty sequence. It is generated by rule (4). All non-empty strings with equally many a's and b's must have at least one a and one b and therefore have at least two letters in total. Strings with at least two letters over an alphabet with two letters can either begin with a and end on a, or begin with a and end on b, or ... . So, any string S with at least two letters is of the form x R y, with x, y either equal to a or b, and where R is a string with two fewer letters. For each of these four cases, we show that if S satisfies A(S) = B(S), that then it can be constructed from shorter strings also satisfying this property.

A proof like this has the problem that one needs some argument to convince the reader that really all cases are treated. In this case this argument is really convincing, but here we use external facts: our understanding of how strings with the same number of a's and b's look. A related point is that for the case a R a, we use all kinds of facts which do not follow from the production rules. For example, that if A(a R a) = B(a R a), then A(R) = B(R) - 2. These points can be formalized by extending the definition of the attributes to all strings, not only those generated by the grammar. It is even harder to formalize the argument that if a string has two more b's than a's it can be cut so that each section has one more b, but even this can be done. Here we are slightly tolerant and occasionally do not ask to formalize the last detail. Nevertheless it is important to underline the fundamental difference between proving that for all generated sentences a certain property holds, and proving that all sequences for which a property holds are generated. The first is straightforward, the second requires that one can somehow formulate how all possible cases look and find a sensible subdivision into cases which can be treated separately. Not being able to find a counterexample to a hypothesis is not the same as proving it!
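The reversed induction can be made concrete as a recursive procedure that, for a string with equally many a's and b's, selects a production rule and continues with strictly shorter strings. The following Python sketch is one possible way to do this, not necessarily the case distinction intended in the text. The function names and the representation of a derivation as nested tuples (with the rule number as first entry) are assumptions made for this illustration. For the cases a R a and b R b it cuts the string at the first proper prefix with equally many a's and b's and applies rule (3).

```python
def count_diff(s):
    """A(s) - B(s): the number of a's minus the number of b's."""
    return s.count("a") - s.count("b")

def derive(s):
    """Return a derivation tree for s, assuming A(s) = B(s)."""
    if set(s) - {"a", "b"} or count_diff(s) != 0:
        raise ValueError("not a string with equally many a's and b's: %r" % s)
    if s == "":
        return (4,)                          # rule (4): the empty sentence
    first, inner, last = s[0], s[1:-1], s[-1]
    if first == "a" and last == "b":
        return (1, derive(inner))            # rule (1): a <sent> b
    if first == "b" and last == "a":
        return (2, derive(inner))            # rule (2): b <sent> a
    # Cases a R a and b R b: the difference is +1 or -1 after the first
    # letter and has the opposite sign before the last one, so it passes
    # through 0 at some cut strictly inside the string.
    for cut in range(1, len(s)):
        if count_diff(s[:cut]) == 0:
            return (3, derive(s[:cut]), derive(s[cut:]))   # rule (3)
    raise AssertionError("unreachable when A(s) = B(s)")

def yield_of(tree):
    """Rebuild the sentence generated by a derivation tree."""
    rule = tree[0]
    if rule == 1:
        return "a" + yield_of(tree[1]) + "b"
    if rule == 2:
        return "b" + yield_of(tree[1]) + "a"
    if rule == 3:
        return yield_of(tree[1]) + yield_of(tree[2])
    return ""

print(derive("abba"))  # prints "(3, (1, (4,)), (2, (4,)))"
```

Every recursive call is made on a strictly shorter string, which is exactly the condition needed to exclude a circular argument.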

The given proof that in the case a R b the sequence R satisfies A(R) = B(R) is complete and convincing, but it can also be given in a somewhat different way, which might sometimes be easier. The only facts that can be used in this proof are the way R is constructed and that A(S) = B(S). An alternative to the given positive proof is a proof by contradiction. This is a general proof technique, which works as follows: some assumption is made and then it is shown that this leads to a contradiction. Then it is concluded that the assumption cannot hold, and that thus the opposite must be true. The correctness of this way of arguing is an axiom of mathematics, and has been the topic of fundamental disputes. There are few cases where a proof by contradiction cannot be turned into a positive proof, but often they are elegant and convincing. In our case, assuming A(R) != B(R) implies A(S) = A(R) + 1 != B(R) + 1 = B(S), a contradiction. Thus, the assumption A(R) != B(R) must have been wrong, which implies A(R) = B(R).

Balanced Parentheses

A variant of the above grammar is the grammar which generates sequences with balanced parentheses, for example "((()())((())()))". This is not the same problem, because here it is not sufficient that the number of each of the symbols is equal: in addition we must have that at any place the "balance" is at least 0. In other words, each closing parenthesis must match an opening parenthesis to the left of it. So, ")(" is illegal. The production rules implementing this idea are
  <par>   ::= <empty>            (1)
            | ( <par> ) <par>    (2)
  <empty> ::= 

Define the attribute B() to give the balance: B(S) gives the number of "(" symbols minus the number of ")" symbols in S. More generally, B_i(S) gives the balance in the first i symbols of S. The attribute M() on S gives the minimum prefix balance of S: M(S) = min_{0 <= i <= number of symbols in S} B_i(S). A string of parentheses S is defined to be balanced if B(S) = M(S) = 0. We will show that the grammar generates precisely these strings.

B(<empty>) = M(<empty>) = 0. So, the basis of the inductive proof that all generated strings satisfy the property is ok. For rule (2), we have B(<par> left) =def= 1 + B(<par> first right) -1 + B(<par> second right) =cmp= B(<par> first right) + B(<par> second right) =ass= 0 + 0 =cmp= 0. In order to determine M(S) for a string S = ( R_l ) R_r generated according to rule (2), we distinguish several cases. Let l be the number of symbols in R_l and r the number in R_r. The following is easy to check:

B_i(S) = 1 >= 0, for i = 1
B_i(S) = 1 + B_{i - 1}(R_l) >= 1 >= 0, for 2 <= i <= l + 1
B_i(S) = 1 + B(R_l) - 1 = 0, for i = l + 2
B_i(S) = 1 + B(R_l) - 1 + B_{i - l - 2}(R_r) >= 0, for l + 3 <= i <= l + r + 2
Thus, M(S) = 0, showing that S satisfies the properties.

For the other direction we consider an arbitrary non-empty balanced string S of n parentheses. Because M(S) = 0, we have B_1(S) >= 0, so the first symbol is a "(" and we may write S = ( R, where R consists of the remaining n - 1 symbols. Denote position i, 0 <= i <= n - 2, of R by r_i. B_i(R) = sum_{0 <= j < i} value(r_j), where the function value() is defined by value('(') = +1 and value(')') = -1. B_{n - 1}(R) = B_n(S) - 1 = 0 - 1 = -1, that is, in R there is one more ')' than '('. Let i, 1 <= i <= n - 1, be the smallest value so that B_i(R) < 0. Because B_i differs from B_{i - 1} by 1, we must have B_{i - 1}(R) = 0 and B_i(R) = -1, so r_{i - 1} = ')'. Let R_l be the string consisting of r_0, r_1, ..., r_{i - 2} and let R_r consist of r_i, r_{i + 1}, ..., r_{n - 2}. Both R_l and R_r are possibly empty. So, S = ( R_l ) R_r. By the choice of i, B_j(R_l) = B_j(R) >= 0 for all j < i - 1 and B_{i - 1}(R_l) = B_{i - 1}(R) = 0. Thus, R_l is a correct sequence of parentheses. The prefix ( R_l ) of S has i + 1 symbols, so for R_r we find B_j(R_r) = B_{j + i + 1}(S) - B_{i + 1}(S) = B_{j + i + 1}(S) - (1 + 0 - 1) = B_{j + i + 1}(S) for all j, 0 <= j <= n - i - 1. Thus, B_j(R_r) >= 0 for all j, and B_{n - i - 1}(R_r) = B_n(S) = 0. Thus, also R_r is correct. Thus, S can be generated from R_l and R_r applying rule (2).
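The decomposition used in this argument can be carried out mechanically. The following Python sketch computes the prefix balances B_i(S), tests the property B(S) = M(S) = 0 and splits a non-empty balanced string as ( R_l ) R_r at the first position where the balance returns to 0. The function names and the representation of a derivation as nested tuples are assumptions made for this illustration.

```python
def prefix_balances(s):
    """Return [B_0(s), B_1(s), ..., B_n(s)] for a string of parentheses."""
    if set(s) - {"(", ")"}:
        raise ValueError("not a string of parentheses: %r" % s)
    balances = [0]
    for symbol in s:
        balances.append(balances[-1] + (1 if symbol == "(" else -1))
    return balances

def is_balanced(s):
    """True if B(s) = M(s) = 0."""
    balances = prefix_balances(s)
    return balances[-1] == 0 and min(balances) == 0

def split(s):
    """Write a non-empty balanced s as ( R_l ) R_r and return (R_l, R_r)."""
    if s == "" or not is_balanced(s):
        raise ValueError("not a non-empty balanced string: %r" % s)
    # Smallest k >= 1 with B_k(s) = 0: symbol k - 1 is the ")" that
    # matches the opening "(" at position 0.
    k = prefix_balances(s).index(0, 1)
    return s[1:k - 1], s[k:]

def derive(s):
    """Derivation tree: () for rule (1), a pair of subtrees for rule (2)."""
    if s == "":
        return ()
    r_l, r_r = split(s)
    return (derive(r_l), derive(r_r))

def rebuild(tree):
    """Rebuild the string generated by a derivation tree."""
    if tree == ():
        return ""
    return "(" + rebuild(tree[0]) + ")" + rebuild(tree[1])

print(split("(()())()"))  # prints "('()()', '()')"
```

The call to split() raises an error for R_l or R_r if either of them is not balanced, so a successful run of derive() on a balanced string mirrors the step in the proof where both parts are shown to satisfy the properties.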

The above proof is so important that we dwell on it somewhat longer. After having defined R_l and R_r, it must be proven that these satisfy the properties. This must be shown, because only if they satisfy them can it be assumed inductively that R_l and R_r can be generated. Only then can we argue that S can be generated by first generating R_l and R_r and then applying rule (2). This way of arguing is an implicit application of structural induction to the claim that Sat(S) = true implies Gen(S) = true, where Sat() and Gen() are Boolean attributes which are true for a sequence which satisfies the properties and can be generated, respectively, and false otherwise. In older texts, it was common to give such proofs by complete induction over the length of the sequences. Doing this, when proving that an arbitrary sequence of length n satisfying the properties can be generated, we may assume that all sequences of length less than n satisfying the properties can be generated. Such an argument is more concrete, but considered to be less elegant than a proof by structural induction, because it unnecessarily relies on the secondary notion of length.

Equal Numbers of a's and b's, Revisited

We consider a variant of the problem with equally many a's and b's.
  <sent>   ::= a <sent> b           (1)
             | b <sent> a           (2)
             | <empty>              (3)
             | b <sentaa> b         (4)
             | a <sentbb> a         (5)
  <sentaa>  ::= a <sentaa> b        (6)
             | b <sentaa> a         (7)
             | a <sent> a           (8)
  <sentbb>  ::= a <sentbb> b        (9)
             | b <sentbb> a        (10)
             | b <sent> b          (11)
  <empty>   ::=                    (12)

After what we have seen we soon realize that any sent again has the same number of a's and b's. These attributes are denoted A() and B() as before. This is easy to prove, but requires one step more than the earlier proofs. Showing that not all sequences are generated is left as an exercise.

As always we must prove that for the symbol on the left of a production rule the desired property follows when we assume that it holds for the symbols on the right. However, on the right of the production rule (4) we find sentaa and in (5) we find sentbb, for which other properties hold. So, the more correct way of formulating the principle of structural induction is

So, in our case we work with four claims:

Here the last claim is the basis assumption, which may be considered as given. All other properties now can easily be tested by checking the production rules. For example:

Types of Grammars

The whole idea of explicitly viewing a grammar as a set of production rules was developed in the fifties by Noam Chomsky, who was most interested in describing natural languages. Backus and Naur were the first to apply this idea to describe all legal constructions of a programming language: Algol-60. By now all this has become so natural that one wonders how one could talk about these concepts in a different way. Hereafter we discuss four types of grammar. A language generated by an xxxx grammar will be called an xxxx language.

Context-Free Grammars

The grammars we were considering so far, are in a more general context called context-free grammars. These are characterized by production rules in which on the left we find a single non-terminal symbol and on the right an arbitrary sequence of terminal and non-terminal symbols.

Many useful languages are context free. Most importantly, many programming languages are context free. This can easily be verified by checking their specifications, which are often also given in BNF or in some similar format.

On the other hand, there are very few (if any) context free natural languages. Languages have verbal inflections depending on the person, they may have genders affecting articles and adjectives, they may have cases, and the choice of relative pronouns may depend on whether the subject of reference is a person or a thing.

For a more mathematical example, consider the language SS containing the sentences abc, aabbcc, aaabbbccc, aaaabbbbcccc, ... . The following grammar generates all these sentences:

  <S> ::= <A> <B> <C>
  <A> ::= a | a <A>
  <B> ::= b | b <B>
  <C> ::= c | c <C>
However, it also generates aaabcc, which we did not want to have. The following grammar comes closer: it still generates all sequences of SS and it does not generate aaabcc:
  <S> ::= a <B> c | a <S> c
  <B> ::= b | b <B>
However, this does not really solve the problem: it is now assured that the number of a's equals the number of c's, but the number of b's can be arbitrary. The problem is not that we are not sufficiently clever: it can be proven, which is not easy, that SS cannot be generated by any context-free grammar. Informally one can argue as follows: a context-free production can essentially only glue together several non-terminal symbols with additional terminal symbols around them. This allows adding the same on both ends, but it is not possible to somehow specify that the same change should be carried out in the middle of a sequence, or that only sequences of equal length should be glued.

Context-Sensitive Grammars

Context-free grammars are a very important class of grammars but they are only one of several types of grammars. In a context-sensitive grammar, the production rules are of the form
S_1 X S_3 ::= S_1 S_2 S_3
Here S_1, S_2 and S_3 are arbitrary sequences of terminal and non-terminal symbols, while X denotes a single non-terminal symbol. The interpretation of the above production rule is the following: "In the context of S_1 and S_3, the symbol X can be replaced by S_2".

As a first example we consider a refinement of the grammar of SNT' considered at the beginning of this chapter. SNT' gave nice sentences, but it does not handle the subtlety that in English we write "an apple" and "a pear". Furthermore, it builds sentences like "the man which sees the girl throws a stone". It would be more correct to use "who" instead of "which". These details can be handled conveniently with a context-sensitive grammar:

  <snt'>        ::= <frm'> <vrb> <frm'>
  <frm'>        ::= <frm> <prrl>
  <frm>         ::= <arg> <adg> <sub>
  <arg>         ::= <empty> | <vart> | <cart> | the
  <empty>       ::= 
  <vart> <vsub> ::= an <vsub>
  <vart> <vadj> ::= an <vadj>
  <vsub>        ::= <pvsub> | <uvsub>
  <pvsub>       ::= assistant | aunt | uncle
  <uvsub>       ::= anchor | ape | eagle | elephant | umbrella
  <vadj>        ::= awkward | old | ugly
  <cart> <csub> ::= a <csub>
  <cart> <cadj> ::= a <cadj>
  <csub>        ::= <pcsub> | <ucsub>
  <pcsub>       ::= girl | man | woman
  <ucsub>       ::= car | dog | house | roof | zebra
  <cadj>        ::= brown | cold | good | high | hot | strong 
  <adg>         ::= <empty> | <adj> | <adj> <adg>
  <adj>         ::= <vadj> | <cadj>
  <sub>         ::= <vsub> | <csub>
  <prrl>        ::= <empty> | <pgr> | <rlg>
  <pgr>         ::= <prp> <frm'>
  <prp>         ::= after | in | over | upon
  <rlg>         ::= <rel> <vrb> <frm'>
  <rel>         ::= <prel> | <urel>
  <psub> <prel> ::= <psub> who
  <psub>        ::= <pvsub> | <pcsub>
  <usub> <prel> ::= <usub> which | <usub> that
  <usub>        ::= <uvsub> | <ucsub>
  <vrb>         ::= buys | eats | hits | sees

The above extension achieves what we wanted, but in this case the same might even have been achieved in a context-free way by listing several alternatives for taking into account the combination possibilities. In a similar way the problem with the relatives could have been solved. So, in this case the context-sensitivity is not essential, though it allows expressing the grammatical concepts in a natural way.

A class of grammars G_1 is said to be (strictly) more powerful than a class of grammars G_2, if there is a language L which can be generated with a grammar from G_1 which cannot be generated with any grammar from G_2. How about context-free and context-sensitive grammars? Clearly context-sensitive grammars are at least as powerful as context-free grammars, because taking S_1 = S_3 = empty gives a context-free production rule. So, the question is whether there are languages which can be generated by context-sensitive grammars which cannot be generated by context-free grammars. The remainder of this section is devoted to proving this.

Theorem: Context-sensitive grammars are strictly more powerful than context-free grammars.

Proof: Consider again the language SS = {abc, aabbcc, ...}. Above it was claimed that this language cannot be generated by a context-free grammar. So, proving that SS can be generated by a context-sensitive grammar, demonstrates that context-sensitive grammars are strictly more powerful than context-free grammars in the above defined sense. We propose the following grammar:

    <S>     ::= a <B> <C>       (1)
    a <B>   ::= a a <B> <B>     (2)
    <B> <C> ::= bc              (3)
              | b <C> c         (4)
    <B> b   ::= b <B>           (5)
  

We show how to construct aaabbbccc using this grammar. We leave the "<" and ">" symbols away to shorten the notation. We use "-i->" to indicate a production with rule i, for 1 <= i <= 5:

    <S> -1-> aBC 
        -2-> aaBBC 
        -2-> aaaBBBC 
        -4-> aaaBBbCc 
        -5-> aaaBbBCc 
        -5-> aaabBBCc 
        -4-> aaabBbCcc 
        -5-> aaabbBCcc 
        -3-> aaabbbccc 
  
Generalizing the given scheme for generating all sequences in SS is left as an exercise. Before trying a full generalization, it is recommended to first consider how aaaabbbbcccc can be generated.

We are not done yet. Generating all sequences of SS is not hard, and can even be achieved with a context-free grammar. The point is that a context-free grammar cannot generate precisely those sequences. So, it remains to verify that the set of (non-terminal-free) sequences generated with the given context-sensitive grammar only contains sequences a...ab...bc...c with equally many a's, b's and c's.

The proof can be given by defining attributes: N_a, N_b, N_c, and N_B, indicating the number of symbols of each type. Using structural induction, the following equalities can be proven to hold for all generated sequences:

Thus, when ultimately N_B = 0, we must have N_a = N_b = N_c. Of course, this is not all: the specified strings are not only characterized by the fact that they have equally many a's, b's and c's, but also by the fact that all a's come before all b's which come before all c's. This can be proven by using two boolean attributes. The attribute O_ab is true if all a's come before all b's and B's and false otherwise. O_bc is defined analogously for b's and c's. With structural induction it is easy to prove that O_ab and O_bc are true for all generated sequences. End.
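The claim can also be tested experimentally for short sequences. The following Python sketch performs an exhaustive search over all sequences that can be derived from <S> with the rules (1) to (5), writing S, B and C for the non-terminal symbols. Since no rule makes a sequence shorter, all sequences longer than a given bound can be discarded. The function names and the length bound are assumptions made for this illustration. Such a search is no substitute for the proof, as it only covers sequences up to the chosen length.

```python
from collections import deque

# Single characters stand for symbols: S, B and C are non-terminals.
RULES = [
    ("S", "aBC"),     # (1)
    ("aB", "aaBB"),   # (2)
    ("BC", "bc"),     # (3)
    ("BC", "bCc"),    # (4)
    ("Bb", "bB"),     # (5)
]

def successors(form):
    """All sequences obtainable from form by applying one rule once."""
    for left, right in RULES:
        start = form.find(left)
        while start != -1:
            yield form[:start] + right + form[start + len(left):]
            start = form.find(left, start + 1)

def generated(max_len):
    """All non-terminal-free sequences of length <= max_len.

    No rule shortens a sequence, so longer forms can be discarded.
    """
    seen = {"S"}
    queue = deque(["S"])
    sentences = set()
    while queue:
        form = queue.popleft()
        if form.islower():            # only terminal symbols are left
            sentences.add(form)
            continue
        for nxt in successors(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sentences

print(sorted(generated(9)))  # prints "['aaabbbccc', 'aabbcc', 'abc']"
```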

The reason not to always use context-sensitive grammars is that sentences from a context-sensitive language are considerably harder to parse than sentences from a context-free language.

General Grammars

Even more general and stronger are the general grammars. In those there is no limitation on the left-side at all: the production rules have the form
S_1 ::= S_2
That is, an arbitrary sequence of symbols S_1 can be replaced by an arbitrary sequence of symbols S_2. Parsing sentences from general languages is extremely hard: in general, it cannot even be decided by an algorithm whether a given sequence can be generated by a given general grammar. Here algorithm means a step-by-step description how to proceed.

Regular Grammars

There is also a category of grammars which is more limited than the context-free grammars: the so called regular grammars. These have production rules satisfying the following conditions: The first condition specifies that a regular grammar is context-free. So, in a regular grammar one cannot have mutual recursion, one cannot have the non-terminal symbol on the left side or in the middle and it cannot occur several times.

Notice that in many books one may find a definition which at first glance appears to be more restrictive. However, the definition given here is equivalent in the sense that all grammars which are classified as regular according to this definition can also be classified as regular according to the apparently more restrictive definitions. With the definition given here it is often easier to show that a grammar is regular.

The given grammar for chains is not regular. However, chains can be obtained with a regular grammar as well:

  <chain> ::= <empty>
            | node
            | node link <chain>
The same is true for the language of (numerical) expressions without brackets (which is essentially the same):
  <op>   ::= + | - | * | /
  <expr> ::= <empty>
           | <number>
           | <number> <op> <expr>

Regular grammars are strictly weaker than context-free grammars. For example, the given grammar for palindromes contains the production

  <pal> ::= a <pal> a
which is not a regular production rule, showing that this is not a regular grammar. In principle, this does not exclude the possible existence of a regular grammar generating all palindromes, but it can be shown that there is no such regular grammar. Intuitively it can be understood as follows: palindromes appear to essentially require a growth out of the middle, something which is not possible regularly.

If a language as basic as that of the palindromes cannot be generated then clearly many other useful languages cannot be generated either. The reason to consider regular grammars is that they are almost trivial to parse, considerably easier than context-free languages.
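To illustrate how simple parsing becomes, consider the regular grammar for chains given above. A sentence can be checked in a single pass from left to right while remembering only one bit of information: whether a new <chain> may start at the current position. The following Python sketch assumes that the sentence is given as a list of the terminal symbols "node" and "link"; this representation and the function name are assumptions made for this illustration.

```python
def is_chain(tokens):
    """Recognize <chain> ::= <empty> | node | node link <chain> in one pass."""
    at_chain_start = True             # state: a <chain> starts here
    for token in tokens:
        if at_chain_start and token == "node":
            at_chain_start = False    # a node was read, only a link may follow
        elif not at_chain_start and token == "link":
            at_chain_start = True     # after a link a new <chain> starts
        else:
            return False              # no production rule fits this symbol
    # Both states are accepting: a <chain> may be empty, and it may end
    # directly after a node.
    return True

print(is_chain(["node", "link", "node"]))  # prints "True"
print(is_chain(["node", "node"]))          # prints "False"
```

The recognizer needs neither a stack nor backtracking, which is not the case for context-free languages in general.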

Chomsky Hierarchy

The described types of grammars are also known under the following names:
Type 0: general grammars
Type 1: context-sensitive grammars
Type 2: context-free grammars
Type 3: regular grammars

This classification is due to Chomsky in the course of his pioneering work in this area. Therefore it is also called the Chomsky hierarchy.

Exercises

  1. The natural numbers can be defined inductively as follows: 0 is a natural number; for any natural number i, i + 1 is a natural number as well. In unary notation a number i can be represented by a sequence 11...1 of length i (or any other sequence of i symbols). Give a syntax diagram of the natural numbers in unary notation. Also give the corresponding BNF formulation.

  2. Give the BNF formulation of SNT' and all non-terminal symbols used in its definition.

  3. Floating point constants can have many different forms. In words the rules are as follows:
    • A float must have a dot or an E or both.
    • If a dot occurs, then the part before or after it consists of one or more digits. If no dot occurs, then the part before the E consists of one or more digits.
    • If an E occurs, then the part after it consists of one or more digits possibly preceded by a sign.
    • Instead of E one can also use e.
    • The number can have a suffix: one of the letters f, F, l or L.

    We give some examples. Correct are .14, 17., 0E1, 17.3E-15, 12.3E+14F. Not correct are 18 (no dot, no E), .E1 (no digits before or after the dot), E1 (no digits before the E), 0E (no digits after the E), 14E1.2 (dot after the E), -71.56 (the sign is not part of the float constant but considered to be a unary operator).

    Give a complete syntax diagram for floating point constants in decimal notation. Also give the complete BNF formulation of the above rules. Hint: work analogously to the given examples for integer constants and chains, defining several non-terminal symbols for the respective possible parts and forms of the number.

  4. Give parse trees for the following numbers:
    • 056347ull
    • 78CD33A6
    • 1562.7882E-12F
    For the last number you should refer to the non-terminal symbols from the previous exercise.

  5. Give a parse tree for the following sentence: "the man with the meager white ugly face which looks like an apple sees the cow in the field which lies against a green slope". Is the given grammar ambiguous or not?

  6. A part of the definition of statements in C looks as follows:
          <stat> ::= <var> = <expr> ;
                   | if <expr> <stat>
                   | if <expr> <stat> else <stat>
        
    Here <var> stands for variable and <expr> for expression. Show that this grammar is ambiguous, by presenting two different parse trees for the code fragment
           if (a > 3)
             if (a > 0)
               a = a + 3;
            else
              a = a * b + 7;
        

    Show two ways how to modify the grammar of the if statement so that the ambiguity is eliminated.

  7. Formulate a grammar for numerical expressions which allows parsing an expression like 34 + 42 * 100 / 35 * 6 - 12 + 5 in a way that expresses the normal way of evaluation: * and / have higher priority than + and -, among operators of the same priority the processing order is from left to right. So, the given expression should be evaluated as (((34 + (((42 * 100) / 35) * 6)) - 12) + 5). Hint: distinguish between terms and factors, between multiplicative and additive operators. The operators which should be applied early should not be useful to generate expressions.

  8. Using the BNF formulation of SNT' derived above, give a formal definition of the attribute which corresponds to the number of verbs in an extended sentence. A second attribute is given by the number of relative pronouns in an extended sentence.

    Prove by structural induction that the number of verbs in an extended sentence equals the number of relative pronouns plus one. A correct BNF formulation of SNT' can be downloaded here.

  9. Prove by structural induction that the number of a's and b's is equal for all sentences generated by the grammar
          <sent> ::= a <sent> b 
                   | b <sent> a 
                   | <sent> <sent> 
                   | 
        

  10. We have shown that all strings <sent> generated with the following grammar have equally many a's and b's:
          <sent>   ::= a <sent> b           (1)
                     | b <sent> a           (2)
                     | <empty>              (3)
                     | b <sentaa> b         (4)
                     | a <sentbb> a         (5)
          <sentaa>  ::= a <sentaa> b        (6)
                     | b <sentaa> a         (7)
                     | a <sent> a           (8)
          <sentbb>  ::= a <sentbb> b        (9)
                     | b <sentbb> a        (10)
                     | b <sent> b          (11)
          <empty>   ::=                    (12)
        

    Give an example of a short string with equally many a's and b's, which cannot be generated by this grammar. Generalize this to indicate infinitely many strings which cannot be generated. Prove this. Hint: use the way the strings are grown and the values of the attributes A() and B() giving the number of a's and b's, respectively.

  11. Above we specified a grammar for expressions:
          <expr1> ::= <part> 
                    | <part> <operator> <expr1>
          <part>  ::= <number> 
                    | ( <expr1> )
        
    This grammar generates exactly the same sequences of symbols as the simpler grammar
          <expr2> ::= <number> 
                    | ( <expr2> ) 
                    | <expr2> <operator> <expr2>
        
    Prove this (for example, by showing that both generate everything we normally consider to be expressions with correctly placed parentheses).

    Two grammars are said to be weakly equivalent if they generate the same language. Two grammars are said to be strongly equivalent if they are weakly equivalent and if for any word of the language they construct isomorphic parse trees. The above proof shows that the grammars for expr1 and expr2 are weakly equivalent. However, they are not strongly equivalent as can be seen by checking that the parse trees of a * b + c are not isomorphic.

  12. Consider again the grammar for <sent> in the previous exercise. Denote by SS the set of all generated strings, that is, the language generated by the grammar. Let SS' denote the set of all strings with equally many a's and b's. In the text of this chapter it was shown that SS is a subset of SS'. In the above exercise it was shown that SS is a proper subset of SS' by proving that there is a string S in SS' which is not an element of SS. For this very simple grammar and easily characterized set SS', this was quite easy, but in general it might be hard to find such a string S, and even harder to prove without using a computer that a string does not belong to a language. Fortunately, sometimes it is not necessary to provide an example of an element S in SS' which does not belong to SS to prove that SS cannot be equal to SS'. One such alternative way of proving this is by demonstrating that SS and SS' have different cardinality. In our case, it can be shown that |SS| < |SS'|. Proving this inequality is the topic of this question. Actually, we may focus on the strings of any given length: SS is equal to SS' if and only if, for any n, SS_n = SS'_n, where SS_n and SS'_n denote the subsets of strings in SS and SS', respectively, with length n. Let x_n = |SS_n| and z_n = |SS'_n|.
    • z_n is the number of strings of length n containing equally many a's as b's. Give a concise expression for z_n.
    • Use Stirling's formula, which says that n! = Theta(sqrt(n) * (n / e)^n), to estimate z_n as z_n ~ alpha^n, for some value alpha to be determined.
    • Let y_n be the number of strings of length n in the language of <sentaa>. Carefully consider the grammar and formulate recurrences expressing upper bounds on x_n and y_n in terms of the values x_{n - 2} and y_{n - 2}. Without a proof you may assume that the number of strings of length n in the language of <sentbb> is equal to y_n.
    • Why are the recurrences not necessarily giving equalities?
    • Compute lower bounds on x_n and y_n for all even n <= 12.
    • Give estimates showing that x_n and y_n lie within a small constant factor of each other for all n.
    • The above should show that (the upper bounds on) x_n and y_n grow exponentially. Assuming that x_n = beta^n and y_n = c * beta^n, determine beta.
    • Combine all derived facts to conclude that x_n < z_n.
    A proof of the above type is known as a counting argument.
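
    For small n, a closed expression for z_n can be checked against a brute-force count. The following is a minimal sketch in Python; the function name count_balanced is an assumption made for this illustration and does not appear in the text. It simply enumerates all 2^n strings, so it is only practical for small n.

```python
from itertools import product


def count_balanced(n):
    """Count strings of length n over {a, b} with as many a's as b's.

    Brute force: all 2^n strings are enumerated (hypothetical helper).
    """
    return sum(
        1
        for word in product("ab", repeat=n)
        if word.count("a") == word.count("b")
    )


if __name__ == "__main__":
    # Tabulate z_n for the even lengths considered in this exercise.
    for n in range(0, 13, 2):
        print(n, count_balanced(n))
```

    Comparing the printed values with your own expression for z_n is a quick sanity check; it does not replace the derivation.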

  13. Consider the following grammars defined in the text of this chapter:
    • Extended sentences, SNT'
    • Integer constants, ICN
    • Equal number of a's and b's
    • Balanced parentheses
    For which of these can the presentation of the grammar easily be modified to become regular? For all these, give the modified grammar. Explain in an informal way why the other grammars are not regular.

  14. Above we considered the following grammar:
          <S>     ::= a <B> <C>          (1)
          a <B>   ::= a a <B> <B>        (2)
          <B> <C> ::= bc                 (3)
                    | b <C> c            (4)
          <B> b   ::= b <B>              (5)
        

    Give an algorithm, that is, a step-by-step description of how to proceed, for the construction of a sequence a...ab...bc...c with n a's, b's and c's. Alternatively you may give an inductive proof that all strings of this form are generated.

    Formalize and complete the proof that all generated strings without non-terminal symbols are of this form.

    This grammar was presented as an example of a context-sensitive grammar that generates a language which cannot be generated by a context-free grammar. However, as formulated, most rules do not follow the format allowed for context-sensitive grammars. Indicate for each rule whether the given formulation is regular, context-free, context-sensitive or general.

    Rewrite the grammar (adding non-terminal symbols and production rules) so that it really becomes context-sensitive. Take care that the new grammar should still generate exactly the same sequences of terminal symbols.
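
    To experiment with rules (1)-(5), one can enumerate by machine all terminal strings up to a given length. The sketch below is one possible way to do this in Python, under two assumptions: the non-terminals <S>, <B>, <C> are written as the single capital letters S, B, C, and no rule shortens a string (true for the rules above), so that forms longer than the bound can be discarded. The names RULES, successors and generated_words are chosen for this illustration only.

```python
from collections import deque

# Rules (1)-(5), with the single letters S, B, C standing for <S>, <B>, <C>.
RULES = [
    ("S", "aBC"),    # (1)
    ("aB", "aaBB"),  # (2)
    ("BC", "bc"),    # (3)
    ("BC", "bCc"),   # (4)
    ("Bb", "bB"),    # (5)
]


def successors(form, rules):
    """Yield every string obtained from form by applying one rule at one position."""
    for lhs, rhs in rules:
        start = form.find(lhs)
        while start != -1:
            yield form[:start] + rhs + form[start + len(lhs):]
            start = form.find(lhs, start + 1)


def generated_words(rules, max_len, start="S"):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes that no rule shortens the string, so longer forms can be dropped.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():  # only terminal symbols left
            words.add(form)
            continue
        for nxt in successors(form, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(words, key=len)


if __name__ == "__main__":
    print(generated_words(RULES, 9))
```

    The same routine can be run on your rewritten grammar, after replacing RULES, to compare the two sets of generated strings up to some length. Such a comparison is a test, not a proof.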

  15. How hard is parsing a sentence from a regular language? More precisely, for a sequence S of n symbols generated with a given regular grammar, how long does it take to construct a parse tree for S? Express the time in terms of n, and describe how the parsing process should be organized in order to achieve the given time bound.

  16. The size of a tree structure is the number of nodes in the tree. The size of a parse tree gives a lower bound on the time for parsing the considered sequence and corresponds to the number of productions applied for generating it. We are interested in determining the maximum size of a parse tree for parsing a sequence of n symbols.

    We must be careful. Consider the following grammar:

          <S> ::= <A>
          <A> ::= <B>
          <B> ::= <A>
                | a
        
    There are many possibilities to parse the sequence a:
          a -> <B> -> <A> -> <S>
          a -> <B> -> <A> -> <B> -> <A> -> <S>
          a -> <B> -> <A> -> <B> -> <A> -> <B> -> <A> -> <S>
          ...
        

    However, this is pointless: a derivation should not be made unnecessarily long. So, let t(R, G) denote the minimum number of productions needed to produce a sequence of symbols R with grammar G. In terms of t() we define

    T(n, G) = max_{R is a sequence of n symbols produced with G} t(R, G).
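
    The definition of t(R, G) can be made concrete with a breadth-first search over the strings reachable from <S>: the first time R is reached, the number of productions used is minimal. The following Python sketch assumes that non-terminals are written as single capital letters and that no production shortens a string, so that forms longer than R can be discarded. The names RULES and min_productions are hypothetical.

```python
from collections import deque

# The grammar above, with S, A, B standing for <S>, <A>, <B>.
RULES = [("S", "A"), ("A", "B"), ("B", "A"), ("B", "a")]


def min_productions(target, rules, start="S"):
    """Return t(target, G): the fewest productions deriving target, or None.

    Assumes that no rule shortens the string (hypothetical helper).
    """
    dist = {start: 0}  # fewest productions needed to reach each form
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return dist[form]
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                if len(nxt) <= len(target) and nxt not in dist:
                    dist[nxt] = dist[form] + 1
                    queue.append(nxt)
                pos = form.find(lhs, pos + 1)
    return None


if __name__ == "__main__":
    print(min_productions("a", RULES))  # the detours via <A> and <B> are ignored
```

    Because every form is recorded the first time it is seen, the cycle between <A> and <B> is followed only once. Applied to the grammar of exercise 14, the same function gives the minimum number of productions for small n.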

    Of course T(n, G) is a function which depends on n. It also depends on G. If G has many production rules, the number of rules to apply to obtain a sequence may be larger. To know how hard parsing may be in general, we should give a lower estimate of

    C(n, m) = max_{G is a grammar with m production rules} T(n, G).

    Give a context-free grammar with m production rules for which producing sequences with n symbols requires about n * m steps. Because m is a constant, this is not so bad at all: n * m is linear in n. Show that in general C_cf(n, m) <= n * m, where C_cf(n, m) is defined just as C(n, m) with the maximum taken over all context-free grammars.

    The situation is essentially worse for context-sensitive grammars. Let G be the grammar for generating the sequences {abc, aabbcc, aaabbbccc, ...}. Think of your algorithm for generating the sequence a...ab...bc...c with n a's, b's and c's. Show that the number of production steps in this algorithm is quadratic in n. That is, show that there is a constant d > 0 so that the algorithm takes at least d * n^2 steps.

    Explain in a non-formal way why for this grammar any schedule for generating the sequence with n a's, b's and c's consists of at least a quadratic number of productions.

    The above points together give a very important result: for some d > 0, C_cs(n, m) >= d * n^2 > n * m >= C_cf(n, m) for n > m / d. Thus, when parsing a sentence from a context-sensitive language it may not only be harder to find the steps to make, but the number of these steps may also grow with the length of the sentence in a much more unpleasant way.

  17. In the text it is shown that chains and expressions without brackets are regular languages. Regular grammars can also be used to generate sequences with a given subsequence or a number of specified subsequences. The order of these subsequences can be specified or not. Let S_1 = aaabbb, S_2 = aaaabab and S_3 = baaabaab.
    • Give a regular grammar generating all sequences of a's and b's containing S_1 or S_2 or S_3 (at least one of the three).
    • Give a regular grammar generating all sequences of a's and b's containing S_1 and S_2 and S_3 (all three) non-overlapping and in arbitrary order. Notice that S_1 may be separated from S_2 and S_2 from S_3 by other symbols.
    • Give a regular grammar generating all sequences of a's and b's containing S_1 and S_2 and S_3 non-overlapping in the given order. Notice that these sequences may be separated by other symbols.
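
    When testing a candidate grammar, it helps to have an independent check of whether a given string should be accepted. The Python sketch below states the three conditions directly in terms of substring search; all function names are assumptions made for this illustration. Taking the leftmost occurrence of each pattern is sufficient here, because a leftmost occurrence also ends earliest and so leaves the most room for the remaining patterns.

```python
from itertools import permutations

S1, S2, S3 = "aaabbb", "aaaabab", "baaabaab"


def contains_any(word, patterns=(S1, S2, S3)):
    """True if word contains at least one of the patterns as a substring."""
    return any(p in word for p in patterns)


def contains_in_order(word, patterns=(S1, S2, S3)):
    """True if the patterns occur non-overlapping, left to right, in this order."""
    pos = 0
    for p in patterns:
        hit = word.find(p, pos)  # leftmost occurrence starting at or after pos
        if hit == -1:
            return False
        pos = hit + len(p)  # the next pattern must start after this one ends
    return True


def contains_all_any_order(word, patterns=(S1, S2, S3)):
    """True if the patterns occur non-overlapping in some order."""
    return any(contains_in_order(word, perm) for perm in permutations(patterns))


if __name__ == "__main__":
    word = S3 + "ab" + S1 + S2
    print(contains_any(word), contains_in_order(word), contains_all_any_order(word))
```

    These functions only describe the target languages; they say nothing about how the corresponding regular grammars should be constructed.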


This page was created by Jop Sibeyn.
Last update Monday, 24 January 05 - 10:05.