As HTML is not suited for mathematical formulas, some additional notation is used (as in the typesetting system LaTeX). a_i denotes a with subscript i. a^i denotes a to the power i. <= and >= are used as in most computer languages. Curly brackets, "{" and "}", are used to group things. sum_{i = 0}^j denotes the sum for i running from 0 to j. The same may also be written sum_{0 <= i <= j}. ~= stands for "approximately equal" and ~ for "proportional to". Greek letters are written out. For logical expressions we either use the notation from C or write the operators out in text. So, "a && !b" is the same as "a and not b". sqrt stands for the square root function, and log without a specified base denotes the logarithm to base 2. There are a few more notations, but they should be easy to understand. New notions are printed in bold where they are defined. Inside the chapters, particularly important ideas and short key notes are highlighted without introductory text. There are many pictures. These are intended to be self-explanatory. Typically they are placed just after the text fragment to which they belong; generally there is no direct reference to them in the text.
There are very few references to the literature. Clearly, most of the presented material is not new; a large part is even common knowledge. More directly, this text is based on material found in the following books:
Notice: the following text may be more extensive than necessary. At the examination, students are only expected to know what has been presented during the lectures.
Computer science is the mechanization of abstraction. Rather cryptic at first glance.
On the website of the Computer Science Department in Halle we find:
Computer science is the science of the theoretical analysis, the organizational and technical design and the concrete realization of (complex) systems that are meant to support human expertise and communication in technical, economic and social domains.
Even in the original German, this is hard to understand.
"Mechanization of abstraction" may sound cryptic, but at least it is short. The term "computer science" is perhaps too narrow; in many other languages the word "information" is hidden in the name (as in the German "Informatik"). So it is a science working with computers (or some other mechanical device) on information. This information is often obtained by constructing an abstract (that is, mathematical) model of the world. For example, if we want to compute the optimal assignment of green/red durations to the traffic lights in a city, then we must model this problem in such a way that it can be handled by the mechanical equipment at hand, our computer. Abstraction does not mean saying easy things in a hard way, but rather making things simpler by ignoring details which most likely do not matter for the problem under consideration. In our example, it is probably not important to keep track of the colors or brands of the cars, but we should probably keep track of their lengths.
On the website of the Computer Science Department we also find:
Classifying computer science within the structure of the sciences is difficult. Many foundations and methods of computer science are mathematical in nature: abstraction, logic, questions of computability. Its natural-science, experimental component becomes apparent in the exploration, modeling and simulation of structures that are hard to survey. Computer science is also an engineering science, because, starting from constructive methods, it is concerned with the hardware and software realization of systems; here it sees its task in the formation of concepts as well as in the development of generalizable methods. Its guiding principle is the construction of correct systems. Thus computer science does not fit into the classical division of the sciences. For this reason, C. F. von Weizsäcker called computer science a structural science.
This is not very helpful as a definition, and I do not agree with the main claim, namely that computer science is something very special. I think that computer science is not different in character from physics, chemistry or geology: based on a solid mathematical foundation, aspects of the real world are modeled with the goal of obtaining answers to practical and theoretical questions in the respective domains. On the more practical side of these sciences we find people developing flat-screen TVs and high-temperature superconductors, better paints and more efficient production methods for plastics, or searching for oil or gold. All these sciences derive their importance from the fact that ultimately they lead to engineering.
The word "computer science" also contains the word "science". This is to underline that it is not like learning a foreign language. "Science" means research, development and especially an attitude of openness for changes. Changes occur in the underlying conditions and the domains of application.
One of the important pioneers is Charles Babbage (1791-1871). He designed and built machines for computing tables by the method of finite differences (the "difference engines"). He even designed a much more ambitious machine, the "Analytical Engine", which would have had many features of a modern computer. These are great achievements, but we clearly see the relation to the time in which he worked: around 1850, mechanical thinking was at its height. The Jacquard looms (Joseph Marie Jacquard, 1752-1834), which weave complex patterns based on information stored on punched cards, were in full production in those days.
The first true computer scientist, for whom the fundamental aspects were more important than the engineering aspects of building a useful machine, is Alan Turing (1912-1954). He thought in a purely abstract way about the computational possibilities of an imaginary machine, now called a "Turing machine", which can read and write on one or more tapes. He did this at a time when such machines could not yet be built, but it appears that his thoughts were influenced by the fact that in his time telegraph machines were rattling in every office.
The next great person is John von Neumann (1903-1957). Among many other things, he came up with a cost model, now called the "von Neumann model". In this model all computer details are abstracted away, which allows one to focus on the essential, namely the number of operations performed. This model makes it easy to compare two programs: just count the number of operations for each of them; the one that performs the fewest operations is assumed to be the best. This model was formulated at a time when most operations indeed had approximately the same cost.
Now the mechanical era is far behind us, and punched cards are no longer in use. We still know what tapes are, but the most common storage media are round (disks). More importantly, processors become faster at a much higher rate than memories: on a modern processor, fetching a number from the main memory costs as much as several hundred multiplications. This means that comparing two programs only by the number of operations they perform does not necessarily lead to the right conclusion. This observation leads to a gradual shift of interest.
Current speculation goes in the direction of bio-computing and quantum computing. Bio-computing means using the binding properties of DNA to do computation. This method can really be applied; in principle it even makes it possible to solve complex problems far faster than with conventional computers. However, there are many practical problems, the most important being that one needs enormous amounts of molecules to solve any non-trivial problem. Quantum computing appears to have a larger potential. So far we have accepted that storing a bit requires a transistor. A transistor is a switch. Nowadays transistors are constructed by adding small amounts of other substances to a piece of silicon. These silicon-based transistors have become extremely small, but will always need a substantial number of molecules. Quantum computing might bring the next jump in scale: the idea is to store a bit by exploiting the fact that the spin of quantum mechanical objects (electrons, atoms, ...) is discrete. So, potentially, memory might be realized at an atomic scale.
In the decade up to 1970, computers became available to big companies and some academic institutions. The applications were dominated by financial administration and number crunching for physical and chemical research. The leading languages for these applications were (and still are) Cobol and Fortran.
In the decade up to 1980, computers became much cheaper and smaller and started to enter the offices of banks and many other companies. The main application gradually became text processing: computers were more and more used as luxury typewriters. This was the era of IBM's dominance.
Around 1980 the first home computers appeared, based on the Z80 processor. The capabilities of the Sinclair ZX Spectrum, which cost about 200 Euro, were extremely limited (the first versions had a few KB of RAM and a very "special" keyboard, though later a "luxury" version appeared with a more conventional keyboard and 48 KB of RAM). The next step was the Commodore 64, which had the feel of a real computer (64 KB of RAM and a quite appealing design). Then Atari launched a machine with an incredible 1 MB of RAM, the first machine that offered a low-price alternative even for office use. Anyone who wanted one could now have a computer, which in most cases was used for playing games.
The internet, which had long been used by scientists for sending email, gradually became more important. Its use exploded with the development of browsers and the concept of websites that are reachable at all times. Nowadays the most striking use of computers is as a device for surfing the web.
This does not mean that the other applications have disappeared: business administration is still very important; text processing is still what many people use their computers for every day; in science, computers are used everywhere; games are still played massively. However, games have partly moved from general-purpose computers to special-purpose game computers. Also for surfing the web a dedicated device might be cheaper and better. Possibly the number of home computers has now reached its peak.
On normal computers we have become used to 32-bit arithmetic (and there are even 64-bit processors around). This means that in a single clock cycle two 32-bit numbers can be compared, added and even, which is really impressive, multiplied. Embedded computers often still work with 4-bit arithmetic. For computational purposes they would be terribly slow, but for processing some simple signals this is often enough. A 4-bit arithmetic unit requires much less hardware than a 32-bit one, thereby reducing the size of the chip and thus its price. As should be clear from the above, an embedded computer, including its memory, normally consists of a single chip.
Programming embedded computers is somewhat different from programming usual computers. In the first place, such programs must be extremely space efficient, because every extra byte of storage increases the cost of the device. At a time when the memory of home PCs is approaching 1 GB, this may sound futile, but depending on the application we may be talking about "computers" whose price lies in the penny range.
For embedded computers, program correctness is even more important than for normal computers. If a bug shows up in a normal program, it is mostly possible to make a patch, even if this may be time consuming. On the other hand, imagine what happens if, due to a programming error that rarely shows up, all cars of a certain model must be recalled to exchange some module hidden deep inside.
Correctness in the case of an embedded computer means more than doing what it should do. There may also be the requirement that it performs a task within a certain number of clock cycles: a signal must have been processed before the next signal may come in. The number of signals a signal processor can process per second is part of its specification. This number must be guaranteed; it is not good enough if the processor mostly makes it. The same applies to the space consumption: since the amount of memory is fixed and part of the specification of the chip the program is designed for, the maximum space consumption must be guaranteed.
A similar situation can be found on the programmable pocket calculator TI 59, produced by Texas Instruments around 1980. It has an accessible user memory of 4000 bits (not bytes). 4 bits (a unit also called a nibble) are required for a single decimal digit, so in total 1000 decimal digits can be stored. On this calculator, the user can freely choose whether to have more space for storing 8-digit numbers or more space for the program. Every instruction is encoded by a two-digit opcode (clear = 25, store = 42, recall = 43, - = 75, label = 76, reset = 81, + = 85, ...). So four program instructions of two digits each take the same space as one stored 8-digit number, and can be traded against it. The space for storing numbers runs from one end of the memory, the space for storing the program from the other end. Of course the boundaries of these partitions were not tested at runtime, and strange things happened when part of the program was overwritten ...
#include <stdio.h>
int main() {
    int a, b;
    printf("\nGive the value of a >>> ");
    scanf("%d", &a);
    printf("Give the value of b >>> ");
    scanf("%d", &b);
    a = a * b;
    printf("The product is %1d\n\n", a);
    return 0; }
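The first line tells the preprocessor to include the header file stdio.h before the actual compilation starts. This header declares the standard input/output routines, such as printf and scanf, which are needed for reading and writing data; the routines themselves are taken from the standard C library, which is linked automatically.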
The actual program always begins in the function called "main". The piece of code "int main()" is called the program header. It tells us that the program returns a value of type int and, in this case, that the program has no external parameters. The "{" indicates the beginning of the program text, the matching "}" its end. In between we find lines of code giving several instructions; these form the program body. Each instruction is terminated by a ";". Except for the (conditional) jump statements and calls to subroutines, which are discussed later, a program is executed in linear order, processing the instructions one by one. Instead of the word instruction we will prefer the word statement.
Following the header, it is stated which variables are going to be used: "int a, b;". Such a statement is called a declaration. Here we declare two integer variables. Declaring a variable means that some storage is allocated which can be used to store a value. In C and all other modern programming languages, each variable has a type. This means that for each variable it must be specified what kind of value it holds. Examples of types are char (characters), int (integers) and float (floating-point numbers). Specifying the type is essential because the amount of storage to allocate depends on the type. Specifying the type also allows the system to perform type checking: it is probably an error if a character is compared with an integer. C is exceptional in the sense that it performs very little checking, and almost none at runtime. This is done to make the execution faster, but it also implies that programming errors can go undetected for a long time, making debugging, the process of finding the errors, much harder for programs written in C than for programs in stricter languages such as Java. There are different views on where to write the declarations. In Pascal, and in C before the C99 standard, the declarations must come before all other statements (in C: at the beginning of a block). In C++, Java and C99 they can appear at any place before their first use. In Java it is actually considered good style to make a declaration upon first use. Both views are defensible.
After the variable declaration, we find a statement producing a line of output. The format of IO statements varies strongly from language to language; C is quite convenient. In a print statement, the text to print is enclosed between a pair of double quotes ("). Possibly we want to print the value of a variable. In that case a combination like "%1d" indicates what kind of value it is and how much space may be used for it. Here the "%" means "take care, here a variable value must be inserted". The "1" is the minimum field width: at least 1 position is used, and more if the number needs more digits. The "d" indicates that we are going to print an integer (for a float one uses "f", for a char "c" and for a string "s"). The combination "\n" generates a line feed (newline).
Reading a value is analogous; only now we must precede the name of the variable by the symbol "&". In the section on procedures we will see that this symbol means that we are passing the address of the variable, which allows scanf to store a value into it. Several values can be read in a single statement: "scanf("%d%f", &a, &x);" can be used to read an integer a and a float x. It has the same effect as "scanf("%d", &a); scanf("%f", &x);".
The statement "a = a * b;" is an example of a simple computation followed by an assignment. In an assignment the value of the expression on the right is assigned to the variable on the left. The original value of the variable on the left is overwritten and cannot be retrieved anymore. In our case we did not need the original value of a anymore. Alternatively, we might have used a third int variable c, writing "c = a * b;". In some other languages (such as Algol and Pascal) the symbol for assignment is ":=".
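The program ends with a return statement. The C standard specifies that main returns an int; that is why we have written "int main()". (Since the C99 standard, reaching the closing "}" of main is equivalent to "return 0;", but older compilers may complain about a missing return value.) The returned value can be used as a flag: by convention, 0 signals successful termination and any nonzero value signals some kind of error. For example, if the program may terminate in several ways, the calling instance (such as a shell script) can use this value to test for errors or the like.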
The execution of any C program starts at "main". The usage of any variable must be preceded by a declaration.
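Available compilers are at least gcc and cc. gcc is the GNU C compiler; cc is traditionally the name of the system's own C compiler (on many Linux systems it is simply another name for gcc, while C++ compilers are usually called g++, c++ or CC). Different compilers do not necessarily generate the same code, and there is no guarantee that a program which runs when compiled with one compiler also runs when compiled with another. There are two main reasons for this: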
Compiler options can tell the compiler all kinds of useful things: the degree of optimization, the file the code should be written to, how many warnings should be printed, how strictly the language rules should be enforced, which libraries should be linked, etc. A possible compile command looks like
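gcc my_program.c -O3 -Wall -ansi -o my_executable
Here "-O3" means that we want highly optimized code, "-Wall" means that we want to hear the most important warnings (despite its name, not quite all of them), "-ansi" means that the program is compiled according to the ANSI C standard (C90), and adding "-pedantic" would make the compiler report all deviations from this standard. "-o my_executable" indicates that the compiled code should be written to the file my_executable.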
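We have seen that the header of the IO library is included by the instruction #include <stdio.h> (the form #include "stdio.h" also works, but it typically searches the directory of the source file first). Such an instruction is executed by the preprocessor before the actual compilation starts. Other libraries which one may need are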
Of course, most programs, especially those of beginning programmers, contain syntax errors when they are compiled for the first time. Syntax errors are violations of the grammar rules of the language. The compiler checks whether all syntactical rules have been respected, and only when there are no violations is an executable generated. The default name of this executable is a.out. As the word suggests, the "executable" can be executed directly. That is, execution of the program can be started by simply typing
a.out
On most Unix-like systems one has to type "./a.out" instead, because the current directory is usually not in the search path for commands.
Even if there are no syntax errors, there may be all kinds of other errors. Possibly the program crashes at runtime, because it performs a division by zero or accesses an array outside its bounds. In this case we say that the program has runtime errors. But even if the program has neither syntax nor runtime errors, it is not necessarily correct. Turning on the warnings with -Wall will find some of the non-syntactical errors (for example, it can detect that an uninitialized variable is used), but no software, however clever, can detect that the program does not fulfill its specification (unless the program is fed with the specification).
With help of a compiler the source code of a program is translated to executable machine code.
Each type has a certain size. This size is not specified exactly: on larger systems / more powerful processors, a long may be longer than on a small system / basic processor. This is handy. It is guaranteed, however, that a long (double) is at least as long as an int (float). The range of numbers that can be stored in a variable of a certain type depends on the number of available bits. For 8-bit signed chars the range goes from -128 to +127, for 4-byte ints from -2^31 to 2^31 - 1. A nice feature of C is that there are also unsigned versions of the types char, int and long. These can be used for numbers which are known never to be negative. This doubles the largest value that can be stored: unsigned chars can assume any value from 0 to 255, 4-byte unsigned ints any value from 0 to 2^32 - 1. The declaration "unsigned char c = 178;" is correct, whereas "char c = 178;" gives an overflow of c if char is signed, as it is on many systems (whether a plain char is signed is left to the implementation). The program will not be interrupted, but the value of the variable c will probably not be the expected one. The words overflow and underflow are used to indicate the situation that a variable gets assigned a too large or too small value. Overflow may also be used more generally to indicate that a value outside the allowed range is assigned.
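In many programming languages there is also a type "boolean" or "bool" for storing logical values (true and false). In C (in the C90 standard) any of the numerical types can be used for this; char, interpreted as a number, is the most economical choice. False corresponds to 0, true to all other values. (The C99 standard added the type _Bool, which after #include <stdbool.h> is also available as bool.)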
C90 provides several primitive numerical types and characters, but no primitive type for booleans (C99 added _Bool).
There are even some special operators which are merely shorthands for combinations of the above operators:
i++; <-> i = i + 1;
i--; <-> i = i - 1;
i += j; <-> i = i + j;
i -= j; <-> i = i - j;
i *= j; <-> i = i * j;
i /= j; <-> i = i / j;
i %= j; <-> i = i % j;
i &= j; <-> i = i & j;
i ^= j; <-> i = i ^ j;
i |= j; <-> i = i | j;
i <<= j; <-> i = i << j;
i >>= j; <-> i = i >> j;
All operators belong to 1 of 15 priority levels. Any book on C provides a complete table. Here we give a shortened version:
In every row we first indicate the priority (operators with a higher priority are executed first), then the operators and finally the execution order for operators in the same priority class. "Normal" indicates execution from left to right, "reversed" execution from right to left. One need not know all this by heart. In case of doubt, one should rather use brackets than look it up: the compiled code does not become longer because of this, but the program becomes much easier to understand!
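| Priority | Class | Operators | Order |
|---|---|---|---|
| 15 | bracket-like | `()` `[]` `.` | normal |
| 14 | unary operators | `++` `--` `!` `+` `-` `*` `&` | reversed |
| 13 | multiplication-like | `*` `/` `%` | normal |
| 12 | addition-like | `+` `-` | normal |
| 11 | shifts | `<<` `>>` | normal |
| 10 | comparisons | `<` `<=` `>` `>=` | normal |
| 9 | equality-like | `==` `!=` | normal |
| 8 | bitwise and | `&` | normal |
| 7 | bitwise exclusive or | `^` | normal |
| 6 | bitwise or | `\|` | normal |
| 5 | logical and | `&&` | normal |
| 4 | logical or | `\|\|` | normal |
| 2 | assignment | `=` `+=` `-=` `*=` `/=` `%=` `&=` `^=` `\|=` `<<=` `>>=` | reversed |
| 1 | comma | `,` | normal |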
In expressions it is essential to assure the correct execution order.
#include <stdio.h>
#define SIZE 10
int main() {
    int i, m;
    int a[SIZE];
    printf("\n");
    for (i = 0; i < SIZE; i++) {
        printf("Give the value of a[%2d] >>> ", i);
        scanf("%d", &a[i]); }
    for (m = a[0], i = 1; i < SIZE; i++)
        if (a[i] > m)
            m = a[i];
    printf("\nThe maximum value is %1d\n\n", m);
    return 0; }
In the second line we find "#define SIZE 10". Upon encountering such an instruction, the preprocessor, the program that processes the program text before the compilation, replaces all occurrences of SIZE by 10. In the program text SIZE can be used as a constant, but in the running program it does not physically exist, because it has been replaced by the number 10. The great advantage of using a defined constant instead of the value itself is that the value has to be changed in only a single place to make the program suitable for arrays of another size. This is particularly useful for constants which are used for tuning the program, such as the maximum size, the maximum number, the number of files that may be used, the size of certain blocks, etc. Notice that we did not write "#define SIZE 10;". This would replace SIZE by "10;", turning for example "int a[SIZE];" into the syntax error "int a[10;];".
The statement "int a[SIZE];" is the declaration of an array. In this case it is specified that a[] is an array with space for SIZE integers. The storage positions in an array are called fields. The fields can be accessed by indexing the array: position i of the array is accessed by writing a[i]. In C (and Java) an array of length size starts at position 0 and ends at position size - 1. It is a common error to try to access the non-existing position a[size]. Java tests for this and other array-bound errors and interrupts the computation with an adequate error message if they occur. In C the program will mostly run on, possibly computing a wrong result or crashing at a point from where it is hard to trace back to the original error.
The usage of arrays would be cumbersome without instructions for repeated execution. The for statement is the main example of such a statement. In the program we find "for (i = 0; i < SIZE; i++) { ... }". This means that the statements inside the curly brackets are executed for all i running from 0 to SIZE - 1. The for statement is the most natural choice for any loop with a fixed number of repetitions. The entire section of the program "for ( ... ) { ... }" is called a for loop. For loops are provided by virtually every imperative programming language, though the format may be somewhat different.
Inside the second for loop, we find an if statement. Here it appears in its simplest variant: a condition is tested, and if this condition is satisfied, the following statement is executed. Here this means that if a[i] happens to be larger than the current value of m, then a[i] is assigned to m.
The format of the second for loop is slightly different from the first one. In "for (m = a[0], i = 1; i < SIZE; i++)", there are two initializations, separated by a comma: a[0] is assigned to m and 1 to i. In this loop the maximum of all values is computed. We first verify the correctness of this computation for the case a[] = (12, 45, 67, 16, 65, 89, 13, 44, 92, 55). The values are given for the situation at the end of the pass through the loop for the indicated value of i (the row for i = 0 shows the values after the initialization).
| i | a[i] | m |
|---|---|---|
| 0 | 12 | 12 |
| 1 | 45 | 45 |
| 2 | 67 | 67 |
| 3 | 16 | 67 |
| 4 | 65 | 67 |
| 5 | 89 | 89 |
| 6 | 13 | 89 |
| 7 | 44 | 89 |
| 8 | 92 | 92 |
| 9 | 55 | 92 |
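Let us have a closer look at the correctness. We claim that each time the condition i < SIZE is tested, m = max{a[j] | 0 <= j < i}, that is, m is the maximum over all numbers considered so far. An important point is that the claim is obviously true at the beginning, because then m = a[0] = max{a[j] | 0 <= j < 1}. If it still holds at the end, when i = SIZE, then we have m = max{a[j] | 0 <= j < SIZE}, which is the value to compute. So, it remains to verify that the property does not get lost at an intermediate step. This is realized by the conditional execution of the statement "m = a[i]". If we assume that for the current value of i, i >= 1, m = max{a[j] | 0 <= j < i}, then there are two cases to distinguish: if a[i] > m, then a[i] is larger than all a[j] with j < i, and after "m = a[i]" we have m = max{a[j] | 0 <= j < i + 1}; if a[i] <= m, then m is not changed and is still the maximum of a[0], ..., a[i], so again m = max{a[j] | 0 <= j < i + 1}. In both cases the claim holds again after i has been incremented.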
In the first for loop we have written curly brackets, "{" and "}", around the statements to be repeated; in the second for loop we did not. Why? The function of curly brackets is to group statements together into a single compound statement. After a for, an if or any of the other instructions of this type, there can come either a single statement or a compound statement. So, if there is more than one statement to iterate or to execute conditionally, these must be enclosed in curly brackets. A single statement may also be enclosed in brackets. What happens when writing the following by mistake?
for (i = 0; i < SIZE; i++)
    printf("Give the value of a[%2d] >>> ", i);
    scanf("%d", &a[i]);
The compiler will not find anything to complain about. For a human reader the grouping is clear, but the compiler assumes that only the print statement has to be repeated: the layout of the program is of no importance to the compiler! After the loop, a single value is read, and it is assigned to a[SIZE], because at the end of the loop i = SIZE. At this point there is a real error, because a[SIZE] lies outside the array. There are programmers who always write the curly brackets, even if there is only a single statement to execute. Their (quite strong) argument is that it is harder to forget something which you always do than something which you only sometimes do. Furthermore, due to corrections the number of statements may change from one to more than one, and then it is very common to forget that brackets must be added as well. If they are already there, there is no such risk. Where should these brackets be placed? There are at least three conventions. All of them are good as long as they are applied consistently:
for ( ... ) {
    ...
    ... }

for ( ... ) {
    ...
    ...
}

for ( ... )
{
    ...
    ...
}
Arrays can be used to store a fixed number of elements under a common name. A loop based on the for statement is the natural way of processing arrays. Conditional execution is obtained with the if statement.
#include <stdio.h>
#include <stdlib.h>  /* declares malloc and free */
#define true 1
#define false 0
typedef char bool;
int main() {
    int i, x, r, s, n;
    bool ok = false;
    int* a;
    printf("\nGive n >>> ");
    scanf("%d", &n);
    a = (int*) malloc(n * sizeof(int));
    if (a == NULL) {  /* malloc returns NULL if no memory is available */
        printf("Not enough memory\n");
        return 1; }
    for (i = 0; i < n; i++) {
        printf("Give the value of a[%2d] >>> ", i);
        scanf("%d", &a[i]); }
    r = s = 0;
    while (!ok) {
        r++;
        ok = true;
        for (i = 1; i < n; i++)
            if (a[i - 1] > a[i]) {
                ok = false;
                s++;
                x = a[i - 1]; a[i - 1] = a[i]; a[i] = x; } }
    printf("\nSorted after %1d rounds with %1d exchanges:\n", r, s);
    for (i = 0; i < n; i++)
        printf("a[%2d] = %10d\n", i, a[i]);
    printf("\n");
    free(a);
    return 0; }
With the instruction "typedef char bool;" it is specified that the type which is called bool in the program is actually a char. The reason to introduce a type bool like this is that we may not want to know about the internal representation, as this distracts from what is important about booleans, namely that they can have two values: true and false. If the values of true and false have been settled correctly at the beginning, there is no need to remember in the rest of the program whether 0 corresponds to the logical value true or to false. This reduces the risk of making errors, makes the program easier to read and hides irrelevant details.
The body of the program starts with declaring some int variables and then a bool which is initialized upon declaration. The statement "bool ok = false;" is equivalent to the two statements "bool ok; ok = false;". After compilation the generated code will be the same, so this shorter way of writing does not make the program faster, but it may make it easier to read.
More interesting is the declaration "int* a;". What is this "*"? It indicates that a is not a normal int but a pointer to an int. This is a crucial distinction, which is not so easy to grasp. Before we consider what a pointer is, it is important to understand the nature of variables better. Conceptually, for every variable there is an entry in a table of variables, and this entry is a memory address. For an int x this address gives the beginning of a section of sizeof(int) bytes (typically 4) which is reserved for storing the int value of x. When a statement "x = y;" is executed, y is looked up in the table; this gives an address. Then the value found at the memory position corresponding to this address is copied into the processor. Then x is looked up in the table, and finally the value of y is written to the position given by the returned address. Thus, a variable is actually a reference to a box for storing a value of the appropriate type. For a variable of type int*, in the table we find the address of a storage space for the address of an integer. To such a variable we may assign the address of an int rather than the value of an int.
Variables of pointer type serve several purposes. The statement "a = (int*) malloc(n * sizeof(int));" is a first example. The system procedure malloc (declared in stdlib.h) allocates memory space and returns the address of the first byte of this space. Here space for n ints, that is n * sizeof(int) bytes, is allocated. The cast (int*) specifies that the result is to be treated as having type int*. This address is then assigned to a, so a now points to the beginning of a stretch of memory large enough for n ints. This is precisely what we need for an int array of length n. In C we can hereafter work with a as if it were an array, so it is correct to write a[7] (provided n > 7). Internally this is handled by looking up the value of a, the address where the stretch of memory starts. Then 7 * sizeof(int) bytes are added to this address. This gives the address of the position where a[7] is stored. This construction is not very clean, but it works conveniently.
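The following sketch illustrates the malloc-based array and the address arithmetic behind a[k]. The helpers squares_sum and byte_offset are hypothetical illustrations, and squares_sum assumes n > 0.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for n ints with malloc, fills
   a[i] = i * i and returns their sum, or -1 if allocation fails. */
long squares_sum(int n) {
    int i;
    long sum = 0;
    int *a = (int *) malloc(n * sizeof(int));
    if (a == NULL)
        return -1;
    for (i = 0; i < n; i++)
        a[i] = i * i;
    for (i = 0; i < n; i++)
        sum += *(a + i);   /* a[i] is by definition *(a + i) */
    free(a);               /* give the memory back to the system */
    return sum;
}

/* Distance in bytes between the start of a and a[k]: k * sizeof(int). */
long byte_offset(int *a, int k) {
    return (long) ((char *) &a[k] - (char *) a);
}
```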
Then we find "r = s = 0;". This is equivalent to "s = 0; r = s;". Again, this is only a more compact way of writing something that might just as well have been written differently. More generally, the layout of the program is essential for the human reader but of no importance to the compiler. Statements may be packed together on a single line, and blanks may be added or omitted (as long as at least one separating character remains between any two variable names or keywords).
Then we find "while (!ok) { ... }". The semantics is that as long as the condition inside the round brackets holds, the statements inside the curly brackets are executed. In this case the condition is "!ok", so the loop body is executed as long as the situation is not ok, whatever this may mean. The while statement is the natural choice for a loop with an a priori unknown number of repetitions. The while statement together with the iterated statements is called a while loop. In principle there is no need to have both for and while statements: everything can be done with either of the two. However, using them in the suggested way gives the reader of the program some extra support. Of course there are cases in between, such as processing an array from its beginning to its end unless some exception occurs. In that case a for loop with an additional condition is probably clearest. For example, we might write "for (i = 0; i < n && a[i] >= 0; i++) { ... }" if we want the execution to be interrupted as soon as a negative value is encountered in the array.
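As a small illustration of a for loop with an additional condition, the hypothetical function sum_until_negative below adds up the array elements but stops at the first negative one.

```c
/* Hypothetical helper: sums a[0], a[1], ... but stops at the first
   negative value, using a for loop with an additional condition. */
int sum_until_negative(const int *a, int n) {
    int i, s = 0;
    for (i = 0; i < n && a[i] >= 0; i++)
        s += a[i];
    return s;
}
```

For the array {3, 4, -1, 10} the loop stops at index 2, so the result is 7.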
The attentive reader will have noticed that the while loop in the given program is going to be executed at least once: ok is initialized to false, and therefore the first time !ok is certainly true. For loops which are going to be executed at least once, it is more natural to use an alternative construction rewriting the loop as follows (omitting the counters):
do {
ok = true;
for (i = 1; i < n; i++)
if (a[i - 1] > a[i]) {
ok = false;
x = a[i - 1]; a[i - 1] = a[i]; a[i] = x; } }
while (!ok);
So, we are using a do-while loop instead of a while loop.
Computationally, the advantages are that there is no need to initialize
ok and that a test is saved. More important is that it more clearly
expresses the structure of the program.
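For experimentation, the do-while loop above can be wrapped in a procedure. The name sort_rounds and the round counter r are illustrative additions, not part of the original program. The function returns how many passes over the array were needed.

```c
/* Sketch of the sorting loop as a do-while, wrapped in a hypothetical
   procedure that also counts the rounds (passes over the array). */
int sort_rounds(int *a, int n) {
    int i, x, ok, r = 0;
    do {
        r++;
        ok = 1;                       /* assume sorted until a swap occurs */
        for (i = 1; i < n; i++)
            if (a[i - 1] > a[i]) {
                ok = 0;
                x = a[i - 1]; a[i - 1] = a[i]; a[i] = x;   /* swap */
            }
    } while (!ok);
    return r;
}
```

An already sorted array needs exactly one round, during which no swap happens.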
Between the round brackets of a while statement (and analogously for if and for statements) there must stand a boolean expression, for example "i < n && a[i] >= 0". By substituting the values of the occurring variables i, n and a[i], this expression can be evaluated to a result true or false. This is no different from evaluating an arithmetic expression such as "n - i * a[i]", except that the result is a truth value rather than a number. The simplest expressions of a given type are constants and variables of that type. Many beginning programmers do not realize this and write "while (ok != true)" instead of the simpler "while (!ok)". An optimizing compiler will typically remove the unnecessary comparison, but it nevertheless looks quite unprofessional. In C, Java and many other languages the symbol "==" is used for testing the left and right side for equality. The result is a boolean. It is a very common error to write "while (a = b) { ... }" where "while (a == b) { ... }" is intended. In Java this leads to an error message from the compiler (for int variables a and b), because the type of the assignment is not boolean. In C, however, 0 is false and all other values are true. As said before, C has no genuine type boolean (C99 added the type _Bool, but it is an integer type as well), and at the place of the while condition anything which can be interpreted as a number will do. It often takes quite some time to find this kind of error. This C feature can even be exploited. The following is a correct but ugly way of shifting all values of an array one position up, towards higher indices (a[0] keeps its value): "for (i = n - 1; i; i--) a[i] = a[i - 1];". Much clearer is "for (i = n - 1; i > 0; i--) a[i] = a[i - 1];". Note that the direction of the loop matters: running downwards with "a[i - 1] = a[i];" would overwrite every position with the value of a[n - 1].
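The following sketch shows why the direction of the loop matters when shifting. The procedure names shift_up and shift_down are illustrative. Each loop runs in the direction that reads a value before it is overwritten.

```c
/* Shifts all values one position up (towards higher indices); a[0] keeps
   its value. The loop runs downwards, so every a[i - 1] is read before
   it is overwritten. */
void shift_up(int *a, int n) {
    int i;
    for (i = n - 1; i > 0; i--)
        a[i] = a[i - 1];
}

/* Shifts all values one position down; a[n - 1] keeps its value.
   Here the loop must run upwards. */
void shift_down(int *a, int n) {
    int i;
    for (i = 1; i < n; i++)
        a[i - 1] = a[i];
}
```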
Inside the while loop we find the sequence of statements "x = a[i - 1]; a[i - 1] = a[i]; a[i] = x;". This is used to exchange the values of a[i - 1] and a[i], an operation called a swap. Swapping two values is not a basic operation. It requires an auxiliary variable which is used to temporarily store the value of one of the two variables.
Close to the end of the program we find "free(a)". Free is the counterpart of malloc: it tells the system that the memory space allocated to a is no longer needed and can be reused. At the end of the program this would be done anyway, but in other programs new memory is allocated frequently. Never freeing anything would soon lead to a situation in which all available memory has been consumed. This will be worked out in more detail further down.
It was claimed that the given algorithm sorts the numbers of the input in increasing order. Trying several inputs appears to support the claim. This is a good start, but it does not prove anything. Another question is how long the algorithm may take. As long as we are typing in the numbers by hand this will not be an issue, but it is still interesting to know whether we can sort 100 or 1,000,000 numbers in a second. Even more important is to make sure that the algorithm terminates at all. For the previous two programs this was not an issue. The first program contains no loops, so the execution simply runs from the first to the last line. Such programs terminate in a time which is proportional to the number of statements. The second program contains a for loop. The instructions inside the loop are performed at most SIZE times. Again, it is easy to put an upper bound on the number of executed statements, and the running time is proportional to this. For the third program the situation is much harder: the number of iterations of the while loop is not known beforehand, and it is a priori not even clear that it is finite. Inside the while loop there is a for loop which executes a constant number of statements at most n - 1 times. So, the total time consumption is proportional to r * n, where r is the number of rounds, that is, the number of iterations of the while loop. The value of n is specified by the user, but r depends on the input.
If a[i - 1] <= a[i] for all i, 1 <= i < n, then the statements conditioned by the if are never executed. In particular, the value of ok is not changed in the for loop, so at the end of the loop body it still has the value true assigned at its beginning. In that case the while loop is left. On the other hand, if a[i - 1] > a[i] for some i, 1 <= i < n, then the statements conditioned by the if are executed. In particular, the value of ok is set to false. Because inside the for loop it cannot be set to true again, it will be false at the end. Therefore, the while loop will not be left. We conclude that the while loop is left if and only if the numbers in the array stand in sorted order. In general termination does not imply correctness, but for this particular problem we now know that if the computation terminates, it is correct as well.
We claim that after r rounds, 0 <= r <= n, the r largest numbers have reached their final positions. For r = 0 the claim is void, so it clearly holds. If it holds for r = n, then it states that all n numbers have reached their final positions, which means that the array has been sorted. Now consider any r with 1 <= r <= n, and assume the claim holds for r - 1. That is, the largest r - 1 numbers stand in the r - 1 positions with the highest indices. Consider the number x whose final position is a[n - r]. At the start of round r this number must stand in one of the positions a[j] with 0 <= j <= n - r, because the other positions are occupied by the largest numbers. x is the largest of the remaining numbers (if several numbers have the value x, then j should be the highest index containing this value). If j < n - r, then in the for loop, when i = j + 1, x is swapped to position i. Thereafter, x bubbles further until it stands in a[n - r]. This shows that the correctness of the claim for r can be deduced from its correctness for r - 1. Together with the correctness for r = 0, this implies that it holds for r = 0, 1, ..., n. Notice the similarity of this argument with the correctness proof of the algorithm computing the maximum.
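The claim can also be checked experimentally. In the following sketch, the hypothetical function invariant_holds runs the rounds from the text. After each round r, it compares the last r positions with a reference copy sorted by the standard library function qsort. The helper cmp_int is an assumption of this sketch. A test like this does not replace the proof, but it can catch a wrong invariant quickly.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: ascending order of ints. */
static int cmp_int(const void *p, const void *q) {
    int x = *(const int *) p, y = *(const int *) q;
    return (x > y) - (x < y);
}

/* Hypothetical checker: sorts a[0..n-1] with the rounds from the text and,
   after each round r, verifies that positions n - r .. n - 1 already hold
   their final values. Returns 1 if the claim held every time, else 0
   (also 0 if the reference copy cannot be allocated). */
int invariant_holds(int *a, int n) {
    int *sorted;
    int i, r, x, ok = 0, holds = 1;
    if (n <= 0)
        return 1;                              /* nothing to sort */
    sorted = malloc(n * sizeof(int));
    if (sorted == NULL)
        return 0;
    memcpy(sorted, a, n * sizeof(int));
    qsort(sorted, n, sizeof(int), cmp_int);    /* reference result */
    for (r = 1; !ok; r++) {
        ok = 1;
        for (i = 1; i < n; i++)
            if (a[i - 1] > a[i]) {
                ok = 0;
                x = a[i - 1]; a[i - 1] = a[i]; a[i] = x;
            }
        /* the r largest values must be in place (at most n of them) */
        for (i = (r < n) ? n - r : 0; i < n; i++)
            if (a[i] != sorted[i])
                holds = 0;
    }
    free(sorted);
    return holds;
}
```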
Arrays are closely related to pointers. The while loop is the natural way of expressing a conditional iteration.
Arrays are the most important structured data type. Arrays can be defined over any previously defined type. The general format is "type_name array_name[array_size]". For convenience, C even offers a construct for higher-dimensional arrays. A two-dimensional array can be used as a matrix, and higher-dimensional ones as tensors. A two-dimensional array is declared like "type_name array_name[array_size_1][array_size_2]". This may be imagined as a block of array_size_1 x array_size_2 elements, the first number giving the number of rows, the second the number of columns. In memory the elements are arranged row-wise. So, for a two-dimensional array declared by "int a[100][80];", a[17][20] stands just before a[17][21], but a[18][20] stands 80 positions further, because each row contains 80 elements.
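The row-wise arrangement can be made visible by computing element positions. The global array grid and the function position_of are illustrative. The distance is computed via char pointers into the same array object and then divided by sizeof(int).

```c
#include <stddef.h>

#define ROWS 100
#define COLS 80

int grid[ROWS][COLS];   /* illustrative array declared as in the text */

/* Position of grid[r][c], counted in ints from the start of grid.
   Both pointers point into the same array object grid. */
ptrdiff_t position_of(int r, int c) {
    return ((char *) &grid[r][c] - (char *) &grid[0][0])
           / (ptrdiff_t) sizeof(int);
}
```

Since each row holds COLS = 80 ints, grid[r][c] lies at position r * 80 + c.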
One might think that, given two int arrays a[] and b[] of length n, the values of b[] can be copied to a[] by writing "a = b;". This is not the case: for variables declared as arrays, such an assignment is rejected by the compiler. Above we pointed out that arrays are very similar to pointers, and such an assignment (if it were allowed, as it is when a and b are declared as int*) would let a point to the same address as b without copying any values. All manipulations on an array are done by manipulating the individual positions. So, copying b[] to a[] is achieved with a for loop:
int i;
for (i = 0; i < n; i++)
a[i] = b[i];
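The loop above can be packaged as a procedure. The sketch below also shows the standard library function memcpy from string.h, which achieves the same effect in one call. Both function names are illustrative, and n is assumed to be non-negative.

```c
#include <string.h>

/* Copies the n values of b[] into a[] position by position, as in the text. */
void copy_array(int *a, const int *b, int n) {
    int i;
    for (i = 0; i < n; i++)
        a[i] = b[i];
}

/* Same effect with memcpy; the byte count is n * sizeof(int), and the
   two arrays must not overlap. */
void copy_array_memcpy(int *a, const int *b, int n) {
    memcpy(a, b, n * sizeof(int));
}
```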
Strings are text sequences. String constants look like "hello what is your name?". In C there is not really a string type. Instead we can use either arrays of char or a char*. The first works fine when the strings are assigned upon declaration. After the declaration, however, a string constant cannot simply be assigned to an array of char, for type reasons. It is more convenient to define a string variable as a char*. Then the following works correctly, as long as the characters of the string constant are not modified through name:
char* name;
name = "Michiel de Ruyter";
printf("Admiral %s was fighting many sea battles\n", name);
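If a char array must receive a new text after its declaration, the characters have to be copied, for example with strcpy from string.h. In this sketch, the variables name_array and name_pointer and the procedure rename_admiral are hypothetical. The code assumes the new name fits into the 32 chars of the array, including the terminating '\0'.

```c
#include <string.h>

char name_array[32] = "Michiel de Ruyter";        /* allowed: initialization */
const char *name_pointer = "Michiel de Ruyter";   /* points to a constant */

/* Hypothetical procedure changing both strings after their declaration;
   assumes new_name fits into name_array. */
void rename_admiral(const char *new_name) {
    /* name_array = new_name;  would not compile: arrays cannot be assigned */
    strcpy(name_array, new_name);   /* copies the characters one by one */
    name_pointer = new_name;        /* only the address is stored */
}
```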
Primitive data types consist of a single element of a given type. Arrays contain a parametrizable number of elements of a given type. There are also compound types, which contain a fixed number of elements of possibly different types. In C such compound types are called structs, in other languages these may also be called records. We do not want to spend much time on structs, because we think that for the typical C application they are of limited interest. There are three reasons to use C:
The classical example of a struct is the personal record of an employee. An employee has a number, a name, a birth date and a salary. Leaving out the birth date for simplicity, this can be represented as follows:
struct {
int number;
char* name;
float salary; } assistant;
This declares a variable assistant with three fields: an int, a char*
and a float. A correct assignment is "assistant.salary = 3212.67",
setting the salary field of assistant to 3212.67. Accessing the fields
of a compound variable is done in most languages with help of the
dot operator ".".
This idea becomes much more useful when we define a new compound type which can be reused for several declarations:
struct staff {
int number;
char* name;
float salary; };
Hereafter, we can write:
struct staff assistant; struct staff secretary;
We can go one step further by defining struct staff as a type. This can be integrated into the struct definition, but there is no need to do so. If we write
typedef struct staff staff_type;
then we can later in the program write
staff_type assistant; staff_type secretary; staff_type workers[10];
The last declaration defines a working crew of size ten. Its fields are accessed by first selecting an element of the array and then a field: "workers[8].name = "Jan Becker";".
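A small sketch using the typedef and the crew array from the text. The function total_salary and the salary values in the usage below are hypothetical. The global array workers is zero-initialized, so unset salaries count as 0.

```c
/* The struct and the type name as in the text. */
struct staff {
    int number;
    char *name;
    float salary; };

typedef struct staff staff_type;

staff_type workers[10];   /* a working crew of size ten (zero-initialized) */

/* Hypothetical helper: total salary of the first n crew members. */
float total_salary(const staff_type *crew, int n) {
    int i;
    float total = 0.0f;
    for (i = 0; i < n; i++)
        total += crew[i].salary;   /* select the element, then the field */
    return total;
}
```

For example, after "workers[8].name = "Jan Becker"; workers[8].salary = 100.5f; workers[1].salary = 200.25f;", the call total_salary(workers, 10) yields 300.75.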
What output will be produced by the following program? It is suggested that you try it before reading on.
#include <stdio.h>
int main() {
typedef struct {
int number;
char* name;
float salary; } staff_type;
staff_type assistent, secretary;
assistent.number = 101;
assistent.name = "Bertina";
assistent.salary = 3212.67;
secretary.number = 107;
secretary.name = "Hannelore";
secretary.salary = 2145.18;
printf("assistent = (%4d, %10s, %8.2f)\n",
assistent.number, assistent.name, assistent.salary);
printf("secretary = (%4d, %10s, %8.2f)\n",
secretary.number, secretary.name, secretary.salary);
secretary = assistent;
printf("assistent = (%4d, %10s, %8.2f)\n",
assistent.number, assistent.name, assistent.salary);
printf("secretary = (%4d, %10s, %8.2f)\n",
secretary.number, secretary.name, secretary.salary);
secretary.number = 134;
secretary.name = "Birgit";
secretary.salary = 2456.56;
printf("assistent = (%4d, %10s, %8.2f)\n",
assistent.number, assistent.name, assistent.salary);
printf("secretary = (%4d, %10s, %8.2f)\n",
secretary.number, secretary.name, secretary.salary);
return 1; }
One might expect that, in analogy with arrays, struct variables behave
like pointers. In that case, after writing "secretary = assistent;" both
would refer to the same memory address, and changing any field of either
of the two would also change that field of the other. This is what
happens in an analogous situation in Java. C is different. As the output
shows, when executing "secretary = assistent;" the computer copies all
fields of the struct. The later changes to the values of secretary have
no impact on the values of assistent. Notice, however, that for the
field name only the pointer is copied, not the characters it points to.
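The sketch below illustrates both effects of struct assignment. The record type record_type, with its pointer field name and its array field code, and the function struct_copy_demo are hypothetical. An array inside a struct is copied along with the struct. A pointer field, however, still refers to the same characters after the copy.

```c
#include <string.h>

/* Hypothetical record type: name is a pointer, code is an array. */
typedef struct {
    int number;
    char *name;
    char code[8];
} record_type;

/* Shows what "copy = original;" copies. Returns 1 if number and code[]
   were copied (independent of the original), while both name fields hold
   the same address, so a change via copy.name shows up in original.name. */
int struct_copy_demo(void) {
    char buffer[] = "Bertina";     /* modifiable characters */
    record_type original, copy;
    original.number = 101;
    original.name = buffer;
    strcpy(original.code, "A1");
    copy = original;               /* copies every field */
    copy.number = 134;             /* original.number stays 101 */
    strcpy(copy.code, "B2");       /* original.code stays "A1" */
    copy.name[0] = 'b';            /* changes buffer, shared by both */
    return original.number == 101
        && strcmp(original.code, "A1") == 0
        && original.name == copy.name
        && original.name[0] == 'b';
}
```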
Explicitly defining a new type not only saves typing but is also an essential structuring step. It makes explicit that we are dealing with staff members, not just with a bunch of unrelated variables. Going beyond this, packing even the functionality of the type into the definition is the principal idea of object-oriented programming.
Arrays and structs are the most important structured data types. One should be very careful with assignments of structured types. Defining own types is a major structuring step.
#include <stdio.h>
#include <stdlib.h>
#define true 1
#define false 0
typedef char bool;
void swap(int* x, int* y) {
int z = *x; *x = *y; *y = z; }
void read_array(int* a, int n) {
int i;
for (i = 0; i < n; i++) {
printf("Give the value of a[%2d] >>> ", i);
scanf("%d", &a[i]); } }
int sort_array(int* a, int n) {
int i, r = 0;
bool ok = false;
while (!ok) {
r++;
ok = true;
for (i = 1; i < n; i++)
if (a[i - 1] > a[i]) {
ok = false;
swap(&a[i - 1], &a[i]); } }
return r; }
void print_array(int* a, int n) {
int i;
for (i = 0; i < n; i++)
printf("a[%2d] = %10d\n", i, a[i]); }
int main() {
int n, r;
int* a;
printf("\nGive n >>> ");
scanf("%d", &n);
a = (int*) malloc(n * sizeof(int));
read_array(a, n);
r = sort_array(a, n);
printf("\nSorted in %1d rounds\n", r);
print_array(a, n);
printf("\n");
free(a);
return 1; }
The program is structured quite differently from before. As always the
execution starts in main. n is read as before, then we find
"read_array(a, n);", a procedure call. When encountering such a
procedure call, the execution continues at the beginning of the
procedure with the corresponding name. A procedure, also called
subroutine, is a subsection of the program with its own header and
body just like main. Main itself is also a procedure, the procedure
where by default the execution of the program starts. Other procedures
which we encountered before are the system-provided IO routines
printf and scanf.
The procedure call consists of the name of the procedure and a list of arguments. The values of these arguments are copied into the corresponding parameters in the header of the procedure. In this case the names inside the procedure are the same as in main, but there is no need for this. The procedure read_array is of type void, which means that it does not return any value. The procedure which is called next, sort_array, is different in this respect. It is of type int, just like main itself, which means that it must end by returning some int value. A procedure returning a value might also be called a function. The procedure print_array is again of type void. Inside sort_array we find a call to the procedure swap, which swaps the values of the two variables whose addresses are passed as arguments. We see that a procedure may contain calls to procedures which in turn contain calls to procedures, and so on. The calling depth can in principle be arbitrary, but in practice it is limited by the space reserved for storing the variables, parameters and similar data of all procedure calls that are active at the same time.
It is important to be aware of the visibility of variables. A variable is visible, that is, it can be used, only within a certain scope. A variable which is declared within a procedure can be used anywhere within this procedure, but not in another procedure: it is a local variable. This also means that there is no problem in using the same variable names in several procedures. This is essential! Otherwise, it would be very hard to add a new procedure to an existing program at a later time, because one would need a complete overview of all names used anywhere in the program. Variables can be even more local than the procedure. In C and in many other languages, new variables can be defined inside an if, for, while or compound statement, and these are invisible outside it. In C such declarations stand at the beginning of a block, and since C99 also in the first part of a for statement. These variables may even have the same names as other variables within the same procedure. In that case we say that these other variables are shielded (also called shadowed). In general shielding a variable is not a good idea, because it is confusing. The other extreme are variables which are not local to any procedure: global variables. In C these are declared outside all procedures, typically at the beginning of the program, like the array a[] above. These are visible everywhere. Because the compiler knows right from the start that these variables will be there, they are allocated in a different way: statically, rather than at the moment a procedure is called. Except for very large arrays, this goes unnoticed, though. It is a good idea to declare variables as locally as possible.
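A small sketch of the scope rules just described. The global variable counter and the procedure scope_demo are illustrative only, and the shielding of x is shown only to demonstrate the rule, not as a recommended style.

```c
int counter = 0;   /* global variable: visible in all procedures below */

/* Hypothetical illustration of scopes; returns 100 * outer x + inner x. */
int scope_demo(void) {
    int x = 1;          /* local to scope_demo */
    int inner_value;
    counter++;          /* the global variable is visible here */
    {
        int x = 2;      /* declared at the start of a block: shields the outer x */
        x += 10;        /* changes only the inner x */
        inner_value = x;
    }
    return 100 * x + inner_value;   /* outer x is still 1: 100 + 12 */
}
```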
There is an essential difference between the way the parameter n is passed to sort_array and the way sort_array passes a[i - 1] and a[i] to swap. In the first case, the value of n is copied into the variable n, which lives only locally inside sort_array. Changing the value of n in sort_array has no impact on the value of n in main. This way of calling is known as call by value. Of course this is not what we want to happen in swap. If we copied the values of a[i - 1] and a[i] into local variables x and y, then swapping the values of x and y would not change the values of a[i - 1] and a[i]. So, in this case x and y should not be local values; they should just give access to the same objects. That is, they should be pointers, pointing to the same memory space as a[i - 1] and a[i]. That is why x and y are of type int*. This is also why we are not passing a[i - 1] and a[i], but rather their addresses. The address of a variable can be obtained with the address operator "&". This second way of calling is known as call by reference, because the address or reference is passed. Strictly speaking, C only knows call by value: the values passed here are addresses, and passing them simulates call by reference.
Because it is so important, we consider two variants of a swap procedure in more detail:
#include <stdio.h>
void local_swap(int x, int y) {
int z;
z = x;
x = y;
y = z;
printf("Values in local_swap:\n");
printf(" x = %10d, y = %10d\n", x, y);
printf(" ax = %10p, ay = %10p\n", (void*) &x, (void*) &y); }
void global_swap(int* x, int* y) {
int z;
z = *x;
*x = *y;
*y = z;
printf("Values in global_swap:\n");
printf(" x = %10d, y = %10d\n", *x, *y);
printf(" ax = %10p, ay = %10p\n", (void*) x, (void*) y); }
int main() {
int x, y;
x = 17;
y = 23;
printf("Values in main at beginning:\n");
printf(" x = %10d, y = %10d\n", x, y);
printf(" ax = %10p, ay = %10p\n", (void*) &x, (void*) &y);
local_swap(x, y);
printf("Values in main after local_swap:\n");
printf(" x = %10d, y = %10d\n", x, y);
printf(" ax = %10p, ay = %10p\n", (void*) &x, (void*) &y);
global_swap(&x, &y);
printf("Values in main after global_swap:\n");
printf(" x = %10d, y = %10d\n", x, y);
printf(" ax = %10p, ay = %10p\n", (void*) &x, (void*) &y);
return 1; }
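Running the program gives output like the following (integer values are
given in decimal notation, addresses hexadecimally; the actual addresses,
and the exact format in which they are printed, differ from system to
system). Notice that x and y in local_swap have other addresses than in
main, because they are copies, whereas global_swap works on the very
same memory positions:
Values in main at beginning:
 x = 17, y = 23
 ax = FFBEED2C, ay = FFBEED28
Values in local_swap:
 x = 23, y = 17
 ax = FFBEED0C, ay = FFBEED10
Values in main after local_swap:
 x = 17, y = 23
 ax = FFBEED2C, ay = FFBEED28
Values in global_swap:
 x = 23, y = 17
 ax = FFBEED2C, ay = FFBEED28
Values in main after global_swap:
 x = 23, y = 17
 ax = FFBEED2C, ay = FFBEED28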
What are procedures good for? In the current example, you may think it only complicates something simple. For very short programs this is true, but there are very good reasons to use procedures:
int fac(int n) {
if (n == 0)
return 1;
else
return n * fac(n - 1); }
Here we see something new: the procedure calls itself. This is called
recursion, and the procedure is said to be recursive.
Recursion poses no special problem: to the compiler it is not really
different from any other procedure call. Recursion is allowed in almost
all modern programming languages. It may also happen that several
procedures mutually call each other. In that case, they are said to be
mutually recursive.
What happens exactly if we call fac(5)? At the highest level, n = 5 != 0, so the second alternative applies and the returned value is 5 * fac(4). At the second level, n = 4 != 0, so 4 * fac(3) is returned. This goes on until fac(0) is called, which returns 1 without further recursion. Then all pending calls to fac return one by one: fac(1) returns 1, fac(2) returns 2, fac(3) returns 6, fac(4) comes back with value 24, and finally fac(5) returns 5 * 24 = 120.
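To watch the calls unfold and return, one can add a hypothetical depth parameter that is used only for indenting trace output. The name fac_traced is an illustrative assumption. The "%*s" conversion prints the empty string padded to 2 * depth spaces.

```c
#include <stdio.h>

/* Variant of fac with a hypothetical extra parameter depth, used only to
   print the recursion as it unfolds and unwinds. */
int fac_traced(int n, int depth) {
    int result;
    printf("%*sfac(%d) called\n", 2 * depth, "", n);
    if (n == 0)
        result = 1;
    else
        result = n * fac_traced(n - 1, depth + 1);
    printf("%*sfac(%d) returns %d\n", 2 * depth, "", n, result);
    return result;
}
```

Calling fac_traced(5, 0) prints the calls fac(5) down to fac(0), each indented one level further, followed by the returns in reverse order.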
Any recursive algorithm must have at least one non-recursive alternative, otherwise it will not terminate. Furthermore, the programmer must somehow ensure that such a non-recursive alternative is eventually reached. In the above case (assuming n >= 0) this is obvious. At recursion depth d the value n_d of n equals n_0 - d, where n_0 is the original value of n, so after exactly n_0 recursive calls we will have n_d = 0. This can be proven formally: we claim that d + n_d = n_0 for all 0 <= d <= n_0. For d = 0 the claim obviously holds. For all other d we use that when going one level deeper, d increases by 1 and n decreases by 1. So, the sum d + n_d does not change when d increases: d + n_d is an invariant of the procedure.
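The invariant can be checked at run time with assert from assert.h. The instrumented function fac_checked below is a hypothetical sketch. The extra parameters d (current depth) and n0 (original argument) exist only for the check, and the top-level call is fac_checked(n, 0, n).

```c
#include <assert.h>

/* Hypothetical instrumented fac: d is the recursion depth and n0 the
   original argument; the assert checks the invariant d + n_d == n_0. */
int fac_checked(int n, int d, int n0) {
    assert(d + n == n0);
    if (n == 0)
        return 1;
    return n * fac_checked(n - 1, d + 1, n0);
}
```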
Recursion does not only lead to very compact programs. It also gives rise to programs whose correctness can often be proven more easily than that of alternative formulations with a loop. In the above, if we assume that fac(n - 1) = (n - 1)!, it follows immediately that fac(n) = n * fac(n - 1) = n * (n - 1)! = n!. The following non-recursive version needs the additional variables i and f, and it contains other instructions than just assignments and computations:
int fac(int n) {
int i, f = 1;
for (i = 2; i <= n; i++)
f *= i;
return f; }
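A quick consistency check of the two versions might look as follows. The names fac_rec, fac_iter and fac_versions_agree are hypothetical. The check stops at 12 because, assuming a 32-bit int, 13! = 6227020800 no longer fits. For the loop, an invariant also holds: before each test of "i <= n", f equals (i - 1)!.

```c
/* Recursive version from the text. */
int fac_rec(int n) {
    if (n == 0)
        return 1;
    else
        return n * fac_rec(n - 1);
}

/* Non-recursive version from the text. Before each test of the loop
   condition, f equals (i - 1)!. */
int fac_iter(int n) {
    int i, f = 1;
    for (i = 2; i <= n; i++)
        f *= i;
    return f;
}

/* Hypothetical check: 1 if both versions agree for 0 <= n <= 12
   (with a 32-bit int, larger arguments overflow). */
int fac_versions_agree(void) {
    int n;
    for (n = 0; n <= 12; n++)
        if (fac_rec(n) != fac_iter(n))
            return 0;
    return 1;
}
```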
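maxsubsum(0, n) = max{subsum(l, h) | 0 <= l < h <= n}, where subsum(l, h) = sum_{l <= i < h} a[i]. This value is called the maximum subsequence sum. The value itself may not be that important, but variants of the problem, which can be tackled in a similar way, have important applications in bio-informatics. The procedures below start from m = 0, so they effectively also admit the empty subsequence, with sum 0. As soon as the input contains a positive number, this makes no difference.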
The definition immediately suggests the following procedure:
int maxsubsum(int* a, int n) {
int i, l, h, s, m;
for (m = l = 0; l < n; l++)
for (h = l + 1; h <= n; h++) {
for (s = 0, i = l; i < h; i++)
s += a[i];
if (s > m)
m = s; }
return m; }
Here we find three nested loops. This might be time consuming! Let
us try to estimate how many times the statement "s += a[i];" is
executed. This is the statement at the deepest level and is therefore
executed most often. For larger values of n this implies that the
time for executing this statement is a constant fraction of the
total running time. The number is sum_{0 <= l < n} sum_{l + 1 <= h <= n}
(h - l). It is not very hard to see, for example by using a geometric
argument, that this sum is proportional to n^3. In fact it equals
exactly n * (n + 1) * (n + 2) / 6, as inserting a counter in the
program will confirm. So, the time consumption is proportional to n^3.
This means that, for sufficiently large n, doubling the problem size
multiplies the time consumption by a factor of 8. Even though computers
are very fast, this means that the problem cannot be solved for very
large n. On a fast computer n = 10,000 is about the limit.
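A counter can confirm the count of n * (n + 1) * (n + 2) / 6 executions. In this sketch, maxsubsum_counted is a hypothetical instrumented copy of the cubic procedure that reports through count how often "s += a[i];" was executed.

```c
/* Hypothetical instrumented copy of the cubic algorithm: besides the
   maximum subsequence sum it counts executions of "s += a[i];". */
int maxsubsum_counted(const int *a, int n, long *count) {
    int i, l, h, s, m;
    *count = 0;
    for (m = l = 0; l < n; l++)
        for (h = l + 1; h <= n; h++) {
            for (s = 0, i = l; i < h; i++) {
                s += a[i];
                (*count)++;
            }
            if (s > m)
                m = s;
        }
    return m;
}
```

For the 7 values {2, -3, 4, -1, 2, -5, 3} the result is 5 (from 4 - 1 + 2), and the counter ends at 7 * 8 * 9 / 6 = 84.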
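There is an alternative recursive formulation which is quite simple and much more efficient. The underlying idea is general and very important: try to apply a divide-and-conquer approach. Splitting the array into a left and a right half, we can distinguish three mutually exclusive cases. The maximum subsequence lies entirely in the left half, or it lies entirely in the right half, or it contains elements of both halves. In the last case it consists of a maximal suffix of the left half followed by a maximal prefix of the right half, which are computed by maxleftsum and maxrightsum: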
int maxleftsum(int* a, int l, int h) {
int i, s, m;
for (i = h - 1, s = m = 0; i >= l; i--) {
s += a[i];
if (s > m)
m = s; }
return m; }
int maxrightsum(int* a, int l, int h) {
int i, s, m;
for (i = l, s = m = 0; i < h; i++) {
s += a[i];
if (s > m)
m = s; }
return m; }
int max3(int x, int y, int z) {
int m = x;
if (y > m)
m = y;
if (z > m)
m = z;
return m; }
int recmaxsubsum(int* a, int l, int h) {
if (h - l == 1)
return a[l];
else
return max3(
recmaxsubsum(a, l, (l + h) / 2),
recmaxsubsum(a, (l + h) / 2, h),
maxleftsum(a, l, (l + h) / 2) + maxrightsum(a, (l + h) / 2, h)); }
It can be shown that the number of performed operations is now
proportional to n * log n. Even for n = 10^6, this is not a very large
number and indeed, the program solves a problem of this size in less
than a second.
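To gain confidence that both versions compute the same value, one can compare them on random inputs. The driver versions_agree below is a hypothetical sketch, and its array lengths, value range and seed are arbitrary choices. It only tests lengths of at least 2, because for a single negative element the two conventions differ: the cubic version returns 0, the recursive one returns that element.

```c
#include <stdlib.h>

/* Cubic algorithm from the text, with n passed as a parameter. */
int maxsubsum(const int *a, int n) {
    int i, l, h, s, m;
    for (m = l = 0; l < n; l++)
        for (h = l + 1; h <= n; h++) {
            for (s = 0, i = l; i < h; i++)
                s += a[i];
            if (s > m)
                m = s;
        }
    return m;
}

/* Largest sum of a suffix of a[l..h-1] (0 for the empty suffix). */
int maxleftsum(const int *a, int l, int h) {
    int i, s, m;
    for (i = h - 1, s = m = 0; i >= l; i--) {
        s += a[i];
        if (s > m)
            m = s;
    }
    return m;
}

/* Largest sum of a prefix of a[l..h-1] (0 for the empty prefix). */
int maxrightsum(const int *a, int l, int h) {
    int i, s, m;
    for (i = l, s = m = 0; i < h; i++) {
        s += a[i];
        if (s > m)
            m = s;
    }
    return m;
}

int max3(int x, int y, int z) {
    int m = x;
    if (y > m) m = y;
    if (z > m) m = z;
    return m;
}

/* Divide-and-conquer version from the text; requires h - l >= 1. */
int recmaxsubsum(const int *a, int l, int h) {
    int mid = (l + h) / 2;
    if (h - l == 1)
        return a[l];
    return max3(recmaxsubsum(a, l, mid),
                recmaxsubsum(a, mid, h),
                maxleftsum(a, l, mid) + maxrightsum(a, mid, h));
}

/* Hypothetical test driver: compares both versions on random arrays of
   length 2..maxlen (maxlen >= 2) with values in -50..50; 1 if all agree. */
int versions_agree(int trials, int maxlen, unsigned seed) {
    int *a = malloc(maxlen * sizeof(int));
    int t, i, n, agree = 1;
    if (a == NULL)
        return 0;
    srand(seed);
    for (t = 0; t < trials; t++) {
        n = 2 + rand() % (maxlen - 1);
        for (i = 0; i < n; i++)
            a[i] = rand() % 101 - 50;
        if (maxsubsum(a, n) != recmaxsubsum(a, 0, n))
            agree = 0;
    }
    free(a);
    return agree;
}
```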
A complete program containing both variants together with routines for generating an input and measuring the time consumption can be downloaded.
The goto, which jumps to a position indicated by some label, has been banned from good programming practice for good reasons: if employed carelessly, gotos lead to spaghetti code which may be close to impossible to understand and debug. Instead, certain kinds of jumps can be made with the if, the for and the while statements. In an if statement, if the value of the expression is false, the execution of the following statements is skipped. In a loop, the point of execution jumps back from the end of the iterated statements to the test.
Why are some kinds of jumps ok, while others are considered to be a sign of bad programming style? For this we should consider the structure of a program. Any statement is enclosed by a certain number of curly brackets. This number is unique if the statements after every if, for and while are enclosed in brackets and no superfluous brackets are written. This number is called the bracketing depth of the statement. Using #d to indicate the depth of statements appearing at a certain position of the program, the structure of the presented sorting program is as follows:
int main() {
  #1
  for () {
    #2 }
  #1
  while () {
    #2
    for () {
      #3
      if () {
        #4 }
      #3 }
    #2 }
  #1
  for () {
    #2 }
  #1 }
The essential feature of the structure of this program, and of
all programs built with these instructions, is that the depths of
consecutive statements differ by at most one. There are no
radical jumps into or out of a substructure. In other words, there
is a clean nested structure, a hierarchy of levels. Even calls
to procedures respect the nested structure: after the procedure has
finished, the execution continues right after the call. So, the
procedure-calling mechanism just adds another way of going one level
deeper. It is generally accepted that this kind of structuring leads
to programs which are easier to understand. Gotos are banned because
they can be used in a way that upsets the nested structure.
if (a == 1000)
a = 0;
The keyword "then" is not written (C likes to save writing, even when
this goes at the expense of clarity). The boolean condition is
written between round brackets. The operational semantics of the if
statement is as follows: if evaluating the boolean condition results in
true (in the example this happens when the variable a equals 1000), then
the following statement is executed; otherwise this statement is
skipped. The if statement only applies to the statement immediately
following it. If one wants to apply it to several statements, these
should be turned into a compound statement by writing them between curly
brackets:
if (a == 1000) {
a = 0;
b++; }
The second form of the if statement also has an else-part.
if (a == 1000)
a = 0;
else {
a += 4;
s += a; }
The statements following the else are executed if the boolean
expression evaluates to false.
Consider the following code fragment:
if (i >= 0)
if (i > 0)
i--;
else
i++;
Starting it for i = 3, the instruction i-- is executed and afterwards
i = 2. What happens when i = 0 or i < 0? The layout suggests that
nothing should be done if i = 0, and that i should be increased by 1 if
it is negative. But this is not what happens. The compiler ignores the
layout and attaches the else to the closest preceding if that has no else yet.
Thus, if i = 0, it will be 1 afterwards. If i < 0, it will remain
unchanged. There are two ways to resolve the problem. The first is by
adding an extra else:
if (i >= 0)
if (i > 0)
i--;
else
;
else
i++;
This is ugly, but nicely shows that even an empty statement is a
statement. The normal way of solving the problem is by adding
protective brackets:
if (i >= 0) {
if (i > 0)
i--; }
else
i++;
The brackets here enclose the second if, so that the else clearly is at
a different bracketing depth. Actually, in this case the intended
functionality can be expressed more clearly by rewriting the code
fragment, eliminating the problem at the same time:
if (i > 0)
i--;
else if (i < 0)
i++;
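The difference can be checked by wrapping both fragments in functions. The names dangling and intended are illustrative. Many compilers warn about the first one, and that warning is exactly the point here.

```c
/* The fragment from the text, wrapped in a function: despite the layout,
   the else belongs to the inner if (i > 0). */
int dangling(int i) {
    if (i >= 0)
        if (i > 0)
            i--;
    else
        i++;
    return i;
}

/* The rewritten version expressing the intended behaviour. */
int intended(int i) {
    if (i > 0)
        i--;
    else if (i < 0)
        i++;
    return i;
}
```

For i = 0, dangling returns 1 whereas intended returns 0. For i = -3, dangling returns -3 unchanged whereas intended returns -2.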
if (d == 1)
printf("Sunday");
else if (d == 2)
printf("Monday");
else if (d == 3)
printf("Tuesday");
else if (d == 4)
printf("Wednesday");
else if (d == 5)
printf("Thursday");
else if (d == 6)
printf("Friday");
else if (d == 7)
printf("Saturday");
else
printf("wrong day number!");
Here we have chosen an alternative layout to underline that there are
several equivalent alternatives rather than a nesting of ifs. This
way of writing is ok, but C also offers a special statement for such
multiple choices.
The fact that there are equivalent alternatives can be underlined even more using the switch statement. Then the above looks like
switch (d) {
case 1: printf("Sunday"); break;
case 2: printf("Monday"); break;
case 3: printf("Tuesday"); break;
case 4: printf("Wednesday"); break;
case 5: printf("Thursday"); break;
case 6: printf("Friday"); break;
case 7: printf("Saturday"); break;
default: printf("wrong day number!"); }
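The default part captures all non-listed values; it is not compulsory.
The break statements are essential: without them, the execution would
fall through and also perform the statements of all following cases.
Instead of a single variable d, we may have any integer expression to
switch on. The switch statement can also be used with characters
(which in C are small integers).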
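The sketch below contrasts a switch with break and one without. The functions day_name and fallthrough_count are hypothetical. The second function counts how many labelled statements are executed when execution falls through from the label matching d.

```c
/* Hypothetical helper: maps a day number to its name. Each case ends
   with break, so exactly one assignment is executed. */
const char *day_name(int d) {
    const char *name;
    switch (d) {
    case 1: name = "Sunday"; break;
    case 2: name = "Monday"; break;
    case 3: name = "Tuesday"; break;
    case 4: name = "Wednesday"; break;
    case 5: name = "Thursday"; break;
    case 6: name = "Friday"; break;
    case 7: name = "Saturday"; break;
    default: name = "wrong day number!"; break;
    }
    return name;
}

/* Without break the execution falls through: counts how many of the
   four labelled statements are executed when entering at the label for d. */
int fallthrough_count(int d) {
    int count = 0;
    switch (d) {
    case 1: count++;
    case 2: count++;
    case 3: count++;
    default: count++;
    }
    return count;
}
```

Entering at case 1 executes all four increments, so fallthrough_count(1) is 4, while a non-listed value such as 9 only reaches the default label.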
The conditional statements are the if, the if-else and the switch statements.
The for statement in general looks like "for (part 1; part 2; part 3)". Parts 1 and 3 may contain zero or more assignments (or other expressions), separated by commas. Part 1 is intended for the initialization of variables relevant to the loop. Part 3 is intended for updating them at the end of each pass. Part 2 is a boolean expression, tested before every pass: the loop is executed as long as it evaluates to true.
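As an example of several assignments in parts 1 and 3, the hypothetical procedure reverse_array below moves two indices towards each other. The comma operator separates the two initializations and the two updates.

```c
/* Reverses a[0..n-1]; part 1 initializes two indices, part 3 updates
   both, each pair separated by the comma operator. */
void reverse_array(int *a, int n) {
    int i, j, x;
    for (i = 0, j = n - 1; i < j; i++, j--) {
        x = a[i]; a[i] = a[j]; a[j] = x;   /* swap a[i] and a[j] */
    }
}
```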
Consider the problem of determining whether a value x occurs among the values stored in an array a[] of length n. This can either be done with a for or with a while loop, but the first appears to be the choice that most clearly exposes the structure of the computation: In a context where a[], n and x are given and i is an int, we can write
for (i = 0; i < n && a[i] != x; i++);
if (i < n)
printf("%1d occurs\n", x);
else
printf("%1d does not occur\n", x);
Here the statement following the for statement is the empty one. This
is marked by the semicolon following the closing bracket. It is a
common error (which is quite hard to find!) to write this semicolon
where it is not intended. The above is correct even if x does not
occur. In that case i runs until it is n. Then the expression
"i < n && a[i] != x" is evaluated. At this point it is essential that
such an expression is evaluated from left to right in a lazy way (also
called short-circuit evaluation). Lazy evaluation means that the
evaluation stops as soon as the result is known. If i = n, then i < n
is false and the value of a[i] != x does not matter anymore. This is
just as well, because for i = n we should not access a[i], as this
might lead to a segmentation fault. A segmentation fault occurs when
trying to access a memory position which lies outside the space
allocated to the program. So, testing with "a[i] != x && i < n" might
have led to an error. Also in the test after the loop we were careful
to avoid accessing a[n]. It would be risky to write "if (a[i] == x) ...".
Actually, with this alternative test something worse than a
segmentation fault might happen: the program might produce a wrong
answer without crashing. C does not check array bounds, so if position
a[n] lies within the space of the program, the test is performed as if
nothing were wrong. If by coincidence position a[n] contains the value
x, then the conclusion will be that x occurs in a[].
The for and while statements offer elegant ways to realize a conditional repetition.
Computing the maximum of two numbers is easy:
int maximum(int i, int j) {
int m;
if (i >= j)
m = i;
else
m = j;
return m; }
Here there is only one return statement at the very end of the
procedure. This clearly exposes the structure of the computation and
it is clear which value is going to be returned. So, if there is
something wrong, then we know that we must figure out how m got its
value.
The maximum can also be computed as follows:
int maximum(int i, int j) {
if (i >= j)
return i;
else
return j; }
This procedure is shorter and possibly faster. Here we use that return
statements may appear anywhere in the program. If we look at the
changes in bracketing depth, then we see that the return statements
skip over a level. So, this way of using return gives a violation of
the pure nested structure. On the other hand, this is not a very
serious violation because it goes upwards, like closing several
brackets in one stroke.
Actually, realizing that the execution is immediately interrupted once a return is encountered, the above can be written even shorter:
int maximum(int i, int j) {
if (i >= j)
return i;
return j; }
This is shorter but probably not faster, because a good compiler will
generate the same code for both alternatives. On the other hand, this
obscures the structure of the procedure: it is no longer clear that
there are two equivalent alternatives. It may appear that the default
return value is j and that sometimes i is returned instead. In this
three-line procedure this is no issue, but in a longer procedure this
kind of "improvement" should be avoided.
/* Written by Jop Sibeyn, 25.10.2002
This program computes the average value of the elements
of an array */
#include <stdio.h>
#define SIZE 100 /* The length of the used array */
int main() {
int i, a[SIZE], sum;
/* Initialization */
for (i = 0; i < SIZE; i++)
a[i] = i;
/* The main part of the program */
for (i = 0, sum = 0; i < SIZE; i++)
sum += a[i];
/* Printing the results */
printf("The average value is %5.2f\n", (float) sum / SIZE);
/* Finishing */
return 0; }
You should not write books, but concise explanatory phrases are essential. Any complete program of more than 50 lines should be commented.
Comment is essential for understanding a program.
int a[100];
This instruction creates space for storing 100 integers: 400 bytes if an integer occupies four bytes, as assumed in the following. This space can be accessed with the help of the name a. Internally a mechanism called address arithmetic is used to access the stored values: if we write "b = a[23]", then the value of a is fetched, which is a memory address, namely the address of a[0]; then 23 * 4 is added to this value, and the integer at this address (the value represented by the 4 bytes starting at the specified address) is returned. Therefore, the above assignment is equivalent to writing "b = *(a + 23)". The most remarkable feature is that in this expression we only need to add 23 and not 4 * 23: because it is known that a is an array of integers, the offset is automatically scaled by the size of an integer.
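To make the address arithmetic concrete, here is a small sketch. The three procedure names via_index, via_pointer and via_bytes are hypothetical; the last one does by hand what the compiler does internally, scaling the index by sizeof(int) rather than by a hard-coded 4.

```c
#include <stddef.h>

/* Ordinary indexing. */
int via_index(const int a[], size_t i) {
    return a[i];
}

/* Pointer notation: by definition a[i] means *(a + i); the compiler
   multiplies i by sizeof(int) behind the scenes. */
int via_pointer(const int a[], size_t i) {
    return *(a + i);
}

/* The same computation done explicitly on byte addresses. */
int via_bytes(const int a[], size_t i) {
    const char* base = (const char*) a;               /* address of a[0] in bytes */
    return *(const int*) (base + i * sizeof(int));   /* explicit scaling */
}
```

With an array whose entries are a[i] = i * i, all three calls with index 23 should yield 529.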
The header of a procedure with an integer-array argument may be written in two ways. At first glance the most correct way is to write
int sum(int a[], int n) {
int i, s;
for (i = s = 0; i < n; i++)
s += a[i];
return s; }
In this way it is explicitly told that the argument must be an integer
array. Alternatively, one may write
int sum(int* a, int n) {
int i, s;
for (i = s = 0; i < n; i++)
s += a[i];
return s; }
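Both procedures are correct C. In fact they are equivalent: in a parameter list, int a[] is adjusted to int* a, so in both cases the procedure receives a pointer to the first element of the array, and the length must be passed separately as n.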
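The equivalence of the two headers can be checked with a sketch like the following. The procedures are renamed to the hypothetical names sum_brackets and sum_pointer so that both fit in one file; the typedef sum_procedure and the procedure choose are likewise only for illustration.

```c
/* Written with array notation in the header. */
int sum_brackets(int a[], int n) {
    int i, s;
    for (i = s = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Written with pointer notation in the header. */
int sum_pointer(int* a, int n) {
    int i, s;
    for (i = s = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Both procedures fit the same function-pointer type without a cast;
   the compiler would complain if their types differed. */
typedef int (*sum_procedure)(int*, int);

sum_procedure choose(int use_brackets) {
    return use_brackets ? sum_brackets : sum_pointer;
}
```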
A possible way of doing this is by simply exchanging all values:
#include <stdio.h>
void initialize(int a[], int b[], int n) {
int i;
for (i = 0; i < n; i++) {
a[i] = i;
b[i] = -i; } }
void simple_exchange(int a[], int b[], int n) {
int i, c;
for (i = 0; i < n; i++) {
c = a[i];
a[i] = b[i];
b[i] = c; } }
void print_arrays(int a[], int b[], int n) {
int i;
printf("\n");
for (i = 0; i < n; i++)
printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }
int main() {
int n = 10, a[n], b[n];
initialize(a, b, n);
print_arrays(a, b, n);
simple_exchange(a, b, n);
print_arrays(a, b, n); }
This procedure takes time proportional to n. One might fear that it has
the same problem as local_swap above, but that is not the case. The
reason is that the array variables a and b are actually pointers and
that these pointers are handed over as parameters. The output of the
program is:

a[ 0] =    0, b[ 0] =    0
a[ 1] =    1, b[ 1] =   -1
a[ 2] =    2, b[ 2] =   -2
a[ 3] =    3, b[ 3] =   -3
a[ 4] =    4, b[ 4] =   -4
a[ 5] =    5, b[ 5] =   -5
a[ 6] =    6, b[ 6] =   -6
a[ 7] =    7, b[ 7] =   -7
a[ 8] =    8, b[ 8] =   -8
a[ 9] =    9, b[ 9] =   -9

a[ 0] =    0, b[ 0] =    0
a[ 1] =   -1, b[ 1] =    1
a[ 2] =   -2, b[ 2] =    2
a[ 3] =   -3, b[ 3] =    3
a[ 4] =   -4, b[ 4] =    4
a[ 5] =   -5, b[ 5] =    5
a[ 6] =   -6, b[ 6] =    6
a[ 7] =   -7, b[ 7] =    7
a[ 8] =   -8, b[ 8] =    8
a[ 9] =   -9, b[ 9] =    9
This is nice, but not very efficient. This operation can also be performed in constant time: we do not have to exchange all elements, it is sufficient to exchange the values of a and b. So, afterwards a will point to the first position of b and vice versa. That is, we want to perform a procedure like global_exchange with parameters of type "array of integer". For more complex operations like this, it becomes much more convenient not to mix the array notation with the pointer notation:
#include <stdio.h>
#include <stdlib.h>
void initialize(int a[], int b[], int n) {
int i;
for (i = 0; i < n; i++) {
a[i] = i;
b[i] = -i; } }
void fast_exchange(int** aa, int** ab) {
int* c ;
c = *aa;
*aa = *ab;
*ab = c; }
void print_arrays(int a[], int b[], int n) {
int i;
printf("\n");
for (i = 0; i < n; i++)
printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }
int main() {
int n = 10;
int* a = (int*) malloc(n * sizeof(int));
int* b = (int*) malloc(n * sizeof(int));
initialize(a, b, n);
print_arrays(a, b, n);
fast_exchange(&a, &b);
print_arrays(a, b, n);
free(b);
free(a); }
This produces the same output as the program above.
Different from an array declaration, declaring a variable of type int* does not immediately allocate a whole lot of memory: a variable of type int* occupies only four (on 64-bit systems typically eight) bytes. The standard procedure malloc is used to allocate memory. The number of bytes is passed as an argument. We might write 4 * n, but then we would explicitly use that integers are four bytes long, and the program would not work correctly on a system where integers have a different size. Writing n * sizeof(int) avoids this assumption. The procedure malloc returns a typeless pointer, void*. In C such a pointer is converted to int* automatically on assignment, but we nevertheless precede the call by "(int*)" to make the intended type explicit (in C++ this cast is obligatory). Said otherwise, the result is cast to int*.
At the end of the program we find the calls to the standard procedure free. This procedure deallocates the memory a pointer is pointing to. Of course, at the end of a program all memory is deallocated, so in this case these statements are superfluous. However, in general it is important to carefully manage the memory making sure that the program does not create garbage: allocated memory which cannot be reached anymore by following any of the pointers. Garbage would be created if at some stage in the program we would write "a = b;" or if we would have a second malloc statement for a. Forgetting to free memory is an important source of problems. Suppose that instead of simple_exchange we were doing the following:
void stupid_exchange(int a[], int b[], int n) {
int i;
int* c = (int*) malloc(n * sizeof(int));
for (i = 0; i < n; i++)
c[i] = a[i];
for (i = 0; i < n; i++)
a[i] = b[i];
for (i = 0; i < n; i++)
b[i] = c[i]; }
Constructions of this kind, reducing the number of different variables
in each loop, might have advantages when the "cache associativity" is
low (stupid_exchange is guaranteed to work fine already for a "two-way
associative cache"). However, this procedure leaves behind n *
sizeof(int) bytes of garbage. If we are calling this procedure many
times, we will run out of memory, even when the program actually needs
only a small fraction of it. Every good programmer is disciplined enough
to match each malloc (or similar operation) with a corresponding free,
in the same way as any "{" is matched by a corresponding "}".
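A sketch of how stupid_exchange could be repaired so that it leaves no garbage (the name tidy_exchange is hypothetical). Besides the matching free, it checks the result of malloc, which returns NULL when no memory is available.

```c
#include <stdlib.h>

/* Same three loops as stupid_exchange, but the malloc is matched by a free.
   Returns 0 on success and -1 if the temporary buffer could not be allocated. */
int tidy_exchange(int a[], int b[], int n) {
    int i;
    int* c;
    if (n <= 0)
        return 0;                        /* nothing to exchange */
    c = (int*) malloc(n * sizeof(int));
    if (c == NULL)
        return -1;
    for (i = 0; i < n; i++)
        c[i] = a[i];
    for (i = 0; i < n; i++)
        a[i] = b[i];
    for (i = 0; i < n; i++)
        b[i] = c[i];
    free(c);                             /* matches the malloc above */
    return 0;
}
```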
Now one might fear that we have the same problem for the following procedure:
void not_so_stupid_exchange(int a[], int b[], int n) {
int i;
int c[n];
for (i = 0; i < n; i++)
c[i] = a[i];
for (i = 0; i < n; i++)
a[i] = b[i];
for (i = 0; i < n; i++)
b[i] = c[i]; }
But here we touch on the counterpart of the automatic memory
allocation of an array: just as the memory is allocated implicitly,
it is also automatically deallocated at the end of the procedure.
In particular this means that one should not assign a local array to a pointer variable which is going to be used outside the procedure. Consider the following program:
void initialize(int** a, int n) {
int i;
int b[n];
for (i = 0; i < n; i++)
b[i] = i;
*a = b; }
void print_array(int* a, int n) {
int i;
printf("\n");
for (i = 0; i < n; i++)
printf("a[%2d] = %4d\n", i, a[i]); }
int main() {
int n = 10;
int* a;
initialize(&a, n);
print_array(a, n); }
This program is syntactically correct. However, when running it, it may
crash or it may produce nonsense, because the local array b is
deallocated as soon as initialize returns, so that a points to memory
which is no longer reserved. So, notwithstanding its syntactical
correctness, this program has runtime errors. If memory which is
allocated in a subroutine should continue to exist after its
completion, then this memory should be allocated with malloc as in
the following modified procedure:
void initialize(int** a, int n) {
int i;
int* b = (int*) malloc(n * sizeof(int));
for (i = 0; i < n; i++)
b[i] = i;
*a = b; }
Allocating memory in a subroutine that survives after the subroutine has finished makes it hard to keep track of the allocated memory. It is good practice to ensure that subroutines leave no garbage: at the end of any subroutine there should be one free for each of its mallocs. So, it is suggested that instead of allocating the memory in initialize, it is allocated in main. Just as when closing brackets, it is also advisable to perform the frees in reverse order: the memory that was allocated last should be freed first.
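The suggested discipline, allocating in the caller and only filling the memory in the subroutine, might look as follows. Both procedure names are hypothetical, and the caller merely sums the values to have something to return.

```c
#include <stdlib.h>

/* Only fills memory that the caller has allocated; no malloc, no free. */
void initialize_array(int* a, int n) {
    int i;
    for (i = 0; i < n; i++)
        a[i] = i;
}

/* The malloc and its matching free are in the same procedure.
   Returns the sum of the initialized values, or -1 if allocation fails. */
long allocate_fill_free(int n) {
    int i;
    long s = 0;
    int* a;
    if (n <= 0)
        return 0;
    a = (int*) malloc(n * sizeof(int));
    if (a == NULL)
        return -1;
    initialize_array(a, n);
    for (i = 0; i < n; i++)
        s += a[i];
    free(a);                 /* the only free, matching the only malloc */
    return s;
}
```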
The difference between allocating memory by writing "int c[n];" and "int* c = (int*) malloc(n * sizeof(int));" also becomes visible when running the procedures not_so_stupid_exchange and stupid_exchange for large n. The version with "int c[n];" will not work for n on the order of some millions, even though there is sufficient memory available. The version with malloc works up to the limit of the memory. The reason is that a local array such as "int c[n];" is placed in a region reserved for local variables (the stack), which is typically not particularly large, whereas malloc takes its memory from the general pool.
There is a close but imperfect relation between arrays and pointers. Arrays are convenient, but often it is better to use pointers and allocate the memory with malloc and deallocate it with free.
For programs there are several quality measures. Obviously, correctness is crucial: a program which is not correct is worthless. For a serious program, readability and extendibility are also of great importance, because if the requirements change, another programmer must be able to modify and extend it. A final main issue is efficiency: a program which can solve only small problems may be good enough today, but it should also work for tomorrow's larger problems. Above we have seen how changing the approach had a tremendous impact for the maximum-subsequence-sum problem: for an array of length n the running time of the most basic approach was proportional to n^3, while it was quite easy to reduce this to n * log n.
Instead of approach we will use the word algorithm. An algorithm is a stepwise description of how to tackle a problem. This problem may be how to make an apple pie, or how to repair a bicycle tire, but it may also be how to compute the maximum-subsequence-sum. If we say that something is an algorithmic issue, we mean that it has to do with the algorithm. This will sometimes be contrasted with a programmatic issue, which means that it has to do with the program. Whereas programming is considered to be a rather basic skill, algorithm design is considered to be a more intellectual activity.
The programmatic details can have a substantial impact on the efficiency of a program, but the difference with an optimized version will be bounded by a factor which does not increase with the problem size. On the other hand, as we have seen for the maximum-subsequence-sum problem, the factor between the running time of two programs based on different underlying algorithms can grow arbitrarily large with increasing problem size.
A problem should be solved in a top-down way: first modeling, then algorithm design and finally programming. The programming again should be done in a top-down way.
In the current algorithm the for loop inside the while loop starts with i = 1. As shown in the analysis, this has the effect that the largest numbers are transported to the end of the array. A similar for loop starting with i = n - 1 could transport the smallest numbers to the beginning of the array. There is not really a difference between the two. However, it is interesting to alternate these two for loops. Change the while loop so that for even r the original for loop is performed, while for odd r the other one is performed. Perform the same measurements as before and formulate again how c develops as a function of n. Draw a conclusion from your results.
a * b = a, if b = 1,
a * b = a * (b - 1) + a, if b > 1.
Write a recursive procedure for computing products based on this definition.
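A direct transcription of the definition into C might look like this (a sketch; it assumes b >= 1 as the definition does, and the recursion depth grows linearly with b).

```c
/* Recursive product following the definition; requires b >= 1. */
int product(int a, int b) {
    if (b == 1)
        return a;                    /* a * 1 = a */
    return product(a, b - 1) + a;    /* a * b = a * (b - 1) + a */
}
```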
The task is to verify this property for various n and x and to measure the time consumption. This is done by using a second unsigned integer array b[], which is used for counting the frequencies of the numbers in a[]. It is initialized to zero, and in a final pass the maximum of all values in b[] is determined and printed.
Times can be measured with the following procedure:
long dclock() {
/* Returns the time in milliseconds */
struct timeval tp;
struct timezone tzp;
gettimeofday(&tp, &tzp);
return 1000 * (tp.tv_sec % 1000000) + tp.tv_usec / 1000; }
This is not the most scientific way of measuring times, but it is
simple and works quite well. In order to be able to use this
routine it is necessary to include the system library "sys/time.h",
which is done in the same way as the inclusion of "stdio.h".
For n you must test n = 2^k, for all k >= 12 as far as the computer allows you to solve the problem in less than 1 minute. For x you must test x = 1, 2, 4, 11, 19, 1007, 99991. The time measurement should only reflect the time for counting the frequencies of the numbers in a[], not the initialization or finding the maximum. To get stable measurements, the experiments should be repeated until the sum of the measured times exceeds 1000 ms (and then of course you must divide by the number of experiments to get the average time per experiment). Plot the resulting average time consumptions as a function of n using a doubly logarithmic scale (that is, both along the x-axis and along the y-axis the scaling is so that each factor two is one unit distance) connecting the points belonging to the same x value. Consider the developments and the differences and explain them.
Generate three random sets of size 100.000.000 each: S_1 gives the lotto prizes for the first draw; the probability that a number wins a prize in the first draw is 0.05. S_2 gives the lotto prizes for the second draw; again a fraction 0.05 of the entries is 1. S_3 gives the lotto bets; the probability that a number is selected is 0.2. Now compute the number of bets resulting in a prize (each bet gets at most one prize). That is, you should first compute the union of S_1 and S_2, then intersect it with S_3 and finally compute the size of the resulting set. Print this number (if it does not lie between 1.940.000 and 1.960.000, then there is probably something wrong with your program).
Random numbers can be generated with the help of the function random. See the online manual for the details (type "man random" inside a Unix or Linux environment).
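One way to organize this computation is sketched below, under a few assumptions: each set is stored as an array of 0/1 flags (one byte per number, not one bit), the hypothetical helper draw uses the standard rand() instead of random(), and the set size n is a parameter so that the procedure can first be tried on smaller sets. Since a number wins with probability 1 - 0.95^2 = 0.0975 and is bet on with probability 0.2, about 0.0195 * n bets should win, that is about 1.950.000 for n = 100.000.000, in agreement with the range given above.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 with probability (about) p, else 0. */
static unsigned char draw(double p) {
    return (double) rand() / ((double) RAND_MAX + 1.0) < p;
}

/* Each set over {0, ..., n-1} is an array of 0/1 flags.
   Returns |(S_1 union S_2) intersect S_3|, or -1 if allocation fails. */
long count_prizes(long n, unsigned seed) {
    long i, count = 0;
    unsigned char* s1 = (unsigned char*) malloc(n);
    unsigned char* s2 = (unsigned char*) malloc(n);
    unsigned char* s3 = (unsigned char*) malloc(n);
    if (s1 == NULL || s2 == NULL || s3 == NULL) {
        free(s3); free(s2); free(s1);    /* free(NULL) is harmless */
        return -1;
    }
    srand(seed);
    for (i = 0; i < n; i++) {
        s1[i] = draw(0.05);              /* prize in the first draw */
        s2[i] = draw(0.05);              /* prize in the second draw */
        s3[i] = draw(0.2);               /* number chosen as a bet */
    }
    for (i = 0; i < n; i++)
        s1[i] |= s2[i];                  /* S_1 := S_1 union S_2 */
    for (i = 0; i < n; i++)
        s1[i] &= s3[i];                  /* intersect with S_3 */
    for (i = 0; i < n; i++)
        count += s1[i];                  /* size of the resulting set */
    free(s3); free(s2); free(s1);        /* reverse order of allocation */
    return count;
}
```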
This algorithm can be made more efficient by explicitly dealing with some special cases. For example, it is never necessary to test even numbers: these can be thrown out by a modified initialization. Larger improvements can be achieved by not testing multiples of 2, 3, 5, 7, ... either. And when one is not testing them, why should one have storage for these numbers? You need not implement any of these improvements; it is only pointed out here that this fast algorithm for finding prime numbers can be improved further.
Program two variants with the following features:
For each of the two versions determine the largest power of 2 for which the program runs in less than 1 minute. Which variant is best?
The program should also produce some output. After performing the sieving, you should determine for each k = 2^i with 2 <= k <= n / 2 the number of primes between k and 2 * k and the resulting average distance between two consecutive primes in this interval. For each of these intervals the program should also print the maximum distance between any two consecutive primes.
We want to know how efficient the Eratosthenes algorithm for computing primes is. Not in a concrete sense by measuring seconds, but in an abstract sense by counting some specific operations which give a good measure for the amount of work performed. In our case, such a measure is given by the number of visited multiples of the prime numbers. This does not account for the initialization and the testing, but this amount of work is easy to estimate: it is proportional to n.
Determine this number for several values of n and speculate how it develops as a function of n. You can choose from simple functions of the following types: c * n^2, c * n^{3/2}, c * n * log n, c * n * loglog n, and variants. Of course you do not need to speculate: using your measurement of the development of the average distance between primes, it is not hard to derive this development.
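A sketch of the sieve that also counts the visited multiples. It assumes that crossing out for a prime p starts at p * p (smaller multiples have already been crossed out by smaller primes) and that the caller provides an array of n + 1 flags; whether the algorithm from the lectures starts at p * p or at 2 * p affects the count, so your numbers may differ.

```c
/* Sieve of Eratosthenes on {0, ..., n}: afterwards is_prime[k] is 1 exactly
   for the primes k. Returns the number of visited multiples, or -1 if n < 2.
   is_prime must have room for n + 1 entries. */
long sieve(long n, unsigned char is_prime[]) {
    long p, m, visited = 0;
    if (n < 2)
        return -1;
    is_prime[0] = is_prime[1] = 0;
    for (m = 2; m <= n; m++)
        is_prime[m] = 1;
    for (p = 2; p * p <= n; p++)
        if (is_prime[p])
            for (m = p * p; m <= n; m += p) {   /* cross out multiples of p */
                is_prime[m] = 0;
                visited++;                      /* the work measure */
            }
    return visited;
}
```

For n = 30 the procedure visits 24 multiples and leaves exactly the ten primes 2, 3, 5, ..., 29 marked.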
The matrices should be initialized as follows:
A_{ij} = 1, for all i, j with i + j even
A_{ij} = -1, for all i, j with i + j odd
B_{ij} = i, for all i, j
For C = A * B this gives a simple regular pattern, which can be used to check that the three procedures all compute the same product.
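A sketch of the initialization and of the straightforward triple-loop multiplication, assuming 0-based indices, doubles, and n x n matrices stored row by row in one-dimensional arrays (element (i, j) at index i * n + j). Under these assumptions, for even n every entry in row i of C equals sum_{0 <= k < n} (-1)^{i + k} * k, which is -n/2 for even i and n/2 for odd i; check_product tests exactly this pattern. The other variants (for example with a transposed matrix) are not shown.

```c
/* A_{ij} = +1 or -1 depending on the parity of i + j, B_{ij} = i. */
void init_matrices(int n, double A[], double B[]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            A[i * n + j] = ((i + j) % 2 == 0) ? 1.0 : -1.0;
            B[i * n + j] = i;
        }
}

/* Straightforward triple loop: C = A * B. */
void multiply(int n, const double A[], const double B[], double C[]) {
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            double s = 0.0;
            for (k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}

/* For even n: row i of C should be -n/2 (i even) or n/2 (i odd). */
int check_product(int n, const double C[]) {
    int i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (C[i * n + j] != ((i % 2 == 0) ? -n / 2.0 : n / 2.0))
                return 0;
    return 1;
}
```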
Measure the time for each of these methods for n = 2^k, for k = 4, 5, ..., 10 or 11. The time for possibly transposing the matrix must also be taken into account, but not the time for allocating and initializing the matrices. The first time you use C after allocating it, all its fields must be accessed once to make sure that C is actually loaded into the cache/memory. For the small matrices the experiments must be repeated many times to get stable time measurements.
Plot the results in a suitable way: along the x-axis you should give the k values, along the y-axis you should give log_2 T(2^k), where T(2^k) gives the time for an experiment with n = 2^k. The graphs should be about lines. Explain the irregularities in the development and the differences between the methods. Which method is best?
The processor, or central processing unit (CPU), is the heart of a computer. The processor is a chip which consists of several parts. Historically the most important parts of this chip were the control unit, the arithmetic logic unit or ALU and the registers, but to improve performance caches have been added. The control unit is the coordinator. It fetches instructions and data from the memory and guides their execution. The ALU contains the hardware for arithmetic and logical operations. The registers are used for storing data involved in the computations, but there is also a register reserved for the program counter, containing the address of the next instruction to execute, and an instruction register, containing the current instruction.
The processor is the heart, but without other components it would not be very useful. The main memory is the normal place to store data and instructions. For larger amounts of data and for more permanent data, there are secondary and tertiary storage media. The hard disk is the most common form of secondary storage. For even larger amounts of data there exist drums and tapes, but nowadays they are only used for special applications and they are losing importance because of the rapid development of hard disks.
The IO media allow data to be brought into the system and out of it. The most important input media are the keyboard, the mouse and all kinds of reading devices. Punch card readers, once the prime way to feed data into a computer, have died out. The most important output media are the screen and the printer. Of course, for the communication with the outside world the computer contains several other chips, some of which, like the video card, might be considered special-purpose computers themselves.
If a computer is equipped with more than one processor, then each of them may execute instructions. The easiest situation is when there are several programs running in parallel. This is always the case: in addition to your game/browser/compiler/programming task, there may also be a clock, a screen background, and many system routines running. On a single-processor machine, these all run "at the same time", at least it looks that way. Actually, each of these processes is occasionally switched on for a very short time and then switched off again. The clock is maybe active for about one millisecond per second, taking on average 0.1% of the total time, while your game requires more computation and may be active for 98% of the time. This whole idea is called time sharing. If there are several processors, then the processes demanding time are scheduled over all of them. Alternatively, but this requires special programs, it is also possible to process one task on several processors.
A computer consists of a processor, a hierarchy of memories and IO devices interconnected by buses.
The registers are located in the immediate vicinity of the ALU. There is only a small number of them, typically 16, 32 or 64. Most instructions require that (part of) the data to work on are located in the registers. During the computation data are loaded into the registers and written back again when the place they occupy is needed for other data.
At the next levels of the hierarchy there are one, two or three caches. These are small memories which can be accessed rapidly. They are located on the same chip as the processor. Nowadays, a two-level cache is most common. The size of the smaller of the two, the first-level cache, typically lies between 16 and 64 KB. Data in the first-level cache can usually still be accessed by the processor in one clock cycle. The second-level cache is considerably larger, typically between 256 KB and 1 MB. Accessing data stored in the second-level cache takes several clock cycles.
The main memory is the primary place where data are stored. It consists of so-called random-access memory or RAM. The size of the RAM has gone up dramatically. In 1975 a mainframe might have had as little as 64 KB; now most PCs have between 256 MB and 4 GB of RAM, a yearly increase by a factor of about 1.5. The main memory is not on-chip, but off-chip. That means that it is not located on the same chip as the processor. Typically the main memory is even composed of a whole bank of chips: a 1 GB main memory may physically consist of 4 memory chips of 256 MB each. This has many implications. In the first place one can choose the main memory somewhat independently from the processor. Also one can upgrade a computer by adding or replacing a few memory chips at a later time. But it is of course precisely this off-chip nature of the main memory which causes the high cost of accessing it: the distances are much longer, the wires and contacts take more time to load, the request must be guided through several switches, ... . Accessing the main memory takes hundreds of clock cycles.
The name random-access memory was originally given to distinguish it from tapes and other more serial kinds of memory. Any position of the RAM can be accessed in about the same time. With a tape this is very different: it can only be accessed at the position which is currently at the reading/writing head, and getting to another position requires winding through the tape. In everyday life we find the same difference when comparing the access possibilities of a music tape and a music CD. Currently RAM is realized with a certain type of transistors which can be switched between two states. This allows one unit of information, a bit, to be recorded in every transistor. Keeping this state requires some electrical power, and therefore all RAM information is lost when the power is interrupted for more than a second. In earlier days, RAM was realized with a large number of small magnetizable iron (ferrite) rings which could also be put in two states. Their major disadvantage was that they were very large for the amount of information they could store. They nevertheless constituted a great improvement over the earlier used vacuum tubes, which were much larger still, consumed tremendous amounts of electricity and were highly unreliable.
The picture shows 4 KB of ferrite-based core storage from around 1970. Each small rectangle contains 1024 small rings, each of which can be used to store one bit. The electronics around the actual storage can be used to write and read the stored values. The white rod is a pencil which was added to indicate the size of the card.
The secondary memory nowadays consists of so-called hard disks: rotating disks with a magnetizable coating on which data can be written and read. Unlike RAM, a hard disk does not need sustained electrical power: data can be reused even if the computer was turned off for years. This is an essential feature, but not interesting for our considerations of the running computer. The size of hard disks has increased from a few MB to hundreds of GB, growing as fast as the main memory or even slightly faster. Accessing the secondary memory may take as much as 10 ms.
Counting per GB of storage, secondary storage is much cheaper than primary (RAM) storage: RAM costs 200 euro per GB, hard disks cost 2 euro per GB. This feature makes it cost-efficient to sometimes also use the hard disk for storing data which are needed in a computation: we have a problem to solve, but unfortunately, we have not enough RAM for all data. Then, part of the data can be written away on the hard disk, while the processor is working on another part.
Not only does the access time increase dramatically when using the larger memory elements; the bandwidth also decreases strongly. The bandwidth of a bus, a connection in a computer system, is the maximum number of bytes which can be transferred over it per second. A high access time does not necessarily imply a small bandwidth, because the bandwidth is computed from the time for transferring a large amount of data placed in consecutive memory positions. The bandwidth of the bus to the caches is not a limiting factor. The bandwidth of the memory bus is more than one GB/s, which is considerable, but in very memory-intensive applications, such as copying the values of one large array to another array, it nevertheless determines the execution time. A normal hard disk works at a rate of around 10 MB/s, about 100 times less than the main memory. On the one hand this is bad; on the other hand, 10 MB/s is much more than one might expect from the access time of 10 ms. The actual amount of useful data that may be accessed per second depends on how these accesses are organized.
There is a memory hierarchy ranging from tiny and rapidly accessible to huge and slowly accessible.
The value of a binary number given as an array a[] of length k can be computed with the above sum formula. For a given non-negative integer n, it is also easy to compute the entries of this array. In C this may be done as follows:
int* int_to_binary(int n, int k) {
int i = 0;
int* a = (int*) malloc(k * sizeof(int));
while (n > 0) {
a[i] = n & 1; // The least significant bit
n = n >> 1; // Shifting right one position
i++; }
while (i < k) {
a[i] = 0;
i++; }
return a; }
Here k gives an upper bound on the number of required bits (on a
32-bit machine k = 32 will always be large enough). Instead of an
array of ints, we might also use a type requiring less space.
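The inverse direction, evaluating the sum formula for an array produced by int_to_binary, can be sketched as follows. The name binary_to_int is hypothetical, and the sum is evaluated with Horner's rule (doubling the partial value and adding the next bit, starting from the most significant position), which avoids computing powers of 2.

```c
/* a[i] holds bit i (a[0] is the least significant bit).
   Returns sum_{i = 0}^{k - 1} a[i] * 2^i, evaluated with Horner's rule. */
unsigned long binary_to_int(const int a[], int k) {
    unsigned long value = 0;
    int i;
    for (i = k - 1; i >= 0; i--)
        value = 2 * value + a[i];
    return value;
}
```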
A clear disadvantage of the binary system from the human point of view is that the numbers get very long and it is hard to see whether there are five or six consecutive zeroes. A convenient middle way is to express numbers in the hexadecimal system, that is, with radix 16. In the hexadecimal system we use the symbols "0", "1", ..., "9" to denote the first ten values, and then "A", "B", "C", "D", "E", "F" to denote the values 10, ..., 15. Thus, the hexadecimal number 3AB75E corresponds to the decimal number 14 * 16^0 + 5 * 16^1 + 7 * 16^2 + 11 * 16^3 + 10 * 16^4 + 3 * 16^5. Conversion from binary to hexadecimal and vice versa is much easier: just divide the binary number into groups of four digits and convert each of them individually: 0011.1010.1011.0111.0101.1110 = 3AB75E (here the dots are only added to make the separations clear). This is like writing a large decimal number with a dot after each group of three digits, which can be viewed as writing the number in a system with radix 1000.
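The grouping into four bits translates directly into code: x & 0xF extracts the lowest group and x >> 4 drops it. A sketch, with the hypothetical name to_hex (in real programs printf with the %X or %lX conversion does the same job):

```c
#include <stddef.h>

/* Writes the hexadecimal representation of x into buf, taking the bits in
   groups of four, least significant group first, then reversing.
   buf must have room for at least 2 * sizeof x + 1 characters. */
void to_hex(unsigned long x, char buf[]) {
    const char digits[] = "0123456789ABCDEF";
    char tmp[2 * sizeof x];
    size_t len = 0, i;
    do {
        tmp[len++] = digits[x & 0xF];   /* one group of four bits */
        x >>= 4;                        /* move on to the next group */
    } while (x != 0);
    for (i = 0; i < len; i++)           /* most significant group first */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
}
```

to_hex(0x3AB75E, buf) should leave the string "3AB75E" in buf; the same value is 3848030 in decimal.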
The largest positive integer which can be stored in an int on most current computers is 2^31 - 1 = 2147483647. If one needs this number, it is not very convenient to write it in decimal, because it is an ugly number and it does not even fit on most pocket calculators. As a hexadecimal number it is very simple: 0111.1111. ... .1111 = 7FFFFFFF.
Internally computers work with binary numbers. Externally they can handle numbers given in several number systems. Number conversion is simple and can be performed in a time which is proportional to the number of digits.
Now we know how to compute the value of a number given its digits, but how do we compute the digits? That is, for a given number x, we want to find numbers d_i, i >= 0, with 0 <= d_i < r, so that sum_{i >= 0} d_i * r^i = x. There are two approaches: computing either the least significant or the most significant digit first. The first is slightly simpler. We need some basic rules of modular arithmetic. Integer division is denoted by '/', the modulo operator by '%'.
Using these rules it follows that if x = sum_{i >= 0} d_i * r^i, then x % r = (sum_{i >= 0} d_i * r^i) % r = (sum_{i >= 0} (d_i * r^i) % r) % r = d_0 % r = d_0. For the last equality we used that 0 <= d_0 < r. Knowing d_0, we observe that x - d_0 is a multiple of r, and can thus be written as x' * r, where x' = (x - d_0) / r = x / r. In other words, x' = sum_{i >= 1} d_i * r^{i - 1}. Thus, d_1 = x' % r. Continuing in this way, all digits can be determined. Each digit requires a constant number of operations.
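The least-significant-digit-first method translates into a short loop. A sketch, with the hypothetical name to_digits: since x / r equals (x - d_0) / r for integer division, the loop can simply divide.

```c
/* Stores the radix-r digits of x (r >= 2) in d[0], d[1], ..., least
   significant first, and returns their number. d must be large enough;
   64 entries suffice for any 64-bit unsigned long. */
int to_digits(unsigned long x, unsigned r, int d[]) {
    int count = 0;
    do {
        d[count++] = (int) (x % r);   /* next digit d_i = x % r */
        x /= r;                       /* x' = (x - d_i) / r = x / r */
    } while (x != 0);
    return count;
}
```

For x = 3848030 and r = 16 this gives the digits 14, 5, 7, 11, 10, 3, that is, hexadecimal 3AB75E.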
A primitive type is a type which provides space for a single simple variable and which constitutes the basis for the construction of all derived types. Primitive types in most programming languages (with possibly small variations in the naming) are at least the following
An array is an indexed sequence of elements of a certain type, all elements being of the same type. In mathematics arrays are denoted by a subscript: "x_i" denotes element i of the array x. In computer science it is common to denote arrays with square brackets: "x[i]". Arrays can be defined over any previously defined element type, particularly of booleans, characters, integers and floats. There are two common ways of speaking about an array over some type xxx: one can either say "an xxx array" or "an array of xxx".
The array construction is the most elementary way of obtaining so-called derived types: types which are obtained out of previously defined types by any of the type construction mechanisms provided by the programming language under consideration. The limitation of the array construction is that all elements must be of the same (underlying) type. The power of the array construction is that the number of fields is specified by a parameter. The type of an array with integer elements is "integer array", not "integer array of length 26". This implies that we can write a single procedure for finding the minimum of integer arrays (a procedure is a fragment of a program which can be called from outside with certain arguments and which returns some value); we do not need a separate procedure for each possible length of the arrays (though some older programming languages have this stricter view).
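In C the length of an array is not part of the type of an array parameter, so one procedure serves arrays of every length. A sketch, assuming the length is passed as a separate parameter n >= 1; array_minimum is a hypothetical name.

```c
/* Returns the minimum of a[0] ... a[n - 1]; assumes n >= 1. Because the
   length is a parameter, this one procedure works for any array length. */
int array_minimum(const int a[], int n) {
  int min = a[0];
  for (int i = 1; i < n; i++)
    if (a[i] < min)
      min = a[i];
  return min;
}
```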
A second main mechanism to obtain derived types is to create a compound type out of two or more existing types. This construction is also known under other names ("record", "structure"), but it always works in the same way: a compound type has several fields for elements of possibly different types. These fields are accessed by a name, not by an indexing mechanism. An example of a compound type is "xyz_coordinate", which has three fields "x_coordinate", "y_coordinate" and "z_coordinate", each of them a floating-point number. Another compound type may be "personal_record", with fields "name", some kind of array of characters; "age", a small positive integral number, for which one might use a single byte; and "personal_number", a large positive integral number, for which one might use an integer. The details of defining compound types and accessing the individual fields differ from language to language.
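In C, compound types are called structs. A sketch of the two examples; the field types (double for the coordinates, a 40-character name, unsigned char for the age, unsigned int for the personal number) and the helper make_record are assumptions chosen for illustration.

```c
#include <string.h>

typedef struct {
  double x_coordinate, y_coordinate, z_coordinate;
} xyz_coordinate;

typedef struct {
  char name[40];                 /* some kind of array of characters */
  unsigned char age;             /* a single byte suffices */
  unsigned int personal_number;  /* a large positive integral number */
} personal_record;

/* Fields are accessed by name with the dot operator, not by an index. */
personal_record make_record(const char* name, unsigned char age,
                            unsigned int number) {
  personal_record r;
  strncpy(r.name, name, sizeof r.name - 1);
  r.name[sizeof r.name - 1] = '\0';  /* strncpy may leave the string unterminated */
  r.age = age;
  r.personal_number = number;
  return r;
}
```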
Of course one can have arrays of compounds, compounds of arrays, arrays of compounds of arrays, and so on. One of the most important two-step constructions is the matrix. Mathematically a matrix is a rectangular block of numbers. For a matrix "A", "A_{i, j}" denotes the element at position (i, j) of the block (i denotes the row and j the column; the indexing starts in the upper-left corner with A_{0, 0}). This is such a fundamental mathematical object that we will also frequently encounter it in computer programs. The cleanest way of obtaining matrix objects is by defining them as an "array of rows", where "row" is defined as an "array of xxx", and "xxx" stands for the type of the elements.
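An "array of rows" can be built in C with one allocation per row. A sketch, assuming double elements; matrix_alloc and matrix_free are hypothetical names. After allocation, A[i] is row i and A[i][j] is the element A_{i, j}.

```c
#include <stdlib.h>

/* Allocates a rows x cols matrix as an array of rows, all elements 0.
   Returns NULL if memory runs out. */
double** matrix_alloc(int rows, int cols) {
  double** A = malloc(rows * sizeof *A);   /* the array of row pointers */
  if (A == NULL) return NULL;
  for (int i = 0; i < rows; i++) {
    A[i] = calloc(cols, sizeof **A);       /* row i: an array of doubles */
    if (A[i] == NULL) {                    /* free what was allocated so far */
      while (i-- > 0) free(A[i]);
      free(A);
      return NULL;
    }
  }
  return A;
}

void matrix_free(double** A, int rows) {
  for (int i = 0; i < rows; i++) free(A[i]);
  free(A);
}
```

Storing all elements in one block of rows * cols doubles, indexed as B[i * cols + j], is a common alternative that needs only a single allocation.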
Most computer languages work with a typing mechanism. The array and compound constructions are the most important mechanisms for constructing arbitrarily complex derived types out of a small number of primitive types.
Because there are only two boolean values and the operators take only one or two arguments, there are very few combinations, which (unlike computation with integers) makes it possible to define the complete evaluation rules by small tables, which are called truth tables:
Let us introduce symbols for the operators. This makes it easier to distinguish the operators from English words. "not", "or" and "and" are denoted by "!", "||" and "&&", respectively. Like in numerical expressions, brackets can be used to enclose subexpressions of a boolean expression which should be evaluated first. So, when we write (x && y) || ((!y) && z), it is clear what is meant: first determine the values of the subexpressions s_1 = x && y and s_2 = (!y) && z, which requires that one first evaluates !y, and then compute s_1 || s_2. It is good practice to write brackets to prevent any kind of doubt, but there is no obligation to do so: x && y || !y && z is also a correct expression. But what does it mean? The value of such an expression with many operators and without any brackets is determined by the priority rules, which tell which operators to evaluate first. Just like we know that a * b + -b * c (here "-" is the unary minus) is to be evaluated as (a * b) + ((-b) * c), it is also defined that ! binds strongest, and that && comes before ||. In general, there are differences between programming languages, but we always have:
Next to the above standard operators there are several others. The most important is the "exor" operator, denoted "^". It gives the exclusive or of two variables, that is, "x ^ y" is true if and only if exactly one of the two variables is true. Other operators, which are most important in hardware design, are "nor" (not-or) and "nand" (not-and). The truth tables of these operators are as follows:
The truth tables for the operators are the foundation of all laws on boolean expressions.
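Truth tables can also be generated by a program, which is a handy way of checking them. A sketch in C with hypothetical function names: since C has no nor or nand operator, they are written as !(x || y) and !(x && y), and C's bitwise ^ serves as exor, which agrees with the logical exor on the values 0 and 1. The second function checks the priority rule described above, that x && y || !y && z means (x && y) || ((!y) && z); the third confirms that strict left-to-right grouping would give a different result.

```c
#include <stdbool.h>
#include <stdio.h>

/* Prints the truth tables of &&, ||, ^, nor and nand, with 1 for T and 0 for F. */
void print_truth_tables(void) {
  printf("x y | x&&y x||y x^y nor nand\n");
  for (int x = 0; x <= 1; x++)
    for (int y = 0; y <= 1; y++)
      printf("%d %d |  %d    %d    %d   %d   %d\n",
             x, y, x && y, x || y, x ^ y, !(x || y), !(x && y));
}

/* True if x && y || !y && z (deliberately without brackets) equals
   (x && y) || ((!y) && z) for all eight assignments. */
bool priority_rule_holds(void) {
  for (int x = 0; x <= 1; x++)
    for (int y = 0; y <= 1; y++)
      for (int z = 0; z <= 1; z++)
        if ((x && y || !y && z) != ((x && y) || ((!y) && z)))
          return false;
  return true;
}

/* True if left-to-right grouping, ((x && y) || !y) && z, differs
   from the unbracketed expression for at least one assignment. */
bool differs_from_left_to_right(void) {
  for (int x = 0; x <= 1; x++)
    for (int y = 0; y <= 1; y++)
      for (int z = 0; z <= 1; z++)
        if ((x && y || !y && z) != (((x && y) || !y) && z))
          return true;
  return false;
}
```

For example, x = 1, y = 1, z = 0 gives 1 for the unbracketed expression but 0 for left-to-right grouping.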
Applying this correspondence allows us to rewrite boolean expressions as expressions in C_2. Computing in C_2 is very simple because many simplifications can be made: both operators are associative and commutative, * distributes over +, a + a == 0 and a * a == a. This is particularly useful for proving equalities. For example, let us check the first distributive law: x && (y || z) == (x && y) || (x && z). The left side corresponds to x * (y + z + y * z) == x * y + x * z + x * y * z, because * distributes over + in C_2. The right side corresponds to x * y + x * z + x * y * x * z == x * y + x * z + x * y * z, because multiplication in C_2 is commutative (x * y == y * x) and idempotent (x * x == x). We also check the first DeMorgan's law: !(x && y) == !x || !y. Rewriting the sides gives 1 + x * y and (1 + x) + (1 + y) + (1 + x) * (1 + y) == 1 + x + 1 + y + 1 + x + y + x * y == 1 + x * y, where the last step uses that 1 + 1 == 0, x + x == 0 and y + y == 0.
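Because every variable takes only two values, such equalities can also be checked exhaustively by a program. This sketch assumes the correspondence used in the computations above: F -> 0, T -> 1, x && y -> x * y, x || y -> x + y + x * y and !x -> 1 + x, all computed modulo 2. The helper names are hypothetical.

```c
#include <stdbool.h>

/* The boolean operators expressed as arithmetic in C_2 (modulo 2). */
static int and2(int x, int y) { return (x * y) % 2; }
static int or2(int x, int y)  { return (x + y + x * y) % 2; }
static int not2(int x)        { return (1 + x) % 2; }

/* Checks the correspondence itself and the two laws derived above. */
bool c2_correspondence_holds(void) {
  for (int x = 0; x <= 1; x++)
    for (int y = 0; y <= 1; y++) {
      if (and2(x, y) != (x && y) || or2(x, y) != (x || y) || not2(x) != !x)
        return false;
      /* first DeMorgan's law: !(x && y) == !x || !y */
      if (not2(and2(x, y)) != or2(not2(x), not2(y)))
        return false;
      for (int z = 0; z <= 1; z++)
        /* first distributive law: x && (y || z) == (x && y) || (x && z) */
        if (and2(x, or2(y, z)) != or2(and2(x, y), and2(x, z)))
          return false;
    }
  return true;
}
```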
Another alternative for working with booleans is to map the non-negative integral numbers to the booleans: 0 maps to F, all other numbers correspond to T. In this way (not considering the possibility of overflow), "+" and "*" perfectly correspond to "||" and "&&".
A boolean contains little information, but sometimes one does not want to know more. Assume we have two vectors, v and w, each of length n, whose entries are booleans. Let us say that v[i] == T denotes that something is possible, and likewise for w[]. Possibly v[] is a row of a matrix and w[] is a column of a second matrix, but v[i] may also be true for all apartments which cost at most 300 euro and w[i] for all apartments with at least 70 m^2. We want to determine whether v[] and w[] have a common hit (an affordable and sufficiently spacious apartment), that is, whether there is an i for which v[i] == w[i] == T. So, we want to compute the following value x:
x = or_{i = 0}^{n - 1} (v[i] && w[i]).
This can also be reformulated in 0-1 terms: v'[] and w'[] are arrays with 0-1 values, v'[i] == 1 if and only if v[i] == T, and likewise for w'[] and w[]. For v'[] and w'[] we can compute the following value x':
x' = sum_{i = 0}^{n - 1} v'[i] * w'[i].
Clearly x == T if and only if x' > 0. So, computing x' also computes x, but not vice versa: x' gives the number of common hits, while x only gives the existence of a hit.
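Both computations can be sketched in C; the function names are hypothetical. The first works on booleans and only reports whether a common hit exists, the second works on 0-1 values and counts the common hits.

```c
#include <stdbool.h>
#include <stddef.h>

/* x = or over i of (v[i] && w[i]): is there a common hit? */
bool common_hit(const bool* v, const bool* w, size_t n) {
  bool x = false;
  for (size_t i = 0; i < n; i++)
    x = x || (v[i] && w[i]);
  return x;
}

/* x' = sum over i of v'[i] * w'[i]: the number of common hits. */
int common_hits(const int* v01, const int* w01, size_t n) {
  int x = 0;
  for (size_t i = 0; i < n; i++)
    x += v01[i] * w01[i];
  return x;
}
```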
Booleans, || and && quite closely correspond to numbers, + and *, a correspondence which often offers computational possibilities.
In practice it is inconvenient that a byte is the smallest addressable memory unit. There is no way to address individual bits. Therefore, if one declares a boolean variable, this boolean will internally correspond to a byte (or more), wasting 7 of 8 bits in the byte (or worse). For a single variable this does not matter, but if one works with a long array of booleans, then it makes a big difference.
Arrays of booleans are one of several possible ways of working with sets: if S[i] is true, then the element with index i is an element of the set S, otherwise it is not. Arrays of booleans can also be used for marking purposes: suppose you are searching your way in a maze (labyrinth); then it is a useful idea to mark the places you have already visited. If you are trying to solve this kind of search problem on a computer, it is even more important to mark the visited nodes. Because sets or graphs (a set of nodes connected by a set of edges) can be arbitrarily large, it is a good idea not to waste memory unnecessarily.
The idea is to view a byte not as a number from 0 to 255, but as 8 bits packed together. So, for storing an array of n booleans, we use an array of n / 8 (rounded up) bytes, and store 8 booleans in each of them. The values of the individual bits can be set and read in a constant number (one or two) of clock cycles using the bitwise operations: in most programming languages there are not only instructions to perform operations on booleans, characters and numbers, but one can also perform 8-, 32- or even 64-bit operations in one stroke. Because there is a one-to-one correspondence between bit operations and boolean operations, this feature allows us to perform 32 or even 64 boolean operations in one clock cycle, provided that all these operations are of the same kind. The language C provides such bitwise operations: bitwise-and (&), bitwise-or (|), bitwise-exor (^) and bitwise-not (~). For 32-bit values, bitwise-not can also be obtained by computing bitwise-exor with FFFFFFFF.
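As an illustration, here is a packed version of the common-hit computation, in which one bitwise-and performs 32 boolean ands at once. A sketch, assuming the booleans of each vector are packed 32 per uint32_t word (bit j of word k holds element 32 * k + j, unused bits are zero); all names are hypothetical. Counting the one bits uses the well-known trick b &= b - 1, which clears the lowest one bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Is there a common hit? m is the number of 32-bit words per vector. */
bool common_hit_packed(const uint32_t* v, const uint32_t* w, size_t m) {
  for (size_t k = 0; k < m; k++)
    if ((v[k] & w[k]) != 0)   /* 32 boolean ands in one operation */
      return true;
  return false;
}

/* Number of common hits: the number of one bits in all v[k] & w[k]. */
int common_hits_packed(const uint32_t* v, const uint32_t* w, size_t m) {
  int count = 0;
  for (size_t k = 0; k < m; k++)
    for (uint32_t b = v[k] & w[k]; b != 0; b &= b - 1)  /* clear lowest one bit */
      count++;
  return count;
}

/* Bitwise-not obtained as bitwise-exor with FFFFFFFF. */
uint32_t not_via_exor(uint32_t x) { return x ^ 0xFFFFFFFFu; }
```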
Packing 8 bits of information into a byte is fine, but how do we set the bits and how do we get the information out again? The following procedures can be used for this:
#include <stdbool.h>
void set_to_zero(unsigned int* x, int i) { // Sets bit i of *x to 0
  *x = *x & ~(1u << i); }
void set_to_one(unsigned int* x, int i) { // Sets bit i of *x to 1
  *x = *x | (1u << i); }
void flip_value(unsigned int* x, int i) { // Flips the value of bit i of *x
  *x = *x ^ (1u << i); }
bool is_zero(unsigned int x, int i) { // Returns true if bit i of x is 0
  return (x & (1u << i)) == 0; }
bool is_one(unsigned int x, int i) { // Returns true if bit i of x is 1
  return (x & (1u << i)) != 0; }
Because the mask ~(1u << i) has a one in every position except position i, set_to_zero leaves all other bits unchanged, so the same procedures can be used however many bits of information are packed together, as long as 0 <= i is smaller than the number of bits of an unsigned int (typically 32). All these procedures take just a few clock cycles.
Memory-efficient computation requires that arrays of booleans are packed with 8 booleans per byte. This is normally not done by default.
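Packing n booleans into n / 8 (rounded up) bytes, as described above, might look as follows in C. This is a sketch with hypothetical names (packed_bools, packed_init, packed_set, packed_get, packed_free); boolean i is kept in bit i % 8 of byte i / 8.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
  size_t n;               /* number of booleans */
  unsigned char* bytes;   /* (n + 7) / 8 bytes */
} packed_bools;

/* Allocates n booleans, all false; returns false if out of memory. */
bool packed_init(packed_bools* p, size_t n) {
  p->n = n;
  p->bytes = calloc((n + 7) / 8, 1);   /* n / 8 rounded up, zero-filled */
  return p->bytes != NULL || n == 0;
}

void packed_set(packed_bools* p, size_t i, bool value) {
  unsigned char mask = (unsigned char) (1u << (i % 8));
  if (value)
    p->bytes[i / 8] |= mask;                   /* set bit i % 8 */
  else
    p->bytes[i / 8] &= (unsigned char) ~mask;  /* clear bit i % 8 */
}

bool packed_get(const packed_bools* p, size_t i) {
  return (p->bytes[i / 8] >> (i % 8)) & 1u;
}

void packed_free(packed_bools* p) {
  free(p->bytes);
  p->bytes = NULL;
}
```

An array of 1000 booleans then occupies 125 bytes instead of (at least) 1000.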
An important point in computer science is the relation between the time for solving a problem and the amount of memory needed for doing this. For some problems there is a real space-time trade-off: if one has more memory, the problem can be solved faster. A good example is the management of a set. If we want to maintain a set with values in the range from 0 to some maximum value n - 1, then, as said above, a convenient way to do this is to use an array S[] of n booleans. Assume that we start with an empty set, that is, all values of the array are false (there is a trick, called "virtual initialization", which makes this assumption unnecessary). Then we can enter an element i by setting S[i] = true, we can test whether an element i is there by looking at the value of S[i], and we can remove an element by setting S[i] = false. Each of these is just a single instruction; said more formally, these operations can be performed in constant time. Of course, here we use the property of the RAM that setting or checking S[i] can be done in a constant number of clock cycles for all i, independently of the value of i. However, using this idea for managing the student registration numbers ("Matrikelnummer") of the students in the course Informatik I is not a very good idea: we would need storage for 99,999,999 booleans, of which only 100 are used. In that case, one should rather use an alternative approach requiring an amount of memory proportional to the number of elements in the set, not proportional to the value of the largest possible key. Then we cannot assure constant time for each of the mentioned operations, so here we see the trade-off:
Much memory -> constant time for all operations,
Less memory -> more than constant time for some operations.
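Both sides of the trade-off can be sketched in C. The boolean array follows the text directly. For the low-memory side the text does not fix a method, so this sketch assumes one possible choice: a sorted array holding only the k elements of the set, searched by binary search, which needs memory proportional to k but O(log k) time per lookup. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Much memory: keys 0 ... n - 1 in a boolean array S[], initially all
   false. Each operation is a single array access: constant time. */
void set_insert(bool* S, size_t i)         { S[i] = true; }
void set_remove(bool* S, size_t i)         { S[i] = false; }
bool set_contains(const bool* S, size_t i) { return S[i]; }

/* Less memory: the k elements kept in increasing order in keys[].
   Binary search halves the candidate range in every step. */
bool sorted_contains(const long* keys, size_t k, long key) {
  size_t lo = 0, hi = k;                 /* candidates: keys[lo] ... keys[hi - 1] */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (keys[mid] == key) return true;
    if (keys[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return false;
}
```

With the sorted array, inserting or removing an element is even more expensive, because later elements must be shifted, which takes time proportional to k.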
One may think that we have the same situation when we are packing booleans into a byte: instead of directly accessing the boolean we want, we need several operations. However, in the section on caching further down, we will see that it is expensive (possibly costing hundreds of clock cycles) to fetch a cache line from the main memory. Packing the data more densely also means that we may expect to reduce the number of cache misses. Generally the reduction of the time consumption because of reduced caching costs is far larger than the increase due to the unpacking.
Space-time trade-offs exist, but packing data more densely generally gives a reduction of both space and time.
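Clearly 256 possibilities are not enough for all languages of the world: Chinese alone needs far more. Therefore, there are several initiatives for extended character sets. Currently the most used is "Unicode". It originally used 16 bits, offering 65536 possibilities, but it has since been extended to more than a million code points (0 to 10FFFF hexadecimal), which offers space for Chinese and all other languages of the world. In practice Unicode characters are stored in encodings such as UTF-8, which uses one to four bytes per character.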
In many computer languages, there are ways to access the number of a character. Though this is not very elegant, this offers efficient ways to determine whether a character is a small letter, a capital letter, or a symbol. Of course, any program working this way will not work anymore if the coding is changed.
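For example, in C a character is a small integer, so such tests are simple comparisons. This sketch assumes an ASCII-compatible encoding, in which 'a' ... 'z' and 'A' ... 'Z' occupy consecutive codes; in EBCDIC, for example, the letters are not consecutive and the range tests would also accept some non-letter codes. The names are hypothetical; the standard header <ctype.h> provides portable alternatives such as islower() and toupper().

```c
#include <stdbool.h>

/* Range tests on the character code; correct for ASCII-compatible encodings. */
bool is_small_letter(char c)   { return c >= 'a' && c <= 'z'; }
bool is_capital_letter(char c) { return c >= 'A' && c <= 'Z'; }

/* Uses the constant distance between the two alphabets ('a' - 'A' is 32 in ASCII). */
char to_capital(char c) {
  return is_small_letter(c) ? (char) (c - 'a' + 'A') : c;
}
```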
Characters are internally represented numerically, mostly using one byte per character.
If there are non-negative numbers only (a data type which is often indicated by some prefix like "unsigned"), then we have a very simple situation: bit_i (counting from the last, least significant bit, which is bit_0) has value 2^i. Using unsigned integers is useful if one wants to have the maximum range, or for the sake of correctness: why should one use numbers that can be negative for quantities which for some reason must always be non-negative?
If one also needs negative numbers, then typically about half of the possibilities are reserved for negative numbers. Actually, this is mostly the default. There are various ways of achieving this.
The first idea one might come up with is to use the leading bit as a sign bit. So, if this bit is 0, the number is positive, otherwise negative. For 1-byte numbers, we thus would have 00100111 = 39, and 10100111 = -39. Computation with these numbers is (for the computer) slightly more complicated than with the other formats. Another disadvantage is that there are two representations of the number 0: 10000000 = 00000000 = 0.
The most widely used method for realizing negative numbers is the so-called two's complement. In this case, for numbers with b bits, the lower b - 1 bits count as usual, while the leading bit, bit b - 1, counts as -2^{b - 1}. So, for b = 8, 10000000 = -128, 10000001 = -127, ..., 11111111 = -1, 00000000 = 0, 00000001 = 1, ..., 01111110 = 126, 01111111 = 127. The disadvantage of this construction is that it is asymmetric: there is a negative number (-128) without a positive counterpart. The main advantage is that at the bit level the subtraction x - y can be performed as x + (-y). The bit pattern of a number -y can be obtained from the bit pattern of the number y using that -y = (-2^b + ((2^b - 1) - y)) + 1. Modulo 2^b the term -2^b vanishes, and (2^b - 1) - y is the number that has ones where y has zeroes and vice versa; so -y is obtained by flipping all bits of y and then adding 1. This implies that when using two's complement there is no need for special subtraction hardware: 12 - 9 = 00001100 - 00001001 = 00001100 + (11110110 + 00000001) = 00001100 + 11110111 = 00000011 = 3. Notice that in the last addition there is a carry out of the leading bit, which must be ignored to get the correct result. The case that the second number is larger than the first is also treated correctly: 12 - 17 = 00001100 - 00010001 = 00001100 + (11101110 + 00000001) = 00001100 + 11101111 = 11111011 = -128 + 64 + 32 + 16 + 8 + 2 + 1 = -5.
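These rules can be checked with unsigned 8-bit arithmetic in C. A sketch with hypothetical names: twos_value() computes the value of a bit pattern by letting bit 7 count as -128, negate() flips all bits and adds 1, and subtract() performs x - y as x + (-y). Converting a result back to uint8_t reduces it modulo 256, which is exactly the ignored carry out of the leading bit.

```c
#include <stdint.h>

/* Value of an 8-bit two's complement pattern: bits 6 ... 0 count as usual,
   bit 7 counts as -2^7 = -128. */
int twos_value(uint8_t bits) { return (bits & 0x7F) - (bits & 0x80); }

/* Bit pattern of -y: flip all bits, then add 1 (modulo 2^8). */
uint8_t negate(uint8_t y) { return (uint8_t) (~y + 1); }

/* x - y computed as x + (-y); the carry out of bit 7 is dropped. */
uint8_t subtract(uint8_t x, uint8_t y) { return (uint8_t) (x + negate(y)); }
```

For example, negate(9) gives 11110111, subtract(12, 9) gives 00000011 and subtract(12, 17) gives 11111011, whose value is -5, as in the text.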
A third way of realizing negative numbers is the so-called excess representation. In this case, all numbers are shifted by a fixed constant, which is called the offset. This means that for an offset c a number x gets the bit pattern belonging to x + c. For example, for 8-bit numbers, the offset might be 2^7 - 1 = 127. Then we get -127 = 00000000, -126 = 00000001, ..., 0 = 01111111, 1 = 10000000, 2 = 10000001, ..., 127 = 11111110, 128 = 11111111. This representation is clumsy for performing additions and subtractions, but it is used for the exponents of floating-point numbers.
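A sketch of the excess representation with offset 127 in C; the names are hypothetical. It also shows why addition is clumsy: adding two encoded values counts the offset twice, so the offset must be subtracted once more.

```c
#include <stdint.h>

#define EXCESS_OFFSET 127   /* the offset from the example above */

/* x in -127 ... 128 gets the bit pattern of x + 127. */
uint8_t excess_encode(int x) { return (uint8_t) (x + EXCESS_OFFSET); }
int excess_decode(uint8_t bits) { return (int) bits - EXCESS_OFFSET; }

/* (x + c) + (y + c) - c = (x + y) + c, assuming x + y stays in range. */
uint8_t excess_add(uint8_t a, uint8_t b) { return (uint8_t) (a + b - EXCESS_OFFSET); }
```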
Integral numbers can be mapped onto a bit pattern in various ways and can have various lengths.
In single-precision, that is using 32 bits for a number, a standard format is to divide the bits as [s | e_7 ... e_0 | m_22 ... m_0]. Here s is the sign bit, the bits e_i give the exponent and the bits m_i the mantissa. For double-precision numbers, that is, using 64 bits for a number, the most common lay-out is [s | e_10 ... e_0 | m_51 ... m_0]. We see that double-precision numbers have a somewhat larger range (using 11 instead of 8 bits for the exponent) and a much larger precision (using 52 bits instead of 23 for the mantissa). In any numerical computation, one should be aware that on a computer there are no numbers in the mathematical sense: integers can overflow and underflow, and when working with floating-point numbers one must even take inaccuracies into account: only with luck one will find that (a + b) * (a - b) - a^2 + b^2 = 0.
How does it look exactly? The number is written as a binary number of the form 1.xxxx * 2^yyy, preceded by the sign bit. The leading 1 is omitted. So, -49.3125 = -1 * (32 + 16 + 1 + 1/4 + 1/16) = -1 * 110001.0101 = -1 * 1.100010101 * 2^5. So, the sign bit is 1, the exponent is 5 and the mantissa (without the leading one) is 100010101. The exponent is given with an offset of 127 (1023 for double precision), so 5 gets the bit pattern of 132: 10000100. The complete 32-bit number then looks like: [1 | 10000100 | 10001010100000000000000]; of course the separators are not there, all bits are packed together.
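The three fields can be extracted in C by copying the bits of a float into a 32-bit integer. This sketch assumes that float is an IEEE-754 single-precision number, as on practically all current hardware; split_float and float_fields are hypothetical names. For -49.3125f it gives sign 1, exponent 132 (that is, 5 + 127) and mantissa 10001010100000000000000, which is 454000 in hexadecimal.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float is assumed to have 32 bits");

typedef struct { unsigned sign, exponent, mantissa; } float_fields;

float_fields split_float(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);      /* copy the bit pattern, no conversion */
  float_fields r;
  r.sign = bits >> 31;                 /* bit 31 */
  r.exponent = (bits >> 23) & 0xFFu;   /* bits 30 ... 23, with offset 127 */
  r.mantissa = bits & 0x7FFFFFu;       /* bits 22 ... 0, leading 1 omitted */
  return r;
}
```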
Because 10 is not a power of 2, there are many numbers which can be written as a short decimal fraction but cannot be written exactly as a binary fraction. The number 0.1 is an example. 0.1 = 3/32 * 16/15 = (1/16 + 1/32) * sum_{0 <= i} 1 / 16^i = 1/16 + 1/32 + 1/256 + 1/512 + 1/4096 + 1/8192 + ... . So, the binary representation of 0.1 is 0.00011001100110011001100 ... . Shifting the point so that there is a leading 1 gives 1.1001100110011001100 ... * 2^{-4}. So, as a floating-point number we get [0 | 01111011 | 10011001100110011001100]. This is not exactly the same value as the decimal value 0.1! When converting a decimal fraction to a floating-point number, some rounding is performed. Actually, the rounding is not performed by simply truncating. Instead the last bit is obtained by a real rounding process (but a different approach may be chosen on another processor). This means that 0.1 is not converted as indicated above but to [0 | 01111011 | 10011001100110011001101]. This is slightly more accurate, but it does not solve the fundamental problem: inaccuracy not only arises as a result of underflow and overflow, but even because of an incompatibility of representation. For example, storing f = 0.1 in single precision, setting g = 10 and evaluating f * g - 1 in double precision gives 1.490... * 10^{-8} (but a different value may be found on another processor; in pure single-precision arithmetic the product happens to round to exactly 1).
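The rounded representation of 0.1 can be observed directly. A sketch in C, assuming IEEE-754 single precision for float and double precision for double; the names are hypothetical. float_bits(0.1f) gives 3DCCCCCD in hexadecimal, which is [0 | 01111011 | 10011001100110011001101], the rounded pattern from the text. product_error() keeps f in single precision but evaluates f * g - 1 in double precision; because the stored f equals 13421773 * 2^{-27}, the result is exactly 2^{-26} ~= 1.490 * 10^{-8}.

```c
#include <stdint.h>
#include <string.h>

/* The 32-bit pattern of a float. */
uint32_t float_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* f * g - 1 with f = 0.1 stored in single precision and g = 10;
   f is converted to double before the multiplication. */
double product_error(void) {
  float f = 0.1f;
  double g = 10.0;
  return f * g - 1.0;
}
```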
In an earlier section an algorithm is given for computing the binary expansion of a given positive integral number. But how do we compute the bits of a number smaller than one? It was also pointed out that there are two approaches for computing the digits of a number with respect to a certain radix: either starting from the least significant, or starting from the most significant digit. For the integral part the first approach is most suited, because a priori the number of digits is not known. For a similar, but even more serious reason, the second approach is most suitable for computing the digits of the fractional part. This is not a spectacular observation, as this is precisely what we are doing when computing the decimal expansion of 4 / 7. In the following we focus on computing the binary digits, the bits, of the fractional part y of a number x. That is, we must find values d_i, i >= 1, with d_i in {0, 1}, so that sum_{i >= 1} d_i * 2^{-i} = sum_{i >= 1} d_i / 2^i = y.
The fact that the d_i can only be 0 or 1 facilitates the computation considerably. For some numbers this sum will be finite, but in general the number of terms is infinite, as for 0.1. This makes clear that there is no way of starting with the least significant bit. The following two rules are the basis for computing the d_i:
As an example, we compute the first 10 bits of y = 0.1.
y_0 = y = 0.1 < 1 / 2 --> d_1 = 0
y_1 = y_0 - d_1 / 2 = 0.1 < 1 / 4 --> d_2 = 0
y_2 = y_1 - d_2 / 4 = 0.1 < 1 / 8 --> d_3 = 0
y_3 = y_2 - d_3 / 8 = 0.1 >= 1 / 16 --> d_4 = 1
y_4 = y_3 - d_4 / 16 = 0.0375 >= 1 / 32 --> d_5 = 1
y_5 = y_4 - d_5 / 32 = 0.00625 < 1 / 64 --> d_6 = 0
y_6 = y_5 - d_6 / 64 = 0.00625 < 1 / 128 --> d_7 = 0
y_7 = y_6 - d_7 / 128 = 0.00625 >= 1 / 256 --> d_8 = 1
y_8 = y_7 - d_8 / 256 = 0.00234375 >= 1 / 512 --> d_9 = 1
y_9 = y_8 - d_9 / 512 = 0.000390625 < 1 / 1024 --> d_10 = 0
The same pattern emerges as in the expansion of 0.1 above. Knowing that the pattern of zeros and ones must be periodic (as it is for every rational number), it is not hard to guess the pattern. Once the pattern has been guessed, the correctness of the guess can be verified. With the appropriate notation, this allows us to compute an infinite pattern in finite time.
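The bit-by-bit computation can be carried out with exact integer arithmetic. A sketch in C, under two assumptions of our own: y is given as a fraction p / q with 0 <= p < q, and instead of comparing y_{i - 1} with 1 / 2^i the remainder is doubled in every step and compared with 1, which is the same test multiplied by 2^i. fraction_bits is a hypothetical name; for large q the doubling 2 * p must not overflow.

```c
/* Writes the bits d_1 ... d_k of y = p / q (0 <= p < q) as '0'/'1'
   characters into out, which needs room for k + 1 characters. */
void fraction_bits(long p, long q, int k, char* out) {
  for (int i = 0; i < k; i++) {
    p = 2 * p;             /* p / q is now 2^{i+1} * y_i */
    if (p >= q) {          /* y_i >= 1 / 2^{i+1}, so d_{i+1} = 1 */
      out[i] = '1';
      p -= q;              /* subtract d_{i+1} / 2^{i+1}, scaled by 2^{i+1} */
    } else {
      out[i] = '0';        /* d_{i+1} = 0 */
    }
  }
  out[k] = '\0';
}
```

For example, fraction_bits(1, 10, 10, buf) produces "0001100110", the bits d_1 ... d_10 of 0.1 computed above.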
How about zero? There is no way of shifting its binary representation so that a leading 1 occurs. One solution is to work with a leading 0 instead of a leading 1. However, this wastes a bit (because for all numbers except zero, the first bit then would be a 1). Therefore, it has been decided (at least according to the IEEE-P754 standard) to have a special representation for zero: [0 | 00000000 | 00000000000000000000000]. In principle the value of this number would be 2^{-127} ~= 5.9 * 10^{-39}, which is small, but not zero. Because zero is such an important number, this representation is treated in a special way so that at least this number can be represented exactly. Actually, even [1 | 00000000 | 00000000000000000000000] is exactly zero. Having +0 and -0 facilitates the product routines working with these numbers (-7 * +0 = -0 and -7 * -0 = +0).
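The two zeroes can be inspected in C. A sketch assuming IEEE-754 single precision; float_bits and is_negative_zero are hypothetical names, while signbit() is the standard macro from <math.h>. +0 has the all-zero bit pattern, -0 differs only in the sign bit, and the two compare equal with ==.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The 32-bit pattern of a float. */
uint32_t float_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* True for -0: it compares equal to zero, but its sign bit is set. */
bool is_negative_zero(float f) {
  return f == 0.0f && signbit(f);
}
```

For example, -7.0f * 0.0f yields -0 and -7.0f * -0.0f yields +0, matching the sign rules mentioned above.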
Internally real numbers are maintained as floating point numbers. The available bits are divided over a sign bit, an exponent and the mantissa. The limited accuracy leads to rounding errors.
In fact, it is not hard to perform exact computations with fractional numbers. There is no need to expand 0.1 = 1/10 in a binary way: 4 / 7 * 1 / 10 = 4 / 70. This is easy. Square (and other) roots are harder, but even with these, computations can be performed in a symbolic way. However, many numbers, such as the so-called transcendental numbers, really cannot be represented in a convenient way. e and pi are two such numbers. Nevertheless, computers may be instructed to work even with these in an accurate way (given that a scheme is provided for evaluating an arbitrary number of digits): at any stage of the computation only those digits are evaluated that have an impact on the output. This means that the computation time becomes a function of the accuracy required to guarantee a correct answer.
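Exact computation with fractions needs little more than integer arithmetic and a greatest common divisor. A minimal sketch in C with a hypothetical fraction type; it reduces every result, so 4 / 7 * 1 / 10 comes out as 2 / 35 rather than 4 / 70, and it ignores the possibility of overflow in the products.

```c
/* Value num / den with den > 0, always in lowest terms. */
typedef struct { long num, den; } fraction;

/* Greatest common divisor by Euclid's algorithm (non-negative result). */
static long gcd(long a, long b) {
  while (b != 0) {
    long t = a % b;
    a = b;
    b = t;
  }
  return a < 0 ? -a : a;
}

/* Builds num / den in lowest terms; den must not be 0. */
fraction make_fraction(long num, long den) {
  long g = gcd(num, den);
  if (den < 0) g = -g;              /* keep the denominator positive */
  fraction r = { num / g, den / g };
  return r;
}

fraction frac_mul(fraction a, fraction b) {
  return make_fraction(a.num * b.num, a.den * b.den);
}

fraction frac_add(fraction a, fraction b) {
  return make_fraction(a.num * b.den + b.num * a.den, a.den * b.den);
}
```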
For the computer, bits are bits: it cannot tell whether some byte 11110110 stands for the letter "ö" (in the Latin-1 encoding), for an unsigned byte with value 246, for a signed byte in two's complement representation with value -10, for some byte inside an integer or for some byte inside a floating-point number. The higher levels should therefore make sure that data in the memory are accessed in the right way. In some programming languages this is enforced very strictly; in the language C almost anything is possible, and correctly handling the memory access is the responsibility of the programmer.
A transistor has three connections called base, collector and emitter. In the most common type of transistor, the collector is connected to plus and the emitter to minus. As long as no voltage is applied to the base, almost no current will run from the collector to the emitter: the switch is open (off). If a positive voltage is applied to the base, a small current runs from the base to the emitter. Because of the semiconducting properties of the transistor, this closes the switch (turns it on), so that a much larger (about 100-fold) current can run from the collector to the emitter.
The sketched properties of the transistor can be used to build basic gates. All logical functions can be built from NOT and NOR (= not-or) gates and also from NOT and NAND (= not-and) gates. Each of these three gates can be constructed using at most two transistors and one resistor.
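In fact, NAND alone suffices, because NOT x can be obtained as x NAND x. This can be checked in software. A sketch in C that models gates as functions on the bit values 0 and 1 (all names are hypothetical): NOT, AND, OR and NOR are expressed in terms of NAND only, using x || y == !(!x && !y).

```c
/* Gates on the values 0 and 1, all built from nand(). */
int nand(int x, int y)  { return !(x && y); }
int not_g(int x)        { return nand(x, x); }
int and_g(int x, int y) { return not_g(nand(x, y)); }
int or_g(int x, int y)  { return nand(not_g(x), not_g(y)); }  /* DeMorgan */
int nor_g(int x, int y) { return not_g(or_g(x, y)); }
```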
In digital computers all computation is performed digitally. Arithmetic operations can be defined in terms of logical gates, but dedicated circuits require fewer transistors. Gates are packed together in a single chip, also called integrated circuit or IC. The simplest ICs consist of just a few gates, but there are also ICs with millions of gates, being capable of performing 64-bit arithmetic or of storing 2 GB of data.
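As an illustration of arithmetic defined in terms of logical gates, here is a sketch of an 8-bit ripple-carry adder in C, built from one-bit full adders that use only and, or and exor. This is a common textbook construction, not necessarily the circuit used in real processors, which typically use faster carry schemes; the names are hypothetical.

```c
#include <stdint.h>

/* One-bit full adder: adds bits a and b and the incoming carry c. */
static void full_adder(unsigned a, unsigned b, unsigned c,
                       unsigned* sum, unsigned* carry) {
  *sum = a ^ b ^ c;
  *carry = (a & b) | (c & (a ^ b));
}

/* Eight full adders in a chain: the carry of bit i feeds bit i + 1.
   The carry out of bit 7 is discarded, so the result is modulo 256. */
uint8_t ripple_add(uint8_t x, uint8_t y) {
  unsigned carry = 0, result = 0;
  for (int i = 0; i < 8; i++) {
    unsigned s;
    full_adder((x >> i) & 1u, (y >> i) & 1u, carry, &s, &carry);
    result |= s << i;
  }
  return (uint8_t) result;
}
```

Because the carry out of bit 7 is dropped, this adder also performs the two's complement subtraction 12 + 11110111 = 3 from the earlier section.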
It is also at the conventional machine level that the procedure calling mechanism is implemented. Calling a procedure means making a jump to another position in the program, passing some arguments, allocating local variables, computing, and returning to the calling instance, possibly passing back some value. After returning, the computation should go on as if nothing has happened: the local variables of the calling instance must be available again. This is achieved by managing the data on a stack. A stack is a last-in-first-out data structure: data are added and removed at its top.
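A minimal array-based stack in C shows the last-in-first-out behaviour. This is a sketch with hypothetical names and an arbitrary fixed capacity; unlike the program stack described here, it grows upwards in the array.

```c
#include <stdbool.h>

#define STACK_CAPACITY 100   /* arbitrary fixed size for this sketch */

typedef struct {
  int data[STACK_CAPACITY];
  int top;                   /* number of stored values; data[top - 1] is the top */
} int_stack;

void stack_init(int_stack* s) { s->top = 0; }

/* Adds v on top; returns false on a stack overflow. */
bool stack_push(int_stack* s, int v) {
  if (s->top == STACK_CAPACITY) return false;
  s->data[s->top++] = v;
  return true;
}

/* Removes the top value into *v; returns false if the stack is empty. */
bool stack_pop(int_stack* s, int* v) {
  if (s->top == 0) return false;
  *v = s->data[--s->top];
  return true;
}
```

After pushing 1, 2 and 3, the first pop returns 3: the value added last is removed first.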
When calling a subroutine, a new workspace, called a stack frame, is created on top of the stack. After returning, this space is made available again (there is no need to explicitly destroy the stored information). The address of the top of the stack is kept in a special register called the stack pointer, abbreviated SP. Addressing with respect to the stack pointer is clumsy, because in the course of a subroutine new variables may be allocated, changing the value of SP. Therefore it is common to use a second register, abbreviated LB, as a fixed reference point for all parameters and variables in a subroutine. The stack may either grow from low to high memory addresses (upwards) or from high to low memory addresses (downwards). We assume the latter, which is most common.
When calling a procedure, all parameters are written on the stack in the order of their occurrence. Then the return address is added, and then the current value of LB. After these additions, the assignment LB = SP is performed, giving LB its new value. Hereafter the procedure is entered, and space for the local variables is allocated. Thus, on a downwards growing stack, the parameters have a positive and the local variables a negative offset with respect to LB. When returning from a subroutine, execution continues at the address stored at address LB + 4 (assuming that each value on the stack takes four bytes: the return address was pushed just before the old value of LB, so it lies directly above it), and LB is restored by assigning it the value stored at address LB itself. The above is only one of several possibilities: LB may also point to the bottom or the top of the stack frame.
So, every procedure call implies a certain amount of copying and management of SP and LB. The speed with which this is performed is important for the speed with which a well-structured program, delegating subtasks to subroutines, can be executed. Fortunately the amount of copying is proportional to the number of parameters and local variables of the called procedure, not of the calling procedure. Therefore, it may be assumed that calling a small function with one or two parameters and one or two local variables causes almost no delay. Experiments confirm this. Calling a subroutine with many parameters is more expensive, but such a subroutine mostly also contains many instructions. So, eliminating subroutine calls at the expense of readability, by copying the code of the subroutine to the position where it is called (a process called inlining), will normally save at most a few percent of the execution time. Inlining may even make the execution slower: if a procedure is called from several places, then inlining makes the code longer. For large programs, this means that a smaller fraction of the program fits in the program cache. Because a cache miss is far more expensive than a few operations on the program stack, this may have a negative impact.
The described calling mechanism is illustrated by the following simple program:
#include <stdio.h>
#include <stdlib.h>
void init(int* a, int n) {
  int i;
  for (i = 0; i < n; i++)
    a[i] = (i * 4) % n; }
int count(int* a, int l, int h, int x) {
  // Count the number of occurrences of x in a[l] ... a[h - 1]; requires h > l
  int m;
  if (h == l + 1) // Subarray has length 1
    return (a[l] == x ? 1 : 0);
  else { // Recurse in each half of the subarray
    m = (l + h) / 2;
    return count(a, l, m, x) + count(a, m, h, x); } }
int main() {
  int n, x;
  int* a;
  printf("\nGive the value of n >>> ");
  if (scanf("%d", &n) != 1 || n < 1) // count() needs a non-empty array
    return 1;
  a = (int*) malloc(n * sizeof(int));
  if (a == NULL) // Allocation failed
    return 1;
  init(a, n);
  printf("Give the value of x >>> ");
  if (scanf("%d", &x) != 1) {
    free(a);
    return 1; }
  printf("x occurs %d times in a[]\n\n", count(a, 0, n, x));
  free(a);
  return 0; }
The program asks for n and reads its value from input. An array a[] of length n is allocated and initialized with the values (i * 4) % n. The value of x is read from input, and then the subroutine count() is called, which computes the number of occurrences of x in a[]. The given implementation is recursive, exploiting the following observations:
If n = 7, count() is called from main() with l = 0 and h = 7. From there a call is made with l = 0 and h = 3. The following call has l = 0 and h = 1. This does not result in further calls; the procedure is left, returning some value. Back at the call with l = 0 and h = 3, the second call to count() is made. It has l = 1 and h = 3. In total count() is called 13 times. In general, for n >= 1, the number of calls equals 2 * n - 1: the calls form a binary tree with one leaf for each array element, and a binary tree with n leaves has n - 1 inner nodes. Notice that recursion results in a depth-first execution order. This means that when drawing the execution schedule in a tree-like fashion with the root at the top, all operations in one branch of the tree are performed before any instruction of the next branch is executed.
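The number of calls and the maximal number of simultaneously active stack frames can be measured by instrumenting count(). A sketch in C; count_traced and the global counters calls, frames and max_frames are hypothetical additions for illustration. For n = 7, with a[] initialized as in the program above, it reports 13 calls and at most 4 frames of count_traced() on the stack at the same time, that is, recursion depth 3.

```c
static int calls = 0;       /* number of calls so far */
static int frames = 0;      /* frames of count_traced() currently on the stack */
static int max_frames = 0;  /* largest value of frames observed */

/* Same recursion as count(): occurrences of x in a[l] ... a[h - 1], h > l. */
int count_traced(const int* a, int l, int h, int x) {
  int result;
  calls++;
  frames++;                                  /* a new stack frame is in use */
  if (frames > max_frames) max_frames = frames;
  if (h == l + 1)
    result = (a[l] == x ? 1 : 0);
  else {
    int m = (l + h) / 2;
    result = count_traced(a, l, m, x) + count_traced(a, m, h, x);
  }
  frames--;                                  /* the frame is released on return */
  return result;
}
```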
We continue to assume that the stack grows downwards, and that the smallest address of the stack frame of main() is 880. According to our basic considerations, each stack frame of count() has a size of 7 words: one for each of its four parameters and its local variable m, plus one each for the return address and LB. With four bytes per word, during call 1 the stack frame spans the 28 bytes ranging from address 852 to 879. During call 2 it ranges from 824 to 851 and during call 3 from 796 to 823. Call 3 does not recurse deeper, so the recursion goes one level up before performing call 4. Therefore, during call 4 the same space is used as during call 3. During call 5 it goes one level deeper again. Before performing call 6, the recursion first returns from call 5.
In this context the notion of recursion depth is important. A program is said to have recursion depth d if the tree corresponding to the calling structure of its execution has depth d. The depth of a tree is the maximum distance of any leaf, a node at the bottom of the tree, from the root. For a program with recursion depth d, up to d + 1 stack frames must be allocated on the program stack. Before execution of a program starts, a fixed amount of space is reserved for the program stack, and SP should never become smaller than the first address of this space. If this happens nevertheless, the program execution is interrupted with a message like "stack overflow". This is the consequence if a program contains an infinite recursion, for example because a terminal case without further recursion was forgotten, so that one stack frame after the other is allocated. However, this may even happen for programs which are logically correct, if the recursion is very deep. With the given implementation of count(), the recursion depth of the above program is logarithmic in n, and this will not be a problem, but the following is also correct:
int count(int* a, int n, int x) {
// Count the number of occurrences of x in a[0] ... a[n - 1]
if (n == 0) // Array has length 0
return 0;
else // Recurse for first n - 1 elements of array
return count(a, n - 1, x) + (a[n - 1] == x ? 1 : 0); }
Here the recursion depth is linear in n: one stack frame is needed per
element. For large n (how large depends on the space reserved for the
stack) this will lead to a crash.
Most instructions of the operating system level, such as those for arithmetic and logic, are already present at the microprogramming level. These instructions might be handed down level-by-level, but for efficiency reasons they are directly interpreted by the microprogram. In addition there are specific operating-system machine-level instructions which do not exist at the lower levels. These are instructions which have to do with the management of the computer.
The operating system manages the processes running on the computer. This is where the time-sharing mechanism is realized. Time sharing means that many processes can run on a single computer, each getting the impression of being served continuously. This is done by allocating short time slots to each of the processes depending on their priority and need for computing power. Because several processes run in a time-shared fashion, the running time of a program cannot be accurately determined by measuring the clock time at the beginning and end of its execution.
The operating system is also in charge of I/O and the secondary memory management. If the data of a process do not fit into the main memory, then part of the data are paged out onto the hard disk and paged in again when needed. This paging is coordinated by the operating system, which decides which data to keep and which data to overwrite with new data (after writing the old data back if they were changed). Virtual memory is the additional space which becomes available through this paging mechanism: on a computer with 1 GB of main memory, one may work as if there were 2 or 4 GB of memory, at the price of a possibly large slowdown.
All data have a logical and a physical address. The logical address is the one used inside the programs. The physical address gives the actual address in the main memory or on the hard disk. This distinction is convenient because it gives programs a contiguous view of the memory, even though contiguity cannot always be assured physically. Memory is allocated and deallocated dynamically in the course of the program, and other processes are using memory as well, so the memory may be fragmented. The operating system maintains a table for looking up the physical addresses of the pages, whether they are in the main memory or on the hard disk.
A compiler translates programs written in high-level languages to assembly language. For each combination of a high-level language and an assembly language there must be a different compiler. Writing compilers which generate efficient assembly code takes time. Therefore a program written in a popular language and run on a processor with a widely used assembly language will often run faster than a program written in an uncommon language and run on some obscure processor. In the latter case the program will be translated in a basic way that does not fully exploit the features of the machine.
The assembly code is translated by the assembler to machine code. This is a rather simple process because assembly instructions and machine instructions are in one-to-one correspondence. For each assembly-language instruction its operation code must be looked up in a table. Slightly harder is the translation of the symbolic addresses (labels) which are used in conditional and unconditional jump statements. In the machine code all these must be replaced by addresses. Because a jump may refer to a label that appears only later in the code, the assembler usually makes two passes over the assembly code. In the first pass all labels are collected and stored in a table together with the address of the following instruction. In the second pass this table is used to substitute addresses for all labels.
From a practical point of view, except for system programmers and hardware designers, the lower machine levels are of little importance. Most people with a degree in computer science work with software in one way or another. This software is written in some high-level language. Often it is even composed of standardized and reusable components which are glued together by a few lines of one's own code. The assembly-language level is nevertheless interesting and worth a closer look for several reasons:
There is one more reason why one would like to have some understanding of assembly language. In most programs there are a few lines in which 95% of the time is spent, for example, the innermost loop in a matrix product computation. If speed is of great importance for such a program, then in a final stage one might even inspect the generated assembly code. As we will see, the optimization tools which are built into the compiler can do a lot, but they apply general rules and do not understand what is going on. So, it might be possible to eliminate a few lines or to arrange them in a better way.
The above discussion makes clear that it would be a waste of time to learn all about a particular assembly language. On the other hand, because most of these languages are structured in more or less the same way, knowing something about any one of them allows one to understand another with little effort. Below we consider some examples from the SPARC 7 assembly language. These are treated with the above considerations in mind: it is not necessary to understand every single detail, as long as we are able to figure out what is going on. We consider some simple C programs and the corresponding assembly code, both after a simple translation and using optimization.
Programming assembly code is tedious and of limited use. Understanding assembly code provides insight and offers opportunities for final optimizations.
#include <stdio.h>
int main() {
int i, x, y, z;
x = 123456789;
printf("Give x >>> ");
scanf("%d", &x);
printf("Give y >>> ");
scanf("%d", &y);
z = 32 * (x + y);
printf("z = %d\n", z);
return 1; }
Using the option -S when compiling with gcc, the compiler
generates the following assembly code instead of an executable:
.file "assembly_ex1.c"
.section ".rodata"
.align 8
.LLC0:
.asciz "Give x >>> "
.align 8
.LLC1:
.asciz "%d"
.align 8
.LLC2:
.asciz "Give y >>> "
.align 8
.LLC3:
.asciz "z = %d\n"
.section ".text"
.align 4
.global main
.type main, #function
.proc 04
main:
!#PROLOGUE# 0
save %sp, -128, %sp
!#PROLOGUE# 1
sethi %hi(123456512), %g1
or %g1, 277, %g1
st %g1, [%fp-24]
sethi %hi(.LLC0), %g1
or %g1, %lo(.LLC0), %o0
call printf, 0
nop
add %fp, -24, %o5
sethi %hi(.LLC1), %g1
or %g1, %lo(.LLC1), %o0
mov %o5, %o1
call scanf, 0
nop
sethi %hi(.LLC2), %g1
or %g1, %lo(.LLC2), %o0
call printf, 0
nop
add %fp, -28, %o5
sethi %hi(.LLC1), %g1
or %g1, %lo(.LLC1), %o0
mov %o5, %o1
call scanf, 0
nop
ld [%fp-24], %o5
ld [%fp-28], %g1
add %o5, %g1, %g1
sll %g1, 5, %g1
st %g1, [%fp-32]
sethi %hi(.LLC3), %g1
or %g1, %lo(.LLC3), %o0
ld [%fp-32], %o1
call printf, 0
nop
mov 1, %g1
mov %g1, %i0
ret
restore
.size main, .-main
.ident "GCC: (GNU) 3.3.3"
In the first lines we find the format strings of the IO statements. Interestingly, the compiler has noticed that the two scanf statements have the same format: the label .LLC1 is used twice.
Every procedure, including main, starts with a save and ends with a restore. The save preserves the state of the calling procedure, and the restore ultimately brings it back.
Corresponding to the instruction "x = 123456789;" we find
sethi %hi(123456512), %g1
or %g1, 277, %g1
st %g1, [%fp-24]
Here we encounter the various registers. In total there are 32 integer registers, eight of each of four types: %g0, ..., %g7 (global), %o0, ..., %o7 (out), %l0, ..., %l7 (local) and %i0, ..., %i7 (in). "sethi" sets the upper 22 bits of its second argument, a register, to the value of its first argument and clears the lower 10 bits. "or" performs a bitwise logical or of its first two arguments and stores the result in the third argument. The number 123456789 is split into 123456512 (which has its last 10 bits equal to zero) and 277, and assigned in two steps to register %g1. Then the value of %g1 is stored at memory position %fp-24. %hi() is not the name of a register but the operation of extracting the highest 22 bits of a number. There is an analogous operation %lo(), which extracts the lowest 10 bits.
This clumsy construction is necessary for several reasons: only the values of registers can be copied into memory, therefore the number must first be assigned to a register. Why can 123456789 not be assigned directly to %g1? The reason is that any instruction with all its arguments has to fit in 32 bits. 32 bits are not much!
The IO statements all translate in a similar way. For "printf("Give x >>> ");" we find
sethi %hi(.LLC0), %g1
or %g1, %lo(.LLC0), %o0
call printf, 0
The first two lines are now easy to understand: the address .LLC0 is
copied in two steps into %o0. On SPARC the first arguments of a
procedure are passed in the registers %o0, %o1, ..., so %o0 holds the
argument of printf, which is called in the third line.
Reading a number is more complicated because also the address of the variable has to be passed to the subroutine. For "scanf("%d", &x);" we find
add %fp, -24, %o5
sethi %hi(.LLC1), %g1
or %g1, %lo(.LLC1), %o0
mov %o5, %o1
call scanf, 0
The value of %fp, the frame pointer, which is the reference point of
the stack frame of the current procedure, is added to -24, and the
result, the address of x, is stored in %o5. In two steps, via %g1, the
address .LLC1 is copied to %o0. Then %o5 is moved to %o1; it is not
clear why %o1 was not used directly in the addition. Finally %o0 and
%o1 are passed as arguments to scanf.
The numerical computation "z = 32 * (x + y);" shows how variables are handled:
ld [%fp-24], %o5
ld [%fp-28], %g1
add %o5, %g1, %g1
sll %g1, 5, %g1
st %g1, [%fp-32]
First x is loaded into %o5 and y into %g1. Then these two are added
into %g1. The compiler is clever and has noticed that 32 = 2^5.
Therefore it does not use multiplication, which requires calling a
subroutine, but uses the operation sll which is a mnemonic for "shift
left logical". So, %g1 is taken, shifted 5 positions leftwards and then
stored again in %g1. This is stored at address %fp-32, which
corresponds to z.
The final instruction "return 1;" is translated as follows:
mov 1, %g1
mov %g1, %i0
ret
Here we see that a small value can be copied directly into a register.
It is then copied into %i0, the register that holds the return value of
the procedure; after the restore, the calling procedure finds it in %o0.
Each C instruction translates to several lines of assembly code. Particular complications arise because an entire assembly instruction has to be packed in 32 bits.
#include <stdio.h>
#include <stdlib.h> /* for malloc */
#define MAXINT 0X7FFFFFFF
void initialize(int* a, int n, int c) {
int i;
for (i = 0; i < n; i++)
a[i] = (c * i) % n; }
void swap(int* a, int* b, int n) {
int i, x;
for (i = 0; i < n; i++) {
x = a[i]; a[i] = b[i]; b[i] = x; } }
int span(int* a, int n) {
int i, m, M;
m = MAXINT;
for (i = 0; i < n; i++)
if (a[i] < m)
m = a[i];
M = 0;
for (i = 0; i < n; i++)
if (a[i] > M)
M = a[i];
return M - m; }
int main() {
int i, s, n;
int* a;
int* b;
printf("\nGive n >>> ");
scanf("%d", &n);
a = (int*) malloc(n * sizeof(int));
b = (int*) malloc(n * sizeof(int));
initialize(a, n, 173);
initialize(b, n, 93);
swap(a, b, n);
s = span(a, n);
printf("Span of a[] = %1d\n\n", s);
return 1; }
Typing "gcc assembly_ex2.c -S" generates the following assembly code:
.file "assembly_ex2.c"
.global .umul
.global .rem
.section ".text"
.align 4
.global initialize
.type initialize, #function
.proc 020
initialize:
!#PROLOGUE# 0
save %sp, -120, %sp
!#PROLOGUE# 1
st %i0, [%fp+68]
st %i1, [%fp+72]
st %i2, [%fp+76]
st %g0, [%fp-20]
.LL2:
ld [%fp-20], %o5
ld [%fp+72], %g1
cmp %o5, %g1
bl .LL5
nop
b .LL1
nop
.LL5:
ld [%fp-20], %g1
sll %g1, 2, %o5
ld [%fp+68], %g1
add %o5, %g1, %l0
ld [%fp+76], %o0
ld [%fp-20], %o1
call .umul, 0
nop
mov %o0, %g1
mov %g1, %o0
ld [%fp+72], %o1
call .rem, 0
nop
mov %o0, %g1
st %g1, [%l0]
ld [%fp-20], %g1
add %g1, 1, %g1
st %g1, [%fp-20]
b .LL2
nop
.LL1:
ret
restore
.size initialize, .-initialize
.align 4
.global swap
.type swap, #function
.proc 020
swap:
!#PROLOGUE# 0
save %sp, -120, %sp
!#PROLOGUE# 1
st %i0, [%fp+68]
st %i1, [%fp+72]
st %i2, [%fp+76]
st %g0, [%fp-20]
.LL7:
ld [%fp-20], %i5
ld [%fp+76], %g1
cmp %i5, %g1
bl .LL10
nop
b .LL6
nop
.LL10:
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %g1
ld [%g1], %g1
st %g1, [%fp-24]
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %i4
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+72], %g1
add %i5, %g1, %g1
ld [%g1], %g1
st %g1, [%i4]
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+72], %g1
add %i5, %g1, %i5
ld [%fp-24], %g1
st %g1, [%i5]
ld [%fp-20], %g1
add %g1, 1, %g1
st %g1, [%fp-20]
b .LL7
nop
.LL6:
ret
restore
.size swap, .-swap
.align 4
.global span
.type span, #function
.proc 04
span:
!#PROLOGUE# 0
save %sp, -128, %sp
!#PROLOGUE# 1
st %i0, [%fp+68]
st %i1, [%fp+72]
sethi %hi(2147482624), %g1
or %g1, 1023, %g1
st %g1, [%fp-24]
st %g0, [%fp-20]
.LL12:
ld [%fp-20], %i5
ld [%fp+72], %g1
cmp %i5, %g1
bl .LL15
nop
b .LL13
nop
.LL15:
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %g1
ld [%g1], %i5
ld [%fp-24], %g1
cmp %i5, %g1
bge .LL14
nop
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %g1
ld [%g1], %g1
st %g1, [%fp-24]
.LL14:
ld [%fp-20], %g1
add %g1, 1, %g1
st %g1, [%fp-20]
b .LL12
nop
.LL13:
st %g0, [%fp-28]
st %g0, [%fp-20]
.LL17:
ld [%fp-20], %i5
ld [%fp+72], %g1
cmp %i5, %g1
bl .LL20
nop
b .LL18
nop
.LL20:
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %g1
ld [%g1], %i5
ld [%fp-28], %g1
cmp %i5, %g1
ble .LL19
nop
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %g1
ld [%g1], %g1
st %g1, [%fp-28]
.LL19:
ld [%fp-20], %g1
add %g1, 1, %g1
st %g1, [%fp-20]
b .LL17
nop
.LL18:
ld [%fp-28], %i5
ld [%fp-24], %g1
sub %i5, %g1, %g1
mov %g1, %i0
ret
restore
.size span, .-span
.section ".rodata"
.align 8
.LLC0:
.asciz "\nGive n >>> "
.align 8
.LLC1:
.asciz "%d"
.align 8
.LLC2:
.asciz "Span of a[] = %1d\n\n"
.section ".text"
.align 4
.global main
.type main, #function
.proc 04
main:
!#PROLOGUE# 0
save %sp, -136, %sp
!#PROLOGUE# 1
sethi %hi(.LLC0), %g1
or %g1, %lo(.LLC0), %o0
call printf, 0
nop
add %fp, -28, %o5
sethi %hi(.LLC1), %g1
or %g1, %lo(.LLC1), %o0
mov %o5, %o1
call scanf, 0
nop
ld [%fp-28], %g1
sll %g1, 2, %g1
mov %g1, %o0
call malloc, 0
nop
mov %o0, %g1
st %g1, [%fp-32]
ld [%fp-28], %g1
sll %g1, 2, %g1
mov %g1, %o0
call malloc, 0
nop
mov %o0, %g1
st %g1, [%fp-36]
ld [%fp-32], %o0
ld [%fp-28], %o1
mov 173, %o2
call initialize, 0
nop
ld [%fp-36], %o0
ld [%fp-28], %o1
mov 800, %o2
call initialize, 0
nop
ld [%fp-32], %o0
ld [%fp-36], %o1
ld [%fp-28], %o2
call swap, 0
nop
ld [%fp-32], %o0
ld [%fp-28], %o1
call span, 0
nop
mov %o0, %g1
st %g1, [%fp-24]
sethi %hi(.LLC2), %g1
or %g1, %lo(.LLC2), %o0
ld [%fp-24], %o1
call printf, 0
nop
mov 1, %g1
mov %g1, %i0
ret
restore
.size main, .-main
.ident "GCC: (GNU) 3.3.3"
Here we really start to appreciate the conciseness of C: 39 lines of source code become 261 lines of assembly code!
The new program reveals the general format of procedures: they all start with a prologue. Here the save instruction decreases the stack pointer %sp by a certain amount, allocating space for the variables of the subroutine. After that, the parameters, which are passed in the registers %i0, %i1, ..., are stored in the stack frame at %fp+68, %fp+72, ..., from where they are loaded whenever they are needed.
The C instruction "for(i = 0; i < n; i++)" is translated into
st %g0, [%fp-20]
.LL2:
ld [%fp-20], %o5
ld [%fp+72], %g1
cmp %o5, %g1
bl .LL5
nop
b .LL1
nop
.LL5:
...
...
ld [%fp-20], %g1
add %g1, 1, %g1
st %g1, [%fp-20]
b .LL2
nop
.LL1:
First i, which is stored at address %fp-20, is initialized with the
value of %g0, a register which always reads as zero. Then this value of
i is loaded into %o5, and the value of n, which is stored at address
%fp+72, is loaded into %g1. These two values are compared with the cmp
instruction. bl is the mnemonic for "branch less". This means that when
in the comparison the first argument is smaller than the second,
execution continues at the given label, in this case .LL5. Otherwise
execution proceeds with the next line of code and comes to the
unconditional jump to label .LL1, jumping beyond the loop. At the end
of the loop body, i is loaded again into %g1, 1 is added to it, and the
new value is stored again. Then there is an unconditional jump to .LL2,
where the condition is tested again.
The numerical computation involving arrays "a[i] = (c * i) % n;" is turned into the following:
ld [%fp-20], %g1
sll %g1, 2, %o5
ld [%fp+68], %g1
add %o5, %g1, %l0
ld [%fp+76], %o0
ld [%fp-20], %o1
call .umul, 0
nop
mov %o0, %g1
mov %g1, %o0
ld [%fp+72], %o1
call .rem, 0
nop
mov %o0, %g1
st %g1, [%l0]
First i is loaded into %g1. Because each position of the array takes
four bytes, i is multiplied by four by shifting two positions to the
left. The result is stored in %o5. Then the starting address of a[],
which was stored at address %fp+68, is loaded into %g1. These two
values are added and the result is stored in %l0. This is the address
of a[i]. Now c and i are loaded into %o0 and %o1 and multiplied by the
subroutine .umul. Stupidly, the result, which stands in %o0, is moved
to %g1 and then %g1 is moved back to %o0. n is loaded into %o1 and the
modulo subroutine .rem is called. The result is moved again from %o0
to %g1 and finally it is stored at the previously computed address %l0
of a[i].
Most of the rest is similar. An interesting point is still the exchange of the two array values "x = a[i]; a[i] = b[i]; b[i] = x;". How is this done at assembly level? Here is the code.
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %g1
ld [%g1], %g1
st %g1, [%fp-24]
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+68], %g1
add %i5, %g1, %i4
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+72], %g1
add %i5, %g1, %g1
ld [%g1], %g1
st %g1, [%i4]
ld [%fp-20], %g1
sll %g1, 2, %i5
ld [%fp+72], %g1
add %i5, %g1, %i5
ld [%fp-24], %g1
st %g1, [%i5]
After computing the address, a[i] is loaded into %g1. This value is
stored at %fp-24, the address of x. Then the address of a[i] is
computed again and the result is left in %i4. Then the address of b[i]
is computed and b[i] is loaded into %g1. This value is then stored at
the address found in %i4, that is, it is assigned to a[i].
Finally the address of b[i] is computed again and the value of x, which
is loaded from address %fp-24 to %g1 first, is stored at its position.
Quite a lot of work for a simple exchange of two variables!
Each procedure starts with a save and ends with a restore. All ifs, fors and whiles are translated using tests and gotos.
What possibilities does a compiler have to optimize the code? We list the most important ideas:
Of course these optimization tools come at a price: the compilation takes much longer. It also, at least partially, explains why modern compilers are so much larger than those in the early days. In 1984 a complete Pascal compiler took about 20 KB of storage, leaving some space for the program on a computer with 48 KB of memory. A modern C compiler requires 30 MB of storage!
Optimizers are integrated in compilers. They help to generate assembly code which can be executed several times faster than non-optimized code.
.file "assembly_ex1_opt.c"
.section ".rodata"
.align 8
.LLC0:
.asciz "Give x >>> "
.align 8
.LLC1:
.asciz "%d"
.align 8
.LLC2:
.asciz "Give y >>> "
.align 8
.LLC3:
.asciz "z = %d\n"
.section ".text"
.align 4
.global main
.type main, #function
.proc 04
main:
!#PROLOGUE# 0
save %sp, -120, %sp
!#PROLOGUE# 1
sethi %hi(123456512), %o5
sethi %hi(.LLC0), %o3
or %o5, 277, %o4
or %o3, %lo(.LLC0), %o0
call printf, 0
st %o4, [%fp-20]
add %fp, -20, %o1
sethi %hi(.LLC1), %i0
call scanf, 0
or %i0, %lo(.LLC1), %o0
sethi %hi(.LLC2), %o2
call printf, 0
or %o2, %lo(.LLC2), %o0
add %fp, -24, %o1
call scanf, 0
or %i0, %lo(.LLC1), %o0
ld [%fp-20], %o0
ld [%fp-24], %o1
sethi %hi(.LLC3), %g1
add %o0, %o1, %o2
mov 1, %i0
or %g1, %lo(.LLC3), %o0
call printf, 0
sll %o2, 5, %o1
ret
restore
.size main, .-main
.ident "GCC: (GNU) 3.3.3"
We first consider the translation of the fragment "x = 123456789; printf("Give x >>> "); scanf("%d", &x);". This translates to
sethi %hi(123456512), %o5
sethi %hi(.LLC0), %o3
or %o5, 277, %o4
or %o3, %lo(.LLC0), %o0
call printf, 0
st %o4, [%fp-20]
add %fp, -20, %o1
sethi %hi(.LLC1), %i0
call scanf, 0
The instructions have been rearranged. There is no longer memory
allocated for the useless variable i, now x is stored at address
%fp-20. The useless move from %o5 to %o1 has been eliminated. Also the
nop (which is the mnemonic for "no operation") after the print is no
longer there. Without knowing more about the precise timing of the
instructions one cannot tell whether it was necessary in the
non-optimized code or not. It is common that there are limitations on
the instructions like "instruction xyz should not immediately follow
instruction pqr". Altogether, the optimized code is considerably
better than the original one: 13 lines have been reduced to 9.
On the other hand, the code is still far from optimal. It may come as a surprise that the useless initialization of x is still performed. However, this is beyond the horizon of the compiler: the compiler does not know that the old value of x, whose address is passed to scanf, has no importance inside scanf. It is also somewhat surprising that the value of x is stored to memory before calling scanf. It might also have been kept in one of the many registers. This is relevant, because the instruction st costs at least 3 clock cycles (this number is given in a description of the SPARC instruction set), while copying a value to a register takes only 1 clock cycle. If x is not yet in the cache, the store costs even much more. Nevertheless, this is not an error of the compiler. The reason is similar: the compiler does not know what is done inside scanf. The value cannot simply be left in a register. scanf receives only the address of x, so it can access x only in memory, and moreover the registers might be used in the subroutine. The fact that the address of x is passed to the subroutine suggests that x will be written there, but this is not necessarily so. Therefore, to be on the safe side, the value of x must be written to memory before entering the subroutine.
The optimizer eliminates useless instructions and rearranges them, but it is not omniscient and cannot always attain the optimal code.
We only consider the following subroutine in some detail:
int span(int* a, int n) {
int i, m, M;
m = MAXINT;
for (i = 0; i < n; i++)
if (a[i] < m)
m = a[i];
M = 0;
for (i = 0; i < n; i++)
if (a[i] > M)
M = a[i];
return M - m; }
Here there is a large potential for optimization. One might wonder
whether the compiler decides to fuse the two loops into one. This would
be profitable in all respects: it reduces the number of loop operations,
it improves the usage of the pipeline, and the elements of a[] have to
be loaded from memory only once. Altogether, this may reduce the cost
by close to a factor of two.
The optimized assembly code is as follows:
span:
!#PROLOGUE# 0
!#PROLOGUE# 1
mov %o0, %o3
mov 0, %o5
sethi %hi(2147482624), %o0
cmp %o5, %o1
bge .LL78
or %o0, 1023, %o4
sll %o5, 2, %o0
.LL81:
ld [%o3+%o0], %g1
cmp %g1, %o4
bge .LL67
add %o5, 1, %o5
mov %g1, %o4
.LL67:
cmp %o5, %o1
bl .LL81
sll %o5, 2, %o0
.LL78:
mov 0, %o0
cmp %o0, %o1
bge .LL80
mov 0, %o5
sll %o5, 2, %g1
.LL82:
ld [%o3+%g1], %g1
cmp %g1, %o0
ble .LL73
add %o5, 1, %o5
mov %g1, %o0
.LL73:
cmp %o5, %o1
bl .LL82
sll %o5, 2, %g1
.LL80:
retl
sub %o0, %o4, %o0
.size span, .-span
.ident "GCC: (GNU) 3.3.3"
In the whole subroutine not a single value is stored to memory. Everything is done within the registers. This is possible because no further subroutines are called. The starting address of a[] is moved to %o3 and the value of n is left in %o1. %o5 is used for the counter i. In the earlier examples there were many nop operations. As the above code fragment makes clear, one of the reasons for this is that the instruction immediately following a jump or branch (the so-called delay slot) is still executed. This leaves some more time for actually performing the jump. In the optimized version the nops are replaced by useful instructions. To reduce the number of jumps, the loop condition is now tested in two places: once at the beginning before the first pass, and once at the end, before each further pass. This is all quite clever, but the two loops are not fused. Apparently, such a drastic rewriting of the code requires specifying one of the many other optimization options.
In the optimized code variables are stored to memory only if necessary, normally they are kept in the registers.
If an instruction requires data computed by a previous instruction, it cannot proceed until these data are available, and the pipeline stalls. The compiler will discover this problem and, if possible, try to rearrange some instructions to alleviate it without changing the computed result (instruction scheduling). Many modern processors additionally perform such rearrangements in hardware while the program runs; this is called out-of-order execution. Conditional instructions are another problem for pipelined execution: it takes some time to figure out how the computation will continue. In the case of if statements, the main trick is that the processor, possibly guided by hints from the compiler, tries to predict which of the two branches is most likely and tentatively continues with this branch. This is called branch prediction, and it is claimed to be highly effective. It may also be that by default the first of several alternatives is taken, and therefore performance may sometimes improve if the most likely alternative is put in the if-clause. In the case of loops with few instructions, such as a for loop initializing an array, after each pass there is a test whether to continue or not. In this case the compiler (or the user) can partially unroll the loop, performing several passes of the loop before testing the condition again. Loop unrolling is a major way to reduce execution time.
A pipeline-conscious program may be several times faster than a program which was written without paying attention to the features of instruction execution. Fortunately, this performance improvement can mostly be achieved by minor changes, most of which can be found by a good compiler. Therefore, there is normally no need to take the pipeline into account during program development: pipeline optimization is a matter of programming details rather than an algorithmic issue.
Recently it has become hard to continue doubling the clock frequency every 18 months, as was achieved for the last 30 or 40 years. Strictly speaking, Moore's law states that the number of transistors on a chip doubles at a regular interval; the resulting exponential increase of processor speed is often also called Moore's law. For physical reasons it becomes harder to strongly reduce the size of the circuits. High clock frequencies also require high voltage, and high voltage implies high energy consumption (V^2 / R watt is dissipated when putting V volts over a resistor with resistance R, and the actual energy consumption increases even more strongly than quadratically with V). For mobile devices this is not acceptable, because it reduces the time one can work without being connected to a power supply. For desktop devices, power consumption itself is not yet the major concern, but it becomes hard to cool the processor sufficiently with air cooling. Liquid cooling is reasonable only for expensive supercomputers. At the same time, competition forces the producers to come up with more powerful processors. One way out is to increase parallelism, that is, the possibility to execute several instructions in the same pipeline stage at the same time. For example, the ALU may have independent units for addition and multiplication, so that an addition and a multiplication can be performed in a single clock cycle. In the case of branching statements, it may even be possible to follow both paths until it becomes clear which one is the right one.
Computations can only be performed on data which are located in one of the registers. In older days, when clock frequencies were far lower, missing data could be fetched from the main memory in a single clock cycle. However, at a frequency of 10^10 Hz, a signal can travel less than 3 centimeters in the time of a clock cycle. Therefore, there are several levels of cache storage located on the processor chip. Because these are much closer to the processor, and because there is no additional delay due to going off-chip, they can be accessed relatively fast. Making the caches larger is another way of improving processor performance.
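The 3 centimeters follow from the speed of light c, which is an upper bound for any signal, multiplied by the length T = 1 / f of one clock cycle. Electrical signals on a chip travel considerably slower than light, hence "less than":

```latex
d = c \cdot T = \frac{c}{f}
  = \frac{3 \cdot 10^{8}\ \mathrm{m/s}}{10^{10}\ \mathrm{Hz}}
  = 3 \cdot 10^{-2}\ \mathrm{m} = 3\ \mathrm{cm}
```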
The continuing performance improvement is mainly due to large-scale integration, deeper pipelines, on-chip parallelism and larger caches.
The motivation for copying 64 bytes, when we were asking only for one or four, is that it is very expensive to fetch data from the main memory, and often an application will also need the neighbors. In this way one hopes to amortize the access cost. Everything becomes much faster every year, but the speed of processors is increasing faster than the speed at which such requests for data can be handled. The gap has been growing continuously and is developing into a serious problem.
Data cannot be placed freely at any position in the cache. The position is computed in a very simple way (basically by taking the memory position modulo the size of the cache). Sometimes the system can choose among several positions. The number of positions from which it can choose is called the associativity of the cache. In this context one may say: "the cache is one-way/two-way/four-way/fully associative".
We give an example of a typical case: a four-way associative cache of size 256 KB (when talking about memories, K = 2^10 = 1024, M = 2^20, G = 2^30 and T = 2^40, but 1 GHz really means 10^9 Hz) and cache lines of 64 bytes. This cache is best viewed as consisting of 2^18 / 64 = 4096 cache lines. Suppose that the processor needs the data item stored at position i = 39506151 of the memory. In binary we have i = 10.010.110.101.101.000.011.100.111. First the beginning j of the stretch of 64 = 2^6 bytes containing this item is computed. This can be done in one operation by setting the last 6 bits of i to zero: j = 10.010.110.101.101.000.011.000.000. Then the possible cache positions are computed. There are several strategies, but the simplest is to consider the four possibilities given by k_l = (j / 64) modulo 1024 + 1024 * l, for l = 0, 1, 2, 3. Here we use 1024 = 4096 / 4, the number of cache lines which fit into the cache, divided by the associativity. This gives the positions 001.101.000.011, 011.101.000.011, 101.101.000.011 and 111.101.000.011 (835, 1859, 2883 and 3907 in decimal). Computing these numbers may appear hard, but it can be done in one clock cycle using bitwise operations for masking and shifting. Then it must be checked whether this cache line is already there. If not, then it must be copied from the main memory and placed at one of the possible positions. This implies that some earlier cached information must be thrown out. Which one? A good heuristic is to throw out the cache line which has not been used for the longest time: the LRU (least-recently used) strategy. Of course, applying this strategy requires that some extra information is maintained for every cache line, telling when it was accessed for the last time. Finally, the 64 bytes of the main memory starting at position j are copied to position k_l for the chosen l, possibly after the information previously stored in this section of the cache has been copied back to the main memory.
Clearly, having caches makes everything much more complicated. There is also a considerable overhead in computing the positions, checking for available copies and so on. In particular, it costs extra time to fetch 64 bytes from the memory instead of 1 or 4. All this effort is wasted if only a single number of a cache line is used, and only once, before the line is overwritten by another cache line. That caches are nevertheless used in all modern processors is due to the fact that most computations exhibit temporal or spatial locality. Temporal locality means that within a certain stretch of time the computation stays local in the sense that it is working only on a small subset of the data. Spatial locality means that in consecutive time steps a computation tends to work on data which are stored close together in the memory. In programs without any form of locality it would be better not to have caches. The locality of a program normally cannot be improved by rearranging a few instructions: a cache-aware program requires a cache-aware algorithm to begin with. This is a major issue, because a program with poor cache behavior can be 10 times slower than a cache-aware program. Furthermore, this factor has been growing rapidly and it appears that it will continue to grow in the future.
The limited choice of where to position required data in the cache makes it simple to find the information again. At the same time, it implies that in a k-way associative cache there may be sets of k + 1 memory lines which continuously kick each other out of the cache, because they are mapped to the same set of k cache positions. If this happens, we say that the program is thrashing. When measuring the performance of a program, it may happen that a minimal change (like reordering the variable declarations) causes a considerable change in performance. This may be caused by the difference between thrashing and not thrashing.
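Thrashing can be made concrete by simulating a single set of a k-way associative cache with LRU replacement. This is a minimal sketch: the "tags" stand for memory lines that all map to the same set, and the class name and the cyclic access pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CacheSetLru {
    // Simulates one set of a k-way associative cache with LRU replacement
    // and returns the number of misses for the given sequence of lines.
    static int countMisses(int k, int[] tags) {
        List<Integer> set = new ArrayList<>();   // position 0 holds the least recently used line
        int misses = 0;
        for (int tag : tags) {
            int pos = set.indexOf(tag);
            if (pos >= 0) {
                set.remove(pos);                  // hit: take the line out, re-added below as most recent
            } else {
                misses++;
                if (set.size() == k)
                    set.remove(0);                // set full: throw out the least recently used line
            }
            set.add(tag);                         // the accessed line is now the most recently used
        }
        return misses;
    }

    // Hypothetical access pattern: lines 0, 1, ..., distinct - 1, repeated 'rounds' times.
    static int[] cyclic(int distinct, int rounds) {
        int[] tags = new int[distinct * rounds];
        for (int r = 0; r < tags.length; r++)
            tags[r] = r % distinct;
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(countMisses(4, cyclic(4, 10)));   // 4: only the first access to each line misses
        System.out.println(countMisses(4, cyclic(5, 10)));   // 50: every access misses (thrashing)
    }
}
```

With four lines cycling through a 4-way set, only the first four accesses miss. With five lines, LRU always throws out exactly the line that is needed next, so every one of the 50 accesses misses.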
Another complication with caches arises when several processors are working on the same data. Processor 1 assigns the value of variable x to variable y, and processor 2 assigns the value of y to variable z. Which value does z get? In general this is unclear, because we would have to specify more exactly at which times this happens. But even if we enforce that the first assignment is performed before the second, this does not guarantee that z finally equals x. It may happen that after the assignment y = x, processor 1 does not write the new value of y back to the main memory but keeps it in its cache (in general this is clever, because if y gets assigned a new value once more, we have saved an access to the main memory). It may also happen that processor 2 already had y in its cache and does not feel obliged to get a fresh value from the main memory. In all these cases z may finally equal the original value of y, or yet another value. In technical terms, this is called the problem of cache coherence. The caches must be kept coherent to avoid such incorrect computation sequences. This makes it hard to solve a problem efficiently on several processors that access a shared main memory but each have their own caches.
Understanding the cache behavior is the key to understanding the performance of programs in practice.
A basic mechanism of this kind is built directly into the operating system, and a program can make use of it without any explicit instructions from the programmer. Normally, there is a so-called swap partition, a special section of the hard disk reserved for this purpose. This is also called the swap space. Mostly the size of the swap space is of the same order of magnitude as the main memory. If we are running an application which requires 800 MB on a computer with 512 MB RAM, then all data which do not fit into the main memory are paged out automatically. When these data are needed, they are paged in again. In this way, if there is for example 600 MB of swap space, we can work on a computer with 512 MB of main memory as if there were (512 + 600) MB of memory. Therefore this additional memory is also called virtual memory.
Unfortunately, virtual is not as good as real. The reason is that accessing the secondary memory is several orders of magnitude (an order of magnitude is a factor 10) slower than accessing the main memory. There are several reasons for this. In the first place, the hard disk is even more remote from the processor than the main memory. More importantly, the storage medium is of a very different nature. A hard disk consists of one or more rotating disks. Data are stored more or less like on an old-fashioned music LP: in circular tracks. The only difference is that the tracks do not form a spiral but are concentric circles. Each track is divided into sectors which can be used for data storage. These sectors typically offer space for 512 bytes or, on newer disks, 4 KB. For each of the one or more (sides of) disks there is one reading/writing head (but typically at most one of them can be used at a time). Now if we want to read the data in sector 17 on track 193, first the head must be moved to stand over track 193, and then it must wait until sector 17 comes by. The delay for moving to the specified track is called seek time, and the delay for waiting until the sector comes by is called rotational delay. Hard disks rotate at around 7200 rpm (rotations per minute), and on average we must wait for half a rotation. It follows that the average rotational delay is 60 / (2 * 7200) ~= 0.004 seconds. The seek time is similar, so when performing random accesses to the hard disk we can on average perform on the order of 100 accesses per second. That is not much!
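The estimate can be written out as a small computation. Following the text, this sketch assumes that the average seek time is about as large as the average rotational delay. The class name is hypothetical.

```java
import java.util.Locale;

class DiskEstimate {
    // Average rotational delay in seconds: on average we wait half a rotation.
    static double rotationalDelay(double rpm) {
        return 60.0 / (2.0 * rpm);
    }

    // Rough number of random accesses per second, ASSUMING (as in the text)
    // that the average seek time is about as large as the rotational delay.
    static double randomAccessesPerSecond(double rpm) {
        double seekTime = rotationalDelay(rpm);          // assumption: seek time ~= rotational delay
        return 1.0 / (seekTime + rotationalDelay(rpm));
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "rotational delay: %.2f ms%n", 1000 * rotationalDelay(7200));
        System.out.printf(Locale.ROOT, "random accesses per second: %.0f%n", randomAccessesPerSecond(7200));
    }
}
```

At 7200 rpm this gives about 4.17 ms of rotational delay and about 120 random accesses per second, which is "on the order of 100".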
Fortunately, if we want to read not only sector 17 on track 193 but also sectors 18, 19, ..., then the hard-disk controller is clever enough to organize the reading so that the reading head has to be positioned only once. There are many other clever ideas in this domain, and we mention two of them. First, suppose we want to read all sectors of track 193, and that the reading head is standing over sector 14 when it arrives at the track. Then it can wait until sector 0 passes by. But it can also start to read immediately, store the data temporarily in a buffer and return them in the correct order. The second idea applies when we want to read several sectors scattered over the hard disk. These can be accessed in the specified order, in which case the reading head moves back and forth over the disk. Alternatively, one can try to schedule the accesses so that the distance covered by the reading head is minimized. A simple way to keep this distance small is to sort the requests by track number: the reading head then moves once from the inside outwards or vice versa. If one also takes the rotational delays into account, then the problem becomes much harder.
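The effect of sorting the requests by track number can be sketched by counting how many tracks the head passes. The track numbers and the starting position of the head (the innermost track) are hypothetical. Rotational delays are ignored, as in the simple version of the idea.

```java
import java.util.Arrays;

class DiskScheduling {
    // Total number of tracks the head passes when serving the requests in the given order.
    static int headTravel(int start, int[] tracks) {
        int distance = 0;
        int position = start;
        for (int t : tracks) {
            distance += Math.abs(t - position);
            position = t;
        }
        return distance;
    }

    public static void main(String[] args) {
        int start = 0;                                    // assumption: head starts at the innermost track
        int[] requests = {193, 17, 820, 45, 600, 310};    // hypothetical track numbers
        int[] sorted = requests.clone();
        Arrays.sort(sorted);                              // one sweep from the inside outwards
        System.out.println("given order:     " + headTravel(start, requests) + " tracks");  // 2792 tracks
        System.out.println("sorted by track: " + headTravel(start, sorted) + " tracks");    // 820 tracks
    }
}
```

In the given order the head zigzags over 2792 tracks. After sorting it makes a single sweep of 820 tracks.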
In this way we can often read much more than 100 sectors per second. If we read very long sections at a time, then the transfer rate between memory and hard disk increases to several MB per second. We find exactly the same phenomenon in the transfer of data between main memory and cache: a single cache miss (the event of not finding a requested data item in the cache) costs several hundreds of clock cycles. But if we request many data items which are stored consecutively in the main memory, then these are delivered at a much higher rate, reducing the average cost to a few clock cycles per delivered item. At both levels, one tries to improve the performance by pre-fetching. If the program is accessing data items i, i + 1, i + 2, then it is guessed that i + 3 will also be needed, and the system already fetches i + 3 while the processor is still working on i + 2. Possibly this effort is wasted, but there was not much else to do anyway.
Random accesses to the hard disk are terribly expensive. Acceptable transfer rates between primary and secondary memory are obtained only when accessing large stretches of the secondary memory consecutively.
If a data item which is needed by the processor does not reside in the cache, we speak of a cache miss. When a requested data item does not reside in the main memory, we speak of a page miss (the analogy would have been better if cache misses had been called "line misses"). In the case of a cache miss, the line containing the requested data item is copied from the main memory to one of the few possible places. This replacement strategy is kept simple because a cache miss should cost only a few clock cycles (which unfortunately is no longer achieved, as was discussed above). In the case of a page miss, the page containing the requested item is copied from the secondary memory to the main memory. A page is the unit of reading and writing on the hard disk. The size of the cache lines is a system parameter, but the size of the pages is user defined (though down in the system there is some system-specific page size as well). Typical sizes are 4 KB or 8 KB. Because this operation is `infinitely' expensive, there is time to try to optimize the replacement strategy. So, the location of the page to copy is not determined by a simple modulo computation. Instead, the LRU strategy is applied in full generality: the new page replaces the page which was least recently used. This can be implemented, for example, with a data structure known under the name "priority queue".
The LRU strategy is simple and intuitive. LRU, or an approximation of it, is the most used strategy for deciding which page to throw out of the main memory.
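The LRU bookkeeping can be sketched in a few lines of Java. This sketch swaps in a different technique from the priority queue mentioned above: `java.util.LinkedHashMap` in access order, which keeps its entries ordered from least to most recently used, combined with its `removeEldestEntry` hook. The class name `LruPages` is hypothetical, and the "pages" are plain map entries. Real page replacement is done inside the operating system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Keeps at most 'capacity' pages; on overflow the least recently used page is thrown out.
class LruPages<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruPages(int capacity) {
        super(16, 0.75f, true);     // true = iteration in access order, least recently used first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;   // called after each insertion: evict the LRU entry if too many
    }

    public static void main(String[] args) {
        LruPages<Integer, String> memory = new LruPages<>(3);
        memory.put(1, "page 1");
        memory.put(2, "page 2");
        memory.put(3, "page 3");
        memory.get(1);              // page 1 becomes the most recently used
        memory.put(4, "page 4");    // memory full: page 2, the least recently used, is thrown out
        System.out.println(memory.keySet());   // [3, 1, 4]
    }
}
```

Both `get` and `put` move an entry to the "most recently used" end, so each access costs constant expected time.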
The virtual-memory management provided by the system is convenient: it allows one to go into the `red' without needing a new program. However, whether this works well depends very strongly on which problem one is solving and which program one is using. We have seen that every time one accesses a data item which is not available in the main memory, a whole page containing thousands of data items is brought into the main memory. If all these items are useful, then the performance will be worse than with sufficient main memory, but not by a large factor. If, on the other hand, each page loaded contains only a few useful data items, then the performance breaks down entirely. In many cases one will not have the patience to wait for completion: increasing the size of the problem by 10% may increase the computation time by a factor 100 or more. Stated more formally, the performance degradation strongly depends on the locality of the program. For example, computing a[i] = b[i] + c[i], for i = 0, 1, ..., n - 1, has good spatial locality. This computation will still go quite fast even if n becomes so large that the arrays no longer fit into the main memory. On the other hand, computing a[i] = b[c[i]] may go very slowly for large n. Here a[] and c[] are still accessed consecutively, but, depending on the values in c[], the access pattern of b[] may be chaotic.
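The two loops from the example can be written in Java as follows. In this sketch, c[] is filled with a random permutation, which is one hypothetical way to make the accesses to b[] chaotic. At the chosen size the arrays still fit into main memory, so any difference in timing comes from cache misses rather than page misses. The principle is the same. Class and method names are hypothetical.

```java
import java.util.Random;

class AccessPatterns {
    // Good spatial locality: a, b and c are all traversed consecutively.
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++)
            a[i] = b[i] + c[i];
    }

    // a and c are traversed consecutively, but b is read at positions c[i],
    // which may jump around chaotically through memory.
    static void gather(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++)
            a[i] = b[c[i]];
    }

    // Random permutation of 0, ..., n - 1 (Fisher-Yates shuffle).
    static int[] randomPermutation(int n, long seed) {
        int[] p = new int[n];
        for (int i = 0; i < n; i++)
            p[i] = i;
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = p[i]; p[i] = p[j]; p[j] = t;
        }
        return p;
    }

    public static void main(String[] args) {
        int n = 1 << 22;                           // about 4 million entries per array
        int[] a = new int[n], b = new int[n], c = randomPermutation(n, 42L);
        for (int i = 0; i < n; i++)
            b[i] = i;
        long t0 = System.nanoTime();
        add(a, b, c);
        long t1 = System.nanoTime();
        gather(a, b, c);
        long t2 = System.nanoTime();
        System.out.println("consecutive: " + (t1 - t0) / 1_000_000 + " ms, chaotic: "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```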
There is no need to rely on the virtual-memory management. If one is designing programs specifically for solving very large problems, then it is better to take over control. This is done by making sure that the whole application fits into the main memory and by explicitly writing the data which are temporarily not needed away to a file. The biggest theoretical advantage of this user-controlled memory management is that one knows exactly which data are in the main memory. This makes it possible to give performance guarantees, because no unexpected paging occurs. More practically, it turns out that doing it yourself is faster. If you need many data at once, this can be specified. Data which do not need to be saved are not written away. No time is wasted on finding out which page should be removed. The really compelling argument for explicit memory handling is that the swap partition is mostly relatively small (though one can declare it to be bigger from the start). Some of the other partitions (typically with names such as "/tmp", "/var", "/var/tmp" or "/scratch") are much bigger, which makes it possible to solve huge problems on a 1000-euro PC.
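A minimal sketch of explicitly writing a block of data away and reading it back is shown below. It uses the standard `java.io` streams with a large buffer, so that the disk sees long consecutive stretches rather than random accesses. The method names and the use of a temporary file (assumed to have enough space, for example under /tmp) are hypothetical choices. A real program would call `writeBlock` when it stops needing a block, drop its own reference to the array, and call `readBlock` when the data are needed again.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitWriteAway {
    static final int BUFFER = 1 << 16;   // 64 KB buffer: long consecutive transfers

    // Write the array in one sequential pass: first the length, then the entries.
    static void writeBlock(Path file, int[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file), BUFFER))) {
            out.writeInt(data.length);
            for (int x : data)
                out.writeInt(x);
        }
    }

    // Read the array back, again in one sequential pass.
    static int[] readBlock(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file), BUFFER))) {
            int[] data = new int[in.readInt()];
            for (int i = 0; i < data.length; i++)
                data[i] = in.readInt();
            return data;
        }
    }

    // Write the data away to a temporary file, read them back and delete the file.
    static int[] roundTrip(int[] data) {
        try {
            Path file = Files.createTempFile("block", ".bin");   // assumption: enough space there
            try {
                writeBlock(file, data);
                return readBlock(file);
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] block = new int[1_000_000];
        for (int i = 0; i < block.length; i++)
            block[i] = i;
        int[] back = roundTrip(block);
        System.out.println(back[999]);   // 999
    }
}
```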
The virtual-memory manager provides a convenient but limited extension of the possibilities. For serious computations with large data sets, one should deal with the memory management in an explicit way.
Java was originally designed as an interpreted language. However, it is not the source code that is interpreted, but so-called byte-code. This byte-code is generated by a program (one could say a compiler) called "javac". So, once the program is written, one types "javac MainClassName.java". If there are no syntax errors, then a new file with name "MainClassName.class" is generated. The program can now be executed by typing "java MainClassName".
The following gives a very simple program computing the average value of the entries of an array:
class ArrayAverage
{
static final int SIZE = 100;
public static void main(String ps[])
{
int i, sum;
int[] a = new int[SIZE];
for (i = 0; i < SIZE; i++)
a[i] = i;
for (i = 0, sum = 0; i < SIZE; i++)
sum += a[i];
System.out.println("\nThe average value is " +
(float) sum / SIZE + "\n");
}
}
Comparing with the C program doing the same, we see that details have changed, but that the program as a whole is more or less the same. The differences are
In Java there are generally applied naming conventions:
At the most basic level, the difference between Java and C is small.
Classes are structured data types defined together with the operations that can be performed on them (their functionality). They are the central notion of object-oriented programming languages.
class Employee
{
String name;
int number;
double salary;
public Employee(String theName, int theNumber, double theSalary)
{
name = theName;
number = theNumber;
salary = theSalary;
}
public void increaseSalary(double salaryIncrease)
{
salary += salaryIncrease;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary + ")";
}
public void setName(String newName)
{
name = newName;
}
}
Here we see many important aspects of classes. First the header. In its basic form, a class header consists of the word "class" followed by the name of the class, in our case "Employee" (it may be preceded by modifiers such as "public"). In the basic case we are considering here, this is followed by a "{", which is matched by a "}" at the end of the class definition. It should be noticed that here we only describe a class; we do not create an object of this class.
Then we see a list of variables: a String, an int and a double. These variables will be called instance variables. Other names are in use as well, such as fields (the term used in the Java language specification) or member variables. Any Employee object (= instance) has the instance variables name, number and salary. So far a class is just like a struct in C.
The difference with a struct is that the definition of a class contains also the definition of the methods that are working on objects (= variables) of this class. Further down we will see how this goes, but here we can notice already that there are four such methods: Employee, increaseSalary, toString and setName.
The latter three resemble functions in C: they have a name, parameters and a return type. In addition we find the word "public", which is an example of an access modifier. "public" means that these methods are accessible from outside. If we had written "private" instead, these methods could only have been called from inside the class itself. In total there are four access levels (the three modifiers public, protected and private, plus the default level obtained by writing no modifier); they will be discussed further down.
The method Employee is more special. It is a so-called constructor. When this method is called in combination with the keyword new, memory is allocated and the instructions in the constructor are executed. Typically a constructor contains instructions to initialize the instance variables, but it may also do more or less. In any case, each class has at least one constructor. If none is written, the compiler supplies a default constructor without parameters. The name of a constructor is always the same as the name of the class, and one does not indicate a return type: the expression with new yields a pointer to the newly created object of the class.
Inside a class, a list of instance variables together with their types is typically followed by all methods. Each class has at least one constructor, which is called when creating new objects (if none is written, Java supplies a default constructor without parameters).
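To see the constructor and the methods in action, here is a compact copy of Employee (with a double salary parameter, as in the later version of the class) and a small hypothetical main program. It creates an object with new and calls a method on it.

```java
class Employee {
    String name;
    int number;
    double salary;

    public Employee(String theName, int theNumber, double theSalary) {
        name = theName;
        number = theNumber;
        salary = theSalary;
    }

    public void increaseSalary(double salaryIncrease) {
        salary += salaryIncrease;
    }

    @Override
    public String toString() {
        return "(" + name + ", " + number + ", " + salary + ")";
    }
}

class EmployeeDemo {
    public static void main(String[] args) {
        // new allocates memory for name, number and salary, then runs the constructor
        Employee boris = new Employee("Becker, Boris", 235521, 4500.00);
        boris.increaseSalary(250.00);
        System.out.println(boris);   // println calls toString(): (Becker, Boris, 235521, 4750.0)
    }
}
```

Passing an object to println automatically calls its toString method, which is one reason this method is worth defining.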
class IntegerMatrix
{
int n;
int[][] a;
public IntegerMatrix(int size)
// Initializes all positions with 0
{
n = size;
a = new int[n][n];
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
a[i][j] = 0;
}
public IntegerMatrix(IntegerMatrix matrix)
// Creates a copy of matrix
{
n = matrix.n;
a = new int[n][n];
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
a[i][j] = matrix.a[i][j];
}
public int trace()
// Computes sum of diagonal elements
{
int s = 0;
for (int i = 0; i < n; i++)
s += a[i][i];
return s;
}
public boolean findValue(int x)
// Checks whether the value x occurs
{
int i;
for (int j = i = 0; i < n; i++)
{
for (j = 0; j < n && a[i][j] != x; j++); // scan row i
if (j < n) // x found in row i: stop with i < n
break;
}
return i < n;
}
}
Here we see all the things we saw above plus some new features. Most noticeable is that there are two constructors. They have the same name, inside the same scope (both names are visible inside and outside the class). Nevertheless this is correct: the names are the same, but their signatures are not. In Java, the signature of a method consists of its name and its parameter list; the return type is not part of it. For two parameter lists to be the same, parameters of the same types must appear in the same order. Here the first constructor has an int parameter and the second has an IntegerMatrix parameter. When these methods are called from outside, the compiler has no problem figuring out which of the two is meant. It just has to check the types of the arguments and match them with one of the specified methods. This is called overloading. It is a first example of polymorphism, about which we will hear more further down.
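Overloading works the same way for ordinary methods. In the following sketch (class and method names hypothetical), two methods share a name but differ in their parameter lists. The compiler selects the right one from the argument type. A method that differed only in its return type would be rejected.

```java
class OverloadDemo {
    // Same name, different parameter lists: the signatures differ.
    static String describe(int x) {
        return "int " + x;
    }

    static String describe(String s) {
        return "String " + s;
    }

    // static int describe(int x) { return x; }   // would NOT compile: differs only in return type

    public static void main(String[] args) {
        System.out.println(describe(42));     // int 42
        System.out.println(describe("42"));   // String 42
    }
}
```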
Now it is also time to notice that inside the class the methods can work with the instance variables. In general, in a method there are four kinds of variables:
Local variables need not be declared at the beginning of a method. On the contrary: in Java it is considered good style to declare a variable where it is needed, and to initialize it upon declaration. One must be careful with the scope of a local variable. By scope we mean the "visibility range" of a variable. The scope of a local variable stretches from its declaration to the end of the block in which it was declared. So, a variable declared at the beginning of a method is visible anywhere in the method, but not outside the method. That is why we call it a local variable. The variable i in findValue is of this type. A variable which is declared in the header of a for loop is visible inside this loop, but not outside of it. The variable j in findValue is of this type. The reason that i was not declared in the header of the first loop is that we wanted to use it in the final comparison. A variable declared inside a compound statement is visible only within this compound statement.
It is correct and perfectly acceptable to use the same variable name in many different methods. This makes it possible to combine program fragments without extensive effort to trace all common variable names. If a method which uses a variable x calls another method which also uses a variable x, then inside the called method its own x is the valid one, because the scope of the x in the calling method is limited to that method.
Slightly less obvious is the following fragment, in which a variable i is declared twice:
int i = 10;
for (int i = 0; i < 1000; i++)
a[i] = 2 * i;
System.out.println("i = " + i);
In C and C++ the corresponding code is correct. What is printed? 10, of course! Outside the for loop the locally defined variable i does not exist, because its scope is limited to the loop. Inside the loop, on the other hand, the original variable i is not visible: it is shadowed by the more local variable. This works because the compiler keeps its own list of variables and has no problem keeping them apart. In Java, however, the fragment is rejected by the compiler if both variables are local variables of the same method. A local variable may not be redeclared within the scope of another local variable with the same name. Java only allows a local variable to shadow an instance variable of the class. Even where this is allowed, there is rarely a good reason to program this way, and this confusing style of programming should be avoided.
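In Java a local variable may shadow an instance variable of the class, though it may not shadow another local variable of the same method. The following minimal sketch uses a hypothetical class ShadowDemo: inside the loop, i is the loop variable, and after the loop, i refers to the instance variable again.

```java
class ShadowDemo {
    int i = 10;   // instance variable

    int loopAndReturnField() {
        int[] a = new int[1000];
        for (int i = 0; i < a.length; i++)   // this local i shadows the instance variable i
            a[i] = 2 * i;
        return i;                            // the loop variable is gone: this is the instance variable
    }

    public static void main(String[] args) {
        System.out.println("i = " + new ShadowDemo().loopAndReturnField());   // i = 10
    }
}
```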
In the second constructor, we see that there may be a parameter of the same type as the class. One might fear that this leads to confusion. However, the instance variables of such a parameter are accessed like the instance variables of any other object, with help of the dot-operator, ".", just like the fields of a struct in C. So, n is the instance variable of the instance under consideration, while matrix.n denotes the corresponding instance variable of the parameter matrix.
In both constructors we see how an array is allocated. In C (for arrays of fixed size) there is no distinction between declaring an array variable and allocating its memory. In Java, writing int[][] a (or int a[][]) creates an array variable without allocating memory for the entries. This would be hard anyway, because the compiler does not yet know how big the array should be. An array variable is actually a reserved memory location in which a pointer to an array of the appropriate type can be stored. The call "new int[n][n]" allocates space for n * n integers, organized as n rows that are each an int array of length n, and returns a pointer to this space. This pointer is assigned to the array variable a. All this is very clean. A similar construction exists in C when we use int** a and malloc to allocate memory, but this is quite ugly.
Speaking of memory allocation: in Java one does not have to bother about cleaning up (explicit deallocation is not even possible), because the system runs a garbage collector in the background. A garbage collector is a program which finds objects that can no longer be reached through any pointer and then deallocates their memory.
A variable of a class type is actually only a pointer to an object of this type. Such an object can be generated by calling a constructor, after which it can be assigned to the variable. Not all data types are classes: the primitive data types (several numerical types, characters and booleans) are not classes. All derived types are classes. Only the variables of class types are called objects. This distinction is important: when calling a method, objects are passed "by reference", while non-objects (= variables of primitive types) are passed "by value". Actually this view is not entirely correct, because a variable of a class type is a pointer. So, if we pass a class variable as a parameter in a method call, then the value of this pointer variable, an address, is copied into the corresponding parameter. This is the same as in C. The only difference is that for variables which are not objects, the variables of the primitive types, there is no way to specify that we want to pass their address.
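That an array variable only holds a pointer can be observed directly. In this sketch (class and method names hypothetical), a second array variable receives a copy of the pointer, so a write through one variable is visible through the other. The rows of the 2D array are themselves int arrays.

```java
class ArrayPointers {
    // Allocates an n x n array, lets a second variable point to it,
    // writes through the second variable and returns what the first one sees.
    static int writeThroughAlias(int n, int value) {
        int[][] a;               // array variable only: no memory for entries yet
        a = new int[n][n];       // n rows of n ints, all initialized to 0
        int[][] b = a;           // copies the pointer, not the n * n entries
        b[0][0] = value;
        return a[0][0];          // the same array is seen through a
    }

    public static void main(String[] args) {
        System.out.println(writeThroughAlias(3, 7));   // 7
        int[][] a = new int[3][3];
        System.out.println(a[1].length);               // 3: each row is itself an int array
    }
}
```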
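The following sketch shows that the pointer is copied. A method can change the object the pointer refers to, and the caller sees that change. Letting the parameter point to a new object, however, has no effect on the caller. A primitive parameter is simply a copy of the value. The class Box and the method names are hypothetical.

```java
class PassDemo {
    static class Box {          // hypothetical class used only for this illustration
        int value;
        Box(int value) { this.value = value; }
    }

    // b receives a copy of the caller's pointer, so both point to the same Box.
    static void change(Box b) {
        b.value = 99;           // visible to the caller: same object
        b = new Box(-1);        // only the local copy of the pointer changes
        b.value = -2;           // affects the new object, not the caller's
    }

    static void change(int x) {
        x = 99;                 // x is a copy of the caller's value
    }

    public static void main(String[] args) {
        Box box = new Box(1);
        int n = 1;
        change(box);
        change(n);
        System.out.println(box.value + " " + n);   // 99 1
    }
}
```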
Here Java is rigid, and sometimes this makes easy things hard. In C it is trivial to write a procedure "swap" which exchanges the values of two variables passed as parameters. One passes their addresses instead of their values, using the address operator "&". Inside the procedure, one accesses the values with the dereference operator "*". In Java this simple and common task can only be realized in a rather elaborate way, using a so-called wrapper class: a class with a single instance variable of a primitive type, which thereby obtains object status. The following example gives a possible work-out of this idea. Java also provides predefined wrapper classes: Integer, Float, Boolean, ... . These, however, are immutable (their value cannot be changed after creation), so they cannot be used for a swap like the one below.
class Int // A self-defined wrapper class
{
public int v; // The wrapped value
public Int(int x)
{
v = x;
}
static public void swap(Int a, Int b)
{
int c;
c = a.v;
a.v = b.v;
b.v = c;
}
}
public class Swap
{
static public void swap(int a, int b)
{
int c;
c = a;
a = b;
b = c;
}
public static void main(String[] args)
{
int a = 4;
int b = 7;
System.out.print("a = " + a + ", b = " + b + "\n");
swap(a, b); // Swapping without effect
System.out.print("a = " + a + ", b = " + b + "\n");
Int aWrap = new Int(a); Int bWrap = new Int(b); // Wrapping
Int.swap(aWrap, bWrap); // Swapping
a = aWrap.v; b = bWrap.v; // Unwrapping
System.out.print("a = " + a + ", b = " + b + "\n");
}
}
When methods are called, variables of class types (objects) are passed by reference (more precisely, the pointer is copied), while variables of primitive types are passed by value. Wrapper classes grant object status to primitive types, which makes it possible to pass variables of primitive types by reference.
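A common alternative to a self-defined wrapper class, not used in the text, is to keep the values in an array. An array is an object as well, so a method that receives the array can exchange two of its entries, and the caller sees the result. The class name ArraySwap is hypothetical.

```java
import java.util.Arrays;

class ArraySwap {
    // Exchanges a[i] and a[j]. This works because the method receives a copy
    // of the pointer to the array and changes the array the caller also uses.
    static void swap(int[] a, int i, int j) {
        int c = a[i];
        a[i] = a[j];
        a[j] = c;
    }

    public static void main(String[] args) {
        int[] v = {4, 7};
        swap(v, 0, 1);
        System.out.println(Arrays.toString(v));   // [7, 4]
    }
}
```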
class Node
{
int key;
Node next;
Node(int key, Node next)
{
this.key = key;
this.next = next;
}
Node(int key)
{
this(key, null);
}
}
class Chain
{
Node first;
public Chain()
{
first = null;
}
private Node getLast()
// Return the last node of a chain
{
if (first == null)
return null;
Node node = first;
while (node.next != null)
node = node.next;
return node;
}
public void addFirst(int key)
// Add a new node at the beginning of the chain
{
first = new Node(key, first);
}
public void addLast(int key)
// Add a new node at the end of the chain
{
if (first == null)
first = new Node(key);
else
getLast().next = new Node(key);
}
public void concatenate(Chain chain)
// Attach the Chain chain at the end of the considered chain
{
if (first == null)
first = chain.first;
else
getLast().next = chain.first;
chain.first = null;
}
public boolean findValue(int x)
// Test whether there is a node with key value x
{
Node node = first;
while (node != null && node.key != x)
node = node.next;
return node != null;
}
public void print()
// Print all the keys together with their position in the list
{
int counter = 0;
Node node = first;
while (node != null)
{
System.out.println("Node " + counter + " has key " + node.key);
counter++;
node = node.next;
}
}
}
In Node there are two instance variables: key and next. key is a simple integer instance variable as we have seen before. The exciting thing is that next is of type Node. Is this possible? What does it mean? Here it is crucial that a variable of type Node, like any variable of a class type, is only a pointer to an object and not the object itself (otherwise we would get an infinite nesting of Nodes). So, upon calling one of the constructors with "new Node( ... )", space is allocated for one integer (4 bytes) and for one pointer to a Node object (typically 4 or 8 bytes, depending on the JVM), plus some administrative overhead, and a pointer to this space is returned.
Linked structures can be defined by defining a class with an instance variable with the data type of the class itself. Because memory for an object is only allocated when explicitly calling a constructor, this does not lead to a recursive explosion.
The constructors of Node contain the special word this. this has several related meanings. It means either: "this class", or "the current object". In our example we find examples of both applications:
The first constructor of Node is of a conventional type: two parameters are passed and assigned to the instance variables. Slightly problematic might be the assignment "this.next = next". Here a Node object is assigned to another Node object. What does it mean? If one realizes that an object is a pointer, the answer is clear: afterwards this.next points to the same object as next. This is general: an assignment "x = y" can always be performed when x and y are variables (y may also be a constant) of the same type (or more generally when the type of y may be converted to the type of x). In case x and y are of a primitive type, then afterwards x has the same value as y. In case x and y are objects, then afterwards x points to the same object as y (even in this case one can say that x has the same value as y, namely the same address).
The class Chain has only one instance variable: the Node first. The single constructor is trivial: it has no parameters and sets first to null. null is a constant value which can be assigned to any pointer variable (that is, any variable of a class type). It means something like "pointing to nothing". The important thing is that it can be used in tests. If first (or any other object variable) has the value null, then it would be fatal to use first.key: at runtime Java throws a NullPointerException. first == null means that first does not point to any object, and in particular not to the storage space of a Node. Thus first.key, the value of the int stored in the storage space first is pointing to, is not defined. Errors of this kind are very common. In the above example we were careful not to run into them.
The other methods of Chain are for adding nodes either at the beginning or at the end, for checking whether an element exists and for printing all keys in the order in which they appear in the list. Further methods can be added to make it more useful; here we only give an example. The method getLast is declared private, because we decided that it should be for internal usage only. We may not want to guarantee that it will always be there, or in exactly this form. This prevents users from using features which they are not supposed to use. This is a first example of encapsulation, about which we will hear more further down.
Now that we are presenting the Chain, we should also try to understand exactly how it behaves when nodes are added. Initially we have an empty structure: first == null. Then the first addition (it does not matter which of the two methods is used) creates a new initialized Node whose next value is null. It then assigns the returned value, a pointer to a Node, to first. The later additions are of two kinds.
addFirst performs
first = new Node(key, first);
Here many things are happening! First the value of first, a pointer to a Node or null, is looked up and passed, together with the new key, to the Node constructor. This creates a new Node object whose next field points to the node that was first so far (or is null if the chain was empty). Then the pointer to the new Node is assigned to first.
addLast performs
getLast().next = new Node(key);
Here a new Node object with the new key value is created; its next value is set to null. Then the resulting pointer is assigned to the next field of the Node found by calling getLast. Here getLast walks along the chain until it reaches the last node and returns this object (of course it would be handy to have a second instance variable "last" in order to access this position faster, but this would be less instructive).
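For comparison, here is a sketch of the variant mentioned in parentheses: a chain with an extra instance variable last, so that addLast takes constant time instead of walking along the chain. The class names LNode and ChainWithLast and the helper keys() are hypothetical. Any other method that changes the end of the chain, such as concatenate, would also have to update last.

```java
// Node class for this sketch (same structure as Node in the text).
class LNode {
    int key;
    LNode next;

    LNode(int key, LNode next) {
        this.key = key;
        this.next = next;
    }
}

class ChainWithLast {
    LNode first = null;
    LNode last = null;   // invariant: last == null exactly when first == null

    public void addFirst(int key) {
        first = new LNode(key, first);
        if (last == null)           // the chain was empty: the new node is also the last one
            last = first;
    }

    public void addLast(int key) {
        LNode node = new LNode(key, null);
        if (first == null)
            first = last = node;
        else {
            last.next = node;       // constant time: no walk to the end of the chain
            last = node;
        }
    }

    // Returns the keys in chain order, e.g. "[3, 1, 2]".
    public String keys() {
        StringBuilder sb = new StringBuilder("[");
        for (LNode node = first; node != null; node = node.next) {
            sb.append(node.key);
            if (node.next != null)
                sb.append(", ");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        ChainWithLast chain = new ChainWithLast();
        chain.addLast(1);
        chain.addLast(2);
        chain.addFirst(3);
        System.out.println(chain.keys());   // [3, 1, 2]
    }
}
```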
It was pointed out that one should be very careful not to access the instance variables of a null-object. Is this not exactly what we are doing in the following loop in findValue?
while (node != null && node.key != x)
node = node.next;
No! The reason is that in an expression involving && the left-hand side is evaluated first. If node == null, it is certain that the whole expression evaluates to false, and therefore the right-hand side is not evaluated at all. On the other hand, it would have been fatal to write
while (node.key != x && node != null)
node = node.next;
because at the end of the chain node.key would be evaluated for node == null, leading to a NullPointerException. In Java and C this short-circuit evaluation of && is guaranteed by the language, so the first loop works. In some other programming languages there is no such guarantee, so relying on it is a possibly risky programming style, particularly for code that may be translated into another language. In this case it can be avoided at little extra cost by rewriting findValue() as follows:
public boolean findValue(int x)
{
if (first == null)
return false;
Node node = first;
while (node.next != null && node.key != x)
node = node.next;
return node.key == x;
}
Using linked structures in an object-oriented way requires that one or more objects of some node-type occur as instance variables in the definition of another class. These instance variables give the access points to the structure.
class Employee
{
protected String name;
protected int number;
protected double salary;
public Employee(String theName, int theNumber, double theSalary)
{
name = theName;
number = theNumber;
salary = theSalary;
}
public double getSalary()
{
return salary;
}
public int getNumber()
{
return number;
}
public void increaseSalary(double salaryIncrease)
{
salary += salaryIncrease;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary + ")";
}
public void setName(String newName)
{
name = newName;
}
}
class Company
{
protected int size;
protected int maxSize;
protected Employee staff[];
public Company(int theMaxSize)
{
size = 0;
maxSize = theMaxSize;
staff = new Employee[maxSize];
}
public int getSize()
{
return size;
}
public int getMaxSize()
{
return maxSize;
}
public void setName(int number, String name)
{
int i = 0;
while (i < size && staff[i].getNumber() != number)
i++;
if (i == size)
System.out.print("Number not found, ignoring instruction!\n");
else
staff[i].setName(name);
}
public void addEmployee(String name, int number, double salary)
{
if (size == maxSize)
System.out.print("No space left, ignoring instruction!\n");
else
{
staff[size] = new Employee(name, number, salary);
size++;
}
}
public void increaseSalary(double factor, double leastIncrease)
{
for (int i = 0; i < size; i++)
{
double increase = factor * staff[i].getSalary();
if (increase < leastIncrease)
staff[i].increaseSalary(leastIncrease);
else
staff[i].increaseSalary(increase);
}
}
public void print()
{
System.out.print("\nOverview of employees:\n");
for (int i = 0; i < size; i++)
System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
}
}
class CompanyTest
{
public static void main(String ps[])
{
Company myCompany = new Company(100);
myCompany.addEmployee("Becker, Boris", 235521, 4500.00);
myCompany.addEmployee("Hecht, Edgar", 878722, 6500.00);
myCompany.addEmployee("Albers, Marianne", 456212, 1554.00);
myCompany.addEmployee("Krauser, Angela", 426578, 1954.00);
myCompany.addEmployee("Noack, Christina", 663738, 5646.00);
myCompany.print();
myCompany.increaseSalary(0.04, 50.0);
myCompany.addEmployee("Brauer, Harald", 568900, 2200.00);
myCompany.setName(456212, "Becker, Marianne");
myCompany.print();
System.out.print("\n");
}
}
Here we have even changed Employee slightly. We have made the instance
variables "protected" in order to restrict access from outside the
class; instead, special access methods are supplied.
These methods are typically given names like "getNumber" and "setName".
Of course this means that many extra method calls are made, but
errors in future extensions are worse: code outside the class cannot
come to depend on how Employee stores its data. Never forget that in Java the
prime consideration is correctness, not speed. If speed is really
critical (as it is in programs solving very large problems and in
games), then you may do ugly things in a few small, well-documented
sections in which most of the computation is performed. However, if speed
really matters, then one may be better off writing a hack in C.
Notice also how amazingly simple it is, once Employee has been defined, to build Company on top of it: we just declare an array of Employee and add a few methods for performing operations on the Company as a whole. The main program is then little more than an empty shell. The good thing is that even without knowing about the underlying organization, any reader who understands the format immediately grasps what is going on. This is partly because of the names that were chosen, but even more because of the use of powerful subroutines and the object-oriented programming style.
Here we touch on the most important new point. What does writing "myCompany.addEmployee( ... )" or "staff[i].increaseSalary( ... )" mean? This is the second usage of the dot operator. Before, we saw that it can be used for accessing the instance variables of an object. Here we use it to connect an object with a method from its class. The semantics is as follows: the system first determines the class of the object, then searches that class for a method with a matching signature, and finally executes the method, working on the instance variables of that object. In object-oriented languages, this is the main way of calling methods.
Static methods are the exception. A static method is any method whose definition is preceded by the keyword "static". Static methods can be called without an object of their class. This implies that inside a static method there are no instance variables to use. A static method corresponds to a procedure in C and other non-object-oriented languages: the non-static methods are something new, the static ones we already know! Even in Java we already know one important example: main. Of course main must be callable without an object, because at the time it is called no object exists yet!
Static methods are encapsulated inside their classes. That is, if a static method is called from outside its class, it is not obvious where to find it (there might be static methods with the same name in several classes). Therefore, when calling a static method it is necessary to indicate where it can be found. This is done by prefixing the name of the method with the name of the class, connected by ".": another usage of the dot operator.
In principle it is possible to program in Java as in C: make one big class without instance variables and declare all methods to be static. This goes against the whole concept of object-oriented programming and is therefore considered extremely bad style. Sometimes, though, static methods are very handy; sometimes they express more clearly what is going on (a call with an object puts one object in the foreground, but an operation may use several objects as arguments in a symmetric way); and sometimes there is no alternative: as we mentioned before, variables of the primitive data types are not objects. So, how should one compute e^x for a double x? For this reason the exponential function, like many other mathematical functions, is static. This allows it to be called the conventional way, without first converting a double into a Double (Double is the class with a double as an instance variable). Accordingly, the class Math defines the method exp as
public static double exp(double a)
It can be called by writing Math.exp(x).
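A minimal sketch of the two calling conventions, using a hypothetical class Point that is not part of the original program: the instance method distanceTo puts one point in the foreground, while the static method distance treats both points symmetrically and, like Math.exp, is called through its class name.

```java
// Hypothetical class Point, used only to contrast instance and static methods.
class Point
{
    private double x, y;

    Point(double x, double y)
    {
        this.x = x;
        this.y = y;
    }

    // Instance method: works on the instance variables of the object it is called on.
    double distanceTo(Point other)
    {
        double dx = x - other.x;
        double dy = y - other.y;
        return Math.sqrt(dx * dx + dy * dy);   // Math.sqrt is itself a static method
    }

    // Static method: there is no current object; both points are ordinary parameters.
    static double distance(Point p, Point q)
    {
        return p.distanceTo(q);
    }

    public static void main(String[] args)
    {
        Point a = new Point(0.0, 0.0);
        Point b = new Point(3.0, 4.0);
        System.out.println(a.distanceTo(b));       // instance call: object.method(...)
        System.out.println(Point.distance(a, b));  // static call: Class.method(...)
        System.out.println(Math.exp(0.0));         // static method of class Math
    }
}
```

Whether distance or distanceTo reads better depends on whether one of the points plays a special role in the operation.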
Constructors are a somewhat special case. They are called by writing new followed by the name of the method, and because this name is identical to the name of the class, it is clear where to find them. No object is passed, so in this sense a constructor resembles a static method; but the constructor allocates the object, and therefore the instance variables are available as in non-static methods.
In object-oriented programming, the default way of calling methods is by connecting an object of the appropriate class to the method with help of the dot operator. The method is working on this object. Static methods are called by specifying the class without passing an object.
Computing fib(n), the n-th Fibonacci number, by directly using fib(n) = fib(n - 1) + fib(n - 2) gives an algorithm whose time consumption increases exponentially with n. Of course Fibonacci numbers can easily be computed iteratively, but that is not the point here: this problem stands for a whole class of problems. An efficient recursive algorithm can be obtained by computing not only fib(n) but also fib(n - 1). From these two values, fib(n + 1) and fib(n) can be computed in constant time, and thus the time for computing fib(n) increases linearly with n, as it should.
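For comparison, a minimal sketch of the direct recursion, with a static counter (a hypothetical addition, not part of the original program) that records how many calls are made. The number of calls for fib(n) equals 2 fib(n + 1) - 1, so it grows as fast as the Fibonacci numbers themselves.

```java
// Direct recursion fib(n) = fib(n - 1) + fib(n - 2), instrumented with a
// hypothetical static counter that makes the exponential number of calls visible.
class NaiveFibonacci
{
    static long calls = 0;

    static int fib(int n)
    {
        calls++;
        if (n <= 1)
            return n;                        // fib(0) = 0, fib(1) = 1
        return fib(n - 1) + fib(n - 2);      // two recursive calls per level
    }

    public static void main(String[] args)
    {
        for (int n = 10; n <= 30; n += 10)
        {
            calls = 0;
            int value = fib(n);
            System.out.println("fib(" + n + ") = " + value + " needed " + calls + " calls");
        }
    }
}
```

For n = 10 this makes 177 calls, for n = 20 already 21891.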
In C the two computed values may be handed over using variables of type int*. In Java each variable could be wrapped individually, but doing so ignores the structure of the problem: if the method should return a pair of values, then we should use objects of a class which can hold a pair of integers. A static method returning such pairs is easily turned into a correct and efficient program, but it leads to a functional rather than an object-oriented approach. In an object-oriented context, it is cleaner to let the method work on an object of such a class than to let a static method return objects of this class. Taking all these considerations into account, we get the following program:
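import java.io.*;
class IO
{
// One reader for the whole program, so that no buffered input is lost between calls.
private static BufferedReader bufRead = new BufferedReader
(new InputStreamReader(System.in));
public static int readInt()
// Reads an int from standard input.
{
String input = null;
try
{
input = bufRead.readLine();
}
catch (java.io.IOException e)
{
System.out.print("Error while reading input line!\n");
}
if (input == null) // end of input or read error
input = "";
// Throws a NumberFormatException if the line does not contain an int.
return Integer.parseInt(input.trim());
}
}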
class Fibonacci
{
private int x, y;
private Fibonacci()
{
x = y = 0;
}
private void recFib(int n)
{
if (n == 1)
{
x = 0;
y = 1;
}
else
{
recFib(n - 1);
y += x;
x = y - x;
}
}
static int fib(int n)
{
if (n == 0)
return 0;
Fibonacci p = new Fibonacci();
p.recFib(n);
return p.y;
}
}
class FibonacciTest
{
public static void main(String[] args)
{
System.out.print("\nGive n >>> ");
int n = IO.readInt();
System.out.println("Computed value = " + Fibonacci.fib(n) + "\n");
}
}
Here we will not try to understand the method readInt(). In Java even
IO is handled in a clean object-oriented way, but here one would prefer C's
basic but convenient routines: unformatted writing is easy, but reading
and formatted writing require quite elaborate methods. More interesting
is the class Fibonacci. It contains a static method fib(), which is
called from main(). Notice that when readInt() and fib() are called from
main(), they are prefixed with the names of their classes, IO and
Fibonacci.
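Java 5 and later also offer routines close to C's: the class java.util.Scanner for reading and System.out.printf for formatted writing. A minimal sketch of a replacement for readInt(), assuming such a Java version; the class name ScannerIO is hypothetical.

```java
// Hypothetical alternative to IO.readInt(), assuming Java 5 or later.
class ScannerIO
{
    // One shared Scanner, so that no buffered input is lost between calls.
    private static final java.util.Scanner in = new java.util.Scanner(System.in);

    static int readInt()
    {
        // Reads the next whitespace-separated token; throws an
        // InputMismatchException if it is not an int.
        return in.nextInt();
    }

    public static void main(String[] args)
    {
        System.out.printf("Give n >>> ");
        int n = readInt();
        System.out.printf("You entered %d%n", n);   // formatted output as with C's printf
    }
}
```

Unlike readInt() above, which reads a whole line, nextInt() reads a single token, so several numbers may be given on one line.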
In fib() an object p of the class Fibonacci is created, and recFib() is called on this object. In recFib() recursive calls are made. Remember that the statement "recFib(n - 1)" is equivalent to "this.recFib(n - 1)"; in this way p, or more precisely a reference to p, is handed all the way down to the bottom of the recursion. There x and y are initialized to fib(0) = 0 and fib(1) = 1. Remember that whenever a method of a class works with instance variables, these are the instance variables of the current object; in our case this is the object p. Then the recursion returns step by step: each level replaces (x, y) = (fib(k - 2), fib(k - 1)) by (fib(k - 1), fib(k)), so that finally x = fib(n - 1) and y = fib(n). The second of these values is returned by fib().
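For contrast, a minimal sketch of the functional variant mentioned above, in which a static method returns a pair object instead of working on the instance variables of p. The class FibPair is hypothetical.

```java
// Hypothetical "functional" variant: a static method returns an immutable
// pair (fib(n - 1), fib(n)) instead of updating the instance variables of an object.
class FibPair
{
    final int previous;   // fib(n - 1)
    final int current;    // fib(n)

    FibPair(int previous, int current)
    {
        this.previous = previous;
        this.current = current;
    }

    // Returns the pair (fib(n - 1), fib(n)) for n >= 1, using n recursive calls.
    static FibPair pairFib(int n)
    {
        if (n == 1)
            return new FibPair(0, 1);
        FibPair p = pairFib(n - 1);
        return new FibPair(p.current, p.previous + p.current);
    }

    static int fib(int n)
    {
        return n == 0 ? 0 : pairFib(n).current;
    }
}
```

Each level allocates a new FibPair, whereas Fibonacci reuses the single object p; this is the trade-off between the functional and the object-oriented style described above.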
class Node
{
static int totalSize = 0;
int key;
Node next;
Node(int key, Node next)
{
this.key = key;
this.next = next;
totalSize++;
}
Node(int key)
{
this(key, null);
}
protected void finalize()
{
totalSize--;
}
}
class Chain
{
Node first;
public Chain()
{
first = null;
}
private Node getLast()
// Return the last node of a chain
{
if (first == null)
return null;
Node node = first;
while (node.next != null)
node = node.next;
return node;
}
public void addFirst(int key)
// Add a new node at the beginning of the chain
{
first = new Node(key, first);
}
public void addLast(int key)
// Add a new node at the end of the chain
{
if (first == null)
first = new Node(key);
else
getLast().next = new Node(key);
}
public void concatenate(Chain chain)
// Attach the Chain chain at the end of the considered chain
{
if (first == null)
first = chain.first;
else
getLast().next = chain.first;
chain.first = null;
}
public boolean findValue(int x)
// Test whether there is a node with key value x
{
Node node = first;
while (node != null && node.key != x)
node = node.next;
return node != null;
}
public void print()
// Print all the keys together with their position in the list
{
int counter = 0;
Node node = first;
while (node != null)
{
System.out.println("Node " + counter + " has key " + node.key);
counter++;
node = node.next;
}
}
}
class ChainTest
{
public static void main(String ps[])
{
Chain c1 = new Chain();
Chain c2 = new Chain();
System.out.println("\nCreating chain 1\n");
c1.addFirst(12);
c1.addFirst(22);
c1.addFirst(16);
c1.addFirst(14);
c1.addFirst(20);
c1.addFirst(18);
for (int i = 0; i < 100; i++)
if (c1.findValue(i))
System.out.print(i + " is among the stored values\n");
c1.print();
System.out.println("Total number of nodes = " + Node.totalSize);
System.out.println("\nCreating chain 2\n");
c2.addLast(11);
c2.addLast(23);
c2.addLast(19);
c2.addLast(37);
c2.addLast(21);
for (int i = 0; i < 100; i++)
if (c2.findValue(i))
System.out.print(i + " is among the stored values\n");
c2.print();
System.out.println("Total number of nodes = " + Node.totalSize);
System.out.println("\nConcatenating chains\n");
c1.concatenate(c2);
for (int i = 0; i < 100; i++)
if (c1.findValue(i))
System.out.print(i + " is among the stored values\n");
System.out.println("\nChain 1:\n");
c1.print();
System.out.println("\nChain 2:\n");
c2.print();
System.out.println("Total number of nodes = " + Node.totalSize);
System.out.println("\nRemoving chain 1\n");
c1 = null;
System.gc();
System.out.println("Total number of nodes = " + Node.totalSize);
}
}
The class Chain is unchanged. Node is augmented by a static variable. A static variable is the fourth kind of variable, besides instance variables, parameters and local variables. Static variables might better be called class variables, as they belong to the class and not to an instance: for all the objects of a class there is only one copy of a static variable. This is the ideal way to maintain information pertaining to the class as a whole. The prime example is a counter which keeps track of the number of objects in existence. For example, one may count how many external ports are in use; once a new port is requested while the maximum number is already in use, some special action must be taken. Inside the class these variables can be accessed just like instance variables. Outside the class they are accessed analogously to static methods: the name of the static variable is prefixed with the name of the class, connected by ".". An example is found in the instruction
System.out.println("Total number of nodes = " + Node.totalSize);
Of course this access is possible only if the variable is not private.
In Node we now also find a new method called finalize. The method finalize is called automatically by the system when the garbage collector removes an object, once for each removed object. By default it is (invisibly) part of every class definition and does nothing, but one can override it to give it some functionality. Especially when static variables are used to count occupied resources, it is important to decrease their value again when these resources are freed. Be aware, however, that finalize has been deprecated since Java 9 and that Java gives no guarantee that it is ever called.
Now one might think that in our example the value printed in the last line is 0: because there are no references to the chain anymore, all its nodes have become garbage, inaccessible allocated parts of memory, so they could be removed. However, garbage collection is done lazily: typically it is only performed when the need arises or when the processor is idle anyway. Therefore, without further measures, the printed value will most likely be 11. If one wants the garbage collector to run, then one should add, as in the program above, a call to the static method gc from System:
System.gc();
Even this call is only a request: the virtual machine makes a best effort, and finalizers typically run on a separate thread, so the printed value may still not be 0. Notice that in concatenate the final instruction, chain.first = null, removes the attached chain's reference to its nodes. Without this instruction, the nodes of the second chain would still have been reachable through c2, and the garbage collector would only throw away the 6 nodes that originally formed chain 1. It is strongly suggested that readers actually try these variants of the program and understand what is happening.
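In total we have encountered four different uses of the dot operator: accessing an instance variable of an object (node.next), calling a method on an object (staff[i].increaseSalary( ... )), calling a static method of a class (Math.exp(x)) and accessing a static variable of a class (Node.totalSize).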
The main distinction is between static and non-static, not between variables and methods. In an object-oriented language, methods are considered to belong to classes and objects just as much as variables do. To underline this, a non-static method may be called an instance method and a static method a class method. Internally the difference is small: after compilation every method, static or not, is stored only once per class, not once per object; an instance method simply receives the current object, this, as an additional hidden argument.
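A small sketch that shows the four uses of the dot operator side by side, with a hypothetical class Counter:

```java
// Hypothetical class illustrating the four uses of the dot operator.
class Counter
{
    static int created = 0;   // static (class) variable: one copy for the whole class
    int value;                // instance variable: one copy per object

    Counter()
    {
        created++;
    }

    void increment()          // instance method
    {
        value++;
    }

    static int getCreated()   // static (class) method
    {
        return created;
    }

    public static void main(String[] args)
    {
        Counter c = new Counter();
        c.increment();                              // object.instanceMethod()
        System.out.println(c.value);                // object.instanceVariable
        System.out.println(Counter.created);        // Class.staticVariable
        System.out.println(Counter.getCreated());   // Class.staticMethod()
    }
}
```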
Static variables are class variables: one copy exists for all objects of a given class. This is particularly useful for counters. In order to keep the count up to date in the context of automatic garbage collection one should override the method finalize().
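Because nothing guarantees when, or whether, finalize() runs, a counter of occupied resources is more reliably kept up to date by releasing each resource explicitly. A minimal sketch, assuming Java 7 or later, where a try-with-resources statement calls close() automatically; the class Port and its limit MAX are hypothetical, echoing the port example above.

```java
// Hypothetical resource class whose open instances are counted by a static variable.
class Port implements AutoCloseable
{
    static int inUse = 0;          // class variable: number of ports currently open
    static final int MAX = 2;      // hypothetical maximum number of ports
    private boolean open = true;

    Port()
    {
        if (inUse == MAX)
            throw new IllegalStateException("No port available");
        inUse++;
    }

    @Override
    public void close()            // releases the port explicitly, at most once
    {
        if (open)
        {
            open = false;
            inUse--;
        }
    }

    public static void main(String[] args)
    {
        try (Port p = new Port(); Port q = new Port())
        {
            System.out.println("Ports in use: " + Port.inUse);
        }   // q.close() and p.close() are called here, in this order
        System.out.println("Ports in use: " + Port.inUse);
    }
}
```

The counter is decreased at a well-defined point in the program rather than whenever the garbage collector happens to run.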
It is harder if we do not want to add an instance variable or a method but to change one. For example, we may want to change the type of the node first in Chain from Node to BetterNode, or we may want to replace a method which is good in the general case by a method which is better in a special case. Of course we could give the new version a different name and add it anyway. This is, however, quite ugly and confusing. It would at least require very good documentation to make sure that later updates indeed choose the right methods. In any case it increases the number of variables and methods unnecessarily.
Now assume that we want to maintain objects with slightly different features in a common structure, for example an array. Think of a shop selling all kinds of things. For food articles there is a sell-by date, for non-food articles there may be seasons to respect. But all of them have a price. So, it makes sense to maintain all objects in one array and to call a method for increasing the prices of all of them. In C this is really hard to realize.
All these aspects are dealt with in a natural way by the idea of inheritance. Inheritance means that one defines a new class as an extension of an existing class. Such a new class is called a derived class; the class which it extends will be called the mother class or base class.
A derived class inherits all the instance variables and methods of its mother class. In addition, new instance variables and methods may be added. Instance variables of the mother class may even be declared again, hiding the variable of the mother class. Methods can be overridden. Frequently a method in a derived class is merely a small modification of a method in the mother class. In that case it is natural, and possible, to reuse the code from the mother class by a special calling mechanism.
Inheritance is the key concept of object-oriented programming. It allows one to add, extend and adapt the functionality of methods and to add instance variables to a class in a hierarchical way.
All that has been said so far holds for any object-oriented language, possibly with some differences in terminology. The concrete example brings us back to Java. The class definitions of Employee and Company are not repeated; these classes are considered fixed. All of the following classes are built on top of these two. It turns out that while designing these original classes, we could have been slightly more extension oriented: one method is formulated in an unsuitable way, another is not defined at all, even though it will be needed in all derived classes. Therefore the following construction is slightly more complex than necessary. For that very reason it may be considered a realistic example.
class FixedEmployee extends Employee
{
public FixedEmployee(String name, int number, double salary)
{
super(name, number, salary);
}
public void endOfYear()
{
}
}
class Director extends FixedEmployee
{
private double yearlyBudget;
private double budget;
public Director(String name, int number, double salary,
double theBudget)
{
super(name, number, salary);
yearlyBudget = budget = theBudget;
}
public void endOfYear()
{
budget = budget / 2 + yearlyBudget;
}
public void expense(double amount)
{
budget -= amount;
}
public void increaseSalary(double salaryIncrease)
{
if (budget >= 0)
super.increaseSalary(2.0 * salaryIncrease);
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary +
", director, " + yearlyBudget + ", " + budget + ")";
}
}
FixedEmployee only adds the method endOfYear, with an empty body, so
that this method is defined in all derived classes, which can override it.
Director is defined as an extension of FixedEmployee. Director has two additional instance variables: "yearlyBudget" and "budget". The new constructor has one more parameter. It first calls super( ... ), which in this case means a call to the constructor of the mother class. However, the usage of super is not limited to this case: it generally denotes methods or instance variables of the mother class. The counterpart is this, which we already encountered in class Node. It generally denotes the current object, or a method, particularly a constructor, of the current class.
"endOfYear" and "expense" are new methods. More interesting are the methods which already existed before: "increaseSalary" and "toString". These override the methods with the same names in the mother class. increaseSalary calls the method of the mother class by prefixing its name with super.
class LowerEmployee extends FixedEmployee
{
protected int vacationDays;
protected int yearlyVacationDays;
public LowerEmployee(String name, int number, double salary,
int theYearlyVacationDays)
{
super(name, number, salary);
vacationDays = 0;
yearlyVacationDays = theYearlyVacationDays;
}
public int applyVacation(int numberOfDays)
{
if (numberOfDays > vacationDays)
numberOfDays = vacationDays;
vacationDays -= numberOfDays;
return numberOfDays;
}
public void endOfYear()
{
vacationDays = vacationDays / 2
+ yearlyVacationDays;
}
}
The class LowerEmployee has the same features as Director: a few new
instance variables and methods. In the constructor the constructor of
the mother class is again called. It is a requirement that this call is
the first statement of any constructor in a derived class.
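A minimal sketch, with hypothetical classes Base, Middle and Leaf, showing in which order constructors run when each first calls the constructor of its mother class:

```java
// Hypothetical three-level hierarchy that records the order in which constructors run.
class Base
{
    static StringBuilder log = new StringBuilder();

    Base()
    {
        log.append("Base ");
    }
}

class Middle extends Base
{
    Middle()
    {
        super();                 // must be the first statement of the constructor
        log.append("Middle ");
    }
}

class Leaf extends Middle
{
    Leaf()
    {
        super();
        log.append("Leaf");
    }

    public static void main(String[] args)
    {
        new Leaf();
        System.out.println(log); // prints "Base Middle Leaf"
    }
}
```

If the explicit call is omitted, the compiler inserts a call super() to the constructor without parameters. This is why Staff must call super( ... ) explicitly: its mother class LowerEmployee has no constructor without parameters.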
class Staff extends LowerEmployee
{
private int overTime;
public Staff(String name, int number, double salary,
int yearlyVacationDays)
{
super(name, number, salary, yearlyVacationDays);
overTime = 0;
}
public void addOvertime(int hours)
{
overTime += hours;
}
public void endOfYear()
{
super.endOfYear();
vacationDays += overTime / 10;
overTime = 0;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary +
", staff, " + yearlyVacationDays + ", " + vacationDays + ")";
}
}
class Worker extends LowerEmployee
{
private static int shiftVacationDays = 5;
private boolean shiftDuty;
public Worker(String name, int number, double salary,
int yearlyVacationDays, boolean theShiftDuty)
{
super(name, number, salary, yearlyVacationDays);
shiftDuty = theShiftDuty;
}
public void increaseSalary(double salaryIncrease)
{
if (shiftDuty)
super.increaseSalary(1.1 * salaryIncrease);
}
public void endOfYear()
{
super.endOfYear();
if (shiftDuty)
vacationDays += shiftVacationDays;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary +
", worker, " + yearlyVacationDays + ", " + vacationDays +
", " + shiftDuty + ")";
}
}
The variable shiftVacationDays is static. This means that this is not
an individual quantity, but common to all members of the class.
class BetterCompany extends Company
{
public BetterCompany(int maxSize)
{
super(maxSize);
}
public void addEmployee(FixedEmployee newEmployee)
{
if (size == maxSize)
System.out.print("No space left, ignoring instruction!\n");
else
{
staff[size] = newEmployee;
size++;
}
}
public void endOfYear()
{
for (int i = 0; i < size; i++)
if (staff[i] instanceof FixedEmployee)
((FixedEmployee) staff[i]).endOfYear();
}
public void expense(int number, double amount)
{
int i = 0;
while (i < size && staff[i].getNumber() != number)
i++;
if (i == size)
System.out.print("Number not found, ignoring instruction!\n");
else
if (staff[i] instanceof Director)
((Director) staff[i]).expense(amount);
else
System.out.print("Employee with number " + number +
" is not a director, ignoring instruction!\n");
}
}
The class BetterCompany corrects an omission in Company: a method
addEmployee that takes an already constructed employee object as its
parameter. Notice that in this case we do not say that addEmployee
overrides the method with the same name in the mother class: the
signatures of these methods are not the same. Here we rather encounter
polymorphic variants, more commonly called overloaded methods.
The new method endOfYear makes it possible to perform endOfYear in the same way as increaseSalary in the original version. The new method expense makes it possible to call the method expense of Director in the same way as before we could call setName.
In endOfYear we see the operator "instanceof". The reason for this is that staff[] is an array of Employee objects. Even though we might believe that these are actually of type FixedEmployee, for which the method endOfYear is defined, there might also be a derived class of Employee, say TemporaryAid, for which endOfYear is not defined. At this point it is important to introduce the difference between the declared type and the actual type of a variable. The declared type of staff[i] is Employee; the actual type may be Employee or any of its derived classes. instanceof determines at runtime the actual type of the object a variable refers to, and returns true if this is the specified class or one of its derived classes (and false if the variable is null).
Even though we now are sure that the application of endOfYear is correct, it still does not work to simply write
staff[i].endOfYear();
The problem is that endOfYear is not mentioned in class Employee, so
at compile time this call looks wrong. Therefore it is necessary to add a
so-called cast. A cast is a forced type conversion: we tell the compiler
to treat staff[i] as a FixedEmployee, on our own responsibility. The
object itself is not changed, and the runtime checks the cast, throwing
a ClassCastException if it is wrong. Notwithstanding the cast, at runtime
the actual type determines which method to select.
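A minimal sketch with hypothetical classes Article and FoodArticle (echoing the shop example) that follows the same pattern as endOfYear in BetterCompany: the declared type decides what compiles, instanceof guards the cast, and a wrong cast fails only at runtime.

```java
// Hypothetical classes showing declared versus actual type, instanceof and casts.
class Article
{
    String describe() { return "article"; }
}

class FoodArticle extends Article
{
    @Override
    String describe() { return "food"; }   // overrides Article.describe

    int daysToExpiry() { return 3; }        // exists only in FoodArticle
}

class CastDemo
{
    public static void main(String[] args)
    {
        Article[] stock = { new Article(), new FoodArticle() };
        for (Article a : stock)
        {
            // a.daysToExpiry() would not compile: the declared type of a is Article.
            if (a instanceof FoodArticle)
                System.out.println(((FoodArticle) a).daysToExpiry());
            System.out.println(a.describe());   // chosen by the actual type
        }
        try
        {
            FoodArticle f = (FoodArticle) stock[0];   // actual type is Article
        }
        catch (ClassCastException e)
        {
            System.out.println("cast rejected at runtime");
        }
    }
}
```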
Now we have everything we need for a main program with considerably larger functionality. The changes are small. If Company had been designed better, with a method addEmployee taking an employee object as its parameter, the changes would have been even smaller.
class CompanyTest
{
public static void main(String ps[])
{
BetterCompany myCompany = new BetterCompany(100);
myCompany.addEmployee(
new Staff("Becker, Boris", 235521, 4500.00, 28));
myCompany.addEmployee(
new Director("Hecht, Edgar", 878722, 6500.00, 10000000));
myCompany.addEmployee(
new Worker("Albers, Marianne", 456212, 1554.00, 23, false));
myCompany.print();
myCompany.endOfYear();
myCompany.print();
myCompany.addEmployee(
new Worker("Krauser, Angela", 426578, 1954.00, 25, true));
myCompany.addEmployee(
new Staff("Noack, Christina", 663738, 5646.00, 32));
myCompany.print();
myCompany.increaseSalary(0.04, 50.0);
myCompany.addEmployee(
new Worker("Brauer, Harald", 568900, 2200.00, 25, true));
myCompany.setName(456212, "Becker, Marianne");
myCompany.expense(878722, 73000);
myCompany.print();
System.out.print("\n");
}
}
Running the program gives the following output, clearly showing the
result of the more individual treatment.
Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 0)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.0E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 0, false)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false)
Employee[3] = (Krauser, Angela, 426578, 1954.0, worker, 25, 0, true)
Employee[4] = (Noack, Christina, 663738, 5646.0, staff, 32, 0)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4680.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 7020.0, director, 1.0E7, 1.4927E7)
Employee[2] = (Becker, Marianne, 456212, 1554.0, worker, 23, 23, false)
Employee[3] = (Krauser, Angela, 426578, 2039.976, worker, 25, 0, true)
Employee[4] = (Noack, Christina, 663738, 5871.84, staff, 32, 0)
Employee[5] = (Brauer, Harald, 568900, 2200.0, worker, 25, 0, true)
The above gives an example of polymorphism in the stricter sense: polymorphism means that variables can actually stand for different kinds of objects. This implies that parts of the program which are formulated in general terms can be applied to different kinds of objects. This notion is closely linked to the notion of dynamic binding: the phenomenon described above, that at runtime the actual type is used to determine which of several methods with identical signature is executed.
At compile time, it is checked that any method is connected by the dot operator to an object of a class in which this method is defined. This is done by checking the declared type of the object. At run time, the method to execute is chosen by looking at the actual type of the object connected to the method.
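Overloaded variants (such as the two addEmployee methods) and overridden methods (such as endOfYear) are selected by different rules, as the following sketch with hypothetical classes Animal, Dog and Describer illustrates: overloading is resolved at compile time using the declared type of the argument, overriding at runtime using the actual type of the object.

```java
// Hypothetical classes contrasting compile-time overload resolution
// (declared type) with run-time dynamic binding (actual type).
class Animal
{
    String sound() { return "..."; }
}

class Dog extends Animal
{
    @Override
    String sound() { return "woof"; }
}

class Describer
{
    // Two overloaded variants: same name, different parameter types.
    static String kind(Animal a) { return "animal"; }
    static String kind(Dog d) { return "dog"; }

    public static void main(String[] args)
    {
        Animal a = new Dog();                 // declared type Animal, actual type Dog
        System.out.println(a.sound());        // "woof": overriding, chosen at runtime
        System.out.println(kind(a));          // "animal": overloading, chosen at compile time
        System.out.println(kind(new Dog()));  // "dog"
    }
}
```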
Now it is time to mention that any class without an explicit extends clause is implicitly defined as an extension of class Object, so Object is at the top of the class hierarchy. Without knowing it, we have already been using this fact. Consider a print statement of the following kind:
System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
How does this work? First the expression between the round brackets is
evaluated. Here we use that the operator "+" is polymorphic, although
for operators we rather say that they are overloaded. So,
depending on the types of the arguments, "+" has a different effect.
This is nothing new: we already know that 3 / 4 < 0.5, while 3.0 / 4 > 0.5. The reason is that in the first case "/" is evaluated as an integer operation, while in the second case it is evaluated as an operation between doubles. The rule for "/" is that it is evaluated as an integer operation if both its arguments are integers. If one of the arguments is a float or a double, then the other argument is converted to this type as well before the division is performed. Notice that the type of the variable receiving the result has no impact: if x is a double, then "x = 3 / 4" is equivalent to writing "x = 0". Slightly more tricky is that "x = 3 / 4 * 10.0" has the same effect. The reason is that operators with the same priority are evaluated from left to right (in this case), so 3 / 4 is computed as an integer division before the multiplication by 10.0.
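The expressions from the paragraph above can be checked with a small program (the class name DivisionDemo is hypothetical); the last line shows that reordering the operands, so that a double is involved from the start, gives 7.5:

```java
// Evaluates the division examples discussed in the text.
class DivisionDemo
{
    public static void main(String[] args)
    {
        double x;
        System.out.println(3 / 4);    // integer division: 0
        System.out.println(3.0 / 4);  // 4 is converted to double: 0.75
        x = 3 / 4;                    // 0 is computed first, then converted: 0.0
        System.out.println(x);
        x = 3 / 4 * 10.0;             // (3 / 4) * 10.0 = 0 * 10.0 = 0.0
        System.out.println(x);
        x = 3 * 10.0 / 4;             // (3 * 10.0) / 4 = 30.0 / 4 = 7.5
        System.out.println(x);
    }
}
```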
The rules for "+" are different but similar. "+" between two String objects concatenates them. If one argument is a String and the other is not, the other argument is first converted to a String: for an object the method toString is called, for a value of a primitive type its usual textual form is used. Because toString is defined in Object, this always works. Not overriding toString results in a standard layout (the class name followed by a hash code); overriding toString, as is done in Employee, allows one to tune the output. Only when both arguments of "+" are of a numerical type is an addition performed. Therefore we have
"Value = " + i + i != "Value = " + (i + i)
i + i + "= Value" != i + (i + "= Value")
Here != means that the resulting strings differ; in a Java program the contents of strings are compared with equals, because != compares references.
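The same two pairs of expressions, evaluated with i = 2 in a small hypothetical program:

```java
// Shows how the left-to-right evaluation of "+" mixes concatenation and addition.
class ConcatDemo
{
    public static void main(String[] args)
    {
        int i = 2;
        System.out.println("Value = " + i + i);     // ("Value = " + 2) + 2 gives "Value = 22"
        System.out.println("Value = " + (i + i));   // integer addition first gives "Value = 4"
        System.out.println(i + i + "= Value");      // (2 + 2) + "= Value" gives "4= Value"
        System.out.println(i + (i + "= Value"));    // 2 + "2= Value" gives "22= Value"
        // String contents are compared with equals(); != would compare references.
        System.out.println(("Value = " + i + i).equals("Value = " + (i + i)));   // false
    }
}
```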
Casts are sometimes needed to obtain a forced type conversion.
Unfortunately there is no modifier meaning "the own class and all derived classes". Because "protected" also grants access to all classes in the same package, the only way to obtain this is to declare a method or instance variable "protected" and not to put any non-derived classes in the same package.
A careful choice of the applied modifiers is of great importance: making everything public is convenient, but implies that external applications may essentially use features of the internal realization of a class. If later one wants to change this internal realization, then it may happen that these applications do not run correctly anymore.
It is good practice to fix a well-defined interface between a class and the outside world, that is, to fix which instance variables of an object should be visible and which methods should be callable. Less visibility gives more flexibility! Classes should be defined according to their functionality, not according to how it is realized. For example, a Chain has the functionality of a special kind of (multi)set, with two insert operations and the possibility to unify two Chain objects. The general idea of limiting access is called encapsulation; it is one of the cornerstones of object-oriented programming.
The above argument should have made clear that it is wrong to use only public. But using only "private" and "public" is not good either. Sometimes classes are designed with the explicit purpose of being derived from. Employee can be considered to be of this type. One may consider the structure of Employee to be so reasonable that there will never be a need to modify it. At the same time, derivations are considerably facilitated if the instance variables and methods are accessible from the derived classes. Therefore, we have chosen to make the instance variables of Employee "protected".
The access modifiers allow the programmer to fix the degree of encapsulation of classes, objects and methods. Mostly instance variables are private or protected and can only be accessed through special access methods.
Abstract is more or less the opposite of final: an "abstract" class must be derived from. One cannot create any objects of an abstract class. Likewise, an abstract method must be overridden. To keep this consistent, the designers of Java decided that abstract methods can only appear in abstract classes (but an abstract class can have methods that are not abstract).
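A minimal sketch of an abstract class, with hypothetical classes Shape and Rectangle:

```java
// Hypothetical abstract class: objects can only be created of derived classes.
abstract class Shape
{
    abstract double area();   // must be overridden in every non-abstract derived class

    String describe()         // an abstract class may also contain ordinary methods
    {
        return getClass().getSimpleName() + " with area " + area();
    }
}

class Rectangle extends Shape
{
    private final double width, height;

    Rectangle(double width, double height)
    {
        this.width = width;
        this.height = height;
    }

    @Override
    double area()
    {
        return width * height;
    }

    public static void main(String[] args)
    {
        // Shape s = new Shape();  would not compile: Shape is abstract
        Shape s = new Rectangle(2.0, 3.0);
        System.out.println(s.describe());   // prints "Rectangle with area 6.0"
    }
}
```

describe() in Shape calls the abstract method area(); by dynamic binding, the implementation of the actual class is used.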
There are good reasons to allow polyinheritance (usually called multiple inheritance): many objects incorporate aspects of several more general classes. A person can be both an Employee and a ClubMember; an article in a shop may be both a FoodArticle and a LuxuryArticle. However, polyinheritance may also lead to consistency problems: if BClass and CClass each extend AClass, and DClass extends both BClass and CClass, then methods from AClass are inherited along two paths. If a method from AClass has been overridden in BClass and/or CClass, then at runtime it would not be clear which variant to take.
To exclude this kind of problem, polyinheritance of classes is forbidden in Java. Other languages address the problem differently: one might generally allow polyinheritance, but forbid inheritance relations which result in equally valid variants of a method. Or one might allow any kind of inheritance and, when a method is inherited several times, for example always select the variant from the first listed class in which it is defined. Java has chosen the most restrictive approach, assuring correctness and facilitating the task of the compiler, at the expense of programming possibilities.
Each class extends at most one other class, which ensures that the inheritance hierarchy has a tree structure. However, a class may implement many interfaces, each telling which methods are guaranteed to exist.
The properties of interfaces can easily be summarized:
Because interfaces have neither instance variables nor worked-out methods, implementing several interfaces causes none of the problems described above; therefore a class may implement any number of interfaces.
Classes may implement many interfaces, telling which methods certainly exist. Specifying that a class implements an interface gives its minimal guaranteed functionality.
interface EndOfYearable
{
public void endOfYear();
}
class Director extends Employee implements EndOfYearable
{
...
public void endOfYear()
{
budget = budget / 2 + yearlyBudget;
}
...
}
class LowerEmployee extends Employee implements EndOfYearable
{
...
public void endOfYear()
{
vacationDays = vacationDays / 2
+ yearlyVacationDays;
}
...
}
class Staff extends LowerEmployee
{
...
public void endOfYear()
{
super.endOfYear();
vacationDays += overTime / 10;
overTime = 0;
}
...
}
class Worker extends LowerEmployee
{
...
public void endOfYear()
{
super.endOfYear();
if (shiftDuty)
vacationDays += shiftVacationDays;
}
...
}
class BetterCompany extends Company
{
...
public void endOfYear()
{
for (int i = 0; i < size; i++)
if (staff[i] instanceof EndOfYearable)
((EndOfYearable) staff[i]).endOfYear();
}
...
}
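In the EndOfYearable example each class extends one class and implements one interface. The following sketch, with hypothetical names Person, Salaried, ClubMember and WorkingClubMember, shows the case of an object with aspects of several more general types: the class extends exactly one class but implements two interfaces, and a method such as greet() relies only on the ClubMember aspect.

```java
// Hypothetical names throughout; this is not the Employee class of the text.
interface Salaried
{
    int monthlySalary();
}

interface ClubMember
{
    String clubName();
}

class Person
{
    protected String name;

    Person(String name)
    {
        this.name = name;
    }
}

// Extends exactly one class, but implements two interfaces.
class WorkingClubMember extends Person implements Salaried, ClubMember
{
    WorkingClubMember(String name)
    {
        super(name);
    }

    public int monthlySalary()
    {
        return 2500;
    }

    public String clubName()
    {
        return "Chess club";
    }
}

class InterfaceDemo
{
    // Accepts any object that guarantees the ClubMember methods.
    static String greet(ClubMember m)
    {
        return "Welcome to the " + m.clubName();
    }

    public static void main(String[] args)
    {
        WorkingClubMember w = new WorkingClubMember("Anna");
        System.out.println(greet(w));          // prints "Welcome to the Chess club"
        System.out.println(w.monthlySalary()); // prints 2500
    }
}
```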
public interface Comparable
{
int compareTo(Object o);
}
So, any class which implements Comparable promises to provide an
implementation of the method compareTo() returning an int. Of course,
this method can be realized in any way, but some ways make more sense
than others. The intended semantics of compareTo() is that it returns
a negative integer, zero, or a positive integer as this object is less
than, equal to, or greater than the specified object.
Consider the following classes:
class CompString implements Comparable
{
char[] characters;
CompString(String string)
{
characters = string.toCharArray();
}
public int compareTo(Object o)
{
return characters.length - ((CompString) o).characters.length;
}
}
class CompMatrix extends IntegerMatrix implements Comparable
{
CompMatrix(int size)
{
super(size);
}
CompMatrix(IntegerMatrix matrix)
{
super(matrix);
}
public int compareTo(Object o)
{
return trace() - ((IntegerMatrix) o).trace();
}
}
These classes have nothing to do with each other except that they both
implement compareTo() in one of many possible ways (one could think of
more sensible ways to compare strings and matrices).
The great advantage is that arrays of instances of the classes
CompString and CompMatrix, and of any other class which implements the
Comparable interface, can be sorted with the following sorting method:
class CompSort
{
static void sort(Comparable[] a, int n)
{
for (int r = n - 1; r > 0; r--)
for (int i = 0; i < r; i++)
if (a[i].compareTo(a[i + 1]) > 0)
{
Comparable x = a[i];
a[i] = a[i + 1];
a[i + 1] = x;
}
}
}
The underlying sorting algorithm (known as bubble sort) is not
particularly efficient, but that is not the issue here. The purpose of
the code example is to demonstrate how interfaces can be used in an
effective way. Here again the dynamic binding is crucial: in the
sorting routine, the actual type of the objects to compare is
determined at runtime and then the appropriate implementation of
compareTo() is selected. Without dynamic binding, we would need a
different sorting routine for each class whose objects we would like
to sort.
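To see the dynamic binding at work without the IntegerMatrix class, here is a self-contained sketch with two hypothetical classes: Word (compared by length, like CompString) and Money (compared by amount). Both implement the raw, pre-generics form of java.lang.Comparable shown above, so compiling it may produce unchecked warnings; modern code would write Comparable<Word>. The sort method is the same bubble sort as in CompSort.

```java
import java.util.Arrays;

class Word implements Comparable
{
    final String text;

    Word(String text)
    {
        this.text = text;
    }

    // Compares by length only, like CompString in the text.
    public int compareTo(Object o)
    {
        return text.length() - ((Word) o).text.length();
    }

    public String toString()
    {
        return text;
    }
}

class Money implements Comparable
{
    final int cents;

    Money(int cents)
    {
        this.cents = cents;
    }

    public int compareTo(Object o)
    {
        return Integer.compare(cents, ((Money) o).cents);
    }

    public String toString()
    {
        return cents + " ct";
    }
}

class SortDemo
{
    // Bubble sort as in CompSort; which compareTo() runs is decided
    // at runtime from the actual class of a[i].
    static void sort(Comparable[] a, int n)
    {
        for (int r = n - 1; r > 0; r--)
            for (int i = 0; i < r; i++)
                if (a[i].compareTo(a[i + 1]) > 0)
                {
                    Comparable x = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = x;
                }
    }

    public static void main(String[] args)
    {
        Comparable[] words = { new Word("banana"), new Word("fig"), new Word("kiwi") };
        sort(words, words.length);
        System.out.println(Arrays.toString(words));  // [fig, kiwi, banana]

        Comparable[] prices = { new Money(250), new Money(99), new Money(1000) };
        sort(prices, prices.length);
        System.out.println(Arrays.toString(prices)); // [99 ct, 250 ct, 1000 ct]
    }
}
```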
The most exciting feature of interfaces is that they allow objects of different classes that share a single aspect to be processed by a single method.
How does this work? If an error occurs, then one of the following things happens:
There are two types of exceptions: runtime exceptions and general exceptions (in Java terminology, unchecked and checked exceptions). General exceptions must be dealt with; runtime exceptions may be dealt with. An example of a general exception occurs when reading: Java obliges the programmer to be aware that reading may fail with an IOException. Thus, every read must be surrounded by a try-catch (or, as shown further on, be declared with throws). The following piece of code gives a class which contains a static method for reading an integer. In case something goes wrong while reading, the user is informed and 0 is returned from the method.
import java.io.BufferedReader;
import java.io.InputStreamReader;
class IntReader
{
static int readInt()
// Reads an integer from input
{
try
{
return Integer.valueOf(
(new BufferedReader(
new InputStreamReader(System.in)).readLine())).intValue();
}
catch (java.io.IOException e)
{
System.out.println("IO Exception occurred, returning 0");
return 0;
}
}
}
class ExceptionTest
{
public static void main(String ps[])
{
int i;
System.out.print("Give i >>> ");
i = IntReader.readInt();
System.out.print("i = " + i + "\n");
}
}
An example of a runtime exception is integer division by zero, which throws an ArithmeticException: the programmer may test for this and choose an appropriate reaction, but this is not required. Testing for all possible errors would make programs long and slow, so this freedom is welcome. The keyword throws makes it possible to handle exceptions at a higher level. Using throws indicates that one is aware of the possibility that something might go wrong, but that one does not want to deal with it at this level. Using throws may help to save many try-catch pairs.
import java.io.BufferedReader;
import java.io.InputStreamReader;
class IntReader
{
static int readInt() throws java.io.IOException
// Reads an integer from input
{
return Integer.valueOf(
(new BufferedReader(
new InputStreamReader(System.in)).readLine())).intValue();
}
}
class ExceptionTest
{
public static void main(String ps[])
{
int i;
System.out.print("Give i >>> ");
try
{
i = IntReader.readInt();
}
catch (java.io.IOException e)
{
System.out.println("IO Exception occurred, continuing with i == 0");
i = 0;
}
System.out.print("i = " + i + "\n");
}
}
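The runtime-exception case can be sketched as follows, using the hypothetical class RuntimeDemo. Integer division by zero throws an ArithmeticException; the compiler does not require a try-catch, but one may still choose to catch it, as safeDivide() does. Floating-point division by zero does not throw at all: it yields Infinity.

```java
class RuntimeDemo
{
    // No throws clause and no try-catch needed: ArithmeticException
    // is a runtime (unchecked) exception.
    static int divide(int a, int b)
    {
        return a / b;
    }

    // Here we choose to catch it anyway and fall back to a default value.
    static int safeDivide(int a, int b, int fallback)
    {
        try
        {
            return divide(a, b);
        }
        catch (ArithmeticException e)
        {
            System.out.println("Division by zero, returning " + fallback);
            return fallback;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(safeDivide(7, 2, 0));  // prints 3
        System.out.println(safeDivide(7, 0, -1)); // prints the warning, then -1
        System.out.println(7.0 / 0);              // prints Infinity, no exception
    }
}
```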
In the section on interfaces we saw how a single sorting method can be used to sort all kinds of arrays, as long as their objects are comparable. This works fine, unless the provided array contains objects of different comparable types: apples can be compared with apples, and pears with pears, but ... . This will rarely be a problem, because the programmer knows that the array only contains objects that are mutually comparable. However, it may nevertheless be handy to deal with this and similar exceptional circumstances.
A convenient way of doing this is to define a NonComparableException, which is thrown when an attempt is made to compare non-comparable objects. Otherwise the instruction "a[i].compareTo(a[i + 1])" leads to an error: because of the dynamic binding, the method compareTo() of the class of a[i] is called, and this method casts a[i + 1] to its own class in order to access its instance variables. If a[i + 1] is an instance of a different class, this cast fails at runtime with a ClassCastException. The solution is simple. In compareTo(), before casting o to the class of this, one should test whether o is an instance of the same class, using the operator instanceof. If this is the case, the cast is safe. Otherwise a NonComparableException is thrown. Using throws, this exception may be passed all the way up. In the following program, which is run as a batch job, this is not that useful: without the added tests the program would have crashed, we would have corrected the error and tried again. However, an interactive job such as an online banking applet should not crash. Instead it should tell the user what went wrong and how to proceed.
interface SafeComparable
{
int compareTo(Object o) throws NonComparableException;
}
class NonComparableException extends Exception
{
NonComparableException(String string)
{
super(string);
}
}
class SafeCompString implements SafeComparable
{
char[] characters;
SafeCompString(String string)
{
characters = string.toCharArray();
}
public int compareTo(Object o) throws NonComparableException
{
if (o instanceof SafeCompString)
return characters.length -
((SafeCompString) o).characters.length;
throw new NonComparableException("Object not a SafeCompString");
}
public String toString()
{
return new String(characters);
}
}
class SafeCompMatrix extends IntegerMatrix implements SafeComparable
{
SafeCompMatrix(int size)
{
super(size);
}
SafeCompMatrix(IntegerMatrix matrix)
{
super(matrix);
}
SafeCompMatrix(int size, int[] array)
{
super(size);
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
a[i][j] = array[i * n + j];
}
public int compareTo(Object o) throws NonComparableException
{
if (o instanceof SafeCompMatrix)
return trace() - ((SafeCompMatrix) o).trace();
throw new NonComparableException("Object not a SafeCompMatrix");
}
public String toString()
{
String s = "( ";
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
s += a[i][j] + " ";
if (i != n - 1)
s += "| ";
}
return s + ")";
}
}
class SafeCompSort
{
static void sort(SafeComparable[] a, int n)
throws NonComparableException
{
for (int r = n - 1; r > 0; r--)
for (int i = 0; i < r; i++)
if (a[i].compareTo(a[i + 1]) > 0)
{
SafeComparable x = a[i];
a[i] = a[i + 1];
a[i + 1] = x;
}
}
}
class SafeCompSortTest
{
static void sort(SafeComparable[] s, int n)
{
try
{
SafeCompSort.sort(s, n);
for (int i = 0; i < n; i++)
System.out.println(s[i]);
}
catch (NonComparableException e)
{
System.out.println("Array NOT sorted, " + e);
}
}
public static void main(String[] args)
{
System.out.println("-------------------------------------");
SafeComparable[] s = new SafeComparable[10];
int a[];
System.out.println();
s[0] = new SafeCompString("abcde");
s[1] = new SafeCompString("cde");
s[2] = new SafeCompString("abcde");
s[3] = new SafeCompString("cdav");
s[4] = new SafeCompString("abxcde");
s[5] = new SafeCompString("xx");
sort(s, 6);
System.out.println();
a = new int[4]; a[0] = 10; a[1] = 5; a[2] = 7; a[3] = 8;
s[0] = new SafeCompMatrix(2, a);
a = new int[1]; a[0] = 6;
s[2] = new SafeCompMatrix(1, a);
a = new int[4]; a[0] = 11; a[1] = 6; a[2] = 17; a[3] = 5;
s[3] = new SafeCompMatrix(2, a);
sort(s, 4);
System.out.println();
a = new int[1]; a[0] = 43;
s[1] = new SafeCompMatrix(1, a);
sort(s, 4);
System.out.println("\n-------------------------------------");
}
}
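For comparison, the following sketch shows what happens in Java when the instanceof test is left out. The hypothetical classes Apple and Pear each compare by weight and cast their argument unconditionally; comparing an Apple with a Pear then fails at the cast with a ClassCastException, a runtime exception that no throws clause announces.

```java
// Hypothetical classes without the instanceof test of SafeCompString.
class Apple implements Comparable
{
    int weight;

    Apple(int weight)
    {
        this.weight = weight;
    }

    public int compareTo(Object o)
    {
        return weight - ((Apple) o).weight; // unchecked cast
    }
}

class Pear implements Comparable
{
    int weight;

    Pear(int weight)
    {
        this.weight = weight;
    }

    public int compareTo(Object o)
    {
        return weight - ((Pear) o).weight;
    }
}

class MixedCompareDemo
{
    // Returns the simple name of the exception raised by comparing
    // a with b, or "none" if the comparison succeeded.
    static String tryCompare(Comparable a, Object b)
    {
        try
        {
            a.compareTo(b);
            return "none";
        }
        catch (ClassCastException e)
        {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(tryCompare(new Apple(150), new Apple(120))); // none
        System.out.println(tryCompare(new Apple(150), new Pear(120)));  // ClassCastException
    }
}
```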
Exceptions are there to ensure that, when something goes wrong, sensible output is produced and resources are freed before the program either crashes or continues in an alternative way.
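Freeing resources is usually done in a finally block, which Java executes whether the try block completes normally, returns, or throws. A minimal sketch, with a hypothetical Resource class standing in for something like a file or a network connection:

```java
class FinallyDemo
{
    // Hypothetical stand-in for a resource such as a file or connection.
    static class Resource
    {
        boolean open = true;

        void close()
        {
            open = false;
        }
    }

    // The finally block runs whether or not the division throws,
    // so the resource is always released.
    static int use(Resource r, int divisor)
    {
        try
        {
            return 100 / divisor;
        }
        finally
        {
            r.close();
        }
    }

    public static void main(String[] args)
    {
        Resource r1 = new Resource();
        System.out.println(use(r1, 4) + " " + r1.open); // 25 false
        Resource r2 = new Resource();
        try
        {
            use(r2, 0);
        }
        catch (ArithmeticException e)
        {
            System.out.println("Caught, resource open: " + r2.open); // false
        }
    }
}
```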
At a superficial level, the object-orientedness of Java is expressed by the way methods are called: an object is connected to a method of its class with the dot operator, putting the object in the foreground. Much more important are the following general concepts of object-oriented programming:
IntegerMatrix should also have a static variable totalSize keeping track of the sum of the sizes of all matrices, and the constructors should refuse to allocate new memory when totalSize would exceed MAX_TOTAL_SIZE, for some constant MAX_TOTAL_SIZE. In that case some output is produced; ideally this is handled by a self-defined exception, but this is not required. The method finalize() should be overridden to ensure that totalSize remains accurate even when IntegerMatrix objects are removed by the garbage collector.
Integrate class IntegerMatrix into a program which creates several matrices, makes some assignments and performs some operations. More concretely, we want you to create the matrices A, B and C specified below, and to compute A = A . (B + C).
( 1 7 2) ( 3 -7 -3) ( 0 4 -2)
A = (-1 2 7) B = (-4 2 3) C = ( 0 -1 -5)
( 1 4 -5) (-6 -1 3) (10 5 -2)
The initial, intermediate and final matrices should be printed.
Check that the computed results make sense:
( 3 -3 -5) (-17 12 -17)
B + C = (-4 1 -2) A . (B + C) = ( 17 33 8)
( 4 4 1) (-33 -19 -18)
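The expected results can be checked independently with a small sketch that uses plain int[][] arrays instead of IntegerMatrix (whose methods are not shown here, so the names add and multiply below are assumptions). The product is written into a fresh array: overwriting A while it is still being read would give wrong results.

```java
import java.util.Arrays;

class MatrixCheck
{
    // Entry-wise sum of two n x n matrices.
    static int[][] add(int[][] x, int[][] y)
    {
        int n = x.length;
        int[][] s = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                s[i][j] = x[i][j] + y[i][j];
        return s;
    }

    // Standard matrix product: p[i][j] = sum over k of x[i][k] * y[k][j].
    static int[][] multiply(int[][] x, int[][] y)
    {
        int n = x.length;
        int[][] p = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    p[i][j] += x[i][k] * y[k][j];
        return p;
    }

    public static void main(String[] args)
    {
        int[][] a = {{1, 7, 2}, {-1, 2, 7}, {1, 4, -5}};
        int[][] b = {{3, -7, -3}, {-4, 2, 3}, {-6, -1, 3}};
        int[][] c = {{0, 4, -2}, {0, -1, -5}, {10, 5, -2}};
        int[][] bc = add(b, c);
        a = multiply(a, bc); // result goes into a new array first
        System.out.println(Arrays.deepToString(bc));
        System.out.println(Arrays.deepToString(a));
    }
}
```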
This array should be sorted. To this end you should define a class Sort with a static method sort() which takes an array of Pairs as parameter; sort() has a second parameter which is used to pass the value of m. Here we are interested not so much in efficiency as in handling classes. Define a further class, called Node. A Node has two instance variables: a Node and a Pair. The class NodeArray mainly consists of an array of Nodes. In our application this array has length m. Because the Nodes will be linked to each other so that they form lists, an object of NodeArray can be viewed as a set of m linked lists. NodeArray has a method which inserts a Node at the beginning of the list at a specified position of the array. NodeArray also has methods for enumerating all Nodes in all lists in a systematic way, starting with the list at array position 0. The sorting can now be performed by sort() as follows:
Fill in the details yourself and work this out into a running program. Test it for m = 10 and n = 20.
In the current version, if there are several Pairs with the same key, then the order of these Pairs is reversed. This is undesirable: many applications require a sorting subroutine to be stable, that is, to preserve the relative order of elements with equal keys. With a minimal change the above sorting method can be made stable. How?
What is the running time of your algorithm expressed in terms of n and m? What do you get for m = O(n)?
Of course you should define a class Set for this. All operations should be perfectly encapsulated, and the instance variables should not be visible outside the class. All calls to the methods of Set must be performed in an object-oriented way; none of the mentioned methods may be static.
Random numbers can be generated with the help of the methods in the class Random in java.util. Use this to generate three random sets of size 100,000,000 each:
The task is to compute the number of bets resulting in a prize (each bet gets at most one prize). That is, you should first compute the union of S_1 and S_2, then intersect it with S_3, and finally compute the size of the resulting set. Print this resulting number (if it does not lie between 1,940,000 and 1,960,000, then probably something is wrong with your program).
Integrate LastChain into a program: take the program ChainTest from the text above and change the type of c1 and c2 from Chain to LastChain. The text of the program can be downloaded here.
In a search tree, the nodes are not arranged arbitrarily, but so that for any node all keys in its left subtree (the left child, if existing, and the nodes reachable from it) are smaller than its own key, and all keys in its right subtree are larger. This arrangement makes it easy to perform the operation find: determining whether an element with a specified key exists or not. This is done as follows. If the value x is smaller than the key y of the current node, then, if x occurs in the tree at all, it must occur in the left child or in the nodes which can be reached from there. If the current node has no left child, then x does not occur. In case x > y, we must go right. If x is equal to the key, then we have found the value.
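The search procedure just described can be sketched iteratively as follows. TreeNode is a hypothetical minimal node class with an int key, not the Node class asked for in the exercise, and the tree built in main() is only an example.

```java
// Hypothetical minimal node; the exercise's Node class may look different.
class TreeNode
{
    int key;
    TreeNode left, right;

    TreeNode(int key)
    {
        this.key = key;
    }
}

class FindDemo
{
    // Walk down from the root: go left if x < key, right if x > key.
    // Returns the node with key x, or null if x does not occur.
    static TreeNode find(TreeNode root, int x)
    {
        TreeNode current = root;
        while (current != null && current.key != x)
            current = (x < current.key) ? current.left : current.right;
        return current;
    }

    public static void main(String[] args)
    {
        //            8
        //          /   \
        //         3     10
        //        / \      \
        //       1   6      14
        TreeNode root = new TreeNode(8);
        root.left = new TreeNode(3);
        root.right = new TreeNode(10);
        root.left.left = new TreeNode(1);
        root.left.right = new TreeNode(6);
        root.right.right = new TreeNode(14);
        System.out.println(find(root, 6) != null); // true
        System.out.println(find(root, 7) != null); // false
    }
}
```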
Create a class SearchTree implementing the above ideas. The class has an instance variable Node root. "root" corresponds to "first" in Chain: this is the node from which the structure is entered. There must be a trivial constructor, a method find along the above guidelines and a method print. The return type of find should be Node: it returns null when the value x we are looking for does not occur, and otherwise the Node with key equal to x. "print" should print all nodes in some systematic way. A very good idea is to do this recursively. A method is called recursive when it works by calling itself again (with a certain stopping condition). This recursive printing should, however, be handed over to a method within the class Node or an extension thereof (after testing that root != null). It has a structure of the following kind:
void print()
{
if (left != null)
{
System.out.print("Going left\n");
left.print();
}
System.out.print("Key value = " + key + "\n");
if (right != null)
{
System.out.print("Going right\n");
right.print();
}
}
It is a good idea, but not required, to also hand over find to
a method in the class Node.
Create the search tree from the picture "by hand", that is, by creating nodes with appropriate keys one by one and hooking them in the correct way. Then call print for the tree.
Inserting a node with key x in a search tree is also easy: search for x. If x already occurs, we do not insert it again. Otherwise the search ends in a node with key y != x; if y > x, a new Node with key x is added as its left child, otherwise as its right child. Delete can be performed by marking the deleted nodes in a special way; if such a value is inserted again later on, the marking must be undone.
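A minimal sketch of insertion with delete-by-marking, assuming int keys. Unlike the exercise, which suggests doing the work in MarkNode, this sketch puts everything in a single hypothetical class LazyTree; size counts unmarked nodes and realSize counts all nodes physically present.

```java
// Hypothetical node with a deletion mark.
class LazyNode
{
    int key;
    boolean deleted;
    LazyNode left, right;

    LazyNode(int key)
    {
        this.key = key;
    }
}

class LazyTree
{
    LazyNode root;
    int size;     // nodes not marked as deleted
    int realSize; // nodes physically present

    // Search for x; if found, undo a deletion mark, otherwise hang a
    // new node below the last node visited.
    void insert(int x)
    {
        if (root == null)
        {
            root = new LazyNode(x);
            size++;
            realSize++;
            return;
        }
        LazyNode current = root;
        while (true)
        {
            if (x == current.key)
            {
                if (current.deleted)
                {
                    current.deleted = false;
                    size++;
                }
                return;
            }
            LazyNode next = (x < current.key) ? current.left : current.right;
            if (next == null)
            {
                if (x < current.key)
                    current.left = new LazyNode(x);
                else
                    current.right = new LazyNode(x);
                size++;
                realSize++;
                return;
            }
            current = next;
        }
    }

    private LazyNode locate(int x)
    {
        LazyNode current = root;
        while (current != null && current.key != x)
            current = (x < current.key) ? current.left : current.right;
        return current;
    }

    // Delete only marks the node; the tree structure stays unchanged.
    void delete(int x)
    {
        LazyNode node = locate(x);
        if (node != null && !node.deleted)
        {
            node.deleted = true;
            size--;
        }
    }

    boolean contains(int x)
    {
        LazyNode node = locate(x);
        return node != null && !node.deleted;
    }
}
```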
Create a derived class MarkNode of Node which has one additional instance variable: boolean deleted. Of course this class also needs a constructor. The class Dictionary is a derived class of SearchTree. It has additional instance variables int size and int realSize. "size" indicates the number of non-deleted nodes, while "realSize" indicates how many nodes are physically there. Methods insert and delete are added. The actual work should best be done at the level of MarkNode.
Now create the same tree again by inserting the elements in appropriate order. For two trees to be the same the structure and the keys in corresponding nodes must be the same.
Create an empty Dictionary. Generate 100,000 random values in the range 0, ..., 199,999 and insert them in the order in which they are generated. Print the size of the tree; it should lie around 78,600. Generate 100,000 random numbers in the same range and count how many of them occur in the tree; this should be about 39,300. Generate 100,000 random numbers in the range 0, ..., 199,999 and perform a delete for each of them. Print the number of remaining nodes again, this time printing both size and realSize. size should lie around 47,500, while realSize should be the same as before.
Create an empty Dictionary. Insert the numbers 0, 1, ..., 99,999 in this given order. What do you notice? What is the reason? Why did this not happen before? What is your conclusion about the suggested data structure Dictionary?
(a, b) + (c, d) = (a + c, b + d),
(a, b) - (c, d) = (a - c, b - d),
(a, b) * (c, d) = (a * c - b * d, b * c + a * d).
Here the symbols +, - and * inside the brackets denote the operations on doubles. Define a class ComplexRing implementing these operations as methods. The methods should be called add, subtract and multiply. They should be non-static: the result is placed in the object on which the method is called, so computing x = y + z is performed by calling x.add(y, z). The method isZero is also non-static. It returns true when the complex number on which it is called is equal to zero. A complex number (a, b) is zero when a == 0 and b == 0.
Add a constructor which can be called with two double arguments. Also add a non-static method readComplex (called on an object with the dot operator). readComplex asks for two doubles, which are read with the help of the class DoubleReader. It can be downloaded here. Override the method toString from Object to enable printing complex numbers in a decent way. The instance variables should be "protected" and the methods "public".
We build on the class ComplexRing. Define a derived class ComplexField. This class has one extra instance variable and some extra methods. The private double instance variable "norm" at all times gives the norm of the complex number, which for a number (a, b) is defined as a^2 + b^2.
The method isZero is overridden: the test on zero is simplified to norm == 0. reciprocal is a private static method returning the reciprocal of the complex number passed as an argument. For a number (a, b) the reciprocal is given by the number
(a, b)^{-1} = (a / norm, -b / norm).
Use this method to define a method divide which computes x / y = x * y^{-1} for two complex numbers x and y passed as arguments and stores the result in the object z on which it is called. So, the method can be called as z.divide(x, y).
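The multiplication rule (a, b) * (c, d) = (a * c - b * d, b * c + a * d) and the reciprocal (a, b)^{-1} = (a / norm, -b / norm), with norm = a^2 + b^2, can be checked numerically. The sketch below stores a complex number as a double[] of length two (an assumption made only to keep it short; it is not the ComplexRing design) and verifies that x * x^{-1} equals (1, 0) up to rounding, for example for x = (3, 4), where norm = 25.

```java
class ComplexCheck
{
    // A complex number (a, b) is stored as double[] {a, b} in this sketch.
    static double[] multiply(double[] x, double[] y)
    {
        double a = x[0], b = x[1], c = y[0], d = y[1];
        return new double[] {a * c - b * d, b * c + a * d};
    }

    // (a, b)^{-1} = (a / norm, -b / norm) with norm = a^2 + b^2.
    static double[] reciprocal(double[] x)
    {
        double norm = x[0] * x[0] + x[1] * x[1];
        return new double[] {x[0] / norm, -x[1] / norm};
    }

    public static void main(String[] args)
    {
        double[] x = {3, 4};
        double[] inv = reciprocal(x);    // approximately (0.12, -0.16)
        double[] one = multiply(x, inv); // approximately (1, 0)
        System.out.println(one[0] + " " + one[1]);
    }
}
```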
In all methods, both those of ComplexRing and those of ComplexField, one should be careful to ensure that the correct result is computed even if the arguments coincide with the calling object (as in x.multiply(x, y)). Either one tests for this and uses a special strategy, or one works with some auxiliary variables.
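The problem can be seen in the following sketch with a hypothetical mutable class MutableComplex (not the exercise's ComplexRing). In multiplyNaive() the real part of the result is stored before the imaginary part is computed, so z.multiplyNaive(z, z) reads an already overwritten value; multiply() avoids this by first computing both parts into local variables.

```java
class MutableComplex
{
    double re, im;

    MutableComplex(double re, double im)
    {
        this.re = re;
        this.im = im;
    }

    // WRONG when this == x or this == y: re is overwritten before
    // the old value is needed for im.
    void multiplyNaive(MutableComplex x, MutableComplex y)
    {
        re = x.re * y.re - x.im * y.im;
        im = x.im * y.re + x.re * y.im;
    }

    // Correct: compute into local variables first, then assign.
    void multiply(MutableComplex x, MutableComplex y)
    {
        double newRe = x.re * y.re - x.im * y.im;
        double newIm = x.im * y.re + x.re * y.im;
        re = newRe;
        im = newIm;
    }
}
```

For z = (1, 2), z.multiply(z, z) gives (-3, 4), whereas z.multiplyNaive(z, z) gives (-3, -12).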
Define an exception divisionByZero. This exception is thrown by the method divide when the second argument equals zero. Read the above text on exceptions to see how this is done and consider the example in class Seven.
Create a program with main embedded in a class called ComplexTest. In main five complex numbers are created: u, v, x, y, z. u, x and y are read in. v is initialized to zero. Then compute z = x + y - z, and subsequently z = z * v. Print the results. Then compute x / x, x / y and z / z and print the results.
An applet integrated in an html page is executed on the computer of the user who is visiting the page, not on the computer which is hosting it. This has important consequences. Because the applet should run anywhere, independently of the available system resources, it cannot read or write any files. This is desirable anyway, because we would not like an unknown applet to leave possibly dangerous rubbish on our computer. In principle, executing applets is safe, because our own browser continuously checks that the applet is not doing anything dangerous.
Everything works by creating a class which is derived from Frame. In the following example we will have a class Manager which extends Frame. In main(), or somewhere else, an object of this class is created by calling its constructor. In this constructor or elsewhere, all components are added to the frame. Then the methods pack() and show() are applied to this object: pack() determines an appropriate size, and show() puts the new window on the screen (in newer Java versions show() is deprecated and setVisible(true) is used instead).
The following layout managers are available:
Usually some input is also desired; in that case an event listener must be attached to some of the graphical objects. The class which extends Frame, or one of the other classes, must then implement the interface ActionListener. This interface contains the method actionPerformed(), which must be implemented. actionPerformed() has a single argument, an object e of the class ActionEvent. If something happens, such as clicking the mouse on a button, the operating system passes an interrupt to the program, which causes the method actionPerformed() to be called with a corresponding ActionEvent. In actionPerformed() one can figure out what has happened, for example by applying the method getSource() to e.
A minimal page containing an applet looks as follows:
<html>
<head>
<title>Nice Applet</title>
</head>
<body>
<h1>Title_of_the_Page</h1>
Surrounding text, maybe telling what the applet is doing.
<p>
<center>
<applet code = "ProgramName.class"
width = "700"
height = "580">
Substitute text
</applet>
</center>
</body>
</html>
Once an applet and a webpage of the above type have been created, the applet can be viewed. An applet is intended to be viewed by opening the page with a browser. Thereupon the compiled code of the applet is transferred from the computer hosting the webpage to the client computer and started. This is a rather slow process, and when making changes to the applet it is not always easy to convince the browser to reload the code. Therefore, during development it may be convenient to view an applet with the program appletviewer by typing "appletviewer name_of_webpage".
Then a more understandable shorthand was developed, which is called assembler. Assembler is very closely connected to the machine code. It consists of basic instructions for loading and storing registers, for arithmetic operations, comparisons and jumps within the program. An essential feature of assembler is that each instruction corresponds directly to a machine instruction, so that translating a program into machine code (done by a program that is itself also called an assembler) amounts to substituting each instruction by the corresponding machine code. Because there is such a close correspondence, this translation is no big issue.
The next step in the development came in the late fifties with Fortran (Formula Translator). Fortran offers many more instructions and allows the use of derived types, such as arrays. Fortran is one of the first compiled languages, that is, before the execution starts, the program is fed to a special program called a compiler, which translates it into assembler.
The idea of compiling a program is crucial, because it allows programs to be written in something close to human language, while at the same time they can be executed almost as efficiently as dedicated assembler code (a good compiler even does better than an assembler programmer who is not familiar with the details of the underlying hardware). Even more important is that the compiler can be hardware specific. This implies that Fortran programs written for a Vax computer from 1980 can now be translated to run on a Pentium IV, even though the assembler instructions are probably quite different. Thus, with every new processor model we need a new compiler for each of the supported languages, but there is no need to rewrite all programs.
Once this idea was around, many languages were developed. Fortran (in a modernized version) is still used in scientific computing. Cobol came into use during the early sixties and is still used for administrative purposes. Algol 60 and Algol 68 have died, but they inspired Pascal and C. The older languages are less structured, and they lead to code which is often very hard to follow. Algol 68 was a beautiful language with many nice features, very pure and rather abstract; perhaps for that very reason it never became very popular.
C, on the other hand, has become extremely popular, though later much of its popularity was taken over by C++ and Java. It is one of the earlier languages from the heyday of structured programming. The main point of the structured-programming paradigm is that a program should be structured by the extensive use of subroutines, which can also be hierarchically nested. This avoids jumping around, as is common in Fortran: the only supported jumps (though a goto statement still exists in most languages) are in conditional statements, loops and procedure calls. C allows very compact code to be written. Because of the possibilities of the language and because there are good compilers, C and C++ are still the best choices when speed is crucial.
Pascal, which was designed by Niklaus Wirth around 1970, is simpler and more structured than C. Many handy but risky constructions which are allowed in C are forbidden in Pascal. With Pascal the structured-programming paradigm reached its full development. Another important aspect which gradually evolved over time is the typing mechanism: in Pascal there is very strict typing. Assignments can be made only between variables whose type is exactly the same. Strict typing prevents sloppy programming and can help to reduce the number of errors.
Next to these mainstream languages, several special-purpose languages have been developed. Simula, for example, was designed for simulation purposes. For working with text there are yet other languages.
For the typical needs of scientific computing, C is all one needs: it is easy to learn, simple to use and leads to fast programs. For software projects there are different needs. Speed is only one of many important aspects, typically not one of the most important ones. Correctness, understandability for outsiders and extendibility are often much more important. This suggests a modular programming style, such as employed by Modula: a program package consists of boxes with shielded interior.
One step further brings us to object-oriented programming: the focus is no longer on the action but on the information which is manipulated. The actions which can be performed on an object of a certain type are an integral part of the type definition. Of course one can ignore this, just as one can ignore the ideas of structured programming when programming in Pascal or C, but the way object-oriented languages are designed very strongly suggests a certain style of programming. Any program using non-trivial data structures profits enormously from the object-oriented style, which leads much faster to running programs. Another crucial feature of object-oriented languages is the concept of inheritance. This means that a type is defined as an extension of a previously defined type, inheriting its structure and its functionality. This is precisely the feature one needs for creating data structure libraries: users can extend supplied types to fit their needs.
Many object-oriented programming languages have been developed. The two most important ones are C++ and Java. C++ is a follow-up to C, as the name suggests, but it is much more than that. At the elementary level it contains almost all C instructions, and therefore one can use C++ compilers for compiling C code. However, it is extended by the whole set of object-oriented possibilities, and in addition it has a huge set of standard functions. C++ is huge.
Java is the more recent development. When designing Java, the designers tried to learn from earlier mistakes. One of the main focuses in Java is failure prevention. That is why many things are simply forbidden in Java. There are no explicit pointer types, because these are a major source of errors; all pointers are implicit. Java has a much stricter typing mechanism than C and C++, and it actually helps: quite complex programs often run correctly once the last syntactical error has been removed. At first Java was nice and small, but as soon as it became popular, the set of standard types (called classes in Java) exploded just as in C++, and now probably no one dares to claim he/she knows the whole language. Java has considerable built-in support for creating graphical applications, and of course Java is the language for animating the internet, because it is only one step from a Java program to an applet. Because Java was designed with the internet in mind, Java is in principle an interpreted language (though compilers also exist): the source code is first translated into an intermediate code (bytecode), which is then interpreted. During the first years this made Java considerably slower than C and C++. This was not too serious, because only in special cases is speed the main concern of software development. Furthermore, this speed difference has since disappeared almost entirely. A more important disadvantage of Java, for beginners and experienced programmers alike, is that some easy tasks can only be solved in a seemingly unnecessarily complicated way.
The idea of functional programming is to express the desired action of a program in terms of a function. What does a program do? It transforms input into output. In imperative programming (be it structured or object-oriented), the focus is on the action, that is, on how this transformation is performed. In functional programming, the focus lies on the transformation itself. A simple example is a function which accepts a set of numbers and returns the one with the smallest value. Functions can be composed to realize more complex functions, just as in mathematics one can write sqrt(e^(cos(x^2))), which for any argument x first applies the square function, then the cosine function, then the exponential function and finally the square-root function.
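Composition of functions is also available in Java: since Java 8 the interface java.util.function.Function has the methods andThen() and compose(). The sketch below (the class name ComposeDemo is hypothetical) builds sqrt(e^(cos(x^2))) from four functions; andThen() lists them in the order in which they are applied, while compose() follows the mathematical notation from the outside in.

```java
import java.util.function.Function;

class ComposeDemo
{
    static final Function<Double, Double> square = x -> x * x;
    static final Function<Double, Double> cos = Math::cos;
    static final Function<Double, Double> exp = Math::exp;
    static final Function<Double, Double> sqrt = Math::sqrt;

    // Applied left to right: square, then cos, then exp, finally sqrt.
    static final Function<Double, Double> composed =
        square.andThen(cos).andThen(exp).andThen(sqrt);

    // The same function, written like sqrt(e^(cos(x^2))).
    static final Function<Double, Double> composedMath =
        sqrt.compose(exp).compose(cos).compose(square);

    public static void main(String[] args)
    {
        // At x = 0: cos(0) = 1, so both results equal sqrt(e) = e^(1/2).
        System.out.println(composed.apply(0.0));
        System.out.println(composedMath.apply(0.0));
    }
}
```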
Neither arguments nor return values have to be numerical, they can be of any of the many types that can be declared. A type is a collection of values which have their most essential aspects in common, so that the same functions can be applied to them. Examples of trivial types are integers, floats, booleans, chars. Examples of non-trivial types are strings, sets, pictures, graphs, ... .
Haskell is a relatively recent functional programming language, dating back to around 1987. The name Haskell is the first name of H.B. Curry, who contributed significantly to the theory of the lambda calculus, a mathematical theory of functions that forms the foundation of functional programming. There are several implementations of Haskell; we will use the Hugs interpreter. Hugs and much more information are freely available on the Haskell homepage at http://www.haskell.org.
If Hugs has been installed on the computer, one first types "hugs" to enter the Hugs environment. Then program lines can be typed in directly or they can be loaded from files. The latter is more convenient. A file is loaded by typing ":l file_name" (a list of commands can be obtained by typing ":?"). Thereafter the defined functions can be called by simply typing their name followed by their parameters. The answer is printed immediately on the screen. For example, if the factorial function has been defined with name "fac", typing "fac 6" returns 720.
We have learned to perform simple computations in primary school and got so used to these operations that we take them as identities: for many people the expression "(2 + 3) * 4" is a synonym for the number twenty. However, this is only by convention. Somewhere at the bottom it has been defined what the numbers stand for (1 is one more than 0, 2 is one more than 1, ...), then it has been defined what addition is (namely counting), and then what the product operation is (namely repeated addition). So, it is important to distinguish between an expression, its evaluation and its value.
In a pocket calculator many simple operations (functions) are built in, and pushing the "=" button evaluates the pending expression (though typically intermediate results are evaluated before this). An important point is that we do not have to care about how the evaluation is performed internally.
The implementation of a functional programming language can be viewed as a pocket calculator on which one can define one's own functions. A functional program consists of a number of definitions of types and functions, which ultimately are combined to compute the desired result. The result of a function may either be turned into output (by converting it to a printable string), or it may be used as input to another function.
Like on the pocket calculator, it is irrelevant how the computation below the level of function application is performed. For example, there is no such thing as an array or a for loop. However, there are list data types, and there are numerous standard functions working on lists (like copying, throwing away the first or last n elements, ...). Thus, instead of using a for loop to manipulate an array, we use a list operation to manipulate a list. This makes the formulation much shorter and more abstract. The correctness of a functional program can be proven formally much more easily than the correctness of an imperative program. This is the main argument in favor of functional programming. On the other hand, designing functional programs requires a special, more abstract way of thinking which takes some time to develop. We tend to think and argue in a dynamic way: "we start at the first element, to which we add i, and then we take the next element and so on, each time adding one less than before".
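As an illustration of the last point, the following sketch (the names addDecreasing and addDecreasingRec are hypothetical) expresses "add i to the first element, i - 1 to the next one, and so on" without any loop. It anticipates list notation that is only introduced further down: [i, i - 1 ..] is the list i, i - 1, i - 2, ..., and the standard function zipWith (+) adds two lists element by element, stopping at the end of the shorter one. For comparison, a version with explicit recursion over the list is given as well.

```haskell
-- One list operation instead of a loop: pair every element of xs with
-- the matching element of the decreasing list [i, i - 1, i - 2, ...]
-- and add them. zipWith stops as soon as xs is exhausted.
addDecreasing :: Int -> [Int] -> [Int]
addDecreasing i xs = zipWith (+) xs [i, i - 1 ..]

-- The same function written with explicit recursion over the list.
addDecreasingRec :: Int -> [Int] -> [Int]
addDecreasingRec _ []       = []
addDecreasingRec i (x : xs) = (x + i) : addDecreasingRec (i - 1) xs
```

For example, addDecreasing 3 [10, 20, 30] evaluates to [13, 22, 31]. The first version states what is computed; the second still describes how the list is walked through step by step.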
The text in this chapter is to a large extent extracted from two sources: the lecture notes "Functioneel Programmeren" by Jeroen Fokker (in Dutch) and the book "Haskell, the Craft of Functional Programming" by Thompson. The first is far more systematic; the second is perhaps more motivating, with a running example which is gradually developed, presenting the theory in small, easily digestible bits.
findMin S_1". If we also have a function "unify" and a second set "S_2", then it is also correct to write
findMin (unify S_1 S_2)
In this case the brackets are necessary, because otherwise findMin would interpret unify as its argument, but unify is not a set but a function mapping two sets into one set.
In Haskell a name or identifier is associated with values of a certain type by a declaration involving the operator "::". Such declarations have the following form:
name :: type
For example, if we have already defined a type "Set", then we may write
s1 :: Set
s2 :: Set
This is nothing special: here variables are declared in terms of a known type, just like we would write "Set S_1" in C or Java. (In Haskell the name of a variable must start with a small letter, so "S1" cannot be used here.)
Much more interesting is the following:
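findMin :: Set -> Int
unify :: Set -> Set -> Set
This is different from anything we have seen before: here functions are declared. Because findMin returns the smallest number of a set of numbers, its result type is a number type such as Int.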
Once we have defined the sets and the functions, it is possible to work with these functions without knowing anything about the details of the underlying realization. This important idea, which is called type abstraction, is fundamental to functional programming, just as it is to object-oriented programming.
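To make this concrete, here is a minimal sketch of one possible realization. It assumes (our choice, not the text's) that a set of numbers is represented as a list of Int without duplicates and that findMin returns the smallest element. The standard functions minimum (from the prelude) and union (from the module Data.List) do the actual work. Note that a type synonym like this does not really hide the representation; that would require the module system discussed further down.

```haskell
import Data.List (union)

-- Hypothetical representation: a set of integers as a list without duplicates.
type Set = [Int]

-- The smallest element of a (non-empty) set.
findMin :: Set -> Int
findMin = minimum

-- The union of two sets; union adds only those elements of the
-- second list that do not yet occur in the first one.
unify :: Set -> Set -> Set
unify = union
```

After loading this script, findMin (unify [7, 3, 9] [5, 4]) evaluates to 3. Code that only uses Set, findMin and unify would keep working if the representation were changed.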
The general format of a function declaration is
name :: t_1 -> t_2 -> ... -> t_k -> t
Here t_1, t_2, ..., t_k give the types of the formal parameters of the function, and t gives the return type. There is no need for a type void: a function without a return value would have no effect and therefore does not need to be called, and a void parameter can just as well be omitted.
So, functions are declared just like variables. We will also see how to assign a value, that is, a certain functionality, to a function. Of course this is essential in anything called functional programming, but the elegance and importance of this concept are worth taking an extra moment to think about.
Assigning values is done with the assignment operator "=". For the sake of concreteness we will now give some integer examples, because we do not yet know how to give values to a variable of type set.
square :: Int -> Int
square n = n * n
i :: Int
i = 5
j :: Int
j = square i + 2
k :: Int
k = square (i + 2)
The name of a formal parameter is arbitrary. In particular, it does not need to be different from the names of other variables, as it is strictly local. Strictly speaking, it is wrong to speak of variables: i, j and k are not variables, as they can be given a value only once. In Haskell there are only constants. The distinction between constants and functions is artificial: a constant is a function with zero parameters, which always returns the same value, whereas a function with parameters may return different values depending on the values of its parameters. This is also expressed by the analogous type declarations. In the following we will nevertheless continue to speak of variables, and for a variable we will mostly use the word assignment for the action of giving it a value. For functions it sounds better to speak of a definition for the action of giving them a certain functionality.
The order of the statements is not important at all, though a logical ordering may improve the readability. It is good practice to specify the types of variables and functions, but because the interpreter can infer types by itself, the formal declarations may be omitted (the inferred types may then differ slightly, for example Integer instead of Int). So, the above fragment of Haskell may also be written as:
k = square (i + 2)
i = 5
square n = n * n
j = square i + 2
In this text we distinguish between declarations, assignment and definitions, and if we mean just any of them we will speak of statements. In the context of Haskell, for example in the book by Thompson, it is customary though to use the word definition in this extended meaning.
The definition of square tells that any integer passed as an actual parameter is to be substituted for the formal parameter n. In Haskell function application has the highest priority. Thus, the value of j is evaluated as (i * i) + 2, which after substitution of 5 for i gives 27. The value of k is evaluated as (i + 2) * (i + 2), which after substitution of 5 for i gives 7 * 7, which is evaluated to 49.
Slightly more complicated is the following function with two parameters:
i :: Int
i = 3
j :: Int
j = 4
k :: Int
squareSum :: Int -> Int -> Int
squareSum i j = square i + square j
k = squareSum i j
Here there are two formal parameters i and j. These have nothing to do with the variables i and j. When the function is called, the actual parameters, which by coincidence are i and j, are substituted for the formal parameters. The function square is called for each of them and the results are added, returning 9 + 16 = 25, which is assigned to k.
The general format for a function definition is as follows:
name x_1 x_2 ... x_k = e
Here x_1, x_2, ..., x_k are the formal parameters, and the returned value in terms of these formal parameters is given by the expression e.
A first glimpse of the power of functional languages is given by the possibility to compose functions. Assume that a function "intSqrt :: Int -> Int" has been defined, which for a number n >= 0 returns the largest Int m such that m^2 <= n. Then, we can define
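intNorm :: Int -> Int -> Int
intNorm x = intSqrt . squareSum x
Here the operator "." combines the two functions: the output from squareSum is fed into intSqrt. This is the mathematical functional composition. Because "." composes functions of one argument and squareSum takes two arguments, we cannot simply write "intNorm = intSqrt . squareSum" (this gives a type error); instead the first argument x is supplied explicitly. Equivalently, one may write intNorm x y = intSqrt (squareSum x y).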
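The text leaves open how intSqrt is realized. A minimal sketch, assuming a simple search that counts upwards from 0 (easy to understand, but slow for large n), could look as follows. It repeats square and squareSum so that the script is self-contained; the helper search and the where clause that introduces it are our own additions.

```haskell
-- square and squareSum as defined above.
square :: Int -> Int
square n = n * n

squareSum :: Int -> Int -> Int
squareSum i j = square i + square j

-- Hypothetical realization of intSqrt: the largest m with m * m <= n,
-- found by counting upwards from 0.
intSqrt :: Int -> Int
intSqrt n
  | n < 0     = error "intSqrt: negative argument"
  | otherwise = search 0
  where
    -- invariant: m * m <= n; try whether m + 1 still fits
    search m
      | (m + 1) * (m + 1) <= n = search (m + 1)
      | otherwise              = m

-- squareSum takes two arguments, so the first one is supplied
-- before composing with intSqrt.
intNorm :: Int -> Int -> Int
intNorm x = intSqrt . squareSum x
```

For example, intNorm 3 4 evaluates to 5, because squareSum 3 4 == 25, and intNorm 1 1 evaluates to 1, because 1 is the largest m with m^2 <= 2.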
In mathematics it is not uncommon to apply operators to functions: f + g, f^2, f . g are all well-defined. In all cases the definition of operators on functions is given by telling what the value of the resulting function is for an argument, for example, the function f + g is defined by saying that for all x, (f + g) (x) = f(x) + g(x). Likewise, f . g is the function so that for all x, (f . g) (x) = f (g (x)).
These two possibilities of defining functions are analogous to the two possibilities of assigning values to a variable:
Both kinds of definitions are in terms of earlier defined functions. For numerical functions this may work, but for functions over self-defined data types we must be able to start somewhere.
import Prelude hiding (not, or, and)
not :: Bool -> Bool
not True = False
not False = True
or :: Bool -> Bool -> Bool
or False False = False
or False True = True
or True False = True
or True True = True
and :: Bool -> Bool -> Bool
and False False = False
and False True = False
and True False = False
and True True = True
exor :: Bool -> Bool -> Bool
exor False False = False
exor False True = True
exor True False = True
exor True True = False
The first line prevents a clash with the functions not, or and and that are predefined in the prelude (name clashes are discussed further down). When executing a Haskell program containing an expression "or x y", the values of x and y are looked up, and it is checked whether there is an equation of the or function which matches. If so, the corresponding value is returned; otherwise an error message is produced.
The usage of literals in the definition of a function can be combined with parameters. Further savings can be made by using other functions. The above function definitions may also be written shorter:
import Prelude hiding (or, and)
or :: Bool -> Bool -> Bool
or False False = False
or x True = True
or True x = True
and :: Bool -> Bool -> Bool
and x False = False
and False x = False
and True True = True
exor :: Bool -> Bool -> Bool
exor x y = and (or x y) (not (and x y))
Here the function not from the prelude is used. In the definition of exor the brackets are necessary because a line is processed from left to right, and otherwise the function "and" would consider "or" to be its first argument.
absDif :: Int -> Int -> Int
absDif x y
  | x >= y = x - y
  | otherwise = y - x
This function returns the absolute value of the difference of x and y:
from two arguments x and y it returns x - y if x >= y and otherwise it
returns y - x. Of course even the boolean functions can be defined using
guards, but this does not make the definitions shorter and clearer:
not :: Bool -> Bool
not x
  | x == True = False
  | x == False = True
So, instead of one expression we find several expressions which are conditioned by boolean expressions, called guards, appearing after the "|" symbols. For any set of actual parameters substituted for the formal parameters, the guards are evaluated in the order in which they appear, and the returned value is the one given by the expression standing after the first guard that evaluates to True. A similar approach is followed for a function defined by enumeration: the first matching alternative is chosen. It is not tested whether a function has a definition for all possible values of its parameters, nor whether definitions conflict; if no alternative applies, the evaluation stops with an error at run time.
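The rule that the first guard evaluating to True wins can be illustrated with a small hypothetical function classify. Because the guards are tried from top to bottom, the second guard only needs to test n < 10: it is never reached for negative n, since those are caught by the first guard.

```haskell
-- Hypothetical example: the guards are tried in order, and the
-- expression after the first guard that is True is returned.
classify :: Int -> String
classify n
  | n < 0     = "negative"
  | n < 10    = "small"    -- only reached if n >= 0
  | n < 100   = "medium"   -- only reached if n >= 10
  | otherwise = "large"
```

With the order of the first two guards swapped, classify (-5) would return "small", which shows that the order of the guards matters.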
The general form for a function definition with guards is as follows:
name x_1 x_2 ... x_k
  | g1 = e1
  | g2 = e2
  ...
  | otherwise = e
Alternatively, this may also be done with help of an if-then-else construction:
absDif :: Int -> Int -> Int
absDif x y = if x >= y then x - y else y - x
The variant with guards makes a more `functional' impression and is usually preferred.
Guards make it easy to formulate recursive functions, because the case that ends the recursion can be written as a separate alternative:
fac :: Int -> Int
fac n
  | n == 0 = 1
  | otherwise = fac (n - 1) * n
Given that in a functional language there are no loop statements,
recursion is the only way to achieve repeated execution without
writing out the iteration explicitly in the code.
The above works fine for all n > 0, and even for n == 0 the correct value is returned. But what about a call "fac (-3)"? It calls "fac (-4)", which calls "fac (-5)", which ... . (The brackets around a negative argument are needed: "fac -3" would be read as "fac - 3".) Of course the function "fac" should not be called with negative values, but it is better to catch this erroneous behavior in a decent way:
fac :: Int -> Int
fac n
  | n == 0 = 1
  | n > 0 = fac (n - 1) * n
  | otherwise = error "fac only defined for non-negative numbers"
Haskell has a built-in error mechanism. It is more convenient than the
error mechanism in Java: if "error" is encountered, the execution is
interrupted with a message which may also contain a user-specified
string.
We consider one more recursive function. The famous Fibonacci numbers (named after a 13th-century mathematician from Pisa) are given by
fib(0) = 0, fib(1) = 1, fib(n) = fib(n - 1) + fib(n - 2), for all n > 1.
The sequence starts as follows: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ... .
This definition immediately leads to the following recursive Haskell function, showing how easily mathematical formulations can be turned into Haskell scripts:
fib :: Int -> Integer
fib n
  | n == 0 = 0
  | n == 1 = 1
  | n > 1 = fib (n - 1) + fib (n - 2)
  | otherwise = error "fib only defined for non-negative numbers"
This function is correct, but leads to a very inefficient computation: it is easy to show that the number of recursive calls to the function fib grows exponentially in n. The general risk with recursion is that it can lead to elegant, short formulations which are extremely inefficient. Further down we will see how to compute fib(n) in a time that is linear in n.
To be precise, in the above case evaluating fib n involves 2 * fib(n + 1) - 1 calls of fib (counting the first call), and fib(n) grows like g^n / sqrt(5), for g = (1 + sqrt(5)) / 2 ~= 1.618. This number g is called the golden ratio, and already the ancient Greeks (Pythagoras) were fascinated by it.
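The exponential growth can be checked experimentally. The sketch below repeats fib and adds a hypothetical function fibCalls that follows the same recursion but counts every call of fib, including the first one. For n >= 0 it yields 2 * fib (n + 1) - 1; for example, fibCalls 10 == 177, while fib 10 is only 55.

```haskell
fib :: Int -> Integer
fib n
  | n == 0    = 0
  | n == 1    = 1
  | n > 1     = fib (n - 1) + fib (n - 2)
  | otherwise = error "fib only defined for non-negative numbers"

-- Hypothetical helper: the number of calls of fib made while
-- evaluating fib n (the call itself plus the calls made by the
-- two recursive calls).
fibCalls :: Int -> Integer
fibCalls n
  | n <= 1    = 1
  | otherwise = 1 + fibCalls (n - 1) + fibCalls (n - 2)
```

Evaluating fibCalls for increasing n shows that every step multiplies the count by roughly 1.618.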
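In most computer languages one can define functions, but the possibility to define operators is less common. In Haskell, operator names are composed of the following symbols: !, #, $, %, &, *, +, ., /, <, =, >, ?, @, \, ^, |, :, - and ~. A few combinations, such as "=", "::" and "->", are reserved, and names starting with ":" are reserved for data constructors.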
An operator is nothing but a binary infix function. An operator with name "&&&&" may be turned into a binary prefix function by enclosing it in round brackets: "(&&&&)". This even holds for the predefined operators such as "+". For example one can write
(+) 2 3
These brackets can also be used to define an operator just as any other function. For example:
(&&&&) :: Int -> Int -> Int
i &&&& j = square i + square j
It is also possible to use binary functions like an operator in an infix way: a binary function is turned into an operator by enclosing it in back quotes. For example, for a binary function "max" we may write
k = i `max` j
It is well-defined how to execute a line with several functions, variables and numbers: anything within brackets is to be executed first. After eliminating all brackets, the line is read from left to right, where every function looks for the arguments it needs to the immediate right of it.
Operators are more flexible in their behavior than functions. Using brackets any desired execution order can be completely specified. However, relying on their associativity and fixity most brackets can be saved. Fixity is what is also called binding power or priority in other languages. It tells us which operator to apply first when in an expression operators with different fixities appear (such as "*" and "+").
The associativity tells us how to evaluate an expression involving operators with the same fixity. For example, 3 - 4 + 5 == (3 - 4) + 5 == 4 and not 3 - 4 + 5 == 3 - (4 + 5) == -6. The reason is that "+" and "-" have the same fixity and that they are left-associative (only in the Netherlands "+" has a higher priority than "-", and there the second answer is correct). An operator is left-associative if the evaluation goes from left to right; otherwise it is called right-associative. Many operators are left-associative, but not all. The assignment operator "=" of C and Java is right-associative. Also the power function "^" is, commonly and in Haskell, right-associative: 3^3^3 == 3^(3^3) == 3^27 and not 3^3^3 == (3^3)^3 == 3^9.
One might think that there are only two categories of associativity, but there are four! An operator may also be non-associative, which means that it is simply forbidden to have an expression of the form "a ~ b ~ c", where "~" stands for the operator. Examples of non-associative operators are the comparison operators "==", ">", ... . So, in Haskell it is forbidden to write a == b == c; such an expression must be clarified by using brackets. The division operator "/", on the other hand, is left-associative in Haskell: 45 / 3 / 5 == (45 / 3) / 5 == 3.0.
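On the other hand, operators may be associative in the mathematical sense. Then it does not matter how the expression is evaluated: (a ~ b) ~ c == a ~ (b ~ c). This is the case for "+": 3 + 4 + 5 has the same value whether we first compute 3 + 4 and then add 5, or first compute 4 + 5 and add this to 3. Associativity should not be confused with commutativity (a ~ b == b ~ a): the operator "++", which joins lists, is associative but not commutative, since "ab" ++ "cd" differs from "cd" ++ "ab".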
For all predefined operators the fixity and associativity are fixed. In case of doubt it is best to use brackets; alternatively, they can be looked up in a table. Essentially the rules are the same as for C and Java. The fixity ranges from 0 (lowest) to 9 (highest).
By default, self-defined operators have fixity 9 and are left-associative. However, the fixity and associativity can be set in a simple way. For example
infixl 6 &&&&
attributes fixity 6 and left-associativity to the operator "&&&&". Using "infixr" would have made it right-associative, and "infix" non-associative.
Fixity and associativity can even be attributed to binary functions:
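infixl 8 `max`
assigns fixity 8 and left-associativity to the binary function "max" in case it is used as an operator. Such a declaration must stand in the module in which the function itself is defined, so it applies to a self-defined "max" and not to the one from the prelude.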
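The effect of fixity declarations can be tried out with a small script. It uses the operator &&&& from above (with * written out instead of square) with fixity 6 and left-associativity, and a hypothetical binary function minus that is declared right-associative for use with back quotes. Since a fixity declaration must stand in the module where the function is defined, a self-defined function is used here rather than max from the prelude.

```haskell
-- The operator from the text with an explicit fixity declaration.
infixl 6 &&&&

(&&&&) :: Int -> Int -> Int
i &&&& j = i * i + j * j

-- A hypothetical binary function, declared right-associative
-- for the case that it is used infix with back quotes.
infixr 5 `minus`

minus :: Int -> Int -> Int
minus x y = x - y
```

Because &&&& is left-associative, 1 &&&& 2 &&&& 3 is read as (1 &&&& 2) &&&& 3 and gives 34. Since "*" has fixity 7, 2 * 3 &&&& 1 is read as (2 * 3) &&&& 1 and gives 37. Because minus is right-associative, 10 `minus` 3 `minus` 2 is read as 10 `minus` (3 `minus` 2) and gives 9 rather than 5.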
In fact, there are two types for integral numbers. "Int" is a fixed-size type as in other languages: in Hugs it has 32 bits, positive and negative, so the largest positive number (in two's complement) is 2^31 - 1 (other implementations may use more bits, for example 64). There is also the type "Integer", which can be used to represent arbitrarily large numbers exactly (within the limits of the available memory).
There are also two floating-point number types: Float for single precision and, just as in C and Java, Double for double precision.
Because Haskell has strong typing and no automatic type conversion, it may be important to denote the type of an expression or a literal explicitly: it is not clear what "2 + 3" stands for; this might be an Int or an Integer, because both literals can have either type and the operator "+" is overloaded. Fixing a type is done in the same way as in declarations: with help of the operator "::". Thus, we might write "(2 + 3) :: Int". The brackets are not strictly necessary, because "::" applies to the whole expression to its left, but they make clear that we mean the type of the whole expression and not just of the final number.
The type Bool is used for Booleans. It has two literals: True and False.
Characters have the type Char. Literals of type Char are written in single quotes: 'a', 'b', 'c', ... . There are the usual conversion functions between Int and Char:
ord :: Char -> Int
chr :: Int -> Char
(In current Haskell implementations these functions are imported from the module Data.Char.) With help of these functions it is easy to perform operations on characters, for example testing whether a character is a digit:
isDigit :: Char -> Bool
isDigit c = (ord '0' <= ord c) && (ord c <= ord '9')
However, in this case we can save the calls to ord, because this is built into the system: the comparison operators "<", "<=", ">=" and ">" can be applied directly to characters, comparing their underlying numbering. Thus, we can also write more simply:
isDigit :: Char -> Bool
isDigit c = ('0' <= c) && (c <= '9')
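A small script combining these functions might look as follows. It assumes a current Haskell system, where ord and chr are imported from Data.Char (only these two are imported, so the own isDigit does not clash with the one in that module). The helpers digitValue and nextChar are hypothetical; digitValue relies on the digits '0' to '9' having consecutive codes.

```haskell
import Data.Char (ord, chr)

-- The test from the text, comparing characters directly.
isDigit :: Char -> Bool
isDigit c = ('0' <= c) && (c <= '9')

-- Hypothetical helper: the numeric value of a digit character.
digitValue :: Char -> Int
digitValue c
  | isDigit c = ord c - ord '0'
  | otherwise = error "digitValue: not a digit"

-- Hypothetical helper: the next character in the numbering.
nextChar :: Char -> Char
nextChar c = chr (ord c + 1)
```

For example, digitValue '7' evaluates to 7 and nextChar 'a' to 'b'.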
Another predefined type is String. It is not a primitive type (it is a list of Char), but it has its own notation. The literals of this type are enclosed in double quotes: "aabb", "", "a", ... . These are a string of length 4, composed of the characters 'a', 'a', 'b' and 'b', an empty string containing zero characters, and a string of length 1 containing the single character 'a'. A string of length 1 is to be distinguished from a character, because they have different types. Therefore different functions and operators can be applied to them. For example, writing "ord "a"" results in a type error:
Main> ord "a"
ERROR - Type error in application
*** Expression : ord "a"
*** Term : "a"
*** Type : String
*** Does not match : Char
Further down, after discussing lists in general, we will come back to strings.
For example, we might store the following script in a file myScript.hs
{- Here may come a longer comment telling the name of the author
and the date the script was written or updated. It should also
be mentioned what the program is doing, how the input, if any,
should be given and what output is produced. -}
-- Two ints are declared and initialized
i :: Int
i = 3
j :: Int
j = 4
-- The function square is declared and defined
square :: Int -> Int
square n = n * n
-- The function squareSum with two arguments is declared and defined
squareSum :: Int -> Int -> Int
squareSum i j = square i + square j
-- A third integer is declared and gets assigned the square sum
-- of i and j by calling the function squareSum.
k :: Int
k = squareSum i j
The same script can also be given in a literate form, stored in a file myScript.lhs:
Here may come a longer comment telling the name of the author and the date the script was written or updated. It should also be mentioned what the program is doing, how the input, if any, should be given and what output is produced.

Two ints are declared and initialized

> i :: Int
> i = 3
> j :: Int
> j = 4

The function square is declared and defined

> square :: Int -> Int
> square n = n * n

The function squareSum with two arguments is declared and defined

> squareSum :: Int -> Int -> Int
> squareSum i j = square i + square j

A third integer is declared and gets assigned the square sum of i and j by calling the function squareSum.

> k :: Int
> k = squareSum i j
Considering the mathematical and abstract nature of Haskell, the literate variant of scripts is not unnatural at all: the actual code will be relatively short because it is based on powerful hidden operations which allow ideas to be expressed concisely, but at the same time it may require more explanation than well-written code in an imperative language.
The goal is to move away from hacking and to write code fragments with a clearly specified behavior which is also testable. So, even though the only difference between a traditional and a literate script is in the layout, which one can consider a minor detail, it marks an important conceptual difference.
An elegant, though not very important, feature of Haskell is that comments may be nested: a comment can stand inside a comment. In C, the following does not work:
/* This is a long comment /* here we find a nested comment */ the
comment continues over two lines. */
After finding the "/*" the (pre)compiler only searches for the closing
bracket "*/". It does not count the number of "/*" standing open. In
Haskell this is done, so there it is correct to write
{- This is a long comment {- here we find a nested comment -} the
comment continues over two lines. -}
module ElementaryFunctions where
-- here follow the definitions
...
Definitions from other modules can be imported by using the keyword "import" in combination with the name of the other module. Inside a module it can be specified which definitions are available for export to other modules. For the purpose of this lecture the module concept is of no importance, but if the need arises one should know that it exists. There is a certain analogy between modules and classes in object-oriented languages: classes can be inherited, and the access modifiers (private, ...) make it possible to control the access to variables and methods.
Normally everything on a line belongs to a single declaration, assignment or definition, and therefore there is no need for special separator symbols. However, one may write several statements on the same line. In that case the special end symbol ";" must be used. For example, one may write
i :: Int ; i = 3
j :: Int ; j = 4
An important rule for saving brackets is the offside rule. This rule tells what belongs to the definition of a function. Above we have written
squareSum i j = square i + square j
k = squareSum i j
However, this might also have been formatted as follows:
squareSum i j =
  square i +
  square j
k = squareSum i j
or in many other ways.
The rule is that a definition ends when an invisible box is closed: the box contains the line on which the definition starts and all following lines as long as they begin to the right of the first letter of the definition. So, in the example the two lines containing "square i" and "square j" belong to the box opened by the word "squareSum", but the line "k = squareSum i j" does not belong to this box, because k again stands as far left as the word "squareSum".
This rule also implies that it is wrong to write the following:
squareSum i j =
square i +
square j
Here we will get an error message telling that an unexpected ";" is encountered. The reason is that internally a ";" is placed at the end of every box. Here the first box ends at the "=" symbol; however, "=" should not have an empty right-hand side. The next box ends after the "+" symbol; however, "+" should not have an empty right-hand side either.
Of course many things are correct which are nevertheless not good style. It is a good idea to align alternatives vertically and to use fixed indentation rules. Also, a ";" should be separated by blanks rather than attached to the last letter of the definition: even though formally it is an end symbol, it is more natural to perceive it as a separator.
Names of the following categories must start with a small letter:
Names of the following categories must start with a capital:
Of course some words are so-called reserved words, which are used by the language and which cannot be used for denoting own defined symbols. Examples are "case", "do", "if", "import", "type", "where".
More interesting is the problem that one imports a module in which a name appears that one would like to reuse. This is not an exotic problem, because by default, without explicitly requiring this, the module "Prelude.hs" is imported. This prelude contains definitions of many elementary functions, such as
min :: Int -> Int -> Int
max :: Int -> Int -> Int
which compute and return the minimum and maximum of two integers. (In fact their types are more general: they work for all types whose values can be compared.)
If one uses the word "min" or "max" as the name of a self-defined function, the result is an error message complaining about a clash with the imported function. So, there is no automatic overriding or hiding mechanism as in Java. In this case one can make the import of the prelude explicit and specify with the keyword "hiding" that min and max are not to be imported:
import Prelude hiding (max, min)
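A minimal sketch of such a script: the prelude versions of max and min are hidden, after which own definitions can be given under the same names. Following the text, they are restricted to Int here.

```haskell
import Prelude hiding (max, min)

-- Own versions of max and min. Without the hiding clause above,
-- these names would clash with the functions from the prelude.
max :: Int -> Int -> Int
max x y
  | x >= y    = x
  | otherwise = y

min :: Int -> Int -> Int
min x y
  | x <= y    = x
  | otherwise = y
```

After loading this script, max 3 7 evaluates to 7 using the own definition; all other prelude functions remain available as usual.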
It is allowed to give a formal parameter the same name as a global variable or function (inside the definition the parameter then hides the global name), but this is usually confusing and therefore discouraged.
These collections appear in two categories: tuples collect a specified number of objects which may have different types (like records in Pascal or classes in Java), while lists collect an unspecified number of items of the same type (like linked lists). Strings are nothing more than lists of characters which have been given their own type name as a concession to convenient programming.
(t_1, t_2, ..., t_n)
The instances of such a tuple type look like
(v_1, v_2, ..., v_n)
where v_i must be of type t_i, for all i.
As examples of tuples we consider the following:
("Umeå", -12.7, -4.0, 12)
("Halle", 2.2, 4.5, 1)
These tuples are of type (String, Float, Float, Int), and might give
the weather data of a city: name, minimum temperature, maximum
temperature and precipitation.
Of course one can define variables of tuple types:
weatherData :: (String, Float, Float, Int)
weatherData = ("Umeå", -12.7, -4.0, 12)
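Functions on such tuples can select the fields by pattern matching, which is explained in more detail below. The sketch introduces a hypothetical type name WeatherData and a function tempRange returning the difference between the maximum and minimum temperature; the underscore _ is a pattern that matches any value and ignores it.

```haskell
-- Hypothetical name for the tuple type used in the text:
-- (city, minimum temperature, maximum temperature, precipitation)
type WeatherData = (String, Float, Float, Int)

weatherData :: WeatherData
weatherData = ("Umeå", -12.7, -4.0, 12)

-- Difference between maximum and minimum temperature;
-- the fields that are not needed are matched by _.
tempRange :: WeatherData -> Float
tempRange (_, tMin, tMax, _) = tMax - tMin
```

tempRange weatherData gives about 8.7 (up to the rounding of Float values).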
As a second example we consider a personal record, consisting of last name, first name and age. These tuples have type (String, String, Int). A literal of this type is given by ("de Boer", "Henk", 32).
Functions can work on tuples and return them as results:
increaseAge :: (String, String, Int) -> (String, String, Int)
increaseAge (x, y, z) = (x, y, z + 1)
Using explicit type definitions helps to do this in a more concise way and allows for data abstraction as well. Therefore, it is a very good practice to explicitly define types and use the name of the type instead of writing out the structure in all usages. For our example we could define the following:
type Person = (String, String, Int)
Then a small script might look as follows:
increaseAge :: Person -> Person
increaseAge (x, y, z) = (x, y, z + 1)
myFriend :: Person
myFriend = ("de Boer", "Henk", 32)
After loading the script, the function can be called at the prompt:
Main> increaseAge myFriend
("de Boer","Henk",33)
Functions like "increaseAge" returning a tuple offer a new possibility. We have already encountered functions with several parameters (x, y and z could also have been passed as three separate parameters instead of as one tuple), but before we were not able to return more than a single value. Here the function returns a tuple which contains several values.
The above function call works with pattern matching: when the actual parameter ("de Boer", "Henk", 32) of type Person is provided, the fields in this parameter are matched against the fields in the formal parameter (x, y, z). One should realize that this is one more level of unpacking than what we have seen before.
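Without pattern matching, the fields have to be selected with functions. The standard functions fst and snd select the first and second component of a pair. So, if we had written: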
type Name = (String, String)
type Person = (Name, Int)
we could define
lastName :: Person -> String
lastName p = fst (fst p)
firstName :: Person -> String
firstName p = snd (fst p)
increaseAge :: Person -> Person
increaseAge p = (fst p, snd p + 1)
However, the equivalent definitions with pattern matching are easier to formulate and understand, and therefore these are mostly to be preferred.
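For comparison, the same three functions written with pattern matching, assuming as above that a Name holds the last name first:

```haskell
type Name = (String, String)   -- (last name, first name)
type Person = (Name, Int)      -- (name, age)

-- The nested pattern unpacks the person and the name in one step;
-- _ matches a field that is not needed.
lastName :: Person -> String
lastName ((l, _), _) = l

firstName :: Person -> String
firstName ((_, f), _) = f

increaseAge :: Person -> Person
increaseAge (n, a) = (n, a + 1)
```

Here the structure of a Person can be read directly from the left-hand sides, whereas the versions with fst and snd require mentally tracing which selector picks which field.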
In C or Java it is easy to compute the Fibonacci numbers:
int fib(int n) {
int i, x, y, z;
if (n == 0)
return 0;
if (n == 1)
return 1;
for (x = 0, y = 1, i = 1; i < n; i++) {
z = x + y; x = y; y = z; }
return y; }
In the following we show how this behavior can be imitated in a functional way. This is done by a trick that allows us to simulate the efficient bottom-up computation, starting at the low end, rather than performing a conventional recursive top-down computation starting at the high end.
The key operation is the following function "fibStep" which performs a single step of the computation. This function is called in an indirect way with one recursive call for each level instead of two, thus preventing the exponential behavior:
fibStep :: (Integer, Integer) -> (Integer, Integer)
fibStep (x, y) = (y, x + y)
fibPair :: Int -> (Integer, Integer)
fibPair n
  | n == 0 = (0, 1)
  | n > 0 = fibStep (fibPair (n - 1))
  | otherwise = error "No negative values!"
fastFib :: Int -> Integer
fastFib = fst . fibPair
How does this work? When calling fastFib 6, it first computes fibPair 6 and then takes the first component of the returned tuple. So, the most important thing is to understand how fibPair 6 works. From the function definition we can see that it unrolls as follows:
fibPair 6
= fibStep (fibPair 5)
= fibStep (fibStep (fibPair 4))
= fibStep (fibStep (fibStep (fibPair 3)))
= fibStep (fibStep (fibStep (fibStep (fibPair 2))))
= fibStep (fibStep (fibStep (fibStep (fibStep (fibPair 1)))))
= fibStep (fibStep (fibStep (fibStep (fibStep (fibStep (fibPair 0))))))
= fibStep (fibStep (fibStep (fibStep (fibStep (fibStep (0, 1))))))
= fibStep (fibStep (fibStep (fibStep (fibStep (1, 1)))))
= fibStep (fibStep (fibStep (fibStep (1, 2))))
= fibStep (fibStep (fibStep (2, 3)))
= fibStep (fibStep (3, 5))
= fibStep (5, 8)
= (8, 13)
So, first we only go down, `counting' how far we still have to go. For each further level down, one function call is added. When reaching the bottom of the recursive definition, some result is substituted and the `saved' function calls are executed, just as in an imperative program with a loop repeatedly calling the function.
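The same bottom-up idea can also be expressed with the standard prelude function iterate, which builds the infinite list x, f x, f (f x), ... . This is an alternative formulation, not the one used in the text: the element at position n of iterate fibStep (0, 1) is exactly the pair computed by fibPair n, and because of lazy evaluation only the first n + 1 elements are ever built. The name fastFibIter is hypothetical.

```haskell
fibStep :: (Integer, Integer) -> (Integer, Integer)
fibStep (x, y) = (y, x + y)

-- iterate fibStep (0, 1) is the infinite list
-- [(0, 1), (1, 1), (1, 2), (2, 3), (3, 5), ...];
-- its element at position n is the pair (fib n, fib (n + 1)).
fastFibIter :: Int -> Integer
fastFibIter n
  | n >= 0    = fst (iterate fibStep (0, 1) !! n)
  | otherwise = error "No negative values!"
```

Here the counting down of fibPair is replaced by the list index n: "!!" walks n steps along the list of pairs.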
A proof that fastFib is correct naturally goes by complete induction. What exactly do we want to prove? We want to prove that fastFib(n) == fib(n). This appears to be quite analogous to the proof that the function "fac" computes the factorial function. Let us prove that first, as an exercise so to say. We recall that "fac" was defined as follows:
fac n
  | n == 0 = 1
  | otherwise = fac (n - 1) * n
We must show that fac(n) == n!. The basis is given by the case n == 0: fac(0) =alg= 1 =def= 0!. So, assume that fac(i) == i! for all i < n, for some n > 0. Then fac(n) =alg= fac(n - 1) * n =ind= (n - 1)! * n =def= n!. Here "=alg=" denotes an equality because of the algorithm, "=ind=" an equality because of the induction assumption, and "=def=" an equality because of a mathematical definition. The argument is short and convincing.
Let us now try to apply the same method to prove that fastFib(n) == fib(n), where fib(n) denotes the mathematically defined n-th Fibonacci number. We have fastFib(0) == fst(fibPair(0)) == fst(0, 1) == 0 == fib(0), so this gives a good basis. However, if we want to go on, we have no foundation for an argument: the algorithm works with pairs and not with the numbers fastFib(n). Accordingly, we must make a stronger claim from which the claim we want to prove follows. This stronger claim is that fibPair(n) == (fib(n), fib(n + 1)). If this holds, then clearly (fst . fibPair)(n) =def= fst(fibPair(n)) =ass= fst(fib(n), fib(n + 1)) =def= fib(n), where "=ass=" denotes an equality because of the assumed stronger claim.
So, it suffices to prove that fibPair(n) == (fib(n), fib(n + 1)) for all n >= 0. Let us do this now. For n == 0, we have fibPair(0) =alg= (0, 1) =def= (fib(0), fib(1)). This is our basis. So, assume that fibPair(i) == (fib(i), fib(i + 1)) for all i < n, for some given n > 0. Then fibPair(n) =alg= fibStep(fibPair(n - 1)) =ind= fibStep((fib(n - 1), fib(n))) =alg= (fib(n), fib(n - 1) + fib(n)) =def= (fib(n), fib(n + 1)). This completes the proof. The reader should consider how much harder it would have been to prove the correctness of the same computation in an imperative programming language.
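A finite test is of course no substitute for the proof, but checking the strengthened claim for small n is a cheap way to catch a wrongly formulated invariant before trying to prove it. The sketch below assumes a naive definition fib that follows the mathematical definition directly (exponential, but fast enough for n up to about 25); the name invariantHolds is hypothetical.

```haskell
-- Naive specification of the Fibonacci numbers (exponential time).
fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

-- fibStep and fibPair as in the text.
fibStep :: (Integer, Integer) -> (Integer, Integer)
fibStep (x, y) = (y, x + y)

fibPair :: Int -> (Integer, Integer)
fibPair n
  | n == 0    = (0, 1)
  | n > 0     = fibStep (fibPair (n - 1))
  | otherwise = error "No negative values!"

-- The strengthened claim of the proof, for one particular n.
invariantHolds :: Int -> Bool
invariantHolds n = fibPair n == (fib n, fib (n + 1))
```

Evaluating all invariantHolds [0 .. 20] in the interpreter should give True.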
In a similar way we can also prove that the number of calls to the functions fibPair and fibStep is linear in n. Because each of these functions performs only a constant number of operations, this implies that the time consumption of fastFib(n) is linear in n, as long as the addition of two numbers is counted as a single step (for very large Integer values an addition itself takes time proportional to the number of digits). This is enormously faster than the earlier formulation, whose running time grows exponentially with n. With fastFib it is no problem to compute fastFib 10000; with fib it already requires considerable patience to compute fib 30.
As in any inductive proof, we must first formulate a precise claim. In this case our claim is that calling fastFib with a parameter n >= 0 results in exactly n calls to the function fibStep and n + 1 calls to fibPair. To be able to talk about these numbers in a more mathematical way, we introduce corresponding functions (in the context of grammars such functions were called attributes): let T_step(n) and T_pair(n) denote the number of calls to fibStep and to fibPair, respectively, caused by calling fibPair with parameter n. The claims can now be formulated concisely as
T_pair(n) == n + 1, T_step(n) == n, for all n >= 0.
The case n == 0 constitutes the basis of the inductive proof. For n == 0 no recursive calls are made: there is one call to fibPair and none to fibStep, so T_pair(0) == 1 and T_step(0) == 0, as it should be. So, assume the claim holds for all i < n, for some n > 0. From the definition of fibPair we see that calling fibPair with parameter n (which itself counts as one call to fibPair) results in exactly one call to fibStep plus one call to fibPair with parameter n - 1. This gives
T_pair(n) =alg= 1 + T_pair(n - 1) =ind= 1 + n =cmp= n + 1,
T_step(n) =alg= 1 + T_step(n - 1) =ind= 1 + (n - 1) =cmp= n.
Here "=cmp=" denotes an equality obtained by a simple computation. This completes the proof.
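The counting argument can also be checked experimentally by passing counters along with the recursion. The function fibPairCounted below is a hypothetical instrumented copy of fibPair, not part of the original script: besides the pair it returns T_pair(n) and T_step(n), computed by exactly the recurrences above.

```haskell
-- One step of the bottom-up computation, as in the text.
fibStep :: (Integer, Integer) -> (Integer, Integer)
fibStep (x, y) = (y, x + y)

-- Hypothetical instrumented copy of fibPair. The result is
-- (fibPair n, T_pair(n), T_step(n)): the pair itself, the number of calls
-- to fibPair (including this one) and the number of calls to fibStep.
fibPairCounted :: Int -> ((Integer, Integer), Int, Int)
fibPairCounted n
  | n == 0    = ((0, 1), 1, 0)                     -- T_pair(0) = 1, T_step(0) = 0
  | n > 0     = (fibStep p, 1 + tPair, 1 + tStep)  -- the two recurrences
  | otherwise = error "No negative values!"
  where
    (p, tPair, tStep) = fibPairCounted (n - 1)
```

For instance, fibPairCounted 10 yields ((55, 89), 11, 10), in agreement with T_pair(10) == 11 and T_step(10) == 10.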
[1, 2, 4, 4, 2, 1] :: [Int]
[True] :: [Bool]
['a', 'b', 'b', 'a'] :: [Char]
[abs, negate, fac] :: [Int -> Int]
[[3, 4, 5], [], [4, 7]] :: [[Int]]
[] :: [Int]
[] :: [Bool]
The examples show that the concept is not limited to simple types: lists may also contain functions, as long as they all have the same type, and even other lists. The empty list "[]" is also a list; it is often an important special case in inductive algorithms. The empty list can have any list type, because it contains zero objects of whatever type.
The examples also show that elements may appear several times, so a list is definitely not a set in the mathematical sense; it is closer to a multiset. Moreover, unlike in a set or a multiset, the order of the elements in a list is relevant:
[1, 2, 4, 4, 2, 1] != [2, 1, 4, 4, 1, 2]
[abs, negate, fac] != [negate, fac, abs]
Two lists are equal if and only if they have the same type of objects, the same number of these, and if the objects at corresponding places are equal.
For enumerated types, that is, types whose values have a natural order in which they can be listed, such as numbers and characters, lists can be written more briefly by indicating ranges, as in [m .. n]. The default increase is 1, and if n comes before m, the list is empty. We give some examples, with the shorthand on the left and the equivalent full definition on the right:
[5 .. 9] == [5, 6, 7, 8, 9]
[9 .. 5] == []
[3.1 .. 6.8] == [3.1, 4.1, 5.1, 6.1, 7.1]
['a' .. 'f'] == ['a', 'b', 'c', 'd', 'e', 'f']
One can also specify the increase (which might actually be a decrease) by indicating the first two values, in the format [m, n .. p]. The following examples make this notion clear:
[5, 7 .. 12] == [5, 7, 9, 11]
[5, 7 .. 13] == [5, 7, 9, 11, 13]
[9, 8 .. 5] == [9, 8, 7, 6, 5]
[3.1, 3.2 .. 3.6] == [3.1, 3.2, 3.3, 3.4, 3.5, 3.6]   (up to rounding errors)
['a', 'c' .. 'f'] == ['a', 'c', 'e']
For integral types and characters, all values within the range, including the bounds, are included, but the end value itself is not necessarily an element of the list, because it may be jumped over by the increase, as in [5, 7 .. 12]. For Float and Double the rule is different: elements are produced as long as they do not exceed the end value by more than half the increase. This is why [3.1 .. 6.8] ends with 7.1, a value beyond the specified bound, and why ranges over floating-point numbers should be used with care.
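These rules are easy to observe in an interpreter. A minimal sketch; the names are hypothetical, and the comments describe the behavior prescribed by the Haskell report and implemented by GHC:

```haskell
-- Integral range with increase 2: the bound 12 is jumped over.
oddRange :: [Int]
oddRange = [5, 7 .. 12]

-- Character ranges follow the same rule.
charRange :: String
charRange = ['a', 'c' .. 'f']

-- Floating-point range: elements are produced as long as they do not
-- exceed 6.8 + 0.5, so the last element is (approximately) 7.1.
doubleRange :: [Double]
doubleRange = [3.1 .. 6.8]
```

Typing doubleRange in GHCi shows five elements, the last of which lies outside the range from 3.1 to 6.8.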
(:)        :: a -> [a] -> [a]              add a single element at the front of a list
(++)       :: [a] -> [a] -> [a]            join two lists
(!!)       :: [a] -> Int -> a              element with the specified index
concat     :: [[a]] -> [a]                 concatenate the constituent lists
length     :: [a] -> Int                   length of the list
head       :: [a] -> a                     first element of the list
last       :: [a] -> a                     last element of the list
tail       :: [a] -> [a]                   list with the first element removed
init       :: [a] -> [a]                   list with the last element removed
replicate  :: Int -> a -> [a]              list with the specified number of copies
take       :: Int -> [a] -> [a]            list of the first so many elements
drop       :: Int -> [a] -> [a]            list without the first so many elements
splitAt    :: Int -> [a] -> ([a], [a])     split the list at the specified index
reverse    :: [a] -> [a]                   reverse the order of the list
zip        :: [a] -> [b] -> [(a, b)]       turn two lists into a list of pairs
unzip      :: [(a, b)] -> ([a], [b])       turn a list of pairs into a pair of lists
Notice that the first three are operators (as should be clear from the fact that they are written with `operator letters').
The following examples illustrate what is meant:
'x' : ['a', 'b', 'c'] == ['x', 'a', 'b', 'c']
[7, 9, 11] ++ [9, 8, 7] == [7, 9, 11, 9, 8, 7]
[1, 2, 7, 4, 8, 5] !! 3 == 4
concat [[3, 4], [6], [7, 1]] == [3, 4, 6, 7, 1]
length [3, 4, 6, 7, 1] == 5
head [3, 4, 6, 7, 1] == 3
last [3, 4, 6, 7, 1] == 1
tail [3, 4, 6, 7, 1] == [4, 6, 7, 1]
init [3, 4, 6, 7, 1] == [3, 4, 6, 7]
replicate 4 'x' == ['x', 'x', 'x', 'x']
take 2 [3, 4, 8, 6, 7, 1] == [3, 4]
drop 2 [3, 4, 8, 6, 7, 1] == [8, 6, 7, 1]
splitAt 2 [3, 4, 8, 6, 7, 1] == ([3, 4], [8, 6, 7, 1])
reverse [3, 4, 8, 6, 7, 1] == [1, 7, 6, 8, 4, 3]
zip [2, 3, 4, 5] [4, 2, 3] == [(2, 4), (3, 2), (4, 3)]
unzip [(2, 4), (3, 2), (4, 3)] == ([2, 3, 4], [4, 2, 3])
Notice that the first position of a list has index 0, as in arrays in C and Java. This is important to be aware of when using the operator "!!". When zipping lists of unequal length, the surplus elements of the longer list are discarded. So, zip and unzip are not really inverse operations. Furthermore, zip takes two lists as arguments, while unzip produces a single pair of lists as result.
These functions cover most needs: they offer both typical array functions such as "!!" and typical list functions such as "drop" and "splitAt". Using a combination of splitAt, drop 1 and concat, it is possible to delete an element at any specified position. Some more tricks allow one to delete an element with a specified value, which is a true list operation.
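The deletion tricks mentioned above can be sketched as follows. The names deleteAt and deleteValue are hypothetical; deleteValue uses the Prelude function break, which splits a list just before the first element satisfying a predicate. This is one possible way to do it, not necessarily the one intended in the text.

```haskell
-- Hypothetical helper: delete the element at position i (counting from 0),
-- using splitAt, drop 1 and concat. Indices beyond the end leave the list unchanged.
deleteAt :: Int -> [a] -> [a]
deleteAt i s = concat [front, drop 1 back]
  where
    (front, back) = splitAt i s

-- Hypothetical helper: delete the first occurrence of the value v.
-- break (== v) s splits s just before the first element equal to v.
deleteValue :: Eq a => a -> [a] -> [a]
deleteValue v s = front ++ drop 1 back
  where
    (front, back) = break (== v) s
```

For example, deleteAt 2 [3, 4, 8, 6, 7, 1] gives [3, 4, 6, 7, 1]. The standard module Data.List also provides delete :: Eq a => a -> [a] -> [a], which removes the first occurrence of a value in the same way as deleteValue.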
On the other hand, one should never be fooled by the fact that an operation can be written as a single instruction: we do not know (and mostly should not want to know) how lists are realized at the lower level. Most likely lists are implemented as linked lists, and in that case, unless an extra pointer to the last element is maintained, the operation "last" takes time proportional to the length of the list. If, on the other hand, lists were implemented as arrays, then an operation like "tail", which on linked lists can be performed with a constant number of computer instructions, would require all elements to be shifted by one position, which takes time proportional to the length of the array.
The above functions are defined for any list type [a]. This polymorphism is discussed further down. There are also some useful functions defined only when the list is of a specified type:
and        :: [Bool] -> Bool        conjunction of all Booleans in the list
or         :: [Bool] -> Bool        disjunction of all Booleans in the list
sum        :: [Int] -> Int          sum of all values in the list
              [Float] -> Float
product    :: [Int] -> Int          product of all values in the list
              [Float] -> Float
The basic format of a comprehension is
[ expression_in_terms_of_variable_n | n <- some_list ]
As a concrete example consider
doubleValue :: [Int] -> [Int]
doubleValue list_par = [ 2 * n | n <- list_par ]
Calling "doubleValue [3, 4, 5]" returns the list [6, 8, 10]. In a comprehension, the part "n <- some_list" is called a generator, because it generates the elements on which the expression on the left operates.
Comprehensions can be combined with tests. Tests are Boolean expressions, which are added, separated by commas, on the right-hand side of the generator. So, the general format is
[ expression_in_terms_of_variable_n |
n <- some_list , test_1 , test_2 , ... ]
As a concrete example consider
doubleSpecial :: [Int] -> [Int]
doubleSpecial list_par = [ 2 * n | n <- list_par , even n , n > 5 ]
Applying this function to [3, 6, 7, 4, 8, 2] returns just [12, 16], because the other elements are either odd or too small.
divisors :: Int -> [Int]
divisors n = [i | i <- [1 .. n] , n `rem` i == 0]
isTwo :: Int -> Bool
isTwo = (== 2)
isPrime :: Int -> Bool
isPrime = isTwo . length . divisors
primes :: Int -> [Int]
primes n = [ i | i <- [2 .. n] , isPrime i]
This is super elegant. It is also super inefficient. For n == 1000, it takes almost one minute, while an efficient imperative algorithm can find all primes up to 10^7 in one second.
There are several things to remark about this script. In the definition of divisors we are using the binary function "rem" as an operator by enclosing it in backquotes. The function "isTwo" gives an example of an operator section that will be discussed further down. Because we have created this function, we can write the function isPrime without parameters as a function composition.
In the functions "divisors" and "primes" we see that the list which is handled in a list comprehension does not need to be passed as a parameter as it was done above in "doubleValue". Here the list is created on the spot, and then filtered by a filter that depends on the passed parameter n.
Running the script within Hugs by typing "primes 1000" automatically prints a list: as pointed out before, Hugs works like a multi-type pocket calculator, immediately outputting computed results.
The inefficiency has three main reasons:
intSqrt :: Int -> Int
intSqrt = floor . sqrt . fromIntegral
smallDivisors :: Int -> [Int]
smallDivisors n = [i | i <- [2 .. intSqrt n] , n `rem` i == 0]
emptyList :: [Int] -> Bool
emptyList s = s == []
isPrime :: Int -> Bool
isPrime = emptyList . smallDivisors
fprimes :: Int -> [Int]
fprimes n = [ i | i <- [2 .. n] , isPrime i]
A further saving can be obtained by realizing that one can stop testing a number as soon as the first divisor is found. For numbers with several divisors this means that one can stop even earlier. It is not so easy though to estimate how much earlier.
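intSqrt :: Int -> Int
intSqrt = floor . sqrt . fromIntegral
-- isPrime d b i: True if i has no divisor in the range d .. b;
-- thanks to "&&", testing stops at the first divisor found
isPrime :: Int -> Int -> Int -> Bool
isPrime d b i
  | d > b = True
  | otherwise = i `rem` d /= 0 && isPrime (d + 1) b i
ffprimes :: Int -> [Int]
ffprimes n = [ i | i <- [2 .. n] , isPrime 2 (intSqrt i) i]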
A final standard improvement is to consider as divisors only the prime numbers up to the square root: every divisor greater than 1 has a prime factor, so if none of these prime numbers divides a number n, then no product of them divides n either.
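intSqrt :: Int -> Int
intSqrt = floor . sqrt . fromIntegral
-- isPrime d b i x: test whether i is prime, where x is the list of all primes
-- below i, d the index in x of the next prime to try and b the bound sqrt i
isPrime :: Int -> Int -> Int -> [Int] -> Bool
isPrime d b i x
  | y > b = True
  | otherwise = i `rem` y /= 0 && isPrime (d + 1) b i x
  where y = x !! d
-- fffprimes n: the list of all primes up to n (for n >= 2), built incrementally
fffprimes :: Int -> [Int]
fffprimes n
  | n == 2 = [2]
  | isPrime 0 (intSqrt n) n x = x ++ [n]
  | otherwise = x
  where x = fffprimes (n - 1)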
A really different algorithm, Eratosthenes' sieve method (invented by a Greek mathematician who lived around 200 BC in northern Africa), is much more efficient still, both in theory and in practice. Programming this method in Haskell is one of the exercises.
type String = [Char]
This implies that all list functions also work on strings. A string can be printed out to the screen by the following command:
putStr :: String -> IO ()
To output values of types other than String, one can use the function "show", which converts a value into a string. The function "read" does the opposite: it reads a string and converts it into a value of another type.
show (6 + 7)          gives the string "13"
show True             gives the string "True"
read "True"           gives True, provided the context requires a Bool
read "37"             gives 37, provided the context requires a number
(read "37") :: Int    gives 37, with an explicit type indication
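A short sketch combining both directions. The function name doubleString is hypothetical; the explicit annotation ":: Int" tells read which type to produce, since nothing else in the expression determines it.

```haskell
-- Hypothetical example: read a number from a string, double it,
-- and convert the result back into a string with show.
doubleString :: String -> String
doubleString s = show (2 * (read s :: Int))
```

Calling doubleString "21" gives "42". A string that does not represent an Int, such as "abc", makes read fail with a runtime error.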
In most other computer languages, the only allowed type of pattern in a function definition is a name. That is, when defining a function, one of the things to do is to provide a list of its formal parameters. Such a function is called by specifying its name together with actual parameters. These actual parameters are expressions, which are evaluated upon calling the function. The computed values are substituted for the formal parameters in the order in which they are specified. This mechanism even applies when a call is performed `by reference'; in that case the value of the actual parameter is an address. In this simple case the compiler or interpreter only needs to check that the actual parameters have the correct types and that the number of actual parameters equals the number of formal parameters. This can indeed be checked at compile time, because the number of parameters and their types are clear from the provided code.
As we have seen, in Haskell a pattern may also involve literals. A literal is an expression like True, False, 2 or 3; the value of a literal is the literal itself. More generally, a function can be defined by specifying one or more patterns. These patterns are mostly chosen to be mutually exclusive, but they do not need to cover all possibilities. For example, the square root function is not defined for negative numbers, and division is not defined for a divisor 0. It is the responsibility of the programmer to ensure that a function is called only for values for which it is defined.
What happens in Haskell when a function is called? As in any other language, a function is called by specifying its name together with actual parameters. Then it is checked which of the provided patterns matches the actual parameters. If such a pattern is found, the definition following the first matching pattern is used. If no matching pattern is found, the program terminates with an error message.
In the above case of the "not" function this was very simple: the function has a single Boolean argument, and both alternatives for this value are listed. Calling "not b", where "b" stands for a Boolean expression, the value of "b" is determined. If its value is true, the first definition is chosen, if it is false the second.
The "exor" function has two Boolean arguments and is called as "exor b1 b2", where "b1" and "b2" are Boolean expressions. To find the matching pattern, "b1" is evaluated. If its value is True, the first definition is chosen; if it is False, the second. "b2" is not yet evaluated at the pattern matching stage. This `lazy evaluation' is discussed further down.
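The definition of exor is not repeated in this part of the text. A minimal sketch consistent with the description above (first equation for True, second for False) could look as follows; the exact equations are an assumption.

```haskell
-- Sketch (assumed definition): exclusive or by pattern matching on the
-- first argument only. In both equations the second argument is just the
-- variable b, so choosing an equation requires evaluating the first argument alone.
exor :: Bool -> Bool -> Bool
exor True  b = not b
exor False b = b
```

The complete truth table, [exor a b | a <- [False, True], b <- [False, True]], evaluates to [False, True, True, False].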
Pattern matching must be performed at run time: at compile time the values are unknown, and therefore it cannot be determined at compile time which of the alternatives to choose. Particularly, before knowing the values it cannot be checked whether for the arising values there will always be a matching alternative.
tails [] = [[]]
tails (x : s) = (x : s) : tails s
This definition requires that the provided list, if it is non-empty, is decomposed into x and s. To compute the result, these are combined in the expression "(x : s)". However, this is a waste of effort, because the result is the same as the provided actual parameter. To make such computations more efficient, Haskell uses the symbol "@" to attach a name to a pattern. Using this symbol, the definition looks as follows:
tails [] = [[]]
tails xs@(x : s) = xs : tails s
length s
  | null s = 0
  | otherwise = 1 + length (tail s)
Using the construction of a pattern with help of the operator ":" this function can also be defined more directly as follows:
length [] = 0
length (x : s) = 1 + length s
Notice that here only one of the two patterns can match: an empty list cannot be interpreted as some list to which x is added at the beginning.
In definitions using pattern matching it is common that at most one of the patterns matches. In that case the order is immaterial. However, it is not forbidden to have patterns which are not mutually exclusive. The following is correct:
fac 0 = 1
fac n = n * fac (n - 1)
When calling "fac 0", in principle both equations match. In this case, just as with a definition using guards, the first matching alternative is chosen. Changing the order of the equations is not correct here: "fac 0" would then match the general equation, the recursion would never stop, and after some waiting the execution crashes, for example with a stack overflow or a segmentation fault.
Surprisingly, the following is correct:
fac (n + 1) = (n + 1) * fac n
fac 0 = 1
Why is this not the same? To understand this, one should know how n is solved in equations like these. The only values of n that count as a match are natural numbers, that is, integers with value at least 0. When calling "fac 0", the only solution would be n == -1, so this is not considered to be a match. Therefore another alternative is sought and found. (Such n + k patterns were removed from the language in the Haskell 2010 standard; GHC still accepts them when the NPlusKPatterns extension is enabled.)
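Readers using a current GHC rather than Hugs can try the n + k definition as follows. This is a sketch under the assumption that GHC is used: the language pragma enables n + k patterns, and the names facNK and facGuard are hypothetical, chosen so that both versions can be compared.

```haskell
{-# LANGUAGE NPlusKPatterns #-}

-- The definition from the text: the pattern (n + 1) matches only arguments
-- >= 1, so facNK 0 is handled by the second equation.
facNK :: Integer -> Integer
facNK (n + 1) = (n + 1) * facNK n
facNK 0 = 1

-- The same function in standard Haskell 2010, using guards instead.
facGuard :: Integer -> Integer
facGuard n
  | n == 0    = 1
  | n > 0     = n * facGuard (n - 1)
  | otherwise = error "No negative values!"
```

Both give facNK 5 == facGuard 5 == 120; for a negative argument facNK has no matching equation and stops with a pattern-match error.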
In the second definition of "length" we are actually not interested in the value of "x"; it is only important that the list can be decomposed into a first element and a tail. In such cases, a variable in a pattern may be replaced by a placeholder or wildcard, for which Haskell uses the symbol "_" (underscore). So, "length" can be simplified further:
length [] = 0
length (_ : s) = 1 + length s
In this case the practical difference will be small, but in general the use of a placeholder may make the computation more efficient. This might be the case for the following definition of the function "isEmpty":
isEmpty [] = True
isEmpty (_ : _) = False
As an example we consider a function which computes (n - 1)! * n!. Of course we can write:
facFac :: Integer -> Integer
facFac n = (fac (n - 1)) * (fac n)
This computation requires 2 * n products. This function is therefore better written as follows:
facFac :: Integer -> Integer
facFac n = x * y
  where
    x = fac (n - 1)
    y = x * n
Here the number of products is only n + 1.
When using "where", the layout is free, except that the offside rule applies. This means that the "where" must begin to the right of the beginning of the function name, and that all local definitions belonging to its scope must begin in the same column, again to the right of the beginning of the function name.
As a further, more convincing example, we give an alternative, shorter version of the efficient computation of the Fibonacci numbers:
altFib :: Integer -> Integer
altFib 0 = 0
altFib n = snd (fibPair n)
  where
    fibPair 1 = (0, 1)
    fibPair (n + 1) = (b, a + b)
      where
        (a, b) = fibPair n
This example reveals several interesting aspects: The local `variables' do not need to be variables (local constants would be a better name anyway). It is better to speak of local definitions, which may also include local function definitions. In that case it makes sense to have several locality levels, with a local definition containing even more local definitions. In the example we find two of these levels, each initiated with their own "where".
Even more interesting are the pattern matchings. One non-trivial pattern matching occurs in the line "fibPair (n + 1) = (b, a + b)". When fibPair is called with some value x, the system matches x against (n + 1) and substitutes x - 1 for all occurrences of the formal parameter n. A second pattern matching occurs in the line "(a, b) = fibPair n". Here the value of fibPair n, which is a pair of Integers, is matched component by component with (a, b). This latter type is called conformal pattern matching, because the expression on the right-hand side has to conform to the pattern on the left.
It would be a lot of work to list all possibilities, and it is not even allowed. Trying the following piece of code:
myfun :: Bool -> Bool
myfun x = not x
myfun :: Int -> Int
myfun x = x + 1
results in the following error:
Repeated type signature for "myfun"
So, how to define functions like "length"? The solution is to use a type variable. In the following the variable "a" is a type variable:
length :: [a] -> Int
This definition states that for a list over any type "a" the result is an integer. That "a" is not a type but a variable is clear from the naming rules: type names begin with a capital letter, type variables with a lowercase letter. The above really describes all cases in which the function length can be used. Thus, "[a] -> Int" is what is called the most general type for the function length.
There are more general and less general cases. The definition of ":" involves only a single type variable:
(:) :: a -> [a] -> [a]
The function "zip" is more general, with a definition involving two type variables:
zip :: [a] -> [b] -> [(a, b)]
Of course definitions of this kind are not limited to functions from the prelude: one can also define new functions using type variables. For example:
repAdd :: Int -> a -> [a] -> [a]
repAdd i x s
  | i == 0 = s
  | i > 0 = repAdd (i - 1) x (x : s)
It is nevertheless important to assign a single type to every function and operator. Therefore, types are arranged in type classes. The most important of these are:
The type definitions of the above functions now read:
(+) :: Num a => a -> a -> a
(<) :: Ord a => a -> a -> Bool
(==) :: Eq a => a -> a -> Bool
Here we encounter the new symbol "=>". The above expressions must be read as: "(+)" is of type "a -> a -> a", provided that a is a type belonging to the type class "Num".
Just as with polymorphic functions and operators, the user can also define overloaded functions and operators, for example:
squareSum :: Num a => a -> a -> a
squareSum x y = x * x + y * y
Notice that the polymorphism of self-defined polymorphic functions was `inherited' from the polymorphism of earlier defined functions. In the same way "squareSum" derives its overloaded character from the overloadedness of the operators "+" and "*".
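Because "+" and "*" are overloaded, one definition of squareSum serves every numeric type. A small check (the definition is the one from the text; the choice of Int, Double and Integer as test types is ours):

```haskell
-- squareSum as defined in the text: overloaded via the class Num.
squareSum :: Num a => a -> a -> a
squareSum x y = x * x + y * y
```

The same function works at several types: squareSum (3 :: Int) 4 gives 25, squareSum (1.5 :: Double) 2 gives 6.25, and with Integer arbitrarily large values are possible.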
This distinction is no problem because function application is left-associative and the operator "->" is right associative. This means that
f :: A -> B -> C -> D -> E
f a b c d
are equivalent to
f :: A -> (B -> (C -> (D -> E)))
(((f a) b) c) d
So, a function with four parameters can be interpreted as a function which maps an argument to a function which maps an argument to a function which maps an argument to a function of one variable. This corresponds to the function f `swallowing' its arguments one by one, starting from the left: if no brackets are written, the implicit bracketing is exactly the one that makes Currying work.
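A small illustration of this bracketing, with a hypothetical three-argument function add3: each partial application supplies one more argument from the left and yields a function of the remaining ones.

```haskell
-- Hypothetical function of three arguments, used only to illustrate Currying.
add3 :: Int -> Int -> Int -> Int
add3 a b c = a + b + c

-- add3 1 is a function of the two remaining arguments ...
add3With1 :: Int -> Int -> Int
add3With1 = add3 1

-- ... and add3 1 2 is a function of the last remaining argument.
add3With1And2 :: Int -> Int
add3With1And2 = add3 1 2
```

All of add3 1 2 3, ((add3 1) 2) 3, add3With1 2 3 and add3With1And2 3 evaluate to 6.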
In the context of higher-order functions, treated further down, we will see that sometimes it is nevertheless necessary to use brackets in function definitions.
Currying is not just a theoretical issue. It can also be used practically. Consider the following function
fac :: Integer -> Integer
fac n
  | n == 0 = 1
  | n > 0 = n * fac (n - 1)
choose :: Integer -> Integer -> Integer
choose n k
  | n < k = error "In choose: k larger than n"
  | otherwise = fac n `div` (fac (n - k) * fac k)
It computes the number of subsets of size k of a set with n elements,
the binomial coefficient, which is usually pronounced "n choose k".
Suppose that we are working with a set of fixed size and that we want to define the function which computes the number of subsets of a size to specify. This function can be written by partial parametrization as follows:
chooseFromTen :: Integer -> Integer
chooseFromTen = choose 10
The most important application of partial parametrization is when one wants to pass a partially parametrized function as an argument to a higher-order function (see further down). In that case it is not necessary to explicitly give the function a name, we can simply write "choose n".
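As a sketch of this use, the partially parametrized function "choose 4" can be passed directly to the higher-order function map, which is introduced further down. The functions fac, choose and chooseFromTen are those from the text; the name rowFour is hypothetical.

```haskell
fac :: Integer -> Integer
fac n
  | n == 0 = 1
  | n > 0  = n * fac (n - 1)

choose :: Integer -> Integer -> Integer
choose n k
  | n < k     = error "In choose: k larger than n"
  | otherwise = fac n `div` (fac (n - k) * fac k)

chooseFromTen :: Integer -> Integer
chooseFromTen = choose 10

-- "choose 4" is passed to map without giving it a name of its own;
-- the result is one row of Pascal's triangle.
rowFour :: [Integer]
rowFour = map (choose 4) [0 .. 4]
```

Here rowFour is [1, 4, 6, 4, 1] and chooseFromTen 3 is 120.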
When applying one of these operator sections to an argument y, the returned value is the one given by substituting y at the single free position. Thus, for an operator "&&&" with
(&&&) :: A -> B -> C
we have
(&&& x) :: A -> C
(&&& x) y = y &&& x
(x &&&) :: B -> C
(x &&&) y = x &&& y
Of course for commutative operators there is no difference between left and right sections, but in general there is. For example, "( / 3)" is the function "divide by three" and "(3 / )" is the function "divide three by". We give some examples of potentially useful definitions:
next :: Integral a => a -> a
next = ( + 1)
square :: Num a => a -> a
square = ( ^ 2)
twoPower :: Integral a => a -> a
twoPower = (2 ^ )
isZero :: Integral a => a -> Bool
isZero = ( == 0)
reciprocal :: Float -> Float
reciprocal = (1.0 / )
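The difference between fixing the left and fixing the right operand can be checked directly. A minimal sketch with hypothetical names; note that "^" requires an integral exponent, which is why the section (2 ^) needs an Integral constraint when its argument and result have the same type.

```haskell
-- (/ 3) fixes the right operand: (/ 3) x == x / 3.
divideByThree :: Double -> Double
divideByThree = (/ 3)

-- (3 /) fixes the left operand: (3 /) x == 3 / x.
divideThreeBy :: Double -> Double
divideThreeBy = (3 /)

-- The same distinction for (^): (2 ^) x == 2 ^ x and (^ 2) x == x ^ 2.
powerOfTwo :: Integral a => a -> a
powerOfTwo = (2 ^)

squareOf :: Num a => a -> a
squareOf = (^ 2)
```

For example, divideByThree 12 is 4.0 while divideThreeBy 12 is 0.25, and powerOfTwo 5 is 32 while squareOf 5 is 25.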
A higher-order function is a function with a function as argument. The possibility of defining higher-order functions is a natural consequence of the fact that functions are not really different from values: they have a type, they can be the result of a function application (for example by Currying), and now we will even consider them as parameters. In the following we present a number of these functions which are already defined in the prelude. It is instructive to consider their definitions.
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x : s) = f x : map f s
Here we see several things. First, "map" is a function which takes as arguments a function from type a to type b and a list over type a, and produces a list over type b. Here we encounter another example of a highly polymorphic definition. Clearly it is important to write the brackets in "(a -> b)". Without them the type would read "a -> b -> [a] -> [b]", and calling "map f s" for some function "f" and a list "s" would not work: map would take "f" as its first argument of type "a" and "s" as its second argument of type "b", and would then expect a third argument, namely a list over "a", which here would be a list of functions. So, here the brackets are used to override the default bracketing given by the right-associativity of "->".
Another point is the definition of a recursive function using pattern matching. Alternatively, we could have written
map :: (a -> b) -> [a] -> [b]
map f xs
  | length xs == 0 = []
  | otherwise = f x : map f s
  where
    x = head xs
    s = tail xs
It would have been more elegant to write "xs == []", but testing the
equality of two lists is performed by comparing the elements, and the
elements of a polymorphic type cannot necessarily be compared.
When the first definition of "map" is called with a list "xs" as actual parameter, it is checked which of the equations fits. In this case no conflicts are possible: a list is either empty, or it can be written as a first element plus some rest. Because of the type of the operator ":", it is clear to the interpreter that xs must be split into head and tail, and not in any other way.
Therefore the following is not supported:
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (s1 ++ s2) = (map f s1) ++ (map f s2)
This would give an interesting but undesirable ambiguity: the list could be cut anywhere (the possibilities even include splitting it into an empty list and the whole list, which would result in an infinite recursion).
A simple application of mapping is in the following:
map (2 *) [1, 2, 3] ----> [2, 4, 6]
Along this line we might even define our own function "doubleAll"
doubleAll :: Num a => [a] -> [a]
doubleAll = map (2 *)
Here the higher-order function map is Curried with a function, which itself is an operator section, and is thereby turned into a function mapping lists of a numerical type into lists of the same type.
filter :: (a -> Bool) -> [a] -> [a]
filter p [] = []
filter p (x : s)
  | p x = x : filter p s
  | otherwise = filter p s
It is now possible to give very concise formulations like the following:
(//) :: Int -> Int -> Bool
(//) x y = x `rem` y == 0
multiples :: Int -> Int -> [Int]
multiples i n = filter ( // i) [0 .. n]
But of course for a list of integers "s" one can also use the following one-liner to filter out all numbers >= 10:
filter ( < 10) s
A list comprehension is a shorthand for zero or more filters plus an application of map. Consider the following:
myComp :: [Int] -> [Int]
myComp s = [2 * i | i <- s, i // 3]
This can be rewritten as a composition of the function "filter with ( // 3)" and the function "map with (2 * )":
myFilterMap :: [Int] -> [Int] myFilterMap = map (2 * ) . filter ( // 3)
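That the two formulations agree can be checked on a few inputs. The definitions below are those from the text, collected into one loadable piece:

```haskell
-- "x // y" holds when x is divisible by y.
(//) :: Int -> Int -> Bool
(//) x y = x `rem` y == 0

-- List comprehension: double every element divisible by 3.
myComp :: [Int] -> [Int]
myComp s = [2 * i | i <- s, i // 3]

-- The same as a composition: first filter, then map.
myFilterMap :: [Int] -> [Int]
myFilterMap = map (2 *) . filter (// 3)
```

For example, both myComp [1 .. 10] and myFilterMap [1 .. 10] give [6, 12, 18].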
Filter and map are more general, but list comprehensions are clearer. So, if the desired functionality can be obtained with a comprehension, then this is the preferred way of writing it.
takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p [] = []
takeWhile p (x : s)
  | p x = x : takeWhile p s
  | otherwise = []
dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p [] = []
dropWhile p xs@(x : s)
  | p x = dropWhile p s
  | otherwise = xs
A combination of takeWhile and dropWhile can be used to process a text in a very simple but slightly inefficient way. Assume that we want to decompose a text, presented as a single long string, into words. The words are separated by well-defined symbols such as blanks. This task is performed by the following script:
endMarker :: Char -> Bool
endMarker c = c == ' '
takeWord :: String -> String
takeWord = takeWhile (not . endMarker)
dropWord :: String -> String
dropWord = dropWhile (not . endMarker)
dropSeps :: String -> String
dropSeps [] = []
dropSeps (x : s)
  | endMarker x = dropSeps s
  | otherwise = x : s
getWords :: String -> [String]
getWords s
  | x == [] = []
  | otherwise = (takeWord x) : (getWords (dropWord x))
  where
    x = dropSeps s
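To experiment with the script, the definitions can be loaded together, as below; only the layout is restored. As a comment notes, dropSeps behaves like the Prelude expression dropWhile endMarker. For text in which blanks are the only separators, the Prelude function words gives the same result as getWords, which serves as a convenient cross-check. The test sentence is a made-up example.

```haskell
endMarker :: Char -> Bool
endMarker c = c == ' '

takeWord :: String -> String
takeWord = takeWhile (not . endMarker)

dropWord :: String -> String
dropWord = dropWhile (not . endMarker)

-- Skips leading separators; equivalent to dropWhile endMarker.
dropSeps :: String -> String
dropSeps [] = []
dropSeps (x : s)
  | endMarker x = dropSeps s
  | otherwise = x : s

getWords :: String -> [String]
getWords s
  | x == [] = []
  | otherwise = (takeWord x) : (getWords (dropWord x))
  where
    x = dropSeps s
```

For example, getWords "  the quick  brown fox " gives ["the", "quick", "brown", "fox"].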
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr op e [] = e
foldr op e (x : s) = x `op` foldr op e s
Here the operator is denoted "`op`", the neutral element of the operator is denoted by "e" and the third argument is the list of elements of type "a".
For example, one can now type
foldr (+) 0 [0 .. 10] ----> 55
Other examples are given by functions like "sum", "product" and "or":
sum :: [Int] -> Int
sum = foldr (+) 0
product :: [Int] -> Int
product = foldr (*) 1
or :: [Bool] -> Bool
or = foldr (||) False
For non-associative operators it makes sense to also consider folding from the other end. The formulation is slightly less elegant:
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl op e [] = e
foldl op e (x : s) = foldl op (e `op` x) s
Here the evaluation is performed starting from the left, accumulating the result in the value that is returned once the list is empty.
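For a non-associative operator such as subtraction the two folds really give different results. A minimal sketch using the Prelude versions of foldr and foldl; the names rightDiff and leftDiff are hypothetical.

```haskell
-- foldr brackets from the right: 1 - (2 - (3 - 0)) == 2
rightDiff :: Int
rightDiff = foldr (-) 0 [1, 2, 3]

-- foldl brackets from the left: ((0 - 1) - 2) - 3 == -6
leftDiff :: Int
leftDiff = foldl (-) 0 [1, 2, 3]
```

For an associative operator with neutral element, such as (+) with 0, both folds give the same result.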
until :: (a -> Bool) -> (a -> a) -> a -> a
until p f x
  | p x = x
  | otherwise = until p f (f x)
Here "p" is a predicate for type "a": it maps values of this type to Booleans. "f" is the function which is to be executed repeatedly, and "x" is the starting value. The execution stops as soon as the current value satisfies the condition given by "p", and this value is returned.
This definition of "until" is by far not as versatile as a for-loop in C, but it can be used for operations such as finding the smallest power of two exceeding a given number:
until (> 1000) (2 *) 1
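The same idea can be packaged as a small function. The name smallestPowerAbove is hypothetical; the guard rejects bases below 2, because for b == 0 or b == 1 the loop would never reach a value above m (when m >= 1), and negative bases are excluded for simplicity.

```haskell
-- Hypothetical helper: the smallest power of b that exceeds m, found by
-- repeatedly multiplying by b, starting from 1 (that is, b to the power 0).
smallestPowerAbove :: Integer -> Integer -> Integer
smallestPowerAbove b m
  | b < 2     = error "Base must be at least 2"
  | otherwise = until (> m) (b *) 1
```

For example, smallestPowerAbove 2 1000 gives 1024, the same as the expression above, and smallestPowerAbove 10 1000 gives 10000.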
Here we must be more careful than before: a priori there is no guarantee that an until-loop terminates. This was different for "map", "filter" and "foldr": there the recursion was over the length of the list, which strictly decreases with every call and eventually becomes zero.
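Both situations can be seen in a small sketch with the Prelude's "until" (the names powerAbove and collatzSteps are hypothetical). The first loop terminates because the value keeps growing until it exceeds n; for the second, the Collatz iteration, nobody has proven that it terminates for every starting value.

```haskell
-- Smallest power of two strictly larger than n (for n small enough that
-- doubling does not overflow Int); the value grows, so until terminates.
powerAbove :: Int -> Int
powerAbove n = until (> n) (2 *) 1

-- Number of Collatz steps from x > 0 down to 1. The state is a pair
-- (current value, steps so far). Whether this until-loop terminates for
-- every x is the open Collatz conjecture.
collatzSteps :: Int -> Int
collatzSteps x = snd (until ((== 1) . fst) step (x, 0))
  where
    step (v, k)
      | even v    = (v `div` 2, k + 1)
      | otherwise = (3 * v + 1, k + 1)
```

For example, powerAbove 1000 gives 1024, and collatzSteps 6 gives 8 (6, 3, 10, 5, 16, 8, 4, 2, 1).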
Of course, in finite time it is not possible to compute an infinite number of results. Fortunately, the Haskell interpreter prints each part of an answer as soon as it is known. Once we have seen sufficiently many results, the computation can be interrupted with Control-C.
Even lazy evaluation does not mean that one can work with infinite lists in a careless way: applying functions like "sum" or "length" to an infinite list would require the computer to compute all values before it could produce the result. The interpreter does not detect such programming errors; it simply executes the task, which takes infinite time.
We are already familiar with a basic kind of lazy evaluation: a Boolean expression is evaluated from left to right, and as soon as it is certain whether the result is true or false, the remainder of the expression is not evaluated. C guarantees this short-circuit evaluation for "&&" and "||", but not every language specification does (standard Pascal, for example, leaves it to the implementation). In such a language, if we write the equivalent of the C loop "for (i = 0; i < 10 && bestValue(a[i], &x, y); i++)", we cannot be sure whether "bestValue" is executed for "i == 10" or not. If it is executed, this may lead to array-bound errors. If it is not executed, "x", which may be changed by the action of "bestValue", may end up with an unexpected value. Programming errors of this kind are very hard to find.
Fortunately, in a purely functional language such as Haskell there is no such thing as side effects of function calls: actual parameters are constants within the scope of their functions, and they do not change as a result of calls to functions with them. Therefore, the order of the function calls is irrelevant for the computed value and for correctness. It may, however, have an impact on the time consumption, because lazy evaluation may mean that not all of them have to be executed.
The operator "&&" is defined as
(&&) :: Bool -> Bool -> Bool
False && y = False
True && y = y
This means that when calling "x && y", for "x == False", the result does not depend on the value of "y", and therefore "y" is not computed.
With a different definition both actual parameters must be known in order to perform the pattern matching. In that case not even lazy evaluation can help us:
(&&&) :: Bool -> Bool -> Bool
False &&& False = False
False &&& True = False
True &&& False = False
True &&& True = True
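The difference can be observed with "undefined", a value whose evaluation raises an exception. A small sketch (the name lazyResult is hypothetical):

```haskell
-- The Prelude's (&&) pattern matches only on its first argument, so the
-- second one is never evaluated when the first is False.
lazyResult :: Bool
lazyResult = False && undefined

-- The strict variant from the text: every equation inspects both arguments.
(&&&) :: Bool -> Bool -> Bool
False &&& False = False
False &&& True  = False
True  &&& False = False
True  &&& True  = True

-- Evaluating "False &&& undefined" would raise an exception, because
-- matching the second pattern forces the undefined argument.
```

Here lazyResult evaluates to False without ever touching "undefined".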
The definition of equality of lists is (except for details of the typing) as follows:
(==) :: [a] -> [a] -> Bool
[] == [] = True
[] == (y : t) = False
(x : s) == [] = False
(x : s) == (y : t) = x == y && s == t
The important thing is that "==" is defined in terms of "&&". Because of the way "&&" is defined, lazy evaluation implies that as soon as the first argument evaluates to False, the second argument is not evaluated. For "==" on lists this implies that the comparison stops as soon as the first non-matching pair of values is encountered.
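This early stop can be made visible by comparing a finite list with an infinite one: the comparison terminates as soon as the two lists differ. A small sketch with hypothetical names:

```haskell
-- (==) on lists stops at the first difference, so comparing a finite list
-- with an infinite one terminates as soon as the two differ.
finiteVsInfinite :: Bool
finiteVsInfinite = [1, 2, 3] == [1 :: Int ..]   -- False once [] meets 4 : ...

mismatchEarly :: Bool
mismatchEarly = [2, 3] == [2 :: Int, 4 ..]       -- False at the second element
```

Comparing two equal infinite lists, such as "[1 ..] == [1 ..]", would of course never terminate.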
This observation of the operation of "==" on lists is important. It implies that often comparisons are much cheaper than they appear. Consider the following:
divisors :: Int -> [Int]
divisors n = [i | i <- [1 .. n], n `rem` i == 0]
isPrime :: Int -> Bool
isPrime n = divisors n == [1, n]
How far is "divisors" evaluated? Only as far as needed to decide whether the result equals "[1, n]", that is, only as long as the two lists, starting from the beginning, are pairwise identical. As soon as the first non-trivial divisor is found, it is compared with "n" and the conclusion is that the lists are not the same. So the value of "isPrime n" is known, and the rest of the list of divisors is left unevaluated. Clearly it is even better to test only the numbers "[2 .. floor (sqrt (fromIntegral n))]" and to compare the resulting list of divisors with the empty list.
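The improvement mentioned last can be sketched as follows. The name isPrimeFast is hypothetical, and so is the convention that numbers below 2 are not prime; the floating-point square root is accurate enough for moderate values of n.

```haskell
-- Only candidate divisors 2 .. floor (sqrt n) are tried, and the list of
-- divisors found must be empty. Numbers below 2 are treated as non-prime.
isPrimeFast :: Int -> Bool
isPrimeFast n = n > 1 && null [i | i <- [2 .. isqrt n], n `rem` i == 0]
  where
    isqrt :: Int -> Int
    isqrt m = floor (sqrt (fromIntegral m :: Double))
```

Because "null" only needs the first element of the list comprehension, the search again stops at the first divisor found.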
[n .. ]
The following function also gives an infinite list:
iterate :: (a -> a) -> a -> [a]
iterate f x = x : iterate f (f x)
Calling "iterate (+ 1) 3" gives the same list as "[n .. ]" above for n == 3.
Simpler infinite lists are obtained by "repeat", which simply gives the same result again and again:
repeat :: a -> [a]
repeat x = x : repeat x
The simplest polymorphic function is the identity function:
id :: a -> a
id x = x
Partially parametrizing "iterate" with "id" gives "repeat", so the higher-order function "iterate" is strictly more general than "repeat".
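Since only a finite prefix of these infinite lists is ever demanded, they can safely be inspected with "take". A small sketch (the names fromThree and sevens are hypothetical):

```haskell
-- Only a finite prefix of an infinite list is ever demanded here.
fromThree :: [Int]
fromThree = take 5 (iterate (+ 1) 3)   -- the prefix [3, 4, 5, 6, 7] of [3 ..]

-- iterate id x produces x, id x, id (id x), ..., which is repeat x.
sevens :: [Int]
sevens = take 4 (iterate id 7)
```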
map (3^) [0 .. n]
In this particular case this is not very efficient, and the number n can indeed be computed. But if we want to determine, more generally, all values f(i) such that f(i) < m for all 0 <= i <= n and f(n + 1) >= m, then we do not know in advance which n to take.
This problem can be solved in Haskell with the methods we have encountered before, but the following is much easier:
takeWhile ( < m) (map f [0 .. ])
If in a call like "takeWhile ( < 1000) (map (3^) [0 ..])" all powers of three were computed first, this would never lead to a result. Instead, the whole process is directed by "takeWhile": it repeatedly asks for a new value from the list. This value is produced by applying "(3^)" to the next element of "[0 ..]". This is repeated until the first value that is not smaller than 1000 is produced.
We consider an alternative variant. Above we were generating all values so that f(i) < m for all 0 <= i <= n and f(n + 1) >= m. We can also consider the values so that f(f( ... f(0))) < m. This is a typical example where one may use "iterate":
takeWhile ( < m) (iterate f 0)
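The values 1, 3, 9, 27, ... can be interpreted as the sequence of function values f(0), f(1), f(2), ... for the function f = (3^), but just as well as the iterates 1, f(1), f(f(1)), ... for the function f = (3 * ), now starting from 1 instead of 0 (starting from 0 would only give zeros). This latter interpretation allows us to generate the sequence more efficiently, with a single multiplication per element instead of an exponentiation: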
takeWhile ( < m) (iterate (3 * ) 1)
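Both interpretations can be put side by side in a small sketch (the names powersViaMap and powersViaIterate are hypothetical); for every bound m they produce the same list.

```haskell
-- Powers of three below m, first as function values 3^0, 3^1, 3^2, ...
powersViaMap :: Int -> [Int]
powersViaMap m = takeWhile (< m) (map (3 ^) [0 :: Int ..])

-- ... and then as iterates 1, 3 * 1, 3 * (3 * 1), ... starting from 1
powersViaIterate :: Int -> [Int]
powersViaIterate m = takeWhile (< m) (iterate (3 *) 1)
```

For m = 1000 both give [1, 3, 9, 27, 81, 243, 729].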
The first task is easy (actually this is also achieved by the function "digitChar" from the prelude):
digitToChar :: Int -> Char
digitToChar = chr . (ord '0' +)
Here we see an operator section of "+" composed with the function "chr" (in current Haskell systems, "chr" and "ord" are imported from the module Data.Char). The function does not work correctly when the argument is not a digit: other numbers result either in nonsense or in an error.
For converting an entire positive number, we can proceed as follows:
digitsToString :: Int -> String
digitsToString n
  | n == 0 = []
  | n > 0  = digitToChar x : digitsToString y
  where
    x = n `rem` 10
    y = n `div` 10
A minor problem is that the digits now come out in reverse order, and we should also deal with the case n == 0 in a special way, because "digitsToString 0" gives the empty string:
intToString :: Int -> String
intToString n
  | n == 0    = "0"
  | otherwise = reverse (digitsToString n)
There is nothing wrong with the above method, but it does not feel very `functional': it stays very close to a conventional C implementation, with a barely hidden loop. Furthermore, we once again programmed the recursion ourselves, instead of using the mechanisms that are provided. This is as if we programmed a for loop in C with an if and a goto.
We now describe a more sophisticated alternative which only uses functions from the prelude. A number is viewed as having an infinite number of leading zeros. This interpretation allows us to apply "iterate (`div` 10)". Remember: "iterate" gives an unconditional repetition. Applied to 64582, it generates [64582, 6458, 645, 64, 6, 0, 0, ...]. This infinite computation can be made finite by combining it with "takeWhile (/= 0)": as soon as the first number with value 0 is encountered in the list, the list is truncated. Because of lazy evaluation, this means that "iterate" is not asked to provide any further numbers. The remaining task is easy: from each of the provided numbers we take the last digit and convert it to a character. These characters are glued together into a string, which is reversed to get the answer. Only for the value 0 would this give the empty string, so that case is handled separately:
intToString :: Int -> String
intToString 0 = "0"
intToString n = (reverse . map (digitToChar . (`rem` 10))
                 . takeWhile (/= 0) . iterate (`div` 10)) n
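A self-contained sketch of this script, which can be checked against the Prelude function "show". "chr" and "ord" are imported from Data.Char (which, incidentally, also provides "intToDigit" for single digits). The case for negative numbers is an extension that is not part of the text.

```haskell
import Data.Char (chr, ord)

digitToChar :: Int -> Char
digitToChar = chr . (ord '0' +)

-- Digits are produced from the least significant one upwards and then
-- reversed. The n < 0 case is an extension not present in the text; it
-- does not work for minBound, whose negation overflows.
intToString :: Int -> String
intToString 0 = "0"
intToString n
  | n < 0     = '-' : intToString (negate n)
  | otherwise = (reverse . map (digitToChar . (`rem` 10))
                 . takeWhile (/= 0) . iterate (`div` 10)) n
```

For example, intToString 64582 gives "64582" and intToString (-305) gives "-305".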
Here we discuss an even simpler variant. In an implementation with lists it is quite an efficient prime number generating method, but it is incomparably slower than the method working with an array of Booleans: the time for generating all primes up to a certain number n is almost quadratic in n. The infiniteness is here pushed quite far: we do not work with just one infinite list, but with infinitely many infinite lists. Of course none of these is entirely traversed.
The first list contains all numbers larger than 1. Its first element, 2, is a prime number. In the second list we take out all multiples of 2; these are certainly not prime. The remaining list consists of all odd numbers larger than 1. Its first element, 3, is a prime number. Taking out all multiples of 3 gives a list consisting of all numbers larger than 1 that are multiples of neither 2 nor 3. Its first element, 5, is a prime number again.
More generally, the construction goes as follows:
Let us denote the i-th prime number by p_i. Thus p_0 = 2, p_1 = 3, ... . We claim that with the above construction the first element of list i is just p_i, for all i >= 0. Notice how important the precise definitions are for clearly formulating a claim like this. It is proven by complete induction. The case i == 0 forms the basis of the induction: the first element of list 0 is 2, which is indeed p_0. Now assume the claim holds for all j <= i. Which elements do we find in list i + 1? This list is obtained in i + 1 steps from list 0, each step removing the first element of the current list together with all its multiples. Because of the induction assumption we know that these first elements are the numbers p_0, ..., p_i. Consider p_{i + 1}, the (i + 1)-st prime number. Because p_{i + 1} >= 2, it is an element of list 0. Because p_{i + 1} is not a multiple of any other prime number, it is not filtered out, and therefore it is also an element of list i + 1. It remains to show that p_{i + 1} is the first element of list i + 1. Consider any number q with 2 <= q < p_{i + 1}. Like any number, q can be written as a product of prime factors. Because q < p_{i + 1}, none of these factors is p_{i + 1} or a larger prime number. Thus the smallest prime factor of q is p_j for some j <= i. But this implies that q was removed from the lists in step j: either as the first element (if q == p_j) or when the multiples of p_j were filtered out.
The above gives a formal proof that taking the first elements of all generated lists precisely gives us the set of prime numbers, ordered from small to large. It remains to turn this idea into Haskell. Notice that in the proof, even though all involved lists are infinite, this played no role. Just in the same way, the infiniteness will not bother an interpreter with lazy evaluation. On the other hand, without lazy evaluation, it would not be possible to directly translate the mathematical idea into computer code.
The iterate construction produces an infinite list of results by applying a function again and again. Until now the function was exactly the same in all applications: in the first example we used (3 * ), in the second (`div` 10). "iterate" allows us to pass only one function, but its functionality can be tuned with a parameter. That is what we will do to construct the filtered lists:
multiple :: Int -> Int -> Bool
multiple x y = y `rem` x == 0
sieve :: [Int] -> [Int]
sieve (x : s) = filter (not . multiple x) s
makeLists :: [[Int]]
makeLists = iterate sieve [2 .. ]
primeNumbers :: [Int]
primeNumbers = map head makeLists
Here we encounter some new aspects of Haskell. The functions "makeLists" and "primeNumbers" do not have any parameters; out of nothing they generate a list of lists of Ints and a list of Ints, respectively. The function "map" takes two arguments: a function and a list. In our case the list is actually a list of lists. We see that "makeLists" can be treated just like any other variable of type [[Int]].
In the definition of "sieve" we see that in "not . multiple x" the partial application "multiple x" does not need to be enclosed in brackets. The reason is that function application binds more tightly than any infix operator, even "." with its fixity 9, so the expression is parsed as "not . (multiple x)". The sieve function is parametrized with the first element of the list to which it is applied; this first element is removed. Notice that "sieve" is only defined for non-empty lists: for an empty list the pattern matching will fail.
For this last reason, it is better not to define a top-level function "sieve", but to make it local: this ensures that it will never be called in an inappropriate way. Observing that there is no need to define "makeLists" explicitly, we get the following:
primeNumbers :: [Int]
primeNumbers = map head (iterate sieve [2 .. ])
  where
    sieve (x : s) = filter (not . multiple x) s
      where
        multiple x y = y `rem` x == 0
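The infinite list of primes can be consumed with "take" or "takeWhile"; only the demanded first elements of the infinitely many lists are ever computed. A sketch reusing the script above unchanged, with a hypothetical helper primesBelow:

```haskell
-- The sieve from the text, unchanged.
primeNumbers :: [Int]
primeNumbers = map head (iterate sieve [2 ..])
  where
    sieve (x : s) = filter (not . multiple x) s
      where
        multiple x y = y `rem` x == 0

-- Primes below a bound: takeWhile stops demanding further list heads.
primesBelow :: Int -> [Int]
primesBelow m = takeWhile (< m) primeNumbers
```

For example, take 10 primeNumbers gives [2, 3, 5, 7, 11, 13, 17, 19, 23, 29].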
Here we omitted most of the type information. In general there are three good reasons for providing type information:
It may happen that calling the same function twice with the same arguments does not result in the same number of reductions and cells. The reason is that even the procedure which performs the syntactical analysis of the script (upon loading it, or any time after leaving the editor) is lazy. This means that if a constant is defined by an expression, its value is not computed until it is needed for the first time. When the function is called again, this value is still available. This happens when making two calls to "f" in the following script:
k = 3 + 4
f x = k + x
For the computation of the Fibonacci numbers we first computed fib(n) by calling fib(n - 1) and fib(n - 2). The result was a time consumption that is exponential in n. The alternative method, based on making steps with a pair of numbers, had a time consumption which was linear in n. The difference is huge: with the first method n == 30 is about the largest value for which fib(n) can be computed, with the second method there is almost no limit on n.
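The two methods can be contrasted in an independent sketch. It assumes the convention fib(0) = 0 and fib(1) = 1; the names slowFib and fastFib are hypothetical, and Integer is used because the values quickly exceed the range of Int.

```haskell
-- Exponential: each call spawns two further calls.
slowFib :: Int -> Integer
slowFib 0 = 0
slowFib 1 = 1
slowFib n = slowFib (n - 1) + slowFib (n - 2)

-- Linear: step the pair (fib i, fib (i + 1)) forward n times (n >= 0).
-- seq forces each new sum, so no chain of unevaluated additions builds up.
fastFib :: Int -> Integer
fastFib n = go n 0 1
  where
    go :: Int -> Integer -> Integer -> Integer
    go 0 a _ = a
    go k a b = let c = a + b in c `seq` go (k - 1) b c
```

Both agree on small arguments, but only fastFib answers instantly for arguments such as 50 (giving 12586269025) or much larger.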
For the computation of prime numbers we have seen how to reduce the number of computations from quadratic (dividing any number j <= n by all numbers < j) to something much smaller. Here we encountered the problem that Haskell lists do not provide random access. This implies that the discussed sieving method (a variant of which is the subject of one of the exercises) has to traverse the lists step by step. In a language like C we would use an array, which would be traversed with ever increasing steps.
Lists are implemented as linked lists (see the example in the chapter on Java). This means that insertions can be made easily at the beginning of the list, but that it is costly to access an element at a specific position or to add something at the end of a list. This is the reason that the preferred way of adding elements to a list is with help of the operator ":". Independently of the length of the list "s", it takes only constant time to compute "x : s".
How should one concatenate two or more lists? Suppose we have lists called "s1", "s2", "s3" and "s4". Their lengths are l1, l2, l3 and l4. If we perform ((s1 ++ s2) ++ s3) ++ s4, then the cost for finding the end of the first list in each ++-clause equals l1 + (l1 + l2) + (l1 + l2 + l3) = 3 * l1 + 2 * l2 + l3. If on the other hand we perform s1 ++ (s2 ++ (s3 ++ s4)), the cost is only l3 + l2 + l1. So, it matters how "++" and similar operations are performed. Realizing this, the Haskell designers have chosen to make "++" right-associative.
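The effect of the grouping shows up when a whole list of lists is concatenated with a fold. A sketch with hypothetical names: both functions give the same result, but only the right-grouped one traverses each list just once.

```haskell
-- foldr builds s1 ++ (s2 ++ (s3 ++ [])): each list is traversed once.
concatRight :: [[a]] -> [a]
concatRight = foldr (++) []

-- foldl builds ((([] ++ s1) ++ s2) ++ s3): the growing left operand is
-- traversed again by every further (++).
concatLeft :: [[a]] -> [a]
concatLeft = foldl (++) []
```

The Haskell report defines the Prelude function "concat" in the same way as concatRight.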
The essence of a law is that it holds independently of its context, that is, the values of a, b, c, x, y and z in the formulation of the laws do not matter: the law always holds. The reason to formulate laws is that they allow us to reason with them. This can be used when proving the equivalence of two more complicated logical expressions.
As an example we give a slightly shortened version of the proof of the law "(a + b) * (a - b) == a * a - b * b":
(a + b) * (a - b)
  =distributivity= a * a + b * a - a * b - b * b
  =commutativity of *= a * a + a * b - a * b - b * b
  =distributivity= a * a + a * (b - b) - b * b
  =definition of -= a * a + a * 0 - b * b
  =0 annihilates *= a * a + 0 - b * b
  =0 is neutral element of += a * a - b * b
Once we have proven this new law we can use it whenever we encounter a pattern "(a + b) * (a - b)". Here "a" and "b" are not necessarily numbers; they can themselves be numerical expressions.
The main reason to apply laws is to simplify expressions, whatever simplification exactly means. In one context one should rather write 13 / 20, in another context 0.65, which by definition is nothing but 65 / 100. Typically one wants to reduce the number of symbols or the size of the involved numbers, but the goal may also be to come up with an equivalent formulation which can be evaluated more easily.
f . (g . h) == (f . g) . h                 -- associativity of .
map f . (x :) == ((f x) :) . map f
map (f . g) == map f . map g               -- map distributes over .
map f (s1 ++ s2) == map f s1 ++ map f s2   -- map f distributes over ++
map f . concat == concat . map (map f)
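Before attempting a proof, a law can be spot-checked on sample inputs. A minimal sketch; the concrete functions and lists are arbitrary choices for illustration, and passing such checks is evidence, not a proof.

```haskell
-- Spot checks of the five laws on fixed sample values. The laws must
-- hold for all inputs; these checks only test a few of them.
lawsHold :: Bool
lawsHold = and
  [ (f . (g . h)) x == ((f . g) . h) x
  , (map f . (y :)) s == (((f y) :) . map f) s
  , map (f . g) s == (map f . map g) s
  , map f (s ++ t) == map f s ++ map f t
  , (map f . concat) ss == (concat . map (map f)) ss
  ]
  where
    f, g, h :: Int -> Int
    f = (* 2)
    g = (+ 3)
    h = subtract 1
    x, y :: Int
    x = 10
    y = 7
    s, t :: [Int]
    s = [1, 2, 3]
    t = [4, 5]
    ss :: [[Int]]
    ss = [[1], [], [2, 3]]
```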
Just as with logical and mathematical laws, these Haskell laws can be used to prove more complex functional equalities. The purpose of this is to be able to perform program transformation: a piece of code is transformed stepwise into another piece of code which is provably equivalent. As in the mathematical and logical context, the reason one may want to do this is to `simplify' the code, which in a programming context typically means obtaining a program that can be executed more efficiently by the computer. The great advantage of laws is that they can be applied automatically, so program optimization can be performed by the computer. Even in imperative languages one can perform program transformations, but there one always has to take care of unexpected side effects. Because the value of a function does not depend on a context, and because computing a function does not change any other values, we do not have this problem when applying laws in a functional language.
In order to start proving, we need some foundations. We will use that by definition "f == g" means that for all x from the domain over which f and g are defined "f x == g x". Also we use that the operator ".", which gives functional composition, is defined as "(f . g) x == f (g x)". And, of course, we will need the definitions of the involved functions such as "map", "++" and "concat".
We prove the law "f . (g . h) == (f . g) . h", by showing that for any x from the domain of h we have "(f . (g . h)) x == ((f . g) . h) x":
(f . (g . h)) x
  =def .= f ((g . h) x)
  =def .= f (g (h x))
  =def .= (f . g) (h x)
  =def .= ((f . g) . h) x
The law "map f . (x :) == ((f x) :) . map f" is proven by showing that for any argument s, which clearly must be a list of the type of x, we have "(map f . (x :)) s == (((f x) :) . map f) s":
(map f . (x :)) s
  =def .= map f ((x :) s)
  =def of operator sections= map f (x : s)
  =def of map= (f x) : (map f s)
  =def of operator sections= ((f x) :) (map f s)
  =def .= (((f x) :) . map f) s
The above proofs were straightforward. It is slightly harder to prove laws on functions with a recursive definition. In that case we should give a proof by complete induction, and distinguish several cases which together cover all possible inputs. Mostly there are two cases, one basis case and the general case, but sometimes it is necessary to distinguish more than two cases. Because of the inductive assumption, when proving the general case, we may assume that the law holds for shorter lists or smaller numbers.
We prove that "map (f . g) == map f . map g". This law tells us that there is no need to traverse the list twice: we can directly apply the composed function. First we show that "(map (f . g)) [] == (map f . map g) []", and then that "(map (f . g)) (x : s) == (map f . map g) (x : s)" under the assumption that "(map (f . g)) s == (map f . map g) s".
(map (f . g)) []
  =def map= []
  =def map= map f []
  =def map= map f (map g [])
  =def .= (map f . map g) []
(map (f . g)) (x : s)
  =def map= (f . g) x : map (f . g) s
  =induction assumption= (f . g) x : (map f . map g) s
  =def .= f (g x) : map f (map g s)
  =def map= map f (g x : map g s)
  =def map= map f (map g (x : s))
  =def .= (map f . map g) (x : s)
We prove that "map f (s1 ++ s2) == map f s1 ++ map f s2" for all lists s1 and s2 of appropriate type. First the case "s1 == []" is checked, then the case "s1 == (x : s)", assuming that the law holds for "s1 == s":
map f ([] ++ s2)
  =def ++= map f s2
  =def ++= [] ++ map f s2
  =def map= map f [] ++ map f s2
map f ((x : s) ++ s2)
  =def ++= map f (x : (s ++ s2))
  =def map= f x : map f (s ++ s2)
  =induction assumption= f x : (map f s ++ map f s2)
  =def ++= (f x : map f s) ++ map f s2
  =def map= map f (x : s) ++ map f s2
The inductive proof that "map f . concat == concat . map (map f)" goes analogously: first the empty list is checked, and then we prove that "(map f . concat) (x : s) == (concat . map (map f)) (x : s)", under the assumption that this holds for "s". This law, too, can be used to reduce the number of list traversals by one. It is a direct generalization of the previous law to the case of a list of lists:
(map f . concat) []
  =def .= map f (concat [])
  =def concat= map f []
  =def map= []
  =def concat= concat []
  =def map= concat (map (map f) [])
  =def .= (concat . map (map f)) []
(map f . concat) (x : s)
  =def .= map f (concat (x : s))
  =def concat= map f (x ++ concat s)
  =distribution of map f over ++= map f x ++ map f (concat s)
  =def .= map f x ++ (map f . concat) s
  =induction assumption= map f x ++ (concat . map (map f)) s
  =def .= map f x ++ concat (map (map f) s)
  =def concat= concat (map f x : map (map f) s)
  =def map= concat (map (map f) (x : s))
  =def .= (concat . map (map f)) (x : s)
In the above proof we use the distributive law we have proven before. The more laws we know, the shorter the proofs we can give. Proofs with several cases can often also be shortened by performing the case distinction only where it is necessary, and performing transformations which hold for all inputs only once. We would have saved some lines had we realized that for all xs (empty or not), "(map f . concat) xs =def .= map f (concat xs)" and that "(concat . map (map f)) xs =def .= concat (map (map f) xs)". Then by induction it only remains to prove that "map f (concat []) == concat (map (map f) [])" and that "map f (concat (x : s)) == concat (map (map f) (x : s))" under the assumption that "map f (concat s) == concat (map (map f) s)".
When trying to construct a proof like those given above, it may not always be easy to stay on the path leading from the function at the beginning to the function at the end. Often it helps to start from both ends and to try to meet in the middle (in inductive proofs this may be the point where the inductive assumption can be applied).
The functions nand, and and or are binary. Construct corresponding operators, which should be denoted !&&, &&& and |||, respectively. Make them left-associative and give them a sensible fixity.
sortedFour :: Int -> Int -> Int -> Int -> Bool
which returns True if and only if the four specified values
stand in sorted order. That is, they must be (weakly)
increasing.
differentFour :: Int -> Int -> Int -> Int -> Bool
which returns True if and only if all four specified values
are different. Use guards and minimize the number of comparisons
(6 comparisons is enough).
numDifFour :: Int -> Int -> Int -> Int -> Int
which returns a decent error message if the numbers are not sorted
(you may use the function sortedFour from above for testing this)
and otherwise returns the number of different values among the
four arguments.
charToNum :: Char -> Int
Which converts a character which represents a digit to the
corresponding digit. So, '5' is converted to 5. If the character
is not a digit an error message should be produced.
Give two variants of the function: the first uses guards to distinguish all cases, the second uses the functions "ord" and "chr".
Now create a function
stringToNum :: String -> Int
which converts a string consisting of digits only to an integer.
This function should use charToNum and it should also be possible
to enter negative numbers.
rangeProduct :: Int -> Int -> Int
For arguments low and hgh it is defined as follows: if low > hgh,
the returned value is 0. Otherwise the range product is given by
low * (low + 1) * ... * hgh.
Give a definition of the function "fac", computing the factorial, in terms of the function "rangeProduct".
Using the prelude function "rem", which computes the remainder of the division of its first argument by its second, define a recursive function "gcd" which for any pair of positive (that is, larger than 0) integers computes their greatest common divisor. In particular, your script should also handle the case x < y.
stringReplicate :: Int -> String -> String
For an integer n and string s, it creates a string with n copies of
s after each other. So,
stringReplicate 3 "ape" == "apeapeape"
pshRght :: Int -> String -> String
addRght :: Int -> String -> String
addLeft :: Int -> String -> String
pshLeft :: Int -> String -> String
"pshRght n s" takes string s and adds n times a blank at the front
of s. So, "pshRght 3 "ape"" gives "   ape". Notice that this is
not done by adding n blanks in one stroke: "pshRght" is based on
the operator ":". "addRght n s" does the same, but adds n times
" " using "++". "addLeft" is similar to "addRght", but adds the
strings " " at the end using "++". In order to define "pshLeft",
first define an operator ":::" adding an element to a list at the
end of the list. It should be left-associative and have fixity 5.
Now define "pshLeft" analogously to "pshRght".
Determine for each of the functions the smallest n so that it takes more than a minute to execute them.
Initially all positions are set to true except for position 0 and 1. Then the array is traversed from the small indices to the large. When at position i we encounter true, we know that i is a prime number. All multiples of i are no prime numbers, so their values must be set to false.
In an imperative language, using a for loop, an if and a second for loop, this program can be written in 20 minutes. It can be downloaded here. Now turn this idea into a Haskell script: define a function
eratosthenes :: Int -> [Int]
which for an integer value n computes the list of all prime
numbers up to n.
cat(1) = 1,
cat(n) = sum_{i = 1}^{n - 1} cat(i) * cat(n - i), for all n > 1.
Give a simple 3 line function computing the Catalan numbers. Hint: use a list comprehension to simulate the sum.
How much is cat(15)? What is the problem? Let T(i) denote the time for computing cat(i). Prove that if T(1) == 1, then T(n) >= 2^{n - 1} for all n >= 1. Hint: use induction. Actually the time consumption is even worse, and therefore you can make quite coarse estimates.
A major improvement can be obtained by not computing the same numbers again and again. This can be achieved by first computing cat(i), for all i < n and packing these somehow in a list. This list is used when computing cat(n). Hint: the list can be accessed with help of the function "!!". How long does it take now to compute cat(15)?
Let coin(n, l) be the minimum number of coins needed to pay an amount n using only the coins with values v_0, ..., v_l. Then, it is not hard to see that
coin(n, l) = infinity, if n < 0
coin(n, l) = 0, if n == 0
coin(n, l) = min{coin(n, l - 1), 1 + coin(n - v_l, l)}, if n > 0 and l >= 0
coin(n, -1) = infinity, if n > 0 (no coins may be used)
Let minCoin(n) be the minimum number of coins needed to pay an amount n using any of the coins. Clearly the largest potentially useful coin is the largest coin with value not exceeding n. Write a script for evaluating the function minCoin according to the given specification. How much is minCoin(400)?
The inefficiency of the program comes from the fact that certain values are recomputed again and again. The worst are the values coin(n, 0) and coin(n, 1). Now define a function fastMinCoin, which computes the same values in a more efficient way: any value should be evaluated only once and kept in appropriate lists.
Hint: compute the values `row by row', that is, first compute all values coin(n, 0), then the values coin(n, 1). Of course, you may just as well work `column by column', first computing all values coin(0, l), then all values coin(1, l). It may help to first consider an imperative implementation to better understand the task to solve. An implementation in C can be downloaded here.
The whole domain dealing with these two questions is called automata theory or language theory. In the chapter on grammars we have already seen how many patterns can be generated, for example all strings with equally many a's and b's. Parsing is one way of recognizing strings. In this chapter a more limited machinery is considered.
The above way of progressing from state to state is thought to be performed by a device called a state machine. Of course this process can also be described graphically. In the graph, the states correspond to the nodes, and the edges give the possible transitions. Labels next to the edges indicate the conditions under which the transitions are made. The initial state is called the starting state; a final state is called an accepting state. The accepting states will be drawn with double circles. Such a graph is called a finite automaton or just automaton. The default assumption is that the state machine stops upon reaching an accepting state. In that case it may be thought to output "yes"; if it stops without reaching an accepting state, it may be thought to output "no".
The important point with finite state machines is that they can be translated in a mechanical way into a piece of code. One can write a procedure for each state. In each procedure the next character is scanned and the program continues with one of the specified alternatives, selected depending on the value of the character. An essential feature of these finite state automata is that they traverse the string from left to right only once. The program, which can be downloaded here, recognizes "tomorrow" as a scattered substring (its letters must occur in this order, possibly with other characters in between) and looks as follows:
#include <stdio.h>
#define TRUE 1
#define FALSE 0
typedef char boolean;
/* One function per state. Each skips characters until the expected letter
   is read and then continues in the next state. c is an int, not a char,
   because getc returns an int so that EOF differs from every character. */
boolean first_w(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 'w');
  if (c == EOF) return FALSE;
  return TRUE; }
boolean third_o(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 'o');
  if (c == EOF) return FALSE;
  return first_w(input); }
boolean second_r(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 'r');
  if (c == EOF) return FALSE;
  return third_o(input); }
boolean first_r(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 'r');
  if (c == EOF) return FALSE;
  return second_r(input); }
boolean second_o(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 'o');
  if (c == EOF) return FALSE;
  return first_r(input); }
boolean first_m(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 'm');
  if (c == EOF) return FALSE;
  return second_o(input); }
boolean first_o(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 'o');
  if (c == EOF) return FALSE;
  return first_m(input); }
boolean first_t(FILE* input) {
  int c;
  while ((c = getc(input)) != EOF && c != 't');
  if (c == EOF) return FALSE;
  return first_o(input); }
boolean accept_string(FILE* input) {
  return first_t(input); }
int main() {
  FILE* input = fopen("input", "r");
  if (input == NULL) {
    printf("Cannot open the file \"input\"\n");
    return 1; }
  if (accept_string(input))
    printf("Tomorrow we are starting\n");
  else
    printf("We still have to wait\n");
  fclose(input); return 0; }
The program is rather long, but trivial. With the help of a routine find_char the text could be shortened, but the given variant stays closer to the operation of the finite state machine. This could have been done even more explicitly by replacing the while loops by further recursive calls of the function itself. The reason we have not done this is that very deep recursion may lead to a stack overflow. The given program has a bounded recursion depth (one nested call per state), independently of the length of the input text.
Turning the action of a finite state machine into a program is a rare example of a context in which the use of "goto" is defensible. The reason why one normally should not use goto is that it makes it hard to trace the execution. Here this is no problem: the history is irrelevant; the only two things that matter are the current state and the remaining string to process. The labels of the states should be used as labels in the program.
With goto, the while loops are not needed either: a state can simply jump to itself, and, unlike a subroutine call, a jump does not accumulate any data on the stack. So, for the simple machines considered here, goto allows us to stay most closely to the operation of the finite state machine. The alternative program, which can be downloaded here, looks as follows:
#include <stdio.h>
int main(void) {
    FILE* input = fopen("input", "r");
    int c; /* int, not char: getc may return EOF */
    if (input == NULL) { printf("Cannot open the file input\n"); return 1; }
first_t:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 't') goto first_o;
    else goto first_t;
first_o:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 'o') goto first_m;
    else goto first_o;
first_m:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 'm') goto second_o;
    else goto first_m;
second_o:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 'o') goto first_r;
    else goto second_o;
first_r:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 'r') goto second_r;
    else goto first_r;
second_r:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 'r') goto third_o;
    else goto second_r;
third_o:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 'o') goto first_w;
    else goto third_o;
first_w:
    if ((c = getc(input)) == EOF) goto reject;
    else if (c == 'w') goto accept;
    else goto first_w;
accept:
    printf("Tomorrow we are starting\n"); goto stop;
reject:
    printf("We still have to wait\n"); goto stop;
stop:
    fclose(input); return 0; }
The machine has four states:
- 0: zero after a zero
- 1: one after a zero
- 2: one after a one
- 3: zero after a one
In this case the point of the processing is not so much to accept or reject a string, but rather to traverse it and perform some specified actions depending on the state of the machine. Therefore, the machine does not need to have accepting states. Alternatively one might add one accepting state to which the automaton transits at the end of the processed string.
The transitions follow from the descriptions of the states. For example, if the machine is in state 0, it transits to state 1 when the next bit is a one; otherwise it stays in state 0. In states 0 and 1 the machine outputs a zero, in states 2 and 3 it outputs a one.
The given machine has four states for two output values. These extra states give the machine a memory of its most recent history. In general, a finite state machine can be given a finite memory in this way. However, if the input is not just binary as in this example, the number of states needed may grow fast.
We mentioned that an important feature of finite automata is that they can be translated into code in a mechanical way. At least as important is that they can be realized at a much lower level, directly in hardware. The given speckle suppressor can be realized by a small number of gates, which switch depending on the next bit: a zero gives a pulse on input channel zero, a one on input channel one. This means that each step can be performed by a small constant number of gate switches, by a tiny circuit. Together, these two factors make such switching several orders of magnitude faster than handling the signals on a general-purpose processor, in which signals have to travel through the whole chip and where a single operation at the level of C involves many operations at the level of the gates.
As an example we consider the string 01101001101101. If the four-state automaton for suppressing isolated 0's and 1's is applied to this string, then the sequence of states visited after each character is 01232301232232. So, 01101001101101 is a label (in this case it is unique) of the path 01232301232232. The output is 00111100111111.
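The four-state speckle suppressor can be sketched in C as a transition table plus an output table. This is a minimal sketch, not the author's program: the names suppress, delta and out are hypothetical, the input is assumed to consist of '0' and '1' characters only, and the transition from state 1 on a 0 (back to state 0), which the example string never exercises, is inferred from the state descriptions.

```c
#include <string.h>

/* delta[q][bit]: next state of the speckle suppressor; out[q]: output bit. */
static const int delta[4][2] = {
    {0, 1},  /* 0: zero after a zero */
    {0, 2},  /* 1: one after a zero (assumed: a following 0 returns to 0) */
    {3, 2},  /* 2: one after a one */
    {0, 2},  /* 3: zero after a one */
};
static const char out[4] = {'0', '0', '1', '1'};

/* Processes the bit string in, writing the state reached after each
   character to states[] and the emitted bit to output[]. Both buffers
   must hold strlen(in) + 1 characters. */
void suppress(const char* in, char* states, char* output) {
    int q = 0;                      /* starting state */
    size_t n = strlen(in);
    for (size_t k = 0; k < n; k++) {
        q = delta[q][in[k] - '0'];
        states[k] = (char)('0' + q);
        output[k] = out[q];
    }
    states[n] = '\0';
    output[n] = '\0';
}
```

Called on 01101001101101, it reproduces the state sequence 01232301232232 and the output 00111100111111 given above.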
However, there is no requirement that automata be constructed like this. In other words, the same character may appear in the label lists of several outgoing edges of the same state. If an automaton is in some state and the next symbol is x, while x occurs as a label on more than one outgoing edge, then the automaton may proceed over any of these edges. Such automata are called non-deterministic automata. Without further specification, one should assume that an automaton is non-deterministic.
For a deterministic automaton, one path might be labeled with many different strings, but for any string the path it labels is unique. On a non-deterministic automaton, a string may label many different paths. The most important point is that these paths need not all end in the same kind of state: some may end in an accepting state and others in a rejecting state. The convention is to say that a non-deterministic automaton accepts a string if at least one of the paths ends in an accepting state.
A non-deterministic automaton can be viewed as a process in which guessing is allowed at certain stages. Not only is guessing allowed, it is even assumed that the process always guesses right. So, it is not correct to replace a non-deterministic automaton by a deterministic one by simply fixing one of the alternatives and excluding the others: this may reduce the set of accepted strings, because the alternative needed to reach an accepting state might be eliminated.
Clearly it transits to the accepting state, state 4, only if the digits 0100 occur consecutively: if any wrong bit is encountered, the automaton transits to the starting state, state 0, again. But does it accept all strings which have 0100 as a consecutive substring? The answer is no. For example, 00100 and 010100 are rejected, although both contain 0100. The reason is that a partial match which fails may still end with a non-empty prefix of the string we are looking for (in 00100, the second 0 is the start of 0100), and returning to state 0 discards this prefix.
In this simple case, it is not hard to repair this mistake with an alternative deterministic automaton:
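Such a repaired automaton can also be written as a table. In this sketch (the names next_state and contains_0100 are hypothetical, and the numbering may differ from the figure) state k means that the longest suffix of the input read so far which is also a prefix of 0100 has length k. The input is assumed to consist of '0' and '1' only, and the automaton halts as soon as it reaches the accepting state 4.

```c
/* next_state[k][bit]: state k = length of the longest suffix of the input
   read so far that is a prefix of 0100. State 4 (accepting) needs no row. */
static const int next_state[4][2] = {
    {1, 0},  /* matched ""    : 0 -> "0",    1 -> ""   */
    {1, 2},  /* matched "0"   : 0 -> "0",    1 -> "01" */
    {3, 0},  /* matched "01"  : 0 -> "010",  1 -> ""   */
    {4, 2},  /* matched "010" : 0 -> "0100", 1 -> "01" */
};

/* Returns 1 if the 0/1 string w contains 0100 as a consecutive substring. */
int contains_0100(const char* w) {
    int q = 0;
    for (; *w != '\0'; w++) {
        q = next_state[q][*w - '0'];
        if (q == 4)
            return 1;   /* halt in the accepting state */
    }
    return 0;
}
```

Unlike the naive automaton, this one accepts 00100 and 010100: after a failed partial match it falls back to the longest prefix of 0100 that is still alive instead of to state 0.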
If an automaton is in a certain state and the next symbol does not occur in the list of labels of any of the outgoing edges of this state, this branch of the evaluation dies. This convention is equivalent to an implicit transition, labeled with all symbols except those listed on the other outgoing edges, to a rejecting state without exit.
This convention allows us to simplify the automaton:
The semantics are now very simple: the automaton can wait any number of characters looping in state 0, before starting to run, detecting the string 0100 if it occurs. If one wants to use this automaton for actually testing a string, the non-determinism in state 0 when the next bit is 0 should be interpreted as a point where two alternative continuations are to be considered: the search is branching. In other words, the search proceeds along a tree structure instead of a simple path. For the complexity of this search process, it is essential that the branching is limited to the minimum, which is achieved by letting superfluous branches die.
For a non-deterministic automaton we must proceed along several paths. The following gives the complete simulation of the non-deterministic automaton for detecting the substring 0100:
Above we have seen that turning deterministic automata into programs is trivial. For non-deterministic machines this is harder: even though we assume that the automaton always guesses right, we cannot assume that our program does. So, it is not sufficient to just follow one path and reject the string if no accepting state is reached. This implies that somehow we must keep track of all alive paths.
A first idea is to keep track of the state and the reached position in the string of each alive path. So, our data base might look like (2, 12), (1, 20), (3, 18), meaning that there is one alive path which has reached state 2 after reading 12 characters, one with state 1 and character 20 and one with state 3 and character 18. This requires that a position of the string may be accessed several times, but practically this is no problem: the string can be loaded into an array or the command fseek() can be used.
Alternatively, all alive paths can be pushed forward in a synchronized way as follows: the computation is divided into supersteps. In each superstep the next character is read and for all alive paths the corresponding transition is made. In each superstep paths may die and new paths may be spawned.
The synchronized processing has the disadvantage that the expansion of a promising path is delayed by the others. However, the advantages are tremendous: if the string is infinite (for example the string of digits of the number pi), any given branch can remain unsuccessful forever, even though others might reach an accepting state. Synchronized expansion ensures that any path of finite length is traversed in finite time.
Another great advantage of synchronized expansion is that if two paths reach the same state, we do not have to continue with both of them: if they are in the same state at the same input character, then either both reach an accepting state, or neither does. Because we are speaking about finite state machines, this means that in any given superstep there are at most s alive paths to expand, where s is the number of states. Hence simulating the non-deterministic machine this way is slower by at most a constant factor than the best we could do: guess right at every branch and run towards the accepting state at full speed.
Because the synchronized expansion ensures that all paths have reached the same position in the string, there is no need to store this position along with the states: it suffices to store the current set of states, which is a subset of all states. Because the automaton is finite, this requires only a constant amount of memory.
Here we touch on a great point of finite automata: if a problem can be formulated in terms of acceptance by a finite automaton (deterministic or non-deterministic) at all, then the problem can be solved by a simple program in a time that is proportional to the size of the input requiring only constant memory. Of course this only holds for the kind of automata we were considering so far: they are traversing the input string only once.
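The synchronized simulation can be sketched in C with an array b[] of bits, b[i] == 1 meaning that state i is alive. The sketch assumes the non-deterministic automaton for 0100 (state 0 loops on both bits; 0, 1, 0, 0 lead on to the accepting state 4) and halts as soon as the accepting state is alive; the names superstep and simulate are hypothetical. Whatever the length of the input, the memory used is the fixed array b[].

```c
#include <stddef.h>
#include <string.h>

#define NSTATES 5  /* states 0..4; state 4 is accepting */

/* Assumed edge list of the non-deterministic automaton for 0100. */
static const struct { int src; char label; int dst; } edges[] = {
    {0, '0', 0}, {0, '1', 0}, {0, '0', 1},
    {1, '1', 2}, {2, '0', 3}, {3, '0', 4},
};
#define NEDGES (sizeof edges / sizeof edges[0])

/* One superstep: every alive path reads character c. Paths whose state
   has no matching edge die; paths reaching the same state merge. */
static void superstep(char b[NSTATES], char c) {
    char next[NSTATES] = {0};
    for (size_t e = 0; e < NEDGES; e++)
        if (b[edges[e].src] == 1 && edges[e].label == c)
            next[edges[e].dst] = 1;
    memcpy(b, next, NSTATES);
}

/* Synchronized simulation over w, leaving the current set of states in b.
   Returns 1 as soon as the accepting state 4 is alive, 0 otherwise. */
int simulate(const char* w, char b[NSTATES]) {
    memset(b, 0, NSTATES);
    b[0] = 1;  /* only the starting state is alive initially */
    for (; *w != '\0'; w++) {
        superstep(b, *w);
        if (b[4] == 1)
            return 1;
    }
    return 0;
}
```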
This example makes clear that non-determinism makes it easy to formulate automata for string matching problems of the kind "look for occurrences of string S_1 followed by S_2 or S_3 followed by S_4 and S_5". This explains why string matching is one of the most important application areas of automata: a request can be translated by a simple program into an automaton, which subsequently can be turned into a piece of code. Not only can this be done, the resulting code is even efficient.
{ 0} --0->
{ 1, 0} --1->
{ 2, 0} --0->
{ 3, 1, 0} --1->
{ 2, 0} --0->
{ 3, 1, 0} --0->
{4, 1, 0} --1->
{ 2, 0} --0->
{ 3, 1, 0} --0->
{4, 1, 0}
Because there are finitely many states, it is practical to manage the current set of states with an array b[] of bits, b[i] == 1 meaning that state i is in the set and b[i] == 0 meaning that it is not. With this convention, the array b[] develops as follows:
(0, 0, 0, 0, 1) --0-> (0, 0, 0, 1, 1) --1-> (0, 0, 1, 0, 1) --0-> (0, 1, 0, 1, 1) --1-> (0, 0, 1, 0, 1) --0-> (0, 1, 0, 1, 1) --0-> (1, 0, 0, 1, 1) --1-> (0, 0, 1, 0, 1) --0-> (0, 1, 0, 1, 1) --0-> (1, 0, 0, 1, 1)
Graphically these transitions can be indicated as follows:
{ 0} --0->
{ 1, 0} --1->
{ 3, 2, 0} --0->
{ 5, 4, 1, 0} --1->
{ 5, 3, 2, 0} --1->
{ 5, 2, 0} --0->
{6, 4, 1, 0} --1->
{ 5, 3, 2, 0} --0->
{6, 5, 4, 1, 0}
Because there are finitely many states, it is practical to manage the current set of states with an array b[] of bits, b[i] == 1 meaning that state i is in the set and b[i] == 0 meaning that it is not. With this convention, the array b[] develops as follows:
(0, 0, 0, 0, 0, 0, 1) --0-> (0, 0, 0, 0, 0, 1, 1) --1-> (0, 0, 0, 1, 1, 0, 1) --0-> (0, 1, 1, 0, 0, 1, 1) --1-> (0, 1, 0, 1, 1, 0, 1) --1-> (0, 1, 0, 0, 1, 0, 1) --0-> (1, 0, 1, 0, 0, 1, 1) --1-> (0, 1, 0, 1, 1, 0, 1) --0-> (1, 1, 1, 0, 0, 1, 1)
Graphically these transitions can be indicated as follows:
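#include <string.h>
#define MAX_STATES 64
/* Hypothetical representation of the non-deterministic automaton: edge e
   leads from state src[e] to state dst[e] and carries the characters of
   the string labels[e]. */
typedef struct {
    int n_edges;
    const int* src;
    const int* dst;
    const char* const* labels;
} automaton;
/* Replaces the current set of states b[0..s-1] (b[i] == 1: state i is in
   the set) by the set reached after reading character c; s <= MAX_STATES. */
void transition(const automaton* a, char* b, int s, char c) {
    char b_new[MAX_STATES] = {0};
    for (int i = 0; i < s; i++)
        if (b[i] == 1)
            for (int e = 0; e < a->n_edges; e++)  /* edges leaving state i */
                if (a->src[e] == i && c != '\0' && strchr(a->labels[e], c) != NULL)
                    b_new[a->dst[e]] = 1;
    memcpy(b, b_new, (size_t)s); }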
This procedure is quite good: s is finite and there are only finitely many transitions from each state, so it runs in constant time (at worst the time is proportional to s^2). However, one can do much better than this. How many different bit-vectors of length s are there? 2^s. For large s, S = 2^s is a large number, but for constant s it is finite. These vectors are in a trivial one-one correspondence with the numbers 0, 1, ..., S - 1.
For any vector b, the resulting vector b_c after encountering a character c is defined by the procedure transition. This is independent of the history: any time the current states are described by b and the input character is c, the next set of states is described by b_c. So, we might just as well precompute all these: for an alphabet of size r and an automaton of size s, we create r arrays of size S = 2^s: b_c[x], 0 <= c < r, 0 <= x < S, indicates the result when applying procedure transition to the vector corresponding to the number x for character c.
The constructed arrays b_c[] define a finite automaton: there are S states, and the transition from state x, 0 <= x < S, upon encountering character c, 0 <= c < r, is given by b_c[x]. Because for any input character a unique transition is specified, this is a deterministic automaton.
We still have to fix the starting state and the final states of the new automaton. If the starting state of the non-deterministic automaton is state i, then the starting state of this deterministic automaton is state x, with x = 2^i. If state i is an accepting state of the non-deterministic automaton, then any state x of the deterministic automaton with the i-th bit of x equal to 1 is an accepting state of the deterministic automaton.
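The precomputation of the arrays b_c[] can be sketched in C, again for the non-deterministic automaton detecting 0100 (state 0 loops on both bits; 0, 1, 0, 0 lead on to the accepting state 4). Sets of states are stored as bitmasks, bit i standing for state i, so the number x of the text is simply the mask. The names prim, build_table, run and is_accepting are hypothetical; this is a sketch of the construction, not a tuned implementation.

```c
#include <stdint.h>

#define NFA_STATES 5                    /* s */
#define DFA_STATES (1u << NFA_STATES)   /* S = 2^s */
#define ALPHABET 2                      /* r: characters '0' and '1' */
#define START (1u << 0)                 /* x = 2^i for starting state i = 0 */
#define ACCEPT_BIT (1u << 4)            /* state 4 is accepting */

/* prim[i][c]: bitmask of the states reachable from state i with character c. */
static const uint32_t prim[NFA_STATES][ALPHABET] = {
    {0x03, 0x01},  /* 0: 0 -> {0,1}, 1 -> {0} */
    {0x00, 0x04},  /* 1: 1 -> {2} */
    {0x08, 0x00},  /* 2: 0 -> {3} */
    {0x10, 0x00},  /* 3: 0 -> {4} */
    {0x00, 0x00},  /* 4: no outgoing transitions */
};

static uint32_t b_c[ALPHABET][DFA_STATES];  /* b_c[c][x] as in the text */

/* Applies "transition" once to every one of the S subsets x, for every c. */
void build_table(void) {
    for (int c = 0; c < ALPHABET; c++)
        for (uint32_t x = 0; x < DFA_STATES; x++) {
            uint32_t y = 0;
            for (int i = 0; i < NFA_STATES; i++)
                if (x & (1u << i))
                    y |= prim[i][c];
            b_c[c][x] = y;
        }
}

/* Runs the deterministic automaton over a 0/1 string and returns the final
   state x. build_table() must have been called first. */
uint32_t run(const char* w) {
    uint32_t x = START;
    for (; *w != '\0'; w++)
        x = b_c[*w - '0'][x];
    return x;
}

int is_accepting(uint32_t x) { return (x & ACCEPT_BIT) != 0; }
```

For example, run("010100100") yields 0x13, the bitvector (1, 0, 0, 1, 1) at the end of the trace given earlier. With s = 5 the table has 2 * 32 entries, most of which are never used; the worklist approach described further down avoids computing unreachable states.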
The construction given above, which yields an equivalent deterministic automaton for a given non-deterministic automaton, is known in the literature as the subset construction.
Because the constructed deterministic automaton does nothing more than offer an efficient implementation of a simulation of the non-deterministic automaton, and because the starting and accepting states were defined sensibly, this equivalence comes as no surprise. Nevertheless, we now check it formally.
The main step is proving the following claim:
For any t, some state i of the non-deterministic automaton is an element of the set of states reached after processing t characters if and only if, after processing t characters, the constructed automaton reaches a state x whose i-th bit equals 1.
Once this claim is proven, we are done, because then it follows that the non-deterministic automaton reaches an accepting state i after t characters if and only if the deterministic automaton reaches a state x with i-th bit equal to 1, and such a state was defined to be accepting.
The claim is proven by complete induction over the number t of processed characters: for t == 0, it is true because of the definition of the starting state of the deterministic automaton: before processing any characters, the set of reached states of the non-deterministic automaton only consists of the starting state i. The starting state of the deterministic automaton has only bit i equal to 1.
It remains to show that, assuming that the claim holds after processing t characters, the claim holds after processing t + 1 characters. Let c be character t + 1. Denote the states of the deterministic automaton before and after processing c by x and x', respectively.
Assume that state j is an element of the set of states reached by the non-deterministic automaton after processing t + 1 characters. It is only reached if there is a transition e, with c in its list of labels, from some state i that was reached after processing t characters to state j. Because i was reached after processing t characters, the induction hypothesis gives that bit i of state x equals 1. But then the above procedure transition, which defines the transitions of the deterministic automaton, sets bit j of x' to 1 as well.
For the other direction, assume that bit j of x' equals 1. This only happens when there is some bit i of x which is equal to 1 such that there is a transition e from state i to state j of the non-deterministic automaton which contains c in its list of labels. By the induction hypothesis, if bit i of x equals 1, then state i is among the states reached after processing t characters, and thus state j is among the states reached after processing t + 1 characters.
The answer is yes. Even though the non-deterministic automaton can be simulated quite efficiently, by an algorithm taking time quadratic in the number of states for every character of the input, this simulation requires a procedure which cannot easily be expressed in a small automaton itself. So, from the perspective of the automaton this is like external magic.
As automata are also supposed to operate as embedded systems, built as a small piece of hardware, this is not what we want: non-determinism does not lead to the straightforward flow of operations we are used to, and which can be handled by the hardware as a flow of signals going from one device to the next. This is the practical reason why non-determinism is undesirable. From a more theoretical point of view, in many domains it is an important question whether deterministic approaches are as powerful as non-deterministic ones. In the case of finite automata, the given construction shows that the answer is affirmative.
The reason to work with non-deterministic automata at all is that they often make it much easier to achieve the desired functionality than deterministic automata do. Just consider the example of the non-deterministic automaton for detecting 01*0 or 1010. The given construction then turns this non-deterministic automaton into an equivalent deterministic one. So, a deterministic automaton for a given task can often most easily be obtained by first constructing a non-deterministic one.
The major disadvantage of the construction is the tremendous increase in the number of states. However, in practice it turns out that this number often can be strongly reduced. Further down we will present a method for reducing the number of states to the minimum possible. We will see that for our example problems, the required number of states in the deterministic automaton is not so much larger than the number in the non-deterministic one.
It is immediately clear that any reachable state (b_4, ..., b_1, b_0) has b_0 == 1. This can easily be proven by complete induction over the number of processed characters. Proving this in full detail is a nice exercise in proving facts about the deterministic automaton by exploiting knowledge about the non-deterministic automaton and the way the deterministic automaton is constructed.
Denote by B_t the state of the deterministic automaton reached after processing t characters. B_0 == (0, 0, 0, 0, 1), which clearly has b_0 == 1, so the claim that b_0 == 1 for all t holds for t == 0. Now assume that b_0 == 1 in B_t, for some t >= 0. Then we should show that b_0 == 1 even in B_{t + 1}. B_{t + 1} depends on B_t and character t + 1. Let B_{t + 1, 0} be the resulting state when this character is 0, and B_{t + 1, 1} when it is 1.
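Here f(i, c) denotes the set of states of the non-deterministic automaton reached from state i with character c. Because b_0 == 1 in B_t, the set f(0, 0) is part of the union defining B_{t + 1, 0}, and because state 0 has a loop labeled 0,
0 element_of f(0, 0) subset_of union_{i | b_i == 1 in B_t} f(i, 0)
Thus, the definition of the transitions in the deterministic automaton gives that b_0 == 1 in B_{t + 1, 0}. Analogously, the following implies that b_0 == 1 in B_{t + 1, 1} as well:
0 element_of f(0, 1) subset_of union_{i | b_i == 1 in B_t} f(i, 1)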
So, given that b_0 == 1 in B_t, b_0 == 1 also in B_{t + 1}, whatever the value of character t + 1 is. This completes the proof by complete induction: we have been checking both the basis and the step.
One should work systematically in order not to forget any reachable state. A good idea is to write every newly discovered state on a list and to take as the next state to work out the one at the end or at the beginning of the list, removing it from the list. Initially only the starting state is on the list; as soon as the list is empty, we are done.
Computing the transitions from the states of the deterministic automaton, given by bitvectors, can be performed by directly executing the above procedure "transition". However, it is easier to first compute the transitions from the primitive bitvectors: for a non-deterministic automaton with s states, the bitvectors with a single 1 at position i, 0 <= i < s, are denoted by p_i and called primitive bitvectors.
Denote the state that is reached when transiting from p_i with character c by p_{i, c}. p_{i, c} has a 1 in all positions that correspond to states of the non-deterministic automaton that are reachable from state i by a transition with label c and 0's in all other positions.
When transiting with character c from a state of the deterministic automaton given by bitvector (b_{s - 1}, ..., b_1, b_0), a state is reached with bitvector Or_{0 <= i < s| b_i == 1} p_{i, c}. Here the or operation is performed bitwise: for two bitvectors u and v, w == u or v has a 1 precisely there where u or v have a 1.
(0, 0, 0, 0, 1) --0-> (0, 0, 0, 1, 1)
--1-> (0, 0, 0, 0, 1)
(0, 0, 0, 1, 0) --0-> (0, 0, 0, 0, 0)
--1-> (0, 0, 1, 0, 0)
(0, 0, 1, 0, 0) --0-> (0, 1, 0, 0, 0)
--1-> (0, 0, 0, 0, 0)
(0, 1, 0, 0, 0) --0-> (1, 0, 0, 0, 0)
--1-> (0, 0, 0, 0, 0)
(1, 0, 0, 0, 0) --0-> (0, 0, 0, 0, 0)
--1-> (0, 0, 0, 0, 0)
Here --0-> denotes the transition with label 0, taken when the next
character equals 0, and --1-> denotes the transition with label 1.
We can now easily compute all states which are reachable from the starting state (0, 0, 0, 0, 1) of the deterministic automaton by computing the appropriate bitwise ors of these vectors:
-- Distance 0 From Starting State --
(0, 0, 0, 0, 1) --0-> (0, 0, 0, 1, 1)
--1-> (0, 0, 0, 0, 1)
-- Distance 1 From Starting State --
(0, 0, 0, 1, 1) --0-> (0, 0, 0, 1, 1)
--1-> (0, 0, 1, 0, 1)
-- Distance 2 From Starting State --
(0, 0, 1, 0, 1) --0-> (0, 1, 0, 1, 1)
--1-> (0, 0, 0, 0, 1)
-- Distance 3 From Starting State --
(0, 1, 0, 1, 1) --0-> (1, 0, 0, 1, 1)
--1-> (0, 0, 1, 0, 1)
-- Distance 4 From Starting State --
(1, 0, 0, 1, 1) --0-> (0, 0, 0, 1, 1)
--1-> (0, 0, 1, 0, 1)
As an example we consider how the transition from (0, 1, 0, 1, 1) for character 0 is computed. There are 1's at positions 0, 1 and 3. So, we must compute the bitwise or of p_{0, 0}, p_{1, 0} and p_{3, 0}. That gives (0, 0, 0, 1, 1) or (0, 0, 0, 0, 0) or (1, 0, 0, 0, 0) == (1, 0, 0, 1, 1).
In this very simple example, there was always just one state on the list of states to work out, but this is an exceptional case. We were also lucky that the number of resulting states is so small: in general this number may lie close to 2^s for a non-deterministic automaton with s states. Here the number of states did not grow at all: the deterministic automaton has exactly as many states, five, as the non-deterministic one.
The resulting automaton is (except for the transitions from the accepting state) identical to the earlier given deterministic automaton for this problem. Of course in the final automaton the states can be renumbered contiguously starting from 0: the numbers of the states have no external meaning, and for processing purposes it is handy if they are as small as possible.
For the primitive states we have:
(0, 0, 0, 0, 0, 0, 1) --0-> (0, 0, 0, 0, 0, 1, 1)
--1-> (0, 0, 0, 0, 1, 0, 1)
(0, 0, 0, 0, 0, 1, 0) --0-> (0, 0, 0, 0, 0, 0, 0)
--1-> (0, 0, 0, 1, 0, 0, 0)
(0, 0, 0, 0, 1, 0, 0) --0-> (0, 0, 1, 0, 0, 0, 0)
--1-> (0, 0, 0, 0, 0, 0, 0)
(0, 0, 0, 1, 0, 0, 0) --0-> (0, 1, 0, 0, 0, 0, 0)
--1-> (0, 1, 0, 0, 0, 0, 0)
(0, 0, 1, 0, 0, 0, 0) --0-> (0, 0, 0, 0, 0, 0, 0)
--1-> (0, 1, 0, 0, 0, 0, 0)
(0, 1, 0, 0, 0, 0, 0) --0-> (1, 0, 0, 0, 0, 0, 0)
--1-> (0, 0, 0, 0, 0, 0, 0)
(1, 0, 0, 0, 0, 0, 0) --0-> (0, 0, 0, 0, 0, 0, 0)
--1-> (0, 0, 0, 0, 0, 0, 0)
From these, the following set of reachable states is easily determined. To facilitate the construction of the picture given hereafter, and to check that it is drawn correctly, we have already indicated the indices to which the states of the deterministic machine are mapped.
-- Distance 0 From Starting State --
0 ~ (0, 0, 0, 0, 0, 0, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
--1-> (0, 0, 0, 0, 1, 0, 1) ~ 2
-- Distance 1 From Starting State --
1 ~ (0, 0, 0, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
--1-> (0, 0, 0, 1, 1, 0, 1) ~ 3
2 ~ (0, 0, 0, 0, 1, 0, 1) --0-> (0, 0, 1, 0, 0, 1, 1) ~ 4
--1-> (0, 0, 0, 0, 1, 0, 1) ~ 2
-- Distance 2 From Starting State --
3 ~ (0, 0, 0, 1, 1, 0, 1) --0-> (0, 1, 1, 0, 0, 1, 1) ~ 5
--1-> (0, 1, 0, 0, 1, 0, 1) ~ 6
4 ~ (0, 0, 1, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
--1-> (0, 1, 0, 1, 1, 0, 1) ~ 7
-- Distance 3 From Starting State --
5 ~ (0, 1, 1, 0, 0, 1, 1) --0-> (1, 0, 0, 0, 0, 1, 1) ~ 8
--1-> (0, 1, 0, 1, 1, 0, 1) ~ 7
6 ~ (0, 1, 0, 0, 1, 0, 1) --0-> (1, 0, 1, 0, 0, 1, 1) ~ 9
--1-> (0, 0, 0, 0, 1, 0, 1) ~ 2
7 ~ (0, 1, 0, 1, 1, 0, 1) --0-> (1, 1, 1, 0, 0, 1, 1) ~ A
--1-> (0, 1, 0, 0, 1, 0, 1) ~ 6
-- Distance 4 From Starting State --
8 ~ (1, 0, 0, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
--1-> (0, 0, 0, 1, 1, 0, 1) ~ 3
9 ~ (1, 0, 1, 0, 0, 1, 1) --0-> (0, 0, 0, 0, 0, 1, 1) ~ 1
--1-> (0, 1, 0, 1, 1, 0, 1) ~ 7
A ~ (1, 1, 1, 0, 0, 1, 1) --0-> (1, 0, 0, 0, 0, 1, 1) ~ 8
--1-> (0, 1, 0, 1, 1, 0, 1) ~ 7
Even here we are very lucky: out of the potentially 2^7 = 128 states, only 11 are reachable from the starting state. Three of these states are accepting states. So, the resulting deterministic automaton remains relatively simple. At the same time, it appears that it would have been far from easy to find this deterministic automaton, including all transitions, without the presented general technique for turning non-deterministic automata into deterministic ones.
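The worklist procedure can be sketched in C for this seven-state example. Sets are bitmasks (bit i standing for state i), the primitive transitions are copied from the table of primitive states above, and the list doubles as the queue, so states are worked out in order of discovery. The names prim, step and reachable are hypothetical.

```c
#include <stdint.h>

#define S7 7          /* states of the non-deterministic automaton */
#define MAX_DFA 128   /* at most 2^7 subsets */

/* Primitive transitions p_{i,c} of the example as bitmasks. */
static const uint8_t prim[S7][2] = {
    {0x03, 0x05},  /* 0: 0 -> {0,1}, 1 -> {0,2} */
    {0x00, 0x08},  /* 1: 1 -> {3} */
    {0x10, 0x00},  /* 2: 0 -> {4} */
    {0x20, 0x20},  /* 3: 0, 1 -> {5} */
    {0x00, 0x20},  /* 4: 1 -> {5} */
    {0x40, 0x00},  /* 5: 0 -> {6} */
    {0x00, 0x00},  /* 6: accepting, no outgoing transitions */
};

/* Transition of the deterministic automaton: bitwise or of p_{i,c}
   over all positions i where x has a 1. */
static uint8_t step(uint8_t x, int c) {
    uint8_t y = 0;
    for (int i = 0; i < S7; i++)
        if (x & (1u << i))
            y |= prim[i][c];
    return y;
}

/* Fills list[] with all states reachable from {0}, in order of discovery,
   and returns their number. list[next..n-1] are the states still to do. */
int reachable(uint8_t list[MAX_DFA]) {
    int seen[MAX_DFA] = {0};
    int n = 0, next = 0;
    list[n++] = 0x01;
    seen[0x01] = 1;
    while (next < n) {
        uint8_t x = list[next++];
        for (int c = 0; c < 2; c++) {
            uint8_t y = step(x, c);
            if (!seen[y]) {
                seen[y] = 1;
                list[n++] = y;
            }
        }
    }
    return n;
}
```

Processing character 0 before character 1, the order of discovery coincides with the numbering 0, ..., A used above, and reachable() returns 11; the three masks with bit 6 set are the accepting states 8, 9 and A.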
If the only purpose is to determine whether a specified pattern occurs or not, all accepting states can be fused into a single accepting state without outgoing transitions. Only if one wants to use the automaton to detect all occurrences matching the pattern must these accepting states and their outgoing transitions be preserved. This only works when, contrary to our default assumption, we assume that the automaton does not halt when reaching an accepting state, but rather produces some output (further down we will see that we should then rather speak of a Moore automaton).
If one is going to fuse the accepting states anyway, then there is no need to generate all of them in the first place: any state of the deterministic automaton of the form (1, *, ..., *) is an accepting state, and there is no need to consider its outgoing transitions. This may save considerable work during the construction.
So, the goal is to find among all equivalent automata (in the sense that they accept the same strings) the one which has the fewest states. We are lucky: in the case of deterministic automata, there is a unique minimum-state automaton within any class of equivalent automata, and it can be found quite easily.
The idea is to consider which states are equivalent. Two states x and y are said to be equivalent if for any legal input string the automaton reaches an accepting state starting from x if and only if it reaches an accepting state starting from y. If x and y are equivalent in this sense, then there is no need to keep both of them. One of them, for example y, can be eliminated: all transitions leading to y are replaced by transitions leading to x, and all transitions leading out of y are simply deleted.
Notice that the claimed unique and easy-to-construct minimal automaton is guaranteed to exist only for deterministic automata. In the example further down, we will see equivalent non-deterministic automata. Even though they have different sizes, they do not have any equivalent states, so they cannot be reduced.
The idea is now to initially construct two subsets: the accepting states and the non-accepting states. Then we repeatedly look for a subset S containing states whose transitions labeled with the same character c lead to states in different subsets. S is split accordingly. The process stops once, for every subset S, all transitions with the same label from the states in S lead to states in the same subset.
The proof that any two states which are classified as non-equivalent according to the above criteria are indeed not equivalent can be performed by induction over the number of performed splitting operations. After performing zero splitting operations, there are two subsets: the accepting states and the non-accepting states. These are not equivalent, because starting in an accepting state, an accepting state is reached when the input string is empty, while from a non-accepting state the empty string does not lead to an accepting state. Now assume that states A and B belong to the same subset of states until some step t and that they belong to different subsets after step t. By induction we may assume that any two states which were split into different sets before step t are indeed not equivalent. A and B are split into different sets in step t only when the procedure has found a character c for which there is a transition from A to some state A_c and from B to some state B_c such that A_c and B_c have been classified as non-equivalent before. By the induction assumption, this means that there is a string S such that starting in A_c we reach an accepting state A_t and starting in B_c we reach a state B_t which is not accepting, or vice versa. Assume without loss of generality that A_t is accepting and B_t is not. Then, starting from A with string cS (c followed by the symbols of S) we reach the accepting state A_t, and starting with cS from B we reach the non-accepting state B_t. So, indeed A and B are not equivalent.
We do not prove the converse: once the process stops, all states which are not equivalent have been split into different subsets. Together with the above, this implies that the procedure finds precisely the classes of equivalent states. Thus melting together all states which lie in the same subset gives a minimal automaton (because no unnecessary splits are performed), and this automaton is equivalent to the original one (because no two non-equivalent states end up in the same subset).
The claim is even stronger: because for any given problem the minimal deterministic automaton is unique (except for the irrelevant numbering of the states), the obtained automaton is unique. Particularly, starting with different non-deterministic automata for a problem, it may happen that the subset construction gives us very different deterministic automata. However, after minimizing each of them, we have the guarantee to obtain the same minimal deterministic automaton. This means that there is no need to choose the non-deterministic automaton in a special way: in the end we will always find the same (only the amount of work in the intermediate steps may be different).
In order to perform this procedure, one may have to add a special dead state: for any character not appearing in the list of labels of any of the transitions out off a state x, a transition to this dead state is added. The dead state has a transition to itself for all characters of the alphabet. After running the minimization procedure, the dead state can be removed again. If all characters appear as label at one of the transitions out off each state, then there is no need for a dead state. This latter case particularly arises for deterministic automata which were obtained as a result of the subset construction. Also one should remove all states which have no ingoing links. The subset construction never produces such states if only earlier reached states are handled.
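The splitting procedure can be sketched in C for a complete deterministic automaton over the alphabet {0, 1} (add the dead state first if necessary). This variant, often called Moore's algorithm, does not split one subset at a time: in each round it splits all subsets at once, giving two states the same new class only if they were in the same class and their successors on 0 and on 1 were in the same classes. It stops when a round no longer increases the number of classes. The function name refine and the bound MAX_Q are assumptions of this sketch.

```c
#include <string.h>

#define MAX_Q 64   /* assumed upper bound on the number of states */
#define SIGMA 2    /* alphabet {0, 1} */

/* delta[q][c]: successor of state q on character c; accepting[q]: 1 for
   accepting states. On return cls[q] is the equivalence class of q;
   the function returns the number of classes. Requires n <= MAX_Q. */
int refine(int n, const int (*delta)[SIGMA], const int *accepting, int *cls) {
    int next[MAX_Q];
    int old_count = 0;
    for (int q = 0; q < n; q++)
        cls[q] = accepting[q] ? 1 : 0;   /* accepting vs. non-accepting */
    for (;;) {
        int count = 0;
        for (int q = 0; q < n; q++) {
            next[q] = -1;
            /* Reuse the class of an earlier state with the same signature. */
            for (int p = 0; p < q && next[q] < 0; p++) {
                int same = cls[p] == cls[q];
                for (int c = 0; c < SIGMA && same; c++)
                    same = cls[delta[p][c]] == cls[delta[q][c]];
                if (same)
                    next[q] = next[p];
            }
            if (next[q] < 0)
                next[q] = count++;
        }
        memcpy(cls, next, (size_t)n * sizeof next[0]);
        if (count == old_count)   /* no subset was split: stable */
            return count;
        old_count = count;
    }
}
```

Since each round refines the previous partition, an unchanged number of classes means an unchanged partition. For the eleven-state automaton listed earlier, kept complete with 8, 9 and A accepting, this sketch finds 11 classes: states 1 and 8, for instance, have identical transitions but differ in acceptance.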
After removing state 9, which has no incoming transitions, and adding a dead state D, we distinguish two equivalence classes as indicated in the following picture. In the further description, each (preliminary) equivalence class is designated by the index of the node with the smallest index in it. So, we will speak of class 0 and class A.
In a first round of testing we discover the difference between states 7 and 8 and the other nodes in the class of node 0: from states 7 and 8 there is a transition with label 0 to class A, and from the other nodes there is no such transition. This gives the following situation with three classes:
In the second round of testing we discover the difference between the states 3, 4, 5 and 6 and the other nodes in the class of node 0: from these states, there is a transition with label 0 to class 7 and from the other nodes in class 0 there is no such transition. This gives the following situation with four classes:
In the third round of testing we discover the difference between state 3 and the other states in class 3: from state 3 the transition with label 1 leads to a state in class 3, while for the others the transition with label 1 leads to a state in class 0. We also discover that states 1 and 2 are not equivalent to states 0 and D, because from 1 and 2 there are transitions to states in class 3.
In the fourth round of testing we discover the difference between state 1 and state 2: from state 1 there is a transition with label 0 to a state in class 3 and for state 2 the transition with label 0 leads to a state in class 4. Likewise we discover the difference between state 0 and state D. This gives the following situation:
0{2}0 |
0{2}1 |
0{2}2 |
1{2}0 |
1{2}1 |
2{2}0 |
2{2}2
Here consecutive symbols indicate a substring, "|" separates
alternatives and "{x}" denotes zero or more repetitions of symbol x.
Round brackets, "( ... )", may be used to group subexpressions. The
given pattern can be written as
0{2}(0|1|2) |
1{2}(0|1) |
2{2}(0|2)
and as
(0|2){2}(0|2) |
(0|1){2}(0|1)
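The claim that the three formulations describe the same set of strings can be checked mechanically for short strings. The sketch below assumes that "{x}" always encloses a single symbol. It translates that notation to the form "x*" used by Python's `re` module and compares the matched strings up to a given length. Such a finite check is only evidence; the real argument is the distribution of alternatives given above.

```python
import re
from itertools import product

def to_regex(pattern: str) -> str:
    """Translate the notation of these notes into Python regex syntax:
    "{x}" (zero or more x) becomes "x*"; blanks and line breaks are dropped.
    Assumption: the braces always enclose a single symbol, as in the examples."""
    return re.sub(r"\{(.)\}", r"\1*", "".join(pattern.split()))

PATTERNS = [
    "0{2}0 | 0{2}1 | 0{2}2 | 1{2}0 | 1{2}1 | 2{2}0 | 2{2}2",
    "0{2}(0|1|2) | 1{2}(0|1) | 2{2}(0|2)",
    "(0|2){2}(0|2) | (0|1){2}(0|1)",
]

def matched_strings(pattern: str, max_len: int) -> set[str]:
    """All strings over {0, 1, 2} of length <= max_len matched by the pattern."""
    regex = re.compile(to_regex(pattern))
    return {"".join(t) for n in range(max_len + 1)
            for t in product("012", repeat=n)
            if regex.fullmatch("".join(t))}
```

Note that the translation is essential: in Python's regex syntax, "{2}" would mean "exactly two repetitions" rather than "zero or more 2's".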
These reformulations can immediately be translated into equivalent non-deterministic automata. Each of these automata is minimal, in the sense that the number of states cannot be reduced by fusing equivalent states. This shows that there is no unique minimal non-deterministic automaton for a task.
Constructing the corresponding deterministic automata gives two different automata. However, in the upper automaton the states 5, 5' and 5" are equivalent and likewise 1 and 1'. Fusing the equivalent states gives the lower automaton. So, here we find a unique minimal deterministic automaton.
If one is going to fuse the terminal states, then one should do this before running the minimization procedure, because nodes with transitions to non-equivalent terminal states may become equivalent once the terminal states are fused. Going through this whole elaborate process, one often ends up with quite a small deterministic automaton, even for relatively complicated searches.
The basic definition says that an automaton halts as soon as it reaches an accepting state. However, it may be handy to consider the variant in which a string is accepted if the automaton is in an accepting state at the end of the string. In the exercises we will consider the problem of determining whether the number of 1's in a string of 0's and 1's is even. It is handy to let the accepting state correspond to the case "so far the number of encountered 1's was even". With the conventional definition of accepting, we need an extra state to which the automaton jumps at the end of the string. This makes the automaton less elegant and forces us to specify which character marks the end of the string.
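With the accept-at-end variant, the parity question needs only the two states described above. Below is a minimal sketch, assuming the automaton is stored as a nested dictionary. The state names `EVEN` and `ODD` and the function `accepts_at_end` are hypothetical.

```python
EVEN, ODD = "even", "odd"
# Transition table: state -> {character -> next state}.
DELTA = {
    EVEN: {"0": EVEN, "1": ODD},
    ODD:  {"0": ODD,  "1": EVEN},
}

def accepts_at_end(word: str, start: str = EVEN, accepting=frozenset({EVEN})) -> bool:
    """Accept iff the automaton is in an accepting state after the last character
    (the variant discussed above, not 'halt at the first accepting state')."""
    state = start
    for ch in word:
        state = DELTA[state][ch]
    return state in accepting
```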
Automata which are used to produce some output (more than one bit) in reaction to the provided string can be divided into two categories:
Without calling it that, we have encountered a Moore automaton in the example on suppressing isolated 0's and 1's. There, states 0 and 1 produce a 0 as output, while states 2 and 3 produce a 1. We have not encountered Mealy automata yet.
Moore and Mealy automata are equivalent. This means that for any finite Moore automaton there is a corresponding finite Mealy automaton producing exactly the same output, and vice versa. A minor problem, which often makes quite a few extra states necessary, is that the number of visited states exceeds the number of transitions by one: on an input of n characters, n + 1 states are visited but only n transitions are made.
A Moore automaton can be turned into a Mealy automaton by allocating the output of a state to all transitions leading to this state (alternatively one might allocate this output to the transitions leading out of this state). To ensure that the output of the starting state is also produced, one should add an extra starting state whose outgoing transitions produce this extra output. Only for the empty string is no output produced: the automata produce the same output for all non-empty strings.
A Mealy automaton can be turned into a Moore automaton, by replacing a state s by as many states s_1, ..., s_k as there are different outputs on the transitions leading to state s. The transitions from these states are the same as for s. Because the number of transitions to a state is limited by the size of the alphabet and the number of states, this replacement increases the number of states by at most a constant factor on a finite automaton. Because the Mealy automaton does not produce output before the first transition, one should add a special start state which does not produce output. The following example gives a Mealy automaton which inverts the bits of a string and the equivalent Moore automaton:
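The replacement just described can be sketched for the bit-inverting example. The sketch assumes a Mealy automaton stored as a dictionary from (state, character) to (next state, output), and all function names are hypothetical. Each state s is split into pairs (s, o), one per output o on the transitions into s. A new start state without output is added.

```python
# (state, input character) -> (next state, output string)
Mealy = dict[tuple[str, str], tuple[str, str]]

def run_mealy(delta: Mealy, start: str, word: str) -> str:
    state, out = start, []
    for ch in word:
        state, produced = delta[(state, ch)]
        out.append(produced)            # output is attached to the transition
    return "".join(out)

def mealy_to_moore(delta: Mealy, start: str):
    """Replace each state s by copies (s, o), one per output o on transitions into s;
    every copy keeps the outgoing transitions of s. A new start state with empty
    output is added, because a Mealy automaton produces nothing before the first
    transition."""
    moore_start = ("start", None)       # None never occurs as an output: no clash
    copies = set(delta.values())        # the pairs (target state, output)
    output = {moore_start: ""}
    output.update({copy: copy[1] for copy in copies})
    moore_delta = {}
    for (s, ch), target in delta.items():
        for copy in copies:
            if copy[0] == s:
                moore_delta[(copy, ch)] = target
        if s == start:
            moore_delta[(moore_start, ch)] = target
    return moore_delta, output, moore_start

def run_moore(delta, output, start, word: str) -> str:
    state, out = start, [output[start]]
    for ch in word:
        state = delta[(state, ch)]
        out.append(output[state])       # output is attached to the state
    return "".join(out)

# The bit inverter: a single state q; reading 0 outputs 1 and vice versa.
INVERT: Mealy = {("q", "0"): ("q", "1"), ("q", "1"): ("q", "0")}
MOORE = mealy_to_moore(INVERT, "q")
```

For the inverter this yields three Moore states: the new start state and the two copies of q, with outputs 0 and 1.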
It is important to distinguish two kinds of symbols: the symbols from the language itself and the meta-symbols which are used to speak about the language. In principle we might use any symbol as a meta-symbol, but for practical reasons one mostly uses words (article, substantive, adjective, verb, ...). The symbols from the language itself (cat, dog, walk, long, ...) are also called terminal symbols because they stand at the bottom of the construction of the language; the meta-symbols are often called non-terminal symbols.
Syntax-diagrams are a handy way of formulating grammatical rules. Assume that we want to construct simple sentences consisting of a subject part, a verb and an object part; then the rules for this might be formulated with the help of the following diagrams:
Here the symbols in rounded boxes are terminal symbols, the symbols in rectangular boxes are meta symbols. SNT stands for "sentence", FRM for "substantive form", ART for "article", ADJ for "adjective", SUB for "substantive" and VRB for "verb".
How should the diagram be read? One starts at the left side and ends at the right side. Any path gives a correct expression. If a line forks, this indicates several legal alternatives. In the definition of FRM we see two different kinds of alternatives: a form consists of zero or one articles and zero or more adjectives. This follows from the rule that one should always follow smooth curves. So, to get an adjective one takes the second turn-off, and then one may loop several times.
Notice that the formulation of SNT in terms of FRM and VRB is handy, but not unique. In particular, it is always correct (though it obscures the underlying structure) to replace any non-terminal symbol by its definition. That is, in the above diagram, the rectangular boxes might be replaced by the defining diagrams, until the whole diagram consists only of terminal symbols, lines and curves.
The above grammatical rules are called production rules: starting from a meta symbol, they can be used to produce sequences of symbols. The meta symbol at which the production starts, in our case SNT, is called the starting symbol.
In the example we specified a few terminal symbols of each category. It would also have made sense not to specify these at all, knowing that there are thousands of words and that our intention is to express the grammatical rules, not the set of words. A grammar which is not worked out down to the level of the terminal symbols is called an abstract grammar.
We summarize the notions and come to a formal definition of a grammar. A grammar is a quadruple consisting of:
Here we have further added the meta-symbols PGR, which stands for "preposition group", PRP for "preposition" and REL for "relative pronoun". Of course we have very few terminal symbols, but, assuming that we had specified more verbs, substantives, adjectives etc., the production rules given now already allow the construction of quite complicated sentences like "the ugly tall man which sees the brown dog which bites the black fat cow in the tail throws a sandwich to the other man which wears a blue hat ... ".
The exciting thing about this is that in the definition of FRM' we find FRM' itself again. Before, we have seen how a FRM could contain an arbitrary number of ADJ by a looping construction, but the phenomenon here is more intricate. It is called recursion. How can one define something in terms of itself? Don't we get an explosion? No: just as in the loop, there is a possibility to terminate by not choosing a recursive alternative.
It is essential that any recursive definition contains at least one non-recursive alternative. Such an alternative is called a basis of the recursion. A recursive definition without a basis is called circular. In the above example, there is one direct recursion: FRM' appears in FRM' again, but there is also an instance of a more indirect recursion: PGR appears in FRM' and FRM' appears in PGR. In the latter case we will say that the notions FRM' and PGR are mutually recursive.
The choice of the symbols in the BNF is somewhat old-fashioned, using only typewriter symbols, but now that we are using html to write this text, this is convenient. The symbols are:
In the syntax-diagrams, the distinction between terminal and non-terminal symbols was expressed by having two kinds of boxes. In the BNF, the non-terminal symbols are enclosed in angle brackets: "<" and ">".
The expression on the right of a "::=" is read from left to right, just as in the syntax diagrams, the default connection for several consecutively listed symbols being "and". There are no brackets for delimiting subexpressions. Thus, the "|" symbol can only be used at the top-level. If one wants to define alternatives at a lower level, a new non-terminal symbol like "letter_or_digit" must be introduced, where the "|" can be used on several listed alternatives at the top-level.
These symbols used in the BNF are neither terminal symbols from the language, nor meta-symbols used for grammatical notions. These are called meta-syntactical symbols: they are used in order to describe the syntax, but are not part of the syntax themselves.
To show how the formalism works, we reformulate the above examples, except for SNT' which is given as an exercise. For integer constants we have the following:
<int constant> ::= <numb constant> <int suffix>
<int suffix> ::= <u part> | <l part> | <u part> <l part>
<l part> ::= <empty> | <one l part> | <two l part>
<two l part> ::= ll | LL
<one l part> ::= l | L
<empty> ::=
<u part> ::= <empty> | u | U
<numb constant> ::= <dec constant> | <oct constant> | <hex constant>
<hex constant> ::= 0 <x part> <hex digit> { <hex digit> }
<hex digit> ::= <dec digit> | A | B | C | D | E | F
<dec digit> ::= 0 | <non-zero dec digit>
<non-zero dec digit> ::= <non-zero oct digit> | 8 | 9
<non-zero oct digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7
<x part> ::= x | X
<oct constant> ::= 0 { <oct digit> }
<oct digit> ::= 0 | <non-zero oct digit>
<dec constant> ::= 0 | <non-zero dec constant>
<non-zero dec constant> ::= <non-zero dec digit> { <dec digit> }
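The BNF for integer constants uses only alternatives and the repetition "{ ... }", so it can be transcribed by hand into a single regular expression. Below is a sketch under that reading, using Python's `re` module. The pattern names are hypothetical. Following the grammar literally, lowercase hex digits a to f are rejected. Since <u part> and <l part> may both be empty, <int suffix> reduces to an optional u-part followed by an optional l-part.

```python
import re

# One regular expression per non-terminal, transcribed from the BNF above.
NON_ZERO_DEC = "[1-9]"
DEC_DIGIT = "[0-9]"
OCT_DIGIT = "[0-7]"
HEX_DIGIT = "[0-9A-F]"                 # the grammar lists only A-F

DEC_CONSTANT = f"(?:0|{NON_ZERO_DEC}{DEC_DIGIT}*)"
OCT_CONSTANT = f"(?:0{OCT_DIGIT}*)"
HEX_CONSTANT = f"(?:0[xX]{HEX_DIGIT}+)"
NUMB_CONSTANT = f"(?:{DEC_CONSTANT}|{OCT_CONSTANT}|{HEX_CONSTANT})"
U_PART = "[uU]?"                       # <u part> ::= <empty> | u | U
L_PART = "(?:ll|LL|l|L)?"              # <l part> ::= <empty> | <one l part> | <two l part>
INT_CONSTANT = re.compile(f"{NUMB_CONSTANT}{U_PART}{L_PART}")

def is_int_constant(text: str) -> bool:
    """True if the whole text is an <int constant> according to the BNF."""
    return INT_CONSTANT.fullmatch(text) is not None
```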
For chains we have the following rather clumsy way of saying that a chain consists of zero or more nodes connected by links. However, it clearly expresses the several ways chains can be constructed:
<chain> ::= <empty> | node |
<node chain> | <chain node> | <chain chain>
<chain chain> ::= <chain> link <chain>
<chain node> ::= <chain> link node
<node chain> ::= node link <chain>
<empty> ::=
For palindromes we have the following:
<palin> ::= <empty> | <letter> | <letter palin letter>
<letter palin letter> ::= a <palin> a | b <palin> b
<letter> ::= a | b
<empty> ::=
The given example of integer constants shows that one may need quite a lot of non-terminal symbols to formulate a relatively easy concept. Additional meta-syntactical symbols make the formalism more complex, but also more powerful, allowing shorter formulations.
The most natural way of parsing is to work bottom-up. This means that one starts at the bottom, in our case at the terminal symbols, and works upwards until reaching the top, in our case the starting symbol. As an example we will consider a sentence, a chain and a palindrome. A priori it is not even clear that these are grammatically correct, that is, whether they have been composed according to their respective production rules. Even more interesting are the questions of how to perform the parsing and, if the parsing is successful, whether the resulting decomposition is unique.
Parsing is not easy. The basic idea is that the parser (the program executing the parsing task) continuously looks for a replacement to make, in the hope of finally reaching the starting symbol. If the parser does not work sufficiently carefully, it may run into a dead end or get stuck in a loop. As a result it constructs a tree-like structure, indicating which symbols were taken together and replaced by one symbol. This tree-like structure is called a parse tree.
Alternatively, one can try to apply top-down parsing. This means that one starts at the top and tries to reach the bottom. This is done by starting at the start symbol of the grammar. One chooses a path in the diagram of the start symbol. For all meta symbols on this path a path in their diagrams is chosen. This procedure is continued until only terminal symbols are left. The resulting sequence of terminal symbols should be identical to the sentence we wanted to parse.
If the parser complains about syntax errors, this means that it was not able to construct a parse tree of the given sentence, computer program or other expression which is supposedly constructed according to a grammar. In spoken language it is quite common that the speaker ends a sentence in a way that does not fit the way he/she started it.
Most natural languages are ambiguous. For example, if one says in English "the man is throwing at the dog with a ball", you may assume that the man is throwing a ball at the dog, until the sentence is extended to "the man is throwing a stone at the dog with a ball". This shows that in the first sentence it is not clear who has the ball. In spoken language ambiguity is often overcome by the use of intonation; in written language this support is missing.
Nevertheless, we still do not know what the sentence means: parsing can be performed by a computer with a dictionary (provided that each word belongs to a unique class), but attributing meaning to it requires more. Here we encounter the difference between a grammatical analysis and determining the semantics of an expression. We find the same situation with numbers: 067252ll can be parsed, and the conclusion is that it is an integer constant. However, the interpretation that this is an octal long long number with decimal value 2 + 5 * 8 + 2 * 64 + 7 * 512 + 6 * 4096 = 28330 remains to be done.
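Continuing the 067252ll example, the decimal value can be computed with Horner's rule. This value belongs to the semantics, which the parse alone does not deliver. Below is a minimal sketch, assuming the input has already been accepted by the grammar for integer constants. `int_constant_value` is a hypothetical name.

```python
def int_constant_value(text: str) -> int:
    """Decimal value of an integer constant that the grammar has already accepted.
    Strip the u/l suffix, pick the base from the prefix, evaluate by Horner's rule."""
    digits = text.rstrip("uUlL")            # no hex digit is u, U, l or L
    if digits[:2] in ("0x", "0X"):
        base, digits = 16, digits[2:]
    elif digits.startswith("0") and len(digits) > 1:
        base, digits = 8, digits[1:]
    else:
        base = 10
    value = 0
    for d in digits:
        value = value * base + int(d, 16)   # int(d, 16) maps '0'-'9', 'A'-'F' to 0..15
    return value
```

For 067252ll this evaluates ((((6 * 8 + 7) * 8 + 2) * 8 + 5) * 8 + 2, which is the same sum as in the text, computed from the left.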
An attribute is defined formally, by indicating for each production rule the resulting value of the attribute for the non-terminal symbol on the left in terms of the values of the attribute for the non-terminal symbols on the right and by specifying the value of the attribute for the terminal symbols.
As a first example, consider the number of a's in a palindrome. We list the rules of the grammar again, now all possible productions are numbered:
<palin> ::= (1)
| a (2)
| b (3)
| <let_pal_let> (4)
<let_pal_let> ::= a <palin> a (5)
| b <palin> b (6)
Now the attribute can be defined as follows:
The definition is given so that the defined attribute corresponds to the number of a's, but there was no need to do so: we can define whatever we want, but some definitions are more useful than others.
If one now wants to determine the number of a's in a palindrome, it can be found by constructing a parse tree and working one's way up from the terminal symbols to the starting symbol, applying the given rules.
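The bottom-up evaluation can be sketched as follows. The attribute values per production are not listed in this text. The sketch therefore assumes the natural choice that makes the attribute equal to the number of a's:

- 0 for rules (1) and (3);
- 1 for rule (2);
- the value of <let_pal_let> for rule (4);
- the value of <palin> plus 2 for rule (5);
- the value of <palin> for rule (6).

The tree representation and the function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    rule: int                                      # production used, (1) .. (6)
    children: list = field(default_factory=list)   # sub-trees for <palin>/<let_pal_let>

def parse_palin(s: str) -> Node:
    """Parse tree of s as a <palin>; raises ValueError if s is not one."""
    if s == "":
        return Node(1)
    if s == "a":
        return Node(2)
    if s == "b":
        return Node(3)
    return Node(4, [parse_let_pal_let(s)])

def parse_let_pal_let(s: str) -> Node:
    if len(s) >= 2 and s[0] == s[-1] == "a":
        return Node(5, [parse_palin(s[1:-1])])
    if len(s) >= 2 and s[0] == s[-1] == "b":
        return Node(6, [parse_palin(s[1:-1])])
    raise ValueError(f"not a palindrome over a and b: {s!r}")

def count_a(node: Node) -> int:
    """The attribute, evaluated bottom-up with the assumed value per rule."""
    if node.rule == 1: return 0
    if node.rule == 2: return 1
    if node.rule == 3: return 0
    if node.rule == 4: return count_a(node.children[0])
    if node.rule == 5: return count_a(node.children[0]) + 2
    if node.rule == 6: return count_a(node.children[0])
    raise ValueError(f"unknown rule {node.rule}")
```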
As a second example we consider a restricted grammar for numerical expressions:
<expr> ::= <part> (1)
| <part> <operator> <expr> (2)
<part> ::= <number> (3)
| ( <expr> ) (4)
Here the non-terminal symbol <number> is not further specified. So,
this is an abstract grammar. <number> might for example be an
<int constant>.
The attribute we consider is the number of operators. It is defined as
In this way the attributes of the non-terminal symbols higher in the parse tree are defined in terms of the attributes of the non-terminal symbols at lower levels. Such a definition is called inductive. Because this kind of induction follows the structure of the tree, it is called structural induction, and we say "the attribute is defined by structural induction".
Of course, in the simple cases of computing the number of a's in a palindrome or the number of operators in an expression, the problem can be solved more easily by determining the values directly. For more general attributes, however, defining an attribute by structural induction and evaluating it with the help of a parse tree makes sense.
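To make the structural definitions concrete, here is a sketch of a recursive-descent (top-down) parser for the restricted expression grammar. The two attributes, the number of operators O() and the number of numbers N(), are evaluated by structural recursion over the resulting parse tree. The grammar is abstract, so the sketch assumes that a <number> is a run of decimal digits and that <operator> is one of + - * /. The tree encoding (tuples tagged with the rule number) and all function names are hypothetical. The per-rule definitions used here are the ones implied by the derivations for rules (1) to (4) in this section.

```python
import re

# Assumptions: <number> is a run of decimal digits, <operator> is one of + - * /.
OPERATORS = {"+", "-", "*", "/"}
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text: str) -> list[str]:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    """One method per non-terminal; each returns a tuple tagged with the
    number of the production rule that was applied."""
    def __init__(self, tokens: list[str]):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        self.i += 1
        return tok

    def expr(self):
        part = self.part()
        if self.peek() in OPERATORS:                 # rule (2)
            op = self.take()
            return (2, part, op, self.expr())
        return (1, part)                             # rule (1)

    def part(self):
        tok = self.take()
        if tok == "(":                               # rule (4)
            inner = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return (4, inner)
        if tok.isdigit():                            # rule (3)
            return (3, tok)
        raise ValueError(f"unexpected token {tok!r}")

def parse(text: str):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return tree

def count_operators(tree) -> int:
    """O(), defined by structural induction over the parse tree."""
    rule = tree[0]
    if rule == 1: return count_operators(tree[1])
    if rule == 2: return count_operators(tree[1]) + 1 + count_operators(tree[3])
    if rule == 3: return 0
    return count_operators(tree[1])                  # rule (4)

def count_numbers(tree) -> int:
    """N(), defined by structural induction over the parse tree."""
    rule = tree[0]
    if rule == 1: return count_numbers(tree[1])
    if rule == 2: return count_numbers(tree[1]) + count_numbers(tree[3])
    if rule == 3: return 1
    return count_numbers(tree[1])                    # rule (4)
```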
The attribute number_of_numbers is defined by structural induction as follows:
Both attributes are now defined formally. The claim can then be proven by performing the following two steps:
This approach to proving is not limited to this particular case, but is a general proof method. Such a proof is called a proof by structural induction. That this is a legal way of proving cannot itself be proven. It is an axiom of mathematics, part of the belief, so to speak. It is not unnatural though: if something is true for all basic cases, and it remains true when applying any of the possible ways to obtain a more involved case, then it should be true for all cases.
It remains to write down the proof by structural induction of the claim that number_of_numbers = number_of_operators + 1 for all constructible expressions. We should first check that it holds for all numbers. For these the claim holds: one number, zero operators, as expressed by the statement on the result of a production using rule (3).
Now we consider the other production rules. We use O(symbol) to denote the number of operators in the denoted symbol, N(symbol) denotes the number of numbers. The basis of the proof by induction is given by rule (3):
N(<number>) = 1 = 0 + 1 = O(<number>) + 1.
| N(<expr>) | =def= |
| N(<part>) | =ass= |
| O(<part>) + 1 | =def= |
| O(<expr>) + 1. |
| N(<expr> left) | =def= |
| N(<part>) + N(<expr> right) | =ass= |
| (O(<part>) + 1) + (O(<expr> right) + 1) | =cmp= |
| (O(<part>) + O(<expr> right) + 1) + 1 | =def= |
| O(<expr> left) + 1. |
| N(<part>) | =def= |
| N(<expr>) | =ass= |
| O(<expr>) + 1 | =def= |
| O(<part>) + 1. |
This method states that, for proving a claim on a function f from the natural numbers, it is sufficient to do the following:
A well-known example of a fact which can be proven by complete induction is that for S(n) = sum_{i = 0}^n i and P(n) = n * (n + 1) / 2, we have S(n) = P(n) for all n >= 0. Possibly this might be proven in some direct way, but for proving facts like these induction is the first method to try. So, what should we do? Clearly: check the basis and show that a step can be made.
Checking the basis means showing that S(0) = P(0) using the definitions of S() and P(). Showing that a step can be made means showing that S(n + 1) = P(n + 1), using the definitions of S() and P() and using that S(n) = P(n). This can be worked out as follows:
| S(0) | =def= |
| sum_{i = 0}^0 i | =cmp= |
| 0 | =cmp= |
| 0 * (0 + 1) / 2 | =def= |
| P(0). |
| S(n + 1) | =def= |
| sum_{i = 0}^{n + 1} i | =cmp= |
| (sum_{i = 0}^n i) + (n + 1) | =def= |
| S(n) + (n + 1) | =ass= |
| P(n) + (n + 1) | =def= |
| (n + 2) * (n + 1) / 2 | =cmp= |
| (n + 1) * ((n + 1) + 1) / 2 | =def= |
| P(n + 1). |
Complete induction is really a special case of structural induction when one considers the natural numbers to be sentences in the language generated by the following grammar:
<number> ::= <zero>
| <number> 1
<zero> ::=
Here the numbers are given in unary notation: the number n is written
as n ones. So, proving a claim on attributes defined over this grammar
by structural induction, means checking the claim for <zero> and
proving it holds for 1 ... 11 assuming it holds for 1 ... 1. This is
precisely what complete induction does.
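The correspondence between complete induction and the unary grammar can be mirrored in code. S is defined by recursion over the two productions, and the claim S(n) = P(n) is then tested for finitely many n. This is a sketch with hypothetical function names, and the finite test is no substitute for the inductive proof.

```python
def is_number(word: str) -> bool:
    """<number> ::= <zero> | <number> 1 generates exactly the words made of 1's."""
    return all(ch == "1" for ch in word)

def S(word: str) -> int:
    """S over the unary grammar: S(<zero>) = 0 and S(w 1) = S(w) + (n + 1),
    where w 1 is the unary notation of n + 1, i.e. its length."""
    if not is_number(word):
        raise ValueError(f"not a unary number: {word!r}")
    if word == "":                      # production <number> ::= <zero>
        return 0
    return S(word[:-1]) + len(word)     # production <number> ::= <number> 1

def P(word: str) -> int:
    n = len(word)
    return n * (n + 1) // 2             # n * (n + 1) is even, so // is exact
```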
The correctness of the method of proving by induction, together with the fact that the above grammar corresponds to the set of natural numbers, constitutes the axioms of the natural numbers formulated by Peano in 1889. The properties of the grammar are more conventionally formulated as:
One possible way of proving that all sequences satisfying P are generated is to indicate, for any possible sequence S satisfying P, by which production rule it arises out of smaller sequences S_1, ... satisfying P. In the construction of S_1, ... it may explicitly be used that S satisfies P. So, here the structural induction is reversed. It is essential that the new sequences are smaller; otherwise a circular argument cannot be excluded. Equivalently, one may perform complete induction over the length of the sequences. This point will be addressed in more detail after some examples.
<sent> ::= a <sent> b (1)
| b <sent> a (2)
| <sent> <sent> (3)
| (4)
We want to prove that this grammar generates precisely those sentences which have the same number of a's and b's. Denote the number of a's by A() and the number of b's by B(). Then
A(<sent> left) = A(<sent> right) + 1,
B(<sent> left) = B(<sent> right) + 1.
A(<sent> left) = A(<sent> right) + 1,
B(<sent> left) = B(<sent> right) + 1.
A(<sent> left) = A(<sent> right_1) + A(<sent> right_2),
B(<sent> left) = B(<sent> right_1) + B(<sent> right_2).
A(<sent>) = 0,
B(<sent>) = 0.
In order to prove that all generated sentences S have the property that A(S) = B(S), one should proceed by structural induction. The proof is entirely analogous to the above proof that the number of numbers in an expression exceeds the number of operators by one and is left as an exercise.
Now we consider all possible sequences of a's and b's with an equal number of each of them. There is only one such sequence with 0 a's and 0 b's: the empty sequence. It is generated by rule (4). All non-empty strings with equally many a's and b's must have at least one a and one b and therefore have at least two letters in total. Strings with at least two letters over an alphabet with two letters can either begin with a and end with a, or begin with a and end with b, or ... . So, any string S with at least two letters is of the form x R y, with x and y each equal to a or b, and where R is a string with two fewer letters. For each of these four cases, we show that if S satisfies A(S) = B(S), then it can be constructed from shorter strings also satisfying this property.
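Before working through the four cases, one can at least test the claim for short strings. The sketch below computes all strings up to a length bound that rules (1) to (4) produce, by repeatedly applying the rules until nothing new appears. It then compares them with a direct enumeration of the strings with equally many a's and b's. The function names and the bound are assumptions, and failing to find a counterexample this way is not a proof.

```python
from itertools import product

def generated_up_to(max_len: int) -> set[str]:
    """Strings of length <= max_len produced by rules (1)-(4), by fixed-point iteration."""
    lang = {""}                                   # rule (4)
    while True:
        new = set(lang)
        for s in lang:
            if len(s) + 2 <= max_len:
                new.add("a" + s + "b")            # rule (1)
                new.add("b" + s + "a")            # rule (2)
        for s in lang:
            for t in lang:
                if len(s) + len(t) <= max_len:
                    new.add(s + t)                # rule (3)
        if new == lang:                           # nothing new: fixed point reached
            return lang
        lang = new

def equal_counts_up_to(max_len: int) -> set[str]:
    """All strings over {a, b} of length <= max_len with A(S) = B(S)."""
    return {"".join(t) for n in range(max_len + 1)
            for t in product("ab", repeat=n) if t.count("a") == t.count("b")}
```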
A proof like this has the problem that one needs some argument to convince the reader that really all cases are treated. In this case the argument is convincing, but here we use external facts: our understanding of what strings with the same number of a's and b's look like. A related point is that for the case a R a, we use all kinds of facts which do not follow from the production rules. For example, that if A(a R a) = B(a R a), then A(R) = B(R) - 2. These points can be formalized by extending the definition of the attributes to all strings, not only those generated by the grammar. Even harder to formalize is the argument that if a string has two more b's than a's, it can be cut so that each section has one more b, but even this can be done. Here we are slightly tolerant and occasionally do not ask to formalize the last detail. Nevertheless, it is important to underline the fundamental difference between proving that a certain property holds for all generated sentences, and proving that all sequences for which a property holds are generated. The first is straightforward; the second requires that one can somehow formulate what all possible cases look like and find a sensible subdivision into cases which can be treated separately. Not being able to find a counterexample to a hypothesis is not the same as proving it!
The given proof that in the case a R b the sequence R satisfies A(R) = B(R) is complete and convincing. It can also be given in a somewhat different way, which might sometimes be easier. The only facts that can be used in this proof are the way R is constructed and that A(S) = B(S). An alternative to the given positive proof is a proof by contradiction. This is a general proof technique, which works as follows: some assumption is made and then it is shown that this leads to a contradiction. Then it is concluded that the assumption cannot hold, and that thus the opposite must be true. The correctness of this way of arguing is an axiom of mathematics, and has been the topic of fundamental disputes. In few cases can a proof by contradiction not be turned into a positive proof, but proofs by contradiction are often elegant and convincing. In our case, assuming A(R) != B(R) implies A(S) = A(R) + 1 != B(R) + 1 = B(S), a contradiction. Thus, the assumption A(R) != B(R) must have been wrong, which implies A(R) = B(R).
<par> ::= <empty> (1)
| ( <par> ) <par> (2)
<empty> ::=
Define the attribute B() to give the balance: B(S) gives the number of "(" symbols minus the number of ")" symbols in S. More generally, B_i(S) gives the balance in the first i symbols of S. The attribute M() on S gives the minimum prefix balance of S: M(S) = min_{0 <= i <= number of symbols in S} B_i(S). A string of parentheses S is defined to be balanced if B(S) = M(S) = 0. We will show that the grammar generates precisely these strings.
B(<empty>) = M(<empty>) = 0. So, the basis of the inductive proof that all generated strings satisfy the property is ok. For rule (2), we have B(<par> left) =def= 1 + B(<par> first right) -1 + B(<par> second right) =cmp= B(<par> first right) + B(<par> second right) =ass= 0 + 0 =cmp= 0. In order to determine M(S) for a string S = ( R_l ) R_r generated according to rule (2), we distinguish several cases. Let l be the number of symbols in R_l and r the number in R_r. The following is easy to check:
B_i(S) = 0, for i = 0
B_i(S) = 1 >= 0, for i = 1
B_i(S) = 1 + B_{i - 1}(R_l) >= 1 >= 0, for 2 <= i <= l + 1
B_i(S) = 1 + B(R_l) - 1 = 0, for i = l + 2
B_i(S) = 1 + B(R_l) - 1 + B_{i - l - 2}(R_r) = B_{i - l - 2}(R_r) >= 0, for l + 3 <= i <= l + r + 2
Thus, M(S) = 0, showing that S satisfies the properties.
For the other direction we consider an arbitrary non-empty string S of n parentheses with B(S) = M(S) = 0 (the empty string is generated by rule (1)). Because B_1(S) >= 0, the first symbol is a "(", so we may write S = ( R, where R equals the remaining symbols. Denote position j, 0 <= j < n - 1, of R by r_j. Then B_i(R) = sum_{0 <= j < i} value(r_j), where the function value() is defined by value('(') = +1 and value(')') = -1. B_{n - 1}(R) = B_n(S) - 1 = 0 - 1 = -1, that is, in R there is one more ')' than '('. Let i, 1 <= i <= n - 1, be the smallest value so that B_i(R) < 0. Because B_i differs from B_{i - 1} by 1, we must have B_{i - 1}(R) = 0 and B_i(R) = -1; in particular r_{i - 1} = ')'. Let R_l be the string consisting of r_0, r_1, ..., r_{i - 2} and let R_r consist of r_i, r_{i + 1}, ..., r_{n - 2}. R_r is possibly empty. So, S = ( R_l ) R_r. By the choice of i, B_j(R_l) = B_j(R) >= 0 for all j < i - 1 and B_{i - 1}(R_l) = 0. Thus, R_l is a balanced sequence of parentheses. The prefix ( R_l ) of S has i + 1 symbols, so for R_r we find B_j(R_r) = B_{j + i + 1}(S) - B_{i + 1}(S) = B_{j + i + 1}(S) - (1 + 0 - 1) = B_{j + i + 1}(S) for all j, 0 <= j <= n - i - 1. Thus, B_j(R_r) >= 0 for all j, and B_{n - i - 1}(R_r) = B_n(S) = 0. Thus, R_r is balanced as well. Thus, S can be generated from R_l and R_r by applying rule (2).
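The decomposition used in the proof translates directly into code. The sketch below (all names hypothetical) does four things:

- it computes the prefix balances B_0(S), ..., B_n(S);
- it tests B(S) = M(S) = 0;
- it checks membership in the grammar by trying every split S = ( R_l ) R_r;
- it splits a balanced string exactly as in the proof, at the first position where the balance of R drops below zero.

```python
from functools import lru_cache
from itertools import product

def balance_profile(s: str) -> list[int]:
    """[B_0(s), B_1(s), ..., B_len(s)]: the balance of every prefix of s."""
    profile = [0]
    for ch in s:
        profile.append(profile[-1] + (1 if ch == "(" else -1))
    return profile

def is_balanced(s: str) -> bool:
    """B(s) = M(s) = 0, as defined in the text."""
    p = balance_profile(s)
    return p[-1] == 0 and min(p) == 0

@lru_cache(maxsize=None)
def generated(s: str) -> bool:
    """True if rules (1)/(2) produce s: s is empty or s = ( R_l ) R_r."""
    if s == "":
        return True
    if s[0] != "(":
        return False
    # try every ')' that could close the first '('
    return any(s[k] == ")" and generated(s[1:k]) and generated(s[k + 1:])
               for k in range(1, len(s)))

def split_as_in_proof(s: str) -> tuple[str, str]:
    """Split a non-empty balanced s into (R_l, R_r) with s = ( R_l ) R_r."""
    r = s[1:]
    i = next(k for k, b in enumerate(balance_profile(r)) if b < 0)  # smallest i, B_i(R) < 0
    return r[:i - 1], r[i:]             # r[i - 1] is the matching ')'
```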
The above proof is so important that we dwell on it somewhat longer. After having defined R_l and R_r, it must be proven that these satisfy the properties. This must be shown, because only if they satisfy them can it be assumed inductively that R_l and R_r can be generated. Only then can we argue that S can be generated by first generating R_l and R_r and then applying rule (2). This way of arguing is an implicit application of structural induction to the claim that Sat(S) = true implies Gen(S) = true. Here Sat() and Gen() are Boolean attributes which are true for a sequence which satisfies the properties and which can be generated, respectively, and false otherwise. In older texts, it was common to give such proofs by complete induction over the length of the sequences. Doing this, when proving that an arbitrary sequence of length n satisfying the properties can be generated, we may assume that all sequences of length less than n satisfying the properties can be generated. Such an argument is more concrete, but is considered less elegant than a proof by structural induction, because it unnecessarily relies on the secondary notion of length.
<sent> ::= a <sent> b (1)
| b <sent> a (2)
| <empty> (3)
| b <sentaa> b (4)
| a <sentbb> a (5)
<sentaa> ::= a <sentaa> b (6)
| b <sentaa> a (7)
| a <sent> a (8)
<sentbb> ::= a <sentbb> b (9)
| b <sentbb> a (10)
| b <sent> b (11)
<empty> ::= (12)
After what we have seen, we soon realize that any sent again has the same number of a's and b's. These attributes are denoted A() and B() as before. This is easy to prove, but requires one more step than the earlier proofs. Showing that not all sequences are generated is left as an exercise.
As always, we must prove that for the symbol on the left of a production rule the desired property follows when we assume that it holds for the symbols on the right. However, on the right of production rule (4) we find sentaa and in (5) we find sentbb, for which other properties hold. So, the more correct way of formulating the principle of structural induction is
So, in our case we work with four claims:
Here the last claim is the basis assumption, which may be considered as given. All other properties can now easily be checked against the production rules. For example:
A(sent) = 0 + A(sentaa) + 0 = A(sentaa) =ass= B(sentaa) + 2 = 1 + B(sentaa) + 1 = B(sent).
A(sentaa) = 1 + A(sent) + 1 = A(sent) + 2 =ass= B(sent) + 2 = 0 + B(sent) + 0 + 2 = B(sentaa) + 2.
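The claims about sent, sentaa and sentbb can be tested on all generated strings up to a length bound. Below is a sketch with hypothetical names. It generates the three languages simultaneously by applying rules (1) to (12) until nothing new appears. As before, such a test supports the claims but does not replace the structural induction.

```python
def generate(max_len: int) -> dict[str, set[str]]:
    """All strings of length <= max_len derivable from sent, sentaa and sentbb."""
    lang = {"sent": set(), "sentaa": set(), "sentbb": set()}
    changed = True
    while changed:
        changed = False
        new = {
            "sent": {""}                                       # rules (3), (12)
            | {"a" + s + "b" for s in lang["sent"]}            # rule (1)
            | {"b" + s + "a" for s in lang["sent"]}            # rule (2)
            | {"b" + s + "b" for s in lang["sentaa"]}          # rule (4)
            | {"a" + s + "a" for s in lang["sentbb"]},         # rule (5)
            "sentaa": {"a" + s + "b" for s in lang["sentaa"]}  # rule (6)
            | {"b" + s + "a" for s in lang["sentaa"]}          # rule (7)
            | {"a" + s + "a" for s in lang["sent"]},           # rule (8)
            "sentbb": {"a" + s + "b" for s in lang["sentbb"]}  # rule (9)
            | {"b" + s + "a" for s in lang["sentbb"]}          # rule (10)
            | {"b" + s + "b" for s in lang["sent"]},           # rule (11)
        }
        for key, words in new.items():
            words = {w for w in words if len(w) <= max_len}
            if not words <= lang[key]:
                lang[key] |= words
                changed = True
    return lang
```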
Many useful languages are context free. Most importantly, many programming languages are context free. This can easily be verified by checking their specifications, which are often given in BNF or in some similar format.
On the other hand, there are very few (if any) context-free natural languages. Languages have verbal inflections depending on the person, they may have genders affecting articles and adjectives, they may have cases, and the choice of relative pronouns may depend on whether the subject of reference is a person or a thing.
For a more mathematical example, consider the language SS containing the sentences abc, aabbcc, aaabbbccc, aaaabbbbcccc, ... . The following grammar generates all these sentences:
<S> ::= <A> <B> <C>
<A> ::= a | a <A>
<B> ::= b | b <B>
<C> ::= c | c <C>
However, it also generates aaabcc, which we did not want to have. The following grammar comes closer: it still generates all sequences of SS, and it does not generate aaabcc:
<S> ::= a <B> c | a <S> c
<B> ::= b | b <B>
However, this does not really solve the problem: it is now assured that the number of a's equals the number of c's, but the number of b's can be arbitrary. The problem is not that we are not sufficiently clever: it can be proven, though this is not easy, that SS cannot be generated by any context-free grammar. Informally one can argue as follows: a context-free production can essentially only glue together several non-terminal symbols with additional terminal symbols around them. This allows adding the same on both ends, but it is not possible to specify that the same change should be carried out in the middle of a sequence, or that only sequences of equal length should be glued.
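The difference between the two grammars for SS can be observed by generating all their sentences up to a given length. In the sketch below, a grammar is a dictionary from a non-terminal to its alternatives, each alternative being a list of symbols. Any symbol that is not a key counts as a terminal. This representation and the function `language` are hypothetical.

```python
from itertools import product

Grammar = dict[str, list[list[str]]]

G1: Grammar = {                      # <S> ::= <A> <B> <C>, one letter block each
    "S": [["A", "B", "C"]],
    "A": [["a"], ["a", "A"]],
    "B": [["b"], ["b", "B"]],
    "C": [["c"], ["c", "C"]],
}
G2: Grammar = {                      # <S> ::= a <B> c | a <S> c
    "S": [["a", "B", "c"], ["a", "S", "c"]],
    "B": [["b"], ["b", "B"]],
}

def language(grammar: Grammar, start: str, max_len: int) -> set[str]:
    """All terminal strings of length <= max_len derivable from `start`."""
    words = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                # a terminal stands for itself, a non-terminal for its words so far
                choices = [words[sym] if sym in grammar else {sym} for sym in alt]
                for parts in product(*choices):   # product copies its inputs first
                    w = "".join(parts)
                    if len(w) <= max_len and w not in words[nt]:
                        words[nt].add(w)
                        changed = True
    return words[start]
```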
S_1 X S_3 ::= S_1 S_2 S_3
Here S_1, S_2 and S_3 are arbitrary sequences of terminal and non-terminal symbols, while X denotes a single non-terminal symbol. The interpretation of the above production rule is the following: "In the context of S_1 and S_3, the symbol X can be replaced by S_2".
As a first example we consider a refinement of the grammar SNT' considered at the beginning of this chapter. SNT' gives nice sentences, but it does not handle the subtlety that in English we write "an apple" but "a pear". Furthermore, it builds sentences like "the man which sees the girl throws a stone", where "who" would be correct instead of "which". These details can be handled conveniently with a context-sensitive grammar:
<snt'> ::= <frm'> <vrb> <frm'>
<frm'> ::= <frm> <prrl>
<frm> ::= <arg> <adg> <sub>
<arg> ::= <empty> | <vart> | <cart> | the
<empty> ::=
<vart> <vsub> ::= an <vsub>
<vart> <vadj> ::= an <vadj>
<vsub> ::= <pvsub> | <uvsub>
<pvsub> ::= assistant | aunt | uncle
<uvsub> ::= anchor | ape | eagle | elephant | umbrella
<vadj> ::= awkward | old | ugly
<cart> <csub> ::= a <csub>
<cart> <cadj> ::= a <cadj>
<csub> ::= <pcsub> | <ucsub>
<pcsub> ::= girl | man | woman
<ucsub> ::= car | dog | house | roof | zebra
<cadj> ::= brown | cold | good | high | hot | strong
<adg> ::= <empty> | <adj> | <adj> <adg>
<adj> ::= <vadj> | <cadj>
<sub> ::= <vsub> | <csub>
<prrl> ::= <empty> | <pgr> | <rlg>
<pgr> ::= <prp> <frm'>
<prp> ::= after | in | over | upon
<rlg> ::= <rel> <vrb> <frm'>
<rel> ::= <prel> | <urel>
<psub> <prel> ::= <psub> who
<psub> ::= <pvsub> | <pcsub>
<usub> <urel> ::= <usub> which | <usub> that
<usub> ::= <uvsub> | <ucsub>
<vrb> ::= buys | eats | hits | sees
The above extension achieves what we wanted, but in this case the same might even have been achieved in a context-free way by listing several alternatives for
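To see how the context rules for the articles work, the following sketch represents a sentential form as a list of symbols and applies a production lhs ::= rhs at the leftmost place where lhs occurs as a contiguous block. We assume the form <vart> <vadj> <csub> has already been reached via <frm> ::= <arg> <adg> <sub>; the names such as AN_ADJ are our own. Applying the listed rules yields "an old man", while a form starting with <cart> <vadj> matches neither <cart> rule, so "a old man" cannot be derived.

```python
def rewrite(form, lhs, rhs):
    """Apply lhs ::= rhs at the leftmost occurrence of lhs, or return None."""
    k = len(lhs)
    for i in range(len(form) - k + 1):
        if form[i:i + k] == lhs:
            return form[:i] + rhs + form[i + k:]
    return None  # the production is not applicable to this form

# Context-sensitive article rules from the grammar above (names are ours).
AN_ADJ = (["<vart>", "<vadj>"], ["an", "<vadj>"])
A_ADJ = (["<cart>", "<cadj>"], ["a", "<cadj>"])
A_SUB = (["<cart>", "<csub>"], ["a", "<csub>"])
# Context-free rules, each with one chosen alternative.
VADJ_OLD = (["<vadj>"], ["old"])
CSUB_PC = (["<csub>"], ["<pcsub>"])
PCSUB_MAN = (["<pcsub>"], ["man"])

def derive(form, steps):
    """Apply the given productions in order; None if one of them is stuck."""
    for lhs, rhs in steps:
        form = rewrite(form, lhs, rhs)
        if form is None:
            return None
    return form
```

For example, `derive(["<vart>", "<vadj>", "<csub>"], [AN_ADJ, VADJ_OLD, CSUB_PC, PCSUB_MAN])` returns `["an", "old", "man"]`.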
A class of grammars G_1 is said to be (strictly) more powerful than a class of grammars G_2, if there is a language L which can be generated with a grammar from G_1 which cannot be generated with any grammar from G_2. How about context-free and context-sensitive grammars? Clearly context-sensitive grammars are at least as powerful as context-free grammars, because taking S_1 = S_3 = empty gives a context-free production rule. So, the question is whether there are languages which can be generated by context-sensitive grammars which cannot be generated by context-free grammars. The remainder of this section is devoted to proving this.
Theorem: Context-sensitive grammars are strictly more powerful than context-free grammars.
Proof: Consider again the language SS = {abc, aabbcc, ...}. Above it was claimed that this language cannot be generated by a context-free grammar. So, proving that SS can be generated by a context-sensitive grammar demonstrates that context-sensitive grammars are strictly more powerful than context-free grammars in the above defined sense. We propose the following grammar:
<S> ::= a <B> <C> (1)
a <B> ::= a a <B> <B> (2)
<B> <C> ::= bc (3)
| b <C> c (4)
<B> b ::= b <B> (5)
We show how to construct aaabbbccc using this grammar. We omit the "<" and ">" symbols to shorten the notation, and use "-i->" to indicate a production step with rule i, for 1 <= i <= 5:
<S> -1-> aBC
-2-> aaBBC
-2-> aaaBBBC
-4-> aaaBBbCc
-5-> aaaBbBCc
-5-> aaabBBCc
-4-> aaabBbCcc
-5-> aaabbBCcc
-3-> aaabbbccc
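Each line of the derivation above can be checked mechanically. A step with rule i is valid if the right-hand side of rule i replaces one occurrence of its left-hand side and nothing else changes. The sketch below writes S, B and C for <S>, <B> and <C>, as in the derivation, and verifies all nine steps.

```python
# Rules (1)-(5) with "<" and ">" omitted; S, B, C stand for <S>, <B>, <C>.
RULES = {1: ("S", "aBC"), 2: ("aB", "aaBB"), 3: ("BC", "bc"),
         4: ("BC", "bCc"), 5: ("Bb", "bB")}

def one_step(before, after, rule):
    """True if `after` results from `before` by applying `rule` once."""
    lhs, rhs = RULES[rule]
    i = before.find(lhs)
    while i != -1:  # try every occurrence of the left-hand side
        if before[:i] + rhs + before[i + len(lhs):] == after:
            return True
        i = before.find(lhs, i + 1)
    return False

DERIVATION = ["S", "aBC", "aaBBC", "aaaBBBC", "aaaBBbCc", "aaaBbBCc",
              "aaabBBCc", "aaabBbCcc", "aaabbBCcc", "aaabbbccc"]
RULES_USED = [1, 2, 2, 4, 5, 5, 4, 5, 3]

def derivation_is_valid():
    return all(one_step(DERIVATION[k], DERIVATION[k + 1], RULES_USED[k])
               for k in range(len(RULES_USED)))
```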
Generalizing the given scheme for generating all sequences in SS is
left as an exercise. Before trying a full generalization, it is
recommended to first consider how aaaabbbbcccc can be generated.
We are not done yet. Generating all sequences of SS is not hard, and can even be achieved with a context-free grammar. The point is that a context-free grammar cannot generate precisely those sequences. So, it remains to verify that the set of (non-terminal-free) sequences generated with the given context-sensitive grammar only contains sequences a...ab...bc...c with equally many a's, b's and c's.
The proof can be given by defining attributes: N_a, N_b, N_c, and N_B, indicating the number of symbols of each type. Using structural induction, the following equalities can be proven to hold for all generated sequences:
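One pair of equalities that fits rules (1)-(5), stated here as our own assumption, is N_a = N_b + N_B and N_b = N_c. Rules (1) and (2) each add one a and one B, rules (3) and (4) each turn one B into one b and add one c, and rule (5) changes no counts. For a sequence without non-terminals N_B = 0, so N_a = N_b = N_c. The sketch below walks randomly through sentential forms (the letters S, B and C stand for the non-terminals), including walks that end in a dead end. It checks these equalities at every step, as well as the order a...ab...bc...c for terminal sequences. This is a test, not a proof.

```python
import random
import re

# Rules (1)-(5); S, B, C stand for <S>, <B>, <C>.
RULES = [("S", "aBC"), ("aB", "aaBB"), ("BC", "bc"), ("BC", "bCc"), ("Bb", "bB")]

def successors(form):
    """All forms reachable from `form` by applying one rule at one position."""
    result = []
    for lhs, rhs in RULES:
        i = form.find(lhs)
        while i != -1:
            result.append(form[:i] + rhs + form[i + len(lhs):])
            i = form.find(lhs, i + 1)
    return result

def invariants_hold(form):
    n = form.count
    return n("a") == n("b") + n("B") and n("b") == n("c")

def random_walks(walks=500, max_steps=60, seed=0):
    rng = random.Random(seed)
    for _ in range(walks):
        form = "S"
        for _ in range(max_steps):
            assert invariants_hold(form), form
            nxt = successors(form)
            if not nxt:
                break                  # terminal sequence or dead end
            form = rng.choice(nxt)
        assert invariants_hold(form), form
        if not any(ch in "SBC" for ch in form):
            # terminal sequence: must have the shape a...ab...bc...c
            assert re.fullmatch(r"a+b+c+", form), form
    return True
```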
S_1 ::= S_2
That is, an arbitrary sequence of symbols S_1 can be replaced by an arbitrary sequence of symbols S_2. Parsing sentences from general languages is extremely hard: for general grammars it is not even decidable whether a given sequence can be generated at all, and even for context-sensitive grammars all known algorithms take time exponential in n for some sequences of length n. Here algorithm means a step-by-step description of how to proceed.
Notice that many books give a definition which at first glance appears to be more restrictive. However, the definition given here is equivalent in the sense that all grammars which are regular according to this definition can also be classified as regular according to the apparently more restrictive definitions. With the definition given here it is often easier to show that a grammar is regular.
The given grammar for chains is not regular. However, chains can be generated with a regular grammar as well:
<chain> ::= <empty>
| node
| node link <chain>
The same is true for the language of (numerical) expressions without
brackets (which is essentially the same):
<op> ::= + | - | * | /
<expr> ::= <empty>
| <number>
| <number> <op> <expr>
Regular grammars are strictly weaker than context-free grammars. For example, the given grammar for palindromes contains the production
<pal> ::= a <pal> a
which is not a regular production rule, so this is not a regular grammar. In principle, this does not exclude the existence of some other regular grammar generating all palindromes, but it can be shown that no such regular grammar exists. Intuitively, palindromes essentially require growth outward from the middle, something a regular grammar cannot achieve.
If a language as basic as that of the palindromes cannot be generated by a regular grammar, then clearly many other useful languages cannot be generated either. The reason to consider regular grammars is that they are almost trivial to parse, considerably easier than context-free grammars.
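The remark that regular grammars are almost trivial to parse can be illustrated with the regular chain grammar given above. The only non-terminal in each rule stands at the very end, so a parser can read the input once from left to right, remembering only which of two situations it is in, with no stack and no backtracking. In the sketch below, representing the input as a list of the token strings "node" and "link" is our assumption.

```python
def is_chain(tokens):
    """Recognize <chain> ::= <empty> | node | node link <chain>.

    Two states suffice: "start" means a new <chain> may begin here,
    "after_node" means the last token read was a node.
    """
    state = "start"
    for token in tokens:
        if state == "start" and token == "node":
            state = "after_node"
        elif state == "after_node" and token == "link":
            state = "start"        # node link <chain>: a new <chain> begins
        else:
            return False           # no rule allows this token here
    # Both states are accepting: <chain> may be <empty>, and "node" is a chain.
    return True
```

Memory use is constant and each token is looked at once. For a context-free language such as the palindromes, a parser in general needs a stack.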
Type 0: general grammars
Type 1: context-sensitive grammars
Type 2: context-free grammars
Type 3: regular grammars
This classification is due to Noam Chomsky, who introduced it in his pioneering work in this area. It is therefore also called the Chomsky hierarchy.
We give some examples. Valid constants are .14, 17., 0E1, 17.3E-15 and 12.3E+14F. Invalid are 18 (no dot, no E), .E1 (no digits before or after the dot), E1 (no digits before the E), 0E (no digits after the E), 14E1.2 (dot after the E) and -71.56 (the sign is not part of the float constant but is considered a unary operator).
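The examples above can be turned into a quick test bench. The sketch below uses a Python regular expression. The exact rules it encodes (an upper-case E, an optional sign after it, and an optional suffix F) are our reading of the examples only, not the complete definition; the full rules may allow more (C, for instance, also accepts a lower-case e). That a regular expression suffices at all fits the observation that such constants form a regular language.

```python
import re

# Assumed rules, inferred from the examples only:
#   digits with a dot (at least one digit before or after it), optional exponent,
#   or digits with a mandatory exponent; an optional suffix F in both cases.
FLOAT = re.compile(r"([0-9]+\.[0-9]*|\.[0-9]+)(E[+-]?[0-9]+)?F?"
                   r"|[0-9]+E[+-]?[0-9]+F?")

def is_float_constant(s):
    """True if the whole string s is a float constant under the rules above."""
    return FLOAT.fullmatch(s) is not None
```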
Give a complete syntax diagram for floating point constants in decimal notation. Also give the complete BNF formulation of the above rules. Hint: work analogously to the given examples for integer constants and chains, defining several non-terminal symbols for the respective possible parts and forms of the number.
<stat> ::= <var> = <expr> ;
| if <expr> <stat>
| if <expr> <stat> else <stat>
Here <var> stands for variable and <expr> for expression.
Show that this grammar is ambiguous, by presenting two different
parse trees for the code fragment
if (a > 3)
if (a > 0)
a = a + 3;
else
a = a * b + 7;
Show two ways to modify the grammar of the if statement so that the ambiguity is eliminated.
Prove by structural induction that the number of verbs in an extended sentence equals the number of relative pronouns plus one. A correct BNF formulation of SNT' can be downloaded here.
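Before proving the claim, it can help to test it. The sketch below assumes that SNT' has the same phrase structure as the refined grammar shown earlier (<snt'>, <frm'>, <prrl>, <pgr>, <rlg>) and generates random sentence skeletons from these rules. Each phrase is written as a single category token; the tokens FRM, VRB, PRP and REL and the depth limit of 8 are our own simplifications. The check confirms, for the generated skeletons, that VRB occurs exactly once more than REL. Random testing is no substitute for the requested structural induction.

```python
import random

def frm_prime(rng, depth=0):
    """<frm'> ::= <frm> <prrl>, with <frm> collapsed to the token FRM."""
    tokens = ["FRM"]
    # <prrl> ::= <empty> | <pgr> | <rlg>; past depth 8 force <empty> to stop.
    choice = rng.randrange(3) if depth < 8 else 0
    if choice == 1:      # <pgr> ::= <prp> <frm'>
        tokens += ["PRP"] + frm_prime(rng, depth + 1)
    elif choice == 2:    # <rlg> ::= <rel> <vrb> <frm'>
        tokens += ["REL", "VRB"] + frm_prime(rng, depth + 1)
    return tokens

def sentence(rng):
    """<snt'> ::= <frm'> <vrb> <frm'>"""
    return frm_prime(rng) + ["VRB"] + frm_prime(rng)

def check_verbs(trials=1000, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        s = sentence(rng)
        assert s.count("VRB") == s.count("REL") + 1, s
    return True
```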
<sent> ::= a <sent> b
| b <sent> a
| <sent> <sent>
|
<sent> ::= a <sent> b (1)
| b <sent> a (2)
| <empty> (3)
| b <sentaa> b (4)
| a <sentbb> a (5)
<sentaa> ::= a <sentaa> b (6)
| b <sentaa> a (7)
| a <sent> a (8)
<sentbb> ::= a <sentbb> b (9)
| b <sentbb> a (10)
| b <sent> b (11)
<empty> ::= (12)
Give an example of a short string with equally many a's and b's which cannot be generated by this grammar. Generalize this to give infinitely many strings which cannot be generated, and prove this. Hint: use the way the strings are grown and the values of the attributes A() and B(), giving the number of a's and b's, respectively.
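For experimenting with this exercise, a membership test is useful. Every rule except (3) wraps exactly one non-terminal in one terminal on each side, so the sketch below (our own) decides membership by comparing the first and last symbol with each rule and recursing on the inside, caching results. The helper `missing(length)` lists all strings of a given length with equally many a's and b's that the grammar does not generate.

```python
from functools import lru_cache
from itertools import product

# Rules (1), (2), (4)-(11) as (left terminal, inner non-terminal, right terminal).
RULES = {
    "sent":   [("a", "sent", "b"), ("b", "sent", "a"),
               ("b", "sentaa", "b"), ("a", "sentbb", "a")],
    "sentaa": [("a", "sentaa", "b"), ("b", "sentaa", "a"), ("a", "sent", "a")],
    "sentbb": [("a", "sentbb", "b"), ("b", "sentbb", "a"), ("b", "sent", "b")],
}

@lru_cache(maxsize=None)
def generates(symbol, w):
    """True if the non-terminal `symbol` can produce the string w."""
    if symbol == "sent" and w == "":
        return True                      # rules (3) and (12)
    if len(w) < 2:
        return False
    return any(w[0] == left and w[-1] == right and generates(inner, w[1:-1])
               for left, inner, right in RULES[symbol])

def missing(length):
    """Strings of this length with equally many a's and b's not generated."""
    return ["".join(p) for p in product("ab", repeat=length)
            if p.count("a") == p.count("b")
            and not generates("sent", "".join(p))]
```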
<expr1> ::= <part>
| <part> <operator> <expr1>
<part> ::= <number>
| ( <expr1> )
This grammar generates exactly the same sequences of symbols
as the simpler grammar
<expr2> ::= <number>
| ( <expr2> )
| <expr2> <operator> <expr2>
Prove this (for example, by showing that both generate everything
we normally consider to be expressions with correctly placed
parentheses).
Two grammars are said to be weakly equivalent if they generate the same language. Two grammars are said to be strongly equivalent if they are weakly equivalent and if for any word of the language they construct isomorphic parse trees. The above proof shows that the grammars for expr1 and expr2 are weakly equivalent. However, they are not strongly equivalent as can be seen by checking that the parse trees of a * b + c are not isomorphic.
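A recursive-descent parser follows the expr1 grammar directly: one function per non-terminal, where `parse_expr1` reads a <part> and, if an operator follows, calls itself for the rest. The sketch below is ours; the operator set {+, -, *, /}, whitespace-separated tokens, and treating any other token (such as a, b, c) as a <number> are assumptions. It returns nested tuples, showing that expr1 allows only the right-nested grouping a * (b + c) for a * b + c, whereas expr2 also admits a tree that groups a * b first.

```python
OPERATORS = {"+", "-", "*", "/"}   # assumed operator set

def parse_expr1(tokens, pos=0):
    """<expr1> ::= <part> | <part> <operator> <expr1>; returns (tree, pos)."""
    left, pos = parse_part(tokens, pos)
    if pos < len(tokens) and tokens[pos] in OPERATORS:
        right, end = parse_expr1(tokens, pos + 1)
        return (left, tokens[pos], right), end
    return left, pos

def parse_part(tokens, pos):
    """<part> ::= <number> | ( <expr1> )"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":
        inner, pos = parse_expr1(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("missing )")
        return inner, pos + 1
    if token in OPERATORS or token == ")":
        raise SyntaxError(f"unexpected {token!r}")
    return token, pos + 1           # any other token counts as a <number>

def parse(text):
    """Parse a whole expression with whitespace-separated tokens."""
    tokens = text.split()
    tree, pos = parse_expr1(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree
```

For example, `parse("a * b + c")` gives `("a", "*", ("b", "+", "c"))`, while `parse("( a * b ) + c")` gives `(("a", "*", "b"), "+", "c")`.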
<S> ::= a <B> <C> (1)
a <B> ::= a a <B> <B> (2)
<B> <C> ::= bc (3)
| b <C> c (4)
<B> b ::= b <B> (5)
Give an algorithm (a step-by-step description of how to proceed) for constructing a sequence a...ab...bc...c with n a's, n b's and n c's. Alternatively, you may give an inductive proof that all strings of this form are generated.
Formalize and complete the proof that all generated strings without non-terminal symbols are of this form.
This grammar was presented as an example of a context-sensitive grammar that can generate a language which cannot be generated by a context-free grammar. However, as formulated, most rules do not follow the format allowed for context-sensitive grammars. Indicate for each rule whether the given formulation is regular, context-free, context-sensitive or general.
Rewrite the grammar (adding non-terminal symbols and production rules) so that it really becomes context-sensitive. Take care that the new grammar should still generate exactly the same sequences of terminal symbols.
We must be careful. Consider the following grammar:
<S> ::= <A>
<A> ::= <B>
<B> ::= <A>
| a
There are infinitely many ways to parse the sequence a:
a -> <B> -> <A> -> <S>
a -> <B> -> <A> -> <B> -> <A> -> <S>
a -> <B> -> <A> -> <B> -> <A> -> <B> -> <A> -> <S>
...
However, such detours are pointless: a derivation should not be made unnecessarily long. So, let t(R, G) denote the minimum number of production steps needed to produce a sequence of symbols R with grammar G. In terms of t() we define
T(n, G) = max_{R is a sequence of n symbols produced with G} t(R, G).
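For small grammars, t(R, G) can be computed by breadth-first search over sentential forms: the first time R is reached, the number of steps is minimal, so the detours listed above never matter. In the sketch below, S, A and B stand for <S>, <A> and <B> of the example grammar. Limiting sentential forms to the length of R is our simplification. It is safe only for grammars whose rules never shorten a form, which holds for this grammar, but not in general.

```python
from collections import deque

# The example grammar; S, A, B stand for <S>, <A>, <B>.
RULES = [("S", "A"), ("A", "B"), ("B", "A"), ("B", "a")]

def t(target, start="S"):
    """Minimum number of production steps from `start` to `target`, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        form, steps = queue.popleft()
        if form == target:
            return steps          # BFS: the first hit uses the fewest steps
        for lhs, rhs in RULES:
            i = form.find(lhs)
            while i != -1:
                nxt = form[:i] + rhs + form[i + len(lhs):]
                # Bound: valid only because no rule here shortens a form.
                if len(nxt) <= len(target) and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
                i = form.find(lhs, i + 1)
    return None
```

Here `t("a")` is 3, matching the shortest of the parses listed above (S to A to B to a).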
Of course T(n, G) depends on n. It also depends on G: if G has many production rules, the number of rules to apply to obtain a sequence may be larger. To know how hard parsing may be in general, we should give a lower estimate of
C(n, m) = max_{G is a grammar with m production rules} T(n, G).
Give a context-free grammar with m production rules for which producing sequences with n symbols requires about n * m steps. Because m is a constant, this is not so bad at all: n * m is linear in n. Show that in general C_cf(n, m) <= n * m, where C_cf(n, m) is defined just as C(n, m) with the maximum taken over all context-free grammars.
The situation is essentially worse for context-sensitive grammars. Let G be the grammar for generating the sequences {abc, aabbcc, aaabbbccc, ...}. Think of your algorithm for generating the sequence a...ab...bc...c with n a's, b's and c's. Show that the number of production steps in this algorithm is quadratic in n. That is, show that there is a constant d > 0 so that the algorithm takes at least d * n^2 steps.
Explain in a non-formal way why for this grammar any schedule for generating the sequence with n a's, b's and c's consists of at least a quadratic number of productions.
The above points together give a very important result. Writing C_cs(n, m) for the analogue of C(n, m) with the maximum taken over all context-sensitive grammars, we have for some d > 0: C_cs(n, m) >= d * n^2 > n * m >= C_cf(n, m) for n > m / d. Thus, when parsing a sentence from a context-sensitive language it may not only be harder to find the steps to make, but the number of these steps may also grow with the length of the sentence in a much more unpleasant way.