
The Lexical Scanner

The scanner is responsible for lexical analysis: it partitions the input stream into tokens (and skips white space and comments).

An object of the class lex_c contains (a pointer to) an open input stream. There can be multiple lex_c-objects at the same time (e.g. for include files).

Lexical errors are reported to the caller as special error tokens, and no error message is printed. This allows the caller to treat them like syntax errors (e.g., give the error message the same appearance, skip to the next full stop, etc.).


Construction:
lex_c:
lex_c(in_c * input);
This constructor creates a scanner object for a given input stream (an object of class in_c). The input stream must already be open, i.e. its method is_open must return BOOL_TRUE. Thus, when a scanner object is created, any errors from opening the file have already been handled. The constructor already reads the first input token, so functions like tok_type can be called immediately.


Functions for Getting Input:

tok_type:
tok_t tok_type(void) const;
This function returns the type of the current token (such as TOK_ATOM). Token types are defined in tok.h.
tok_ptr:
str_t tok_ptr(void) const;
This function returns a pointer to the first character of the current token in the input buffer. The text of the token is delimited by the pointer returned by tok_end. Note that the token is not null-terminated.
tok_end:
str_t tok_end(void) const;
This function returns a pointer to the input character just after the current token. For instance, for a one-character token like "(", we would have tok_end()=tok_ptr()+1. The tok_end pointer should be used only as an end marker, not as a lookahead: it is not guaranteed that *tok_end() really is the next input character. It could also be the null character (although this does not happen in the current implementation, which always keeps a complete line in the buffer).
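Since the token is not null-terminated, its text has to be copied using the pointer range rather than strlen. The following is a minimal sketch of this, assuming that str_t is a pointer to constant characters; the helper name token_text is hypothetical and not part of lex_c.

```cpp
#include <cassert>
#include <string>

// Assumption: str_t is a pointer to constant characters.
typedef const char *str_t;

// Hypothetical helper: copy the half-open range [begin, end) into an
// owning string. The range is not null-terminated, so the length is
// taken from the pointer difference, never from strlen().
std::string token_text(str_t begin, str_t end)
{
    assert(begin != nullptr && end != nullptr && begin <= end);
    return std::string(begin, static_cast<std::string::size_type>(end - begin));
}
```

With a scanner object lex, the call would presumably look like token_text(lex.tok_ptr(), lex.tok_end()). Note that *end is never read, in line with the rule that tok_end is only an end marker.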
tok_int:
int tok_int(void) const;
This function returns the integer value of the current token. It may only be called if tok_type()=TOK_INT. Note that the minimal and maximal integers are defined as constants in lex.h. Thus, the range for integer tokens can be smaller than what the machine actually supports. This helps SLP behave in the same way on different platforms. Currently, the minimum integer is -65536 and the maximum integer is 65535 (i.e. 16 bits plus a sign bit). This should be very portable. If an integer constant in the input violates this range, the token type is TOK_INT_OVERFLOW.
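The range check can be illustrated with a small sketch. It is not the code from lex.c: the constant names SKETCH_INT_MIN and SKETCH_INT_MAX and the function parse_int_token are hypothetical, and it is only an assumption that a leading minus sign belongs to the integer token. The function returns false exactly in the case where the scanner would report TOK_INT_OVERFLOW.

```cpp
#include <cassert>

// Hypothetical names; the real constants are defined in lex.h.
const long SKETCH_INT_MIN = -65536L;
const long SKETCH_INT_MAX = 65535L;

// Convert the digits in [begin, end) to an integer.
// Returns true and stores the result if it lies in the allowed range,
// false if the constant is out of range (the overflow case).
bool parse_int_token(const char *begin, const char *end, int *value)
{
    assert(begin != nullptr && end != nullptr && begin < end && value != nullptr);
    bool negative = false;
    if (*begin == '-') {   // assumption: the sign is part of the token
        negative = true;
        ++begin;
        assert(begin < end);
    }
    long magnitude = 0;
    for (const char *p = begin; p < end; ++p) {
        assert(*p >= '0' && *p <= '9');
        magnitude = magnitude * 10 + (*p - '0');
        // Stop early, so that very long digit strings cannot
        // overflow the machine type used for the computation.
        if (magnitude > -SKETCH_INT_MIN)
            return false;
    }
    long v = negative ? -magnitude : magnitude;
    if (v < SKETCH_INT_MIN || v > SKETCH_INT_MAX)
        return false;
    *value = static_cast<int>(v);
    return true;
}
```

Checking the limit inside the loop keeps the intermediate value small, so the result does not depend on the width of long on a given platform.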
tok_op:
op_t tok_op(void) const;
This function returns the operator value of the current token. The current token type must be TOK_MON_OP or TOK_BIN_OP or TOK_COMMA. The operator of TOK_COMMA is OP_AND.
is_v:
bool_t is_v(void) const;
This function returns BOOL_TRUE if the current token is the reserved atom "v". This atom represents disjunction, but it happened to me several times that I forgot this and used it as a predicate name. The error message was then not very helpful. Therefore, I decided to add a note to the error message reminding the user of the special meaning of "v" if the error is detected at this token.
next:
void next(void);
This function reads the next token (and thereby changes the values returned by the methods tok_type, tok_ptr, and tok_end). Note that next is implicitly called from the constructor, so it is not necessary (and in fact a mistake) to call next before the first token. The input file must not contain null characters or very long lines. Otherwise, this method prints an error message and simply exits the program. If the operating system reports an error during the read operation, an error message is printed and the end of file is assumed. Lexical errors are handled by returning special error tokens (see tok.h). No error message is printed in this case.
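A typical caller loops over the tokens roughly as sketched below. The sketch does not use the real lex_c: fake_lex_c is a hypothetical stand-in that replays a fixed token sequence, and the token names TOK_ERROR and TOK_EOF are assumptions (the real token types are defined in tok.h). It shows two points from the description above: the first token is already available after construction, and lexical errors arrive as ordinary tokens which the caller reports itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// TOK_ATOM and TOK_INT appear in the documentation; TOK_ERROR and
// TOK_EOF are assumed names used only for this sketch.
enum tok_t { TOK_ATOM, TOK_INT, TOK_ERROR, TOK_EOF };

// Hypothetical stand-in for lex_c which replays a token sequence.
class fake_lex_c {
public:
    explicit fake_lex_c(const std::vector<tok_t> &toks)
        : toks_(toks), pos_(0) {}
    tok_t tok_type() const
    {
        return pos_ < toks_.size() ? toks_[pos_] : TOK_EOF;
    }
    void next()
    {
        assert(pos_ < toks_.size()); // stand-in only: no next() after EOF
        ++pos_;
    }
private:
    std::vector<tok_t> toks_;
    std::size_t pos_;
};

struct scan_summary {
    int tokens; // all tokens seen before the end of the input
    int errors; // how many of them were error tokens
};

template <class Scanner>
scan_summary scan_all(Scanner &lex)
{
    scan_summary s = {0, 0};
    // No call to next() here: the first token is already current.
    while (lex.tok_type() != TOK_EOF) {
        if (lex.tok_type() == TOK_ERROR)
            ++s.errors; // the caller decides how to report the error
        ++s.tokens;
        lex.next();
    }
    return s;
}
```

Because scan_all is a template, the same loop would work with any class that offers tok_type and next in this form.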


Diagnosis Functions:

filename:
str_t filename(void) const;
This method returns the name of the currently open file. It does so by calling the corresponding method of the in_c-object given to the constructor. The filename may be STR_NIL for input devices which are not files.
line_no:
int line_no(void) const;
This method returns the current line number. Line numbers are counted from 1.
show:
void show(void) const;
This function prints the current input line and highlights the current token within this line. It is intended for syntax error messages. It assumes that the output cursor is currently at the beginning of a new line.
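How the highlighting looks is not specified here. One common style is a second line with a marker under the token, as in the following sketch. The function highlight_token and the use of '^' are assumptions for illustration, not the actual output format of show; the token is given by offsets into the line, which a caller could compute from tok_ptr and tok_end relative to a (hypothetical) pointer to the start of the line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the input line followed by a marker
// line with '^' characters under the token [tok_begin, tok_end).
std::string highlight_token(const std::string &line,
                            std::size_t tok_begin, std::size_t tok_end)
{
    assert(tok_begin <= tok_end && tok_end <= line.size());
    std::string marker;
    for (std::size_t i = 0; i < tok_begin; ++i) {
        // Keep tabs, so that the marker stays aligned with the line.
        marker += (line[i] == '\t') ? '\t' : ' ';
    }
    std::size_t width = tok_end - tok_begin;
    // An empty token (e.g. at the end of the line) still gets one marker.
    marker.append(width > 0 ? width : 1, '^');
    return line + "\n" + marker + "\n";
}
```

The result ends with a newline, so further output again starts at the beginning of a line.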


Implementation:


Stefan Brass (sbrass@sis.pitt.edu), October 2, 2001.