The scanner is responsible for the lexical analysis: It partitions the input stream into tokens (and skips white space and comments).
An object of the class lex_c contains (a pointer to) an open input stream. There can be multiple lex_c-objects at the same time (e.g. for include files).
Lexical errors are reported to the caller as special error tokens, and no error message is printed. This allows the caller to treat them like syntax errors (e.g. give the error message the same appearance, skip to the next full stop, etc.).
lex_c(in_c * input);
The constructor takes a pointer to the input stream (an object of the class in_c).
The input stream must already be open, i.e. its method is_open must return BOOL_TRUE.
So when a scanner object is created, any errors during the file open have already been handled.
The constructor already reads the first input token, so functions like tok_type can be called immediately.
tok_t tok_type(void) const;
Returns the type of the current token (e.g. TOK_ATOM).
Token types are defined in tok.h.
str_t tok_ptr(void) const;
Returns a pointer to the first character of the current token. The end of the token is given by tok_end.
Note that the token is not null-terminated.
str_t tok_end(void) const;
Returns a pointer to the first character after the current token. For example, if the token is "(", we would have tok_end() = tok_ptr() + 1.
The tok_end pointer should be used only as an end marker, not as a lookahead.
It is not guaranteed that *tok_end() really is the next input character.
It could also be the null character (although this does not happen in the current implementation, which always keeps a complete line in the buffer).
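Because the token is not null-terminated, a caller that needs an ordinary string has to copy the range between the two pointers. The following is only a sketch: it assumes that str_t behaves like a pointer to char, and token_text is a hypothetical helper, not part of lex_c.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of lex_c): copies the half-open range
// [begin, end) into an owned string.
// Assumption: str_t can be used as a const char pointer.
std::string token_text(const char *begin, const char *end)
{
    assert(begin != nullptr && end != nullptr && begin <= end);
    // Only the distance between the pointers is used; *end is never
    // read, so the end pointer serves purely as an end marker.
    return std::string(begin, end);
}
```

With a scanner object lex, the call would look like token_text(lex.tok_ptr(), lex.tok_end()).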
int tok_int(void) const;
Returns the value of the current integer token. This is only meaningful if tok_type() = TOK_INT.
Note that the minimal and maximal integers are defined as constants in lex.h.
Thus, the range for integer tokens can be smaller than what the machine actually supports.
This helps SLP behave in the same way on different platforms.
Currently, the minimum integer is -65536 and the maximum integer is 65535 (i.e. the range -2^16 to 2^16 - 1, which is a 17-bit two's-complement range rather than a 16-bit one).
This should be very portable.
If an integer constant in the input violates this range, the token type is TOK_INT_OVERFLOW.
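The range check can be illustrated as follows. This is a sketch under assumptions, not the code of the scanner: the SKETCH_ names are hypothetical stand-ins for the constants in lex.h and the token types in tok.h, and it is assumed here that a leading minus sign belongs to the integer being classified, which the text above does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the constants defined in lex.h.
const long SKETCH_INT_MIN = -65536;
const long SKETCH_INT_MAX = 65535;

// Hypothetical stand-ins for two of the token types in tok.h.
enum sketch_tok_t { SKETCH_TOK_INT, SKETCH_TOK_INT_OVERFLOW };

// Classifies a well-formed decimal string with an optional leading '-'.
// On success, *value receives the number.
sketch_tok_t classify_int(const std::string &digits, int *value)
{
    assert(value != nullptr && !digits.empty());
    std::size_t i = 0;
    bool negative = false;
    if (digits[0] == '-') {
        negative = true;
        i = 1;
    }
    long magnitude = 0;
    for (; i < digits.size(); ++i) {
        assert(digits[i] >= '0' && digits[i] <= '9');
        magnitude = magnitude * 10 + (digits[i] - '0');
        // Stop early so the accumulator itself can never overflow,
        // no matter how many digits follow.
        if (magnitude > -SKETCH_INT_MIN)
            return SKETCH_TOK_INT_OVERFLOW;
    }
    long result = negative ? -magnitude : magnitude;
    if (result < SKETCH_INT_MIN || result > SKETCH_INT_MAX)
        return SKETCH_TOK_INT_OVERFLOW;
    *value = static_cast<int>(result);
    return SKETCH_TOK_INT;
}
```

Checking against fixed constants instead of the limits of int is what makes the result independent of the platform.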
op_t tok_op(void) const;
Returns the operator of the current token. This is only meaningful if the token type is TOK_MON_OP or TOK_BIN_OP or TOK_COMMA.
The operator of TOK_COMMA is OP_AND.
bool_t is_v(void) const;
Returns BOOL_TRUE if the current token is the reserved atom "v".
This atom represents disjunction, but it happened to me several times that I forgot that and used it as a predicate name.
Then the error message was not very helpful.
Therefore I decided to add to the error message a note reminding the user of the special meaning of "v" in case the error is detected at this token.
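A caller could use this method roughly as sketched below. The function syntax_error_text and the wording of the note are hypothetical; only the idea of appending a reminder when the error is detected at "v" comes from the description above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the text of a syntax error message and
// appends a reminder if the error was detected at the reserved atom "v".
std::string syntax_error_text(const std::string &message, bool at_v)
{
    assert(!message.empty());
    std::string text = message;
    if (at_v)
        text += " (note: \"v\" is the reserved atom for disjunction)";
    return text;
}
```

With a scanner object lex, the second argument would be lex.is_v() == BOOL_TRUE.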
void next(void);
This function reads the next token (and thereby changes the values returned by the methods tok_type, tok_ptr, and tok_end).
Note that next is implicitly called from the constructor, so it is not necessary (and in fact a mistake) to call next before the first token.
The input file must not contain null characters or very long lines.
Otherwise this method prints an error message and simply exits the program.
If the operating system reports an error during the read operation, an error message is printed and the end of file is assumed.
Lexical errors are treated by returning special error tokens (see tok.h).
No error message is printed in this case.
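The calling convention described above (the constructor reads the first token, next advances, lexical errors come back as tokens) can be sketched with a toy scanner. Everything in this example is hypothetical: mini_lex_c only imitates the interface of lex_c on an in-memory string, and the MINI_ token types stand in for those defined in tok.h.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token types; the real ones are defined in tok.h.
enum mini_tok_t { MINI_ATOM, MINI_ERROR, MINI_EOF };

// Toy scanner over a string: alphanumeric runs are atoms, any other
// non-space character is a lexical error.
class mini_lex_c {
public:
    explicit mini_lex_c(const std::string &text)
        : text_(text), begin_(0), end_(0), type_(MINI_EOF)
    {
        next();  // priming read: the first token is available at once
    }
    mini_tok_t tok_type() const { return type_; }
    const char *tok_ptr() const { return text_.data() + begin_; }
    const char *tok_end() const { return text_.data() + end_; }
    void next()
    {
        std::size_t i = end_;
        while (i < text_.size() && is_space(text_[i]))
            ++i;  // skip white space
        begin_ = i;
        if (i == text_.size()) {
            type_ = MINI_EOF;
        } else if (is_alnum(text_[i])) {
            while (i < text_.size() && is_alnum(text_[i]))
                ++i;
            type_ = MINI_ATOM;
        } else {
            ++i;  // lexical error: reported as a token, nothing printed
            type_ = MINI_ERROR;
        }
        end_ = i;
    }
private:
    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool is_alnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
    std::string text_;
    std::size_t begin_, end_;
    mini_tok_t type_;
};

// Collects all token types. next() is called after each token has
// been inspected, never before the first one.
std::vector<mini_tok_t> scan_all(const std::string &text)
{
    std::vector<mini_tok_t> types;
    for (mini_lex_c lex(text); lex.tok_type() != MINI_EOF; lex.next())
        types.push_back(lex.tok_type());
    return types;
}
```

Calling next before inspecting the first token would silently skip it, which is why the loop in scan_all advances only at the end of each iteration.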
str_t filename(void) const;
Returns the filename of the in_c-object given to the constructor.
The filename may be STR_NIL for input devices which are not files.
int line_no(void) const;
void show(void) const;