CS 440
Program 2
due October 8, 2015
extended to November 5
The Program
For Program 2 you will write two modules of your PAXI compiler, the
scanner and the symbol table. In order to test your
program you will also write two test drivers which exercise
these modules.
The Scanner
The scanner will be written using flex. It will recognize as
a lexeme each of the keywords listed in the PAXI Language Definition and
each of the symbols listed there except ' (single quote) and "
(double quote). It will return an integer token for each of
these lexemes.
Tokens can be chosen arbitrarily (but must be non-zero). They
must be given readable names which are defined in a header
file #included into your flex file.
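A minimal sketch of such a header follows. The file name tokens.h and every TOK_ name are hypothetical; only the requirement that each value be non-zero comes from this handout (yylex() returns 0 at end of input, so 0 cannot be a token).

```c
/* tokens.h: hypothetical token names for the scanner.
   Values are arbitrary but start at 1 so that none is zero. */
#ifndef TOKENS_H
#define TOKENS_H

enum token {
    TOK_IDENTIFIER = 1,
    TOK_NUMBER,
    TOK_STRING,
    TOK_CHAR
    /* add one readable name per PAXI keyword and symbol here */
};

#endif /* TOKENS_H */
```

An enum is used here only because it numbers the names automatically; a list of #define lines would serve equally well.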
Your scanner must also recognize:
- identifiers
action: return token (more will be added here in Program 3)
- literal integers
action: save the integer value in a global variable and return a
token
- quoted strings
action: remove quote marks, add to string list, and
return token
- single characters (enclosed in single quotes)
action: return token
- new line characters ('\n')
action: increment line counter
- comments
action: none
- white space (regular expression: [ \t])
action: ignore
- characters which do not match the lexical structure of PAXI
(regular expression: . written without quotes, since flex treats "." as a literal period)
action: give error message and exit program.
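The "remove quote marks" step of the quoted-string action could be sketched as a small C helper. The name strip_quotes is hypothetical, and the sketch assumes the matched lexeme begins and ends with a quote mark, as it will if the flex pattern requires both.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of a quoted lexeme without its first
   and last characters (the quote marks). Returns NULL if the lexeme is
   shorter than two characters or memory cannot be allocated.
   The caller owns the returned string. */
char *strip_quotes(const char *lexeme, size_t len)
{
    char *copy;

    if (len < 2)
        return NULL;
    copy = malloc(len - 1);            /* len - 2 characters plus '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, lexeme + 1, len - 2); /* skip the opening quote */
    copy[len - 2] = '\0';
    return copy;
}
```

Inside the flex action the helper would be called with the scanner's own yytext and yyleng, and the result handed to the string list.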
The String List
Whenever a quoted string is recognized it is added (without quote
marks) to a string list. The string list must be kept in the
order in which the strings are found in the source file. I.e. when a
new string is found it is added to the end of the list. The
string list will be a simple linked list.
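One way to keep the list in source order is to remember the last node and append there. The sketch below is only an illustration; the struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct string_node {
    char *text;                  /* the string, without quote marks */
    struct string_node *next;
};

static struct string_node *list_head = NULL;
static struct string_node *list_tail = NULL;

/* Appends a copy of text to the end of the list.
   Returns 1 on success, 0 if memory could not be allocated. */
int string_list_append(const char *text)
{
    struct string_node *node = malloc(sizeof *node);

    if (node == NULL)
        return 0;
    node->text = malloc(strlen(text) + 1);
    if (node->text == NULL) {
        free(node);
        return 0;
    }
    strcpy(node->text, text);
    node->next = NULL;

    if (list_tail == NULL)
        list_head = node;        /* first string found */
    else
        list_tail->next = node;  /* link after the current last node */
    list_tail = node;
    return 1;
}

/* Returns the first node so a caller can walk the list in source order. */
const struct string_node *string_list_first(void)
{
    return list_head;
}
```

Keeping a tail pointer makes each append constant time; walking from the head on every append would also work for a program of this size.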
The Symbol Table
The symbol table will be a hash table. The simplest hash
table organization ("separate chaining" for synonyms) will work well
here. The table's internal workings will not be visible
outside of the module in which it is defined. Access will be through
two operations: lookup and insert.
Lookup will take a character string as parameter, search
the symbol table and return a pointer to a symbol table entry if one
is found having that string as its (identifier) key. If no entry was
found lookup will return a NULL pointer.
Insert will create a new entry in the symbol table.
Before making the new entry it will first check to see if an entry
already exists with the same (identifier) key. If such an entry already
exists, insert will return a value indicating that it failed; otherwise it
will return a value indicating success.
Symbol table entries will have four fields: name, type, size, and location.
The name field will hold the identifier. The type
field will hold one of the values variable, array,
procedure, parameter, or local variable.
Size will be 1 in the case of a (scalar) variable, the array
length in the case of an array, and the number of formal parameters in
the case of a procedure. Location will be the address (data
store index) of a variable or array and the entry point (code store
index) for a procedure. For Program 2 the size and location fields
will be set to 0 (this will be changed in later programs).
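The description above might be realized along the following lines. This is a sketch under assumptions, not a required design: the bucket count, the hash function, the enum names, and the choice of 1 and 0 as the success and failure values are all hypothetical. Each entry carries the four fields plus a link used only for chaining.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211           /* hypothetical number of buckets */

enum sym_type { SYM_VARIABLE, SYM_ARRAY, SYM_PROCEDURE,
                SYM_PARAMETER, SYM_LOCAL };

struct symbol {
    char *name;                  /* the identifier (hash key) */
    enum sym_type type;
    int size;                    /* 0 for Program 2 */
    int location;                /* 0 for Program 2 */
    struct symbol *next;         /* next synonym in the same bucket */
};

/* static: the table itself is not visible outside this module */
static struct symbol *buckets[TABLE_SIZE];

static unsigned hash_name(const char *s)
{
    unsigned h = 0;

    while (*s != '\0')
        h = h * 31u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Returns the entry whose name matches, or NULL if there is none. */
struct symbol *lookup(const char *name)
{
    struct symbol *p;

    for (p = buckets[hash_name(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Returns 1 if a new entry was made, 0 if the name already exists
   (or memory could not be allocated). */
int insert(const char *name, enum sym_type type)
{
    struct symbol *entry;
    unsigned h;

    if (lookup(name) != NULL)
        return 0;
    entry = malloc(sizeof *entry);
    if (entry == NULL)
        return 0;
    entry->name = malloc(strlen(name) + 1);
    if (entry->name == NULL) {
        free(entry);
        return 0;
    }
    strcpy(entry->name, name);
    entry->type = type;
    entry->size = 0;
    entry->location = 0;
    h = hash_name(name);
    entry->next = buckets[h];    /* push onto the front of the chain */
    buckets[h] = entry;
    return 1;
}
```

New entries go at the front of their chain because order within a bucket does not matter for lookup.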
The Test Drivers
You will write a test driver to run your scanner and another to test
your symbol table functions.
For the first test driver, you will write a program which calls the
scanner (yylex()) in a loop which terminates on end of input
(when yylex() returns 0). Each time yylex() returns a token your
program will print out the (integer) token value. If the token is
that for identifier, number, or string the program will also display
the recognized identifier, number, or string. After the program has
finished scanning the input it will display the entire string list.
The second test driver will display a simple menu to the user with
three options: insert, lookup, and quit.
If the user chooses insert the program will accept an
identifier (character string) and type (as described above) and call
the symbol table's insert operation to make a new table entry. The
test program will display whether the insertion was successful or
not. If the user chooses lookup the program will accept an
identifier and attempt to find a symbol table entry for that identifier.
If none is found a message to that effect will be displayed. If one
is found its four fields will be displayed. When the user chooses
quit the program will display the entire symbol
table, in a readable form, and then exit.
Opening Files
Your scanner test driver must get the name of an input file from
the command line, open the file
for reading and assign it to yyin.
yyin is declared by flex to be of type
FILE*. This means that you should open the input file using the C
fopen() function and assign the returned pointer to yyin. If
your program was unable to open the input file (fopen() returned a
NULL pointer) you should display a polite error message and exit.
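The file-opening step could be sketched as a helper like the one below. The function name open_input and the wording of the messages are hypothetical; the extra check for a missing file name is an assumption beyond what the handout requires.

```c
#include <stdio.h>

/* Opens the file named by the first command-line argument for reading.
   Returns the open stream, or NULL after printing a message to stderr. */
FILE *open_input(int argc, char *argv[])
{
    FILE *fp;

    if (argc < 2) {
        fprintf(stderr, "usage: %s input-file\n",
                argc > 0 ? argv[0] : "scanner");
        return NULL;
    }
    fp = fopen(argv[1], "r");
    if (fp == NULL)
        fprintf(stderr, "Sorry, could not open %s for reading.\n", argv[1]);
    return fp;
}
```

In the test driver's main() the returned pointer would be assigned to yyin, and the program would exit if that pointer is NULL, before the first call to yylex().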
To Hand In
As with Program 1 you will hand in your source code
and a terminal session using sample data to be provided later.