CS 440-540
Program 1
due May 30, 2008
The Program
For Program 1 you will write two modules of your Paxi compiler: the
scanner and the symbol table. To test these modules you will
also write two test drivers which exercise
them.
The Scanner
The scanner will be written using flex or jflex. It
will recognize as a lexeme each of the keywords listed in the
Paxi Language Definition
and each of the symbols listed there except ' (single quote) and
" (double quote). It will return an integer token for
each of these lexemes.
Token values can be chosen arbitrarily but must be non-zero, because
yylex() returns 0 at end of input. Tokens must be given readable names,
defined as constants (Java) or in a header file #included into your
flex file (C++).
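As a rough illustration of the C++ route, a token header might look like the sketch below. The file name tokens.h and every token name in it are hypothetical placeholders; the actual set of keywords and symbols comes from the Paxi Language Definition.

```cpp
// tokens.h (hypothetical name): #included by the flex file and the test driver.
#ifndef TOKENS_H
#define TOKENS_H

// Token values are arbitrary but must be non-zero, because yylex()
// returns 0 to signal end of input. Starting the enum at 1 guarantees that.
// Every name below is an illustrative placeholder, not the real Paxi token set.
enum Token {
    TOK_IDENTIFIER = 1,
    TOK_NUMBER,
    TOK_STRING,
    TOK_CHAR,
    TOK_KEYWORD_EXAMPLE,  // one constant per Paxi keyword would go here
    TOK_SYMBOL_EXAMPLE    // one constant per Paxi symbol would go here
};

#endif // TOKENS_H
```

An enum is only one option; a list of const int or #define constants would satisfy the same requirement.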
Your scanner must also recognize:
- identifiers
action: return token (more will be added here in Program 2)
- literal integers
action: save the integer value in a variable and return a token
- quoted strings
action: remove quote marks, add to string list, and
return token
- single characters (enclosed in single quotes)
action: return token
- new line characters ('\n')
action: increment line counter
- comments
action: none
- white space (regular expression: [ \t])
action: ignore
- characters which do not match the lexical structure of Paxi
(regular expression: . written without quotes, since "." in flex matches only a literal period)
action: give error message and exit program.
The String List
Whenever a quoted string is recognized, it is added (without quote
marks) to a string list. The string list must be kept in the
order in which the strings are found in the source file; that is, each
new string is appended to the end of the list.
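One possible shape for the string list in C++ is sketched below. The names stringList and addQuotedString are assumptions made for illustration, and the sketch assumes the scanner passes in the matched lexeme (yytext in flex) with its surrounding quote marks still attached.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Strings in the order in which they appear in the source file.
// The name stringList is an assumption, not part of the assignment.
std::vector<std::string> stringList;

// Intended to be called from the quoted-string action with the matched
// lexeme, which still includes the opening and closing quote marks.
// Strips those two characters, appends the rest to the end of the list,
// and returns the index at which the string was stored.
std::size_t addQuotedString(const std::string& lexeme) {
    std::string body;
    if (lexeme.size() >= 2) {
        // Drop the first and last character (the quote marks).
        body = lexeme.substr(1, lexeme.size() - 2);
    }
    stringList.push_back(body);
    return stringList.size() - 1;
}
```

Appending with push_back keeps the list in source order without any extra bookkeeping.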
The Symbol Table
The symbol table will be a hash table. The simplest hash
table organization ("separate chaining" for synonyms) will work well
here. The table's internal workings will not be visible
outside of the module in which it is defined. Access will be through
two operations: lookup and insert.
Lookup will take a character string as its parameter, search
the symbol table, and return a pointer to the symbol table entry
that has that string as its (identifier) key. If no such entry is
found, lookup will return null.
Insert will create a new entry in the symbol table.
Before making the new entry, it will first check whether an entry
already exists with the same (identifier) key. If one does,
insert will return a value indicating failure; otherwise it
will create the entry and return a value indicating success.
Symbol table entries will have four fields: name, type, size, and
location. The name field will hold the identifier. The type
field will hold one of the values variable, array,
procedure, parameter, or local variable.
Size will be 1 in the case of a (scalar) variable, the array
length in the case of an array, and the number of formal parameters in
the case of a procedure. Location will be the address (data
store index) of a variable or array and the entry point (code store
index) for a procedure. For Program 1 the size and location fields
will be set to 0 (this will be changed in later programs).
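The description above could be sketched in C++ roughly as follows. This is one possible layout under stated assumptions, not a required design: the class and member names, the default bucket count of 101, and the hash function are all choices made for illustration, and the bucket count is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Possible values of the type field.
enum SymbolType { VARIABLE, ARRAY, PROCEDURE, PARAMETER, LOCAL_VARIABLE };

struct SymbolEntry {
    std::string name;  // the identifier, used as the key
    SymbolType type;
    int size;          // set to 0 for Program 1
    int location;      // set to 0 for Program 1
};

class SymbolTable {
public:
    // The bucket count is an arbitrary choice and must be non-zero.
    explicit SymbolTable(std::size_t buckets = 101) : table_(buckets) {}

    // Returns a pointer to the entry whose key is name, or nullptr if none.
    SymbolEntry* lookup(const std::string& name) {
        for (SymbolEntry& e : table_[hash(name)]) {
            if (e.name == name) return &e;
        }
        return nullptr;
    }

    // Returns false if name is already present; otherwise adds it and returns true.
    bool insert(const std::string& name, SymbolType type) {
        if (lookup(name) != nullptr) return false;
        table_[hash(name)].push_back(SymbolEntry{name, type, 0, 0});
        return true;
    }

private:
    // A simple polynomial string hash; any reasonable hash would do.
    std::size_t hash(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) h = h * 31 + c;
        return h % table_.size();
    }

    // Separate chaining: each bucket holds the entries whose keys collide.
    std::vector<std::list<SymbolEntry>> table_;
};
```

Using std::list for the chains means a pointer returned by lookup stays valid when later entries are inserted, which matters if other modules hold on to entry pointers.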
The Test Drivers
You will write a test driver to run your scanner and another to test
your symbol table functions.
For the first test driver (for your scanner),
you will write a program which calls the
scanner (yylex()) in a loop which terminates on end of input
(when yylex() returns 0). Each time yylex() returns a token your
program will print out the (integer) token value. If the token is
the one for an identifier, number, or string, the program will also display
the recognized identifier, number, or string. After the program has
finished scanning the input, it will display the entire string list.
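The loop described above might be organized as in the following sketch. So that it compiles without a generated scanner, it takes the scanner as two callables: nextToken stands in for yylex() and lexemeText stands in for however your scanner exposes the recognized text (for example yytext). Those parameter names, the token constants, and the output format are all assumptions.

```cpp
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token values; the real ones come from your token definitions.
const int TOK_IDENTIFIER = 1;
const int TOK_NUMBER = 2;
const int TOK_STRING = 3;

// Builds one output line: the token value, plus the recognized text
// when the token is an identifier, number, or string.
std::string formatTokenLine(int token, const std::string& lexeme) {
    std::ostringstream line;
    line << token;
    if (token == TOK_IDENTIFIER || token == TOK_NUMBER || token == TOK_STRING) {
        line << " " << lexeme;
    }
    return line.str();
}

// Calls the scanner until it returns 0 (end of input), printing each token,
// then prints the whole string list in the order the strings were found.
void runScannerDriver(const std::function<int()>& nextToken,
                      const std::function<std::string()>& lexemeText,
                      const std::vector<std::string>& stringList,
                      std::ostream& out) {
    int token;
    while ((token = nextToken()) != 0) {
        out << formatTokenLine(token, lexemeText()) << "\n";
    }
    out << "String list:\n";
    for (const std::string& s : stringList) {
        out << s << "\n";
    }
}
```

In a real driver, nextToken would simply wrap yylex() and out would be std::cout. The string list is passed by reference, so strings the scanner appends during the loop are visible when the list is printed afterwards.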
The second test driver (for your symbol table)
will display a simple menu to the user with
three options: insert, lookup, and quit.
If the user chooses insert the program will accept an
identifier (character string) and type (as described above) and call
the symbol table's insert operation to make a new table entry. The
test program will display whether the insertion was successful or
not. If the user chooses lookup, the program will accept an
identifier and attempt to find a symbol table entry for that identifier.
If none is found a message to that effect will be displayed. If one
is found its four fields will be displayed. When the user chooses
quit the program will display the entire symbol
table, in a readable form, and then exit.
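A minimal sketch of such a menu loop is shown below. To keep it self-contained it uses a stand-in table built on std::map rather than the hash table the assignment requires; StandInTable, runMenu, the one-command-per-line input format, and the message texts are all assumptions made for illustration.

```cpp
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in entry and table so the sketch compiles on its own. The real
// driver would call the hash-table module's insert and lookup instead.
struct Entry {
    std::string name;
    std::string type;
    int size;
    int location;
};

class StandInTable {
public:
    // Returns false if name is already present.
    bool insert(const std::string& name, const std::string& type) {
        return entries_.emplace(name, Entry{name, type, 0, 0}).second;
    }
    // Returns nullptr if name is not present.
    const Entry* lookup(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? nullptr : &it->second;
    }
    // Prints the four fields of one entry.
    static void print(const Entry& e, std::ostream& out) {
        out << e.name << " " << e.type << " " << e.size << " " << e.location << "\n";
    }
    // Prints every entry in the table.
    void dump(std::ostream& out) const {
        for (const auto& kv : entries_) print(kv.second, out);
    }
private:
    std::map<std::string, Entry> entries_;
};

// Reads one command per line until "quit" or end of input, then dumps the table.
void runMenu(StandInTable& table, std::istream& in, std::ostream& out) {
    std::string line;
    while (true) {
        out << "insert <name> <type> | lookup <name> | quit\n";
        if (!std::getline(in, line)) break;  // end of input is treated like quit
        std::istringstream words(line);
        std::string choice, name, type;
        words >> choice;
        if (choice == "quit") {
            break;
        } else if (choice == "insert" && (words >> name >> type)) {
            out << (table.insert(name, type) ? "inserted" : "insert failed (duplicate)") << "\n";
        } else if (choice == "lookup" && (words >> name)) {
            const Entry* e = table.lookup(name);
            if (e) StandInTable::print(*e, out);
            else out << "no entry for " << name << "\n";
        } else {
            out << "unrecognized input\n";
        }
    }
    table.dump(out);
}
```

Passing the streams as parameters lets the same loop run interactively with std::cin and std::cout or be fed the sample data from a file when recording the terminal session.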
To Hand In
You will hand in your source code
and a terminal session using sample data to be provided later.