CS 440-540
Program 1
due May 30, 2008

The Program

For Program 1 you will write two modules of your Paxi compiler: the scanner and the symbol table. To test your program you will also write two test drivers which exercise these modules.

The Scanner

The scanner will be written using flex or jflex. It will recognize as a lexeme each of the keywords listed in the Paxi Language Definition and each of the symbols listed there except ' (single quote) and " (double quote). It will return an integer token for each of these lexemes.

Tokens can be chosen arbitrarily (but must be non-zero). They must be given readable names which are defined as constants (Java) or defined in a header file #included into your flex file (C++).
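For the C++ case, a token header might look something like the sketch below. The file name and every constant name here are assumptions made for illustration only; the actual keyword and symbol tokens must follow the Paxi Language Definition.

```cpp
// tokens.h (hypothetical file name): readable names for scanner tokens.
#ifndef TOKENS_H
#define TOKENS_H

// Values are arbitrary but must be non-zero, because yylex() returns 0
// at end of input. All names below are illustrative placeholders.
enum Token {
    T_IDENTIFIER = 1,        // enumerators count up from 1, so none is 0
    T_NUMBER,
    T_STRING,
    T_KEYWORD_PLACEHOLDER,   // in practice: one constant per Paxi keyword
    T_SYMBOL_PLACEHOLDER     // in practice: one constant per Paxi symbol
};

// Compile-time check of the non-zero requirement for the first value.
static_assert(T_IDENTIFIER != 0, "token values must be non-zero");

#endif
```

Starting the enumeration at 1 is one simple way to guarantee that no token collides with the end-of-input value.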

Your scanner must also recognize identifiers, numbers, and quoted strings.

The String List

Whenever a quoted string is recognized, it is added (without its quote marks) to a string list. The string list must be kept in the order in which the strings are found in the source file; that is, each new string is added to the end of the list.
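As a rough sketch of the string list in C++, assuming the scanner hands over the whole quoted lexeme including both quote marks: the names stringList and addString are hypothetical, and a std::vector is only one possible container.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Strings in the order in which they were found in the source file.
static std::vector<std::string> stringList;

// Strips the first and last character (the quote marks) from a quoted
// lexeme, appends the result to the end of the list, and returns the
// index at which it was stored.
std::size_t addString(const std::string& lexeme) {
    std::string body = lexeme.size() >= 2
        ? lexeme.substr(1, lexeme.size() - 2)
        : std::string();
    stringList.push_back(body);
    return stringList.size() - 1;
}
```

Returning the index is an optional convenience; the assignment only requires that the list preserve source order.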

The Symbol Table

The symbol table will be a hash table. The simplest hash table organization ("separate chaining" for synonyms) will work well here. The table's internal workings will not be visible outside of the module in which it is defined. Access will be through two operations: lookup and insert.

Lookup will take a character string as its parameter, search the symbol table, and return a pointer to the symbol table entry that has that string as its (identifier) key. If no such entry is found, lookup will return null.

Insert will create a new entry in the symbol table. Before making the new entry it will first check whether an entry already exists with the same (identifier) key. If such an entry already exists, insert will return a value indicating failure; otherwise it will make the entry and return a value indicating success.

Symbol table entries will have four fields: name, type, size, and location.

The name field will hold the identifier. The type field will hold one of the values variable, array, procedure, parameter, or local variable. Size will be 1 in the case of a (scalar) variable, the array length in the case of an array, and the number of formal parameters in the case of a procedure. Location will be the address (data store index) of a variable or array and the entry point (code store index) for a procedure. For Program 1 the size and location fields will be set to 0 (this will be changed in later programs).
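One possible shape for the symbol table module in C++ is sketched below, using separate chaining. The class and member names, the bucket count of 101, and the multiply-by-31 hash function are all assumptions chosen for illustration; any reasonable hash function and table size would do.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class EntryType { Variable, Array, Procedure, Parameter, LocalVariable };

struct Entry {
    std::string name;   // the identifier (hash key)
    EntryType type;
    int size;           // 0 for Program 1
    int location;       // 0 for Program 1
    Entry* next;        // next synonym in the same chain
};

class SymbolTable {
public:
    explicit SymbolTable(std::size_t bucketCount = 101)
        : buckets_(bucketCount) {}   // every chain starts out empty (null)

    ~SymbolTable() {
        for (Entry* e : buckets_) {
            while (e != nullptr) {
                Entry* next = e->next;
                delete e;
                e = next;
            }
        }
    }

    SymbolTable(const SymbolTable&) = delete;
    SymbolTable& operator=(const SymbolTable&) = delete;

    // Returns a pointer to the entry keyed by name, or null if none exists.
    Entry* lookup(const std::string& name) const {
        for (Entry* e = buckets_[hash(name)]; e != nullptr; e = e->next) {
            if (e->name == name) return e;
        }
        return nullptr;
    }

    // Returns false if the key is already present; otherwise links a new
    // entry at the head of its chain and returns true.
    bool insert(const std::string& name, EntryType type) {
        if (lookup(name) != nullptr) return false;
        std::size_t h = hash(name);
        buckets_[h] = new Entry{name, type, 0, 0, buckets_[h]};
        return true;
    }

private:
    // Simple polynomial string hash, reduced to a bucket index.
    std::size_t hash(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) h = h * 31 + c;
        return h % buckets_.size();
    }

    std::vector<Entry*> buckets_;
};
```

Keeping the bucket array and the hash function private is one way to meet the requirement that the table's internal workings not be visible outside the module.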

The Test Drivers

You will write a test driver to run your scanner and another to test your symbol table functions.

For the first test driver (for your scanner), you will write a program which calls the scanner (yylex()) in a loop which terminates on end of input (when yylex() returns 0). Each time yylex() returns a token your program will print out the (integer) token value. If the token is that for identifier, number, or string the program will also display the recognized identifier, number, or string. After the program has finished scanning the input it will display the entire string list.

The second test driver (for your symbol table) will display a simple menu to the user with three options: insert, lookup, and quit. If the user chooses insert, the program will accept an identifier (character string) and a type (as described above) and call the symbol table's insert operation to make a new table entry. The test program will display whether or not the insertion was successful. If the user chooses lookup, the program will accept an identifier and attempt to find a symbol table entry for that identifier. If none is found, a message to that effect will be displayed. If one is found, its four fields will be displayed. When the user chooses quit, the program will display the entire symbol table, in a readable form, and then exit.

To Hand In

You will hand in your source code and a transcript of a terminal session that uses sample data to be provided later.