CS 440/540
Program 2

due
June 9, 2008

The Program

In this program you will add a yacc-based parser (yacc being byaccj or bison), symbol table access, and some code generation to your compiler. This is the largest piece of your compiler.

The parser is built around the grammar in the PAXI Language Definition. An ASCII file containing just the grammar can be found here. Use your favorite text editor to convert this into the basis for your yacc file.

The Parser

The parser is straightforward. C/C++ users will, however, need a %union declaration with three fields: one for an integer, one for a pointer to character, and one for a pointer to a symbol table entry. (The pointer to character field will be used to get ID strings from the scanner.) Java users will use the ival and sval instance fields of ParserVal for the integer and String values, and the obj instance field (type Object), cast to the symbol table entry type.

You will have to make a few small additions to your lex (flex or jflex) file. Pass integer values (from the action for NUMBER) and strings (from the action for ID) to the parser through assignment to yylval.
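As a rough illustration of the two paragraphs above, the following C sketch shows the kind of union a %union declaration produces and what the NUMBER and ID actions might do with it. The names number_action, id_action, and struct sym_entry are hypothetical; in a real build the union type and yylval come from the generated parser, and the action bodies sit inside the lex file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sym_entry;                 /* hypothetical symbol table entry type */

/* Stand-in for the type yacc generates from a %union declaration. */
typedef union {
    int ival;                     /* value of a NUMBER token */
    char *sval;                   /* text of an ID token */
    struct sym_entry *sym;        /* pointer to a symbol table entry */
} YYSTYPE;

YYSTYPE yylval;                   /* normally defined by the generated parser */

/* Body of a hypothetical NUMBER action: convert the matched text. */
void number_action(const char *yytext) {
    assert(yytext != NULL);
    yylval.ival = atoi(yytext);
}

/* Body of a hypothetical ID action: copy the text, because the scanner
   reuses its buffer for the next token. */
void id_action(const char *yytext) {
    assert(yytext != NULL);
    size_t n = strlen(yytext) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, yytext, n);
    yylval.sval = copy;
}
```

Copying the ID text matters because the parser may read the string after the scanner has already moved on to the next token.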

Most of the effort in building the yacc file will be in accessing the symbol table and in code generation.

Calling the Symbol Table Routines

Your program will call the insert function from your symbol table to create new table entries whenever a declaration is parsed and will call the lookup function whenever an identifier is referenced.

New Table Entries

A new table entry will be made in each of four situations: a global declaration, a procedure definition, a procedure parameter being read, and a local declaration.

Global scalar variables are declared in each right hand side of production 5, <global_var_list>. A character string (for the name field) should be attached to each ID token. The size field will be 1. Arrays are declared in production 8, <single_array>. The name field will be attached to the ID token and the size field will be attached to the NUMBER token.

To find a location field value for globals you must allocate memory in the data store for the item (variable or array) and bind the item to the memory allocated to it. This is done by keeping a "next available" counter with the address (data store index) of the next place which is available to be allocated. This counter is increased by the size of the item (allocating). The value of the counter before allocating becomes the location field (binding).
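The allocate-and-bind step can be sketched in a few lines of C. The function name allocate_global and the choice to start the counter at 0 are assumptions made for illustration only.

```c
#include <assert.h>

int next_available = 0;   /* assumed starting address in the data store */

/* Allocate `size` cells for a global scalar (size 1) or array (size n)
   and return the location to store in its symbol table entry. */
int allocate_global(int size) {
    assert(size > 0);
    int location = next_available;   /* binding: counter value before allocating */
    next_available += size;          /* allocating: advance past the item */
    return location;
}
```

For example, declaring a scalar, then an array of 10, then another scalar would yield locations 0, 1, and 11 under these assumptions.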

Procedures are declared in production 10, <procedure_decl>. Again, the name field will be attached to the ID token. The size field will hold the number of formal parameters. These must be counted from the number of entries in the parameter list. This can be done in production 12, <formal_list>, by incrementing a counter.

When a symbol table entry is made for a parameter or a local variable you must make sure that it can be distinguished from a possible global variable of the same name. The easiest way to do this is by mangling the name. Append the name of the procedure it is declared in to the ID, separated by a character which cannot appear in an identifier (e.g. '#' or '@').
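One possible way to build the mangled name in C is shown below. The function name mangle_name and the choice of '#' as the separator are illustrative; any character that cannot appear in an identifier would do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MANGLE_SEP '#'   /* must not be legal inside an identifier */

/* Return a newly allocated string of the form "id#proc".
   The caller is responsible for freeing it. */
char *mangle_name(const char *id, const char *proc) {
    assert(id != NULL && proc != NULL);
    size_t n = strlen(id) + 1 + strlen(proc) + 1;   /* id + sep + proc + NUL */
    char *out = malloc(n);
    if (out != NULL)
        snprintf(out, n, "%s%c%s", id, MANGLE_SEP, proc);
    return out;
}
```

With this sketch, a local variable count declared inside procedure swap would be stored under the name count#swap, which can never collide with a global named count.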

Parameters are declared in production 12, <formal_list>. The name field is built from the string attached to the ID token. The size is 1. The location field is the number (starting with 1 in each procedure) of the parameter in the parameter list. This is computed from the counter incremented in this production.

Local variables are declared in (each right hand side of) production 15, <local_var_list>. The name field is built from the string attached to ID and size is 1. The location field is the number (starting with 1 in each procedure) of the local variable. As with parameters, this is found by keeping a counter and incrementing it as each local variable is declared.
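The counters for parameters and local variables might look like the following C sketch. The entry layout, the fixed-size table, and every function name here (insert, begin_procedure, declare_param, declare_local, finish_formals) are hypothetical stand-ins; your own symbol table from the earlier program takes their place.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_ENTRIES 100

/* Hypothetical entry layout with the three fields described above. */
struct entry { char name[64]; int size; int location; };

struct entry table[MAX_ENTRIES];
int n_entries = 0;

int param_count = 0;   /* parameters seen so far in this procedure */
int local_count = 0;   /* local variables seen so far in this procedure */

/* Stand-in for the symbol table insert function. */
struct entry *insert(const char *name, int size, int location) {
    assert(n_entries < MAX_ENTRIES);
    struct entry *e = &table[n_entries++];
    snprintf(e->name, sizeof e->name, "%s", name);
    e->size = size;
    e->location = location;
    return e;
}

/* Called when a procedure declaration begins: numbering restarts at 1. */
void begin_procedure(void) { param_count = 0; local_count = 0; }

/* Called for each ID in the formal list (name already mangled). */
struct entry *declare_param(const char *mangled) {
    return insert(mangled, 1, ++param_count);
}

/* Called for each ID in the local variable list (name already mangled). */
struct entry *declare_local(const char *mangled) {
    return insert(mangled, 1, ++local_count);
}

/* Once the formal list is complete, record the parameter count
   in the size field of the procedure's own entry. */
void finish_formals(struct entry *proc) { proc->size = param_count; }
```

Keeping two separate counters means parameters and locals are each numbered from 1 within a procedure, as the text requires.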

Looking Up Table Entries

Anywhere in a program where an identifier is used, other than where it is being defined, it must be looked up in the symbol table. In these places a semantic check must be made (to make sure, for example, that a procedure name is not being used as a variable) and the location field must be made available for code generation.

Scalar variables, arrays, parameters, and/or local variables are referenced in productions 30 (<variable>), 31 (<read_statement>), and 32 (<write_statement>). Procedures are referenced in production 25 (<call>). When an ID is found where a variable is expected, you must first check whether a parameter or local variable with that name exists in the current procedure (that is, look up the mangled name). If none does, check for a global variable.
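The local-first, global-second lookup order could be sketched as follows. The tiny table, the linear-search lookup, and the name find_variable are all hypothetical; the sketch also assumes names were mangled as id#proc when they were inserted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry layout; the real one comes from your symbol table. */
struct entry { const char *name; int size; int location; };

/* Tiny stand-in table with one global x, one parameter x of swap,
   and one global array buf. */
struct entry table[] = {
    { "x",      1,  0 },
    { "x#swap", 1,  1 },
    { "buf",    10, 1 },
};
const int n_entries = sizeof table / sizeof table[0];

/* Stand-in for the symbol table lookup function. */
struct entry *lookup(const char *name) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Resolve an ID used as a variable inside procedure `proc`
   (pass NULL when outside any procedure). */
struct entry *find_variable(const char *id, const char *proc) {
    assert(id != NULL);
    if (proc != NULL) {
        char mangled[128];
        snprintf(mangled, sizeof mangled, "%s#%s", id, proc);
        struct entry *e = lookup(mangled);   /* parameter or local first */
        if (e != NULL)
            return e;
    }
    return lookup(id);                       /* then fall back to a global */
}
```

A NULL result means the identifier was never declared, which is the point at which the parser would report a semantic error.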

Code Generation

For Program 2 you will do code generation for variable and array references, arithmetic, the read and write statements, and the assignment statement.

Code generation is done by keeping a string array (an ArrayList of Strings for Java, a vector of strings for C++) where each string is a line of C code. Code will be added to this array from appropriate productions in the parser. When parsing (and hence code generation) is complete this array (along with some additional code to be discussed in class) is written to the output file making a C source file.
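For plain C, which has no built-in growable string array, the code buffer might be kept as in this sketch. The names code, emit, and write_code are illustrative, and the additional code to be discussed in class is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **code = NULL;      /* one generated line of C per element */
size_t code_len = 0;     /* lines stored so far */
size_t code_cap = 0;     /* allocated capacity */

/* Append a copy of one line of generated C code to the buffer. */
void emit(const char *line) {
    assert(line != NULL);
    if (code_len == code_cap) {
        size_t new_cap = code_cap ? code_cap * 2 : 64;
        char **grown = realloc(code, new_cap * sizeof *grown);
        if (grown == NULL) { perror("realloc"); exit(EXIT_FAILURE); }
        code = grown;
        code_cap = new_cap;
    }
    size_t n = strlen(line) + 1;
    char *copy = malloc(n);
    if (copy == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    memcpy(copy, line, n);
    code[code_len++] = copy;
}

/* Write every buffered line to the output file once parsing is done. */
void write_code(FILE *out) {
    for (size_t i = 0; i < code_len; i++)
        fprintf(out, "%s\n", code[i]);
}
```

Parser actions would call emit with each line they generate, and the main program would call write_code after yyparse returns.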

All arithmetic will be done using the stack. That is, a binary operation will pop its two arguments from the stack into temporary locations (500 - 509 in the data store), perform the operation, and push the result.
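The exact form of the generated code is left for class, so the following is only a guess at the run-time behavior of a binary operation. The array names data and stack, the helpers push and pop, and the use of cells 500 and 501 for the two operands are all assumptions.

```c
#include <assert.h>

#define STACK_SIZE 256

int data[510];            /* data store; cells 500-509 are temporaries */
int stack[STACK_SIZE];    /* hypothetical evaluation stack */
int sp = 0;               /* index of the next free stack slot */

void push(int v) { assert(sp < STACK_SIZE); stack[sp++] = v; }
int pop(void)    { assert(sp > 0); return stack[--sp]; }

/* What the lines generated for "left - right" might do at run time. */
void subtract(void) {
    data[501] = pop();    /* right operand was pushed last, so it pops first */
    data[500] = pop();    /* left operand */
    push(data[500] - data[501]);
}
```

The order of the two pops matters for subtraction and division, since the right operand is on top of the stack.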

Details on code generation will be discussed in class.

To Hand In

As usual, you will hand in your source code and a terminal session using sample data to be provided later.