Sean's Patched lil-gp Kernel

This modified lil-gp kernel provides a number of new features developed to make possible our GP development of real-time soccer softbots for 1997 RoboCup competition in Nagoya. You may find them useful.

Bug Fixes: There are two serious race-condition bugs in multithreaded lil-gp. This kernel fixes them, but if you don't want to use this kernel, you can still patch them yourself; the instructions are located here. Regardless, you need to very slightly modify your application code as described below.
Strongly Typed GP: The kernel is now strongly-typed, not typeless like normal lil-gp. You can still do "typeless" GP, of course, as this is just a special case of strongly-typed GP (with a single type).
Coevolution: The kernel supports a simple form of coevolution.
Reading Populations from Files: You can now generate populations from lisp-like tree files.
Tree Generation and Modification: Mutation, population generation, and crossover now have some new features.
New Selection Methods: The kernel now provides boltzman selection and sigma scaling selection.
Mutlithreading Patch: You can now specify the number of threads in your input file, instead of (yuck) in an environment variable.

Actually, I'm not exactly certain if this is all the features added to this kernel; it's been a while. And I state here and now, I am not responsible for any bugs or errors in this code. It is offered freely and without warranty or guarantee of any kind.

Peter Andersen has found two bugs in the patched and original kernels. One is a minor memory leak which has been fixed as of December 23, 1997 (download a new version if your version is older than this). The other concerns ERCs and appears to occur in the original kernel; I've not verified if his patch fix works yet, so instead of patching it in the kernel, I give his instructions here.

The kernel is located here. It includes a single app (a modified version of ant, called anttype, which demonstrates strongly-typed GP with the kernel), but no documentation etc. You'll have to get that from the lil-gp home page.

Bug Fixes

The kernel fixes two very serious multithreading race-condition bugs in the lilgp distribution. One is in the random() facility, the other is in the variable current_individual.

To make possible the current_individual bug fix, a modification had to be made in the user's definition of and use of the globaldata structure. So to use this kernel, you'll need to add one field to your globaldata structure in app.h to facilitate one of the bug patches. Namely, your globaldata struct should look thusly:

                    typedef struct
                        {
                        /* include this to patch the current_individual bug,
                           but DO NOT access it.  Just leave it alone... */
                        individual* current_individual;
                        /* end patch */


                        /* The rest of your globaldata stuff goes here...*/

                        } globaldata;


                    /* Now it's important to comment this out -- we no longer use it! */
                    /*extern globaldata g;*/

So if we've commented out g as a global variable, how do we gain access to our global data? This is done with a new, correctly thread-safe function called get_globaldata(). At the top of app_eval_fitness, you now do something like:

                    void app_eval_fitness ( individual *ind )
                        {
                         globaldata* g=get_globaldata();
                         set_current_individual ( ind );
                         ...

...and that completes your part of the bug fix!

Strongly-Typed lilgp

This version of lilgp has been modified to use strongly-typed gp instead of single-typed gp. Tree generation (for mutation and for initial trees) and crossover have been heavily modified to enforce strongly-typed GP.

This kernel distinguishes "types" by assigning each "type" a unique integer, starting with 0 and going up. Each tree has a user-defined return type for that tree. Additionally, each function (terminal or nonterminal) has a user-defined return type, and nonterminal functions have unique return types for each of their arguments. The kernel will not permit trees to be constructed such that their root function returns the wrong type for the tree, or such that any nonterminal has as a child a function which returns the wrong type for that argument position in the nonterminal.

You'll use DATATYPE as usual for passing results among functions. You should take it on faith that whatever you encode in DATATYPE will be properly decoded by whomever you pass it to, since you now are enforced to expect the data in DATATYPE to be of the same exact "type".

This code works peachy for a single tree and for individuals with several trees not used as ADFs. I have *not* tested it for ADFs, so I cannot vouch that it works for sure with them, but I believe it works right.

Here are the main data types and defines, and explanations for how they've changed to support strong typing. Read and follow these THOROUGHLY.

DATATYPE

is the C data type returned by all functions and terminals. because this is *strongly*typed* GP, DATATYPE is best written as the *union* of all your data types. As an example, suppose you wanted to support vectors (two floats) and booleans as your datatypes. Then DATATYPE might be set to something like

       typedef struct vect_ { float x; float y; } vect;
       typedef union app_datatype_ { vect v ; char bool; } app_datatype;
       #define DATATYPE app_datatype

NUMTYPES

A new define. This indicates the total number of data types in your domain. If your types are, say, int, bool, and vector, then you set NUMTYPES to 3. Your datatype should be assigned numbers (as explained below) ranging from 0 to NUMTYPES - 1. You need to add this define to your appdef.h file.

function

Now looks like this:

                    typedef struct
                        {
                         DATATYPE (*code)();
                         void (*ephem_gen)( DATATYPE * );
                         char * (*ephem_str)();
                         int arity;
                         char *string;
                         int type;
                         int evaltree;
                         int index;
                    /* Added to support strong data typing */
                         int return_type;
                         int argument_type[MAXARGS];
                        } function;

The two last items are the return type (a value between 0 and NUMTYPES-1) and the type of each argument to the function (if it's a nonterminal). Here's an example. Imagine that your MAXARGS is 3, your NUMTYPES is 2 (booleans are 0 and vectors are 1). You want to describe a nonterminal function "CrossProduct>0" that returns a boolean value after comparing two vectors. You might write it as:

/* Okay, some defines first... */
#define BOOLEANS 0
#define VECTORS 1
#define IGNORE -1      /* This will get ignored anyway */

/* ... and your function definition might look like... */
     
{f_cp,NULL,NULL,2,"CrossProduct>0",FUNC_DATA,-1,0,BOOLEANS,{VECTORS,VECTORS,IGNORE}}

user_treeinfo

This combines the old tree map (an integer), the tree_name (a char*) together, plus the return type for the tree as a whole, as:

                    typedef struct
                        {
                        int fset;                   /* function set for tree */
                        int return_type;            /* return type for tree */
                        char* name;                 /* tree name */
                        } user_treeinfo;

function_sets_init()

This has been modified; you no longer pass it the tree_map[] and tree_name[]; instead, you pass it a user_treeinfo[] array containing all necessary tree information for each tree. The format is now:

                   int function_sets_init(function_set* user_fset, 
                                          int user_fcount,
                                          user_treeinfo* user_tree_map,
                                          int user_tcount );

Some caveats:

You *must* have at least one terminal and at least one nonterminal for *each* return type, or tree-generation will fail.
If you're supporting ADFs, you *must* make certain that the ADF functions in your function set have the same return type as the ADF tree itself! Additionally, you *must* make certain that the ARG functions have the same return type as that argument type for the cooresponding ADF function in calling trees. If you do this, the ADF facility will probably work, though I've not tested it.
Remember to properly set the return types for ephemeral random constants.
When evaluating a tree, remember to cast its result to the appropriate return type for that tree.

Reading from Files

You can now read trees in from a file at population-generation time, instead of generating them randomly. To force a particular tree to be read from a file, use the new input-file parameters:

	init.tree[val].method = load
	tree-replace[val] = filename

The first tells the GP system that it will be loading trees for tree position val for individuals in the population. The second gives the filename of the file to load from. You should use separate filenames for separate tree positions. All the trees for all the individuals in a particular tree position should appear one after another in the file.

Files consist of a series of trees in pseudo-LISP form. The tree syntax follows the following rules:

nonterminals are of the form (nonterminal_name function function function etc.)
terminals are of the form (terminal_name). Terminals may alternately be written without the parentheses only within an enclosing nonterminal.
Parentheses may not have whitespace around them. That is,
```
( hello ( dolly ! ) )
```
is INVALID. It must be written as
```
(hello (dolly !))
```

For example, if you have a population of four individuals, consisting of a single tree each, for, say, symbolic regression, you might load them from a file that looks like:

	(+ (/ x 2) x)
	(sin (cos
	     (+ 2 x)))
	(x)
	(- x x)

Note that the lone (x) must have parenthesis around itself. This is not a robust system; if there is an error in the syntax of the file, the program will self-destruct. GDB can help you determine which function name is causing the problem.

Coevolution

You can do a simple 2-individual form of coevolution over a single population in multi-threaded lil-gp easily. Single-threaded coevolution is not supported.

First, add the item -DCOEVOLUTION to your app's GNUmakefile CFLAGS line. Now lil-gp expects to pass you two roughly random (okay, next to each other in the individual array, but it was loaded randomly) individuals in app_eval_fitness(). So your app_eval_fitness() function would now look something like:


     void app_eval_fitness ( individual *ind, individual* ind2 )
             /* Note there are two individuals now, not just one */
     {
     DATATYPE d1, d2;

     set_current_individual(ind);
     d1=evaluate_tree(ind->tr[0].data,0);

     set_current_individual(ind2);
     d2=evaluate_tree(ind2->tr[0].data,0);

     /* yahda yahda yahda */

     ind->hits= /* blah blah blah */
     ind->r_fitness = /* blah blah blah */
     ind->s_fitness = /* blah blah blah */
     ind->a_fitness = 1.0/(1.0+ind->s_fitness);
     ind->evald = EVAL_CACHE_VALID;

     ind2->hits= /* blah blah blah */
     ind2->r_fitness = /* blah blah blah */
     ind2->s_fitness = /* blah blah blah */
     ind2->a_fitness = 1.0/(1.0+ind2->s_fitness);
     ind2->evald = EVAL_CACHE_VALID;
     }

Tree Generation and Modification

The new initialization parameter

	init.ignore_limits = 1         # by default, it's 0

...will force GP to ignore depth limit, node size, and duplicate individual violations when generating new individuals. This is useful for reading individuals from files.

The mutation and crossover operators now have a new argument:

		num_times = val

In conjunction with the argument keep_trying, this new argument tells lil-gp how many times to keep trying; this puts an effective bound on the number of times (previously, it was unbounded). If num_times is set to 0, then there is no bound.

New Selection Methods

Sigma Scaling Selection

Sigma Scaling selection attempts to keep up selection pressure throughout a run. Sigma Scaling adjusts the selection intervals with a function based on the mean and standard deviation of the population. For more information, see [Mitchell 96].

This version isn't very rigorous as it uses Koza's adjusted fitness metric as its basic fitness, and this isn't the best choice. To perform Sigma Scaling selection, change your input file's selection method to "sigma".

Sigma scaling is located in sigma.c

Boltzman Selection

Boltzman selection slowly decreases temperature T from boltzman_hi to boltzman_low through the course of the GP run. T is decreased by boltzman_step each generation. As described in [Mitchell 96], T determines the randomness of the selection.

This version isn't very rigorous as it uses Koza's adjusted fitness metric as its basic fitness, and this isn't the best choice. Boltzman selection MAY NOT WORK WITH CHECKPOINT FILES.

To perform Boltzman selection, change your input file's selection method to "boltzman". You adjust Boltzman selection with the input file parameters:

boltzman_hi: The high-temperature (initial) value, default: max_generations, else 51.0;
boltzman_low: The low-temperature (final) value, default: 0.0
boltzman_step: The amount T decreases each generation, default: 1.0

Boltzman selection is located in boltzman.c.

Multithreading Patch

You now specify the number of threads not with an environment variable (man, that was weird), but with an input file parameter:

	num_threads = val

This must be set in order to do multithreading.

Have fun!