As far as GP systems go, this is a big one. It sports a number of packages for doing a variety of custom tasks. The big hierarchies are:
- The ec.util package contains utility code unrelated to evolution.
- The ec package contains high-level abstract evolutionary computation objects. Subsidiary packages handle basic breeding, selection, generational or steady-state evolution, and multiobjective optimization.
- The ec.gp package contains GP stuff. It contains a number of subpackages with custom breeders, node builders, Koza-specific mechanisms, etc.
- The ec.vector package contains vector stuff for GAs and ES.
- The ec.app package holds example problem applications.
- The ec.experiment package is for temporary experimental code.
The ec.util package contains a number of objects which are not evolution-specific, but are useful in their own right as generic objects for a variety of purposes.
Random Number Generators
The system comes with two random number generators, MersenneTwister and MersenneTwisterFast. Algorithmically they are identical; however, MersenneTwister is a formal subclass of java.util.Random, is synchronized, and is therefore half the speed as MersenneTwisterFast. It's not used in the system, but is included in case you might find it useful.
The MersenneTwister is an exceptionally high-quality random number generator which has a very long period, has extremely good statistical properties out to hundreds of dimensions, and is quite fast. It is widely considered superior to most every existing random number generator, including: C's broken rand() and random(), the Park-Miller generator used in Numerical Recipies' rand1() and rand2(), and the Knuth subtractive generator used in Numerical Recipies' rand3() and in java.util.Random. This Java version is fully serializable, extensively tested, and surprisingly fast.
The system relies on user-provided runtime parameters to guide nearly every decision it makes, from the classes it loads to the evolution settings they use. These parameters are stored in a parameter file specified at runtime. This parameter file may have parent parameter files whose parameters it overrides; they in turn may have parent parameter files, and so on. Parameter files are similar in design to Java's property lists; similarly the storage facility for parameters, ParameterDatabase, is based on java.util.PropertyList. ParameterDatabase returns values for Parameter settings, which are organized hierarchically much like directories and files. Malformed Parameters generate BadParameterExceptions, and when a parameter specifies a classname, failure to load that class results in a ParameterClassNotFoundException. Many packages contain a parameter file, params, which defines some basic parameter settings for guiding objects in that package.
Checkpointing and Encoding
The Checkpoint class serializes objects out to a checkpoint file, or restores the object state from a checkpoint file. Checkpoint files are compressed for storage. The Code class is provided for encoding binary data in a reliable but (somewhat) readable way in a text file. When reading data from strings, Code wraps them in a DecodeReturn object to tokenize them and return decoded data.
To output information under multithreaded conditions, handle exceptional conditions gracefully, and recover well under checkpoints, the system provides an Output logging facility. This facility maintains a set of Logs, which write to various streams. Logs can be restarted after a checkpoint through their LogRestarters. Problems in logging to streams result in OutputExceptions. Announcements are special messages outputted to Logs which indicate warnings, errors, or other important messages; Announcements can be stored in memory and so are guaranteed to be preserved across checkpoint restarts.
The system comes with a fast middlemost QuickSort routine, which does object comparisons using the SortComparator and SortComparatorL interfaces. RandomChoice picks a random item from a probability distribution. Lastly the current system version, and related information, is stored in the Version class.
The ec package lays out the overall abstract design of the evolutionary mechanism. It provides the entry point for programs, high-level abstract evolutionary objects and processes, and design patterns.
At the most abstract level are five design pattern interfaces which guide the overall syntactic design of most all other classes in the system. The topmost design pattern is Setup, which is java.io.Serializable (nearly every class in the system must be serializable in order for checkpointing to work). Setup defines a single method, setup(...), which is called near the beginning of an evolutionary run to allow an object to set itself up from ec.util.Parameters stored in the ec.util.ParameterDatabase.
Singleton is a Setup which defines classes which have only a single global instance throughout the evolutionary run. Clique is a Setup which defines classes which contain a small set of instances which are considered unique from each other, and which each get setup(...) seperately. For example, the set of GP types (ec.gp.GPType) is a Clique. Cliques typically have a centralized static java.util.Hashtable which holds all member instances of the Clique. Prototype is a Setup which defines classes whose instances are commonly created not with new but through cloning a single unused prototypical instance with clone(). The prototypical instance receives a single (usually) call to setup(...); thereafter all cloned instances do not get called setup(...). Prototypes are very common objects in the system, used for nearly everything that must be allocated a very many number of times (Individuals, for example). Lastly, Group is a Setup which defines classes very much like Prototypes, but have prototypical instances which must be used. Population, for example, is a Group because a prototypical instance would be very large and expensive to stash away only for the purpose of cloning other Populations.
High-Level Evolutionary Processes
Evolve is the topmost class in the evolutionary system. It provides the entry point main(...) which sets up the checkpointing facility, random number generators, output logging facility, parameter database, and the EvolutionState object. EvolutionState, a Singleton, holds the complete state of evolution at any time. Because it holds everything but Evolve, either directly or indirectly through some trail of references, the state of the entire system can be checkpointed simply by serializing EvolutionState. Because it holds the random number generators, the output facility, the population, and the parameter database, EvolutionState is passed around in a lot of methods to objects which need access to these features. EvolutionState also is the entry point for the main evolutionary loop. For this reason it is abstract; you must define this loop for your own purposes (steady state evolution vs. generational evolution most notably).
EvolutionState contains five other Singleton objects which abstract the high-level evolutionary process. Their abstract forms in the ec package define their functions but do little or nothing to implement them. Initializer is responsible for creating the initial population. Depending on the Subpopulation, it might do this by randomly-generating individuals or by loading them from a text file the user provides. Evaluator evaluates a population and assesses its fitness. Breeder breeds new populations from previous ones. Exchanger performs inter-subpopulation exchanges or inter-process exchanges (possibly with other machines). Unlike other objects, Exchanger contains only hooks; all of its operations are highly domain-specific. Finisher is responsible for "cleaning up" a population at the end of an evolutionary run; it is typically unused as well.
Finally, EvolutionState contains a tree of one or more Statistics objects which perform a variety of statistics accumulation and logging tasks throughout the evolutionary process.
Populations, Individuals, and Other Stuff
EvolutionState contains a single Population, which it receives from Initializer and updates during the evolutionary run. The Population holds an array of Subpopulations. Population and Subpopulation are both Groups. It's up to the particular Breeder and Evaluator used as to how the Population and its one or more Subpopulations are to be bred and evaluated.
Each Subpopulation contains an array of evolutionary Individuals, a Species to which they belong, and a prototypical Fitness which is used to clone Fitness objects each Individuals is evaluated with.
An Individual is the basic atomic unit of evolution. It is evaluated and bred, and defines a possible solution to the given evolutionary domain problem. Each Individual holds a Fitness instance, which defines its evaluated fitness in the population. Both Individual and Fitness are abstract; there are specialized forms of them elsewhere in the system to do various kinds of things (simple vs. multiobjective fitness, GP individuals, etc.).
A Species represents a group of Individuals which have the same basic genetic makeup. A Species instance contains a prototypical Individual which is commonly used to generate new Individuals at the request of the Initializer when it is creating an initial population. It also contains a set of Breeding Pipelines used to breed individuals of that Species. Species is abstract.
A BreedingPipeline is a basic mechanism for breeding individuals from a previous generation to form the next generation. BreedingPipelines are in some sense similar to Java streams: they take Individuals from one or more BreedingSources, which may be either SelectionMethods or other BreedingPipelines, and produce new individuals derived from those source individuals. BreedingPipelines are designed to be linkable in chains or hierarchies to facilitate fairly flexible breeding mechanisms. At one end of this chain are SelectionMethods, which select individuals from an old population and return them. The BreedingPipeline mechanism is also designed to work properly in a multithreaded environment, even though there are no synchronized methods in the system (for efficiency). Each breeding thread receives its own separate set of BreedingPipelines, and in a multithreaded environment Subpopulations are meant to be write-once, read-many. If you need to create a breeder which demands locks, you'll need to synchronize on Individuals on your own. BreedingSource, BreedingPipeline, and SelectionMethod are all abstract.
The Evaluator contains a prototypical Problem, which defines the problem domain individuals are being evaluated against. At evaluation-time, the Evaluator clones a separate Problem for each evaluation thread. Problem is abstract; a problem domain subclasses from Problem to create its own unique evaluation function.
DefaultsForm is the interface adhered to by all defaults (classes which provide the default base for a given package). ECDefaults provides the default base for the ec package.
The ec.simple package contains basic implementations of several of ec's abstract objects, providing simple non-coevolved, generational, non-multiobjective evolution. Even if you're doing something more advanced than this, you may still be able to reuse those objects in this package which do not conflict with your evolution style.
SimpleEvolutionState evaluates and breeds individuals using generational evolution. SimpleInitializer initializes a population simply by calling Population's populate(...) method. SimpleBreeder and SimpleEvaluator breed and evaluate individuals in a non-coevolutionary fashion, that is, individually. SimpleFitness defines the fitness of an individual as a single floating-point value, with 0.0 being ideal and positive infinity being worst. SimpleProblemForm defines the procedure for problems which evaluate individuals independently of one another.
SimpleStatistics writes basic population statistics to an output Log, namely the best individual of each generation, and the best individual of the run. SimpleShortStatistics writes population statistics out in a column-oriented fashion that can be easily disected with tools such as awk. SimpleExchanger and SimpleFinisher do nothing. SimpleDefaults provides the defaults base for the package.
|The ec.multiobjective Package|
The ec.multiobjective package provides hooks for doing multiobjective evolution. Its sole class, MultiobjectiveFitness, defines a Fitness which is an array of floating-point values. MultiObjectiveDefaults, provides the defaults base for the package.
|The ec.steadystate Package|
The ec.steadystate package provides facilities for doing basic steady-state evolution. The SteadyStateEvolutionState class breeds and evaluates a single individual at a time, reintroducing it to the population. To do this, it works in concert with the SteadyStateEvaluator and SteadyStateBreeder classes. Some interfaces are defined for other evolutionary elements to adhere to: SteadyStateSpeciesForm, SteadyStateBreedingSourceForm, and SteadyStateExchangerForm. Because it doesn't do anything at all, ec.SimpleExchanger implements SteadyStateExchangerForm. SteadyStateDefaults provides the defaults base for the package.
The ec.select package contains implementations of SelectionMethods. TournamentSelection implements basic tournament selection, and adheres to ec.steadystate.SteadyStateBreedingSourceForm. MultiSelection stores several subsidiary SelectionMethods and selects one at random to select for it. BestSelection always returns the best (or worst) individual in a population. FitnessProportionateSelection implements Koza-style fitness proportionate (roulette) selection. GreedyOverselection implements Koza-style fitness proportionate selection with greedy overselection. FirstSelection always returns the first individual in a population -- used primarily for testing purposes. SelectDefaults provides the default base for the package.
The ec.breed package contains implementations of domain-independent BreedingPipelines which work across domains. MultiBreedingPipeline stores several subsidiary BreedingPipelines and selects one at random to breed for it. ForceBreedingPipeline stores a subsidary BreedingPipeline and forces it to always produce n individuals at a shot. BufferedBreedingPipeline stores a subsidiary BreedingPipeline, and has it produce n individuals at a shot, which are buffered and used to return individuals as needed. BreedDefaults provides the default base for the package.
The ec.es package contans classes which make possible the (mu,lambda) and (mu+lambda) selection procedures common in Evolution Strategies (and the (mu+mu) procedure common in Evolutionary Programming), plus hooks for the 1/5 rule.
To do these procedures, the package provides its own EvolutionState object, ESEvolutionState, plus two associated breeders, MuCommaLambdaBreeder and MuPlusLambdaBreeder, both of the form ESBreederForm. These breeders require the use of a special SelectionMethod, ESSelection. ESDefaults defines the default base for the package.
The ec.exchange package contains various interesting Exchangers. Of particular interest is IslandExchange, which does asynchronous and synchronous island exchange models over large numbers of parallel processes on different machines. ECJ is platform-agnostic, so these machines can be different operating systems and processor types. InterPopulationExchange does basic exchanges between subpopulations in a population.
|The ec.experiment Package|
The ec.experiment package is provided for you to place experimental examples that shouldn't otherwise go in the app package, and also might not be distributed. To make the namespace clean in case experiments are distributed with this package that could conflict with yours, I suggest creating subpackages of the form ec.experiment.yourname.yourexperiment.
The ec.vector package provides tools for doing vector-style evolution common in Genetic Algorithms and Evolution Strategies. The package by default assumes fixed-length vectors, but can also accommodate variable-length vector representations.
The basic abstract individual in the package is VectorIndividual, of the species VectorSpecies. Vector individuals contain a genome consisting of an array of some type. Various concrete subclasses of VectorIndividual round out the different types that might fill this array: BitVectorIndividual, ByteVectorIndividual, ShortVectorIndividual, IntegerVectorIndividual, LongVectorIndividual, FloatVectorIndividual, and DoubleVectorIndividual provide genomes of their respective types plus appropriate mutation mechanisms for that type. GeneVectorIndividual allows for vectors of arbitrary types: its genome is an array of Gene objects, which you may subclass to do whatever you like with. VectorDefaults defines the default base for the package.
|The ec.vector.breed Package|
The ec.vector.breed package gives some basic breeding pipelines applicable to all VectorIndividual, including the most common breeding procedures in GAs. VectorCrossoverPipeline provides one, two, and any-point (uniform) crossover. VectorMutationPipeline gives a basic mutation procedure. VectorReproductionPipeline simply makes copies of individuals.
The ec.gp package contains a number of classes which implement tree-style genetic programming. The package is fairly complete, containing multiple tree forests, full strongly-typed genetic programming with generic types, automatically-defined functions, automatically-defined macros, ephemeral random constants, and a collection of Koza-I and Koza-II-style breeding and initialization methods.
Individuals, Trees, and Nodes
Genetic programming individuals are of Species which implement the interface GPSpeciesForm. GPSpecies is a simple Species which implements this form. Generally, GP individuals are of the class GPIndividual or a subclass. This class contains an array of GPTrees associated with that individual. A GPTree is a single tree made up of function-nodes, instances of subclasses of GPNode (which is abstract). When executed, a GPNode receives and returns its data in the form of some subclass of the abstract class GPData.
GPNodes have "parents" and "children". Children of GPNodes must always be other GPNodes, but parents of GPNodes implement the interface GPNodeParent. GPTrees and GPNodes are both GPNodeParents.
The GP system provides a few GPNodes by default. ERC is an abstract superclass for ephemeral random constants. ADF defines an automatically-defined function (see Koza-II), associated with a particular tree in the GPIndividual's tree forest. ADF nodes work in concert with ADFArgument nodes, leaf (terminal) nodes which return the evaluated subtrees of the ADF node. ADM nodes [Lee Spector] are like ADFs except that the do not store the evaluated results of their children, but instead execute and re-execute them as necessary. The ADF and ADM facility relies on storing the current calling context in an ADFContext, which is pushed and popped onto a complex double-stack object called an ADFStack.
Types and Constraints
This GP system obeys a set of strongly-typed type constraints which guide which GPNodes may be the children of other GPNodes or the root of which GPTrees in an individual.
Types themselves are defined by three classes: an abstract superclass GPType, and its two subclasses GPAtomicType (basic types, compatible only with themselves) and GPSetType (types compatible with a set of other types). These three classes together form a Clique.
GPFunctionSet, another Clique, defines a set of GPNodes which form a function set for some tree.
The return type and children argument types of a particular GPNode are given its associated GPNodeConstraints object. Likewise, the root type for a GPTree, plus the GPFunctionSet (the set of valid GPNode classes) for the tree and its GPNodeBuilder (the abstract superclass of objects which create initial trees or subtrees), are stored in the GPTree's associated GPTreeConstraints object. GPTreeConstraints and GPNodeConstraints are both Cliques.
Tree Manipulators and Other Stuff
GPBreedingPipeline is an abstract subclass of BreedingPipeline designed specifically for GP. The GPInitializer sets up the various GP Cliques before performing any initialization. A GPNodeSelector is an abstract superclass of GP objects which pick a node out of a tree, perhaps to choose a crossover or mutation point. A GPProblem is a special kind of Problem which defines and holds the ADFStack and GPData passed into trees when executing them. Lastly, GPDefaults holds the default parameter base off of which various GP objects derive their default Parameters.
The ec.gp.koza package holds objects which define the specific algorithms used in Koza-I and Koza-II.
Three pipelines, CrossoverPipeline, MutationPipeline, and ReproductionPipeline are versions of GPBreedingPipeline which perform subtree crossover, point mutation, and reproduction, respectively. These pipelines attempt to be as close to the Koza texts as possible, though they have some minor differences due to ad-hoc approaches taken by Koza-I and Koza-II. These pipelines also implement features found in lil-gp when possible.
Three GPNodeBuilders, FullBuilder, GrowBuilder, and HalfBuilder, build trees using the Koza-I FULL, GROW, and RAMPED HALF-AND-HALF building methods, respectively. Like the pipelines above, these objects attempt to replicate the Koza texts as closely as possible, but there are minor differences due to certain ad-hoc features of Koza-I and Koza-II. These node builders also implement features found in lil-gp when possible.
The KozaNodeSelector is a GPNodeSelector which performs Koza-style node selection (plus some additional features), choosing leaf nodes over terminal nodes with a given probability, etc. KozaFitness replicates the Koza-I fitness metric (raw/standardized/adjusted fitness and number of hits) with one exception: raw fitnesses must always be standardized (that is, 0 must be the best fitness, and positive infinity the worst). KozaStatistics provides additional statistics information, including evaluation and breeding timings, mean fitnesses, and number of nodes and individuals processed. KozaShortStatistics performs abbreviated statistics for each generation, one per line, designed to be parseable with tools such as awk. KozaDefaults holds the default parameter base off of which various Koza-specific objects derive their default Parameters.
The ec.select.breed package contains GPBreedingPipelines other than the standard Koza collection. Presently this consists primarily of mutation methods defined by [Kumar Chellapilla] and [Una-May O'Reilly], plus a few special to ECJ.
InternalCrossoverPipeline swaps two nodes within the same individual. MutateAllNodesPipeline attempts to replace every node in the tree with a compatible node. MutateOneNodePipeline picks one node in the tree and replaces it with a compatible node. MutateERCPipeline mutates an ERC in a way appropriate for it (often numerically). MutateSwapPipeline picks a nonterminal node and swaps two type-compatible sibling subtrees of that node. MutatePromotePipeline picks a node, clips its subtree out, and fills that spot with one of its children nodes. MutateDemotePipeline picks a node, clips its subtree out, fills the spot with a new random node, fills one of that node's argument positions with the original node and its subtree, and generates random subtrees for the remaining argument positions. RehangPipeline adjusts the a tree so that some node within the tree (other than the root) becomes the new root. This can be visualized to some (incorrect) extent in an interesting way: imagine that the tree is a hanging mobile of coat hangers and joints. Now pick a nonterminal in the tree, and grab it with your fingers. Let the rest of the tree droop beneath it. You have rehung the tree. GPBreedDefaults holds the default parameter base off of which GPBreedingPipelines (except for Koza's) derive their default Parameters.
The ec.gp.build package contains GPNodeBuilders other than the standard Koza collection. RandomBranch restricts node building as described by [Kumar Chellapilla]. PTC1 and PTC2 implement the respective algorithms as described by [Sean Luke]. They require function sets of PTCFuntionSetForm. PTCFunctionSet provides one such set for your use. RandTree implements the algorithm by the same name by [Hitoshi Iba] -- the implementation is experimental, however. Instead, try the 100% Uniform algorithm as defined by [Bohm and Geyer-Shultz]. GPBuildDefaults holds the default parameter base for this package.
Presently there are five problem domains which use this system. Each contains parameter files which set up the evolutionary run. Problem domains typically contain a subclass of ec.Problem which evaluates individuals in the given problem domain. GP Problem domains also typically contain a sublcass of ec.GPData, plus a func subpackage which stores the GPNode subclasses which define the actual function nodes for the problem.
- The Koza-I Boolean-Multiplexer problem. 11.params runs the 11-Booolean Multiplexer, 6.params runs the 6-Boolean Multiplexer, and 3.params runs the 3-Boolean Multiplexer (with no if statements). This version of multiplexer, which should be algorithmically identical to the "slow" multiplexer below, takes advantage of Langdon and Poli's "Sub-Machine Code" GP, basically doing a single call of the function tree but having each node perform its operation on all the multiplexer problems simultaneously. This can be done very efficiently, especially for 6.params, and typically results in a 3 to 10 times speedup, depending on the length of the run.
- ec.app.multiplexerslow ("Slow" multiplexer)
- This original form of the Koza-I Boolean-Multiplexer problem can be found in the directory ec/app/multiplexerslow It's much easier to understand than the new (and much much faster) multiplexer code, so it's provided here as example code.
- The Koza-I Symbolic Regression problem. The erc.params file runs Symbolic Regression with ephemeral random constants between -1.0 and 1.0 inclusive. The noerc.params file runs Symbolic Regression without ephemeral random constants. Symbolic regression can be sped up in the same way as "fast" multiplexer is done, but the difference is minor enough (maybe 5%) that it's not worth the dramatic increase in complexity.
- The Koza-I Artificial Ant problem. params runs the standard Ant problem with the Santa Fe trail (santafe.trl). progn4.params runs the Ant problem with the addition of a progn4 statement with the Santa Fe trail, as used in lil-gp. The Ant directory also contains an additional trail, the Los Altos trail (losaltos.trl). Both trail files are compatible with lil-gp. The Ant problem also has a special Statistics class, AntStatistics, which prints out the best ant trail.
- The Koza-II Lawnmower problem. params runs the Lawnmower problem with an 8x8 lawn and no ADFs. adf.params runs the same problem but with 2 ADFs as described in Koza-II.
- The [even|odd] n-parity problem, for values of n between 2 and 31 inclusive. params runs the n-parity problem with no ADFs. adf.params runs the same problem but with 2 ADFs as described in Koza-II. Perhaps in the future if there's time, parity will also get the "fast" treatment like multiplexer did.
- Edge-encoding problems for the Tomita Language set. See Sean Luke's PhD thesis, or other papers on Edge Encoding, which can be found at his papers repository. Edge comes with its own statistics which, like Ant, describe the result. The Tomita Language problems 1-7 can be run with the equivalent 1.params, 2.params, 3.params, 4.params, 5.params, 6.params, 7.params files.
- The Koza-II Two-Box problem, with or without ADFs. adf.params runs the problem with
ADFs, and noadf.params runs the problem without ADFs.
- A simple GA problem in ECJ. Sum is a simple example of doing the fitness=(sum over vector) problem. sum.params runs the example.
- ECSuite is a collection of common floating-point GA problems. ecsuite.params by default runs the Rastrigin problem -- read the parameter file for information on how to run other problem domains.