GEMS Code Notes :: gpstats

This package supports analysis of the GP system, working on a single generation of individuals or across multiple generations.

This package uses the gems:individual structure from mini-gp to represent an individual.

Functions

best-fitness-in-generation (generation)

best-individuals-in-generation (generation &optional tolerance)

clean-individuals (individuals run-experiment)

find-best-worst-pairs (individuals)

read-log (filename)

Reads a log file and returns a list of lists, where each inner list holds the values from one line of the log file.

This function can also read the output of write-deadcode-generations.

Example:

> sbcl
* (require 'asdf)
* (require 'gems)
* (gems:read-log "logfile.csv")
((1 0.48 0.5 0.48 5050.0 0.99 18.0 0.09 1.0)
 (2 0.48 0.5 0.48 5050.0 0.99 18.0 0.09 1.0)
 (3 0.48 0.5 0.48 3850.0 0.96 6.0 0.03 1.0)
 (4 0.48 0.5 0.48 220.0 0.34 8.0 0.04 1.0)
 (5 0.48 0.5 0.48 240.0 0.33 6.0 0.03 1.0)
 (6 0.48 0.5 0.48 8530.0 1.0 18.0 0.09 1.0)
 (7 0.48 0.5 0.48 1735.0 0.56 21.0 0.1 1.0)
...
)
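Because the result is an ordinary list of lists, the standard list functions apply directly. The following is a minimal sketch, assuming rows shaped like those above. The helpers column and column-mean are hypothetical and not part of gpstats, and what each position in a row means depends on the logger that wrote the file.

```lisp
;; Hypothetical helpers (not part of gpstats) for working with the
;; list of lists returned by read-log.

(defun column (rows index)
  "Return the value at zero-based position INDEX from every row."
  (mapcar (lambda (row) (nth index row)) rows))

(defun column-mean (rows index)
  "Return the arithmetic mean of column INDEX, or NIL if ROWS is empty."
  (let ((vals (column rows index)))
    (when vals
      (/ (reduce #'+ vals) (length vals)))))

;; Sample rows copied from the example output above.
(defparameter *sample-rows*
  '((1 0.48 0.5 0.48 5050.0 0.99 18.0 0.09 1.0)
    (2 0.48 0.5 0.48 5050.0 0.99 18.0 0.09 1.0)
    (3 0.48 0.5 0.48 3850.0 0.96 6.0 0.03 1.0)))

(column *sample-rows* 0)       ; => (1 2 3)
(column-mean *sample-rows* 6)  ; => 14.0
```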

read-similarity-file (filename)

Reads a set of similarities from a dat file (see Output format) into an array.
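As a rough illustration of what such a reader involves, here is a sketch that is not the gpstats implementation. It assumes whitespace-separated fields, zero-based indices, and a square result. The names read-lines and parse-similarity-lines are hypothetical.

```lisp
;; Hypothetical sketch of reading "i j score" lines into a 2D array.
;; Assumptions: whitespace-separated fields, zero-based indices.

(defun read-lines (filename)
  "Return the lines of FILENAME as a list of strings."
  (with-open-file (in filename)
    (loop for line = (read-line in nil)
          while line collect line)))

(defun parse-similarity-lines (lines)
  "Turn LINES of the form \"i j score\" into a square array of scores.
Blank lines are skipped; cells with no data are left as NIL."
  (let ((triples '())
        (*read-eval* nil))   ; never evaluate #. forms found in a data file
    (dolist (line lines)
      (unless (string= "" (string-trim '(#\Space #\Tab #\Return) line))
        (with-input-from-string (in line)
          (push (list (read in) (read in) (read in)) triples))))
    (let* ((size (if triples
                     (1+ (reduce #'max
                                 (mapcar (lambda (tr)
                                           (max (first tr) (second tr)))
                                         triples)))
                     0))
           (scores (make-array (list size size) :initial-element nil)))
      (dolist (tr triples scores)
        (destructuring-bind (i j score) tr
          (setf (aref scores i j) score))))))
```

A file would then be loaded with (parse-similarity-lines (read-lines "some-file.dat")), where the filename is only a placeholder.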

read-trace (filename)

Reads a trace file, as created by a logger of kind :trace.

The function returns a list of generations, where each generation is itself a list comprising the generation number and then the individuals in that generation.

Example: at the REPL, read in a trace file and show, for each generation, the generation number, the number of individuals in that generation, and the fitness of the first individual:

> sbcl
* (require 'asdf)
* (require 'gems)
* (defparameter *results* (gems:read-trace "sample-trace.yml"))
* (dolist (g *results*) (format t "~a ~a ~a~%" (first g) (length (rest g)) (gems:individual-fitness (second g))))
5000 1000 0.9356

write-deadcode-generations (generations filebase run-experiment)

generations is a list of generation values, as read by read-trace
filebase is the first part of the filename - each generation is saved to a file formed from the filebase and the generation number
run-experiment is a function to run an experiment on a given individual

For each generation, computes the proportion of dead code in every individual, and then writes a CSV file reporting how many individuals fall into each of the following groups. Each line gives the group index, the range of proportions, and the count, for example:

0, "0.0-0.1", 0
1, "0.1-0.2", 5
2, "0.2-0.3", 3
3, "0.3-0.4", 7
4, "0.4-0.5", 10
5, "0.5-0.6", 12
6, "0.6-0.7", 15
7, "0.7-0.8", 20
8, "0.8-0.9", 6
9, "0.9-1.0", 0

In addition, saves a file named "filebase-stats.csv" which contains, in CSV format, the generation number and the mean and standard deviation of the dead-code proportion.
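The grouping and summary statistics can be sketched as follows. This is not the gpstats implementation: it starts from a list of proportions that have already been computed, places a proportion of exactly 1.0 in the last group, and uses the population standard deviation. All three are assumptions, and every function name here is hypothetical.

```lisp
;; Hypothetical sketch: group dead-code proportions, each in [0, 1],
;; into ten equal-width bins and summarise them.

(defun bin-index (proportion)
  "Map PROPORTION to a bin 0..9; a value of exactly 1.0 goes in bin 9."
  (min 9 (floor (* proportion 10))))

(defun proportion-histogram (proportions)
  "Return a 10-element vector counting the PROPORTIONS in each bin."
  (let ((counts (make-array 10 :initial-element 0)))
    (dolist (p proportions counts)
      (incf (aref counts (bin-index p))))))

(defun write-histogram-csv (counts &optional (stream *standard-output*))
  "Write one line per bin: index, quoted range label, count."
  (dotimes (i 10)
    (format stream "~d, \"~,1f-~,1f\", ~d~%"
            i (/ i 10.0) (/ (1+ i) 10.0) (aref counts i))))

(defun mean-and-sd (numbers)
  "Return two values: the mean and the population standard deviation
of the non-empty list NUMBERS."
  (let* ((n (length numbers))
         (mean (/ (reduce #'+ numbers) n))
         (variance (/ (reduce #'+ (mapcar (lambda (x) (expt (- x mean) 2))
                                          numbers))
                      n)))
    (values mean (sqrt variance))))

;; Usage with made-up proportions:
(write-histogram-csv (proportion-histogram '(0.05 0.12 0.15 0.95 1.0)))
```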

write-fitness-generations (generations filebase)

generations is a list of generation values, as read by read-trace
filebase is the first part of the filename - each generation is saved to a file formed from the filebase and the generation number

For each generation, computes the fitness similarity of every pair of individuals and outputs the result to a file - one file per generation - in the Output format described below.

Fitness similarity is the absolute difference between the fitness values of the two models.
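A minimal sketch of this measure, working on a plain list of fitness values rather than gems:individual structures. The function names are hypothetical, and the choices to use zero-based indices and to include every ordered pair (including each individual with itself) are assumptions, not something confirmed for gpstats.

```lisp
;; Hypothetical sketch: fitness similarity for every pair of values.

(defun fitness-similarity (fitness-a fitness-b)
  "Absolute difference of two fitness values."
  (abs (- fitness-a fitness-b)))

(defun pairwise-fitness-similarities (fitnesses)
  "Return a list of (i j similarity) for every ordered pair of indices."
  (loop for a in fitnesses
        for i from 0
        nconc (loop for b in fitnesses
                    for j from 0
                    collect (list i j (fitness-similarity a b)))))

(pairwise-fitness-similarities '(1 3))
;; => ((0 0 0) (0 1 2) (1 0 2) (1 1 0))
```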

write-similarity-generations (generations filebase)

generations is a list of generation values, as read by read-trace
filebase is the first part of the filename - each generation is saved to a file formed from the filebase and the generation number

For each generation, computes the model similarity of every pair of individuals and outputs the result to a file - one file per generation - in the Output format described below.

Model similarity is computed using syntax-tree:similarity.

write-similarity-individuals (individuals filename)

individuals is a list of gems:individual instances
filename is the name of a file to save the similarity data to

Given a list of individuals, this function computes the similarity of every pair of individuals and outputs the result to a file of the given filename, in the Output format described below.

Output format

The functions write-fitness-generations, write-similarity-generations and write-similarity-individuals each compare every pair of individuals in one or more given generations. They output their findings in the following output format:

Each line in the file has three numbers: the indices of the two individuals, followed by the (fitness or model) similarity score. A blank line separates each "row" of the pairwise comparison, that is, each block of lines for one individual compared against the others.
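A sketch of a writer for this layout, under stated assumptions: fields are separated by single spaces, indices start at zero, and the blank line follows each block of lines sharing the same first index. The function write-similarity-matrix and its score-fn parameter are hypothetical, not part of gpstats.

```lisp
;; Hypothetical sketch: write "i j score" lines with a blank line after
;; each block of lines that share the same first index.

(defun write-similarity-matrix (size score-fn
                                &optional (stream *standard-output*))
  "Call SCORE-FN on every index pair below SIZE and write the results."
  (dotimes (i size)
    (dotimes (j size)
      (format stream "~d ~d ~a~%" i j (funcall score-fn i j)))
    (terpri stream)))

;; Usage with a stand-in score function:
(write-similarity-matrix 2 (lambda (i j) (abs (- i j))))
```

With the stand-in score function, this writes two blocks of two lines each, with a blank line after each block.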

This format is used because it is one of those accepted by gnuplot (see the second example); see plotting for why gnuplot was chosen. Alternative output formats, e.g. for MATLAB or R, can be supported if required.
