README

CHICKEN benchmarks

Some benchmark programs have been taken from CHICKEN version 4.2.0 and
other programs have been adapted from other benchmark suites or just
made for this suite.

Note: please see the section "Results' stability" for information on
how to get stable benchmark results.

Before you can use it with CHICKEN 5, you must run this command first:

  $ chicken-install srfi-1 srfi-13 srfi-69

In CHICKEN 4, these extensions are included with the base system, so
you don't need to install anything.

A simpler runner (run.scm) is available:

  $ ./run.scm -h
  Usage: run.scm [ <options> ] [ config file ]
  
  <options> are:
    --programs-dir=<directory>      directory where programs are
    --log-file=<file>               the log filename
    --debug-file=<file>             the debug log filename
    --repetitions=<number>          number of times to repeat each program
    --csc-options=<csc options>     options to give csc when compiling programs
    --runtime-options=<options>     runtime options to give programs
    --programs=<prog1>,<prog2>      a comma-separated list of programs to run
    --skip-programs=<prog1>,<prog2> a comma-separated list of programs to skip
    --max-deviance=<number>         maximum accepted deviance in the results
                                    (standard deviation normalized against mean).
                                    Default = 5

The configuration file (optional) is a scheme file which can set the
following parameters:

* programs-dir: directory where programs to be benchmarked can be
    found (default = "progs/general")

* repetitions: number of times each benchmark program is executed
    (default = 10)

* installation-prefix: the CHICKEN installation prefix (default =
  prefix of the running chicken)

* csc-options: options to be passed to csc

* log-file: a sexpr-formatted log file where results are written
    (default = benchmark.log)

* debug?: shows the executed shell commands

* programs: a list of symbols or #f.  If it is a list of symbols, the
    symbols indicate what programs are to be run.  #f indicates all
    programs (default = all programs)

* skip-programs: a list of symbols indicating what programs are not to
    be run (default = '())

* max-deviance: a number specifying the maximum deviance in
    results (default = 5).


If the configuration file is not provided, run.scm will pick the
CHICKEN tools from $PATH.


Configuration file example:

  $ cat 4.7.0.conf
  (csc-options "-O3")


compare.scm can be used to display log results in a readable format.
It prints pure text output (or text decorated with ANSI escape
sequences if the output is a terminal) or generate an HTML page.
compare.scm requires the sxml-transforms egg.

Example:

  $ ./compare.scm 4.7.0.log 4.7.4.log
  +---[1]:
  |-> installation-prefix: /usr/local/chicken-4.7.0
  |-> csc-options: -O3
  |-> repetitions: 10

  +---[2]:
  |-> installation-prefix: /usr/local/chicken-4.7.4
  |-> csc-options: -O3
  |-> repetitions: 10

  Displaying normalized results (larger numbers indicate better results)

  Programs                   [1]       [2]
  ========================================
  0_________________________1.00______1.00
  binarytrees_______________1.00______1.78
  boyer_____________________1.00______1.02
  browse____________________1.00______1.02
  conform___________________1.01______1.00
  cpstak____________________1.00______1.01
  ctak______________________1.00______1.02
  dderiv____________________1.07______1.00
  deriv_____________________1.05______1.00
  destructive_______________1.03______1.00
  div-iter__________________1.00______1.62
  div-rec___________________1.00______1.10
  dynamic___________________1.00______1.20
  earley____________________1.00______1.19
  fft_______________________1.00______1.02
  fib_______________________1.00______1.00
  fibc______________________1.00______1.02
  fprint____________________1.04______1.00
  fread_____________________1.00______1.36
  graphs____________________1.00______1.09
  hanoi_____________________1.00______1.00
  kanren____________________1.00______1.07
  lattice___________________1.00______1.13
  maze______________________1.00______1.08
  mazefun___________________FAIL______1.00
  nbody_____________________1.00______1.07
  nboyer____________________1.00______1.24
  nestedloop________________1.00______1.03
  nfa_______________________1.00______1.27
  nqueens___________________1.00______1.10
  nucleic2__________________1.00______1.15
  paraffins_________________1.01______1.00
  peval_____________________1.00______1.09
  psyntax___________________1.00______1.12
  puzzle____________________1.00______1.04
  sboyer____________________1.00______1.21
  scheme____________________1.00______1.18
  slatex____________________1.06______1.00
  sort1_____________________1.00______1.11
  tak_______________________1.00______1.05
  takl______________________1.00______1.18
  takr______________________1.00______1.01
  traverse__________________1.00______1.10
  travinit__________________1.00______1.05
  triangl___________________1.00______1.04


query.scm can be used to extract some specific information out of
benchmark log files.  Examples:

  $ ./query.scm -h
  Usage: query.scm <command> [<options>] log-file
  
  <command>s are:
  csc-options
  programs
  repetitions
  runtime-options
  build-time [--programs=<prog1>[,prog2...]]
  cpu-time [--programs=<prog1>[,prog2...]]
  major-gcs-time [--programs=<prog1>[,prog2...]]
  mutations [--programs=<prog1>[,prog2...]]
  mutations-tracked [--programs=<prog1>[,prog2...]]
  major-gcs [--programs=<prog1>[,prog2...]]
  minor-gcs [--programs=<prog1>[,prog2...]]
  
  $ ./query.scm runtime-options benchmark.log
  -:s256k
  
  $ ./query.scm cpu-time benchmark.log 
  525.331999999999
  
  $ ./query.scm cpu-time --programs=maze,psyntax benchmark.log 
  8.456


## Results' stability

You might notice that the results my vary on your system, even if you
use the same programs with the same inputs.  Some environmental
factors may influence the stability of results:

* Multi-task systems.  The benchmark programs compete with other
  programs for the use of resources on your system (e.g., CPU, memory,
  I/O).  It's important that the system where you are running the
  benchmarks doesn't run programs that might influence the benchmark
  results.

* Swapping.  If your system uses swap (non-volatile memory as RAM)
  while benchmarks are running, the results may be greatly distorted.

* CPU frequency governor.  Modern CPUs might change their clock
  according to various policies.  On Linux systems, some examples are:
  "ondemand", "powersave", "performance" and "conservative".  The CPU
  frequency governor can highly influence the benchmark results (see
  http://paste.call-cc.org/paste?id=9199f833f1ccae2c9a1234980ffe055c5ab4a677
  for an example).  The recommended CPU frequency governor for
  benchmarks is "performance".

* CPU frequency throttling due to overheating.  CPUs can automatically
  reduce their clock if they start to overheat, which usually happens
  when they get stressed (and that's what this benchmark tool does to
  the CPU).  So, to get stable results, make sure that your CPU
  doesn't overheat to the point of getting its frequency throttled
  during the execution of benchmarks.

This tool will report deviances in the results, which indicate that
they are not very stable.  The deviances are determined by computing
the standard deviation of results and normalizing it against the mean
(see the --max-deviance command line option).  By default, deviances
greater than 5 will be highlighted in the results' summary.

Ideally, we should have zero deviance, but that might not be
achievable in practice, due to the nature of modern systems.