AEC Evaluation
These are instructions for evaluating Herbie, the artifact for PLDI 2015 paper #61. The main downloads for this artifact are the Submitted paper and the VirtualBox Image, along with these instructions.
There are three ways to try out Herbie. The simplest is to use a VirtualBox image to run Herbie; for users familiar with Docker, Herbie provides a Docker image which may be more convenient; and Herbie can also be built and run from source.
To run Herbie in a virtual machine, download the virtual machine image, start VirtualBox, and start the image in VirtualBox. (VMs other than VirtualBox should also work; however, this has not been tested.)
The virtual machine will start into a graphical desktop with two icons on the desktop:
Results, which holds Herbie's output. It contains a recent run's results.
README.html, which is a copy of these instructions.
In the virtual machine, Herbie can be run with the herbie command.
In case you want to install additional software in the VM,
the machine is a standard Ubuntu 14.04 Desktop installation,
with username aec and password password.
To run Herbie through Docker, install Docker and download the Herbie image to your computer with:
docker pull pldi15num61/herbie
Create a folder for Herbie to place its results into:
mkdir Results
You can now run Herbie with the incantation
docker run -it -v $PWD/Results/:/herbie/graphs pldi15num61/herbie
For convenience, create an alias for this command in your shell. In Bash, you would do this by executing:
alias herbie='docker run -it -v $PWD/Results/:/herbie/graphs pldi15num61/herbie'
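A shell function can serve the same purpose as the alias and also works inside scripts. The following is a minimal sketch, not part of the artifact: the function name herbie_docker and the HERBIE_RESULTS and DOCKER variables are hypothetical names chosen for illustration, while the image name and mount point are taken from the command above.

```shell
# Hypothetical wrapper around the docker command shown above.
# HERBIE_RESULTS and DOCKER are names invented for this sketch.
herbie_docker() {
    # Host directory that receives Herbie's output (defaults to ./Results).
    results_dir="${HERBIE_RESULTS:-$PWD/Results}"
    # Create the directory if it does not exist yet.
    mkdir -p "$results_dir" || return 1
    # Same invocation as above; any arguments are passed through to Herbie.
    "${DOCKER:-docker}" run -it -v "$results_dir/:/herbie/graphs" pldi15num61/herbie "$@"
}
```

With this function defined, herbie_docker bench/hamming would behave like the aliased herbie bench/hamming, assuming Docker is installed and the image has been pulled.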
Herbie is developed on GitHub in Racket.
To run Herbie, you'll need to install Racket.
Take care to use the official installer,
instead of using your distribution's package manager or a tool like OS X Homebrew.
These repositories often have out-of-date Racket versions (Herbie requires 6.1)
or buggy versions of Racket's bundled mathematics libraries.
Note that Herbie's git history contains the names of Herbie's authors,
so this method may sacrifice double-blind evaluation.
Herbie's source can be downloaded with:
git clone https://github.com/uwplse/herbie.git herbie
Build Herbie by running:
cd herbie && raco make herbie/reports/make-report.rkt
Herbie can now be run with:
racket herbie/reports/make-report.rkt
For convenience, create an alias for this command in your shell. In Bash, you would do this by executing:
alias herbie='racket herbie/reports/make-report.rkt'
Unlike with the virtual machine or the Docker image,
results will appear in graphs/ inside the Herbie source directory.
Now that Herbie is installed and can be run,
there are several experiments you can perform
to reproduce the results in the paper.
These instructions assume the herbie alias
has been defined as in the instructions above.
To reproduce the results from the main evaluation, run:
herbie bench/hamming
This command will take a while to run and needs at least two gigabytes of memory to complete. (Runtime can be anywhere from five minutes to an hour, depending on the number of CPUs available, the available memory, and the speed of the machine. In a virtual machine, it may take longer still.)
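On Linux (including the Ubuntu virtual machine), the memory requirement can be checked before starting a long run. The sketch below is an illustration only: the function names mem_total_kb and has_enough_memory are hypothetical, and it assumes a /proc/meminfo file with the usual MemTotal line reported in kB.

```shell
# Hypothetical helpers; assumes a Linux-style /proc/meminfo.
mem_total_kb() {
    # Print the MemTotal value (in kB) from a meminfo-style file.
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

has_enough_memory() {
    # $1: required gigabytes; $2: optional meminfo file (defaults to /proc/meminfo).
    [ "$(mem_total_kb "$2")" -ge $(( $1 * 1024 * 1024 )) ]
}
```

For example, has_enough_memory 2 || echo "less than 2 GB of memory" would print a warning on a machine that falls below the stated minimum.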
Once complete, open report.html, from the results folder, in a browser.
(The page has been mostly tested in Firefox,
but should work in all modern browsers.)
Note that each invocation of herbie will overwrite this report page.
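Because the report is overwritten, it can be useful to keep a copy of the results folder before starting another run. The following sketch is one way to do this; the function name save_results and the naming scheme for the copy are hypothetical, and the whole folder is copied on the assumption that the report may link to other files next to it.

```shell
# Hypothetical helper: copy a results folder aside under a run label.
save_results() {
    src="${1%/}"     # results folder, e.g. Results (trailing slash removed)
    dest="$src-$2"   # copy destination, e.g. Results-hamming
    # Refuse to run if there is no report to save.
    [ -f "$src/report.html" ] || { echo "no report.html in $src" >&2; return 1; }
    # Refuse to overwrite an earlier saved copy.
    [ ! -e "$dest" ] || { echo "$dest already exists" >&2; return 1; }
    cp -R "$src" "$dest"
}
```

For example, save_results Results hamming after the main evaluation would leave a copy in Results-hamming that later runs do not touch.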
The top of the page should contain a graphic
similar to the double precision results in Figure 7 from the paper.
The results in the figure may not be identical to those in the paper, for the following reasons:
We do not expect any of these sources of error to lead to a significant difference in the results.
The rest of the report contains various details of how Herbie achieved its results
and several metrics for evaluating them.
We did not discuss these metrics in the paper, but invite the artifact evaluator to explore them.
For each benchmark, the Target bits column represents
the average bits correct for Hamming's answer, when known.
To reproduce Herbie's results on the extended evaluation, execute:
herbie bench
This command may take several hours to execute,
and is expected to require as much as four gigabytes of memory.
The results will again be summarized in report.html.
Note that for the numeric results reported in the paper,
only some of the test cases were considered.
(Herbie's complete benchmarks contain several trivial or duplicate benchmarks,
since the same formula sometimes shows up in multiple places;
these were ignored in the reported results.)
Herbie supports several additional options,
which can be used to explore the effect of other parameters.
These options are summarized by herbie --help:
SEED environment variable. Omitting this argument asks Herbie to choose a new seed.
If sample:double and precision:double are toggled, Herbie will search for improvements in single-precision mode.
The other flags turn off various parts of Herbie's search, and are not recommended.
New benchmarks can also be written and passed to Herbie.
To do this, create a new file in bench/ named something.rkt.
This file should be in a standard format; see bench/basic.rkt for an example.
Herbie can then be run on the new file with:
herbie bench/something.rkt
A report is produced as usual.