Herbie

Herbie (#61)

AEC Evaluation

These are instructions for evaluating Herbie, the artifact for PLDI 2015 paper #61. The main downloads for this artifact are the Submitted paper and the VirtualBox Image, along with these instructions.

Installing Herbie

There are three ways to try out Herbie. The simplest is to use a VirtualBox image to run Herbie; for users familiar with Docker, Herbie provides a Docker image which may be more convenient; and Herbie can also be built and run from source.

Virtual machine

To run Herbie in a virtual machine, download the virtual machine image, start VirtualBox, and boot the image. (Hypervisors other than VirtualBox should also work, but this has not been tested.)

The virtual machine boots into a graphical desktop with two icons.

In the virtual machine, Herbie can be run with the herbie command. In case you want to install additional software in the VM, the machine is a standard Ubuntu 14.04 Desktop installation, with username aec and password password.

Docker image

To run Herbie through Docker, install Docker and download the Herbie image to your computer with:

docker pull pldi15num61/herbie

Create a folder for Herbie to place its results into:

mkdir Results

You can now run Herbie with:

docker run -it -v $PWD/Results/:/herbie/graphs pldi15num61/herbie

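For convenience, create an alias for this command in your shell. In Bash, you would do this by executing the line below. The quotes are required because the command contains spaces; with double quotes, $PWD is expanded when the alias is defined, so results always go to the Results folder created above.

alias herbie="docker run -it -v $PWD/Results/:/herbie/graphs pldi15num61/herbie"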
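An alias cannot create the results folder on demand or show what it is about to run. As an alternative, the docker invocation above can be wrapped in a shell function. The following is a minimal sketch and not part of the Herbie artifact: the function name herbie_docker and the variables HERBIE_RESULTS and HERBIE_DRY_RUN are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper around the documented docker invocation.
# HERBIE_RESULTS (assumed name): absolute path of the results folder,
#   defaulting to $PWD/Results as in the instructions above.
# HERBIE_DRY_RUN (assumed name): if non-empty, print the command
#   instead of running it.
herbie_docker() {
    herbie_results_dir="${HERBIE_RESULTS:-$PWD/Results}"

    # Make sure the folder mounted into the container exists.
    mkdir -p "$herbie_results_dir" || return 1

    # Rebuild the argument list: the docker command, then the caller's arguments.
    set -- docker run -it -v "$herbie_results_dir/:/herbie/graphs" \
        pldi15num61/herbie "$@"

    if [ -n "${HERBIE_DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

With this definition, herbie_docker bench/hamming plays the role of herbie bench/hamming, and setting HERBIE_DRY_RUN=1 first shows the exact docker command without starting a container.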

Installing from source

Herbie is written in Racket and developed on GitHub. To run Herbie, you'll need to install Racket. Take care to use the official installer rather than your distribution's package manager or a tool like Homebrew on OS X. These repositories often carry out-of-date Racket versions (Herbie requires 6.1) or buggy versions of Racket's bundled mathematics libraries. Note that Herbie's git history contains the names of Herbie's authors, so this method may sacrifice double-blind evaluation.

Herbie's source can be downloaded with:

git clone https://github.com/uwplse/herbie.git herbie

Build Herbie by running:

cd herbie && raco make herbie/reports/make-report.rkt

Herbie can now be run with:

racket herbie/reports/make-report.rkt

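For convenience, create an alias for this command in your shell. In Bash, you would do this by executing the line below. The quotes are required because the command contains a space, and because the path is relative, the alias works only from the Herbie source directory.

alias herbie="racket herbie/reports/make-report.rkt"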

Unlike for the virtual machine or the Docker image, results will appear in graphs/ inside the Herbie source directory.

Evaluating Herbie

Now that Herbie is installed and can be run, there are several experiments you can perform to reproduce the results in the paper. These instructions assume the herbie alias has been defined as in the instructions above.

Reproducing the main evaluation

To reproduce the results from the main evaluation, run:

herbie bench/hamming

This command will take a while to run and requires at least two gigabytes of memory to complete. (Runtime can be anywhere from five minutes to an hour, depending on the number of CPUs available, the available memory, and the speed of the machine. In a virtual machine, it may take longer still.)

Once the run completes, open report.html from the results folder in a browser. (The page has been tested mostly in Firefox, but should work in all modern browsers.) Note that each invocation of herbie will overwrite this report page. The top of the page should contain a graphic similar to the double-precision results in Figure 7 of the paper.

The results in the figure may not be exactly identical to those in the paper, for the following reasons:

We do not expect any of these sources of error to lead to a significant difference in the results.

The rest of the report contains various details of how Herbie achieved its results and several metrics for evaluating them. We did not discuss these metrics in the paper, but invite the artifact evaluator to explore them. For each benchmark, the Target bits column represents the average bits correct for Hamming's answer, when known.
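Because each invocation of herbie overwrites report.html, it can help to copy the results folder aside before starting another run. The following is a minimal sketch, not part of the Herbie artifact: the function name save_results and the naming scheme for the copy are assumptions made here. It copies the whole folder rather than only report.html, on the assumption that the report may link to other files stored next to it.

```shell
# Hypothetical helper: save_results LABEL [RESULTS_DIR]
# Copies RESULTS_DIR (default: Results) to RESULTS_DIR-LABEL, refusing to
# overwrite an existing copy.
save_results() {
    save_label="$1"
    save_src="${2:-Results}"
    save_src="${save_src%/}"            # drop a trailing slash, if any
    save_dest="$save_src-$save_label"

    [ -n "$save_label" ] || { echo "usage: save_results LABEL [RESULTS_DIR]" >&2; return 2; }
    [ -f "$save_src/report.html" ] || { echo "no report.html in $save_src" >&2; return 1; }
    [ ! -e "$save_dest" ] || { echo "$save_dest already exists" >&2; return 1; }

    cp -R "$save_src" "$save_dest"
}
```

For example, running save_results hamming after herbie bench/hamming leaves a copy in Results-hamming. For a source installation, pass graphs as the second argument.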

Reproducing the extended test suite evaluation

To reproduce Herbie's results on the extended evaluation, execute:

herbie bench

This command may take several hours to execute, and is expected to require as much as four gigabytes of memory. The results will again be summarized in report.html. Note that for the numeric results reported in the paper, only some of the test cases were considered. (Herbie's complete benchmarks contain several trivial or duplicate benchmarks, since the same formula sometimes shows up in multiple places; these were ignored in the reported results.)

More things to do with Herbie

Herbie supports several additional options, which can be used to explore the effect of other parameters. These options are summarized by herbie --help:

-r R
A random seed to use; a fixed seed is stored in the SEED environment variable. Omitting this argument asks Herbie to choose a new seed.
-n N
Number of iterations of the main loop to use.
-s N
Number of sample points to use during the internal search.
-f category:flag
Toggle flags that govern Herbie's search. This option can be repeated. If both sample:double and precision:double are toggled, Herbie will search for improvements in single-precision mode. The other flags turn off various parts of Herbie's search, and are not recommended.
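As an illustration of combining these options, the single-precision mode described above needs two -f toggles on every run, which a small function can bundle. This is a sketch under stated assumptions: the function name herbie_single is hypothetical, and the sketch assumes that options are given before the benchmark path. If herbie is an alias, define it before the function in an interactive shell so that the alias is expanded inside the function body.

```shell
# Hypothetical helper: run herbie with both sample:double and
# precision:double toggled, which the option list above describes as
# searching in single-precision mode. Extra arguments are passed through.
herbie_single() {
    herbie -f sample:double -f precision:double "$@"
}
```

A call such as herbie_single -n 3 bench/hamming would then additionally set the number of main-loop iterations; the value 3 is only an illustrative choice, not a recommended setting.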

Writing new tests

New benchmarks can also be written and passed to Herbie. To do this, create a new file in bench/ named something.rkt. The file must follow Herbie's standard benchmark format; see bench/basic.rkt for an example. Then run Herbie on the new file:

herbie bench/something.rkt

A report is produced as usual.
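Since bench/basic.rkt is the reference for the file format, one way to start a new benchmark file is to copy it and then edit the copy. The following is a minimal sketch for a source checkout or the virtual machine; the function name new_bench is hypothetical, and the helper only copies the example file without checking its contents.

```shell
# Hypothetical helper: new_bench NAME [BENCH_DIR]
# Copies BENCH_DIR/basic.rkt (default BENCH_DIR: bench) to BENCH_DIR/NAME.rkt
# as a starting point, refusing to overwrite an existing file.
# Prints the path of the new file on success.
new_bench() {
    bench_name="$1"
    bench_dir="${2:-bench}"

    [ -n "$bench_name" ] || { echo "usage: new_bench NAME [BENCH_DIR]" >&2; return 2; }
    [ -f "$bench_dir/basic.rkt" ] || { echo "missing $bench_dir/basic.rkt" >&2; return 1; }
    [ ! -e "$bench_dir/$bench_name.rkt" ] || { echo "$bench_dir/$bench_name.rkt already exists" >&2; return 1; }

    cp "$bench_dir/basic.rkt" "$bench_dir/$bench_name.rkt" &&
        echo "$bench_dir/$bench_name.rkt"
}
```

After new_bench something, edit bench/something.rkt to contain the new formulas and run herbie bench/something.rkt as described above.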