EcoData Retriever Developer Documentation

The EcoData Retriever is a Python library that simplifies downloading and importing ecological data. A great deal of publicly available data exists, and the core team can only add new datasets so quickly, so we've made it easy for developers to write their own custom dataset scripts.

This document is a resource for Python developers or ecologists interested in developing scripts for use with the EcoData Retriever.

We encourage users who develop their own scripts to submit them to the Retriever team, so that they can be used by other researchers in future distributions.

API Documentation

Read the API documentation for help with the EcoData Retriever API.

Scripts

The EcoData Retriever platform is divided into three packages:

The platform dynamically loads dataset scripts from Python (.py) files found in the "scripts" directory. Each script contains the instructions for accessing one dataset; if you need a dataset that's not yet supported, you can create your own script in the "scripts" directory and it will show up automatically. On Mac and Linux, this directory is ~/.retriever/scripts by default.
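
To make the idea of dynamic loading concrete, here is a minimal sketch of how a directory of .py files could be scanned for SCRIPT objects. It is not the Retriever's own loader: the function name load_scripts is hypothetical, and the choice to silently skip files that fail to import or that define no SCRIPT variable is an assumption made for illustration.

```python
import importlib.util
from pathlib import Path


def load_scripts(directory):
    """Return the SCRIPT object from every loadable .py file in `directory`.

    Hypothetical helper for illustration; not part of the Retriever API.
    """
    scripts = []
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        try:
            # Execute the file so that its module-level names are defined.
            spec.loader.exec_module(module)
        except Exception:
            # Assumption: a file that fails to import is skipped, not fatal.
            continue
        script = getattr(module, "SCRIPT", None)
        if script is not None:
            scripts.append(script)
    return scripts
```

Calling load_scripts on a directory returns one object per file that imported cleanly and defined SCRIPT, in file-name order.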

A simple example

The EcoData Retriever platform was developed to take most of the effort out of the most common data-import tasks. As an example, the Ernest Mammal Life History dataset, one of our standard scripts, is implemented here in just three lines of code (expanded for clarity):

from retriever.lib.templates import BasicTextTemplate

VERSION = '0.5'

SCRIPT = BasicTextTemplate(
    name="Mammal Life History Database (Ecological Archives 2003)",
    description="S. K. Morgan Ernest. 2003. Life history characteristics of placental non-volant mammals. Ecology 84:3402.",
    shortname="MammalLH",
    urls={"species": "http://esapubs.org/archive/ecol/E084/093/Mammal_lifehistories_v2.txt"},
)

Here's a simple walkthrough of what's going on in this script:

  1. from retriever.lib.templates import BasicTextTemplate
    The BasicTextTemplate class contains all of the functionality needed to download raw data files, create database tables, and import data.
  2. SCRIPT = BasicTextTemplate( ... )
    The Retriever looks for this variable, SCRIPT, when it loads scripts; SCRIPT should be an instance of the Script class found in retriever.lib.models (or of another class that inherits from it, such as BasicTextTemplate).
  3. name=..., description=..., shortname=...
    These keyword arguments give basic information about the script.
  4. urls = {"species": "http://..."}
    The BasicTextTemplate class uses the urls dictionary to create tables. Each key is the name of a table to be created; each value is the URL of the raw data file, which will be automatically parsed for column names, data types, etc. Note that Python dictionaries did not guarantee any ordering before Python 3.7, so do not rely on these tables being downloaded in any particular order.
    Optionally, a second dictionary called tables can be passed if the Retriever is unable to automatically determine the structure of a table that should be created. Its keys are the table names, and each value should be an instance of the Table class found in retriever.lib.models.
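
As a rough illustration of what "automatically parsed for column names, data types, etc." can involve, the sketch below guesses column names and types from a tab-delimited text sample. It is not the Retriever's parsing code: infer_type, infer_columns, the type labels ("int", "double", "char"), and the sample rows are all hypothetical. When guessing of this kind gives the wrong answer, supplying an explicit table definition is the natural fallback.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of int, double, char that fits every non-empty value."""
    for cast, label in ((int, "int"), (float, "double")):
        try:
            for value in values:
                if value != "":
                    cast(value)
            return label
        except ValueError:
            # At least one value did not fit; try the next, wider type.
            continue
    return "char"


def infer_columns(text, delimiter="\t"):
    """Return (column_name, type_label) pairs from a header row plus data rows."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(header):
        values = [row[index] for row in data if index < len(row)]
        columns.append((name.strip(), infer_type(values)))
    return columns


# Made-up sample data, used only to exercise the functions above.
SAMPLE = "species\tmass\tlitters\nMus musculus\t19.3\t5\nOvis aries\t39000\t1\n"
COLUMNS = infer_columns(SAMPLE)
```

Here the species column cannot be read as a number and falls through to "char", mass contains a decimal value and becomes "double", and litters stays "int".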

Adding scripts to the Retriever

If you've developed a script and you'd like it to be accessible from the Retriever's GUI, place it in the scripts directory and it will be loaded automatically when the Retriever starts. If there are any problems with your script, it will not be shown.

Testing

The EcoData Retriever system also contains a helper class, ScriptTest, for use in unit testing.

Like script development itself, writing unit tests has been streamlined. The ScriptTest class is meant to run a script, read the data back from the newly created database, and compare it against a text file containing manually imported data.
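
The run, read back, and compare pattern described above can be sketched with the standard library alone. This example does not use ScriptTest, whose interface is not shown here: run_stand_in_script, read_expected, SpeciesImportTest, the in-memory SQLite database, and the reference rows are all assumptions made for illustration.

```python
import csv
import io
import sqlite3
import unittest

# Stand-in for the text file of manually imported data (made-up rows).
EXPECTED_TEXT = "species,mass\nMus musculus,19.3\nOvis aries,39000\n"


def run_stand_in_script(connection):
    """Pretend to be a dataset script: create a table and load two rows."""
    connection.execute("CREATE TABLE species (species TEXT, mass REAL)")
    connection.executemany(
        "INSERT INTO species VALUES (?, ?)",
        [("Mus musculus", 19.3), ("Ovis aries", 39000)],
    )


def read_expected(text):
    """Parse the reference text into a list of (species, mass) tuples."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(name, float(mass)) for name, mass in reader]


class SpeciesImportTest(unittest.TestCase):
    def test_table_matches_reference(self):
        connection = sqlite3.connect(":memory:")
        try:
            run_stand_in_script(connection)
            rows = connection.execute(
                "SELECT species, mass FROM species ORDER BY species"
            ).fetchall()
        finally:
            connection.close()
        # The imported rows must equal the manually prepared reference rows.
        self.assertEqual(rows, sorted(read_expected(EXPECTED_TEXT)))
```

Saved in a file, a test case like this can be run with python -m unittest. Sorting both sides keeps the comparison independent of the order in which rows were inserted.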

See the scripts included in the EcoData Retriever distribution for examples of test classes.

To run all tests, you'll first need to add your script, as explained in the section "Adding scripts to the Retriever". Running tests.py will then run all unit tests found in the loaded scripts.