Building a Docker Image

This document explains how to quickly prepare a Docker image from the biocontainers base image. The biocontainers base image has a pre-installed conda environment. The container should contain only the libraries your code depends on, so that you do not have to build a new image every time you modify your scripts. The BDP mounts your code into the container on the fly.

Let’s start by explaining how the Dockerfile is written.

1. Base Image

We use biocontainers as our base image. You may see their Dockerfile. The following instruction shows that our package image is based on the biocontainers/biocontainers image.

FROM biocontainers/biocontainers:latest

The biocontainers/biocontainers image is built on top of ubuntu:16.04. All modifications can be traced back through its Dockerfile. It creates an account named biodocker and installs several useful tools, such as conda, so that developers can install their dependent packages/libraries.
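
If you would like to confirm what the base image provides before writing your own Dockerfile, a quick interactive check like the following can be run on any machine with Docker installed (a sketch; the exact output depends on the image version):

# start a throwaway container from the base image (run on your host)
docker run --rm -it biocontainers/biocontainers:latest bash

# then, inside the container:
whoami            # prints biodocker, the pre-created account
conda --version   # conda is pre-installed and already on the PATH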

2. Install required packages/libraries that need root privileges

The following snippet in the Dockerfile instructs Docker how to build your image. The RUN instruction shows how to install Node.js (the 10.x line, via the NodeSource setup script) as a demonstration.

You may also install other packages/scripts that require root privileges.

See also

The Dockerfile reference: https://docs.docker.com/engine/reference/builder/

USER root
ENV NODE_PATH=/usr/lib/node_modules
RUN apt-get update && \
    curl -sL https://deb.nodesource.com/setup_10.x | bash - && \
    apt-get install -y nodejs && \
    apt-get clean && \
    apt-get purge --auto-remove -y curl

Tip

You can treat the commands after a RUN instruction as if you were typing them in a Linux shell. Instead of using multiple RUN instructions, chain multiple commands with && \ across several lines; this reduces the number of cached image layers. Use '\' to split one long command into multiple lines.
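
As a further sketch, other system packages can be installed in the same way while the Dockerfile is still running as root; the package names below (build-essential and zlib1g-dev) are only examples of common build dependencies, not requirements of this guide:

# still running as root at this point in the Dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential zlib1g-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*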

3. Install required packages/libraries that do NOT need root privileges

Because the biocontainers image has created a user account biodocker, you can switch to the biodocker account to install the required packages or libraries. The following snippet shows how to switch the user to biodocker (from root).

USER biodocker

The following sections show several ways to install the required packages:

When the tools are already in conda

The default conda channels r and bioconda are already set in the biocontainers base image, so you can use these channels directly. Otherwise, you may need the -c argument to specify a channel (e.g. conda install -c conda-forge nodejs).

You may also find other conda packages from the anaconda website.

RUN conda install r-base=3.3.2 \
                  samtools=1.4 \
                  bwa=0.7.15 \
                  fastqc=0.11.5 \
                  cutadapt

Although it is optional, it is best to specify the versions of these required packages/libraries.
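
As a minimal sketch of combining a channel with the install command (reusing the conda-forge example mentioned above; pin a version with = once you know which one you need):

# -c selects the channel, -y skips the interactive confirmation
RUN conda install -y -c conda-forge nodejs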

When tools need to be downloaded from the internet

You can use curl or wget to retrieve content from the internet. Remember to clean up files that are no longer needed after installation.

RUN mkdir -p /tmp/trim_galore && \
    cd /tmp/trim_galore && \
    wget http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/trim_galore_v0.4.3.zip && \
    unzip /tmp/trim_galore/trim_galore_v0.4.3.zip && \
    mv trim_galore /home/biodocker/bin/trim_galore && \
    rm -rf /tmp/trim_galore

Build from the source code

The following code block demonstrates how to download compressed source code, decompress the file(s), and build the executable from source. You may keep only the binary executable files, and you might want to make a file executable with chmod 755 your-executable-file (a separate sketch after the example below illustrates this). Remember to remove all unneeded files.

RUN cd /tmp && wget http://search.cpan.org/CPAN/authors/id/T/TI/TIMB/DBI-1.636.tar.gz && \
    tar zxvf DBI-1.636.tar.gz && \
    cd /tmp/DBI-1.636 && \
    perl Makefile.PL && \
    make && \
    make test && \
    make install && \
    rm -rf /tmp/DBI-1.636 && \
    rm -f /tmp/DBI-1.636.tar.gz
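
The chmod step mentioned above does not apply to the Perl module example, so here is a separate sketch for a hypothetical C tool; the URL, archive name, and binary name some-tool are placeholders, not a real project:

# build a (hypothetical) tool from source, keep only its binary, and clean up
RUN cd /tmp && \
    wget https://example.org/some-tool-1.0.tar.gz && \
    tar zxvf some-tool-1.0.tar.gz && \
    cd some-tool-1.0 && \
    make && \
    cp some-tool /home/biodocker/bin/some-tool && \
    chmod 755 /home/biodocker/bin/some-tool && \
    cd /tmp && \
    rm -rf some-tool-1.0 some-tool-1.0.tar.gz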

Copy scripts or files from other third-party libraries

You may arrange files into your own file structure. It is possible to copy files directly into the package container. Put your files inside the /home/biodocker/ directory, since this is the default home directory.

COPY --chown=biodocker:biodocker ["./scripts-inside-container", "/home/biodocker/scripts/"]

More information about the COPY instruction can be found in the Dockerfile reference linked above.

Install packages via package manager(s) that do not need root privileges

If your scripts require npm packages and a package.json is provided in the folder /home/biodocker/scripts, you may need to run npm install inside that folder. In fact, you may install required packages/libraries as if you were using a Linux system. For example, using pip install to install Python packages may suit your needs.

RUN cd /home/biodocker/scripts && \
    npm install && \
    npm cache clean --force
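
Similarly, a minimal sketch for Python dependencies with pip (assuming pip ships with the pre-installed conda environment; numpy and pandas are only placeholder package names for whatever your scripts actually import):

# --user installs under /home/biodocker/.local, so no root privilege is needed
RUN pip install --user numpy pandas && \
    rm -rf /home/biodocker/.cache/pip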

4. Building from the Dockerfile

In the directory where the Dockerfile is located, run docker build -t 'your-docker-image-name' . (Don't forget the trailing . in the command; it specifies the build context, i.e. the current working directory, where Docker looks for the Dockerfile by default.)

Tip

It is recommended to use GitLab to host your scripts and to use its container registry. It is free for personal use.
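
For example, a typical build-and-push sequence for the GitLab container registry could look like the following; registry.gitlab.com/your-group/your-project is only a placeholder for your own GitLab project path:

docker login registry.gitlab.com
docker build -t registry.gitlab.com/your-group/your-project/your-docker-image-name:latest .
docker push registry.gitlab.com/your-group/your-project/your-docker-image-name:latest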

Volume mapping from host folders to docker containers

The VOLUME mapping is automatically processed by Big Data Processor. The Docker Adapter mounts the project folder and the package folder into the container at the paths /project and /package, respectively. This means the scripts you edit on the BDP can be accessed in /package/scripts.

See also

To fully understand how to map volumes between hosts and containers, please read the docker run reference (the -v option).
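
Although the mounting is handled for you by the Docker Adapter, a hand-written equivalent is sketched below to illustrate where /project and /package come from; the host paths and the script name main.js are placeholders:

docker run --rm \
    -v /path/to/your/project:/project \
    -v /path/to/your/package:/package \
    your-docker-image-name \
    node /package/scripts/main.js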