Reproducible Environments and GitHub: Nix vs. Docker

Nix: The Reproducible Package Manager

Nix is a package manager for building software environments reproducibly. You describe your entire environment, including the exact versions of R and all your R packages, in a single file. Anyone can then use this file to get the same setup.

The Nix Workflow with RStudio and GitHub

  1. Install Nix: On Windows, search for ‘terminal’ or ‘command prompt’ in the search bar, open it, and type wsl --install
    (wsl stands for Windows Subsystem for Linux).

The goal is to install a Linux distribution, e.g. Ubuntu. You will be asked to provide a user name and password. Note: when you type the password, no asterisks or other symbols are shown. Type it carefully, then press Enter.

Open Ubuntu, either by searching for ‘Ubuntu’ and opening it, or by typing and executing ‘wsl’ in the command prompt. Now proceed with the instructions on nixos.org/download. I had to execute:
sh <(curl --proto '=https' --tlsv1.2 -L https://nixos.org/nix/install) --no-daemon
(Note: leave out the dollar sign shown at the start of the command on the website; it is the shell prompt, not part of the command.)

From here on, the steps depend heavily on the local system you are using. I would recommend working with an AI assistant: tell it which system you are on, that you installed WSL and Ubuntu, and that you want to use RStudio with Nix. Ask it to use flake.nix (by default you will be told to use shell.nix or default.nix) to set up the R version and the R packages and package versions you would like to use. In flake.nix, it is possible to load specific versions of R packages, all collected in a dedicated GitHub repository.
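
To give an idea of what such a file can look like, here is a minimal sketch that writes a flake.nix into the current directory. It is an illustration under assumptions, not the setup described above: the nixpkgs branch (nixos-24.05), the system string (x86_64-linux) and the two example packages are my choices, and the package versions come from the pinned nixpkgs branch rather than from the dedicated repository mentioned in the text.

```shell
# Write a minimal flake.nix into the current directory.
# The quoted 'EOF' keeps the shell from expanding ${system} inside the file.
cat > flake.nix <<'EOF'
{
  description = "R and RStudio development shell (illustrative sketch)";

  # Assumption: pin a nixpkgs release branch; it determines which R and
  # R package versions are offered.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";  # assumption: WSL/Ubuntu on a 64-bit PC
      pkgs = nixpkgs.legacyPackages.${system};
      # Example packages only; replace with the ones your project needs.
      rPkgs = with pkgs.rPackages; [ ggplot2 jsonlite ];
    in {
      devShells.${system}.default = pkgs.mkShell {
        packages = [
          (pkgs.rWrapper.override { packages = rPkgs; })
          (pkgs.rstudioWrapper.override { packages = rPkgs; })
        ];
      };
    };
}
EOF
echo "wrote flake.nix"
```

With this file in your project folder, running nix develop from the Ubuntu prompt should open a shell in which R and rstudio are available. Flakes are still an experimental feature, so you may first have to enable the nix-command and flakes features in your Nix configuration. The first run also creates a flake.lock file that records the exact nixpkgs revision; committing it together with flake.nix is what lets collaborators get the same versions.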

Attention: If you want to try this out, I recommend running ‘nix-shell’ (for shell.nix) or ‘nix develop’ (for flake.nix), followed by the ‘rstudio’ command, from the Unix/Ubuntu command prompt, NOT from the RStudio terminal. Hint: Most keyboard shortcuts, such as copy and paste, and file path formats are different under Unix (e.g. Shift + Ctrl + C for copy; forward slash / in paths; Windows paths starting with c:/users can be accessed under /mnt/c/users).
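
The path rule in the hint can be sketched as a small shell function. The function name win_to_wsl is hypothetical and only illustrates the mapping; inside WSL the built-in wslpath utility does the same job.

```shell
# Hypothetical helper: convert a Windows-style path to its WSL mount path.
win_to_wsl() {
  # Turn backslashes into forward slashes.
  p=$(printf '%s' "$1" | tr '\\' '/')
  # Take the drive letter (first character) and lower-case it.
  drive=$(printf '%s' "$p" | cut -c1 | tr '[:upper:]' '[:lower:]')
  # Everything after "C:" is the rest of the path.
  rest=$(printf '%s' "$p" | cut -c3-)
  printf '/mnt/%s%s\n' "$drive" "$rest"
}

win_to_wsl 'C:/Users/me/project'   # prints /mnt/c/Users/me/project
```

The sketch assumes an absolute path that starts with a drive letter and a colon; it does not handle relative paths or network shares.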

Pros and Cons of using Nix

  • Pros:
    • Reproducibility: Guarantees the same environment every time.
    • Isolation: Packages and dependencies are kept separate from your system, avoiding conflicts.
    • Fine-grained control: You can pin dependencies to specific versions, even across different operating systems.
    • Performance: Nix runs directly on your machine without the overhead of containerization.
  • Cons:
    • Learning Curve: The Nix language and ecosystem are complex for new users.
    • Initial Setup: Installing Nix and setting up your first environment can take time.
    • Package availability: Not every R package, or every version of a package, that is on CRAN is available through Nix.
    • All-or-Nothing: All collaborators must use Nix to get the benefits.

Docker: The Containerization Solution

Docker is a platform that uses containers to package an application and its dependencies into a single, isolated unit. Think of it as a lightweight virtual machine that contains everything your project needs to run, from the operating system libraries to the R packages (strictly speaking, a container shares the host's kernel rather than emulating a whole machine).

The Docker Workflow with RStudio and GitHub

While the following steps give a rough overview of one possible way to use Docker with R, a more recent worked example online, combined with AI assistance, might be your best shot at trying this out, as the exact steps can change with updated software versions. Once it is set up, though, its execution is quite stable in my experience. It might be worth looking at the pre-configured Rocker images for R: https://rocker-project.org/images/

We have integrated Docker into our interactive workshop websites (metaanalysis.zajitschek.net, lifespananalysis.zajitschek.net), and there may be other ways to use Docker for purposes other than reproducible analysis.

  1. Define Your Container: In the root of your R project, you create a text file named Dockerfile. This file contains instructions for building your container image. You’ll start with a base image (like a rocker/rstudio image), and then specify which R packages to install.

    FROM rocker/rstudio:4.3.0
    
    # Install R packages
    RUN R -e "install.packages('ggplot2', repos='https://cloud.r-project.org/')"
    RUN R -e "install.packages('jsonlite', repos='https://cloud.r-project.org/')"
    
    # Set the working directory
    WORKDIR /home/rstudio
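  2. Build the Image: From your terminal, run the command docker build -t my-r-project . (the trailing dot is part of the command and tells Docker to use the current directory as the build context). This command reads the Dockerfile and builds a new Docker image containing all your dependencies.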

  3. Run the Container: Once the image is built, you can run a container from it and map your project folder to it.

docker run --rm -p 8787:8787 -v C:/path/to/my/project:/home/rstudio my-r-project

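This command starts the container and makes RStudio accessible in your web browser at localhost:8787. The -v flag ensures that your local project files are accessible inside the container. Log in with the user name rstudio; you can set the password by adding -e PASSWORD=<your password> to the command, and if you do not, recent Rocker images generate one and print it in the container's startup output.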

  4. Version Control with Git and GitHub:

    • You commit the Dockerfile to your Git repository and push it to GitHub alongside your R code.
    • A collaborator clones the repository. They only need to run the docker build and docker run commands to get a fully working, isolated environment that is identical to yours.
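
The build and run steps above can be wrapped in a small script that a collaborator runs from the project folder after cloning. This is a sketch under assumptions: the variable names (IMAGE, PORT, RSTUDIO_PASSWORD, DRY_RUN) and the run helper are my own, and the script only prints the commands unless DRY_RUN=0 is set, so you can check them before anything is executed.

```shell
#!/bin/sh
# Sketch: build the image and start RStudio for the current project folder.
# All names below are illustrative and can be changed freely.
IMAGE="my-r-project"
PORT="8787"
PROJECT_DIR="$(pwd)"
RSTUDIO_PASSWORD="${RSTUDIO_PASSWORD:-changeme}"  # placeholder, pick your own
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Build the image from the Dockerfile in the current directory.
run docker build -t "$IMAGE" .

# Start the container, mounting the project folder into /home/rstudio.
run docker run --rm -p "$PORT:8787" \
  -e "PASSWORD=$RSTUDIO_PASSWORD" \
  -v "$PROJECT_DIR:/home/rstudio" \
  "$IMAGE"
```

Run it once as is to inspect the two docker commands, then again with DRY_RUN=0 to actually build and start the container. Using the current directory instead of a hard-coded path means the same script works unchanged on each collaborator's machine.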

Pros and Cons of using Docker

  • Pros:
    • Portability: Docker containers can run on any machine with Docker installed (Windows, macOS, or Linux).
    • Strong Isolation: The container is a fully isolated environment, which is great for security and avoiding conflicts.
    • Industry Standard: Docker is widely used, and there is a huge community and ecosystem of pre-built images.
    • Environment Encapsulation: The entire environment, from the operating system up, is encapsulated in the image.
  • Cons:
    • Performance Overhead: The containerization layer can add some overhead, especially for computationally intensive tasks and on Windows and macOS, where Docker runs inside a Linux virtual machine.
    • Resource Intensive: Docker can be more resource-heavy than Nix, as it requires a background daemon and can consume more disk space.
    • Debugging: It can be slightly more challenging to debug code inside a container than it is in a local environment.