
Scheduling data imports into DHIS2

This guide explains how to set up automated, scheduled imports of climate data into DHIS2. We demonstrate this using the Import ERA5-Land Daily workflow, showing how to move from interactive notebook exploration to production-ready scheduled imports. The same approach can be used to schedule any other workflow or script.

The key technologies we will use are Docker to containerize the environment, cron as the scheduler, and papermill to execute the import notebook.

For a complete example of an implementation that imports ERA5-Land data on a schedule, with additional convenience functionality, see the dhis2-era5land-simple repository.

Prerequisites

Before starting, ensure you have:

- Docker and Docker Compose installed
- A local clone of the DHIS2 Climate Tools repository
- A CDS API key stored in ~/.cdsapirc (used to download ERA5-Land data)
- Access to a DHIS2 instance (the examples use the public play server)

1) Gather the needed files

For this tutorial, we will use the provided example folder. It contains all the files we need, each explained in more detail below.

    workflows/
    └── scheduling/
        └── example
            ├── Dockerfile
            ├── docker-compose.yml
            ├── cronfile
            ├── requirements.txt  (copied from root folder)
            ├── notebooks
            |   └── import-era5-daily.ipynb  (copied from workflows folder)
            ├── configs
            |   └── import-temperature-config.yaml
            └── data
                └── ... (this is where climate data will be downloaded and cached)

Note that we have copied the requirements.txt file and the import-era5-daily.ipynb file from elsewhere in the repo.

2) Make the notebook configurable

The Import ERA5-Land Daily notebook hardcodes all input parameters, including sensitive settings like DHIS2 instance, username, and password. For automation, credentials and settings should be externalized rather than hardcoded in scripts.

Tag the parameters cell

Since we are using papermill to run the notebook, we first need to tell papermill where the parameters are defined. As described in the papermill documentation, this is done by adding a parameters tag to the notebook cell containing the parameters.
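For illustration, a tagged parameters cell might look like this. The variable names follow the config file used later in this tutorial; the values shown here are hypothetical notebook defaults, which papermill will override at runtime:

```python
# Notebook cell tagged "parameters"
# (in Jupyter: open the cell's property inspector and add the tag "parameters")
# Hypothetical defaults -- papermill overrides these with values from the YAML file.
DHIS2_BASE_URL = "https://play.im.dhis2.org/stable-2-42-3-1"
DHIS2_USERNAME = "admin"
DHIS2_PASSWORD = "district"
DHIS2_DRY_RUN = True
IMPORT_VARIABLE = "total_precipitation"
```

When papermill executes the notebook, it inserts a new cell directly after this one containing the injected values, so the defaults above are simply shadowed.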

Create the parameters yaml file

Papermill can then read parameters from a YAML file and inject them into the notebook, overriding the defaults defined in the cell tagged parameters. For this tutorial, we have included an import-temperature-config.yaml file that sets the parameters to import temperature data instead of the default precipitation.

DHIS2_BASE_URL: https://play.im.dhis2.org/stable-2-42-3-1
DHIS2_USERNAME: admin
DHIS2_PASSWORD: district

DHIS2_DATA_ELEMENT_ID: urBVcn8nZ7V
DHIS2_TIMEZONE_OFFSET: 0
DHIS2_DRY_RUN: True

IMPORT_VARIABLE: 2m_temperature
IMPORT_VALUE_COL: t2m
IMPORT_IS_CUMULATIVE: False
IMPORT_FROM_UNITS: kelvin
IMPORT_TO_UNITS: degC

IMPORT_START_DATE: 2025-01

DOWNLOAD_FOLDER: /app/data
DOWNLOAD_PREFIX: era5_hourly_temperature
TEMPORAL_AGGREGATION: mean
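Papermill parses this file with a standard YAML loader, so it is worth knowing how the values above are typed once injected into the notebook. A quick sanity check, assuming PyYAML (which papermill itself depends on):

```python
import yaml  # PyYAML, already present in the papermill environment

# A fragment of the config above, parsed the way papermill will parse it
fragment = """
DHIS2_DRY_RUN: True
DHIS2_TIMEZONE_OFFSET: 0
IMPORT_START_DATE: 2025-01
"""

params = yaml.safe_load(fragment)
assert params["DHIS2_DRY_RUN"] is True           # YAML booleans become Python bools
assert params["DHIS2_TIMEZONE_OFFSET"] == 0      # plain numbers become ints
assert params["IMPORT_START_DATE"] == "2025-01"  # a partial date stays a string
```

This matters when the notebook compares parameters: `DHIS2_DRY_RUN` arrives as a real boolean, not the string "True".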

This approach keeps credentials out of the notebook itself, lets the same notebook be reused with different parameter sets, and records exactly which parameters each scheduled run used.

3) Configure Docker and cron scheduling

For production use, we run the imports automatically on a schedule using Docker and cron.

To make our notebook run on a schedule, we have defined three files:

3.1) Dockerfile

The Dockerfile defines the container image: a minimal operating system with the tools needed to run schedules (such as cron) and the Python environment needed to run the import notebook. We include an example Dockerfile with what we will use for this tutorial. Its contents look like this:

# start with a Python 3.13 base image
FROM python:3.13-slim

# install system dependencies
RUN apt-get update && apt-get install -y \
    cron \
    git \
    tzdata \
    dos2unix \
    && rm -rf /var/lib/apt/lists/*

# set and enter the workspace
WORKDIR /app

# install the DHIS2 Climate Tools environment
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

To keep the Docker image generic and reusable, the Dockerfile only copies what we need to install the Python environment, without baking any notebooks, scripts, or configuration into the image itself.

The notebooks, cron configuration, and parameter files are provided later by mounting the project directory into the container at runtime using Docker Compose.

3.2) cronfile

To define the scheduled imports, we have created an example cronfile which looks like this:

# Set your local timezone here
TZ=Europe/Oslo

# Force the shell to bash (standardizes behavior)
SHELL=/bin/bash

# Add the Python/Pip installation path
PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

# ====================================================
# Define one or more commands to be run at a schedule
# Format is: {CRONPATTERN} {COMMAND}
# The > /proc/1/fd/1 2>&1 at the end redirects the cron logs to Docker logs

# Run ERA5 import for temperature at 02:00
0 2 * * * papermill --cwd /app /app/notebooks/import-era5-daily.ipynb /app/data/import-temperature.output.ipynb -f /app/configs/import-temperature-config.yaml --log-output --no-progress-bar > /proc/1/fd/1 2>&1

What the above cronfile does:

  1. Creates a crontab entry for running import-era5-daily.ipynb using the parameters from import-temperature-config.yaml.

  2. Forwards output to Docker logs for monitoring.

  3. Lets cron run continuously, executing the import according to the specified schedule for as long as the container is running.

Cron expression examples:

| Expression | Description |
| --- | --- |
| `0 6 * * *` | Daily at 6:00 AM |
| `0 1 * * *` | Daily at 1:00 AM |
| `0 0 * * 0` | Weekly on Sunday at midnight |
| `0 0 1 * *` | Monthly on the 1st |

Use crontab.guru to build expressions.
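To make the five-field pattern concrete, here is a deliberately simplified matcher written for illustration only (it handles `*` and comma-separated lists, but ignores ranges, steps, and cron's special day-of-month/day-of-week OR rule; `cron_matches` and `field_matches` are hypothetical helper names, not part of any library):

```python
def field_matches(field: str, value: int) -> bool:
    # "*" matches any value; otherwise accept a comma-separated list of numbers
    if field == "*":
        return True
    return any(int(part) == value for part in field.split(","))

def cron_matches(expr: str, minute: int, hour: int,
                 dom: int, month: int, dow: int) -> bool:
    # Fields are: minute, hour, day-of-month, month, day-of-week
    fields = expr.split()
    return all(field_matches(f, v)
               for f, v in zip(fields, (minute, hour, dom, month, dow)))

# "0 2 * * *" fires at minute 0, hour 2, on any day
assert cron_matches("0 2 * * *", 0, 2, 15, 6, 3)
assert not cron_matches("0 2 * * *", 30, 2, 15, 6, 3)
```

Real cron implementations support more syntax than this sketch; use crontab.guru when in doubt.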

3.3) docker-compose file

We also need a Docker Compose file that builds the Docker image, mounts our local example folder, and installs the cronfile with the crontab command. We include an example docker-compose.yml file that can be used for this tutorial. It should look like this:

services:
  schedule:
    build: .
    image: climate-scheduler:latest     # Explicit image name
    container_name: climate-scheduler   # Explicit container name
    restart: unless-stopped             # Restarts in the case of a crash
    environment:
      - TZ=Europe/Oslo                  # Set your local timezone here
    volumes:
      - .:/app                          # Links current folder content to app folder
      - ./data:/app/data                # Links local data folder to container folder
      - ~/.cdsapirc:/root/.cdsapirc:ro  # Links local CDS API key to container root user folder
    command: >
      sh -c "
        dos2unix /app/cronfile &&
        crontab /app/cronfile &&
        cron &&
        tail -f /dev/null
      "

Things to note:

- The volume mounts mean the notebooks, configs, and cronfile are read from your local folder at runtime rather than baked into the image, so the image stays generic.
- The ~/.cdsapirc mount makes your local CDS API key available inside the container, read-only.
- The startup command normalizes line endings in the cronfile (dos2unix), installs it with crontab, starts the cron daemon, and then tails /dev/null to keep the container alive.

4) Build and run the scheduler with docker compose

The files contained in the example folder should be enough to test run the provided import schedule on your computer.

The only change you need to make is updating the DHIS2_DATA_ELEMENT_ID in the configs/import-temperature-config.yaml file. If you are running against the public DHIS2 server, you can use the Prepare Metadata notebook to create the needed data elements.

Note that the included parameters file sets DHIS2_DRY_RUN: True, so remember to set it to False if you want to actually import data into your instance.

From the root of the DHIS2 Climate Tools repository, navigate to the scheduling example folder:

cd docs/workflows/scheduling/example

Starting the docker compose file will build the image (only the first time) and start the cron scheduler:

    docker compose up --detach --build

To check that the docker container started successfully and is running:

    docker ps

To verify that the docker container and cron schedule uses the correct timezone:

    docker exec climate-scheduler date

To check the last n timestamped logs:

    docker logs --timestamps -n 20 climate-scheduler

Now the ERA5-Land imports should repeat at regular intervals as specified in the cronfile, for as long as the docker container climate-scheduler is running. If something unexpected happens or the container crashes, it will restart automatically and the schedule will resume.

Making changes to the notebook, parameters, or schedules

If you change any of the files or schedules, simply restart the docker container to reload cron and apply the changes:

    docker compose down
    docker compose up --detach --build

5) Adding multiple scheduled jobs

You can also add multiple schedules to the same cronfile, for example one schedule that runs the notebook with the temperature parameters file and another that runs it with a precipitation parameters file.

Your configs folder would then have one config for temperature and another for precipitation:

    └── configs
        ├── import-temperature-config.yaml
        └── import-precipitation-config.yaml

And your cronfile would have two schedules instead of one:

    # Run ERA5 import for temperature at 02:00
    0 2 * * * papermill --cwd /app /app/notebooks/import-era5-daily.ipynb /app/data/import-temperature.output.ipynb -f /app/configs/import-temperature-config.yaml --log-output --no-progress-bar > /proc/1/fd/1 2>&1

    # Run ERA5 import for precipitation at 03:00
    0 3 * * * papermill --cwd /app /app/notebooks/import-era5-daily.ipynb /app/data/import-precipitation.output.ipynb -f /app/configs/import-precipitation-config.yaml --log-output --no-progress-bar > /proc/1/fd/1 2>&1