Aggregate climate data to DHIS2 organisation units

In this notebook we show how to load daily climate data from a NetCDF file using earthkit and aggregate the temperature and precipitation variables to DHIS2 organisation units.

import geopandas as gpd
import earthkit.data
from earthkit import transforms
from dhis2eo.integrations.pandas import dataframe_to_dhis2_json

Loading the data

Our sample NetCDF file contains daily temperature and precipitation data for Sierra Leone in July 2025. Let’s load the file using earthkit:

file = "../data/era5-daily-temp-precip-july-2025-sierra-leone.nc"
data = earthkit.data.from_source("file", file)

See more examples of how you can load data with earthkit, or watch the video below.

How to get data with earthkit

To make the contents of the dataset easier to work with and display, we can convert it to an xarray dataset. It shows that the file includes three dimensions (latitude, longitude and valid_time) and two data variables: t2m (air temperature at 2 m above the surface) and tp (total precipitation). The data source is the European Centre for Medium-Range Weather Forecasts (ECMWF).

data_array = data.to_xarray()
data_array

Loading the organisation units

We next use geopandas to load our organisation units that we’ve downloaded from DHIS2 as a GeoJSON file:

district_file = "../data/sierra-leone-districts.geojson"
org_units = gpd.read_file(district_file)

The GeoJSON file contains the boundaries of 13 named organisation units in Sierra Leone. For the aggregation, we are particularly interested in the id and the geometry (polygon) of the org unit:

org_units

Aggregating the data to organisation units

To aggregate the data to the org unit features we use the spatial.reduce function of earthkit-transforms. We keep the daily period type and only aggregate the data spatially to the org unit features.

Since our climate data variables need to be aggregated using different statistics, we do separate aggregations for each variable.

Temperature

To aggregate temperature, we extract the t2m variable and tell the spatial.reduce function to aggregate the data to our organisation units, org_units. We set mask_dim='id' to specify that we want one aggregated value for every unique value in the organisation unit id column. Finally, we set how='mean' so that we get the average temperature of all gridded values that fall inside an organisation unit.

temp = data_array['t2m']
agg_temp = transforms.spatial.reduce(temp, org_units, mask_dim='id', how='mean')
agg_temp

The result from spatial.reduce is an xarray object, which is not the most convenient format for inspecting our aggregated data. So instead we convert the results to a pandas DataFrame, which makes them easier to read:

agg_temp_df = agg_temp.to_dataframe().reset_index()
agg_temp_df

We see that the aggregated dataframe contains one temperature value for each organisation unit and each (daily) time period. The values are in kelvin, the unit ERA5 uses for t2m.
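To make the how='mean' aggregation above more concrete, here is a minimal, library-free sketch of the idea: keep the grid cells whose centre falls inside a polygon and aggregate their values. It is not how earthkit-transforms implements spatial.reduce (the library may, for example, treat cells on a boundary differently). The grid, the square polygon and the helper names point_in_polygon and reduce_cells are all made up for illustration.

```python
from statistics import mean


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    `polygon` is a list of (x, y) vertices; the last vertex connects back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def reduce_cells(cells, polygon, how="mean"):
    """Aggregate the values of all cells whose centre lies inside the polygon."""
    stats = {"mean": mean, "min": min, "max": max}
    inside = [value for (x, y, value) in cells if point_in_polygon(x, y, polygon)]
    return stats[how](inside)


# Hypothetical 3 x 3 grid of cell centres (x, y) with temperatures in kelvin
cells = [
    (0.5, 0.5, 296.0), (1.5, 0.5, 297.0), (2.5, 0.5, 298.0),
    (0.5, 1.5, 297.0), (1.5, 1.5, 298.0), (2.5, 1.5, 299.0),
    (0.5, 2.5, 298.0), (1.5, 2.5, 299.0), (2.5, 2.5, 300.0),
]
# Hypothetical square "organisation unit" covering the lower-left four cells
unit = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

unit_mean = reduce_cells(cells, unit, how="mean")
unit_max = reduce_cells(cells, unit, how="max")
```

Only the four cells centred inside the square contribute, so unit_mean is 297.0 and unit_max is 298.0. In the notebook, spatial.reduce does this for every organisation unit and every day at once.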

Precipitation

We use the same approach for precipitation, this time extracting the tp variable. Again, we set how='mean' to get the average precipitation over the entire area of each organisation unit. It’s also common to report minimum and maximum precipitation, which can be done by setting how='min' or how='max' instead.

precip = data_array['tp']
agg_precip = transforms.spatial.reduce(precip, org_units, mask_dim='id', how='mean')
agg_precip_df = agg_precip.to_dataframe().reset_index()
agg_precip_df

We see that the aggregated dataframe contains one total precipitation value, in meters, for each organisation unit and each (daily) time period.

Post-processing

We have now aggregated the temperature and precipitation data to our organisation units. But before we submit the results to DHIS2, we want to make sure they are reported in a format that makes sense to most users.

For temperature, we convert the data values from kelvin to degrees Celsius by subtracting 273.15 from the values:

agg_temp_df['t2m'] -= 273.15
agg_temp_df

For precipitation, to avoid small decimal numbers, we convert the reporting unit from meters to millimeters:

agg_precip_df['tp'] *= 1000
agg_precip_df
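One thing to keep in mind is that the in-place operators above modify the dataframe every time the cell runs, so re-running a cell would subtract 273.15 (or multiply by 1000) a second time. One way to guard against that, sketched here under the assumption that you want to keep the original values around, is to put the conversions in small pure functions. The function names are hypothetical and not part of dhis2eo or earthkit.

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin
MM_PER_METER = 1000


def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


def meters_to_millimeters(meters):
    """Convert a depth of precipitation from meters to millimeters."""
    return meters * MM_PER_METER


# Illustrative values, not taken from the sample file
temp_c = kelvin_to_celsius(300.0)             # roughly 26.85 degrees Celsius
precip_mm = meters_to_millimeters(0.005)      # roughly 5 mm
```

Because the functions only use ordinary arithmetic, they also work on a whole column, for example agg_temp_df['t2m_celsius'] = kelvin_to_celsius(agg_temp_df['t2m']), where t2m_celsius is a hypothetical new column name. The value_col argument used later would then need to point at that new column.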

Converting to DHIS2 Format

Before we can send these data to DHIS2, we need to use the dhis2eo utility function dataframe_to_dhis2_json to translate each of our aggregated pandas.DataFrame objects into the JSON structure used by the DHIS2 Web API.

First, for temperature:

agg_temp_json_dict = dataframe_to_dhis2_json(
    df = agg_temp_df,               # aggregated pandas.DataFrame
    org_unit_col = 'id',            # column containing the org unit id
    period_col = 'valid_time',      # column containing the period
    value_col = 't2m',              # column containing the value
    data_element_id = 'VJwwPOOvge6' # id of the DHIS2 data element
)

We can display the first 3 items to see that we have one temperature value for each org unit and period combination.

agg_temp_json_dict['dataValues'][:3]
[{'orgUnit': 'O6uvpzGd5pu', 'period': '20250701', 'value': 24.0718994140625, 'dataElement': 'VJwwPOOvge6'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250702', 'value': 24.78936767578125, 'dataElement': 'VJwwPOOvge6'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250703', 'value': 24.631500244140625, 'dataElement': 'VJwwPOOvge6'}]
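To show how the columns map onto the items above, here is a rough, library-free sketch of the same transformation written against plain Python records. It is not the dhis2eo implementation of dataframe_to_dhis2_json; records_to_data_values is a hypothetical helper, and the yyyyMMdd period string is inferred from the output shown above.

```python
from datetime import date


def records_to_data_values(records, org_unit_key, period_key, value_key, data_element_id):
    """Build a {'dataValues': [...]} dict from a list of row dicts."""
    data_values = []
    for row in records:
        data_values.append({
            "orgUnit": row[org_unit_key],
            # Daily periods are written as yyyyMMdd, e.g. 20250701
            "period": row[period_key].strftime("%Y%m%d"),
            "value": row[value_key],
            "dataElement": data_element_id,
        })
    return {"dataValues": data_values}


# Two rows mirroring the first temperature values shown above
rows = [
    {"id": "O6uvpzGd5pu", "valid_time": date(2025, 7, 1), "t2m": 24.0718994140625},
    {"id": "O6uvpzGd5pu", "valid_time": date(2025, 7, 2), "t2m": 24.78936767578125},
]
payload = records_to_data_values(rows, "id", "valid_time", "t2m", "VJwwPOOvge6")
```

Each row becomes one item: the org unit id and value are copied over, the date is formatted as a daily period, and the same data element id is attached to every item.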

And we do the same for precipitation:

agg_precip_json_dict = dataframe_to_dhis2_json(
    df = agg_precip_df,             # aggregated pandas.DataFrame
    org_unit_col = 'id',            # column containing the org unit id
    period_col = 'valid_time',      # column containing the period
    value_col = 'tp',               # column containing the value
    data_element_id = 'eHFmngLqpj4' # id of the DHIS2 data element
)

And inspect the results:

agg_precip_json_dict['dataValues'][:3]
[{'orgUnit': 'O6uvpzGd5pu', 'period': '20250701', 'value': 5.330399036407471, 'dataElement': 'eHFmngLqpj4'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250702', 'value': 7.60601806640625, 'dataElement': 'eHFmngLqpj4'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250703', 'value': 11.747666358947754, 'dataElement': 'eHFmngLqpj4'}]
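Looking at the first three items is a quick spot check. A slightly more systematic check, sketched here as a suggestion rather than a DHIS2 requirement, is to confirm that every item carries the four keys seen above and that no org unit and period combination appears twice. summarize_payload is a hypothetical helper, not part of dhis2eo.

```python
REQUIRED_KEYS = {"orgUnit", "period", "value", "dataElement"}


def summarize_payload(payload):
    """Check the structure of a dataValues payload and count its contents."""
    seen = set()
    for item in payload["dataValues"]:
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"data value is missing keys: {sorted(missing)}")
        key = (item["orgUnit"], item["period"], item["dataElement"])
        if key in seen:
            raise ValueError(f"duplicate data value for {key}")
        seen.add(key)
    return {
        "values": len(seen),
        "org_units": len({org_unit for org_unit, _, _ in seen}),
        "periods": len({period for _, period, _ in seen}),
    }


# Small stand-in payload using the first precipitation values shown above
sample = {"dataValues": [
    {"orgUnit": "O6uvpzGd5pu", "period": "20250701",
     "value": 5.330399036407471, "dataElement": "eHFmngLqpj4"},
    {"orgUnit": "O6uvpzGd5pu", "period": "20250702",
     "value": 7.60601806640625, "dataElement": "eHFmngLqpj4"},
]}
summary = summarize_payload(sample)
```

Calling summarize_payload(agg_precip_json_dict) on the full result should report 403 values (13 org units times 31 days), provided the aggregation produced a value for every district and every day in July.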

Next steps

At this point we have successfully aggregated the temperature and precipitation data and converted them to a JSON format that can be used by DHIS2. To learn how to import this JSON data into DHIS2, see our guide for uploading data values using the Python DHIS2 client.