
Aggregate climate data to DHIS2 organisation units

In this notebook we will show how to load daily climate data from NetCDF using earthkit and aggregate temperature and precipitation climate variables to DHIS2 organisation units.

import earthkit.data
from earthkit import transforms
import dhis2eo.org_units
from dhis2eo.integrations.pandas import dataframe_to_dhis2_json

Loading the data

Our sample NetCDF file contains daily temperature and precipitation data for Sierra Leone in July 2025. Let’s load the file using earthkit:

file = "../data/era5-daily-temp-precip-july-2025-sierra-leone.nc"
data = earthkit.data.from_source("file", file)

See more examples of how you can load data with earthkit, or watch the video below.

How to get data with earthkit
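
As an alternative to loading a local NetCDF file, earthkit can also fetch ERA5 data directly from the Copernicus Climate Data Store. The snippet below is a minimal sketch, assuming you have CDS API credentials configured; the variable name, area and date are illustrative only:

import earthkit.data

# Illustrative only: fetch ERA5 2m temperature for a single day over Sierra Leone
# from the Copernicus Climate Data Store (requires a configured CDS API key).
cds_data = earthkit.data.from_source(
    "cds",
    "reanalysis-era5-single-levels",
    variable=["2t"],
    product_type="reanalysis",
    area=[10.0, -13.5, 6.9, -10.3],  # north, west, south, east (approx. Sierra Leone)
    date="2025-07-01",
)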

To more easily work with and display the contents of the dataset we can convert it to an xarray Dataset. It shows that the file includes three dimensions (latitude, longitude and valid_time) and two data variables: t2m (air temperature at 2 m above the surface) and tp (total precipitation). The data source is the European Centre for Medium-Range Weather Forecasts (ECMWF).

data_array = data.to_xarray()
data_array

Loading the organisation units

We next use the dhis2eo.org_units.from_file convenience method to load the organisation units from DHIS2 that we saved as a GeoJSON file, and convert them to a GeoPandas GeoDataFrame:

district_file = "../data/sierra-leone-districts.geojson"
org_units = dhis2eo.org_units.from_file(district_file, dhis2=True)

The GeoJSON file contains the boundaries of 13 named organisation units in Sierra Leone. For the aggregation, we are particularly interested in the org_unit_id and the geometry (polygon) of the org unit:

org_units

Aggregating the data to organisation units

To aggregate the data to the org unit features we use the spatial.reduce function from earthkit-transforms. We keep the daily period type and only aggregate the data spatially to the org unit features.

Since our climate data variables need to be aggregated using different statistics, we do separate aggregations for each variable.

Temperature

To aggregate the temperature variable, we extract the temperature or t2m variable, and tell the spatial.reduce function to aggregate the data to our organisation units org_units. We set mask_dim='org_unit_id' to specify that we want one aggregated value for every unique value in the organisation unit org_unit_id column. Finally, we set how='mean' so that we get the average temperature of all gridded values that land inside an organisation unit.

temp = data_array['t2m']
agg_temp = transforms.spatial.reduce(temp, org_units, mask_dim='org_unit_id', how='mean')
agg_temp

The result from spatial.reduce is an xarray object, which isn't the most convenient format for inspecting the aggregated values. Instead, we convert the result to a pandas DataFrame, which makes the results easier to read:

agg_temp_df = agg_temp.to_dataframe().reset_index()
agg_temp_df

We see that the aggregated dataframe contains temperature values in kelvin for each organisation unit and each time period (daily).
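
As an optional sanity check (not part of the original workflow), we can summarise the aggregated values; before conversion to Celsius they should all be well above 273.15 K:

# Optional sanity check: summary of the daily mean 2m temperatures in kelvin
agg_temp_df['t2m'].describe()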

Precipitation

We use the same approach for precipitation by extracting the precipitation or tp variable. The main difference here is that we set how='sum' since precipitation is typically reported as the total precipitation for an area (not the average).

precip = data_array['tp']
agg_precip = transforms.spatial.reduce(precip, org_units, mask_dim='org_unit_id', how='sum')
agg_precip_df = agg_precip.to_dataframe().reset_index()
agg_precip_df

We see that the aggregated dataframe contains total precipitation values in meters for each organisation unit and each time period (daily).
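
As another optional check (this assumes the sample data covers 13 districts and all 31 days of July), we can verify that the result contains one row per organisation unit and day:

# Expect one row per district per day: 13 districts x 31 days in July 2025
assert len(agg_precip_df) == len(org_units) * 31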

Post-processing

We have now aggregated the temperature and precipitation data to our organisation units. But before we submit the results to DHIS2, we want to make sure they are reported in a format that makes sense to most users.

For temperature, we convert the data values from kelvin to Celsius by subtracting 273.15 from the values:

agg_temp_df['t2m'] -= 273.15
agg_temp_df

For precipitation, to avoid small decimal numbers, we convert the reporting unit from meters to millimeters:

agg_precip_df['tp'] *= 1000
agg_precip_df

Converting to DHIS2 Format

Before we can send these data to DHIS2, we need to use the dhis2eo utility function dataframe_to_dhis2_json to translate each of our aggregated pandas.DataFrames into the JSON structure used by the DHIS2 Web API.

First, for temperature:

agg_temp_json_dict = dataframe_to_dhis2_json(
    df = agg_temp_df,               # aggregated pandas.DataFrame
    org_unit_col = 'id',            # column containing the org unit id
    period_col = 'valid_time',      # column containing the period
    value_col = 't2m',              # column containing the value
    data_element_id = 'VJwwPOOvge6' # id of the DHIS2 data element
)

We can display the first 3 items to see that we have one temperature value for each org unit and period combination.

agg_temp_json_dict['dataValues'][:3]
[{'orgUnit': 'O6uvpzGd5pu', 'period': '20250701', 'value': 24.0718994140625, 'dataElement': 'VJwwPOOvge6'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250702', 'value': 24.78936767578125, 'dataElement': 'VJwwPOOvge6'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250703', 'value': 24.631500244140625, 'dataElement': 'VJwwPOOvge6'}]

And we do the same for precipitation:

agg_precip_json_dict = dataframe_to_dhis2_json(
    df = agg_precip_df,             # aggregated pandas.DataFrame
    org_unit_col = 'id',            # column containing the org unit id
    period_col = 'valid_time',      # column containing the period
    value_col = 'tp',               # column containing the value
    data_element_id = 'eHFmngLqpj4' # id of the DHIS2 data element
)

And inspect the results:

agg_precip_json_dict['dataValues'][:3]
[{'orgUnit': 'O6uvpzGd5pu', 'period': '20250701', 'value': 31.98239517211914, 'dataElement': 'eHFmngLqpj4'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250702', 'value': 45.636104583740234, 'dataElement': 'eHFmngLqpj4'}, {'orgUnit': 'O6uvpzGd5pu', 'period': '20250703', 'value': 70.48600006103516, 'dataElement': 'eHFmngLqpj4'}]

Next steps

At this point we have successfully aggregated the temperature and precipitation data and converted it to a JSON format that can be used by DHIS2. To learn how to import this JSON data into DHIS2, see our guide for uploading data values using the Python DHIS2 client.
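
As a preview of that step, the dictionaries above can be posted to the dataValueSets endpoint of the DHIS2 Web API. The sketch below uses the requests library directly rather than the Python DHIS2 client covered in the guide, and the server URL and credentials are placeholders you would replace with your own:

import requests

# Placeholder URL and credentials; replace with your own DHIS2 instance and user.
DHIS2_URL = "https://play.dhis2.org/dev"
AUTH = ("admin", "district")

# Post the aggregated temperature values to the dataValueSets endpoint
response = requests.post(
    f"{DHIS2_URL}/api/dataValueSets",
    json=agg_temp_json_dict,
    auth=AUTH,
)
response.raise_for_status()
print(response.json())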