In this notebook we will show how to load daily climate data from NetCDF using earthkit and aggregate the temperature and precipitation variables to DHIS2 organisation units.
import earthkit.data
from earthkit import transforms
import dhis2eo.org_units
from dhis2eo.integrations.pandas import dataframe_to_dhis2_json

Loading the data¶
Our sample NetCDF file contains daily temperature and precipitation data for Sierra Leone in July 2025. Let’s load the file using earthkit:
file = "../data/era5-daily-temp-precip-july-2025-sierra-leone.nc"
data = earthkit.data.from_source("file", file)

See more examples for how you can load data with earthkit, or see the video below.
How to get data with earthkit
To more easily work with and display the contents of the dataset, we can convert it to an xarray Dataset. It shows that the file includes three dimensions (latitude, longitude and valid_time) and two data variables: t2m (temperature at 2 m above the surface) and tp (total precipitation). The data source is the European Centre for Medium-Range Weather Forecasts (ECMWF).
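As a point of reference, the Dataset structure described above can be mimicked with a small synthetic xarray Dataset; the values and coordinates below are made up, only the dimension and variable names follow the file:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the ERA5 file: same dimensions
# (valid_time, latitude, longitude) and variable names
# (t2m, tp) as described above, but with made-up values.
times = pd.date_range("2025-07-01", periods=3, freq="D")
lats = np.array([8.0, 8.5, 9.0])
lons = np.array([-13.0, -12.5])

ds = xr.Dataset(
    data_vars={
        "t2m": (("valid_time", "latitude", "longitude"),
                297.0 + np.random.rand(3, 3, 2)),   # kelvin
        "tp": (("valid_time", "latitude", "longitude"),
               0.01 * np.random.rand(3, 3, 2)),     # meters
    },
    coords={"valid_time": times, "latitude": lats, "longitude": lons},
)
print(list(ds.data_vars))
```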
data_array = data.to_xarray()
data_array

Loading the organisation units¶
We next use the dhis2eo.org_units.from_file convenience function to load the organisation units from DHIS2 that we saved as a GeoJSON file, and convert them to a GeoPandas GeoDataFrame:
district_file = "../data/sierra-leone-districts.geojson"
org_units = dhis2eo.org_units.from_file(district_file, dhis2=True)

The GeoJSON file contains the boundaries of 13 named organisation units in Sierra Leone. For the aggregation, we are particularly interested in the org_unit_id and the geometry (polygon) of each org unit:
org_units

Aggregating the data to organisation units¶
To aggregate the data to the org unit features we use the spatial.reduce function of earthkit-transforms. We keep the daily period type and only aggregate the data spatially to the org unit features.
Since our climate data variables need to be aggregated using different statistics, we do separate aggregations for each variable.
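The idea of pairing each variable with its own statistic can be sketched as a simple mapping; here with plain numpy on made-up gridded values (the real aggregation below uses earthkit-transforms):

```python
import numpy as np

# Made-up daily grid values for one org unit: 4 cells per variable.
grids = {
    "t2m": np.array([297.1, 297.8, 296.9, 297.4]),  # kelvin
    "tp": np.array([0.002, 0.004, 0.001, 0.003]),   # meters
}

# Each variable is reduced with its own statistic:
# mean for temperature, sum for precipitation.
how = {"t2m": np.mean, "tp": np.sum}

aggregated = {var: float(how[var](values)) for var, values in grids.items()}
print(aggregated)
```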
Temperature¶
To aggregate the temperature variable, we extract the temperature or t2m variable, and tell the spatial.reduce function to aggregate the data to our organisation units org_units. We set mask_dim='org_unit_id' to specify that we want one aggregated value for every unique value in the organisation unit org_unit_id column. Finally, we set how='mean' so that we get the average temperature of all gridded values that land inside an organisation unit.
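Conceptually, how='mean' averages the grid cells that fall inside each organisation unit's polygon. A minimal numpy sketch with a hand-made boolean mask (the real function derives the mask from the polygon geometry):

```python
import numpy as np

# A tiny 3x3 temperature grid in kelvin (made-up values).
t2m = np.array([
    [297.0, 298.0, 299.0],
    [296.0, 297.5, 298.5],
    [295.0, 296.5, 297.5],
])

# Hand-made mask: True where a grid cell lies inside the org unit.
inside = np.array([
    [True,  True,  False],
    [True,  True,  False],
    [False, False, False],
])

# how='mean': average only the cells inside the org unit.
org_unit_mean = t2m[inside].mean()
print(org_unit_mean)  # mean of 297.0, 298.0, 296.0, 297.5
```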
temp = data_array['t2m']
agg_temp = transforms.spatial.reduce(temp, org_units, mask_dim='org_unit_id', how='mean')
agg_temp

The result from spatial.reduce is still an xarray object, which is not the most convenient format for inspecting the aggregated values. We therefore convert the result to a pandas DataFrame, which makes it easier to read:
agg_temp_df = agg_temp.to_dataframe().reset_index()
agg_temp_df

We see that the aggregated dataframe contains what appears to be temperature values in kelvin for each organisation unit and each time period (daily).
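For a quick overview of such a long-format dataframe, it can be pivoted to one column per org unit. A sketch with made-up values, assuming column names matching the aggregation above (the second org unit id is illustrative):

```python
import pandas as pd

# Made-up aggregated values in the same long format:
# one row per (valid_time, org_unit_id) combination.
agg_df = pd.DataFrame({
    "valid_time": pd.to_datetime(
        ["2025-07-01", "2025-07-01", "2025-07-02", "2025-07-02"]),
    "org_unit_id": ["O6uvpzGd5pu", "fdc6uOvgoji",
                    "O6uvpzGd5pu", "fdc6uOvgoji"],
    "t2m": [297.2, 298.1, 297.9, 298.4],  # kelvin
})

# Pivot to wide format: one row per day, one column per org unit.
wide = agg_df.pivot(index="valid_time", columns="org_unit_id", values="t2m")
print(wide.shape)  # (2, 2)
```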
Precipitation¶
We use the same approach for precipitation by extracting the precipitation or tp variable. The main difference here is that we set how='sum' since precipitation is typically reported as the total precipitation for an area (not average).
precip = data_array['tp']
agg_precip = transforms.spatial.reduce(precip, org_units, mask_dim='org_unit_id', how='sum')
agg_precip_df = agg_precip.to_dataframe().reset_index()
agg_precip_df

We see that the aggregated dataframe contains what appears to be total precipitation values in meters for each organisation unit and each time period (daily).
Post-processing¶
We have now aggregated the temperature and precipitation data to our organisation units. But before we submit the results to DHIS2, we want to make sure they are reported in a format that makes sense to most users.
For temperature, we convert the data values from kelvin to Celsius by subtracting 273.15 from the values:
agg_temp_df['t2m'] -= 273.15
agg_temp_df

For precipitation, to avoid small decimal numbers, we convert the reporting unit from meters to millimeters:
agg_precip_df['tp'] *= 1000
agg_precip_df

Converting to DHIS2 Format¶
Before we can send these data to DHIS2, we need to use the dhis2eo utility function dataframe_to_dhis2_json to translate each of our aggregated pandas.DataFrames into the JSON structure used by the DHIS2 Web API.
First, for temperature:
agg_temp_json_dict = dataframe_to_dhis2_json(
df = agg_temp_df, # aggregated pandas.DataFrame
org_unit_col = 'id', # column containing the org unit id
period_col = 'valid_time', # column containing the period
value_col = 't2m', # column containing the value
data_element_id = 'VJwwPOOvge6' # id of the DHIS2 data element
)

We can display the first 3 items to see that we have one temperature value for each org unit and period combination.
agg_temp_json_dict['dataValues'][:3]

[{'orgUnit': 'O6uvpzGd5pu',
'period': '20250701',
'value': 24.0718994140625,
'dataElement': 'VJwwPOOvge6'},
{'orgUnit': 'O6uvpzGd5pu',
'period': '20250702',
'value': 24.78936767578125,
'dataElement': 'VJwwPOOvge6'},
{'orgUnit': 'O6uvpzGd5pu',
'period': '20250703',
'value': 24.631500244140625,
 'dataElement': 'VJwwPOOvge6'}]

And we do the same for precipitation:
agg_precip_json_dict = dataframe_to_dhis2_json(
df = agg_precip_df, # aggregated pandas.DataFrame
org_unit_col = 'id', # column containing the org unit id
period_col = 'valid_time', # column containing the period
value_col = 'tp', # column containing the value
data_element_id = 'eHFmngLqpj4' # id of the DHIS2 data element
)

And inspect the results:
agg_precip_json_dict['dataValues'][:3]

[{'orgUnit': 'O6uvpzGd5pu',
'period': '20250701',
'value': 31.98239517211914,
'dataElement': 'eHFmngLqpj4'},
{'orgUnit': 'O6uvpzGd5pu',
'period': '20250702',
'value': 45.636104583740234,
'dataElement': 'eHFmngLqpj4'},
{'orgUnit': 'O6uvpzGd5pu',
'period': '20250703',
'value': 70.48600006103516,
 'dataElement': 'eHFmngLqpj4'}]

Next steps¶
At this point we have successfully aggregated the temperature and precipitation data and converted it to a JSON format that can be imported into DHIS2. To learn how to import this JSON data into DHIS2, see our guide for uploading data values using the Python DHIS2 client.
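To illustrate the shape of the data at this point, here is a sketch of how such a payload could be assembled by hand from the aggregated dataframe, and how it could be posted to the standard DHIS2 dataValueSets endpoint. This is not the library's implementation; the column names follow the function call above, and the server URL and credentials are placeholders:

```python
import pandas as pd

# Made-up aggregated temperature dataframe in the shape used above.
df = pd.DataFrame({
    "id": ["O6uvpzGd5pu", "O6uvpzGd5pu"],
    "valid_time": pd.to_datetime(["2025-07-01", "2025-07-02"]),
    "t2m": [24.07, 24.79],  # Celsius
})

# Conceptually, dataframe_to_dhis2_json builds one dict per row,
# with the daily period formatted as yyyyMMdd (a sketch, not the
# library code itself).
payload = {
    "dataValues": [
        {
            "orgUnit": row["id"],
            "period": row["valid_time"].strftime("%Y%m%d"),
            "value": row["t2m"],
            "dataElement": "VJwwPOOvge6",
        }
        for _, row in df.iterrows()
    ]
}
print(len(payload["dataValues"]))  # 2

# The payload can then be posted to the DHIS2 Web API's dataValueSets
# endpoint, e.g. with the requests library (placeholder URL and
# credentials for the DHIS2 demo server):
#
# import requests
# resp = requests.post("https://play.dhis2.org/api/dataValueSets",
#                      json=payload, auth=("admin", "district"))
```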