Aggregate climate data to different time scales

Climate data may be collected and reported at various temporal resolutions (e.g., hourly, daily, or monthly). When preparing data for import into DHIS2, it’s important to ensure that the data is aggregated appropriately to match both the type of variable and the target period. Note that you can also aggregate daily data to other period types using the built-in aggregation support in DHIS2.

import xarray as xr
from earthkit import transforms

Aggregating hourly climate data to daily

Consider the ERA5-Land dataset, which in its original form contains values for climatic variables for every hour since 1950. Below, we load a sample file of ERA5-Land hourly temperature and precipitation data for Sierra Leone, and subset it to the first 3 days of February 2025:

# open and clean the data
hourly = xr.open_dataset('../data/era5-land-hourly-temp-precip-feb-2025-sierra-leone.nc')
hourly = hourly.drop_vars(['number', 'expver'])

# subset between start and end date
start = '2025-02-01'
end = '2025-02-03'
hourly = hourly.sel(valid_time=slice(start, end))
hourly

We see that our dataset contains data for 72 hours (3 days), with values for temperature (t2m) and precipitation (tp). Such hourly data are too detailed for DHIS2, where the smallest possible period type is daily. To import them, we next show how to aggregate the hourly values to daily values, using a method suited to each variable.

For the upcoming sections, we will use the capital of Sierra Leone, Freetown, as an example to illustrate the effects of temporal aggregation. We select the grid point closest to the capital to obtain an xarray dataset that contains a single hourly time series for each variable:

hourly_freetown = hourly.sel(latitude=8.48, longitude=-13.23, method="nearest")

Temperature

Before we aggregate the hourly temperature data, let’s first inspect and plot the hourly temperature values for the city of Freetown. From this, we see that the hourly temperature data is reported in kelvin, and that the temperature peaks a little past noon each day, as we would expect:

hourly_freetown['t2m'].plot(color='red', marker='o')
<Figure size 640x480 with 1 Axes>

A common way to aggregate hourly to daily temperature values is to take the average temperature for each day. We can do so easily by extracting the t2m variable and passing that to the temporal.daily_reduce function from the earthkit.transforms module. We specify that we want the average temperature value by setting how='mean':

hourly_temp = hourly_freetown['t2m']
daily_temp = transforms.temporal.daily_reduce(hourly_temp, how='mean')

If we now plot the aggregated data, we see only 3 daily values, as expected. We also reuse the previous y-limits to make it comparable with the hourly chart:

ymin, ymax = hourly_temp.min(), hourly_temp.max()
daily_temp.plot(color='red', marker='o', ylim=(ymin, ymax))
<Figure size 640x480 with 1 Axes>
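As a cross-check that does not depend on earthkit, a daily mean can also be computed with xarray's own resample method. The sketch below is only an illustration: it uses a synthetic 72-hour series with made-up values (not ERA5-Land data), and the final conversion from kelvin to degrees Celsius is an assumption about what may be wanted before import, not a step from the workflow above.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the Freetown series: 72 hourly values in kelvin
# with a simple daily cycle peaking at 14:00 (made-up data, not ERA5-Land).
times = np.arange("2025-02-01", "2025-02-04", dtype="datetime64[h]").astype("datetime64[ns]")
hours = np.arange(times.size)
kelvin = 300.0 + 4.0 * np.sin(2 * np.pi * (hours - 8) / 24)
hourly_temp = xr.DataArray(kelvin, coords={"valid_time": times}, dims="valid_time", name="t2m")

# Group the hourly values into calendar days and average each group.
daily_temp = hourly_temp.resample(valid_time="1D").mean()

# Optional unit conversion (an assumption, not part of the original workflow).
daily_celsius = daily_temp - 273.15

print(daily_celsius.to_series())
```

Because the synthetic cycle is symmetric around 300 K, each of the three daily means comes out at 300 K, which makes it easy to confirm that the grouping works as intended.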

Precipitation

For the precipitation data, let’s do the same as we did for temperature and plot what the hourly precipitation data looks like for the city of Freetown:

hourly_freetown['tp'].plot(marker='o')
<Figure size 640x480 with 1 Axes>

This time something strange is going on. The precipitation, which is reported in meters, follows a curious shape that gradually increases and suddenly resets at the end of each day.

That is because the tp precipitation variable in ERA5-Land is provided as a daily accumulated variable, meaning each time step represents the total precipitation accumulated since the start of the day.

To find out how much precipitation occurred during each hour, we need to calculate the incremental differences between consecutive accumulated values:

accum_precip = hourly_freetown['tp']
hourly_precip = accum_precip.diff(dim='valid_time')
hourly_precip.plot(marker='o')
<Figure size 640x480 with 1 Axes>

This plot looks better, but there are now negative differences at the boundary between days, where the cumulative total resets. Those negative values fall on the first hour of each day. Because the accumulation has just restarted at that point, the accumulated value at that time step already equals the precipitation for that hour, so we can use it to replace the negative difference:

hourly_precip = xr.where(hourly_precip < 0, accum_precip.isel(valid_time=slice(1, None)), hourly_precip)
hourly_precip.plot(marker='o')
<Figure size 640x480 with 1 Axes>
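To build confidence in this two-step correction, it can help to run it on data where the answer is known in advance. The following sketch is an illustration under stated assumptions: the hourly amounts are made up, and the synthetic accumulation is assumed to restart at the 01:00 time step of each day. It then checks that differencing plus the reset fix returns the original hourly amounts.

```python
import numpy as np
import xarray as xr

# 48 hourly timestamps covering two full accumulation cycles:
# 2025-02-01 01:00 through 2025-02-03 00:00 (made-up setup).
times = np.arange("2025-02-01T01", "2025-02-03T01", dtype="datetime64[h]").astype("datetime64[ns]")

# Assumed "true" precipitation in meters for the hour ending at each timestamp.
rng = np.random.default_rng(0)
true_hourly = rng.uniform(0.001, 0.002, size=times.size)

# Accumulate within each 24-hour cycle, restarting at every 01:00 time step.
accum = true_hourly.reshape(2, 24).cumsum(axis=1).ravel()
accum_precip = xr.DataArray(accum, coords={"valid_time": times}, dims="valid_time", name="tp")

# Step 1: difference consecutive accumulated values (drops the first time step).
diffs = accum_precip.diff(dim="valid_time")

# Step 2: where the total reset (negative difference), use the accumulated value itself.
hourly_precip = xr.where(diffs < 0, accum_precip.isel(valid_time=slice(1, None)), diffs)

# The recovered series should match the known hourly amounts from the second step onward.
print(np.allclose(hourly_precip.values, true_hourly[1:]))
```

Note that diff always drops the first time step, so the recovered series has one value fewer than the accumulated one. The reset is only detected when the difference is negative, which here is guaranteed because the previous day's total is always larger than a single hourly amount.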

Now that we have the correct hourly precipitation values throughout the day, we can proceed to aggregate from hourly to daily precipitation. Daily precipitation is typically calculated as the sum of all the precipitation that fell during each day. To do so, we pass the hourly precipitation to earthkit’s temporal.daily_reduce function and set how='sum':

daily_precip = transforms.temporal.daily_reduce(hourly_precip, how='sum')

Finally, when we plot the aggregated precipitation data, we see that February 3 had the least amount of total precipitation, matching what we saw in the hourly chart. We also see that the total daily amounts are higher than in the hourly chart, because the daily precipitation was computed as the sum of hourly precipitation:

daily_precip.plot(marker='o')
<Figure size 640x480 with 1 Axes>
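The choice between summing and averaging matters more for precipitation than it may first appear. As a rough illustration, the sketch below uses plain xarray resampling on a synthetic series with a constant, made-up rainfall of 0.5 mm per hour, and compares the daily sum with the daily mean. The conversion from meters to millimeters is an assumption about a convenient reporting unit, not something prescribed above.

```python
import numpy as np
import xarray as xr

# Synthetic hourly precipitation in meters: 0.0005 m (0.5 mm) every hour
# for three days (made-up data, not ERA5-Land).
times = np.arange("2025-02-01", "2025-02-04", dtype="datetime64[h]").astype("datetime64[ns]")
precip_m = np.full(times.size, 0.0005)
hourly_precip = xr.DataArray(precip_m, coords={"valid_time": times}, dims="valid_time", name="tp")

# Daily total: the sum of the 24 hourly amounts in each calendar day.
daily_sum_mm = hourly_precip.resample(valid_time="1D").sum() * 1000.0

# Daily mean, shown only for contrast: it describes a typical hour, not the day's rainfall.
daily_mean_mm = hourly_precip.resample(valid_time="1D").mean() * 1000.0

print(daily_sum_mm.values)
print(daily_mean_mm.values)
```

With these inputs the daily sum is 12 mm per day while the daily mean stays at 0.5 mm, which shows why a sum is the natural daily statistic for an amount that builds up over time.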

Aggregating hourly climate data to monthly

We can apply the same approach we used for hourly-to-daily aggregation to produce monthly summaries. In the previous section we only looked at the first 3 days of February, so we start by loading the full sample of the same ERA5-Land dataset for February 2025:

hourly = xr.open_dataset('../data/era5-land-hourly-temp-precip-feb-2025-sierra-leone.nc')
hourly = hourly.drop_vars(['number', 'expver'])

Let’s select and plot the hourly temperature values for the city of Freetown:

hourly_freetown = hourly.sel(latitude=8.48, longitude=-13.23, method="nearest")
hourly_temp = hourly_freetown['t2m']
hourly_temp.plot(color='red')
<Figure size 640x480 with 1 Axes>

To aggregate the hourly temperature data to monthly averages, we use earthkit’s temporal.monthly_reduce function:

monthly_temp = transforms.temporal.monthly_reduce(hourly_temp, how='mean')

Since the original data was only for the month of February, we are left with only a single monthly aggregated value:

monthly_temp.to_dataframe()
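A natural question is whether a monthly mean computed directly from hourly values agrees with one computed from daily means. The sketch below explores this with plain xarray resampling on a synthetic series for February 2025 (made-up values with a daily cycle and a slow warming trend, not ERA5-Land data). It is an illustration under these assumptions rather than a description of what earthkit does internally.

```python
import numpy as np
import xarray as xr

# Synthetic hourly temperatures in kelvin for all of February 2025:
# a daily cycle plus a slow linear trend (made-up data).
times = np.arange("2025-02-01", "2025-03-01", dtype="datetime64[h]").astype("datetime64[ns]")
hours = np.arange(times.size)
kelvin = 300.0 + 0.01 * hours + 4.0 * np.sin(2 * np.pi * (hours - 8) / 24)
hourly_temp = xr.DataArray(kelvin, coords={"valid_time": times}, dims="valid_time", name="t2m")

# Route 1: average all hourly values in the month directly ("MS" = month start).
monthly_direct = hourly_temp.resample(valid_time="MS").mean()

# Route 2: average to daily means first, then average the daily means.
daily_temp = hourly_temp.resample(valid_time="1D").mean()
monthly_from_daily = daily_temp.resample(valid_time="MS").mean()

print(float(monthly_direct[0]), float(monthly_from_daily[0]))
```

The two routes agree here because every day contributes exactly 24 hourly values. If some days had missing hours, the daily means would carry unequal weights and the two results could differ.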

Next steps

In this notebook we have shown how to aggregate hourly climate data to daily data so that it can be imported into DHIS2. We have also shown how to inspect the temporal aspects of the hourly data, and how aggregation differs for temperature and precipitation variables. Finally, we briefly showed how the same principles can be applied to aggregate hourly climate data to monthly values.

Once your gridded climate data has been aggregated to the desired time period, the next step is to spatially aggregate from gridded data to DHIS2 organisation units. See our guide for aggregating to DHIS2 organisation units.