
Downloading and Harmonizing Dengue Data from OpenDengue

This guide demonstrates how to download and harmonize OpenDengue case data for use with DHIS2. The same approach can also be applied to local Dengue case counts from official Ministry of Health data.

The notebook focuses on data harmonization and preparation using a worked example for Nepal (districts / admin2) and monthly data.

Inputs

This workflow expects two local input files under ../../data/:

- nepal-locations.geojson — DHIS2 district (admin2) organisation units with names, UIDs, and geometry
- nepal-opendengue.csv — the OpenDengue case export for Nepal

Output

The workflow produces:

- nepal-dengue-monthly-admin.csv — monthly dengue case counts per DHIS2 district UID

from pathlib import Path

import pandas as pd
import geopandas as gpd

pd.set_option("display.max_columns", 200)

Paths

DATA_FOLDER = Path("../../data")

LOCATIONS_GEOJSON = DATA_FOLDER / "nepal-locations.geojson"
OPENDENGUE_SOURCE_PATH = DATA_FOLDER / "nepal-opendengue.csv"

# Output
OUT_CSV = DATA_FOLDER / "nepal-dengue-monthly-admin.csv"

for p in [LOCATIONS_GEOJSON, OPENDENGUE_SOURCE_PATH]:
    if not p.exists():
        raise FileNotFoundError(f"Missing required input: {p}")

print("Using inputs:")
print(" -", LOCATIONS_GEOJSON)
print(" -", OPENDENGUE_SOURCE_PATH)
Using inputs:
 - ..\..\data\nepal-locations.geojson
 - ..\..\data\nepal-opendengue.csv

Load DHIS2 district locations

locations = gpd.read_file(LOCATIONS_GEOJSON)

# DHIS2 UID column (expected in GeoJSON 'id')
uid_col = "id" if "id" in locations.columns else None
if uid_col is None:
    raise KeyError(f"Expected DHIS2 UID in GeoJSON 'id'. Found: {list(locations.columns)}")

locations["location"] = locations[uid_col].astype(str).str.strip() 

# Join helper (district name)
if "name" not in locations.columns:
    raise KeyError(f"Expected district name in GeoJSON 'name'. Found: {list(locations.columns)}")

locations["district_name"] = (
    locations["name"].astype(str)
    .str.replace(r"^\s*\d+\s+", "", regex=True)  # drop leading numeric codes (e.g. "101 ") prefixed to location names
    .str.upper()
    .str.strip()
)

# Keep only what we need
locations = locations[["location", "district_name", "geometry"]].dropna(subset=["location"]).copy()
locations

Load OpenDengue

df_raw = pd.read_csv(OPENDENGUE_SOURCE_PATH)
print("Loaded:", OPENDENGUE_SOURCE_PATH)
print("Columns:", df_raw.columns.tolist())
df_raw.head()
Loaded: ..\..\data\nepal-opendengue.csv
Columns: ['adm_0_name', 'adm_1_name', 'adm_2_name', 'full_name', 'ISO_A0', 'FAO_GAUL_code', 'RNE_iso_code', 'IBGE_code', 'calendar_start_date', 'calendar_end_date', 'Year', 'dengue_total', 'case_definition_standardised', 'S_res', 'T_res', 'UUID', 'region']

OpenDengue contains multiple administrative levels in the same file, so we subset to only the Admin 2 units.

df_raw = df_raw[df_raw["S_res"] == "Admin2"]
print("Number of rows after filtering to admin2 units:", len(df_raw))
Number of rows after filtering to admin2 units: 2772

Column mapping

# OpenDengue export columns
DATE_COL = "calendar_start_date"
CASES_COL = "dengue_total"
ADMIN2_COL = "adm_2_name"

missing = [c for c in [DATE_COL, CASES_COL, ADMIN2_COL] if c not in df_raw.columns]
if missing:
    raise KeyError(
        f"Input CSV is missing required columns: {missing}. "
        f"Available columns: {df_raw.columns.tolist()}"
    )

print("Using columns:", {"date": DATE_COL, "cases": CASES_COL, "admin2": ADMIN2_COL})
Using columns: {'date': 'calendar_start_date', 'cases': 'dengue_total', 'admin2': 'adm_2_name'}

Normalize OpenDengue (Nepal districts)

df_norm = pd.DataFrame({
    "date": pd.to_datetime(df_raw[DATE_COL], errors="coerce"),
    "cases": pd.to_numeric(df_raw[CASES_COL], errors="coerce"),
    "district_name": df_raw[ADMIN2_COL],   # district name for now; mapped to a DHIS2 UID below
})

# Normalize district name for the crosswalk join
df_norm["district_name"] = (
    df_norm["district_name"]
    .astype(str)
    .str.upper()
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
)

# Map district_name -> DHIS2 orgUnit UID
df_norm = df_norm.merge(
    locations[["district_name", "location"]],
    on="district_name",
    how="left",
)

# Fail fast (or drop) if mapping is incomplete
unmapped = df_norm["location"].isna().mean()
print(f"Unmapped dengue rows: {unmapped:.2%}")
if unmapped > 0:
    print("Examples:", df_norm.loc[df_norm["location"].isna(), "district_name"].drop_duplicates().head(15).tolist())

# Drop unmapped rows and rows with missing date/cases
df_norm = df_norm.dropna(subset=["location", "date", "cases"]).copy()

df_norm
Unmapped dengue rows: 1.30%
Examples: ['CHITAWAN']

Monthly aggregation

# Convert to month period label (YYYY-MM)
df_norm["time_period"] = df_norm["date"].dt.to_period("M").astype(str)

# Aggregate within month + location
disease = (
    df_norm.groupby(["time_period", "location"], as_index=False)["cases"]
    .sum()
    .rename(columns={"cases": "disease_cases"})
)

print("Aggregated rows:", len(disease))
disease
Aggregated rows: 2736

Filter to districts and align time axis

# Keep only locations present in the GeoJSON backbone
before = len(disease)
disease = disease.merge(locations[["location"]], on="location", how="inner")
after = len(disease)
print(f"Backbone filter kept {after}/{before} rows")

# Build full (time_period x location) grid and fill missing with 0
all_months = pd.period_range(disease["time_period"].min(), disease["time_period"].max(), freq="M").astype(str)
all_locations = locations["location"].sort_values().unique()

grid = pd.MultiIndex.from_product([all_months, all_locations], names=["time_period", "location"]).to_frame(index=False)

disease_full = grid.merge(disease, on=["time_period", "location"], how="left")
disease_full["disease_cases"] = disease_full["disease_cases"].fillna(0)

# Keep integer-looking values as ints where possible
disease_full["disease_cases"] = pd.to_numeric(disease_full["disease_cases"], errors="coerce").fillna(0).astype(int)

print("Final rows (complete grid):", len(disease_full))
disease_full
Backbone filter kept 2736/2736 rows
Final rows (complete grid): 2772

Write output CSV

disease_full.to_csv(OUT_CSV, index=False)
print("Wrote:", OUT_CSV)
OUT_CSV
Wrote: ..\..\data\nepal-dengue-monthly-admin.csv
WindowsPath('../../data/nepal-dengue-monthly-admin.csv')

Next steps

This guide ends once a harmonized, DHIS2-ready OpenDengue dataset for Nepal has been produced.

To import the resulting data into DHIS2, see the section for importing data to DHIS2.