Overview of IPUMS-DHS Contextual Variables
Contextual variables describe features of the physical and social environment of a small geographic area (5-10 kilometer radius) surrounding the location where a DHS respondent was interviewed. Contextual variables in IPUMS-DHS eliminate the need to create linking keys and merge data files. The contextual variables encompass:
- Physical and environmental context, including the predominant terrestrial ecoregion, soil type, normalized difference vegetation index (NDVI), precipitation, minimum monthly temperature, and maximum monthly temperature
- Economic and social context, such as livelihood zones, population density, and malaria incidence
- Agricultural context, including the share of land devoted to cropland or pastureland, crop harvest area (e.g., RICE_H), and crop production (e.g., RICE_P)
The original DHS files include some variables that are implicitly contextual, such as the classification of the respondents' residence as urban versus rural. The IPUMS-DHS contextual variables allow researchers to study how a wider range of surrounding characteristics may influence health and well-being. For example, researchers have found that exposure to unusually hot days is correlated with higher rates of heart attack and low birthweights, while unusually high or low rainfall influences outmigration. Certain types of physical environments, livelihoods, and staple crops are more vulnerable to weather extremes and global warming, with implications for health and well-being.
Click HERE for a list of all IPUMS-DHS contextual variables.
Two Ways to Access Contextual Variables
Researchers may include contextual variables as part of their customized data file (extract), treating these variables as characteristics of the respondents' environment (just like urban or rural residence).
For samples not yet included in IPUMS-DHS, researchers may download a flat file containing environmental and contextual variables and link it to respondents in an original DHS data file on the basis of sample and cluster number. A list of samples with data on contextual variables is available HERE.
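The linking step for the flat file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names SAMPLE, CLUSTER_NO, CASEID, and PRECIP_MEAN, the sample code, and all values are made up for the example, and the real identifiers in the flat file and in the original DHS file may be named differently.

```python
import csv
import io

# Toy data. All column names, the sample code, and the values are
# hypothetical stand-ins, not the real IPUMS-DHS or DHS variable names.
CONTEXT_CSV = """SAMPLE,CLUSTER_NO,PRECIP_MEAN
KE2014,1,85.2
KE2014,2,40.7
"""

RESPONDENT_CSV = """SAMPLE,CLUSTER_NO,CASEID,AGE
KE2014,1,A001,24
KE2014,1,A002,31
KE2014,2,B001,19
KE2014,3,C001,45
"""


def link_context(respondent_rows, context_rows, keys=("SAMPLE", "CLUSTER_NO")):
    """Left-join cluster-level contextual values onto respondent rows."""
    # One contextual record per (sample, cluster) pair.
    lookup = {tuple(row[k] for k in keys): row for row in context_rows}
    context_columns = (
        [c for c in context_rows[0] if c not in keys] if context_rows else []
    )
    linked = []
    for row in respondent_rows:
        match = lookup.get(tuple(row[k] for k in keys))
        merged = dict(row)
        for column in context_columns:
            # Respondents in clusters without contextual data get None.
            merged[column] = match[column] if match else None
        linked.append(merged)
    return linked


context = list(csv.DictReader(io.StringIO(CONTEXT_CSV)))
respondents = list(csv.DictReader(io.StringIO(RESPONDENT_CSV)))
linked = link_context(respondents, context)
```

Every respondent in a cluster receives the same contextual value, and respondents whose cluster has no contextual record are kept with a missing value rather than dropped.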
Additional contextual variables that link GPS cluster data to ancillary data are available from The DHS Program's Geospatial Covariates page.
Contextual variable computation in IPUMS-DHS
The statistics for all contextual variables are computed within a 5- or 10-kilometer buffer around each DHS cluster. The main reason for using a buffer is to minimize the effects of DHS cluster displacement. Environmental statistics are also better calculated over the area surrounding individuals or communities than taken from the value at a single point location. The size of the buffer varies across variables: a 5-kilometer buffer was used for ecoregion, livelihood zone, population density, and soil data, and a 10-kilometer buffer for all other variables. For a given variable, the same buffer size was used for all clusters, whether urban or rural, to make the data consistent and comparable across individuals.
The creation of most IPUMS-DHS contextual variables is based on a common general methodology using Esri's ArcGIS software suite. Source data were acquired as raster or vector files. Source raster files with a resolution greater than 500 meters and vector files were converted to raster files with a resolution of 500 meters. The Focal Statistics tool was then used to update each pixel value within a 5- or 10-kilometer circular buffer around the sample cluster GPS location. For qualitative variables such as soil type, livelihood zone, or ecoregion, the predominant value within a 5-kilometer buffer was used to update each pixel value. For quantitative variables such as temperature, precipitation, population density, or malaria incidence, the mean, maximum, or sum was computed for all pixels within a 5- or 10-kilometer buffer. IPUMS-DHS staff then used the Extract Values to Points tool to assign the value of the intersecting pixel to each cluster location.
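The general idea of a focal statistic followed by a point extraction can be sketched in plain Python. This is not the ArcGIS workflow itself, only a minimal analogue under stated assumptions: the raster is a small list of lists, cells outside the grid are simply ignored, the cluster's row and column are assumed to be known already, and the function names are invented for the example. The demonstration uses a 500-meter radius so that the toy grid stays readable; the text above describes radii of 5,000 or 10,000 meters.

```python
from collections import Counter

CELL_SIZE_M = 500  # raster resolution stated in the text


def circular_window(radius_m, cell_size_m=CELL_SIZE_M):
    """Return (row, col) offsets of cells whose centers lie within radius_m."""
    reach = int(radius_m // cell_size_m)
    return [
        (dr, dc)
        for dr in range(-reach, reach + 1)
        for dc in range(-reach, reach + 1)
        if (dr * dr + dc * dc) * cell_size_m ** 2 <= radius_m ** 2
    ]


def focal_statistic(grid, radius_m, statistic):
    """Replace every cell with a statistic of its circular neighborhood."""
    n_rows, n_cols = len(grid), len(grid[0])
    offsets = circular_window(radius_m)
    result = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Assumption: neighbors that fall off the grid are skipped.
            values = [
                grid[r + dr][c + dc]
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
            result[r][c] = statistic(values)
    return result


def mean(values):
    """Statistic for quantitative variables."""
    return sum(values) / len(values)


def majority(values):
    """Statistic for qualitative variables: the predominant value."""
    return Counter(values).most_common(1)[0][0]


# Toy rasters with made-up values.
rainfall = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
soil = [["a", "a", "b"], ["a", "b", "b"], ["b", "b", "b"]]

rainfall_focal = focal_statistic(rainfall, radius_m=500, statistic=mean)
soil_focal = focal_statistic(soil, radius_m=500, statistic=majority)

# "Extract values to points": read the focal value at the cluster's cell.
cluster_row, cluster_col = 1, 1
cluster_rainfall = rainfall_focal[cluster_row][cluster_col]
cluster_soil = soil_focal[cluster_row][cluster_col]
```

Because the focal pass is applied to the whole raster once, every cluster location can then be looked up with a single cell read, which mirrors the two-step structure described above.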
A different methodology was used for the conflict variables (battles, riots, and violence against civilians). The conflict data report the annual counts (i.e., number of days) of incidents for a given latitude/longitude coordinate. IPUMS-DHS staff converted the coordinates to a point layer, created a 10-kilometer buffer around each DHS sample cluster location, and then counted the number of conflict events falling within the buffer. If the buffer crossed an international boundary, only events occurring within the same country as the DHS sample were included in the count.
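The counting rule for conflict events can be illustrated with a short Python sketch. It substitutes a great-circle (haversine) distance test for the GIS buffer operation described above, and the record layout (dictionaries with lat, lon, and country keys), the function names, and the coordinates are all assumptions made for the example. The events passed in are assumed to belong to a single year.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_events(cluster, events, radius_km=10.0):
    """Count events within radius_km of the cluster and in the same country."""
    return sum(
        1
        for event in events
        if event["country"] == cluster["country"]
        and haversine_km(
            cluster["lat"], cluster["lon"], event["lat"], event["lon"]
        ) <= radius_km
    )


# Made-up coordinates and country codes, for illustration only.
cluster = {"lat": 0.0, "lon": 30.0, "country": "A"}
events = [
    {"lat": 0.05, "lon": 30.0, "country": "A"},  # about 5.6 km away: counted
    {"lat": 0.20, "lon": 30.0, "country": "A"},  # about 22 km away: too far
    {"lat": 0.0, "lon": 30.05, "country": "B"},  # close, but across the border
    {"lat": 0.0, "lon": 30.0, "country": "A"},   # at the cluster: counted
]
n_events = count_events(cluster, events)
```

The country check is applied before the distance test, so an event just across an international boundary is excluded even when it lies well inside the 10-kilometer radius.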
Flat files in .csv format are available for all contextual variables for all DHS samples with GPS data available before July 2018, including samples not yet available in IPUMS-DHS.
A note about the GPS cluster datasets
The Demographic and Health Surveys (DHS) Program provides GPS coordinates for clusters, which are groupings of households that participated in the survey. The GPS readings are highly accurate but are displaced to protect respondent confidentiality. Urban clusters are displaced by 0 to 2 kilometers and rural clusters by 0 to 5 kilometers, with a further 1% of rural clusters displaced by up to 10 kilometers. Clusters are not displaced across survey regions or national boundaries. The contextual variables calculated by IPUMS-DHS average the values within the radius of displacement. For details, please check the documentation on cluster displacement. Users interested in performing spatial analysis with GPS cluster datasets may obtain DHS cluster shapefiles from The DHS Program website.