IPUMS DHS is created by a small research team with funding from the National Institutes of Health, and we are adding more samples and countries three times a year, as quickly as we can. I expect that all currently available public African standard DHS samples will be released through IPUMS DHS by early 2020. We are working on Chad data now. We are currently funded to harmonize standard DHS samples from Africa, the Middle East, and South Asia, and we will soon apply for additional funding to cover more regions of the world.

If the country or countries you need are not currently available, check back again.

Thanks very much, Miriam: I understand your constraints and I am looking forward to your next release.

But I have a question about the appending process.

When I append multiple rounds/surveys, all my value labels are messed up. For example, the "region" variable seemed to include only the value labels from the last data appended. So, if country A has regions 1 2 3, and country B has regions 3 4 5, I would expect the appended data to include all 6 regions. But in my case, only regions 3 4 5 are populated.

Do you have any hints or strategies for synchronizing the value labels, given your experience?

I suggest that you use IPUMS-DHS to avoid the frustration you experience when working with the original DHS files, with labels varying across samples. We recode the data from the original DHS files so that the same meaning is given the same code and label across samples, and you don't have to deal with the issue you describe.

This integrated region variable includes as many samples as possible and has consistent codes identifying areas with the same geographic footprint across all samples included for a country. There may be less detail than in the single-sample region variables, but the GEO_ integrated variables will have comparable codes, labels, and meanings.

Our data harmonization is designed to ensure that the same variable meaning has the same codes and labels across samples; that is why we integrate the data. We also release single-sample region variables for all samples, because they sometimes include more detail than is available in the integrated geographic variable, but most other variables are integrated across multiple sample years (and, apart from a few variables dealing with geography and ethnicity, across countries).

Regions are easy if you're comparing one country over time. The IPUMS DHS integrated geography variables, described by Dr. King, will work well for you.

If you want unique labels for regions across countries, that's trickier. The process with the IPUMS data is similar to the process with the original DHS files. There's no way to make it simpler because the IPUMS folks cannot know which surveys or survey years researchers will want to use.

Here's some Stata code that will work with your IPUMS DHS data file to apply region labels to multiple surveys. The notes provide the information on how to tailor the code to your specific data.

I believe you are looking over time and across countries, so you'll probably want to use the integrated geography variables. Other researchers, who are comparing across only the most recent surveys, will want to use the single-survey geography variables.
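
A minimal sketch of one way to build unique region IDs across surveys, offered as an illustration and not as the code referred to in the post above. It assumes an IPUMS DHS extract with a labeled survey identifier named sample and two single-survey region variables. The names geo_aa2010 and geo_bb2012 are hypothetical placeholders, so substitute the variables in your own extract. Each region variable must have value labels attached, otherwise decode stops with an error.

```stata
* Assumption: sample is a labeled numeric survey identifier.
* Assumption: geo_aa2010 and geo_bb2012 are hypothetical single-survey
* region variables; each is missing for cases from other surveys.

* Collect the region names from every survey into one string variable.
generate region_name = ""
foreach v of varlist geo_aa2010 geo_bb2012 {
    * Turn the value labels of this survey's region variable into text.
    decode `v', generate(tmp_name)
    replace region_name = tmp_name if !missing(`v')
    drop tmp_name
}

* One ID per survey-by-region combination; the label option builds the
* value labels from the survey label and the region name.
egen regionid = group(sample region_name), label
```

Running tabulate regionid afterwards is a quick way to check that every survey contributes its own regions and that no labels were overwritten.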

Thanks. I will try it out. I have since changed my research question to focus on the most recent surveys and extracted the single-survey geography variables. Do I just substitute these for the integrated version in the Stata code below?

Secondly, I asked the question about "Region" because I was going to use it to create unique stratum codes (egen stratum=group(survey v024 v025)) for pooled data across countries/years using the original DHS data, consistent with advice given by Dr. Pullum and others on this board.
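
As a minimal sketch of that approach with the original DHS recode files, assuming the pooled file already contains a user-created survey identifier named survey (a hypothetical name) alongside the standard variables v005 (sample weight), v021 (primary sampling unit), v024 (region), and v025 (urban/rural residence):

```stata
* Assumption: survey is a user-created identifier for each country-year file.

* Strata that are unique across the pooled surveys.
egen stratum = group(survey v024 v025)

* DHS weights are stored with six implied decimal places.
generate wt = v005/1000000

* Declare the design: PSUs nested within the pooled strata.
svyset v021 [pweight = wt], strata(stratum)
```

Whether region by urban/rural is the right stratification for a given survey depends on that survey's sample design, so check each final report before pooling.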

But now that I am using the IPUMS data (which has its own customized stratum/PSU variables), do I still need to go through this process if I don't need the "region" variables? Wouldn't I just svyset my data using the following:

Good afternoon. I hope you are doing well in the midst of this pandemic.

I am following up on your suggestions about creating the integrated regionids. I did so, but got stuck in the last section, where I am supposed to attach labels to the regionids. Unfortunately, the labels were not created.
