Step 2: Review questionnaires. Familiarize yourself with the questionnaires used to collect the data that you want to analyze. Model questionnaires are used for each survey phase, but each country modifies the core questionnaire slightly to meet its needs. The questionnaires used to collect data for a specific survey are always included at the back of that survey's final report. All final reports are free to download and, in some cases, can also be ordered in hard copy for free.

Use the questionnaires to determine whether the information you want to analyze was collected in your survey of interest, and who you want to analyze (your unit of analysis).

If the data you want to analyze was collected for everyone listed in the household questionnaire, your unit of analysis is probably household members. On the other hand, if, for example, you want to analyze data about women's contraceptive use, you will find that the relevant questions were asked in the women's questionnaire, and your unit of analysis is women. The unit of analysis will help you determine which dataset to download in Step 4.

Step 3: Register for dataset access. All DHS datasets are free to download and use. To download datasets, you must complete a short registration form. Remember your username and password; you can use them later to log in quickly and register for access to additional datasets.

Requests to access datasets are usually approved within 24 hours. Once your request has been approved, you will receive an email from archive@dhsprogram.com with instructions for download.

Step 4: Download datasets. Follow instructions from the email you received after registering. Once you log in to dhsprogram.com, you will see the country, survey, and list of datasets that you are approved to download. View the full tutorial on how to download DHS datasets in the video below:

If you have requested DHS datasets but need to modify your request or give additional information to gain dataset approval, you can find the instructions in the video below:

The Zip files containing datasets are labeled with brief but meaningful names, such as KEIR41DT. The full description of file naming conventions is here, but briefly:

The first two letters ("KE") refer to the country – in this case, Kenya. The country code list is here.

The next two letters ("IR") refer to the data file type. IR is the individual (women's) recode file, MR is the men's recode, HR is the household recode, etc. The complete list of data file types is here. Based on your review of the questionnaires, select the file type you need for your unit of analysis.

The next two characters ("41") refer to the phase and number of the survey. A complete explanation of this numbering is here. If you are only analyzing one survey, all datasets from that survey will have the same numbering.

The last two letters refer to the software program you want to use. The DT file contains the Stata (.DTA) data file and associated documentation; the SV file contains the SPSS (.SAV) file; the SD file contains the SAS (.SAS7BDAT) file; and the FL file contains an ASCII file and dictionaries.
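The naming convention above can be sketched in a few lines of Python. This is only an illustration, not an official DHS tool: the function and class names are hypothetical, and the lookup tables hold only the codes mentioned in this tutorial, so any other code is reported as missing from the partial table rather than guessed.

```python
from typing import NamedTuple

# Only the codes named in this tutorial; the complete lists are in the
# DHS file naming documentation, so these tables are deliberately partial.
FILE_TYPES = {
    "IR": "individual (women's) recode",
    "MR": "men's recode",
    "HR": "household recode",
}
FORMATS = {
    "DT": "Stata (.DTA)",
    "SV": "SPSS (.SAV)",
    "SD": "SAS (.SAS7BDAT)",
    "FL": "ASCII file and dictionaries",
}


class DhsFileName(NamedTuple):
    country: str           # characters 1-2, e.g. "KE"
    file_type: str         # characters 3-4, e.g. "IR"
    phase_and_number: str  # characters 5-6, e.g. "41" (kept as text)
    data_format: str       # characters 7-8, e.g. "DT"


def parse_dhs_name(name: str) -> DhsFileName:
    """Split an 8-character name such as KEIR41DT into its four parts."""
    stem = name.upper()
    if stem.endswith(".ZIP"):
        stem = stem[:-4]
    if len(stem) != 8:
        raise ValueError(f"expected an 8-character name, got {name!r}")
    return DhsFileName(stem[0:2], stem[2:4], stem[4:6], stem[6:8])


def describe(name: str) -> str:
    """Return a readable summary, flagging codes this sketch does not know."""
    parts = parse_dhs_name(name)
    file_type = FILE_TYPES.get(parts.file_type, "file type not in this partial table")
    data_format = FORMATS.get(parts.data_format, "format not in this partial table")
    return f"{parts.country} / {file_type} / {parts.phase_and_number} / {data_format}"


print(describe("KEIR41DT"))
```

The phase-and-number part is kept as a two-character string rather than converted to numbers, since the tutorial does not spell out how those characters are encoded.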

If you would like to download more than one dataset, please see the tutorial below to download multiple DHS datasets.

Step 5: Open your dataset in the software you are using for analysis.

A note for Stata users: DHS datasets are very large. If your memory and maximum number of variables (maxvar) have not been adjusted from the factory settings, you may get an error message when trying to open one. If so, change the memory and maxvar settings. Try

set memory 450m
set maxvar 10000

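to start. You may be able to set these values higher depending on your computer. These settings should allow you to open a DHS dataset. Note that Stata 12 and later manage memory automatically, so the set memory line is only needed in older versions.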

Step 6: Get to know your variables. When your dataset is open, you will see thousands of variables with confusing names and very short variable labels that briefly describe the contents of each variable. To understand each variable and its contents, get to know the DHS recode manual. Some analysts refer to the recode manual as the "DHS Analysis Bible." Why is the recode manual so important? Here's an example:

In your dataset (assuming you are using an IR, BR, KR, or MR file) check the label of v107 (mv107). The label says "highest year of education." If you analyze this variable assuming it is the respondent's highest year of education, you will have highly misleading results. Why? Because the variable label needs to be short, and so cannot give complete information about every variable included in the dataset. Download the DHS recode manual and look through it to find v107. See that v107 is the highest year of education at the level recorded in v106. Had you analyzed v107 as the highest years of education, you would have seriously underestimated the level of education in the country you are studying. This is just one example of why it is important to use the DHS recode manual.
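To make the size of this error concrete, here is a minimal Python sketch comparing the naive reading of v107 with a total that also counts the years spent at earlier levels. Everything numeric here is an assumption for illustration: the v106 codes (0 = none, 1 = primary, 2 = secondary, 3 = higher) should be checked against the recode manual, the level durations are hypothetical and differ by country, and the four respondents are made up.

```python
from typing import Optional

# HYPOTHETICAL years of schooling completed before entering each v106 level.
# Real durations are country-specific; take them from the survey documentation.
YEARS_BEFORE_LEVEL = {
    0: 0,   # no education
    1: 0,   # primary: nothing precedes it
    2: 6,   # secondary: assumes 6 years of primary
    3: 12,  # higher: assumes 6 years of primary + 6 of secondary
}


def total_years_of_education(v106: int, v107: Optional[int]) -> int:
    """Combine level (v106) and year at that level (v107) into total years."""
    if v106 == 0 or v107 is None:
        return 0
    return YEARS_BEFORE_LEVEL[v106] + v107


# Made-up respondents as (v106, v107) pairs.
respondents = [(1, 4), (2, 3), (3, 2), (0, None)]

naive = [v107 or 0 for _, v107 in respondents]
correct = [total_years_of_education(v106, v107) for v106, v107 in respondents]

naive_mean = sum(naive) / len(naive)
correct_mean = sum(correct) / len(correct)
print(f"mean of v107 alone: {naive_mean}")
print(f"mean total years:   {correct_mean}")
```

With these made-up values, the mean of v107 alone is 2.25 years, while the mean of the combined total is 6.75 years, which illustrates the underestimate described above.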

Step 7: Use sample weights. DHS sample weights are used in almost every tabulation in DHS final reports. The few unweighted tables are clearly labeled. Sample weights are described fully in the Guide to DHS Statistics. Briefly, weights are used in all analyses to make sample data representative of the entire population. There are different weights for different sample selections/units of analysis:

Sample weights in DHS datasets

Unit of analysis      Variable
Households            hv005
Household members     hv005
Women or children     v005
Men                   mv005
Domestic Violence     d005
HIV test results      hiv05

Like other variables in DHS datasets, the weight variable does not include decimal points. Analysts need to divide the sampling weight they are using by 1,000,000. Examples:

In Stata:

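* Rescale the women's weight, which is stored without decimal places
generate wgt = v005/1000000
* Weighted tabulation; replace "var" with the variable you want to tabulate
tab var [iweight=wgt]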

In SPSS:

COMPUTE WGT = V005/1000000.
WEIGHT by WGT.

These are just examples; other types of weights are available in different software packages.
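For readers who want to see the arithmetic behind a weighted tabulation, the following Python sketch computes a weighted share by hand. It is an illustration under stated assumptions, not a replacement for the survey commands in Stata or SPSS: the records and the uses_method indicator are made up, the function name is hypothetical, and the sketch ignores clustering and stratification, so it yields point estimates only.

```python
# Weight variable for each unit of analysis, copied from the table above.
WEIGHT_VARS = {
    "Households": "hv005",
    "Household members": "hv005",
    "Women or children": "v005",
    "Men": "mv005",
    "Domestic Violence": "d005",
    "HIV test results": "hiv05",
}


def weighted_share(records, weight_var, flag_var):
    """Weighted proportion of records where flag_var equals 1."""
    total = 0.0
    hits = 0.0
    for rec in records:
        # The stored weight has no decimal point, so rescale by 1,000,000.
        weight = rec[weight_var] / 1_000_000
        total += weight
        if rec[flag_var] == 1:
            hits += weight
    return hits / total


# Made-up women's records; "uses_method" is a hypothetical 0/1 indicator.
women = [
    {"v005": 500_000, "uses_method": 1},
    {"v005": 1_500_000, "uses_method": 0},
    {"v005": 1_000_000, "uses_method": 1},
]

weight_var = WEIGHT_VARS["Women or children"]
unweighted = sum(r["uses_method"] for r in women) / len(women)
weighted = weighted_share(women, weight_var, "uses_method")
print(f"unweighted share: {unweighted:.3f}")
print(f"weighted share:   {weighted:.3f}")
```

In this toy sample, two of three women report use, but the non-user carries the largest weight, so the weighted share is lower than the unweighted one.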
