Introduction

Health expenditure share, or the percentage of household expenditure spent on health care, is an important variable in health financing research.1 This figure is used to determine the number of households incurring catastrophic health expenditures and, in many countries, to derive estimates of private health expenditure reported in national health accounts.2-5 Studies have shown that health expenditure share estimates derived from household expenditure surveys have problems with “reliability, validity, and comparability”.6,7 For example, two nationally representative surveys conducted in the Philippines in 2003 reported widely different health expenditure shares – 1.3% and 7.7% – raising the question of which of the two estimates more accurately reflects reality.

An extensive literature exists on the sources of bias in surveys.8-10 However, few studies have explored how biases affect estimates of out-of-pocket household health expenditure. Lu et al.6 examined how the number of questions on health expenditure and the recall period of a survey affected estimates of household out-of-pocket payments and catastrophic expenditure on health. Their study analysed data from the World Health Surveys (WHS) for 43 countries and from the Living Standards Measurement Survey (LSMS) for three countries. They found that estimates of health spending were lower when the survey had fewer questions and that the estimates were higher when the recall period was shorter. Heijink et al.7 conducted an exhaustive review of the evidence surrounding measurement errors in self-reported household expenditure and health expenditure. They also collected 90 household expenditure surveys from the International Household Survey Network (IHSN). Their findings concurred with those of Lu et al.: households reported higher health expenditures when more questions were asked. The authors reported that the influence of the recall period was unclear, but that the mode of data collection, such as a diary versus face-to-face interviews, did affect the estimates. Most of the studies identified by Heijink et al.7 concluded that diaries yielded lower expenditure figures;11,12 one study showed conflicting results.13 Heijink et al. also suggested that the questionnaire’s structure affects the results.7 In some surveys, health expenditure questions are included within the health module, whereas in others they are placed within the household expenditure module.

As noted, previous studies have identified the direction of the biases inherent in health expenditure share estimates. Our study, however, is the first to quantify the effect of these biases. We analyse multiple surveys per country or territory and show how the estimated share of the household expenditure devoted to health (i.e. health expenditure share) would have varied if survey instruments with different characteristics had been employed. Our contribution makes it possible for analysts to compare health expenditure share estimates across surveys. At the end of the paper we raise some points to be considered when conducting cross-country comparisons of household survey data.

Methods

We conducted an exhaustive search of all surveys that reported information on health expenditure and total household expenditure. First, we identified the data sources used by the World Health Organization to estimate out-of-pocket expenditure in its Global Health Expenditure Database; this yielded a total of 719 household expenditure surveys. However, to conduct our analysis, we needed both the survey questionnaire and microdata (or a report) illustrating how to calculate the health expenditure share for that particular instrument. To obtain this information for the 719 surveys, we looked up the questionnaires, documentation reports and microdata in the IHSN, the Global Health Data Exchange and the websites of the statistical offices and health ministries of each country or territory. These supplementary sources were available for 214 surveys. The final sample therefore consisted of 214 survey years across 78 territories, as presented in Table 1 (available at: http://www.who.int/bulletin/volumes/91/7/12-115535). A flowchart showing our search strategy is presented in Fig. 1.

Fig. 1. Flowchart showing strategy followed in searching for surveys

WHO, World Health Organization.

Surveys vary in how they report expenditures and, as a result, analysts have to employ different methods to calculate total household expenditure. For example, some LSMS surveys (such as the Viet Nam Living Standards Survey) place expenditure on food and on other recurrent daily and annual expenses and expenditures on health, education and housing in separate categories. To calculate the number of questions on total expenditure, we summed all expenditure questions corresponding to the first 12 categories of the Classification of Individual Consumption According to Purpose (COICOP), which are: food and non-alcoholic beverages; alcoholic beverages, tobacco and narcotics; clothing and footwear; housing, water, electricity, gas and other fuels; furnishings, household equipment and routine household maintenance; health; transport; communication; recreation and culture; education; restaurants and hotels; and miscellaneous goods and services. All questions contributed to total expenditure regardless of their location in the survey instrument.
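This counting convention can be sketched as follows. The per-division question counts below are hypothetical, invented purely to illustrate the summation over the 12 COICOP divisions; they do not describe any survey in the sample.

```python
# Hypothetical question counts for one survey instrument, keyed by
# COICOP division (divisions 01-12). Values are illustrative only.
questions_per_division = {
    "01 Food and non-alcoholic beverages": 40,
    "02 Alcoholic beverages, tobacco and narcotics": 6,
    "03 Clothing and footwear": 12,
    "04 Housing, water, electricity, gas and other fuels": 10,
    "05 Furnishings, household equipment and maintenance": 15,
    "06 Health": 8,
    "07 Transport": 9,
    "08 Communication": 4,
    "09 Recreation and culture": 11,
    "10 Education": 5,
    "11 Restaurants and hotels": 3,
    "12 Miscellaneous goods and services": 7,
}

# Total expenditure questions: the sum over all 12 divisions, regardless
# of where the questions appear in the instrument.
n_total_questions = sum(questions_per_division.values())

# Health expenditure questions: the count for the health division alone.
n_health_questions = questions_per_division["06 Health"]
```

The same tallies would be produced whether the health questions sit inside a dedicated health module or inside the expenditure module, which is why module placement enters the analysis as a separate indicator.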

For surveys in which respondents were interviewed, we counted all the questions that pertained to expenditures. For surveys with screening or conditional (“skip”) questions, we assumed that the respondent replied affirmatively and counted all the questions that followed the screening question. The process was more complicated for surveys in which respondents recorded results in a diary. If the survey report contained a detailed disaggregation of expenditure items, we counted all the items that were listed. If the report contained only aggregated totals (22 surveys), we used the number of disaggregated categories in the COICOP. For surveys in which a combination of interviews and diaries was used we relied on the data generated from the interview.

In our sample, the number of questions on health expenditure ranged from one (WHS surveys) to 274 (Dominican Republic 2007 Encuesta Nacional de Ingresos y Gastos de los Hogares [ENIGH]). Thus, the WHS asked a single question about the household’s total amount of health expenditure, whereas other surveys asked multiple questions, each focused on a specific type of health expenditure. The number of questions on total expenditure ranged from one (WHS surveys) to 2431 (Dominican Republic 2007 ENIGH). Again, as the number of questions increased, the questions became more specific. WHS surveys had a single question on health expenditure and a single question on total expenditure, but they also had eight questions on disaggregated categories of out-of-pocket health spending and six questions on total expenditure. Given the focus of our analysis, we calculated four different health expenditure shares, one for each combination of the aggregated and disaggregated health and total expenditure questions. The survey with the largest number of questions – the Dominican Republic 2007 ENIGH survey – was conducted through the use of diaries. When we state that the survey had 2431 health expenditure questions, we are referring to the number of expenditure items enumerated in the survey report. Since the diary method allows respondents to record their purchases in detail, this survey reports on a greater number of items than other surveys. For example, instead of reporting overall expenditure on medicines, the Dominican Republic survey reported expenditures disaggregated by different types of medicines, such as antihistamines, anti-depressants and analgesics.

The shortest recall period was 10 days (2006 Household Socio-Economic Survey in Iraq) and the longest was 12 months (surveys from Bulgaria, Côte d’Ivoire, the Federated States of Micronesia, Gambia, Ghana, Madagascar, Mauritius and Saint Lucia). We derived health expenditure shares from microdata and validated those estimates by comparing them with the data from survey reports. If microdata were not available, we used the health expenditure shares from survey reports. Health expenditure shares ranged from 0.1% (Gambia 1992 Household Economic Survey) to 27.4% (1999 Cambodia Socio-Economic Survey). Descriptive statistics for these variables are presented in Table 2.

The dependent variable was the share of household expenditure spent on health. We modelled it as a function of the recall period, the number of health expenditure questions and the number of total expenditure questions. We also included binary indicators to represent the data collection method and the placement of the health module within the survey. We included the number of total expenditure questions because we hypothesized that one additional question pertaining to non-health expenditure would marginally increase the estimate of total expenditure without affecting the estimate of health expenditure. Therefore, we anticipated that the total number of expenditure questions would be inversely related to the health expenditure share. We assigned a value of 1 to the indicator for the data collection method if the data had been collected through a diary. We assigned a value of 1 to the indicator for the placement of the question within the health module if health expenditure questions and total expenditure questions were placed separately.

We included in the model gross domestic product (GDP) per capita and average years of education, as well as fixed effects for territory and World Bank income categorization, to control for unobservable characteristics at the territory level. We allowed the territory fixed effects to interact with year indicators to generate a unique time trend for every territory. The model was estimated using ordinary least squares regression. We identified 57 unique survey types and clustered our observations by type. We report heteroskedasticity-robust standard errors.
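The specification described above can be sketched as follows; the symbols are our own shorthand rather than notation from the paper, and the income-group fixed effects and clustering structure are left implicit in the error term.

```latex
s_{ct} = \beta_1\,\mathrm{recall}_{ct}
       + \beta_2\,Q^{\mathrm{health}}_{ct}
       + \beta_3\,Q^{\mathrm{total}}_{ct}
       + \beta_4\,\mathrm{diary}_{ct}
       + \beta_5\,\mathrm{separate}_{ct}
       + \gamma_1\,\mathrm{GDPpc}_{ct}
       + \gamma_2\,\mathrm{educ}_{ct}
       + \alpha_c + \alpha_c t + \varepsilon_{ct}
```

Here $s_{ct}$ is the health expenditure share for territory $c$ in year $t$; $\mathrm{diary}_{ct}$ and $\mathrm{separate}_{ct}$ are the binary indicators for diary-based collection and for health questions placed outside the expenditure module; $\alpha_c$ are territory fixed effects and $\alpha_c t$ the territory-specific time trends.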

To illustrate how health expenditure shares are influenced by survey characteristics, we defined three types of survey instruments: “minimalist”, “typical” and “extensive”. A “minimalist” instrument would have one expenditure question, one health expenditure question and a two-week recall period. These thresholds represent the minimal value of those variables in our sample. A “typical” instrument would have six expenditure questions, five health expenditure questions and a one-month recall period. These thresholds represent the median value of those variables in our sample. An “extensive” instrument would have 2431 expenditure questions, 274 health expenditure questions and a 12-month recall period. These thresholds represent the maximum value of those variables in our sample. We used the point estimates to predict counterfactual values for each of the surveys in our sample, such that each observation has three counterfactual values (one for each type of instrument). For this exercise, we assumed that the surveys were collected through an interview and that the module on health expenditure was nested in the expenditure module. To generate confidence intervals we drew 1000 random samples from normal multivariate distributions based on regression coefficient point estimates and the variance-covariance matrix obtained from our main model and used them to generate 1000 estimates of health expenditure share. The middle 95% of these estimates are presented as our confidence intervals in Fig. 2.
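The counterfactual simulation described above can be sketched in Python. The coefficient vector and its variance-covariance matrix below are hypothetical placeholders, not our estimates, and the design matrix carries only an intercept, the recall period and the two question counts; the full model would include all covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical point estimates and variance-covariance matrix for an
# intercept plus three coefficients (recall period in months, number of
# health questions, number of total expenditure questions).
beta_hat = np.array([5.0, -0.06, 0.01, -0.002])
vcov = np.diag([0.25, 1e-4, 1e-6, 1e-8])

# One design row per instrument: [1, recall, health questions, total
# expenditure questions] for the minimalist, typical and extensive cases.
X = np.array([
    [1.0, 0.5, 1.0, 1.0],        # minimalist: two-week recall
    [1.0, 1.0, 5.0, 6.0],        # typical: one-month recall
    [1.0, 12.0, 274.0, 2431.0],  # extensive: 12-month recall
])

# Draw 1000 coefficient vectors from a multivariate normal centred on the
# point estimates, predict a health expenditure share for each instrument,
# and take the middle 95% of the draws as the confidence interval.
draws = rng.multivariate_normal(beta_hat, vcov, size=1000)  # (1000, 4)
preds = draws @ X.T                                         # (1000, 3)
lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)
point = X @ beta_hat
```

Because the draws are centred on the point estimates, the point prediction for each instrument falls inside its simulated 95% interval; widening `vcov` widens the bands shown in Fig. 2.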

Fig. 2. Observed and counterfactual health expenditure shares, South Africa

Note: The x-axis represents time and the y-axis represents health expenditure share. The points in the figure represent actual data from surveys in South Africa that were included in our sample. The lower line represents our predicted values if those surveys had all employed an extensive instrument. The middle line represents our predicted values if those surveys had all employed a typical instrument. The upper line represents our predicted values if those surveys had all employed a minimalist instrument.

Results

Regression results are reported in Table 3. The results are consistent with qualitative conclusions from the literature: the greater the number of health expenditure questions, the greater the health expenditure share. Other factors held constant, a one-unit increase in the number of health questions was accompanied by a 1% increase in the health expenditure share. A one-unit increase in the number of total expenditure questions (holding the number of health expenditure questions constant) was accompanied by a 0.2% decrease in the health expenditure share. A one-month increase in the recall period was accompanied by a 6% reduction in the health expenditure share. Surveys that employed a diary generated lower health expenditure shares. Country income classification, GDP and education were not significantly related to the health expenditure share, and removing them from the model did not alter the statistical significance of the other independent variables, namely the number of health questions, the number of expenditure questions, the recall period and the survey type.

Fig. 2 illustrates the counterfactual estimates for South Africa, which fielded surveys of all three types: minimalist, typical and extensive. As is evident in the figure, the instrument’s characteristics affect the estimated household health expenditure share.

The results yielded by minimalist, typical and extensive instruments differ dramatically. In most cases, the minimalist instrument results in health expenditure shares that are twice as high as those derived from the extensive instrument. This is problematic because unadjusted health expenditure shares (i.e. shares calculated without regard for the influence of the survey instrument) are routinely used to estimate two important metrics: the level of out-of-pocket expenditure reported in national health accounts and the level of catastrophic health expenditure across countries. For example, in the Philippines in 2003, catastrophic health expenditure was incurred by 8.3% of the households according to WHS data,14 but by only 0.8% according to data from the Family Income and Expenditure Survey.15 For both surveys, catastrophic health expenditure was based on the same threshold – 25% of household income. This dramatic discrepancy in the estimates generates mixed, confusing messages that policy-makers cannot properly interpret.
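The threshold rule underlying these catastrophic-expenditure figures can be sketched as follows. The household incomes and health payments below are invented for illustration and do not come from either Philippine survey.

```python
# Hypothetical household data: total income and out-of-pocket health
# spending over the same period (values are illustrative only).
incomes = [1200.0, 800.0, 500.0, 2000.0, 950.0]
health = [100.0, 250.0, 200.0, 100.0, 50.0]

# A household's expenditure is catastrophic when its out-of-pocket health
# spending exceeds the threshold share of income (25% in the comparison
# above).
THRESHOLD = 0.25

catastrophic = [h / y > THRESHOLD for h, y in zip(health, incomes)]

# Incidence: the share of households incurring catastrophic expenditure.
incidence = sum(catastrophic) / len(catastrophic)
```

Because the numerator of each household's ratio is the surveyed health spending, any instrument-driven inflation or deflation of that figure feeds directly into the estimated incidence, which is how two surveys with the same threshold can report 8.3% and 0.8%.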

Discussion

Policy-makers need to rely on accurate and reliable out-of-pocket expenditure estimates. Household expenditure surveys were originally designed to measure the consumer price index, living standards and household consumption for the national accounts, not to measure out-of-pocket expenditure. Because of this limitation, the manual A system of health accounts: 2011 edition advocates an “integrative approach” to estimating private expenditure that involves making use of all available data sources, such as provider tax returns, pharmaceutical sales databases and household surveys.16 This approach would triangulate flows from these different channels to generate an accurate estimate.4 Although this approach is ideal, it is also impractical, especially in the near term for low-income countries. An interim solution would be to rigorously track the flow of funds at selected validation sites, as is done for the Medical Expenditure Panel Survey of the United States of America. This exercise would capture expenditure outflows from households to all health-care platforms in the community, including hospitals, clinics and pharmacies, and would provide a “gold standard” estimate of out-of-pocket expenditure that could then be used to adjust existing household survey data. Analysts will be able to systematically, reliably and accurately estimate out-of-pocket expenditure only if such validated estimates exist. Efforts should be made to ensure that policy-makers have access to data that capture reality rather than the idiosyncrasies of survey design.

Acknowledgements

We thank Eduardo Banzon, Joseph Dieleman, Gabriel Angelo Domingo, Karen Eggleston, Emmanuela Gakidou, Lizheng Shi, Tessa Tan Torres-Edejer and Madeleine Valera, as well as all panel participants at the 4th Biennial American Society of Health Economists and at the 2nd Symposium on Health Systems Research for their helpful comments. We also thank Brian Childress, Casey M Graves, Marissa Ianarrone, Annie Haakenstad, Katherine Leach-Kemon and Abby McLain for their generous assistance.

Funding:

This research was supported by core funding from the Bill & Melinda Gates Foundation. The funders had no role in study design, data collection and analysis, interpretation of data, decision to publish, or preparation of the manuscript. The corresponding author had full access to all data analyzed and had final responsibility for the decision to submit this original research paper for publication.