Background

Claims data are currently widely used as source data in asthma studies. However, the insufficient information in claims data related to level of asthma severity may negatively impact study findings. The present study develops and validates an asthma severity classification model that uses medication utilization in Taiwan National Health Insurance claims data.

Methods

The National Health Insurance Research Database was used for the years 2006–2012 and included a total of 7221 patients newly diagnosed with asthma in 2007 for model development and in 2008 for model validation. The medication utilization of patients during the first year after the index date was used to classify level of severity, and the acute exacerbation of asthma during the second through fourth years after the index date was used as the outcome variable. Three models were developed, with subjects classified into four, three, and two groups, respectively. The area under the receiver operating characteristic curve (AUC) and the Kaplan-Meier survival curve were used to compare the performances of the classification models.

Results

In the development data, the distribution of subjects across stages 1 to 4 was 62.71%, 5.54%, 22.79%, and 8.96%, and the corresponding acute exacerbation rates were 8.17%, 9.55%, 11.97%, and 14.91%, respectively. The higher severity groups were more likely to be prescribed oral corticosteroids for asthma control, while the lower severity groups were more likely to be prescribed short-acting medication and inhaled corticosteroid treatment. Furthermore, the results of survival analysis supported the two-group classification, which yielded moderate performance (AUC = 0.671). In the validation data, the distribution of subjects, acute exacerbation rates, and medication use among stages were similar to those in the development data, and the results of survival analysis were also the same.

Conclusions

Understanding asthma severity is critical to conducting effective, scholarly research on asthma, which currently uses claims data as a primary data source. The model developed in the present study not only overcomes a gap in the current literature but also provides an opportunity to improve the validity and quality of claims-data-based asthma studies.

Asthma is a common, chronic disease involving inflammation of the small airway that affects more than 300 million people around the world [1]. An estimated 24.6 million Americans had asthma in 2015, including 8.4% of all children and 7.6% of all adults [2]. A 2017 estimate pegged the cost of asthma in the United States at $56 billion annually, with an additional $5.9 billion in productivity losses [3]. In Taiwan, asthma is also common, with an estimated prevalence rate of 12% of the total population [4]. The average hospitalization-related expenditures of asthmatic patients in Taiwan are 2.7 times that of other patient categories [5].

Using claims data to research asthma issues on a national or regional scale has become increasingly popular in recent years [6–8], with benefits frequently including lower costs, larger sample sizes, and easier longitudinal follow-up. However, this approach also presents several limitations and challenges, such as insufficient code columns, validity issues (e.g. up-coding or down-coding to fit the payment scheme), and the inability to determine disease severity [9–11]. The last of these is the most important factor limiting the use of claims data in related outcome research.

In order to resolve this limitation on determining disease severity, previous studies have used data related to medication utilization, such as “controllers-to-total-asthma-drug” ratios and inhaled corticosteroids plus leukotriene receptor antagonists, to estimate severity [12–15]. A recent systematic review by Jacob et al. found 54 articles in the literature that had used claims data to assess asthma severity [16]. They further found that four types of algorithms were used to classify asthma severity: the Healthcare Effectiveness Data and Information Set (HEDIS) criteria, the Leidy criteria, the Global Initiative for Asthma (GINA) criteria, and the Canadian Asthma Consensus Guidelines (CACQ). Of these, the HEDIS criteria were the most widely used.

All of the abovementioned algorithms use medical and medication utilization to classify asthma severity in claims data. For example, the HEDIS criteria draw on asthma-related utilization data, such as the numbers of outpatient visits, admissions, and ER visits, and on asthma medication dispensation data [17, 18], but classify patients only as persistent or intermittent. The Leidy criteria rely on asthma medication dispensation data only [19], with the frequency of oral corticosteroid and short-acting β2-agonist prescriptions used to classify level of severity into mildly persistent, moderately persistent, and severely persistent. However, information on model validation is insufficient for all of the models, with the exception of the CACQ [20].

The GINA guidelines are the most important reference for asthma treatment. These guidelines were published by an international organization that was launched in 1993 in collaboration with the National Heart, Lung, and Blood Institute, National Institutes of Health in the United States and the World Health Organization in order to develop a global strategy for managing and preventing asthma. GINA reports have provided the foundation for many national guidelines. They are prepared by international experts from primary, secondary, and tertiary care data and are updated annually following a review of the current evidence. Prior to 2014, GINA guidelines categorized asthma patients based on level of symptoms, of airflow limitation, and of lung function variability. The GINA guidelines were revised significantly in 2014 and now provide recommendations for categorizing levels of asthma control. In addition, most of the current severity classification models were developed before 2011. To the best of the knowledge of the present authors, no study in the literature has used current GINA severity classification criteria to develop an asthma severity classification model.

The Taiwan National Health Insurance (NHI) program, which currently covers roughly 23 million citizens and residents, was launched in 1995. Claims data, which are extracted from data collected regularly by the NHI, have often been used to explore the outcomes and quality of healthcare for many diseases, including asthma [7, 8, 21–23]. However, as noted above, the classification of asthma severity in Taiwan NHI claims data remains an unresolved issue. Therefore, the purpose of the present study is to use the most recent GINA guidelines as a reference to develop an asthma severity classification model using Taiwan NHI claims data and to validate the feasibility of this model.

Data sources

The present study employed a retrospective cohort design and used data from the Taiwan National Health Insurance Research Database (NHIRD) for the period 2006 to 2012. In Taiwan, the National Health Insurance Administration is the sole insurer and has implemented national health insurance (NHI) since March 1, 1995. The NHIRD covers the 23 million enrollees in the NHI program (approximately 99.9% of Taiwanese citizens), and almost all healthcare facilities are NHI-contracted providers. The NHI claims data include inpatient medical benefit claims, ambulatory care medical benefit claims, pharmaceutical benefit claims, and registries of contracted medical care institutions, health professionals in contracted medical care institutions, and beneficiaries.

Ethics statement

The study protocol was fully reviewed and approved by the Ethics Committee for Clinical Research, National Taiwan University Hospital (Taipei, Taiwan; protocol # 201601056RINB). The dataset was obtained from the NHIRD and all personal identification information has been encrypted. Therefore, written informed consent was not necessary.

Patient population

A retrospective study design was used. The medication utilization of patients during the first year after the index date was used to classify level of severity, and the acute exacerbation of asthma during the second through fourth years after the index date was used as the outcome variable (Fig. 1). The ambulatory, inpatient, and enrollment data of the asthma cohort were extracted from the NHIRD for the year 2007 in order to identify all patients newly diagnosed with asthma in 2007. The date of the first diagnosis of asthma was used as the index date. Patients with two or more outpatient service claims or one or more inpatient care claims with a primary or secondary diagnosis of asthma (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] codes 493.0, 493.1, 493.2, 493.8, and 493.9) during 2007 were included as subjects [7, 8, 21]. Otherwise qualified patients who had been diagnosed with asthma prior to 2007 were excluded. Patients were also excluded if they: (1) were <18 years of age, (2) had withdrawn from coverage during the observation period, or (3) had been diagnosed with chronic obstructive pulmonary disease (COPD) prior to the index date. Additionally, for model validation, patients newly diagnosed with asthma in 2008 were selected using the same inclusion and exclusion criteria.
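The inclusion and exclusion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PatientSummary` record and its field names are hypothetical, pre-aggregated summaries and do not correspond to actual NHIRD column names, and ICD-9-CM codes are assumed to be stored as dotted strings.

```python
from dataclasses import dataclass

# ICD-9-CM asthma code prefixes listed in the text.
ASTHMA_ICD9_PREFIXES = ("493.0", "493.1", "493.2", "493.8", "493.9")


@dataclass(frozen=True)
class PatientSummary:
    """Hypothetical per-patient summary; field names are illustrative only."""
    outpatient_asthma_claims_2007: int
    inpatient_asthma_claims_2007: int
    asthma_diagnosis_before_2007: bool
    age_at_index: int
    withdrew_during_observation: bool
    copd_before_index: bool


def is_asthma_code(icd9_code: str) -> bool:
    """True if a dotted ICD-9-CM code falls under one of the listed prefixes."""
    return icd9_code.startswith(ASTHMA_ICD9_PREFIXES)


def is_eligible(p: PatientSummary) -> bool:
    """Apply the case definition first, then each exclusion in turn."""
    meets_case_definition = (
        p.outpatient_asthma_claims_2007 >= 2
        or p.inpatient_asthma_claims_2007 >= 1
    )
    if not meets_case_definition:
        return False
    if p.asthma_diagnosis_before_2007:   # not newly diagnosed
        return False
    if p.age_at_index < 18:              # exclusion (1)
        return False
    if p.withdrew_during_observation:    # exclusion (2)
        return False
    if p.copd_before_index:              # exclusion (3)
        return False
    return True
```

Expressing each exclusion as its own early return keeps the rules in the same order as the text, which makes it easier to audit how many patients each step removes.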

Fig. 1

Data collection procedure

Model development and validation of the asthma severity classification model

The classification criteria used in the present study were based on Reliever/Oral Steroid Use (ROSU) [24] and the GINA 2014 guidelines [25]. Although GINA classifies severity into five steps, data on Immunoglobulin E (IgE) utilization are not available in Taiwan NHI claims data. Thus, an expert meeting was held to discuss how to group patients and determine the criteria. Ultimately, the present study chose to classify asthma severity into four groups according to the following procedure and criteria: 1) A subject was classified as stage 4 (GINA step 4–5) if at least 50% of her/his prescriptions during the target year included medium/high-dose inhaled corticosteroid/long-acting inhaled β2-agonist (ICS/LABA) combinations or oral corticosteroids (OCSs); 2) A subject was classified as stage 3 (GINA step 3) if she/he had prescriptions of medium/low-dose ICS/LABA or if <50% of her/his prescriptions included medium/high-dose ICS/LABA during the target year; 3) A subject was classified as stage 2 (GINA step 2) if she/he received only low-dose ICS prescriptions during the target year; and 4) All other patients were classified as stage 1 (GINA step 1) (Fig. 2).

Fig. 2

Flow diagram of patient selection and severity classification
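The four-stage rule described above can be written as a small decision function. The sketch below is one possible reading of the criteria, not the authors' SAS implementation: the `Prescription` record, the drug-class labels, and the dose labels are all hypothetical, stage 3 is read as "any ICS/LABA prescription that did not reach the 50% threshold", and stage 2 is read as "at least one ICS-only prescription, all of them low-dose".

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Prescription:
    """Hypothetical record for one prescription in the target year."""
    drug_class: str    # assumed labels: "ICS_LABA", "OCS", "ICS", "OTHER"
    dose: str = "NA"   # assumed labels: "LOW", "MEDIUM", "HIGH", "NA"


def _counts_toward_stage4(rx: Prescription) -> bool:
    """Medium/high-dose ICS/LABA combinations or oral corticosteroids."""
    if rx.drug_class == "OCS":
        return True
    return rx.drug_class == "ICS_LABA" and rx.dose in ("MEDIUM", "HIGH")


def classify_stage(prescriptions: Sequence[Prescription]) -> int:
    """Return severity stage 1 to 4 under the assumptions stated above."""
    if not prescriptions:
        return 1  # no prescriptions: falls under "all other patients"
    share = sum(_counts_toward_stage4(rx) for rx in prescriptions) / len(prescriptions)
    if share >= 0.5:
        return 4
    # Assumed reading of stage 3: any ICS/LABA use below the 50% threshold.
    if any(rx.drug_class == "ICS_LABA" for rx in prescriptions):
        return 3
    # Assumed reading of stage 2: ICS-only prescriptions, all of them low-dose.
    ics_only = [rx for rx in prescriptions if rx.drug_class == "ICS"]
    if ics_only and all(rx.dose == "LOW" for rx in ics_only):
        return 2
    return 1
```

Because the rule uses a proportion of prescriptions rather than an absolute count, a patient with two prescriptions and a patient with twenty are assessed on the same scale.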

Moreover, the occurrence of acute exacerbation was used as the gold standard. Events counted as acute exacerbation included: (1) an asthma-related hospitalization and at least one ER visit and (2) two or more consecutive short-acting medication prescriptions within a 10-day period. We also collected information on subject age, gender, score on the Charlson Comorbidity Index, obesity status, sinusitis status, and Gastroesophageal Reflux Disease (GERD) status and used these data as covariates.
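The prescription-based criterion and the follow-up window can be illustrated as follows. This is a minimal sketch with two assumptions that the text does not settle: "within a 10-day period" is treated as a gap of at most 10 days between two consecutive short-acting prescriptions, and the "second through fourth years" are approximated as days 365 to 1459 after the index date. Both function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def has_clustered_prescriptions(dates: Iterable[date], window_days: int = 10) -> bool:
    """True if any two consecutive prescription dates are at most window_days apart."""
    ordered = sorted(dates)
    return any(
        (later - earlier).days <= window_days
        for earlier, later in zip(ordered, ordered[1:])
    )


def in_follow_up(event_date: date, index_date: date) -> bool:
    """True if the event falls in years 2 to 4 after the index date (assumed 365-day years)."""
    start = index_date + timedelta(days=365)
    end = index_date + timedelta(days=4 * 365)
    return start <= event_date < end
```

For example, prescriptions dated 1 and 9 January 2009 would satisfy the criterion, whereas prescriptions dated 1 and 20 January 2009 would not.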

Three models were developed in the present study. In model 1, asthma severity was classified into four groups (stages 1 to 4); in model 2, asthma severity was classified into three groups (stages 1–2, 3, and 4); and in model 3, asthma severity was classified into two groups (stages 1–2 and 3–4). All of the covariates were included in all of the models. As mentioned above, newly diagnosed asthmatic patients in 2007 were selected for model development, while newly diagnosed asthmatic patients in 2008 were selected for model validation.
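The three groupings differ only in how the four stages are collapsed, which can be expressed as a lookup table. The dictionary keys and group labels below are hypothetical names chosen for illustration.

```python
# Map each stage (1 to 4) to its group label under each model.
MODEL_GROUPINGS = {
    "model_1": {1: "stage 1", 2: "stage 2", 3: "stage 3", 4: "stage 4"},
    "model_2": {1: "stages 1-2", 2: "stages 1-2", 3: "stage 3", 4: "stage 4"},
    "model_3": {1: "stages 1-2", 2: "stages 1-2", 3: "stages 3-4", 4: "stages 3-4"},
}


def group_for(model: str, stage: int) -> str:
    """Return the severity group of a stage under the given model."""
    return MODEL_GROUPINGS[model][stage]
```

Keeping the stage assignment separate from the grouping means that the classification step runs once and each model is obtained by relabeling.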

Statistical analysis

All of the analyses were conducted using SAS statistical software, version 9.4 (SAS Institute Inc., Cary, NC), and the statistical level of significance was set at 0.05. Descriptive statistics were presented as number, mean, and standard deviation (SD) for continuous variables and as frequency and percentage for categorical variables. Differences were examined using the Chi-square test for categorical variables and analysis of variance (ANOVA) for continuous variables. Kaplan-Meier survival analysis was performed, with the differences in survival functions between severity groups assessed using the log-rank test. The area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the performance of the classification models. Additionally, multivariable Cox proportional hazards regression was used to examine the effect of asthma severity on the acute exacerbation of asthma after adjusting for the covariates.
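For readers unfamiliar with the Kaplan-Meier method, the product-limit estimator can be sketched in plain Python. The analyses in this study were run in SAS; the function below is only an illustration of the estimator itself, and the function name and the toy data in the usage note are invented for the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each observed event time.

    times:  follow-up time for each subject
    events: True if the subject had the event, False if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)      # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1 - d / at_risk   # product-limit step
            curve.append((t, survival))
        at_risk -= exit_counts[t]         # drop everyone who left at time t
    return curve
```

With invented follow-up times `[1, 2, 2, 3, 4]` and event flags `[True, True, False, True, False]`, the estimated survival probabilities are approximately 0.8, 0.6, and 0.3 at times 1, 2, and 3. Computing one such curve per severity group gives the curves that the log-rank test compares.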

Table 1 presents the characteristics, medication utilization, and medical utilization of the development group. A total of 3593 patients were included as subjects. Nearly three-fifths (58.47%; 2101) were female and around 60% were above 45 years of age, with the largest numbers in the >65 (22.96%) and 18–34 (24.49%) years-of-age groups. In terms of comorbidity, the mean score for the Charlson Comorbidity Index (CCI) was 0.39, and 10, 40, and 11 subjects had been diagnosed with obesity, sinusitis, and gastroesophageal reflux disease, respectively. In terms of medication utilization, 866 (24.10%) received at least one ICS/LABA combination prescription, 822 (22.88%) received at least one short-acting beta agonist (SABA) prescription, 366 (10.19%) received at least one oral corticosteroid prescription, 309 (8.60%) received at least one ICS prescription, 111 (3.09%) received at least one short-acting muscarinic receptor antagonist (SAMA) prescription, and 155 (4.31%) received at least one SABA and SAMA combination prescription within one year of the index date. In terms of acute exacerbation, 349 (9.71%) had experienced at least one acute exacerbation event, 52 (1.45%) had been admitted to hospital, 160 (4.45%) had made an ER visit, and 218 (6.07%) had received ≥2 consecutive short-acting medication prescriptions within a 10-day period.

According to the algorithm, the distribution of subjects among the four stages was as follows: stage 1, 2253 (62.71%); stage 2, 199 (5.54%); stage 3, 819 (22.79%); and stage 4, 322 (8.96%). Further, Table 1 shows comparisons of patient characteristics, medication utilization, and acute exacerbation events among subjects in the four groups. Patient age, medication utilization, and acute exacerbation events differed significantly among these groups. The higher severity groups had a slightly higher proportion of older individuals. In terms of medication utilization, the higher severity groups (stages 3–4) were more likely to receive prescriptions of OCS for asthma control. In contrast, the lower severity groups were primarily prescribed SABAs, SAMAs, and ICSs. Further, the percentage of acute exacerbation events increased with level of severity.
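The reported figures can be checked for internal consistency with simple arithmetic. The sketch below uses the stage counts from this paragraph together with the stage-specific acute exacerbation rates reported in the abstract (8.17%, 9.55%, 11.97%, and 14.91%); the variable names are illustrative.

```python
# Stage counts in the development group (from the text).
stage_counts = {1: 2253, 2: 199, 3: 819, 4: 322}
# Stage-specific acute exacerbation rates in percent (from the abstract).
stage_rates = {1: 8.17, 2: 9.55, 3: 11.97, 4: 14.91}

total = sum(stage_counts.values())

# Share of subjects in each stage, in percent, rounded to two decimals.
shares = {stage: round(100 * n / total, 2) for stage, n in stage_counts.items()}

# Overall exacerbation rate as the count-weighted mean of the stage rates.
overall_rate = sum(stage_counts[s] * stage_rates[s] for s in stage_counts) / total

print(total)                   # 3593
print(shares)                  # {1: 62.71, 2: 5.54, 3: 22.79, 4: 8.96}
print(round(overall_rate, 2))  # 9.72
```

The counts sum to the 3593 subjects of the development group, and the weighted mean of about 9.72% agrees, up to rounding of the stage rates, with the overall rate of 9.71% (349 of 3593) reported for the development group.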

Table 2 shows the results of the descriptive analysis of the validation group, which included a total of 3628 subjects. Patient characteristics were similar to those of the development group. The patterns of medication utilization were almost the same in the two groups (development and validation), with the exception of slightly higher utilization of SABA in stage 3 of the validation group. In terms of acute exacerbation, the overall rate was around 10%, which is similar to the development group. The percentage of acute exacerbation events generally increased with level of severity, although stage 3 rather than stage 4 showed the highest rate of acute exacerbation.

Figure 3 presents the results of the Kaplan–Meier survival analysis. All of the results achieved statistical significance (p < 0.001) regardless of which severity classification model was applied. Nevertheless, the survival curves of stage 4 and stage 3 crossed under both Model 1 (four severity groups) and Model 2 (three groups). Therefore, Models 1 and 2 were excluded. In terms of classification performance, the area under the curve of Model 3 was 0.671. For model validation, the same inclusion and exclusion criteria were used to select patients newly diagnosed with asthma in 2008, and the same classification algorithms were used to classify asthma severity. In the validation group, the area under the curve of Model 3 was 0.702. The results are thus similar between the development and validation groups, suggesting that the developed model is stable. Finally, the numbers of censored subjects in each model are presented in Table 3. Table 4 shows the results of the Cox proportional hazards model, which revealed that the more severe groups faced a higher risk of acute exacerbation.
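To make the AUC concrete, the sketch below computes it as the probability that a randomly chosen subject with an exacerbation has a higher score than a randomly chosen subject without one, with ties counted as one half. How the reported values of 0.671 and 0.702 were produced in SAS is not detailed in the text, so this only illustrates the quantity itself; the function name and the toy data are invented.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one subject with and one without the event")
    wins = sum(
        (p > n) + 0.5 * (p == n)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 = higher-severity group, 0 = lower-severity group.
toy_groups = [0, 0, 1, 1, 1, 0]
toy_events = [False, False, True, True, False, True]
toy_auc = auc(toy_groups, toy_events)
```

When the score is a two-level grouping, as in Model 3, this value equals the mean of sensitivity and specificity. In the toy data both are 2/3, so `toy_auc` is 2/3.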

Asthma is a chronic airway inflammatory disorder that affects more than 300 million people worldwide and causes substantial morbidity for patients as well as economic loss for society [26]. Many asthma studies use claims data for research and analysis. However, the lack of information on asthma severity in claims data is a major limitation that may diminish the value of research findings. In this study, we referred to ROSU and the 2014 GINA guidelines to develop asthma severity classification models for claims data. After model development and validation, the results of the present study support the validity of using the medication utilization information in claims databases to classify asthma patients into two groups. Thus, the findings of the present study may help fill this gap and support further research.

The major purpose of claims data is reimbursement. Therefore, when claims data are applied to other purposes, it is necessary to derive proxy indicators from the existing information. For example, risk adjustment is an important procedure when comparing healthcare organizations, and disease severity information may be its most essential component. Recently, many studies have used medication and healthcare utilization information to classify disease severity (e.g. stroke [27], COPD [9]) and outcomes of care (e.g. surgical site infection [28]). Therefore, medication and healthcare utilization could be a good source of information and might also be more accurate than ICD codes.

In addition, this study highlighted several issues that are worth further discussion. The first is whether it was necessary to develop a new model to classify asthma severity in claims data instead of adopting an existing one. Several other models have been developed, and prior research has also used medication and healthcare utilization information as criteria. However, the model developed in the present study offers several advantages: (1) the present sample was drawn from a nationwide database, so its representativeness should be superior to that of the samples used for existing models; and (2) the procedure used to develop the model is more comprehensive than those used in previous studies. In addition to referencing existing experience, the authors convened an expert meeting to confirm the feasibility and practicality of the selected classification criteria and also used data from another year for model validation.

Second, regarding the explicitness of the severity classification criteria, although the existing models provide many criteria, they were developed many years ago, and the appropriateness of these criteria should be reviewed carefully. In addition, existing models and criteria face certain limitations in implementation. One example is whether it is appropriate to use an absolute number as a cutoff point: a physician-ordered modification to a prescription may not reflect a change in patient severity level. Therefore, the present study used the proportion of medium/high-dose ICS or ICS/LABA prescriptions to avoid this limitation, while the expert meeting helped make the developed model more applicable in daily practice.

Third, concerning the time lag between the years of the data and the severity classification standard used to test the developed model, the year of the data we used and the year of the severity classification standard we adopted were not consistent. However, the medications and treatments in use were similar throughout the past decade, with the exception of IgE; only the severity classification standard was revised over time [25, 29]. Therefore, the inconsistency between the year of the data and the year of the severity classification standard may not pose a significant limitation or problem for the present study.

Claims data are widely used for various purposes, but the lack of information on disease severity is a major defect. Based on the experience of this study, researchers can follow the model development procedure to develop and validate disease severity classification models for other diseases, especially chronic diseases. Researchers can pay more attention to the development of disease severity classification models, and policy makers can also apply such models to optimize local healthcare delivery.

Limitations

Although we followed rigorous procedures in developing and validating the asthma severity classification models in the claims data, the present study is affected by several limitations. Most importantly, information on actual asthma severity level and the distribution of severity among asthma patients was not directly obtainable from claims data. In order to minimize the impact of this limitation, we selected acute exacerbation of asthma to validate our model, because acute exacerbation is highly correlated with asthma severity level [26]. Thus, this limitation could be alleviated. Next, with regard to the rate of guideline adherence, several studies have indicated that poor adherence to guidelines is an issue in asthma care [30, 31]. Moreover, our data were extracted from a period prior to the major revision of the GINA guidelines in 2014, and physician prescription patterns may have changed after the 2014 GINA guidelines were published. However, medication utilization may be the only information available to classify asthma patient severity in claims data. Therefore, an expert meeting was convened in order to reduce the effect of low guideline adherence. Nevertheless, this limitation remains difficult to avoid. Finally, information on factors such as air pollution, allergens, and other environmental exposures was not available in the present study, which may explain why the area under the curve was lower than 0.7 in the development data.

Accurately understanding the level of asthma severity is necessary and critical to asthma research. Current studies widely use claims data to collect the data necessary to assess asthma severity. The results of this study suggest that it is possible to use the medication utilization of patients to classify asthma severity in claims data. The model developed in the present study has the potential not only to improve the validity and quality of asthma research but also to analyze the data and explain the results in advance.

Acknowledgements

The authors would like to thank the Taiwan Ministry of Science and Technology for financial support (MOST-106-2628-H-227-001).

Funding

This study was financially supported by the Taiwan Ministry of Science and Technology (MOST-106-2628-H-227-001).

Availability of data and materials

The dataset used in this study is held by Taiwan Ministry of Health and Welfare (MOHW). Researchers who want to access this dataset may submit an application form to the Ministry of Health and Welfare. Please contact the staff of MOHW (Email: stcarolwu@mohw.gov.tw) for further assistance.

Authors’ contributions

THY developed the study concept, analyzed the data, and drafted the manuscript; FPK was involved in collecting data and drafting the manuscript; and YCT reviewed the methods and results and revised the manuscript. All authors read and approved the final manuscript.

Ethics approval and consent to participate

The study protocol has been fully reviewed and approved by the Ethics Committee for Clinical Research of National Taiwan University Hospital (Taipei, Taiwan; protocol # 201601056RINB). The dataset was obtained from the National Health Insurance Research Database (NHIRD) and all personal identification information has been encrypted. Therefore, written informed consent was not necessary and not obtained for this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.