Gastroesophageal reflux disease (GERD) symptoms are best assessed using patient-reported outcome (PRO) instruments. Guidance on developing well-defined and reliable instruments that capture optimal information from the patient’s perspective was recently published by the US Food and Drug Administration and the European Medicines Agency. The aim of this systematic review was to identify and evaluate existing PRO instruments for GERD symptoms with regard to regulatory requirements. Systematic literature searches were conducted in PubMed and Embase to identify PRO instruments for GERD symptoms that have undergone psychometric evaluation. Content validity, construct validity, test–retest reliability, internal consistency, and responsiveness were evaluated in relation to regulatory recommendations. Supplementary searches were conducted to assess whether identified instruments had been used as clinical trial endpoint measures. The systematic literature searches identified 15 PRO instruments for GERD symptoms that have undergone psychometric evaluation. Eight were designed to evaluate GERD symptoms, two were designed to diagnose GERD, four were designed for both evaluative and diagnostic purposes, and one was designed for screening purposes. Five instruments were developed and reported to include most steps recommended by the Food and Drug Administration and European Medicines Agency, and have also been used as endpoint measures in clinical trials: the GERD Symptom Assessment Scale, the Nocturnal Gastro-oesophageal Reflux Disease Symptom Severity and Impact Questionnaire, the Reflux Questionnaire, the Reflux Disease Questionnaire, and the Proton pump inhibitor Acid Suppression Symptom test. Existing PRO instruments for GERD do not meet all the regulatory requirements for an outcome instrument in reflux trials and may need further validation.

Department of Gastroenterology, Aurora Summit Hospital, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA

Introduction

Heartburn and regurgitation are the characteristic symptoms of the typical reflux syndrome in gastroesophageal reflux disease (GERD); other major symptoms of GERD include epigastric pain, chest pain, and dysphagia 1. Traditionally, heartburn has been the primary symptom evaluated in clinical trials conducted in patients with GERD, with some trials also including evaluation of regurgitation and, less commonly, dysphagia 2,3. However, many individuals with GERD have multiple upper gastrointestinal symptoms in addition to heartburn 4–6. With the recent drive by the US Food and Drug Administration (FDA) to involve patients in the development of patient-reported outcome (PRO) instruments 7, it is thus likely that other symptoms will, in the future, be included in the assessment of symptoms in GERD trials.

Both the FDA and the European Medicines Agency (EMA) emphasize the importance of taking patients’ experiences into account in symptomatic diseases 7,8. Patient input is essential during the development of PRO instruments and when establishing their content validity. Content validity is relevant in a broader context beyond the regulatory setting, as it ensures that the instrument includes all relevant symptoms, accurately measures the concepts of interest to the patient, asks about these in an understandable manner, and uses appropriate response options. For existing instruments that did not undergo content validity testing, documentation of content validity can be provided based on new qualitative work similar to that performed when developing a new instrument. An overview of an instrument’s measurement properties and how these are assessed and scored is shown in Table 1.

The aim of this systematic review was to identify and assess currently available instruments for GERD symptoms designed to be used in efficacy studies and for disease diagnosis, taking into consideration the current FDA PRO guidance and EMA draft guidelines 7,8. Keeping in mind that many PRO instruments were developed before regulatory guidance was established, and that additional work to support psychometric validation can be performed on existing instruments, we reviewed and evaluated existing instruments against the regulatory guidance.

Methods

Literature searches

Systematic searches of the literature to September 2010 were conducted in PubMed and Embase, using the following search string: (‘GERD’ OR ‘GORD’ OR ‘gastroesophageal reflux’ OR ‘heartburn’ OR ‘regurgitation’) AND (‘patient reported outcome’ OR ‘questionnaire’ OR ‘instrument’ OR ‘scale’ OR ‘measure’ OR ‘score’ OR ‘index’) AND (‘symptom’ OR ‘screening’ OR ‘diagnosis’ OR ‘diagnostic’ OR ‘evaluation’ OR ‘efficacy’) AND (‘validation’ OR ‘validity’ OR ‘reliable’ OR ‘reliability’ OR ‘responsiveness’ OR ‘responsive’ OR ‘internal consistency’ OR ‘psychometric properties’). The following studies were excluded: studies of health-related quality of life instruments, studies of instruments relating to reflux surgery, studies of instruments that are for use in pediatric populations only, and studies of instruments that do not have an English-language version. Supplementary database (PubMed and Embase) searches were conducted to identify clinical trials that used the PRO instruments included in this review as endpoint measures. A flow chart of the literature searches that shows the search strings and reasons for study exclusion is depicted in Fig. 1.
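The structure of the search string above (terms combined with OR within each concept group, and the four groups combined with AND) can be sketched in a few lines of Python. This is only an illustration of how the reported string is assembled; the names `TERM_GROUPS` and `build_query` are hypothetical, straight quotes are used in place of the typographic quotes in the text, and database-specific syntax such as field tags is not represented.

```python
# Term groups copied from the search string reported in the text.
TERM_GROUPS = [
    ["GERD", "GORD", "gastroesophageal reflux", "heartburn", "regurgitation"],
    ["patient reported outcome", "questionnaire", "instrument", "scale",
     "measure", "score", "index"],
    ["symptom", "screening", "diagnosis", "diagnostic", "evaluation", "efficacy"],
    ["validation", "validity", "reliable", "reliability", "responsiveness",
     "responsive", "internal consistency", "psychometric properties"],
]


def build_query(groups):
    """Join terms within a group with OR, then join the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f"'{term}'" for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(TERM_GROUPS)
print(query)
```

Keeping the concept groups as separate lists makes it straightforward to see which group a term belongs to and to rerun the search with a term added or removed.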

Data collection

The following data were collected for the evaluation of content validity: objectives for each available questionnaire (screening, evaluative, or diagnostic purposes); characteristics of intended patient population and patient population involved in questionnaire development; method of item generation; process used for conducting patient qualitative, cognitive, and focus group interviews; and method used to evaluate the draft questionnaire. The following data were collected for the evaluation of construct validity, reliability, and responsiveness to change: characteristics of patient population included in validation study; type and duration of medical treatment; and information on convergent validity, discriminant validity, internal consistency, test–retest reliability, known-groups validity, and responsiveness to change. Data were also collected on whether an instrument measured both heartburn and regurgitation, and both frequency and severity of symptoms. When foreign-language translations were available, data on validation were collected for the original English-language version only. For PRO instruments that had been identified as having been used as endpoint measures in clinical trials, data were collected on whether these endpoints were primary, secondary, or exploratory, and on how response was defined.
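As an illustration of one of the properties listed above, internal consistency is commonly summarized with Cronbach's alpha. The review does not state which statistic each validation study reported, so the following Python sketch is an assumption about a typical approach rather than a description of any particular study; the function name `cronbach_alpha` and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    where k is the number of items. Sample variances are used throughout.
    Assumes the total scores are not all identical (nonzero total variance).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three respondents answering three items on the same scale.
example_responses = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(example_responses)
```

In this constructed example every item ranks the respondents identically, so alpha takes its maximum value; real questionnaire data would give a lower figure.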

For five PRO instruments, there was no published evidence of patient involvement in the process of instrument development. Two of these, the Carlsson–Dent questionnaire 14 and GERD questionnaire 15, were diagnostic questionnaires developed in the 1990s by assessing concurrent validity against esophageal endoscopy and pH-metry results. ReQuest in Practice 32 was a modified version of a questionnaire with a longer format, the ReQuest 29–31. The GERD symptom diary 21 was a modified version of a questionnaire developed for a different patient population (patients with dyspepsia). The Heartburn frequency and severity scale—diary card was a simple scale specifically for heartburn 23. Information on the psychometric evaluation of construct validity, reliability, and reproducibility is summarized in Table 3.

Supplementary literature searches showed that five of the identified PRO instruments have been used to derive endpoints in clinical trials (Table 3). The GSAS and the N-GSSIQ have both been utilized as exploratory endpoint measures, without implementation of predefined responder definitions. The PASS has been used as a cosecondary endpoint measure, with responders being defined as being free from symptoms 39. The PASS sleep disturbance dimension has been implemented on its own to derive a cosecondary endpoint, with a responder being defined as experiencing no sleep disturbance 40. The RDQ has been used as an exploratory 48,49 and secondary 46,47 endpoint measure, with responders being defined as being free from symptoms. The RDQ Heartburn dimension has been utilized on its own to derive a cosecondary endpoint, with a responder being defined as experiencing no heartburn 39. The ReQuest has been used as a primary endpoint measure, with responders being defined as being free from symptoms 50. The ReQuest–GI subscale has been utilized to derive a primary endpoint, without implementation of a predefined responder definition 51. The GI subscale has also been used as a coprimary endpoint measure, using a score below 1.73 as a predefined responder definition 52, based on a corresponding 95th percentile symptom threshold identified in individuals without GERD 53.

An evaluation of whether steps documented in the instrument development process included recommended steps outlined by regulatory guidance is shown in Table 4. It should be noted that such evaluation can only be performed in a general fashion. In a regulatory context, each instrument will always be reviewed in the context of a clinical program. Hence, our evaluation intends to illustrate what steps were taken in the documentation of the instrument and match those against the recommendations in current guidance.

Discussion

This systematic review of the literature identified 15 PRO instruments for GERD symptoms that have undergone psychometric evaluation. The usefulness of a PRO instrument depends on the quality of its development. Recommendations on how to develop well-defined and reliable instruments were recently published by the FDA and the EMA 7,8. The steps outlined in the regulatory documents provide guidance on how best to capture optimal information from the patient’s perspective. As such, these are also relevant in a broader context beyond the regulatory setting. Of the PRO instruments identified in the current review, five (GSAS, N-GSSIQ, ReQuest, RDQ, and PASS) were developed and documented to include most steps outlined in the regulatory guidance documents, and have also been used as endpoint measures in clinical trials.

Input from the target patient population is vital when establishing an instrument’s content validity. It ensures that all relevant symptoms are included, the instrument measures accurately the concepts of interest to the patient, it asks about these in an understandable way, and it provides appropriate response options. The two earliest questionnaires, from the 1990s, the Carlsson–Dent questionnaire 14 and the GERD questionnaire 15, did not include patients in the process of instrument development and thus do not have documented content validity. However, their development and assessment provided useful information for the development of subsequent questionnaires. Johnsson et al. 15 first implemented a word picture to describe heartburn in their instrument. Carlsson et al. 14 subsequently showed that only about one-third of patients who reported having ‘a burning feeling rising from the stomach or lower chest up toward the neck’ recognized it as being heartburn, thus emphasizing the importance of using a descriptor for heartburn. The observation made by Carlsson et al. 14 also highlighted the importance of making sure that a PRO instrument uses the patient’s way of expressing their symptoms, rather than that of the physician. The Carlsson–Dent questionnaire later formed the basis for the development of the RDQ.

The Montreal definition of GERD is now widely used to define GERD and has been endorsed by the EMA for use in clinical trials 1,8. The definition meets the PRO guidance criteria in that it relies on symptoms elicited directly from the patient and includes a patient-centered approach to the impact of the symptoms on the patient’s life 1,8. The approach of the Montreal group is therefore in keeping with the PRO concept in that symptoms are elicited from the patient and the impact of symptoms on the patient’s life is not determined using prespecified criteria such as a general quality of life instrument that may not adequately capture the impact of symptoms. Instead, the Montreal group has emphasized an individual patient-centered approach to the impact of symptoms, which is in keeping with the regulatory PRO guidance.

For most of the five PRO instruments included in this review that have been used as endpoints in clinical trials, either no responder definition was specified or response was defined as freedom from symptoms. However, one study, which used the ReQuest–GI subscale, did not require absence of symptoms as a part of the responder definition, but instead used a score below 1.73 to define symptomatic response 52. The threshold was based on a previous study, conducted in individuals without evidence of GERD, which found reflux symptom thresholds of 0.95 and 1.73, calculated as the 90th and 95th percentiles of the ReQuest–GI score, respectively 53. The FDA PRO guidance encourages planning for clinical trial interpretation using an a-priori responder definition 7. However, the guidance also notes that the responder definition may vary by clinical trial design characteristics.
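The logic of a percentile-based responder definition can be sketched as follows. Only the threshold of 1.73 and the "score below threshold" rule come from the text; the reference scores, the trial scores, the function names, and the percentile interpolation method are all assumptions made for illustration, and the cited study may have calculated its percentiles differently.

```python
from statistics import quantiles

# Responder threshold for the ReQuest-GI subscale as reported in the text.
REQUEST_GI_THRESHOLD = 1.73


def percentile_thresholds(reference_scores):
    """Return the (90th, 95th) percentiles of scores from a reference group.

    quantiles(..., n=20) gives 19 cut points at 5% steps, so index 17 is the
    90th percentile and index 18 the 95th. The "inclusive" interpolation
    method is an assumption, not taken from the cited study.
    """
    cuts = quantiles(reference_scores, n=20, method="inclusive")
    return cuts[17], cuts[18]


def is_responder(score, threshold=REQUEST_GI_THRESHOLD):
    """A patient counts as a responder when the score is below the threshold."""
    return score < threshold


def responder_rate(scores, threshold=REQUEST_GI_THRESHOLD):
    """Proportion of patients classified as responders."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(is_responder(s, threshold) for s in scores) / len(scores)


# Hypothetical reference group (individuals without GERD): scores 0.0 to 2.0.
reference_scores = [i / 10 for i in range(21)]
p90, p95 = percentile_thresholds(reference_scores)

# Hypothetical end-of-treatment scores for four trial patients.
trial_scores = [0.4, 1.2, 1.73, 2.5]
rate = responder_rate(trial_scores)
```

Because the rule uses a strict inequality, a patient scoring exactly at the threshold is not counted as a responder, which matches the "score below 1.73" wording.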

Of the 15 instruments included in the review, only six measured both the frequency and the severity of GERD symptoms. It is interesting to note that, when developing the GerdQ, Jones et al. 16 found that the outcome of frequency assessment alone was similar to assessing both frequency and severity in terms of the instrument’s sensitivity and specificity. The EMA draft guidelines recommend that evaluation of GERD symptoms should include an assessment of both their frequency and their severity 8. This area needs further research to strike a balance between capturing the relevant aspects of a symptom and avoiding questions that are too highly correlated.

The properties required of a PRO instrument can depend on the setting in which it is to be used. For instance, an instrument that is to be used in clinical practice will not necessarily need to meet the same standards as an instrument that is to be used for regulatory purposes as part of a product approval process. For medical product development in particular, it is useful to determine early on whether an adequate PRO instrument exists to assess and measure the concepts of interest. If it does not, then a new instrument can be developed or, in some cases, can be modified from an existing instrument. For example, of the instruments included in the current review, ReQuest in Practice was developed as a shorter version of ReQuest 32, and the GERD symptom diary was based on a similar symptom diary developed for patients with dyspepsia 21. The requirements on validation for modified instruments differ depending on whether the modifications were changes in format or significant changes in the target population 7. For example, existing instruments would need to be thoroughly documented to support their appropriateness for use in GERD patients with a partial response to proton pump inhibitor therapy, an area of intense current research 54,55.

At the moment, no instrument has met all the regulatory requirements, which state that the adequacy of any PRO instrument as a measure to support medical product labeling claims depends on whether its characteristics, conceptual framework, content validity, and other measurement properties are satisfactory. The FDA will review documentation of PRO instrument development and testing in conjunction with clinical trial results to determine whether a labeling claim is substantiated. Regulatory approval of the instrument will therefore depend on the trial design, the indication, and the properties of the drug 7. However, some of the instruments are a potential starting point for further development into a PRO for use in a specific clinical trial.

Although the regulatory PRO guidance documents are relevant beyond the regulatory setting, it should be noted that the final FDA PRO guidance document was published relatively recently 7, and discussions about content and implementation are still ongoing. Furthermore, the FDA PRO guidance is not disease specific, hence the principles are general and need to be applied to the disease area of interest. The EMA treatment evaluation guideline for GERD is disease specific, but is still under development 8, whereas the EMA reflection paper focuses on assessment of health-related quality of life rather than symptoms 56. To advance the field of PRO instrument development, it is important that publication of all relevant data pertaining to the psychometric evaluation and use of an instrument is encouraged.

Conclusion

It has been recognized for several years now that the patient’s symptoms need to be reported by the patient, not the physician. It is important to seek input from the target patient population early on when developing a PRO instrument and when establishing its content validity. The current review reveals that many existing instruments need further documentation to support psychometric validation, especially if they are to meet new regulatory requirements. No single instrument currently meets all clinical and regulatory requirements. The five instruments identified that meet most requirements and have been used as endpoints in clinical trials were developed for use in different settings: the N-GSSIQ was developed to assess the severity and impact of nocturnal GERD symptoms 25; the PASS was developed to identify patients with persistent acid-related symptoms during proton pump inhibitor therapy in clinical practice and assess response to change in therapy 26; the ReQuest was developed separately in patients with GERD with and without reflux esophagitis 29–31; the GSAS was designed to measure treatment effects in the context of clinical trials 19,20; and the RDQ was developed for diagnostic and evaluative purposes in primary care and clinical trials 6,27,28. Depending on the application, the reader may choose one of these instruments for further development. Instruments for use in other contexts such as clinical practice will not necessarily need to meet the same standards as an instrument that is to be used for regulatory purposes in the clinical trial setting. However, irrespective of the context in which the instrument is to be used, patient input is valuable in guiding instrument development.

Acknowledgements

Conflicts of interest

Nimish B. Vakil has received consultancy fees from AstraZeneca, Takeda Pharmaceutical, XenoPort, Orexo, and Ironwood Pharmaceuticals; grant/research support from AstraZeneca and XenoPort; and has ownership interest (e.g. stocks, stock options) in Orexo and Meridian Bioscience. Katarina Halling and Anna Rydén are employees of AstraZeneca. Anja Becher is a contracted employee of Oxford PharmaGenesis, which has received funding from AstraZeneca.

56. Committee for Medicinal Products for Human Use, European Medicines Agency. Reflection paper on the regulatory guidance for the use of health-related quality of life (HRQL) measures in the evaluation of medicinal products. London, UK: European Medicines Agency; 2005. Available at: http://www.ema.europa.eu/ema/ [Accessed 9 June 2006].