Accurate diagnosis and effective management of gastro-oesophageal reflux disease (GERD) can be challenging for clinicians and other health care professionals.

Aim

To develop a patient-centred, self-assessment questionnaire to assist health care professionals in the diagnosis and effective management of patients with GERD.

Methods

Questions from three patient-reported GERD instruments with previously documented content validity and psychometric properties (RDQ, GSRS and GIS), together with data on the diagnosis of GERD in primary and secondary care, were used in the formal development of a diagnostic and management tool, the GerdQ. Development involved psychometric validation and piloting in patient focus groups.

Results

Analyses of data from over 300 primary care patients, moderated by patient input from qualitative interviews, were used to select specific items from the existing instruments to create a new six-item diagnostic and management tool (GerdQ). ROC analysis indicated that GerdQ had a sensitivity of 65% and a specificity of 71% for the diagnosis of GERD, similar to the accuracy achieved by gastroenterologists.

Conclusion

The GerdQ is a potentially useful tool for family practitioners and other health care professionals in diagnosing and managing GERD without initial specialist referral or endoscopy.

Aliment Pharmacol Ther 30, 1030–1038

Introduction

Gastro-oesophageal reflux disease (GERD) is common in the community and in daily clinical practice. The prevalence of GERD in the general population in western countries is around 10–20%1 and the condition accounts for up to 4% of consultations with family physicians. There is evidence that primary care physicians face challenges in making an accurate diagnosis of GERD and in managing it effectively. Routine visits to family physicians are becoming shorter, and the demands on physicians to meet screening and performance targets reduce the time available to discuss symptoms with patients. Attention is therefore increasingly focused on innovative methods that can help the primary care physician to reach a diagnosis rapidly and to support management. In this article, we describe the development of the GerdQ, a patient-centred tool devised to help health care professionals identify and manage patients with GERD. Our aim was to develop an instrument that would: support accurate, symptom-based diagnosis of GERD by primary care physicians; guide treatment decisions by differentiating between GERD patients with occasional reflux symptoms (which have a relatively low impact on daily life) and those with frequent symptoms (which have a significant effect on daily life); and monitor the effect of therapy on patients’ symptoms and daily lives over time.

Methods

The data that form the basis for the GerdQ were collected in a large international study (DIAMOND, study code D9914C00002),2 performed in a primary care population presenting with upper GI symptoms. Clinical diagnoses and questionnaire scores were compared with objective diagnostic tests for GERD (endoscopy, pH-metry with Symptom Association Probability (SAP) and, in selected patients, the PPI test) to develop a self-assessment questionnaire with high diagnostic accuracy for GERD. The GerdQ questions are derived from the Reflux Disease Questionnaire (RDQ),3 the Gastrointestinal Symptom Rating Scale (GSRS)4 and the Gastro-oesophageal reflux disease Impact Scale (GIS).5 These validated instruments were chosen because their development formed part of the programme of research that preceded the present study. The GerdQ was refined by repeatedly testing content validity in patient focus groups, to ensure that the selected items were robust and complied with published guidance for the development of patient-reported outcomes (PROs)6 in terms of relevance, importance and comprehensibility for patients.

The DIAMOND study

The DIAMOND study2 was carried out in Germany, Sweden, Canada, Denmark, Norway and the United Kingdom and involved 73 family practitioner and 22 specialist clinics. Patients aged 18–79 years who presented to their family physician with upper abdominal symptoms believed to be of GI tract origin were recruited. The patients were screened and completed three previously validated patient-reported GERD outcome questionnaires: RDQ, GSRS and GIS. A conventional symptom-based diagnosis was also made initially and independently by the family physician and by a gastroenterologist, according to normal practice. Investigations, including endoscopy followed by 48-h wireless oesophageal pH monitoring (Bravo) with symptom event monitoring, were carried out at the patients’ second visit. A 2-week test of PPI therapy (single-blind esomeprazole) was then carried out after completion of the investigations. A diagnosis of GERD was made if patients fulfilled at least one of the following criteria:

As there is no single ‘gold standard’ test for GERD,8, 9 endoscopy and wireless 48-h pH recording with symptom association monitoring were used in all patients to provide an independent and objective reference standard for the diagnosis of reflux disease.

Exploratory analyses based on the DIAMOND data

The analyses described below were undertaken to explore which combination of items gave the most accurate diagnosis of GERD. The intention was to include items assessed in the target population, i.e. patients presenting with symptoms thought to originate from the upper GI tract, and to keep the questionnaire as short as possible without compromising sensitivity and specificity.

The diagnosis of GERD based on the combined use of the four objective tests in the DIAMOND study is currently considered the best that can be achieved using objective criteria. This diagnosis was used as the reference standard against which first the RDQ, and subsequently the GerdQ, were evaluated.

The analyses of the DIAMOND study data showed that summing the frequency and severity scores of the RDQ items ‘A burning feeling behind your breastbone’, ‘A pain in the centre of the upper stomach’ (with reversed scoring) and ‘Unpleasant movement of material upwards from the stomach’ provided a useful tool for diagnosing GERD. This combination was developed on the basis of RDQ data from another study10 with a population similar to the DIAMOND population, and was pre-specified in the DIAMOND study protocol. ROC analyses indicated that the symptom-based diagnosis made by the family physician was comparable to the RDQ in terms of sensitivity and specificity for the diagnosis of GERD, both at a level of approximately 60%.11 The corresponding accuracy for the gastroenterologist11 and for the GerdQ was higher, at approximately 70%.
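The pre-specified RDQ combination is a simple sum, which can be sketched as follows. This is a hypothetical illustration: it assumes that each RDQ frequency and severity response is coded 0–5 (six response options) and that "reversed scoring" means subtracting the code from 5. The function and parameter names are illustrative and are not part of the RDQ itself.

```python
# Hypothetical sketch of the pre-specified RDQ diagnostic sum.
# ASSUMPTIONS: each RDQ frequency and severity response is coded 0-5
# (six response options), and "reversed scoring" means 5 - code.
RDQ_MAX = 5


def rdq_diagnostic_sum(heartburn, epigastric_pain, regurgitation):
    """Each argument is a (frequency, severity) pair of RDQ codes (0-5).

    heartburn       -> 'A burning feeling behind your breastbone'
    epigastric_pain -> 'A pain in the centre of the upper stomach'
    regurgitation   -> 'Unpleasant movement of material upwards from the stomach'
    """
    for pair in (heartburn, epigastric_pain, regurgitation):
        if len(pair) != 2 or any(not 0 <= code <= RDQ_MAX for code in pair):
            raise ValueError("expected a (frequency, severity) pair coded 0-5")
    # Heartburn and regurgitation point towards GERD: add them as recorded.
    positive = sum(heartburn) + sum(regurgitation)
    # Epigastric pain points away from GERD, so each of its codes is reversed.
    reversed_pain = sum(RDQ_MAX - code for code in epigastric_pain)
    return positive + reversed_pain  # range 0-30 under these assumptions


# Hypothetical responses: heartburn (3, 3), no epigastric pain, regurgitation (1, 2).
print(rdq_diagnostic_sum((3, 3), (0, 0), (1, 2)))  # 19
```

Reversing the epigastric pain item lets a single "higher means more GERD-like" total be compared against a cut-off, which is what the ROC analysis requires.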

Construct criteria for the GerdQ

Selection of wording for response options. For the development of a simplified questionnaire, GerdQ, it was necessary to investigate first whether both frequency and severity gradings should be used in the response options as in the RDQ. One argument for symptom frequency scoring is that it is preferred by patients over symptom severity scoring, as it is easier to recall (focus group feedback). It was also found that using frequency alone gave a slightly better ROC curve than severity alone and that the ROC curve based on frequency alone (sensitivity and specificity for an optimal diagnosis cut-off 64% and 67%) was as good as the original ROC curve (corresponding sensitivity and specificity being 62% and 67%), which was based on both severity and frequency. Thus, only frequency scoring was included in the GerdQ, reducing the number of items by half compared with the RDQ.

Number of response options. The second question was whether the questionnaire could be simplified by reducing the number of response options for each symptom from six, as in the RDQ, to four, the number used in the GIS. When the RDQ scale was recoded in this way, the ROC curve was not impaired compared with the original. The recoded four-grade scale (no symptoms, 1 day, 2–3 days and 4–7 days with symptoms) was derived from the original RDQ scale and was also guided by the Montreal definition of GERD,12 in which reflux symptoms on two or more days are considered troublesome by the patient and thus indicative of GERD. Moreover, heartburn on two or more days (during treatment) is considered to indicate insufficient control of this symptom.13 Patient input also indicated a preference for a four-grade scale. Therefore, for consistency and simplicity, all GerdQ items use a four-grade scale, which has previously been validated by Junghard and Wiklund.14 A 1-week recall period was chosen on the basis of previous survey data and the opinions of patients in the focus groups, both in this study and during development of the RDQ.15
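The four-grade frequency scale amounts to a small mapping from the number of days with a symptom in the recall week to a grade. The sketch below is illustrative; the function names are hypothetical, and the "troublesome" flag simply encodes the two-or-more-days threshold cited above from the Montreal definition.

```python
def frequency_grade(days_with_symptom: int) -> int:
    """Map days with a symptom during the past week to the four-grade scale.

    0 = no symptoms, 1 = 1 day, 2 = 2-3 days, 3 = 4-7 days.
    """
    if not 0 <= days_with_symptom <= 7:
        raise ValueError("days_with_symptom must be between 0 and 7")
    if days_with_symptom == 0:
        return 0
    if days_with_symptom == 1:
        return 1
    if days_with_symptom <= 3:
        return 2
    return 3


def is_troublesome(days_with_symptom: int) -> bool:
    """Two or more symptom days in the week (grade 2 or 3)."""
    return frequency_grade(days_with_symptom) >= 2


print([frequency_grade(d) for d in range(8)])  # [0, 1, 2, 2, 3, 3, 3, 3]
```

Placing the boundary between grades 1 and 2 at two days means that the grade itself reflects the Montreal "troublesome" threshold.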

Item selection. The third step was to find out which, if any, items from the GIS and/or GSRS would improve diagnostic accuracy when added to the selected and recoded RDQ items. For this evaluation, logistic regression with a stepwise selection procedure was used: the three most discriminant RDQ items, ‘A burning feeling behind your breastbone’ (positive predictor for GERD), ‘A pain in the centre of the upper stomach’ (negative predictor for GERD) and ‘Unpleasant movement of material upwards from the stomach’ (positive predictor for GERD), were forced into the model, and all GIS and GSRS items were entered as candidate factors. The items that had a significant impact on the accuracy of diagnosing GERD (ordered from most to least significant) were the nausea item from the GSRS (negative impact), the eat/drink item from the GIS (negative), the ‘sensation of not completely emptying the bowels’ item from the GSRS (negative), the ‘urgent need to have a bowel movement’ item from the GSRS (positive), and the ‘sleep disturbance due to reflux symptoms’ and ‘additional medication’ (e.g. OTC) items, both from the GIS (positive). The nausea, sleep and additional medication items were considered the most intuitively logical items to include in GerdQ in addition to the RDQ items above. The GIS and GSRS items were also recoded and graded by frequency. The GSRS grades ‘Minor discomfort’ and ‘Mild discomfort’ were collapsed and considered to represent ‘1 day’, ‘Moderate discomfort’ was considered to represent ‘2–3 days’, and the grades ‘Moderately severe’, ‘Severe’ and ‘Very severe discomfort’ were collapsed and considered to represent ‘4–7 days’.
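The GSRS recoding described above is a many-to-one lookup. In the sketch below, the exact label strings are assumptions, as is mapping the GSRS "no discomfort" grade to 0, since the text does not state it. The grades 1–3 follow the stated correspondence.

```python
# Collapsing GSRS severity grades onto the four-grade frequency scale,
# following the correspondence described in the text.
# ASSUMPTIONS: label strings are illustrative, and 'No discomfort at all'
# is mapped to 0 (no symptom days), which the text does not state.
GSRS_TO_FREQUENCY_GRADE = {
    "No discomfort at all": 0,          # assumed: no symptoms
    "Minor discomfort": 1,              # treated as '1 day'
    "Mild discomfort": 1,               # treated as '1 day'
    "Moderate discomfort": 2,           # treated as '2-3 days'
    "Moderately severe discomfort": 3,  # treated as '4-7 days'
    "Severe discomfort": 3,             # treated as '4-7 days'
    "Very severe discomfort": 3,        # treated as '4-7 days'
}


def recode_gsrs(grade_label: str) -> int:
    """Return the frequency grade (0-3) for a GSRS severity label."""
    try:
        return GSRS_TO_FREQUENCY_GRADE[grade_label]
    except KeyError:
        raise ValueError(f"unknown GSRS grade: {grade_label!r}") from None


print(recode_gsrs("Mild discomfort"))  # 1
```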

Including all nine items identified by the stepwise selection procedure gave an ROC curve similar (63% sensitivity and 73% specificity) to the chosen combination of six items for GerdQ (65% sensitivity and 71% specificity). Increasing the number of items beyond six did not improve the diagnostic accuracy.

Qualitative patient interviews. Qualitative data were collected from patients regarding the acceptability of various questionnaire design features and ease of completion. This survey was conducted by the United Bio-Source Corporation (UBC) Center for Health Outcomes Research, Bethesda, MD 20814, US, and took place in two rounds to test both the original and revised versions of the questions and their response options. Trained moderators led the focus groups and one-to-one interviews, and an assistant moderator recorded notes from each session. An interview guide was used to elicit open-ended discussion regarding symptom experience and terms used to describe symptoms. All focus groups and interviews were conducted in English for US patients and in German for German patients. Round One was conducted in the US as interviews with 14 patients with a clinical diagnosis of GERD, IBS and/or dyspepsia, and used GerdQ Version 1, in which the questions were worded as in the RDQ, GSRS and GIS questionnaires.

In Round Two, two focus groups, one in the US with 10 patients and one in Germany with 18 patients, evaluated the six items proposed for GerdQ Version 2, in which the wording of some items had been changed on the basis of patient input from Round One.

Results

Study population for exploratory analysis

Data were evaluable for a mixed population of 308 patients (143 men and 165 women) with a mean age of 47 years who presented in primary care with upper abdominal symptoms.2 They had experienced upper abdominal symptoms for an average of 4.1 years, and 23% were positive for H. pylori. At endoscopy, 38% had reflux oesophagitis and 39% hiatal hernia; five patients had gastric ulcer and four had duodenal ulcer.

Qualitative interview results

In Round One, when asked about the 4-point vs. the 6-point frequency options, 86% preferred the 4-point response scale. In Round Two, 11 of the participants had GERD, 11 had dyspepsia and 6 had IBS. All participants in Round Two thought that the number of items in GerdQ (6) was acceptable and 57% (16/28) liked the recall period of 7 days. The majority of participants (61%) preferred the frequency to be counted in days rather than to be described by more vague categories (such as sometimes and often).

A substantial number of participants in both Rounds thought that the item ‘How often did you have a burning feeling behind your breastbone?’ should address the fact that heartburn radiates from the oesophagus up to the throat, and that the oesophagus should be added to the picture (i.e. the drawing of the torso) indicating the location of pain or discomfort. The participants thought the wording of the remaining items was clear and easy to understand.

Formulation of the GerdQ questionnaire. Based on the DIAMOND ROC analysis, logistic regression and relevance, as well as data from previous studies and the qualitative interviews, six items from the different questionnaires were included in the new questionnaire, GerdQ, which is shown in Table 1. It comprises four positive predictors of GERD, namely heartburn and regurgitation (the two characteristic symptoms of GERD according to the Montreal definition), sleep disturbance because of these two reflux symptoms, and use of OTC medication in addition to that prescribed (the last two found to be positive predictors in the DIAMOND study), together with two negative predictors of GERD, epigastric pain3 and nausea.4 Patients were asked to reflect on their symptoms over the preceding week. Scores from 0 to 3 were applied for the positive predictors and from 3 to 0 (reversed order, where 3 = none) for the negative predictors. The GerdQ score was calculated as the sum of these scores, giving a total score ranging from 0 to 18.

Table 1. The GerdQ questionnaire. Respondents enter the frequency scores after reflecting on their symptoms over the previous week

Frequency score (points) for symptom

| Question | 0 days | 1 day | 2–3 days | 4–7 days |
|---|---|---|---|---|
| 1. How often did you have a burning feeling behind your breastbone (heartburn)? | 0 | 1 | 2 | 3 |
| 2. How often did you have stomach contents (liquid or food) moving upwards to your throat or mouth (regurgitation)? | 0 | 1 | 2 | 3 |
| 3. How often did you have a pain in the centre of the upper stomach? | 3 | 2 | 1 | 0 |
| 4. How often did you have nausea? | 3 | 2 | 1 | 0 |
| 5. How often did you have difficulty getting a good night’s sleep because of your heartburn and/or regurgitation? | 0 | 1 | 2 | 3 |
| 6. How often did you take additional medication for your heartburn and/or regurgitation, other than what the physician told you to take (such as Tums, Rolaids, Maalox)? | 0 | 1 | 2 | 3 |
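The scoring rule in Table 1 can be expressed in a few lines. The sketch below uses hypothetical item names. Each item is recorded as the frequency column ticked (0 = 0 days through 3 = 4–7 days), and the two negative predictors are reversed so that the total runs from 0 to 18.

```python
# Hypothetical item names; the scoring rule follows Table 1.
POSITIVE_ITEMS = ("heartburn", "regurgitation", "sleep_disturbance", "otc_medication")
NEGATIVE_ITEMS = ("epigastric_pain", "nausea")


def gerdq_score(grades: dict) -> int:
    """Sum the six GerdQ item scores.

    `grades` maps each item to the frequency column ticked:
    0 = 0 days, 1 = 1 day, 2 = 2-3 days, 3 = 4-7 days.
    Positive predictors score 0-3 as ticked; negative predictors are
    reversed (3 - grade), so the total ranges from 0 to 18.
    """
    expected = set(POSITIVE_ITEMS) | set(NEGATIVE_ITEMS)
    if set(grades) != expected:
        raise ValueError(f"expected exactly these items: {sorted(expected)}")
    if any(not 0 <= grade <= 3 for grade in grades.values()):
        raise ValueError("each frequency grade must be 0-3")
    positive = sum(grades[item] for item in POSITIVE_ITEMS)
    negative = sum(3 - grades[item] for item in NEGATIVE_ITEMS)
    return positive + negative


example = {
    "heartburn": 2,          # 2-3 days
    "regurgitation": 1,      # 1 day
    "epigastric_pain": 0,    # 0 days -> 3 points
    "nausea": 0,             # 0 days -> 3 points
    "sleep_disturbance": 1,  # 1 day
    "otc_medication": 0,     # 0 days
}
print(gerdq_score(example))  # 10
```

A patient who reports no symptoms at all still scores 6, because absence of epigastric pain and nausea counts towards GERD. This is why the score reflects how GERD-like the symptom pattern is rather than total symptom burden.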

Specificity and sensitivity of GerdQ for diagnosing GERD. In Figure 1, the ROC curves describing the sensitivity and specificity of the GerdQ and RDQ questionnaires for diagnosing GERD are compared with the corresponding symptom-based diagnoses made by the family practitioners and gastroenterologists who took part in the DIAMOND study. The diagonal line represents classification by chance, and the GerdQ point (sum score) closest to the gastroenterologists’ diagnosis corresponds to a cut-off of 8 (patients with a score of 8 or more have a high likelihood of having GERD, and those with a score below 8 have a low or no likelihood). The cut-offs 6, 7 and 8 have the highest chance-corrected efficiency (efficiency being the probability of correctly classifying patients). Of these cut-offs, 8 gives the highest specificity (71.4%), with a sensitivity of 64.6%, and it is therefore proposed as the cut-off when testing for GERD. GerdQ thus reaches a diagnostic accuracy similar to that of the gastroenterologist.
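The quantities used to compare cut-offs can be computed from a 2×2 classification table. The sketch below treats a score at or above the cut-off as GERD-positive and compares it with a reference diagnosis. The text does not say which chance correction was used, so Cohen's kappa is included purely as an assumption. The example data are hypothetical and do not reproduce the DIAMOND results.

```python
def cutoff_metrics(scores, has_gerd, cutoff=8):
    """Classify score >= cutoff as GERD-positive and compare with a reference.

    Returns sensitivity, specificity, efficiency (proportion correctly
    classified) and, as an ASSUMED chance correction, Cohen's kappa.
    """
    if len(scores) != len(has_gerd):
        raise ValueError("scores and has_gerd must have the same length")
    pairs = list(zip(scores, has_gerd))
    tp = sum(1 for s, g in pairs if s >= cutoff and g)
    fn = sum(1 for s, g in pairs if s < cutoff and g)
    fp = sum(1 for s, g in pairs if s >= cutoff and not g)
    tn = sum(1 for s, g in pairs if s < cutoff and not g)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without GERD")
    n = len(pairs)
    efficiency = (tp + tn) / n
    # Agreement expected by chance from the marginal proportions.
    p_test_pos = (tp + fp) / n
    p_ref_pos = (tp + fn) / n
    chance = p_test_pos * p_ref_pos + (1 - p_test_pos) * (1 - p_ref_pos)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "efficiency": efficiency,
        "kappa": (efficiency - chance) / (1 - chance),
    }


# Hypothetical toy data, not study results.
toy_scores = [10, 9, 5, 3, 8, 2]
toy_gerd = [True, True, True, False, False, False]
for c in (6, 7, 8):
    print(c, cutoff_metrics(toy_scores, toy_gerd, cutoff=c))
```

Raising the cut-off can only lower or keep sensitivity and raise or keep specificity. Choosing among several cut-offs with similar efficiency is therefore a trade-off between the two.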

In Figure 2, the GerdQ score is plotted against the percentage of patients with GERD, reflux oesophagitis and abnormal oesophageal pH, diagnosed according to the DIAMOND study criteria. All three show a similar direct relationship with increasing GerdQ scores. Among patients with a GerdQ sum score of 8 or more, approximately 80% had GERD, and among those with a sum score of 3–7, 50% had GERD. None of the patients with a score of 0–2 had GERD. The patients in whom GERD was diagnosed solely on the basis of a positive SAP (n = 15) were distributed across all the groups, so their numbers within each group were too small to be analysed separately. There was also a good direct correlation between heartburn severity at baseline and the frequency-based GerdQ score (Figure 3).
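The score bands used in Figure 2 (0–2, 3–7 and 8–18) can be assigned with a simple helper, and the proportion of reference-diagnosed GERD can then be tabulated per band. The helper names and the example data below are hypothetical.

```python
def gerdq_band(score: int) -> str:
    """Assign a GerdQ sum score (0-18) to the bands discussed for Figure 2."""
    if not 0 <= score <= 18:
        raise ValueError("GerdQ sum score must be between 0 and 18")
    if score <= 2:
        return "0-2"
    if score <= 7:
        return "3-7"
    return "8-18"


def gerd_proportion_by_band(scores, has_gerd):
    """Proportion of patients with reference-diagnosed GERD in each band."""
    counts = {}  # band -> (patients, GERD cases)
    for score, gerd in zip(scores, has_gerd):
        band = gerdq_band(score)
        total, cases = counts.get(band, (0, 0))
        counts[band] = (total + 1, cases + int(gerd))
    return {band: cases / total for band, (total, cases) in counts.items()}


# Hypothetical toy data, not study results.
print(gerd_proportion_by_band([1, 4, 5, 9, 12], [False, True, False, True, True]))
```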

Fig. 2. Proportions (%) of patients with reflux oesophagitis, abnormal oesophageal pH or GERD in each cut-off range for GerdQ

Fig. 3. Distribution of patients by GerdQ score and baseline heartburn severity (as assessed by the subject together with the investigator)

Measurement of treatment response with GerdQ. The DIAMOND study included no other measure of change, so the methods applied to assess responsiveness were distribution-based.

The GerdQ questionnaire was shown to be a sensitive tool for measuring response to treatment, with an accuracy similar to that previously shown for the RDQ. The mean of selected items may be used as a measure of treatment response. Responsiveness is frequently evaluated with the effect size (mean change divided by the standard deviation at baseline) and the standardized response mean (mean change divided by the standard deviation of the change). On these measures, GerdQ may be compared with the RDQ GERD dimension, as this dimension of the RDQ has the largest effect size and standardized response mean.3 Table 2 shows data for selected combinations of items. GerdQ, using the four positive predictors, is thus also suitable for measuring treatment response in GERD over time. In clinical practice, the aim of treatment is treatment success. This may be defined as at most 1 day with ‘A burning feeling behind your breastbone’ and/or at most 1 day with ‘Stomach contents (liquid or food) moving upwards to your throat or mouth’. Possible combinations could include no sleep disturbance and no intake of additional medication (i.e. both scores are zero), or 1 day with sleep disturbance and/or intake of additional medication (i.e. neither score exceeds 1).
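The two responsiveness statistics and a treatment-success check can be sketched as follows. Several points are assumptions. Change is taken as baseline minus follow-up, so that improvement gives a positive value. Sample standard deviations are used. The "and/or" wording of the success definition is read as requiring all four positive-predictor items to be at most grade 1 (no more than 1 day in the week). Function names and example data are hypothetical.

```python
from statistics import mean, stdev


def _changes(baseline, follow_up):
    # ASSUMED sign convention: baseline - follow-up, so improvement is positive.
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the (sample) SD of the baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the (sample) SD of the individual changes."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(changes)


def treatment_success(heartburn, regurgitation, sleep_disturbance, otc_medication):
    """ASSUMED reading of the definition: each item is at most grade 1."""
    grades = (heartburn, regurgitation, sleep_disturbance, otc_medication)
    if any(not 0 <= g <= 3 for g in grades):
        raise ValueError("frequency grades must be 0-3")
    return all(g <= 1 for g in grades)


# Hypothetical paired scores for three patients.
before, after = [4, 6, 8], [2, 3, 4]
print(effect_size(before, after))                 # 1.5
print(standardized_response_mean(before, after))  # 3.0
print(treatment_success(1, 0, 0, 1))              # True
```

The two statistics share a numerator and differ only in the denominator. The standardized response mean is larger when patients change by similar amounts, even if baseline scores vary widely.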

An additional feature of GerdQ is its ability to identify patients in whom GERD has a greater impact on their lives. This feature should assist in the choice of treatment where more effective treatment is needed. Patients with a sum score of 3 or more (out of a possible 6) on the sleep disturbance and OTC medication items were those most likely to be affected by their disease, and they showed correspondingly higher GerdQ scores (Figure 4). A similar direct correlation was seen between heartburn severity scores and the GerdQ sum score (Figure 3). For example, identifying the 40% most affected GERD patients in clinical practice corresponds to a sum score of 3 or more on these impact items, and a sum score of 4 or more would identify the 20% most affected GERD patients.
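The impact check combines only two of the six items, the sleep disturbance and additional (OTC) medication items, each graded 0–3. The helper names below are hypothetical. The default threshold of 3 corresponds to the "40% most affected" figure reported above, and a threshold of 4 corresponds to the "20% most affected" figure.

```python
def impact_score(sleep_grade: int, otc_grade: int) -> int:
    """Sum of the sleep-disturbance and OTC-medication item scores (0-6)."""
    for grade in (sleep_grade, otc_grade):
        if not 0 <= grade <= 3:
            raise ValueError("frequency grades must be 0-3")
    return sleep_grade + otc_grade


def is_high_impact(sleep_grade: int, otc_grade: int, threshold: int = 3) -> bool:
    """Flag patients whose impact score reaches the chosen threshold."""
    return impact_score(sleep_grade, otc_grade) >= threshold


# Sleep disturbed on 2-3 days (grade 2) and extra medication on 1 day (grade 1).
print(is_high_impact(2, 1))               # True
print(is_high_impact(2, 1, threshold=4))  # False
```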

Fig. 4. Distribution of patients by GerdQ score and the sum of Sleep and OTC scores (shaded 0- >3)

Discussion

Current US and UK guidelines recommend that patients with symptoms of reflux disease be treated without expensive diagnostic tests.16–19 A new patient-centred consensus definition of GERD, known as the ‘Montreal definition’, has recently been developed.12 It recommends that the clinical diagnosis of GERD be based on the presence of troublesome symptoms of GERD (heartburn and regurgitation); initial endoscopy is generally unnecessary when alarm symptoms are absent. An accurate diagnosis of GERD is important because, when untreated or partially treated, the condition has a significant impact on patients’ health-related quality of life20, 21 and may contribute to excess healthcare expenditure through unnecessary clinical investigation and inappropriate therapy.

Primary care physicians face challenges both in making an accurate diagnosis of GERD and in its management. The use and interpretation of upper gastrointestinal endoscopy by family physicians vary widely,16 and the initial and maintenance use of acid-suppressing therapy frequently does not follow evidence-based guidance.18 In addition, the presentation of GERD and its impact may vary considerably across patients, adding to the challenge of managing the condition. In some patients, for example, the most typical reflux symptoms may not be particularly frequent or troublesome, whilst in others they may cause major disruption to daily activities and sleep. Patients with more persistent and disruptive symptoms may need more aggressive treatment. A structured approach to the assessment of GERD in family practice is therefore desirable, particularly if third-party payers are to control spending in this large therapeutic area.

There are a number of instruments for use in the GERD population, but none of them sufficiently helps physicians to address these challenges of GERD diagnosis and management. For example, the recently developed GERD Impact Scale (GIS)5 is useful after patients have been diagnosed with GERD, to facilitate patient-physician dialogue, but it is not a diagnostic tool in itself.5 Other instruments, such as the PAGI-S22 and ReQuest,23 do not meet these needs either. A Chinese-language GERDQ was developed for use in epidemiological and interventional studies, but has not been used in a clinical setting.24 There is a need for a simple, accurate and well-documented questionnaire that will help physicians to make or refute the diagnosis of GERD, devise a suitable therapeutic strategy and monitor treatment outcomes over time.

There is a tendency for physicians to underestimate the severity of GERD, because it is a common condition without immediate consequences in terms of mortality or serious morbidity. Patients, on the other hand, sometimes blame themselves for their symptoms and may be reluctant to trouble their physician, even when their symptoms disrupt their lifestyle.25 There is also evidence of a mismatch between physician and patient assessments of symptom severity and of the response to treatment.26 It may be easier to elicit accurate information from patients about their symptoms by asking them to complete a short questionnaire. This may also encourage improved and systematic assessment of treatment outcomes over time.

The GerdQ has been developed as a tool to support the diagnosis of GERD and to assist in the selection of suitable treatment based on response measurement. It has been developed on the basis of evidence and information collected from recent high-quality clinical studies,2,10 as well as from qualitative patient interviews with regard to preferences for easy completion of questionnaires. Interviews revealed that patients prefer to complete a questionnaire in which the items do not have too many response options (four preferred to six) and also where frequency of symptoms is clearly defined in terms of number of days rather than vaguely as ‘sometimes, often’ etc. This is reflected in our choice of duration intervals for the frequency scores. The work leading up to the development of the RDQ3 and the Montreal definition12 suggested that symptoms for 2 days or more each week was a meaningful cut-off for patients. Additionally, there were very small numbers of GERD patients for some of the frequencies, so that clustering them together provided more meaningful numbers for analysis. A recent population-based study of two communities in northern Sweden has shown that an increasing frequency of even mild reflux symptoms has an increasingly negative impact on patient well-being.27 Weekly GERD symptoms were associated with lower scores on five dimensions of the SF-36 instrument. Identifying and treating relatively minor changes in symptom status can often lead to significant improvements in patients’ health-related quality of life.

Conclusions

The results indicate that the patient-centred GerdQ questionnaire has three potential uses in clinical practice:

(i) GerdQ can be used to diagnose GERD with an accuracy similar to that of the gastroenterologist.
(ii) GerdQ can be used to assess the relative impact of the disease on patients’ lives and to assist in choice of treatment.
(iii) GerdQ can be used to measure response to treatment over time.

These attributes make GerdQ a potentially useful tool for family practitioners in the diagnosis and management of GERD, without the need for specialist referral or endoscopy.

Acknowledgements

Declaration of personal interests: Roger Jones, John Dent and Nimish Vakil have served as speakers, consultants and advisory board members for AstraZeneca and have received research funding from AstraZeneca. O. Junghard, B. Wernersson and T. Lind are currently employees of AstraZeneca and K. Halling is a former employee of AstraZeneca. Declaration of funding interests: This study was funded by AstraZeneca R&D, Mölndal. The writing and preparation of this paper were funded in part by AstraZeneca R&D, Mölndal, Sweden. Writing support was provided by Dr Madeline Frame of AstraZeneca R&D and funded by AstraZeneca.