Brennan A, Akehurst R. Modelling in health economic evaluation. What is its place? What is its value? PharmacoEconomics. 2000 May;17(5):445-59.

Taylor RS, Iglesias CP. Assessing the clinical and cost-effectiveness of medical devices and drugs: are they that different? Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2009 Jun;12(4):404-6.

Drummond M, Griffin A, Tarricone R. Economic evaluation for devices and drugs--same or different? Value in health : the journal of the International Society for Pharmacoeconomics and Outcomes Research. 2009 Jun;12(4):402-4.

Perkins MR, Devlin NJ, Hansen P. The validity and reliability of EQ-5D health state valuations in a survey of Maori. Quality of life research : an international journal of quality of life aspects of treatment, care and rehabilitation. 2004 Feb;13(1):271-4.

Murray C. Rethinking DALYs. In: The global burden of disease: a comprehensive assessment of mortality and disability from diseases, injuries, and risk factors in 1990 and projected to 2020. Harvard School of Public Health, on behalf of the World Health Organisation and the World Bank; 1996.

Stinnett AA, Mullahy J. Net health benefits: a new framework for the analysis of uncertainty in cost-effectiveness analysis. Medical decision making : an international journal of the Society for Medical Decision Making. 1998 Apr-Jun;18(2 Suppl):S68-80.

[2] Refer to Table 12: Reporting of Cost-Utility Analysis Results in Chapter 11 for further details on information to include in a CUA report when describing the disease, patient population and treatment options.

[3] Meta-analysis systematically combines the results of studies in order to draw overall conclusions about the efficacy and/or safety of the treatment.

[5] The p value is the probability of observing an effect at least as large as the one seen if there were truly no effect (ie through sampling error alone); it therefore indicates the strength of the evidence for an association. This section uses p values to notionally define statistical significance; however, confidence intervals may better summarise the strength and precision of the effect estimate.

[6] Effect sizes with p values close to but not reaching statistical significance arise in one of two circumstances: (1) the effect is strong but the confidence interval is wide, because numbers of events, etc, are small; or (2) the effect is weaker but the confidence interval is narrower. In either case, a p value just above 0.05 means that the 95% confidence interval only just includes the null value of 1.0 (ie the data remain compatible, but only marginally, with there being no effect). When deciding whether to still include such clinical events: (1) a strong effect will take precedence over a weaker effect; (2) a strong effect with wide confidence limits is likely to be clinically important, the analysis being limited by insufficient power (where ‘absence of evidence is not evidence of absence’) (18). Conversely, a weak effect with narrower confidence limits is unlikely to be clinically important (ie greater confidence but a negligible effect on outcomes).
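
The two circumstances described in footnote 6 can be illustrated numerically. The following is a minimal sketch, assuming a relative risk analysed on the log scale with a normal approximation; the function name `ci_and_p` and both sets of input values are hypothetical and are not drawn from any trial.

```python
import math

def ci_and_p(relative_risk, se_log_rr, z_crit=1.96):
    """Return (lower, upper, p) for a relative risk.

    Assumes a normal approximation on the log scale (an assumption of
    this sketch, not a method prescribed by the text).
    """
    log_rr = math.log(relative_risk)
    lower = math.exp(log_rr - z_crit * se_log_rr)
    upper = math.exp(log_rr + z_crit * se_log_rr)
    z = log_rr / se_log_rr
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return lower, upper, p

# Hypothetical inputs:
# (1) strong effect, large standard error (few events)
strong_wide = ci_and_p(0.50, 0.36)
# (2) weak effect, small standard error (many events)
weak_narrow = ci_and_p(0.90, 0.0547)

for label, (lo, hi, p) in [("strong/wide", strong_wide),
                           ("weak/narrow", weak_narrow)]:
    print(f"{label}: 95% CI {lo:.2f} to {hi:.2f}, p = {p:.3f}")
```

Under these assumed inputs both p values sit just above 0.05 and both upper confidence limits only just exceed 1.0, but the first interval extends to a much larger possible benefit than the second.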

[7] To help determine whether events are clinically significant, outcomes should be examined to determine whether their association with treatment is likely to be causal. Key criteria for determining causal associations include (19): temporality (ie the cause must precede the effect); strength of association; consistency between different populations and different study designs; and a dose-response relationship (ie increased exposure is associated with an increased biological effect).

[8] For composite endpoints to be valid, clinical trials should report the results for each individual endpoint that makes up the composite measure (20). The number of individual endpoints should be kept small, preferably no more than three or four (21). Component non-fatal endpoints should be measured appropriately, with the use of a blinded endpoints committee, a core laboratory, or both (21), and analysis of non-fatal events should take into account competing risks. For information on the assessment of composite outcomes, please refer to the PBAC Guidelines for preparing a major submission (22).

[9] Due to the differences in regulatory approval processes, this section applies mainly to medical devices.

[10] Patient subgroups may have different responses to treatment or magnitudes of benefit. These subgroups may be defined by age, gender, other demographic factors, disease-related factors (symptom complexes, severities), comorbidities, or intractability and factors affecting treatment effectiveness. The degree of breakdown depends upon the complexity of the targeting decisions to be made. Some situations will require many subgroups, others just the overall group.

[11] Relevant statistical tests of interaction include the chi-square test using the Q statistic in an individual trial or the Cochran Q statistic across the pooled result, and the I² statistic with its 95% uncertainty interval.
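
As an illustration of the statistics named in footnote 11, the following is a minimal sketch of Cochran's Q and the I² statistic, assuming inverse-variance weights and effect estimates on an additive scale (for example, log relative risks). The function name `cochran_q_and_i2` and the input values are hypothetical, and the 95% uncertainty interval for I² is not computed here.

```python
def cochran_q_and_i2(estimates, standard_errors):
    """Cochran's Q, its degrees of freedom, and I-squared (as a percentage).

    estimates: effect estimates on an additive scale, one per subgroup
               or study (e.g. log relative risks).
    standard_errors: the corresponding standard errors.
    """
    # Inverse-variance weights.
    weights = [1.0 / se ** 2 for se in standard_errors]
    # Weighted (fixed-effect) pooled estimate.
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (est - pooled) ** 2 for w, est in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared: share of variation beyond that expected by chance.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

# Hypothetical log relative risks in two subgroups.
q, df, i2 = cochran_q_and_i2([-0.5, -0.1], [0.2, 0.2])
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.0f}%")
```

The p value for the test of interaction would then be obtained by referring Q to a chi-square distribution with `df` degrees of freedom.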

[12] Statistical tests of interaction are preferred to individual tests within each subgroup, because individual tests often overestimate the extent of true differences (32).

[13] Subgroup treatment effects found in a trial with no overall treatment effect are usually regarded as superfluous subgroup salvages of otherwise indeterminate (negative) trials (33).

[14] DALYs are expressed in terms of years of life lost due to premature death and years lived with a disability of specific severity and duration.
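
The two components in footnote 14 can be sketched as a simple sum. This is a minimal illustration only, assuming the basic form DALY = YLL + YLD and omitting refinements such as age weighting and time discounting that some DALY calculations apply; the function name `dalys` and all input values are hypothetical.

```python
def dalys(deaths, life_expectancy_at_death,
          cases, disability_weight, duration_years):
    """Return (YLL, YLD, DALYs) in the simplest undiscounted form.

    YLL: years of life lost due to premature death.
    YLD: years lived with disability, scaled by a severity weight
         between 0 (full health) and 1 (equivalent to death).
    """
    yll = deaths * life_expectancy_at_death
    yld = cases * disability_weight * duration_years
    return yll, yld, yll + yld

# Hypothetical inputs: 10 deaths each losing 20 years of life, and
# 100 cases living 5 years with a disability weight of 0.2.
yll, yld, total = dalys(10, 20, 100, 0.2, 5)
print(f"YLL = {yll:.0f}, YLD = {yld:.0f}, DALYs = {total:.0f}")
```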

[16] This included negative values for health states considered to be worse than death (47). Survey results indicated that respondents can and do evaluate some health states as worse than death, and the study authors recommended the systematic inclusion of these states to describe a more complete range of preference values (48).

[17] Logical inconsistency was defined as “when a state that ‘in logical terms’ is unambiguously less severe than another is assigned a lower value” (46).
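
The definition in footnote 17 can be expressed as a check over pairs of valued states. The following is a minimal sketch, assuming each health state is coded as a tuple of dimension levels in which a higher level means a more severe problem (as in an EQ-5D-style profile); the function names and the example valuations are hypothetical.

```python
def less_severe(a, b):
    """True if state a is unambiguously less severe than state b:
    no dimension is worse and at least one dimension is better."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def logical_inconsistencies(valuations):
    """Return (milder, worse) state pairs where the milder state
    was assigned the lower value.

    valuations: dict mapping a state tuple to its assigned value.
    """
    flagged = []
    for a in valuations:
        for b in valuations:
            if less_severe(a, b) and valuations[a] < valuations[b]:
                flagged.append((a, b))
    return flagged

# Hypothetical responses from one respondent.
responses = {
    (1, 1, 1, 1, 1): 1.00,
    (1, 1, 2, 1, 1): 0.60,
    (1, 2, 2, 1, 1): 0.75,  # more severe state, yet valued higher
}
flagged = logical_inconsistencies(responses)
print(flagged)
```

Only pairs in which one state dominates the other on every dimension are compared, so states that differ in mixed directions are never flagged.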