This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Smoking is a known lung cancer cause, but no detailed quantitative systematic review
exists. We summarize evidence for various indices.

Methods

Papers published before 2000 describing epidemiological studies involving 100+ lung
cancer cases were obtained from Medline and other sources. Studies were classified
as principal, or subsidiary where cases overlapped with principal studies. Data were
extracted on design, exposures, histological types and confounder adjustment. RRs/ORs
and 95% CIs were extracted for ever, current and ex smoking of cigarettes, pipes and
cigars and indices of cigarette type and dose–response. Meta-analyses and meta-regressions
investigated how relationships varied by study and RR characteristics, mainly for
outcomes exactly or closely equivalent to all lung cancer, squamous cell carcinoma
(“squamous”) and adenocarcinoma (“adeno”).

Results

287 studies (20 subsidiary) were identified. Although RR estimates were markedly heterogeneous,
the meta-analyses demonstrated a relationship of smoking with lung cancer risk, clearly
seen for ever smoking (random-effects RR 5.50, CI 5.07-5.96), current smoking (8.43,
7.63-9.31), ex smoking (4.30, 3.93-4.71) and pipe/cigar only smoking (2.92, 2.38-3.57).
It was stronger for squamous (current smoking RR 16.91, 13.14-21.76) than adeno (4.21,
3.32-5.34), and evident in both sexes (RRs somewhat higher in males), all continents
(RRs highest for North America and lowest for Asia, particularly China), and both
study types (RRs higher for prospective studies). Relationships were somewhat stronger
in later starting and larger studies. RR estimates were similar in cigarette only
and mixed smokers, and similar in smokers of pipes/cigars only, pipes only and cigars
only. Exceptionally, no increase in adeno risk was seen for pipe/cigar only smokers
(0.93, 0.62-1.40). RRs were unrelated to mentholation, and higher for non-filter and
handrolled cigarettes. RRs increased with amount smoked, duration, earlier starting
age, tar level and fraction smoked and decreased with time quit. Relationships were
strongest for small and squamous cell, intermediate for large cell and weakest for
adenocarcinoma. Covariate adjustment had little effect on RR estimates.

Conclusions

The association of lung cancer with smoking is strong, evident for all lung cancer
types, dose-related and insensitive to covariate-adjustment. This emphasises the causal
nature of the relationship. Our results quantify the relationships more precisely
than previously.

Background

It has been known for many years that smoking causes lung cancer. An association was
clearly documented in case–control studies conducted in Germany in the 1930s
[1], and in the United States and Great Britain
[2,3] in the 1950s, and was strengthened by surveys of large cohorts. This led the US Surgeon
General to conclude in 1964
[4] that “cigarette smoking is a cause of lung cancer in men, and a suspected cause of
lung cancer in women”. Further reports
[5,6] have defined the relationship in more detail, and it has been estimated that, in
the United States, 90% of male lung cancer deaths and 75%-80% of female lung cancer
deaths are caused by smoking
[7].

While some meta-analyses of the evidence have been published in recent years
[8-10], none consider more than a relatively small fraction of the published evidence. We
attempt to rectify this omission, though the sheer extent of the available data, and the
resources available, have meant limiting attention to papers published up to and including
1999 and to studies involving 100 or more lung cancer cases. As will be seen, this still
gives us an extensive database involving almost 300 studies.

Because the relationship of smoking to the two major types of lung cancer (squamous
cell carcinoma and adenocarcinoma) is known to vary
[5,6], we present detailed results relating not only to total lung cancer risk but also
to these two histological types of lung cancer. We also present some more limited
results for other lung cancer types. To provide a broad description of the relationship
of smoking to lung cancer, we do not concentrate on a single primary analysis, but
quantify the relationships to each of a range of indices of smoking, investigating
how these relationships vary according to characteristics such as sex, age, location,
study design, period considered, definition of exposure and extent of confounder adjustment.
The style of this systematic review is similar to one we have recently published for
smoking and COPD, chronic bronchitis and emphysema
[11].

Methods

Full details of the methods used are described in Additional file
1: Methods, and are summarized below. Throughout this paper, we use the term relative
risk (RR) to include its various estimators, including the odds ratio and the hazard
ratio.

Inclusion and exclusion criteria

Attention was restricted to epidemiological prospective or case–control studies published
up to and including 1999, which involved 100 lung cancers or more, and which provided
RR estimates for one or more defined major, cigarette-type or dose-related smoking
indices. The “major indices” compare ever, current or ex smoking with never or non-current
smoking, and refer to smoking of any product, cigarettes, pipes, cigars and combinations,
or of specific types of cigarette. The “cigarette type indices” compare smokers of
different types of cigarette – filter with plain, manufactured with handrolled and
mentholated with non-mentholated. The “dose-related indices” concern amount smoked,
age of starting to smoke, duration of smoking, duration of quitting, tar level, butt
length or fraction smoked. Pack-years was not considered as it was felt more important
to separate effects of extent and duration of exposure. Uncontrolled case studies
were not included. There were no further exclusion criteria.

Literature searching

Between 1997 and 2001 potentially relevant papers were sought from Medline and Emtree
searches, from British Library monthly bulletins, from files on smoking and health
accumulated over many years by P N Lee Statistics and Computing Ltd, and from references
cited in papers obtained, until ultimately no paper examined cited a paper of possible
relevance not previously examined.

Identification of studies

Relevant papers were allocated to studies, noting multiple papers on the same study,
and papers reporting on multiple studies. Each study was given a unique reference
code (REF) of up to 6 characters (e.g. COMSTO or LUBIN2), based on the principal author’s
name and distinguishing multiple studies by the same author.

Some studies were noted as having overlaps with other studies. To minimize problems
in meta-analysis arising from double-counting of cases, overlapping studies were divided
into two categories, as shown in Additional file
2: Studies. The first category involved minor overlap, which could not be disentangled,
and which it was decided to ignore. The second category contains sets of studies which
probably or definitely overlap. Here the set member containing the most comprehensive
data (e.g. largest number of cases or longest follow-up) was called the ‘principal
study’, other members being ‘subsidiary studies’ only considered in meta-analyses
where the required RR was unavailable from the principal study.

Data recorded

Relevant information was entered onto a study database and two linked RR databases.
Data entry was carried out in two stages. In 1997–2002, data were entered on the first
RR database for the major smoking indices, cigarette type indices, and amount smoked.
In 2009–2010, data were entered on the second RR database for the remaining dose-related
indices.

The study database contains a record for each study, describing the following aspects:
relevant publications; study title; study design; sexes considered; age range, race(s)
and other details of the population studied; location; timing and length of follow-up;
whether principal or subsidiary, with details of overlaps or links with other studies;
number of cases and extent of histological confirmation; number of controls or subjects
at risk; types of controls and matching factors used in case–control studies; use
of proxy respondents, interview setting and response rates; confounding variables
considered; availability of results by histological types; and availability of results
for all smoking indices (including those indices not considered here, such as pack-years).

The RR databases hold the detailed results, typically containing multiple records
for each study. Each record is linked to the relevant study and refers to a specific
RR, recording the comparison made and the results. This record includes the sex, age
range, race, lung cancer type, and (for prospective studies) the follow-up period.
The smoking exposure of the numerator of the RR is defined by the smoking status (ever,
current or ex), smoking product (e.g. any, cigarettes, cigarettes only, pipes only)
and cigarette type (e.g. any, mainly hand-rolled cigarettes, filter cigarettes only,
mentholated cigarettes). Similar information is recorded about the denominator of
the RR. For dose-related indices, the level of exposure is recorded. The source of
the RR is also recorded, as are details on adjustment variables. Results recorded
include numbers of cases for the numerator and denominator, and, for unadjusted results,
numbers of controls, persons at risk or person-years at risk. The RR itself and its
lower and upper 95% confidence limits (LCL and UCL) are always recorded. These may
be as reported, or derived by various means (see below), with the method of derivation
noted.

Identifying which RRs to enter

Lung cancer type

Results were entered for all lung cancer, for Kreyberg I (as originally presented,
or by combining squamous, small and large cell carcinoma) and Kreyberg II (as originally
presented, or by combining adenocarcinoma and others not in Kreyberg I), and for squamous,
small, and large cell carcinoma and for adenocarcinoma separately. Additionally, the
following groups were constructed if not originally presented: all lung cancer or
nearest equivalent, but at least squamous cell carcinoma and adenocarcinoma; squamous
cell carcinoma or nearest equivalent; adenocarcinoma or nearest equivalent.

Major and cigarette type smoking indices

The intention was to enter RRs comparing current smokers, ever smokers or ex smokers
with never or non smokers. Near-equivalent definitions were accepted when stricter
definitions were unavailable, so that, for example, never smokers could include occasional
smokers (or exceptionally, light smokers), while current smokers could include, and
ex-smokers exclude, recent quitters. RRs were to be entered relating to smoking of
defined products and, when the product related to cigarette smoking, to defined cigarette
types (see also Additional file
1: Methods). If available, results (for each of current, ex and ever smoking) were
entered for five comparisons: any product vs. never any product, cigarettes vs. never
any product, cigarettes only vs. never any product, cigarettes vs. never cigarettes,
and cigarettes only vs. never cigarettes (and also for five equivalent comparisons
for current vs non smoking). Here “cigarettes” ignores whether other products (i.e.
pipes and cigars) are also smoked, while “cigarettes only” excludes mixed smokers.
Additionally, when the numerator related to the smoking of filter, handrolled or mentholated
cigarettes, RRs were entered with the denominator defined as relating to plain, manufactured
or non-mentholated smokers respectively.

Dose-related smoking indices

RRs were entered for seven measures: amount smoked, age of starting, duration of smoking,
duration of quitting, tar level, butt length and fraction smoked. RRs were expressed
relative to never smokers (or near equivalent), if available, or relative to non smokers
otherwise. For duration of quitting, RRs were also expressed relative to current smokers.
Except for amount smoked, further RRs were entered, restricted to smokers, and expressed
relative to the level expected to have the lowest risk (e.g. shortest duration or
latest age started).

Confounders adjusted for

For case–control studies, results were entered adjusted for the greatest number of
potential confounding variables for which results were available, and also unadjusted
(or adjusted for the smallest number of confounders). For prospective studies, results
were entered adjusted for age and the greatest number of confounders, and for age
only or age and the smallest number of confounders, with unadjusted results entered
only if no age-adjusted results were available. These alternative RRs are subsequently
referred to as “most-adjusted” and “least-adjusted”. For dose-related RRs restricted
to smokers, results with “most adjustment” but without adjustment for other aspects
of smoking were also entered if available.

Strata

Three strata were considered – sex, age and race. Results were entered for males and
females separately when available, with combined sex results only entered when sex-specific
results were not available. Results were entered for all ages combined and for individual
age groups, and for all races and for individual racial groups.

Derivation of RRs

Adjusted RRs and their 95% CIs were entered as provided, when available. Unadjusted
RRs and CIs were calculated from their 2 × 2 table, using standard methods (e.g.
[12]), noting any discrepancies between calculated values and those provided by the author.
Sometimes the 2 × 2 table was constructed by summing over groups (e.g. adding current
and ex smokers to obtain ever smokers) or from a percentage distribution. Various
other methods were used as required to provide estimates of the RR and CI. Some more
commonly used methods are summarized below, fuller details being given in Additional
file
1: Methods.

Correction for zero cell

If the 2 × 2 table has a zero cell, 0.5 was added to each cell, and the standard formulae
applied.
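As an illustration of the calculation described above, the following is a minimal sketch of an odds ratio and 95% CI from a 2 × 2 table with the 0.5 zero-cell correction. It assumes the "standard formulae" are the usual log-scale (Woolf-type) variance estimate, which the text does not specify; the function name and the cell layout are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI from a 2 x 2 table.

    Assumed layout (illustrative only):
      a = exposed cases,    b = unexposed cases,
      c = exposed controls, d = unexposed controls.
    """
    # Zero-cell correction: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Assumed variance formula: sum of reciprocals of the cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lcl = math.exp(math.log(odds_ratio) - z * se_log)
    ucl = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lcl, ucl
```

For example, odds_ratio_ci(10, 0, 5, 5) applies the correction before computing the ratio, whereas odds_ratio_ci(20, 10, 10, 20) uses the counts unchanged.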

Combining independent RRs

Combining non-independent RRs

The Hamling et al. method
[14] was used (e.g. to derive an adjusted RR for ever smokers from available adjusted
RRs for current and ex smokers, each relative to never smokers, or to combine adjusted
RRs for several histological types, each relative to a single control group).

Estimating CI from crude numbers

If an adjusted RR lacked a CI or p-value but the corresponding 2 × 2 table was available,
the CI was estimated assuming that the ratio UCL/LCL was the same as for the equivalent
unadjusted RR.
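The rule above can be sketched as follows. This is only an illustration: it assumes the estimated CI is symmetric about the adjusted RR on the log scale, so that each limit lies a factor of sqrt(UCL/LCL) from the estimate. The text states only that the ratio UCL/LCL is carried over; the function name is hypothetical.

```python
import math

def ci_from_crude(adjusted_rr, crude_lcl, crude_ucl):
    """Estimate a CI for an adjusted RR from the unadjusted RR's CI.

    Keeps UCL/LCL equal to that of the unadjusted estimate and
    (assumption) centres the interval on the adjusted RR on the log scale.
    """
    ratio = crude_ucl / crude_lcl   # UCL/LCL of the unadjusted RR
    half_width = math.sqrt(ratio)   # multiplicative half-width
    return adjusted_rr / half_width, adjusted_rr * half_width
```

By construction the returned limits have the same UCL/LCL ratio as the unadjusted interval.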

Data entry and checking

Master copies of all the papers in the study file were read closely, with relevant
information highlighted to facilitate checking. Where multiple papers were available
for a study, a principal publication was identified, although details described only
in other publications were also recorded. Preliminary calculations and data entry
were carried out by one author and checked by another, and automated checks of completeness
and consistency were also conducted. RR/CIs underwent validation checks
[15].

Meta-analyses conducted – overview

A pre-planned series of meta-analyses was conducted for various smoking indices for
each of the three main outcomes (all lung cancer, squamous cell carcinoma, and adenocarcinoma)
and also for some indices for two other outcomes (large cell carcinoma and small cell
carcinoma). Nearest equivalent definitions are allowed for the three main outcomes,
with the terms “squamous” and “adeno” used subsequently to distinguish these results
from those specifically for these cell types. Each meta-analysis was repeated, based
on most-adjusted RRs and on least-adjusted RRs. For each meta-analysis conducted,
combined estimates were made first for all the RRs selected, then for RRs subdivided
by level of various characteristics, testing for heterogeneity between levels.

Selecting RRs for the meta-analyses

All meta-analyses are restricted to records with available RR and CI values. The process
of selecting RRs for inclusion in a meta-analysis must try to include all relevant
data and to avoid double-counting. For a given analysis (e.g. of current cigarette
smoking), several definitions of RR may be acceptable (e.g. cigarette smoking, or
cigarette only smoking), so, for studies with multiple RRs, the one to be used is
determined by a preference order defined for the meta-analysis. Preference orders
may be required for smoking status, smoking product, the unexposed base, and extent
of confounder adjustment. As the definitions of RR available may differ by sex (e.g.
a study may provide RRs for any product smoking for males, but only for cigarette
smoking for females), the RRs chosen for each sex may not necessarily have the same
definition. Sexes combined results are only considered where sex-specific results
are not available. Similarly RRs from a subsidiary study are only used where eligible
RRs are unavailable from the principal study. When multiple preference orders are
involved, the sequence of implementation may affect the selection, so preferences
for the most important aspects, usually concerning smoking, are implemented first.

Carrying out the meta-analyses

Fixed-effect and random-effects meta-analyses were conducted using the method of Fleiss
and Gross
[13], with heterogeneity quantified by H, the ratio of the heterogeneity chi-squared to
its degrees of freedom, which is directly related to the statistic I² [16] by the formula I² = 100 (H − 1)/H. For all meta-analyses, Egger’s test of publication bias
[17] was also included.
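To make the quantities above concrete, here is a minimal sketch of inverse-variance fixed-effect pooling, a moment-based (DerSimonian-Laird style) random-effects estimate, and the H and I² statistics as defined in the text. It is an assumption that this matches the cited Fleiss and Gross method in every detail; the function name, the truncation of I² at zero and the recovery of standard errors from the 95% CI are illustrative choices. Egger's test is not included.

```python
import math

def meta_analyse(rrs, lcls, ucls, z=1.96):
    """Pool RRs on the log scale; return fixed and random estimates, H and I2."""
    y = [math.log(r) for r in rrs]
    # Standard error of log RR recovered from the width of the 95% CI.
    se = [(math.log(u) - math.log(l)) / (2 * z) for l, u in zip(lcls, ucls)]
    w = [1 / s ** 2 for s in se]                    # inverse-variance weights
    sw = sum(w)
    fixed_log = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Heterogeneity chi-squared (Q) and its degrees of freedom.
    q = sum(wi * (yi - fixed_log) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    h = q / df if df > 0 else float("nan")          # H as defined in the text
    i2 = 100 * (h - 1) / h if h > 1 else 0.0        # truncated at 0 (assumption)
    # Moment estimate of the between-study variance.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (s ** 2 + tau2) for s in se]
    random_log = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return {
        "fixed_rr": math.exp(fixed_log),
        "random_rr": math.exp(random_log),
        "H": h,
        "I2": i2,
    }
```

A call such as meta_analyse([1.0, 4.0], [0.5, 2.0], [2.0, 8.0]) pools two equally precise but discordant estimates, giving H above 1 and a correspondingly large I².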

Meta-analyses were conducted in various sets (A to N) corresponding to the sub-sections
of the results section of the paper. A full list of the analyses is given in Additional
file
1: Methods.

The major smoking indices

For the major smoking indices, the first four sets of meta-analyses relate to: A ever
smoking, B current smoking, C ever smoking (but with current smoking used if ever
smoking not available), referred to subsequently as “ever/current” smoking, and D
ex smoking. In what is referred to as the main analysis in each set, smoking of any
product is preferred by selecting RRs in the following preference order: 1. smoking
of any product vs. never smoked any product; 2. smoking of cigarettes vs. never smoked
any product, 3. smoking of cigarettes only vs. never smoked any product; 4. smoking
of cigarettes vs. never smoked cigarettes; 5. smoking of cigarettes only vs. never
smoked cigarettes; with options 6–10 the same as options 1–5 except that “never smoked”
is replaced by “never smoked near equivalent”. A variant analysis prefers cigarette
smoking (by changing the preference order to 4, 5, 2, 3, 1, 9, 10, 7, 8, 6). In meta-analyses
of type C, a further variant analysis reverses the preference so current smoking results
are preferred to those for ever smoking, referred to subsequently as “current/ever”
smoking. Other variant analyses are based on RRs for specified age ranges.
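The preference-order selection described above can be sketched as follows. The record structure (a dict with an "option" key numbered 1 to 10 as in the text) and the function name are hypothetical; the sketch covers only the choice of one RR per study and sex, not the further preferences for adjustment level or subsidiary studies.

```python
# Option numbers follow the text: 1-5 use "never smoked" denominators,
# 6-10 the "near equivalent" versions of the same comparisons.
MAIN_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
CIGARETTE_ORDER = [4, 5, 2, 3, 1, 9, 10, 7, 8, 6]   # variant analysis

def select_rr(records, order):
    """Return the record whose option comes first in the preference order.

    records: list of dicts, each with an 'option' key (hypothetical layout).
    Returns None when no record matches any option in the order.
    """
    rank = {option: position for position, option in enumerate(order)}
    eligible = [r for r in records if r["option"] in rank]
    return min(eligible, key=lambda r: rank[r["option"]], default=None)
```

With records for options 2 and 4 available, the main analysis would pick option 2 and the cigarette-preferring variant would pick option 4.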

A further set of meta-analyses, E, concerns smoking of pipes and/or cigars (but not
cigarettes), referred to subsequently as smoking of “pipes/cigars only”, smokers of
pipes only, smokers of cigars only, and smokers of cigarettes and pipes/cigars (“mixed”
smokers). Separate meta-analyses were conducted for ever smoking, current smoking,
ever/current smoking, current/ever smoking and ex smoking.

The cigarette type indices

Meta-analyses were conducted, in set F, for only filter vs. only plain, ever filter
vs. only plain, only filter vs. ever plain, handrolled vs. manufactured, and mentholated
vs. non-mentholated. These were only conducted for ever/current smoking, and preferring
RRs for cigarettes over RRs for cigarettes only. The analyses with only filter as
the numerator used the preference order of filter only, always, mainly, both, equally,
and ever, while the analyses with ever filter as the numerator used the reverse preference.
Similar preference orders applied to the denominators. The analyses of handrolled
vs. manufactured cigarettes used the preference order of any, both, mainly, and only
for handrolled, and only ever, only current, any and ever for manufactured.

The dose-related smoking indices

For the dose-related indices, sets of meta-analyses were conducted for: G amount smoked,
H age of starting to smoke, I duration of smoking, J duration of quitting compared
to never smokers (or long-term ex smokers), K duration of quitting compared to current
smokers (or short-term quitters), L tar level, and M butt length or fraction smoked
(taking short butt length as being equivalent to a large fraction smoked). For any
measure, a study typically provides a set of non-independent RRs for each dose-category,
expressed relative to a common base. To avoid double-counting, only one RR from each set was included
in any one meta-analysis. Two approaches were adopted. The first involves specifying
a scheme with a number of levels of exposure (“key values”), then carrying out meta-analyses
for each level in turn, expressed relative to never smokers. For an RR to be allocated
to a key value, its dose-category has to include that key-value and no other. Schemes
with a few, widely spaced, key values tend to involve more studies, whereas schemes
with more key values, closely spaced, involve RRs from fewer studies, but ones with
dose categories more closely clustered around the key value. The sets of key values
used (with 999 indicating an open-ended category) were 5, 20, 45 and 1, 10, 20, 30,
40, 999 for amount smoked; 26, 18, 14 and 30, 26, 22, 18, 14, 10 for age of starting
to smoke; 20, 35, 50 and 5, 20, 30, 40, 50, 999 for duration of smoking; 12, 7, 3
and 20, 12, 3 for duration of quitting vs. never; and 3, 7, 12 and 3, 12, 20 for duration
of quitting vs. current. No key value analysis was conducted for tar level, or for
butt length/fraction smoked. The second approach (not conducted for amount smoked)
involves meta-analysing RRs for the highest compared with the lowest categories
of exposure within smokers available for each study.
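The key-value allocation rule in the first approach can be sketched as below. It assumes each dose category is represented as a closed interval (low, high), with 999 standing for an open-ended upper bound as in the text; the function name and the interval representation are hypothetical.

```python
def allocate_key_value(low, high, key_values):
    """Return the single key value lying in [low, high], or None.

    A dose category is allocated only if it includes exactly one key
    value; categories spanning none or several are left unallocated.
    """
    hits = [k for k in key_values if low <= k <= high]
    return hits[0] if len(hits) == 1 else None
```

For amount smoked with key values 5, 20 and 45, a category of 15-24 cigarettes per day would be allocated to 20, whereas a category of 1-24 spans both 5 and 20 and so would not be used.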

Meta-regression analyses

While full multivariable analysis of the data is considered beyond the scope of this
report, meta-regression analyses were also carried out using the sets of RRs selected
for the main meta-analyses for ever smoking and for current smoking. Following preliminary
meta-regressions (not shown), a “fixed model” was fitted to examine the effect on
the results of six different categorical variables (sex, location, start year of study,
major study type, number of lung cancer cases and number of adjustment factors). Note
that the number of lung cancer cases (in the study as a whole), which is referred
to subsequently as “number of cases”, is used as an indicator of study size. The significance
of each of these variables was estimated by an F-test based on the increase in deviance
resulting from its exclusion from the basic model. A list of secondary variables was
also defined (relating to more detailed aspects of location, outcome, study type and
confounder adjustment, national cigarette tobacco type, the product smoked, the denominator
used in the RR, use of proxy respondents, whether the study required 100% histological
confirmation of lung cancer, whether the population studied worked in risky occupations,
the age of the subjects, and the derivation of the RR) with the significance of adding
each characteristic to the fixed model estimated by an F-test based on the increase
in deviance. Fuller details are given in Additional file
1: Methods.

Additional analyses

Additional tests of the relationship of lung cancer risk to various characteristics
of interest were based on corresponding pairs of RR and CI estimates within the same
study for the same definition of outcome and exposure, and deriving the ratio of the
two RRs. Where the pairs involved independent sets of subjects, the variance of the
ratio was also derived, and meta-analyses of the ratio were conducted. Where the pairs
involved non-independent sets of subjects the numbers of ratios greater and less than
1 were compared using the sign test. Tests of independent pairs related to sex (males
vs. females), age (oldest vs. youngest age group) and race (white people vs. non-white
or black people). Tests of non-independent pairs related to level of adjustment (most-adjusted
vs. least-adjusted), and to comparisons of product smoked (mixed smokers vs. cigarette
only smokers, and vs. smokers of pipes/cigars only). Tests were always carried out
for all lung cancer and ever/current smoking. For sex, additional analyses were conducted
for current and for ever smoking, for squamous and adeno, and also within level of
amount smoked. For level of adjustment, two sets of analyses were run. The first,
relating to RRs for ever/current smoking, was based on the most-adjusted/least-adjusted
ratio, while the second, for highest vs. lowest RRs for age of starting to smoke,
duration, years quit and tar level, compared RRs that were most- or least-adjusted
for other aspects of smoking.
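The two kinds of paired comparison described above can be sketched as follows. For independent pairs, the sketch assumes the variance of the log ratio is the sum of the two log-RR variances, each recovered from its 95% CI. For non-independent pairs, it assumes a two-sided exact binomial sign test; the exact variant used is not stated in the text. Function names are hypothetical.

```python
import math

def rr_ratio(rr1, lcl1, ucl1, rr2, lcl2, ucl2, z=1.96):
    """Ratio of two independent RRs with a 95% CI (e.g. males vs. females)."""
    se1 = (math.log(ucl1) - math.log(lcl1)) / (2 * z)
    se2 = (math.log(ucl2) - math.log(lcl2)) / (2 * z)
    ratio = rr1 / rr2
    # Independence assumption: variances of the log RRs add.
    se = math.sqrt(se1 ** 2 + se2 ** 2)
    return ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se)

def sign_test(n_above, n_below):
    """Two-sided exact sign test p-value for counts of ratios >1 and <1."""
    n = n_above + n_below
    k = min(n_above, n_below)
    # Probability of a split at least this extreme under p = 0.5, doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The ratios and variances from rr_ratio could then be pooled across studies in the same way as ordinary RRs, while sign_test needs only the counts of ratios above and below 1.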

Software

All data entry and most statistical analyses were carried out using ROELEE version
3.1 (available from P.N. Lee Statistics and Computing Ltd, 17 Cedar Road, Sutton,
Surrey SM2 5DA, UK). Some analyses were conducted using Quattro Pro 9 or Excel 2003.

Table 3. Distribution of the main characteristics of the 287 studies of lung cancer

Of the 287 studies, 267 are classified as principal, 209 (78.3%) of these being case–control
studies, 52 (19.5%) prospective, 5 (1.9%) nested case–control and 1 (0.4%) case-cohort.
Note that the last three study designs, where exposure was determined before diagnosis,
are combined into one category in Table
3 (and the text below based on it). The other 20 studies are classified as subsidiary.
Of the principal studies, 262 provide data for all lung cancer, 84 for squamous and
86 for adeno. Only rarely did these studies provide data only for squamous (1 study)
or adeno (3 studies). The data come less often from case–control designs for all lung
cancer (77.9%) than for squamous (86.9%) and adeno (87.2%).

Of the 267 principal studies, 158 (59.2%) provide results for both sexes, 90 (33.7%)
for males only, and 19 (7.1%) for females only. One hundred and ninety-six (73.4%)
of the studies included subjects who are under 30 years old (or allowed their inclusion
by having no age restriction), while only 31 (11.6%) were restricted to subjects aged
40 or more. Subjects aged 80 years or more were included by 200 (74.9%), while only
16 (6.0%) were restricted to subjects aged 60 or less. Prospective studies were much
more likely than case–control studies to specify age restrictions, e.g. 62.1% vs.
16.7% for age 30 years or more, and 48.3% vs. 18.7% for age less than 80 years. Eighty-nine
(33.3%) principal studies were conducted in USA or Canada, with 22 (8.2%) in the UK,
25 (9.4%) in Scandinavia, 43 (16.1%) in other parts of Europe, 37 (13.9%) in China,
18 (6.7%) in Japan, 17 (6.4%) in the rest of Asia and 16 (6.0%) elsewhere – in South
or Central America, Africa or Australia. Of the 58 prospective studies, all but 12
were conducted in North America, UK or Scandinavia. Of the principal studies, 42 (15.7%)
were conducted in countries where at least 75% of cigarettes smoked are made from
Virginia tobacco, with 184 (68.9%) carried out where at least 75% of cigarettes are
from blended tobaccos. Forty-seven (17.6%) started before 1960. Studies starting after
1979 were predominantly (92.4%) case–control. Thirty-six (13.5%) involved at least
1,000 lung cancer cases. Seven (2.6%) were conducted in miners, with a further 11
(4.1%) conducted in other occupational groups with a known relationship with lung
cancer. Proxy respondents were used for some subjects in 74 (27.7%), with full histological
confirmation of cases reported to be carried out in 68 (25.5%).

Most study groups (i.e. a principal study or one of its subsidiaries) provide some
results for the major indices compared to never smokers, 240 (89.9%) for ever smokers,
134 (50.2%) for current smokers and 127 (47.6%) for ex smokers. Many studies provide
results for smoking of any product (162 studies, 60.7%) or for cigarettes (147, 55.1%),
but fewer do so for cigarette only smoking (55, 20.6%), smoking of pipes/cigars only
(62, 23.2%), mixed smoking (29, 10.9%), or for the cigarette type indices filter/plain
cigarette smoking (38, 14.2%), hand-rolled cigarette smoking (15, 5.6%), or mentholated
cigarette smoking (3, 1.1%). Though dose–response data are most commonly available
by amount smoked (162, 60.7%), many studies provide data by age of starting to smoke
(62, 23.2%), duration (77, 28.8%), and time quit (58, 21.7%). Few studies provide
data on tar level (11 studies, 4.1%), fraction smoked (9 studies, 3.4%), or butt length
(2 studies, 0.7%).

Relative risks

A total of 16,616 RRs were entered, the number recorded per study varying from 1 to
1,029. Of these, 1,266 relate to subsidiary studies. Table
4 summarizes the distribution of various characteristics of the RRs by outcome, sex,
study type and location.

Table 4. Distribution of the main characteristics of the relative risks

Of the total of 16,616 RRs, 71.9% relate to case–control studies, and 93.8% are sex-specific.
40.2% come from North American studies, 36.8% from Europe, 16.7% from Asia, and 6.3%
from other continents. 60.9% are unadjusted for potential confounding variables and
18.7% are adjusted for sex and/or age only. 70.1% are given directly or are calculated
by standard methods, the rest being derived by more complex methods.

Of the total RRs, 5,061 relate to the major smoking indices, where the denominator
is never or non smoking, with 3,614 of these relating to smoking of any product or
cigarettes (regardless of pipe or cigar smoking), 678 to cigarette only smoking and
769 to pipe, cigar or mixed smoking. Four hundred and forty-eight relate to cigarette
type comparisons, most commonly (303 RRs) to the filter vs. plain comparison. All
the 25 RRs for the mentholated/non-mentholated comparison come from North American
studies, while none of those for the handrolled/manufactured comparison do. There
are 10,921 RRs for dose-related indices, based mainly on 3,625 sets, 2,047 vs. never
or non smoking, 1,327 vs. the low level, and 251 vs. current smoking. There are most
sets for amount smoked (1,145) and least for butt length (5). For amount smoked, age
of starting, duration of smoking, years quit (vs. never and vs. current) there are
sufficient numbers of dose–response sets to study variation in RR by sex, study type
and continent.

None of the RRs included in the meta-analyses and meta-regressions show more than
minor failures of the validation tests used, attributable to rounding errors or small
imprecisions or uncertainties in estimating the RRs and CIs. Additional file
3: RRs provides further detail.

The meta-analyses and meta-regressions

The main findings are summarized in the following sections, with tables and forest
plots. Additional file
5: Detailed Analysis Tables fully presents all the meta-analyses and meta-regressions
conducted. The interested reader should first see Additional file
1: Methods, which lists the other files, and describes their content and structure.

Findings are generally presented for three outcomes, referred to as “all lung cancer”,
“squamous” or “adeno”. These outcomes are defined in the Methods section, and also
in the footnotes to the tables, and allow the inclusion of results based on alternative
similar definitions. (Note that the terms “squamous cell carcinoma” and “adenocarcinoma”
are only used when reference is made to results specifically for the particular cell
type).

A. Risk from ever smoking

Figures 1, 2, 3, 4 and 5 (all lung cancer), Figures 6 and 7 (squamous) and Figures 8 and 9
(adeno) present the results of the main meta-analyses for ever smoking any product
(or cigarette smoking for studies without RRs for any product), based on most-adjusted
RRs. Table 5 presents additional results subdivided by level of certain characteristics,
while Table 6 presents results of some alternative meta-analyses of ever smoking. From
these findings, various observations can be made.

Figure 1. Forest plot of ever smoking of any product and all lung cancer – part 1. Table 5
presents the results of a main meta-analysis for all lung cancer based on 328 relative
risk (RR) and 95% confidence interval (CI) estimates for ever smoking of any product
(or cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale in Figures 1, 2, 3, 4 and 5. The studies
are sorted in order of sex within study reference (REF) within start year of study
(START) within continent (CONT), with the exception of study LIU4 shown at the end of
Figure 5. In the graphical representation individual RRs are indicated by a solid square,
with the area of the square proportional to the weight (inverse-variance of log RR).
Arrows indicate where the CI extends outside the range allocated.

Figure 2. Forest plot of ever smoking of any product and all lung cancer – part 2. This is
a continuation of Figure 1, presenting further individual study data included in the
main meta-analysis for all lung cancer shown in Table 5. For study DORGAN separate
estimates, within sex, are shown for whites then blacks.
For study HUMBLE they are shown for non-hispanic whites then hispanics, and for study
KELLER for whites then non-whites.

Figure 3. Forest plot of ever smoking of any product and all lung cancer – part 3. This is
a continuation of Figure 2, presenting further individual study data included in the
main meta-analysis for all lung cancer shown in Table 5.

Figure 4. Forest plot of ever smoking of any product and all lung cancer – part 4. This is
a continuation of Figure 3, presenting further individual study data included in the
main meta-analysis for all lung cancer shown in Table 5.

Figure 5. Forest plot of ever smoking of any product and all lung cancer – part 5. This is
a continuation of Figure 4, presenting the remaining individual study data included in
the main meta-analysis for all lung cancer shown in Table 5. Also shown are the combined
random-effect estimates. These are represented by a
diamond of standard height, with the width indicating the 95% CI. Note that the sizes
of the squares for the two estimates from study LIU4 indicate the relative weight
of the male and female data, but are not comparable with the sizes of the squares
for the other estimates.

Figure 6. Forest plot of ever smoking of any product and squamous – part 1. Table 5
presents the results of a main meta-analysis for squamous based on 102 relative risk
(RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or
cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale in Figures 6 and 7. The studies are
sorted in order of sex within study reference (REF) within start
year of study (START) within continent (CONT). In the graphical representation individual
RRs are indicated by a solid square, with the area of the square proportional to the
weight (inverse-variance of log RR). Arrows indicate where the CI extends outside
the range allocated. For study SCHWAR separate estimates, within sex, are shown for
whites then blacks.

Figure 7. Forest plot of ever smoking of any product and squamous – part 2. This is a
continuation of Figure 6, presenting the remaining individual study data included in
the main meta-analysis for squamous shown in Table 5. Also shown are the combined
random-effect estimates. These are represented by a
diamond of standard height, with the width indicating the 95% CI.

Figure 8. Forest plot of ever smoking of any product and adeno – part 1. Table 5
presents the results of a main meta-analysis for adeno based on 107 relative risk
(RR) and 95% confidence interval (CI) estimates for ever smoking of any product (or
cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale in Figures 8 and 9. The studies are
sorted in order of sex within study reference (REF) within start
year of study (START) within continent (CONT). In the graphical representation individual
RRs are indicated by a solid square, with the area of the square proportional to the
weight (inverse-variance of log RR). Arrows indicate where the CI extends outside
the range allocated. For study SCHWAR separate estimates, within sex, are shown for
whites then blacks.

Figure 9. Forest plot of ever smoking of any product and adeno – part 2. This is a
continuation of Figure 8, presenting the remaining individual study data included in
the main meta-analysis for adeno shown in Table 5. Also shown are the combined
random-effect estimates. These are represented by a
diamond of standard height, with the width indicating the 95% CI.

Table 5. Main meta-analyses for ever smoking of any product (or cigarettes if any product not
available)a

Table 6. Some alternative meta-analyses for ever smoking compared to those in Table 5

First, the RRs for all three outcomes are markedly heterogeneous. As shown in Table 5,
H (the heterogeneity per degree of freedom) is estimated as 22.84 for all lung cancer,
5.17 for squamous and 8.78 for adeno
(p < 0.001). Individual RRs vary up to 125.27 for all lung cancer (study STUCKE for
males), 92.66 for squamous (ABRAHA/males), and 34.45 for adeno (SCHWAR/males). Based
on random-effects estimates, a positive association is seen, strongest for squamous
(RR 10.47, 95% CI 8.88-12.33, based on 102 RRs), but also clearly evident for all
lung cancer (5.50, 5.07-5.96, n = 328) and adeno (2.84, 2.41-3.35, n = 107). Although
the strength of association varies markedly by study, the consistency of direction
is clear, with only two of the all lung cancer RRs, none of the 102 squamous RRs,
and nine of the 107 adeno RRs below 1.0.
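
As a rough illustration of how a random-effects estimate and the heterogeneity statistic H can be obtained from study-level RRs and 95% CIs, the following Python sketch may help. It assumes the standard inverse-variance approach with a DerSimonian-Laird between-study variance, which may differ in detail from the procedure described in the Methods. The function names and the three input estimates are hypothetical and are not taken from the database.

```python
import math

Z95 = 1.96  # normal deviate assumed for a 95% CI

def log_rr_and_weight(rr, lower, upper):
    """Return log RR and its inverse-variance weight, with the SE
    recovered from the width of the 95% CI on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(rr), 1.0 / se ** 2

def pool(estimates):
    """Pool (rr, lower, upper) tuples; assumes at least two estimates."""
    pairs = [log_rr_and_weight(*e) for e in estimates]
    ys = [y for y, _ in pairs]
    ws = [w for _, w in pairs]
    sw = sum(ws)
    fixed = sum(w * y for w, y in zip(ws, ys)) / sw
    # Heterogeneity chi-squared and its degrees of freedom
    q = sum(w * (y - fixed) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    # DerSimonian-Laird moment estimate of between-study variance (assumed)
    c = sw - sum(w * w for w in ws) / sw
    tau2 = max(0.0, (q - df) / c)
    ws_re = [1.0 / (1.0 / w + tau2) for w in ws]
    sw_re = sum(ws_re)
    mean_re = sum(w * y for w, y in zip(ws_re, ys)) / sw_re
    half = Z95 / math.sqrt(sw_re)
    return {
        "fixed_rr": math.exp(fixed),
        "random_rr": math.exp(mean_re),
        "random_ci": (math.exp(mean_re - half), math.exp(mean_re + half)),
        "H": q / df,  # heterogeneity per degree of freedom
        "tau2": tau2,
    }

# Hypothetical inputs, for illustration only
example = [(2.0, 1.5, 2.7), (8.0, 6.0, 10.5), (5.0, 3.0, 8.4)]
result = pool(example)
print(result)
```

When H is well above 1, as in this made-up example, the between-study variance is positive and the random-effects weights are more nearly equal than the fixed-effect weights, so a single very large study dominates the pooled estimate less.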

As shown in Table 6, the overall estimates for each outcome were virtually unchanged by using least-adjusted
rather than most-adjusted estimates. They were slightly increased by restricting attention
to estimates using a more precise outcome definition, the random-effects estimates
changing to 5.59 (5.15-6.07) for the 317 estimates specifically for all lung cancer,
11.56 (9.68-13.81) for the 74 estimates specifically for squamous cell carcinoma,
and 2.99 (2.49-3.58) for the 87 estimates specifically for adenocarcinoma. The overall
estimates for each outcome were virtually unchanged when RRs for ever smoking cigarettes
were preferred to RRs for ever smoking any product. This is partly due to many studies
providing only one type of RR, so that for all lung cancer, for example, 250 of the
328 RRs are common to both meta-analyses. A much smaller number of estimates were
available for cigarette only smoking; RRs from these were slightly higher: 6.45 (5.41-7.70,
n = 54) for all lung cancer, 11.50 (7.47-17.69, n = 11) for squamous, and 2.87 (1.49-5.55,
n = 11) for adeno. Estimates were also extracted specifically for populations of age
<56, 50–70 or 65+ years (with age determined at baseline for prospective studies).
As shown in Table 6, data were rather limited for squamous and adeno, particularly for older populations.
For all lung cancer, the three RRs: 6.57 (4.94-8.74, n = 38) for age <56 years, 6.46
(4.99-8.35, n = 31) for age 50–70 years, and 5.48 (4.59-6.55, n = 37) for age 65+
years were all consistent with the overall RR of 5.50, with no clear trend.

Returning to the main meta-analysis (most-adjusted and preferring ever smoking any
product), there is a large variation between RRs in the weight they contribute to
the analysis. This is very marked for all lung cancer. Here the 328 estimates provided
a combined weight of 19,346 (mean 59.0), but the male and female estimates from study
LIU4 together contributed a weight of 9,846, 50.9% of the total. Omitting these two
estimates substantially reduced the heterogeneity, H falling from 22.84 to 12.54.
The next largest weights were 1,550 in study STOCKW (sexes combined), 443 in BROWN2
(males) and 428 in BROWN2 (females). For squamous, the total weight was 1,000 for
the 102 RRs (mean 9.8). The largest contributors to this were 164 for BROWN2/males,
90 for BROWN2/females, 52 for LUBIN2/males and 47 for LUBIN2/females, together contributing
35% of the total weight. For adeno, the total weight was 1,514 for the 107 RRs (mean
14.1). Again, BROWN2 and LUBIN2 were the largest contributors, providing, respectively,
24% and 6% of the total weight.
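
Because the weights quoted above are inverse variances of log RR, the means and percentage shares can be checked with simple arithmetic. The short Python sketch below reproduces them from the totals given in the text. The helper names are hypothetical, and `ci_factor` assumes a normal approximation with 1.96 as the 95% deviate in order to show the multiplicative CI half-width that a given weight implies.

```python
import math

def mean_weight(total, n):
    """Average weight per RR estimate."""
    return total / n

def share(part, total):
    """Percentage of the total weight contributed by `part`."""
    return 100.0 * part / total

def ci_factor(weight, z=1.96):
    """Factor f such that the 95% CI is (RR / f, RR * f), assuming
    SE(log RR) = 1 / sqrt(weight)."""
    return math.exp(z / math.sqrt(weight))

# Figures quoted in the text
print(round(mean_weight(19346, 328), 1))        # 59.0, all lung cancer mean weight
print(round(share(9846, 19346), 1))             # 50.9, LIU4 share of total weight
print(round(share(164 + 90 + 52 + 47, 1000)))   # 35, BROWN2 + LUBIN2 for squamous
print(round(mean_weight(1514, 107), 1))         # 14.1, adeno mean weight

# A larger weight implies a narrower CI around the study's RR
print(ci_factor(9846), ci_factor(59.0))
```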

In investigating sources of heterogeneity, variation was studied firstly using a univariable
approach, the results for the characteristics considered in Table 5 being summarized below,
based on the random-effects estimates.

Sex

For all three outcomes, RRs were always somewhat lower for females than for males
or for sexes combined, though the variation by sex was not significant (p ≥ 0.1) for
squamous.

Location

For all three outcomes, RRs were lower from studies conducted in Europe and Asia than
from studies conducted in North America. While for all lung cancer and adeno RRs were
noticeably lower in Asia than in Europe, this difference was not evident for squamous.
The difference in RRs by continent was very marked and highly significant (p < 0.001)
for all lung cancer and adeno, but less marked, though still significant (p < 0.01)
for squamous.

Start year of study

For all lung cancer and squamous, variation by start year was not significant (p ≥ 0.05)
although there was some tendency for RRs to be higher in more recent studies. For
adeno, the variation was significant (p < 0.01) but there was no clear trend.

Study type

For all three outcomes, RRs were somewhat lower for case–control studies than for
prospective studies (or other study designs where the smoking data were collected
before lung cancer diagnosis). However, the difference was never statistically significant
(p ≥ 0.05).

National cigarette tobacco type

For all three outcomes, there was significant (p < 0.01 or < 0.001) variation. This
was mainly due to low estimates in the “other” group, which mainly included results
from China. For all lung cancer, RRs for Virginia (6.24, 5.16-7.54, n = 50) and blended
(6.30, 5.79-6.87) were quite similar. For squamous and adeno, there were limited results
for Virginia, and no clear difference from blended was evident.

Any proxy use

There was some evidence that RRs were higher where proxy respondents were used for
squamous (p < 0.05) and adeno (p < 0.1), but not for all lung cancer.

Full histological confirmation

RR estimates were somewhat higher where full histological confirmation of diagnosis
was a study requirement, but this was only significant at p < 0.05 for all lung cancer.

Number of cases

Some tendency for RRs to increase with increasing number of cases was evident for
all three outcomes, but variation in number of cases was only significant for all
lung cancer (p < 0.01).

Smoking product

The analyses in Table 5 are based on a preference order of any product, cigarettes (ignoring other products)
and cigarettes only. For all lung cancer, where 205 of the 328 estimates were for
any product, 114 were for cigarettes and 9 for cigarettes only, there was no evidence
that the RRs included varied by smoking product. For squamous and adeno (both p < 0.001),
however, RRs were lowest for smoking any product, intermediate for cigarettes, and
highest for cigarettes only (though based on only two RRs for cigarettes only for
each outcome).

Unexposed base

RRs were somewhat higher where the unexposed base group was never cigarettes than
when it was never any product, though this was only significant (p < 0.05) for adeno.
This result is somewhat counter-intuitive, as lower RRs might be expected where the
base (never cigarettes) includes some smokers (pipe/cigar only), and probably arises
from the strong correlation between the definitions of smoking product and unexposed
base. Two combinations – any product vs. never any product (n = 203) and cigarettes
vs. never cigarettes (n = 90) – form a large proportion of the total RRs (with any
product vs never cigarettes not a valid possibility).

Number of adjustment factors

There was no evidence that RR estimates varied by whether they were adjusted for 0,
1 or 2+ potential confounding variables.

The full meta-analysis (see Additional file 5: Detailed Analysis Tables) also includes
results by levels of some other characteristics. In an attempt to evaluate the
independent role of a whole range of characteristics, preliminary meta-regression
analyses were conducted for each outcome (results not shown). As a result, it was
decided to present findings for a fixed model involving six major characteristics (see
Table 7), test the effect of each by deleting each of the six individually from the fixed
model (and also by allowing each to enter a step-wise model in order of significance),
and test the effect of a range of other characteristics by adding each individually
into the fixed model (see Additional file 5: Detailed Analysis Tables). The main
conclusions to be drawn from these analyses are summarized below.

Table 7. Meta-regression results for ever smoking of any product (or cigarettes if any product
not available)a

For all lung cancer, by far the strongest source of variation was location, with the
overall heterogeneity reduced from 22.84 per d.f. to 7.02 per d.f. after including
location only into the model. As noted earlier this was mainly due to relatively high
RRs in North America and low RRs in Asia. Other clear effects were also associated
with start year of study (p < 0.001, higher risks in later studies, much more clearly
evident than in the univariable analyses in Table 5), study type (p < 0.01, higher risks
in prospective studies) and number of cases
(p < 0.001, higher risks in larger studies). There was no significant effect of sex,
and the weakly significant (p < 0.05) effect for number of adjustment factors was
associated with an erratic pattern, with lower RRs where the number of factors was
1, and higher RRs where it was 0 or 2+. The heterogeneity for the fixed model including
all the six characteristics included in Table 7 was 4.72 per d.f., with the model
explaining 80.5% of the overall variation between the RRs. Inspection of standardized
residuals revealed eight estimates where the value was outside the range +/− 2.5 SEs:
MILLS/males (RR 1.33, fitted 3.35), LOMBA2/females
(RR 1.33, fitted 5.07), TIZZAN/males (RR 1.93, fitted 3.50), WANG4/males (RR 1.16,
fitted 2.01), PERNU/males (RR 8.93, fitted 4.37), LUBIN2/males (RR 8.50, fitted 5.47),
BOFFET/males (RR 14.20, fitted 7.78) and JUSSAW/males (RR 16.83, fitted 3.77). Only
two other characteristics studied significantly (p < 0.05) improved the fit of the
model, both related to study location. One was a variable subdividing “Other Europe”
(i.e. other than UK and Scandinavia) into five smaller regions, with risk relatively
low in the Balkans (Greek and Turkish studies) and relatively high in multiregional
studies compared with the rest, and the other a variable subdividing “Other Asia”
(i.e. other than China or Japan) into three smaller regions, with risk higher in India
compared to Hong Kong and the rest of Asia (Taiwan, Thailand, Singapore and South
Korea). No independent effect was evident for national cigarette tobacco type. Additional
analysis (data not shown) confirmed the strong independent effect of start year of
study separately within studies conducted in North America, Europe and Asia, though
the tendency for higher RRs in more recent studies was stronger in North America than
in Europe, and the pattern of variation was more erratic for Asia. It also confirmed
the strong independent effects of location and start year of study separately for
males and for females.
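
To make the "heterogeneity per d.f." comparison concrete, the sketch below fits the simplest possible meta-regression, a model with a single categorical characteristic such as location, by taking the inverse-variance weighted mean of log RR within each level. It then reports the residual heterogeneity per d.f., the proportion of the overall heterogeneity chi-squared removed by the model, and standardized residuals. The definitions of "proportion explained" and of the standardized residual used here are assumptions made for illustration, as are all function names and data values. The fixed model in Table 7 involves six characteristics and would need a full weighted regression.

```python
import math
from collections import defaultdict

def fit_one_factor(log_rrs, weights, levels):
    """Weighted fit of log RR on one categorical factor.
    Returns fitted value per level, residual chi-squared and its d.f."""
    acc = defaultdict(lambda: [0.0, 0.0])  # level -> [sum of w*y, sum of w]
    for y, w, g in zip(log_rrs, weights, levels):
        acc[g][0] += w * y
        acc[g][1] += w
    fitted = {g: swy / sw for g, (swy, sw) in acc.items()}
    q = sum(w * (y - fitted[g]) ** 2
            for y, w, g in zip(log_rrs, weights, levels))
    return fitted, q, len(log_rrs) - len(fitted)

def standardized_residuals(log_rrs, weights, levels, fitted):
    """(observed - fitted) / SE, taking SE = 1 / sqrt(weight) (assumed)."""
    return [(y - fitted[g]) * math.sqrt(w)
            for y, w, g in zip(log_rrs, weights, levels)]

# Hypothetical data: RRs, weights and a location label per estimate
rrs = [9.0, 7.5, 11.0, 2.2, 2.9, 1.8]
weights = [40.0, 55.0, 25.0, 60.0, 35.0, 80.0]
location = ["N America"] * 3 + ["Asia"] * 3
ys = [math.log(r) for r in rrs]

# Null model: one common mean for all estimates
_, q_null, df_null = fit_one_factor(ys, weights, ["all"] * len(ys))
# Model with location only
fitted, q_loc, df_loc = fit_one_factor(ys, weights, location)

print("H without model:", q_null / df_null)
print("H with location:", q_loc / df_loc)
print("proportion explained:", 1 - q_loc / q_null)
print({g: math.exp(v) for g, v in fitted.items()})
print(standardized_residuals(ys, weights, location, fitted))
```

In this made-up example nearly all of the heterogeneity lies between the two locations, so H falls sharply once location enters the model, which is the same qualitative pattern described above for all lung cancer.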

For squamous, start year of study was the most important factor, on its own reducing
the heterogeneity from 5.17 to 4.33 per d.f. (p < 0.001). Other significant characteristics
included location (p < 0.001), with RRs high in North America and low in China, and
number of cases (p < 0.05), with higher RRs in larger studies. Number of adjustment
factors was also significant (p < 0.05), but the pattern was erratic and not the same
as for all lung cancer. Though the pattern of results by study type was similar to
that for all lung cancer, this characteristic did not contribute significantly to
the model. The heterogeneity for the fixed model (Table 7) was 3.18 per d.f., the model
explaining 49.9% of the overall variation. Two standardized residuals were outside the
range +/− 2.5 SEs: STAYNE/males (RR 3.47, fitted 10.50)
and LUBIN2/males (RR 16.66, fitted 8.41). Two other characteristics significantly
improved the model fit. One was national cigarette tobacco type, with RRs higher where
flue-cured Virginia tobacco was smoked, than where blended tobacco was smoked. Also,
RRs were higher (p < 0.01) where they had been derived by a relatively complex method
(see Methods) than where they were as reported originally, or derived by more standard
methods.

For adeno, location was the most important factor, on its own reducing the heterogeneity
from 8.78 to 4.36 per d.f. (p < 0.001), with the pattern of results (RRs high in North
America and low in Asia) similar to that for all lung cancer. As for all lung cancer,
there was variation by start year of study (p < 0.05) and number of cases (p < 0.05),
with RRs higher for recent and larger studies. RRs were again higher for prospective
studies, but here the difference was not significant (p ≥ 0.05). Here, variation by
sex was significant (p < 0.05) with RRs higher for males than females, but number
of adjustment factors was not (p ≥ 0.05). The heterogeneity for the fixed model (Table 7)
was 3.27 per d.f., the model explaining 69.5% of the overall variation. Two standardized
residuals were outside the range +/− 2.5 SEs: LOMBA2/females (RR 0.53, fitted 2.32)
and WYNDER6/females (RR 13.99, fitted 6.22). Four other characteristics significantly
improved the model fit. One was “Other Asia” (p < 0.05) where RRs were high in India
(based on a single RR from JUSSAW) and relatively low in Hong Kong, Taiwan, Thailand,
Singapore and South Korea. National cigarette tobacco type was also significant (p < 0.05),
with RRs for blended higher than for Virginia, opposite to the finding for squamous.
RRs were also lower where there was any use of proxy respondents (p < 0.05). Also,
RRs varied (p < 0.001) by the detailed definition of adenocarcinoma used. This appeared
to be mainly because of a low RR for “not squamous or undifferentiated”, a definition
used only for LOMBA2/females, where the standardized residual of −3.721 SEs was the
largest for any RR (see also above).

The fixed model (Table 7) considered how RR estimates varied by six main characteristics,
and additional analyses (see Additional file 5: Detailed Analysis Tables) tested whether
adding in further characteristics improved
the model fit. Characteristics which did not improve the fit for any of the three
outcomes considered included whether there was adjustment for specific factors (such
as age), the age of the subjects studied, the definition of smoking product, the definition
of the unexposed base, whether the study was conducted in a population working in
a risky occupation, and whether the study procedures required full histological confirmation.

B. Risk from current smoking

Figures 10, 11 and 12 (all lung cancer), Figure 13 (squamous) and Figure 14 (adeno) present
the results of the main meta-analyses for current smoking of any product. As before, RRs
for smoking of cigarettes are used if RRs for any product smoking are not available, and
RRs are most-adjusted. For prospective studies, current smoking refers to smoking status
as at baseline. Table 8 presents additional results by level of the same set of
characteristics considered in Table 5, while Table 9 presents results of alternative
meta-analyses of current smoking.

Figure 10. Forest plot of current smoking of any product and all lung cancer – part 1. Table 8
presents the results of a main meta-analysis for all lung cancer based on 195 relative
risk (RR) and 95% confidence interval (CI) estimates for current smoking of any product
(or cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale in Figures 10, 11 and 12. The studies
are sorted in order of sex within study reference (REF) within start
year of study (START) within continent (CONT). In the graphical representation individual
RRs are indicated by a solid square, with the area of the square proportional to the
weight (inverse-variance of log RR). Arrows indicate where the CI extends outside
the range allocated. For study DORGAN separate estimates, within sex, are shown for
whites then blacks. For study HUMBLE they are shown for non-hispanic whites then hispanics,
and for study SCHWAR for whites then non-whites.

Figure 11. Forest plot of current smoking of any product and all lung cancer – part 2. This
is a continuation of Figure 10, presenting further individual study data included in the
main meta-analysis for all lung cancer shown in Table 8. For study KELLER separate
estimates, within sex, are shown for whites then non-whites.

Figure 12. Forest plot of current smoking of any product and all lung cancer – part 3. This
is a continuation of Figure 11, presenting the remaining individual study data included
in the main meta-analysis for all lung cancer shown in Table 8. Also shown are the
combined random-effect estimates. These are represented by a
diamond of standard height, with the width indicating the 95% CI. For study KREUZE
separate estimates, within sex, are shown for age ≤ 45 and 55–69.

Figure 13. Forest plot of current smoking of any product and squamous. Table 8 presents
the results of a main meta-analysis for squamous based on 41 relative risk
(RR) and 95% confidence interval (CI) estimates for current smoking of any product
(or cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale. The studies are sorted in order
of sex within study reference (REF) within start year of study (START) within continent
(CONT). In the graphical representation individual RRs are indicated by a solid square,
with the area of the square proportional to the weight (inverse-variance of log RR).
Arrows indicate where the CI extends outside the range allocated. Also shown are the
combined random-effect estimates. These are represented by a diamond of standard height,
with the width indicating the 95% CI.

Figure 14. Forest plot of current smoking of any product and adeno. Table 8 presents
the results of a main meta-analysis for adeno based on 44 relative risk
(RR) and 95% confidence interval (CI) estimates for current smoking of any product
(or cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale. The studies are sorted in order
of sex within study reference (REF) within start year of study (START) within continent
(CONT). In the graphical representation individual RRs are indicated by a solid square,
with the area of the square proportional to the weight (inverse-variance of log RR).
Arrows indicate where the CI extends outside the range allocated. Also shown are the
combined random-effect estimates. These are represented by a diamond of standard height,
with the width indicating the 95% CI.

Table 8. Main meta-analyses for current smoking of any product (or cigarettes, if any product
not available)a

Table 9. Some alternative meta-analyses for current smoking compared to those in Table 8

As for ever smoking, the RRs for all three outcomes are heterogeneous (p < 0.001),
with the largest estimates seen being 104.50 for all lung cancer (STUCKE/males), 78.91
for squamous (CPSII/females), and 21.70 for adeno (OSANN/males). The random-effects
estimates (all lung cancer 8.43, 95% CI 7.63-9.31, n = 195; squamous 16.91, 13.14-21.76,
n = 41; adeno 4.21, 3.32-5.34, n = 44) are all clearly positive, larger than the corresponding
estimates for ever smoking, and also show a stronger relationship with squamous than
adeno. Similarly to ever smoking, the individual RRs are virtually all above 1.0,
though varying substantially. The estimates are again little affected (Table 9) by
preferring least, rather than most, adjusted RRs, by restricting to a more precise
outcome definition, or by preferring RRs for current smoking of cigarettes to those
for current smoking of any product. Again, estimates based specifically on cigarette
only smoking were slightly higher than those shown in Table 8: 9.52 (7.89-11.49, n = 38)
for all lung cancer, 20.85 (14.84-29.29, n = 8) for squamous, and 6.05 (3.69-9.92, n = 7)
for adeno. More so than in Table 6, data by age were rather limited for squamous and
adeno. For all lung cancer, estimates were 6.57 (4.68-9.23, n = 25) for age <56 years,
9.62 (7.10-13.05, n = 24) for age 50–70 years, and 9.07 (6.83-12.04, n = 27) for age 65+
years, no clear trend being evident. Table 9 also includes results for the comparison
current vs. non-current smokers. The RRs
here (3.75, 3.48-4.03 for all lung cancer; 4.71, 3.84-5.79 for squamous; 2.46, 2.07-2.93
for adeno) were markedly lower than the corresponding estimates for current vs. never
smokers, reflecting the increased risk in ex-smokers described later (see section
D below).

For the main meta-analysis, the studies contributing most to the total weight for
current smoking for all lung cancer were STOCKW/sexes combined (17.8% of the total
of 6,750) followed by BROWN2/males (6.0%) and BROWN2/females (5.4%). BROWN2 was the
major contributor for both squamous and adeno, with the two sex-specific results contributing
36.0% of the total weight of 646 for squamous, and 30.0% of the total weight of 1,017
for adeno. The huge LIU4 study did not provide results for current smoking.

For the characteristics considered in Table 8, the pattern of variation has a number of
similarities to that for ever smoking in Table 5. Thus, as for ever smoking, RRs for all
three outcomes tend to be higher for males,
for North American studies, and where the unexposed base is never cigarettes, and
smaller for older studies and smaller studies, with no clear variation by extent of
adjustment. A tendency for RRs to be higher where data may be reported by proxy respondents
seems somewhat stronger for current smoking, although based on few estimates for squamous
and adeno. A tendency for RRs to be higher where the smoking product is cigarettes
or cigarettes only than when it is any product is also evident, though not for squamous,
whereas it was seen most clearly in squamous for ever smoking. There is also some
indication that RRs are higher in prospective studies, though interestingly not for
all lung cancer. Whereas for ever smoking, RRs for studies requiring full histological
confirmation were higher than for those that did not for all three outcomes, the tendency
was in the reverse direction for squamous and adeno for current smoking. For national
cigarette tobacco type, current smoking RRs for squamous and adeno are virtually all
for blended, so are unhelpful. For all lung cancer, RRs are quite similar for Virginia
and blended, the significant (p < 0.001) variation shown in Table 8 arising because of
the low RRs in the “Other” group, mainly for China.

As for ever smoking, meta-regression analyses were conducted to give further insight,
the results from the same fixed model including six characteristics being summarized
in Table 10. Based on these results and those for other characteristics in Additional
file 5: Detailed Analysis Tables, various conclusions can be drawn.

Table 10. Meta-regression analyses for current smoking of any product (or cigarettes if any
product not available)a

For all lung cancer, as was the case for ever smoking RRs, by far the strongest source
of variation in current smoking RRs was location with relatively high risks in North
America and low risks in Asia. The overall heterogeneity reduced from 13.76 per d.f.
to 6.73 per d.f. after including location only into the model. Higher risks were also
seen in the fixed model in more recent studies (p < 0.001) and for males than females
(p < 0.01). There was some evidence (p < 0.1) of higher RRs in larger studies and
in prospective studies, but no association was seen with the number of adjustment
factors. The heterogeneity for the fixed model shown in Table 10 was 4.68 per d.f., with
the model explaining 69.3% of the overall variation between the current smoking RRs.
Four standardized residuals were outside the range +/− 2.5 SEs:
BROWN2/males (RR 11.30, fitted 15.86), TIZZAN/males (RR 1.90, fitted 3.68),
CPSI/females (RR 3.20, fitted 6.59) and KREUZE/males aged 55–69 (RR 41.86, fitted
11.85). No other characteristic significantly improved the fit when added to the fixed
model. Additional analysis (data not shown) confirmed the effect of start year of
study separately for North America and Europe (though no such relationship was seen
in Asia) and also confirmed that the effects of location and start year of study were
evident separately for males and for females.

For squamous and adeno, numbers of current smoking RRs (41 and 44 respectively) were
much lower than those for all lung cancer, with no data for China or the United Kingdom,
or for national cigarette type “other”. For squamous, only two characteristics in
the fixed model (Table 10) were significant, and then only at p < 0.05, and one of these
was number of adjustment
factors, where the pattern of response was erratic. Location was the other, with RRs
again highest in North America and lowest in Asia. There were no estimates with large
standardized residuals, and no other characteristic improved the model fit.

For adeno, three of the characteristics considered in Table 10 contributed significantly
to the model: sex (p < 0.001), location (p < 0.001) and
start year of study (p < 0.05), with the direction of effect similar to that noted
earlier for ever smoking. There were no large standardized residuals, and the only
additional characteristic which improved the model fit (p < 0.05) related to somewhat
lower RRs being seen for studies with full histological confirmation.

For none of the three outcomes did characteristics associated with detailed location,
national cigarette tobacco type, the precise definition of the outcome, adjustment
for specific factors, the definitions of smoking product or of the unexposed base,
whether the study was conducted in a population working in a risky occupation or whether
proxy respondents were used, add significantly to the model.

C. Risk from ever or current smoking

In an attempt to incorporate data from a greater number of studies, additional analyses
were carried out for ever/current smoking and for current/ever smoking. The meta-analysis
RRs are shown in Table 11. The number of studies included increased from 236 to 242 for
all lung cancer, from 73 to 78 for squamous and from 75 to 81 for adeno, compared with
Table 5. Note that the slightly higher number of RR estimates in the current/ever analysis
arises from inclusion there of more sex-specific results.

Table 11. Main meta-analyses for current or ever smoking of any product (or cigarettes, if not
available)a

As many of the RRs are common between the specific ever smoking analyses in Table 5 and
the ever/current smoking analyses in Table 11, the meta-analysis RRs tend to be quite
similar. However, those for current/ever smoking are intermediate between those
specifically for ever smoking (Table 5) and those specifically for current smoking
(Table 8). For example, for all lung cancer, random-effects estimates are 5.50 (95% CI 5.07-5.96,
n = 328) for ever smoking, 5.48 (5.07-5.93, n = 342) for ever/current smoking, 6.20
(5.68-6.77, n = 344) for current/ever smoking, and 8.43 (7.63-9.31, n = 195) for current
smoking. The pattern of RRs by level of the characteristics studied for both ever/current
and current/ever smoking tends to be quite similar to that for the specific analyses.
Results for ever or current smoking by level of selected characteristics are therefore
only presented in Additional file
5: Detailed Analysis Tables.

D. Risk from ex smoking

Figures
15,
16,
17 (all lung cancer), Figure
18 (squamous) and Figure
19 (adeno) present the results of the main meta-analyses for ex smoking of any product
(or cigarettes if any product was not available), based on most-adjusted RRs. Some
results by levels of characteristics are shown in Table
12.

Figure 15. Forest plot of ex smoking of any product and all lung cancer – part 1. Table
12 presents the results of a main meta-analysis for all lung cancer based on 182 relative
risk (RR) and 95% confidence interval (CI) estimates for ex smoking of any product
(or cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale in Figures
15,
16,
17. The studies are sorted in order of sex within study reference (REF) within start
year of study (START) within continent (CONT). In the graphical representation individual
RRs are indicated by a solid square, with the area of the square proportional to the
weight (inverse-variance of log RR). Arrows indicate where the CI extends outside
the range allocated. For studies DORGAN and KELLER separate estimates, within sex,
are shown for whites then blacks. For study HUMBLE they are shown for non-Hispanic
whites then Hispanics. For study KELLER the estimate shown for females is for whites.

Figure 16. Forest plot of ex smoking of any product and all lung cancer – part 2. This is a continuation of Figure
15, presenting further individual study data included in the main meta-analysis for
all lung cancer shown in Table
12. For study KELLER the estimate shown for females is for non-whites.

Figure 17. Forest plot of ex smoking of any product and all lung cancer – part 3. This is a continuation of Figure
16, presenting the remaining individual study data included in the main meta-analysis
for all lung cancer shown in Table
12. Also shown are the combined random-effects estimates. These are represented by a
diamond of standard height, with the width indicating the 95% CI. For study KREUZE
separate estimates, within sex, are shown for age ≤ 45 and 55–69.

Figure 18. Forest plot of ex smoking of any product and squamous. Table
12 presents the results of a main meta-analysis for squamous based on 33 relative risk
(RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or
cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale. The studies are sorted in order
of sex within study reference (REF) within start year of study (START) within continent
(CONT). In the graphical representation individual RRs are indicated by a solid square,
with the area of the square proportional to the weight (inverse-variance of log RR).
Arrows indicate where the CI extends outside the range allocated. Also shown are the
combined random-effects estimates. These are represented by a diamond of standard height,
with the width indicating the 95% CI.

Figure 19. Forest plot of ex smoking of any product and adeno. Table
12 presents the results of a main meta-analysis for adeno based on 34 relative risk
(RR) and 95% confidence interval (CI) estimates for ex smoking of any product (or
cigarettes if any product not available). The individual study estimates are shown
numerically and graphically on a logarithmic scale. The studies are sorted in order
of sex within study reference (REF) within start year of study (START) within continent
(CONT). In the graphical representation individual RRs are indicated by a solid square,
with the area of the square proportional to the weight (inverse-variance of log RR).
Arrows indicate where the CI extends outside the range allocated. Also shown are the
combined random-effects estimates. These are represented by a diamond of standard height,
with the width indicating the 95% CI.

Table 12. Main meta-analyses for ex smoking of any product (or cigarettes, if any product not
available)a

Again the RRs are markedly heterogeneous (p < 0.001 for all three outcomes), ranging
up to 135.69 for all lung cancer (STUCKE/males), 22.90 for squamous (OSANN/males)
and 13.10 for adeno (OSANN/males). The random-effects estimates (all lung cancer 4.30,
95% CI 3.93-4.71, n = 182, squamous 8.74, 6.94-11.01, n = 33, and adeno 2.85, 2.20-3.70,
n = 34), though all clearly positive, are smaller than the corresponding estimates
for current smoking. Individual RRs are only very occasionally below 1.0 and never
significantly so. Estimates are little affected by using the more specific definition
of each outcome, preferring least-adjusted RRs to most-adjusted RRs, or preferring
RRs for ex smoking cigarettes to those for ex smoking any product. RRs for ex
smoking cigarettes only were too few for useful analysis for squamous and adeno, but
for all lung cancer were similar to those for ex smoking any product. Fuller details
are given in the Additional file
5: Detailed Analysis Tables.

For the main meta-analysis of ex smoking, the studies contributing most to the total
weight for all lung cancer were STOCKW/sexes combined (22.4% of the total of 4,739),
followed by BROWNS/males (8.5%) and BROWNS/females (6.5%). BROWNS was the major contributor
for both squamous and adeno, with the two sex-specific results contributing 49.4%
of the total weight of 446 for squamous, and 45.2% of the total weight of 619 for
adeno.
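The weight shares quoted above follow from the inverse-variance weighting described in the figure legends (weight = inverse of the variance of log RR). As a minimal sketch, the Python below recovers a weight from a reported 95% CI and converts a set of weights to percentages of the total. The function names and the example CIs are hypothetical and are not taken from the review; it also assumes the CIs are symmetric on the log scale.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% CI


def weight_from_ci(lower, upper):
    """Inverse-variance weight of log RR, recovered from a 95% CI.

    Assumes the CI is symmetric on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return 1.0 / se ** 2


def weight_shares(cis):
    """Percentage of the total weight contributed by each (lower, upper) CI."""
    weights = [weight_from_ci(lo, hi) for lo, hi in cis]
    total = sum(weights)
    return [100.0 * w / total for w in weights]


# Hypothetical CIs, for illustration only (not data from the review).
example_cis = [(1.5, 2.5), (2.0, 8.0), (0.8, 6.0)]
shares = weight_shares(example_cis)
```

The narrowest interval receives the largest share, which is why a single large study can dominate a pooled estimate.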

For the characteristics considered in Table
12 the sources of variation for all lung cancer are generally quite similar to those
seen for ever smoking in Table
5 and for current smoking in Table
8. Thus, RRs are higher for males, for North America, for more recent studies and for
larger studies. Interestingly, RRs are clearly lower for prospective than for case–control
studies. There are fewer ex smoking RRs for squamous (33) and for adeno (34) than
for all lung cancer (182), but nevertheless some associations are evident in relation
to location for adeno, to study type for squamous, to number of adjustment factors
for adeno, and to number of cases, smoking product and unexposed base for both squamous
and adeno. Meta-regression analyses were not attempted for ex smoking.

E. Risk from smoking specific products compared to smoking of any product

Table
13 summarizes the results of meta-analyses for all lung cancer for cigarette only smokers,
smokers of pipes/cigars only, smokers of pipes only, and smokers of cigars only. In
each analysis, the base is never smokers of any product. The results for ever smoking
of pipes/cigars only are also shown in Figure
20.

Figure 20. Forest plot of ever pipe and/or cigar smoking and all lung cancer. Table
13 presents the results of a meta-analysis for all lung cancer based on 56 relative
risk (RR) and 95% confidence interval (CI) estimates for ever pipe and/or cigar smoking.
The individual study estimates are shown numerically and graphically on a logarithmic
scale. The studies are sorted in order of sex within study reference (REF) within
start year of study (START) within continent (CONT). In the graphical representation
individual RRs are indicated by a solid square, with the area of the square proportional
to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside
the range allocated. Also shown are the combined random-effects estimates. These are
represented by a diamond of standard height, with the width indicating the 95% CI.

For ever smoking, current smoking and ex smoking the random-effects RRs are similarly
elevated for pipes/cigars, pipes only and cigars only, but to a markedly lesser extent
than for cigarettes only. As for cigarette smoking, RRs for pipe and cigar smoking
are clearly higher for current smokers than for ex smokers.

Available results for squamous and adeno are limited, and mainly for ever smoking.
For pipe and/or cigar smoking, the RR for squamous (3.72, 95% CI 1.95-7.10, n = 8)
is somewhat higher than that for all lung cancer (2.92, 2.38-3.57, n = 38), but the
RR for adeno is not elevated (0.93, 0.62-1.40, n = 7). The lack of association of
adeno with pipe and cigar smoking is also evident in the RRs for pipes only (0.50,
0.23-1.10, n = 4) and for cigars only (0.55, 0.11-2.88, n = 3).

The results for pipe and cigar smoking mainly apply to males, as the few available
estimates for females have wide variability. The increased risk in smokers of pipes
and cigars is evident in each location studied, though data for Asia are extremely
sparse. Unlike for cigarettes, higher RRs are seen for Scandinavia (7.02, 4.72-10.44,
n = 6) and for Other Europe (5.17, 2.91-9.19, n = 8) than for North America (2.27,
1.79-2.89, n = 26) or the UK (4.32, 2.73-6.84, n = 11). These results are for ever/current
smoking, with the full results given in Additional file
5: Detailed Analysis Tables.

Table
13 also shows results for lung cancer for mixed smokers. For ever, current and ex smoking,
the random-effects RRs are slightly, but not significantly, higher than those for
smokers of cigarettes only. Available results for squamous and adeno are again limited,
and mainly for ever smokers. The RRs for squamous (9.78, 4.94-19.35, n = 6) and for
adeno (2.48, 1.25-4.95, n = 6) do not clearly differ from the RRs for squamous (11.09,
7.19-17.09, n = 10) and for adeno (2.63, 1.32-5.24, n = 10) for smokers of cigarettes
only.

F. Risk by type of cigarette smoked

Table
14 summarizes results by type of cigarette smoked. For filter and plain cigarette smoking
results are shown for three comparisons, including, for studies where there is a choice,
the nearest available equivalents to only filter vs. only plain (with results for
all lung cancer also shown in Figure
21), ever filter vs. only plain, and only filter vs. ever plain. Results are also shown
for the comparison of handrolled and manufactured cigarette smoking, and for mentholated
vs. non-mentholated cigarette smoking, with results for all lung cancer also shown
in Figures
22 and
23.

Figure 21. Forest plot of only filter vs. only plain cigarette smoking and all lung cancer. Table
14 presents the results of a meta-analysis for all lung cancer based on 42 relative
risk (RR) and 95% confidence interval (CI) estimates for only filter vs. only plain
cigarette smoking. The individual study estimates are shown numerically and graphically
on a logarithmic scale. The studies are sorted in order of sex within study reference
(REF) within start year of study (START) within continent (CONT). In the graphical
representation individual RRs are indicated by a solid square, with the area of the
square proportional to the weight (inverse-variance of log RR). Arrows indicate where
the CI extends outside the range allocated. Also shown are the combined random-effects
estimates. These are represented by a diamond of standard height, with the width indicating
the 95% CI.

Figure 22. Forest plot of handrolled vs. manufactured cigarette smoking and all lung cancer. Table
14 presents the results of a meta-analysis for all lung cancer based on 20 relative
risk (RR) and 95% confidence interval (CI) estimates for handrolled vs. manufactured
cigarette smoking. The individual study estimates are shown numerically and graphically
on a logarithmic scale. The studies are sorted in order of sex within study reference (REF)
within start year of study (START) within continent (CONT). In the graphical representation
individual RRs are indicated by a solid square, with the area of the square proportional
to the weight (inverse-variance of log RR). Arrows indicate where the CI extends outside
the range allocated. Also shown are the combined random-effects estimates. These are
represented by a diamond of standard height, with the width indicating the 95% CI.

Figure 23. Forest plot of mentholated vs. non-mentholated cigarette smoking of any product and
all lung cancer. Table
14 presents the results of a meta-analysis for all lung cancer based on six relative
risk (RR) and 95% confidence interval (CI) estimates for mentholated vs. non-mentholated
cigarette smoking. The individual study estimates are shown numerically and graphically
on a logarithmic scale. The studies are sorted in order of sex within study reference
(REF) within start year of study (START) within continent (CONT). In the graphical
representation individual RRs are indicated
by a solid square, with the area of the square proportional to the weight (inverse-variance
of log RR). Arrows indicate where the CI extends outside the range allocated. Also
shown are the combined random-effects estimates. These are represented by a diamond
of standard height, with the width indicating the 95% CI.

The random-effects RRs show a reduction in risk for only filter vs. only plain cigarette
smoking that is significant for all lung cancer (RR 0.69, 95% CI 0.61-0.78, n = 42),
and squamous (0.52, 0.40-0.68, n = 13), though not for adeno (0.84, 0.66-1.08, n = 10).
The alternative comparisons for filter and plain, where only a third to a half of
the RRs included actually differ, show clear reductions for all lung cancer and squamous
associated with filter cigarette smoking, though no difference for adeno (see Table
14). The reductions for all lung cancer and squamous are evident in both sexes and all
continents (see Additional file
5: Detailed Analysis Tables).

The risk associated with handrolled smoking is greater than that with manufactured
cigarette smoking, with RRs of 1.29 (1.12-1.49, n = 20) for all lung cancer and 1.62
(1.18-2.21, n = 5) for squamous. The RR of 2.09 (0.83-5.25, n = 4) for adeno is based
on very heterogeneous estimates, varying from 0.43 to 8.76, and allows no clear conclusion.
As results for females are limited, and have wide variability, the conclusions mainly
apply to males. The estimated RR for all lung cancer is greater than 1 in all locations
studied, though not always statistically significant. However, there are no data from
North America.

Data on mentholated cigarette smoking are limited, particularly by histological type.
For all lung cancer, the RR of 0.98 (0.80-1.20, n = 6) is consistent with no effect
of mentholation on risk, with five RR estimates close to or below 1.0 counterbalancing
one reported significant increase in males for study KAISER of 1.45 (1.03-2.02). There
is some evidence (p < 0.05) of heterogeneity by sex with estimates of 1.15 (0.93-1.43,
n = 3) for males, and 0.78 (0.63-0.98, n = 3) for females.

G. Risk by amount smoked

Table
15 summarizes the results of meta-analyses using RRs categorized by number of cigarettes
(or cigarette equivalents) smoked per day and based on data for ever/current smoking
and for smoking of any product (or cigarettes if not available). These are based on
those 140 studies for all lung cancer, 36 for squamous, and 34 for adeno which provided
data that could be used in the meta-analyses. For all three outcomes, results are
shown for one of the sets of “key values” (see Methods). For all lung cancer, squamous
and adeno, a clear increase is seen for RRs for categories including 5, but not 20,
cigarettes/day, with the meta-analysis RR increasing monotonically with increasing
amount smoked. Random-effects estimates for categories including 45, but not 20 cigarettes/day,
are 13.69 (11.80-15.89, n = 128) for all lung cancer, 27.65 (20.42-37.44, n = 37)
for squamous and 4.80 (3.29-7.01, n = 34) for adeno. The increase with amount smoked
is also clearly evident in both sexes, when least-adjusted RRs are considered, and
when an alternative set of key values (1, 10, 20, 30, 40, 999) is used, though numbers
of available RRs are quite sparse for the higher key values (see Additional
file
5: Detailed Analysis Tables). The key value analyses do not use all the
dose–response data available, as a number of the studies use broad dose–response categories
(such as 1–20 or 20+ cigs/day) which span more than one of the key values. Additional
file
5: Detailed Analysis Tables also includes results for alternative definitions of smoking
status and product smoked, which show a similarly clear dose–response. For example,
for current smoking of any product, the RRs for squamous rise from 9.92 (7.41-13.28,
n = 8) for key value 5 cigs/day to 39.16 (23.67-64.79, n = 12) for key value 45 cigs/day.
Additional file
4: Dose Not Meta also includes available results for some other studies which present
dose–response data in a form that cannot readily be included in the meta-analyses
(e.g. where the only available comparison is with an inappropriate base group). These
results do not appear inconsistent with those summarized in Table
15.
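The key value rule used above (a dose category is included only if it contains one key value and not another) can be sketched as follows. This is an illustrative reading, not the authors' code: the key value set (5, 20, 45) is inferred from the categories quoted in the text, the treatment of category bounds as inclusive is an assumption, and the function name and example categories are hypothetical.

```python
import math


def key_value_for(lower, upper, key_values):
    """Return the single key value inside [lower, upper], or None.

    A category is usable only if it contains exactly one key value;
    categories containing none, or spanning two or more, are skipped.
    """
    inside = [k for k in key_values if lower <= k <= upper]
    return inside[0] if len(inside) == 1 else None


# Assumed key value set, inferred from the categories described in the text.
KEY_VALUES = (5, 20, 45)

# Hypothetical categories in cigarettes/day; math.inf marks an open-ended top.
categories = [(1, 9), (10, 19), (20, 39), (40, math.inf),
              (1, 20), (20, math.inf)]
assigned = {c: key_value_for(c[0], c[1], KEY_VALUES) for c in categories}
```

Under this reading, broad categories such as 1–20 or 20+ cigarettes/day are dropped because each spans two key values, which matches the exclusion described in the text.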

Dose–response by amount smoked was investigated for pipe and cigar smoking, but the
number of estimates available was small, and referred only to males. However, there
was some evidence of dose–response. Thus for all lung cancer, one can compare RRs
for cigar only smoking for the highest (8.21, 4.36-15.49, n = 6) and lowest exposure
groups (1.84, 1.22-2.79, n = 5), and can also compare RRs for pipe only smoking for
the highest (5.99, 3.57-10.04, n = 9) and lowest exposure groups (3.68, 2.75-4.93,
n = 8).

H. Risk by age of starting to smoke

Table
16 summarizes meta-analysis results for age of starting to smoke based on data for ever/current
smoking and for smoking of any product (or cigarettes if not available). Random-effects
RRs for earliest compared to latest starting, and selecting results least-adjusted
for other aspects of smoking, are significantly elevated for all lung cancer (2.35,
2.08-2.65, n = 73), squamous (2.23, 1.66-2.98, n = 18) and adeno (1.99, 1.48-2.67,
n = 17). Alternatively selecting results most-adjusted for other aspects of smoking,
the RR for all lung cancer is 2.20 (1.96-2.47, n = 73). The increase in risk with
earlier starting is consistent with the results of the key value analyses, with, for
example, random-effects estimates relative to never smokers for squamous rising from
11.06 (6.87-17.81, n = 14) for categories including 26 years but not including 18
years to 31.07 (17.93-53.85, n = 6) for categories including 14, but not 18 years.
As seen in Additional file
5: Detailed Analysis Tables, a similar pattern is generally seen for other definitions
of smoking status and product smoked, although data for smokers of pipes and/or cigars
are very limited.

I. Risk by duration of smoking

Table
17 is laid out similarly to Table
16 and also presents results for ever/current smoking. Random-effects RRs for longest
compared to shortest duration of smoking, and selecting results least adjusted for
other aspects of smoking, are significantly elevated for all lung cancer (3.56, 2.90-4.35,
n = 76), squamous (3.93, 3.10-4.97, n = 27) and adeno (2.64, 2.04-3.43, n = 23). Alternatively
selecting results most adjusted for other aspects of smoking, the RR for all lung
cancer is 3.00 (2.57-3.49, n = 77). The increase in risk with longer duration is consistent
with the results of the key value analyses, with, for example, random-effects estimates
for all lung cancer rising from 2.48 (2.09-2.95, n = 55) for categories including
20 years but not including 35 years to 10.13 (7.66-13.39, n = 45) for categories including
50, but not 35 years. A clear trend of risk with increasing duration is also seen
for other definitions of smoking status and product smoked (see Additional file
5: Detailed Analysis Tables). Data for pipe and cigar smoking are limited, though even
so there is some evidence of a trend. Thus, for all lung cancer longest to shortest
RRs are elevated, both in smokers of pipes only (4.32, 1.57-11.89, n = 5) and smokers
of cigars only (2.43, 1.02-5.79, n = 3).

J. Risk by duration of quitting (vs. never smoking)

Table
18 presents results for duration of quitting (vs. never smoking) based on results for
smoking of any product (or cigarettes if not available). Random-effects RRs for shortest
compared to longest duration of quitting, selecting results least adjusted for other
aspects of smoking, are significantly elevated for all lung cancer (3.97, 3.32-4.75,
n = 65), squamous (6.22, 3.75-10.30, n = 14) and adeno (3.32, 1.98-5.58, n = 14).
Alternatively selecting results most adjusted for other aspects of smoking, the RR
for all lung cancer is 3.61 (3.04-4.28, n = 65). The increase in risk with shorter
duration of quitting is consistent with the results of the key value analyses, with,
for example, random-effects estimates relative to never smokers for adeno rising from
2.10 (1.49-2.94, n = 12) for categories including 12 years but not including 7 years
to 6.73 (3.46-13.12, n = 6) for categories including 3, but not 7 years. A clear trend
of risk with increasing duration of quitting is also seen for cigarette smoking (or
any product if not available), and for cigarette only smoking (see Additional file
5: Detailed Analysis Tables). Data for pipe and cigar smoking were too limited for
reliable conclusions.

K. Risk by duration of quitting (vs. current smoking)

For duration of quitting compared to current smoking the number of data sets available
is somewhat smaller than the corresponding number for duration of quitting compared
to never smoking. Results included in the longest vs. shortest analysis shown in Table
19 are generally the inverse of those in the shortest vs. longest analysis in Table
18 (exceptions arising for studies which combined current smokers and recent quitters
of more than 2 years). While the key value analyses shown in Table
19 echo the trends shown in Table
18, they also show that for shorter term quitting (categories including 3 but not 7
years) there is no evidence of a decline in risk from quitting. Thus the RRs for all
lung cancer (0.95, 0.84-1.08, n = 41) and adeno (1.02, 0.85-1.22, n = 6) are close
to 1.00, and the RR for squamous (1.15, 1.03-1.28, n = 6) is slightly elevated. Longer
quit durations are, however, clearly associated with a reduction in risk. For all
lung cancer, almost 40% of the RRs used in the key value analyses included short-term
quitters (of up to 2 years) in the current smoker base. No difference was seen between
those RRs and those with a more precisely defined current smoker base.

Table 19. Meta-analyses for duration of quitting (vs. current smoking)a

L. Risk by tar level

Due to the variety of different methods of quantifying tar levels, only highest vs.
lowest analyses have been carried out. No data were available by histological type,
and all data relate to cigarette smoking. For all lung cancer and for ever/current
smoking of cigarettes the 14 available estimates, from 9 studies, showed some evidence
of heterogeneity (H = 2.29, p < 0.01). However, 12 of the estimates showed a higher
risk in the higher tar group, and the random-effects estimate (1.42, 1.18-1.71) confirmed
the relationship between risk and tar level. The increase was evident for males (1.29,
1.08-1.53, n = 7) and females (1.48, 1.05-2.09, n = 6). There was no evidence of heterogeneity
by any specific characteristic, including extent of adjustment, 7 of the 14 estimates
being adjusted for one or more aspects of smoking. These results are based on RRs
that are selected as being least adjusted for other aspects of smoking. Alternatively,
using RRs selected as most adjusted for other aspects of smoking, the overall estimate
was 1.34 (1.16-1.56, n = 14).
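For readers who want to see how a pooled estimate and a heterogeneity statistic of this kind can be obtained, the sketch below pools log RRs by inverse-variance weighting. Two assumptions are made that this section does not confirm: that the random-effects estimate is of the DerSimonian-Laird type, and that H is the heterogeneity chi-squared (Cochran's Q) divided by its degrees of freedom. The input RRs and standard errors are hypothetical.

```python
import math


def pool(log_rrs, ses):
    """Fixed-effect and DerSimonian-Laird random-effects pooling of log RRs.

    Needs at least two estimates. Returns pooled RRs, a 95% CI for the
    random-effects RR, H = Q / df (assumed definition) and tau^2.
    """
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    df = len(log_rrs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    ci = (math.exp(pooled - 1.96 * se_re), math.exp(pooled + 1.96 * se_re))
    return {"fixed_rr": math.exp(fixed), "random_rr": math.exp(pooled),
            "ci": ci, "H": q / df, "tau2": tau2}


# Hypothetical highest-vs-lowest RRs and standard errors of log RR.
rrs = [1.2, 1.9, 1.1, 1.6]
ses = [0.10, 0.15, 0.20, 0.12]
result = pool([math.log(r) for r in rrs], ses)
```

When the estimates are homogeneous, tau^2 is zero and the random-effects and fixed-effect estimates coincide; the two diverge as H rises above 1.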

M. Risk by butt length and fraction smoked

All the available data relate to cigarette smoking. As the number of available estimates
was quite limited, particularly for butt length, they have been combined into a single
analysis including RRs for shortest vs. longest butt lengths and for greatest vs.
smallest fraction smoked, and including results for ever smoking and current smoking.
The combined estimates were 1.43 (1.14-1.79, n = 11) for all lung cancer, 1.39 (1.04-1.86,
n = 7) for squamous, and 1.30 (1.07-1.58, n = 6) for adeno. There was some evidence
of heterogeneity for all lung cancer (H = 2.29, p < 0.05) and for squamous (H = 2.96,
p < 0.01), though not for adeno (H = 0.75), but a clear majority (18/24 = 75.0%) of
the estimates indicated a higher risk associated with smoking more of the cigarette.

N. Further analyses by histological type

The results so far have been restricted to all lung cancer, squamous or adeno. Table
20 gives results for ever, current and ever/current smoking of any product (or cigarettes
if not available) for small cell carcinoma and large cell carcinoma, with corresponding
results also shown for all lung cancer, squamous cell carcinoma and for adenocarcinoma.
For ever/current smoking, the RR for large cell carcinoma (5.33, 4.02-7.07, n = 29)
is quite similar to that for all lung cancer (5.48, 5.07-5.93, n = 342), while the
RR for small cell carcinoma (11.14, 8.59-14.46, n = 61) is markedly higher, and similar
to that for squamous cell carcinoma (11.62, 9.80-13.78, n = 82). This pattern is also
true for current smoking, where RR estimates are higher than for ever/current smoking,
and for ever smoking. Additional file
5: Detailed Analysis Tables gives results by level of the various characteristics studied.
As for all lung cancer, squamous and adeno, RRs for small cell and large cell carcinoma
varied substantially by location, with RRs much higher in North America than in China,
and no clear pattern for the other regions, some of which have sparse data. There
was also a tendency for RRs to be higher where there was 100% histological confirmation.
For ever/current smoking and small cell carcinoma, the RRs were 9.84 (7.19-13.45,
n = 42) without such confirmation, and 14.62 (9.38-22.80, n = 19) with it (p < 0.01).
For large cell carcinoma, the corresponding RRs were 3.90 (2.90-5.24, n = 19) without
confirmation and 8.28 (5.89-11.65, n = 10) with it (p < 0.01). There was also some
evidence for small cell carcinoma only that RRs were higher from more recent studies.

O. Further analyses based on independent pairs of relative risks

Some studies provide independent RRs for males and females for the same definition
of outcome and exposure. Random-effects meta-analysis of the male/female sex ratio
confirms the impression already gained from the analyses shown in earlier Tables that
RRs tend to be somewhat higher for males, although estimates are heterogeneous. For
ever/current smoking, the sex ratio is 1.38 (1.23-1.54) for all lung cancer, based
on 93 ratios, 64 higher in males; 1.31 (0.91-1.90) for squamous, based on 30 ratios,
18 higher in males; and 1.43 (1.14-1.78) for adeno, based on 33 ratios, 27 higher
in males.

As sex differences may reflect greater cigarette consumption in males, meta-analysis
estimates of the sex ratio for ever/current smokers and for all lung cancer were also
calculated within levels of amount smoked (as defined in section G). The sex ratio
is 1.33 (1.05-1.68) for smokers of about 5 cigs/day, based on 46 ratios, 26 higher
in males, 1.59 (1.25-2.01) for smokers of about 20 cigs/day, based on 25 ratios, 20
higher in males, and 1.21 (0.99-1.49) for smokers of about 45 cigs/day, based on 26
ratios, 17 higher in males.
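Each within-study sex ratio of this kind can be formed from the two independent sex-specific RRs before the ratios are pooled. The sketch below shows one standard way to do this on the log scale, where the variance of the log ratio is the sum of the two variances. It is an illustration under the assumption that standard errors are recovered from 95% CIs; the function names and the example RRs are hypothetical.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% CI


def se_from_ci(lower, upper):
    """Standard error of log RR recovered from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def ratio_of_rrs(rr_a, ci_a, rr_b, ci_b):
    """Ratio rr_a / rr_b with a 95% CI, for two independent estimates."""
    log_ratio = math.log(rr_a) - math.log(rr_b)
    # Independence means the variances of the two log RRs simply add.
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    return (math.exp(log_ratio),
            math.exp(log_ratio - Z95 * se),
            math.exp(log_ratio + Z95 * se))


# Hypothetical male and female RRs from a single study (illustration only).
ratio, low, high = ratio_of_rrs(9.0, (6.0, 13.5), 6.0, (4.0, 9.0))
```

A single study's ratio usually has a wide interval, as here, which is why the text relies on the pooled ratio across many studies rather than on individual comparisons.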

A number of studies provide RR estimates for ever/current smoking separately by age,
and a random-effects meta-analysis was conducted, based on the ratio of the estimate
for the oldest age group for which data were available compared to that for the youngest.
Despite only 22 of the 45 ratios (48.9%) showing a greater risk in the oldest
age group, the meta-analysis showed a significantly higher risk in the oldest age
group (ratio 1.17, 95% CI 1.10-1.25), the seven ratios with most weight all being
greater than 1.0.

There were also eight studies, all conducted in the US, which provide comparable sex-specific
results for ever/current smoking separately for white people and black people (or
non-white people). Random-effects meta-analyses of the white/black race ratio showed
no difference between the races (1.05, 0.90-1.23, n = 14).

P. Further analyses based on non-independent pairs of relative risks

Some studies also provide separate non-independent least-adjusted and most-adjusted
RRs for the same definition of exposure. There is little evidence that adjustment
reduces the RR for ever/current smoking. Using the same preferences as in Table
11, the most-adjusted estimate is lower than the least-adjusted estimate for 57 of the
126 (45.2%) pairs for all lung cancer, for 14 of the 36 (38.9%) pairs for squamous,
and for 21 of the 41 (51.2%) pairs for adeno. In no case do the percentages differ
from 50% (at p < 0.05), and in each case the random-effects meta-analysis estimate
based on the most-adjusted pair members is similar to the corresponding estimate based
on the least-adjusted pair members (data not shown).

RRs for a dose-related index of smoking may be adjusted for other such indices. For
all lung cancer, and for four dose-related indices of smoking, pairs of otherwise
similar highest vs lowest RRs were identified in which one of the pair was adjusted
for the most available other aspects of smoking, and the other had no such adjustment.
Both were also chosen as adjusted for the most possible other variables (although
those other variables may differ between the pair). There was a clear tendency for
the additional adjustment for other aspects of smoking, typically including amount
smoked, to produce lower RR estimates. This was true for 18/22 (81.8%, p < 0.01) of
the pairs of estimates for age of starting to smoke, 12/15 (80.0%, p < 0.05) of the
pairs for duration of smoking, all 17 (100%, p < 0.001) of those for years quit, and
5/7 (71.4%, NS) of those for tar level.

Based on results for ever/current smoking and for all lung cancer, RRs for mixed smokers
were compared with those for smokers of cigarettes only. For 22 of the 34 (64.7%)
pairs, the RR was lower for mixed smokers, but this tendency was not significant (p = 0.12).
RRs for mixed smokers were also compared with those for smokers of pipes/cigars only.
Here 23 of the 24 (95.8%, p < 0.001) pairs showed a lower risk in the smokers of pipes/cigars
only.
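The comparisons of pair counts against 50% in this section can be checked with an exact two-sided sign (binomial) test. The text does not state which test the authors used, so the choice here is an assumption; under it, the counts quoted above give p-values at the levels reported (for example, 22 of 34 gives p of about 0.12). The function name is hypothetical.

```python
from math import comb


def sign_test_p(k, n):
    """Exact two-sided binomial p-value for k of n pairs against 50%."""
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)


# Counts quoted in the text: (pairs in the stated direction, total pairs).
pairs = [("age of starting", 18, 22), ("duration of smoking", 12, 15),
         ("years quit", 17, 17), ("tar level", 5, 7),
         ("mixed vs cigarettes only", 22, 34),
         ("mixed vs pipes/cigars only", 23, 24)]
for label, k, n in pairs:
    print(f"{label}: {k}/{n} = {100 * k / n:.1f}%, p = {sign_test_p(k, n):.4f}")
```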

Q. Publication bias

Some results of Egger’s test
[17] for publication bias are presented in Tables
5,
8 and
12, with further results given in Additional file
5: Detailed Analysis Tables, but have not previously been referred to in the text.
For ever smoking there is evidence of publication bias for all lung cancer (p < 0.001)
and adeno (p < 0.01), but not for squamous (p ≥ 0.1). For current smoking, some evidence
of publication bias is seen for all lung cancer (p < 0.05), but not for squamous or
adeno (p ≥ 0.1). For ex smoking, there is again evidence of bias for all lung cancer
and for adeno (p < 0.001) but not for squamous. Figure
24 (all lung cancer), Figure
25 (squamous) and Figure
26 (adeno) show funnel plots for ever smoking. Where asymmetry is seen, this is in the
direction of there being more higher-weight RRs above the mean. This is consistent
with the evidence in Table
5 of higher RRs for larger studies. Inspection of a funnel plot for ex smoking for
all lung cancer (data not shown) also showed that the high-weight RRs tended to be above
the mean.
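As a rough illustration of how a funnel-plot asymmetry test of this kind works, the sketch below implements the common regression form of Egger's test: the standardized effect (log RR divided by its standard error) is regressed on precision (1 divided by the standard error), and an intercept away from zero suggests asymmetry. This is the textbook unweighted form and may differ in detail from the variant the authors applied; the data are hypothetical.

```python
import math


def egger_intercept(log_rrs, ses):
    """Egger regression of standardized effect on precision.

    Returns (intercept, standard error, t statistic). The t statistic
    is referred to a t distribution with n - 2 degrees of freedom.
    """
    x = [1.0 / se for se in ses]                   # precision
    y = [lr / se for lr, se in zip(log_rrs, ses)]  # standardized effect
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                             # residual variance
    se_int = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return intercept, se_int, intercept / se_int


# Hypothetical data in which the more precise studies report higher RRs.
log_rrs = [1.9, 1.7, 1.5, 1.2, 1.0, 0.9]
ses = [0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
intercept, se_int, t_stat = egger_intercept(log_rrs, ses)
```

In this hypothetical example the higher-weight estimates lie above the lower-weight ones, the same direction of asymmetry described in the text, and the intercept is negative.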

Figure 24. Funnel plot for ever smoking and all lung cancer. Funnel plot of the 328 relative risk estimates for ever smoking and all lung cancer
included in the main meta-analysis in Table
5 against their weight (inverse-variance of log RR). The dotted vertical line indicates
the fixed-effect meta-analysis estimate.

Figure 25. Funnel plot for ever smoking and squamous. Funnel plot of the 102 relative risk estimates for ever smoking and squamous included
in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR).
The dotted vertical line indicates the fixed-effect meta-analysis estimate.

Figure 26. Funnel plot for ever smoking and adeno. Funnel plot of the 107 relative risk estimates for ever smoking and adeno included
in the main meta-analysis in Table 5 against their weight (inverse-variance of log RR).
The dotted vertical line indicates the fixed-effect meta-analysis estimate.

Discussion

Evidence of a relationship

The meta-analyses carried out demonstrate a clear relationship of smoking to overall
lung cancer risk. This is evident for ever, current and ex smoking, for pipes and
cigars, and for all types of cigarette studied. The increased risk in smokers is evident
in both sexes, in younger and older subjects, in all continents studied and in prospective
and case–control studies. That this relationship is causal is supported by the evidence
of a dose–response, risk increasing with increasing amount smoked, duration of smoking,
tar level and fraction smoked, and with earlier age of starting to smoke, and decreasing
with duration of quitting. It is also supported by the similarity of results based
on most-adjusted and least-adjusted RRs (though adjustment for amount smoked reduces
the association with other dose–response indices of smoking). The association is clearly
evident with each of the major histological types of lung cancer studied, being stronger
for squamous and small cell carcinoma, intermediate for large cell carcinoma, and
weakest for adenocarcinoma. Exceptionally, no relationship is seen between adenocarcinoma
and pipe or cigar smoking.

Heterogeneity

The studies are remarkably consistent in reporting an increased risk in ever smokers.
Only two of the 328 all lung cancer RRs, none of the 102 squamous RRs, and nine of
the 107 adeno RRs considered in Figures 1, 2, 3, 4, 5, 6, 7, 8, 9 are less than 1.0. However,
studies also vary markedly in the magnitude of the estimated
RR, as illustrated by the high values of H seen in the meta-analysis of the major
smoking indices, which often exceed 5 and sometimes exceed 20. (H values of 5, 10
and 20 are the same as I² values [16] of 80%, 90% and 95%.) This heterogeneity is perhaps
unsurprising given the many sources
of variation involved, including sex, location, timing, study design and populations,
definition of outcome and type of product smoked, and extent of confounder adjustment.
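
The stated equivalence between H and I² can be made explicit. The correspondences quoted above (5, 10 and 20 with 80%, 90% and 95%) hold when H is the heterogeneity chi-squared divided by its degrees of freedom, so that I² = 100 × (H − 1)/H. The sketch below assumes that definition, which differs from the H of Higgins and Thompson (the square root of this quantity); the function name is illustrative.

```python
def i_squared_from_h(h):
    """I-squared (as a percentage) from H.

    Assumes H = Q / df, i.e. heterogeneity chi-squared per degree of freedom.
    Values of H below 1 are truncated to an I-squared of zero.
    """
    if h <= 0:
        raise ValueError("H must be positive")
    return max(0.0, 100.0 * (h - 1.0) / h)


# The three correspondences quoted in the text
quoted = {h: i_squared_from_h(h) for h in (5, 10, 20)}
```

Applied to the H values of 22.84 and 4.72 reported for ever smoking and all lung cancer, the same conversion gives I² of roughly 95.6% and 78.8%.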

Using univariable and multivariable (meta-regression) methods, we investigated variation
in risk by a number of characteristics of the study and the RR for the outcomes all
lung cancer, squamous and adeno. While our “fixed” multivariable models involving
six characteristics (sex, location, start year of study, study type, number of cases
and number of adjustment factors) explained a substantial proportion of the variation
(e.g. reducing H from 22.84 to 4.72 for all lung cancer for ever smoking), there was
always substantial residual heterogeneity (with H varying from 2.43 to 4.72 in the
six analyses in Tables 7 and 10). Of the six characteristics studied, location was generally the most important,
with RR estimates for ever and for current smoking and for all three outcomes always
highest in North America, and lowest in China, and (with the exception of ever smoking
for squamous) lower in the rest of Asia than in Europe, with no consistent differences
seen between results for the United Kingdom, Scandinavia and the rest of Europe. Another
consistently seen relationship was the tendency for RRs to vary by start year of study,
with higher RRs seen in more recent studies. Three other tendencies were generally
seen, though the level of significance varied according to the analysis. One was the
tendency for RRs to vary by number of cases, with the lowest estimates always seen
for the smaller studies (involving 100 to 249 cases), another was the tendency for
RRs to be higher in prospective studies than in case–control studies, and the third
was the tendency for RRs to be somewhat higher in males than females. The final characteristic
included in the fixed model, number of adjustment factors, showed no clear relationship
with the RR, with significance either not present or weak (0.01 < p < 0.05), and the
direction of effect inconsistent.

We also tested for the effect of a number of other characteristics on the estimated
RR. A number of relationships were seen in the univariable models that were significant.
However, these mainly became non-significant in the multivariable models, presumably
due to correlations between the characteristics. Where a characteristic was significant,
this tended to be only in one of the six analyses, so not providing convincing evidence
of a true effect. It would have been possible, for each of the six combinations of
smoking status and outcome we considered, to present analyses of “best” models, based
on forward stepwise regression, that each included a different set of predictive characteristics.
However, we felt that the regressions we presented based on a fixed model were more
useful. Sources of variation are discussed further in the following paragraphs.

Sex

Where possible, sex-specific results are included in the meta-analyses, with combined-sex
results included only where sex-specific results are unavailable. Though variation by sex was not significant in all
the main analyses, risk estimates generally tended to be higher for males than females.
This is supported by additional analyses comparing RRs within study for the same outcome
and exposure definition. Somewhat higher RRs were found in males even in analyses
where comparisons were made within the same levels of daily cigarette consumption
(about 5, 20 or 45 cigs/day). Even so, the existence of somewhat higher RRs for males
does not necessarily indicate any greater susceptibility, as it may reflect their
increased exposure to occupational carcinogens, or other differences in smoking history
such as greater duration of smoking or increased use of plain and higher tar cigarettes.
It should be noted however that in prospective studies where smoking habits were determined
at baseline, the greater tendency of males to quit during follow-up may cause bias
in the reverse direction. It should also be noted that comparison of smoker/never
smoker RRs for men and women does not take account of possible differences in risk
between male and female never smokers, the base groups for these comparisons. A detailed
overall assessment of this aspect is beyond the scope of this paper, and ideally would
involve direct comparison of risk in male and female smokers, with detailed adjustment
for age, smoking characteristics and major potential confounding variables. We note
that Bain et al.
[18] concluded, based on analysis of two large prospective studies and review of results
from six other such studies, that “women do not appear to have a greater susceptibility
to lung cancer than men, given equal smoking exposure”.

Age

While it is clear that absolute risk of lung cancer rises markedly with age, both
in smokers and never smokers, it is far less clear whether the smoker/never smoker
RR also does. Predictions based on the multistage model
[19] suggest that there should be a modest rise, but there is difficulty in establishing
this, especially when the great majority of the studies do not give results by age.
Possible effects of age were investigated in two ways. The first method (see Tables 6 and 9)
was to compare RRs for subjects in specific age groups. Data
here were limited for squamous and adeno, and for all lung cancer suggested a possible
increase in RR with age for current smoking, but not for ever smoking. More reliable
are the comparisons (described in results section O) of RRs for the highest and lowest
age groups within study for ever/current smoking; between-study differences are automatically
controlled for under this approach. These showed a 17% greater risk for the highest
age group (95% CI 10% to 25%). Whether or not an RR was adjusted for age was considered
as a characteristic in the meta-regression analyses, but it never added significantly
to the fixed model for either ever or current smoking for any of the three outcomes.

Race

Although race-specific RRs were entered onto the database where available, few studies
provided such data. For eight studies which provided pairs of comparable RRs
for ever/current smoking, there was no indication that RRs for white people differed
systematically from those for black people (or non-white people). This, of course,
does not rule out the possibility that absolute risks for white people and black people
with similar smoking habits may differ. As our concern was only with RRs for smoking,
and whether these vary by other characteristics, we have not attempted to collect
data comparing absolute risk according to these characteristics, such as white/black
RRs within never smokers, or within smokers. Detailed analysis and discussion of racial
differences in lung cancer risk between black people and white people is therefore
beyond the scope of this paper. Elsewhere Lee
[20] points out that in the USA black men have a higher risk of lung cancer than do white
men. However, interpretation of this difference in terms of effects of smoking is
not straightforward for various reasons. Thus Lee notes that though black people are
more often current smokers, are less likely to quit smoking, smoke cigarettes with
a higher tar level, and have higher cotinine levels, all characteristics predictive
of a higher risk of lung cancer, they are also less likely to have ever smoked, smoke
fewer cigarettes a day and start to smoke later, all characteristics predictive of
a lower risk. Also little or no difference in lung cancer rate is seen between black
and white women. Black people are much more likely than white people to use mentholated
cigarettes, but no evidence of a difference in lung cancer risk associated with mentholation
was found, either in the present analysis or in other reviews
[20,21].

Location and national cigarette tobacco type

A consistent tendency in our meta-analyses was for RRs to be highest in studies in
North America, intermediate in Europe and lowest in Asia, particularly in China. There
was no very clear evidence of a difference between European countries, or between
other countries in Asia, though some of the analyses suggested relatively lower RRs
in Greece and Turkey than in the rest of Europe, and higher RRs in India than in the
rest of Asia. In an attempt to study a possible explanation for this difference we
divided countries into three groups by national cigarette tobacco type. One was the
countries (Australia, Canada, India, South Africa, UK and Zimbabwe) which typically
use flue-cured Virginia tobacco, another was the countries (all except those in the
other two groups) which typically use blended tobacco, and the third included Taiwan
and China (countries which used both types quite commonly or where we lacked confirmed
information). Including this variable into the meta-analyses did not consistently
improve the prediction of our model, a finding which is consistent with the conclusions
of other analyses we have conducted based on national data on lung cancer rates and
smoking frequency [22]. There are, of course, other possible explanations of the clear differences in lung
cancer RRs between continents, including genetic differences, and differences in baseline
rates of the disease.

Study timing

Our meta-regressions generally showed a tendency for RRs to be lower in studies which
started earlier. There may be a number of reasons for this, such as changes in the
relative use of cigarettes and pipes or cigars, and improvement of study quality,
with better standardization of questionnaires and definition of products smoked. However,
we consider the most plausible reason to be changes in patterns of uptake of smoking,
with smokers in earlier born cohorts being less likely to have a lengthy smoking career
than smokers in later born cohorts.

Study type

Though this was only clearly significant in the analyses of ever smoking for all lung
cancer, there was a consistent tendency for RRs to be somewhat higher from prospective
studies than from case–control studies. If this is a true effect, the explanation
for it is unclear.

Number of cases

In order to limit the considerable amount of work needed, we limited attention to
studies involving at least 100 lung cancer cases. Given that smaller studies would
have contributed much less weight to the meta-analyses than would the studies that
were included, we consider this restriction unlikely to have any material effect
on our conclusions. The meta-regression analyses did show a consistent tendency for
RRs to be higher in larger studies, though this was only significant for ever smoking
(all lung cancer p < 0.001, squamous and adeno p < 0.05). This tendency is in the
opposite direction to that predicted from publication bias. The explanation is unclear.

Adjustment for other factors

Generally our analyses showed that adjustment for age and other factors had very little
effect on the meta-analysis estimates of smoking-related RR, whether one considered
the total number of adjustment factors, or the effect of specific factors. This conclusion
of a minimal effect of confounding is consistent with that of a detailed analysis
of data from the huge CPSII prospective study
[23], and means that though the main results we report are based on most-adjusted estimates,
this decision had little or no effect on our conclusions or on the magnitude of our
estimates.

Adjustment for other aspects of smoking is, however, important when considering the
dose-related variables. Though studies rarely, if ever, present results to allow detailed
analysis of the effect of adjustment for one specific aspect of smoking on RRs for
another aspect, we have shown that adjustment for other aspects of smoking (which
typically includes amount smoked) consistently tends to reduce associations with age
of starting to smoke, duration of smoking, years quit and tar level. This is presumably
due to the tendency for earlier starters and high tar smokers to smoke more heavily
than do later starters and low tar smokers, and for lighter smokers to be more ready
to quit smoking. Below, we further discuss the effect of adjustment on results for
type of cigarette.

Product smoked

There was consistent evidence that risk of lung cancer was higher for cigarette only
smokers than for smokers of any product, and substantially higher than for smokers
of pipes only, cigars only or pipes/cigars only. For current smokers, for example,
RRs were 9.57 (7.90-11.59) for cigarettes only, as compared to 4.76 (3.44-6.59) for
pipes/cigars only. Mixed smokers tended to have similar risks to cigarette only smokers.
Interpretation of this finding is difficult as mixed smokers and cigarette only smokers
may have a different total exposure to tobacco, as well as a different cigarette consumption.
Data on the types of cigars or pipes smoked have not been recorded on the database,
but the increased risk is evident in each continent. The results for pipes and cigars
mainly apply to males and to RRs for all lung cancer. Though there are only limited
results by histological type, it is interesting that there is no indication of an
increased risk of adenocarcinoma for pipe and cigar smokers.

Type of cigarette smoked

The conclusions drawn from the results in Table
14 are consistent with those drawn by one of us in a review of the relationship between
lung cancer and type of cigarette conducted in 2001
[24]. This is unsurprising, because the data sets considered are very similar. The conclusions
are also very similar to those of a review by Kabat carried out in 2003
[25].

Comparisons between filter and plain smoking are made more difficult by the variety
of ways in which different reports present their results, but based on the index most
closely equivalent to only filter vs. only plain, the present report shows a reduction
in risk that is significant for all lung cancer (0.69, 95% CI 0.61-0.78) and for squamous
(0.52, 0.40-0.68), though not for adeno (0.84, 0.66-1.08). Significant reductions
in risk for all lung cancer and squamous, but not for adeno were also evident for
the alternative comparisons ever filter vs. only plain, and only filter vs. ever plain.
Our analyses were based on most-adjusted RR estimates, with many of the estimates
adjusted for other aspects of smoking, such as number of cigarettes smoked. In 2001,
a National Cancer Institute monograph
[26] claimed that apparent benefits of filter vs. plain and of low tar vs. high tar cigarettes
may be illusory if RRs are adjusted for daily consumption, as switching to cigarettes
with a lower machine-smoked delivery of tar and nicotine leads to “compensation” for
the reduced nicotine intake by increasing numbers of cigarettes smoked. Lee and Sanders
[27] investigated this claim in detail by comparing RRs for all lung cancer adjusted and
unadjusted specifically for daily cigarette consumption, and concluded that “whether
or not relative risk estimates are adjusted for cigarette consumption is not crucial
to the conclusion of a clear advantage to filter cigarettes and tar reduction”. This
analysis is more precise than that used in this report, but its conclusions are similar,
as we also found adjustment not to affect our overall conclusion that filter vs. plain
cigarette smoking was associated with a lower risk of all lung cancer and of squamous.
It should be noted that although no significant reduction in risk for filter cigarette
smoking was seen for adeno, there was also no evidence of an increase. This would
seem to argue against the claim often made that the observed rise over time in the
incidence of adenocarcinoma relative to squamous cell carcinoma seen in many countries
is due to changes in cigarette design increasing the risk of smoking-related adenocarcinoma.
In this context, it should be noted that though our database contains evidence by
histological type for filter vs. plain cigarette smoking, no such data were found
relating to tar level.

Our conclusion of a higher RR in handrolled vs. manufactured cigarette smokers is
consistent with that of the 2001 review
[24], with the increased risk evident, despite the limited amount of data, for squamous
and adeno as well as for all lung cancer.

Our review also found no difference in risk between smokers of mentholated and non-mentholated
cigarettes, though this was based on data from only three studies, only
one of which provided results by histological type. Though no more recent studies
have reported results by histological type, five further studies have reported results
for all lung cancer, and a recently published systematic review
[20] confirms the lack of apparent effect of cigarette mentholation on the lung carcinogenicity
of cigarettes.

Dose–response relationships

We have investigated the relationship of lung cancer risk to various indices of the
dose–response relationship. We did not record data on our database for pack-years,
as we wished to investigate the separate roles of daily amount smoked and duration
of smoking. Previous work (e.g. [19,28]) has in fact suggested that pack-years is not a valid
measure, as, for example, smokers
of 20 cigs/day for 40 years and smokers of 40 cigs/day for 20 years have very different
smoking RRs despite their identical pack-years. For those indices that we did consider
where there were substantial amounts of data – daily amount smoked, duration, age
of starting to smoke, and time of quit (relative both to current smoking and to never
smoking) – there was very clear evidence that greater exposure leads to greater risk,
not only for all lung cancer, but also for squamous and adeno. The results by time
of quit extend the observation that RRs in ex smokers are intermediate between those
of never smokers and current smokers. Because dose–response results are expressed
in categories of exposure which vary from study to study, there are difficulties in
combining the evidence over studies. We have used two approaches. One is to consider
the RR for the highest vs. lowest level of exposure (where highest and lowest refer
to expected risk, so that early ages of starting, for example, are considered highest).
The other is the key value approach where we consider categories including a specified
level of exposure and not including another specified level. Both approaches have
limitations. The highest vs. lowest approach will vary between studies in the ratio
of exposures considered, while the key value approach, although combining results
relating to different exposures in different studies to a lesser extent, necessarily
omits results from studies with broader categories while somewhat arbitrarily selecting
or discarding RRs from studies with narrow categories. Work is ongoing on a third
approach to fit a dose–response curve to the RRs and estimated dose mid-points of
the categories for each study. This approach is complex, and was considered outside
the scope of the current paper, which was more intended to summarize major features
of the data. However, a future paper is planned which will describe the shape of these
dose–response relationships including characteristics of the curves, such as the estimated
time after quitting by which half the excess risk associated with continued smoking
has disappeared. We note that, when considering RR for time of quitting, the problem
of “reverse causation” needs to be taken into account, as evidenced by the data in
Table 19 showing no decrease in risk compared to current smokers for quitters of about 3 years.
Our analyses also showed that for all lung cancer, risk increased with increasing
tar level and with increasing fraction smoked (or equivalently short butt length),
data here being more limited and non-existent by histological type. As noted earlier,
when discussing cigarette type, the relationship with tar level is not an artefact
of inappropriate adjustment for amount smoked [27], as has been claimed [26].
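
The key value approach described above can be sketched in a few lines. The selection rule coded here (take the category that contains the specified key value and contains none of the other key values) is our reading of the description in this section; the detailed rules are given in the methods and are not reproduced here. The function name, the category boundaries and the RRs in the example are all hypothetical.

```python
from math import inf


def select_key_value_rr(categories, key, others):
    """Pick the RR for the category containing `key` but none of `others`.

    categories: list of (low, high, rr) tuples with inclusive bounds
    key:        the exposure level of interest (e.g. 20 cigs/day)
    others:     the other key values that the category must not include
    Returns the RR, or None when no category qualifies.
    """
    for low, high, rr in categories:
        contains_key = low <= key <= high
        contains_other = any(low <= o <= high for o in others)
        if contains_key and not contains_other:
            return rr
    return None


# HYPOTHETICAL studies reporting RRs by cigarettes smoked per day
narrow_study = [(1, 9, 3.0), (10, 19, 6.0), (20, 39, 12.0), (40, inf, 20.0)]
broad_study = [(1, 24, 5.0), (25, inf, 15.0)]

rr_narrow = select_key_value_rr(narrow_study, key=20, others=[5, 45])
rr_broad = select_key_value_rr(broad_study, key=20, others=[5, 45])
```

Here the study with narrow categories contributes the RR for its 20 to 39 cigs/day category, while the study with broad categories contributes nothing for the key value of 20, because its 1 to 24 category also includes the key value of 5. This is the loss of information from studies with broader categories noted in the text.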

Derivation of RRs

Almost a third of RRs used in meta-analyses were not directly available from the source
or calculated directly from cross-tables of exposure by outcome, and required more
complex methods to derive the required RR. It was reassuring that whether or not the
RR was derived did not (with one minor exception) add predictive power to the main
meta-regression models, suggesting that our extensive use of derived RRs caused no
material bias.

Effect of studies with high RRs or large weight

The statistical analyses investigated the role of various characteristics on the estimated
risk of all lung cancer, squamous and adeno in relation to ever and current smoking,
but generally did not formally test the effect of exclusion of specific studies with
extreme RRs or large weights. An exception was the case of study LIU4 for ever smoking
and all lung cancer, this study not giving data for current smoking or by histological
type. The two sex-specific RRs for this study together contributed 50.9% of the weight
for the 328 available RRs from all the studies, and its exclusion increased the overall
fixed-effect RR from 4.22 (95% CI 4.16-4.28) to 6.47 (95% CI 6.34-6.60). However, there
was little difference in the random-effects estimates, and in the meta-regression
analysis the two LIU4 RRs did not produce unusual standardized residuals, suggesting
that the relatively low RRs from this study (2.76, 2.69-2.83 for males, and 2.86,
2.77-2.95 for females), were due to the characteristics of the study included in the
model (in particular that it was conducted in China) and not due to its unusual results.
While there are other large studies, none involved nearly as many lung cancer cases
as LIU4, and we feel it unlikely that excluding other specific studies would have
had a major effect on our meta-analysis estimates or on our conclusions as to how
RRs varied by exposure, outcome and study and RR characteristics.
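
The mechanics behind the influence of a single heavily weighted study can be illustrated as follows. The sketch derives inverse-variance weights from the two LIU4 RRs and confidence intervals quoted above and combines them with three invented smaller studies. It assumes that standard errors can be recovered from the 95% CIs on the log scale, and it uses the DerSimonian-Laird estimator for the random-effects model, which is an assumption made for illustration and not necessarily the estimator used in our analyses. All function and variable names are hypothetical.

```python
from math import exp, log

Z_95 = 1.96  # normal quantile for a 95% confidence interval


def log_rr_and_weight(rr, lower, upper):
    """Return (log RR, inverse-variance weight) from an RR and its 95% CI."""
    se = (log(upper) - log(lower)) / (2 * Z_95)
    return log(rr), 1.0 / se ** 2


def pooled(estimates):
    """Weighted mean of log RRs, back-transformed to an RR."""
    total_w = sum(w for _, w in estimates)
    return exp(sum(y * w for y, w in estimates) / total_w)


def random_effects_dl(estimates):
    """DerSimonian-Laird random-effects pooled RR (assumed estimator)."""
    k = len(estimates)
    if k < 2:
        return pooled(estimates)
    total_w = sum(w for _, w in estimates)
    mean_fe = sum(y * w for y, w in estimates) / total_w
    q = sum(w * (y - mean_fe) ** 2 for y, w in estimates)
    c = total_w - sum(w ** 2 for _, w in estimates) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    return pooled([(y, 1.0 / (1.0 / w + tau2)) for y, w in estimates])


# The two LIU4 RRs quoted in the text (males, females)
liu4 = [log_rr_and_weight(2.76, 2.69, 2.83),
        log_rr_and_weight(2.86, 2.77, 2.95)]
# HYPOTHETICAL smaller studies, invented only to show the mechanics
others = [log_rr_and_weight(6.0, 4.5, 8.0),
          log_rr_and_weight(7.5, 5.0, 11.25),
          log_rr_and_weight(5.0, 3.5, 7.14)]

everything = liu4 + others
liu4_share = sum(w for _, w in liu4) / sum(w for _, w in everything)
fixed_all = pooled(everything)
fixed_without = pooled(others)
random_all = random_effects_dl(everything)
```

In this toy example the LIU4 pair carries nearly all of the fixed-effect weight, so the fixed-effect estimate sits close to the LIU4 RRs, while the random-effects estimate is pulled less strongly towards them. With only five estimates, the example does not reproduce the finding, based on 328 RRs, of little difference in the random-effects estimates.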

Representativeness

We did not exclude studies on the basis of the population studied. However, most studies
include subjects broadly representative of the general population. A small number
of studies were conducted in miners or in other occupations with a known or suspected
lung cancer risk, such as welding or foundry working. Risky occupation was considered
as a characteristic in the meta-regression models but was never found to be an independent
predictor of RRs associated with ever or current smoking.

Publication bias

It is well known that researchers are more likely to wish to publish, and editors
more likely to accept for publication, studies finding a statistically significant
association between exposure and disease. The published literature may therefore overstate
any true association or produce a false-positive relationship. As part of each meta-analysis
we have carried out Egger’s test of publication bias, though results are generally
shown only in the detailed tables. While evidence for such bias generally is mixed,
the results for all lung cancer suggest that, where significant bias is seen, it is
not in the direction of smaller studies with lower-weight RRs producing higher RRs.
Rather it is, as noted above, the larger studies that tend to produce higher RRs.
The reason for this finding is unclear. It should also be noted that our analyses
are based only on those studies satisfying the inclusion criteria, and that one of
these criteria restricted attention to studies with at least 100 lung cancer cases.

We have not attempted to correct for publication bias for four reasons. First,
we feel that evidence for its existence is not strong. Second, any adjustment for
it seems unlikely to affect our main conclusions. Third, any adjustment for it would
be complicated by the restriction on study size. Finally, any correction for publication
bias would be open to question, as it inevitably involves assumptions that are impossible
to verify.

Bias due to misclassification of smoking status

Another source of bias is misclassification of smoking status. Random misclassification
would dilute the association, as would any tendency for cases to deny or understate
their smoking more than for the general population. Any tendency for current smokers
to claim to be ex-smokers, as might happen in a study conducted in a clinical setting
or where patients have been advised to stop smoking, would tend to inflate the risk
for ex smoking. Adjustment for misclassification would be difficult, as denial rates
are likely to vary by aspects of the study design, the way questions are asked, and
also by sex, age, location and other demographic variables.

Limitations

This review has various limitations, many unavoidable. Lack of access to individual
subject data limits the ability to carry out meta-analyses using similar exposure
indices and confounder adjustment throughout, but obtaining such data was not feasible
given that many studies were conducted years ago. Obtaining a reliable definition of outcome
and exposure is often hindered by incomplete information in the source papers. We
do not consider that limiting attention to studies of 100 cases or more is of particular
importance as results from smaller studies would contribute little weight to the overall
meta-analyses. Limiting attention to studies conducted up to 1999 may be more relevant
for some exposures and issues (particularly the trend in RR over time), though we
feel that our consideration of data from 287 published studies should give a very
reliable overall picture. The problem is that the procedures conducted for this review
were extremely time-consuming and it would take some years to update the database
and include smaller and more recent studies.

It may also be argued that the analyses presented here do not make full use of all
the data collected. This is inevitable, given the extensive amount of information
collected and the need to present the findings in a paper of reasonable length. As
noted, when discussing dose–response, we do plan further analyses. We would also be
willing to make the database available to bona fide researchers for further analysis.

Conclusions

After excluding studies involving fewer than 100 lung cancer cases, we identified 287
epidemiological studies of lung cancer which provided information on risk in relation
to one or more of a defined list of smoking indices [2,3,6,29-689]. Of the 267 independent
principal studies, 262 provided RRs relating to all lung
cancer, 84 provided RRs relating to squamous cell carcinoma, and 86 provided RRs relating
to adenocarcinoma (or to outcomes that are closely equivalent). One major conclusion
is that for each outcome the RRs for all major smoking indices were markedly heterogeneous.

Another conclusion is that RR estimates for ever, current or ex smoking of any product
(or cigarettes if not available) are clearly elevated for all three outcomes. Individual
study RRs virtually all exceed 1.0, and based on random-effects meta-analyses of most-adjusted
RRs, increases were seen for ever smoking (all lung cancer 5.50, CI 5.07-5.96, n = 328
RRs; squamous, 10.47, 8.88-12.33, n = 102; adeno 2.84, 2.41-3.35, n = 107), current
smoking (all lung cancer 8.43, 7.63-9.31; squamous 16.91, 13.14-21.76; adeno 4.21,
3.32-5.34) and ex smoking (all lung cancer 4.30, 3.93-4.71; squamous 8.74, 6.94-11.01;
adeno 2.85, 2.20-3.70). For all lung cancer, RRs were also elevated for cigarettes
only smokers (ever smoking 6.36, 5.33-7.59) and mixed smokers of cigarettes and pipes/cigars
(7.37, 5.97-9.11), though lower for smokers of pipes/cigars only (2.92, 2.38-3.57),
pipes only (3.31, 2.51-4.35) and cigars only (2.95, 1.91-4.56). While pipe and cigar
smoking is associated with an increased risk for squamous, there is no increase for
adeno. The consistency and strength of the associations are compatible with a causal
relationship (except for pipe and cigar smoking and adenocarcinoma). A causal relationship
is also supported by the fact that estimates are generally not materially affected
by adjustment for confounding variables, and by the strong evidence of a dose–response
relationship, with RRs for all outcomes clearly increasing with amount smoked, duration
and earlier starting age, and decreasing with time quit, and for all lung cancer increasing
with tar level and fraction smoked. Relationships were also clearly seen between smoking
and RRs for the other major histological types, small cell carcinoma and large cell
carcinoma.

Our review also provides evidence that risk varied by type of cigarette smoked, with
filter cigarette smokers having lower risks than plain cigarette smokers (a conclusion
not explained by “over-adjustment” for amount smoked), and that handrolled cigarette
smokers have higher risks than manufactured cigarette smokers, though mentholation
of cigarettes seems unrelated to risk. It also shows that various characteristics
of the study and of the RR affect risk estimates. Thus RRs were generally highest
for studies in North America and lowest for Asia, particularly in China, and higher
in later starting, larger and prospective studies. RRs were also somewhat higher in
males than in females, though this may be related to differences in their detailed
smoking habits. There is no clear tendency for the smoking/lung cancer relationship
to vary with age.

This comprehensive review provides further insight into the relationship of smoking
to lung cancer and its major histological types.

Competing interests

PNL, founder of P.N.Lee Statistics and Computing Ltd., is an independent consultant
in statistics and an advisor in the fields of epidemiology and toxicology to a number
of tobacco, pharmaceutical and chemical companies. This includes Philip Morris Products
S.A., the sponsor of this study. BAF and KJC are employees of P.N.Lee Statistics and
Computing Ltd.

Authors’ contributions

BAF and PNL were responsible for planning the study. Literature searches were carried
out by KJC with the assistance of PNL and BAF. Data entry was either carried out by
KJC and checked by BAF, or carried out by BAF and checked by PNL. Where necessary,
difficulties in interpreting published data or in choosing appropriate methods for deriving
RRs were discussed by BAF and PNL. The statistical analyses were conducted by BAF
along lines discussed and agreed with PNL. PNL and BAF jointly drafted the paper,
which was then critically reviewed by KJC. All authors read and approved the final
manuscript.

Acknowledgements

This research was funded by Philip Morris Products S.A. However, the opinions and conclusions
of the authors are their own, and do not necessarily reflect the position of Philip
Morris Products S.A. We thank John Fry for assistance with the statistical analysis.
We also thank Pauline Wassell, Diana Morris and Yvonne Cooper for assistance in typing
the various drafts of the paper and obtaining the relevant literature.
