Task 2c: How to Obtain Confidence Intervals for Geometric Means Using Stata

This task will provide you with a method to obtain confidence
intervals for geometric means.

When the data are highly skewed, you will need to transform them.
A log transformation often makes right-skewed data more symmetric.
The geometric mean is the mean of the log-transformed values,
exponentiated back to the original units.
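The definition can be written out as below. This is the standard textbook formula rather than anything specific to Stata. The weighted form, with sampling weights w_i, is what a weighted mean of the logs estimates. The small worked example shows the geometric mean sitting below the arithmetic mean when the values are spread out on the right.

```latex
% x_1, ..., x_n: positive values (e.g., triglycerides); w_i: sampling weights
\mathrm{GM} = \exp\!\Big(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\Big) = \Big(\prod_{i=1}^{n} x_i\Big)^{1/n},
\qquad
\mathrm{GM}_w = \exp\!\Big(\frac{\sum_{i} w_i \ln x_i}{\sum_{i} w_i}\Big)

% Worked example with x = 50, 100, 200:
\frac{\ln 50 + \ln 100 + \ln 200}{3} = \frac{\ln 10^{6}}{3} = \ln 100
\;\Rightarrow\; \mathrm{GM} = 100, \qquad \text{arithmetic mean} = \frac{350}{3} \approx 116.7
```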

In this example, you will obtain geometric means for the fasting
serum triglyceride variable (lbxtr). You can see that fasting
triglycerides are right-skewed by looking at the distribution with
this command:

sum lbxtr [w=wtsaf4yr], det

The output shows that the median is 106 mg/dL but the mean is
135 mg/dL, because the long right tail pulls the mean upward. The
geometric mean is therefore a better representation of central
tendency than the arithmetic mean.
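To see the same pattern without the NHANES file, the sketch below simulates right-skewed (lognormal) values. It is a minimal, unweighted illustration. The variable names tg and lntg and the distribution parameters are hypothetical, and the output does not reproduce NHANES results. The sketch shows the mean lying above the median, and exp(mean of the logs) landing near the median.

```stata
* Simulated right-skewed (lognormal) data standing in for lbxtr; not NHANES values
clear
set seed 12345
set obs 5000
generate double tg = exp(rnormal(4.7, 0.5))   // hypothetical triglyceride-like values

* Right skew: the arithmetic mean exceeds the median
summarize tg, detail
display "mean = " r(mean) "   median = " r(p50)

* Geometric mean = exp(mean of the logs); for lognormal data it is close to the median
generate double lntg = ln(tg)
summarize lntg
display "geometric mean = " exp(r(mean))
```

With real data, you would add the weight (for example [w=wtsaf4yr]) to both summarize commands.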

Use the Stata command svy: mean to obtain the mean and its standard
error for the log-transformed fasting serum triglyceride variable.
Then use ereturn display, eform( ) to display the exponentiated
results (geometric mean, standard error, and confidence interval).
The explanations in the summary table below provide an example
that you can follow.
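The quantities that eform reports can be summarized as follows. This is our reading of the usual back-transformation for exponentiated estimates, not a formula taken from this tutorial. The point estimate and the confidence limits are exponentiated from the log scale. The standard error is approximated by the delta method. As a result, the interval is not symmetric around the geometric mean.

```latex
% \bar{y} = survey-weighted mean of y_i = \ln(x_i); SE(\bar{y}) = its linearized SE
% df = design degrees of freedom (number of PSUs minus number of strata)
\widehat{\mathrm{GM}} = e^{\bar{y}}, \qquad
\mathrm{SE}\big(\widehat{\mathrm{GM}}\big) \approx e^{\bar{y}}\,\mathrm{SE}(\bar{y})

95\%\ \mathrm{CI} = \Big[\, e^{\bar{y} - t_{df,\,0.975}\,\mathrm{SE}(\bar{y})},\;
                           e^{\bar{y} + t_{df,\,0.975}\,\mathrm{SE}(\bar{y})} \,\Big]
```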

WARNING

There are several things you should be aware of while analyzing
NHANES data with Stata. Please see the
Stata Tips page to review them
before continuing.

Step 1: Use svyset to define survey design variables

Remember that you need to define the survey design with svyset before using the svy series of commands.
The general format of this command is below:

svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)

To define the survey design variables for your fasting serum triglyceride analysis, use three variables:
the weight variable for four years of MEC data obtained from persons who fasted nine hours and were
examined in the morning at the MEC (wtsaf4yr), the PSU variable (sdmvpsu),
and the strata variable (sdmvstra). The vce option specifies the
method for calculating the variance. The default is "linearized", which is
Taylor linearization. Here is the svyset command for this analysis:

svyset [w=wtsaf4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
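Before running any svy command, it can help to confirm what svyset recorded. The sketch below builds a small simulated design (15 strata, 2 PSUs per stratum). It uses the documented form of svyset, with the PSU variable first and the weight given as a pweight. It then checks the settings with svyset (no arguments) and svydescribe. The variables stra, psu, and wt are hypothetical stand-ins. For the NHANES data, the equivalent documented form should be svyset sdmvpsu [pweight=wtsaf4yr], strata(sdmvstra) vce(linearized).

```stata
* Hypothetical simulated design (not NHANES): 15 strata x 2 PSUs x 40 persons
clear
set seed 2024
set obs 1200
generate int stra = ceil(_n/80)                  // strata 1-15
generate int psu  = 1 + mod(ceil(_n/40) - 1, 2)  // PSU 1 or 2 within each stratum
generate double wt = 1000 + 29000*runiform()     // hypothetical sampling weights

* Documented svyset syntax: PSU variable first, weight as a pweight
svyset psu [pweight=wt], strata(stra) vce(linearized)

* Confirm the design before running svy commands
svyset          // with no arguments, redisplays the current settings
svydescribe     // lists strata, PSUs, and observations per PSU
```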

Step 2: Create log transformed variable

The gen command is used to create new variables. The
ln() function returns the natural log of its argument, or a
missing value if the argument is zero, negative, or missing.
The general format of this command is below.

gen logvar=ln(var)

In this example, you will create the log-transformed
triglycerides variable (lnlbxtr) from the triglycerides
variable (lbxtr) using this command:

gen lnlbxtr=ln(lbxtr)
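Because ln() silently returns missing for zero or negative input, a quick check after creating the log variable can catch values that dropped out. The sketch below uses simulated data with one missing value and one zero planted on purpose. The names tg and lntg are hypothetical stand-ins for lbxtr and lnlbxtr.

```stata
* Simulated stand-in for lbxtr (not NHANES), with one missing value and one zero
clear
set seed 7
set obs 100
generate double tg = exp(rnormal(4.7, 0.5))
replace tg = . in 1      // missing, e.g., no fasting sample
replace tg = 0 in 2      // nonpositive value; ln(0) is missing in Stata
generate double lntg = ln(tg)

* Nonmissing values that could not be logged
count if missing(lntg) & !missing(tg)

* Flag any nonpositive values without stopping the do-file
capture assert tg > 0 if !missing(tg)
if _rc display "warning: some nonmissing values are zero or negative"
list tg lntg in 1/3
```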

Step 3: Use svy:mean to generate geometric means and standard errors in Stata

Now that the survey design has been defined with svyset, you can use the Stata command svy: mean
to generate means and standard errors. To display the
geometric mean in the original units of the variable, use the
ereturn display command with the eform option. The general commands for obtaining weighted
means and standard errors for a subpopulation are below.

svy: mean varname, subpop(if condition)

ereturn display, eform(label)

Use the svy: mean command with the log-transformed
triglyceride variable (lnlbxtr) to estimate the geometric mean
of triglycerides for people age 20 years and older. Use the subpop( )
option to select the subpopulation for analysis, rather than
dropping other observations while preparing the data file.
Dropping observations can make the variance estimates incorrect,
because the full survey design is no longer available. This
example uses an if statement to define the subpopulation
based on the value of the age variable (ridageyr). Another option
is to create a dichotomous variable where the subpopulation of
interest is assigned a value of 1 and everyone else is assigned a
value of 0.
Use ereturn display, eform( ) to display the geometric mean in the
original units of triglyceride (i.e., the exponentiated coefficient,
labeled geo_mean), along with its standard error and confidence
interval.

svy: mean lnlbxtr, subpop(if ridageyr>=20 & ridageyr<.)

ereturn display, eform(geo_mean)

Output of svy:mean
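To check what eform is doing, you can rebuild the geometric mean, its delta-method standard error, and the t-based interval from e(b), e(V), and e(df_r). The sketch below does this on simulated survey data. It also shows that a 0/1 indicator in subpop() gives the same estimate as the if condition. All variable names (stra, psu, wt, age, lntg, adult) are hypothetical, and the data are not NHANES.

```stata
* Simulated stand-in data (not NHANES); variable names are hypothetical
clear
set seed 99
set obs 1200
generate int stra = ceil(_n/80)                  // 15 strata
generate int psu  = 1 + mod(ceil(_n/40) - 1, 2)  // 2 PSUs per stratum
generate double wt = 1000 + 29000*runiform()
generate int age = 8 + floor(80*runiform())      // ages 8-87
generate double lntg = rnormal(4.7, 0.5)         // already on the log scale
svyset psu [pweight=wt], strata(stra) vce(linearized)

* Same pattern as the tutorial: log-scale mean, then exponentiated display
svy: mean lntg, subpop(if age>=20 & age<.)
ereturn display, eform(geo_mean)

* Rebuild the exponentiated results by hand (scalar() avoids variable-name clashes)
scalar sc_b  = _b[lntg]
scalar sc_se = _se[lntg]
scalar sc_t  = invttail(e(df_r), 0.025)
display "geometric mean    = " exp(scalar(sc_b))
display "SE (delta method) = " exp(scalar(sc_b))*scalar(sc_se)
display "95% CI lower      = " exp(scalar(sc_b) - scalar(sc_t)*scalar(sc_se))
display "95% CI upper      = " exp(scalar(sc_b) + scalar(sc_t)*scalar(sc_se))

* Alternative: a 0/1 indicator for the subpopulation gives the same estimate
generate byte adult = (age>=20 & age<.)
svy: mean lntg, subpop(adult)
display "geometric mean (indicator) = " exp(_b[lntg])
```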

Step 4: Use the over option of the svy:mean command to generate geometric means and standard errors for different subgroups in Stata

You can also add the over() option to the svy:mean
command to generate the means for different subgroups. To display
the geometric mean in the original units of the variable, use the
ereturn display command with the eform option. Here is the general format of
these commands for this example:

svy: mean varname, subpop(if condition) over(var1 var2)

ereturn display, eform(label)

As in Step 3, use the svy: mean command with the log-transformed
triglyceride variable (lnlbxtr) and the subpop( ) option to estimate
the geometric mean of triglycerides for people age 20 years and older.
Add the over option to get stratified results. This example produces
estimates by gender and age. Then use ereturn display, eform( ) to
display the geometric means in the original units of triglyceride
(i.e., the exponentiated coefficients, labeled geo_mean), along with
their standard errors and confidence intervals.

Output of svy:mean command with over option
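A minimal sketch of the over() pattern follows, on simulated data. It assumes a hypothetical gender variable sex (1 = male, 2 = female) and a hypothetical age-group variable agegrp. The age groups are built with recode to match the groups in the Step 5 table. With NHANES data, you would substitute the survey's own gender variable, ridageyr, and lnlbxtr. Nothing here reproduces the NHANES estimates.

```stata
* Simulated stand-in data (not NHANES); sex and agegrp are hypothetical names
* sex: 1 = male, 2 = female (assumed coding)
clear
set seed 31
set obs 2400
generate int stra = ceil(_n/160)                 // 15 strata
generate int psu  = 1 + mod(ceil(_n/80) - 1, 2)  // 2 PSUs per stratum
generate double wt = 1000 + 29000*runiform()
generate int age = 8 + floor(80*runiform())      // ages 8-87
generate byte sex = 1 + (runiform() < 0.5)
generate double lntg = rnormal(4.7, 0.5)
svyset psu [pweight=wt], strata(stra) vce(linearized)

* Age groups as in the Step 5 table; ages under 20 become missing
recode age (20/29=1) (30/39=2) (40/49=3) (50/59=4) (60/69=5) (70/max=6) (else=.), generate(agegrp)

* Totals by sex, then every sex-by-age-group combination
svy: mean lntg, subpop(if age>=20 & age<.) over(sex)
ereturn display, eform(geo_mean)
svy: mean lntg, subpop(if age>=20 & age<.) over(sex agegrp)
ereturn display, eform(geo_mean)
```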

Step 5: Review Output

Here is a table summarizing
the output for the variable fasting triglyceride (lbxtr):

Summary output for the variable fasting triglyceride (lbxtr)

Subpopulation analyzed   Number of respondents with data  Geometric mean (mg/dL)  95% confidence interval (mg/dL)
Adults age 20 and older  3,982                            122                     118-126
Men age 20 and older     1,893                            130                     124-137
Women age 20 and older   2,089                            114                     111-118
Men age 20-29                                             103                     96-111
Men age 30-39                                             122                     115-129
Men age 40-49                                             153                     136-172
Men age 50-59                                             148                     135-162
Men age 60-69                                             141                     129-154
Men age 70 and older                                      125                     117-134
Women age 20-29                                           97                      91-104
Women age 30-39                                           102                     96-107
Women age 40-49                                           104                     96-112
Women age 50-59                                           133                     123-143
Women age 60-69                                           144                     136-152
Women age 70 and older                                    142                     133-151

According to the stratified analysis, the geometric mean fasting
triglyceride level for men (130 mg/dL) is 16 mg/dL higher than for
women (114 mg/dL). Confidence intervals can also be used as a first
glance to see whether two groups are different. For example, the CIs
for the geometric mean serum triglycerides of all males (CI 124, 137)
and all females (CI 111, 118) do not overlap, indicating that the two
groups are likely to be different. However, a formal test of
statistical difference, such as a t-test, should be performed to
determine whether the difference between two population subgroups is
significant. Overlapping intervals, in particular, do not by
themselves show that two groups are the same. The geometric mean for
males increases up to age 40-49 years and then declines. The
geometric mean for females increases up to age 60-69 years and then
declines slightly. The confidence intervals are generally wider for
males than for females and are widest for males age 40-49 years
(136-172). This indicates a less precise estimate for that group,
which can reflect more variability in serum triglycerides or fewer
respondents.
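One design-based way to carry out the formal test is sketched below. It is used here in place of a plain t-test. The approach regresses the log outcome on a group indicator with svy: regress and then exponentiates the coefficient with lincom, eform. The exponentiated coefficient is the ratio of the two geometric means, and its t statistic tests whether they are equal. The data are simulated so that the men's geometric mean is about 14% higher (130/114). The names sex (1 = male, 2 = female, assumed coding), age, lntg, stra, psu, and wt are hypothetical.

```stata
* Simulated stand-in data (not NHANES); sex: 1 = male, 2 = female (assumed coding)
clear
set seed 31
set obs 2400
generate int stra = ceil(_n/160)                 // 15 strata
generate int psu  = 1 + mod(ceil(_n/80) - 1, 2)  // 2 PSUs per stratum
generate double wt = 1000 + 29000*runiform()
generate int age = 8 + floor(80*runiform())      // ages 8-87
generate byte sex = 1 + (runiform() < 0.5)
* Log-scale means differ by ln(130/114), so men's geometric mean is about 14% higher
generate double lntg = rnormal(4.7 + ln(130/114)*(sex==1), 0.5)
svyset psu [pweight=wt], strata(stra) vce(linearized)

* Regress the log outcome on sex within the adult subpopulation; men are the base group
svy, subpop(if age>=20 & age<.): regress lntg i.sex

* exp(_b[2.sex]) is the women-to-men ratio of geometric means; its t test is a
* design-based test of equal geometric means
lincom 2.sex, eform
```

A ratio below 1 with a small p-value would indicate a lower geometric mean for women than for men.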