Multiple logistic regression analysis of cigarette use among high school students

Joseph Adwere-Boamah
Alliant International University

ABSTRACT

A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from the 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables were included in the model: a) race, b) frequency of cocaine use, c) initial cigarette smoking age, d) feeling sad or hopeless, and e) physically inactive behavior. The results of the logistic regression analysis showed that the full model, which considered all five independent variables together, was statistically significant. The strongest predictors of youth smoking behavior were race, frequency of cocaine use and physically inactive behavior. For example, the odds of smoking are increased by a factor of 5.0 if the student is White rather than African American, controlling for the other variables in the model. The logistic model explained about 31% of the variance in current frequent cigarette use among the high school students. It correctly classified 93% of the cases. The key finding is that the selected variables are important correlates of frequent cigarette use among high school students.

Keywords: logistic regression, CDC, Youth Risk Behavior Surveillance System, cigarette smoking

Multiple logistic regression analysis, Page 1

"Tobacco use is the single most preventable cause of disease, disability, and death in the United States. Each year, an estimated 443,000 die prematurely from smoking or exposure to secondhand smoke, and another 8.6 million have a serious illness caused by smoking" (CDC, p.1).

INTRODUCTION

The above quotation from the CDC documents the harmful effects of cigarette smoking. Despite the risks associated with smoking, the CDC (2010) estimates that 46 million U.S. adults smoke cigarettes. The Centers for Disease Control and Prevention (CDC), part of the Department of Health and Human Services, maintains that cigarette use is the leading preventable cause of death in the United States. The CDC lists tobacco use by young adults as one of its priority health-risk behaviors (CDC, MMWR, 2004). Because most smokers initiate cigarette use during adolescence (Hersch, 1998), the prevalence of both past-year and past-month smoking peaks during smokers' late teens or early twenties. The adverse long-term effects of cigarette smoking documented above by the CDC are inversely related to the age of smoking initiation: the earlier a person starts smoking, the greater the adverse effects. Accordingly, the CDC set a national objective to reduce the prevalence of cigarette use among high school students to less than 16% by the year 2010 (CDC, MMWR, 2006).

The CDC has developed a Youth Risk Behavior Surveillance System (YRBSS) to monitor health-risk behaviors among American youth. The system monitors six categories of high-risk behaviors that, according to the CDC, contribute substantially to "the leading causes of death, disability, and social problems among youth and adults in the United States" (CDC, MMWR, 2010, p. 1). The categories of high-risk behaviors are:

1. Behaviors that contribute to unintentional injury and violence,
2. Alcohol and other drug use,
3. Sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases including HIV infection,
4. Physical inactivity,
5. Obesity and dietary behaviors, and, most importantly for this study,
6. Tobacco use.

The purpose of this study was to assess the impact of a set of predictors on the cigarette smoking behavior of high school students. Specifically, the target outcome behavior of interest is Current Frequent Cigarette Use (CFCU) among the youth. This study sought to answer the question: Can CFCU among the youth be accurately predicted from Race, Frequency of Cocaine Use (FCU), the age at which the youth smoked a whole cigarette for the first time [Initial Cigarette Smoking Age (ICSA)], feelings of sadness or hopelessness [Felt Sad or Hopeless (FSH)], and Physically Inactive Behavior (PIB)? See Figure 1 for the specific CDC questionnaire items and the predictors or independent variables of this study.

The paper is structured as follows. First, I briefly review the methodological strategy employed and the data source for this study. Subsequently, I present the results of the data analysis through descriptive statistics and a logistic regression indicating the significant predictors. Finally, I summarize the study results.

METHOD

The dependent or outcome variable of interest, Current Frequent Cigarette Use, was constructed as a yes/no dichotomous indicator of smoking status based on the
response to the CDC (2009) survey questionnaire item: "During the past 30 days, on how many days did you smoke cigarettes?" Respondents who answered that they smoked on 20 to 29 days or on all 30 days were classified as current frequent cigarette users (coded yes; less frequent smokers and non-smokers were coded no). The categorical dependent variable of the study necessitated the use of a multiple logistic regression model for investigating whether the likelihood of current frequent cigarette use among the youth was related to the selected predictors above (Menard, 2010; Hosmer & Lemeshow, 2000). The specific logistic regression model fitted to the data was:

Logit (CFCU) = b0 + b1 (Hispanic) + b2 (White) + b3 (ICSA) + b4 (FSH) + b5 (FCU) + b6 (PIB)

where b0 is a constant and b1, b2,... b6 are logistic coefficients or estimates for the parameters β1, β2,... β6. Race was a design variable coded Hispanic = 1, White = 2, with African American as the reference category. PIB and FSH are categorical variables and were dummy coded 0, 1.

DATA SOURCE

This study drew from the CDC's 2009 National Youth Risk Behavior Survey (YRBS), a questionnaire containing items designed to elicit information from high school students about the aforementioned categories of health-related risk behaviors, along with basic demographic information. The sampling frame for the survey consisted of all public and private schools with students in grades 9-12. Representative samples of students were drawn from those grades. All the questionnaires were self-administered. Student response rate for the national survey was 88%, approximately students. For this study, usable data from students were analyzed.

DATA ANALYSIS

Prior to analysis of the data, the dependent variable of interest, current frequent cigarette use, which had a binary response of Yes/No, was recoded 0, 1 (No/Yes). The coding change was made so that the value 1 reflects the predicted target category, current frequent cigarette use.
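The coding scheme and model specification described above can be illustrated with a minimal Python sketch. It is not the author's SPSS procedure. The response labels, the function names (`code_cfcu`, `race_indicators`, `predicted_probability`) and the coefficient list `b` are hypothetical, introduced only for illustration; the sketch assumes the race design variable enters the model as two 0/1 indicators, matching the b1 (Hispanic) and b2 (White) terms.

```python
import math
from typing import Sequence, Tuple

# Assumed response labels for the 30-day smoking item; the exact labels in the
# CDC data file may differ.
FREQUENT_RESPONSES = {"20 to 29 days", "All 30 days"}
RACES = {"African American", "Hispanic", "White"}


def code_cfcu(response: str) -> int:
    """Recode the smoking item: 1 = current frequent cigarette user, 0 = otherwise."""
    return 1 if response in FREQUENT_RESPONSES else 0


def race_indicators(race: str) -> Tuple[int, int]:
    """Return (Hispanic, White) indicators; African American is the reference (0, 0)."""
    if race not in RACES:
        raise ValueError(f"unexpected race category: {race!r}")
    return (int(race == "Hispanic"), int(race == "White"))


def predicted_probability(b: Sequence[float], race: str, icsa: float,
                          fsh: int, fcu: float, pib: int) -> float:
    """Evaluate Logit(CFCU) = b0 + b1*Hispanic + ... + b6*PIB and convert to a probability."""
    hispanic, white = race_indicators(race)
    logit = (b[0] + b[1] * hispanic + b[2] * white
             + b[3] * icsa + b[4] * fsh + b[5] * fcu + b[6] * pib)
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder coefficients (all zero), NOT estimates from the paper.
placeholder_b = [0.0] * 7
print(code_cfcu("All 30 days"), race_indicators("White"))
print(predicted_probability(placeholder_b, "White", icsa=3, fsh=1, fcu=0, pib=1))
```

Using two indicators for a three-category variable is what gives the fitted model six slope coefficients for five predictors.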

As suggested by Menard (2010), a preliminary analysis of the data was performed to check the assumptions of logistic regression with respect to the selected predictors of the study. ICSA, FSH, FCU, PIB and Race were subjected to linear regression analysis to evaluate multicollinearity among the predictors or independent variables. Multicollinearity among predictors in logistic regression threatens the validity of the model. In particular, it inflates the standard errors of the regression coefficients and so undermines the statistical tests of those coefficients (Garson, 2010). The results of the analysis showed that the data did not violate the multicollinearity assumption. The tolerance value of each independent variable was greater than .720, well above the value of .10 below which multicollinearity is considered a problem (Pallant, 2007). The lack of multicollinearity among the independent variables was also supported by the obtained variance inflation factor (VIF) values, which were all well below the cut-off value of 10 (Field, 2005). The VIF values of
the variables ranged from to. After the preliminary analysis of the data, the binary logistic regression procedure in SPSS was used to determine whether the likelihood of CFCU could be predicted from the independent variables. Data from high school students were included in this analysis.

RESULTS

Sample description: The ages of the students ranged from 14 to 18 years. Among the respondents, approximately 70% were White, 17% African American and 13% Hispanic. About 11% of the students smoked a whole cigarette for the first time before age 13. About 46% of the students had ever smoked a cigarette. The prevalence of Ever Smoked Cigarettes was higher among Hispanic students (51.6%) than among White (46.1%) and African American (43.5%) students. Ever smokers most commonly first smoked at age 13 or 14 (25%); the corresponding percentages for first smoking at age 11 or 12 and at age 15 or 16 were 12% and 23%, respectively. The proportion of students who had tried smoking increased with age. For example, by the age of 18, approximately 53% of the youth had tried smoking.

In contrast to the relatively high prevalence (51.6%) of Ever Smoked Cigarettes among Hispanic youth, a relatively small percentage of Hispanics had ever smoked cigarettes daily, i.e., had ever smoked at least one cigarette every day for 30 days. The prevalence of Ever Smoked Cigarettes Daily was higher among White students (13.7%) than among African American (4.3%) and Hispanic (6.3%) students.

The results of the logistic regression analysis show that the full model, which considered all five independent variables together, was statistically significant, χ2 = , df = 6, N = 11424, p < .001. This implies that the odds of a high school student indicating that he was a current frequent cigarette user were related to the five independent variables: Race, ICSA, FSH, FCU, and PIB.
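The link between a model chi-square, its degrees of freedom and the reported p < .001 can be sketched in Python using only the standard library. The function name `chi2_sf_even_df` and the chi-square value used below are hypothetical: the paper's own statistic is not reproduced here, so the number is a placeholder chosen purely to show the calculation. The sketch relies on the closed-form upper tail of the chi-square distribution for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


# df = 6: two race indicators (Hispanic, White) plus ICSA, FSH, FCU and PIB.
DF_MODEL = 6

# Placeholder statistic for illustration only; NOT the value from the paper.
hypothetical_chi2 = 50.0
p_value = chi2_sf_even_df(hypothetical_chi2, DF_MODEL)
print(p_value < 0.001)
```

Under these assumptions, any model chi-square above roughly 22.46 on 6 degrees of freedom corresponds to p < .001.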
The model correctly classified approximately 93% of the cases. The pseudo R-squared estimates indicate that the model explained between 13% (Cox & Snell R Squared) and 31% (Nagelkerke R Squared) of the variance in current frequent cigarette use. Table 1 presents a summary of the raw score binary logistic regression coefficients, Wald statistics, and odds ratios [Exp(B)] along with their 95% confidence intervals. The Wald statistics indicate that all the variables significantly predict current frequent cigarette use.

The strongest predictor of CFCU was race, in particular being White. The odds ratio for White was 5.0, i.e., the odds of a high school student indicating that he is a current frequent cigarette user are increased by a factor of 5.0 if the student is White rather than African American, adjusting for the effects of the other predictors in the model. The other predictors that made a significant contribution to the model were frequency of cocaine use, physically inactive behavior, feeling sad or hopeless and initial cigarette smoking age.

The older the youth was when he first smoked a whole cigarette, the more likely he was to report being a current frequent cigarette user. The predictor (ICSA) recorded an odds ratio of 1.6. Thus, the odds of smoking cigarettes frequently, compared to not smoking them frequently, increase by a factor of 1.6 for each unit increase in the age at which the youth first smoked a whole cigarette. In other words, the odds of
current frequent cigarette use increase by 60% for each unit increase in ICSA (Warner, 2008).

Cocaine use and the cigarette smoking behavior of young people were strongly related. Frequency of cocaine use (FCU) predicts smoking behavior (p < .001). For a unit increase in the number of times the youth used any form of cocaine, including powder, crack, or freebase, the odds of smoking cigarettes frequently, i.e., on 20 to 29 days or on all 30 days in one month, are increased by a factor of 3.5 when all other variables are held constant.

Feeling sad or hopeless recorded an odds ratio of 1.7. This indicates that the odds of smoking cigarettes frequently were about 1.7 times higher for high school students who felt sad or hopeless than for those who did not feel that way.

As shown in Table 1, physically inactive behavior is implicated in student smoking. It recorded an odds ratio of about 1.7. In other words, students who were physically inactive had higher odds of smoking frequently (about 1.7 times as high) than students who were physically active, controlling for the other variables in the model.

SUMMARY

Multiple logistic regression was used to jointly examine the influence of race, frequency of cocaine use, physically inactive/active behavior, initial cigarette smoking age and feeling sad or hopeless on current frequent cigarette use. The key finding is that the selected variables are important correlates of current frequent cigarette use among high school students. The strongest predictors of youth smoking behavior are race, frequency of cocaine use and physically inactive behavior. For example, the odds of smoking are increased by a factor of 5.0 if the student is White rather than African American. The logistic model explained about 31% of the variance in current frequent cigarette use among the high school students. It correctly classified 93% of the cases.
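The odds-ratio interpretations reported in the results can be checked with a short arithmetic sketch in Python. The function names are hypothetical, and the 5% baseline probability in the last example is an illustrative assumption, not a figure from the study. It is included only to show that an odds ratio multiplies odds rather than probabilities.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds implied by an odds ratio, e.g. 1.6 -> about 60."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_for_k_units(odds_ratio: float, k: float) -> float:
    """Odds ratio for a k-unit increase in a predictor: per-unit ratios multiply."""
    return odds_ratio ** k


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by the odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# ICSA: odds ratio of 1.6 per unit, as reported in the text.
print(round(percent_change_in_odds(1.6), 1))
print(round(odds_ratio_for_k_units(1.6, 2), 2))

# White vs. African American: odds ratio of 5.0, applied to an ASSUMED
# baseline probability of 0.05 (hypothetical, for illustration only).
print(round(apply_odds_ratio(0.05, 5.0), 3))
```

With the assumed 5% baseline, odds five times higher correspond to a probability of about 21%, not 25%, which is why the results are stated in terms of odds rather than probabilities.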

Table 1- Logistic regression predicting the likelihood of high school students reporting frequent cigarette use.

Columns: Predictor, B, S.E., Wald, df, p, Odds Ratio, 95% C.I. Lower, 95% C.I. Upper
Rows: Race (White, Hispanic), FCU, PIB, FSH, ICSA, Constant

Figure 1- Selected predictors from CDC's 2009 National Youth Risk Behavior Survey (YRBS).

CDC questionnaire item | Variable name | Acronym
Q5. What is your race? | Race |
Q23. During the past 12 months, did you ever feel so sad or hopeless almost every day for two weeks or more in a row that you stopped doing some usual activities? | Felt sad or hopeless | FSH
Q29. How old were you when you smoked a whole cigarette for the first time? | Initial cigarette smoking age | ICSA
Q50. During the past 30 days, how many times did you use any form of cocaine, including powder, crack, or freebase? | Frequency of cocaine use | FCU
Q80. During the past 7 days, on how many days were you physically active for a total of at least 60 minutes per day? | Physically inactive/active behavior | PIB
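The columns of Table 1 are linked by standard formulas: the Wald statistic is (B / S.E.) squared, the odds ratio is exp(B), and the 95% confidence limits are exp(B ± 1.96 × S.E.). The Python sketch below shows these relationships. The function name `table_row` is hypothetical, and the standard error of 0.2 is an invented placeholder, not a value from the paper; only the odds ratio of 5.0 for White comes from the text.

```python
import math


def table_row(b: float, se: float, z: float = 1.96) -> dict:
    """Derive the Wald statistic, odds ratio and 95% CI from a coefficient and its S.E."""
    return {
        "B": b,
        "SE": se,
        "Wald": (b / se) ** 2,             # chi-square with 1 df for a single coefficient
        "OR": math.exp(b),                 # Exp(B)
        "CI_lower": math.exp(b - z * se),  # lower 95% limit for the odds ratio
        "CI_upper": math.exp(b + z * se),  # upper 95% limit for the odds ratio
    }


# B recovered from the reported odds ratio of 5.0 for White: B = ln(5.0).
b_white = math.log(5.0)
# Hypothetical standard error, for illustration only.
se_white = 0.2

row = table_row(b_white, se_white)
for name, value in row.items():
    print(f"{name}: {value:.3f}")
```

A confidence interval for the odds ratio that excludes 1.0 corresponds to a significant Wald test at the .05 level.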
