R Basics: Practicing Common Statistical Analysis

Introduction Into R for Social Scientists

Practicing Common Statistical Analysis

Dr. Elze G. Ufkes

Benjamin Ziepert

1 Instructions

In the current session you will practice common statistical methods such as correlation and regression. You will need the material from the website https://benjaminziepert.com/teaching and your knowledge from the DataCamp assignments. During the session you will create an R script file, which is required for passing this course.

You have to submit the working R script file before 27th November 2018 to b.ziepert@student.utwente.nl to pass the R lectures. All steps that are in bold in the current handout have to be included in the script.

2 Research

For our analysis we will use data from experiments that studied the relationship between movement and cognition. Students had the task of transporting supposed cocaine or flour while avoiding border guards. You can see the movement data, measured with GPS, below. The students walked three or four rounds, and after each round they filled in a questionnaire about their feelings and thoughts. For instance, we checked whether participants would walk closer together when they had the feeling they were doing something illegal.

The following animation shows a participant walking four rounds from the start (left) to the finish (right). A white path means the participant transported flour, and a black path means the participant transported cocaine. After passing the finish on the right, the participants answered the questionnaires.

The columns starting with "com_" are the measurements (components) from the questionnaire. All items were measured on a Likert scale from 1 to 7. The columns sat1 - sat5 are the questions for the component com_sat and the columns dt1 - dt3 are the questions for com_dt.

Further, the columns starting with "ah_" are the measurements from the movement (GPS) sensors.

To see the labels of the columns you can run the following code. Please load the "Hmisc" package to make the code work.
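As a minimal sketch of what such a call could look like, the example below builds a tiny made-up data frame (the name example_data and its values are assumptions, not the course data), attaches two labels mentioned elsewhere in this handout, and prints them with label() from "Hmisc". With the course data the labels are already attached, so only the last line would be needed.

```r
library("Hmisc")

# Small made-up data frame for illustration only (not the course data)
example_data <- data.frame(
  com_sat     = c(3.2, 5.0, 4.4),
  ah_mean_kmh = c(4.8, 5.1, 4.5)
)

# Attach a label to each column (the course data already carries its labels)
label(example_data$com_sat)     <- "Alertness to Being Target of Guards"
label(example_data$ah_mean_kmh) <- "Speed"

# Show the label of every column in the data frame
label(example_data)
```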

To write an answer in your script you can use the comment indicator #. If you start a line with a #, then this line will not be executed by R. Please check the example below.

# How frightened was the most frightened participant?
# The most frightened participant reported a score of ...

6 Reliability

To measure the internal reliability of our questions we can calculate Cronbach's alpha with the function alpha() from the package "psych".

Be aware that we also use the package "ggplot2", and both "ggplot2" and "psych" have a function alpha(). To tell R that we want to use alpha() from "psych", we add "psych::" before the function name: psych::alpha().

Since we want to measure the internal reliability of all questions from the component "Alertness to Being Target of Guards" we select the columns with the questions sat1 - sat5.

library("psych")

##
## Attaching package: 'psych'

## The following object is masked from 'package:Hmisc':
##
## describe

## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
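The reliability check described above could be sketched as follows. The data frame example_data is simulated purely for illustration (its name and values are assumptions); with the course data you would select the columns sat1 - sat5 from your own data frame instead.

```r
# Simulated answers to five items on a 1-7 scale (illustration only)
set.seed(42)
base_score <- sample(1:7, 40, replace = TRUE)
add_noise <- function(score) {
  # Shift each answer by -1, 0 or 1 and keep it inside the 1-7 range
  pmin(pmax(score + sample(-1:1, length(score), replace = TRUE), 1), 7)
}
example_data <- data.frame(
  sat1 = add_noise(base_score),
  sat2 = add_noise(base_score),
  sat3 = add_noise(base_score),
  sat4 = add_noise(base_score),
  sat5 = add_noise(base_score)
)

# Select the five questions that belong to the component
sat_items <- example_data[, c("sat1", "sat2", "sat3", "sat4", "sat5")]

# psych:: makes sure alpha() from "psych" is used, not the one from "ggplot2"
reliability <- psych::alpha(sat_items)

# Cronbach's alpha for the whole scale
reliability$total$raw_alpha
```

Printing the whole reliability object also shows how alpha would change if a single item were dropped.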

When you are satisfied with the result and do not want to perform a factor analysis or principal component analysis, you can create the "com_sat" component with the function rowMeans(). So that we do not overwrite the existing component, we will create a new component called "com_sat2".
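A minimal sketch of this step, using a made-up data frame with three participants (the name example_data and the values are assumptions for illustration only):

```r
# Made-up answers of three participants to the five questions
example_data <- data.frame(
  sat1 = c(1, 4, 7),
  sat2 = c(2, 4, 6),
  sat3 = c(3, 4, 7),
  sat4 = c(2, 5, 6),
  sat5 = c(2, 3, 4)
)

# The mean of the five questions per participant becomes the new component
example_data$com_sat2 <- rowMeans(
  example_data[, c("sat1", "sat2", "sat3", "sat4", "sat5")]
)

example_data$com_sat2
# 2 4 6
```

If some answers are missing, rowMeans() returns NA for that participant unless na.rm = TRUE is added. Whether that is appropriate depends on how you want to treat missing answers.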

Tip: you can run ?rcorr to open the help page for this function. Under the heading "Value" you can read what information the function returns.
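To illustrate what the help page describes, the sketch below runs rcorr() from "Hmisc" on simulated data (example_data and its values are assumptions, not the course data) and looks at the elements of the returned list.

```r
library("Hmisc")

# Simulated data for illustration only (not the course data)
set.seed(7)
example_data <- data.frame(
  com_sat     = rnorm(30),
  ah_mean_kmh = rnorm(30)
)

# rcorr() expects a matrix, so the data frame is converted first
result <- rcorr(as.matrix(example_data))

result$r  # matrix of correlation coefficients
result$n  # number of observations used for each pair
result$P  # matrix of p values
```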

10 Regression

To perform a regression analysis you have to specify the formula of the regression. You start with the dependent variable and separate it from the predictor with ~. The generalized form of the formula is y ~ x.

To fit the model and save the result we use fit <- lm(y ~ x, data = data), where data is a data frame with the columns x and y. Finally, we run summary(fit) to show the results of the regression analysis.
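The general pattern can be sketched with simulated data. The data frame example_data and its columns x and y are made up for illustration, so the numbers have no connection to the course data.

```r
# Simulated data for illustration only (not the course data)
set.seed(1)
example_data <- data.frame(x = rnorm(50))
example_data$y <- 2 + 0.5 * example_data$x + rnorm(50)

# Fit the linear model: y is the dependent variable, x is the predictor
fit <- lm(y ~ x, data = example_data)

# Show coefficients, standard errors, t values and p values
summary(fit)

# The coefficients table can also be accessed directly
coef_table <- summary(fit)$coefficients
coef_table["x", "Pr(>|t|)"]  # p value of the predictor x
```

Extracting the p value from the coefficients table is optional, but it can help when you want to report the exact value in your script.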

For instance, if you want to know whether "Alertness to Being Target of Guards" is a significant predictor for "Speed" then you can run the following code.

As you can see in the coefficients table above, com_sat is not a significant predictor of ah_mean_kmh (p = .255).

For the next step you will perform your own regression analysis with "Suppressed Impulses to Change Movement" as the predictor and "Variation Route Deviation" as the dependent variable.

"Suppressed Impulses to Change Movement" is the attempt of participants to walk normally in the presence of the guards in order not to give themselves away as smugglers carrying supposed cocaine. Further, "Variation Route Deviation" is an indicator of how often participants changed their route.

Is "Suppressed Impulses to Change Movement" a significant predictor of "Variation Route Deviation"? Please write your answer in the same manner as you would in the results section of your thesis. However, formatting such as italics is not necessary.