Lecture 13: Statistics

Outline:

Conventions in presenting quantitative data

Tests of significance: when to use which?

Chi-square tests

Goldvarb

T-tests (seminar)

Pearson’s r (seminar)

Greg Guy

1. Some conventions

1. Tables must always have a title (legend). Ideally, the title should be descriptive enough that your reader doesn’t have to read the text to interpret it. That is, your table should be able to stand alone.

2. Tables should also be numbered. The number precedes the legend:

Table 1. Frequencies of use of be like by speaker sex.

3. Figures in columns should be lined up by decimal point.

4. Provide N’s as well as %’s when possible in your tables.

(Source: my diss.)

5. You should refer to all tables in your text.

6. Additional information needed to interpret a table, such as notes on differences in significance, goes below the table in a footnote.

7. Figures must include units of measurement.

8. On figures, the independent variable goes on the x-axis and the dependent variable goes on the y-axis.

9. If your independent variable is categorical, use a bar graph.

From Tagliamonte and Hudson 1999:162

Figure X: Coda r deletion in NYC by style

From Labov 1966

10. Cardinal rule for tables: frequencies should express use of each variant of the dependent variable as a proportion of the total number of tokens in each category of the independent variable (not the other way around).

Table X: Use of be like vs. say by speaker sex. (BAD Table!)

The %s here tell us that 33% of our be like tokens are by men and 67% are by women. Our say tokens are evenly distributed by sex.

But this isn’t what we want to know. What we want to know is whether men use be like vs. say to a greater or lesser extent than women. To see this, we need to look at use of be like vs. say among men as a proportion of the total number of tokens for men, and likewise for women.

Table X: Use of be like vs. say by speaker sex. (GOOD Table!)

The %’s here tell a very different story: women tend toward be like much more strongly than men.
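The difference between the BAD and GOOD tables can be sketched in Python. The counts below are hypothetical (not the lecture's data), chosen so that the column percentages reproduce the 33% / 67% split described above; the function names are illustrative only.

```python
# Hypothetical counts (not from the lecture): speaker sex is the
# independent variable, the quotative (be like vs. say) the dependent one.
counts = {
    "men":   {"be like": 20, "say": 30},
    "women": {"be like": 40, "say": 30},
}
VARIANTS = ["be like", "say"]


def column_percentages(table):
    """BAD table: each variant's tokens as a % of that variant's total."""
    col_totals = {v: sum(row[v] for row in table.values()) for v in VARIANTS}
    return {sex: {v: 100 * row[v] / col_totals[v] for v in VARIANTS}
            for sex, row in table.items()}


def row_percentages(table):
    """GOOD table: each group's tokens as a % of that group's total."""
    return {sex: {v: 100 * row[v] / sum(row[w] for w in VARIANTS) for v in VARIANTS}
            for sex, row in table.items()}


for label, pct in [("BAD (column %)", column_percentages(counts)),
                   ("GOOD (row %)", row_percentages(counts))]:
    print(label)
    for sex, row in pct.items():
        print(f"  {sex:<6}" + "".join(f"  {v}: {p:5.1f}%" for v, p in row.items()))
```

With these made-up counts, the BAD table splits be like 33.3% / 66.7% between men and women, while the GOOD table shows that 40.0% of men's tokens and 57.1% of women's tokens are be like.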

3. When to use which test

In your reading of sociolinguistic work, you’ll have noticed different kinds of tests of significance: chi-square tests, t-tests, F-tests. There are others, but these are some of the most frequently used.

t-tests are used with one nominal variable (with two categories) and one continuous variable. More precisely, they compare the means of two samples.

F1s by social class
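A minimal sketch of Student's two-sample t statistic, using the pooled-variance form (which assumes the two groups have similar variances; Welch's t-test drops that assumption). The F1 values and the two group labels are invented for illustration.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees of freedom). Assumes roughly equal variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: the two sample variances weighted by their d.f.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_err
    return t, n_a + n_b - 2


# Hypothetical F1 values (Hz) for two social-class groups
working_class = [700, 720, 680, 710, 690]
middle_class = [650, 670, 640, 660, 680]

t, df = pooled_t(working_class, middle_class)
print(f"t = {t:.2f}, d.f. = {df}")  # t = 4.00, d.f. = 8
```

The resulting t is then looked up in a t table at the given d.f.; for d.f. = 8, the two-tailed critical value at p = .05 is 2.306.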

A good question that’s undoubtedly in everyone’s head right now: what kind of test should we use with two quantitative variables? Say, for example, that we want to look at F1 by speaker age.

Typically, measures of correlation, such as Pearson’s r, are used in such cases. These indicate the degree to which one variable predicts or covaries with another.
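Pearson's r can be sketched directly from its definition: the covariance of the two variables divided by the product of their spreads. The age and F1 values below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r, ranging from -1 to 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Cross-product of deviations and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: speaker age (years) and mean F1 (Hz)
ages = [18, 25, 33, 41, 56, 67]
f1 = [720, 705, 700, 690, 672, 668]
print(f"r = {pearson_r(ages, f1):.3f}")
```

An r near +1 or -1 means the variables covary strongly (here negatively: the older hypothetical speakers have lower F1); an r near 0 means little linear relationship. In Python 3.10 and later, `statistics.correlation` computes the same coefficient.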

4. The chi-square test

How it works.

What a chi-square test does is test the null hypothesis: that there is NO relationship between our variables.

It compares the observed values in a distribution with the expected values (the values we would expect if there were no relationship) and estimates the probability of getting a difference at least this large by chance alone.

The observed values are what we have in our data, which, let’s suppose, is the following.

Observed values: Use of be like vs. say by speaker sex

How, then, do we determine the expected values? First, look at the totals for be like and say. How would we expect them to be distributed between men and women if there were no relationship?

Expected values: Use of be like vs. say by speaker sex

Now, in the previous example, figuring out the expected values was easy because the number of tokens for men and women was the same. What would we do if it wasn’t?

Observed values: Use of be like vs. say by speaker sex

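Easy. The expected value for each cell is its row total times its column total, divided by the grand total:

$$E_{ij} = \frac{R_i \times C_j}{N}$$

where $R_i$ is the total for row $i$, $C_j$ is the total for column $j$, and $N$ is the total number of tokens.

Expected values: Use of be like vs. say by speaker sex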

And so on.

Once you calculate one cell’s value in this way, you can calculate the rest by subtracting from the marginals (the row and column totals).

Once filled out, our table of expected values will look like this.

Expected values: Use of be like vs. say by speaker sex
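The expected-value calculation, including the marginal-subtraction shortcut for a 2 × 2 table, might look like this in Python. The observed counts are hypothetical, not the lecture's data, and the function names are illustrative.

```python
def expected_values(observed):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_2x2_by_subtraction(observed):
    """Compute one cell with the formula, then fill the rest from the marginals."""
    (a, b), (c, d) = observed
    row1, row2, col1 = a + b, c + d, a + c
    e11 = row1 * col1 / (row1 + row2)
    e12 = row1 - e11   # rest of row 1
    e21 = col1 - e11   # rest of column 1
    e22 = row2 - e21   # rest of row 2
    return [[e11, e12], [e21, e22]]


# Hypothetical observed counts: rows = men, women; columns = be like, say
observed = [[20, 30],
            [40, 30]]
print(expected_values(observed))              # [[25.0, 25.0], [35.0, 35.0]]
print(expected_2x2_by_subtraction(observed))  # [[25.0, 25.0], [35.0, 35.0]]
```

Both routes give the same table, because the expected values are constrained to have the same row and column totals as the observed ones.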

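We then compare the observed and the expected values in each cell. Note that the difference in each case is 6.15 (absolute value). This is not a coincidence: because the expected values have the same row and column totals as the observed values, every cell of a 2 x 2 table differs by the same amount.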

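To estimate how likely it is that a difference this large arose by chance, we first calculate the chi-square statistic with the following formula:

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected values in each cell.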

This means three steps:

We square the difference between the observed and expected values in each cell

We divide this number by the expected value for each cell

We then add all of these cell values together.

Let’s do this step by step.

First, subtract the expected from the observed for each cell.

Second, square these differences.

Third, divide these squares by the expected values for each cell.

Finally, add all of these values together and this is our chi-square value.

$$\chi^2 = .46 + .74 + .57 + .93 = 2.70$$
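The three steps can be sketched as a short Python function. The counts are hypothetical (not the lecture's table), and their expected values were computed with (row total × column total) / grand total.

```python
def chi_square(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = o - e               # subtract expected from observed
            squared = diff ** 2        # step 1: square the difference
            total += squared / e       # steps 2-3: divide by expected, add up
    return total


# Hypothetical counts (rows = men, women; columns = be like, say) and
# their expected values from (row total * column total) / grand total.
observed = [[20, 30], [40, 30]]
expected = [[25.0, 25.0], [35.0, 35.0]]

print([[o - e for o, e in zip(orow, erow)]
       for orow, erow in zip(observed, expected)])
# [[-5.0, 5.0], [5.0, -5.0]]: same absolute difference in every cell
print(round(chi_square(observed, expected), 2))  # 3.43
```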

We then look up our chi-square value in a chi-square table, which gives the probabilities (p-values) associated with each chi-square value. (Such a table can be found on the web or in the back of any statistics book.)

To do this, we need to know the degrees of freedom (d.f.). For a chi-square test, d.f. = (number of rows - 1) x (number of columns - 1). We have a 2 x 2 table, so our d.f. = (2 - 1)(2 - 1) = 1.

Looking in our table at 1 d.f., we find the following critical values:

chi-square    p
3.84          .05
5.41          .02
6.64          .01
10.83         .001

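Our chi-square value of 2.70 is smaller than 3.84, the critical value for p = .05. This means that, if there were no relationship between speaker sex and use of be like vs. say, a difference at least this large would arise by chance more than 5% of the time (in fact, about 10% of the time).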

At the level of p=.05, then, we do not reject the null hypothesis.
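Instead of a printed table, the p-value can be computed directly. This is a minimal sketch valid only for 1 d.f. (the 2 × 2 case): with one degree of freedom, the chi-square statistic is the square of a standard normal variable, so its upper-tail probability reduces to the complementary error function in Python's standard library. Tables with more d.f. need the incomplete gamma function, e.g. `scipy.stats.chi2.sf`.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """d.f. for a chi-square test on an n_rows x n_cols table."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_1df(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 d.f. only.

    With 1 d.f., chi-square = Z**2 for a standard normal Z, so
    P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


assert degrees_of_freedom(2, 2) == 1
for value in (2.70, 3.84, 6.64, 10.83):
    print(f"chi-square = {value:5.2f}  p = {chi_square_p_1df(value):.4f}")
```

For the lecture's value of 2.70 this gives p ≈ .10, in line with the "about 10%" above, and the critical values in the table come out at approximately their listed p-levels.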

NB: chi-square tests are unreliable when the expected frequency of any cell is less than 5, and are not ideal when total N < 20. In such cases, use Fisher’s exact test instead.
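Fisher's exact test for a 2 × 2 table can be sketched with the standard library. This version uses a common two-sided convention: it sums the probabilities of all tables with the same margins that are no more likely than the observed one. The counts are hypothetical, and `scipy.stats.fisher_exact` is the usual ready-made alternative.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) as the observed;
    # the small tolerance keeps floating-point ties from being dropped.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Hypothetical small sample: rows = men, women; columns = be like, say
print(f"p = {fisher_exact_2x2([[2, 7], [8, 3]]):.4f}")
```

In this made-up table, the expected count for men using be like is 9 × 10 / 20 = 4.5, below 5, which is exactly the situation where the NB above recommends Fisher's test over chi-square.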

Other ways to do chi-square tests

Excel

On a spreadsheet, you’ll need a table of observed values and a table of expected values, as above.

Click on a free cell. This is where your result will appear.

Then, on the “Insert” menu at the top, select “Function.” A dialog box will then appear with two columns in it.

In the left column, select “Statistical.” In the right column, select “CHITEST.” (In Excel 2010 and later, the equivalent function is CHISQ.TEST.)

A new dialog box will appear with two fields, one asking you for a range of observed values and one asking for a range of expected values.

You can input these by clicking and dragging the cursor over the relevant tables in your spreadsheet, then selecting “OK.”

Excel will then give you the p-value.

Web pages

An even easier solution is to use webpages such as the following:

http://www.graphpad.com/quickcalcs/contingency1.cfm

No explanation necessary for this!

5. Goldvarb

Goldvarb (Varbrul) is a kind of multivariate analysis (specifically, a logistic regression model).

In the kind of variation data that we typically work with, different kinds of factors combine to produce the patterns of variation we see.

A speaker’s use of t-glottaling, say, may be influenced by his/her age, gender, and dialect, as well as by linguistic factors such as the preceding and following segments.

The problem, then, is to sort out the effect of these competing constraints on variation.

What Goldvarb does is build a model of this variation that estimates the contribution of different factors to the dependent variable.

Variables correspond (roughly) to factor groups, and the different categories of these variables are called factors. The factor group sex, for example, will (presumably) have the factors male and female.

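Each factor in each group is then assigned a weight that estimates its contribution to the application value, i.e., the variant of the dependent variable being modeled. Weights range from 0 to 1: a weight above .5 favors the application value, and a weight below .5 disfavors it.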
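How an input and a set of factor weights combine into a predicted rate can be sketched with the logistic formula used by Varbrul-style models: the log-odds of the application value equal the logit of the input plus the logits of the weights of the factors present in a context. This illustrates only the combination step, not how Goldvarb estimates the weights; all numbers and factor names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))


def predicted_rate(input_prob, factor_weights):
    """Predicted probability of the application value in one context.

    logit(p) = logit(input) + sum of logit(weight) for each factor present.
    A weight of .5 adds nothing; above .5 favors, below .5 disfavors.
    """
    log_odds = logit(input_prob) + sum(logit(w) for w in factor_weights)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical t-glottaling model: input .40, plus weights for two factors
# present in one context (e.g. a young speaker, a following vowel).
weights = {"young speaker": 0.60, "following vowel": 0.70}
print(f"{predicted_rate(0.40, weights.values()):.2f}")  # 0.70
```

Here the made-up weight of .60 exactly offsets the input of .40, so the predicted rate equals the remaining weight of .70.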

Source: my diss.

In reporting Goldvarb analyses, authors typically also provide the N’s and the input (also called the corrected mean or overall tendency). The input is roughly the overall likelihood that the application value occurs.

It is also typical to report N’s and frequencies for each factor.

Note also that Goldvarb is used only with non-categorical variables. Variables that are categorical or near-categorical (over roughly 95%) are excluded.