Gazetteer of Southern Vowels

This site was created to allow you to interact with data extracted from the Digital Archive of Southern Speech. Please report bugs, suggestions, or comments to Joey Stanley at joeystan@uga.edu.

Where do the data come from?

The Digital Archive of Southern Speech (DASS) is an audio corpus of semi-spontaneous linguistic atlas interviews (Kretzschmar et al. 2013) derived from the Linguistic Atlas of the Gulf States (Pederson et al. 1986). It contains speech from 64 natives (34 men and 30 women, born 1886–1965) of 8 Southern US states. This sample contains a mixture of ethnicities, social classes, education levels, and ages.

DASS is currently undergoing transcription, forced alignment, and acoustic analysis (see Renwick et al. 2017 and Olsen et al. 2017). We use DARLA for forced alignment and FAVE for formant extraction. We have removed all filters from FAVE so that all vowel tokens, whether they be from unstressed syllables or stopwords, are included here. Currently, this site displays 988,217 vowel tokens from 63 speakers.

What does this site do?

Currently, the site has four main pages:

Vowel Plot Comparison:
On this page, you can subset the DASS data by many demographic attributes and view the corresponding speakers' vowel tokens plotted in F1, F2 space. You can also subset by stress, vowel, word, and following consonant and choose what normalization technique (if any), filtering, and transcription system should be used. The plots are extremely customizable and you can change how the data is displayed. Two graphs are included on this page so that, given a large enough screen, you can compare subgroups side by side. Below each graph are tables that give basic summaries of the speakers and the vowels selected.

Interactive Vowel Plot:
Here you can focus on specific portions of individual speakers' vowel space and see words rather than just points. If you click on the plot itself, a table at the top will display the five points nearest to where you clicked, showing you exact formant measurements, the word, and the speaker associated with that observation.
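
The nearest-point lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not the site's actual code (the site itself is written in R). The function name, the token fields, and the formant values are all invented for the example, and distance is measured in raw Hz, which is a simplifying assumption; the site may well measure distance differently.

```python
import math

def nearest_tokens(tokens, click_f2, click_f1, k=5):
    """Return the k tokens closest to the clicked point in F1/F2 space.

    Distance is plain Euclidean distance in Hz (an assumption made here
    for simplicity).
    """
    def distance(token):
        return math.hypot(token["F2"] - click_f2, token["F1"] - click_f1)

    return sorted(tokens, key=distance)[:k]

# Invented observations: speaker labels and measurements are hypothetical.
tokens = [
    {"speaker": "A", "word": "bit",    "F1": 400.0, "F2": 2000.0},
    {"speaker": "A", "word": "bet",    "F1": 550.0, "F2": 1800.0},
    {"speaker": "A", "word": "bat",    "F1": 700.0, "F2": 1700.0},
    {"speaker": "B", "word": "boot",   "F1": 350.0, "F2": 1000.0},
    {"speaker": "B", "word": "bought", "F1": 600.0, "F2": 900.0},
    {"speaker": "B", "word": "but",    "F1": 620.0, "F2": 1300.0},
    {"speaker": "B", "word": "beat",   "F1": 300.0, "F2": 2300.0},
]

# Simulate a click at F2 = 1850 Hz, F1 = 500 Hz and list the three nearest tokens.
for token in nearest_tokens(tokens, click_f2=1850.0, click_f1=500.0, k=3):
    print(token["word"], token["speaker"], token["F1"], token["F2"])
```

With these invented values, the three nearest tokens are "bet", "bit", and "bat", in that order.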

Point Pattern Analysis:
This is an alternative way of viewing the vowel space, pioneered by Kretzschmar. On this page, you can again subset the data in the same way as on the other two pages and see a scatterplot in F1, F2 space. The underlaid grid indicates how many observations lie in each cell, with the number of rows and columns in the grid controllable by the user. Below the plot is a chart of the grid cells' counts, plotted in decreasing order of density. The resulting chart follows an Asymptotic Curve (or simply, "A-Curve").
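
As a rough sketch of the idea, the snippet below bins points into a grid and sorts the cell counts in decreasing order. It is written in Python for illustration and is not the site's implementation. The function names, the grid bounds, and the sample points are assumptions, as is the choice to include empty cells as zeros in the sorted list.

```python
from collections import Counter

def grid_counts(points, n_cols, n_rows, f2_range, f1_range):
    """Count how many (F2, F1) points fall in each cell of an n_cols x n_rows grid."""
    f2_min, f2_max = f2_range
    f1_min, f1_max = f1_range
    counts = Counter()
    for f2, f1 in points:
        if not (f2_min <= f2 <= f2_max and f1_min <= f1 <= f1_max):
            continue  # ignore points that fall outside the grid
        # Scale each formant to a cell index; the upper edge goes in the last cell.
        col = min(int((f2 - f2_min) / (f2_max - f2_min) * n_cols), n_cols - 1)
        row = min(int((f1 - f1_min) / (f1_max - f1_min) * n_rows), n_rows - 1)
        counts[(col, row)] += 1
    return counts

def a_curve(counts, n_cols, n_rows):
    """Return every cell's count, empty cells included, in decreasing order."""
    all_counts = [counts.get((c, r), 0) for c in range(n_cols) for r in range(n_rows)]
    return sorted(all_counts, reverse=True)

# Invented (F2, F1) measurements in Hz.
points = [
    (1550, 500), (1520, 510), (1530, 495), (1510, 505),
    (2200, 300), (2250, 320),
    (900, 700),
]

counts = grid_counts(points, n_cols=4, n_rows=4, f2_range=(500, 2500), f1_range=(200, 1000))
curve = a_curve(counts, n_cols=4, n_rows=4)
print(curve[:4])  # → [4, 2, 1, 0]
```

A few cells hold most of the observations and most cells hold few or none, which is the shape the chart below the plot displays.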

Speaker Info:
The speaker info page allows you to explore the metadata and distribution of speakers in DASS. The map has some flexibility as to how various demographic categories are displayed.

New content is being added regularly, so check back for additional features. See the bottom of this page for updates on recent changes.

How is this site powered?

This site is built in Shiny, a web application framework for R. With Shiny, users can utilize the computational power of the R programming language without having to learn R or install it on their computers. This is all bundled up and put on the web to allow for the interactive capabilities of web browsers. See the bottom of this page for a list of specific packages that are used to process this data and create this site.

How is this project funded?

Who is involved?

The PIs for this project are William Kretzschmar and Margaret E. L. Renwick, of the University of Georgia. Our team has several graduate student researchers including Mike Olsen, Rachel Olsen, Lisa Lipani, Jeremy Shi, and Joey Stanley. We also had several dozen undergraduate student workers, funded by ADS or NSF, who did most of the transcribing work.

Contact information

For more information, please contact Joey Stanley at joeystan@uga.edu.

How can I cite this resource?

If you use or refer to this website, you must cite the Gazetteer of Southern Vowels as follows:

Version 1.3 (August 28, 2018)

We've added measurements from over 200,000 vowels to the corpus, bringing the total to 988,217. All speakers are now represented in the corpus with the exception of speaker 850, who had pretty awful data for some reason.

Updated the "Joey's filter" procedure to the latest development.

A third filtering option, the Mahalanobis distance, is now available.
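
For readers unfamiliar with the measure, here is a minimal Python sketch of filtering by Mahalanobis distance. It is only an illustration under stated assumptions, not the procedure this site runs: the function names, the cutoff value, the sample measurements, and the idea of applying it to one group of tokens at a time (for example, one vowel from one speaker) are all hypothetical.

```python
import math
from statistics import mean

def mahalanobis_distances(points):
    """Mahalanobis distance of each (F1, F2) point from the centroid of its group."""
    n = len(points)
    m1 = mean(p[0] for p in points)
    m2 = mean(p[1] for p in points)
    # Sample covariance matrix [[s11, s12], [s12, s22]].
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; need more varied points")
    distances = []
    for f1, f2 in points:
        d1, d2 = f1 - m1, f2 - m2
        # Quadratic form d^T S^-1 d, written out for the 2x2 case.
        squared = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
        distances.append(math.sqrt(max(squared, 0.0)))
    return distances

def filter_outliers(points, cutoff):
    """Keep only the points whose Mahalanobis distance is at most cutoff."""
    distances = mahalanobis_distances(points)
    return [p for p, d in zip(points, distances) if d <= cutoff]

# Invented (F1, F2) measurements for one vowel; the last one is far from the rest.
tokens = [
    (400, 2000), (420, 2050), (390, 1980), (410, 2020),
    (405, 1990), (415, 2040), (395, 2010), (900, 1200),
]
kept = filter_outliers(tokens, cutoff=2.0)  # the cutoff is arbitrary, chosen for illustration
```

Unlike a simple distance in Hz, this measure accounts for the spread and correlation of F1 and F2 within the group, so a token counts as extreme relative to the shape of its own vowel cloud.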

Version 1.2 (June 5, 2018)

Updated the PPA page so that the bottom corner of the grid (A1) is in a reasonable place instead of being determined by the max(F1) and max(F2) measurements.

In the "Plot Options" tab, the "Fit all data" button has been changed to "Fit Vowel Space". Instead of zooming WAAAY out to fit all the bad data, it zooms to a comfortable vowel space that is consistent across speakers.

This standard vowel space is the default rather than the axes adjusting to accommodate all the selected data. This will make comparing different subsets easier, and when looking at a single vowel it'll provide some context as to what portion of the vowel space it occupies.