tools and notes (mostly) for students

Making it possible for students to get directly involved in research is one of the priorities of the USM Linguistics program. This means it’s helpful to rely on what are called “open source” tools. These are typically available for free and do not require recurring license fees. It’s also helpful to standardize on a few tools to increase opportunities for student-to-student collaboration and support, and to allow the faculty to develop deeper familiarity and expertise with the tools students are using. We also favor systems that run more or less equally well on Mac and Windows (and Linux) systems.

So, here’s a quick list of software for running (and preparing) linguistics experiments that has proven useful, is of good quality and (in most cases) is freely available for all relevant computer platforms.

For running experiments ‘live’, with participants coming to an on-campus lab:

PsychoPy (http://www.psychopy.org): “PsychoPy is an open-source application to allow the presentation of stimuli and collection of data for a wide range of neuroscience, psychology and psychophysics experiments. It’s a free, powerful alternative to Presentation™ or e-Prime™, written in Python (a free alternative to Matlab™).”

Linger (http://tedlab.mit.edu/~dr/Linger/): “…a software package for performing reading, listening, and other sentence processing experiments. Linger was primarily designed for masked, self-paced reading experiments. However, the code is flexible and can be adapted to support many other types of experiments. …It can run on Unix, Windows, or Macintosh systems. Linger is also able to handle non-English text and has been used to conduct experiments in Chinese, Japanese, and other languages.”

For running online experiments with participants recruited through the web:

Amazon Mechanical Turk (https://www.mturk.com/mturk/welcome): “The Amazon Mechanical Turk (MTurk) is a crowdsourcing Internet marketplace that enables individuals and businesses (known as Requesters) to coordinate the use of human intelligence to perform tasks that computers are currently unable to do.” [Wikipedia] (Note: There is a cost to using this system, but it can be minimal.)

For analyzing statistical results from experiments:

R (http://www.r-project.org): “R is a free software environment for statistical computing and graphics. It … runs on a wide variety of UNIX platforms, Windows and MacOS.” R has also proven useful in preparing materials for use with Amazon Mechanical Turk. (At the moment, 14 Sept 2016, there’s a useful free intro tutorial for R on DataCamp.)

RStudio (https://www.rstudio.com): RStudio is what’s called an “integrated development environment” for R. It makes it easier to write small chunks of R code, test them step by step as they are built, stitch them together into a whole that does something useful, and get help when it’s needed. Like R, it’s free for non-commercial use and it runs on a wide variety of Mac, Windows and Linux machines.

For analyzing, editing and synthesizing speech materials generated by an experiment or to be presented to participants in an experiment:

Praat (http://www.fon.hum.uva.nl/praat/): “Praat (the Dutch word for “talk” or “speak”) is a free scientific computer software package for the analysis of speech in phonetics. … It can run on a wide range of operating systems, including various versions of Unix, Linux, Mac and Microsoft Windows … . The program also supports speech synthesis, including articulatory synthesis.”

Finally, I’ll mention a tool that is not specifically for doing experiments, but is nevertheless very relevant to any sort of experimental work: a bibliographic database.

Zotero (https://www.zotero.org): “Zotero [zo’tɛɹo] is a free, easy-to-use tool to help you collect, organize, cite, and share your research sources.”

If you know of other resources that might make useful additions to this list, please let me know.

Part of what makes science hard is that data, even the best, most reliable data, are often extremely difficult to interpret.

Interpreting results involves describing, categorizing, and organizing data. These are not necessarily straightforward matters. Often, too, there are earlier key decisions about what data to collect and what to ignore. Not everything that seems relevant at first really is.

Then there are the stats. Almost any data set can be reasonably analyzed in more than one way. This can be very helpful. It can yield deeper insight into what the data do, and do not, have to say. But there is a risk too. Along the way you may discover that you are more likely to get a result that appeals to you when you analyze the data in some ways, and less likely in others.

This item is one of the best presentations I’ve seen addressing these issues for a non-specialist audience. It includes an excellent do-it-yourself demo.
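
Separately from that presentation, here is a rough sketch of the risk described above, written in Python (the language PsychoPy is built on). It simulates many experiments in which there is truly no difference between two groups, then compares how often a single planned analysis comes out “significant” with how often at least one of three reasonable-looking analyses does. Everything in it is an assumption made for illustration: the group size, the 0.05 cutoff, the two alternative analyses (trimming outliers, looking only at the first half of the participants), and the normal approximation used in place of a proper t-test.

```python
import math
import random
import statistics


def two_sided_p(a, b):
    """Approximate two-sided p-value for a difference in means.

    Uses a normal approximation to Welch's t-test, which is good
    enough for a demonstration but not for a real analysis.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def trim(xs, k=2.0):
    """Drop values more than k standard deviations from the mean."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]


def simulate(n_experiments=2000, n=40, alpha=0.05, seed=1):
    """Return (planned_rate, flexible_rate) of false positives."""
    rng = random.Random(seed)
    planned = flexible = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: no real effect.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p_values = [
            two_sided_p(a, b),                      # the planned analysis
            two_sided_p(trim(a), trim(b)),          # after removing "outliers"
            two_sided_p(a[: n // 2], b[: n // 2]),  # first half of participants only
        ]
        planned += p_values[0] < alpha
        flexible += min(p_values) < alpha  # report whichever analysis "worked"
    return planned / n_experiments, flexible / n_experiments


planned_rate, flexible_rate = simulate()
print(f"False positives, planned analysis only: {planned_rate:.1%}")
print(f"False positives, best of three analyses: {flexible_rate:.1%}")
```

The first rate should come out somewhere near the nominal 5%. The second can never be lower than the first, and is typically noticeably higher, even though each of the three analyses looks defensible on its own.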