Category: R

The data set presented here was compiled by Frédérique Gayet, a psychomotor therapist whose research I supervised in 2013. Gayet (2013) focused on spatial prepositions in French: à côté de « next to » en dessous...

A regression towards mediocrity. Originally, the term regression meant “going back”. It gained currency when Sir Francis Galton related the heights of children to the average height of their parents. Galton (1886) found that...

This post is the first of a series on word embeddings, i.e. vector representations of words in a vector space. Word embeddings have been known to linguists for quite some time. Recently, artificial neural networks have...

What does your intuition tell you when a phenomenon is counterintuitive? It is my intuition that you cannot do good research without intuition. But, without safeguards, intuition can play tricks on you. Suppose that scientists...

I am a big fan of old corpora. Of course, I do also appreciate XXL corpora compiled semi-blindly from the Web. But, comparatively speaking, the older corpora have the kind of spick-and-span internal structure...

Forgive me for a little self-promotion: I am pleased to announce the publication of Corpus Linguistics and Statistics with R. Introduction to Quantitative Methods in Linguistics by Springer in the Quantitative Methods in the Humanities and Social...

This semester, I am teaching corpus-based sociolinguistics to third-year students. One challenge is to compile corpora that are suitable for the study of variation in language, especially when variation is correlated with geographic origin. Another...

Two years ago, I gave a mini workshop on text-mining techniques at a one-day conference on philology and the digital humanities at Paris 8 University. The conference was organized by colleagues from the Department...