Tuesday, May 27, 2014

For our final project in AP Calculus class, I'm doing a
presentation on the connection between mathematics and linguistics, and I
stumbled on your blogpost "Why Linguists Should Study Math" while
researching my topic.

I was wondering if you could point me towards some resources
(that are relatively easy to understand) about how math is present in and
affects our written and spoken language.

Some things that I am considering are:

- the occurrences of words in our language

- how grammar uses mathematical principles

- algorithms we use to construct sentences

Thanks,

M.

My [edited] response (suggestions from y'all as to better resources are much appreciated; I'll forward; I wanted to get a response out quickly because the final is presumably fast approaching):

M.,

Thanks for reaching out to me. Of course, I think you’ve
chosen a good topic. There are two broad ways in which linguistics and math
intersect:

1. How the human brain uses math in natural language (psycholinguistics)

2. How linguists use math to study and model languages (computational linguistics)

From your email, it appears you are mostly interested in #1.
However, in contemporary linguistics, the two are fast becoming one. Most
contemporary linguists use math as a tool.

Let me address your three areas of interest with respect to
how the human brain might use math to process and produce language:

The occurrences of
words in our language: For the most part, this means “frequency,” which
really means counting. Linguists love to count. We use large corpora of texts
to count words and phrases. Lancaster University in the UK is a well-known
corpus linguistics school. Their web page has a lot of good introductory
information (although I find it a bit clunky looking).

UPDATE: I forgot to include the one item that most directly answers the basic question: frequency effects in language. Humans are very aware of how often they hear words. In some way, we count words automatically. Even if it's not a specific count like 75, we somehow know which words, phonemes, and syntactic structures we hear or read more than others. This gives rise to a variety of frequency effects in language processing. This is the clearest example of how the brain uses math for language.

For example, we recognize high-frequency words much faster than low-frequency words. The website for Paul Warren's book "Introducing Psycholinguistics" has an online demo for a word frequency task you can walk through to see how linguists study this.

What do linguists count?

Words: I’m sure
you’ve seen word clouds like Wordle. These are built from simple word frequency counts. One of the most enduring
facts about word counts is Zipf’s Law, which says “the most frequent word [in a corpus of texts] will occur
approximately twice as often as the second most frequent word, three times as
often as the third most frequent word, etc.” Why would this be true? Linguists
have been studying this for decades.
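If you want to see the counting for yourself, here is a minimal Python sketch. It is my own toy illustration, not something taken from any of the resources above: the function names and the sample sentence are made up, and a real test of Zipf's Law would need a corpus with many thousands of words. It counts the words and then sets each actual count next to what Zipf's Law would predict, namely the top count divided by the word's rank.

```python
from collections import Counter
import re

def word_frequencies(text):
    """Count how often each lowercase word occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def zipf_table(counts, top=5):
    """Return (rank, word, actual count, Zipf prediction) for the top words.

    The prediction is the count of the most frequent word divided by rank.
    """
    ranked = counts.most_common(top)
    top_count = ranked[0][1]
    return [(rank, word, count, top_count / rank)
            for rank, (word, count) in enumerate(ranked, start=1)]

# A made-up sentence, far too small to really show Zipf's Law.
sample = "the cat saw the dog and the dog saw the cat and the bird"
counts = word_frequencies(sample)
table = zipf_table(counts)
for row in table:
    print(row)
```

On a sample this small the prediction will be off for most ranks. Try pasting in a few chapters of a public-domain novel instead and compare the two columns.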

Ngrams: sets of
two-word, three-word, four-word strings, etc. These provide more context
than mere single-word frequencies. Have some fun playing around with Google’s
Ngram Viewer if you haven’t already.
Try plotting the change in frequency of “mathematical linguistics” and “corpus
linguistics” (paste those two phrases into the search box with no quotes and
only a comma separating them). Scholars are trying to use this to plot changes
in culture. For example, take a look at this PDF.
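Counting ngrams is only a small step up from counting single words. The sketch below is a toy of my own (invented function names and sample text, and certainly not how Google's Ngram Viewer is built): it slides a window of n words across the text and counts each window it sees.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n):
    """Count the n-word strings in a whitespace-separated text."""
    return Counter(ngrams(text.lower().split(), n))

# A made-up sample; real ngram counts come from very large corpora.
sample = "the author signed the book and the author signed the card"
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)
print(bigrams.most_common(3))
print(trigrams.most_common(1))
```

Change the 2 to a 3 or a 4 and you can watch the counts thin out: the longer the string, the less often any particular one repeats.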

Other: We
count many other things too, like parts of speech (verbs, nouns, prepositions,
etc.). We also count the co-occurrence of linguistic items that are not right
next to each other. If you want to dig into more frequency fun, check out the
more advanced tools at BYU.
You can read more about how these tools help us study language here.
How grammar uses
mathematical principles: One of the most commonly studied types of
mathematical principle in language is statistical learning. A good example of
this is transitional probabilities, which are sets of probabilities for what
linguistic item might come next given a string of items (e.g., words or
phonemes). For example, if you read “The author signed the _______”, you could
guess what the blank word is based on the previous four words (most likely,
it’s “book”). This example is based on the
psycholinguistic tests called “Cloze tests”. Linguists have discovered that the brain tracks transitional probabilities
for all kinds of linguistic items. In fact, this is one of the most robust
areas of study in language acquisition. Linguists study how babies use
transitional probabilities to learn language. For example, one of the most
challenging problems is figuring out how babies learn to separate a continuous stream of audio noise coming into their ears into separate words, without any
knowledge of what words are or what they mean. One theory is that babies quickly learn transitional probabilities of sounds
that tell them where one word ends and another begins. But transitional
probabilities alone are not enough. For a challenge, try reviewing this PDF.
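To make the word-boundary idea concrete, here is a rough Python sketch. Everything in it is an assumption of mine for illustration: the three nonsense “words”, the order they appear in, and the 0.9 cutoff. It strings the syllables together with no pauses, works out the probability of each syllable given the one before it, and then guesses a word boundary wherever that probability drops.

```python
from collections import Counter

def transitional_probabilities(units):
    """Estimate P(next unit | current unit) for each adjacent pair."""
    pair_counts = Counter(zip(units, units[1:]))
    first_counts = Counter(units[:-1])  # the last unit has no follower
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(units, probs, threshold):
    """Guess a word boundary wherever the transitional probability is low."""
    words, current = [], [units[0]]
    for prev, nxt in zip(units, units[1:]):
        if probs[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

# Three invented nonsense words and an invented order of presentation.
TOY_WORDS = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
ORDER = "ABCACBAABBCCA"
stream = [syllable for label in ORDER for syllable in TOY_WORDS[label]]

probs = transitional_probabilities(stream)
print(probs[("bi", "da")])   # inside a word
print(probs[("ku", "pa")])   # across a word boundary
found = segment(stream, probs, threshold=0.9)
print(found[:4])
```

Inside a word, each syllable is always followed by the same next syllable, so the probability is high. Across a boundary, several different syllables can follow, so it is lower. Real speech is much messier than this toy, which is part of why transitional probabilities alone are not enough.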

Algorithms we use to
construct sentences: This is the most controversial area you’ve asked about.
The fact is, we linguists don’t really know how the brain constructs sentences.
As I mentioned above, there are models based on transitional probabilities, like
Markov models, which are designed to make those same kinds of guesses
we made about “book”. Markov models and Cloze tests are a good example of psycholinguistics and
computational linguistics coming together. As a theoretical contrast to
statistical models, there are rule-based models like formal grammars.
These are not mathematical in a typical sense, but they are based on formal
logic, which underlies much of mathematics. Linguistics is in the
middle of a war between the formal grammar camp and the statistical grammar
camp. There’s no consensus on which is the *correct* model of language.
However, in the last decade or so, the statistical side seems to have gained
the advantage. If you really want to dig into this war, here’s a challenging
read.
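For a feel of what a Markov model does with the “book” example, here is a small Python sketch. It is only an illustration under my own assumptions: the four training sentences are invented, the function names are mine, and I chose to have the model look at just the previous two words. It counts which word follows each two-word context and then guesses the most frequent follower.

```python
from collections import Counter, defaultdict

def train(sentences, order=2):
    """Count which word follows each context of `order` consecutive words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - order):
            context = tuple(tokens[i:i + order])
            model[context][tokens[i + order]] += 1
    return model

def predict_next(model, words, order=2):
    """Return (most likely next word, its probability), or None if unseen."""
    followers = model.get(tuple(w.lower() for w in words[-order:]))
    if not followers:
        return None
    best, count = followers.most_common(1)[0]
    return best, count / sum(followers.values())

# An invented toy corpus; real models are trained on millions of words.
corpus = [
    "the author signed the book",
    "the author signed the contract",
    "the fan asked the author",
    "she signed the book",
]
model = train(corpus)
guess = predict_next(model, ["The", "author", "signed", "the"])
print(guess)
```

Notice that the model knows nothing about authors or books. It guesses purely from counts, which is exactly what the formal grammar camp finds unsatisfying about it.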

Additional Reading:

Linguists who count (the comments are especially engaging;
your teacher might be particularly interested in the calculus vs. algebra debate that
ensues).

I hope this gets you off to a good start. Please don’t
hesitate to ask for clarifications or more resources (especially let me know if
you need more intro level or more advanced level; I wasn’t sure if I hit the
level right or not). I’m happy to be of more assistance if I can. I’m sure that,
as a smart, dedicated student, you’re ready to dig into ngrams and Markov models.
But I’m also sure that, as a high school junior in southern California with June
fast approaching, you’re ready for the beach. Both are required for a healthy life
of the mind.

Wednesday, May 21, 2014

B.A. degree, backgrounds of interest include any verbal-focused or writing intensive field (e.g. Linguistics)
Apply Here

Text Analytics Consultant
Medallia, Inc. - Palo Alto, California
Bachelor's degree
Background in Linguistics
Demonstrated interest in technology
Strong preference for a French or German native speaker
(Not visible on company website, found on LinkedIn, sign in required)
Apply Here

Linguistic Intern
Bosch Group
SF Bay Area
Responsibilities: Support the development of next-generation language products in the areas of speech and language technologies and systems. Support the administration of user studies
Qualifications: Senior undergraduate or graduate students in Applied Linguistics, or related fields
Apply Here

Analytical Linguist, Ads Human Evaluation
Google
Los Angeles, CA, USA
Product Management
Responsibilities: Direct, monitor, train, and manage the day-to-day work of temporary workers.
Design and implement tests on data and worker quality, analyzing and reporting on the results using Python, XML/CSS, HTML/JavaScript, database queries, and Google-internal technologies.
Work directly with engineers and statisticians to devise and run experiments to answer specific questions about advertising and product quality.
Minimum Qualifications: MA/MS or PhD degree in an analytical field (e.g., Linguistics, Cognitive Science, Statistics, Mathematics), or equivalent practical experience.
Experience with one or more of the following: Python or another scripting language, Java or C++, XML/HTML/CSS/JavaScript, SQL or specialized database query languages and/or specialized analysis software such as Matlab, R, SPSS, STATA, SAS, Praat, or E-Prime.
Experience working with large quantities of data.
Apply Here
And see my context here

"Laymen are generally lousy linguists: they do not know what questions to ask, they do not know how to look for answers to them and they are too ready to accept generalizations to which they could easily find counter examples."---James D. McCawley

"Asking a linguist how many languages they speak is like asking a doctor how many diseases they have."---Lynne Murphy (aka lynneguist)