How syntax gets you to "think"

A perennial problem in the literature on word learning is how children learn the meanings of words like “think” and “want”. Since the mental states these words refer to are unobservable (one cannot point to thoughts or desires, even if one can see their effects), it is unclear how a learner could know to link “think” to the concept THINKING and “want” to the concept WANTING. One proposal for how children learn these meanings is that they can categorize words by the sorts of sentences the words show up in (Gleitman, 1990). For instance, “think” can show up in sentences like (1a) but is odd in sentences like (1b) and (1c) (denoted by the “*”), while “want” can show up in sentences like (2b) and (2c) but is odd in sentences like (2a).

1a. Carla thinks that Janet went to the store.
1b. *Carla thinks Janet to go to the store.
1c. *Carla thinks a piece of cake.

2a. *Carla wants that Janet went to the store.
2b. Carla wants Janet to go to the store.
2c. Carla wants a piece of cake.

Here, I present evidence for this proposal (that features of a sentence carry information about the meaning of words in that sentence) using both experimental methodologies and computational modeling. I show that there exists a nontrivial correlation between the sentences a word can show up in and that word’s meaning.
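The intuition behind the proposal can be sketched in a few lines of code. The verbs, frames, and judgments below are invented for illustration (they are not data from the studies described): each verb is represented as a binary vector over syntactic frames, and verbs are compared by cosine similarity.

```python
import math

# Frames correspond to patterns like (1a)-(2c) above; 1 = acceptable, 0 = odd.
# These judgments are illustrative, not experimental data.
frames = ["__ that S", "__ NP to VP", "__ NP"]
verbs = {
    "think": [1, 0, 0],  # "thinks that Janet went..." is fine; the others are odd
    "want":  [0, 1, 1],  # "wants Janet to go...", "wants a piece of cake"
    "hope":  [1, 0, 0],  # patterns with "think" on these frames
}

def cosine(u, v):
    """Cosine similarity between two frame vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(verbs["think"], verbs["hope"]))  # 1.0 (identical frame vectors)
print(cosine(verbs["think"], verbs["want"]))  # 0.0 (no shared frames)
```

On this toy data, verbs with similar (mental-state) meanings end up with identical frame vectors, while “think” and “want” share no frames at all: a miniature version of the claimed correlation between syntactic distribution and meaning.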

fun and interesting!
First, a clarification: what do you mean when you say you looked at “how good different words sound with different sentence types”? Could you please explain what you mean by “good”?

Now for the question: do you think that MTurk participants are a good proxy for children (in the context of learning word meanings)? I see the value in exposing them to new, unfamiliar words, but I am not sure that is sufficient, primarily because adults’ worlds, associations, contexts, and learning experiences are significantly more loaded than those of children.

To answer the clarification: by good, I mean judged acceptable by participants. An “acceptable” sentence is something a native speaker of English would say. For example, “Frank wondered what he should eat with” is 100% acceptable even if a prescriptive text would warn against the ‘dangling preposition’. In contrast, “Frank wondered what he should eat bananas and” is clearly not acceptable. Acceptability judgments are a common form of data used in theoretical linguistics, and they can be collected in various ways. The method we used was a Likert-scale task, but thermometer and magnitude-estimation tasks are also used.

To answer your question (Dr. Drayton had a similar question in the Discussion section): first, a clarification. Adults were seeing words they knew and were making judgments either about those words’ similarity to other words they knew or about their compatibility (acceptability) with a syntactic construction. There are experiments in the human simulation paradigm (Gillette et al., 1999) that ask adults to make inferences about the meaning of a new word given constrained information, e.g. linguistic context and/or visual context, and we plan to carry some out using data from these experiments (see my answer to Dr. Drayton’s question for some of the reasoning behind this). But the experiments described here were with known words.

With regard to adults being quite different from children in terms of their biases: this is quite likely. The idea we’re pursuing here is to establish what the end state (the adult’s knowledge of their language) in fact looks like. That is, what do the connections between different words’ meanings and their syntactic distributions look like once you’ve learned those words’ meanings? The idea is that this is as noiseless a representation of the end state as we could hope to get. The next step is to use this knowledge about the informativity of different syntactic constructions to guide our research with children: what do children infer about a word’s meaning given the syntactic features it cooccurs with, and is it what we’d expect given the adult data?

Adults are also interesting in their own right. Given that we now know something about the strength of connections between different syntactic cues and meanings, suppose we feed these distributions back into the adult by giving them instances of a novel word. Do they infer things about the novel word’s meaning that you would expect given our model, or do they differ in systematic ways? This is interesting from the modeler’s perspective because it can give insight into how knowledge about the connections between meaning and syntax is actually used in learning. And even if this just tells us about how adults learn, the hope is that it would expand our knowledge about human cognitive capacities.

In the context of this research, we’re interested in addressing a possibly unbounded number of candidate meaning categories. The way we address this is with nonparametric versions of Bayesian methods for text analysis. One broad class of methods we use is topic models: models that attempt to automatically learn topics from text. When we make these models nonparametric (by placing a prior like the Chinese Restaurant Process or the Indian Buffet Process on topics/categories/features), we allow the model to assume that the number of topics, or in our case meaning categories, is potentially infinite. What is nice about these models is that we can balance parsimony (fewer categories) against fit within the workings of the model itself. Another nice thing about them is that, since many of these models are built for the analysis of text, the observed data are contingency tables, i.e. cooccurrence counts. This is exactly the sort of data we can glean from parsed corpora, allowing us to easily fit these models to cooccurrences of words with syntactic features.
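A minimal sketch of how a Chinese Restaurant Process prior lets the number of categories grow with the data (an illustration of the prior only, not the actual models used in this work): each new item joins an existing category with probability proportional to that category’s current size, or opens a new category with probability proportional to a concentration parameter alpha.

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample category assignments for n items under a CRP(alpha) prior."""
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in category k
    assignments = []
    for i in range(n):
        # Total unnormalized mass: i items seated so far, plus alpha for a new category.
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1        # join existing category k
                assignments.append(k)
                break
        else:
            counts.append(1)          # open a brand-new category
            assignments.append(len(counts) - 1)
    return assignments, counts

assignments, counts = crp_assignments(100, alpha=1.0)
print(len(counts))  # number of categories is inferred, not fixed in advance
```

The key property for the modeling described above is visible here: the model never commits to a fixed number of categories, yet larger categories attract more members, which is what trades fit against parsimony.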

Since the language children hear will be correlated with their social experiences, we can look at how our models perform (and what meaning categories they find) given transcripts of different social situations. For example, some of my recent work has focused on differences in syntactic complexity between play contexts (where a parent and child are playing together with toys) and meal contexts (where the parents and children are sitting at the dinner table). What I found was that there is higher syntactic complexity in the meal-time contexts than the play-time contexts (we already knew there was higher lexical complexity). The higher the syntactic complexity (and the more varied the syntax across situations), the better we’d expect our models to do. This suggests that different social experiences are an integral part of language learning even when we’re focused just on the linguistic input itself.
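A crude version of the complexity comparison above can be illustrated with mean length of utterance (MLU) in words. The utterances below are invented, and the real work uses parsed transcripts and syntactic measures rather than raw word counts; this is only a sketch of the shape of the analysis.

```python
# Hypothetical utterances from two contexts (invented, not corpus data).
play = ["look a ball", "throw it", "more blocks"]
meal = ["tell me what you did at school today",
        "do you want the peas that grandma made"]

def mlu(utterances):
    """Mean length of utterance in words: a crude complexity proxy."""
    return sum(len(u.split()) for u in utterances) / len(utterances)

print(round(mlu(play), 2))  # 2.33
print(round(mlu(meal), 2))  # 8.0
```

On this toy data the meal-time utterances come out longer, mirroring the direction of the reported contrast; a clause-based measure over parses would be the more faithful analogue of the syntactic-complexity results described.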

Our group is also collaborating with a group in Maryland’s Human Development and Quantitative Methods department to look at socioeconomic status (SES) differences both in the time course of word learning and in the input. We’re interested in looking at syntactic-complexity differences, akin to the play-time v. meal-time contexts work I described, across SES groups. The idea is to see whether our models can predict differences in the word-learning time courses we may find between different SES groups.

This is a very interesting question. I suspect that topic models could be used to good effect here, where in this case a document corresponds to the features associated with a single child. I think the bigger obstacle might be the feature engineering: what features to extract would depend on the sort of classification you want to make. So for instance, if one were interested in syntactic differences among different sets of children, it would be interesting to fit a topic model to features extracted from parsed transcripts of children’s speech (perhaps at the same age). (A slightly modified version of an adaptor grammar might also be interestingly applied to this sort of problem.) One might then look at the topics (classes of children) learned. An interesting application for this might be trying to discover common syntactic or lexical differences among disordered populations that could aid in early discovery and treatment—something we’re currently looking into.
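The feature-engineering step described above might look like the following sketch, where each “document” is one child’s transcript reduced to counts of syntactic features. The feature names and transcripts are made up; the resulting count matrix is the kind of contingency-table input a topic model would then be fit to.

```python
from collections import Counter

# Hypothetical per-utterance feature annotations for three children.
transcripts = {
    "child_A": ["that-clause", "bare-NP", "that-clause", "NP-to-VP"],
    "child_B": ["bare-NP", "bare-NP", "bare-NP"],
    "child_C": ["NP-to-VP", "that-clause", "NP-to-VP"],
}

# Fixed, sorted feature vocabulary shared across children.
features = sorted({f for fs in transcripts.values() for f in fs})

def count_matrix(transcripts, features):
    """One row per child (sorted by name), one column per feature."""
    rows = []
    for child in sorted(transcripts):
        c = Counter(transcripts[child])
        rows.append([c[f] for f in features])
    return rows

X = count_matrix(transcripts, features)
# Each row of X is one child's feature-count profile.
```

A topic model fit to X would then group children by the syntactic profiles of their speech, which is the clustering-by-child use case sketched in the answer above.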

Interesting project. You mention that you use MTurk as the most “noiseless” sample of the adult end state; are you suggesting that the MTurkers are representative English-speaking adults? Did you collect demographic data to verify (or at least qualify) that ‘end state’ measurement?

We do collect demographic data to ensure the Turkers are native speakers of American English. More importantly, though, we have designed and piloted a short screen to ensure that the participant is a native speaker of American English (an MTurk qualification that is viewable here: http://goo.gl/BPJ3s). Passing this screen is required before the Turker can participate.

This is a pretty challenging topic, I look forward to future developments!
Some things that this poster made me wonder:
1. What are the syntactic features that you think enable the learner to infer more specific meaning, as opposed to syntactic category?
2. How do you differentiate between the situation and the syntax? Under the theory of language you’re operating with, how do you deal with the “John is easy/eager to please” sort of phenomenon?
3. Is it safe to use adult learners as models for language acquisition?

1. Syntactic category is a possible syntactic feature; we’ve known since at least Brown (1957) that children can use syntactic category as a cue to word meaning. Syntactic category wouldn’t be such a useful feature in our experiment, though, since we looked only at verbs. Insofar as syntactic category focuses you in on the right concept space (verbs, very roughly, linked to event/state concepts; adjectives to property concepts; etc.), all of our predicates of interest will have had the same such focus qua their category.

The sorts of features we were interested in were (roughly) ones inherent in the verbs’ subjects and objects. So, for instance, finiteness (does the embedded clause have tense?) may be a good cue to the sorts of concepts a word links up with.

a. John wants/*thinks to be happy.
b. John *wants/thinks that he is happy.

Another sort of feature is the ability to take bare verb phrases:

c. John saw/*wanted/*thought Jen leave the house.

(And there are quite a few others.)

2. We didn’t differentiate between control and tough constructions in our experiment, not because they’re not interesting, but because there was a limit on the number of constructions we could include. To tease that difference out, we’d need to add an extra construction.

We did this for some constructions. So for instance, we were interested in whether a distinction between what are called the believe- and wager-classes would come out. We needed two constructions to differentiate this:

d. John believed/*wagered Mary to be intelligent.
e. Mary was believed/wagered to be intelligent.

3. No. But it is safe to use them as models of the end state. In the experiment described, what we were interested in is (1) whether there is, in principle, information present in the syntactic distributions once you’ve converged on a grammar; and (2) which features are informative about which distinctions. The idea is to use this knowledge to inform our experiments with children: which features should be informative, and do children actually use them?

This is not to say that valuable things can’t be learned from how adults make inferences based on evidence from syntactic features. We’re developing some experiments right now using the human simulation paradigm (Gillette et al., 1999), in which participants listen to a novel word cooccurring with various syntactic features and then have to figure out that word’s meaning. The idea is to see whether, if we give participants a specific verb’s syntactic distribution, we can get them to recapitulate the similarity judgments for that verb. This will give us another way to see how informative the syntactic distributions really are.

Very much enjoyed the poster and the video which clearly illustrated the issues that children face when constructing meaning. Thanks.
The only problem I had was that the repetitive soundtrack in the video distracted me from what was being said … but maybe it’s just late in the day … Thanks again.

Cécilia Tsan (Guest), May 20, 2013 | 10:06 p.m.

Fascinating analysis. I look forward to more. I agree with Joni: the soundtrack is not necessary. Being a musician myself, I found it really distracting. Thanks.

Very interesting topic. Your “dax” example is excellent at demonstrating a thought process which we often take for granted. I was wondering if you could expand on the broader impacts of this research and perhaps any future studies you hope to conduct.

That was a really interesting video, and it made me think about language in a way I haven’t before. Have you done research into how reading might influence a child’s ability to think? For instance, in elementary school, most students learn to read from common, modern books. Then in middle school, some students might be required to read books that were written centuries ago, when sentence construction was different. Would being exposed to the same language, but from different centuries, hinder or help the ability to contextually decipher words?