Last Friday, I took a quick look at recent work by James Street and Ewa Dabrowska that shows a striking difference between grad students and people of "low academic attainment" ("LAA") on an apparently simple sentence-interpretation task ("'Unable to understand basic sentences?'", 7/9/2010). I had to cut my investigation short in order to take the RER-B to Charles de Gaulle airport for a flight back to Philadelphia, and so I didn't have a chance to take up (what I thought was) the most striking aspect of these experiments: the fact that the LAA subjects did so much better on sentences of the form "Every X is in a Y" than on sentences of the form "Every Y has an X in it".
In my original post, I suggested some reasons for poor LAA performance on such tasks in general — issues of attention, motivation, and general test-fu. I also alluded to old work by Peter Wason which shows that the fit between logical reasoning and ordinary-language interpretation is often quite bad, even when logic is arguably an accurate reconstruction of the core natural-language meaning. Various commenters reinforced and amplified these arguments, arguing for example that the test sentences were unidiomatic; that there might be dialect-difference issues; and that the task was highly decontextualized, making it more likely that subjects might imagine scenarios where (say) an injective function might be taken to pragmatically implicate a bijection.

These lines of reasoning explain the overall difference between HAA and LAA subjects, and also why the quantifier sentences were harder overall than the active/passive sentences were. And the well-known tendency to associate grammatical subjects with semantic agents might explain why the passive sentences would be harder than the active ones, since all the active-sentence subjects were agents and all the passive-sentence subjects weren't.

But nothing in my post or in the comments, as far as I can see, explains why the Q-has sentences ("Every shoe has a hamster in it") were so much harder than the Q-is sentences ("Every umbrella is in a stand"). In Street & Dabrowska's Experiment 1, the LAA group's Q-is average performance was 78% correct, while their Q-has average was 43% correct. In Experiment 2, the pre-test mean for Q-is was 44% correct, and for Q-has 15% correct. (The lower baseline for the Experiment 2 participants is a puzzle for another day.)

I've thought of several possible reasons for the Q-is/Q-has difference, none of them very satisfactory. You can probably think of some others — and I hope that they're more persuasive than mine are!

My first thought was that this might have to do with the grammatical role of the reference to the things that need to be checked. In "Every ball is in a box", the way to get the right answer is to look at all the balls and make sure that each of them is in a box. If you look at the boxes to see whether all of them have balls in them, you'll get the "wrong" answer. In the sentence "Every box has a ball in it", you need to check the boxes for balls, not the balls for boxes. But perhaps the construction "Y has an X in it" activates the proposition "X is in Y", and makes it more likely that you'll get confused about the domain and range of the mapping. However, I don't have any evidence for this idea, which just replaces one unexplained pattern with another.
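To make the two checking procedures concrete, here is a minimal sketch. It assumes a hypothetical way of describing a picture (a list of balls, a list of boxes, and a dict recording which box each contained ball is in); the function and variable names are illustrative and do not come from Street & Dabrowska's materials.

```python
def every_x_is_in_a_y(xs, ys, location):
    """Q-is reading: check each X; leftover (empty) Y's do not matter."""
    return all(location.get(x) in ys for x in xs)


def every_y_has_an_x_in_it(xs, ys, location):
    """Q-has reading: check each Y; leftover (uncontained) X's do not matter."""
    occupied = {location[x] for x in xs if x in location}
    return all(y in occupied for y in ys)


# Hypothetical picture: three balls, four boxes, one box left empty.
balls = ["ball1", "ball2", "ball3"]
boxes = ["box1", "box2", "box3", "box4"]
location = {"ball1": "box1", "ball2": "box2", "ball3": "box3"}

q_is = every_x_is_in_a_y(balls, boxes, location)        # every ball is in a box
q_has = every_y_has_an_x_in_it(balls, boxes, location)  # every box has a ball in it
print(q_is, q_has)
```

For this picture the two sentences come apart: checking the balls gives one answer and checking the boxes gives the other, so a subject who inspects the wrong set of objects will get the "wrong" answer.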

My second thought was that there might be a difference in construction frequency. But in fact, both of these constructions are vanishingly rare. In the 400-million-word COCA corpus, the search pattern {every [n*] has a|an [n*] in it} turns up exactly one hit ("Every classroom has a flag in it"). And the pattern {every [n*] is in a|an [n*]} turns up two hits, both spurious ("Every word is in a sense an infinite object" and "…every author is in a sense atypical…").

My third thought was that there might be some issue with animacy. In the "List of sentences used in one version of the comprehension task" given in the paper's appendix, three of six Q-has sentences feature an animate "thing contained": hamsters in shoes, turtles in bowls, dogs in baskets. The subjects in the Q-is sentences are all inanimate: umbrellas, feathers, toothbrushes, balls, pencils, cakes. And it's plausible (though not established, as far as I know) that animates would be more likely to be chosen as the domain of the mapping to be checked. But I don't know whether the other versions of the experimental materials maintained this difference in animacy.

[By the way — it's not clear to me whether all the pictures in the quantifier experiments always instantiated injective functions. In other words, were the relations between X's and Y's always one-to-one (perhaps with some X's and/or Y's left over)? Or was e.g. "Every ball is in a box" tested against a picture in which (say) there were four balls in one box, one ball in a second box, and no balls in three other boxes?]
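The question in this aside can be stated as a small check. The sketch below assumes the same kind of hypothetical picture description (a dict from each contained ball to its box) and encodes the four-balls-in-one-box scenario; it says nothing about what the actual experimental pictures looked like.

```python
from collections import Counter


def is_one_to_one(location):
    """True if no container holds more than one object (an injective mapping)."""
    counts = Counter(location.values())
    return all(n == 1 for n in counts.values())


# Hypothetical picture: four balls in one box, one ball in a second box,
# and three other boxes left empty.
balls = ["ball1", "ball2", "ball3", "ball4", "ball5"]
boxes = ["box1", "box2", "box3", "box4", "box5"]
location = {
    "ball1": "box1",
    "ball2": "box1",
    "ball3": "box1",
    "ball4": "box1",
    "ball5": "box2",
}

every_ball_is_in_a_box = all(location.get(b) in boxes for b in balls)
one_to_one = is_one_to_one(location)
print(every_ball_is_in_a_box, one_to_one)
```

In such a picture "Every ball is in a box" is true even though the relation between balls and boxes is not one-to-one.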

Scott said,

I'm sure they controlled for this, but could there be an ordering issue? If the participants were first exposed to a "Q-is" proposition, and then to a "Q-has" one, they might still have been looking at the objects (rather than the containers) to determine whether the statement is true or false, based on the pattern established in the previous round.

[(myl) The description of their experimental design indicates that all four types of sentences (Active, Passive, Q-is, Q-has) were presented in a quasi-randomized order, such that subjects never got two sentences of the same type in a row.]

Perhaps it is just down to the fact that "Q-has" sentences sound odd. We wouldn't say "every seat has a person in/on it"; instead we'd say "every seat is taken". I suspect that, given everyday examples like these, the participants would have performed better.

D. Sky Onosson said,

How about looking at phrases without "every"? "There's an X in the Y" seems to me to be much more natural than "The Y has an X in it".

A quick search in COCA for {[n*] has a|an|the [n*] in it} yields only 29 hits, while {[n*] is in a|an|the [n*]} gives 634. So while neither is very frequent, there is a pretty large difference between the two.

(I originally ran the search without "the", but I realized that "X is in the Y" should be fairly common. When I added "the", the count jumped from about 200 to 634 – but the 1st phrase "has … in it" didn't change at all, i.e. there were no occurrences at all of "has the … in it")

[(myl) Good point. Maybe this rescues the construction-frequency hypothesis.]

I can't help but wonder if this is more of a logic issue and less of a language issue. There are, after all, a great many people who do not seem to be able to isolate the problem with such statements as "All that glitters is not gold," or "All trees are not oaks." Such statements can apparently not be easily (and correctly) interpreted as being false, without a decent grounding in set theory, and it seems to me that there is a link between set theory and the Q-is/Q-has issue.

[(myl) I agree that (as in the case of the Wason selection test) the Q-is and Q-has part of the experiment is clearly an issue of semantic/pragmatic interpretation and not of grammatical analysis. There's no evidence to suggest that the subjects couldn't parse the sentences — maybe sometimes they couldn't or didn't, but this was never tested.

In the case of the active/passive sentences, the fact that nearly everyone got the active sentences nearly always right suggests that the basic semantics was clear, and the only question was how to map phrases onto the semantic interpretation process.

In the case of the Q-is/Q-has sentences, the fact that the best LAA scores were pretty bad suggests some basic unclarity about the semantic-interpretation task itself, i.e. the same sort of logical issues that arise in the Wason selection test.]

Stephen Jones said,

If you look at the picture for 'every basket has a dog in it', you'll see three baskets and four dogs. So what is salient in the picture is the dog without the basket.

However, that is not what is salient in the sentence, and therefore the sentence needs analysing as an abstract task; and we agreed that, partly because of the paper-airplane effect, an error is more likely to creep in when the LAA group has to make an explicit analysis and abstraction.

What I suggested was that half the group be shown the picture of three baskets and four dogs, and the other half a picture of three hotel rooms and four guests. The sentence 'Every room has a person in it' would now be relevant to the situation (somebody is without a room) and I suspect the number getting it right would be higher.

David Hilbert said,

It's not so clear to me that "All that glitters is not gold" is correctly interpreted as being false. It seems to me (and many others) that the "not" can take wide scope in sentences like this and thus the sentence can be glossed:
~((∀x)(Gl(x)→G(x))) which is equivalent to (∃x)(Gl(x)∧~G(x)) and is arguably true.
The narrow scope reading (∀x)(Gl(x)→~G(x)) is false and may be available but seems unnatural to me. I believe there is a largish literature on these issues but I'm afraid I'm not an expert.
None of this affects Mark's more general point.

It occurs to me that the Q-has sentences are syntactically and semantically more complex than the Q-is sentences; in Q-is, you have a simple PP that maps to a simple predicate, e.g. "in a box" maps to something like λx.[in'(x, box)], which then hooks in with a copula in a straightforward way so that the only complexity to resolve is the quantifier.

In Q-has, however, you have a pronoun coreference ("it") to resolve—depending on your formalism of choice, this might show up as a free variable e.g. "in it" -> λx.[in'(x, y)], but however you deal with it, the constituents "in it" and "a dog in it" require much more work to represent as partial meanings. And if you fall back from direct interpretation to the more pragmatic "what might they have meant by this?", you end up with the constituents that semantically serve as arguments of in' in reverse order, e.g. for "every basket has a dog in it", the relevant predicate order is in'(dog, basket), which means that someone bypassing the syntax is more likely than in the Q-is case to get it wrong.

From the perspective of this research, the Q-has case combines the difficulty of both the passive (reordered constituents) and the Q-is (quantifier). Plus more, because of the pronoun issue.

J. W. Brewer said,

Maybe "basic sentence[s]" might be a misnomer for sentences involving specific constructions that are "vanishingly rare" when searched for in COCA, however syntactically straightforward they might seem when diagrammed? Although I guess you'd need to get a sense of how common other sorts of patterns are. E.g., while "my hovercraft is full of eels" was originally funny because of its novelty/incongruousness, one might expect (tho' I don't have time to run the exercise) that other sentences of the form my X [noun, singular] is full of Y [some other noun, plural or uncountable] would be comparatively more common in corpora like COCA. Prior exposure to such a pattern could perhaps aid an LAA test-taker in determining whether or not a particular picture involving a hovercraft and some eels matched up with the sentence or not.

Brandon Weaver said,

Could it be attributed to an incomplete reading of the sentence? The Q-has sentences have a little twist at the end which could easily be glossed over, particularly by those unaccustomed to the "trick" questions of testing.

Since all the other questions have a pattern of "X | action | Y", and the Q-has questions deviate from this pattern with a small but important addition at the end, were participants merely conditioned to conduct their search after recognition of the initial pattern?

If you re-worded the question such that you could not conduct a search until you reach the end of the sentence, I wonder how the participants would score.

If you looked at a participant's data over time, did they eventually realize the Q-has questions have a different pattern? If it's really an issue of reading, I would expect the first several Q-has questions to be wrong, then, once the realization occurs that there are two patterns in play, for Q-has questions to be answered correctly, as more attention was placed on fully reading the sentence.

Michael Y Chen said,

This might be related to Scribner and Cole's studies on Vai literacy, schooling, and logic (learned at Penn, in Gene Buckley's LING115: Writing Systems).

The study found that participants with formal schooling did better at plausible syllogisms (situations that had real-world relevance), but everyone did equally well in abstract syllogisms. The deduction from the finding was that while schooling made people more willing to abstract from more-or-less ordinary statements, everyone was capable of abstraction. Literacy had no effect.

i doubt that this would account for the degree of difference in the results, but the "has" constructions require the resolution of anaphora (X~'it'). if that did make the difference, then sentences of the type "Every X has a Y" (e.g. "every man has a hat") should be considerably easier to process. that seems a bit unlikely, but maybe worth testing.

John Roth said,

One problem is that the obvious logical interpretation is not the one most people seem to make. The common proverb "All that glistens (or glitters, whatever) is not gold" seems to be interpreted as "Not everything that glistens is gold." That is, the word "all" is not treated as the universal quantifier, but rather as an existential quantifier.

[(myl) Or, equivalently and more plausibly, this is a consequence of the theorem

¬(∀x)P(x) ⇔ (∃x)(¬P(x))

This is not accurately described as "all" being "treated … as an existential quantifier", but rather as negation (associated with "not") taking wide scope over the universal quantifier (associated with "all"). But probably that's what you meant…

Anyhow, it's not obvious that there's an exploitable scope ambiguity in either the Q-is or Q-has sentences. Or am I missing something?]

The logical interpretation is only obvious to people with training in formal logic.
My suspicion is that the strange wording is what tags it as a proverb and sends it to the part of the brain that deals with metaphors. Such a part apparently exists: I remember seeing articles about people with specific brain damage who were unable to process that (and other) metaphors.

As far as the main question is concerned, it doesn't surprise me in the least that people without training in formal logic have difficulty with those statements.

Lucy Kemnitzer said,

I'm strongly supportive of the "unidiomatic" interpretation together with the contextlessness interpretation. I can't imagine an ordinary context in which a person might say "Every box has a hamster in it" rather than either "there is a hamster in every box" or "all the boxes have hamsters." Even a very simple and straightforward sentence can throw you for a loop if you haven't dealt with it before.

I don't have the material to hand, but when I was studying to be a teacher we read a lot of studies that seemed to show that students with a less academic background were more dependent on context (including social context) in making sense of things.

Marion Crane said,

This is just my personal gut feeling, but as I was reading this post and the previous one, and the comments on both, something struck me. Thing is, from the examples in the post, it seems they're not so much testing for Q-is vs Q-has, but rather specifically 'Q-is in' vs 'Q-has in it'.
Does it really need to be 'is in' or 'has in it' to test the subjects' grasp of simple sentences?

How about 'Q-is on' vs 'Q-has on it'?

Every hat is on a man.
Every man has a hat.

Every collar is on a dog.
Every dog has a collar.

Or, still testing basic sentence structures but throwing out the factor of two separate agents interacting, how about 'Q-is attribute' vs 'Q-has attribute'?

Every basket is red.
Every basket has a red colour.

Every ball is striped.
Every ball has stripes.

Wouldn't these examples, or better yet, a mix of all, have been better? And thus given different (more realistic) results for the LAA group's scores?

Peter said,

"All that glisters/glitters is not gold" is a fixed expression that predates set theory by close to 300 years. Trying to interpret it by modern set logic is like comparing the story of Icarus to modern aviation.

[(myl) Or interpreting the development of flowering plants in terms of the theory of evolution?

Logic is an attempt to give a formal reconstruction of truth and consequences — and to the extent that it's a successful attempt, it ought to work for understanding those aspects of expressions of any era. ]

Liz said,

One aspect that does not seem to be considered here is that LAAs and HAAs approach these tests in different ways. I base this on my experience administering the Census Applicant test, in both English and Spanish, to 300 or more prospective Census Enumerators (and talking to them afterwards about their experience with the test). Their educational levels ranged from not completing High School (in some cases not completing 9th grade) to advanced professional degrees. Both LAAs and those who were educated in a system that does not use 'bubble tests' described requiring a higher level of understanding of the material, and more confidence in identifying the right answer, than the HAAs did.

Given the low frequency of "every x has a y", one might expect that LAAs would have a lower level of confidence that they understood the question correctly, and if you don't feel confident that you have chosen correctly, you might well change your answer to an incorrect answer solely because you distrust your own instincts.

Which makes for a long way of saying that this test may be measuring test-taking skills and confidence to a higher degree than the researchers may have thought.

Rodger C said,

Given that "All that glisters is not gold" is an obvious metrical inversion of "Not all that glisters is gold," I fail to see the problem here. Or are logicians forbidden by their discipline to consider syntactic variation?

[(myl) I'm not aware that there's anything worth calling a "problem". There's a well-known scope ambiguity in sentences of the general form "All things with property P don't have property Q" (e.g. "all swans are not white" or "all the arrows didn't hit the target"), which can correspond roughly to (at least) two different kinds of logical expressions:

¬(∀x)(P(x) ⇒ Q(x))
or
(∀x)(P(x) ⇒ ¬Q(x))

Obviously the truth conditions for these two interpretations are quite different. But is this a "problem"? I don't see why.

In general, however, sentences of the form "Not all things with property P have property Q" (e.g. "not all swans are white" or "not all the arrows hit the target") are not ambiguous in this way, having only the interpretation in which the negation has wider scope than the universal quantifier. Again, no problem, just a fact.]
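As a rough illustration of how far apart the two readings are, the sketch below evaluates both against a tiny made-up model for "all swans are not white". The model (one white swan, one black swan, one white duck) and all names are assumptions chosen for the example.

```python
def wide_scope_negation(things, p, q):
    """'Not all P are Q': negation scopes over the universal quantifier."""
    return not all((not p(x)) or q(x) for x in things)


def narrow_scope_negation(things, p, q):
    """'No P is Q': the universal quantifier scopes over negation."""
    return all((not p(x)) or (not q(x)) for x in things)


# Hypothetical model: a white swan, a black swan, and a white duck.
birds = [
    {"kind": "swan", "color": "white"},
    {"kind": "swan", "color": "black"},
    {"kind": "duck", "color": "white"},
]


def is_swan(bird):
    return bird["kind"] == "swan"


def is_white(bird):
    return bird["color"] == "white"


wide = wide_scope_negation(birds, is_swan, is_white)
narrow = narrow_scope_negation(birds, is_swan, is_white)
# The wide-scope reading is equivalent to "some swan is not white".
some_swan_not_white = any(is_swan(b) and not is_white(b) for b in birds)
print(wide, narrow, some_swan_not_white)
```

In this model the wide-scope reading comes out true (there is a black swan) while the narrow-scope reading comes out false (there is also a white one), which is the difference in truth conditions described above.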

exackerly said,

I'm confused about exactly what this test is testing. Since we all agree that sentences of the form "Every x has a y in it" are extremely rare in actual English usage, and even "Every x is in a y" also appears to be rare, do the results really tell us anything about language skills? I think it has more to do with logic, reasoning, critical thinking, or math skills, take your pick. More like an IQ test than a language test.

[(myl) To reliably answer these questions correctly, subjects need to understand what the sentences (are meant to) mean, and to evaluate whether that proposition is true of the corresponding picture. This obviously does involve language understanding (I'd have no shot at problems expressed in Tibetan, for example, because I don't understand the language). But equally obviously, it involves some simple reasoning about the relation between the implied logical expressions and the schematized states of affairs.]

elinar said,

As far as I can see, Street & Dabrowska argue that there is no such thing as linguistic competence that is independent of cognition and experience. They assume that our linguistic abilities cannot be separated from other cognitive abilities; and that the ability to make use of syntactic cues is a product of experience and education/training, rather than of an autonomous, innate language faculty.

It may be – as some people have suggested – that S&D’s sentences are not basic (or natural) enough to test ‘core ability’. But the same thing could be said of many of the sentence structures that have been discussed extensively in the linguistics literature, even though they occur mainly or exclusively in written language.

So whatever the problems with these kinds of experiments, I think they raise important questions about theories of syntax. Specifically, what are they theories of: basic language capacity or the linguistic competence of highly educated, highly literate speakers/writers?

Tom Rose said,

My brother, Ralph Rose, is a linguist and he turned me on to this topic; very interesting, by the way. I'm a rank amateur and probably my comments are irrelevant but here goes anyway.

Were the researchers trying to make the following two statements logically equal?
"Every X is in a Y"
and
"Every Y has an X in it"
I think they're not but that might be irrelevant to the experiment.

The first statement says nothing about a potential infinity of Ys without Xs while the second statement says nothing about a potential infinity of Xs without Ys.

Moving on. For this LAA, the second sentence is harder for me to understand because of the word "it". I look for the closest word to associate with "it" and I find X. However, I realize that that doesn't work so I associate "it" with Y which occurred so long ago :) Oops! Too late. I can't think about my own thinking without data tampering. Nevertheless, my first impression was that the word "it" slowed my processing of the sentence.

Lastly, some passive police, and I've been one of them, criticize the passive because they say bureaucrats and others use it to cover up their lies, incompetence, etc. If that's true, what language constructions do the experts on language, the linguists, advise when the rest of us wish to lie, cover up, obfuscate, etc.?