Wednesday, July 31, 2013

Not surprisingly, MOOCs are increasingly a hot topic, and now that some are being tried out, problems are sprouting. Here's a piece reporting on one such experiment. I wouldn't make too big a deal out of this apparent failure (at least if Udacity's reaction is anything to go by), for the technology is in its early stages and will no doubt improve. What is worth noting in this piece starts at paragraph four. The hype associated with MOOCs is nothing new. Moreover, what makes most salivate is not the immeasurable improvements to education that MOOCs will generate, but the COST SAVINGS and PROFITS that their biggest fans can clearly taste. The piece reminds us that this is always so with new technologies. Moreover, it is always the case that boosters (those who have the most to gain) are optimistic and sell the innovations based on best case scenarios of what the new technology will bring. Some believe (e.g. Brad DeLong) that the good features of MOOCs can be harnessed and the downsides mitigated through careful monitoring of their implementation. I am less sanguine. When there is money to be made and saved and that is the primary attraction, then harder to measure features, e.g. enhanced education, often give way.

This said, I have a modest proposal. If MOOCs are really the way of the future and their real attraction is their enhanced educational promise, then let's try them out FIRST in elite institutions. I suggest that Harvard, Yale, Princeton, MIT, Stanford, Duke, etc. announce that, in order to enhance the educational experience of their undergrads, they are going to shift to MOOCs in a big way. If it's the case that MOOCs really are better, then students and their parents should be delighted to see them replace more stuffy, less educationally advanced methods of knowledge delivery. If MOOCs are all that they are cracked up to be, or can be all that they are so cracked, then let's experiment first with elite students and, when the kinks are worked out, expand these to everyone else.

Call me cynical, but I suspect that this will be a hard sell at elite schools. I don't see their "customers" rallying to MOOC style education. Of course, I might be wrong. Let's see.

Monday, July 29, 2013

In some earlier posts (e.g. here,
here),
I discussed a theory of word acquisition developed by Medina, Snedeker,
Trueswell and Gleitman (MSTG) that I took to question whether learning in the
classical sense ever takes
place. MSTG propose a theory they dub
“Propose-but-Verify” (PbV) that postulates that word learning in kids is (i)
essentially a one trial process where everything but the first encounter with a
word is essentially irrelevant, (ii) at any given time only one hypothesis is
being entertained (i.e. there is no hypothesis testing/comparison going on)
and (iii) that updating only occurs if the first guess is disconfirmed, and
then it occurs pretty rapidly. MSTG’s
theory has two important features. First, it proceeds without much counting of
any sort, and second, the hypothesis space is very restricted (viz. it includes exactly one hypothesis at any
given time). These two properties leave relatively little for stats to do as
there is no serious comparison of
alternatives going on (as there’s only one candidate at a time and it gets
abandoned when falsified).
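
Properties (i)-(iii) can be sketched in a few lines of Python. This is a toy illustration and not MSTG's implementation: the function name `propose_but_verify`, the encoding of a scene as a set of candidate meanings, and the injectable `choose` function are all assumptions made for the example.

```python
import random

def propose_but_verify(scenes, choose=random.choice):
    """Toy Propose-but-Verify learner for a single word.

    scenes: iterable of sets, each holding the candidate meanings
            available when the word is heard (hypothetical encoding).
    choose: picks one meaning from a list; random by default.
    Returns the guess held after each scene.
    """
    guess = None
    history = []
    for scene in scenes:
        if guess is None or guess not in scene:
            # First encounter, or the single guess was disconfirmed:
            # drop it and propose one new meaning. No counts are kept.
            guess = choose(sorted(scene))
        # Otherwise the guess is verified and simply retained.
        history.append(guess)
    return history
```

Note that the learner stores exactly one meaning and no tallies, which is why there is so little for statistics to do here.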

This story was always a little too good to be true. After
all, it seems quite counterintuitive to believe that single instances of disconfirmation
would lead word acquirers (WA) to abandon a hypothesis. And not surprisingly, as is often the case,
those things too good to be true might not be. However, a later reconsideration
of the same kind of data by a distinguished foursome (partially overlapping,
partially different) argues that the earlier MSTG model is “almost” true, if not exactly spot on.

In a new paper (here)
Stevens, Yang, Trueswell and Gleitman (SYTG) adopt (i)-(iii) but modify them to
add a more incremental response to relevant data. The new model, like the
older MSTG one, rejects “cross situational learning” which SYTG take to involve
“the tabulation of multiple, possibly all, word-meaning associations across
learning instances” (p.3) but adds a more gradient probabilistic data evaluation procedure. The process works as
follows. It has two parts.

First, for “familiar” words, this account, dubbed “Pursuit
with abandon” (p. 3) (“Pursuit” (P) for short), selects the single most highly
valued option (just one!) and rewards it incrementally if consistent with the
input and if not it decreases its score a bit while also randomly selecting a
single new meaning from “the available meanings in that utterance” (p. 2) and
rewarding it a bit. This take-a-little, give-a-little is the stats part. In
contrast to PbV, P does not completely dump a disconfirmed meaning, but only
lowers its overall score somewhat. Thus,
“a disconfirmed meaning may still remain the most probable hypothesis and will
be selected for verification the next time the word is presented in the
learning data” (p. 3). SYTG note that replacing MSTG’s one strike you’re out
“counting” procedure, with a more gradient probabilistic evaluation measure
adds a good deal of “robustness” to the learning procedure.

Second, for novel words, P encodes “a probabilistic form of
the Mutual Exclusivity Constraint…[viz.] when encountering novel words,
children favor mapping to novel rather than familiar meanings” (p. 4). Here too the procedure is myopic, selecting
one option among many and sticking with it until it fails enough to be replaced
via step one above.
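
The two parts just described can be put together in a short Python sketch. It is a minimal illustration under stated assumptions, not SYTG's code: the function name `pursuit_step`, the score table layout, the learning rate `gamma` and the particular reward/penalty arithmetic are all choices made for the example, since the description above only says that scores go up or down "a bit".

```python
import random

def pursuit_step(scores, word, scene, gamma=0.1, choose=random.choice):
    """One learning instance of a toy Pursuit-style learner.

    scores: dict mapping word -> {meaning: score in [0, 1]}
    scene:  set of meanings available in the current utterance
    gamma:  learning rate (assumed value, not taken from the paper)
    Returns the meaning that was rewarded on this instance.
    """
    options = sorted(scene)
    if word not in scores:
        # Novel word: a probabilistic flavour of Mutual Exclusivity.
        # Prefer the meaning least strongly tied to any known word.
        def strongest_tie(meaning):
            return max((known.get(meaning, 0.0) for known in scores.values()),
                       default=0.0)
        low = min(strongest_tie(m) for m in options)
        pick = choose([m for m in options if strongest_tie(m) == low])
        scores[word] = {pick: gamma}
        return pick

    table = scores[word]
    best = max(table, key=table.get)  # the single favoured meaning
    if best in scene:
        # Confirmed: nudge its score toward 1.
        table[best] += gamma * (1.0 - table[best])
        return best
    # Disconfirmed: lower the score a bit, but do not discard it.
    table[best] *= (1.0 - gamma)
    # Reward one randomly chosen meaning from the current scene.
    new = choose(options)
    old = table.get(new, 0.0)
    table[new] = old + gamma * (1.0 - old)
    return new
```

With a small `gamma`, a meaning that has been confirmed several times survives an occasional mismatch and is still the one selected next time, which is the robustness SYTG point to. Only one meaning is ever tested per instance, so the learner remains as myopic as PbV.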

Thus, the P model, from what I can tell, is effectively the
old PbV model but with a probabilistic procedure for, initially, deciding on
which is the “least probable” candidate (i.e. to guide an initial pick) and for
(dis)confirming a given candidate (i.e. to up/downgrade a previously
encountered entry). Like the PbV, P is
very myopic. Both reject cross situational learning and concentrate on one
candidate at a time, ignoring other options if all goes well and choosing at
random if things go awry.

This is the P model. Using simulations based on CHILDES
data, the paper goes on to show that this system is very good when compared
both with PbV and, more interestingly, with more comprehensive theories that
keep many hypotheses in play throughout the acquisition process. To my mind,
the most interesting comparison is with Bayesian approaches. I encourage you to
take a look at the discussion of the simulations (section 3 in the paper). The bottom line is that the P model bested
the three others on overall score, including the Bayesian alternative. Moreover, SYTG were able to identify the main
reason for the success: non-myopic comprehensive procedures fail to
sufficiently value “high informative cues” provided early in the acquisition
process. Why? Because comprehensive
comparison among a wide range of alternatives serves to “dilute the probability
space” for correct hits, thereby “making the correct meaning less likely to be
added to the lexicon” (pp. 6-7). It seems
that in the acquisition settings found in CHILDES (and in MSTG’s more realistic visual
settings), this dilution prevents WAs from more rapidly building up their
lexicons. As SYTG put it:

The advantage of the Pursuit model
over cross-situational models derives from its apparent sub-optimal design. The
pursuit of the most favored hypothesis limits the range of competing meanings.
But at the same time, it obviates the dilution of cues, especially the highly
salient first scene…which is weakened by averaging with more ambiguous learning
instances…[which] are precisely the types of highly salient instances that the
learner takes advantage of…(p. 7).

There is a second advantage of the P model as compared to a
more sophisticated and comprehensive Bayesian approach. SYTG just touch on this, but I think it is worth
mentioning. The Bayesian model is computationally very costly. In fact, SYTG
note that full simulations proved impractical as “each simulation can take
several hours to run” (p. 8). Scaling up
is a well-known problem for Bayesian accounts (see here),
which is probably why Bayesian proposals are often presented as Marrian level 1
theories rather than actual algorithmic procedures. At any rate, it seems that
the computational cost stems from precisely the feature that makes Bayesian
models so popular: their comprehensiveness. The usual procedure is to make the
hypothesis space as wide as possible and then allow the “data” to find the
optimal one. However, it is precisely this feature that makes the obvious
algorithm built on this procedure intractable.

In effect, SYTG show the potential value of myopia, i.e. in
very narrow hypothesis spaces. Part of the value lies in computational
tractability. Why? The narrower the hypothesis space, the less work is required
of Bayesian procedures to effectively navigate the space of alternatives to
find the best candidate. In other words,
if the alternatives are few in number, the bulk of explaining why we see what
we get will lie not with fancy evaluation procedures, but with the small set of
options that are being evaluated. How to count may be important, but it is less
important the fewer things there are to count among. In the limit,
sophisticated methods of counting may be unnecessary, if not downright
unproductive.

The theme that comprehensiveness may not actually be
“optimal” is one that SYTG emphasize at the end of their paper. Let me end this
little advertisement by quoting them again:

Our model pursues the [i.e. unique
NH] highly valued, and thus probabilistically defined, word meaning at the
expense of other meaning candidates. By
contrast, cross-situational models do not favor any one particular meaning, but
rather tabulate statistics across learning instances to look for consistent
co-occurrences. While the cross-situational
approach seems optimally designed [my emph, NH], its advantage seems
outweighed by its dilution effects that distract the learner away from clear
unambiguous learning instances…It is notable that the apparently sub-optimal
Pursuit model produces superior results over the more powerful models with
richer statistical information about words and their associated meanings: word
learning is hard, but trying too hard may not help.

I would put this slightly differently: it seems that what
you choose to compare may be as important as (or more important than) how you choose to
compare them. SYTG reinforce MSTG’s earlier warning about the perils of
open-mindedness. Nothing like a well designed narrow hypothesis space to aid
acquisition. I leave the rationalist/empiricist overtones of this as an
exercise for the reader.

Friday, July 26, 2013

I have more than once gotten the impression that some think that generative grammarians (minimalists in particular) have a hostility to combining grammars and stats because of some (misguided, yet principled) belief that grammars and probabilities don't mix. Given the wide role that probability estimates play in processing theories, learnability models, language evolution proposals, etc., the question is not whether grammars and stats ought to be combined (yes they should be) but how they should be combined. Grammarians should not fear stats and the probabilistically inclined should welcome grammars. As Tim notes below, there are two closely related issues: what to count and how to count it. Grammars specify the whats, stats the hows. The work Tim discusses was done jointly with Chris Dyer (both, I am proud to say, UMD products) and I hope that it encourages some useful discussion on how to marry work on grammars with stats to produce useful and enlightening combinations.

Tim Hunter Post:

Norbert came
across this paper, which defines a
kind of probabilistic minimalist grammar based on Ed Stabler's formalisation of
(non-probabilistic) minimalist grammars, and asked how one might try to sum up
"what it all means". I'll mention two basic upshots of what we
propose: the first is a simple point about the compatibility of minimalist
syntax with probabilistic techniques, and the second is a more subtle point
about the significance of the particular nuts and bolts (e.g. merge and move
operations) that are hypothesised by minimalist syntacticians. Most or all of
this is agnostic about whether minimalist syntax is being considered as a
scientific hypothesis about the human language faculty, or as a model that
concisely captures useful generalisations about patterns of language use for
NLP/engineering purposes.

Norbert noted
that it is relatively rare to see minimalist syntax combined explicitly with
probabilities and statistics, and that this might give the impression that
minimalist syntax is somehow "incompatible" with probabilistic
techniques. The straightforward first take-home message is simply that we provide
an illustration that there is no deep in-principle incompatibility there.

This, however,
is not a novel contribution. John Hale (2006)
combined probabilities with minimalist grammars, but this detail was not
particularly prominent in that paper because it was only a small piece of a
much larger puzzle. The important technical property of Stabler's formulation
of minimalist syntax that Hale made use of had been established even earlier: Michaelis
(2001) showed that the well-formed derivation trees can be defined in the
same way as those of a context-free grammar, and given this fact probabilities
can be added in essentially the same straightforward way that is often used to
construct probabilistic context-free grammars. So everything one needs for
showing that it is at least possible for these minimalist grammars to be
supplemented with probabilities has been known for some time.
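
To make the "same straightforward way" concrete, here is a minimal Python sketch of how a probabilistic context-free grammar scores a derivation: each rule carries a probability, the rules sharing a left-hand side sum to one, and the probability of a derivation tree is the product of the probabilities of the rules it uses. The toy grammar, its numbers and the names `RULES`, `is_normalised` and `derivation_probability` are invented for illustration; nothing here is taken from the Hale or Michaelis papers, and a real minimalist grammar would first be converted to its context-free characterisation.

```python
from math import prod, isclose

# Hypothetical toy grammar: (lhs, rhs) -> probability.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("she",)): 0.6,
    ("NP", ("it",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("saw",)): 1.0,
}

def is_normalised(rules):
    """Check that the rules sharing a left-hand side sum to 1."""
    totals = {}
    for (lhs, _), p in rules.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(isclose(t, 1.0) for t in totals.values())

def derivation_probability(tree, rules=RULES):
    """Probability of a derivation tree.

    tree is a tuple (label, child, ...) where each child is either
    a terminal string or another tree of the same shape.
    """
    label, *children = tree
    # The rule used at this node is read off the children's labels.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    subtrees = [c for c in children if not isinstance(c, str)]
    return p * prod(derivation_probability(t, rules) for t in subtrees)
```

For example, the tree `("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "it")))` uses five rules, and its probability is the product of their five values. What gets counted in such a model is whatever the rules of the context-free grammar happen to be.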

While the
straightforward Hale/Michaelis approach should dispel any suspicions of a deep
in-principle incompatibility, there is a sense in which it does not have as
much in common with (non-probabilistic) minimalist grammars as one might want
or expect. The second, more subtle take-home message from our paper is a
suggestion for how to build on the Hale/Michaelis method in a way that better
respects the hypothesised grammatical machinery that distinguishes
minimalist/generative syntax from other formalisms.

As mentioned
above, an important fact for the Hale/Michaelis method is that minimalist
derivations can be given a context-free characterisation; more precisely, any
minimalist grammar can be converted into an equivalent multiple context-free
grammar (MCFG), and it is from the perspective of this MCFG that it becomes
particularly straightforward to add probabilities. The MCFG that results from
this conversion, however, "misses generalisations" that the original
minimalist grammar captured. (The details are described in the paper, and are
reminiscent of the way GPSG encodes long-distance dependencies in context-free
machinery by using distinct symbols for, say, "verb phrase" and
"verb phrase with a wh-object", although MCFGs do not reject movement
transformations in the way that GPSG does.) In keeping with the slogan that
"Grammars tell us what to count, and statistical methods tell us how to do
the counting", in the Hale/Michaelis method it is the MCFG that tells us
what to count, not the minimalist grammar that we began with. This means that
the things that get counted are not defined by notions such as merge and move
operations, theta roles or case features or wh features, which appeared in the
original minimalist grammar; rather, the counts are tied to less transparent
notions that emerge in the conversion to the MCFG.

We suggest a way
around this hurdle, which allows the "what to count" question to be
answered in terms of merge and move and feature-checking and so on (while still
relying on the context-free characterisation of derivations to a large extent).
The resulting probability model therefore works within the parameters that one
would intuitively expect to be laid out for it by the non-probabilistic
machinery that defines minimalist syntax; to adopt merge and move and
feature-checking and so on is to hypothesise certain joints at which nature is
to be carved, and the probability model we propose works with these same
joints. Therefore to the extent that this kind of probability model fares
empirically better than others based on different nuts and bolts, this would
(in principle, prima facie, all else being equal, etc.) constitute evidence in
favour of the hypothesis that merge and move operations are the correct
underlying grammatical machinery.

Thursday, July 25, 2013

Bill Idsardi sent me this link to an interview of Chomsky when he was in Ann Arbor. It pretty well reprises the themes that Chomsky touched on in his public lecture. The discussion makes clear why Chomsky thinks that the communicative view of language is both empirically incorrect (basic structure does not facilitate or reflect communication goals) and methodologically hopeless as an object of study (just much too complicated). There are also some pretty amusing comments that the interviewer intersperses and a link to the full raw interview. A personal remark: there is something charming about the interview, both the questions and internal dialogue and Chomsky's openness and availability. Those who have interacted with Chomsky will recognize these traits and be, once again, delighted. Enjoy.

Wednesday, July 24, 2013

The LSA summer institute just finished last week. Here are
some impressions.

In many ways it was a wonderful experience and it brought
back to me my life as a graduate student.
My apartment was “functional” (i.e. spare and tending towards the
slovenly). As in my first grad student apartments, I had a mattress on the
floor and an AC unit that I slept under. The main difference this time around
was that the AC unit I had at U Mich was considerably smaller than the earlier
industrial strength machine that was able to turn my various abodes into a meat
locker (I’m Canadian/Quebecois and ¯“mon pays ce n’est pas
un pays c’est l’hiver…”¯ !). In fact, this time around the AC was more like
ten flies flapping vigorously. It was ok if I slept directly under the fan
(hence the floor mattress). The
downside, something that I do not remember from my experience 40 years ago, was
that getting up out of bed was more demanding this time around than it was
then.

I was at the LSA to teach intro to minimalist syntax. It was a fun course to teach. There were
between 80 and 90 people who attended regularly, about half taking the course for
some kind of credit. To my delight, there was real enthusiasm for minimalist
topics and the discussion in class was always lively. The master narrative for the course was that
the Minimalist Program (MP) aims to answer a “newish” question: what features
of FL are peculiarly linguistic? The first lecture and a half consisted of a
Whig history of Generative Grammar, which tried to locate the MP project
historically. The main idea was that if one’s interest lies in distinguishing
the cognitively general from the linguistically parochial within FL there have
to be candidate theories of FL to investigate. GB (for the first time) provides
an articulated version of such a theory, with the sub-modules (i.e. Binding
theory, control theory, movement, subjacency, the ECP, X’ theory etc.)
providing candidate “laws of grammar.” The goal of MP is to repackage these
“laws” in such a way as to factor out those features that are peculiar to FL
from those that are part of general cognition/computation. I then suggested that this project could be
advanced by unifying the various principles in the different modules in terms
of Merge, in effect eliminating the modular structure of FL. In this frame of
mind, I showed how various proposals within MP could be seen as doing just
this: Phrase Structure and Movement as instances of Merge (E and I
respectively), case theory as an instance of I-merge, control, and anaphoric
binding as instances of I-merge (A-chain variety) etc. It was fun. The last lectures were by far the
most speculative (they involved seeing if we could model pronominal binding as an
instance of A-to-A’-to-A movement (don’t ask)) but there was a lot of
interesting ongoing discussion as we examined various approaches for possible
unification. We went over a lot of the
standard technology and I think we had a pretty good time going over the
material.

I also went on a personal crusade against AGREE. I did this partly to be provocative (after
all most current approaches to non-local dependencies rely on AGREE in a
probe-goal configuration to mediate I-merge) and partly because I believe that
AGREE introduces a lot of redundancy into the theory, not a good thing, so it
allowed us to have a lively discussion of some of the more recondite evaluative
considerations that MP elevates.[1] At any rate, here the discussion was
particularly lively (thanks Vicki) and fun. I would love to say that the class
was a big hit, but this is an evaluation better left to the attendees than to
me. Suffice it to say, I had a good time and the attrition rate seemed to be
pretty low.

One of the perks of teaching at the institute is that one
can sit in on one’s colleagues’ classes. I attended the class given by Sam Epstein,
Hisa Kitahara and Dan Seely (EKS). It
was attended by about 60 people (like I said, minimalism did well at this LSA
summer camp). The material they covered
required more background than the intro course I taught and EKS walked us
through some of their recent research. It was very interesting. The aim was to
develop an account of why transfer applies when it does. The key idea was
that cyclic transfer is forced in computations that result in multi-peaked
structures that themselves result from strict adherence to derivations that
respect (an analogue of) Merge-Over-Move and feature lowering of the kind that
Chomsky has recently proposed. The
technical details are non-trivial so those interested should hunt down some of
their recent papers.[2]

A second important benefit of EKS’s course was the careful
way that they went through some of Chomsky’s more demanding technical
suggestions, sympathetically yet critically.
We had a great time discussing various conceptions of Merge and how/if
labeling should be incorporated into core syntax. As many of you know, Chomsky
has lately made noises that labeling should be dispensed with on simplicity
grounds. Hisa (with kibitzing from Sam and Dan) walked us through some of his
arguments (especially those outlined in “Problems of Projection”). I was not
convinced, but I was enlightened.

Happily, in the third week, Chomsky himself came and
discussed these issues in EKS’s class.
The idea he proposed therein was that phrases require labels at least when transferred to the CI
interface. Indeed, Chomsky proposed a labeling algorithm that incorporated
Spec-Head agreement as a core component (yes, it’s back folks!!). It resolves labeling ambiguities. To be slightly less opaque: in {X, YP}
configurations the label is the most prominent (least embedded) lexical item
(LI) (viz. X). In {XP, YP} configurations there are two least embedded LIs
(viz. descriptively, the head of X and the head of Y). In these cases,
agreement enters to resolve the ambiguity by identifying the two heads (i.e. thereby making them the same).
Where agreement is possible, labeling is as well. Where it is not, one of the
phrases must move to allow labeling to occur in transfer to CI. Chomsky suggested that this requirement for
unambiguous labeling (viz. the demand that labels be deterministically
computed) underlies successive cyclic movement.

To be honest, I am not sure that I yet fully understand the
details enough to evaluate it (to be more honest, I think I get enough of it to
be very skeptical). However, I can say that the class was a lot of fun and very
thought provoking. As an added bonus, it brought me and Vicki Carstens together
on a common squibbish project (currently under construction). For me it felt
like being back in one of Chomsky’s Thursday lectures. It was great.

Chomsky gave two other less technical talks that were also
very well attended. All in all, a great two days.

There were other highlights. I got to talk to Rick Lewis a
lot. We “discussed” matters of great moment over excellent local beer and some
very good single malt scotch. It was as part of one of these outings that I got
him to allow me to post his two papers here.
One particularly enlightening discussion involved the interpretation of the
competence/performance distinction. He proposed that it be interpreted as
analogous to the distinction between capacities and exercisings of
capacities. A performance is the
exercise of a capacity. Capacities are never exhausted by their
exercisings. As he noted, on this
version of the distinction one can have competence
theories of grammars, of parsers, and of producers. On this view, it’s not that
grammars are part of the theory of competence and parsers part of the theory of
performance. Rather, the distinction marks the important point that the aim of
cognitive theory is to understand capacities, not particular exercisings
thereof. I’m not sure if this is exactly what Chomsky had in mind when he
introduced the distinction, but I do think that it marks an important
distinction that should be highlighted (one further discussed here).

Let me end with one last impression, maybe an inaccurate
one, but one that I nonetheless left with.
Despite the evident interest in minimalist/biolinguistic themes at the
institute, it struck me that this conception of linguistics is very much in the
minority within the discipline at large. There really is a
linguistics/languistics divide that is quite deep, with a very large part of
the field focused on the proper description of language data in all of its vast
complexity as the central object of study. Though there is no a priori reason why this endeavor should
clash with the biolinguistic one, in practice it does.

The two pursuits are animated by very different aesthetics,
and increasingly by different analytical techniques. They endorse different conceptions of the
role of idealization, and different attitudes towards variation and complexity.
For biolinguists, the aim is to eliminate the variation, in effect to see
through it and isolate the individual interacting sub-systems that combine to
produce the surface complexity. The trick on this view is to find a way of
ignoring a lot of the complex surface data and homing in on the simple underlying
mechanisms. This contrasts with a second conception, one that embraces the
complexity and thinks that it needs to be understood as a whole. On this second
view, abstracting from the complex variety manifested in the surface forms is
to abstract away from the key features of language. On this second view, language IS variation,
whereas from the biolinguistic perspective a good deal of variation is noise.

This, of course, is a vast over-simplification. But I sense that
it reflects two different approaches to the study of language, approaches that
won’t (and can’t) fit comfortably together. If so, linguistics will (has) split
into two disciplines, one closer to philology (albeit with fancy new
statistical techniques to bolster the descriptive enterprise) and one closer to
Chomsky’s original biolinguistic conception whose central object of inquiry is
FL.

Last point: One thing I also discovered is how much work running
one of these Institutes can be. The organizers at U Michigan did an outstanding
job. I would like to thank Andries Coetzee, Robin Queen, Jennifer Nguyen and all
their student helpers for all their efforts.
I can be very cranky (and I was on some days) and when I was, instead of
hitting me upside the head, they calmly and graciously settled me down, solved
my “very pressing” problem and sent me on my merry way. Thanks for your efforts,
forbearance and constant good cheer.

CB assures me that Peter Ackema was NOT review editor when the review of Of Minds and Language mentioned in the previous post was solicited. As such you deserve my sincerest apologies. I can see how being falsely accused of such solicitation would border on defamation of intellect. Sadly, you were review editor at the time of its publication. I hope (and would like to believe) that you were not part of the review process and that the unstoppable wheels of the JL juggernaut would have crushed you had you tried to intervene and delay publication until a modicum of content could have been added. But that is a lot to ask: schedules are schedules after all. So, sorry for personally singling you out. I should have appreciated that, given the long lag time between solicitation and publication, another captain would have been at the helm. However, now that you are the editor, may I suggest a little more quality control?

Tuesday, July 23, 2013

One of the secrets of time management for the busy
linguistic professional is to know what to read. I rely on two sources to guide
me.

First and most important, I rely on my colleagues. If they
recommend something, I generally rush off to take a look. Why so obedient?
Because I really trust my colleagues’ judgments. Not only do they know a good
paper when they read one, they also have very good taste in topics, i.e.
they know which are worth worrying about and which are a waste of time. They know a
good argument, analysis, criticism, proposal when they see one and, equally
important, can spot work that is best left undisturbed by human eyes. In other
words, they have judgment and good taste and so their recommendations are
golden.

Second, I rely on reviews. I love reviews. These come in two
flavors: reviews by those whose taste and judgment you trust and those (let’s
dub these “Inverse” reviews (IR)) by reviewers whose taste and judgment you
don’t. Both are very very useful. You can guess why I value the former. They
strongly correlate with “worthwhiledness,” (W). But the latter are also very
useful for they are full of information (in the technical sense: viz. they
inversely strongly correlate with W). After all if someone with execrable taste
and barely competent analytic abilities loves a certain piece of work, then
what better reason can one have to avoid it?
And the converse holds as well: what better recommendation for a book
than a negative review by one whose taste and competence you deplore. One could
go further, praise from such a source would be reason enough to question one’s
own positive evaluations!

Why do I mention this? For two reasons. First, I recently
came across a very useful review of what I took to be a very good and
enlightening book about the Minimalist Program. The book is Of
Minds and Language. Here’s
a review I did with Alex Drummond. Happily, my positive reaction to this
discussion about Minimalism was seconded by this very negative review here.
Faithful readers of this blog will recognize the deft comments of CB. With
characteristic flair, CB pans the book, and applies her keen critical analysis
to Chomsky’s every (imaginary) faux pas.
So there you have it; a strong recommendation from me (and Alex) and, if
possible, an even stronger inverse recommendation (and hence a definite must-read for the wise) from CB.

One last point: despite the utility of reviews like CB’s for
people like me, they do have one failing. They don’t really engage the subject
matter and so do not add much to the discussion. This is too bad. A good review
is often as (and sometimes, more) interesting as the thing reviewed. Think of
Chomsky’s famous review of Skinner’s Verbal
Behavior for example (here). IRs despite their great informational content
are not really worth reading. As such, a question arises: why do journals like
the Journal of Linguistics (JL) and
review editors like Peter Ackema solicit this kind of junk? True, Ackema
and JL are doing a public service in
that these reviews are excellent guides about what to read (remember inverse
correlations of quality are reliable
indicators of quality; all that’s needed is a handy minus sign to reveal the
truth). But do Ackema and JL really think that this adds anything of
substance? Impossible! So why do they
solicit and print this stuff? I think I know.

Taken in the right frame of mind, these kinds of reviews can
be quite entertaining. I think that JL has joined the Royal Society’s efforts (here)
to lighten the tone of scholarly research by soliciting (unconscious?) parodies.
It’s probably a British thing, you know Monty
Python Does Scholarship or Linguistics
Beyond the Fringe. Or maybe this
stuff reads better with a British accent (I still think that half of what makes
Monty Python, The Goon Squad Show (thanks to David P for correction) and Beyond the Fringe funny is the accent). At any rate, it’s clear that the editors of
these journals have decided that there’s no future in real scholarship and have
decided to go into show business big time (not unlike Science). I have nothing against this, though I do think that they
should have warned their readers before including parody in their pages. But
now you are warned and you can read these papers with pleasure, all the while
also extracting lots of useful information, as one can from perfect negative
correlations.

Oh yes: I am sure that CB will be busy correcting my errors
in the comment sections. I will refrain from replying given my policy of
refraining from engaging CB ever again, but this should not prevent you from
enjoying yourselves.