
Wednesday, March 30, 2016

Linguistics from a Marrian perspective 2

This post follows up on this
one. That post tried to identify the relevant computational problems that
need solving. This one discusses some ways in which the Marrian perspective
does not quite fit the linguistic situation. Here goes.

Theories that address computational problems are
computational level 1 theories. Generative Grammar (GG) has offered accounts of
specific Gs of specific languages, theories of FL/UG that describe the range of
options for specific Gs (e.g. GB is such a theory of FL/UG) and accounts that
divide the various components of FL into the linguistically specific (e.g.
Merge) and the computationally/cognitively general (e.g. feature
checking and minimal search). These accounts aim to offer partial answers to
the three questions in (1-3). How do they do this? By describing,
circumscribing and analyzing the class of generative procedures that Gs
incorporate. If these theories are on the right track, they partially explain
how it is that native speakers can understand and produce language never before
encountered, what LADs bring to the problem of language acquisition that
enables them to converge on Gs of the type they do despite the many splendored
poverty of the linguistic input and (this is by far the least developed
question) how FL might have arisen from a pre-linguistic ancestor. As these are
the three computational problems, these are all computational theories. However,
the way linguists do this is somewhat different from what Marr describes in his
examples.

Marr’s general procedure is to solve level 1 problems by
appropriating some already available off the shelf “theory” that models the
problem. So, in his cash register example, he notes that the problem is
effectively an arithmetical one (four functions and the integers). In vision
the problem is deriving physical values of the distal stimulus given the input
stimuli to the visual system. The physical values are circumscribed by our
theories of what is physically possible (optics, mechanics) and the problem is
to specify these objective values given proximate stimuli. In both cases,
well-developed theories (arithmetic, optics) serve to provide ways of
addressing the computational problem.

So, for example, the cash register “solves” its problems by
finding ways of doing addition, subtraction, multiplication and division of
numbers which corresponds to adding items, subtracting discounts, adding many
of the same item and providing prices per unit. That’s what the cash register
does. It does basic arithmetic. How does it do it? Well that’s the level 2
question. Are prices represented in base 2 or base 10? Are discounts applied
to individual items as they are rung up, or taken off the total at the end?
These are level 2 questions of the level 1 arithmetical theory. There is then a
level 3 question: how are the level 2 algorithms and representations embodied?
Silicon? Gears and fly-wheels? Silly putty and string? But observe that the
whole story begins with a level 1 theory that appropriates an off the shelf
theory of arithmetic.
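To make the levels concrete, here is a minimal sketch (mine, not Marr's or the post's; all names are invented for illustration): one level 1 arithmetical specification of what the register computes, and two different level 2 algorithms, corresponding to the per-item versus end-of-total discount question, that compute the very same function.

```python
# Illustrative sketch: one level 1 function, two level 2 algorithms.

def total_level1(prices, discount_rate):
    """Level 1 spec: the register computes sum(prices) * (1 - discount_rate)."""
    return sum(prices) * (1 - discount_rate)

def total_per_item(prices, discount_rate):
    """Level 2 option A: the discount is applied to each item as it is rung up."""
    total = 0.0
    for p in prices:
        total += p * (1 - discount_rate)
    return total

def total_at_end(prices, discount_rate):
    """Level 2 option B: items are summed first; the discount comes off the total."""
    total = 0.0
    for p in prices:
        total += p
    return total - total * discount_rate

prices = [2.50, 4.00, 1.25]
assert abs(total_per_item(prices, 0.1) - total_level1(prices, 0.1)) < 1e-9
assert abs(total_at_end(prices, 0.1) - total_level1(prices, 0.1)) < 1e-9
```

Both algorithms satisfy the level 1 description; nothing at level 1 decides between them, which is exactly why the level 2 question is a separate one.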

The same is true of Marr’s theories of early vision where
there are well-developed theories of physical optics to leverage a level 1
theory.

And this is where linguistics is different. We have no
off-the-shelf accounts adequate for describing the three computational problems
noted. We need to develop one and that’s what GG aims to do: specify level 1
computational theories to describe the lay of the linguistic land. And how do
we do this? By specifying generative procedures and representations and
conditions on operations. These theories circumscribe the domain of the possible; Gs tell us what a possible linguistic object in a specific
language is, FL/UG tells us what a possible
G is and Minimalist theories tell us what a possible
FL is. This leaves the very real question of how the possible relates to the
occurrent: how do Gs get used to figure out what this sentence means? How does FL/UG get used to build this G that the LAD is acquiring? How
does UG combine with the cognitive and computational capacities of our
ancestors to yield this FL (i.e. the
ones humans in fact have)? Generative procedures are not algorithms, and (e.g.)
representations the parser uses need not be the ones that our level 1 G
theories describe.

Why mention this? Because it is easy to confuse procedures
with algorithms and representations in Marr’s level 2 sense with Chomsky’s
level 1 sense. I know that I confused them, so this is in part a mea culpa
and in part a public service. At any rate, the levels must be kept conceptually
distinct.

I might add that the reason Marr does not distinguish
generative procedures from algorithms or level 1 from level 2 representations
is that for him, there is no analogue of generative procedures. The big
difference between linguistics and vision is that the latter is an input system
in Fodor’s sense, while language is a central system. Early visual perception
is more or less pattern recognition, and the information processing problem is
to get from environmentally generated patterns to the physical variables that
generate these patterns.[1]

There is nothing analogous in language, or at least not for
large parts of it. As is well known, the syntactic structures we find in Gs are
not tied in any particular way with the physical nature of utterances.
Moreover, linguistic competence is not related to pattern matching. There are
an infinite number of well-formed "patterns" (a point that Jackendoff rightly
made many moons ago). In short, Marr’s story fits input systems better than it
does central systems like linguistic knowledge.

That said, I think that the Marr picture presses an issue
that linguists should be taking more seriously. The real virtue of Marr’s
program for us lies in insisting that
the levels should talk to one
another. In other words, the work on any level could (and should) inform the
theories at the other levels. So, if we know what kinds of algorithms
processors use, then this should tell us something about the right kinds of level
1 representations we should postulate.

The work by Pietroski et al. on most (discussed here)
provides a nice illustration of the relevant logic. They argue for a particular
level 1 representation of most in
virtue of how representations get used to compare quantities in certain visual
tasks. The premise is that transparency between level 1 and level 2 representations
is a virtue. If it is, then we have an argument that the structure of most
looks like this: |{x: D(x) & Y(x)}| > |{x: D(x)}| - |{x: D(x) & Y(x)}|,
and not like this: |{x: D(x) & Y(x)}| > |{x: D(x) & ¬Y(x)}|.
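A hedged sketch of the contrast (function names and the toy dot display are mine, not Pietroski et al.'s): the two specifications are truth-conditionally equivalent, but they suggest different verification procedures, one via subtraction from the cardinality of the whole domain, the other via direct comparison with the complement set. That procedural difference is what the visual comparison tasks are meant to tease apart.

```python
# Two extensionally equivalent specifications of "most D are Y".

def most_subtraction(D, Y):
    """|{x: D(x) & Y(x)}| > |{x: D(x)}| - |{x: D(x) & Y(x)}|"""
    dy = len(D & Y)
    return dy > len(D) - dy

def most_complement(D, Y):
    """|{x: D(x) & Y(x)}| > |{x: D(x) & not-Y(x)}|"""
    return len(D & Y) > len(D - Y)

dots = set(range(10))        # toy domain D: ten dots in a display
blue = {0, 1, 2, 3, 4, 5}    # the Y property: the blue ones

# Same truth conditions on any input, different computational routes.
assert most_subtraction(dots, blue) == most_complement(dots, blue) == True
assert most_subtraction(dots, {0, 1, 2}) == most_complement(dots, {0, 1, 2}) == False
```

The transparency premise is then that evidence about which procedure subjects actually deploy bears on which specification the level 1 representation uses.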

Is transparency a reasonable
assumption? Sure, in principle. Of course, we may find out that it raises
problems (think of the Derivational Theory of Complexity (DTC) in days of
yore). But I would argue that this is a good
thing. We want our various level theories to inform one another and this means
countenancing the likelihood that the various kinds of claims will rub roughly
against one another quite frequently. Thus we want to explore ideas like the
DTC and representational transparency that link level 1 and level 2 theories.[2]

Let me go further: in other
posts I have argued for a version of the Strong Minimalist Thesis (here and here and
here) which can be recast in Marr terms as
follows: assume that there is a strong transparency between level 1 and level 2
theories in linguistics. Thus, the objects of parsing are the same as those we
postulate in our competence theories, and the derivational steps index
performance complexity as measured by BOLD responses and other CN measures of occurrent
processing and real-time acquisition, and so on. This is a very strong thesis, for it says that the categories and procedures
we discover in our level 1 theories strongly correlate with the algorithms and
representations in our level 2 theories. That would be a very strong claim and
thus very interesting. In fact, IMO, interesting enough to take as a regulative
ideal (as a good research hypothesis to be explored until proven decisively
wrong, and maybe even then). This is what Marr’s logic suggests we do, and it
is something that many linguists feel inclined to resist. I don’t think we
should. We should all be Marrians now.

To end: Marr’s view was that
CNers ignored level 1 theories to their detriment. In practice this meant
understanding the physical theories that lie behind vision and the physical
variables that an information processing account of vision must recover. This
perspective had real utility given the vast amount we know about the physical
bases of visual stimuli. These can serve to provide a good level 1 theory.
There is no analogue in the domain of language. The linguistic properties that
we need to specify in order to answer the three computational problems in (1-3)
are not tied down in any obvious ways to the physical nature of the “input.”
Nor do Gs or FL appear to be all that interesting mathematically so that there
is off-the-shelf stuff that we can use to specify the contours of the
linguistic problem. Sure, we know that we need recursive Gs, but there are
endlessly many different kinds of recursive systems and what we want for a
level 1 linguistic theory is a specification of the one that characterizes our
Gs. Noting that Gs are recursive is, scientifically, a very modest observation
(indeed, IMO, close to trivial). So, a good deal of the problem in linguistics
is that posing the problem does not invite a lot of pre-digested technology
that we can throw at it (like arithmetic or optics). Too bad.
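A toy illustration of the point (entirely mine, not from the post): "recursive" by itself underdetermines the system, since trivially different recursive specifications generate different languages, and only a substantive level 1 theory says which one characterizes our Gs.

```python
# Two toy recursive specifications generating different languages.

def anbn(n):
    """Recursive spec of a^n b^n, as in S -> a S b | empty."""
    if n == 0:
        return ""
    return "a" + anbn(n - 1) + "b"

def abn(n):
    """Recursive spec of (ab)^n, as in S -> ab S | empty."""
    if n == 0:
        return ""
    return "ab" + abn(n - 1)

assert anbn(3) == "aaabbb"
assert abn(3) == "ababab"
# Both systems are recursive; recursion alone does not pick out either one.
```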

However, thinking in level terms
is still useful for it serves as a useful reminder that we want our level 1
theories to talk to the other levels. The time for thinking in these terms
within linguistics has never been more ripe. Marr's three-level format provides a
nice illustration of the utility of such cross talk.

[1]
Late vision, the part that gets to object recognition, is another matter. From
what I can tell, “higher” vision is not the success story that early vision is.
That’s why we keep hearing about how good computers are at finding cats in
Youtube videos. One might surmise that the problem vision science has with object
recognition is that it has not yet developed a good level 1 theory of this
process. Maybe it needs to develop a notion of a "possible" visual object.
Maybe this will need a generative combinatorics. Some have mooted this
possibility. See this
on “geons.” This kind of theory is recognizably similar to our kinds of GGs. It
is not an input theory, though like a
standard G it makes contact with input systems when it operates.

[2]
Let me once again recommend earlier work by Berwick and Weinberg (e.g. here)
that discusses these general issues lucidly.

8 comments:

I'm not sure that there is an algorithmic (level 2) level for syntax itself. I think basically there's a computational level and a physical level, but syntactic knowledge is not knowledge of process in any sense. Sure, you can use syntactic knowledge in performing tasks (thinking or speaking or whatever), but what that knowledge is is a recursive specification of an infinite range of structures, not any kind of process. If that's right, then we could take one of those tasks (say parsing) and ask for that task what the relevant levels are. Here, because we have a process, we can indeed specify things at the three levels: what the computational problem is (the map from continuous speech waves to the syntactic representation), what particular algorithm carries it out (is it perfectly incremental, is it parallel, etc.) and what wetware carries it out (which bits of anatomy are recruited and what do they do). But that is because parsing is an activity/process, which syntax is not (at least arguably, though Colin may beg to differ!). So we can apply Marrian levels to parsing quite easily, but for syntax itself, I think there's at least a reasonable position that says that it's only computational and physical.

I'm not sure. We can specify two computational problems as Chomsky seems to do. The first is the creative use of language (more or less Chomsky's terminology) and the acquisition of Gs. Both these are processes. The supposition is that part of explaining how this works is via postulation of FL/UGs and Gs which provide abstract descriptions of what these activities do: they map sounds into meanings and meanings into sounds as per our best theories of those Gs and they map PLD into Gs as per our best theories of FL/UG. There are algorithms that compute the functions these Gs and FL/UG describe. Of course, generative procedures are not themselves algorithms, they are part of the intensional description of the function that mediates S and M and PLD and G, just like Marr's computational level theories describe the functions that the algorithms compute.

Now, we can make an additional (and IMO very useful) assumption that there is a pretty transparent relation between the structure of the computational theories and the algorithms that use them. So, for example, we might assume that derivational steps mirror the algorithmic ones (the old DTC) and we might assume that the structured objects that the Gs generate are mirrored in the objects that our on-line algorithms construct. This may or may not be true, but it would be interesting were it so, and there is some non-trivial evidence suggesting that this is often true. The same kind of assumption might be made by Marr in vision, and indeed, the physical constraints that characterize the "real" objects are mirrored by cognitive principles in the algorithmic domain (e.g. the rigidity principle is used in the visual system's real-time computations to recover values for objective visual variables).

If this is on the right track, then where I agree with you is that we should not confuse generative procedures with algorithmic ones (though we might postulate that they are closely related), and we should not think of Gs and FL/UG as theories of use (they are not). But I think that we should believe that they are necessary steps in characterizing such theories, for they specify what the language system is doing when processing/producing and acquiring, i.e. they provide computational level accounts of the computational problems that must be solved.

Last point: I am not sure that I think that Gs and FL/UG are physically embodied apart from the systems of use that exploit them. I am not sure that I understand the claim that Gs and FL/UG enjoy physical existence independently of the systems that use them. Fortunately, I don't think it matters right now HOW I think of this question. Maybe there is, maybe there isn't. I have generally assumed that FL/UG specifies the LAD, IN PART. And I am happy to think that Gs specify properties of parsing and producing IN PART. Whether this means that they "live" in some part of the brain independently of these systems of use is not something that I feel committed to (or even really understand). At any rate, I think that if Gs and FL/UG are computational level theories then they should inform us about how systems use them, though they are not themselves systems of use. But this is what computational theories always do, at least in Marr. The levels talk to one another without being identical. I can live with that view for linguistic theories as well.

I think I agree with David in that applying Marrian levels to e.g. parsing is much more intuitive: we have a sound wave as input (corresponding to the light waves/particles in visual processing), some algorithm that carries out computations, and possibly dedicated wetware. What causes confusion is that, as Norbert notices, real physical objects, unlike G-generated objects, are not generated by anything, so investigating level 2 of visual processing probably doesn't require knowledge about the nature of the physical objects we see.

However, to fully understand the algorithmic level of speech processing we would have to explore the nature of Gs and the objects they generate, and this is basically what we do (it would be nice if the DTC were true). The problem just doesn't seem as straightforward as vision; we go back and forth. For example, Gs are the result of acquisition, and acquisition obviously relies on the algorithmic level of speech perception. So to investigate level 2 would mean to investigate and explain something that relies on level 2, but that is exactly what we are trying to explain. Sorry if this leads to more confusion.

I think we’re in broad agreement, Norbert, but I guess I don’t see the creative use of language as the problem syntactic theory is addressing - the problem syntactic theory addresses is the unbounded pairing of form and meaning, and that problem seems to me to have a computational and not an algorithmic level. Any problem that involves using syntax will have an algorithmic level as, like vision, there’s a process involved so the computational level of description of the problem is something which is (presumably) implemented algorithmically.

I guess I see syntax as a bit like a logic. You can state the nature of the logic computationally (as rules essentially), but how you do a particular proof, while constrained by those rules, can involve many different kinds of algorithms. So putting the logic to use involves Marr’s algorithmic level, but simply saying what the logic is (which is what I think syntactic theory does), involves the computational level only. That computational level specification constrains the use to just those algorithms that respect the overall computational nature of the system, so that’s an initial set of constraints on use that’s given by the logic of the argument. I agree that it is plausible, though not necessary of course, that the computational system is also a factor in optimisation of the systems of use.

In terms of physical specification, I don’t mean anything very deep by it, just that we are physical beings, so our physical architecture, whatever it is, must have a special structure that supports the computational nature of language. I think I disagree that there’s nothing to this beyond the physical nature of the systems of use. One could imagine a creature that has the physical wherewithal to support the underlying logic of syntax (some organisation that supports numerically unbounded hierarchy) without any physical structure that would support a system for using it (the converse of Chomsky’s evolutionary scenario).

The creative aspect of language use (that humans can produce and understand an unbounded number of novel sentences) is the BIG FACT that Chomsky noted in *Issues* was one of the phenomena that needed explanation. He noted that a G was a necessary condition for explaining this obvious behavioral fact. Without a function that relates S and Ms over an unbounded domain there is no possible hope of accounting for this evident creative capacity. My question is what kind of theory is a G in this context? I suggested that a Marr level 1 theory seems to fit the bill. Seen thus, we can ask what level 2 theories might look like. They would be theories that showed how to compute Gish properties in real time (for processing and production, say). So, Gs are level 1 theories and the DTC, for example, is a level 2 theory.

The main problem with Marr's division is not that we can't use it (I did), but that the leverage Marr got out of a level 1 theory in vision and cash registers was absent in syntax. Why? Because he was able to use already available theories for level 1. In the vision case, it is physical optics which relates actual physical magnitudes (luminescence, shape, parallax etc) to info on the retina. The problem becomes how to calculate these real magnitudes from retinal inputs (a standard inverse problem). In the cash register case we have arithmetic. It turns out that calculating prices is a simple mathematical problem with a recognizable arithmetical structure. Given this we can ask how a cash register does the calculation, given that we know what the calculation is.

None of this is true of language. Thus, Gs are not determined by physical magnitudes the way vision is and there is no interesting math that lies behind syntax (or if there is we haven't found it yet). We need to construct the level 1 theory from scratch and that's what syntacticians do. We show what kind of recursive procedure language uses. We argue that it is one that uses rules of a certain format and representations of a certain shape. The data we typically use is performance data (judgments), hopefully sanitized to remove many performance impediments (like memory constraints and attention issues). We assume that this data reflects an underlying mental system (or at least I do) that is causally responsible for the judgment data we collect. So we use some cleanish performance data to infer something about the structure of a level 1 theory.

Now if this is the practice, then we all know that it runs together level 1 and level 2 considerations. You cannot judge what you cannot parse. But that's life. We also recognize that delving more deeply into the details of performance might indicate that the level 1 theories we have might need refining (the representations we assume might not be the ones that real time parsing uses, the algorithms might not reflect the derivational complexity of the level 1 theory). Sure. But, and here I am speaking personally, the big payoff would be if the two lined up pretty closely. Syntactic representations might not be use-representations, but it would be surprising to me if they diverged radically. After all, if they did, then how come we pair the meanings we do with the sounds we do? If the latter fact is due to our G competence, then we must be parsing a G function in real time when we judge the way we do. Ditto with the DTC, which I personally believe we have abandoned too quickly. At any rate, because we don't enjoy "autonomous" level 1 theories as in vision and cash registers, our level 1 and 2 theories blend a bit more and the distinction is useful but should not be treated as a dualism. In fact, I take the Pietroski et al. work to demonstrate the utility of not taking the problem as simply finding pairings of S and Ms. How the system engages with other systems during performance can tell us something about the representational format of the system beyond what S,M pairings might.

Last point: I agree that one could imagine the physical scenario you imagine. I can imagine having syntax embodied someplace explicitly or implicitly without being usable. I can even imagine that what we know is in no way implicated in what we do. But I would find this very odd for our case and even odder for our practice given that what we do in practice is infer what we know by looking at what we do in a circumscribed set of doings. This does not imply that we should reduce linguistic knowledge to behavior, but it does seem to imply that our behavior exploits the knowledge we impute and that it is a useful guide to the structure of that knowledge. Once one makes that move, why are some bits of behavior more privileged than others IN PRINCIPLE? I can't see why. And if not, though the competence/performance distinction is useful I would hesitate to confer on it metaphysical substance. I would actually go a little further: as a regulative ideal we should assume strong transparency between level 1 and level 2 theories in linguistics, though this is not as obvious an assumption to make in the domain of cash registers and vision.

That said, I would bet that given enough time and beer we would discover that we are saying the same things in slightly different ways. I look forward to the occasion when we can test this bold hypothesis.

Isn’t the mapping of linguistically expressible meanings to syntactic structures a computational problem, in the Marr sense? (I don’t mean ‘mapping’ in any specific technical sense, I just mean that there’s a clearly defined computational problem here in the correlation of meaning with syntax, even in the absence of an objective independent characterization of meaning, or even of linguistically expressible meaning.)

I'm thinking that that is a high-level computational problem that could be stated abstractly enough for broad consensus, and then there could be radically different algorithmic proposals about the syntactic architecture.

I’m thinking of Generative Semantics, Jackendoff’s Parallel Architectures, and a whole range of different approaches to interpretive syntax, including some with lots of fine grained semantic categories and features (think subcategorization and triggered Merge, or cartographic topic and focus features, or Beghelli and Stowell's quantifier positions, or various syntactic approaches to lexical semantics), and others with as little semantics as possible (call it austere Minimalism).

So, though correct me if I’m wrong, David suggests there’s no algorithmic statement of the properties of the system which generates an infinitude of convergent syntactic structures (one way of stating the creativity problem); but a lot of syntactic argumentation is concerned with the interface with semantics and it seems to me that does introduce an algorithmic level.

Briefly, on Peter's final comment, because I need to think about Norbert's longer comment and the beach here in Australia is calling ... I think the mapping to thought may indeed be very similar to the mapping to pronunciation, and that presumably will be characterisable at the algorithmic level since it's more than a pairing (presumably it involves some kind of process that links the thought to whatever is the final step of the transduction from syntax to thought). So that bit, like processing a sensory percept, is very like the vision problem, and it makes sense to tackle it at the various Marrian levels. Not sure that the syntax which lives in the space between the transductions to meaning and form is so characterisable, though.