Gil Kalai’s blog

Chomskian Linguistics

Here is another little chapterette from my book. It follows a chapter based on discussions that followed a post by David Corfield on the n-Category Café. There, the following thought was raised: is there something analogous to Chomsky’s theory of language structure and language acquisition when it comes to mathematics? One interesting aspect is trying to understand “dyscalculia”, a term describing children’s learning disabilities in mathematics.

I remember from my youth a book in Hebrew called “Logic, Language and Method” by Yehoshua Bar-Hillel. Yehoshua Bar-Hillel was a philosopher at the Hebrew University of Jerusalem and, among other things, wrote together with Micha A. Perles and Eli Shamir some basic papers in the study of automata. Ten years ago I collaborated with his daughter Maya Bar-Hillel (who is a psychology professor at HU) in studying the “bible code“.

One fact I remember from Bar-Hillel’s book (used there to explain some basic notion of transformations) is the difference between English and Hebrew regarding the anti-missile missile. In English you say “anti missile missile”, “anti anti missile missile missile”, and “anti anti anti missile missile missile missile”, while in Hebrew it will be “missile anti missile”, “missile anti missile anti missile”, “missile anti missile anti missile anti missile”, etc.
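The two word orders can be captured by a toy generative rule: English prefixes “anti” n times to n+1 copies of “missile”, while the Hebrew form nests “missile anti” around an innermost “missile”. A minimal sketch in Python (the function names are mine, for illustration only):

```python
def english(n):
    # English order: "anti" repeated n times, then "missile" n+1 times
    return " ".join(["anti"] * n + ["missile"] * (n + 1))

def hebrew(n):
    # Hebrew order: noun first, modifier after; wrap "missile anti" n times
    phrase = "missile"
    for _ in range(n):
        phrase = "missile anti " + phrase
    return phrase

print(english(2))  # anti anti missile missile missile
print(hebrew(2))   # missile anti missile anti missile
```

Both functions describe the same underlying structure; only the surface word order (the “transformation”) differs.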

Apropos differences between languages, Rodica Simion (see her poem “Immigrant complex”) once told me that in English and most other languages she knew, you say “more or less”, but in Hebrew you say “less or more”. (When I asked around I was told that Greek is like Hebrew; anyway, I will be happy to learn how this saying goes in languages you know.)

These three paragraphs on Chomskian linguistics represent a subject where my knowledge is “second hand.” I wonder if it shows.

The Chomskian revolution in linguistics

The Chomskian revolution in linguistics comprises three elements. The first is finding common structures and formulating common rules that apply to all human languages (to a much greater extent than before). The second is relating linguistics to studying and making hypotheses about the way children acquire languages. And the third is studying mathematically very abstract forms of languages. Chomsky’s theory of generative grammar is important in all three aspects.

Chomsky’s perception and demonstration of the unifying concepts behind different languages have impacted the way languages are perceived by linguists and by philosophers, and dramatically changed the way linguistics is practiced. Chomsky saw a direct link between the way children acquire language and the internal structure and logic of languages. His works in this direction are regarded as part of the cognitive revolution in psychology. While emphasizing the universal rules behind different grammars, Chomsky also made a strong point regarding the uniqueness of the cognitive aspects of language as compared with other cognitive abilities. He had a famous debate with psychologist Jean Piaget on this subject. Chomsky’s mathematical works on formal languages and the related concept of “automaton” are now fundamental in theoretical computer science.
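The point about formal languages and automata can be illustrated with the textbook example from the Chomsky hierarchy: the language aⁿbⁿ cannot be recognized by any finite automaton, but a device with a single counter (a restricted pushdown automaton) recognizes it easily. A minimal sketch in Python (the function name is mine):

```python
def accepts_anbn(s):
    # One counter plays the role of the pushdown stack:
    # increment on 'a', decrement on 'b', and never allow
    # an 'a' after the first 'b' has been seen.
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:  # more b's than a's so far
                return False
        else:
            return False
    return count == 0

print(accepts_anbn("aaabbb"))  # True
print(accepts_anbn("aabbb"))   # False
```

No bounded amount of memory (no finite-state machine) can do this for all n, which is precisely why context-free grammars sit strictly above regular languages in the hierarchy.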

Chomsky is criticized for being too dominant in the area of linguistics and for leading to unmotivated sharp turns in his own theory. The decline of individual language studies is regarded by some as a negative side-effect of the Chomskian revolution. Others argue that without a major additional statistical ingredient, formal mathematical structures à la Chomsky’s generative grammar and “transformation rules” are insufficient for understanding the structure and acquisition of languages.

One of Chomsky’s arguments (I never read anything technical) for grammar being hardwired is that children learn from few examples and make interesting (but predictable) kinds of mistakes as they learn to speak. Has this ever been made rigorous? I am especially curious in light of advances in machine learning (and I should have asked some of my friends about this earlier, but I guess I’m doing this now).

I mostly agree, having studied Chomsky in my Computational Linguistics courses at Caltech and UMass. More recently, I wrote a 100+ page Ed.D. Dissertation Proposal on the use of Chaos Theory, Neurogenetics, and fMRI as a basis for a new science-based pedagogy. Working title: “Classrooms at the Edge of Chaos.” My examples were drawn from my readings in, battles against, and Federal grant proposals about dyscalculia in college, university, middle school, and high school. But then the joint UC Irvine/CSULA program dissolved, and I was told that I would need mandatory courses if I did the Ed.D. at CSULA alone. As an ex-Math professor, I considered this an insult. But this subject has been on my mind regularly since the late 1960s. Or aperiodically, anyway.

In Persian, the standard expression is “kam o bish”, which is “less or more”. The translation of the anti anti… missile would also follow the pattern of Hebrew. My guess is that in the languages in which the adjective typically follows the modified noun (French, Persian, Hebrew?) the construction would be similar to “missile anti missile”, whereas in English, German, etc., where the adjective precedes the noun, one uses something like “anti missile missile”.

Mastering the combination of refereed science publications, major publisher’s books, and popular magazines, Harvard triple professor Marc Hauser explained in great depth the Chomskian basis of evolutionary Ethics in “Moral Minds: How Nature Designed Our Universal Sense of Right and Wrong”, HarperCollins. A clear summary of the concepts, in interview format, is available on American news stands right now: “The Origin of Right and Wrong”, by Josie Glausiusz, Discover Presents The Brain, pp.5-8.
“… draw on an analogy with language and ask whether there might be something like a universal moral grammar, a set of principles that every human being is born with…. In linguistics, there is a lot of variation that we see in the expressed languages throughout the world. The real deep insight of Chomskian linguistics was to ask, ‘Might this variation at some level be explained by certain common principles of universal grammar?’ That allows, of course, for every language to have its own lexicon….”

“..draw on an analogy with language and ask whether there might be something like a universal moral grammar, a set of principles that every human being is born with…”

The human mind is a universal processor specially tailored to perceive/produce efficiency (economy), symmetry (reasoning by analogy is a symmetrization process), freedom (expressiveness), and order (conventions), and it applies its processing capabilities to any input, be it languages or moral rules. Several human minds acting in parallel upon a given input, by local innovations (driven by the above four criteria) and global imitation, can create such complex things. You can start with any set of moral rules; at the end you will reach similar results. But as happens with languages, both the conventions and the free choices can obscure hidden patterns.

Personally I don’t like the name “evolution” applied to the dynamics of human institutions; it is charged and misleading: “transmutation” would be a more appropriate name.

“… I don’t like the name ‘evolution’ applied to the dynamics of human institutions…”

To me, this is a mathematical question. IFF human institutions act functionally (dynamorphism?) as if what we observe expressed (phenotype) is translated from a linear string of tokens (genotype), and the strings are in a database whose operations are those of the GA, the Genetic Algorithm (selection with probability proportional to fitness; point mutation, crossover, inversion; deletion; elimination from the database to make room for new strings, with probability inversely proportional to fitness), then I say that “evolution by natural selection” is exactly what the human institutions are doing.

My biases include my connection to Holland’s book on the GA: John Holland, “Adaptation in Natural and Artificial Systems”, U. Michigan Press, 1975. I “beta tested” this book while in grad school at U.Mass., while taking Category Theory. I was the first to program the genetic algorithm over semantic spaces where the alleles were mathematical operators, not mere coefficients. This included, in 1976, my automatically evolving working APL programs. Any comment on Iverson’s programming languages, intended to make tensors easy, in the light of n-Categories?

Holland proved the big theorem, about the genetic algorithm being exponentially faster than other optimization methods. He did this by looking at the strings of alleles evolving in the superspace of strings over {allele + “don’t care”}^ChromosomeLength.

Each chromosome is both the intersection in hyperspace of hyperplanes with the right pattern, and the intersection in higher dimension of the patterns including “don’t care.”

The actual population at any time is merely a SAMPLE of the superspace.

The theorem proves that the fundamental equation

dP/dt ∝ fitness,

i.e. that the rate of growth of the fraction of the population carrying a given allele is proportional to the fitness of the samples containing that allele, ALSO operates, deeply in parallel, on the evolution of virtual populations in the superspace.
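For readers who have not seen it, the result referred to here is usually called Holland’s Schema Theorem. The following is my reconstruction in standard modern notation, not the commenter’s formulation:

```latex
% m(H, t) = number of strings matching schema H (a pattern over
%           {allele, "don't care"}) in the population at generation t,
% f(H)    = mean fitness of the strings matching H,
% \bar f(t) = mean fitness of the whole population,
% \delta(H) = defining length, o(H) = order (number of fixed positions),
% p_c, p_m  = crossover and mutation probabilities, \ell = string length.
E\bigl[m(H,\,t+1)\bigr] \;\ge\;
  m(H,\,t)\,\frac{f(H)}{\bar f(t)}
  \left[\,1 \;-\; p_c\,\frac{\delta(H)}{\ell-1} \;-\; o(H)\,p_m\right]
```

Short, low-order schemata of above-average fitness thus receive exponentially increasing numbers of trials over generations, which is the precise content of the “implicit parallelism” claim.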

It might be worth showing this Categorically! I thought so in grad school.

Since then, among other things, I’ve acquired profound experience in the dynamics of human institutions, including teaching a post-doc seminar in the subject, and publishing several papers at the International Conferences on Computational Analysis of Social and Organizational Systems (CASOS).

So, maybe, tentatively, “… I DO like the name ‘evolution’ applied to the dynamics of human institutions…”

Let me explain my comment. In the first point I will talk about reality. In the second point I talk about GAs (with which I was not familiar, but thanks to your comment I’ve read something about them today).

1. Evolution needs variation, transmission, expression and selection.

In biological systems, variations are RANDOM MUTATIONS or RANDOM CROSSOVERS, transmission is made through BIOLOGICAL HEREDITY (therefore intergenerational), expression is made by STABLE PHENOTYPIC effects in organisms, and selection happens through ACCIDENTAL ENCOUNTER with the environment (no organism but humans can predict which environment it will find).

In sociological systems, variations are FUNCTIONAL INNOVATIONS (thanks to science the agent usually knows how the system works and knows its criteria for selecting innovations: order, symmetry, efficiency, freedom; therefore functional innovations are far from random), transmission is made by PLANNED IMITATION (which can be done within the same generation), expression is made through the CHAOTIC PRODUCTS OF INSTITUTIONS (a price is a phenotypic expression of a market, which in itself is a set of rules; a novel is a phenotypic expression of a literary genre, which in itself is a set of rules), and the encounter with the environment is in general NOT ACCIDENTAL.

So although the process includes the same steps in both systems, the mechanisms involved in each step are completely different. At a higher level, biological evolution of organisms is much less flexible and more gradual: we do not see, within a single generation, a chicken become a lion. On the other hand, such radical changes are possible in institutions, and in complete societies, within days: think about the French Revolution and many other historical examples. This is why I suggest the name transmutation.

2. GA: from the few readings I’ve done, I consider the GA a poor model even of biological evolution, but it is a subject which, in its more abstract mathematical setting (stochastic processes, and in particular random walks on graphs), has interested me a lot for a while.
If I’m not wrong, state spaces can be modeled as complete Cayley graphs in the permutation encoding (hypercubes in the binary encoding), and populations are randomly selected vertices of such graphs which can be considered as processors. Each processor, sequentially applying genetic operators, generates sequences of states, each of which is sequentially evaluated to check how close it is to a solution (applying the fitness function). I couldn’t find the present state of knowledge about the behavior of the GA in the worst case (I assume it is exponential in some of its parameters, such as population size, i.e. number of processors, or number of iterations until it finds a solution). I can only say that, by the knowledge in my hand, the correct model of the dynamics of all the interesting systems we find in nature (atoms, cells, brain, and society) is by far not ruled by a stochastic process.
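The hypercube picture in the binary encoding can be made concrete: a point mutation is exactly one step of a random walk on the Boolean n-cube, since flipping a single bit moves the string to a vertex at Hamming distance 1. A small sketch in Python (the names are mine, for illustration):

```python
import random

def mutate_step(state):
    # One point mutation = one step on the Boolean hypercube:
    # flip a single coordinate, moving to an adjacent vertex.
    i = random.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]

def hamming(u, v):
    # Graph distance on the hypercube = number of differing coordinates
    return sum(a != b for a, b in zip(u, v))

v = (0, 0, 0, 0)
w = mutate_step(v)
print(hamming(v, w))  # 1: each mutation step reaches a neighbouring vertex
```

Under mutation alone the GA really is a random walk on this graph; selection and crossover are what break the pure-stochastic-process picture.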

“You … have 32 fewer years pondering the GA than I have.” Yes, you are right, and the respect is mutual.

In any case, finally, and to end the discussion (from my side), which might be annoying our host, I found the kind of complexity result I was looking for:
“Optimizing an arbitrary function is hard for the genetic algorithm” by Hart and Belew.
For me the conclusion which can be extracted from this result is that the GA (evolution?) is not such a good search strategy unless it is applied to STRUCTURED search spaces (or problems).

By the way, I found some interesting results regarding the convergence of quantum computing and genetic algorithms. There is no surprise in the convergence of both fields since, as is well known, quantum computing is nothing more than a generalization of stochastic computing.
As an instance of this convergence, from a 2008 paper (unfortunately I could only read the abstract), “Quantum model of genetic algorithm” by Wan Peng:
“Similarities of genetic algorithm and quantum algorithm have been analyzed based on implicit parallelism. The genetic algorithm has been described by quantum theory. Some genetic operators are seen as a kind of quantum operator. So it implies genetic algorithm is essentially a kind of quantum algorithm in the classical computer on the reduced order to achieve. It through genetic manipulation to input superposition states data into classical computer. In the evolutionary process, a lot of implicit modes have been measured, so as to realize the parallel search and processing in the model space. The uncertainty characteristic of genetic algorithm is the essential cause of implicit parallelism. This thinking could explain the immanent cause of artificial intelligence”

In Russian and Ukrainian it’s “more or less”: «более или менее» and «більш-менш», respectively. The “anti missile” thing in these languages sounds like in English but consists of two words regardless of the number of “anti” parts.

Those who don’t have the education to know what goes on in a Mathematician’s brain can get a glimpse of the human, social nature in common between a great Math mind and a dyscalculia sufferer. Abstract pursuits are by people with feelings and bodies. If you prick us, do we not bleed? Yes, consider a continuously bleeding spherical cow…

1/3 of those in the USA diagnosed with clinical dyscalculia (math anxiety or math disability) have their brains wired differently from the majority. But 2/3 of those with clinical dyscalculia simply had a bad teacher early in life who threw them off the path to enlightenment. It isn’t easy, but a good teacher can regress them (get them to unlearn the wrong stuff) and then proceed forwards again. This is what the dyscalculia literature shows. It is also what I experience 100% of the time as a Math professor and high school/middle school math teacher.