Commercial Intelligence: systems that know and understand and think and learn

This is pretty impressive work by Google!

They are seeing the objective behind the query. It’s pretty simple, in theory, to see the verb “read” operating on the object “string” with the source (i.e., “from”) being consistent with an input stream (also handling the concatenated compound).

More impressive is that they have learned, from such queries and from the content people view after issuing them, that character streams, scanners, and stream APIs are relevant.

And they have also narrowed my results based on how frequently I look at Java versus other implementation languages.

In preparing for some natural language generation[1], I came across some work on natural logic[2][3] and reasoning by textual entailment[4] (RTE) by Richard Bergmair in his PhD at Cambridge:

The work he describes overlaps our approach to robust inference from the deep, variable-precision semantics that result from linguistic analysis and disambiguation using the English Resource Grammar (ERG) and the Linguist™.

Mr. Bergmair’s semantic logic project has two components:

A Python platform for experimentation with semantics:
i.e., software for converting the minimal recursion semantics (MRS) produced using the ERG into first-order logic

A textual entailment engine using Monte Carlo techniques on the first-order predicate calculus (FOPC) produced above
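For illustration, the scope ambiguity that such a platform must manage can be made concrete with a toy enumerator that expands an underspecified list of quantifiers into fully scoped first-order readings (nothing here is from PyPES; all names are invented):

```python
from itertools import permutations

def enumerate_scopings(quantifiers, body):
    """Enumerate fully scoped readings of an underspecified semantic
    representation by trying every quantifier ordering."""
    readings = []
    for order in permutations(quantifiers):
        formula = body
        # wrap the body in quantifiers from innermost to outermost
        for q, var, restriction in reversed(order):
            if q == "forall":
                formula = f"{q}{var}.({restriction}({var}) -> {formula})"
            else:
                formula = f"{q}{var}.({restriction}({var}) & {formula})"
        readings.append(formula)
    return readings

# "Every mammal nurses some young": two quantifiers, hence two scopings
quants = [("forall", "m", "mammal"), ("exists", "y", "young")]
for r in enumerate_scopings(quants, "nurse(m,y)"):
    print(r)
```

With n quantifiers there are n! candidate scopings, which is why forcing a strictly recursive scoping when the input underdetermines it amounts to an arbitrary commitment.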

The following comments are particularly interesting:

With the scoping machinery and the first-order approximation in place, PyPES™ makes it possible to translate text into formulae of FOPC. This is what Boxer does for CCG and what Glue Semantics does for LFG.

…the main problem with Boxer and glue semantics is their strong commitment throughout to classical bivalent logic, which is limited in its ability to represent natural language semantics.

FOPC lacks some kinds of expressive power that are important for natural language, such as quantifiers like most as well as weakening and strengthening modifiers like very.
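The proportional quantifier “most” can be given a model-theoretic treatment outside FOPC; here is a toy generalized-quantifier evaluation over a finite domain (the facts are invented for illustration):

```python
def most(domain, restriction, scope):
    """Generalized quantifier: 'most' holds when more than half of the
    individuals satisfying the restriction also satisfy the scope.
    No combination of FOPC's forall/exists expresses this proportion."""
    restricted = [x for x in domain if restriction(x)]
    if not restricted:
        return False
    return sum(1 for x in restricted if scope(x)) > len(restricted) / 2

# "Most mammals nurse their young" over a toy domain of hypothetical facts
mammals = ["cat", "dog", "platypus", "echidna", "whale"]
nurses = {"cat", "dog", "whale"}
print(most(mammals, lambda x: True, lambda x: x in nurses))  # 3 of 5 -> True
```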

The straightforward logical encoding used by Boxer and glue semantics leads to over-commitment in some places, for example forcing strictly recursive quantifier scopings when little or nothing is known about the scopings from the natural language input.

Mr. Bergmair continues his comment regarding strict resolution of all scope ambiguities by citing the use of slacker semantics rather than resolving ambiguities of scope. Implicit in this statement is that the first component is not typically used to resolve all scopal ambiguities.

I should note that Mr. Bergmair does not discuss the resolution of grammatical ambiguity. For short sentences, as in most of his examples, grammatical ambiguity is low. But it is combinatorial and becomes significant for sentences over 10 words long. Presumably, Mr. Bergmair works from the best-ranking parse. However, the best-ranking parse rarely includes the intended semantics for sentences significantly more than 10 words in length.[8]

I am not surprised that Mr. Bergmair does not address grammatical ambiguity and deemphasizes resolution of scopal ambiguities, since doing so can be difficult without a user interface such as in the Linguist. Given an RTE focus, this is a reasonable position, since there is no human assistance as in a cognitive-computing approach. However, for sentences of moderate length or longer, ambiguity of grammar, let alone of logic, will produce combinatorial ambiguities of entailment that even a Monte Carlo approach may not be able to address. If so, then slacker semantics (i.e., heuristic reasoning from under-specified semantics) will have to be reduced significantly, perhaps close to the point of eliminating ambiguity.

This is precisely what we emphasize with the Linguist. Consequently, Mr. Bergmair’s approach and others we are pursuing will result in more robust, deep, and precise reasoning.

The title is in tribute to Raj Reddy’s classic talk about how it’s hard to wreck a nice beach.

I came across interesting work on higher order and semantic dependency parsing today:

So I gave the software a try for the sentence I discussed in this post. The results discussed below were somewhat disappointing but not unexpected. So I tried a well-known parser with similar results (also shown below).

There is no surprise here. Both parsers are marvels of machine learning and natural language processing technology. It’s just that understanding is far beyond the ken of even the best NLP. This may be obvious to some, but many are surprised given all the hype about Google and Watson and artificial intelligence or “deep learning” recently.

I was a little surprised that it missed several things, especially mistakes on the parts of speech of “mammals nursing”.

The semantics from FrameNet are obviously mistaken, but the mistake in what “to maturity” complements was also disappointing.

Of course, this is a statistical dependency parser, so the right parse might be somewhere down in the ranking. It just shows how hard it is to train such parsers.

The Stanford parser more or less defines the state of the broader art and has been thoroughly trained and tuned, so I thought it would not make such mistakes.

So I gave it a try on a simpler sentence after contemplating the results above.

I was even more surprised by this result, in which “mammals” is interpreted as a verb and “nurse” is interpreted as a noun!

And the interpretation of “young” is poor in each of the above results (it doesn’t make sense to “possess” an adjective).

The bottom line is that understanding English well enough to acquire useful logical knowledge requires a cognitive computing approach (i.e., both man and machine, not just the latter).

Consider the following disambiguation result from a user of Automata’s Linguist™.

Notice the following:

conflation is set to “all” in the rightmost drop-down list on the toolbar

rendering of elided arguments is suppressed since the “…” item on the toolbar is not toggled on

the pronoun “them” is specified as referencing “mammals” rather than “their young”

“to maturity” complements the conjunction of nurse and nurture

which is fine if that is the intent, but we assume it should complement only “nurture”

the existential symbol before a unary predicate for “maturity”

where a binary predicate for maturity having an “of” argument may be desirable

where that second argument would be specified as referencing “their young”

two rendering problems given that all conflation is enabled and ellipsis is not to be rendered

i.e., they should be treated as elided given they are not referenced

the variable ?e2 should occur a second time (in the “and” literal)

otherwise the head complemented by the “to” literal is unspecified

the existential quantifications of variables ?e9 and ?e21 should be suppressed

The rendering problems were reported to Automata and addressed as reflected in the concluding screen capture of this document.

To correct what “to maturity” complements, we either disambiguate again (using the appropriate tool item, all of which have tool-tips) or we re-parse the sentence, which results in the following dialog:

As this is the desired syntactic structure, we approve the dialog and continue disambiguation.

The following clauses are automatically selected as a result of our syntactic approval:

There being no further semantic disambiguation to be performed, we can change to “detailed” clauses using the appropriate drop-down list on the toolbar, or we can proceed to the derivations tab.

The derivations tab will show us what appears to be equivalent semantics.

The detailed clauses allow us to discriminate between these two derivations by selecting that “their” is a plural reference:[1]

After which a single derivation survives, as shown below:

The most obvious next steps are to resolve the pronouns, as in:

The next most obvious step is to specify “mammals” as having the outermost scope and being universally quantified. This is done by selecting “outermost” from the drop-down list when clicking on the “within” column in the row for the quantification of “Mammals” and selecting the universal quantifier from the drop-down list when clicking on the logic column of the same row.

It is probably the case that we want universal quantification over the young rather than to assert that for every mammal there exists some young that it nurses and nurtures.
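The two candidate scopings can be written out explicitly (the predicate names are illustrative, not the Linguist’s rendering):

```latex
% intended: every mammal nurses and nurtures all of its young
\forall m\,\bigl(\mathit{mammal}(m) \rightarrow
  \forall y\,(\mathit{young}(y,m) \rightarrow
    \mathit{nurse}(m,y) \wedge \mathit{nurture}(m,y))\bigr)

% weaker: every mammal nurses and nurtures some of its young
\forall m\,\bigl(\mathit{mammal}(m) \rightarrow
  \exists y\,(\mathit{young}(y,m) \wedge
    \mathit{nurse}(m,y) \wedge \mathit{nurture}(m,y))\bigr)
```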

The existential quantifications of the situational variables are implicit in the nurse and nurture literals, respectively, but are not rendered for brevity (since they are not co-referenced).

We can select either of the first two readings as desirable (the order of the adjacent existential quantifiers has no logical effect) since each existential should be within the scope of both universals.

After Automata addressed the rendering issue, the result is as follows:

[1] the singular sense of “their” can be eliminated from the lexicon entirely if more formal but less robust grammar is expected
Educational technology for personalized or adaptive learning is based on various types of pedagogical models.

Google provides the following info-box and link concerning pedagogical models:

As described in this paper, pedagogical models are cognitive models or theoretical constructs derived from knowledge acquisition models or views about cognition and knowledge, which form the basis for learning theory.

The term “knowledge graph” is particularly common, including Google’s (link).

There are some serious problems with these names and some varying limitations in their pedagogical efficacy. There is nothing cognitive about Declara’s graph, for example. Like the others it may organize “concepts” (i.e., nodes) by certain relationships (i.e., links) that pertain to learning, but none of these graphs purports to represent knowledge other than superficially for limited purposes.

Each of Google’s, Knewton’s, and Declara’s graphs is far from sufficient to represent the knowledge and cognitive skills expected of a masterful student.

Each of them is also far from sufficient to represent the knowledge of learning objectives and instructional techniques expected of a proficient instructor.

Nonetheless, such graphs are critical to active e-learning technology, even if they fall short of our ambitious hope to dramatically improve learning and learning outcomes.

The most critical ingredients of these so-called “knowledge” or “cognitive” graphs include the following:

learning objectives

educational resources, including instructional and formative or summative assessment items

relationships between educational resources and learning objectives (i.e., what they instruct and/or assess)

relationships between learning objectives (e.g., dependencies such as prerequisites)

The following user interface supports curation of the alignment of educational resources and learning objectives, for example:

And the following supports curation of the dependencies between learning objectives (as in a prerequisite graph):

Here is a presentation of similar dependencies from Khan Academy:

And here is a depiction of such dependencies in Pearson’s use of Knewton within MyLab (cited above):

Of course there is much more that belongs in a pedagogical model, but let’s look at the fundamentals and their limitations before diving too deeply.

Prerequisite Graphs

The link on Knewton recommendations cited above includes a graphic showing some of the learning objectives and their dependencies concerning arithmetic. The labels of these learning objectives include:

reading & writing whole numbers

adding & subtracting whole numbers

multiplying whole numbers

comparing & rounding whole numbers

dividing whole numbers

And more:

basics of fractions

exponents & roots

basics of mixed numbers

factors of whole numbers

But:

There is nothing in the “knowledge graph” that represents the semantics (i.e., meaning) of “number”, “fraction”, “exponent”, or “root”.

There is nothing in the “knowledge graph” that represents what distinguishes whole from mixed numbers (or even that fractions are numbers).

There is nothing in the “knowledge graph” that represents what it means to “read”, “write”, “multiply”, “compare”, “round”, or “divide”.

Graphs, Ontology, Logic, and Knowledge

Because systems with knowledge or cognitive graphs lack such representation, they suffer from several problems, including the following, which are of immediate concern:

dependencies between learning objectives must be explicitly established by people, thereby increasing the time, effort, and cost of developing active learning solutions, or

dependencies between learning objectives that are not explicitly established must be induced from evidence, which requires exponentially more data as the number of learning objectives grows, thereby offering low initial and asymptotic efficacy versus more intelligent, knowledge-based approaches

For example, more advanced semantic technology standards (e.g., OWL and/or SBVR or defeasible modal logic) can represent that digits are integers are numbers and an integer divided by another is a fraction. Consequently, a knowledge-based system can infer that learning objectives involving fractions depend on some learning objectives involving integers. Such deductions can inform machine learning such that better dependencies are induced (initially and asymptotically) and can increase the return on investment of human intelligence in a cognitive computing approach to pedagogical modeling.
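A minimal sketch of the kind of inference just described, with toy axioms standing in for what OWL or SBVR would express (all concept names and objective labels are illustrative):

```python
# subclass axioms: a digit is an integer; an integer is a number
subclass_of = {"digit": "integer", "integer": "number", "fraction": "number"}

def subsumers(concept):
    """Concepts that the given concept is a kind of (transitive closure)."""
    out = {concept}
    while concept in subclass_of:
        concept = subclass_of[concept]
        out.add(concept)
    return out

# definitional axiom: a fraction is one integer divided by another
defined_using = {"fraction": {"integer"}}

def depends_on(a, b):
    """Concept a depends on concept b if a's definition uses b
    or anything b is a kind of."""
    return bool(defined_using.get(a, set()) & subsumers(b))

# each learning objective is annotated with the concepts it involves
objectives = {
    "basics of fractions": {"fraction"},
    "dividing whole numbers": {"integer"},
}

def inferred_prereqs(objectives):
    """Infer objective-level prerequisites from concept-level dependencies."""
    return {(a, b)
            for a, ca in objectives.items()
            for b, cb in objectives.items()
            if a != b and any(depends_on(x, y) for x in ca for y in cb)}

print(inferred_prereqs(objectives))
# {('basics of fractions', 'dividing whole numbers')}
```

Because “fraction” is defined using “integer”, the fractions objective is inferred to depend on the whole-number division objective with no human ever asserting that edge.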

As another example, consider that adaptive educational technology either knows or does not know that multiplying one integer by another is equivalent to adding one of them to itself the other number of times. Similarly, it either knows or does not know how multiplication and division are related. How effectively can it improve learning if it does not know? How much more work is required to get such systems to achieve acceptable efficacy without such knowledge? Would you want your child instructed by someone incapable of understanding and applying such knowledge?

Semantics of Concepts, Cognitive Skills, and Learning Objectives

Consider again the labels of the nodes in Knewton/Pearson’s prerequisite graph listed above. Notice that:

the first group of labels are all sentences while the second group are all noun phrases

the first group (of sentences) are cognitive skills more than they are learning objectives

i.e., they don’t specify a degree of proficiency, although one may be implicit with regard to the educational resources aligned with those sentences

the second group (of noun phrases) refer to concepts (or, implicitly, sentences that begin with “understanding”)

the second group (of noun phrases) beginning with “basics” are unclear as learning objectives or as references to concepts

For adaptive educational technology that does not “know” what these labels mean nor anything about the meanings of the words that occur in them, the issues noted above may not seem important but they clearly limit the utility and efficacy of such graphs.

Taking a cognitive computing approach, human intelligence helps artificial intelligence understand these sentences and phrases deeply and precisely. A cognitive computing approach also results in artificial intelligence that deeply and precisely understands many additional sentences of knowledge that don’t fit into such graphs.

For example, the system comes to know that reading and writing whole numbers is a conjunction of finer-grained learning objectives and that, in general, reading is a prerequisite to writing. It comes to know that whole numbers are non-negative integers which are typically positive. It comes to know that subtraction is the inverse of addition (which implies some dependency relationship between addition and subtraction). In order to understand exponents, the system is told and learns about raising numbers to powers and about what it means to square a number. The system is told and learns about roots and how they relate to exponents and powers, including how square roots relate to squaring numbers. The system is told that a mixed number is an integer and a proper fraction corresponding to an improper fraction.

Adaptive educational technology either understands such things or it does not. If it does not, human beings will have to work much harder to achieve a system with a given level of efficacy and subsequent machine learning will take a longer time to reach a lower asymptote of efficacy.

Creativity is just connecting things. When you ask creative people how they did something, they feel a little guilty because they didn’t really do it, they just saw something. It seemed obvious to them after a while. That’s because they were able to connect experiences they’ve had and synthesize new things. And the reason they were able to do that was that they’ve had more experiences or they have thought more about their experiences than other people. Unfortunately, that’s too rare a commodity. A lot of people in our industry haven’t had very diverse experiences. So they don’t have enough dots to connect, and they end up with very linear solutions without a broad perspective on the problem. The broader one’s understanding of the human experience, the better design we will have.

As it turns out, my colleagues at Knowmatters worked extensively with Jobs. I only pursued the quote further because of the stories they have told.

The interview/article states:

Mr. Thiel spends much of his time agitating to change how we educate people and create economic and technological growth. In his book “Zero to One,” written with Blake Masters, Mr. Thiel argues that society has become too rule-oriented, and people need to devise ways to think differently, and find like-minded individuals to realize goals.

And quotes him:

We’ve built a country in which people are tracked, from kindergarten to graduate school, and everyone who is “successful” acts the same way. That is overrated. It distorts things and hurts growth.

This is the “one size fits all” approach to standardized education. Personalized e-learning promises to disrupt this. The resulting creativity and growth is aspirational for Knowmatters.

Thanks to John Sowa’s comment on LinkedIn for this link which, although slightly dated, contains the following:

In August, I had the chance to speak with Peter Norvig, Director of Google Research, and asked him if he thought that techniques like deep learning could ever solve complicated tasks that are more characteristic of human intelligence, like understanding stories, which is something Norvig used to work on in the nineteen-eighties. Back then, Norvig had written a brilliant review of the previous work on getting machines to understand stories, and fully endorsed an approach that built on classical “symbol-manipulation” techniques. Norvig’s group is now working with Hinton, and Norvig is clearly very interested in seeing what Hinton could come up with. But even Norvig didn’t see how you could build a machine that could understand stories using deep learning alone.

But in his keynote speech on Monday, Oren Etzioni, a prominent computer scientist and chief executive of the recently created Allen Institute for Artificial Intelligence, delivered a call to arms to the assembled data mavens. Don’t be overly influenced, Mr. Etzioni warned, by the “big data tidal wave,” with its emphasis on mining large data sets for correlations, inferences and predictions. The big data approach, he said during his talk and in an interview later, is brimming with short-term commercial opportunity, but he said scientists should set their sights further. “It might be fine if you want to target ads and generate product recommendations,” he said, “but it’s not common sense knowledge.”

The “big” in big data tends to get all the attention, Mr. Etzioni said, but thorny problems often reside in a seemingly simple sentence or two. He showed the sentence: “The large ball crashed right through the table because it was made of Styrofoam.” He asked, What was made of Styrofoam? The large ball? Or the table? The table, humans will invariably answer. But the question is a conundrum for a software program, Mr. Etzioni explained.

Instead, at the Allen Institute, financed by Microsoft co-founder Paul Allen, Mr. Etzioni is leading a growing team of 30 researchers that is working on systems that move from data to knowledge to theories, and then can reason. The test, he said, is: “Does it combine things it knows to draw conclusions?” This is the step from correlation, probabilities and prediction to a computer system that can understand.

This is a significant statement from one of the best people in fact extraction on the planet!

As you know from elsewhere on this blog, I’ve been involved with the precursor to the AIAI (Vulcan’s Project Halo) and am a fan of Watson. But Watson is the best example of what Big Data, Deep Learning, fact extraction, and textual entailment aren’t even close to:

During a Final Jeopardy! segment that included the “U.S. Cities” category, the clue was: “Its largest airport was named for a World War II hero; its second-largest, for a World War II battle.”

Sure, you can rationalize these things and hope that someday the machine will not need reliable knowledge (or that it will induce enough information with enough certainty). IBM does a lot of this (e.g., see the source of the quotes above). That day may come, but it will happen a lot sooner with curated knowledge.

I found the following video in a recent post by Steve DeAngelis of Enterra Solutions:

It’s a bit too far towards the singularity/general AI end of the spectrum for me, but nicely done and fun for many not in the field, perhaps:

Enterra is an interesting company, too, FYI. They are in cognitive computing with a bunch of people formerly of Inference Corporation, where I was Chief Scientist. Doug Lenat of Cycorp was one of our scientific advisors. Interestingly enough, Enterra uses Cyc!

]]>2paul@haleyAI.comhttp://www.haleyAI.comhttp://haleyai.com/wordpress/?p=7612014-08-26T20:48:38Z2014-08-05T23:14:15ZThe Andes Physics Tutor [1,2] is a nice piece of work. Looking at the data from the Spring 2010 physics course at the US Naval Academy [3] there are obvious challenges facing the current generation of adaptive education vendors.

Most of the modeling going on today is limited to questions where a learner gets the answer right or wrong with nothing in between. Most assessment systems use such simple “dichotomous” models based on item response theory (IRT). IRT models range in sophistication based on the number of “logistic parameters” that the models use to describe assessment items. IRT models also come in many variations that address graded responses (e.g., Likert scales [4]) or partial credit and onto multi-dimensional and/or multi-level/hierarchical and/or longitudinal models. I am not aware of any model that addresses the richness of pedagogical data available from the physics tutor, however.

Having spent too long working through too much of the research literature in the area, a summary path through all this may be helpful… There are 1, 2, and 3 parameter logistic IRT models that characterize assessment items in terms of their difficulty, discrimination, and guess-ability. These are easiest to understand initially in the context of an assessment item that assesses a single cognitive skill by scoring a learner’s performance as passing or failing on the item. Wikipedia does a good job of illustrating how these three numbers describe the cumulative distribution function of the probability that a learner will pass as his or her skill increases [5]. A 1 parameter logistic (1PL) model describes an assessment item only in terms of a threshold of skill at which half the population falls on either side. The 2PL IRT model also considers the steepness of the curve. The steeper it is, the more precisely the assessment item discriminates between above and below average levels of skill. And the 3PL model takes into consideration the odds that a passing result is a matter of luck, such as guessing the right answer to a multiple choice question.
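The three logistic parameters can be made concrete with a small sketch (the function name and numbers are mine, not from any IRT library):

```python
import math

def p_correct(theta, a=1.0, b=0.0, c=0.0):
    """3PL IRT: probability that a learner of ability theta answers an
    item correctly, given discrimination a, difficulty b, and guessing
    floor c. Setting c=0 gives 2PL; additionally fixing a=1 gives 1PL."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# at theta == b, a guess-free item is passed exactly half the time
print(p_correct(0.0, a=1.5, b=0.0, c=0.0))    # 0.5
# a 4-option multiple-choice item might use c = 0.25; even a very weak
# learner stays near that guessing floor rather than dropping to zero
print(p_correct(-3.0, a=1.5, b=0.0, c=0.25))
```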

In the worlds of standardized testing and educational technology, especially with regard to personalized learning (as in adaptive assessment, curriculum sequencing, and intelligent tutoring), multiple choice and fill-in-the-blank questions dominate because grading can be automated. It is somewhat obvious then that 3PL IRT is the appropriate model for standardized testing (which is all multiple choice) and a large fraction of personalized learning technology. Fill-in-the-blank questions are sometimes less vulnerable to guessing, in which case 2PL may suffice. You may be surprised to learn, however, that even though 3PL is strongly indicated, its use is not pervasive because estimating the parameters of a 3PL model is mathematically and algorithmically sophisticated as well as computationally demanding. Nonetheless, algorithmic advances in Bayesian inference have matured over the last decade such that 3PL should become much more pervasive in the next few years. (To their credit, for example, Knewton appears to use such techniques [6].)
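As a toy illustration of why 3PL estimation is non-trivial yet tractable, here is a grid-based, flat-prior posterior over a single item’s difficulty, holding the other two parameters fixed at their true values; real estimators infer all three parameters jointly, typically with MCMC or EM (everything below is simulated, not real assessment data):

```python
import math
import random

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# simulate responses to one item from learners of known ability
random.seed(0)
a_true, b_true, c_true = 1.2, 0.5, 0.2
learners = [random.gauss(0, 1) for _ in range(2000)]
responses = [random.random() < p3pl(th, a_true, b_true, c_true)
             for th in learners]

# grid posterior over difficulty b (flat prior: posterior ~ likelihood)
grid = [i / 10 for i in range(-30, 31)]
log_post = []
for b in grid:
    ll = sum(math.log(p3pl(th, a_true, b, c_true)) if r
             else math.log(1 - p3pl(th, a_true, b, c_true))
             for th, r in zip(learners, responses))
    log_post.append(ll)
b_hat = grid[log_post.index(max(log_post))]
print(b_hat)  # should land near b_true = 0.5
```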

It gets worse, though. Even if an ed-tech vendor is sophisticated enough to employ 3PL IRT, they are far less likely to model assessment items that involve multiple or hierarchical skills and assessments other than right or wrong. And there’s more beyond these complications, but let’s pause to consider two of these for a moment. In solving a word problem, such as in physics or math, for example, a learner needs linguistic skills to read and comprehend the problem before going on to formulate and solve the problem. These are different skills. They could be modeled coarsely, such as a skill for reading comprehension, a skill for formulating equations, and a skill for solving equations, but conflating these skills into what IRT folks sometimes call a single latent trait is behind the state of the art.

Today, multi-dimensional IRT is hot given recent advances in Bayesian methods. Having said that, it’s worth noting that multiple skills have been on experts’ minds for a long time (e.g., [7]) and have been prevalent in higher-end educational technology, such as intelligent tutoring systems (aka, cognitive tutors), for years. These issues are prevalent in almost all the educational data available at PLSC’s DataShop [3]. Unfortunately, the need to associate multiple skills with assessment items exacerbates a primary obstacle to broader deployment of better personalized learning solutions: pedagogical modeling. One key aspect of a pedagogical model is the relationship between assessment items and the cognitive skills they require (and assess). Given such information, multi-dimensional IRT can be employed, but even articulating a single learning objective per assessment item and normalizing those learning objectives over thousands of assessment items is a major component of the cost of developing curriculum sequencing solutions. (We’ll be announcing progress on this problem “soon”.)
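The item-to-skill relationships just described are often recorded as a Q-matrix; a minimal compensatory multidimensional 2PL sketch (the items, skill names, and numbers are invented for illustration):

```python
import math

# a Q-matrix marks which skills each item requires (1 = required)
skills = ["reading comprehension", "formulating equations", "solving equations"]
q_matrix = {
    "word problem 1": [1, 1, 1],   # needs all three skills
    "solve x+2=5":    [0, 0, 1],   # pure equation solving
}

def p_correct_mirt(thetas, item, difficulty=0.0):
    """Compensatory multidimensional 2PL: abilities on the item's
    required skills are summed before the logistic is applied."""
    z = sum(t for t, needed in zip(thetas, q_matrix[item]) if needed)
    return 1 / (1 + math.exp(-(z - difficulty)))

strong_reader = [2.0, -1.0, -1.0]  # reads well, weak at algebra
print(p_correct_mirt(strong_reader, "word problem 1"))  # sum 0.0 -> 0.5
print(p_correct_mirt(strong_reader, "solve x+2=5"))     # sum -1.0 -> ~0.27
```

A unidimensional model would assign this learner one middling trait and miss that the two items probe very different skill profiles.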

In addition to multi-dimensional IRT, which promises more cognitive tutors, there are other aspects of modeling assessment, such as in hierarchical or multi-level IRT. Although the terms hierarchical and multi-level are sometimes used interchangeably with respect to models of assessment, we are more comfortable with the former being with respect to skills and the latter with regard to items. A hierarchical model is similar to a multi-dimensional model in that an item involves multiple skills but most typically where those skills have some taxonomic relationship. A multi-level model allows for shared structure or focus between assessment items, including multiple questions concerning a common passage or scenario, as well as drill-down items. All of the issues discussed in this paragraph are prevalent in the Andes Physics Tutor data. Many other data models available at PLSC’s DataShop also involve hierarchical organization of multiple skills (aka, “knowledge components”).

And we have yet to address other critical aspects of a robust model of assessment! For example, we have not considered how the time taken to perform an assessment reflects on a learner’s skill, nor graded responses or grades other than pass/fail (i.e., polytomous vs. dichotomous models). The former is available in the data (along with multiple attempts, hints, and explanations that we have not touched on). The latter remains largely unaddressed despite being technically straightforward (albeit somewhat sophisticated). All of these are important, so put them on your assessment or pedagogical modeling and personalized learning checklist and stay posted!