What AI Can Learn From Romeo & Juliet

The story so far.

When someone talks about “AI” today, they are referring to one particular type of AI: multi-layer neural nets trained on big data to recognize patterns. These so-called “deep learning” algorithms are great at learning more or less the same sort of stimulus/response functionality that our right brain hemispheres carry out – what Daniel Kahneman calls “thinking fast”. This is also what the entire brains of most animals do. So a better name for them might be AAI’s, for Artificial Animal Intelligences. In my last Forbes article (Not Good As Gold: Today's AI's Are Dangerously Lacking In AU (Artificial Understanding)) I argued that almost all of today’s AI’s have little or no left brain function – logical, causal, “thinking slowly”. Homo sapiens pays a huge price for having an over-sized bicameral brain (high birthing pain and risk), but upon reflection it’s worth it – in particular, you or I couldn’t perform such reflection without it! Our ability to generate alternative possibilities, and to rationally construct and weigh arguments for and against each proposition, is clearly worth the price – at least for decisions that are important to us.

Some of the best AI systems today have already taken one small step in the right direction: they combine right-brain machine learning with some sort of left-brain symbolic representation of knowledge (typically something like a triple-store or knowledge graph) and an inference engine that can mechanically produce some conclusions from those abstract symbolic representations. Just as you are doing right now, as you read the words in this article.

Most of these “steps in the right direction” today are just baby steps.

In this article I will talk about several of them, including the one (CYC) that isn’t just a baby step.

In order to make the contrasts clear, let’s pick something which is familiar to all of us, something which is easy for us to understand and yet quite hard for AI’s to understand: the plot of Shakespeare’s Romeo & Juliet.

Imagine that someone has just seen and largely understood Romeo & Juliet (“R&J”). There are now an astronomical number of straightforward plot questions about it that he/she can answer. By “straightforward” here I mean objective questions that don’t involve metaphors, symbolism, rising and falling action in a five-act tragedy, etc. Straightforward questions like “Why did Romeo kill himself?” and “When Juliet swallowed that potion, how did she expect Romeo to react and why?”

In other words, each such question should be unambiguous and should have an undisputed, noncontroversial correct answer – an answer that probably hasn’t changed at all in four centuries.

So, with R&J as our Drosophila, what can an Artificial Intelligence system understand from it?

There are three qualitatively different types or levels of AI understanding:

Machine Learning (and other statistical techniques that operate on the text of the play),

Limited Logics (the aforementioned baby steps in the right direction), and

Higher Order Logics (these can do anything, but the problem historically has been: Can they do anything fast enough to be usable?)

Let’s now discuss these one by one, and see how each increasing depth of representation and understanding enables a whole new universe of questions to be answered.

Level 1 Understanding: Machine Learning.

Various pattern-finding algorithms and text-processing techniques can indeed successfully answer some questions about R&J by processing the English text itself; for example, questions like these:

What are the names of all the characters in Romeo & Juliet? (This can be done by using well-understood techniques of named-entity recognition.)

Who knew the Friar? (using well-understood techniques of link analysis)

How did Romeo feel about Juliet in Act II? (using well-understood techniques of sentiment analysis)

During which scenes did anyone wear a mask? (using well-understood algorithms involving thesauri of synonyms and string searching)

What are the various different meanings that the word “Capulet” has? (using well-understood techniques of latent semantic analysis and indexing)
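As a sanity check on how shallow (yet useful) these techniques are, here is a minimal sketch of the mask question answered by nothing but string searching plus a tiny thesaurus of synonyms; the scene snippets and synonym list below are illustrative stand-ins, not a real corpus:

```python
# Level-1 "understanding": plain string search over (snippets of) the play's
# text, with a small synonym list standing in for a thesaurus.
scenes = {
    "Act I, Scene 4": "Give me a case to put my visage in: a visor for a visor!",
    "Act I, Scene 5": "This, by his voice, should be a Montague.",
    "Act II, Scene 2": "But soft, what light through yonder window breaks?",
}

mask_synonyms = ["mask", "visor", "visage"]

def scenes_mentioning_mask(scenes):
    """Return the scenes whose text contains any synonym of 'mask'."""
    return [scene for scene, text in scenes.items()
            if any(word in text.lower() for word in mask_synonyms)]

print(scenes_mentioning_mask(scenes))
```

Note what the sketch actually answers: which scenes *mention* a mask-word, not who actually wore one – exactly the gap between text statistics and understanding discussed below.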

Generally these answers, even when correct, cannot be explained or justified by the AI software that came up with them. Alternatively, you could say that their “justification” is no more and no less than statistical – about as deep as a rat's “understanding” of which behaviors earn it a reward and which behaviors don't.

Another way to think of this is to imagine someone just typing or speaking the whole question to Google or to IBM’s Watson. This works well in cases where there already exists at least one (indexed) web-page that contains many or all of the words and phrases in the question, and also the answer – hopefully near each other on that same page. Even then, one still needs to hope that the search engine can find and return that page as the top hit (or one of the first few hits). That is generally helped enormously by making the question (and its answer) as specific as possible, a technique which appears to most human observers to make the problem harder (which it would be for people, but not for text-searching algorithms).

Despite all those qualifications, this shallow approach sometimes works marvelously well – we all witnessed in 2011 how perfect a match this is to Jeopardy, a game show whose answer-and-question pairs almost always do appear on a single web-page.

That approach also makes current web-searching seem smarter than it really is: it finds hits which are close enough, or good enough, or independently interesting enough, to obviate the need to answer the precise question we were originally trying to ask. If you ask Google, e.g., who the prime minister of the UK was when Theresa May was born, you get a million hits that don’t mention the answer (Anthony Eden) but are likely to distract you; and if you really care about the answer, you've just been told a million times what her date of birth was, and you can frame a second Google query using that piece of information (“Who was the prime minister of the UK in October of 1956”) — but notice that you are the one who has to perform that one step of logical reasoning, you are the one who has to understand enough to frame that second query. Google has done a great job of training you and me not to expect even one step of arithmetic or logical inference from it!

Google being asked who the PM of the UK was, when Theresa May was born. Over a million hits all tell us when she was born, but don't mention the actual answer to the question (Anthony Eden.)

Google (July 2, 2019)

This sort of “understanding” is powered by statistics, and is no more or less than what Machine Learning should be able to do if one were to feed it the entirety of the internet as training data. I put “understanding” in quotes, there, for many reasons: Making you do the logical work, for instance, as in the Theresa May question, above. Or not understanding sarcasm. Or not understanding metaphor and analogy. Or not understanding most forms of negation: two things that commonly co-occur near each other on lots of web-pages will be linked – or a whole triple will be asserted – even if most of those occurrences are negations denying that very relationship. Sometimes a negation is clearly marked with a word like “not”, but quite often the proposition that "P is false" is communicated in a way that today’s state of the art NLU (natural language understanding) systems fail to recognize; e.g., “You might think P. But you would be wrong.”

Text processing and statistical machine learning are enormously cost-effective and omnipresent — they power our Amazon and Google searches, our Siri and Alexa interactions, and they determine which ads we’re exposed to. All this is so well documented and so well known to all of us that I’m not going to talk more about Level 1 Understanding here. Instead, let’s move on to the next deeper level, which is made possible by representing the meaning of a piece of text as a set of logical statements in a formal language and then having an AI algorithm automatically perform some sort of mechanical inference or theorem-proving on that set of logical statements.

Level 2 Understanding: Limited Logics.

Some of the most sophisticated AI programs today go beyond Machine Learning, go beyond statistics, and capture some of the meaning of a piece of text. The meaning is represented in some symbolic structure – for example, Knowledge Graphs, Triple- and Quad-Stores, OWL ontologies, OWL Lite, Description Logics (OWL DL), Property Graphs, Propositional Logic, Bayesian Networks, First Order Logic (First Order Predicate Calculus, Common Logic, OWL-Full), SAT inequalities, etc. When I say that these each “go beyond” ML, here, what I mean is that they enable that AI program to answer questions (e.g., R&J questions) that ML currently fails at, answers that Google searching can’t return because the question and answer don't already appear anywhere on any existing single web-page. I.e., these require one or two or more steps of logical reasoning combining what’s said on existing web-pages. Let’s examine what all these Limited Logics have in common, and then talk a bit about what separates them.

What they have in common. When statements in Romeo & Juliet are represented in one of these formalisms, there is a range of questions whose answers can then be automatically produced by running some sort of inference engine that algorithmically generates logical entailments of the statements expressed in that logical language. In other words, such a system is able to correctly answer some types of queries once they have been represented in that logical language.

To illustrate what we mean by automatically inferring something, let’s consider an example even simpler than anything in R&J. Suppose that one of these Limited Logic systems contains – i.e., represents – these two simple statements, S1 and S2:

S1: likes(Fred, Joe) OR likes(Fred, Sam) – Fred likes Joe or Fred likes Sam.

S2: NOT likes(Fred, Sam) – Fred does not like Sam.

Then almost all the Limited Logic systems would be able to quickly conclude:

S3: likes(Fred, Joe) meaning that Fred likes Joe.

Equivalently, if you asked any software application performing that type of Limited Logic whether or not Fred likes Joe, it should and in most cases would quickly deduce the right answer “Yes” – it would deduce the conclusion likes(Fred, Joe). How does it do that? Probably it starts out by negating that desired conclusion and then logically deriving FALSE. Since FALSE cannot be true, the negated conclusion must not hold; and if NOT P can’t be true, then P must be true. Based on that proof (what mathematicians call an argument by reductio ad absurdum or RAA), the Limited Logic AI is able to answer your question affirmatively and – unlike the Level 1 Text Processing AI’s above – such an application could also tell you why; it could produce the logical version of the English sentence: “Yes, because I know that Fred likes Joe or Sam, and Fred does not like Sam, so therefore Fred must like Joe.”
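To make the RAA strategy concrete, here is a toy propositional resolution prover – a sketch of the kind of inference engine just described, not any real system; clauses are sets of string literals, and the search is deliberately naive:

```python
# Toy resolution prover illustrating proof by reductio ad absurdum (RAA).
# A clause is a frozenset of literals; "~" marks a negated literal.
from itertools import combinations

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """All resolvents of two clauses."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def entails(kb, query):
    """Negate the query, add it to the knowledge base, and search for the
    empty clause (FALSE) via resolution -- i.e., argue by RAA."""
    clauses = set(kb) | {frozenset([negate(query)])}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:            # empty clause: contradiction derived
                    return True
                new.add(r)
        if new <= clauses:           # nothing new: no proof was found
            return False
        clauses |= new

# S1: Fred likes Joe or Sam.  S2: Fred does not like Sam.
kb = [frozenset(["likes(Fred,Joe)", "likes(Fred,Sam)"]),
      frozenset(["~likes(Fred,Sam)"])]
print(entails(kb, "likes(Fred,Joe)"))   # True
```

The prover adds NOT likes(Fred, Joe) to the knowledge base, resolves it against S1 and S2, and derives the empty clause – exactly the one-step RAA argument described above.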

These systems are often quite good at generating natural-sounding English explanations, even though they aren’t able to understand similarly-complex English input. This often confuses users of digital assistants like Siri and Alexa, since in human beings one’s natural language understanding (NLU) ability develops as fast and as well as one’s natural language generation (NLG) ability; in fact usually a person’s NLU surpasses their NLG. But human-level automated natural language understanding is beyond the state of the art of AI today, whereas human-level automated natural language generation from logic is relatively straightforward. As a result, many of today’s Limited Logic systems could not only give you the right answer but could produce a readable English justification for it.

You can think of Level 2 logic as similar to what we all did in high school Plane Geometry class, where we had one or more “givens” and the proof we had to produce – by hand – was not just one step long (as in the Fred-likes-Joe case, above), it was usually several steps long.

One example of that is the following problem: The student is given (in step 1) the fact that there is a triangle RST whose two legs RT and RS are equal in length, and the student’s task is to produce a step-by-step proof that angle T must equal angle S. The first thing they do, in step 2, is to negate the desired conclusion, asserting instead that angle S does not equal angle T. In each subsequent step, they are allowed to invoke some universally true axiom and/or some of the already-derived intermediate conclusions (“lemmas”). That happens in steps 3 and 4, and then (in step 5) a contradiction is derived. So, by reductio ad absurdum the negation must be false, and hence angle T must be equal to angle S:

Proving a theorem in plane geometry. Each step is a known axiom, or follows from the earlier ones by applying a valid rule of inference such as modus ponens. Statement 2, though, is the NEGATION of what we want to prove. When we eventually derive a contradiction, statement 2 must be false – so the theorem is proved.

Kwiznet 2019.

In a more typical, less trivial, high school geometry problem, there are usually several “givens” and several “to-be-proven” objectives – for example this one with 7 givens and 3 to-be-provens:

Here is a plane geometry problem where there are several "givens" and several "to be provens".

Kwiznet 2019.

In a realistic knowledge-based AI application today there might be millions or even billions of axioms and “givens” or premises – statements in the knowledge base, each one represented declaratively (i.e., as a sentence in that Limited Logic language) – and each question being asked of the application would be answered by the AI searching and finding a step-by-step proof that might be tens or hundreds or even thousands of steps long.

What separates them. I listed about a dozen different flavors of Limited Logics (and limited representations), above, and I have talked about them as if they are all one monolithic type of thing. They all do indeed have a lot in common, so what makes them different from each other?

Each flavor of Limited Logic allows various sorts of features to be in the statements, and forbids everything else; each allows only certain types of inferences to be drawn, etc. Some such distinguishing features are:

Some Limited Logics allow statements to contain function symbols which, when used in a statement, stand for the value of that function applied to its arguments – e.g., PresidentOf(USA) denotes the president of the United States, and SpouseOf(Fred) denotes the person who is Fred’s spouse, and therefore this statement says that the spouse of the American president knows the spouse of the Chinese president: knows(SpouseOf(PresidentOf(USA)), SpouseOf(PresidentOf(China)))
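A sketch of how such nested functional terms can be mechanically reduced to the individuals they denote, assuming a toy interpretation table (the placeholder individual names are invented for illustration):

```python
# Nested functional terms, represented as tuples like ("PresidentOf", "USA").
# The interpretation table maps (function, argument) pairs to individuals;
# the names here are hypothetical placeholders, not a real knowledge base.
interpretation = {
    ("PresidentOf", "USA"): "PresidentUSA",
    ("PresidentOf", "China"): "PresidentChina",
    ("SpouseOf", "PresidentUSA"): "FirstSpouseUSA",
    ("SpouseOf", "PresidentChina"): "FirstSpouseChina",
}

def evaluate(term):
    """Reduce a (possibly nested) functional term to the individual it denotes."""
    if isinstance(term, str):
        return term                        # a constant denotes itself
    fn, arg = term[0], evaluate(term[1])   # evaluate the argument first
    return interpretation[(fn, arg)]

# knows(SpouseOf(PresidentOf(USA)), SpouseOf(PresidentOf(China)))
t1 = ("SpouseOf", ("PresidentOf", "USA"))
t2 = ("SpouseOf", ("PresidentOf", "China"))
print(evaluate(t1), "knows", evaluate(t2))
```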

Some Limited Logics allow relations like between, which takes three arguments instead of just two – e.g., between(Los Angeles, San Diego, San Francisco), or four arguments, such as betweenAlong(Los Angeles, San Diego, San Francisco, US101). Some of these logics impose a limit on the highest allowed arity (number of arguments) and some don’t.

Some Logics allow functions and predicates (true/false functions) to take a variable arity – e.g., they support a single “Plus” operator which takes any number of numbers which are to be summed up. You can give Plus two numbers, or twenty-two numbers, to add.

Various Limited Logics deal differently with negation. Believe it or not, for some Limited Logic systems, if the AI tries and fails to prove P, then it jumps to the conclusion that P is false! This is usually a very dangerous leap to make, but there are some applications where this works well, notably those that have what’s called a closed world of complete information: if there are only eight possible murder suspects, and seven of them have ironclad alibis, then things are not looking good for suspect #8.
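Here is a minimal sketch of negation-as-failure under that closed world, using the eight-suspect example (the suspect names and alibi data are invented):

```python
# Negation-as-failure under the closed-world assumption: if an alibi cannot
# be proven, conclude there is none. Safe only because the world is closed
# (exactly eight suspects, complete alibi information).
suspects = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
has_alibi = {"S1", "S2", "S3", "S4", "S5", "S6", "S7"}  # seven ironclad alibis

def possibly_guilty(suspect):
    """Failure to prove an alibi is treated as proof of no alibi --
    a dangerous leap in general, a sound one in this closed world."""
    return suspect not in has_alibi

remaining = [s for s in suspects if possibly_guilty(s)]
print(remaining)  # things are not looking good for suspect #8
```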

The various Limited Logic systems differ in how (if at all) they handle representing and reasoning with statements which are true by default but nevertheless have some exceptions. Some allow whole classes of exceptions (e.g., birds can fly, but penguins can’t) or not; some allow individual exceptions (e.g., Tweety is a particular bird whose wings are too small to support its weight in flight) or not; some allow exceptions to the exceptions; and so on. Accommodating exceptions is important in most applications since, outside of mathematics and games, almost nothing in the world is completely exception-free. Sometimes this is finessed by using probabilities, but this may not suffice if the AI needs to tease apart which cases are exceptional, and/or if the massive data needed for probabilistic reasoning happens to not be available.
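One simple (and simplified) way such default hierarchies get implemented is “most specific information wins”; here is a sketch using the birds/penguins/Tweety example, with toy data structures invented for illustration:

```python
# Default reasoning with exceptions: birds fly by default, penguins are a
# whole exception class, and Tweety is an individual exception.
defaults = {"Bird": {"canFly": True}}
class_exceptions = {"Penguin": {"canFly": False}}       # class-level exception
individual_exceptions = {"Tweety": {"canFly": False}}   # wings too small

def can_fly(name, classes):
    """Most-specific information wins: check the individual first, then any
    exception class, then the general default."""
    if name in individual_exceptions:
        return individual_exceptions[name]["canFly"]
    for c in classes:
        if c in class_exceptions:
            return class_exceptions[c]["canFly"]
    for c in classes:
        if c in defaults:
            return defaults[c]["canFly"]
    return None  # no information either way

print(can_fly("Robin1", ["Bird"]))            # True: the default applies
print(can_fly("Pingu", ["Penguin", "Bird"]))  # False: class exception
print(can_fly("Tweety", ["Bird"]))            # False: individual exception
```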

The various Limited Logics also differ in how (if at all) they can change their minds. Suppose that, at some point in time, such an AI learns or is told or infers that proposition P is true. The AI may now infer a large number of entailments, based at least partly on P being true. But then, let’s say the next day, some new information surfaces that clearly demonstrates that P was false all along. Some of the Limited Logic AI’s will, in that case, do the right thing, and retract P. Artificial Intelligence pioneer John McCarthy called that capability elaboration tolerance. An even smaller subset of the Limited Logic AI’s can go further and correctly untangle and undo all the entailments that it only believed because it had thought that P was true. That's called truth maintenance or reason maintenance. Even the Limited Logic AI’s that support this sort of feature (such as ART) routinely keep it turned off (which is the same thing as not having that ability at all) because it slows them down so much! But the price of turning this feature off, or not having it at all, is that the AI’s arguments can become stale over time without it realizing that; of course that happens to all of us humans as well — I often find myself mistaken about something today because I am relying on things I learned half a century ago that turned out not to be true!
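The bookkeeping behind truth maintenance can be sketched as follows, assuming a toy justification structure in which each derived belief records the premises it rests on (real reason-maintenance systems are far more elaborate):

```python
# Justification-based truth maintenance: retracting P also retracts,
# transitively, every belief that was justified (even in part) by P.
justifications = {}   # belief -> set of beliefs it was derived from
beliefs = set()

def assert_belief(b, because=()):
    beliefs.add(b)
    justifications[b] = set(because)

def retract(p):
    """Remove p and, recursively, every belief that depended on p."""
    beliefs.discard(p)
    for b in list(beliefs):
        if p in justifications.get(b, ()):
            retract(b)

assert_belief("P")                    # told P
assert_belief("Q", because=["P"])     # inferred from P
assert_belief("R", because=["Q"])     # inferred from Q
assert_belief("S")                    # independent of P
retract("P")                          # P turns out to be false after all
print(sorted(beliefs))                # Q and R were untangled and undone too
```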

Some Limited Logic AI’s that handle defaults, and truth maintenance, go even further and support some form of argumentation: there may be multiple arguments that conclude P is true, and at the same time multiple arguments that conclude P is false. Being able to do this crosses the boundary of what we would call a Limited Logic AI, since it is reasoning about its own knowledge and reasoning about its own reasoning. That type of meta-reasoning is actually a form of what's called higher order logic (HOL) and will be discussed below.

Variables and quantifiers: Are some of the arguments to functions and predicates allowed to be variables, and, if so, do all the quantifiers need to be universal (“Every…” or “For All…”, abbreviated “∀”) or can there be nested universal (∀) and existential (∃) quantifiers? For instance, consider this statement, which says that for each person there exists some female adult who is that person’s mother: ∀x (Person(x) ⇒ ∃y (Woman(y) AND mother(x, y))). Many real-world problems require variables and quantifiers, or are vastly terser and faster to solve with them; but sometimes a problem can be stated, and questions answered, using just propositional logic, which does not allow any variables or quantifiers at all. For example, propositional logic may suffice for scheduling a semester’s classrooms, professors, and students at a university, given the large set of constraints that must be satisfied.
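To illustrate how a purely propositional encoding can still answer questions, here is a brute-force satisfiability sketch of a toy two-class scheduling problem; the propositions and constraints are invented, and a real scheduler would use a SAT solver rather than enumeration:

```python
# Propositional (variable-free) reasoning: each proposition is just a named
# true/false atom, and we enumerate all truth assignments.
from itertools import product

props = ["MathInRoom1", "MathInRoom2", "ArtInRoom1", "ArtInRoom2"]

def ok(m):
    return (
        (m["MathInRoom1"] != m["MathInRoom2"]) and      # Math in exactly one room
        (m["ArtInRoom1"] != m["ArtInRoom2"]) and        # Art in exactly one room
        not (m["MathInRoom1"] and m["ArtInRoom1"]) and  # no double-booking Room 1
        not (m["MathInRoom2"] and m["ArtInRoom2"]) and  # no double-booking Room 2
        m["MathInRoom1"]                                # Math must use Room 1
    )

solutions = [dict(zip(props, vals))
             for vals in product([False, True], repeat=len(props))
             if ok(dict(zip(props, vals)))]
print(solutions)  # the constraints force Art into Room 2
```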

Assuming that statements in that Limited Logic can be of the form of if/then rules — “IF P1 and P2…and Pj THEN Q” — is there some restriction on how many of the Pi literals can be positive (non-negated)? This may sound like a very bizarre restriction, but it’s actually quite common, among Limited Logic AI's, since it allows the inference engine to use a well-known theorem-proving algorithm which is much simpler and faster. As we’ll talk about below, that’s really the reason for the proliferation of all these different Limited Logics: the more restricted they are, the faster they can operate.
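To see why such restrictions buy speed, here is a sketch of forward chaining over Horn-style rules, where each rule concludes a single positive literal; the facts and rules are invented R&J-flavored examples, not drawn from any real system:

```python
# Forward chaining over Horn-style rules: each rule is (body, head), where
# the body is a list of conditions and the head is one positive conclusion.
# With this restriction, repeated rule-firing to a fixpoint decides
# entailment without any backtracking search.
rules = [
    (["montague(Romeo)"], "enemyOfCapulets(Romeo)"),
    (["enemyOfCapulets(Romeo)", "attendsCapuletBall(Romeo)"],
     "inDanger(Romeo)"),
]
facts = {"montague(Romeo)", "attendsCapuletBall(Romeo)"}

def forward_chain(facts, rules):
    """Fire every rule whose body is satisfied, until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in facts and all(b in facts for b in body):
                facts.add(head)
                changed = True
    return facts

print("inDanger(Romeo)" in forward_chain(facts, rules))  # True
```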

Some of the Limited Logics are even more restrictive, and require all the “givens” to be simple numeric inequalities. Such inequalities are all the representation needs to express in order for a theorem-prover to solve, for example, a particular SUDOKU puzzle.

Implementations of these Limited Logics also differ in which meta-level features and controls and monitors they support. Such things are not really part of the logic – they aren’t expressed as statements in that language – but rather they are provided to (and handled specially by) the software application that runs that Limited Logic. They are pragmatic, heuristic advice that the rule-writers can give to the AI. Some common and quite useful meta-level features of Limited Logic systems are:

Providing some sort of resource limits to the inference process, such as a time cutoff or a depth cutoff. Or an interrupt, such as the user asking a new question.

Providing some sort of guidance to the reasoner, such as statistics on which paths, which combinations of rule-firings, etc. were particularly useful or wasteful in the past; whether a particular rule is appropriate to use “forward” (to eagerly infer new consequences of something the AI has just concluded) or “backward” (only when the AI is searching deeply for an answer to a problem that has no obvious known solution) or both.

The system automatically recording some sort of bookkeeping meta-data which can later be queried to ascertain the provenance of a statement, who entered it and when and from what source, and a history of how that assertion/fact/rule has been edited and how it has been used and by whom and when and for what and how well that turned out. For those Limited Logic AI’s that support argumentation (see above), they may log this same sort of bookkeeping information about how various arguments fared, historically.
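The first of those meta-level features – a resource limit handed to the reasoner – can be sketched as a toy backward-chainer that accepts a depth cutoff; the facts and rules below are invented for illustration:

```python
# Backward chaining with a depth cutoff: a meta-level resource limit that is
# not itself a statement in the logic, but advice given to the reasoner.
rules = {
    "grief(Romeo)": [["believesDead(Romeo, Juliet)"]],
    "believesDead(Romeo, Juliet)": [["toldDead(Balthasar, Romeo, Juliet)"]],
}
facts = {"toldDead(Balthasar, Romeo, Juliet)"}

def prove(goal, depth_limit):
    """Try to prove the goal, giving up (returning 'unknown') past the
    depth limit instead of searching forever."""
    if depth_limit < 0:
        return "unknown"                 # resource bound exhausted
    if goal in facts:
        return "proved"
    for body in rules.get(goal, []):
        if all(prove(g, depth_limit - 1) == "proved" for g in body):
            return "proved"
    return "unknown"

print(prove("grief(Romeo)", depth_limit=3))  # deep enough: proved
print(prove("grief(Romeo)", depth_limit=1))  # cutoff too tight: unknown
```

Note that "unknown" here is honest: the reasoner ran out of resources, which is not the same as proving the goal false.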

Why aren’t Limited Logics the end of the story? Think back to Romeo & Juliet, or for that matter think of any recent news story: There are things we can say in a natural language, like English, which are awkward or combinatorially explosive or even impossible to say — and to reason with, at least as completely as you or I would — in any of the above Logics. That’s why I grouped them all together, after all, and labeled them all “Limited Logics.” To reach that important next level, we have to turn to something called higher order logic (“HOL”).

An extra step required: NL —> Logic. We’ll talk about HOL in the next section. But first I want to underscore an important “added cost” to using any type of logic — limited or higher order. The content to be reasoned about must first somehow be represented in that logical language. If it starts out as English (as in the case of R&J) then there needs to be some way of converting those sentences into sentences in that logical language. This translation can of course be carried out manually, by axiomatizers who know both of the languages; or semi-automatically (checked and tweaked by humans); or — with a moderate amount of error — fully automatically. The translation errors arise because humans, like Shakespeare, are speaking and writing for other humans, presuming that every one of their readers already has an immense amount of tacit knowledge (common sense and culturally shared metaphors and very widely known facts) and prior knowledge (including both the content of the discussion/article up to that point and the context in which this is being communicated, which may include its purpose, intended audience, satirical nature, and so on.) Things have changed enough since 1597 that many high school students need to read a modern printing of Romeo & Juliet which contains a plethora of footnotes to explain all the things which would have been common knowledge back then but aren't today.

So let's assume that somehow the content has been captured, translated faithfully into a Limited Logic or a Higher Order Logic representation. The AI program automatically, algorithmically, mechanically operates over those logical assertions, and reaches some new conclusion, answer, or insight. That is still represented in logic, not English. So a translation step must then be performed a second time, in the opposite direction – natural language generation (“NLG”) – transforming the logic back into natural language. In many cases, the answer is so terse (Yes/No, a number, a person or place, etc.) that this NLG step is trivial, and, even when it isn’t trivial, NLG is vastly simpler than NLU, since the logic generally doesn't contain pronouns, ambiguities, literary devices, elision, and all the other flourishes that make natural language so engaging for human readers and so difficult for AI's to read.

Level 3 Understanding: Higher Order Logics.

Many of the everyday things that we say and write to each other, and expect the other person to understand and draw “the obvious conclusions” from, require so-called higher order constructs – features that transcend FOL (First Order Logic); we can group those logics that do that together and label them, collectively, as “Higher Order Logics” or “HOL”. Here are a few such HOL features:

The ability to make statements about, and ask questions about, relations in the system. For instance, in HOL one can ask: What is the relationship between rain and irrigation? (One such relation is reducesTheNeedFor). One can ask: Which relations (that my logic knows about) are transitive, symmetric, antireflexive, and hold between members of a family? (One such relation is siblingsOf). If you can express these things in your logic, then we say it’s a “second order” logic, the simplest form of HOL.

Another sort of higher order logic feature is the ability to treat entire statements in the language as first class objects: i.e., the ability to make statements in the language about those other statements. For example, if S1 and S2 are two first-order sentences, can we say things like: (a) S1 was told to the system earlier than S2 was; (b) S1 is more likely to be true than S2; (c) if you, the AI, are trying to use S1 to find an answer and it seems to have gotten you into an infinite loop, then stop and try using S2; (d) whenever S1 works, in solving a problem, it is almost always applied twice, almost never just once nor more than twice (e.g., this is true for integration by parts, and was a rule of thumb most of us learned the hard way in our Calculus II class.) This is a deep but important sort of HOL self-reflection capability, being able to declaratively represent and reason about the steps of its own problem-solving process, and reason about its own reasoning algorithms and proof strategies and tactics.

Higher order logics don’t (and shouldn’t) have to follow all the classical syllogisms such as modus ponens. Consider the so-called modal logic of libel laws: what can you sue someone for having stated or published publicly? Suppose that at one point the NY Times printed the statement P in some article, and at some other time in another article they printed that if P were true then it would follow that Q would be true. Does that mean that they can be sued as though they had explicitly printed Q? No; modus ponens doesn't apply in the context of "what a newspaper has ever printed". Similarly, since we humans all have limited reasoning capabilities, just because a person believes P, Q, R,… and those materially imply S, does not mean that that person believes S. Otherwise all of us would already know all the theorems that will ever be discovered in Number Theory! Lewis and Langford teased apart and axiomatized five different flavors of modal logics in their seminal 1932 book Symbolic Logic.

For instance, we can’t always substitute X for Y even when X equals Y – just because Fred believes that John’s age is 31, and in reality John’s age is 32, certainly does not mean that Fred believes that 32 is 31, so we can’t replace “John’s age” by “32” even though John’s age is 32! Likewise, even though Lois Lane believes the sentence “Superman can fly”, and it so happens, in that world, that Superman is the same person as Clark Kent, we can’t replace “Superman” with “Clark Kent” and validly conclude that Lois Lane believes the sentence “Clark Kent can fly.”

Things get even more complicated when the modal relationships are nested: Israel wants Iraq to believe that if the Strait of Hormuz were blockaded then ... Relations like believes, wants, dreads, hopes, aims, expects,... are called modal operators, in logic, and the logics that allow them and reason with them are a species of higher order logic called modal logic.
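The Lois Lane example can be made concrete. The following sketch contrasts the unsound “substitute equals for equals” rule with the correct, referentially opaque treatment of belief; the data structures are invented toys, not any real modal reasoner:

```python
# Referential opacity in belief contexts: inside "believes", we may NOT
# freely substitute co-referring names.
same_individual = {("Superman", "ClarkKent")}   # true in the world
lois_believes = {("canFly", "Superman")}        # the sentences Lois accepts
names = ["Superman", "ClarkKent"]

def believes_transparent(sentence):
    """The UNSOUND rule: substitute co-referring names freely."""
    pred, name = sentence
    if sentence in lois_believes:
        return True
    return any((pred, a) in lois_believes
               and ({(a, name), (name, a)} & same_individual)
               for a in names)

def believes_opaque(sentence):
    """The correct modal treatment: only the sentences actually held count."""
    return sentence in lois_believes

print(believes_opaque(("canFly", "Superman")))        # True
print(believes_opaque(("canFly", "ClarkKent")))       # False, correctly
print(believes_transparent(("canFly", "ClarkKent")))  # True -- the wrong answer
```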

Of course to be useful, it should be the case that the logic that the AI uses is both (a) more or less consistent (all the statements one can logically infer from correct statements in that representation are also correct) and (b) more or less complete (if S should be entailed by P, Q, and R, then there is actually some proof of S in your logic, given P, Q, and R, and your job is now just to somehow find that proof.) We’ll talk about what I mean by that hand-wavy qualifier “…more or less…” below, but the short explanation is that we want our symbolic AI computer programs which are mechanically grinding through a higher order logic to answer the questions posed to them in a timely manner (versus sometime around, say, the heat death of the universe). So that need for speed is what forces us to build our symbolic AI programs around logics – and mechanical inference algorithms – which can “live with” a little inconsistency and “live with” a little incompleteness:

Living with inconsistency. Think Tevye in Fiddler on the Roof, whose reaction to paradox is to laugh. Pragmatically, this means that our inference engine must do a kind of argumentation (gathering up pro- and con- arguments and reasoning about which arguments are better than or preferred over which other arguments – e.g., use more recent knowledge, use more expert knowledge, prefer terse arguments to long-winded ones, etc.), and/or uses some probabilistic reasoning algorithm.

Living with incompleteness. At least until quantum computing is cheap and ubiquitous, every AI of necessity is running on some real-world computers which have large but limited storage – say n bits total. In the spirit of Gödel’s incompleteness theorem, for each such number n there is some question we could ask that would make the machine run out of storage of that size before it could answer. Pragmatically, “living with incompleteness” means that any time our inference engine is asked a question, it is also given an explicit or implicit resource bound: “Hey, AI, just do the best you can to answer my questions, given these computational resources” – the most common sort of resource bound is a time cutoff: “Hey, AI, if you haven’t answered the question in N milliseconds, just give up.”

Aren't Limited Logics enough, though? Almost all human conversations, podcasts, broadcasts, literature, web-pages, etc. are created by human beings for human beings. More to the point, they are created by beings that have common sense, and they are intended to be read, viewed, and utilized by other beings that also have common sense.

This unstated pact is what enables us to liberally use pronouns, ambiguous words, metaphors, analogies, sarcasm, hyperbole, and allusions to well-known characters, real-world events, and fictional stories.

This unstated pact is what enables us to omit all the “obvious” details. In fact, if we include all the obvious details, we risk confusing or insulting the audience!

This unstated pact is what enables us to invoke constructions like “etc.”, etc.

This unstated pact is what enables us to be as complicated as we need to be, because we have a good understanding of how convoluted a sentence (and a story) can be before the audience will fail to follow and understand it. E.g., USA Today expects its readers to understand sentences with nested modals like “Israel in 2013 wanted the USA to expect HAMAS to be afraid to attack Israel in the next five years lest the USA retaliate by deploying…” The world really is that complicated – see almost any conversation or article about the U.S. government today – and if our AI can’t fully represent and reason with that level of complexity and nesting then we are limiting its understanding to that of a child or a pet.

"Oh, come on!", you might say, "how badly do those Limited Logics fare?" To answer that, just imagine how hard it would be for you to rewrite this article, or any news article, or any conversation transcript, or novel, say R&J, as a series of three- and four-word English sentences! And think how agonizing and probably unsuccessful it would be if you had to try to read and understand an article or novel that had been rewritten into that form. Most of the meaning would be lost, or obscured beyond any realistic hope of anyone being able to quickly decide whether some proposition Q was entailed by that set of sentences or not. And yet that is exactly the sort of “flattening out” of content that occurs when Limited Logics are employed: reducing complicated statements down to simple knowledge graphs, property graphs, Bayesian networks, and so on.
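To see how much structure flattening discards, here is a toy Python sketch – hypothetical notation, not any particular triple-store format – that reifies one nested belief statement into flat subject/predicate/object triples:

```python
import itertools

counter = itertools.count(1)
triples = []

def flatten(prop):
    """Reify a nested proposition into flat (subject, predicate, object)
    triples; return the invented node that stands in for it."""
    pred, *args = prop
    node = f"stmt{next(counter)}"
    for i, arg in enumerate(args):
        value = flatten(arg) if isinstance(arg, tuple) else arg
        triples.append((node, f"arg{i}", value))
    triples.append((node, "predicate", pred))
    return node

# "Juliet believes that Romeo believes that Juliet is alive."
flatten(("believes", "Juliet",
            ("believes", "Romeo",
                ("isa", "Juliet", "LivingOrganism"))))
# One three-level statement scatters into nine anonymous triples,
# and "what does Juliet believe Romeo believes?" becomes a multi-hop
# graph traversal instead of a single lookup.
```

Reification technically preserves the content, but at the cost of exactly the agonizing indirection described above.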

A detailed example: Using Higher Order Logic to Reason about Romeo and Juliet

Let’s return to our opening example, Shakespeare’s Romeo & Juliet. We will excerpt a few statements in English and then represent them in CycL, the higher order logical language that our AI, Cyc, uses to represent knowledge. This HOL can faithfully capture their full meaning; to demonstrate this, we will show how Cyc is able to mechanically produce the answers – the same answers that you or I or Shakespeare would have expected – to questions about who (in that play) knew what when, why they did what they did when they did it, and what alternatives they were probably weighing.

Here is a synopsis of what we want to represent and answer questions about:

The condensed plot of scenes from Acts IV and V of Romeo and Juliet.

Wikipedia 2019.

For convenience (to make things terser, below) let’s name a few of the important contexts – time points (or time intervals) and, in some cases, a person’s set of beliefs, expectations, goals, dreads, intentions, etc. at or during that time:

T0 – just before Act IV starts.

T1a – the start of Juliet’s visit with Friar Lawrence

T1b – the end of that visit

T2 – what the Friar and Juliet both believe about T4 and T5

T3 – Juliet drinks the feign-death potion

T3b – Juliet’s body is discovered and taken to the crypt

T4 – Romeo hears from the Friar’s messenger about the planned trickery

T5 – Romeo hears (from someone else) that Juliet has died

T6 – Romeo goes to the crypt and finds Juliet’s inert body.

T7 – Juliet awakens from her feigning of death.

T8 – Romeo and Juliet secretly flee Verona

T9 – Romeo and Juliet live happily ever after

T10 – Romeo buys poison

T11 – Romeo drinks poison

T12 – Romeo dies

T13 – Juliet discovers Romeo’s dead body

T14 – Juliet drinks poison

As you can see, some of these contexts never come about (in the play), except in some characters’ minds, in the expectations or hopes or dreads that some characters harbor.

For instance in the context T0, here are a few of the statements that hold true:

Lord Capulet wants and expects Juliet and Paris to marry.

Juliet knows that Lord Capulet wants and expects Juliet and Paris to marry.

Romeo believes that Juliet is alive.

Romeo, Juliet, and the Friar do not want Juliet to marry Paris.

All three of them know that all three of them do not want that to happen.

As each scene of the play unfolds, some statements start becoming true and some statements stop being true. And over the course of a much longer time period, often by dint of common sense (“of course” below) all the characters in that fictional world, and all the audience members in the real world, believe that:

Of course if a person were dead, they would not have to marry anyone.

Of course a person often asks another for help accomplishing something that they both want.

Of course if someone drinks an instantly fatal dose of poison, they immediately die.

Of course if a trusted friend of yours tells you something, and you don’t have a better reason not to believe it, then you are very likely to believe what they say.

Of course when someone newsworthy is believed to be dead, news of their death will spread quickly. The scale and speed depend on the information technology available and the dead person’s level of fame. In particular, in a small medieval European town, the news of a local noble’s demise would spread by word of mouth throughout that town over a period of hours but in less than 24 hours.

Of course if one believes that the love of his/her life has just died, then he/she is likely to feel overwhelming sadness and hopelessness.

Of course if someone dies, they stay dead.

Of course an object at rest will stay at rest until moved by some person or some force acting on that object.

Of course while anyone is unconscious or dead they are an object at rest.

There are many more rules of thumb like that, default-true rules that together comprise what we mean by common sense. Over 25 million such general rules have been codified and formally represented already in the Cyc knowledge base.

By “default-true” we mean that – like most rules outside mathematics – they have exceptions. Often, these exceptions are precisely the features that make some event newsworthy in the real world. Sometimes the exceptions occur only in fiction, but we have no trouble imagining, hearing about, and reasoning about worlds in which, e.g., vampires, faster-than-light travel, time travel, ESP, lightsabers, orcs, etc. are real. Higher order logic has no problem representing both systemic classes of exceptions and individual exceptions. In Cyc, for example, to represent a general class of exceptions – e.g., birds generally can fly, but penguins can’t – we can create a context or microtheory in which we unassert (or actually deny) something, or we can explicitly use the predicate exceptWhen, which takes two arguments: an assertion in the knowledge base and a condition under which Cyc should withhold its belief in that assertion. The standard rules of default logic – more particularly, circumscription – then automatically apply. Individual exceptions are even easier to represent explicitly; e.g., that Captain Hook had only one hand.
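The exceptWhen idea can be mimicked in a few lines of Python – a toy rendering of the mechanism with made-up names, nothing like Cyc’s actual machinery: a default rule fires unless some registered exception condition holds.

```python
# Default-true rule: birds generally can fly...
RULES = {
    "canFly": {
        "default_if": lambda kinds: "Bird" in kinds,
        # ...exceptWhen-style conditions under which to withhold belief:
        "except_when": [lambda kinds: "Penguin" in kinds,
                        lambda kinds: "Injured" in kinds],
    },
}

def concludes(pred, kinds):
    """Apply the default rule unless an exception condition is met."""
    rule = RULES[pred]
    if not rule["default_if"](kinds):
        return False
    return not any(cond(kinds) for cond in rule["except_when"])
```

So `concludes("canFly", {"Bird"})` yields True, while `concludes("canFly", {"Bird", "Penguin"})` yields False – the default is withheld, not contradicted.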

Now let’s see how our HOL reasoner and knowledge base, Cyc, makes use of all that, to answer some of the questions that anyone who understood Romeo & Juliet would be able to answer, such as (a) nested belief reasoning and (b) pro- and con- reasoning about each of the most plausible alternative answers.

(a) Nested belief reasoning

In almost every drama, indeed in almost all news stories, there are important situations where one person (or group) correctly or mistakenly believes/expects/wants/dreads that some other person (or group) believes/expects/wants/dreads that some... etc. etc. In Romeo & Juliet, as with many of Shakespeare’s plays, the plot hinges on a series of such nested beliefs, some correct and some tragically incorrect. For example, that Juliet believes, when she drinks the suspended animation potion, that Romeo has already been informed of the plot and knows she isn’t really dead. We can pose this question to Cyc and ask for its line of reasoning explaining its answer:

The Cyc AI program answering a question involving deeply nested modals about what someone believes that someone else will believe. Cyc can then explain why it came to that answer, step by step.

Cycorp 2019.

Repeatedly asking Cyc to explain its reasoning more and more deeply eventually bottoms out in pieces of common sense we all learned as toddlers, like “People generally remember things that are very important to them.”

The above answer was returned by Cyc in 3 seconds, running on a laptop, most of which time was taken up by its figuring out how to generate a more or less grammatical sentence translating each higher order logic statement from CycL into English. Each of these English sentences was generated automatically from the underlying CycL. Here, for example, is how we could write in higher order logic what got translated into “At the time of Juliet's taking of the feign-death potion, Juliet has a model of Romeo’s beliefs at the time of Juliet's being in suspension after taking the feign-death potion that includes the proposition that Juliet is a living thing”:

(ist T3          ;; T3 is the context where Juliet drinks the feign-death potion
  (believes Juliet
    (ist T4      ;; T4 is where Romeo hears from the Friar’s messenger about the planned trickery
      (believes Romeo
        (ist T7  ;; T7 is the context in which Juliet actually awakens
          (isa Juliet LivingOrganism))))))

Note that the context T4 is counterfactual with respect to the other contexts: T4 is where Romeo hears from the Friar’s messenger about the planned trickery; that notifying event was planned for, and tragically relied upon, by Juliet and the Friar, but that event failed to transpire due to the messenger being delayed.

So the HOL formula, above, is saying that it was true, at the time she imbibed the potion, that Juliet believed that Romeo would believe, from the time he heard Friar Lawrence’s message up through her awakening, that she was actually still alive. Whew! This sort of nested belief situation comes up often enough that Cyc actually has a 6-place predicate – agentsModelOfAgentAtTimeIncludesModelAtTimeWherePropLiftsToTime – so that the Cyc ontologists can tersely express such assertions, and so that the Cyc inference engine can efficiently reason with them to infer all the things that you or I would infer from such statements, had they been made in English.
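A skeletal version of context-indexed, nested belief can be sketched in Python – toy data structures and invented names, far simpler than CycL’s real machinery: each context holds a set of propositions, and a belief proposition simply embeds another (context, proposition) pair.

```python
# Each context maps to the set of propositions that hold (ist) in it.
JULIET_ALIVE = ("isa", "Juliet", "LivingOrganism")
CONTEXTS = {
    "T3": {("believes", "Juliet",
              ("ist", "T4", ("believes", "Romeo",
                  ("ist", "T7", JULIET_ALIVE))))},
    "T7": {JULIET_ALIVE},
}

def ist(context, prop):
    """Does `prop` hold in `context`? (Direct lookup only -- a real
    inference engine would also chain through rules to derive this.)"""
    return prop in CONTEXTS.get(context, set())
```

Note that T4’s propositions need not be true anywhere: the belief holds in T3 even though the context it mentions is counterfactual, which is exactly the trap Juliet falls into.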

There is also a third benefit to having a broad vocabulary of terms, including relations/predicates/functions, which makes the “target” logical expression terse: it makes it easier for the AI to automatically infer or at least guess at new knowledge. In Cyc today, e.g., the amount of content automatically inferred by the program itself has finally begun to exceed the tens of millions of rules and assertions that were hand-axiomatized by our ontologists over the past 35 years.

(b) Pro- and con- argumentation

To illustrate pro- and con- argumentation, let’s see how Cyc answers the following question: What are the apothecary’s incentives and disincentives for selling poison to Romeo?

The question is complicated because (as is made clear in the play) such a sale is a capital offense (a strong con- reason), but on the other hand the apothecary’s family is practically starving to death (a strong pro- reason). Coming up with these pro- and con- answers takes Cyc several seconds, on a laptop, but most of that time is spent in NLG, generating the easily understandable English justification sentences from the deeply nested HOL representations.

The Cyc AI producing arguments why the apothecary might, and might not, want to sell fatal poison to Romeo. In the end, his family's desperate straits outweigh his fear of being caught and punished.

Cycorp 2019.

Here is the CycL representation of the question which leads Cyc to gather negative arguments:

(ist
  (MtTimeDimFn
    (TemporalExtentFn RomeoBuysPoisonFromTheApothecary))
  (isDisincentivizedTo ApothecaryInRomeoAndJuliet
    (BuyingFn ToxicSubstance-HomoSapiens)
    seller))

This is a representation, in the CycL higher order logic language, of the query which we might express in English as: "Why is the apothecary incentivized not to play the role of seller in a transaction where the object being bought is a poison that’s fatal to humans?"

And here is the CycL representation of one of the statements Cyc used in one negative argument:

(ist
  (MtTimeDimFn
    (TemporalExtentFn RomeoBuysPoisonFromTheApothecary))
  (desires ApothecaryInRomeoAndJuliet
    (relationNotExistsInstance agentPunished
      Execution-Judicial
      ApothecaryInRomeoAndJuliet)))

This is a representation, in the CycL higher order logic language, of one "lemma" which Cyc used to answer the above query about why the apothecary was reluctant to sell the poison to Romeo. We might express this in English as: "That apothecary, at that time, desired that there not be any future instance of a judicial execution in which he himself was the person being put to death." This lemma was not told to Cyc; it was inferred by Cyc from more general knowledge about things that most people generally do not ever want to happen to them (in this case, being executed).

A couple of the other steps that Cyc went through, in its step-by-step argument against going through with the sale of the poison to Romeo, are:

the sale of lethal poison is (in that context) a capital offense in Mantua.

the apothecary lived and worked, at that time, in Mantua.

Several other steps in Cyc's reasoning involved general knowledge about how the world works, some of the default-true rules of thumb that Cyc has known for decades to typically hold true in the context of more or less any civilized society:

people are subject to the code of conduct of the place they live and work.

if they commit a major crime, there is a good chance they will get caught.

once caught, a criminal has some sort of hearing, and is often found guilty.

if found guilty, one usually receives the prescribed statutory punishment.
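Chained together, those rules of thumb yield the con- argument almost mechanically. Here is a toy Python sketch of that chain – invented names and a drastically simplified world model, not Cyc’s actual argumentation machinery:

```python
def con_argument(agent, act, world):
    """Chain the default rules above into a step-by-step disincentive."""
    place = world["lives_in"][agent]
    if act not in world["capital_offenses"].get(place, set()):
        return []  # no capital-offense chain applies here
    return [
        f"{act} is a capital offense in {place}",
        f"{agent} lives and works in {place}, so its laws apply",
        "committing a major crime brings a good chance of being caught",
        "once caught, a criminal is often found guilty",
        "if found guilty, the statutory punishment (execution) follows",
        f"{agent} strongly desires not to be executed",
    ]

world = {
    "lives_in": {"apothecary": "Mantua"},
    "capital_offenses": {"Mantua": {"selling lethal poison"}},
}
steps = con_argument("apothecary", "selling lethal poison", world)
```

A real system would also weigh this chain against the pro- arguments (the family’s desperation), rather than merely enumerating one side.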

After some on-stage soul-searching, the Mantua apothecary in R&J finds the pro- argument slightly more compelling: his family is starving and he desperately needs the money. So he reluctantly goes through with the transaction and sells Romeo the lethal poison.

Conclusions and Good News.

I’ve made a case for using symbolic logic, not just statistics, in our AI’s – among other reasons because symbolic logic always provides a step-by-step explanation for everything such an AI concludes. This raises a very important and timely question: how exactly can we yoke these two – statistical machine learning and symbolic logical representation and reasoning – together? How can we tap into both of those powerful reasoning technologies?

And I’ve made a case for using higher order logic, not just one of the plethora of Limited Logics (e.g., Knowledge Graphs) in widespread use today, so as to avoid representing only a pale shadow of the full meaning. This raises a second timely and important question: how exactly can we get our programs to reason quickly enough, with knowledge represented in that cumbersome way?

The good news is that both of those questions have concrete answers, which will be the topics of my next two upcoming Forbes columns:

Yes, there is a way to get higher order logical representation and reasoning to be efficient. You probably already inferred that, from the examples of real-time question-answering that Cyc can do, above, even with questions involving deeply nested beliefs and other modals (desires, fears, expects, etc.). There are many engineering solutions that come together to recoup the efficiency that utilizing HOL usually costs, and I will go through and explain and illustrate each of these engineering breakthroughs (which, as is so often the case with scientific inventions and magic tricks, seem obvious once you see how they’re done).

Yes, there is a way to harness both statistical machine learning and symbolic HOL inference, and thereby get the benefits of each and – more so – gain the synergy when both sources of power are brought to bear. I felt alone in delivering this message until quite recently, but I’m happy to report that that is finally changing: a conference at Stanford I attended a couple of months ago was completely dedicated to this very topic. The upshot is that there is a whole spectrum of ways of synergizing these two different sources of power, and I will discuss and exemplify each of them.