Friday, January 27, 2017

Imperial College: automated theorem proving & neural nets

These Imperial College lectures, by Simon Colton, date from 2006 and 2009, but the underlying principles haven't changed. The earlier lectures (starting here) provide an introduction to AI and logic; implementation begins with lecture 7.

I've chosen them for their clarity and completeness. The level of detail is that of a good overview.

We have shown how knowledge can be represented in first-order logic, and how rule-based expert systems expressed in logic can be constructed and used. We now look at how to take some known facts about a domain and deduce new facts from them. This will, in turn, enable agents to prove things, i.e., to start with a set of statements we believe to be true (axioms) and deduce whether another statement (theorem) is true or not.

We will first look at how to tell whether a sentence in propositional logic is true or false. This will suggest some equivalences between propositional sentences, which allow us to rewrite sentences into other sentences which mean the same thing, regardless of the truth or meaning of the individual propositions they contain. These are reversible inferences, in that deduction can be applied either way.
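To make the truth-table idea concrete, here is a minimal Python sketch, assuming each propositional sentence is modelled as a Python function over booleans (a representation chosen for illustration, not taken from the lectures). Two sentences are equivalent exactly when they agree on every row of the truth table, so checking an equivalence just means comparing the two tables.

```python
from itertools import product
from typing import Callable

# Assumption: a sentence is modelled as a Python function from truth values to a truth value.
Sentence = Callable[..., bool]

def truth_table(sentence: Sentence, n_vars: int) -> list[bool]:
    """Evaluate a sentence under every assignment of truth values to its n_vars propositions."""
    return [sentence(*row) for row in product([False, True], repeat=n_vars)]

def equivalent(s1: Sentence, s2: Sentence, n_vars: int) -> bool:
    """Two sentences are equivalent iff they have the same truth value in every row."""
    return truth_table(s1, n_vars) == truth_table(s2, n_vars)

def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b

# De Morgan: not (P and Q)  ==  (not P) or (not Q)
de_morgan = equivalent(lambda p, q: not (p and q), lambda p, q: (not p) or (not q), 2)
# Contrapositive: P -> Q  ==  (not Q) -> (not P)
contrapositive = equivalent(lambda p, q: implies(p, q), lambda p, q: implies(not q, not p), 2)
# Not an equivalence: P -> Q differs from its converse Q -> P
converse = equivalent(lambda p, q: implies(p, q), lambda p, q: implies(q, p), 2)

print(de_morgan, contrapositive, converse)  # True True False
```

Because equality of truth tables is symmetric, each equivalence found this way can be used as a rewrite in either direction, which is what makes these inferences reversible.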

We then look at propositional and first-order inference rules in general, which enable us to deduce new sentences if we know that certain things are true, and which may not be reversible.
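Modus Ponens is the classic example of an inference rule that is not reversible. The sketch below applies it repeatedly (forward chaining) to propositional rules of the form "if all premises hold, then the conclusion holds"; the restriction to this rule format, the function name and the example facts are assumptions for illustration.

```python
def forward_chain(facts: set[str], rules: list[tuple[frozenset[str], str]]) -> set[str]:
    """Repeatedly apply (generalised) Modus Ponens: if all premises are known, add the conclusion."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical example rules: premises -> conclusion
rules = [
    (frozenset({"raining"}), "wet_ground"),
    (frozenset({"wet_ground", "cold"}), "icy_ground"),
]
print(sorted(forward_chain({"raining", "cold"}, rules)))
# ['cold', 'icy_ground', 'raining', 'wet_ground']
```

Starting from `wet_ground` alone, nothing new is derived: the rule licenses the step from `raining` to `wet_ground` but not back again.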

A minor miracle occurred in 1965 when Alan Robinson published his resolution method. This method uses a generalised version of the resolution rule of inference we saw in the previous lecture. It has been mathematically proven to be refutation-complete over first-order logic. This means that if you write any set of first-order sentences which is unsatisfiable (i.e., no interpretation makes them all true at once, so they have no models), then the resolution method will eventually derive the False symbol (the empty clause), indicating that the sentences contradict each other.

In particular, if the set of first-order sentences comprises a set of axioms together with the negation of a theorem you want to prove, the resolution method can be used in a proof-by-contradiction approach. This means that, if your theorem really does follow from the axioms, proof by contradiction using the resolution method is guaranteed to find a proof eventually (if it does not follow, the search is not guaranteed to terminate).
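A propositional simplification of resolution refutation can be sketched in a few lines of Python. It assumes clauses are sets of literals written as strings, with a leading `~` for negation (a hypothetical encoding); the first-order version described above additionally needs unification to match literals. The goal is negated, added to the axioms, and the clause set is saturated until the empty clause (False) appears or nothing new can be derived.

```python
from itertools import combinations

Clause = frozenset[str]  # a disjunction of literals; "~p" is the negation of "p"

def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1: Clause, c2: Clause) -> list[Clause]:
    """All resolvents of two clauses: drop one complementary pair, union the rest."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

def refutes(clauses: set[Clause]) -> bool:
    """Saturate the clause set; return True if the empty clause (False) is derived."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolve(c1, c2):
                if not r:
                    return True          # empty clause: the set is unsatisfiable
                new.add(r)
        if new <= known:
            return False                 # nothing new can be derived: satisfiable
        known |= new

def proves(axioms: list[Clause], goal: str) -> bool:
    """Proof by contradiction: add the negated goal and look for the empty clause."""
    return refutes(set(axioms) | {frozenset({negate(goal)})})

# "man -> mortal" in CNF is (~man OR mortal); together with the fact "man"
axioms = [frozenset({"~man", "mortal"}), frozenset({"man"})]
print(proves(axioms, "mortal"))  # True
```

In the propositional case saturation always terminates, because only finitely many clauses can be built from a finite set of literals; in first-order logic that guarantee is lost, so the search may run forever when the theorem does not follow.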

To recap, we have looked at logic and some rules of deduction in order to understand automated reasoning in the general case. In the last lecture we looked at a particular rule of inference, the resolution rule. We know that applying this rule gives a refutation-complete search method, which means that it will eventually prove any theorem which follows from the axioms and can be written in first-order logic. Applying the rule requires two sentences in conjunctive normal form (CNF) such that a literal in one unifies with the negation of a literal in the other. We described how to write first-order sentences in CNF, and how to find unifying substitutions, in lecture 8. In this lecture, we look at exactly how to use the resolution rule to prove first-order theorems in practice.
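Resolution on first-order clauses needs a unifier for the two complementary literals. The following is a standard recursive unification sketch with an occurs check, not necessarily the exact procedure from lecture 8. Its term encoding is an assumption: strings starting with `?` are variables, other strings are constants, and tuples such as `("mother", "?y")` are compound terms with the function symbol first.

```python
from typing import Optional, Union

Term = Union[str, tuple]  # "?x" variable, "john" constant, ("f", t1, ...) compound term
Subst = dict[str, Term]

def is_var(t: Term) -> bool:
    return isinstance(t, str) and t.startswith("?")

def walk(t: Term, s: Subst) -> Term:
    """Follow variable bindings in s until reaching an unbound variable or a non-variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v: str, t: Term, s: Subst) -> bool:
    """Occurs check: does variable v appear inside t (under s)?"""
    t = walk(t, s)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, s) for arg in t[1:])
    return False

def unify(a: Term, b: Term, s: Optional[Subst] = None) -> Optional[Subst]:
    """Return a most general unifier extending s, or None if none exists."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):   # unify arguments left to right
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None                          # constant clash or different function symbols

def substitute(t: Term, s: Subst) -> Term:
    """Apply a substitution fully, resolving chains of bindings."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(arg, s) for arg in t[1:])
    return t

s = unify(("knows", "john", "?x"), ("knows", "?y", ("mother", "?y")))
print(substitute("?x", s))  # ('mother', 'john')
```

The occurs check is what rejects attempts such as unifying `?x` with `("f", "?x")`, which would otherwise produce an infinite term.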

We have looked at the automation of intelligent tasks using deduction to infer new information from old. We now look at the use of inductive reasoning to infer new information from old. ...

As with many areas in AI, machine learning has become fairly specialised. In this case, learning from examples dominates the field, partially because of the applications it affords. If a learning agent can look at examples of share prices and learn a reason why some shares fall during the first financial quarter, this is of great commercial advantage. If another agent can learn reasons why certain chemicals are toxic and others are not, this is of great scientific value. If yet another agent can learn what a tank looks like given just photographs of tanks, this is of great military advantage.

As discussed in the last lecture, the representation scheme we choose to represent our learned solutions and the way in which we learn those solutions are the most important aspects of a learning method. We look in this lecture at decision trees - a simple but powerful representation scheme, and we look at the ID3 method for decision tree learning.
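The heart of ID3 is choosing, at each node, the attribute with the highest information gain, i.e. the largest expected reduction in entropy of the target classification. A compact Python sketch follows; the dictionary-based tree format and the four-example toy dataset are hypothetical, and it omits refinements such as handling continuous attributes or attribute values unseen in training.

```python
import math
from collections import Counter

Example = dict[str, str]

def entropy(labels: list[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples: list[Example], attr: str, target: str) -> float:
    """Expected reduction in entropy of `target` from splitting on `attr`."""
    base = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

def id3(examples: list[Example], attrs: list[str], target: str):
    """Return a leaf label or a nested dict {attribute: {value: subtree}}."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:
        return labels[0]                              # pure node: leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # no attributes left: majority leaf
    best = max(attrs, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, [a for a in attrs if a != best], target)
    return tree

data = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
print(id3(data, ["outlook", "windy"], "play"))
# {'windy': {'no': 'yes', 'yes': 'no'}}
```

Here `windy` separates the classes perfectly (gain of 1 bit) while `outlook` tells us nothing (gain of 0), so ID3 places `windy` at the root.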

Decision trees, while powerful, are a simple representation scheme. While graphical on the surface, they can be seen as disjunctions of conjunctions, and hence are a logical representation, and we call such schemes symbolic representations. In this lecture, we look at a non-symbolic representation scheme known as Artificial Neural Networks. This term is often shortened to Neural Networks, but this annoys neuro-biologists who deal with real neural networks (inside our human heads).
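To see why a decision tree is a disjunction of conjunctions, read each root-to-leaf path ending in a given class as a conjunction of attribute tests, then OR those paths together. The sketch assumes a hypothetical nested-dictionary tree format (`{attribute: {value: subtree}}`, with class labels as leaves) and a toy weather-style tree.

```python
from typing import Union

# A tree is either a class label (leaf) or {attribute: {value: subtree}}.
Tree = Union[str, dict]

def paths_to(tree: Tree, label: str, conds: tuple = ()) -> list[tuple]:
    """Collect each root-to-leaf path ending in `label` as a tuple of (attribute, value) tests."""
    if isinstance(tree, str):
        return [conds] if tree == label else []
    (attr, branches), = tree.items()          # exactly one attribute tested per node
    result = []
    for value, subtree in branches.items():
        result += paths_to(subtree, label, conds + ((attr, value),))
    return result

def to_dnf(tree: Tree, label: str) -> str:
    """Render the tree's rule for `label` as a disjunction of conjunctions."""
    conjunctions = [" AND ".join(f"{a}={v}" for a, v in path) for path in paths_to(tree, label)]
    return " OR ".join(f"({c})" for c in conjunctions)

tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                    "overcast": "yes",
                    "rainy": {"windy": {"yes": "no", "no": "yes"}}}}
print(to_dnf(tree, "yes"))
# (outlook=sunny AND humidity=normal) OR (outlook=overcast) OR (outlook=rainy AND windy=no)
```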

As the name suggests, ANNs have a biological motivation, and we briefly look at that first. Following this, we look in detail at how information is represented in ANNs, then we look at the simplest type of network, two-layer networks. We look at perceptrons and linear units, and discuss the limitations that such simple networks have. In the next lecture, we discuss multi-layer networks and the back-propagation algorithm for learning such networks.
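The perceptron training rule, w_i ← w_i + η(t − o)x_i, fits in a few lines. This sketch assumes 0/1 threshold outputs, a bias weight w0 on a constant input of 1, and a learning rate η = 1.0 (chosen so the arithmetic stays exact); the function names are illustrative. It learns AND, which is linearly separable, but never reaches zero errors on XOR, which illustrates the main limitation of two-layer networks.

```python
def predict(weights: list[float], x: tuple[int, ...]) -> int:
    """Threshold unit: output 1 if w0 + sum(w_i * x_i) > 0, else 0 (w0 is the bias)."""
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else 0

def train_perceptron(data, eta: float = 1.0, epochs: int = 100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i, applied per example."""
    weights = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, t in data:
            o = predict(weights, x)
            if o != t:
                errors += 1
                weights[0] += eta * (t - o)          # bias input is fixed at 1
                for i, xi in enumerate(x):
                    weights[i + 1] += eta * (t - o) * xi
        if errors == 0:
            return weights, True                     # converged: every example correct
    return weights, False                            # gave up after `epochs` passes

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(train_perceptron(AND)[1], train_perceptron(XOR)[1])  # True False
```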

We can now look at more sophisticated ANNs, which are known as multi-layer artificial neural networks because they have hidden layers. These will naturally be used to undertake more complicated tasks than perceptrons.

We first look at the network structure for multi-layer ANNs, and then in detail at the way in which the weights in such structures can be determined to solve machine learning problems. There are many considerations involved with learning such ANNs, and we consider some of them here. First and foremost, the algorithm can get stuck in local minima, and there are some ways to try to get around this.
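Back-propagation can be sketched in pure Python for a network with one hidden layer of sigmoid units and a single sigmoid output, trained by batch gradient descent on the sum-of-squares error E = ½ Σ (t − o)². These choices, the weight layout (bias stored last in each weight list) and all function names are assumptions for illustration; a real implementation would normally use a numerical library.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def init_net(n_in: int, n_hidden: int, seed: int = 0) -> dict:
    """Small random weights; 'W' holds hidden-unit weights, 'v' output weights (bias last in each)."""
    rng = random.Random(seed)
    return {"W": [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)],
            "v": [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]}

def forward(net: dict, x):
    """Return hidden activations and output; zip stops before the trailing bias weight."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in net["W"]]
    o = sigmoid(sum(v * hj for v, hj in zip(net["v"], h)) + net["v"][-1])
    return h, o

def loss(net: dict, data) -> float:
    """Sum-of-squares error E = 1/2 * sum (t - o)^2 over the training set."""
    return 0.5 * sum((t - forward(net, x)[1]) ** 2 for x, t in data)

def gradients(net: dict, data):
    """Back-propagation: dE/dW and dE/dv accumulated over the whole training set."""
    gW = [[0.0] * len(row) for row in net["W"]]
    gv = [0.0] * len(net["v"])
    for x, t in data:
        h, o = forward(net, x)
        delta_o = (o - t) * o * (1 - o)                   # error term of the output unit
        for j, hj in enumerate(h):
            gv[j] += delta_o * hj
            delta_h = delta_o * net["v"][j] * hj * (1 - hj)  # propagated back to hidden unit j
            for i, xi in enumerate(x):
                gW[j][i] += delta_h * xi
            gW[j][-1] += delta_h                          # hidden bias
        gv[-1] += delta_o                                 # output bias
    return gW, gv

def train(net: dict, data, eta: float = 0.5, epochs: int = 2000) -> dict:
    """Batch gradient descent on E using the back-propagated gradients."""
    for _ in range(epochs):
        gW, gv = gradients(net, data)
        for j, row in enumerate(net["W"]):
            for i in range(len(row)):
                row[i] -= eta * gW[j][i]
        for j in range(len(net["v"])):
            net["v"][j] -= eta * gv[j]
    return net

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = init_net(2, 2)
before = loss(net, XOR)
train(net, XOR)
print(before, loss(net, XOR))
```

Whether the trained network actually solves XOR depends on the initial weights: some starting points lead gradient descent into a local minimum, and restarting from different random weights (a different `seed` here) is one simple way around that.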

As with any learning technique, we will also consider the problem of overfitting, and discuss which types of problems an ANN approach is suitable for.

1 comment:

The Logic presented in Lecture 7 is strongly Classical: ¬¬P = P and ¬P ∨ P = True. One may need to reflect on how universally applicable this is in the full AI context (e.g. databases with incomplete information).