The Elephant in the room

While consciousness and "free will" are hot topics for philosophers, most "serious" Artificial Intelligence (AI) literature avoids them (exceptions include 1,2,3). However, I have yet to meet anyone with an interest in AI who hasn't thought about these deep questions. We just don't talk about them in scholarly company.

The history of AI is a series of speculative bubbles: False hopes, dashed promises and unrealized dreams have created the mainstream perception that AI has gone nowhere for decades. We seem to re-brand AI every 10 years to shed past disappointment. For example, Machine Learning currently enjoys huge popularity, yet half the community do not consider ML to be a subset of AI, viewing the latter as a narrower discipline concerned with symbolic reasoning. A decade ago everything was about explicit treatment of uncertainty.

Today, it is rare to write about “AI” in scholarly journals; we talk about specific approaches instead. AI is tainted terminology.

Having been repeatedly burnt, AI & ML researchers now focus on short-term, achievable goals while (publicly) ignoring questions about fundamental, qualitative distinctions between artificial intelligence methods that already work, and the natural intelligence of people and animals.

We have a “God of the Gaps” problem: The qualitative differences between man and machine keep receding into ineffable gaps in machine capability. Humans keep raising the bar.

Physicists (e.g. Penrose) have repeatedly proposed that consciousness requires some special kind of physics. This rationalizes machine inferiority, but without strong prior evidence of such unusual phenomena it seems like an answer in search of a problem.

The idea that consciousness requires some form of exceptional phenomena may be a peculiarly Western philosophical trait, resulting from exposure to Descartes' Dualism.

An opposing view is that when properly defined these problems might simply be emergent characteristics of a particular type of algorithm, making the type of physical embodiment unimportant.

A Problem of Definition

Our position is that the ongoing difficulty with consciousness and free will is simply the lack of a satisfactory definition of these features. For those of us who regard Artificial Intelligence as a practical toolkit akin to spanners or wrenches, it's time to hold the less applied thinkers to account: Exactly what performance, qualities or abilities do people accept as demonstrating consciousness?

To improve the specificity of these problems let's replace “Free Will” with Self-Determination and swap Consciousness with Self-Awareness.

Self-Determination

The most problematic aspect of free will seems to be determinism. In a deterministic system the relationship between inputs and outputs is fixed; there are no choices. Instead the inputs determine the outputs. Clearly there is no room for free will if our sensory inputs exactly determine our actions in response.

"Free" will implies that normal causation doesn't apply to thinking - as if it were "free" from any and all constraints. This smacks of a preoccupation with Dualism and the theoretical constraints of hard determinism derived from Newtonian models of the universe. Both Descartes and Newton lived and wrote 300 years ago. We have moved on since then.

A much better description might be "self-determination": deciding what to do via some reasoning process, making use of knowledge, experience and personal biases. The big issue is then whether this definition is lacking some essential “freedom” quality.

What if our sensor inputs don't determine our actions, but an unconscious neurological process does instead? Brain imaging suggests that conscious awareness of decisions often occurs long after decisions are actually made and even after action begins. In this model, due to "our" lack of access and control over decisions, free will is merely an illusion. Although this looks hopelessly conclusive, the word "our" is key to unlocking a greater range of more satisfying answers. It is ownership and access that is lacking. We will argue that self-determination restores both.

Self-Awareness

If we are to make our own choices, we need a framework for modelling the world and the values we attach to concepts, events and entities within it. The framework enables us to become aware of ourselves and the world we inhabit.

Consciousness is often described as awareness - of the self, of the past and future, and of the world around. There's also awareness of sensation - Qualia - and an ongoing "stream" of consciousness, such as an internal monologue. Attention - selective awareness - is also entwined with consciousness.

This article isn’t going to be able to describe how to artificially reproduce human awareness. For one thing, evidence from animals and from inter-personal variation suggests that the qualities and properties of human awareness are a matter of degree, not a binary feature. Most of the other articles on this blog discuss ways to generate artificial representations of the world from sensory data, but there’s more about current progress in artificial awareness in a postscript to this article.

In contrast, we can ask some pretty definitive questions about the feasibility of self-determination, which will be the focus of the rest of this article.

Models of Self-Determination

Let's explore some thought experiments to see if we can find a set of physically-plausible qualities that fulfill a satisfying definition of self-determination.

Free Will is typically described as the ability to make decisions, or choose actions, in a way that is not determined by external events. In some sense, we are "free" to express personal preference and values over strong external persuasion or evidence.

But Free Will is not simply a random or unintelligent reaction; we would like it to be a choice of some kind. We would like to make informed, deliberate choices that balance our interpretation of objective and subjective criteria: Let our history, knowledge and experiences shape our choices.

A choice without knowledge and understanding is not really a choice at all. Only via an understanding of consequences can we assign value to the choices available. Comprehension of available choices relies on existing knowledge and experience. Execution of decisions requires ongoing, grounded understanding of the world and our action capabilities, to ensure we execute choices as intended. And you're not going to "choose" an action that has no personal meaning - for example, if you live in a desert where it never rains, you won’t invent an umbrella*.

We believe the answer to the whole Free Will problem is also the source of limits to our freedom: Our knowledge, experience and personal values. This is no bad thing: Individual characteristics develop from our personal history, and experience of consequences affects the values we express in future decisions. We don't have to express these limits negatively; we can say that experience, knowledge and beliefs guide us to choices we find reasonable. Personal identity is defined by experience and knowledge and expressed in choices: It’s what makes you, you. And it makes the choices yours.

This is not a new idea. In philosophy, those who believe that Free Will is compatible with some level of determinism are known as compatibilists. Compatibilists (such as ourselves) believe that the free-will vs determinism debate is a false dilemma that can be sidestepped by clearly defining the objective.

Self-determination is a positive way to summarise the influence of internal constraints. So, if we were to accept these constraints, what freedom do we have, to make choices?

* The etymology of the word “Umbrella” is relevant. It derives from the Latin Umbra, meaning shade. The novel application of an existing sunshade tool to rain protection may be an example of ideas being guided, inspired and yet constrained by previous experience. The adaptation of an existing idea to a new problem is a common innovation technique.

Uncertainty and Stability

The universe we inhabit appears to be mostly deterministic at human scales, but quite random at smaller scales. Some systems, such as weather, are unpredictable at large scales because they are very sensitive to small-scale changes. This phenomenon is known as “sensitive dependence” on initial conditions; Chaos theory is the study of such systems. A famous example is the “Butterfly Effect”, in which the beating of a butterfly’s wings can alter the path of a tropical storm thousands of miles away.

Other large-scale systems are stable against small-scale changes. For example, the behaviour of a brick house or a steel girder does not change significantly in response to butterflies’ flapping. The structure of these systems absorbs these changes, and the behaviour we care about is not affected.

Similarly, we can design robust information processing systems. For example, your computer’s processor can execute millions of instructions per second without error. Nature can do this too: Error-checking mechanisms ensure remarkably accurate copying of DNA into every cell in your body, although mutations still occur at a low but necessarily nonzero rate. Natural selection then amplifies the frequency of advantageous variants, resulting in evolution.

In Estimation theory, systems are represented by a combination of explicit models and terms that capture the effects of random disturbances. We call these random disturbances “noise”. In a sensitive-dependent system, noise can make the future state of the system unpredictable.

Noise can be selectively amplified or suppressed by carefully designed systems. Complex, large-scale systems can be sensitive to small-scale random events, or they can be designed to be almost entirely unaffected by noise.
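To make the contrast between the two regimes concrete, here is a minimal sketch. The logistic map and the averaging filter are our own illustrative choices, not part of any specific model from this article: the first amplifies a tiny perturbation until trajectories diverge completely, while the second suppresses noise by design.

```python
import random

# Sensitive-dependent system: the chaotic logistic map x' = 4x(1 - x).
# Two trajectories starting a billionth apart soon diverge completely.
def logistic_trajectory(x0, steps=100):
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3)
b = logistic_trajectory(0.3 + 1e-9)           # a "butterfly"-sized change
divergence = max(abs(p - q) for p, q in zip(a, b))  # grows to order 1

# Stable system: averaging suppresses noise instead of amplifying it.
random.seed(0)
true_value = 1.0
readings = [true_value + random.uniform(-0.5, 0.5) for _ in range(1000)]
estimate = sum(readings) / len(readings)      # error shrinks with more samples
```

The same small-scale randomness is present in both cases; it is the structure of the system that decides whether it is amplified or absorbed.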

Exploiting Uncertainty

For our self-determination model, the ideal solution is a mix of the two extremes. We need deterministic reasoning to accurately evaluate potential actions and outcomes. And we can exploit uncertainty and stability in a system to amplify the consequences of random events when exploring options, and stabilize decisions once made.

Consider perception. You want your senses to reliably tell you what's going on outside your head. This is essential: If all you saw were crazy swirls and flashing imaginary lights, you wouldn't be able to find your way around. But we also want some uncertainty in perception to allow us to test different hypotheses for what we're seeing and decide which fits best. For example, the perception of ambiguous images depends on expectations.

Imagine a scenario with two alternative choices: A and B. Our internally "preferred" choice is A, but perceptual cues and third-party advice strongly suggest action B. Let’s assume this "external bias" means that the probability of us even conceiving plan A is small (e.g. 0.1). In this case it seems that external causes are dominating internal preferences: We're simply responding to external cues, like a puppet. Not ideal.

Now let's change the system a bit. We include a method of repeatedly, randomly generating combinations of ideas, and strongly reinforcing the ideas with the greatest internal value. Imagine rolling a 10-sided die. If we get a '1', we are lucky enough to think of plan A. If we get any other number, we only think of plan B, due to the external bias.

The die will only show a '1' rarely. But if we allow ourselves 100 rolls we will get quite a few '1's. As long as we produce a strong response to rolling the occasional '1' (due to the anticipated outcome of plan A), we can design a system that is largely determined by internal preferences, not dictated by chance properties of the outside world.
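This thought experiment is easy to simulate. In the sketch below the internal preference values are invented purely for illustration: even though plan A is conceived on only 10% of rolls, the chance of never conceiving it in 100 rolls is 0.9^100 (about 0.003%), so the internally preferred plan wins almost every time.

```python
import random

random.seed(42)

INTERNAL_VALUE = {"A": 0.9, "B": 0.4}  # hypothetical internal preferences

def decide(rolls=100, p_conceive_a=0.1):
    """Generate candidate plans at random; keep whichever has the
    highest internal value (the 'strong response' to a lucky roll)."""
    best = None
    for _ in range(rolls):
        plan = "A" if random.random() < p_conceive_a else "B"
        if best is None or INTERNAL_VALUE[plan] > INTERNAL_VALUE[best]:
            best = plan
    return best

# Repeat many times to estimate how often internal preference wins out.
outcomes = [decide() for _ in range(1000)]
fraction_a = outcomes.count("A") / len(outcomes)
```

The decision is still produced by an ordinary physical process, yet its outcome is dominated by the internal values, not by the external bias on idea generation.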

Figure: An ambiguous image; the same sensory input can be perceived in two ways (in this case either an old or young lady), depending on the prior or internal bias of the viewer.

The Cortex is widely accepted as the origin of high-level, abstract thought, including strategy and planning. In essence, the Cortex provides understanding of contexts and potential choices. The design of the brain routes inter-cortical connections via central hubs - the Basal Ganglia, and the Thalamus. During routing, messages are filtered. This design closely matches the architecture needed for a competitive evaluation of competing plans influenced by an understanding of their consequences.

If the part of your brain that generates ideas is sensitive-dependent, then it can overcome the influence of external factors and suggest all sorts of things until an internally resonating plan is found. Of course, your evaluation of the ideas should be quite deterministic, based on previous experience and knowledge.

Figure: Flow of information from the cortex through deeper structures such as the Basal Ganglia and Thalamus, before routing back to the Cortex. This architecture allows the deeper structures to filter and select the activity in the cortex. Image from Scholarpedia.

We can predict the weather a couple of days into the future, but beyond that we're no better than random chance at guessing whether it will be sunny. The impact of noise increases over time. Noise in the brain could also lead to vastly different outcomes even given similar initial states.

In just a few seconds of thought, our brains are sensitive and complex enough to generate and evaluate thousands of different ideas. With a little time for consideration, the probability of generating your "preferred" plan may be very high. It’s not the end of the world if this doesn't always happen; sometimes we need expert or friendly advice!

As discussed above, this form of self-determination requires only a highly tuned, sensitive, but perfectly ordinary physical mechanism.

Retrospective Future Self-Determination!

Sometimes we have to pause and think carefully about things. Sometimes we will mull over a big decision for a few days. Other “decisions” are trivial and are made instantly – perhaps most. But in these cases retrospective awareness of instinctive choices makes us more than helpless bystanders: We can reflect consciously on decisions already made, and re-assess for next time.

Our imagined experience of potential outcomes might modify the values we associate with various actions. We can imagine better outcomes from other choices, that will then be more readily selected in future. In this model, you can improve yourself by conscious reflection on your actions.
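A minimal sketch of this reflective value update follows. The action names, initial values and learning rate are invented for illustration; the point is only that imagining a better outcome for an un-chosen action raises its value, so it is more likely to win future instinctive selections.

```python
# Hypothetical sketch: conscious reflection as an offline value update.
values = {"take_umbrella": 0.2, "get_wet": 0.5}  # illustrative action values

def reflect(action, imagined_outcome, learning_rate=0.5):
    # Move the action's value toward the outcome we imagine it would have had.
    values[action] += learning_rate * (imagined_outcome - values[action])

# "Next time I'll take the umbrella": imagining a good outcome for the
# un-chosen action raises its value above the habitual choice.
reflect("take_umbrella", 1.0)
```

After reflection, the previously un-chosen action has the higher value, so the next instinctive decision will differ even though no new external experience occurred.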

Conclusion

The issue of free will has profound moral implications: Are we responsible for our actions? The answer is often said to lie in the existence (or otherwise) of a magic property of conscious awareness that allows understanding and decision-making to occur outside conventional physical processes. This process is often called “free-will”: The ability to choose a course that is not pre-determined, but instead a novel and unpredictable response: Chosen, not inevitable. The problem is, we know of no physical mechanism for this type of choice; we would argue that it can't even be defined coherently.

Instead, this article offers a limited but positive alternative to free will as intelligent self-determinism, exploiting both sensitive-dependence and stability via feedback within ordinary physics.

It's interesting to consider our concept of identity in relation to this type of self-determinism. Preferences and expectations resulting from past experiences and personal consequences define our personalities and character, via the values and emotional responses we learn to attach to events. This means that self-determined decisions define our identities. We are literally the product of our choices: Your personality is formed by a continuity of experience, choice and consequence.

Having described self-determinism without resorting to special physics or magic beans, there’s nothing to prevent us creating artificial copies of it. We have every reason to believe we can create machines that self-determine their own destiny just as we do. I can't wait to have this debate with one of them.

Postscript

Capabilities of Artificial Self-Awareness

Machines can already construct sophisticated internal representations of the world. Machines can interpret data in ways that have similar qualities and performance to human vision (e.g. how-old.net). So is there anything that fundamentally divides machines' internal representations from the experience of awareness we humans enjoy? Some philosophers call this the "hard problem" of consciousness; other philosophers say the problem doesn't exist, because the missing qualities can't be properly defined or do not really exist! This paradox is best illustrated with the Zombie thought experiment.

The Zombie Conundrum

A philosophical "zombie" is an entity that has the external appearance of consciousness, but internally is merely a simulation of the real thing. In fact, we might all be zombies, depending on the quality of consciousness required for the "real" thing. If you set impossible requirements, then we are all zombies.

Actually, it is easier to prove the veracity of conscious experience in software than in humans. This is because we can “pause” and explore software brains in great detail, with access to all internal state. Given an algorithm that is expected to produce the qualities of consciousness, we can inspect and measure these qualities directly. It might even be easier to build genuine consciousness than a convincing simulation.

Consciousness may be a continuum rather than a binary feature: Awareness with varying degrees of quality and depth. Chimpanzees deliberately construct tools for later use. Dogs are capable of sophisticated social interactions. So there may not be a yes/no answer to the Hard Problem.

Progress in Artificial Awareness

The purpose of this blog is to look at practical techniques for automatically creating hierarchical, increasingly abstract representations of an embodied, adaptive agent in its environment. There's also some discussion of "symbol grounding" - making the jump between sensory and symbolic representations (we believe the problem goes away when defined as "accumulating invariances" instead).

Persistent belief in the hard problem of consciousness may stem from our inability to imagine both how our awareness would scale down to current computer levels of complexity, and how computer representations would feel when scaled to human proportions.

I find it fascinating to visualize internal representations created by AI, such as the image below. They do seem to capture the essential qualities and variations of broad classes, such as “Cat”. Often, when machines are wrong, their mistakes are reminiscent of the errors made by young children (e.g. “sheeps” or “shoeses” overgeneralization).

Figure: Google trained an artificial neural network (ANN) on stills from YouTube videos. Reverse engineering of one of the resulting neurons reveals this input; it is effectively a self-taught class-label for a common set of inputs. We call this class "Cat". Although the ANN didn't give it a name, it was able to experience this concept. If we had also taught it to speak like SIRI, it is likely the ANN would correctly associate this visual perception with the "cat" word. How is this different to human qualia of Cat?

Friday, 8 May 2015

By Gideon Kowadlo and David Rawlinson

In our last blog post, we discussed the repeating functional columnar structure of the neocortex, and the inconsistent terminology used to discuss it throughout the literature. As mentioned in that post, the function of the column is an important concept for understanding the function of the neocortex, and as a consequence, for designing algorithms that are inspired by the neocortex. We therefore require a clear nomenclature for discussing and working with these concepts.

As promised, here is a follow-up post with definitions of columns and associated concepts. The definitions are based on a paper by Rinkus [1] (introduced in the previous post). For decades it was widely accepted that the structure of columns in the neocortex is uniform across species and individuals. Recent studies have shown that to be not entirely correct [3] (summarised here and in [4]). Rinkus provides a well-founded functional basis for the definition of columns. This approach is more meaningful and robust, and directly relevant to understanding the neocortex algorithmically.

Layer

Function

Defining the cortical layers is necessary for any discussion of the cortex. The cortex is a surface that consists of several layers of cells. The density, morphology and function of cells vary between layers. The distribution of connections to other layers varies for each layer, but is relatively constant within a layer.

Although cells in any layer may connect to cells in all other layers, they do this only for cells within the same macrocolumn.

This means that columns extend through all cortex layers. Columns are organised perpendicularly to layers. Since the layers consist of different patterns of cell connectivity and type, layer distinctions are also functional distinctions.

Anatomy

Typically 5-7 layers, described as:

L1 Molecular Layer

(non-cellular, just axons)

[L2, L3] Small pyramidal cells (of two sizes)

L4 Spherical neurons.

[L5a, L5b] Large pyramidal cells (a & b often distinguished)

L6 multiform layer

Macrocolumn (also referred to as a Region or Hypercolumn)

Function

Overall input includes bottom-up input from the thalamus and lower cortical areas, top-down input from higher cortical areas, and horizontal input from adjacent cortical areas. This combined input is also referred to as the context. The macrocolumn responds to context-dependent input patterns.

A standard definition of a macrocolumn is a set of cells that have the same receptive field. In this definition, we specify that all cells in the macrocolumn don’t necessarily have the same learned receptive field, but they do share the same potential receptive field.

Anatomy

300–600 μm

60–80 minicolumns per macrocolumn

Minicolumn

Function

A subset of cells in the macrocolumn from which, for a given macrocolumn context (overall input pattern), a single winner-take-all (WTA) cell is selected. According to this definition, the function of the minicolumn is to enforce sparseness.

The fact that there is only one winner per minicolumn results in an SDR in the macrocolumn. Therefore, the macrocolumn output contains a signal from one winning cell in each minicolumn, in each layer (~70 cells in total per layer). In most implementations, WTA is implemented with a competitive process.

A standard definition of a minicolumn is that all cells within it describe a similar feature within the receptive field of the macrocolumn. This will occur in most cases, but it emerges from the function, which is the basis of our definition.

Anatomy

~20 cells (physically localised)

20–50 μm
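The winner-take-all definition above can be sketched in code. The cell counts follow the anatomy figures in this post (~70 minicolumns of ~20 cells each); the random activations are purely illustrative stand-ins for the context-dependent input pattern.

```python
import random

random.seed(1)

N_MINICOLUMNS = 70    # ~60-80 minicolumns per macrocolumn
CELLS_PER_MINI = 20   # ~20 cells per minicolumn

# Simulated activation of every cell in one macrocolumn layer,
# for some particular context (overall input pattern).
activation = [[random.random() for _ in range(CELLS_PER_MINI)]
              for _ in range(N_MINICOLUMNS)]

# Winner-take-all within each minicolumn: exactly one winning cell each.
sdr = [max(range(CELLS_PER_MINI), key=lambda c: cells[c])
       for cells in activation]

active_cells = len(sdr)                        # one winner per minicolumn
total_cells = N_MINICOLUMNS * CELLS_PER_MINI   # all cells in the layer
sparsity = active_cells / total_cells          # fraction of cells active
```

With these counts the layer output is 70 active cells out of 1400, i.e. 5% sparsity; the WTA rule is what enforces it.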

Potential Receptive Field

Function

A set of input bits that can be connected to a cell.

Anatomy

A set of axons that potentially could be synapsed by the dendrites of a neuron.

Learned Receptive Field

Function

The actual set of input bits synapsed to a cell after learning and the effects of mutual inhibition or self-organisation with its neighbours.

Anatomy

The synapses formed by the dendrites of a neuron on input axons.
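The distinction between the two receptive fields can be sketched as follows. The input size, the size of the potential field, and the random pruning rule are illustrative assumptions, not anatomical facts; the point is only that the learned field is a subset carved out of the potential field.

```python
import random

random.seed(2)

ALL_INPUT_BITS = set(range(100))  # illustrative input space

# Potential receptive field: the input bits a cell *could* synapse onto
# (the axons its dendrites could physically reach).
potential_rf = set(random.sample(sorted(ALL_INPUT_BITS), 40))

# Learned receptive field: the subset actually synapsed after learning,
# modelled here crudely as keeping a random fraction of the potential field.
learned_rf = {b for b in potential_rf if random.random() < 0.3}

# By construction, the learned field is contained in the potential field.
contained = learned_rf <= potential_rf
```

All cells in a macrocolumn share the same potential receptive field, but learning and mutual inhibition give each cell its own learned subset.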

Many researchers believe that the set of active cells in a single macrocolumn layer can be described as a Sparse Distributed Representation (SDR). We assume this to be the case in our definitions. SDRs can be understood as having the following properties:

Attributes

Compositionality

Distribution

SDR: Attributes

A subset of an SDR that has some semantic meaning; 1 or more bits, NOT the whole set of active bits in an SDR.

SDR: Compositionality

Compositionality of SDRs emerges from the fact that an SDR contains many attributes in combination.

SDR: Distribution

A distributed representation is one that consists of multiple attributes; those attributes can exist independently, be shared between representations, and overlap.
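These three properties can be sketched with plain bit-sets. The attribute names and bit indices below are invented for illustration: an attribute is a small set of bits, an SDR composes attributes by union, and shared attributes show up as overlapping bits between representations.

```python
# Attributes: small named bit-sets with semantic meaning (illustrative).
FURRY = {3, 17, 42}
FOUR_LEGS = {8, 17, 55}
MEOWS = {21, 60}

# Compositionality: an SDR is the union of its attributes' bits.
cat = FURRY | FOUR_LEGS | MEOWS
dog = FURRY | FOUR_LEGS           # shares two attributes with "cat"

# Distribution: shared attributes appear as overlapping active bits.
overlap = cat & dog
```

Here the overlap between the "cat" and "dog" SDRs is exactly the bits of their shared attributes, while the "meows" attribute remains unique to "cat".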