Saturday, November 22, 2014

Intuitively, it is tempting (to some people anyway!) to think of the potential future Technological Singularity as somehow "sucking us in" -- as a future force that reaches back in time and guides events so as to bring about its future existence. Terence McKenna was one of the more famous and articulate advocates of this sort of perspective.

This way of thinking relates to Aristotle's notion of "Final Causation" -- the final cause of a process being its ultimate purpose or goal. Modern science doesn't have much of a place for final causes in this sense; evolutionary theories often seem to be teleological in a "final causation" way on the surface, but can generally be reformulated otherwise. (We colloquially will say "evolution was trying to do X," but our detailed models of how evolution was working toward X don't require any notion of "trying", only notions of mutation, crossover and differential survival...)

It seems to me, though, that the Surprising Multiverse theory presented in one of my recent blog posts (toward the end), actually implies a different sort of final causation -- not quite the same as what Aristotle suggested, but vaguely similar. And this different sort of final causation does, in a sense, suggest that the Singularity may be sucking us in....

The basic concept of the Surprising Multiverse theory is that, in the actual realized rather than merely potential world, patterns with high information-theoretic surprisingness are more likely to occur. This implies that, among the many possible universes consistent with a given set of observations (e.g. a given history over a certain interval of time), those universes containing more surprisingness are more likely to occur.

Consider, then, a set of observations during a certain time interval -- a history as known to a certain observer, or a family of histories as known to a set of communicating observers -- and the question of what will happen AFTER that time interval is done. For instance, consider human history up till 2014, and the question of the human race's future afterwards.

Suppose that, of the many possible futures, some contain more information-theoretic surprisingness. Then, if the Surprising Multiverse hypothesis holds, these branches of the multiverse -- these possible universes -- will have boosted probabilities, relative to other options. The surprisingness weighting may then be viewed intuitively as "pulling the probability distribution over universes, toward those with greater surprisingness."
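To make the "pulling" picture concrete, here is a minimal sketch of one way such a reweighting could work. The post does not specify the weighting function, so the exponential boost, the `strength` parameter, the three named futures and all the numbers below are illustrative assumptions, not part of the Surprising Multiverse theory itself.

```python
import math

def reweight_by_surprisingness(prior, surprisingness, strength=1.0):
    """Boost each future's prior probability by exp(strength * surprisingness),
    then renormalize. The exponential form is an assumption made for
    illustration; any increasing function of surprisingness would do."""
    weights = {
        name: p * math.exp(strength * surprisingness[name])
        for name, p in prior.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical futures with made-up prior probabilities and surprisingness scores.
prior = {"business_as_usual": 0.7, "collapse": 0.2, "singularity": 0.1}
surprisingness = {"business_as_usual": 0.0, "collapse": 1.0, "singularity": 3.0}

posterior = reweight_by_surprisingness(prior, surprisingness)
for name in prior:
    print(f"{name}: {prior[name]:.2f} -> {posterior[name]:.2f}")
```

With `strength=0` the prior comes back unchanged; as `strength` grows, more and more of the probability mass is pulled toward the most surprising branch.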

The "final cause" of some pattern P according to observer O, may be viewed as the set of future surprising patterns Q that are probabilistically caused by P, from the perspective of observer O. (There are many ways to quantify the conceptual notion of probabilistic causation -- perhaps the most compelling is as "P having nonneutralized positive component effect on Q, based on the knowledge of O", as defined in the interesting paper A Probabilistic Analysis of Causation.)

So the idea is: final causation can be viewed as the probabilistic causation that has the added oomph of surprisingness (and then viewed in the backwards direction). A final cause of P is something that is probabilistically caused by P, and that has enough surprisingness to be significantly overweighted in the Surprising Multiverse weighting function that balances P's various possible futures.

So what of the Singularity? We may suppose that a Technological Singularity will display a high degree of probabilistic surprisingness, relative to other alternative futures for humanity and its surrounds. If so, branches of the multiverse involving a Singularity would be preferentially weighted higher, according to the Surprising Multiverse hypothesis. The Singularity is thus a final cause of human history. QED....

A fair example of the kind of thing that passes through my head at 2:12 AM Sunday morning ;-) ...

This is a human tragedy like every single death; and it's also the kind of thing one can expect from time to time in the course of development of any new technology. I have no doubt that progress toward tourist spaceflight will continue apace: inevitable startup struggles notwithstanding, it's simply an idea whose time has come.

Every tragedy is also an occasion for reflection on the lessons implicit in the tragic events.

(In the center picture, SpaceShipTwo is shown in the center, between the motherships that provide its initial lift.)

For me, watching the struggles of the Virgin Galactic approach to spaceflight has also been a bit of a lesson in the pluses and minuses of prize-driven technology development. SpaceShipTwo is the successor to SpaceShipOne, which won the Ansari X-Prize for commercial spaceflight a decade ago. At the time it seemed that the Ansari X-Prize would serve at least two purposes:

Raise consciousness generally about the viability of commercial spaceflight, particularly of the pursuit of spaceflight by startups and other small organizations rather than governments and large government contractors

Concretely help pave a way toward commercially viable spaceflight, via progressive development of the winning spaceflight technology into something fairly rapidly commercially successful

It seems clear that the first goal was met, and wonderfully well. Massive kudos are due to the X-Prize Foundation and Ansari for this. The press leading up to and following from the Ansari X-Prize made startup spaceflight into a well-recognized "thing" rather than a dream of a tiny starry-eyed minority.

Regarding the second goal, though, things are much less clear. Just a little before the tragic SpaceShipTwo crash, a chillingly prescient article by Doug Messier was posted, discussing the weaknesses of the SpaceShipTwo design from a technical perspective. If you haven't read it yet, I encourage you to click and read it through carefully -- the article you're reading now is basically a reflection on some of the points Messier raises, and a correlation of some of those points with my own experiences in the AI domain.

Messier's article traces SpaceShipTwo's development difficulties back to the SpaceShipOne design, on which it was based -- and points out that this design may well have been chosen (implicitly, if not deliberately) based on a criterion of winning the Ansari X-Prize quickly and at relatively low cost, rather than a criterion of serving as the best basis for medium-term development of commercial spaceflight technology.

As Messier put it,

It turns out that reaching a goal by a deadline isn’t enough; it matters how you get there. Fast and dirty doesn’t necessarily result in solid, sustainable programs. What works well in a sprint can be a liability in a marathon.

However, while I am fascinated by Messier's detailed analysis of the SpaceShipOne and SpaceShipTwo technologies, I'm not sure I fully agree with the general conclusion he draws -- or at least not with the way he words his conclusions. His article is titled "Apollo, Ansari and the Hobbling Effects of Giant Leaps" -- he argues that a flaw in both the Ansari X-Prize approach and the Apollo moon program was an attempt to make a giant leap, by hook or by crook. In both cases, he argues, the result was a technology that achieved an exciting goal using a methodology that didn't effectively serve as a platform for ongoing development.

Of course, the inspirational value of putting a man on the moon probably vastly exceeded the technical value of the accomplishment - and the inspirational value was the main point at the time. But I think it's also important to make another point: the problem isn't that pushing for Giant Leaps is necessarily bad. The problem is that pushing for a Giant Leap that is defined for non-technical, non-scientific reasons, with a tight time and/or financial budget, can lead to "fast and dirty" style short-cuts that render the achievement less valuable than initial appearances indicate.

That is: If the goal is defined as "Achieve Giant Leap Goal X as fast and cheap as possible," then the additional goal of "Create a platform useful for leaping beyond X" is not that likely to be achieved as well, along the way. And further -- as I will emphasize below -- I think the odds of the two goals being aligned are higher if Giant Leap Goal X emerges from scientific considerations, as opposed to from socially-oriented marketing or flashy-demonstration considerations.

It's interesting that Messier argues against Giant Leaps and in favor of incremental development. And yet there is a sense in which SpaceShipOne/Two represents incremental development at its most incremental. I'm thinking of the common assumption in the modern technology world, especially in Silicon Valley, that the best path to radical technological success is also generally going to be one that delivers the most awe-inspiring, visible and marketable results at each step of the way. The following graphic is sometimes used to illustrate this concept:

On the surface, the SpaceShipTwo approach exemplifies this incremental development philosophy perfectly. It's a spaceplane, an incremental transition between plane and spaceship; and the spaceship portion is lifted high into the air initially by a plane. It's precisely because of taking this sort of incremental approach that SpaceShipOne was able to win the Ansari X-Prize with the speed and relatively modest cost that it did.

On the other hand, Messier favors a different sort of incremental spacecraft development -- not incremental steps from plane to plane/spacecraft to spacecraft, but rather ongoing incremental development of better and better materials and designs for making spacecraft, even if this process doesn't lead to commercial space tourism at the maximum speed. In fact, scientific development is almost always incremental -- the occasional Eureka moment notwithstanding (and Eureka moments tend to rest on large amounts of related incremental development).

It seems important, in this context, to distinguish between incremental basic scientific/technological progress and incremental business/marketing/demonstration progress. Seeking incremental scientific/technological progress makes sense (though other issues emerge here, in terms of pathologies resulting from trying too hard to quantitatively and objectively measure incremental scientific/technological progress -- I have discussed this in an AGI context before). But the path of maximally successful incremental business/marketing/demonstration progress often does not involve the most sensible incremental scientific path -- rather, it sometimes involves "fast and dirty" technological choices that don't advance science so much at all.

In my own work on AGI development, I have often struggled with these aspects of development. The incremental business/marketing/demo development approach has huge commercial advantages, as it has more potential of giving something money-making at each step of the way. It also has advantages in the purely academic world, in terms of giving one better demos of incremental progress at each step of the way, which helps with keeping grant funding flowing in. The advantages also hold up in the pure OSS software domain, because flashy, showy incremental results help with garnering volunteer activity that moves an OSS project forward.

However, when I get into the details of AGI development, I find this "incremental business/marketing/demo" approach often adds huge difficulty. In the case of AGI the key problem is the phenomenon I call cognitive synergy, wherein the intelligence of a cognitive system largely comes from the emergent effects of putting many parts together. So, it's more like the top picture in the above graphic (the one that's supposed to be bad) rather than the bottom picture. Building an AGI system with many parts, one is always making more and more scientific and technological progress, step by step and incrementally. But in terms of flashy demos and massive commercial value, one is not necessarily proceeding incrementally, because the big boost in useful functionality is unlikely to come before a lot of work has been done on refining individual parts and getting them to work together.

Google, IBM and other big companies recently redoubling their efforts in the AI space are trying to follow the bottom-picture approach, and work toward advanced AGI largely via incrementally improving their product and service functionalities using AI technology. Given the amount of funding and manpower they have, they may be able to make this work. But where AGI is concerned, it's pretty clear to me that this approach adds massive difficulty to an already difficult task.

One lesson the SpaceShipOne/Two story has, it seems to me, is that aggressive pursuit of the "maximize incremental business/marketing/demo results" path has not necessarily been optimal for commercial spaceflight either. It has been fantastically successful marketing-wise, but perhaps less so technically.

I've been approached many times by people asking my thoughts on how to formulate a sort of X-Prize for AGI. A couple times I put deep thought into the matter, but each time I came away frustrated -- it seemed like every idea I thought of was either

"Too hard", in the sense that winning the prize would require having a human-level AGI (in which case the prize becomes irrelevant, because the rewards for creating a human-level AGI will be much greater than any prize); OR

Susceptible to narrow-AI approaches -- i.e. likely to end up rewarding teams who pushed toward winning the prize quickly via taking various short-cuts, using approaches that probably wouldn't be that helpful toward achieving human-level AGI eventually

The recently-proposed AI TED-talk X-Prize seems to me likely to fall into the latter category. I can envision a lot of approaches to making AIs to give effective TED talks, that are basically "specialized TED talk giving machines" designed and intensively engineered for the purpose, without really having architectures suitable as platforms for long-term AGI development. And if one had a certain fixed time and money budget for winning the AI TED-talk X-Prize, pursuing this kind of specialized approach might well be the most rational course. I know that if I myself join a team aimed at winning the prize, there will be loads of planning discussions aimed at balancing "the right way to do AGI design/development" versus "the cleverest way to win the prize."

On the other hand, as a sci-tech geek I need to watch out for my tendency to focus overly on the technical aspects. The AI TED-Talk X-Prize, even if it does have the shortcomings I've mentioned above, may well serve amazingly well from a marketing perspective, making the world more and more intensely aware of the great potential AI holds today, and the timeliness of putting time and energy and resources into AGI development.

I don't want to overgeneralize from the SpaceShipTwo crash -- this was a specific, tragic event; and any specific event has a huge amount of chance involved in it. Most likely, in a large percentage of branches of the multiverse, the flight Friday went just fine. I also don't want to say that prize-driven development is bad; it definitely has an exciting role to play, at very least in helping to raise public consciousness about technology possibilities. And I think that sometimes the incremental business/marketing/demo progress path to development is exactly the right thing. As well as being a human tragedy, though, I think the recent terrible and unfortunate SpaceShipTwo accident does serve as a reminder of the limitations of prize-driven technology development, and a spur to reflect on the difficulties inherent in pursuing various sorts of "greedy" incremental development.

Saturday, November 01, 2014

A little piece of patternist analytical philosophy to brighten up your weekend.... I was thinking this stuff through today while going about in Tai Po running errands. Most notably, the back bumper of my car fell off yesterday, and I was trying to find someone to repair it. A friendly auto repair shop ended up reattaching the bumper with duct tape. A real repair is pending them finding a replacement for the broken plastic connector between the bumper and the car, in some semi-local junkyard. Egads! Well, anyway....

The concept of "representation" is commonly taken as critical to theories of cognition. In my own work on the foundations of cognition, I have taken the concept of "pattern" as foundational, and have characterized "pattern" as meaning "representation as something simpler."

But what is representation? What is simplicity?

In this (rather abstract, theoretical) post, I will suggest a way of grounding the concepts of representation and simplicity (and hence, indirectly, pattern) in terms of consciousness -- or more specifically, in terms of the concept of attention in cognitive systems.

I'll speak relatively informally here, but I'm confident these ideas can be formalized mathematically or philosophically if one wishes...

From Attention to Representation

Suppose one has an intelligent system containing a large amount of contents; and each item of contents has a certain amount of attention associated with it at a given point in time.

(One can characterize attention either energetically or informationally, as pointed out in Section 2.6 of my recent review paper on consciousness. Given the close connection between energy and information in physics, these two characterizations may ultimately be the same thing.)

In real-world cognitive systems, attention is not distributed evenly across cognitive contents. Rather, it seems to generally be distributed in such a way that a few items have a lot of attention, and most items have very little attention, and there is a steep but continuous slope between the former and latter categories. In this case, we can speak about the Focus of consciousness as (a fuzzy set) consisting of those items that have a lot of attention during a certain interval; and the Fringe of consciousness as (a fuzzy set) consisting of those items that have more attention than average during a certain interval, but not as much as the items in the Focus do. (The general idea that human consciousness has a Focus and a Fringe goes back at least to William James.)
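As a rough illustration of Focus and Fringe as fuzzy sets, the sketch below assigns each item a degree of membership in each. The particular membership functions are assumptions chosen for simplicity (a sharpened ratio to the most-attended item for Focus; "above average AND NOT in Focus" for Fringe, using the usual min and 1 - x fuzzy operators), and the item names and attention values are made up.

```python
def focus_and_fringe(attention, sharpness=4.0):
    """Return {item: (focus_degree, fringe_degree)} for non-negative attention values.

    Assumed membership functions (not taken from the post):
      focus  = (attention / max attention) ** sharpness
      fringe = min(above_average, 1 - focus), with above_average crisp (0 or 1)
    """
    top = max(attention.values())
    mean = sum(attention.values()) / len(attention)
    result = {}
    for item, a in attention.items():
        focus = (a / top) ** sharpness if top > 0 else 0.0
        above_average = 1.0 if a > mean else 0.0
        fringe = min(above_average, 1.0 - focus)
        result[item] = (focus, fringe)
    return result

# Hypothetical attention levels: a few items get most of the attention.
attention = {"tree_image": 10.0, "backyard": 6.0, "weather": 1.0,
             "taxes": 0.5, "shoes": 0.5}
degrees = focus_and_fringe(attention)
for item, (focus, fringe) in degrees.items():
    print(f"{item:10s} focus={focus:.2f} fringe={fringe:.2f}")
```

Here `tree_image` sits squarely in the Focus, `backyard` is mostly Fringe, and the remaining below-average items belong to neither.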

It seems to me one can ground the notion of representation in the structure of consciousness, specifically in the relation between the Focus and the Fringe.

Namely, one can say that ... R represents E, to system S, if: In the mind of system S, when R is in the Focus, this generally implies E is likely to be in the Fringe at around the same time (perhaps at the same time, perhaps a little earlier, or perhaps a little later).
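One way to read this definition operationally is as a conditional frequency over a log of attention snapshots. The sketch below is only that: it treats Focus and Fringe as crisp sets rather than fuzzy ones, and the `window` and `threshold` parameters and the snapshot data are hypothetical choices, not something the definition fixes.

```python
def represents(snapshots, r, e, window=0, threshold=0.5):
    """Estimate whether R represents E from a list of (focus_set, fringe_set)
    snapshots, one per time step.

    Returns (verdict, rate), where rate is the fraction of steps with R in the
    Focus for which E was in the Fringe within `window` steps before or after.
    """
    trials = 0
    hits = 0
    for t, (focus, _) in enumerate(snapshots):
        if r not in focus:
            continue
        trials += 1
        start = max(0, t - window)
        stop = min(len(snapshots), t + window + 1)
        if any(e in snapshots[u][1] for u in range(start, stop)):
            hits += 1
    if trials == 0:
        return False, 0.0
    rate = hits / trials
    return rate >= threshold, rate

# Made-up log: (items in Focus, items in Fringe) at successive moments.
snapshots = [
    ({"phrase"}, {"tree_images"}),
    ({"coffee"}, {"kitchen"}),
    ({"phrase"}, set()),
    ({"coffee"}, {"kitchen"}),
    ({"phrase"}, {"tree_images"}),
]
print(represents(snapshots, "phrase", "tree_images"))
print(represents(snapshots, "coffee", "tree_images"))
```

Setting `window` above zero allows for the "a little earlier, or perhaps a little later" clause, at the cost of picking up more coincidental co-occurrences.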

As a single example, consider the typical simplifying model of the visual cortex as a processing hierarchy. In this case, we may say that when we are visually remembering a tree:

the state of the upper levels of the visual hierarchy is in Focus, along with bits and pieces of the lower levels

the full state of the whole visual hierarchy is mostly contained in Fringe

So the rough visual image we have of the tree in our "mind's eye" at a certain point in time, represents the richer visual image of the tree we have in our broader sensorimotor memory.

On the other hand, the phrase "the tree in the middle of my old backyard" may represent my stored visual images of that tree as well, if when that phrase occurs in my Focus (because I heard it, said it, or thought about it), my stored visual images rise into my Fringe (rising up from the deeper, even less attended parts of my memory).

From Attention to Simplicity

I'd like to say that R is a pattern in E if: R represents E, and R is simpler than E. But this obviously raises the question of what constitutes simplicity....

In prior writings, I have tended to take simplicity as an assumptive prior concept. That is, I have assumed that each mind has its own measure of simplicity, and that measurement of pattern is relative to what measure of simplicity one chooses.

I still think this is a good way to look at it -- but now I'm going to dig a little deeper into the cognitive underpinnings of how each mind generates its own measure of simplicity.

Basically, I propose we can consider E as simpler than F, if it's generally possible to fit more stuff in the Focus along with E, than along with F.

Note that both E and F may be considered as extending over time, in this definition. Sometimes they may be roughly considered as instantaneous, but this isn't the general case.

One technical difficulty with this proposal is how to define "more." There are many ways to do that; one is as follows....

Define simple_1 as follows: E is simpler_1 than F if it's generally possible to fit a greater number of other coherent cognitive items in the Focus along with E, than with F. (This relies on the concept of "coherence" of a cognitive item as a primitive -- or in other words, it assumes that the sense or notion of what is a "coherent whole" is available to use to help define simplicity.)

Then define simple_2 as: E is simpler_2 than F if it's generally possible to fit a set of cognitive items with less simplicity_1 in the Focus along with E, as compared to with F.

One can extend this to simple_3, simple_4, etc., recursively.
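A toy model may help fix the idea of simple_1. The sketch below assumes, purely for illustration, that the Focus has a fixed numeric capacity and that each coherent item occupies a fixed "load"; neither assumption comes from the definition above, which only talks about what fits. It covers simple_1 only, not the recursive simple_2 and beyond.

```python
def companions_that_fit(item, loads, capacity):
    """Greatest number of other items that fit in the Focus alongside `item`,
    under the assumed capacity/load model. Taking the smallest items first
    maximizes the count."""
    remaining = capacity - loads[item]
    count = 0
    for name, load in sorted(loads.items(), key=lambda kv: kv[1]):
        if name == item:
            continue
        if load > remaining:
            break
        remaining -= load
        count += 1
    return count

def simpler_1(e, f, loads, capacity):
    """E is simpler_1 than F if more other items fit in the Focus with E than with F."""
    return companions_that_fit(e, loads, capacity) > companions_that_fit(f, loads, capacity)

# Hypothetical loads for a handful of cognitive items.
loads = {"rough_tree": 2, "detailed_tree": 6, "phrase": 1, "bumper": 1, "errand": 2}
capacity = 7

print(companions_that_fit("rough_tree", loads, capacity))
print(companions_that_fit("detailed_tree", loads, capacity))
print(simpler_1("rough_tree", "detailed_tree", loads, capacity))
```

In this toy setup the rough image leaves room for three other items and the detailed image for only one, so the rough image counts as simpler_1.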

According to this approach, we would find a detailed image of a tree is less simple than a rough, approximate image. When visualizing a detailed image, we keep more different samples of portions of the detailed image in Focus, leaving less room for anything else.

Similarly, the concept of the function x^2, once we understand it, takes up much less space in Focus than the procedure "take a number and multiply it by itself", and much less than a large table of pairs of numbers and their squares. Once an abstraction is learned (meaning that holding it in Focus causes appropriate knowledge to appear in Focus), and mastered (meaning that modifying it while it's in Focus causes the contents of Fringe to change appropriately), then it can provide tremendous simplification over less abstract formulations of the same content.

From Attention to Pattern

So, having grounded both representation and simplicity in terms of the structure of attention, we have grounded pattern in terms of the structure of attention. This is interesting in terms of the cognitive theory I outlined in my 2006 book The Hidden Pattern and elsewhere, which grounded various aspects of intelligence in terms of the "pattern" concept.

As I've previously grounded so much of cognition in terms of pattern, if one then grounds pattern in terms of consciousness, one is then in effect grounding the structure and dynamics of cognition in terms of simple aspects of the structure of consciousness. This can be viewed mathematically and formally, and/or phenomenologically.

Logical and Linguistic Representation

Often when one hears about "representation", the topic at hand is some sort of formal, logical or linguistic representation. How does that kind of representation fit into the present framework?

Formal or semi-formal systems like (formal or natural) languages or mathematical theories may be viewed as systems for generating representations of cognitive items.

(What I mean by a semi-formal system is: A system for generating entities for which, for a variety of values of x less than 1, we have a situation where a fraction x of the system's activity can be explained by a set of n(x) formal rules. Of course n(x) will increase with x, and might sometimes increase extremely fast as x approaches 1. Natural languages tend to be like this.)
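The function n(x) can be illustrated with a small sketch. It assumes we already know which observed productions each candidate rule explains, and it uses a greedy set-cover heuristic, so the result is an upper estimate of n(x) rather than the true minimum. The rule names and coverage sets are invented.

```python
def rules_needed(rule_coverage, total, x):
    """Greedy estimate of n(x): how many rules are needed to explain at least
    a fraction x of `total` observed productions. Returns None if the given
    rules cannot reach that fraction."""
    covered = set()
    remaining = dict(rule_coverage)
    n = 0
    while len(covered) / total < x:
        # Pick the rule that explains the most not-yet-explained productions.
        best = max(remaining, key=lambda r: len(remaining[r] - covered), default=None)
        if best is None or not (remaining[best] - covered):
            return None
        covered |= remaining.pop(best)
        n += 1
    return n

# Ten hypothetical productions (0..9): two broad rules, then one-off idioms.
rule_coverage = {
    "subject_verb_object": {0, 1, 2, 3, 4},
    "question_inversion": {5, 6, 7},
    "idiom_a": {8},
    "idiom_b": {9},
}
for x in (0.5, 0.8, 0.9, 1.0):
    print(x, rules_needed(rule_coverage, total=10, x=x))
```

The pattern to notice is that the last few percent of coverage cost as many rules as the first eighty percent, which is the sense in which n(x) can climb steeply as x approaches 1.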

When we have a situation where

R represents E

R is a construct created in some formal or semi-formal system S

E is not a construct in S; rather, E is connected via some chain of representations with sensory and/or motor data

then we can say that E "grounds" R.

Grounding tends to be useful in the case of systems where

R is commonly simpler than E (or at least, there's some relatively easy way to tell what will be the situations in which R is going to be simpler than E)

There is a methodology for going from E to the formal / semi-formal representation R, that doesn't take a huge amount of attention (once the mind is practiced at the methodology)

Carrying out manipulations within the formal / semi-formal system commonly generates new formal / semi-formal constructs that represent useful things

These criteria hold pretty well in the case of human languages, and most branches of mathematics (I suppose the jury's still out on, say, the theory of inaccessible cardinals....)

Note that one system may be grounded in another system. For instance, formal grammars of English are grounded in natural English language productions -- which in turn are grounded, for each language user, in sensorimotor experience.

If it is simple to generate new representations using a certain system, then this means the process of representation-generation is a pattern in the set of representations generated -- i.e. it's simpler to hold that process in Focus over an interval of time, than to hold the totality of representations generated by it in Focus over an interval of time. The formal and semi-formal systems adopted by real-world minds are generally adopted because grounding is useful for them, and their process of representation-generation is a pattern.

This is all quite abstract -- I'll try to make it a little more concrete now.

Suppose I tell you about a new kind of animal called a Fnorkblein, which much enjoys Fbljorking, especially in the season of YingYingYing. You can then produce sentences describing the actual or potential doings of these beings, e.g. "If Fnorkbleins get all their joyful Fbljorking done in YingYingYing, they may be less happy in other seasons."

These sentences will be representations of, and patterns in, your visual images of corresponding scenes (your mental movie of the Fnorkbleins romping and Fbljorking in the throes of their annual YingYingYing celebration, and so forth). In the terminology above, these images will ground the sentences.

Furthermore, the process of formulating these Fnorkblein-ful sentences will take relatively little Focus for you, because you know grammar. If you didn't know grammar, then formulating linguistic patterns representing Fnorkblein-relevant images would require a lot more work, i.e. a lot more Focus spent on the particulars of Fnorkblein-ness.

Of course, people can use grammar fairly effectively -- including about Fnorkbleins -- without any knowledge of any explicit formalization of grammar. However, if one wants people to stick close to a specific version of grammar, rather than ongoingly improvising in major ways, it does seem effective to teach them explicit formalized rules that capture much of everyday grammatical usage. That is, if the task at hand is not deploying the grammar one knows in practical contexts, but rather communicating or gaining knowledge regarding which sentences are considered grammatical in a certain community -- then formalizations of grammar become very useful. This is the main reason grammar is taught in middle school -- giving a bit of theory alongside the real-world examples a child hears in the course of their life helps the child learn to reliably produce grammatical sentences according to recognized linguistic patterns, rather than improvising on the patterns they've heard, as happens in informal communication. (And it may be that formal grammars are also useful for teaching AIs grammar, to help overcome their lack of the various aspects of human life and embodiment that human children rely on to pick up grammar implicitly from examples -- but that's a whole other story.)

It seems the act of teaching/learning the rules of grammar of a language constitutes a pattern in "the act of communicating/learning the set of sentences that are grammatical in that language", in the sense that: If one has to tell someone else how to reliably assess which sentences are grammatical in a certain language (as utilized within a certain specific community), a lot less Focus will be spent by telling them the rules of grammar alongside a list of examples, than by just giving them a long list of examples and leaving them to do induction. When teaching grammar, while a rule of grammar is in the Focus, the specific examples embodying that rule of grammar are in the Fringe. While passing the rules of grammar from teacher's Focus to student's Focus, the real point is what is being represented: the passage of sets of sentences judged as grammatical/ungrammatical from teacher's Fringe to student's Fringe.

The rules of grammar, then, may be described as a "subpattern" in the act of communicating a natural grammar (a subpattern meaning they are part of the pattern of teaching the rules of grammar). This may seem a bit contorted, but it's not as complicated as the design of the computer I'm typing this on. Formal grammars, like MacBooks, are a complex technology that only emerged quite recently in human evolution.

Interpretations....

And so, Bob's your uncle.... Starting from the basic structure of consciousness, it's not such a big leap to get to pattern -- which brings one the whole apparatus of mind as a system for recognizing patterns in itself and the world -- and to systems like languages and mathematics.

One can interpret this sort of analysis multiple ways ... for instance:

Taking physical reality as the foundation, one can study the structure of consciousness corresponding to a certain physical system (e.g. a brain), and then look at the structure of the mind corresponding to that physical system as consequent from the structure of its consciousness.

Taking experience as the foundation, one can take entities like Unity and Attention as primary, and then derive concepts like Focus and Fringe, and from there get to Representation and Pattern and onwards -- thus conceptually grounding complex aspects of cognition in terms of phenomenological basics.