I've never read an article about AI before this one. I feel qualified to comment because that places me squarely in your target audience for this piece. I found it to be an interesting introduction to the 3 questions you highlighted for discussion. I found the discussion of "friendly AI" particularly compelling/scary.

I do have a couple of comments on style that might be helpful:

In the opening section, the three paragraphs on what you are not planning to do may be useful signaling for people who are fed up with some elements of the AI discourse. But as a newcomer, it made me feel as though you were on the defensive from the beginning. I also found parts of these paragraphs a bit confusing, in contrast to the rest of the article which is clearly aimed at newcomers. E.g. I don't know what a classical Von Neumann computing architecture is, or what it means that you aren't assuming the continuation of Moore's law.

Section 1 ("from here to AI") is a good introduction to how to think about the tricky problem of 'when is it coming?'. But it wasn't clear to me how your conclusion that "there is a significant probability that AI will be created this century" follows from section 1. It might help if you were to spell out which of the seemingly difficult AI benchmarks have been reached (are these the Accelerators you discussed before?).

By "AI," we refer to general AI rather than narrow AI. That is, we refer to "systems which match or exceed the cognitive performance of humans in virtually all domains of interest"

My problem is:

This appears to follow from the definition of intelligence as "optimization power [across many domains] divided by resources used"

The definition appears to be less precise

Given that precision of the basic terminology is really important here, would it be worth pointing out that the definition of AI basically follows from the definition of intelligence, rather than making them sound like related but subtly different concepts?

One more for the list of reasons why AGIs may be more powerful than humans: Self-transparency. If an AI makes a mistake and finds out later, it can do forensic analysis of its memories, and trust that those memories reflect its actual past mental states. Humans can't do that very well, because our memory isn't detailed or reliable enough, and usually contains only the output of our thought processes and not the procedural details. An AI, on the other hand, could be designed in such a way that it can perfectly reconstruct any past state, by storing occasional snapshots and replaying inputs. It wouldn't have to wait for a real-world mistake to benefit, either; it could just as easily test itself on hypothetical scenarios, and even manipulate its own memories and inputs to make the tests realistic.

This is especially powerful when combined with duplication and editability. It can collect a test suite of puzzles and reference solutions, and have very good testing of self modifications. By running slightly different versions of itself on the same input, starting from the same state, it would bypass the uncontrolled variability that plagues human cognitive psychology.
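To make the "snapshots plus replayed inputs" idea concrete, here is a rough sketch of how exact state reconstruction could work for a toy deterministic agent. Everything in it (the `ReplayableAgent` class, its state layout, the snapshot interval) is a hypothetical illustration, not a claim about how a real AI would be built; the only real requirement it demonstrates is that the state transition be deterministic.

```python
import copy

class ReplayableAgent:
    """Toy deterministic agent whose past states can be reconstructed exactly."""

    def __init__(self, snapshot_every=3):
        self.state = {"total": 0, "steps": 0}
        self.snapshot_every = snapshot_every
        self.snapshots = {0: copy.deepcopy(self.state)}  # step -> saved state
        self.inputs = []                                 # complete input log

    @staticmethod
    def update(state, x):
        # Deterministic transition: same state + same input -> same next state.
        return {"total": state["total"] + x, "steps": state["steps"] + 1}

    def observe(self, x):
        self.inputs.append(x)
        self.state = self.update(self.state, x)
        # Store only occasional snapshots rather than every state.
        if self.state["steps"] % self.snapshot_every == 0:
            self.snapshots[self.state["steps"]] = copy.deepcopy(self.state)

    def state_at(self, step):
        """Rebuild the state after `step` inputs from the nearest earlier snapshot."""
        base = max(s for s in self.snapshots if s <= step)
        state = copy.deepcopy(self.snapshots[base])
        for x in self.inputs[base:step]:
            state = self.update(state, x)
        return state

agent = ReplayableAgent()
for x in [5, 1, 4, 2, 8]:
    agent.observe(x)

past = agent.state_at(4)  # reconstructed from the step-3 snapshot plus one replayed input
```

The same mechanism gives the controlled comparison mentioned above: two slightly different `update` functions can be started from the same snapshot and fed the same input log.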

Many experts and groups have tried to predict AI. Unfortunately, experts often perform little better than chance

When it comes to making predictions, what exactly does "better than chance" mean? Assuming a prediction is in the form of a probability distribution, all we can say is that one probability distribution scored better than another.
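As a rough illustration of "one distribution scored better than another", here is a minimal sketch using the logarithmic scoring rule, with a 50/50 forecaster standing in for one possible reading of "chance". The forecasts and outcomes are made-up numbers for illustration only, and `log_score` is a hypothetical helper name.

```python
import math

def log_score(forecasts, outcomes):
    """Mean log score: average log-probability assigned to what actually happened.

    forecasts: probabilities assigned to each event occurring
    outcomes:  booleans, True if the event occurred
    Scores are <= 0; closer to 0 is better.
    """
    total = 0.0
    for p, happened in zip(forecasts, outcomes):
        total += math.log(p if happened else 1.0 - p)
    return total / len(outcomes)

# Hypothetical illustrative data (not taken from the paper).
outcomes = [True, False, True, True]
expert = [0.9, 0.2, 0.7, 0.6]
coin_flip = [0.5, 0.5, 0.5, 0.5]  # assumed "chance" baseline: maximal ignorance

expert_score = log_score(expert, outcomes)
baseline_score = log_score(coin_flip, outcomes)
```

Under this reading, "better than chance" would just mean `expert_score > baseline_score`, which still depends on the choice of baseline and scoring rule.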

I have to compliment you on this paper: as with Machine Ethics and Superintelligence, I can't help but consider this one of the clearest, best argued, and most-sticking-to-the-point Singularity papers that I've read. This also seems to be considerably improved from some of the earlier drafts that I saw.

But uncertainty is not a “get out of prediction free” card. We either will or will not choose to encourage WBE development, will or will not help fund AI safety research, etc. The outcomes of these choices will depend, among other things, on whether AI is created in the near future.

I think I know what you mean by this paragraph, but its intended meaning is unclear to someone who doesn't. Also "the outcomes of these choices will depend, among other things, on whether AI is created in the near future" took me a moment to parse - as it is, it seems to be saying "the outcome of these choices will depend on the outcome of these choices". (The first rule of Tautology Club is the first rule of Tautology Club.)

I suggest something like:

"But uncertainty is not a “get out of prediction free” card. We still need to decide whether or not to encourage WBE development, whether or not to help fund AI safety research, etc. Deciding either way already implies some sort of prediction - choosing not to fund AI safety research suggests that we do not think AI is near, while funding it implies that we think it might be."

or

"But uncertainty is not a “get out of prediction free” card. We still need to decide whether or not to encourage WBE development, whether or not to help fund AI safety research, etc. These choices will then lead to one outcome or another. Analogously, uncertainty about the reality and effects of climate change does not mean that our choices are irrelevant, that any choice is as good as any other, or that better information could not lead to better choices. The same is true for choices relating to the intelligence explosion."

and that brain size and IQ correlate positively in humans, with a correlation coefficient of about 0.35 (McDaniel 2005).

I did include Rushton in my research summary, IIRC, but it's probably a good idea to not cite him - Rushton is poison! An editor on Wikipedia tried to get me punished by the Arbcom for citing Rushton, even. (What saved me was being very very careful to not ever mention race in any form and specifically disclaim it.)

If we fail to implement AI safely on our first attempt, we may not get a second chance (Yudkowsky 2008a).

...seems wrong. We have failed to implement AI safely a million times by now. Obviously we don't get only one shot at the problem.

The sentence seems to be trying to say that we only have to terminally screw up once. I don't really see why it doesn't just say that. Except that what it does actually say sounds scarier - since it implies that any failure is a terminal screw up. That would be scary - if it was true. However, it isn't true.

The "How long, then, before AI?" section could benefit by being a teensy bit more specific. "a significant probability that AI will be created this century" is surely possible to improve upon. IMO, best is to give a probability density function.

"This may not be the case if early AIs require quantum computing hardware, which is less likely to be plentiful and inexpensive than classical computing hardware at any given time."

Could you explain how this might work? I'm concerned that if there is less QC hardware than classical hardware, then we would likely know less about QC hardware (compared to classical) than a nascent AI. Would it be safer to run it on, e.g., a classical "potato"? Or are you saying that a classical potato would upload itself onto better classical hardware asap, but a quantum potato would be unable to do so?

Surely this doesn't increase intelligence just optimization power. If you are going to introduce definitions stick by them. :)

"Communication speed. Axons carry spike signals at 75 meters per second or less (Kandel et al. 2000).
That speed is a fixed consequence of our physiology. In contrast, software minds could be ported
to faster hardware, and could therefore process information more rapidly. (Of course, this also
depends on the efficiency of the algorithms in use; faster hardware compensates for less efficient
software.)"

This seems confusing... When we talk about the speed of computers we generally aren't talking about signal propagation speed (which has been a large fraction of the speed of light in most computers, AFAIK). It hasn't been something we have tried to optimise.

Having something with a fast signal propagation speed would allow for faster reaction times, but I'm not sure what other benefits you are suggesting would allow an AI to dominate humanity.

"Goal coordination. Let us call a set of AI copies or near-copies a "copy clan." Given shared goals, a
copy clan would not face certain goal coordination problems that limit human effectiveness
(Friedman 1993). A human cannot use a hundredfold salary increase to purchase a hundredfold
increase in productive hours per day. But a copy clan, if its tasks are parallelizable, could do just
that. Any gains made by such a copy clan, or by a human or human organization controlling that
clan, could potentially be invested in further AI development, allowing initial advantages to
compound."

This seems to neglect the overhead in normal co-ordination, e.g. who does what task. For example, say you are doing research: you do a search on a subject and each copy takes one page of Google Scholar. They then follow interesting references. However, these references are likely to overlap, so you would get overlap of effort. And because the copy clones are likely to have the same interests, they are more likely to duplicate research compared to normal humans.

"Duplicability": I'm sceptical of this to a certain extent. While it will lead to very good short-term gains, having lots of computer hardware that thinks the same way I think will cause some research avenues to be unexplored, due to all copies expecting that avenue to have minimal expected value (e.g. a billion Einsteins might ignore quantum physics).

So I wouldn't expect this sort of intelligence explosion to dominate the rest of humanity in research and science.

Surely this doesn't increase intelligence just optimization power. If you are going to introduce definitions stick by them. :)

This jumped out at me as well, though I forgot about it when writing my other comment.

I think it's important to distinguish between what I'd call "internal" and "external" resources. If we took the "intelligence = optimization power / resources" thing to literally mean all resources, it would mean that AIs couldn't become more intelligent by simply adding hardware, which is arguably their strongest advantage over humans. It might also mean that bacteria could turn out to be "smarter" than humans - they can accomplish far fewer things, but they also use far fewer resources.

Intuitively, there's a clear difference between a larger brain making a human more powerful than a dog ("internal resources"), and a larger bank account making a human more powerful than another human ("external resources"). Fortunately, this distinction emerges pretty easily from Legg & Hutter's intelligence formalism. (Luke & Anna didn't actually use the formalism, but the distinction emerging easily from the formalism suggests to me that the distinction actually carves reality at the joints and isn't just an arbitrary one.)

The formalism is fundamentally pretty simple: there's an agent, which receives a stream of observations about the environment, chosen from some set of symbols. In response, it chooses some action, again chosen from some (other) set of symbols, and gets some reward. Then it makes new observations and chooses new actions.

Legg & Hutter's formalism treats the agent itself as a black box: it doesn't care about how the agent reaches its conclusions, or for that matter, whether the agent does anything that could be called "reaching a conclusion" in the first place. It only looks at whether the agent is able to match its actions to the observations so as to produce the highest rewards. So "internal resources" would be things that go into that black box, only affecting the choices that the agent makes.

"External resources", on the other hand, are things that affect the agent's set of actions. Intuitively, a rich person can do things that a poor person can't: for instance, buying a house that costs a million dollars. In Legg & Hutter's formalism, the rich person would have a broader set of actions that they could choose from.

We can then rewrite "intelligence is optimization power divided by resources" as "intelligence is the optimization power of an agent when their set of available actions is held constant". I think this is a pretty good match for our intuitive sense of "intelligence". If you can solve a puzzle in one move by bribing the person who's administering it, that might get you a higher amount of reward, but it doesn't make you more intelligent than the person who doesn't have that option and has to solve it the hard way. (If you wanted to be exact, you'd also need to hold the set of observations constant, so that e.g. a seeing person didn't end up more intelligent than a blind one.)

ETA: Technically, the poor person does have the same set of actions available as the rich person - they too can claim to have a million dollars and try to tell their bank to transfer the money to the seller. It's just that the same actions produce different consequences - if you don't have the million dollars in your bank, the bank will refuse to obey you. But saying that they have different sets of actions gets close enough, I think.
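Here is a toy sketch of the "hold the action set constant" idea. It is a drastically simplified stand-in, not Legg & Hutter's actual measure (which averages over a whole class of environments); the environment, the two agents, and all names are hypothetical. Both agents are black boxes mapping an observation to an action, and both must pick from the same `ACTIONS`.

```python
import random

ACTIONS = ("left", "right")  # the shared, fixed action set

def environment(rng):
    """Toy environment: the observation reveals which action will be rewarded."""
    target = rng.choice(ACTIONS)
    observation = "L" if target == "left" else "R"
    return observation, target

def attentive_agent(observation):
    # Black box that happens to make good use of the observation.
    return "left" if observation == "L" else "right"

def inattentive_agent(observation):
    # Black box that ignores the observation entirely.
    return "left"

def average_reward(agent, episodes=1000, seed=0):
    rng = random.Random(seed)  # same seed -> both agents face identical episodes
    total = 0
    for _ in range(episodes):
        observation, target = environment(rng)
        action = agent(observation)
        assert action in ACTIONS  # "external resources" would enlarge this set
        total += 1 if action == target else 0
    return total / episodes

attentive = average_reward(attentive_agent)
inattentive = average_reward(inattentive_agent)
```

Because the observations and the action set are identical for both agents, any difference in average reward comes from what happens inside the black box, which is the sense of "internal" used above.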

I think this particular argument would dissolve away if the paper said "may allow AIs to acquire vastly more optimization power" instead of "vastly more intelligent".

The key point here is not that AIs have more computational resources available than humans, but that they are (presumed) able to translate extra computational resource directly into extra cognitive power. So they can use that particular resource much more efficiently than humans.

EDIT: actually I'm confusing two concepts here. There's "computer hardware", which is an external resource that AIs are better at utilizing than we are. Then there's "computational power" which AIs obtain from computer hardware and we obtain from our brains. This is an internal resource, and while I believe it's what the paper was referring to as "increased computational resources", I'm not sure it counts as a "resource" for the purposes of Yudkowsky's definition of intelligence.

I have trouble with this definition to be honest. I can't help being nit-picky. I don't really like treating computers (including my own brain) as black box functions. I prefer to think of them as physical systems with many inputs and outputs (that are not obvious).

There are many actions I can't do: I can't consume 240V electricity or emit radio-frequency EM radiation. Is a power supply an external resource for a computer? Are my hands an external resource for my brain (they allow more actions)?

Cutting off my hands would severely curtail my ability to solve problems unrelated to actually having hands (no more making notes on problems, and typing a program to solve a problem would be a little bit trickier).

Okay, so let's try a thought experiment: give the AI a human body with a silicon brain that runs off the glucose in the blood supply. Brains use 20 watts or so (my Core 2 Duo laptop is about 12W when not doing much (although that includes a screen)). Give it no ethernet port, no wifi. Give it eyes and ears that take the same data as a human. Then we could try to compare it roughly with a human's capabilities, to discover whether it is more intelligent or not. One major issue: if it doesn't perform the correct autonomic functions of the human brain (breathing etc), the body is likely to die and not be able to solve many problems. It is this kind of context sensitivity that makes me despair at trying to pin an intelligence number on a system.

However this model isn't even very useful for predicting the future. Computers do have gigabit ethernet, they can easily expand to take more power. Even if it took an age to learn how to control a pen to answer questions it doesn't help us.

This is unsatisfactory. I'll have to think about this issue some more.

I agree. It is the task of the intelligence to decide how "efficiently" it will solve a particular task. A greater intelligence may decide to pack it together with some other problems and solve them that way, many at once. It's less efficient from the point of view of this problem, but not from a broader perspective.

It is also not always time that is crucial; it may be the energy spent, or the nerves of the boss spared, or something else.

Serving more motives, and stronger ones, would be a better definition of a greater intelligence.

Suppose agent A has goal G, and agent B has goal H (assumed to be incompatible). Put both agents in the same world. If you reliably end up with state G, then we say that A has greater optimization power.

I guess there's a hypothesis (though I don't know if this has been discussed much here) that this definition of optimization power is robust, i.e. you can assign each agent a score, and one agent will reliably win over another if the difference in score is great enough.

If the world is complex and uncertain then this will necessarily be "cross-domain" optimization power, because there will be enough novelty and variety in the sorts of tasks the agents will need to complete that they can't just have everything programmed in explicitly at the start.

So optimization power determines who ends up ruling the world - it's the thing that we really care about here.

But you can improve the optimization power of many kinds of agent just by adding some resource (such as money or computer hardware). This is relatively straightforward and doesn't constitute an innovation. But to improve the resource->optimization_power function, you do need innovation and this is what we're trying to capture by the word "intelligence".

(Just to make it clear, here I'm talking about innovation generating intelligence not intelligence generating innovations).

But we don't always expect optimization power to scale linearly with resources, so I think Robin Hanson may be closer to the mark with his "production function" model, than Yudkowsky with his "divide one thing by the other" model. If you give me so much money that I'm no longer getting much marginal value from it, you're not actually making me stupider.
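A quick numerical sketch of that last point. The logarithmic production function here is purely an assumption chosen to show diminishing returns; it is not Hanson's actual model, and the function names are hypothetical.

```python
import math

def optimization_power(resources, skill):
    """Hypothetical production function with diminishing returns to resources."""
    return skill * math.log1p(resources)

def ratio_intelligence(resources, skill):
    """The 'divide one thing by the other' reading: power per unit of resources."""
    return optimization_power(resources, skill) / resources

skill = 2.0  # held fixed: the agent itself does not change
poor = ratio_intelligence(10.0, skill)
rich = ratio_intelligence(1000.0, skill)
```

With `skill` unchanged, the ratio falls as resources rise even though total optimization power goes up, so the ratio reading says the richer agent got "stupider" while the production-function reading (compare the `skill` parameter) says nothing about the agent changed.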

Suppose agent A has goal G, and agent B has goal H (assumed to be incompatible). Put both agents in the same world. If you reliably end up with state G, then we say that A has greater optimization power.

Fitnesses are dependent on the environment, though. So: if agent A has goal GA, B has goal GB and C has goal GC, and A and B produce GA, B and C produce GB and C and A produce GC then you can't just assign scalar fitnesses to each agent and expect that to work. That could happen with circular predation, for example.

If you do want to assign scalar fitnesses to organisms - in order to compare them - I think you have to do something like testing them on a standard suite of test environments.
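The cyclic case can be checked mechanically. This is a small sketch, with hypothetical names, that searches for any strict ranking (equivalently, any assignment of distinct scalar scores) that agrees with every pairwise outcome; the A beats B, B beats C, C beats A data is the hypothetical cycle described above.

```python
from itertools import permutations

# Each key (winner, loser) records that the winner's goal reliably prevails
# when the two agents share a world. This is the hypothetical cycle.
cycle = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}

# A transitive set of outcomes, for comparison.
chain = {("A", "B"): True, ("B", "C"): True, ("A", "C"): True}

def consistent_ranking(agents, beats):
    """Return an ordering (strongest first) agreeing with every pairwise result, or None."""
    for order in permutations(agents):
        rank = {agent: i for i, agent in enumerate(order)}
        if all(rank[winner] < rank[loser] for (winner, loser) in beats):
            return order
    return None

cyclic_result = consistent_ranking(["A", "B", "C"], cycle)
chain_result = consistent_ranking(["A", "B", "C"], chain)
```

No ordering survives the cycle, so no scalar score can reproduce those outcomes, while the transitive case ranks cleanly. Brute-force search over permutations is only practical for a handful of agents, which is enough for the illustration.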

There's another flaw in the model which I presented, which is that I was only thinking about goals which conflict with other agents' goals. "Solve problem x for $5"-type tasks may not fall into that category, but may still require a lot of "intelligence" to solve. (Although narrow intelligence may be enough).

It is still painful to see the term "Intelligence Explosion" being used to refer only to future developments.

The Intelligence Explosion Is Happening Now. If people think otherwise they should make a case for that. This really matters because - if a process has been going on for thousands of years, then we might already know something about how it operates.

So far, about the only defense of this misleading terminology that I have seen boiled down to: "I. J. Good said so, and we should defer to him". For the actual argument see the section here titled "Distinguishing the Explosion from the Preceding Build-Up". I think that this is a pretty feeble argument - which in no way justifies the proposed usage.

A nuclear explosion begins when critical mass is reached. You can't just define the explosion as starting when the bomb casing shatters - and justify that by saying: that is when the human consequences start to matter. By then the actual explosion has been underway for quite some time.

I think that most of the people who promote the idea of a future-only explosion actually think that the explosion will happen in the future. They really think that there will be a future "ignition point", after which progress will "take off". This is a case of bad terminology fostering and promoting bad thinking. These people are just confused about how cultural evolution and evolutionary synergy work.

The "singularity" terminology is also to blame here. That is another case where bad terminology has resulted in bad thinking.

No doubt the picture of a strange future which can't be understood in terms of past trends appeals to some. If we believe their claim that the future is filled with strange new woo that is totally different from what came before then maybe we should pay more attention to their guide book. Perhaps the "future-only explosion" nonsense is best understood - not as attempted science, but as attempted manipulation. I suspect that this factor is involved in how this bad meme has spread. The weird and different future pictured can thus be best seen as a sociological phenomenon - rather than a scientific one.

Anyway, I think those involved should snap out of this one. It is pretty much 100% misleading nonsense. You folk should acknowledge the historical roots of the phenomenon and adopt an appropriate classification and naming scheme for it - and stop promoting the idea of a "future-only explosion".

(You could argue that e.g. the Flash Crash resulted because humans weren't in the loop, but humans can still interrupt the process eventually, so they're not really out of the loop - they just respond more slowly.)

A new paper suggests that ultrafast machine trading is causing crashes that don't last long enough for humans to react:

The speed in which the rises and falls occur might last no longer than half a second, unapparent to any human who is tracking prices. Johnson says if you blink you miss it. Flash events may happen in milliseconds and have nothing to do with a company’s real value.

...

Following the May 2010 event, U.S. regulators, as a safety mechanism, upheld circuit breakers designed to stop trading if a stock price makes a sudden large move. Whether or not that is the best solution around, considering the speed in which today’s machine trading can occur, does not convince all market experts. At that level of resolution, one of the study authors said it was troublesome to even observe, leave alone regulate.

If we want oversight of this kind of trading, it seems like we'll have to rely on more ultrafast machines.

There is no "loop", there are many loops, some of which humans have already been eliminated from - through the conventional process of automation.

Humans being in some of the loops does not necessarily even slow things down very much - if the other loops are permitted to whir around at full speed.

In my essay on the topic I cite increasingly infrequent periodic code reviews as an example of how human influence on designing the next generation of dominant creatures could fade away gradually, without causing very much slow-down. That sort of thing might result in a "slightly muffled" explosion, but it would still be an explosion.

It seems to me there's a continuum between "humans carefully monitoring and controlling a weakish AI system" and "superintelligent AI-in-a-box cleverly manipulates humans in order to wreak havoc". It seems that as the world transitions from one to the other, at some point it will pass an "intelligence explosion" threshold. But I don't think that it ever passes a "humans are no longer in the loop" threshold.

I haven't read the paper yet, but from all the other material I've seen from the SI, an important part of the "intelligence explosion" hypothesis is the establishment of a stable goal/utility function. This is qualitatively unlike what we have now, which is a system of agents competing and trading with each other.

An "intelligence explosion" refers to explosive increases in intelligence. What you are talking about sounds as though it has more to do with the social structure in which agents are embedded.

Do you deny that intelligence has increased recently? What about computers and calculators? What about collective intelligence? What about education? What about the evolution of human beings from chimp-like creatures?

Intelligence on the planet is exploding (in the sense of undergoing exponential growth) already, and has been for a long time - check with the facts.

There isn't really an "intelligence explosion hypothesis". The intelligence explosion is a well-established, ongoing event. One might make hypotheses about the future shape of the intelligence explosion - although there are many of those.

I've read the paper, and while it mentions "intelligence explosion" a few times, they seem to be keeping that terminology taboo when it comes to the meat of the argument, which is what I think you were asking for.

Most of the material is phrased in terms of whether AIs will exhibit significantly more intelligence than human-based systems and whether human values will be preserved.

I think most people use "intelligence explosion" to mean something more specific than just exponential growth. But you're right that we should try and learn what we can about how systems evolve from looking at the past.

I've read the paper, and while it mentions "intelligence explosion" a few times, they seem to be keeping that terminology taboo when it comes to the meat of the argument, which is what I think you were asking for.

Yes, this is only a cosmetic issue with the paper, really.

I think most people use "intelligence explosion" to mean something more specific than just exponential growth.

Sure: explosions do also have to wind up going rapidly to qualify as such.

Humans tend to underestimate the likelihood of outcomes that can come about through many different paths (Tversky and Kahneman 1974), and we believe an intelligence explosion is one such outcome.

This sentence feels awkward to me. I'd rather see it replaced by something like "We focus on convergent outcomes because they are typically underestimated relative to detailed scenarios (Tversky and Kahneman 1974)."

On page 6, you mention "Kryder's law" as support for the accelerator of "massive datasets". Clearly larger disk space enables us to use larger datasets, but how will these datasets be created? Is it obvious that we can create useful, large datasets?

On page 10, you write (editability as an AI advantage) "Of course, such possibilities raise ethical concerns." I'm not sure why this sentence is there. Is editability the only thing that raises these concerns? If yes, what are these concerns specifically?

On page 13, you cite "Muehlhauser 2011"; this should probably be "Muehlhauser 2012".

As applied to AI risks in particular, a plan of differential intellectual progress would recommend that our progress on the philosophical, scientific, and technological problems of AI safety outpace our progress on the problems of AI capability such that we develop safe superhuman AIs before we develop arbitrary superhuman AIs.

One might reasonably hope that market forces would have a broadly similar effect. People simply won't buy unsafe machines (except for perhaps the military!).

However, unilateral selective relinquishment of technologies that facilitate machine intelligence may have the effect of disadvantaging our own efforts in that direction by rendering us impotent - and by ceding the initiative to other parties. That is a strategy that could easily have more costs than benefits. This possibility needs serious consideration before a path of selective relinquishment is taken on anything but a global scale.

As to global selective relinquishment - that involves a considerable coordination problem. We may see global coordination at some stage - but perhaps not long before we see sophisticated machine intelligence. A global plan may simply not be viable.

Would a strategy of biasing development and relinquishment have worked with car safety? There, a big part of the problem is that society is prepared to trade lives for speed. Different societies make the tradeoff at different points. Technological approaches to improving safety might possibly help - but they don't really address one of the main causes of the problem. This perspective leads me to suspect that there are other strategies - besides biasing development and relinquishment - to consider here.

Technological fixes are neat, but we probably shouldn't just be thinking about them.

AIs will want to preserve themselves, as destruction would prevent them from further influencing the world to achieve their goals.

Would an AI sacrifice itself to preserve the functional status of two other AIs from its copy clan with similar goals?

Unless an AI is specifically programmed to preserve what humans value, it may destroy those valued structures (including humans) incidentally. As Yudkowsky (2008a) puts it, "the AI does not love you, nor does it hate you, but you are made of atoms it can use for something else."

Another possibility is, rather than trying to alter the values of the AI, alter the environment such that the AI realises that working against human values is likely to be counter productive in achieving its own goals. It doesn't have to share human values - just understand them and have a rational appreciation of the consequences of working against them.

The virtue of this article is its adoption of the via negativa. If the main question is safety, then putting great effort into trying to build an AI leads us to errors of bad specification. The problem is convincing AI researchers of that; I read some interviews here, and some of them are not aware of the risks.