A Guide to MIRI’s Research

by Nate Soares

In order to ensure that the development of smarter-than-human artificial intelligence has a positive impact, we must meet three formidable challenges: First, we must design smarter-than-human systems to be highly reliable, such that we can justify confidence that the system will fulfill the specified goals or preferences. Second, the designs must be error-tolerant, so that the systems are amenable to online modification and correction in the face of inevitable human error. Third, the system must actually learn beneficial goals or preferences.

MIRI’s current research program focuses on understanding how to meet these challenges in principle: there are aspects of reliable reasoning that we do not yet understand even in theory; there are questions of bounded rationality that we could not yet solve even using unlimited computing resources. Our study focuses on finding any solutions to these problems, practical or not, as a first step. As such, our modern research looks much more like pure mathematics than software engineering or practical machine learning.

This guide briefly overviews our research priorities, and provides resources that will help you get to the cutting edge on each subject area. This guide is not intended to justify these research topics; for further motivation of our approach, refer to our technical agenda and supporting papers.

How to use this guide

Perhaps the shortest path to being hired as a MIRI researcher is to study the materials below, then attend the nearest MIRIx workshop two or three times, then attend a MIRI workshop or two and show an ability to contribute at the cutting edge. The same path (read these materials, then work your way through some workshops) will also help if you want to research these topics at some other institution.

You can learn most of the requisite material by simply reading the textbooks and papers below. However, with all of the material in this guide, please do not grind away for the sake of grinding away. If you already know the material, skip ahead. If one of the active research areas fails to capture your interest, switch to a different one. If you don’t like one of the recommended textbooks, find a better one or skip it entirely. The goal is to get yourself to the front lines with a solid understanding of what our research says. Hopefully, this guide can help you achieve that goal, but don’t let it hinder you!

The basics

It’s important to have some fluency with elementary mathematical concepts before jumping directly into our active research topics. All of our research areas are well-served by a basic understanding of computation, logic, and probability theory. Below are some resources to get you started.

You don’t need to read the books in this section in the order listed. Pick up whatever is interesting, and don’t hesitate to skip back and forth between the research areas and the basics as necessary.

Set Theory

Most of modern mathematics is formalized in set theory, and the textbooks and papers listed here are no exception. This makes set theory a great place to begin.

chapters 1-18

Computation and Logic

The theory of computability (and the limits posed by diagonalization) is foundational to understanding what can and can’t be done by machines.

chapters 1-4

Probability Theory

Probability theory is central to an understanding of rational agency. Some familiarity with reasoning under uncertainty is critical in all of our active research areas.

chapters 3-5 and 9

Probabilistic Graphical Models

This book will help flesh out an understanding of how inference can be done using probabilistic world-models.

Artificial Intelligence

Our research primarily focuses on foundational mathematical problems in intelligence, but knowledge of the modern field of artificial intelligence is important to put this work in context.

It’s also important to understand the concept of VNM rationality, which I recommend learning from the Wikipedia article but which can also be picked up from the original book. Von Neumann and Morgenstern showed that any agent obeying a few simple consistency axioms acts with preferences characterizable by a utility function. While some expect that we may ultimately need to abandon VNM rationality in order to construct reliable intelligent agents, the VNM framework remains the most expressive framework we have for characterizing the behavior of arbitrarily powerful agents. (For example, see the orthogonality thesis and the instrumental convergence thesis from Bostrom’s “The Superintelligent Will.”) The concept of VNM rationality is used throughout all our active research areas.
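As a minimal sketch of what it means for preferences to be characterizable by a utility function, the Python snippet below ranks lotteries by expected utility. The outcome names, the utility values, and the helper names `expected_utility` and `prefers` are assumptions made up for illustration; they do not come from von Neumann and Morgenstern.

```python
from typing import Dict

Lottery = Dict[str, float]  # outcome -> probability (probabilities sum to 1)

# Hypothetical utility assignment over three outcomes (illustrative assumption).
UTILITY: Dict[str, float] = {"apple": 0.0, "banana": 0.5, "cherry": 1.0}


def expected_utility(lottery: Lottery, utility: Dict[str, float]) -> float:
    """Probability-weighted average utility of a lottery."""
    return sum(prob * utility[outcome] for outcome, prob in lottery.items())


def prefers(a: Lottery, b: Lottery, utility: Dict[str, float]) -> bool:
    """Under the VNM representation, a is preferred to b exactly when
    a has the higher expected utility."""
    return expected_utility(a, utility) > expected_utility(b, utility)


sure_banana: Lottery = {"banana": 1.0}
gamble: Lottery = {"apple": 0.25, "cherry": 0.75}
print(prefers(gamble, sure_banana, UTILITY))
```

The point of the sketch is only that, once a utility function is fixed, every comparison between lotteries reduces to comparing two numbers.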

Realistic world-models

Formalizing beneficial goals does you no good if your smarter-than-human system is unreliable. There are aspects of good reasoning that we don’t yet understand, even in principle. It is likely possible to gain insight by building practical systems that use algorithms which seem to work, even if the reasons why they work are not yet well-understood: often, theoretical understanding follows in the wake of practical application. However, we consider this approach imprudent when designing systems that have the potential to become superintelligent: we will be safer if we have a theory of general intelligence on hand before attempting to create practical superintelligent systems.

For this reason, many of our active research topics focus on parts of general intelligence that we do not yet understand how to solve, even in principle. For example, consider the following problem:

I have a computer program, known as the “universe.” One function in the universe is undefined. Your job is to provide me with a computer program of the appropriate type to complete my universe program. Then, I’ll run my universe program. My goal is to score your agent according to how well it learns what the original universe program is.

How could I do this? Solomonoff’s theory of inductive inference sheds some light on a theoretical solution: it describes a method for making ideal predictions from observations, but only in the case where the predictor lives outside the environment. Solomonoff induction has led to many useful tools for thinking about inductive inference (including Kolmogorov complexity, the universal prior, and AIXI), but the problem becomes decidedly more difficult in the case where the agent is a subprocess of the universe, computed by the universe.
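To make the setting where the predictor lives outside the environment more concrete, here is a small computable toy in Python. It is not Solomonoff induction, which is uncomputable: the hypothesis class (repeating bit patterns up to a maximum length), the use of pattern length as a stand-in for description complexity, and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product
from typing import Iterator, Sequence, Tuple

Pattern = Tuple[int, ...]


def hypotheses(max_len: int) -> Iterator[Pattern]:
    """Toy hypothesis class: every bit pattern up to max_len, repeated forever."""
    for length in range(1, max_len + 1):
        yield from product((0, 1), repeat=length)


def predicted_bit(pattern: Pattern, t: int) -> int:
    """Bit that the repeating pattern produces at time step t."""
    return pattern[t % len(pattern)]


def prob_next_is_one(observed: Sequence[int], max_len: int = 4) -> Fraction:
    """Simplicity-weighted mixture prediction for the next bit."""
    total = Fraction(0)
    ones = Fraction(0)
    for pattern in hypotheses(max_len):
        consistent = all(
            predicted_bit(pattern, t) == bit for t, bit in enumerate(observed)
        )
        if not consistent:
            continue  # hypotheses that contradict the data get zero weight
        weight = Fraction(1, 2 ** len(pattern))  # shorter patterns weigh more
        total += weight
        if predicted_bit(pattern, len(observed)) == 1:
            ones += weight
    if total == 0:
        raise ValueError("no hypothesis in the toy class fits the observations")
    return ones / total


print(prob_next_is_one([0, 1, 0, 1, 0, 1]))
```

Note that the predictor only ever reads the observation sequence. Nothing in the code represents the predictor as part of the process that generates the bits.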

In the case where the agent is embedded inside the environment, the induction problem gets murky: what counts as “learning the universe program”? Against what distribution over environments should the agent be scored? What constitutes ideal induction in the case where the boundary between “agent” and “environment” becomes blurry? These are questions of “naturalized induction.”

Solving problems of naturalized induction requires gaining a better understanding of realistic world-models: What is the set of “possible realities”? What sort of priors about the environment would an ideal agent use? Answers to these questions must not only allow good reasoning, they must allow for the specification of human goals in terms of those world-models.

For example, in Solomonoff induction (and in Hutter’s AIXI), Turing machines are used to model the environment. Pretend that the only thing we value is diamonds (carbon atoms covalently bound to four other carbon atoms). Now, say I give you a Turing machine. Can you tell me how much diamond is within?

In order to design an agent that pursues goals specified in terms of its world models, the agent must have some way of identifying the ontology of our goals (carbon atoms) inside its world models (Turing machines). This “ontology identification” problem is discussed in “Formalizing Two Problems of Realistic World Models” (linked above), and was first introduced by De Blanc:

De Blanc’s “Ontological crises in artificial agents’ value systems” asks how one might make an agent’s goals robust to changes in ontology. If the agent starts with an atomic model of physics (where carbon atoms are ontologically basic) then this may not be hard. But what happens when the agent builds a nuclear model of physics (where atoms are constructed from neutrons and protons)? If the “carbon recognizer” was hard-coded, the agent might fail to identify any carbon in this new world-model, and may start acting strangely (in search of hidden “true carbon”). How could the agent be designed so that it can successfully identify “six-proton atoms” with “carbon atoms” in response to this ontological crisis?

Legg and Hutter’s “Universal Intelligence: A Definition of Machine Intelligence” describes AIXI, a universally intelligent agent in settings where the agent is separate from the environment, and a “scoring metric” used to rate the intelligence of various agent programs in this setting. Hutter’s AIXI and Legg’s scoring metric are very similar in spirit to what we are looking for in response to problems of naturalized induction and ontology identification. The two differences are that AIXI lives in a universe where agent and environment are separated whereas naturalized induction requires a solution where the agent is embedded within the environment, and AIXI maximizes rewards specified in terms of observations whereas we desire a solution that optimizes rewards specified in terms of the outside world.
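For reference, the scoring metric is usually written as a complexity-weighted sum of the agent's performance across environments. The notation below follows my reading of Legg and Hutter's paper and should be checked against it, in particular for the precise conditions on the environment class:

```latex
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Here $\pi$ is the agent, $E$ is the class of environments under consideration, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward that $\pi$ obtains in $\mu$. The rewards enter through what the agent observes, which is the second of the two differences noted above.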

You can learn more about AIXI in Hutter’s book Universal Artificial Intelligence, although reading Legg’s paper (linked above) is likely sufficient for our purposes.

Decision theory

Say I give you the following: (1) a computer program describing a universe; (2) a computer program describing an agent; (3) a set of actions available to the agent; (4) a set of preferences specified over the history of states that the universe has been in. I task you with identifying the best action available to the agent, with respect to those preferences. For example, your inputs might be:
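As a minimal hypothetical sketch, the four inputs could be written in Python as follows. Every name and number here is an assumption made for illustration, and the history of universe states is collapsed into a single final payoff to keep the example short.

```python
from typing import Tuple

# (3) The set of actions available to the agent.
ACTIONS: Tuple[str, ...] = ("take_five", "take_ten")


# (2) The agent program: deterministic, so it outputs exactly one action.
def agent() -> str:
    return "take_five"


# (1) The universe program, which runs the agent as one of its subprocesses.
def universe() -> int:
    action = agent()
    return 10 if action == "take_ten" else 5


# (4) Preferences over outcomes: in this toy, more is simply better.
def preference(outcome: int) -> int:
    return outcome


print(preference(universe()))
```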

(Notice how the agent is embedded in the environment.) This is another question that we don’t know how to answer, even in principle. It may seem easy: just iterate over each action, figure out which outcome the agent would get if it took that action, and then pick the action that leads to the best outcome. But as a matter of fact, in this thought experiment, the agent is a deterministic subprocess of a deterministic computer program: there is exactly one action that the agent is going to output, and asking what “would happen” if a deterministic part of a deterministic program did something that it doesn’t do is ill-defined.

In order to evaluate what “would happen” if the agent took a different action, a “counterfactual environment” (where the agent does something that it doesn’t) must be constructed. Satisfactory theories of counterfactual reasoning do not yet exist. We don’t yet understand how to identify the best action available to an agent embedded within its environment, even in theory, even given full knowledge of the universe and our preferences and given unlimited computing power.
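The naive procedure of iterating over actions can be sketched in Python as below. The sketch builds its counterfactuals by substitution, swapping the agent for a constant function, and that choice is exactly the questionable step: it assumes we already know which part of the universe counts as the agent, and it evaluates programs that differ from the one actually being run. The universe, payoffs, and function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

ACTIONS: Tuple[str, ...] = ("take_five", "take_ten")
PAYOFF: Dict[str, int] = {"take_five": 5, "take_ten": 10}


def universe(agent: Callable[[], str]) -> int:
    """Toy universe with the agent pulled out as an explicit parameter."""
    return PAYOFF[agent()]


def actual_agent() -> str:
    """The deterministic agent that really is part of the universe."""
    return "take_five"


def naive_best_action() -> str:
    """Rank actions by rerunning the universe with the agent replaced."""

    def outcome_if(action: str) -> int:
        # Counterfactual by substitution: a different program from the
        # actual universe, in which the agent outputs `action`.
        return universe(lambda: action)

    return max(ACTIONS, key=outcome_if)


print(universe(actual_agent), naive_best_action())
```

In this toy the substitution is easy only because the universe was written with a convenient agent-shaped hole. A universe program in which the agent's computation is not cleanly separable offers no such hole.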

Solving this problem will require a better understanding of counterfactual reasoning; this is the domain of decision theory.

Decision Theory

Peterson’s textbook explains the field of normative decision theory in broad strokes. For a quicker survey, with a stronger focus on Newcomblike problems, see Muehlhauser’s “Decision theory FAQ.”

Game Theory

Many open problems in decision theory involve multi-agent settings. I have heard good things about Tadelis’ textbook, but have not read it myself. You also may have luck with Scott Alexander’s “Introduction to game theory” on LessWrong.

chapters 1-5 (+6-9 if enthusiastic)

Provability Logic

Toy models of multi-agent settings can be studied in an environment where agents base their actions on the things that they can prove about other agents in the same environment. Our current toy models make heavy use of provability logic.

Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning using bad counterfactuals would, according to those broken counterfactuals, decide that they should not fix all of their flaws). My talk “Why ain’t you rich?” briefly touches upon both these points. To learn more, I suggest the following resources:

Soares & Fallenstein’s “Toward idealized decision theory” serves as a general overview, and further motivates problems of decision theory as relevant to MIRI’s research program. The paper discusses the shortcomings of two modern decision theories, and discusses a few new insights in decision theory that point toward new methods for performing counterfactual reasoning.

If “Toward idealized decision theory” moves too quickly, this series of blog posts may be a better place to start:

Benson-Tilsen’s “UDT with known search order” is a somewhat unsatisfactory solution. It contains a formalization of UDT with known proof-search order and demonstrates the necessity of using a technique known as “playing chicken with the universe” in order to avoid spurious proofs.

In order to study multi-agent settings, Patrick LaVictoire has developed a modal agents framework, which has also allowed us to use provability logic to make some novel progress in the field of decision theory:

Barasz et al.’s “Robust cooperation in the Prisoner’s Dilemma” allows us to consider agents which decide whether or not to cooperate with each other based only upon what they can prove about each other’s behavior. This prevents infinite regress; in fact, the behavior of two agents which act only according to what they can prove about the behavior of the other can be determined in quadratic time using results from provability logic.
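One way to see why no infinite regress arises is to evaluate such agents world by world on a finite linear Kripke frame, where a statement counts as provable at a world exactly when it holds at every earlier world. The Python sketch below does this for three simple agents named in the paper (FairBot, DefectBot, CooperateBot). The encoding of agents as functions, the `play` helper, and the fixed depth of 10 are assumptions made for illustration; the sketch simply assumes the values have stabilized by that depth, which holds for these agents.

```python
from typing import Callable, List, Tuple

# An agent maps the opponent's truth values at all earlier worlds
# ("the opponent cooperates with me") to its own move at the current world.
Agent = Callable[[List[bool]], bool]


def box(earlier_values: List[bool]) -> bool:
    """Box(phi) holds at a world iff phi holds at every earlier world."""
    return all(earlier_values)


def fair_bot(opponent: List[bool]) -> bool:
    return box(opponent)  # cooperate iff "opponent cooperates" is provable


def defect_bot(opponent: List[bool]) -> bool:
    return False  # always defect


def cooperate_bot(opponent: List[bool]) -> bool:
    return True  # always cooperate


def play(a: Agent, b: Agent, depth: int = 10) -> Tuple[bool, bool]:
    """Evaluate both agents world by world; True means cooperate."""
    a_values: List[bool] = []
    b_values: List[bool] = []
    for _ in range(depth):
        # Each move at the current world depends only on earlier worlds.
        a_now, b_now = a(b_values), b(a_values)
        a_values.append(a_now)
        b_values.append(b_now)
    return a_values[-1], b_values[-1]


print(play(fair_bot, fair_bot), play(fair_bot, defect_bot))
```

Because each agent's move at a world depends only on strictly earlier worlds, the evaluation proceeds bottom-up and never has to simulate one agent simulating the other.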

These blog posts are of historical interest, but nearly all of their content is in “Toward idealized decision theory”, above.

Logical uncertainty

Imagine a black box, with one input chute and two output chutes. A ball can be put into the input chute, and it will come out of one of the two output chutes. Inside the black box is a Rube Goldberg machine which takes the ball from the input chute to one of the output chutes.

A perfect probabilistic reasoner who doesn’t know which Rube Goldberg machine is in the box doesn’t know how the box will behave, but if they could figure out which machine is inside the box, then they would know which chute would take the ball. This reasoner is environmentally uncertain.

A realistic reasoner might know which machine is in the box, and might know exactly how the machine works, but may lack the deductive capability to figure out where the machine will drop the ball. This reasoner is logically uncertain.

Probability theory assumes logical omniscience; it assumes that reasoners know all consequences of the things they know. In reality, bounded reasoners are not logically omniscient: we can know precisely which machine the box implements and precisely how the machine works, and just not have the time to deduce where the ball comes out. We reason under logical uncertainty.
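The black-box example can be mimicked in a few lines of Python. The machine below is fully known and deterministic, yet a reasoner with a limited step budget cannot finish simulating it. The particular machine, the constants, and above all the fallback credence of 0.5 are assumptions made for illustration: saying what that number should be is precisely what a theory of logical uncertainty would have to supply.

```python
MODULUS = 1_000_003
ROUNDS = 10_000  # number of steps the known machine takes


def run_machine(ball: int) -> int:
    """The fully specified machine: returns output chute 0 or 1."""
    x = ball
    for _ in range(ROUNDS):
        x = (x * x + 1) % MODULUS
    return x % 2


def credence_chute_one(ball: int, step_budget: int) -> float:
    """Credence that the ball exits chute 1, given limited deduction steps."""
    x = ball
    for step in range(ROUNDS):
        if step >= step_budget:
            # Out of time: the answer is determined, but not yet deduced.
            return 0.5  # placeholder credence, not a principled value
        x = (x * x + 1) % MODULUS
    return float(x % 2)  # enough budget: the reasoner simply knows


print(credence_chute_one(7, step_budget=100))
print(credence_chute_one(7, step_budget=ROUNDS) == float(run_machine(7)))
```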

A formal theory of reasoning under logical uncertainty does not yet exist. Gaining this understanding is extremely important when it comes to constructing a highly reliable generally intelligent system: whenever an agent reasons about the behavior of complex systems, computer programs, or other agents, it must operate under at least a little logical uncertainty.

To understand the state of the art, a solid understanding of probability theory is a must; consider augmenting the first few chapters of Jaynes with Feller, chapters 1, 5, 6, and 9, and then study the following papers:

Gaifman’s “Concerning measures in first-order calculi” looked at this problem many years ago. Gaifman has largely focused on a relevant subproblem, which is the assignment of probabilities to different models of a formal system (assuming that once the model is known, all consequences of that model are known). We are now attempting to expand this approach to a more complete notion of logical uncertainty (where a reasoner can know what the model is but not know the implications of that model), but work by Gaifman is still useful to gain a historical context and an understanding of the difficulties surrounding logical uncertainty.

Hutter et al.’s “Probabilities on sentences in an expressive logic” largely looks at the problem of logical uncertainty assuming access to infinite computing power (and many levels of halting oracles). Understanding Hutter’s approach (and what can be done with infinite computing power) helps flesh out our understanding of where the difficult questions lie.

Demski’s “Logical prior probability” provides a computably approximable logical prior. Following Demski, our work largely focuses on the creation of an approximable prior probability distribution over logical sentences, as the act of refining and approximating a logical prior is very similar to the act of reasoning under logical uncertainty in general.

Vingean reflection

Much of what makes the AI problem unique is that a sufficiently advanced system will be able to do higher-quality science and engineering than its human programmers. Many of the possible hazards and benefits of an advanced system stem from its potential to bootstrap itself to higher levels of capability, possibly leading to an intelligence explosion.

If an agent achieves superintelligence via recursive self-improvement, then the impact of the resulting system depends entirely upon the ability of the initial system to reason reliably about agents that are more intelligent than itself. What sort of reasoning methods could a system use in order to justify extremely high confidence in the behavior of a yet more intelligent system? We refer to this sort of reasoning as “Vingean reflection”, after Vernor Vinge (1993), who noted that it is not possible in general to precisely predict the behavior of agents which are more intelligent than the reasoner.

A reasoner performing Vingean reflection must necessarily reason abstractly about the more intelligent agent. This will almost certainly require some form of high-confidence logically uncertain reasoning, but in lieu of a working theory of logical uncertainty, reasoning about proofs (using formal logic) is the best available formalism for studying abstract reasoning. As such, a modern study of Vingean reflection requires a background in formal logic:

First-Order Logic

MIRI’s existing toy models for studying self-modifying agents are largely based on this logic. Understanding the nuances of first-order logic is crucial for using the tools we have developed for studying formal systems capable of something approaching confidence in similar systems.

We study Vingean reflection by constructing toy models of agents which are able to gain some form of confidence in highly similar systems. To get to the cutting edge, read the following papers:

Yudkowsky’s “The procrastination paradox” goes into more detail on the need for satisfactory solutions to walk a fine line between the Löbian obstacle (a problem stemming from too little “self-trust”) and unsoundness that comes from too much self-trust.

Christiano et al.’s “Definability of truth in probabilistic logic” describes an early attempt to create a formal system that can reason about itself while avoiding paradoxes of self-reference. It succeeds, but has ultimately been shown to be unsound. My walkthrough for this paper may help put it into a bit more context.

If you’re excited about this research topic, there are a number of other relevant tech reports. Unfortunately, most of them don’t explain their motivations well, and have not yet been put into their greater context.

Fallenstein’s “Decreasing mathematical strength…” describes one unsatisfactory property of Parametric Polymorphism, a partial solution to the Löbian obstacle.

Soares’ “Fallenstein’s monster” describes a hackish formal system which avoids the above problem. It also showcases a mechanism for restricting an agent’s goal predicate which can also be used by Parametric Polymorphism to create a less restrictive version of PP than the one explored in the tiling agents paper.

Fallenstein’s “An infinitely descending sequence of sound theories…” describes a more elegant partial solution to the Löbian obstacle, which is now among our favored partial solutions.

Corrigibility

As artificially intelligent systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system “corrigible” if it cooperates with what its creators regard as a corrective intervention, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences.

This field of research is basically brand-new, so all it takes in order to get up to speed is to read a paper or two:

Soares et al.’s “Corrigibility” introduces the field at large, along with a few open problems.

Armstrong’s “Proper value learning through indifference” discusses one potential approach for making agents indifferent between which utility function they maximize, which is a small step towards agents that allow themselves to be modified.

Our current work on corrigibility focuses mainly on a small subproblem known as the “shutdown problem”: how do you construct an agent that shuts down upon the press of a shutdown button, and which does not have incentives to cause or prevent the pressing of the button? Within that subproblem, we currently focus on the utility indifference problem: how could you construct an agent which allows you to switch which utility function it maximizes, without giving it incentives to affect whether the switch occurs? Even if we had a satisfactory solution to the utility indifference problem, this would not yield a satisfactory solution to the shutdown problem, as it still seems difficult to adequately specify “shutdown behavior” in a manner that is immune to perverse instantiation. Stuart Armstrong has written several blog posts about the specification of “reduced impact” AGIs:

Early work in corrigibility can be found on the web forum Less Wrong. Most of the relevant results are captured in the above papers. One of the more interesting of these is “Cake or Death”, an example of the “motivated value selection” problem. In this example, an agent with uncertainty about its utility function benefits from avoiding information that reduces its uncertainty.

Armstrong’s “Utility indifference” outlines the original utility indifference idea, and is largely interesting for historical reasons. It is subsumed by the “Proper value learning through indifference” paper linked above.
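The incentive at the heart of the utility indifference problem can be shown with a cartoon calculation in Python. This is a sketch in the spirit of the indifference idea, not Armstrong's construction: the two actions, the payoffs, the press probability, and the form of the compensating term are all assumptions chosen for illustration.

```python
P_PRESS = 0.25     # assumed chance that a working button gets pressed
U_NORMAL = 10.0    # assumed payoff under the normal utility if work continues
U_SHUTDOWN = 1.0   # assumed payoff under the shutdown utility after shutdown


def naive_value(action: str) -> float:
    """Expected payoff when a button press simply swaps the utility function."""
    if action == "disable_button":
        return U_NORMAL  # the button can never take effect
    return P_PRESS * U_SHUTDOWN + (1 - P_PRESS) * U_NORMAL


def compensated_value(action: str) -> float:
    """Same, but the pressed branch gets a bonus that equalizes both branches."""
    if action == "disable_button":
        return U_NORMAL
    bonus = U_NORMAL - U_SHUTDOWN  # hypothetical compensating term
    return P_PRESS * (U_SHUTDOWN + bonus) + (1 - P_PRESS) * U_NORMAL


for action in ("leave_button", "disable_button"):
    print(action, naive_value(action), compensated_value(action))
```

Under the naive scheme the agent does better by disabling the button; with the compensating term it has nothing to gain or lose either way. As noted above, this says nothing about how to specify the shutdown behavior itself.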

Value learning

Since our own understanding of our values is fuzzy and incomplete, perhaps the most promising approach for loading values into a powerful AI is to specify a criterion for the agent to learn our values incrementally. But this presents a number of interesting problems:

Say you construct a training set containing many outcomes filled with happy humans (labeled “good”) and other outcomes filled with sad humans (labeled “bad”). The simplest generalization, from this data, might be that humans really like human-shaped smiling-things: this agent may then try to build many tiny animatronic happy-looking people.

Value learning must be an online process: the system must be able to identify ambiguities and raise queries about these ambiguities to the user. It must not only identify cases that it doesn’t know how to classify (such as cases where it cannot tell whether a face looks happy or sad), but also identify dimensions along which the training data gives no information (such as when your training data never shows outcomes filled with human-shaped automatons that look happy, labeled as worthless).
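The second kind of ambiguity, dimensions along which the training data gives no information, has a very crude analogue that is easy to sketch in Python: look for features that never vary across the training set. The feature names, the two-example data set, and the helper `uninformative_features` are hypothetical, and a real system would need a far richer notion of a dimension than a named boolean.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, bool], str]  # (features of an outcome, label)

# Hypothetical training set: every example shows real humans.
TRAINING_SET: List[Example] = [
    ({"smiling": True, "human_shaped": True, "is_animatronic": False}, "good"),
    ({"smiling": False, "human_shaped": True, "is_animatronic": False}, "bad"),
]


def uninformative_features(examples: List[Example]) -> List[str]:
    """Features that take a single value everywhere in the training data,
    so the labels cannot say whether they matter."""
    names = examples[0][0].keys()
    return sorted(
        name
        for name in names
        if len({features[name] for features, _ in examples}) == 1
    )


# Candidates for a clarifying query to the user.
print(uninformative_features(TRAINING_SET))
```

Here the data never varies `is_animatronic`, so nothing in the labels rules out the generalization that happy-looking animatronics are just as good.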

Of course, ambiguity identification alone isn’t enough: you don’t want a system that spends the first three weeks asking for clarification on whether humans are still worthwhile when they are at different elevations, or when the wind is blowing, before finally (after the operators have stopped paying attention) asking whether it’s important that the human-shaped things be acting of their own will.

In order for an agent to reliably learn our intentions, the agent must be constructing and refining a model of its operator and using that model to inform its queries and alter its preferences. To learn more about these problems and others, see the following:

MacAskill’s “Normative Uncertainty” provides a framework for discussing normative uncertainty. Be warned, the full work, while containing many insights, is very long. You can get away with skimming parts and/or skipping around some, especially if you’re more excited about other areas of active research.

One approach to resolving normative uncertainty is Bostrom & Ord’s “parliamentary model,” which suggests that value learning is somewhat equivalent to a voter aggregation problem, and that many value learning systems can be modeled as parliamentary voting systems (where the voters are possible utility functions).

Owen Cotton-Barratt’s “Geometric reasons for normalising…” discusses the normalization of utility functions; this is relevant to toy models of reasoning under moral uncertainty.

Fallenstein & Stiennon’s “Loudness” discusses a concern with aggregating utility functions stemming from the fact that the preferences encoded by utility functions are preserved under positive affine transformation (e.g. as the utility function is scaled or shifted). This implies that special care is required in order to normalize the set of possible functions.
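A small numeric sketch in Python shows the concern. Scaling one utility function by a positive constant leaves the preferences it encodes unchanged, yet it changes which option a naive sum selects. The options, the utility values, and the range normalization used at the end are assumptions made for illustration; range normalization is just one possible scheme and is not claimed to be the one these papers recommend.

```python
from typing import Dict, List

Utility = Dict[str, float]  # option -> utility

u1: Utility = {"a": 1.0, "b": 0.0, "c": 0.8}
u2: Utility = {"a": 0.0, "b": 1.0, "c": 0.5}
# Same preferences as u2 (positive rescaling), but ten times "louder".
loud_u2: Utility = {option: 10 * value for option, value in u2.items()}


def best_option(utilities: List[Utility]) -> str:
    """Pick the option with the highest summed utility."""
    options = utilities[0].keys()
    return max(options, key=lambda option: sum(u[option] for u in utilities))


def range_normalize(u: Utility) -> Utility:
    """Rescale so the worst option scores 0 and the best scores 1."""
    low, high = min(u.values()), max(u.values())
    return {option: (value - low) / (high - low) for option, value in u.items()}


print(best_option([u1, u2]))
print(best_option([u1, loud_u2]))
print(best_option([range_normalize(u1), range_normalize(loud_u2)]))
```

Without normalization, the louder representation of the same preferences flips the aggregate choice from `c` to `b`; after normalizing, the rescaling no longer matters.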

Other tools

Mastery in any subject can be a very powerful tool, especially in the realm of mathematics, where seemingly disjoint topics are actually deeply connected. Many fields of mathematics have the property that if you understand them very, very well, then that understanding is useful no matter where you go. With that in mind, while the subjects listed below are not necessary in order to understand MIRI’s active research, an understanding of each of these subjects constitutes an additional tool in the mathematical toolbox that will often prove quite useful when doing new research.

Discrete Math

Most math studies either continuous or discrete structures. Many people find discrete mathematics more intuitive, and a solid understanding of discrete mathematics will help you gain a quick handle on the discrete versions of many other mathematical tools, such as group theory, topology, and information theory.

Linear Algebra

Linear algebra is one of those tools that shows up almost everywhere in mathematics. A solid understanding of linear algebra will be helpful in many domains.

Type Theory

Set theory commonly serves as the foundation for modern mathematics, but it’s not the only available candidate. Type theory can also serve as a foundation for mathematics, and in many cases, type theory is a better fit for the problems at hand. Type theory also bridges much of the theoretical gap between computer programs and mathematical proofs, and is therefore often relevant to certain types of AI research.

Category Theory

Category theory studies many mathematical structures at a very high level of abstraction. This can help you notice patterns in disparate branches of mathematics, and makes it much easier to transfer your mathematical tools from one domain to another.

Topology

Topology is another one of those subjects that shows up pretty much everywhere in mathematics. A solid understanding of topology turns out to be helpful in many unexpected places.

Computability and Complexity

MIRI’s math research is working towards solutions that will eventually be relevant to computer programs. A good intuition for what computers are capable of is often essential.

Program Verification

Program verification techniques allow programmers to become confident that a specific program will actually act according to some specification. (It is, of course, still difficult to validate that the specification describes the intended behavior.) While MIRI’s work is not currently concerned with verifying real-world programs, it is quite useful to understand what modern program verification techniques can and cannot do.

Understanding the mission

Why do this kind of research in the first place? (The first book below is the most important.)

Superintelligence

This guide largely assumes that you’re already on board with MIRI’s mission, but if you’re wondering why so many people think this is an important and urgent area of research in the first place, Superintelligence provides a nice overview.

Global Catastrophic Risks

But what about other global risks? How does AI compare to them? This book provides an introductory overview of the global risk landscape.

Rationality: From AI to Zombies

This electronic tome compiles six volumes of essays that explain much of the philosophy and cognitive science behind MIRI’s perspective on AI.