Yes, although I would word it as “the nodes include everything relevant to our implied preferences”, rather than “whole worlds”, just to be clear what we’re talking about. Certainly the entire notion of adding together two utilities is something which requires additional structure.

Roughly correct. The missing piece is completeness: for the DAG to uniquely define a utility function, we have to have an edge between each pair of nodes. Then the argument works.

The relative magnitude of the numbers matters only in nondeterministic scenarios, where we take an expectation over possible outcomes. If we restrict ourselves to deterministic situations, then any monotonic transformation of the utility function produces the exact same preferences. In that case, the toposort numbers are fine.
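A minimal sketch of that point, assuming three made-up outcomes (`mud`, `toast`, `cake`) and a hypothetical helper `toposort_utility`; none of these names come from the original discussion. It reads a utility off the toposort of a complete DAG, then applies a monotonic transformation (cubing) and compares choices with and without uncertainty.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical preference DAG: an edge (worse, better) means "better is preferred".
# It is complete: every pair of outcomes has an edge between them.
PREFS = [("mud", "toast"), ("toast", "cake"), ("mud", "cake")]

def toposort_utility(edges):
    """Use each node's position in a topological order as its utility."""
    ts = TopologicalSorter()
    for worse, better in edges:
        ts.add(better, worse)  # 'worse' must appear before 'better'
    return {node: rank for rank, node in enumerate(ts.static_order())}

def expected(utility, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility[x] for x, p in lottery.items())

u = toposort_utility(PREFS)                # mud=0, toast=1, cake=2
v = {x: val ** 3 for x, val in u.items()}  # monotonic transform: 0, 1, 8

# Deterministic choices: both utilities rank the outcomes identically.
same_ranking = sorted(u, key=u.get) == sorted(v, key=v.get)

# Nondeterministic choice: a sure toast versus a gamble between mud and cake.
sure_toast = {"toast": 1.0}
gamble = {"mud": 0.6, "cake": 0.4}
u_takes_gamble = expected(u, gamble) > expected(u, sure_toast)
v_takes_gamble = expected(v, gamble) > expected(v, sure_toast)
```

Under `u` the sure toast wins, while under `v` the gamble wins, even though the two utilities agree on every deterministic comparison. That is the sense in which relative magnitudes only start to matter once we take expectations.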

Your intuition that transitivity is the key requirement is a good one. Completeness is more of a model foundation; we need completeness in order to even have preferences which can be transitive in the first place. A failure of completeness would mean that there “aren’t preferences” in some region of world-space. In practice, that’s probably a failure of the model: if the real system is offered a choice, it’s going to do something, even if that something amounts to really weird implied preferences.

So when I talk about Dr Malicious pushing us into a region without ordered preferences, that’s what I’m talking about. Even if our model contains no preferences in some region, we’re still going to have some actual behavior in that region. Unless that behavior implies ordered preferences, it’s going to be exploitable.

As for AIs reasoning about universe-states...

First, remember that there’s no rule saying that the utility must depend on all of the state variables. I don’t care about the exact position of every molecule in my ice cream, and that’s fine. Your universe can be defined by an infinite-dimensional state vector, and your AI can be indifferent to all but the first five variables. That’s fine.

Other than that, the above comments on completeness still apply. Faced with a choice, the AI is going to do something. Unless its behavior implies ordered preferences, it’s going to be exploitable, at least when faced with those kinds of choices. And as long as that exploitability is there, Dr Malicious will have an incentive to push the AI into the region where completeness fails. But if the AI has ordered preferences in all scenarios, Dr Malicious won’t have any reason to develop peach-ice-cream-destroying nanobots, and we probably just won’t need to worry about it.

They were interviewed using the British Crime Survey questionnaire for domestic and sexual violence, and their responses were compared to those from 22,606 respondents to the 2011/12 national crime survey.

I find it totally plausible that there’s a much higher base rate of crime, abuse, etc among the mentally ill. But if we want to argue against the model “mentally ill people are more likely to be delusional or exaggerating”, then a study which just asks them is not going to lend much evidence.

The approximation of VNM rationality is foundational to most of economics. The whole field is basically “hey, what happens if you stick together VNM agents with different utility functions, information and resource baskets?”. So pretty much any successful prediction of economics is an example of humans approximating VNM-rational behavior. This includes really basic things like “prices increase when supply is expected to decrease”. If people lacked (approximate) utility functions, then prices wouldn’t increase (we’d just trade things in circles). If people weren’t taking the expectation of that utility function, then the mere expectation of shortage wouldn’t increase prices.

This is the sort of thing you need VNM utility for: it’s the underlying reason for lots of simple, everyday things. People pursue goals, despite having imperfect information about their environment—that’s VNM utility at work. Yes, people violate the math in many corner cases, but this is remarkable precisely because people do approximate VNM pretty well most of the time. Violations of transitivity, for instance, require fairly unusual conditions.

As for the risk of mugging, there are situations where you will definitely be money-pumped for violating VNM: think Wall Street or Vegas. In those situations, it’s either really cheap to money-pump someone (Wall Street), or lots of people are violating VNM (Vegas). In most day-to-day life, it’s not worth the effort to go hunting for people with inconsistent preferences or poor probability skills. Even if you found someone, they’d catch on to your money-pumping pretty quickly, at which point they’d update to better approximate VNM rationality. Since it’s not profitable, people don’t usually do it. But as Wall Street and Vegas suggest, if a supply of VNM irrationality can be exploited with reasonable payoff-to-effort, people will exploit it.

Downvoting, because it is prescriptive, and the comment doesn’t even bother to argue why it wouldn’t be. VNM utility generalizes both the Dutch Book arguments and deterministic utility, and similar arguments apply.

Let’s talk about why a VNM utility is useful in the first place. The first reason is prescriptive: if you don’t have a VNM utility function, you risk being mugged by wandering Bayesians (similar to Dutch Book arguments). The second is descriptive: humans definitely aren’t perfect VNM-rational agents, but it’s very often a useful approximation. These two use-cases give different answers regarding the role of completeness.

First use-case: avoiding losing one’s shirt to an unfriendly Bayesian, who I’ll call Dr Malicious. The risk here is that, if we don’t even have well-ordered preferences in some region of world-space, then Dr Malicious could push us into that region and then money-pump us. But this really only matters to the extent that someone might actually attempt to pull a Dr Malicious on us, and could feasibly push us into a region where we don’t have well-ordered preferences. No one can feasibly push us into a world of peach ice-cream, and if they could, they’d probably have easier ways to make money than money-pumping us.
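For concreteness, here is a toy money pump. Everything in it is an assumption made for illustration: the items `A`, `B`, `C`, the one-unit fee, and the helper `money_pump`. The agent pays the fee whenever it is offered something it prefers to what it currently holds.

```python
# Each pair is (preferred, over). The cyclic agent prefers B to A, C to B, and A to C.
CYCLIC = {("B", "A"), ("C", "B"), ("A", "C")}
# A transitive agent for comparison: C over B over A.
TRANSITIVE = {("B", "A"), ("C", "B"), ("C", "A")}

def money_pump(prefers, start_item, start_money, fee, offers):
    """Offer trades in sequence; the agent pays `fee` for each trade it prefers."""
    item, money = start_item, start_money
    for offered in offers:
        if (offered, item) in prefers and money >= fee:
            item, money = offered, money - fee
    return item, money

offers = ["B", "C", "A"] * 3  # Dr Malicious walks around the cycle three times

cyclic_result = money_pump(CYCLIC, "A", 10, 1, offers)
transitive_result = money_pump(TRANSITIVE, "A", 10, 1, offers)
```

The cyclic agent accepts all nine trades and ends up holding `A` again with 1 unit of money left out of 10. The transitive agent trades up to `C` for 2 units and then refuses everything else, so there is nothing further to pump.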

Second use-case: prediction based on approximate-VNM. Just like the first use-case, completeness really only matters over regions of world-space likely to come up in the problem at hand. If someone has no implicit utility outside that region, it usually won’t matter for our predictions.

So to close: this is an instance of a spherical cow in a vacuum. In general, the spherical-cow-vacuum assumption is useful right up until it isn’t. Use common sense: the real world does not perfectly follow the math, but the math is still really useful. You can add in corrections if and when you need them.

Donated. I’ve been wanting to see something like this for a while, excited to see it happen.

I’m sure there are a dozen people reading this and preparing their arguments for why it probably won’t work, but remember: with this kind of upside it’s worth donating even for a small chance that it will work. I wouldn’t give it a very high chance of success, maybe 10-20% of proving something which increases human lifespan by at least 15 years, but that’s a very worthwhile investment.

From this standpoint, the key property of a daemon (or any other goal-driven process) is that it’s adaptive: it will pursue the goal with some success across multiple possible environments. Intuitively, we expect that adaptivity to come with a complexity cost, e.g. in terms of circuit size.

Let’s set aside daemons for a moment, and think about a process which does “try to” make accurate predictions, but also “tries to” perform the relevant calculations as efficiently as possible. If it’s successful in this regard, it will generate small (but probably not minimal) prediction circuits. Let’s call this an efficient-predictor process. The same intuitive argument used for daemons also applies to this new process: it seems like we can get a smaller circuit which makes the same predictions, by removing the optimizy parts.

This feels like a more natural setting for the problem than daemons, but also feels like any useful result could carry back over to the daemon case.

The next step along this path: the efficient-predictor is presumably quite general; it should be able to predict efficiently in many different environments. The “optimizy parts” are basically the parts needed for generality. Over time, the object-level prediction circuit will hopefully stabilize (as the process adapts to its environment), so the optimizy parts mostly stop changing around the object-level parts. That’s something we could check for: after some warm-up time, we expect some chunk of the circuit (the optimizy part) to be mostly independent of the outputs, so we get rid of that chunk of the circuit.

Tl;dr: the second half of the post conflates ability bias with signalling.

You’d think you could also do this simply by paying people purely on the basis of their educations (and perhaps seniority), like a government or a union shop where ability isn’t relevant. The problem is you’d still need to fix that more able people more often attend, and conditional on going more often finish.

That is the obvious explanation, and is completely consistent with everyday experience. Sure, I could waltz through a PhD no problem, but I’m not getting paid nearly as much today as I would with a PhD.

And yes, I do get paid more than my former classmates who would not be able to handle a PhD. So that speaks to nonzero ability bias (n=1). On the other hand, the difference in pay between “has an undergrad degree and could handle grad” vs “has an undergrad degree and could not handle grad” is presumably way smaller than the difference between “has an undergrad degree and could handle grad” vs “has a grad degree”.

If you want to hypothesize a world where ability bias is actually zero, then yes, you’d have to turn to the weird scenarios in this post. But you don’t need any of that to hypothesize a world where ability bias looks like zero—is statistically indistinguishable from zero. For that, you just need a world where ability bias is rounding error on top of the main effect, i.e. signalling, and all the statistical effect from ability bias gets masked by that larger effect.

In particular, if you measure ability bias by looking at earnings after controlling for education level, then you do not need to “fix that more able people more often attend, and conditional on going more often finish” in order to find statistically zero ability bias effect. The effect from “more able people more often attend” etc would be signalling, not ability bias—as the earlier part of the post defines them. Those effects go away when we control for education; that’s the whole point of controlling for education. If the large effect is signalling, then of course we’re not going to find a large effect when we look for ability bias separate from signalling!
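A toy illustration of the controlling-for-education point. All numbers here are invented: a made-up population in which more able people attend more often, and an assumed wage rule with a large degree premium and a small ability premium. It says nothing about real data.

```python
from statistics import mean

# Hypothetical population as (ability, degree, headcount).
# Able people (ability=1) mostly hold the degree; less able people mostly don't.
GROUPS = [(1, 1, 8), (1, 0, 2), (0, 1, 2), (0, 0, 8)]

def wage(ability, degree):
    # Assumed wage rule: the degree is worth 20, ability is worth only 1.
    return 30 + 20 * degree + 1 * ability

# Expand the groups into one (ability, degree, wage) row per person.
people = [(a, d, wage(a, d)) for a, d, n in GROUPS for _ in range(n)]

def ability_gap(rows):
    """Mean wage of high-ability rows minus mean wage of low-ability rows."""
    high = mean(w for a, _, w in rows if a == 1)
    low = mean(w for a, _, w in rows if a == 0)
    return high - low

# Without controls, the gap also picks up "more able people more often attend".
raw_gap = ability_gap(people)

# Controlling for education: compare only people with the same degree status.
gap_by_degree = {d: ability_gap([r for r in people if r[1] == d]) for d in (0, 1)}
```

In this toy world the raw gap between high- and low-ability people is 13, but within each education level it is only 1. The attendance effect drops out once education is held fixed, and what remains is the small ability term, which in noisy real data could easily be indistinguishable from zero.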

Now, I certainly agree that Berkeley professors arguing for the effectiveness of education should be viewed with an awful lot of suspicion. But maybe rather than just dropping a-priori anvils, look at the data? It’s entirely plausible that ability bias effects are statistically indistinguishable from zero, but this post doesn’t really provide much evidence toward that question one way or the other.

If any macrophenomenon is found to be reproducible, then it follows that all microscopic details that were not reproduced, must be irrelevant for understanding and predicting it. In particular, all circumstances that were not under the experimenter’s control are very likely not to be reproduced, and therefore are very likely not to be relevant.

I’m having trouble expressing in words just how useful that is. It clarifies a whole range of questions and topics I think about regularly. Thank you for sharing!

I disagree that “pure empiricism usually works pretty well”. It’s more that a failure of theory looks different: to someone who lacks the necessary theory, a problem simply looks intractable or anti-inductive or just plain confusing.

Economics is a great source of examples: consider rent control. To someone with no knowledge of rent control, the fact that it creates a shortage of housing units is surprising; to someone who’s been through econ 101, it’s obvious. To someone without the theoretical knowledge, it might not even be obvious that rent control has anything to do with the shortage at all—to the pure empiricist, the world is full of random surprises popping up all the time, and a housing shortage is just one more such surprise. Why would it have anything to do with rent control?

This leads into a broader point: theory tells us which questions to ask in the first place. Economics theory says “housing shortage? Check for rent control!”, whereas pure empiricism would just check every conceivable factor to see what correlates with housing shortages. It’s the same principle as privileging the hypothesis: a large amount of evidence is needed just to distinguish a hypothesis from entropy in the first place. Theory can provide that evidence: it doesn’t always give the right answers, but it gives us enough evidence to pick a hypothesis.

The first objection is particularly interesting, and I’ve been mulling another post on it. As a general question: if you want to have high impact on something, how much decision-making weight should you put on leveraging your existing skill set, versus targeting whatever the main bottleneck is regardless of your current skills? I would guess that very-near-zero weight on current skillset is optimal, because people generally aren’t very strategic about which skills they acquire. So e.g. people in semiconductor physics etc probably didn’t do much research in clean energy bottlenecks before choosing that field—their skillset is mostly just a sunk cost, and trying to stick to it is mostly sunk cost fallacy (to the extent that they’re actually interested in reducing carbon emissions). Anyway, still mulling this.

Totally agree with the second objection. That said, there are technologies which have been around as long as PV which look at-least-as-promising-and-probably-more-so but receive far less research attention: solar thermal and thorium were the two which sprang to mind, but I’m sure there are more. From an outside view, we should expect this to be the case, because academics usually don’t choose their research to maximize impact; they choose it based on what they know how to study. Which brings us back to the first point.

Personally, I usually like the values of five-year-olds better than the values of adults. The five-year-olds haven’t had the ambition beaten out of them yet; they at least still have their sights aimed high. They want to be astronauts or whatever. You talk to the average adult over thirty, and their life goals amount to “impress friends/family, raise the kids well, prep for retirement, have some fun”.

Side note: I remember lying in bed worrying about this back in sixth grade. I promised myself I wouldn’t abandon my ambitions when I got older. Turns out I broke that promise; I decided my childhood ambitions weren’t ambitious enough. It just never occurred to me until high school that “don’t die at all” could be on the table.