Sometimes all functions are continuous

You may have heard at times that there are mathematicians who think that all functions are continuous. One way of explaining this is to show that all computable functions are continuous. The point not appreciated by many (even experts) is that the truth of this claim depends on what programming language we use.

Surely they are not all continuous!?

You must be thinking to yourself: how can anyone in their right mind claim that all functions are continuous – here’s one that isn’t:
$$\mathrm{sgn}(x)=\begin{cases}
-1 & x < 0 \\
0 & x = 0 \\
1 & x > 0
\end{cases}$$
At $x=0$ the sign function jumps from $-1$ to $0$ to $1$, which is a nice discontinuity. As crazy as it seems, it makes sense to refuse to admit that $\textrm{sgn}$ is a legitimate function!

The official definition nowadays is that a function $f : A \to B$ is the same thing as a functional relation on $A \times B$. Recall that a relation $R$ on $A \times B$ is just a subset of $A \times B$. It is functional when it is

single-valued: if $(x,y) \in R$ and $(x,z) \in R$ then $y = z$.

total: for every $x \in A$ there is $y \in B$ such that $(x,y) \in R$.

If you are a programmer or a computer-sciency mathematician, you will probably recite the above definition when asked by a judge what a function is. Alas, your own brain secretly thinks that functions are the same thing as procedures that you can implement on your computer. Perhaps you will not admit it, but a careful psychoanalysis of your mind would reveal, among other things, that you never ever concern yourself with non-computable functions. (Note: if you are a physicst, your mind is of a different sort. I shall address your psychology on another day.)

There are no discontinuous ones

So for a moment let us shake off the prejudices of classical training and accept the following definition, which is accepted in certain brands of constructive mathematics.

Definition: A function $f : A \to B$ is a computational procedure which takes as an input $x \in A$ and yields as output a uniquely determined $y \in B$.

We have left out many details here, but the basic intuition is clear: “It ain’t a function if it can’t be computed”. Now we return to the sign function defined above. Is it a function? In order to compute $\mathrm{sgn}(x)$, we must for any given real number $x$ discover whether it is negative, zero, or positive. So suppose we inspect the digits of $x$ for a while and discover, say, that the first billion of them are all zero. Then $\mathrm{sgn}(x)$ is either $0$ or $1$, but we cannot tell which. In fact, no finite amount of computation will guarantee that we will be able to tell whether $x=0$ or $x > 0$. Since infinite amount of computation is impossible in finite time (please do not drag in quantum computing, it does not help) the sign function cannot be computed. It is not a function!

An argument similar to the one above shows that we cannot hope to compute a function $f : \mathbb{R} \to \mathbb{R}$ at a point of discontinuity. With a more formal treatment we could prove:

Theorem: There are no discontinuous functions.

As long as we stick to functions that can actually be computed, we will never witness a discontinuity. Classicaly, there is no difference between the above theorem and

Theorem (classically equivalent form): All functions are continuous.

When constructive mathematicians says that “all functions are continuous” they have something even better in mind. They are telling you that all functions are computably continuous. This is where interesting stuff begins to happen.

They might be all continuous

To simplify the argument, let me switch from real-valued functions $\mathbb{R} \to \mathbb{R}$ to something that is more readily handled in a programming language. Instead of the reals, let us consider Baire space $B$, which is the space of all infinite number sequences,
$$B = \mathbb{N} \to \mathbb{N}.$$
Programmers will recognize in $B$ the datatype of infinite streams of non-negative integers, or the total elements of type nat -> nat. A function $f : B \to \lbrace 0,1 \rbrace$ is said to be classically continuous if the value $f(x)$ depends only on finitely many terms of the input sequence $x$. A function $f$ is computably continuous, if we can actually compute an upper bound on how many terms of $x$ are needed to determine $f(x)$. (This characterization of continuity of $f : B \to \lbrace 0,1 \rbrace$ comes from taking the product topology on $B$ and discrete topology on $\lbrace 0,1 \rbrace$.) Since we are concerned with computable continuity we shall drop the qualifier “computable” and call it just continuity.

Theorem: All functions $B \to \lbrace 0,1 \rbrace$ are continuous.

Without going into too many details, let me just state that the proof of the above theorem comes down to the following programming exercise.

Exercise: Write a program $m : (B \to \lbrace 0,1 \rbrace) \to B \to \mathbb{N}$ such that, for any $f : B \to \lbrace 0,1 \rbrace$ and $x \in B$ the value of $f(x)$ depends at most on the first $m(f)(x)$ terms of $x$. More precisely, if $M = m(f)(x)$, then for every $y \in B$, if $x_0 = y_0$, …, $x_M = y_M$ then $f(x) = f(y)$.

The program $m$ is called a modulus of continuity functional because it gives us information about how $f$ is continuous: any input $y \in B$ which agrees with $x$ in the first $m(f)(x)$ terms gives the same output $f(y)$ as does $f(x)$. Incidentally, such an $m$ is an order 3 functional.

A strategy for computing $m(f)(x)$ is clear enough: given $f$ and $x$, inspect the computation of $f(x)$ to discover how much of input $x$ is actually needed to arrive at the answer. Since the computation of $f(x)$ takes only finitely many steps, an infinite prefix of the sequence $x$ will suffice.

Let us write $m$ in a real programming language. How can we do that? The trick is rather obvious, once you think of it. Instead of computing $f(x)$, we compute $f(y)$ where $y$ is special: it gives the same answers as $x$ but it also records the highest index $n$ which was fed to it. In ocaml we get something like this:

Function $y$ behaves like $x$, except that it stores in the reference $k$ the highest value of $n$ for which $y n$ has been computed so far. To compute $m(f)(x)$ we evaluate $f y$. After it returns, we look at $k$ to see how many terms $f$ used.

Another possibility is to use exceptions instead of mutable store. This time, we feed $f$ a $y$ which behaves like $x$, except that if $f$ attempts to compute $y n$ for $n \geq k$, we throw an exception. Here $k$ is a variable parameter. First we try to compute $f y$ with $k = 0$. If we get an exception, we catch it and try $k = 1$. As long as we keep getting exceptions, we increase the value of $k$. At some point no exception happens and we know that $f$ looked only at those values $y n$ for which $n \leq k$. The code in ocaml is this:

We now have two solutions, one using mutable store, another using exceptions. Could we use some other feature instead? Yes, we could program $m$ using continuations, which I leave as an exercise.

An interesting question comes to mind: which programming features, apart from the ones we mentioned above, allow us to program $m$? In terms of Haskell, the same question is which monads allow us to program $m$. Can we do it without any extra features and use just “pure” functional programming?

If by pure functional programming we understand a functional programming language with natural numbers, booleans and recusive definitions, also known as PCF, the answer is no. The proof of this uses denotational semantics and domain theory, and I will not go into it now. You may entertain yourself by trying and failing to define $m$ in pure Haskell or ML, i.e., no side effects, no exceptions, no mutable store, no parallelism, no monads.

As far as I know, nobody has studied systematically which programming features allow us to program $m$. So I pose it as a question:

What can be said about those programming features X for which PCF+X causes all functions to be computably continuous?

We know mutable store, exceptions and continuations give us continuity of all functions. Some other candidates to think about are: parallelism, non-determinism, communication channels, timeouts (interrupting computations that take too long) and unqouting (disassembling the code at run-time).

Type II Computability

Very likely “Type 2 Computability” experts are lurking around. This comment is for them. In Type 2 Computability we do not work with a programming language but directly with (Type 2) Turing machines. There is a well-known proof that in this setting the modulus of continuity $m$ is computable. The proof uses the fact that functions $B \to \lbrace 0,1 \rbrace$ are represented in such a way that the value of $m(f)(x)$ can be read off the tape directly.

Type II Turing machines are not special. They are just another programming language in disguise. In comparison with pure functional programming, a Type II Turing machine can do the following special thing: it can do a thing for a while, and if it takes too long, it just stops doing it and does something else instead. The programming feature that this corresponds to is timeout, which works as follows: $\mathtt{timeout}\, k\, x\, y$ evaluates $x$ for at most $k$ steps. If $x$ evaluates successfully within $k$ steps, its value is returned. If the evaluation of $x$ is interrupted, $y$ is evaluated instead. I have heard John Longley claim that Type II Turing machines are equivalent to PCF+timeout+catch (where “catch” is a form of local exceptions), but I think the question was never settled. It would be interesting to know, because then we could replace arguments using Turing machines with source code written in ML+timeout+catch.

Is there a lesson?

The lesson is for those “experts” who “know” that all reasonable models of computation are equivalent to Turing machines. This is true if one looks just at functions from $\mathbb{N}$ to $\mathbb{N}$. However, at higher types, such as the type of our function $m$, questions of representation become important, and it does matter which model of computation is used.

Post navigation

38 thoughts on “Sometimes all functions are continuous”

“In order to compute sgn(x), we must for any given real number x discover whether it is negative, zero, or positive. So suppose we inspect the digits of x for a while and discover, say, that the first billion of them are all zero. Then sgn(x) is either 0 or 1, but we cannot tell which. In fact, no finite amount of computation will guarantee that we will be able to tell whether x=0 or x>0. Since infinite amount of computation is impossible in finite time (please do not drag in quantum computing, it does not help) the sign function cannot be computed. It is not a function!”

By this logic, wouldn’t all functions be non-computable? For all functions, we need to know what x is to determine what the function output should be.

Of course not. The special thing about `\text{sgn}(x)` is that you need to “know” `x` to infinite precision in order to produce just one bit of output. But there are many functions where, in order to compute `n` bits of output you only need to know some finite amount of input bits. For example, if we took the function `f(x) = 2x`, then in order to get `n` bits of `f(x)` correct, we need to know only the first `n+1` bits of `x`.

I think we need to be careful about what “to know `x`” means. It makes no sense to say “a machine knows the number `x`” — a machine might be given `x` as an infinite sequence of digits on an input tape, but it does not “know” `x` in the sense of it being aware of `x`.

It’s not naive but it does not work because it destroys computability of basic operations. Suppose we did that. Then every operation on reals, e.g. addition and subtractions, would have to compute not only the magnitude of the result, but also its sign. In particular, we would have to compute the sign of `x – y`, given `x`, `y` and their signs. But this is impossible, as it would allow us to compare real numbers (compute `x – y` and look at the sign to see whether `x = y`), from which we could solve the Halting Problem.

So, if you’re willing to sacrifice computability of addition and subtraction, you can do a thing like that. But then you are not representing “real” real numbers anymore.

Can the proof be done by simply observing that this functional m does not preserve extensionality of functions, and hence cannot be definable in the pure lambda calculus? Anyway, this is a nice example.

I’m interested in the continuation implementation because I thought that continuations could be implemented in the pure lambda calculus. At least they seem to be done in such a way in Haskell’s Control.Monad.Cont.

Indeed, the proof that you cannot write modulus of continuity in pure `lambda`-calculus has to do with the fact that modulus seems to be non-extensional. But do you really think it is just a matter of “observation” that the functional `m` is necessarily non-extensional?

And here is functional `m` with continuations. I used SML/NJ continuations. If there is a nice person who is proficient in Haskell, please rewrite it in Haskell for roconnor.

Well, I think it is obviously true, but probably hard to prove. ;) In the following, newf transforms f into an extensionally equivalent function that accesses at least the 100th element.

newf f b = (f b)+(b 100)-(b 100)

or even worse newf’ accesses some element greater than f does.

newf’ f b = (f b)+(b ((m f b)+1))-(b ((m f b)+1))

Of course to prove it requires one to actually define the semantics of evaluation to determine what the elements accessed are. Then proving all this works is probably not trivial. For instance, I suppose it possible to have some strange semantics that replaces e-e directly with 0. Then my example doesn’t work.

No, it is not crazy. By the same reasoning, we should all stop doing mathematics because “most problems are unsolvable”, right? But clearly mathematics has had an enormous impact on our society (a good one). Similarly, real numbers have had an imporant value in the history of science. Without classical analysis, which is based on real numbers, physics probably would not have made the progress it did.

Even if it turns out that in the future we will not use real numbers at all (I find this probable, in fact), this does not mean that today we should dismiss real numbers based on speculation by a couple of physicsts and Chaitin’s fine piece of popular science writing.

If you are claiming that it is crazy to talk about computations on real numbers because somehow computation on non-computable reals is an ill-defined concept, then you are just flat out mistaken.

I ported the code to Haskell, and now I clearly see what is going on. Although it is essentially written in the pure lambda calculus one is still forced inside the continuation monad (of course). This means the type of m becomes

((a -> Cont a b) -> Cont a t) -> (a -> Cont a b) -> a

where a is something like Integer.

So the input are functions and functionals that are Kleisli arrows rather than pure function arrows.
This type is basically a nice way of writing

Hi! In comment 4, you write: “But this is impossible, as it would allow us to compare real numbers […], from which we could solve the Halting Problem.”

Are there formal systems in which the Halting Problem is solvable? If so, does this have any consequences on the cited passage? (Sorry if my questions are naive or FAQs. Pointers to papers would totally suffice.)

I am not quite sure you asked what you meant to ask. To solve the Halting Problem one has to provide a program which does a certain thing. Such a program does not exist. The proof that the required program does not exist only uses logical principles that everybody agrees on, i.e., it is valid both intuitionistically and classically, so one would have to come up with a rather bizarre logic to invalidate the proof. Are you asking whether it is possible to change the meaning of the word “program” so that there will be one that solves the Halting Problem?

The Halting Problem also manifests itself in logical form as the question whether the following holds:

Trivially, the statement holds if we use classical logic, as it is just a special case of the law of excluded middle. The statement cannot be proved in constructive logic. There are formal systems strictly weaker than classical logic in which the statement holds. All of this has little to do with what can actually be computed (with the usual understanding of the word “program”). I hope I am answering your question, whatever it is.

Thank you very much, you have not only answered my question, but also gave me new food for thought, namely the logical form of the Halting Problem. :) Thanks!

(For reference, my question was answered in “so one would have to come up with a rather bizarre logic to invalidate the proof”. I thought that maybe there’d exist some known logic which is a) not too bizarre and b) invalidates the proof.)

From your argument for the continuity of sgn above, it appears that you’re using a two’s complement representation, scanning the bits from the “left” (i.e. your statement “… the first billion of them are all zero. Then sgn(x) is either 0 or 1 …”).

I don’t see how a similar argument applies to sg. Isn’t the first bit (again, in two’s comp. representation) enough to distinguish negative from non-negative?

I’m struggling with ‘All functions Bâ†’{0,1} are continuous’, yet it seems I can make a function over B, such as: ‘if the number stream contains a 1, return 1, otherwise return 0’, which should be rejected for the same reason ‘sgn’ is rejected.

Hopefully last update (sorry slow on the B uptake)… not if ‘number stream contains a 1, but f only sees a 1 only after a very large number of other values’ (understanding B as sets of sets of N). If the Type II proof orders the values on the tape, I think that’s cheating, if they don’t, then it seems like there needs to be some number of ‘timeouts’ above which the machine says ‘not a function’. If so, they should have the same problem as non-type II machines on random input-value tapes.

Reply to comment 15 by jn: First, I did not make an argument “for continuity” of sgn, but rather “against continuity”. Whether your sg is continuous (computable) depends on how you represent real numbers. Assuming a standard representation of reals (one in which addition is computable, for example), your sg is not computable and is not continuous as a map between representations. I do not know what “two’s complement representation” is for real numbers (including negative ones). Please provide more detail about your representation. Is addition computable in your representation?

If you can write down a base-10 representation of a real number then it must follow that that number is rational and finite, because irrational numbers have an infinite numeric representation and it is impossible to write down an infinite string and all finite base-10 numerals represent finite rational numbers.

So if you can write a number down, I can implement a sgn function that will tell you its sign using effort at most proportional to what it took you to write the number down.

If you get to manipulate real numbers containing an unbounded amount of information, then I get to have functions that process an infinite amount of information in a finite time. No finite tools can work with infinite bundles of information.

@NovaDenizen: you misunderstand what it means to inspect the digits of a number, or what it means to have a real number given as a sequence of digits. Namely, the real number is not given by an infinite sequence that is actually written down anywhere, but rather as an algorithm, or a black box, which accepts a number `n` and outputs the `n`-th digit. At no point do I manipulate an infinite amount of information, yet I can look at any digit I want. Of course, I cannot look at all digits in a finite time, which is sort of the point of the post.

@Jacob: your function is of course continuous as a map from the streams of digits to digits. However, this function does not represent a function between real numbers because it does not respect equality of reals. Let me call your function `f` for easier notation. Then we have `f(0.99999…) = 0` and `f(1.00000…) = 1`, however the streams of digits `0.99999…` and `1.0000…` both represent number one. One way to get around this problem is to forbid streams which end with `99999…` If we do so your function `f` will be well-defined as a map from the real numbers to digits. However, the topology that is induced on the reals with this representation is not the usual Euclidean topology but rather something that goes under the name Fine metric. And in this topology `f` is continuous again!

In general, when we compute with the points of a space `X` we do not actually use the points themselves but their representations in terms od datatypes (such as streams of digits). It is important that we take as the topology of `X` the one induced by the representation, otherwise we can make any function continuous or discontinuous at will just by changing the topology of `X`.

I’m coming to this discussion very late. I wanted to make two comments. First, thank you for helping me to understand, at last, the sense of the apparently obviously false statement that every function is continuous. Secondly, I thought I’d just point out that I was confused by the definition you gave of a function in the post and reached my eventual understanding only after reading your answer to the first question above. I think it would be helpful to make it more explicit what computation over the real numbers means. That is, a real number is something like an oracle that tells you, for each n, what the nth digit is. A function from the reals to the reals is a computational procedure that can convert an oracle for x into an oracle for f(x). The classical function sgn is not a function in this sense because, as you point out, if x happens to be 0 then you need to know infinitely many digits of x to be sure of even the first digit of sgn(x).

Actually, I now see that that’s not quite the right way of putting it, since then x goes to 3x wouldn’t be a function. (Proof: you need to know infinitely many digits of 1/3 to know whether its image begins with 0.9 or 1.0.) So to make sure we’re talking about the right topology we have to talk about knowing f(x) to arbitrary precision rather than about knowing digits.

But if you define a function to be a computable procedure that can output f(x) to arbitrary precision given x to sufficient precision, it feels almost tautologous to say that a function is continuous.

@gowers: Precisely, it is almost tautologous that computable implies continuous when you get the representations and topologies right. This might be why some people think that “all functions are continuous” is a reasonable axiom. As always, the difference between classical and constructive math is not in what is true, but in what words mean. Constructive mathematicians mean something else by “function” than classical mathematicians do.

“All reasonable models of computation are equivalent to Turing machines” is the Church-Turing thesis. What it means in this context is that functions like m and timeout are computable by Turing machines if the higher-order functional arguments are supplied as a Turing machine number (rather than as a black box). That doesn’t mean that there aren’t interesting things to say about black boxes that do and don’t support mutable state, call/cc, quantum gates, etc.; it just isn’t what the Church-Turing thesis happens to be about.

I think you should rewrite or delete your last paragraph, because as it stands it’s like accusing the “experts” who “know” that 1+1=2 of not realizing that it isn’t true of water and ethanol.

@Ben: if I understand you correctly your argument is: if we pick a particular representation (everything is represented by Gödel codes) then higher-order functions like $m$ and timeout are computable by Turing machines. I am guessing that you think (although you did not say so explicitly) that this is a general phenomenon, i.e., that the model you suggest is somehow “the most computable”. This however is not the case!

It just so happens that $m$ and timeout are computable in the Turing machine model, but other slightly different functionals are not! For instance, I could ask a very similar question: “Are all functions from Cantor space $2^\mathbb{N}$ to natural numbers $\mathbb{N}$ uniformly continuous?” The answer again depends on the model of computation, and for the Turing machine model it is negative. In fact, there is a Turing-computable function from $2^\mathbb{N}$ to $\mathbb{N}$ which is computably not uniformly continuous. But if we use some other representation, for instance Type II computability or domain theory, then the answer is positive. The various representations give incomparable models of computation at higher types. None of them is “the best”, they are just different.

Who are the “experts” in the last paragraph of the blog post? If they’re students or dilettantes, the paragraph might make sense. If they’re professionals, the paragraph seems quite cranky.

My 1+1=2 analogy wasn’t very good. Here’s a better one: a description of how to trisect an angle with a marked straightedge (correct and interesting), followed by “this should be a lesson for the ‘experts’ who ‘know’ that trisecting angles is impossible. That’s true if one just uses a compass and unmarked straightedge. However, there are other ways of doing constructions.” The problem isn’t in the construction. The problem is that you seem to think the experts don’t know about it.

In your post, you said that m and timeout are not computable in PCF, and implied that this should be a lesson to “experts” who think that “all reasonable models of computation are equivalent to Turing machines”. So I pointed out that they are computable by Turing machines in the specific sense of computability that’s relevant to the Church-Turing thesis, which is what experts actually believe if they say they believe “all reasonable models of computation are equivalent to Turing machines”.

It’s not clear to me what you mean by type II computation. I thought type II computations had inputs and outputs which in general have no finite description, making them more general than type I computations. Unless someone shows how to actually perform such computations in the real world, they aren’t relevant to the Church-Turing thesis. But you say that type II Turing machines are just another programming language, and possibly equivalent to PCF+timeout+catch, implying that they are simulatable by type I Turing machines. Either way, they aren’t a counterexample to the Church-Turing thesis, and that’s all that these experts actually believe.

@Ben: I do not like your argumentative style and I am not going to have a pointless discussion about “who are the experts”. I am trying to be kind here and I am providing you with references and examples that will further your understanding of the subject, so perhaps you can tone things down a little.

PCF is a reasonable model of computation, there is absolutely no doubt about that. It is an extension of the simply typed $\lambda$-calculus with natural numbers and general recursion. Modern functional programming languages (notably Haskell) are directly based on ideas developed through PCF. It is Turing complete in the sense that it is equivalent to Turing machines (Turing machines can simlate PCF and vice versa). Yet, when it comes to higher-type computation, Turing machines and PCF differ.

Type II computation has infinite inputs and outputs. A Turing machine has an infinite tape. These facts should be understood not as “we are doing actual infinite things” but that the resources, inputs and outputs are potentially unlimited. Type II computability models certain situations better than Type I. For instance, imagine a physics experiment which produces an unending sequence of 0’s and 1’s (for instance by listening to the cosmic radiation). A machine which processes this sort of input is better modeled by Type II (i.e., the infnite input is written on an infinite tape) than Type I (the input tape contains the Gödel code of a Turing machine that calculates the stream of 0’s and 1’s). This is why people are interested in Type II: not to discuss “truly infinite” data but to discuss “potentiall unbounded data which arrives as a sequence of bits”.

All of this does not matter because I can easily give examples of models of computation which are equivalent to Turing machines (can be simulated by and can simluate Turing machines) but differ in higher-type computation.

I think this has been as much of an explanation I can give in these comments. I refer you to the excellent book Higher-Order Computability by John Longley and Dag Normann. As a point of reference, I consider them to be the experts on the topic.

Looking for more information about type II computability, I found your lecture notes “Realizability as the Connection between Computable and Constructive Mathematics”, where theorem 2.1 is that there exists a computably unbounded type I computable function from $[0,1]_c$ to $\mathbb{R}_c$. Is there a proof of this anywhere? I can’t find any other mention of this theorem (no other Google Scholar hits for “computably unbounded”, for example), and it seems inconsistent with the result that all such functions are continuous.

The theorem that there is a type I computable unbounded function on $[0,1]$ ought to be proved in at least one of these: “Varietis of Constructive Mathematics” by Bridges and Richman, “Constructivism in Mathematics” by Troelstra and van Dalen (probably volume 1). I don’t have them with me right now to look them up and tell you exactly where. You can also do it yourself once you learn about Specker sequences. Section 3.2 of Constructive mathematics at the Stanford Encyclopedia of Philosophy also discusses the theorem briefly, but does not give a reference. The original theorem would have been due to someone from the Russian constrictivism school.

Surprisingly, the result is not inconsistent with pointwise continuity.