User login

Navigation

Type Bombs

People like blowing stuff up, explosions are great! I started wondering about type bombs, small expressions which have large type terms. Is there anything known about that, in particular for Haskell/ML since these are popular?

The small bomb I know of employs 'dup' and is relatively well known. I.e., take `dup`

dup x = (x,x)

The repeated application gives back a type term which grows exponentially in the size of the expression.

(dup . dup . dup . dup . dup) 0

(It depends on the type checker, of course, whether the type actually is represented exponentially.)

You can lift `dup` further but can you grow faster? I tried `let t = (\f -> f . f) in let d = (\x -> (x,x)) in (t.t.t.t) d 0` which doesn't type check in ghci as a first order approximation. But I thought I would ask you, people.

What is the fastest growing type term you can come up with? Or what term will keep a type checker really, really busy?

Comment viewing options

Note that while this term generates a very big type, it is actually very fast to type-check with most reasonable type-checker implementations which use sharing for repeated subtypes. Of course, printing the type may still consume a lot of time.

If you think about it, this program also generates a very big result value, and yet it computes it in linear time.

There are other examples in the Hindley-Milner literature that do really require exponential type-checking time, using let-bindings.

The sharing is in the post. I am looking at terms which force exponential behavior, basically through unification, on the type checker. One way I am thinking about is to have two equivalent types with linear representation but which break the let bindings when unified.

These programs take time exponential in the amount of let nesting. HM inference is EXPTIME-complete, so in principle this is as bad as it gets (see "An analysis of ML typability" by Kfoury, Tiuryn and Urzyczyn)

Neat. I didn't look very carefully at it yet but this is another twist on 'simply double the type term with lets', right?

I'll try it. I've been looking at growing faster than exponential but it looks you need at least rank-n types. Like, if I could type `twice` with `(forall a. a -> f a) -> a -> f (f a))` I think you might be able to pull it off. Unsure yet.

...if you just doubled the size of the type, hash-consing/sharing can keep the memory usage of the inferred type linear in the size of the program, by turning an exponential-size tree into a linear-size DAG.

However, you can actually grow the size of a type by a bigger factor, since you can use a variable multiple times in a term. Here's a non-CPS version of Stephen's example:

f3 applies f2 twice, so it multiplies the size of its argument by 256.

f4 applies f3 twice, so it multiplies the size of its argument by 65,536.

f5 applies f3 twice, so it multiplies the size of its argument by 4,294,967,296.

At each step, you square the size of the term, so you get doubly-exponential growth in the size of the program (i.e., the program f_k will have a result type 2 ^ (2 ^ k) bigger than its argument type), and since hash-consing knocks off an exponential factor, you get exponential growth.

P is a type-level function that doubles the size of its argument, so P^n (P composed with itself n times) has size 2^n. F composes its argument with itself, so F^n (X) = X^(2^n). G also composes its argument with itself, so G^n(A) = A^(2^n).

So, G^n (F) (P).t has size 2^(2^(2^n)). The program above has n = 2, adding an extra G would give a type of size ~2^256.

Doing a few more layers of this will get you growth rates of any exponential tower (if you can manage to write the increasingly nasty functor types). I'm curious whether it's possible to go beyond primitive recursion, though.

I think you could show that there is no functor F^n (t) producing a type as large as Ack(n,|t|) using similar methods to those that show Ackermann isn't primitive recursive. Basically show that the functors smaller than Ackermann are closed under all of the basic forms of type growth provided by the language.

This use of modules can be reduced to pure System F-omega. And you are only using the type constructor level of F-omega, which is isomorphic to the simply typed lambda calculus. So AFAICS, you shouldn't be able to construct any function this way that grows faster than anything you can express in the STLC.

The problem with my initial type bombs I had was in typing twice which is roughly the same as the approach of neelk.

STLC should type twice like this.

twice : (a -> a) -> a -> a = (f -> f . f)

But when you want to explode type terms that way you need to be able to type

twice dup 0

Which STLC doesn't allow. (`dup` has type `a->(a*a)` which doesn't unify with `a->a`.)

So, one manner of doing so is by typing twice

twice: (forall a. a -> f a) -> a -> f (f a) = (f -> f . f)

But for that, I gather you need to move beyond STLC and that looks to be exactly what I think the module trick achieved. So I don't think a reduction to STLC can possibly be right. (The difference probably somewhere in that the expansion of type constructors can double whereas type variables can only unify.)

Bonus question: How fast does the following term grow for `n` appliations of twice?

I don't think Andreas was claiming that STLC and System Fw have equivalent type systems. He's saying the type constructors for System Fw by themselves are isomorphic to STLC (types & terms). Presumably STLC types would correspond to Fw kinds, and STLC values to Fw types.

It's very close, and maybe that's all Andreas meant, but it doesn't seem like a trivial isomorphism, because Fw just has a single atomic kind *, while STLC has countably many atomic variables. I don't see how an encoding like a = *->*, b = *->*->*, ... could produce a full isomorphism, though, since if something corresponds to * by itself, that would generate the rest of the space. Oh, and what does the constructor for quantification correspond to?

Andreas mind clarifying? Is there an actual isomorphism or were you speaking heuristically?

So yes, it has infinitely many term constants, but so has an STLC to which, say, you add natural numbers as a primitive type. They are uninterpreted, so don't affect any relevant meta-theoretical properties.

Yeah, that shows an embedding of the type level of System Fw into STLC with some constants, which is what's important for the point you were making. I was hung up on the other direction of the isomorphism, which wasn't needed for your greater point. For example, what System Fw kind corresponds to the STLC type (forall a. a -> a)? So maybe the isomorphism is to System Fw + top-level kind polymoprhism?

Scala used to have a lot of these back when I was working on it. There were magic numbers in the compiler to keep things in check, but refinement type cases would always slip the crack. Not sure what its like today, hopefully they were worked out over time.

Good
Type bombs are a rarity and not encountered in practice. Even if you would encounter one, a programmer is usually only interested in pretty simple static guarantees and -worst case- could trivially code around any type which manages to explode.

Bad
Type bombs exist therefore might play a role in future applications. Two possible scenarios:

A. You ship a new product, say WASM++, which makes use of trusted -possibly even proof-carrying- code. But an adversary manages to blow up applications by injecting type bombs and crippling the static analyzer.

B. You got with the digital coin craze, and ship a product with has a typed expression language for financial contracts. An adversary claims the network by type bombing away a sufficient amount of nodes.

A limiter would help a lot of course in both scenarios but then you're stuck with a rather awkward decision on an upper bound on types which may not get totally rid of the problem.

Ugly
Type bombs are a necessity for any typed sufficiently expressive language. As has been stated many times on LtU, types often serve as rough approximations for terms. It's natural to assume that in any expressive type system, there will always be types which closely follow the structure of a given term. You don't want an upper bound on the size of type terms, or you want the upper bound as high as possible.

What you're saying is what I thought for a long time, but it turns out to only apply to ML/Haskell. You can't create a type bomb in (say) C#!

"Type bombs" in ML/Haskell basically mean that it's possible to write down programs whose type is much bigger than the program itself. The reason that this is possible is that ML and Haskell do automatic generalization -- given a let-bound term, they add universal quantifiers to the type of the inferred term. Then, when you instantiate the quantifier you can pick a very big type. If you don't do automatic generalization, then you need to write down type annotations to declare that a value has a polymorphic type. Then you can't explode the size of a type relative to a program, since the necessary type declarations become part of the program size.

This is all assuming that you don't include nontrivial computation or constraint solving in the type language. If you do, then type bombs are possible again!

However, constraint solving is really useful for typechecking (eg, SMT solving of integer constraints for array bounds checks). Back in the day, George Necula and S.P. Rahul worked out a solution for this problem in their paper Oracle-based checking of untrusted software. (Warning: this is postscript, not pdf)

Their idea was incredibly simple! Constraint solving is potentially slow because it has to do a lot of nondeterministic search. So they said that when you send a program, you also send a sequence of bits determining the specific sequence nondeterministic to use! As a result, the checker never has to guess -- if you want to pass the checker, your certificate has to tell it which choices to make.

The type of the expression c(c(c(...))) is exponential in its length. All you need to get the exponential behaviour is automatic instantiation of type parameters, and the ability to use a type parameter more than once. It's not necessary to automatically generalise to get blowups.