Monday, May 02, 2011

More points for lazy evaluation

In a recent blog post Bob Harper shows one use of laziness, but I think he
misses the really important points of laziness.
So I will briefly enumerate what I think are the important points of
laziness (excuse me for not coming up with any kind of pun about points).

First, I'd like to say that I don't think strict or lazy matters that
much in practice; they both work fine. I have programmed in regular
non-strict Haskell as well as strict Haskell, and most things carry
over well to a strict Haskell. Furthermore, when I say strict language I mean a strict language with at least non-termination as an effect; total languages are another matter.

Lazy bindings

I like to tell Haskell beginners that any subexpression can be named and "pulled out", modulo name capture, of course.
For instance

... e ...

is the same as

let x = e
in ... x ...

The key thing is that

... e ... e ...

is the same as

let x = e
in ... x ... x ...

so that common subexpressions can be given a name.
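As a concrete (made-up) example, a repeated subexpression can be pulled out and shared without changing the meaning:

-- Hypothetical example: naming a common subexpression with let.
dist :: Double -> Double -> Double
dist x y = let s = sqrt (x*x + y*y)   -- evaluated zero times or once
           in s * (1 + s)

In a lazy language this refactoring is always meaning-preserving; the rest of this section is about why it is not in a strict one.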

This is (in general) just wrong in a strict language, of course.
Just take the simple example

if c then error "BOO!" else 0

which is not the same as

let x = error "BOO!"
in if c then x else 0

In this case you can easily fix the problem by delaying the
computation with a lambda (a common theme).

let x () = error "BOO!"
in if c then x () else 0

But for a slightly more complicated example this simple technique goes
wrong. Consider

map (\ a -> a + expensive) xs

where expensive does not depend on a. In this case
you want to move the expensive computation out of the loop (cf. loop
invariant code motion in imperative languages). Like so

let x = expensive
in map (\ a -> a + x) xs

In a lazy language x will be evaluated exactly zero times or
once, just as we want. Using the delaying trick doesn't work here:

let x () = expensive
in map (\ a -> a + x ()) xs

since expensive will get evaluated once for every list
element.

This is easy enough to remedy, by introducing an abstraction for lazy
computations (which will contain an assignment to compute the value
just once). The signature for the abstract type of lazy values is
something like

data Lazy a
delay :: (() -> a) -> Lazy a
force :: Lazy a -> a

Note that delay needs to take a function to keep the value of
type a from being evaluated too early.
(This is probably what Bob would name a benign effect and is easily programmed using unsafePerformIO, which means it needs careful consideration.)
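A minimal sketch of one possible implementation, using an IORef updated via unsafePerformIO as the benign effect (illustrative, not industrial-strength; the NOINLINE pragmas matter in practice to keep the compiler from duplicating the effect):

import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A lazy value is either an unevaluated thunk or a cached result.
data Lazy a = Lazy (IORef (Either (() -> a) a))

delay :: (() -> a) -> Lazy a
delay f = Lazy (unsafePerformIO (newIORef (Left f)))
{-# NOINLINE delay #-}

force :: Lazy a -> a
force (Lazy r) = unsafePerformIO $ do
  e <- readIORef r
  case e of
    Right v -> return v          -- already computed
    Left f  -> do
      let v = f ()               -- compute once
      writeIORef r (Right v)     -- cache the result
      return v
{-# NOINLINE force #-}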

And so we get

let x = delay (\ () -> expensive)
in map (\ a -> a + force x) xs

This isn't exactly pretty, but it works fine. In a language with
macros the ugliness can be hidden better.

Lazy functions

Even strict languages like ML and C have some lazy functions even if
they don't call them that, like SML's if, andalso, and
orelse. You really need the if construct to evaluate the
condition and then one of the branches depending on the condition.
But what if I want to define my own type with the same kind of
functions?
In ML I can't, in C I would have to use a macro.

The ability to define new functions that can be used as control
constructs is especially important when you want to design embedded
domain specific languages. Take the simple example of the
when (i.e., one-arm if) function in Haskell.

when :: (Monad m) => Bool -> m () -> m ()
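Its definition (as in Control.Monad) is one line; it is the laziness of the second argument that lets it act as a control construct:

when :: (Monad m) => Bool -> m () -> m ()
when b m = if b then m else return ()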

A quite common use of this function in monadic code is to check for
argument preconditions in a function, like

f x = do
  when (x < 0) $
    error "x must be >= 0"
  ...

If the when function is strict this is really bad, of course,
since the call to error will happen before when is even
called. In a strict language you would have to wrap the second
argument in an explicit thunk, the same delaying trick as before.

But in my opinion, that is too ugly to even consider. The intent of
the function is obscured by the extra noise needed to make the second
argument lazy.

I think every language needs a mechanism for defining some form of
call-by-name functions. And many languages have this in the form of
macros (maybe built in like in Lisp, or as a preprocessor like C).
If a language cannot define lazy functions it simply lacks
abstraction power. I think any kind of fragment of the language
should be nameable and reusable. (Haskell lacks this ability for
patterns and contexts; a wart in the design.) So if you notice a
repeated expression pattern, like

if c then t else False

and cannot give this a name, like

and c t = if c then t else False

and then use it with the same effect as the original expression, well,
then your language is lacking.

For some language constructs the solution adopted by Smalltalk (and
later Ruby), i.e., a very lightweight way of constructing closures, is
acceptable. So, for instance, I could accept writing

... myAnd x {y} ...

(In SML you could make something using functors, but it's just too ugly to contemplate.)

Lazy constructors

Lazy constructors were sort of covered in what Bob claimed to be the point of laziness, so I'll just mention them for completeness. Sometimes you need them, but in my experience it's not very often.

Cyclic data structures

This is related to the last point.

Sometimes you really want cyclic data structures. An example is the
pair of Haskell data types in Data.Data that describe data types and
constructors. A data type descriptor needs to contain a list of its
constructors and a constructor descriptor needs to contain the data type
descriptor.
In Haskell this can be described very naturally by having the two
descriptors reference each other.
In SML this is not possible. You will have to break the cycle by
something like a reference (or a function).
In OCaml you can define cyclic data structures in a similar fashion to
Haskell, so this isn't really a problem with strict languages, but
rather a feature that you can have if you like.
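As a concrete sketch (with hypothetical, simplified types; the real descriptors in Data.Data look different), the mutual references are entirely natural to write in Haskell:

-- Hypothetical, simplified descriptors: a data type knows its
-- constructors, and each constructor knows its data type.
data DataType = DataType { tyName :: String, tyCons :: [Constr] }
data Constr   = Constr   { conName :: String, conType :: DataType }

natType :: DataType
natType = DataType "Nat" [zeroCon, succCon]   -- refers to the constructors

zeroCon, succCon :: Constr
zeroCon = Constr "Zero" natType               -- and they refer back
succCon = Constr "Succ" natType

Laziness makes these bindings unproblematic: neither side needs the other to be fully built before it can be defined.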
Of course, being able to define cyclic data leads to non-standard
elements in your data types, like

data Nat = Zero | Succ Nat
omega :: Nat
omega = Succ omega

So having the ability to define cyclic data structures is a double
edged sword.
I find the lack of a simple way to define cyclic data a minor nuisance only.

Reuse

I've saved my biggest gripe about strict evaluation for last.
Strict evaluation is fundamentally flawed for function reuse.
What do I mean? I will illustrate with an example.
Consider the any function in Haskell:

any :: (a -> Bool) -> [a] -> Bool
any p = or . map p

It's quite natural to express the any function by reusing the
map and or functions. Unfortunately, it doesn't
behave like we would wish in a strict language. The any function
should scan the list from the head forward, and as soon as an
element that fulfills the predicate is found it should return True and stop
scanning the list. In a strict language this would not happen, since
the predicate will be applied to every element before or
examines any of them. So we have to fuse or and map by hand,
something like

any p = foldr (\ y r -> p y || r) False

But the misery doesn't end here. This still doesn't do the right
thing, because the strict language will recurse all the way down
the list, since it will call foldr before (||). So we either have to
fuse again, or invent a new version of
foldr that delays the recursive call.
One more fusion gets us to

any p [] = False
any p (y:ys) = p y || any p ys

So where's the function reuse? Nowhere in sight.

With strict evaluation you can no longer with a straight face tell
people: don't use recursion, reuse the recursion patterns in map,
filter, foldr, etc. It simply doesn't work (in general).
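To see the reuse paying off, in a lazy language the composed definition really does stop early, even on an infinite list (anyC is just a local name to avoid clashing with the Prelude's any):

-- The composed definition; laziness makes it stop at the first
-- element that satisfies the predicate.
anyC :: (a -> Bool) -> [a] -> Bool
anyC p = or . map p

main :: IO ()
main = print (anyC (> 10) [1 ..])  -- terminates and prints True

In a strict language this program would never terminate, since map would try to traverse the whole infinite list before or ran.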

Using macros doesn't really save us this time, because of the
recursive definitions. I don't really know of any way to fix this
problem short of making all (most?) functions lazy, because the
problem is pervasive. I.e., in the example it would not be enough to fix foldr; all the functions involved need to be lazy to get the desired semantics.

I find this the biggest annoyance with strict evaluation, but at the
same time it's just an annoyance, because you can always rewrite the
code to make it work. But strict evaluation really, fundamentally
stops you from reusing functions the way you can with laziness.

As an aside, the definition of any doesn't work in SML for
another reason as well, namely the value restriction. But that's just
a language wart, like the monomorphism restriction in Haskell.

Complexity

Another complaint Bob Harper had about lazy evaluation is the
difficulty of finding out the complexity of functions. I totally
agree that the space complexity is very messy in a lazy language.
For (sequential) time complexity I don't see any need to worry.
If a function has O(f(n)) complexity in a strict language then
it has complexity O(f(n)) in a lazy language as well. Why worry? :)

Summing up

One of the most important principles in all software design is DRY,
i.e., Don't Repeat Yourself. This means that common patterns that we
can find in a program should be abstractable so you can reuse them.
In a lazy language you can use functions for abstracting control
structures, which a strict language does not permit (at least not in a
convenient way). This can be mitigated by providing some other
abstraction mechanism, like macros (hopefully some kind of sane macros).
For the reasons above, I find it more convenient to program in a lazy
language than a strict one. But a strict language with the ability to define lazy bindings, lazy functions, and lazy constructors is good enough that I find it usable.

46 Comments:

I think the only serious gripe is what you call your biggest gripe, and I must say that I envy this in Haskell. As you know, in the eager world we tend to write out our own recursive functions, rather than use combinators. You give part of the reason; the other part is that deforestation and related transformations are not valid, at least not in the presence of effects. (But, as I've pointed out before, dual transformations that are valid in ML are not in Haskell. You basically trade strictness conditions for totality conditions.) I've made my peace with this, but I agree that you have a point (ahem).

It seems like you and Bob would both be pretty happy with a call-by-name language: you get predictable interaction with effects, full beta-reduction and a reasonable parallel cost model.

I don't have any experience programming with call-by-name, but on the surface it seems like you could program pretty much as in Haskell, except you would want to memoize some thunks when terms get duplicated. Or am I missing something?

I don't think you should dismiss a language where there is a pure (total) core functional language, and where all effects, including non-termination and exceptions, are banished to a secondary language (e.g., monad). An example is Coq + the Ynot monad.

There, you get the best of both worlds. You get to evaluate the pure fragment CBN, CBV, or CB-need or however you see fit. (Though I think I prefer a cost model that leans towards CBV to avoid the space problems Bob mentioned. However, to support control-abstraction, which I also think is important, you might lean towards CBN or CB-need.) You get to factor everything out with "let" if you like. You get to do de-forestation. The compiler doesn't have to do strictness analysis. You can add "strictness" annotations to types or calls to seq and have it not break all of those nice properties Haskell folks claim they have for small programs, but never have for big ones.

There are other advantages. Integrating dependent types is easier. Embedding DSLs with control requirements that rule out divergence or exceptions becomes possible. And you get to do Curry-Howard, something you can't pull off in Haskell at all, and in ML only when you do silly things like restrict dependencies to A-normal form (c.f., Leroy's modules.)

@Greg I didn't in any way mean to dismiss languages with a total core, I only meant to say that I wasn't talking about those languages. Indeed, I have for a long time wished for such a language for exactly the reasons you outline. (Also, as a compiler writer I love the idea that evaluation order is left up to me rather than the language spec.)

I think Scala has an interesting approach to defining your own control constructs. It lets you declare a function parameter as call by name and it automatically inserts a lambda round the actual parameter and an application round uses of the formal parameter.Just one of Scala's tricks to make it easier to implement EDSLs.

@Lennart Just an aside, but: I would love it too, but even with a total core, I don't think it is practical to let the user choose an evaluation regime.

For example, Brian Howard published a type system (1995, I think) for inductive/coinductive types and total functions which can encode Ackermann's function, so you could write programs that, even for very small inputs, probably would not terminate before the universe ends: efficiency would still be an unavoidable issue.

I think any non-tiny program would need to be written with an evaluation order in mind.

Lennart, I think re-use falls short in Haskell as well as in ML. The core of the problem is that functions such as any, map, fold are defined on a particular concrete datatype. With lazy lists as the datatype you get some useful composition laws that you don't get with strict lists.

But you can achieve much higher reusability if collection methods are defined uniformly over strict as well as lazy collections. In Scala, map, fold, exists are methods that work on any kind of collection. They are strict for strict collections and lazy for lazy collections. And it's perfectly possible to abstract and write your own operations that have the same adaptive behavior.

So then the only re-use argument in favor of laziness is that refactorings like the one you describe are efficient only if one assumes the collection is lazy. In other words, by restricting the domain to lazy structures you get more performance-preserving transformations. For me that argument is not so convincing. Sure, you need to work harder to write utility functions that perform unformly well on strict as well as lazy collections. But once that's done these functions will be more reusable and therefore ultimately more useful.

@Martin I don't totally agree with that. The definition 'any p = or . map p' is just the wrong one for a strict language. It doesn't matter if 'or' and 'map' are overloaded for different kinds of collections, it's still wrong for strict collections. (BTW, Haskell has this kind of reuse if you use Traversable etc.)

But I do agree that Haskell is lacking in reuse, but my gripe is more about mapM vs map etc.

In Haskell, we have Data.Foldable ( http://www.haskell.org/ghc/docs/7.0.3/html/libraries/base-4.3.1.0/Data-Foldable.html ), which affords 'container-centric' opportunities to optimize folds in general and ( http://www.haskell.org/ghc/docs/7.0.3/html/libraries/base-4.3.1.0/Data-Traversable.html ) which provides monadic and applicative traversals in the same spirit.

I agree with most of your post, but I never found the argument about custom control operators particularly convincing. If anything, they would rather be an argument for call-by-name -- laziness may happen to work for `if', but what about, say, `while'? Of course, if your only effect is non-termination you might not care, but in general?

2. A compiler strategy that dynamically subdivides map for parallel execution on multiple cores requires that map not be lazy.

3. any = or . map is not "wrong" for eager (strict) evaluation when there is referential transparency. It is slower in sequential time, but maybe faster in parallel execution.

4. Wikipedia says that for Big O notation, O(n) is faster than O(n log n) = O(log n!). @Lennart, I think you meant to say that O(n) is faster than O(n log n).

5. Given that laziness causes space and latency indeterminism, if the main reason to use lazy is to avoid the performance hit for conjunctive functional composition over functors, then only functions which output applicable functors need apply laziness. As @martin (Odersky) suggested, provide lazy and eager versions of these functions. Thus eager by default with optional lazy annotations would be preferred.

Principle and 80/20 rule. 80+% of the programmers in world are not likely to grasp debugging lazy space leaks. It will only take one really difficult one to waste a work-week, and that will be the end of it. And what is the big payoff, especially with the parallelism freight train bearing down? Someone claimed that perhaps the biggest advance in mainstream languages since C, was GC (perhaps Java's main improvement over C++). Thus, if the typical programmer couldn't do ref counting without creating cycles, I don't think they will ever grasp lazy space and latency indeterminism. I am approaching this from wanting a mainstream language which can replace Java for statically typed.

@shelby I've used strict and lazy languages a lot, and prefer lazy, but it's not a huge deal to me. I think arguing against something you haven't even used is somewhat arrogant.

Lazy code evaluated as strict can also exhibit arbitrary slowdowns and space leaks. So it's just the same drawbacks as evaluating strict code lazily. Both have problems, and they can be fixed. The problems and the fixes are different, though.

@augustss Rash judgments w/o experience is annoying. Two decades of programming, and copious reading is all I can humbly offer at this moment. This is imperfect and warrants my caution. I appreciate factual correction.

My point is that with eager, debugging the changes in the program's state machine at any function step, will be bounded to the function hierarchy inside the body of the function, so the programmer can correlate changes in the state machine to what the function is expected to do.

Whereas, with lazy any function may backtrack into functions that were in the argument hierarchy of the current function, and inside functions called an indeterminant time prior. Afaics, lazy debugging should be roughly analogous to debugging random event callbacks, and reverse engineering the state machine in a blackbox event generation module.

As I understand from Filinski's thesis, eager and lazy are categorical duals in terms of the inductive and coinductive values in the program. Eager doesn't have products (e.g. conjunctive logic, "and") and lazy doesn't have coproducts (e.g. disjunctive, "or"). So this means that lazy imposes imperative control logic incorrectness from the outside-in, because coinductive types are built from observations and their structure (coalgebra) is not known until the finality when the program ends. Whereas, eager's incorrectness is from the inside-out, because inductive types have an a priori known structure (algebra) built from an initiality. Afaics, this explains why debugging eager has a known constructive structure and debugging lazy is analogous to guessing the structure of a blackbox event callback generation module.

My main goal is to factually categorize the tradeoffs. I am open to finding the advantages of lazy. I wrote a lot more about this at my site. I might be wrong, and that is why I am here to read, share, and learn. Thanks very much.

Could you tell me why lazy (with optional eager) is better for you than referentially transparent (RT) eager (with optional lazy), other than the speed of conjunctive (products) functional composition? Your lazy binding and lazy functions points should be doable with terse lazy syntax at the let or function site only, in a well designed eager language. Infinite lazy types can be done with the optional lazy in an eager language with such optional lazy syntax. Those seem to be superficial issues with solutions, unlike the debugging indeterminism, which is fundamental and unavoidable.

Your any = or . map example flattens to a cycle (loop) on each element of the functor (e.g. list), which converts each element to a boolean, exits if the result is true, and always discards the converted element in every possible code path of the inputs. That discard proves (assuming RT and thus no side-effects in map) the lazy evaluation of the eager code has no new coproducts, and thus the determinism in space and latency is not transformed.

@shelby With lazy evaluation I can make building blocks and compose them together, i.e., you can program using a combinator library (with any f = or . map f as an example). This doesn't work nearly as well strict evaluation. Strict evaluation just isn't composable the way lazy evaluation is.

But debugging and error messages are indeed better in a strict language. That said, I hardly ever use a debugger in Haskell. Proper stack traces on errors is what I really miss in Haskell.

@augustss Are you conflating 'strict' with 'not referentially transparent'? I think that is a common misperception because there aren't many strict languages which are also RT. So the experience programmers have with strict languages is non-compositional due to the lack of RT. Afaik, composition degrees-of-freedom is the same for both strict and lazy, given both are RT. The only trade-off is with runtime performance, and that trade-off has pluses and minuses on both sides of the dual. Please correct me if I am wrong on that.

@shelby No, I'm referring to strict and pure (referentially transparent). It's not just a question of efficiency, but of termination. The change in evaluation order often turns a nice lazy idiom into a non-terminating (I'm equating infinite loops and exceptions here) idiom in a strict language.

For instance, in Haskell it's very common to write something like fromMaybe (error "BOOO") x. With strict evaluation that doesn't work without something extra that makes sure that the first argument to fromMaybe is not evaluated before the call. You can fix that, but when you start composing things with higher order functions you need different variants with different strictness properties. It quickly becomes very messy, and you just learn not to program in that style in a strict language.

@augustss thanks I now understand your concern. The difference in non-termination evaluation order with runtime exceptions is irrelevant to me, because I proposed that by using types we could eliminate all exceptions (or at least eliminate the catching them from the declarative RT portion of the program). I provide an example of how to eliminate divide-by-zero at copute.com (e.g. a NonZero type that can only be instantiated from case of Signed, thus forcing the check at compile-time before calling the function that has division). A runtime exception means the program is in a random state, i.e. that the denotational semantics is (i.e. the types are) not fit to the actual semantics of the runtime. Exceptions are the antithesis of compositional regardless of the evaluation order.

In my opinion, a greater problem for extension with Haskell (and Scala, ML, etc) is afaics they allow conflation of interface and implementation. That is explained at my site. I have been rushing this (80+ hour weeks), so I may have errors.

@shelby To make strict evaluation as compositional as lazy you need to ensure termination. So no exception, no error construct that terminates the program, and no general recursion. I've yet to see a real programming language that fulfills that (I guess Agda and Coq come closest).

@augustss agreed must ensure termination for cases which are not bugs, but infinite recursion is a bug in strict, thus I excluded it for comparing compositionality. No need to go 100% total in strict to get the same compositionality as lazy. Perhaps Coq can insure against infinite recursion bugs, but that is an orthogonal issue.

Diamond problem impacts compositionality, because it disables SPOT (single-point-of-truth). For example, Int can't be an instance of Ord and OrdDivisible, if OrdDivisible inherits from Ord. Thus functions that would normally compose on Ord, have to compose separate on Int and IntDivisible, thus do not compose.

@shelby You don't seem to understand the full problem of non-termination. Non-termination in the computations that are normally executed is (probably) a bug, but those are not the ones that worry me. Take something simple like 'fac n = if (n==0) 1 (n*fac(n-1))'. If the 'if' function were not lazy this would never terminate. So you need to treat 'if' specially. Next example, I want to evaluated 'any pred list', and I know that either there are no elements that satisfy pred, or if there is, then following that element there will be some where pred does not terminate. That works fine with lazy evaluation, but not with strict. So now you have to make a special 'any'. And so on. There are many functions where you might need a lazy version. With lazy evaluation I never need to worry about that, with strict I do.

Lennart's right that in a lazy language it is quite easy to define functions that rely on their arguments not being evaluated prior to the call. If laziness were a type, and if the syntax were civilized, then it wouldn't be so different in an eager language, eg writing %e to mean that e should be suspended (and memoized).

In my mind however the conditional branch is not an example of this phenomenon. The expression "if e then e1 else e2" is short-hand for a case analysis, which binds a variable (of unit type) in each branch. So the conditional is not "lazy" at all; rather, it is an instance of the general rule that we do not evaluate under binders (ie, evaluate only closed code).

@augustss if 'map pred list' doesn't terminate, it is because list is infinite. But infinite lists break parallelism and require lazy evaluation, so I don't want them as part of a default evaluation strategy. Offering a lazy keyword (i.e. type annotation or lazy type) can enable expression of those infinite constructions, but in my mind, it should be discouraged, except where clarity of expression is a priority over parallelism. Did I still miss any predominant use case?

Thus, I concur with what Existential Type replied. For example, pattern match guards for a function are runtime function overloading (splitting the function into a function for each guard), thus the compiler shouldn't evaluate the functions (guard cases) that are not called.

It appears to me (see my idea in a prior comment) that lazy is a ad-hoc runtime non-deterministic approximation of deforestation. We need better automatic deforestation aware algorithms for eager compilers, so the parallelism vs. sequential execution strategy is relegated to the backend and not the in the language.

Come to find out that I can get the same conceptual efficiency as lazy does for 'or . map p', by using the State monad and Traversable in strict language, where I have a version of 'traverse' which inputs a function that returns Maybe (i.e. can terminate the iteration).

I have come to the conclusion that neither an eager (a/k/a strict, e.g. most languages) nor lazy (a/k/a total, e.g. Haskell) language is vastly superior to the other. They each have tradeoffs, because they are categorical duals.

Eager is an *inductive* evaluation model, meaning we run the leaves as we as build the nested function tree of runtime possibilities.

Lazy is a *coinductive* evaluation model, meaning we run the leaves as needed from the nested function tree of runtime possibilities.

Eager can't instantiate coinductive types, such as bottom (i.e. the entire Universe) or any infinite type, but it can instantiate inductive types such as the natural numbers.

Lazy can instantiate coinductive types, such as bottom and infinite types, but it can't instantiate inductive types such as the natural numbers.

Eager does not have conjunctive products, so we can't do general function composition (e.g. or . map p) without potentially doing evaluation that is not shared (i.e. required) by the conjunctive result.

Lazy does not have disjunctive sums, so we can't do logic such as:

((if ... then 1 else 2),3) == if ... then (1,3) else (2,3)

For example, if we only evaluate the second element in a lazy language and if ... has an exception, then the left version will not generate the exception, but the right version will.

I suppose most people think rather inductively.

And models can be used in eager to provide conjunctive products for some cases.

Maybe I didn't make it clear here. What I meant by "effect" is something general. It corresponds to a monadic type in Haskell. Suppose we have IO actions in expensive, then we can't even use map to apply it on a list.

On the other hand if we don't have effects (or monad actions) in expensive, we can do the same transformation in a strict language without repeating the expensive computation.

But I see your other point now. A strict language does need a tweak in order to prevent the evaluation of expensive when the list is empty. I was just talking about the repeating issue.

So, I've replied over at Google+, in the newly-formed Haskell community there (of which I am the founder; full disclosure of shameless plug). I hope you find my perspective as an advocate of strictness to be appropriate and not dismissive.

You lose the ability to pass around maybe first-class, in that you can't pass if around. In Haskell you could exploit the first-class-ness by e.g. zipWith3 maybe errors functions maybes, but that seems unlikely; but even so, you just write a macro to not evaluate "errors". Of course, then that isn't first class, and so on...

So the trade-off appears to be Haskell's functions are sort of "first class macros" in exchange for unpredictable memory usage. Show me a program that isn't easily convertible into scheme and I'm sold, which based on the above observation would seem to require a big hierarchy of first-class conditionals passed around to yet higher ones. But until then, Haskell remains my type-safe prototyping friend.