In every iteration we obtain a new seed and pass it to the next
iteration. In general, the seed is a combination of tuples, Eithers and
some primitive values (Ints, mostly). If we want fusion to be efficient,
we have to get rid of the constructors and keep only those parts of the
seed that are actually required. This is what SpecConstr does but, alas,
not always.
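
The kind of loop in question can be sketched as follows. This is only an
illustrative reconstruction, not necessarily the exact example under
discussion: the Nothing case and the n <= 0 guard are assumptions added
so that the function terminates and can be run.

```haskell
-- Illustrative sketch: the first argument plays the role of a seed part
-- that is passed on unchanged to the next iteration.
foo :: Maybe Int -> Int -> Int
foo Nothing n = n
foo x@(Just m) n
  | n <= 0    = n              -- assumed base case, added for termination
  | otherwise = foo x (n - m)  -- x is passed on as is, still wrapped in Just
```

Note that with m <= 0 and n > 0 this sketch does not terminate.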

SpecConstr doesn't get rid of the Just because it assumes that x must be
boxed, which is obviously not the case here. This happens with seeds all
the time because we often pass a part of the seed unchanged to the next
iteration. We would still like those parts to be unboxed, as the boxed
version is never needed.

The following are some random ramblings about how this could perhaps be
fixed quickly. I have tried several things to help the compiler here.
The obvious:

foo (Just m) n = foo (Just m) (n-m)

doesn't work because CSE spots this and turns it into the previous
version. The next thing I tried was a hack:

foo (Just m) n = let {-# INLINE x #-}
                     x = Just m
                 in foo x (n-m)

This is rewritten to

foo (Just m) n = foo (__inline_me (Just m)) (n-m)

and CSE doesn't look inside the __inline_me. Unfortunately, neither does
SpecConstr. After I taught SpecConstr (and the rule matcher) to ignore
Core notes, this actually worked, although m still wasn't getting
unboxed (more about this below). I believe ignoring notes is what is
wanted in general, although I'm not really sure about SCCs.

Unfortunately, things are not so simple. First, inline notes tend to
disappear a lot, so as a quick hack I made CSE ignore everything under a
"nocse" Core note. More problematic is the fact that we usually do not
know the actual type of a (part of a) seed; stream transformers augment
seeds but cannot look inside of them. So just to see if this is
feasible, I tried to write a generic rebox function, as in

and use rebox on all parts of the seed which are passed unchanged to the
next iteration or, even better, on the entire seed in every consumer.
Note that all calls to rebox are inlined because the optimiser knows all
the types involved. We also want to stop the reboxing at certain points
in the tree. For instance, in

scanlS :: (a -> b -> a) -> a -> Stream b -> Stream a

the a is part of the seed but we do *not* want to rebox it in each
iteration. The class-based code above allows us to express this.
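
For illustration, a class-based rebox along these lines might look
roughly as follows. This is a hypothetical sketch, not the code that was
actually used: the class name Rebox, the chosen instances and the Opaque
wrapper are all assumptions. The idea is that rebox takes a value apart
and rebuilds it with explicit constructor applications, and that the
wrapper marks the point at which reboxing stops.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#))

-- Hypothetical class: semantically the identity on defined values, but
-- the result is built from explicit constructor applications.
class Rebox a where
  rebox :: a -> a

instance Rebox Int where
  rebox (I# i) = I# i

instance (Rebox a, Rebox b) => Rebox (a, b) where
  rebox (a, b) = (rebox a, rebox b)

instance Rebox a => Rebox (Maybe a) where
  rebox Nothing  = Nothing
  rebox (Just a) = Just (rebox a)

instance (Rebox a, Rebox b) => Rebox (Either a b) where
  rebox (Left a)  = Left (rebox a)
  rebox (Right b) = Right (rebox b)

-- Hypothetical wrapper for parts of the seed that must not be reboxed,
-- such as the accumulator of scanlS.
newtype Opaque a = Opaque a

instance Rebox (Opaque a) where
  rebox o = o   -- deliberately does not look inside
```

A consumer would then apply rebox to the seed (or to the parts passed on
unchanged) in each iteration, wrapping anything that should be left
alone in Opaque.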

Anyway, this did work to some extent, but not really well. Still,
perhaps playing around with a rebox pragma might be a feasible idea if
it were better integrated into the compiler. In particular, the
strictness analyser could infer U instead of S for things under the
pragma and SpecConstr could look at unfoldings. I'm not at all sure
about this, though. I might try to implement it and play around with it
at some point (but not soon!).

A related problem is

foo :: Int -> Int -> Int
foo 0 n = 0
foo m n = foo (m-n) n

Here, n isn't getting unboxed although it could be if m is not 0 in the
first iteration. Perhaps unrolling the loop once could help here.
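
Unrolling by hand could be sketched like this. The local name loop and
the bang patterns are assumptions made for illustration, and whether GHC
then actually passes n unboxed has not been checked here. The point is
only that once the first iteration has established that m is not 0, the
remaining loop is strict in both arguments.

```haskell
{-# LANGUAGE BangPatterns #-}

-- First iteration peeled off by hand; only this part is lazy in n.
foo :: Int -> Int -> Int
foo 0 _ = 0
foo m n = loop (m - n) n
  where
    -- Remaining iterations: n has to be evaluated anyway, so the loop
    -- can be strict in its second argument.
    loop :: Int -> Int -> Int
    loop 0 !_ = 0
    loop k !j = loop (k - j) j
```

This keeps the original behaviour for m = 0, where n is never looked at.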

Join points

Unfortunately, I have lost the example, but when trying out the reboxing
stuff I saw the following happen quite a lot. I can try to find an
example if you want.

Suppose we have (*after* worker/wrapper, i.e., the join point must be
introduced by the subsequent simplifier run) something like

This doesn't buy us a lot - we still construct and immediately
deconstruct a pair in each iteration. This is a fairly obscure case,
though.

Inlining vs. rewriting

One thing I've noticed when trying out stream fusion for lists is the
following. If we define

map f = unstream . mapS f . stream

we *always* want to inline map (especially if we are going to rewrite it
back if fusion doesn't happen). But if stream is lazy (which it is), in

foo f g = map f . map g

the two calls to map won't be inlined, I assume because the simplifier
sees no reason to do it. I remember that you made the simplifier keener
to inline in contexts in which rules might apply, but this doesn't work
here because map doesn't have a rewrite rule; it's the stream/unstream
rule we want to fire.

Just saying {-# INLINE map #-} therefore doesn't help us. What do we do
instead? One solution is to have an additional rewrite rule:

map f = unstream . mapS f . stream

But this results in a lot of code duplication, especially for functions
which aren't quite as small (i.e., stream transformers such as mapS
which we always want to inline). Neither does just having the rule and
defining map as

map = undefined

help since GHC just treats map as a diverging function.
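
Written out in GHC's RULES syntax, the rule-based variant could look
roughly like the sketch below. The Stream and Step types, the phase
annotations and the rule names are assumptions chosen to make the
example self-contained; they are not taken from any particular library.
The duplication mentioned above is visible directly: the right-hand side
of map appears once in the definition and once in the rule.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Prelude hiding (map)

-- Assumed stream representation: a stepper function plus a seed.
data Step s a = Done | Skip s | Yield a s
data Stream a = forall s. Stream (s -> Step s a) s

stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []     = Done
    next (x:xs) = Yield x xs
{-# NOINLINE [1] stream #-}

unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'
{-# NOINLINE [1] unstream #-}

mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'
{-# INLINE mapS #-}

map :: (a -> b) -> [a] -> [b]
map f = unstream . mapS f . stream
{-# NOINLINE [1] map #-}

{-# RULES
"stream/unstream" forall s. stream (unstream s) = s
"map -> fused"    forall f. map f = unstream . mapS f . stream
  #-}
```

Delaying the inlining of map, stream and unstream until phase 1 is one
possible way of giving the rules a chance to fire first; other phase
choices may work just as well.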

This problem is actually even worse with stream transformers which we
only want to inline quite late in the game as information about their
strictness, arity and so on should still be available in the earlier
phases. This means that we want GHC to see their definitions and then
inline them unconditionally at some point.

It would be quite handy to have a pragma which means "inline
unconditionally", for instance REWRITE. It would support the same phase
annotations etc. but behave as if instead of

It doesn't have to be implemented via rules, of course. I briefly talked
to Don about this and he said that he'd like to have this as well for
the ByteString library.

One question here is: do we want the above rule or

foo = __inline_me(\x -> e)

In general, I have to say that the quality of the loops produced by
fusion depends on the optimiser doing a *really* good job, which makes
the whole approach quite fragile. I've been thinking about how to help
the compiler even more but don't have anything worth talking about yet.