Re: Rethinking filter

It would make for-expressions more powerful and useful, and encourage
us to use for-expressions rather than map, filter, etc.

Martin Odersky wrote:

We have a number of open tickets which all want that

for (x <- xs if guard) body

should expand to

for (x <- xs) if (guard) body

rather than the current expansion

for (x <- xs filter (x => guard)) body

The motivations are

(1) you might want to write a guard that depends on side effects in body.
(2) you might want to avoid constructing a secondary list or other
data structure containing the filtered elements.

One problem is how to integrate this with for-yield collections. It
would be unacceptable to have different behaviors of guards depending
on whether we are in a for-loop or a for-yield.
The other problem is how to do this for all kinds of generators,
collections and others, in a uniform way.

Unlike in Haskell, there's no unit element that we can use in a
for-expression. So the only alternative seems to be to make use of
filterForeach, filterMap, filterFlatMap operations in the generator
classes. That is, for-expression expansion would still work as
described by Jorge:
>>>>

for (x <- c) f(x)

simply translates to:

c.foreach(x => f(x))

Likewise:

for (x <- c) yield f(x)

simply translates to:

c.map(x => f(x))

Something with a guard:

for (x <- c if p(x)) yield f(x)

translates to:

c.filter(x => p(x)).map(x => f(x))

And a nested for:

for (x1 <- c1; x2 <- c2) yield f(x1, x2)

translates to:

c1.flatMap(x1 => c2.map(x2 => f(x1, x2)))

<<<
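As a quick sanity check (an illustration added here with placeholder
values, not part of the original mail), the for-expression and its
expansion really do agree:

```scala
val c1 = List(1, 2)
val c2 = List(10, 20)
def f(x1: Int, x2: Int) = x1 + x2

// The nested for-expression and its hand-written expansion:
val viaFor     = for (x1 <- c1; x2 <- c2) yield f(x1, x2)
val viaDesugar = c1.flatMap(x1 => c2.map(x2 => f(x1, x2)))

println(viaFor)  // List(11, 21, 12, 22)
println(viaFor == viaDesugar)  // true
```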

But there would be a post-processing step, where

c1.filter(p).map(f) is rewritten to c1.filterMap(p, f)
c1.filter(p).flatMap(f) is rewritten to c1.filterFlatMap(p, f)
c1.filter(p).foreach(f) is rewritten to c1.filterForeach(p, f)

These rewritings would only happen for filters/maps/flatMaps generated
in for-expressions and only if the c1 type had defined the filterXXX
operations. We don't want to demand that for every generator type,
because that would make it too complicated to define generators. But
collections would uniformly have those operations, and implement them
with lazy filters. For instance, in class TraversableLike:
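The TraversableLike code referred to here was not preserved in this
excerpt. What follows is only a sketch of what such fused operations
could look like, written as standalone functions over List rather than
as TraversableLike members:

```scala
// Hypothetical fused operations: the filter predicate and the follow-up
// operation are combined into a single traversal, so no intermediate
// filtered collection is ever built.
def filterMap[A, B](xs: List[A])(p: A => Boolean, f: A => B): List[B] =
  xs.foldRight(List.empty[B])((x, acc) => if (p(x)) f(x) :: acc else acc)

def filterFlatMap[A, B](xs: List[A])(p: A => Boolean, f: A => List[B]): List[B] =
  xs.foldRight(List.empty[B])((x, acc) => if (p(x)) f(x) ::: acc else acc)

def filterForeach[A](xs: List[A])(p: A => Boolean, f: A => Unit): Unit =
  xs.foreach(x => if (p(x)) f(x))
```

For example, `filterMap(List(1, 2, 3, 4))(_ % 2 == 0, _ * 10)` keeps 2
and 4 and yields `List(20, 40)` in one pass.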

Question: Should we do this? Advantages I can see: (1) More efficient
operations through some amount of deforestation. (2) Fewer surprises
for users expecting imperative behavior.
Disadvantages: (1) It complicates the rules, and possibly the behavior,
because now filterMap, etc. would decide whether filter was strict or
lazy. (2) It would break some code.

Btw, one could also consider going further with deforestation, for
instance with rules like this:

c1.flatMap(x => f(x).map(g)) rewrites to c1.mapFlatMap(g, f)

Cheers
-- Martin

Re: Rethinking filter

Martin Odersky wrote:
> We have a number of open tickets which all want that
>
> for (x <- xs if guard) body
>
> should expand to
>
> for (x <- xs if (guard) body

(I assume this should read "for (x <- xs) if (guard) body")

>
> rather than the current expansion
>
> for (x <- xs filter (x => guard)) body
>
> The motivations are
>
> (1) you might want to write a guard that depends on side effects in body.
> (2) you might want to avoid constructing a secondary list or other
> data structure containing the filtered elements.

I'm against this suggestion if these are the only reasons for change.
It's not hard to manually move the guard inside the for-body.

Regarding the yield-case, the only motivation I can see is performance,
but it's not worth complicating the rewrite rules for that. It's better
to put more work into the optimizer in that case.


It is funny, I thought about (some of) these things (but not very
deeply, I should say) quite a long time ago, and my resolution was that
Scala would in the future be able to optimize some common cases and
eliminate intermediate structures. Not sure if this sheds any light
though...

Re: Re: Rethinking filter

> (1) you might want to write a guard that depends on side effects in body.
> (2) you might want to avoid constructing a secondary list or other
> data structure containing the filtered elements.

Jesper wrote:
> I'm against this suggestion if these are the only reasons for change.
> It's not hard to manually move the guard inside the for-body.
>
> Regarding the yield-case, the only motivation I can see is performance,
> but it's not worth complicating the rewrite rules for that. It's better
> to put more work into the optimizer in that case.

The optimizer needs to find out that 'filter' is side-effect free,
that map builds the exact same collection type, that the predicate and
mapping function are again side-effect free and can be reordered...
Maybe when an effect type-system is in place this should be done, but I
wouldn't hold my breath.

If it was only for efficiency, I'd agree with you, but having guards
that rely on side-effects in the mapping function is more intuitive for
most people. The current behavior is a source of subtle bugs. And no
matter who does the rewrite, the optimizer or the 'rewrite' rules, some
complexity (and arguably, a lot more if we rely on the optimizer) will
appear somewhere. Speccing it allows the programmer to rely on it, and
actually simplifies the implementation. The added complexity in the
specification doesn't seem that great to me, so I'm in favor of this
change.

Re: Rethinking filter

Iulian Dragos wrote:
> The optimizer needs to find out that 'filter' is side-effect free, that
> map builds the exact same collection type, that the predicate and
> mapping function are again side-effect free and can be reordered...
> Maybe when an effect type-system is in place this should be done, but I
> wouldn't hold my breath.

Yes, an effect tracking system is required to make this "stream fusion"
work properly (although it doesn't have to be integrated with the type
system).

> If it was only for efficiency, I'd agree with you, but having guards
> that rely on side-effects in the mapping function is more intuitive for
> most people. The current behavior is a source of subtle bugs. And no
> matter who does the rewrite, the optimizer or the 'rewrite' rules, some
> complexity (and arguably, a lot more if we rely on the optimizer) will
> appear somewhere. Speccing it allows the programmer to rely on it, and
> actually simplifies the implementation. The added complexity in the
> specification doesn't seem that great to me, so I'm in favor of this change.

Well, I think the key question is: are functions passed to filter, map
and flatMap allowed to have side effects? And, if so, what restrictions
apply and what properties can be relied upon? If we start by specifying
this, I think the implementation details will fall out naturally.

Re: Re: Rethinking filter

>> The motivations are
>> (1) you might want to write a guard that depends on side effects in
>> body. (2) you might want to avoid constructing a secondary list or
>> other data structure containing the filtered elements.

Jesper> I'm against this suggestion if these are the only reasons for
Jesper> change. It's not hard to manually move the guard inside the
Jesper> for-body.

Agree.

(1) doesn't seem important enough to do for 2.8, given everything else
that's going on, especially given that only imperative code (and rather
tricky imperative code, too) benefits from the additional complication.

As for (2), I'd point out that it's not hard to add ".view" to avoid
constructing intermediates.
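For instance (a small illustration added here, not from the original
post), a view makes the intermediate stages lazy, so no filtered
collection is materialized:

```scala
val xs = List(1, 2, 3, 4, 5)

// Without .view, filter would first build an intermediate List(1, 3, 5).
// With .view, elements flow through the filter and the map one at a time:
val squares = (for (x <- xs.view if x % 2 == 1) yield x * x).toList

println(squares)  // List(1, 9, 25)
```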

Re: Re: Rethinking filter

I agree with Seth and Jesper.

I rather like the simplicity of the current transformation. The "combine if it looks like this AND this method is defined" rule seems overly complicated and a can of worms. Would it apply the filterXXX methods based on the static type or the run-time type? How "deep" does the deforestation go?

That said, I haven't seen the open tickets regarding this issue. Which tickets are they? Perhaps they contain compelling arguments for change.

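The code example the following message refers to is not preserved in
this excerpt. A minimal reconstruction consistent with the behaviour it
describes (both guards read a mutable `limit`, and the yield body
reassigns it; the names are my own, not the original's) is:

```scala
var limit = 2
val xs = List(1, 2, 3)

// The for-expression under discussion would have looked something like:
//   for (i <- xs if i <= limit; j <- xs if j <= limit)
//   yield { limit = 3; (i, j) }
// Written out below as its strict-filter expansion (the translation in
// force at the time of this thread), so the behaviour is reproducible
// on any Scala version:
val result =
  xs.filter(i => i <= limit)
    .flatMap(i => xs.filter(j => j <= limit)
                    .map { j => limit = 3; (i, j) })

println(result)  // List((1,1), (1,2), (2,1), (2,2), (2,3))
```

The first index stops at 2 (its filter ran before any body code), while
the second index reaches 3 once the body has raised the limit.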

So you see that the first index is computed with the previously given
limit. It is not affected by the assignment in the yield. No surprise
because we know filter is strict. However, the second index does go up
to 3 after the first loop iteration, so it does take the changed limit
into account. If you look at the translation it's clear why it has to
be so:
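The translation itself is not shown in this excerpt; schematically
(with placeholder names `c`, `p`, and a pairing function, none of them
from the original mail), a doubly guarded for-yield expands under the
old scheme to:

```scala
val c = List(1, 2, 3)
def p(x: Int) = x < 3

// for (x1 <- c if p(x1); x2 <- c if p(x2)) yield (x1, x2)  expands to:
val expanded =
  c.filter(x1 => p(x1))          // outer filter: runs to completion first
   .flatMap(x1 =>
     c.filter(x2 => p(x2))       // inner filter: runs inside each iteration
      .map(x2 => (x1, x2)))      // map: interleaved with the flatMap

println(expanded)  // List((1,1), (1,2), (2,1), (2,2))
```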

You see that the second map is in the closure passed to the first
flatMap, so its evaluation gets interleaved with the flatMap
iteration. But the filter is in the generator of that flatMap, so gets
computed before the flatMap is even started.

But for someone who has not 100% internalized that translation (and
who has?), it's very weird indeed that _some_ parts of the for
comprehension are evaluated before the body is started, but other
parts are interleaved.

So I think we need a better translation of `for'. I have found a
scheme which is different from filterMap, filterFlatMap, filterForeach
and which looks simpler. We postulate a new method withFilter which
like filter takes a predicate as argument. The `for' translation is
exactly as it is now, but using withFilter instead of filter. This
means that one has the guarantee that the result of withFilter is
further taken apart using one of map, flatMap, foreach, or another
withFilter. For instance, the for expression above would translate to:
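The translated form was not preserved in this excerpt; as an
illustration of its shape (with placeholder names of my own), a guarded
for-yield such as `for (x <- c if p(x)) yield f(x)` would become:

```scala
val c = List(1, 2, 3, 4)
def p(x: Int) = x % 2 == 0
def f(x: Int) = x * 10

// Proposed expansion: withFilter instead of filter. On current standard
// collections this builds no intermediate filtered collection:
val viaWithFilter = c.withFilter(x => p(x)).map(x => f(x))

println(viaWithFilter)  // List(20, 40)
```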

It is recommended (but not enforced) that withFilter is lazy. That is,
it should return a proxy object with methods for map, flatMap,
foreach, and withFilter. Each of these would combine a guard with the
original operation. I append an implementation of withFilter for class
TraversableLike which would be inherited by all collection classes.

To ensure a smooth transition, I propose that if the `for' translation
scheme cannot find the `withFilter' method, it uses `filter' instead,
but emits a deprecation warning.

I believe this proposal has a lot going for it: It's not significantly
more complex than the previous translation, it leads to more intuitive
behavior, and it can be a win in performance because it reduces the
number of intermediate structures that are built.

The only possible downside is that it might break some code. But I
doubt that would be very likely, as code that relies on a filter
seeing a snapshot of the state before the loop body is executed seems
rather weird.

I think this also settles the case for Range. Range should be strict,
and we do not need an exception for its filter anymore.

Cheers

-- Martin

Here's the proposed implementation of withFilter in class TraversableLike:
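The appended implementation itself did not survive in this excerpt. A
simplified sketch of the idea, written for List rather than the real
TraversableLike (which would go through builders), might look like:

```scala
// A lazy filter proxy: no intermediate collection is built. Instead the
// predicate is fused into whichever operation consumes the proxy next.
class ListWithFilter[A](xs: List[A], p: A => Boolean) {
  def map[B](f: A => B): List[B] =
    xs.foldRight(List.empty[B])((x, acc) => if (p(x)) f(x) :: acc else acc)

  def flatMap[B](f: A => List[B]): List[B] =
    xs.foldRight(List.empty[B])((x, acc) => if (p(x)) f(x) ::: acc else acc)

  def foreach(f: A => Unit): Unit =
    xs.foreach(x => if (p(x)) f(x))

  // A further guard is conjoined with the existing one, still lazily:
  def withFilter(q: A => Boolean): ListWithFilter[A] =
    new ListWithFilter(xs, x => p(x) && q(x))
}

def withFilter[A](xs: List[A])(p: A => Boolean): ListWithFilter[A] =
  new ListWithFilter(xs, p)
```

With this in place, `withFilter(List(1, 2, 3, 4))(_ % 2 == 0).map(_ * 10)`
evaluates the guard only while mapping and yields `List(20, 40)`.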
