Type inference across method chains

I have been trying to figure out why type inference doesn't work across method chaining (see the code example below). I finally came across these old messages on lambda-dev:

On Tue Jul 16 15:08:05 PDT 2013, Dan Smith wrote:
> Without ruling out the possibility of enhancements that address situations like this, [...] the status quo is that when you type a dot, the compiler has to completely type-check the stuff to the left before it can proceed. Inference can do a lot of cool tricks with context, but will not be influenced by "context" that occurs after the dot.

Method chaining really is getting much more common in Java, with the advent of APIs like Java 8 streaming, Apache Spark and Apache Flink.

Type inference across chained calls in Flink seems to work sometimes, but not always, which is frustrating. You end up having to add type declarations in lots of places that shouldn't need them, or breaking method chains into separate declarations so that the compiler can understand the types.

Based on Dan Smith's comments in the above thread, it seems that it was decided that the numerous complex steps involved in receiver method resolution had to all be completed before parameter type inference. However, it seems that if the entire graph of all possible method signatures and all type constraints were built before any inference were performed, then all of these steps could be performed simultaneously and iteratively until convergence: in other words, all the types and methods in a method call chain could be solved as a single constraint satisfaction problem. If in the process of solving this constraint satisfaction problem, it was discovered that there is no single satisfying assignment of types to type variables, and/or if this did not result in a single method definition being selected per method name, then a type error would be reported.
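To make the "solve everything simultaneously until convergence" idea concrete, here is a toy sketch. This is in no way javac's actual algorithm, and every name in it is invented for illustration: it just shows the shape of collecting all constraints first, iterating domain intersection to a fixed point, and reporting an error when no satisfying assignment exists.

```java
import java.util.*;

// Toy illustration of "build all constraints first, then iterate to a
// fixed point" -- NOT javac's algorithm; all names are invented.
class TinySolver {
    // Each type variable has a domain of candidate types; each equality
    // constraint forces two variables to share a domain. Intersect
    // repeatedly until nothing changes; an empty domain means there is
    // no satisfying assignment, i.e. a type error.
    static Map<String, Set<String>> solve(Map<String, Set<String>> domains,
                                          List<String[]> equalities) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] eq : equalities) {
                Set<String> meet = new HashSet<>(domains.get(eq[0]));
                meet.retainAll(domains.get(eq[1]));
                if (meet.isEmpty()) {
                    throw new IllegalStateException(
                        "no satisfying assignment for " + eq[0] + " = " + eq[1]);
                }
                if (!meet.equals(domains.get(eq[0]))
                        || !meet.equals(domains.get(eq[1]))) {
                    domains.put(eq[0], meet);
                    domains.put(eq[1], new HashSet<>(meet));
                    changed = true;
                }
            }
        }
        return domains;
    }
}
```

With domains {T1: {Long, Object}, T2: {Long}} and the constraint T1 = T2, the loop converges with both variables pinned to Long; a constraint between disjoint domains throws, corresponding to the type error described above.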

Based on the discussion in the thread above, it sounds like the reason this hasn't been done yet is due to the complexity of the current compiler design, not because it's not actually possible to do this?

Some example code: the method test1() below really should typecheck OK, but gives an error on (l + 1) in Java 8. Providing types for the lambda params in test2() allows the types to propagate one method call further along the chain, but it is frustrating to always have to do that. Breaking up the chaining into two declarations in test3() allows the typechecking to work for both operations, but this defeats the purpose.
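The example code itself was not preserved in this copy of the thread. The JDK's own Comparator API reproduces the same three behaviors described above, so the following is an illustrative reconstruction (not the original Flink-based code):

```java
import java.util.Comparator;

class Chains {
    static void test1() {
        // Does not compile in Java 8: comparing(...) is fully
        // type-checked before .reversed() or the target type is seen,
        // so l is inferred as Object and (l + 1) is rejected.
        // Comparator<Long> c = Comparator.comparing(l -> l + 1).reversed();
    }

    static Comparator<Long> test2() {
        // Explicit lambda parameter types let inference propagate one
        // method call further along the chain:
        return Comparator.comparing((Long l) -> l + 1).reversed();
    }

    static Comparator<Long> test3() {
        // Breaking the chain into two declarations gives each receiver
        // a target type, so both operations type-check -- but this
        // defeats the purpose of chaining:
        Comparator<Long> c = Comparator.comparing(l -> l + 1);
        return c.reversed();
    }
}
```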

Re: Type inference across method chains

It's a pretty big overstatement to say "type inference doesn't
work across method chaining." As is inevitable with type
inference, there will be limits, and people will always complain
about the marginal example, and say "but this example could be
accommodated too, with a more complex inference algorithm." And
that's usually correct, but then there will be new cases at the
margin, about which people will complain. So let's not fool
ourselves that this is "the last case" -- it's just "the next
case."

There's nothing magic about type inference; it's just constraint
solving. The choices one has in designing a type inference
algorithm amount to: what constraints to use, and over what scope
to solve.

There are also a number of hidden trade-offs besides complexity.
For example, the larger the scope over which constraints are
solved, the more likely error messages will be something like
"there was an error somewhere in Foo.java" -- which users find
unhelpful.

So, with that, let's look at your example. The cases which make
people complain "type inference doesn't work under chaining"
invariably reduce to the case where there is a generic method used
as a receiver for a chained invocation, and whose generic type
parameters cannot be solved directly from its arguments, but
instead would need type information that would have to be
back-propagated from later links in the chain (which generally
further require information from the target type.)

As you correctly point out, there are simple workarounds, such as
using explicit lambdas or exact method refs (test2), breaking up
the chain (test3), or providing explicit type witnesses (not
shown.)
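Since the type-witness workaround is mentioned but not shown, here is what it might look like using a JDK method with the same problematic shape (a generic factory whose type parameter cannot be solved from its arguments); this is an illustration, not the thread's original example:

```java
import java.util.Collections;

class WitnessDemo {
    static long demo() {
        // Without the witness this does not compile in Java 8:
        // emptyList() is type-checked as List<Object> before the rest
        // of the chain is seen, so s.length() is rejected.
        // return Collections.emptyList().stream().map(s -> s.length()).count();

        // An explicit type witness supplies the information that
        // back-propagation from later links would have provided:
        return Collections.<String>emptyList().stream()
                .map(s -> s.length())
                .count();
    }
}
```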

> Type inference across chained calls in Flink
> seems to work sometimes, but not always, which is frustrating.

"sometimes but not always" is a fact of life with type inference;
let's not pretend that "always" is a place we could get to. It's
just a question of where to draw the line, and accept that there
will always be something just over the line.

> Method chaining really is getting much more common in Java,
> with the advent of APIs like Java 8 streaming, Apache Spark and
> Apache Flink.

You say this like you think we're not aware of this, or
indeed that enabling broader use of chaining in APIs wasn't, like,
one of the major goals of the work we did in Java 8 on type
inference! The reality is: there will always be limitations, and
they will always be annoying when you encounter them.

> Based on the discussion in the thread above, it sounds like
> the reason this hasn't been done yet is due to the complexity of
> the current compiler design, not because it's not actually
> possible to do this?

It is indeed not impossible. But, as you say, it adds
additional complexity to a type inference algorithm that has
already grown quite complex (and which now interacts heavily with
another complex mechanism, overload selection). Could we take one
more step, and move the corner cases further out to the margins,
at the cost of greater complexity (and perhaps decreased
usability)? Yes, we could. In Java 8, we chose to leave it here,
which I think was a sensible choice.

We might someday come back to look at this -- but right now, it
seems like this is not the highest-return use of limited and
highly specialized resources.

On 3/16/2017 10:55 PM, Luke Hutchison wrote:
> [...]

Re: Type inference across method chains

> So let's not fool
> ourselves that this is "the last case" -- it's just "the next
> case."
> [...] So, with that, let's look at your example. The cases which make
> people complain "type inference doesn't work under chaining"
> invariably reduce to the case where there is a generic method used
> as a receiver for a chained invocation, and whose generic type
> parameters cannot be solved directly from its arguments, but
> instead would need type information that would have to be
> back-propagated from later links in the chain (which generally
> further require information from the target type.)

Right; nonetheless, if this single "next case" were solved via bidirectional propagation of type information, rather than the current situation where method overloading before the dot is resolved before type inference after the dot, I would argue that it would solve a large majority of the remaining real-world cases where an expression quite obviously should typecheck (according to human judgment) but does not. It would certainly make streaming APIs work significantly better with generics. This seems to be the most obvious weakness in the current type inference algorithm.

> It is indeed not impossible. But, as you say, it adds
> additional complexity to a type inference algorithm that has
> already grown quite complex (and which now interacts heavily with
> another complex mechanism, overload selection). Could we take one
> more step, and move the corner cases further out to the margins,
> at the cost of greater complexity (and perhaps decreased
> usability)? Yes, we could. In Java 8, we chose to leave it here,
> which I think was a sensible choice.
>
> We might someday come back to look at this -- but right now, it
> seems like this is not the highest-return use of limited and
> highly specialized resources.

Can you please address in concrete terms, for this specific request only (i.e. for back-propagation of type information before the final decision about method overloading is made, as opposed to "for all cases" or "for the last case") -- just how difficult solving this would be, and what the technical limitations would be given the current type inference algorithm?

Re: Type inference across method chains

On 3/17/2017 4:41 PM, Luke Hutchison wrote:

> Right, nonetheless, if this single
> "next case" were solved via bidirectional propagation of
> type information, rather than the current situation where
> method overloading before the dot is solved before type
> inference after the dot, I would argue that this would solve
> a large majority of remaining real-world cases

You may be right, but bear in mind that it *always* looks this way.
You forgot the implicit "...that I currently know of" that needs to
trail this sentence. The definition of "real world" is implicitly
bounded by "the stuff that works now", and always moves. (Ten years
ago, chaining was not considered a real world case at all.)

> Can you please address in concrete
> terms, for this specific request only (i.e. for
> back-propagation of type information before the final
> decision about method overloading is made, as opposed to
> "for all cases" or "for the last case") -- just how
> difficult solving this would be, and what the technical
> limitations would be given the current type inference
> algorithm?

I know you probably don't think you're asking this, but you're
basically asking "can you solve the problem (or try until you give
up), and then explain what you did." So, that's not really
practical, as much as I'd love to have a simple answer I can pull
out that fits on a page and is accessible to
non-type-system-experts. Suffice to say it's doable, but it's not
trivial, and the risks are not trivial.

Also, bear in mind that "how difficult" is only one (and in reality,
probably the least important) dimension of the problem. "How
difficult" corresponds to "how much will it cost Oracle", which is
certainly a reasonable input into the decision process, but far from
the only one. How about what incremental perceived complexity for
the users would it create? What damage would it do to scrutability
of diagnostics? What are the compatibility risks? (Adding new
constraints into type inference algorithms can often cause something
that was previously solvable to become overconstrained, breaking
existing code.) What is the consequence for the specification? For
implementors like Eclipse that have to implement from the
specification? In what ways might it interact with other features,
present or future?