GHC will spit out all six type parameters, in all their glory. This can be a bit confusing.

Perhaps it's time we start contemplating ways of alleviating this problem from the GHC end, rather than just leaving it to each library developer to solve it in an ad-hoc fashion. Haskell's type system can do some wicked awesome stuff; it's rather silly to not take advantage of it simply for the sake of the default way that error messages are presented.

With type-level records (as seen in e.g. Ur), you could bundle some of the parameters together to get better error messages. Keeping the monad and the result type as separate parameters for the Monad and MonadTrans instances, you might end up with a type like

In general, I'm skeptical of introducing type synonyms (excepting associated type synonyms in classes) for the purpose of clarifying errors. It tends to be fragile, and depending on the details of the error, the synonym may or may not appear at any given use, which just makes things worse. I'm similarly skeptical of using type synonyms to "hide" complexity, for the same reason. Synonyms just add vocabulary to learn; they never remove anything that you need to understand to comprehend an interface. Type synonyms are a solution to a documentation problem, but never to a complexity problem, whether in API or errors.

I remember having asked on Haskell-Café why partial applications of type synonyms weren't allowed. I believe I received the classical answer ("it adds indeterminacy/undecidability in some cases for type inference").

Still, it's true that newtypes are too pervasive to be convenient in that respect.

It is not an issue of undecidability. Indeterminacy is a better word. But it's not "some cases", it's essentially every case where we would want to infer a type constructor of higher kind.

Suppose I write

return "foo" :: (String, String) -- or (,) String String

The type of return is (Monad m) => a -> m a and "foo" :: String so we need to solve (,) String String = m String for m. In Haskell, we can do this--the solution is m = (,) String. That only works because we have no type-level lambdas! If m could be a type-level lambda, there are other possibilities, like
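To make those possibilities concrete, here is a sketch of the unique solution GHC finds today, together with the lambda solutions that would arise (the lambda syntax is purely illustrative, not valid Haskell):

```haskell
-- Solving (,) String String ~ m String for m:
-- without type-level lambdas the only solution is m = (,) String.
-- With type-level lambdas, all of these would also unify
-- (illustrative syntax, not valid Haskell):
--   m = \a -> (a, a)
--   m = \a -> (a, String)
--   m = \a -> (String, String)   -- a constant function ignoring a
--
-- Because the solution is unique, GHC can pick an instance.
-- Since base 4.9, ((,) w) is a Monad whenever w is a Monoid,
-- so this expression actually type checks:
main :: IO ()
main = print (return "foo" :: (String, String))
-- return pairs the value with mempty, printing ("","foo")
```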

even before considering type system extensions like type families. How is the compiler supposed to select the appropriate instance? All of these are potentially monads.

And if the compiler cannot select the instance automatically, there's no point in using type classes rather than just a record of functions. So the request "I want to be able to write instances for type-level lambdas" actually has negative utility: you can't gain anything from doing so, and you would break the existing type class system. It is not an issue of "not implemented yet".

(As an aside, I wish there was a definitive, fuller writeup of this justification that I could refer people to, since I have explained this about a dozen times. It relates fundamentally to the way the Haskell type system works and it should be part of the general wisdom about type classes. I know the Haskell "old hands" know this, but it should be more accessible somehow to people who are just getting comfortable with type classes and higher kinds.)

While there is indeterminacy here, these unifications are only potential solutions to the system of constraints we're trying to solve. We have to push the question back one step: these potential solutions are only actual solutions if there are instances for them in scope. But, notably, there cannot be instances for more than one of those unless we also have OverlappingInstances (and possibly IncoherentInstances). Thus, in the absence of those extensions, the actual solution (i.e., the instance dictionary required) is guaranteed to be unique, whatever it is. And even with OverlappingInstances, in the absence of IncoherentInstances we are still guaranteed to have a subsumption lattice, and therefore we can always uniquely choose the most specific instance.

Yes, it would greatly complicate the inference engine. And, sure, there may be uglier cases which can refute this argument. But the presence of the argument suggests that the problem is not, in fact, known to be impossible. I agree that it's hardly a mere "not implemented yet" ---there's a lot of theory still to be done to prove confluence of type-class resolution--- but I'm not convinced that it's impossible.

Move the Pipe constructors to a separate module (Data.Conduit.Internal), and recommend people avoid using them directly.

I will say that I think this is good practice. I've lost count of the number of times I've wanted to make legitimate extensions to a library but was unable to because of unexported constructors or methods that had no good reason to be hidden; I had to go and patch the library itself. So I feel the .Internal approach is best: it's a way for the API developer to say “hey, you can use it, but don't rely on it”, while not blocking other developers from extending or experimenting. IME such “encapsulation” is almost always premature and just annoying.

For example:

class PrintfType t
The PrintfType class provides the variable argument magic for printf. Its implementation
is intentionally not visible from this module.

I feel like Haskell's implementation hiding policies are too strongly influenced by the OO dogma that was around at the time of its creation. Using hiding to enforce type system invariants is a valuable tool, and I don't think I'd change the language at all; I'd just suggest that, as a community matter, it would be better for internals to be exposed more often, but labeled as such (.Internal is great). In other languages like Python or Perl, if the implementation is a little too locked up, you can generally still do what you need without modifying the library. But the combination of all three things above means that if the library author isn't precisely right, you often have no option left but to crack the library open and start modifying it directly. Making the wrong protections too strong is not a win.

The problem often is that while we have a strong type system, it isn't strong enough.

For example, I often use newtypes to implement weak dependent sums in Haskell. I.e., given newtype PFoo = PFoo Foo, the type PFoo represents Foos which satisfy some additional predicate P (e.g., non-negative, non-zero, etc). The whole point of using newtypes is to ensure that being strongly type correct does not actually impair performance. However, if I expose the constructor for these newtypes, then the type safety goes out the window--- since Haskell lacks dependent types and so there is no way to force the standard typechecker to respect the invariants I'm encoding.
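A minimal sketch of that pattern (the names Positive, mkPositive, and getPositive are mine, purely for illustration):

```haskell
-- The Positive constructor would NOT be exported; only the smart
-- constructor and the projection are, so every value of this type
-- satisfies the invariant (here: strictly positive) by construction.
newtype Positive = Positive Int

mkPositive :: Int -> Maybe Positive
mkPositive n
  | n > 0     = Just (Positive n)
  | otherwise = Nothing

getPositive :: Positive -> Int
getPositive (Positive n) = n
```

Because it is a newtype, the wrapper is erased at compile time, which is exactly the performance point made above: type safety without a runtime cost.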

I understand that the subject of your ire isn't newtypes like these but rather data structures etc. However, I see no principled distinction between those cases. Often with data structures the constructors must be hidden for exactly the same reason the newtype constructors are: the structure relies on powerful type invariants which Haskell's type system cannot express (or cannot express efficiently).

And even if we did have full dependent types, there are still reasons for implementation hiding--- or else Coq, Agda, and everyone else wouldn't be doing it. One reason is simply a pragmatic one: users of, say, IntMap shouldn't care how it's proved correct; they just want to know that it is. That is, while dependent types are needed locally within the implementation in order to demonstrate its correctness, the dependencies don't often escape into the API. But another reason for hiding is that it appears to be necessary for fundamental reasons. That is, by hiding implementations you tell the typechecker where it is not allowed to unfold things, and this is necessary to prevent type checking from taking unacceptably long.

Fundamentally there is always a difference between the thing being implemented and the content of the implementation. The implementation lives at the level of syntax, whereas the thing being implemented lives at the level of semantics. Computers understand structure/syntax, but they cannot comprehend meaning/semantics. Thus, this distinction is firmly in the realm of human thought and, much as we try to codify it into syntax, there will always be a gap between the two. Thus, without implementation hiding there is no way for the computer to ensure that clients are not depending on implementation details.

Why do a strong type system and immutability make implementation hiding any less relevant?

The reason to do implementation hiding, besides enforcing invariants, is so that people don't wind up depending on your implementation, so you can change it without causing breakage. It's to preserve your freedom as a library author and to protect users of the library. It's possible to make compromises like exposing implementation details while stating that "this can change at any time and if you use it you're on your own", but in practice, if something important ends up depending on them, changing them can be impractical all the same. If some Haskell package authors are overzealous with their implementation hiding, leaving the public interfaces insufficiently expressive, that seems like a human problem, not anything to do with the language.

Or - I think I just now got your meaning - if in languages like Python and Perl due to the nature of the language it's possible to subvert the module system, I would call that a bug, not a feature. But if that's what you want to do, then Template Haskell might be able to do it.

(As an aside, do mutability and a lack of static typing have anything to do with it in even Perl and Python's case? I remember there being a big internet debate between the Python and Ruby camps about how in Python private members were only private by convention but could actually be accessed, which seems to imply that Ruby does manage to enforce implementation hiding even though it's dynamically typed and mutable.)

Thinking about it a little, I think the situation might be a result of Haskell programmers placing a higher priority on enforcing invariants than programmers of other languages. So if they have some things in the library which can be used to break invariants but which could also be necessary for some purposes, they will opt to hide those parts and preserve the invariants. In that case the Internal module is a good compromise (though maybe Unsafe would be a more appropriate name?): the implication is that you can do bad things with it so you better watch out, not that it might be changed without warning.

The reason to do implementation hiding, besides enforcing invariants, is so that people don't wind up depending on your implementation, so you can change it without causing breakage.

There's a missing link in your logic there, which is that only implementation hiding can accomplish this goal. Since the time that Haskell was created, I believe a number of communities have established that sufficiently strong community standards suffice to obtain the benefits without incurring anywhere near the costs, and the costs are greater on the Haskell side than in those languages due to the radically superior strength of enforcement.

Immutability fits into this complex because in Python/Perl/Ruby/other languages I often solve this with some form of monkeypatching, which is a form of mutability. I also often solve this with direct mutation of state I'm not "supposed" to mutate, also mutability.

The strong typing comes in because it lets us pick up many dependency change issues at compile time, which seriously mitigates the dangers of overdependence on internals. (It doesn't mitigate the pain of having to change, but it does mitigate the danger.) This is the one that I think particularly invalidates simple repetition of the standard OO line about the need for implementation hiding. The types carry a lot more information in Haskell than Java. It is enough for only one of the types or implementation hiding to provide compile-time protection, we don't redundantly need both.

if in languages like Python and Perl due to the nature of the language it's possible to subvert the module system, I would call that a bug, not a feature

True, but I'd suggest in some sense it's a bug in the module, rather than the language, at least relative to your use. If the module must be modified before you can successfully use it, doesn't that sound like a module problem? (I say "relative to your use" because it may be doing precisely what the designer intended, so in general we might not call that "a bug", but it is from your point of view.)

I think a lot more stuff ends up hidden than needs to be. If you've got a library that boils down to parsing something, processing the stream of tokens, and reassembling it (imagine an HTML cleansing library), while I appreciate the clean interface of Thing -> Thing, I'd like access to the stream to put my own processors in too, for instance. Functional programming often results in lots of little useful pieces and it would be greatly more compositional if those little pieces were available for use.
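As a sketch of what exposing those little pieces could look like (all names here are hypothetical, not any real library's API):

```haskell
-- Instead of exporting only the end-to-end Thing -> Thing function,
-- the library also exports the stages it is composed from.
type Token = String  -- stand-in for a real token type

tokenize :: String -> [Token]
tokenize = words

render :: [Token] -> String
render = unwords

-- The packaged pipeline is just the composition of the pieces, with a
-- hook where a user can insert their own processor:
cleanseWith :: ([Token] -> [Token]) -> String -> String
cleanseWith process = render . process . tokenize
```

A user who wants extra behavior then writes, e.g., cleanseWith (filter (/= "bad")) instead of forking the library to get at the token stream.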

(And to reiterate, no this is not an absolute declaration that implementation hiding is bad, it's a statement of my opinion that the current balance is suboptimal. I would not change the language itself at all, it has use cases for which it is clearly necessary.)

Right, I don't think we disagree about the remedy at all. People are hiding too many things that would be useful, and it would be better to export them in some way, even if it means they come with a warning attached.

We disagree about other things - I prefer that if we have implementation hiding, that it's enforced, you prefer that it can be circumvented in case of need; in addition I was trying to disentangle the two reasons for implementation hiding - but that's off topic here. (I agree that Haskell's types make it much easier to adapt to changes in a correct way and that that's a very good thing, but I think it's somewhat orthogonal.)

Actually, it can be circumvented in case of need, at least since in the vast majority of cases, if not all, Haskell libraries are distributed as source. I think one potential difference between your way of thought and mine is that I already see that, if I put my mind to it, I can blow my way past any invariant you care to lay out; so why deceive yourself into thinking that your invariants are being enforced by anything other than consensus already? And given that, why not make all our collective lives a little easier by understanding, accepting, and acting on that reality? A balance that more accurately reflects reality has the ironic consequence that, by appearing to superficially weaken the code, it actually strengthens it: it makes it less likely that a user will have to modify the code at all just to get it to work, often far more blindly than they would by working with the internals. It also provides great places to document the invariants and the consequences of violating them. I know this can be hard to see in the abstract, but every change I've had to make to existing Haskell libraries for some particular thing I needed would have been unnecessary if the code had been less obsessive about hiding its internals. (Particularly the older code I've encountered with the old-and-now-busted exception handling deeply woven into it.)

If you modify my library you're not depending on my library, you're depending on your fork of my library, and it's up to you to maintain it. That's fine with me.

But we still seem to be talking past each other: I'm saying the language should not allow implementation hiding to be circumvented; you're responding that implementation hiding can be circumvented from outside the language, therefore libraries should export more of their internals. And I agree! Libraries in many cases should export more of their internals. But that doesn't touch the question of why implementation hiding should be easier to circumvent.

(As an aside, Template Haskell really can see unexported constructors, though some people consider it a bug. Have you considered using it?)

Gabriel will probably correct me on this, and I'm not sure if I can actually get it to work. And even if I could, I don't think it's all that useful. Maybe with some more type parameters? We still have 18 to go :)

I have a pretty good idea what the function is doing: creating a stream of ByteStrings. On the other hand:

someFunc :: Monad m => Pipe l i ByteString u m ()

is much harder to parse. Even someone who is familiar with the datatype will still need to take the time to count the positions of the parameters. I'm not sure how significant this complexity is for users; that's what I'm trying to determine.

For errors: I agree that it can be simulated by using Either. I think that setting the output parameter of every Pipe to Either e ... would be sufficient. But I'm not even sure yet how useful error propagation is. We've been getting by without it in conduit for a while, and in enumerator before that. I've yet to see real use cases where it's necessary.

I'm also not sure about propagating leftovers upstream. I've actually been considering a modified version of injectLeftovers that holds onto the leftovers and returns them with the result. It might solve the one case where I needed constructors that I mentioned in the blog post (request header limiting in http-conduit).

And you're absolutely correct about generalizing the composition operator; I've done so in the devel branch now. Thanks!

Alright, I'm coming around to this point of view. I'd much prefer to see a clearer type parameter there, though. It took me some nontrivial thought to realize that there are really only two options there, and that l is only intended to be unified with a if anything. If GHC extensions are not an issue, then a GADT with phantom types Leftovers and NoLeftovers could be much cleaner. A new kind as well would be even clearer yet, but I'm afraid that it's a bit premature to expect any kind of core package to require GHC 7.4.
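For concreteness, a minimal sketch of the new-kind variant (this needs DataKinds and thus GHC 7.4, which is exactly the concern above; every name here is hypothetical, not conduit's actual API):

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

-- A dedicated kind for the leftover slot: the type itself now documents
-- that exactly two states are possible, instead of a bare variable l.
data LeftoverState = Leftovers | NoLeftovers

-- Placeholder pipe type; the real constructors are elided here.
newtype Pipe (l :: LeftoverState) i o u m r = Done r

runPipe :: Pipe l i o u m r -> r
runPipe (Done r) = r

-- A consumer that may push unconsumed input back upstream:
peek :: Pipe 'Leftovers i o u m ()
peek = Done ()

-- A consumer that promises in its type never to produce leftovers:
sinkNull :: Pipe 'NoLeftovers i o u m ()
sinkNull = Done ()
```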

There's still one problem, namely that to get a proper upstream identity you cannot receive a Right after a Left; however, if you separate the category of u values from the return value r, then I think you can implement this correctly and form a strong upstream identity.

The lack of a strong upstream identity manifests itself in the inability to return a value directly from upstream. I know this is not as important to conduits as to pipes, but I briefly mentioned the benefits of being able to do so here:

Can you clarify what you mean by "separate the category of u values from the return value r"? I'm not sure I follow.

I actually think the inability to return a value directly from upstream is an advantage of this approach versus pipes: the type system is able to more strongly enforce invariants, such as the fact that you should only be able to return a value from upstream if downstream is an infinite consumer. This is what lets conduit avoid the need for all of the Maybe wrapping in pipes.

Anyway, it's easy to construct a helper function to create infinite consumers in this new approach, e.g.:
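A self-contained miniature of such a helper (a toy type of my own, not conduit's actual Pipe): an infinite consumer never stops awaiting on its own, so it is safe for it to observe upstream's final result.

```haskell
-- Toy pipe: either finished with a result, or awaiting input with a
-- second continuation for when upstream finishes with its result u.
data Pipe i u r
  = Pure r
  | NeedInput (i -> Pipe i u r) (u -> Pipe i u r)

-- An infinite consumer: it keeps awaiting forever, so when input runs
-- out it can return upstream's result alongside its fold state.
sumUntilDone :: Int -> Pipe Int u (Int, u)
sumUntilDone acc =
  NeedInput (\i -> sumUntilDone (acc + i)) (\u -> Pure (acc, u))

-- Driver: feed a list of inputs, then the upstream result.
runPipe :: [i] -> u -> Pipe i u r -> r
runPipe (x:xs) u (NeedInput f _) = runPipe xs u (f x)
runPipe []     u (NeedInput _ g) = runPipe [] u (g u)
runPipe _      _ (Pure r)        = r
```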

Can you clarify what you mean by "separate the category of u values from the return value r"?

I mean something like:

Pipe l i o u1 u2 m r

I guess the part I'm not sure about is how you write something like awaitF from Frames that automatically handles upstream termination (i.e. returns a instead of Maybe a) without bottoming. I find the lack of a proper awaitF makes conduit consumers more difficult to write, as I noted in my link. In other words, I'm looking for something with the type:

Pipe l i o u m i

However, you are right that this does make folds easier to write. Perhaps these should be considered two separate categories with separate purposes (i.e. the pipes one is not biased towards any pipe in the chain returning, whereas yours is a fold-like category that biases towards the downstream pipe). There is still the issue of the weaker upstream identity that requires invariants, too, and that will always bug me, but you and twanvl are really making some interesting points and I need to think more about this.

Do you have a sketch of an implementation somewhere with the Pipe l i o u1 u2 m r type that shows how it can be used for auto-termination? It's not clear to me how to make that work, or what exactly you have in mind here.

Initially I'm rather favoring Twan's point that this can be accomplished with EitherT when needed.

I already benchmarked conduits, pipes, and frames, and conduits 0.4 is slower than pipes, but faster than frames, and the changes he's proposing should not affect performance. The majority of performance degradation is from the finalizer caching, which this doesn't impact.

I'd be interested in seeing those benchmarks. In the benchmark in the conduit repo, pipes is 20x slower than conduit on pure code, and 2x as slow on file writing code.

This devel branch has been thoroughly performance tested, and AFAICT there are no regressions. I had to add a few rewrite rules to avoid intermediate data structures (e.g., a rewrite rule for awaitE == flip NeedInput), but as those seem to fire reliably there doesn't seem to be a problem.

I agree that in general it's meaningless how fast these things perform in completely pure code: if you can express it in pure code, just use laziness and don't bother with any streaming library!

The reason pure code does matter though is that it's very common to need to combine some pure Pipes (like map or fold) with IO pipes. I should figure out a good benchmark to measure the effect of adding some pure pipes to an IO pipeline like file copy.

Yeah. Also, the only reason pipes go faster than conduits is that they aren't doing any finalization work. However, right now I'm working on implementing a proper parametrized monad, which is why I've been a bit silent on pipes.

Actually, I was a bit mistaken in my comment: there's a very slight performance decrease for pure code from conduit 0.4 to 0.5, but a noticeable increase for non-pure code. You can see the Criterion results here: