This is a known problem that I've discussed with Jacques and I think could and should be improved before the next version.

The fundamental issue is that the OCaml syntax has no way to *name* the existential variables corresponding to a GADT constructor. The name generation it uses now ("ex#" + fresh integer) is particularly obnoxious, but in any case there is a type corresponding to your variable 'a in the type of the constructor A that has no name.

My personal opinion is that not having explicit binding of those variables in the syntax (in the constructors: (A : 'a. ('a -> 'b) * ... -> 'b t), and in the matching (| A a (e, e') -> ...)) is a defect of the GADT syntax -- but syntax is about hard compromises, and here the more concise syntax was chosen.

I believe the current behavior could be improved by:
- choosing names for existential that are more natural; I would like them to include either the constructor they come from, or the original user-chosen name of the corresponding type variable in the type definition, or both
- improving the wording of the error message to avoid the annoying "has type 'foo' but was expected at type 'foo'" and possible display something GADT-specific

In the meantime, this is not a bug (the behavior is perfectly correct) but a rather shortcoming on the user interface side. And it is not "sometimes", all existential variables have those unreadable names, and all escape situations generate those unreadable error messages.

Thanks for your report anyway: it's helpful to see that people care about this.

This is indeed a well known problem.
Actually part of it was a bug, as there was already some code to reuse variable names from the definition, but it didn't work properly.
This part is now fixed in trunk, at revision 12998.
The new error message is:
Error: This expression has type a#0 t but an expression was expected of type
a#0
Unfortunately, this is only a small improvement, as origin of the name is not so clear, and many existential types do not come directly from constructors.
As for the "#0" part, this is indeed to avoid name collisions.

As Gabriel mentioned, there has always been a plan to support naming of existential types in patterns.
However this requires new syntax, and this was delayed waiting for more experience.
This should be in the next release.

I'm open to improvements of the wording, but clever error reporting is very difficult.

I have added a patch in trunk/experimental/garrigue/pattern-local-types.diffs which allows to define types in patterns.
They are only allowed at the beginning of match cases, and the syntax is:
| type a b c. <pattern> -> <body>
They are _different_ from local abstract types, as they are not abstract, but inferred from their uses as type annotations inside <pattern>. If the inferred type is a fresh existential type, then some special manipulation is done to rename it to the given name, so that it will appear in error messages.

To be clear to other potential readers of this bug item, the problem is that as there is currently no explicit binding syntax on GADT type *definitions*, the existential types are introduced in no particular order. If you define a constructor type K with two existential types 'a and 'b, we cannot use, say, the syntax (K (type a) (type b) (x, f) -> ...) because there is no notion of "first" and "second" existential type. So Jacques' choice here is to allow to introduce names for some types, that are later unified with the corresponding existential variable by additional annotations on the pattern arguments (in his example, (x:b) is necessary to link the fresh name b to the actual existential variable 'a of D).

I'm not really satisfied with this compromise, because the notations are still heavier than necessary (though arguably it helps readability to force to annotate use of the types) and we have yet another semantic distinction around type variable binding.

What about the following other compromise:
- allow the optional use of a type binding syntax at the GADT declaration site
type t = D : 'a . 'a * ('a -> int) -> t
(I'm not hung up on this particular syntax, for example it would be ok to have "type a . a * (a -> int) -> t" instead)
- use the "(type a)" syntax to locally introduce the type after the constructor name and before the value parameters, using the order in the definition
function D (type a) (x, f) -> f x
- for GADTs that have not used an explicit order in their definition, reject this syntax (or possibly use the left-to-right order of occurence of the variables, but this would be a hack)

I believe this gives us a reasonable transitional path to a more principle syntax, with ultimately lighter annotations at the matching site than what is possible with your suggestion.

(Of course I'm no expert on the grammatical ambiguities here so one of the syntax proposed may actually not be an option. The important idea is having explicit type binding syntax at both the definition and use site of the constructors.)

I'm actually not so sure that this use of the "(type a)" syntax is consistent with its current use (which is important: I don't want to introduce new semantic notion). Is it the case? Or should we prefer another syntax for existential types in GADT patterns?

(Where are we on the "(type a b)" sugar? Is it currently implemented?)

Well, I was also a promoter of explicit syntax for existentials in GADT definitions, so I cannot say I don't agree with Gabriel for the first part.

OK, your proposal makes sense, but it has the major drawback of requiring changes in type definitions, while we still allow the old format.
So if say, you are using a library with implicit existentials, you would have no way to use them.
Well, maybe using left-to-right order is okay, but we must be very careful about avoiding bugs (the internal representation must keep the syntactic order, which is not usually true for objects and polymorphic variants for instance.)
Also there is the question of how to handle row variables.
I think your use of (type a) is consistent (you do not really abstract an external type, since the existential is already abstract, and cannot leak anyway, but otherwise this is the same thing), so at least this part is okay.
There is currently not (type a b) sugar, but this would cost nothing.

As for a transition path for a more principled syntax, I would say either this is too late, or we must decide now to change the syntax and maybe break some existing programs, because otherwise it will be too late...

If I would give one argument for my approach, I would say that while it can be in some situations lightly more verbose, it is in general pretty light-weight, thanks to the inference-based approach.

And why not simply improving the error message by printing (for instance):

Error: This expression has type (type a) a t but an expression was expected of type
(type a) a

This would explicitly tell the user that there is some existential around. After all, my surprise essentially came from the strange format of the variable name and the fact that its abstract nature was not explicit in the message.

I'm not sure "(type a) a" more clearly indicates an existential type than "ex#23", and besides I'm not sure it's meaningful to close on all existentials (that may also be involved in some other part of the context) when printing the type.

However, I think the error messages of the form "This expression has type <foo> but an expression was expected of type <foo>", with the same type printed at both places, are always harmful for non-experts and could be systematically skipped (or turned into a less confusing "There is a typing conflict at type <foo>", not very informative but also prints the type). I don't know whether that would be easy to implement, though.

the type system just isn't able to deduce this equality. But it could, if it
checked whether 'a appeared elsewhere in the type and then safely removed it.
This check is very similar to checking that the existential doesn't escape in:

The difference between the two is quite important; the existential is opened in one case by the function and in the other case by the caller. Since we are talking about naming the existential, this distinction is rather important.

first defines a rigid variable (a), and that the patterns (Ext(x : a)) first introduces an existential variable ex#N, and then upon processing the pattern (x : a) adds the local equation (a = ex#N) in the branch of the pattern clause. We are not naming (a) the actual existential variable, but working in a context where both are equivalent, just as

let cast (type a) (Refl : (a, int) eq) = fun (x : a) -> (x : int)

works in a local context enriched with the equation (a = int).

Currently such GADT equations are only added by an annotation on the type of the whole GADT value. This does not allow to name existentials as they precisely don't appear in the outer GADT type.

It seems this is a different mechanism from the one Jacques described for (type a . Ext (x : a)). Similar subtleties/hacks would probably be needed to get error messages to mention (a) rather than the existential names.

PS: I'm copying below the message from Oleg on caml-list about error messages for GADTs, which could be an interesting inspiration for useful error messages orthogonally to the naming of existentials discussed here.

> Just in case, here is the error message produced by GHC for the
> equivalent example.
>
> Code:
>
> {-# LANGUAGE GADTs #-}
>
> data E a where
> B :: Bool -> E Bool
> A :: E (a -> b) -> E a -> E b
>
> eval :: E a -> a
> eval (B x) = x
> eval (A f x) = (eval f) x
>
> The error message:
>
> /tmp/ga.hs:9:25:
> Couldn't match type `a1' with `E a1'
> `a1' is a rigid type variable bound by
> a pattern with constructor
> A :: forall b a. E (a -> b) -> E a -> E b,
> in an equation for `eval'
> at /tmp/ga.hs:9:7
> In the second argument of `eval', namely `x'
> In the expression: (eval f) x
> In an equation for `eval': eval (A f x) = (eval f) x
>
> The gist is using the type variable name 'a' from the definition of
> GADT, and attaching the numeric suffix to disambiguate. (GHC does that all
> the time -- and I believe OCaml does something similar in many cases,
> only not for GADTs.) What helps in particular I believe is quoting the
> clause from the GADT declaration, so we can see where the name 'a'
> comes from. So, perhaps if the error message can be improved with
> giving more information about the error, the problem will be solved
> without additional syntax?

I had dropped this discussion hoping that a natural solution would arise,
but unfortunately a feature that was expected to be in 4.01 will not even
be in 4.02.

Looking at the proposals so far, they all seem to be incoherent with the
current constructs (using a type construction in a pattern context, missing
notion of parameter, or incompatible scoping rules).

I've been discussing that with Jeremy Yallop, and we think we have a potential
solution:
(Ext (x : type a. a))
with the scope of "type a" being everything to its right.

The syntax is nice, and the meaning seems clear.
And since we are on the right of x, there is no longer the problem of
rebinding the type.
There are however a few difficulties
1) originally, in
let f : type a. typ = expr1 in expr2
the scope of "type a" is expr1, whereas the pattern-matching equivalent
of the above would be
match expr1 with (f : type a. typ) -> expr2
which gives a different scope.
Note however that the meaning of "let f : typ = expr" is "let f = (expr : typ)",
not "let (f : typ) = expr", so there is no formal ambiguity.
2) The original "type a. typ" is defined as syntactic sugar, and forces the
pattern to have a really polymorphic type. Here this is not the case since
we want to match an existential type. This could be solved by rather
using the syntax
(Ext (x (type a) : a)
This is ok too, since (type a) is to the right of x.
However, this is less readable, and this does not solve point (1),
i.e. there is the same potential confusion with "let f (type a) = expr1 in expr2".
3) There is some interaction between these abstract types appearing inside
patterns and type propagation. Consider this pattern:
(... (Ext (x : type a. a)) ... : a t)
Independently of whether it will be typable or not (in many cases it wouldn't,
since the type would leak), the scoping rules say that the "a" in "a t" is the type
of x. For this reason, we need an extra traversal, to check which names are,
and if a type annotation contains names bound in a subpattern, we must delay
building the type to after typing the subpattern.

> As I explained in my previous answer
> (type a) (Ext (x : a))
> is incorrect from a scoping point of view: it tries to bind an
> existential that does not exist yet.

Only if you think in terms of closed existential types. If we consider these to be open existential types (see "Modeling Abstract Types in Modules with Open Existential Types" by Montagu and Rémy) then:

let (type a) (Ext (x : a)) = ext in ...

is simply

ν a. let (Ext y) = ext in let x = open(a) y in ...

So `(type a)` is reduction and `(Ext (x : a))` is opening in the terminology of Montagu et al.

Actually, the binder of the existential is the constructor, not the variable, so
Ext((type a) x : a)
would be ok too.

However, the syntax for type annotations in patterns is devilish. That's what
you would have to write for Abs:
Abs (((type a) f : a -> _), x)
Ugly isn't it?

I'm starting to think that Gabriel's idea might end up being the cleanest.
Namely, the main drawback was when there are several existential variables.
But in most cases, there is only one:
Abs (type a) (f,x)
is non-ambiguous.
When there are several variables, as suggested before, we can use a syntax
to give explicitly the existentials in the declaration.
Also, contrary to what I though first, this approach makes it easier to
handle private rows, without complex hacks involved.

In the workaround code, you first pattern match on D, which introduces the existential variable in the environment, before it is matched against (type b).
In the proposal version, type b is introduced before the pattern matching, so the original scoping rules do not allow to redefine an already introduced abstract type.
I do not see how we can reconciliate the two different scoping approaches, short of introducing a new keyword.
Otherwise, (type b) has to appear after D for coherence.

Re. explicit ordering of existential variables in GADT constructor declarations, we may use either syntaxes:

(1a) Prod : 'b 'c . 'b expr * 'c expr -> ('b * 'c) expr

and

(1b) Prod : type b c . b expr * c expr -> (b * c) expr

And conversely in pattern position,

(2a) Prod b c (e1, e2) -> ...

or (if we decide that we really want the (type a b ...) syntax)

(2b) Prod (type b c) (e1, e2) -> ...

I personally like (2b) best (it would also be the least confusing for users, in the sense that someone that doesn't know about existential type would understand that the (type b c) part can be ignored in (2b), and be confused about (2a)), but for constructor declaration I have no preference between (1a) and (1b) ((1a) is the more incremental choice though), and I think we might want to allow both.

> Currently (type a) both introduces a new type and binds it (implicitly) at the point it is introduced.
> So this would clearly be a change in semantics.

Yes, but it seems a reasonable one, and I think it is more important to avoid adding yet another type variable syntax.

The semantics seem simple enough; a newtype is either used in a (single) existential pattern which binds it, or it may escape its scope in which case it is bound when it is introduced. If someone tries to do both they get an escaping scope error, which is what they would get even if they didn't name the existential.

In a way it is not even a change in semantics. If you consider

let f (type a) x = x in ...

to introduce a type `a` and not bind it, and

let f (type a) (x : a) = x in ...

to introduce a type `a` and bind it to the type of the first argument then we already have a semantics where `(type a)` introduces a type and `(... : a)` binds it.

Personally I don't particularly like adding a form of explicit type argument, which is what this proposal amounts to. I also dislike giving the ordering of type variables a meaning. Does this means that

I agree that the ordering-meaningfulness is slightly unpleasant, but you can always annotate the value arguments of the pattern with the bound names, so that ordering mistakes are immediately caught.

The fact that this annotation is not mandatory means that this syntax is in general lighter.

Finally, you can also support or-patterns (but in a more verbose way):
let f (A (type a) (x, g) | B (type a) (x, g)) = g x
The usual scoping requirement that both sides must define the same variables applies to type "variables" as well.

Concerning the syntax for explicit arguments, (1a) (2b) are clearly the least surprising ones, and it should always be our goal.
Note by the way that it should logically be
Prod (type a) (type b) (x,y)
the eventual (type a b) being only a shortcut.

I know that it is not like Coq, but in ocaml type variables
can be kind of hidden (row variables for instance), and having
a way to say explicitly what is existential and what is not would
be helpful, aside of the fact that having to write all variables when
only few of them are existential would be a pain.

Having a look at CamlinternalFormatBasics is good to give you an
idea of what concrete GADT constructors can be.
There are indeed a number of cases with either one or two existentials
in the definition of fmt. Note that in each case there are also at least
6 non-existential type variables, and in some cases up to 12.
So requiring to write the non-existential variables looks like a real burden.

Of course we must be future proof, so it is worth thinking about potential
situations where we would need to be explicit about non-existential
variables. I see none, but it may just be a lack of insight.

I kind of agree with you in principle.
However, concrete examples look more like
Abs (type a) ((f : a -> b),(x : b))
vs
Abs (type a) (f, x)
Note that you need both annotations, because we cannot propagate the expected type.
I wish we could at least omit the inner parentheses like in SML.
Of course, you would still be free to write the extra annotations (this is precisely the goal of introducing names).

Also, at some point it might be a good thing to recognize the specificity of existential type variables.

Even if we do not propagate the expected type, surely we only need to set `a` to be equal to `ex#1`? Can't this be inferred from a single annotation, similar to how GADT matching adds equations?

Assuming we do need both annotations, I still think it is worth being explicit. In this case I think that clarity is more important than concision, especially since the syntax will probably not be used very often.

> I wish we could at least omit the inner parentheses like in SML.

That would indeed be nice. Is there a grammar conflict, or could it be allowed?

> Even if we do not propagate the expected type, surely we only need to set `a` to be equal to `ex#1`?
> Can't this be inferred from a single annotation, similar to how GADT matching adds equations?

The problem is that the semantics of locally abstract types is that they hide their original definition, rather than being equal to it.
If you look at how Pexp_newtype is typed, you will see that a fresh abstract type is introduced in the environment, then the body is typed without any expected type, and ultimately the internal and external types are made equal.
The reason it works so well is because type inference tells us all occurrences of the abstract type, so it is then sufficient to connect the two.
However doing that during pattern matching would be much more complex, since there is not only the body to type, but also the remainder of the patterns.
So the only solution I see for now would be to be fully explicit, otherwise you end up risking to mix external and internal types in the internal context, where they are incompatible.

The advantage of the positional notation is that you are no longer introducing a locally abstract type hiding something, but just naming the existential variable.

> Assuming we do need both annotations, I still think it is worth being explicit. In this case I think that clarity is more important than concision, especially since the syntax will probably not be used very often.

To me this just sounds like saying that you should annotate all your functions with their complete types.
Sure this may help catch some type errors earlier.
But it doesn't make your type system any safer (at least in this case).
And debugging types also works by adding the annotations later: once you get a type error, you may start annotating, and the more you annotate the surer you get. The fact that introducing annotations is cheap is an advantage.

> The problem is that the semantics of locally abstract types is that they hide their original definition, rather than being equal to it.
...
> However doing that during pattern matching would be much more complex, since there is not only the body to type, but also the remainder of the patterns.

I can see how that would make connecting the inner and outer types during pattern matching difficult. However, I was rather suggesting connecting the inner type of `a` with the inner type of `ex#0` by adding a GADT equation (as in `add_gadt_equation` in ctype.ml).

> To me this just sounds like saying that you should annotate all your functions with their complete types.
> Sure this may help catch some type errors earlier.
> But it doesn't make your type system any safer (at least in this case).

I don't think this is the same thing. Annotating your function types is about adding clarity to a particular piece of code by being specific about what type it has. I'm talking about adding consistency to a piece of syntax by having `(type a)` mean the same thing in all contexts. With the positional approach, in some contexts it introduces a type and in others it binds a type. Sharing a syntax for these different operations is confusing.

> I can see how that would make connecting the inner and outer types during pattern matching difficult.
> However, I was rather suggesting connecting the inner type of `a` with the inner type of `ex#0` by adding a GADT equation (as in `add_gadt_equation` in ctype.ml).

This amounts to changing the semantics. add_gadt_equation is only used for equations enforced by the GADT itself. This is not the case here.

> I'm talking about adding consistency to a piece of syntax by having `(type a)` mean the same thing in all contexts.

It does. This is why I am so careful about the positioning of (type a).
The positional approach makes clear which existential variable is related to "a".
Then you can perfectly choose to follow the hiding semantics, since all occurrences are "hidden" together.

My point is that, to completely keep the same semantics, you need either to use a positional approach, or to be extremely verbose (try annotating camlinternalFormatBasics for fun...)

> Abs (type a) (f, x) looks like an unused newtype declaration.

Until you learn that (type a) immediately after a non-constant constructor in a pattern is a binding occurrence.
The important part is that this behavior refines the usual rule, without introducing a contradiction.

My proposal of making explicit only existentials in the declaration might be more controversial. My justification is that distinguishing existentials will actually help the user, even if this is superficially incoherent with other uses of the universal notation.

>> I can see how that would make connecting the inner and outer types during pattern matching difficult.
>> However, I was rather suggesting connecting the inner type of `a` with the inner type of `ex#0` by adding a GADT equation (as in `add_gadt_equation` in ctype.ml).

> This amounts to changing the semantics. add_gadt_equation is only used for equations enforced by the GADT itself. This is not the case here.

I'm only suggesting it as a means of type-checking `Abs (type a) (f, (x : a))` without having to annotate all uses of `a`.

From this perspective, there is no observational difference between adding a GADT equation and making `ex#0` the external type of `a`, because any use of `a` which escapes the scope of the equation will result in an error anyway.

> Until you learn that (type a) immediately after a non-constant constructor in a pattern is a binding occurrence.

But that's my point, it is one more thing users must know in order to read OCaml.