Among HM's more notable properties is completeness and its ability to deduce the most general type of a given program without the need of any type annotations or other hints supplied by the programmer. Algorithm W is a fast algorithm, performing type inference in almost linear time with respect to the size of the source, making it practically usable to type large programs.[note 1] HM is preferably used for functional languages. It was first implemented as part of the type system of the programming language ML. Since then, HM has been extended in various ways, most notably by constrained types as used in Haskell.

Organizing their original paper, Damas and Milner[4] clearly separated two very different tasks. One is to describe what types an expression can have and another to present an algorithm actually computing a type. Keeping both aspects apart from each other allows one to focus separately on the logic (i.e. meaning) behind the algorithm, as well as to establish a benchmark for the algorithm's properties.

How expressions and types fit to each other is described by means of a deductive system. Like any proof system, it allows different ways to come to a conclusion and since one and the same expression arguably might have different types, dissimilar conclusions about an expression are possible. Contrary to this, the type inference method itself (Algorithm W) is defined as a deterministic step-by-step procedure, leaving no choice what to do next. Thus clearly, decisions not present in the logic might have been made constructing the algorithm, which demand a closer look and justifications but would perhaps remain non-obvious without the above differentiation.

Logic and algorithm share the notions of "expression" and "type", whose form is made precise by the syntax.

The expressions to be typed are exactly those of the lambda calculus, enhanced by a let-expression. These are shown in the table to the right. For readers unfamiliar with the lambda calculus, here is a brief explanation: The application represents applying the function to the argument , often written . The abstraction represents an anonymous function that maps the input to the output . This is also called function literal, common in most contemporary programming languages, and sometimes written as . The let expression represents the result of substituting every occurrence of in with .

Types as a whole are split into two groups, called mono- and polytypes.[note 2]

Monotypes are syntactically represented as terms. A monotype always designates a particular type, in the sense that it is equal only to itself and different from all others.

Examples of monotypes include type constants like or , and parametric types like . These types are examples of applications of type functions, for example, from the set , where the superscript indicates the number of type parameters. The complete set of type functions is arbitrary in HM, except that it must contain at least , the type of functions. It is often written in infix notation for convenience. For example, a function mapping integers to strings has type . [note 3]

Type variables are monotypes. Standing alone, a type variable is meant to be as concrete as or , and clearly different from both. Type variables occurring as monotypes behave as if they were type constants whose identity is unknown. Correspondingly, a function typed only maps values of the particular type on itself. Such a function can only be applied to values having type and to no others.

A function with polytype can map any value of the same type to itself, and the identity function is a value for this type. As another example is the type of a function mapping all finite sets to integers. The count of members is a value for this type. Note that quantifiers can only appear top level, i.e. a type for instance, is excluded by the syntax of types and that monotypes are included in the polytypes, thus a type has the general form .

In a type , the symbol is the quantifier binding the type variables in the monotype . The variables are called quantified and any occurrence of a quantified type variable in is called bound and all unbound type variables in are called free. Like in the lambda calculus, the notion of free and bound variables is essential for the understanding of the meaning of types.

This is certainly the hardest part of HM, perhaps because polytypes containing free variables are not represented in programming languages like Haskell. Likewise, one does not have clauses with free variables in Prolog. In particular developers experienced with both languages and actually knowing all the prerequisites of HM, are likely to slip this point. In Haskell for example, all type variables implicitly occur quantified, i.e. a Haskell type a -> a means here. Because a type like , though it may practically occur in a Haskell program, cannot be expressed there, it can easily be confused with its quantified version.

So what function can have a type like e.g. , i.e. a mixture of both bound and free type variables and what could the free type variable therein mean?

Example 1

Consider in Example 1, with type annotations in brackets. Its parameter is not used in the body, but the variable bound in the outer context of surely is. As a consequence, accepts every value as argument, while returning a value bound outside and with it its type. to the contrary has type , in which all occurring type variables are bound. Evaluating, for instance , results in a function of type , perfectly reflecting that foo's monotype in has been refined by this call.

In this example, the free monotype variable in foo's type becomes meaningful by being quantified in the outer scope, namely in bar's type. I.e. in context of the example, the same type variable appears both bound and free in different types. As a consequence, a free type variable cannot be interpreted better than stating it is a monotype without knowing the context. Turning the statement around, in general, a typing is not meaningful without a context.

Consequently, to get the yet disjoint parts of the syntax, expressions and types together meaningfully, a third part, the context is needed. Syntactically, it is a list of pairs , called assignments or assumptions, stating for each value variable therein a type . All three parts combined gives a typing judgment of the form , stating, that under assumptions , the expression has type .

Now having the complete syntax at hand, one can finally make a meaningful statement about the type of in example 1, above, namely . Contrary to the above formulations, the monotype variable no longer appears unbound, i.e. meaningless, but bound in the context as the type of the value variable . The circumstance whether a type variable is bound or free in the context apparently plays a significant role for a type as part of a typing, so it is made precise in the side box.

While the equality of monotypes is purely syntactical, polytypes offer a richer structure by being related to other types through a specialization relation expressing that is more special than .

When being applied to a value a polymorphic function has to change its shape specializing to deal with this particular type of values. During this process, it also changes its type to match that of the parameter. If for instance the identity function having type is to be applied on a number having type , both simply cannot work together, because all the types are different and nothing fits. What is needed is a function of type . Thus, during application, the polymorphic identity is specialized to a monomorphic version of itself. In terms of the specialization relation, one writes

Now the shape shifting of polymorphic values is not fully arbitrary but rather limited by their pristine polytype. Following what has happened in the example one could paraphrase the rule of specialization, saying, a polymorphic type is specialized by consistently replacing each occurrence of in and dropping the quantifier. While this rule works well for any monotype used as replacement, it fails when a polytype, say is tried as a replacement, resulting in the non-syntactical type . But not only that. Even if a type with nested quantified types would be allowed in the syntax, the result of the substitution would not longer preserve the property of the pristine type, in which both the parameter and the result of the function have the same type, which are now only seemingly equal because both subtypes became independent from each other allowing to specialize the parameter and the result with different types resulting in, e.g. , hardly the right task for an identity function.

The syntactic restriction to allow quantification only top-level is imposed to prevent generalization while specializing. Instead of , the more special type must be produced in this case.

One could undo the former specialization by specializing on some value of type again. In terms of the relation one gains as a summary, meaning that syntactically different polytypes are equal with respect to renaming their quantified variables.

Specialization Rule

Now focusing only on the question whether a type is more special than another and no longer what the specialized type is used for, one could summarize the specialization as in the box above. Paraphrasing it clockwise, a type is specialized by consistently replacing any of the quantified variables by arbitrary monotypes gaining a monotype . Finally, type variables in not occurring free in the pristine type can optionally be quantified.

Thus the specialization rules makes sure that no free variable, i.e. monotype in the pristine type becomes unintentionally bound by a quantifier, but originally quantified variable can be replaced with whatever, even with types introducing new quantified or unquantified type variables.

Starting with a polytype , the specialization could either replace the body by another quantified variable, actually a rename or by some type constant (including the function type) which may or may not have parameters filled either with monotypes or quantified type variables. Once a quantified variable is replaced by a type application, this specialization cannot be undone through another substitution as it was possible for quantified variables. Thus the type application is there to stay. Only if it contains another quantified type variable, the specialization could continue further replacing for it.

So the specialization introduces no further equivalence on polytype beside the already known renaming. Polytypes are syntactically equal up to renaming their quantified variables. The equality of types is a reflexive, antisymmetric and transitive relation and the remaining specializations of polytypes are transitive and with this the relation is an order.

The syntax of HM is carried forward to the syntax of the inference rules that form the body of the formal system, by using the typings as judgments. Each of the rules define what conclusion could be drawn from what premises. Additionally to the judgments, some extra conditions introduced above might be used as premises, too.

A proof using the rules is a sequence of judgments such that all premises are listed before a conclusion. Please see the Examples 2 and 3 below for a possible format of proofs. From left to right, each line shows the conclusion, the of the rule applied and the premises, either by referring to an earlier line (number) if the premise is a judgment or by making the predicate explicit.

The side box shows the deduction rules of the HM type system. One can roughly divide them into two groups:

The first four rules (variable or function access), (application, i.e. function call with one parameter), (abstraction, i.e. function declaration) and (variable declaration) are centered around the syntax, presenting one rule for each of the expression forms. Their meaning is pretty obvious at the first glance, as they decompose each expression, prove their sub-expressions and finally combine the individual types found in the premises to the type in the conclusion.

The second group is formed by the remaining two rules and . They handle specialization and generalization of types. While the rule should be clear from the section on specialization above, complements the former, working in the opposite direction. It allows generalization, i.e. to quantify monotype variables that are not bound in the context. The necessity of this restriction is introduced in the section on free type variables.

As mentioned in the introduction, the rules allow one to deduce different types for one and the same expression. See for instance, Example 2, steps 1,2 and Example 3, steps 2,3 for three different typings of the same expression. Clearly, the different results are not fully unrelated, but connected by the type order. It is an important property of the rule system and this order that whenever more than one type can be deduced for an expression, among them is (modulo alpha-renaming of the type variables) a unique most general type in the sense, that all others are specialization of it. Though the rule system must allow to derive specialized types, a type inference algorithm should deliver this most general or principal type as its result.

Not visible immediately, the rule set encodes a regulation under which circumstances a type might be generalized or not by a slightly varying use of mono- and polytypes in the rules and .

In rule , the value variable of the parameter of the function is added to the context with a monomorphic type through the premise , while in the rule , the variable enters the environment in polymorphic form . Though in both cases the presence of x in the context prevents the use of the generalisation rule for any monotype variable in the assignment, this regulation forces the parameter x in a -expression to remain monomorphic, while in a let-expression, the variable could already be introduced polymorphic, making specializations possible.

As a consequence of this regulation, no type can be inferred for since the parameter is in a monomorphic position, while yields a type , because has been introduced in a let-expression and is treated polymorphic therefore. Note that this behaviour is in strong contrast to the usual definition and the reason why the let-expression appears in the syntax at all. This distinction is called let-polymorphism or let generalization and is a conception owed to HM.

Now that the deduction system of HM is at hand, one could present an algorithm and validate it with respect to the rules. Alternatively, it might be possible to derive it by taking a closer look on how the rules interact and proof are formed. This is done in the remainder of this article focusing on the possible decisions one can make while proving a typing.

Isolating the points in a proof, where no decision is possible at all, the first group of rules centered around the syntax leaves no choice since to each syntactical rule corresponds a unique typing rule, which determines a part of the proof, while between the conclusion and the premises of these fixed parts chains of and could occur. Such a chain could also exist between the conclusion of the proof and the rule for topmost expression. All proofs must have the so sketched shape.

Because the only choice in a proof with respect of rule selection are the and chains, the form of the proof suggests the question whether it can be made more precise, where these chains might be needed. This is in fact possible and leads to a variant of the rules system with no such rules.

A contemporary treatment of HM uses a purely syntax-directed rule system due to Clement[5] as an intermediate step. In this system, the specialization is located directly after the original rule and merged into it, while the generalization becomes part of the rule. There the generalization is also determined to always produce the most general type by introducing the function , which quantifies all monotype variables not bound in .

Formally, to validate, that this new rule system is equivalent to the original , one has to show that , which falls apart into two sub-proofs:

While consistency can be seen by decomposing the rules and of into proofs in , it is likely visible that is incomplete, as one cannot show in , for instance, but only . An only slightly weaker version of completeness is provable [6] though, namely

implying, one can derive the principal type for an expression in allowing to generalize the proof in the end.

Comparing and note that only monotypes appear in the judgments of all rules, now.

Within the rules themselves, assuming a given expression, one is free to pick the instances for (rule) variables not occurring in this expression. These are the instances for the type variable in the rules. Working towards finding the most general type, this choice can be limited to picking suitable types for in and . The decision of a suitable choice cannot be made locally, but its quality becomes apparent in the premises of , the only rule, in which two different types, namely the function's formal and actual parameter type have to come together as one.

Therefore, the general strategy for finding a proof would be to make the most general assumption () for in and to refine this and the choice to be made in until all side conditions imposed by the rules are finally met. Fortunately, no trial and error is needed, since an effective method is known to compute all the choices, Robinson'sUnification in combination with the so-called Union-Find algorithm.

To briefly summarize the union-find algorithm, given the set of all types in a proof, it allows one to group them together into equivalence classes by means of a procedure and to pick a representative for each such class using a procedure. Emphasizing on the word procedure in the sense of side effect, we're clearly leaving the realm of logic to prepare an effective algorithm. The representative of a is determined such, that if both and are type variables the representative is arbitrarily one of them, while uniting a variable and a term, the term becomes the representative. Assuming an implementation of union-find at hand, one can formulate the unification of two monotypes as follows:

unify(ta,tb):
ta = find(ta)
tb = find(tb)
if both ta,tb are terms of the form D p1..pn with identical D,n then
unify(ta[i],tb[i]) for each corresponding ith parameter
elseif at least one of ta,tb is a type variable then
union(ta,tb)
else
error 'types do not match'

The presentation of Algorithm W as shown in the side box does not only deviate significantly from the original[4] but is also a gross abuse of the notation of logical rules, since it includes side effects. It is legitimized here, for allowing a direct comparison with while expressing an efficient implementation at the same time. The rules now specify a procedure with parameters yielding in the conclusion where the execution of the premises proceeds from left to right. Alternatively to a procedure, it could be viewed as an attributation of the expression.

The procedure specializes the polytype by copying the term and replacing the bound type variables consistently by new monotype variables. '' produces a new monotype variable. Likely, has to copy the type introducing new variables for the quantification to avoid unwanted captures. Overall, the algorithm now proceeds by always making the most general choice leaving the specialization to the unification, which by itself produces the most general result. As noted above, the final result has to be generalized to in the end, to gain the most general type for a given expression.

Because the procedures used in the algorithm have nearly O(1) cost, the overall cost of the algorithm is close to linear in the size of the expression for which a type is to be inferred. This is in strong contrast to many other attempts to derive type inference algorithms, which often came out to be NP-hard, if not undecidable with respect to termination. Thus the HM performs as well as the best fully informed type-checking algorithms can. Type-checking here means that an algorithm does not have to find a proof, but only to validate a given one.

Efficiency is slightly reduced because the binding of type variables in the context has to be maintained to allow computation of and enable an occurs check to prevent the building of recursive types during . An example of such a case is , for which no type can be derived using HM. Practically, types are only small terms and do not build up expanding structures. Thus, in complexity analysis, one can treat comparing them as a constant, retaining O(1) costs.

In the original paper,[4] the algorithm is presented more formally using a substitution style instead of side effects in the method above. In the latter form, the side effect invisibly takes care of all places where a type variable is used. Explicitly using substitutions not only makes the algorithm hard to read[dubious– discuss], because the side effect occurs virtually everywhere, but also gives the false impression that the method might be costly. When implemented using purely functional means or for the purpose of proving the algorithm to be basically equivalent to the deduction system, full explicitness is of course needed and the original formulation a necessary refinement.

A central property of the lambda calculus is, that recursive definitions are non-elemental, but can instead be expressed by a fixed point combinator. The original paper[4] notes that recursion can realized by this combinator's type . A possible recursive definitions could thus be formulated as .

Alternatively an extension of the expression syntax and an extra typing rule is possible as:

where

basically merging and while including the recursively defined variables in monotype positions where they occur left to the but as polytypes right to it. This formulation perhaps best summarizes the essence of let-polymorphism.

^Hindley–Milner is DEXPTIME-complete. However, non-linear behaviour only manifests itself on pathological inputs, as such the complexity theoretic proofs by (Mairson 1990) and (Kfoury, Tiuryn & Urzyczyn 1990) came as a surprise to the research community. When the depth of nested let-bindings is bounded—as is the case in realistic programs—Hindley–Milner type inference becomes polynomial.

^The parametric types were not present in the original paper on HM and are not needed to present the method. None of the inference rules below will take care or even note them. The same holds for the non-parametric "primitive types" in said paper. All the machinery for polymorphic type inference can be defined without them. They have been included here for sake of examples but also because the nature of HM is all about parametric types. This comes from the function type , hard-wired in the inference rules, below, which already has two parameters and has been presented here as only a special case.