Hilton and Mathematical Probability

In 1958 Ordway Hilton participated in Session #5 of the RCMP Seminar Series. His article was originally published in that series by the RCMP, and subsequently republished in 1995 in the International Journal of Forensic Document Examiners.1

The later republication included the following abstract:

In every handwriting identification we are dealing with the theory of probability. If an opinion is reached that two writings are by the same person, we are saying in effect that with the identification factors considered the likelihood of two different writers having this combination of writing characteristics in common is so remote that for all practical purposes it can be disregarded. Such an opinion is derived from our experience and is made without formal reference to any mathematical measure. However, the mathematician provides us with a means by which the likelihood of chance duplication can be measured. It is the purpose of this paper to explore the possibility of applying such mathematical measure to the handwriting identification problem to see how we might quantitatively measure the likelihood of chance duplication.

Hilton’s article was written in 8 main sections with references, and is followed by a discussion between seminar participants. Today’s review will discuss each section of the article in turn.
Before delving into the article a few general observations are in order.

First, this article is important because Hilton made a serious attempt to address the application of probability and statistics for FDE work. It is a key article by a popular and respected author that has been referenced by many others including Huber and Headrick in their textbook, Handwriting Identification — Facts and Fundamentals.2 Examiners should definitely be familiar with the article and what Hilton has to say.

Second, the title mentions ‘mathematical probability’ in relation to handwriting examination (specifically, the Handwriting Identification Problem). Prefacing the word ‘probability’ with ‘mathematical’ implies there are different types of probability. Such distinctions can be made but, in reality, the distinction is misplaced and inaccurate. Many people hold the firm, yet mistaken, idea that probability is somehow different when applied in everyday life. They believe that the colloquial use of the term differs significantly from its more ‘formal’ use in mathematics or statistics. This point will be discussed at some length but it’s very important to understand from the start that probability is the same no matter how or when the term is used or applied. In all its guises probability is a useful tool for discussion about, and exploration of, uncertainty that exists in information being used or referenced. More important, at all times and for all uses, the same rules and limits apply — no matter when, where or how probability is used or invoked.

Third, although it may not be entirely clear, Hilton’s article stands as one of the earlier writings in the QD realm to touch on concepts relating to Bayes Theorem as it might apply to our work.3 Mind you, the theorem is not mentioned directly, nor are critical aspects of it covered such as conditional probabilities. Indeed, Hilton’s understanding of probabilistic logic is a bit misdirected and incomplete, like many other authors at that time and since. This is pretty much as one would expect given that the article was written in 1958.

Section 1 — Basic Mathematics

In this section Hilton provides a limited review of some very basic probability theory with the goal, I believe, of providing his audience with key concepts upon which one can address the problem of handwriting identification from a probabilistic point-of-view. That was a good idea since some basic knowledge/theory is necessary for a deeper discussion of how such concepts might apply to handwriting identification.4

Some of the math presented by Hilton may derive from Souder or Olkin but, unfortunately, I do not have direct access to their works or comments. That makes any assessment of the situation a bit more challenging but not impossible.

The basic math discussed in this section is correct. The author provides equations that correspond to a very traditional frequentist approach to probability. Although the terminology used is not overly precise, various key concepts are discussed or defined including success/failure (which I prefer to talk about in terms of being mutually-exclusive and exhaustive events), joint probability (confined to a specific special case rather than the more general form) and, of course, independence.

Hilton explains joint probability as follows:

if we have several events, each with their probability of occurrence as $p^{1}$, $p^{2}$, $p^{3}$, …up to $p^{n}$ and each event is independent of every other, then the probability of all events happening simultaneously is the product of their individual probability. $$ \left( P=\prod^{n}_{i=1} p^{i}\right) $$

This is confusing because the written description is correct yet the equation that follows is not. It is unclear to me why Hilton chose to use superscripts in this manner but it goes against standard convention.

Mathematicians will interpret the term $p^{2}$ as meaning p-squared (or $p \cdot p$), $p^{3}$ as p-cubed (or $p \cdot p \cdot p$), and so on. To avoid misinterpretation, the more correct form would be this: $$ P=\prod^{n}_{i=1} p_{i} $$ An additional source of confusion comes from the fact that $P$ happens to equal $p^{n}$ (in the power sense) whenever every event has the same probability, $p$, which is what Hilton does later in his discourse. So long as all $p_{i}$ have the same value: $$ P=\prod^{n}_{i=1} p_{i} = p^{n} $$ But if one were to apply Hilton’s original equation while treating $p^{i}$ as a power function, the result would be $P=p^{1} \cdot p^{2} \cdot p^{3} \ldots p^{(n-1)} \cdot p^{n} = p^{n(n+1)/2}$, a probability value far smaller than the correct one, which would hugely exaggerate the rarity of the combination. This is completely incorrect but I suspect it may have been a simple transcription error, likely occurring in the typesetting. I base this belief on the fact that Hilton uses $p^{n}$ correctly several times later on in the text.
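A quick numerical check makes the difference between the two readings concrete. The sketch below is only an illustration: the values $n = 10$ and $p = 0.5$ and the function names are assumptions chosen for the example, not figures taken from this part of Hilton's text.

```python
import math


def joint_independent(probs):
    """Product of the individual probabilities p_1 ... p_n (independent events)."""
    return math.prod(probs)


def joint_superscripts_as_powers(p, n):
    """Literal reading of p^1 * p^2 * ... * p^n with the superscripts as exponents."""
    return math.prod(p ** i for i in range(1, n + 1))


# Hypothetical values, chosen only for illustration.
p, n = 0.5, 10

intended = joint_independent([p] * n)          # same as p ** n
misread = joint_superscripts_as_powers(p, n)   # same as p ** (n * (n + 1) // 2)

print(f"intended product : {intended:.3e}")
print(f"power misreading : {misread:.3e}")
```

For any $p$ strictly between 0 and 1 the power reading gives $p^{n(n+1)/2}$, which is smaller than $p^{n}$, so the misreading would make a coincidence look far rarer than the intended formula does.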

In any event Hilton noted that the basic equation only applies in the situation where two, quite specific, conditions are met:

equal likelihood of events which “simply means that each occurrence of an event must be just as apt to happen as any other”, and

“mutual independence of successive events in the determination of the probability of several events happening simultaneously.”

This is fine but it is important to realise it represents a limited case of a more general formulation where one or more of these conditions is not met.

Hilton then observes:

We find in handwriting problems that this condition [ie., independence] must be considered with extreme care or else our probability factors will have little meaning. The error caused by ignoring this condition of mutual independence can lead to a probability determination which appears much rarer than it actually is.

Most examiners would agree with this warning. Independence of events can be hard to assess at the best of times and should never be assumed when discussing handwriting. That much is true. Given that dependencies are known or can be expected with this type of problem, the real issue becomes how can one properly address those dependencies? Hilton does not present any useful way to address the situation when independence is suspect or, more critically, when dependencies are known to exist.

Nonetheless, and notwithstanding that failing, Hilton’s attempt to introduce some very rudimentary concepts was okay. In light of the limited understanding of such things in the QD community this approach is useful as a precursor to a fuller discussion of probability. Overall, there is nothing terribly wrong with this section of the article. It is just very ‘limited’.

The limited nature of the discussion may have been unavoidable. However, in the end, things become misleading and confusing as the author valiantly attempts to extend some basic concepts beyond their limits. The article does not explore key aspects of probability that make it possible for us to apply probability effectively in our work. That is, perhaps, understandable. The end-result is a bit of a kludge that misleads the reader as to the way that probability can and should be used.

A critical concept missing from the discussion is that of uncertainty.5 Everyone knows what ‘uncertainty’ means in the colloquial sense. It is the “state of being uncertain” where ‘uncertain’ means “not able to be relied on; not known or definite”. That particular definition, however, is a bit extreme in that it permits of no gradations. A better one would invoke the idea of a continuum of uncertainty. Something like “a situation which involves imperfect and/or unknown information”. In the forensic realm all information is imperfect or unknown to some degree.

“Probability is a branch of mathematics which aims to conceptualize uncertainty and render it tractable to decision-making.”6 The reason ‘probability’ is used in our work is to address issues of uncertainty. Hilton clearly recognized the value and need for probability in QD work but he did not discuss it this way.

Uncertainty is omnipresent — both in the information we have available to us and in the methods we use to assess that information. Hence, it must be a factor in any outcome resulting from those methods. Hilton made a wonderful observation that “When the examiner states that in his opinion two writings were written by the same person he has actually applied the theory of probability (without mathematics).” In other words, the examiner deals with uncertainty by using probability. This is true because all forms of human reasoning apply probability theory to some degree and in some manner. But it doesn’t go quite far enough.

Hilton’s parenthetic comment about using probability “(without mathematics)” seems almost dismissive even though the concept of probability is addressed throughout the paper as though it were a mathematical concept at its core. It is important to realise just how true this is. The key to understanding is the realization that probability functions the same whether we think of it in terms of mathematics, statistics, logic, or just in the colloquial sense. Quite simply, dismissing any use of it as something ‘other than mathematical’ is wrong. If probability is to be used in our work it must conform to the rules of probability and logic which are very well-defined.7

Probability is an essential concept. In the absence of probability there is no way to discuss and explore uncertainty. It is the tool that lets us work with uncertainty using clear and unequivocal rules and procedures.

Hilton’s view of probability, while essentially correct, is very basic and limited. The aspects we need to utilize exist in the realm of probability theory which is “the extension of logic to uncertain events.”8 In particular, we need to understand conditional probabilities and probabilistic logic. Given the time period in which the article was written I can understand why this was not done. Still, I am surprised that Hilton did not address the issue of conditionality more fully given that he consulted with some well-qualified statisticians/mathematicians who should have understood these issues.9

Conditionality is closely related to the concept of independence and has an impact on everything we do. The distinction and relationship between joint and conditional probabilities is critical but, unfortunately, it was overlooked.

As discussed earlier the joint probability equation given in the article, $P=\prod_{i=1}^{n}p^{i}$, should be written as either $P=p^{n}$ or $P=\prod_{i=1}^{n}p_{i}$ with the latter being a more general form that permits different values of $p$ for each event.

Now, let’s consider the more general correct form: $$ P \left(\bigcap X_{i}\right)=\prod^{n}_{i=1}p_{i} $$ This means that the joint probability of $n$ independent events, written $P\left( \bigcap X_{i}\right)$, can be obtained by multiplying together the probability of each single event; that is, $p_{i}$. If we assign the same probability to each event then it certainly simplifies the math involved even though it is not a requirement. But, if that condition does not hold, then we need to know the probability for each of them.

That addresses the first of the two limitations noted above. However, this equation still applies only when all of the events are independent.10 And we already know that is not a valid assumption for handwriting. In order to deal with dependencies we need to introduce the idea of conditional probability, written as $P(A|B)$ or the probability of $A$ given $B$ (for any two events $A$ and $B$).

It turns out that joint and conditional probabilities are related to one another. For two events, $A$ and $B$, the situation is pretty straight-forward. The equation describing the relationship is as follows: $$P(A|B)=\frac{P(A \cap B)}{P(B)}$$ After some re-arranging, we get: $$P \left(A \cap B\right)= P(A|B) \cdot P(B)$$ This formula is quite similar to our earlier one. The difference is that $P(A)$ has been replaced by $P(A|B)$.

This tells us how conditionality affects the final joint probability when we’re dealing with dependent events. If the events are independent the joint probability is simply $P(A) \cdot P(B)$ but, if they are not we use $P(A|B) \cdot P(B)$. Very straight-forward for just 2 events. However, it doesn’t remain simple when additional events are added to the mix.
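A small numerical sketch shows how much the choice matters. The probabilities below are hypothetical values invented for illustration (two related letterform habits, each seen in 10% of writers, with the second appearing in 80% of writers who show the first); they are not measured frequencies.

```python
def joint_two_events(p_a_given_b, p_b):
    """P(A and B) = P(A|B) * P(B); holds whether or not A and B are independent."""
    return p_a_given_b * p_b


# Hypothetical values, for illustration only.
p_a = 0.10          # marginal probability of feature A
p_b = 0.10          # marginal probability of feature B
p_a_given_b = 0.80  # probability of A among writers who show B

naive = p_a * p_b                                # assumes independence
dependent = joint_two_events(p_a_given_b, p_b)   # respects the dependence

print(f"independence assumed : {naive:.3f}")
print(f"dependence respected : {dependent:.3f}")
```

Under these assumed numbers the independence shortcut gives 0.010 while the conditional form gives 0.080, so ignoring the dependence makes the combination appear eight times rarer than it is, which is the direction of error Hilton warned about.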

If we extend this to ‘n’ events it ends up something like this: $$ P \left(\bigcap X_{i}\right)= P(X_{1}|X_{2},X_{3},\dots,X_{n})\cdot P(X_{2}|X_{3},X_{4},\dots,X_{n})\cdot P(X_{3}|X_{4},X_{5},\dots,X_{n}) \cdots P(X_{n}) $$ Which we can also write using more generic notation as: $$ P \left(\bigcap X_{i}\right)=P(X_{n}) \cdot \prod^{n-1}_{i=1} P(X_{i}|X_{i+1},X_{i+2},\dots,X_{n}) $$ The expression $P(X_{i}|X_{i+1},X_{i+2},\dots,X_{n})$ refers to the probability of each single event, $X_{i}$, conditioned by all of the other events $X_{i+1}$ through to $X_{n}$. By addressing conditionality in this manner each event is made ‘conditionally independent’ of the others (when taken as a longitudinal series of events). We do this for each of the events in turn and multiply the resulting conditional probabilities together. Note that the first term in the second equation, $P(X_{n})$, is the marginal probability of $X_{n}$, which is not a conditional probability, per se.
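The chain of conditional probabilities can be checked on a toy example. The sketch below assumes a hypothetical joint distribution over three binary features (1 = present); the weights, the helper names `prob` and `conditional`, and the choice of three features are all illustrative assumptions rather than real handwriting data.

```python
# Hypothetical joint distribution over three binary features (X1, X2, X3).
# The weights are illustrative assumptions, not measured frequencies.
weights = {
    (1, 1, 1): 6, (1, 1, 0): 2, (1, 0, 1): 2, (1, 0, 0): 2,
    (0, 1, 1): 2, (0, 1, 0): 4, (0, 0, 1): 2, (0, 0, 0): 80,
}
total = sum(weights.values())
joint = {outcome: w / total for outcome, w in weights.items()}


def prob(present):
    """Probability that every feature whose 0-based index is in `present` occurs."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[i] == 1 for i in present))


def conditional(i, given):
    """P(X_i present | all features in `given` present)."""
    return prob({i} | set(given)) / prob(set(given))


# Chain rule: P(X1, X2, X3) = P(X1 | X2, X3) * P(X2 | X3) * P(X3)
chain = conditional(0, [1, 2]) * conditional(1, [2]) * prob({2})
direct = prob({0, 1, 2})
naive = prob({0}) * prob({1}) * prob({2})  # wrongly assumes independence

print(f"chain rule : {chain:.4f}")
print(f"direct     : {direct:.4f}")
print(f"naive      : {naive:.6f}")
```

With these assumed weights the chain rule reproduces the directly computed joint probability (0.06), whereas multiplying the marginals as if the features were independent gives roughly 0.002, about thirty times too small.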

This formula is rather complicated and probably a bit intimidating.

Certainly, as a representation of what an examiner actually does during an examination in the real-world, it leaves something to be desired. I am sure that examiners evaluate writing with dependencies in mind.11 But I doubt very much that any examiner conducts their evaluation using this type of rigorous sequential assessment. Our normal evaluation process involves a very detailed analysis of the writing with assessment of many individual elements or aspects of the writing. But, in the end, the final evaluation is done on more of a ‘gestalt’ basis. That is to say, most examiners do not even attempt to assign specific probability values to individual features or habits. They may well be able to point out features, forms or habits as being ‘usual’, ‘distinctive’ or ‘individual’. But most will also say that no one feature, form or habit served to ‘identify’ a writer (meaning to characterize that writer’s set of habits fully). Examiners assess the overall probability of the writing, as a whole, looking at the similarities and divergences between the sets. Of course, even then, most do not think of the probability of what they have observed in any formal or rigorous manner. My key point is that they do attempt to address the issue of conditionality that exists in writing.

For example, say that a writer displays a relatively ‘unusual’ motor pattern in the way they form the bowl of a written ‘d’ form. A formation observed only rarely would be a full clockwise circle beginning on the left and moving through 270°, followed by a rising staff with a full retrace before exiting to the next letterform. When such a movement is observed in a person’s writing the examiner would ‘expect’ to see similar forms, such as the ‘a’, ‘g’ or even the ‘o’, produced with a similar movement. It is not guaranteed but it is likely because most letterforms with similar graphical appearance invoke the same underlying motor pattern (or habit). In other words, the forms cannot be treated as ‘independent’ when thinking in terms of the underlying motor pattern/habit from which they are generated. Examiners know this and incorporate that knowledge in their assessment of the ‘uniqueness’ of the writing.

Now, at the time Hilton was writing his article there was no easy way to handle this type of conditional probabilistic data. Even today it is not ‘easy’ but we now have very capable (if still complex) options such as Bayes Networks to deal with this effectively and efficiently.

The matter becomes more interesting when we introduce probabilistic logic in the discussion both in the general sense of modifying existing belief and more specifically how our information serves that purpose in a court of law. Hilton touches on this in his section called “Testing The Probability Determination” so I will leave this topic for the discussion of that section.

Section 2 — Historical Cases Involving Probability Measurements

Hilton noted “Any formal computation of a mathematical measure of probability in connection with handwriting identification problems has indeed been rare” and I agree. I do not think anything has really changed in that regard over the years since this article was written.

Two cases are presented: the Sylvia Ann Howland case, which involved a traced signature on a codicil to a will, and a typewriting case, the Risley trial, in which a mathematician testified about the identification of a typewriter. As historical reviews go, Hilton’s comments are sufficiently accurate.

Section 3 — Typical Problems

Extending the commentary from the previous section Hilton goes on to discuss what he calls ‘typical problems’ where “mathematical measures may be applied.” He acknowledges the application involving a tracing but refers to this as a “very limited application, but one in which it would be easiest to establish the various individual probabilities of two consecutive strokes coinciding by simply studying the available genuine signatures. The joint probability ratio could then be calculated by simple multiplication.”

This is a vast over-simplification of what would be involved; however, Hilton is correct in his belief that this would be a relatively easy problem to study. Issues that arise in other handwriting problems would clearly be more complex. Hilton describes the situation as follows:

Measuring the likelihood of accidental coincidence in the general handwriting identification problem is far more complex than under the situations presented in the Howland traced forgery. The difficulty is not so much that of mathematical or arithmetical operations although we shall see that there are some complexities in this field, but rather is that of assembling the basic data from which to calculate the joint probability value. To do this we must study more than one individual’s handwriting. In fact we must know how often each identifying factor occurs among all writers who could have produced the document, for example all writers in the United States and Canada or in New York City depending upon the investigation. Furthermore, not just certain classes of characteristics or qualities, but rather all should be considered.

Once this basic data is available for each identifying factor we can then resort to the multiplication theorem to calculate the likelihood of chance duplication. With handwriting identification, however, there are several considerations which should be analyzed before we can decide whether it would be worthwhile to make the extensive study necessary to establish the basic probability factors.

Overall, I agree with much that is written here. The basic approach is sound if we are considering solely the prospect of a ‘chance match’ or ‘accidental coincidence’ between two writers taken at random from some (hopefully well-defined) population. Things become more problematic when we extend the evaluation to consider other, arguably more important, propositions such as those involving simulation or disguise. This is a facet of the situation people often overlook.12

In addition, the invocation of formal statistics would require some appropriate form of sampling or quantification of data which isn’t discussed by Hilton. However, that is a very complicated topic so, like Hilton, we will ignore that aspect of things.

Section 4 — Special Considerations in the Handwriting Problem

Hilton stated that “three hurdles stand in our way of calculating a measure of the reliability in any handwriting problem”: variation, independence and writing quality. Before moving onto the details I should note this sentence is worthy of attention. It is the only point in the article where Hilton indicates that the process he is considering is one based on “calculating a measure of the reliability”. I don’t disagree with this idea but I wonder what was meant by the word ‘reliability’ as used in this context. Was it reliability in the sense of ‘consistency’, reliability as interpreted in court (as a synonym for validity), or something else entirely? As I have no clear answer I leave this as an open question.

The first special consideration is variation and Hilton observed the following:

Writing variation causes an identification factor to occur with slight modifications in successive specimens. To take the classic example which all non-experts, statistically minded workers cite in connection with a suggested probability determination, the open topped “a”, the question immediately arises with a series of open “a’s” by the same writer just how open this letter is. In some instances the gap will be very small, it may even occasionally be closed. In other instances the opening may be so wide as to make the letter look like a ‘u’. Thus, the variation factor forces us to decide that if an open topped “a” has a certain probability value then does this value undergo any adjustment when the letter is only slightly open or when it is very open. There are many other aspects of handwriting variation and how each is to be handled in probability determinations requires decision. Arbitrary definitions or decisions, probably somewhat personal to each worker, may have to be developed in much the same way that these procedures are developed in presenting the problem of variation to a jury.

These are all good comments and the example, using variation in an ‘a’ form, is perfectly fine. However, there is an issue overlooked in this commentary relating to the metric used to describe the ‘identification factor’.

The manner in which one would address variation in handwriting depends greatly upon the metric used to describe the writing (or the facet of interest within the writing). Suffice it to say that there are many different solutions that might be applied including the one given by Hilton. An approach based on ‘personal’ decisions, though it may seem inadequate or weak, is essentially what examiners do all the time so there is no reason to view it disparagingly. It may prove to be entirely valid and reasonable.13

I am not sure what Hilton meant by the term “predetermined probability factor” but I suspect it refers to some preset value common to all features (in accordance with his ‘condition of equal likelihood’). That requirement is unnecessary; not to mention unrealistic and unattainable (though it may be a valid approach for estimation purposes if done conservatively).

Each event can and will have its own probability value that must be taken into account. Again, this simply serves to make the math a bit more complicated. But that’s all it does. Hilton’s overall comment is correct: “Yet if we use a predetermined probability factor for each and in calculating the joint probability value fail to recognize this interrelation, we arrive at a final probability value which indicates less likelihood of coincidence than is actually the case. Errors in this direction must be eliminated at any cost.”

This is true. Ignoring dependencies in our estimations would result in very inaccurate results.14 However, since dependencies are unavoidable in our work, the real issue is how we should be dealing with those interrelations (dependencies) that exist.

Hilton introduces correlation in the discussion. Correlation methods can serve to measure the degree of dependence between variables. That part of the article is correct. And his comment that “In every attempt to set up probability factors the correlation between data should be tested unless its existence or non-existence is obvious” is reasonable. But, in general, we can be reasonably sure of the outcome of any such testing.

We know that we should expect correlation (i.e., some degree of conditionality) to exist for most aspects of what we do. It might be helpful to test for this if there is any real chance that independence might exist. But for the most part dependence is more likely than not. So the real question is what should be done when it is present? Hilton is understandably silent on that point.

The third special consideration Hilton considers is writing quality. Here he presents issues that arise when dealing with discrete versus continuous data. The main conclusion offered is that the problem is ‘complex’ — which is certainly true enough. But there is fundamentally no difference when it comes to applying this approach for any data type, discrete or continuous in nature. The matter is one of sampling and the conversion of real world information into quantified data. The approach suggested by Hilton is a reasonable one and would work well. But this consideration is not a huge problem in the grand scheme.

Section 5 — Testing the Probability Determination

This part of the article is the most interesting to me. And, in my opinion, it is also most in need of extended discussion. Hilton attempts to describe how one might calculate a joint probability value in practice. This is important because statistical information of any sort is only useful if it can be applied — in this case, to our authorship evaluation process.

The discourse seems to be based mainly on a presentation given by Olkin15 which I have unfortunately been unable to acquire.

Hilton provides a confusing commentary which I will present verbatim with my own comments interspersed as appropriate:

A calculation of a joint probability value under the criteria already discussed needs some appraisal or statistical testing.

I do not know why ‘statistical testing’ was mentioned at all but based on the list of provided references I suspect it may be a result of Hilton’s reading of Mood.16 Still, while it doesn’t add much to the discussion, the idea of applying a test to evaluate the statistic generated in this process seems reasonable enough. Indeed, the likelihood ratio has various applications, one of which is as a test to assess the goodness-of-fit of two models where one of the models (usually the null) is considered to be a special case of the other model (the alternative). In that sense it is a Bayesian equivalent for null-hypothesis testing. However, it should be noted that the likelihood ratio can be used in many ways and that particular one is not very useful to us in our work.

In this type of determination we are dealing with a number of factors, certainly at least 10 or 12 points of identification. Each individual probability factor is expressed as a fraction and the multiplication of 10 such fractions, even though each denominator is small, leads to a reasonably large value for the denomination of the joint value. Let us consider two sample cases. In each we have 10 points of identification. In the first we determine that the probability of the occurrence of each probability factor is ½. Thus, the joint probability value is $(\tfrac{1}{2})^{10}$ or $\tfrac{1}{1024}$. Considering a second problem where the individual probability value of each identification factor is $\tfrac{1}{3}$ we calculate the joint probability to be $(\tfrac{1}{3})^{10}$ or 1/59049. With either problem under case situations in which the number of possible writers is limited to only three or four persons either joint probability value will tend to make a convincing argument to support the identification.

Here Hilton is trying to show how a change in the “individual probability factor” will greatly affect the outcome when calculating the joint probability using the formula provided earlier in the article. Again, I am not sure why he opted for this approach though I suspect it was based on advice from the statisticians. This is one way to approach the matter though I would seriously dispute the actual figures offered as examples. More important, the formula used by Hilton is problematic because it demands certain things of the data; specifically, the probability factor for each feature must be equal and each factor must be independent. Placing these conditions on the data is both unreasonable and untenable. More important, they are definitely not needed to use this type of information effectively.
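The figures in Hilton's two sample cases can be reproduced with exact fractions, and the same multiplication works just as well when the factors are not equal. In the sketch below the function name `joint_probability` and the "mixed" case (five factors of ½ and five of ⅓) are assumptions added purely for illustration; independence is still assumed throughout, as it is in Hilton's formula.

```python
from fractions import Fraction
from math import prod


def joint_probability(factors):
    """Multiply individual probability factors, assuming independence."""
    return prod(factors, start=Fraction(1))


# Hilton's two sample cases: ten factors of 1/2, and ten factors of 1/3.
case_one = joint_probability([Fraction(1, 2)] * 10)
case_two = joint_probability([Fraction(1, 3)] * 10)

# Hypothetical mixed case: the factors need not be equal for the product to work.
mixed = joint_probability([Fraction(1, 2)] * 5 + [Fraction(1, 3)] * 5)

print(case_one)  # 1/1024
print(case_two)  # 1/59049
print(mixed)     # 1/7776
```

The equal-likelihood condition only simplifies the arithmetic to $p^{n}$; dropping it requires nothing more than supplying a separate value for each factor.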

Before leaving these problems let us consider for a moment the possibility of nonidentity. In the first case the probability of nonidentity of each factor is ½ and the joint probability value again is 1/1024. In the second case the individual nonidentity probability value of each factor is 2/3 and the joint value is $(\tfrac{2}{3})^{10}$ or 1024/59049, that is approximately 1/57. Obviously, in the first case the argument for identification or non-identification from a purely mathematical point of view is equal.

The terminology used here is interesting.

Hilton speaks of the ‘probability of non-identity’ and later in the paper (see discussion below) contrasts it to the ‘probability of identity’ in a manner somewhat akin to a proper likelihood ratio.

In fact, when reference is made to Olkin’s paper the presentation shifts to present that concept:

Olkin, in a paper presented at the 1958 Annual Meeting of the American Academy of Forensic Sciences [8], proposed an evaluation of the identification problem in terms of the likelihood ratio statistic [9]. This statistic, λ, is the ratio of the probability of the characteristics under the assumption of identity to the probability under the assumption of non-identity. This ratio can be treated as a measure of the probability of chance coincidence. Mathematically it can be shown that λ varies between 0 and 1 just as does the probability values, and in its interpretation the smaller its value the more positive the identification.

At this point it is important that we delve a bit into the concept of the likelihood ratio that could be applied in our work. As I noted earlier I have not read Olkin’s paper so I have no way to know if he got it wrong in the first place or if it was Hilton who misunderstood what was being said. At any rate, some of what is written here is correct, some not at all.

Hilton described the likelihood ratio as the “ratio of the probability of the characteristics under the assumption of identity to the probability under the assumption of non-identity”. This is perfectly fine and conforms reasonably with the standard odds form for the likelihood ratio. All we have to do is substitute $E$, evidence, for characteristics and $H_{x}$ for each of the propositions and we end up with $LR=\frac{p(E|H_{1})}{p(E|H_{2})}$. This is the familiar odds-form of the LR. I will discuss it more fully later in the post but, basically, the likelihood ratio can be seen to be a ratio of two conditional probabilities; the first conditioned by the main proposition (in this example ‘identity’ of source) and the second conditioned by the alternative proposition (‘nonidentity’ of source). The term ‘under the assumption of’ is fine for the conditional element. The only thing ‘lacking’ in the equation is the acknowledgement of framework information (often $F$ or $I$) but that is done by many authors so it is not a serious oversight.

The idea that “This ratio can be treated as a measure of the probability of chance coincidence” is also acceptable, but incomplete. The issue of ‘chance coincidence’ plays a part in the evaluation; specifically, as a singular possibility under the general alternative proposition of non-identity. This type of wording applies better to evidence like DNA where other alternatives like simulation or tracing are essentially irrelevant. In the case of handwriting the issue of a chance match from a randomly selected individual in the relevant population is relatively insignificant when those other options are in play. Nonetheless, Hilton’s basic statement is not wrong, just incomplete (and potentially misleading).

So where do things go ‘wrong’ in this presentation? Hilton says “Mathematically it can be shown that λ varies between 0 and 1 just as does the probability values”. This is not true and it is easy to see why. λ is a ratio of two probabilities and every probability can vary between 0 and 1. However, when any two probabilities are divided you end up with a value that ranges from 0 to infinity, exclusive. “Exclusive” because anything divided by 0 is undefined and infinity can never be reached.

Hilton then says “in its interpretation the smaller its value the more positive the identification.” This, too, is wrong given his definition of λ and the range it covers. Under the correct interpretation, the ‘balance’ point which does not favour either proposition is a value of ‘1’. Any value above 1 (i.e., in the range from 1 to infinity, exclusive) favours the numerator over the denominator, while any value below 1 (i.e., in the range from 0 to 1, exclusive) favours the denominator over the numerator.
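The range and the ‘balance’ point can be checked with a few lines of Python. This is only a minimal sketch: the function names and the three pairs of probabilities are made up for illustration and do not come from Hilton or Olkin.

```python
def likelihood_ratio(p_e_given_h1: float, p_e_given_h2: float) -> float:
    """Ratio of two conditional probabilities, each between 0 and 1."""
    for p in (p_e_given_h1, p_e_given_h2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if p_e_given_h2 == 0.0:
        raise ZeroDivisionError("LR is undefined when p(E|H2) is 0")
    return p_e_given_h1 / p_e_given_h2


def favoured(lr: float) -> str:
    """Report which proposition the evidence supports; 1 is the balance point."""
    if lr > 1.0:
        return "H1"
    if lr < 1.0:
        return "H2"
    return "neither"


# Hypothetical pairs of p(E|H1) and p(E|H2), chosen only to show the three cases.
for p1, p2 in [(0.8, 0.01), (0.5, 0.5), (0.02, 0.4)]:
    lr = likelihood_ratio(p1, p2)
    print(f"p(E|H1)={p1}, p(E|H2)={p2}: LR={lr:.3g}, favours {favoured(lr)}")
```

Both inputs stay between 0 and 1, yet the first pair gives a ratio well above 1, which is enough to show that λ is not confined to the range of a probability.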

The confusion here may arise from drawing a parallel to evidence like DNA which uses the concept of the RMP (or CMP) for the denominator. In those cases, the numerator of the likelihood ratio is often set to ‘1’ while the denominator is taken to be the RMP, so the likelihood ratio becomes 1/RMP. Thus, the smaller the value of the denominator (the RMP), the larger the final likelihood ratio and the greater the support for the main proposition (which could be expressed as “the more positive the identification”). I will return to this point later.
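As a rough sketch of that DNA-style case, assume the numerator is exactly 1 and the denominator is the random match probability, so that the likelihood ratio is simply the reciprocal of the RMP. The function name and the example values are hypothetical.

```python
def lr_from_rmp(rmp: float) -> float:
    """LR when the numerator is taken to be 1 and the denominator is the RMP."""
    if not 0.0 < rmp <= 1.0:
        raise ValueError("the RMP must be greater than 0 and at most 1")
    return 1.0 / rmp


# Hypothetical random match probabilities, from common to very rare.
for rmp in (1e-3, 1e-6, 1e-9):
    print(f"RMP={rmp:.0e} gives LR={lr_from_rmp(rmp):.3g}")
```

A smaller probability in the denominator produces a larger ratio, which may be the source of the ‘smaller is more positive’ idea even though it is the ratio, not the denominator, that expresses the support.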

I have discussed these concepts at length elsewhere but the complete formula for the likelihood ratio is the following: $$LR=\frac{p(E \mid H_{1},I)}{p(E \mid H_{2},I)}$$ where $E$ = the observed evidence or features, $H_{1}$ = the main proposition, $H_{2}$ = the alternative proposition, and $I$ = framework information.

As noted above, Hilton’s comments are consistent with this formulation to a degree even though he does not, perhaps understandably, present it as such. The LR is a ratio of two conditional probabilities. The conditional probabilities reflect the probability of observing the evidence, $E$, under each of two competing propositions, $H_{x}$. I have included an additional factor, $I$, that reflects the potential influence of non-handwriting information provided as background or ‘framework’.

Hilton continues with:

This likelihood ratio is one of the statistical means of testing a calculated value derived from a statistical sample. Naturally, in every handwriting identification problem we are dealing with what could be described as a statistical sampling of a person’s hand-writing when we use a limited set of statistical sampling of a person’s handwriting when we use a limited set of known writing specimens as standards.

Similar to the earlier discussion, this is essentially correct. The actual evaluation process is rather more complicated and nuanced than this suggests but, by and large, it is acceptable. At the same time I find the second sentence confusing. It might be better stated along the lines of “Naturally, in every handwriting identification problem we are dealing with what could be described as a statistical sampling of a person’s hand-writing. That sampling is limited to the set of known writing specimens provided as standards which constitute only a small sample of the writer’s abilities.” Hilton then suggests:

Applying this likelihood ratio as described by Olkin in effect is a test to determine whether the probability of identity and probability of nonidentity are really significantly different and also gives us an estimate in a numerical value of this difference.

This is wrong. It sounds pretty good but, in fact, it is an example of the ‘transposed conditional’. The LR does not speak to the probability of identity or non-identity directly. It is important to remember those terms refer to the propositions. The LR relates to the probability of the evidence, not the propositions. Thus, it provides a (numerical) estimation of the difference in support provided by the evidence for each of the two competing propositions.
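To see why the LR cannot be read as the probability of a proposition, the odds form of Bayes’ theorem can be used purely as an illustration of the logic. It is not a suggestion that an examiner should assign priors. The sketch assumes two exhaustive propositions; the LR of 1000 and the prior values are hypothetical.

```python
def posterior_probability(lr: float, prior: float) -> float:
    """p(H1|E) from the odds form of Bayes' theorem: posterior odds = LR * prior odds."""
    if lr < 0.0:
        raise ValueError("a likelihood ratio cannot be negative")
    if not 0.0 < prior < 1.0:
        raise ValueError("the prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


lr = 1000.0  # hypothetical strength of the evidence, held fixed
for prior in (0.5, 0.01, 0.0001):  # hypothetical priors held by the trier
    post = posterior_probability(lr, prior)
    print(f"prior={prior}: p(H1|E)={post:.4f}")
```

The same evidence, with the same LR, leads to very different probabilities for the proposition depending on information the examiner does not hold. That is the gap hidden by a transposed conditional.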

This is a very important, even critical, concept to understand if one wishes to apply the logical approach correctly.

This gets more complicated when Hilton returns to the numeric example presented earlier.

Going back to our problems already discussed, in the first example of 10 factors with individual probability of ½ each we find that the likelihood ratio of the probability of identification divided by the probability of nonidentification is equal to 1. A factor of λ approaching 1, much less 1 itself, is a warning of an extremely weak identification. Let us consider the second case where we found that the probability of identification was approximately 1/59000 and of non-identification approximately 1/57. This gives us a likelihood of chance coincidence. This procedure of Olkin’s is certainly to be recommended as every identification expert, handwriting or other wise, needs to be conservative in arriving at an identification.

This explanation exposes a further misunderstanding of the LR concept. As noted earlier, the LR can vary from 0 to infinity. The ‘balance’ point is ‘1’ and it represents equal support for both propositions. That constitutes, literally, an inconclusive state where the evidence has no probative value; i.e., where the evidence does not differentiate between the propositions. Therefore the statement “A factor of λ approaching 1, much less 1 itself, is a warning of an extremely weak identification” is wrong. As λ approaches 1 (from either direction), the evidence becomes less and less useful to the trier-of-fact for any purpose whatsoever.
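One way to picture ‘approaching 1 from either direction’ is to put the LR on a logarithmic scale, where the balance point sits at zero. The log scale is my own illustrative choice here, not something Hilton used, and the example values are hypothetical.

```python
import math


def log10_lr(lr: float) -> float:
    """Base-10 logarithm of a likelihood ratio; 0 marks the balance point."""
    if lr <= 0.0:
        raise ValueError("the logarithm needs a likelihood ratio above 0")
    return math.log10(lr)


# Hypothetical likelihood ratios on both sides of the balance point.
for lr in (0.001, 0.5, 1.0, 2.0, 1000.0):
    value = log10_lr(lr)
    print(f"LR={lr}: log10(LR)={value:+.2f}, distance from balance={abs(value):.2f}")
```

Values of 0.001 and 1000 sit equally far from the balance point in opposite directions, while 0.5 and 2 sit close to it and so do little to differentiate the propositions.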

Again, a critical issue that should be obvious to all readers is the fact that Hilton’s entire discussion relates to the propositions and the probability that one of the propositions is true. In terms of the logical approach, such statements are untenable as they involve a ‘transposition of the conditional’. His approach is completely understandable because examiners have historically focused on the probabilities of the propositions, rather than the probability of the evidence given the propositions. In other words, the author was thinking and writing in a constant state of transposition. Quite understandable, if completely incorrect and unjustified.

Unfortunately, the resulting discussion of the likelihood ratio and its application to our work ends up being very confusing.

Section 6 — Need of Research

I agree with Hilton’s observation that “Resorting to mathematical measure of the likelihood of accidental coincidence in the courtroom today is uncommon. Possibly under cross-examination the expert witness may be forced to do so by an opposing attorney.” The situation may be changing as our evaluation paradigm continues to shift but, for the moment, it is still a rare thing.

I do not agree with Hilton’s thoughts that follow, however, when he wrote:

Under such circumstances he would, based upon his experience, state conservative values for the likelihood of occurrence of each characteristic, say one in ten, one in twenty-five, and the like [reference to Souder, 1934]. By multiplying together a series of ten, fifteen, or twenty such probability ratios he would establish a joint probability value sufficiently small, perhaps one in several million or billion, to justify his identification. Undoubtedly, under these circumstances, no attempt would be made to apply the likelihood ratio just discussed, but even if this could be carried out on the witness stand a very small probability value would still be reached.

This approach, based on Hilton’s flawed joint probability concept, would not be a good option. Hilton showed his understanding of problems with this approach in his follow-up discussion. However, the problems are not those listed. First, when asked to give a numeric value for the likelihood of accidental coincidence an examiner should clarify the situation by presenting the correct concept of the likelihood ratio as outlined earlier. In doing so the court should become educated to the complexity of the evaluation and realise that a ‘coincidental match’, while part of the issue, is not a huge factor.
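Assuming part of the flaw in simple multiplication is that it presumes independence between features, the effect can be shown with a toy example. The ‘population’ of 20 writers and the two features below are entirely made up.

```python
from fractions import Fraction

# Hypothetical population: each tuple is (has_feature_a, has_feature_b).
# The two features tend to occur together, so they are not independent.
writers = (
    [(True, True)] * 4
    + [(True, False)] * 1
    + [(False, True)] * 1
    + [(False, False)] * 14
)

n = len(writers)
p_a = Fraction(sum(a for a, _ in writers), n)          # frequency of feature A
p_b = Fraction(sum(b for _, b in writers), n)          # frequency of feature B
p_ab = Fraction(sum(a and b for a, b in writers), n)   # observed joint frequency
naive = p_a * p_b                                      # product that assumes independence

print(f"p(A)={p_a}, p(B)={p_b}")
print(f"observed p(A and B)={p_ab}")
print(f"naive product p(A)*p(B)={naive}")
```

In this toy population the naive product makes the combination look rarer than it is. Repeating the same step over ten or twenty related features would compound the error.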

Second, there is the issue of using personal, subjective probabilities. Hilton clearly doesn’t approve of this approach as shown in his objection:

What is wrong with this procedure? Simply we are substituting an arithmetic guess for our original judgment of when we have made an identification. Granted through our experience we are able to estimate the frequency of occurrence of a particular identifying habit or quality with fair accuracy. Nevertheless, it is merely a guess based upon experience.

To be blunt, this is a very common belief held by many. It is incorrect. An examiner is not ‘substituting an arithmetic guess’ when doing this. Rather, they are providing a perfectly legitimate numeric estimate17 of the probability, one based on their experience, training and understanding of handwriting. Certainly it is personal but all probability is personal, even those based on empirical data.18 Any value(s) the examiner provides would be estimates of a personal nature but they are also estimates based on expert belief. Such expressions are not a problem so long as the personal nature of the evaluation is made completely clear. Doing so permits the court to evaluate the weight to be given the evidence based on the examiner’s background, training, etc.

Also keep in mind that the evaluation process is an overall assessment of the probability of observing all of the evidence (all aspects, features and components observed in the examination and comparison) under two competing propositions. As such, while an examiner might be able to provide estimates for the frequency of occurrence for select habits or writing qualities, doing so would be meaningless to the trier. Only the combined end-result of the evaluation matters.

The (unpleasant?) truth is that examiners make such estimates, often unknowingly, all the time even though most of them do not ‘think’ in terms of numbers. Most examiners do not formally apply numbers to their beliefs but some process akin to this (hopefully equivalent to it) is happening. The important point to understand is that converting one’s personal belief into a numerical value is not ‘bad’. Nor is it ‘wrong’ to do so. It is perfectly legitimate so long as it is clear how the value was produced and what data was drawn upon in reaching it. At the same time it isn’t easy to do at first. Only with some effort and practice can it become straight-forward and simple to do.

To dismiss this process as being “merely a guess based on experience” is equivalent to dismissing the evaluation of a doctor, an engineer, or any other highly skilled and knowledgeable expert.

Or… to dismiss what examiners do now routinely, albeit without assigning a personal, subjective value to the assessment.

As a final comment I will say that, notwithstanding my comments above, expressing one’s belief numerically is not absolutely essential to the process. It is one approach to the matter and one that may well be beneficial. But the logical approach can be applied whether or not the information is quantified or expressed in numerical terms.19

Hilton’s thoughts on research aimed at addressing the question of frequency are interesting:

What is needed if we are to avoid this situation is extensive research into the actual frequency of occurrence of various identifying factors. This requires a study of a large representative sample of handwriting. When we have completed it we would know that certain letter forms appear once in every ten North American handwritings. A very skilful writer is found in one out of eighty writers and so on. Periodically the study would need to be up dated to compensate for the effect of changes in handwriting instruction and use. Nevertheless, while the basic study must be extensive it is essential if one is to measure quantitatively the accuracy of an identification.

These comments are reasonable though I am not entirely convinced of the value of that type of research. Such studies have been done and have provided some interesting (but, to date, limited) information. Most important, that data is also severely constrained in a manner that makes it almost useless in a casework context.

Beyond this, no matter how such data is attained it will never “measure quantitatively the accuracy of an identification”. Leaving aside the issue of the transposed conditional in that expression, accuracy in conclusions offered by examiners can only be assessed through testing of the examiners under conditions of known ground truth. That type of research is very different from Hilton’s proposal.

Section 7 — Effect of Limiting Number of Characteristics

This section discusses issues relating to the use of a limited subset of features drawn from a person’s handwriting. Although Hilton doesn’t use the words it is really a matter of sampling and metrics. The selection of appropriate metrics is essential, and critical, to using this approach. The concerns raised by Hilton (and others) stem from the idea that one might ‘overlook’ that very important “significant difference” that points to another writer (or, more correctly, provides strong support for the alternative proposition). That is certainly a concern but it is the same concern when conducting our analyses using any other approach. What matters when ‘converting’ to some sort of metric is ensuring that all the critical aspects of the writing are factored into the evaluation.

Examiners often point to a traced signature, for example, and say that quantified data would fail to detect a tracing because ‘perfect correspondence’ in graphical form would produce numbers that appear to show only similarity and fail to indicate the problem inherent with ‘perfect correspondence’. But that isn’t necessarily the case.

There are various approaches that might be used to address this but all of them introduce a metric (or more than one) aimed at ‘detecting’ a tracing situation. That is, we simply need some metric that can probabilistically indicate a ‘tracing’ activity. I do not, at this moment, know exactly what that metric would be. But, if an examiner can ‘detect’ it and ‘measure’ it, then one will exist.

One might incorporate a test that specifically ‘flags’ excessively close correspondence between sample signatures, assigning a value that indicates ‘unnaturalness’. Alternatively, it might be possible (and probably better) to use one or more metrics that assess relevant features that one might expect to see under a ‘traced’ condition. In other words, some of the elements in the evidence (i.e., the features being assessed) can be expected to have a higher or lower probability of occurrence if a signature is traced. As a side note, those features would be given predominant consideration when evaluating the evidence under any sub-propositions specifically involving tracing.

The last sentence in this section is interesting. There is, indeed, “a danger in using a limited number of identifying characteristics in a probability formula if the joint probability factor or likelihood ratio is going to serve as the sole means of establishing the identification.” However, the solution to that problem is quite simple: ensure that the evaluation is not based solely on a limited or set number of features. Neither of these is a requirement when using this approach to the issue. We simply need to apply an evaluation process that corresponds to our present approach wherein the examiner assesses “all” of the writing in its entirety.

Section 8 — Conclusions

Hilton’s conclusions are interesting. He asked:

What can be gained by the application of the mathematical measure of probability in handwriting identification problems? The greatest value is a concrete measure of the sureness of an identification. When the examiner states that in his opinion two writings were written by the same person he has actually applied the theory of probability (without mathematics). He has determined from his experience that the chance of accidental coincidence, that is that there are two writers with these same writing characteristics, is so slight that for all practical purposes it can be disregarded. The value of using mathematical measure only leads others to appreciate the degree of accuracy of the identification.

What did Hilton mean with “The greatest value is a concrete measure of the sureness of an identification”? If we could somehow define ‘concrete’, ‘measure’ and ‘sureness’ then, perhaps, this sentence would be reasonable. I think he was trying to say that it would give us (and our clients) a better sense of the reliability and accuracy of our work. To some degree I concur with this but I explain those issues a little differently.

Our use of probability relates to assessment of uncertainty in the evaluation. As such it is an integral component of our conclusion or opinion because the latter is aimed at conveying to the trier the appropriate weight to be given the evidence at hand. It is not, however, a matter of applying the theory of probability to reach a conclusion of ‘identity’ as Hilton suggests. Rather it is a matter of assessing the evidence in terms of the relevant propositions and telling the court which proposition is best supported by that evidence, and by how much. That evaluation must be done in probabilistic terms.

As noted earlier, ‘accidental coincidence’ is only part of the issue and a relatively minor part at that, so examiners must do much more than determine “…the chance of accidental coincidence…” I also take exception to the idea that the uncertainty can ever be reduced to a level where that chance “…is so slight that for all practical purposes it can be disregarded.” This idea is misplaced and inappropriate and serves as a good example of making statements that are 1) too strong, 2) misleading and 3) potentially usurping in their nature. Any decision to disregard a ‘chance match’ or a potential error must remain with the trier, not the examiner.

To some degree I agree that “the value of using mathematical measure only leads others to appreciate the degree of accuracy of the identification.” But I suggest this is a potential only, which can be achieved when the metrics and evaluations based on those metrics are fully and completely explained. Otherwise, the use of mathematics may also lead others to ‘overvalue’ and misinterpret the degree of accuracy (or precision) involved. This issue was noted by Hilton when he wrote:

At the same time if we assume values for the identifying factors, we only add window dressing to our opinion. We are not making our conclusion any more scientific or accurate by resorting to mathematics and the theory of probability. It is merely a tool to assist us in our estimation of accuracy.

But we are not adding ‘window dressing’ when we use probability or mathematics. To avoid this possibility the examiner must always be very clear about the limitations of the process being used. I don’t see the primary benefit as making our work more scientific or accurate but rather one of better transparency about what we are doing. At the same time, proper application of the logical approach to evidence evaluation must be considered more ‘scientific’ in that it entails the rigorous application of proper probabilistic logic. Hilton’s final thought is, nonetheless, reasonably good — this approach is merely a tool that assists us; in doing the evaluation if not in our estimation of accuracy.

Hilton clearly does not feel that “mathematical determinations” are of value to examiners. He said:

Despite statements by some critics of modern handwriting identification methods, mathematical determinations based upon complicated research and calculations would add very little to the experienced examiner’s opinion. With the beginner it might be a dangerous tool, causing him to stop too soon in his examination before he had exhausted his search for all identifying characteristics and thus gradually form a habit of careless, too rapid examinations. It is only in the very rare case involving limited amounts of writing that these determinations would have great significance. An opinion which considers and evaluates all of the identifying attributes present in the writing, if properly carried out, is accurate whether the examiner turns to a probability check on it or not.

The problem with this statement is that Hilton is unaware that the likelihood ratio approach can (and should) be used whether or not empirically-derived, numeric data is available to inform it. While purely mathematical determinations are unlikely to improve or enhance an examiner’s opinion, the approach is perfectly suited to our work (or any similar type of evaluation). It can be used to consider and evaluate all of the attributes present in handwriting in a manner not that dissimilar to what has always been done.

The key difference lies in the focus on the probability of the evidence, rather than the probability of the propositions.

Discussion of Mr. Hilton’s Paper

The discussion section that follows the main article reflects the approach taken in the RCMP Seminar series. A couple of the comments are particularly interesting.

McCarthy: I think that the basic difficulty in probability is where you have the chance that an event might be duplicated, such as in handwriting, one chance in ten million, say that it can occur anywhere in the ten million. In other words this may be the one case in the ten million possibilities that it has been duplicated.

This is, of course, true. And it will always be true no matter how we approach our work. The reality is that the ‘truth’ of the matter is never known and people should not think this is something unique to handwriting. It applies to all types of evidence and information. That is why a trial is taking place and the trier-of-fact has the responsibility to determine what really did or did not happen. The only party in any position to assess the probability of the propositions (i.e., the hypotheses about what actually happened) is the trier, not the expert.

We work in a constant state of uncertainty. Simply ignoring it does not make the situation any better. Probability provides us with a means to discuss and explore uncertainty. In forensic situations the information we have is 1) always uncertain, and 2) always conditional. Those are two of the reasons why the logical/likelihood ratio approach is an excellent option.

Hodgins: It seems to me that the greatest difficulty is going to be in arriving at local probabilities. No matter in the light of your experience or even of a survey, be it of Canada or United States or practically world-wide, when you actually go to Court you are going to a given area and the writer you are dealing with was, in some of our cases in the outports let us say, schooled and raise there as was everyone else and what may seem to be a reasonable and very safe probability otherwise is much too low for that area.

Hilton: I will thoroughly agree with that, that these determinations are going to be influenced as they would in this case by local training. Any factors that are determined and developed this year and years from now, even if you were to take them for the whole of Canada or the whole of the United States or the whole of the North American continent, should be revised or checked ten years from now, simply because of certain changes in the population, new writers coming in, old ones dropping out, which are going to change some of these values. In a particular area a more remote, rural type of area or where you have a very constant population where all people have grown up in the same school system and imitated each other’s writing to a certain extent, you are going to have some unusual factors there that are not at all unusual for that particular population.

This discussion is excellent and touches obliquely on something rarely discussed in our field. I suspect the two speakers are not aware of it but the issue underlying this topic is the ‘relevant population’ under the alternative proposition. This issue applies in all our work and raises some concern for any survey or study aimed at assessing the frequency of occurrence of handwriting features on a mass scale. To be applicable for casework any such database must permit delimitation of the data to correspond to the specified alternative proposition.

But the critical issue may not be quite the one presented here. Hodgins wrote “…the writer you are dealing with was, in some of our cases in the outports let us say, schooled and raise (sic) there…” If he was speaking about the actual, unknown writer of the questioned material (i.e., the perpetrator), then this statement would be fine. However, if he was speaking about the suspect (as is often the case), then it is not. The fact the suspect comes from a given location is not key to the evaluation. What matters is the part that says “…as was everyone else”. For an alternative proposition along the lines of “the questioned signature was written by someone other than the suspect with the writer coming from the same geographic area”, then it would be fine.

Most alternative propositions do not include the latter even though it is clearly implicit in much of what we do.

One of the reasons I like the likelihood ratio approach is that it helps to make such subtle distinctions clearer and easier to understand, both to the examiner and anyone listening to their evidence. In that regard, it can also help to ensure more thorough and arguably more accurate examinations and results.

Wrenshall: There are in my estimation two considerations which must be measured and calculated statistically. One of them is the frequency with which the particular characteristic under study occurs in any individual’s writing, that is to establish its inherence, and secondly, the frequency with which it occurs in many persons’ writings, that is to establish its individuality. If you can do that, and I believe that it is not impossible, we are really getting somewhere.

This is one of the most astute comments in the entire article. The points made by Wrenshall are the essence of the likelihood ratio, albeit in a very basic form. The form of the LR described earlier is one of the easier ones to apply whether one has numeric data or not. But, when using strictly numeric information, the likelihood ratio is usually derived from two data distributions; one provides estimates for the numerator (based on within-writer, aka intra-writer, variation) and the other for the denominator (based on between-writer, aka inter-writer, variation). Alternatively, we might also think in terms of the ‘sensitivity’ and ‘specificity’ for the evaluation process.

In any event, those concepts are precisely what was being described by Wrenshall.
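A minimal sketch of the two-distribution idea follows. It assumes a single measured feature (for example, an ‘openness’ distance) that is normally distributed both within the writer and across the relevant population. The normality assumption and all parameter values are hypothetical and chosen only to illustrate the mechanics.

```python
from statistics import NormalDist

# Hypothetical within-writer (intra-writer) variation, estimated from known specimens.
within_writer = NormalDist(mu=2.0, sigma=0.5)

# Hypothetical between-writer (inter-writer) variation in the relevant population.
between_writer = NormalDist(mu=0.0, sigma=2.0)


def feature_lr(measurement: float) -> float:
    """Density of the measurement under H1 (numerator) over its density under H2 (denominator)."""
    return within_writer.pdf(measurement) / between_writer.pdf(measurement)


# Hypothetical measurements taken from questioned writing.
for x in (2.0, 1.0, -2.0):
    print(f"measurement={x}: LR={feature_lr(x):.3g}")
```

A measurement that is typical for the writer but less common in the population gives a ratio above 1, while a measurement the writer rarely produces gives a ratio below 1. Wrenshall’s two frequencies map onto the numerator and the denominator respectively.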

As I said at the outset, Hilton’s article is important because he made a serious attempt to address the application of probability and statistics for FDE work. He discussed “mathematical probability” as it pertains to handwriting examination and, unknowingly, extended the discussion to an approach now called ‘the logical approach to evidence evaluation’.

I hope that this post will serve to clarify and explain some of the issues in Hilton’s understanding of probabilistic logic and reasoning.

Please do not take this as a suggestion the Theorem actually has any real applicability to our work. In my opinion, it does not. The reasons for my position should, I hope, become clear as you read through this post.

Handwriting identification is taken to mean the process whereby a questioned sample of handwriting can be associated with (identified to) a given writer; the basic process being one where the examiner attempts to determine whether or not a given writer may have produced a writing in question. As per convention of the time (and even today), the question is posed and considered in terms of the propositions of interest to the trier-of-fact.

Lindley (2007) explains that “… probability is the unique extension of logic and your ideal should not be to be logical, but to be coherent in the sense of the three rules. The grand assertion is that you must see the world through probability and that probability is the only guide you need. “Understanding Uncertainty” means knowing the three rules of probability. The language of life is that of probability.” (p. 66) This may seem a bit over-the-top but the point being made is that probability helps us to understand and deal with uncertainty that exists in all that we do. Furthermore, the proper and logical application of probability is defined in the three rules of probability: Convexity, Addition and Multiplication.

E.T. Jaynes (2003), Probability Theory: The Logic of Science

Of course, it is impossible to know what those statisticians told Hilton. Perhaps they did explain these points but the information never made it into the article. There is no way to know what actually transpired.

‘Independence’ can be easily defined in terms of ‘conditionality’. In mathematical terms, two events are independent when $P(A \mid B)=P(A)$ and $P(B \mid A)=P(B)$. That is, when conditioning on one event has no effect on the other.

The idea that dependency exists between and within elements of handwriting is well-documented by document examiners. Every major author has discussed it in their textbook but none that I have read ever discuss precisely how this comes into play in the final evaluation. For example, it is not particularly clear that this type of feature has limited impact on the numerator value of a likelihood-ratio but it is very significant for the denominator, especially when considering sub-propositions relating to coincidental match, simulation or tracing.

The general topic of propositions and how they work in the logical approach to evidence evaluation is discussed in another blog post you can read here.

I should point out that variation is relatively easy to address once you decide which metric you want to use to describe the ‘feature’ or ‘habit’ in the first place. ‘Open-ness’ could, for example, be addressed using a positive or negative number relating to the distance between the two components of the letter form. A ‘closed’ form would be zero, a form with overlap/retrace would be negative and a form that has an open top would have a positive value. The variation displayed in this regard by a given writer would be some statistic (and there are several that might be considered) relating to the variation in the measured values.

Olkin, I., “The Evaluation of Physical Evidence and the Identity Problem by Means of Statistical Probabilities”, a paper read at the General Scientific Session of the American Academy of Forensic Sciences, Cleveland, Ohio, February 28, 1958.

Lindley (2006, pp. 37-38) uses the term ‘personal’ rather than ‘subjective’ to avoid the connotation of the latter being subject to sloppy and inappropriate reasoning. An individual’s ‘personal’ belief can be informed by many types of information and if we constrain ourselves to more objective ‘sources’ then our personal belief is perfectly valid. More important, it is unavoidable.

Others disagree with my position and insist (or at least recommend) that numeric values be assigned in this process. See, for example, ENFSI recommendations, p. 16: “Forensic practitioners often experience difficulty in assigning and justifying probabilities when the assignments are based on expert knowledge. However, likelihood ratios can be informed by subjective probabilities using expert knowledge. These probability assignments shall still be expressed by a number between 0 and 1 rather than by an undefined qualifier (such as frequent, rare, etc.). Such personal probability assignment is not arbitrary or speculative, but is based on a body of knowledge that should be available for auditing and disclosure.”

Copyright Notice

Unauthorized use and/or duplication of this material without express and written permission from the author and site owner, Brent Ostrum, is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to the author with appropriate and specific direction to the original content on this site.