Quite often, clients will ask me if I can prove that they are descended from William the Conqueror, or what proof I have when I tell them that they are actually descended from Bill the Bricklayer... Then they'll get a bit upset with me when I try to explain that you can never really prove anything in genealogy.

I thought it was about time I got round to writing a blog post on the subject of genealogical proof, what that might look like, and whether it could ever even exist.

The Genealogical Proof Standard, set out by the US Board for Certification of Genealogists, isn't as well known on this side of the Pond, but in my view it should be, as I believe it gives us a useful set of guidelines for checking whether our research has been as thorough as possible in tackling a particular genealogical question.

But proof??? Does it really give us a method of establishing whether some point of genealogy has been proved? I would argue it does not.

In fact, I would argue that to even use the term proof in the context of genealogy is distinctly misleading and unhelpful. My feeling is we need to steer people away from any thought or talk of proof and adopt an approach to genealogy which is much closer to the scientific method.

In science, nobody talks about things having been "proved" (or at least they shouldn't). Scientists talk instead about hypotheses and theories: Newton's Theory of Universal Gravitation, Einstein's Theory of Relativity, Darwin's Theory of the Evolution of Species by Natural Selection etc. Those have all been around for over a century, are almost universally accepted and we have used them to build the modern world, but none of them has been proved. It is always possible that someone could do an experiment tomorrow that reveals a gaping hole in one of those theories no one had spotted before. Indeed, Newton's ideas on gravity served us pretty well for a couple of hundred years before Einstein improved on them. And the incompatibility of Relativity Theory at the macro level and Quantum Physics at the micro level means one thing physicists agree upon is that we must have something wrong somewhere!

In popular parlance, the word theory is sometimes used in a rather derogatory way. We might say: "Yes, but that's only your theory", to dismiss something we don't agree with. But theories are fundamental to the scientific method.

The way science works is that, in an attempt to shed light on some aspect of the world, a researcher will first put forward a hypothesis, which is a proposed explanation for some phenomenon. That hypothesis is then tested through observation, modified and improved, and eventually a fully fledged theory is developed. But theories are never proven: one theory holds sway for a while, then gets superseded by a better one, then that by an even better one, and so on. Einstein's theory supplanted Newton's, and Einstein's will (hopefully) someday be supplanted by a "Theory of Everything" that reconciles Relativity and Quantum Theory.

You can perhaps think of a theory as a model of the universe that gets refined and gradually becomes a better interpretation and reflection of reality as new information comes to light. We should constantly be testing it by looking for new information and checking how well the theory stands up to scrutiny. If the theory is inconsistent with our new evidence, we need to revise it.

But wait a minute! Isn't that the way good genealogy is done too!?

We start with some corpus of information (e.g. baptisms and marriages recorded in parish registers). From that we build a working family tree (our theory, or model of historical reality). Some things will fit together neatly and convincingly, others less so: often there will be many different ways in which the same set of information could be interpreted, and, based solely on the information available at the time, there might be no way in which to make a firm judgement as to which interpretation is most likely to be correct. So we look for more information to help us decide which is right, and, based on any new details that come to light, we may have to go back and change our working family tree so it is consistent with them.

To me, that's not really any different from a scientist taking a set of experimental results, constructing a theory that attempts to explain those results, then doing other experiments to try to see if the theory stands up.

So, I try (but frequently fail, I fear...) to make my clients understand that what I am giving them is just my best attempt to produce a model of their ancestry, based on the (often imperfect, incomplete and inconsistent) evidence available, and should always be considered subject to revision if further information comes to light. Every time I arrange for a family tree chart to be printed so it can be framed, put up on somebody's living room wall and passed down to posterity, as if it were the final, definitive version of the truth, it is, if I'm honest, with a somewhat heavy heart. I'd much rather produce a model out of coloured balls linked together with wire, like molecular chemists do, so it could be taken apart and reworked if the structure turned out to be incorrect. But it's hard to put one of those in a frame above the TV.

In my view, therefore, we should never talk about "proof" in the context of genealogy. We should only talk about how well our model of historical reality stands up to scrutiny, which aspects of it are less well tested than others, and what further information might be available to refine it.

When I started researching my family tree a rather scary number of years ago (just before my first daughter was born - she's 30 soon!), family history was a rather esoteric hobby. There were a few books on the subject in the local library, but not many. There had been one pretty obscure TV series on the subject in the UK a few years earlier, but I confess it had passed me - and most of the rest of the nation - by at the time.

Now, family history is mainstream. Prime time TV series are sponsored by the likes of Ancestry, GenesReunited and FindMyPast. Major events are held at venues such as the NEC and London Olympia every year, and thousands of people have caught the bug. It seems to be a pastime whose time is no longer past. Why, I wonder?

Well, I think there are a number of reasons.

The internet and genealogy: a marriage made in heaven

One is to do with practicality. When I started out in genealogy, researching your family tree meant at best booking time on a microfilm reader at the local library to view the census reels, at worst travelling many miles to the nearest record office to look at dusty old tomes and barely illegible parchment documents (though actually, if I'm honest, I always loved that part of it!).

Today, with the advent of the internet and increasing online access to easily read transcripts, high-speed indexes and digitised images, you can do a lot of research from the comfort of your own home and, in just a few hours, achieve more than would previously have been possible in years.

The truth of the matter is that we are living in an information age and genealogy is an information-based activity. Family history is booming today at least in part because the internet is booming, and the two go hand-in-glove.

Looking for our lost roots?

But I wonder if there's not more to it than that?

Don't we also live in an age when many of us live in different cities, maybe even in different countries or on different continents, from where our parents or grandparents lived, let alone our ancestors from centuries past?

We routinely travel around the world, whether on business or for pleasure, visiting places our forebears had probably never heard of, and would only be likely to see if they went there to fight a war.

Many of us do jobs where the output of our labours is invisible, insubstantial and intangible: few of us today in the western world are fortunate enough at the end of the day to be able to pick something up, hold it in our hands and proudly say "I made that!".

I wonder whether the popularity of family history today is not also in some way connected with our modern lifestyles? Isn't there also an element of nostalgic hankering after a (possibly largely imaginary) lost world of solidity, certainty and "rootedness"? Do we envy our ancestors their connection with a particular locality, with its land and soil?

The thrill of the chase

But that doesn't explain why some of us are enthralled by this hobby even when the focus of our research has nothing to do with us personally. Having exhausted our own families, many of us who have "caught the family history bug" will then start researching other families completely unconnected with our own. Why's that?

Well, the truth is, I think, that many of us are just what I like to call research junkies. We got a thrill out of discovering our own past and can't give it up. We want to feel the euphoria of discovery again and again and again.

How many of us would, I wonder, admit to being addicted to the detective work that genealogy entails? My guess is that Sherlock Holmes was a research junkie too, and only resorted to cocaine when he didn't have a case to solve.

Don't get me wrong, I'm no Luddite. I worked in IT for 30 years and have developed my fair share of software in that time. I strongly believe that technology has made a major contribution to the boom that family history is currently enjoying. I also believe software can make the life of every genealogist easier and more productive. It's just that I don't think it goes far enough at present - unless, that is, I'm missing something and there are programs and features out there I'm not aware of - in which case, please enlighten me, because I need them.

My concern with most (actually, all) genealogy software I've ever used is that it's geared towards recording "facts". We progress by attaching events and other data to individuals: you create an entry in a database for an individual, then gradually build up details of that person's life by inputting their birth, baptism, marriage, death, burial etc. as you uncover that information. You also connect those individuals to previous and next generations and to their spouses, and can then use the information recorded to generate charts and reports. If you're lucky, the software will also enable - and hopefully even encourage - you to input references for the sources from which that data has been obtained.

So what's wrong with that, I hear you ask? Well, nothing, so far as recording "facts" goes. But the trouble is, so often in genealogy, we're not actually dealing in "facts"; not, at least, in the sense of clear-cut, hard-and-fast, indisputable truths about people's lives. In reality, much of what we do as genealogists is actually about hypothesis, conjecture and interpretation. It's about taking snippets of information from various sources and piecing them together as best we can to form a coherent view that hopefully approximates to the reality of history. Some of the sources at our disposal will be highly reliable and in agreement with one another, but often sources will be incomplete or questionable, and the details we obtain from one source may conflict with those we glean from another. But ideally, we still need to record it all somewhere, and it would be good to be able to do that electronically in some convenient, accessible and usable form (i.e. not just in Notepad or Excel), but without being obliged to make premature decisions about which of several possible pieces of information to give preference to, or which of several possible interpretations or hypotheses to select.

It's a cliché to talk about genealogy as being like piecing together a jigsaw to form a picture of a person's life, but the analogy is such a good one, it's hard to avoid it. However, so often the task we're faced with is even harder than the hardest jigsaw puzzle. So often it's rather as if somebody had taken all of the jigsaw puzzles out of the cupboard, thrown all of the pieces into a big bag, shaken it up and thrown away the boxes. We're left with no idea what the pictures we're trying to reconstruct should look like and, what's more, not only do we need to put the pieces together in the right way, we also need to disentangle the pieces of several different puzzles from one another in the process.

What worries me about genealogy software is that at present it does nothing to help this process of piecing together information to form a coherent picture. In fact, I think it can make it harder, or encourage "bad" genealogy, by which I mean asserting as "facts" things which are unproven. If we try to use current genealogy software to construct an hypothesis, just to test out an idea, there's an immediate risk things will begin to become crystallised and open to misinterpretation as historical "truth" rather than theory. As soon as we record a piece of information against an individual, however tentative that assignment, that data begins to look like a "fact". As soon as we connect two people together, however unsure we are of that connection, that link begins to look like a solid bond.

Let's take some examples which might make my point clearer.

Suppose we have an individual who was born before 1837, when civil registration of births, marriages and deaths started in England and Wales, and for whom, therefore, no birth certificate is available to give us an "official" date of birth. If we want to determine or infer that person's date of birth, we will be obliged to work from other sources. The sources available to us might include the following:

entries on the 1841, 1851, 1861, 1871, 1881 and 1891 censuses, giving that person's age as 15, 27, 38, 45, 53 and 60 respectively

a marriage certificate dated 1848, giving an age of 21

a death certificate dated 1900, giving an age of 80

a baptism dated 1st November 1830

a family bible, giving a date of birth of 1st November 1830

So, what date of birth do we record in our genealogy program, which only lets us enter a single date of birth against a person? The temptation will be, I suspect, to enter the date of birth from the family bible, since that's the only full date we have available. But it's seriously at odds with the dates of birth implied by the ages collected from the other sources, which are also inconsistent with one another. We also need somehow to explain away how the child was born and baptised on the same day. But, once we've entered into our program "01 Nov 1830" as the date of birth, and "family bible" as the source of that information, there's a risk this will become a "fact". If we export the data as a GEDCOM and publish it online, this date of birth is now "out there" and could become the "received wisdom" for ever more. How frustrating is that, if, some while later, we come across the child's baptism certificate in an elderly relative's possession and discover that it states that the child was baptised aged 7 years on 1st November 1830 and realise that the date in the family bible was incorrect (it could have been written out many years later by someone who entered the date of baptism rather than the date of birth)?
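To make the arithmetic behind that inconsistency a little more concrete, here is a minimal Python sketch that turns each stated age into the pair of birth years it implies and tallies how many sources are consistent with each year. It assumes every age was recorded as exact completed years at some point in the given year (real census ages were often rounded or guessed), and it works in whole years only, since the exact event dates aren't given above. The names `AgeEvidence` and `support_by_year` are made up purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeEvidence:
    """One source stating a person's age in a given year (hypothetical structure)."""
    source: str
    event_year: int
    stated_age: int

    def birth_years(self) -> range:
        # Someone aged N at a date in year Y was born in Y-N-1 (birthday not yet
        # reached that year) or in Y-N (birthday already passed).
        latest = self.event_year - self.stated_age
        return range(latest - 1, latest + 1)


# The ages quoted in the example above, taken at face value.
EVIDENCE = [
    AgeEvidence("1841 census", 1841, 15),
    AgeEvidence("1851 census", 1851, 27),
    AgeEvidence("1861 census", 1861, 38),
    AgeEvidence("1871 census", 1871, 45),
    AgeEvidence("1881 census", 1881, 53),
    AgeEvidence("1891 census", 1891, 60),
    AgeEvidence("1848 marriage", 1848, 21),
    AgeEvidence("1900 death", 1900, 80),
]


def support_by_year(evidence):
    """Count how many sources are consistent with each candidate birth year."""
    counts = Counter()
    for item in evidence:
        counts.update(item.birth_years())
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for year, n in support_by_year(EVIDENCE).items():
        print(year, "#" * n)
```

Run as it stands, the tally peaks at 1826, which only three of the eight sources support, while 1830 (the family bible year) is consistent with just the 1891 census. No single year fits every source, which is exactly the information that gets lost when one date is typed into a single "date of birth" field.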

Wouldn't it be better if our genealogy software enabled us to enter several possible values for each piece of data held against a person, each with a documentary source and perhaps a confidence factor of some kind, so we could clearly indicate that the information was uncertain and not a known "fact"?
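As a rough illustration of what that might look like, here is a small Python sketch in which a person holds a list of competing claims for each kind of data rather than a single value. The 0 to 1 confidence scale, the class names and the idea of ranking claims are my own assumptions for the sake of the example, not features of any existing program, and the confidence figures are purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One possible value for a piece of data, with where it came from."""
    value: str
    source: str
    confidence: float  # hypothetical scale: 0.0 (doubtful) to 1.0 (near certain)
    note: str = ""


@dataclass
class Person:
    name: str
    # Each key (e.g. "birth") maps to every competing claim; none is discarded.
    claims: dict[str, list[Claim]] = field(default_factory=dict)

    def add_claim(self, kind: str, claim: Claim) -> None:
        self.claims.setdefault(kind, []).append(claim)

    def ranked(self, kind: str) -> list[Claim]:
        """All claims for one kind of data, most confident first."""
        return sorted(self.claims.get(kind, []),
                      key=lambda c: c.confidence, reverse=True)


person = Person("Example person")  # hypothetical individual
person.add_claim("birth", Claim("1 Nov 1830", "family bible", 0.3,
                                "same day as the recorded baptism"))
person.add_claim("birth", Claim("1819-1831", "census and certificate ages", 0.5,
                                "stated ages disagree with one another"))

for claim in person.ranked("birth"):
    print(f"{claim.value:<12} {claim.source:<28} {claim.confidence:.1f}  {claim.note}")
```

Nothing in this structure forces a choice: the bible date remains one claim among several, each still tied to its source and to the reason for doubting it.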

Let's consider another example. Imagine we're trying to identify the parents of a child baptised in 1799, son of a William and Mary Lock. We search the marriage indexes and identify two possible couples called William and Mary Lock in the right area at the right time. One couple were married the year before the child's baptism, in the village a mile or so down the road. The other couple were married twenty miles away 10 years previously. The temptation will be for us to attach the child to the first couple as their son, because they seem so much more likely to be the "right" couple based on the available information. And immediately, however much we might still keep in the back of our minds the outside possibility that he belonged to the second couple, we have set that relationship in stone and there's a risk it has become a "fact". We ourselves might forget about the second couple. If we publish our data online, no one else will know we still considered this a possible link, that's for sure.

Wouldn't it be great if there were some way in the software for us to indicate a tentative or speculative link to the first couple but also a possible link to the second, with confidence levels and reasons? Is there any software out there that allows that? I'm not aware of any.
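Here is one way such tentative links might be held, sketched in Python under the same caveats: the couples are identified only by made-up labels, and the confidence values and reasons are illustrative rather than any recommended scoring scheme. The point is simply that both candidate couples stay on record, each with its own reasoning, instead of one being silently chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentLink:
    """A proposed (not asserted) child-to-parents relationship."""
    child: str
    parents: str
    confidence: float  # illustrative 0.0-1.0 scale
    reason: str


CHILD = "son of William and Mary Lock, bap. 1799"

links = [
    ParentLink(CHILD, "Couple 1 (married the year before, next village)", 0.7,
               "married shortly before the baptism, about a mile away"),
    ParentLink(CHILD, "Couple 2 (married 10 years earlier, 20 miles away)", 0.3,
               "same names, but a longer gap and a greater distance"),
]


def candidates_for(child: str, proposals: list[ParentLink]) -> list[ParentLink]:
    """Every proposed set of parents for a child, best supported first."""
    return sorted((p for p in proposals if p.child == child),
                  key=lambda p: p.confidence, reverse=True)


for link in candidates_for(CHILD, links):
    print(f"{link.confidence:.0%}  {link.parents}: {link.reason}")
```

Because a link is a separate record with its own confidence and reason, publishing it would pass on the doubt as well as the preferred answer.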

Wouldn't it be better if genealogy software could be re-designed to support and assist the genealogical method, by which I mean the process we go through to arrive at reasonable conclusions, rather than just the final conclusions themselves? Rather than assigning facts to individuals and then cross-referencing to sources, couldn't we start by recording the content of sources?

I think what I would like to see is a package which allowed you to:

record sources by using pre-defined or user-defined templates (e.g. forms for census data from various years, GRO certificates, parish registers etc. plus forms we built ourselves for unusual or ad hoc requirements)

link those sources together to form connections where the information they provide relates to the same "fact" (e.g. a date of birth taken from a birth certificate and an age taken from a census or death certificate) and enable us to highlight the degree to which the information is or is not consistent, with confidence levels assigned to each source, and the ability to comment on our reasons for those confidence levels

construct multiple hypothetical timelines and pedigrees for individuals and family groups as a means of trying to analyse the information available to us from the various sources and piece it together into some kind of coherent structure

perhaps (is this pie in the sky?) have the program suggest possible connections and point out inconsistencies?
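The first two items on that wish list, plus a very modest version of the last, can be sketched quite simply. In the rough Python illustration below, a template is just a named list of fields, every source is entered by filling one in, and records are then linked to a shared "fact" so that disagreements are flagged rather than resolved. The template names, field names, helper functions and the two-year tolerance are all hypothetical choices of mine.

```python
from dataclasses import dataclass

# Hypothetical templates: the fields a transcription of each source type must have.
TEMPLATES = {
    "census": ("year", "name", "age"),
    "death_certificate": ("year", "name", "age"),
}


@dataclass(frozen=True)
class SourceRecord:
    template: str
    fields: tuple  # (field name, value) pairs, kept in template order

    def get(self, key):
        return dict(self.fields)[key]


def record(template: str, **values) -> SourceRecord:
    """Create a record, insisting every template field is filled (extras are ignored)."""
    missing = [f for f in TEMPLATES[template] if f not in values]
    if missing:
        raise ValueError(f"{template} record is missing {missing}")
    return SourceRecord(template, tuple((f, values[f]) for f in TEMPLATES[template]))


def implied_birth_year(rec: SourceRecord) -> int:
    # Whole years only; the birth could equally fall in the year before.
    return rec.get("year") - rec.get("age")


def inconsistencies(linked: list[SourceRecord], tolerance: int = 2):
    """Pairs of records linked to one birth whose implied years differ by more than tolerance."""
    flagged = []
    for i, a in enumerate(linked):
        for b in linked[i + 1:]:
            gap = abs(implied_birth_year(a) - implied_birth_year(b))
            if gap > tolerance:
                flagged.append((a, b, gap))
    return flagged


# Three of the sources from the date-of-birth example, linked to the same birth.
linked_to_birth = [
    record("census", year=1841, name="Example person", age=15),
    record("census", year=1871, name="Example person", age=45),
    record("death_certificate", year=1900, name="Example person", age=80),
]

for a, b, gap in inconsistencies(linked_to_birth):
    print(f"{a.template} {a.get('year')} vs {b.template} {b.get('year')}: {gap} years apart")
```

Here the two census entries agree with each other, and both pairings with the death certificate are flagged, which points at that one source as the odd one out without deleting or overriding anything.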

If this could all be done in some kind of graphical form, maybe akin to a mind map, we might then have a tool which was truly of assistance to the genealogical method rather than just allowing us to record - all too often prematurely - the end results of the genealogical method.

My own "brick wall"

My great-great-great-great-grandfather John Clifford married Ann Shill at Badgeworth, Gloucestershire in 1793 and was described as "John Clifford of Swindon" (i.e. what is now known as Swindon Village near Cheltenham, not the town in Wiltshire) in the marriage register entry there. He and Ann had five children baptised at Swindon, including my great-great-great-grandfather Thomas Clifford (in 1803) and his younger brother William Clifford (in 1808).

(William was later convicted twice of theft and transported to New South Wales for seven years for the second offence in 1834, leaving descendants who live "down under" to this day - but that's a story for another time. Back to John...)

I have spent over 25 years trying to identify John's parents through the "normal" methods, e.g. Church of England and Nonconformist registers, poor law and settlement records etc. etc. etc.

The sad but inevitable conclusion I have come to is that John's life left virtually no impression on the documentary record. His years on this planet appear (as far as I can tell at this moment) to have resulted in:

an entry in a marriage register

five entries in a baptism register (for his children)

a cursory entry in the Swindon burials register ("1811: John Clifford" - he doesn't even merit a full date)

half a dozen or so mentions in the Overseers Accounts when he was "on the parish"

a passing reference in his son Thomas's settlement examination taken 20 years or more after his death

That's about it, so far as 25 years of searching has uncovered. There is no likely baptism to be found within 20 miles of Swindon. I have plenty of theories about John's origins, plenty of candidates to be his parents, plenty of possible lines to pursue, but given the sparse nature of the material available, there seems precious little prospect I will ever have any certainty about his parentage.

The sad truth, I'm afraid, is that even the best of us have to recognise eventually that the paper trail has gone cold and the documentary record alone just does not provide sufficient evidence to connect any further back with confidence.

This was particularly frustrating because there was no shortage of Cliffords to connect to! "Clifford" is by some margin the most common surname in the Swindon parish registers: there are more Cliffords even than there are Smiths or Joneses. There was a well-documented line of Cliffords in Swindon back to 1522, if only I could find that missing link!

Could genetics help where genealogy had failed?

After 25 years of trying to find that link through documentary sources, I eventually began to wonder whether genetic testing could help.

Here was the idea. I had two main theories about who John's parents might be, so I would trace two living people:

Person "A", a living direct male-line descendant of the man who would have been John's father if the first theory was correct; and

Person "B", a living direct male-line descendant of the man who would have been John's father if the second theory was correct.

I would then persuade those people to take Y-chromosome tests, whose results I could compare with my own. I realised, of course, that these tests would not actually prove anything, but they might give me a steer as to which theory was the more likely, or save wasted time by largely ruling out an incorrect theory.

There were three possible outcomes, as far as the three Y-chromosome profiles were concerned:

I might match to neither A nor B. That would in some ways have been the least satisfactory outcome, but even this would have been interesting. One possible theory was that John was illegitimate, and if neither other Clifford matched to me, then this would have been a line of enquiry I might have been inclined to pursue more vigorously.

I might match to both A and B. This wouldn't have given me much of a clue in terms of deciding which of the possible lines of descent was correct, but it would still have been encouraging, in the sense that it confirmed John's connections at some point in the past with the other Swindon Cliffords and made it unnecessary to concoct elaborate and outlandish theories about his being a bastard, foundling or adoptee.

I might match to one profile, but not the other. This would perhaps be the best outcome, giving me a definite indication that the one theory was more likely than the other. But this seemed a highly improbable outcome.

The first obstacle, of course, was to trace those living male descendants to match to. This was actually very easy as far as one theory was concerned - I'd been in touch with just the right person to be "A" for years through a one-name group. But in relation to the other theory, despite the potential father having had numerous offspring, tracing and then contacting a living direct male descendant turned out to be quite a challenge, and one that took several months to crack. There were many different lines to follow, and many of them seemed to die out after a few generations. But eventually, I found a line that came right down to the present day, and identified an individual still living in the Cheltenham area whose contact details I was able to obtain (it turned out, in fact, that I'd worked with his nephew for many years!).

The next hurdle was persuading both people to agree to the genetic testing (albeit at my expense, I hasten to add). Having spent all that time identifying and contacting the right person, I was worried I would face the frustration of a refusal, but in practice, both men were delighted to help.

The tests went off and I awaited the results with anticipation.

Opening the envelope

What the results showed was that the third possibility was the case: my Y-chromosome was completely different from A's but almost identical to B's (just one slight difference on a single marker - about what would be expected given that the lines probably separated over 200 years previously).

Clearly, that was a very exciting result for me, after 25 years of fruitless searching. But what did this exercise actually tell me, in reality, when I'd calmed down?

Well, nothing definite, is the truth, but plenty to take encouragement from. What I know for sure is that person B and I have a common male ancestor at some point in the past. Since I know with confidence who B's ancestors were, through the standard documentary method, I can be sure that my ancestor John and I connect to that tree somewhere, at some point. The slight difference in our Y-chromosome profiles perhaps suggests that the two lines separated 200 or so years ago, but it would probably be unwise to place too much reliance on estimates based on an assumed rate of mutation such as that. I am sure our trees don't connect up more recently than the middle of the 18th century, but the actual common ancestor could, in theory, have lived hundreds of years before that.

Nonetheless, this has given me a great deal of encouragement that I can consider my John to be a "true" Swindon Clifford who joins up with the earlier Cliffords from that place in some way, as well as indicating which lines of enquiry are most likely to bear fruit and which are probably a waste of time.

I think we are only just beginning to explore the potential for genealogy and genetics to complement one another. I am sure that as genetic science progresses, new opportunities to involve genetic testing in our family history research will become evident.