The unofficial, unauthorized view of Ancestry.com and FamilySearch.org. The Ancestry Insider reports on, defends, and constructively criticizes these two websites and associated topics. The author attempts to fairly and evenly support both.

Tuesday, November 24, 2009

Genealogical Maturity Model Definitions

A year ago I learned some things from Elizabeth Shown Mills. I’m not certain they were the things she was trying to teach me. Nevertheless, I think they are quite valuable. Having grown up with PAF and its special definitions for source and citation, I was having problems understanding Evidence Explained. It took Elizabeth about six weeks to talk me out of the box I was in. The lessons I took away from that discussion were:

Ignoring the aggregated wisdom of past scholarship is unwise, and

Assigning specialized definitions to words is unwise.

Accordingly, I’ve tried to avoid special definitions in the GMM wherever possible. Here are definitions for the main terms:

evidence – 1. “something that furnishes proof.”7 2. “information that is relevant to the problem.”8 3. analyzed and correlated information assessed to be of sufficient quality.9 4. “the information that we conclude—after careful evaluation—supports or contradicts the statement we would like to make, or are about to make, about an ancestor.”10

Are you asking why I didn't define the term repository, or why I didn't use it in the definition for citation?

The reason I didn't define it was because this article was to define the terms for which readers asked clarification. All were category names from the Genealogical Maturity Model, so I attempted to define all six.

As to the definition of citation, I am certainly in agreement with you and the world's scholars on including a repository in a citation when appropriate and excluding it when inclusion is amateurish. That's really the point of quoting scholars rather than creating my own definition.

I was a bit too quick in my comment. I would have expected the term repository to be one of the top level definitions in your summary of "sourcing" terms.

I did not pick up that this was a list for clarification. I thought it was attempting to define the key terms that we need to be familiar with when thinking about documenting evidence for an assertion.

Also, I would not associate a repository with a citation, but with a source. As in, each source is associated with a single repository, but a repository can be associated with several sources.

Further, at a generic level, a source may be found in many repositories. Yet, when you use a source as the basis for evidence of an assertion, you need to identify the specific repository from which you reviewed the source.

EG, 1850 Federal census... You could be looking at the NARA microfilms at a particular NARA facility or some genealogy library. Or you could have viewed images at ancestry.com, genealogy.com, or elsewhere. Hopefully, you will get the same data at each... But the quality of the image at the different locations may lead to a different interpretation of the data.

I would seldom quote the repository in a citation. However, I would include it in my backup material so that another researcher will be able to find it, if necessary.

Something you said triggered an entirely different comment I want to make.

When you are evaluating evidence, what standard of proof is a researcher using when making an assertion?

One of the issues that I have with much material I review is that I do not know what standard for proof is behind any assertion.

Is the assertion a theory or hypothesis which seems right, but additional evidence is needed?

Is the assertion based on a preponderance of evidence standard? IE, lots of data points to this conclusion but there is no smoking gun.

Is the assertion true beyond a shadow of doubt?

I believe that when publishing, it is valuable to present assertions based on all standards of proof, but that the tools or documentation model should allow for a way to assert what standard is being used.

Most genealogy software will allow you to provide an evaluation for the quality of a source, but not of the conclusions based on the evidence compiled.
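As a rough illustration of the gap described above, here is a minimal Python sketch of a conclusion that carries its own standard of proof. The names `ProofStandard` and `Assertion` and the sample statement are hypothetical, chosen only for this example; they do not come from any existing genealogy program.

```python
from dataclasses import dataclass
from enum import Enum


class ProofStandard(Enum):
    """Hypothetical labels for the three levels of proof listed above."""
    HYPOTHESIS = "hypothesis; additional evidence needed"
    PREPONDERANCE = "preponderance of evidence"
    BEYOND_DOUBT = "beyond a shadow of doubt"


@dataclass
class Assertion:
    """A conclusion plus the standard of proof the researcher claims for it."""
    statement: str
    standard: ProofStandard

    def label(self) -> str:
        # Show the reader both the conclusion and how firmly it is held.
        return f"{self.statement} [{self.standard.value}]"


# Illustrative data only; not taken from any real research.
example = Assertion("John Doe was born about 1820 in Ohio",
                    ProofStandard.PREPONDERANCE)
print(example.label())
```

The point of the sketch is only that the rating is attached to the conclusion, separately from whatever quality rating the software already keeps for each source.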

John wrote: >When you use a source as the basis for evidence of an assertion, you need to identify the specific repository from which you reviewed the source.

>EG, 1850 Federal census... You could be looking at the NARA microfilms at a particular NARA facility or some genealogy library. Or you could have viewed images at ancestry.com, genealogy.com, or elsewhere. Hopefully, you will get the same data at each... But the quality of the image at the different locations may lead to a different interpretation of the data.

John, may I insert an opinion into this discussion? You and AI are both right, IMO, in saying that repositories need to be identified--when it is appropriate to do so.

As a rule of thumb, repositories are cited for unpublished works that require contact with that repository to access the material. Repositories are not cited, as a rule, for published works because they may be used in many places. (This is not to say that we cannot, in our own research notes and for our own convenience, identify where we used a book; the convention is that *when we publish* we do not waste space on citing a repository for published works.)

However, in the census example that you pose, three separate issues are involved.

1. If we view an original census at NARA, we cite the original and identify NARA as the repository that holds it. We identify the repository because that original is not available anywhere else.

2. If we view a microfilmed census at NARA or any library, we do not need to cite the facility at which we consulted the published microfilm.

3. If we view a digital image of that NARA film online, no repository is involved. Regardless of the website we consult, we're dealing with a publication, not a repository.

I agree with you, totally, that we should note which *version* of the census (or any other digitized microfilm) we used, because there are significant differences in image quality and search-engine capability. But it's not a "repository" issue. What we're dealing with is different *editions* of the same published work. After we cite the census images, we then add a note to identify the *edition* that we used.

Actually, I think that your comments validate my assertion that a definition and acknowledgment of repository needs to have been on AI's list.

However, I do not agree with your point that a researcher only needs to identify a repository when "it is appropriate".

I think we understand "identify" two different ways. I will not usually identify a repository in a citation. I think citations based on your approaches are too verbose. (If the NEHGS Register can accept a more concise citation, why should not a software's sourcing system... but this is another discussion.)

However, I think it is important for a researcher to maintain this level of data in their documentation. And, when I publish on-line, I would expect that a viewer would be able to find the repository for any of my sources through the web publishing software.

I believe that we see this through different paradigms. We both agree about verboseness and "wasting space". However, I want to see minimalist citations with details hidden in bibliographies and repository lists.

Unless I am confused about your methodology, I see it as citation-based. IE, the citation contains all of the necessary data of a "Master Source" item. As a result, software folks, who have implemented your methodology, have created software that would create very verbose citations were the user to have full source and repository information.

As I publish almost exclusively on the web and am able to use hyperlink technology to allow a reader to drill down quickly from citation to master source item to repository, if it is of interest to them.

Of course, documents can also be created that accomplish similar reader detail.

Your dissection of the various census versions is useful.

I do not agree that the repository where a NARA microfilm was viewed is unimportant. I have seen several cases over the years where the film at a particular site was worn, spliced, scratched, etc.

Further, not all NARA microfilm repositories have the same collection of microfilms.

I do like your use of the term "editions".

You will have to sell me a lot more that the online location of a digital edition is not a repository. I see no reason to treat different on-line "archives" any differently than brick and mortar archives.

If one library has an original of a book and another has a Higginson reprint edition, I am sure that you would want to document that.

Whereas a book would likely cite its origin. However, it is often less obvious for many researchers to know what edition they are looking at for online images. And from what edition of the original work that the images were made.

Are the images made from a copy of the original source? Are they made from a microfilm of a source? Are they a copy of some other online archive's images?

Consequently, citing the repository and the date of accession could give the informed reader more data. It could explain why different researchers arrived at different conclusions.

How would you treat something like the SSDI which has a different edition every month and which different on-line repositories update at different frequencies and provide different detail?

I do appreciate this discussion. And I do think it validates why I suggested AI add repository to his list. We need to be sure that we are all talking about the same thing when we use these terms.

It is also important to understand the goals we have when we do document our work. I think that fits within the GMM.

As someone who has taught the CMM to software folks, I know that one of its limitations is that it was designed for a particular class of software development to solve specific problems.

Other types of software development organizations tried to implement it and failed because they had different end goals.

However, CMM is a series of techniques (KPAs) that any organization can select from, adopt, and adapt to improve their success.

John, you've raised a whole bunch of questions. I’ll have to answer in multiple messages.

You wrote:>However, I do not agree with your point that a researcher only needs to identify a repository when "it is appropriate".

John, your original comment to AI was this: “Most scholars find it important to identify not just the source, as you define it, but where the source was located.”

Because you approached this from the standpoint of “scholarly practice,” I responded by describing scholarly practice. Whether or not you, I, or any individual “agrees” with any particular scholarly practice is a non-issue, because

- we are all free to record what we please in our “working notes.”

- if we self-publish, we are free to publish what we please. Knowing that others will judge us by those “scholarly standards,” we may or may not choose to include a prefatory argument that states why we disagree with a particular practice and have chosen to do otherwise.

- If we publish via a publisher that pays the tab for us, we’ll include only what that publisher is willing to print.

However, we do have an issue insofar as the statement that “most scholars find it important to identify not just the source … but where the sources were located." The premise makes a generalization that contradicts scholarly practice. Scholars “find it important to identify … where the sources were located” only in some cases—those being the ones I described. As AI has said, the scholarly community excludes a citation of repository in cases where “inclusion is amateurish.” All of the classic guides, from CMOS to APA to MLA to the Harvard Bluebook, practice the exclusions that I defined in my earlier post.

> I think citations based on your approaches are too verbose. (If the NEHGS Register can accept a more concise citation, why should not a software's sourcing system....)

Ah, John. This is the old confusion between data INput and data OUTput. The Register, like most publishers, strips citations down to the bare minimum needed to relocate a source; that OUTput is the citation that appears at the culmination of our research. In the long meanwhile, during which our research is in progress, we ourselves need much more data: not just a location to re-find the source if needed, but the type of information about the source that enables us to analyze the strength of its information or the evidence we derive from that information. EE provides examples of the type of details we need to record for each type of source at the INput stage. What detail or what style this-or-that publisher requires at OUTput is a different issue.

>However, I think it is important for a researcher to maintain this level of data in their documentation.

John wrote:>I believe that we see this through different paradigms. We both agree about verboseness and "wasting space". However, I want to see minimalist citations with details hidden in bibliographies and repository lists.

John, do I understand you correctly that you are designing a new paradigm strictly for web-publications? Traditionally, those bibliographies are where the minimalist citations appear; and many publications dispense with bibliographies on the premise that, when good reference notes exist, the bibliographies are redundant.

(As for repository lists, those are not common in scholarly works. What repositories someone went to says little about the validity of the research. Validity lies in the sources used.)

The problem I see with the paradigm you suggest is that those “details” often must be tied to the individual assertion (specifically to the reference note that provides supporting evidence for the assertion). Neither we nor our readers can judge the validity of the assertion without considering details that identify the nature of the source—not just the title, not just the repository, but the nature of the source which provides that piece of information.

>Unless I am confused about your methodology, I see it as citation-based. IE, the citation contains all of the necessary data of a "Master Source" item. As a result, software folks, who have implemented your methodology, have created software that would create very verbose citations were the user to have full source and repository information.

John, I’m not certain that we are defining “citation” the same way. The standard definition for citation is essentially this: The attached statement in which we identify and comment upon the source of (or evidence for) an assertion. By that definition, then, yes, “the citation contains all of the necessary data of a ‘Master Source’ item.”

However, that’s not my methodology. That’s standard practice that has existed longer than you and I have been around—and standard practice that will continue among those scholars you invoke, regardless of what genealogical software decides to do. We’ve had 3 decades, now, of genealogical software designers trying to redefine standard terms to the confusion of the whole industry and all users who come into genealogy from all other research fields. Judging by what I hear from most major designers now, the industry has matured to the point of recognizing that (a) standard definitions already exist; and (b) attempting to redefine standard terms has created chaos. Ergo, AI’s comments in his posting of 24 November.

>software that would create very verbose citations were the user to have full source and repository information.

This puts us back to the issue of whether repository information needs to be included in a citation. If your objection is that some software prints the repository data in reference notes when we don't want it there, then the solution would be for the software to offer a checkbox by the repository field, alongside words like, "Include repository in printed citation." If we don't check it, then the repository info won't print. But we'd have it in our research notes for consultation whenever we personally need it.
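The checkbox idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SourceRecord`, `include_repository`, and `printed_citation` are hypothetical names, and the sample text is a placeholder rather than a model citation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceRecord:
    """INput-stage record: keeps everything the researcher entered."""
    description: str                    # the citation text itself
    repository: Optional[str] = None    # always kept in the research notes
    include_repository: bool = False    # the hypothetical checkbox


def printed_citation(record: SourceRecord) -> str:
    """OUTput-stage note: print the repository only if the box is checked."""
    if record.include_repository and record.repository:
        return f"{record.description}; {record.repository}."
    return f"{record.description}."


# Placeholder data for illustration.
rec = SourceRecord("Original 1850 census schedule", "National Archives")
print(printed_citation(rec))    # repository stays in the notes only
rec.include_repository = True
print(printed_citation(rec))    # repository now appears in print
```

Either way the repository field stays in the record, so it remains available in the working notes whether or not it prints.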

I'll continue with your other issues in a separate message. -- Elizabeth

John wrote:>As I publish almost exclusively on the web and am able to use hyperlink technology to allow a reader to drill down quickly from citation to master source item to repository, [they can do so] if it is of interest to them.

True. Hyperlinks offer advantages, so long as users and readers work in the virtual environment. However, when those writings are converted to print (for all those Luddite reasons that people still have a need for print copies), that value is lost in most cases.

This, too, is an issue software design can deal with. The points at which we encounter problems with software design are all those situations in which someone attempts to develop novel approaches that are totally at odds with standard "scholarly" practices. The experience of gen software for three decades has shown that the whole rest of the world is not going to change its way of conducting scholarly research, just to accommodate the individual preferences of one or another software developer.

That, of course, is a frustration--whether we are one of those software designers or whether we are codifying existing citation standards, in the process of which we see all sorts of flaws in the "system." Innovators do tend to march to their own drum beat. But if our drum beat is way out of sync, the rest of the world will just ignore us.

John wrote:>I do not agree that the repository where a NARA microfilm was viewed is unimportant. I have seen several cases over the years where the film at a particular site was worn, spliced, scratched, etc.

True, of course. But most of our readers will not be using that repository for whatever published film we cite. That caveat, which is important to us, is the type we make in our working notes.

>Further, not all NARA microfilm repositories have the same collection of microfilms.

And not all libraries have the same books. That’s one of the reasons why we may want our working notes to identify the library where we found that source. But again, it’s not a standard part of a citation at the OUTput stage.

>If one library has an original of a book and another has a Higginson reprint edition, I am sure that you would want to document that.

Same response.

>I do like your use of the term "editions".

Thank you.

>How would you treat something like the SSDI which has a different edition every month and which different on-line repositories update at different frequencies and provide different detail?

Any time we cite a publication, we cite publication data. For an online citation, we cite place of publication (URL) and date (either the date of publication or the date we consulted it, specifying which). Why does that not solve the problem you are seeing?
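For a frequently updated online source, the approach just described (URL as place of publication, plus a date that says which kind of date it is) might be recorded as in the Python sketch below. The class name, field names, output layout, and the example.com URL are all assumptions made for illustration, not a prescribed citation format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OnlineCitation:
    title: str
    url: str          # "place of publication" for an online source
    when: date
    date_kind: str    # must say which date this is: "published" or "accessed"

    def render(self) -> str:
        if self.date_kind not in ("published", "accessed"):
            raise ValueError("date_kind must be 'published' or 'accessed'")
        # The date is written out with its kind so readers know which edition was seen.
        return f"{self.title} ({self.url} : {self.date_kind} {self.when.isoformat()})"


# Hypothetical URL; the date is illustrative.
ssdi = OnlineCitation("Social Security Death Index",
                      "https://example.com/ssdi",
                      date(2009, 11, 27), "accessed")
print(ssdi.render())
```

Two researchers who consulted different monthly editions at different sites would produce visibly different citations, which is the distinction the question is asking for.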

>a book would likely cite its origin. However, it is often less obvious for many researchers to know what edition they are looking at for online images. And from what edition of the original work that the images were made.

I’m not sure I understand your question here.

>Are the images made from a copy of the original source? Are they made from a microfilm of a source? Are they a copy of some other online archive's images? Consequently, citing the repository and the date of accession could give the informed reader more data. It could explain why different researchers arrived at different conclusions.

John, I totally agree that this data would help your readers understand why you reached certain conclusions. That’s good practice for a researcher. But you’ve also lost me here! Earlier in your post, you said that EE citations were too verbose in comparison to the _Register._ Now you seem to be arguing that our output should include, for our readers, information that the _Register_ does not include but EE does include in those citations you describe as “too verbose.” Somehow I feel like I’ve chased my tail and just caught it.

John wrote:>You will have to sell me a lot more that the online location of a digital edition is not a repository.

Okay, let's start with an analogy:

- The National Archives (NA) is a repository. It is a facility that houses original records and some books. The agency that runs it, the National Archives and Records Administration (NARA) is also a publisher. When we use NARA publications, we cite the publication and the publisher. When we cite original records at NA, we cite the repository.

- Ancestry.com Operations, Inc., is a publisher. Its best known offering is the website Ancestry.com, a publication that covers a variety of topics, just as a series of books or a journal series offers a variety of published material within the confines of that series or journal.

- For Ancestry, the repository equivalent to the National Archives, the site where Ancestry.com Operations, Inc., houses its data would be that anonymously located, highly secured building where all Ancestry’s servers reside.

- The “online location” we cite for a source published online (the URL) is the location of a publication, not the location of a repository.

The “website=repository” argument seems one in which the tail wags the dog. The usual example is a big corporate entity such as Ancestry. In that discussion, the exception looms so large that the vast majority of websites don't fit into that newly created “rule.”

Today’s new media does call for adaptation. However, I would argue that a logical adaptation has to apply two basic rules to be effective for programmers, researchers, and writers:

RULE 1: The adaptation should follow established precepts to whatever extent possible. (This follows two time-honored fundamentals: [a] Don’t Throw Out the Baby with the Bathwater; and [b] Don’t ‘Fix’ What Ain’t Broken!)

RULE 2: Any adaptation or redefinition of accepted terms should be applicable to all entities in its category, not just to a few exceptions. (This follows one other fundamental: The Exception Doesn’t Make the Rule.)

By extension, an argument that Ancestry is a repository would have to treat every website as a repository. However, most do not fit that mold.

Here, I'll break one last time to keep the 4,096-Character Police happy.

To continue the website = publication argument where I left off . . . Consider, for example:

Business website http://www.powellgenealogy.com

Here, the site creator offers 9 different ‘articles,’ whose titles are as follows: “Welcome,” “Research,” “Lectures,” “Articles,” “Publications,” “Classes,” “About Us,” “PA Counties,” and “Cemeteries.” Using the principle that a website is a publication, we would cite one of these items as follows:

If we consider a website to be a repository, then how would we cite her 10-paragraph offering titled “About Us”? It’s not a book. We can’t cite it as an unpublished manuscript in a repository; it’s published worldwide on the Internet. If we cite it as an article, then we have to cite the publication in which the article appears. An article, by definition, is a titled segment of writing that appears within a larger publication. So how would these standard concepts fit the concept of treating a website as a repository?

Family website http://www.shahall.com/alpha.html

Here the site creator offers one thing: an index to 55185 individuals in her files. Can we really call an index a repository? No. What is logical is to cite this as a publication:

If we adopt a rule that a website is a repository, then Wikipedia would not fit. It’s simply an encyclopedia, published online, with many articles written mostly by anonymous individuals. The only way to cite it logically would be to use one of the standard formats for published encyclopedias, such as this:

1. “John Wesley,” Wikipedia (www.wikipedia.org : 27 November 2009).

Blog The Ancestry Insider

If a website is a repository, how would this be cited? Each individual posting is only an article. Certainly each one could not be treated as a book. Nor could each be treated as an unpublished manuscript held by a repository, because it’s published on the Internet. But if we cite each topical posting as the article that it is, then a citation to an article has to have an accompanying identification of the publication in which it appears. What would that publication be, if the blog’s webpage is a repository?

If we were to argue that some websites should be repositories but other websites aren't, then:

- Where and how would we draw the line between them?

- How many (and what kind of) materials must a website offer before it qualifies to be a repository?

- How much success can we realistically expect in teaching "the masses" where to apply that line in their daily use of an infinite variety of websites?

This public conversation has been useful for me, and hopefully for others.

After reading your very thoughtful responses, I realized that I never gave you my "why". And that without knowing some of my frame of reference, my comments could be misunderstood.

So, in this first reply to your last series of postings, I will try to share my why.

All of my published work is on-line. Not because I have anything against genealogical books, but because I am still actively researching.

My research focus has been extensive single name studies with an interest in getting the family framework presented rather than telling the story (the "music") of a specific family.

I will freely admit that early in my "career" doing this I was not as effective documenting where I found various factoids as I try to be today. Or more importantly, how I came to my conclusions about family framework.

However, the question that I was continually asked by visitors to my web site was "where did you find this information". And not just what source, but physically what repository.

Of course, it was usually a moot point if it was a census. But not always. I came to realize that it did matter where the census was read, even which set of microfilms. I think this is likely less an issue today.

The more obscure the source, the more likely that I needed to alert the reader to the repository information.

The choice I made for my web publishing software was partially driven by how well it would share this type of information so a reader would not have to ask me for this information.

As an aside, you asked if I was designing a new paradigm for web publishing. That is not my intent at all.

However, the web does provide additional flexibility in presenting data that a book type publication does not. IE, you can bury source technical details easily but still make it convenient for the reader to drill down to find them.

The second "why" has been alluded to previously. I have seen the citations produced by several genealogy programs and have been numbed by the verboseness of them. Let me be clear, I am talking about the citations, not the so-called master source.

I ask, why can't the citations be as compact as I read in the Register?

A follow-on to this deals with how some software packages train their users to deal with certain sources. Let me use US Census as an example.

When you are using microfilms for the census, the source approach would include the NARA series designator for the decade and the roll number for the specific film. If the film was being accessed through the FHL, there are different numbers, I believe.

However, if you emulate that approach when you are accessing the images on line, many researchers create different master source entries for each decade and state and sometimes even county.

I would prefer to create a master source for, say, the 1850 US Federal census at ancestry.com and include the ancestry database ID (as the equivalent of the call number). Then the citation would be to 1850 census, State, County, Locality, and page. (I won't use image number because then someone trying to find data from a different source - eg, microfilms - could not easily find it.)

The citation is short; the master source is more detailed and it points to the call number/database ID in a particular repository.

The web software I use shows the citations in the standard user presentation with links to the master source for greater details and that has a link to the repository if the reader chooses to drill down that far.
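The three-level arrangement described above (short citation, then master source, then repository) could be modeled roughly as follows. This Python sketch is an assumption-laden illustration: the class names, the `ID-PLACEHOLDER` database ID, and the sample county and township are hypothetical, not taken from any particular program or record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repository:
    name: str


@dataclass
class MasterSource:
    title: str
    call_number: str          # database ID or film number, per the text above
    repository: Repository    # each master source points to one repository


@dataclass
class Citation:
    master: MasterSource
    state: str
    county: str
    locality: str
    page: str

    def short(self) -> str:
        # The compact form shown to the reader by default.
        return (f"{self.master.title}, {self.state}, {self.county}, "
                f"{self.locality}, p. {self.page}")

    def drill_down(self) -> List[str]:
        # Citation -> master source -> repository, one level per entry.
        return [self.short(),
                f"{self.master.title} (ID {self.master.call_number})",
                self.master.repository.name]


# Placeholder values for illustration only.
census = MasterSource("1850 US Federal census", "ID-PLACEHOLDER",
                      Repository("ancestry.com"))
cite = Citation(census, "Ohio", "Example County", "Example Township", "12")
for level in cite.drill_down():
    print(level)
```

One master source serves every citation to that census, so the per-citation text stays short while the call number and repository are stored once.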

My PC software is similarly able to construct Register style reports that can include (optionally) a bibliography and repository info.

My observation is that software packages that use your Evidence Explained strategy tend to facilitate users creating more verbose citations.

I will continue replying as I can. Please feel free to email me directly at jlisle AT gmail.com.

If your website is open to the public, would you consider posting a link? While we could discuss the design problems of most software programs in the abstract, it wouldn't hurt to pick on a specific program.

Let me fire the opening salvo by saying that it is the "Master Source" implementation that creates the issue you mention. It's nice that software has such a feature to ease the entry of source information, but it is a weakness that software locks the "master source" and the source list concepts together. This robs users of the ability to logically organize a source list after entering all the sources.

-- The Insider

PS BTW, a terminology land mine we must keep in mind for this discussion is the misdefinition of source and citation by virtually all genealogy software. Academics lack a simple term to describe the citation elements not included in a source list entry (aka bibliography entry), but present in a note (hereinafter, either end note or footnote). Genealogy programs typically use the term "citation" for these elements, even though source list entries, "master sources," and notes are all citations. All are citations and all describe sources. Adopting this non-standard terminology leads to many of the source and citation deficiencies present in genealogy software today.

I wonder. If genealogy software defined citations correctly, would we be having this discussion about verbosity?


The Ancestry Insider

The Ancestry Insider is consistently a top ten and readers’ choice award winner. He has been an insider at both the two big genealogy organizations, FamilySearch and Ancestry.com. He was Time Magazine Man of the Year in both 1966 and 2006. And he really is descended from an Indian princess.


Biography

The Ancestry Insider was a readers’ choice for the top four genealogy news and resources blogs, part of Family Tree Magazine’s “40 Best Genealogy Blogs” for 2010. He reports on the two big genealogy organizations, Ancestry.com and FamilySearch. He was named a “Most Popular Genealogy Blogs” by ProGenealogists, and has received Family Tree Magazine’s “101 Best Web Sites” award every year since 2008. A genealogical technologist, the Insider has a post-graduate technology degree and holds a dozen technology patents in the United States and abroad. He has done genealogy since 1972 and has worked in the computer industry since 1978. He was Time Magazine Man of the Year in both 1966 and 2006. And he really is descended from an Indian princess.

Legal Notices

The Ancestry Insider is written independently of Ancestry.com and FamilySearch. The opinions expressed herein are those of the author, and do not necessarily reflect those of Ancestry.com or FamilySearch.

E-mails and posted messages may be republished and may be edited for content, length, and editorial style.

The Ancestry Insider may be biased by the following factors: 1) The Ancestry Insider accepts products and services free of charge for review purposes. 2) The author of the Ancestry Insider is employed by the Corporation of the President of the Church of Jesus Christ of Latter-day Saints, owner and sponsor of FamilySearch. 3) The author is a believing, practicing member of the same Church. 4) The author is a former stock-holder and employee of the business now known as Ancestry.com and maintains many friendships established while employed there. 5) It is the editorial policy of this column to be generally supportive of Ancestry.com and FamilySearch. 6) The author is an active volunteer for the National Genealogical Society.

"Ancestry Insider" does not refer to Ancestry.com. Trademarks used herein are trademarks or registered trademarks of their respective owners. The Ancestry Insider is solely responsible for any silly, comical, or satirical trademark parodies presented as such herein.

All content is copyrighted by the Ancestry Insider unless designated otherwise. For content copyrighted by the Ancestry Insider, permission is granted for non-commercial republication as long as you give credit and you link back to the original.