These are common historical sources, and there are accepted printed
citation formats applicable to each of them, but this Citation entity goes
further; it can also identify a collection of works, a repository or
institution, or even represent attribution to an individual.

In the citations of normal written or printed works, there
are two main citation modes that may be employed within text: document labels
and source labels, being applicable to documents and images respectively.
Citations may involve reference notes linked to inline superscript indicators in
the main text. Alternatively, they may involve a source list or bibliography at
the end of the work. Parenthetical in-text citations — such as “Smith (2004, p. 39) claims that...”, or “...(Smith
2004, p.39)...” if all details are parenthesised — are commonly associated with published sources in academic work
and are less appropriate for genealogical or historical citations. This is
because they do not accommodate the source provenance or analytical notes that
are frequently required.

There are citation conventions that apply to different
source types and scenarios in order to present some consistency, and these have
specifications for their layout, quotation marks, punctuation, and use of
italics. Several citation styles are in common use. For instance, in the humanities
there are: Modern Language Association (MLA), Harvard referencing, Modern
Humanities Research Association (MHRA), and the Chicago Manual of Style (CMOS).
There are other styles commonly used in law or the sciences too.

The Board for Certification of Genealogists (BCG) recommends the use of both
CMOS and EE for family history. EE is a style devised by Elizabeth Shown Mills
in Evidence Explained: Citing History Sources from Artifacts to Cyberspace
(Baltimore: Genealogical Publishing Co., 2009) to cover the wider range of
historical and unpublished sources used in family history.

It should be understood, though, that all these citation
styles and modes relate to the final-form written or printed citations. Their
application is therefore relevant to a specific end-user rather than to computer
storage. Since those final-form citations are designed to be human-readable, they
also embody elements of a specific locale, culture, and preferred style. This
is a problem for electronic documents because such citations are not
computer-readable, and so cannot be adjusted to suit the locale or preferences
of an arbitrary end-user. It is therefore necessary to go back to the essence of
a citation rather than consider specific physical implementations, i.e. to
provide sufficient information through a digested citation to uniquely identify
a source, its characteristics, and any analytical assessment. These
citation-elements, implemented through STEMMA’s Parameter mechanism, should be
sufficient to support the formatting appropriate for any given end-user.

Although it is possible to generate citations of different
style, and for different locales, from the discrete citation-element values,
there are many complications in the real world. A citation sentence may contain
different layers describing the provenance of the source and its information,
or it may contain analytical notes. A reference note may contain multiple
citation sentences — a tour of
these scenarios was covered in Cite
Seeing. Subsequent references to the same source would typically use a
shortened form of the associated reference note (see ‘WhereIn’ attribute), or
the author may have employed an explicit hereinafter-cited-as
term, or the Latin abbreviation Ibid.
A footnote may have woven two source references into the same piece of text.
Certain parts of a citation may not have been available (e.g. an undated
document), or were erroneous, and so the citation would need to override any
simple template-like formatting. In effect, authors of narrative work are loath
to delegate generation of their citations to a piece of software working
blindly from a set of data values. It is therefore necessary to support
hand-crafted forms, and change the focus of citation-elements to that of
correlation and interrogation rather than formatting.

The scheme presented here is a generalised computer-readable
one that would cope with all possible source types and scenarios. It does not
strive to enumerate all possible source types, or specify what elements they
require, or mandate a particular presentation style; the main goals of this
scheme are to keep it open-ended so that source types can be defined freely, to
parameterise the scheme so that it can interface to external citation-templates,
and to give it a hierarchical structure for representing different layers of a
citation (e.g. for provenance or location).

The parameterisation is available in the citation-title, the
format-strings, narrative elements, and the values of Parameters themselves (i.e.
within a Params element).

Note that Parameter names are local to the corresponding
source-type. There is no sharing of Parameter names between different
source-types, and no implied semantics in any of their names. If two
source-types each have a Parameter called ‘Publisher’ then they are each
interpreted in the context of their respective source-types. In effect, no
semantics are conveyed directly by the Parameter name — that is the purpose of the optional SemType attribute.

The valid Parameter data-types are documented at: Data Types.
The same ItemList approach to lists is taken as for Property values. The semantic
type is indicated by the SemType attribute which may use the Dublin Core vocabulary,
e.g. SemType=’DC:title’ or SemType=’DC:publisher’. The default value for the
Optional attribute is 0 (i.e. false) which means that a non-blank value must be
provided. The ‘WhereIn’ attribute flags parameters that identify a location or
entry within a source, as opposed to the source itself, its provenance, or its
location. The ‘Subst’ attribute allows the formatting of a value to be
overridden, and is especially useful for unknown values. For instance, an
undated document might be represented with a date Parameter having a value of
“?” but a substitution of “n.d.” or “[1832]”.
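
The attributes described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the Param class, its field names, and its two methods are hypothetical illustrations and are not part of the STEMMA specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Param:
    """Hypothetical in-memory view of one citation-element (Parameter)."""
    name: str
    value: Optional[str] = None
    sem_type: Optional[str] = None   # e.g. "DC:title"; the name itself carries no semantics
    optional: bool = False           # mirrors the Optional attribute (default 0, i.e. false)
    where_in: bool = False           # True if it locates an entry within the source
    subst: Optional[str] = None      # display override, e.g. "n.d." for an unknown date

    def validate(self) -> None:
        # A non-optional Parameter must carry a non-blank value.
        if not self.optional and not (self.value or "").strip():
            raise ValueError(f"Parameter '{self.name}' requires a non-blank value")

    def display(self) -> str:
        # The substitution, when present, replaces the stored value for formatting only.
        return self.subst if self.subst is not None else (self.value or "")


# The undated-document case from the text: stored value "?", displayed as "n.d."
date = Param(name="Date", value="?", subst="n.d.")
date.validate()
print(date.display())
```

Keeping the stored value and the substitution separate means the value remains available for correlation while the substitution only affects presentation.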

The <BaseCitationLnk> element may nominate an Abstract
Citation from which data may be inherited by the current Citation, in much the
same vein as base classes and derived classes in software programming. An
Abstract Citation must define no embedded Keys, can only reference other
abstract entities, and must contain Parameter definitions rather than Parameter
settings. Any application of Parameter substitution must therefore occur after
the inheritance process has completed. If an implementation creates a temporary
conglomerate entity in memory by doing a physical merge then it must not be
persisted back to the data file, otherwise it constitutes a data corruption.
See Inheritance and Parameters
for more information.
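
The ordering constraint above (inherit first, substitute afterwards, and never write the merged result back) can be sketched as follows. The dictionary layout, the key cBook, and the rule that a nearer definition overrides a more distant one are assumptions made for illustration; they are not taken from the STEMMA specification.

```python
from typing import Dict, Optional

# Hypothetical store: Citation key -> base link, Parameter definitions, Parameter settings.
citations: Dict[str, dict] = {
    "cBook": {"base": None, "defs": {"Author": {}, "Title": {}, "Date": {}}, "values": {}},
    "cOldNottm": {"base": "cBook", "defs": {},
                  "values": {"Author": "James Granger", "Date": "1904"}},
}


def effective_defs(key: str) -> Dict[str, dict]:
    """Collect Parameter definitions along the base chain (assumed: nearest wins)."""
    chain = []
    seen = set()
    current: Optional[str] = key
    while current is not None:
        if current in seen:
            raise ValueError(f"circular base reference at '{current}'")
        seen.add(current)
        chain.append(citations[current])
        current = citations[current]["base"]
    merged: Dict[str, dict] = {}
    for entity in reversed(chain):       # most distant base first
        merged.update(entity["defs"])    # builds a new dict; the store is not modified
    return merged


def resolve(key: str) -> Dict[str, Optional[str]]:
    """Apply Parameter settings only after inheritance has completed."""
    defs = effective_defs(key)
    values = citations[key]["values"]
    unknown = set(values) - set(defs)
    if unknown:
        raise KeyError(f"settings for undefined Parameters: {sorted(unknown)}")
    return {name: values.get(name) for name in defs}


print(resolve("cOldNottm"))
```

The merged view exists only as the return value of resolve, so nothing in the stored entities changes and there is nothing to persist by accident.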

It is important to retain a clear view of the distinction
between a Citation and a Resource. As an example, consider UK BMD references.
These might be linked to the defining body, say with something like http://www.gro.gov.uk/, in order to create a
unique source citation. However, if you wanted to be able to pull up the
appropriate census page from some Web site then that would be done via a
corresponding Resource entity.

The simple Dublin Core (see Dublin Core Metadata Initiative)
terms cannot clearly distinguish, say, the title of an article from the title
of a journal containing that article, or provide a clear indication of other
data related to the containing journal such as publication date as distinct from
the article submission date, or the volume and issue numbers. That same page
recommends the use of the OpenURL
(ANSI/NISO standard, Z39.88-2004) ContextObject for representing the context of
a bibliographic citation, although it does not take this to the level of a
hierarchical chain. The OpenURL concept is designed to provide the context of a
citation in a machine-readable form that can be resolved by an unspecified
library or archive. In other words, the Dublin Core recommendation doesn’t cite
a source directly but as a library-independent
hyperlink to content. At best, it constitutes a reference to an indefinite source.

The SemType attribute associates such semantic information
with the individual citation-elements (i.e.
Parameters) but leaves the Parameter names to be chosen independently to
suit the source-type. Other semantic types could be applied using the same
attribute, but with a different namespace.
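
As a rough illustration of why the semantic type is kept separate from the Parameter name, the sketch below finds Parameters by SemType across source-types whose local names differ. The example.org URIs, the Parameter name Masthead, and the lookup table are hypothetical.

```python
from typing import Dict, List, Tuple

# Keys are (source-type URI, Parameter name); every entry here is hypothetical.
SEM_TYPES: Dict[Tuple[str, str], str] = {
    ("http://example.org/source-type/book/", "Publisher"): "DC:publisher",
    ("http://example.org/source-type/book/", "Title"): "DC:title",
    ("http://example.org/source-type/newspaper/", "Masthead"): "DC:title",
}


def params_with_semtype(sem_type: str) -> List[Tuple[str, str]]:
    """Find Parameters by semantic type, regardless of their local names."""
    return sorted(key for key, value in SEM_TYPES.items() if value == sem_type)


print(params_with_semtype("DC:title"))
```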

The STEMMA scheme described here is fully in keeping with
those Dublin Core recommendations but is not specifically tied to them. It allows
each type of source to be represented by a source-type-uri. Parameters can be
applied to build up a citation description for a specific instance of that
source-type. The source-type-uri also acts as a global key for retrieving
localised text for soliciting Parameter values, data-types for validating the
Parameter values, and for interfacing to a citation-template system in order to
generate a formatted string for the user. If the source-type-uri is omitted then
an effective one must be available through inheritance.

Citation Chain

Citations may be linked to describe the provenance of a
source, the provenance of the information itself, where the originals are held,
and any analytical comments. These are known as citation
layers and the associated chain forms part of a hierarchy created through
the use of the <ParentCitationLnk> element.

Note that STEMMA Citation chains do not differentiate
between citing a specific source of information, citing a collection or work
that the information was contained within, or citing a repository or
institution hosting that work or collection — they are all citing something in the more literal sense. They
do not mandate the juxtaposition of definite and indefinite sources,[1]
or the ordering of original and derivative references (see below). Supporting
citation layers avoids duplication and provides a stronger representation
overall.
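
A chain of layers can be pictured as a walk along parent links. The following is a minimal sketch, assuming each Citation holds at most one parent key; the Citation class, the keys, and the layer texts are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Citation:
    key: str
    text: str                      # already-formatted text for this layer (hypothetical)
    parent: Optional[str] = None   # key named by a parent link, if any


def layers(start: str, store: Dict[str, Citation]) -> List[Citation]:
    """Return the chain from the given Citation up to its outermost ancestor."""
    result: List[Citation] = []
    seen = set()
    key: Optional[str] = start
    while key is not None:
        if key in seen:
            raise ValueError(f"cycle in citation chain at '{key}'")
        seen.add(key)
        result.append(store[key])
        key = store[key].parent
    return result


store = {
    "cRepo": Citation("cRepo", "Example Record Office"),
    "cColl": Citation("cColl", "Parish register collection", parent="cRepo"),
    "cEntry": Citation("cEntry", "Baptism entry, 1832", parent="cColl"),
}
print("; ".join(c.text for c in layers("cEntry", store)))
```

Because the repository and collection are separate entities, any number of entry-level Citations can point at them without repeating their details.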

The links between the layers may be characterised using the
Type=’layer-type’ as follows. Note that this doesn’t describe the layers
themselves — which should be obvious
from their content — but rather the relationship between the layers.

These should cope with instances of image derivatives where
the emphasis is placed on the image or the original document. This choice is
covered in detail by Elizabeth Shown Mills at “QuickLesson 19: Layered
Citations Work Like Layered Clothing”, Evidence
Explained: Historical Analysis, Citation & Source Usage (https://www.evidenceexplained.com/content/quicklesson-19-layered-citations-work-layered-clothing
: posted 4 Sep 2014, updated 5 Mar 2016, accessed 4 Apr 2017), under “Online
Records at State-Agency Sites”.

Display Format

STEMMA allows preferred hand-crafted citations to be specified
in the <CitationRef> element (see CITATION_REF).
This is particularly useful when there are shortened forms (e.g. employing
‘hereinafter cited as ...’) which cannot be generated directly from the
Parameters representing the citation-elements. Individual Parameters may also
override their default formatting, say for substituted text in the absence of a
value, or for abbreviated list formatting. Taking the onus off formatting
allows the Parameter settings to be used more for correlation and
interrogation.

Citation entities will require formatting to a given style
and locale before they can be displayed. A later version may allow styles to be
automatically selected from Citation Style Language (CSL) templates — CSL is an open XML-based language
for defining the parameters and formatting for different citation types. Such
styles can be browsed and searched via the Zotero Style Repository, although it
currently has no concept of a URI string which is unfortunate because it would
be a convenient handle to distinguish the templates and applicable source-types
in the repository. A problem with such citation-template schemes is that they
try to format plain textual elements into a simple template, whereas STEMMA
assumes that objects (in the OOP
sense) representing, say, a Person, Place, or Contact can be provided. The
advantage of the STEMMA approach is that the template system can call back on
well-defined methods to obtain a particular style of name, or specific contact
details; otherwise the genealogical software product is assumed to have
intimate knowledge of the specific template.

In the absence of any external formatting support for
citations, or any explicit hand-crafted citations, the <DisplayFormat>
element can also be used as a simple STEMMA-defined citation-template. It
allows a number of language-specific text strings to be defined for different
formatting modes (e.g. full reference note — the default), and these can make use of mark-up and
parameterisation to employ them in multiple scenarios. Although some brief
examples are presented below, a fuller example may be found at: Citation
Template. NB: this template feature is purely declarative and currently
contains no decisional control over the generation of the citation text.
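
The declarative nature of such a template can be sketched with Python's standard string.Template. The $-placeholder syntax, the mode name "short", and the wording of the format strings are assumptions for illustration and should not be read as STEMMA's own format-string syntax.

```python
from string import Template

# Hypothetical format strings keyed by (language, mode).
formats = {
    ("en", "full"): Template("$Author, $Title ($Publisher, $Date)"),
    ("en", "short"): Template("$Author, $Title"),
}

params = {
    "Author": "James Granger",
    "Title": "OLD NOTTINGHAM : Its Streets, People, etc",
    "Publisher": "Nottingham Daily Express Office",
    "Date": "1904",
}


def render(lang: str, mode: str = "full") -> str:
    # Purely declarative: values are dropped into the string with no branching.
    return formats[(lang, mode)].substitute(params)


print(render("en"))
print(render("en", "short"))
```

With no decisional control, a missing value cannot be skipped gracefully; substitute raises KeyError if a placeholder has no matching Parameter.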

Examples

Here’s a simple example of a traditional book citation:

<Citation Key='cOldNottm'>
  <Title>Old Nottingham Notes</Title>
  <URI>http://stemma.parallaxview.co/source-type/book/</URI>
  <Params>
    <Param Name='Author'>James Granger</Param>
    <Param Name='Title'>OLD NOTTINGHAM : Its Streets, People, etc</Param>
    <Param Name='Publisher'>Nottingham Daily Express Office</Param>
    <Param Name='Date'>1904</Param>
    <Param Name='Pages'/>
  </Params>
  <Text>
    Reprinted from the Nottingham Daily Express, October 3rd, 1903 –
    July 9th, 1904.
  </Text>
</Citation>

Whether this generates a long or short reference note depends
on whether the same source is referenced earlier in the current
<Narrative> element.
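
The choice between long and short forms can be sketched as a simple first-use check within one narrative. The function name, the pre-formatted strings, and the wording of the short form are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical pre-formatted forms for each Citation key.
forms: Dict[str, Dict[str, str]] = {
    "cOldNottm": {
        "full": "James Granger, OLD NOTTINGHAM : Its Streets, People, etc "
                "(Nottingham Daily Express Office, 1904)",
        "short": "Granger, OLD NOTTINGHAM",
    },
}


def reference_notes(refs: List[str]) -> List[str]:
    """Emit the full form on first use within one narrative, the short form after."""
    seen: Set[str] = set()
    notes: List[str] = []
    for key in refs:
        mode = "short" if key in seen else "full"
        notes.append(forms[key][mode])
        seen.add(key)
    return notes


for note in reference_notes(["cOldNottm", "cOldNottm"]):
    print(note)
```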

Citations can become very complex since the author will not
only want to cite the source, and the information obtained from that source,
but the context of how it substantiates or contradicts their assertions and
conclusions. This often involves some type of analytical commentary in the
citation. For instance:

Death notices, Ulster Gazette and Daily National Intelligencer, both dated
24 January 1815. Corra Bacon-Foster, "The Story of Kalorama," Records of the
Columbia Historical Society (1910), 108, states Louisa left four children;
three have been identified. In 1810, Charles "Cating" and a female,
both over 44, were enumerated with one male and female aged 26-44; one male and
female aged 16-25; and one male under 10 - suggesting that George, Louisa, and
their first son may have been living in the Catton household. See 1810 U.S.
census, Ulster County, New York, New Paltz, p. 116, line 6; NA micropublication
M252, roll 37.

Each reference note may contain multiple “citation
sentences” (separated by periods), and each of these may contain multiple
layers (separated by semicolons). See Cite
Seeing for a deeper discussion.

[1]
Note that academic citations, such as those in journals, often refer to an
indefinite source. This allows them to be much briefer but it only works
because such sources are published and easily accessible; it makes no
difference where the article or paper was obtained from.