Note

This is a work in progress.

Abstract

The Greek New Testament was copied by hand for almost fifteen centuries until the advent
of mechanized printing provided an alternative means of propagation. Translations into other
languages were produced as well. Some of these — such as the Latin, Coptic, and Syriac
versions — appeared early and thus preserve ancient states of the text. Patristic citations
form another class of evidence that allows varieties of the text to be associated with
particular localities and epochs. Analysis of textual variation allows relationships between
these three classes of witnesses — manuscripts, versions, and patristic citations — to be
explored. This article applies various kinds of analysis to textual variation data collected
from a variety of sources. The analysis results offer complementary views of the textual space
occupied by these venerable witnesses to the New Testament text.

Introduction

As with every widely read work from antiquity, the New Testament exhibits textual
variation introduced by scribes and correctors. Sites where textual variation occurs are
identified by comparing extant witnesses of the text.[1] Differences may be classified as orthographic or substantive. Orthographic
variations are often ignored as they only affect the surface form of a text and not its
meaning. Substantive variations do affect meaning: they are often called
readings or variants. The list of witnesses that
support a particular reading of a particular variation site is known as the reading's
attestation. A list of all readings at a variation site along with the
attestation of each reading is called a variation unit. Critical editions
often present variation units in an apparatus. The present study is based on analysis of data
sets extracted from critical editions, monographs, journal articles, dissertations, or online
databases.

There is an ongoing effort to establish the initial text which stands
behind the range of texts found among surviving witnesses of the New Testament.[2] The most important witnesses for establishing the initial text fall into these
categories:

Greek manuscripts

ancient versions

patristic citations.

Greek manuscripts are the primary witnesses to the text of the New Testament.
Ancient versions are early translations of the Greek text into languages such as Latin,
Coptic, Syriac, and Armenian. It is often possible to establish which Greek reading a version
supports by translating its text at a variation site back into Greek. Patristic citations are
quotations of the scripture by Church Fathers. Which reading was in a Church Father's copy of
the text at a particular variation site can often be discerned if that part of the text is
covered by one of his quotations.

A large proportion of the textual evidence disappeared long ago. Even a comprehensive data
set that includes all readings of all extant witnesses is still a mere sample of what once
existed. In general, the older the copy, the more likely it is to have been lost. This loss of
data presents a fundamental problem: if extant texts do not represent the oldest copies then
the survivors will give a skewed impression of the initial text. Happily, there is a way
forward: if like texts are grouped and more or less accurate representations of the archetypal
texts that gave rise to the groups can be reconstructed then these archetypes have a claim to
being more representative of the ancient text. In addition, the extant copies provide data for
estimating how accurately the copyists practiced their art. Armed with a knowledge of the
number of generations of copies and typical rates of substituting one reading for another per
generation, it is possible to say how much the original text is likely to differ from the
initial text recoverable from extant copies. While raw data that could be used for the purpose
is presented below, I do not propose to reconstruct hypothetical archetypes or estimate rates
of change here. Instead, this article will focus on presentation and analysis of available
data to explore grouping among the extant texts.[3]

Even though much is lost, a stupendous amount of evidence remains. There are many
thousands of manuscripts in Greek, Latin, Syriac, Armenian, and other languages. Patristic
citations are also very numerous. Given such a great cloud of witnesses, it can be difficult
to see where each one stands in relation to the others. Fortunately, various methods of
statistical analysis can be applied to data sets which relate to textual variation in order to
explore relationships among the witnesses.

Analysis might begin from a number of starting points. One suitable place to begin is a
data set derived from a critical apparatus which gives attestations (i.e. lists of witnesses)
in support of readings found at variation sites. In nearly all cases, practical considerations
restrict an apparatus to presenting a sample of extant texts. Results obtained by analysis of
these data sets are therefore provisional because it is always possible that including further
data would produce different results. However, it is reasonable to expect that analysis
results will approximate those that would be obtained if a more comprehensive data set were
analysed provided that the sample is sufficiently large and has been selected without
systematic bias.

The information contained in an apparatus must first be encoded as illustrated by
reference to this entry from the fourth edition of the United Bible Societies'
Greek New Testament (UBS4):

Figure 1. Apparatus entry (Mark 1.1, UBS4)

The data sets presented in this article use a number of encoding conventions. Exotic
characters and superscripts can cause problems when plotting analysis results so witness
identifiers (i.e. sigla) are Romanized and superscripts are replaced by hyphenated sequences
of characters. Apart from these changes, the method of identifying witnesses used by the
source of a data set is usually retained. Be warned, dear reader: this approach is liable to
cause confusion when two sources use different identifiers for the same witness. For example,
Codex Sinaiticus may be identified as Aleph or 01.
Also, the critically established text used in the INTF's Editio Critica
Maior may be referred to as A (for
Ausgangstext), making it easy to confuse with the
A often used to represent Codex Alexandrinus.

When it comes to encoding apparatus entries, the textual states found among the witnesses
can be represented by numerals, letters, or other symbols. In the present example, the first
reading is encoded as 1, the second as 2, and so on. The state of a
witness is classified as undefined and encoded as NA (for not
available) when it is not clear which reading the witness supports. For
manuscripts this may be due to physical damage or because the manuscript does not include the
section of text being examined; for versions, it may not be clear which state of the Greek
text is supported by a back-translation of the version; for patristic citations, the reading
of a Church Father's text may be unclear if the quotations are not exact (e.g. adaptations,
allusions, or quotations from memory) or if different witnesses of the Church Father's text
have different readings. In the present example, a number of versions (Latin, Syriac, Coptic)
and patristic citations (e.g. those of Irenaeus, Ambrose, Chromatius, Jerome, and Augustine)
are treated as undefined because it is not clear which readings they support at this variation site.[4]

Encoded readings are entered into a data matrix which has a row for
every witness and a column for every variation site. The appropriate code is entered at the
cell corresponding to a particular witness and variation site, namely that cell located at the
intersection of the witness row and variation site column. Manuscript correctors are treated
as separate witnesses, as are supplements.

Figure 2. Part of a data matrix (Mark, UBS4)
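The layout of such a data matrix can be sketched in R as follows. This is only an illustration: the witness labels, site labels, and reading codes are invented and are not taken from the UBS4 apparatus.

```r
# Hypothetical witnesses (rows) and variation sites (columns). The codes are
# invented for illustration and are not taken from any apparatus.
witnesses <- c("W1", "W1-c", "W2", "V1")
sites <- c("site.1", "site.2", "site.3")

data_matrix <- matrix(
  c(2,  1, 1,    # W1 (first hand)
    1,  1, 2,    # W1-c (a corrector, treated as a separate witness)
    1,  2, 1,    # W2
    NA, 1, NA),  # V1: a version whose supported reading is often unclear
  nrow = length(witnesses), byrow = TRUE,
  dimnames = list(witnesses, sites)
)

# Number of variation sites at which each witness has a defined state
defined_counts <- rowSums(!is.na(data_matrix))
print(data_matrix)
print(defined_counts)
```

Undefined states are stored as NA so that later steps can skip them when comparing witnesses.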

The next step is to construct a distance matrix which tabulates the
simple matching distance between each pair of witnesses sufficiently
represented in the data set. The simple matching distance between two witnesses is the
proportion of disagreements between them at those variation sites where the textual states of
both are defined. Being a ratio of two pure numbers, this quantity is dimensionless (i.e. has
no unit). It varies from a value of zero for complete agreement to a value of one for no agreement.[5] A witness only qualifies for inclusion in a distance matrix if all distances for
that witness are calculated from at least a minimum number of variation sites. This constraint
is intended to reduce sampling error to a tolerable level. It is enforced by a vetting
algorithm that progressively drops witnesses with the fewest defined variation sites
until all distances in the distance matrix are guaranteed to have been calculated from a
minimum acceptable number of sites. The minimum acceptable number for the distance matrices of
this study is nearly always set at fifteen.[6]

Figure 3. Part of a distance matrix (Mark, UBS4)
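The simple matching distance defined above can be sketched in a few lines of R. The function names and the small data matrix are hypothetical and serve only to illustrate the definition; the scripts that accompany this article may be organised differently.

```r
# Simple matching distance between two witnesses: the proportion of
# disagreements among those sites where the states of both are defined.
simple_matching <- function(x, y) {
  both <- !is.na(x) & !is.na(y)
  if (!any(both)) return(NA_real_)   # no sites in common
  mean(x[both] != y[both])
}

# Distance matrix covering every pair of witnesses (rows of m).
distance_matrix <- function(m) {
  n <- nrow(m)
  d <- matrix(NA_real_, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- simple_matching(m[i, ], m[j, ])
    }
  }
  d
}

# Hypothetical data: three witnesses, five variation sites.
m <- rbind(
  W1 = c(1, 1, 2, 1, NA),
  W2 = c(1, 2, 2, 1, 1),
  W3 = c(2, 2, 1, NA, 1)
)
d <- distance_matrix(m)
print(round(d, 3))
```

In this toy example W1 and W2 are both defined at four sites and disagree at one of them, so their distance is 0.25.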

Computing Environment

Various analytical methods can be applied to a data set derived from a critical apparatus to
explore relationships between witnesses. All of the results presented in this article are
obtained using a statistical computing language called R. The analysis
is performed by means of R scripts written by the author which are
available here. The R program and
additional packages (e.g. cluster, rgl, ape) required
to run the scripts can be installed using instructions provided at the R web site.

Readers are encouraged to use the scripts. There are various ways to run a script once the
R environment is installed. For users who prefer a command line
interface, typing R into a terminal window provides an R
prompt. (It helps to change to the directory which holds the scripts before launching
R.) A command can then be entered in order to run a script. As an
example, the command source("dist.r") typed at the R
prompt causes the dist.r script to
construct a distance matrix from the specified data matrix. Parameters such as paths to input
and output files are specified in the scripts, which users are free to edit.

Data Sets

The data sets analysed in this article derive from various sources. Each source is
assigned an identifier based on the author or party who produced it. A source is often used to
produce data sets for a number of New Testament sections such as individual gospels and
letters. Each analysis result is keyed to the relevant section and source identifier so that
its underlying data set can be identified.

The data sets generally retain the symbols used by their associated sources to represent
New Testament witnesses. Some represent manuscripts by Gregory-Aland numbers (e.g. 01, 02, 03,
044) while others use letters or latinized forms (e.g. Aleph, A, B, Psi). These symbols carry
through to the analysis results. In INTF data, ECM or
A (for Ausgangstext or initial
text) represents the text of the Editio Critica Maior. The
A for Ausgangstext in INTF data sets
should not be confused with the A for Codex Alexandrinus in other data
sets. Also beware of confusing texts when the same letter (e.g. D, E, F, G, H, K, L, P) refers
to different manuscripts in different parts of the New Testament. Abbreviations
UBS, WH, and TR stand for the
texts of the United Bible Societies' Greek New Testament, Westcott and
Hort's New Testament in the Original Greek, and the
Textus Receptus, respectively. Maj,
Byz, and Lect stand for majority, Byzantine, and
lectionary texts, respectively. The relevant printed editions should be consulted for
explanations of what these group symbols represent.

A source may be in the form of apparatus entries, tables of percentage agreement, or lists
of pairwise proportional agreement. If the source is an apparatus then it is used to construct
one data matrix per desired section. Each data matrix includes those witnesses and variation
sites covered by the apparatus, using symbols such as numerals or letters to encode reported
textual states (i.e. readings). A distance matrix is then constructed from the data matrix. If
the source only reports percentage or proportional agreement between witnesses then a distance
matrix is constructed directly from the agreement data and no data matrix is produced.
Distances are usually specified to three decimal places regardless of whether this level of
precision is warranted.
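Because the simple matching distance is the proportion of disagreements, a table of percentage agreement converts to distances by taking the complement. A minimal sketch in R, using invented agreement figures for three hypothetical witnesses:

```r
# Hypothetical table of percentage agreement; the figures are invented.
labels <- c("W1", "W2", "W3")
agreement <- matrix(
  c(100,  82.5, 61.0,
    82.5, 100,  58.3,
    61.0, 58.3, 100),
  nrow = 3, byrow = TRUE, dimnames = list(labels, labels)
)

# Distance is the complement of proportional agreement,
# reported to three decimal places.
distance <- round(1 - agreement / 100, 3)
print(distance)
```

For a source that reports proportional rather than percentage agreement, the division by 100 would be omitted.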

Analysis cannot proceed if a distance matrix has missing entries. This problem can be
avoided by manually producing multiple distance matrices from the same source data, each
omitting a particular witness whose inclusion would create an empty cell. This is done for a
number of the distance matrices presented below, including Brooks' table for John (where there
is a missing cell for C and Old Latin j) and Fee's table for John 1-8 (which lacks cells where
the first hand and corrector intersect for P66 and Aleph).
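The manual step of producing complete distance matrices from a table with one empty cell might be sketched in R as follows. The witness labels and distances are hypothetical; the point is only that omitting either member of the affected pair removes the gap.

```r
# Hypothetical distance matrix with one missing cell, where a first hand (X)
# and its corrector (X-c) were not compared in the source table.
witnesses <- c("X", "X-c", "Y", "Z")
d <- matrix(
  c(0,    NA,   0.30, 0.40,
    NA,   0,    0.25, 0.45,
    0.30, 0.25, 0,    0.50,
    0.40, 0.45, 0.50, 0),
  nrow = 4, byrow = TRUE, dimnames = list(witnesses, witnesses)
)

# Locate the missing cell (upper triangle only, as the matrix is symmetric).
gap <- which(is.na(d) & upper.tri(d), arr.ind = TRUE)
pair <- c(rownames(d)[gap[1, "row"]], colnames(d)[gap[1, "col"]])

# One complete matrix per omitted witness.
complete <- lapply(pair, function(w) d[rownames(d) != w, colnames(d) != w])
names(complete) <- paste0("without_", pair)
print(complete)
```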

Distance matrices are normally obtained by applying the default vetting algorithm, which
drops the least defined witness of each pair used to calculate a distance until all distances
are calculated from at least the minimum acceptable number of variation sites where both
witnesses are defined, which is normally fifteen. In some cases, an alternative approach is used which forces a
particular witness to be retained provided it has enough defined variation sites at the
outset. Examples include UBS2 distance matrices for Mark, John, and Acts where P45, Ephraemi
Rescriptus (C), and Sinaiticus (Aleph) have been retained due to their importance.
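One possible reading of the vetting procedure is sketched below in R. It is an assumption-laden illustration, not the script used for this study: the function name `vet`, its `keep` argument, and the rule for choosing which witness to drop are all hypothetical.

```r
# Drop poorly defined witnesses until every pair shares at least
# `min_sites` variation sites at which both are defined.
# `keep` names witnesses that must never be dropped (hypothetical option).
vet <- function(m, min_sites = 15, keep = character(0)) {
  repeat {
    defined <- !is.na(m)
    shared <- tcrossprod(defined * 1)   # co-defined site counts for each pair
    bad <- which(shared < min_sites, arr.ind = TRUE)
    if (nrow(bad) == 0) break
    # Witnesses involved in an offending pair, excluding retained ones
    candidates <- setdiff(rownames(m)[unique(as.vector(bad))], keep)
    if (length(candidates) == 0) {
      stop("minimum cannot be met without dropping a retained witness")
    }
    # Drop the candidate with the fewest defined sites, then re-check
    counts <- rowSums(defined)[candidates]
    m <- m[rownames(m) != candidates[which.min(counts)], , drop = FALSE]
  }
  m
}

# Hypothetical data; a minimum of three sites keeps the example small.
m <- rbind(
  W1 = c(1, 1, 2, 1, 2),
  W2 = c(1, 2, 2, 1, 1),
  W3 = c(NA, NA, 1, NA, 1)
)
vetted <- vet(m, min_sites = 3)
print(rownames(vetted))
```

Here W3 is defined at only two sites, so it is dropped and the remaining pair passes the check.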

It is helpful to know what analysis results look like when there is no clustering among
the objects being analysed. (Generic terms such as object,
observation, case, or item
may be used for the things being compared when they are not necessarily New Testament
witnesses.) We have a natural facility for recognising group structure but are also prone to
mistake a purely random distribution of items for a cluster. One way to avoid this kind of
error is to be familiar with analysis results produced from a data set that has no group
structure. With this purpose in mind, a control data set may be generated which is analogous
to its model data set in various respects (e.g. number of objects, number of variables, mean
distance between objects) but has no actual clustering among its objects.

A control data set is generated by performing c trials to randomly select one
of two possible states (1 and 2) then repeating this r
times to produce a data matrix with r rows of objects and c columns
of variables. The generator aims to produce objects which have a mean distance of
d between them. Values for r, c, and d
are derived from the model: r is the number of objects in the model distance
matrix; c is the rounded mean number of variables in the objects from which the
model distance matrix was calculated; and d is the mean of distances in the model
distance matrix. The control data matrix is then used to calculate a control distance matrix
which has the same number of objects as the model and approximately the same mean distance
between objects.[7]
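A generator of this kind might be sketched in R as follows. The approach is an assumption rather than a description of the actual script: if state 1 is chosen with probability p at every trial, two independently generated objects disagree at a given variable with probability 2p(1 - p), so p can be solved from the target mean distance d (which must not exceed 0.5). The function name and the parameter values in the usage example are hypothetical.

```r
# Generate a control data matrix of two-state objects whose expected
# pairwise simple matching distance is `mean_dist`.
generate_control <- function(n_objects, n_vars, mean_dist, seed = NULL) {
  stopifnot(mean_dist >= 0, mean_dist <= 0.5)
  if (!is.null(seed)) set.seed(seed)
  # Solve 2 * p * (1 - p) = mean_dist for the probability p of state 1.
  p <- (1 - sqrt(1 - 2 * mean_dist)) / 2
  states <- sample(c(1, 2), size = n_objects * n_vars, replace = TRUE,
                   prob = c(p, 1 - p))
  matrix(states, nrow = n_objects, ncol = n_vars,
         dimnames = list(paste0("R", seq_len(n_objects)), NULL))
}

# Hypothetical parameters: 65 objects, 120 variables, mean distance 0.46.
control <- generate_control(65, 120, 0.46, seed = 1)

# With states coded 1 and 2, the Manhattan distance between two rows
# counts their disagreements, so dividing by the number of variables
# gives the simple matching distance.
control_mean <- mean(dist(control, method = "manhattan")) / ncol(control)
print(round(control_mean, 3))
```

The observed mean distance of the control will only approximate the target because the states are drawn at random.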

The binomial distribution predicts the range of distances expected to occur between pairs
of objects generated in this way. A 95% confidence interval is the range of distances expected
to occur for 95% of randomly generated cases. Only 5% of distances between two randomly
generated objects fall outside the upper and lower limits defined by this interval. A distance
outside this range, either less or more, is statistically significant in the sense that it is
unlikely to happen by chance (though there is a 5% chance it will). A distance outside the
normal range defined by the 95% confidence interval indicates an
adjacent or opposite relationship between two
objects: adjacent if the distance is less than normal and opposite if greater.[8]

While distances outside the normal range are unlikely to occur by chance, a distance
inside that range does not necessarily imply lack of relationship between two objects: a
relationship between the two may exist but it is not possible to say so with confidence. The
relative size of the normal range contracts as the number of places compared increases so a
distance which is not statistically significant in one data set may be statistically
significant in another which includes more variation sites.
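The normal range and its contraction with larger numbers of variation sites can be illustrated using R's binomial quantile function. This is a sketch under stated assumptions: the function names are hypothetical, the number of sites and mean distance are invented, and the exact treatment of the interval's end points in the actual analysis may differ.

```r
# Range of distances expected between two randomly generated objects,
# given the number of sites compared and the mean distance.
normal_range <- function(n_sites, mean_dist, level = 0.95) {
  alpha <- 1 - level
  qbinom(c(alpha / 2, 1 - alpha / 2), size = n_sites, prob = mean_dist) / n_sites
}

# Label a distance relative to a normal range.
classify <- function(distance, range) {
  ifelse(distance < range[1], "adjacent",
         ifelse(distance > range[2], "opposite", "not significant"))
}

# Hypothetical parameters: mean distance 0.46 at 120 and at 480 sites.
range_120 <- normal_range(120, 0.46)
range_480 <- normal_range(480, 0.46)
print(range_120)
print(range_480)

# A distance can be insignificant in the smaller data set yet
# significant in the larger one.
labels <- classify(c(0.10, 0.40, 0.90), range_120)
print(labels)
print(classify(0.40, range_480))
```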

The following table presents the data sets and their sources. Links in the table provide
access to data and distance matrices which are formatted as comma-separated
values (CSV) files so that they can be downloaded and imported into a spreadsheet
program. A distance matrix is always provided but a data matrix is only included if one has
been constructed. If there is no data matrix then NA for not
available is entered in the relevant column.

Tables of percentage agreement for the Gospel of John and Paul's Letters
from Arthur Cunningham's “New Testament Text of St. Cyril of Alexandria,”
421-2 and 753. Associated tables of counts are on pages 423-4 and 754.

Data matrices for Acts, the General Letters, and Paul's Letters from Gerald
Donker's Text of the Apostolos in Athanasius of Alexandria. Gerald
Donker and the SBL have made this data available through an archive located at sbl-site.org/assets/pdfs/pubs/Donker/Athanasius.zip. May their respective tribes
increase!

Data used by Jared Anderson for his ThM thesis, “Analysis of the Fourth Gospel in
the Writings of Origen.” The data was originally collected by Bart D. Ehrman,
Gordon D. Fee, and Michael W. Holmes for their Text of the Fourth Gospel in the
Writings of Origen. (Bruce Morrill did the statistical analysis presented in
that volume.) A revised version of Anderson's thesis will be published in SBL's New
Testament in the Greek Fathers series.

Tables of percentage agreement from three articles by Gordon Fee: (1) a
table covering Luke 10 from “The Myth of Early Textual Recension in
Alexandria”; (2) tables covering John 1-8, John 4, and John 9 from “Codex
Sinaiticus in the Gospel of John”; (3) another table covering John 4 but
including patristic data from “The Text of John in Origen and Cyril of
Alexandria.” Two distance matrices are produced for each table of percentage
agreement with a blank entry for agreement between the first hand and corrector of a
manuscript.

Tables of percentage agreement from Larry Hurtado's
Text-Critical Methodology and the Pre-Caesarean Text. There is
one table for each of the first fourteen chapters of the Gospel of Mark, one for Mark
15.1-16.8, and another for places where P45 is legible. Data from an augmented version of
Hurtado's P45 table is presented below in the Mullen source
entry.

Distance matrices made from tables located at http://intf.uni-muenster.de/PPApparatus/. These present data related to
Strutwolf and Wachtel (eds.), Novum Testamentum Graecum: Editio Critica Maior:
Parallel Pericopes. The INTF has generously provided open access to this
data.

Data extracted from Roderic Mullen's The New Testament Text of
Cyril of Jerusalem. Two data sets have been prepared for the Gospel of Mark:
one is a data matrix based on citations isolated by Mullen (112-7); the other is a
distance matrix corresponding to a table of percentage agreement which relates to the
parts of Mark's Gospel covered by P45 (41). Mullen based the latter on data compiled by
Larry Hurtado then added other texts such as Family 1, 28, 157, and 700 (40, n. 81).

Tables of percentage agreement compiled from the apparatus of the second
edition of the UBS Greek New Testament by Maurice A. Robinson. The
tables were originally presented in Robinson's “Determination of Textual
Relationships” and “Textual Interrelationships.” They were
transcribed by Claire Hilliard and Kay Smith.

Data matrices constructed from the apparatus of the fourth edition of the
UBS Greek New Testament. (The UBS4 apparatus includes minuscule
2427, which is now regarded as a forgery. The data for this manuscript has been retained
for the sake of interest; dropping it would have little effect on analysis results.)
Richard Mallett constructed the matrices for Mark, 2 Corinthians, and Revelation. A
substantial part of the matrix for Matthew was encoded by Mark Spitsbergen. (Only the
first fourteen chapters of Matthew are presently covered.) In some cases, the evidence for
a number of similar witnesses is consolidated to produce a group reading. For example, the
majority reading of vg-cl, vg-st, and vg-ww is counted as the reading of the Vulgate (vg)
in 1 John. Matrices for Mark derived as controls or by excluding readings found in
representatives of five textual groups (B = Vaticanus, Byz = Byzantine, it-ff-2 = Old
Latin ff2, f-1 = Family 1, vg = Jerome's Vulgate) are included
as well. PAM analysis (see below) was used to select the representatives. Variants were
excluded by script mask.R, which for each witness
drops (by substituting NA) those readings that match the representative
text.

Tables of proportional agreement from Tommy Wasserman's “Patmos Family of
New Testament MSS” covering Matt 19.13-26, Mark 11.15-26, Luke
13.34-14.11, John 6.60-7.1, and the Pericope Adulterae
(usually John 7.53-8.11). The underlying collations used a reconstructed text to represent
Family Π in Matt 19.13-26 and the Pericope Adulterae, which
text is labelled f-Pi in the analysis results.

a control comprised of
randomly generated objects which are by definition unrelated.

Clusters may be isolated by inspecting a CMDS or NJ plot, cutting a DC dendrogram, or
producing a partition using PAM analysis. Similar objects tend to be similar distances from a
reference object, near each other in a CMDS plot, in the same branch of DC and NJ plots, and
in the same group of a PAM partition. The more eccentric an object when compared to others in
the data set, the more isolated it will appear in analysis results. If an object is
mixed, being comprised of a mixture of states characteristic of
differing groups, then a CMDS result will locate it between the relevant groups,
proportionally closer to those whose characteristics it most often contains. In DC, NJ, and
PAM analysis, a slight change in the distance matrix can cause a mixed object to leap from one
branch, cluster, or group to another.

The respective analysis results are often but not always consistent. If all of the
analysis results point to the same conclusion with respect to implied clustering then that can
be taken as a firm result; if they differ then each result needs to be handled with due
caution. The distance matrix remains the final arbiter when the affiliation of an object is
not clearly indicated by concurrence of analysis results. When the classification of an object
is uncertain, further information may produce a more definite result. However, if an object
has a mixed nature then it may remain difficult to classify as anything but a mixture. A mixed
object will tend to be isolated unless other objects happen to have similar mixtures of
states.

One aim of New Testament textual research is to recover the initial text, namely the
common ancestor of extant New Testament texts. Some aspects of the results produced by the
analysis modes used in this study can be interpreted in terms of temporal development. In
particular, there may be points of contact between the family tree of New Testament texts
and the tree-like structures produced by divisive clustering and neighbour joining. However,
these tree-like analysis results do not provide unequivocal guidance on the location of the
initial text. Any node (i.e. junction) or leaf (i.e. terminal) of a DC or NJ tree could be
closest to the initial text. If one were to make a string model of such a tree, with knots at
every node tying together string segments of the appropriate lengths, the model could be
picked up at any node or leaf. The point being held would become a new tree root so there
would be as many possible trees as the number of nodes and leaves. The trick is to decide
where the root of the tree is located, a topic which will occupy the field of New Testament
textual research for some time to come. The Coherence-based Genealogical Method (CBGM)
developed by the INTF can be used to investigate whether the witnesses in one branch are
closer to the initial text than those in another.[9] Phylogenetic techniques such as those described in Spencer, Wachtel and Howe's
“Greek Vorlage of the Syra
Harclensis” can also be used to investigate the priority of texts. Yet
another possibility is to see where texts reconstructed from early patristic citations are
located in trees produced by DC and NJ analysis.

Ranked Distances

Ranking involves selecting a reference object then extracting its
row of the distance matrix. Entries in that row are then ordered by increasing distance from
the reference. As an example, the following ranks witnesses in the UBS4 data set for the
Gospel of Mark by distance from minuscule 205, which is a member of Family 1. The reference
witness (i.e. 205) is a distance of zero from itself and would stand at the head of the list
if included.[10]

Statistical analysis shows what range of distances is expected to occur between
artificial objects comprised of randomly selected states. Distances in this normal range
(i.e. those for 1243, slav, ..., Psi) are marked by asterisks to show they are not
statistically significant. Some texts (i.e. f-1, Lect, ..., 597) have an adjacent
relationship to minuscule 205 while others (i.e. Delta, cop-bo, ..., D) are opposite.
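Ranking can be sketched in R as follows. The witness labels and distances are hypothetical; the normal range [0.374, 0.553] is the one quoted in this section for the Mark UBS4 control and is reused here purely for illustration.

```r
# Rank the other objects by increasing distance from a reference object.
rank_distances <- function(d, reference) {
  sort(d[reference, colnames(d) != reference])
}

# Hypothetical distance matrix.
labels <- c("Ref", "W1", "W2", "W3")
d <- matrix(
  c(0,    0.70, 0.10, 0.45,
    0.70, 0,    0.65, 0.50,
    0.10, 0.65, 0,    0.40,
    0.45, 0.50, 0.40, 0),
  nrow = 4, byrow = TRUE, dimnames = list(labels, labels)
)

ranked <- rank_distances(d, "Ref")

# Mark distances inside the normal range with an asterisk.
normal <- c(0.374, 0.553)
marks <- ifelse(ranked >= normal[1] & ranked <= normal[2], "*", "")
print(data.frame(witness = names(ranked), distance = unname(ranked),
                 mark = unname(marks)))
```

In this toy example W2 is adjacent to the reference, W3 falls in the normal range, and W1 is opposite.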

A ranked list of distances from one member of the control data set shows what to expect
for unrelated objects. The 95% confidence interval calculated using the binomial
distribution with parameters derived from the model data set has lower and upper bounds of
0.374 and 0.553, respectively. (An interval of this kind can be compactly written as [0.374,
0.553].) As can be seen, distances in the control data set tend to fall within these bounds.[11]

A list of ranked distances can be produced for every object in a data set. While
clustering among members of the data set might be discerned from lists of this kind, the
other analysis modes are better suited to discovering inherent group structure.

Classical Multidimensional Scaling (CMDS)

Classical multidimensional scaling finds the set of object coordinates which best
reproduces the actual distances between objects in the distance matrix. A plot of these
coordinates shows how the objects are disposed with respect to one another when all
distances are considered. This study refers to such a plot as a map and
uses the term textual space for the space obtained when the objects are
textual witnesses.

Achieving a perfect spatial representation of a distance matrix may require any number
of dimensions up to one less than the number of objects. This presents a problem when a
large number of objects is being examined because our spatial perception is
three-dimensional. Fortunately, three dimensions are often sufficient to achieve a reasonably
good approximation to the actual situation. CMDS analysis produces a coefficient called the
proportion of variance which indicates how much of the information
contained in a distance matrix is explained by the associated map. This coefficient ranges
from a value of zero to one, with a value of one indicating that the map is a perfect
representation of the entire set of actual distances.
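A three-dimensional CMDS solution can be obtained with R's built-in `cmdscale` function, as in the sketch below. The data are randomly generated for illustration (two artificial groups of five objects). The proportion of variance is computed here as the sum of the three retained eigenvalues divided by the sum of all positive eigenvalues; that formula is an assumption and may not match the calculation used for the figures reported in this article.

```r
set.seed(1)
n_sites <- 40

# Copy a two-state text, switching the state at a random ~10% of sites.
flip <- function(x, rate) {
  i <- runif(length(x)) < rate
  x[i] <- 3 - x[i]
  x
}

# Two artificial groups of five objects, each derived from its own archetype.
m <- rbind(t(replicate(5, flip(rep(1, n_sites), 0.1))),
           t(replicate(5, flip(rep(2, n_sites), 0.1))))
rownames(m) <- paste0("W", 1:10)

# Simple matching distances (no undefined states in this toy data).
d <- dist(m, method = "manhattan") / n_sites

# Classical multidimensional scaling in three dimensions.
fit <- cmdscale(d, k = 3, eig = TRUE)
coords <- fit$points                      # one row of coordinates per object
positive <- fit$eig[fit$eig > 0]
prop_var <- sum(fit$eig[1:3]) / sum(positive)

print(round(coords, 3))
print(round(prop_var, 2))
```

The coordinates in `coords` can then be passed to a plotting function to draw the map.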

The CMDS map obtained from the UBS4 data set for Mark's Gospel shows that the textual
space formed by New Testament witnesses has structure. The associated proportion of variance
figure is 0.51, meaning that about half of the entire distance information is captured in
the plot.[12]

Figure 4. CMDS (Mark, UBS4)

The galactic imagery Eldon J. Epp uses to describe text-types seems apt for the clusters
evident in this analysis result:

A text-type is not a closely concentrated entity with rigid boundaries, but it is more
like a galaxy — with a compact nucleus and additional but less closely related members
which range out from the nucleus toward the perimeter. An obvious problem is how to
determine when the outer limits of those more remote, accompanying members have been
reached for one text-type and where the next begins.[13]

A term such as group, cluster, or nucleus might be used to describe a
local maximum in the density of objects within a CMDS map. A line which joins two items
might be called a trajectory, and a region between groups where there
is a higher than usual concentration of witnesses might be called a
stream or corridor.[14]

CMDS analysis of the control distance matrix produces the following result:

Figure 5. CMDS (Mark, UBS4, control)

Any appearance of clustering in the control map is illusory: its objects are by
definition unrelated, having been randomly generated. There are various differences between
the model and control maps. The model map has an irregular shape while the control map is
globular. Another difference relates to the respective map diameters: the volume enclosed by
map axes is greater for the model than the control. This indicates that dispersion among New
Testament texts is greater than would be expected if those texts resulted from random
selection among alternative readings. Yet another difference is the proportion of variance
figures for the model and control maps, which are respectively 0.51 and 0.16. The
dimensionality of the New Testament distance data is lower than for the control data, making
it easier to squeeze into only three dimensions.

Divisive Clustering (DC)

Divisive clustering begins with a single cluster and ends with individual objects. The
R program documentation describes the clustering algorithm as follows:[15]

At each stage, the cluster with the largest diameter is selected. (The diameter of a
cluster is the largest dissimilarity between any two of its observations.) To divide the
selected cluster, the algorithm first looks for its most disparate observation (i.e.,
which has the largest average dissimilarity to the other observations of the selected
cluster). This observation initiates the "splinter group". In subsequent steps, the
algorithm reassigns observations that are closer to the "splinter group" than to the "old
party". The result is a division of the selected cluster into two new clusters.

This type of analysis produces a dendrogram which shows the
“heights” at which clusters divide into sub-clusters. A divisive
coefficient which measures the amount of clustering is presented as well. The
value of this coefficient ranges from zero to one with larger values indicating a greater
degree of clustering. A DC dendrogram does not necessarily reflect the family tree of
objects in the underlying data set. Instead, it merely shows a reasonable way to
progressively subdivide an all-encompassing cluster until every sub-cluster is comprised of
a single object.[16]

Figure 6. DC (Mark, UBS4)
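The algorithm quoted above is the one implemented by the `diana` function of the cluster package. A minimal sketch of its use follows, with an artificial data set of two groups of four objects; the helper `make_group` and the data are hypothetical.

```r
library(cluster)

# Build a group of n objects that share a base state at every site,
# each differing from the base at one site of its own.
make_group <- function(base, n, offset, n_sites) {
  g <- matrix(base, nrow = n, ncol = n_sites)
  for (i in seq_len(n)) g[i, offset + i] <- 3 - base
  g
}

m <- rbind(make_group(1, 4, 0, 20), make_group(2, 4, 10, 20))
rownames(m) <- c(paste0("A", 1:4), paste0("B", 1:4))

# Simple matching distances (no undefined states in this toy data).
d <- dist(m, method = "manhattan") / ncol(m)

# Divisive clustering and the associated divisive coefficient.
dc <- diana(d)
divisive_coefficient <- dc$dc
print(round(divisive_coefficient, 2))

# pltree(dc) would draw the dendrogram.
```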

The vocabulary of tree structures is useful when discussing DC dendrograms. A branching
point is called a node, each structure which descends from a node is
called a branch, and terminals are called leaves.
The dendrograms produced by analysing New Testament data have a self-similar character
where, apart from scale, smaller parts have the same appearance as larger parts. Each branch
contains its own sub-branches, unless terminated by leaves (i.e. individual
witnesses).

A partition based on a DC dendrogram is obtained by means of a horizontal line which
cuts across the dendrogram at some height to produce a set of separate branches. One
possible height to cut a DC dendrogram is the upper critical limit of distances. Such a
large distance is seldom encountered among unrelated objects. Cutting at the upper critical
limit produces the following partition of the model data set.[17]

Performing DC analysis on the control distance matrix produces this dendrogram:

Figure 7. DC (Mark, UBS4, control)

The model and control dendrograms seem quite similar at first glance although there are
important differences: nearly all of the branching heights in the control dendrogram are in
the normal range [0.374, 0.553], and the divisive coefficient for the model (0.74) is much
larger than for the control (0.33).

The objects in the control can be grouped even though it is pointless to do so: if
nearly all distances between objects fall within the normal range then partitioning may well
be futile. In the present case, group sizes are more uniform for the control than for the
model, although there is no reason why a data set with actual groups cannot have uniform
group sizes.

Table 6. DC partition (Mark, UBS4, control, upper critical limit)

Group no.   Members
1           R1 R5 R11 R15 R20 R22 R23 R25 R27 R32 R38 R50 R52 R54 R57 R64
2           R2 R3 R4 R10 R18 R30 R34 R41 R44 R47 R51 R62
3           R6 R7 R12 R17 R19 R29 R33 R39 R40 R48 R49 R58 R65
4           R8 R13 R14 R28 R31 R36 R37 R43 R45 R53 R55 R59
5           R9 R16 R21 R24 R26 R42 R46 R56 R60 R61
6           R35 R63

Neighbour Joining (NJ)

Neighbour joining (NJ) is an iterative process that begins with a starlike tree. A pair
of neighbours is chosen at every step, being that pair of objects which gives the smallest
sum of branch lengths. A node is then inserted between this pair, which node is regarded as
a single object for subsequent steps. The procedure seeks to find the minimum-evolution
tree, being that tree which most economically accounts for the observed set of distances
between objects. While the method “produces a unique final tree under the principle of
minimum evolution,” it does not always produce the minimum-evolution tree. However,
computer simulations show that it “is quite efficient in obtaining the correct tree
topology.”[18]
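The first step of the procedure can be illustrated with a short sketch. It uses the widely used Q-criterion form of the pair-selection rule, which is generally regarded as equivalent to choosing the pair that gives the smallest sum of branch lengths; that substitution, the function name `first_join`, and the toy five-object distance matrix are assumptions made for this example rather than details taken from the study.

```python
def first_join(labels, dist):
    """Pick the first pair of neighbours and their branch lengths to the new node."""
    n = len(labels)
    if n < 3:
        raise ValueError("neighbour joining needs at least three objects")

    # Sum of distances from each object to all the others.
    total = {a: sum(dist[a][b] for b in labels if b != a) for a in labels}

    best_pair, best_q = None, None
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            q = (n - 2) * dist[a][b] - total[a] - total[b]
            if best_q is None or q < best_q:
                best_pair, best_q = (a, b), q

    a, b = best_pair
    # Lengths of the two branches from the joined pair to the inserted node.
    len_a = dist[a][b] / 2 + (total[a] - total[b]) / (2 * (n - 2))
    len_b = dist[a][b] - len_a
    return best_pair, best_q, (len_a, len_b)


# Toy distance matrix for five made-up objects.
labels = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
dist = {x: dict(zip(labels, row)) for x, row in zip(labels, rows)}

pair, q, lengths = first_join(labels, dist)
print(pair, q, lengths)
```

In a complete run the joined pair is replaced by the new node, distances from that node to the remaining objects are recomputed, and the step is repeated until the tree is fully resolved.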

As with DC dendrograms, the vocabulary of tree structures is useful for discussing NJ
analysis results. The NJ procedure produces an unrooted tree, meaning that any node or
terminal in the result could be closest to the common ancestor of the entire tree.

Applying the NJ procedure to the model distance matrix produces a tree whose branches
correspond to clusters seen in the CMDS and DC results obtained from the same distance matrix:[19]

Figure 8. NJ (Mark, UBS4)

The tree obtained from the control distance matrix retains the NJ algorithm's initial
starlike structure. This shows what kind of topology (i.e. shape) to expect for an NJ result
derived from a data set comprised of unrelated objects. The marked difference from the model
result is another indication that clustering exists among texts of Mark's Gospel.

Figure 9. NJ (Mark, UBS4, control)

Partitioning Around Medoids (PAM)

Partitioning around medoids (PAM) builds clusters around representative objects called
medoids. The program documentation provides this description:[20]

The ‘pam’-algorithm is based on the search for ‘k’ representative objects or medoids
among the observations of the dataset. These observations should represent the structure
of the data. After finding a set of ‘k’ medoids, ‘k’ clusters are constructed by assigning
each observation to the nearest medoid. The goal is to find ‘k’ representative objects
which minimize the sum of the dissimilarities of the observations to their closest
representative object.
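The objective in this description can be made concrete with a small sketch. Note the assumption: the actual PAM algorithm searches for medoids by means of build and swap phases, whereas the code below simply tries every possible set of medoids, which is feasible only for tiny data sets. The function names and the six made-up objects placed along a line are hypothetical.

```python
from itertools import combinations


def assign_to_medoids(objects, medoids, dist):
    """Assign each object to its nearest medoid and total the dissimilarities."""
    clusters = {m: [] for m in medoids}
    cost = 0.0
    for obj in objects:
        nearest = min(medoids, key=lambda m: dist[obj][m])
        clusters[nearest].append(obj)
        cost += dist[obj][nearest]
    return clusters, cost


def best_medoids(objects, k, dist):
    """Exhaustive search for the k medoids with the smallest total cost."""
    return min(combinations(objects, k),
               key=lambda ms: assign_to_medoids(objects, ms, dist)[1])


# Six made-up objects positioned on a line; dissimilarity is the gap between them.
position = {"A": 0, "B": 1, "C": 2, "D": 10, "E": 11, "F": 13}
objects = list(position)
dist = {a: {b: abs(position[a] - position[b]) for b in objects} for a in objects}

medoids = best_medoids(objects, 2, dist)
clusters, cost = assign_to_medoids(objects, medoids, dist)
print(medoids, clusters, cost)
```

Each medoid belongs to its own cluster at zero cost, so the total reflects only how far the other members lie from their representative.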

PAM analysis can be used to divide a data set into any number of clusters between two
and the number of cases in the data set. The standard procedure in this study will be to
partition data sets into two, three, four, five, and twelve clusters. The progression from
two to five shows which groups separate first, while the twelve-way partition is useful for
revealing core group members.[21]

Brackets mark the medoid of each group. A medoid has the minimum
mean distance to other group members and is the most central item in groups of three or more.
For two-member groups, the PAM algorithm chooses one member as the medoid.

Note

This study uses the bracketed medoid identifier as a label for the associated group.
For example, [vg] refers to the first group in the above two-way partition.

A singleton is a solitary item which forms its own group. It is
isolated, having no close relatives within the data set. Singletons are listed under a
separate heading, and the medoid of a singleton group is the sole member itself. The total
number of groups in a partition equals the number of singletons plus the number of
multiple-member groups.

Not all members of a group need be a good fit. PAM analysis calculates a statistic
called the silhouette width for each object in the data set being
partitioned into a chosen number of groups. Its value ranges from -1 to +1: the closer it is
to +1, the better the associated case fits into its assigned group; the closer it is to -1,
the worse the fit. A negative silhouette width indicates that the affected case, like a
square peg in a round hole, is not well suited to its assigned place. The last column in the
table lists witnesses with negative silhouette widths in order of decreasing value, so the
worst classified witnesses lie farthest to the right. A poor fit
may indicate that a witness has a mixed text or that the chosen number of groups is too
small for a text to be grouped with like texts alone.
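A short sketch may help to show how such a statistic behaves. It follows the usual definition of the silhouette width, (b - a) / max(a, b), where a is an object's mean distance to the other members of its own group and b is its smallest mean distance to the members of any other group; singletons are given a width of zero by convention. The function name, group labels, and toy data are assumptions made for the example.

```python
from statistics import mean


def silhouette_widths(clusters, dist):
    """Silhouette width of every object, given a partition into two or more groups."""
    widths = {}
    for name, members in clusters.items():
        for obj in members:
            own = [dist[obj][o] for o in members if o != obj]
            if not own:
                widths[obj] = 0.0  # convention for a singleton group
                continue
            a = mean(own)
            b = min(mean(dist[obj][o] for o in other)
                    for other_name, other in clusters.items()
                    if other_name != name)
            widths[obj] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return widths


# Made-up objects on a line; D is deliberately placed in the wrong group.
position = {"A": 0, "B": 1, "C": 2, "D": 10, "E": 11, "F": 13}
dist = {a: {b: abs(position[a] - position[b]) for b in position} for a in position}
clusters = {"left": ["A", "B", "C", "D"], "right": ["E", "F"]}

widths = silhouette_widths(clusters, dist)
poor_fits = sorted((w for w in widths if widths[w] < 0), key=widths.get, reverse=True)
mean_width = mean(widths.values())
print(widths, poor_fits, mean_width)
```

Here the misplaced object D receives a negative width, which is the kind of signal used to flag poorly classified witnesses. Averaging the widths over all objects gives a single figure for the partition as a whole.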

As a data set is partitioned into larger numbers of groups, parent
groups tend to spawn child groups while themselves contracting into
narrower, more coherent groups. Group [Byz] is an example: as the same data set is
partitioned into more and more groups, this group contributes items to various other groups
while retaining a core membership. Partitioning a data set into a large number of groups
reveals coherent cores comprised of close confederates.

Adding the partition's number of groups to the group label produces a more specific
identifier. For example, [Byz] (3) refers to the group with medoid Byz in a three-way
partition while [Byz] (12) refers to the group with medoid Byz in a twelve-way partition.
Corresponding groups such as [Byz] (3) and [Byz] (12) are often produced when the same data
set is divided into different numbers of parts. However, the medoids of such groups are not
necessarily the same. Adding or subtracting even a single member can cause the medoid of a
group to change. Consequently, correspondence must be established on the basis of shared
membership, not common medoids. If groups from different partitions have the same core
membership but differing medoids then descendant groups can be labelled by chaining the
respective medoids together. To give an example from the table above, [Psi] (3) and [B] (5)
share members but their medoids differ. One might label the subgroup as [Psi-B] (5) to
indicate the connection with the supergroup from which its members are drawn.

Some numbers of groups are more suitable than others. Plotting a statistic called the
mean silhouette width against each possible number of groups
indicates which numbers of groups are more natural for the data set. The plot for the model
data set indicates that three, six, eleven, and twenty-four are among the more preferable
numbers of groups.[22]

Figure 10. Mean silhouette width versus number of groups (Mark, UBS4)

The MSW plot for the control data set also has a number of peaks even though that data
set has no actual groups.

Comparing the model and control MSW plots reveals a great difference in the respective
magnitudes of the MSW statistic. While the control data set is randomly generated and
consequently contains no actual groups, there is nevertheless random clustering which
accounts for the peaks seen in the associated MSW plot. The MSW plot for the control data
set establishes a noise level: peaks with such small magnitudes are worthless as indicators
of grouping.

Subtraction

The readings of a particular text can be subtracted to mask its effect on a data matrix.
This is useful in cases where the text in question is thought to contribute readings to
other texts. One example is the Byzantine text, which is a component of many
“mixed” texts. If the influence of such a text is removed from a data matrix
then what remains can be analysed to see how other texts relate in its absence.

Subtraction is achieved by selecting a text to eliminate and then replacing its readings
with NA wherever they occur in a data matrix. A script called mask.R performs the task,
producing a data matrix in which all traces of the subtracted text are eliminated.
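The masking operation can be sketched as follows. This is not the mask.R script, whose internals are not shown here, but a hypothetical Python equivalent: the data matrix is assumed to be a mapping from witness labels to lists of reading codes (one per variation unit), with None standing in for NA, and the labels and codes are invented.

```python
def mask_text(matrix, target):
    """Replace the target text's readings with None (NA) wherever they occur."""
    reference = matrix[target]
    masked = {}
    for witness, readings in matrix.items():
        masked[witness] = [
            # Blank a cell when it agrees with a defined reading of the target.
            None if (ref is not None and reading == ref) else reading
            for reading, ref in zip(readings, reference)
        ]
    return masked


# Hypothetical data matrix: three witnesses, four variation units.
matrix = {
    "Byz": [1, 1, 2, None],
    "W1": [1, 2, 2, 1],
    "W2": [2, 2, 1, 1],
}

masked = mask_text(matrix, "Byz")
print(masked)
```

Under these assumptions the row of the subtracted text itself becomes entirely NA, while other witnesses keep only the readings in which they differ from it.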

PAM (or a similar technique) can be used to identify texts that have a claim to
represent their respective clusters. Once medoids are identified, they can be used to
produce a corresponding series of masked data matrices in which the respective texts are
eliminated. Such a series is given in the UBS4 data matrices of Mark, above.

Analysis Results

This section presents results obtained by analysing the data sets referenced above using
the methods described in the preceding section. The results are given in three parts:

CMDS, DC, and NJ results for all data sets

PAM results for selected data sets

ranked distances for patristic data sets.

PAM results are presented for a series of data sets selected for their broad coverage of
witnesses and variation sites in respective sections of the New Testament. For patristic data
sets, ranked distance is the preferred analysis mode.

Selected Data Sets (PAM)

PAM results for selected data sets are presented below, arranged according to major
divisions of the New Testament. The chosen data sets have a relatively broad coverage of
witnesses and variation sites. Group medoids are marked by brackets (e.g. [033]). Results
for each data set are presented as partitions into two, three, four, five, and twelve
groups: the progression through two to five reveals the sequence of group emergence; the
division into twelve (a somewhat arbitrary number) shows which groups survive a many-way
division. (Such groups are aptly described as “coherent.”)

Patristic Witnesses (Ranked Distances)

Ranked distance results are presented in tables devoted to individual Church Fathers.
Where the number of well-defined variation sites per witness is known, the lists use
asterisks to mark distances that are not statistically significant. (The numbers, if
known, are found in a "counts" file associated with the relevant distance matrix.)
Indications of statistical significance are not given when these numbers are not
known.

Discussion

This discussion focusses on two categories of analysis results given above:

selected data sets

patristic witnesses.

An attempt will be made to cover what seem to be the most important features of
the textual landscape revealed by analysis of the more comprehensive data sets. Analysis
results for patristic texts will be discussed as well even though the associated data sets
tend not to have a broad coverage of witnesses or variation sites.

Discussion typically begins with PAM results for a data set, though CMDS, DC, NJ, and
ranked distance results are often covered as well. Patristic data sets are an exception to
this general approach. Due to the underlying vagaries of this hard-won class of information,
the pictures produced by most of the analysis techniques tend to be uninformative. To give an
example of the difficulties encountered, PAM analysis often isolates a patristic witness as a
singleton as soon as the number of groups is increased beyond a few. Ranked distances provide
a viable alternative in these circumstances, allowing some sense of the textual nature of a
patristic witness to be gained by identifying its near neighbours. However, the accuracy and
precision of the impressions thus obtained may be open to question due to the inherent
uncertainties associated with establishing a Church Father's text from quotations.
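Identifying near neighbours amounts to sorting the distances from a reference witness. The sketch below assumes that a ranked distance list is simply the other witnesses in order of increasing distance from the reference; it does not attempt the statistical significance marking, and the witness labels and distances are invented.

```python
def ranked_distances(reference, dist):
    """List other witnesses in order of increasing distance from the reference."""
    others = [(witness, d) for witness, d in dist[reference].items()
              if witness != reference and d is not None]  # skip undefined distances
    return sorted(others, key=lambda pair: pair[1])


# Hypothetical distances from one patristic witness to three other witnesses.
dist = {"Father": {"Father": 0.0, "W1": 0.42, "W2": 0.18, "W3": 0.31, "W4": None}}

ranking = ranked_distances("Father", dist)
print(ranking)
```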

A few warnings need to be issued before launching into the disquisition. The results are
provisional in the sense that any change to the inputs will produce changes in the outputs. If
more comprehensive data sets are analysed then their results will vary from those presented
here. The important question is, how much will they vary? If the data set analysed here is
already quite comprehensive and is not marred by some form of systematic bias then the
corresponding results can be expected to be consistent with any obtained from more
comprehensive data sets.[23]

Each analysis method presents its own view of the textual landscape occupied by New
Testament witnesses. While these views are generally consistent, there are cases where the
analysis modes differ with respect to the alignments of particular witnesses. If this happens
then due caution should be exercised when drawing conclusions about the textual complexions of
those witnesses. For analysis methods which divide texts into branches or groups (i.e. DC, NJ,
and PAM), any text which shares characteristics of multiple branches or groups (sometimes
called a "mixed" text) is prone to jump from one place to another if even slight changes are
made to the data upon which the analysis is based. The respective analysis techniques will
often agree on the core members of a group or branch but disagree on the placement of
peripheral ones.

All this raises the question of how best to describe a witness when the different analysis
modes present conflicting views of its affiliations. The ranked distance result acts as an
arbiter in such cases, providing a standard against which to assess aspects of textual
relationships indicated by the respective modes of analysis.[24]

Selected Data Sets

...

Gospels

In deference to Streeter, I begin discussion of the group structure of available New
Testament witnesses with the Gospel of Mark:

Mark provided very few lessons for the selection read in the public services of the
Church. It was much less used and much less commented on than the other Gospels... Hence
the comparative carelessness shown in correcting Mark to the fashionable type of text is
easily accounted for. There emerges a principle of some importance... Seeing that the
Gospel of Mark has escaped Byzantine revision in more copies and to a greater extent
than the other Gospels, it follows that our materials for reconstructing the old local
texts are far more abundant and trustworthy in this Gospel. From this we deduce the
following canon of textual criticism. Research into the pedigree of a MS.
should begin with a study of its text of Mark.[25]

Mark

...

Matthew

...

Luke

...

John

...

Acts and General Letters

...

Acts

...

James

...

1 Peter

...

2 Peter

...

1 John

...

2 John

...

3 John

...

Jude

...

Paul's Letters

...

2 Corinthians

...

Hebrews

...

Revelation

...

Patristic Witnesses

...

Table 60. Church Fathers, dates, and locations

Name                    Dates             Locations
Athanasius              c. 297 – 373      Alexandria, Egypt
Basil of Caesarea       c. 330 – 379      Caesarea, Cappadocia
Clement of Alexandria   c. 150 – c. 215   Rome, Italy; Ephesus, Asia; Alexandria, Egypt
Cyril of Alexandria     c. 376 – 444      Alexandria, Egypt
Cyril of Jerusalem      c. 315 – 386      Jerusalem, Judea
Didymus the Blind       c. 313 – 398      Alexandria, Egypt
Epiphanius of Salamis   c. 315 – 403      Salamis, Cyprus
Gregory of Nyssa        c. 335 – c. 395   Nyssa, Cappadocia
Origen                  185 – 254         Alexandria, Egypt (until 231); Caesarea, Judea (from 231)

Ramblings (draft, do not quote)

UBS2

Matthew

Figure 12. UBS2 Matt NJ

Groups exist but are not always well defined.

NJ branches have points of contact with PAM divisions but the correspondence is
sometimes weak.

If the sampled texts developed from a single initial text then it is reasonable to
look for the initial text's nearest extant relatives where major branches of the NJ
tree converge. Accordingly, key texts to consider when recovering the initial text of
Matthew include 33, 892, and 1546.

One way to recover the initial text at every variation site is to take the most
frequent reading across a number of texts located near the junction of major branches
of the NJ tree. Another approach is to take the most frequent reading across group
medoids. Branches or groups that are known to be secondary (e.g. [it-aur] (6), namely
the Vulgate group) can be eliminated from consideration before using these recovery
procedures.

The NJ result indicates that the Syriac (except Harclean and Palestinian), Armenian,
Georgian, and Latin versions occupy the same branch, which is devoid of Greek MS support
(except D).

The 21-way partition has a cluster (W, Psi, Family 1, 565, 1009, 1079, 1365, 1546)
centred on Basil (Cappadocia, d. 379). Did the writings of Church Fathers interfere
with the biblical text? Monks might be expected to use theologically correct phrases
when copying (cf. Ehrman). Some members of this cluster are in Streeter's Caesarean
group in Mark.

D, it-d, and Eusebius associate in the 21-way partition. This suggests a link between
the D-text and the text used by Eusebius. Is this a clue to the provenance of Codex
Bezae? (Spelling analysis would be interesting.)

Three of the branches are associated with ancient versions: Alexandrian and
Coptic; Eastern and Old Syriac; Western and Old Latin.

Key texts to consider when recovering the initial text of Mark (i.e. those near
the convergence of major branches): 33, syr-h, eth, 579, syr-p, syr-pal, 700, Family
13.

The CMDS map shows that Jerome's Vulgate (vg) stands between a group of Old Latin texts
(a, b, c, d, ff2, i, r1) and the Byzantine cluster. This indicates that the Greek texts
Jerome used to revise the Latin of Mark's Gospel were of the Byzantine variety. Jerome said
that the Greek copies he used were “old indeed.” Unless Jerome was badly mistaken about the
age of these Greek copies, it seems that the Byzantine variety existed well before his time.

2 Corinthians

Figure 17. NJ (2 Cor, UBS4)

Galatians

Figure 18. NJ (Gal, Carlson)

Hebrews

Figure 19. NJ (Heb, UBS4)

1 Peter

Figure 20. NJ (1 Peter, UBS4)

1 John

Figure 21. NJ (1 John, UBS4)

The Apocalypse

Figure 22. NJ (Rev, UBS4)

INTF General Letters

James

Figure 23. INTF-General, James, NJ

1 Peter

Figure 24. INTF-General, 1 Peter, NJ

2 Peter

Figure 25. INTF-General, 2 Peter, NJ

1 John

Figure 26. INTF-General, 1 John, NJ

2 John

Figure 27. INTF-General, 2 John, NJ

3 John

Figure 28. INTF-General, 3 John, NJ

Jude

Figure 29. INTF-General, Jude, NJ

INTF Parallel Pericopes

INTF Parallel, Luke

Comparison of Classifications of Greek MSS

PAM partitions of the INTF Parallel Pericopes data set can be
compared with classifications proposed by (1) von Soden and (2) Wisse.

[826] (i.e. Family 13) is highly coherent, hardly changing as the data set is
partitioned into more groups. A two-way partition of this data set separates Family
13 from the rest.

Streeter classifies some members of [024] (11) (i.e. 032, 038, 157, 700, 1071)
as Caesarean and others as Alexandrian (i.e. 04, 033, 040, 33, 892). Wisse
classifies a number of [024] (11) members into his B group for one or more of his
test passages (i.e. 032, 040, 33, 157, 700, 892).

Note

The comparison is based on Wisse's table of profile classifications.[26] Each PAM group
is labelled by its medoid (e.g. [A], which stands for the Ausgangstext, not Codex
Alexandrinus). Figures in parentheses (e.g. 3/4) give the proportion of witnesses in a PAM
group that von Soden or Wisse place together in one of their groups. For example, in the
[1446] row, of four witnesses in the PAM group whose classifications by von Soden are
given, three are in his Iκ group. In the von Soden column, I and A categories count all
witnesses in the corresponding subgroups. In some cases, however, subgroups are
specifically listed. For figures given in the Wisse column, each witness is assigned the
majority classification across Wisse's three test passages (i.e. Luke chapters 1, 10, and
20). A witness is not counted if there is no majority classification. For von Soden and
Wisse columns, entries are made only for those categories which include more than one
witness from the corresponding PAM group.

In an 11-way partition, PAM groups [A], [041], [1582], [1012], [826], and [1446]
correlate well with groups identified by von Soden and Wisse while PAM groups [3],
[35], [024], and [184] do not. Group [184] would be a good match if compared with
combinations of (1) von Soden's Iβ and
Iφ and (2) Wisse's groups 16 and 1216.

Didymus of Alexandria; Egypt; d. 398

Mullen

Cyril of Jerusalem; Palestine; d. 386

Osburn

Epiphanius of Salamis; Cyprus (also Palestine and Egypt); d. 403

Racine

Wasserman

Patmos Manuscripts

What Difference Does It Make?

The analysis results presented above highlight variations between witnesses of the New
Testament. This naturally raises the question of what difference the variations make to the
meaning of the text. Most variations are of little consequence — whether an added or dropped
article, change of word order, or substitution of a synonymous phrase. Other variations have
a larger semantic effect, the two most extreme examples being Mark 16.9-20 and John 7.53-8.11,
which are absent from a number of witnesses.

One way to convey how much difference the variations make is to provide translations of a
number of textual varieties for the same section of text. The following parallel translation
of four varieties of the first chapter of Mark highlights the variation sites given in the
United Bible Societies Greek New Testament. This edition only presents
a selection of textual variations:

The intention was to provide an apparatus where the most important international
translations of the New Testament show notes referring to textual variants or even have
differences in their translations or interpretations. Other groups of variants have also
been included when for various reasons they are significant and worthy of consideration.[27]

The variation sites presented in the UBS apparatus constitute a small proportion of the
total number that exist. Nevertheless, those presented are the most important from a semantic
point of view, and the great majority of variations not presented in the UBS apparatus have
negligible semantic effect. Consequently, looking at these variation sites should give a
reliable impression of the magnitude of difference in meaning between textual varieties.

The textual varieties shown in the parallel translation consist of four clusters
identified by reference to the DC dendrogram of the UBS4 data set for Mark:[28]

A: The mainly Byzantine cluster comprised of A ... syr-pal

B: Aleph B C L Delta Psi 892 1342 cop-bo cop-sa it-k

C: W Theta f-1 28 205 565 arm geo syr-s

D: D it-a it-b it-c it-d it-ff-2 it-i it-q it-r-1

For each variation unit, the reading supported by a textual variety is taken to be the one
that occurs most frequently among its members. To illustrate, suppose that a variation unit
has three readings and that two witnesses in cluster C have the first, three have the second,
and four have the third. The reading supported by cluster C would then be taken to be the
third. For the purpose of this exercise, if a tie occurs then the supported reading is taken
to be the one with the greatest tendency to isolate the variety.
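The rule for choosing a cluster's reading can be expressed as a simple count. The sketch below reproduces the illustration just given (two, three, and four witnesses supporting the first, second, and third readings). It returns every reading that shares the highest count and leaves the tie-break unimplemented, since the "tendency to isolate the variety" is not specified precisely enough here to code; the function name and the use of None for undefined readings are assumptions.

```python
from collections import Counter


def supported_readings(readings):
    """Return the most frequent reading(s) among a cluster's members."""
    counts = Counter(r for r in readings if r is not None)  # ignore undefined cells
    if not counts:
        return []
    top = max(counts.values())
    return sorted(r for r, c in counts.items() if c == top)


# Readings of the nine members of a cluster at one variation unit.
cluster_c = [1, 1, 2, 2, 2, 3, 3, 3, 3]
print(supported_readings(cluster_c))

# A tie returns both candidates so that a separate rule can choose between them.
print(supported_readings([1, 1, 2, 2, None]))
```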

The highlighted passages show how much the more important differences encountered in the
first chapter of Mark's Gospel affect the meaning of the text. Most of the differences are
hardly worth a second thought, though a few do convey a different shade of meaning. If this
one chapter is representative (and there is no reason to think it is not) then it is fair to
say that most of the textual variation in the New Testament has little semantic effect. That
said, there are a few places (such as Mark 16.9-20 and John 7.53-8.11) where the differences
are significant.

Table 67. Four-way parallel translation of Mark chapter one

Reference

A

B

C

D

1.1

The beginning of the good news about Jesus Christ, Son of
God.

The beginning of the good news about Jesus Christ, Son of
God.

The beginning of the good news about Jesus Christ.

The beginning of the good news about Jesus Christ, Son of
God.

1.2

As written in the prophets, "Look, I send my messenger
before you, who will prepare your way;"

As written in the prophet Isaiah, "Look, I send my
messenger before you, who will prepare your way;"

As written by Isaiah the prophet, "Look, I send my
messenger before you, who will prepare your way;"

As written in the prophet Isaiah, "Look, I send my
messenger before you, who will prepare your way;"

1.3

"A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths
straight!'"

"A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths
straight!'"

"A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths
straight!'"

"A voice shouting in the wilderness, 'Prepare the way of the Lord! Make his paths
straight!'"

1.4

John appeared, baptizing in the wilderness and announcing
a baptism of a changed attitude for forgiveness of wrong deeds.

John the Baptist appeared in the wilderness, and [was]
announcing a baptism of a changed attitude for forgiveness of wrong deeds.

John the Baptist appeared in the wilderness, and [was]
announcing a baptism of a changed attitude for forgiveness of wrong deeds.

John appeared in the wilderness, baptizing and announcing
a baptism of a changed attitude for forgiveness of wrong deeds.

1.5

They went out to him, all of the land of Judea and those of Jerusalem, and were baptized
by him, confessing their wrong deeds.

They went out to him, all of the land of Judea and those of Jerusalem, and were baptized
by him, confessing their wrong deeds.

They went out to him, all of the land of Judea and those of Jerusalem, and were baptized
by him, confessing their wrong deeds.

They went out to him, all of the land of Judea and those of Jerusalem, and were baptized
by him, confessing their wrong deeds.

1.6

John was clothed [with] camel hair and a leather covering
around his waist; he ate locusts and wild honey.

John was clothed [with] camel hair and a leather covering
around his waist; he ate locusts and wild honey.

John was clothed [with] camel hair and a leather covering
around his waist; he ate locusts and wild honey.

John was clothed [with] camel hair and a leather covering
around his waist; he ate locusts and wild honey.

1.7

He gave notice saying, "One more powerful than me comes after me,
whose sandal straps I am not worthy to bend down and untie."

He gave notice saying, "One more powerful than me comes after me,
whose sandal straps I am not worthy to bend down and untie."

He gave notice saying, "One more powerful than me comes after me,
whose sandal straps I am not worthy to bend down and untie."

He gave notice saying, "I baptize you in water. One more powerful
than me comes after me, whose sandal straps I am not worthy to bend down and
untie."

1.8

"I baptize you in water; he will baptize you in the Holy
Spirit."

"I baptize you [in] water; he will baptize you in the
Holy Spirit."

"I baptize you in water; he will baptize you in the Holy
Spirit."

"He will baptize you in the Holy Spirit."

1.9

In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by
John.

In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by
John.

In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by
John.

In those days Jesus came from Nazareth, Galilee, and was baptized in the Jordan by
John.

1.10

Then coming up from the water he saw the heavens being torn open and the Spirit coming
down to him like a dove.

Then coming up from the water he saw the heavens being torn open and the Spirit coming
down to him like a dove.

Then coming up from the water he saw the heavens being torn open and the Spirit coming
down to him like a dove;

Then coming up from the water he saw the heavens being torn open and the Spirit coming
down to him like a dove;

1.11

There came from the heavens a voice: "You are my beloved
Son; I am delighted with you."

There came from the heavens a voice: "You are my beloved
Son; I am delighted with you."

from the heavens he heard a voice: "You are my beloved
Son; I am delighted with you."

from the heavens a voice: "You are my beloved Son; I am
delighted with you."

1.12

Then the Spirit drives him into the wilderness.

Then the Spirit drives him into the wilderness.

Then the Spirit drives him into the wilderness.

Then the Spirit drives him into the wilderness.

1.13

He was in the desert forty days being tested by Satan; he was with the wild animals and
the angels waited on him.

He was in the desert forty days being tested by Satan; he was with the wild animals and
the angels waited on him.

He was in the desert forty days being tested by Satan; he was with the wild animals and
the angels waited on him.

He was in the desert forty days being tested by Satan; he was with the wild animals and
the angels waited on him.

1.14

After John had been arrested, Jesus went into Galilee announcing the good news of the kingdom of God

After John had been arrested, Jesus went into Galilee announcing the good news of God

After John had been arrested, Jesus went into Galilee announcing the good news of God

After John had been arrested, Jesus went into Galilee announcing the good news of the kingdom of God

1.15

saying, "The time has come and God's kingdom is near. Change your attitude and believe
the good news."

saying, "The time has come and God's kingdom is near. Change your attitude and believe
the good news."

saying, "The time has come and God's kingdom is near. Change your attitude and believe
the good news."

saying, "The time has come and God's kingdom is near. Change your attitude and believe
the good news."

1.16

Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing a net
into the sea. (They were fishermen.)

Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing a net
into the sea. (They were fishermen.)

Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing a net
into the sea. (They were fishermen.)

Passing by the Sea of Galilee he saw Simon and Andrew, Simon's brother, throwing nets
into the sea. (They were fishermen.)

1.17

Jesus said to them, "Come with me and I will make you into fishers of men."

Jesus said to them, "Come with me and I will make you into fishers of men."

Jesus said to them, "Come with me and I will make you into fishers of men."

Jesus said to them, "Come with me and I will make you into fishers of men."

1.18

Then they left the nets and followed him.

Then they left the nets and followed him.

Then they left the nets and followed him.

Then they left the nets and followed him.

1.19

Going a bit further he saw Jacob Zebedee and his brother John who were in the boat
fixing the nets.

Going a bit further he saw Jacob Zebedee and his brother John who were in the boat
fixing the nets.

Going a bit further he saw Jacob Zebedee and his brother John who were in the boat
fixing the nets.

Going a bit further he saw Jacob Zebedee and his brother John who were in the boat
fixing the nets.

1.20

Then he called them. Leaving their father Zebedee in the boat with the hired hands, they
went after him.

Then he called them. Leaving their father Zebedee in the boat with the hired hands, they
went after him.

Then he called them. Leaving their father Zebedee in the boat with the hired hands, they
went after him.

Then he called them. Leaving their father Zebedee in the boat with the hired hands, they
went after him.

1.21

They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he
taught.

They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he
taught.

They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he
taught.

They go into Capernaum. Then, on the Sabbath, having gone into the synagogue, he
taught.

1.22

They were shocked by his teaching because he taught them like someone with authority,
not like the scholars.

They were shocked by his teaching because he taught them like someone with authority,
not like the scholars.

They were shocked by his teaching because he taught them like someone with authority,
not like the scholars.

They were shocked by his teaching because he taught them like someone with authority,
not like the scholars.

1.23

Then there was a man with an unclean spirit in their synagogue. He screamed,

Then there was a man with an unclean spirit in their synagogue. He screamed,

Then there was a man with an unclean spirit in their synagogue. He screamed,

Then there was a man with an unclean spirit in their synagogue. He screamed,

1.24

"What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are
— God's holy one!"

"What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are
— God's holy one!"

"What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are
— God's holy one!"

"What's with us and you, Jesus Nazarene? Have you come to destroy us? I know who you are
— God's holy one!"

1.25

Jesus told it off saying, "Be quiet! Get out of him!"

Jesus told it off saying, "Be quiet! Get out of him!"

Jesus told it off saying, "Be quiet! Get out of him!"

Jesus told it off saying, "Be quiet! Get out of him!"

1.26

Throwing a fit and shouting with a loud voice, the unclean spirit got out of him.

Throwing a fit and shouting with a loud voice, the unclean spirit got out of him.

Throwing a fit and shouting with a loud voice, the unclean spirit got out of him.

Throwing a fit and shouting with a loud voice, the unclean spirit got out of him.

1.27

All being shocked they asked each other, "What is this? What new
teaching is this, that with authority he gives orders even to unclean spirits and they
obey him?"

All being shocked they asked each other, "What is this new teaching
with authority? He gives orders even to unclean spirits and they obey
him."

All being shocked they asked each other, "What is this, this new
teaching with authority? He gives orders even to unclean spirits and they obey
him."

All being shocked they asked each other, "What is that teaching,
this new one with authority, that he gives orders even to unclean spirits and they obey
him?"

1.28

The news about him then got out everywhere in the whole region of Galilee.

The news about him then got out everywhere in the whole region of Galilee.

The news about him then got out everywhere in the whole region of Galilee.

The news about him then got out everywhere in the whole region of Galilee.

1.29

Then, leaving the synagogue, they went to Simon and
Andrew's house with Jacob and John.

Then, leaving the synagogue, they went to Simon and
Andrew's house with Jacob and John.

Then, leaving the synagogue, he went to Simon and
Andrew's house with Jacob and John.

Leaving the synagogue, he went to Simon and Andrew's
house with Jacob and John.

1.30

Simon's mother-in-law lay sick with fever. Then they tell him about her.

Simon's mother-in-law lay sick with fever. Then they tell him about her.

Simon's mother-in-law lay sick with fever. Then they tell him about her.

Simon's mother-in-law lay sick with fever. Then they tell him about her.

1.31

He went over, took hold of her hand, and helped her up. The fever left her and she began
to wait on them.

He went over, took hold of her hand, and helped her up. The fever left her and she began
to wait on them.

He went over, took hold of her hand, and helped her up. The fever left her and she began
to wait on them.

He went over, took hold of her hand, and helped her up. The fever left her and she began
to wait on them.

1.32

In the evening after sunset they began to bring everyone who was suffering from sickness
and the demonized.

In the evening after sunset they began to bring everyone who was suffering from sickness
and the demonized.

In the evening after sunset they began to bring everyone who was suffering from sickness
and the demonized.

In the evening after sunset they began to bring everyone who was suffering from sickness
and the demonized.

1.33

The whole town was gathered at the door.

The whole town was gathered at the door.

The whole town was gathered at the door.

The whole town was gathered at the door.

1.34

He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did
not allow the demons to speak because they had recognized him.

He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did
not allow the demons to speak because they had recognized him to be
Christ.

He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did
not allow the demons to speak because they had recognized him to be
Christ.

He cured a lot who suffered a variety of sicknesses and got out a lot of demons. He did
not allow the demons to speak because they had recognized him.

1.35

Getting up early while it was still dark, he left and went away to a deserted spot and
prayed there.

Getting up early while it was still dark, he left and went away to a deserted spot and
prayed there.

Getting up early while it was still dark, he left and went away to a deserted spot and
prayed there.

Getting up early while it was still dark, he left and went away to a deserted spot and
prayed there.

1.36

Simon and those with him hunted him down.

Simon and those with him hunted him down.

Simon and those with him hunted him down.

Simon and those with him hunted him down.

1.37

They find him and say to him, "Everyone is looking for you."

They find him and say to him, "Everyone is looking for you."

They find him and say to him, "Everyone is looking for you."

They find him and say to him, "Everyone is looking for you."

1.38

He says to them, "Let's go somewhere else -- into the next towns -- so that I can
campaign there too, because I came out for this."

He says to them, "Let's go somewhere else -- into the next towns -- so that I can
campaign there too, because I came out for this."

He says to them, "Let's go somewhere else -- into the next towns -- so that I can
campaign there too, because I came out for this."

He says to them, "Let's go somewhere else -- into the next towns -- so that I can
campaign there too, because I came out for this."

1.39

He was campaigning in their synagogues throughout
Galilee, driving out demons too.

He went campaigning in their synagogues throughout
Galilee, driving out demons too.

He was campaigning in their synagogues throughout
Galilee, driving out demons too.

He was campaigning in their synagogues throughout
Galilee, driving out demons too.

1.40

A leper came towards him begging and kneeling to him,
saying "If you want to you can make me clean."

A leper came towards him begging and kneeling, saying "If
you want to you can make me clean."

A leper came towards him begging and kneeling, saying "If
you want to you can make me clean."

A leper came towards him begging, saying "If you want to
you can make me clean."

1.41

Deeply moved, reaching out his hand he takes hold of him
and says: "I want to. Be clean."

Deeply moved, reaching out his hand he takes hold of him
and says: "I want to. Be clean."

Deeply moved, reaching out his hand he takes hold of him
and says: "I want to. Be clean."

Getting annoyed, reaching out his hand he takes hold of
him and says: "I want to. Be clean."

1.42

Then the leprosy left him and he was cleansed.

Then the leprosy left him and he was cleansed.

Then the leprosy left him and he was cleansed.

Then the leprosy left him and he was cleansed.

1.43

He told him off then sent him away.

He told him off then sent him away.

He told him off then sent him away.

He told him off then sent him away.

1.44

He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to
the priest, and offer what Moses commanded for your cleansing as proof to them."

He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to
the priest, and offer what Moses commanded for your cleansing as proof to them."

He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to
the priest, and offer what Moses commanded for your cleansing as proof to them."

He says to him, "Look, don't say anything to anyone. Instead, go off, show yourself to
the priest, and offer what Moses commanded for your cleansing as proof to them."

1.45

However, he went out and began much campaigning and spreading the word so that Jesus
couldn't openly go into a city anymore but stayed outside in remote places. They came to
him from everywhere.

However, he went out and began much campaigning and spreading the word so that Jesus
couldn't openly go into a city anymore but stayed outside in remote places. They came to
him from everywhere.

However, he went out and began much campaigning and spreading the word so that Jesus
couldn't openly go into a city anymore but stayed outside in remote places. They came to
him from everywhere.

However, he went out and began much campaigning and spreading the word so that Jesus
couldn't openly go into a city anymore but stayed outside in remote places. They came to
him from everywhere.

Notes

My translation attempts to produce contemporary English while retaining the atmosphere
of the Greek. I've used "change your attitude" instead of the somewhat archaic "repent,"
and "campaign" instead of the rarely used "proclaim" or less vivid "preach." The simple
present is used to translate Mark's "historic present." (E.g. "He says to them...")

Sometimes the most frequently supported readings of the four varieties are all the
same even though individual witnesses differ, as in Mark 1.6 where two witnesses from
cluster D have leather instead of hair.

A variation unit may affect more than one verse, as at Mark 1.7-8.

Conclusions (draft, do not quote)

Based on comparison of results presented here, these analysis techniques seem to be
robust against loss of information. For example, comparing UBS and INTF results for a data
set shows that Greek MSS tend to occupy the same groups when versional and patristic
information is omitted. Even so, it is prudent to include as much information as
practicable to reduce the risk of missing important relationships.

Some groups are coherent (e.g. Byzantine text, Family 13) while others (e.g.
Alexandrian text, Western text, Streeter's “Eastern type”) are diffuse.
Coherent groups tend to remain intact when a data set is split into many parts while
diffuse groups tend to evaporate. A possible cause of coherence is lack of
interference.

Early versions such as the Syriac, Latin, and Coptic are associated with textual
varieties such as Streeter's “Eastern type,” the Western text, and the
Alexandrian text, respectively. It may be that these versions interfered with the Greek
text through the mechanism of back-translation from vernacular to Greek.

...

...

Finally, a plea. Please share information in a format that is useful to others. Given the
volume of data and human cost of manual transcription, sharing data as electronic files is a
Good Thing. (CSV is a good choice; so is XML.) Please present data sets as data matrices or
something like the following XML that can be processed to produce a data matrix using a
language such as XQuery:

Having information that can be electronically processed to produce data matrices allows a
broad range of analysis techniques to be applied. If data matrices cannot be released then
distance matrices are still useful, though the range of applicable analysis techniques is narrower.
(Tables of percentage agreement or proportional agreement are readily converted to distance
matrices.) An important adjunct to distance matrices (or tables of agreements) is a table of
counts saying how many data points were used to calculate each distance. Not supplying this
information leaves others in the dark concerning the statistical significance of the
distances.
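The point about counts can be illustrated with a short Python sketch, which is not one of the scripts used for this article. It converts percentage agreement to distance and attaches an approximate standard error to each distance. The witness labels, percentages, and counts are invented for the example, and the formula sqrt(d(1 - d)/n) is a normal-approximation assumption adopted here for brevity.

```python
import math

def to_distance(percent_agreement):
    """Convert a percentage agreement (0-100) to a distance (0-1)."""
    return 1 - percent_agreement / 100

def standard_error(distance, count):
    """Approximate standard error of a distance estimated from `count` sites.

    Assumption: each variation site is treated as an independent trial, so
    the usual binomial proportion formula applies.
    """
    return math.sqrt(distance * (1 - distance) / count)

# Hypothetical input: (witness, witness) -> (percentage agreement, sites compared)
table = {
    ("W1", "W2"): (85.0, 400),
    ("W1", "W3"): (85.0, 20),
}

report = {}
for pair, (percent, count) in table.items():
    d = to_distance(percent)
    report[pair] = (d, count, standard_error(d, count))

for pair, (d, count, se) in report.items():
    print(pair, f"distance={d:.3f}", f"sites={count}", f"se={se:.3f}")
```

Both pairs have the same distance, yet the one based on twenty sites has a standard error more than four times larger. Without the table of counts a reader cannot tell the two cases apart.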

Another important aspect is to present apparatus data in a manner that allows presence or
absence of all witnesses chosen for citation to be readily established. (See, for instance,
the category with text "undefined" in the above XML example.) The UBS Greek New
Testament apparatus is most useful in this respect: one knows that if a
sometimes cited witness does not appear in an apparatus entry then its reading cannot be
established at that place. The Nestle-Aland Novum Testamentum Graece
apparatus is less easy to use for constructing data matrices because there are a number of
reasons why a witness might not be cited at a variation site:[29]

it is subsumed under the majority text (𝔐)

it does not support the noted reading of a negative apparatus entry

its text is not legible.

In the absence of an algorithm to establish why a witness is not cited at a
variation site, this kind of apparatus is not amenable to producing data matrices.
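The UBS-style convention described above, where a consistently cited witness that is absent from an entry is treated as undefined at that place, might be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the witness labels W1 to W3, and the three toy variation units are hypothetical and do not come from any apparatus.

```python
def build_data_matrix(witnesses, units):
    """Return {witness: [state, ...]} with one state per variation unit.

    `units` is a list of variation units; each unit is a list of
    attestations (sets of witnesses), one per reading. Readings are coded
    1, 2, 3, ... in the order given. None marks an undefined state.
    """
    matrix = {w: [] for w in witnesses}
    for unit in units:
        state = {}
        for code, attestation in enumerate(unit, start=1):
            for w in attestation:
                state[w] = code
        for w in witnesses:
            # A witness not cited for any reading is undefined here.
            matrix[w].append(state.get(w))
    return matrix

def distance_and_count(row_a, row_b):
    """Proportion of disagreement over sites where both rows are defined."""
    pairs = [(a, b) for a, b in zip(row_a, row_b)
             if a is not None and b is not None]
    if not pairs:
        return None, 0
    disagreements = sum(a != b for a, b in pairs)
    return disagreements / len(pairs), len(pairs)

# Toy data (invented): W2 is not cited in the third unit.
witnesses = ["W1", "W2", "W3"]
units = [
    [{"W1", "W2"}, {"W3"}],
    [{"W1"}, {"W2", "W3"}],
    [{"W1"}, {"W3"}],
]

matrix = build_data_matrix(witnesses, units)
print(matrix)
print(distance_and_count(matrix["W1"], matrix["W2"]))
```

Returning the count alongside each distance keeps the information needed to judge how much weight the distance can bear.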

Acknowledgments

Richard Mallett deserves special thanks for encoding data matrices and transcribing tables
of percentage agreement from numerous sources. Compiling the basic data from which analysis
proceeds is an arduous and painstaking task, and he has done great service in this respect.
Mark Spitsbergen also deserves thanks for helping to encode UBS4 apparatus data for the first
fourteen chapters of Matthew.

Maurice A. Robinson kindly provided tables of percentage agreement for the Gospels and
Acts. These are derived from the apparatus of the second edition of the United Bible
Societies' Greek New Testament. The exacting task of transforming the
data into electronic format was performed by Claire Hilliard and Kay Smith.

A number of the results are produced from comprehensive data generously provided by the
Institut für neutestamentliche Textforschung in Münster, Germany. Researchers at the INTF have
spent many years on the gargantuan task of compiling this data. Holger Strutwolf, Klaus
Wachtel, and Volker Krüger were instrumental in providing access to the data.

Thanks go to Gerald Donker for suggesting that the RGL plotting library be used to produce
three-dimensional CMDS maps. He also encouraged me to take a less procrustean approach to
missing data. As a consequence, the analysis results presented here include many more
witnesses than they otherwise would.

Isaac Newton said, “If I have seen further it is by standing on the shoulders
of giants.” This sentiment truly applies to the results presented here. Our field
owes a great debt to those who have compiled the information, both printed and electronic,
upon which the data and distance matrices are based.

A. Supplementary Information

This appendix provides supplementary information related to analysis results for the data sets:

what proportion of variance is explained by the corresponding three-dimensional CMDS
result

the MSW plot obtained by PAM analysis which indicates preferable numbers of
groups.

Cosaert, Carl P. The Text of the Gospels in Clement of
Alexandria. New Testament in the Greek Fathers 9. Atlanta: Society of Biblical
Literature, 2008.

Cunningham, Arthur. “The New Testament Text of St. Cyril of
Alexandria.” PhD dissertation, University of Manchester, 1995.

Donker, Gerald J. The Text of the Apostolos in Athanasius of
Alexandria. New Testament in the Greek Fathers 8. Atlanta: Society of Biblical
Literature, 2011.

Ehrman, Bart D. Didymus the Blind and the Text of the
Gospels. New Testament in the Greek Fathers 1. Atlanta: Society of Biblical
Literature, 1986.

Ehrman, Bart D., Gordon D. Fee, and Michael W. Holmes. The Text of the
Fourth Gospel in the Writings of Origen. New Testament in the Greek Fathers 3.
Atlanta: Society of Biblical Literature, 1992.

Ehrman, Bart D. and Michael W. Holmes. The Text of the New Testament in
Contemporary Research: Essays on the Status
Quaestionis. Studies and Documents 46. Grand Rapids: Eerdmans,
1995.

Epp, Eldon J. “The Significance of the Papyri for Determining the Nature of
the New Testament Text in the Second Century: A Dynamic View of Textual
Transmission.” In Epp and Fee, Studies in Theory and Method,
274-97.

———. “The Twentieth-Century Interlude in New Testament Textual
Criticism.” In Epp and Fee, Studies in Theory and Method,
83-108.

Racine, Jean-François. The Text of Matthew in the Writings of Basil of
Caesarea. New Testament in the Greek Fathers 5. Atlanta: Society of Biblical
Literature, 2004.

Richards, W. L. The Classification of the Greek Manuscripts of the
Johannine Epistles. SBL Dissertation Series 35. Missoula: Society of Biblical
Literature, 1977.

Robinson, Maurice A. “The Determination of Textual Relationships among
Selected Manuscripts of the New Testament through the use of Data-Processing
Methods.” Unpublished paper in three parts, Southeastern Baptist Theological
Seminary, 1972-3.

Wisse, Frederik. The Profile Method for the Classification and
Evaluation of Manuscript Evidence as Applied to the Continuous Greek Text of the Gospel of
Luke. Studies and Documents 44. Grand Rapids: Eerdmans, 1982.

[2] Gerd Mink provides a definition of the term initial text in
“Problems of a Highly Contaminated Tradition,” 25-26. Eldon J. Epp finds
the term original text problematic, as discussed in his
“Multivalence of the Term 'Original Text.'”

[3] My impression is that the original text's readings are likely to survive among those
we know, and that applying the full range of tools now available (including conventional
criteria for choosing the best readings) allows us to produce a good approximation to the
original text at the level of individual words (though not their spelling). At the
semantic level (which is what matters), comparing the archetypal texts (i.e. group
representatives) shows that the New Testament is well established, with only a few places
where there is a real question about the meaning of the text as handed down from the
apostolic generation.

[5] A distance matrix can be obtained from a table of percentage agreement by dividing
each percentage by one hundred then subtracting the result from one. For example, a
percentage agreement of 85% corresponds to a distance of 0.15.

[6] The vetting algorithm can be forced to retain a particular witness provided it has
more than the minimum acceptable number of defined variation sites. The number fifteen is
chosen because with this many variation sites a distance estimate of 0.5 has a sampling
error of plus or minus 0.233. (In statistical terminology, the critical limits of the 95%
confidence interval are 0.267 and 0.733.) That is, when using only fifteen variation
sites, the sampling error associated with the distance estimate covers about one half of
the entire range of possible distances. The relative size of the sampling error decreases
as the number of sites from which the distance is calculated increases. A number less than
fifteen may be used so that a particularly important fragmentary witness can be included
in a distance matrix.

[7] The R script named control.r produces the control data matrix; dist.r is then used to produce the
corresponding distance matrix. Each variable in the control data matrix has only two
possible states whereas more than two can occur in variables (i.e. variation sites) of the
model data matrix. This is not a bad approximation as variables of the model data matrix
often have only two states. The main aim, which is hardly affected by the number of
states, is to produce a control with approximately the same mean distance between objects
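as the model. This is achieved using the R expression p = (1 + (1 - 2*d)^0.5)/2
to calculate the probability p of choosing the first state (i.e. 1) based on the
desired mean distance d. (Two independently generated objects differ at a variable with
probability 2*p*(1 - p); setting this equal to d and solving for p gives the
expression.) This p is then used to set the chance of generating a 1 when the c 1s and
2s that form an object are generated, c being the number of variables. Due to its stochastic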
nature, the procedure is unlikely to produce a control with exactly the same mean distance
between objects as the model. However, if many controls were produced and their mean
distances between objects were averaged then the result would tend towards
d.
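The relationship between p and d in this note can be checked with a small simulation. The Python sketch below is not control.r or dist.r; the function names, the choice of forty objects, and the random seed are assumptions made for illustration, while 123 variables and a mean distance of 0.464 are the figures given in note [11].

```python
import math
import random

def p_from_mean_distance(d):
    """Probability of state 1 that gives an expected distance of d.

    Two independent objects disagree at a two-state variable with
    probability 2*p*(1 - p); solving 2*p*(1 - p) = d for p gives
    p = (1 + sqrt(1 - 2*d)) / 2, which requires 0 <= d <= 0.5.
    """
    if not 0 <= d <= 0.5:
        raise ValueError("d must lie between 0 and 0.5 for two states")
    return (1 + math.sqrt(1 - 2 * d)) / 2

def make_control(n_objects, n_vars, d, rng):
    """Generate objects whose variables are 1 with probability p, else 2."""
    p = p_from_mean_distance(d)
    return [[1 if rng.random() < p else 2 for _ in range(n_vars)]
            for _ in range(n_objects)]

def mean_distance(matrix):
    """Mean proportion of disagreement over all pairs of objects."""
    total, pairs = 0.0, 0
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            a, b = matrix[i], matrix[j]
            total += sum(x != y for x, y in zip(a, b)) / len(a)
            pairs += 1
    return total / pairs

rng = random.Random(1)  # fixed seed so the run is repeatable
control = make_control(40, 123, 0.464, rng)
print(round(p_from_mean_distance(0.464), 3))
print(round(mean_distance(control), 3))
```

A single control will land near, but not exactly on, the target mean distance, which is the behaviour the note describes.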

[8] The limits of the 95% confidence interval for the distance between two randomly
generated objects can be obtained with the R expression
qbinom(c(0.025, 0.975), c, d)/c where c is the number of
variables and d is the mean distance.
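A rough Python counterpart to the R expression above is sketched here for readers without R to hand. The qbinom function below is a hand-written stand-in for R's function of the same name: it returns the smallest count whose cumulative binomial probability reaches the requested level. The name distance_limits is hypothetical.

```python
from math import comb

def qbinom(q, n, p):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p**k * (1 - p)**(n - k)
        if cumulative >= q:
            return k
    return n  # guards against rounding error when q is very close to 1

def distance_limits(n_vars, mean_dist, level=0.95):
    """Lower and upper limits of the central interval, as distances."""
    tail = (1 - level) / 2
    lower = qbinom(tail, n_vars, mean_dist) / n_vars
    upper = qbinom(1 - tail, n_vars, mean_dist) / n_vars
    return lower, upper

lower, upper = distance_limits(15, 0.5)
print(round(lower, 3), round(upper, 3))
```

With fifteen variables and a mean distance of 0.5 this gives limits of about 0.267 and 0.733, the figures quoted in note [6]. Calling distance_limits(123, 0.464) corresponds to the expression given in note [11].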

[11] The bounds were calculated using the R expression
qbinom(c(0.025, 0.975), 123, 0.464)/123. The mean distance between
objects in the model distance matrix is 0.464 and the rounded mean number of variables
in the objects from which that distance matrix was calculated is 123.

[12] CMDS analysis is performed by MVA-CMDS.r. The proportion of variance figure for each
CMDS plot is provided in the "Supplementary Information" appendix.

[14] In this article, a trajectory refers to a line joining two
endpoints in textual space. By contrast, Epp uses the term to describe a time sequence
of witnesses with the same kind of text; see e.g. his “Twentieth-Century
Interlude,” 93.

[16] DC analysis is performed by MVA-DC.r using the distance matrix and a table of
counts which gives the number of variation sites covered by each witness.

[17] Branching heights correspond to distances between the clusters constituted by the
branches. The upper critical limit calculation and partitioning are performed by MVA-DC.r. The order of
groups is determined by the program and is not significant.

[20] See documentation relating to the pam method of the
cluster package by Maechler and others, “Cluster Analysis
Basics and Extensions.”

[21] PAM analysis is performed by MVA-PAM.r.
There is no particular reason for making the largest number of groups twelve. Smaller
numbers of groups tend to retain peripheral members while larger numbers tend to produce
unwieldy tables of results.

[22] The MSW plot is produced by MVA-PAM-MSW.r. This script also identifies numbers of groups corresponding to
peaks with above-average MSW values. An MSW plot for each data set is provided in the
"Supplementary Information" appendix.

[23] Systematic bias can be introduced by editorial practices that treat one group
differently to another.

[24] The script rank.r allows ranked distance
results to be obtained for any witness included in a distance matrix. This script requires
a distance matrix and list of counts of readings per witness in the distance matrix, which
are found in the dist directory of this web site.