The Duplication Problem

Observational Errors

1. Under Counting. Under counting occurs (a) when not all the birds that are really present are counted and (b) when
the counter miscounts the birds seen or heard, reporting a lower number than was really present.

2. Over Counting. Over counting occurs when the counter miscounts the birds seen or heard, reporting a larger number of
the species than was really present.

3. Misidentifying. This occurs when the observer reports one species as being another. It is really an under count of
the correct species combined with an over count of the incorrect species.

All of these errors assume that there is a “truth” out there in the world and that it is known to a Guaranteed Omniscient
Discriminator who can then identify that a particular error has occurred.

In a practical sense, we generally cannot know that these errors have been made. Under counting 1 (a) cannot be known to
human observers except under extremely favourable conditions. Under counting 1 (b) can be picked up if birders count in teams
and correct each other’s estimates. Over counting 2 can be corrected in the same way. Misidentifying 3 can be caught by a
better identifier, particularly if birders are in groups, but later attempts to disqualify observations based on likelihood
are statistical in nature and not certain.

In the field we do the best that we can, knowing that we will be making errors all the time to a greater or lesser degree,
and knowing as well that we will never know how great or little our errors are, since we are not Guaranteed Omniscient
Discriminators.

Other Errors

4. Transcription errors. These occur (a) when observers write down their data and get it wrong or (b) when later data
copying into other formats such as electronic form gets it wrong.

5. Processing errors. These are errors in extraction of data or of calculation based on data.

We will not be discussing these errors further. But the possibility of this kind of error should prevent us from
becoming too arrogantly happy with the accuracy of our data since even perfect data is subject to them.

A duplication error is an instance of over counting 2 and will arise in the eBird system frequently. For example, even though
the recommendation is that TOC trip leaders enter trip observations into eBird, many of the attendees of the trip will want
to enter the data too as part of their day/month/year/life lists. Therefore each bird will be reported not once but potentially
several times in eBird.

We cannot avoid this. The real issue is whether it constitutes a problem. Whether duplication of data points is a problem
for the system depends on how the data is going to be used.

We have recommended four reasons for collecting data (Migration Information, Population Trends over Time, Nesting, Rarities)
and it is useful to examine each to assess the impact of over counting on findings.

Migration Information

If the TOC wants eventually to produce a “Birds of Toronto” publication then it will need information
about (a) when the birds are here, and (b) how frequent they are when they are here.

If the data extraction from eBird is used to give presence/absence information on each day of the year to answer (a) then
the duplication problem has no effect on the determination. Answering (b) is more complex. If numerical count information is
used from eBird, then duplications matter and they have to be corrected for somehow. This may not be possible.
Alternatively, presence/absence information tracked over time can answer (b) and in this case the duplication problem has no
effect.
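
The point about presence/absence can be illustrated with a minimal sketch. The records, observer names, and the
days_present helper below are hypothetical and are not an eBird export format; the sketch only assumes that each report
carries a date and a species.

```python
from datetime import date

# Hypothetical reports: (observation date, species, observer). Not real eBird data.
reports = [
    (date(2024, 5, 4), "Whimbrel", "leader"),
    (date(2024, 5, 4), "Whimbrel", "attendee_1"),
    (date(2024, 5, 4), "Whimbrel", "attendee_2"),
    (date(2024, 5, 5), "Whimbrel", "leader"),
]


def days_present(records, species):
    """Return the set of dates on which the species was reported at least once."""
    return {day for day, name, _observer in records if name == species}


# Presence/absence computed from every report, duplicates included.
with_duplicates = days_present(reports, "Whimbrel")

# The same question answered from the trip leader's reports only.
leader_only = days_present([r for r in reports if r[2] == "leader"], "Whimbrel")

print(with_duplicates == leader_only)
```

Because a set records each date once, three reports of the same bird on the same day contribute exactly what one report
would.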

eBird creates its graphs by setting the thickness of each bar to the number of checklists that report the bird divided by
the total number of checklists. If a Condor is sighted on a TOC field trip of 10 people to High Park and everyone on the trip
reports it to eBird, the bird will appear on 10 lists. However, the 10 lists are divided by the total number of lists for the
day, which could be in the hundreds, and consequently the effect of the error is reduced. This is for a determination of a
single day. For calculation of a graph covering all years for Toronto, the 10 lists are divided by thousands of lists for all
years and the error all but vanishes.
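
The arithmetic can be sketched as follows. The checklist totals (300 for the day, 5,000 for all years) are assumptions
chosen only to match "hundreds" and "thousands" in the text, and the function names are illustrative; this is not
eBird's own code.

```python
def reporting_frequency(lists_with_bird: int, total_lists: int) -> float:
    """Fraction of checklists that report the species."""
    if total_lists <= 0:
        raise ValueError("total_lists must be positive")
    return lists_with_bird / total_lists


def inflation(other_lists: int, trip_lists: int = 10) -> float:
    """Extra frequency created when a trip submits trip_lists lists instead of one.

    other_lists is the number of checklists from everyone else, assumed here
    not to report the bird at all.
    """
    duplicated = reporting_frequency(trip_lists, other_lists + trip_lists)
    single = reporting_frequency(1, other_lists + 1)
    return duplicated - single


# Assumed totals, for illustration only.
one_day = inflation(other_lists=290)      # 300 lists that day, counting the trip's 10
all_years = inflation(other_lists=4_990)  # 5,000 lists across all years

print(f"single day: {one_day:.4f}")
print(f"all years:  {all_years:.4f}")
```

Under these assumed totals the excess falls from roughly 3 percentage points on the single day to roughly 0.2 percentage
points across all years. The duplicated figure remains about ten times the single-list figure in both cases; what shrinks
is the absolute difference, which is what the thickness of the bar reflects.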

Population Trends over Time

Any numerical trending requires numeric counts, and for these duplication is a problem. A data collection protocol is
essential, because it is the protocol that prevents duplication.

We recommend that numeric trending data be used only when it comes from an official TOC counting project such as the Warbler
Survey, the Hawk Watch and the Whimbrel Count. Other projects of a similar nature can be set up whenever there is a desire for
them.

Nesting

The TOC is not collecting nesting information yet. However, the nature of the beast means that duplication is rare in the
first place and can be controlled completely if data extractions use geo-location information to recognize duplicate nests.
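
One way such an extraction might recognize duplicate nests is sketched below. The 25 metre threshold, the coordinates, and
the record layout are all assumptions made for illustration; a real threshold would depend on the accuracy of the reported
locations and on how closely the species nests.

```python
import math

EARTH_RADIUS_M = 6_371_000
SAME_NEST_THRESHOLD_M = 25.0  # assumed; not a TOC or eBird value


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def unique_nests(reports, threshold_m=SAME_NEST_THRESHOLD_M):
    """Keep the first report of each nest; drop later same-species reports nearby."""
    kept = []
    for species, lat, lon in reports:
        duplicate = any(
            species == k_species and distance_m(lat, lon, k_lat, k_lon) <= threshold_m
            for k_species, k_lat, k_lon in kept
        )
        if not duplicate:
            kept.append((species, lat, lon))
    return kept


# Hypothetical nest reports: (species, latitude, longitude).
nest_reports = [
    ("Red-tailed Hawk", 43.64650, -79.46370),
    ("Red-tailed Hawk", 43.64655, -79.46372),   # a few metres from the first
    ("Red-tailed Hawk", 43.65500, -79.46000),   # a different nest, far away
    ("Baltimore Oriole", 43.64651, -79.46371),  # nearby, but another species
]

nests = unique_nests(nest_reports)
print(len(nests))
```

The sketch compares each report only against nests already kept, which is adequate for the small numbers of nest reports
expected. Matching is by species as well as distance so that two species nesting in the same tree are not merged.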

Rarities

In reports of rare birds or birds completely new to the official Toronto list, the error of Misidentifying 3 is paramount. It
is the job of the vetting committee to do the best it can to determine whether a sighting can be accepted.

Errors of Under or Over counting are not relevant to the acceptance of the record and therefore the duplication problem has no
effect.