Statistical Guide to
Data Analysis of Avian
Monitoring Programs
Biological Technical Publication
BTP-R6001-1999
U.S. Fish & Wildlife Service
Statistical Guide to
Data Analysis of Avian
Monitoring Programs
Biological Technical Publication
BTP-R6001-1999
Nadav Nur
Point Reyes Bird Observatory, Stinson Beach, CA 94970
Stephanie L. Jones
U.S. Fish & Wildlife Service, Mountain-Prairie Region, Denver, CO 80225
Geoffrey R. Geupel
Point Reyes Bird Observatory, Stinson Beach, CA 94970
U.S. Fish & Wildlife Service
Authors
Nadav Nur
Point Reyes Bird Observatory
4990 Shoreline Hwy.
Stinson Beach, CA 94970-9701
415/868 1221
email: NadavNur@prbo.org
Stephanie L. Jones
Nongame Migratory Bird Coordinator
U.S. Fish & Wildlife Service, Mountain-Prairie Region
P.O. Box 25486 DFC
Denver, CO 80225
303/236 8145 ext. 608
email: Stephanie_Jones@fws.gov
Geoff Geupel
Point Reyes Bird Observatory
4990 Shoreline Hwy.
Stinson Beach, CA 94970-9701
415/868 1221
email: GGeupel@prbo.org
Suggested citation
Nur, N., S.L. Jones, and G.R. Geupel. 1999. A statistical
guide to data analysis of avian monitoring programs.
U.S. Department of the Interior, Fish and Wildlife
Service, BTP-R6001-1999, Washington, D.C.
ii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Computer Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Recommended Monitoring Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Methods for Assessing Abundance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Demographic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Statistical Terminology and Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
General Considerations of Study Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Analysis of Vegetation and Habitat Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter II. Assessment of Abundance and Species Composition Using Point Counts . . . . . . . . . . . . . . . . . . . . . . 8
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Community Similarity Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Analyzing Vegetation Data in Relation to Point Count Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Power and Sample Size Analysis Using TRENDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Using MONITOR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Power and Sample-Size Analyses: Other Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Chapter III. Demographic Monitoring: Mist-nets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Analysis of Productivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Analysis of Adult Survival. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter IV. Demographic Monitoring: Nest-monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Background. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Additional Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Alternatives to the Mayfield Method: Systematic Searching and Time-to-Failure Analysis . . . . . . . . . . . . . . 34
Vegetation Analysis in Relation to Nest-monitoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Chapter V. Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Chapter VI. Concluding Remarks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
iii
Table of Contents
iv
Tables
1. Monitoring methods used in landbird population monitoring and their characteristics. . . . . . . . . . . . . . . . . 2
2. Potential objectives of a monitoring program and typical number of years needed for a method
to achieve results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3. Example of data from point count observations conducted at three point count stations, three times
during the breeding season. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4. Calculation of diversity, similarity and evenness indices using total bird detections across sites in
burned and unburned aspen (Populus tremuloides) stands in Wyoming (from Dieni 1996). . . . . . . . . . . . . . 12
5. Linear regression analysis of number of Black-headed Grosbeaks during the breeding season. . . . . . . . . 14
6. Sample output for linear regression analyses using STATA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7. Analysis of point count data on Sacramento River: relationship of bird species richness to
Damage Index, controlling for vegetation/habitat characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8. Analysis of mist-net captures, Sacramento River 1993: relationship to Damage Index for the
six species with adequate sample size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
9. Analysis of mist-net captures, Sacramento River, 1993: relationship of HY, and proportion HY birds
caught in relation to Vegetation Damage Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
10. Evaluation and summary of available computer program software used for the analysis of animal
marking and surveying studies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
11. Results of SURGE analysis of Wrentits, by territory status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
12. Summary of models in JOLLY and JOLLYAGE (Pollack et al. 1990). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
13. Power analysis for detecting differences in survivorship between two groups. . . . . . . . . . . . . . . . . . . . . . . 36
14. Logistic regression analyses of Grasshopper Sparrow presence/absence in relation to habitat
features (from Holmes and Geupel 1998). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Figures
1A. Trend, log-linear, P = 0.001, Black-headed Grosbeak, Palomarin 1980-1992. . . . . . . . . . . . . . . . . . . . . . . . 13
1B. Trend, linear-no transformation, P = 0.004, Black-headed Grosbeak, Palomarin 1980-1992. . . . . . . . . . . 13
2A. Normal probability plot, residuals of log-transformed data, Black-headed Grosbeak. . . . . . . . . . . . . . . . 16
2B. Normal probability plot, residuals of untransformed data, Black-headed Grosbeak. . . . . . . . . . . . . . . . . 16
3. Bird species richness in relation to Vegetation Damage Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4A. Distribution of residuals: species richness vs. Vegetation Damage Index. . . . . . . . . . . . . . . . . . . . . . . . . . 19
4B. Quantile-quantile plot of residuals of species richness vs. Vegetation Damage Index against
normal distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5. Probability of detecting Grasshopper Sparrows in relation to Index of Perennial Grass Cover. . . . . . . . . 40
List of Tables and Figures
v
This Statistical Guide is intended to aid field
biologists wishing to analyze data gathered in
standardized monitoring programs for landbirds. It
grew out of the needs expressed by the Western
Working Group of Partners in Flight, and we thank
the members of that group for providing the
incentive to develop this document. It is not
intended to replace good statistical texts, but to
supplement them. We encourage readers, and
especially users, of this Guide to forward their
comments, corrections, and other advice to the
senior author for incorporation into future versions
of this Guide.
This work has been a contract between Point Reyes
Bird Observatory and the U.S. Fish & Wildlife
Service. This is PRBO Contribution 679.
References to commercial products does not imply
endorsement.
Acknowledgments
We thank John R. Sauer, J. Scott Dieni, Ken Gerow,
Daniel R. Petit, and Jon Bart for multiple reviews of
earlier drafts; John Cornely, Barry Noon, Kathie
Purcell, C.J. Ralph, Len Thomas, and Jerry Verner
also provided helpful discussion and comments on an
earlier draft of this document. The authors, not the
above named reviewers, should be held responsible
for any errors or outlandish opinions expressed
here. We thank Jim Nichols for providing a helpful
preprint. We thank the USFWS Nongame
Coordinators: Tara Zimmerman, Bill Howe, Steve
Lewis, Diane Pence, Richard Coon, Kent Wohl,
together with Dan Petit and John Trapp, for support
and encouragement. Special thanks to all the field
biologists who took the time to assist us in doing this
document and are out there doing the work, facing
the challenges, and balancing the issues: Adrianna
Araya, Grant Ballard, Sharon Browder, Mike
Bryant, Claire Caldes, Lynn Clark, Paula Gouse,
Ron Garcia, Todd Grant, Bill Haglan, Jeanne
Hammond, Laura Hubers, Craig Hultberg, Beth
Madden, Steve Martin, Bob Murphy, Lark Osborne,
Fritz Prellwitz, Pam Rizor, Vickie Roy, Kelli Stone,
Julian Wood, Kodiak and McDougall Jones and
many more.
Preface
This Guide is intended to provide guidance to field
biologists wishing to analyze data collected on
terrestrial bird populations, as part of an avian
population monitoring program. A second objective
is to provide information that will help biologists
design such programs. The audience is similar to
that for the Handbook of Field Methods (Ralph et
al. 1993), the Monitoring Bird Populations by Point
Counts (Ralph et al. 1995), and in many ways this
Statistical Guide to Data Analysis of Avian
Monitoring Programs can be a useful complement
to the field methods handbook. At the same time,
we feel this Statistical Guide can be of use to field
biologists studying other organisms besides
terrestrial birds. In our view, all field biologists
will benefit from taking the equivalent of 2 or 3
semester courses in statistics and we assume that
readers of this guide have completed at least this
basic level in statistics.
This document is not intended to fill deficiencies in
basic knowledge of statistics, nor is it a substitute
for a good statistical text. Rather, this Guide is
intended as a supplement to these texts. Our aim is
to provide practical advice in the design and analysis
of field ecological data and to provide timely
information about current statistical computer
programs. Two good statistical texts are provided by
Neter et al. (1990) and Kleinbaum et al. (1988). Both
of these texts are “intermediate” in level; that is,
they assume the reader has had a basic,
introductory course in statistics. Other texts by
Snedecor & Cochran (1989), Sokal & Rohlf (1995)
and Zar (1996) all provide a good, general statistical
background. Intermediate level guides for
practicing ecologists are provided by Crawley (1993),
Bart and Notz (1996) and Bart et al. (1998).
Noteworthy specialized statistical ecological texts
include Ludwig & Reynolds (1988), Skalski &
Robson (1992), and Draper & Smith (1981). The last
two mentioned have many biological examples. Also
see the informative review by Lancia et al. (1996).
Computer Programs
Computer programs for summarizing and analyzing
data with general statistical packages are available,
for many different levels, prices and target
audiences. Ellison (1992) reviewed a number of
general statistical packages, but that review is
somewhat out of date. One versatile statistical and
graphical package, available for DOS, Windows,
and UNIX platforms, is Stata (StataCorp. 1999)
(obtained from Stata Corporation, 702 University
Drive East, College Station, TX 77840). Specialized
computer software programs have been created to
assist with analysis of capture/recapture data (used
for analyses of survivorship, also population size);
these are reviewed and summarized in this and
additional specialized computer programs are
mentioned in the respective sections of this Guide.
Recommended Monitoring Methods
A wide range of methods have been used to conduct
avian monitoring, each tailored to meet a different
set of objectives in the face of different constraints.
This Guide does not address all methods that are
available, especially those that are more widely used
for research or inventory. Below is a short review of
monitoring methods available, based on Butcher
(1992) and Ralph et al. (1993). The reader is referred
to these references (and others cited below) for
additional information. Table 1 describes the
variables measured and subjectively assesses the
relative strengths and weaknesses of each method.
“Strength” and “weakness” is assessed relative to
the quality of the data gathered to meet the
objective and we have not attempted to factor in cost
per datum. Table 2 provides a list of monitoring
objectives, monitoring methods and the typical time
required by the various methods to achieve those
objectives (from Geupel & Warkentin 1995).
Descriptions of monitoring methods, their
applications and comparisons, and their limitations
can be found in Ralph and Scott (1981), Verner
(1985), Butcher (1992), Ralph et al. (1993), Buckland
et al. (1993) and Geupel & Warkentin (1995).
Methods
Area search—A method in which observers are
allowed to roam for a fixed time in a specified area,
usually 20 minutes per 3 hectare area (Loyn 1986,
Slater 1994). This technique has a wide appeal to
volunteers but standardization of data collection is
difficult.
1
I. Introduction
2 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 1. Monitoring methods used in landbird population monitoring and their characteristics.
Methods are grouped under “survey” and “demographic.” Positive or high level is denoted by “+”,
negative or low level denoted by “–” and partial level denoted by “+/–“. Modified from Table 1 in Butcher
(1992). “Color banding” is assumed to include nest-searching. “Rare” species refers to species that are
locally (not just globally) rare.
Survey Demographic
Fixed Spot Area Variable Mist Nest Color
Variables Measured distance map Search distance net Search banding
Index to abundance + + + + +/– +/– +
Density – + – + – – +
Survivorship (adult) – – – – + – ++
Productivity – – – – + + +
Recruitment – – – – + – +
Habitat Relations + + + + +/– + +/–
Nest Site Characteristics – – – – – + +
Predation/Parasitism – – – – – + +
Individuals Identified – – – – + – +
Breeding Status Known – + – – +/– + +
General Characteristics
Habitat specificity + + + + +/– + +
Rare species measured + +/– + +/– – +/– +/–
Canopy species measured + + + + – +/– –
Area sampled known + + + + +/– + +
Large area sampled + – + + +/– – –
Use in non-breeding season + +/– + + + – +
Table 2. Potential objectives of a monitoring program and typical number of years needed for a method to
achieve results.
Actual number of years depends on study design and will vary depending on sample size (e.g., number of
census stations, detection or capture rates, number of nests found). We assume that the priorities of the
monitoring program reflect local or site-specific needs (adapted from Geupel & Warkentin 1995).
Method
Single Point Repeat Area Spot Mist Nest
Objective Countsa Pt. Countsb Searchc mapping nettingd monitoringd
Inventory, species presence/absence 1 1 1 1 1 na
Inventory locally rare species 2-3 1-3 1-3 1-3 1-3 na
Determine species richness 2-3 1-3 1-3 1-3 na na
Determine relative abundance 1-2 1-2 1-3 1-2 3-5 na
Determine species breeding status/seasonality na 1-3 1-3 1-3 1-3 1-3
Determine population trend 6-10 5-9 10+ 5-9 6-10 na
Determine productivity na na na na 1-3 1-2
Determine adult survivorship na na na 3-5e 3-5 na
Determine life history traits na na na 2-4 na 1-2
Habitat association or preference 1-2 1-2 1-2 1-3 na 1-2
Identify habitat features 4-6 3-5 3-5 2-4 na 1-2
Determine cause of pop. change na na na na 3+ 3+
a Each point count censused one time in a season.
b Each point count censused 3 or more times in a season.
c Each plot censused 3 or more times in a season.
d Most authors/programs recommend this method in conjunction with population surveys.
e Possible if birds have been uniquely color-banded.
na Not applicable or not possible.
Methods for Assessing Abundance
Point counts—Fixed radius point counts are the
basic method recommended for most monitoring
studies, and are most widely used (Hutto et al. 1986,
Ralph et al. 1993, Ralph et al. 1995). These can
provide a cost-effective method of estimating the
relative abundance of birds.
Line transects—Fixed-width transects can provide
coverage of a greater area than point counts, but
with fewer independent data points or replicates.
Variable distance methods—Estimating distance at
which birds are detected can be incorporated into
both point count and line transect surveys.
Standardization of distance estimation may be
difficult, as abilities to accurately estimate distances
may vary greatly between observers.
Spot-mapping—Can provide good density
information and information on many aspects of
avian life history. It is expensive per data point and
may be better applied to research projects or to high
priority areas or species.
Demographic Methods
In general, demographic monitoring methods can be
used to identify proximal causes of population
declines and provide insight into causes of habitat
associations. They can identify population problems
prior to the detection of declines based on
abundance surveys. Ultimately, these methods can
be used to identify “source” or “sink” populations.
However, these methods require much effort per
station.
Constant effort mist-netting—Provides information
on productivity and survivorship of populations,
but is limited by area covered (which is generally
unknown) and lack of habitat specificity. However,
many species can be monitored at the same time,
without expending extra effort.
Nest monitoring—Provides site-specific and
habitat-specific information on productivity and
reproductive status. Available personnel usually
limit the number of plots that can be studied, and
studying additional species normally requires
increased effort.
Color-banding—When combined with nest
monitoring, using unique color-band combinations to
follow the fates of individuals will provide the most
complete and unbiased measures of demographic
parameters. However, it is the most intensive
method of all. It is not a method recommended for
general monitoring, but like spot-mapping, best
suited for research projects or for high priority
areas and species.
Statistical Terminology and Principles
The following is a selective review of some statistical
terms relevant to a biologist conducting a
monitoring study. Our intention here is to re-acquaint
the reader with terms and principles that
may have rested dormant for many years.
Accuracy—An estimator is accurate if it produces
estimates that are, on average, close to the true
value, i.e., without bias or with a minimum of bias.
Accuracy is independent of precision (below). An
estimate can be accurate but not precise, precise but
not accurate, or both accurate and precise. The
difficulty is that often the “true” value is unknown
and therefore accuracy is difficult to judge, except
for simulated data where an investigator knows the
true values.
Bias—The difference between the average estimate
(more precisely, the expected value of the estimate)
and the true value. Bias is not the same as “error”,
rather it is one kind of error, systematic error. If an
estimate is as likely to be an overestimate as it is to
be an underestimate, the estimator in question is
unbiased, even though there will always be error
associated with an estimate. To minimize bias would,
by definition, maximize accuracy.
Precision—Precision refers to the variability of the
estimate: the smaller the variability (and thus the
smaller the standard error) of the estimate, the
greater the precision. As mentioned above, precision
is independent of accuracy. An estimate can be very
precise, but wildly inaccurate (i.e., strongly biased).
Type I and Type II errors—Rejecting the null
hypothesis when it is correct is committing a Type I
error. The probability of committing a Type I error
is symbolized α [alpha] and is the significance level
of a test of statistical inference. Accepting the null
hypothesis when it is incorrect is committing a Type
II error; the probability of making such an error is
symbolized ß [beta].
Power—The probability of detecting a biological
effect, if there is one. More precisely, power is the
probability of rejecting the null hypothesis when the
null hypothesis is incorrect. Normally, the null
hypothesis is an hypothesis of no effect (i.e., no
difference). Power is equal to 1–ß. Power cannot be
calculated unless one specifies the alternative
hypothesis: one must specify the magnitude of the
effect or difference. A given test will have greater
power the greater the magnitude of the effect, and
conversely, the smaller the true difference between
groups, the less the power to detect that difference
for a given sample size. Power is discussed in
greater depth in Chapter II of this Guide.
Introduction 3
Poisson distribution—Among several discrete
distributions (binomial, geometric, negative
binomial), this distribution is one of the most likely
to be encountered or utilized in ecological studies
(Ludwig and Reynolds 1988). Many random
processes, in which events occur independently of
each other, in space or time, conform to a Poisson
distribution. Suppose one set up a grid of 100 one-cm
squares (10 cm × 10 cm). The number of rain drops
falling per square in a short interval of time is likely
to be Poisson-distributed. Suppose that in one
minute, 100 rain drops fell on the 100 squares. If this
process was indeed a Poisson process, then we would
expect that in 1minute, on average, 37 squares
would receive 0 drops, 37 squares would receive 1
drop, 18 squares would receive 2 drops, and 8
squares would receive 3 or more drops. For a
Poisson process, the mean of occurrences (per unit
time or space, in the rain drop example = 1.0 drops
per square per minute) will equal the variance of
those occurrences. Thus, for a Poisson-distributed
variable, only one parameter is specified.
Another useful distribution is the binomial
distribution. If N independent trials were conducted
and the probability of a “hit” (representing success,
failure, death, etc.) on any one trial is p, then the
number of hits in total is binomially distributed, with
mean = Np, and Variance = Np(1–p). As a result,
variance is neither independent of the mean (as it is
in the normal distribution) nor is it equal to the
mean (as it is in the Poisson distribution); moreover,
variance is maximized when p =1–p = 0.5. As p
approaches 0 or 1, the variance will shrink to zero.
The binomial distribution is, for example, utilized in
logistic regression (Chapter V). Note that the
binomial distribution has two parameters, N and p.
When the number of trials is large and the
probability of a “hit”, p, is low, then the binomial
distribution can be approximated by the Poisson
distribution.
Replicates—Replicates are independent repetitions
or measurements within the experimental design. If
repetitions are not independent then these “repeats”
are sometimes referred to as pseudoreplicates
(Hurlbert 1984, Bart et al. 1998). Suppose 100 point
count stations in a given habitat type have been
surveyed three separate times during the breeding
season. The 300 data points obtained should not be
treated as 300 replicates or samples, because bird
data obtained on different days in the same season
are not independent. Whether or not the 100 point
count stations are independent or not is difficult to
say a priori, but if spaced far enough apart (Ralph
et al. 1995 recommend spacing of at least 250 m), so
that the same individuals are not being counted at
different stations, the 100 point count stations can be
treated as independent. Assuming independence
among adjacent point count stations, and if the 100
point count stations were divided evenly among 4
habitats, then there would be 25 replicates.
As far as the three repeats per point count station
are concerned, one can average the data, select the
repeat with the highest score for each individual
species, or sum the data from each of the three
visits. If one wished to compare results among the
three visits (e.g., asking whether there was a
seasonal, within-year trend), one can analyze the 300
observations, using “point count station” as a
categorical variable to be controlled for; this is an
example of a repeated-measures design, in which
“point count station” is a blocking variable.
Independence of observations—This is an important
issue in statistical analysis, and is often
misunderstood. To start, what is required are that
outcomes be independent from one observation to
another, after controlling for factors or variables
that might be influencing the outcome. Suppose the
point count stations have been spaced 100 m apart
on transects of 1 km length. An investigator might
not feel comfortable in treating observations from
different stations on the same transect as being
independent of each other. One solution would be to
classify the transect as the unit of observation, i.e.,
pooling data from all point count stations on the
same transect, and analyze data accordingly.
Another solution would be to include in the analysis
a “transect effect.” This would control for the fact
that stations on the same transect are more likely to
be similar in outcome than are stations on different
transects. In this way one can investigate
differences among and within transects.
A second point is that the independence refers to the
outcome, not the independent variables or factors.
Suppose one related bird species richness to
vegetation. As long as bird species richness varies
independently from station to station (after
controlling for various factors), it would not matter
that all stations on a transect shared some of the
same vegetation characteristics. In other words,
there is no requirement that vegetation
characteristics be independent from one observation
unit to another.
General Considerations of Study Design
General study design considerations will apply to
most monitoring techniques and studies. Neter et al.
(1990) provides a good discussion of experimental
design, also see Skalski & Robson (1992) and
Crawley (1993); those wishing more detail can
consult specialized texts such as Hicks (1982). A
helpful and interesting discussion of the issues and
the process for designing an avian monitoring study
on one site such as a National Wildlife Refuge is
4 Statistical Guide to Data Analysis of Avian Monitoring Programs
given in Johnson (In Press). In this section, we
discuss some general points concerning design of a
study. Later when discussing each methodology in
turn (point counts, mist-netting and nest-monitoring),
we return to questions of design.
Throughout this Guide, the use of “station” refers to
one independent monitoring site, e.g., one point
count station (if observations are deemed
independent of other stations), one line transect, one
mist-netting array, one nest-monitoring plot, etc. It
is important to correctly determine the unit of
analysis early in the study design.
Design—The first and most important
consideration in designing a study is its objectives.
Statistical inference (in particular, tests of
statistical significance) may be of little interest, in
which case statistical power need not be considered
in determining the sample size needed. A biologist
may instead wish to monitor a particular area
mainly as a descriptive tool. If data are gathered
in a standardized fashion (Ralph et al. 1993), the
data from one area can contribute to regional or
national monitoring programs, which likely have
statistical inference as an objective. In many cases
the number of stations will be limited by available
resources or by the physical areas of interest.
Some field biologists will be able to establish one,
or at most, a couple of demographic monitoring
stations (e.g., one mist-net array or one
nest-monitoring plot). In those cases placement
of the station will usually be constrained by the
location and size of the habitat of interest, by the
density of the species of special concern, or be
centered on the location of the habitat or species
of interest.
Data from just a single demographic monitoring
station may be valuable for several reasons:
1. the data provide a description of temporal
patterns, which data can be combined with other
sources of data, 2. the data can allow statistical tests
of trends over time, given sufficient number of years
of data collection (possibly 10 years or more for a
single station), and 3. the data can be combined with
data from other monitoring stations.
Not every monitoring program needs to have
hypothesis testing as its goal from the outset. A
monitoring program may be able to collect valuable
data that can later be analyzed (by itself or as part
of a larger study), and that analysis would surely
include hypothesis testing and tests of statistical
significance. But it is pointless to erect contrived
hypotheses before data collection has begun, simply
in order to justify the establishment of a monitoring
program. After data have been collected, the
investigator will have a much better idea of how to
formulate meaningful hypotheses. This point does
not apply to experimental studies, where explicit
hypothesis formulation is an essential ingredient to
a successful study.
Assuming statistical inference is an important
consideration, one needs to determine whether the
objective is to determine trends through time,
establish bird-habitat relationships, compare effects
of different treatments, or other possible objective.
Choice of objective will influence questions of sample
size and allocation of stations (see Randomization
below).
Assuming that statistical inference is a goal, the
question of necessary sample size needs to be
related to statistical power, i.e., the ability to detect
an effect if there is one. Statistical power is an
elusive concept in part because it is arbitrary.
Calculations of sample size in the past have used
power values ranging from 50% to 95%. Clearly, the
greater the desired power, the greater the sample
size necessary to achieve that power. Generally, this
Guide uses values of 50% and 80%. In designing a
study one would not ordinarily consider 50% power
to be adequate and we do not recommend a study be
designed to achieve 50% power. Nevertheless 50%
power presents a useful level for a posteriori
investigations, where someone has already collected
these data and the biologist wishes to consider the
statistical power of the data to detect effects of
interest. Conversely, in designing a study, 80%
power is a commonly used and often-recommended
benchmark, but it is nothing more than a
benchmark.
Power calculations and sample size calculations
both rely on the presumed magnitude of the effect
in question. Clearly, the greater the presumed
effect (e.g., the greater the difference between
the two groups), the greater the power will be to
detect that effect, and, conversely, the smaller the
necessary sample size to detect an effect at a
specified power. The difficulty here is that the true
difference between groups is unknown, and
furthermore one cannot necessarily use the
observed magnitude of an effect (e.g., observed
difference between two groups) as the criterion
for judging power.
It is easy to fall into the trap of estimating power,
retrospectively, using the observed magnitude of an
effect, and several general statistical packages
appear to encourage users to do so, without
appropriate warnings (discussed in Thomas and
Krebs 1997). The problem is that if a statistically
significant effect is found, one would not normally
calculate power retrospectively. If the investigator
looks for an effect and finds there is one, then there
is little need to determine the probability of having
Introduction 5
found that effect. Therefore, retrospective power
calculations are usually pursued only when no
significant effect is detected. But given that no effect
was detected (statistically), it could be because the
observed magnitude of an effect was substantial, but
power was weak, or because the observed
magnitude of an effect was small, even negligible.
However, power will always be low to detect a
negligible effect. It is not very informative to
calculate that, given the negligible effect observed,
yes, one’s power to detect a negligible effect is
negligible. Thus, to be useful, retrospective power
analysis requires that only effects of a priori
interest be examined. In other words, in conducting
power analysis, the magnitude of the effect of
interest needs to be fixed independently of the data
at hand. The biologist must decide what is the
magnitude of an effect worth considering; this is a
biological, not a statistical, issue that is sometimes
difficult to settle.
Randomization—Randomization is an important
part of experimental design, owing to the work of
Sir Ronald Fisher in the early 20th century.
Randomization is used to combat biases that can
undermine survey and experimental studies. The
most important bias concerns assignment to
treatments. By randomizing assignment to
treatment (e.g., grazed vs. ungrazed), extraneous
differences among experimental units can be
minimized. Even here one would likely use
randomization subject to constraint. Suppose one
had five land units, each one that can be divided into
four plots. Randomly choosing treatment for the 20
plots could result in an unbalanced design. Instead,
one can randomly choose treatment, subject to the
constraint of 10 plots for each treatment. An even
better design would use land unit as a blocking
variable. Within each block (here, land unit), one
randomly assigns treatment to plots, with the
constraint that there must be two plots for each
treatment. Of course, in many studies assignment to
treatment is not always under the investigator’s
control.
Randomization should also be applied to minimize
other types of bias, if feasible. If two treatments
are being compared using point counts, using two
observers, one should not assign one observer to
conduct point counts in treatment A and the other
observer to conduct point counts in treatment B.
In this case, observer identity and the effect of the
treatment would be confounded. Instead, the two
treatments should be divided between the two
observers, as randomly or equitably as possible.
Another bias concerns order of observation. If
several plots are to be visited each day, one should
not visit the plots in the same order each time, but
should vary the order. It is not usually feasible to
visit point count stations in a random order, but
one can usually randomize the starting point on
each visit.
The final source of bias concerns inclusion in a study.
The sample to be studied will likely be the most
representative of the population in question if it is
randomly selected; however, this is often not
feasible. Nevertheless, we recommend incorporating
some randomness into every study. For example, one
could lay out a grid of point count stations, centered
on a randomly selected starting point as suggested
by Sauer (1998). This approach can be adapted for
those setting up transects of point count stations:
the starting point for a transect can be randomly
selected among a subset of possible points. Another
approach is to set up a grid of possible stations and
then randomly determine whether or not to include
individual stations in the study. Hutto et al. (1996)
and Hutto and Paige (1995) provide other
suggestions for randomizing point count stations
across broad areas.
Analysis of Vegetation and Habitat Characteristics
Data on vegetation and habitat features can play an
important role in avian monitoring studies. These
data can be gathered at different scales and in many
different ways. Methods of vegetation data
collection are described in many publications,
including Ralph et al. (1993), the BBIRD program
protocol (Martin et al. 1997), and Hays et al. (1981).
One of the most influential vegetation assessment
protocols developed for use with bird studies is by
James and Shugart (1970), with modifications by
Noon (1981). The analyses of vegetation data
collected in conjunction with point counts and
nest-monitoring are discussed in the appropriate
sections.
Vegetation data can be collected and analyzed at
several different scales. The broadest is habitat
classification and is qualitative (categorical) rather
than quantitative. This level includes most
vegetation maps and can be used to select the
vegetation types for study. The next broadest scale
is the “stand” level. This scale is commonly used to
ground-proof aerial photographs and, depending on
methods, to construct bird-habitat (or bird-vegetation)
correlations, making use of point count
and line transect data. The third scale involves
vegetation used to characterize the study area at a
smaller scale than the first methods, often within a
radius of 11.28 m following James and Shugart
(1970). In some studies, plots are centered on nests
or other sites of bird use (“use sites”), while others
(“non-use sites”) are randomly placed for
comparison within the study area. This scale allows
data that are more quantitative in nature to be
collected, compared to other scales. Examples of
6 Statistical Guide to Data Analysis of Avian Monitoring Programs
studies using this scale are Knopf et al. (1988) and
Larson & Bock (1986). This scale provides a good
means to establish bird-habitat relationships; such
data can be gathered quickly, accurately and
efficiently. The finest scale of vegetation
measurement is around the nest, nest plant or other
micro-habitat features (Martin & Roper 1988;
Martin et al. 1997).
Currently there is little agreement among biologists
on the methods, and even the scale, of vegetation
data collection needed to correlate with bird
abundance, habitat needs, distribution and behavior.
Therefore, it is not possible at this time to
recommend a single approach for analysis of
vegetation data since the data analytic approach will
depend on how the data were collected.
Introduction 7
Several techniques have been used for estimating
abundance of birds (Verner 1985, Bibby et al. 1992,
Butcher 1992, Skalski & Robson 1992, Buckland et
al. 1993, Greenwood 1996, Lancia et al. 1996). In the
past, two widely used and promoted methods have
been point counts and line-transects (Ralph & Scott
1981, Buckland et al. 1993). Capture/recapture data
is a third method used to estimate populations
(Greenwood 1996, Lancia et al. 1996). Following the
recommendations of the National Monitoring
Working Group of Partners in Flight (Butcher 1992)
and Ralph et al. (1993) and Ralph et al. (1995), we
restrict our attention to point counts. Line-transects
can also yield valuable data regarding population
abundance and species composition; however, the
design and analysis of transect data is beyond the
scope of this Guide (Ralph & Scott 1981, Buckland et
al. 1993). We assume that data will be collected using
fixed radius point counts, as described in Ralph et al.
(1993), rather than unlimited distance point counts
or variable distance point counts (Ralph & Scott
1981).
Throughout this Guide we discuss how to analyze
data gathered in a typical monitoring program and
then discuss design of monitoring programs,
especially sample size. Ideally, one should first put
careful thought into designing a monitoring
program before data collection and analysis.
However, here we discuss data analysis first in order
to give the reader a better idea of what sorts of data
can be gathered and what are some inferences that
can be drawn from data collected in a monitoring
program.
Analysis
Point count data have commonly been analyzed
with respect to 1. relative abundance, 2. species
richness, 3. species diversity and 4. community
similarity. An alternative to the analysis of relative
abundance, has been 5. the analysis of species
presence/absence (i.e., a species is scored as 1 if one
or more individuals are detected, and 0 if
otherwise). (We recommend not using the term
“frequency of occurrence” to characterize such
analyses, because of ambiguity of this terminology.)
However, from the point of maximizing statistical
power, the analysis of relative abundance (i.e.,
number of individuals detected per station) is to be
preferred to an analysis of presence/absence. The
latter discards information, leading to a loss of
statistical power. On this point we are in agreement
with Dawson (1981),
“[E]ither frequency of occurrence or average
number [per station] is adequate measure for
species which occur usually as one or none in each
counting unit. On the other hand, frequency
becomes an increasingly insensitive measure for
species found in larger numbers.”
Presence/absence may be very helpful as a
descriptive tool. That is, it may be informative to
state that a species was present at 40% of stations in
habitat x and 60% of stations in habitat y. Another
advantage of presence/absence data is that some
analytic methods can be used for such data but not
for total detections. For example, logistic regression
can be used with presence/absence, but not with
total detections. Logistic regression is discussed in
more detail in Chapter V and an example is provided
below of the analysis of presence/absence data.
Nevertheless, more sophisticated variants on
logistic regression can use total detections (e.g,
“ordered logistic regression”, StataCorp. 1999). Also,
Poisson regression, an analytic method that has
much in common with logistic regression, can
analyze total detections (Kleinbaum et al. 1988). As
its name implies, Poisson regression assumes that
the number of detections per station is Poisson-distributed,
but some software (e.g., EGRET)
includes the capability of testing this assumption
(and modifying the analysis if data do not conform to
this assumption).
Relative abundance is analyzed as number of
detections per unit area. The number of individuals
are determined at each point count station and this
datum can be entered into regression analyses or
analysis of variance (ANOVA). Results from several
point count stations can be averaged to produce a
summary statistic (Example 1). If a point count
station is surveyed more than once per season, one
can either sum the number of detections over all
point count surveys or calculate an average number
per point-count survey. As long as each station is
surveyed the same number of times (e.g., three
times), the two measures (average vs. sum) will
8
II. Assessment of Abundance and Species
Composition Using Point Counts
differ only by a constant, in this case, three. A third
commonly used method is to use the maximum
number of detections over the course of the three
surveys. In analyzing relative abundance these three
methods can be expected to yield similar patterns.
The number of individuals detected at a point count
station is a function of the absolute abundance and
the probability of detecting an individual (given that
it is present). Analyses of relative abundance
assume that differences in detectability can be
ignored, for the purposes of the study. In contrast,
variable distance methods (often referred to as
distance sampling; Buckland et al. 1993) attempt to
estimate detectability. The assumption that
differences in detectability are unimportant should
be kept firmly in mind when considering surveys of
relative abundance. Recent studies confirm that
detectability is influenced by a number of different
factors (Buckland et al. 1993, McShea & Rappole
1997, Gutzwiller & Marcum 1997).
Absolute abundance. Point count data are often used
to determine relative abundance; however, absolute
abundance may be estimated using variable distance
methods (Buckland et al. 1993, Ramsey & Scott,
1981). An important assumption of variable distance
methods is that at the center point of the
observation, all individuals are detected (i.e.,
detectability = 100%). It is possible to relax this
assumption if, instead, the true absolute density can
be independently determined at the center point,
but this is often not feasible. A second important
assumption is that individuals do not move towards
or away from the observer before being detected.
Buckland et al. (1993) provide extensive discussion of
these and other assumptions. The same authors
have developed a program DISTANCE that carries
out such analyses (Laake et al. 1993, Web site:
<http://www.ruwpa.st-and.ac.uk/distance/>).
Species richness is analyzed as total number of
species detected. A total can be calculated for each
point count station, or for each group of point count
stations (Example 1).
There are a plethora of indices for species diversity
(Magurran 1988, Ludwig & Reynolds 1988). The
utility of diversity indices has been strongly
questioned by some (Verner & Larson 1989), and
their use has limitations. It has been argued that
species richness, a component of species diversity, is
more easily and more accurately measured. Species
richness is highly correlated with species diversity
and can be interpreted more clearly (Verner&
Larson 1989). An example of the value of a diversity
index (but one that is admittedly extreme) is a
comparison of two communities, each containing five
species and each with a total of 100 individuals.
Community A contains 96 individuals of species 1
and 1 individual of each of the other 4 species;
community B contains 20 individuals of each of five
species. Which community is more diverse? If one
feels that both are equally diverse, then species
richness is all one needs to take into account.
However, if one’s view is that community B is more
diverse, because its bird community is more
heterogeneous, then one is justified in using a
diversity index. However, keep in mind that more
assumptions are required to estimate diversity than
species richness. In particular, calculations of
species diversity assume that relative abundance is
accurately estimated and ignores the differences in
detectability among species that can skew estimates
of relative abundance.
The most widely used diversity index is referred to
as Shannon’s index, or as the Shannon-Wiener index
or the Shannon-Weaver index (Krebs 1989).
Shannon’s index, which is derived from information
theory, reflects both species richness and evenness
of distribution among species present. An equation
for the Shannon index, using natural logarithms
(ln) is:
where S = number of species in the sample, and pi
is the proportion of all individuals belonging to the
ith species. The original Shannon index was
calculated in terms of logarithm base 2, and thus H'
was expressed in terms of bits; however, it is more
common and more convenient to use natural
logarithms, as we have done above. A useful
transformation of H' is given by eH', which has been
labeled N1 (MacArthur 1965).
N1 expresses diversity in terms of species instead of
bits and thus is easier to interpret. N1 provides the
number of species that would, if each were equally
common, yield the same H' value as the actual
sample. For example, suppose there are 3 species, 20
of species A, 20 of species B and 10 of species C.
Using the above equation, H' = 1.055 and N1 = 2.87.
These three species, in their uneven distribution,
yield the same diversity value as would 2.87 species
of equal abundance. A comparison of species
richness (= S = 3) with N1 (= 2.87) gives us a
measure of evenness of species distribution. That is
the species distribution is maximally even when
S=N1.
For a fixed S, the maximum diversity (Hmax) is equal
to –ln (1/S) = ln(S) and therefore the ratio of
observed diversity to maximum diversity is a
measure of evenness (E): E = H'/Hmax = H'/ln S
i=S
H′ = Σ(pi)(lnp), i=1, 2,…S
i=1
Assessment of Abundance and Species Composition Using Point Counts 9
(Examples 1 and 2). If some species are more
detectable than others this will bias one’s measure of
diversity, either upwards or downwards. If the
Shannon index is calculated for a number of
samples, the indices themselves will be normally
distributed, making it possible to use parametric
statistics to compare sets of samples using diversity
indices (Magurran 1988). Further techniques for the
analysis of diversity patterns are described in
Magurran (1988) and Pielou (1975).
Example 1:
Calculation of Summary Statistics
The following is a simple and hypothetical example
of data collected using point counts (Table 3).
Observations were made at 3 point count stations at
3 different times during the breeding season.
Species are uniquely identified by a single letter
(A, B, C, etc.).
From these data, summary statistics can be
calculated, first of all summing (or averaging) across
the three survey periods, and then summing (or
averaging) across the three point count stations
whose data have already been summed over the 3
survey periods. Such a summarization is shown in
Table 3B.
The results shown in Table 3A for each point count
station can be used in a statistical analysis (e.g.,
regression or ANOVA) (Example 3). The biologist
may also summarize results for a group of point
count stations characterized by an important
similarity, e.g., all stations at a specific site, or all
stations in a specific habitat on a refuge, or other
unit of interest.
The row titled “Average” in Table 3B (second from
the bottom), simply averages the results from point
count stations 1-3. The row titled “Cumulative”
(bottom) shows the total number of individuals seen
at the 3 point count stations (a measure of
abundance), the total species richness for the 3
stations, and the species diversity as measured for
all 3 stations taken together. Thus, the average
station had 5 species, but the three stations together
had 7 different species. For average number of
individuals seen per point count survey, the
“Cumulative” value is simply three times that of the
“Average” value (i.e., 12.33 = 4.11 × 3). Thus the only
difference between these two measures is that in
one case one sums the number of individuals and
divides by the number of point count stations and in
the other case one sums and does not divide. Any
statistical results will be identical whichever
measure of individuals detected is used, except for a
10 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 3. Example of data from point count observations conducted at three point count stations, three times
during the breeding season.
A. Results by species. “A, A” indicates two individuals of species A were seen, “A, A, A” indicates three individuals,
“A, B, C” indicates one individual of three species, etc.
Point Count Survey Species Number of Species
Station Number Observed Individuals Richness
1 1 A, A, B, C 4 3
1 2 B, B, C, D 4 3
1 3 A, C, D, E 4 4
2 1 B, B, B, C 4 2
2 2 B, B, D 3 2
2 3 B, B, F 3 2
3 1 B, C, C, D 5 3
3 2 B, C, E, F, F 5 4
3 3 B, C, E, F, F, G 6 5
B. Summarization of data from Table 4A.
Point Count Average Number Cumulative Ecological
Station Individuals Species Richness Species Diversity1 Eveness = E
1 4.0 5 4.69 0.960
2 3.33 4 2.56 0.678
3 5.0 6 5.24 0.924
Average 4.11 5.0 3.86 0.839
Cumulative 12.33 7 5.55 0.881
1 Shannon’s index expressed as N1
constant (in this case, 3, the number of point count
stations).
In contrast to measures of abundance, Average
species richness and Cumulative species richness,
will generally not be so simply related to each other.
At one extreme average species richness will equal
cumulative species richness where there is complete
overlap of species at each point count station. At the
other extreme, cumulative species richness will be
three times that of average species richness
(assuming one is summarizing data from three point
count stations) provided there is no species overlap
at any point count station. Reality will usually fall
somewhere in between. Either way of summarizing
species richness can be justified. The same holds for
species diversity; the average diversity (per point
count station) and the diversity of the group of point
count stations are both legitimate ways to
characterize diversity.
Community Similarity Indexes
Another method of comparing communities is to
measure the degree of association or similarity in
community composition between sites or samples.
For example, two sites may be identical in species
richness, but both have completely different species.
For this purpose, a wide range of similarity indices
have been developed (Magurran 1988). Two such
indices that are widely used and that rely only on
presence/absence data are the Jaccard index and
Sorensen index (Krebs 1989):
where j = the number of species found at both site A
and B, a = the number of species in site A and b =
the number of species found in site B. These indices
are designed to equal 1 where the species from the
two sites are the same and 0 if the sites have no
species in common. Example 2 and Table 4 provide
an example of a calculation of Jacard and Sorenson
similarity coefficients.
One of the advantages to these indices is their
simplicity, but the indices do not account for
differences in the abundance of species. All species
count equally in the equation whether they are
abundant or rare. For this reason, quantitative
indices of similarity have much appeal as an
alternative. Again, many such indices have been
developed (Magurran 1988, Krebs 1989). Here we
just mention one of the simplest, the Renkonen
2j
Sorenson Cs = _______
a+b
j
Jaccard Cj = _______
a+b–j
index, also called the Percentage Similarity index.
The formula for the Renkonen index (P) is:
where pA
i is the percentage of species i in sample A
and pB
i is the percentage of species i in sample B
and S is the number of species found in either
sample. With no overlap between samples the index
equals 0, with complete similarity the Renkonen
index equals 100%. Table 4 provides an example of
the Renkonen index.
Example 2:
Calculation of Community Similarity Indices
The following is an simplified example of data
collected using point counts (Table 4). Observations
were pooled using the highest number counted
during 3 different surveys in the breeding season,
and pooled across 5 paired treatment-control plots
(modified from Dieni 1996). Community similarity
and diversity indices can be calculated and
comparisons made using these data.
The row titled “number of individuals” in Table 4 is
the sum of the total number of individuals counted in
each site. The columns titled “pa” and “pb” are the
proportion of each species in the total; i.e., the
number of individuals divided by the total number of
individuals for that site. The calculation of Jaccard’s
index is the number of species in common (j) to both
sites divided by the difference between the sum of
the number of species in each site minus the number
in common. Sorenson’s index is 2 times j divided by
the summation of the number of species in both
sites.
Other indices may also be informative including the
Renkonen index which is calculated by taking the
summation of the minimum of either pa or pb. Other
examples of the calculations of indices that may be
useful are shown in Table 4.
Linear Regression
To introduce linear regression, and provide a simple
example of trend analysis we consider the following.
Example 3:
An Example of Simple Regression
Black-headed Grosbeaks (Pheucticus
melanocephalus) have been surveyed at the
Palomarin station of Point Reyes National Seashore
during the breeding season for many years. Here we
present data from 1980-1992 (13 years) and wish to
i=S
P = Σ minimum (pA
i , pB
i)
i=1
Assessment of Abundance and Species Composition Using Point Counts 11
determine if there has been a trend for numbers to
increase or decrease during this period.
Keep in mind four key assumptions of linear
regression analysis:
1. Normality of residuals
2. Homoscedasticity; that is, there are no systematic
differences in variance of residuals
3. Independence of the outcome variable (i.e.,
independence of residuals), and
4. That we are interested in testing the hypothesis
(HA) that there is some sort of linear relationship
between dependent and independent variable. In
this case, the hypothesis is that bird abundance is
decreasing or increasing with time, in a linear
fashion.
Note that assumptions 1-3 refer to residuals, i.e., the
difference between the observed value of the
dependent (i.e., outcome) variable and the predicted
value from a regression model, we have to fit a
regression model before we can evaluate the
residuals. Figure 1 shows observed data and fitted
regression lines for this example, for
log-transformed data (Figure 1A), and for
untransformed data (Figure 1B). The log
transformation is commonly used in analyses of
linear models (e.g., regression and ANOVA;
additional examples below). There are two reasons
for using a logarithmic transformation:
12 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 4. Calculation of diversity, similarity and evenness indices using total bird detections across sites in
burned and unburned aspen (Populus tremuloides) stands in Wyoming (modified from Dieni 1996).
Number of Individuals Statistical Transformations
Minimum
Species Burned Control pa or pb pa pa ln pa pb pb ln pb
Red-tailed Hawk 2 0 0.000 0.004 –0.021 0.000 0.000
American Kestrel 1 0 0.000 0.002 –0.012 0.000 0.000
Northern Flicker 36 20 0.036 0.068 –0.182 0.036 –0.119
Western Wood-Pewee 21 39 0.040 0.040 –0.128 0.070 –0.186
Dusky Flycatcher 13 9 0.016 0.024 –0.091 0.016 –0.066
Tree Swallow 47 29 0.052 0.089 –0.215 0.052 –0.153
Clark’s Nutcracker 3 0 0.000 0.006 –0.029 0.000 0.000
Black-capped Chickadee 13 18 0.024 0.024 –0.091 0.032 –0.110
White-breasted Nuthatch 0 3 0.000 0.000 0.000 0.005 –0.028
Red-breasted Nuthatch 1 6 0.002 0.002 –0.012 0.011 –0.049
House Wren 127 142 0.239 0.239 –0.342 0.254 –0.348
Hermit Thrush 0 1 0.000 0.000 0.000 0.002 –0.011
American Robin 38 47 0.072 0.072 –0.189 0.084 –0.208
Warbling Vireo 163 199 0.307 0.307 –0.363 0.355 –0.368
Orange-crowned Warbler 14 40 0.026 0.026 –0.096 0.071 –0.189
Brewer’s Blackbird 3 0 0.000 0.006 –0.029 0.000 0.000
Western Tanager 2 2 0.004 0.004 –0.021 0.004 –0.020
Pine Siskin 33 3 0.005 0.062 –0.173 0.005 –0.028
American Goldfinch 1 0 0.000 0.002 –0.012 0.000 0.000
Cassin’s Finch 13 2 0.004 0.024 –0.091 0.004 –0.020
Number of individuals 531 560
Number of species 18 15
Number of species in common (j) 13
Summations 0.827 1.0 –2.095 1.0 –1.903
Jaccard (Cj) 0.650
Sorenson qualitative (Cs) 0.788
Renkonen index (P) 0.827
Shannon diversity (H) 2.095 1.903
Shannon evenness (E) 0.725 0.703
Shannon maximum value (Hmax) 2.890 2.708
1. Linear models assume additivity, but the
relationship between the dependent variable and an
independent variable may be multiplicative, i.e., with
an increase of each unit in x, y increases by a
constant proportion. Exponential growth or decline
of a population is a good example of a multiplicative
model. In this case, we may wish to fit a model in
which Black-headed Grosbeak numbers increase or
decrease by d% per year; our objective is to estimate
the value d, and test whether it is significantly
different from zero.
By taking logarithms, one can convert a
multiplicative relationship,
y = abx,
into an additive relationship,
log(y) = log(a) + (log(b))(x).
What was once a multiplicative relationship can be
rewritten in an additive form,
y′ = a′ + b′x.
2. The logarithmic transformation can often
normalize residuals (as shown below), thus
conforming to an important assumption of
regression analysis, as well as of ANOVA, ANCOVA,
and similar analysis.
A regression analysis on the log-transformed data is
appropriate, but before doing so, we present typical
output from STATA (Table 5) from a regression
analysis with annotated comments (numbers below
correspond to numbers on the output). Table 5A
shows analysis of log-transformed data; Table 5B
shows analysis of untransformed data.
1. Sums of Squares (“SS” in Table 5), degrees of
freedom (“df ”), and Mean Squares (“MS”) are
provided for the model being examined. This output
is usually of greater interest in ANOVA than in
regression analyses. Sums of squares are included in
R2 and R2a (#3, below). “Model” refers to
independent variables (in this case, only one) and
does not include the “constant” term.
2. The F statistic (“F”) for the entire model
(excluding the constant) is shown, and the P-value
associated with that statistic (“Prob >F”). The
degrees of freedom of the numerator (the first term
within the parentheses) equals the number of
parameters in the model, excluding the constant. If
a model includes linear trends for two independent
variables, the numerator df is equal to 2. If the
model, instead, includes quadratic and linear terms
for a single independent variable then the
numerator df is also equal to 2. If the model includes
linear trends for two independent variables and
their interaction, then the numerator df is equal to 3,
and so on.
The overall P-value, while of some interest, should
be of less concern than P-values for individual terms.
A model which contains one very significant
independent variable and one insignificant
independent variable can generate a highly
significant overall P-value, though such a model
Assessment of Abundance and Species Composition Using Point Counts 13
Figure 1. A. Linear trend in log(number Black-headed Grosbeaks observed) in relation to year (1980 to
1992), (statistical analysis in Table 5A). Triangles indicate log(number observed) in each year; solid line
indicates best-fitting trend using linear regression analysis. The trend depicted is a log-linear trend.
B. As in A. but numbers observed are untransformed. Statistical analysis in Table 5B; trend depicted is a
linear trend. Note that trend line fits observations better for log-transformed data (Figure 1A) than for
untransformed data (Figure 1B); e.g., with a higher R2 0.637 vs. 0.545.
Figure 1A. Trend, log-linear, P=0.001
Black-headed Grosbeak, Palomarin 1980-1992
Figure 1B.Trend, linear—notransformation,P=0.004
Black-headed Grosbeak, Palomarin 1980-1992
would be undesirable. On the other hand, if two
independent variables are highly correlated, each
variable could be insignificant (when controlled for
the other), yet the overall model could be very
significant and provide a good predictive model.
3. R2 (“R-square”) and adjusted R2 (“Adj R-square”).
The first statistic is often referred to as the
coefficient of determination. While it should be
familiar to all field biologists, much confusion still
surrounds its use or abuse (Anderson-Sprecher
1994). The second statistic is probably unfamiliar to
many, yet should be more widely known and used
(Neter et al. 1990, Kleinbaum et al. 1988). R2 can be
interpreted as the proportion of variation in the
dependent variable that can be accounted for by the
model in question. Both statistics provide a measure
of the predictive ability of a model. If R2 and
adjusted R2 are low, this means that much variation
in the Y variable is not accounted for by the model,
but this does not reflect on the adequacy of the
model. In Table 5A, R2 = 0.637, meaning that 36% of
the variation in Black-headed Grosbeak numbers is
not accounted for by an exponential decline in
numbers with increasing year.
There are several drawbacks to R2. For one, any
regression model will have a positive R2 associated
with it, even a regression model that links two
variables that are completely unrelated. To provide
an example, we generated two random variables X,
Y, integers chosen from a uniform distribution (0,
100) and which were independent of each other.
Values for X were (3, 67, 98, 63, 25, 90, 34, 4, 31, 78)
and for Y were (44, 91, 30, 92, 26, 56, 57, 90, 81, 47).
Regressing Y on X we obtain R2 = 0.021 (P = 0.69).
We would feel uncomfortable in stating that “X
accounted for 2.1% of the variation in Y,” since in
reality we know that it accounts for no such
variation. The second drawback is that as one adds
additional terms (additional independent variables),
R2 will always increase (Neter et al. 1990). Adjusted
14 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 5A. Linear regression analysis of number of Black-headed Grosbeaks, breeding season,
log-transformed (=ltotbrs) vs. year.
Source SS df MS Number of obs = 13
Model 2.71854704 1 2.71854704 F (1, 11) = 19.31
Residual 1.54865857 11 .140787142 Prob > F = 0.0011
Total 4.26720561 12 .355600468 Rsquare = 0.6371
Adj Rsquare = 0.6041
Root MSE = .37522
ltotbrs Coef. Std. Err. t P>|t| [95% Conf. Interval]
year .1222173 .0278129 4.394 0.001 .183433 .0610016
_cons 245.1384 55.23646 4.438 0.001 123.5637 366.713
Table 5B. Linear regression analysis of number of Black-headed Grosbeaks, breeding season,
untransformed (=totalbrs) vs. year.
Source SS df MS Number of obs = 13
Model 298.291209 1 298.291209 F (1, 11) = 13.20
Residual 248.631868 11 22.6028971 Prob > F = 0.0039
Total 546.923077 12 45.5769231 Rsquare = 0.5454
Adj Rsquare = 0.5041
Root MSE = 4.7543
totalbrs Coef. Std. Err. t P>|t| [95% Conf. Interval]
year 1.28022 .3524085 3.633 0.004 2.055866 .504574
_cons 2554.44 699.8845 3.650 0.004 1014.004 4094.875




R2 (R2a) was developed to counteract these
drawbacks. Adjusted R2 is defined as
where n = number of observations and p = number
of parameters (including the constant), SSE equals
Sums of Squares of the Residual and SSTO equals
Total Sums of Squares. Note that
In other words, R2a is equal to R2 after multiplying
the proportion of unexplained variance by
(n–1)/(n–p). This ratio (the adjustment factor) is
always equal to or greater than one, and therefore
R2a will always be less than or equal to R2. As n gets
large this ratio diminishes, and as p gets large, the
ratio increases. The properties of R2a are that:
1. If there is no relationship between two variables,
R2a will, on average, be equal to zero. Thus, under
the null hypothesis, R2a provides an unbiased
measure of the true relationship between the two
variables. In other words, if Y and X are completely
unrelated, R2a but not R2 will, on average, equal
zero. In the example cited above (of random X and
Y), R2a = –0.101. Any R2a less than 0 makes it
unambiguously clear that one variable does not have
value in predicting the other.
2. R2a will not necessarily increase as one adds
parameters. If the gain in R2 is small, then R2a can
decrease because the gain in R2 does not offset the
decrement due to the increase in p. Thus, R2a can
provide a good means of selecting the best
predictive regression model. In fact, the model
which maximizes R2a is also the model that
minimizes Mean Square Error (equivalently, Root
MSE), which is a measure of residual variation
about the predicted regression line.
4. Root Mean Square Error (“Root MSE”). This
provides a measure of the variability about the
regression line. In other words, it is the residual
variation left after allowing for the effect of, in this
case, year on Black-headed Grosbeak numbers. It is,
literally, the square-root of the Mean Square
associated with the Residual term (i.e., “error”).
Root MSE would equal the standard deviation of the
outcome variable if there were no explanatory power
to the independent variable (i.e., R2 = 0.0);
otherwise, Root MSE is less than the standard
 SSE 
______
SSTO R2 =1–
 SSE 
______
SSTO
 n–1
____
 n–p R2a = 1 –
deviation. Note that Root Mean Square Error in this
example is the measure of variance which the
programs MONITOR and TRENDS ask for
(described in detail below).
5. The regression coefficients (“Coef.”), their
standard errors (“Std. Err.”), and results of t tests,
examining whether t is significantly different from
zero, are shown (“t” and “P>|t|,” respectively).
Shown first is the regression coefficient for the
independent variable, Year. From Table 5A, our best
estimate (assuming that linear regression
assumptions are met) is that the number of birds
observed declines at an instantaneous rate of 0.122
units, expressed in natural logarithms. This
translates to an 11.5 percent decline per year, i.e.,
each year the number of detected birds is 0.885 times
that of the previous year. When the untransformed
data are analyzed (Table 5B), the best estimate is a
decline of 1.28 birds per year.
Shown below the coefficient for year is the
coefficient for the intercept term (here termed
“constant”). The value of the intercept term
provides the predicted value when the independent
term (here Year) equals zero. Thus its value depends
on how the independent variable is coded. Year = 0
might refer to the year 0, to the year 1900, or to any
other year so designated. The designation is
arbitrary and won’t affect the regression coefficient
for the term, Year. Note that STATA evaluates the
regression coefficient for Year using a two-sided
test, which we consider appropriate.
6. The 95% confidence interval for the regression
coefficients are presented. We recommend that
biologists examine confidence intervals for
regression coefficients; a confidence interval can
provide clear evidence of the precision (or lack of
precision) of our analysis. For an example, where an
analysis indicates no significant effect, a confidence
interval may indicate that a very broad range of
values is consistent with the data.
Comparing Table 5A and 5B (corresponding to
Figure 1A and 1B), we see that log-transformation
(Table 5A, Figure 1A) produces a better fitting
model (higher R2, more significant P-value) than
does analysis of untransformed data. This implies
that Black-headed Grosbeaks are declining at a,
more or less, constant proportion rather than at, a
more or less a constant decrease, using the absolute
number of individuals. This result makes biological
sense. Evaluating residuals confirms that the log-transformed
model is preferable. For example, we
can evaluate whether skewness and kurtosis of
residuals deviates from normality for each model
using the Skewness/Kurtosis test in the program
STATA (StataCorp. 1999). For log-transformed data,
Assessment of Abundance and Species Composition Using Point Counts 15
we cannot reject the hypothesis of normality (P =
0.25), whereas for untransformed data we can reject
the assumption of normality (P = 0.0003) (results
obtained using “sktest” in the program STATA).
Results will not always be this clear-cut; we may
want to use graphical methods to examine normality
of residuals. Figure 2 shows a normal probability
plot for transformed (Figure 2A) and untransformed
data (Figure 2B). We won’t go into the details of
these plots (interested readers can refer to
Kleinbaum et al. 1988, Neter et al. 1990); the main
point is that if residuals are normally distributed,
the data points will fall on the straight line shown.
For the log-transformed data there is a reasonably
good match between data points and the line; for
untransformed data there is not. The graphical
method does not determine whether or not the
residuals are normally distributed. It does indicate
to what extent transformation is or is not improving
the normality of residuals.
Example 4:
Application of Simple and Multiple Regression
We now tackle a more complex example, taken from a
study by the Point Reyes Bird Observatory,
conducted for the California Department ofFish&
Game (Nur et al. 1994).We use this example as an
opportunity to provide guidance in carrying out
multiple regression analysis. In July1991an herbicide
was accidentally spilled in and near the Sacramento
River, close to Dunsmuir,CA, resulting in the death of
all aquatic forms of life for a 36-mile stretch of river.
In addition, terrestrial fauna and flora along the river
were thought to have been impacted. Nur et al. (1994)
report results of an avian monitoring project
designed to assess the impact of the spill on
terrestrial bird populations.Aquantitative measure
of presumed impact was developed by California
Department ofFish&Game biologists, relying on
defoliation, leaf death and other symptoms of stress
exhibited by the riparian vegetation, which we term
the Vegetation Damage Index. Sites along the river
varied in the degree of impact, depending on the
exposure to the herbicide. In general sites closer to
the spill site in the downstream direction received
greater damage, and therefore higher values of the
damage index. Point counts were laid out in transects
of 7 stations per transect, stations spaced 300m
apart, with each transect parallel to the river and1800
min length, with one transect per “site”. All transects
were in riparian habitat.
In general, there was a tendency for areas with high
damage to show low species richness (Figure 3). In
particular, there was an overall significant linear
trend for bird species richness to decline with
increasing damage, when analyzing all 55 point
count stations. Output for this analysis is shown in
Table 6A (using the program STATA). Note that in
Table 6A, R2 = 0.149, meaning that 85% of the
variation in species richness among point count
16 Statistical Guide to Data Analysis of Avian Monitoring Programs
Figure 2. Evaluating the assumption of normality using graphical techniques. Comparison of “Normal
probability plots” depicting residuals from analysis of log-transformed (Figure 2A) and untransformed
(Figure 2B) observations of Black-headed Grosbeaks. Figure 2A and 2B depict the empirical cumulative
distribution function expected if the variable were normally distributed (y-axis) vs. the observed cumulative
distribution function (x-axis). If the variable in question were normally distributed then the graphed points
would fall exactly on the solid line and the correlation between the two cumulative distribution functions
would be +1.0. Log-transformed data conform better to a normal distribution than do untransformed
observations.
Figure 2A. Normal probability plot,
residuals of log-transformed data
Black-headed Grosbeak
Figure 2B. Normal probability plot,
residuals of untransformed data
Black-headed Grosbeak
stations is not accounted for by differences in the
damage index. Our interpretation of this result is
that species richness data from individual point
count stations are very variable. The model is,
however, highly significant, and we have no reason to
think the model is inadequate.
One needs to keep in mind that there are two
different objectives for which one can use regression
models: (i) hypothesis testing, and (ii) prediction. In
this case, a model with only vegetation damage
would poorly predict species richness at a specific
point count station. However, such a model achieves
the objective of confirming the hypothesis that
biological damage resulting from the spill was
associated with diminished species richness.
Also keep in mind that the magnitude of R2 depends
on the unit of analysis. If one were to average data
from several point counts and then use the averaged
data in a regression analysis, this would have little
effect on the P-value, yet would increase R2
substantially. This is because some of the variation
in the dependent variable has been eliminated by
using mean species richness values in the regression
analysis, rather then species richness at individual
point count stations.
We confirmed that the linear regression analysis in
Table 6A is appropriate, first by examining
normality of residuals: P = 0.50, using the skewness/
kurtosis test (“sktest” of STATA). In other words,
residuals do not appear to deviate from normality.
We demonstrate this point graphically in Figure 4.
Figure 4A shows the frequency distribution of
residuals compared to a normal distribution; Figure
4B shows a quantile-normal plot for the residuals
from Table 6A. (Quartiles and percentiles are
examples of quantiles; a quantile-normal plot shows
quantiles for the distribution of interest vs. quantiles
from a normal distribution which matches the first
distribution in terms of mean and variance.) As with
a normal-probability plot (Figure 2), if the
distribution is indeed normal, then the data points
(quantiles in this case) would fall on the solid line
shown in the Figure 4B. In this case, there seems to
be a very good match, implying that residuals are
approximately, normally-distributed.
That bird species richness was correlated with the
Vegetation Damage Index is not by itself adequate
evidence for a causal link. In a similar fashion to the
analysis in Table 6A, a suite of vegetation
characteristics were examined, to determine
whether bird species richness, diversity and
abundance were related to habitat or vegetation
features. If so, such habitat variables could be
confounding any relationship of the bird fauna to the
impact of the spill. In one scenario, there could be no
true functional relationship between bird species
richness and vegetation damage, but a correlation
between the two can arise if both are correlated with
a vegetation feature. In another scenario, the true
causal relationship between bird species richness
and vegetation damage could be strong but it could
be masked, wholly or in part, because both are
correlated with a vegetation feature. For example, if
biological damage from the spill lowered species
richness, and the presence of willow (Salix spp.)
increased species richness, then if biological damage
was greatest in an area where willows were most
abundant, the correlation between biological
damage and bird species richness could be very
weak despite a strong causal relationship between
the latter two variables.
Nur et al. (1994) examined 25 habitat features to
determine whether they might be correlated with
abundance, species richness and/or species diversity.
They found that only two habitat features were
significantly correlated with abundance, species
richness and diversity. The latter variables were
positively correlated to the presence of willow
species and negatively with the presence of big-leaf
maple (Acer macrophyllum), i.e., the more big-leaf
maple, the fewer the bird species detected. The
independent variables were indices based on percent
cover that was willow (on a 0 to 10 scale,
corresponding to 0 to 100%), and percent cover of
big-leaf maple. Results of simple linear regression of
bird species richness in relation to willow cover and
Assessment of Abundance and Species Composition Using Point Counts 17
Figure 3. Bird species richness from 55 point count
stations along the Sacramento River, in relation to
Vegetation Damage Index. Higher values imply
greater damage from spill of metam sodium
(statistical results in Table 6A). Least squares line of
best fit is shown. Data at each point count station
have been “jittered” (Stata Corp. 1997) to reduce
overlap of points.
Figure 3. Bird species richness in relation to
Vegetation Damage Index.
in relation to big-leaf maple cover are shown in
Tables 6B and 6C, respectively.
The next step in the analysis was to conduct a
multiple regression analysis including the three
independent variables (Table 7). In this case, the
primary interest was the effect of damage index
while controlling for the two habitat variables. The
results indicate that damage was still inversely
correlated with species richness, even after
controlling for one or the other habitat variable, or
after controlling for both of the habitat variables
(Table 7). These results give support to the view that
biological damage due to the spill reduced species
richness along the river. The results do not support
the alternative view that the inverse association
between species richness and damage was
coincidental, reflecting habitat or vegetation
differences among sites along the river.
The degree and direction of differences among
independent variables (if it exists) can be assessed
by comparing regression coefficients in the
simple regression analysis (Table 6) and in the
corresponding multiple regression analysis
18 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 6. Sample output for linear regression analyses using STATA. See text, Example 4.
A) model: species richness [specrich] = Vegetation Damage Index [vegdindx]
Source SS df MS Number of obs = 55
Model 118.969398 1 118.969398 F (1, 53) = 9.28
Residual 679.466965 53 12.8201314 Prob > F = 0.0036
Total 798.436364 54 14.7858586 Rsquare = 0.1490
Adj Rsquare = 0.1329
Root MSE = 3.5805
specrich Coef. Std. Err. t P>|t| [95% Conf. Interval]
vegdindx 1.663229 .5459849 3.046 0.004 2.758336 .5681219
_cons 9.368597 .5243445 17.867 0.000 8.316895 10.4203
B) model: species richness [specrich] = willow cover [willotco]
Source SS df MS Number of obs = 55
Model 78.2553574 1 78.2553574 F (1, 53) = 5.76
Residual 720.181006 53 13.5883209 Prob > F = 0.0200
Total 798.436364 54 14.7858586 Rsquare = 0.0980
Adj Rsquare = 0.0810
Root MSE = 3.6862
specrich Coef. Std. Err. t P>|t| [95% Conf. Interval]
willotco .0838145 .0349257 2.400 0.020 .0137624 .1538667
_cons 8.184659 .549244 14.902 0.000 7.083015 9.286303
C) model: species richness [specrich] = big-leaf maple Cover [bigleaco]
Source SS df MS Number of obs = 55
Model 111.19329 1 111.19329 F (1, 53) = 8.58
Residual 687.243074 53 12.9668504 Prob > F = 0.0050
Total 798.436364 54 14.7858586 Rsquare = 0.1393
Adj Rsquare = 0.1230
Root MSE = 3.601
specrich Coef. Std. Err. t P>|t| [95% Conf. Interval]
bigleaco .0290063 .0099058 2.928 0.005 .0488740 .0091387
_cons 9.799174 .6043525 16.214 0.000 8.586997 11.01135
(Table 7). The effect of damage index was similar
when analyzed by itself or after controlling for
willow tree cover (ß = –1.66 ± 0.55 vs. ß = –1.58 ±
0.53). This indicates that willow cover did not
confound the relationship between vegetation
damage and species richness. On the other hand, the
apparent effect of damage index, was stronger when
analyzed by itself than after controlling for big-leaf
maple (ß = –1.66 ± 0.55 vs. ß = –1.26 ± 0.56). Big-leaf
maple tended to be more prevalent in areas
where biological damage was greater (in fact, there
was a significant correlation between the two,
P<0.01) and thus part of the apparent reduction in
species richness with increasing damage may be
attributed to the influence of big-leaf maple.
Analyzing Vegetation Data in Relation to
Point Count Data
In Example 4 (Tables 6-7), we provide an example of
vegetation data analysis coupled with analysis of
data on bird populations. In this case, the objective
was to determine whether the relationship between
vegetation damage and species richness was due to a
direct effect of spill-induced damage, or whether the
correlation was spurious and due to the fact that
both were correlated with additional habitat
variables. There was evidence that bird species
richness was related to habitat variables (specifically
the presence of willow and big-leaf maple), but these
relationships could not by themselves account for
the observation that bird species richness declined
as vegetation damage increased.
Collecting data on many habitat and vegetation
features doesn’t answer the question of which
habitat and vegetation features are causally related
to bird abundance or distribution. If data on
important variables are not collected then
interpretation of the data that were collected can be
compromised. There is still the problem of sifting
through the data to determine which features are
most closely related to the response variable in
question. Many techniques have been used by
investigators to evaluate multi-dimensional data,
including logistic regression, discriminant analysis,
principal component analysis, correspondence
analysis and MANOVA (Ludwig & Reynolds 1988,
Trexler & Travis 1993). It is beyond the scope of this
Guide to review these various techniques; however,
an example of the use of discriminant analysis is
presented in the section on vegetation analysis in
relation to nest-monitoring.
Design
Even if statistical inference is of no concern to the
investigator, one must decide on sample size. Ralph
et al. (1995) recommend at least 30 point count
stations per habitat and per area of interest. If one
wished to monitor two habitats in each of two areas
then this would necessitate 120 point count stations.
This is only a base number and the number of
stations should be increased if few individuals of a
species or group of species of interest are detected.
Where statistical inference is a goal, then sample size
will be dictated by considerations of statistical
Assessment of Abundance and Species Composition Using Point Counts 19
Figure 4. Evaluation of the assumption of normality of residuals using graphical techniques. Residuals of
linear regression analysis of species richness are depicted (statistical model in Table 6A). Figure 4A)
Frequency distribution of residuals (histogram), superimposing a frequency distribution for a normally-distributed
variable with the same mean and variance as the observed variable. Figure 4B) Same residuals
as in A) but graphed using a quantile-normal plot. The quantiles for the observed distribution (residuals as
in Figure 4A) are plotted against the quantiles from a normally-distributed variable with the same mean and
variance as the variable in question. Normality is demonstrated if the observations fall on the solid line.
Both A) and B) confirm that residuals are normally distributed.
Figure 4A. Distribution of residuals: species
richness vs. Vegetation Damage Index
Figure 4B. Quantile-quantile plot of residuals of
species richness vs. Vegetation Damage Index
against normal distribution
power; examples include comparisons among
treatments and monitoring programs that assess
temporal trends (Gerrodette 1987, 1991, Link &
Hatfield 1990; for a general review, see Thomas &
Krebs 1997).
Dawson (1981) has derived an equation for
determining the sample size, in this case the number
of point count stations necessary to detect an effect
of interest with 50% power. He assumed that each
station is surveyed once, and that the number of
detections (for each species or group of species) at
each station follows a Poisson distribution. If the
distribution of bird-detections deviates from Poisson
distribution, then the formula would need to be
revised. Under a Poisson distribution the mean
number of detections and the variance in the
number of detections per point count station would
be equal. If the variance substantially exceeds the
mean, or vice versa, one could either modify his
formula, or use other formulas (as modified below).
Dawson also assumed that there were two groups
(two habitats, two treatments, etc.) of interest. The
formula for sample size calculations is
[1]
(3.84)(20000)
n > ______________
(d2)(m)
where m = average number of detections per
sampling unit and d = percent difference between
group 1 and group 2, defined as
For example, if m1 = 2.5 and m2 =1.5, then m = 2.0
and d = 50.
Thus, where the average number of individuals
detected per station is equal to 1, the number of
stations per group necessary to achieve 50% power
to detect a 50% difference in mean abundance is
30.7. To detect a 25% difference would require 123
point count stations per group. That is, to detect
half the difference (all else being equal) requires 4
times the sample size! This exemplifies a general
rule—precision increases, and therefore standard
errors decrease, in proportion to the square root of
sample size. Note that as average number of
detections increases, sample size (number of
stations) decreases linearly—and proportionally.
This emphasizes that calculations of necessary
sample sizes reflect the average number of
detections. Thus, if the average number of
individuals detected per station is 0.5 rather than
m1–m2
_______
 m 
d = (100)
20 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 7. Analysis of point count data on Sacramento River. Relationship of bird species richness to Damage
Index, controlling for vegetation/habitat characteristics (n = sample size).
A. Multiple regression analysis of bird species richness per point count station in relation to vegetation
damage and willow cover
Model Statistics Independent variable1
R2a = 0.201, R2= 0.231, P = 0.0011, n =55 Vegetation Damage Index ß = –1.577 ± .525, t = –3.00, P = 0.004
Willow cover2 ß = +0.0770 ± .0326, t = +2.36, P = 0.022
B. Multiple regression analysis of bird species richness per point-count station, in relation to vegetation
damage and big-leaf maple cover.
Model Statistics Independent variable1
R2a = 0.186, R2= 0.216, P = 0.0018, n = 55 Vegetation Damage Index ß = –1.264 ± .562, t = –2.25, P = 0.029
Big-leaf maple cover2 ß = –0.0213 ± .0102, t = –2.10, P = 0.041
C. Multiple regression analysis of bird species richness per point-count station, in relation to vegetation
damage, willow cover, and big-leaf maple cover
Model Statistics Independent variable3
R2a = 0.229, R2= 0.272, P = 0.0010, n = 55 Vegetation Damage Index ß = –1.273 ± .547, t = –2.33, P = 0.024
Willow cover2 ß = +0.0651± .0329, t = +1.98, P = 0.053
Big-leaf maple cover2 ß = –0.0170 ± .0101, t = –1.68, P = 0.100
1 Both considered simultaneously
2 On 0 – 10 scale
3 All three considered simultaneously
1.0, then twice as many point count stations are
required, i.e., 246 and 61.4 point count stations
would be required per group to detect a 25% and
50% difference, respectively.
As stated earlier, it is common practice to use a
higher level of statistical power (e.g., 80%) in
designing studies. Dawson (1981) only considered
50% power, but his results can be extended to
consider more stringent levels of power as follows.
For other levels of power substitute the following
(approximate) values for the 3.84 in Equation 1: for
70% power, substitute 6.15; for 80% power substitute
7.84; for 90% power substitute 10.50.
These values were derived from a more general
formula for comparison of means using two-sample
t tests, i.e.,
where n is the required sample size for each sample,
σ1
2 refers to variance in sample 1, etc., μ1 refers to
mean value in sample 1, etc., and zsubscript is a
“normal deviate” (also called z-score). For example,
for α = 0.05, z(1–α/2) = z0.975 =1.96; for power = 0.8,
z0.8=0.84; for power=0.5, z0.5=0.0; for power=0.9,
z0.9=1.28; and so on (Snedecor & Cochran 1989).
The experimenter would set the difference between
means; σ1 and σ2 are fixed by the investigator
(i.e., determined independently). For a Poisson-distributed
variable, μ1 = σ1
2 and μ2 = σ2
2.
Thus, with 1 individual detected on average per
point count station, approximately 250 and 63 point
count stations would be required per group to
achieve 80% power to detect between-group
differences of 25% and 50%, respectively. With 0.5
individual detected per point count, at least double
these sample sizes (500 and 126) would be required
per group to achieve the same 80% power. Note that
the minimum recommendation of Ralph et al. (1995),
i.e., 30 point count stations per habitat or treatment,
provides 80% power to detect a 50% difference given
an average of 2.0 detections per station, but only
yields 50% power to detect a 50% difference given
1.0 detections per station.
Where there are three or more groups of interest,
we recommend the same number of point count
stations per group be maintained. Thus to detect a
50% between-group difference with 50% power and
0.5 individual detected, on average, per station, one
would need either 123 point count stations (total)
allocated among 2 groups or 184 point count stations
[2]
(σ1
2+σ2
2) (z1–α/2+z1–β)2
________________________
( μ1–μ2)2
n =
for 3 groups. This is admittedly a conservative
approach. It would maintain the same degree of
precision per group, whether or not there are two or
more groups.
Buckland et al. (1993) present a formula for
calculating sample size (number of point count
stations) necessary to achieve a specified precision
in estimating population size. They assume that one
has conducted a pilot study of k0 stations and
detected a total of n0 individuals (assuming no
aggregations or clustering of individuals, as is the
case in flocks). Their formula for number of
stations, K, is
where CV(D) is the coefficient of variation of
abundance, i.e., the standard error (not the
standard deviation) of abundance divided by mean
abundance. For example if one wished to estimate
abundance such that the standard error was 20% of
the mean value, then CV(D) = 0.2. b is a factor that
depends on several variables and can be estimated
from pilot data (Buckland et al 1993); however, they
state that it will usually be about 3. Thus if 15
individuals are detected at 10 point count stations
on the pilot study, CV(D) = 0.2 (by design) and b =
3, then K = 3.0⁄0.04 × 0.667 = 50 point count stations.
This will be sufficient to establish a confidence
interval for abundance that ranges ± 40% of the
true value.
Power and Sample Size Analysis Using TRENDS
Recently, there has been a veritable explosion of
software now available for determination of power
and/or necessary sample size to achieve specified
power (Thomas & Krebs 1997, available on the
world-wide web at http://www.interchg.ubc.ca/cacb/
power/review/). Two specialized, free programs are
available for monitoring programs that evaluate
trends, whether those trends are temporal or
spatial.
For analysis of trend data using linear regression, a
user-friendly program, TRENDS (Gerrodette 1987,
1991), has been written by Tim Gerrodette and is
available, free of charge, from the Internet
(ftp://ftp.im.nbs.gov/pub/software/CSE/wsb21515/
trends.zip; also available from T. Gerrodette,
Southwest Fisheries Science Center, P.O. Box 271,
La Jolla, CA 92038, in which case please provide him
a 3.5″ IBM-compatible floppy disk). A User’s Guide
is provided with the software. We offer the following
as guidance in using and interpreting results from
TRENDS.
 k0 
___
 n0 
 b 
 _________ 
 [CV(D)]2 
K =
Assessment of Abundance and Species Composition Using Point Counts 21
TRENDS can be used for either temporal trends or
spatial trends, with regard to changes in abundance.
The program TRENDS can compute any one of the
following parameters:
1. Number of samples (either number of occasions,
for temporal trends, or number of sites for spatial
trends) (n);
2. Rate of change (expressed as proportional decline
or increase of the total population, e.g., –0.10 refers
to 10% decline per unit of time or space; 0.05 refers
to 5% growth per unit of time or space, etc.) (r);
3. Measure of variation about the trend line, which
Gerrodette (1989) refers to as “initial coefficient of
variation” (CV1);
4. Significance level (α);
5. Power (1-ß, where ß is the probability of making a
Type II error).
If four of these parameters are specified, the fifth
parameter is strictly determined, and its value can
be calculated by TRENDS. For example, if one
specifies number of samples, magnitude of the rate
of change, the variation about the trend line, and the
a level, TRENDS calculates power. In the same way
one can calculate the necessary number of temporal
or spatial samples to achieve a specified power to
detect an effect of specified magnitude.
If one is evaluating temporal trends, then the
number of samples refers to the number of sampling
occasions—most commonly the number of years
studied. If several surveys or point counts
contribute to a single year’s data, these would be
averaged together to yield a single datum for that
year. In the program TRENDS, if 10 point counts
are conducted in each of 10 years, the number of
samples is 10, not 100.
The measure of variation about the trend line is
given as the coefficient of variation (= standard
deviation [or standard error] divided by the mean),
symbolized in the program as CV1. CV1 is inversely
related to precision. Gerrodette refers to CV1 as
“initial coefficient of variation”, which is misleading.
The best way to obtain an estimate of CV1 is to use
data from a trend analysis to determine root mean
square error (Table 5) and then divide this by the
mean (or expected value). Gerrodette provides an
example where CV1 was estimated from replicate
counts on the same population in the same year
which was the initial year of the study. We strongly
advise against this practice because doing so
estimates only the part of the variation due to
measurement error. An additional part of the
variation about the trend line is due to stochastic
variation in abundance, which also needs to be
incorporated into CV1.
The other three parameters are straight forward. In
addition, TRENDS requires one to make additional
specifications:
6. Whether to use a 1- or 2-tailed test,
7. Whether population change is linear or
exponential,
8. Whether one’s test statistic is the z or t statistic,
and
9. How CV1 is related to abundance.
Although the first three are straight-forward, we do
have some recommendations. First, a 2-tailed test is
almost always the appropriate test, because the
possibility of an increase in population cannot be
ruled out, and would be of interest just as much as a
decline in population. Secondly, for temporal trends,
exponential growth is to be preferred. Exponential
growth implies that the growth or decline is a
constant percentage, e.g., a population increases at
10% per year. Furthermore, this assumption is in
accord with the definition of r, the rate of change.
Thirdly, if one is estimating CV1 from data, then the
t statistic is appropriate (Link & Hatfield 1990).
The relationship of CV1 to abundance is complex.
TRENDS assumes that variance (VAR) in a
population estimate is (i) proportional to abundance
(A), (ii) proportional to A2, or (iii) proportional to A3.
TRENDS does not allow for VAR to be independent
of abundance, or for VAR to be inversely
proportional to abundance. Noting that mean
abundance is proportional to abundance and that the
CV is the square-root of VAR divided by the mean,
options (i) to (iii) imply that:
(i) CV is proportional to 1/√A,
(ii) CV is independent of A, or
(iii) CV is proportional to √A.
TRENDS allows one to choose among these three
options. Which option one chooses will affect
calculated power, necessary sample sizes, etc.
(Link & Hatfield 1990). As guidance for choosing
among the options, Gerrodette (1987) offers the
following: for quadrats, strip transects, line
transects, or catch per unit effort (CPUE), CV is
proportional to 1/√A. For distance sampling, CV is
independent of A. For the single mark-recapture
using the Peterson method, CV is proportional to
22 Statistical Guide to Data Analysis of Avian Monitoring Programs
√A. Presumably, option (i) applies to standard
(fixed-distance) point count data, since a point
count can be thought of as a line transect of length
zero (Buckland et al. 1993).
Example 5:
Power Calculation Using TRENDS
To provide an example of one of the uses of
TRENDS, we consider surveys of Black-headed
Grosbeaks (based on data given in Example 3 and
Figure 1). CV1 for a single annual count (of log-transformed
data) is estimated to be 0.284 (root
mean square error [see Table 5A] divided by mean
value). Assuming α=0.05, exponential population
change (i.e., constant percentage change each year),
a 2-tailed test, use of the t-statistic, and CV1
proportional to 1/√A, the probability (power) to
detect a 5% decline per year after 10 years is 34%.
If, instead, CV1 is independent of A, then the power
to detect a 5% decline per year after 10 years is 29%.
It is difficult to say whether option (i) or option (ii) is
more appropriate, but the difference between the
two estimates is small. Thus, the power to detect a
substantial decline (amounting to 40% decline after
10 years) is fairly weak. If we increase the time scale
to 15 years, however, power increases to 89% (under
option i). Under this scenario we would have
appreciable power to detect a decline, but after 15
years the population will have declined by 54%. An
alternative means of increasing power would be to
increase the precision of our annual estimate, which
implies lowering the variance about the trend line,
i.e., decreasing CV1. If CV1 could be lowered from
0.284 to 0.200, power would increase (assuming 5%
decline over a 10 year period) from 34% to 59%.
CV1 might be lowered if sources of error could be
reduced (e.g., conduct surveys at the same time of
year) or if replicate surveys were carried out and
the results for each year then averaged.
One can easily use TRENDS to determine, instead,
the minimum number of years required to attain
80% power to detect a 5% decline per year (14 years,
assuming option i).
Using MONITOR
Whereas TRENDS uses an analytic approach to
determine sample size, power, etc., MONITOR
(developed by James Gibbs) uses computer
simulation. Link & Hatfield (1990) argue that
computer simulation is to be preferred to analytic
solutions, because the latter can only provide
approximate results. The program MONITOR is
easy to use (ftp://ftp.im.nbs.gov/pub/software/
monitor), and a Manual is readily available as well.
Users of the program should take into account the
following points.
1. Similar to TRENDS, only one data point per plot
(or transect or route) is allowed per time unit (e.g.,
per year). Unlike TRENDS, MONITOR can allow
for analyses conducted on several plots at once.
Where several plots (transects, routes, etc.) are
analyzed at once, MONITOR calculates a weighted
trend (see below for further discussion of
weighting).
2. Whereas “plots”, can refer to “routes” or
“transects,” it can also refer to individual point count
stations if they will be analyzed in this way (and not
simply pooled across a transect or route). The
maximum number of “plots” is 250.
3. When several surveys are conducted for each plot
in the same year (or breeding season or other
interval of interest), MONITOR averages across
these data (i.e., collapsing the data into a single data
point per plot per year).
4. A critical variable is “variance of plot counts.”
This variance is used to simulate variation about the
specified trend line. The manual suggests that
within-year variation (determined from multiple
surveys) can be used to estimate between-year
variance, but this will generally not be valid. The
correct estimate is the variance about the trend line,
just as with TRENDS.
5. As with TRENDS, the trend can be linear or
exponential. We strongly recommend an exponential
trend for reasons discussed above, unless data at
hand indicate a linear trend is more appropriate.
6. Data from multiple plots can be weighted
according to mean abundance, but variance about
the plot-specific trends is not used in weighting. This
is less than satisfactory, because it means that a
poorly-estimated trend has as much weight as a
well-estimated trend.
7. When data are collected from several plots,
MONITOR de-means the values (subtracting off
the mean value for each plot) before calculating the
variance. Otherwise, variance due to habitat
differences among plots will be included in the
estimate of sampling variance (which we are
interested in). However, this de-meaning is
undesirable because it over-corrects. That is,
suppose we have n plots that are true replicates. In
this case, all between-plot differences are due to
sampling variation, which will have been completely
removed by de-meaning. A better approach would
be to use a covariate (or set of covariates) to
characterize habitat variation, and then use
residuals from a regression on the habitat
covariate, to provide an appropriate measure of
variance.
Assessment of Abundance and Species Composition Using Point Counts 23
Power and Sample-Size Analyses: Other Sources
Several stand-alone statistical packages are now
available that can calculate power for a variety of
statistical tests and situations (reviewed by Thomas
& Krebs 1997). In their review, Thomas and Krebs
mention five programs that they could recommend.
Of these we highlight two: the first is PASS (Power
And Sample Size; available from NCSS Statistical
Software, 329 North 1000 East, Kaysville, UT 84037;
http://www.ncss.com/pass.html). When a class of
ecology graduate students was asked to compare
PASS with three other recommendable power and
sample size programs, 17 out of 19 students
preferred PASS! It is flexible, accurate, easy to use,
and easy to learn. The cost is moderate ($249). The
other program we mention (also reviewed by
Thomas and Krebs) is GPOWER (Erdfelder et al.
1996); though this program did not score as highly as
PASS, it is free (http://www.psychologie.uni-trier.
de:8000/projects/gpower.html). Thomas and
Krebs (1997) examined several general-purpose
statistical programs with built-in power analyses,
but found none that they could recommend.
Two other valuable sources for power analysis are
on the web. The Patuxent Wildlife Research Center
of USGS has an excellent page, that includes a
power analysis program for calculating power
for monitoring programs, using the data in a
manner similar to TRENDS. This is available at
http://www.im.nbs.gov/powcase/powcase.html.
Also available is a web page dedicated to the
discussion and calculation of power analyses at
http://www.im.nbs.gov/powcase/powlinks.html.
This site has both MONITOR and TRENDS
available as freeware.
A number of statistical texts treat the problem of
determining power. Fleiss (1981) provides an
excellent practical treatment of the problem when
the outcome is binary (only one of two outcomes), or
can be expressed as a rate or proportion. Thus, his
text can be very useful for studies of survival or
studies in which the outcome is presence or absence.
Cohen (1988) gives an extensive non-technical
treatment of power analysis for ANOVA.
24 Statistical Guide to Data Analysis of Avian Monitoring Programs
Mist-nets can be used to provide estimates of many
parameters: 1. relative abundance, 2. species
composition (richness, diversity), 3. productivity, as
measured by production or abundance of HY
(Hatching Year) birds, and 4. annual adult survival.
In addition, one can, in theory, estimate 5. offspring
survivorship to breeding age using data from mist-nets
but this is an area that is only now being
investigated by researchers. Regarding abundance
and species composition, methods of analysis are
the same as described for analysis of point count
data. Recent examples of analyses of trends in
abundance include Johnson & Geupel (1996) and
Chase et al. (1997); Silkey et al. (1999) discuss the
validity of inferring population trends from mist-net
capture data. Nur et al. (1994) analyzed
patterns of abundance along the upper Sacramento
River (Example 4) using mist-net capture data and
using point-count data. Mist nets cannot, however,
provide an absolute measure of abundance. On the
other hand, they can provide an age-specific, and
sometimes sex-specific, measure of abundance,
with a resolution that cannot be matched by point-count
or line-transect data.
Analysis of Productivity
The number of HY birds caught in a standardized
mist-netting study can provide an index of
production of young (DeSante & Geupel 1987, Nur
& Geupel 1993b, DeSante et al. 1993). Such data
have been analyzed in three ways: (i) analysis of
total number of HY birds caught; (ii) analysis of
number of HY birds caught per AHY (After
Hatching Year, i.e., adult) caught; or (iii) analysis of
per cent of all birds who are HY. Among these
parameters, (iii) is just a transformation of (ii), and
vice versa, provided that all birds are classified as
HY or AHY (total = AHY + HY). This can be seen
as follows: let HY/AHY=R. Then proportion of all
birds that are HY = HY/(AHY + HY), can be
written as
Thus, (iii) only re-expresses (ii), but the
interpretation of (ii) is more direct: the number
of fledged young per adult.
proportion HY = ____1___
1 + (1–
R)
Nur & Geupel (1993a, 1993b) point out that there are
hazards with including the number of AHY birds
caught as a measure of productivity (as do indices ii
and iii above): notably, the catchment area of HY
and AHY can differ markedly (Nur & Geupel 1993a).
Secondly, many AHY birds are transient, i.e., not
breeding locally. As an alternative, one can use the
measure in (i), HY birds alone. If one finds
differences in the number of HY birds caught in two
sets of sites, or can establish a trend in HY numbers,
this provides information about the production of
young on a population level, but it may or may not
indicate differences or trends in productivity per
pair. Thus we recommend that, if variations in
breeding population size can safely be ruled out, in
comparing areas or comparing years, the number of
HY be analyzed by itself (see examples in Nur &
Geupel 1993b). Otherwise, the biologist should
analyze (ii) or (iii). Example 6 demonstrates
different ways of analyzing productivity.
Example 6:
Analyses of Productivity.
This example is taken from the study of the impact
of the herbicide metam sodium on landbird
populations of the Sacramento River, described in
Example 4. Table 8 shows a species-by-species
analysis of productivity as measured in two ways,
HY birds per 100 net-hours, and proportion of HY
birds in the catch.
To examine patterns of productivity we selected
those species with sufficient sample size. Our
criteria were: (1) at least 36 individuals caught (of all
age classes) from the 9 sites, and (2) at least 12 HY
individuals caught, total, from the 9 sites. Six species
met both criteria (Table 8). The “36 individual
criterion” implied that each site averaged 4 or more
individuals caught, which we considered a minimal
acceptable number. We would have preferred to
impose a minimum of 45 individuals (i.e., 5
individuals caught per site on average), but then we
would have had fewer than 6 species to analyze. The
second criterion, at least 12 HY individuals caught
among the 9 sites, may seem too low a threshold (an
average of 1.33 HY individuals caught per site).
Nevertheless, we wished to include possible
instances where reproductive success was poor or
25
III. Demographic Monitoring: Mist-nets
nil at a number of the 9 sites; such apparent
reproductive failure might be especially informative.
Thus, a hypothetical species with 5, 4, 2, 1, 0, 0, 0, 0, 0
captures at 9 sites would qualify with respect to the
“12 HY” criterion. On the other hand, a species with
12 HY captures at 1 site, and 8 sites without any HY
captures is not particularly informative. We thus set
a 3rd criterion: HY captures at a minimum of 3 sites.
All species that met criteria (1) and (2), also met the
3rd criterion.
To analyze the HY capture data, we log-transformed
capture rates (birds caught per 100 net-hours), for
each species at each site. To avoid taking the log of 0
(which is undefined), we added a constant—in this
case, 1—before log-transforming. Had there been no
zeroes in the data set, we would not have added any
constant; there would have been no need to. Whereas
adding a constant before log-transformation is
standard practice, it can lead to bias (Thomas 1996).
However, the direction of bias is conservative: adding
a constant makes it somewhat more difficult to detect
an effect (e.g., to detect a trend).We recommend that
investigators try two different constants (e.g., adding
0.5 and adding 1) and determine if results are similar.
If they are, then the investigator has some confidence
that his or her results are not unduly sensitive to the
chosen constant.
For analysis of the proportion of HY in the catch, we
used the logit-transformation. In the case where
total captures = HY + AHY,
logit(proportion of HY) = loge(HY/AHY).
26 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 8. Analysis of mist-net captures, Sacramento River 1993: Relationship to Damage Index for the
six species with adequate sample size (at least 12 HY individuals caught, and at least 36 individuals,
total, caught, among 9 sites).
A) Dependent Variable: Hatching Year birds per 100 net hoursa
Species Analysis Number HY Caught
Black-headed Grosbeak ß = –0.369 ± 0.192, P = 0.096, R2a= 0.251, R2= 0.345 12
McGillivray’s Warbler ß = –0.519 ± 0.405, P = 0.240, R2a= 0.074, R2= 0.190 39
Orange-crowned Warbler ß = –0.226 ± 0.078, P = 0.023, R2a= 0.479, R2= 0.544 15
Song Sparrow ß = –0.565 ± 0.357, P = 0.16, R2a= 0.159, R2= 0.264 55
Spotted Towhee ß = –0.649 ± 0.227, P = 0.024, R2a= 0.472, R2= 0.538 32
Yellow-breasted Chat ß = –0.462 ± 0.169, P = 0.029, R2a= 0.449, R2= 0.518 13
B) Dependent Variable: Proportion of Hatching Year birdsb
Species Analysis Number of Sites
Black headed Grosbeak ß = –0.901 ± 0.566, P = 0.15, R2a= 0.161, R2= 0.266 9
McGillivray’s Warbler ß = –1.054 ± 1.115, P > 0.3, R2a= 0.006, R2= 0.130 9
Orange-crowned Warbler ß = +1.623 ± 0.432, P = 0.007, R2a= 0.622, R2= 0.669 9
Song Sparrow ß = –0.819 ± 0.690, P > 0.2, R2a= 0.170, R2= 0.274 9
Spotted Towhee ß = –1.707 ± 0.789, P = 0.074, R2a= 0.344, R2= 0.438 8
Yellow-breasted Chat ß = –1.497 ± 1.117, P > 0.2, R2a= 0.117, R2= 0.264 7
a Hatching Year Birds caught per 100 net-hours, log-transformed, i.e. ln((HY caught + 1)/100 net-hours). Results of simple regression
analyses for effect of Vegetation Damage Index. Sample size = 9 sites for each analysis.
b Proportion of Hatching Year birds, logit-transformed Results of simple regression analyses for effect of Vegetation Damage Index.
Sample size (number of sites) for each analysis is shown.
Table 9. Analysis of mist-net captures, Sacramento River, 1993: Relationship of HY, and proportion HY birds
caught in relation to Vegetation Damage Index. Results of simple regression analyses; independent variable
in each model is Vegetation Damage Index. Number of sites (sample size) is 9. Capture rates have been
log-transformed, i.e. ln((number of birds + 1)/100 net-hours).
Dependent Variable Analysis
HY birds/100 net-hours ß = –0.795 ± 0.201, P = 0.006, R2a= 0.646, R2= 0.690
Proportion of HY birds caught
logit-transformed ß = –0.360 ± 0.120, P = 0.028, R2a= 0.453, R2= 0.521
The logit transformation is a commonly used
transformation in biological analysis and forms the
basis of logistic regression (Chapter V). Note that
the logit(proportion of HY) is undefined when the
denominator (in this case number AHY caught) is
zero.We could have added a constant to the
denominator to avoid this “problem” but did not;
we consider it biologically appropriate that our
measure of productivity is undefined when there
are (apparently) no adults present. For the analysis
in Table 8B, sites could not be included where
logit(proportion of HY) was undefined, i.e., where
no AHY were caught. This applied to two of the
six species.
As shown in Table 8, of the six species analyzed,
three showed a significant decline in capture rate
with increasing biological damage. Analyses of the
HY/AHY ratio indicated a consistently downward
trend with increasing damage (5 out of 6 species had
a negative slope), but no species had a significant
negative trend. These results suggest that sample
sizes of individual species were likely too small to
reveal significant patterns, and a pooled analysis
was carried out, shown in Table 9. Analyses of all
HY and AHY caught for all terrestrial bird species
were pooled and the results confirmed a significant
decrease in productivity with increase in damage
symptoms.
Analysis of Adult Survival
Survival can be analyzed in two ways: using
capture/recapture methods or analyzing “return
rate”. Return rate is the proportion of individuals
observed in one time period (we refer to this
period as t), which are observed again (resighted,
recaptured, etc.) in the following time period
(period t +1). Thus return rate is the product of two
processes: survival from period t to period t +1, and
resighting (or recapture) in period t +1. Resighting
probability is defined as the probability an individual
is resighted at time t +1, given that an individual
has survived until time t +1 (Clobert et al. 1987,
Nur & Clobert 1988). In short,
return rate = survival × recapture
probability .probability.
(We use “recapture” in a broad sense to refer to
both resighting and recapture.) The justification
for analyzing return rate as a means of studying
survival is the assumption that recapture probability
is 100% or, at least, that it can be treated as a
constant. This assumption is likely to be violated
when one is comparing the sexes, or comparing
different species or even different populations.
Capture/recapture methodology analyzes both
parameters, survival and recapture probability.
In this way, survival can be estimated independently
of recapture probability and one can test for
differences in survival as well as differences in
recapture probability (Lebreton et al. 1992). It
would seem that capture/recapture methodology
provides a superior means to analyze survival and,
in theory, it does. However, there are three
drawbacks to its usage:
1. Capture/recapture methods require at least three
field seasons to estimate survival for one year,
instead of two. 2. More data are required to carry
out these analyses than with return-rate analyses,
because two parameters are being estimated instead
of one. 3. The optimal software for survival analyses
is not yet available, one that combines flexibility,
statistical power, and ease of use, without requiring
specialized instruction. In the meantime, there are
several programs available which can fill the gap (for
more detailed discussion, see Lebreton et al. 1993).
Table 10 summarizes statistical programs that are
available for analyzing capture/recapture data
(based on Lebreton et al. 1992). Below we discuss six
programs that have been widely used (SURGE,
RELEASE, MARK, SURPH, JOLLY, and
JOLLYAGE).
General Comments. Capture/recapture models
such as SURGE require at least three field seasons
(usually years) in order to estimate survival between
the first season and the second, though it is possible,
making some assumptions, to derive survival
estimates for the period between the second and
third field seasons. Thus ten field seasons would
yield estimates of survival for each of eight years,
and so on. It is strongly recommended that the
capture occasions be equally spaced and generally
speaking the programs SURGE, RELEASE,
JOLLY, and JOLLYAGE assume this. If one is
seeking to estimate annual survival, then the
capture “occasion” is the year or breeding season.
In each year, an individual is either caught or
re-sighted (scored a “1”), or not observed (scored a
“0”). This allows one to construct a capture history
for each individual (a string of 1’s an

Click tabs to swap between content that is broken into logical sections.

Statistical Guide to
Data Analysis of Avian
Monitoring Programs
Biological Technical Publication
BTP-R6001-1999
U.S. Fish & Wildlife Service
Statistical Guide to
Data Analysis of Avian
Monitoring Programs
Biological Technical Publication
BTP-R6001-1999
Nadav Nur
Point Reyes Bird Observatory, Stinson Beach, CA 94970
Stephanie L. Jones
U.S. Fish & Wildlife Service, Mountain-Prairie Region, Denver, CO 80225
Geoffrey R. Geupel
Point Reyes Bird Observatory, Stinson Beach, CA 94970
U.S. Fish & Wildlife Service
Authors
Nadav Nur
Point Reyes Bird Observatory
4990 Shoreline Hwy.
Stinson Beach, CA 94970-9701
415/868 1221
email: NadavNur@prbo.org
Stephanie L. Jones
Nongame Migratory Bird Coordinator
U.S. Fish & Wildlife Service, Mountain-Prairie Region
P.O. Box 25486 DFC
Denver, CO 80225
303/236 8145 ext. 608
email: Stephanie_Jones@fws.gov
Geoff Geupel
Point Reyes Bird Observatory
4990 Shoreline Hwy.
Stinson Beach, CA 94970-9701
415/868 1221
email: GGeupel@prbo.org
Suggested citation
Nur, N., S.L. Jones, and G.R. Geupel. 1999. A statistical
guide to data analysis of avian monitoring programs.
U.S. Department of the Interior, Fish and Wildlife
Service, BTP-R6001-1999, Washington, D.C.
ii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Computer Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Recommended Monitoring Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Methods for Assessing Abundance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Demographic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Statistical Terminology and Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
General Considerations of Study Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Analysis of Vegetation and Habitat Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter II. Assessment of Abundance and Species Composition Using Point Counts . . . . . . . . . . . . . . . . . . . . . . 8
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Community Similarity Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Analyzing Vegetation Data in Relation to Point Count Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Power and Sample Size Analysis Using TRENDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Using MONITOR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Power and Sample-Size Analyses: Other Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Chapter III. Demographic Monitoring: Mist-nets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Analysis of Productivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Analysis of Adult Survival. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter IV. Demographic Monitoring: Nest-monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Background. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Additional Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Alternatives to the Mayfield Method: Systematic Searching and Time-to-Failure Analysis . . . . . . . . . . . . . . 34
Vegetation Analysis in Relation to Nest-monitoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Chapter V. Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Chapter VI. Concluding Remarks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
iii
Table of Contents
iv
Tables
1. Monitoring methods used in landbird population monitoring and their characteristics. . . . . . . . . . . . . . . . . 2
2. Potential objectives of a monitoring program and typical number of years needed for a method
to achieve results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3. Example of data from point count observations conducted at three point count stations, three times
during the breeding season. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4. Calculation of diversity, similarity and evenness indices using total bird detections across sites in
burned and unburned aspen (Populus tremuloides) stands in Wyoming (from Dieni 1996). . . . . . . . . . . . . . 12
5. Linear regression analysis of number of Black-headed Grosbeaks during the breeding season. . . . . . . . . 14
6. Sample output for linear regression analyses using STATA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
7. Analysis of point count data on Sacramento River: relationship of bird species richness to
Damage Index, controlling for vegetation/habitat characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
8. Analysis of mist-net captures, Sacramento River 1993: relationship to Damage Index for the
six species with adequate sample size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
9. Analysis of mist-net captures, Sacramento River, 1993: relationship of HY, and proportion HY birds
caught in relation to Vegetation Damage Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
10. Evaluation and summary of available computer program software used for the analysis of animal
marking and surveying studies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
11. Results of SURGE analysis of Wrentits, by territory status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
12. Summary of models in JOLLY and JOLLYAGE (Pollack et al. 1990). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
13. Power analysis for detecting differences in survivorship between two groups. . . . . . . . . . . . . . . . . . . . . . . 36
14. Logistic regression analyses of Grasshopper Sparrow presence/absence in relation to habitat
features (from Holmes and Geupel 1998). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Figures
1A. Trend, log-linear, P = 0.001, Black-headed Grosbeak, Palomarin 1980-1992. . . . . . . . . . . . . . . . . . . . . . . . 13
1B. Trend, linear-no transformation, P = 0.004, Black-headed Grosbeak, Palomarin 1980-1992. . . . . . . . . . . 13
2A. Normal probability plot, residuals of log-transformed data, Black-headed Grosbeak. . . . . . . . . . . . . . . . 16
2B. Normal probability plot, residuals of untransformed data, Black-headed Grosbeak. . . . . . . . . . . . . . . . . 16
3. Bird species richness in relation to Vegetation Damage Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4A. Distribution of residuals: species richness vs. Vegetation Damage Index. . . . . . . . . . . . . . . . . . . . . . . . . . 19
4B. Quantile-quantile plot of residuals of species richness vs. Vegetation Damage Index against
normal distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5. Probability of detecting Grasshopper Sparrows in relation to Index of Perennial Grass Cover. . . . . . . . . 40
List of Tables and Figures
v
This Statistical Guide is intended to aid field
biologists wishing to analyze data gathered in
standardized monitoring programs for landbirds. It
grew out of the needs expressed by the Western
Working Group of Partners in Flight, and we thank
the members of that group for providing the
incentive to develop this document. It is not
intended to replace good statistical texts, but to
supplement them. We encourage readers, and
especially users, of this Guide to forward their
comments, corrections, and other advice to the
senior author for incorporation into future versions
of this Guide.
This work has been a contract between Point Reyes
Bird Observatory and the U.S. Fish & Wildlife
Service. This is PRBO Contribution 679.
References to commercial products does not imply
endorsement.
Acknowledgments
We thank John R. Sauer, J. Scott Dieni, Ken Gerow,
Daniel R. Petit, and Jon Bart for multiple reviews of
earlier drafts; John Cornely, Barry Noon, Kathie
Purcell, C.J. Ralph, Len Thomas, and Jerry Verner
also provided helpful discussion and comments on an
earlier draft of this document. The authors, not the
above named reviewers, should be held responsible
for any errors or outlandish opinions expressed
here. We thank Jim Nichols for providing a helpful
preprint. We thank the USFWS Nongame
Coordinators: Tara Zimmerman, Bill Howe, Steve
Lewis, Diane Pence, Richard Coon, Kent Wohl,
together with Dan Petit and John Trapp, for support
and encouragement. Special thanks to all the field
biologists who took the time to assist us in doing this
document and are out there doing the work, facing
the challenges, and balancing the issues: Adrianna
Araya, Grant Ballard, Sharon Browder, Mike
Bryant, Claire Caldes, Lynn Clark, Paula Gouse,
Ron Garcia, Todd Grant, Bill Haglan, Jeanne
Hammond, Laura Hubers, Craig Hultberg, Beth
Madden, Steve Martin, Bob Murphy, Lark Osborne,
Fritz Prellwitz, Pam Rizor, Vickie Roy, Kelli Stone,
Julian Wood, Kodiak and McDougall Jones and
many more.
Preface
This Guide is intended to provide guidance to field
biologists wishing to analyze data collected on
terrestrial bird populations, as part of an avian
population monitoring program. A second objective
is to provide information that will help biologists
design such programs. The audience is similar to
that for the Handbook of Field Methods (Ralph et
al. 1993), the Monitoring Bird Populations by Point
Counts (Ralph et al. 1995), and in many ways this
Statistical Guide to Data Analysis of Avian
Monitoring Programs can be a useful complement
to the field methods handbook. At the same time,
we feel this Statistical Guide can be of use to field
biologists studying other organisms besides
terrestrial birds. In our view, all field biologists
will benefit from taking the equivalent of 2 or 3
semester courses in statistics and we assume that
readers of this guide have completed at least this
basic level in statistics.
This document is not intended to fill deficiencies in
basic knowledge of statistics, nor is it a substitute
for a good statistical text. Rather, this Guide is
intended as a supplement to these texts. Our aim is
to provide practical advice in the design and analysis
of field ecological data and to provide timely
information about current statistical computer
programs. Two good statistical texts are provided by
Neter et al. (1990) and Kleinbaum et al. (1988). Both
of these texts are “intermediate” in level; that is,
they assume the reader has had a basic,
introductory course in statistics. Other texts by
Snedecor & Cochran (1989), Sokal & Rohlf (1995)
and Zar (1996) all provide a good, general statistical
background. Intermediate level guides for
practicing ecologists are provided by Crawley (1993),
Bart and Notz (1996) and Bart et al. (1998).
Noteworthy specialized statistical ecological texts
include Ludwig & Reynolds (1988), Skalski &
Robson (1992), and Draper & Smith (1981). The last
two mentioned have many biological examples. Also
see the informative review by Lancia et al. (1996).
Computer Programs
Computer programs for summarizing and analyzing
data with general statistical packages are available,
for many different levels, prices and target
audiences. Ellison (1992) reviewed a number of
general statistical packages, but that review is
somewhat out of date. One versatile statistical and
graphical package, available for DOS, Windows,
and UNIX platforms, is Stata (StataCorp. 1999)
(obtained from Stata Corporation, 702 University
Drive East, College Station, TX 77840). Specialized
computer software programs have been created to
assist with analysis of capture/recapture data (used
for analyses of survivorship, also population size);
these are reviewed and summarized in this and
additional specialized computer programs are
mentioned in the respective sections of this Guide.
Recommended Monitoring Methods
A wide range of methods have been used to conduct
avian monitoring, each tailored to meet a different
set of objectives in the face of different constraints.
This Guide does not address all methods that are
available, especially those that are more widely used
for research or inventory. Below is a short review of
monitoring methods available, based on Butcher
(1992) and Ralph et al. (1993). The reader is referred
to these references (and others cited below) for
additional information. Table 1 describes the
variables measured and subjectively assesses the
relative strengths and weaknesses of each method.
“Strength” and “weakness” is assessed relative to
the quality of the data gathered to meet the
objective and we have not attempted to factor in cost
per datum. Table 2 provides a list of monitoring
objectives, monitoring methods and the typical time
required by the various methods to achieve those
objectives (from Geupel & Warkentin 1995).
Descriptions of monitoring methods, their
applications and comparisons, and their limitations
can be found in Ralph and Scott (1981), Verner
(1985), Butcher (1992), Ralph et al. (1993), Buckland
et al. (1993) and Geupel & Warkentin (1995).
Methods
Area search—A method in which observers are
allowed to roam for a fixed time in a specified area,
usually 20 minutes per 3 hectare area (Loyn 1986,
Slater 1994). This technique has a wide appeal to
volunteers but standardization of data collection is
difficult.
1
I. Introduction
2 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 1. Monitoring methods used in landbird population monitoring and their characteristics.
Methods are grouped under “survey” and “demographic.” Positive or high level is denoted by “+”,
negative or low level denoted by “–” and partial level denoted by “+/–“. Modified from Table 1 in Butcher
(1992). “Color banding” is assumed to include nest-searching. “Rare” species refers to species that are
locally (not just globally) rare.
Survey Demographic
Fixed Spot Area Variable Mist Nest Color
Variables Measured distance map Search distance net Search banding
Index to abundance + + + + +/– +/– +
Density – + – + – – +
Survivorship (adult) – – – – + – ++
Productivity – – – – + + +
Recruitment – – – – + – +
Habitat Relations + + + + +/– + +/–
Nest Site Characteristics – – – – – + +
Predation/Parasitism – – – – – + +
Individuals Identified – – – – + – +
Breeding Status Known – + – – +/– + +
General Characteristics
Habitat specificity + + + + +/– + +
Rare species measured + +/– + +/– – +/– +/–
Canopy species measured + + + + – +/– –
Area sampled known + + + + +/– + +
Large area sampled + – + + +/– – –
Use in non-breeding season + +/– + + + – +
Table 2. Potential objectives of a monitoring program and typical number of years needed for a method to
achieve results.
Actual number of years depends on study design and will vary depending on sample size (e.g., number of
census stations, detection or capture rates, number of nests found). We assume that the priorities of the
monitoring program reflect local or site-specific needs (adapted from Geupel & Warkentin 1995).
Method
Single Point Repeat Area Spot Mist Nest
Objective Countsa Pt. Countsb Searchc mapping nettingd monitoringd
Inventory, species presence/absence 1 1 1 1 1 na
Inventory locally rare species 2-3 1-3 1-3 1-3 1-3 na
Determine species richness 2-3 1-3 1-3 1-3 na na
Determine relative abundance 1-2 1-2 1-3 1-2 3-5 na
Determine species breeding status/seasonality na 1-3 1-3 1-3 1-3 1-3
Determine population trend 6-10 5-9 10+ 5-9 6-10 na
Determine productivity na na na na 1-3 1-2
Determine adult survivorship na na na 3-5e 3-5 na
Determine life history traits na na na 2-4 na 1-2
Habitat association or preference 1-2 1-2 1-2 1-3 na 1-2
Identify habitat features 4-6 3-5 3-5 2-4 na 1-2
Determine cause of pop. change na na na na 3+ 3+
a Each point count censused one time in a season.
b Each point count censused 3 or more times in a season.
c Each plot censused 3 or more times in a season.
d Most authors/programs recommend this method in conjunction with population surveys.
e Possible if birds have been uniquely color-banded.
na Not applicable or not possible.
Methods for Assessing Abundance
Point counts—Fixed radius point counts are the
basic method recommended for most monitoring
studies, and are most widely used (Hutto et al. 1986,
Ralph et al. 1993, Ralph et al. 1995). These can
provide a cost-effective method of estimating the
relative abundance of birds.
Line transects—Fixed-width transects can provide
coverage of a greater area than point counts, but
with fewer independent data points or replicates.
Variable distance methods—Estimating distance at
which birds are detected can be incorporated into
both point count and line transect surveys.
Standardization of distance estimation may be
difficult, as abilities to accurately estimate distances
may vary greatly between observers.
Spot-mapping—Can provide good density
information and information on many aspects of
avian life history. It is expensive per data point and
may be better applied to research projects or to high
priority areas or species.
Demographic Methods
In general, demographic monitoring methods can be
used to identify proximal causes of population
declines and provide insight into causes of habitat
associations. They can identify population problems
prior to the detection of declines based on
abundance surveys. Ultimately, these methods can
be used to identify “source” or “sink” populations.
However, these methods require much effort per
station.
Constant effort mist-netting—Provides information
on productivity and survivorship of populations,
but is limited by area covered (which is generally
unknown) and lack of habitat specificity. However,
many species can be monitored at the same time,
without expending extra effort.
Nest monitoring—Provides site-specific and
habitat-specific information on productivity and
reproductive status. Available personnel usually
limit the number of plots that can be studied, and
studying additional species normally requires
increased effort.
Color-banding—When combined with nest
monitoring, using unique color-band combinations to
follow the fates of individuals will provide the most
complete and unbiased measures of demographic
parameters. However, it is the most intensive
method of all. It is not a method recommended for
general monitoring, but like spot-mapping, best
suited for research projects or for high priority
areas and species.
Statistical Terminology and Principles
The following is a selective review of some statistical
terms relevant to a biologist conducting a
monitoring study. Our intention here is to re-acquaint
the reader with terms and principles that
may have rested dormant for many years.
Accuracy—An estimator is accurate if it produces
estimates that are, on average, close to the true
value, i.e., without bias or with a minimum of bias.
Accuracy is independent of precision (below). An
estimate can be accurate but not precise, precise but
not accurate, or both accurate and precise. The
difficulty is that often the “true” value is unknown
and therefore accuracy is difficult to judge, except
for simulated data where an investigator knows the
true values.
Bias—The difference between the average estimate
(more precisely, the expected value of the estimate)
and the true value. Bias is not the same as “error”,
rather it is one kind of error, systematic error. If an
estimate is as likely to be an overestimate as it is to
be an underestimate, the estimator in question is
unbiased, even though there will always be error
associated with an estimate. To minimize bias would,
by definition, maximize accuracy.
Precision—Precision refers to the variability of the
estimate: the smaller the variability (and thus the
smaller the standard error) of the estimate, the
greater the precision. As mentioned above, precision
is independent of accuracy. An estimate can be very
precise, but wildly inaccurate (i.e., strongly biased).
Type I and Type II errors—Rejecting the null
hypothesis when it is correct is committing a Type I
error. The probability of committing a Type I error
is symbolized α [alpha] and is the significance level
of a test of statistical inference. Accepting the null
hypothesis when it is incorrect is committing a Type
II error; the probability of making such an error is
symbolized ß [beta].
Power—The probability of detecting a biological
effect, if there is one. More precisely, power is the
probability of rejecting the null hypothesis when the
null hypothesis is incorrect. Normally, the null
hypothesis is an hypothesis of no effect (i.e., no
difference). Power is equal to 1–ß. Power cannot be
calculated unless one specifies the alternative
hypothesis: one must specify the magnitude of the
effect or difference. A given test will have greater
power the greater the magnitude of the effect, and
conversely, the smaller the true difference between
groups, the less the power to detect that difference
for a given sample size. Power is discussed in
greater depth in Chapter II of this Guide.
Introduction 3
Poisson distribution—Among several discrete
distributions (binomial, geometric, negative
binomial), this distribution is one of the most likely
to be encountered or utilized in ecological studies
(Ludwig and Reynolds 1988). Many random
processes, in which events occur independently of
each other, in space or time, conform to a Poisson
distribution. Suppose one set up a grid of 100 one-cm
squares (10 cm × 10 cm). The number of rain drops
falling per square in a short interval of time is likely
to be Poisson-distributed. Suppose that in one
minute, 100 rain drops fell on the 100 squares. If this
process was indeed a Poisson process, then we would
expect that in 1minute, on average, 37 squares
would receive 0 drops, 37 squares would receive 1
drop, 18 squares would receive 2 drops, and 8
squares would receive 3 or more drops. For a
Poisson process, the mean of occurrences (per unit
time or space, in the rain drop example = 1.0 drops
per square per minute) will equal the variance of
those occurrences. Thus, for a Poisson-distributed
variable, only one parameter is specified.
Another useful distribution is the binomial
distribution. If N independent trials were conducted
and the probability of a “hit” (representing success,
failure, death, etc.) on any one trial is p, then the
number of hits in total is binomially distributed, with
mean = Np, and Variance = Np(1–p). As a result,
variance is neither independent of the mean (as it is
in the normal distribution) nor is it equal to the
mean (as it is in the Poisson distribution); moreover,
variance is maximized when p =1–p = 0.5. As p
approaches 0 or 1, the variance will shrink to zero.
The binomial distribution is, for example, utilized in
logistic regression (Chapter V). Note that the
binomial distribution has two parameters, N and p.
When the number of trials is large and the
probability of a “hit”, p, is low, then the binomial
distribution can be approximated by the Poisson
distribution.
Replicates—Replicates are independent repetitions
or measurements within the experimental design. If
repetitions are not independent then these “repeats”
are sometimes referred to as pseudoreplicates
(Hurlbert 1984, Bart et al. 1998). Suppose 100 point
count stations in a given habitat type have been
surveyed three separate times during the breeding
season. The 300 data points obtained should not be
treated as 300 replicates or samples, because bird
data obtained on different days in the same season
are not independent. Whether or not the 100 point
count stations are independent or not is difficult to
say a priori, but if spaced far enough apart (Ralph
et al. 1995 recommend spacing of at least 250 m), so
that the same individuals are not being counted at
different stations, the 100 point count stations can be
treated as independent. Assuming independence
among adjacent point count stations, and if the 100
point count stations were divided evenly among 4
habitats, then there would be 25 replicates.
As far as the three repeats per point count station
are concerned, one can average the data, select the
repeat with the highest score for each individual
species, or sum the data from each of the three
visits. If one wished to compare results among the
three visits (e.g., asking whether there was a
seasonal, within-year trend), one can analyze the 300
observations, using “point count station” as a
categorical variable to be controlled for; this is an
example of a repeated-measures design, in which
“point count station” is a blocking variable.
Independence of observations—This is an important
issue in statistical analysis, and is often
misunderstood. To start, what is required are that
outcomes be independent from one observation to
another, after controlling for factors or variables
that might be influencing the outcome. Suppose the
point count stations have been spaced 100 m apart
on transects of 1 km length. An investigator might
not feel comfortable in treating observations from
different stations on the same transect as being
independent of each other. One solution would be to
classify the transect as the unit of observation, i.e.,
pooling data from all point count stations on the
same transect, and analyze data accordingly.
Another solution would be to include in the analysis
a “transect effect.” This would control for the fact
that stations on the same transect are more likely to
be similar in outcome than are stations on different
transects. In this way one can investigate
differences among and within transects.
A second point is that the independence refers to the
outcome, not the independent variables or factors.
Suppose one related bird species richness to
vegetation. As long as bird species richness varies
independently from station to station (after
controlling for various factors), it would not matter
that all stations on a transect shared some of the
same vegetation characteristics. In other words,
there is no requirement that vegetation
characteristics be independent from one observation
unit to another.
General Considerations of Study Design
General study design considerations will apply to
most monitoring techniques and studies. Neter et al.
(1990) provides a good discussion of experimental
design, also see Skalski & Robson (1992) and
Crawley (1993); those wishing more detail can
consult specialized texts such as Hicks (1982). A
helpful and interesting discussion of the issues and
the process for designing an avian monitoring study
on one site such as a National Wildlife Refuge is
4 Statistical Guide to Data Analysis of Avian Monitoring Programs
given in Johnson (In Press). In this section, we
discuss some general points concerning design of a
study. Later when discussing each methodology in
turn (point counts, mist-netting and nest-monitoring),
we return to questions of design.
Throughout this Guide, the use of “station” refers to
one independent monitoring site, e.g., one point
count station (if observations are deemed
independent of other stations), one line transect, one
mist-netting array, one nest-monitoring plot, etc. It
is important to correctly determine the unit of
analysis early in the study design.
Design—The first and most important
consideration in designing a study is its objectives.
Statistical inference (in particular, tests of
statistical significance) may be of little interest, in
which case statistical power need not be considered
in determining the sample size needed. A biologist
may instead wish to monitor a particular area
mainly as a descriptive tool. If data are gathered
in a standardized fashion (Ralph et al. 1993), the
data from one area can contribute to regional or
national monitoring programs, which likely have
statistical inference as an objective. In many cases
the number of stations will be limited by available
resources or by the physical areas of interest.
Some field biologists will be able to establish one,
or at most, a couple of demographic monitoring
stations (e.g., one mist-net array or one
nest-monitoring plot). In those cases placement
of the station will usually be constrained by the
location and size of the habitat of interest, by the
density of the species of special concern, or be
centered on the location of the habitat or species
of interest.
Data from just a single demographic monitoring
station may be valuable for several reasons:
1. the data provide a description of temporal
patterns, which data can be combined with other
sources of data, 2. the data can allow statistical tests
of trends over time, given sufficient number of years
of data collection (possibly 10 years or more for a
single station), and 3. the data can be combined with
data from other monitoring stations.
Not every monitoring program needs to have
hypothesis testing as its goal from the outset. A
monitoring program may be able to collect valuable
data that can later be analyzed (by itself or as part
of a larger study), and that analysis would surely
include hypothesis testing and tests of statistical
significance. But it is pointless to erect contrived
hypotheses before data collection has begun, simply
in order to justify the establishment of a monitoring
program. After data have been collected, the
investigator will have a much better idea of how to
formulate meaningful hypotheses. This point does
not apply to experimental studies, where explicit
hypothesis formulation is an essential ingredient to
a successful study.
Assuming statistical inference is an important
consideration, one needs to determine whether the
objective is to determine trends through time,
establish bird-habitat relationships, compare effects
of different treatments, or other possible objective.
Choice of objective will influence questions of sample
size and allocation of stations (see Randomization
below).
Assuming that statistical inference is a goal, the
question of necessary sample size needs to be
related to statistical power, i.e., the ability to detect
an effect if there is one. Statistical power is an
elusive concept in part because it is arbitrary.
Calculations of sample size in the past have used
power values ranging from 50% to 95%. Clearly, the
greater the desired power, the greater the sample
size necessary to achieve that power. Generally, this
Guide uses values of 50% and 80%. In designing a
study one would not ordinarily consider 50% power
to be adequate and we do not recommend a study be
designed to achieve 50% power. Nevertheless 50%
power presents a useful level for a posteriori
investigations, where someone has already collected
these data and the biologist wishes to consider the
statistical power of the data to detect effects of
interest. Conversely, in designing a study, 80%
power is a commonly used and often-recommended
benchmark, but it is nothing more than a
benchmark.
Power calculations and sample size calculations
both rely on the presumed magnitude of the effect
in question. Clearly, the greater the presumed
effect (e.g., the greater the difference between
the two groups), the greater the power will be to
detect that effect, and, conversely, the smaller the
necessary sample size to detect an effect at a
specified power. The difficulty here is that the true
difference between groups is unknown, and
furthermore one cannot necessarily use the
observed magnitude of an effect (e.g., observed
difference between two groups) as the criterion
for judging power.
It is easy to fall into the trap of estimating power,
retrospectively, using the observed magnitude of an
effect, and several general statistical packages
appear to encourage users to do so, without
appropriate warnings (discussed in Thomas and
Krebs 1997). The problem is that if a statistically
significant effect is found, one would not normally
calculate power retrospectively. If the investigator
looks for an effect and finds there is one, then there
is little need to determine the probability of having
Introduction 5
found that effect. Therefore, retrospective power
calculations are usually pursued only when no
significant effect is detected. But given that no effect
was detected (statistically), it could be because the
observed magnitude of an effect was substantial, but
power was weak, or because the observed
magnitude of an effect was small, even negligible.
However, power will always be low to detect a
negligible effect. It is not very informative to
calculate that, given the negligible effect observed,
yes, one’s power to detect a negligible effect is
negligible. Thus, to be useful, retrospective power
analysis requires that only effects of a priori
interest be examined. In other words, in conducting
power analysis, the magnitude of the effect of
interest needs to be fixed independently of the data
at hand. The biologist must decide what is the
magnitude of an effect worth considering; this is a
biological, not a statistical, issue that is sometimes
difficult to settle.
Randomization—Randomization is an important
part of experimental design, owing to the work of
Sir Ronald Fisher in the early 20th century.
Randomization is used to combat biases that can
undermine survey and experimental studies. The
most important bias concerns assignment to
treatments. By randomizing assignment to
treatment (e.g., grazed vs. ungrazed), extraneous
differences among experimental units can be
minimized. Even here one would likely use
randomization subject to constraint. Suppose one
had five land units, each one that can be divided into
four plots. Randomly choosing treatment for the 20
plots could result in an unbalanced design. Instead,
one can randomly choose treatment, subject to the
constraint of 10 plots for each treatment. An even
better design would use land unit as a blocking
variable. Within each block (here, land unit), one
randomly assigns treatment to plots, with the
constraint that there must be two plots for each
treatment. Of course, in many studies assignment to
treatment is not always under the investigator’s
control.
Randomization should also be applied to minimize
other types of bias, if feasible. If two treatments
are being compared using point counts, using two
observers, one should not assign one observer to
conduct point counts in treatment A and the other
observer to conduct point counts in treatment B.
In this case, observer identity and the effect of the
treatment would be confounded. Instead, the two
treatments should be divided between the two
observers, as randomly or equitably as possible.
Another bias concerns order of observation. If
several plots are to be visited each day, one should
not visit the plots in the same order each time, but
should vary the order. It is not usually feasible to
visit point count stations in a random order, but
one can usually randomize the starting point on
each visit.
The final source of bias concerns inclusion in a study.
The sample to be studied will likely be the most
representative of the population in question if it is
randomly selected; however, this is often not
feasible. Nevertheless, we recommend incorporating
some randomness into every study. For example, one
could lay out a grid of point count stations, centered
on a randomly selected starting point as suggested
by Sauer (1998). This approach can be adapted for
those setting up transects of point count stations:
the starting point for a transect can be randomly
selected among a subset of possible points. Another
approach is to set up a grid of possible stations and
then randomly determine whether or not to include
individual stations in the study. Hutto et al. (1996)
and Hutto and Paige (1995) provide other
suggestions for randomizing point count stations
across broad areas.
Analysis of Vegetation and Habitat Characteristics
Data on vegetation and habitat features can play an
important role in avian monitoring studies. These
data can be gathered at different scales and in many
different ways. Methods of vegetation data
collection are described in many publications,
including Ralph et al. (1993), the BBIRD program
protocol (Martin et al. 1997), and Hays et al. (1981).
One of the most influential vegetation assessment
protocols developed for use with bird studies is by
James and Shugart (1970), with modifications by
Noon (1981). The analyses of vegetation data
collected in conjunction with point counts and
nest-monitoring are discussed in the appropriate
sections.
Vegetation data can be collected and analyzed at
several different scales. The broadest is habitat
classification and is qualitative (categorical) rather
than quantitative. This level includes most
vegetation maps and can be used to select the
vegetation types for study. The next broadest scale
is the “stand” level. This scale is commonly used to
ground-proof aerial photographs and, depending on
methods, to construct bird-habitat (or bird-vegetation)
correlations, making use of point count
and line transect data. The third scale involves
vegetation used to characterize the study area at a
smaller scale than the first methods, often within a
radius of 11.28 m following James and Shugart
(1970). In some studies, plots are centered on nests
or other sites of bird use (“use sites”), while others
(“non-use sites”) are randomly placed for
comparison within the study area. This scale allows
data that are more quantitative in nature to be
collected, compared to other scales. Examples of
6 Statistical Guide to Data Analysis of Avian Monitoring Programs
studies using this scale are Knopf et al. (1988) and
Larson & Bock (1986). This scale provides a good
means to establish bird-habitat relationships; such
data can be gathered quickly, accurately and
efficiently. The finest scale of vegetation
measurement is around the nest, nest plant or other
micro-habitat features (Martin & Roper 1988;
Martin et al. 1997).
Currently there is little agreement among biologists
on the methods, and even the scale, of vegetation
data collection needed to correlate with bird
abundance, habitat needs, distribution and behavior.
Therefore, it is not possible at this time to
recommend a single approach for analysis of
vegetation data since the data analytic approach will
depend on how the data were collected.
Introduction 7
Several techniques have been used for estimating
abundance of birds (Verner 1985, Bibby et al. 1992,
Butcher 1992, Skalski & Robson 1992, Buckland et
al. 1993, Greenwood 1996, Lancia et al. 1996). In the
past, two widely used and promoted methods have
been point counts and line-transects (Ralph & Scott
1981, Buckland et al. 1993). Capture/recapture data
is a third method used to estimate populations
(Greenwood 1996, Lancia et al. 1996). Following the
recommendations of the National Monitoring
Working Group of Partners in Flight (Butcher 1992)
and Ralph et al. (1993) and Ralph et al. (1995), we
restrict our attention to point counts. Line-transects
can also yield valuable data regarding population
abundance and species composition; however, the
design and analysis of transect data is beyond the
scope of this Guide (Ralph & Scott 1981, Buckland et
al. 1993). We assume that data will be collected using
fixed radius point counts, as described in Ralph et al.
(1993), rather than unlimited distance point counts
or variable distance point counts (Ralph & Scott
1981).
Throughout this Guide we discuss how to analyze
data gathered in a typical monitoring program and
then discuss design of monitoring programs,
especially sample size. Ideally, one should first put
careful thought into designing a monitoring
program before data collection and analysis.
However, here we discuss data analysis first in order
to give the reader a better idea of what sorts of data
can be gathered and what are some inferences that
can be drawn from data collected in a monitoring
program.
Analysis
Point count data have commonly been analyzed
with respect to 1. relative abundance, 2. species
richness, 3. species diversity and 4. community
similarity. An alternative to the analysis of relative
abundance, has been 5. the analysis of species
presence/absence (i.e., a species is scored as 1 if one
or more individuals are detected, and 0 if
otherwise). (We recommend not using the term
“frequency of occurrence” to characterize such
analyses, because of ambiguity of this terminology.)
However, from the point of maximizing statistical
power, the analysis of relative abundance (i.e.,
number of individuals detected per station) is to be
preferred to an analysis of presence/absence. The
latter discards information, leading to a loss of
statistical power. On this point we are in agreement
with Dawson (1981),
“[E]ither frequency of occurrence or average
number [per station] is adequate measure for
species which occur usually as one or none in each
counting unit. On the other hand, frequency
becomes an increasingly insensitive measure for
species found in larger numbers.”
Presence/absence may be very helpful as a
descriptive tool. That is, it may be informative to
state that a species was present at 40% of stations in
habitat x and 60% of stations in habitat y. Another
advantage of presence/absence data is that some
analytic methods can be used for such data but not
for total detections. For example, logistic regression
can be used with presence/absence, but not with
total detections. Logistic regression is discussed in
more detail in Chapter V and an example is provided
below of the analysis of presence/absence data.
Nevertheless, more sophisticated variants on
logistic regression can use total detections (e.g,
“ordered logistic regression”, StataCorp. 1999). Also,
Poisson regression, an analytic method that has
much in common with logistic regression, can
analyze total detections (Kleinbaum et al. 1988). As
its name implies, Poisson regression assumes that
the number of detections per station is Poisson-distributed,
but some software (e.g., EGRET)
includes the capability of testing this assumption
(and modifying the analysis if data do not conform to
this assumption).
Relative abundance is analyzed as number of
detections per unit area. The number of individuals
are determined at each point count station and this
datum can be entered into regression analyses or
analysis of variance (ANOVA). Results from several
point count stations can be averaged to produce a
summary statistic (Example 1). If a point count
station is surveyed more than once per season, one
can either sum the number of detections over all
point count surveys or calculate an average number
per point-count survey. As long as each station is
surveyed the same number of times (e.g., three
times), the two measures (average vs. sum) will
8
II. Assessment of Abundance and Species
Composition Using Point Counts
differ only by a constant, in this case, three. A third
commonly used method is to use the maximum
number of detections over the course of the three
surveys. In analyzing relative abundance these three
methods can be expected to yield similar patterns.
The number of individuals detected at a point count
station is a function of the absolute abundance and
the probability of detecting an individual (given that
it is present). Analyses of relative abundance
assume that differences in detectability can be
ignored, for the purposes of the study. In contrast,
variable distance methods (often referred to as
distance sampling; Buckland et al. 1993) attempt to
estimate detectability. The assumption that
differences in detectability are unimportant should
be kept firmly in mind when considering surveys of
relative abundance. Recent studies confirm that
detectability is influenced by a number of different
factors (Buckland et al. 1993, McShea & Rappole
1997, Gutzwiller & Marcum 1997).
Absolute abundance. Point count data are often used
to determine relative abundance; however, absolute
abundance may be estimated using variable distance
methods (Buckland et al. 1993, Ramsey & Scott,
1981). An important assumption of variable distance
methods is that at the center point of the
observation, all individuals are detected (i.e.,
detectability = 100%). It is possible to relax this
assumption if, instead, the true absolute density can
be independently determined at the center point,
but this is often not feasible. A second important
assumption is that individuals do not move towards
or away from the observer before being detected.
Buckland et al. (1993) provide extensive discussion of
these and other assumptions. The same authors
have developed a program DISTANCE that carries
out such analyses (Laake et al. 1993, Web site:
).
Species richness is analyzed as total number of
species detected. A total can be calculated for each
point count station, or for each group of point count
stations (Example 1).
There are a plethora of indices for species diversity
(Magurran 1988, Ludwig & Reynolds 1988). The
utility of diversity indices has been strongly
questioned by some (Verner & Larson 1989), and
their use has limitations. It has been argued that
species richness, a component of species diversity, is
more easily and more accurately measured. Species
richness is highly correlated with species diversity
and can be interpreted more clearly (Verner&
Larson 1989). An example of the value of a diversity
index (but one that is admittedly extreme) is a
comparison of two communities, each containing five
species and each with a total of 100 individuals.
Community A contains 96 individuals of species 1
and 1 individual of each of the other 4 species;
community B contains 20 individuals of each of five
species. Which community is more diverse? If one
feels that both are equally diverse, then species
richness is all one needs to take into account.
However, if one’s view is that community B is more
diverse, because its bird community is more
heterogeneous, then one is justified in using a
diversity index. However, keep in mind that more
assumptions are required to estimate diversity than
species richness. In particular, calculations of
species diversity assume that relative abundance is
accurately estimated and ignores the differences in
detectability among species that can skew estimates
of relative abundance.
The most widely used diversity index is referred to
as Shannon’s index, or as the Shannon-Wiener index
or the Shannon-Weaver index (Krebs 1989).
Shannon’s index, which is derived from information
theory, reflects both species richness and evenness
of distribution among species present. An equation
for the Shannon index, using natural logarithms
(ln) is:
where S = number of species in the sample, and pi
is the proportion of all individuals belonging to the
ith species. The original Shannon index was
calculated in terms of logarithm base 2, and thus H'
was expressed in terms of bits; however, it is more
common and more convenient to use natural
logarithms, as we have done above. A useful
transformation of H' is given by eH', which has been
labeled N1 (MacArthur 1965).
N1 expresses diversity in terms of species instead of
bits and thus is easier to interpret. N1 provides the
number of species that would, if each were equally
common, yield the same H' value as the actual
sample. For example, suppose there are 3 species, 20
of species A, 20 of species B and 10 of species C.
Using the above equation, H' = 1.055 and N1 = 2.87.
These three species, in their uneven distribution,
yield the same diversity value as would 2.87 species
of equal abundance. A comparison of species
richness (= S = 3) with N1 (= 2.87) gives us a
measure of evenness of species distribution. That is
the species distribution is maximally even when
S=N1.
For a fixed S, the maximum diversity (Hmax) is equal
to –ln (1/S) = ln(S) and therefore the ratio of
observed diversity to maximum diversity is a
measure of evenness (E): E = H'/Hmax = H'/ln S
i=S
H′ = Σ(pi)(lnp), i=1, 2,…S
i=1
Assessment of Abundance and Species Composition Using Point Counts 9
(Examples 1 and 2). If some species are more
detectable than others this will bias one’s measure of
diversity, either upwards or downwards. If the
Shannon index is calculated for a number of
samples, the indices themselves will be normally
distributed, making it possible to use parametric
statistics to compare sets of samples using diversity
indices (Magurran 1988). Further techniques for the
analysis of diversity patterns are described in
Magurran (1988) and Pielou (1975).
Example 1:
Calculation of Summary Statistics
The following is a simple and hypothetical example
of data collected using point counts (Table 3).
Observations were made at 3 point count stations at
3 different times during the breeding season.
Species are uniquely identified by a single letter
(A, B, C, etc.).
From these data, summary statistics can be
calculated, first of all summing (or averaging) across
the three survey periods, and then summing (or
averaging) across the three point count stations
whose data have already been summed over the 3
survey periods. Such a summarization is shown in
Table 3B.
The results shown in Table 3A for each point count
station can be used in a statistical analysis (e.g.,
regression or ANOVA) (Example 3). The biologist
may also summarize results for a group of point
count stations characterized by an important
similarity, e.g., all stations at a specific site, or all
stations in a specific habitat on a refuge, or other
unit of interest.
The row titled “Average” in Table 3B (second from
the bottom), simply averages the results from point
count stations 1-3. The row titled “Cumulative”
(bottom) shows the total number of individuals seen
at the 3 point count stations (a measure of
abundance), the total species richness for the 3
stations, and the species diversity as measured for
all 3 stations taken together. Thus, the average
station had 5 species, but the three stations together
had 7 different species. For average number of
individuals seen per point count survey, the
“Cumulative” value is simply three times that of the
“Average” value (i.e., 12.33 = 4.11 × 3). Thus the only
difference between these two measures is that in
one case one sums the number of individuals and
divides by the number of point count stations and in
the other case one sums and does not divide. Any
statistical results will be identical whichever
measure of individuals detected is used, except for a
10 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 3. Example of data from point count observations conducted at three point count stations, three times
during the breeding season.
A. Results by species. “A, A” indicates two individuals of species A were seen, “A, A, A” indicates three individuals,
“A, B, C” indicates one individual of three species, etc.
Point Count Survey Species Number of Species
Station Number Observed Individuals Richness
1 1 A, A, B, C 4 3
1 2 B, B, C, D 4 3
1 3 A, C, D, E 4 4
2 1 B, B, B, C 4 2
2 2 B, B, D 3 2
2 3 B, B, F 3 2
3 1 B, C, C, D 5 3
3 2 B, C, E, F, F 5 4
3 3 B, C, E, F, F, G 6 5
B. Summarization of data from Table 4A.
Point Count Average Number Cumulative Ecological
Station Individuals Species Richness Species Diversity1 Eveness = E
1 4.0 5 4.69 0.960
2 3.33 4 2.56 0.678
3 5.0 6 5.24 0.924
Average 4.11 5.0 3.86 0.839
Cumulative 12.33 7 5.55 0.881
1 Shannon’s index expressed as N1
constant (in this case, 3, the number of point count
stations).
In contrast to measures of abundance, Average
species richness and Cumulative species richness,
will generally not be so simply related to each other.
At one extreme average species richness will equal
cumulative species richness where there is complete
overlap of species at each point count station. At the
other extreme, cumulative species richness will be
three times that of average species richness
(assuming one is summarizing data from three point
count stations) provided there is no species overlap
at any point count station. Reality will usually fall
somewhere in between. Either way of summarizing
species richness can be justified. The same holds for
species diversity; the average diversity (per point
count station) and the diversity of the group of point
count stations are both legitimate ways to
characterize diversity.
Community Similarity Indexes
Another method of comparing communities is to
measure the degree of association or similarity in
community composition between sites or samples.
For example, two sites may be identical in species
richness, but both have completely different species.
For this purpose, a wide range of similarity indices
have been developed (Magurran 1988). Two such
indices that are widely used and that rely only on
presence/absence data are the Jaccard index and
Sorensen index (Krebs 1989):
where j = the number of species found at both site A
and B, a = the number of species in site A and b =
the number of species found in site B. These indices
are designed to equal 1 where the species from the
two sites are the same and 0 if the sites have no
species in common. Example 2 and Table 4 provide
an example of a calculation of Jacard and Sorenson
similarity coefficients.
One of the advantages to these indices is their
simplicity, but the indices do not account for
differences in the abundance of species. All species
count equally in the equation whether they are
abundant or rare. For this reason, quantitative
indices of similarity have much appeal as an
alternative. Again, many such indices have been
developed (Magurran 1988, Krebs 1989). Here we
just mention one of the simplest, the Renkonen
2j
Sorenson Cs = _______
a+b
j
Jaccard Cj = _______
a+b–j
index, also called the Percentage Similarity index.
The formula for the Renkonen index (P) is:
where pA
i is the percentage of species i in sample A
and pB
i is the percentage of species i in sample B
and S is the number of species found in either
sample. With no overlap between samples the index
equals 0, with complete similarity the Renkonen
index equals 100%. Table 4 provides an example of
the Renkonen index.
Example 2:
Calculation of Community Similarity Indices
The following is an simplified example of data
collected using point counts (Table 4). Observations
were pooled using the highest number counted
during 3 different surveys in the breeding season,
and pooled across 5 paired treatment-control plots
(modified from Dieni 1996). Community similarity
and diversity indices can be calculated and
comparisons made using these data.
The row titled “number of individuals” in Table 4 is
the sum of the total number of individuals counted in
each site. The columns titled “pa” and “pb” are the
proportion of each species in the total; i.e., the
number of individuals divided by the total number of
individuals for that site. The calculation of Jaccard’s
index is the number of species in common (j) to both
sites divided by the difference between the sum of
the number of species in each site minus the number
in common. Sorenson’s index is 2 times j divided by
the summation of the number of species in both
sites.
Other indices may also be informative including the
Renkonen index which is calculated by taking the
summation of the minimum of either pa or pb. Other
examples of the calculations of indices that may be
useful are shown in Table 4.
Linear Regression
To introduce linear regression, and provide a simple
example of trend analysis we consider the following.
Example 3:
An Example of Simple Regression
Black-headed Grosbeaks (Pheucticus
melanocephalus) have been surveyed at the
Palomarin station of Point Reyes National Seashore
during the breeding season for many years. Here we
present data from 1980-1992 (13 years) and wish to
i=S
P = Σ minimum (pA
i , pB
i)
i=1
Assessment of Abundance and Species Composition Using Point Counts 11
determine if there has been a trend for numbers to
increase or decrease during this period.
Keep in mind four key assumptions of linear
regression analysis:
1. Normality of residuals
2. Homoscedasticity; that is, there are no systematic
differences in variance of residuals
3. Independence of the outcome variable (i.e.,
independence of residuals), and
4. That we are interested in testing the hypothesis
(HA) that there is some sort of linear relationship
between dependent and independent variable. In
this case, the hypothesis is that bird abundance is
decreasing or increasing with time, in a linear
fashion.
Note that assumptions 1-3 refer to residuals, i.e., the
difference between the observed value of the
dependent (i.e., outcome) variable and the predicted
value from a regression model, we have to fit a
regression model before we can evaluate the
residuals. Figure 1 shows observed data and fitted
regression lines for this example, for
log-transformed data (Figure 1A), and for
untransformed data (Figure 1B). The log
transformation is commonly used in analyses of
linear models (e.g., regression and ANOVA;
additional examples below). There are two reasons
for using a logarithmic transformation:
12 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 4. Calculation of diversity, similarity and evenness indices using total bird detections across sites in
burned and unburned aspen (Populus tremuloides) stands in Wyoming (modified from Dieni 1996).
Number of Individuals Statistical Transformations
Minimum
Species Burned Control pa or pb pa pa ln pa pb pb ln pb
Red-tailed Hawk 2 0 0.000 0.004 –0.021 0.000 0.000
American Kestrel 1 0 0.000 0.002 –0.012 0.000 0.000
Northern Flicker 36 20 0.036 0.068 –0.182 0.036 –0.119
Western Wood-Pewee 21 39 0.040 0.040 –0.128 0.070 –0.186
Dusky Flycatcher 13 9 0.016 0.024 –0.091 0.016 –0.066
Tree Swallow 47 29 0.052 0.089 –0.215 0.052 –0.153
Clark’s Nutcracker 3 0 0.000 0.006 –0.029 0.000 0.000
Black-capped Chickadee 13 18 0.024 0.024 –0.091 0.032 –0.110
White-breasted Nuthatch 0 3 0.000 0.000 0.000 0.005 –0.028
Red-breasted Nuthatch 1 6 0.002 0.002 –0.012 0.011 –0.049
House Wren 127 142 0.239 0.239 –0.342 0.254 –0.348
Hermit Thrush 0 1 0.000 0.000 0.000 0.002 –0.011
American Robin 38 47 0.072 0.072 –0.189 0.084 –0.208
Warbling Vireo 163 199 0.307 0.307 –0.363 0.355 –0.368
Orange-crowned Warbler 14 40 0.026 0.026 –0.096 0.071 –0.189
Brewer’s Blackbird 3 0 0.000 0.006 –0.029 0.000 0.000
Western Tanager 2 2 0.004 0.004 –0.021 0.004 –0.020
Pine Siskin 33 3 0.005 0.062 –0.173 0.005 –0.028
American Goldfinch 1 0 0.000 0.002 –0.012 0.000 0.000
Cassin’s Finch 13 2 0.004 0.024 –0.091 0.004 –0.020
Number of individuals 531 560
Number of species 18 15
Number of species in common (j) 13
Summations 0.827 1.0 –2.095 1.0 –1.903
Jaccard (Cj) 0.650
Sorenson qualitative (Cs) 0.788
Renkonen index (P) 0.827
Shannon diversity (H) 2.095 1.903
Shannon evenness (E) 0.725 0.703
Shannon maximum value (Hmax) 2.890 2.708
1. Linear models assume additivity, but the
relationship between the dependent variable and an
independent variable may be multiplicative, i.e., with
an increase of each unit in x, y increases by a
constant proportion. Exponential growth or decline
of a population is a good example of a multiplicative
model. In this case, we may wish to fit a model in
which Black-headed Grosbeak numbers increase or
decrease by d% per year; our objective is to estimate
the value d, and test whether it is significantly
different from zero.
By taking logarithms, one can convert a
multiplicative relationship,
y = abx,
into an additive relationship,
log(y) = log(a) + (log(b))(x).
What was once a multiplicative relationship can be
rewritten in an additive form,
y′ = a′ + b′x.
2. The logarithmic transformation can often
normalize residuals (as shown below), thus
conforming to an important assumption of
regression analysis, as well as of ANOVA, ANCOVA,
and similar analysis.
A regression analysis on the log-transformed data is
appropriate, but before doing so, we present typical
output from STATA (Table 5) from a regression
analysis with annotated comments (numbers below
correspond to numbers on the output). Table 5A
shows analysis of log-transformed data; Table 5B
shows analysis of untransformed data.
1. Sums of Squares (“SS” in Table 5), degrees of
freedom (“df ”), and Mean Squares (“MS”) are
provided for the model being examined. This output
is usually of greater interest in ANOVA than in
regression analyses. Sums of squares are included in
R2 and R2a (#3, below). “Model” refers to
independent variables (in this case, only one) and
does not include the “constant” term.
2. The F statistic (“F”) for the entire model
(excluding the constant) is shown, and the P-value
associated with that statistic (“Prob >F”). The
degrees of freedom of the numerator (the first term
within the parentheses) equals the number of
parameters in the model, excluding the constant. If
a model includes linear trends for two independent
variables, the numerator df is equal to 2. If the
model, instead, includes quadratic and linear terms
for a single independent variable then the
numerator df is also equal to 2. If the model includes
linear trends for two independent variables and
their interaction, then the numerator df is equal to 3,
and so on.
The overall P-value, while of some interest, should
be of less concern than P-values for individual terms.
A model which contains one very significant
independent variable and one insignificant
independent variable can generate a highly
significant overall P-value, though such a model
Assessment of Abundance and Species Composition Using Point Counts 13
Figure 1. A. Linear trend in log(number Black-headed Grosbeaks observed) in relation to year (1980 to
1992), (statistical analysis in Table 5A). Triangles indicate log(number observed) in each year; solid line
indicates best-fitting trend using linear regression analysis. The trend depicted is a log-linear trend.
B. As in A. but numbers observed are untransformed. Statistical analysis in Table 5B; trend depicted is a
linear trend. Note that trend line fits observations better for log-transformed data (Figure 1A) than for
untransformed data (Figure 1B); e.g., with a higher R2 0.637 vs. 0.545.
Figure 1A. Trend, log-linear, P=0.001
Black-headed Grosbeak, Palomarin 1980-1992
Figure 1B.Trend, linear—notransformation,P=0.004
Black-headed Grosbeak, Palomarin 1980-1992
would be undesirable. On the other hand, if two
independent variables are highly correlated, each
variable could be insignificant (when controlled for
the other), yet the overall model could be very
significant and provide a good predictive model.
3. R2 (“R-square”) and adjusted R2 (“Adj R-square”).
The first statistic is often referred to as the
coefficient of determination. While it should be
familiar to all field biologists, much confusion still
surrounds its use or abuse (Anderson-Sprecher
1994). The second statistic is probably unfamiliar to
many, yet should be more widely known and used
(Neter et al. 1990, Kleinbaum et al. 1988). R2 can be
interpreted as the proportion of variation in the
dependent variable that can be accounted for by the
model in question. Both statistics provide a measure
of the predictive ability of a model. If R2 and
adjusted R2 are low, this means that much variation
in the Y variable is not accounted for by the model,
but this does not reflect on the adequacy of the
model. In Table 5A, R2 = 0.637, meaning that 36% of
the variation in Black-headed Grosbeak numbers is
not accounted for by an exponential decline in
numbers with increasing year.
There are several drawbacks to R2. For one, any
regression model will have a positive R2 associated
with it, even a regression model that links two
variables that are completely unrelated. To provide
an example, we generated two random variables X,
Y, integers chosen from a uniform distribution (0,
100) and which were independent of each other.
Values for X were (3, 67, 98, 63, 25, 90, 34, 4, 31, 78)
and for Y were (44, 91, 30, 92, 26, 56, 57, 90, 81, 47).
Regressing Y on X we obtain R2 = 0.021 (P = 0.69).
We would feel uncomfortable in stating that “X
accounted for 2.1% of the variation in Y,” since in
reality we know that it accounts for no such
variation. The second drawback is that as one adds
additional terms (additional independent variables),
R2 will always increase (Neter et al. 1990). Adjusted
14 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 5A. Linear regression analysis of number of Black-headed Grosbeaks, breeding season,
log-transformed (=ltotbrs) vs. year.
Source SS df MS Number of obs = 13
Model 2.71854704 1 2.71854704 F (1, 11) = 19.31
Residual 1.54865857 11 .140787142 Prob > F = 0.0011
Total 4.26720561 12 .355600468 Rsquare = 0.6371
Adj Rsquare = 0.6041
Root MSE = .37522
ltotbrs Coef. Std. Err. t P>|t| [95% Conf. Interval]
year .1222173 .0278129 4.394 0.001 .183433 .0610016
_cons 245.1384 55.23646 4.438 0.001 123.5637 366.713
Table 5B. Linear regression analysis of number of Black-headed Grosbeaks, breeding season,
untransformed (=totalbrs) vs. year.
Source SS df MS Number of obs = 13
Model 298.291209 1 298.291209 F (1, 11) = 13.20
Residual 248.631868 11 22.6028971 Prob > F = 0.0039
Total 546.923077 12 45.5769231 Rsquare = 0.5454
Adj Rsquare = 0.5041
Root MSE = 4.7543
totalbrs Coef. Std. Err. t P>|t| [95% Conf. Interval]
year 1.28022 .3524085 3.633 0.004 2.055866 .504574
_cons 2554.44 699.8845 3.650 0.004 1014.004 4094.875




R2 (R2a) was developed to counteract these
drawbacks. Adjusted R2 is defined as
where n = number of observations and p = number
of parameters (including the constant), SSE equals
Sums of Squares of the Residual and SSTO equals
Total Sums of Squares. Note that
In other words, R2a is equal to R2 after multiplying
the proportion of unexplained variance by
(n–1)/(n–p). This ratio (the adjustment factor) is
always equal to or greater than one, and therefore
R2a will always be less than or equal to R2. As n gets
large this ratio diminishes, and as p gets large, the
ratio increases. The properties of R2a are that:
1. If there is no relationship between two variables,
R2a will, on average, be equal to zero. Thus, under
the null hypothesis, R2a provides an unbiased
measure of the true relationship between the two
variables. In other words, if Y and X are completely
unrelated, R2a but not R2 will, on average, equal
zero. In the example cited above (of random X and
Y), R2a = –0.101. Any R2a less than 0 makes it
unambiguously clear that one variable does not have
value in predicting the other.
2. R2a will not necessarily increase as one adds
parameters. If the gain in R2 is small, then R2a can
decrease because the gain in R2 does not offset the
decrement due to the increase in p. Thus, R2a can
provide a good means of selecting the best
predictive regression model. In fact, the model
which maximizes R2a is also the model that
minimizes Mean Square Error (equivalently, Root
MSE), which is a measure of residual variation
about the predicted regression line.
4. Root Mean Square Error (“Root MSE”). This
provides a measure of the variability about the
regression line. In other words, it is the residual
variation left after allowing for the effect of, in this
case, year on Black-headed Grosbeak numbers. It is,
literally, the square-root of the Mean Square
associated with the Residual term (i.e., “error”).
Root MSE would equal the standard deviation of the
outcome variable if there were no explanatory power
to the independent variable (i.e., R2 = 0.0);
otherwise, Root MSE is less than the standard
 SSE 
______
SSTO R2 =1–
 SSE 
______
SSTO
 n–1
____
 n–p R2a = 1 –
deviation. Note that Root Mean Square Error in this
example is the measure of variance which the
programs MONITOR and TRENDS ask for
(described in detail below).
5. The regression coefficients (“Coef.”), their
standard errors (“Std. Err.”), and results of t tests,
examining whether t is significantly different from
zero, are shown (“t” and “P>|t|,” respectively).
Shown first is the regression coefficient for the
independent variable, Year. From Table 5A, our best
estimate (assuming that linear regression
assumptions are met) is that the number of birds
observed declines at an instantaneous rate of 0.122
units, expressed in natural logarithms. This
translates to an 11.5 percent decline per year, i.e.,
each year the number of detected birds is 0.885 times
that of the previous year. When the untransformed
data are analyzed (Table 5B), the best estimate is a
decline of 1.28 birds per year.
Shown below the coefficient for year is the
coefficient for the intercept term (here termed
“constant”). The value of the intercept term
provides the predicted value when the independent
term (here Year) equals zero. Thus its value depends
on how the independent variable is coded. Year = 0
might refer to the year 0, to the year 1900, or to any
other year so designated. The designation is
arbitrary and won’t affect the regression coefficient
for the term, Year. Note that STATA evaluates the
regression coefficient for Year using a two-sided
test, which we consider appropriate.
6. The 95% confidence interval for the regression
coefficients are presented. We recommend that
biologists examine confidence intervals for
regression coefficients; a confidence interval can
provide clear evidence of the precision (or lack of
precision) of our analysis. For an example, where an
analysis indicates no significant effect, a confidence
interval may indicate that a very broad range of
values is consistent with the data.
Comparing Table 5A and 5B (corresponding to
Figure 1A and 1B), we see that log-transformation
(Table 5A, Figure 1A) produces a better fitting
model (higher R2, more significant P-value) than
does analysis of untransformed data. This implies
that Black-headed Grosbeaks are declining at a,
more or less, constant proportion rather than at, a
more or less a constant decrease, using the absolute
number of individuals. This result makes biological
sense. Evaluating residuals confirms that the log-transformed
model is preferable. For example, we
can evaluate whether skewness and kurtosis of
residuals deviates from normality for each model
using the Skewness/Kurtosis test in the program
STATA (StataCorp. 1999). For log-transformed data,
Assessment of Abundance and Species Composition Using Point Counts 15
we cannot reject the hypothesis of normality (P =
0.25), whereas for untransformed data we can reject
the assumption of normality (P = 0.0003) (results
obtained using “sktest” in the program STATA).
Results will not always be this clear-cut; we may
want to use graphical methods to examine normality
of residuals. Figure 2 shows a normal probability
plot for transformed (Figure 2A) and untransformed
data (Figure 2B). We won’t go into the details of
these plots (interested readers can refer to
Kleinbaum et al. 1988, Neter et al. 1990); the main
point is that if residuals are normally distributed,
the data points will fall on the straight line shown.
For the log-transformed data there is a reasonably
good match between data points and the line; for
untransformed data there is not. The graphical
method does not determine whether or not the
residuals are normally distributed. It does indicate
to what extent transformation is or is not improving
the normality of residuals.
Example 4:
Application of Simple and Multiple Regression
We now tackle a more complex example, taken from a
study by the Point Reyes Bird Observatory,
conducted for the California Department ofFish&
Game (Nur et al. 1994).We use this example as an
opportunity to provide guidance in carrying out
multiple regression analysis. In July1991an herbicide
was accidentally spilled in and near the Sacramento
River, close to Dunsmuir,CA, resulting in the death of
all aquatic forms of life for a 36-mile stretch of river.
In addition, terrestrial fauna and flora along the river
were thought to have been impacted. Nur et al. (1994)
report results of an avian monitoring project
designed to assess the impact of the spill on
terrestrial bird populations.Aquantitative measure
of presumed impact was developed by California
Department ofFish&Game biologists, relying on
defoliation, leaf death and other symptoms of stress
exhibited by the riparian vegetation, which we term
the Vegetation Damage Index. Sites along the river
varied in the degree of impact, depending on the
exposure to the herbicide. In general sites closer to
the spill site in the downstream direction received
greater damage, and therefore higher values of the
damage index. Point counts were laid out in transects
of 7 stations per transect, stations spaced 300m
apart, with each transect parallel to the river and1800
min length, with one transect per “site”. All transects
were in riparian habitat.
In general, there was a tendency for areas with high
damage to show low species richness (Figure 3). In
particular, there was an overall significant linear
trend for bird species richness to decline with
increasing damage, when analyzing all 55 point
count stations. Output for this analysis is shown in
Table 6A (using the program STATA). Note that in
Table 6A, R2 = 0.149, meaning that 85% of the
variation in species richness among point count
16 Statistical Guide to Data Analysis of Avian Monitoring Programs
Figure 2. Evaluating the assumption of normality using graphical techniques. Comparison of “Normal
probability plots” depicting residuals from analysis of log-transformed (Figure 2A) and untransformed
(Figure 2B) observations of Black-headed Grosbeaks. Figure 2A and 2B depict the empirical cumulative
distribution function expected if the variable were normally distributed (y-axis) vs. the observed cumulative
distribution function (x-axis). If the variable in question were normally distributed then the graphed points
would fall exactly on the solid line and the correlation between the two cumulative distribution functions
would be +1.0. Log-transformed data conform better to a normal distribution than do untransformed
observations.
Figure 2A. Normal probability plot,
residuals of log-transformed data
Black-headed Grosbeak
Figure 2B. Normal probability plot,
residuals of untransformed data
Black-headed Grosbeak
stations is not accounted for by differences in the
damage index. Our interpretation of this result is
that species richness data from individual point
count stations are very variable. The model is,
however, highly significant, and we have no reason to
think the model is inadequate.
One needs to keep in mind that there are two
different objectives for which one can use regression
models: (i) hypothesis testing, and (ii) prediction. In
this case, a model with only vegetation damage
would poorly predict species richness at a specific
point count station. However, such a model achieves
the objective of confirming the hypothesis that
biological damage resulting from the spill was
associated with diminished species richness.
Also keep in mind that the magnitude of R2 depends
on the unit of analysis. If one were to average data
from several point counts and then use the averaged
data in a regression analysis, this would have little
effect on the P-value, yet would increase R2
substantially. This is because some of the variation
in the dependent variable has been eliminated by
using mean species richness values in the regression
analysis, rather then species richness at individual
point count stations.
We confirmed that the linear regression analysis in
Table 6A is appropriate, first by examining
normality of residuals: P = 0.50, using the skewness/
kurtosis test (“sktest” of STATA). In other words,
residuals do not appear to deviate from normality.
We demonstrate this point graphically in Figure 4.
Figure 4A shows the frequency distribution of
residuals compared to a normal distribution; Figure
4B shows a quantile-normal plot for the residuals
from Table 6A. (Quartiles and percentiles are
examples of quantiles; a quantile-normal plot shows
quantiles for the distribution of interest vs. quantiles
from a normal distribution which matches the first
distribution in terms of mean and variance.) As with
a normal-probability plot (Figure 2), if the
distribution is indeed normal, then the data points
(quantiles in this case) would fall on the solid line
shown in the Figure 4B. In this case, there seems to
be a very good match, implying that residuals are
approximately, normally-distributed.
That bird species richness was correlated with the
Vegetation Damage Index is not by itself adequate
evidence for a causal link. In a similar fashion to the
analysis in Table 6A, a suite of vegetation
characteristics were examined, to determine
whether bird species richness, diversity and
abundance were related to habitat or vegetation
features. If so, such habitat variables could be
confounding any relationship of the bird fauna to the
impact of the spill. In one scenario, there could be no
true functional relationship between bird species
richness and vegetation damage, but a correlation
between the two can arise if both are correlated with
a vegetation feature. In another scenario, the true
causal relationship between bird species richness
and vegetation damage could be strong but it could
be masked, wholly or in part, because both are
correlated with a vegetation feature. For example, if
biological damage from the spill lowered species
richness, and the presence of willow (Salix spp.)
increased species richness, then if biological damage
was greatest in an area where willows were most
abundant, the correlation between biological
damage and bird species richness could be very
weak despite a strong causal relationship between
the latter two variables.
Nur et al. (1994) examined 25 habitat features to
determine whether they might be correlated with
abundance, species richness and/or species diversity.
They found that only two habitat features were
significantly correlated with abundance, species
richness and diversity. The latter variables were
positively correlated to the presence of willow
species and negatively with the presence of big-leaf
maple (Acer macrophyllum), i.e., the more big-leaf
maple, the fewer the bird species detected. The
independent variables were indices based on percent
cover that was willow (on a 0 to 10 scale,
corresponding to 0 to 100%), and percent cover of
big-leaf maple. Results of simple linear regression of
bird species richness in relation to willow cover and
Assessment of Abundance and Species Composition Using Point Counts 17
Figure 3. Bird species richness from 55 point count
stations along the Sacramento River, in relation to
Vegetation Damage Index. Higher values imply
greater damage from spill of metam sodium
(statistical results in Table 6A). Least squares line of
best fit is shown. Data at each point count station
have been “jittered” (Stata Corp. 1997) to reduce
overlap of points.
Figure 3. Bird species richness in relation to
Vegetation Damage Index.
in relation to big-leaf maple cover are shown in
Tables 6B and 6C, respectively.
The next step in the analysis was to conduct a
multiple regression analysis including the three
independent variables (Table 7). In this case, the
primary interest was the effect of damage index
while controlling for the two habitat variables. The
results indicate that damage was still inversely
correlated with species richness, even after
controlling for one or the other habitat variable, or
after controlling for both of the habitat variables
(Table 7). These results give support to the view that
biological damage due to the spill reduced species
richness along the river. The results do not support
the alternative view that the inverse association
between species richness and damage was
coincidental, reflecting habitat or vegetation
differences among sites along the river.
The degree and direction of differences among
independent variables (if it exists) can be assessed
by comparing regression coefficients in the
simple regression analysis (Table 6) and in the
corresponding multiple regression analysis
18 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 6. Sample output for linear regression analyses using STATA. See text, Example 4.
A) model: species richness [specrich] = Vegetation Damage Index [vegdindx]
Source SS df MS Number of obs = 55
Model 118.969398 1 118.969398 F (1, 53) = 9.28
Residual 679.466965 53 12.8201314 Prob > F = 0.0036
Total 798.436364 54 14.7858586 Rsquare = 0.1490
Adj Rsquare = 0.1329
Root MSE = 3.5805
specrich Coef. Std. Err. t P>|t| [95% Conf. Interval]
vegdindx 1.663229 .5459849 3.046 0.004 2.758336 .5681219
_cons 9.368597 .5243445 17.867 0.000 8.316895 10.4203
B) model: species richness [specrich] = willow cover [willotco]
Source SS df MS Number of obs = 55
Model 78.2553574 1 78.2553574 F (1, 53) = 5.76
Residual 720.181006 53 13.5883209 Prob > F = 0.0200
Total 798.436364 54 14.7858586 Rsquare = 0.0980
Adj Rsquare = 0.0810
Root MSE = 3.6862
specrich Coef. Std. Err. t P>|t| [95% Conf. Interval]
willotco .0838145 .0349257 2.400 0.020 .0137624 .1538667
_cons 8.184659 .549244 14.902 0.000 7.083015 9.286303
C) model: species richness [specrich] = big-leaf maple Cover [bigleaco]
Source SS df MS Number of obs = 55
Model 111.19329 1 111.19329 F (1, 53) = 8.58
Residual 687.243074 53 12.9668504 Prob > F = 0.0050
Total 798.436364 54 14.7858586 Rsquare = 0.1393
Adj Rsquare = 0.1230
Root MSE = 3.601
specrich Coef. Std. Err. t P>|t| [95% Conf. Interval]
bigleaco .0290063 .0099058 2.928 0.005 .0488740 .0091387
_cons 9.799174 .6043525 16.214 0.000 8.586997 11.01135
(Table 7). The effect of damage index was similar
when analyzed by itself or after controlling for
willow tree cover (ß = –1.66 ± 0.55 vs. ß = –1.58 ±
0.53). This indicates that willow cover did not
confound the relationship between vegetation
damage and species richness. On the other hand, the
apparent effect of damage index, was stronger when
analyzed by itself than after controlling for big-leaf
maple (ß = –1.66 ± 0.55 vs. ß = –1.26 ± 0.56). Big-leaf
maple tended to be more prevalent in areas
where biological damage was greater (in fact, there
was a significant correlation between the two,
P<0.01) and thus part of the apparent reduction in
species richness with increasing damage may be
attributed to the influence of big-leaf maple.
Analyzing Vegetation Data in Relation to
Point Count Data
In Example 4 (Tables 6-7), we provide an example of
vegetation data analysis coupled with analysis of
data on bird populations. In this case, the objective
was to determine whether the relationship between
vegetation damage and species richness was due to a
direct effect of spill-induced damage, or whether the
correlation was spurious and due to the fact that
both were correlated with additional habitat
variables. There was evidence that bird species
richness was related to habitat variables (specifically
the presence of willow and big-leaf maple), but these
relationships could not by themselves account for
the observation that bird species richness declined
as vegetation damage increased.
Collecting data on many habitat and vegetation
features doesn’t answer the question of which
habitat and vegetation features are causally related
to bird abundance or distribution. If data on
important variables are not collected then
interpretation of the data that were collected can be
compromised. There is still the problem of sifting
through the data to determine which features are
most closely related to the response variable in
question. Many techniques have been used by
investigators to evaluate multi-dimensional data,
including logistic regression, discriminant analysis,
principal component analysis, correspondence
analysis and MANOVA (Ludwig & Reynolds 1988,
Trexler & Travis 1993). It is beyond the scope of this
Guide to review these various techniques; however,
an example of the use of discriminant analysis is
presented in the section on vegetation analysis in
relation to nest-monitoring.
Design
Even if statistical inference is of no concern to the
investigator, one must decide on sample size. Ralph
et al. (1995) recommend at least 30 point count
stations per habitat and per area of interest. If one
wished to monitor two habitats in each of two areas
then this would necessitate 120 point count stations.
This is only a base number and the number of
stations should be increased if few individuals of a
species or group of species of interest are detected.
Where statistical inference is a goal, then sample size
will be dictated by considerations of statistical
Assessment of Abundance and Species Composition Using Point Counts 19
Figure 4. Evaluation of the assumption of normality of residuals using graphical techniques. Residuals of
linear regression analysis of species richness are depicted (statistical model in Table 6A). Figure 4A)
Frequency distribution of residuals (histogram), superimposing a frequency distribution for a normally-distributed
variable with the same mean and variance as the observed variable. Figure 4B) Same residuals
as in A) but graphed using a quantile-normal plot. The quantiles for the observed distribution (residuals as
in Figure 4A) are plotted against the quantiles from a normally-distributed variable with the same mean and
variance as the variable in question. Normality is demonstrated if the observations fall on the solid line.
Both A) and B) confirm that residuals are normally distributed.
Figure 4A. Distribution of residuals: species
richness vs. Vegetation Damage Index
Figure 4B. Quantile-quantile plot of residuals of
species richness vs. Vegetation Damage Index
against normal distribution
power; examples include comparisons among
treatments and monitoring programs that assess
temporal trends (Gerrodette 1987, 1991, Link &
Hatfield 1990; for a general review, see Thomas &
Krebs 1997).
Dawson (1981) has derived an equation for
determining the sample size, in this case the number
of point count stations necessary to detect an effect
of interest with 50% power. He assumed that each
station is surveyed once, and that the number of
detections (for each species or group of species) at
each station follows a Poisson distribution. If the
distribution of bird-detections deviates from Poisson
distribution, then the formula would need to be
revised. Under a Poisson distribution the mean
number of detections and the variance in the
number of detections per point count station would
be equal. If the variance substantially exceeds the
mean, or vice versa, one could either modify his
formula, or use other formulas (as modified below).
Dawson also assumed that there were two groups
(two habitats, two treatments, etc.) of interest. The
formula for sample size calculations is
[1]
(3.84)(20000)
n > ______________
(d2)(m)
where m = average number of detections per
sampling unit and d = percent difference between
group 1 and group 2, defined as
For example, if m1 = 2.5 and m2 =1.5, then m = 2.0
and d = 50.
Thus, where the average number of individuals
detected per station is equal to 1, the number of
stations per group necessary to achieve 50% power
to detect a 50% difference in mean abundance is
30.7. To detect a 25% difference would require 123
point count stations per group. That is, to detect
half the difference (all else being equal) requires 4
times the sample size! This exemplifies a general
rule—precision increases, and therefore standard
errors decrease, in proportion to the square root of
sample size. Note that as average number of
detections increases, sample size (number of
stations) decreases linearly—and proportionally.
This emphasizes that calculations of necessary
sample sizes reflect the average number of
detections. Thus, if the average number of
individuals detected per station is 0.5 rather than
m1–m2
_______
 m 
d = (100)
20 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 7. Analysis of point count data on Sacramento River. Relationship of bird species richness to Damage
Index, controlling for vegetation/habitat characteristics (n = sample size).
A. Multiple regression analysis of bird species richness per point count station in relation to vegetation
damage and willow cover
Model Statistics Independent variable1
R2a = 0.201, R2= 0.231, P = 0.0011, n =55 Vegetation Damage Index ß = –1.577 ± .525, t = –3.00, P = 0.004
Willow cover2 ß = +0.0770 ± .0326, t = +2.36, P = 0.022
B. Multiple regression analysis of bird species richness per point-count station, in relation to vegetation
damage and big-leaf maple cover.
Model Statistics Independent variable1
R2a = 0.186, R2= 0.216, P = 0.0018, n = 55 Vegetation Damage Index ß = –1.264 ± .562, t = –2.25, P = 0.029
Big-leaf maple cover2 ß = –0.0213 ± .0102, t = –2.10, P = 0.041
C. Multiple regression analysis of bird species richness per point-count station, in relation to vegetation
damage, willow cover, and big-leaf maple cover
Model Statistics Independent variable3
R2a = 0.229, R2= 0.272, P = 0.0010, n = 55 Vegetation Damage Index ß = –1.273 ± .547, t = –2.33, P = 0.024
Willow cover2 ß = +0.0651± .0329, t = +1.98, P = 0.053
Big-leaf maple cover2 ß = –0.0170 ± .0101, t = –1.68, P = 0.100
1 Both considered simultaneously
2 On 0 – 10 scale
3 All three considered simultaneously
1.0, then twice as many point count stations are
required, i.e., 246 and 61.4 point count stations
would be required per group to detect a 25% and
50% difference, respectively.
As stated earlier, it is common practice to use a
higher level of statistical power (e.g., 80%) in
designing studies. Dawson (1981) only considered
50% power, but his results can be extended to
consider more stringent levels of power as follows.
For other levels of power substitute the following
(approximate) values for the 3.84 in Equation 1: for
70% power, substitute 6.15; for 80% power substitute
7.84; for 90% power substitute 10.50.
These values were derived from a more general
formula for comparison of means using two-sample
t tests, i.e.,
where n is the required sample size for each sample,
σ1
2 refers to variance in sample 1, etc., μ1 refers to
mean value in sample 1, etc., and zsubscript is a
“normal deviate” (also called z-score). For example,
for α = 0.05, z(1–α/2) = z0.975 =1.96; for power = 0.8,
z0.8=0.84; for power=0.5, z0.5=0.0; for power=0.9,
z0.9=1.28; and so on (Snedecor & Cochran 1989).
The experimenter would set the difference between
means; σ1 and σ2 are fixed by the investigator
(i.e., determined independently). For a Poisson-distributed
variable, μ1 = σ1
2 and μ2 = σ2
2.
Thus, with 1 individual detected on average per
point count station, approximately 250 and 63 point
count stations would be required per group to
achieve 80% power to detect between-group
differences of 25% and 50%, respectively. With 0.5
individual detected per point count, at least double
these sample sizes (500 and 126) would be required
per group to achieve the same 80% power. Note that
the minimum recommendation of Ralph et al. (1995),
i.e., 30 point count stations per habitat or treatment,
provides 80% power to detect a 50% difference given
an average of 2.0 detections per station, but only
yields 50% power to detect a 50% difference given
1.0 detections per station.
Where there are three or more groups of interest,
we recommend the same number of point count
stations per group be maintained. Thus to detect a
50% between-group difference with 50% power and
0.5 individual detected, on average, per station, one
would need either 123 point count stations (total)
allocated among 2 groups or 184 point count stations
[2]
(σ1
2+σ2
2) (z1–α/2+z1–β)2
________________________
( μ1–μ2)2
n =
for 3 groups. This is admittedly a conservative
approach. It would maintain the same degree of
precision per group, whether or not there are two or
more groups.
Buckland et al. (1993) present a formula for
calculating sample size (number of point count
stations) necessary to achieve a specified precision
in estimating population size. They assume that one
has conducted a pilot study of k0 stations and
detected a total of n0 individuals (assuming no
aggregations or clustering of individuals, as is the
case in flocks). Their formula for number of
stations, K, is
where CV(D) is the coefficient of variation of
abundance, i.e., the standard error (not the
standard deviation) of abundance divided by mean
abundance. For example if one wished to estimate
abundance such that the standard error was 20% of
the mean value, then CV(D) = 0.2. b is a factor that
depends on several variables and can be estimated
from pilot data (Buckland et al 1993); however, they
state that it will usually be about 3. Thus if 15
individuals are detected at 10 point count stations
on the pilot study, CV(D) = 0.2 (by design) and b =
3, then K = 3.0⁄0.04 × 0.667 = 50 point count stations.
This will be sufficient to establish a confidence
interval for abundance that ranges ± 40% of the
true value.
Power and Sample Size Analysis Using TRENDS
Recently, there has been a veritable explosion of
software now available for determination of power
and/or necessary sample size to achieve specified
power (Thomas & Krebs 1997, available on the
world-wide web at http://www.interchg.ubc.ca/cacb/
power/review/). Two specialized, free programs are
available for monitoring programs that evaluate
trends, whether those trends are temporal or
spatial.
For analysis of trend data using linear regression, a
user-friendly program, TRENDS (Gerrodette 1987,
1991), has been written by Tim Gerrodette and is
available, free of charge, from the Internet
(ftp://ftp.im.nbs.gov/pub/software/CSE/wsb21515/
trends.zip; also available from T. Gerrodette,
Southwest Fisheries Science Center, P.O. Box 271,
La Jolla, CA 92038, in which case please provide him
a 3.5″ IBM-compatible floppy disk). A User’s Guide
is provided with the software. We offer the following
as guidance in using and interpreting results from
TRENDS.
 k0 
___
 n0 
 b 
 _________ 
 [CV(D)]2 
K =
Assessment of Abundance and Species Composition Using Point Counts 21
TRENDS can be used for either temporal trends or
spatial trends, with regard to changes in abundance.
The program TRENDS can compute any one of the
following parameters:
1. Number of samples (either number of occasions,
for temporal trends, or number of sites for spatial
trends) (n);
2. Rate of change (expressed as proportional decline
or increase of the total population, e.g., –0.10 refers
to 10% decline per unit of time or space; 0.05 refers
to 5% growth per unit of time or space, etc.) (r);
3. Measure of variation about the trend line, which
Gerrodette (1989) refers to as “initial coefficient of
variation” (CV1);
4. Significance level (α);
5. Power (1-ß, where ß is the probability of making a
Type II error).
If four of these parameters are specified, the fifth
parameter is strictly determined, and its value can
be calculated by TRENDS. For example, if one
specifies number of samples, magnitude of the rate
of change, the variation about the trend line, and the
a level, TRENDS calculates power. In the same way
one can calculate the necessary number of temporal
or spatial samples to achieve a specified power to
detect an effect of specified magnitude.
If one is evaluating temporal trends, then the
number of samples refers to the number of sampling
occasions—most commonly the number of years
studied. If several surveys or point counts
contribute to a single year’s data, these would be
averaged together to yield a single datum for that
year. In the program TRENDS, if 10 point counts
are conducted in each of 10 years, the number of
samples is 10, not 100.
The measure of variation about the trend line is
given as the coefficient of variation (= standard
deviation [or standard error] divided by the mean),
symbolized in the program as CV1. CV1 is inversely
related to precision. Gerrodette refers to CV1 as
“initial coefficient of variation”, which is misleading.
The best way to obtain an estimate of CV1 is to use
data from a trend analysis to determine root mean
square error (Table 5) and then divide this by the
mean (or expected value). Gerrodette provides an
example where CV1 was estimated from replicate
counts on the same population in the same year
which was the initial year of the study. We strongly
advise against this practice because doing so
estimates only the part of the variation due to
measurement error. An additional part of the
variation about the trend line is due to stochastic
variation in abundance, which also needs to be
incorporated into CV1.
The other three parameters are straight forward. In
addition, TRENDS requires one to make additional
specifications:
6. Whether to use a 1- or 2-tailed test,
7. Whether population change is linear or
exponential,
8. Whether one’s test statistic is the z or t statistic,
and
9. How CV1 is related to abundance.
Although the first three are straight-forward, we do
have some recommendations. First, a 2-tailed test is
almost always the appropriate test, because the
possibility of an increase in population cannot be
ruled out, and would be of interest just as much as a
decline in population. Secondly, for temporal trends,
exponential growth is to be preferred. Exponential
growth implies that the growth or decline is a
constant percentage, e.g., a population increases at
10% per year. Furthermore, this assumption is in
accord with the definition of r, the rate of change.
Thirdly, if one is estimating CV1 from data, then the
t statistic is appropriate (Link & Hatfield 1990).
The relationship of CV1 to abundance is complex.
TRENDS assumes that variance (VAR) in a
population estimate is (i) proportional to abundance
(A), (ii) proportional to A2, or (iii) proportional to A3.
TRENDS does not allow for VAR to be independent
of abundance, or for VAR to be inversely
proportional to abundance. Noting that mean
abundance is proportional to abundance and that the
CV is the square-root of VAR divided by the mean,
options (i) to (iii) imply that:
(i) CV is proportional to 1/√A,
(ii) CV is independent of A, or
(iii) CV is proportional to √A.
TRENDS allows one to choose among these three
options. Which option one chooses will affect
calculated power, necessary sample sizes, etc.
(Link & Hatfield 1990). As guidance for choosing
among the options, Gerrodette (1987) offers the
following: for quadrats, strip transects, line
transects, or catch per unit effort (CPUE), CV is
proportional to 1/√A. For distance sampling, CV is
independent of A. For the single mark-recapture
using the Peterson method, CV is proportional to
22 Statistical Guide to Data Analysis of Avian Monitoring Programs
√A. Presumably, option (i) applies to standard
(fixed-distance) point count data, since a point
count can be thought of as a line transect of length
zero (Buckland et al. 1993).
Example 5:
Power Calculation Using TRENDS
To provide an example of one of the uses of
TRENDS, we consider surveys of Black-headed
Grosbeaks (based on data given in Example 3 and
Figure 1). CV1 for a single annual count (of log-transformed
data) is estimated to be 0.284 (root
mean square error [see Table 5A] divided by mean
value). Assuming α=0.05, exponential population
change (i.e., constant percentage change each year),
a 2-tailed test, use of the t-statistic, and CV1
proportional to 1/√A, the probability (power) to
detect a 5% decline per year after 10 years is 34%.
If, instead, CV1 is independent of A, then the power
to detect a 5% decline per year after 10 years is 29%.
It is difficult to say whether option (i) or option (ii) is
more appropriate, but the difference between the
two estimates is small. Thus, the power to detect a
substantial decline (amounting to 40% decline after
10 years) is fairly weak. If we increase the time scale
to 15 years, however, power increases to 89% (under
option i). Under this scenario we would have
appreciable power to detect a decline, but after 15
years the population will have declined by 54%. An
alternative means of increasing power would be to
increase the precision of our annual estimate, which
implies lowering the variance about the trend line,
i.e., decreasing CV1. If CV1 could be lowered from
0.284 to 0.200, power would increase (assuming 5%
decline over a 10 year period) from 34% to 59%.
CV1 might be lowered if sources of error could be
reduced (e.g., conduct surveys at the same time of
year) or if replicate surveys were carried out and
the results for each year then averaged.
One can easily use TRENDS to determine, instead,
the minimum number of years required to attain
80% power to detect a 5% decline per year (14 years,
assuming option i).
Using MONITOR
Whereas TRENDS uses an analytic approach to
determine sample size, power, etc., MONITOR
(developed by James Gibbs) uses computer
simulation. Link & Hatfield (1990) argue that
computer simulation is to be preferred to analytic
solutions, because the latter can only provide
approximate results. The program MONITOR is
easy to use (ftp://ftp.im.nbs.gov/pub/software/
monitor), and a Manual is readily available as well.
Users of the program should take into account the
following points.
1. Similar to TRENDS, only one data point per plot
(or transect or route) is allowed per time unit (e.g.,
per year). Unlike TRENDS, MONITOR can allow
for analyses conducted on several plots at once.
Where several plots (transects, routes, etc.) are
analyzed at once, MONITOR calculates a weighted
trend (see below for further discussion of
weighting).
2. Whereas “plots”, can refer to “routes” or
“transects,” it can also refer to individual point count
stations if they will be analyzed in this way (and not
simply pooled across a transect or route). The
maximum number of “plots” is 250.
3. When several surveys are conducted for each plot
in the same year (or breeding season or other
interval of interest), MONITOR averages across
these data (i.e., collapsing the data into a single data
point per plot per year).
4. A critical variable is “variance of plot counts.”
This variance is used to simulate variation about the
specified trend line. The manual suggests that
within-year variation (determined from multiple
surveys) can be used to estimate between-year
variance, but this will generally not be valid. The
correct estimate is the variance about the trend line,
just as with TRENDS.
5. As with TRENDS, the trend can be linear or
exponential. We strongly recommend an exponential
trend for reasons discussed above, unless data at
hand indicate a linear trend is more appropriate.
6. Data from multiple plots can be weighted
according to mean abundance, but variance about
the plot-specific trends is not used in weighting. This
is less than satisfactory, because it means that a
poorly-estimated trend has as much weight as a
well-estimated trend.
7. When data are collected from several plots,
MONITOR de-means the values (subtracting off
the mean value for each plot) before calculating the
variance. Otherwise, variance due to habitat
differences among plots will be included in the
estimate of sampling variance (which we are
interested in). However, this de-meaning is
undesirable because it over-corrects. That is,
suppose we have n plots that are true replicates. In
this case, all between-plot differences are due to
sampling variation, which will have been completely
removed by de-meaning. A better approach would
be to use a covariate (or set of covariates) to
characterize habitat variation, and then use
residuals from a regression on the habitat
covariate, to provide an appropriate measure of
variance.
Assessment of Abundance and Species Composition Using Point Counts 23
Power and Sample-Size Analyses: Other Sources
Several stand-alone statistical packages are now
available that can calculate power for a variety of
statistical tests and situations (reviewed by Thomas
& Krebs 1997). In their review, Thomas and Krebs
mention five programs that they could recommend.
Of these we highlight two: the first is PASS (Power
And Sample Size; available from NCSS Statistical
Software, 329 North 1000 East, Kaysville, UT 84037;
http://www.ncss.com/pass.html). When a class of
ecology graduate students was asked to compare
PASS with three other recommendable power and
sample size programs, 17 out of 19 students
preferred PASS! It is flexible, accurate, easy to use,
and easy to learn. The cost is moderate ($249). The
other program we mention (also reviewed by
Thomas and Krebs) is GPOWER (Erdfelder et al.
1996); though this program did not score as highly as
PASS, it is free (http://www.psychologie.uni-trier.
de:8000/projects/gpower.html). Thomas and
Krebs (1997) examined several general-purpose
statistical programs with built-in power analyses,
but found none that they could recommend.
Two other valuable sources for power analysis are
on the web. The Patuxent Wildlife Research Center
of USGS has an excellent page, that includes a
power analysis program for calculating power
for monitoring programs, using the data in a
manner similar to TRENDS. This is available at
http://www.im.nbs.gov/powcase/powcase.html.
Also available is a web page dedicated to the
discussion and calculation of power analyses at
http://www.im.nbs.gov/powcase/powlinks.html.
This site has both MONITOR and TRENDS
available as freeware.
A number of statistical texts treat the problem of
determining power. Fleiss (1981) provides an
excellent practical treatment of the problem when
the outcome is binary (only one of two outcomes), or
can be expressed as a rate or proportion. Thus, his
text can be very useful for studies of survival or
studies in which the outcome is presence or absence.
Cohen (1988) gives an extensive non-technical
treatment of power analysis for ANOVA.
24 Statistical Guide to Data Analysis of Avian Monitoring Programs
Mist-nets can be used to provide estimates of many
parameters: 1. relative abundance, 2. species
composition (richness, diversity), 3. productivity, as
measured by production or abundance of HY
(Hatching Year) birds, and 4. annual adult survival.
In addition, one can, in theory, estimate 5. offspring
survivorship to breeding age using data from mist-nets
but this is an area that is only now being
investigated by researchers. Regarding abundance
and species composition, methods of analysis are
the same as described for analysis of point count
data. Recent examples of analyses of trends in
abundance include Johnson & Geupel (1996) and
Chase et al. (1997); Silkey et al. (1999) discuss the
validity of inferring population trends from mist-net
capture data. Nur et al. (1994) analyzed
patterns of abundance along the upper Sacramento
River (Example 4) using mist-net capture data and
using point-count data. Mist nets cannot, however,
provide an absolute measure of abundance. On the
other hand, they can provide an age-specific, and
sometimes sex-specific, measure of abundance,
with a resolution that cannot be matched by point-count
or line-transect data.
Analysis of Productivity
The number of HY birds caught in a standardized
mist-netting study can provide an index of
production of young (DeSante & Geupel 1987, Nur
& Geupel 1993b, DeSante et al. 1993). Such data
have been analyzed in three ways: (i) analysis of
total number of HY birds caught; (ii) analysis of
number of HY birds caught per AHY (After
Hatching Year, i.e., adult) caught; or (iii) analysis of
per cent of all birds who are HY. Among these
parameters, (iii) is just a transformation of (ii), and
vice versa, provided that all birds are classified as
HY or AHY (total = AHY + HY). This can be seen
as follows: let HY/AHY=R. Then proportion of all
birds that are HY = HY/(AHY + HY), can be
written as
Thus, (iii) only re-expresses (ii), but the
interpretation of (ii) is more direct: the number
of fledged young per adult.
proportion HY = ____1___
1 + (1–
R)
Nur & Geupel (1993a, 1993b) point out that there are
hazards with including the number of AHY birds
caught as a measure of productivity (as do indices ii
and iii above): notably, the catchment area of HY
and AHY can differ markedly (Nur & Geupel 1993a).
Secondly, many AHY birds are transient, i.e., not
breeding locally. As an alternative, one can use the
measure in (i), HY birds alone. If one finds
differences in the number of HY birds caught in two
sets of sites, or can establish a trend in HY numbers,
this provides information about the production of
young on a population level, but it may or may not
indicate differences or trends in productivity per
pair. Thus we recommend that, if variations in
breeding population size can safely be ruled out, in
comparing areas or comparing years, the number of
HY be analyzed by itself (see examples in Nur &
Geupel 1993b). Otherwise, the biologist should
analyze (ii) or (iii). Example 6 demonstrates
different ways of analyzing productivity.
Example 6:
Analyses of Productivity.
This example is taken from the study of the impact
of the herbicide metam sodium on landbird
populations of the Sacramento River, described in
Example 4. Table 8 shows a species-by-species
analysis of productivity as measured in two ways,
HY birds per 100 net-hours, and proportion of HY
birds in the catch.
To examine patterns of productivity we selected
those species with sufficient sample size. Our
criteria were: (1) at least 36 individuals caught (of all
age classes) from the 9 sites, and (2) at least 12 HY
individuals caught, total, from the 9 sites. Six species
met both criteria (Table 8). The “36 individual
criterion” implied that each site averaged 4 or more
individuals caught, which we considered a minimal
acceptable number. We would have preferred to
impose a minimum of 45 individuals (i.e., 5
individuals caught per site on average), but then we
would have had fewer than 6 species to analyze. The
second criterion, at least 12 HY individuals caught
among the 9 sites, may seem too low a threshold (an
average of 1.33 HY individuals caught per site).
Nevertheless, we wished to include possible
instances where reproductive success was poor or
25
III. Demographic Monitoring: Mist-nets
nil at a number of the 9 sites; such apparent
reproductive failure might be especially informative.
Thus, a hypothetical species with 5, 4, 2, 1, 0, 0, 0, 0, 0
captures at 9 sites would qualify with respect to the
“12 HY” criterion. On the other hand, a species with
12 HY captures at 1 site, and 8 sites without any HY
captures is not particularly informative. We thus set
a 3rd criterion: HY captures at a minimum of 3 sites.
All species that met criteria (1) and (2), also met the
3rd criterion.
To analyze the HY capture data, we log-transformed
capture rates (birds caught per 100 net-hours), for
each species at each site. To avoid taking the log of 0
(which is undefined), we added a constant—in this
case, 1—before log-transforming. Had there been no
zeroes in the data set, we would not have added any
constant; there would have been no need to. Whereas
adding a constant before log-transformation is
standard practice, it can lead to bias (Thomas 1996).
However, the direction of bias is conservative: adding
a constant makes it somewhat more difficult to detect
an effect (e.g., to detect a trend).We recommend that
investigators try two different constants (e.g., adding
0.5 and adding 1) and determine if results are similar.
If they are, then the investigator has some confidence
that his or her results are not unduly sensitive to the
chosen constant.
For analysis of the proportion of HY in the catch, we
used the logit-transformation. In the case where
total captures = HY + AHY,
logit(proportion of HY) = loge(HY/AHY).
26 Statistical Guide to Data Analysis of Avian Monitoring Programs
Table 8. Analysis of mist-net captures, Sacramento River 1993: Relationship to Damage Index for the
six species with adequate sample size (at least 12 HY individuals caught, and at least 36 individuals,
total, caught, among 9 sites).
A) Dependent Variable: Hatching Year birds per 100 net hoursa
Species Analysis Number HY Caught
Black-headed Grosbeak ß = –0.369 ± 0.192, P = 0.096, R2a= 0.251, R2= 0.345 12
McGillivray’s Warbler ß = –0.519 ± 0.405, P = 0.240, R2a= 0.074, R2= 0.190 39
Orange-crowned Warbler ß = –0.226 ± 0.078, P = 0.023, R2a= 0.479, R2= 0.544 15
Song Sparrow ß = –0.565 ± 0.357, P = 0.16, R2a= 0.159, R2= 0.264 55
Spotted Towhee ß = –0.649 ± 0.227, P = 0.024, R2a= 0.472, R2= 0.538 32
Yellow-breasted Chat ß = –0.462 ± 0.169, P = 0.029, R2a= 0.449, R2= 0.518 13
B) Dependent Variable: Proportion of Hatching Year birdsb
Species Analysis Number of Sites
Black headed Grosbeak ß = –0.901 ± 0.566, P = 0.15, R2a= 0.161, R2= 0.266 9
McGillivray’s Warbler ß = –1.054 ± 1.115, P > 0.3, R2a= 0.006, R2= 0.130 9
Orange-crowned Warbler ß = +1.623 ± 0.432, P = 0.007, R2a= 0.622, R2= 0.669 9
Song Sparrow ß = –0.819 ± 0.690, P > 0.2, R2a= 0.170, R2= 0.274 9
Spotted Towhee ß = –1.707 ± 0.789, P = 0.074, R2a= 0.344, R2= 0.438 8
Yellow-breasted Chat ß = –1.497 ± 1.117, P > 0.2, R2a= 0.117, R2= 0.264 7
a Hatching Year Birds caught per 100 net-hours, log-transformed, i.e. ln((HY caught + 1)/100 net-hours). Results of simple regression
analyses for effect of Vegetation Damage Index. Sample size = 9 sites for each analysis.
b Proportion of Hatching Year birds, logit-transformed Results of simple regression analyses for effect of Vegetation Damage Index.
Sample size (number of sites) for each analysis is shown.
Table 9. Analysis of mist-net captures, Sacramento River, 1993: Relationship of HY, and proportion HY birds
caught in relation to Vegetation Damage Index. Results of simple regression analyses; independent variable
in each model is Vegetation Damage Index. Number of sites (sample size) is 9. Capture rates have been
log-transformed, i.e. ln((number of birds + 1)/100 net-hours).
Dependent Variable Analysis
HY birds/100 net-hours ß = –0.795 ± 0.201, P = 0.006, R2a= 0.646, R2= 0.690
Proportion of HY birds caught
logit-transformed ß = –0.360 ± 0.120, P = 0.028, R2a= 0.453, R2= 0.521
The logit transformation is a commonly used
transformation in biological analysis and forms the
basis of logistic regression (Chapter V). Note that
the logit(proportion of HY) is undefined when the
denominator (in this case number AHY caught) is
zero.We could have added a constant to the
denominator to avoid this “problem” but did not;
we consider it biologically appropriate that our
measure of productivity is undefined when there
are (apparently) no adults present. For the analysis
in Table 8B, sites could not be included where
logit(proportion of HY) was undefined, i.e., where
no AHY were caught. This applied to two of the
six species.
As shown in Table 8, of the six species analyzed,
three showed a significant decline in capture rate
with increasing biological damage. Analyses of the
HY/AHY ratio indicated a consistently downward
trend with increasing damage (5 out of 6 species had
a negative slope), but no species had a significant
negative trend. These results suggest that sample
sizes of individual species were likely too small to
reveal significant patterns, and a pooled analysis
was carried out, shown in Table 9. Analyses of all
HY and AHY caught for all terrestrial bird species
were pooled and the results confirmed a significant
decrease in productivity with increase in damage
symptoms.
Analysis of Adult Survival
Survival can be analyzed in two ways: using
capture/recapture methods or analyzing “return
rate”. Return rate is the proportion of individuals
observed in one time period (we refer to this
period as t), which are observed again (resighted,
recaptured, etc.) in the following time period
(period t +1). Thus return rate is the product of two
processes: survival from period t to period t +1, and
resighting (or recapture) in period t +1. Resighting
probability is defined as the probability an individual
is resighted at time t +1, given that an individual
has survived until time t +1 (Clobert et al. 1987,
Nur & Clobert 1988). In short,
return rate = survival × recapture
probability .probability.
(We use “recapture” in a broad sense to refer to
both resighting and recapture.) The justification
for analyzing return rate as a means of studying
survival is the assumption that recapture probability
is 100% or, at least, that it can be treated as a
constant. This assumption is likely to be violated
when one is comparing the sexes, or comparing
different species or even different populations.
Capture/recapture methodology analyzes both
parameters, survival and recapture probability.
In this way, survival can be estimated independently
of recapture probability and one can test for
differences in survival as well as differences in
recapture probability (Lebreton et al. 1992). It
would seem that capture/recapture methodology
provides a superior means to analyze survival and,
in theory, it does. However, there are three
drawbacks to its usage:
1. Capture/recapture methods require at least three
field seasons to estimate survival for one year,
instead of two. 2. More data are required to carry
out these analyses than with return-rate analyses,
because two parameters are being estimated instead
of one. 3. The optimal software for survival analyses
is not yet available, one that combines flexibility,
statistical power, and ease of use, without requiring
specialized instruction. In the meantime, there are
several programs available which can fill the gap (for
more detailed discussion, see Lebreton et al. 1993).
Table 10 summarizes statistical programs that are
available for analyzing capture/recapture data
(based on Lebreton et al. 1992). Below we discuss six
programs that have been widely used (SURGE,
RELEASE, MARK, SURPH, JOLLY, and
JOLLYAGE).
General Comments. Capture/recapture models
such as SURGE require at least three field seasons
(usually years) in order to estimate survival between
the first season and the second, though it is possible,
making some assumptions, to derive survival
estimates for the period between the second and
third field seasons. Thus ten field seasons would
yield estimates of survival for each of eight years,
and so on. It is strongly recommended that the
capture occasions be equally spaced and generally
speaking the programs SURGE, RELEASE,
JOLLY, and JOLLYAGE assume this. If one is
seeking to estimate annual survival, then the
capture “occasion” is the year or breeding season.
In each year, an individual is either caught or
re-sighted (scored a “1”), or not observed (scored a
“0”). This allows one to construct a capture history
for each individual (a string of 1’s an