This chapter provides a basic overview of the discipline of statistics, including common terms, fundamental concepts, and a bit of application. Included are some review questions to help you practice your new knowledge.

Every day, you encounter numerical information that describes or analyzes some aspect of the world you live in. For example,
here are some news items that appeared in the pages of The New York Times during a one-month period:

Between 1969 and 2001, the rate of forearm fractures rose 52% for girls and 32% for boys, with the largest increases among
children in early puberty, according to a recent Mayo Clinic study.

Across the New York metropolitan area, the median sales price of a single-family home has risen by 75% since 1998, an increase
of more than $140,000.

A study that explored the relationship between the price of a book and the number of copies of a book sold found that raising
prices by 1% reduced sales by 4% at BN.com, but reduced sales by only 0.5% at Amazon.com.

Stories such as these would be impossible to understand without statistics, the branch of mathematics that consists of methods of processing and analyzing data to better support rational decision-making
processes. Using statistics to better understand the world means more than just producing a new set of numerical information; you
must interpret the results by reflecting on their significance and importance to the decision-making process you face.
Interpretation also means knowing when to ignore results, either because they are misleading, are produced by incorrect methods,
or just restate the obvious, as this news story "reported" by the comedian David Letterman illustrates:

USA Today has come out with a new survey. Apparently, 3 out of every 4 people make up 75% of the population.

As newer technologies allow people to process and analyze ever-increasing amounts of data, statistics plays an increasingly
important part in many decision-making processes today. Reading this chapter will help you understand the fundamentals of
statistics and introduce you to concepts that are used throughout this book.

1.1 The Five Basic Words of Statistics

The five words population, sample, parameter, statistic (singular), and variable form the basic vocabulary of statistics. You cannot learn much about statistics unless you first learn the meanings of these
five words.

Population

CONCEPT All the members of a group about which you want to draw a conclusion.

EXAMPLES All U.S. citizens who are currently registered to vote, all patients treated at a particular hospital last year, the entire
daily output of a cereal factory's production line.

Sample

CONCEPT The part of the population selected for analysis.

EXAMPLES The registered voters selected to participate in a recent survey concerning their intention to vote in the next election,
the patients selected to fill out a patient-satisfaction questionnaire, 100 boxes of cereal selected from a factory's production
line.

Parameter

CONCEPT A numerical measure that describes a characteristic of a population.

EXAMPLES The percentage of all registered voters who intend to vote in the next election, the percentage of all patients who are very
satisfied with the care they received, the average weight of all the cereal boxes produced on a factory's production line
on a particular day.

Statistic

CONCEPT A numerical measure that describes a characteristic of a sample.

EXAMPLES The percentage in a sample of registered voters who intend to vote in the next election, the percentage in a sample of patients
who are very satisfied with the care they received, the average weight of a sample of cereal boxes produced on a factory's
production line on a particular day.

INTERPRETATION Calculating statistics for a sample is the most common activity in statistical analysis, because collecting data from an entire population is impractical in most
actual decision-making situations.
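
The relationship between a parameter and a statistic can be sketched in a few lines of Python. This is a minimal illustration rather than a method from this chapter: the population of cereal box weights is simulated, and the target weight of 368 grams, the spread of 15 grams, the population size of 10,000, and the random seed are all assumed values chosen only for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable (arbitrary choice)

# Hypothetical population: the weight in grams of every box produced in one day.
# The figures 368, 15, and 10_000 are assumptions made for illustration only.
population = [random.gauss(368, 15) for _ in range(10_000)]

# Sample: 100 boxes selected at random from the population.
sample = random.sample(population, 100)

# Parameter: a numerical measure that describes the whole population.
parameter = statistics.mean(population)

# Statistic: the same measure computed from the sample only.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.1f} g")
print(f"statistic (sample mean):     {statistic:.1f} g")
```

In practice only the sample would be weighed. The statistic is then used to draw a conclusion about the parameter, which usually cannot be observed directly.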

Variable

CONCEPT A characteristic of an item or an individual that will be analyzed using statistics.

EXAMPLES Gender, the household income of the citizens who voted in the last presidential election, the publishing category (hardcover,
trade paperback, mass-market paperback, textbook) of a book, the number of varieties of a brand of cereal.

INTERPRETATION All the variables taken together form the data of an analysis. Although you may have heard people saying that they are analyzing
their data, they are, more precisely, analyzing their variables.

You should distinguish between a variable, such as gender, and its value for an individual, such as male. An observation is all the values for an individual item in the sample. For example, a survey might contain two variables, gender and age.
The first observation might be male, 40. The second observation might be female, 45. The third observation might be female,
55. A variable is sometimes known as a column of data because of the convention of entering each observation as a unique row in a table
of data. (Likewise, you may hear some refer to an observation as a row of data.)
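
The row-and-column convention can be sketched in Python using the three observations from the survey example above. The names `observations` and `column` are choices made for this illustration.

```python
# Each observation is one row: all the values for one individual.
observations = [
    {"gender": "male", "age": 40},
    {"gender": "female", "age": 45},
    {"gender": "female", "age": 55},
]


def column(rows, variable):
    """Return the values of one variable (one column), in row order."""
    return [row[variable] for row in rows]


# A variable is one column: the same characteristic across all observations.
genders = column(observations, "gender")
ages = column(observations, "age")

print(genders)  # ['male', 'female', 'female']
print(ages)     # [40, 45, 55]
```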

Variables can be divided into the following types:

Categorical Variables

CONCEPT The values of these variables are selected from an established list of categories.

SUBTYPES None.

EXAMPLES Gender, a variable that has the categories male and female. Academic major, a variable that might have the categories English, Math, Science, and History, among others.

Numerical Variables

CONCEPT The values of these variables involve a counted or measured value.

SUBTYPES Discrete values are counts of things. Continuous values are measures, and any value can theoretically occur, limited only by the precision of the measuring process.

EXAMPLES The number of previous presidential elections in which a citizen voted, a discrete numerical variable. The household income of a citizen who voted, a continuous variable.
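
The difference between categorical, discrete, and continuous variables can be sketched in Python. This is an illustration under assumptions: all the values are invented, and representing counts as `int` and measurements as `float` is a convention of the example, not a rule of statistics.

```python
from collections import Counter
import statistics

# Categorical: every value must come from an established list of categories.
MAJOR_CATEGORIES = {"English", "Math", "Science", "History"}
majors = ["Math", "History", "Math", "English"]  # invented values
assert all(m in MAJOR_CATEGORIES for m in majors)

# Discrete numerical: counts of things, so whole numbers.
elections_voted = [0, 3, 5, 3]  # invented values

# Continuous numerical: measurements, limited only by measuring precision.
household_income = [41250.75, 88000.00, 57310.40, 62999.99]  # invented values

# Categorical values are summarized by counting how often each category occurs.
major_counts = Counter(majors)
print(major_counts["Math"])  # 2

# Numerical values, counted or measured, support arithmetic such as a mean.
print(statistics.mean(elections_voted))  # 2.75
print(statistics.mean(household_income))
```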

All variables should have an operational definition, that is, a universally accepted meaning that is clear to all associated with an analysis. Without operational definitions,
confusion can occur. A famous example of such confusion was the tallying of votes in Florida during the 2000 U.S. presidential
election in which, at various times, nine different definitions of a valid ballot were used. (A later analysis [1]
determined that three of these definitions, including one pursued by Al Gore, led to margins of victory for George Bush that
ranged from 225 to 493 votes and that the six others, including one pursued by George Bush, led to margins of victory for
Al Gore that ranged from 42 to 171 votes.)
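
The effect of an operational definition on a count can be sketched in Python. Everything in this example is hypothetical: the ballot records, the `mark` field, and the two definitions of a valid ballot are invented for illustration and do not reproduce any of the nine definitions or any figures from the Florida analysis.

```python
# Hypothetical ballot records; "mark" describes how clearly the choice was marked.
ballots = [
    {"choice": "A", "mark": "full"},
    {"choice": "B", "mark": "full"},
    {"choice": "B", "mark": "partial"},
    {"choice": "A", "mark": "partial"},
    {"choice": "A", "mark": "partial"},
]


# Two invented operational definitions of a "valid ballot".
def strict_definition(ballot):
    return ballot["mark"] == "full"


def lenient_definition(ballot):
    return ballot["mark"] in ("full", "partial")


def tally(ballots, is_valid):
    """Count votes per choice, keeping only ballots the definition accepts."""
    counts = {}
    for ballot in ballots:
        if is_valid(ballot):
            counts[ballot["choice"]] = counts.get(ballot["choice"], 0) + 1
    return counts


print(tally(ballots, strict_definition))   # {'A': 1, 'B': 1}
print(tally(ballots, lenient_definition))  # {'A': 3, 'B': 2}
```

The same ballots produce a tie under one definition and a winner under the other, which is why the definition must be fixed and agreed on before the analysis begins.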