Data Collection
For Complete Networks

Sources of Data

1. Questionnaires

There are several ways to collect questionnaire data. The main way is what I call
"row-based", which means that each questionnaire forms one row in the adjacency
matrix of the group as a whole. That is, in the adjacency matrix below, the questionnaire
administered to Andy yielded the data entered in the first row, the questionnaire
administered to Bill yielded the second row, and so on. This means that even though
analytically we usually treat the entire matrix as one thing, each row was actually obtained
from a different source, and could have its own measurement idiosyncrasies (such as bias).

        And  Bil  Car  Dan  Ele  Fra  Gar
Andy      -    1    0    1    0    0    1
Bill      1    -    1    0    1    0    0
Carol     1    1    -    1    1    0    0
Dan       1    1    1    -    0    0    0
Elena     0    0    0    0    -    1    0
Frank     0    0    0    0    1    -    0
Garth     1    1    0    0    0    0    -
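The row-based layout can be sketched in a few lines of Python. The respondents and answers below are hypothetical, chosen only to show how each completed questionnaire fills exactly one row; they are not the data in the table above.

```python
# Hypothetical questionnaire answers: each respondent lists the people
# they say they give advice to. Names and ties are illustrative only.
people = ["Ann", "Bob", "Cat"]
answers = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cat"],
    "Cat": [],
}

def build_adjacency(people, answers):
    """Return a matrix in which row i holds respondent i's own answers."""
    index = {name: k for k, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for respondent, named in answers.items():
        for alter in named:
            matrix[index[respondent]][index[alter]] = 1
    return matrix

adjacency = build_adjacency(people, answers)
for name, row in zip(people, adjacency):
    print(f"{name:4}", row)
```

Because every row comes from a different respondent, any reporting bias a respondent has stays confined to that respondent's row.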

Another way is "row and column based", where each person is asked not only
who they give advice to, but also who they receive it from. That means that for any pair, such
as Andy and Bill, we have two data points: one from Andy, and one from Bill. We then need
to employ some kind of rule to decide what number goes in the matrix: is it a one if
either of them says it is a one? Or only if both say it is a one? Or do we put the average,
so that the values range from 0 to 1/2 to 1?
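The three candidate rules can be written out as a small Python sketch. The function name and the rule labels are assumptions made for illustration; the two inputs are the 0/1 reports from each member of the pair about the same tie.

```python
def reconcile(sender_says, receiver_says, rule="average"):
    """Combine two 0/1 reports about the same tie into one matrix entry."""
    if rule == "either":    # a one if at least one of them reports the tie
        return max(sender_says, receiver_says)
    if rule == "both":      # a one only if both of them report the tie
        return min(sender_says, receiver_says)
    if rule == "average":   # 0, 0.5 or 1 depending on how far they agree
        return (sender_says + receiver_says) / 2
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical case: Andy says he advises Bill, but Bill does not confirm it.
for rule in ("either", "both", "average"):
    print(rule, reconcile(1, 0, rule))
```

The rules differ only when the two reports disagree, so the choice matters most in groups where disagreement is common.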

A third way is the "consensus method" proposed by David Krackhardt. Here,
each member of the group is asked to indicate the relationships among every pair of
persons. The result is that for any pair of persons, say Andy and Bill, we have N data
points, and again we need some kind of rule for deciding what the "right" answer
is.
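One possible rule, sketched below in Python, is to accept a tie when more than some share of the N informants report it. The threshold rule and the function name are assumptions for illustration, not necessarily the rule Krackhardt uses.

```python
def consensus_tie(reports, threshold=0.5):
    """Return 1 if more than `threshold` of the informants report the tie."""
    share = sum(reports) / len(reports)
    return 1 if share > threshold else 0

# Hypothetical reports from N = 5 group members on "Andy gives advice to Bill".
reports = [1, 1, 0, 1, 0]
majority = consensus_tie(reports)                # simple majority rule
strict = consensus_tie(reports, threshold=0.75)  # stricter rule
print(majority, strict)
```

Raising the threshold trades missed ties for fewer false ones, so the value chosen should be reported along with the results.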

2. Direct Observation

There are two basic approaches to direct observation. One is to plant an observer in a
room to record all interactions that take place in front of the observer. The other is
the time allocation method, used in ethology, where the observer shows up at various
places at random times and records who is doing what to whom over a very short interval.

3. Written Records

Examples include:

intracampus mail; memos; email

trade between countries: manufactured goods, raw materials

political interactions between countries (New York Times data): antagonistic

migration records

joint ventures/mergers among companies

interlocking directorates

historical marriage records among 15th-century Italian families

4. Experiments

Several studies in the past have planted rumors in schools and observed the spread over
time. Stanley Milgram, in a series of studies, examined how many links were required to
connect any two randomly chosen persons in the US. These studies were the basis for the
play "Six Degrees of Separation" (later a movie starring Will Smith) and the
Kevin Bacon Game.

5. Derivation

Given actor-by-event or actor-by-group data, we can always construct an actor-by-actor
matrix by counting the number of events/groups that each pair of actors has in common. For
example, Davis, Gardner and Gardner (1941) looked through the newspaper society pages and
recorded which women were reported to have attended which society event:

            1  2  3  4  5  6  7  8  9 10 11 12 13 14
EVELYN      1  1  1  1  1  1  0  1  1  0  0  0  0  0
LAURA       1  1  1  0  1  1  1  1  0  0  0  0  0  0
THERESA     0  1  1  1  1  1  1  1  1  0  0  0  0  0
BRENDA      1  0  1  1  1  1  1  1  0  0  0  0  0  0
CHARLOTTE   0  0  1  1  1  0  1  0  0  0  0  0  0  0
FRANCES     0  0  1  0  1  1  0  1  0  0  0  0  0  0
ELEANOR     0  0  0  0  1  1  1  1  0  0  0  0  0  0
PEARL       0  0  0  0  0  1  0  1  1  0  0  0  0  0
RUTH        0  0  0  0  1  0  1  1  1  0  0  0  0  0
VERNE       0  0  0  0  0  0  1  1  1  0  0  1  0  0
MYRNA       0  0  0  0  0  0  0  1  1  1  0  1  0  0
KATHERINE   0  0  0  0  0  0  0  1  1  1  0  1  1  1
SYLVIA      0  0  0  0  0  0  1  1  1  1  0  1  1  1
NORA        0  0  0  0  0  1  1  0  1  1  1  1  1  1
HELEN       0  0  0  0  0  0  1  1  0  1  1  1  1  1
DOROTHY     0  0  0  0  0  0  0  1  1  1  0  1  0  0
OLIVIA      0  0  0  0  0  0  0  0  1  0  1  0  0  0
FLORA       0  0  0  0  0  0  0  0  1  0  1  0  0  0

In the matrix, the rows are the women, and the columns are the events they may have
attended.

We can construct a woman-by-woman matrix by multiplying the matrix by its transpose
(Y = XX'). The result is a matrix in which the ijth cell records the number of events that
woman i and woman j attended in common, and each diagonal cell records the number of
events that woman attended.

           EVE LAU THE BRE CHA FRA ELE PEA RUT VER MYR KAT SYL NOR HEL DOR OLI FLO
EVELYN       8   6   7   6   3   4   3   3   3   2   2   2   2   2   1   2   1   1
LAURA        6   7   6   6   3   4   4   2   3   2   1   1   2   2   2   1   0   0
THERESA      7   6   8   6   4   4   4   3   4   3   2   2   3   3   2   2   1   1
BRENDA       6   6   6   7   4   4   4   2   3   2   1   1   2   2   2   1   0   0
CHARLOTTE    3   3   4   4   4   2   2   0   2   1   0   0   1   1   1   0   0   0
FRANCES      4   4   4   4   2   4   3   2   2   1   1   1   1   1   1   1   0   0
ELEANOR      3   4   4   4   2   3   4   2   3   2   1   1   2   2   2   1   0   0
PEARL        3   2   3   2   0   2   2   3   2   2   2   2   2   2   1   2   1   1
RUTH         3   3   4   3   2   2   3   2   4   3   2   2   3   2   2   2   1   1
VERNE        2   2   3   2   1   1   2   2   3   4   3   3   4   3   3   3   1   1
MYRNA        2   1   2   1   0   1   1   2   2   3   4   4   4   3   3   4   1   1
KATHERINE    2   1   2   1   0   1   1   2   2   3   4   6   6   5   5   4   1   1
SYLVIA       2   2   3   2   1   1   2   2   3   4   4   6   7   6   6   4   1   1
NORA         2   2   3   2   1   1   2   2   2   3   3   5   6   8   6   3   2   2
HELEN        1   2   2   2   1   1   2   1   2   3   3   5   6   6   7   3   1   1
DOROTHY      2   1   2   1   0   1   1   2   2   3   4   4   4   3   3   4   1   1
OLIVIA       1   0   1   0   0   0   0   1   1   1   1   1   1   2   1   1   2   2
FLORA        1   0   1   0   0   0   0   1   1   1   1   1   1   2   1   1   2   2
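The product Y = XX' can be checked by hand on a small piece of the data. The Python sketch below uses only the first four rows of the women-by-events matrix; the function name is an assumption, and a matrix library would do the same job in one call.

```python
# First four rows of the women-by-events matrix (14 events each).
names = ["EVELYN", "LAURA", "THERESA", "BRENDA"]
X = [
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
]

def times_transpose(X):
    """Compute Y = XX': Y[i][j] is the dot product of rows i and j of X."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in X]
            for row_i in X]

Y = times_transpose(X)
for name, row in zip(names, Y):
    print(f"{name:8}", row)
```

Since the entries of X are 0 or 1, each product a * b is 1 only when both women attended that event, so the sum is a count of shared events.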

A similar process can be used to convert a set of coordinates (e.g. latitude and
longitude for each US city) into a matrix of distances between all pairs of points.
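As a rough sketch of the coordinate case, the Python below builds a distance matrix from latitude and longitude using the haversine great-circle formula. The choice of formula, the mean Earth radius of 6371 km, and the approximate city coordinates are all assumptions made for illustration.

```python
import math

def great_circle_km(p, q, radius_km=6371.0):
    """Haversine distance between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def distance_matrix(points):
    """Return the matrix of distances between all pairs of points."""
    return [[great_circle_km(p, q) for q in points] for p in points]

# Approximate coordinates, for illustration only.
cities = {
    "Boston": (42.36, -71.06),
    "Chicago": (41.88, -87.63),
    "Miami": (25.76, -80.19),
}
D = distance_matrix(list(cities.values()))
for name, row in zip(cities, D):
    print(f"{name:8}", [round(d) for d in row])
```

Unlike the co-attendance matrix, larger values here mean the pair is farther apart, so the result is a dissimilarity matrix rather than a similarity matrix.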

The same process is used to convert monadic attribute data, such as sex, into dyadic
attribute data, such as "is the same sex as". For example:

         M  F  F  M  F
Male     1  0  0  1  0
Female   0  1  1  0  1
Female   0  1  1  0  1
Male     1  0  0  1  0
Female   0  1  1  0  1
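The same-sex matrix above can be reproduced with a short Python sketch. The variable names are assumptions; the attribute values are the five shown in the table.

```python
sex = ["M", "F", "F", "M", "F"]   # the monadic attribute, one value per person

# Cell (i, j) is 1 when persons i and j have the same value, 0 otherwise.
same_sex = [[1 if a == b else 0 for b in sex] for a in sex]
for a, row in zip(sex, same_sex):
    print(a, row)
```

Any categorical attribute can be converted this way, since the comparison only asks whether two values match.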

Informant Accuracy

Can people really tell you about their social networks? Marketing researchers have
found that consumers can barely tell you what they had for lunch yesterday. Bernard,
Killworth and Sailer investigated informant accuracy systematically and found that about
52% of what informants reported was wrong.

Based on the work of Freeman, Freeman and Romney, as well as D'Andrade, DeSoto, and many
others, it appears that people's recall of their interactions with others is
systematically biased toward what is normal and/or logical. At least this is better than
being randomly wrong.

People also tend to remember interactions with people who are important, while
forgetting interactions with people who are not.

Some respondents will lie to make themselves look good, since people judge others on
who they associate with.

As with any questionnaire, there are also problems with how people interpret the
questions. What "friend" means to one person may be very different from what
"friend" means to others.

Krackhardt's solution to all this is to get everyone's opinion of everyone's
relationship with everyone. That way, if a person claims to be friends with everyone, but
everyone else agrees that they are friends with no one, we have a clue that they might be
lying or misunderstanding the question.
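A check of this kind might look like the Python sketch below, which flags friendships a person claims for themselves that few other informants confirm. The data layout, the function name, and the 0.5 support cutoff are assumptions for illustration, not Krackhardt's own procedure.

```python
def suspicious_claims(perceptions, person, min_support=0.5):
    """List ties that `person` claims but that few other informants confirm.

    perceptions[k][(i, j)] is 1 if informant k believes i is friends with j.
    """
    others = [k for k in perceptions if k != person]
    flagged = []
    for (i, j), claimed in perceptions[person].items():
        if i != person or not claimed:
            continue
        support = sum(perceptions[k].get((i, j), 0) for k in others) / len(others)
        if support < min_support:
            flagged.append((j, support))
    return flagged

# Hypothetical data: A claims both B and C as friends; B and C confirm only A-B.
perceptions = {
    "A": {("A", "B"): 1, ("A", "C"): 1},
    "B": {("A", "B"): 1, ("A", "C"): 0},
    "C": {("A", "B"): 1, ("A", "C"): 0},
}
flagged = suspicious_claims(perceptions, "A")
print(flagged)
```

A flagged tie is only a clue: it cannot tell us whether the person is lying, misread the question, or knows something the others do not.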

Bounding a Network

There are two basic approaches to bounding a network: emic and etic. The emic or
natural or "realist" approach hopes that natural boundaries exist. So if we ask
each drug user who they share needles with, eventually we reach people who do not share
needles with anyone who has not yet been named. The emic approach relies on relational
criteria to determine who is in or out of the network.
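The needle-sharing example amounts to following named ties outward until no new names appear. A minimal Python sketch, with hypothetical respondents and nominations:

```python
from collections import deque

def snowball(seeds, names_given):
    """Follow named ties outward from the seeds until no new names appear."""
    members = set(seeds)
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        for alter in names_given.get(person, []):
            if alter not in members:
                members.add(alter)
                queue.append(alter)
    return members

# Hypothetical nominations: who each respondent says they share needles with.
names_given = {
    "P1": ["P2", "P3"],
    "P2": ["P1"],
    "P3": ["P4"],
    "P4": ["P3"],
    "P5": ["P6"],
}
network = snowball(["P1"], names_given)
print(sorted(network))
```

Here P5 and P6 are never reached from P1, so they fall outside the boundary; the boundary found this way depends on which seeds the study starts from.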

The etic approach imposes arbitrary boundaries based on the needs of the researcher.
For example, the researcher might choose to look at the social networks of the children in
the 2nd grade at a certain school. It is understood that the children have ties outside of
that group, but for the purposes of the study, these are ignored. The etic approach relies
on attributes of the nodes to determine who is in or out of the network.

Sampling

Can you use a sampling method to study complete networks? In general, the answer is no.
However, certain kinds of hypotheses can be tested with sample data. For example, it is
possible to estimate the density of a network by looking at ties among a sample of nodes.
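The density case can be illustrated with a simulated network. In the Python sketch below, the network size, the tie probability of 0.1, the sample size of 40, and the random seed are all assumptions chosen for the demonstration; density is computed as the share of possible directed ties that are present.

```python
import random

def density(adjacency, nodes):
    """Share of possible directed ties among `nodes` that are present."""
    n = len(nodes)
    ties = sum(adjacency[i][j] for i in nodes for j in nodes if i != j)
    return ties / (n * (n - 1))

# Hypothetical directed network: 200 nodes, each tie present with probability 0.1.
rng = random.Random(0)
N = 200
adjacency = [[1 if i != j and rng.random() < 0.1 else 0 for j in range(N)]
             for i in range(N)]

sample = rng.sample(range(N), 40)            # nodes drawn for the sample
full_density = density(adjacency, range(N))
sample_density = density(adjacency, sample)
print(f"full network density:  {full_density:.3f}")
print(f"sample-based estimate: {sample_density:.3f}")
```

The estimate uses only ties among the sampled nodes, so each sampled person has to be asked about every other sampled person.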