The Conceptual Level

1. Think of the philosophical/conceptual shift connected to thinking graphically: a new form of abstract thinking. Especially time-series, or putting data on maps.

2. This leads to a paradoxical quality of graphic thinking: on the one hand, a graph should be transparent enough that the observer sees data and not design (-> data variation, not design variation). YET the form of the graphic itself shapes the structure of perception: the assumption that there is a relationship between time and a variable, or between different variables, or between space and time, etc. (relate to paradigms).

4. Inductive vs. deductive: how much to demonstrate a specific point with the graphic vs. letting the reader/viewer draw their own conclusions and see their own patterns. NOTE: purely inductive data presentation seems impossible: all graphics involve choices over what data to present and what not to present.

In this digital age, the issue of information content is often seen as one of data storage. That is, we emphasize the amount of digital space needed to store data: e.g., this Netscape Communicator file (consisting mostly of text) is about 48,000 bytes. (Had there been more visuals, the file size would be MUCH larger.)

pixel = pix (plural of pic, or picture) + element: the small discrete elements that make up an image

________________________________________

There is the old saying:

"a picture tells a thousand words."

But there is a difference between data storage and effective content.

A photo in digital form (a 4x6 inch photo scanned at 250 dpi -- dots per inch) may require 6 megabytes of storage, which is 6,000,000 bytes or 48,000,000 bits (that is, 48 million 0/1 binary digits to represent a simple snapshot -- and still at a lower visual quality than the standard drugstore photo print). Typical digital cameras (as of early 2002) record images of 1-2 MB, while the better ones record 4-5 MB. Standard 35mm slide film is generally still more detailed (but digital is catching up).

Therefore: a picture may tell a thousand words, but require 6 million bytes (6 megabytes) of digital storage, while 1,000 words may require just about 6,000 bytes. In other words, one digital picture requires as much storage space as 1,000 words x 1,000 = 1,000,000 words (which is equal to about 10 books!).
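As a rough check on this arithmetic, here is a quick sketch; the 3-bytes-per-pixel and 6-bytes-per-word figures are illustrative assumptions, not from the notes above:

```python
# Rough storage comparison: a scanned photo vs. 1,000 words of text.
# Assumptions (illustrative): 3 bytes per pixel (24-bit color),
# ~6 bytes per word of plain text (including spaces).

dpi = 250
width_in, height_in = 4, 6
pixels = (width_in * dpi) * (height_in * dpi)   # 1,500,000 pixels
photo_bytes = pixels * 3                        # ~4.5 MB uncompressed

words = 1000
text_bytes = words * 6                          # ~6,000 bytes

print(f"photo: {photo_bytes:,} bytes")
print(f"text:  {text_bytes:,} bytes")
print(f"ratio: {photo_bytes // text_bytes}x")
```

Even with these conservative assumptions, the uncompressed photo is hundreds of times larger than the text it is "worth."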

Another example: a color pie chart, generated by Excel, depicting the percentage of men vs. women in planning, contains just a single data point. Yet the pie chart image itself, stored digitally, might require 6,000 bytes, which is 48,000 bits (1 or 0 elements).

________________________________________

This is an illustration of how contemporary computer software uses an ENORMOUS amount of storage space to provide all the visual aspects that one sees on the screen (the graphical user interface). Early personal computers were economical in their use of space: the first MAC in 1984 had 128K of memory, no hard disk, and a 400K floppy drive to run the software and operating system. By contrast, a current laptop MAC (build year circa 2006) has 1 GB of memory and an 80 gigabyte (80,000,000 Kbyte) hard drive.

This explosion in memory has allowed for
a far greater gap between data storage size and effective content. One
might not worry about this, since memory is so cheap and abundant. But
it has arguably led to a cluttered computer screen, a loss of the programmer's
former elegant parsimonious use of memory, and an emphasis on facade more
than on content and communication.

________________________________________

Why the discrepancy between data storage size
and effective content?

redundancy of information: e.g., it may take 2 megabytes simply to store a uniform blue-sky background in a photo.

the human eye can't process all that stored detail (or differentiate between the millions of different color possibilities for each pixel).

there is thus a difference between latent information and usable (or effective) information.

So why digitize images if they are so data-intensive and of lower visual quality? Because this is the digital age: digitization allows images to be standardized, manipulated, and transmitted in ways traditional images cannot. Text, data, graphs, photographs, drawings, sound, etc., can all be stored and transmitted in a single, standardized format (e.g., CD-ROM, modem lines, etc.).

An Example: compare two versions of the same image:

traditional: an 8x10 inch color photograph made from a 35 mm negative (silver-based film processed in a darkroom)

digital: a digital image (e.g., taken with a digital camera, or a scanned photograph, or a scanned slide transparency)

Storage

traditional: the image can be stored as a negative film strip or as a print; this "storage" is an inexpensive technology.

digital: stored digitally, and thus treated the same as text, sound, etc. (e.g., ISDN); high-quality images carry a high data storage requirement.

Image quality

traditional: potentially quite high (depending on the quality of the camera optics, the film, the paper, and the processing); easy to increase or decrease the size of the image (through magnification of the enlarger image).

digital: image quality is not as high, though getting better.

Modification of the single image

traditional: hard to modify the image (except through "dodging" and other darkroom techniques).

digital: much easier and with far more possibilities (e.g., with Photoshop software).

Combination of multiple images

traditional: not easy: either through double-exposure techniques or collage cut-and-paste.

digital: much easier and with far more possibilities.

Transference of the image

traditional: the photo can be mailed, or sent by wire or fax after first being converted to dots (with loss of quality).

digital: quite easy (as easy as any other form of digital information).

Copying the image

traditional: each subsequent copy leads to a reduction in quality from the original.

digital: one can make an identical copy.

That said, I envision a future technological era NOT defined digitally (binary), but one in which data is stored and processed as a hologram, or neurologically (biologically), etc.

8. In data presentation there
is arguably a hierarchy of functions:

1. to first store
data

2. then perform basic
arithmetic (sums, averages, etc.)

3. then to show univariate
patterns in the data

4. then to reveal
patterns between two or more variables (e.g., correlation) -- and to show that
these relationships are statistically significant (that is, the patterns in
the sample data reflect patterns in the population as a whole).

5. then to understand
causal relationships

6. to recognize the
difference between relationships that can be changed and those that can't (policy
evaluation)

7. Finally, to relate
to the larger context of the world outside the data set.
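The first four levels of this hierarchy can be sketched in a few lines of Python; the two-variable city data below is invented purely for illustration:

```python
import math
import statistics

# 1. Store the data: two variables for five hypothetical cities
#    (invented numbers, purely for illustration).
pop =   [500, 1200, 800, 300, 2000]   # population (thousands)
crime = [4.1,  7.9, 5.8, 3.2, 11.5]   # crime rate per 100,000

# 2. Basic arithmetic: sums, averages.
avg_crime = statistics.mean(crime)

# 3. Univariate pattern: the spread of a single variable.
crime_sd = statistics.stdev(crime)

# 4. Bivariate pattern: Pearson correlation between the two variables.
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

r = pearson(pop, crime)

print(f"mean = {avg_crime:.2f}, sd = {crime_sd:.2f}, r = {r:.2f}")
# Levels 5-7 (causation, policy, wider context) require judgment beyond computation.
```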

Relate to Kant: time and space as categories of the mind, the first way we classify sensation (as paraphrased by Durant).

All are forms of representation, with advantages and drawbacks. Don't automatically graph everything: a shortcoming of EXCEL and Lotus is the ease of graphing. Create a graph because it communicates something substantial and meaningful that the other formats cannot.

GOAL: give the viewer the greatest number
of ideas, in the shortest time, with the least amount of ink, in the smallest
space.

graph: lots of data, to be compared, multivariate;
little text/labels.

4. complexity vs. simplicity: how much information does the graph include? How much does the reader readily pick up? What is just chart-junk? (This is Tufte's INK/INFORMATION RATIO.) Or better:

data-ink ratio = data ink / total ink     (range is 0 -> 1)

5. Is there ordering in the data (nominal, ordinal, interval)? If so, have ordering in the graphic design (e.g., ordered shades of gray, or ordered brightness of colors).

[legend: the five classes 0-20%, 20-40%, 40-60%, 60-80%, 80-100% shown as an ordered ramp of gray shades]

works better than ...

[the same five classes shown in five unrelated colors]

or at least use brightness within a color:

[the same five classes shown in increasing brightness of a single color]

Why? Because brightness has an order, but color does not (or at least color has multiple dimensions, which can be confusing).
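The principle -- map an ordered variable to an ordered visual channel -- can be sketched by generating an ordered gray ramp for the five classes (the particular gray values are an illustrative assumption, not from the notes):

```python
# Map five ordered percentage classes to ordered gray levels (light -> dark).
# Brightness carries the ordering; an arbitrary color assignment would not.

classes = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]

def gray_for(i, n=5, lightest=230, darkest=30):
    """Evenly spaced gray value (as a hex color) for class i of n; i=0 is lightest."""
    level = lightest - i * (lightest - darkest) // (n - 1)
    return f"#{level:02x}{level:02x}{level:02x}"

ramp = [gray_for(i) for i in range(len(classes))]
for label, color in zip(classes, ramp):
    print(f"{label:>8}  {color}")
```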

6. close and far: the first overall look and the
second in depth look (graphs should encourage both)

7. Data density: the eye can pick up fine details; most graphs waste this ability to process fine details (because they often have so little information in them). E.g., a bar chart of 3 cases and 1 variable has a low density of data. (And why have a chart at all? For decoration and emphasis?) TUFTE is interested more in representing complex, relational data. Remember: graphics can be shrunk way down in size, and the eye can still comprehend them.

Low density can be well less than 1 data entry per square inch; high density can reach 100s-1000s per square inch. Maps can handle higher density, since arguably (1) the reader can easily relate spatial data side-by-side, and (2) a map requires little labeling, since one assumes that the reader can interpret a map without labels. (This may be a potential virtue of GIS: geo-coded and spatially displayed data.)

10. Are there times to use BOTH a table and a chart? Consider the value and problems of overlapping and redundancy. When unsure, ask the question: is a chart necessary? What does it provide that a table or text does not (not deductively, but actually in this case)? Or are you just doing one to fill space, and because your computer program can do one? Often just a good simple table and text will do. ALSO: graphs play a different role in a magazine or newspaper (grabbing attention) than in a paper or book.

11. Finally, there is the current challenge of getting computer software to follow the rules of Tufte. Sometimes you may need to import your half-finished graph into a paint or draw program. And: there is nothing wrong with hand-drawn visuals!

General Guidelines on Designing
Good Graphs

(based on reading student assignments from past
years)

1. Be sure to use a full title for the graphic
(variables, dates, locations, units of analysis). I.e., rather than "Crime and
Infant Mortality," use "Crime Rate per 100,000 Population (1991) and Infant
Death Rate per 1,000 Live Births (1988) in the Largest 40 U.S. Cities". If you
choose to use a shorter title, be sure that somewhere the variables are fully
defined.

2. List the source of the data (just as you would for a data table). Anticipate that some readers may simply photocopy your chart rather than your whole article or dissertation; the graph should be somewhat self-standing. (Include a descriptive caption at the bottom if useful.)

3. Explain and label missing data. Be sure that
the reader knows the difference between a missing value and a zero-value (if
you are not careful, statistical software will treat these two as the same).
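As an illustration of why the distinction matters (with invented numbers), treating a missing value as zero silently changes a summary statistic:

```python
# Infant mortality rates for four hypothetical cities; one value is missing.
rates = {"City A": 7.2, "City B": 9.1, "City C": None, "City D": 6.3}

# Correct: drop the missing value, then average the rest.
known = [v for v in rates.values() if v is not None]
mean_dropped = sum(known) / len(known)        # average of 3 known values

# Wrong: treating missing as zero drags the average down.
as_zero = [v if v is not None else 0 for v in rates.values()]
mean_zeroed = sum(as_zero) / len(as_zero)     # average over all 4, with a fake zero

print(f"dropped: {mean_dropped:.2f}  zeroed: {mean_zeroed:.2f}")
```

The two answers differ by almost two deaths per 1,000 live births -- entirely an artifact of how the missing value was coded.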

4. Order the chart in some useful way. And if the chart has an ordering to it, be sure to state this (e.g., cities ranked by population size). Alphabetical ordering is not always the best; try instead ordering based on some relevant variable (often simply the variable displayed).
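A minimal sketch of the two orderings, using invented category values:

```python
# Hypothetical values for a univariate bar chart.
data = {"Boston": 12, "Atlanta": 31, "Chicago": 8, "Denver": 24}

# Default: alphabetical order of the category labels.
alphabetical = sorted(data.items())

# Often more revealing: order by the displayed value, largest first.
by_value = sorted(data.items(), key=lambda kv: kv[1], reverse=True)

print("alphabetical:", [k for k, _ in alphabetical])
print("by value:    ", [k for k, _ in by_value])
```

The value-based ordering lets the reader see the ranking at a glance instead of reconstructing it bar by bar.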

5. If you use a subset of the cases, be sure to
explain the logic of the selection (e.g., among the 10 largest U.S. Cities).

6. Label the x and y axes.

7. Use a legend or labels to define variables
in a multivariate bar or column chart. You do not need a legend for a
univariate chart.

8. Often an x-y scatterplot is preferable to a
bar chart (or column chart) with two variables. Scatterplots use less ink, and
they usually reveal bivariate relationships (i.e., the relationship between
x and y) far better than bar or column charts.

Here is the same bivariate data displayed two ways:

9. It is fine to do a regression analysis, but
be sure to explain your results.

10. Do not add the Hispanic population
with other racial categories (black, Asian, etc.), since the U.S. Census states
that "persons of Hispanic origin may be of any race."

12. Avoid non-white backgrounds in your charts. They can be harder to read, especially if photocopied.

13. Avoid column charts with too many data points: the columns become too narrow (and the labels too small, or some not showing) to read easily. (This also applies to bar charts.) This problem literally multiplies with multiple variables displayed on one chart. Above about 10-15 data points (e.g., columns), I would consider an alternative format (such as a scatterplot, a table, grouped data, etc.). Or use several charts, side-by-side, with the same format (e.g., one for each variable). [See Tufte on the use of "small multiples".] An example of a problematic chart is below:

Note how hard it is to see patterns in the data (with 3 variables and 16 cases). The gray background is distracting, too. Best to avoid this type of chart. (Remember: just because Excel can create a chart from your data doesn't mean that it is necessarily a good format for the data.)

14. Overall: show the data; have the viewer think about the patterns in the data, not the graphic design; avoid distortion; encourage the eye to compare data; clearly label the graph.

Problems
of Percentages:

1. How to determine the denominator: think of a survey result -- what to do with nonresponses, etc.

2. Also, "the percentage effect": a percentage may go down when the absolute number goes up. How do we interpret this? For example, if a city's poor population grows from 100,000 to 110,000 while its total population grows from 1,000,000 to 1,300,000, the absolute number of poor rises but the percentage falls from 10% to about 8.5%. The right interpretation depends on whether the actual phenomenon is better explained by the absolute number or the percentage.

Growth rates example: Berlin's population from 1900 to 1930

1900     2,712,190
1905     3,226,049
1910     3,734,258
1919     3,804,048
1920     3,879,409
1925     4,024,286
1930     4,332,834

1. average annual growth rate (assumes linear growth)

AAGR = [(Pop_1930 - Pop_1900) / Pop_1900] / 30 = +2.0% / year

2. compound annual growth rate (assumes geometric growth)

CAGR = (Pop_1930 / Pop_1900)^(1/30) - 1 = +1.6% / year

3. continuously compounded growth rate (assumes exponential growth)

CCGR = ln(Pop_1930 / Pop_1900) / 30 = +1.56% / year

When to use which? Well, it depends on your theory. Does growth depend on the original base (linear growth) or on the compounded base (geometric or exponential growth)? Note that compounding annually and compounding continuously lead to fairly similar answers.
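As a check, the three formulas applied to the Berlin figures in a short Python sketch:

```python
import math

# Berlin population, from the table above.
pop_1900, pop_1930 = 2_712_190, 4_332_834
years = 30

# 1. Average annual growth rate (linear growth on the original base).
aagr = ((pop_1930 - pop_1900) / pop_1900) / years

# 2. Compound annual growth rate (geometric growth, compounded yearly).
cagr = (pop_1930 / pop_1900) ** (1 / years) - 1

# 3. Continuously compounded growth rate (exponential growth).
ccgr = math.log(pop_1930 / pop_1900) / years

print(f"AAGR: {aagr:.1%}  CAGR: {cagr:.1%}  CCGR: {ccgr:.2%}")
```

The linear rate is highest because it spreads the whole 30-year increase over the small original base; the two compounded rates are nearly equal, as noted above.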