In a lower-level general education history course I am teaching this
semester at ASU, I encourage students to make a small
handwritten study sheet (a.k.a. "crib sheet" or "cheat sheet") to use
in class while taking each exam, to jog their memories, help them
better marshal specific evidence to use in their answers, and reduce
exam anxiety.1 I also hoped the process of preparing the
handwritten sheet would be a valuable study exercise, encouraging them
to comprehensively scan the material as they looked for facts to
include, and reinforcing key concepts by writing them on the
sheet.2

When the day of the midterm exam arrived, I was surprised to see that
not all students had made study sheets. In fact, only 15 out of 24
students (62.5%) made them. I collected the sheets with the exams,
and later, when recording my grades, I noted whether or not each
student had made a study sheet, so I could see if creating a sheet
made a difference in their performance on the midterm.

sheet      students   mean (of 40)   std dev   mean (pct)
With          15         32.93         2.62      82.3%
Without        9         28.50         6.47      71.3%

Perhaps unsurprisingly, each group showed a range of outcomes. In
general, those who made the sheets did better on the exam than those
who did not -- the mean score was about 11 percentage points higher
for those with study sheets. No study sheet user received a grade
below a mid-C. For the students without study sheets, the range was
greater, spanning from failing grades all the way up to a couple of
As. (The exam was worth 40 points; the final table column expresses
the mean scores as percentages.)

This is of course a very small set of results, doubtlessly showing the
influence of more factors than the single issue of having a crib sheet
or not which I've examined here. It seems quite possible, for
instance, that some students who didn't make a study sheet also didn't
study comprehensively for the exam, which would mean use of a study
sheet might simply be a marker of students with better study
habits. The data is not fine-grained enough to see if the improvement
for crib sheet users was seen in the multiple choice/short answer
questions, the essay questions, or both. But in any case the outcome
is clear enough to me to continue to encourage students to make a
study sheet when they have an opportunity to do so.

The literature on the effectiveness of study sheets is
generally positive, though few researchers attribute major performance
gains to them. See for example Brigitte Erbe, "Reducing Test Anxiety
While Increasing Learning: The Cheat Sheet," College Teaching 55,
no. 3 (2007): 96-98, DOI: 10.3200/CTCH.55.3.96-98; and Afshin Gharib
and William Phillips, "Test Anxiety, Student Preferences and
Performance on Different Exam Types in Introductory Psychology,"
International Journal of e-Education, e-Business, e-Management and
e-Learning 3, no. 1 (February 2013): 1-6, DOI:
10.7763/IJEEEE.2013.V3.183. ↩

Some research suggests that crib sheets do not
effectively help students learn the material; see Thomas N. Dorsel and
Gary W. Cundiff, "The Cheat-Sheet: Efficient Coding Device or
Indispensable Crutch?" The Journal of Experimental Education 48,
no. 1 (Fall 1979): 39-42. These authors, both psychologists,
determined this by asking students to prepare crib sheets before a
short, voluntary exam. The professors then took the sheets away from
some students. The students without cheat sheets then did not do as
well on the test as those who got to use the sheet they had prepared,
showing that the crib sheets functioned as a "crutch" but not a device
that had helped students internalize information during pre-test
studying. The applicability of this study to longer, mandatory exams
is not clear; also unreported is whether any student wished to bop the
researchers on the nose. ↩

In working with manuscript census materials, modern data derived from them, and published documents from the Census Bureau, I found myself coming back to particular resources time and again. In the hope this might be of use to someone else, I've put together this list of those I use most frequently. If you have suggestions or corrections, please contact me or leave a note in the comments below.

The Unified Enumeration District Finder -- choose the desired year in the dropdown box in the title. Helps you find Enumeration Districts from addresses, for 1880-1940 censuses. In other words, you use this to help you find a key piece of information necessary to allow you to look up a particular place in the census. A quirky but amazing site.

Modern re-creations of census forms, helpful for deciphering questions (i.e. these are readable, but double-check each against an original) or for taking notes on a limited number of people: http://www.cyndislist.com/us/census/forms/

Historical Census Data Browser, University of Virginia. This is the one you want if you are looking for data down to the county level, over a wide time span. It can also filter data by a number of variables, which is extremely powerful.

20th century county-level population counts, from the Census Bureau, arranged by state: "County Population Census Counts 1900-90." Just simple text files with tabular data, but sometimes that's all you need.

U.S. Census publications

Census Bureau: Published books from the census, in annoying linked PDF format: http://www.census.gov/prod/www/decennial.html (or via FTP here). The Dubester catalog is helpful in figuring out what's what, and what is missing. There is data available in tabular form in these publications that hasn't made its way to the databases, so it can still be helpful to access them. These particular scans are not high quality, unfortunately, but far better than no access at all.

Henry J. Dubester, Catalog of United States Census Publications, 1790-1945 (Washington: GPO, 1950). The standard, though the Census Bureau has stopped listing the Dubester numbers for early publications on their site, so it's not quite as important for navigation. Still helpful to make sure you have seen what there is to see, and to help decipher similarly-named documents. Available as one half of this huge PDF, or at HathiTrust. (Note: The first link above has a second piece that picks up where Dubester left off and continues to 1972.) Kevin Cook published a revised version in 1996 that provides SuDoc numbers for the publications listed by Dubester, making them much easier to find in a modern Government Publications depository library.

Henry J. Dubester, State Censuses: An Annotated Bibliography of Census of Population Taken After
the Year 1790 by States and Territories of the United States (Washington,
DC: Government Printing Office, 1948), 73 pages. Google Books

Jason G. Gauthier, Measuring America: The Decennial Censuses from 1790 to 2000 ([Washington]: U.S. Census Bureau, 2002), http://www.census.gov/prod/2002pubs/pol02-ma.pdf. This is a very detailed chronology of census information that can be quite helpful in figuring out exactly what was asked when, and what survived. Includes images of the population schedules.

Commercial services

I try to avoid these, but especially if you are trying to trace the history of a particular individual, they are powerful and can save you some legwork. Note that if you happen to be in the vicinity of a National Archives facility, you can use them (and others) for free on-site, as part of a deal struck when NARA began permitting the companies to digitize NARA records in huge batches.

As I've noted before, manuscript records collected by the census can be fascinating and informative windows to the past. They can be used to learn more about groups of people that appear only occasionally in the historical record, and since they are generally well-structured, they can be used (with care) to ask data-driven historical questions. When most historians think of using historical census sources, it's the forms from the decennial census of population that come immediately to mind. These are well-known sources, but there are always fresh nuances to discover. I stumbled over one of these just the other day.

This was news to me: the 1910 federal census used different forms to record American Indians. These filled-out forms were combined on microfilm together with those manuscript schedules that had been used to record the rest of the population, but the Indian-specific forms reflected and reinforced their non-equal place in American society.

But then one find led to another: in the course of trying to find out more about these forms, I quickly became aware of how little I knew about finding American Indians in the census. In the 1800s, American Indians were only rarely counted by enumerators in the way the non-Indian population was. On the other hand, the U.S. government occasionally tried to record American Indians specifically. So policies of separation and difference ended up having an unintended outcome, leaving behind primary sources that help us know more about these groups than we otherwise might.

The "Indian Census Rolls"

Though information about American Indians can be found in a variety of census publications, one of the largest is the set of microfilmed copies of the Indian Census Rolls, 1885-1940. These forms were the result of instructions to federal Indian agents to tally all of the American Indians living on reservations under their jurisdiction. As this detailed article from the National Archives makes clear, despite the wide scope of coverage, the forms did not cover every recognized group of American Indians, nor did they list non-affiliated members.

But these documents do have certain advantages over traditional census forms. Unlike the regular census, the "Indian Census" rolls were supposed to be recorded or updated every year. Many of the records were typed instead of handwritten (hooray!), and for some years, records of individuals included a direct reference to the same person on the previous year's form, greatly easing the work of tracing a person through time.

The Indian Census Rolls were long available only on microfilm, which limited access to them. Recently they have been digitized and made available for searching through genealogy websites ancestry.com and fold3.com, where they are available to paid subscribers. If you are just looking for the name of a particular ancestor, the paid sites are undoubtedly the easiest way to find that needle in a haystack. But the records can also be accessed for free with a little extra work, as the microfilm reels published by the National Archives have been digitized by the Internet Archive. Usage patterns for 9CHRIS, a site I built to help make historic court records accessible, suggest that there is a lot of demand for greater access to federal records about American Indians, so I thought it might be useful to describe how to use the freely-available Indian Census.

Using the Finding Aid and the Internet Archive

Here's how to find the rolls for a particular American Indian tribe or group in the freely-available Indian Census sources:

First look them up in the finding aid to find out what "agency" was responsible for reporting about them. Agencies were units of geography, which sometimes (but not always) reflected federally-designated reservations. Sometimes a single agency might report about several tribal groups, and the opposite was also sometimes the case, where a particular tribe might fall under the jurisdiction of several agencies.

For example if I were looking for the Washoe (also spelled "Washo"), I would see that the Bishop, Carson, and Walker River agencies each had jurisdiction.

Next, use the second list, located later in the same finding aid document, to find out what reels of microfilm contain the records for that agency. (The agencies are listed in alphabetical order.) Some agencies share space on the same reel of microfilm, and in other cases the agency's records are spread across multiple reels, with a few years to each reel. Note the reel number you are interested in, in the left column.

To continue the example, the Carson agency has records on reels 18, 19, 20, and 21. If I wanted to see records from 1933, I'd choose reel 20.

Now we can go to the Internet Archive and look for the reel we want. I searched for "Indians of North America" AND census AND "Reel 020" (note the reel number is always three digits) and it returned precisely the result I wanted.
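The three-digit zero-padding is easy to get wrong when typing searches by hand, so a tiny helper can build the string reliably. This is just an illustrative sketch; the function name is mine, not part of any tool:

```python
def reel_query(reel: int) -> str:
    """Build an Internet Archive search string for an Indian Census reel.

    Reel numbers in the digitized titles are always zero-padded to
    three digits, e.g. reel 20 appears as "Reel 020".
    """
    return f'"Indians of North America" AND census AND "Reel {reel:03d}"'

print(reel_query(20))
```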

The reel can be read online, or downloaded as a (very large) PDF. Within each agency, the records are typically divided by year, and then by the "reservation" or administrative unit within the broader agency, which are organized alphabetically within each year.

In this case, the records from 1933 were at the beginning of the reel. I skipped over several administrative units (Austin, Battle Mountain, Beowawe) before coming to the Carson Valley subunit, where the people I was looking up lived.

To save a copy of a page in the online viewer, right-click on the image and choose "Save Image As..." to download it. The Internet Archive also built their online viewer so that each specific page in every document has a unique URL, so you can just copy and paste the URL from your browser bar.

Conclusion

Historians, social scientists, independent researchers, and genealogists all make extensive use of the manuscript historical census. The discrimination and unequal treatment faced by many ethnic and racial minorities in the past is sometimes reflected in their absence or unequal treatment in the census. But sometimes, as in the case of some Native Americans, these social attitudes led to policies that left behind, as a side effect, documentation that helps us understand their lives and history better than we would otherwise.

Census microdata, such as that produced by
the NAPP project, can help illuminate interesting issues
surrounding work and labor.

Investigating the history of work using microdata must begin with the
questions asked by the census enumerators about employment. These
varied depending on the country and year, but were generally quite
simple. For example, the U.S. 1880 census form had two
questions pertaining to work, and a third that hinted toward labor as
well. Question 13 recorded the "Profession, Occupation, or Trade of
each person, male or female," and Question 14 asked the "Number of
months this person has been unemployed during the Census year." These
two questions were not supposed to be asked of any individual "under
10 years of age," according to the instructions.1 Question 15 also
implied work, asking if the person had been sick or disabled, "so as
to be unable to attend to ordinary business or duties." The recorded
occupation, Question 13 in the case of the US 1880 census, was written
free-form in the blank.

Transcribing the occupation

In the NAPP dataset, the occupation as it was transcribed by
volunteers is found in the field OCCSTRNG. The small space on the
form and the frequently creative spelling and abbreviating style of
the enumerators can lead to some unreliable results if taken alone.

Let's look at mining engineers in the 1880 US microdata as produced by
NAPP. We have 417 entries where OCCSTRNG is MINING ENGINEER.
Good! But we also have other variations, such as:

MINING ENGINEAR

MINING ENGERNEER

MINING ENG.

MINGING ENGINEER

MG. ENGINEER

ENGINEER MINING

MINING ENGR

MINING EXPERT

MININING ENGINEER

Some of these might be small variations on the overall category, worth
noting and investigating. (What's the difference, in 1880, between a
Mining Expert and a Mining Engineer?) But many are clearly mining
engineers, just spelled differently.
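One way to cope with such variants is fuzzy string matching, sketched below with Python's standard difflib. The 0.8 similarity cutoff is an arbitrary choice of mine, and any real analysis would require hand-checking the matches it produces:

```python
import difflib

CANONICAL = "MINING ENGINEER"

def looks_like_mining_engineer(occstrng: str, cutoff: float = 0.8) -> bool:
    """Heuristic test for spelling variants of MINING ENGINEER.

    Word-order swaps (e.g. ENGINEER MINING) are accepted outright;
    otherwise the raw string must be sufficiently similar to the
    canonical spelling.
    """
    s = " ".join(occstrng.upper().split())
    if set(s.split()) == set(CANONICAL.split()):
        return True
    return difflib.SequenceMatcher(None, s, CANONICAL).ratio() >= cutoff

for raw in ["MINING ENGINEAR", "ENGINEER MINING", "MINING EXPERT"]:
    print(raw, looks_like_mining_engineer(raw))
```

Note that the heuristic deliberately rejects "MINING EXPERT" -- as discussed above, that may be a genuinely different occupation rather than a misspelling.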

While this variable by itself is valuable, it can be difficult even
for simple operations -- say, counting the number of mining engineers
in a particular state -- because of the spelling differences. To
address this limitation, the NAPP team created many additional
variables to allow researchers to compare occupational information
more broadly.

Constructing variables about work

From these tiny bits of inconsistent information about each person's
occupation, NAPP adds tremendous value by deriving new variables.
These are "constructed variables."

As you can see in the full list, not all
variables are available for each sample, in part because the census
questions asked about labor could vary depending on country and year.
Additionally, some of the variables have essentially similar
information, but NAPP offers a variety in order to make it easier to
connect NAPP data with other data sets.

Some of these constructed variables are simple and intended to help
with other comparisons. For example, the LABFORCE variable
simply records if a person participated in the work force or not (or
if it was impossible to tell).2 By itself, this might not
seem very useful, but it could be helpful in conjunction with other
variables. For example, you might want to compare people in one
occupational sub-group -- farmers, for example -- with all people who
were a part of the labor force rather than the public as a whole.
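That kind of denominator comparison can be sketched in SQL via Python's built-in sqlite3. The toy table and code values here are assumptions for illustration: I use the IPUMS-style convention that LABFORCE = 2 means "yes, in the labor force," and an invented OCCHISCO code for farmers.

```python
import sqlite3

# Toy stand-in for a NAPP extract. LABFORCE 2 = "in the labor force"
# (IPUMS-style coding) and 61110 as a farmer code are illustrative
# assumptions -- check the real codebooks before relying on them.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (labforce INTEGER, occhisco INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?)",
                [(2, 61110), (2, 61110), (2, 2700), (1, 0)])

# Count everyone in the labor force, then just the farmers within it.
labor_force = con.execute(
    "SELECT count(*) FROM data WHERE labforce = 2").fetchone()[0]
farmers = con.execute(
    "SELECT count(*) FROM data WHERE labforce = 2 AND occhisco = 61110"
).fetchone()[0]

print(f"{farmers} of {labor_force} labor-force members were farmers")
```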

Other constructed variables group together workers by their
occupation. This helps solve the sort of problem we faced above with
misspellings of "mining engineer." OCCHISCO and
OCC50US are two of these variables. Each uses a
mid-twentieth century list of occupations as a starting point, with
adaptations to better represent historical occupations. It is
important to remember that not all occupations would have fit neatly
into one of these later occupational categories, and conversely,
sometimes very different occupations get lumped together
inadvertently.3 Even so, this can be an important way to
identify relatively fine-grained occupational information.

Some occupations have a built-in hierarchy of status that might be
difficult to capture using the OCCHISCO codes alone. A "Mine
Laborer" and a "Miner" are different things, but both might reasonably
belong in OCCHISCO category 71120. The OCSTATUS variable records
any known hierarchical information from the occupation field. So
while a miner and a mine laborer would both be in the same OCCHISCO
category, a "Laborer" would have OCSTATUS 33, whereas the
miner might have an OCSTATUS of 32 ("Worker or works in"), 00
("None"), or 99 ("Unknown").
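The distinction can be made concrete with a small lookup table. The code labels below are only the ones quoted above (consult the NAPP codebook for the full list), and the two example records are invented:

```python
# OCSTATUS code points quoted in the text; the full NAPP codebook has
# many more. The two example records below are invented.
OCSTATUS_LABELS = {0: "None", 32: "Worker or works in",
                   33: "Laborer", 99: "Unknown"}

records = [
    {"occstrng": "MINER",        "occhisco": 71120, "ocstatus": 32},
    {"occstrng": "MINE LABORER", "occhisco": 71120, "ocstatus": 33},
]

# Same OCCHISCO category, but OCSTATUS preserves the hierarchy.
for r in records:
    label = OCSTATUS_LABELS.get(r["ocstatus"], "?")
    print(f'{r["occstrng"]}: OCCHISCO {r["occhisco"]}, OCSTATUS "{label}"')
```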

NAPP also includes constructed variables that suggest, in a relative
way, the wealth or status associated with an occupation.
SEIUS uses the "Duncan Socioeconomic Index," which was
developed in the 1950s, to create an occupational rank that considers
income, education, and prestige. These scores are tied to the way
those factors were perceived in the 1950s -- this means that they can
be compared across decades, as the scores will always be the same for
a given occupation; but it also anachronistically frames prestige in
1950 terms. For example, a "miner" receives a SEIUS score of 10,
which is fairly low.4 But perhaps mining carried more prestige
in 1880? For obvious reasons, there is considerable debate about the
usefulness of this measure, but like all of these constructed
variables, it may be helpful if used carefully.5 The NAPP
variable OCSCORUS provides a related measurement, classifying each
occupation according to its economic status in 1950. Unlike the
SEIUS score, which factored prestige or status of an occupation into
its calculations, the OCSCORUS is based only on the earning power of
that job classification in 1950. As with SEIUS, there are obvious
problems with anachronistically comparing 1880 job types based on what
those job types earned 70 years later. However, if used carefully,
OCSCORUS, like SEIUS, can put all workers somewhere on a universal
scale in order to compare them.

A simple example

Where did most mining engineers live?

Let's begin with a straightforward question: Where did mining engineers live in
1880? We will use the OCCHISCO variable to look for them, noticing
that the value "02700" is "Mining engineers." Let's group them by
state, but notice that NAPP provides a variety of geographic levels that could be used here, from simple measures of urbanity (URBAN), to small divisions such as enumeration district (ENUMDIST) and county (COUNTYUS), up to regional groupings of states (REGIONNA).

This SQL code will produce the table below, using a JOIN to grab each
state's name from the auxiliary table. (Note: I have manually folded
the table to take up less vertical space.)

SELECT stateus.desc AS State
, count(*) AS Engineers
FROM data
JOIN stateus ON data.stateus = stateus.id
WHERE occhisco = 2700
GROUP BY State
ORDER BY Engineers DESC
;

State                 Engineers     State                 Engineers
California              146         Georgia                  5
Colorado                 83         District of Columbia     4
Pennsylvania             66         New Hampshire            4
New York                 57         Tennessee                4
Michigan                 24         Arkansas                 3
West Virginia            23         Connecticut              3
Arizona Territory        22         Kentucky                 3
Illinois                 20         Maine                    3
Nevada                   20         South Dakota             3
Massachusetts            17         Alabama                  2
Utah Territory           17         Indiana                  2
Missouri                 14         Maryland                 2
New Jersey               14         Rhode Island             2
Ohio                     10         South Carolina           2
New Mexico                9         Delaware                 1
Virginia                  8         Nebraska                 1
Idaho Territory           7         North Dakota             1
Montana Territory         7         Oregon                   1
North Carolina            7         Wyoming Territory        1
Iowa                      6

Unsurprisingly, many mining engineers were found in the American West,
where mining was booming. The strong numbers in Pennsylvania reflect
the importance of the anthracite and bituminous coal industry
there. But these numbers can also help remind us of the close association of
engineering expertise with capital, as in the case of those located in
New York, New Jersey, and Massachusetts. Similarly, they might help
remind a researcher who focuses on Western mining of the growing
importance of coal production in the midwest, in states such as Ohio,
Illinois, and Iowa. Unexpectedly high or low numbers can help prompt
deeper investigation. For instance, are a mere two engineers in Alabama
a sign that the state's major coal industry had yet to
reach substantial levels of development, or that the mines were worked
without significant engineering oversight?

Caveats about microdata and labor history

As with any data derived from the historical manuscript census, there
are sometimes problems with NAPP's occupational data. Some of these
problems arise from the recording, transcription, and coding phases.
If the enumerator heard the person incorrectly (or could not spell
well), or if a volunteer could not make out the handwriting or assumed
the word was a different one, or if the occupation did not clearly fit
any one category and a "best guess" had to be made in classification,
errors might be introduced. It might be particularly troublesome to
imagine that job categories, and their relative status, remained
consistent over the decades.

Other issues stem from the nature of the census itself. Most census
forms only permitted one occupation to be listed.6
Occasionally enumerators tried to squeeze two occupations in the
space, such as "MINING & CIVIL ENGINEER." But most frequently the
other work was simply not counted. What if a person was a farmer
during the summer months and worked at mining during the winter? Only
one occupation could be recorded.

Census workers were supposed to record a person where they were found
on a particular day or month, such as June 1880. This specificity
could contribute to errors in recording people whose work was seasonal
or took them far from home, such as a mining engineer on a summer-long
consulting trip in the mountains of the American West.

Similarly, the census takers did a poor job understanding and
accounting for the work of women and children. A woman who took in
boarders or washing made a tangible contribution to her household's
economic prosperity, but this was often overlooked by enumerators
who would frequently simply record "Keeping House."

Job insecurity is difficult to determine in the NAPP data. It was not
well recorded by the census, especially in older censuses, which
provide only crude measures of unemployment and no data at all about
underemployment. In the 1880 US census, for example, there is a
column to mark how many months a person has been unemployed, but this
could at best unevenly reflect the cycle of on-again, off-again work
that typified many labor categories, such as anthracite miners in
Pennsylvania. Compounding the issue, most of the NAPP data sets do not
include this information, even if it had originally been recorded on
the census. (Perhaps a potentially-sensitive issue such as
unemployment had been deemed unnecessary to record by genealogist
volunteers who, in some cases, originally compiled the data sets that
were further extended by NAPP.)

Conclusion

NAPP microdata derived from the census can offer important
information about historic patterns of work and labor. The data is by
no means a perfect representation of work activity, and it can contain
noteworthy errors. Even so, when used judiciously, this microdata can
shed light on important questions about work that were central to life
in the past.

Some quick work with the database shows that this rule was
hardly observed universally. While sometimes enumerators filled this
field with age-appropriate information such as "ATTENDING SCHOOL" or
crossed it out with an "X" (and typos in the age field may also have
occurred), it is clear that children under 10 years of age worked in
small numbers in a wide variety of occupations in 1880. ↩

As usual, caveats apply and the
documentation for each variable must be read carefully.
For example, the LABFORCE variable is designed so that people who are
listed as having an occupation, but who are 15 years old or younger,
are automatically reported as not being in the labor force.
This would make it impossible to use LABFORCE to pursue certain
kinds of occupational questions about child labor, for example. ↩

One example of inadvertent lumping of dissimilar work
in the same category can be found in OCCHISCO value 03030, "Mine
surveyors." In the 1880a US data set, only 15 individuals are placed
in this category. With occupations such as "MINE SURVEYOR" (3),
"MINING SURVEYOR" (2), and "SURVEYOR AND MINER" (1), some of these
appear to be people who work for mining companies conducting
underground surveying, which was often done by beginning-level mining
engineers. Others in this category, such as "U.S. MIN'L SURVEYOR" (1)
and "U.S. DEPUTY MINER SURV." (1) are quite different. These were
experienced land surveyors appointed by the federal government to
carefully survey the surface boundaries of any mining claim staked on
public land. They would create reports and plat maps, swearing under
oath as to their accuracy. US Mineral Surveyors, then, would seem to
be a very different type of occupation than a mine surveyor, but
because of the need to classify occupations they ended up in the same
OCCHISCO category. ↩

The average SEIUS value for all members of the labor force
in 1880 is 19.70. By way of comparison, mining engineers (OCCHISCO
= 02700) have an SEIUS score of 85. ↩

See
this cautionary note
from the IPUMS documentation. Note: NAPP's SEIUS is called SEI in
the IPUMS-USA data and documentation. ↩

Among NAPP data sets, Norway is an exception, and allowed
census takers to record two occupations. ↩

I'm grateful for all the support and assistance I've received as I've chased this fascinating history.

Here's the description from the press:

Digging mineral wealth from the ground dates to prehistoric times, and Europeans pursued mining in the Americas from the earliest colonial days. Prior to the Civil War, little mining was deep enough to require maps. However, the major finds of the mid-nineteenth century, such as the Comstock Lode, were vastly larger than any before in America. In Seeing Underground, Nystrom argues that, as industrial mining came of age in the United States, the development of maps and models gave power to a new visual culture and allowed mining engineers to advance their profession, gaining authority over mining operations from the miners themselves.

Starting in the late nineteenth century, mining engineers developed a new set of practices, artifacts, and discourses to visualize complex, pitch-dark three-dimensional spaces. These maps and models became necessary tools in creating and controlling those spaces. They made mining more understandable, predictable, and profitable. Nystrom shows that this new visual culture was crucial to specific developments in American mining, such as implementing new safety regulations after the Avondale, Pennsylvania, fire of 1869 killed 110 men and boys; understanding complex geology, as in the rich ores of Butte, Montana; and settling high-stakes litigation, such as the Tonopah, Nevada, Jim Butler v. West End lawsuit, which reached the US Supreme Court.

Nystrom demonstrates that these neglected artifacts of the nineteenth and early twentieth centuries have much to teach us today. The development of a visual culture helped create a new professional class of mining engineers and changed how mining was done.

Historians often try not to fall in love with our sources, but
sometimes I just can't help it. For me, aside from the chatty personal
journal (what historian can resist?) and the underground mine maps
I've studied for years, my greatest fondness may be for big
compilations of small bits of data, called microdata.

What's microdata? It's small bits of information that, by themselves,
might be virtually useless, but when aggregated and analyzed can show
bigger trends.

The classic use of microdata is the census. Whether you
filled out the form yourself or talked to the canvasser ("enumerator")
who visited your house, small bits of information about you and your
family were recorded. By itself, this isn't much -- half a dozen or
more websites probably know more about you than the US Census Bureau
does. But when placed together with similar information from other
people, we can see broader trends -- this neighborhood is slipping
into poverty, or that one has an emerging immigrant business
community, or this county will need to plan for more school capacity
because of the number of young children.

What if we could use that same kind of explanatory power to help
understand the past? This question, and the use of microdata that it
implies, has motivated historians since the advent of computers in the
1960s. The raw records of the census, which in the United States has
been conducted every ten years since 1790, are a good source for this
microdata. The raw records (termed "manuscript census" records) are
released to researchers 72 years after they were created. One
challenge is that they are handwritten (hence "manuscript"), meaning
that any researcher would have to carefully transcribe the handwritten
documents before being able to use them as microdata. But that
transcription only needs to be done once, if researchers do a good job
and are willing to share.

The one I've used most is the
North Atlantic Population Project (NAPP). The Minnesota Population
Center (MPC) collaborated
with
several other institutions on both sides of the Atlantic
to make historical census microdata available from multiple
countries. NAPP converts the data so that variables can be compared
directly between countries. They also add additional variables,
derived from information in each census, that extend the sorts of
questions researchers can ask. Best of all, NAPP shares their
microdata collections with researchers for free, provided you promise
to cite it appropriately,1 not redistribute the data, and
not use it for genealogical purposes. In exchange for complying with
these very moderate restrictions, NAPP makes available
samples of census microdata covering the US, Canada,
Great Britain, Sweden, Norway, and parts of present-day Germany during
the 19th and early 20th century. Some of these samples are complete
transcriptions of all of the manuscript census records, plus all the
extra NAPP bells and whistles.

Most researchers use statistical software packages such as SPSS or
STATA to browse and manipulate NAPP data. I took a different tack,
creating a set of scripts that will manipulate a NAPP
data file and load it into a database, where I can explore it with
SQL, the standard language used to query databases. (The gory
technical details are described below.)
Over time, I will post some sample SQL statements here on this blog.

Microdata census records combined with NAPP's additional variables and
a powerful search tool make it easy to dive deeply into historical
patterns large and small. As long as the limitations of the original
census sources are kept in mind, an extraordinary range of questions
-- about race, work, geography, gender, age, the family, and more --
can be asked that would be difficult to answer any other way. Can you
blame me for having a soft spot in my heart for historical microdata?

Minnesota Population Center. North Atlantic Population
Project: Complete Count Microdata. Version 2.0
[Machine-readable database]. Minneapolis: Minnesota Population
Center, 2008. Additionally, each data set offered by NAPP has its own
citation. ↩

The tools

napptools consists of three script programs:

napp2csv.sh: A Bash script that uses the traditional unix tools cut,
sed, and tr to chop a NAPP data file into its constituent
columns, guided by a SAS-format command file. This also creates
secondary tables in .CSV format from the variable descriptions in
the SAS file.

csv2sqlite: A public-domain AWK program written by Lorance Stinson
(available from http://lorance.freeshell.org/csvutils/) to convert
.CSV files into a series of SQL statements that load the data into a
database. Two small changes were made by Eric Nystrom to Stinson's
original code to fix a bug and better fit the output to SQLite's
capabilities by specifying non-typed columns.

nappbuild.sh: A Bash script to employ napp2csv.sh and
csv2sqlite to create .CSV and .SQL files, then load them into a
SQLite database.
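
The fixed-width slicing that napp2csv.sh performs with cut can be
sketched in a few lines of Python. The (name, start, width) layout
tuples and the sample record below are invented, standing in for what
a real SAS command file and NAPP data file would supply:

```python
import csv
import io

# Hypothetical (name, start, width) layout, standing in for what a real
# SAS command file would specify; actual NAPP layouts differ.
layout = [("SERIAL", 0, 8), ("AGE", 8, 3), ("NAMELAST", 11, 10)]

def fixed_width_to_rows(lines, layout):
    """Slice each fixed-width line into fields, much as cut(1) would."""
    for line in lines:
        yield [line[start:start + width].strip()
               for _, start, width in layout]

# One invented record, written out as CSV with a header row.
sample = ["0000123  42Smith     "]
out = io.StringIO()
writer = csv.writer(out)
writer.writerow([name for name, _, _ in layout])
writer.writerows(fixed_width_to_rows(sample, layout))
print(out.getvalue())
```

The real script does the same job with stream tools, which scale
better to multi-gigabyte complete-count files.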

Usage

Get an account at http://nappdata.org, receive access approval, and
select your desired variables. Download the fixed-width text data file
itself, which will end in a .dat.gz extension, as well as a
command file in SAS format.

Ensure all dependencies are met. On most Linux systems the only one
you may need to install will be sqlite3, the command-line client
for the SQLite database package.

Run nappbuild.sh in the directory containing your data file and
your command file, passing the name of the SQLite database you wish
to create.

If your data file is napp_00001.dat.gz and your command file is
napp_00001.sas then run nappbuild.sh like so:

nappbuild.sh -i -n napp_00001 -d MyNAPPData.db

From there, you can use your database from the SQLite command shell
sqlite3 or your favorite programming language.
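
For instance, Python's standard-library sqlite3 module can open the
database directly. The in-memory table below is an invented stand-in
for a real napptools database, with column names mirroring NAPP
fields:

```python
import sqlite3

# A toy in-memory stand-in for a napptools database; the column names
# mirror NAPP fields, but the rows here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (SERIAL INTEGER, AGE INTEGER, NAMELAST TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                 [(1, 34, "Smith"), (2, 61, "Olsen"), (3, 12, "Smith")])

# Count people by surname -- the kind of quick exploration SQL makes easy.
for namelast, n in conn.execute(
        "SELECT NAMELAST, COUNT(*) FROM data "
        "GROUP BY NAMELAST ORDER BY NAMELAST"):
    print(namelast, n)
```

Against a real napptools database you would pass the .db filename to
sqlite3.connect() instead of ":memory:".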

Database structure

For databases created with napptools, most of the NAPP data ends up
in a single large table, called data. Each of the columns in data
is named for the NAPP field, such as SERIAL, PERWT, NAMELAST,
etc. (Since SQLite's column names are not case-sensitive, lower case
works fine too.)

Some of these columns have self-contained information, such as
NAMELAST or OCCSTRNG, but others contain a numeric code that will
typically need to be translated into human-readable
values. Translations for these codes are provided by NAPP in the
command file. The napptools suite breaks those translations out of
the command file, into separate tables loaded into the SQLite
database. These secondary tables are named for the NAPP variable, and
always contain two columns, id and desc. With this information,
it is easy to use a SQL JOIN command to bring the translations into
your results, or you can refer to the codes directly if desired.

For example, you might want to show the number of people listed in
each category of the race variable in the state of New Mexico
(stateus value of 35), joining data to the race table.
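
A sketch of that kind of JOIN, using Python's built-in sqlite3 module
against toy tables that mimic the napptools layout; the race codes and
rows below are invented, and real NAPP codes differ:

```python
import sqlite3

# A main `data` table with coded columns, plus a secondary `race`
# table holding id/desc translations, as napptools builds them.
# All codes and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (race INTEGER, stateus INTEGER)")
conn.execute("CREATE TABLE race (id INTEGER, desc TEXT)")
conn.executemany("INSERT INTO race VALUES (?, ?)",
                 [(100, "White"), (200, "Black")])
conn.executemany("INSERT INTO data VALUES (?, ?)",
                 [(100, 35), (100, 35), (200, 35), (100, 6)])

# The JOIN pulls the human-readable description in alongside the count;
# stateus = 35 restricts the tally to New Mexico.
query = """
    SELECT race.desc, COUNT(*) AS n
    FROM data JOIN race ON data.race = race.id
    WHERE data.stateus = 35
    GROUP BY race.desc
    ORDER BY race.desc
"""
for desc, n in conn.execute(query):
    print(desc, n)
```

Omit the JOIN and SELECT data.race directly if you would rather work
with the raw codes.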

Some caveats

The SQLite index generation routine is rather crude: it makes an
index for every column in the data table and on the id column in
all secondary tables. This is likely overkill, but indexes on at
least some of those columns undoubtedly speed up many queries.
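
A leaner alternative would be to index only the columns that actually
appear in your WHERE clauses and JOINs. A minimal sqlite3 sketch, with
the table schema and column choices invented for illustration:

```python
import sqlite3

# Toy schema standing in for the napptools `data` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (race INTEGER, stateus INTEGER, age INTEGER)")

# Index only the columns queries will filter or join on, instead of
# indexing every column as the current routine does.
for col in ("race", "stateus"):
    conn.execute(f"CREATE INDEX idx_data_{col} ON data({col})")

for row in conn.execute("PRAGMA index_list('data')"):
    print(row[1])  # index names
```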

This was designed and used on a Debian Linux system. It seems likely
that it will be portable to similar unix-based systems as long as
the dependencies are all met, but YMMV.

Last semester, several of my students ran afoul of a perennial problem with PowerPoint. They had created their slide decks on a large screen, but when they connected to the room's projector, it forced a lower screen resolution.

Blammo! -- ugly slides. Text too big, images cut off, broken layouts everywhere. The wreckage was so horrifying that two design students in the audience were forced to avert their eyes.

There are several ways to solve this problem, but here's the trick I use all the time: Instead of showing your slides in PowerPoint, make your presentation deck into a PDF, and show that. PDFs will scale, unlike PowerPoint, and will capture your presentation faithfully, no matter how different the resolution might be.1

1. Install software that allows you to create a PDF

Several free software packages allow you to create a PDF.

LibreOffice is a free, open source office suite (essentially a replacement for Microsoft Office), which has a PDF export feature. There are versions for Windows, Macs, and Linux. (An earlier version was called OpenOffice.) You can use LibreOffice Impress (a PowerPoint work-alike) to create your presentation, or just open your PowerPoint file in Impress to export it to PDF. This is what I usually use.

PDFCreator is a free and open source program to help you create PDFs in Windows. When you install it, PDFCreator makes a new printer device. Then you print your document -- in this case, your presentation slides -- to this special printer, and it saves the output as a PDF. This works from any program that can print, not just PowerPoint.

On a Mac, it's even easier -- the capability to print to a PDF is built right into the system. This tutorial from MIT shows how to do it (with pictures).

Microsoft has even created a downloadable add-in to allow you to generate PDFs from within Microsoft Office.

If you happen to have paid for Adobe Acrobat (not just Acrobat Reader), you can generate PDFs with Acrobat.

2. Create your presentation in PowerPoint in the normal way

3. Generate a PDF of your final presentation

Create your PDF by printing to the PDF printer from within PowerPoint. If you're using LibreOffice, just click the PDF button in the toolbar to create it.

4. Deliver your presentation from your PDF instead of your PowerPoint

To do this, just open the PDF in the normal way with Acrobat Reader or other PDF-viewing software.

The trick is to make it display full-screen. In Acrobat Reader, hit Ctrl-L to make it full-screen. In full-screen mode, use the arrow-keys or click the mouse to make the slides advance. (Left mouse button, down-arrow, or right-arrow all advance one slide; right mouse button, up-arrow, or left-arrow all go back one slide.) When you want to exit full-screen mode, hit the Esc key. (Mac users can use Adobe Reader, or see Apple's instructions for using Preview in full-screen mode.)

If you deliver your presentation from a PDF instead of a PowerPoint file, each slide retains the layout you created for it originally. If the resolution changes once the projector is plugged in, a PDF will scale down gracefully, instead of scaling some elements and leaving others full-size as PowerPoint does.

Then you will be free to concentrate on delivering an excellent presentation -- not worrying about the visual mess your carefully-crafted slides could become.

Well, I'm not sure if animation or sound will work. Probably not, actually. But I see that as a feature, not a bug. ↩

In an earlier post I discussed some
of the limitations of full-text searching in digitized copies of historic
mining engineering literature, and suggested several historic index
publications specific to mining engineering that could be used to augment
your search by looking for information the old-fashioned way.

Mining engineering information also appeared in historic indexes that
covered engineering as a whole. Third-party indexes that cover engineering
topics generally include at least some of the more popular mining
engineering journals in their coverage. These indexes can help you cover
more ground, and can also help unearth the occasional mining-related article
in a general engineering periodical that might not have gotten picked up by
one of the more specialized indexes. Finally, general indexes can help
reveal how technological change in the mining industry is related to the
engineering industry as a whole.

The most important index for very late 19th century and 20th century
engineering is the Engineering Index. It began publishing its compendium
volumes in 1892, with coverage back to 1884. The first two volumes were
published by the Association of Engineering Societies, but then beginning
with vol. 3, with coverage beginning in 1896, the work was done by The
Engineering Magazine. Beginning with volume 5, published in 1906, the
index covered one year per volume. Ownership has changed hands several
times, but the Engineering Index, now known as the COMPENDEX, is still
being published and updated today by Elsevier, at http://www.ei.org. A
paid subscription is necessary to access this site, but the index is very
valuable as the back files of the Engineering Index have all been
digitized and are searchable via the database.

Another useful third-party index is The Industrial Arts Index.
Compared with the Engineering Index, the Industrial Arts Index had
more application-oriented (and business-oriented) items. Coverage
began for periodicals published in 1913. After the 1957 issue, the
publication split, becoming the Business Periodicals Index and the
Applied Science & Technology Index. The latter still exists as an
EBSCO product. The main version covers from 1984 to the present, and
an additional "Applied Science & Technology Index Retrospective"
package contains complete data back to the beginning of the
Industrial Arts Index in 1913. These are also paid subscription
services.

Many excellent historical books on mining and mining engineering are now
available from archive.org and Google Books. In some cases, these
repositories also have partial (or, rarely, complete) runs of historical technical
journals of interest to mining historians.

Full-text search of these digitized books is a godsend, but sometimes it doesn't work right. Maybe you are looking for a topic instead of a keyword, or the conversion to text mangled the word you want, or the website doesn't allow you to look inside several volumes at once (ahem, archive.org), or there's no OCR text layer at all (as when downloading PDFs from Google Books).

But researchers can still find information in these newly-digitized historic sources using old-fashioned analog methods. Thinking about these files as the actual book volumes they represent helps us consider the tools that their makers would have used to
search for information. In an era long before full-text search,
readers relied on indexes to find information, and it was
critical that those indexes were made carefully. Such indexes can still help us find information, often different from what full-text searches turn up.

Journal Indexes

Most technical journals from this period were printed with soft
covers, issue by issue, which were intended to be bound together into
a "volume" when enough had appeared. (Volumes typically encompassed six months or a year, depending on the journal.) When the last
issue in a volume had been produced, the publishing company would
index the contents of all the issues for the year. The idea was that
this index would get bound together with the issues in the resulting
volume. However, since each index was originally loose -- like one
more issue of the journal -- different binders might put it in
different places. Most common was to bind the index at the end of the
run of the journals. However, sometimes the index was bound in the
very front, before the issues. Knowing this can help you find a
volume's index when you download the electronic version. Check the
front of the file, and if it isn't there, check the back too. Here's the index at the back of an 1874 volume of the Mining and Scientific Press; and in this 1916 volume, it's in the front.

Some journals would occasionally compile together these yearly indexes
into a more comprehensive index that covered a span of years. For
smaller publications, this might be issued as a supplement to a
regular issue, but in other cases this would be a stand-alone volume.

Third-Party Indexes

Other indexes were compiled by third parties, and cover more than a
single periodical. Many of these covered engineering topics generally, but some were specific to the mining industry.

Mining-specific third-party indexes

For mining engineering around the turn of the 20th century, Walter
R. Crane's Index of Mining Engineering Literature is very valuable.
He indexed some 30 periodicals, including some that were not devoted
to mining specifically and others published overseas on mining topics.
The starting point of the coverage seems to vary, but for some
periodicals it dates back to at least the early 1870s. He produced the
first edition in 1909, and a second edition, which contains different
references, in 1912.

While Crane covers years or decades at a time, another mining-specific
index covers less than a year in a single volume, but with tremendous
reach and detail. The Mining World Index of Current Literature grew
out of the efforts of the journal Mining World to keep its readers
apprised of new work being published in the field. Eventually the
index was published on its own as a standalone volume. Unfortunately,
this ambitious project seems to have lasted only a few years, but if your research falls in this short era, it's a great resource.

Modern electronic indexes

The modern heir to these efforts is the OneMine database, created and maintained by the several minerals engineering societies that succeeded the American Institute of Mining Engineers. The database contains both references and the full-text documents. OneMine allows visitors to search and find references for free, but requires membership with an affiliated society or institution to access the full-text documents. The references alone are quite helpful, however, and provide enough information that you should be able to track down your document in a library (or via inter-library loan). While OneMine has some older historical material, the bulk of the material appears to date from the last 40-50 years.

GeoRef is another modern database of importance to mining historians. Produced by the American Geosciences Institute, much of the content is geology-related rather than strictly mining, but plenty of mining-related material is included. The GeoRef dataset is available as part of subscription packages from EBSCO, ProQuest, and other database platforms, meaning you have to be at a subscribing institution to use it.

Unlike your favorite search engine, both OneMine and GeoRef are true indexes, rather than simply full-text search platforms. Entries are supposed to be thoughtfully structured and populated with keywords that can lead you to other useful entries. It seems probable that both platforms will eventually move to full-text search, but the effort and expense of adding full-text documents to old entries makes a retroactive update less likely. This means mining historians will do research using indexes, whether paper or electronic, for decades to come.