
Digitization of AAVSO data published in the Harvard Annals

In the 1980s, the AAVSO began a monumental project to digitize all of its paper and punchcard records. This project, undertaken just as computers became more common and easier to use, anticipated the rapid advancement in networking technology that led directly to the World Wide Web, and to the AAVSO website. As a result of this work, we are now able to serve our archive of over 18 million observations to the entire world via our website, providing decades' worth of observations of thousands of different variable stars to researchers worldwide.

However, even this monumental project had limits, and not all "AAVSO data" were digitized. In the early years of the AAVSO, while it was still a working group within the Harvard College Observatory, AAVSO data were routinely published in the Annals of the Harvard College Observatory, a research publication of the Observatory detailing large and important projects conducted by groups within the University. Between 1900 and 1925 several large monographs were written and published by HCO director Edward Charles Pickering, either directly or with the assistance of AAVSO Recorder Leon Campbell. At the time, these publications were considered the definitive data source, and these data were therefore not kept with the AAVSO archives from which the digitization project originated. As a result, many thousands of these early observations do not appear in the AAVSO International Database today.

We believe it is imperative, both for the AAVSO and for the scientific community at large, to digitize these observations and make them available in electronic form.

What we have

The AAVSO has paper copies of the Harvard Annals, from which local volunteers can work here at headquarters. However, the NASA Astrophysics Data System (ADS) has also scanned these publications, and images of the pages from these articles are available online.

What we need

We need to locate all of the Harvard Annals papers that published AAVSO data and other variable star photometry. We need to assess (a) how much data exists, and (b) whether it is unique, or was already digitized from some other source.
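The uniqueness check in (b) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the tuple layout (star, Julian Date, magnitude), and the rounding tolerances are all hypothetical and do not describe an existing AAVSO tool.

```python
def observation_key(star, jd, magnitude, jd_decimals=1):
    """Build a comparison key for one observation.

    Assumption: two records describe the same observation if the star name,
    the Julian Date (rounded to jd_decimals) and the magnitude (rounded to
    0.1) all agree. A real check might also compare observer initials.
    """
    return (star.strip().upper(), round(jd, jd_decimals), round(magnitude, 1))


def split_new_and_known(candidates, existing):
    """Separate candidate observations into (new, already_digitized) lists."""
    known_keys = {observation_key(*obs) for obs in existing}
    new, known = [], []
    for obs in candidates:
        if observation_key(*obs) in known_keys:
            known.append(obs)
        else:
            new.append(obs)
    return new, known


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    existing = [("X And", 2415800.6, 9.3)]
    candidates = [("x and", 2415800.6, 9.3), ("X And", 2415830.6, 10.1)]
    new, known = split_new_and_known(candidates, existing)
    print(len(new), "new;", len(known), "already digitized")
```

Keeping the matching rule in one small function makes it easy to loosen or tighten the tolerance once real data show how the published and archived values differ.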

We then need to get these data into machine-readable form so that they can be entered into the AAVSO International Database. This will involve (a) finding and implementing a highly reliable optical character recognition (OCR) program to "read" the scanned files, and then proofreading the results of the OCR, or (b) digitizing them by hand, entering each observation into a spreadsheet or text file that can then be processed.
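Proofreading OCR output can be made easier by flagging obviously suspect rows automatically before a person reviews them. The sketch below assumes each OCR'd row has already been split into a Julian Date string and a magnitude string; the function name, the patterns, and the plausible magnitude range are assumptions chosen for illustration, and this does not replace human proofreading.

```python
import re

# Assumed formats: a 7-digit Julian Date with an optional decimal part, and a
# magnitude such as "9.3" or "<12.5" (the "<" marking a fainter-than estimate).
JD_PATTERN = re.compile(r"\d{7}(\.\d+)?")
MAG_PATTERN = re.compile(r"<?\d{1,2}\.\d{1,2}")


def flag_suspect_rows(rows, min_mag=0.0, max_mag=17.0):
    """Return (line_number, reasons) for every row that needs a human look.

    rows is an iterable of (jd_text, mag_text) string pairs. The magnitude
    limits are hypothetical defaults, not AAVSO validation rules.
    """
    suspects = []
    for line_no, (jd_text, mag_text) in enumerate(rows, start=1):
        reasons = []
        if not JD_PATTERN.fullmatch(jd_text):
            reasons.append("date is not a 7-digit Julian Date")
        if not MAG_PATTERN.fullmatch(mag_text):
            reasons.append("magnitude is not numeric")
        elif not (min_mag <= float(mag_text.lstrip("<")) <= max_mag):
            reasons.append("magnitude outside plausible range")
        if reasons:
            suspects.append((line_no, reasons))
    return suspects


if __name__ == "__main__":
    # Made-up rows showing typical OCR confusions (letter O for 0, l for 1).
    sample = [("2415800.6", "9.3"), ("24158O1.6", "9.4"),
              ("2415802.6", "l0.1"), ("2415803.6", "93.0")]
    for line_no, reasons in flag_suspect_rows(sample):
        print(line_no, "; ".join(reasons))
```

A check like this catches character substitutions and dropped decimal points, but a value that is wrong yet still plausible will pass, so the rows still need to be compared against the scanned page.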

We then need to write a program that will turn the raw data into our standard Visual format files and upload them, or that will take the raw data and insert them directly into our MySQL database.
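As a rough illustration of the conversion step, the sketch below turns one hand-entered observation into a delimited text line with a Julian Date. The three-column layout (star, date, magnitude) is a simplified placeholder and not the actual Visual format, whose required headers and fields should be taken from the AAVSO's own specification; the function names are hypothetical, and the date conversion assumes Gregorian calendar dates in Universal Time.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (day 1 = 0001-01-01)
# and the Julian Date at 0h UT of the same calendar day.
ORDINAL_TO_JD = 1721424.5


def julian_date(year, month, day, day_fraction=0.0):
    """Convert a Gregorian calendar date (UT) to a Julian Date."""
    return date(year, month, day).toordinal() + ORDINAL_TO_JD + day_fraction


def to_delimited_line(star, year, month, day, day_fraction, magnitude,
                      fainter_than=False, delim=","):
    """Format one observation as 'star,JD,magnitude' (placeholder layout)."""
    jd = julian_date(year, month, day, day_fraction)
    mag = f"{magnitude:.1f}"
    if fainter_than:
        mag = "<" + mag  # star not seen; magnitude is a limit
    return delim.join([star, f"{jd:.4f}", mag])


if __name__ == "__main__":
    # Made-up observation for illustration only.
    print(to_delimited_line("X AND", 2000, 1, 1, 0.5, 9.3))
```

Many of the published tables may already list Julian Dates, in which case the date conversion can be skipped and only the formatting step is needed.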

How can you help?

People can help in a great number of ways. If you're thinking of participating in some way, ask yourself the following questions:

Do you know of, and have access to, a highly reliable OCR system?

Are you willing to proofread the results of OCR, or to manually enter lots of data on cloudy nights?

Are you willing to go through the literature and look for other sources of data from the 19th and early 20th centuries?

Are you good at writing Perl, PHP, or other scripting languages, or working with MySQL?

Are you good at organizing and motivating people, and willing to serve as a project leader and help organize and maintain the digitization effort?

All of these will be needed to make this project a success. It's the perfect way to spend a cloudy night, because you're still adding data to the archives, and making new science possible through your efforts. If you think you can help in some way, please contact Elizabeth Waagen at AAVSO Headquarters. Matthew is organizing the effort along with AAVSO Archivist Dr. Michael Saladyga, but the AAVSO will need a lot of extra help to make this happen. Whether you can digitize one page or an entire volume, you can help!

Update: 2010 August 26

Right now we have two volunteers contributing their efforts to this project: Bob Stine (SRB) has started to digitize the observations of the GK Persei nova outburst of 1901, and Kevin Paxson (PKV) has started on Harvard Annals vol. 63 -- many thanks to both for their participation!

As an example of how these data fit into the current archives, this image shows a light curve for the Mira variable X Andromedae. The black points are those that were already present in the AAVSO archives, while the red and blue points are visual observations by Leon Campbell and Annie Jump Cannon respectively, taken from Harvard Annals volume 63. As you can see, the Campbell and Cannon data add several more cycles to the light curve of this star. This is just one star of hundreds, and a handful of observations among thousands that we can add to the light curves.

There's much more to do and more volunteers would help make faster work of a big project. If you're interested in helping out -- regardless of whether you can do a page or a volume -- please contact us!