Description of the project.
This project is the Annotated Vital Records of Scituate. The first phase, described below, is to transcribe the Vital Records of Scituate, Massachusetts, to 1850 word for word for the web. There are two volumes, the first for births and the second for marriages and deaths, totaling about 900 pages. A web version of the vital records has numerous advantages. The three main ones are:

Universal access. The printed version is available in the town libraries of Massachusetts and at major genealogical libraries in the country. Once it's on the web, anyone with a web browser can read it anywhere in the world.

Word searching. Once the vital records are in digital form, they can be searched for particular words, parts of words, or clusters of words. Web search engines can catalog the vital records, and people can use the search engines to find the names of particular individuals of interest.

Navigation. With hyperlinks, it is easy to go from place to place in the vital records.
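
The three kinds of word searching mentioned above (particular words, parts of words, and clusters of words) can be sketched in a few lines of Python. This is only an illustration, not how any search engine actually works. The sample entries are invented for the example and are not taken from the Scituate volumes, and the function names are hypothetical.

```python
import re

# Invented sample entries in the general style of a printed vital record.
# They are NOT real entries from the Scituate volumes.
RECORDS = [
    "DOE, John, s. Richard and Jane, b. Jan. 1, 1700.",
    "DOE, Johnathan, s. Richard and Jane, b. Feb. 2, 1702.",
    "ROE, Mary, d. John and Ann, b. Mar. 3, 1704.",
]

def search_word(records, word):
    """Return the entries that contain `word` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [r for r in records if pattern.search(r)]

def search_part(records, fragment):
    """Return the entries that contain `fragment` anywhere, even inside a word."""
    fragment = fragment.lower()
    return [r for r in records if fragment in r.lower()]

def search_cluster(records, words):
    """Return the entries that contain every word in `words`, in any order."""
    return [r for r in records if all(search_word([r], w) for w in words)]
```

With these made-up entries, a whole-word search for "john" skips "Johnathan", while a part-of-word search for "john" finds it too.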

What to do after Phase 1? Phase 1 includes no annotations, but annotations are what interest me. Here are a few things I'm considering:

Include the intentions of marriage that were omitted because a marriage record was known

Include baptisms that were omitted because birth records were known

Include vital records of neighboring towns when relevant. For instance, many people resided in other towns but were married or baptized children in Scituate

Include standard spellings in order to make searching easier and more reliable

Include corrections to the vital records, such as those found in the appendix of the third volume of Jeremy Bangs's The Seventeenth Century Town Records of Scituate, Massachusetts

Include summaries of families that have been researched and published in books or in articles in journals

Include quotations from Deane's History of Scituate, Massachusetts, from Its First Settlement to 1831 and other sources. As Deane and others did make mistakes, the quotations may need qualifying comments

Include George Ernest Bowman's transcription of the early vital records as printed in the Mayflower Descendant

Include more census information. So far, only the 1790 census is here

These possibilities aren't all equally easy or important, and I haven't decided what to do. In any case, all annotations will be presented in color so it will be easy to distinguish original content from what's been added. And, of course, references will be cited for any such additions. I'm sure to start on one or more of these inclusions before Phase 1 is completed. (Transcription is particularly boring.)
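
As a rough illustration of the standard-spellings idea in the list above, here is a minimal Python sketch. It assumes a hand-built table of variant spellings; the table, the names in it, and the bracket notation are all hypothetical, chosen only for the example. The original spelling is always kept, and the standard form is added beside it, so the addition stays distinguishable from the original text.

```python
# Hypothetical variant table. A real one would have to be compiled
# from the spellings that actually occur in the records.
STANDARD_SPELLINGS = {
    "smyth": "Smith",
    "smithe": "Smith",
    "smith": "Smith",
}

def standardize(name):
    """Return the standard spelling of `name`, or `name` unchanged if unknown."""
    return STANDARD_SPELLINGS.get(name.lower(), name)

def annotate(name):
    """Keep the original spelling; append the standard one only when it differs."""
    standard = standardize(name)
    return name if standard == name else f"{name} [{standard}]"
```

A search for the standard spelling would then also find entries that were recorded under a variant.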

Phase 1, the transcription. The transcription phase will take me several months. The first installment (version 1.0, Jan 2002) of the vital records included over 200 pages of text, about 25% of the two volumes, primarily the pages that include families of personal interest.
The second installment (version 1.1, Apr 2003) extends that coverage to about 40%. It's going a little slower than I had imagined, since I haven't gotten much feedback on the project. I've gotten only two email messages since I put up the last version, over a year ago. In comparison, I get two messages a day about the genealogical reports I have up.

As source material, I've taken photocopied pages and scanned them into image files. I've also used the scanned images at http://genweb.net/~blackwell/books.html. These images were then converted into text files by OCR (optical character recognition). The error rate of the conversion was very high. To reduce it, some of the images had to be edited with Adobe Photoshop to align the text horizontally. Still, the OCR errors were innumerable. I think I could have reduced the errors by (1) scanning the original text for the images rather than scanning photocopies, (2) scanning at a finer setting (smaller pixels), and probably (3) using better software to perform the OCR.

Here is an example of text showing it (1) right after OCRing, and (2) in the final version.

As you can see, there's a lot of work to do to clean up the text. This example text is typical. Some pages were in a better state, and a few were so much worse that I had to type them in myself. Each page takes 15 minutes or more of work. Most of that is correcting OCR errors, but there is also markup to be done (such as boldface and tags for links), proofing, and keeping track of the state of every page. After editing a page, I reprinted it and compared it to the original. I did a fairly good job of proofing, but I'm sure plenty of errors remain. If you have any doubt about something here, check the original printed version, or the images of it, or even better, check the microfiche of the original records. Let me know if you find any errors.
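
To give a feel for the bookkeeping and the time involved, here is a minimal Python sketch of tracking the state of each page and estimating the work that remains. The state names are assumptions loosely based on the steps described above, not a record of how the pages were actually tracked. The figures of about 900 pages and 15 minutes per page come from the text, and since a page takes "15 minutes or more", the result is a lower bound.

```python
from enum import Enum

class PageState(Enum):
    """Hypothetical stages a page passes through; the names are assumptions."""
    OCRED = 1       # raw text straight from the OCR software
    CORRECTED = 2   # OCR errors fixed
    MARKED_UP = 3   # boldface and link tags added
    PROOFED = 4     # reprinted and compared against the original

def remaining_hours(states, total_pages, minutes_per_page=15):
    """Estimate hours left, counting only proofed pages as finished.

    `states` maps a page number to its PageState; pages missing from
    the mapping are treated as not yet started.
    """
    done = sum(1 for s in states.values() if s is PageState.PROOFED)
    return (total_pages - done) * minutes_per_page / 60
```

With nothing proofed, 900 pages at 15 minutes each comes to 225 hours, which helps explain why the transcription is taking a while.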