Man of High Fidelity in the Public Domain

Saturday, February 4, 2012

While reading The Master Switch by Tim Wu I became increasingly interested in the story of Edwin Howard Armstrong, who is credited with inventing numerous technologies, including the wireless technology behind FM radio. His complex and extremely compelling life story was best told by the journalist Lawrence Lessing, who wrote a "biographic eulogy" of Armstrong.1

The 320-page text was copyrighted on November 9, 1956,2 and published on November 28, 1956.3 The Library of Congress Catalog Card Number is 56-11677. The original price of the book was $5.00, as listed on the dust jacket and in a New York Times Book Review. The author of that article died on November 27, 1956, which was twelve days before the article was published.

A first edition of the text was added to the Dedham Public Library on February 13, 1958. Two more copies are available through the Minuteman Library Network. A copy of the book is available from the Internet Archive. It is encrypted by the National Library Service with the DAISY standard.

In order to legally distribute the biography online I must receive permission from the copyright holder, prove that the book is in the public domain, or rely on a legal exemption. Works published in the United States before 1964 whose copyrights were not renewed may have entered the public domain. This is because books published before 1964 had to have their copyrights renewed at the Library of Congress Copyright Office in their 28th year, which would have been 1984 for Lessing's biography. A circular published by the Copyright Office states:

Works First Published or Copyrighted Between January 1, 1950, and December 31, 1963 · If a work was in its first 28-year term of copyright protection on January 1, 1978, it must have been renewed in a timely fashion to have secured the maximum term of copyright protection. If renewal registration was made during the 28th calendar year of its first term, copyright would endure for 95 years from the end of the year copyright was originally secured. If not renewed, the copyright expired at the end of its 28th calendar year.
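The arithmetic behind those dates is simple enough to sketch in shell. The function names below are hypothetical, and the sketch only assumes what the circular describes: a renewal due in the 28th year, and a renewed term lasting 95 years from the end of the year copyright was secured.

```shell
#!/bin/sh
# Hypothetical helpers; they only restate the arithmetic in the circular.

# Year in which a renewal had to be registered (the 28th year, as counted in this post).
renewal_year() {
    echo $(( $1 + 28 ))
}

# Last year of protection if the work was renewed (95 years from the year secured).
renewed_term_end() {
    echo $(( $1 + 95 ))
}

renewal_year 1956      # prints 1984
renewed_term_end 1956  # prints 2051
```

For a work copyrighted in 1956, a renewal would therefore have to appear in the 1984 records. Without one, the copyright would have expired at the end of that year.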

Another edition was published in 1969 by Bantam Books with a new foreword by the author.4 However, that edition was published 13 years after the original, so it would not qualify as the required renewal. Bantam Dell Books can be contacted at [email protected].

The Public Catalog of the Copyright Office only indicates the original copyright date of 1956 and an original screenplay titled High Fidelity by Robert Mondlock, which was copyrighted in 1984.

Project Gutenberg will likely be unable to verify that Man of High Fidelity is in the public domain, because of the incredible amount of research required. Their website states that they "posted all Copyright Renewal records for books from 1950 through 1977", but I cannot find a link to that post.5 A good resource for further information may be a Project Gutenberg mailing list.

Distribution

If it can be verified that Man of High Fidelity is in the public domain, then Project Gutenberg and Distributed Proofreaders provide excellent resources for how to process a physical book into a digital form that can be easily distributed. Generally, the process involves scanning pages of the book to image files, converting the images to text using Optical Character Recognition software, checking the conversion for inaccuracies and finally formatting the text into a digital format.

I have access to an HP ScanJet 5370C scanner, which has "Good" support by the free software API, SANE (see: SANE: Supported Devices). After a little testing I scanned the first 20 pages of Man of High Fidelity with the following command:

The scanimage application is bundled with the sane package. Brief descriptions of the flags used in the above example can be found by running scanimage -h. As outlined in the Project Gutenberg: Scanning FAQ, the resolution of the scan should be between 300 and 600 dpi, since higher-resolution scans increase scan time far more than they improve OCR accuracy. I found a resolution of 300 dpi to be very accurate when combined with image modification. Unfortunately, the --overscan-top and --overscan-bottom flags did not appear to work properly.
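For readers who want a starting point, here is a minimal sketch of a batch scan. It is not the command used for this post. The pp prefix is assumed from the file names used in the later steps, --resolution is a backend option that may be named differently for other scanners (scanimage -h lists what a given device accepts), and the function name and DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# A sketch only; it is not the command used in the original post.
RESOLUTION=300   # the resolution found to be accurate above
PAGES=20         # number of pages to scan
PREFIX=pp        # assumed from the pp*.tiff file names used later

scan_batch() {
    # Build the command once so it can be either printed or executed.
    set -- scanimage --format=tiff --resolution "$RESOLUTION" \
        --batch="${PREFIX}%d.tiff" --batch-start=1 \
        --batch-count="$PAGES" --batch-prompt
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"    # default: only print the command
    else
        "$@"         # DRY_RUN=0 actually drives the scanner
    fi
}

scan_batch
```

With --batch-prompt, scanimage waits for a key press before each page, which leaves time to turn the page on a flatbed scanner. Running the script with DRY_RUN=0 performs the scan instead of printing the command.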

In order to improve the accuracy of the OCR application, I rotated the images and increased the contrast using the following command:

mogrify -rotate 90 -level 100 pp*.tiff

Further adjustments and testing are certainly warranted at this step. Increasing the contrast of the image to 100 substantially improved the accuracy of the OCR.
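Because mogrify overwrites the files it processes, it may be worth copying the raw scans somewhere safe before experimenting with other settings. The sketch below assumes the pp*.tiff naming used above. The function name and the raw directory are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy the raw scans aside before mogrify changes them in place.
backup_scans() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    for f in "$src_dir"/pp*.tiff; do
        [ -e "$f" ] || continue           # skip the unexpanded pattern if nothing matched
        cp -p "$f" "$dest_dir"/ || return 1
    done
}

# Example use (not executed here):
#   backup_scans . raw && mogrify -rotate 90 -level 100 pp*.tiff
```

If a setting makes the OCR worse, the originals can be copied back from the backup directory and processed again.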

There are three well-regarded OCR tools that are easily accessible from most Linux distributions: GOCR, Ocrad, and Tesseract-OCR. After reading a review of all three I decided to use Tesseract-OCR, without testing the other two applications. Its substantial benefit is that it provides language-specific word libraries. As such, Tesseract-OCR should be run with the language flag:

tesseract pp1.tiff pp1 -l eng
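The same command can be repeated for every scanned page with a small loop. This is a sketch, assuming the pp*.tiff naming used above. The function name and the OCR_CMD override are hypothetical, and the override exists only so the loop can be tried without Tesseract installed.

```shell
#!/bin/sh
# Hypothetical helper: run the OCR command on every pp*.tiff file in a directory.
ocr_all() {
    dir=${1:-.}
    for img in "$dir"/pp*.tiff; do
        [ -e "$img" ] || continue         # skip the unexpanded pattern if nothing matched
        # tesseract appends .txt to the output base, so pp1.tiff becomes pp1.txt
        "${OCR_CMD:-tesseract}" "$img" "${img%.tiff}" -l eng
    done
}

# Example use (not executed here):
#   ocr_all .
```

Setting OCR_CMD=echo prints the commands that would be run, which is a cheap way to check the file names before starting a long conversion.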

The most common conversion errors to be aware of are punctuation errors. Although many of the errors will result from processing a character as a number, those mistakes are very noticeable when using a spell checking extension. The punctuation errors can be far more subtle, often confusing a period and comma.
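A rough grep pass can point at some of these errors before proofreading by eye. The patterns below are my own heuristics, not part of any Project Gutenberg tool: they flag digits touching letters, and a period followed by a space and a lowercase letter. They will also flag legitimate text such as "28th", so the output is a list of places to look rather than a list of mistakes. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list lines that contain likely OCR slips, with line numbers.
flag_suspects() {
    # [[:alpha:]][[:digit:]] or [[:digit:]][[:alpha:]] : a digit inside a word, e.g. "Fi1e"
    # \. [[:lower:]]                                   : a period that may be a comma, e.g. "radio. and"
    grep -nE '[[:alpha:]][[:digit:]]|[[:digit:]][[:alpha:]]|\. [[:lower:]]' "$@"
}

# Example use (not executed here):
#   flag_suspects pp1.txt
```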

As a test, I used this process to digitize the first chapter of Lessing's biography.

November 2016 Project Gutenberg submission

The above text has been revised since the original publication of this page. The updated text on the Project Gutenberg Copyright How-To as of 11/05/2016 is as follows:

Rule 6: Failure of U.S. Works to Comply With Renewal Requirements Prior to 1964

Rule 6 is currently under testing and revision. A future version of Rule 6 will describe the steps for demonstrating a U.S. Work is no longer protected by U.S. copyright law because of failure to comply with former renewal requirements.