www.dloc.com

SECTION 1 About dLOC and Contacts

About

The Digital Library of the Caribbean (dLOC) is a cooperative of partners within the Caribbean and circum-Caribbean that provides users with access to Caribbean cultural, historical and research materials held in archives, libraries, and private collections. dLOC comprises collections that speak to the similarities and differences in histories, cultures, languages and governmental systems. Types of collections include but are not limited to: newspapers, archives of Caribbean leaders and governments, official documents, documentation and numeric data for ecosystems, scientific scholarship, historic and contemporary maps, oral and popular histories, travel accounts, literature and poetry, musical expressions, and artifacts.

dLOC has become a premier international collection of Caribbean research resources. Established in 2004, dLOC has grown from the initial nine partners to more than twenty. dLOC is a content-contributing partner organization, and all materials in dLOC will remain freely and fully available as open access. dLOC builds capacity in the region to support digitization and preservation, so that partners can provide access to their holdings locally and internationally.

As of August 2011, the collection consisted of more than 10,000 titles and 1.5 million pages of content and registered over 500,000 hits per month. dLOC's diverse partners serve an international community of scholars, students, and citizens by working together to provide enhanced electronic access to cultural, historical, legal, governmental, and research materials in a common web space with a multilingual interface.

SECTION 2 Selection and Collection Development

Selection

The Digital Library of the Caribbean is a cooperative digital library among partners within the Caribbean and circum-Caribbean that provides users with access to Caribbean cultural, historical and research materials currently held in archives, libraries, and private collections. In addition to facilitating access, dLOC also provides preservation of all files, including the high resolution archival master files, in the Florida Digital Archive.

dLOC welcomes collections that address the histories, cultures, languages and governmental systems of Caribbean countries. The types of collections appropriate for dLOC include but are not limited to: newspapers, archives of Caribbean leaders and governments, official documents, documentation and numeric data for ecosystems, scientific scholarship, historic and contemporary maps, oral and popular histories, travel accounts, literature and poetry, musical expressions, and artifacts. If you have another collection that you think may be appropriate, please contact us to discuss the collection.

Copyright and Permissions

Copyright and permissions are covered in the Appendix: Guide to Requesting Permissions.

Collection Development

We encourage partners to develop a digital collection plan to ensure that content is converted into digital format based on its anticipated user demand and its need for digital preservation. We also encourage partners to digitize materials that can be developed into thematic or topical digital collections. These are productive for showcasing materials together for use in online exhibits and lesson plans, and for promoting digital preservation of additional materials.

SECTION 4 Software Overview

myDLOC Online Tools

The freely available myDLOC online tools allow each partner to track the digitization of items and collect data about the digital resources. Log in for access to the myDLOC tools at: http://www.dloc.com/my

The myDLOC online tools support:
o New item entry and online metadata creation
o Uploading of digitized materials
o Tracking the digitization of a resource through each of the required milestones
o Tracking of all submitted items, including both metadata-only and full items with metadata and files

The myDLOC tools are updated regularly and automatically online. See the Appendix: Guide to myDLOC for current functionality.

SobekCM METS Editor with FTP

The SobekCM METS Editor with FTP is freely available software that is installed on local workstations. In conjunction with the myDLOC tools, it allows each partner to create metadata, track the digitization of items, and collect data about the digital resources. The SobekCM METS Editor with FTP is available at: http://dloc.com/software/mets

The SobekCM METS Editor with FTP supports:
o New item entry and metadata creation (SobekCM METS Editor)
o Uploading of digitized materials (FTP)
o Tracking the digitization of a resource through each of the required milestones (myDLOC online tools)
o Tracking of all submitted items, including both metadata-only and full items with metadata and files (myDLOC online tools)

The SobekCM METS Editor is periodically updated and the new software versions are available for download online. The myDLOC tools are updated regularly and automatically online. See the Appendix: Guide to the SobekCM METS Editor for more information.


SECTION 5 Metadata and Preparing for Digitization

Introduction

Now that you have selected your items for conversion to digital media, we will prepare for the digitization. This section will cover:
o Starting a new item
o Describing your item (creating the bibliographic metadata)

Before continuing, collect any information you have about the items chosen: catalog records, spreadsheets, cards, finding aids.

Example 1: Starting a new item online with myDLOC

We will need to enter some basic information about your item.

1. Log in to myDLOC: http://www.dloc.com/my
2. Create a new item by selecting Start a new item from the main menu.
3. The permissions agreement screen will appear. To continue, this agreement must be read and accepted.


4. The metadata entry screen will appear. Here the holding and source institution will automatically be listed as the partner institution. The title and resource type are the only required fields. Additional metadata can be entered now or can be added at a later time. For more on recommended metadata, please see the dLOC Metadata Guide: http://dloc.com/A00002864/


5. After entering the title, resource type, and any additional available metadata, click Next to see the upload files screen, shown below.
6. Upload digital files for your item and click Submit to complete the item. Any file type can be uploaded, and multiple files can be uploaded. Or, to submit metadata only, click Submit without uploading files to complete the item creation. When submitting metadata only, you can add files later: go to the item online when logged in and select Manage Files from the top of the screen, as shown below.


7. From the same menu, select View Work History to see the tracking of all aspects of the item: milestones, history, archives, and file directory information. These can be printed as tracking reports for the item, as shown in the example below.


Example 2: Starting a new item with the SobekCM METS Editor and FTP

We will need to enter some basic information about your item.

1. Open the SobekCM METS Editor to see the first screen.


2. Create a new item by selecting Create new METS file from the main menu.
3. Select the folder that contains your resource files or where you would like to have the METS file saved. The folder should be named in the proper BIB_VID format as shown in the screenshot above.


4. Selecting the folder will bring up the screen to enter and create the new metadata file.
5. Enter the metadata in the tabs for Material Information, Subjects and Notes, and Record Information. Again, only the title and material type are required.


6. To add files as listed in the metadata, click on Structure Map. From this tab, you can add files, which are then shown within the structure map.
7. After entering the metadata and using the Structure Map to add the files, click Finish to save the METS file and close the editing form. If the digital files are not yet created, click Finish for now. After the digital files have been created and added to the folder, open the METS file again and edit the Structure Map to add the files.
8. Once the digital files and the METS metadata file are in the folder created at the beginning of this process, the digital object package is ready for transfer to dLOC.
9. To transfer the files to dLOC, use an FTP program to transfer the full digital package to dLOC over the Internet. Alternatively, the complete folder with all files and the metadata file can be loaded onto an external hard drive and shipped to the University of Florida for ingest into dLOC. Please contact the University of Florida at ufdc@uflib.ufl.edu if you need a hard drive mailed to you for transferring files this way.
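Before transferring a package (steps 8 and 9), it can help to check the folder locally. The Python sketch below is only an illustration and is not part of the dLOC or SobekCM software. It assumes the BIB_VID pattern suggested by this guide's example IDs (two letters and eight digits, an underscore, then five digits, as in CA00000001_00001) and assumes the METS file is saved with a .mets or .xml extension. The name check_package is hypothetical.

```python
import re
from pathlib import Path

# Assumption: BIB_VID looks like the example IDs in this guide, e.g. CA00000001_00001.
BIB_VID_PATTERN = re.compile(r"^[A-Z]{2}\d{8}_\d{5}$")


def check_package(folder):
    """Return a list of problems found in a digital object folder (empty if none)."""
    folder = Path(folder)
    problems = []
    if not BIB_VID_PATTERN.match(folder.name):
        problems.append(f"folder name {folder.name!r} is not in BIB_VID format")
    # A missing folder is treated as an empty one, so every check below fails.
    files = [p for p in folder.iterdir() if p.is_file()] if folder.is_dir() else []
    # Assumption: the METS metadata file ends in .mets or .xml.
    if not any(p.suffix.lower() in {".mets", ".xml"} for p in files):
        problems.append("no METS metadata file found")
    if not any(p.suffix.lower() in {".tif", ".tiff"} for p in files):
        problems.append("no TIFF image files found")
    return problems
```

An empty list means the folder passed these three checks. It does not confirm that the METS file lists every image, which still has to be checked in the Structure Map.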


SECTION 6 Image Theory and Specifications

dLOC Requirement for Digital Master Files
o 8-bit Grayscale or 24-bit RGB Color (depending on whether original has significant color)
o 300 dpi for standard text or 600 dpi for stand-alone images (photographs, maps)
o Save archival files as uncompressed TIFFs

Bit Depth

In digitization, three levels of Bit Depth are widely used: 1 Bit, 8 Bit, and 24 Bit images.

A 1 Bit image is referred to as bi-tonal or, less precisely, as black-and-white. Each picture element of a 1 Bit image is expressed in a single bit. That bit may be either one color or an alternate, most frequently either black or white.

An 8 Bit image is referred to as grey-scale, though an 8 Bit image may represent a very limited color spectrum as well. Most scanning equipment defaults 8 Bit imaging to grey-scale. The picture elements of an 8 Bit image are expressed in strings of eight bits, for example: 00001111. 8 Bit images allow for as many as 256 shades or colors (values 0 through 255).

A 24 Bit image is referred to as true color or, less precisely, as a color image. The picture elements of a 24 Bit image are expressed in strings of twenty-four bits. 24 Bit images allow for as many as 16,777,216 shades or colors. You may hear digitization specialists using the short-hand "sixteen million colors". The 24 bits are divided into three 8 Bit channels, one for each of three composite colors (Red, Green, and Blue).

Color Space

Color fidelity is fundamental to accurate reproduction of source materials. Digitization that is faithful to original colors requires a basic understanding of color and of how color reproduction differs between printing technology and digital technology. Fundamental to these differences is the medium on which a color image is reproduced.

The color space most commonly used by digitization projects, and required by dLOC, is a standardized Red/Green/Blue (sRGB) color space.
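The bit-depth figures above follow from simple arithmetic: n bits give 2 to the power n distinct values. The short Python sketch below checks the numbers and shows how a 24 Bit value divides into three 8 Bit channels. The function names are illustrative, and the red-green-blue packing order is an assumption made for the example, not something the guide specifies.

```python
def shade_count(bit_depth):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bit_depth


def split_rgb(pixel):
    """Split a 24 Bit pixel value into three 8 Bit channels.

    Assumption: red occupies the highest 8 bits, then green, then blue.
    """
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


for bits in (1, 8, 24):
    print(f"{bits:>2} Bit: {shade_count(bits):,} possible values")
```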


Choosing the Appropriate Bit Depth and Color Space

[Figure: the same page shown as a 1 Bit Image, an 8 Bit Image, and a 24 Bit Image]

dLOC recommends that 1 Bit imaging should not be used. 1 Bit images, even at very high resolution (see Resolution, below), tend to pixelate text. Imperfections on the page or artifacts of age may read as black, obscuring text in 1 Bit images. In the 1 Bit page image above, bleed-through from the text printed on the reverse of the page, as well as artifacts of age, obscures the text. Obscured text will introduce imperfections that reduce the accuracy of text conversion by optical character recognition (OCR) software.

The 8 Bit grayscale image above captures the textual information, and the reader of the page can make sense of the text. Readers of Latin religious texts, such as that seen above, will recognize the red text as instructions to the faithful, commentary on the spoken text of a religious service, or the narrative of the priest as opposed to that of the congregation's response.

dLOC advocates preserving meaningful color. Meaningful color is color required to interpret the text. In the case of a newspaper with colored images, a color image accompanying an article demonstrates meaningful color, while a color advertisement may not.

It is true that the greater the Bit Depth, the greater the size of the digital image file. But digitization technicians are encouraged to produce images that meet the reader's needs rather than the technician's need to conserve space.
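To give a sense of how Bit Depth drives file size, the Python sketch below estimates the pixel data in an uncompressed image. It is a rough estimate under stated assumptions: the page size and resolution are example values, and TIFF header and tag overhead is ignored.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bit_depth):
    """Estimate the pixel data size of an uncompressed scan, in bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# Example: an 8.5 x 11 inch page scanned at 300 ppi.
for bits in (1, 8, 24):
    size_mb = uncompressed_size_bytes(8.5, 11, 300, bits) / 1_000_000
    print(f"{bits:>2} Bit: about {size_mb:.1f} MB")
```

Under these assumptions the 24 Bit scan is three times the size of the 8 Bit scan, which is the storage cost the paragraph above asks technicians to accept when color is meaningful.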


Resolution

The resolution of digital images is expressed in terms of pixels. A pixel is a picture element or, simply, a block of solid shade or color that, together with other picture elements, comprises a digital image.

[Figure: Trinidad and Tobago's Coat of Arms (zoom area in black box)]

RESOLUTION                               USE FOR
300 pixels per inch (ppi)                Printed text with normal sized fonts
  or 118 pixels per centimeter (ppc)     Oversized documents and maps
                                         Manuscripts with legible script
600 pixels per inch (ppi)                Photographs and select graphic arts
  or 236 pixels per centimeter (ppc)     Printed text with very small fonts
                                         Manuscripts with difficult scripts

dLOC's minimum digital resolution standard for printed text with normal sized fonts is 300 pixels per inch (ppi) or 118 pixels per centimeter (ppc). This threshold is based on both the characteristics of printed graphics and optical character recognition (OCR) tests.

300 ppi / 118 ppc

The Rationale for Printed Graphics

In general, the resolution of printed graphics does not exceed 300 dots per inch (dpi) or 118 dots per centimeter (dpc). Dots per inch/centimeter are rough equivalents of pixels per inch/centimeter, so comparison is appropriate.
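The per-inch and per-centimeter figures above are related by the factor 2.54 centimeters per inch. The Python sketch below shows the conversion and the pixel dimensions that result for a page of a given size. The function names and the A4 page in the usage example are illustrative choices.

```python
CM_PER_INCH = 2.54


def ppi_to_ppc(ppi):
    """Convert pixels per inch to pixels per centimeter."""
    return ppi / CM_PER_INCH


def pixel_dimensions(width_cm, height_cm, ppi):
    """Pixel width and height of a scan of the given physical size."""
    ppc = ppi_to_ppc(ppi)
    return round(width_cm * ppc), round(height_cm * ppc)


print(round(ppi_to_ppc(300)), round(ppi_to_ppc(600)))  # 118 236
print(pixel_dimensions(21.0, 29.7, 300))               # A4 page at 300 ppi
```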


[Figure: Carifesta 72 logo printed in Guyana's Sunday Post and Weekend Argosy (zoom area in red box)]

Graphics printed in newspapers, for example, often have 80 to 100 dpi (32 to 40 dpc). Most graphics in magazines are printed with 120 dpi (48 dpc) print resolution, while graphics in high-end magazines and on post-cards are printed with 300 dpi (118 dpc) print resolution. Digitization of printed graphics at a resolution greater than 300 ppi (118 ppc) would be excessive.

The Rationale for Optical Character Recognition (Text Generation)

When a document page is digitized, an image of the page is created. All text page images sent to dLOC's central servers are subject to Optical Character Recognition (OCR). OCR is a process by which page images are converted to searchable text.

Several OCR programs are in common use. Most are optimized for the conversion of images digitized with 200, 300, 400 or 600 ppi (80, 118, 158 or 236 ppc). Images created with other resolutions can be converted to searchable text but, generally, with less accurate results.

Resolution and OCR Accuracy in high contrast images

IMAGE            OCR RESULTS
75 ppi Image     L ~iC ud L~ddPa
150 ppi Image    Label C of d Laid Pa
300 ppi Image    Label C of d Laid Pa
600 ppi Image    Label C of d Laid Pa


The Importance of Bit-Depth on Text Recognition: the Latin word Feltis = Goodness

[Figure: one letter of the word shown at three bit depths]
o 1 Bit Image: This letter may be any of the following: c, e, o, 0
o 8 Bit Image: This letter may be any of the following: c, e, o, 0
o 24 Bit Image: The letter e appears now to be more probable.

dLOC central servers use enterprise-level OCR software, configured with multiple OCR engines to ensure a high level of accuracy in text generation.

For printed texts with normal size fonts, whether plain (sans serif) or embellished (serif), tests demonstrate that the average modern printed document is accurately recognized at 200 ppi (80 ppc). dLOC sets a slightly higher standard, 300 ppi (118 ppc), for printed texts with normal size fonts to compensate for occasional uses of small fonts or colored, aged (discolored), or blemished paper.

Digitization of normal printed texts at higher resolution (e.g., 600 ppi/236 ppc), in tests, generally showed no increase in text conversion accuracy. 600 ppi/236 ppc images result in higher conversion accuracy only when the source document is printed with very small fonts.

600 ppi / 236 ppc

dLOC recommends digitizing at 600 ppi (236 ppc) only when working with printed texts with very small fonts, photographs and other continuous-tone graphics, and manuscripts with difficult scripts.
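The resolution guidance above amounts to a small lookup: 300 ppi by default, 600 ppi for the listed exceptions. The Python sketch below encodes that rule so a scanning checklist or script could apply it consistently. The category labels and the function name are hypothetical and chosen only for this illustration.

```python
# Hypothetical category labels; the 300/600 split follows the guidance above.
PPI_600_MATERIALS = {
    "printed text, very small fonts",
    "photograph",
    "continuous-tone graphic",
    "manuscript, difficult script",
}


def recommended_ppi(material):
    """Return the scanning resolution suggested for a material category."""
    return 600 if material in PPI_600_MATERIALS else 300


print(recommended_ppi("printed text, normal fonts"))  # 300
print(recommended_ppi("photograph"))                  # 600
```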


Photographs

Photographs, unlike printed graphics, have continuous tone. In the source document, one shade or color blends into adjacent shades and colors. Continuous-tone images may be digitized at any resolution. dLOC recommends 600 ppi (236 ppc) resolution to facilitate special uses of images.

Users of digital photographs frequently consult images as much for their various subjects as for the whole image. A user may want to zoom in on the jewelry or hair braids in the photograph of a woman, or on shop signs in the photograph of a street scene.

dLOC central servers use JPEG 2000 technology to facilitate zoom. Images digitized at 600 ppi (236 ppc) produce clearer, sharper, and more readable images than do 300 ppi (118 ppc) images.

Saving Files and Image Compression

Once the digital image is created, there remains the issue of saving or archiving the file. The digitization technician prefers not to lose a quality image to the imperfections of file saving and image compression routines.

[Figure: the same image saved as TIFF, JPEG, and GIF] TIFF contains all image data. JPEG compresses the image, seen here at leaf edges. GIF also compresses the image, seen here in color patches.


Saving Files

When saving an image file, the technician has a choice of file types, commonly including GIF, JPEG and TIFF. GIF and JPEG (sometimes: JPG) are Internet-deliverable file formats. Only the TIFF (sometimes: TIF; Tagged Image File Format) is considered archival within the international digital library community. It alone serves as a digital master. There are several reasons for this, primarily how image quality holds up under image compression. The illustration above demonstrates image quality issues as a factor in file choice.

For speed of access online, dLOC creates additional derivative or secondary file formats from the digital master. With the digital master in TIFF, all needs are supported, which again demonstrates the importance of saving in the TIFF format.

Image Compression

When saving an image file, often regardless of file type, the technician will be given the opportunity to compress the image. Compression saves file space but can produce unwelcome artifacts. There are two classes of compression: lossy and lossless.

A lossless image contains every bit of information created during the scanning process. Lossless compression schemes do exist, but dLOC asks for uncompressed TIFFs, so the digital master is stored with no compression at all. Here is another simplification: when the scanner captures the bit-stream 1 1 1 1, the lossless file saves 1 1 1 1. Though this makes for large files, it also makes for an ideal archival format and, therefore, one that is optimal for file recovery should the digital master ever be damaged in use or degrade in storage.

Lossy data compression technologies attempt to eliminate redundant or unnecessary information, storing a mathematical representation of the eliminated data. Here is yet another simplification: when the scanner captures the bit-stream 1 1 1 1, the lossy file saves a representation of 4. Because lossy images generate smaller files, they can be delivered to readers via the Internet quickly.

The human eye compensates for image loss by filling in the gaps. But, because there is image loss, recovery from damage or degradation is more difficult and, in many cases, may be impossible without great expense.
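The practical difference between the two classes is whether the original values can be recovered. The Python sketch below illustrates this on a handful of made-up 8 Bit pixel values. It uses the standard library's zlib as a stand-in for lossless storage, and a crude bit-dropping step as a stand-in for lossy compression. Neither is the algorithm used by TIFF, JPEG, or GIF.

```python
import zlib

# Hypothetical 8 Bit pixel values from a scan.
scanned = bytes([200, 201, 199, 200, 50, 52, 51, 50])

# Lossless round trip: every byte is recovered exactly.
restored = zlib.decompress(zlib.compress(scanned))

# Lossy stand-in: keep only the top 4 bits of each pixel.
# The discarded low bits cannot be reconstructed afterwards.
quantized = bytes((value >> 4) << 4 for value in scanned)

print(restored == scanned)   # True
print(quantized == scanned)  # False
print(list(quantized))       # [192, 192, 192, 192, 48, 48, 48, 48]
```

After the lossy step the small differences between neighboring pixels are gone, which is the kind of detail that cannot be restored from a damaged or degraded derivative.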

SECTION 7 Scanning

Creating Directories

Before scanning, create the folder(s) in which you will save the scanned images. For each item, a separate folder should be created at C:/DLOC/ with the appropriate dLOC ID, for example:
o C:/DLOC/CA00000001_00001
o C:/DLOC/CA00000002_00001
o C:/DLOC/CA00000003_00001

dLOC Requirement for Digital Master Files
o 8-bit Grayscale or 24-bit RGB Color (depending on whether original has significant color)
o 300 dpi for standard text or 600 dpi for stand-alone images (photographs, maps)
o Save archival files as uncompressed TIFFs

Flatbed Scanning: Epson Expression 10000 XL

The following screenprints are specific to the Epson Expression scanner, but the same settings apply to any flatbed scanner.

Scan Settings

Scan documents using Adobe Photoshop rather than the scanner's stand-alone image capture software, and check the scanner settings with each new document. Before opening Adobe Photoshop, turn on the scanner and make sure that the bed is clean and free of any dust, debris, etc. If necessary, clean the glass with a lint-free cloth and a very small amount of glass cleaning fluid.

1. Launch Adobe Photoshop
2. Select: File > Import > Epson Expression 10000 XL

   1. 300 dpi for mostly textual items
   2. 600 dpi for stand-alone image items (photographs, maps, etc.)
e. Click the CONFIGURATION button (below the Preview and Scan buttons)
   1. Click the COLOR tab
   2. Select NO COLOR CORRECTION
   3. Click OK

Scanning

1. Place item, image down, on scanner glass. Be careful to place the item as straight as possible in order to save time later. Close the scanner lid as much as the item permits.
2. Click the PREVIEW button in the Scan Settings window. A small preview of your image will appear in the preview window. Make sure the entire document is visible; if not, reposition it on the glass and preview again.
3. Draw a bounding box around your entire image. If your original has 2 pages facing each other, draw a second box by selecting the dual marquee button. Arrange each box to completely include each side of the item. Once you are satisfied with the positioning of the boxes, click the ALL button. DO NOT move the boxes or change settings after pressing this button!
4. Click the SCAN button

Saving Files

1. Save your image by selecting: File > Save
2. Select the dLOC ID folder to which the image being saved belongs, that is, the separate folder at C:/DLOC/ with the appropriate dLOC ID, for example:
   o C:/DLOC/CA00000001_00001
   o C:/DLOC/CA00000002_00001
   o C:/DLOC/CA00000003_00001
3. Type in a sequential four-digit file name, such as 0001, 0002, 0003, etc.
4. Select TIFF from the file format drop-down menu
5. Always uncheck the ICC profile box
6. Click Save. For TIFF Options, select the same settings as shown below.
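The folder and file naming conventions used in this section can be generated rather than typed by hand. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the .tif extension is assumed, and the C:/DLOC root is taken from the examples in this section.

```python
from pathlib import Path


def item_folder(dloc_id, root="C:/DLOC"):
    """Path of the folder for one item, named by its dLOC ID."""
    return Path(root) / dloc_id


def page_file_names(page_count):
    """Sequential four-digit file names: 0001.tif, 0002.tif, ..."""
    return [f"{n:04d}.tif" for n in range(1, page_count + 1)]


def prepare_item_folder(dloc_id, root="C:/DLOC"):
    """Create the item folder if it does not exist yet and return its path."""
    folder = item_folder(dloc_id, root)
    folder.mkdir(parents=True, exist_ok=True)
    return folder


print(item_folder("CA00000001_00001").as_posix())
print(page_file_names(3))
```

Calling prepare_item_folder before a scanning session ensures the target folder exists, so that each saved image only needs the next name from page_file_names.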


SECTION 8 Image Correction

The intent of any digitization should be a faithful reproduction of the original document. Toward this goal, images will need to be deskewed and cropped to fit the in-hand original. In addition, it may be desirable to perform color correction to reproduce either the in-hand original or the original state of the document. Applying these techniques in Adobe Photoshop is the topic of this section.

Image Correction in Adobe Photoshop

1. To straighten drastically skewed images:
   a. Click and hold the Eyedropper Tool in the Photoshop tool box, then select the Measure Tool
   b. Click and draw a line to follow the bottom of any printed text, line or image (line is red, here, for purposes of illustration)
   c. Select: Image > Rotate Canvas > Arbitrary (DO NOT change the angle), then click OK
2. Crop the image using the crop tool to remove any excess borders added during straightening
3. If necessary (e.g., if the image is muddy), adjust the levels/histogram by selecting Image > Adjustments > Levels


Adjust the black, white and midpoints to improve your image quality and contrast. If the image is COLOR, you may make histogram adjustments for each RGB channel: Red, Green and Blue. But do not over-correct and eliminate detail.

A histogram shows the distribution of tones over a range. The image characterized by the histogram above is predominantly white. While the image contains shades of gray, deeper tones of black are almost entirely absent.

4. Images with good, thick printed text can also be quickly corrected by selecting the document's white point. This is done by opening the levels/histogram window by selecting Image > Adjustments > Levels. In the Levels window, select the eyedropper furthest to the right and then select the point in your image that should be the brightest white. The images below show this effect before and after the white point selection. You will notice that the background becomes almost uniformly white, but the text is also lightened. Before selecting OK in the Levels window, you will need to bring in the black point in order to improve the text. This is done by moving the arrow furthest to the left in towards the right. You will notice that the numbers in the Input Levels boxes increase. It is helpful to perform this correction while zoomed in to 100% on your image, as shown below.
5. If the image is extremely stained, the document should be scanned in RGB and, if possible, the stains should be lightened using Image > Adjustments > Replace Color


www.dloc.com 30 Select Image and not Selection in th e Replace Color Window. Then using the eyedropper tool select the darker color of the stain. Adjust the Lightness, Saturation and Hue slider bars as needed to minimize th e stains. The fuzziness meter indicates how closely a color must match the selected color to be replaced. Be aware that stains may be similar in color to text and therefore too much manipulation is undesirable in order to not lose information. Often it is useful to zoom into one section of text while performing the color replacement. One must be careful not to make the text harder to read for the OCR engine. 6. Remember that any adjustments done to im ages can be undone as long as the file remains open. Maintain your history window open by selecting Window History in Photoshop, then simply select the previous step done. You can always go back several steps and re-correct your image. Other Adobe Photoshop Resources The original Adobe Photoshop installation package should include a tutorial of the software you purchased. In addition, Adobe has an on-line resource at the following URL: http://www.adobe.com/products/tips/photoshop.html Adobe, the Adobe Logo, and Photoshop are either regi stered trademarks or trad emarks of Adobe Systems Incorporated in the United St ates and/or other countries.