Revision as of 16:05, 2 November 2012

What is our monthly count?

We're currently counting TIFF and WAVE files in the archive, in the deposits directories, and on the share drive, and MP3s and large JPEGs in Acumen.
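A minimal sketch of the per-location file filtering, assuming the location names, the extension sets, and the size cutoff for a "large" JPEG below (all hypothetical; this page does not say how "large" is defined or how the real scripts filter files):

```python
import os
from collections import Counter

# Hypothetical mapping of location -> extensions that count toward the total.
# TIFF and WAVE in the archive, deposits and share; MP3 and large JPEG in Acumen.
COUNTED_EXTENSIONS = {
    "archive": {".tif", ".tiff", ".wav", ".wave"},
    "deposits": {".tif", ".tiff", ".wav", ".wave"},
    "share": {".tif", ".tiff", ".wav", ".wave"},
    "acumen": {".mp3", ".jpg", ".jpeg"},
}

# Hypothetical cutoff separating "large" JPEGs from thumbnails.
LARGE_JPEG_MIN_BYTES = 500_000


def count_files(location: str, root: str) -> Counter:
    """Count files under root that are included in the monthly count."""
    wanted = COUNTED_EXTENSIONS[location]
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in wanted:
                continue
            if ext in {".jpg", ".jpeg"}:
                size = os.path.getsize(os.path.join(dirpath, name))
                if size < LARGE_JPEG_MIN_BYTES:
                    continue  # skip small derivatives such as thumbnails
            counts[ext] += 1
    return counts
```

Calling `count_files("share", path)` returns a `Counter` keyed by extension, so TIFF and WAVE totals can be read off separately or summed.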


Share side preparation

In October 2012, we changed how we collect the count: we now capture a snapshot across multiple locations and note duplicates that are in the process of being repaired. The benefits of this new method are that we're no longer trying to capture a 3-D representation of our work in one fell swoop in a single spreadsheet, and that we can automate it and save a ton of time and confusion. We're also working from the files that actually exist, not from the reported digitization counts, which avoids many errors caused by inattention, miscounting, and forgetfulness.

In addition, we developed a method of tracking digitization that never sees the light of day. This can include:

digitization for donors, such as providing digital versions of content donated to Special Collections

digitization for publication, at the dean's request, often for cost recovery

weeded collections (such as those containing duplicates), and collections deleted due to rights issues

digitization of typed transcripts to obtain OCR text that makes handwritten documents searchable (we delete the transcript TIFFs)
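This unseen work is tracked in tab-delimited text files. A minimal sketch of tallying one such file, assuming a hypothetical layout with a header row and `category`, `items`, and `captures` columns (the real tracking files may be laid out differently):

```python
import csv
import io
from collections import defaultdict


def tally_invisible(tsv_text: str):
    """Return ({category: (items, captures)}, (total_items, total_captures)).

    Assumes a hypothetical header row with 'category', 'items' and 'captures'
    columns and one row per batch of digitization that never reaches Acumen.
    """
    by_category = defaultdict(lambda: [0, 0])
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        counts = by_category[row["category"]]
        counts[0] += int(row["items"])
        counts[1] += int(row["captures"])
    total_items = sum(c[0] for c in by_category.values())
    total_captures = sum(c[1] for c in by_category.values())
    return {k: tuple(v) for k, v in by_category.items()}, (total_items, total_captures)
```

The per-category breakdown makes it easy to see, for example, how much donor or publication work went into a month, while the totals are what get added to the synopsis.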

Part I: What's where right now?

Starting in October 2012, we are capturing monthly snapshots of image and audio content (an example can be viewed here), with an intellectual item count pulled from the filenames, across the web directories (Acumen), the deposits directories (awaiting archiving), the digital archive, and the working share drive. Duplicate items and files are noted in the "Under Repair" columns for clarification.
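A minimal sketch of deriving an item count from filenames and splitting off duplicates, assuming a hypothetical naming convention of `<item id>_<sequence number>.<ext>` and assuming a share-drive file counts as "Under Repair" when a file with the same name is already archived (neither rule is confirmed on this page):

```python
import os
import re

# Hypothetical naming convention: <item id>_<sequence number>.<ext>,
# e.g. "coll01_item005_0002.tif" belongs to item "coll01_item005".
SEQUENCE_SUFFIX = re.compile(r"_\d+$")


def item_id(filename: str) -> str:
    """Strip the directory, extension and trailing sequence number."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return SEQUENCE_SUFFIX.sub("", stem)


def snapshot_row(share_files, archived_files) -> dict:
    """Count share files and items, separating those already archived."""
    archived = {os.path.basename(f) for f in archived_files}
    new, repair = [], []
    for f in share_files:
        (repair if os.path.basename(f) in archived else new).append(f)
    return {
        "files": len(new),
        "items": len({item_id(f) for f in new}),
        "under_repair_files": len(repair),
        "under_repair_items": len({item_id(f) for f in repair}),
    }
```

Counting items as the set of distinct IDs means a 40-page letter scanned as 40 TIFFs contributes one item and 40 files.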

Part II: How does this differ from last month? And how many GB do we have?

Because the snapshot does not show progress since the previous month, we will also generate a second synopsis giving GB counts, total digitization so far, and what was accomplished in the past month.

For example, the synopsis for October 2012 was as follows:

Digitized 125 new items (8719 captures)

Total items digitized: 89972

Total captures: 323204

Total items in Acumen: 78704

Total captures in Acumen: 294350

Total GB in archive: 13852

Total GB on share: 474

Total GB: 14326
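A minimal sketch of assembling the synopsis lines, assuming the visible counts (from the Part I snapshot), the invisible-digitization tallies, and the GB figures are already computed (the field names are hypothetical). Total GB is the archive figure plus the share figure, as in the October example (13852 + 474 = 14326):

```python
from dataclasses import dataclass


@dataclass
class Synopsis:
    new_items: int
    new_captures: int
    visible_items: int        # items found in the monthly snapshot (assumed input)
    visible_captures: int
    invisible_items: int      # from the invisible-digitization tracking files
    invisible_captures: int
    acumen_items: int
    acumen_captures: int
    archive_gb: int
    share_gb: int

    def lines(self) -> list[str]:
        """Render the synopsis in the same order as the October 2012 example."""
        return [
            f"Digitized {self.new_items} new items ({self.new_captures} captures)",
            f"Total items digitized: {self.visible_items + self.invisible_items}",
            f"Total captures: {self.visible_captures + self.invisible_captures}",
            f"Total items in Acumen: {self.acumen_items}",
            f"Total captures in Acumen: {self.acumen_captures}",
            f"Total GB in archive: {self.archive_gb}",
            f"Total GB on share: {self.share_gb}",
            f"Total GB: {self.archive_gb + self.share_gb}",
        ]
```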

The script that generates this information reads the tab-delimited text files that track Documenting Invisible Digitization and incorporates them into the Total items and Total captures sums, so that the totals reflect work we've done that can't be seen. It also captures GB counts across the archive/deposits directories and the following share drive directories in the Digital Projects area:

Digital_Coll_Complete

Digital_Coll_in_progress

Digital_Coll_ON_HOLD

Digital_Coll_Proposed

If a REVIEW folder exists in Digital_Coll_in_progress, the size of that folder is subtracted, as those files are assumed to be copies of TIFF files already in the archive.
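A minimal sketch of the share-drive measurement, assuming the four directories sit directly under one Digital Projects root and that a GB means 1024**3 bytes (both assumptions; the real script may use a different root layout or unit):

```python
import os

SHARE_DIRS = [
    "Digital_Coll_Complete",
    "Digital_Coll_in_progress",
    "Digital_Coll_ON_HOLD",
    "Digital_Coll_Proposed",
]


def dir_bytes(path: str) -> int:
    """Total size of regular files under path, ignoring symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def share_bytes(projects_root: str) -> int:
    """Sum the four share directories, minus the REVIEW copies."""
    total = sum(
        dir_bytes(os.path.join(projects_root, d))
        for d in SHARE_DIRS
        if os.path.isdir(os.path.join(projects_root, d))
    )
    review = os.path.join(projects_root, "Digital_Coll_in_progress", "REVIEW")
    if os.path.isdir(review):
        total -= dir_bytes(review)  # assumed to duplicate archived TIFFs
    return total


def share_gb(projects_root: str) -> float:
    return share_bytes(projects_root) / 1024**3
```

Subtracting the REVIEW folder after the walk, rather than skipping it during the walk, keeps `dir_bytes` reusable for the archive/deposits side.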

Part III: Synopsis over time...

A third script will collect the monthly synopsis files and generate a snapshot of the fiscal year so far, with a column for each month, plus totals at the end and at the bottom.
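A minimal sketch of the fiscal-year table, assuming each month's synopsis has already been parsed into a dictionary of row label to count, that every row measures the same unit (for example, new items per hypothetical category) so that row and column totals both make sense, and that months are supplied in fiscal-year order:

```python
def fiscal_year_table(monthly):
    """Build a table with one column per month, a row total and a column total.

    monthly: list of (month label, {row label: count}) in fiscal-year order.
    """
    row_labels = []
    for _month, values in monthly:
        for label in values:
            if label not in row_labels:
                row_labels.append(label)  # keep first-seen order
    header = [""] + [month for month, _ in monthly] + ["Total"]
    rows = []
    column_totals = [0] * len(monthly)
    for label in row_labels:
        counts = [values.get(label, 0) for _month, values in monthly]
        for i, n in enumerate(counts):
            column_totals[i] += n
        rows.append([label] + counts + [sum(counts)])
    rows.append(["Total"] + column_totals + [sum(column_totals)])
    return [header] + rows
```

Writing each row with `"\t".join(map(str, row))` gives a tab-delimited file that opens in Excel the same way as the other reports.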

Run GBcount: type `GBcount` and press Enter. It prints the total size in gigabytes of the archival storage directories on the server and writes a detailed report to the scripts/output directory.

Run acumenCount: type `acumenCount` and press Enter. It prints the total number of collections represented in Acumen, including how many do not have EADs, and writes the detailed report to the scripts/output directory.

Download the report created by acumenCount and open it in Excel, with tabs as delimiters. This report lists collection identifiers, titles, item counts, file counts, and whether each collection has an EAD. At the end are the total counts for both items and files.
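As an alternative to Excel, a short script can summarize the same report. This is a minimal sketch assuming a hypothetical row layout of identifier, title, item count, file count, and an EAD flag written as "yes"/"no", with the closing total line starting with "Total"; check the actual report before relying on it:

```python
import csv
import io


def summarize_report(text: str) -> dict:
    """Summarize an acumenCount-style tab-delimited report (layout assumed)."""
    summary = {"collections": 0, "without_ead": 0, "items": 0, "files": 0}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 5 or row[0].strip().lower().startswith("total"):
            continue  # blank lines and the closing total line
        if not (row[2].strip().isdigit() and row[3].strip().isdigit()):
            continue  # header row or any row without numeric counts
        summary["collections"] += 1
        summary["items"] += int(row[2])
        summary["files"] += int(row[3])
        if row[4].strip().lower() not in ("yes", "y", "true", "1"):
            summary["without_ead"] += 1
    return summary
```

Comparing the computed item and file sums with the report's own total line is a quick sanity check that no rows were skipped by mistake.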

To collect the count on the Share drive and combine the two for the monthly report:

For the size of content on Share, use the monthly GB counter described on the Monthly_gb_counter page.