The value and limits of online research–a quickie case study

As any of you who have the historical hound dog’s desire to hunt know, the world of online research is expanding before our eyes. It’s an exciting time in some ways, as the source material available to us grows every day. I confess I have reveled in the chance to plow through the dozens of now-online newspapers from the Civil War period–papers that I have never seen before. I have learned a few things along the way–most notably that it’s not long before you strike a point of diminishing returns: the source material pours forth, but what it tells us that’s new narrows (the entire field of military history as it relates to the Civil War suffers so). I have found thousands of wartime letters in the last few months, and while some are highly quotable, it’s a rare day that I find something that really goes beyond the interesting to tell us something new or important. Still, I realize that sometimes the significant emerges from the assemblage of tiny pieces.

There have been some spirited debates about the limits of online research. We all know researchers and writers for whom the research world begins and ends at their keyboards. If something doesn’t exist online, then they’re not going to see it. Indeed, it’s easy to imagine that we have a whole generation of historical thinkers who will be conditioned to find their material online, and largely only online. What does that mean to our historical work? I have been pondering a way to gauge just how important online resources have become, and I offer this little tidbit. Definitive? No. But maybe a useful reminder.

I went through the footnotes in seven of the 25 chapters of my book Return to Bull Run, which I finished writing in 1992 on the eve of the digital age. I looked at every citation in every footnote to calculate what percentage of them could be had online today. Going in, I guessed about 25% of the citations I included in the book would be available online today (by that I mean available on a permanent website; I did not include ebooks in the calculation, unless they were available for free use at Google Books or other archival sites). I was wrong.

The seven chapters I reviewed included 663 citations. Of those, the material cited in 419 of them can today be found online. That’s 63%.

Given the nature of the book–a battle book–it includes a goodly number of references to the Official Records, which of course are readily available online. Of the 663 citations, 191 were to the ORs. If we back those out, we are left with 228 out of 472 citations available online–still 48%, a much higher number than I expected.

Swapping over to the bibliography, the percentages drop quickly. I didn’t go through it item by item, but it’s apparent that just a tiny share of the manuscript materials I used are available online today–probably less than 5%, if that. It’s worth noting that the manuscript collections I used included some of the most critical material available–Fitz John Porter’s Papers at the LOC, T.C.H. Smith Papers at the Ohio Historical Society, John Warwick Daniel’s papers at UVA, or the unpublished US Army Generals Reports of Civil War Service at the National Archives are four examples among many. Without collections like those, little new would have emerged, and I daresay the book would hardly have been worth writing.

To me, the big news is this: The internet has done little to affect access to manuscript collections (beyond making catalogues and finding aids more readily available), and that’s unlikely to change anytime soon.

Of the newspapers I used in 1992, about half are today available online. Of books, again, it’s about half. All told, I estimate that about 20%-25% of the bibliography is available digitally.

What about the other side of the ledger: What’s out there today that couldn’t be had 25 years ago–that was too obscure, too restricted, or too far away for me to find in the early 1990s? The majority of material going online goes there because genealogists will buy or use it. Some of the material from the National Archives–widows’ pensions, Southern Claims–that’s finding its way online is a great boon. But those sources don’t begin to relieve anyone of the need to visit NARA. Mountains of records remain untouched by researchers and will remain so, largely because there’s no commercial benefit to digitizing most of them.

By far the greatest bonanza of new online material is in the realm of newspapers–again, much of it being done on various genealogy sites. In the last six months alone, I have found an additional 30 relevant newspapers–probably 70 additional accounts of the campaign–that I did not have access to way back when (I cite about 90 newspapers in the book).

Where does all this leave us? I think it’s a bit like one of those government reports that tells us eating too much sugar will make us fat. It’s obvious, and it’s equally clear that digitization makes research easier. (And certainly we now have a legion of people doing research who would never have attempted it if not for coaxial cables.)

But the big question is, does the internet make research BETTER–does it ultimately improve the products historians are putting on the street? Based on my own work–then and now–my sense is that the answer is “no”–at least for the sort of narrative history embodied in Return to Bull Run. Writing it in 2012 would surely have been easier, but I wonder if I would not have been snared by the ease of online research and missed much that was important elsewhere. And looking at the fabric of the research and the construction of the book, I can’t say there’s anything out there in today’s digital universe that would have changed or improved the book markedly, while there is a great deal NOT online that would have spelled historiographical disaster had I missed it.

While it’s astonishing how fast the mass of materials online is growing, we are still far from the day when new, credible, comprehensive, and definitive history can be written from the digital domain alone. Covet your iPads, but also hang on to those rolls of dimes for the copy machines, continue to make friends in your favorite repositories, and keep those laptops ready for transcription (barbaric though it may seem), because doing really good history requires all those things.

(A disclaimer: I’m not arguing here that there is nothing in Return to Bull Run I wouldn’t change. I would–including some revised thinking on big issues and players, like McClellan–but these have little to do with the finer points of new source material and more to do with my own evolving understanding of the war. But that’s a topic for another post on another blog.)

I really enjoyed reading your analysis here. As someone who blogs almost exclusively from digital sources, I can say that I have a deep appreciation for what is available online, and how fast access is growing. Indeed, because of my location and the difficulty of traveling, I know that I would not be able to write substantive posts without this proliferation of digital materials.

That being said, I have constantly found in doing online research that I reach roadblocks that I cannot solve online (especially where it relates to NARA materials). Often I know that these roadblocks could be easily passed with one quick research visit if I had the time and resources to do so. The best that I can do in these situations as an online blogger is be up front: noting in my posts what I do know and what I cannot find out online, and where answers that I cannot provide might be located offline.

The access to these online materials is invaluable to hobby bloggers like myself – and certainly adds to the amount of research being conducted. But I think you are right – any thorough and top-notch scholarship still needs those materials not available online to be taken seriously.

First, this was an excellent post, and one which applies directly to what I’ve been doing lately, online research. I bought your book through the History Book Club as an 8th grader, and it still retains a cherished place in my ever growing collection today.

Let me echo Steve’s comments concerning my lack of time and resources to actually visit archives and museums around the country. I would love to do so, but as someone who runs a “battle blog” on the Siege of Petersburg in my spare time (well, what little is left AFTER two young boys go to bed, anyway), I just do not have the time and money to make any trips from southern Illinois to almost anywhere which houses important Siege of Petersburg materials. For the present, I have contented myself with purchasing some rather obscure reels of microfilm on CD from the National Archives, and I hope more material continues to become available for online research. Like Steve, I have hit many frustrating roadblocks along the way which I’ve tried to work around as best I can with the resources at my disposal. Sites like NewsInHistory.com, Fold3.com, the Library of Congress free newspaper database, and others have helped me to find and transcribe a ton of interesting material. I realize and am a bit discouraged that I’d need to be independently wealthy or start a new career as a professional historian (which isn’t going to happen) in order to truly find and study all of the vast wealth of material buried in Archives around the country. In any case, like Steve, I’ll continue to do the best I can with the resources at my disposal. Who knows, perhaps some boy reading one of the numerous Civil War blogs out there today will turn out to be tomorrow’s John Hennessy, an outcome we can all hope for.

I enjoyed the look at how much of your research from 1992 is readily available online today, especially since I’ve read Return to Bull Run at least five separate times front to back. I too am surprised by how much you can find today readily available on the internet. I remain hopeful that new technologies will eventually convert every single scrap of paper from every archival collection into freely available digital versions online.

Brett: Thanks for your thoughts. I hope you and everyone realize my point is not to diminish online research, but to get a handle on just how comprehensive it can or cannot be. I have huge respect for bloggers like you (Siege of Petersburg Online–http://www.beyondthecrater.com/) and Harry Smeltzer (Bull Runnings–http://bullrunnings.wordpress.com/) who do the rest of us such a huge favor by your work.

Your hope that all archives digitize themselves entirely raises an interesting issue–one that I’d be curious how some archivists view. Can archives survive if all their material is digital and available for free? The cost of reproductions, the pride of donors (of both money and stuff)–aren’t these things rooted in the physical manifestation of these archives? Will digitization put them out of business, or at least transform their business dramatically? Is the world of archives ready for that? Do they share your vision?

Yes, I completely understand that your post in no way reflects any desire to diminish online research. I should have clarified in my earlier comment. In fact, I’m glad that you’re helping to point out just how much has become available online in the last two decades. Thanks for the support and mention of my site.

Good point on archives and potential business models. I wouldn’t necessarily expect access to these materials to be free. The money to digitize items and host them online has to come from somewhere. Who better to fund that type of digitization than researchers like myself who would gladly pay a monthly, yearly, or “by the item” fee to gain access to items otherwise beyond my current reach monetarily? When you consider the cost of paying an independent researcher or flying/driving to an archives site, I’d be willing to pay a premium for these materials. I’d love to see archives and museums go to a business model similar to Fold3.com or any of the newspaper digitization pay sites, but I wonder if that is even viable for most or if they’d need to pool together in large groups to make it feasible. Either way, I look forward to the growing amount of material online. I never would have guessed as that young 8th grader that I’d be able to speak directly to history professors, NPS historians, and Civil War book authors and publishers via sites like this and email. The internet has really opened up a vast amount of material and contacts who share my passion for Civil War history. Keep up the good work!

Having spent hours at the UVA special collections library reading the papers of Major, later Senator, Daniel, I understand your position. What will happen to those papers in a few decades? If Daniel had written his book on the Civil War before he died in 1910, it would be available as a free e-book. It could be word searched, a capability that gives one a real chance of being productive. There is little doubt, based on what I have seen in one box of papers, that he could have helped answer what really happened on May 6, 1864. Can NPS, UVA and the Library of Virginia cooperatively scan the Daniel papers and get them online?
Dr. Peter G. Rainey

Peter: I am afraid the NPS struggles mightily to manage its own archival material, and it seems unlikely that we would be able to get involved in a project like the one you describe for the Daniel Papers (which, to my continued surprise, are rarely used, though absolutely full of stuff). In fact, in the last couple of years, we have digitized the park’s entire collection of research materials–a huge task (more than 550 volumes, 50,000 pages) largely done by some really dedicated volunteers. Though scanned, it’s a long way from being online, though at some point I expect that those parts of it we can put online will go there (part of our collection consists of material copied from other repositories). As Brett suggests, it will be very interesting to watch how this all evolves. I would never have imagined that so much had become available so quickly. I am done predicting what the next ten or twenty years will bring. Thanks again. John H.

I really enjoyed the post. You’ve highlighted the double-edged sword that is on-line research. Other commenters (and bloggers) have pointed out how valuable on-line tools are for doing research when one isn’t a professional historian and has a lack of time and resources to travel and visit institutions that archive primary source materials. I often think about how what I do today could not have been done ten years ago. I am thankful for the on-line resources that help me to uncover truly obscure, little written about aspects of the war in Northern Virginia from the comfort and ease of my living room. But I too reach dead ends on the Internet, and nothing would satisfy me more than to be able to get to actual archives. For that, I dream of retirement, or at least, finding a reason to drag the family with me! Again, great post.

It may be obvious to those who have been working in history for some time, but I do think there ARE people out there who don’t recognize the limits of Internet research because they have known nothing else (I have encountered many). My real purpose was to take a little snapshot, using a piece of work done before the onset of the internet, to quantify where we stand. As I said, it’s not definitive; it’s suggestive, and even obvious. Thanks for commenting.

I was directed to this by Brett’s site. This is an excellent analysis of a subject that I’ve been wondering about for some time. Manuscripts seem to be the least likely candidate for on-line/digital conversion for a slew of different reasons but there are still nuggets in those sources which any study purporting to be “definitive” cannot afford to ignore. Here’s hoping that the post serves as a sharp reminder to researchers that in terms of having primary sources on-line “we’re not where we used to be but we’re not even close to being where we want to be”.

John – great post. Your question about archives not wanting to participate in digitization can be answered with your own points – if they aren’t online, they are ceasing to exist. I can think of several repositories that have resisted the trend that are all but ignored now. They may not know it, but the only thing that will matter in 20 years will be accessibility. It’s already happening. What this all comes down to is that the National Archives’ online presence is a disaster. I’d like to call for a Civilian Conservation Corps (II) to scan and transcribe government records – it is crucial that this be done, and soon. Your post will be interesting to revisit in 5 years – most of the sites and tools you mention are VERY new.

“…I’d like to call for a Civilian Conservation Corps (II)…” – Hear! Hear!!

And just to put a fine point on the need for this type of emergency plan: just a few weeks ago, an entire roomful of court records, pension applications, business ledgers, record books, deeds, wills, personal correspondence, and more, dating back to 1840 and earlier, which had been abandoned in a spare room in the Franklin County, North Carolina, courthouse, was destroyed by order of the NC Archives, which deemed the materials of no value. This was done against the wishes of the Franklin County Historical & Genealogical Ass’n, the County Clerk, and the community, who had come together with the resources to preserve, photograph, and scan the documents. All of the materials, a whole roomful, were lost for all time.

I agree that it’s necessary to do more research than what the internet alone can provide on-line. One of the best things the web does for me, a long-distance researcher, is facilitate communication.

I often acquire, via email, unpublished letters and memoirs that otherwise might not be discovered, from a network of soldiers’ descendants and collectors, and this has proved very beneficial. I’ve also met like-minded researchers on-line and we share materials – sometimes from the very archives too far away for me to access. I’ve acquired a lot of good primary source material this way.

It’s not a substitute for travel, though. If I can’t obtain copies of material via scans, microfilm, or photocopies (depending on cost), I know that sooner or later (often years later) I will have to visit in person to get the information I want. I try to make the most of travel time when the opportunity arises. Waiting is frustrating, and there is a lot of information I know about but can’t get to. I know I could spend a lot of time at the National Archives, etc.

The increase in digital newspapers and books found on-line is great, and fun to go through, and I like to read them for contemporary points of view. It’s fun, too, to find some obscure book now on-line, like Heros von Borcke’s memoir, and add it to my narrative. But libraries and archives, as well as individuals, are still the best resource for me, and usually the materials I want to see are not digitized.

Great post! The amount of information on the web has increased tremendously over the past decade, but still many old sources cannot be found online. Another concern is that errors and myths are so easily copied online that they may become accepted as facts, at least by students.

This is an excellent post, and it is fascinating to see how much of the source material for your Bull Run book is now available online. It is also absolutely true about the limitations: as a blogger on the Civil War, I, like previous posters, have found myself frustrated on many occasions when a particular trail can’t be followed to its conclusion because of my remoteness from the physical archives. One of the most exceptional benefits of the material that has become available online is for those who (like myself) are not based in the United States. It has literally opened the door for us to explore the Civil War at a primary level in a way never before possible (in my case, the Irish experience) through things like widows’ pensions and newspapers. I think one of the great long-term benefits of this will be an increased interest in the American Civil War outside the United States, particularly in countries like Ireland and Germany, where large numbers of those who served originated. Regardless, though, a trip to the National Archives remains high on my ‘to do’ list!

Good post and very true. As the owner of a site that tries to put as much information from primary sources as possible out there for the world, I am amazed that people believe that what IS contained there is enough. I like to think that the material available on my site points people in the right direction, or whets their appetite to seek out more. Nothing out there comes close to the amount of material contained in the National Archives…or for that matter the local library.

I’d like nothing better than to see real, period source material digitized. But, as the big brother of an archivist I know that there is far too much to be able to do cost-effectively and far too few people to do that work, never mind the potential for damage.

Some day, perhaps, but the bottom line is that the internet is not going to replace actual, hands-on research any time soon.

Thanks Dale for the observations. Everyone should know that your blog, the 17th Connecticut, is a good one. I plan on writing about compilation blogs like yours, which provide a real service for people willing to dig into them. Thanks. John H.

John – Great post! Very insightful and interesting about what the future holds. There was one comment in particular that stuck with me and it’s this one:

“… it’s not long before you strike a point of diminishing returns: the source material pours forth, but what it tells us that’s new narrows (the entire field of military history as it relates to the Civil War suffers so). I have found thousands of wartime letters in the last few months, and while some are highly quotable, it’s a rare day that I find something that really goes beyond the interesting to tell us something new or important.”

In considering that comment, I think of the many new “battle” books that are pouring forth as the sesquicentennial marches on. There are many well-known authors publishing new books on major battles that have all been well covered in the past. Having not read many of them, I sincerely ask, are there really that many “new” interpretations on these battles to be made? With regard to your comment, is the “new” source material that’s been uncovered over the past two decades or so really so deep and revealing that it allows for new interpretations of all these major engagements?

Or, are most of these “new” works merely tied to the sesquicentennial from a marketing sense but tell the reader little if anything that’s new in the way of analysis?

Yours are good questions. Authors ply their work for one of three reasons: they really have something new to say; they can tell a familiar story in a better way than has been done before; or they simply see an opportunity within the market to produce a profitable book. Beyond that, it strikes me that books that do have something new to say either a) add to our knowledge about a person or event, or, b) add to our understanding about how that event or person fits into the larger tides of history. A precious few do both.

Having written about a couple of battles in some detail, I wonder (even when I go back and read my own stuff) what the value of the detail sometimes is. I think a fair amount of our quest for additional detail is something inherent to the Civil War market, which seems to embrace detail to a greater degree than other historical markets. And so we have innumerable books that ply those waters. How valuable they are in the larger picture is an interesting question. Interesting?…yes. Valuable?

I haven’t seen Scott Hartwig’s new book on Antietam yet. I know Scott well enough to know that his purpose is to reveal new details AND increase our understanding of how Antietam fits into the larger flow of history. I’m not sure other authors who dive into a battle or a regimental history have those dual goals. It might be well that they don’t, for those goals are difficult to achieve. I think most authors are satisfied to reveal a few new things without moving the needle much on our understanding of the Civil War within the broader context of American history. John H.

John – It’s been well over a year since you wrote this post, yet I had reason to read it again. I said to myself a few years ago that the day might not be too far off when an author could research and write a credible history book without ever leaving the confines of his/her office, considering how much new source material was coming online. Do you think we’re now at that point? Paul

Paul. Credible and acceptable–perhaps on some topics. But not truly authoritative in a broad sense. Not yet. While there is a growing number of manuscript collections being made available online, beyond presidents and truly exceptional figures, the vast majority of collections remain in boxes, and will likely stay there for a long time. Institutions have many reasons not to make their collections freely available.

To me, the great risk is that digital availability WILL someday come to define the acceptable threshold of research; in fact, I already know more than a few researchers for whom sources simply don’t exist unless they are available online. Over time that may come to define the profession, and that will be too bad.

How far are we from passing the threshold where broad, thoughtful, and definitive studies can be done using online resources alone? I’d be curious what you and others think. Certainly online material is piling up at an astonishing rate. I can find something new on the Army of the Potomac or the Battle of Fredericksburg or [pick a topic] literally every night.

The Georgia State Archives was on the verge of completely closing to the public last year before a grass-roots effort by genealogists, historians, and other interested parties convinced government officials to keep it open. Even though it is now open a couple of days a week, it operates on a skeletal staff and will undoubtedly continue to do so for the foreseeable future. I mention this because it is frankly impossible for me to imagine the vast manuscript holdings of that institution (and many others) being digitized and made available online in the foreseeable future. I noticed similar, but not as severe, budget cuts at the Ohio Historical Society several years ago when I visited there to do research.
John’s concern that digital availability will come to define the acceptable threshold of research into Civil War history is definitely something to worry about.