Mapping Books

Thursday, November 6, 2014

Today marks the beginning of the 7th annual Schoenberg Symposium on Manuscript Studies in the Digital Age here in Philadelphia. This year the symposium theme is "Collecting Histories" and features a lineup of speakers discussing the ways in which provenance and the history of collecting inform our wider knowledge about manuscript culture. As readers of this blog know, I'm very much interested in the historical movement of books and manuscripts and I'm excited to speak during the conference on the ways in which the Schoenberg Database of Manuscripts (SDBM) can be used to track manuscripts over time.

For this post though I want to highlight the fantastic work done by a team of scholars whose work very much informs the SDBM project. Over the past two decades, Lisa Fagin Davis and Melissa Conway have worked to create a new directory for all institutions in the U.S. and Canada which hold European manuscripts dating to before 1600. They have published their own excellent description of the origins and methodology of the project but in short their work began as a way to update the censuses of American manuscripts created by Seymour de Ricci from 1935-40 and supplemented by Faye and Bond in 1962. Their census includes entries for 937 entities: historical owners of manuscripts derived from previous censuses, the former names of institutions now renamed, as well as current holders. Running to 126 pages in a freely available PDF sponsored by the Bibliographical Society of America, the census is an incredibly helpful resource and I wanted to find a way to make the data contained within it browseable in a different way than just on the printed page.

Example of a listing from the Fagin Davis & Conway Census (p.37)

I extracted the text from the PDF census and chopped it up into relevant delimited fields like "Name," "Address," "Holdings," etc. and then mapped the results using CartoDB. I had to make a few decisions about display along the way, especially when it came to how to determine the size of each manuscript-owning dot on the map. Most institutions provided Fagin Davis and Conway with numbers for how many manuscript codices they held as well as how many leaves, documents, and scrolls were in their collection (though others reported only an aggregate number). Most institutions with full-fledged manuscript books had a fairly well-informed count of exactly how many they had but the numbers for leaves and documents often were estimated in larger round figures. As a result, the default map view shows all locations in the census as dots sized by the number of total manuscripts held (leaves, codices, scrolls, documents, etc.). Using the "visible layers" dropdown you can show just those locations currently holding manuscripts, just those recorded in earlier censuses which no longer hold manuscripts, or both together. Of course sizing the dots by total manuscript holdings is necessarily a bit misleading, as a university with 2 codices and 37 leaves appears to have total holdings of 39 manuscripts, so there is also an option in the "visible layers" menu to view only holdings of codices.
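The sizing choice described above can be sketched in a few lines of Python. This is only an illustration and not the processing actually used for the map: the pipe delimiter, the column order, and the names `Holding` and `parse_row` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """One census entry; the field layout is assumed for illustration."""
    name: str
    codices: int = 0
    leaves: int = 0
    documents: int = 0
    scrolls: int = 0

    @property
    def total(self) -> int:
        # The catch-all "total" lumps every kind of manuscript together.
        return self.codices + self.leaves + self.documents + self.scrolls


def parse_row(line: str, delimiter: str = "|") -> Holding:
    """Split one delimited line (hypothetical column order) into a Holding."""
    name, codices, leaves, documents, scrolls = [
        field.strip() for field in line.split(delimiter)
    ]
    # Empty count fields are treated as zero.
    return Holding(
        name,
        int(codices or 0),
        int(leaves or 0),
        int(documents or 0),
        int(scrolls or 0),
    )


# A made-up institution with 2 codices and 37 leaves.
example = parse_row("Example University|2|37|0|0")
print(example.total, example.codices)
```

Sizing a dot by `total` and by `codices` gives very different impressions of the same institution, which is why the map offers both views.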

Unsurprisingly one can see the concentration of pre-1600 European manuscript holdings along the east coast. In a league table of manuscript holders New York, Washington, and Philadelphia(!) come out on top by volume but in terms of individual institutions the Huntington and Folger with their extensive holdings of pre-1600 documents come out on top.

Top-15 current owners of pre-1600 manuscripts by "total" count in the Fagin-Davis/Conway census

Given the fuzziness of this catch-all "total" manuscript number it's helpful to also get a sense of institutions by number of codices held:

Top 15 current owners of pre-1600 manuscript codices in the Fagin Davis/Conway census

One of the advantages of using the Fagin Davis and Conway survey is that it lists private collections, and in the cases where these were dispersed or relocated, notes their current location. I don't think it would be terribly controversial to say that most collections of medieval manuscripts in the U.S. and Canada rest on substantial gifts from individual collectors or families. The remarkable extent of these private collections can be seen in part below:

Top 15 now-relocated collections of pre-1600 manuscript codices in the census

It's edifying to see the late Larry Schoenberg at the top of the list of codices, especially today during the conference celebrating his legacy. His manuscripts are now here at Penn but a decade ago when they were in Longboat Key, Florida they made that small community the largest holder of pre-1600 manuscript codices in the south. Others on that list will be familiar to many, including George Plimpton, whose manuscripts are now largely at Columbia University; Thomas Marston, whose collection is at the Beinecke; and Ricketts, whose collection is now mostly at the Lilly Library.

Saturday, June 28, 2014

This past week I attended the annual conference of the Rare Book and Manuscript Section of the Association of College and Research Libraries. It's fantastic to be around so many wonderful book people and hear their take on the state of the field. As part of the program, RBMS hosted a panel on "the market" with Nina Musinsky and other members of the trade and library world. Seeing the plenary and Musinsky's talk reminded me that I'd started several months ago to make sense of some data on 2013 book and manuscript auction sales but never finished.

On January 1st this year, the collector services site Americana Exchange (AE) posted a list of the "top 500" auction results by price for books and manuscripts for the previous year based on their valuable in-house data. I thought I'd clean up and parse this data a bit and try to make some sense from it. The AE's table makes it easy to see the list by value, capped off by the Bay Psalm Book which sold for $14 million at Sotheby's. I wanted to get a sense though of the field as a whole. First off, while I was unsurprised that Sotheby's and Christie's dominated the field in terms of auction houses selling top lots, I was impressed by the fact that 48 different auction houses were represented over all 500 lots!

Top 10 Auction houses in 2013 by number of the top 500 lots sold.

I then thought I'd look a bit at the age of the items being sold in the market - was the 20th century the hottest? The 16th? After a bit of cleanup I assigned dates to 497 of the 500 items and plotted them out.

Number of items in top 500 by century

There are no huge surprises in the above table: the 19th and 20th centuries are responsible for the majority of the top-value book and manuscript auction sales, with the 17th century the poor relative in the printed-book era. The list is of course worth looking at carefully in comparison with the numbers. You'll see, for example, that a sale of comic books at Heritage Auctions really boosted the number of items from the 20th century.

Total value (US$) of items in top 500 by century

The twentieth century also fares well in looking at the total value of all lots by century, but you'll see the 17th century recovers thanks largely to the Bay Psalm Book whose high price compensates for lower total sales from the period.

Average price (US$) earned by items in top 500 by century

In looking at averages, medieval manuscripts, though numerically fewer on the list, shine through thanks to their higher per-item value. You'll see of course that the Bay Psalm Book is responsible for that inflated 17th century average.

The AE data also includes information on auction house estimates which provides an interesting window on which items blew away expectations (or which had artificially low estimates). I've divided the final sales price by the low estimate to get a kind of 'estimate factor' by which lots overperformed.
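The 'estimate factor' is simply the sale price divided by the low estimate. A minimal Python sketch of that calculation follows; the lot titles and figures are invented for illustration and are not taken from the AE data, and the function name `estimate_factor` is my own.

```python
def estimate_factor(sale_price, low_estimate):
    """Return sale price divided by low estimate, or None if no usable estimate."""
    if low_estimate is None or low_estimate <= 0:
        return None
    return sale_price / low_estimate


# Hypothetical lots; none of these figures come from the real data set.
lots = [
    {"title": "Hypothetical lot A", "price": 200_000, "low_estimate": 20_000},
    {"title": "Hypothetical lot B", "price": 50_000, "low_estimate": 40_000},
    {"title": "Hypothetical lot C", "price": 10_000, "low_estimate": 0},
]

# Keep only lots with a usable estimate, then rank by how far they overperformed.
ranked = sorted(
    (lot for lot in lots
     if estimate_factor(lot["price"], lot["low_estimate"]) is not None),
    key=lambda lot: estimate_factor(lot["price"], lot["low_estimate"]),
    reverse=True,
)

for lot in ranked:
    factor = estimate_factor(lot["price"], lot["low_estimate"])
    print(f'{lot["title"]}: {factor:.1f}x low estimate')
```

Lots without a recorded low estimate are dropped rather than given a factor, since dividing by a missing or zero estimate would be meaningless.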

Top 10 auction lots of 2013 by how much they exceeded their low estimate

Turing's "On Computable Numbers, with an Application to the Entscheidungsproblem" Proceedings of the London Mathematical Society (1937)

It's nice to see this juxtaposition which shows two of the many sides of the collecting market. Early printed books like the 1555 Labé continue to do well both for their physical beauty as well as their historical importance (one of the most important early printed compilations of a female poet) while the Turing offprint demonstrates the power and interest of a cohort of collectors attracted by the recent history of science and computing. Both are historically significant material and intellectual objects and I think pretty compelling evidence for why it's a great time to be working in the Rare Book and Manuscript field.

My data set, based on that of AE but with my addition of dates and the 'estimate factor' can be found here.

Friday, January 24, 2014

Today is the American Library Association midwinter meeting LibHackathon here at the Penn Libraries. I thought I'd share a project using library data that I've been working on for a little while now in the hopes that it will be not only useful to scholars but also might generate some conversation over how libraries and archives distribute their valuable descriptive information.

Over the years and especially here at Penn I've been fortunate enough to work with a number of catalogers in both special and general collections. I can't think of a more under-appreciated part of the scholarly community. I've seen first-hand how much time, energy, and bibliographic skill goes into the description of texts and objects of all kinds. I've heard heated debates over whether one piece of information or another should go into one of the million-and-one MARC fields. What comes out of the other side of this process should be a goldmine of easily usable truly 'big' bibliographic data. Instead, I think it's safe to say that 99% of library users have no idea why one might want to search the 752 field instead of the 260 field for place of publication. Moreover, this is hardly the sole fault of users. Try searching any library online catalog for just information from subfield c of field 300 and see how far you get! So much structured data ignored and thousands of hours of cataloger effort hidden from the world [1].

Fortunately the data is there if you know how to find it [2]! I've been playing around with our catalog data at Penn for a while now and decided a few weeks ago that I wanted an easy way to visually display networks of provenance in our manuscript collection. Penn has a deep commitment to provenance and book history and for my money our catalogers have done some of the richest work in describing provenance of any manuscript collection I've seen. The Kislak Center here at the Penn Libraries currently has cataloged around 1,640 codex manuscripts (manuscripts bound in book form) as well as around 300 codex manuscripts from the Lawrence J. Schoenberg collection [3]. I knew from experience that most of these had detailed descriptions of former ownership in their online catalog records and it seemed reasonable to just download them all and make a quick visualization of who owned which manuscripts in common.

I realize now that this task would have been near to impossible at most libraries where the online catalogs and back-end databases don't easily allow public users to batch download full records. Fortunately at Penn all of our catalog records are available in MARC-XML form which looks something like this:

I knew that our catalogers were keen on including structured data about former owners in the 700 field with a "former owner" phrase after their name. It was easy enough to download a list of all of the manuscripts that possessed this field. Then, after some much needed coaching from Dot Porter, the Kislak Center's XML guru and medievalist extraordinaire, I was able to write an XSL transformation which would spit out just what I wanted. At first glance though, I didn't turn up nearly as many results as I'd hoped and I seemed to be missing a lot of data. Looking through the records I saw that, on the plus side, the 700 field was highly structured with authorized name headings but didn't always incorporate all of the rich narrative textual information in the 561 field (labeled "provenance" in our public catalog). For example, an owner like Sir Thomas Phillipps would have his name included in the 700 field but the auction house which sold the manuscript would appear only in the 561. This is for very good reasons ("Sotheby's" is rarely a "former owner") but I really wanted to know everything about a text so I moved on to extracting every 561 field from the manuscripts. Instead of nice, neat authorized names, I of course got a lot of fascinating narrative:
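For readers who would rather not write XSLT, the same two extractions can be roughed out in Python with the standard library. This is a stand-in sketch and not the XSL transformation I actually used: the sample record is invented, and I am assuming that the "former owner" phrase sits in subfield e of the 700 field and the provenance narrative in subfield a of the 561.

```python
import xml.etree.ElementTree as ET

# Standard MARC-XML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# An invented record for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="561" ind1=" " ind2=" ">
    <subfield code="a">Formerly owned by Sir Thomas Phillipps. Sold at Sotheby's.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Phillipps, Thomas, Sir,</subfield>
    <subfield code="e">former owner.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Example, Scribe,</subfield>
    <subfield code="e">scribe.</subfield>
  </datafield>
</record>"""


def subfields(datafield, code):
    """Return the text of every subfield with the given code."""
    path = f"marc:subfield[@code='{code}']"
    return [sf.text or "" for sf in datafield.findall(path, MARC_NS)]


def former_owners(record):
    """Names from 700 fields whose relator term mentions 'former owner'."""
    owners = []
    for field in record.findall("marc:datafield[@tag='700']", MARC_NS):
        relators = " ".join(subfields(field, "e")).lower()
        if "former owner" in relators:
            owners.append(" ".join(subfields(field, "a")).strip(" ,"))
    return owners


def provenance_notes(record):
    """Free-text provenance narrative from every 561 field."""
    notes = []
    for field in record.findall("marc:datafield[@tag='561']", MARC_NS):
        notes.extend(subfields(field, "a"))
    return notes


record = ET.fromstring(SAMPLE)
print(former_owners(record))
print(provenance_notes(record))
```

In this invented record the auction house appears only in the 561 narrative, never among the 700 former owners, which is the gap described above.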

I broke each of these lines of narrative into sentences and began the arduous work of identifying each owner in a chain of provenance uniquely. After some maddening time using OpenRefine, regular expressions, and plain copying and pasting I got a list I was happy with. In the end I came up with 3,252 manuscript/provenance pairs, like so:

1,285 of our 1,640 odd codices (including two ms. rolls, because: why not) had at least some provenance data recorded as well as an additional 265 of the 311 Schoenberg manuscripts we've cataloged. Out of these I was able to identify 985 "unique" entities through whose hands our manuscripts had passed. More interestingly, 225 of these owners had formerly been in possession of two or more of our manuscripts.
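Counts like these fall out of the manuscript/provenance pairs fairly directly. Here is a small Python sketch with placeholder identifiers and owner names (none of them real records), assuming each pair has already been cleaned so that an owner is always spelled the same way.

```python
from collections import defaultdict

# Placeholder manuscript/provenance pairs; the repeated pair is deliberate.
pairs = [
    ("ms_001", "Owner A"),
    ("ms_001", "Dealer B"),
    ("ms_002", "Owner A"),
    ("ms_003", "Owner C"),
    ("ms_003", "Dealer B"),
    ("ms_003", "Dealer B"),
]


def manuscripts_by_owner(pairs):
    """Map each owner to the set of manuscripts that passed through their hands."""
    owned = defaultdict(set)
    for manuscript_id, owner in pairs:
        owned[owner].add(manuscript_id)  # a set ignores duplicate pairs
    return owned


owned = manuscripts_by_owner(pairs)
unique_owners = len(owned)
repeat_owners = sorted(
    owner for owner, manuscripts in owned.items() if len(manuscripts) >= 2
)

print(unique_owners, repeat_owners)
```

Using sets means an owner mentioned twice in one manuscript's narrative is still counted once for that manuscript.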

The historical strengths of our collection and Penn's institutional history can be seen pretty clearly here at the center of the cluster. Our codices primarily come from European and American collections as mediated by the prominent dealers and auction houses of London, New York, Philadelphia, Paris, Florence, and Munich. In addition we have received several very large collections over the years including the Gondi-Medici collection via the dealer Bernard Rosenthal and the recent gift of the Lawrence J. Schoenberg collection.

Thursday, January 2, 2014

With the annual conference of the American Historical Association (AHA) starting today I'm excited to see friends and hear some great papers. I'm always struck by just how broad a field 'history' represents but yet how often historians are able to make connections to each other's work, even when far removed temporally and geographically. In reading the AHA's flagship journal, The American Historical Review (AHR), this year I especially enjoyed seeing places where seemingly unconnected articles spoke from similar frames of reference, and most interestingly, from overlapping source bases (be sure to check out my Penn colleague Vanessa Ogle's great article on the history of time reform!).

Authors of articles in the 2013 AHR connected by commonly used archives

As this site indicates, I'm very interested in tracking the circulation of texts, ideas, and archives over time as well as how these sources are used by scholars. Tracking networks of citation is nothing new and has been a favorite activity of scholars for centuries but recently there's been a surge of interest in quantitative analysis of academic citation patterns. Most of this interest has been in the sciences and social sciences where "impact factors" (put simply, the quantity and importance of articles citing one's work) are de rigueur in weighing scholarly merit. Though I'm wary of many of the developments in this "bibliometrics" field, some of the more useful advances have been in using data about authorship and citation to show the material ways fields are constructed, i.e. the influence of certain universities, graduate programs, or scholars in a specific sub-discipline. Here at Penn for instance, my colleagues at the library have helped the school of Medicine and others to create a way for viewing co-authorship networks of particular researchers.

Though tracking citation of articles and secondary sources in a journal like the AHR would really illuminate networks of influence, interest, and argument, I'm more interested in how historians use archival sources. This is especially important given that the bibliometric wizards at big publishing companies like Elsevier and ProQuest have done a decent job at figuring out article and book citations and linking them together, but have had much less success with archival sources.

I extracted data on archival sources from 16 of the 17 feature articles in the five AHR issues for 2013 [1]. The authors of these pieces did not disappoint, citing 66 different archives and libraries located in 54 different cities from Berkeley to Sarajevo to Zanzibar [2].

Despite disparate topics and the relatively random assortment of scholars and articles across the year's issues (as far as I can tell none of the articles were grouped in 'theme' issues) there were several nodes of archival overlap.
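The overlap can be found mechanically once each article is reduced to author/archive pairs. The Python sketch below uses placeholder names and not the 2013 AHR data; it links any two authors who cite the same archive, and the function name `shared_archive_edges` is my own.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder author/archive citations, not the actual AHR data.
citations = [
    ("Author A", "Archive X"),
    ("Author B", "Archive X"),
    ("Author B", "Archive Y"),
    ("Author C", "Archive Y"),
    ("Author C", "Archive X"),
    ("Author D", "Archive Z"),
]


def shared_archive_edges(citations):
    """Return {(author1, author2): set of archives both authors cite}."""
    authors_by_archive = defaultdict(set)
    for author, archive in citations:
        authors_by_archive[archive].add(author)

    edges = defaultdict(set)
    for archive, authors in authors_by_archive.items():
        # Every pair of authors using the same archive gets an edge.
        for first, second in combinations(sorted(authors), 2):
            edges[(first, second)].add(archive)
    return dict(edges)


edges = shared_archive_edges(citations)
for (first, second), archives in sorted(edges.items()):
    print(first, "-", second, "via", sorted(archives))
```

An author whose archives nobody else cites (Author D here) simply has no edges, which matches how isolated nodes appear in a network graph.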

Archives used by multiple 2013 AHR authors

Obviously one year of the AHR is a pretty weak sample but I suspect the pattern established would hold across a wider swath of the journal - i.e. an impressive array of geographically dispersed archives based on the focus of particular authors as well as a concentration of overlapping citation from the major state and university archives and libraries of Europe and North America. Along these lines I would be curious to see how the influence of particular archives has waxed and waned over the years in the profession. I imagine that a select number of repositories (NARA, the UK National Archives, the British Library, Library of Congress, the BN in Paris, various German archives, etc.) have long been dominant across geographic and temporal fields given the institutional makeup of the historical profession but I would also be surprised if the dominance of these central archives hasn't decreased given methodological and theoretical shifts in the discipline since the 1970s.

Tuesday, November 12, 2013

Movement of books from medieval libraries in the MLGB3. Medieval locations (red), Current locations (blue)

Today I'm teaching a workshop on using "screen scraping" in the digital humanities. No workshop is really useful without practical examples so last week I decided to try out my screen scraping chops on an exciting new database of book history data. The Kislak Center at Penn (where I'm Scholar in Residence) is quickly becoming one of the most important sites for book and manuscript provenance research and I wanted to see what I could do to highlight the potential for making extant provenance data more useful through new visualizations.

Several years ago, a few of the scholars behind the monumental Corpus of British medieval library catalogues project (now at fifteen volumes) led by Richard Sharpe began working on an online database to update and provide access to the wealth of information on medieval manuscripts contained in Neil Ker's Medieval Libraries of Great Britain (1941, 1964, and 1987). These volumes include accounts of books and manuscripts known to survive today which once were owned within Great Britain before the mid-16th century. Recently, through grants from the Mellon foundation and others, the team has taken much of this information and made it available online in the MLGB3 searchable database. The site appears to be in beta mode at the moment and intermittently accessible but when it launches fully it will be an amazing resource and the culmination of a good deal of work by Sharpe and others. Looking through the database I was especially intrigued by the wealth of data on the current location of many of these medieval books and manuscripts. Given how comprehensive and detailed the project data is, even at this stage, I wanted to get a sense of what kind of picture would develop if we looked at the points of origin and current location of all these manuscripts in aggregate.

As of last week, the MLGB3's online database included over 6,000 records for books and manuscripts owned by medieval libraries. In order to look at them in aggregate I used the ever-helpful wget utility to pull down each record in order. I was left with a gigantic mess of HTML with the useful data hidden within it. After extensive cleanup and parsing of the data I was able to throw the location names of the original medieval libraries as well as current owners against David Zwiefelhofer's geocoding service (which I believe uses the Yahoo API) to get longitudes and latitudes. This didn't go entirely smoothly as the names of ruined monasteries tend not to register very well in geo databases. Fortunately, there is a wealth of Wikipedia entries providing detailed long./lat. information on a wide range of English historical sites and I was able to fill in the blanks.
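For workshop purposes, the parsing and gap-filling steps can be sketched with nothing but Python's standard library. Everything specific here is hypothetical: the sample page, the class names `medieval-library` and `current-location`, and the coordinates are made up for illustration, and the real MLGB3 pages will certainly be structured differently.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class attribute is a wanted field name.

    Assumes flat markup: capture stops at the first closing tag.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.wanted:
            self.current = css_class
            self.fields[css_class] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data

    def handle_endtag(self, tag):
        self.current = None


def resolve_coordinates(place, geocoded, manual):
    """Prefer the geocoder's answer; fall back to a hand-built lookup table."""
    if geocoded.get(place) is not None:
        return geocoded[place]
    return manual.get(place)


# Invented page standing in for one downloaded record.
SAMPLE_PAGE = """
<html><body>
<span class="medieval-library">Canterbury, St Augustine's Abbey</span>
<span class="current-location">London, British Library</span>
</body></html>
"""

parser = FieldExtractor(["medieval-library", "current-location"])
parser.feed(SAMPLE_PAGE)
record = parser.fields
print(record)

# Placeholder coordinates only; the geocoder "failed" on the second place.
geocoded = {"Place A": (1.0, 2.0), "Place B": None}
manual = {"Place B": (3.0, 4.0)}
print(resolve_coordinates("Place B", geocoded, manual))
```

Keeping the manual lookups in their own table makes it easy to see later which coordinates came from the geocoder and which were filled in by hand.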

Libraries in Medieval Great Britain (MLGB3)

Current Locations of Books from the MLGB3

Worldwide Current Location of Books in MLGB3

What most struck me from this preliminary view (I'll wait until the final MLGB3 release to make sure) is how much less movement there was than I expected. That is, if books owned by medieval libraries are any indication, the cultural patrimony of Great Britain has not moved far from its home. Over 93% (5,900/6,316) of the books from the MLGB3 data show up as being currently held in Great Britain, leaving just 416 in other locations. This visualization of course elides the many movements of books between when they were cataloged or inventoried in the medieval period and when they reached their current place of residence. That being said, I wonder how a similar map of the dispersal of French or German monastic libraries would look. Are 93% still in their country of origin (loosely defined)? I doubt it.

Benedictine Abbey of St. Augustine, Canterbury

Benedictine Cathedral Priory of the Holy Trinity, Canterbury

Psalters in the MLGB3

When the data are finalized I look forward to examining in detail what mapping can tell us about the differential fate of manuscripts from certain locations, or even certain kinds of manuscripts. For example see above for the relatively similar dispersal patterns of two Canterbury libraries or right for the dispersal patterns of psalters. Likewise, in the future I would love to combine the MLGB3 records with those in the Schoenberg Database of Manuscripts (SDBM) here at Penn. For instance, manuscripts from St. Augustine's in Canterbury feature in over 100 transaction records in the database. Similarly, the database staff here has entered over 3,200 manuscripts based on entries from Ker. I can imagine also how the fantastic resources within the MLGB3 project could be linked with extant digitized copies of the manuscripts mentioned. The one Penn manuscript noted in MLGB3 (ID 316, formerly Phillipps 20547 and Lea 23) comes from the church of St. Deiniol in Bangor and it would be fantastic to display the digital facsimile of the ex libris inscription alongside the entry. In other words, there's no more exciting place to be for linked digital humanities data than provenance and book history!

Thursday, October 10, 2013

Here at Penn, the rare books cataloging team has been working for the past several years to put images of bookplates, bookstamps, and other provenance markings online in order to facilitate identification of former owners and libraries. Thanks to the project, I've become increasingly interested in how digital tools might help scholars reconstruct historical libraries and networks of texts.

I've long been interested in the mass movement of books that took place over the 19th and 20th centuries, whether as a result of the dissolution of monasteries, the increased economic and cultural resources of the United States, or the unprecedented tragedies of the World Wars. The wide-scale looting and destruction of books and cultural artifacts by the Third Reich in the 1930s and 40s has drawn an increasing amount of scholarly interest in the past few decades [1]. Even George Clooney is getting in on the action with his upcoming movie on the "Monuments Men" team that worked to locate and preserve works of art during the last months of the war. In reading more about the fate of books and libraries destroyed or stolen by the Nazi regime I was excited to see that the records kept by the central collecting point for looted books at Offenbach were available both in microfilm and (for a fee) online. These records were largely compiled and saved by Ardelia Hall (1899-1979) who was an adviser to the State Department with a tireless focus on returning looted WWII property.

By mid-1946, U.S. and other Allied forces had assembled more than 2 million books from Nazi repositories at Offenbach with the aim of returning books to rightful owners wherever possible. The records of this endeavor are voluminous and are available in some 13 reels of microfilm from the National Archives as NARA M1942. This microfilm series has been digitized by Fold3 and is available to subscribers of that service.

To aid their work of cultural restitution, officers at the Offenbach depot made several albums of photographed bookstamps and marks found inside books in their care, which they organized by apparent place of origin. They also created additional albums featuring markings from private libraries and owners which bore no readily identifiable geographic point of origin. All told the albums contain thousands of ownership marks, a perfect candidate for mapping. Feeling decidedly unqualified to tackle the album of markings from Eastern Europe or the vast number of miscellaneous private stamps, I started with those from Western Europe. The Western European album compiled at Offenbach includes pages categorized by country, i.e. America, Argentina, Austria, Belgium, Denmark, France, Germany, Great Britain, Italy, the Netherlands, Palestine, Spain, and Switzerland, with by far and away the greatest number coming from Germany (344/514). In all there are more than 500 ownership markings present in this geographically sorted album [2].

Each page of the album usually contained many reproductions of book markings crammed together with a reference number but no textual caption. Wanting to create a database of individual library marks, I began by isolating each bookstamp or mark from the album, beginning with those from Germany. I wanted to see geographically where these likely-destroyed libraries and private collections were located and to be able to sort out different types of institutions which had been targeted by the Nazis. The results of this mapping can be seen above and are searchable at http://viewshare.org/views/mfraas/offenbach-bookplates/ [does not work in IE].

In all I mapped 289 library markings to 127 locations with 55 markings remaining unknown to me (images of each individual library mark including the 55 unknown are also available on Flickr). The very top of the list is not surprising: Berlin and Frankfurt are virtually tied (32 and 31) as the cities with the most library markings recorded in the Offenbach album. I was a little surprised, though, that the relatively small city of Hildesheim had as many markings recorded as Hamburg.

It should be kept in mind as well that these figures only represent those library markings in the "Germany" Offenbach album; countless private and otherwise unidentified-by-place markings exist in the other albums. I faced more difficulty in coming up with vocabulary with which to categorize the types of libraries present in this album. The overwhelming majority of book markings of course came from Jewish institutional or private libraries but in my cataloging of the book markings I have largely reserved the "Jewish" library label for institutional libraries such as those of synagogues and communal organizations and not private libraries of those who have names that might suggest Jewish ancestry. As a result, a significant number of library markings are coded as "other." Nonetheless as the map shows below, there is still value in looking at the library markings by type:

Cluster of Jewish libraries near Koblenz

NARA M1942 (reel 12, frame 541)

These caveats pale in comparison though to one of the central problems with making conclusions about wartime destruction of libraries based on the Offenbach albums. The Offenbach team photographed all the provenance marks they could find on a book, and these do not necessarily represent the library from which it was looted. This can be readily seen in the "State Library" category on the map. The stamp of the Bibliothek des Bayerischen Landtags in Munich (right) is included in the "Germany" album but this obviously does not mean that the library was looted by the Third Reich, rather that the book had once been in the collections of that library at some unspecified prior point. Thus without further investigation it is difficult to know from this evidence exactly which library owned a given book on the day it was seized by the Third Reich.

Nonetheless, I think mapping out these places of origin is exceedingly important when done with a more nuanced set of questions in mind. Taking the markings as evidence more broadly of the location of Jewish and other libraries in the decades prior to World War II provides both a kind of historical recovery and might eventually offer data that could be used by scholars to make new arguments about the diffusion of reading and book culture in central Europe as well as its subsequent destruction.

Obviously mapping just shy of 350 library markings is not going to accomplish this task and I'm excited to move forward to try and catalog all of the markings in the Offenbach albums. This can only be accomplished by a large number of participants with the knowledge and language skills to identify often hard-to-read reproductions [3]. Fortunately, the Center for Jewish History in New York has digitized copies of the albums owned by Col. Seymour Pomrenze, one of the American officers assigned to Offenbach. These albums are virtually identical to those in the National Archives and the CJH digitized images are of better quality than the NARA microfilm. Though I haven't cataloged or geo-located them yet I have used the CJH images to put online the remaining 174 library markings from the "Western Europe" album on Flickr. Melanie Meyers and others at the CJH are working on identifying a broad swath of Eastern European and other marks from the albums and I hope in time a more complete picture, usable for research and discovery, emerges.

[2] The "Western European" album (album II) can be found at the National Archives as NARA 260-LM-II-F and on microfilm as M1942 reel 12, frames 506-548. Another copy of this album is in the Colonel Seymour J. Pomrenze papers (P-933) at the Center for Jewish History. An additional copy of the bookplates can also be found at the University of Chicago (Codex Ms 1393). The NARA microfilm has also been digitized through Fold3 and is available online to members of that service at http://www.fold3.com/browsemore/hRyVVKV8Z_1/. For the 344 "Germany" library markings mapped here I have used microfilm images from M1942 via Fold3; for the remaining 174 markings from album II on Flickr I have used digitized images from the Pomrenze papers at CJH.

Monday, July 29, 2013

Last week the Penn Libraries hosted a Rare Book School course on the 15th century European book in print and manuscript taught by Will Noel and Paul Needham. As someone interested in the history of libraries and the movement of books over time, I've long been impressed by the volume of detailed information available in digital form about early European printed books. Online catalogs like the Incunabula Short Title Catalogue (ISTC) and the Gesamtkatalog der Wiegendrucke (GW) contain tens of thousands of entries about these books including the whereabouts of known copies today. In browsing both catalogs I had been surprised by the wide distribution of incunabula in libraries throughout the world. Inspired by the work of the Atlas of Early Printing, I figured it would be interesting to see the global scope of these collections in visual rather than textual form.

Both the ISTC and GW allow users to browse by lists of libraries which hold incunabula but where the ISTC displays library abbreviations/codes (see e.g. this list), the GW actually lists geographic locations with libraries grouped by city. In addition, the GW provides helpfully detailed alternate spellings and names for locations which make them easier to geocode, for example: "Alba Julia [Gyulafehérvár, Karlsburg, Weißenburg]/Rumänien." For that reason I decided to use data from the GW here, which in all contains listings for some 2,330 place names with institutions holding incunabula.
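To show why those bracketed alternates help with geocoding, here is a minimal Python sketch that splits a place string of that shape into its parts. The pattern is an assumption based only on the single example quoted above, not a description of the full GW format, and the function names `parse_place` and `geocoder_queries` are hypothetical.

```python
import re

# Assumed shape of a GW place entry: "Name [alt1, alt2, ...]/Country".
# The bracketed alternates and the country are both treated as optional.
PLACE_RE = re.compile(
    r"^(?P<name>[^\[/]+?)\s*"
    r"(?:\[(?P<alts>[^\]]*)\])?\s*"
    r"(?:/(?P<country>.+))?$"
)


def parse_place(raw):
    """Split a GW-style place string into name, alternate names, and country."""
    match = PLACE_RE.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized place string: {raw!r}")
    alts = match.group("alts") or ""
    return {
        "name": match.group("name").strip(),
        "alternates": [a.strip() for a in alts.split(",") if a.strip()],
        "country": (match.group("country") or "").strip() or None,
    }


def geocoder_queries(place):
    """Candidate search strings to try in order: main name first, then alternates."""
    names = [place["name"]] + place["alternates"]
    if place["country"]:
        return [f"{n}, {place['country']}" for n in names]
    return names


place = parse_place("Alba Julia [Gyulafehérvár, Karlsburg, Weißenburg]/Rumänien")
queries = geocoder_queries(place)
```

Each query string could then be handed to whatever geocoding service is in use, stopping at the first one that returns a match. Since the GW gives country names in German, a real pipeline might also need to translate them first.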

I scraped the raw data from the GW web interface and then parsed it on my own, which resulted in a few problems: while I captured all the place names accurately, some holdings libraries seem to have been lost in the shuffle. I've worked to manually correct these but would not be surprised if further corrections are needed. Likewise, the GW helpfully lists some libraries which formerly owned incunabula and which are now defunct or subsumed into other libraries. For example, for Philadelphia, I know that the number of holdings libraries listed (19) includes the former Mercantile Library of Philadelphia with 5 incunabula. All of these books are now in the Free Library of Philadelphia, which means that the total for Philadelphia in my visualization includes one extra holdings location and 5 extra incunabula. In addition, and most importantly, my results from the GW are most useful in counting editions rather than actual physical books. That is, while there may be just over 5,000 separate 15th c. editions in Stuttgart, the Landesbibliothek there holds closer to 7,000 actual 15th c. books as a result of having multiple copies of the same edition (many thanks to Paul Needham for pointing this out). As a result, the exact numbers contained in the visualization should be taken with a grain of salt.

Top 15 cities by holdings of Incunable editions. Number of editions in center column, number of holdings institutions in a given city in right column.

So, despite these caveats, what does the data look like? The top 15 list is hardly surprising: Munich tops the list thanks to the Bayerische Staatsbibliothek and its massive collection, but thinking geographically rather than nationally, Rome would come out as the clear winner if Vatican City and its libraries were included. Likewise, if judging by number of libraries/institutions reporting incunabula holdings (admittedly a somewhat hazy category), London emerges as the extreme outlier. I found the numbers further down the list more surprising. I would not have guessed that Dallas (1013) holds roughly the same number of early printed editions as Zurich (1002) or that Copenhagen (4146) would have a more diverse collection than Venice (3464), one of the centers of early printing.

That being said, if anything the map hews more closely to the geographic origins of the books themselves than I fully realized (excepting the large holdings in the US of course!). The densest clusters of holdings institutions and indeed of incunabula themselves are in the homelands of early printing, German-speaking central Europe and Italy. Compare for example the two maps below, one from the current holdings data and the other from the excellent Atlas of Early Printing showing where incunabula were actually printed. The two pair up pretty well!

I expected that monastic dissolution and library centralization throughout the 19th century would have resulted in a fairly spread-out pattern of incunabula holdings, with capital cities and regional centers being the big players and a few scattered libraries in between. This certainly seems to be the case in France and Spain, where provincial cities and towns are less well-represented. In central Europe, though, the big state and university libraries may have a large share of books, but there are still hundreds of small religious colleges, town libraries, and monasteries holding incunabula in the hinterlands. (If anyone is interested, the weighted geographic center of all current institutions holding incunabula is near the Atlantic coast of France outside of Nantes.)
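For readers wondering how a weighted geographic center like that can be worked out, here is one common approach sketched in Python: turn each latitude/longitude pair into a point on a unit sphere, take the weighted average of those points, and convert the result back to degrees. This is an illustration under my own assumptions, not necessarily the calculation the mapping software performed, and the choice of weight (for example, the number of editions at each place) is also an assumption.

```python
import math


def weighted_center(points):
    """Weighted geographic center of (lat_deg, lon_deg, weight) tuples.

    Each location is converted to a 3D unit vector, the vectors are
    averaged by weight, and the mean vector is converted back to
    latitude and longitude in degrees.
    """
    x = y = z = total = 0.0
    for lat_deg, lon_deg, weight in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += weight * math.cos(lat) * math.cos(lon)
        y += weight * math.cos(lat) * math.sin(lon)
        z += weight * math.sin(lat)
        total += weight
    if total == 0:
        raise ValueError("total weight must be non-zero")
    x, y, z = x / total, y / total, z / total
    center_lon = math.atan2(y, x)
    center_lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(center_lat), math.degrees(center_lon)


# Two equally weighted points on the equator, 90 degrees of longitude apart.
center = weighted_center([(0.0, 0.0, 1.0), (0.0, 90.0, 1.0)])
```

Averaging on the sphere rather than averaging raw latitude and longitude values avoids distortions when locations are spread across continents, as the large US holdings are here.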

Incunabula holdings in the Adriatic Region

These maps also drew my eye to blank spaces, which in turn highlighted borderlands between book-dense areas and those with relative scarcity today. The Adriatic seems to be one such area: its string of Catholic and state libraries extending down the Croatian coast, including Dubrovnik, Zadar, and Šibenik, serves to highlight the lack of 15th-century printed books in the interior of the former Yugoslavia - perhaps reflecting the ravages of war, different book/manuscript cultures in Orthodox and Muslim regions, or just the simple lack of good library data.

Something similar struck me about the region to the east of Berlin and the west of Poznan, a seemingly "empty" salient stretching south from the Baltic Sea (left). I know next to nothing about this area but would have expected a more even distribution of libraries.

Of course, scale is everything. While the views above are intended to highlight cities which possess truly significant incunabula collections, the map below is perhaps a fairer representation of the data - with the sizes of the dots scaled by quartiles. In this view, the truly broad range of holdings locations comes into play, as on this map the top quartile (largest dot) is reserved for any place holding 65 incunabula or more - a seemingly low bar which reflects just how many locations own a very small number of early European printed books.
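As a rough illustration of quartile scaling, the Python sketch below computes the three cut points for a list of holdings counts and assigns each place a dot-size class from 1 to 4. The sample counts are made up for the example, and the mapping tool's own quartile calculation may differ in its details.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_cuts(counts):
    """Return the three cut points that divide counts into quartiles."""
    return quantiles(counts, n=4)


def quartile_bucket(count, cuts):
    """Map a count to a dot-size class from 1 (smallest) to 4 (largest).

    A count equal to a cut point falls into the higher class, matching
    the "65 or more" reading of the top quartile.
    """
    return bisect_right(cuts, count) + 1


# Hypothetical holdings counts per place, for illustration only.
sample_counts = [1, 1, 2, 3, 5, 8, 20, 65, 300, 5000]
cuts = quartile_cuts(sample_counts)
classes = [quartile_bucket(c, cuts) for c in sample_counts]
```

Because the classes depend only on rank, a place with 5,000 editions and a place just over the top cut point get the same dot size, which is why this view flattens the big collections and brings out the long tail of small ones.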

Current Incunabula Holdings Worldwide - scaled in quartiles.

Finally, this world-view impressed on me the lack of reported holdings in North Africa and the Middle East generally. The fact that there are only four incunabula from Istanbul reported in the GW is somewhat shocking (for more see Les incunables de la bibliotheque des Musees Archeologiques d'Istanbul). Considering the place of the Ottoman Empire in Mediterranean and world history, the lack of greater numbers of early printed books in Turkish libraries begs an explanation (library destruction? lack of cataloging?). Likewise, the lack of reported holdings in Egypt prompted me to start searching library catalogs. I found six unreported in the new Bibliotheca Alexandrina but am sure there must be more in other Egyptian libraries as well.

I look forward to discovering more in the data over the coming weeks and I can't stress enough how important rich bibliographic databases like the ISTC and GW are for scholars. They are exceptional resources that took decades of work to put together. Given the amount of work that went into creating their data I hope that in the future there will be a way for both to offer machine interfaces which make the downloading of raw data simple and these kinds of visualizations second nature to researchers.