August 24, 2011

Note This

Too Much to Know: Managing Scholarly Information Before the Modern Age

By Ann M. Blair
(Yale University Press, 397 pp., $45)

In 1945, in an article called “As We May Think,” Vannevar Bush evoked a specter for the modern age beyond the bomb: information overload. He warned that the scientific progress of humankind would be “staggered” by the “growing mountain of research” made by “thousands of researchers.” A professor of electrical engineering at MIT in the 1920s and 1930s, Bush became science adviser to Franklin Roosevelt, and would go on to head the Manhattan Project until 1943, later becoming founding director of the National Science Foundation in 1950. Drawing on his background in electrical engineering and scientific data management, Bush came up with the “memex” machine, a response to the unmanageable mass of modern scientific knowledge. It was an ultra-sophisticated microfilm data compiler that contained an indexing system for finding and cross-referencing topics. With it, Bush claimed, miniature libraries with a million books could be reduced to fit into the back of a truck. Flawed in its design and never actually built, the memex was nevertheless a first step by a computer scientist towards hypertext machines, computerized indexes that can search their own internal or external archives. Today we would call the memex a proto-Web browser. A wartime public servant with a classical education, Bush recognized the problem of modern information overload. Yet he was also keenly aware that the quest for searchable data banks had its origins in a long historical tradition. As an example he cited Leibniz (the inventor of infinitesimal calculus, an erudite scholar, and an information-obsessed librarian), who conceived an unrealized design for a calculation machine. In this way Bush gave the memex a historical context, pointing out that it was in part a product of the abacus, and a link in a historical tradition that connected accountants, encyclopedists, and scholars to mathematicians and computers. Past traditions of learning would, in Bush’s eyes, evolve into innovation centers of the future: “The inheritance from the master becomes, not only his additions to the world’s record, but for his disciples the entire scaffolding by which they were erected.”

Modern information impresarios do not always share Bush’s long view of the historical origins of information technology. Larry Page of Google talks about the past in terms of a few years, and locates Google’s origins in the market’s need for a search engine and the quest for profit. As computer engineers rush to innovate, compete, and (as Page boasted) to scan “all the books in the world,” they have taken scant notice of the scholarly interest and involvement in the history of information management from ancient to modern times. While no academic project has matched the industrial might of Google’s scanning of millions of books—a sort of clumsy bibliographical genome project—scholars and academics are some of the most creative and critical users of the Internet. They have pioneered many successful digital book and reference projects, from the online compilation of almost all extant scholarly articles to digital book and archive collections, with a greater sensitivity to access and accuracy than Google Books. Scholars such as Robert Darnton, Peter Burke, and Anthony Grafton have written about the long and colorful history of information.

But in response there has been no serious attempt by digital media developers to engage in a constructive public dialogue with historians of information and leading librarians. There is, perhaps, a reason for this. As Geoffrey Nunberg starkly revealed in 2009 in the Chronicle of Higher Education, Google cannot celebrate the history of indexing and cataloguing because it would draw attention to its matrix of errors. As yet, Google Books does not work as an accurate system of cataloguing and searching for books. Nunberg showed that the seemingly clunky nineteenth-century Library of Congress Classification system is still more accurate. So intellectual history can still offer practical models and lessons to the titans of the Web.

THE IDEA that the history of scholarship is central to understanding the origins of information technology is the implicit argument of Ann M. Blair’s elegantly conceived book. If the Web has a pre-history, Blair convincingly shows that Renaissance humanists were some of its central players. While they were certainly not the only early information innovators, they pioneered data collection and referencing systems, and developed what today we call information management. Blair tells the story of how humanist compilers and scholars handled the perception that there were too many books to know. What interests her is their response, the nuts and bolts of information management: the note scraps, collections, tables, and indices that went into making the massive and searchable reference books of the Renaissance.

Blair’s previous work demonstrated that the practices of literary reading and writing were central to the rise of scientific method. Focusing on the lawyer and scientist Jean Bodin in the sixteenth century, she meticulously examined how Bodin collected commonplace reading notes and then stored and analyzed them as scientific evidence. What Blair illustrates on a grand scale in her new book is that Bodin’s practices of reading, note-taking, archiving, and compilation—his creation of reference systems—were not just part of an esoteric world of high learning. Reference books permeated the literate world and led to innovations in information management and the basis of modern research. Long before Vannevar Bush, Francis Bacon cited Seneca to describe a similar ambition for his scientific method, which he hoped would “abridge the infinity of individual experience ... and remedy the complaint of vita brevis, ars longa.” Data management, in sum, has a long history. And the early information masters, in Blair’s extraordinary account, fell prey to the same ambition and hubris in the quest for universal knowledge that plagues Google Books.

Every age has its own form of information overload and response. The great library of Alexandria, which began around 300 B.C.E., created a cataloguing system called Pinakes to manage the estimated 500,000 books in the collection of the Ptolemaic pharaohs. The Pinakes were sophisticated bibliographical lists containing title, incipit (the first few lines of each text), the number of lines for each work, and a subject and author index. Ancient authors managed their own glut of books with collections of aphorisms, such as those by Seneca. Pliny the Elder boasted that his Natural History had collected twenty thousand “things” from two thousand volumes by one hundred authors, with tables of contents, indexes, and bibliographies.

In late antiquity, Christian culture developed its own learned forms of reading and interpretation. In fourth-century Caesarea, on the coast of Palestine, the great ecclesiastical historian Eusebius not only assembled a library of sources on Christian theology and history but also created a system of cross-references for reading these sources. His scholarly workshop added complex indexes, reading aids called the “canon tables,” written into the Bibles that they produced. Where there were catalogues, reference books followed. Blair shows that alphabetical and bibliographical dictionaries also proliferated in the medieval Islamic world and Asia, where natural philosophers, physicians, and compilers such as the Egyptian Ahmad al-Qalqashandi developed reference works of classical and modern learning and devised tables of contents and complex reference systems. In the eighteenth century, the Qianlong emperor of the Manchu dynasty created a compilation of all known extant works, which comprised 79,000 chapters in 36,000 volumes. It contained 800 million words, a total that has only now been surpassed by Wikipedia. Navigating such a text was surely as challenging as copying it, which took 3,800 copyists ten years to achieve.

Standard practices of note-taking and the collection of references grew in the Christian religious reading practices that emerged in the world of thirteenth-century ecclesiastical scholarship. Florilegium, or “reading for flowers,” was a method of reading to take notes to create reference works and collections of aphorisms. These notes were the raw materials with which scholastic churchmen created their massive Latin compilations. Blair explains how they both “diffused” and “reinforced the canon of authors” of the medieval church. To extract knowledge successfully from reading was to “deflower” a book, as explained by the preface to the twelfth-century Libri deflorationum. Learned monks deflowered their books with gusto, for the information masters were often gluttons of knowledge. Extracts came from the Bible and the church fathers as well as from known classical works: Ovid, Virgil, Horace, Cicero, Juvenal, Lucan, and Seneca. Vincent de Beauvais’s Speculum maius, in 1255, was the most ambitious compilation of summaries and excerpts of its time, containing some 4.5 million words.

THEN CAME 1439 and the advent of print, and an explosion of new books with the Renaissance humanist discovery of the classical tradition and the so-called New World and the textual arms race of the Protestant and Catholic Reformations. Between 1500 and 1550, Blair counts roughly 4,373 British printed titles. By the late seventeenth century, the pace grew to two thousand per year. At the end of the eighteenth century, eight thousand titles appeared per year (it should be noted that an estimated total of around 450,000 English-language titles appeared around the globe in 2004). In sixteenth-century Europe, four thousand titles contained a potentially overwhelming amount of information on new topics ranging from Reform and Counter-Reform theology, travel, history, ethics, and rhetoric to the discovery of ancient texts and the expanding world of natural science.

According to Blair, the humanists claimed to be overwhelmed by a “flood and overflow” of books. Seventeenth-century authors, such as the French political philosopher and librarian Gabriel Naudé, believed it impossible to acquire the necessary knowledge of books by one’s own labors. Only through reference works could one know all the necessary books. François de La Mothe Le Vayer, the libertine skeptical philosopher and tutor to Louis XIV, worried that the proliferation of so many new books would discourage authors from writing. By the end of the seventeenth century, Leibniz apocalyptically evoked “that horrible mass of books which keeps on growing.... the indefinite multitude of authors will shortly expose them all to the danger of general oblivion.” One scholarly editor warned in 1685 that the “multitude of books” would lead to a collapse of civilization, much like that of Rome in the age of barbarian invasions.

The humanist remedy for information overload was to produce an unprecedented number of manuals about how to read books. They outlined what Blair calls the four S’s of early modern information management: storage, sorting, selection, summary. If the World Wide Web is essentially a cross-index of texts and words, Blair is able to trace such practices of cataloguing and indexing on a much longer information highway than has previously been mapped. The long history of the rise of reference systems and the modern search engine shows the potential and the power of seemingly banal textual practices. In effect, Too Much to Know is a reference book about reference books, containing chapters on early “information management,” note-taking, reference genres and “finding devices,” compiling, and the impact of reference books.

In the rich but relatively slow-moving world of manuscripts, the arrival of printing in 1439 certainly inspired a dread of having too much to know—but it was also a moment of opportunity and entrepreneurialism. Blair notes that along with his first printed Bible of the early 1450s, Gutenberg also printed an edition of the Italian Johannes Balbus’s thirteenth-century Catholicon, which contained a Latin grammar of the Bible and an explanation of all its terms. Gutenberg “anticipated” subsequent print runs. There was money to be made in reference books, even old ones, and print made them even more profitable. It also made them more readable, not only through clear type but also through the standardization of tables of contents, page numbering, indexes, and wide margins that included bibliographical references as well as space for readers’ own manuscript annotations.

In spite of their protests, it is hard to imagine that the classical humanists who emerged as intellectual, political, and spiritual leaders in the fifteenth and sixteenth centuries truly feared information overload, at least where their own work was concerned. Many made their living, or at least their reputation, authoring reference works, not to mention their own abundant books of scholarship and pamphlets. In particular, the commonplace book (a collection of aphorisms, maxims, or sentences from holy and profane books) emerged as a major literary genre. Erasmus earned fame with his best-selling collection of Adages. And although he complained of new books of lesser value than those of the ancients (“Is there anywhere on earth exempt from these swarms of new books?”), the great humanist of Rotterdam wrote several daunting shelves of tomes, including his defining Greek-Latin New Testament, the Novum Instrumentum. Conrad Gesner, the German author of the founding work of modern bibliography, the boldly titled Bibliotheca Universalis, claimed to list all known extant books in learned languages (Greek, Hebrew, and Latin) of eighteen thousand indexed authors. While he complained of a “harmful abundance of books,” he nonetheless gained his fame by cataloguing them.

Jesuit manuals such as Jeremias Drexel’s Aurifodina, subtitled The Mine of All Arts and Sciences, or the Habit of Excerpting, explained how to best take notes from reading to create commonplace books: personal notebooks of reading extracts that contained the religious, ethical, and political maxims deemed necessary to lead a good life. There were even admonitions about which texts not to read and how not to fold page corners or to mark texts with fingernail scratches. Scholars, householders, merchants, lawyers, the pious, and the learned wrote extracts in personal commonplace books, often thematically or alphabetically organized, which they often carried on them for reference.

AS PAPER became ever more abundant from the fourteenth century onward, note-taking proliferated, expanding from erasable wax tablets (the method used by Cicero and medieval wool merchants) and erasable donkey skin to permanent slips of paper and notebooks. An early-modern term for notes was “scraps.” Piles of them were called scrap heaps, and tragically for historians, most notes ended up there. Yet notes made in the margins of great printed books survived, and they are like rare seashells in the sands of the libraries. Today scholars sift through the margins of books for annotations and notes, often by great historical figures such as John Adams, who filled the books of his personal library with notes, leaving remarkable traces of how he read, as well as his opinions on politics, philosophy, and his contemporaries.

The overload of books produced an overload of notes. Scholars made many thousands of them. Ulisse Aldrovandi, the great Bolognese natural philosopher and collector of specimens (such as the large lizards that adorn his university library museum), wrote four hundred volumes of manuscript notes. Joachim Jungius, a German professor of mathematics, medicine, logic, and natural philosophy, was famous for having produced an estimated 150,000 pages of notes. But as Blair makes clear, the vast collections of scientific notes were not simply the mad scratchings of obsessive pedants. Commonplace notes comprised the data for internationally successful scientific works, such as Jean Bodin’s Universae naturae theatrum.

Ideally, skilled readers organized notes into personal “arks of study,” or data chests. Vincent Placcius’s De arte excerpendi contains an engraving of a note cabinet, or scrinia literaria, in which notes are attached to hooks and hung on bars according to thematic organization, as well as various drawers for the storage of note paper, hooks, and possibly writing supplies. Both Placcius and later Leibniz built such contraptions, though none survives today. While these organizational tools cannot be directly linked to modern computers, it is difficult not to compare them. Placcius’s design looks strikingly like the old punch-card computation machines that date from the 1880s, and the first mainframes, such as the 1962 IBM 7090.

For most readers, the heroes of Too Much to Know will be esoteric figures of scholarship, often with Latinized names, forgotten until recently by all but the most erudite historians. Blair reminds us that these compilers and encyclopedists were famous in their own time, and often best-selling authors of popular and universally known reference works. Most notable among them was Theodor Zwinger, a celebrated Swiss travel writer, medical expert, and natural philosopher, and the author of the best-selling reference work Theatrum Vitae Humanae. Zwinger boasted that his twenty-nine volumes contained not precepts, but examples from all of human history which would be “useful” in all spheres of life, from theology, ethics, and history to mechanics, physics, and mathematics, which he combined for the first time in an interactive system. Headings, such as the emperor “Claudius,” are followed by extracts on the subject from various authors. Zwinger made his compilation by means of textual bricolage: he built on the work already done by his father-in-law, cut and pasted slips of paper often torn from other books, and wrote additional information in the margins.

As the erudite Samuel Hartlib explained in 1641, “Zwinger made his excerpta by being using [sic] of old books and tearing whole leaves out of them, otherwise it had beene impossible to have written so much if every thing should have beene written or copied out.” What made Zwinger’s collection of examples appealing to his reader was not only his copiousness, but also his quest to represent a true order of all the things in the world, derived from his massive sampling of nature and all the wisdom of the ages in a diagram. He designed a ten-page diagram of headings (dispositio titulorum), which contained up to seven layers of subdivisions. Then as now, the great information compilers were a proud breed: Zwinger was ambitious, or foolhardy, enough to claim that his book recreated the total knowledge of the Earth that God would possess on Judgment Day. He did take the precaution of dedicating the project to God, and he devoted a whole volume to examples of human hubris.

Inhabitants of the digital age both warn and gloat about the malleable quality of Wikipedia, but most of them use it. Yet old printed books changed too, albeit at a slower pace. Works such as the Theatrum show that humanist reference books, like modern Web pages, were adapted and edited over time, over and over again. The Theatrum was re-edited ten times in 140 years, growing threefold in the process. Along with Zwinger, printers and their in-house compilers added indexes and rearranged the book. In 1631, the Hieratus brothers’ printing house in Cologne and the Antwerp publisher Laurentius Beyerlink published an entirely new edition of the Theatrum, doubling its size to ten million words, creating a complex table of contents with full lists of headings and subheadings, with the confident title Magnum Theatrum. A prefatory text explains how a printing-house team of compilers tore apart Zwinger’s old book and pasted it back together again with new material in a new order. They added new headings and alphabetical order and indexes to make the work easier to consult. Today the Magnum Theatrum is digitized and on the Web, but Zwinger’s original Theatrum is not. Even the most important data eludes the best information systems.

WHAT IS MISSING in this story is an examination of the inherently Promethean quality of mastering and organizing massive amounts of data. No matter how sophisticated, information management does not always work. In spite of super cross-referencing computers and epic algorithms, the most basic financial data or political intelligence can fail to get to the desk of the right analyst. Experts, scholars, and administrators practice the remarkable human activity of ignoring the data in front of them, or the very systems that they have designed to manage it. Leibniz makes a good case in point. Three hundred years before Einstein, he, too, kept a messy desk. A father of mathematics, a famous historian and philosopher, the builder of calculation machines and scrinia literaria, and the librarian of the massive ducal collection in Wolfenbüttel, Leibniz was nonetheless very bad at organizing his papers. Indeed, while he was a librarian, he attempted to catalogue the more than 200,000 books in Wolfenbüttel. Each title was written on a scrap of paper. He placed the almost 120,000 reference scraps (still only half the library) not into an organized scrinia, but into a bag. Many were misplaced or spilled, and at Leibniz’s death, in 1716, the failed project had succeeded only in closing down the library for nine years. The catalogue was not finished until years after his death.

Why did a figure such as Leibniz fail to use his own tools? Perhaps messiness was the source of his creativity. This is a fact of intellectual originality with which Google must still grapple: libraries, after all, allow for the type of manageable disorder that is often the spark of creativity. Or maybe Leibniz resisted the very order of things, over which his calculus gave him a unique mastery. If anything, the rejection of systematized information-handling methods could be as common as their adoption. Humanists had the tools and even the concepts to invent the cross-referenced thematic library catalogue, but they did not do so. We do not know why it took several hundred years and the Italian director of the British Museum, Antonio Panizzi, to create a truly modern reference catalogue through his “Ninety-One Cataloguing Rules” in 1841.

Blair acknowledges that her book is just a beginning in this long project, itself an encyclopedic enterprise, and that scholars constitute only one element in the history of information management. Accountants, notaries, sailors, and administrators (mathematicians are oddly absent from this story) also worked on information-handling techniques at the same time as humanists such as Zwinger. We still do not understand how information practices from the worlds of learning, finance, industry, and administration cross-pollinated. From the fourteenth century onward, accountants developed complex instructions for note-taking to describe holdings and transactions, as well as for the recording of numbers and calculations. By the seventeenth century, merchants, and indeed ship captains, engineers, and state administrators, were known to travel with trunks of memoranda, massive inventories, scrapbooks, and various ledgers and log books that mixed descriptive notes and numbers. By the eighteenth century, tables and printed forms cut down on the need for notes and required less description and more systematic numerical notes. Notaries also were master information handlers, creating archives for their legal and financial documents and cross-referencing catalogue systems.

It was not just curiosity that inspired information management. As modern tech pioneers would surely recognize, good catalogue systems are essential when legal rights, inventories, and financial obligations are at stake. During the Renaissance, accountants such as the Tuscan Luca Pacioli and the Dutchman Simon Stevin were scholars, too. And Denis Diderot, the principal author and editor of the Encyclopédie, was the philosophizing son of an artisan who mixed the scholarly, the technical, and the mundane in his famed multi-volume reference work. The challenge of the history of information is that it defies academic disciplines and professional fields.

Blair skillfully explains the success of the great humanist compilers and indexers. She expresses confidence in the progress of the long struggle to master information overload. She is certainly right that we are living in a golden age of information technology. Yet the starts and stops of figures as ambitious as Zwinger make one suspect that Google Books might prove to be an experiment much like Leibniz’s spilled catalogue: a messy attempt among many to create a universal library. Google has created neither the perfect universal library nor the definitive information system. The story is not yet over. Its ultimate lesson is that only historical perspective and technological vision will bring us closer to the real but unattainable goal of mastering all the knowledge in the world.

Jacob Soll is a professor of history at Rutgers University and the author, most recently, of The Information Master: Jean-Baptiste Colbert’s Secret State Intelligence System (University of Michigan Press). This article originally appeared in the September 15, 2011 issue of the magazine.