The Newsroom blog

84 posts categorized "Newspapers"

The British Library (and its previous incarnation as a department of the British Museum) has been collecting newspapers for over 200 years. The ways in which these items have been acquired, processed, and stored have changed over time as priorities, policies, locations and technologies have developed, but some historic practices have had interesting implications for our current Heritage Made Digital programme to digitise a number of 19th-century newspapers. The newspapers we are digitising are mostly London based, and we are focussing on titles that have a number of volumes in poor or unfit condition, with the aim of filling in some gaps that currently exist in the digital archive.

Multi-title newspaper volumes

One of the practices that has had a real impact on the way we have approached this digitisation project is the way in which items have been bound in the past for storage and preservation. When thinking about how newspapers are stored by the British Library, most users probably imagine that complete runs of a title, for instance the Morning Herald, would be held together. But that hasn't always been the case. For ease of processing and storage and to conserve space, many newspapers were held in annual sequences, rather than in title runs. So newspapers published in 1832 were held together, and these were followed by the titles published in 1833.

For many newspapers, particularly the dailies, this simply means that one or more volumes of a title are held in each yearly sequence. But for a significant part of the collection, mostly weekly titles of 12 pages or fewer, for which the British Library collected only one edition, there simply wasn't enough material to make up a bound volume on their own. This has meant that there are many volumes containing two, three or sometimes more titles for a year. For example, a volume currently sitting on my desk contains both The Ballot and the Weekly Times for 1831.

The Ballot and the Weekly Times in one volume

This practice saved space and money by reducing the number of bindings produced for each year. There were no strict rules about how many titles or pages constituted a volume, and practices varied over the years, but mostly newspapers of a similar size are bound together, as this makes them easier to store. In most cases all of the items bound into a volume are newspapers, but occasionally periodicals are bound in with them, and these fall under the care of other departments within the British Library.

This has led to some complications in the workflow of our digitisation processes. The catalogue records for newspapers do not contain details of how they are bound, so we are often unaware of whether items are bound individually or in multi-title volumes, or which items are bound together. This sometimes means that a single volume is called up multiple times by our digitisation team, as several titles from our list are bound together for one or more years. It has also made the process of scanning titles more laborious and complicated. Staff do not simply open a volume and scan the contents; they have to identify the correct title and work out where it starts and ends, checking this against the details that are on the catalogue records.

It has also raised some interesting questions for our digitisation project. What impact does going through the digitisation process several times have on volumes, particularly as many are already in a poor or unfit condition? If one of the titles we are digitising is bound in a multi-title volume, should we be digitising all of the other titles with which it is bound? Should we be digitising the periodicals that are contained within these volumes, even though they are not officially part of the newspaper collection? How far should what is digitised follow the physical reality of what is archived?

We are still working to answer some of these questions. In general we have had to stick to digitising only the items already on our list, as otherwise the numbers could spiral out of control, and we might end up digitising large numbers of titles that do not meet our criteria (i.e. in a poor or unfit condition; out of copyright; and with a circulation beyond London). We look closely at the other titles we come across, and assess them against our objectives, but in most cases there are reasons why they had not already been selected.

Despite their complications, these multi-title volumes do also provide opportunities. I will talk in a future post about serendipity and its role in newspaper research, but it has also played a very small role in our selection process. As mentioned above, in most cases we have stuck only to those titles already on our selection list, but there have been a few occasions when, looking at a volume, we have stumbled across another newspaper that has proved interesting enough to make it onto our list. It has also made us think a lot about how and why things were done in the past, and how practices evolve, giving us a better understanding of how the collection was, how it currently is, and how it could be in the future.

Beth Gaskell

Curator, Newspaper Digitisation

Posted by Luke McKernan at 1:24 PM


The British Library is currently engaged on a major programme entitled Heritage Made Digital. The aim of the programme is to transform digital access to the British Library's heritage collections by streamlining digitisation workflows, undertaking strategically led digitisation and making existing digitised content available as openly as copyright and licensing agreements allow. Heritage Made Digital is embracing a wide range of materials, from manuscripts through to sounds, and one of its major elements is newspapers.

Unfit newspaper volumes awaiting conservation inspection

The first thing to ask is why the British Library needs to be digitising newspapers, when we already have a very productive relationship with family history company Findmypast, which selects and digitises newspapers for the British Newspaper Archive, providing us with digital preservation copies in the process. It has digitised over 20 million pages from our collection, and adds hundreds of thousands of extra pages each month.

The simple answer is that there is more that we would like to see digitised that isn't likely to get digitised soon otherwise. The greater part of newspapers processed by Findmypast come from our microfilmed copies, because it is so much easier and quicker to do so (about eighteen times quicker than digitising from print). But only a third of our collection of some 60 million newspaper issues has been microfilmed. Of the newspapers for which we have only print, some get digitised, but many do not. In part this is because of the condition of many of the newspapers, often produced using low-quality newsprint and for many years not stored in optimum conditions. We define the preservation status of our newspapers under three categories: good, poor and unfit. Unfit no one gets to see, even onsite, unless we have a microfilm or digital access version. And around 4.5% of our collection (or 20 million pages) is in an unfit state and with no microfilmed or digitised copy available. That's a lot of newspapers not to be making available at all.

So, for Heritage Made Digital, we have chosen to concentrate on newspapers in a poor or unfit condition. This is not as straightforward as it might sound, since few runs of a newspaper title (i.e. from its first date to its last date) exist under one condition status. One volume may be good, another poor, another unfit (e.g. with a broken spine, crumbling pages etc). Therefore, although we want to concentrate on poor or unfit newspapers, we also want to digitise full runs of newspaper titles, because this will make best sense for researchers. In practice, we find that 40% of the volumes we are digitising for Heritage Made Digital are in a poor or unfit state.

We have set other restrictions for ourselves, with the aim of offering the best result for the widest range of research users. We are only digitising newspapers that are out of copyright, so that we can make the results freely available online - both the digitised pages and the data created by digitisation. Calculating when a newspaper goes out of copyright is complicated, but we are sticking to a 140-year rule - so the run of the newspaper has to have ended by 1878.
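
The 140-year rule comes down to simple arithmetic. As a rough sketch in Python (the function names are hypothetical, and the reference year of 2018 is an assumption inferred from the 1878 cut-off; this illustrates the rule of thumb only and is not a statement of copyright law):

```python
RULE_YEARS = 140  # the rule of thumb described above


def latest_eligible_end_year(reference_year: int, rule_years: int = RULE_YEARS) -> int:
    """Return the latest year in which a title's run may have ended."""
    return reference_year - rule_years


def is_eligible(run_end_year: int, reference_year: int) -> bool:
    """True if a run ending in run_end_year falls within the rule."""
    return run_end_year <= latest_eligible_end_year(reference_year)


# Assumed reference year: 2018 - 140 = 1878
cutoff = latest_eligible_end_year(2018)
```

Under that assumption, a title whose run ended in 1876 would qualify, while one that continued until 1880 would not.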

Next, we are primarily digitising newspapers that were published in London but which were distributed outside London as well. So, not newspapers for the areas of London only (i.e. London regionals), but metropolitan newspapers with a wider circulation. Curiously enough, this is a neglected area for newspaper digitisation. The British Newspaper Archive focusses heavily on British regional newspapers, while the main UK national newspapers available digitally are almost entirely those where the title still exists (e.g. The Guardian, The Times). In other words, we have identified a gap, one which we think will make a significant difference to what is available online so far.

We are not in competition with Findmypast, however - in fact, we are working closely with them. Every newspaper that we digitise will be made freely available via the British Library's catalogue, but they will also be made available via the British Newspaper Archive (a subscription site). That means that almost all of our digitised newspapers will be searchable - by title, date and word - in the one place. As things stand, the newspapers will be appearing on the BNA first, and secondly (at a date still to be determined) through the British Library catalogue, using the Universal Viewer display tool (a development project still in progress).

Waiting to be digitised

So, what are we digitising?

It will be around 1.3 million pages, 1 million from print and another 300,000 from microfilm. We're still choosing the titles to digitise, even as we start digitising, as we find out more through a process of preservation need and research, but it will be somewhere around 180 newspaper titles, many of them short runs of a year or less. We can't provide a definitive list as yet, but these are some of the titles (with title changes) that have gone to our imaging studios already:

The News / The News and Sunday Herald / The News and Sunday Globe (1805-1839)

People's Weekly Police Gazette (1835-1836)

Pictorial Times (1843-1848)

The Saint James's Chronicle (1801-1866)

The Sun / The Sun & Central Press (1801-1876)

There is a lot more that we have planned. We're exploring academic partnerships (we're already working closely with the recently-announced British Library/Alan Turing Institute data science project Living with Machines). We're aiming to do creative things with the data. We will be publishing blog posts, both about the content and about the decisions we're making on what gets digitised. We will be producing online guides and research tools, aimed at both the specialist and the general user.

We think that we have come up with a model for the digitisation of newspapers, in particular the way in which we are working in partnership with Findmypast, which will be particularly productive. We certainly hope to build on it beyond the life of the project. We can't show you any newspapers digitised through Heritage Made Digital, or offer any free datasets, as yet. But we will do soon.

It's worth remembering that the British Library has 60 million newspapers, from 1619 to the present day. After a decade or more of intensive work, we have digitised just 5%. There is a long, long way to go.

These newspapers survive at the British Library, and, looking at them, they are remarkably close to the newspapers of today. What we see is a sheet of paper: portable, foldable, shareable. There is a masthead with the title of the news publication. There is a date - strictly speaking, a date for the first story. There are stories, arranged in columns, with a shared currency. It gives a shape to the news, with the promise of more to follow.

The newspaper has been a remarkably successful publishing model, sustained in this country, after an unsteady start, for nearly 400 years. The newspaper and its print variants flourished, with the inhibitions of censorship, taxation or regulation failing to halt their progress. The newspaper informed, entertained and helped define the nations and regions that it served.

The newspaper went largely unchallenged as a medium of news for nearly three hundred years. Certainly there were variations on the form, from periodicals to broadsides, and changes were brought about in size, illustration, distribution patterns and so forth, but essentially the news meant the newspaper.

Title image of a 1911 edition of Pathe's Animated Gazette, British Pathe

The newsreel did another revolutionary thing. It invited the audience to widen its understanding of the news, even to have a measure of control over it. Owing to the complexities of film processing, newsreels could not be published daily. They were published bi-weekly, matching the common pattern of cinema attendance (i.e. most people were going to the cinema twice a week), and deliberately chose news stories which had featured in the newspapers previously. You had read the story, now you could see it in motion. You the audience could combine these media together to enrich your understanding of the news, if you so wished.

But the news was spreading, increasing audience power while making it much harder for the news barons to control every manifestation of the phenomenon of news. The BBC introduced news bulletins on 23 December 1922, under government licence. It lay outside any possible control of the newspapers (though originally the BBC was restricted to using news agency copy only), and swiftly challenged them through daily publication and command of the public space. Radio added a new dimension: live reporting, collapsing the time difference between news event and news consumption.

Radio also offered sound, of course, which the newsreels adopted around 1930. News could now be read, or seen, or listened to, and with each innovation the newspaper lost that much more of its claim to the totality of news, while audience power grew with the increase in choice.

BBC newsreader Kenneth Kendall, 1950s, BBC

Next came television. The first BBC television news programme, in January 1948, was a newsreel in form and name - Television Newsreel - while the new medium owed much in its early years to its parent medium, radio. As with radio in the UK, it originally owed its existence to government licence, and added to the trump cards of frequency, domestic space and live reporting the particular power of the newsreader.

News now had a human face, that spoke to you the viewer as an individual as well as to the mass. It added to that sense of reassurance that news publications existed to provide. Danger and calamity were what was happening to other people. The fact that you were there to read the news, or to have it read to you, implied that you were safe.

Then came news on the web. Traditional news organisations were extraordinarily slow to grasp the implications of the Internet. Confident in their well-established models, in the audiences that were assumed to be loyal to them, and in the advertising revenue that sustained them, they were profoundly shocked - and continue to be shocked - by this mode of distribution and communication which upturned their every expectation. A fierce rearguard action is being fought, defending traditional newspaper values against the freewheeling digital behemoths Facebook and Google, but the balance of power has shifted irrevocably.

News stories now filter through a myriad of networks; the advertising money has moved to search; choice has expanded beyond any reckoning; the timetables around which news had traditionally structured itself have gone; and the audience has become all powerful. The traditional news world has been disaggregated, and we are all - producers, readers, advertisers, regulators, legislators - trying to work out how to put the pieces back together again. All that is certain is that the Internet makes the news, because it has become the lifeline on which all news production and news communication now depend.

News in the UK has changed greatly over the past 100 years, in medium, range, extent and ownership. Today much of the understanding on which news has been based, the contract between publisher and reader, is being challenged. Political upheaval combined with the mushrooming of digital outlets, combined with growing audience power on what is accepted as news, has made collecting the news all the more challenging - and imperative. What is the news now, and how do we collect it?

The British Library, until recently, has not collected the news - it has collected newspapers. As part of its function as the national research library, and as an outcome of Legal Deposit legislation, the Library (or the British Museum before it) has had the power since 1869 to request one copy of every newspaper issue published in the UK or Ireland. Just the one edition is taken where there are multiple editions of a title, usually the latest edition.

Between roughly 1822 and 1869 copies of newspapers were supposed to be sent to the Stamp Office for reasons of taxation, and these copies subsequently made their way to the British Museum. Consequently the collection is comprehensive from 1869 onwards, and nearly so for 1822 to 1869, though comprehensive is, in our case, a relative term.

Prior to 1820, the Library has been dependent on acquisitions and donations, most notably the newspapers, news sheets and news books from the Civil War period collected by bookseller George Thomason, and the Burney Collection of newspapers 1603-1818, collected by the Reverend Charles Burney. As a result of Legal Deposit, donation and acquisition, the collection amounts to some 60 million issues, or 450 million pages, though that is a figure derived from counting the number of volumes held, and in truth no one can really say exactly how many newspapers the British Library holds.

New newspapers received under Legal Deposit awaiting processing at British Library, Boston Spa

We do know how many are coming in, however - currently we take in 1,200 titles every week - that is, a combination of dailies and weeklies received under Legal Deposit. The figure is down from the 1,400 or so we were taking in only a couple of years ago, but, for the time being at least, this remains a country with a remarkable appetite for newspapers.

Around a third of the titles in the collection are from overseas. Relatively few foreign newspapers are now collected, owing to storage issues and the availability of electronic newspaper resources, but historically there was collecting from many countries, notably from Empire and then Commonwealth countries which were received through colonial copyright deposit.

But what of the other news media? There is no Legal Deposit for sound or moving image in the UK. The Library incorporated the National Sound Archive in 1983, but its collection has been created through acquisition, special arrangements with publishers, off-air recordings and the recording of live performances and interviews by the Library itself. News, until recently, was not part of its collecting remit, though its radio collections did include some news broadcasts.

For television, the British Library deferred to the British Film Institute (BFI), which has collected the medium selectively since the late 1950s. The Broadcasting Act of 1990 brought in statutory provision for a national television archive, paid for by the television companies, driven by off-air recordings of programmes as they were broadcast. This archive is maintained by the BFI, and since the mid-80s it has been recording on a daily basis television news programmes from the main terrestrial channels.

In 2010 the British Library re-introduced off-air recording, taking advantage of an exception in UK copyright which enabled it to record broadcast programmes for the purposes of maintaining an archive. It had previously recorded radio and TV programmes up to 2000, mostly on musical themes. Now the emphasis was on news. This was driven by a wish for the Library to build up its moving image capability, and in response to a gap in archival provision. Although the BFI was recording the main terrestrial television news programmes, most news programmes from the 24-hour news channels were not being archived by any public body. There was an opportunity to become a television news specialist, adding radio news as well to the mix, to provide a service to researchers not available elsewhere. It was also recognition that television and radio news made for a logical extension of the Library's news collection. Newspapers were no longer enough.

In 2013 the Non-Print Legal Deposit Act was passed, permitting the British Library, in partnership with the other Legal Deposit libraries of the UK and Ireland, to collect electronic publications, including websites, the same as for print. This has been a complex and gigantic undertaking, with the number of files now archived running into the billions, dwarfing in size the Library's physical collection.

Most of the websites on the UK Legal Deposit web archive are captured once a year. That is, a snapshot record of a website is made as it appears at one point in time, with all pages linked to a root URL. This is not suitable for news, where so much can disappear quickly, and where there is a research imperative to see the news as it was made available, at regular points in time. We need web news to be archived like print newspapers, because print newspapers have established the model. So, from 2014, we have been capturing news websites on a regular basis, usually weekly, but daily for the national daily newspaper sites and news broadcaster sites.

It has taken a while to build up, but we are currently capturing some 2,000 web news titles on a regular basis, in collaboration with the other Legal Deposit libraries. This has included perhaps the most radical shift yet in our news collecting strategy, because as well as archiving the websites of the recognised news publications, around half of what we are archiving has been hyperlocal news sites. Hyperlocalism, a local publishing movement which began in the USA and has taken off greatly in the UK in the past four years, means that anyone can be a news publisher. Anyone with a bee in their bonnet or a feeling that the news in their street is being overlooked can sign up for free to a WordPress site, give it a newsy title, and start publishing. And, if the British Library gets to hear of them, we will start archiving them. We do not discriminate.

A Little Bit of Stone, hyperlocal news site for Stone, Staffordshire, established in 2010

There is no definitive list of hyperlocal sites in the UK (though there are two directories that list many: Local List, and Cardiff University's Centre for Community Journalism's directory of hyperlocals). Nor is there any comprehensive listing available of standard UK news websites. Consequently we do not know what percentage of the UK's news websites we are archiving, though we are confident at least that it is a good majority.

There are many problems with the archiving of web news, however. Firstly, there is the sheer vastness of the web. No one can say what the true size is of a phenomenon which is in a continual process of change, but in a recent talk web archivist Ed Summers calculates that the Internet Archive, which said in 2016 that it has saved 510 billion web captures, might by this have collected just 0.39% of the web. We can see something of the mania of trying to capture the ever-changing web in the Internet Archive's hourly captures of the dailymail.co.uk (known as Mail Online in the UK). It is too much to comprehend, certainly too much to archive. The comprehensive archive of what is published can no longer exist.

Internet Archive captures of dailymail.co.uk, highlighting one day's captures for 26 March 2018

Secondly, owing to purely technical reasons, the Library is not always able to capture the audio and video elements of news sites, and even if it can capture them it is not always able to play back the results. Next, there used to be a simple correlation between a printed newspaper and the website that shared its name, and often its content. Increasingly the two are diverging, not just in content, but in title and scope. Single websites increasingly represent several regional newspapers where costs need to be cut. Newspapers are also being replaced by web versions, most prominently The Independent, which exists no longer in print but continues its digital existence as a facsimile version of the print title, as well as the independent.co.uk website and the indy100 spin-off site.

A few years ago, many newspapers made a PDF of their newspaper available on the website, but now a far more complicated picture exists, with a combination of digital outputs and many newspapers turning to aggregators such as PageSuite to provide digital access for them. Collecting newspapers digitally, which the Library does not currently do but is investigating, will not be a simple case of matching like for like. Whatever future collecting model the Library may pursue is bound to include a measure of print newspapers, not least because we will want to continue to collect a core of newspapers as print out of respect for a 400-year-old medium, for as long as there continue to be print newspapers. But one thing is certain - the world of digital news is different to that of physical news, and we will have to obey the rules of digital.

The current collection comprises the following: 60 million newspapers, 2,000 websites captured a total of 400,000 times, 85,000 television news programmes and 40,000 radio news programmes. Each week we take in 3,500 UK news publications of one kind or another. The news publications are collected through a combination of Legal Deposit, copyright exception and licence.

The Library's news offering incorporates the full range of news media - newspapers, news websites, television news, radio news, and other media

The Library's news content comprises primarily news most relevant to UK users, meaning news produced in the UK or which has had an impact on the UK

The Library also collects or connects to selected overseas newspapers, now primarily on microfilm or digital, according to availability and with focus on areas of research interest

The content strategy for news media is underpinned by Legal Deposit collecting, both print and non-print, but includes audiovisual media that lie outside Legal Deposit

The challenge for the Library will be how to bring these different news media together. That is why our news strategy focusses strongly on data. Commonalities of data - particularly date, time and place - will be essential for linking together different news stories. Other libraries are already experimenting with this, the Royal Danish Library for example, with its Mediestream service that brings together newspapers, television and radio.

To achieve such integration it will be essential to link up not only by date but also by keyword. We already capture subtitles for television news programmes where these are available; we are now experimenting with speech-to-text transcriptions of radio programmes. We will eventually be able to offer full text searching across each of the news media. The quality of such transcriptions will vary according to source, so an essential next step will be to extract entities, or themes, from these transcripts, using a shared set of terms.

So I will be able to ask of a future resource discovery system, show me everything you have relating to Brexit between 1st and 31st December 2018, and there will be the newspaper stories, the television news stories, the radio stories and the web stories, all of them indexed automatically, as well as books, papers or other media produced at that time which will enrich the picture of what the news was on this one topic at that particular time. All those objects must be born digital or have been digitised, so our collecting policy must be digital.
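
The kind of query described above, linking items across media by date and a shared set of terms, can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a description of any actual British Library system: the `NewsItem` record, the `search` function, and the placeholder items are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NewsItem:
    """A hypothetical, medium-neutral record for one news object."""
    medium: str        # e.g. "newspaper", "television", "radio", "web"
    title: str
    published: date
    terms: frozenset   # entities or themes drawn from a shared vocabulary


def search(items, term, start, end):
    """Group items tagged with `term` and dated within [start, end] by medium."""
    wanted = term.casefold()
    results = defaultdict(list)
    for item in items:
        item_terms = {t.casefold() for t in item.terms}
        if start <= item.published <= end and wanted in item_terms:
            results[item.medium].append(item)
    return dict(results)


# Placeholder records, invented purely to exercise the function.
items = [
    NewsItem("newspaper", "Placeholder newspaper story", date(2018, 12, 3), frozenset({"Brexit"})),
    NewsItem("television", "Placeholder TV bulletin", date(2018, 12, 10), frozenset({"brexit", "economy"})),
    NewsItem("radio", "Placeholder radio programme", date(2019, 1, 5), frozenset({"Brexit"})),
    NewsItem("web", "Placeholder web article", date(2018, 12, 20), frozenset({"weather"})),
]

hits = search(items, "Brexit", date(2018, 12, 1), date(2018, 12, 31))
```

The point of the sketch is that once every medium shares the same date field and the same vocabulary of terms, a single filter serves all of them; the hard work lies in producing those terms reliably from print, subtitles and speech-to-text transcripts.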

There are other news media. The Library is looking at podcasts, which certainly fall under its sound and news collecting remits, not least because all the major newspaper titles and news broadcasters are producing podcasts. No commitment has been made as yet, but we have started capturing some sample news-based podcasts.

The area of current news that we get asked about most is social media. We are not archiving Twitter, firstly because it is an American company and so falls outside our UK web archiving remit. The Library of Congress took on the task of archiving Twitter, though a year ago it announced that the task was proving too great and that it would only be archiving Twitter selectively from now on. The British Library archives some Twitter feeds where these have a British focus, a number of which are news-related, but it is a tiny drop in a vast ocean.

Twitter highlights the challenge we now face in trying to collect the news. It is not just about the vast scale of the archives, but about their meaning. As I wrote earlier this year:

The archiving of Twitter is a logical impossibility. There is no single Twitter out there that might be consulted equally by any of us. There are over 300 million Twitters in existence. Each person signed up to the service selects who they will follow and what topics interest them. No one person sees the same Twitter as the next. It is universal and absolutely personal at the same time, which is the key to its particular power. No archive can replicate this, because it must convert the subjective into the objective.

The subjectivity or personalisation of news is going to present us with the greatest collecting challenge. If everyone sees the news differently, how do we collect it? Once it was understood that a news object such as a newspaper was read in the same way by the same set of people for whom it was intended, usually defined by geographical location or political persuasion. But does that apply in a wholly digital world?

Those who once saw themselves as newspaper publishers now view themselves as news publishers. News is gathered and composed digitally, and then transmitted through a variety of media, one of which - for the time being - remains the print newspaper. To get at the heart of news, to collect it fully, one might want to collect not the published forms but the individual digital elements and the content management systems that hold them. Then one could recreate the news in the various forms in which it was distributed at any given point in time - as print, website, mobile and so on. Collecting news as publications has been fine for 1620 through to, maybe, 2020. But what after then?

Inside the British Library's National Newspaper Building, Boston Spa

John Carey, in his introduction to the Faber Book of Reportage, makes an intriguing argument about the nature of news. Firstly, he says:

The advent of mass communications represents the greatest change in human consciousness that has taken place in recorded history. The development, within a few decades, from a situation where most of the inhabitants of the globe would have no day-to-day knowledge of or curiosity about how most of the others were faring, to a situation where the ordinary person's mental space is filled (and must be filled daily or hourly, unless a feeling of disorientation is to ensue) with accurate reports about the doings of complete strangers, represents a revolution in mental activity which is incalculable in its effects.

Carey considers what it was in the mindset of pre-communication age humans that reportage replaced, and he suggests that the answer is religion. He continues:

Religion was the permanent backdrop to [man’s] existence, as reportage is for his modern counterpart. Reportage supplies modern man with a constant and reassuring sense of events going on beyond his immediate horizon … Reportage provides modern man, too, with a release from his trivial routines, and a habitual daily illusion of communication with a reality greater than himself … When we view reportage as the natural successor to religion, it helps us to understand why it should be so profoundly taken up with the subject of death … Reportage, taking religion’s place, endlessly feeds its reader with accounts of the deaths of other people, and therefore places him continually in the position of a survivor … [R]eportage, like religion, gives the individual a comforting sense of his own immortality.

There is plenty to challenge in Carey’s suggestion of reportage as being the natural successor to religion. There are different religions out there, and religion did not disappear with the emergence of public news forms. He also blends mass communications, reportage and news, though they are not the same as one another. But his theory is richly suggestive. One thinks of John Donne, writing in 1611 in his poem ‘An Anatomy of the World – The First Anniversary’ of changing ideas of the universe, “'Tis all in pieces, all coherence gone / All just supply, and all relation”. Ten years later the country’s first newspaper would appear.

Carey’s insight also provides an interesting mechanism for considering the nature of news today.

Published, public news has fed curiosity, helped to solidify our sense of belonging, and has provided a sense of reassurance. It has profoundly influenced our sense of time. The question is whether our new world of news will continue to do the same. News is a constant, but the forms in which it is transmitted must change, and they could be in the process of changing quite radically. The trust in the definable news publication to tell us who we are by relaying what we want to know could be disappearing. The need for assurance will remain, however, so what will provide it? The increase in the personalisation of news, the logical extension of which is to make everyone their own news editor, hardly seems a recipe for the sort of assurance that leads to a settled society.

Or maybe we are entering a post-news era, with a changed sense of reality, an age without reassurance. My personal definition of news is that it is “information of current interest for a specific audience”. It’s a flexible construction, but what happens when I no longer feel certain to what audience I belong? Maybe an age of supreme individuality is underway, in which I no longer feel a part of any audience, or else there are so many audiences to which I could be said to belong that the concept becomes meaningless. It is a world lived in a continuous now, where the past is losing its meaning, and where everyone thinks themselves immortal, now. That could be the end logic of an entirely interconnected world.

Despite the alarmist cries from some quarters about disinformation and the undermining of the news media as we have known them, these remain fringe concerns. The vast majority of people trust the established news media. They like their local newspaper, or at least the idea of there being one. They watch the same TV news programmes in their usual slots, they listen to the familiar radio news summaries. The urge for local identity is driving our politics, so there is little evidence for saying that we no longer know who we are or where we belong. We still need the reassurance of news. The post-news era is still some way off. Perhaps it will always be some way off.

Meanwhile the British Library’s collecting policy must be to collect what it can, by the mechanisms that are available to it. It wants to collect across the different news media, through a combination of Legal Deposit, copyright exception and licence, augmenting what is still its core news collection, newspapers. Everything must be built around the newspaper, for the time being. Our revised news content strategy, currently in development, has the subtitle, “moving from a newspaper collection to a news collection”. It sounds reasonable enough. We must do what we must. But the world of news may be moving beyond us; beyond the British Library, or any of us.

This is a shortened version of a talk I gave at the Media History Seminar, Senate House, on 4 December 2018. A PDF copy of the full text, with footnotes, is available here.

Posted by Luke McKernan at 11:11 PM

We are currently advertising for a Curator, Newspaper Data to join our news curatorial team. This is a fixed-term post until March 2020, based at our St Pancras site in central London. The post is being advertised as part of the British Library's Heritage Made Digital programme, a major part of which involves digitising 19th century British newspapers, with a special focus on newspapers in a poor or unfit condition.

We are looking for someone who will help us to apply data journalism thinking to this historical news material. The person we are looking for will be responsible for the analysis and creative interpretation of data derived from Heritage Made Digital and related British Library newspaper digitisation projects. They will prepare derived newspaper data sets and promote these for use by researchers. They will work with researchers to develop projects using newspaper data.

In particular we want them to help us produce stylish visualisations using historical newspaper data, working with third-party designers as necessary. A couple of years ago on the Newsroom blog we wrote about the art of the news visualisation, and how this particular branch of data science was helping to illuminate the themes behind the news. We also said that it would be a good idea if such thinking, with such outputs, could be applied to historical news data. Now we want someone to put those thoughts into action.

The post-holder will need to have a strong background in computer science and data science. They will have experience of working with or developing tools for large content and data volumes, and an interest in nineteenth-century history and/or news and current affairs. It's a terrific opportunity for the right person. Information on how to apply is on the British Library's vacancies site. The deadline for applications is 9 September 2018.

Posted by Luke McKernan at 2:06 PM

Newspaper art, by its very nature, is an ephemeral art form, sprawling in number and fleeting in effect, so it is no easy task to research the field of Victorian Graphic Journalism. The role of Special Artist – or artist-reporter – as a recognized profession came into existence in the mid-nineteenth century. The seminal event in its early history was the founding of the Illustrated London News in 1842, and that of its chief rival, the Graphic, in 1869.

The critical innovation that made it possible was the discovery of a wood engraving technique by Thomas Bewick, back in 1791, which enabled images to be printed simultaneously alongside text. But it was not until Herbert Ingram, the founding editor of the ILN, seized on its potential, that the pictorial press came into its own. Ingram formalized the practice of publishing images to accompany newsworthy events. As one of the leading Special Artists, William Simpson, would reflect, you did ‘not hear of the Special Correspondent during the wars of Napoleon, but between 1815 and 1854 a great change had taken place in the character and position of the newspaper press. It was this change that evolved the “Special”’ (Notes and Recollections of My Life, 1889, National Library of Scotland).

Acknowledging that historical events had previously been illustrated, Simpson identified that the key distinction was that the artist was now expected:

‘to be always on the spot, jotting down in his sketch-book what he saw with his own eyes. … [he] sees what takes place, and his work is immediately given forth to the world, so that its accuracy can be tested even by the actors in the historical event.’

Convinced of the importance of the new profession of which they were part, because it meant accurate, visual records existed of all the most newsworthy events of Queen Victoria’s reign and – most importantly – had been communicated back to the British public through the medium of the press, Simpson and his journalistic colleagues, all household names in their day, believed that their work would be highly prized by ‘future historians’. However, contrary to their expectation, they, and the imagery they produced, have been largely forgotten. If they are remembered, it is as pioneering war artists – rightly so, as the Crimean War of 1854 marked the point at which the fledgling profession truly took off. Yet my research into their work over the past decade has revealed that the scope of what they achieved is even greater. I now share their conviction that they deserve to be celebrated - in tandem with their counterparts, the special correspondents - as the progenitors of our modern media world.

Part of the reason for their neglect comes back to the relative inaccessibility of the art form they created, the sheer scale of its production and the wrongful assumption that the original sketches were destroyed in the process. However, pockets of the original artwork do still exist, such as this evocative sketch by William Simpson of the Prince of Wales presiding over the School Children’s Fete in Mumbai in November 1875, now in the collection of the Mitchell Library in Glasgow (see above).

It is the digital innovations of the twenty-first century that are providing the means for us to look back and reassess the advances of the nineteenth – not only in terms of locating the images in digital form, but also by providing platforms in which to group, analyze and ultimately represent them as a reconstituted body of work. Picturing the News: the Art of Victorian Graphic Journalism, the online exhibition I have co-curated with Cathy Waters, is one result. Professor Waters and I will be talking about ‘Rediscovering the Art of Victorian Graphic Journalism’ as part of the AHRC’s Being Human Festival in the Foyle Room at the British Library on Thursday 23 November. If you cannot get there but are interested in learning more about the subject, please visit https://research.kent.ac.uk/victorianspecials/.

Ruth Brimacombe, Freelance Curator and Art Historian

Posted by Luke McKernan at 2:31 PM

Writing in the Guardian earlier this month, Roy Greenslade queried what it is about ‘fake news’ that draws such widespread public attention: is it ‘a wilful desire to reject “boring” reality and choose its “exciting” opposite?’ he asks. The question of how to picture the news in a compelling way – so that it remains accurate as to the facts, while imaginatively transporting newspaper readers to the scenes and events described – goes back to the emergence of the first special correspondents and special artists who worked for the metropolitan press in the second half of the nineteenth century. Who were these newspaper pioneers and how can they help us to understand continuing debates about the media today?

William Howard Russell is probably the only one of the first generation of special correspondents who is now widely remembered, largely as a result of his famous reports from the Crimean War for the Times. Russell’s despatches from the front were gripping, eye-witness accounts that brought the war home to British readers and galvanized public opposition to the Government’s mishandling of the campaign. His narratives of spectacle, heroism and suffering established him as the Times’s leading ‘special’.

Reporting from the seat of war was undoubtedly the assignment that most tested the special correspondent’s mettle. However, when no war was afoot, they had to turn their hand to cover all manner of events in any location at home or abroad as required by their newspaper. Their versatility was key; and at least equal in fame to Russell on this score from the 1860s onwards was George Augustus Sala: ‘the chief of travelled specials’, as he was later described. Sala’s potential as a ‘travelling correspondent’ was first demonstrated in 1856-7 when Dickens sent him to St Petersburg to obtain material for a series of papers on Russian life and manners for his weekly periodical, Household Words. Sala’s colourful, descriptive style, cultivated as a contributor to Dickens’s journal, flourished when he began work as a special for the fledgling Daily Telegraph in 1857. Although he reported on a number of wars, including the American Civil, Austro-Italian and Franco-Prussian wars, special correspondents were also required, as he wrote in 1871, to ‘be Jack of all trades, and master of all – that are journalistic’: ‘to “do” funerals as well as weddings, state-banquets, Volunteer reviews, Great Exhibitions, remarkable trials, christenings, coronations, ship-launches, agricultural shows, royal progresses, picture-shows, first-stone layings, horse-races and hangings’.

While not all of the journalists who worked as specials became so famous in their own day as Russell and Sala, what distinguished their correspondence was its mobility, versatility and descriptive power: an ability to observe and seize upon events wherever they happened, rendering them for the press in sufficiently graphic prose so as to transport readers through vivid eye-witness accounts. These qualities were also features of the New Journalism – a development famously criticised by Matthew Arnold in 1887 as part of a commercially driven press deploying sensational reportage to sell newspapers (a debate that remains familiar today).

But for its proponents, special correspondence was a new technology – like the railroad or the telegraph, with both of which it was closely associated – that brought the world closer, shrinking space and time and conveying readers to distant places. In fulfilling the often arduous demands of their role, these journalists sometimes became newsworthy in their own right. Indeed, speaking at an anniversary dinner of the Newspaper Press Fund in 1878, Lord Salisbury described the special correspondent as one who ‘seems to be forced to combine in himself the power of a first-class steeple-chaser with the power of the most brilliant writer – the most wonderful physical endurance with the most remarkable mental vigour’.

Some of the remarkable achievements of this forgotten breed of journalists will be rediscovered as part of Being Human: A Festival of the Humanities on Thursday 23 November from 6-8 pm when Dr Ruth Brimacombe and I discuss our online exhibition, Picturing the News, in the Foyle Room of the British Library.

Below is the text of a short paper I gave recently at 'Language Matters', the 5th Transfopress Encounter in Paris. Transfopress is an international network of archivists, librarians and scholars interested in the study of foreign language press. The subject of this conference was printed news in English abroad and foreign-language publishing in the English-speaking world. My talk was on newspaper data and news identity.

The British Library holds one of the world’s largest newspaper collections. It has some 60 million issues dating from the 1620s to the present day. The collection is fairly comprehensive from 1840, certainly so from 1869 when legal deposit was instituted, and publishers of British and Irish newspapers were required to send one copy of each issue to the Library. 1,400 additional titles are added each week, along with a web news collection that archives over 2,000 news sites on a frequent basis, and a growing television and radio news collection.

Around two-thirds of the newspaper collection is British or Irish titles. Most overseas newspapers are now taken on only in electronic form or on microfilm, but we nevertheless have substantial holdings of overseas newspapers in English and other languages. This includes an extensive collection of newspapers from Commonwealth countries which were formerly received through colonial copyright deposit.

Our goal is to move from being a newspaper library to being a news library, reflecting the great changes taking place in the world of news today. In doing so we have had to ask questions about what the nature of news is. The definition we use is that news is information of current interest for a specific audience. Such a definition can be applied across different news media and suggests ways of linking them up, but also challenges the idea of what news is, since it can be applied more widely than just those media we commonly identify as ‘news’. Anything can be thought of as contributing to 'news' if it helps inform our world. In particular, it draws attention to communities seeking out news that is meaningful to them, and asks how we should be expressing such audience identification in our catalogue.

British Library title-level list of newspapers (a work in progress)

These issues have come to the fore in a project we have been undertaking, to produce a single title-level listing of all newspapers at the British Library (around 34,000 titles). Producing such a listing from a catalogue built up over many decades and from diverse collections has been challenging. It ought to be a simple case for a national library to produce a single listing of the newspapers that it holds, but in practice a significant number of newspapers have been classified as journals, or even books, on our system. Ensuring that we identify every newspaper as a newspaper has involved some prolonged research, in particular working with areas of the Library that cover particular geographical areas or communities.

For example, over the past year the News section in which I work has been working with our Asian & African department to identify Indian newspapers in the collection. Many of these had been classified as Journals on our catalogue, making discovery difficult for anyone looking for Indian newspapers without a specific title in mind. Multiple standards had been applied to the cataloguing of newspapers in the past, and there were additional problems particular to newspapers, such as changes of title and similarity of titles to other newspaper series. Previous investigations had indicated that we held some 214 Indian newspaper titles; in the end, 234 were identified by a research fellow, Junaid ul-Hassan. Each title was reclassified on our catalogue, the result being that what had previously been a buried newspaper collection has been opened up for researchers.

Map of Indian newspapers held by the British Library

The Indian newspaper records each come with geographical codings, meaning that we can produce a map of their distribution, while research by Junaid into contemporary reference sources has given us a greater picture of what was published overall, from which we may judge how selective and representative our collection of Indian newspapers might be.

A significant number of our newspaper records still require better or more consistent geographical identifiers before we can say with confidence how many newspapers we have from different countries or parts of those countries, or before we can produce further maps such as we have for Indian newspapers. But what about diaspora newspapers? We have many newspapers past and present that have been published for and by different immigrant or ethnic communities within the UK. How does our catalogue reflect the existence of newspapers published by the different communities within the United Kingdom, be they identified by race, religion or particular political persuasion?

The short answer is that it cannot. There is no means of extracting information from the British Library catalogue that will identify all news published for immigrant or ethnic communities, whether in English or other languages. Our catalogue does not work that way. The newspaper titles are there, but they are not classified in a form that would help us locate them. It is possible to identify some newspapers published in the United Kingdom by the language in which they were printed, which is one way of narrowing down diaspora newspapers, but it is an incomplete solution, since many will have been published in English.

The British Library catalogue primarily identifies a newspaper by its title, date range, place of publication and its geographical coverage. Traditionally, this has been enough. It is not the function of a research library to do the researcher's work for them. We provide the basic list, comprehensively compiled and accurately described, and you must do the rest. You must know what it is that you are looking for.

But one can argue that such an ordering of the data is a form of suppressing identity. The catalogue becomes a political tool, creating conformity of identity through rules of description. Such an ordering reinforces the suppression of difference.

The function of the catalogue as something that replicates society's power structures is well known. Catalogues and classification systems are never the value-free orderings of information that they advertise themselves as being, but are instead profoundly imbued with the values of the dominant society that maintains them.

There is an argument, therefore, that the newspaper catalogue could be doing more to identify different forms of newspaper by their audience and purpose, to counteract this impulse towards conformity.

Should this be a component of news cataloguing, and if so how should it be implemented, both for future news publications and retrospectively? How do we identify a news community, and how do we determine what their understanding of the news was, and from what sources they gained the fullest picture of the world in which they found themselves? As said, the definition of news we are employing is that news is information of current interest for a specific audience. This suggests that identification of audience should be playing a far greater part in how we catalogue newspapers than is currently the case. Cataloguing by nation and geographical area presupposes that all news is geographically determined, but this is not so. Those specific audiences may be determined by gender, age, special interest, belief, language or ethnicity. A community-led understanding of the news may be the necessary way forward - both in how we manage news collections today, and how we revisit the discoverability of our historical news archives.

One of the major growth areas for news in the UK is hyperlocal news. Hundreds of news websites, and in some cases newspapers, have been published independently on an amateur or semi-professional basis, that are aimed at small communities across the UK. Most of these hyperlocals are geographically based, as their name suggests, but they indicate the ways in which traditional structures for the production, ownership and identity of news are changing. They suggest that news is something that comes from us, however we choose to identify ourselves, rather than something that is decided for us. This is the logic of social media, where each of us selects the news world that is meaningful to us.

Another imperative is the direction in which digital libraries are going. As with some other national libraries, the British Library is now archiving its national portion of the Web, including newspaper websites and other news sites. The figures involved are overwhelming, with the number of pages being archived each year now to be counted in the billions. Indeed, the amount of published content coming in across all formats is growing at a rate beyond the comprehension of the ordinary researcher. When we curators at the Library give talks to people about what we are collecting you can see their eyes glaze over. There is too much to take in.

In such a world, there is a paradox. The more we acquire the harder it is to find the resource to make discovery through our catalogues practical, yet the greater the imperative must be to enhance discovery for those who do not need to discover everything, just something.

As collections grow exponentially, so does the need to contextualise them. This cannot be managed by humans, at the rate things are going. It will need to come from algorithms, automated topic extraction, mapping tools and other forms of artificial intelligence. The future of cataloguing is automation, and in such a world it will be our job, as curators, to ensure that the machines address the right needs.

Those of us who manage news archives must rethink how we are managing them. When discoverability becomes overwhelming, and when traditional cataloguing structures hide records that do not conform, such as diaspora newspapers, then we must question what we are doing - and make changes. There will always be the single list of every title that we hold, because ultimately an archive is a collection of discrete objects, each identifiable by a title and a date. But we must think for whom the news has been shaped and published. We must produce discovery tools that bring to the fore different parts of the collection - a multi-faceted approach to replace the linear. We must be mindful of the identity of the news that we archive, without which it is not going to be news at all.

Posted by Luke McKernan at 3:06 PM

Fake news is probably as old as news itself. Certainly, as far as the British Library is concerned, it goes back to 1614 at least, when the good people of Horsham in Sussex were told of the dragon in their area that was causing great annoyance. Whether those who produced this newsbook believed what they were telling to be "true and wonderfull", who can say?

True and Wonderfull. A discourse relating a strange and monstrous serpent (or dragon) lately discovered, and yet living in Sussex, 1614 newsbook

Today, the subject of fake news is hot news, coming out of the 2016 US presidential election, but with deeper roots in the clash between traditional news providers and the search engines and social media sites through which so many now discover the news that they want to see. Fake news ranges from deliberate falsity, to news you disagree with, to satire. This special edition of the St Pancras Intelligencer rounds up some of what is being said and done about fake news today.

The Ultimate 'Fake News' List (Infowars) - But just to show that one person's truth is another person's outrageous lie, here's an American far right show's listing of the fakery it sees in the mainstream media

Building Global Community (Facebook) - Mark Zuckerberg has issued a manifesto, which in part addresses the topic of the distribution of fake news (Facebook having been the target of many of the complaints made):

We've made progress fighting hoaxes the way we fight spam, but we have more work to do. We are proceeding carefully because there is not always a clear line between hoaxes, satire and opinion. In a free society, it's important that people have the power to share their opinion, even if others think they're wrong. Our approach will focus less on banning misinformation, and more on surfacing additional perspectives and information, including that fact checkers dispute an item's accuracy.

America needs a radical new market intervention similar to that made by the UK Government in 1922 when it issued a Royal Charter and established the BBC ... If, instead of scrapping over news initiatives, the four or five leading technology companies could donate $1 billion in endowment each for a new type of engine for independent journalism, it would be more significant a contribution than a thousand scattered initiatives put together.

Google's fake news Snippets (BBC) - Rory Cellan-Jones's sneak preview of the Google Home speaker showed how it could spout false news in response to spoken enquiries. Google is now adjusting the algorithms...

Announcing New Research: "A Field Guide to Fake News" (First Draft News) - First Draft News have also announced a project that aims "to catalyze collaborations between leading digital media researchers, data journalists and civil society groups in order to map the issue and phenomenon of fake news in US and European politics"

Fake News : The Greatest Lies Ever Told (TruePublica) - So where are the UK's homegrown fake news sites? In a contentious thought piece, Graham Venbergen argues that "In Britain at least, fake news websites have failed to get a grip in the political arena. This is because traditional British news outlets, are already highly accomplished at stretching the truth to its limits and yet still get away with it"

The Choose-Your-Own-News Adventure (New York Times) - Jim Rutenberg illustrates how we can escape reality by pursuing news worlds that match our expectations. But isn't this how news has always worked?

The Institute for Studies has shown that real news is bad enough already, and therefore all fake news from now on must be unbelievably delightful. Professor Henry Brubaker said: “If the ‘news’ on social media is just whatever b------- anyone shares, then instead of ‘Muslims in council-backed halal Easter outrage’ why not ‘Puppies discover limitless cold fusion energy source’?”

http://www.thedailymash.co.uk

Posted by Luke McKernan at 7:59 AM

The latest addition to the electronic newspaper resources available to British Library readers is one that we're particularly pleased to have secured, the Rand Daily Mail. Published from 1902 to 1985, the South African daily newspaper was renowned for its anti-Apartheid stance, with notable coverage of the Sharpeville massacre, the Soweto uprising and the death of Steve Biko. Closed down in controversial circumstances in 1985, the entire newspaper is being digitised and made available by research materials service Readex. Happily the British Library is making the entire archive available for remote access to anyone with a Reader's Pass.

The Rand Daily Mail's renowned African Affairs Reporter, Benjamin Pogrund, wrote recently on the Readex blog about the significance of the newspaper and its archive:

The Rand Daily Mail was ahead of its time in reporting and exposing apartheid evils and in opposing oppressive government. This is why it was shut down.

The Mail was always a contradictory newspaper: although owned by mining interests from its start in 1902, it was known for siding with the underdog – which, for the first two-thirds of its existence, meant the white underdog.

That changed in 1957 when Laurence Gandar – a quiet, reserved man – became editor. Little was expected from him except professional journalism. But he proved to have radical ideas and compassion, and he had an inner core of steel. Gandar dissected apartheid with deep and brilliant writing that electrified the country.

Gandar took his pioneering into the news columns, assembling a staff of journalists whose political views stretched from left to right but who shared a commitment to fair and honest reporting, investigation and robust comment. The newspaper became the pacesetter in illuminating dark corners of South Africa and gave hope to blacks by pointing to a new direction for the country. It transformed itself, the rest of the Press and deeply influenced the political scene.

The board of (white) directors soon turned against Gandar and in time got rid of him. His successor, Raymond Louw, made his own singular contribution: he invested the Mail with a tough news sense while retaining its policy strength.

Integral to this was that the Mail turned the newspaper adage, "When in doubt, leave out," on its head. Instead, as the authoritarian government's restrictions grew on free publication, the newspaper sought to get as much into the open as possible. It wasn't always consistent; but right up to the end, even when tight laws and controls were throttling the Press, the Mail ensured that no-one would ever be able to say that they had not known about the ravages of Afrikaner Nationalist rule.

The Mail was admired by most South Africans of all colors and was honored by its international peers. The reason for respect was why it was loathed by many, but by no means all, within the white community, and they finally prevailed in getting the commercial owners to close it in 1985.

It's exciting to know that with digitization the Rand Daily Mail's treasure store of information about crucial years in the old South Africa will now be more widely available.

Pressures which led to the newspaper's board seeking a change of policy to reach out more to a prosperous white audience ultimately proved damaging to sales and led to the newspaper's closure. The current owners, Times Media Group, decided in 2014 to resurrect the newspaper as an online archive and, through Readex, sought out the best materials, including the incomplete run of the title held by the British Library.

We are delighted that not only is the electronic archive now available in our Reading Rooms, but it is also available to British Library Reader Pass holders via our Remote Resources service. It therefore joins the small but significant number of electronic newspaper resources to which we subscribe that we can offer to Library users wherever they might be, so long as they have a Reader's Pass (information on obtaining such a Pass is here). Other titles available in this way include:

African American Newspapers, 1827-1998 - provides online access to approximately 270 U.S. newspapers chronicling a century and a half of the African American experience. This unique collection features papers from more than 35 states - including many rare and historically significant 19th century titles.

Early American Imprints, Series I: Evans 1639-1800 - contains virtually every book, pamphlet and broadside published in America during the 17th and 18th centuries.

Early American Newspapers, Series I - reproductions of hundreds of historic newspapers, providing more than one million pages as fully text-searchable facsimile images.

Latin American Newspapers Series 1, 1805-1922 - part of Readex's World Newspaper Archive, this database provides access to more than 35 fully searchable Latin American newspapers including key titles from Argentina, Mexico, Chile, Brazil and Peru.

We therefore have a particularly strong remote access offering for researchers of African news. The Rand Daily Mail online archive is not complete as yet - currently it runs 1937-1985, but eventually it will contain the full run of 1902-1985.

Posted by Luke McKernan at 12:40 PM

There are exciting changes happening in how we use newspapers to study the past. After decades in which the use of newspapers in research meant leafing through volumes or scrolling through microfilms, digitisation made millions of newspapers more readily searchable and far more widely available. But now digitisation has taken us to the next stage in development, which is using the data generated by the digitisation process to look at history on a grand scale. We are moving into the era of big data newspaper studies.

From the University of Bristol study: People in history. (A) famous personalities by occupation using all extracted entities associated with a Wikipedia entry; (B) the probability that a given reference to a person is to a male or a female person

Big data newspaper studies have come about through a combination of large-scale digital resources and a growth in analysis tools. Most will be aware of OCR (optical character recognition), the mechanism by which archival texts are converted into machine-readable text: the software takes what a computer sees as an image (i.e. the arrangement of letters on a page) and matches the shapes to letters that it knows. It is an imperfect science, because OCR can struggle with older typefaces and deteriorating page originals, but levels of accuracy continue to improve as new OCR software is developed, and the results are generally satisfactory - that is, most of the time a researcher will find what they are looking for, if it is there to be found.
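
As a rough illustration of the matching step described above, the sketch below compares a tiny bitmap against a handful of stored letter shapes and picks the closest one. The 3x5 templates, the `recognise` function and the pixel-difference score are all assumptions made up for this example; real OCR software relies on far more sophisticated methods.

```python
# Toy illustration of matching an image to known letters.
# This is NOT how production OCR engines work; it only shows the basic idea.
# Each letter is a hypothetical 3x5 bitmap: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def pixel_difference(a, b):
    """Count the pixels at which two equally sized bitmaps disagree."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognise(glyph):
    """Return the template letter whose bitmap differs least from the glyph."""
    return min(TEMPLATES, key=lambda letter: pixel_difference(glyph, TEMPLATES[letter]))


# A slightly damaged 'T': one pixel of the crossbar has faded.
damaged_t = ["##.", ".#.", ".#.", ".#.", ".#."]
print(recognise(damaged_t))
```

Even in this toy version, a faded pixel does not prevent a match, but the more a page has deteriorated, the closer the damaged shape drifts towards the wrong letter.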

But added to this are software tools that can extract further sense from the raw data set that is generated by OCR. The field of what is called Natural Language Processing, by which computers come to understand human text and speech, includes the extraction of keywords, or named entities, and the matching of these to controlled lists of terms (such as DBpedia), further mapped to geographic areas and time periods, which enables researchers to undertake controlled, thematic analysis of large historical datasets. Our archive of words yields patterns of behaviour with much to tell about our past selves.
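
To give a flavour of what matching text to a controlled list of terms can look like, here is a minimal sketch. The `CONTROLLED_TERMS` table, its identifiers and the `extract_entities` function are hypothetical, invented for this example; it does not query DBpedia and is not the method used by any particular project.

```python
import re

# Hypothetical controlled list mapping a term to an identifier and a type.
# The identifiers are placeholders written in the style of DBpedia resource
# names; they were not looked up from the real service.
CONTROLLED_TERMS = {
    "labour party": ("Labour_Party_(UK)", "organisation"),
    "london": ("London", "place"),
    "telegraph": ("Telegraphy", "technology"),
}


def extract_entities(text):
    """Return (surface form, identifier, type) for each controlled term found."""
    found = []
    for term, (identifier, kind) in CONTROLLED_TERMS.items():
        # Word boundaries stop 'london' matching inside 'Londonderry'.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0), identifier, kind))
    # Report entities in the order in which they appear in the text.
    return [entry[1:] for entry in sorted(found)]


sample = "News reached London by telegraph that the Labour Party had won the seat."
for surface, identifier, kind in extract_entities(sample):
    print(surface, "->", identifier, f"({kind})")
```

Because every match is tied to a shared identifier rather than to the spelling on the page, counts from different newspapers, places and periods can be compared on the same terms.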

This is the theme of a major project undertaken by the Intelligent Systems Laboratory at the University of Bristol, led by Professor Nello Cristianini. As described in their paper 'Content analysis of 150 years of British periodicals', the project worked on a corpus of newspapers digitised from the British Library's collection by family history company Findmypast for the British Newspaper Archive website. The figures involved are huge. The project analysed 28.6 billion words from 35.9 million articles contained in 120 UK regional newspapers over the period 1800-1950, which they calculate forms 14% of all regional newspapers published in the UK over the period.

The project then used this study to explore changes in culture and society, determined by changes in the language. It looks at changes in values, political interests, the rise of 'Britishness' as a concept, the spread of technological innovations, the adoption of new communications technologies (the telegraph, telephone, radio, television etc), changing discussion of the economy, and social changes such as mentions of men and women, the growth in human interest news and the rising importance of popular culture. It is the stuff of multi-volume histories of the past, boiled down to eye-catching graphs.

This does not mean that we throw away those multi-volume histories, however. The researchers are at pains to point out that such data analysis is an inexact science, with many caveats needed to explain how the entities have been arrived at and with what degree of caution they should be treated. The data derived from such tools can only work where it is supported by traditional studies, to gain the richer understanding of what happened. The machines may have taken the natural language of humans and converted it into data, but the results need to be converted back into human language to offer real understanding.

So it is that some of the project's findings may seem obvious. We could have guessed beforehand that the newspaper archive would show an increase in discussion of popular culture subjects, that politicians are more likely to achieve notoriety within their lifetimes than scientists, or that there was a rise in coverage of the Labour Party from the 1920s onwards. But the analyses reinforce through data what we have previously inferred through study, while discoveries such as the term 'British' overtaking the term 'English' at the end of the 19th century, or the decline in terms associated with 'Victorian values' - such as 'duty', 'courage' and 'endurance' - call for new studies to explore these things further.
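
A comparison such as one term overtaking another rests on counting how often each appears relative to the total number of words in each period. The sketch below shows one simple way this could be done. The four `ARTICLES` are invented sentences included purely so that the code runs; they are not drawn from the British Newspaper Archive and say nothing about the real trend, and the function names are likewise assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Invented miniature corpus of (year, text) pairs, purely to make the code runnable.
# It is NOT real archive data and demonstrates no historical finding.
ARTICLES = [
    (1855, "The English fleet and the English army were praised."),
    (1858, "A British ship arrived; English merchants were pleased."),
    (1893, "British trade grew and British interests were defended."),
    (1897, "The British public welcomed the English visitors."),
]


def relative_frequency_by_decade(articles, terms):
    """Return {decade: {term: occurrences per 1,000 words}} for the given terms."""
    term_counts = defaultdict(Counter)
    word_totals = Counter()
    for year, text in articles:
        decade = year // 10 * 10
        words = re.findall(r"[a-z']+", text.lower())
        word_totals[decade] += len(words)
        term_counts[decade].update(w for w in words if w in terms)
    return {
        decade: {t: 1000 * term_counts[decade][t] / word_totals[decade] for t in terms}
        for decade in sorted(word_totals)
    }


def first_decade_overtaken(frequencies, rising, falling):
    """Return the first decade in which `rising` is more frequent than `falling`."""
    for decade, freqs in frequencies.items():
        if freqs[rising] > freqs[falling]:
            return decade
    return None


freqs = relative_frequency_by_decade(ARTICLES, ("british", "english"))
print(first_decade_overtaken(freqs, "british", "english"))
```

Dividing by the total word count for each decade matters because far more newsprint survives from some decades than others; raw counts alone would mostly reflect the size of the archive.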

The project is at pains to point out the importance of using newspaper archives. Previously we have had big data analyses of millions of historical books, most familiar through the Google Ngram Viewer. This has caused controversy among some scholars, because of the unevenness of coverage of topics in books, and the limitations of merely counting words and making them searchable again. Opening up newspaper archives for comparable analysis widens the amount of content available, arguably with greater reliability overall, and now with tools to make analysis that much more scientific. The use of controlled terms will also enable analysis across different datasets - so, books and newspapers, but also other news forms, as subtitle extraction and speech-to-text technologies now start to make our television and radio archives available for similar and shared analytical studies. Our big data is only going to get bigger.

There are limitations to this use of newspaper archives. The quality of OCR varies not only according to the original newspaper, but according to the microfilm where this has been used instead of print. Digitisation is quicker and cheaper this way than digitising from print, but older microfilm can be photographically poor, leading to inferior OCR (though there are promising tools appearing for improving poor OCR). The British Newspaper Archive is made up mostly of UK regional newspapers, because the main nationals have often been digitised by their current owners and are available separately. How different was the discourse in newspapers based in London from those around the rest of the country? That has to be the subject of another major study.

One of the better jokes from the Victorian Meme Machine project

The British Library has been engaged in its own big data analyses of newspaper archives. BL Labs is an initiative designed to support and inspire the public use of the British Library's digital collections and data in exciting and innovative ways. It has facilitated several studies of British historical topics through the digital newspaper archive. These include Bob Nicholson of Edge Hill University's study of jokes in Victorian newspapers, with the concept of the Victorian Meme Machine (automatically matching jokes to an archive of contemporary images); Katrina Navickas of the University of Hertfordshire's mapping of nineteenth century protest; and Hannah-Rose Murray of the University of Nottingham's tracing of black abolitionists in 19th century Britain. A major user of our newspaper data is M.H. Beals of Loughborough University, who is researching how ideas travel across the historical news media, creating new insights through understanding newspaper archives as structured data.

Such projects are just the start. The availability of large-scale newspaper archives in digital form, and the data derived from such archives, enables us both to seek answers to traditional questions more quickly, and to start asking new kinds of questions. The latter is the great challenge that newspaper data offers. We need to come up with new questions, because the technology enables us to do so, and because it may question what we previously thought that we knew. As the data from newspaper archives becomes more readily available, and more easily usable by the non-data specialist, so we will find that we have only just started to read the newspapers. We are going to find that they have much more yet to tell us.

Links:

All of the regional newspapers used in the University of Bristol project are available at www.britishnewspaperarchive.co.uk (subscription site, free to use at British Library locations)