The Evolving World of Big Data

A recent online search for my ancestors led me to explore the website created by the National Archives of Ireland. This site provides free public access to the digitised records of the 1901 and 1911 censuses. It is widely used by historians, genealogists and the general public. In particular, its contents are of considerable interest to the many people around the world with Irish ancestry and to the many native Irish who have become disconnected from their genealogical past. Not only does the site contain details of both censuses, it also offers contextual material, photographs of the time, digital documents of the period and links to relevant scholarly and genealogical sites. These links include newspapers, Irish Canadian emigration records and cemetery records.

The census website is the result of a working partnership between the National Archives of Ireland and Library and Archives Canada. The site is a work in progress, and it provides a facility for users to report any errors they find in the census details. Mistakes are understandable, as the digitised details are taken from the original microfilms of the handwritten census forms. These forms can be viewed on the site, which enables the viewer to cross-reference the original details, handwritten by the census taker, with their digital counterparts. In my own research I noticed one such error: a two-year-old relative in the 1901 census was entered as Theresa Cole, whereas her actual name was Theresa Coen. The mistake is understandable, as Coen is not a name one would expect to find in rural Cork in 1901. It appeared there because this orphaned toddler had been brought back from the United States to be raised by the Murphy family in Kiskeam, Co. Cork.

The world of information technology has moved on since the World Wide Web was created by scientists at CERN, the high-energy physics research centre in Geneva, in an attempt to make research documentation easier to retrieve. The internet is now an essential part of daily life in both the public and the private sphere. The new buzzwords in information technology are big data and open data. In a recent Irish Times article entitled “Taking Big Brother and big business out of big data”, Adrian McDonald, the European president of the American IT giant EMC, describes big data as “a set of technologies and techniques that draw together insights from multiple data to provide real-time insight on specific issues”. The article cites such startling statistics as “90% of all information in the world today was created in the last two years”. McDonald describes the use of big data as follows: “It’s about discovering a pattern within the data and learning from it.” The intention is to turn the vast quantities of data already collected into information that can benefit society, whether in business, the sciences, the charitable sector or private life.

In an effort to humanise the concept of big data, EMC launched a downloadable app, thehumanfaceofbigdata.com. By answering fifty questions, the user can locate “a doppelganger” who shares the same mindset. Within two months of its launch, the app had amassed information from over 3,000,000 users. McDonald highlights the as yet largely untapped power of big data: “Data has been described as the new oil but it is crude oil and has to be refined…”. The analysis of big data can benefit not only big business but also humanitarian bodies. For example, the U.S. company aWhere has launched a project to eradicate malaria by analysing data embedded in the pixels of satellite photographs.

Open data is a controversial issue. Businesses may not want their competitors to benefit from the data they generate. However, there is a growing trend towards open data, particularly in the case of data generated by government bodies or publicly funded bodies. On October 24th 2012, Sean Sherlock, Minister of State at the Department of Enterprise, Jobs & Innovation and the Department of Education & Skills with responsibility for Research & Innovation, released a press statement in which he set out the terms of Ireland’s national open access policy. The release avers that “Open access adds value to research, to the economy and to society. The outputs from publicly-funded research should be publicly available to researchers, but also to potential users in education, business, charitable and public sectors, and to the general public”. The press release lists organisations that already have an open access policy, such as the Higher Education Authority, Teagasc and the Dublin Institute of Technology. Other organisations involved in the national steering committee include the Environmental Protection Agency, the Health Service Executive and the Department of Agriculture, Food and the Marine. The policy is to take effect from January 2013, with a phased implementation. It paves the way for wide public availability of a growing body of big data.

EMC is a front-runner in the move to harness the power of big data in business. An extract from its website describes the dynamics of big data in the workplace:

The explosion of mobile networks, cloud computing and new technologies has given rise to incomprehensibly large worlds of information. The rapidly shifting dynamics of competition, coupled with a deluge of data, create new challenges for leaders across all sectors who want to tap into the power of information to make better and more timely decisions about how their companies can best compete, grow and create new sources of value.

In business, big data is a powerful tool. It allows companies, at the touch of a button, to gauge market trends, to economise and to target new markets. Outside the world of commerce, big data can improve the management of government departments and increase the capabilities of humanitarian organisations.

In conclusion, the acknowledged arrival of big data has far-reaching consequences. Correctly used, it has the power to make significant positive changes in how we live. By adopting an open access policy for publicly funded research, the Irish government is proactively engaging with the power of big data. The National Archives census website demonstrates how big data, once made available to the public, can generate a momentum of its own. This momentum is evidenced by the site’s success in attracting the attention not only of the historian, the sociologist and the genealogist, but also of the individual and, in particular, of Ireland’s diaspora, which the National Archives estimates at 70 million.