Abstract

The growing number of people using social media to publish their opinions, share expertise, make social connections and promote their ideas to an international audience is creating data on a massive scale. This enables social scientists to conduct research into ethnography, discourse analysis and the analysis of social interactions, providing insight into today's society, which is largely augmented by social computing. The tools available for such analysis are often proprietary, expensive and non-interoperable, meaning that the rapid marshalling of large data-sets through a range of analyses is arduous and difficult to scale. We present the Collaborative Online Social Media Observatory (COSMOS), an integrated social media analysis tool developed for open access within academia. COSMOS is underpinned by a scalable Hadoop infrastructure and can support the rapid analysis of large data-sets and the orchestration of workflows between tools with limited human effort. We describe an architecture and scalability results for the computational analysis of social media data, and comment on the storage, search and retrieval issues associated with massive social media data-sets. We also provide insight into the impact of such an integrated on-demand service on the social science academic community.