Objective

One of the most pressing and fascinating challenges scientists face today is understanding the complexity of our globally interconnected society. The big data arising from the digital breadcrumbs of human activities promise to let us scrutinize the ground truth of individual and collective behaviour at unprecedented detail and scale. There is an urgent need to harness these opportunities for scientific advancement and for the social good. The main obstacle to this accomplishment, besides the scarcity of data scientists, is the lack of a large-scale open infrastructure where big data and social mining research can be carried out.

To this end, SoBigData proposes to create the Social Mining & Big Data Ecosystem: a research infrastructure (RI) providing an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”. Building on several established national infrastructures, SoBigData will open up new research avenues in multiple research fields, including mathematics, ICT, and human, social and economic sciences, by enabling easy comparison, re-use and integration of state-of-the-art big social data, methods, and services into new research. It will not only strengthen the existing clusters of excellence in social data mining research, but also create a pan-European, inter-disciplinary community of social data scientists, fostered by extensive training, networking, and innovation activities. In addition, as an open research infrastructure, SoBigData will promote repeatable and open science. Although SoBigData is primarily aimed at serving the needs of researchers, the openly available datasets and open source methods and services provided by the new research infrastructure will also impact industrial and other stakeholders (e.g. government bodies, non-profit organisations, funders, policy makers).

Periodic Reporting for period 2 - SoBigData (SoBigData Research Infrastructure)

One of the most pressing and fascinating challenges scientists face today is understanding the complexity of our globally interconnected society. The big data arising from the digital breadcrumbs of human activities promise to let us scrutinize the ground truth of individual and collective behaviour at unprecedented detail and scale. Sensing big data at a societal scale, together with the transparent interlinking of digital and physical reality, has the potential to provide a powerful social microscope, which can help us understand many complex and hidden socio-economic phenomena. It is clear that such a challenge requires high-level analytics, modelling and reasoning across all of these social dimensions.

There is an urgent need to harness these opportunities for scientific advancement and for the social good, in contrast to the currently prevalent exploitation of big data for commercial purposes (e.g. user profiling and behavioural advertising) or, worse, social control and surveillance. The main obstacle to this accomplishment, besides the scarcity of data scientists, is the lack of a large-scale open ecosystem where big data and social mining research can be carried out.

SoBigData is creating the Social Mining & Big Data Ecosystem: a research infrastructure (RI) providing an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”.

The research community will use the SoBigData RI facilities as a “secure digital wind-tunnel” for large-scale social data analysis and simulation experiments.

SoBigData will serve the wide cross-disciplinary community of data scientists, i.e. researchers studying all aspects of societal complexity from a data- and model-driven perspective, including data and text miners and visual analytics researchers. It can support policy making, offer novel ways to produce high-quality and high-precision statistical information, empower citizens with self-awareness tools, and promote ethical uses of big data. SoBigData may give citizens, NGOs and policy makers the means to gain a better understanding of complex socio-economic systems, methods for introspection of complex global processes, and tools for assessing the implications of decisions beforehand, and hence improve our capacity to sustainably manage our society on the basis of well-founded knowledge and inclusive participation. In particular, SoBigData may provide policy makers with a much deeper understanding of the behaviour of, and interactions between, global systems, and will yield tools to develop and test policies in silico.

To this aim, SoBigData promotes repeatable and open data science in the inter-disciplinary field of large-scale social data mining, on the basis of three pillars that establish the overall goals of the RI:

An ever-growing, distributed data ecosystem for procurement, access and curation of big social data, within an ethic-sensitive context, based on innovative strategies for acquiring social big data for research purposes, using both opportunistic means offered by social sensing technologies and participatory means based on user involvement as prosumers of social data and knowledge.

An ever-growing, distributed platform of interoperable social data mining methods and associated skills: tools, methodologies and services for mining, analysing, and visualising complex and massive datasets, harnessing the techno-legal barriers to the ethically safe deployment of big data for social mining.

Building the “Social Mining” community of scientific, industrial, and other stakeholders (e.g. policy makers), supported by joint research, transnational and virtual access activities, and brought together by extensive networking and innovation actions (e.g. workshops, summer schools, datathons, training resources in social data mining, knowledge transfer, industrial partnerships). In particular, the training events

It delivers and describes the software release of all the adaptations of existing and newly developed resources. It also describes all the activities carried out by the partners in order to register those resources with the e-infrastructure.

D11.2 will contain the SoBigData evaluation data collection toolkit, which will enable the campaign participants to create the evaluation datasets automatically, as described in T11.2. In addition, D11.2 will comprise the materials and datasets created for the SoBigData evaluation campaigns, carried out as part of T11.2, and will report on the definition of the exploratories of T11.4. All five thematic areas covered by the SoBigData project will have their corresponding datasets: text and social media mining (USFD, UNIPI), social network analysis (CNR, AALTO), human mobility analytics (CNR), web analytics (LUH), visual analytics (FRH). FRH will have overall responsibility for coordinating the deliverable production.

This deliverable describes the set of actions needed to deliver subsequent releases of the e-Infrastructure, as due at milestones MS2, MS3 and MS5. Plans will be drafted in synergy with all WP8 and WP9 partners, and will drive the incremental integration of existing services in the e-infrastructure, resulting in the software releases indicated in D10.3. Initial plans will include deployment of enabling services and the portal of the e-infrastructure.

Report describing responsible IP principles and practices in SoBigData and listing best practices that have evolved in and outside SoBigData so as to accommodate both legitimate IP claims in data and privacy-related rights.

It will report on calls initiated in the reporting period, user selection process and outcomes, TA visit reports, and self-evaluation. The M18 and M36 periodic reports will also contain plans for TA calls for the subsequent period.

This deliverable reports, at key milestones, on the training activities carried out in this work package. This will include reports on summer schools, datathons, and training events for schools and underrepresented communities in data science. Attendance details and other statistics will be collected and reported.

Report describing the evolving versions of the VSD and PbD methods, recommendations and provisions to be adopted in the SoBigData ecosystem, which will drive the integration and enhancement of social mining services and tools of JRA2 into the research infrastructure.

This deliverable will report on the results of T11.1 and position the SoBigData evaluation framework in the context of other relevant established initiatives in the fields of text mining, natural language processing, social network analysis, human mobility analytics, web mining, and visual analytics. D11.1 will also contain the design specification of the SoBigData evaluation framework, based on the results from the first 6 months of work in T11.2. KCL will lead this deliverable, with contributions provided by all partners from T11.1 and T11.2.

The plan concerns the first year of activity and matches the project dissemination goals with the available opportunities. It also outlines the activity for the second year according to the information available at the time of delivery. Finally, it includes the initial publication plan.

Report describing the composition of the ethics board and its plan of activities, governance framework, by-laws, responsibilities of members and principles, schedule of meetings, reporting and event calendar.

This deliverable periodically reports on and publishes the online training materials, as well as those created for the face-to-face training events. We will also include access statistics for the online training materials.

This deliverable draws up the plan of work for T5.1 and T5.2, including a concrete action plan for the innovation activities aimed at widening the impact of the SoBigData integrated research infrastructure within a five-year horizon, i.e. including the second part of the project and a further period beyond the funded one.

Report on the e-Infrastructure operation activity since its last deployment, including a detailed set of usage indicators (e.g. number of resources, accesses to resources, usage of resources by scientists from different infrastructures, etc.). Indicators will be generated automatically by the VA e-infrastructure, and their current status can be consulted at any moment from the VA e-infrastructure portal. This deliverable will include the assessment report provided by the Project Advisory Board.

This software prototype deliverable consists of the customisation of the Innovation Accelerator web platform, including Living Science, as well as delivering impact and innovation metrics for publications and collaborations arising from SoBigData and other big social data analytics projects and key players.

This deliverable will contain the project presentation, the project website, and the project’s social media presence. These comprise the main dissemination channels for promoting and communicating the project and relevant activities and achievements. The project web site, presentation, and social media presence will be updated continuously throughout the project.

The deliverable has an “on-going” nature and organizes and presents the results of T10.1. It will serve both as a deliverable and as release-driven guidelines for all SoBigData e-infrastructure stakeholders.

The purpose of this deliverable is to provide an ongoing and up-to-date wiki containing the description of the datasets available in the consortium. The description includes statistics, metadata, sharing policies and archiving technologies, as well as the preservation provisions and lifespan.