Next Strata Online Conference

Wednesday, June 20 at 9AM PT/12PM ET
In the next Strata Online Conference, we'll look at the way data science is shaping elections, from visualizations to game theory, understanding issues to targeting voters. Get a glimpse into the data that's driving how we choose tomorrow's leaders.

Strata Online Conference: Data That Matters

Surfing the Data Deluge to Build a Better World

It was the best of times: open, unfettered access to data from around the world in an instant, and the tools to analyze it and put it in the hands of the many. Today's always-on society has access to the sum of all human thoughts in a click or a tap. Data can right wrongs, speak for the mute, and shine a light on corruption.

Join us for our seventh Strata online conference, as we look at Data That Matters. We'll explore the ways that open access to information is changing how we live, love, work, and play. Hear from innovators, activists and defenders of the commons, and look at what we can do to improve humanity by putting petabytes of raw content to work on the problems society faces.

Agenda/Schedule

Strata chair Alistair Croll will set the stage for the discussion, looking at the potential Big Data holds for society, and highlighting some of the ways the harsh light of data can fight corruption, increase transparency, and turn noble intentions into measurable results more efficiently.

In business, there is a growing awareness that decisions should be evidence-based, that is, based on analysis of the facts and figures that flow from monitoring business practice. While tremendous issues remain, it is clear that businesses are driving towards a model of decision making based on evidence.

In other realms, such as government, public policy, and education, evidence-based decision making has had a harder time making inroads. In part, this is because many of the issues involved in these decisions have both ideological and emotional elements. But these issues aside, there is still the problem of understanding and communicating the data. In fact, I would argue that the lack of appropriate analysis, insight and communication amplifies the ideological and emotional issues, simply because the data and the insight contained in it cannot be communicated to broad audiences.

One way to solve this problem is to build systems that not only analyze the data but also transform it into stories and insight, the natural vehicle for communication that we use every day. The question is how to do this in a way that lets us use the data to tell both the broader stories it contains and the singular stories, told to the “audience of one”, that relevant communication requires.

In this talk, I will outline a core technology that does exactly this. Given a data set and the possible stories contained in it, the technology automatically generates stories that communicate the insights (trends, correlations, comparisons) that are all too often hidden within it. I will describe the technology and discuss examples from education, pharmacology and medical informatics that demonstrate how pulling the story from data can be used to drive communication and understanding in areas that desperately need it.
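To make the idea of turning data into a story concrete, here is a purely illustrative sketch. It is not the technology described in this talk; it assumes a very simple template approach, and the `describe_trend` helper and the sample values are hypothetical, invented only for this example.

```python
from statistics import mean


def describe_trend(label, values, unit=""):
    """Toy template-based narration: turn a numeric series into one sentence.

    Hypothetical helper for illustration only; not the speaker's system.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations to describe a trend")
    first, last = values[0], values[-1]
    if last == first:
        return f"{label} held steady at {first}{unit}."
    direction = "rose" if last > first else "fell"
    return (
        f"{label} {direction} from {first}{unit} to {last}{unit}, "
        f"averaging {mean(values):.1f}{unit}."
    )


# Made-up sample values, used only to exercise the template.
story = describe_trend("Average reading score", [61, 64, 70], " points")
print(story)
```

A real system would presumably choose among many candidate stories (trends, correlations, comparisons) and tailor the wording to its audience; this sketch only covers the final step of phrasing a single trend.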

Over the past few years, there has been explosive growth in open data,
with significant uptake in government, research and elsewhere. Open
data has the potential to transform society, government and the
economy, from how we travel to work to how we decide to vote. However,
to be useful, data (open or otherwise) needs to be used: it needs
individuals and institutions to analyze it and to act on that
analysis, it needs companies and communities to build apps and
services with it, and it needs tools and processes developed to
facilitate those activities.

The Open Knowledge Foundation has now been involved for nearly a
decade in building tools and community to create, use and share open
knowledge and data. In this presentation, Open Knowledge Foundation
co-founder Rufus Pollock will give an overview of some of the open tools
and projects the Foundation has been working on to enable better use
of data.

In particular, we’ll be looking at [CKAN, the leading open-source data
portal platform and data management system](http://ckan.org/) which
makes it easy for governments, organizations and individuals to
publish, share and find data. CKAN now powers dozens of official and
community data portals around the world, including government data
portals in the UK (data.gov.uk), the Netherlands, Norway, Brazil,
Argentina and the US, as well as the community data hub at http://DataHub.io.

We’ll also take a look at the OpenSpending project, which is working to
track every public government and corporate financial transaction
across the world and is well on its way, with data from more than 20
countries already. Finally, we’ll provide a sneak
preview of our plans for the [School of
Data](http://blog.okfn.org/2012/02/08/announcing-the-school-of-data/),
which will be launching later this year to provide free training in
all the skills needed to become a first-class data wrangler and data
scientist.

Data scientists have the power to unlock the secrets within tangled datasets, to find hidden insights in mountains of numbers, and to use data to make better decisions. In this talk we’ll explore how data science can be used in the social sector to help non-profits make better decisions, understand their impact, and better fulfill their missions to improve the public good. We’ll present case studies where data scientists worked with social organizations (DC Action for Children, The Grameen Foundation, and GuideStar) through a partnership with DataKind, a non-profit organization that connects data scientists with mission-driven social organizations.

As the largest and most diverse collection of information in human
history, the web grants us tremendous insight if we can only
understand it better. Web crawl data can be used to spot trends and
identify patterns in economics, health, politics, popular culture and
many other aspects of life. It provides an immensely rich corpus for
scientific research, technological advancement, and innovative new
businesses. It is crucial for our information-based society that the
web be openly accessible to anyone who desires to utilize it.

Common Crawl produces and maintains a repository of web crawl data
that is openly accessible to everyone. The crawl currently covers 5
billion pages and includes valuable metadata. Small startups or even
individuals can now access high-quality crawl data that was previously
available only to large search engine corporations.

In this session, Common Crawl Director Lisa Green will discuss the
value of open crawl data, explain how the Common Crawl corpus can be
accessed, and give examples of how it is currently being used in
research, education and business.

Wikipedia’s editors and readers create a digital footprint of who we are, which is not just defined by the articles that editors write, but also what the Wikipedia readers read and how editors interact with each other. This results in tons of data, most of which is publicly available. The data include the actual contents of all the articles, as well as data about pageviews, data about policy making, and data about editor interactions.

The purpose of this presentation is to give a quick overview of the data sources that are available for free, the tools that the Wikimedia Foundation is developing to analyze these datasets, and how we want to attract a new audience of data lovers and data geeks to help us expand our understanding of Wikipedia as a microcosm.

Diederik van Liere works as the Product Manager Analytics at the Wikimedia Foundation. The Wikimedia Foundation is a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual content, and to providing the full content of these wiki-based projects to the public free of charge. The Foundation operates some of the largest collaboratively edited reference projects in the world, including Wikipedia, a top-ten internet property.

Program subject to change. The entire conference will be recorded and made available to attendees free of charge afterwards.

Sponsorship Opportunities

A limited number of sponsorship opportunities are available for Strata Online Conference. To sponsor Strata Online Conference and for general sponsorship information, contact Susan Stewart at sstewart@oreilly.com.

Strata Online Conference Program Chair

Alistair Croll is an entrepreneur with a background in web performance, analytics, cloud computing, and business strategy. In 2001, he co-founded Coradiant (acquired by BMC in 2011) and has since helped launch Rednod, CloudOps, Bitcurrent, Year One Labs, and several other early-stage companies. He works with startups on business acceleration, and advises a number of larger companies on innovation and technology.

A sought-after public speaker on data-driven innovation and the impact of technology on society, Alistair has founded and run a variety of conferences, including Cloud Connect, Bitnorth, and the International Startup Festival. He’s the chair for O’Reilly’s Strata + Hadoop World conference. He has written several books on technology and business, including the best-selling Lean Analytics.