Strata 2014 Schedule

A new generation of data processing systems, including web search, Google's Knowledge Graph, IBM's Watson, and several different recommendation systems, combines rich databases with software driven by machine learning. This talk describes our recent thoughts on one crucial pain point in the construction of trained systems: feature engineering.

11:30am-12:10pm (40m)
Data Science

Predictive Modeling in the Cloud with Scikit-learn and IPython

Olivier Grisel (INRIA)

IPython and scikit-learn offer a nice environment for interactive data analytics in general and predictive modelling in particular. This presentation will give an overview of how to use both to perform tasks such as distributed model parameter tuning and parallel training of Random Forests on ad hoc compute clusters provisioned in the cloud.

1:30pm-1:50pm (20m)
Data Science

Crowdsourcing at Locu: How I Learned to Stop Worrying and Love the Crowd

Adam Marcus (B12)

Machine learning and paid crowdsourcing power several virtuous cycles in Locu's data processing pipeline. To solve various problems, we interact with hundreds of long-term crowd workers on oDesk and tens of thousands of shorter-term workers on CrowdFlower. Come learn about Locu's magic with examples based on problems we solve every day.

1:50pm-2:10pm (20m)
Data Science

Organizing Big Data with the Crowd

Lukas Biewald (CrowdFlower)

Data scientists know how hard it is to collect, categorize and label vast amounts of data. But some smart data scientists are effectively leveraging the human intelligence of the crowd to solve these problems, resulting in better training of machine learning models and improved system performance.

2:20pm-2:40pm (20m)
Data Science

Network Science Made Simple: SNA for Pie Chart Makers

Marc Smith (Connected Action Consulting Group)

SNA, social network analysis, is a powerful technique for making sense of a connected world. But the skills needed to collect, analyze, visualize, and gain insights into collections of connections are hard to find. Now, new tools make networks as easy to manage as a pie chart. Using the familiar Excel spreadsheet, NodeXL enables end users to gain insights into Twitter, Facebook & more.

2:40pm-3:00pm (20m)
Data Science

Friending Graph Analytics: Large-Scale Graph Processing Made Easy

Ted Willke (Intel)

Graph analytics promises to uncover new patterns in big data - but it's not easy to use commercially. Why is it so tough for data scientists to construct graphs and extract insight? This talk discusses Intel's efforts to deliver a graph cluster solution that is as easy to work with as it is powerful.

In this panel discussion, experts from four different industries will share their first-hand experiences building and deploying teams of data scientists.

4:50pm-5:30pm (40m)
Data Science

The Sidekick Pattern: Using Small Data to Increase the Value of Big Data

Abe Gong (Superconductive Health)

Creating value from big, messy data sets can be a daunting task. The session introduces the Sidekick Pattern: using small, curated data to increase the value of Big Data. Drawing on lessons from data science for Jawbone’s UP fitness tracker, we will see how smart selection of data sidekicks can accelerate analysis, solve cold start problems, and simplify complicated data pipelines.

10:40am-11:20am (40m)
Data in Action

10,000: The Most Dangerous Number in Sports

David Epstein (Sports Illustrated)

Epstein explains the origins of the "magic number," how it should be used, and how it is often misused in a manner that hinders both performance science and the development of athletes, and leads sports executives to overlook simple but important data.

11:30am-12:10pm (40m)
Data in Action

Mining Student Notes in Real Time to Provide Study Guides

Perry Samson (University of Michigan)

What if students could be given helpful feedback in real time based on the notes they are typing in class? This talk presents a prototype that has been in use in multiple courses at the University of Michigan to both challenge students' understanding based on the words they type in class and offer resources for further study.

1:30pm-2:10pm (40m)
Data in Action

Building a Lightweight Discovery Interface for Chinese Patents

Eric Pugh (OpenSource Connections)

The United States Patent and Trademark Office wanted a simple, lightweight, yet modern and rich discovery interface for Chinese patent data. This is how we did it.

2:20pm-3:00pm (40m)
Data in Action

How Twitter Monitors Millions of Time-series

Yann Ramin (Twitter, Inc.)

Twitter's Observability stack collects, processes, monitors and visualizes over 170 million real-time time series from all service and system components. This session covers how the stack is built and scales to enable developers and reliability engineers to build fault-tolerant distributed services. In this talk, you will learn what works and what doesn’t, from architecture to implementation.

4:00pm-4:40pm (40m)
Data in Action

The Netflix Data Platform - A Recipe for High Business Impact

Kurt Brown (Netflix)

Netflix is a data-driven company. While "data-driven" is often no more than a lofty buzzword, we'll discuss how we make it a reality. We'll dive into the technologies we use and the philosophies underpinning how we get things done. We'll cover our "cloud native" data infrastructure, our use of and contributions to open source software, and our open and enabling data environment.

4:50pm-5:30pm (40m)
Data in Action

Exascale Data Analytics @ Facebook

Sambavi Muthukrishnan (Facebook)

Data analytics is at the heart of product development at Facebook. Facebook’s data warehouse has grown rapidly over the years, and poses unique scalability challenges. This talk will briefly outline the evolution of the analytics software stack in the last year (both storage and query engines) and then delve deeper into the data management and compute challenges at this scale.

10:40am-11:20am (40m)
Hadoop and Beyond

Navigating the Big Data Vendor Landscape

Edd Wilder-James (Google)

A maze of twisty databases, all of which look the same, each claiming to be the best for the job. Welcome to the world of choosing big data vendors. In this session we'll map out the data tool landscape, and lay out a framework to help you choose a solution, or elect to build one yourself.

The growing popularity of Hadoop has led to an increasing number of clusters worldwide. Priming these clusters with data from existing client repositories is difficult due to a number of issues including data size, network constraints, security & lack of domain knowledge. In this talk, we present a number of techniques & best practices for uploading large amounts of data to remote Hadoop clusters.

1:50pm-2:10pm (20m)
Hadoop and Beyond

Break Down Data Silos with Apache Accumulo

Adam Fuchs (Sqrrl)

Apache Accumulo has evolved from a niche government project to a key component of the Hadoop ecosystem with adopters across a variety of industries. One important differentiator for Accumulo is the concept of "cell-level security." Learn how to properly implement cell-level security concepts from the former technical director of the Accumulo project at NSA.

Design of Experiments (DOE) is a scientific approach to understanding causality using data collection and applied statistical techniques. Through a series of relevant case studies, this session will review the “design” and the “experiment” side of DOE, including systematic data collection and basic statistical applications, and discuss relevant applications beyond A/B testing websites.

2:40pm-3:00pm (20m)
Hadoop and Beyond

Working With Time Series Data Using Apache Cassandra

Patrick McFadin (Datastax)

Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give an overview of the many ways you can be successful.

4:00pm-4:40pm (40m)
Hadoop and Beyond

Secrets of Apache Hive Queries and UDFs

Shrikanth Shankar (Qubole Inc.)

Shrikanth Shankar, Qubole’s VP of Engineering, shares his best practices for building high-performance, scalable queries and deploying User Defined Functions (UDFs) to Big Data applications in Apache Hive. For data analysts and data scientists in the trenches, this is a key session to attend.

4:50pm-5:30pm (40m)
Hadoop and Beyond

Apache Hadoop 2.0: Migration from 1.0 to 2.0

Vinod Kumar Vavilapalli (Hortonworks)

The Hadoop 2.0 revolution is in full force! Organizations, companies, and users are gearing up for the move from 1.0 to 2.0. In this talk, we will discuss what Hadoop 2.0 is about, what YARN is, what features HDFS2 unlocks, and what it means to move to 2.0. We'll discuss this major migration from 1.0 to 2.0 from various perspectives: admins, frameworks, end users & data processing platforms.

10:40am-11:20am (40m)
Ethics, Policy, and Privacy

Adaptive Adversaries: Building Systems to Fight Fraud and Cyber Intruders

Ari Gesher (Palantir Technologies)

Statistical methods tend to fail when there is someone on the other side of a problem actively evading detection. Here we look at three systems successfully used to fight adaptive adversaries engaged in fraud and cyber attacks. Using a combination of big data techniques and interactive human analysis, these systems are protecting commercial banks, pharmaceutical companies, and governments.

11:30am-12:10pm (40m)
Ethics, Policy, and Privacy

Machine Learning for Social Change

Fernand Pajot (Change.org)

With more than 45 million users and over 40,000 petitions created every month, Change.org is the biggest online platform for social change around the world. This talk is about how both bleeding edge and simple machine learning algorithms are used at Change.org to connect users to petitions and social issues which are most relevant to them.

1:30pm-2:10pm (40m)
Ethics, Policy, and Privacy

Evolving Data Governance for the Big Data Enterprise

Scott Lee (Knowledgent), Rachel Haines (EMC)

Earlier Data Governance generations (that support BI-DW or MDM) succeeded by aligning stakeholders and improving data interoperability. But in the world of Big Data, interoperability is table-stakes, and next-gen Data Governance must provide contextual intelligence sufficient to reason out complex inquiries across diverse data. How? Would you believe a mash-up of building codes and game theory?

2:20pm-3:00pm (40m)
Ethics, Policy, and Privacy

A Different Look at Data and Security - Learning to Live with Fear

Pablos Holman (Turing AI)

We are at the beginning of creating a generation of scientists & analysts who can relate to data in entirely new ways. The feeble computational models we’ve created in Excel over the course of our lives are fundamentally different than what is just becoming possible.

4:00pm-4:40pm (40m)
Ethics, Policy, and Privacy

Soylent Mean: Data Science is Made of People

Cameran Hetrick (VMware), Kimberly Stedman (Freelance)

Combine your best algorithms and smartest data architecture, and what do you get? Without humans, you have an expensive, high tech brick. Humans generate data, which is used by and for humans to achieve human goals. If you want your data department to earn its keep by showing real value, you must build your social systems as meticulously as you build your pipeline.

What can an SQL query teach us about the gender gap? We'll dive into the 20 million Freebase entities to focus on people notable enough to be part of it. What percentage of them are women? How is the gender gap divided by profession? How is it changing throughout the years? How do all these variables look when mapped at a country, state, and neighborhood level?

10:40am-11:20am (40m)
Connected World

You're Halfway There: Moving from Insight to Action

Bob Filbin (Crisis Text Line)

The measure of success for a data scientist is not the number of insights, but the impact on co-workers' behavior. Moving from insight to action requires an art underutilized by the data science community: storytelling.
I will cover techniques including the Fogg model, loss aversion, and minimum viable stories, using examples of my failures and successes in driving behavioral change with data.

11:30am-12:10pm (40m)
Connected World

Thinking with Data

Max Shron (Warby Parker)

Why have powerful tools if you aren't asking the right questions? Good questions trump shiny tools, but our community has done little to improve how we train people in the "soft side" of data science. We will show how to borrow ideas from design, the humanities, and consulting practices to structure problems and improve the questions we ask of our data.

A group of VCs who invest from very early through later-stage investments talk about all things Big Data. There will be no “3 Vs” discussion here. The panelists are committed to making this a lively discussion about topics ranging from the typical (what sectors do they want to invest in?) to the atypical (what’s out there that they don’t like?).

2:20pm-3:00pm (40m)
Connected World

Harvard's Clean Energy Project: Big Data Maps To Renewable Energy

Kai Trepte (Harvard Clean Energy Project)

The present fossil-fuel-based economy must give way to a renewable-energy-based future. The Harvard Clean Energy Project set out to discover new molecular materials for the next generation of organic solar cells. By studying 2.3 million (m) compounds with 24m conformers in 150m density functional theory calculations, this Big Data project will benefit mankind by aiding the quest for clean energy.

4:00pm-4:40pm (40m)
Connected World

Bedtime Stories: Learning from Sleep Data

Monica Rogati (Data Natives)

We optimize ads, but not our mood. We know more about our tweets than our own bodies. That's all about to change. As wearables transform the 'quantified self' from a niche to a mainstream market, they are generating vast amounts of data about our health, habits, and lifestyles.

4:50pm-5:30pm (40m)
Connected World

Sending Millions of Surveys Around the World on Mobile Phones

Max Richman (Mobile Accord - GeoPoll)

At GeoPoll we are building a mobile integration platform to poll millions around the world via their own mobile phones. We do this by integrating with mobile carriers in places like Afghanistan and Congo to target users by location, make messages free, & pay users directly. This is hard. We have learned many dos and don'ts which we would like to share.

10:40am-11:20am (40m)
Design

Information Visualization for Large-Scale Data Workflows

Michael Conover (LinkedIn)

A core element of product innovation and successful predictive modeling, information visualization plays a central role in effective data processing pipelines. In this talk, we will explore how the technologies and workflow patterns used by LinkedIn data scientists can be applied to analytical challenges found across a wide variety of problem domains.

Storing massive data is one challenge. Making it useful throughout all levels of a company in real time is quite another. The ability to intuitively sort, sift and analyze data through touch and gesture is here.
We will review several case studies of how companies are creating intuitive data-driven cultures through Cloudera Search, leveraging Impala coupled with Zoomdata visualization.

The true power of big data will be realized when average people can use complex analytics to solve everyday problems. We will describe a future engagement model derived from work in the Intelligence Community, reviewing real-world use cases showing how user-centric design is transforming big data from a science requiring specialists to elegant visualizations that deliver insight to average users.

1:50pm-2:10pm (20m)
Design

Superconductor: Scaling Charts with Design and GPUs

Leo Meyerovich (Graphistry)

Visualization is a weak link in big data tools: shoving 1MM rows into standard charts breaks their visual design and kills interactivity. In our mission to scale charts, we built the Superconductor language. It automatically compiles declarative visualizations into GPU code (WebCL+WebGL). This talk will explore how we're redesigning and optimizing core charts like heat maps and line graphs.

2:20pm-3:00pm (40m)
Design

Unlocking the Secrets of Gertrude Stein

Ian Timourian (Paxata)

Happy accidents can influence one's creative process. Ian Timourian will discuss his exploration of the algorithms and techniques utilized by the famous poet Gertrude Stein through visualization.

4:00pm-4:40pm (40m)
Design

Music Videos and Gastronomification for Big Data Analysis

Brian Abelson (csv soundsystem), Thomas Levine (csv soundsystem)

We have developed some open-source tools for building and scaling systems for realtime data analysis with data music videos and data gastronomification. We'll discuss the theory behind these two data analysis methods, and then we'll present case studies on how our tools are used to enable business analytics and instill a data-driven culture.

4:50pm-5:30pm (40m)
Design

Making Data Human

Shelley Evenson (Fjord)

This talk by Shelley Evenson, Executive Director of Organizational Evolution at Fjord, will outline the key tenets of designing for big data: the difference between using personal or aggregate data, how to identify and utilize data patterns, how to build trust, and ways to deliver ongoing value at the right moments.

10:40am-11:20am (40m)
Sponsored

Fighting Global Cybercrime and BotNets using Big Data

Bryan Hurd (Microsoft Cybercrime Center), Herain Oberoi (Microsoft)

BotNets and cybercrime are by their very nature Big Data problems. The Microsoft Cybercrime Center is working in conjunction with law enforcement, public sector, commercial and academic partners to investigate, disable and prosecute cyber criminals...

11:30am-12:10pm (40m)
Sponsored

Harness Data in Real-Time with Infinite Storage

Yuvaraj Athur Raghuvir (SAP Labs LLC.)

To seize the future, data must be harnessed in actionable time. Based on a real deployment, see how to achieve instant results with infinite storage: filter large amounts of cold data in Hadoop, analyze in real time with SAP HANA, and visualize using SAP Lumira. Learn how solutions from SAP and our Hadoop partners can help your organization seize the future and gain unprecedented insight from Big Data.

1:30pm-2:10pm (40m)
Sponsored

Making Big Data Cost Effective in a Bare Metal Cloud

Harold Hannon (SoftLayer)

The cloud provides an easy onramp to building and deploying Big Data solutions, particularly the latest technologies that favor scale-out architectures. Transitioning from initial deployment to a large-scale, highly performant operation without breaking the bank may not be easy.

2:20pm-3:00pm (40m)
Sponsored

Delivering on the Promise of Big Data

Arvind Parthasarathi (YarcData)

The real promise of big data isn’t about merely doing analytics cost-effectively and at scale; it’s about discovery. Data discovery means uncovering hidden patterns from disparate sources without needing to know which questions to ask or the data relationships in advance...

4:00pm-4:40pm (40m)
Sponsored

Big Data: Beyond Bare-Metal?

Mike Wendt (NVIDIA)

In this session, we will share the results of our study, a price-performance comparison of a bare-metal Hadoop cluster and cloud-based Hadoop clusters.

Join Trifacta's founders and their customers to learn how Data Transformation is changing the way people work with data. By increasing data analyst productivity and giving business analysts direct access to Big Data for the first time, Trifacta's technology increases the breadth of data they work with, significantly shortens "time to insight", and enables better business decisions.

10:40am-11:20am (40m)
Sponsored

Apache Hadoop and the Emergence of the Enterprise Data Hub

Eli Collins (Cloudera)

In this talk, we'll explore how Apache Hadoop has rapidly evolved to become the new foundation for enterprise analytics - the enterprise data hub - and learn about the state-of-the-art in deploying a modern data warehouse on top of the Hadoop stack.

Attend this session to learn how you can take advantage of the new economics of data. This session will present examples of how leading organizations are evolving their enterprise data architectures to bring together the Data Warehouse, Hadoop & Data Discovery Platforms so All Users can benefit from ALL Analytics on ALL Data.

1:30pm-2:10pm (40m)
Sponsored

Building a Data-centered Data Center for Agile Development

Justin Makeig (MarkLogic)

Most data centers are filled with rigid data servers that are tightly linked to specific applications, leading to data duplication, lengthy development cycles, and unnecessary costs. Learn how you can use the MarkLogic Enterprise NoSQL database platform to help create a flexible, agile data fabric that will allow you to iterate your application development, optimize your data, and reduce costs.

2:20pm-3:00pm (40m)
Sponsored

Scalable PostgreSQL as your data platform

Ben Redman (Citus Data)

PostgreSQL is an advanced open source database known for its reliability. It also features a rich extension ecosystem that enables features like semi-structured data types, new SQL operators, and a columnar data store. This talk examines extensions available to PostgreSQL users and how CitusDB turns PostgreSQL into a scalable data platform for addressing real world analytics problems.

4:00pm-4:40pm (40m)
Sponsored

Transforming Search Engine Marketing at Ask.com

Mohit Sati (Ask.com)

Search Engine Marketing is an important revenue opportunity for Ask.com, planned to nearly double in 2014. Fueled by growth and acquisitions such as About.com and Investopedia, the keyword portfolio will grow by 90x through 2014.
SEM Analytics at Ask.com involves tens of millions of cost metrics stored daily, hundreds of millions of portfolio keywords, and billions of historical costs.

4:50pm-5:30pm (40m)
Sponsored

Tracking a Soccer Game with Big Data

Srinath Perera (WSO2)

This presentation discusses how we used complex event processing (CEP) and MapReduce based technologies to track and process data from a soccer match as part of the annual DEBS event processing challenge while achieving throughput in excess of 100,000 events/sec.

10:40am-11:20am (40m)
Sponsored

Best Practices for Hadoop In Production - Panel Discussion Facilitated by Forrester Analyst

Mike Gualtieri (Forrester Research)

Mike Gualtieri, principal analyst at Forrester Research, Inc., will facilitate a panel of production Hadoop users – including Cisco, The Climate Corporation, The Rubicon Project, and Solutionary – to discuss the challenges and best practices for deploying Hadoop in production. Join us for an engaging conversation on tips and tricks in deploying Hadoop in production.

11:30am-12:10pm (40m)
Sponsored

You Don't Need to Boil the Big Data Ocean with Hadoop

Ben Werther (Platfora), Sanjay Mathur (Silicon Valley Data Science)

Join us as we discuss the real-world applications of big data, examine what's working and what isn't, and discuss why you don't need to boil the big data ocean with Hadoop.

1:30pm-2:10pm (40m)
Sponsored

How Evernote Measures Conversion Using Hadoop Analytics

Damon Cool (Evernote), John Santaferraro (Actian Corporation)

In 2012, Evernote took proactive steps to prepare for a rapidly expanding customer base by making the transition from 18-hour queries on a MySQL server to ad hoc analytics for 200 million daily events, all while on a budget. This session explains how Evernote is scaling to hundreds of terabytes and analyzing 200 million events per day using a two-tier architecture that includes Hadoop and an analytic platform.

2:20pm-3:00pm (40m)
Sponsored

Collaborative Predictive Analytics: How Sony, Havas, and Aridhia Opened the Black Box.

Bruno Aziza (Alpine Data Labs)

In this panel discussion, we’ll hear from entertainment, healthcare, and media industry leaders as they discuss their strategy to demystify analytics end to end. We’ll have a question and answer session moderated by Alpine Data Labs.

4:00pm-4:40pm (40m)
Sponsored

Twitter and HP HAVEn: The Big Data Big Picture.

Sanjay Goil (Autonomy IDOL)

Forget the 140 characters, Twitter is Big Data. Every day sees around 100TBs of data ingested and tens of thousands of Hadoop jobs. Join us to hear how Twitter is using HP’s HAVEn platform to run their Big Data analytics. Learn why they’ve integrated HP Vertica with their Hadoop infrastructure to deliver the scale and speed needed for their analytics.

4:50pm-5:30pm (40m)
Sponsored

Getting a Handle on Hadoop and its Potential to Catalyze a New Information Architecture Model

Milan Vaclavik (CenturyLink Technology Solutions)

We will discuss the strategic significance of infrastructure core services (compute, storage, network, and comprehensive security) required for robust big data solutions. We will also cover the strategic significance of Hadoop 2.0, Hadoop/NoSQL convergence, and the critical need for effective modeling, query formulation, and data analysis capabilities as Hadoop becomes an enterprise platform for big data.

10:10am-10:40am (30m)

Break: Morning Break sponsored by Cloudera

3:00pm-4:00pm (1h)

Break: Afternoon Break sponsored by MapR

5:30pm-7:00pm (1h 30m)
Event

Booth Crawl

Quench your thirst with vendor-hosted libations and snacks while you check out all the cool stuff in the Expo Hall.

12:10pm-1:30pm (1h 20m)

Wednesday Lunchtime BoF Tables

Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wednesday, February 12 and Thursday, February 13. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area.

Strata Program Chairs, Roger Magoulas and Alistair Croll, welcome you to the first day of keynotes.

8:50am-9:05am (15m)

Crossing the Chasm: What’s New, What’s Not

Geoffrey Moore (Geoffrey Moore Consulting)

Crossing the Chasm has been a key reference point for high-tech marketing since its publication in 1990, but a lot has changed since then, especially with the rise of cloud computing, software as a service, mobile endpoints, big data analytics, and viral marketing.

9:05am-9:10am (5m)
Sponsored

Evolution from Apache Hadoop to the Enterprise Data Hub

Amr Awadallah (Cloudera)

In this talk, Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.

9:10am-9:15am (5m)
Sponsored

Collecting Massive Data via Crowdsourcing

Metro John Schitka (SAP)

Crowdsourcing can be an effective way to collect massive amounts of data to enable deeper analysis in many situations. Explore the foundational steps that can lead to successfully crowdsourcing data through the lenses of the International Barcode of Life and Technical University Munich (TUM) ProteomicsDB projects. SAP is proud to be involved with driving the success of both these projects.

9:15am-9:25am (10m)

Empowering Personalized Learning with Big Data

Ramona Pierson (Declara)

Humans are constantly curious and learning should be about making new discoveries. With big data, we have the potential to take formal learning which is taught and combine it with informal learning which is experienced, to create personalized learning paths for every individual.

9:25am-9:30am (5m)
Sponsored

Hadoop in 5 Minutes or Less

John Schroeder (MapR Technologies)

This five-minute keynote will provide a quick overview of some of the more surprising things Hadoop is capable of in 5 minutes or less.

9:30am-9:35am (5m)

People are Data Too

Farrah Bostic (The Difference Engine)

We feel safer in big numbers, and we believe that numbers don't lie. But numbers don't actually speak for themselves - people speak for them.

9:35am-9:45am (10m)
Sponsored

Bringing Big Data to One Billion People

Quentin Clark (Microsoft)

How does the world change when big data reaches a billion people? What happens when anyone, from farmers to criminal investigators, gains the power to quickly derive meaningful insights from vast and varied data sources? Join Quentin Clark, Microsoft Corporate Vice President, who will highlight how simple, familiar tools and cutting-edge cloud technologies are bringing big data to all.

9:45am-9:55am (10m)

Small Data in Sports: Little Differences that Mean Big Outcomes

David Epstein (Sports Illustrated)

The gap between legend and anonymity in elite sports is often less than a 1% difference in performance. Thus, finding the core, modifiable variables that determine performance and tweaking them ever so slightly can alchemize silver medals into gold ones.

9:55am-10:05am (10m)

The Art of Good Practice

Rodney Mullen (Almost Skateboards)

The better we tune our practice, the more practice will make perfect.

8:00pm-11:00pm (3h)
Event

Data After Dark: Club Strata

Help us kick off Strata 2014 with a festive gathering featuring a poker tournament. But even if you're not a card shark, join us for plenty of networking, refreshments, and great music, played by DJs whose day job is data science.