Description:

Join a lineup of top thinkers and technologists from the upcoming Strata + Hadoop World at this free live-streamed event as they cover the hottest data topics and explore how businesses are using data to get results. We'll examine the ways data is used across a variety of industries, from healthcare to business, as well as a case study of Uber's simulation framework, architectural considerations for Hadoop, and building privacy-protected data systems.

About Alistair Croll

Alistair has been an entrepreneur, author, and public speaker for nearly 20 years. He's worked on web performance, big data, cloud computing, and startup acceleration. In 2001, he co-founded web performance startup Coradiant (acquired by BMC in 2011), and has since helped launch Rednod, CloudOps, Bitcurrent, Year One Labs, and several other early-stage companies.

Alistair is the chair of O'Reilly's Strata conference. He also helped found Bitnorth, the International Startup Festival, and several other technology events. He works with a few startups on business acceleration, and advises a number of larger companies on innovation and technology. "Lean Analytics" is his fourth book on analytics, technology, and entrepreneurship.

Alistair lives in Montreal, Canada with his wife and daughter, and tries to mitigate chronic ADD by writing about far too many things at "Solve For Interesting".

Data- and experiment-driven cultures are steadily growing in the tech industry. While fostering such a culture reaps many benefits for a company, it also brings an important mandate to properly instrument, measure, and attribute experiment impact. While the gold standard of A/B testing allows for straightforward experimental analysis, a number of scenarios are not amenable to A/B testing due to various constraints (financial feasibility, technical capability, etc.).

Such "non-standard" quasi-experimental events are quite common, but many companies, even those with data-driven cultures, ignore them because they fall outside the randomized controlled trial framework. In this talk we will explore a number of techniques for improved impact measurement and attribution, techniques that build on one another in iterative or modular ways and allow data scientists to derive value from data that might normally be dismissed as "messy" or "unusable".
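The talk doesn't name its techniques here, but difference-in-differences (DiD) is one classic quasi-experimental method for exactly this situation: estimating the impact of an intervention that couldn't be randomized. The sketch below uses invented numbers purely for illustration.

```python
# Minimal sketch of difference-in-differences (DiD), one common quasi-experimental
# technique for estimating impact without a randomized A/B test.
# All numbers below are hypothetical.

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Estimate the treatment effect as the change in the treated group minus
    the change in the control group (which absorbs the shared background trend)."""
    return (treated_after - treated_before) - (control_after - control_before)

# Mean daily conversions per market, before/after a campaign that could not
# be randomized at the user level.
effect = diff_in_diff(treated_before=120.0, treated_after=150.0,
                      control_before=100.0, control_after=110.0)
print(effect)  # 20.0 -- a naive before/after comparison would have claimed 30.0
```

The control group's change (here, +10) stands in for what would have happened to the treated group anyway, which is the kind of assumption a naive analysis silently skips.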

We will learn about these techniques with the aid of examples from the popular press (Zach Galifianakis and healthcare.gov), Microsoft advertising (television and print), and Bing experimentation (comparisons of A/B tests and techniques outlined in this talk). In each case we will compare analysis techniques, point out inconsistencies in naive analysis, and build methods to avoid such mistakes.

The goal of this talk is for the audience not only to gain an understanding of why impact and attribution are important, but also to understand the assumptions, pitfalls, and strengths of various analytic approaches to impact and attribution. This talk is intended to bridge the gap from initial instrumentation, infrastructure, and dashboarding to designing experiments that move metrics in a positive way, and to understanding what caused them to move in the first place.

About Chris Harland

Chris Harland is a Data Scientist at Microsoft working on problems in Bing search, Windows, and MSN. He holds a PhD in Physics from the University of Oregon and has worked in a wide variety of fields spanning elementary science education, cutting edge biophysical research, and recommendation/personalization engines.

Ever since Chris started using Bayesian methods on a semi-regular basis, the frequency with which he uses the phrase "well, maybe" in conversation with colleagues has increased tenfold. His colleagues have yet to forgive Bayes for this.

If all you can hear are the screaming voices in your data, you're likely only acting on what every other rational expert would see. What separates innovation from incremental improvement is the ability to listen to the weak signals from your data—and from your customers, advisers, and partners.

How do we let go of our familiar metrics and listening posts, and instead find new hits where before we heard only silence? In this webcast talk—and with a nod to Simon and Garfunkel—Jana Eggers offers five tips to help businesses find their way to the words of the prophets written on data's subway walls.

About Jana Eggers

Jana is a tech exec focused on products and the messages surrounding them. She's started and grown companies, as well as led large organizations within even bigger companies. She supports, subscribes to, and contributes to customer-inspired innovation, systems thinking, lean analytics, and Autonomy/Mastery/Purpose-style leadership. Her software and technology experience comes from technology and executive positions at Intuit, Blackbaud (software for nonprofits), Basis Technology (internationalization technology), Lycos, American Airlines' Sabre (decision support systems for logistics), Los Alamos National Laboratory (computational chemistry and supercomputing), Spreadshirt (customized apparel platform and e-commerce), and acquired startups that you've never heard of. Eggers received her bachelor's degree in mathematics and computer science at Hendrix College in Arkansas and attended graduate school in computer science at Rensselaer Polytechnic Institute.

Business problems don't reveal themselves neatly as data problems. As we gather more and more fine-grained data (behavioral, event-based, machine-collected), we see a shift in both the tools and the technical skills necessary to answer tough questions. The tools are becoming more commoditized, but the problem remains: actually bridging the gap between business needs and the math.

Who will do this work, and how will they do it? A decade of investment in BI made it possible for a manager to quickly pull up answers to questions that fit into an OLAP cube. Fine-grained data poses unique challenges that make it tough, if not impossible, to provide tools directly to those who best understand the needs of a business. Data scientists, most of whom have exclusively technical backgrounds, need a methodology for fitting together the pieces of the puzzle. Business leaders, too, need new skills to make sure that data science work yields actual benefits.

The tools, technology and even the people aren't enough unless we can figure out how to solve the right problem. Based on material from Max Shron's book Thinking with Data and his experience running a data strategy consulting firm, we'll explore tactics for need-finding and problem scoping that make it possible to put investments in data to profitable use.

About Max Shron

Max Shron runs Shron & Company, a data strategy consulting firm based in New York. His team provides advice and analysis to help organizations tackle hard data challenges. Max was previously the lead data scientist at New York-based OkCupid, where he contributed the big-data side of its successful OkTrends blog. His work has appeared worldwide, in outlets including the New York Times, Chicago Tribune, Huffington Post, and WNYC. Max holds a degree in mathematics from the University of Chicago.

The session introduces advanced math for business people — "just enough" to take advantage of open source frameworks — including graph theory, abstract algebra, optimization, Bayesian statistics, and more advanced areas of linear algebra. These are needed for supply chain optimization, pricing models, and anti-fraud, especially given the increased data rates coming from the Internet of Things.

In the talk, Paco Nathan will show how to:

Develop themes within the material to highlight a computational thinking approach for Big Data

Decompose a complex problem into smaller solvable problems

Leverage pattern recognition to identify when a known approach can be leveraged

Abstract from those patterns into generalizations as strategies

Articulate strategies as algorithms — general recipes for how to handle complex problems
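As one concrete instance of "just enough" graph theory applied to supply chain optimization, here is a minimal sketch: finding the cheapest shipping route over a small network with Dijkstra's algorithm. The network, node names, and costs are all invented for illustration; the talk itself covers the general strategy rather than this specific code.

```python
# Hypothetical example: "just enough" graph theory for a supply-chain question --
# the cheapest route from a warehouse to a store over a small shipping network.
import heapq

def cheapest_route_cost(graph, source, target):
    """Dijkstra's algorithm over a dict-of-dicts adjacency map of edge costs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")  # target unreachable

shipping = {
    "warehouse": {"hub_a": 4.0, "hub_b": 2.0},
    "hub_a": {"store": 1.0},
    "hub_b": {"hub_a": 1.0, "store": 5.0},
}
print(cheapest_route_cost(shipping, "warehouse", "store"))  # 4.0, via hub_b -> hub_a
```

This illustrates the pattern the bullets describe: decompose (route = sequence of hops), recognize a known approach (shortest path), and articulate it as an algorithm.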

Uber has two main goals: 1) get you a ride when you need it, and 2) make sure its driver partners are maximizing their earnings. Optimizing these two parameters requires modeling a number of complex, non-linear, interacting systems. Rather than confronting this difficult problem directly, Bradley made use of agent-based simulations of driver and passenger behaviors to see which combinations of parameters were best.

He will introduce Uber's city simulation framework and explain how and why they simulate Uber passenger/driver interactions. He will also discuss how this is used for "semi-automated science" to generate plausible A/B test options for Uber to explore.

Bradley's simulations recommend optimal dispatch distances for pairing a driver with a passenger, a value that varies over time and differs across cities. Furthermore, the simulations suggest optimal behaviors for drivers between trips: when dispatch distances are very short, drivers should navigate back toward areas of demand density; when dispatch distances are relatively longer, drivers can maximize their earnings, and use less gas, by remaining stationary between trips.

Such plausible scenarios—which emerge purely from the simulations—provide Uber with a suite of testable A/B hypotheses. In other words, the city simulation framework generates possible A/B tests to optimize the Uber client experience and minimize gas usage to maximize driver partner earnings.
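To make the idea of an agent-based simulation concrete, here is a deliberately tiny sketch, not Uber's actual framework: riders appear randomly along a one-dimensional street, dispatch pairs each rider with the nearest driver, and we compare mean dispatch distance under two driver-placement policies. Every number and policy here is invented for illustration.

```python
# Toy agent-based sketch (not Uber's framework): riders appear on a 1-D street
# of length 100; dispatch assigns each rider the nearest driver, and we measure
# how driver placement affects the mean dispatch distance.
import random

def mean_dispatch_distance(driver_positions, n_riders, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    total = 0.0
    for _ in range(n_riders):
        rider = rng.uniform(0.0, 100.0)                   # rider appears somewhere
        nearest = min(driver_positions, key=lambda d: abs(d - rider))
        total += abs(nearest - rider)                     # this trip's dispatch distance
    return total / n_riders

# Drivers clustered at one end of the street vs. spread across it:
clustered = mean_dispatch_distance([0, 5, 10, 15], n_riders=1000)
spread = mean_dispatch_distance([12.5, 37.5, 62.5, 87.5], n_riders=1000)
print(spread < clustered)  # True: repositioning toward demand shortens dispatches
```

Even a toy like this generates a testable hypothesis ("repositioning idle drivers reduces dispatch distance"), which is the spirit of using simulation to propose A/B tests rather than to replace them.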

About Bradley Voytek

Brad is a professor of computational cognitive science and neuroscience at UC San Diego, and the Data Evangelist for Uber. He makes use of big data, mapping, and simulations to figure out cognition.

He's created several research tools, most notably the neuroscience literature meta-analytic resource brainSCANr.com with his wife, Jessica Bolger Voytek.

He's an avid science teacher and outreach advocate, and has spoken at events ranging from elementary schools to venues such as Ignite, TEDxBerkeley, @GoogleTalks, and SciFoo. He runs the blog Oscillatory Thoughts (http://blog.ketyov.com), and his tongue-in-cheek book about the zombie brain, Do Zombies Dream of Undead Sheep? (Princeton University Press), comes out this fall.

Implementing solutions with Apache Hadoop requires understanding not just Hadoop, but a broad range of related projects in the Hadoop ecosystem, such as Hive, Pig, Oozie, Sqoop, and Flume. The good news is that there's an abundance of material (books, websites, conferences, etc.) for gaining a deep understanding of Hadoop and these related projects. The bad news is that there's still a scarcity of information on how to integrate these components into complete solutions. In this tutorial we'll walk through an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. We'll use this example to illustrate important topics such as:

Modeling data in Hadoop

Selecting optimal storage formats for data stored in Hadoop

Moving data between Hadoop and external data management systems such as relational databases

Moving event-based data such as logs and machine generated data into Hadoop
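As a small taste of that last topic, here is a hedged sketch of the kind of parsing step involved in moving raw clickstream logs into Hadoop: turning combined-log-style lines into structured records suitable for a columnar store. The log format and field names are invented for illustration, and in practice this work would typically be done by a tool like Flume or within a MapReduce/Hive pipeline rather than a standalone script.

```python
# Hypothetical sketch: parse raw clickstream log lines into structured records
# before loading them into Hadoop. The log format here is an invented,
# combined-log-style example.
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse_clickstream_line(line):
    """Return a dict record, or None for malformed lines (routed to a reject file)."""
    match = LOG_PATTERN.match(line)
    if not match:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])  # cast numeric fields for the schema
    record["bytes"] = int(record["bytes"])
    return record

line = '10.0.0.1 - - [12/Mar/2014:10:15:32 +0000] "GET /cart HTTP/1.1" 200 512'
rec = parse_clickstream_line(line)
print(rec["url"], rec["status"])  # /cart 200
```

Deciding where this parsing happens (at ingest time versus at query time) is exactly the kind of architectural choice the tutorial's storage-format and data-movement topics address.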

Throughout the example, we'll cover best practices and considerations for architecting applications on Hadoop. This tutorial will be valuable for developers, architects, and project leads who are already knowledgeable about Hadoop and are now looking for more insight into how it can be leveraged to implement real-world applications.

About Gwen Shapira
Software Engineer at Cloudera

Gwen leads people to build large data systems where every millisecond counts.

She has served as a senior consultant at Pythian, and is an Oracle ACE Director, a board member at NoCOUG, and a member of the Oak Table Network.

This session covers secure architectures and other privacy-related topics in information security.

Protecting privacy and civil liberties is an important aspect of data system design. Any system that will be handling financial information, communications, personally identifiable information, medical data, or any of a myriad of other data types needs to be built to preserve the privacy of the data about the individuals and organizations contained within it.
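One small, concrete safeguard such systems commonly employ is pseudonymization: replacing raw identifiers with a keyed hash before analysis, so analysts never see the underlying PII but records for the same person still join. This is a generic illustration, not a description of any particular vendor's implementation, and the key value is a placeholder.

```python
# Minimal sketch of pseudonymization with a keyed hash (HMAC-SHA256).
# Unlike a plain unsalted hash, an attacker without the key cannot reverse
# these pseudonyms with a dictionary of likely identifiers.
import hashlib
import hmac

SECRET_KEY = b"placeholder-key"  # hypothetical; in practice, stored in a secrets vault and rotated

def pseudonymize(identifier: str) -> str:
    """Map an identifier (e.g., an email address) to a stable, opaque pseudonym."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

a = pseudonymize("alice@example.com")
b = pseudonymize("alice@example.com")
c = pseudonymize("bob@example.com")
print(a == b, a == c)  # True False: stable per person, distinct across people
```

Pseudonymization is only one layer; the oversight and audit capabilities the talk describes matter precisely because no single technical measure is sufficient on its own.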

Palantir Technologies builds data analysis products, with careful safeguards and oversight, designed to hold some of the world's most sensitive information. From the beginning, privacy protections and rigorous oversight capabilities have been baked into the data platforms we design and sell.

Written by the Privacy and Civil Liberties Team, the upcoming book Architecture of Privacy surveys the privacy protection landscape and shares decades of accumulated wisdom on how to build these systems in the wild.

About Ari Gesher
Palantir Technologies

Ari Gesher is a senior engineer and Engineering Ambassador at Palantir Technologies.

At Palantir Technologies, Ari has split his time between working as a backend engineer on Palantir's analysis platform, thinking and writing about Palantir's vision for human-driven information data systems, and moonlighting on both Palantir's Privacy and Civil Liberties team and its philanthropic engineering team. His current role involves understanding and discussing Palantir's place in the world of analytics, big data, and the future of technology, and its impact on the world.

An alumnus of the University of Illinois computer science department, Ari has worked in the software industry for the past fifteen years, including a stint as the lead engineer for the SourceForge.net open source software archive.