OSCON: Data 2011 Schedule

Below are the confirmed and scheduled talks (schedule subject to change).

Over the past few years, Netflix has migrated to the cloud. This talk details Netflix's transition away from relational databases and towards high-availability (NoSQL) storage systems. We rely on a combination of proprietary (e.g. SimpleDB and S3) and open-source (e.g. Cassandra and HBase) NoSQL technologies.

11:30am-12:10pm (40m)
Data: NoSQL Databases

The Right Tool For The Right Job: Choosing The Best Data Storage Option

Patrick Lightbody (New Relic)

Between the NoSQL movement and new cloud offerings, it seems there are new storage options popping up every day. How do you select which one is the best for your project? The truth is that it's unlikely one option is best for all your needs. This session walks you through the various options considered by one startup and how it selected five separate storage engines - and has no regrets about doing so!

1:30pm-2:10pm (40m)
Data: NoSQL Databases

Building Web Applications with MongoDB

Roger Bodamer (10gen)

In this workshop, one of the core MongoDB committers will present the fundamental principles of MongoDB, how to set up and interact with the database, and what to consider when building applications using a document-based data model.

2:20pm-3:00pm (40m)
Data: NoSQL Databases

Redis: CS101 Data Structures via the Network

Ezra Zygmuntowicz (VMware Inc)

Redis is an entry in the new breed of NoSQL databases. But it takes a different approach that makes it much more interesting than most of the other key/value stores in the same category. Come learn what makes Redis so useful that it seems everyone is adding it to their toolbox.

3:30pm-4:10pm (40m)

Whirr: Open Source Cloud Services

Tom White (Cloudera)

Apache Whirr is a way to run distributed systems - such as Hadoop, HBase, Cassandra, and ZooKeeper - in the cloud. Whirr provides a simple API for starting and stopping clusters for evaluation, test, or production purposes. This talk explains Whirr's architecture and shows how to use it.

4:20pm-5:00pm (40m)

Gearman: From the Worker's Perspective

Brian Aker (HP)

Many people view topics like Map/Reduce and queue systems as advanced concepts that require in-depth knowledge and time-consuming software setup. Gearman is changing all that by making this barrier to entry as low as possible with an open source, distributed job queuing system.

10:40am-11:20am (40m)
Data: Relational

MySQL Replication Update

Lars Thalmann (Oracle)

We describe the new replication features in MySQL 5.5 (GA) and MySQL 5.6 (Development release).

11:30am-12:10pm (40m)
Data: Relational

HandlerSocket: NoSQL via MySQL

Ryan Lowe (Percona)
et al

With most modern web applications, there are requirements for both SQL access to complex data as well as simple Key-Value look-ups. This session will cover how to use the HandlerSocket Plug-In for MySQL to get exponentially faster look-ups for simple access patterns.

1:30pm-2:10pm (40m)
Data: Hadoop

Ephemeral Hadoop Clusters in the Cloud

Greg Fodor (Etsy)

The data & analytics teams at Etsy build up and tear down more than a thousand independent Hadoop clusters on EC2 each month. This talk discusses the benefits of this approach, where Elastic MapReduce serves as a "meta-cluster" in which on-demand Hadoop clusters can be created, used, and shut down quickly and easily.

2:20pm-3:00pm (40m)
Data: Relational

MVCC Unmasked

Bruce Momjian (EnterpriseDB)

Multiversion Concurrency Control (MVCC) allows Postgres to offer high concurrency even during significant database read/write activity. MVCC specifically offers behavior where "readers never block writers, and writers never block readers". This talk explains how MVCC is implemented in Postgres and highlights optimizations which minimize the downsides of MVCC. This talk is for advanced users.

3:30pm-4:10pm (40m)
Data: Relational

MySQL for the Large Scale Social Games

Yoshinori Matsunobu (DeNA)

We at DeNA (the largest social game provider in Japan) handle over 2 billion page views per day with MySQL. We use SSDs heavily and tune Linux. We run non-trivial solutions such as non-stop, automated MySQL master failover. We also use MySQL not only as a traditional RDBMS but also as an extremely high-performance NoSQL store. I'd like to introduce our MySQL solutions to make our social games scale better.

4:20pm-5:00pm (40m)
Data: Relational

InnoDB: Performance and Scalability Features

Inaam Rana (Oracle)
et al

There are many exciting InnoDB performance and scalability features in MySQL 5.5 and its upcoming release. But how can you best use them? What are the caveats? At this session, we will describe those performance and scalability features in depth. We will also present some benchmark results that explore the performance of those features.

10:40am-11:20am (40m)
Data: Hadoop

Introduction to Hadoop

Tom Hanlon (Cloudera)

Hadoop gives you the ability to process massive amounts of data at scale. This presentation will show you how Hadoop makes use of commodity hardware to allow you to build a system that scales, that deals gracefully with failure of individual nodes, and that gives you the power of Map/Reduce to process petabytes.

11:30am-12:10pm (40m)
Data: Roulette

Architectural Anti-patterns for Data Handling

Gleicon Moraes (7co.cc)

Ever had to dig into a system that misused the most basic features of an RDBMS? Better yet, after the whole NoSQL storm, have you wondered why it didn't show up earlier, when you had to twist your schema to fit into something it was not designed for? Check out this collection of anti-patterns, feel better that you are not alone, and see how you can benefit from it even without having big data around.

1:30pm-2:10pm (40m)
Data: Roulette

What Every Data Programmer Needs to Know About Disks

Ted Dziuba (eBay Local/Milo.com)

What happens when you write data to disk? We'll explore everything between your programming language and the spinning platters - both optimizations and dangerous pitfalls.

2:20pm-3:00pm (40m)
Data: Real-Time and Streaming

Esperwhispering: get your real-time data game on

Theo Schlossnagle (OmniTI/Circonus)

The art of dealing with real-time data is not new. In fact, much of the world's economy is propped up by making decisions on data in under a millisecond. The technology is there; we have the power. We'll take a whirlwind tour of the open-source Esper system and understand how to integrate it into your stack to enable rapid decision making on real-time data from anywhere in your architecture.

3:30pm-4:10pm (40m)
Data: Analytics and Visualization

Distributed Data Analysis with Hadoop and R

Jonathan Seidman (Orbitz Worldwide)
et al

An overview of the state of the art for bringing together the analytical power of the R language with the big data capabilities of Hadoop.

4:20pm-5:00pm (40m)
Data: Analytics and Visualization

QYZ: LaTeX, R and Redis for Beautiful Analytics

Noah Pepper (Lucky Sort)
et al

We produce gorgeous LaTeX reports while harnessing the power of R on the backend. The data is pulled from our PostgreSQL database, and the analysis and visualizations are fast and distributed thanks to Redis. We'll talk about weaving together open source tools to build powerful analytics reporting engines that rival the commercial alternatives.

10:40am-11:20am (40m)
Data: Roulette

Playful Explorations of Public and Personal Data

Andrew Turner (GeoIQ)

We're surrounded by data: open government data, streaming media, and data we're creating as we track our lives and connect with our communities. Learn how to use easy-to-use tools to combine these sources for personal and organizational decision making without requiring complex processes or training.

11:30am-12:10pm (40m)
Data: Hadoop

Developing and Deploying Hadoop Security

Owen O'Malley (Hortonworks)

Adding security to an existing product is never easy, but our team at Yahoo added strong authentication to Apache Hadoop by integrating it with Kerberos. This project was delivered on time and is currently deployed on all of Yahoo's 40,000 Hadoop computers. Come learn how we added security to Hadoop and why it matters.

1:30pm-2:10pm (40m)
Data: Real-Time and Streaming

OpenTSDB: A Scalable, Distributed Time Series Database

Benoit Sigoure (StumbleUpon, Inc.)

OpenTSDB is an open-source, distributed time series database designed to monitor large clusters of commodity machines at an unprecedented level of granularity. OpenTSDB enables operations teams to keep track, in real time, of all the metrics exposed by operating systems, applications, and network equipment, and makes the data easily accessible.

2:20pm-3:00pm (40m)
Data: Hadoop

YARN - Next Generation Hadoop Map-Reduce

Arun Murthy (Hortonworks Inc.)

YARN is the next generation of Hadoop Map-Reduce designed to scale out much further while allowing for running applications other than pure Map-Reduce in a highly fault-tolerant manner.

3:30pm-4:10pm (40m)
Data: Real-Time and Streaming

Real-time Streaming Analysis for Hadoop and Flume

Aaron Kimball (Zymergen, Inc.)

This talk introduces an open-source SQL-based system for continuous or ad-hoc analysis of streaming data built on top of Flume-based data collection for Hadoop.
Attendees will understand how to use a new tool to extend their Hadoop data collection pipeline with real-time streaming analytics.

4:20pm-5:00pm (40m)
Data: NoSQL Databases

Querying Riak Just Got Easier - Introducing Secondary Indices

Rusty Klophaus (Basho Technologies)

The Basho engineering team has been working to make Riak more queryable with the addition of built-in indexing plus a SQL-style query language. In this talk, Rusty describes the usage, benefits, limitations, and evolution of this functionality, called Secondary Indices. He also covers the challenges and pitfalls of adding indexing to a distributed datastore.

In this session, Dell will discuss the analysis of the data types suitable for transfer between Hadoop and EDW, the EDW/Hadoop data lifecycle, data governance between Hadoop and DBMS, and ETL performance tuning and best practices (e.g. Hadoop/DBMS connectors, node and network designs).

1:30pm-2:10pm (40m)
Data: Products and Services

DataStax’ Brisk – A More Powerful, Real-time, And Easier To Deploy Hadoop, Powered By Apache Cassandra

Jonathan Ellis (DataStax)

Brisk is an open-source Hadoop and Hive distro that utilizes Cassandra for its core services. Brisk provides integrated Hadoop MapReduce, Hive and job and task tracking, while providing an HDFS-compatible storage layer powered by Cassandra. By accelerating the time between data creation and analysis with DataStax’ Brisk, users experience greater reliability, simpler deployment and lower TCO.

9:00am-9:05am (5m)

Welcome

Sarah Novotny (NGINX)
et al

Opening remarks by the OSCON Data program chairs, Sarah Novotny and Bradford Stephens.

9:05am-9:20am (15m)
Keynote

Finding the Perfect Match

Tom Quisel (OkCupid)

Dive into the distributed system that powers OkCupid’s match searches. Learn how we use C++, event-based programming, and SSDs to solve problems that crop up when building a high performance, high availability distributed system.

9:20am-9:40am (20m)
Keynote

Benjamin Black

Benjamin Black (Boundary)

Keynote by Benjamin Black, Co-founder, fast_ip.

9:40am-10:00am (20m)
Keynote

What Would You Do With Your Own Google?

Steve Yegge (Google)

It's 2021. You have a petabyte drive on your keychain, your startup company leases bulk cloud storage by the exabyte, and you have a million cores for data crunching. You even can have your own copy of the entire world's public semantic data. What do you do with it? If you're not sure yet, I've got plenty of ideas for you.

10:00am-10:10am (10m)
Keynote

Q & A

An open microphone question and answer session with the morning's keynote speakers.

7:00pm-9:00pm (2h)
Event

Ignite OSCON

If you had five minutes on stage, what would you say? What if you only got 20 slides and they rotated automatically after 15 seconds? Would you pitch a project? Launch a web site? Teach a hack? We’re going to find out when we conduct our third Ignite event at OSCON.

10:10am-10:40am (30m)

Break: Morning Break

12:10pm-1:30pm (1h 20m)

Break: Lunch

3:00pm-3:30pm (30m)

Break: Afternoon Break

9:00pm-11:00pm (2h)
Event

Monday Birds of a Feather Sessions

Birds of a Feather (BoF) sessions provide face-to-face exposure to those interested in the same projects and concepts. BoFs can be organized for individual projects or broader topics (best practices, open data, standards). BoFs are entirely up to you. We post your topic online and onsite and provide the space and time. You provide the engaging topic.

5:00pm-7:00pm (2h)
Event

Android Happy Hour

Join other Android developers for happy hour at Gather in the Double Tree Hotel on Monday evening. Meet face-to-face and share experiences with other developers working on Android. The first 100 people there get a free drink ticket.