Schedule views

Deep Data is a no-holds-barred program for data scientists. The advanced technical content will keep you up to speed with the latest techniques, and give you the opportunity to debate and network with the most skilled data scientists in our industry.
Read more.

This tutorial provides a solid foundation for those seeking to understand large scale data processing with MapReduce and Hadoop, plus its associated ecosystem. This session is intended for those who are new to Hadoop and are seeking to understand where Hadoop is appropriate and how it fits with existing systems. No programming experience is required.
Read more.
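The programming model this tutorial introduces can be sketched in a few lines. This toy word count (function names are illustrative, not from the session) simulates Hadoop's map, shuffle/sort, and reduce phases in plain Python:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle/sort, then reduce: group pairs by key and sum the counts.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["Big data", "big deal"])))
# counts == {"big": 2, "data": 1, "deal": 1}
```

In real Hadoop the mappers and reducers run on different machines and the framework performs the shuffle, but the data flow is the same.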

Want to extract and process Big Data from the web? This tutorial will show you how to use key open source technologies such as Hadoop, Cascading, Bixo, Tika, Mahout and Solr to create scalable, reliable web mining solutions.
Read more.

This hands-on tutorial teaches you how to set up and use Hive, a high-level data warehouse tool for Hadoop. Hive provides a SQL-like query language, HiveQL, that is easy to learn for people with prior SQL experience, making Hive attractive for data warehousing teams. Hive leverages the power of Hadoop for working with massive data sets without requiring expertise in MapReduce programming.
Read more.

The big data world is extremely chaotic, built on technology in its infancy. Learn how to tame this chaos, integrate it within your existing data environments (RDBMS, analytic databases, applications), manage the workflow, orchestrate jobs, improve productivity and make using big data technologies accessible to a much wider spectrum of developers, analysts and data scientists.
Read more.

This workshop is a jumpstart lesson on how to get from a blank page and a pile of data to a useful data visualization. We'll focus on the design process, not specific tools. Bring your sample data and paper or a laptop; leave with new visualization ideas.
Read more.

Contrary to popular belief, SQL and NoSQL are not at odds with each other, they are duals—in fact NoSQL should really be called coSQL. Recognizing this duality can change the way we think about which technology to use when, and what we need to invest in next.
Read more.

Author and digital marketing evangelist Avinash Kaushik shares his perspective, drawing from experience with some of the world's largest online marketers, and looks at how an analyst mentality is quickly permeating all aspects of business and marketing.
Read more.

With the collection of almost every piece of information about your customers comes the ability to start asking your data the right question: why do they do what they do? And even more: what would they do if I could interact with them? We show, for the case of online display advertising, how causal analysis gives interesting new answers about the right (and wrong) ways of spending your money.
Read more.

The effect of big data on all business models cannot be denied. This panel of SCM experts looks at how businesses are using, or should be using, big data to drive supply chain management, focusing on the broader manufacturing issues that must be addressed as well as practical tips that can be applied in dealing with supply chains that now span the globe.
Read more.

This presentation lays out some clear, concrete gating conditions for when it makes sense to pull the trigger on big data initiatives, and how they should be procured, depending on the use case, the data assets, and the resources available.
Read more.

Getting training data for a recommender system is easy: if users clicked it, it’s a positive; if they didn’t, it’s a negative.
… Or is it?
In this talk, we use examples from production recommender systems to bring training data to the forefront: from overcoming presentation bias to the art of crowdsourcing subjective judgments to creative data exhaust exploitation and feature creation.
Read more.
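The session's opening question can be made concrete with a small sketch. This hypothetical labeling function applies one common correction for presentation bias, the "skip-above" heuristic: only unclicked items ranked above a click are treated as confident negatives, since anything below the last click may simply never have been seen:

```python
def label_impressions(ranked_items, clicked):
    """Turn one ranked impression list plus its clicks into training labels.

    Clicked items become positives (1); unclicked items ranked above the
    lowest click become negatives (0); everything below the last click
    stays unlabeled.
    """
    click_positions = [i for i, item in enumerate(ranked_items) if item in clicked]
    if not click_positions:
        return {}
    last_click = max(click_positions)
    labels = {}
    for i, item in enumerate(ranked_items):
        if item in clicked:
            labels[item] = 1
        elif i < last_click:
            labels[item] = 0  # shown above a click, examined and skipped
    return labels

labels = label_impressions(["a", "b", "c", "d"], clicked={"c"})
# labels == {"a": 0, "b": 0, "c": 1}; "d" stays unlabeled
```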

What are the fundamental skills that a CEO needs to become “Data Driven”? In this session we will discuss the 3 essential skills that will enable CEOs to effectively lead their organizations into the Data Revolution. These organizations will harness the power of data to innovate, grow profits and beat the competition.
Read more.

Learn various ways to bootstrap a custom corpus for training highly accurate natural language processing models. Real world examples will be presented with Python code samples using NLTK. Each example will show you how, starting from scratch, you can rapidly produce a highly accurate custom corpus for training the kinds of natural language processing models you need.
Read more.
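As a rough illustration of the bootstrapping idea (a toy self-training loop, not the NLTK-based workflow the session presents), one might grow a labeled corpus from a few seed examples like this:

```python
from collections import Counter

def self_train(seed, unlabeled, rounds=3):
    """Bootstrap a labeled corpus from a handful of seed examples.

    Each round, score unlabeled documents by word overlap with each
    class's current vocabulary and absorb only the unambiguous ones.
    """
    corpus = {label: list(docs) for label, docs in seed.items()}
    pool = list(unlabeled)
    for _ in range(rounds):
        vocab = {label: Counter(w for d in docs for w in d.split())
                 for label, docs in corpus.items()}
        still_unlabeled = []
        for doc in pool:
            scores = {label: sum(c[w] for w in doc.split())
                      for label, c in vocab.items()}
            best = max(scores, key=scores.get)
            # Keep only confident, unambiguous assignments.
            if scores[best] > 0 and list(scores.values()).count(scores[best]) == 1:
                corpus[best].append(doc)
            else:
                still_unlabeled.append(doc)
        pool = still_unlabeled
    return corpus, pool

seed = {"sports": ["great game"], "tech": ["new laptop"]}
corpus, leftover = self_train(seed, ["game tonight", "laptop review", "hello"])
```

The resulting corpus can then be fed to a real classifier (in the talk's setting, an NLTK model) and the loop repeated with better scoring.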

There are many rapidly evolving technologies that provide objective metrics and analytics for most outward facing business interactions. The evolution of similar inward facing tools has not kept pace. In this presentation we discuss which sources of internal organizational data are frequently neglected, approaches for automating data collection, and what valuable insights can result from analysis.
Read more.

Twenty-first century big data is being used to train predictive models of emotional sentiment, customer churn, patient health, and other behavioral complexities. Variable importance and feature selection reduce the dimensionality of our models, so an infeasible, complex problem may become somewhat more predictable.
Read more.
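One simple, hypothetical illustration of variable importance as a filter step: rank each feature by its absolute Pearson correlation with the target, then keep only the top of the list (real pipelines use richer importance measures, but the dimensionality-reduction idea is the same):

```python
from math import sqrt

def rank_features(X, y):
    """Rank features by absolute Pearson correlation with the target."""
    n = len(y)
    ybar = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        xbar = sum(col) / n
        cov = sum((a - xbar) * (b - ybar) for a, b in zip(col, y))
        sx = sqrt(sum((a - xbar) ** 2 for a in col))
        sy = sqrt(sum((b - ybar) ** 2 for b in y))
        r = cov / (sx * sy) if sx and sy else 0.0
        scores.append((j, abs(r)))
    # Highest-scoring (most predictive-looking) features first.
    return sorted(scores, key=lambda s: -s[1])

X = [[1, 5], [2, 3], [3, 8], [4, 1]]
y = [1, 2, 3, 4]
ranking = rank_features(X, y)
# feature 0 tracks y exactly, so it ranks first
```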

Wouldn't it be great if there were just two algorithms that could handle most of your predictive modeling needs? It turns out that this is actually the case. Noted machine learning instructor Dr. Mike Bowles and champion data miner Jeremy Howard will teach you everything you need to know to apply them successfully.
Read more.

This presentation goes beyond the hype, buzzwords, and rehashed slides and actually presents the attendees with a hands-on, step-by-step tutorial on how to write a Java application on top of Apache Cassandra. It focuses on concepts such as idempotence, tunable consistency, and shared-nothing clusters to help attendees get started with Apache Cassandra quickly while avoiding common pitfalls.
Read more.

In this hands-on class, learn how to turn data into effective, interactive visualizations. You do not need a Tableau license to participate, but you must bring a Windows laptop or virtual machine.
Read more.

While extracting entities from massive amounts of text is a major problem, a proven solution exists. This tutorial will demonstrate a natural language parsing technology to extract entities from all kinds of text using massively parallel clusters.
Read more.

Learn how to use a Hadoop cluster for data analysis using Java MapReduce, Apache Hive and Apache Pig, and get an overview of using the HBase Hadoop database. Some programming experience is strongly recommended for this session.
Read more.

Mark Madsen talks about how regular businesses will eventually embrace a data-driven mindset, with some trademark 'Madsen' history background to put it in context. People throw around 'industrial revolution of data' and 'new oil' a lot without really thinking about what things like the scientific method, or steam power, or petrochemicals did as a result.
Read more.

The tools of social network analysis are based on mathematical network theory. There is very little in these techniques that actually requires that the data represents social activity. We'll show how these techniques can be applied to data from areas such as geo, linguistics and the Wikipedia link graph. We'll visualise and explore the data using Gephi, the "Photoshop for graphs".
Read more.

"Big data" provides the opportunity to combine new, rich data sources in novel ways to discover business insights. How do you use analytics to exploit this data so that it will yield real business value? Learn a proven technique that ensures you identify where and how big data analytics can be successfully deployed within your organization. Case study examples will demonstrate its use.
Read more.

Relational databases were based on Set theory — which insists that the order of items does not matter. For many (most?) data problems, however, order does matter. By using Array theory, a relational-like database gains a considerable advantage over set-theory based engines.
Read more.

In this session, business agility expert Michael Hugos will present examples from his work in applying immersive animation techniques and gaming dynamics, and discuss how they can address the challenges of consuming - and responding to - the data deluge, turning information overload into business advantage.
Read more.

We examine the effectiveness of a statistical technique known as survival analysis to optimize the cache time-to-live for hotel rates in a hotel rate cache. We describe how we collect and prepare nearly a billion records per day utilizing MongoDB and Hadoop. Finally, we show how this analysis is improving the operation of our hotel rate cache.
Read more.
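The core estimator behind the survival-analysis approach, Kaplan-Meier, is simple enough to sketch. Here the durations are hypothetical cache-entry lifetimes in minutes, with right-censoring for rates that never changed while cached; the resulting curve can guide the choice of time-to-live:

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve.

    durations[i] is how long entry i lived (e.g. minutes until the
    cached hotel rate changed); observed[i] is False when the rate was
    still valid at last check (right-censored).
    """
    event_times = sorted(set(t for t, obs in zip(durations, observed) if obs))
    survival, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, obs in zip(durations, observed) if obs and d == t)
        s *= 1 - deaths / at_risk  # conditional probability of surviving t
        survival.append((t, s))
    return survival

# Three rates changed (at 5, 10, 10 minutes); one was still fresh at 20.
curve = kaplan_meier([5, 10, 10, 20], [True, True, True, False])
```

A TTL could then be set at the largest time whose survival probability stays above an acceptable staleness risk.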

Search user interfaces are slow to change; ideas for new search interfaces rarely take hold. This talk will forecast how search is likely to change and what will stay the same in the coming years.
Read more.

Two events happening at the same time and place: *Mini Maker Faire*, a showcase of innovative data-related hardware, apps, and projects, and *Data Crush*, an experiment combining wine-tasting with the gathering, analysis, and application of data to track behavioral trends and influencing factors.
Read more.

Apache Hadoop forms the kernel of an operating system for Big Data.
This ecosystem of interdependent projects enables institutions to
affordably explore ever vaster quantities of data. The platform is
young, but it is strong and vibrant, built to evolve.
Read more.

The explosion of data is both a challenge and opportunity for businesses. In order to thrive in this new world, organizations will need a technical strategy for sifting through all of this data and driving insights.
Read more.

How big data tools and technologies give us back our individual identity ... because if you didn't know you were unique and special, well, you are. Big data can be applied to solving socio-economic problems that rival the scale and importance of building ad optimization models.
Read more.

Tools for attacking big data problems originated at consumer internet companies, but the number and variety of big data problems have spread across industries and around the world. I'll present a brief summary of some of the critical social and business problems that we're attacking with the open source Apache Hadoop platform.
Read more.

Back in the late 80s artificial intelligence was set to take over the world; it didn’t happen. In 2012, AI has been stripped down, dressed up and reborn as machine learning. Will it take over the world this time? What makes a Big Data - Machine Learning solution ‘better’?
Read more.

The increasing use of online software and digital devices in the classroom provides a source of high-frequency data streams that can be analyzed to better understand student progress, identify individual needs, and develop personal recommendations.
Read more.

So you've hoarded the world's data within your enterprise. Now what? Author and digital marketing evangelist Avinash Kaushik shares lessons from the nascent world of Web Analytics on how multiplicity, scale and outsourcing powers a data democracy, and how that in turn drives business action.
Read more.

Negative results from clinical trials go missing far too often, leading us to overestimate the benefits of treatments. Attempts to remedy this problem haven't worked well. Ben Goldacre, both a doctor and data geek, will talk about how to fix this, and other, problems in medicine.
Read more.

Despite the hype, Big Data has yet to live up to its potential. Why? Because we’ve spent too much time thinking about the data itself and not enough time considering which business decisions can be improved through the intelligent application of data. Panjiva CEO Josh Green will discuss an alternative approach: starting with a challenging business problem and then tracking down relevant data.
Read more.

Visual analysis is an iterative process for working with data that exploits the power of the human visual system. The formal core of visual analysis is the mapping of data to appropriate visual representations. Learn what years of research have taught us about designing visualizations people can learn from and understand.
Read more.

In this session, Hortonworks CEO Eric Baldeschwieler will look at the current state of Apache Hadoop, how the ecosystem is evolving by working together to close the existing technological and knowledge gaps, and present a roadmap for the future of the project.
Read more.

With billions of social activities passing through the ever-growing realtime social web each day, companies are beginning to harness the power of social data. In this session, participants will learn from real-world case studies in Financial Services, Emergency Response, Brand Analytics and other industries about how businesses are applying social data to their operations to drive value.
Read more.

In this panel discussion, DataStax CEO Billy Bosworth will moderate a discussion that will spotlight real mission critical Big Data use cases from "hands-on" practitioners. With companies like Walmart, Netflix, & Apigee among many others adopting Apache Cassandra and other new database technologies, there's never been a more exciting time to be building data intensive applications.
Read more.

In this session, Expedia, one of the world’s leading online
travel companies, describes how they tapped into their massive machine data to
deliver unprecedented insights across key IT and business areas – from ad metrics and
risk analysis, to capacity planning, security, and availability analysis.
Read more.

R and Hadoop, the two hottest stars on the Analytics stage, were meant to be together. The open source RHadoop project was established to make it happen. We'll go over what RHadoop does for you, how to use it, and why you should add it to your toolset.
Read more.

New analysts or engineers are often lost when textbook approaches fail on real world data. Drawing inspiration from problem solving techniques in mathematics and physics, we will walk through examples that illustrate how to come up with creative solutions and solve real world problems with data.
Read more.

There is a revolution at hand centering on this groundswell of data, and it will change how we execute our businesses through greater efficiencies and new revenue discovery, and even enable innovation. It is the revolution of Big Data. Management Strategies for Big Data will explain this new wave of technology and provide a roadmap for businesses to take advantage of this growing trend.
Read more.

With the rise of big data, more and more people need effective visualizations. Needs may range from simple charts to massive interactive network graphs. A range of tools exists, but many still find none that meets all their requirements: demand cross-browser usage, server-side rendering, iOS support, and full control of look and feel, and your options are suddenly very slim. We share our lessons and approach.
Read more.

At Cloudera, we've found that monitoring Apache Hadoop is itself a big data problem. Here I'll present work we've been doing on turning the vast amounts of monitoring data a Hadoop cluster generates into meaningful signals to help us wrestle with the biggest challenges of maintaining large distributed systems: failure of machines, processes and people, and root-cause analysis after the fact.
Read more.

Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wed 2/29 and Thu 3/1. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area.
Read more.

Learn how to simplify the data integration process and save a significant amount of development time by automatically generating code for processes (data profiling, data cleansing, and record linkage). A case study will show a complex, Big Data linking application, where insurance data was converted to HPCC using the SALT tool and 20,000+ lines of source code were reduced to a 48-line SALT specification.
Read more.

While enterprises see an opportunity to increase revenues and decrease costs by becoming a data-driven organization, it is not easy to decide where and how to begin. This session highlights some principles for success through examining two real-world big data case studies.
Read more.

See how applying traditional data analysis tools, as well as more esoteric ones like computer vision, to multiple disparate data sets and data types can create a more complete and nuanced narrative of one of San Francisco’s most vibrant streets.
Read more.

In this talk, we'll build a complete, scalable collaborative filtering ("people who X also Y") system that is almost identical to what prominent Internet properties use today. We'll talk about model improvements, performance enhancements, and practical considerations. This is a practical talk accessible to all.
Read more.
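The "people who X also Y" pattern this talk builds on can be sketched as a simple item-item co-occurrence counter. This is a toy version under obvious simplifying assumptions; the production systems described use far more machinery (normalization, model improvements, distributed computation):

```python
from collections import defaultdict
from itertools import combinations

def also_bought(baskets):
    """Count, for every pair of items, how many baskets contain both."""
    co = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, item, k=3):
    # Rank neighbours of `item` by how often they co-occurred with it.
    return [other for other, _ in
            sorted(co[item].items(), key=lambda kv: -kv[1])][:k]

co = also_bought([["milk", "bread"],
                  ["milk", "bread", "eggs"],
                  ["milk", "eggs"]])
recs = recommend(co, "milk")
# bread and eggs each co-occur with milk twice
```

Raw counts favour popular items; production systems typically normalize them (e.g. with a log-likelihood or cosine score) before ranking.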

Advances in columnar databases are creating bio-science opportunities that were previously not possible. Fernanda Foertter and the team at Genus discovered an innovative way to store and access the huge volumes of data being generated modeling genotypes. She and Jim Tommaney discuss the benefits of column storage and how InfiniDB’s Map Reduce empowers high performance Big Data analytics.
Read more.

Today's users won't tolerate slow applications. More often than not, the database is the bottleneck in the application. Learn how VMware vFabric SQLFire can give you the speed and scale you need in a substantially simpler way. SQLFire is a memory-optimized and horizontally-scalable distributed SQL database. Attend this session to learn how SQLFire gives high performance without the complexity.
Read more.

Hadoop is not an island. To deliver a complete Big Data solution, a data pipeline needs to be developed that incorporates and orchestrates many diverse technologies.
Using an example of real-time weblog processing, in this session we will demonstrate how the open source Spring Batch and Spring Integration projects can be used to build manageable and robust pipeline solutions around Hadoop.
Read more.

Instead of working too hard to define the parameters in an attempt to completely remove the ambiguity, look at what people do, interact with and talk about. We can watch what people do and decide from there what a coffee shop is and where the boundaries of your neighborhood are. It might not be the “truth”, but it can be darn close.
Read more.

LeapFrog enabled their learning toys and set up a system to have millions of toy owners upload their play logs. This talk covers the business strategy and the technical implementation hurdles from the perspective of the former Director of Data Services who implemented it.
Read more.

Data visualization is just one tool that designers use to communicate data-driven recommendations. In this session I present a case study on the use of user-centered design practices to craft meaningful and actionable data presentations for business users. Data visualization and UX work best when they work together.
Read more.

As more companies adopt Hadoop to perform data intensive tasks for large data sets, there is a burning need to make Hadoop available to a broader set of developers. This talk covers two approaches Microsoft is exploring for this purpose: 1. JavaScript interfaces to run Hadoop jobs and 2. web interfaces for Hadoop that let developers write and run MapReduce jobs from any platform.
Read more.

Topics will span the data flow lifecycle from data collection, curation and quality, to aggregation and standardization of a multitude of complex data sources, to the creation of valuable analytics, including recommendations that connect users to the data.
Read more.

Running large scale datastores requires us to handle various challenges such as scalability, reliability, performance, and reduced operational overhead. In this talk, we will discuss how Amazon DynamoDB was designed to address these problems.
Read more.

This session will explore a new class of analytic platforms and technologies such as SQL-MapReduce® which bring the science of data to the art of business. By fusing standard business intelligence and analytics with next-generation data processing techniques such as MapReduce, big data analysis is no longer just in the hands of the few data science or MapReduce specialists in an organization!
Read more.

Cloudera Data Scientist Josh Wills will share insights and “how to” tricks about Crunch, a Java library that aims to make writing, testing and running MapReduce pipelines that run over any type of data easy, efficient and even fun.
Read more.

Netflix is known for pushing the envelope of recommendation technologies. The Netflix Prize put a spotlight on recommender system research and a focus on predicting ratings. But, predicting a rating is only part of the recommendation problem. In this talk I will describe how other sources of implicit and contextualized information can be used to create a personalized experience.
Read more.

Learn about how data is used for a fashion retailer that is on a rapid growth path. At ModCloth we don't believe in dictating fashion trends to our customer—we are inverting the pyramid and democratizing fashion. Buying patterns and user interactions are leveraged to help us understand how we can meet our customers' desires.
Read more.

Custom data exploration tools can provide efficient and exciting interfaces for audiences not well served by out-of-the-box business intelligence solutions. Frameworks not only beautify data but also surface novel observations from the set. In this session, we survey the creative coding frameworks that lend themselves to visualization and offer some insight into their strengths and weaknesses.
Read more.

How do you architect big data systems that leverage virtualization and platform as a service? We will walk through a layered approach to building a unified analytics platform using virtualization, provisioning tools and platform as a service.
Read more.

Federal transparency initiatives have spawned millions of rows of data, state and local programs engage developers and wonks with APIs, contests and data galore. Private industry offers attribute-laden device exhaust, forming a geo-footprint of who is going where, when, how and (maybe) for what. Who decides data provenance? Does curated data get treated the same as heterogeneous data?
Read more.

This session looks at the requirements for a multi-tenant big data cluster:
one where different lines of businesses, different projects, and
multiple applications can be run with assured SLAs, resulting in
higher utilization and ROI for these clusters.
Read more.

Data Scientists must deal with many Big Data challenges including volume, velocity and variety of data. These challenges require a new solution - Automated Understanding - a new evolution in software. In this session Tim Estes will show the power of this new capability on a large and valuable dataset that has never been deeply understood by software before.
Read more.

Data science applied in engineering driven industries is revolutionizing how highly complex products are developed. Unprecedented access to computing power combined with advanced data science tools provide the opportunity to not only increase the speed of development but also improve the final design. Using a practical aerospace example, Joris will illustrate the tools and techniques described.
Read more.

Moneyball is to marketing science as CSI is to forensic science. The expectations are high and marketers are shouting "where's the insight?" and "ENHANCE!". Data is long and marketing scientists are short. We can only scale through technology. This is the story of how a developer and two marketing scientists became data scientists in crossing that gap.
Read more.

In this talk we report on the value of tools that support a human-driven approach to revealing innovation opportunities hidden within big datasets. Based on our experience in data science projects involving multiple stakeholders, we found that sketching with data and rapidly sharing interactive information visualizations is a key practice to transform information into useful services and products.
Read more.

Facebook's Open Graph, Schema.org, and a recent scramble towards a "Rosetta Stone" for geodata, are all examples of a trend towards linking data across the web. Weaving data into the web simplifies integration. Big Data offers ways to mine huge datasets for insight. Linked Data turns the web into a dataset.
Read more.

Using Hadoop based business intelligence analytics, we analyzed Hadoop source code over time. This talk illustrates text and related analytics with Hadoop on Hadoop to reveal the true hidden secrets of the elephant.
This entertaining session highlights the value of data correlation across multiple datasets and the visualization of those correlations to reveal hidden data relationships.
Read more.

Big data isn't just an abstract problem for corporations, financial firms, and tech companies. To your mother, a 'big data' problem might simply be too much email, or a lost file on her computer. We need to democratize access to the tools used for understanding information by taking the hard work out of drawing insight from excessive quantities of information.
Read more.

How are businesses using big data to connect with their customers, deliver new products or services faster and create a competitive advantage? Learn about the changing nature of customer intimacy and how the technologies and techniques around big data analysis provide business advantage in today's social, mobile environment – and why it is imperative to adopt a big data analytics strategy.
Read more.

The expected massive growth of connected device, appliance and sensor markets in the coming years - often called 'The Internet of Things' - will need a richer concept of 'open data' than is currently common.
Read more.

Big Data is about extracting value from fast, huge, varied, complex data sets. But simply crunching data is only the first step. As adoption of MapReduce and data analytic technologies increases, forward thinking companies are starting to build applications on their core data assets.
Read more.

Dr. Richard Merkin, President and CEO of Heritage Provider Network, which was recently named one of Fast Company’s 10 most innovative healthcare companies for 2012, will announce the winner of the second progress prize in the $3 million Heritage Health Prize competition.
Read more.

Google Insights for Search provides an index of search activity for millions of queries. These queries can sometimes help understand consumer behavior. Hal describes some of the issues that arise in trying to use this data for short-term economic forecasts and provides examples.
Read more.

In today's environments, we're often forced to collect data before we know if it will be useful. This tendency leads to seas of data, flowing in real-time with very little structure or understanding of what the data means. Given that, how can you tell when data "is normal"? Let's find out.
Read more.
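One simple, assumed baseline for "is this normal?" is a rolling z-score: flag any point that sits far outside the mean and standard deviation of a sliding window of recent values (real monitoring systems layer seasonality and trend handling on top of this idea):

```python
from collections import deque
from math import sqrt

def zscore_alerts(stream, window=20, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations
    from the mean of the preceding `window` values."""
    recent = deque(maxlen=window)
    alerts = []
    for i, x in enumerate(stream):
        if len(recent) == recent.maxlen:
            mean = sum(recent) / len(recent)
            var = sum((v - mean) ** 2 for v in recent) / len(recent)
            std = sqrt(var)
            if std > 0 and abs(x - mean) / std > threshold:
                alerts.append(i)
        recent.append(x)  # the anomalous point still enters the window
    return alerts

alerts = zscore_alerts([10, 11, 9, 10, 10, 11, 9, 10, 50], window=8)
# the spike at index 8 is flagged
```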

A high-level overview of Microsoft IT's BI strategy and its various applications, focusing on Self Service BI, Scorecards and Dashboards, Data Visualizations, and Leadership Decision making through robust BI tools.
Read more.

The use of video to communicate data is on the rise, but what is the most effective way to do this? Highlighting our current work with the BBC in this field we will look at best practice from storytelling principles to choosing the right visual treatment.
Read more.

This talk will go into the details, architecture and challenges of building a recommendation system on a massive social graph. The talk will describe how we applied learning on large datasets using Apache Hadoop and how we scaled to millions of reads and writes. We will also showcase Eventbrite's data platform architecture.
Read more.

So much of the privacy discussion is about data access, fear of future dystopia, and the complexities of law. There is a vacuum around how societal norms should be mapped to rapidly growing capabilities of big data, leaving data professionals in a "don’t ask don't tell" privacy conundrum. This conversation will discuss specific use-cases and frameworks to guide data pros.
Read more.

This presentation will cover the next generation of Apache Hadoop, known as hadoop-0.23. Learn how MapReduce has been re-architected by the community to improve reliability, availability and scalability as well as adding support for alternate programming paradigms. Also learn about HDFS Federation, which allows for significant scalability improvements, as well as other important advancements.
Read more.

Monitoring thousands of servers generates a lot of data. Many organizations trying to harness enormous amounts of data struggle with the same types of challenges as the Rackspace cloud monitoring team. Find out how Rackspace uses NoSQL technology, distributed concepts, and open source software in novel ways to produce a multi-region cloud monitoring API.
Read more.

NoSQL, Big Data, massive scale, real-time, in the cloud, do I need it, do I want it, how the heck can I even know if it’s right for me?
Choosing any database solution is a critical and tricky decision. Navigating the murky waters of NoSQL can be even tougher.
Read more.

In "The Evolution of Data Products", O'Reilly Media's Mike Loukides notes: "the question of how we take the next step — where data recedes into the background — is surprisingly tough." Jeremy Howard will show why this is tough, and what to do about it. He will show how predictive modelling, simulation, and optimization can be combined to deliver results instead of just delivering data.
Read more.

By charging interchange fees for retailers and account fees for customers, banks have taken a ‘combative’ approach to revenue generation. However, technologies are emerging that enable financial institutions to leverage big data drawn from the transaction data stream to provide new, pro-consumer revenue streams.
Read more.

Long a staple of broadcast sports, augmented reality (AR) effects (like the virtual "1st and 10" line) are increasingly being driven by digital records of sports events (DREs), collected and distributed live, such as NASCAR's race car tracking system and MLB's PitchFX. The next generation of DRE-derived data will expand the use of AR to more effectively show key "invisible" elements of the game.
Read more.

This session discusses financial services use cases and challenges in using Hadoop analytics including long-term storage and analytics of transactions, identifying cross and up sell opportunities by analyzing web log files and customer profiles, value-at-risk analytics, and understanding the SLA issues and identifying problems in a thousands-of-nodes, big-services oriented architecture.
Read more.

Making sense of the privacy issues around personal data is way too complicated. Pretty Simple Data Privacy builds on the idea that users need three options - Yes, No, Maybe - to control privacy settings on their personal data. We'll explore existing projects and codebases that implement legal and technical tools for all three of the settings.
Read more.

In this session, attendees will learn about a new method for solving big data analytics via HPCC Systems, an open-source, enterprise-proven platform for Big Data. A case study will be given using patent data to demonstrate how big data can be processed, linked, analyzed, searched and delivered to answer various queries.
Read more.

Nick Halstead, CTO of DataSift, will talk about Hadoop, HBase and dealing with storing and processing a billion tweets every 3 days. You will get insights into the architecture, pitfalls and real-world lessons on using Big Data technologies.
Read more.

Storm is an open-source realtime computation system relied upon by Twitter for much of its analytics. Storm does for realtime computation what Hadoop did for batch computation. It has a huge range of applications and combines ease of use with a robust foundation.
Read more.

Birds of a Feather (BoF) sessions are informal roundtable discussions happening during lunch on Wed 2/29 and Thu 3/1. You can join any BoF table or start your own with a topic of your choice. The BoF sign-up board will be near the Registration area.
Read more.

In this session we discuss approaches to mining unstructured data that gradually find their way into the real world. Text mining and analytics algorithms strive to identify documents’ categories, main topics, mentioned names and other entities; they summarize and detect sentiment. We describe case studies that take advantage of such algorithms in the legal, forensics and healthcare sectors.
Read more.

What does it really take to build a data product? Recall and relevancy are only parts of the challenge. In fact, an entire new approach is required to build consistently great data products.
Read more.

With the explosion of mobile devices, there is a plethora of geo-tagged data available for mining and visualization. To make compelling visualizations, it is often necessary to build tools that allow users to easily explore, mine, map, and market this data. This talk will focus on how to use several open-source frameworks to build such visualizations.
Read more.

Organizations often accumulate hundreds of hours of video recordings culled from multiple cameras. Most of these recordings hold little value, as the scene does not change for extended periods of time. For organizations that must keep the originals intact, analyzing these recordings can be very difficult. Using Map/Reduce, we can harness parallel processing to identify and tag useful periods of time for faster analysis.
Read more.

This talk uses the OODA Loop concept (Observe, Orient, Decide, Act) as a framework to categorize Big Data use cases and data-driven services and the front-ends to those services. Rather than starting with the underlying technology or the data sources, the OODA loop starts with WHY the user needs information. It answers the question of when a black box beats an analytic tool, and vice versa.
Read more.

Machine learning (ML) holds the key to the most advanced uses of big data. But is ML really possible on big data with state-of-the-art methods, or just simple ones? Can ML really be done in real time today? Is MapReduce the right answer? The cloud? I will review the current state of ML technology both at the research level and the industry-readiness level, and current best solution options.
Read more.

Flurry provides an analytics and advertising platform for smartphone applications. Every month we track over 20 billion sessions across over 330 million devices. This talk will go over the Hadoop and HBase architecture we run and the challenges we face managing a massively growing data set.
Read more.

Big data isn't just about multi-terabyte data sets hidden inside eventually-consistent distributed databases in the cloud. It's also about the hidden data you carry with you all the time: the data on your cell phone and other mobile devices. This talk will discuss that data, along with the possibilities for making use of it.
Read more.

One of the most significant challenges faced by individuals and organizations is how to discover and collaborate with data within and across their organizations: data that often stays trapped in application and organizational silos.
Read more.

Beautiful, useful and scalable techniques for analysing and displaying spatial information are key to unlocking important trends in geospatial and geotemporal data. Recent developments in HTML 5 enable rendering of complex visualisations within the browser, facilitating fast, dynamic user interfaces built around web maps. This session will examine emerging technologies that will shape the geoweb.
Read more.

Map/Reduce has created tremendous interest in parallel programming and big data analytics, but it isn't always the right tool for the job. Many new projects have emerged in this space over the last year including two cluster schedulers (YARN and Mesos) and numerous parallel computing environments. We'll provide an introduction to these new technologies, including some you might not have heard of.
Read more.

In a research environment, under the current mode of operation, most data and figures collected or generated during your work are lost: intentionally tossed aside, classified as “junk”, or at worst trapped in silos or locked behind embargo periods. In the digital age, this does not need to be the case, and it's imperative we change that reality.
Read more.

The session will cover the costs involved in Big Data projects, including both the apparent and the hidden aspects of these costs. It will also discuss how to build a Big Data solution with a lower cost per TB of data managed and analyzed.
Read more.

Mobile devices offer boundless opportunities for collection and presentation of temporally- and spatially-relevant data. But there are obstacles: intermittent connectivity as well as processing, storage and other constraints. Featuring real-world apps, this session covers device data collection; device-device and device-cloud data synchronization; and data aggregation and analysis in the cloud.
Read more.

In this talk, we will analyze various dimensions of microwork that characterize applications, tasks, and crowds. Drawing on our experience at companies that have pioneered the use of microwork (Samasource) and data science (LinkedIn), we will offer practical advice to help you design crowdsourcing workflows to meet your data product needs.
Read more.

Due to recent advancements in Big Data, cloud computing, and network maturity, it's now possible to work with extremely large weather-related data sets. The Climate Corporation CTO Siraj Khaliq discusses how to apply big data principles to the real-world challenge of protecting people and businesses from the financial impact of weather.
Read more.

The ultimate utility of Big Data is transforming it into Big Insights. Charts, graphs, and tables of aggregated data are useful but still require interpretation by the end user. With advances in linguistic algorithms and data processing it is now possible to derive meaningful insights from data and present them in digestible narrative content.
Read more.

NetApp collects 250 TB per year of unstructured data from devices that phone home. They need to be able to do ad hoc analysis and build predictive models for device support and cross-sales. We discuss our experiences building a Big Data system with NetApp using Hadoop and HBase to improve customer service, drive sales and develop better products.
Read more.

The “common good” challenge for Big Data is to deliver actionable information that can be used by nonprofits and civic orgs. But that challenge isn’t new. Existing data intermediaries for NGOs have a rich history of working in common-good territory. Let’s discuss. What is this history? What can we take away from it to inform new, perhaps disruptive, approaches to meet this challenge?
Read more.

Data storage needs are increasing at an exponential rate, yet incumbent storage systems are proprietary, expensive to buy and expensive to maintain. With the advent of the cloud, everyone expects auto scaling. Ceph is a massively scalable storage system that aims to fill this void in distributed storage.
Read more.

Mendeley is a New York and London-based startup that has crowdsourced the world's largest database of academic literature. Over 1M researchers strong, Mendeley is taking academia to the cloud.
Read more.

Measuring productivity remains a notoriously difficult problem. We will show how real-time collaboration data are being leveraged to measure, model and forecast organizational productivity and performance in the innovation teams at Boeing and in 3 Formula One teams. On the back of these forecasts, we will show how investment yields were improved by 15% and productivity raised by nearly 20%.
Read more.

A story or report on a subject by its very nature summarizes the underlying data. But readers may have questions specific to a time, date or place. Visualizing the data and providing effective, targeted ways to drill deeper is key to giving the reader more than just the story.
Read more.

The science and commercial worlds share requirements for a high performance informatics platform to support collection, curation, collaboration, exploration, and analysis of massive datasets. SciDB is an open source analytical database that provides seamlessly integrated, massively scalable analytics. We present performance and scalability results for non-embarrassingly-parallel operations.
Read more.

Metastasis is the lethal form of cancer. It arises when cancer cells travel through the patient's blood and colonize other organs. Finding and characterizing these cells enables the prediction and monitoring of response to cancer treatments.
Read more.

Maps of the complex connections that form when people link, like, reply, rate, review, favorite, friend, follow, edit, and mention one another can reveal important trends. With free and open tools, it is possible to create network maps that identify key people and sub-groups in any social media population with just a few clicks. Can you make a pie chart? You can now make a network chart.
Read more.

Sponsors

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the conference, contact Susan Stewart at sstewart@oreilly.com.