This workshop introduces attendees to R, an open-source statistical computing environment that some say is even more powerful and flexible than SAS and SPSS. The session also introduces predictive analytics theory and shows how R can apply it to real-world situations.

So, you've inherited a PostgreSQL server. Congratulations?
Thanks to Postgres' popularity as the database for new applications, thousands of developers, system administrators and DevOps engineers are finding themselves in charge of PostgreSQL servers with no idea what to do next. This tutorial will cover the essentials.

This tutorial will be a crash course in the basics of how to use MongoDB, as well as an introduction to some of MongoDB's core design principles. We'll start by going over the fundamentals of what MongoDB is, use that as context for starting a simple application, and finish off by showing how to set up MongoDB Replica Sets and Sharded Clusters.

This is a solid introduction to Apache Hadoop that explains what it is, why it's relevant and how it works. No previous experience is required, and participants will gain a clear understanding of how Apache Hadoop (and many complementary tools) can be used for scalable data processing as well as approaches for integrating it with existing systems.

High availability has become a mandatory feature for databases. MySQL replication is the most widely used replication solution on the Internet, but a whole family of alternatives exists in the MySQL ecosystem. This tutorial walks you through your options and teaches you how to weigh the pros and cons of each to pick the solution that best matches your use case.

Apache Solr is a blazing-fast, highly scalable, Lucene-based search engine used in thousands of applications and projects at organizations such as Zappos, Wells Fargo, Getty Images and many more.
This tutorial will provide you with the fundamentals, enabling you to be up and running with Solr in minutes.

We’ll start the session by giving users an overview of Apache Drill and its key extension APIs. Afterwards, we’ll describe an example use case where Apache Drill’s native capabilities are lacking. We’ll then work through design and development using Java and scripting to add extensions to the Apache Drill platform.

This tutorial covers the core functionality of the Neo4j graph database. With a mixture of theory and hands-on practice sessions, attendees will quickly learn how easy it is to develop a Neo4j-backed application.

ZooKeeper is the unsung hero. Although it is a critical component, ZooKeeper is often noticed only once it’s missing. In this presentation, we'll talk about how to efficiently resolve some of the common issues that can make ZooKeeper unavailable. An impenetrable ZooKeeper makes for a healthy cluster.

Hadoop 2.0 offers major HDFS improvements: new append-pipeline, federation, wire compatibility, NameNode HA, performance improvements, etc. In this session, we'll describe these features, their benefits and the development underway for the next HDFS release. This includes data management features, added support for storage devices and improvements to performance, diagnosability and manageability.

How can open source help people get something useful out of the sensor data they generate? Based on social science research, this session will give developers some simple tools to understand how non-geeks make sense of complex data, and offer some approaches to improving the user experience of both hardware and software based on that knowledge.

If you have ever wanted to dabble with Apache Hadoop, Hive, HBase or other projects in the Hadoop ecosystem but have been discouraged by the painful process of installation and configuration of these projects, this talk is for you. We will learn how to install Hadoop, Hive and HBase on a cluster by making use of various packages from Apache Bigtop.

MapReduce has become a household name in data processing these days, but it is typically used in a backend, batch-oriented manner across large data sets. In this talk we'll explore pipelining data sets far too large to fit in the browser through MapReduce implementations in CouchDB, in server-side JavaScript, and finally directly in the browser, allowing for large-scale yet interactive data analysis.
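
For readers new to the pattern, here is a minimal in-memory sketch of the map, shuffle and reduce steps. It is written in Python for illustration only and is not the CouchDB or browser implementation the talk covers; the function names (map_phase, shuffle, reduce_phase, word_mapper) and the sample lines are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply the mapper to every record; each call yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Fold each key's list of values into a single result."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    """Chain the three steps over an iterable of records."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


# Made-up sample input; a real job would stream records from a data store.
lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = map_reduce(lines, word_mapper, lambda a, b: a + b)
```

Because the mapper only ever sees one record at a time, the input can be streamed in chunks rather than loaded whole, which is the property that makes the pattern attractive for data sets larger than memory.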

Many performance evaluation studies make their comparisons in terms of peak throughput or the corresponding response time. In this brief presentation, we will look at why such metrics can be misleading and provide a framework and principles for performance evaluation that focus on delivering good service in real-world production environments.
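
One way a single summary number can mislead, sketched in Python under stated assumptions: the two latency samples below are made up, and comparing the mean against the 99th percentile is an illustrative choice, not necessarily the framework the presentation proposes.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, 100 requests per system.
steady = [20] * 100              # every request takes 20 ms
spiky = [10] * 95 + [210] * 5    # most requests are fast, 5% are very slow

mean_steady = statistics.mean(steady)
mean_spiky = statistics.mean(spiky)
p99_steady = percentile(steady, 99)
p99_spiky = percentile(spiky, 99)

print(f"steady: mean={mean_steady} ms, p99={p99_steady} ms")
print(f"spiky:  mean={mean_spiky} ms, p99={p99_spiky} ms")
```

Both systems report the same mean, yet one in twenty requests to the second system is more than ten times slower, which is the kind of difference users notice in production.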

MySQL 5.6 is simply a better MySQL with improvements that enhance every functional area of the database kernel. There are many new features in the InnoDB storage engine, including: better performance and scalability, online DDL, persistent statistics, NoSQL access, and many more.

Spire is one of the first open source distributed SQL databases. Architected from the ground up with no legacy code, it's meant to power large-scale applications with tens of thousands of reads and writes at petabyte scale.
This talk will cover parts of Spire such as its distributed computational fabric, distributed indexing, query planning, and more.

The biomedical research community is in the midst of a data revolution driven by the adoption of electronic health records and the arrival of next-generation genomic technologies. Researchers require tools that scale with this increase without added complexity. To address this need we have developed Harvest, an open source framework for rapid development of purpose-built data discovery web applications.

Successful database applications do not happen by accident. In this talk we will present a half-dozen design patterns for database management to help implement 24x7 applications that handle hundreds of terabytes spread over multiple continents on databases like MySQL. Start out using these patterns now and avoid a lot of pain later.

As new data sets become available through municipal Open Data initiatives, how can these be leveraged to reveal insights and build services for communities? This talk uses Cascalog and Open Data from the City of Palo Alto to build a sample app. Some programming background is helpful, but the emphasis is on process: how to approach large-scale Open Data to build data products for a community.

Many companies need their employees to do more than one job: programmer, DBA, sysadmin. The more skills you have, the more you can contribute to the overall success of the company and improve your own job marketability. Learn the basic MySQL server administration commands that every developer should know, what each does and how to use them.

Going from a transactional SQL/ACID-based system to a scalable NoSQL-based system can be both scary and somewhat mysterious. Many developers don't believe it can be done. It can, however. In this talk, we'll see how and to what degree.

With the addition of JSON functionality, PostgreSQL can hold its trunk high when compared to non-SQL databases. We'll explore the ways you can use the non-structured-data features of PostgreSQL, how they perform... and when you shouldn't use them.