Tune into the new world of big data with this incisive video compilation of O’Reilly's inaugural Strata Conference. Held on February 1-3 in the heart of California’s Silicon Valley, Strata included 78 inspiring sessions with the world’s leading data analysts, scientists, and practitioners. Hilary Mason (Bit.ly), Ben Gimpert (Altos Research), Chris Wensel (Concurrent, Inc.), Patrick Chanezon (Google), Zane Adam (Microsoft), Mike Olson (Cloudera) and other luminaries lead the Executive Summit, Data Bootcamp, and practitioner sessions.

Cloud

If you're involved in planning IT infrastructure as a network or system architect, system administrator, or developer, this book will help you adapt your skills to work with these highly scalable, highly redundant infrastructure services. Cloud Application Architectures will help you determine whether and how to put your applications into these virtualized services, with critical guidance on issues of cost, availability, performance, scaling, privacy, and security.

You may regard cloud computing as an ideal way for your company to control IT costs, but do you know how private and secure this service really is? Not many people do. With Cloud Security and Privacy, you'll learn what's at stake when you trust your data to the cloud, and what you can do to keep your virtual infrastructure and web applications secure. This book offers you sound advice from three well-known authorities in the tech security world.

Learn the nuts and bolts of cloud computing with Windows Azure, Microsoft's new Internet services platform. Written by a key member of the product development team, Programming Windows Azure shows you how to build, deploy, host, and manage applications using Windows Azure's programming model and essential storage services.

Databases

Access 2007: The Missing Manual was written from the ground up for this redesigned application. You will learn how to design complete databases, maintain them, search for valuable nuggets of information, and build attractive forms for quick-and-easy data entry. You'll even delve into the black art of Access programming (including macros and Visual Basic), and pick up valuable tricks and techniques to automate common tasks -- even if you've never touched a line of code before. You will also learn all about the new prebuilt databases you can customize to fit your needs, and how the new complex data feature will simplify your life. With plenty of downloadable examples, this objective and witty book will turn an Access neophyte into a true master.

Three of CouchDB's creators show you how to use this document-oriented database as a standalone application framework or with high-volume, distributed applications. With its simple model for storing, processing, and accessing data, CouchDB is ideal for web applications that handle huge amounts of loosely structured data. You'll learn how to work with CouchDB through its RESTful web interface, and become familiar with key features such as simple document CRUD (create, read, update, delete), advanced MapReduce, deployment tuning, and more.

Discover how MongoDB can help you manage a huMONGOus amount of data collected through your web application. This book covers the basic principles and advanced uses of this document-oriented database, and demonstrates why MongoDB is scalable, high-performance, and reliable. This authoritative introduction, written by two software engineers from the company that develops this open-source database, offers guidance for programmers and advanced configuration for system administrators.

The growing popularity of Apache Cassandra rests on this database’s ability to handle very large data sets that include hundreds of terabytes. This hands-on guide provides the details and practical examples you need to understand Cassandra’s non-relational database design and how to take advantage of it in a production environment. The author pays special attention to data modeling, and demonstrates Cassandra’s many advantages, including its high availability, eventual consistency model, and ability to scale easily.

Performance and Scalability

Service-oriented architecture (SOA) is finally becoming a concrete discipline rather than a hopeful collection of cloud charts. This book demonstrates how SOA can simplify the creation of large-scale applications, whether your project involves a large set of Web Services-based components, or is a means to connect legacy applications to more modern business processes. SOA in Practice explains how -- and whether -- SOA fits your needs.

The Art of Application Performance Testing provides a step-by-step approach to testing mission-critical applications for scalability and performance before they're deployed -- a critical topic to which other books devote, at most, one chapter. With it, you'll learn the complete life cycle of the testing process, along with best practices to help you plan, gain approval for, coordinate, and conduct performance tests on your applications.

With this book, you can build exciting, scalable web applications quickly and confidently, using Google App Engine -- even if you have little or no experience in programming or web development. Using Google App Engine provides an overview of the tools necessary to use Google App Engine, including Python, HTML, Cascading Style Sheets (CSS), and HTTP. You'll also learn what's required to deploy your applications to Google servers.

Google App Engine does more than provide access to a large system of servers. It also offers you a simple model for building applications that scale automatically to accommodate millions of users. With this book, you'll get expert practical guidance that will help you make the best use of this powerful platform. Google engineer Dan Sanderson shows you how to design your applications for scalability, including ways to perform common development tasks using App Engine's APIs and scalable services.

The Web is increasingly happening in realtime. With sites such as Facebook and FriendFeed leading the way, users are coming to expect that all websites should serve content to them as it occurs. With this book, you'll learn how to add several realtime features to your site -- everything from chat and messaging services to streaming content feeds -- without making significant changes to your existing infrastructure.

Apache Hadoop is ideal for organizations with a growing need to store and process massive application datasets. With Hadoop: The Definitive Guide, programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. The book includes case studies that illustrate how Hadoop is used to solve specific problems.

Data Management

This convenient guide is for anyone who wants to take his or her SQL skills to the next level. Packed with over 200 recipes, the SQL Cookbook helps you conquer common data query and manipulation problems, including those related to window functions, data warehousing, and string manipulation. Features an easy-to-grasp problem-solution-discussion format.

Maybe you've written some simple SQL queries to interact with databases. But now you want more: you want to really dig into those databases and work with your data. Head First SQL will show you the fundamentals of SQL and how to really take advantage of it.

What can you do when database performance doesn't meet expectations? This book offers methods for refactoring (or changing) SQL code to improve performance without altering a database application's purpose -- and helps you do it on a shoestring budget. This isn't a rehash of theory, but a tested set of options for making code modifications to dramatically improve the way your applications function.

The essential reference to the SQL language used in today's most popular database products, this new edition of SQL in a Nutshell clearly documents every SQL command according to the latest ANSI standard. It also details how these commands are implemented in the Microsoft SQL Server 2008 and Oracle 11g commercial database packages, and the MySQL 5.1 and PostgreSQL 8.3 open source database products.

Understanding SQL's underlying theory is the best way to guarantee that your SQL code is correct and your database schema is robust and maintainable. In SQL and Relational Theory, author C.J. Date demonstrates how you can apply relational theory directly to your use of SQL, with numerous examples and clear explanations of the reasoning behind them. Anyone with a modest to advanced background in SQL will benefit from the many insights in this book.

Updated for the latest database management systems, this introductory guide will get you up and running with SQL quickly. Whether you need to write database applications, perform administrative tasks, or generate reports, Learning SQL, Second Edition, will help you easily master all the SQL fundamentals. Each chapter presents a self-contained lesson on a key SQL concept or technique, with numerous illustrations, annotated examples, and exercises to let you practice the skills you learn.

Apache Hadoop is ideal for organizations with a growing need to store and process massive application datasets. With Hadoop: The Definitive Guide, programmers will find details for analyzing large datasets with Hadoop, and administrators will learn how to set up and run Hadoop clusters. The book includes case studies that illustrate how Hadoop is used to solve specific problems.

Now available in an updated second edition, our very popular SQL Pocket Guide is a major help to programmers, database administrators, and everyone who uses SQL in their day-to-day work. The SQL Pocket Guide is a concise reference to frequently used SQL statements and commonly used SQL functions. Not just an endless collection of syntax diagrams, this portable guide addresses the language's complexity head on and leads by example. The information in this edition has been updated to reflect the latest versions of the most commonly used SQL variants.

Analysis

Want to calculate the probability that an event will happen? Be able to spot fake data? Prove beyond doubt whether one thing causes another? Or learn to be a better gambler? You can do that and much more with 75 practical and fun hacks packed into Statistics Hacks. These cool tips, tricks, and mind-boggling solutions from the world of statistics, measurement, and research methods will not only amaze and entertain you, but will give you an advantage in several real-world situations, including business.

This book offers practical recipes to solve a variety of common problems that users have with extracting Access data and performing calculations on it. Whether you use Access 2007 or an earlier version, this book will teach you new methods to query data, different ways to move data in and out of Access, how to calculate answers to financial and investment issues, how to jump beyond SQL by manipulating data with VBA, and more.

Need to learn statistics as part of your job, or looking for help to pass a statistics course? Statistics in a Nutshell is a clear and concise introduction and reference for anyone with no previous background in the subject. You get a firm grasp of the basics before moving into increasingly advanced material. Each chapter presents you with easy-to-follow descriptions illustrated by graphics, formulas, and plenty of solved examples.

Wouldn't it be great if there were a statistics book that made histograms, probability distributions, and chi square analysis more enjoyable than going to the dentist? Head First Statistics brings this typically dry subject to life, teaching statistics through engaging, interactive, and thought-provoking material, full of puzzles, stories, quizzes, visual aids, and real-world examples. This book satisfies the requirements for passing the College Board's Advanced Placement (AP) Statistics Exam.

How can you learn to manage and analyze all kinds of data? Turn to Head First Data Analysis, where you'll learn how to collect and organize your data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others. The unique approach in Head First Data Analysis is by far the most efficient way to learn what you need to know to convert raw data into a vital business tool.

R is rapidly becoming the standard for developing statistical software, and R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.

Real World Data Analysis shows you how to think about data and the results you want to achieve with it. Author Philipp Janert teaches you how to effectively approach data analysis problems, and how to extract all the available information from your data. Many people can apply a data analysis formula. This book shows you how to look at the results and know whether they're meaningful.

Popular social networks such as Facebook and Twitter generate a tremendous amount of valuable data on topics and use patterns. Who’s talking to whom? What are they talking about? How often are they talking? This concise and practical book shows you how to answer these questions and more by harvesting and analyzing data using social web APIs, Python, and pragmatic storage technologies such as Redis, CouchDB, and NetworkX.

Visualization

How can you take advantage of data that you might otherwise never use? With the help of a downloadable programming environment, this book helps you represent data accurately on the Web and elsewhere, complete with user interaction, animation, and more. You'll learn basic visualization principles, how to choose the right kind of display for your purposes, and how to provide interactive features to design entire interfaces around large, complex data sets.

With this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging -- and beautiful -- working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video.

With contributions from more than two dozen experts, this book demonstrates why visualizations are beautiful not only for their aesthetic design, but also for elegant layers of detail that efficiently generate insight and new understanding. Think of the familiar map of the New York City subway system, or a diagram of the human brain. These older examples have been surpassed by artists, designers, commentators, scientists, analysts, statisticians, and others who show how visualizations using today's digital capabilities can help us make sense of the world.

Learn computer programming the easy way with Processing, a simple language that lets you use code to create drawings, animation, and interactive graphics. Programming courses usually start with theory, but this book lets you jump right into creative and fun projects. It's ideal for anyone who wants to learn basic programming, and serves as a simple introduction to graphics for people with some programming skills.

NLP/Machine Learning

This fascinating book demonstrates how you can build web applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies ranging from predictive text and email filtering to automatic summarization and translation. You'll learn how to write Python programs to analyze the structure and meaning of texts, drawing on techniques from the fields of linguistics and artificial intelligence.

Other

With the advent of rich Internet applications, the explosion of social media, and the increased use of powerful cloud computing infrastructures, a new generation of attackers has added cunning new techniques to its arsenal. For anyone involved in defending an application or a network of systems, Hacking: The Next Generation is one of the few books to identify a variety of emerging attack vectors.