Similar

This book constitutes the proceedings of the 17th International Conference on Discovery Science, DS 2015, held in Banff, AB, Canada in October 2015. The 16 long and 12 short papers presented together with 4 invited talks in this volume were carefully reviewed and selected from 44 submissions. The combination of recent advances in the development and analysis of methods for discovering scientific knowledge, coming from machine learning, data mining, and intelligent data analysis, as well as their application in various scientific domains, on the one hand, with the algorithmic advances in machine learning theory, on the other hand, makes every instance of this joint event unique and attractive.

This volume contains the papers selected for presentation at the 17th International Symposium on Methodologies for Intelligent Systems (ISMIS 2008), held at York University, Toronto, Canada, May 21–23, 2008. ISMIS is a conference series started in 1986. Held twice every three years, ISMIS provides an international forum for exchanging scientific research and technological achievements in building intelligent systems. Its goal is to achieve a vibrant interchange between researchers and practitioners on fundamental and advanced issues related to intelligent systems. ISMIS 2008 featured a selection of the latest research work and applications from the following areas related to intelligent systems: active media human–computer interaction, autonomic and evolutionary computation, digital libraries, intelligent agent technology, intelligent information retrieval, intelligent information systems, intelligent language processing, knowledge representation and integration, knowledge discovery and data mining, knowledge visualization, logic for artificial intelligence, soft computing, Web intelligence, and Web services. Researchers and developers from 29 countries submitted more than 100 full papers to the conference. Each paper was rigorously reviewed by three committee members and external reviewers. Out of these submissions, 40% were selected as regular papers and 22% as short papers. ISMIS 2008 also featured three plenary talks given by John Mylopoulos, Jiawei Han and Michael Lowry. They spoke on their recent research in agent-oriented software engineering, information network mining, and intelligent software engineering tools, respectively.

The Twelfth International Conference on Inductive Logic Programming was held in Sydney, Australia, July 9–11, 2002. The conference was colocated with two other events, the Nineteenth International Conference on Machine Learning (ICML 2002) and the Fifteenth Annual Conference on Computational Learning Theory (COLT 2002). Started in 1991, Inductive Logic Programming is the leading annual forum for researchers working in Inductive Logic Programming and Relational Learning. Continuing a series of international conferences devoted to Inductive Logic Programming and Relational Learning, ILP 2002 was the central event in 2002 for researchers interested in learning relational knowledge from examples. The Program Committee, following a resolution of the Community Meeting in Strasbourg in September 2001, took upon itself the issue of the possible change of the name of the conference. Following an extended e-mail discussion, a number of proposed names were subjected to a vote. In the first stage of the vote, two names were retained for the second vote. The two names were: Inductive Logic Programming, and Relational Learning. It had been decided that a 60% vote would be needed to change the name; the result of the vote was 57% in favor of the name Relational Learning. Consequently, the name Inductive Logic Programming was kept.

AI 2001 is the 14th in the series of Artificial Intelligence conferences sponsored by the Canadian Society for Computational Studies of Intelligence/Société canadienne pour l’étude de l’intelligence par ordinateur. As was the case last year, the conference is being held in conjunction with the annual conferences of two other Canadian societies, Graphics Interface (GI 2001) and Vision Interface (VI 2001). We believe that the overall experience will be enriched by this conjunction of conferences. This year is the "silver anniversary" of the conference: the first Canadian AI conference was held in 1976 at UBC. During its lifetime, it has attracted Canadian and international papers of high quality from a variety of AI research areas. All papers submitted to the conference received at least three independent reviews. Approximately one third were accepted for plenary presentation at the conference. The best paper of the conference will be invited to appear in Computational Intelligence.

Forecasting is required in many situations. Stocking an inventory may require forecasts of demand months in advance. Telecommunication routing requires traffic forecasts a few minutes ahead. Whatever the circumstances or time horizons involved, forecasting is an important aid in effective and efficient planning.

This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.

Big Data Analytics with R and Hadoop is a tutorial-style book that focuses on the powerful big data tasks that can be achieved by integrating R and Hadoop. This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. It is also aimed at those who know Hadoop and want to build intelligent applications over big data with R packages. It would be helpful if readers have basic knowledge of R.

The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but we have cleverer brains. If machine brains one day come to surpass human brains in general intelligence, then this new superintelligence could become very powerful. As the fate of the gorillas now depends more on us humans than on the gorillas themselves, so the fate of our species then would come to depend on the actions of the machine superintelligence. But we have one advantage: we get to make the first move. Will it be possible to construct a seed AI or otherwise to engineer initial conditions so as to make an intelligence explosion survivable? How could one achieve a controlled detonation? To get closer to an answer to this question, we must make our way through a fascinating landscape of topics and considerations. Read the book and learn about oracles, genies, singletons; about boxing methods, tripwires, and mind crime; about humanity's cosmic endowment and differential technological development; indirect normativity, instrumental convergence, whole brain emulation and technology couplings; Malthusian economics and dystopian evolution; artificial intelligence, and biological cognitive enhancement, and collective intelligence. This profoundly ambitious and original book picks its way carefully through a vast tract of forbiddingly difficult intellectual terrain. Yet the writing is so lucid that it somehow makes it all seem easy. After an utterly engrossing journey that takes us to the frontiers of thinking about the human condition and the future of intelligent life, we find in Nick Bostrom's work nothing less than a reconceptualization of the essential task of our time.

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.

If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.

• Get a crash course in Python

• Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science

• Collect, explore, clean, munge, and manipulate data

• Dive into the fundamentals of machine learning

• Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering

• Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

How can you bring out MySQL’s full power? With High Performance MySQL, you’ll learn advanced techniques for everything from designing schemas, indexes, and queries to tuning your MySQL server, operating system, and hardware to their fullest potential. This guide also teaches you safe and practical ways to scale applications through replication, load balancing, high availability, and failover.

Updated to reflect recent advances in MySQL and InnoDB performance, features, and tools, this third edition not only offers specific examples of how MySQL works, it also teaches you why this system works as it does, with illustrative stories and case studies that demonstrate MySQL’s principles in action. With this book, you’ll learn how to think in MySQL.

• Learn the effects of new features in MySQL 5.5, including stored procedures, partitioned databases, triggers, and views

• Implement improvements in replication, high availability, and clustering

• Achieve high performance when running MySQL in the cloud

• Optimize advanced querying features, such as full-text searches

• Take advantage of modern multi-core CPUs and solid-state disks

• Explore backup and recovery strategies, including new tools for hot online backups

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.

• Understand how data science fits in your organization, and how you can use it for competitive advantage

• Treat data as a business asset that requires careful investment if you’re to gain real value

• Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way

• Learn general concepts for actually extracting knowledge from data

• Apply data science principles when interviewing data science job candidates

Unlock deeper insights into Machine Learning with this vital guide to cutting-edge predictive analytics

About This Book

• Leverage Python's most powerful open-source libraries for deep learning, data wrangling, and data visualization

• Learn effective strategies and best practices to improve and optimize machine learning systems and algorithms

• Ask – and answer – tough questions of your data with robust statistical models, built for a range of datasets

Who This Book Is For

If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.

What You Will Learn

• Explore how to use different machine learning models to ask different questions of your data

• Learn how to build neural networks using Keras and Theano

• Find out how to write clean and elegant Python code that will optimize the strength of your algorithms

• Discover how to embed your machine learning model in a web application for increased accessibility

• Predict continuous target outcomes using regression analysis

• Uncover hidden patterns and structures in data with clustering

• Organize data using effective pre-processing techniques

• Get to grips with sentiment analysis to delve deeper into textual and social media data

In Detail

Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.

Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.

Style and approach

Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.

One of CBS News’s Best Fall Books of 2005 • Among St Louis Post-Dispatch’s Best Nonfiction Books of 2005 • One of Amazon.com’s Best Science Books of 2005

A radical and optimistic view of the future course of human development from the bestselling author of How to Create a Mind and The Age of Spiritual Machines who Bill Gates calls “the best person I know at predicting the future of artificial intelligence”

For over three decades, Ray Kurzweil has been one of the most respected and provocative advocates of the role of technology in our future. In his classic The Age of Spiritual Machines, he argued that computers would soon rival the full range of human intelligence at its best. Now he examines the next step in this inexorable evolutionary process: the union of human and machine, in which the knowledge and skills embedded in our brains will be combined with the vastly greater capacity, speed, and knowledge-sharing ability of our creations.

You know the rudiments of the SQL query language, yet you feel you aren't taking full advantage of SQL's expressive power. You'd like to learn how to do more work with SQL inside the database before pushing data across the network to your applications. You'd like to take your SQL skills to the next level.

Let's face it, SQL is a deceptively simple language to learn, and many database developers never go far beyond the simple statement: SELECT columns FROM table WHERE conditions. But there is so much more you can do with the language. In the SQL Cookbook, experienced SQL developer Anthony Molinaro shares his favorite SQL techniques and features. You'll learn about:

Window functions, arguably the most significant enhancement to SQL in the past decade. If you're not using these, you're missing out

Powerful, database-specific features such as SQL Server's PIVOT and UNPIVOT operators, Oracle's MODEL clause, and PostgreSQL's very useful GENERATE_SERIES function

Pivoting rows into columns, reverse-pivoting columns into rows, using pivoting to facilitate inter-row calculations, and double-pivoting a result set

Bucketization, and why you should never use that term in Brooklyn.

How to create histograms, summarize data into buckets, perform aggregations over a moving range of values, generate running-totals and subtotals, and other advanced, data warehousing techniques

The technique of walking a string, which allows you to use SQL to parse through the characters, words, or delimited elements of a string

Written in O'Reilly's popular Problem/Solution/Discussion style, the SQL Cookbook is sure to please. Anthony's credo is: "When it comes down to it, we all go to work, we all have bills to pay, and we all want to go home at a reasonable time and enjoy what's still available of our days." The SQL Cookbook moves quickly from problem to solution, saving you time each step of the way.

Artificial Intelligence helps choose what books you buy, what movies you see, and even who you date. It puts the "smart" in your smartphone and soon it will drive your car. It makes most of the trades on Wall Street, and controls vital energy, water, and transportation infrastructure. But Artificial Intelligence can also threaten our existence.

In as little as a decade, AI could match and then surpass human intelligence. Corporations and government agencies are pouring billions into achieving AI's Holy Grail—human-level intelligence. Once AI has attained it, scientists argue, it will have survival drives much like our own. We may be forced to compete with a rival more cunning, more powerful, and more alien than we can imagine. Through profiles of tech visionaries, industry watchdogs, and groundbreaking AI systems, Our Final Invention explores the perils of the heedless pursuit of advanced AI. Until now, human intelligence has had no rival. Can we coexist with beings whose intelligence dwarfs our own? And will they allow us to?

If you’re considering R for statistical computing and data visualization, this book provides a quick and practical guide to just about everything you can do with the open source R language and software environment. You’ll learn how to write R functions and use R packages to help you prepare, visualize, and analyze data. Author Joseph Adler illustrates each process with a wealth of examples from medicine, business, and sports.

Updated for R 2.14 and 2.15, this second edition includes new and expanded chapters on R performance, the ggplot2 data visualization package, and parallel R computing with Hadoop.

• Get started quickly with an R tutorial and hundreds of examples

• Explore R syntax, objects, and other language details

• Find thousands of user-contributed R packages online, including Bioconductor

• Learn how to use R to prepare data for analysis

• Visualize your data with R’s graphics, lattice, and ggplot2 packages

• Use R to calculate statistical tests, fit models, and compute probability distributions

• Speed up intensive computations by writing parallel R programs for Hadoop

• Get a complete desktop reference to R

A practical tutorial, the book targets professionals and organizations who want to implement or have already implemented Splunk for log analysis and indexing. Analysts and IT staff for end-to-end investigation, performance monitoring, etc. will also learn from the practical examples. It would even help managers to build reports and summarize the health, performance, and activity of their IT infrastructure and business. You will also find it helpful as a technical administrator, consultant, or end user. Some basic knowledge about Splunk would be helpful, but not necessary.

The bold futurist and bestselling author explores the limitless potential of reverse-engineering the human brain

Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse engineering the brain to understand precisely how it works and using that knowledge to create even more intelligent machines.

Kurzweil discusses how the brain functions, how the mind emerges from the brain, and the implications of vastly increasing the powers of our intelligence in addressing the world’s problems. He thoughtfully examines emotional and moral intelligence and the origins of consciousness and envisions the radical possibilities of our merging with the intelligent technology we are creating.

Certain to be one of the most widely discussed and debated science books of the year, How to Create a Mind is sure to take its place alongside Kurzweil’s previous classics which include Fantastic Voyage: Live Long Enough to Live Forever and The Age of Spiritual Machines.

From the inventor of the PalmPilot comes a new and compelling theory of intelligence, brain function, and the future of intelligent machines

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one stroke, with a new understanding of intelligence itself.

Hawkins develops a powerful theory of how the human brain works, explaining why computers are not intelligent and how, based on this new theory, we can finally build intelligent machines.

The brain is not a computer, but a memory system that stores experiences in a way that reflects the true structure of the world, remembering sequences of events and their nested relationships and making predictions based on those memories. It is this memory-prediction system that forms the basis of intelligence, perception, creativity, and even consciousness.

In an engaging style that will captivate audiences from the merely curious to the professional scientist, Hawkins shows how a clear understanding of how the brain works will make it possible for us to build intelligent machines, in silicon, that will exceed our human ability in surprising ways.

Written with acclaimed science writer Sandra Blakeslee, On Intelligence promises to completely transfigure the possibilities of the technology age. It is a landmark book in its scope and clarity.

How can you tap into the wealth of social web data to discover who’s making connections with whom, what they’re talking about, and where they’re located? With this expanded and thoroughly revised edition, you’ll learn how to acquire, analyze, and summarize data from all corners of the social web, including Facebook, Twitter, LinkedIn, Google+, GitHub, email, websites, and blogs.

• Employ the Natural Language Toolkit, NetworkX, and other scientific computing tools to mine popular social web sites

• Apply advanced text-mining techniques, such as clustering and TF-IDF, to extract meaning from human language data

• Bootstrap interest graphs from GitHub by discovering affinities among people, programming languages, and coding projects

• Build interactive visualizations with D3.js, an extraordinarily flexible HTML5 and JavaScript toolkit

• Take advantage of more than two dozen Twitter recipes, presented in O’Reilly’s popular "problem/solution/discussion" cookbook format

The example code for this unique data science book is maintained in a public GitHub repository. It’s designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

Manage the huMONGOus amount of data collected through your web application with MongoDB. This authoritative introduction—written by a core contributor to the project—shows you the many advantages of using document-oriented databases, and demonstrates how this reliable, high-performance system allows for almost infinite horizontal scalability.

This updated second edition provides guidance for database developers, advanced configuration for system administrators, and an overview of the concepts and use cases for other people on your project. Ideal for NoSQL newcomers and experienced MongoDB users alike, this guide provides numerous real-world schema design examples.

• Get started with MongoDB core concepts and vocabulary

• Perform basic write operations at different levels of safety and speed

• Create complex queries, with options for limiting, skipping, and sorting results

• Design an application that works well with MongoDB

• Aggregate data, including counting, finding distinct values, grouping documents, and using MapReduce

• Gather and interpret statistics about your collections and databases

• Set up replica sets and automatic failover in MongoDB

• Use sharding to scale horizontally, and learn how it impacts applications

• Delve into monitoring, security and authentication, backup/restore, and other administrative tasks

A hands-on guide to web scraping and text mining for both beginners and experienced users of R.

• Introduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, SQL.

• Provides basic techniques to query web documents and data sets (XPath and regular expressions).

• An extensive set of exercises is presented to guide the reader through each technique.

• Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.

• Case studies are featured throughout along with examples for each technique presented.

• R code and solutions to exercises featured in the book are provided on a supporting website.

If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.

Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.

• Get a high-level overview of HDFS and MapReduce: why they exist and how they work

• Plan a Hadoop deployment, from hardware and OS selection to network requirements

• Learn setup and configuration details with a list of critical properties

• Manage resources by sharing a cluster across multiple groups

• Get a runbook of the most common cluster maintenance tasks

• Monitor Hadoop clusters, and learn troubleshooting with the help of real-world war stories

• Use basic tools and techniques to handle backup and catastrophic failure

If you love Essbase and hate seeing it misused, then this is the book for you. Written by 12 Essbase professionals that are either acknowledged Essbase gurus or certified Oracle ACEs, Developing Essbase Applications: Advanced Techniques for Finance and IT Professionals provides an unparalleled investigation and explanation of Essbase theory and best practices.

Detailing the hows and the whys of successful Essbase implementation, the book arms you with simple yet powerful tools to meet your immediate needs, as well as the theoretical knowledge to proceed to the next level with Essbase. Infrastructure, data sourcing and transformation, database design, calculations, automation, APIs, reporting, and project implementation are covered by subject matter experts who work with the tools and techniques on a daily basis. In addition to practical cases that illustrate valuable lessons learned, the book offers:

• Undocumented Secrets: Dan Pressman describes the previously unpublished and undocumented inner workings of the ASO Essbase engine.

• Authoritative Experts: If you have questions that no one else can solve, these 12 Essbase professionals are the ones who can answer them.

• Unpublished: Includes the only third-party guide to infrastructure. Infrastructure is easy to get wrong and can doom any Essbase project.

• Comprehensive: Let there never again be a question on how to create blocks or design BSO databases for performance; Dave Farnsworth provides the answers within.

• Innovative: Cameron Lackpour and Joe Aultman bring new and exciting solutions to persistent Essbase problems.

With a list of contributors as impressive as the program of presenters at a leading Essbase conference, this book offers unprecedented access to the insights and experiences of those at the forefront of the field. The previously unpublished material presented in these pages will give you the practical knowledge needed to use this powerful and intuitive tool to build highly useful analytical models, reporting systems, and forecasting applications.

Bayesian methods of inference are deeply natural and extremely powerful. However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice–freeing you to get results using computing power.

Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention.

Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You’ll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you’ve mastered these techniques, you’ll constantly turn to this guide for the working PyMC code you need to jumpstart future projects.

Coverage includes

• Learning the Bayesian “state of mind” and its practical implications

• Understanding how computers perform Bayesian inference

• Using the PyMC Python library to program Bayesian analyses

• Building and debugging models with PyMC

• Testing your model’s “goodness of fit”

• Opening the “black box” of the Markov Chain Monte Carlo algorithm to see how and why it works

• Solving data science problems when only small amounts of data are available

Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.

Data Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities and methods and tools that Data Scientists use. The content focuses on concepts, principles and practical applications that are applicable to any industry and technology environment, and the learning is supported and explained with examples that you can replicate using open-source software.

This book will help you:

• Become a contributor on a data science team

• Deploy a structured lifecycle approach to data analytics problems

• Apply appropriate analytic techniques and tools to analyzing big data

• Learn how to tell a compelling story with data to drive business action

• Prepare for EMC Proven Professional Data Science Certification

Corresponding data sets are available at www.wiley.com/go/9781118876138.

Get started discovering, analyzing, visualizing, and presenting data in a meaningful way today!

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.

• Learn fundamental components such as MapReduce, HDFS, and YARN

• Explore MapReduce in depth, including steps for developing applications with it

• Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN

• Learn two data formats: Avro for data serialization and Parquet for nested data

• Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)

• Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop

• Learn the HBase distributed database and the ZooKeeper distributed configuration service

In this compact book, Steven Feuerstein, widely recognized as one of the world's leading experts on the Oracle PL/SQL language, distills his many years of programming, teaching, and writing about PL/SQL into a set of best practices: recommendations for developing successful applications. Covering the latest Oracle release, Oracle Database 11gR2, Feuerstein has rewritten this new edition in the style of his bestselling Oracle PL/SQL Programming. The text is organized in a problem/solution format, and chronicles the programming exploits of developers at a mythical company called My Flimsy Excuse, Inc., as they write code, make mistakes, and learn from those mistakes and from each other.

This book offers practical answers to some of the hardest questions faced by PL/SQL developers, including:

What is the best way to write the SQL logic in my application code?

How should I write my packages so they can be leveraged by my entire team of developers?

How can I make sure that all my team's programs handle and record errors consistently?

Oracle PL/SQL Best Practices summarizes PL/SQL best practices in nine major categories: overall PL/SQL application development; programming standards; program testing, tracing, and debugging; variables and data structures; control logic; error handling; the use of SQL in PL/SQL; building procedures, functions, packages, and triggers; and overall program performance.

This book is a concise and entertaining guide that PL/SQL developers will turn to again and again as they seek out ways to write higher quality code and more successful applications.

"This book presents ideas that make the difference between a successful project and one that never gets off the ground. It goes beyond just listing a set of rules, and provides realistic scenarios that help the reader understand where the rules come from. This book should be required reading for any team of Oracle database professionals."

What happens when a naive intern is granted unfettered access to people's most private thoughts and actions? Stephen Thorpe lands a coveted internship at Ubatoo, an Internet empire that provides its users with popular online services, from a search engine and e-mail, to social networking. When Stephen’s boss asks him to work on a project with the American Coalition for Civil Liberties, Stephen innocently obliges, believing he is mining Ubatoo’s vast databases to protect people unfairly targeted in the name of national security. But nothing is as it seems. Suspicious individuals surface, doing all they can to access Ubatoo’s wealth of confidential information. This need not require technical wizardry—simply knowing how to manipulate a well-intentioned intern may be enough.

The Silicon Jungle is a cautionary fictional tale of data mining’s promise and peril. Baluja raises ethical questions about contemporary technological innovations, and how minute details can be routinely pieced together into rich profiles that reveal our habits, goals, and secret desires—all ready to be exploited.

Technological advancements in computing have changed how data is leveraged by businesses to develop, grow, and innovate. In recent years, leading analytical companies have begun to realize the value in their vast holdings of customer data and have found ways to leverage this untapped potential. Now, more firms are following suit and looking to monetize Big Data for big profits. Such changes will have implications for both businesses and consumers in the coming years. In From Big Data to Big Profits, Russell Walker investigates the use of Big Data to stimulate innovations in operational effectiveness and business growth. Walker examines the nature of Big Data and how businesses can use it to create new monetization opportunities. Using case studies of Apple, Netflix, Google, LinkedIn, Zillow, Amazon, and other leaders in the use of Big Data, Walker explores how digital platforms such as mobile apps and social networks are changing the nature of customer interactions and the way Big Data is created and used by companies. Such changes, as Walker points out, will require careful consideration of legal and unspoken business practices as they affect consumer privacy. Companies looking to develop a Big Data strategy will find great value in the SIGMA framework, which he has developed to assess companies for Big Data readiness and provide direction on the steps necessary to get the most from Big Data. Rigorous and meticulous, From Big Data to Big Profits is a valuable resource for students, researchers, and professionals with an interest in Big Data, digital platforms, and analytics.

How do we design for data when traditional design techniques cannot extend to new database technologies? In this era of big data and the Internet of Things, it is essential that we have the tools we need to understand the data coming to us faster than ever before, and to design databases and data processing systems that can adapt easily to ever-changing data schemas and ever-changing business requirements. There must be no intellectual disconnect between data and the software that manages it. It must be possible to extract meaning and knowledge from data to drive artificial intelligence applications. Novel NoSQL data organization techniques must be used side-by-side with traditional SQL databases. Are existing data modeling techniques ready for all of this?

The Concept and Object Modeling Notation (COMN) is able to cover the full spectrum of analysis and design. A single COMN model can represent the objects and concepts in the problem space, logical data design, and concrete NoSQL and SQL document, key-value, columnar, and relational database implementations. COMN models enable an unprecedented level of traceability of requirements to implementation. COMN models can also represent the static structure of software and the predicates that represent the patterns of meaning in databases.

This book will teach you:

• the simple and familiar graphical notation of COMN with its three basic shapes and four line styles

• how to think about objects, concepts, types, and classes in the real world, using the ordinary meanings of English words that aren’t tangled with confused techno-speak

• how to express logical data designs that are freer from implementation considerations than is possible in any other notation

• how to understand key-value, document, columnar, and table-oriented database designs in logical and physical terms

• how to use COMN to specify physical database implementations in any NoSQL or SQL database with the precision necessary for model-driven development

Getting Started With Amazon Redshift is a step-by-step, practical guide to the world of Redshift. Learn to load, manage, and query data on Redshift. This book is for CIOs, enterprise architects, developers, and anyone else who needs to get familiar with Redshift. The CIO will gain an understanding of what their technical staff is working on; the technical implementation personnel will get an in-depth view of the technology and of what it will take to implement their own solutions.

Python is a groundbreaking language for its simplicity and succinctness, allowing the user to achieve a great deal with a few lines of code, especially compared to other programming languages. The pandas library brings these features of Python into the data analysis realm by providing expressiveness, simplicity, and powerful capabilities for the task of data analysis. By mastering pandas, users will be able to do complex data analysis in a short period of time, as well as illustrate their findings using the rich visualization capabilities of related tools such as IPython and matplotlib.

This book is an in-depth guide to the use of pandas for data analysis, for either the seasoned data analysis practitioner or the novice user. It provides a basic introduction to the pandas framework, and takes users through the installation of the library and the IPython interactive environment. Thereafter, you will learn basic as well as advanced features, such as MultiIndexing, modifying data structures, and sampling data, which provide powerful capabilities for data analysis.

The field of data mining provides techniques for automated discovery of valuable information from the accumulated data of computerized operations of enterprises. This book offers a clear and comprehensive introduction to both data mining theory and practice. It is written primarily as a textbook for the students of computer science, management, computer applications, and information technology. The book ensures that the students learn the major data mining techniques even if they do not have a strong mathematical background. The techniques include data pre-processing, association rule mining, supervised classification, cluster analysis, web data mining, search engine query mining, data warehousing and OLAP. To enhance the understanding of the concepts introduced, and to show how the techniques described in the book are used in practice, each chapter is followed by one or two case studies that have been published in scholarly journals. Most case studies deal with real business problems (for example, marketing, e-commerce, CRM). Studying the case studies provides the reader with a greater insight into the data mining techniques. The book also provides many examples, review questions, multiple choice questions, chapter-end exercises and a good list of references and Web resources especially those which are easy to understand and useful for students. A number of class projects have also been included.

Whether you're running Access, MySQL, SQL Server, Oracle, or PostgreSQL, this book will help you push the limits of traditional SQL to squeeze data effectively from your database. The book offers 100 hacks -- unique tips and tools -- that bring you the knowledge of experts who apply what they know in the real world to help you take full advantage of the expressive power of SQL. You'll find practical techniques to address complex data manipulation problems. Learn how to:

• Wrangle data in the most efficient way possible

• Aggregate and organize your data for meaningful and accurate reporting

• Make the most of subqueries, joins, and unions

• Stay on top of the performance of your queries and the server that runs them

• Avoid common SQL security pitfalls, including the dreaded SQL injection attack

Let SQL Hacks serve as your toolbox for digging up and manipulating data. If you love to tinker and optimize, SQL is the perfect technology and SQL Hacks is the must-have book for you.

Despite its wide availability and usage, few developers and DBAs have mastered the true power of Oracle SQL*Plus. This bestselling book--now updated for Oracle 10g--is the only in-depth guide to this interactive query tool for writing SQL scripts. It's an essential resource for any Oracle user.

The new second edition of Oracle SQL*Plus: The Definitive Guide clearly describes how to perform, step-by-step, all of the tasks that Oracle developers and DBAs want to perform with SQL*Plus--and maybe some you didn't realize you could perform.

With Oracle SQL*Plus: The Definitive Guide, you'll expertly:

• write and execute script files

• generate ad hoc reports

• extract data from the database

• query the data dictionary tables

• customize an SQL*Plus environment

• and much more

It also includes a handy quick reference to all of its syntax options and an often-requested chapter on SQL itself, along with a clear, concise, and complete introduction.

This book is truly the definitive guide to SQL*Plus. It's an indispensable resource for those who are new to SQL*Plus, a task-oriented learning tool for those who are already using it, and an immediately useful quick reference for every user. If you want to leverage the full power and flexibility of this popular Oracle tool, you'll need this book.

MySQL’s popularity has brought a flood of questions about how to solve specific problems, and that’s where this cookbook is essential. When you need quick solutions or techniques, this handy resource provides scores of short, focused pieces of code, hundreds of worked-out examples, and clear, concise explanations for programmers who don’t have the time (or expertise) to solve MySQL problems from scratch.

Ideal for beginners and professional database and web developers, this updated third edition covers powerful features in MySQL 5.6 (and some in 5.7). The book focuses on programming APIs in Python, PHP, Java, Perl, and Ruby. With more than 200 recipes, you’ll learn how to:

This book uses a step-by-step tutorial style with examples, so that users of different levels will benefit from the facilities offered by RapidMiner. If you are a computer scientist or an engineer who has real data from which you want to extract value, this book is ideal for you. You will need to have at least a basic awareness of data mining techniques and some exposure to RapidMiner.

Using Agile methods, you can bring far greater innovation, value, and quality to any data warehousing (DW), business intelligence (BI), or analytics project. However, conventional Agile methods must be carefully adapted to address the unique characteristics of DW/BI projects. In Agile Analytics, Agile pioneer Ken Collier shows how to do just that.

Collier brings together proven solutions you can apply right now—whether you’re an IT decision-maker, data warehouse professional, database administrator, business intelligence specialist, or database developer. With his help, you can mitigate project risk, improve business alignment, achieve better results—and have fun along the way.

Master modern web and network data modeling: both theory and applications.

In Web and Network Data Science, a top faculty member of Northwestern University’s prestigious analytics program presents the first fully-integrated treatment of both the business and academic elements of web and network modeling for predictive analytics.

Some books in this field focus either entirely on business issues (e.g., Google Analytics and SEO); others are strictly academic (covering topics such as sociology, complexity theory, ecology, applied physics, and economics). This text gives today's managers and students what they really need: integrated coverage of concepts, principles, and theory in the context of real-world applications.

Building on his pioneering Web Analytics course at Northwestern University, Thomas W. Miller covers usability testing, Web site performance, usage analysis, social media platforms, search engine optimization (SEO), and many other topics. He balances this practical coverage with accessible and up-to-date introductions to both social network analysis and network science, demonstrating how these disciplines can be used to solve real business problems.

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially on how to improve the accuracy of data matching, and its scalability to large databases.

Peter Christen’s book is divided into three parts: Part I, “Overview”, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, “Steps of the Data Matching Process”, then details its main steps like pre-processing, indexing, field and record comparison, classification, and quality evaluation. Lastly, part III, “Further Topics”, deals with specific aspects like privacy, real-time matching, or matching unstructured data. Finally, it briefly describes the main features of many research and open source systems available today.

By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching aspects to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. Especially, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaption and customization. Such practical considerations are discussed for each of the major steps in the data matching process.

“This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.” —From the Foreword by Raymie Stata, CEO of Altiscale

The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN

Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop™ YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.

YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.

You’ll find many examples drawn from the authors’ cutting-edge experience—first as Hadoop’s earliest developers and implementers at Yahoo! and now as Hortonworks developers moving the platform forward and helping customers succeed with it.

Coverage includes

• YARN’s goals, design, architecture, and components, and how it expands the Apache Hadoop ecosystem

• Exploring YARN on a single node

• Administering YARN clusters and Capacity Scheduler

• Running existing MapReduce applications

• Developing a large-scale clustered YARN application

• Discovering new open source frameworks that run under YARN

As our society transforms into a data-driven one, the role of the Data Scientist is becoming more and more important. If you want to be on the leading edge of what is sure to become a major profession in the not-too-distant future, this book can show you how. Each chapter is filled with practical information that will help you reap the fruits of big data and become a successful Data Scientist:

• Learn what big data is and how it differs from traditional data through its main characteristics: volume, variety, velocity, and veracity.

• Explore the different types of Data Scientists and the skillset each one has.

• Dig into what the role of the Data Scientist requires in terms of the relevant mindset, technical skills, experience, and how the Data Scientist connects with other people.

• Be a Data Scientist for a day, examining the problems you may encounter and how you tackle them, what programs you use, and how you expand your knowledge and know-how.

• See how you can become a Data Scientist, based on where you are starting from: a programming, machine learning, or data-related background.

• Follow step-by-step through the process of landing a Data Scientist job: where you need to look, how you would present yourself to a potential employer, and what it takes to follow a freelancer path.

• Read the case studies of experienced, senior-level Data Scientists, in an attempt to get a better perspective of what this role is, in practice.

At the end of the book, there is a glossary of the most important terms that have been introduced, as well as three appendices – a list of useful sites, some relevant articles on the web, and a list of offline resources for further reading.

Liu has written a comprehensive text on Web mining, which consists of two parts. The first part covers the data mining and machine learning foundations, where all the essential concepts and algorithms of data mining and machine learning are presented. The second part covers the key topics of Web mining, where Web crawling, search, social network analysis, structured data extraction, information integration, opinion mining and sentiment analysis, Web usage mining, query log mining, computational advertising, and recommender systems are all treated both in breadth and in depth. His book thus brings all the related concepts and algorithms together to form an authoritative and coherent text.

The book offers a rich blend of theory and practice. It is suitable for students, researchers and practitioners interested in Web mining and data mining both as a learning text and as a reference book. Professors can readily use it for classes on data mining, Web mining, and text mining. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.

Powerful, Flexible Tools for a Data-Driven World

As the data deluge continues in today’s world, the need to master data mining, predictive analytics, and business analytics has never been greater. These techniques and tools provide unprecedented insights into data, enabling better decision making and forecasting, and ultimately the solution of increasingly complex problems.

Learn from the Creators of the RapidMiner Software

Written by leaders in the data mining community, including the developers of the RapidMiner software, RapidMiner: Data Mining Use Cases and Business Analytics Applications provides an in-depth introduction to the application of data mining and business analytics techniques and tools in scientific research, medicine, industry, commerce, and diverse other sectors. It presents the most powerful and flexible open source software solutions: RapidMiner and RapidAnalytics. The software and their extensions can be freely downloaded at www.RapidMiner.com.

Understand Each Stage of the Data Mining Process

The book and software tools cover all relevant steps of the data mining process, from data loading, transformation, integration, aggregation, and visualization to automated feature selection, automated parameter and process optimization, and integration with other tools, such as R packages or your IT infrastructure via web services. The book and software also extensively discuss the analysis of unstructured data, including text and image mining.

Easily Implement Analytics Approaches Using RapidMiner and RapidAnalytics

Each chapter describes an application, how to approach it with data mining methods, and how to implement it with RapidMiner and RapidAnalytics. These application-oriented chapters give you not only the necessary analytics to solve problems and tasks, but also reproducible, step-by-step descriptions of using RapidMiner and RapidAnalytics. The case studies serve as blueprints for your own data mining applications, enabling you to effectively solve similar problems.

If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you’re evaluating this non-relational database or planning to put it into practice right away.

• Discover how tight integration with Hadoop makes scalability with HBase easier

• Distribute large datasets across an inexpensive cluster of commodity servers

• Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs

• Get details on HBase’s architecture, including the storage format, write-ahead log, background processes, and more

• Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs

• Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks

In 2004, the Government Accountability Office provided a report detailing approximately 200 government-based data-mining projects. While there is comfort in knowing that there are many effective systems, that comfort isn’t worth much unless we can determine that these systems are being effectively and responsibly employed.

Written by one of the most respected consultants in the area of data mining and security, Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies reviews the tangible results produced by these systems and evaluates their effectiveness. While CSI-type shows may depict information sharing and analysis that are accomplished with the push of a button, this sort of proficiency is more fiction than reality. Going beyond a discussion of the various technologies, the author outlines the issues of information sharing and the effective interpretation of results, which are critical to any integrated homeland security effort.

Organized into three main sections, the book fully examines and outlines the future of this field with an insider’s perspective and a visionary’s insight.

• Section 1 provides a fundamental understanding of the types of data that can be used in current systems. It covers approaches to analyzing data and clearly delineates how to connect the dots among different data elements.

• Section 2 provides real-world examples derived from actual operational systems to show how data is used, manipulated, and interpreted in domains involving human smuggling, money laundering, narcotics trafficking, and corporate fraud.

• Section 3 provides an overview of the many information-sharing systems, organizations, and task forces as well as data interchange formats. It also discusses optimal information-sharing and analytical architectures.

Currently, there is very little published literature that truly defines real-world systems. Although politics and other factors all play into how much one agency is willing to support the sharing of its resources, many now embrace the wisdom of that path. This book will provide those individuals with an understanding of what approaches are currently available and how they can be most effectively employed.

Use the latest data mining best practices to enable timely, actionable, evidence-based decision making throughout your organization! Real-World Data Mining demystifies current best practices, showing how to use data mining to uncover hidden patterns and correlations, and leverage these to improve all aspects of business performance.

Drawing on extensive experience as a researcher, practitioner, and instructor, Dr. Dursun Delen delivers an optimal balance of concepts, techniques and applications. Without compromising either simplicity or clarity, he provides enough technical depth to help readers truly understand how data mining technologies work. Coverage includes: processes, methods, techniques, tools, and metrics; the role and management of data; text and web mining; sentiment analysis; and Big Data integration. Throughout, Delen's conceptual coverage is complemented with application case studies (examples of both successes and failures), as well as simple, hands-on tutorials.

Real-World Data Mining will be valuable to professionals on analytics teams; professionals seeking certification in the field; and undergraduate or graduate students in any analytics program: concentrations, certificate-based, or degree-based.

Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving illustrates the details involved in solving real computational problems encountered in data analysis. It reveals the dynamic and iterative process by which data analysts approach a problem and reason about different ways of implementing solutions.

Suitable for self-study or as supplementary reading in a statistical computing course, the book enables instructors to incorporate interesting problems into their courses so that students gain valuable experience and data science skills. Students learn how to acquire and work with unstructured or semistructured data as well as how to narrow down and carefully frame the questions of interest about the data.

Blending computational details with statistical and data analysis concepts, this book provides readers with an understanding of how professional data scientists think about daily computational tasks. It will improve readers’ computational reasoning of real-world data analyses.

With today’s consumers spending more time on their mobiles than on their PCs, new methods of empirical stochastic modeling have emerged that can provide marketers with detailed information about the products, content, and services their customers desire.

Data Mining Mobile Devices defines the collection of machine-sensed environmental data pertaining to human social behavior. It explains how the integration of data mining and machine learning can enable the modeling of conversation context, proximity sensing, and geospatial location throughout large communities of mobile users.

Examines the construction and leveraging of mobile sites
Describes how to use mobile apps to gather key data about consumers’ behavior and preferences
Discusses mobile mobs, which can be differentiated as distinct marketplaces, including Apple®, Google®, Facebook®, Amazon®, and Twitter®
Provides detailed coverage of mobile analytics via clustering, text, and classification AI software and techniques

Mobile devices serve as detailed diaries of a person, continuously and intimately broadcasting where, how, when, and what products, services, and content your consumers desire. The future is mobile—data mining starts and stops in consumers' pockets.

Describing how to analyze Wi-Fi and GPS data from websites and apps, the book explains how to model mined data through the use of artificial intelligence software. It also discusses the monetization of mobile devices’ desires and preferences that can lead to the triangulated marketing of content, products, or services to billions of consumers—in a relevant, anonymous, and personal manner.

"A must-have book for anyone expecting to do research and/or applications in categorical data analysis." —Statistics in Medicine

"It is a total delight reading this book." —Pharmaceutical Research

"If you do any analysis of categorical data, this is an essential desktop reference." —Technometrics

The use of statistical methods for analyzing categorical data has increased dramatically, particularly in the biomedical, social sciences, and financial industries. Responding to new developments, this book offers a comprehensive treatment of the most important methods for categorical data analysis.

An emphasis on logistic and probit regression methods for binary, ordinal, and nominal responses for independent observations and for clustered data with marginal models and random effects models
Two new chapters on alternative methods for binary response data, including smoothing and regularization methods, classification methods such as linear discriminant analysis and classification trees, and cluster analysis
New sections introducing the Bayesian approach for methods in that chapter
More than 100 analyses of data sets and over 600 exercises
Notes at the end of each chapter that provide references to recent research and topics not covered in the text, linked to a bibliography of more than 1,200 sources
A supplementary website showing how to use R and SAS for all examples in the text, with information also about SPSS and Stata and with exercise solutions

Categorical Data Analysis, Third Edition is an invaluable tool for statisticians and methodologists, such as biostatisticians and researchers in the social and behavioral sciences, medicine and public health, marketing, education, finance, biological and agricultural sciences, and industrial quality control.

“Not so different in spirit from the way public intellectuals like John Kenneth Galbraith once shaped discussions of economic policy and public figures like Walter Cronkite helped sway opinion on the Vietnam War…could turn out to be one of the more momentous books of the decade.” —New York Times Book Review

"Nate Silver's The Signal and the Noise is The Soul of a New Machine for the 21st century." —Rachel Maddow, author of Drift

"A serious treatise about the craft of prediction—without academic mathematics—cheerily aimed at lay readers. Silver's coverage is polymathic, ranging from poker and earthquakes to climate change and terrorism." —New York Review of Books

Nate Silver built an innovative system for predicting baseball performance, predicted the 2008 election within a hair’s breadth, and became a national sensation as a blogger—all by the time he was thirty. He solidified his standing as the nation's foremost political forecaster with his near perfect prediction of the 2012 election. Silver is the founder and editor in chief of FiveThirtyEight.com.

Drawing on his own groundbreaking work, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. Most predictions fail, often at great cost to society, because most of us have a poor understanding of probability and uncertainty. Both experts and laypeople mistake more confident predictions for more accurate ones. But overconfidence is often the reason for failure. If our appreciation of uncertainty improves, our predictions can get better too. This is the “prediction paradox”: The more humility we have about our ability to make predictions, the more successful we can be in planning for the future.

In keeping with his own aim to seek truth from data, Silver visits the most successful forecasters in a range of areas, from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA. He explains and evaluates how these forecasters think and what bonds they share. What lies behind their success? Are they good—or just lucky? What patterns have they unraveled? And are their forecasts really right? He explores unanticipated commonalities and exposes unexpected juxtapositions. And sometimes, it is not so much how good a prediction is in an absolute sense that matters but how good it is relative to the competition. In other cases, prediction is still a very rudimentary—and dangerous—science.

Silver observes that the most accurate forecasters tend to have a superior command of probability, and they tend to be both humble and hardworking. They distinguish the predictable from the unpredictable, and they notice a thousand little details that lead them closer to the truth. Because of their appreciation of probability, they can distinguish the signal from the noise.

With everything from the health of the global economy to our ability to fight terrorism dependent on the quality of our predictions, Nate Silver’s insights are an essential read.

If you have an interest in DynamoDB, want to know what it is all about, and want to become proficient in using it, this is the book for you. If you are an intermediate user who wishes to enhance your knowledge of DynamoDB, this book is aimed at you. Basic familiarity with programming, NoSQL, and cloud computing concepts would be helpful.

Machine audition is the study of algorithms and systems for the automatic analysis and understanding of sound by machine. It has recently attracted increasing interest within several research communities, such as signal processing, machine learning, auditory modeling, perception and cognition, psychology, pattern recognition, and artificial intelligence. However, the developments made so far are fragmented within these disciplines, lacking connections and incurring potentially overlapping research activities in this subject area.

Machine Audition: Principles, Algorithms and Systems contains advances in algorithmic developments, theoretical frameworks, and experimental research findings. This book is useful for professionals who want an improved understanding about how to design algorithms for performing automatic analysis of audio signals, construct a computing system for understanding sound, and learn how to build advanced human-computer interactive systems.

Get Started Fast with Apache Hadoop® 2, YARN, and Today’s Hadoop Ecosystem

With Hadoop 2.x and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2.x and the Data Lake concept represent a radical shift away from conventional approaches to data usage and storage. Hadoop 2.x installations offer unmatched scalability and breakthrough extensibility that supports new and existing Big Data analytics processing methods and models.

Hadoop® 2 Quick-Start Guide is the first easy, accessible guide to Apache Hadoop 2.x, YARN, and the modern Hadoop ecosystem. Building on his unsurpassed experience teaching Hadoop and Big Data, author Douglas Eadline covers all the basics you need to know to install and use Hadoop 2 on personal computers or servers, and to navigate the powerful technologies that complement it.

This guide is ideal if you want to learn about Hadoop 2 without getting mired in technical details. Douglas Eadline will bring you up to speed quickly, whether you’re a user, admin, devops specialist, programmer, architect, analyst, or data scientist.

In this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging -- and beautiful -- working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video.

With Beautiful Data, you will:

Explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web
Learn how to visualize trends in urban crime, using maps and data mashups
Discover the challenges of designing a data processing system that works within the constraints of space travel
Learn how crowdsourcing and transparency have combined to advance the state of drug research
Understand how new data can automatically trigger alerts when it matches or overlaps pre-existing data
Learn about the massive infrastructure required to create, capture, and process DNA data

That's only a small sample of what you'll find in Beautiful Data. For anyone who handles data, this is a truly fascinating book.

Written by Oracle insiders, this indispensable guide distills an enormous amount of information about the Oracle Database into one compact volume. Ideal for novice and experienced DBAs, developers, managers, and users, Oracle Essentials walks you through technologies and features in Oracle’s product line, including its architecture, data structures, networking, concurrency, and tuning.

Complete with illustrations and helpful hints, this fifth edition provides a valuable one-stop overview of Oracle Database 12c, including an introduction to Oracle and cloud computing. Oracle Essentials provides the conceptual background you need to understand how Oracle truly works.

Topics include:

A complete overview of Oracle databases and data stores, and Fusion Middleware products and features
Core concepts and structures in Oracle’s architecture, including pluggable databases
Oracle objects and the various datatypes Oracle supports
System and database management, including Oracle Enterprise Manager 12c
Security options, basic auditing capabilities, and options for meeting compliance needs
Performance characteristics of disk, memory, and CPU tuning
Basic principles of multiuser concurrency
Oracle’s online transaction processing (OLTP)
Data warehouses, Big Data, and Oracle’s business intelligence tools
Backup and recovery, and high availability and failover solutions