November 2012 Blog Posts (28)

This newsletter contains three sections: announcements from sponsors (at the top), featured contributions from the Data Science Central community, and valuable resources for big data, data science and analytics practitioners (at the bottom).

Everybody has an opinion on Santa Claus. There are tons of books, recipes, decorations and family traditions dedicated to his arrival for a successful Holiday season. But how about Data Scientists? Is this new position…

Join Fernando, Chief Technologist, Enterprise, at MarkLogic for a discussion on how to combine the power of Tableau™ with a NoSQL database that can perform analytics against disparate data sets composed of structured, unstructured and graph data. The result is an unprecedented level of flexibility and power from being able to correlate and analyze “Big Ugly Data” in its natural form. The use case will…

Along with several others, Harvard Business Review has recently pointed out an area of significant job growth that suits individuals with a curious nature and expertise in business analytics. But who will dominate this area? The Data Scientist: trending as “the sexiest job” in America, this role has a desirability that calls upon…

Top 50 data science / big data tools, described in fewer than 40 words, for decision makers. Please help us: any definition that you fill in will have your name attached to it. Send your definition, or a new term and its definition, to [email protected]…

Big Data holds a big promise. But has that promise paid off already? Or are you heading for a Big Dollar Disaster? Many take inventory of their data and find out they have terabytes of it lying around. Surely something should be done with that, so here’s how we see a lot of companies going about implementing ‘something’ for their Big Data.

How to use S3 (s3 native) as input / output for a Hadoop MapReduce job. In this tutorial we will first try to understand what S3 is and the difference between s3 and s3n, and then how to set s3n as input and output for a Hadoop MapReduce job. Configuring s3n as I/O may be useful for local MapReduce jobs (i.e. MR run on a local cluster), but it matters most when we run an Elastic MapReduce job (i.e. when we run the job in the cloud). When we run a job in the cloud we need to specify the storage location for input as…

What is Hadoop: Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. It incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system and like…

When creating a predictive model, data miners need to “tune” it to make the right kind of mistakes. Setting the cut-off point between ‘promising’ and ‘unpromising’ depends a lot on our client’s biggest concern -- missed opportunities or false alarms.
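The trade-off above can be illustrated with a minimal sketch, assuming the model outputs one score per prospect and the true outcomes are known for a holdout set. The function name `best_cutoff`, the toy scores, and the cost figures are hypothetical and are not taken from the post.

```python
def best_cutoff(scores, labels, cost_missed, cost_false_alarm):
    """Return (cutoff, total_cost) with the lowest total cost of mistakes.

    A case is called 'promising' when its score >= cutoff.
    labels: 1 = truly promising, 0 = truly unpromising.
    """
    # Try every observed score, plus "flag nothing" (infinite cutoff).
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for cutoff in candidates:
        # Missed opportunity: truly promising but scored below the cutoff.
        missed = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
        # False alarm: truly unpromising but scored at or above the cutoff.
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        cost = missed * cost_missed + false_alarms * cost_false_alarm
        if best is None or cost < best[1]:
            best = (cutoff, cost)
    return best


# Toy holdout data (illustrative only).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Client fears missed opportunities most: the cutoff moves down.
low_cutoff = best_cutoff(scores, labels, cost_missed=5, cost_false_alarm=1)
# Client fears false alarms most: the cutoff moves up.
high_cutoff = best_cutoff(scores, labels, cost_missed=1, cost_false_alarm=5)
```

With these toy numbers the same model gets a lower cut-off when missed opportunities are the costlier mistake and a higher one when false alarms are, which is the sense in which the model is tuned to make the right kind of mistakes.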

Predicting election results in the 50 states is actually much easier than most people think. The West Coast and East Coast are Democratic; the Midwest, Texas, etc. are mostly Republican (the Midwest is becoming more Republican because its population is aging as young, smart people, mostly Democrats, move away). So the task is not really about correctly predicting results for the 50 states, but simply predicting…