Big Data

The real secret to unlocking big data? Math. It’s often said that mathematics is the universal language.

It represents ideas and pure logic, crossing every domain from business to the social and physical sciences, and even art. Yet as more and more (big) data arrives, we have not fully used the power of math as a universal language. Instead, we have focused on narrower concepts such as queries and pattern matching to search information and find relationships in data faster. As technologists, we know all of this ends up as 1s and 0s, but the algorithms behind the data get less attention, which leaves many problems buried in the jargon and established practices of their own domains.
The Big Data Conundrum: How to Define It?
One of the biggest new ideas in computing is “big data.”

There is unanimous agreement that big data is revolutionizing commerce in the 21st century. When it comes to business, big data offers unprecedented insight, improved decision-making, and untapped sources of profit. And yet ask a chief technology officer to define big data and he or she will stare at the floor. Chances are, you will get as many definitions as there are people you ask.
Market Basket Analysis: Introduction and Approaches.
‘Big data’ is dead. What’s next?
How can big data and smart analytics tools ignite growth for your company?

Find out at DataBeat, May 19-20 in San Francisco, from top data scientists, analysts, investors, and entrepreneurs.
Anchor Modeler.
Mohan Keynote Paper EDBT 2013 Genoa Camera Ready Modified.pdf.
A MongoDB Tutorial using C# and ASP.NET MVC.
In this post I’m going to create a simple ASP.NET MVC website for a blog that uses MongoDB and the official 10gen C# driver.

MongoDB is a NoSQL database that stores information in documents as Binary JSON (BSON). I have been working with it for around six months on an enterprise application and so far am loving it. Our application is currently in its alpha phase but should be public early next year! If you are used to working with an RDBMS, it takes a little getting used to, as you generally work with a denormalized schema. This means thinking about things quite differently than you would have before. You are going to have repeated data, which is a no-no in a relational database, but it will give you awesome performance. Sure, you may need an offline process that runs nightly and cleans up your data, but for the real-time performance gains it’s worth it. Our reasons for choosing MongoDB were performance and scalability. Now back to MVC.
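To make the denormalization trade-off concrete, here is a minimal sketch in Python using plain dicts rather than MongoDB or the C# driver from the post. The post and author documents, their field names, and the `nightly_cleanup` job are all hypothetical: each post embeds a copy of its author’s name so a read needs no join, and a separate offline pass repairs stale copies after the authoritative record changes.

```python
# Hypothetical sketch of a denormalized, document-style model using plain
# Python dicts (no MongoDB driver involved; all field names are made up).

# Each post embeds its comments and a *copy* of the author's name,
# so rendering a post needs one lookup and no joins.
posts = [
    {
        "_id": 1,
        "title": "Hello MongoDB",
        "author": {"id": 42, "name": "Alice"},
        "comments": [{"by": "Bob", "text": "Nice post"}],
    },
    {
        "_id": 2,
        "title": "Denormalization",
        "author": {"id": 42, "name": "Alice"},
        "comments": [],
    },
]

# The authoritative author record lives in its own collection.
authors = {42: {"name": "Alice Smith"}}


def render(post):
    """Read path: everything needed is inside the single document."""
    return f'{post["title"]} by {post["author"]["name"]} ({len(post["comments"])} comments)'


def nightly_cleanup(posts, authors):
    """Hypothetical offline job: copy the current author name into every post
    holding a stale duplicate. Returns the number of documents updated."""
    updated = 0
    for post in posts:
        current = authors[post["author"]["id"]]["name"]
        if post["author"]["name"] != current:
            post["author"]["name"] = current
            updated += 1
    return updated


print(render(posts[0]))                 # Hello MongoDB by Alice (1 comments)
print(nightly_cleanup(posts, authors))  # 2
print(render(posts[0]))                 # Hello MongoDB by Alice Smith (1 comments)
```

Reads stay cheap because nothing is joined; the cost moves to the write side, where a change to one author touches every post that copied the name.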
Naïve Bayes Classification.
This is the continuation of my series exploring Machine Learning, converting the code samples of “Machine Learning in Action” from Python to F# as I go through the book.

Today’s post covers Chapter 4, which is dedicated to Naïve Bayes classification, and you can find the resulting code on GitHub. Disclaimer: I am new to Machine Learning and claim no expertise on the topic. I am currently reading “Machine Learning in Action”, and thought it would be a good learning exercise to convert the book’s samples from Python to F#.

The idea behind the Algorithm
The canonical application of naïve Bayes classification is text classification, where the goal is to identify which pre-determined category a piece of text belongs to: for instance, is this email I just received spam, or ham (“valuable” email)? Imagine that you received an email containing the words “Nigeria”, “Prince”, “Diamonds” and “Money”.

A simple F# implementation
Note: the code presented here can be found on GitHub.
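The post’s own implementation is in F# and is not reproduced here. As a rough sketch of the same idea, here is a multinomial naïve Bayes text classifier in Python, the book’s original language. It assumes add-one (Laplace) smoothing and scores in log space to avoid floating-point underflow; the function names and the tiny training set are made up purely for illustration and do not come from the book.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (list_of_words, label) pairs.
    Returns class priors, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    total = sum(class_counts.values())
    priors = {label: n / total for label, n in class_counts.items()}
    return priors, word_counts, vocab


def classify(words, priors, word_counts, vocab):
    """Return the label maximizing log P(label) + sum of log P(word | label).
    Add-one smoothing keeps an unseen word from zeroing out a class."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(prior)
        for w in words:
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy training data (lowercased words, one label per email).
training = [
    (["nigeria", "prince", "money"], "spam"),
    (["diamonds", "money", "offer"], "spam"),
    (["meeting", "notes", "project"], "ham"),
    (["lunch", "tomorrow", "project"], "ham"),
]
model = train(training)
print(classify(["nigeria", "prince", "diamonds", "money"], *model))  # spam
print(classify(["project", "meeting", "tomorrow"], *model))          # ham
```

The “naïve” part is the assumption that words are independent given the class, which is what lets the score be a simple sum of per-word log probabilities.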
Journalism in the Age of Data: A Video Report on Data Visualization by Geoff McGhee.

Revolutions. Big Data is the Answer - What was the Question?

The Big Data Analytics promise: enable “data monetization” through more timely, more accurate, more complete, more granular, more frequent decisions.
SUML.pdf
InformationWeek: IBM Picks Hadoop To Analyze Large Data Volumes - software Blog.
Big Blue unveiled a package of services and analytics called BigInsights based on Apache's open source Hadoop.

With Big Blue behind Hadoop, companies with Big Data problems may find the open source technology available in more manageable forms. IBM, the originator of the SQL data access language, has recognized that the NoSQL movement has a point: some data management problems don't lend themselves to being solved by IBM's DB2 or other relational database systems.

That's why it has started offering consulting services on managing large volumes of data based on Apache's open source Hadoop. It has a package of services and Hadoop-based analytics that it calls BigInsights Core to help companies take the plunge into Internet-scale data volumes. Hadoop makes no pretence of running transactions or functioning like a transaction-processing database system, with that kind of system's stringent requirements for a two-phase commit. BigSheets was first announced Feb. 25.
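The excerpt does not describe how Hadoop processes data. Hadoop's core programming model is MapReduce, and the sketch below simulates a word count as map, shuffle and reduce phases inside a single Python process. It is not Hadoop's Java API and says nothing about how IBM packages BigInsights; the function names are hypothetical. Each record is mapped independently and the results are simply aggregated, with no transactions or commit protocol, which is the trade-off the excerpt points to.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, shuffle by key, then reduce each group."""
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


records = ["big data big insights", "Hadoop handles big data"]
print(run_job(records))
# {'big': 3, 'data': 2, 'insights': 1, 'hadoop': 1, 'handles': 1}
```

In a real cluster the mappers and reducers would run in parallel on the nodes holding the data; the single-process version only shows the shape of the computation.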
Data Center Knowledge.