Following Microsoft BI developments, I was under the impression that on-premises SQL Server data mining was a dying breed and that it was simply being replaced by cloud-based Machine Learning...

Did I get that wrong? :doze:

Is there still any point in starting to use SQL Server (on-premises) data mining, or should we skip it and head straight for the clouds?

Thanks

Eric

I don't know what the general opinion of the masses is, but I can give you my perspective.

Our company does machine learning in digital advertising. We use a lot of R and a little bit of Python to accomplish these tasks. All of our data lives in the Microsoft stack, however.

Most of our guys use R, Python and SAS. Very few are familiar with Microsoft Data Mining in SSAS. That's because not many people are learning how to do machine learning on the Microsoft stack. They are learning basic SQL and, mostly, how to use NoSQL or cloud-based solutions.

That said, Microsoft Data Mining could be very useful in our camp. Our data lives in the Microsoft stack, and the data mining packages can do predictive analytics in the same place the data lives. It's just that our data scientists are not as familiar with SSAS, and they can't get past its steep learning curve compared with R, Python and SAS, which are far more familiar and flexible on the fly.

Outside of that, it's hard to compete with cloud solutions like Microsoft Azure ML, Amazon Redshift and Google Compute/BigQuery that can have you computing data across dozens of machines without impacting your stack.

Azure ML specifically combines SSIS and Machine Learning with Python and R very nicely. It's a better option than Microsoft Data Mining, which doesn't seem to be going anywhere.

In our case, we don't yet have (IMHO) any of those three Vs, so I find SQL Server quite adequate for us.

Do you think there could be other reasons to use NoSQL if we are already a Microsoft shop?

Cheers

Eric

I'm really starting to dislike the term big data simply because it's so vague. But some of the main reasons people are leaning towards NoSQL are that it's open source, it's schema-less and it runs on commodity clusters.

Yes, it does handle high volumes, high velocity and diverse data well.

But for us, I think the main reason is the schema-less design, which makes it very flexible to just add data and go while still having an enterprise-ready feel that scales across commodity hardware.

SQL Server does require some work to get that ETL up and running. It does take some work to structure the data and sometimes to find a relation. NoSQL is a bit more flexible in that regard, but it's not so much of what we need that we would wake up one day and find our RDBMS gone.

That's because, at the end of the day, our data eventually needs structure. It eventually needs that schema. After the data scientists have explored and analyzed the data in NoSQL, it's going to be shifted towards a relational model, because we still need that structure for the end user. We need some kind of strict governance over the data to bring it up to a standard for reporting.
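
As a rough illustration of that shift from schema-less documents to a governed relational table, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in for the RDBMS. Everything in it is hypothetical: the documents, the field names, the `ad_events` table and the `to_row`/`load` helpers are made up for the example and are not our actual pipeline.

```python
import sqlite3

# Hypothetical schema-less documents, as they might come out of a document store.
# Note that the documents do not all share the same fields.
documents = [
    {"id": 1, "campaign": "spring", "clicks": 120, "device": "mobile"},
    {"id": 2, "campaign": "spring", "clicks": 80},
    {"id": 3, "campaign": "fall", "clicks": 45, "referrer": "email"},
]

# The fixed set of columns the reporting side has agreed on (an assumption).
COLUMNS = ("id", "campaign", "clicks", "device")


def to_row(doc):
    """Project a document onto the fixed columns.

    Missing fields become None (NULL); fields outside the schema are dropped.
    """
    return tuple(doc.get(col) for col in COLUMNS)


def load(docs):
    """Create the relational table and load the projected rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ad_events ("
        "id INTEGER PRIMARY KEY, "
        "campaign TEXT NOT NULL, "
        "clicks INTEGER NOT NULL, "
        "device TEXT)"
    )
    conn.executemany(
        "INSERT INTO ad_events VALUES (?, ?, ?, ?)",
        [to_row(d) for d in docs],
    )
    conn.commit()
    return conn


conn = load(documents)

# A standardized report that end users can rely on regardless of how the
# source documents change shape.
report = conn.execute(
    "SELECT campaign, SUM(clicks) FROM ad_events "
    "GROUP BY campaign ORDER BY campaign"
).fetchall()
print(report)  # [('fall', 45), ('spring', 200)]
```

The point of the sketch is the `COLUMNS` tuple and the `NOT NULL` constraints: the documents can keep changing shape upstream, but the report only ever sees the agreed columns, and a document missing a required field fails at load time instead of quietly ending up in a report.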

If not, our data will remain chaotic even though it's clean. It's very easy to shift and change the data in NoSQL, so the end user has to be flexible and change with the data too. Unfortunately, our end users are not very tech-savvy. Having to shift with the data may require them to know more about that shift (i.e., what data changed, what data is new).

So having NoSQL as that data discovery or centralized data platform is good. Having an RDBMS bring structure to the madness and standardize the data for the future is good too.

I'm an old-timer with 42 years in IT, and I think the term 'data mining' is dumb, just plain dumb. Any DBA or SQL developer worth half of what they are paid can access, massage, analyze, summarize, and present data without all the fancy stuff involved. But on the other hand, don't 'head straight for the clouds' either. That is ALSO dumb, just plain dumb. Cloud computing is destined for disaster. It's just a hyped idea whose time came and which will eventually die off. So all you folks, get off your duffs and learn SQL, the most powerful and versatile tool you will use in your career. But you have to put in the effort to understand and master the tool.