Category: Big Data

10 Rules for Creating Reproducible Results in Data Science

In recent years, evidence has been mounting that points to a reproducibility crisis in scientific research. Reviews of papers in the fields of psychology and cancer biology found that only 40% and 10% of the results, respectively, could be reproduced. Nature published the results of a survey of researchers in 2016 that […]

Machine Learning using Spark and R

R is ubiquitous in the data science community. Its ecosystem of more than 8,000 packages makes it the Swiss Army knife of modeling applications. Similarly, Apache Spark has rapidly become the big data platform of choice for data scientists. Its ability to perform calculations relatively quickly (due to features like in-memory caching) makes it ideal […]

Assumptions Can Ruin Your K-Means Clusters

Clustering is one of the most powerful and widely used machine learning techniques. It’s very seductive: throw some data into the algorithm and let it discover hitherto unknown relationships and patterns. K-means is the most popular of all the clustering algorithms. It’s easy to understand, and therefore to implement, so it’s available in almost all analysis […]

5 Machine Learning Books Worth Reading

Machine learning remains a hot topic as organizations attempt to squeeze insights and competitive advantage from their data sets. I’ve selected five books that may be of interest if you are embarking on a machine learning initiative or career. Obviously, there are many more great books out there and I’d like to hear your recommendations: recommendation […]

5 Big Data Security Challenges

As interest in big data has exploded, organizations have rushed to gain a competitive advantage by deploying analytics pipelines that exploit this newly available resource. Many projects have been set up in a “skunkworks” environment, often by data science teams. While this has shortened the time to market for new features, it has created a potential […]