May 2018 Blog Posts (100)

1. Back-propagation

This problem also appeared as an assignment problem in the Coursera online course Mathematics for Machine Learning: Multivariate Calculus. The description of the problem is taken from the assignment itself.

In this assignment, we shall train a neural network to draw a curve. The curve takes one input variable, the…

This blog explores a typical image identification task using a convolutional ("Deep Learning") neural network. For this purpose we will use a simple JavaCNN package by D. Persson, and make our example small and concise using the Python scripting language. This example can also be rewritten in Java, Groovy, JRuby or any scripting language supported by the Java virtual machine.

This example will use images in the grayscale format (PGM). The name "PGM" is an acronym derived from…

Bill is the Editorial Director for Data Science Central, and President and Chief Data Scientist at Data-Magnum, providing predictive analytics and big data infrastructure projects as a service. Bill has been an active commercial predictive modeler since 2001.…

Hello, this is my second article about how to use modern C++ for solving machine learning problems. This time I will show how to build a model for the polynomial regression problem described in the previous article, but now with another library that lets you use your GPU easily.…

If we focus on industries that are in the business of buying some or all of a company and then trying to improve its operations before selling, we can identify at least three critical stages where data science can play a significant role.

Over the years, a crucial part of data-gathering behavior has revolved around what other people think. With the constantly growing popularity and availability of opinion-driven resources such as personal blogs and online review sites, new challenges and opportunities are emerging as people have started using advanced technologies to make decisions. Sentiment analysis, or opinion mining, refers to the use of computational linguistics, text analytics and natural language processing to identify…

In case you've missed it, there has been a tremendous number of news stories, social media posts and the like on Bitcoin, hashing algorithms, blockchain, video graphics cards and crypto-mining. If you are anything like most of us, the information barely provides you a…

GDPR & Our Data

All of a sudden, lawyers are busy and have a lot of work to do on this new thing called GDPR. Since 90% of the world’s data was created in the last two years, will GDPR also impact historical data? Does GDPR require machine learning algorithms to explain their output? Maybe yes, maybe no, or in short probably not, but there is enough ambiguity left to clarify to keep data scientists, lawyers and industry influencers busy.…

A time series is a sequence of data points recorded at specific time points, most often at regular time intervals (seconds, hours, days, months etc.). Every organization generates a high volume of data every single day, be it sales figures, revenue, traffic, or operating costs. Time series data mining can generate valuable information for long-term business decisions, yet it is underutilized in most organizations. Below is a list of a few possible ways to…

Machine learning has the ability to automate a lot of jobs in the future. It is very easy to talk about this automation when it isn't your job that will be automated. But the scary part is that there are a lot of highly skilled jobs that will also face some type of automation in the future as well. When you are talking about your own job potentially being automated, it becomes less abstract and more real. It is very easy to say go ahead and automate jobs, until it is your own that is being…

Although I deal with many different types of metrics, I believe they can be generally classified as follows: 1) time use; 2) alignment; 3) production; 4) performance; 5) service; and 6) market. In this blog, I will be providing some comments pertaining to each. Although I have yet to encounter any myself, I am certain that there must be textbooks on the issue of operational metrics and how to make use of them. However, I personally developed nearly all of those that I use. Although I do…

In this post I will share some tips I learned after using the Apache Hadoop environment for some years, and doing many, many workshops and courses. The information here considers Apache Hadoop around version 2.9, but it could definitely be extended to other similar versions.

These are considerations for building or using a Hadoop cluster. Some are specific to the Cloudera distribution. Anyway, hope it…

The list below is a (non-comprehensive) selection of what I believe should be taught first in data science classes, based on 30 years of business experience. This is a follow-up to my article Why logistic regression should be taught last.

I am not sure whether these topics below are even discussed in data camps or college…