“Bind knowledge by writing it down.” (Ali bin Abi Thalib)

This time I want to share what deep learning is, at least as far as I have learned up to today (when this article was written). Before this, my friend and I tried to create a startup company built around a machine learning and deep learning engine that we made ourselves.

There are many references we can use on deep learning, especially from big companies such as Google, Facebook, Baidu, Microsoft, Amazon, NVIDIA and others. What is deep learning? How important or how valuable is it, especially for business? Who are the figures doing a lot of research on deep learning or building it? And why did my friends and I want to build our own machine learning engine, when there were already plenty of frameworks, libraries and services (such as Azure and AWS) for machine learning at the time this article was written?

I will not answer all of the questions above, because it would take too long to write hahaha. I write as my fingers move hahaha.

These days artificial intelligence gets quite a lot of the spotlight from business people around the world, especially in the technology business, and even from the political world. In the USA, for example, President Obama appointed the country's first Chief Data Scientist.

From Wikipedia: Deep learning (deep machine learning, or deep structured learning, or hierarchical learning, or sometimes DL) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations.

Deep learning itself, according to my observation so far (IMHO), is a pile or stack of multiple algorithms or methods, and different ways of stacking them give deep learning approaches with different architectures. Some of the objectives of the stack are feature extraction and making optimal use of all available resources. What resources? Data. Most of the data in the world is not labeled (categorized), and deep learning is usually a stack of unsupervised and supervised learning algorithms that can take advantage of both labeled and unlabeled data. Making better use of the available information should in turn improve the performance of the resulting model.

What is interesting is the analogy from Andrew Ng, Chief Scientist at Baidu, about what can be done with more data and larger architectures: he compares it to building a rocket.

The image above shows Andrew Ng's recipe for building a good machine learning system, which he thinks is quite relevant. The next question is whether enlarging the network (like enlarging the rocket engine) or adding more data (like adding more fuel to the rocket) is sufficient, so that we are done. It turns out the answer is no. The next step is to find a network architecture suitable for the business problem that we are trying to solve with machine learning.

According to Andrew Ng, 50% of the work is spent modifying the architecture of the network being used, and he calls the process of searching for a suitable architecture "black magic" hahaha.

In the image above we can see that the approach often used to implement deep learning is multilayer graphical models, such as belief networks, neural networks, hidden Markov models and others. Basically these methods, like standard machine learning, are statistical and stochastic methods that are already widely known in mathematics, especially statistics.

The human brain can perform distributed representation, and the idea of deep learning is basically to work like the human brain. For example, a person receives visual input through the eye, which is then channeled to the cortex, where several parts of the brain process the information and extract from it, as shown below:

I took the picture from the slides of Professor Hugo Larochelle. The eye receives input, which is forwarded to several parts of the brain, each of which performs extraction. At the LGN, the person sees the input only as an arrangement of small pieces of information (like the bits in a computer). At V1 the information is extracted into some simple forms. In the next part, V2, it is extracted into a higher-level representation, or new forms that group the earlier information. At the AIT it is extracted again into still higher-level representations, such as the shape of a face. After that, decision-making is carried out by a part of the brain such as the PFC, and then the brain produces a specific response, which may be a motor response.

This idea from neuroscience is what machine learning tries to duplicate, with (relatively) the same purpose. In other words, it tries to reverse engineer the human system of thinking. As described in the book Artificial Intelligence: A Modern Approach by Russell and Norvig, one aim of AI is to make machines that think like humans and think rationally.

The method that I, along with some experts and professors in machine learning and especially in deep networks, think is very appropriate is the multilayer neural network or multilayer graphical model. A neural network is one of them: following the idea above, it can be built into a deep neural network with the aim of imitating the workings of the human brain.

A neural network in a deep network has more than one hidden layer, but that is exactly where the problem appears. We know how the backpropagation method for a neural network works, for example like this:

The problem that arises is that the gradient gets smaller (closer to zero) at each hidden layer as the error is propagated backward, away from the output layer, so adding layers can make the accuracy worse instead of better.
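To make the shrinking gradient concrete, here is a minimal sketch, not taken from any library. It assumes a toy network that is just a chain of single sigmoid units with a fixed weight (the names `gradient_magnitudes`, `weight` and `x` are made up for this illustration). Because the derivative of the sigmoid is never larger than 0.25, every step backward multiplies the gradient by a small number.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_magnitudes(num_layers, weight=1.0, x=0.5):
    """Return |d a_L / d a_k| for k = 0..L in a chain of one-unit sigmoid layers.

    Toy model (an assumption for illustration): a_0 = x and
    a_k = sigmoid(weight * a_{k-1}) for k = 1..L.
    """
    # Forward pass: keep every activation for the backward pass.
    activations = [x]
    for _ in range(num_layers):
        activations.append(sigmoid(weight * activations[-1]))

    # Backward pass (chain rule), starting at the output where the gradient is 1.
    grads = [0.0] * (num_layers + 1)
    grads[num_layers] = 1.0
    for k in range(num_layers, 0, -1):
        a_k = activations[k]
        local = a_k * (1.0 - a_k) * weight  # d a_k / d a_{k-1}, at most 0.25 * |weight|
        grads[k - 1] = abs(grads[k] * local)
    return grads

# Index 0 is the input side, index 10 is the output side.
grads = gradient_magnitudes(num_layers=10)
for k, g in enumerate(grads):
    print(f"layer {k:2d}: |gradient| = {g:.2e}")
```

Running it prints one line per layer, and the magnitude drops steadily from the output side toward the input side. A real network has many units and learned weights, so the numbers differ, but the multiplication of many small factors is the same.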

Then what is the solution? For a deep network, we can add algorithms or methods that initialize the network at the beginning of its construction, before tuning. One method that can be used, and I think the easiest, is the autoencoder. Maybe I will explain it in the next article (if I'm not in lazy mode) hahaha. The function of an autoencoder is actually quite similar to information extraction methods such as PCA, which can reduce the dimensionality of the data in a dataset.

picture from http://kiyukuta.github.io/

The above is an example of an autoencoder network. The real focus is on the hidden layers in the network topology. Basically, an autoencoder is unsupervised learning, so it does not require labels. Then what plays the role of the labels, how do we perform stochastic gradient descent, and what is the loss function? (The details may come later hehe ...)

The autoencoder concept is simply to encode the existing input, that is, to compress it or extract the features that are really important, with the result held in a hidden layer, and then to try to decode that hidden layer back into the input. The decoded result is compared with the original input to see how well it matches (whether it returns to what it was before encoding). What we obtain is the trained network, along with the weight of each synapse or edge connecting the units or perceptrons.
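The encode, decode and compare loop can be sketched in a few lines of plain Python. This is only a minimal illustration under my own assumptions: a sigmoid hidden layer, a linear output layer, squared reconstruction error as the loss, and per-sample stochastic gradient descent. The class `Autoencoder`, the helper `train` and the tiny dataset are hypothetical, not from any library. Note that the "label" is simply the input itself.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class Autoencoder:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Encoder weights W1 (hidden x input) and decoder weights W2 (input x hidden).
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.W2, self.b2)]

    def loss(self, x):
        # Reconstruction error: how far the decoded output is from the input.
        x_hat = self.decode(self.encode(x))
        return 0.5 * sum((a - t) ** 2 for a, t in zip(x_hat, x))

    def sgd_step(self, x, lr=0.1):
        h = self.encode(x)
        x_hat = self.decode(h)
        d_out = [a - t for a, t in zip(x_hat, x)]  # dLoss/dx_hat
        # Backpropagate into the hidden layer before W2 is changed.
        d_z = [sum(self.W2[i][j] * d_out[i] for i in range(len(x))) * h[j] * (1.0 - h[j])
               for j in range(len(h))]
        for i in range(len(x)):
            for j in range(len(h)):
                self.W2[i][j] -= lr * d_out[i] * h[j]
            self.b2[i] -= lr * d_out[i]
        for j in range(len(h)):
            for k in range(len(x)):
                self.W1[j][k] -= lr * d_z[j] * x[k]
            self.b1[j] -= lr * d_z[j]

def train(model, data, epochs=500, lr=0.1):
    """Run per-sample SGD and return the mean reconstruction loss afterwards."""
    for _ in range(epochs):
        for x in data:
            model.sgd_step(x, lr)
    return sum(model.loss(x) for x in data) / len(data)

# Unlabeled toy data: four inputs with 4 features each, compressed to 2 hidden units.
data = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
model = Autoencoder(n_in=4, n_hidden=2)
before = sum(model.loss(x) for x in data) / len(data)
after = train(model, data)
print("mean reconstruction loss before:", before, "after:", after)
```

After training, `model.encode(x)` gives the compressed hidden representation, and `model.W1` holds the weights that the hidden layer has learned.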

In other words, the initial weights of a hidden layer in the main deep network, and with them the values of its units that represent the input layer, are the result of extracting the information contained in the input layer. So the learned synapse weights are transferred to the deep neural network as the initial weights between the input layer and the first hidden layer. Next, run the autoencoder for the next layer, that is, between the first hidden layer and the second hidden layer, and so on up to between hidden layer n-1 and hidden layer n.
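The layer-by-layer procedure can be sketched as a loop: train an autoencoder on the current representation, keep only its encoder weights as the initial weights of that hidden layer, encode the data, and repeat for the next layer. The code below is a minimal sketch under the same assumptions as before (sigmoid hidden units, linear decoder, squared error, per-sample SGD). The function names `train_autoencoder` and `pretrain_stack` are hypothetical, and the supervised tuning that would follow is left out.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(W, b, x):
    """Hidden activations of one layer with weights W (hidden x input) and biases b."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

def train_autoencoder(data, n_hidden, epochs=200, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder and return only its encoder (W, b)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]  # decoder
    c = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = encode(W, b, x)
            x_hat = [sum(v * hj for v, hj in zip(row, h)) + ci for row, ci in zip(V, c)]
            d_out = [a - t for a, t in zip(x_hat, x)]
            # Gradient at the hidden pre-activations, computed before V is updated.
            d_z = [sum(V[i][j] * d_out[i] for i in range(n_in)) * h[j] * (1.0 - h[j])
                   for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    V[i][j] -= lr * d_out[i] * h[j]
                c[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W[j][k] -= lr * d_z[j] * x[k]
                b[j] -= lr * d_z[j]
    return W, b  # the decoder (V, c) is discarded after pretraining

def pretrain_stack(data, hidden_sizes):
    """Greedy layer-wise pretraining: one autoencoder per hidden layer."""
    layers = []
    current = data
    for depth, n_hidden in enumerate(hidden_sizes):
        W, b = train_autoencoder(current, n_hidden, seed=depth)
        layers.append((W, b))
        # The hidden activations become the "input" for the next autoencoder.
        current = [encode(W, b, x) for x in current]
    return layers

data = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
layers = pretrain_stack(data, hidden_sizes=[3, 2])
for n, (W, b) in enumerate(layers, start=1):
    print(f"hidden layer {n}: {len(W)} units, each with {len(W[0])} incoming weights")
```

The returned list would then be used as the starting weights of the deep network before tuning with backpropagation on labeled data.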

So the gradients that we get when tuning starts are much less prone to shrinking toward zero.

That is roughly the simple technical idea of a deep network, and in fact it is a fairly simple algorithm. There are several other algorithms, such as the deep autoencoder, the restricted Boltzmann machine (RBM), stacked RBMs or deep belief networks, and others.

Deep learning itself is often juxtaposed with big data: deep learning is used to analyze big data with its 3Vs (volume, velocity, variety). Given the complexity of the method, it needs supporting technologies, such as hardware and parallel or cluster-based approaches, both in the implementation of the algorithm and in data management, that are fast enough to reduce the time complexity that appears in deep learning.

Various approaches have been taken, from ordinary multi-core parallel programming and GPUs up to the Message Passing Interface (MPI), which can use more than one machine. Google, Baidu and Microsoft have done this, and NVIDIA even has the Deep Learning GPU Training System (DIGITS).

Why did my friends and I want to make an engine for deep learning, or at least for standard machine learning? Previously I worked for a company where my task was to create a machine learning engine for a fraud detection system, with the goal of running it in production. It turned out that the most difficult part was moving the algorithms implemented for the fraud detection case into the production system. Why? Because the needs of a production environment are different. Machine learning there must classify quickly and reliably, and must be able to improve existing models regularly or on a schedule. Looking at some of the existing libraries, such as Weka or scikit-learn, I was not satisfied. With Weka, when I wanted to load a saved model back, it needed all the previous information that made up the model. scikit-learn required loading the whole dataset to build the model (training). Imagine how much data will need to be processed in the future if we have to load the dataset into memory first. And Weka had an even cooler problem: the model it produced (a random forest) was around 84 MB. Imagine we create a service or API to perform classification that needs the model: creating one model object requires 84 MB * the size of the dataset information (in MB) for a single request. Now imagine 1000 requests coming in.
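To illustrate the kind of behavior we wanted in production, a model that is updated chunk by chunk instead of loading the whole dataset into memory, here is a minimal sketch. It is not Weka's or scikit-learn's API and it is not our engine. The class `OnlineLogisticRegression`, its `partial_fit` method, the helper `stream_batches` and the toy transaction data are all assumptions made up for this example.

```python
import math

class OnlineLogisticRegression:
    """Tiny logistic regression that learns from one small batch at a time."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch):
        # Update the existing model in place; earlier batches are not needed again.
        for x, y in batch:
            err = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err

def stream_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable (file, DB cursor, ...)."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def toy_transactions(repeats):
    # Hypothetical data source: one scaled feature, label 1 means "suspicious".
    samples = [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)]
    for _ in range(repeats):
        for sample in samples:
            yield sample

model = OnlineLogisticRegression(n_features=1)
for batch in stream_batches(toy_transactions(repeats=500), batch_size=3):
    model.partial_fit(batch)  # only 3 rows are held in memory at a time

print("score for 0.9:", model.predict_proba([0.9]))
print("score for 0.1:", model.predict_proba([0.1]))
```

The model object here is just a weight list and a bias, so it stays small no matter how much data has passed through it, and it can be refreshed on a schedule by calling `partial_fit` with new batches.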

We think the solution is either to create our own engine or to use an existing service built for a specific purpose (machine learning with specific functions, such as fraud detection, financial analytics, social network analysis and so on).

My friends and I decided to try to build a system or engine that can cover the various needs of machine learning or deep learning when implemented in production. That way it will also be easier to adjust it for a particular purpose.

As for the people I often use as references: many people are engaged in deep learning, but there are five whose writing, video lectures or seminars I often follow, namely:

Prof. Tom Mitchell, a professor at Carnegie Mellon University and the author of my first book about machine learning, which is used as a reference in many universities.

Then Prof. Andrew Ng, a Stanford professor who is a co-founder of Coursera and Chief Scientist at Baidu, and who was previously one of the brains behind Google Brain.

Prof. Geoffrey Hinton, a Distinguished Researcher at Google and also Distinguished Professor Emeritus at the University of Toronto. He is an important figure in the deep learning movement.

Hugo Larochelle, a professor at the Université de Sherbrooke.

Jeff Dean, a computer scientist and Google Senior Fellow in the Systems and Infrastructure group.