<h1>Build generative models using Apache MXNet</h1>
<p>2018-02-16</p>
<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/data-spiral-staircase-crop-724196bf71ad831869f7710e7b1c751f.jpg'/></p><p><em>A step-by-step tutorial on building generative models with generative adversarial networks (GANs) to generate new images from existing images.</em></p><p>In our <a href="https://www.oreilly.com/tags/apache-mxnet-tutorials">previous notebooks</a>, we used a deep learning technique called a convolutional neural network (CNN) to classify text and images. A CNN is an example of a discriminative model, which creates a decision boundary to classify a given input signal (data) as either being in or out of a class, such as email spam.</p>
<p>In recent years, deep learning techniques have also been used to create an even more powerful class of models called generative models. A generative model doesn't just create a decision boundary; it learns the underlying distribution of the data. Using this insight, a generative model can generate new data as well as classify a given input. Here are some examples of generative models:</p>
<ol>
<li>Producing a new song or combining two genres of songs to create an entirely different song</li>
<li>Synthesizing new images from existing images</li>
<li>
<a href="https://arxiv.org/pdf/1703.04244.pdf">Upgrading images</a> to a higher resolution in order to remove fuzziness, improve image quality, and much more</li>
</ol>
<p>In general, generative models can be used on any form of data to learn the underlying distribution, generate new data, and augment existing data.</p>
<p>In this tutorial, we are going to build generative models through generative adversarial networks (GANs) to generate new images from existing images. Our code will use Apache MXNet’s <a href="https://mxnet.incubator.apache.org/tutorials/index.html#gluon">Gluon</a> API.</p>
<p>By the end of the notebook, you will be able to:</p>
<ol>
<li>Understand generative models</li>
<li>Place generative models into the context of deep neural networks</li>
<li>Implement a generative adversarial network (GAN)</li>
</ol>
<section data-type="sect1" id="how-generative-models-go-further-than-discriminative-models-5JMs7">
<h2>How generative models go further than discriminative models</h2>
<p>Let's see the power of generative models using a trivial example. The following lists the heights of 10 Martians and 10 humans.</p>
<p>Martian heights (in centimeters):<br> 250, 260, 270, 300, 220, 260, 280, 290, 300, 310<br>
Human heights (in centimeters):<br>
160, 170, 180, 190, 175, 140, 180, 210, 140, 200</p>
<p>The heights of human beings follow a normal distribution, showing up as a bell-shaped curve on a graph (see Figure 1). Martians tend to be much taller than humans, but their heights also follow a normal distribution. So, let's input the heights of humans and Martians into both discriminative and generative models.</p>
<figure class="center" id="id-MkinhW"><img alt="Graph of sample human and Martian heights" src="https://d3ansictanv2wj.cloudfront.net/humans_mars-a688d2418f596125564ca4ead113137b.png"><figcaption><span class="label">Figure 1. </span>Graph of sample human and Martian heights. Image by Manu Jeevan.</figcaption></figure>
<p>If we train a discriminative model, it will just plot a decision boundary (see Figure 2). The model misclassifies just one human—the accuracy is quite good overall. But the model doesn't learn about the underlying distribution of data, so it is not suitable for building the powerful applications listed in the beginning of this article. </p>
<figure class="center" id="id-EaiWcx"><img alt="decision boundary" src="https://d3ansictanv2wj.cloudfront.net/martians-chart5_preview-46f4f9f5c665b8cb9a51aaf94da447c8.jpg"><figcaption><span class="label">Figure 2. </span>Boundary between humans and Martians, as found by a discriminative model. Image by Manu Jeevan.</figcaption></figure>
<p>In contrast, a generative model will learn the underlying distribution (lower-dimension representation) for Martians (mean=274, std=8.71) and humans (mean=174, std=7.32). If we know the normal distribution for Martians (mean=274, std=8.71), we can produce new data by generating a random number between 0 and 1 (uniform distribution) and then querying the normal distribution of Martians to get a value: say, 275 cm.</p>
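<p>As a minimal sketch of this sampling process (using NumPy and the distribution parameters estimated above):</p>
<pre data-code-language="python" data-type="programlisting">
import numpy as np

# Distribution parameters estimated from the sample heights above
MARTIAN_MEAN, MARTIAN_STD = 274, 8.71
HUMAN_MEAN, HUMAN_STD = 174, 7.32

# Generate a new Martian height by sampling the learned normal distribution
new_martian = np.random.normal(MARTIAN_MEAN, MARTIAN_STD)

# Blend the two underlying distributions to sample a hybrid height
new_hybrid = 0.5 * np.random.normal(MARTIAN_MEAN, MARTIAN_STD) \
           + 0.5 * np.random.normal(HUMAN_MEAN, HUMAN_STD)

print(new_martian, new_hybrid)
</pre>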
<p>Using the underlying distribution, we can generate new Martians and humans, or a new interbreed species (humars). We have infinite ways to generate data because we can manipulate the underlying distribution of data. We can also use this model for classifying Martians and humans, just like the discriminative model. For a concrete understanding of generative versus discriminative models, please check <a href="https://arxiv.org/pdf/1703.01898.pdf">the article "Generative and Discriminative Text Classification with Recurrent Neural Networks"</a> by Yogatama, et al.</p>
<p>Examples of discriminative models include logistic regression and support vector machines (SVMs), while examples of generative models include hidden Markov models and naive Bayes classifiers.</p>
<section data-type="sect2" id="using-discriminative-and-generative-models-in-neural-networks-OKs0Cy">
<h3>Using discriminative and generative models in neural networks</h3>
<p>In this article, we’ll train a discriminative model called "m-dis" and a generative model called "m-gen-partial" to tell the difference between a dog and a cat. We’ll use them together to create a generative adversarial network (GAN), which can iteratively improve a model over time to make better classifications.</p>
<p>The discriminative model will have a <a href="https://mxnet.incubator.apache.org/api/python/gluon.html#mxnet.gluon.loss.SoftmaxCrossEntropyLoss">softmax layer</a> as the final layer to do binary classification. Except for the input layer and the softmax layer, all the other layers (the hidden layers) try to learn a <a href="http://www.deeplearningbook.org/contents/representation.html">representation</a> of the input (cat or dog?) that can reduce the loss at the final layer. A hidden layer may learn a rule like, “if the eyes are blue and the image has brown stripes, it is a cat, otherwise it is a dog,” ignoring other important features like the shape of the body, height, etc.</p>
<p>In contrast, the generative model is trained to learn a lower-dimension representation (distribution) that can represent the input image of a cat or dog. The final layer is not a softmax layer for classification. The hidden layer can learn about the general features of a cat or dog (shape, color, height, etc.). Moreover, the data set needs no labeling, as we are training only to extract features to represent the input data.</p>
<p>We can then tweak the generative model to classify an animal by adding a softmax classifier at the end and training it with a few labeled examples of cats and dogs. We can also generate new data by adding a decoder network to the model. Adding a decoder network is not trivial, and we discuss this in the "Designing the GAN network" section of this post.</p>
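<p>As a rough sketch of the first idea in Gluon (the layer sizes here are purely illustrative, and <code>features</code> stands in for the hidden layers of an already-trained generative model):</p>
<pre data-code-language="python" data-type="programlisting">
from mxnet import gluon
from mxnet.gluon import nn

# Illustrative stand-in for the hidden layers of a trained generative model
features = nn.Sequential()
with features.name_scope():
    features.add(nn.Dense(128, activation='relu'))
    features.add(nn.Dense(64, activation='relu'))

# Attach a classification head with two outputs (cat, dog). The softmax
# itself is applied by the loss function during training.
classifier = nn.Sequential()
with classifier.name_scope():
    classifier.add(features)
    classifier.add(nn.Dense(2))

softmax_loss = gluon.loss.SoftmaxCrossEntropyLoss()
</pre>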
<section data-type="sect2" id="preparing-your-environment-o8sQTyCz">
<h3>Preparing your environment</h3>
<p>If you're working in the AWS Cloud, you can save yourself a lot of installation work by using an <a href="https://aws.amazon.com/machine-learning/amis/">Amazon Machine Image</a>, pre-configured for deep learning. If you have done this, skip steps 1-5 below.</p>
<p>If you are using a Conda environment, remember to install pip inside conda by typing <code>conda install pip</code> after you activate an environment. This will save you a lot of problems down the road.</p>
<p>Here's how to get set up:</p>
<ol>
<li>Install <a href="https://www.continuum.io/downloads">Anaconda</a>, a package manager. It is easier to install Python libraries using Anaconda.</li>
<li>Install <a href="http://scikit-learn.org/stable/install.html">scikit-learn</a>, a general-purpose scientific computing library. We'll use this to pre-process our data. You can install it with <code>conda install scikit-learn</code>.</li>
<li>Grab the Jupyter Notebook, with <code>conda install jupyter notebook</code>.</li>
<li>Get <a href="https://github.com/apache/incubator-mxnet/releases">MXNet</a>, an open source deep learning library. The Python notebook was tested on version 0.12.0 of MXNet, which you can install using pip: <code>pip install mxnet==0.12.0</code>.</li>
<li>Activate your Anaconda environment before running the notebook, for example: <code>source activate mxnet</code> (assuming the environment is named mxnet).
</li>
</ol>
<p>The consolidated list of commands is:</p>
<pre data-code-language="bash" data-type="programlisting" data-highlighted="true">curl -O https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh
chmod +x Anaconda3-4.2.0-Linux-x86_64.sh
./Anaconda3-4.2.0-Linux-x86_64.sh
conda install pip
pip install opencv-python
conda install scikit-learn
conda install jupyter notebook
pip install mxnet-cu90  # GPU build; on a CPU-only machine, use: pip install mxnet==0.12.0
</pre>
<p>Finally, you can download the <a href="https://github.com/sookinoby/generative-models/blob/master/Test-rnn.ipynb">MXNet notebook for this part of the tutorial</a>, where we've created and run all this code, and play with it! Adjust the hyperparameters and experiment with different approaches to neural network architecture.</p>
</section>
</section>
<section data-type="sect1" id="generative-adversarial-network-gan-XQs2sG">
<h2>Generative adversarial network (GAN)</h2>
<p><a href="https://arxiv.org/abs/1406.2661">Generative adversarial network</a> is a neural network model based on a <a href="https://en.wikipedia.org/wiki/Zero-sum_game">zero-sum game</a> from game theory. The application typically consists of two different neural networks called discriminator and generator, where each network tries to outperform the other. Let's consider an example to understand a GAN network.</p>
<p>Let's assume there is a bank (the discriminator) that uses machine learning to detect whether a given currency note is real or fake. A fraudster (the generator) builds a machine learning model to counterfeit currency notes by studying real ones. The counterfeiter deposits the fake notes in the bank, and the bank tries to identify which of the deposited notes are fake (see Figure 3).</p>
<figure class="center" id="id-WxietYs9"><img alt="generative adversarial network" src="https://d3ansictanv2wj.cloudfront.net/GAN_SAMPLE-1d0fcf23fb952c6c98684d22c0ee1f0d.png"><figcaption><span class="label">Figure 3. </span>Generative adversarial network. Image by Manu Jeevan.</figcaption></figure>
<p>If the bank tells the fraudster why it classified these notes as fake, he can improve his model based on those reasons. After multiple iterations, the bank cannot tell the difference between the "real" and "fake" currency. This is the idea behind GANs.</p>
<p>So, now let's implement a simple GAN network to generate new anime images. I encourage you to download <a href="https://github.com/sookinoby/generative-models/blob/master/GAN.ipynb">the notebook</a>. You are welcome to adjust the hyperparameters and experiment with different approaches to the neural network architecture.</p>
<section data-type="sect2" id="preparing-the-data-set-q8sBcPsW">
<h3>Preparing the data set</h3>
<p>We use a library called <a href="https://docs.brine.io/getting_started.html">Brine</a> to download our data set. Brine hosts many data sets, so we can choose the one we want to download. To install Brine and download our data set, do the following:</p>
<ol>
<li><code>pip install brine-io</code></li>
<li><code>brine install jayleicn/anime-faces</code></li>
</ol>
<p>This tutorial uses the Anime-faces data set, which contains over 100,000 anime images collected from the internet.</p>
<p>Once the data set is downloaded, you can load it using the following code:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># brine for loading anime-faces dataset</code>
<code class="kn">import</code> <code class="nn">brine</code>
<code class="n">anime_train</code> <code class="o">=</code> <code class="n">brine</code><code class="o">.</code><code class="n">load_dataset</code><code class="p">(</code><code class="s1">'jayleicn/anime-faces'</code><code class="p">)</code>
</pre>
<p>We also need to normalize the pixel values of each image to the range [-1, 1] and reshape each image from (width X height X channels) to (channels X width X height), because the latter is the format MXNet expects. The <code>transform</code> function resizes the input image and reshapes it into the form required by the MXNet model.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">transform</code><code class="p">(</code><code class="n">data</code><code class="p">,</code> <code class="n">target_wd</code><code class="p">,</code> <code class="n">target_ht</code><code class="p">):</code>
<code class="c1"># resize to target_wd * target_ht</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">image</code><code class="o">.</code><code class="n">imresize</code><code class="p">(</code><code class="n">data</code><code class="p">,</code> <code class="n">target_wd</code><code class="p">,</code> <code class="n">target_ht</code><code class="p">)</code>
<code class="c1"># transpose from (target_wd, target_ht, 3)</code>
<code class="c1"># to (3, target_wd, target_ht)</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">nd</code><code class="o">.</code><code class="n">transpose</code><code class="p">(</code><code class="n">data</code><code class="p">,</code> <code class="p">(</code><code class="mi">2</code><code class="p">,</code> <code class="mi">0</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="c1"># normalize to [-1, 1]</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">data</code><code class="o">.</code><code class="n">astype</code><code class="p">(</code><code class="n">np</code><code class="o">.</code><code class="n">float32</code><code class="p">)</code><code class="o">/</code><code class="mf">127.5</code> <code class="o">-</code> <code class="mi">1</code>
<code class="k">return</code> <code class="n">data</code><code class="o">.</code><code class="n">reshape</code><code class="p">((</code><code class="mi">1</code><code class="p">,</code> <code class="p">)</code> <code class="o">+</code> <code class="n">data</code><code class="o">.</code><code class="n">shape</code><code class="p">)</code>
</pre>
<p>The <code>getImageList</code> function reads the images from the <code>training_folder</code> and returns the images as a list, which is then transformed into an MXNet array.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># Read images, call the transform function, attach it to list</code>
<code class="k">def</code> <code class="nf">getImageList</code><code class="p">(</code><code class="n">base_path</code><code class="p">,</code> <code class="n">training_folder</code><code class="p">):</code>
<code class="n">img_list</code> <code class="o">=</code> <code class="p">[]</code>
<code class="k">for</code> <code class="n">train</code> <code class="ow">in</code> <code class="n">training_folder</code><code class="p">:</code>
<code class="n">fname</code> <code class="o">=</code> <code class="n">base_path</code> <code class="o">+</code> <code class="n">train</code><code class="o">.</code><code class="n">image</code>
<code class="n">img_arr</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">image</code><code class="o">.</code><code class="n">imread</code><code class="p">(</code><code class="n">fname</code><code class="p">)</code>
<code class="n">img_arr</code> <code class="o">=</code> <code class="n">transform</code><code class="p">(</code><code class="n">img_arr</code><code class="p">,</code> <code class="n">target_wd</code><code class="p">,</code> <code class="n">target_ht</code><code class="p">)</code>
<code class="n">img_list</code><code class="o">.</code><code class="n">append</code><code class="p">(</code><code class="n">img_arr</code><code class="p">)</code>
<code class="k">return</code> <code class="n">img_list</code>
<code class="n">base_path</code> <code class="o">=</code> <code class="s1">'brine_datasets/jayleicn/anime-faces/images/'</code>
<code class="n">img_list</code> <code class="o">=</code> <code class="n">getImageList</code><code class="p">(</code><code class="s1">'brine_datasets/jayleicn/anime-faces/images/'</code><code class="p">,</code> <code class="n">training_fold</code><code class="p">)</code>
</pre>
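<p>To feed these images to the networks in mini-batches, one option (a sketch; the batch size is a hyperparameter you choose) is to stack the list into a single NDArray and wrap it in an iterator:</p>
<pre data-code-language="python" data-type="programlisting">
import mxnet as mx

batch_size = 64

# Stack the list of (1, 3, H, W) arrays into one (N, 3, H, W) NDArray and
# wrap it in an iterator that yields shuffled mini-batches for training
train_data = mx.io.NDArrayIter(
    data=mx.nd.concatenate(img_list),
    batch_size=batch_size,
    shuffle=True)
</pre>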
</section>
<section data-type="sect2" id="designing-the-gan-network-1LsKiXse">
<h3>Designing the GAN network</h3>
<p>We now need to design the two separate networks, the discriminator and the generator. The generator takes a random vector of shape (batchsize X N), where N is an integer, and converts it to an image of shape (batchsize X channels X width X height).</p>
<p>The generator uses <a href="http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html#no-zero-padding-unit-strides-transposed">transpose convolutions</a> to upscale the input vectors. This is very similar to how a decoder unit in an <a href="https://en.wikipedia.org/wiki/Autoencoder">autoencoder</a> maps a lower-dimensional vector into a higher-dimensional representation. You can choose to design your own generator network; the only thing you need to be careful about is the input and output shapes. The input to the generator network should be of low dimension (we use 1 x 150, <code>latent_z_size</code>), and the output should have the expected number of channels (3 for color images), width, and height (3 x width x height). Here's the snippet of a generator network.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># simple generator. Use any models but should upscale the latent variable(randome vectors) to 64 * 64 * 3 channel image</code>
<code class="k">with</code> <code class="n">netG</code><code class="o">.</code><code class="n">name_scope</code><code class="p">():</code>
<code class="c1"># input is random_z (batchsize X 150 X 1), going into a tranposed convolution</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2DTranspose</code><code class="p">(</code><code class="n">ngf</code> <code class="o">*</code> <code class="mi">8</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">1</code><code class="p">,</code> <code class="mi">0</code><code class="p">))</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">BatchNorm</code><code class="p">())</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Activation</code><code class="p">(</code><code class="s1">'relu'</code><code class="p">))</code>
<code class="c1"># output size. (ngf*8) x 4 x 4</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2DTranspose</code><code class="p">(</code><code class="n">ngf</code> <code class="o">*</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">BatchNorm</code><code class="p">())</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Activation</code><code class="p">(</code><code class="s1">'relu'</code><code class="p">))</code>
<code class="c1"># output size. (ngf*8) x 8 x 8</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2DTranspose</code><code class="p">(</code><code class="n">ngf</code> <code class="o">*</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">BatchNorm</code><code class="p">())</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Activation</code><code class="p">(</code><code class="s1">'relu'</code><code class="p">))</code>
<code class="c1"># output size. (ngf*8) x 16 x 16</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2DTranspose</code><code class="p">(</code><code class="n">ngf</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">BatchNorm</code><code class="p">())</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Activation</code><code class="p">(</code><code class="s1">'relu'</code><code class="p">))</code>
<code class="c1"># output size. (ngf*8) x 32 x 32</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2DTranspose</code><code class="p">(</code><code class="n">nc</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netG</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Activation</code><code class="p">(</code><code class="s1">'tanh'</code><code class="p">))</code> <code class="c1"># use tanh , we need an output that is between -1 to 1, not 0 to 1 </code>
<code class="c1"># Rememeber the input image is normalised between -1 to 1, so should be the output</code>
<code class="c1"># output size. (nc) x 64 x 64</code>
</pre>
<p>Our discriminator is a binary image classification network that maps an image of shape (batchsize X channels X width X height) into a lower-dimensional vector of shape (batchsize X 1). Again, you can use any model that does binary classification with reasonable accuracy.</p>
<p>Here's the snippet of the discriminator network:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">with</code> <code class="n">netD</code><code class="o">.</code><code class="n">name_scope</code><code class="p">():</code>
<code class="c1"># input is (nc) x 64 x 64</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2D</code><code class="p">(</code><code class="n">ndf</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">LeakyReLU</code><code class="p">(</code><code class="mf">0.2</code><code class="p">))</code>
<code class="c1"># output size. (ndf) x 32 x 32</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2D</code><code class="p">(</code><code class="n">ndf</code> <code class="o">*</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">BatchNorm</code><code class="p">())</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">LeakyReLU</code><code class="p">(</code><code class="mf">0.2</code><code class="p">))</code>
<code class="c1"># output size. (ndf) x 16 x 16</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2D</code><code class="p">(</code><code class="n">ndf</code> <code class="o">*</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">BatchNorm</code><code class="p">())</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">LeakyReLU</code><code class="p">(</code><code class="mf">0.2</code><code class="p">))</code>
<code class="c1"># output size. (ndf) x 8 x 8</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2D</code><code class="p">(</code><code class="n">ndf</code> <code class="o">*</code> <code class="mi">8</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">2</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">BatchNorm</code><code class="p">())</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">LeakyReLU</code><code class="p">(</code><code class="mf">0.2</code><code class="p">))</code>
<code class="c1"># output size. (ndf) x 4 x 4</code>
<code class="n">netD</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2D</code><code class="p">(</code><code class="mi">1</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">1</code><code class="p">,</code> <code class="mi">0</code><code class="p">))</code>
</pre>
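<p>The two snippets above assume that <code>netG</code> and <code>netD</code> already exist. A minimal setup sketch, following common DCGAN conventions (the initialization scale, learning rate, and loss below are typical choices, not prescribed by this tutorial), might look like:</p>
<pre data-code-language="python" data-type="programlisting">
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn

nc = 3     # number of image channels (RGB)
ngf = 64   # base channel count of the generator
ndf = 64   # base channel count of the discriminator
ctx = mx.gpu()  # or mx.cpu() if no GPU is available

netG = nn.Sequential()
netD = nn.Sequential()
# ... add the generator and discriminator layers shown above ...

# Initialize the weights and create one optimizer per network
netG.initialize(mx.init.Normal(0.02), ctx=ctx)
netD.initialize(mx.init.Normal(0.02), ctx=ctx)
trainerG = gluon.Trainer(netG.collect_params(), 'adam',
                         {'learning_rate': 0.0002, 'beta1': 0.5})
trainerD = gluon.Trainer(netD.collect_params(), 'adam',
                         {'learning_rate': 0.0002, 'beta1': 0.5})

# Binary real/fake loss used in the training snippets below
loss = gluon.loss.SigmoidBinaryCrossEntropyLoss()
</pre>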
</section>
</section>
<section data-type="sect1" id="training-the-gan-network-jWs8Hv">
<h2>Training the GAN network</h2>
<p>Training a GAN network is not straightforward, because the discriminator and the generator are trained in alternation; each individual step, however, is simple. Figure 4 illustrates the training process.</p>
<figure class="center" id="id-WxiQI2H9"><img alt="GAN training" src="https://d3ansictanv2wj.cloudfront.net/GAN_Model-5217f53119b05bf85707fa815716b264.png"><figcaption><span class="label">Figure 4. </span>GAN training. Image by Manu Jeevan.</figcaption></figure>
<p>The real images are given a label of 1, and the fake images are given a label of 0.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># real label is the labels of real image</code>
<code class="n">real_label</code> <code class="o">=</code> <code class="n">nd</code><code class="o">.</code><code class="n">ones</code><code class="p">((</code><code class="n">batch_size</code><code class="p">,</code> <code class="p">),</code> <code class="n">ctx</code><code class="o">=</code><code class="n">ctx</code><code class="p">)</code>
<code class="c1"># fake labels is label associated with fake image</code>
<code class="n">fake_label</code> <code class="o">=</code> <code class="n">nd</code><code class="o">.</code><code class="n">zeros</code><code class="p">((</code><code class="n">batch_size</code><code class="p">,</code> <code class="p">),</code> <code class="n">ctx</code><code class="o">=</code><code class="n">ctx</code><code class="p">)</code>
</pre>
<section data-type="sect2" id="training-the-discriminator-q8srTBHW">
<h3>Training the discriminator</h3>
<p>A real image is now passed to the discriminator to determine whether it is real or fake, and the loss associated with the prediction is calculated as <code>errD_real</code>.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"> <code class="c1"># train with real image</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">netD</code><code class="p">(</code><code class="n">data</code><code class="p">)</code><code class="o">.</code><code class="n">reshape</code><code class="p">((</code><code class="o">-</code><code class="mi">1</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="c1"># The loss is a real valued number</code>
<code class="n">errD_real</code> <code class="o">=</code> <code class="n">loss</code><code class="p">(</code><code class="n">output</code><code class="p">,</code> <code class="n">real_label</code><code class="p">)</code>
</pre>
<p>In the next step, a random noise vector, <code>random_z</code>, is passed to the generator network to produce a random image. This image is then passed to the discriminator, which classifies it as real (1) or fake (0), producing a loss, <code>errD_fake</code>. <code>errD_fake</code> is high when the discriminator wrongly classifies the fake image (label 0) as a real image (label 1). Back propagating <code>errD_fake</code> trains the discriminator to classify the fake image as fake (label 0), improving its accuracy.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"> <code class="n">train</code> <code class="k">with</code> <code class="n">fake</code> <code class="n">image</code><code class="p">,</code> <code class="n">see</code> <code class="n">the</code> <code class="n">what</code> <code class="n">the</code> <code class="n">discriminator</code> <code class="n">predicts</code>
<code class="c1"># creates fake imge</code>
<code class="n">fake</code> <code class="o">=</code> <code class="n">netG</code><code class="p">(</code><code class="n">random_z</code><code class="p">)</code>
<code class="c1"># pass it to discriminator</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">netD</code><code class="p">(</code><code class="n">fake</code><code class="o">.</code><code class="n">detach</code><code class="p">())</code><code class="o">.</code><code class="n">reshape</code><code class="p">((</code><code class="o">-</code><code class="mi">1</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">errD_fake</code> <code class="o">=</code> <code class="n">loss</code><code class="p">(</code><code class="n">output</code><code class="p">,</code> <code class="n">fake_label</code><code class="p">)</code>
</pre>
<p>The total error is back propagated to tune the weights of the discriminator.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"> <code class="c1"># compute the total error for fake image and the real image</code>
<code class="n">errD</code> <code class="o">=</code> <code class="n">errD_real</code> <code class="o">+</code> <code class="n">errD_fake</code>
<code class="c1"># improve the discriminator skill by back propagating the error</code>
<code class="n">errD</code><code class="o">.</code><code class="n">backward</code><code class="p">()</code>
</pre>
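<p>Back propagation only computes the gradients; an explicit optimizer step then updates the discriminator's weights (a sketch, assuming a Gluon <code>Trainer</code> named <code>trainerD</code> as set up earlier):</p>
<pre data-code-language="python" data-type="programlisting">
# apply the accumulated gradients to the discriminator's weights
trainerD.step(batch_size)
</pre>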
</section>
<section data-type="sect2" id="training-the-generator-1LsgcGHe">
<h3>Training the generator</h3>
<p>The random noise (<code>random_z</code>) vector used for training the discriminator is used again to generate a fake image. We then pass the fake image to the discriminator network to obtain the classification output, and the loss is calculated. The loss is high if the generated fake image (label 0) is not similar to the real image (label 1)—i.e., if the generator is not able to produce a fake image that can trick the discriminator into classifying it as real (label 1). The loss is then used to fine-tune the generator network.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">fake</code> <code class="o">=</code> <code class="n">netG</code><code class="p">(</code><code class="n">random_z</code><code class="p">)</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">netD</code><code class="p">(</code><code class="n">fake</code><code class="p">)</code><code class="o">.</code><code class="n">reshape</code><code class="p">((</code><code class="o">-</code><code class="mi">1</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">errG</code> <code class="o">=</code> <code class="n">loss</code><code class="p">(</code><code class="n">output</code><code class="p">,</code> <code class="n">real_label</code><code class="p">)</code>
<code class="n">errG</code><code class="o">.</code><code class="n">backward</code><code class="p">()</code>
</pre>
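<p>As with the discriminator, an optimizer step applies the gradients; note that only the generator's weights are updated here (again assuming the <code>trainerG</code> object sketched earlier):</p>
<pre data-code-language="python" data-type="programlisting">
# update only the generator's weights; the discriminator is left untouched
trainerG.step(batch_size)
</pre>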
</section>
<section data-type="sect2" id="generating-new-fake-images-YmsGiNHW">
<h3>Generating new fake images</h3>
<p>The model weights are available <a href="https://www.dropbox.com/s/uu45cq5y6uigiro/GAN_t2.params?dl=0">here</a>. You can download the model parameters and load them using the <a href="https://mxnet.incubator.apache.org/api/python/module/module.html?highlight=load#mxnet.module.BaseModule.load_params">load_params</a> function.
We can use the generator network to create new fake images by providing a 150-dimensional random vector as input to the network (see Figure 5).</p>
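<p>For instance, if the downloaded parameter file is saved as GAN_t2.params, loading it into the Gluon generator defined earlier might look like this sketch (the file path is an assumption):</p>
<pre data-code-language="python" data-type="programlisting">
# restore the trained generator weights from the downloaded file
netG.load_params('GAN_t2.params', ctx=ctx)
</pre>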
<figure class="center" id="id-92irIwiDH4"><img alt="GAN generated images" src="https://d3ansictanv2wj.cloudfront.net/GAN_image-b4b0d3e91316c2b8d9c1c482bb3f5dcd.png"><figcaption><span class="label">Figure 5. </span>GAN generated images. Image by Manu Jeevan.</figcaption></figure>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># Let's generate some random images</code>
<code class="n">num_image</code> <code class="o">=</code> <code class="mi">8</code>
<code class="k">for</code> <code class="n">i</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">num_image</code><code class="p">):</code>
<code class="c1"># random input for the generating images</code>
<code class="n">random_z</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">nd</code><code class="o">.</code><code class="n">random_normal</code><code class="p">(</code><code class="mi">0</code><code class="p">,</code> <code class="mi">1</code><code class="p">,</code> <code class="n">shape</code><code class="o">=</code><code class="p">(</code><code class="mi">1</code><code class="p">,</code> <code class="n">latent_z_size</code><code class="p">,</code> <code class="mi">1</code><code class="p">,</code> <code class="mi">1</code><code class="p">),</code> <code class="n">ctx</code><code class="o">=</code><code class="n">ctx</code><code class="p">)</code>
<code class="n">img</code> <code class="o">=</code> <code class="n">netG</code><code class="p">(</code><code class="n">random_z</code><code class="p">)</code>
<code class="n">plt</code><code class="o">.</code><code class="n">subplot</code><code class="p">(</code><code class="mi">2</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="n">i</code><code class="o">+</code><code class="mi">1</code><code class="p">)</code>
<code class="n">visualize</code><code class="p">(</code><code class="n">img</code><code class="p">[</code><code class="mi">0</code><code class="p">])</code>
<code class="n">plt</code><code class="o">.</code><code class="n">show</code><code class="p">()</code>
</pre>
<p>Although the images generated look similar to the input data set, they are fuzzy. There are several other <a href="https://blog.openai.com/generative-models/">GAN</a> networks that you can experiment with and achieve amazing results.</p>
</section>
</section>
</section>
<section data-type="chapter" id="conclusion-59ps7">
<h2>Conclusion</h2>
<p>Generative models open up new opportunities for deep learning. This article has explored some of the famous generative models for image data. We learned about GAN models, which combine discriminative and generative models, and used one to generate new images very close to the input data (anime characters).</p>
<p><em>This post is a collaboration between O'Reilly and Amazon. <a href="http://www.oreilly.com/about/editorial_independence.html">See our statement of editorial independence</a>.</em></p>
</section>
<p>Suresh Rathnaraj and Manu Jeevan</p>
<h1>Four short links: 16 February 2018</h1>
<p>2018-02-16</p>
<p><em>Machine Design, Metrics, Layered Learning, and Automatically Mergeable Data Structure</em></p><ol>
<li>
<a href="https://www.researchgate.net/profile/Andreas_M_Hein/publication/322804487_DESIGNING_AS_COMPUTING_TOWARDS_DESIGNING_MACHINES/links/5a70c4690f7e9ba2e1cb08e0/DESIGNING-AS-COMPUTING-TOWARDS-DESIGNING-MACHINES.pdf">Towards Designing Machines</a> -- survey of theory and approaches to building machines that can design things.</li>
<li>
<a href="http://timharford.com/2018/02/review-of-the-tyranny-of-metrics-by-jerry-muller/">Review of the Tyranny of Metrics</a> (Tim Hartford) -- <i>Rather than rely on the informed judgment of people familiar with the situation, we gather meaningless numbers at great cost. We then use them to guide our actions, predictably causing unintended damage.</i>
</li>
<li>
<a href="https://physicstravelguide.com/">Physics Travel Guide</a> -- <i>a tool that makes learning physics easier. Each page here contains three layers which contain explanations with increasing level of sophistication. We call these layers: layman, student and researcher. These layers make sure that readers can always find an explanation they understand.</i> One of these for security or coding would be interesting.</li>
<li>
<a href="https://github.com/automerge/automerge">Automerge</a> -- <i>A JSON-like data structure that can be modified concurrently by different users, and merged again automatically.</i>
</li>
</ol>
<p>Nat Torkington</p>
<h1>Graphs as the front end for machine learning</h1>
<p>2018-02-15</p>
<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/2000px-12-cube_crop-2714d6bfd0e16deb395685c9d64fff2f.jpg'/></p><p><em>The O’Reilly Data Show Podcast: Leo Meyerovich on building large-scale, interactive applications that enable visual investigations.</em></p><p>In this episode of the <a href="https://www.oreilly.com/ideas/topics/oreilly-data-show-podcast">Data Show</a>, I spoke with <a href="https://lmeyerov.github.io/">Leo Meyerovich</a>, co-founder and CEO of <a href="http://www.graphistry.com/">Graphistry</a>. Graphs have always been part of the big data revolution (think of the large graphs generated by the early social media startups). In recent months, I’ve come across companies releasing and using new tools for creating, storing, and (most importantly) analyzing large graphs. There are many problems and use cases that lend themselves naturally to graphs, and recent advances in hardware and software building blocks have made large-scale analytics possible.</p>
<p>Starting with <a href="http://ftl.eecs.berkeley.edu/">his work as a graduate student</a> at UC Berkeley, Meyerovich has pioneered the combination of hardware and software acceleration to create truly interactive environments for visualizing large amounts of data. Graphistry has built a suite of tools that enables analysts to wade through large data sets and investigate business and security incidents. The company is currently focused on the security domain—where it turns out that graph representations of data are things security analysts are quite familiar with.</p>
<p>Ben Lorica</p>
<h1>Four short links: 15 February 2018</h1>
<p>2018-02-15</p>
<p><em>Donut Drones, Consensus Algorithms, 2FA Spam, and Replacing Founders</em></p><ol>
<li>
<a href="https://spectrum.ieee.org/automaton/robotics/drones/cleo-robotics-demonstrates-uniquely-clever-ducted-fan-drone">Donut Drone</a> (IEEE) -- clever drone that is collision-safe. Nice!</li>
<li>
<a href="https://hackernoon.com/a-hitchhikers-guide-to-consensus-algorithms-d81aae3eb0e3">Hitchhiker's Guide to Consensus Algorithms</a> -- <i>In the world of crypto, consensus algorithms exist to prevent double spending. Here’s a quick rundown on some of the most popular consensus algorithms to date, from blockchains to DAGs and everything in-between.</i>
</li>
<li>
<a href="https://mashable.com/2018/02/14/facebook-spam-2fa/#0CE_HHOsZiq6">Facebook Spamming Users via Their 2FA Numbers</a> (Mashable) -- when your profits are proportional to engagement, your business model turns your business into a junkie. It will cajole, stalk, berate, and trap users to feed its engagement addiction.</li>
<li>
<a href="https://hbr.org/2018/02/research-what-happens-to-a-startup-when-venture-capitalists-replace-the-founder">What Happens When Startups Replace The Founder?</a> (HBR) -- about 20% are replaced; noncompete laws help/hinder recruitment; it's overall beneficial; startups perform better when the founder leaves the company; raising external funding raises the probability that the founder will be replaced.</li>
</ol>
<p>Nat Torkington</p>
<h1>Working with data in the financial industry: Legal considerations</h1>
<p>2018-02-14</p>
<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/3217859975_eab5dde770_o_crop-2d048adafd821e5e6ce0241de76d635c.jpg'/></p><p><em>Alysa Hutnik discusses the Fair Credit Reporting Act, the Equal Credit Opportunity Act, the Gramm-Leach-Bliley Act, and the FTC’s focus on FinTech.</em></p>
<p>Alysa Z. Hutnik</p>
<h1>Building deep learning neural networks using TensorFlow layers</h1>
<p>2018-02-14</p>
<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/mnistexamples_crop-a96896fe8fa3661109fdb86e638b5b9a.jpg'/></p><p><em>A step-by-step tutorial on how to use TensorFlow to build a multi-layered convolutional network.</em></p><p><a href="https://en.wikipedia.org/wiki/Deep_learning">Deep learning</a> has proven its effectiveness in many fields, such as computer vision, natural language processing (NLP), text translation, or speech to text. It takes its name from the large number of layers used to build the neural network performing machine learning tasks. There are several types of layers as well as overall network architectures, but the general rule holds that the deeper the network is, the more complexity it can grasp. This article will explain fundamental concepts of neural network layers and walk through the process of creating several types using <a href="https://www.tensorflow.org/">TensorFlow</a>.</p>
<p>TensorFlow is the platform that contributed to making artificial intelligence (AI) available to the broader public. It’s an open source library with a vast community and great support. TensorFlow provides a set of tools for building neural network architectures, and then training and serving the models. It offers different levels of abstraction, so you can use it for cut-and-dried machine learning processes at a high level or go more in-depth and write the low-level calculations yourself.</p>
<p>TensorFlow offers many kinds of layers in its <a href="https://www.tensorflow.org/api_docs/python/tf/layers">tf.layers</a> package. The module makes it easy to create a layer in the deep learning model without going into many details. At the moment, it supports types of layers used mostly in <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">convolutional networks</a>. For other types of networks, like RNNs, you may need to look at <a href="https://www.tensorflow.org/api_docs/python/tf/contrib/rnn">tf.contrib.rnn</a> or <a href="https://www.tensorflow.org/api_docs/python/tf/nn">tf.nn</a>. The most basic type of layer is the <a href="https://en.wikipedia.org/wiki/Multilayer_perceptron">fully connected</a> one. To implement it, you only need to set up the input and the size in the <a href="https://www.tensorflow.org/api_docs/python/tf/layers/Dense">Dense class</a>. Other kinds of layers might require more parameters, but they are implemented in a way to cover the default behaviour and spare the developers’ time.</p>
<p>There is some disagreement on what a layer is and what it is not. One opinion states that a layer must store trained parameters (like weights and biases). This means, for instance, that applying the activation function is not another layer. Indeed, <code>tf.layers</code> implements such a function by using the activation parameter. Layers introduced in the module don’t always strictly follow this rule, though. You can find a large range of types there: fully connected, convolution, pooling, flatten, batch normalization, dropout, and convolution transpose. It may seem that, for example, layer flattening and max pooling don’t store any parameters trained in the learning process. Nonetheless, they perform more complex operations than an activation function, so the authors of the module decided to set them up as separate classes. Later in the article, we’ll discuss how to use some of them to build a deep convolutional network.</p>
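<p>For example, a fully connected layer with a built-in activation takes a single call (a minimal sketch; the placeholder shape and layer size are arbitrary):</p>
<pre data-code-language="python" data-type="programlisting">
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
# the activation is passed as a parameter rather than added as a separate layer
hidden = tf.layers.dense(inputs=x, units=128, activation=tf.nn.relu)
</pre>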
<p>A typical convolutional network is a sequence of convolution and pooling pairs, followed by a few fully connected layers. A convolution is like a small neural network that is applied repeatedly, once at each location on its input. As a result, the network layers become much smaller but increase in depth. <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling">Pooling</a> is the operation that usually decreases the size of the input image. Max pooling is the most common pooling algorithm, and has proven to be effective in many computer vision tasks.</p>
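<p>A single convolution/pooling pair from such a sequence might look like this sketch (the filter count and kernel size are illustrative):</p>
<pre data-code-language="python" data-type="programlisting">
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 28, 28, 1])
# 32 filters of size 5x5, applied at every location of the input
conv = tf.layers.conv2d(inputs=images, filters=32, kernel_size=[5, 5],
                        padding='same', activation=tf.nn.relu)
# 2x2 max pooling halves the spatial dimensions: 28x28 becomes 14x14
pool = tf.layers.max_pooling2d(inputs=conv, pool_size=[2, 2], strides=2)
</pre>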
<p>In this article, I’ll show the use of TensorFlow in applying a convolutional network to image processing, using the <a href="http://yann.lecun.com/exdb/mnist/">MNIST data set</a> for our example. The task is to recognize a digit ranging from 0 to 9 from its handwritten representation.</p>
<p>First, TensorFlow has the capabilities to load the data. All you need to do is to use the <code>input_data</code> module:</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="kn">from</code> <code class="nn">tensorflow.examples.tutorials.mnist</code> <code class="kn">import</code> <code class="n">input_data</code>
<code class="n">mnist</code> <code class="o">=</code> <code class="n">input_data</code><code class="o">.</code><code class="n">read_data_sets</code><code class="p">(</code><code class="n">folder_path</code><code class="p">,</code> <code class="n">one_hot</code><code class="o">=</code><code class="bp">True</code><code class="p">)</code>
</pre>
<p>We are now going to build a multilayered architecture. After describing the learning process, I’ll walk you through the creation of different kinds of layers and apply them to the MNIST classification task.</p>
<p>The training process works by optimizing the loss function, which measures the difference between the network predictions and actual labels' values. Deep learning often uses a technique called <a href="https://en.wikipedia.org/wiki/Cross_entropy">cross entropy</a> to define the loss.</p>
<p>TensorFlow provides the function called <a href="https://www.tensorflow.org/api_docs/python/tf/losses/softmax_cross_entropy">tf.losses.softmax_cross_entropy</a> that internally applies the <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax</a> algorithm on the model's unnormalized prediction and sums results across all classes. In our example, we use the <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam">Adam optimizer</a> provided by the <code>tf.train</code> API. <code>labels</code> will be provided in the process of training and testing, and will represent the underlying truth. <code>output</code> represents the network predictions and will be defined in the next section when building the network.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">loss</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">losses</code><code class="o">.</code><code class="n">softmax_cross_entropy</code><code class="p">(</code><code class="n">labels</code><code class="p">,</code> <code class="n">output</code><code class="p">)</code>
<code class="n">train_step</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">train</code><code class="o">.</code><code class="n">AdamOptimizer</code><code class="p">(</code><code class="mf">1e-4</code><code class="p">)</code><code class="o">.</code><code class="n">minimize</code><code class="p">(</code><code class="n">loss</code><code class="p">)</code>
</pre>
<p>To evaluate the performance of the training process, we want to compare the output with the real labels and calculate the accuracy:</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">correct_prediction</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">equal</code><code class="p">(</code><code class="n">tf</code><code class="o">.</code><code class="n">argmax</code><code class="p">(</code><code class="n">output</code><code class="p">,</code> <code class="mi">1</code><code class="p">),</code> <code class="n">tf</code><code class="o">.</code><code class="n">argmax</code><code class="p">(</code><code class="n">labels</code><code class="p">,</code> <code class="mi">1</code><code class="p">))</code>
<code class="n">accuracy</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">reduce_mean</code><code class="p">(</code><code class="n">tf</code><code class="o">.</code><code class="n">cast</code><code class="p">(</code><code class="n">correct_prediction</code><code class="p">,</code> <code class="n">tf</code><code class="o">.</code><code class="n">float32</code><code class="p">))</code>
</pre>
<p>Now, we’ll introduce a simple training process using batches and a fixed number of steps and learning rate. For the MNIST data set, the <code>next_batch</code> function would just call <code>mnist.train.next_batch</code>. After the network is trained, we can check its performance on the test data.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="c1"># Open the session</code>
<code class="n">sess</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">InteractiveSession</code><code class="p">()</code>
<code class="n">sess</code><code class="o">.</code><code class="n">run</code><code class="p">(</code><code class="n">tf</code><code class="o">.</code><code class="n">global_variables_initializer</code><code class="p">())</code>
<code class="k">for</code> <code class="n">i</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">steps</code><code class="p">):</code>
<code class="c1"># Get the next batch</code>
<code class="n">input_batch</code><code class="p">,</code> <code class="n">labels_batch</code> <code class="o">=</code> <code class="n">next_batch</code><code class="p">(</code><code class="mi">100</code><code class="p">)</code>
<code class="n">feed_dict</code> <code class="o">=</code> <code class="p">{</code><code class="n">x_input</code><code class="p">:</code> <code class="n">input_batch</code><code class="p">,</code> <code class="n">y_labels</code><code class="p">:</code> <code class="n">labels_batch</code><code class="p">}</code>
<code class="c1"># Print the current batch accuracy every 100 steps</code>
<code class="k">if</code> <code class="n">i</code><code class="o">%</code><code class="mi">100</code> <code class="o">==</code> <code class="mi">0</code><code class="p">:</code>
<code class="n">train_accuracy</code> <code class="o">=</code> <code class="n">accuracy</code><code class="o">.</code><code class="n">eval</code><code class="p">(</code><code class="n">feed_dict</code><code class="o">=</code><code class="n">feed_dict</code><code class="p">)</code>
<code class="k">print</code><code class="p">(</code><code class="s2">"Step </code><code class="si">%d</code><code class="s2">, training batch accuracy </code><code class="si">%g</code><code class="s2">"</code><code class="o">%</code><code class="p">(</code><code class="n">i</code><code class="p">,</code> <code class="n">train_accuracy</code><code class="p">))</code>
<code class="c1"># Run the optimization step</code>
<code class="n">train_step</code><code class="o">.</code><code class="n">run</code><code class="p">(</code><code class="n">feed_dict</code><code class="o">=</code><code class="n">feed_dict</code><code class="p">)</code>
<code class="c1"># Print the test accuracy once the training is over</code>
<code class="k">print</code><code class="p">(</code><code class="s2">"Test accuracy: </code><code class="si">%g</code><code class="s2">"</code><code class="o">%</code><code class="n">accuracy</code><code class="o">.</code><code class="n">eval</code><code class="p">(</code><code class="n">feed_dict</code><code class="o">=</code><code class="p">{</code><code class="n">x_input</code><code class="p">:</code> <code class="n">test_images</code><code class="p">,</code> <code class="n">y_labels</code><code class="p">:</code> <code class="n">test_labels</code><code class="p">}))</code>
</pre>
<p>For the actual training, let's start simple and create the network with just one output layer. We begin by defining placeholders for the input data and labels. During the training phase, they will be filled with data from the MNIST data set. Because the data was flattened, the input layer has only one dimension. The size of the output layer corresponds to the number of labels. Both input and labels have an additional dimension set to <code>None</code>, which accommodates a variable number of examples.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="nb">input</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">placeholder</code><code class="p">(</code><code class="n">tf</code><code class="o">.</code><code class="n">float32</code><code class="p">,</code> <code class="p">[</code><code class="bp">None</code><code class="p">,</code> <code class="n">image_size</code><code class="o">*</code><code class="n">image_size</code><code class="p">])</code>
<code class="n">labels</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">placeholder</code><code class="p">(</code><code class="n">tf</code><code class="o">.</code><code class="n">float32</code><code class="p">,</code> <code class="p">[</code><code class="bp">None</code><code class="p">,</code> <code class="n">labels_size</code><code class="p">])</code>
</pre>
<p>Now is the time to build the exciting part: the output layer. The mechanics behind it are quite straightforward. Every neuron in it has its own weight and bias parameters, receives data from every input, and performs a calculation. This is what makes it a fully connected layer.</p>
<p>TensorFlow’s <code>tf.layers</code> package allows you to formulate all this in just one line of code. All you need to provide is the input and the size of the layer.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">output</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dense</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="nb">input</code><code class="p">,</code> <code class="n">units</code><code class="o">=</code><code class="n">labels_size</code><code class="p">)</code></pre>
<p>Our first network isn't that impressive in regard to accuracy. But it’s simple, so it runs very fast.</p>
<p>We’ll try to improve our network by adding more layers between the input and output. These are called hidden layers. First, we add another fully connected one.</p>
<p>Some minor changes are needed from the previous architecture. First of all, there is another parameter indicating the number of neurons of the hidden layer. The definition itself takes the input data and connects to the output layer:</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">hidden</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dense</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="nb">input</code><code class="p">,</code> <code class="n">units</code><code class="o">=</code><code class="mi">1024</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="n">tf</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">relu</code><code class="p">)</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dense</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">hidden</code><code class="p">,</code> <code class="n">units</code><code class="o">=</code><code class="n">labels_size</code><code class="p">)</code>
</pre>
<p>Notice that this time, we used an activation parameter. It runs whatever comes out of the neuron through the activation function, which in this case is <a href="https://en.wikipedia.org/wiki/Rectifier_(neural_networks)">ReLU</a>. This activation function has proven to work well in deep architectures.</p>
<p>You should see a slight decrease in training speed. Our network is becoming deeper, which means it has more parameters to be tuned, and this makes the training process longer. On the other hand, the extra layer improves the accuracy significantly, to around the 94% level.</p>
<p>The next two layers we're going to add are the integral parts of convolutional networks. They work differently from the dense ones and perform especially well with input that has two or more dimensions (such as images). The parameters of the convolutional layer are the size of the convolution window and the number of filters. A padding set of <code>same</code> indicates that the resulting layer is of the same size. After this step, we apply max pooling.</p>
<p>Using convolution allows us to take advantage of the 2D representation of the input data. We lost it when we flattened the digit pictures and fed the resulting data into the dense layer. To go back to the original structure, we can use the <code>tf.reshape</code> function.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">input2d</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">reshape</code><code class="p">(</code><code class="nb">input</code><code class="p">,</code> <code class="p">[</code><code class="o">-</code><code class="mi">1</code><code class="p">,</code><code class="n">image_size</code><code class="p">,</code><code class="n">image_size</code><code class="p">,</code><code class="mi">1</code><code class="p">])</code></pre>
<p>The code for convolution and max pooling follows. Notice that for the next connection with the dense layer, the output must be flattened back.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">conv1</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">conv2d</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">input2d</code><code class="p">,</code> <code class="n">filters</code><code class="o">=</code><code class="mi">32</code><code class="p">,</code> <code class="n">kernel_size</code><code class="o">=</code><code class="p">[</code><code class="mi">5</code><code class="p">,</code> <code class="mi">5</code><code class="p">],</code> <code class="n">padding</code><code class="o">=</code><code class="s2">"same"</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="n">tf</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">relu</code><code class="p">)</code>
<code class="n">pool1</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">max_pooling2d</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">conv1</code><code class="p">,</code> <code class="n">pool_size</code><code class="o">=</code><code class="p">[</code><code class="mi">2</code><code class="p">,</code> <code class="mi">2</code><code class="p">],</code> <code class="n">strides</code><code class="o">=</code><code class="mi">2</code><code class="p">)</code>
<code class="n">pool_flat</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">reshape</code><code class="p">(</code><code class="n">pool1</code><code class="p">,</code> <code class="p">[</code><code class="o">-</code><code class="mi">1</code><code class="p">,</code> <code class="mi">14</code> <code class="o">*</code> <code class="mi">14</code> <code class="o">*</code> <code class="mi">32</code><code class="p">])</code>
<code class="n">hidden</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dense</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code> <code class="n">pool_flat</code><code class="p">,</code> <code class="n">units</code><code class="o">=</code><code class="mi">1024</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="n">tf</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">relu</code><code class="p">)</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dense</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">hidden</code><code class="p">,</code> <code class="n">units</code><code class="o">=</code><code class="n">labels_size</code><code class="p">)</code>
</pre>
<p>Adding the convolution to the picture increases the accuracy even more (to 97%), but slows down the training process significantly. To take full advantage of the model, we should continue with another layer. We again use the 2D input, but flatten only the output of the second pooling layer. The first layer doesn't need flattening now because the convolution works directly with the higher-dimensional data.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">conv2</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">conv2d</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">pool1</code><code class="p">,</code> <code class="n">filters</code><code class="o">=</code><code class="mi">64</code><code class="p">,</code> <code class="n">kernel_size</code><code class="o">=</code><code class="p">[</code><code class="mi">5</code><code class="p">,</code> <code class="mi">5</code><code class="p">],</code> <code class="n">padding</code><code class="o">=</code><code class="s2">"same"</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="n">tf</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">relu</code><code class="p">)</code>
<code class="n">pool2</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">max_pooling2d</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">conv2</code><code class="p">,</code> <code class="n">pool_size</code><code class="o">=</code><code class="p">[</code><code class="mi">2</code><code class="p">,</code> <code class="mi">2</code><code class="p">],</code> <code class="n">strides</code><code class="o">=</code><code class="mi">2</code><code class="p">)</code>
<code class="n">pool_flat</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">reshape</code><code class="p">(</code><code class="n">pool2</code><code class="p">,</code> <code class="p">[</code><code class="o">-</code><code class="mi">1</code><code class="p">,</code> <code class="mi">7</code> <code class="o">*</code> <code class="mi">7</code> <code class="o">*</code> <code class="mi">64</code><code class="p">])</code>
</pre>
<p>At this point, you need to be quite patient when running the code. The complexity of the network is adding a lot of overhead, but we are rewarded with better accuracy.</p>
<p>We'll now introduce another technique that could improve the network performance and avoid overfitting. It's called <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network#Dropout">Dropout</a>, and we’ll apply it to the hidden dense layer. With Dropout, individual nodes are either shut down or kept with some explicit probability. It is used in the training phase, so remember you need to turn it off when evaluating your network.</p>
<p>To use Dropout, we need to change the code slightly. First of all, we need a placeholder to be used in both the training and testing phases to hold the probability of the Dropout.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">should_drop</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">placeholder</code><code class="p">(</code><code class="n">tf</code><code class="o">.</code><code class="n">bool</code><code class="p">)</code></pre>
<p>Second, we need to define the dropout and connect it to the output layer. The rest of the architecture stays the same.</p>
<pre data-type="programlisting" data-code-language="python" data-highlighted="true">
<code class="n">hidden</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dense</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">pool_flat</code><code class="p">,</code> <code class="n">units</code><code class="o">=</code><code class="mi">1024</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="n">tf</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">relu</code><code class="p">)</code>
<code class="n">dropout</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dropout</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">hidden</code><code class="p">,</code> <code class="n">rate</code><code class="o">=</code><code class="mf">0.5</code><code class="p">,</code> <code class="n">training</code><code class="o">=</code><code class="n">should_drop</code><code class="p">)</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">tf</code><code class="o">.</code><code class="n">layers</code><code class="o">.</code><code class="n">dense</code><code class="p">(</code><code class="n">inputs</code><code class="o">=</code><code class="n">dropout</code><code class="p">,</code> <code class="n">units</code><code class="o">=</code><code class="n">labels_size</code><code class="p">)</code>
</pre>
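<p>Because the Dropout layer reads <code>should_drop</code> at run time, the placeholder has to be fed on every <code>run</code> or <code>eval</code> call: <code>True</code> while training, <code>False</code> while evaluating. A short sketch, reusing the feed dictionaries from the training loop above:</p>
<pre data-type="programlisting" data-code-language="python">
# Dropout active during the optimization steps...
train_feed = {input: input_batch, labels: labels_batch, should_drop: True}
train_step.run(feed_dict=train_feed)

# ...and switched off when measuring accuracy.
test_feed = {input: test_images, labels: test_labels, should_drop: False}
print("Test accuracy: %g" % accuracy.eval(feed_dict=test_feed))
</pre>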
<p>In this article, we started by introducing the concepts of deep learning and used TensorFlow to build a multi-layered convolutional network. The code can be reused for image recognition tasks and applied to any data set. More complex images, however, would require greater depth as well as more sophisticated twists, such as inception or ResNets.</p>
<p>The key lesson from this exercise is that you don’t need to master statistical techniques or write complex matrix multiplication code to create an AI model. TensorFlow can handle those for you. However, you need to know which algorithms are appropriate for your data and application, and determine the best hyperparameters, such as network architecture, depth of layers, batch size, learning rate, etc. Be aware that the variety of choices that libraries like TensorFlow give you requires a lot of responsibility on your side.</p>
<p><em>This post is a collaboration between O'Reilly and TensorFlow. </em><a href="http://www.oreilly.com/about/editorial_independence.html"><em>See our statement of editorial independence</em></a><em>.</em></p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/building-deep-learning-neural-networks-using-tensorflow-layers'>Building deep learning neural networks using TensorFlow layers.</a></p>Barbara Fusinskahttps://www.oreilly.com/ideas/building-deep-learning-neural-networks-using-tensorflow-layersModern data is continuous, diverse, and ever accelerating2018-02-14T14:00:00Ztag:www.oreilly.com,2018-02-14:/ideas/modern-data-is-continuous-diverse-and-ever-accelerating<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/water-supply-2328260_1920-3c56696f51c7cda75a481632c4fecdeb.jpg'/></p><p><em>How companies such as athenahealth can transform legacy data into insights.</em></p><p>Application development in the big data space has changed rapidly over the past decade, driven by three factors: the need for continuous availability, the richness of diverse data processing pipelines, and the pressure for accelerated development. Here we look at each of these factors and how they combine to require new environments for data processing. We also look at how one company, <a href="https://www.athenahealth.com/">athenahealth</a>, is adjusting its legacy systems, for billing, scheduling, and treatment for health care providers, to accommodate these trends, <a href="https://mesosphere.com/blog/dcos-athenahealth/">using the Mesosphere DC/OS platform</a>. </p>
<h2>Continuous availability: Data is always on</h2>
<p>This is a world where people could be working or visiting your site at three in the morning, and if it's unavailable or slow you'll be hearing complaints. Failure recovery and scaling are both critical capabilities. These used to be handled separately: failure recovery revolved around heartbeats and redundant resources, whereas scaling was an element of long-term planning by IT management. But the two capabilities are now handled through the same kinds of operations. This makes sense, because they both require monitoring and resource management.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/modern-data-is-continuous-diverse-and-ever-accelerating'>Modern data is continuous, diverse, and ever accelerating.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/FrgIu5VJEbI" height="1" width="1" alt=""/>Andy Oramhttps://www.oreilly.com/ideas/modern-data-is-continuous-diverse-and-ever-acceleratingFour short links: 14 February 20182018-02-14T11:45:00Ztag:www.oreilly.com,2018-02-14:/ideas/four-short-links-14-february-2018<p><em>CS Ethics, Experience the Retail Struggle, Front-End Interview Handbook, and Label Shift</em></p><ol>
<li>
<a href="https://www.nytimes.com/2018/02/12/business/computer-science-ethics-courses.html">New CS Ethics Courses</a> (NYT) -- Harvard, MIT, Stanford, and UT Austin all offering ethics classes around the challenges that computer scientists and programmers face as they research and develop the future.</li>
<li>
<a href="https://www.bloomberg.com/features/american-mall-game/">American Mall</a> -- Bloomberg's mock-retro game to illustrate the difficulties of keeping American retail malls open. I'm a huge fan of using games to let people experience/simulate a situation.</li>
<li>
<a href="https://github.com/yangshun/front-end-interview-handbook">Front End Interview Handbook</a> -- Answers to front-end interview questions. I can’t begin to imagine the rate of change in this repository. </li>
<li>
<a href="https://arxiv.org/abs/1802.03916">Detecting and Correcting for Label Shift with Black Box Predictors</a> -- <i>Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels.</i> Nice. For you discover that your training set underrepresented one of the variables. (Their example is: trained on a data set with .2% pneumonia occurrence but now you learn that pneumonia has 5% prevalence in the population.)</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-14-february-2018'>Four short links: 14 February 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-14-february-2018Just released: 30+ new live online trainings on O'Reilly's learning platform2018-02-14T11:00:00Ztag:www.oreilly.com,2018-02-14:/ideas/just-released-30-plus-new-live-online-trainings-on-oreillys-learning-platform<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/oreilly-insights-keyboard-paper-desk-crop-cb1e21b08e0b1da78c63b7fb1676d9cb.jpg'/></p><p><em>Get instructor-led training in Python, React, PMP, security, and more.</em></p><p>We just opened up more than <a href="https://www.safaribooksonline.com/live-training/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">30 new live online trainings for February, March, and April on our learning platform</a>.</p>
<p>These trainings give you hands-on instruction from expert practitioners in critical topics.</p>
<p>Space is limited and these trainings often fill up.</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/reactive-spring-and-spring-boot/0636920161592?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Reactive Spring and Spring Boot</a></em>, February 27</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/python-beyond-the-basics/0636920161882?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Python: Beyond the Basics</a></em>, March 1-2</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/having-difficult-conversations/0636920156390?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Having Difficult Conversations</a></em>, March 2</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/python-the-next-level/0636920162018?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Python: The Next Level</a></em>, March 5-6</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/python-for-applications-beyond-scripts/0636920162155?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Python for Applications: Beyond Scripts</a></em>, March 7-8</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/medium-r-programming-beyond-the-basics/0636920158622?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Medium R Programming: Beyond the Basics</a></em>, March 8-9</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/building-effective-and-adaptive-teams/0636920155003?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Building Effective and Adaptive Teams</a></em>, March 12</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/getting-started-with-react-and-redux/0636920159506?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Getting Started with React and Redux</a></em>, March 12</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/pythonic-object-oriented-programming/0636920162278?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Pythonic Object-Oriented Programming</a></em>, March 12</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/test-driven-development-in-python/0636920162377?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Test-Driven Development in Python</a></em>, March 13</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/building-a-deployment-pipeline-with-jenkins-2/0636920158875?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Building a Deployment Pipeline with Jenkins 2</a></em>, March 14-15</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/python-beyond-the-basics/0636920161943?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Python: Beyond the Basics</a></em>, March 14-15</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/intro-to-deep-learning-part-1-theory-and-practice-featuring-keras/0636920140351?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Intro to Deep Learning Part 1: Theory and Practice Featuring Keras</a></em>, March 19</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/networking-in-aws/0636920156840?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Networking in AWS</a></em>, March 19-20</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/python-the-next-level/0636920162087?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Python: The Next Level</a></em>, March 19-20</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/migrating-jenkins-environments-to-jenkins-2/0636920159025?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Migrating Jenkins Environments to Jenkins 2</a></em>, March 19 and 21</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/java-8-generics-in-3-hours/0636920159766?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Java 8 Generics in 3 Hours</a></em>, March 20</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/python-for-applications-beyond-scripts/0636920162216?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Python for Applications: Beyond Scripts</a></em>, March 21-22</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/apache-hadoop-spark-and-big-data-foundations/0636920161714?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Apache Hadoop, Spark and Big Data Foundations</a></em>, March 23</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/pmp-crash-course/0636920159353?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">PMP Crash Course</a></em>, March 26-27</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/pythonic-object-oriented-programming/0636920162322?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Pythonic Object-Oriented Programming</a></em>, March 28</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/mastering-relational-sql-querying/0636920159476?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Mastering Relational SQL Querying</a></em>, March 28-29</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/test-driven-development-in-python/0636920162414?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Test-Driven Development in Python</a></em>, March 29</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/cyber-security-fundamentals/0636920159520?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Cyber Security Fundamentals</a></em>, March 29-30</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/hands-on-introduction-to-apache-hadoop-and-spark-programming/0636920161769?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Hands-on Introduction to Apache Hadoop and Spark Programming</a></em>, March 29-30</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/design-thinking-practice-and-measurement-essentials/0636920158387?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Design Thinking: Practice and Measurement Essentials</a></em>, April 2</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/from-monolith-to-microservices/0636920141112?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">From Monolith to Microservicess</a></em>, April 4-5</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/design-thinking-90-minute-introduction/0636920137108?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Design Thinking: 90-Minute Introduction</a></em>, April 5</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/building-chatbots-with-aws/0636920156901?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Building Chatbots with AWS</a></em>, April 6</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/deep-learning-for-nlp/0636920159971?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Deep Learning for NLP</a></em>, April 6</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/analyzing-container-performance/0636920140474?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Analyzing Container Performance</a></em>, April 9</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/architecture-without-an-end-state/0636920155263?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Architecture Without an End State</a></em>, April 9-10</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/design-patterns-boot-camp/0636920144984?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Design Patterns Boot Camp</a></em>, April 9-10</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/working-with-web-push-notifications/0636920150404?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Working with Web Push Notifications</a></em>, April 10-11</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/high-performance-machine-learning-and-data-analysis-with-julia/0636920159827?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">High Performance Machine Learning and Data Analysis with Julia</a></em>, April 12</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/from-developer-to-software-architect/0636920155249?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">From Developer to Software Architect</a></em>, April 16-17</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/microservices-architecture-and-design/0636920138464?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Microservices Architecture and Design</a></em>, April 16-17</p>
<p><em><a href="https://www.safaribooksonline.com/live-training/courses/fundamental-postgresql/0636920144144?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Fundamental PostgreSQL</a></em>, April 17-18</p>
<p><a href="https://www.safaribooksonline.com/live-training/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=021418-live-online-training-announcement">Visit our learning platform</a> for more information on these and our other live online trainings.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/just-released-30-plus-new-live-online-trainings-on-oreillys-learning-platform'>Just released: 30+ new live online trainings on O'Reilly's learning platform.</a></p>https://www.oreilly.com/ideas/just-released-30-plus-new-live-online-trainings-on-oreillys-learning-platformThe fundamentals of voice design by way of voice enabling2018-02-13T15:35:00Ztag:www.oreilly.com,2018-02-13:/ideas/the-fundamentals-of-voice-design-by-way-of-voice-enabling<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/intonarumori_1913_crop-be0c01e9c69469b9da0dc7ccd2b0bc33.jpg'/></p><p><em>Learn how to integrate voice with your product.</em></p><p>Continue reading <a href='https://www.oreilly.com/ideas/the-fundamentals-of-voice-design-by-way-of-voice-enabling'>The fundamentals of voice design by way of voice enabling.</a></p>Tanya Kraljichttps://www.oreilly.com/ideas/the-fundamentals-of-voice-design-by-way-of-voice-enablingIterative data modeling to avoid dreaded ETL2018-02-13T15:30:00Ztag:www.oreilly.com,2018-02-13:/ideas/iterative-data-modeling-to-avoid-dreaded-etl<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/glass-ball-2152952_1920_crop-75a3bfc56b4827ec24674f048956d550.jpg'/></p><p><em>Gain agility by loading first and transforming later.</em></p><p>In today’s world full of big data, every large enterprise faces a similar problem: how do I leverage my data more effectively when it’s spread across dozens or even hundreds of systems? Businesses build mission-critical business applications on relational databases filled with structured data, and they also have unstructured data to worry about (think patient notes, photos, reports, etc.). They want to get a better grasp of all this data, and build new applications that leverage it to innovate and better serve their customers.</p>
<h2>The ETL problem</h2>
<p>Integrating data from various silos into a relational database requires significant investment in the extract, transform, load (ETL) phase of any data project. Before building an application that leverages integrated data, data architects must first reconcile all of the data in their source systems, finalizing the schema before the data can be ingested. This data modeling effort may take years. And additional effort will be necessary with each change in an input system's data schema or in application requirements.</p>
<p>This approach is not agile, and in today’s world, it means that a business constantly plays catch-up. Not to mention, ETL tools and the work that goes into using them can eat up 60% of a project’s budget, despite providing little additional value (see the report <a href="http://download.101com.com/tdwi/research_report/2003ETLReport.pdf">TDWI, Evaluating ETL and Data Integration Platforms</a>). The meaningful work of building an application and delivering actual value only begins after all the ETL work is complete.</p>
<h2>The ELT solution</h2>
<p>No, “ELT” is not a typo. Instead of ETL, the flexibility of a document database makes it possible to extract, load...and <em>then</em> transform (hence, "ELT"). This process, known as “schema-on-read” (<a href="http://www.marklogic.com/blog/schema-on-read-vs-schema-on-write/">instead of the traditional schema-on-write</a>), lets you apply your own lens to the data when you read it back out. So instead of requiring a schema first, before doing anything with your data, you can use the latent schema already present in the data and update that schema later as desired or needed.</p>
<p>That means taking all of your data, from all of your systems—structured and unstructured, however it comes—and ingesting it <em>as is</em>. Developers can start using it immediately to build applications. By loading it into a database that can support different schemas and data types (document, RDF, geospatial, binary, SQL, and text), data architects don’t have to worry about defining the schema, type, or format up front and can focus instead on how to use that data down the line.</p>
<p>Once loaded, it is possible to iteratively make adjustments to the data as needed to address current requirements. Now you can transform that data, harmonize it, and make it usable for your business needs, as you need it. Over time, as requirements and downstream systems change, so might your data transformations. In part three of the recently released <a href="https://info.marklogic.com/transforming-data-part-three-reg.html?type=PD&amp;publisher=OReilly"><em>MarkLogic Cookbook</em></a>, Dave Cassel illustrates a variety of ways to transform and harmonize data in MarkLogic after data has been loaded. In fact, he shows how you can transform around a given field as you load.</p>
<p>Of course, from a governance standpoint, you don’t want to actually change the data. You can use the MarkLogic Envelope Pattern to wrap newly harmonized data around the original data to preserve its original form. You can also transform the data that gets stored in indexes without physically changing the data stored in documents. And, finally, you can use the platform to implement a data-as-a-service pattern, transforming data on export as it is accessed by downstream applications.</p>
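<p>As a rough illustration of the envelope pattern, the harmonized view wraps the untouched source record. The sketch below uses a plain Python dictionary, and the field names are illustrative rather than MarkLogic’s exact schema:</p>
<pre data-type="programlisting" data-code-language="python">
# A sketch of an envelope document: canonical, harmonized fields sit
# alongside the original record, which is preserved verbatim.
envelope = {
    "headers": {"source_system": "billing", "ingested_at": "2018-02-13"},
    "instance": {"patient_name": "Jane Doe"},   # harmonized, canonical form
    "attachments": {"PT_NM": "DOE,JANE"},       # original data, kept as-is
}
</pre>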
<p>Data modeling does not have to be an up-front activity, but rather an iterative one that evolves as the business needs evolve. This iterative data modeling approach allows large enterprises to respond faster to their business needs, reap the benefits of their business data, and cut significant costs.</p>
<p><em>This post is a collaboration between O'Reilly and MarkLogic. </em><a href="http://www.oreilly.com/about/editorial_independence.html"><em>See our statement of editorial independence</em></a><em>.</em></p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/iterative-data-modeling-to-avoid-dreaded-etl'>Iterative data modeling to avoid dreaded ETL.</a></p>Trinh Lieuhttps://www.oreilly.com/ideas/iterative-data-modeling-to-avoid-dreaded-etlHow neural networks learn distributed representations2018-02-13T15:10:00Ztag:www.oreilly.com,2018-02-13:/ideas/how-neural-networks-learn-distributed-representations<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/neurons-1773922_1920_crop-2c7d127d9e231ec8174a2ad52fbe7a9b.jpg'/></p><p><em>Deep learning’s effectiveness is often attributed to the ability of neural networks to learn rich representations of data.</em></p><p>The concept of distributed representations is often central to deep learning, particularly as it applies to natural language tasks. Those beginning in the field may quickly understand this as simply a vector that represents some piece of data. While this is true, understanding distributed representations at a more conceptual level increases our appreciation of the role they play in making deep learning so effective.</p>
<p>To examine different types of representation, we can do a simple thought exercise. Let’s say we have a bunch of “memory units” to store information about shapes. We can choose to represent each individual shape with a single memory unit, as demonstrated in Figure 1.</p>
<figure class="center" id="id-6YOix"><img alt="Sparse or local non-distributed representation of shapes" src="https://d3ansictanv2wj.cloudfront.net/Figure1-5eadbb96f1ac8c2b926b58d5a940f644.png"><figcaption><span class="label">Figure 1. </span>Sparse or local, non-distributed representation of shapes. Image by Garrett Hoffman.</figcaption></figure>
<p>This non-distributed representation, referred to as "sparse" or "local," is inefficient in multiple ways. First, the dimensionality of our representation will grow as the number of shapes we observe grows. More importantly, it doesn’t provide any information about how these shapes relate to each other. This is the true value of a distributed representation: its ability to capture meaningful “semantic similarity” between data through concepts.</p>
<figure class="center" id="id-RlWiz"><img alt="Distributed representation of shapes" src="https://d3ansictanv2wj.cloudfront.net/Figure2-9b757823f56b45a64820ab55a65918dd.png"><figcaption><span class="label">Figure 2. </span>Distributed representation of shapes. Image by Garrett Hoffman.</figcaption></figure>
<p>Figure 2 shows a distributed representation of this same set of shapes where information about the shape is represented with multiple “memory units” for concepts related to orientation and shape. Now the “memory units” contain information both about an individual shape and how each shape relates to each other. When we come across a new shape with our distributed representation, such as the circle in Figure 3, we don’t increase the dimensionality and we also know some information about the circle, as it relates to the other shapes, even though we haven’t seen it before.</p>
<figure class="center" id="id-RN4iK"><img alt="Distributed representation of a circle" src="https://d3ansictanv2wj.cloudfront.net/Figure3-aa3da219493b848c894ee14c49bef24c.png"><figcaption><span class="label">Figure 3. </span>Distributed representation of a circle; This representation is more useful as it provides us with information about how this new shape is related to our other shapes. Image by Garrett Hoffman.</figcaption></figure>
<p>While this shape example is oversimplified, it serves as a great high-level, abstract introduction to distributed representations. Notice, in the case of our distributed representation for shapes, that we selected four concepts or features (vertical, horizontal, rectangle, ellipse) for our representation. In this case, we were required to know what these important and distinguishing features were beforehand, and in many cases, this is a difficult or impossible thing to know. It is for this reason that feature engineering is such a crucial task in classical machine learning techniques. Finding a good representation of our data is critical to the success of downstream tasks like classification or clustering. One of the reasons that deep learning has seen tremendous success is a neural network's ability to learn rich distributed representations of data.</p>
<p>To examine this, we will revisit the <a href="https://www.oreilly.com/ideas/introduction-to-lstms-with-tensorflow">problem we tackled in our LSTM tutorial</a>—predicting stock market sentiment from social media posts from <a href="https://stocktwits.com/">StockTwits</a>. In this tutorial, we built a multi-layered LSTM to predict the sentiment of a message from the raw body of text. When processing our message data, we created a mapping of our vocabulary to an integer index.</p>
<p>This mapping of vocabulary to integer is a non-distributed sparse representation of our data. For instance, the word <code>buy</code> is mapped to index <code>25</code> and the word <code>long</code> is represented as index <code>68</code>. Note that this is an equivalent representation to a “one-hot encoded” vector of length <code>vocab_size</code> with a 1 in the index representing the word and a 0 everywhere else. These are two independent representations that have no relational information between the two words despite their semantic similarity when it comes to investing—both words represent a position of owning a stock.</p>
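<p>To make the contrast concrete, here is a small sketch; the vocabulary size and embedding dimension are made up for illustration. The one-hot vector for <code>buy</code> carries no hint of its relationship to <code>long</code>, while rows of a learned embedding matrix can capture it.</p>
<pre data-type="programlisting" data-code-language="python">
import numpy as np

vocab = {"buy": 25, "long": 68}    # word-to-index mapping, as above
vocab_size = 100                   # illustrative vocabulary size

# Sparse representation: a 1 at the word's index, 0 everywhere else.
one_hot_buy = np.eye(vocab_size)[vocab["buy"]]

# Distributed representation: a dense row per word. After training,
# semantically similar words end up with nearby vectors (here the
# matrix is random, standing in for learned weights).
embeddings = np.random.randn(vocab_size, 3)
dense_buy = embeddings[vocab["buy"]]
</pre>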
<p>The canonical methodology for learning a distributed representation of words is the <a href="https://arxiv.org/pdf/1310.4546.pdf">Word2Vec model</a>. The Word2Vec skip-gram model, whose architecture is pictured in Figure 4, takes in a single word, passes this to a single linear hidden layer unique to that word, and uses a softmax activation layer to predict the words that occur in a window around it.</p>
<figure class="center" id="id-RMkiq"><img alt="Distributed representation of shapes" src="https://d3ansictanv2wj.cloudfront.net/Figure4-cbbef751ee84945c8491c7e896fb8464.png"><figcaption><span class="label">Figure 4. </span>Distributed representation of shapes. Image by Google from <a href="https://arxiv.org/pdf/1310.4546.pdf">“Distributed Representations of Words and Phrases and their Compositionality”</a>, used with permission.</figcaption></figure>
<p>The Word2Vec model uses the J.R. Firth philosophy—“you shall know a word by the company it keeps,” and can be <a href="https://www.tensorflow.org/tutorials/word2vec">implemented very easily in TensorFlow</a>. By learning hidden weights, which will be used as our distributed representation, words that appear in similar contexts will have a similar representation. Word2Vec is a model designed specifically for learning distributed representations of words, also called "word embeddings," from their context. Oftentimes, these embeddings are pretrained with Word2Vec and then used as inputs to other models performing language tasks.</p>
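<p>In TensorFlow, the core of such an embedding layer is a trainable matrix and a lookup. A minimal sketch follows, where <code>vocab_size</code>, <code>embed_dim</code>, and <code>word_ids</code> are assumed names rather than code from the tutorial:</p>
<pre data-type="programlisting" data-code-language="python">
# Trainable embedding matrix: one row of embed_dim floats per vocabulary word.
embedding = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))

# word_ids is an int32 tensor of word indices; the lookup returns their rows.
embed = tf.nn.embedding_lookup(embedding, word_ids)
</pre>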
<p>Alternatively, distributed representations can be learned in an end-to-end fashion as part of the model training process for an arbitrary task. This is how we learned our word embedding in our <a href="https://www.oreilly.com/ideas/introduction-to-lstms-with-tensorflow">stock market sentiment LSTM model</a>. Recall the model architecture (see Figure 5), where we input our sparse representation of words into an embedding layer. </p>
<figure class="center" id="id-6QZiw"><img alt="Unrolled single-layer LSTM network with embedding layer" src="https://d3ansictanv2wj.cloudfront.net/Figure5-bb5d651bc4fda2133127d0478aae02a0.png"><figcaption><span class="label">Figure 5. </span>Unrolled single-layer LSTM network with embedding layer. Image courtesy of Udacity, used with permission.</figcaption></figure>
<p>Trained under this paradigm, distributed representations will specifically learn to represent items as they relate to the learning task—in our case, the representation should learn semantic context around the sentiment of words. We can examine this by extracting our word embeddings and looking at some examples.</p>
<p>We visualize the relationship between a few “bearish-bullish pairs” by reducing the dimensionality of our representations <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding">using tSNE </a> (see Figure 6). Note the concept of sentiment represented by the left-to-right direction between word pairs (i.e., bearish vs. bullish, overvalued vs. undervalued, short vs. long, etc.).</p>
<figure class="center" id="id-3aBiB"><img alt="Visualization of word embeddings" src="https://d3ansictanv2wj.cloudfront.net/Figure6-9335105503833cebf83ba1db2f6bb76f.png"><figcaption><span class="label">Figure 6. </span>Visualization of word embeddings demonstrating the semantic relationship of the concept of sentiment captured by our distributed representation. Image by Garrett Hoffman.</figcaption></figure>
<p>While these aren’t perfect—ideally, we would want to see the pairings more vertically aligned, and we also have some pairs where sentiment is reversed—they are pretty good given limited training. Our model's ability to learn this type of representation is a major reason it is able to achieve high accuracy when predicting sentiment.</p>
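<p>The projection behind Figure 6 is compact with scikit-learn. A sketch, assuming <code>embeddings</code> holds the word-embedding matrix extracted from the trained model:</p>
<pre data-type="programlisting" data-code-language="python">
from sklearn.manifold import TSNE

# Project the high-dimensional word vectors down to 2D for plotting.
coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
</pre>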
<p>A neural network's ability to learn distributed representations of data is one of the main reasons that deep learning is so effective for so many different types of problems. The power and beauty of this concept makes representation learning one of the most exciting and active areas of deep learning research. Methods for learning shared representations across multiple modalities (e.g., words and images, words in different languages) are enabling advancements in image captioning and translation. We can be sure that better understanding these types of representations will continue to be a major factor in driving AI forward.</p>
<p><em>This post is a collaboration between O'Reilly and </em><a href="https://www.tensorflow.org/"><em>TensorFlow</em></a><em>. </em><a href="http://www.oreilly.com/about/editorial_independence.html"><em>See our statement of editorial independence</em></a><em>.</em></p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/how-neural-networks-learn-distributed-representations'>How neural networks learn distributed representations.</a></p>Garrett Hoffmanhttps://www.oreilly.com/ideas/how-neural-networks-learn-distributed-representationsFour short links: 13 February 20182018-02-13T13:00:00Ztag:www.oreilly.com,2018-02-13:/ideas/four-short-links-13-february-2018<p><em>Machine Learning, CSP Reporting, Remembering Learning, and Viz for Human Rights</em></p><ol>
<li>
<a href="https://prodi.gy/">Prodigy</a> -- <i>Radically efficient machine teaching. An annotation tool powered by active learning.</i>
</li>
<li>
<a href="https://report-uri.github.io/report-uri-js-demo/">Report URI JS</a> -- contenty security policies are awesome, but they are enforced on the browser before your server sees any requests. Use this script to find out what is being blocked by your CSP. (via <a href="https://boingboing.net/2018/02/11/ic-uh-oh.html#more-572548">BoingBoing</a>)</li>
<li>
<a href="https://dnote.io/blog/writing-everything-i-learn-coding-for-a-month/">I Wrote Down Everything I Learned While Programming for a Month</a> -- I do this and find it hugely valuable. It's one thing to say "I'm learning all the time" but another to actually be able to point to what you're learning.</li>
<li>
<a href="http://visualizingrights.org/kit/">Visualizing Data for Human Rights Advocacy</a> -- <i>A guidebook and workshop activity.</i>
</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-13-february-2018'>Four short links: 13 February 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-13-february-20184 trends in security data science for 20182018-02-13T12:00:00Ztag:www.oreilly.com,2018-02-13:/ideas/4-trends-in-security-data-science-for-2018<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/pendulum-1934311_1920_crop-deeb9a1e9a102abf6b297b630de16442.jpg'/></p><p><em>A glimpse into what lies ahead for response automation, model compliance, and repeatable experiments.</em></p><p>This is the third consecutive year I’ve tried to read the tea leaves for security analytics. <a href="https://www.oreilly.com/ideas/4-trends-in-security-data-science-for-2017">Last year’s trends post</a> manifested well: from a rise in <a href="https://blog.openai.com/adversarial-example-research/">adversarial machine learning</a> (ML) to the deep learning craze (such that <a href="https://deep-learning-security.github.io/">entire</a> <a href="https://www.ieee-security.org/TC/SPW2018/DLS/">conferences</a> are now dedicated to this subject). This year, <a href="https://www.endgame.com/our-experts/hyrum-anderson">Hyrum Anderson</a>, technical director of data science from Endgame, joins me in calling out the trends in security data science for 2018. We present a 360-degree view of the security data science landscape—from unicorn startups to established enterprises.</p>
<p>The format mostly remains the same: four trends, one to map to each quarter of the year. For each trend, we provide a rationale about why the time is right to capitalize on the trend, offer practical tips on what you can do now to join the conversation, and include links to papers, GitHub repositories, tools, and tutorials. We also added a new section, “What won’t happen in 2018,” to help readers look beyond the marketing material and stay clear of hype.</p>
<h2>1. Machine learning for response (semi-)automation</h2>
<p>In 2016, we predicted a shift <a href="https://www.oreilly.com/ideas/4-trends-in-security-data-science">from detection to intelligent investigation</a>. In 2018, we’re predicting a shift from rich investigative information toward distilled recommended actions, backed by information-rich incident reports. Infosec analysts have long stopped clamoring for “more alerts!” from security providers. In the coming year, we’ll see increased customer appetite for products that recommend actions based on solid evidence. Machine learning has, in large part, proven itself a valuable tool for detecting evidence of threats used to compile an incident report. Security professionals subconsciously train themselves to respond to (or ignore) the evidence of an incident in a certain way. The linchpin of scale in information security still rests on the information security analyst, and many of these knee-jerk responses can be automated. In some cases, the response might be ML-automated, but in many others it will be at least ML-recommended.</p>
<h3>Why now?</h3>
<p>The information overload pain point is as old as IDS technology—<a href="https://jisajournal.springeropen.com/articles/10.1186/1869-0238-4-7">not a new problem</a> for machine learning to tackle—and <a href="https://biztechmagazine.com/article/2017/07/pros-and-cons-automated-cybersecurity">some</a> in the industry have invested in ML-based (semi-)automated remediation. However, a few pressures are driving more widespread application of ML to simplify response through distillation rather than complicate it with additional evidence: (1) market pressure to optimize workflows instead of alerts, in order to scale human response, and (2) diminishing returns on reducing time-to-detect compared to time-to-remediate.</p>
<h3>What can you do?</h3>
<ul>
<li>Assess remediation workflows of security analysts in your organization: (1) What pieces of evidence related to the incident provide high enough confidence <em>to</em> respond? (2) What evidence determines <em>how </em>to respond? (3) For a typical incident, how many decisions must be made during remediation? (4) How long does remediation take for a typical incident? (5) What is currently being automated reliably? (6) What tasks could still be automated?</li>
<li>Don’t force a solution on security analysts—chances are, they are creating custom remediation scripts in PowerShell or bash. You may already be using a mixed bag of commercial and open source tools for remediation (e.g., <a href="https://www.ansible.com/">Ansible</a> to task commands to different groups, or open source <a href="https://twitter.com/davehull">@davehull</a>’s <a href="https://github.com/davehull/Kansa">Kansa</a>).</li>
<li>Assess how existing solutions can help simplify and automate remediation steps. Check out <a href="https://www.demisto.com/real-time-interactive-investigation/">Demisto</a>, or Endgame’s <a href="https://www.endgame.com/blog/technical-blog/artemis-intelligent-assistant-cyber-defense">Artemis</a>.</li>
</ul>
<h2>2. Machine learning for attack automation</h2>
<p>“Invest in adversarial machine learning” was listed in our <a href="https://www.oreilly.com/ideas/4-trends-in-security-data-science">previous</a> <a href="https://www.oreilly.com/ideas/4-trends-in-security-data-science-for-2017">two</a> yearly trends because of the tremendous uptick in research activity. In 2018, we’re predicting that one manifestation of this is now ripe for adoption in the mainstream: ML for attack automation. A caveat: although we believe that 2018 will be the year that ML begins to be adopted for automating—for example, social engineering <a href="https://www.blackhat.com/docs/us-16/materials/us-16-Seymour-Tully-Weaponizing-Data-Science-For-Social-Engineering-Automated-E2E-Spear-Phishing-On-Twitter-wp.pdf">phishing</a> attacks or bypassing <a href="https://www.vicarious.com/2017/10/26/common-sense-cortex-and-captcha/">CAPTCHA</a>—we don’t think it’s necessarily the year we’ll see evidence in the wild of sophisticated methods to <a href="https://www.blackhat.com/docs/us-17/thursday/us-17-Anderson-Bot-Vs-Bot-Evading-Machine-Learning-Malware-Detection-wp.pdf">subvert your machine learning malware detection</a>, or to discover and <a href="https://search.descarteslabs.com/">exploit vulnerabilities in your network</a>. That’s still research, and today, there are still easier methods for attackers.</p>
<h3>Why now?</h3>
<p>There’s been <a href="https://en.wikipedia.org/wiki/2016_Cyber_Grand_Challenge">significant</a> research activity demonstrating how, at least theoretically, AI can scale digital attacks in an unprecedented way. Tool sets are making the barrier to entry quite low, and there are economic drivers to do things like bypass CAPTCHA. Incidentally, today’s security risks and exploits are often more embarrassing than sophisticated, so even sophisticated adversaries may not require machine learning to be effective, instead <a href="https://www.youtube.com/watch?v=bDJb8WOJYdA">relying</a> on unpatched deficiencies in networks that the attacker understands and exploits. So, it’s important not to be an alarmist. Think of ML for attack automation as an algorithmic wrinkle that adds dynamism and efficiency to the discovery and exploitation phases of an attack.</p>
<h3>What can you do?</h3>
<ul>
<li>Protect your users with more than simple image/audio CAPTCHA-like techniques that can be solved trivially by a human. Chances are that if it’s trivially solved by a human, it’s a target for machine learning automation. There are no easy alternatives, but moderate success has been achieved with image-recognition challenges that show fragments of a single image (say, a scene on the road) and ask the user to pick out the pieces that contain a desired object (say, a red car).</li>
<li>Calmly prepare for even the unlikely. Ask yourself: how would you discover whether an attack on your network was automated by machine learning or by old-fashioned enumeration in a script? (Might you see a shift from exploration to exploitation in an ML-driven attack? Would it matter?)</li>
<li>Familiarize yourself with pen testing and red-teaming tools like <a href="https://github.com/mitre/caldera">Caldera</a>, <a href="https://www.immunityinc.com/products/innuendo/">Innuendo</a>, and <a href="https://github.com/redcanaryco/atomic-red-team">Atomic Red Team</a>, which can simulate advanced manual attacks, but would also give you a leg-up on automated attacks in years to come.</li>
</ul>
<h2>3. Model compliance</h2>
<p>Global compliance laws affect the design, engineering, and operational costs of security data science solutions. The laws provide strict guidelines around data handling and movement (platform constraints), as well as model-building constraints such as explainability and the “right to be forgotten.” Model compliance is not a one-time investment: privacy laws change with the political landscape. For instance, it is not clear how Britain leaving the European Union might affect privacy laws in the UK. In some cases, privacy laws do not even agree with one another—the Irish DPA, for instance, considers IP addresses to be personally identifiable information, which is not the case everywhere in the world. More concretely, if you have an anomalous-login detection model based on IP addresses, in some parts of the world that detection would not work, because the IP addresses would be scrubbed or removed. The same machine learning model for detecting anomalous logins would therefore not work across different geographic regions.</p>
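<p>One common engineering mitigation is to pseudonymize identifiers before they cross a regional boundary. The Python sketch below assumes that a keyed hash is an acceptable pseudonym for an IP address in a given jurisdiction; whether it actually is remains a legal question, not an engineering one, and the key name and values here are illustrative:</p>
<pre><code>import hashlib
import hmac

# Hypothetical region-local secret; it never leaves the geographic boundary.
REGION_KEY = b"example-secret-kept-in-region"

def pseudonymize_ip(ip):
    """Replace a raw IP with a keyed hash so downstream models can still
    count and join on it without ever seeing the identifier itself."""
    return hmac.new(REGION_KEY, ip.encode("utf-8"), hashlib.sha256).hexdigest()

print(pseudonymize_ip("203.0.113.42"))  # a stable pseudonym, not the raw IP
</code></pre>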
<h3>Why now?</h3>
<p>Building models that respect compliance laws is important because failure to do so not only brings crippling monetary costs—for instance, failure to adhere to the new <a href="https://en.wikipedia.org/wiki/General_Data_Protection_Regulation">European General Data Protection Regulation (GDPR)</a>, set to take effect in May 2018, can result in a fine of up to 20 million Euros, or 4% of annual global turnover—but also negative press for the business.</p>
<h3>What can you do?</h3>
<ul>
<li>As an end consumer, you will need to audit your data and tag it appropriately. Those on AWS are lucky: Amazon’s <a href="https://aws.amazon.com/macie/">Macie service</a> can help. If your data set is small, it is best to bite the bullet and do it by hand.</li>
<li>Many countries prevent cloud providers from merging locality-specific data outside regional boundaries. We recommend tiered modeling: each geographic region is modeled separately, and the results are scrubbed and sent to a global model (a rough code sketch follows the figure below). <a href="http://proceedings.mlr.press/v48/hamm16.html">Differentially private ensembles</a> are particularly relevant here.</li>
</ul>
<figure class="center" id="id-5AEiz"><img alt="Tiered modeling models each geographic region separately" src="https://d3ansictanv2wj.cloudfront.net/Figure1-c205a1f8f77b906223048f4ab43699a0.png"><figcaption><span class="label">Figure 1. </span>Tiered modeling models each geographic region separately. Source: Ram Shankar Siva Kumar.</figcaption></figure>
<h2>4. Rigor and repeatable experiments</h2>
<p>The biggest buzz of the NIPS 2017 conference was when Ali Rahimi claimed current ML methods are akin to alchemy (<a href="http://www.inference.vc/my-thoughts-on-alchemy/">read commentary from @fhuszar on this subject</a>). At the core of Rahimi’s talk was how the machine learning field is progressing on non-rigorous methods that are not widely understood and, in some cases, not repeatable. For instance, <a href="https://arxiv.org/abs/1709.06560">researchers showed</a> how the same reinforcement learning algorithm, implemented in two different code bases and run on the same data set, produced vastly different results. <a href="https://machinelearningmastery.com/reproducible-results-neural-networks-keras/">Jason Brownlee’s blog</a> breaks down the different ways an ML experiment can produce random results, from randomization introduced by libraries to GPU quirks.</p>
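<p>As a trivial first step toward repeatability (a sketch, not a complete recipe; GPU nondeterminism, library versions, and hardware still need controlling), pin every random seed your experiment touches:</p>
<pre><code>import os
import random
import numpy as np

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)  # only fully effective if set before the interpreter starts
random.seed(SEED)      # Python's built-in RNG
np.random.seed(SEED)   # NumPy's global RNG
# Deep learning frameworks keep their own RNGs and need their own seed
# calls, and some GPU kernels remain nondeterministic regardless; that
# residual randomness is part of the problem described above.

print(np.random.rand(3))  # identical on every run with the same seed
</code></pre>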
<h3>Why now?</h3>
<p>We are at a time when there is a deluge of buzzwords in the security world—artificial intelligence, advanced persistent threats, and machine deception. As a field, we have matured enough to know there are limitations to every solution; there is no omnipotent solution, even one using the latest methods. So, this one is less of a trend and more a call to action.</p>
<h3>What can you do?</h3>
<ol>
<li>Whenever you publish your work, at an academic conference or a security con, please release your code and the data set you used for training. The red team is very good at doing this; we defenders need to step up our game.</li>
<li>Eschew publishing your detection results on the KDD 1999 data set—claiming state-of-the-art results on a data set that was popular in the days of Internet Explorer 5 and Napster is unhygienic. (“MNIST is the new unit test,” <a href="https://twitter.com/ram_ssk/status/939250754434379777">suggested Ian Goodfellow</a>, but passing a unit test doesn’t demonstrate a successful result.) Consider using a more realistic data set like <a href="https://github.com/daveherrald/botsv1">Splunk’s Boss of the SOC</a> curated by <a href="https://twitter.com/meansec">Ryan Kovar</a>.</li>
<li>We understand that in some cases there are no publicly available benchmarks and there is a constraint to release the data set as is—in that case, consider generating evaluation data in a simulated environment using <a href="https://github.com/subTee">@subtee</a>’s <a href="https://www.redcanary.com/blog/atomic-red-team-testing/">Red Canary framework</a>.</li>
<li>When you present a method at a conference, highlight the weaknesses and failures of the method—go beyond false positive and false negative rates, and highlight the tradeoffs. Let the audience know what kinds of attacks you will miss and how you compensate for them. If you need inspiration, <a href="https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63649?intcmp=il-data-confreg-lp-stca18_20180212_new_site_ram_shankar_2018_data_security_trends_text_body_cta">I will be at the Strata Data Conference in San Jose this March</a> talking about security experiments that failed spectacularly and how we fixed them.</li>
</ol>
<p>Your efforts to bring rigor to the security analytics field will benefit us all—a rising tide lifts all boats.</p>
<h2>What won’t happen in 2018</h2>
<p>To temper some of the untempered excitement (and sometimes hype) about machine learning in information security, we conclude with a few suggestions for things that we <em>aren’t</em> likely to see in 2018.</p>
<h3>Reinforcement learning (RL) for offense in the wild</h3>
<p>RL has been used to train agents that demonstrate superhuman performance at very narrow tasks, like <a href="https://deepmind.com/blog/alphago-zero-learning-scratch/">AlphaGo</a> and <a href="https://deepmind.com/research/publications/playing-atari-deep-reinforcement-learning/">Atari</a>. In infosec, it has been demonstrated in research settings to, for example, discover <a href="https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20presentations/DEFCON-25-Hyrum-Anderson-Evading-Next-Gen-AV-Using-AI.pdf">weaknesses of next-gen AV</a>, albeit at very modest success rates. However, it’s not yet in the “it just works” category, and we forecast another one to two years before infosec realizes interesting offensive or defensive automation via RL.</p>
<h3>Generative adversarial networks (GANs) in an infosec product</h3>
<p>Generally speaking, GANs continue to see a ton of research activity with impressive results—the excitement is totally warranted. Unfortunately, there has also been a <a href="https://arxiv.org/abs/1711.10337">lack of systematic and objective evaluation</a> metrics in their development. This is a cool hammer that has yet to find its killer application in infosec.</p>
<h3>Machine learning displacing security jobs</h3>
<p>In fact, we think the causality may run in reverse: because machine learning is becoming ever more accessible, many more infosec professionals will begin to adopt it for traditional security tasks.</p>
<h3>Hype around AI in infosec</h3>
<p>It is a fact that, especially in infosec, those talking about “AI” usually mean “ML.” Despite our best efforts, in 2018, the loaded buzzwords about AI in security aren’t going away. We still need to educate customers about how to cut through the hype by asking the right questions. And frankly, a consumer shouldn’t care if it’s AI, ML, or hand-crafted rules. The real question should be, “does it protect me?”</p>
<h2>Parting thoughts</h2>
<p>The year 2018 is going to bring ML-for-response, as well as milder forms of attack automation, into the mainstream. Across the industry, compliance laws governing machine learning will drive a more general shift toward data privacy. The ML community will self-correct toward rigor and repeatability. At the same time, this year we will not see security products infused with RL or GANs, despite their popularity in ongoing research. Your infosec job is here to stay, despite more use of ML. Finally, we’ll see this year that ML is mature enough to stand on its own, with no need to be propped up by imaginative buzzwords or hype.</p>
<p>We would love to hear your thoughts—reach out to us (<a href="https://twitter.com/ram_ssk">@ram_ssk</a> and <a href="https://twitter.com/drhyrum">@drhyrum</a>) and join the conversation!</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/4-trends-in-security-data-science-for-2018'>4 trends in security data science for 2018.</a></p>Ram Shankar Siva Kumar, Hyrum Andersonhttps://www.oreilly.com/ideas/4-trends-in-security-data-science-for-2018Responding to new open source vulnerability disclosures2018-02-13T11:00:00Ztag:www.oreilly.com,2018-02-13:/ideas/responding-to-new-open-source-vulnerability-disclosures<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/singapore-2148190_1920-5318600b3243abd4214e602d24059903.jpg'/></p><p><em>Best practices for quick remediation and response</em></p>
<h2>Responding to New Vulnerability Disclosures</h2>
<p>The techniques to find, fix, and prevent vulnerable dependencies are very similar to other quality controls: they revolve around finding issues in our application and maintaining quality as the application changes. The last piece in the vulnerable-library puzzle is a bit different.</p>
<p>In addition to their known vulnerabilities, the libraries you use also contain <em>unknown</em> vulnerabilities. Every now and then, somebody (typically a library’s authors, its users, or security researchers) will discover and report such a vulnerability. Once a vulnerability is discovered and publicly disclosed, you need to be ready to test your applications for it and fix the findings quickly—before attackers exploit it.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/responding-to-new-open-source-vulnerability-disclosures'>Responding to new open source vulnerability disclosures.</a></p>Guy Podjarnyhttps://www.oreilly.com/ideas/responding-to-new-open-source-vulnerability-disclosures10 software architecture resources on O'Reilly's online learning platform2018-02-13T11:00:00Ztag:www.oreilly.com,2018-02-13:/ideas/10-software-architecture-resources-on-oreillys-online-learning-platform<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/architecture-2725277_1920-fa369ff4ce6e5bf993cd4741416704c1.jpg'/></p><p><em>Learn about new architecture patterns, event-driven microservices, fast data, and more.</em></p><p>Get a fresh start on building a new skill or augment what you currently know with one of these new and popular titles on O'Reilly's online learning platform.</p>
<p><a href="https://www.safaribooksonline.com/library/view/software-architecture-fundamentals/9781491998991/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=10-software-architecture-resources-on-oreillys-online-learning-platform"><img align="left" src="https://www.safaribooksonline.com/library/cover/9781491998991/" style="margin-right: 20px;" width="140px"></a></p><p>Continue reading <a href='https://www.oreilly.com/ideas/10-software-architecture-resources-on-oreillys-online-learning-platform'>10 software architecture resources on O'Reilly's online learning platform.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/rbS6F5TdAzE" height="1" width="1" alt=""/>https://www.oreilly.com/ideas/10-software-architecture-resources-on-oreillys-online-learning-platformFour short links: 12 February 20182018-02-12T11:50:00Ztag:www.oreilly.com,2018-02-12:/ideas/four-short-links-12-february-2018<p><em>Tech vs. Culture, Fairness and Accountability, People Typeface, and Reproducibility Suite</em></p><ol>
<li>
<a href="http://bit.ly/2ssgLvn">Containers Will Not Fix Your Broken Culture</a> (Bridget Kromhout) -- words of truth in the tech industry, but "{some tech thing} will not fix your broken culture" is true everywhere (e.g., iPads in schools, chatbots in customer-hating organizations, etc.)</li>
<li>
<a href="http://proceedings.mlr.press/v81/">FAT</a> -- proceedings from <i>Conference on Fairness, Accountability, and Transparency</i> in machine learning research.</li>
<li>
<a href="https://github.com/propublica/weepeople/">Wee People</a> -- <i>A typeface of people sillhouettes, to make it easy to build web graphics featuring little people instead of dots.</i> (via <a href="http://flowingdata.com/2018/02/09/people-font/">Flowing Data</a>)</li>
<li>
<a href="https://stenci.la/">Stencila</a> -- <i>The office suite for reproducible research.</i> Like a cross between a word processor and a spreadsheet. Almost a Jupyter-style notebook, but WYSIWYG and with a different underlying structure. One to watch!</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-12-february-2018'>Four short links: 12 February 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-12-february-2018Four short links: 9 February 20182018-02-09T12:50:00Ztag:www.oreilly.com,2018-02-09:/ideas/four-short-links-9-february-2018<p><em>Small GUI, Dangerous URLs, Face-Recognition Glasses, and The Future is Hard</em></p><ol>
<li>
<a href="https://github.com/vurtun/nuklear">Nuklear</a> -- <i>a single-header ANSI C GUI library</i>, with a lot of bindings (<a href="https://github.com/billsix/pyNuklear">Python</a>, <a href="https://github.com/golang-ui/nuklear">Golang</a>, <a href="https://github.com/cartman300/NuklearDotNet">C#</a>, etc.). (via <a href="http://bit.ly/2BOllYt">Hacker News</a>)</li>
<li>
<a href="https://github.com/JLospinoso/unfurl">unfurl</a> -- <i>a tool that analyzes large collections of URLs and estimates their entropies to sift out URLs that might be vulnerable to attack</i>. (via <a href="https://jlospinoso.github.io/python/unfurl/abrade/hacking/2018/02/08/unfurl-url-analysis.html">this blog</a>)</li>
<li>
<a href="http://bit.ly/2EwqhCn">Chinese Police Using Face Recognition Glasses</a> -- <i>In China, people must use identity documents for train travel. This rule works to prevent people with excessive debt from using high-speed trains, and limit the movement of religious minorities who have had identity documents confiscated and can wait years to get a valid passport.</i> We asked for glasses that would help us remember people's names, we got Robocop 0.5a/BETA2FINAL. {Obligatory "Black Mirror" reference goes here} (via <a href="https://boingboing.net/2018/02/08/yes-i-have-a-dead-chicken.html">BoingBoing</a>)</li>
<li>
<a href="http://www.antipope.org/charlie/blog-static/2018/02/why-i-barely-read-sf-these-day.html">Why I Barely Read SF These Days</a> (Charlie Stross) -- <i>SF should—in my view—be draining the ocean and trying to see at a glance which of the gasping, flopping creatures on the sea bed might be lungfish. But too much SF shrugs at the state of our seas and settles for draining the local aquarium, or even just the bathtub, instead. In pathological cases, it settles for gazing into the depths of a brightly coloured computer-generated fishtank screensaver.</i> Earlier in the essay he talks about how the first to a field defines the tropes and borders that others play in, and it's remarkably hard to find authors who can and will break out of them. (via <a href="https://magicalnihilism.com/2018/02/06/look-for-the-lungfish/">Matt Jones</a>)</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-9-february-2018'>Four short links: 9 February 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-9-february-2018Richard Warburton and Raoul-Gabriel Urma on Java 8 and Reactive Programming2018-02-08T12:05:00Ztag:www.oreilly.com,2018-02-08:/ideas/richard-warburton-and-raoul-gabriel-urma-on-java-8-and-reactive-programming<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/waves-close-up-view-circle-drop-of-water_crop-0d79220850e7bbb34d2f9c49eb1e5150.jpg'/></p><p><em>The O’Reilly Programming Podcast: Building reactive applications.</em></p><p>In this episode of the <a href="https://www.oreilly.com/topics/oreilly-programming-podcast">O’Reilly Programming Podcast</a>, I talk with <a href="https://twitter.com/RichardWarburto">Richard Warburton</a> and <a href="https://twitter.com/raoulUK">Raoul-Gabriel Urma</a> of <a href="http://iteratrlearning.com/about">Iteratr Learning</a>. They are the presenters of a series of O’Reilly Learning Paths, including <a href="https://www.safaribooksonline.com/learning-paths/learning-path-getting/9781492028611/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=20180206_programming_podcast_warburton_urma_text_body_cta_getting_started_with_reactive"><em>Getting Started with Reactive Programming</em></a> and <a href="https://www.safaribooksonline.com/learning-paths/learning-path-build/9781491990223/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=20180206_programming_podcast_warburton_urma_text_body_cta_build_reactive_apps"><em>Build Reactive Applications in Java 8</em></a>. Warburton is the author of <a href="https://www.safaribooksonline.com/library/view/java-8-lambdas/9781449370831/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=20180206_programming_podcast_warburton_urma_text_body_cta_java_8_lambdas"><em>Java 8 Lambdas</em></a>, and Urma is the author of <a href="https://www.safaribooksonline.com/library/view/java-8-in/9781617291999/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=20180206_programming_podcast_warburton_urma_text_body_cta_java_8_in_action">Java 8 in Action</a>.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/richard-warburton-and-raoul-gabriel-urma-on-java-8-and-reactive-programming'>Richard Warburton and Raoul-Gabriel Urma on Java 8 and Reactive Programming.</a></p>Jeff Bleielhttps://www.oreilly.com/ideas/richard-warburton-and-raoul-gabriel-urma-on-java-8-and-reactive-programming5 best practices when requesting visuals for your content2018-02-08T11:30:00Ztag:www.oreilly.com,2018-02-08:/ideas/5-best-practices-when-requesting-visuals-for-your-content<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/toa-heftiba-123512_crop-ec66c36cb9a5a2406448ca6ab46172a1.jpg'/></p><p><em>What a design request should look like when you're talking to an external entity.</em></p><p>Continue reading <a href='https://www.oreilly.com/ideas/5-best-practices-when-requesting-visuals-for-your-content'>5 best practices when requesting visuals for your content.</a></p>Laura Buschehttps://www.oreilly.com/ideas/5-best-practices-when-requesting-visuals-for-your-contentFour short links: 8 February
20182018-02-08T11:20:00Ztag:www.oreilly.com,2018-02-08:/ideas/four-short-links-8-february-2018<p><em>Data for Problems, Quantum Algorithms, Network Transparency, and AI + Humans</em></p><ol>
<li>
<a href="http://sppd.thegovlab.org/">Solving Public Problems With Data</a> -- <i>an introduction to data science and data analytical thinking in the public interest</i>. Online lecture series. Beth Noveck gives one of them. (via <a href="https://twitter.com/TheGovLab/status/935937212558045185">The Gov Lab</a>)</li>
<li>
<a href="https://arxiv.org/abs/1511.04206">Quantum Algorithms: An Overview</a> -- <i>Here we briefly survey some known quantum algorithms, with an emphasis on a broad overview of their applications rather than their technical details. We include a discussion of recent developments and near-term applications of quantum algorithms.</i> (via <a href="https://blog.acolyer.org/2018/02/06/quantum-algorithms-an-overview/">A Paper A Day</a>)</li>
<li>
<a href="https://utcc.utoronto.ca/~cks/space/blog/unix/XNetworkTransparencyFailure">X11's Network Transparency is Largely a Failure</a> -- <i>Basic X clients that use X properties for everything may be genuinely network transparent, but there are very few of those left these days.</i>
</li>
<li>
<a href="https://jods.mitpress.mit.edu/pub/issue3-case">How to Become a Centaur</a> -- <i>When you create a Human+AI team, the hard part isn’t the "AI". It isn’t even the “Human”. It’s the “+”.</i> Interesting history and current state of human and AI systems. (via <a href="https://mindhacks.com/2018/02/07/how-to-become-a-centaur/">Tom Stafford</a>)</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-8-february-2018'>Four short links: 8 February 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-8-february-2018HVMN’s better-body biohacking2018-02-07T11:30:00Ztag:www.oreilly.com,2018-02-07:/ideas/hvmns-better-body-biohacking<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/dna-1903318_1280_crop-fb997f1ae06e457fea2d8a6c3dae5d19.jpg'/></p><p><em>Learn how biohacking is unlocking human potential.</em></p>
<p>Technology is unique in that its improvement always suggests an intuitive next step. Products are refined, updated, and necessarily upgraded at any given time. Optimization is never viewed as a bonus in the tech industry; it’s the name of the game.</p>
<p>But what happens when we attempt to expand optimization goals to include the very facilitators of progress: our minds? Crossing the boundary between hard science and pseudoscience, biohacking companies are exploring the principle of “upgrading” the human body in the hopes that our inherited genetics are more malleable than we think. One such company is HVMN (pronounced “human”). The company’s main product is NOOTROBOX, a line of nootropics, colloquially dubbed “smart drugs,” meant to enhance neural performance in areas such as memory, learning, and focus.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/hvmns-better-body-biohacking'>HVMN’s better-body biohacking.</a></p>Meghan Tahbazhttps://www.oreilly.com/ideas/hvmns-better-body-biohackingDelivering effective communication in software teams2018-02-07T11:00:00Ztag:www.oreilly.com,2018-02-07:/ideas/delivering-effective-communication-in-software-teams<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/transmitter-1327920_1920-84bb5320710fd64bbf285edf28749c9c.jpg'/></p><p><em>Optimize for business value with clear feedback loops and quality standards.</em></p><p>We’ve had the privilege to work with many clients from different business sectors. Each client has granted us the opportunity to see how their teams perceive the value of software within their organizations. We’ve also witnessed how the same types of systems (e.g., ERPs) in competing organizations raise completely different problems and challenges. As a result of these experiences, we’ve come to understand that the key to building high-quality software architecture is effective communication between every team member involved in the project who expects to gain value from a software system.</p>
<p>So, if you’re a software architect or developer and you want to improve your architectures or codebases, you’ll have to address the organizational parts as well. Research conducted by Graziotin et al.<a href="#dfref-footnote-1" name="ref-footnote-1">1</a> states that software development is dominated by these often-neglected organizational elements, and that the key to high-quality software and productive developers is the happiness and satisfaction of those developers. In turn, the key to happy and productive developers is empowerment, both on an organizational and a technical level.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/delivering-effective-communication-in-software-teams'>Delivering effective communication in software teams.</a></p>Evelyn van Kelle, Yiannis Kanellopouloshttps://www.oreilly.com/ideas/delivering-effective-communication-in-software-teamsFour short links: 7 February 20182018-02-07T11:00:00Ztag:www.oreilly.com,2018-02-07:/ideas/four-short-links-7-february-2018<p><em>Identity Advice, Customer Feedback, Fun Toy, and Reproducibility Resources</em></p><ol>
<li>
<a href="https://cloudplatform.googleblog.com/2018/01/12-best-practices-for-user-account.html">12 Best Practices for User Account, Authorization, and Password Management</a> (Google) -- <i>Your users are not an email address. They're not a phone number. They're not the unique ID provided by an OAUTH response. Your users are the culmination of their unique, personalized data and experience within your service. A well-designed user management system has low coupling and high cohesion between different parts of a user's profile.</i>
</li>
<li>
<a href="https://www.newyorker.com/magazine/2018/02/05/customer-satisfaction-at-the-push-of-a-button">Customer Satisfaction at the Push of a Button</a> (New Yorker) -- simply getting binary good/bad feedback is better than no feedback, even if it's not as good as using NPS with something like <a href="http://www.getthematic.com/">Thematic</a>. Also an interesting story about the value of physical interactions over purely digital.</li>
<li>
<a href="https://dood.al/oscilloscope/">XXY Oscilloscope</a> -- try <a href="https://dood.al/oscilloscope/#-0.05,-0.8,0,0,0,0,0.0,3,1,sin(a*t-t/5)*cos(a*t/b)*cos((a+b)*t),sin(a*t+(t/11))*cos(t*t/(b*a)),3,5,0,0.74,125,0,0,0">this</a> or <a href="https://dood.al/oscilloscope/#-0.05,-0.8,1,0,0,0,0.0,3,1,sin(22*a*t-t/15)*cos(a*t+t/12+t/30),sin(a*t+t)*cos(t/a+t/3),2,5,0,0.74,125,0.07,0,0">this</a> to get started. (via <a href="http://bit.ly/2Bi2FiT">Hacker News</a>)</li>
<li>
<a href="https://codeocean.com/workshop/caltech">Reproducibility Workshop</a> -- slides and handouts from a workshop to <i>highlight some of the resources available to help share code, data, reagents, and methods.</i> (via <a href="https://twitter.com/lteytelman/status/960597006493220864">Lenny Teltelman</a>)</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-7-february-2018'>Four short links: 7 February 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-7-february-2018Re-thinking marketing: Generating attention you can turn into profitable demand2018-02-07T10:50:00Ztag:www.oreilly.com,2018-02-07:/ideas/re-thinking-marketing-generating-attention-you-can-turn-into-profitable-demand<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/abstract-tornado_crop-4bbbe843af03a0fc91fe31ba465a9f5a.jpg'/></p><p><em>The media and ad tech sessions at the Strata Data Conference in San Jose will dig deep into how media businesses are changing.</em></p><p>First-year business students are taught that marketing consists of four Ps: product, place (or channel), price, and promotion. But this thinking is dated. In an era of information saturation, simply creating another piece of information in the form of a spec sheet, white paper, or press release compounds the problem.</p>
<p>I’ve been using a newer definition of marketing in recent years: generating attention you can turn into profitable demand. This underscores the “long funnel” of conversion from initial consumer awareness and engagement, to desirable outcomes like sales, word-of-mouth referral, and the retention of loyal customers.</p>
<p>At the start of the long funnel is media. Traditionally, media was a one-to-many model, in which a few organizations—armed with printing presses and broadcast studios—sent a single message out to the masses. They made money through purchases, subscriptions, and in many cases, advertising.</p>
<p>Much has changed. Today’s communication is bidirectional, flowing from the audience back to the publisher. It’s individualized, with each of us experiencing a tailored feed of information. The cost of publishing is vanishingly small, with anyone able to share a video with the world for practically nothing. And most importantly, we expect media to be free.</p>
<p>This expectation stems from two simple facts: there’s too much content out there, and users create most of it.</p>
<p>The abundance of content is a consequence of how easy it is to publish. Anyone can become an expert, and we consume tailored news: I might read 10 publications’ technology sections, but ignore all sports news. Gone are the days of reading a single publication cover to cover. I choose podcasts to suit my interests, seldom exploring.</p>
<p>And the world of user-generated content has birthed a second kind of media. Facebook, Medium, Twitter, Reddit, and their ilk don’t employ writers, but we consume most of our words there. Traditional media outlets with paid reporting and editorial calendars are being squeezed out.</p>
<p>Jeff Jarvis has said that <a href="https://www.huffingtonpost.com/jeff-jarvis/decency-is-the-new-ad_b_212248.html">advertising is failure</a>. It means you haven’t sold an issue, or a subscription. It’s a bad outcome. And yet, it’s the basis for most of what we consume today. Craigslist decimated newspapers partly because the classified ads were the only thing keeping them alive.</p>
<p>The nature of media has shifted, too. It’s gaming, and betting, and theme parks, and blogs, and YouTube channels, and streaming subscriptions. Omnichannel analytics means tracking a customer’s engagement with a brand or some content across many platforms and devices.</p>
<p>With a sprawl of media, and an increased reliance on advertising despite razor-thin margins, media creators of all stripes take analytics very seriously. Data is the difference between dominance and obsolescence, whether you’re keeping a player engaged, trying to get a subscriber to stick around, recommending the next best song, serving a tailored ad, or satisfying a die-hard sports fan.</p>
<p>So, we’re going to dig deep into the media and advertising technology industry at the Strata Data Conference in San Jose this March. With the help of <a href="https://www.linkedin.com/in/beglen/">David Boyle</a>—one of the world’s great media analysts, whose career spans record labels, print publishers, broadcasters, and online learning—we’re assembling a lineup of experts and practitioners from every facet of media. We’ll hear case studies, never-before-shared insights, and projections. We’re even running an Oxford-style debate, where we’ll challenge the statement: “Machines have better taste than humans.” (<a href="https://conferences.oreilly.com/strata/strata-ca/public/schedule/topic/2456?intcmp=il-data-confreg-lp-stca18_20180206_new_site_alistair_croll_media_adtech_at_strata_sj_post_body_text_sessions">Check out our lineup of talks</a>.)</p>
<p>Modern business starts with attention. The risk is seldom “can you build it?” but rather, “will anyone care?” To understand how media businesses are changing—and how the journey from audience to customer begins—we hope you’ll join us next month.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/re-thinking-marketing-generating-attention-you-can-turn-into-profitable-demand'>Re-thinking marketing: Generating attention you can turn into profitable demand.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/HkhA6OdDC4I" height="1" width="1" alt=""/>Alistair Crollhttps://www.oreilly.com/ideas/re-thinking-marketing-generating-attention-you-can-turn-into-profitable-demandIntroducing capsule networks2018-02-06T12:00:00Ztag:www.oreilly.com,2018-02-06:/ideas/introducing-capsule-networks<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/ball-407081_1920_crop-c969e7094ab9a0057702bebd6233b76f.jpg'/></p><p><em>How CapsNets can overcome some shortcomings of CNNs, including requiring less training data, preserving image details, and handling ambiguity.</em></p><p>Capsule networks (CapsNets) are a hot new neural net architecture that may well have a profound impact on deep learning, in particular for computer vision. Wait, isn't computer vision pretty much solved already? Haven't we all seen fabulous examples of convolutional neural networks (CNNs) reaching super-human level in various computer vision tasks, such as classification, localization, object detection, semantic segmentation or instance segmentation (see Figure 1)?</p>
<figure class="center" id="id-31Vik"><img alt="main computer vision tasks" width="75%" src="https://d3ansictanv2wj.cloudfront.net/Figure1-3bb53c4be7b5b4c134ddd7a61dbe776a.jpg"><figcaption><span class="label">Figure 1. </span>Some of the main computer vision tasks. Today, each of these tasks requires a very different CNN architecture, for example ResNet for classification, YOLO for object detection, Mask R-CNN for instance segmentation, and so on. Image by Aurélien Géron.</figcaption></figure>
<p>Well, yes, we’ve seen fabulous CNNs, <em>but</em>:</p>
<ul>
<li>They were trained on huge numbers of images (or they reused parts of neural nets that had). CapsNets can generalize well using much less training data.</li>
<li>CNNs don’t handle ambiguity very well. CapsNets do, so they can perform well even on crowded scenes (although they still struggle with backgrounds right now).</li>
<li>CNNs lose plenty of information in the pooling layers. These layers reduce the spatial resolution (see Figure 2), so their outputs are invariant to small changes in the inputs. This is a problem when detailed information must be preserved throughout the network, such as in semantic segmentation. Today, this issue is addressed by building complex architectures around CNNs to recover some of the lost information. With CapsNets, detailed pose information (such as precise object position, rotation, thickness, skew, size, and so on) is preserved throughout the network, rather than lost and later recovered. Small changes to the inputs result in small changes to the outputs—information is preserved. This is called "equivariance." As a result, CapsNets can use the same simple and consistent architecture across different vision tasks.</li>
<li>Finally, CNNs require extra components to automatically identify which object a part belongs to (e.g., this leg belongs to this sheep). CapsNets give you the hierarchy of parts for free.</li>
</ul>
<figure class="center" id="id-RlWiz"><img alt="DeepLab2 pipeline for image segmentation" width="75%" src="https://d3ansictanv2wj.cloudfront.net/Figure2-83075ebf1eb8239653e4655261e843f7.png"><figcaption><span class="label">Figure 2. </span>The DeepLab2 pipeline for image segmentation, by Liang-Chieh Chen, et al.: notice that the output of the CNN (top right) is very coarse, making it necessary to add extra steps to recover some of the lost details. From the <a href="https://arxiv.org/abs/1606.00915">paper</a> <em>DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs</em>, figure reproduced with the kind permission of the authors. See this <a href="http://blog.qure.ai/notes/semantic-segmentation-deep-learning-review">great post</a> by S. Chilamkurthy to see how diverse and complex the architectures for semantic segmentation can get.</figcaption></figure>
<p>CapsNets were first introduced in 2011 by Geoffrey Hinton, et al., in a paper called <a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C5&amp;as_ylo=2011&amp;as_yhi=2011&amp;q=transforming+autoencoders+author%3AHinton+author%3AKrizhevsky+author%3AWang&amp;btnG="><em>Transforming Autoencoders</em></a>, but it was only a few months ago, in November 2017, that Sara Sabour, Nicholas Frosst, and Geoffrey Hinton published a paper called <a href="https://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf"><em>Dynamic Routing between Capsules</em></a>, where they introduced a CapsNet architecture that reached state-of-the-art performance on MNIST (the famous data set of handwritten digit images), and got considerably better results than CNNs on MultiMNIST (a variant with overlapping pairs of different digits). See Figure 3.</p>
<figure class="center" id="id-RN4iK"><img alt="MultiMNIST images and their reconstructions" width="75%" src="https://d3ansictanv2wj.cloudfront.net/Figure3-e2a56c8330a530df21a963ee61950acd.png"><figcaption><span class="label">Figure 3. </span>MultiMNIST images (white) and their reconstructions by a CapsNet (red+green). “R” = reconstructions; “L” = labels. For example, the predictions for the first example (top left) are correct, and so are the reconstructions. But in the fifth example, the prediction is wrong: (5,7) instead of (5,0). Therefore, the 5 is correctly reconstructed, but not the 0. From the <a href="https://arxiv.org/abs/1710.09829">paper</a>: <em>Dynamic routing between capsules</em>, figure reproduced with the kind permission of the authors.</figcaption></figure>
<p>Despite all their qualities, CapsNets are still far from perfect. Firstly, for now they don't perform as well as CNNs on data sets of larger images, such as CIFAR10 or ImageNet. Moreover, they are computationally intensive, and they cannot detect two objects of the same type when they are too close to each other (this is called the "crowding problem," and it has been shown that <a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C5&amp;q=%22visual+crowding%22&amp;btnG=">humans have it, too</a>). But the key ideas are extremely promising, and it seems likely that they just need a few tweaks to reach their full potential. After all, modern CNNs were invented in 1998, yet they only beat the state of the art on ImageNet in 2012, after a few tweaks.</p>
<h2>So, what are CapsNets exactly?</h2>
<p>In short, a CapsNet is composed of capsules rather than neurons. A capsule is a small group of neurons that learns to detect a particular object (e.g., a rectangle) within a given region of the image, and it outputs a vector (e.g., an 8-dimensional vector) whose length represents the estimated probability that the object is present<a href="#_ftn1"><sup>[1]</sup></a>, and whose orientation (e.g., in 8D space) encodes the object's pose parameters (e.g., precise position, rotation, etc.). If the object is changed slightly (e.g., shifted, rotated, resized, etc.) then the capsule will output a vector of the same length, but oriented slightly differently. Thus, capsules are equivariant.</p>
<p>Much like a regular neural network, a CapsNet is organized in multiple layers (see Figure 4). The capsules in the lowest layer are called primary capsules: each of them receives a small region of the image as input (called its receptive field), and it tries to detect the presence and pose of a particular pattern, for example a rectangle. Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as boats.</p>
<figure class="center" id="id-RMkiq"><img alt="two-layer CapsNet" width="75%" src="https://d3ansictanv2wj.cloudfront.net/Figure4-132734e4432dcf12d8b0d66efbf58bbb.png"><figcaption><span class="label">Figure 4. </span>A two-layer CapsNet. In this example, the primary capsule layer has two maps of 5x5 capsules, while the second capsule layer has two maps of 3x3 capsules. Each capsule outputs a vector. Each arrow represents the output of a different capsule. Blue arrows represent the output of a capsule that tries to detect triangles, black arrows represent the output of a capsule that tries to detect rectangles, and so on. Image by Aurélien Géron.</figcaption></figure>
<p>The primary capsule layer is implemented using a few regular convolutional layers. For example, in the paper, they use two convolutional layers that output 256 6x6 feature maps containing scalars. They reshape this output to get 32 6x6 maps containing 8-dimensional vectors. Finally, they use a novel squashing function to ensure these vectors have a length between 0 and 1 (to represent a probability). And that's it: this gives the output of the primary capsules.</p>
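<p>The squashing function itself is short. Here is a minimal NumPy sketch of it (Equation 1 in the paper): it preserves a vector's orientation while shrinking its length into [0, 1), so short vectors end up near 0 and long vectors near 1:</p>
<pre><code>import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """v = (|s|^2 / (1 + |s|^2)) * (s / |s|), applied along `axis`."""
    squared_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    norm = np.sqrt(squared_norm + eps)  # eps avoids division by zero
    return (squared_norm / (1.0 + squared_norm)) * (s / norm)

u = np.array([3.0, 4.0])          # a vector of length 5
print(np.linalg.norm(squash(u)))  # ~0.96: long vectors squash toward 1
</code></pre>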
<p>The capsules in the next layers also try to detect objects and their pose, but they work very differently, using an algorithm called routing by agreement. This is where most of the magic of CapsNets lies. Let's look at an example.</p>
<p>Suppose there are just two primary capsules: one rectangle capsule and one triangle capsule, and suppose they both detected what they were looking for. Both the rectangle and the triangle could be part of either a house or a boat (see Figure 5). Given the pose of the rectangle, which is slightly rotated to the right, the house and the boat would have to be slightly rotated to the right as well. Given the pose of the triangle, the house would have to be almost upside down, whereas the boat would be slightly rotated to the right. Note that both the shapes and the whole/part relationships are learned during training. Now notice that the rectangle and the triangle agree on the pose of the boat, while they strongly disagree on the pose of the house. So, it is very likely that the rectangle and triangle are part of the same boat, and there is no house.</p>
<figure class="center" id="id-5bMiM"><img alt="Routing-by-agreement" width="65%" src="https://d3ansictanv2wj.cloudfront.net/Figure5-619839b67fc35ba5860030515ee9c786.png"><figcaption><span class="label">Figure 5. </span>Routing by agreement, step 1—predict the presence and pose of objects based on the presence and pose of object parts, then look for agreement between the predictions. Image by Aurélien Géron.</figcaption></figure>
<p>Since we are now confident that the rectangle and triangle are part of the boat, it makes sense to send the outputs of the rectangle and triangle capsules more to the boat capsule, and less to the house capsule: this way, the boat capsule will receive more useful input signal, and the house capsule will receive less noise. For each connection, the routing-by-agreement algorithm maintains a routing weight (see Figure 6): it increases routing weight when there is agreement, and decreases it in case of disagreement.</p>
<figure class="center" id="id-3aBiB"><img alt="Routing-by-agreement, step 2 – Update the routing weights" width="75%" src="https://d3ansictanv2wj.cloudfront.net/Figure6-7fd6229e67b77b513b1aea568025bee0.png"><figcaption><span class="label">Figure 6. </span>Routing by agreement, step 2—update the routing weights. Image by Aurélien Géron.</figcaption></figure>
<p>The routing-by-agreement algorithm involves a few iterations of agreement-detection + routing-update (note that this happens for each prediction, not just once, and not just at training time). This is especially useful in crowded scenes: for example, in Figure 7, the scene is ambiguous because you could see an upside-down house in the middle, but this would leave the bottom rectangle and top triangle unexplained, so the routing-by-agreement algorithm will most likely converge to a better explanation: a boat at the bottom, and a house at the top. The ambiguity is said to be "explained away": the lower rectangle is best explained by the presence of a boat, which also explains the lower triangle, and once these two parts are explained away, the remaining parts are easily explained as a house.</p>
<figure class="center" id="id-6LDiA"><img alt="Routing by agreement can parse crowded scenes" width="75%" src="https://d3ansictanv2wj.cloudfront.net/Figure7-ab08e9c2bd11fbc7380b7933f4af5fd6.png"><figcaption><span class="label">Figure 7. </span>Routing by agreement can parse crowded scenes, such as this ambiguous image, which could be misinterpreted as an upside-down house plus some unexplained parts. Instead, the lower rectangle will be routed to the boat, and this will also pull the lower triangle into the boat as well. Once that boat is “explained away,” it’s easy to interpret the top part as a house. Image by Aurélien Géron.</figcaption></figure>
<p>And that’s it—you know the key ideas behind CapsNets! If you want more details, check out my two <a href="https://www.youtube.com/watch?v=pPN8d0E3900&amp;list=PLuTYjXW7aAt3HLCATBOkkXtifCVAm0O_A">videos on CapsNets</a> (one on the architecture and another on the implementation) and my <a href="https://github.com/ageron/handson-ml/blob/master/extra_capsnets.ipynb">commented TensorFlow implementation</a> (Jupyter Notebook). Please don’t hesitate to comment on the videos, file issues on GitHub if you see any, or <a href="https://twitter.com/aureliengeron">contact me on Twitter @aureliengeron</a>. I hope you found this post useful!</p>
<aside data-type="sidebar" id="id-5qBSE">
<div id="_ftn1">
<p><a href="#_ftn1"><sup>[1]</sup></a> This is the original architecture proposed in the paper <em>Dynamic routing with capsules</em>, by S. Sabour, N. Frosst, and G. Hinton, but since then, they proposed a more general architecture where the object’s presence probability and pose parameters are encoded differently in the output vector. The ideas remain the same, however.</p>
</div>
</aside>
<p>Continue reading <a href='https://www.oreilly.com/ideas/introducing-capsule-networks'>Introducing capsule networks.</a></p>Aurélien Géronhttps://www.oreilly.com/ideas/introducing-capsule-networksFour short links: 6 February 20182018-02-06T11:50:00Ztag:www.oreilly.com,2018-02-06:/ideas/four-short-links-6-february-2018<p><em>Mine Research, Fight for Attention, AI Metaphors, and Research Browser Extensions</em></p><ol>
<li>
<a href="https://github.com/daniel1noble/metaDigitise">metaDigitise</a> -- <i>Digitising functions in R for extracting data and summary statistics from figures in primary research papers.</i>
</li>
<li>
<a href="http://humanetech.com/">Center for Humane Technology</a> -- Silicon Valley tech insiders fighting against attention-vacuuming tech design. (via <a href="https://www.nytimes.com/2018/02/04/technology/early-facebook-google-employees-fight-tech.html"><em>New York Times</em></a>)</li>
<li>
<a href="https://cyberselves.org/2018/02/05/tools-substitutes-or-companions-metaphors-for-thinking-about-technology/">Tools, Substitutes, or Companions</a> -- <i>three metaphors for how we think about digital and robotic technologies.</i> (via <a href="https://twitter.com/tomstafford/status/960642334668001280">Tom Stafford</a>)</li>
<li>
<a href="http://unpaywall.org/">Unpaywall</a> -- browser extension. <i>Click the green tab and skip the paywall on millions of peer-reviewed journal articles. It's fast, free, and legal.</i> Pair with the <a href="https://openaccessbutton.org/">open access button</a>. (via <a href="https://twitter.com/swatlibrary/status/960595237352755200">Swarthmore Libraries</a>)</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-6-february-2018'>Four short links: 6 February 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-6-february-2018Integrating continuous testing for improved open source security2018-02-06T11:00:00Ztag:www.oreilly.com,2018-02-06:/ideas/integrating-continuous-testing-for-improved-open-source-security<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/keyboard-65042_1280-7ea2b64fc395d79839ea41c8f9bf062e.jpg'/></p><p><em>Testing to prevent vulnerable open source libraries.</em></p>
<h2>Integrating Testing to Prevent Vulnerable Libraries</h2>
<p>Once you’ve found and fixed (or at least acknowledged) the security flaws in the libraries you use, it’s time to look into tackling this problem continuously.</p>
<p>There are two ways for additional vulnerabilities to show up in your dependencies:</p><p>Continue reading <a href='https://www.oreilly.com/ideas/integrating-continuous-testing-for-improved-open-source-security'>Integrating continuous testing for improved open source security.</a></p>Guy Podjarnyhttps://www.oreilly.com/ideas/integrating-continuous-testing-for-improved-open-source-securityWhy I won't whitelist your site2018-02-05T12:00:00Ztag:www.oreilly.com,2018-02-05:/ideas/why-i-wont-whitelist-your-site<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/30422177753_f8ca293003_k_crop-ce508a3a685401809dd5d050211e8846.jpg'/></p><p><em>Publishers need to take responsibility for code they run on my systems.</em></p><p>Many internet users—perhaps most—use an ad blocker. I’m one of them. All of us are familiar with the sites that won’t let us in without whitelisting them, or (only somewhat better) that repeatedly nag us to whitelist.</p>
<p>I’m not whitelisting anyone. I don’t have any fundamental problem with advertising; I wish ads weren’t as intrusive, and I believe advertisers would be better served by advertisements that had more respect for their viewers. But that’s not really why I use an ad blocker.</p>
<p>The real problem with ads is that they’re a <a href="https://en.wikipedia.org/wiki/Malvertising">vector for malware</a>. It’s relatively easy to fold malware into otherwise-innocent advertisements, and that malware executes even if you don’t click on the ads. I’ve received malware from sites as otherwise legitimate as the BBC, and there are reports of malware from virtually every major online publisher—including sites like <a href="https://www.engadget.com/2016/01/08/you-say-advertising-i-say-block-that-malware/">Forbes</a> that won’t let you in if you don’t whitelist them. The <em>New York Times</em>, Reuters, MSN, and many others have <a href="https://www.avg.com/en/signal/what-is-malvertising">all spread malware</a>.</p>
<p>And no one takes responsibility for the advertisements or the damage they cause. The publishers just say “hey, we don’t control the ads; that’s the ad placement company.” The advertisers similarly say “hey, our ads come from a marketing firm, and they use some kind of web contractor to do the coding.” And the ad placement companies and marketing firms? All you get from them is the sound of silence.</p>
<p>Here’s the deal. I’m willing to whitelist any online publisher that will agree to a license in which they take responsibility for any code they run on my systems. Call it a EULA for using my browser on my computer. If you deliver malware, you will pay for the damages: my lost time, my lost data. If the idea catches on, managing all the contracts sounds like a problem, but I think it’s a business opportunity. Something would be needed to track all the licenses in an authoritative ledger. This sounds like an application for a blockchain. Maybe even a blockchain startup.</p>
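<p>For what it's worth, the core data structure is not exotic. A toy Python sketch of an append-only, tamper-evident ledger of such licenses might look like this (purely illustrative; nothing here is a real product, protocol, or legal instrument):</p>
<pre><code>import hashlib
import json
import time

def add_entry(chain, record):
    """Append a record; each entry commits to the previous entry's hash,
    so rewriting history invalidates everything after the rewrite."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"record": record, "prev": prev_hash, "ts": time.time()}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body

ledger = []
add_entry(ledger, {"publisher": "example.com",
                   "license": "liable-for-delivered-code-v1"})
print(ledger[-1]["hash"])
</code></pre>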
<p>If I really need to read something on your site, and you won’t let me in because I am running an ad blocker, I might read your site anyway. That’s trivial—I have four or five browsers on all of my machines, and not all of them have ad blockers installed. But I won’t link to you, quote you, or tweet you. You’re dead to me.</p>
<p>I’ve been asked whether I have any proposals for a business model other than advertising. Not really. Though my employer, O’Reilly Media, does a bit of online publishing, and we don’t take advertising. But advising publishers on their business model isn’t my job—and they’ve yet to ask me for advice, anyway. My job is keeping my systems safe, and that requires keeping malware out.</p>
<p>Again, I have nothing against advertising as a business model. However, that model (and the businesses relying on it) deserves to fail if publishers won’t take responsibility for the ads they deliver. While I understand that publishers don’t control the ads, and don’t have the technical expertise to inspect the ads they deliver, they are the ones that deliver the ads. They bear the responsibility for damages.</p>
<p>Could this be a movement? Can we imagine a future with ad blockers that would let ads through if, and only if, the publisher has agreed to a license that allows users to recover damages from advertising-spread malware?</p>
<p>I’m in.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/why-i-wont-whitelist-your-site'>Why I won't whitelist your site.</a></p>Mike Loukideshttps://www.oreilly.com/ideas/why-i-wont-whitelist-your-siteFour short links: 5 February 20182018-02-05T11:35:00Ztag:www.oreilly.com,2018-02-05:/ideas/four-short-links-5-february-2018<p><em>Company Principles, DeepFake, AGI, and Missing Devices</em></p><ol>
<li>
<a href="http://bit.ly/2E4prxu">Principles of Technology Leadership</a> (Bryan Cantrill) -- (<a href="https://www.slideshare.net/bcantrill/principles-of-technology-leadership">slides</a>) what cultural values and principles do you want to guide *your* company? (via <a href="http://dtrace.org/blogs/bmc/2018/02/03/talks/">Bryan Cantrill</a>)</li>
<li>
<a href="http://svencharleer.com/blog/2018/02/02/family-fun-with-deepfakes-or-how-i-got-my-wife-onto-the-tonight-show/">Fun With DeepFakes; or How I Got My Wife on The Tonight Show</a> -- this is going to further erode trust. How can you know what happened if all evidence can be convincingly faked? (via <a href="https://simonwillison.net/2018/Feb/2/family-fun-with-deepfakes/">Simon Willison</a>)</li>
<li>
<a href="https://agi.mit.edu/">MIT 6.S099: Artificial General Intelligence</a> -- <i>The lectures will introduce our current understanding of computational intelligence and ways in which strong AI could possibly be achieved, with insights from deep learning, reinforcement learning, computational neuroscience, robotics, cognitive modeling, psychology, and more. Additional topics will include AI safety and ethics.</i> Worth noting that we can't build an artificial general intelligence right now, and may never be able to. Don't freak out because of the course headline.</li>
<li>
<a href="https://www.eff.org/missing-devices">Catalog of Missing Devices</a> (EFF) -- <i>Things we’d pay money for—things you could earn money with—don’t exist thanks to the chilling effects of an obscure copyright law: Section 1201 of the Digital Millennium Copyright Act (DMCA 1201)</i>. From "third-party consumables for 3D printers" to an "ads-free YouTube for Kids," they're good ideas.</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-5-february-2018'>Four short links: 5 February 2018.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/-RooGfltJ4k" height="1" width="1" alt=""/>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-5-february-2018Why product managers should master the art of user story writing2018-02-02T11:30:00Ztag:www.oreilly.com,2018-02-02:/ideas/why-product-managers-should-master-the-art-of-user-story-writing<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/typewriter-801921_1920_crop-a72889e7ee69f53476bc62df13bc4b01.jpg'/></p><p><em>A well-written user story allows product managers to clearly communicate to their Agile development teams.</em></p><p>Continue reading <a href='https://www.oreilly.com/ideas/why-product-managers-should-master-the-art-of-user-story-writing'>Why product managers should master the art of user story writing.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/FgJtJU_oSh0" height="1" width="1" alt=""/>Ryan Harperhttps://www.oreilly.com/ideas/why-product-managers-should-master-the-art-of-user-story-writingFour short links: 2 February 20182018-02-02T11:20:00Ztag:www.oreilly.com,2018-02-02:/ideas/four-short-links-2-february-2018<p><em>Digitize and Automate, Video Editor, AI + Humans, and Modest JavaScript</em></p><ol>
<li>
<a href="http://fortune.com/2018/01/30/port-automation-robots-container-ships/">Port Automation</a> (Fortune) -- <i>By digitizing and automating activities once handled by human crane operators and cargo haulers, seaports can reduce the amount of time ships sit in port and otherwise boost port productivity by up to 30%.</i> "Digitize and automate" will be the mantra of the next decade.</li>
<li>
<a href="https://www.shotcutapp.com/">Shot Cut App</a> -- <i>a free, open source, cross-platform video editor.</i>
</li>
<li>
<a href="https://www.oreilly.com/ideas/the-working-relationship-between-ais-and-humans-isnt-master-slave">The Working Relationship Between Humans and AI</a> (Mike Loukides) -- <i>Whether we're talking about doctors, lawyers, engineers, Go players, or taxi drivers, we shouldn't expect AI systems to give us unchallengeable answers ex silico. We shouldn't be told that we need to "trust AI." What's important is the conversation.</i>
</li>
<li>
<a href="https://stimulusjs.org/">Stimulus</a>-- <i>modest JavaScript framework for the HTML you already have</i>.</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-2-february-2018'>Four short links: 2 February 2018.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/uDTGHCdmjD4" height="1" width="1" alt=""/>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-2-february-2018Logo detection using Apache MXNet2018-02-01T20:25:00Ztag:www.oreilly.com,2018-02-01:/ideas/logo-detection-using-apache-mxnet<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/image-grid-crop-2-2b71ecd665c16bf01275b1f0e71e5ea3.jpg'/></p><p><em>Image recognition and machine learning for mar tech and ad tech.</em></p><p>Digital marketing is the marketing of products, services, and offerings on digital platforms. Advertising technology, commonly known as "ad tech," is the use of digital technologies by vendors, brands, and their agencies to target potential clients, deliver personalized messages and offerings, and analyze the impact of online spending: sponsored stories on Facebook newsfeeds; Instagram stories; ads that play on YouTube before the video content begins; the recommended links at the end of a CNN article, powered by Outbrain—these all are examples of ad tech at work. </p>
<p>In the past year, deep learning has seen significant use in digital marketing and ad tech.</p>
<p>In this article, we will delve into one part of a popular use case: mining the Web for celebrity endorsements. Along the way, we’ll see the relative value of deep learning architectures, run actual experiments, learn the effects of data sizes, and see how to augment the data when we don’t have enough.</p>
<p><strong>Use case overview</strong></p>
<p>In this article, we will see how to build a deep learning classifier that predicts the company, given an image containing its logo. This section provides an overview of where this model could be used.</p>
<p>Celebrities endorse a number of products. Quite often, they post pictures on social media showing off a brand they endorse. A typical post contains an image featuring the celebrity, along with some text they have written. The brand, in turn, is eager to learn about the appearance of such postings, and to show them to potential customers who might be influenced by them.</p>
<p>The ad tech application, therefore, works as follows: large numbers of postings are fed to a processor that figures out the celebrity, the brand, and the message. Then, for each potential customer, the machine learning model generates a very specific advertisement based on the time, location, message, brand, customers' preferred brands, and other things. Another model identifies the target customer base. And the targeted ad is now sent.</p>
<p>Figure 1 shows the workflow:</p>
<figure class="center" id="id-5meiz"><img alt="Celebrity brand-endorsement bot workflow" src="https://d3ansictanv2wj.cloudfront.net/adtech2-6a689483bddf022b8a5025f60cee5efd.png"><figcaption><span class="label">Figure 1. </span>Celebrity brand-endorsement bot workflow. Image by Tuhin Sharma.</figcaption></figure>
<p>As you can see, the system is composed of a number of machine learning models. </p>
<p>Consider the image. The picture could have been taken in any setting. The first goal is to identify the objects and the celebrity in the picture. This is done by <em>object detection</em> models. Then, the next step is to identify the brand, if one appears. The easiest way to identify the brand is by its logo.</p>
<p>In this article, we will look into building a deep learning model to identify a brand by its logo in an image. Subsequent articles will talk about building some of the other pieces of the bot (object detection, text generation, etc.).</p>
<p><strong>Problem definition</strong></p>
<p>The problem addressed in this article is: given an image, predict the company (brand) in the image by identifying the logo.</p>
<p><strong>Data</strong></p>
<p>To build machine learning models, access to high-quality data sets is imperative. In real life, data scientists would work with brand managers and agencies to collect all possible logos.</p>
<p>For the purpose of this article, we will leverage the <a href="http://www.multimedia-computing.de/flickrlogos/">FlickrLogo</a> data set. This data set has real-world images from Flickr, a popular photo sharing website. The <a href="http://www.multimedia-computing.de/flickrlogos/">FlickrLogo</a> page has instructions on how to download the data. Please download the data if you want to use the code in this article to build your own models.</p>
<p><strong>Models</strong></p>
<p>Identifying a brand from its logo is a classic computer vision problem. In the past few years, deep learning has become the state of the art for computer vision problems. We will build deep learning models for this use case.</p>
<p><strong>Software</strong></p>
<p>In our <a href="https://www.oreilly.com/ideas/uncovering-hidden-patterns-through-machine-learning">previous article</a>, we talked about the strengths of <code>Apache MXNet</code>. We also talked about <code>Gluon</code>, the simpler interface on top of <code>MXNet</code>. Both are extremely powerful and allow deep learning engineers to experiment rapidly with various model architectures. </p>
<p>Let's now get to the code.</p>
<p><strong>Libraries</strong></p>
<p>Let's first import the libraries we need for building the models:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="kn">import</code> <code class="nn">mxnet</code> <code class="kn">as</code> <code class="nn">mx</code>
<code class="kn">import</code> <code class="nn">cv2</code>
<code class="kn">from</code> <code class="nn">pathlib</code> <code class="kn">import</code> <code class="n">Path</code>
<code class="kn">import</code> <code class="nn">os</code>
<code class="kn">from</code> <code class="nn">time</code> <code class="kn">import</code> <code class="n">time</code>
<code class="kn">import</code> <code class="nn">shutil</code>
<code class="kn">import</code> <code class="nn">matplotlib.pyplot</code> <code class="kn">as</code> <code class="nn">plt</code>
<code class="o">%</code><code class="n">matplotlib</code> <code class="n">inline</code>
</pre>
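<p>If any of these libraries are missing, they can be installed from within the notebook. This is a typical setup, not a requirement of the article; swap <code>mxnet</code> for a CUDA build such as <code>mxnet-cu80</code> if you want GPU support.</p>
<pre data-code-language="python" data-type="programlisting"><code># Optional one-time setup in a notebook environment
!pip install mxnet opencv-python matplotlib
</code></pre>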
<p><strong>Load the data</strong></p>
<p>From the FlickrLogos data sets, let's use the FlickrLogos-32 data set. <code>&lt;flickrlogos-url&gt;</code> is the URL to this data set. </p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="o">%%</code><code class="n">capture</code>
<code class="err">!</code><code class="n">wget</code> <code class="o">-</code><code class="n">nc</code> <code class="o">&lt;</code><code class="n">flickrlogos</code><code class="o">-</code><code class="n">url</code><code class="o">&gt;</code> <code class="c1"># Replace with the URL to the dataset</code>
<code class="err">!</code><code class="n">unzip</code> <code class="o">-</code><code class="n">n</code> <code class="o">./</code><code class="n">FlickrLogos</code><code class="o">-</code><code class="mi">32</code><code class="n">_dataset_v2</code><code class="o">.</code><code class="n">zip</code>
</pre>
<p><strong>Data preparation</strong></p>
<p>The next step is to create the following data sets:</p>
<ol>
<li>Train</li>
<li>Validation</li>
<li>Test</li>
</ol>
<p>The FlickrLogos-32 data set already includes train, validation, and test splits, dividing the images as follows:</p>
<ul>
<li>The train data set has 32 classes, each containing 10 images.</li>
<li>The validation data set has 3,960 images, of which 3,000 images have no logos.</li>
<li>The test data set has 3,960 images.</li>
</ul>
<p>While the train images all have logos, the validation and test sets include many images without logos. We want to build a model that generalizes well: one that predicts correctly on images that weren't used for training (the validation and test images). </p>
<p>To make training faster and more accurate for the purposes of this article, we will move 50% of the no-logo images from the validation set into the training set. The training set thus grows to 1,820 images (after adding 1,500 no-logo images from the validation set), and the validation set shrinks to 2,460 images (after those 1,500 no-logo images move out). In a real-life setting, we would experiment with different model architectures and choose the one that performs well on the actual validation and test data sets.</p>
<p>Next, define the directory where the data is stored.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">data_directory</code> <code class="o">=</code> <code class="s2">"./FlickrLogos-v2/"</code>
</pre>
<p>Now, define the path to the train, test, and validation data sets. For validation, we define two paths: one for the images containing logos and one for the rest of the images without logos. </p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">train_logos_list_filename</code> <code class="o">=</code> <code class="n">data_directory</code><code class="o">+</code><code class="s2">"trainset.relpaths.txt"</code>
<code class="n">val_logos_list_filename</code> <code class="o">=</code> <code class="n">data_directory</code><code class="o">+</code><code class="s2">"valset-logosonly.relpaths.txt"</code>
<code class="n">val_nonlogos_list_filename</code> <code class="o">=</code> <code class="n">data_directory</code><code class="o">+</code><code class="s2">"valset-nologos.relpaths.txt"</code>
<code class="n">test_list_filename</code> <code class="o">=</code> <code class="n">data_directory</code><code class="o">+</code><code class="s2">"testset.relpaths.txt"</code>
</pre>
<p>Let's now read the filenames for the train, test, and validation (logo and non-logo) images from the lists just defined.</p>
<p>These lists ship with the FlickrLogos data set, which has already categorized the images as train, test, validation with logos, and validation without logos.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># List of train images </code>
<code class="k">with</code> <code class="nb">open</code><code class="p">(</code><code class="n">train_logos_list_filename</code><code class="p">)</code> <code class="k">as</code> <code class="n">f</code><code class="p">:</code>
<code class="n">train_logos_filename</code> <code class="o">=</code> <code class="n">f</code><code class="o">.</code><code class="n">read</code><code class="p">()</code><code class="o">.</code><code class="n">splitlines</code><code class="p">()</code>
</pre>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># List of validation images without logos</code>
<code class="k">with</code> <code class="nb">open</code><code class="p">(</code><code class="n">val_nonlogos_list_filename</code><code class="p">)</code> <code class="k">as</code> <code class="n">f</code><code class="p">:</code>
<code class="n">val_nonlogos_filename</code> <code class="o">=</code> <code class="n">f</code><code class="o">.</code><code class="n">read</code><code class="p">()</code><code class="o">.</code><code class="n">splitlines</code><code class="p">()</code>
</pre>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># List of validation images with logos </code>
<code class="k">with</code> <code class="nb">open</code><code class="p">(</code><code class="n">val_logos_list_filename</code><code class="p">)</code> <code class="k">as</code> <code class="n">f</code><code class="p">:</code>
<code class="n">val_logos_filename</code> <code class="o">=</code> <code class="n">f</code><code class="o">.</code><code class="n">read</code><code class="p">()</code><code class="o">.</code><code class="n">splitlines</code><code class="p">()</code>
</pre>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># List of test images </code>
<code class="k">with</code> <code class="nb">open</code><code class="p">(</code><code class="n">test_list_filename</code><code class="p">)</code> <code class="k">as</code> <code class="n">f</code><code class="p">:</code>
<code class="n">test_filenames</code> <code class="o">=</code> <code class="n">f</code><code class="o">.</code><code class="n">read</code><code class="p">()</code><code class="o">.</code><code class="n">splitlines</code><code class="p">()</code>
</pre>
<p>Now, move some of the validation images without logos to the set of train images. This set will end up with all the train images and 50% of no-logo images from the validation data set. The validation set will end up with all the validation images that have logos and the remaining 50% of no-logo images.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">train_filenames</code> <code class="o">=</code> <code class="n">train_logos_filename</code> <code class="o">+</code> <code class="n">val_nonlogos_filename</code><code class="p">[</code><code class="mi">0</code><code class="p">:</code><code class="nb">int</code><code class="p">(</code><code class="nb">len</code><code class="p">(</code><code class="n">val_nonlogos_filename</code><code class="p">)</code><code class="o">/</code><code class="mi">2</code><code class="p">)]</code>
</pre>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">val_filenames</code> <code class="o">=</code> <code class="n">val_logos_filename</code> <code class="o">+</code> <code class="n">val_nonlogos_filename</code><code class="p">[</code><code class="nb">int</code><code class="p">(</code><code class="nb">len</code><code class="p">(</code><code class="n">val_nonlogos_filename</code><code class="p">)</code><code class="o">/</code><code class="mi">2</code><code class="p">):]</code>
</pre>
<p>To verify what we’ve done, let's print the number of images in the train, test, and validation data sets.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">print</code><code class="p">(</code><code class="s2">"Number of Training Images : "</code><code class="p">,</code><code class="nb">len</code><code class="p">(</code><code class="n">train_filenames</code><code class="p">))</code>
<code class="k">print</code><code class="p">(</code><code class="s2">"Number of Validation Images : "</code><code class="p">,</code><code class="nb">len</code><code class="p">(</code><code class="n">val_filenames</code><code class="p">))</code>
<code class="k">print</code><code class="p">(</code><code class="s2">"Number of Testing Images : "</code><code class="p">,</code><code class="nb">len</code><code class="p">(</code><code class="n">test_filenames</code><code class="p">))</code>
</pre>
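<p>Given the split described above, the output should look like this:</p>
<pre data-code-language="python" data-type="programlisting"><code>Number of Training Images :  1820
Number of Validation Images :  2460
Number of Testing Images :  3960
</code></pre>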
<p>The next step in the data preparation process is to set the folder paths in a way that makes model training easy.</p>
<p>We need the folder structure to be like Figure 2.</p>
<figure class="center" id="id-meiPz"><img alt="folder structure for data" src="https://d3ansictanv2wj.cloudfront.net/tree1-4d4e3a7bad5754c9499dc131a699662d.png"><figcaption><span class="label">Figure 2. </span>Folder structure for data. Image by Tuhin Sharma.</figcaption></figure>
<p>The following function helps us create this structure.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">prepare_datesets</code><code class="p">(</code><code class="n">base_directory</code><code class="p">,</code><code class="n">filenames</code><code class="p">,</code><code class="n">dest_folder_name</code><code class="p">):</code>
<code class="k">for</code> <code class="n">filename</code> <code class="ow">in</code> <code class="n">filenames</code><code class="p">:</code>
<code class="n">image_src_path</code> <code class="o">=</code> <code class="n">base_directory</code><code class="o">+</code><code class="n">filename</code>
<code class="n">image_dest_path</code> <code class="o">=</code> <code class="n">image_src_path</code><code class="o">.</code><code class="n">replace</code><code class="p">(</code><code class="s1">'classes/jpg'</code><code class="p">,</code><code class="n">dest_folder_name</code><code class="p">)</code>
<code class="n">dest_directory_path</code> <code class="o">=</code> <code class="n">Path</code><code class="p">(</code><code class="n">os</code><code class="o">.</code><code class="n">path</code><code class="o">.</code><code class="n">dirname</code><code class="p">(</code><code class="n">image_dest_path</code><code class="p">))</code>
<code class="n">dest_directory_path</code><code class="o">.</code><code class="n">mkdir</code><code class="p">(</code><code class="n">parents</code><code class="o">=</code><code class="bp">True</code><code class="p">,</code><code class="n">exist_ok</code><code class="o">=</code><code class="bp">True</code><code class="p">)</code>
<code class="n">shutil</code><code class="o">.</code><code class="n">copy2</code><code class="p">(</code><code class="n">image_src_path</code><code class="p">,</code> <code class="n">image_dest_path</code><code class="p">)</code>
</pre>
<p>Call this function to create the train, validation, and test folders with the images placed under them within their respective classes.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">prepare_datesets</code><code class="p">(</code><code class="n">base_directory</code><code class="o">=</code><code class="n">data_directory</code><code class="p">,</code><code class="n">filenames</code><code class="o">=</code><code class="n">train_filenames</code><code class="p">,</code><code class="n">dest_folder_name</code><code class="o">=</code><code class="s1">'train_data'</code><code class="p">)</code>
<code class="n">prepare_datesets</code><code class="p">(</code><code class="n">base_directory</code><code class="o">=</code><code class="n">data_directory</code><code class="p">,</code><code class="n">filenames</code><code class="o">=</code><code class="n">val_filenames</code><code class="p">,</code><code class="n">dest_folder_name</code><code class="o">=</code><code class="s1">'val_data'</code><code class="p">)</code>
<code class="n">prepare_datesets</code><code class="p">(</code><code class="n">base_directory</code><code class="o">=</code><code class="n">data_directory</code><code class="p">,</code><code class="n">filenames</code><code class="o">=</code><code class="n">test_filenames</code><code class="p">,</code><code class="n">dest_folder_name</code><code class="o">=</code><code class="s1">'test_data'</code><code class="p">)</code>
</pre>
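<p>As a quick, optional sanity check (the path below assumes the folder structure in Figure 2), we can count the class folders created under the new training directory:</p>
<pre data-code-language="python" data-type="programlisting"><code># Expect 33 class folders: 32 logo brands plus the no-logo class
print(len(os.listdir(data_directory + 'train_data/')))
</code></pre>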
<p>The next step is to define the <em>hyperparameters</em> for the model.</p>
<p>We have 33 classes (32 logos and 1 non-logo). The data size isn't huge, so we will use only one GPU. We will train for 20 epochs and use 40 as the batch size for training.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">batch_size</code> <code class="o">=</code> <code class="mi">40</code>
<code class="n">num_classes</code> <code class="o">=</code> <code class="mi">33</code>
<code class="n">num_epochs</code> <code class="o">=</code> <code class="mi">20</code>
<code class="n">num_gpu</code> <code class="o">=</code> <code class="mi">1</code>
<code class="n">ctx</code> <code class="o">=</code> <code class="p">[</code><code class="n">mx</code><code class="o">.</code><code class="n">gpu</code><code class="p">(</code><code class="n">i</code><code class="p">)</code> <code class="k">for</code> <code class="n">i</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">num_gpu</code><code class="p">)]</code>
</pre>
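<p>If no GPU is available, a minimal fallback (training will be slower) is to use the CPU context instead:</p>
<pre data-code-language="python" data-type="programlisting"><code># Fallback for machines without a GPU
ctx = [mx.cpu()]
</code></pre>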
<p><strong>Data pre-processing</strong></p>
<p>Once the images are loaded, we need to ensure they are all the same size. We will standardize every image to 224 * 224 pixels. </p>
<p>We have 1,820 training images, which is really not much data. Is there a smart way to get more? Astoundingly, yes. An image, when flipped, still means the same thing, at least for logos. A random crop of a logo is also still the same logo. </p>
<p>So, we do not need to add images for the purposes of our training, but instead can transform some of the existing images by flipping them and cropping them. This helps us get a more robust model.</p>
<p>Let's flip each training image horizontally with 50% probability and randomly crop it to 224 * 224 pixels.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">train_augs</code> <code class="o">=</code> <code class="p">[</code>
<code class="n">mx</code><code class="o">.</code><code class="n">image</code><code class="o">.</code><code class="n">HorizontalFlipAug</code><code class="p">(</code><code class="o">.</code><code class="mi">5</code><code class="p">),</code>
<code class="n">mx</code><code class="o">.</code><code class="n">image</code><code class="o">.</code><code class="n">RandomCropAug</code><code class="p">((</code><code class="mi">224</code><code class="p">,</code><code class="mi">224</code><code class="p">))</code>
<code class="p">]</code>
</pre>
<p>For the validation and test data sets, let's center crop each image to 224 * 224 pixels. All the images in the train, test, and validation data sets will now be 224 * 224 in size.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">val_test_augs</code> <code class="o">=</code> <code class="p">[</code>
<code class="n">mx</code><code class="o">.</code><code class="n">image</code><code class="o">.</code><code class="n">CenterCropAug</code><code class="p">((</code><code class="mi">224</code><code class="p">,</code><code class="mi">224</code><code class="p">))</code>
<code class="p">]</code>
</pre>
<p>To perform the transforms we want on images, define the function <code>transform</code>. Given an image, its label, and a list of augmentations, it applies each augmentation, moves the channel axis to the front, and returns the transformed image and label.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">transform</code><code class="p">(</code><code class="n">data</code><code class="p">,</code> <code class="n">label</code><code class="p">,</code> <code class="n">augs</code><code class="p">):</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">data</code><code class="o">.</code><code class="n">astype</code><code class="p">(</code><code class="s1">'float32'</code><code class="p">)</code>
<code class="k">for</code> <code class="n">aug</code> <code class="ow">in</code> <code class="n">augs</code><code class="p">:</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">aug</code><code class="p">(</code><code class="n">data</code><code class="p">)</code>
<code class="c1"># from (H x W x c) to (c x H x W)</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">nd</code><code class="o">.</code><code class="n">transpose</code><code class="p">(</code><code class="n">data</code><code class="p">,</code> <code class="p">(</code><code class="mi">2</code><code class="p">,</code><code class="mi">0</code><code class="p">,</code><code class="mi">1</code><code class="p">))</code>
<code class="k">return</code> <code class="n">data</code><code class="p">,</code> <code class="n">mx</code><code class="o">.</code><code class="n">nd</code><code class="o">.</code><code class="n">array</code><code class="p">([</code><code class="n">label</code><code class="p">])</code><code class="o">.</code><code class="n">asscalar</code><code class="p">()</code><code class="o">.</code><code class="n">astype</code><code class="p">(</code><code class="s1">'float32'</code><code class="p">)</code>
</pre>
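<p>As a quick sanity check (a sketch that assumes, as the code above does, an MXNet version whose augmenters accept a single HWC NDArray), we can push one random "image" through the training transform:</p>
<pre data-code-language="python" data-type="programlisting"><code># Illustrative only: a random 256 x 256 x 3 array standing in for an image,
# with a dummy label of 0
img = mx.nd.random.uniform(0, 255, shape=(256, 256, 3))
data, label = transform(img, 0, train_augs)
print(data.shape)  # (3, 224, 224): channels first after the transpose
</code></pre>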
<p><code>Gluon</code> has a utility function to load image files: <code>mx.gluon.data.vision.ImageFolderDataset</code>. It requires the data to be available in the folder structure illustrated in Figure 2. </p>
<p>The function takes in the following parameters:</p>
<ul>
<li>Path to the root directory where the images are stored</li>
<li>A flag indicating whether images should be loaded as greyscale or color (color is the default)</li>
<li>A function that takes the data (image) and its label and transforms them</li>
</ul>
<p>The following code shows how to apply the transformations as the images are loaded: </p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">train_imgs</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">vision</code><code class="o">.</code><code class="n">ImageFolderDataset</code><code class="p">(</code>
<code class="n">data_directory</code><code class="o">+</code><code class="s1">'train_data'</code><code class="p">,</code>
<code class="n">transform</code><code class="o">=</code><code class="k">lambda</code> <code class="n">X</code><code class="p">,</code> <code class="n">y</code><code class="p">:</code> <code class="n">transform</code><code class="p">(</code><code class="n">X</code><code class="p">,</code> <code class="n">y</code><code class="p">,</code> <code class="n">train_augs</code><code class="p">))</code>
</pre>
<p>Similarly, the transformations are applied to the validation and test data sets and are loaded.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">val_imgs</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">vision</code><code class="o">.</code><code class="n">ImageFolderDataset</code><code class="p">(</code>
<code class="n">data_directory</code><code class="o">+</code><code class="s1">'val_data'</code><code class="p">,</code>
<code class="n">transform</code><code class="o">=</code><code class="k">lambda</code> <code class="n">X</code><code class="p">,</code> <code class="n">y</code><code class="p">:</code> <code class="n">transform</code><code class="p">(</code><code class="n">X</code><code class="p">,</code> <code class="n">y</code><code class="p">,</code> <code class="n">val_test_augs</code><code class="p">))</code>
</pre>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">test_imgs</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">vision</code><code class="o">.</code><code class="n">ImageFolderDataset</code><code class="p">(</code>
<code class="n">data_directory</code><code class="o">+</code><code class="s1">'test_data'</code><code class="p">,</code>
<code class="n">transform</code><code class="o">=</code><code class="k">lambda</code> <code class="n">X</code><code class="p">,</code> <code class="n">y</code><code class="p">:</code> <code class="n">transform</code><code class="p">(</code><code class="n">X</code><code class="p">,</code> <code class="n">y</code><code class="p">,</code> <code class="n">val_test_augs</code><code class="p">))</code>
</pre>
<p><code>DataLoader</code> is the built-in utility function to load data from the data set, and it returns mini-batches of data. In the above steps, we have the train, validation, and test data sets defined ( <code>train_imgs</code>, <code>val_imgs</code>, <code>test_imgs</code> respectively). The <code>num_workers</code> attribute lets us define the number of multi-processing workers to use for data pre-processing.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">train_data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">DataLoader</code><code class="p">(</code><code class="n">train_imgs</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code><code class="n">num_workers</code><code class="o">=</code><code class="mi">1</code><code class="p">,</code> <code class="n">shuffle</code><code class="o">=</code><code class="bp">True</code><code class="p">)</code>
<code class="n">val_data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">DataLoader</code><code class="p">(</code><code class="n">val_imgs</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">num_workers</code><code class="o">=</code><code class="mi">1</code><code class="p">)</code>
<code class="n">test_data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">DataLoader</code><code class="p">(</code><code class="n">test_imgs</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">num_workers</code><code class="o">=</code><code class="mi">1</code><code class="p">)</code>
</pre>
<p>Now that the images are loaded, let's take a look at them. Let's write a utility function called <code>show_images</code> that displays the images as a grid:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">show_images</code><code class="p">(</code><code class="n">imgs</code><code class="p">,</code> <code class="n">nrows</code><code class="p">,</code> <code class="n">ncols</code><code class="p">,</code> <code class="n">figsize</code><code class="o">=</code><code class="bp">None</code><code class="p">):</code>
<code class="sd">"""plot a grid of images"""</code>
<code class="n">figsize</code> <code class="o">=</code> <code class="p">(</code><code class="n">ncols</code><code class="p">,</code> <code class="n">nrows</code><code class="p">)</code>
<code class="n">_</code><code class="p">,</code> <code class="n">figs</code> <code class="o">=</code> <code class="n">plt</code><code class="o">.</code><code class="n">subplots</code><code class="p">(</code><code class="n">nrows</code><code class="p">,</code> <code class="n">ncols</code><code class="p">,</code> <code class="n">figsize</code><code class="o">=</code><code class="n">figsize</code><code class="p">)</code>
<code class="k">for</code> <code class="n">i</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">nrows</code><code class="p">):</code>
<code class="k">for</code> <code class="n">j</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">ncols</code><code class="p">):</code>
<code class="n">figs</code><code class="p">[</code><code class="n">i</code><code class="p">][</code><code class="n">j</code><code class="p">]</code><code class="o">.</code><code class="n">imshow</code><code class="p">(</code><code class="n">imgs</code><code class="p">[</code><code class="n">i</code><code class="o">*</code><code class="n">ncols</code><code class="o">+</code><code class="n">j</code><code class="p">]</code><code class="o">.</code><code class="n">asnumpy</code><code class="p">())</code>
<code class="n">figs</code><code class="p">[</code><code class="n">i</code><code class="p">][</code><code class="n">j</code><code class="p">]</code><code class="o">.</code><code class="n">axes</code><code class="o">.</code><code class="n">get_xaxis</code><code class="p">()</code><code class="o">.</code><code class="n">set_visible</code><code class="p">(</code><code class="bp">False</code><code class="p">)</code>
<code class="n">figs</code><code class="p">[</code><code class="n">i</code><code class="p">][</code><code class="n">j</code><code class="p">]</code><code class="o">.</code><code class="n">axes</code><code class="o">.</code><code class="n">get_yaxis</code><code class="p">()</code><code class="o">.</code><code class="n">set_visible</code><code class="p">(</code><code class="bp">False</code><code class="p">)</code>
<code class="n">plt</code><code class="o">.</code><code class="n">show</code><code class="p">()</code>
</pre>
<p>Now, display the first 32 images in a 4 * 8 grid (4 rows of 8 images):</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">for</code> <code class="n">X</code><code class="p">,</code> <code class="n">_</code> <code class="ow">in</code> <code class="n">train_data</code><code class="p">:</code>
<code class="c1"># from (B x c x H x W) to (Bx H x W x c)</code>
<code class="n">X</code> <code class="o">=</code> <code class="n">X</code><code class="o">.</code><code class="n">transpose</code><code class="p">((</code><code class="mi">0</code><code class="p">,</code><code class="mi">2</code><code class="p">,</code><code class="mi">3</code><code class="p">,</code><code class="mi">1</code><code class="p">))</code><code class="o">.</code><code class="n">clip</code><code class="p">(</code><code class="mi">0</code><code class="p">,</code><code class="mi">255</code><code class="p">)</code><code class="o">/</code><code class="mi">255</code>
<code class="n">show_images</code><code class="p">(</code><code class="n">X</code><code class="p">,</code> <code class="mi">4</code><code class="p">,</code> <code class="mi">8</code><code class="p">)</code>
<code class="k">break</code>
</pre>
<figure class="center" id="id-2ril8"><img alt="grid of images" src="https://d3ansictanv2wj.cloudfront.net/grid-a8440586e150c81d9ac7812c4b7c8017.png"><figcaption><span class="label">Figure 3. </span>Grid of images after transformations are performed. Image by Tuhin Sharma.</figcaption></figure>
<p>Results are shown in Figure 3. Some of the images seem to contain logos, often truncated by the random crop.</p>
<p><strong>Utility functions for training</strong></p>
<p>In this section, we will define utility functions to do the following:</p>
<ul>
<li>Get the data for the batch being currently processed</li>
<li>Evaluate the accuracy of the model</li>
<li>Train the model</li>
<li>Get the image, given a URL</li>
<li>Predict the image's label, given the image</li>
</ul>
<p>The first function, <code>_get_batch</code>, returns the data and label, given the batch.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">_get_batch</code><code class="p">(</code><code class="n">batch</code><code class="p">,</code> <code class="n">ctx</code><code class="p">):</code>
<code class="sd">"""return data and label on ctx"""</code>
<code class="n">data</code><code class="p">,</code> <code class="n">label</code> <code class="o">=</code> <code class="n">batch</code>
<code class="k">return</code> <code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">utils</code><code class="o">.</code><code class="n">split_and_load</code><code class="p">(</code><code class="n">data</code><code class="p">,</code> <code class="n">ctx</code><code class="p">),</code>
<code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">utils</code><code class="o">.</code><code class="n">split_and_load</code><code class="p">(</code><code class="n">label</code><code class="p">,</code> <code class="n">ctx</code><code class="p">),</code>
<code class="n">data</code><code class="o">.</code><code class="n">shape</code><code class="p">[</code><code class="mi">0</code><code class="p">])</code>
</pre>
<p>The function <code>evaluate_accuracy</code> returns the classification accuracy of the model. We have chosen a simple accuracy metric for the purpose of this article. In practice, the metric is chosen based on the application's needs. </p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">evaluate_accuracy</code><code class="p">(</code><code class="n">data_iterator</code><code class="p">,</code> <code class="n">net</code><code class="p">,</code> <code class="n">ctx</code><code class="p">):</code>
<code class="n">acc</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">nd</code><code class="o">.</code><code class="n">array</code><code class="p">([</code><code class="mi">0</code><code class="p">])</code>
<code class="n">n</code> <code class="o">=</code> <code class="mf">0.</code>
<code class="k">for</code> <code class="n">batch</code> <code class="ow">in</code> <code class="n">data_iterator</code><code class="p">:</code>
<code class="n">data</code><code class="p">,</code> <code class="n">label</code><code class="p">,</code> <code class="n">batch_size</code> <code class="o">=</code> <code class="n">_get_batch</code><code class="p">(</code><code class="n">batch</code><code class="p">,</code> <code class="n">ctx</code><code class="p">)</code>
<code class="k">for</code> <code class="n">X</code><code class="p">,</code> <code class="n">y</code> <code class="ow">in</code> <code class="nb">zip</code><code class="p">(</code><code class="n">data</code><code class="p">,</code> <code class="n">label</code><code class="p">):</code>
<code class="n">acc</code> <code class="o">+=</code> <code class="n">mx</code><code class="o">.</code><code class="n">nd</code><code class="o">.</code><code class="n">sum</code><code class="p">(</code><code class="n">net</code><code class="p">(</code><code class="n">X</code><code class="p">)</code><code class="o">.</code><code class="n">argmax</code><code class="p">(</code><code class="n">axis</code><code class="o">=</code><code class="mi">1</code><code class="p">)</code><code class="o">==</code><code class="n">y</code><code class="p">)</code><code class="o">.</code><code class="n">copyto</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">cpu</code><code class="p">())</code>
<code class="n">n</code> <code class="o">+=</code> <code class="n">y</code><code class="o">.</code><code class="n">size</code>
<code class="n">acc</code><code class="o">.</code><code class="n">wait_to_read</code><code class="p">()</code>
<code class="k">return</code> <code class="n">acc</code><code class="o">.</code><code class="n">asscalar</code><code class="p">()</code> <code class="o">/</code> <code class="n">n</code>
</pre>
<p>The next function we will define is the <code>train</code> function. This is by far the biggest function we will create in this article. </p>
<p>Given an existing model, the train, test, and validation data sets, the model is trained for the number of epochs specified. Our <a href="https://www.oreilly.com/ideas/uncovering-hidden-patterns-through-machine-learning">previous article</a> contained a more detailed overview of how this function works.</p>
<p>Whenever the best accuracy on the validation data set is found, the model is checkpointed. For each epoch, the train, validation, and test accuracies are printed.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">train</code><code class="p">(</code><code class="n">net</code><code class="p">,</code> <code class="n">ctx</code><code class="p">,</code> <code class="n">train_data</code><code class="p">,</code> <code class="n">val_data</code><code class="p">,</code> <code class="n">test_data</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">num_epochs</code><code class="p">,</code> <code class="n">model_prefix</code><code class="p">,</code> <code class="n">hybridize</code><code class="o">=</code><code class="bp">False</code><code class="p">,</code> <code class="n">learning_rate</code><code class="o">=</code><code class="mf">0.01</code><code class="p">,</code> <code class="n">wd</code><code class="o">=</code><code class="mf">0.001</code><code class="p">):</code>
<code class="n">net</code><code class="o">.</code><code class="n">collect_params</code><code class="p">()</code><code class="o">.</code><code class="n">reset_ctx</code><code class="p">(</code><code class="n">ctx</code><code class="p">)</code>
<code class="k">if</code> <code class="n">hybridize</code> <code class="o">==</code> <code class="bp">True</code><code class="p">:</code>
<code class="n">net</code><code class="o">.</code><code class="n">hybridize</code><code class="p">()</code>
<code class="n">loss</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">loss</code><code class="o">.</code><code class="n">SoftmaxCrossEntropyLoss</code><code class="p">()</code>
<code class="n">trainer</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">Trainer</code><code class="p">(</code><code class="n">net</code><code class="o">.</code><code class="n">collect_params</code><code class="p">(),</code> <code class="s1">'sgd'</code><code class="p">,</code> <code class="p">{</code>
<code class="s1">'learning_rate'</code><code class="p">:</code> <code class="n">learning_rate</code><code class="p">,</code> <code class="s1">'wd'</code><code class="p">:</code> <code class="n">wd</code><code class="p">})</code>
<code class="n">best_epoch</code> <code class="o">=</code> <code class="o">-</code><code class="mi">1</code>
<code class="n">best_acc</code> <code class="o">=</code> <code class="mf">0.0</code>
<code class="k">if</code> <code class="nb">isinstance</code><code class="p">(</code><code class="n">ctx</code><code class="p">,</code> <code class="n">mx</code><code class="o">.</code><code class="n">Context</code><code class="p">):</code>
<code class="n">ctx</code> <code class="o">=</code> <code class="p">[</code><code class="n">ctx</code><code class="p">]</code>
<code class="k">for</code> <code class="n">epoch</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">num_epochs</code><code class="p">):</code>
<code class="n">train_loss</code><code class="p">,</code> <code class="n">train_acc</code><code class="p">,</code> <code class="n">n</code> <code class="o">=</code> <code class="mf">0.0</code><code class="p">,</code> <code class="mf">0.0</code><code class="p">,</code> <code class="mf">0.0</code>
<code class="n">start</code> <code class="o">=</code> <code class="n">time</code><code class="p">()</code>
<code class="k">for</code> <code class="n">i</code><code class="p">,</code> <code class="n">batch</code> <code class="ow">in</code> <code class="nb">enumerate</code><code class="p">(</code><code class="n">train_data</code><code class="p">):</code>
<code class="n">data</code><code class="p">,</code> <code class="n">label</code><code class="p">,</code> <code class="n">batch_size</code> <code class="o">=</code> <code class="n">_get_batch</code><code class="p">(</code><code class="n">batch</code><code class="p">,</code> <code class="n">ctx</code><code class="p">)</code>
<code class="n">losses</code> <code class="o">=</code> <code class="p">[]</code>
<code class="k">with</code> <code class="n">mx</code><code class="o">.</code><code class="n">autograd</code><code class="o">.</code><code class="n">record</code><code class="p">():</code>
<code class="n">outputs</code> <code class="o">=</code> <code class="p">[</code><code class="n">net</code><code class="p">(</code><code class="n">X</code><code class="p">)</code> <code class="k">for</code> <code class="n">X</code> <code class="ow">in</code> <code class="n">data</code><code class="p">]</code>
<code class="n">losses</code> <code class="o">=</code> <code class="p">[</code><code class="n">loss</code><code class="p">(</code><code class="n">yhat</code><code class="p">,</code> <code class="n">y</code><code class="p">)</code> <code class="k">for</code> <code class="n">yhat</code><code class="p">,</code> <code class="n">y</code> <code class="ow">in</code> <code class="nb">zip</code><code class="p">(</code><code class="n">outputs</code><code class="p">,</code> <code class="n">label</code><code class="p">)]</code>
<code class="k">for</code> <code class="n">l</code> <code class="ow">in</code> <code class="n">losses</code><code class="p">:</code>
<code class="n">l</code><code class="o">.</code><code class="n">backward</code><code class="p">()</code>
<code class="n">train_loss</code> <code class="o">+=</code> <code class="nb">sum</code><code class="p">([</code><code class="n">l</code><code class="o">.</code><code class="n">sum</code><code class="p">()</code><code class="o">.</code><code class="n">asscalar</code><code class="p">()</code> <code class="k">for</code> <code class="n">l</code> <code class="ow">in</code> <code class="n">losses</code><code class="p">])</code>
<code class="n">trainer</code><code class="o">.</code><code class="n">step</code><code class="p">(</code><code class="n">batch_size</code><code class="p">)</code>
<code class="n">n</code> <code class="o">+=</code> <code class="n">batch_size</code>
<code class="n">train_acc</code> <code class="o">=</code> <code class="n">evaluate_accuracy</code><code class="p">(</code><code class="n">train_data</code><code class="p">,</code> <code class="n">net</code><code class="p">,</code> <code class="n">ctx</code><code class="p">)</code>
<code class="n">val_acc</code> <code class="o">=</code> <code class="n">evaluate_accuracy</code><code class="p">(</code><code class="n">val_data</code><code class="p">,</code> <code class="n">net</code><code class="p">,</code> <code class="n">ctx</code><code class="p">)</code>
<code class="n">test_acc</code> <code class="o">=</code> <code class="n">evaluate_accuracy</code><code class="p">(</code><code class="n">test_data</code><code class="p">,</code> <code class="n">net</code><code class="p">,</code> <code class="n">ctx</code><code class="p">)</code>
<code class="k">print</code><code class="p">(</code><code class="s2">"Epoch </code><code class="si">%d</code><code class="s2">. Loss: </code><code class="si">%.3f</code><code class="s2">, Train acc </code><code class="si">%.2f</code><code class="s2">, Val acc </code><code class="si">%.2f</code><code class="s2">, Test acc </code><code class="si">%.2f</code><code class="s2">, Time </code><code class="si">%.1f</code><code class="s2"> sec"</code> <code class="o">%</code> <code class="p">(</code>
<code class="n">epoch</code><code class="p">,</code> <code class="n">train_loss</code><code class="o">/</code><code class="n">n</code><code class="p">,</code> <code class="n">train_acc</code><code class="p">,</code> <code class="n">val_acc</code><code class="p">,</code> <code class="n">test_acc</code><code class="p">,</code> <code class="n">time</code><code class="p">()</code> <code class="o">-</code> <code class="n">start</code>
<code class="p">))</code>
<code class="k">if</code> <code class="n">val_acc</code> <code class="o">&gt;</code> <code class="n">best_acc</code><code class="p">:</code>
<code class="n">best_acc</code> <code class="o">=</code> <code class="n">val_acc</code>
<code class="k">if</code> <code class="n">best_epoch</code><code class="o">!=-</code><code class="mi">1</code><code class="p">:</code>
<code class="k">print</code><code class="p">(</code><code class="s1">'Deleting previous checkpoint...'</code><code class="p">)</code>
<code class="n">os</code><code class="o">.</code><code class="n">remove</code><code class="p">(</code><code class="n">model_prefix</code><code class="o">+</code><code class="s1">'-</code><code class="si">%d</code><code class="s1">.params'</code><code class="o">%</code><code class="p">(</code><code class="n">best_epoch</code><code class="p">))</code>
<code class="n">best_epoch</code> <code class="o">=</code> <code class="n">epoch</code>
<code class="k">print</code><code class="p">(</code><code class="s1">'Best validation accuracy found. Checkpointing...'</code><code class="p">)</code>
<code class="n">net</code><code class="o">.</code><code class="n">collect_params</code><code class="p">()</code><code class="o">.</code><code class="n">save</code><code class="p">(</code><code class="n">model_prefix</code><code class="o">+</code><code class="s1">'-</code><code class="si">%d</code><code class="s1">.params'</code><code class="o">%</code><code class="p">(</code><code class="n">epoch</code><code class="p">))</code>
</pre>
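<p>To make the call signature concrete, here is a hedged sketch of invoking <code>train</code>. The network below is a tiny throwaway Gluon model, purely illustrative; the model this article actually trains is defined later.</p>
<pre data-code-language="python" data-type="programlisting"><code>from mxnet.gluon import nn

# A toy network, only to illustrate the call; not the article's architecture
toy_net = nn.Sequential()
with toy_net.name_scope():
    toy_net.add(nn.Conv2D(channels=16, kernel_size=3, activation='relu'))
    toy_net.add(nn.GlobalAvgPool2D())
    toy_net.add(nn.Dense(num_classes))
toy_net.initialize(mx.init.Xavier())

train(toy_net, ctx, train_data, val_data, test_data,
      batch_size, num_epochs, model_prefix='toy-logo-model')
</code></pre>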
<p>The function <code>get_image</code> downloads the image at a given URL, displays it, and returns the local filename. We will use it to test the model's predictions.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">get_image</code><code class="p">(</code><code class="n">url</code><code class="p">,</code> <code class="n">show</code><code class="o">=</code><code class="bp">False</code><code class="p">):</code>
<code class="c1"># download and show the image</code>
<code class="n">fname</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">test_utils</code><code class="o">.</code><code class="n">download</code><code class="p">(</code><code class="n">url</code><code class="p">)</code>
<code class="n">img</code> <code class="o">=</code> <code class="n">cv2</code><code class="o">.</code><code class="n">cvtColor</code><code class="p">(</code><code class="n">cv2</code><code class="o">.</code><code class="n">imread</code><code class="p">(</code><code class="n">fname</code><code class="p">),</code> <code class="n">cv2</code><code class="o">.</code><code class="n">COLOR_BGR2RGB</code><code class="p">)</code>
<code class="n">img</code> <code class="o">=</code> <code class="n">cv2</code><code class="o">.</code><code class="n">resize</code><code class="p">(</code><code class="n">img</code><code class="p">,</code> <code class="p">(</code><code class="mi">224</code><code class="p">,</code> <code class="mi">224</code><code class="p">))</code>
<code class="n">plt</code><code class="o">.</code><code class="n">imshow</code><code class="p">(</code><code class="n">img</code><code class="p">)</code>
<code class="k">return</code> <code class="n">fname</code>
</pre>
<p>The final utility function we will define is <code>classify_logo</code>. Given the image and the model, the function returns the class of the image (in this case, the brand name) and its associated probability. </p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="k">def</code> <code class="nf">classify_logo</code><code class="p">(</code><code class="n">net</code><code class="p">,</code> <code class="n">url</code><code class="p">):</code>
<code class="n">fname</code> <code class="o">=</code> <code class="n">get_image</code><code class="p">(</code><code class="n">url</code><code class="p">)</code>
<code class="k">with</code> <code class="nb">open</code><code class="p">(</code><code class="n">fname</code><code class="p">,</code> <code class="s1">'rb'</code><code class="p">)</code> <code class="k">as</code> <code class="n">f</code><code class="p">:</code>
<code class="n">img</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">image</code><code class="o">.</code><code class="n">imdecode</code><code class="p">(</code><code class="n">f</code><code class="o">.</code><code class="n">read</code><code class="p">())</code>
<code class="n">data</code><code class="p">,</code> <code class="n">_</code> <code class="o">=</code> <code class="n">transform</code><code class="p">(</code><code class="n">img</code><code class="p">,</code> <code class="o">-</code><code class="mi">1</code><code class="p">,</code> <code class="n">val_test_augs</code><code class="p">)</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">data</code><code class="o">.</code><code class="n">expand_dims</code><code class="p">(</code><code class="n">axis</code><code class="o">=</code><code class="mi">0</code><code class="p">)</code>
<code class="n">out</code> <code class="o">=</code> <code class="n">net</code><code class="p">(</code><code class="n">data</code><code class="o">.</code><code class="n">as_in_context</code><code class="p">(</code><code class="n">ctx</code><code class="p">[</code><code class="mi">0</code><code class="p">]))</code>
<code class="n">out</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">nd</code><code class="o">.</code><code class="n">SoftmaxActivation</code><code class="p">(</code><code class="n">out</code><code class="p">)</code>
<code class="n">pred</code> <code class="o">=</code> <code class="nb">int</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">nd</code><code class="o">.</code><code class="n">argmax</code><code class="p">(</code><code class="n">out</code><code class="p">,</code> <code class="n">axis</code><code class="o">=</code><code class="mi">1</code><code class="p">)</code><code class="o">.</code><code class="n">asscalar</code><code class="p">())</code>
<code class="n">prob</code> <code class="o">=</code> <code class="n">out</code><code class="p">[</code><code class="mi">0</code><code class="p">][</code><code class="n">pred</code><code class="p">]</code><code class="o">.</code><code class="n">asscalar</code><code class="p">()</code>
<code class="n">label</code> <code class="o">=</code> <code class="n">train_imgs</code><code class="o">.</code><code class="n">synsets</code>
<code class="k">return</code> <code class="s1">'With prob=</code><code class="si">%f</code><code class="s1">, </code><code class="si">%s</code><code class="s1">'</code><code class="o">%</code><code class="p">(</code><code class="n">prob</code><code class="p">,</code> <code class="n">label</code><code class="p">[</code><code class="n">pred</code><code class="p">])</code>
</pre>
<p><strong>Model</strong></p>
<p>Understanding the model architecture is quite important. In our <a href="https://www.oreilly.com/ideas/uncovering-hidden-patterns-through-machine-learning">previous article</a>, we built a multi-layer perceptron (MLP). The architecture is shown in Figure 4.</p>
<figure class="center" id="id-YOix4"><img alt="multi-layer perceptron" src="https://d3ansictanv2wj.cloudfront.net/mlp1-e2c8d7f2c985a52d73e2d6694b95e89e.png"><figcaption><span class="label">Figure 4. </span>Multi-layer perceptron. Image by Tuhin Sharma.</figcaption></figure>
<p>What would the input layer for an MLP model look like? Our data is 224 * 224 pixels in size. </p>
<p>The most common way to build an input layer from an image is to flatten it, creating an input layer with 50,176 (224 * 224) neurons and ending up with a simple bit stream, as shown in Figure 5.</p>
<figure class="center" id="id-nGivO"><img alt="flattened input" src="https://d3ansictanv2wj.cloudfront.net/flatten1-789eae041758f1f777907f0df6405b41.png"><figcaption><span class="label">Figure 5. </span>Flattened input. Image by Tuhin Sharma.</figcaption></figure>
<p>But image data carries a lot of spatial information that is lost when such flattening is done. The other challenge is the number of weights: if the first hidden layer has 30 hidden neurons, the model will have 50,176 * 30 weights plus 30 bias units, more than 1.5 million parameters in all. So, this doesn't seem to be the right modeling approach for images. </p>
<p>Let's now discuss the more appropriate architecture: a convolutional neural network (CNN) for image classification.</p>
<p><strong>Convolutional neural network (CNN)</strong></p>
<p>CNNs are similar to MLPs in the sense that they, too, are made up of neurons whose weights we learn. The key difference is that the inputs are images, and the architecture lets us exploit their spatial properties. </p>
<p>CNNs have convolutional layers. The term "convolution" is taken from image processing, and the operation is illustrated in Figure 6. A convolution works on a small window, called a "receptive field," instead of all the inputs from the previous layer. This allows the model to learn localized features. </p>
<p>Each layer moves a small matrix, called a kernel, over the part of the image fed to that layer. It adjusts each pixel to reflect the pixels around it, an operation that helps identify edges. Figure 6 shows an image on the left, a 3x3 kernel in the middle, and the results of applying the kernel to the top-left pixel on the right. We can also define multiple kernels, representing different feature maps.</p>
<figure class="center" id="id-7XiE7"><img alt="convolutional layer" src="https://d3ansictanv2wj.cloudfront.net/cnn1-9fd75d7d1bd3fc2abcf4956adab5a692.png"><figcaption><span class="label">Figure 6. </span>Convolutional layer. Image by Tuhin Sharma.</figcaption></figure>
<p>In the example in Figure 6, the input image was 5x5 and the kernel was 3x3. The computation was an element-wise multiplication between the two matrices, summed to produce each output pixel. The output was 5x5.</p>
<p>To understand why the output is 5x5, we need to understand two parameters of the convolution layer: <em>stride</em> and <em>padding</em>.</p>
<p>Stride controls how the kernel (filter) moves along the image.</p>
<p>Figure 7 illustrates the movement of the kernel from the first pixel to the second.</p>
<figure class="center" id="id-aBiNq"><img alt="kernel movement" src="https://d3ansictanv2wj.cloudfront.net/cnn2-56e2a9d95bbb5635f1c38ea5ae67429d.png"><figcaption><span class="label">Figure 7. </span>Kernel movement. Image by Tuhin Sharma.</figcaption></figure>
<p>In Figure 7, the stride is 1.</p>
<p>When a 5x5 image is convolved with a 3x3 kernel and no padding, we get a 3x3 output. Now consider the case where we add zero padding around the image, so that the 5x5 image is surrounded by 0s. This is illustrated in Figure 8.</p>
<figure class="center" id="id-AEiKn"><img alt="zero padding" src="https://d3ansictanv2wj.cloudfront.net/padding1-de85fcb4f5d1c8a412c89468ac95048d.png"><figcaption><span class="label">Figure 8. </span>Zero padding. Image by Tuhin Sharma.</figcaption></figure>
<p>This, when convolved with a 3x3 kernel, results in a 5x5 output. </p>
<p>So the computation shown in Figure 6 used a stride of 1 and padding of size 1. </p>
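<p>To make the stride and padding arithmetic concrete, here is a minimal NumPy sketch of a single-channel convolution. It is written for clarity, not speed; the function name and defaults are ours, not part of any library:</p>
<pre data-code-language="python" data-type="programlisting">
import numpy as np

def conv2d(image, kernel, stride=1, padding=1):
    # Pad the image with zeros on all sides
    img = np.pad(image, padding, mode='constant')
    kh, kw = kernel.shape
    out_h = (img.shape[0] - kh) // stride + 1
    out_w = (img.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (window * kernel).sum()  # element-wise multiply, then sum
    return out

# A 5x5 image and a 3x3 kernel with stride 1 and padding 1
# give a 5x5 output, as in Figure 6
print(conv2d(np.ones((5, 5)), np.ones((3, 3))).shape)  # (5, 5)
</pre>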
<p>A CNN works with drastically fewer weights than the corresponding MLP. Say we use 30 kernels, each with 3x3 elements. Each kernel has 3x3 = 9 weights, plus 1 for the bias. This leads to 10 parameters per kernel, or 300 for 30 kernels. Contrast this against the more than 1.5 million weights for the MLP in the previous section.</p>
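<p>Spelled out in a couple of lines of Python, using the numbers above:</p>
<pre data-code-language="python" data-type="programlisting">
# Parameter counts from the discussion above
mlp_params = 50176 * 30 + 30      # flattened 224x224 input, 30 hidden neurons, biases
cnn_params = 30 * (3 * 3 + 1)     # 30 kernels, 3x3 weights plus one bias each
print(mlp_params, cnn_params)     # 1505310 vs. 300
</pre>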
<p>The next layer is typically a sub-sampling layer. Once we have identified the features, this sub-sampling layer simplifies the information. A common method is max pooling, which outputs the greatest value from each localized region of the output from the convolutional layer (see Figure 9).</p>
<figure class="center" id="id-kAiAw"><img alt="max pooling" src="https://d3ansictanv2wj.cloudfront.net/cnn3-b1a07a981eca06ce251634763ea3a28d.png"><figcaption><span class="label">Figure 9. </span>Max pooling. Image by Tuhin Sharma.</figcaption></figure>
<p>You can see that it reduces the output size, while preserving the maximum activation in every localized region. </p>
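<p>A minimal NumPy sketch of 2x2 max pooling (again, our own helper, not a library function) shows the idea:</p>
<pre data-code-language="python" data-type="programlisting">
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Slide a size x size window over the feature map and keep
    # only the largest value in each window
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i*stride:i*stride+size,
                                    j*stride:j*stride+size].max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 1, 8]])
print(max_pool(fmap))  # [[6. 4.]
                       #  [7. 9.]]
</pre>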
<p>A good resource for more information on CNNs is the online book <a href="http://neuralnetworksanddeeplearning.com/chap6.html">Neural Networks and Deep Learning</a>. Another good resource is Stanford University's <a href="http://cs231n.github.io/convolutional-networks/">CNN course</a>.</p>
<p>Now that we have covered the basics of CNNs, let’s implement one for our problem using <code>gluon</code>.</p>
<p>The first step is to define the architecture:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">cnn_net</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Sequential</code><code class="p">()</code>
<code class="k">with</code> <code class="n">cnn_net</code><code class="o">.</code><code class="n">name_scope</code><code class="p">():</code>
<code class="c1"># First convolutional layer</code>
<code class="n">cnn_net</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2D</code><code class="p">(</code><code class="n">channels</code><code class="o">=</code><code class="mi">96</code><code class="p">,</code> <code class="n">kernel_size</code><code class="o">=</code><code class="mi">11</code><code class="p">,</code> <code class="n">strides</code><code class="o">=</code><code class="p">(</code><code class="mi">4</code><code class="p">,</code><code class="mi">4</code><code class="p">),</code> <code class="n">activation</code><code class="o">=</code><code class="s1">'relu'</code><code class="p">))</code>
<code class="n">cnn_net</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">MaxPool2D</code><code class="p">(</code><code class="n">pool_size</code><code class="o">=</code><code class="mi">3</code><code class="p">,</code> <code class="n">strides</code><code class="o">=</code><code class="mi">2</code><code class="p">))</code>
<code class="c1"># Second convolutional layer</code>
<code class="n">cnn_net</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Conv2D</code><code class="p">(</code><code class="n">channels</code><code class="o">=</code><code class="mi">192</code><code class="p">,</code> <code class="n">kernel_size</code><code class="o">=</code><code class="mi">5</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="s1">'relu'</code><code class="p">))</code>
<code class="n">cnn_net</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">MaxPool2D</code><code class="p">(</code><code class="n">pool_size</code><code class="o">=</code><code class="mi">3</code><code class="p">,</code> <code class="n">strides</code><code class="o">=</code><code class="p">(</code><code class="mi">2</code><code class="p">,</code><code class="mi">2</code><code class="p">)))</code>
<code class="c1"># Flatten and apply fully connected layers</code>
<code class="n">cnn_net</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Flatten</code><code class="p">())</code>
<code class="n">cnn_net</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Dense</code><code class="p">(</code><code class="mi">4096</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="s2">"relu"</code><code class="p">))</code>
<code class="n">cnn_net</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Dense</code><code class="p">(</code><code class="n">num_classes</code><code class="p">))</code>
</pre>
<p>Now that the model architecture is defined, let's initialize the weights of the network. We will use the <a href="http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization">Xavier initializer</a>.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">cnn_net</code><code class="o">.</code><code class="n">collect_params</code><code class="p">()</code><code class="o">.</code><code class="n">initialize</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">init</code><code class="o">.</code><code class="n">Xavier</code><code class="p">(</code><code class="n">magnitude</code><code class="o">=</code><code class="mf">2.24</code><code class="p">),</code> <code class="n">ctx</code><code class="o">=</code><code class="n">ctx</code><code class="p">)</code>
</pre>
<p>Once the weights are initialized, we can train the model. We will call the same <code>train</code> function defined earlier and pass the required parameters for the function.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">train</code><code class="p">(</code><code class="n">cnn_net</code><code class="p">,</code> <code class="n">ctx</code><code class="p">,</code> <code class="n">train_data</code><code class="p">,</code> <code class="n">val_data</code><code class="p">,</code> <code class="n">test_data</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">num_epochs</code><code class="p">,</code><code class="n">model_prefix</code><code class="o">=</code><code class="s1">'cnn'</code><code class="p">)</code>
</pre>
<p><code>Epoch 0. Loss: 53.771, Train acc 0.77, Val acc 0.58, Test acc 0.72, Time 224.9 sec</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 1. Loss: 3.417, Train acc 0.80, Val acc 0.60, Test acc 0.73, Time 222.7 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 2. Loss: 3.333, Train acc 0.81, Val acc 0.60, Test acc 0.74, Time 222.5 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 3. Loss: 3.227, Train acc 0.82, Val acc 0.61, Test acc 0.75, Time 222.4 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 4. Loss: 3.079, Train acc 0.82, Val acc 0.61, Test acc 0.75, Time 222.0 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 5. Loss: 2.850, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 222.7 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 6. Loss: 2.488, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 222.1 sec</code><br>
<code>Epoch 7. Loss: 1.943, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec</code><br>
<code>Epoch 8. Loss: 1.395, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 223.6 sec</code><br>
<code>Epoch 9. Loss: 1.146, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 222.5 sec</code><br>
<code>Epoch 10. Loss: 1.089, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.5 sec</code><br>
<code>Epoch 11. Loss: 1.078, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 220.7 sec</code><br>
<code>Epoch 12. Loss: 1.078, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.1 sec</code><br>
<code>Epoch 13. Loss: 1.075, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec</code><br>
<code>Epoch 14. Loss: 1.076, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec</code><br>
<code>Epoch 15. Loss: 1.076, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 220.4 sec</code><br>
<code>Epoch 16. Loss: 1.075, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec</code><br>
<code>Epoch 17. Loss: 1.074, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.8 sec</code><br>
<code>Epoch 18. Loss: 1.074, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.8 sec</code><br>
<code>Epoch 19. Loss: 1.073, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 220.9 sec</code></p>
<p>We asked the model to run for 20 epochs. Typically, we train for many epochs and pick the model from the epoch where the validation accuracy is highest. Here, the log just shown tells us that the model's best validation accuracy came in epoch 5. After that, the model doesn't seem to have learned much; the network probably saturated, and learning slowed to a crawl. We’ll try out a better approach in the next section, but first we’ll see how our current model performs.</p>
<p>Collect the parameters from the epoch that had the best validation accuracy and assign them as our model parameters:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">cnn_net</code><code class="o">.</code><code class="n">collect_params</code><code class="p">()</code><code class="o">.</code><code class="n">load</code><code class="p">(</code><code class="s1">'cnn-</code><code class="si">%d</code><code class="s1">.params'</code><code class="o">%</code><code class="p">(</code><code class="mi">5</code><code class="p">),</code><code class="n">ctx</code><code class="p">)</code>
</pre>
<p>Let’s now check how the model performs on new data. We’ll fetch an easy-to-recognize image from the web (Figure 10) and see how the model does.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">img_url</code> <code class="o">=</code> <code class="s2">"http://sophieswift.com/wp-content/uploads/2017/09/pleasing-ideas-bmw-cake-and-satisfying-some-bmw-themed-cakes-crustncakes-delicious-cakes-128x128.jpg"</code>
<code class="n">classify_logo</code><code class="p">(</code><code class="n">cnn_net</code><code class="p">,</code> <code class="n">img_url</code><code class="p">)</code>
</pre>
<p><code>'With prob=0.081522, no-logo'</code></p>
<figure class="center" id="id-oDiyL"><img alt="BMW logo" src="https://d3ansictanv2wj.cloudfront.net/bmw-ae42be6f8eebd82a95a17d9292b16f5f.png"><figcaption><span class="label">Figure 10. </span>BMW logo. Image by Tuhin Sharma.</figcaption></figure>
<p>The model’s prediction is terrible: it predicts the image to contain no logo, with a probability of only 8%. The prediction is wrong, and the model’s confidence is quite weak.</p>
<p>Let’s try one more test image (see Figure 11) to see whether accuracy is any better.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">img_url</code> <code class="o">=</code> <code class="s2">"https://dtgxwmigmg3gc.cloudfront.net/files/59cdcd6f52ba0b36b5024500-icon-256x256.png"</code>
<code class="n">classify_logo</code><code class="p">(</code><code class="n">cnn_net</code><code class="p">,</code> <code class="n">img_url</code><code class="p">)</code>
</pre>
<p><code>'With prob=0.075301, no-logo'</code></p>
<figure class="center" id="id-vyirO"><img alt="foster’s logo" src="https://d3ansictanv2wj.cloudfront.net/fosters-71eaf7e44ea3dea55836926584853371.png"><figcaption><span class="label">Figure 11. </span>Foster’s logo. Image by Tuhin Sharma.</figcaption></figure>
<p>Yet again, the model’s prediction is wrong and the probability is quite weak.</p>
<p>We don't have much data, and the model training has saturated, as just seen. We can experiment with other model architectures, but we won’t overcome the fundamental problem: a small data set, with far more trainable parameters than training images. So, how do we get around this problem? Can't deep learning be used if there isn't much data?</p>
<p>The answer to that is <em>transfer learning</em>, discussed next.</p>
<p><strong>Transfer learning</strong></p>
<p>Consider this analogy: you want to pick up a new foreign language. How does the learning happen? </p>
<p>You would take a conversation, say, for example:<br>
Instructor: How are you doing?<br>
You: I am good. How about you?</p>
<p>And you will try to learn the equivalent of this in the new language.</p>
<p>Because of your proficiency in English, you don't start learning a new language from scratch (even if it seems that you do). You already have the mental map of a language, and you try to find the corresponding words in the new language. Therefore, in the new language, while your vocabulary might still be limited, you will still be able to converse because of your knowledge of the structure of conversations in English.</p>
<p>Transfer learning works the same way. Highly accurate models are built on data sets where a lot of data is available. A common data set that you will come across is the <a href="http://www.image-net.org/">ImageNet</a> data. It has more than a million images. Researchers from around the world have built many different state-of-the-art models using this data. The resulting models, comprising the architecture and trained weights, are freely available on the internet. </p>
<p>Starting from such a pre-trained model, we then train it further for our own problem. In fact, this is quite the norm: almost invariably, the first model one builds for a computer vision problem employs a pre-trained model.</p>
<p>In many cases, like ours, where data is limited, this might be all one can do. </p>
<p>The typical practice is to keep many of the early layers fixed, and train only the last layers. If data is quite limited, only the classifier layer is re-trained. If data is moderately abundant, the last few layers are re-trained.</p>
<p>This works because a convolutional neural network learns higher level representation at each successive layer; the learning it has done at many of the early layers is held in common by all image classification problems.</p>
<p>Let's now use a pre-trained model for logo detection.</p>
<p><code>MXNet</code> has a model zoo with a number of pre-trained models.</p>
<p>We will use a popular pre-trained model called ResNet. The <a href="https://arxiv.org/abs/1512.03385">paper</a> provides a lot of detail on the model structure. A simpler explanation can be found in <a href="https://blog.waya.ai/deep-residual-learning-9610bb62c355">this article</a>.</p>
<p>Let's first download the pre-trained model:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="kn">from</code> <code class="nn">mxnet.gluon.model_zoo</code> <code class="kn">import</code> <code class="n">vision</code> <code class="k">as</code> <code class="n">models</code>
<code class="n">pretrained_net</code> <code class="o">=</code> <code class="n">models</code><code class="o">.</code><code class="n">resnet18_v2</code><code class="p">(</code><code class="n">pretrained</code><code class="o">=</code><code class="bp">True</code><code class="p">)</code>
</pre>
<p>Since our data set is small, we will re-train only the output layer. We randomly initialize the weights for the output layer:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">finetune_net</code> <code class="o">=</code> <code class="n">models</code><code class="o">.</code><code class="n">resnet18_v2</code><code class="p">(</code><code class="n">classes</code><code class="o">=</code><code class="n">num_classes</code><code class="p">)</code>
<code class="n">finetune_net</code><code class="o">.</code><code class="n">features</code> <code class="o">=</code> <code class="n">pretrained_net</code><code class="o">.</code><code class="n">features</code>
<code class="n">finetune_net</code><code class="o">.</code><code class="n">output</code><code class="o">.</code><code class="n">initialize</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">init</code><code class="o">.</code><code class="n">Xavier</code><code class="p">(</code><code class="n">magnitude</code><code class="o">=</code><code class="mf">2.24</code><code class="p">))</code>
</pre>
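<p>Note that the code above shares the pre-trained features but does not explicitly freeze them. If you want to keep the early layers fixed, as described earlier, one way (a sketch, assuming the <code>finetune_net</code> just defined) is to disable gradient computation for the reused parameters:</p>
<pre data-code-language="python" data-type="programlisting">
# Freeze the reused feature layers so only the new output layer is updated
finetune_net.features.collect_params().setattr('grad_req', 'null')
</pre>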
<p>We now call the same train function as before:</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">train</code><code class="p">(</code><code class="n">finetune_net</code><code class="p">,</code> <code class="n">ctx</code><code class="p">,</code> <code class="n">train_data</code><code class="p">,</code> <code class="n">val_data</code><code class="p">,</code> <code class="n">test_data</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">num_epochs</code><code class="p">,</code><code class="n">model_prefix</code><code class="o">=</code><code class="s1">'ft'</code><code class="p">,</code><code class="n">hybridize</code> <code class="o">=</code> <code class="bp">True</code><code class="p">)</code>
</pre>
<p><code>Epoch 0. Loss: 1.107, Train acc 0.83, Val acc 0.62, Test acc 0.76, Time 246.1 sec</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 1. Loss: 0.811, Train acc 0.85, Val acc 0.62, Test acc 0.77, Time 243.7 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 2. Loss: 0.722, Train acc 0.86, Val acc 0.64, Test acc 0.78, Time 245.3 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 3. Loss: 0.660, Train acc 0.87, Val acc 0.66, Test acc 0.79, Time 243.4 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 4. Loss: 0.541, Train acc 0.88, Val acc 0.67, Test acc 0.80, Time 244.5 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 5. Loss: 0.528, Train acc 0.89, Val acc 0.68, Test acc 0.80, Time 243.4 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 6. Loss: 0.490, Train acc 0.90, Val acc 0.68, Test acc 0.81, Time 243.2 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 7. Loss: 0.453, Train acc 0.91, Val acc 0.71, Test acc 0.82, Time 243.6 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 8. Loss: 0.435, Train acc 0.92, Val acc 0.70, Test acc 0.82, Time 245.6 sec</code><br>
<code>Epoch 9. Loss: 0.413, Train acc 0.92, Val acc 0.72, Test acc 0.82, Time 247.7 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 10. Loss: 0.392, Train acc 0.92, Val acc 0.72, Test acc 0.83, Time 245.3 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 11. Loss: 0.377, Train acc 0.92, Val acc 0.72, Test acc 0.83, Time 244.5 sec</code><br>
<code>Epoch 12. Loss: 0.335, Train acc 0.93, Val acc 0.72, Test acc 0.84, Time 244.2 sec</code><br>
<code>Epoch 13. Loss: 0.321, Train acc 0.94, Val acc 0.73, Test acc 0.84, Time 245.0 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 14. Loss: 0.305, Train acc 0.93, Val acc 0.73, Test acc 0.84, Time 243.4 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 15. Loss: 0.298, Train acc 0.93, Val acc 0.73, Test acc 0.84, Time 243.9 sec</code><br>
<code>Epoch 16. Loss: 0.296, Train acc 0.94, Val acc 0.75, Test acc 0.84, Time 247.0 sec</code><br>
<code>Deleting previous checkpoint...</code><br>
<code>Best validation accuracy found. Checkpointing...</code><br>
<code>Epoch 17. Loss: 0.274, Train acc 0.94, Val acc 0.74, Test acc 0.84, Time 245.1 sec</code><br>
<code>Epoch 18. Loss: 0.292, Train acc 0.94, Val acc 0.74, Test acc 0.84, Time 243.9 sec</code><br>
<code>Epoch 19. Loss: 0.306, Train acc 0.95, Val acc 0.73, Test acc 0.84, Time 244.8 sec</code></p>
<p>The model starts off right away with much higher accuracy. Typically, when data is scarce, we train for only a few epochs and pick the model from the epoch with the highest validation accuracy.</p>
<p>Here, epoch 16 has the best validation accuracy. Since the training data is limited and the model kept training, it has started to overfit: after epoch 16, training accuracy keeps increasing while validation accuracy begins to decrease.</p>
<p>Let's collect the parameters from the checkpoint of the 16th epoch and use them as our final model parameters. </p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="c1"># The model's parameters are now set to the values at the 16th epoch</code>
<code class="n">finetune_net</code><code class="o">.</code><code class="n">collect_params</code><code class="p">()</code><code class="o">.</code><code class="n">load</code><code class="p">(</code><code class="s1">'ft-</code><code class="si">%d</code><code class="s1">.params'</code><code class="o">%</code><code class="p">(</code><code class="mi">16</code><code class="p">),</code><code class="n">ctx</code><code class="p">)</code>
</pre>
<p><strong>Evaluating the predictions</strong></p>
<p>Let's see how the new model does on the same images we used earlier to evaluate the predictions.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">img_url</code> <code class="o">=</code> <code class="s2">"http://sophieswift.com/wp-content/uploads/2017/09/pleasing-ideas-bmw-cake-and-satisfying-some-bmw-themed-cakes-crustncakes-delicious-cakes-128x128.jpg"</code>
<code class="n">classify_logo</code><code class="p">(</code><code class="n">finetune_net</code><code class="p">,</code> <code class="n">img_url</code><code class="p">)</code>
</pre>
<p><code>'With prob=0.983476, bmw'</code></p>
<figure class="center" id="id-xrijj"><img alt="bmw logo 2" src="https://d3ansictanv2wj.cloudfront.net/bmw2-ae42be6f8eebd82a95a17d9292b16f5f.png"><figcaption><span class="label">Figure 12. </span>Image by Tuhin Sharma.</figcaption></figure>
<p>We can see that the model is able to predict BMW with 98% probability.</p>
<p>Let's now try the other image we tested earlier.</p>
<pre data-code-language="python" data-type="programlisting" data-highlighted="true"><code class="n">img_url</code> <code class="o">=</code> <code class="s2">"https://dtgxwmigmg3gc.cloudfront.net/files/59cdcd6f52ba0b36b5024500-icon-256x256.png"</code>
<code class="n">classify_logo</code><code class="p">(</code><code class="n">finetune_net</code><code class="p">,</code> <code class="n">img_url</code><code class="p">)</code>
</pre>
<p><code>'With prob=0.498218, fosters'</code></p>
<p>While the prediction probability isn't great (a tad lower than 50%), Foster's still gets the highest probability among all the logos. </p>
<p><strong>Improving the model</strong></p>
<p>To improve the model, we need to fix the way we constructed the training data set. Each individual logo had 10 training points, but as part of redistributing the no-logo images from validation to training, we moved 1,500 no-logo images into the training set. This introduces a significant class imbalance into the data set, which is not good practice. The following are some options to fix this:</p>
<ul>
<li>Weight the cross-entropy loss (see the sketch after this list).</li>
<li>Don't include the no-logo images in the training data set; instead, build a model that predicts low probabilities for all logo classes when no logo is present in a test/validation image. </li>
</ul>
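<p>As a sketch of the first option, Gluon's <code>SoftmaxCrossEntropyLoss</code> accepts a per-example <code>sample_weight</code>, which can be used to down-weight the over-represented no-logo class. The class index and weight below are illustrative assumptions, not tuned values; the actual no-logo index would come from <code>train_imgs.synsets</code>:</p>
<pre data-code-language="python" data-type="programlisting">
from mxnet import gluon, nd

loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

def weighted_loss(preds, labels, no_logo_class=0, no_logo_weight=0.1):
    # Give no-logo examples a smaller weight so the 1,500 extra
    # no-logo images don't dominate the gradient
    weights = nd.where(labels == no_logo_class,
                       nd.full(labels.shape, no_logo_weight),
                       nd.ones(labels.shape))
    return loss_fn(preds, labels, weights)
</pre>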
<p>But remember that even with transfer learning and data augmentation, we have only 320 images, which is quite low for building highly accurate deep learning models. </p>
<p><strong>Conclusion</strong></p>
<p>In this article, we learned how to build image recognition models using <code>MXNet</code>. <code>Gluon</code> is ideal for rapid prototyping. Moving from prototyping to production is also quite easy with <a href="https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html">hybridization</a> and symbol export. With a host of pre-trained models available on <code>MXNet</code>, we were able to get very good logo detection models quickly. A very good resource for learning more about the underlying theory is Stanford's <a href="http://cs231n.stanford.edu/">CS231n course</a>. </p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/logo-detection-using-apache-mxnet'>Logo detection using Apache MXNet.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/7Xfvn0Boges" height="1" width="1" alt=""/>Tuhin Sharma, Bargava Subramanianhttps://www.oreilly.com/ideas/logo-detection-using-apache-mxnetMachine learning needs machine teaching2018-02-01T12:10:00Ztag:www.oreilly.com,2018-02-01:/ideas/machine-learning-needs-machine-teaching<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/greenmountainwindfarm_fluvanna_2004_crop-5632d1ea93fe13dc00e406023b850d50.jpg'/></p><p><em>The O’Reilly Data Show Podcast: Mark Hammond on applications of reinforcement learning to manufacturing and industrial automation.</em></p><p>In this episode of the <a href="https://www.oreilly.com/ideas/topics/oreilly-data-show-podcast">Data Show</a>, I spoke with <a href="https://www.linkedin.com/in/markisaachammond/">Mark Hammond</a>, founder and CEO of <a href="https://bons.ai/">Bonsai</a>, a startup at the forefront of developing AI systems in industrial settings. While many articles have been written about developments in computer vision, speech recognition, and autonomous vehicles, I’m particularly excited about near-term <a href="http://landing.ai">applications of AI to manufacturing</a>, robotics, and industrial automation. In <a href="https://www.oreilly.com/ideas/practical-applications-of-reinforcement-learning-in-industry">a recent post</a>, I outlined practical applications of reinforcement learning (RL)—a type of machine learning now being used in AI systems. In particular, I described how companies like Bonsai are applying RL to manufacturing and industrial automation. As researchers explore <a href="https://eng.uber.com/deep-neuroevolution/">new approaches for solving RL problems</a>, I expect many of the first applications to be in industrial automation.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/machine-learning-needs-machine-teaching'>Machine learning needs machine teaching.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/gxBdQWbMvTg" height="1" width="1" alt=""/>Ben Loricahttps://www.oreilly.com/ideas/machine-learning-needs-machine-teachingDifferent continents, different data science2018-02-01T12:00:00Ztag:www.oreilly.com,2018-02-01:/ideas/different-continents-different-datascience<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/lena-bell-68534_crop-8d1ce9d9e5c9ddb84b6ca9bcfce8aeac.jpg'/></p><p><em>Regardless of country or culture, any solid data science plan needs to address veracity, storage, analysis, and use.</em></p><p>Over the last four years, I’ve had conversations about data science, machine learning, ethics, and the law on several continents. This has included startups, big companies, governments, academics, and nonprofits. And over that time, some patterns are starting to emerge.</p>
<figure class="center" id="id-31Vik"><img alt="location services patterns map" src="https://d3ansictanv2wj.cloudfront.net/Figure1-754f084359ae4e4cb74ae8003db405d5.png"><figcaption><span class="label">Figure 1. </span>This was last year, and I didn’t have location services turned on all the time. Screenshot by Alistair Croll.</figcaption></figure>
<p>I’m going to be making some sweeping generalizations in this post. Everyone is different; every circumstance is somehow unique. But as I’ve dug into these patterns with colleagues, friends, and audiences both at home and abroad, I’ve found that they reflect many of the concerns of those cultures.</p>
<p>Briefly: in China, they worry about the <em>veracity of the data</em>. In Europe, they worry about the <em>storage and analysis</em>. And in North America, they worry about <em>unintended consequences</em> of acting on it.</p>
<p>Let me dig into those a bit more, and explain how I think external factors influence each.</p>
<h2>Data veracity</h2>
<p>If you don’t trust your data, everything you build atop it is a house of cards. When I’ve spoken about <a href="http://www.leananalyticsbook.com/">Lean Analytics</a> or data science and critical thinking in China, many of the questions are about knowing whether the data is real or genuine.</p>
<p>China is a country in transition. A recent talk by Xi Jinping <a href="https://www.forbes.com/sites/douglasbulloch/2017/10/29/a-new-era-xi-jinping-lays-out-his-plans-for-the-future-of-china/#4761a6796a66">outlined a plan</a> in which the country creates things first, rather than copying. They want to produce the best students, rather than send them abroad. They’re transitioning from a culture of mimicry and cheap copies to one of leadership and innovation. Just look at their policies on <a href="https://qz.com/1169690/shenzhen-in-china-has-16359-electric-buses-more-than-americas-biggest-citiess-conventional-bus-fleet/">electric cars</a>, or their planned cities, or the dominance of Wechat as a <a href="https://www.forbes.com/sites/liyanchen/2015/02/19/red-envelope-war-how-alibaba-and-tencent-fight-over-chinese-new-year/#2112d951cddd">ubiquitous payment system</a>.</p>
<p>When I was in Paris a few years ago, I visited Les Galleries Lafayette, an over-the-top mall whose gold decor and outlandish ornamentation is a paean to all things commercial. Outside one of the high-end retail outlets was a long queue of Chinese tourists, being let in to buy a purse a few at a time.</p>
<p>As each person completed their purchase, they’d pause at the exit and take a picture of themselves with their new-found luxury item, in front of the store logo. I asked the bus driver what was going on. “They want proof it’s the real,” he replied.</p>
<p><em>Proof it’s the real.</em></p>
<p>In a country with a history of copying, where data is conflated with propaganda and competition is relatively unregulated, it’s no wonder veracity is in question.</p>
<p>There are many things a data analyst can do to test whether data is real. One of the most interesting is <a href="https://en.wikipedia.org/wiki/Benford%27s_law">Benford’s Law</a>, which states that in many kinds of naturally occurring data, the leading digits follow a predictable, logarithmic distribution. In a random sample of that data, there will be more numbers beginning with a one than a two, more with a two than a three, and so on. It seems like a magic trick, but it’s been used to expose fraud in many fascinating cases.</p>
<p>There are also promising technologies that distribute trust, tamper-evident sensors, and so on.</p>
<p>But in an era of fake news and truthiness—which is only going to get worse as we start to create fiction indistinguishable from the truth—knowing you’re starting with what’s real is the first step in modern critical thinking.</p>
<h2>Storage and Analysis</h2>
<p>At a cloud computing event in D.C. several years ago, I sat at dinner with a French diplomat. Part of the EU parliament, he was in charge of data privacy. “Do you know why the French hate traffic cameras?” he asked me. “Because we can overlook a smudge of lipstick or a whiff of cologne on our partners’ shirts. But we can’t ignore a photograph of them in a car with a lover.”</p>
<p>Indeed, the French amended the laws regarding traffic camera evidence, only sending a photo when a dispute occurs. As he pointed out, “French society functions in the gray areas of legality. Data is too black and white.”</p>
<p>Another European speaker at a separate event talked about data privacy laws, and how information must be protected from the government itself, even when the government stores it. A member of the audience challenged him on this, to which he replied, “you’re from America. You haven’t had tanks roll in, take all the records on citizens, find the Jews, and round them up.” Close borders and the echoes of war inform data storage policy in Europe.</p>
<p>The arrival of GDPR in Europe—with wide-ranging effects beyond, given the global nature of most large companies—is in part an attempt by Europe to exert some control over the technical nation-states. GAFAM (Google, Amazon, Facebook, Apple, and Microsoft) are all U.S. companies; the only close competitors are Baidu, Alibaba, and Tencent—all Chinese. If populations made nations, these would be some of the biggest countries on earth, and Europe doesn’t even have an embassy. GDPR forces these firms to answer the door when Europe comes knocking.</p>
<p>But at the same time, GDPR is a reflection of European concerns, informed by history and culture, of how data should be used, and the fact that we should be its stewards, not the other way around. Nobody should know more about us than we do.</p>
<h2>Unintended consequences</h2>
<p>The Sloan Foundation’s Daniel Goroff worked on energy nudge policy for the federal government, trying to convince people to consume less electricity, particularly during the warmer months when air conditioning use skyrockets.</p>
<p>Social scientists know that you can use peer pressure to encourage behaviours. For example, if you ask someone to re-use the towel in their hotel room, there’s a certain likelihood they will. But if you tell them that other guests re-use their towels, they’re about 25% more likely to do so.</p>
<p>Applying this kind of policy to energy conservation makes sense, so utilities send letters to their customers showing them how they’re doing on energy conservation compared to their neighbours, congratulating the frugal and showing the wasteful they can do better.</p>
<p>The problem is, this doesn’t always work. <a href="http://www.nber.org/papers/w15939">It turns out that</a> if you tell a democrat/liberal they’re consuming more than others, they’ll reduce their consumption as you’d hoped. But if you tell a republican/conservative they’re consuming less, they will increase their consumption so they get their fair share.</p>
<p>Political insight aside, this is a critical lesson: knowing what the data tells you isn’t the same as using it to produce the intended outcome. Markets and humans are dynamic, responding to change. When Orbitz tasked an algorithm with maximizing revenues, it <a href="http://www.bbc.com/news/technology-18595347">offered more expensive hotel rooms</a> to Macbook users. When Amazon rolled out Prime in Boston based on purchase history, its data model <a href="https://www.csmonitor.com/Business/2016/0423/Is-Amazon-same-day-delivery-service-racist">excluded areas where minorities lived</a>.</p>
<p>Unintended consequences are hard to predict. The U.S. is a litigious society, where many laws are created on precedent and shaped by cases that make their way through the courts. This leads to seemingly ridiculous warnings on packaging (so people don’t eat laundry pods, for example).</p>
<figure class="center" id="id-592iW"><img alt="ambiguous or ridiculous warnings" src="https://d3ansictanv2wj.cloudfront.net/Figure2-9bea17b947e8bb670989e56e08035d0b.jpg"><figcaption><span class="label">Figure 2. </span>Easy to misinterpret. Photo by Alistair Croll.</figcaption></figure>
<p>Liability matters. Companies I’ve spoken to in North America trust their data—perhaps too much. They worry less about using clouds to process private data, or about whether a particular merge is ethical.</p>
<p>But they worry a lot about the consequences of acting on it.</p>
<h2>Three parts, one whole</h2>
<p>As I said at the outset, this is a very subjective view of the patterns I’ve seen across countries. The plural of anecdote is not data; caveat emptor. But I’ve fielded literally hundreds of questions from audiences both overseas and online; this led me to ask people in each country whether my feelings could be explained by cultural, technical, political, or economic factors.</p>
<p>The reality is, any solid data science plan needs to worry about veracity, storage, analysis, and use. There are plenty of ways cognitive bias, technical error, or the wrong model can undermine the way data is put to use. Critical thinking at every stage of the process is the best answer, regardless of country or culture.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/different-continents-different-datascience'>Different continents, different data science.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/t_OwmQuj5_c" height="1" width="1" alt=""/>Alistair Crollhttps://www.oreilly.com/ideas/different-continents-different-datascienceFour short links: 1 February 20182018-02-01T11:10:00Ztag:www.oreilly.com,2018-02-01:/ideas/four-short-links-1-february-2018<p><em>Tor + Bitcoin = De-anonymization, Classic Papers, 3D Holograms, and Big Data Privacy</em></p><ol>
<li>
<a href="https://arxiv.org/pdf/1801.07501.pdf">Deanonymizing Tor Hidden Service Users Through Bitcoin Transactions Analysis</a> -- <i>This, for example, allows an adversary to link a user with @alice Twitter address to a Tor hidden service with private.onion address by finding at least one past transaction in the blockchain that involves their publicly declared Bitcoin addresses.</i>
</li>
<li>
<a href="http://mrmgroup.cs.princeton.edu/cos583/syllabusS15.pdf">Great Moments in Computing</a> -- the reading list for this Princeton course is fascinating! (via <a href="https://blog.acolyer.org/2018/01/31/a-theory-of-the-learnable/">Paper a Day</a>)</li>
<li>
<a href="http://www.kurzweilai.net/he-princess-leia-project-volumetric-3d-images-that-float-in-thin-air">Volumetric 3D Images that Float in the Air</a> (Kurzweil AI) -- the <a href="https://youtu.be/1aAx2uWcENc">video</a> is impressive! Trap a particle with a laser, move it around really fast while illuminating it with red, green, and blue lights. Result, thanks to persistence of vision: illusion of 3D object. Brilliant!</li>
<li>
<a href="http://randomwalker.info/publications/precautionary.pdf">A Precautionary Approach to Big Data Privacy</a> -- <i>In Section 3, we discuss the levers that policymakers can use to influence data releases: research funding choices that incentivize collaboration between privacy theorists and practitioners, mandated transparency of re-identification risks, and innovation procurement. Meanwhile, practitioners and policymakers have numerous pragmatic options for narrower releases of data. In Section 4, we present advice for six of the most common use cases for sharing data. Our thesis is that the problem of “what to do about re-identification” unravels once we stop looking for a one-size-fits-all solution, and in each of the six cases we propose a solution that is tailored, yet principled.</i>
</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-1-february-2018'>Four short links: 1 February 2018.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/VVIbwekKfpo" height="1" width="1" alt=""/>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-1-february-2018New releases from O'Reilly for February 20182018-02-01T11:00:00Ztag:www.oreilly.com,2018-02-01:/ideas/new-releases-from-oreilly-for-february-2018<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/splash-2676508_1920-c7df7391740b88ffdca9dcd1656e92d1.jpg'/></p><p><em>Find out what's new in machine learning, network automation, security, and more.</em></p><p>Get a fresh start on building a new skill or augment what you currently know with one of these five newly released titles from O'Reilly.</p>
<h2>Machine Learning and Security</h2>
<p><a href="https://www.safaribooksonline.com/library/view/machine-learning-and/9781491979891?utm_source=oreilly&amp;utm_medium=newsiteutm_campaign=new-releases-from-oreilly-for-february-1c"><img align="left" src="https://covers.oreillystatic.com/images/0636920065555/cat.gif" style="margin-right: 20px;" width="140px"></a></p><p>Continue reading <a href='https://www.oreilly.com/ideas/new-releases-from-oreilly-for-february-2018'>New releases from O'Reilly for February 2018.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/qxJc-fNIFUQ" height="1" width="1" alt=""/>https://www.oreilly.com/ideas/new-releases-from-oreilly-for-february-2018Be fast, be secure, be accessible2018-01-31T20:00:00Ztag:www.oreilly.com,2018-01-31:/ideas/be-fast-be-secure-be-accessible<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/kings-cross-1031629_1920-3b0a670df51298b559d84c54305afe92.jpg'/></p><p><em>Learn why performance, security, and accessibility are the pillars of web development and the O’Reilly Fluent Conference.</em></p><p>When my fellow program chairs of Fluent, Kyle Simpson and Tammy Everts, and I started thinking about how we'd describe a theme for the event back in 2016, we came up with "Building a Better Web." While we recognized it can sound a little hand-wavey, a big part of this theme is a crucial layer of the developer experience — the bigger-picture more goal-oriented perspective that comes along with skill development. When we thought about what it takes to build a better web, we kept coming back to the idea of a fast, secure, accessible web — one that works for users of all backgrounds and abilities, one that reaches users of varied connection speeds and devices, and one that keeps its users safe. If there are three main pillars of the modern web, they are: performance, security, and accessibility. </p>
<p>It may seem obvious that these pillars should be key areas of focus and investment for engineering and product teams, and yet so often they're treated as an afterthought. And while the practice of calling out these domains can feel like a hackneyed reminder to "eat your vegetables," it's worth it to think about the ways these areas intersect in our organizations and impact customers. </p><p>Continue reading <a href='https://www.oreilly.com/ideas/be-fast-be-secure-be-accessible'>Be fast, be secure, be accessible.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/aZOPHYEI6Y8" height="1" width="1" alt=""/>Allyson MacDonaldhttps://www.oreilly.com/ideas/be-fast-be-secure-be-accessibleHow to solve 90% of NLP problems: A step-by-step guide2018-01-31T12:00:00Ztag:www.oreilly.com,2018-01-31:/ideas/how-to-solve-90-of-nlp-problems--a-step-by-step-guide<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/pexels-photo-415068_crop-4dc2d9ae37c302fa0773587d75591e68.jpg'/></p><p><em>Using machine learning to understand and leverage text.</em></p><p>Whether you are an established company or working to launch a new service, you can always leverage text data to validate, improve, and expand the functionalities of your product. The science of extracting meaning and learning from text data is an active topic of research called natural language processing (NLP).</p>
<p>NLP produces <a href="https://arxiv.org/abs/1704.01444">new</a> and <a href="https://arxiv.org/abs/1711.00043">exciting</a> <a href="https://arxiv.org/abs/1708.04729">results</a> on a daily basis, and is a very large field. However, having worked with hundreds of companies, the Insight team has seen a few key practical applications come up much more frequently than any other:</p>
<ul>
<li>Identifying different cohorts of users/customers (e.g., predicting churn, lifetime value, product preferences)</li>
<li>Accurately detecting and extracting different categories of feedback (positive and negative reviews/opinions, mentions of particular attributes such as clothing size/fit...)</li>
<li>Classifying text according to intent (e.g., request for basic help, urgent problem)</li>
</ul>
<p>While many NLP papers and tutorials exist online, we have found it hard to find guidelines and tips on how to approach these problems efficiently from the ground up.</p>
<h2>How to build machine learning solutions to solve problems</h2>
<p>After leading hundreds of projects a year and gaining advice from top teams all over the United States, we wrote this post to explain how to build machine learning solutions to solve problems like the ones mentioned above. We’ll begin with the simplest method that could work, and then move on to more nuanced solutions, such as feature engineering, word vectors, and deep learning.</p>
<p>After reading this article, you’ll know how to:</p>
<ul>
<li>Gather, prepare, and inspect data</li>
<li>Build simple models to start, and transition to deep learning if necessary</li>
<li>Interpret and understand your models to make sure you are actually capturing information and not noise</li>
</ul>
<p>We wrote this post as a step-by-step guide; it can also serve as a high-level overview of highly effective standard approaches.</p>
<aside data-type="sidebar" id="id-3VpSk">
<p>This post is accompanied by <a href="https://github.com/hundredblocks/concrete_NLP_tutorial/blob/master/NLP_notebook.ipynb">an interactive notebook</a> demonstrating and applying all these techniques. Feel free to run the code and follow along.</p>
</aside>
<h3>Step 1: Gather your data</h3>
<h4>Example data sources</h4>
<p>Every machine learning problem starts with data, such as a list of emails, posts, or tweets. Common sources of textual information include:</p>
<ul>
<li>Product reviews (on Amazon, Yelp, and various App Stores)</li>
<li>User-generated content (tweets, Facebook posts, StackOverflow questions)</li>
<li>Troubleshooting (customer requests, support tickets, chat logs)</li>
</ul>
<p>For this post, we will use a data set generously provided by <a href="https://www.crowdflower.com/data-for-everyone/">CrowdFlower</a>, called “Disasters on Social Media,” where:</p>
<blockquote>Contributors looked at over 10,000 tweets culled with a variety of searches like “ablaze,” “quarantine,” and “pandemonium,” then noted whether the tweet referred to a disaster event (as opposed to a joke with the word or a movie review or something non-disastrous).</blockquote>
<p>Our task will be to detect which tweets are about a disastrous event as opposed to an irrelevant topic such as a movie. Why? A potential application would be to exclusively notify law enforcement officials about urgent emergencies while ignoring reviews of the most recent Adam Sandler film. A particular challenge with this task is that both classes contain the same search terms used to find the tweets, so we will have to use subtler differences to distinguish between them.</p>
<p>In the rest of this post, we will refer to tweets that are about disasters as “disaster”, and tweets about anything else as “irrelevant.”</p>
<h4>Labels</h4>
<p>We have labeled data, so we know which tweets belong to which categories. As Richard Socher outlines below, it is usually faster, simpler, and cheaper to find and label enough data on which to train a model, rather than trying to optimize a complex unsupervised method.</p>
<figure class="center" id="id-5AEiz"><img alt="richard socher pro tip label data" width="70%" src="https://d3ansictanv2wj.cloudfront.net/Figure1-fbf5b674e4cb08a8913f2b428af0edee.png"><figcaption><span class="label">Figure 1. </span><a href="https://twitter.com/richardsocher/status/840333380130553856">Richard Socher’s pro-tip</a>.</figcaption></figure>
<h3>Step 2: Clean your data</h3>
<p>The number one rule we follow is: Your model will only ever be as good as your data.</p>
<p>One of the key skills of a data scientist is knowing whether the next step should be working on the model or the data. A good rule of thumb is to look at the data first and then clean it up. A clean data set will allow a model to learn meaningful features and not overfit on irrelevant noise.</p>
<p>Here is a checklist to use when cleaning your data (see the <a href="https://github.com/hundredblocks/concrete_NLP_tutorial/blob/master/NLP_notebook.ipynb">code</a> for more details); a minimal sketch of the first few steps follows the list:</p>
<ol>
<li>Remove all irrelevant characters, such as any non-alphanumeric characters</li>
<li>
<a href="https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html">Tokenize</a> your text by separating it into individual words</li>
<li>Remove words that are not relevant, such as “@” twitter mentions or urls</li>
<li>Convert all characters to lowercase, in order to treat words such as “hello”, “Hello”, and “HELLO” the same</li>
<li>Consider combining misspelled or alternately spelled words to a single representation (e.g., “cool”/”kewl”/”cooool”)</li>
<li>Consider <a href="https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html">lemmatization</a> (reduce words such as “am,” “are,” and “is” to a common form, such as “be”)</li>
</ol>
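<p>As a rough illustration of steps 1, 3, and 4, here is a minimal cleaning sketch using pandas string methods and regular expressions. The data frame, column name, and file name are assumptions for illustration, not from the data set:</p>
<pre data-code-language="python" data-type="programlisting">
import pandas as pd

def standardize_text(df, text_field):
    # Strip URLs and @mentions, drop other non-alphanumeric
    # characters, and lowercase everything
    df[text_field] = (df[text_field]
                      .str.replace(r"http\S+", " ", regex=True)
                      .str.replace(r"@\S+", " ", regex=True)
                      .str.replace(r"[^A-Za-z0-9(),!?'`]", " ", regex=True)
                      .str.lower())
    return df

# tweets = pd.read_csv("social_media_disasters.csv")  # hypothetical file name
# tweets = standardize_text(tweets, "text")
</pre>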
<p>After following these steps and checking for additional errors, we can start using the clean, labeled data to train models.</p>
<h3>Step 3: Find a good data representation</h3>
<p>Machine learning models take numerical values as input. Models working on images, for example, take in a matrix representing the intensity of each pixel in each color channel.</p>
<figure class="center" id="id-6ZXig"><img alt="matrix of numbers" src="https://d3ansictanv2wj.cloudfront.net/Figure2-7a15bc9f3e896635594694fe3e538d9e.png"><figcaption><span class="label">Figure 2. </span>A smiling face represented as a matrix of numbers. Image by <a href="https://teachwithict.weebly.com/binary-representation-of-images.html">Teach with ICT</a>, used under a Creative Commons license.</figcaption></figure>
<p>Our data set is a list of sentences, so in order for our algorithm to extract patterns from the data, we first need to find a way to represent it in a way that our algorithm can understand—i.e., as a list of numbers.</p>
<h4>One-hot encoding (bag of words)</h4>
<p>A natural way to represent text for computers is to encode each character individually as a number (<a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a>, for example). If we were to feed this simple representation into a classifier, it would have to learn the structure of words from scratch based only on our data, which is impossible for most data sets. We need to use a higher level approach.</p>
<p>For example, we can build a vocabulary of all the unique words in our data set, and associate a unique index to each word in the vocabulary. Each sentence is then represented as a list that is as long as the number of distinct words in our vocabulary. At each index in this list, we mark how many times the given word appears in our sentence. This is called a <a href="https://en.wikipedia.org/wiki/Bag-of-words_model">bag of words</a> model, since it is a representation that completely ignores the order of words in our sentence. This is illustrated below.</p>
<figure class="center" id="id-5wmiG"><img alt="bag of words" src="https://d3ansictanv2wj.cloudfront.net/Figure3-1a8409a126d973b4265c66d35d093d4a.png"><figcaption><span class="label">Figure 3. </span>Representing sentences as a Bag of Words. Sentences on the left, representation on the right. Each index in the vectors represent one particular word. Image by Insight Data Science, used with permission.</figcaption></figure>
<h4>Visualizing the embeddings</h4>
<p>We have around 20,000 words in our vocabulary in the “Disasters of Social Media” example, which means that every sentence will be represented as a vector of length 20,000. The vector will contain mostly 0s because each sentence contains only a very small subset of our vocabulary.</p>
<p>In order to see whether our embeddings are capturing information that is relevant to our problem (i.e., whether the tweets are about disasters or not), it is a good idea to visualize them and see if the classes look well separated. Since vocabularies are usually very large and visualizing data in 20,000 dimensions is impossible, techniques like <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">PCA</a> will help project the data down to two dimensions. This is plotted below.</p>
<figure class="center" id="id-3XPiE"><img alt="visualizing bag of words" src="https://d3ansictanv2wj.cloudfront.net/Figure4-72c723b0856ffc683a2840868f5c8968.png"><figcaption><span class="label">Figure 4. </span>Visualizing bag of words embeddings. Image by Insight Data Science, used with permission.</figcaption></figure>
<p>The two classes do not look very well separated, which could be a feature of our embeddings or simply of our dimensionality reduction. In order to see whether the bag of words features are of any use, we can train a classifier based on them.</p>
<h3>Step 4: Classification</h3>
<p>When first approaching a problem, a general best practice is to start with the simplest tool that could do the job. Whenever it comes to classifying data, a common favorite for its versatility and explainability is <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>. It is very simple to train, and the results are interpretable, as you can easily extract the most important coefficients from the model.</p>
<p>We split our data into a training set used to fit our model and a test set to see how well it generalizes to unseen data. After training, we get an accuracy of 75.4%. Not too shabby! Guessing the most frequent class (“irrelevant”) would give us only 57%. However, even if 75% accuracy was good enough for our needs, we should never ship a model without trying to understand it.</p>
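<p>A minimal sketch of this step, reusing the hypothetical <code>bow</code> matrix and <code>labels</code> from the earlier sketches; the split fraction and random seed are our choices:</p>
<pre data-type="programlisting" data-code-language="python">
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hold out 20% of the data to measure generalization to unseen tweets
X_train, X_test, y_train, y_test = train_test_split(
    bow, labels, test_size=0.2, random_state=40)

clf = LogisticRegression()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy: %.3f" % accuracy_score(y_test, y_pred))
</pre>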
<h3>Step 5: Inspection</h3>
<h4>Confusion matrix</h4>
<p>A first step is to understand the types of errors our model makes, and which kinds of errors are least desirable. In our example, false positives are classifying an irrelevant tweet as a disaster, and false negatives are classifying a disaster as an irrelevant tweet. If the priority is to react to every potential event, we would want to lower our false negatives. If we are constrained in resources, however, we might prioritize a lower false positive rate to reduce false alarms. A good way to visualize this information is using a <a href="https://en.wikipedia.org/wiki/Confusion_matrix">confusion matrix</a>, which compares the predictions our model makes with the true label. Ideally, the matrix would be a diagonal line from top left to bottom right (our predictions match the truth perfectly).</p>
<figure class="center" id="id-YOiex"><img alt="confusion matrix" src="https://d3ansictanv2wj.cloudfront.net/Figure5-9c49b984f8c249396d2eabcdf0275231.png"><figcaption><span class="label">Figure 5. </span>Confusion matrix (green is a high proportion, blue is low). Image by Insight Data Science, used with permission.</figcaption></figure>
<p>Our classifier creates more false negatives than false positives (proportionally). In other words, our model’s most common error is inaccurately classifying disasters as irrelevant. If false positives represent a high cost for law enforcement, this could be a good bias for our classifier to have.</p>
<h4>Explaining and interpreting our model</h4>
<p>To validate our model and interpret its predictions, it is important to look at which words it is using to make decisions. If our data is biased, our classifier will make accurate predictions on the sample data, but the model will not generalize well in the real world. Here we plot the most important words for both the disaster and irrelevant classes. Plotting word importance is simple with bag of words and logistic regression, since we can just extract and rank the coefficients that the model used for its predictions.</p>
<figure class="center" id="id-N4i7K"><img alt="bag of words importance" src="https://d3ansictanv2wj.cloudfront.net/Figure6-309f3a9341e1fcfd88c36dad25b3d541.png"><figcaption><span class="label">Figure 6. </span>Bag of words: Word importance. Image by Insight Data Science, used with permission.</figcaption></figure>
<p>Our classifier correctly picks up on some patterns (Hiroshima, massacre), but clearly seems to be overfitting on some meaningless terms (heyoo, x1392). Right now, our bag of words model is dealing with a huge vocabulary of different words and treating all words equally. However, some of these words are very frequent, and are only contributing noise to our predictions. Next, we will try a way to represent sentences that can account for the frequency of words, to see if we can pick up more signal from our data.</p>
<h3>Step 6: Accounting for vocabulary structure</h3>
<h4>TF-IDF</h4>
<p>In order to help our model focus more on meaningful words, we can use a <a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">term frequency–inverse document frequency (TF-IDF) score</a> on top of our bag of words model. TF-IDF weighs words by how rare they are in our data set, discounting words that are too frequent and just add to the noise. Here is the PCA projection of our new embeddings.</p>
<figure class="center" id="id-Mkibq"><img alt="Visualizing TF-IDF embeddings" src="https://d3ansictanv2wj.cloudfront.net/Figure7-5b3bf6a3e56e39cc27015d8677271b9e.png"><figcaption><span class="label">Figure 7. </span>Visualizing TF-IDF embeddings. Image by Insight Data Science, used with permission.</figcaption></figure>
<p>We can see there is a clearer distinction between the two colors. This should make it easier for our classifier to separate both groups. Let’s see if this leads to better performance. Training another logistic regression on our new embeddings, we get an accuracy of 76.2%.</p>
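<p>In code, the change from the previous model can be as small as swapping the vectorizer. A sketch, assuming <code>train_sentences</code> and <code>test_sentences</code> hold the cleaned text and <code>y_train</code> and <code>y_test</code> the labels:</p>
<pre data-type="programlisting" data-code-language="python">
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Same pipeline as before; only the text representation changes
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(train_sentences)
X_test_tfidf = tfidf.transform(test_sentences)

clf_tfidf = LogisticRegression()
clf_tfidf.fit(X_train_tfidf, y_train)
print(clf_tfidf.score(X_test_tfidf, y_test))  # accuracy on the test set
</pre>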
<p>A very slight improvement. Has our model started picking up on more important words? If we are getting a better result while preventing our model from “cheating,” then we can truly consider this model an upgrade.</p>
<figure class="center" id="id-QZiPw"><img alt="TF-IDF: Word importance" src="https://d3ansictanv2wj.cloudfront.net/Figure8-014b94146ec6892dac35c4e6d0809f3a.png"><figcaption><span class="label">Figure 8. </span>TF-IDF: Word importance. Image by Insight Data Science, used with permission.</figcaption></figure>
<p>The words it picked up look much more relevant. Although our metrics on our test set only increased slightly, we have much more confidence in the terms our model is using, and thus would feel more comfortable deploying it in a system that would interact with customers.</p>
<h3>Step 7: Leveraging semantics</h3>
<h4>Word2Vec</h4>
<p>Our latest model managed to pick up on high-signal words. However, it is very likely that if we deploy this model, we will encounter words that we have not seen in our training set. The previous model will not be able to accurately classify these tweets, even if it has seen very similar words during training.</p>
<p>To solve this problem, we need to capture the semantic meaning of words, meaning we need to understand that words like ‘good’ and ‘positive’ are closer than ‘apricot’ and ‘continent.’ The tool we will use to help us capture meaning is called Word2Vec.</p>
<p><strong>Using pre-trained words</strong></p>
<p><a href="https://arxiv.org/abs/1301.3781">Word2Vec</a> is a technique to find continuous embeddings for words. It learns from reading massive amounts of text and memorizing which words tend to appear in similar contexts. After being trained on enough data, it generates a 300-dimension vector for each word in a vocabulary, with words of similar meaning being closer to each other.</p>
<p>The authors of <a href="https://arxiv.org/abs/1301.3781" data-href="https://arxiv.org/abs/1301.3781">this paper</a> open sourced a model that was pre-trained on a very large corpus, which we can leverage to include some knowledge of semantic meaning into our model. The pre-trained vectors can be found in the <a href="https://github.com/hundredblocks/concrete_NLP_tutorial">repository</a> associated with this post.</p>
<h4>Sentence-level representation</h4>
<p>A quick way to get a sentence embedding for our classifier is to average Word2Vec scores of all words in our sentence. This is a bag of words approach just like before, but this time we only lose the syntax of our sentence, while keeping some semantic information.</p>
<figure class="center" id="id-kAiEJ"><img alt="Word2Vec sentence embedding" src="https://d3ansictanv2wj.cloudfront.net/Figure9-21c0d2edb558d14bd0c8c790ab2eabc4.png"><figcaption><span class="label">Figure 9. </span>Word2Vec sentence embedding. Image by Insight Data Science, used with permission.</figcaption></figure>
<p>Here is a visualization of our new embeddings using previous techniques:</p>
<figure class="center" id="id-Jeizw"><img alt="Visualizing Word2Vec embeddings" src="https://d3ansictanv2wj.cloudfront.net/figure10-767df37f2f64377e9012574b12136f30.png"><figcaption><span class="label">Figure 10. </span>Visualizing Word2Vec embeddings. Image by Insight Data Science, used with permission.</figcaption></figure>
<p>The two groups of colors look even more separated here; our new embeddings should help our classifier find the separation between both classes. After training the same model a third time (a logistic regression), we get an accuracy score of 77.7%, our best result yet! Time to inspect our model.</p>
<h4>The complexity/explainability trade-off</h4>
<p>Since our embeddings are not represented as a vector with one dimension per word as in our previous models, it’s harder to see which words are the most relevant to our classification. While we still have access to the coefficients of our logistic regression, they relate to the 300 dimensions of our embeddings rather than the indices of words.</p>
<p>For such a low gain in accuracy, losing all explainability seems like a harsh trade-off. However, with more complex models, we can leverage black box explainers such as <a href="https://arxiv.org/abs/1602.04938">LIME</a> in order to get some insight into how our classifier works.</p>
<p><strong>LIME</strong></p>
<p>LIME is <a href="https://github.com/marcotcr/lime" data-href="https://github.com/marcotcr/lime">available on GitHub</a> as an open source package. This black-box explainer allows users to explain the decisions of any classifier on one particular example by perturbing the input (in our case, removing words from the sentence) and seeing how the prediction changes.</p>
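<p>A sketch of how it can be applied to a single tweet. Here, <code>embed_sentences</code> is a hypothetical stand-in for whatever pipeline turned raw text into the features our classifier was trained on, and <code>tweet</code> is one raw string from the test set:</p>
<pre data-type="programlisting" data-code-language="python">
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["irrelevant", "disaster"])

def predict_proba(texts):
    # LIME passes in perturbed copies of the sentence as raw strings,
    # so we embed them the same way as the training data before scoring
    return clf.predict_proba(embed_sentences(texts))

exp = explainer.explain_instance(tweet, predict_proba, num_features=6)
print(exp.as_list())  # [(word, weight), ...] for this one prediction
</pre>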
<p>Let’s see a couple explanations for sentences from our data set.</p>
<figure class="center" id="id-xriZD"><img alt="black box explainer" src="https://d3ansictanv2wj.cloudfront.net/Figure11a-33e5a2bf6fd089ca466594a4ea341c09.png"><figcaption><span class="label">Figure 11. </span>Correct disaster words are picked up to classify as “relevant”. Image by Emmanuel Ameisen, used with permission.</figcaption></figure>
<figure class="center" id="id-P9ix4"><img alt="black box explainer" src="https://d3ansictanv2wj.cloudfront.net/Figure11b-c3253f733feb355a6b6a93ed50f98a29.png"><figcaption><span class="label">Figure 12. </span>Here, the contribution of the words to the classification seems less obvious. Image by Emmanuel Ameisen, used with permission.</figcaption></figure>
<p>However, we do not have time to explore the thousands of examples in our data set. What we’ll do instead is run LIME on a representative sample of test cases and see which words keep coming up as strong contributors. Using this approach, we can get word importance scores like we had for previous models and validate our model’s predictions.</p>
<figure class="center" id="id-XPipE"><img alt="Word2Vec: Word importance" src="https://d3ansictanv2wj.cloudfront.net/Image12-8905e33219802766bfb90d9dfe82a9a3.png"><figcaption><span class="label">Figure 13. </span>Word2Vec: Word importance. Image by Insight Data Science, used with permission.</figcaption></figure>
<p>Looks like the model picks up on highly relevant words, implying that it makes understandable decisions. These seem like the most relevant words out of all previous models, and therefore we’re more comfortable deploying it into production.</p>
<h3>Step 8: Leveraging syntax using end-to-end approaches</h3>
<p>We’ve covered quick and efficient approaches to generate compact sentence embeddings. However, by omitting the order of words, we are discarding all of the syntactic information of our sentences. If these methods do not provide sufficient results, you can utilize more complex models that take in whole sentences as input and predict labels without the need to build an intermediate representation. A common way to do that is to treat a sentence as a sequence of individual word vectors using either Word2Vec or more recent approaches such as <a href="https://nlp.stanford.edu/projects/glove/">GloVe</a> or <a href="https://arxiv.org/abs/1708.00107">CoVe</a>.</p>
<p><a href="https://arxiv.org/abs/1408.5882">Convolutional neural networks (CNN) for sentence classification</a> train very quickly and work well as an entry-level deep learning architecture. While CNNs are mainly known for their performance on image data, they have been providing excellent results on text-related tasks, and are usually much quicker to train than most complex NLP approaches (e.g., <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">LSTMs</a> and <a href="https://www.tensorflow.org/tutorials/seq2seq">Encoder/Decoder</a> architectures). This model preserves the order of words and learns valuable information on which sequences of words are predictive of our target classes. Contrary to previous models, it can tell the difference between “Alex eats plants” and “Plants eat Alex.”</p>
<p>Training this model does not require much more work than previous approaches (see <a href="https://github.com/hundredblocks/concrete_NLP_tutorial/blob/master/NLP_notebook.ipynb">the code</a> for details) and gives us a model that is much better than the previous ones, getting 79.5% accuracy. As with the models above, the next step should be to explore and explain the predictions using the methods we described to validate that it is indeed the best model to deploy to users. By now, you should feel comfortable tackling this on your own.</p>
<h3>Final Notes</h3>
<p>Here is a quick recap of the approach we’ve successfully used:</p>
<ul>
<li>Start with a quick and simple model</li>
<li>Explain its predictions</li>
<li>Understand the kinds of mistakes it is making</li>
<li>Use that knowledge to inform your next step, whether that is working on your data or building a more complex model</li>
</ul>
<p>These approaches were applied to a particular example case using models tailored toward understanding and leveraging short text such as tweets, but the ideas are widely applicable to a variety of problems. If you have questions or comments, feel free to reach out to <a href="https://medium.com/@EmmanuelAmeisen">@EmmanuelAmeisen on Medium</a> or on <a href="https://twitter.com/EmmanuelAmeisen">Twitter</a>.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/how-to-solve-90-of-nlp-problems--a-step-by-step-guide'>How to solve 90% of NLP problems: A step-by-step guide.</a></p>Emmanuel Ameisenhttps://www.oreilly.com/ideas/how-to-solve-90-of-nlp-problems--a-step-by-step-guideFour short links: 31 January 20182018-01-31T11:55:00Ztag:www.oreilly.com,2018-01-31:/ideas/four-short-links-31-january-2018<p><em>Fairness, Typesetting, Anomalies, and Faking Out Speech Recognition</em></p><ol>
<li>
<a href="https://www.oreilly.com/ideas/the-problem-with-building-a-fair-system">The Problem with Building a Fair System</a> (Mike Loukides) -- <i>We're ultimately after justice, not fairness. And by stopping with fairness, we are shortchanging the people most at risk. If justice is the real issue, what are we missing?</i>
</li>
<li>
<a href="https://github.com/parrt/bookish">Bookish</a> -- open source <i>tool that translates augmented markdown into HTML or latex</i>. </li>
<li>
<a href="https://github.com/MentatInnovations/datastream.io">datastream.io</a> -- <i>An open source framework for real-time anomaly detection using Python, ElasticSearch, and Kibana.</i> See also <a href="https://medium.com/@ment_at/datastream-io-open-source-anomaly-detection-64db282735e0">the announcement</a>.</li>
<li>
<a href="https://arxiv.org/abs/1801.01944">Audio Adversarial Examples</a> -- <i>Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second).</i> You say "potato," I say "single quote semicolon drop table users semicolon dash dash."</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-31-january-2018'>Four short links: 31 January 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-31-january-2018“Data as a feature” is coming. Are product managers ready?2018-01-31T11:30:00Ztag:www.oreilly.com,2018-01-31:/ideas/data-as-a-feature-is-coming-are-product-managers-ready<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/digital-marketing-1433427_1920_crop-645e2cc00a68c9f58c67d23d2ddb520f.jpg'/></p><p><em>By packaging and delivering actionable data in applications, product managers can help users achieve their goals.</em></p><p>More and more, apps are capable of delivering more than a service, such as access to a bank account or the ability to order a pizza. Apps can offer users data—and not just a dump of data that has little value, but data specifically designed to be of value to the user.</p>
<p>This is called “data as a feature”—it is the act and process of treating data as a core feature of a software product in a way that delivers value to the user. Taking this definition a step further, a product with data as a feature delivers that data in a way that helps the user meet a goal.</p>
<p>The trend of consumerizing applications and making data easier for users to consume and make decisions with is affecting apps and services across all industries. As consumers come to expect, and even demand, value and deep insights, software product managers are increasingly tasked with the responsibility for making sure that “data as a feature” is successfully implemented in the apps they are bringing to market. By packaging and delivering data effectively in a product, they can help users become more informed and better able to take action.</p>
<h2>Implications for product managers</h2>
<p>So, why build data as a feature?</p>
<p>Because everyone is being overwhelmed with data. Business users as well as consumers have data bombarding them from all angles. The challenge for people has shifted. They used to ask for more data all the time. Now they are getting too much of it. They want to get value out of it, but are overwhelmed.</p>
<p>Software developers are responding to this need by translating and presenting data to users in a way they can immediately grasp, and on which they can take action. They’re building data as a feature into their products and displaying that data in visually appealing, intuitive, and easily consumable ways.</p>
<p>In turn, product managers today face a significant addition to their duties—they now need to begin treating data as a feature in the products they’re building. In other words, not viewing data as only a byproduct of the apps they’re charged with developing, but as a prominent feature.</p>
<p>To do this successfully, it’s critical to understand who is going to be using the products, what their data needs are, and how a specific data-driven “slice of business functionality” could help users meet those needs.</p>
<p>To achieve this, “design thinking” is important, even critical. But even more so is “goal thinking”: what are users’ goals, and how do you present data in a way that helps them achieve those goals?</p>
<p>These are challenging questions, and product managers are the ones who have to bring it all together. They need to do all of the following:</p>
<ul>
<li>Understand the data needs—and goals—of their users</li>
<li>Keep the apps they’re developing aligned with the goals of their businesses</li>
<li>Deliver exemplary user experiences with data within the confines of their current technical capacities</li>
<li>Balance the inputs of their developers, UX designers, and users with their own visions for their products so that they bring real value to market</li>
</ul>
<p>To help product managers achieve all these requirements, O’Reilly and TIBCO Jaspersoft have written a free report, <a href="https://www.jaspersoft.com/Free-eBook-Data-as-a-Feature?utm_source=oreilly&amp;utm_medium=referral&amp;utm_campaign=data_feature"><em>Data as a Feature</em></a>, to help you learn why treating data as a feature in your products is a way to make them stand out from the crowd. The report emphasizes that standing out requires more than just providing beautiful data visualizations. The data needs to help users take action, make a decision, or reach a goal.</p>
<p>The report includes hands-on examples, and tips and best practices for managing a project that includes data as a feature. Among other things, the report shows you how to use personas, surface your assumptions, and make your data “over-the-counter” so it can be easily understood and valuable to your users.</p>
<p>For more, download the free report, <a href="https://www.jaspersoft.com/Free-eBook-Data-as-a-Feature?utm_source=oreilly&amp;utm_medium=referral&amp;utm_campaign=data_feature"><em>Data as a Feature</em></a>.</p>
<p><em>This post is a collaboration between O'Reilly and TIBCO Jaspersoft. </em><a href="http://www.oreilly.com/about/editorial_independence.html"><em>See our statement of editorial independence</em></a><em>.</em></p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/data-as-a-feature-is-coming-are-product-managers-ready'>“Data as a feature” is coming. Are product managers ready?.</a></p>Alice LaPlantehttps://www.oreilly.com/ideas/data-as-a-feature-is-coming-are-product-managers-ready7 on-the-rise technology trends to track and learn2018-01-31T11:00:00Ztag:www.oreilly.com,2018-01-31:/ideas/7-on-the-rise-technology-trends-to-track-and-learn<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/oreilly-news-antenna-crop-f5fb18cb55686b70c37ac5003131ec4b.jpg'/></p><p><em>AI, Python, Java, blockchain, and cloud technologies are active topics on O’Reilly’s online learning platform.</em></p><p>When developers and tech leaders are figuring out what to do next, how to advance their careers, or offer more value to their companies, they need to know what’s hot and what’s not living up to the hype.</p>
<p>Toward that end, we dove into the last two year’s worth of search data from our <a href="https://www.safaribooksonline.com/home/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=7-tech-trends-to-track-and-learn-body-cta">online learning platform</a> to identify the topics you should consider exploring in the months ahead. We find search activity particularly effective for spotting shifts in technology usage: what’s gaining traction, what’s falling out of favor, and what topics are maturing.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/7-on-the-rise-technology-trends-to-track-and-learn'>7 on-the-rise technology trends to track and learn.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/zkwYqLYJFKs" height="1" width="1" alt=""/>Andy Oram, Roger Magoulashttps://www.oreilly.com/ideas/7-on-the-rise-technology-trends-to-track-and-learnThe problem with building a “fair” system2018-01-30T12:00:00Ztag:www.oreilly.com,2018-01-30:/ideas/the-problem-with-building-a-fair-system<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/gavel-statue-2771088_1920_crop-99cb4f6916febf5b5333d9ddaf14b66f.jpg'/></p><p><em>The ability to appeal may be the most important part of a fair system, and it's one that isn't often discussed in data circles.</em></p><p>Fairness is a slippery concept. We haven't yet gotten past the first-grade playground: what's "fair" is what's fair to me, not necessarily to everyone else. That's one reason we need to talk about ethics in the first place: to move away from the playground's "that's not fair" (someone has my favorite toy) to a statement about justice.</p>
<p>There have been several important discussions of fairness recently. Cody Marie Wild's “<a href="https://hackernoon.com/fair-and-balanced-thoughts-on-bias-in-probabilistic-modeling-2ffdbd8a880f">Fair and Balanced? Thoughts on Bias in Probabilistic Modeling</a>” and Kate Crawford's NIPS 2017 keynote, “<a href="https://www.youtube.com/watch?v=fMym_BKWQzk">The Trouble with Bias</a>,” do an excellent job of discussing how and why bias keeps reappearing in our data-driven systems. Neither of these papers pretend to have any final answer to the problem of fairness. Nor do I. I would like to expose some of the problems, and suggest some directions for making progress toward the elusive goal of "fairness."</p>
<p>The nature of data itself presents a fundamental problem. "Fairness" is aspirational: we want to be fair, we hope to be fair. Fairness has much more to do with breaking away from our past and transcending it than with replicating it. But data is inevitably historical, and it reflects all the prejudices and biases of the past. If our systems are driven by data, can they possibly be "fair"? Or do they just legitimize historical biases under the guise of science and mathematics? Is it possible to make fair systems out of data that reflects historical biases? I'm uncomfortable with the idea that we can tweak the outputs of a data-driven system to compensate for biases; my instincts tell me that approach will lead to pain and regret. Some research suggests that <a href="http://sorelle.friedler.net/papers/kdd_disparate_impact.pdf">de-biasing the input data</a> may be a better approach, but it's still early.</p>
<p>It is easier to think about fairness when there's only one dimension. Does using a system like <a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">COMPAS</a> lead to harsher punishments for blacks than non-blacks? Was Amazon's same-day delivery service initially <a href="https://www.bloomberg.com/graphics/2016-amazon-same-day/">offered only in predominantly white neighborhoods</a>? (Amazon has addressed this problem.) Those questions are relatively easy to evaluate. But in reality, these problems have many dimensions. A machine learning system that is unfair to people of color might also be unfair to the elderly or the young; it might be unfair to people without college degrees, women, and the handicapped. We actually don't know; for the most part, we haven't asked those questions. We do know that AI is good at finding groups with similar characteristics (such as "doesn't have a college degree"), even when that characteristic isn't explicitly in the data. I doubt that Amazon's same-day delivery service intentionally excluded black neighborhoods; what does "intention" even mean here? Software can't have intentions. Developers and managers can, though their intention was certainly to maximize sales while minimizing costs, not to build an unfair system. But if you build a system that is optimized for high-value Amazon customers, that system will probably discriminate against low-income neighborhoods, and those neighborhoods will "just happen" to include many black neighborhoods. Building an unfair system isn't the intent, but it is a consequence of machine learning's innate ability to form classes.</p>
<p>Going beyond the issue of forming (and impacting) groups in ways that are unintended, we have to ask ourselves what a fair solution would mean. We can test many dimensions for fairness: race, age, gender, disability, nationality, religion, wealth, education, and many more. Wild makes the important point that, as questions about disparate impact cross different groups, we're subdividing our training data into smaller and smaller classes, and that a proliferation of classes, with less data in each class, is itself a recipe for poor performance. But there's a problem that's even more fundamental. Is it possible for a single solution to be fair for all groups? We might be able to design a solution that’s fair to two or three groups, but as the number of groups explodes, I doubt it. Do we care about some kinds of discrimination more than others? Perhaps we do; but that's a discussion that is bound to be uncomfortable.</p>
<p>We are rightly uncomfortable with building dimensions like race and age into our models. However, there are situations in which we have no choice. We don't want race to be a factor in real estate or criminal justice, nor do we want our systems to be finding their own proxies for race, such as street addresses. But what about other kinds of decisions? I've recently read several articles about increased mortality in childbirth for black women, of which the best appeared in <a href="https://www.propublica.org/article/nothing-protects-black-women-from-dying-in-pregnancy-and-childbirth">Pro Publica</a>. Mortality for black women is significantly higher than for white women, even when you control for socio-economics: even when everything is equal, black women are at a much higher risk than white women, and nobody knows why. This means that, if you're designing a medical AI system, you have to take race into account. That's the only way to ensure that the system has a chance to consider the additional risks that black women face. It's also the only way to ensure that the system might be able to determine the factors that actually affect mortality.</p>
<p>Is there a way out of this mess? There are two ways to divide a cake between two children. We can get out micrometers and scales, measure every possible dimension of the cake, and cut the cake so that there are two exactly equal slices. That's a procedural solution; it describes a process that's intended to be fair, but the definition of fairness is external to the process. Or we can give the knife to one child, let them make the cut, then let the other choose. This solution builds fairness into the process.</p>
<p>Computational systems (and software developers) are inherently more comfortable with the first solution. We're very good at doing computation with as many significant digits as you'd like. But cutting the cake ever more precisely isn't likely to be the answer we want. Slicing more precisely only gives us the appearance of fairness—or, more aptly put, something that we can justify as "fair," but without putting an end to the argument. (When I was growing up, the argument typically wasn't about size, but who got more pieces of cherry. Or the icing flower.) Can we do better? Can we come up with solutions that leave people satisfied, regardless of how the cake is cut?</p>
<p>We're ultimately after justice, not fairness. And by stopping with fairness, we are shortchanging the people most at risk. If justice is the real issue, what are we missing? In a conversation, Anna Lauren Hoffmann pointed out that often the biggest difference between having privilege and being underprivileged isn't formal; it’s practical. That is, people can formally have the same rights or opportunities but differ in their practical capacity to seek redress for harms or violations of those things. Having privilege, for example, means having the resources to appeal wrongs if one is short-changed by an unfair system. They may have the time or economic bandwidth to hire a lawyer, spend hours on the phone, contact elected officials, do what it takes. If you are underprivileged, these things can be effectively out of reach. We need to make systems that are more fair (whatever that might mean); but, recognizing that our systems aren't fair, and can't be, we need to provide mechanisms to repair the damage they do. And we need to make sure those systems are easily accessible, regardless of privilege.</p>
<p>The right to appeal builds fairness into the system, rather than having fairness as an external criterion. It's similar to letting one child cut the cake, and the other choose. It's only similar because the appeal process can itself be unfair, but it's a huge step forward. When an appeal is possible, and available to all, you don't need a perfect algorithm.</p>
<p>So, can we get to some conclusions? Being fair is hard, algorithmic or otherwise. Even deciding what we mean by "fair" is difficult. Do not take that to mean that we should give up. But do take that to mean that we shouldn't expect easy, simple solutions. We desperately need to have a discussion about "fairness" and what that means. That discussion needs to be broad and inclusive. And we may need to conclude that "fairness" is contextual, and isn't the same in all situations.</p>
<p>As with everything else, machine learning can help us to be fair. But we're better off using machine learning to understand what's unfair about our data, rather than trusting our systems to make data-driven decisions about what "fair" should be. While our systems can be assistants or even collaborators, we do not want to hand off responsibility to them. When we treat machine learning systems as oracles, rather than as assistants, we are headed in the wrong direction. We can't trick ourselves into thinking that a decision is fair because it is algorithmic. We can't afford to "mathwash" important decisions.</p>
<p>Finally, however we make decisions, we need to provide appeal mechanisms that are equally available to all—not just to those who can afford a lawyer, or who can spend hours listening to music on hold. The ability to appeal may be the most important part of a fair system, and it's one that isn't often discussed in data circles. The ability to appeal means that we don't have to design systems that get it right all the time—and that's important because our systems most certainly won't get it right all the time. Fairness ultimately has less to do with the quality of our decisions than the ability to get a bad decision corrected.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/the-problem-with-building-a-fair-system'>The problem with building a “fair” system.</a></p>Mike Loukideshttps://www.oreilly.com/ideas/the-problem-with-building-a-fair-systemMitigating known security risks in open source libraries2018-01-30T11:00:00Ztag:www.oreilly.com,2018-01-30:/ideas/mitigating-known-security-risks-in-open-source-libraries<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/machine-2881186_1920-aa3ebed0567d4ab0a107baa640661e35.jpg'/></p><p><em>Fixing vulnerable open source packages.</em></p>
<h2>Fixing Vulnerable Packages</h2>
<p>Finding out if you’re using vulnerable packages is an important step, but it’s not the real goal. The real goal is to fix those issues!</p>
<p>This chapter focuses on all you should know about fixing vulnerable packages, including remediation options, tooling, and various nuances. Note that SCA tools traditionally focused on finding or preventing vulnerabilities, and most put little emphasis on fixes beyond providing advisory information or logging an issue. Therefore, you may need to implement some of these remediations yourself, at least until more SCA solutions expand to include them.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/mitigating-known-security-risks-in-open-source-libraries'>Mitigating known security risks in open source libraries.</a></p>Guy Podjarnyhttps://www.oreilly.com/ideas/mitigating-known-security-risks-in-open-source-libraries60+ new live online trainings just launched on O'Reilly's learning platform2018-01-30T11:00:00Ztag:www.oreilly.com,2018-01-30:/ideas/60-plus-new-live-online-trainings-just-launched-on-oreillys-learning-platform<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/learning-at-computer-crop-65cfa0df1b0dba40efbe7670051517d9.jpg'/></p><p><em>Get hands-on training in machine learning, AI, Python, security, usability, and many more topics.</em></p><p>We just opened up <a href="https://www.safaribooksonline.com/live-training/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched">more than 60 live online trainings on our learning platform</a>.</p>
<p>These trainings give you hands-on experience in critical technology, design, and business topics. You'll learn from instructors in O’Reilly’s network of tech innovators and expert practitioners and from our trusted partners.</p>
<p>Space is limited and these trainings often fill up.</p>
<hr>
<br>
<p><a href="https://www.safaribooksonline.com/live-training/courses/hands-on-machine-learning-with-python-clustering-dimension-reduction-and-time-series-analysis/0636920156161/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Hands-on Machine Learning with Python: Clustering, Dimension Reduction, and Time Series Analysis</em></a> on February 14</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/getting-started-with-pythons-pytest/0636920156192/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Getting Started with Python’s Pytest</em></a> on February 14</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/hands-on-machine-learning-with-python-classification-and-regression/0636920156147/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Hands-on Machine Learning with Python: Classification and Regression</em></a> on February 15</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/mastering-pythons-pytest/0636920156239/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Mastering Python’s Pytest</em></a> on February 15</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/aws-security-fundamentals/0636920158516/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>AWS Security Fundamentals</em></a> on March 1</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/customer-research-for-product-managers/0636920149705/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Customer Research for Product Managers</em></a> on March 1</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/advanced-agile-scaling-in-the-enterprise/0636920145486/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Advanced Agile: Scaling in the Enterprise</em></a> on March 2</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/foundational-data-science-with-r/0636920136927/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Foundational Data Science with R</em></a> on March 5-6</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/linux-filesystem-administration/0636920139287/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Linux Filesystem Administration</em></a> on March 5-6</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/introduction-to-lean/0636920134091/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Introduction to Lean</em></a> on March 6</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/introduction-to-kubernetes/0636920122555/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Introduction to Kubernetes</em></a> on March 6-7</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/learn-the-basics-of-scala-in-3-hours/0636920155706/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Learn the Basics of Scala in 3 Hours</em></a> on March 7</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/effective-design-workshops/0636920158233/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Effective Design Workshops</em></a> on March 7</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/practical-ai-on-ios/0636920158172/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Practical AI on iOS</em></a> on March 7</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/docker-up-and-running/0636920124719/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Docker: Up and Running</em></a> on March 7-8</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/cloud-native-architecture-patterns/0636920111191/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Cloud Native Architecture Patterns</em></a> on March 7-8</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/getting-started-with-vue-js/0636920143611/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Getting Started With Vue.js</em></a> on March 8</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/cissp-stumbling-blocks-security-architecture-engineering-and-cryptography/0636920138341/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>CISSP Stumbling blocks: Security Architecture, Engineering, and Cryptography</em></a> on March 9</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/get-started-with-natural-language-processing-in-python/0636920149163/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Get Started with Natural Language Processing in Python</em></a> on March 12</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/testing-and-validating-product-ideas-with-lean/0636920141976/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Testing and Validating Product Ideas with Lean</em></a> on March 12</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/explore-visualize-and-predict-using-pandas-and-jupyter/0636920150985/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Explore, Visualize, and Predict Using Pandas and Jupyter</em></a> on March 12-13</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/getting-started-with-python-3/0636920151180/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Getting started with Python 3</em></a> on March 12-13</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/building-and-managing-kubernetes-applications/0636920155522/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Building and Managing Kubernetes Applications</em></a> on March 13</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/ipv4-subnetting/0636920141556/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>IPv4 Subnetting</em></a> on March 13-14</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/go-programming-for-distributed-computing/0636920157687/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Go Programming for Distributed Computing</em></a> on March 13-14</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/how-to-do-great-customer-interviews/0636920143130/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>How to do Great Customer Interviews</em></a> on March 14</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/beginners-guide-to-creating-prototypes-in-sketch/0636920110521/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Beginner’s Guide to Creating Prototypes in Sketch</em></a> on March 14</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/sql-fundamentals-for-data/0636920108931/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>SQL Fundamentals for Data</em></a> on March 14-15</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/negotiation-fundamentals/0636920149934/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Negotiation Fundamentals</em></a> on March 15</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/big-data-hadoop-for-beginners/0636920152491/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Big Data and Hadoop for Beginners</em></a> on March 15</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/managing-enterprise-data-strategies-with-hadoop-spark-and-kafka/0636920152583/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Managing Enterprise Data Strategies with Hadoop, Spark, and Kafka</em></a> on March 15</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/managing-enterprise-data-strategies-with-hadoop-spark-and-kafka-full-day/0636920152729/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Managing Enterprise Data Strategies with Hadoop, Spark, and Kafka</em></a> on March 16</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/css-layout-fundamentals-from-floats-to-flexbox-and-css-grid/0636920138921/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>CSS Layout Fundamentals: From Floats to Flexbox and CSS Grid</em></a> on March 16</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/porting-from-python-2-to-python-3/0636920146803/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Porting from Python 2 to Python 3</em></a> on March 16</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/cissp-stumbling-blocks-software-development-security-and-identity/0636920156789/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>CISSP Stumbling blocks: Software Development Security and Identity</em></a> on March 16</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/introduction-to-critical-thinking/0636920152095/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Introduction to Critical Thinking</em></a> on March 19</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/troubleshooting-agile/0636920145387/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Troubleshooting Agile</em></a> on March 19</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/scala-beyond-the-basics/0636920151258/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Scala Beyond the Basics</em></a> on March 19-20 </p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/data-science-for-security-professionals/0636920149989/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Data Science for Security Professionals</em></a> on March 19-21</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/amazon-web-services-architect-associate-certification-aws-core-architecture-concepts/0636920131281/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Amazon Web Services: Architect Associate Certification - AWS Core Architecture Concepts</em></a> on March 20-21</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/mastering-data-science-at-enterprise-scale/0636920117384/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Mastering Data Science at Enterprise Scale</em></a> on March 20-21</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/hands-on-machine-learning-with-python-clustering-dimension-reduction-and-time-series-analysis/0636920157984/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Hands-on Machine Learning with Python: Clustering, Dimension Reduction, and Time Series Analysis</em></a> on March 21</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/introduction-to-analytics-for-product-managers/0636920106715/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Introduction to Analytics for Product Managers</em></a> on March 21</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/getting-started-with-pythons-pytest/0636920158127/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Getting Started with Python’s Pytest</em></a> on March 21</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/design-patterns-boot-camp/0636920144915/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Design Patterns Boot Camp</em></a> on March 21-22</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/introduction-to-tensorflow/0636920139713/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Introduction to TensorFlow</em></a> on March 21-22</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/docker-beyond-the-basics-ci-cd/0636920125549/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Docker: Beyond the Basics (CI/CD)</em></a> on March 21-22</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/hands-on-machine-learning-with-python-classification-and-regression/0636920157793/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Hands-on Machine Learning with Python: Classification and Regression</em></a> on March 22</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/mastering-pythons-pytest/0636920158073/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Mastering Python’s Pytest</em></a> on March 22</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/getting-started-with-devops-in-90-minutes/0636920142416/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Getting Started with DevOps in 90 Minutes</em></a> on March 26</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/advanced-sql-for-data-analysis-with-python-r-and-java/0636920109365/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Advanced SQL for Data Analysis (with Python, R, and Java)</em></a> on March 26</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/introduction-to-encryption/0636920135371/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Introduction to Encryption</em></a> on March 26</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/microservices-architecture-and-design/0636920138167/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Microservices Architecture and Design</em></a> on March 26-27</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/usability-testing-101/0636920110064/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Usability Testing 101</em></a> on March 27</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/performance-monitoring-and-diagnostics-for-linux-applications/0636920123590/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Performance Monitoring and Diagnostics for Linux Applications</em></a> on March 27-28</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/amazon-web-services-aws-managed-services/0636920132004/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Amazon Web Services: AWS Managed Services</em></a> on March 27-28</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/getting-started-with-cybersecurity-science/0636920158554/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Getting Started with Cybersecurity Science</em></a> on March 28</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/how-agile-and-traditional-teams-work-together/0636920144250/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>How Agile and Traditional Teams Work Together</em></a> on March 28</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/architecture-by-example/0636920140061/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Architecture by Example</em></a> on March 28-29</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/amazon-web-services-aws-design-fundamentals/0636920131700/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Amazon Web Services: AWS Design Fundamentals</em></a> on March 29-30</p>
<p><a href="https://www.safaribooksonline.com/live-training/courses/scalable-web-development-with-angular/0636920157045/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched"><em>Scalable Web Development with Angular</em></a> on April 16-17</p>
<br>
<hr>
<br>
<p><a href="https://www.safaribooksonline.com/live-training/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=60-live-online-trainings-launched">Visit our learning platform</a> for more information on these and our other live online trainings.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/60-plus-new-live-online-trainings-just-launched-on-oreillys-learning-platform'>60+ new live online trainings just launched on O'Reilly's learning platform.</a></p>https://www.oreilly.com/ideas/60-plus-new-live-online-trainings-just-launched-on-oreillys-learning-platformFour short links: 30 January 20182018-01-30T10:50:00Ztag:www.oreilly.com,2018-01-30:/ideas/four-short-links-30-january-2018<p><em>Podcast Data, Data Stories, Distributed Systems, and Tech Future Scenarios</em></p><ol>
<li>
<a href="https://www.wired.com/story/apple-podcast-analytics-first-month">Podcast Data</a> -- <i>Apple’s Podcast Analytics feature finally became available last month[...]. Though it’s still early days, the numbers podcasters are seeing are highly encouraging. [...] Listeners are typically getting through 80-90% of content. [...] According to Panoply, the few listeners who do skip ads continue to remain engaged with the episode, rather than dropping off at the first sign of an interruption.</i>
</li>
<li>
<a href="https://datafloq.com/read/the-anatomy-of-a-data-story/4435">The Anatomy of a Data Story</a> -- <i>Great data stories: connect with people; try to convey one idea; keep it simple; explore a topic you know well.</i>
</li>
<li>
<a href="https://azure.microsoft.com/en-us/resources/designing-distributed-systems/en-us/">Designing Distributed Systems</a> (Microsoft) -- 160 pages from Microsoft with <i>repeatable, generic patterns, and reusable components to make developing reliable systems easier and more efficient.</i>
</li>
<li>
<a href="http://cifs.dk/media/4302/2018-scenario-1-en-web.pdf">Scenario</a> -- <i>How will society change over the next 50 years? Will we still have jobs as we do today, perhaps with slightly shorter working weeks, or will the so-called "technological singularity" lead us to totally restructure our society? Perhaps reality lies somewhere in the middle. We look at three scenarios for the next few decades of technological development.</i> From <a href="http://www.scenariomagazine.com/"><em>Scenario Magazine</em></a>.</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-30-january-2018'>Four short links: 30 January 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-30-january-2018From big data to fast data2018-01-29T19:45:00Ztag:www.oreilly.com,2018-01-29:/ideas/from-big-data-to-fast-data<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/light-2383202_1920-1458bca3caad374fb14f61fbf7a3818b.jpg'/></p><p><em>Designing application architectures for real-time decisions.</em></p><p>Enterprise data needs change constantly but at inconsistent rates, and in recent years change has come at an increasing clip. Tools once considered useful for big data applications are no longer sufficient. When batch operations predominated, Hadoop could handle most of an organization’s needs. Developments in other IT areas (think IoT, geolocation, etc.) have changed the way data is collected, stored, distributed, processed, and analyzed. Real-time decision needs complicate this scenario, and new tools and architectures are needed to handle these challenges efficiently.</p>
<p>Think of the 3 V's of data: volume, velocity, and variety. For a while, big data emphasized data volume; now, fast data applications mean velocity and variety are key. Two tendencies have emerged from this evolution: first, the variety and velocity of data that enterprises need for decision-making continue to grow. This data includes not only transactional information, but also business data, IoT metrics, operational information, and application logs. Second, modern enterprises need to make those decisions in real time, based on all that collected data. This need is best clarified by looking at how modern shopping websites work.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/from-big-data-to-fast-data'>From big data to fast data.</a></p>Raul Estradahttps://www.oreilly.com/ideas/from-big-data-to-fast-dataFour short links: 29 January 20182018-01-29T11:55:00Ztag:www.oreilly.com,2018-01-29:/ideas/four-short-links-29-january-2018<p><em>Dangerous Data, Data Linter, Participatory Budgeting, and Security Wargames</em></p><ol>
<li>
<a href="https://twitter.com/Nrg8000/status/957318498102865920">Aggregated Data is Dangerous Even When Aggregated</a> -- jogging app releases visualization of all its customers' data, inadvertently exposing military bases. It is dangerous to use data for purposes other than that for which it was collected.</li>
<li>
<a href="https://github.com/brain-research/data-linter">Data Linter</a> -- <i>identifies potential issues (lints) in your ML training data.</i>
</li>
<li>
<a href="https://www.mysociety.org/files/2018/01/Participatory-Budgeting-research-by-mySociety-Jan-2018.pdf">Participatory Budgeting</a> -- <i>This research identified significant challenges in the participatory budgeting sphere, from a very common lack of goals to be achieved through participatory budgeting exercises, to very weak network links and peer support for implementers, to the frustrations of the exercises as a result of political corruption or subversion. The migration to managing participatory budgeting digitally presents the very real risk of the process becoming gentrified, and is just one example of the consequences of scale in participatory budgeting only being achieved at the expense of disenfranchising the most under-represented.</i> There are recommendations as well.</li>
<li>
<a href="http://overthewire.org/wargames/">Over The Wire</a> -- wargames to help you learn and practice security concepts. (via <a href="http://bit.ly/2FqrgRB">Hacker News</a>)</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-29-january-2018'>Four short links: 29 January 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-29-january-2018Four short links: 26 January 20182018-01-26T11:00:00Ztag:www.oreilly.com,2018-01-26:/ideas/four-short-links-26-january-2018<p><em>Bitcoin, Ted Nelson, Constraint Modeling, and NLP</em></p><ol>
<li>
<a href="https://www.nytimes.com/2018/01/16/magazine/beyond-the-bitcoin-bubble.html">Beyond the Bitcoin Bubble</a> (Steven Johnson) -- a fine exegesis of the thesis that blockchain tech is a return to the open-to-innovation protocols of the early days of the internet. <i>Right now, the only real hope for a revival of the open-protocol ethos lies in the blockchain.</i>
</li>
<li>
<a href="https://spectrum.ieee.org/video/geek-life/profiles/ted-nelson-on-what-modern-programmers-can-learn-from-the-past">Ted Nelson on What Modern Programmers Can Learn From The Past</a> -- <i>We thought computing would be artisinal. We did not imagine large monopolies. We thought the Citizen Programmer would be the leader.</i>
</li>
<li>
<a href="http://www.minizinc.org/">MiniZinc</a> -- <i>a free and open source constraint modeling language.</i> See also <a href="http://www.hakank.org/minizinc/index.html">Hakan Kjellerstrand</a>'s page on it. (via <a href="http://bit.ly/2Gf3usZ">Hacker News</a>)</li>
<li>
<a href="https://blog.insightdatascience.com/how-to-solve-90-of-nlp-problems-a-step-by-step-guide-fda605278e4e">How to Solve 90% of NLP Problems: A Step-by-Step Guide</a> -- with an interactive notebook!</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-26-january-2018'>Four short links: 26 January 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-26-january-2018Stream all the things2018-01-26T11:00:00Ztag:www.oreilly.com,2018-01-26:/ideas/stream-all-the-things<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/transport-system-3061716_1920-365c154c8a7dd05dc05cd64d8f12fb3d.jpg'/></p><p><em>Streaming architectures for data sets that never end.</em></p><p>Continue reading <a href='https://www.oreilly.com/ideas/stream-all-the-things'>Stream all the things.</a></p>Dean Wamplerhttps://www.oreilly.com/ideas/stream-all-the-things10 popular resources on O'Reilly's online learning platform2018-01-26T11:00:00Ztag:www.oreilly.com,2018-01-26:/ideas/10-popular-resources-on-oreillys-online-learning-platform<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/fountains-561913_1920-ac4d9da83511a94c8cc5ebe243d518c8.jpg'/></p><p><em>Learn skills in TensorFlow, Java, Spark, and more.</em></p><p>Get a fresh start on building a new skill or augment what you currently know with one of these new and popular titles on Safari.</p>
<h2>Hands-On Machine Learning with Scikit-Learn and TensorFlow (O'Reilly)</h2>
<p><a href="https://www.safaribooksonline.com/library/view/hands-on-machine-learning/9781491962282/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=new-and-popular-releases-on-safari-for-january-2018"><img align="left" src="https://covers.oreillystatic.com/images/0636920052289/cat.gif" style="margin-right: 20px;" width="140px"></a></p><p>Continue reading <a href='https://www.oreilly.com/ideas/10-popular-resources-on-oreillys-online-learning-platform'>10 popular resources on O'Reilly's online learning platform.</a></p><img src="http://feeds.feedburner.com/~r/oreilly/radar/atom/~4/ZRJOWVVKZGE" height="1" width="1" alt=""/>https://www.oreilly.com/ideas/10-popular-resources-on-oreillys-online-learning-platformFour short links: 25 January 20182018-01-25T19:10:00Ztag:www.oreilly.com,2018-01-25:/ideas/four-short-links-25-january-2018<p><em>Young Developers Coding Later, Algorithmic Fairness, Google Goss, and Deep Learned Folk Music</em></p><ol>
<li>
<a href="http://research.hackerrank.com/developer-skills/2018/">2018 Developer Report</a> -- <i>Almost half of all developers (47%) between the ages of 45 and 54 started coding before they were 16 years old. Meanwhile, developers between 18 and 24 today are the least likely to have started coding before 16 (only 20%).</i> That's right: today's kids learn to code later than 80s kids. (via <a href="https://thenextweb.com/dd/2018/01/23/report-80s-kids-started-programming-at-an-earlier-age-than-todays-millennials/">The Next Web</a>)</li>
<li>
<a href="https://arxiv.org/abs/1701.08230">Algorithmic Decision-making and the Cost of Fairness</a> -- <i>Maximizing public safety requires detaining all individuals deemed sufficiently likely to commit a violent crime, regardless of race. However, to satisfy common metrics of fairness, one must set multiple, race-specific thresholds. There is thus an inherent tension between minimizing expected violent crime and satisfying common notions of fairness. This tension is real: by analyzing data from Broward County, we find that optimizing for public safety yields stark racial disparities; conversely, satisfying past fairness definitions means releasing more high-risk defendants, adversely affecting public safety.</i>
</li>
<li>
<a href="https://medium.com/@steve.yegge/why-i-left-google-to-join-grab-86dfffc0be84">Why I Left Google to Join Grab</a> (Steve Yegge) -- finally, he's blogging again, and dishes both dirt on Google and analysis on the food delivery market. <i>Google has become 100% competitor focused rather than customer focused. [...] The problem is that their incentive structure isn’t aligned for focusing on their customers, so they wind up being too busy, and it always gets deprioritized. A slogan isn’t good enough. It takes real effort to set aside time regularly for every employee to interact with your customers. Instead, they play the dangerous but easier game of using competitor activity as a proxy for what customers really need.</i> and <i>The entire food truck industry is about to go tango uniform, disrupted by food delivery just as it was getting off the ground.</i> and <i>Unlike in the U.S., ride-hailing transport is a game-changer for the entire social and economic infrastructure of Southeast Asia, including Singapore, Thailand, Vietnam, Cambodia, Myanmar, Malaysia, Indonesia and the Philippines.</i> tl;dr: he's hiring and gives great rhetoric.</li>
<li>
<a href="https://github.com/IraKorshunova/folk-rnn">folk-rnn</a> -- <i>folk music modeling with LSTM</i>. Applying deep learning to folk music: it's basically my happy place. I love <a href="https://highnoongmt.wordpress.com/2016/08/28/millennial-whoop-with-derp-learning/">the millenial whoop Irish tunes</a>.</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-25-january-2018'>Four short links: 25 January 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-25-january-2018Enterprise data integration with an operational data hub2018-01-25T17:30:00Ztag:www.oreilly.com,2018-01-25:/ideas/enterprise-data-integration-with-an-operational-data-hub<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/swirl-1170475_1920_crop-792705ae8e680587a5c63893162d7057.jpg'/></p><p><em>Facilitating data exchange across the enterprise.</em></p><p>Organizations in the private and public sectors alike are looking for ways to integrate relevant data across the enterprise in support of business, operational, and compliance needs.</p>
<p>Big data technologies (often grouped under the NoSQL label) facilitate the ingestion, processing, and search of data with no regard to schema (database structure). Web companies such as Google, LinkedIn, and Facebook use big data technologies to process the tremendous amount of data from every possible source without regard to structure, and offer a searchable interface to access it. Modern NoSQL technologies have evolved to offer capabilities to govern, process, secure, and deliver data, and have facilitated the development of an integration pattern called the operational data hub (ODH).</p>
<p>The Centers for Medicare and Medicaid Services (CMS) and other organizations (public and private) in the health, finance, banking, entertainment, insurance, and defense sectors (amongst others) utilize the capabilities of ODH technologies for enterprise data integration. This gives them the ability to access, integrate, master, process, and deliver data across the enterprise.</p>
<h2>Traditional model and data silos</h2>
<p>For decades, the standard pattern to produce operational data and enterprise analytics was to develop data warehouses with data schemas dedicated to the purpose of use. Let’s consider an example: an HR department required detailed analysis of human resource data. The development team was engaged to elicit requirements for the reports that would be generated and design a database schema to store that data. Data feeds were developed to pull HR data from all relevant systems (such as payroll and vacation registers), then insert or update tables in the data warehouse to build the required analytical data. Once completed, the HR director could pull metrics on trends in pay raises, tenure, and paid time off.</p>
<p>However, if additional information such as trends in employee satisfaction scores were required, the development team had to be engaged again to elicit requirements, source the data, determine the impact on the database, and then build the processes that update the data warehouse. This process had to be repeated every time the data warehouse needed updating. Each update to the data warehouse typically included a tremendous amount of development and testing to ensure the updated schema did not break existing code. For this reason, the level of effort for analyzing and implementing any change was typically enormous.</p>
<p>Each department, having developed its own operational systems and own data warehouses, could execute business processes and draw analytical information. However, this practice caused isolation in information technology resources—referred to as “data silos.” It is very difficult to draw analytical correlations across data silos. For instance, if a CEO wanted to know the impact of seasonal staff-turnover on the ability to fulfill product delivery and shipment, it would require that HR, sales, production, and shipping data be correlated over time. The effort involved typically meant long delays and significant cost.</p>
<p>Many organizations used relational technologies to implement enterprise data warehouses (EDWs) across the relevant data silos to answer enterprise-wide questions. However, these EDWs suffer from the same challenges as their smaller, departmental cousins. The effort associated with designing and implementing the schema, data extracts, and data feeds is significant. And once an EDW is developed, changes are no easier to make.</p>
<h2>How are things better with an ODH?</h2>
<p>An ODH combines the flexible schema processing capabilities of NoSQL technologies with the governance, rigor, and transactional integrity of relational technologies. To illustrate how an ODH would be helpful, let’s consider the example provided above. Since an ODH is built on a NoSQL technology, and NoSQL technologies allow data to be ingested without regard to schema, the organization can start ingesting available data in raw format into the ODH. Our organization has the following systems across the enterprise:</p>
<ul>
<li>A payroll system that includes employee, position, benefits, and payroll information</li>
<li>A vacation register that manages, approves, and tracks paid time off</li>
<li>A training system that tracks compliance training and job-related training</li>
<li>A product management system that manages product development and parts ordering</li>
<li>Warehouse management that tracks products on hand and manages shipping</li>
<li>An order management system that manages sales and customer information</li>
<li>A customer relationship management (CRM) system that manages customer information and tracks sales</li>
<li>A document management system that manages electronic versions of paper documents</li>
</ul>
<p>The files that the payroll system, vacation system, and training system exchange with one another to coordinate HR information can be ingested into the ODH in raw format. The same can be done for the warehouse management, order management, and CRM systems. Data can be ingested directly from the product management database and the documents in the document management system. This allows for structured files, unstructured (document) files, and database content to reside together in their native formats in the ODH, where they can be indexed, processed, and searched. Thus far, the only additional efforts expended are:</p>
<ul>
<li>the data queries (simple SQL) from the product management system</li>
<li>the processes to ingest the files from the existing interfaces</li>
<li>the processes to ingest the PDF files from the document management system</li>
</ul>
<p>So, with very little effort expended, we have all the data across the enterprise in a single location, with metadata specifying sources of data. Data analysts can now query data across these sources to find answers to the types of questions that the CEO requested, and store the results for quick retrieval later.</p>
<p>However, the true value of the ODH is realized when we leverage the data governance, processing, and consistency capabilities to establish data processing patterns upon ingest. In addition to ingesting the raw data, additional processes can:</p>
<ul>
<li>group cohorts of data based on identifiers or configurable fuzzy logic</li>
<li>apply an in-place harmonized (canonical) model of data elements</li>
<li>apply data quality updates</li>
<li>create master records with updates from disparate systems</li>
</ul>
<p>Let’s see how this applies to our operating example. The IT organization can set up scripts to progressively apply a logically translated common data structure (canonical model) over time so that stored data can be easily processed and searched. Scripts are developed to group cohorts of data, such as clients, vendors, or employees. Updates to these cohorts across source systems maintain a central, mastered copy of the record. If, for instance, a client updates his/her address, we don’t have to rely on fragile point-to-point integrations between the CRM, order management, and warehouse management systems. The update is processed centrally, and the ODH distributes the data to the transactional systems for further processing. Just as it is schema-agnostic during ingest, the ODH allows for configurable schema mapping upon delivery. Data can thus be translated at ingest, during processing, or on delivery, ensuring maximum flexibility during data distribution.</p>
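<p>As a toy illustration of this ingest-then-harmonize pattern, here is a sketch in plain Python (the field names, mappings, and helper functions are invented for illustration and do not reflect any particular ODH product’s API). The raw document is kept untouched, while a canonical, mastered view is maintained beside it:</p>
<pre data-type="programlisting" data-code-language="python">
# Hypothetical sketch: schema-agnostic ingest plus in-place harmonization.
raw_hub = []   # documents stored in their native shape
mastered = {}  # canonical, mastered records keyed by client ID

# Per-source mappings into one canonical "client" model
CANONICAL_MAP = {
    'crm':    {'cust_no': 'client_id', 'addr': 'address'},
    'orders': {'customer': 'client_id', 'ship_to': 'address'},
}

def ingest(doc, source):
    """Land the document as-is; no schema is imposed at ingest time."""
    raw_hub.append({'source': source, 'raw': doc})
    harmonize(doc, source)

def harmonize(doc, source):
    """Apply the canonical model and update the central master record."""
    mapping = CANONICAL_MAP[source]
    record = {canon: doc[field] for field, canon in mapping.items() if field in doc}
    mastered.setdefault(record['client_id'], {}).update(record)

ingest({'cust_no': 42, 'addr': '12 Main St'}, 'crm')
ingest({'customer': 42, 'ship_to': '99 Elm St'}, 'orders')
print(mastered[42])  # {'client_id': 42, 'address': '99 Elm St'}
</pre>
<p>Even at this toy scale, the essential property is visible: adding a new source means adding a mapping, not redesigning a schema.</p>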
<p>Why do we need operational data hubs? We need them to facilitate enterprise data integration with the flexibility of big data/NoSQL technologies, but with the added rigor, governance, and consistency required in an enterprise environment. The ODH facilitates data exchange across the enterprise and allows for analytical processing of raw or mastered data at a fraction of the cost of traditional technologies.</p>
<p><em>This post is a collaboration between O’Reilly and MarkLogic. </em><a href="http://www.oreilly.com/about/editorial_independence.html"><em>See our statement of editorial independence</em></a><em>.</em></p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/enterprise-data-integration-with-an-operational-data-hub'>Enterprise data integration with an operational data hub.</a></p>Gerhard Ungererhttps://www.oreilly.com/ideas/enterprise-data-integration-with-an-operational-data-hubA quick intro to experience mapping2018-01-25T14:35:00Ztag:www.oreilly.com,2018-01-25:/ideas/a-quick-intro-to-experience-mapping<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/projection_mapping_pakistan_4_crop-c7a9535be34b8bd7d5e346d0db2b3c5b.jpg'/></p><p><em>Experience mapping helps organizations see themselves from the outside in, rather than the inside out.</em></p><p>Continue reading <a href='https://www.oreilly.com/ideas/a-quick-intro-to-experience-mapping'>A quick intro to experience mapping.</a></p>James Kalbachhttps://www.oreilly.com/ideas/a-quick-intro-to-experience-mappingPaul Bakker and Sander Mak on Java 9 modularity2018-01-25T12:35:00Ztag:www.oreilly.com,2018-01-25:/ideas/paul-bakker-and-sander-mak-on-java-9-modularity<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/cubes-1778740_1920_crop-035a7daaf2790db12e5b310878d1abf1.jpg'/></p><p><em>The O’Reilly Programming Podcast: The Java module system and the “start of a new era.”</em></p><p>In this episode of the <a href="https://www.oreilly.com/topics/oreilly-programming-podcast">O’Reilly Programming Podcast</a>, I talk with <a href="https://twitter.com/pbakker">Paul Bakker</a>, senior software engineer on the edge developer experience team at Netflix, and <a href="https://twitter.com/Sander_Mak">Sander Mak</a>, a fellow at <a href="https://www.luminis.eu/">Luminis Technologies</a>. They are the authors of the O’Reilly book <a href="https://www.safaribooksonline.com/library/view/java-9-modularity/9781491954157/?utm_source=oreilly&amp;utm_medium=newsite&amp;utm_campaign=20180124_programming_podcast_bakker_mak_text_body_java_modularity_book"><em>Java 9 Modularity</em></a>, in which they call the introduction of the module system to the platform “the start of a new era.” </p><p>Continue reading <a href='https://www.oreilly.com/ideas/paul-bakker-and-sander-mak-on-java-9-modularity'>Paul Bakker and Sander Mak on Java 9 modularity.</a></p>Jeff Bleielhttps://www.oreilly.com/ideas/paul-bakker-and-sander-mak-on-java-9-modularityApache Beam lowers barriers to entry for big data processing technologies2018-01-25T11:30:00Ztag:www.oreilly.com,2018-01-25:/ideas/apache-beam-lowers-barriers-to-entry-for-big-data-processing-technologies<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/photo_6288_20160702_crop-58446f0d655fc41d37c16b27c36a8ff8.jpg'/></p><p><em>The classic “write once, run everywhere” principle comes to life in streaming data.</em></p><p>New technologies in data processing are piling up faster than most programmers can learn them. Eager to enter the radically innovative programming worlds of streaming input and big data, we heard that we had to learn <a href="https://research.google.com/archive/mapreduce.html">MapReduce</a>, and then—no, it’s <a href="https://spark.apache.org/">Spark</a> we need to know, and now perhaps something still different such as <a href="https://flink.apache.org/">Flink</a>. 
Big data is in an exciting stage of development, where new technologies continuously sprout up. Just take a look at the Apache projects offered for every point in the pipeline (including tools to manage the pipeline). Not to be outpaced, the major cloud services (such as Amazon.com’s AWS, Microsoft’s Azure, and Google Cloud) compete furiously in this space, eager to offer data processing platforms in order to build their brands beyond IaaS or PaaS services that are at risk of becoming commoditized.</p>
<p>The result is a barrier to programmers who wish to be of greater value to their employers and to organizations striving to integrate better sources of data into their decision-making. Because it’s so hard to learn one technology, the organization may stick with it much longer than appropriate and lose the chance to apply a newer and more efficient technology to its data processing needs. Data engineers may still be using traditional relational databases and ETL technologies, which often focus on batch processing, in contrast to newer technologies that allow stream processing.</p>
<p>Into this churning environment comes <a href="https://beam.apache.org/">Apache Beam</a> as a much-needed standard to open up access to all the popular streaming technologies through a single API. Several important data processing tools (notably Spark, Flink, and Google Cloud Dataflow) are now supported by the Beam API, and as an open source technology, it is welcoming to all.</p>
<p>The Beam architecture works like this: developers describe a data pipeline once, using one of Beam’s SDKs, and the pipeline is captured as a portable specification. Jobs can then be scheduled on drivers called “runners” that translate that specification into the precise form needed by the chosen processor (Spark, etc.). The people running the jobs can be different from the developers creating them, and the same job can be run on different processors for different purposes—trading off issues such as data size and needed response time. Thus, the slogan “write once, run everywhere,” originally used to describe the then-new Java language, applies to Beam in this context. Multiple programming languages are also supported by Beam.</p>
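<p>To make that concrete, here is a minimal sketch of a word-count pipeline written with Beam’s Python SDK (the file names and the counting logic are illustrative only; the point is that the runner is a launch-time choice, and the pipeline definition itself never changes):</p>
<pre data-type="programlisting" data-code-language="python">
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Choose the execution engine at launch time: DirectRunner, SparkRunner,
# FlinkRunner, DataflowRunner, and so on.
options = PipelineOptions(runner='DirectRunner')

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' &gt;&gt; beam.io.ReadFromText('input.txt')  # hypothetical input file
     | 'Split' &gt;&gt; beam.FlatMap(lambda line: line.split())
     | 'Pair' &gt;&gt; beam.Map(lambda word: (word, 1))
     | 'Count' &gt;&gt; beam.CombinePerKey(sum)
     | 'Format' &gt;&gt; beam.Map(lambda kv: '%s: %d' % kv)
     | 'Write' &gt;&gt; beam.io.WriteToText('word_counts'))
</pre>
<p>Rerunning the same code on Spark or Flink is then a matter of changing the <code>runner</code> option (plus the engine-specific connection settings), not of rewriting the job.</p>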
<p>While Apache Beam hopes to become the one ring to bind all the data processing frameworks, it is not a lowest common denominator. (Google software engineer Frances Perry made this point in a 2017 <a href="https://www.talend.com/blog/2017/07/17/podcast-whats-next-for-apache-beam-featuring-frances-perry-of-google/">interview</a>.) The Beam development team tracks the adoption of new concepts and features by streaming platforms, and standardizes important new trends. The provision of a standard also drives platforms to incorporate new features so as to support Beam more fully. The tools can continue to compete on the basis of performance, flexibility, and other differences in their architectures. Tools for relational data are also being developed, based on <a href="https://calcite.apache.org/">Apache Calcite</a>.</p>
<p>Is it worth your time to learn Beam? It’s important to thoroughly understand the strengths and weaknesses of the underlying platform you use, but if you know Beam, you might be able to greatly reduce development time for each platform, and make porting almost instant. Beam has a thriving <a href="https://beam.apache.org/">developer and user community</a> with contributions from such major companies as Google, Talend, PayPal, and data Artisans. There is a distinct possibility that Beam will become a de facto requirement for new tools in the data processing space, enhancing its value even more. In that case, the investment that programmers make in learning Beam will continue to pay off for years to come.</p>
<p><em>This post is a collaboration between O'Reilly and Talend. </em><a href="http://www.oreilly.com/about/editorial_independence.html"><em>See our statement of editorial independence</em></a><em>.</em></p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/apache-beam-lowers-barriers-to-entry-for-big-data-processing-technologies'>Apache Beam lowers barriers to entry for big data processing technologies.</a></p>Andy Oramhttps://www.oreilly.com/ideas/apache-beam-lowers-barriers-to-entry-for-big-data-processing-technologiesImplement OAuth in 15 minutes with Firebase2018-01-25T11:00:00Ztag:www.oreilly.com,2018-01-25:/ideas/implement-oauth0-in-15-minutes-with-firebase<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/fire-1905608_1280-9e219738777669127bcf5564211040c6.jpg'/></p><p><em>A start-to-finish example with Angular and TypeScript.</em></p><p>OAuth is the undisputed standard for authenticating over the web or in native mobile apps. The promise of one-click signups and logins has obvious appeal, but getting this up and running is often quite challenging. Over the years we have heard countless horror stories from developers who have spent weeks trying to add OAuth support to their applications, with varying degrees of success.</p>
<p>This article provides a 15-minute, step-by-step guide to adding OAuth support to a CLI-generated Angular application using Firebase. We will implement OAuth with a Google account, but other platforms supported by Firebase include Facebook, Twitter, and GitHub.</p><p>Continue reading <a href='https://www.oreilly.com/ideas/implement-oauth0-in-15-minutes-with-firebase'>Implement OAuth in 15 minutes with Firebase.</a></p>Michael Dowden, Martine Dowden, Michael McGinnishttps://www.oreilly.com/ideas/implement-oauth0-in-15-minutes-with-firebaseAnomaly detection with Apache MXNet2018-01-24T19:15:00Ztag:www.oreilly.com,2018-01-24:/ideas/anomaly-detection-with-apache-mxnet<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/stadium-2921657_1920_crop-53b7acef14ad2406d39ed765c57d7894.jpg'/></p><p><em>Finding anomalies in time series using neural networks.</em></p><p>In recent years, the term “anomaly detection” (also referred to as “outlier detection”) has started popping up more and more on the internet and in conference presentations. This is not a new topic by any means; niche fields have been using it for a long time. Nowadays, though, due to advances in banking, auditing, the Internet of Things (IoT), and other areas, anomaly detection has become a fairly common task in a broad spectrum of domains. As with other tasks that have widespread applications, anomaly detection can be tackled using multiple techniques and tools. This, of course, can cause a lot of confusion concerning what it’s for and how it works.</p>
<p>This article takes a look at how different types of neural networks can be applied to detect anomalies in time series data using <a href="https://mxnet.apache.org/">Apache MXNet</a>, a fast and scalable training and inference framework with an easy-to-use, concise API for machine learning, in Python using Jupyter Notebooks. By the end of this tutorial, you should:</p>
<ul>
<li>Know what anomaly detection is and the common techniques for solving it</li>
<li>Be able to set up your MXNet environment</li>
<li>See the difference between different types of networks, along with their strengths and weaknesses</li>
<li>Load and preprocess the data for such a task</li>
<li>Build network architectures in MXNet</li>
<li>Train models using MXNet and use them for predictions</li>
</ul>
<p>All the code and the data used in this tutorial can be found on <a href="https://github.com/mdymczyk/anomaly-detection">GitHub</a>.</p>
<h2>Anomaly detection</h2>
<p>When talking about any machine learning task, I like to start by pointing out that, in many cases, the task is really all about finding <em>patterns</em>. This problem is no different. Anomaly detection is a process of training a model to find a pattern in our training data, which we subsequently can use to identify any observations that do not conform to that pattern. Such observations will be called <em>anomalies</em> or <em>outliers</em>. In other words, we will be looking for a deviation from the standard pattern, something rare and unexpected.</p>
<p>Figure 1 shows anomalies in a human heartbeat, indicating a medical syndrome.</p>
<figure class="center" id="id-68jin"><img alt="anomalies in a human heartbeat" src="https://d3ansictanv2wj.cloudfront.net/Figure1-314a4eae5fb5535bc807e80df3c71466.png"><figcaption><span class="label">Figure 1. </span>Wolff-Parkinson-White syndrome is a type of heartbeat anomaly, where you can clearly see how the delta wave broadens the ventricular complex and shortens the PR interval. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>An important distinction has to be made between anomaly detection and “novelty detection.” The latter turns up new, previously unobserved events that are still acceptable and expected. For example, at some point in time, your credit card statements might start showing baby products, which you’ve never before purchased. Those are new observations not found in the training data, but, given the normal changes in consumers’ lives, they may be acceptable purchases that should not be marked as anomalies.</p>
<p>Anomaly detection can be leveraged for multiple use cases:</p>
<ul>
<li>
<strong>Predictive maintenance.</strong> In factories, or any kind of IoT environment, you can build a model using data gathered during normal execution modes and use it to predict imminent failure. This means no unplanned halts in production.</li>
<li>
<strong>Fraud detection.</strong> Financial institutions often use this technique to catch unexpected expenses: for example, if your credit card got stolen.</li>
<li>
<strong>Health care.</strong> Used for diagnostics, for instance.</li>
<li>
<strong>Cybersecurity.</strong> Ever wanted to catch all those intruders trying to hack into your system? Anomaly detection can help you.</li>
</ul>
<p>Similarly, as mentioned before, a wide range of methods can be used to solve this problem. A few of the most popular include:</p>
<ul>
<li>
<a href="http://web.mit.edu/kirtley/kirtley/binlustuff/literature/control/Kalman%20filter.pdf">Kalman filters</a>, which use simple statistics</li>
<li><a href="https://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec06.pdf">K-nearest neighbors</a></li>
<li><a href="http://stanford.edu/~cpiech/cs221/handouts/kmeans.html">K-means clustering</a></li>
<li>Deep learning-based <a href="https://cs.stanford.edu/~quocle/tutorial2.pdf">autoencoders</a>
</li>
</ul>
<p>Several problems arise when you try to use most of these algorithms. For instance, they tend to make specific assumptions about the data, and some do not work with multivariate data sets.</p>
<p>This is why today we will look into the last method—autoencoders—using two types of neural networks: multilayer perceptron and long short-term memory (LSTM) networks. For simplicity, this tutorial uses only a single feature, for which other methods might turn out just as good, but the great thing about neural nets is how good they are at modeling multivariate problems, which is what you’d probably want to do in production (especially when working with IoT time-series data, as in this example).</p>
<h2>Autoencoders</h2>
<p>The kind of network we will discuss here goes by many names: autoencoder, autoassociator, or, my personal favorite, Diabolo. The technique is a type of artificial neural network used for unsupervised learning of efficient codings. In plain English, this means it is used to find a different way of representing (encoding) our input data. Autoencoders are sometimes also used to reduce the dimensions of the data.</p>
<p>An autoencoder finds an approximation of the identity function (Id : X → X) through two steps:</p>
<ol>
<li>The encoder step, where the input data is transformed into an intermediate state</li>
<li>The decoder step, which transforms the intermediate state back to match the number of input features</li>
</ol>
<figure class="center" id="id-6LDiA"><img alt="Autoencoder flow diagram" src="https://d3ansictanv2wj.cloudfront.net/Figure2-266285cc235f5a35b56f1044df59900d.png"><figcaption><span class="label">Figure 2. </span>Autoencoder flow diagram, where we input an image of a number (4), encode it into compressed format and then decode it back into image format. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>For the math buffs, this can be described as two transitions:</p>
<p class="center"><span class="math-tex" data-type="tex">\(\phi : X \rightarrow F\)</span></p>
<p class="center"><span class="math-tex" data-type="tex">\(\psi : F \rightarrow X\)</span></p>
<p>Usually, autoencoders are trained by optimizing the <a href="http://statsmaths.github.io/stat612/lectures/lec14/lecture14.pdf">mean square error</a> between the output and input layers, where <em>X</em> is the input vector, <em>Y</em> is the output vector, and <em>n</em> is the number of elements:</p>
<p class="center"><span class="math-tex" data-type="tex">\(Y = (\psi\circ\phi)X\)</span></p>
<p class="center"><span class="math-tex" data-type="tex">\(MSE = \frac{1}{n}\sum^{n}_{i=1}(Y-X)^{2}\)</span></p>
<p>After we are done training our autoencoder, we need to set a threshold, which will decide whether we have predicted an anomaly or not. Depending on your exact use case and data, there are different ways to set this threshold—for example, based on a <a href="http://people.inf.elte.hu/kiss/12dwhdm/roc.pdf">receiver operating characteristics</a> (ROC) curve or <a href="https://arxiv.org/pdf/1402.1892.pdf">F1 score</a>. The higher the threshold you set, the longer it will take the system to detect an anomaly (and fewer will be detected, in some circumstances). In this tutorial, we will run predictions on our training data set after we are done training our model, calculate the error for each prediction, and find the mean and standard deviation for those errors. Every error more than three standard deviations above the mean will be marked as an anomaly.</p>
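<p>In code, that rule amounts to a few lines of NumPy. Here is a minimal sketch, with synthetic values standing in for the per-record reconstruction errors of the trained model:</p>
<pre data-type="programlisting" data-code-language="python">
import numpy as np

# Synthetic stand-in for the reconstruction errors on the training set
rng = np.random.RandomState(0)
train_errors = np.abs(rng.normal(0.0, 0.1, size=1000))

# Flag anything more than three standard deviations above the mean error
threshold = train_errors.mean() + 3 * train_errors.std()

def is_anomaly(reconstruction_error):
    return reconstruction_error &gt; threshold

print(is_anomaly(0.05), is_anomaly(0.9))  # typically: False True
</pre>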
<h2>System setup</h2>
<p>We first need to install a few tools before we can jump into data analysis and modeling. I highly recommend using some sort of Python environment management system, such as <a href="https://anaconda.org/anaconda">Anaconda</a> or <a href="https://virtualenv.pypa.io/en/stable/">Virtualenv</a>. This tutorial will use the latter.</p>
<ol>
<li>
<a href="https://virtualenv.pypa.io/en/stable/installation/">Install Virtualenv</a>. On most systems it should be as easy as calling <code>pip install virtualenv</code>.</li>
<li>Create a new virtualenv environment by calling <code>virtualenv oreilly-anomaly</code>. This will create a new folder called <em>oreilly-anomaly</em> in your current directory.</li>
<li>Activate the environment by calling <code>. oreilly-anomaly/bin/activate</code>
</li>
<li>Install <a href="http://www.numpy.org/">Numpy</a>, <a href="https://pandas.pydata.org/">Pandas</a>, <a href="http://jupyter.org/">Jupyter Notebook</a>, and <a href="https://matplotlib.org">Matplotlib</a> by running <code>pip install numpy pandas ipython jupyter ipykernel matplotlib</code>
</li>
<li>Install <a href="https://mxnet.incubator.apache.org/get_started/install.html">MXNet</a>.</li>
<li>Add the virtual env as a Jupyter kernel: <code>python -m ipykernel install --user --name=oreilly-anomaly</code>
</li>
<li>Run the notebook <code>jupyter notebook .</code>
</li>
<li>In Jupyter, choose oreilly-anomaly as the kernel: Menu → Kernel → Change kernel → oreilly-anomaly</li>
</ol>
<h2>Data set</h2>
<p>As mentioned in the introduction, anomaly detection can be used on labeled or unlabeled data from different industries. Today, we will use IoT-based data, which can be used for <a href="https://en.wikipedia.org/wiki/Predictive_maintenance">predictive maintenance</a>. Predictive maintenance uses machine data to predict ahead of time when an issue may occur. This has a number of advantages over scheduled maintenance: in a traditional system, you would have to either know your machines really well (to know how often they need maintenance) or check them frequently; otherwise, you would risk failures.</p>
<p>The data was gathered using hardware sensors made by a Tokyo-based startup, <a href="https://www.lp-research.com/">LP Research</a>. The sensors used this time can read up to 21 different values, including <a href="http://mathworld.wolfram.com/Acceleration.html">linear acceleration</a> (rate of change in velocity without changing direction) in the X, Y, and Z dimensions. Today, for the sake of simplicity (and visualization), we will use only one feature: linear acceleration in X. In real life, you probably would want to use more, especially when using neural networks, as they are great at figuring out all the features by themselves.</p>
<p>Figure 3 shows sample data gathered about this feature.</p>
<figure class="center" id="id-5wmiG"><img alt="IoT data about linear acceleration along X axis of equipment" src="https://d3ansictanv2wj.cloudfront.net/Figure3-51d76d01f4280e7bc41fdcea75e07ecd.png"><figcaption><span class="label">Figure 3. </span>IoT data about linear acceleration along X axis of equipment. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>As you can see, the data is quite cyclical and already nicely scaled—this is not always the case. One thing to notice is the occasional spikes, which must be recognized as normal and not anomalous. We will need to make sure our model is smart enough to handle them.</p>
<h2>Feed-forward networks</h2>
<p>When working on a machine learning problem, starting with a simple solution and working your way up iteratively is always a good idea. Otherwise, you can get lost in the complexity from the start.</p>
<p>For that reason, we will first implement our autoencoder using one of the simplest possible types of neural networks, a multilayer perceptron (MLP). An MLP is a type of <a href="http://media.wiley.com/product_data/excerpt/19/04713491/0471349119.pdf">feedforward neural network</a>, meaning a network with no cycles—all the connections go forward (in contrast to <a href="https://www.cs.toronto.edu/~hinton/csc2535/notes/lec10new.pdf">recurrent neural networks</a>, which we will use in the next section). An MLP is a simple network with at least three layers: input, output, and at least one hidden layer between them. A feedforward autoencoder is a special type of MLP, where the number of neurons in the input layer is the same as the number of neurons in the output layer. A simple example appears in Figure 4.</p>
<figure class="center" id="id-5jLin"><img alt="Simple feedforward autoencoder multilayer perceptron" src="https://d3ansictanv2wj.cloudfront.net/Figure4-2a8d0655224df2adfc63b023ebcbdf64.png"><figcaption><span class="label">Figure 4. </span>A simple feedforward autoencoder (MLP). Figure by Mateusz Dymczyk.</figcaption></figure>
<p>The main advantage of MLPs is that they are quite easy to model and fast to train. Also, a lot of research has been done using them, so they are fairly well understood.</p>
<p>When modeling an MLP, there are a few things you, as the creator, need to figure out, including:</p>
<ul>
<li>the number of hidden layers and number of neurons at each layer</li>
<li>the type of activation function used in each neuron</li>
<li>the optimizer used for training</li>
</ul>
<p>All these choices will affect the results of your model. If you choose the wrong parameters, your network might not converge at all, take a long time to converge (for example, if you choose a bad optimizer or bad learning rate), overfit the real-life data, or underfit the real-life data.</p>
<p>Let’s go through the most important parts of the code.</p>
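<p>The notebook on GitHub defines its imports and compute context up front. A minimal set for the snippets that follow would look something like this (the CPU context is an assumption; use <code>mx.gpu(0)</code> on a GPU machine):</p>
<pre data-type="programlisting" data-code-language="python">
import numpy as np
import pandas as pd
import mxnet as mx
from mxnet import nd, gluon

# Compute context used for training and evaluation
ctx = mx.cpu()
</pre>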
<h3>Data preparation</h3>
<p>We first read the data from our CSV files using the Pandas framework. This will return a Pandas DataFrame:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">train_data_raw</code> <code class="o">=</code> <code class="n">pd</code><code class="o">.</code><code class="n">read_csv</code><code class="p">(</code><code class="s1">'resources/normal.csv'</code><code class="p">)</code>
<code class="n">validate_data_raw</code> <code class="o">=</code> <code class="n">pd</code><code class="o">.</code><code class="n">read_csv</code><code class="p">(</code><code class="s1">'resources/verify.csv'</code><code class="p">)</code>
</pre>
<p>Now we want to extract the columns we will actually use for training and predictions:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">feature_list</code> <code class="o">=</code> <code class="p">[</code><code class="s2">" LinAccX (g)"</code><code class="p">]</code>
<code class="n">features</code> <code class="o">=</code> <code class="nb">len</code><code class="p">(</code><code class="n">feature_list</code><code class="p">)</code>
<code class="n">train_data_selected</code> <code class="o">=</code> <code class="n">train_data_raw</code><code class="p">[</code><code class="n">feature_list</code><code class="p">]</code><code class="o">.</code><code class="n">as_matrix</code><code class="p">()</code>
<code class="n">validate_data_selected</code> <code class="o">=</code> <code class="n">validate_data_raw</code><code class="p">[</code><code class="n">feature_list</code><code class="p">]</code><code class="o">.</code><code class="n">as_matrix</code><code class="p">()</code>
</pre>
<p>Before we start modeling our network, we need to do some more preprocessing. The major drawback of MLP networks is their lack of “memory.” Each record is treated as a separate entity during training and predictions. When dealing with time series, though, the dependency between observations is very important. A single spike in our data does not necessarily mean an anomaly: that depends on its surroundings.</p>
<p>To tackle this, we will create <em>windowed records</em> using a simple method, which goes record by record and appends the <code>window - 1</code> preceding records to it (in our case, <em>window</em> will be set to 25, but this value should be based on your use case, the frequency with which you are getting readings, and how fast you want your model to predict anomalies, potentially sacrificing accuracy). This new type of record will have <code>window * features</code> size and will mimic the temporal dependency between timesteps. If we wish to utilize the first <code>window - 1</code> readings, though, we need to pad them to the appropriate length because MLP networks require a constant number of inputs. In this example, we will pad them with zeroes:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="k">def</code> <code class="nf">prepare_dataset</code><code class="p">(</code><code class="n">dataset</code><code class="p">,</code> <code class="n">window</code><code class="p">):</code>
<code class="n">windowed_data</code> <code class="o">=</code> <code class="p">[]</code>
<code class="k">for</code> <code class="n">i</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="nb">len</code><code class="p">(</code><code class="n">dataset</code><code class="p">)):</code>
<code class="n">start</code> <code class="o">=</code> <code class="n">i</code> <code class="o">+</code> <code class="mi">1</code> <code class="o">-</code> <code class="n">window</code> <code class="k">if</code> <code class="n">i</code> <code class="o">+</code> <code class="mi">1</code> <code class="o">-</code> <code class="n">window</code> <code class="o">&gt;=</code> <code class="mi">0</code> <code class="k">else</code> <code class="mi">0</code>
<code class="n">observation</code> <code class="o">=</code> <code class="n">dataset</code><code class="p">[</code><code class="n">start</code> <code class="p">:</code> <code class="n">i</code> <code class="o">+</code> <code class="mi">1</code><code class="p">,]</code>
<code class="n">to_pad</code> <code class="o">=</code> <code class="p">(</code><code class="n">window</code> <code class="o">-</code> <code class="n">i</code> <code class="o">-</code> <code class="mi">1</code> <code class="k">if</code> <code class="n">i</code> <code class="o">+</code> <code class="mi">1</code> <code class="o">-</code> <code class="n">window</code> <code class="o">&lt;</code> <code class="mi">0</code> <code class="k">else</code> <code class="mi">0</code><code class="p">)</code> <code class="o">*</code> <code class="n">features</code>
<code class="n">observation</code> <code class="o">=</code> <code class="n">observation</code><code class="o">.</code><code class="n">flatten</code><code class="p">()</code>
<code class="n">observation</code> <code class="o">=</code> <code class="n">np</code><code class="o">.</code><code class="n">lib</code><code class="o">.</code><code class="n">pad</code><code class="p">(</code><code class="n">observation</code><code class="p">,</code> <code class="p">(</code><code class="n">to_pad</code><code class="p">,</code> <code class="mi">0</code><code class="p">),</code> <code class="s1">'constant'</code><code class="p">,</code> <code class="n">constant_values</code><code class="o">=</code><code class="p">(</code><code class="mi">0</code><code class="p">,</code> <code class="mi">0</code><code class="p">))</code>
<code class="n">windowed_data</code><code class="o">.</code><code class="n">append</code><code class="p">(</code><code class="n">observation</code><code class="p">)</code>
<code class="k">return</code> <code class="n">np</code><code class="o">.</code><code class="n">array</code><code class="p">(</code><code class="n">windowed_data</code><code class="p">)</code>
</pre>
<p>When building machine learning models, you don’t want to use all the data for training—this might leave you with a highly overfitted model. For this reason, it is normal to split the data into train and validation sets, and use both for evaluation during training. Normally, splitting the data is easy, but with time-series data, it gets a bit more complicated. This is because of the temporal dependency between records: the context in which each datapoint appears is very important. This is why in this tutorial, instead of randomly sampling the data, we will simply find a split point and use it to split the data into two subsets (80% of the data for training and 20% for testing):</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">rows</code> <code class="o">=</code> <code class="nb">len</code><code class="p">(</code><code class="n">data_train</code><code class="p">)</code>
<code class="n">split_factor</code> <code class="o">=</code> <code class="mf">0.8</code>
<code class="n">train</code> <code class="o">=</code> <code class="n">data_train</code><code class="p">[</code><code class="mi">0</code><code class="p">:</code><code class="nb">int</code><code class="p">(</code><code class="n">rows</code><code class="o">*</code><code class="n">split_factor</code><code class="p">)]</code>
<code class="n">test</code> <code class="o">=</code> <code class="n">data_train</code><code class="p">[</code><code class="nb">int</code><code class="p">(</code><code class="n">rows</code><code class="o">*</code><code class="n">split_factor</code><code class="p">):]</code>
</pre>
<p>Now we need to prepare a <a href="https://mxnet.incubator.apache.org/api/python/gluon/data.html#mxnet.gluon.data.DataLoader">DataLoader</a> object, which will feed the data in a batched manner to MXNet:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">batch_size</code> <code class="o">=</code> <code class="mi">256</code>
<code class="n">train_data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">DataLoader</code><code class="p">(</code><code class="n">train</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">shuffle</code><code class="o">=</code><code class="bp">False</code><code class="p">)</code>
<code class="n">test_data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">DataLoader</code><code class="p">(</code><code class="n">test</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">shuffle</code><code class="o">=</code><code class="bp">False</code><code class="p">)</code>
</pre>
<p>This iterator will pass the training data in batches of 256 records. This is especially important if you’re running on the GPU—batches that are too big can quickly result in out-of-memory errors. On the other hand, batches that are too small will lengthen training time.</p>
<h3>Modeling</h3>
<p>The code for modeling is very brief thanks to the use of <a href="https://mxnet.incubator.apache.org/tutorials/gluon/gluon.html">Apache Gluon</a>, a high-level interface to MXNet.</p>
<p>Our model will be a sequence of blocks representing hidden layers. To make modeling easy, we will use <a href="https://mxnet.incubator.apache.org/api/python/gluon.html#mxnet.gluon.nn.Sequential">gluon.nn.Sequential</a> for that:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">model</code> <code class="o">=</code> <code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Sequential</code><code class="p">()</code>
<code class="k">with</code> <code class="n">model</code><code class="o">.</code><code class="n">name_scope</code><code class="p">():</code>
</pre>
<p>Adding hidden layers, activation functions, and a dropout layer now is a matter of simple MXNet method calls:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">model</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Dense</code><code class="p">(</code><code class="mi">16</code><code class="p">,</code> <code class="n">activation</code><code class="o">=</code><code class="s1">'tanh'</code><code class="p">))</code> <code class="c1"># Adds a fully connected layer with 16 neurons and a tanh activation</code>
<code class="n">model</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Dropout</code><code class="p">(</code><code class="mf">0.25</code><code class="p">))</code> <code class="c1"># Adds a dropout layer</code>
</pre>
<p>This will feed the input, which we will pass later on to our model object, to the first hidden layer containing 16 neurons, pass it through an activation layer—in this case “tanh,” which not only is computationally cheaper than many other activation functions, but has also been <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.740.9413&amp;rep=rep1&amp;type=pdf">shown</a> to converge quickly and achieve high accuracy for MLP networks—and randomly drop a portion of the activations during training so we do not overfit. In our network, we will pass the output of the dropout layer to another hidden layer and repeat the cycle two more times (hidden layers with 8 and 16 neurons—hidden layers should have fewer neurons than the input layer, which forces the network to find structure). Our final layer will not have any activation or dropout after it and will be treated as the output layer, as shown in the sketch below.</p>
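<p>Putting the whole stack together, a sketch of the network described above would look as follows (the output width of <code>window * features</code> is inferred from the autoencoder setup, since the output must match the flattened, windowed input; see the notebook on GitHub for the exact stack used):</p>
<pre data-type="programlisting" data-code-language="python">
window = 25  # as chosen earlier in the text

model = gluon.nn.Sequential()
with model.name_scope():
    model.add(gluon.nn.Dense(16, activation='tanh'))
    model.add(gluon.nn.Dropout(0.25))
    model.add(gluon.nn.Dense(8, activation='tanh'))
    model.add(gluon.nn.Dropout(0.25))
    model.add(gluon.nn.Dense(16, activation='tanh'))
    model.add(gluon.nn.Dropout(0.25))
    # Output layer: no activation or dropout; width matches the windowed input
    model.add(gluon.nn.Dense(window * features))
</pre>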
<p>Before training, we need to assign initial values to the network parameters (in this case, we are using the so-called <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.207.2059&amp;rep=rep1&amp;type=pdf">Xavier initialization</a>) and prepare a trainer object (here we are using the <a href="https://arxiv.org/pdf/1412.6980.pdf">Adam optimizer</a>):</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">model</code><code class="o">.</code><code class="n">collect_params</code><code class="p">()</code><code class="o">.</code><code class="n">initialize</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">init</code><code class="o">.</code><code class="n">Xavier</code><code class="p">(),</code> <code class="n">ctx</code><code class="o">=</code><code class="n">ctx</code><code class="p">)</code>
<code class="n">trainer</code> <code class="o">=</code> <code class="n">gluon</code><code class="o">.</code><code class="n">Trainer</code><code class="p">(</code><code class="n">model</code><code class="o">.</code><code class="n">collect_params</code><code class="p">(),</code> <code class="s1">'adam'</code><code class="p">,</code> <code class="p">{</code><code class="s1">'learning_rate'</code><code class="p">:</code> <code class="mf">0.001</code><code class="p">})</code>
</pre>
<p>Because we are interested in the loss as the difference between the output and the input (the reconstruction error), we can use <a href="https://mxnet.incubator.apache.org/api/python/gluon/loss.html#mxnet.gluon.loss.L2Loss">gluon.loss.L2Loss</a>, which calculates the mean squared error:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">L</code> <code class="o">=</code> <code class="n">gluon</code><code class="o">.</code><code class="n">loss</code><code class="o">.</code><code class="n">L2Loss</code><code class="p">()</code>
</pre>
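<p>One detail worth knowing: Gluon's L2Loss includes a factor of 1/2 and averages over the non-batch axes, so the reported numbers are half the plain MSE. A quick sanity check (our own illustration, not part of the original script):</p>
<pre data-type="programlisting" data-code-language="python">
from mxnet import nd

pred = nd.array([[1.0, 2.0]])
label = nd.array([[0.0, 0.0]])
# Squared differences are [1.0, 4.0]; 0.5 * mean([1.0, 4.0]) = 1.25
print(L(pred, label))  # [1.25]
</pre>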
<p>Finally, we prepare an evaluation method, which will check how well our model does after each epoch:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="k">def</code> <code class="nf">evaluate_accuracy</code><code class="p">(</code><code class="n">data_iterator</code><code class="p">,</code> <code class="n">model</code><code class="p">,</code> <code class="n">L</code><code class="p">):</code>
<code class="n">loss_avg</code> <code class="o">=</code> <code class="mf">0.</code>
<code class="k">for</code> <code class="n">i</code><code class="p">,</code> <code class="n">data</code> <code class="ow">in</code> <code class="nb">enumerate</code><code class="p">(</code><code class="n">data_iterator</code><code class="p">):</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">data</code><code class="o">.</code><code class="n">as_in_context</code><code class="p">(</code><code class="n">ctx</code><code class="p">)</code> <code class="c1"># Pass data to the CPU or GPU</code>
<code class="n">label</code> <code class="o">=</code> <code class="n">data</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">model</code><code class="p">(</code><code class="n">data</code><code class="p">)</code> <code class="c1"># Run batch through our network</code>
<code class="n">loss</code> <code class="o">=</code> <code class="n">L</code><code class="p">(</code><code class="n">output</code><code class="p">,</code> <code class="n">label</code><code class="p">)</code> <code class="c1"># Calculate the loss</code>
<code class="n">loss_avg</code> <code class="o">=</code> <code class="n">loss_avg</code><code class="o">*</code><code class="n">i</code><code class="o">/</code><code class="p">(</code><code class="n">i</code><code class="o">+</code><code class="mi">1</code><code class="p">)</code> <code class="o">+</code> <code class="n">nd</code><code class="o">.</code><code class="n">mean</code><code class="p">(</code><code class="n">loss</code><code class="p">)</code><code class="o">.</code><code class="n">asscalar</code><code class="p">()</code><code class="o">/</code><code class="p">(</code><code class="n">i</code><code class="o">+</code><code class="mi">1</code><code class="p">)</code>
<code class="k">return</code> <code class="n">loss_avg</code>
</pre>
<p>And train in a loop for a number of epochs:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">epochs</code> <code class="o">=</code> <code class="mi">50</code>
<code class="n">all_train_mse</code> <code class="o">=</code> <code class="p">[]</code>
<code class="n">all_test_mse</code> <code class="o">=</code> <code class="p">[]</code>
<code class="c1"># Gluon training loop</code>
<code class="k">for</code> <code class="n">e</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">epochs</code><code class="p">):</code>
<code class="k">for</code> <code class="n">i</code><code class="p">,</code> <code class="n">data</code> <code class="ow">in</code> <code class="nb">enumerate</code><code class="p">(</code><code class="n">train_data</code><code class="p">):</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">data</code><code class="o">.</code><code class="n">as_in_context</code><code class="p">(</code><code class="n">ctx</code><code class="p">)</code>
<code class="n">label</code> <code class="o">=</code> <code class="n">data</code>
<code class="k">with</code> <code class="n">autograd</code><code class="o">.</code><code class="n">record</code><code class="p">():</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">model</code><code class="p">(</code><code class="n">data</code><code class="p">)</code> <code class="c1">#Feed the data into our model</code>
<code class="n">loss</code> <code class="o">=</code> <code class="n">L</code><code class="p">(</code><code class="n">output</code><code class="p">,</code> <code class="n">label</code><code class="p">)</code> <code class="c1">#Compute the loss</code>
<code class="n">loss</code><code class="o">.</code><code class="n">backward</code><code class="p">()</code> <code class="c1">#Adjust parameters</code>
<code class="n">trainer</code><code class="o">.</code><code class="n">step</code><code class="p">(</code><code class="n">batch_size</code><code class="p">)</code>
<code class="n">train_mse</code> <code class="o">=</code> <code class="n">evaluate_accuracy</code><code class="p">(</code><code class="n">train_data</code><code class="p">,</code> <code class="n">model</code><code class="p">,</code> <code class="n">L</code><code class="p">)</code>
<code class="n">test_mse</code> <code class="o">=</code> <code class="n">evaluate_accuracy</code><code class="p">(</code><code class="n">test_data</code><code class="p">,</code> <code class="n">model</code><code class="p">,</code> <code class="n">L</code><code class="p">)</code>
<code class="n">all_train_mse</code><code class="o">.</code><code class="n">append</code><code class="p">(</code><code class="n">train_mse</code><code class="p">)</code>
<code class="n">all_test_mse</code><code class="o">.</code><code class="n">append</code><code class="p">(</code><code class="n">test_mse</code><code class="p">)</code>
</pre>
<p>Figure 5 shows how well the model fits both the training data and the validation data.</p>
<figure class="center" id="id-zxikA"><img alt="MSE results for training and validation data" src="https://d3ansictanv2wj.cloudfront.net/Figure5-cda522dfb5414acc6c93963cf95d8f98.png"><figcaption><span class="label">Figure 5. </span>MSE results for training and validation data. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>After fitting the model, we can feed new data to make predictions:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="k">def</code> <code class="nf">predict</code><code class="p">(</code><code class="n">to_predict</code><code class="p">,</code> <code class="n">L</code><code class="p">):</code>
<code class="n">predictions</code> <code class="o">=</code> <code class="p">[]</code>
<code class="k">for</code> <code class="n">i</code><code class="p">,</code> <code class="n">data</code> <code class="ow">in</code> <code class="nb">enumerate</code><code class="p">(</code><code class="n">to_predict</code><code class="p">):</code>
<code class="nb">input</code> <code class="o">=</code> <code class="n">data</code><code class="o">.</code><code class="n">as_in_context</code><code class="p">(</code><code class="n">ctx</code><code class="p">)</code>
<code class="n">out</code> <code class="o">=</code> <code class="n">model</code><code class="p">(</code><code class="nb">input</code><code class="p">)</code>
<code class="n">prediction</code> <code class="o">=</code> <code class="n">L</code><code class="p">(</code><code class="n">out</code><code class="p">,</code> <code class="nb">input</code><code class="p">)</code><code class="o">.</code><code class="n">asnumpy</code><code class="p">()</code><code class="o">.</code><code class="n">flatten</code><code class="p">()</code>
<code class="n">predictions</code> <code class="o">=</code> <code class="n">np</code><code class="o">.</code><code class="n">append</code><code class="p">(</code><code class="n">predictions</code><code class="p">,</code> <code class="n">prediction</code><code class="p">)</code>
<code class="k">return</code> <code class="n">predictions</code>
</pre>
<p>After calculating the MSE for all of our training data, we can set our threshold for anomalies:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">threshold</code> <code class="o">=</code> <code class="n">np</code><code class="o">.</code><code class="n">mean</code><code class="p">(</code><code class="n">errors</code><code class="p">)</code> <code class="o">+</code> <code class="mi">3</code><code class="o">*</code><code class="n">np</code><code class="o">.</code><code class="n">std</code><code class="p">(</code><code class="n">errors</code><code class="p">)</code>
</pre>
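<p>Applying the threshold is then a simple comparison. A minimal sketch (the <code>test_errors</code> name is ours, for illustration):</p>
<pre data-type="programlisting" data-code-language="python">
test_errors = predict(test_data, L)   # Reconstruction error for each test window
anomalies = test_errors > threshold   # Boolean mask marking anomalous windows
</pre>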
<p>Finally, we can run predictions on a test data set, which was, in this case, prepared by programming the robot engine to simulate failure (another option would be to use statistics to generate erroneous data). Figure 6 shows the resulting anomalies in red.</p>
<figure class="center" id="id-P9ix4"><img alt="Anomalies in test data set" src="https://d3ansictanv2wj.cloudfront.net/Figure6-d9ad364be6d88d0ec8baf0b1c3dc5fd9.png"><figcaption><span class="label">Figure 6. </span>Anomalies in the test data set. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>The robot was programmed so it would stutter around time 2,000 and just before 4,000, which the MLP diagnosed correctly.</p>
<p>As we can see, even though our training data set contained many scattered points between 0.1 and 0.2, our network figured out that several such readings in a row probably indicate a problem. It also correctly predicts that some readings with similar values around time 1,000 are non-anomalous, although it gets a few of them wrong. We might need to tweak the parameters (dropout, regularization, or split type) a bit to get a better model.</p>
<p>The necessity of windowing the data set can be demonstrated by running our script with a window size of 1 (see Figure 7):</p>
<figure class="center" id="id-2ril8"><img alt="Results of MLP without proper windowing" src="https://d3ansictanv2wj.cloudfront.net/Figure7-326d1c91ad762a7883d3a68aacb2a1da.png"><figcaption><span class="label">Figure 7. </span>Results of MLP without proper windowing. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>We clearly see that the network does not respect any temporal structure and simply overfits to the majority of our training data set, which lies approximately in the [-1, 1] range. All the scattered points outside that range are incorrectly marked as anomalies.</p>
<h2>LSTM</h2>
<p>Now that we have a basic working solution, let’s think about our problem a bit more. In the “data preparation” section of the MLP example, we mentioned that training record by record with an MLP does not persist any information between records, and that this might be an issue in this case. In time-series analysis, the time dependency is often of great importance, and our windowing strategy was a poor man’s solution to this shortcoming.</p>
<p>To overcome this shortcoming, we will look now into <a href="https://www.cs.toronto.edu/~hinton/csc2535/notes/lec10new.pdf">recurrent neural networks</a>. As in FF nets, they will be built using neurons with activation functions, but the main difference is that they will also contain cycles. This will be achieved by feeding into our network not only the observation, but also some state from the previous run of the network. Figure 8 shows the architecture of an RNN.</p>
<figure class="center" id="id-1ViEk"><img alt="Recurrent neural network" src="https://d3ansictanv2wj.cloudfront.net/Figure8-64698ab246ce02fdfadf033b9f2779d4.png"><figcaption><span class="label">Figure 8. </span>Recurrent neural network. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>The main issue with traditional RNNs is that, when using, for example, tanh or a similar activation layer in which output is always in the [-1,1] range, they encode a vast amount of information into a small output range. This makes learning long-term dependencies very challenging. For example, when building a model predicting the next word in a sentence, if the next word can be derived using only a few previous words, we will probably be fine. If, on the other hand, the context required is much broader (several sentences), we might not retain enough “long-term” information to be able to make a proper prediction.</p>
<p>For the math inclined, this is due to the <a href="https://arxiv.org/pdf/1211.5063.pdf">vanishing gradient and exploding gradient</a> problems. In the latter case, the weights in our network can start growing exponentially, possibly becoming more important than they should be. This problem can be addressed by simply truncating or squashing excessively large values. The former is the bigger problem: some (or all) of the weights shrink exponentially, sometimes becoming so small that the computation error on your machine renders them completely useless.</p>
<p>These problems led to the creation of so-called <em>long short-term memory networks</em> (LSTM). Like traditional RNNs, LSTMs have a chain-like structure and retain previous information, but at a steadier rate. Whereas vanilla RNNs have only one activation layer inside each cell, LSTMs use an input gate (which decides what new information should be passed to the network), a forget gate (which decides what information to forget and at what rate), and an output gate to calculate the new state, as shown in Figure 9.</p>
<figure class="center" id="id-nGive"><img alt="Cell in an LSTM network" src="https://d3ansictanv2wj.cloudfront.net/Figure9-23272a5735ea32f8d60889b4c426c1bc.png"><figcaption><span class="label">Figure 9. </span>Cell in an LSTM network. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>By setting the number of outputs to the same value as the number of inputs, we can again obtain an autoencoder—this time one that remembers previous state on its own.</p>
<p>In contrast to our MLP example, we will not do any upfront data preprocessing. We will split the data into training and validation sets again, though. Gluon provides a slightly different abstraction for data ingestion:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">split_factor</code> <code class="o">=</code> <code class="mf">0.8</code>
<code class="n">train</code> <code class="o">=</code> <code class="n">train_data_selected</code><code class="o">.</code><code class="n">astype</code><code class="p">(</code><code class="n">np</code><code class="o">.</code><code class="n">float32</code><code class="p">)[</code><code class="mi">0</code><code class="p">:</code><code class="nb">int</code><code class="p">(</code><code class="n">rows</code><code class="o">*</code><code class="n">split_factor</code><code class="p">)]</code>
<code class="n">validation</code> <code class="o">=</code> <code class="n">train_data_selected</code><code class="o">.</code><code class="n">astype</code><code class="p">(</code><code class="n">np</code><code class="o">.</code><code class="n">float32</code><code class="p">)[</code><code class="nb">int</code><code class="p">(</code><code class="n">rows</code><code class="o">*</code><code class="n">split_factor</code><code class="p">):]</code>
<code class="n">train_data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">DataLoader</code><code class="p">(</code><code class="n">train</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">shuffle</code><code class="o">=</code><code class="bp">False</code><code class="p">)</code>
<code class="n">validation_data</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">data</code><code class="o">.</code><code class="n">DataLoader</code><code class="p">(</code><code class="n">validation</code><code class="p">,</code> <code class="n">batch_size</code><code class="p">,</code> <code class="n">shuffle</code><code class="o">=</code><code class="bp">False</code><code class="p">)</code>
</pre>
<p>Wrapper classes make it easy to model even complex networks such as LSTM or, even better, sequences of LSTMs. The following code creates a sequence of stacked neural network blocks, initializes all the parameters using the Xavier initialization, generates an optimizer, and prepares a loss function:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="n">model</code> <code class="o">=</code> <code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">nn</code><code class="o">.</code><code class="n">Sequential</code><code class="p">()</code>
<code class="k">with</code> <code class="n">model</code><code class="o">.</code><code class="n">name_scope</code><code class="p">():</code>
<code class="n">model</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">rnn</code><code class="o">.</code><code class="n">LSTM</code><code class="p">(</code><code class="n">window</code><code class="p">,</code> <code class="n">dropout</code><code class="o">=</code><code class="mf">0.35</code><code class="p">))</code>
<code class="n">model</code><code class="o">.</code><code class="n">add</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">gluon</code><code class="o">.</code><code class="n">rnn</code><code class="o">.</code><code class="n">LSTM</code><code class="p">(</code><code class="n">features</code><code class="p">))</code>
<code class="c1"># Use the non default Xavier parameter initializer</code>
<code class="n">model</code><code class="o">.</code><code class="n">collect_params</code><code class="p">()</code><code class="o">.</code><code class="n">initialize</code><code class="p">(</code><code class="n">mx</code><code class="o">.</code><code class="n">init</code><code class="o">.</code><code class="n">Xavier</code><code class="p">(),</code> <code class="n">ctx</code><code class="o">=</code><code class="n">ctx</code><code class="p">)</code>
<code class="c1"># Use Adam optimizer for training</code>
<code class="n">trainer</code> <code class="o">=</code> <code class="n">gluon</code><code class="o">.</code><code class="n">Trainer</code><code class="p">(</code><code class="n">model</code><code class="o">.</code><code class="n">collect_params</code><code class="p">(),</code> <code class="s1">'adam'</code><code class="p">,</code> <code class="p">{</code><code class="s1">'learning_rate'</code><code class="p">:</code> <code class="mf">0.01</code><code class="p">})</code>
<code class="c1"># Similarly to previous example we will use L2 loss for evaluation</code>
<code class="n">L</code> <code class="o">=</code> <code class="n">gluon</code><code class="o">.</code><code class="n">loss</code><code class="o">.</code><code class="n">L2Loss</code><code class="p">()</code>
</pre>
<p>Training in Gluon is performed in a simple for loop that runs for the number of epochs specified in the <code>epochs</code> variable:</p>
<pre data-type="programlisting" data-code-language="python">
<code class="k">for</code> <code class="n">e</code> <code class="ow">in</code> <code class="nb">range</code><code class="p">(</code><code class="n">epochs</code><code class="p">):</code>
<code class="k">for</code> <code class="n">i</code><code class="p">,</code> <code class="n">data</code> <code class="ow">in</code> <code class="nb">enumerate</code><code class="p">(</code><code class="n">train_data</code><code class="p">):</code>
<code class="n">data</code> <code class="o">=</code> <code class="n">data</code><code class="o">.</code><code class="n">as_in_context</code><code class="p">(</code><code class="n">ctx</code><code class="p">)</code><code class="o">.</code><code class="n">reshape</code><code class="p">((</code><code class="o">-</code><code class="mi">1</code><code class="p">,</code><code class="n">features</code><code class="p">,</code><code class="mi">1</code><code class="p">))</code>
<code class="n">label</code> <code class="o">=</code> <code class="n">data</code>
<code class="k">with</code> <code class="n">autograd</code><code class="o">.</code><code class="n">record</code><code class="p">():</code>
<code class="n">output</code> <code class="o">=</code> <code class="n">model</code><code class="p">(</code><code class="n">data</code><code class="p">)</code>
<code class="n">loss</code> <code class="o">=</code> <code class="n">L</code><code class="p">(</code><code class="n">output</code><code class="p">,</code> <code class="n">label</code><code class="p">)</code>
<code class="n">loss</code><code class="o">.</code><code class="n">backward</code><code class="p">()</code>
<code class="n">trainer</code><code class="o">.</code><code class="n">step</code><code class="p">(</code><code class="n">batch_size</code><code class="p">)</code>
</pre>
<p>This code will, for each epoch, use our training data batch by batch to calculate outputs using the <code>model(data)</code> call, then calculate the loss and update all the parameters using the trainer object. MSE results are shown in Figure 10.</p>
<figure class="center" id="id-1ViEL"><img alt="How the first LSTM model fits training and validation data" src="https://d3ansictanv2wj.cloudfront.net/Figure10-9b3a08ea397f29822a9a77a1fa462c50.png"><figcaption><span class="label">Figure 10. </span>How the first LSTM model fits training and validation data. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>In this case, both MSE values converge to 0 very quickly. This might mean we are overfitting the model, and we should probably tweak some parts of our network design.</p>
<p>Using the same technique as in our previous example, we can obtain a threshold value and run predictions on our test set, which produce the results shown in Figure 11.</p>
<figure class="center" id="id-lWiAb"><img alt="Results of the first LSTM" src="https://d3ansictanv2wj.cloudfront.net/Figure11-73af0f1ce918ba094595cccb5dce76fb.png"><figcaption><span class="label">Figure 11. </span>Results of the first LSTM. Figure by Mateusz Dymczyk.</figcaption></figure>
<p>This time, the network does not flag every reading of the erroneous cycles as an anomaly, but it gets enough of them right for us to detect a problem with the machine.</p>
<h2>Further improvements</h2>
<p>Even though we were able to create seemingly useful models in this tutorial, there are still several important aspects we did not have time to cover:</p>
<ol>
<li>Data preparation. Depending on your data set, you might need to explore additional data preparation steps. For example, it is often a good idea to normalize your time-series data for neural networks. In other cases, you will need to encode your features—for example when you have categorical features.</li>
<li>Network architecture optimization. Different use cases will require a different number of hidden layers, activation functions, regularizations, optimizers, dropout layers, etc.</li>
<li>If you go through the <a href="https://mxnet.incubator.apache.org/api/python/gluon.html#mxnet.gluon.rnn.LSTM">LSTM Gluon doc</a> and the LSTM diagram again, you will notice that each sample in an LSTM problem can be defined as a sequence of observations. As in the MLP example, we could use our windowing method so that every observation contains several readings, and feed that into our LSTM network (see the sketch after this list). The network could then make more use of the temporal dependency between readings, while the input to the network would still be only of size <em>features</em>.</li>
<li>Different train/validation split strategies and automated model evaluation. We evaluated our model only by plotting and checking a sample test set. In the real world, you will want to prepare labeled test data sets (for example, using statistics) and use metrics such as <a href="https://www.cs.cornell.edu/courses/cs578/2003fa/performance_measures.pdf">precision and recall</a> to see, in an automated fashion, how well your model is performing.</li>
</ol>
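<p>As a rough illustration of the windowing idea from point 3, here is one way to turn a raw series into overlapping fixed-size windows. The function name and the NumPy-based approach are our own sketch, not code from the original script:</p>
<pre data-type="programlisting" data-code-language="python">
import numpy as np

def make_windows(series, window):
    # Slide a fixed-size window over the series; each row becomes one training sample
    return np.array([series[i:i + window] for i in range(len(series) - window + 1)])

# Example: a series of 6 readings with window size 3 yields 4 overlapping samples
print(make_windows(np.arange(6, dtype=np.float32), 3).shape)  # (4, 3)
</pre>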
<h2>Conclusions</h2>
<p>In this tutorial, we tackled the problem of anomaly detection in time-series IoT data. As we have seen, anomaly detection is a very broad problem, where different use cases require different techniques for both data preparation and modeling. We explored two robust approaches, feed-forward neural networks and long short-term memory networks, each having advantages and disadvantages. FFNs are faster (5.5 sec for 50 epochs vs. 33 sec for 25 epochs on an NVidia 1080 GPU) but require a bit more planning and preparation upfront, whereas LSTMs are a bit slower but also smarter. Deep neural networks prove to be really good at finding structure and dependencies, but in return they require at least basic knowledge of how to structure them. Finally, we saw the importance of frameworks such as Apache MXNet in making this difficult task more approachable.</p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/anomaly-detection-with-apache-mxnet'>Anomaly detection with Apache MXNet.</a></p>Mateusz Dymczykhttps://www.oreilly.com/ideas/anomaly-detection-with-apache-mxnetTensorFlow brings AI to the connected device2018-01-24T12:00:00Ztag:www.oreilly.com,2018-01-24:/ideas/tensorflow-brings-ai-to-the-connected-device<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/system-2457651_1920_crop-d22414539d6effd9a376b4c88cd0751a.jpg'/></p><p><em>TensorFlow Lite enriches the mobile experience.</em></p><p>Consumers are beginning to expect more AI-driven interactions with their devices, whether they are interacting with smart assistants or expecting more tailored content within an application. However, when considering the landscape of available AI-focused applications, the list is significantly biased toward manufacturers. So, how can a third-party app developer provide an experience that is similar in performance and interactivity to built-in AIs like Siri or Google Assistant?</p>
<p>This is why the release of TensorFlow Lite is so significant. TensorFlow released <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite">TensorFlow Lite</a> this past November as an evolution of TensorFlow Mobile. TensorFlow Lite aims to alleviate some of the barriers to successfully implementing machine learning on a mobile device for third-party developers. The out-of-the-box version of TensorFlow can certainly develop models that are leveraged by mobile applications, but depending on the model’s complexity and the size of the input data, the actual computation with current versions may be more efficient off of the device.</p>
<p>As personalized interactions become more ubiquitous, however, some of the computation and inference needs to reside closer to the device itself. Additionally, by enabling more of the computation to take place locally, security and offline interactions are improved because data no longer needs to be transferred to and from the device.</p>
<p>We can do this by taking an existing TensorFlow model and running it through the TensorFlow Lite converter. TensorFlow Lite is designed to take existing trained models, which were developed on less constrained hardware, and convert them into a mobile-optimized version. We still get all of the flexibility and scale of training models on scalable infrastructure, but once we have that model, we are no longer bound to using that same hardware to deploy the application.</p>
<figure class="center" id="id-RlWiz"><img alt="tensorflow lite architecture" src="https://d3ansictanv2wj.cloudfront.net/image1-a111b1de2750c77c43e2e7d63cab18db.png"><figcaption><span class="label">Figure 1. </span>Image <a href="https://www.tensorflow.org/images/tflite-architecture.jpg">courtesy of TensorFlow</a>.</figcaption></figure>
<p>As a concrete example, let’s say we are building an application that considers whether an object is or is not a hotdog (borrowed from HBO's <em>Silicon Valley</em> show). Image classification techniques are fairly well understood, but the underlying data is often fairly sizable, especially with the resolution of current mobile cameras. This leaves us in a bit of a pickle (not hotdog). We know that consumer expectations are for low latency, high accuracy, and minimal impact on their data plan. So, what do we do?</p>
<p>In the days before TensorFlow Lite (or its earlier iterations), we might have had to consider sending these images to the cloud so they could be processed and scored on scalable hardware. This almost certainly would mean failing to meet our consumers' expectations. There would probably be a noticeable delay between submitting the photo through the app and receiving the response, and while the accuracy of the model should not be impacted, we would definitely be impacting the consumer's data plan by sending the captured image off of the device.</p>
<p>So, in order to provide a better and more efficient experience, we could either come up with a <a href="https://en.wikipedia.org/wiki/Weissman_score">Weissman score</a>-defying compression algorithm that allows us to reduce the size of the input data for transfer to the cloud, or we could convert the existing model with TensorFlow Lite and do the computation locally.</p>
<p>If we were to pursue the second option, it might look something like this:</p>
<ol>
<li>Convince a group of graduate students to classify images by hand, or use a mechanical turk service.</li>
<li>Train the classifier in the cloud where resources can scale to meet the needs of the model.</li>
<li>Convert the trained model with TensorFlow Lite to enable better performance and computation on a mobile device.</li>
</ol>
<p>The model format for TensorFlow Lite, tflite, has been developed to enable this use case through optimizations and enhancements to the format of the model itself, better memory performance for inference and calculation, and an interface for interacting directly with the mobile device’s hardware when available.</p>
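<p>On the device, the converted model is loaded by a lightweight interpreter. The following sketch uses the Python <code>tf.lite.Interpreter</code> bindings for illustration; on Android or iOS the equivalent platform interpreter APIs are used, and the placeholder input stands in for real camera pixels:</p>
<pre data-type="programlisting" data-code-language="python">
import numpy as np
import tensorflow as tf

# Load the converted model and allocate its tensors once, up front
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Placeholder input matching the model's expected shape and dtype
image = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], image)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]['index'])  # e.g., hotdog / not-hotdog scores
</pre>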
<p>The release of TensorFlow Lite is a key development in the adoption of AI into the mobile experience. By optimizing existing TensorFlow models for mobile hardware and operating systems, third-party developers can begin to incorporate more sophisticated AI directly on the device. I am definitely looking forward to seeing these benefits take shape with more augmented reality and natural language applications.</p>
<p><em>This post is a collaboration between O'Reilly and </em><a href="https://www.tensorflow.org/"><em>TensorFlow</em></a><em>. </em><a href="http://www.oreilly.com/about/editorial_independence.html"><em>See our statement of editorial independence</em></a><em>.</em></p>
<p>Continue reading <a href='https://www.oreilly.com/ideas/tensorflow-brings-ai-to-the-connected-device'>TensorFlow brings AI to the connected device.</a></p>Mitchell Weisshttps://www.oreilly.com/ideas/tensorflow-brings-ai-to-the-connected-deviceFour short links: 24 January 20182018-01-24T11:05:00Ztag:www.oreilly.com,2018-01-24:/ideas/four-short-links-24-january-2018<p><em>ActivityPub Approved, Real-Time Compression, Beating Toxic Work, and Javascript Verification</em></p><ol>
<li>
<a href="https://www.w3.org/TR/2018/REC-activitypub-20180123/">ActivityPub is a W3C Recommendation</a> -- <i>a decentralized social networking protocol based upon the [ActivityStreams] 2.0 data format. It provides a client-to-server API for creating, updating and deleting content, as well as a federated server-to-server API for delivering notifications and content.</i>
</li>
<li>
<a href="http://facebook.github.io/zstd/">Zstd</a> -- <i>a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-off, while being backed by a very fast decoder. It also offers a special mode for small data, called dictionary compression, and can create dictionaries from any sample set. Zstandard library is provided as open source software using a BSD license.</i> (via <a href="https://twitter.com/jedisct1/status/954707388069810178">Frank Denis</a>)</li>
<li>
<a href="https://medium.com/girl-geek-x/4-tips-for-self-care-from-tech-workers-c55f1747fbba">Tips for Self-Care from Tech Workers</a> (Angie Chang) -- <i>Quit if you are in a toxic work environment.</i> (via <a href="https://twitter.com/MikeStok/status/955989787608891392">Mike Stok</a>)</li>
<li>
<a href="https://www.doc.ic.ac.uk/~jfaustin/javert.pdf">JaVerT: The JavaScript Verification Toolchain</a> -- <i>Using JaVerT, we verify functional correctness properties of data-structure libraries (key-value map, priority queue) written in object-oriented style; operations on data structures such as binary search trees (BSTs) and lists; examples illustrating function closures; and test cases from the official ECMAScript test suite. The verification times suggest that reasoning about larger, more complex code using JaVerT is feasible.</i> (via <a href="https://blog.acolyer.org/2018/01/19/javert-javascript-verification-toolchain/">Adrian Colyer</a>)</li>
</ol>
<p>Continue reading <a href='https://www.oreilly.com/ideas/four-short-links-24-january-2018'>Four short links: 24 January 2018.</a></p>Nat Torkingtonhttps://www.oreilly.com/ideas/four-short-links-24-january-2018