Deep learning cases - Founders Institute/Moscow - 2017.10.19

2.
AI/ML/DL
● Artificial Intelligence (AI) is a broad field of
study dedicated to complex problem solving.
● Machine Learning (ML) is usually considered
a subfield of AI. ML is a data-driven
approach focused on creating algorithms that
have the ability to learn from data without
being explicitly programmed.
● Deep Learning (DL) is a subfield of ML focused
on deep neural networks (NN) able to
automatically learn hierarchical
representations.

31.
Encoding semantics
Using word2vec instead of word indexes allows you to better deal with the word
meanings (e.g. no need to enumerate all synonyms because their vectors are
already close to each other).
But the naive way to work with word2vec vectors still gives you a “bag of words”
model, where phrases “The man killed the tiger” and “The tiger killed the man” are
equal.
We need models that pay attention to word ordering: paragraph2vec, sentence
embeddings (using RNNs/LSTMs), even World2Vec (LeCun @CVPR2015).
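A minimal sketch of both points, assuming gensim and its downloadable pretrained
Google News vectors are available (the sentences are illustrative):

# Illustrates both points above: synonyms land close together in vector space,
# but naive averaging of word vectors is order-blind ("bag of words").
import numpy as np
import gensim.downloader as api

model = api.load("word2vec-google-news-300")  # pretrained vectors (large download)

# No need to enumerate synonyms: their vectors are already close.
print(model.similarity("car", "automobile"))  # high cosine similarity

def sentence_vector(sentence):
    """Average the word vectors: simple, but order-blind."""
    words = [w for w in sentence.lower().split() if w in model]
    return np.mean([model[w] for w in words], axis=0)

v1 = sentence_vector("The man killed the tiger")
v2 = sentence_vector("The tiger killed the man")
print(np.allclose(v1, v2))  # True: averaging cannot distinguish word order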

41.
Case: Sentiment analysis
https://blog.openai.com/unsupervised-sentiment-neuron/
“Our research implies that simply training large unsupervised next-step-prediction
models on large amounts of data may be a good approach to use when creating
systems with good representation learning capabilities.”
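A toy sketch of that recipe (hedged: OpenAI actually trained a large multiplicative
LSTM on Amazon reviews; here a plain LSTM and placeholder data stand in): train an
unsupervised next-step prediction model, then reuse its hidden state as features
for a small supervised sentiment classifier.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, HIDDEN, SEQ_LEN = 256, 128, 64  # bytes as tokens; sizes are illustrative

# 1) Unsupervised language model: predict the next byte at every position.
inputs = layers.Input(shape=(SEQ_LEN,))
x = layers.Embedding(VOCAB, 32)(inputs)
h = layers.LSTM(HIDDEN, return_sequences=True)(x)
next_char = layers.Dense(VOCAB, activation="softmax")(h)
lm = tf.keras.Model(inputs, next_char)
lm.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# lm.fit(text_batches, shifted_text_batches, ...)  # large unlabeled corpus here

# 2) Feature extractor: the LSTM's final hidden state summarizes the text.
encoder = tf.keras.Model(inputs, layers.Lambda(lambda t: t[:, -1, :])(h))

# 3) Small supervised head on top of those unsupervised features.
texts = np.random.randint(0, VOCAB, size=(100, SEQ_LEN))  # placeholder data
labels = np.random.randint(0, 2, size=(100,))             # placeholder labels
features = encoder.predict(texts)
clf = tf.keras.Sequential([layers.Dense(1, activation="sigmoid")])
clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(features, labels, epochs=3)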

44.
Speech Recognition: Word Error Rate (WER)
“Google now has just an 8 percent error rate. Compare that to 23 percent in
2013” (2015)
http://venturebeat.com/2015/05/28/google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/
IBM Watson. “The performance of our new system – an 8% word error rate – is
36% better than previously reported external results.” (2015)
https://developer.ibm.com/watson/blog/2015/05/26/ibm-watson-announces-breakthrough-in-conversational-speech-transcription/
Baidu. “We are able to reduce error rates of our previous end-to-end system in
English by up to 43%, and can also recognize Mandarin speech with high
accuracy. Creating high-performing recognizers for two very different languages,
English and Mandarin, required essentially no expert knowledge of the
languages” (2015)
http://arxiv.org/abs/1512.02595
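For context, WER is the word-level edit (Levenshtein) distance between the
recognizer's output and a reference transcript, normalized by the reference
length: WER = (substitutions + deletions + insertions) / N. A minimal
self-contained implementation:

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("dog" for "cat") in a six-word reference: WER = 1/6 ≈ 16.7%.
print(wer("the cat sat on the mat", "the dog sat on the mat"))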

45.
Example: Baidu Deep Speech 2 (2015)
● “The Deep Speech 2 ASR pipeline approaches or exceeds the accuracy of Amazon Mechanical
Turk human workers on several benchmarks, works in multiple languages with little modification,
and is deployable in a production setting.”
● “Table 13 shows that the DS2 system outperforms humans in 3 out of the 4 test sets and is
competitive on the fourth. Given this result, we suspect that there is little room for a generic speech
system to further improve on clean read speech without further domain adaptation”
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, http://arxiv.org/abs/1512.02595

59.
AlphaGo in datacenters
“We’ve managed to reduce the amount of energy we use for cooling by up to 40 percent.”
https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/

60.
Drone control
http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/
This drone can automatically follow forest
trails to track down lost hikers

61.
Car control
Meet the 26-Year-Old Hacker Who Built a
Self-Driving Car... in His Garage
https://www.youtube.com/watch?v=KTrgRYa2wbI

62.
Car driving
https://www.youtube.com/watch?v=YuyT2SDcYrU
“Actually a “Perception to Action” system. The visual perception and control
system is a Deep learning architecture trained end to end to transform pixels
from the cameras into steering angles. And this car uses regular color cameras,
not LIDARS like the Google cars. It is watching the driver and learns.”

63.
Example: Sensorimotor Deep Learning
“In this project we aim to develop deep learning techniques that can be deployed
on a robot to allow it to learn directly from trial-and-error, where the only
information provided by the teacher is the degree to which it is succeeding at the
current task.”
http://rll.berkeley.edu/deeplearningrobotics/

66.
DL/Transfer of Ideas
Methods developed for one modality are successfully transferred to another:
● Convolutional Neural Networks, CNNs (originally developed for image
recognition) work well on texts, speech and some time-series signals (e.g.
ECG).
● Recurrent Neural Networks, RNNs (mostly used on language and other
sequential data) seem to work on images.
If the technologies successfully transfer from one modality to another (for
example, images to texts for CNNs), then the ideas that worked in one domain
will probably work in another (e.g. style transfer for images could be
transferred to texts). A sketch of a CNN applied to text follows below.
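A minimal sketch of the first transfer above, in Keras with placeholder data
(hyperparameters are illustrative): a 1D convolution, the same operation CNNs
slide over images, slid over a token sequence instead.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, SEQ_LEN, NUM_CLASSES = 10000, 200, 2

model = tf.keras.Sequential([
    layers.Embedding(VOCAB, 64, input_length=SEQ_LEN),
    layers.Conv1D(128, kernel_size=5, activation="relu"),  # conv idea from images, in 1D
    layers.GlobalMaxPooling1D(),                           # pool over the sequence
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder data: integer-encoded token sequences.
x = np.random.randint(0, VOCAB, size=(32, SEQ_LEN))
y = np.random.randint(0, NUM_CLASSES, size=(32,))
model.fit(x, y, epochs=1)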

67.
Why is Deep Learning helpful? Or even a game-changer?
● It works on raw data (pixels, sound, text or characters): no need for feature
engineering
○ Some features are really hard to develop (they require years of work
from a group of experts)
○ Some features are patented (e.g. SIFT, SURF for images)
● It allows end-to-end learning (pixels to category, sound to sentence, English
sentence to Chinese sentence, etc.); see the sketch after this list
○ No need to do segmentation, etc. (a lot of manual labor)
⇒ You can iterate faster (and get superior quality at the same time!)
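A minimal illustration of the end-to-end point, in Keras with MNIST: raw pixels
in, digit categories out, with no hand-crafted features or segmentation step in
between (a sketch, not a tuned model).

import tensorflow as tf
from tensorflow.keras import layers

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # raw pixels, only rescaled to [0, 1]

model = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),  # pixels-to-category, one model
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)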

68.
Still some issues exist: Datasets
● No dataset -- no deep learning.
There is a lot of data available (and it's required for deep learning;
otherwise simpler models could be better)
○ But sometimes you have no dataset…
■ Nonetheless, some hacks are available: transfer learning (sketched
below), data augmentation, Mechanical Turk, …
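One of those hacks sketched in Keras: transfer learning with a frozen
ImageNet-pretrained base (VGG16 here as an illustrative choice) plus a small
trainable head; random placeholder data stands in for your few labeled examples.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(96, 96, 3), pooling="avg")
base.trainable = False  # freeze pretrained features; only the head is trained

model = tf.keras.Sequential([
    base,
    layers.Dense(2, activation="softmax"),  # your small task-specific head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# A tiny labeled set often suffices once the features come pretrained
# (in practice, also apply tf.keras.applications.vgg16.preprocess_input).
x = np.random.rand(16, 96, 96, 3).astype("float32")
y = np.random.randint(0, 2, size=(16,))
model.fit(x, y, epochs=1)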

70.
Still some issues exist: Computing power
● Deep learning requires a lot of computation. Without a cluster or GPU
machines, much more time is required.
● Currently GPUs (mostly NVIDIA) are the only realistic choice.
● FPGAs/ASICs are expected to come into this field (Google TPU gen. 2, Intel
2017+). The situation resembles the path of Bitcoin mining.
● Neuromorphic computing is on the rise (IBM TrueNorth, memristors, etc.)
● Quantum computing can benefit machine learning as well (but it probably
won't be a desktop or in-house server solution)

74.
Still some issues exist: Reasoning
Deep learning is mainly about perception, but there is a lot of inference involved in
everyday human reasoning.
● Neural networks lack common sense
● Cannot find information by inference
● Cannot explain the answer
○ This could be a must-have requirement in some areas, e.g. law or medicine.

75.
Still some issues exist: Reasoning
The most fruitful approach is likely to be a hybrid neural-symbolic system. Topic of
active research right now.
And it seems all major players are already going this way (Watson, Siri, Cyc, …).
There is a lot of knowledge available (or extractable) in the world. Large
knowledge bases about the real world (Cyc/OpenCyc, FreeBase, Wikipedia,
schema.org, RDF, ..., scientific journals + text mining, …)

87.
Mobile AI: Apple
(Sep 23, 2017) Inside iPhone 8: Apple's A11 Bionic
introduces 5 new custom silicon engines
“Creating an entirely new GPU architecture "wasn't
innovative enough," so A11 Bionic also features an entirely
new Neural Engine within its Image Signal Processor, tuned to solve very specific
problems such as matching, analyzing and calculating thousands of reference
points within a flood of image data rushing from the camera sensor.
Those tasks could be sent to the GPU, but having logic optimized specifically for
matrix multiplications and floating-point processing allows the Neural Engine to
excel at those tasks.”
http://appleinsider.com/articles/17/09/23/inside-iphone-8-apples-a11-bionic-introduces-5-new-custom-silicon-engines

88.
Mobile AI: Qualcomm
(Aug 16, 2017) We are making on-device AI ubiquitous
“In fact, the Hexagon DSP with Qualcomm Hexagon Vector
eXtensions on Snapdragon 835 has been shown to offer a
25X improvement in energy efficiency and an 8X
improvement in performance when compared against running the same
workloads (GoogleNet Inception Network) on the Qualcomm Kryo CPU.
We have introduced the Snapdragon Neural Processing Engine (NPE) Software
Developer Kit (SDK). This features an accelerated runtime for on-device execution
of convolutional neural networks (CNN) and recurrent neural networks (RNN) —
which are great for tasks like image recognition and natural language processing,
respectively”
https://www.qualcomm.com/news/onq/2017/08/16/we-are-making-device-ai-ubiquitous

89.
FPGA/ASIC
● FPGA (field-programmable gate array) is an integrated circuit designed to be
configured by a customer or a designer after manufacturing
● ASIC (application-specific integrated circuit) is an integrated circuit customized
for a particular use, rather than intended for general-purpose use.
● Both FPGAs and ASICs are usually much more energy-efficient than general
purpose processors (so more productive with respect to GFLOPS per Watt).
● OpenCL can be the development language for FPGAs, and more ML/DL
libraries support OpenCL too (for example, Caffe). So an easy way to do ML
on FPGAs should appear (see the sketch after this list).
● Bitcoin mining is another heavy-lifting task which passed the way from CPU
through GPU to FPGA and finally ASICs. The history could repeat itself with
deep learning.
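For a flavor of that OpenCL route, a toy pyopencl example (assuming pyopencl is
installed): the same kernel source can target CPUs and GPUs, and with vendor
toolchains FPGAs as well. This is vector addition, not a deep learning workload.

import numpy as np
import pyopencl as cl

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

program = cl.Program(ctx, """
__kernel void add(__global const float *a, __global const float *b,
                  __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
""").build()

program.add(queue, a.shape, None, a_buf, b_buf, out_buf)
out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)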

90.
FPGA/ASIC custom chips
There is a lot of movement to FPGA/ASIC right now:
● Mobileye chips with specially developed ASIC cores are used in BMW, Tesla, Volvo, etc.
● Microsoft develops Project Catapult that uses clusters of FPGAs
https://blogs.msdn.microsoft.com/msr_er/2015/11/12/project-catapult-servers-available-to-academic-researchers/
● Baidu tries to use FPGAs for DL
http://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-12-day2-epub/HC26.12-5-FPGAs-epub/HC26.12.545-Soft-Def-Acc-Ouyang-baidu-v3--baidu-v4.pdf
● Altera (one of the FPGA giants) was acquired by Intel in 2015. Intel is working on a
hybrid Xeon+FPGA chip
http://www.nextplatform.com/2016/03/14/intel-marrying-fpga-beefy-broadwell-open-compute-future/
● Nervana plans to make a special chip to make machine learning faster (acquired by Intel)
http://www.eetimes.com/document.asp?doc_id=1328523&
● Movidius (acquired by Intel) Myriad X VPU - a dedicated hardware accelerator for deep
neural network inferences.
https://www.movidius.com/myriadx

96.
ASIC: Intel Knights Mill
(Aug 24, 2017) Intel Spills Details on Knights Mill
Processor
Knights Mill, a Xeon Phi processor tweaked for machine
learning applications.
Knights Mill represents the chipmaker’s first Xeon Phi offering aimed exclusively
at the machine learning market, specifically for the training of deep neural
networks. For the inferencing side of deep learning, Intel points to its
Altera-based FPGA products, which are being used extensively by Microsoft in its
Azure cloud.
Knights Mill is scheduled for launch in Q4 of this year.
https://www.top500.org/news/intel-spills-details-on-knights-mill-processor/

99.
Neuromorphic chips: Snapdragon 820
Over the years, Qualcomm's primary focus has been making mobile
processors for smartphones and tablets. But the company is now trying
to expand into other areas, including making chips for automobiles and
robots as well. The company is also marketing the Kryo as its
neuromorphic, cognitive computing platform Zeroth.
http://www.extremetech.com/computing/200090-qualcomms-cognitive-compute-processors-are-coming-to-snapdragon-820

100.
Neuromorphic chips: IBM TrueNorth
● 1M neurons, 256M synapses, 4096 neurosynaptic
cores on a chip, est. 46B synaptic ops per sec per W
● Uses 70mW; its power density of 20 milliwatts per
cm^2 is almost 1/10,000th that of most modern
microprocessors
● “Our sights are now set high on the ambitious goal of
integrating 4,096 chips in a single rack with 4B neurons and 1T synapses while
consuming ~4kW of power”.
● Currently IBM is making plans to commercialize it.
● (2016) Lawrence Livermore National Lab got a cluster of 16 TrueNorth chips
(16M neurons, 4B synapses, for context, the human brain has 86B neurons).
When running flat out, the entire cluster will consume a grand total of 2.5 watts.
http://spectrum.ieee.org/tech-talk/computing/hardware/ibms-braininspired-computer-chip-comes-from-the-future

102.
Neuromorphic chips: Intel Loihi
(Sep 25, 2017) As part of an effort within Intel Labs, Intel has
developed a first-of-its-kind self-learning neuromorphic chip –
codenamed Loihi – that mimics how the brain functions by
learning to operate based on various modes of feedback from the
environment. This extremely energy-efficient chip, which uses the
data to learn and make inferences, gets smarter over time and
does not need to be trained in the traditional way. It takes a novel
approach to computing via asynchronous spiking.
It is up to 1,000 times more energy-efficient than general purpose
computing required for typical training systems.
In the first half of 2018, the Loihi test chip will be shared with
leading university and research institutions with a focus on
advancing AI.
https://newsroom.intel.com/editorials/intels-new-self-learning-chip-promises-accelerate-artificial-intelligence/

103.
Neuromorphic chips: Intel Loihi
● Fully asynchronous neuromorphic many core mesh that
supports a wide range of sparse, hierarchical and recurrent
neural network topologies
● Each neuromorphic core includes a learning engine that can
be programmed to adapt network parameters during
operation, supporting supervised, unsupervised,
reinforcement and other learning paradigms.
● Fabrication on Intel’s 14 nm process technology.
● A total of 130,000 neurons and 130 million synapses.
● Development and testing of several algorithms with high
algorithmic efficiency for problems including path planning,
constraint satisfaction, sparse coding, dictionary learning,
and dynamic pattern learning and adaptation.
https://newsroom.intel.com/editorials/intels-new-self-learning-chip-promises-accelerate-artificial-intelligence/

104.
Memristors
● Neuromorphic chips generally use the same silicon transistors and digital
circuits that make up ordinary computer processors. There is another way to
build brain-inspired chips.
https://www.technologyreview.com/s/537211/a-better-way-to-build-brain-inspired-chips/
● Memristors (memory resistors) are exotic electronic devices only confirmed to
exist in 2008. A memristor's electrical resistance is not constant but depends
on the history of current that has previously flowed through the device, i.e.
the device remembers its history. An analog memory device.
● Some startups are trying to make special chips for low-power machine learning,
e.g. Knowm
http://www.forbes.com/sites/alexknapp/2015/09/09/this-startup-has-a-brain-inspired-chip-for-machine-learning/#5007095d51a2
http://www.eetimes.com/document.asp?doc_id=1327068

107.
Quantum Computing: D-Wave
● (May 2013)
“We’ve already developed some quantum machine learning algorithms. One
produces very compact, efficient recognizers -- very useful when you’re short
on power, as on a mobile device. Another can handle highly polluted training
data, where a high percentage of the examples are mislabeled, as they often
are in the real world. And we’ve learned some useful principles: e.g., you get
the best results not with pure quantum computing, but by mixing quantum and
classical computing.”
https://research.googleblog.com/2013/05/launching-quantum-artificial.html

108.
Quantum Computing: D-Wave
● (Jun 2014) Yet results on the D-Wave 2 computer seem controversial:
“Using random spin glass instances as a benchmark, we find no evidence of
quantum speedup when the entire data set is considered, and obtain
inconclusive results when comparing subsets of instances on an
instance-by-instance basis. Our results do not rule out the possibility of
speedup for other classes of problems and illustrate the subtle nature of the
quantum speedup question.”
http://science.sciencemag.org/content/early/2014/06/18/science.1252319

109.
Quantum Computing: D-Wave
● (Dec 2015)
“We found that for problem instances involving nearly 1000 binary variables,
quantum annealing significantly outperforms its classical counterpart, simulated
annealing. It is more than 10^8 times faster than simulated annealing running on
a single core. We also compared the quantum hardware to another algorithm
called Quantum Monte Carlo. This is a method designed to emulate the behavior
of quantum systems, but it runs on conventional processors. While the scaling
with size between these two methods is comparable, they are again separated
by a large factor sometimes as high as 10^8.”
https://research.googleblog.com/2015/12/when-can-quantum-annealing-win.html

110.
Quantum Computing: Google
● (Jul 2016)
“We have performed the first completely scalable quantum simulation of a
molecule
…
In our experiment, we focus on an approach known as the variational quantum
eigensolver (VQE), which can be understood as a quantum analog of a
neural network. The quantum advantage of VQE is that quantum bits can
efficiently represent the molecular wavefunction whereas exponentially many
classical bits would be required. Using VQE, we quantum computed the energy
landscape of molecular hydrogen, H2.”
https://research.googleblog.com/2016/07/towards-exact-quantum-description-of.html

111.
Quantum Computing: Google
(May 2017) Google Plans to Demonstrate the Supremacy
of Quantum Computing
“Google’s quantum computing chip is a 2-by-3 array of qubits.
The company hopes to make a 7-by-7 array later this year.
By the end of this year, the team aims to increase the number of superconducting
qubits it builds on integrated circuits to create a 7-by-7 array. With this quantum IC,
the Google researchers aim to perform operations at the edge of what’s possible with
even the best supercomputers, and so demonstrate “quantum supremacy.””
https://spectrum.ieee.org/computing/hardware/google-plans-to-demonstrate-the-supremacy-of-quantum-computing

112.
Quantum Computing: IBM
(Sep 13, 2017) IBM Makes Breakthrough in Race to
Commercialize Quantum Computers
“IBM has been pushing to commercialize quantum computers and
recently began allowing anyone to experiment with running
calculations on a 16-qubit quantum computer it has built to
demonstrate the technology.”
https://www.bloomberg.com/news/articles/2017-09-13/ibm-makes-breakthrough-in-race-to-commercialize-quantum-computers
“IBM announced on May 17, 2017 that it has successfully built and tested its most powerful universal quantum
computing processors. Its upgraded 16 qubit processor (pictured) will be available for use by developers,
researchers, and programmers to explore quantum computing using a real quantum processor at no cost via
the IBM Cloud. IBM first opened public access to its quantum processors one year ago, to serve as an
enablement tool for scientific research, a resource for university classrooms, and a catalyst of enthusiasm for
the field. To date users have run more than 300,000 quantum experiments on the IBM Cloud”
https://phys.org/news/2017-05-ibm-powerful-universal-quantum-processors.html

113.
Quantum Computing: Intel
(Oct 10, 2017) Quantum Inside: Intel Manufactures
an Exotic New Chip
“Intel’s quantum chip uses superconducting qubits.
The approach builds on an existing electrical circuit
design but uses a fundamentally different electronic phenomenon that only works at
very low temperatures. The chip, which can handle 17 qubits, was developed over
the past 18 months by researchers at a lab in Oregon and is being manufactured at
an Intel facility in Arizona.”
https://www.technologyreview.com/s/609094/quantum-inside-intel-manufactures-an-exotic-new-chip/
https://newsroom.intel.com/news/intel-delivers-17-qubit-superconducting-chip-advanced-packaging-qutech/

114.
Quantum Computing
● Quantum computers can provide significant speedups for many problems in
machine learning (training of classical Boltzmann machines, Quantum Bayesian
inference, SVM, PCA, Linear algebra, etc) and can enable fundamentally
different types of learning.
https://www.youtube.com/watch?v=ETJcALOplOA
● The three known types of quantum computing:
○ Universal Quantum: offers the potential to be exponentially faster than traditional computers for
a number of important applications: machine learning, cryptography, material science, etc. The
hardest to build. Current estimates: >100,000 physical qubits.
○ Analog Quantum: will be able to simulate complex quantum interactions that are intractable for
any known conventional machine: quantum chemistry, quantum dynamics, etc. Could happen
within the next 5 years. It is conjectured that it will contain 50-100 physical qubits.
○ Quantum Annealer: a very specialized form of quantum computing, suited for optimization
problems. The easiest to build. Has no known advantages over conventional computing.
http://www.research.ibm.com/quantum/expertise.html

115.
Hardware: Summary
● Ordinary CPUs are general-purpose and not as effective as they could be.
● GPUs are becoming more and more powerful each year (but still consume a
lot of power).
● ASICs/FPGAs are on the rise. We've already seen some and will probably
see even more interesting announcements this year.
● Neuromorphic chips etc. are probably much farther from the market (3-5
years?) while already showing interesting results.
● Memristors are probably even farther out, but keep an eye on them.
● Quantum computing: still unclear. These will probably be cloud solutions,
not desktop ones.