One of the challenges with AI is that there isn’t a universal definition – it is a broad category that means something different to everyone. Debating the rights and wrongs, the shoulds and shouldn’ts, is another post though.

DARPA outlines this as the “programmed ability to process information”, measured across a set of criteria that span perceiving, learning, abstracting, and reasoning.

AI Scale Intelligence

They classify AI in three waves, outlined below, each sitting at a different level on the intelligence scale. I believe it is important to have a scale such as this: it helps temper expectations and compare apples to apples; for enterprises, it helps create roadmaps around outcomes and their implementations; and finally, it helps cut through the noise of the hype cycle that AI has generated.

Wave 1 – Handcrafted Knowledge

The first wave operates on a very narrow problem area (the domain) and essentially has no (self-)learning capability. The key thing to understand is that the machine can only explore specifics, based on the knowledge and related taxonomy/structure defined by humans. We create a set of rules to represent the knowledge in a well-defined domain.

Of course, as the DARPA Grand Challenge for autonomous vehicles taught us, it cannot handle uncertainty.

AI First wave stumbles

Wave 2 – Statistical Learning

The second wave has better classification and prediction capabilities, much of which comes via statistical learning. Essentially, problems in certain domains are solved by statistical models trained on big data. It still lacks contextual ability and has minimal reasoning capability.

A lot of what we are seeing today is related to this second wave, and one of the hypotheses underpinning it is called the manifold hypothesis. This essentially states that high-dimensional data (e.g. images, speech, etc.) tends to lie in the vicinity of low-dimensional manifolds.

A manifold is an abstract mathematical space which, in a close-up view, resembles the spaces described by Euclidean geometry. Think of it as a set of points satisfying certain relationships, expressible in terms of distance and angle. Each manifold represents a different entity, and understanding of the data comes from separating the manifolds.

Using handwritten digits as an example: each image is one element in a set with 784 dimensions (a 28×28 pixel grid), and together the images form a number of different manifolds.
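
The manifold hypothesis can be illustrated with a small sketch: data generated on a 2-D plane but embedded in 10 dimensions has only two meaningful directions, which a singular-value decomposition exposes. All names and numbers here are illustrative, not from the post.

```python
import numpy as np

# Hypothetical illustration: 200 points that live on a 2-D plane
# embedded inside a 10-dimensional space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))   # the true 2-D coordinates
embed = rng.normal(size=(2, 10))     # a fixed linear embedding into 10-D
data = latent @ embed                # the high-dimensional observations

# The singular values reveal that the data really occupies only 2 dimensions.
singular_values = np.linalg.svd(data, compute_uv=False)
effective_dim = int(np.sum(singular_values > 1e-8))
print(effective_dim)  # → 2
```

Real images are of course curved, not linear, manifolds – but the same intuition applies: 784 nominal dimensions, far fewer effective ones.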

Handwritten digits: manifolds of handwriting

Separating each of these manifolds (by stretching and squishing the data) to isolate them is what the layers in a neural network do. Each layer computes its output from the preceding layer’s inputs (usually via a non-linear function), learning from the data.
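
A single layer of the kind described above can be sketched as follows; the sizes, weights, and choice of ReLU are illustrative assumptions, not the specific network in the figure.

```python
import numpy as np

# Minimal sketch of one neural-network layer: output = nonlinearity(W @ x + b).
# Shapes are illustrative: a 784-pixel MNIST-style image feeding 16 hidden units.
rng = np.random.default_rng(1)
W = rng.normal(scale=0.01, size=(16, 784))  # weights (learned in practice, random here)
b = np.zeros(16)                            # biases
x = rng.random(784)                         # one flattened 28x28 image

def relu(z):
    return np.maximum(z, 0.0)               # a common non-linear function

h = relu(W @ x + b)                         # the layer's output
print(h.shape)  # → (16,)
```

Stacking several such layers is what progressively untangles the manifolds.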

AI neural nets learning from data

So, in statistical learning, one designs and programs the network structure based on experience. Here is an example of how an image of the number 2 passes through the various feature maps on its way to being recognized.

And whilst it is statistically impressive, it is also individually unreliable.

AI failure

Wave 3 – Contextual Adaptation

The future of AI is what DARPA calls contextual adaptation – models explain their decisions, and those explanations are then used to drive further decisions. Essentially, one ends up in a world where we construct contextual explanatory models that reflect real-world situations.

AI models to explain and drive decisions

In summary, we are in the midst of Wave 2, which is already very exciting. For an enterprise, it is key to have a scale that outlines the ability to process information across the intelligence spectrum – it helps make this AI revolution more tangible and manageable.

What is it?

What’s the next big thing in computing? Not the #AI or #blockchains of the world – those are starting to happen today (albeit a little early). Quantum computing is one of those next big things on the horizon (probably in the ~5-year range).

Why do I care?

Why do I care about quantum? Well, some problems are simply not solvable on conventional digital computers – the kind we have today, called “classical” machines. Even if Moore’s law were to continue (and that is a whole different debate), there are still some problems whose scaling obeys a different set of properties and laws, and doubling the transistors on a chip won’t really help. In fact, some of these problems would require longer than the lifetime of the universe – and that is with the biggest and fastest supercomputers available!

Quantum computing is a paradigm shift in computing – it moves beyond silicon and bits.

What are Quantum computers?

Quantum machines are different – they are based on the properties of quantum mechanics rather than classical mechanics (which governs the machines we use today). A few characteristics that make quantum computers different:

Today’s computers use transistors to manipulate bits as either 0 or 1; quantum computers encode information as qubits (quantum bits) and are not limited to two states.

Qubits can be in superposition – in a state of 0 and 1 simultaneously, and indeed at all points in between, at the same time. This makes them inherently parallel at an exponential scale.
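
As a rough illustration (a classical simulation, not a real quantum device), a single qubit can be modelled as two complex amplitudes whose squared magnitudes give the measurement probabilities:

```python
import numpy as np

# A qubit simulated classically: two complex amplitudes (a, b) with
# |a|^2 + |b|^2 = 1. Measuring yields 0 with probability |a|^2, 1 with |b|^2.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # equal superposition

p0 = abs(plus[0]) ** 2
p1 = abs(plus[1]) ** 2
print(round(p0, 3), round(p1, 3))  # → 0.5 0.5

# n qubits need 2**n amplitudes classically -- the exponential parallelism.
n = 10
state_size = 2 ** n
print(state_size)  # → 1024
```

This is also why classical simulation of quantum machines runs out of memory so quickly.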

They are notoriously delicate! They need to be cooled (to -459°F, roughly 100 times colder than deep space) and isolated from noise to preserve the system’s integrity. This level of cooling and sensitivity requires new and different (quantum) error-correction techniques than what we are used to.

There is a “no-cloning” theorem, i.e. one cannot copy the data/result to inspect it. Entanglement helps observe the result of a calculation whilst preserving the system’s integrity.

Hybrid machines – we need a classical machine to control a quantum machine: to program it, to hint/nudge it in the right direction (remember, they are sensitive and need different error correction), and to observe the collapsed state.

To put it in perspective, an entangled system of 250 qubits would require ~10^80 bits to store classically. That is more than the number of atoms in the universe! And as implied, a quantum machine needs only those 250 qubits to store the same state. A few more comparisons that might help:

A terabyte needs: ~10^12 bits

A petabyte needs: ~10^15 bits

An exascale system (possible in a few years) will need: ~10^18 bits
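
The arithmetic behind these comparisons is easy to sketch: a classical description of an n-qubit state needs 2^n amplitudes, which outgrows every line item above almost immediately.

```python
# A classical description of an n-qubit state needs 2**n complex amplitudes,
# so the storage cost grows exponentially with the number of qubits.
def classical_amplitudes(n_qubits: int) -> int:
    return 2 ** n_qubits

print(classical_amplitudes(50) > 10 ** 15)   # → True: 50 qubits already tops a petabyte of bits
print(classical_amplitudes(250) > 10 ** 75)  # → True: far beyond any classical memory
```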

Application Areas

In the early days, most problem areas will be optimization problems – things that are very difficult, or not possible to do with classical computers today. Some vertical scenarios that one can think of:

Privacy and Security – Example: quantum encryption would supersede the current encryption techniques that underpin modern commerce

Machine Learning – Example: new probability distributions and new inferences that allow us to ask questions that aren’t possible today. Also exponential speed-ups, with better solutions and new models (e.g. nearest-neighbor classification)

Cloud Computing

However, as of today, we don’t know what the best questions are to ask a quantum machine – at least not yet.

Making it Real – Some examples

Example 1 – Encryption

I guess the pet example everyone talks about is encryption and the RSA 2K challenge: given a large number, such as the one shown below, used as a key for encryption, finding the two large prime factors that produce the key would take about 1 billion years on a classical machine, and roughly 100 seconds on a quantum machine. Needless to say, that will have a significant impact on digital commerce, encryption, and security in general.
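
To make the asymmetry concrete, here is a toy classical factoring routine (trial division) against a small textbook semiprime; the numbers are illustrative stand-ins for a real RSA modulus, which is hopelessly beyond this approach.

```python
# Classical trial division against a small semiprime -- a toy stand-in for
# the RSA 2K challenge. The cost grows roughly with the square root of N,
# which is why 2048-bit keys are safe classically; Shor's quantum algorithm
# factors in polynomial time instead.
def trial_factor(n: int) -> tuple[int, int]:
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    return 1, n  # n is prime

print(trial_factor(15))    # → (3, 5)
print(trial_factor(3233))  # → (53, 61), a textbook toy RSA modulus
```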

Example 2: Simulating physical systems

Ferredoxin (Fe2S2) is a compound used in many metabolic reactions, including energy transport in photosynthesis. As currently used, a lot of resources are wasted in the process, and one thing that would help is finding the ground state of ferredoxin. With a classical algorithm that is an intractable task; with a quantum algorithm it would take approximately an hour (as estimated in 2015).

Entanglement – what is it?

It is a fundamental property of quantum mechanics: a physical phenomenon where two particles interact in such a way that the state of one particle cannot be described independently of the other. The paradox is that measuring either of the particles collapses the state of the entire (entangled) system. One cannot directly observe the result of a quantum computation – if you try to look at the subatomic particles, you bump them and thereby change their value. If you look at a qubit in superposition to determine its value, the qubit will assume the value of either 0 or 1, but not both (effectively turning our quantum computer into a mundane digital computer).

One interesting aspect is that the two particles aren’t necessarily next to each other; they can be miles apart, but still connected. This is what some call the “spooky action”, which in the past upset a few scientists, including Einstein. The way to observe the result is to preserve the system’s integrity and measure the result indirectly – using entanglement. Applying an outside force to two atoms can cause them to become entangled, with the second atom taking on the properties of the first. Left alone, an atom will spin in all directions; the instant it is disturbed, it chooses one spin (i.e. one value), and at the same time the second, entangled atom chooses the opposite spin, or value. This allows scientists to know the value of the qubits without actually looking at them.

Quantum Superposition – what is it?

It is another fundamental property of quantum mechanics, whereby a state can be multidimensional at the same time. Superposition is what makes a quantum machine inherently parallel. A classical Turing machine can only perform one calculation at a time; a quantum Turing machine can perform many at once, given that the symbols are both 0 and 1 (and all points in between) simultaneously. Similar to waves (say, in a pond), any two (or more) quantum states can be added together (“superposed”) and the result is another valid quantum state; conversely, every quantum state can be represented as a sum of two or more other distinct states.

Microsoft’s Position

I also wanted to outline Microsoft’s specific perspective on quantum computing. Microsoft Research (MSR) has been researching quantum since the late ’90s, and their approach is topological quantum computation (different from the competition). They have a dedicated lab called Station Q and a quantum software architecture called LIQUi|>.

The photo below shows MSR’s primary research collaborators on quantum.

The image above is essentially a large fridge cooling the quantum chip (below) to -459°F. The various layers (discs) seen in the image below are the stages of that cooling process. The quantum chip is at the very bottom (not visible in the photo).

LIQUi|> is quite interesting: it is a domain-specific language (a DSL built using F#) that also includes the required tools (a compiler for quantum circuits and gates) and relevant optimization of those gates and circuits. It includes a simulator for quantum circuits (up to 31 qubits), and you can download it from GitHub.

Remember, this is a hybrid situation, so they are also working on a classical computer to control the quantum computer. As you can imagine, this isn’t for the faint-hearted. This classical computer needs to factor in and translate various dimensions between the classical and quantum worlds: things like communication, heat dissipation, quantum error correction, multiplexing, latency, clock speed, etc.

We certainly live in very exciting times, and the video below does a nice job of explaining some of the basic principles outlined in this post.

At a recent internal meeting, we were discussing productivity and the various levels of distraction one faces these days. Did you know there is a hierarchy of digital distractions (see image below)? No wonder it is so difficult for some people, in today’s connected, agile world, to get any actual work done (which is not to suggest they are not busy, of course).

At this meeting, the distraction was likened to a “monkey” – the monkey each of us has on our shoulder, with its constant demand for attention. We all know we cannot control this monkey and bottle it up. The idea isn’t to try to bottle it up – that only rattles it more as it tries to get out and demands more attention – but rather to let it out in a controlled manner for some time, similar to how one would take a dog out for a walk (with different outcomes, of course).

So instead of avoiding distractions, which can be very difficult for some folks, the idea is to let them out in a controlled manner – so the monkey is entertained and happy. This helps one concentrate the rest of the time and be more productive. The science behind it – how our brain gets the same “pleasure” effect as it does with drugs – is both fascinating and scary.

Thinking about #machinelearning? It is helpful to understand some of the numerical computations and concepts that affect an #ML algorithm.

One might not interact with these directly, but we can surely feel their effect. The things to think about are:

1. Overflow and underflow – think of them as rounding-up or rounding-down errors that shift the functions enough that, compounded across iterations, they can be devastating. One can also easily end up with division by zero.
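
A standard illustration of taming overflow is the log-sum-exp trick; the values below are illustrative:

```python
import math

# A naive softmax-style computation overflows for large inputs; the standard
# log-sum-exp trick subtracts the max first so every exponent is <= 0.
def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

big = [1000.0, 1000.0]
# math.exp(1000) on its own raises OverflowError; the stable version is fine:
print(round(logsumexp(big) - 1000.0, 6))  # → 0.693147 (i.e. ln 2)
```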

2. Poor conditioning – essentially, with small changes in the input data, how far can the output move? You want this to be small. (In cryptography you want the opposite: large.)
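
Conditioning can be made concrete via the condition number of a matrix; the two matrices below are illustrative:

```python
import numpy as np

# The condition number measures how much a small change in the input can move
# the output: small is well-conditioned, huge means input noise gets amplified.
well = np.array([[2.0, 0.0],
                 [0.0, 1.0]])
ill = np.array([[1.0, 1.0],
                [1.0, 1.0000001]])  # nearly singular

print(np.linalg.cond(well))        # → 2.0
print(np.linalg.cond(ill) > 1e6)   # → True
```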

3. Gradient optimization – there will be some optimization happening in the algorithm; the question is how it handles various local points on the curve: local minima, saddle points, and local maxima. Generally speaking, it is about optimizing over continuous spaces.

Some algorithms take this a step further by measuring the second derivative (think of it as the derivative of a derivative – the curvature of a function).
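
As a small sketch of both ideas: gradient descent on a simple quadratic, plus the one-shot Newton step that uses the second derivative. The function and step size are illustrative choices.

```python
# Gradient descent on f(x) = (x - 3)^2, whose derivative is 2(x - 3).
# A second-derivative (Newton) step uses the curvature f''(x) = 2 to jump
# straight to the minimum in a single step.
def grad(x):
    return 2.0 * (x - 3.0)

x = 0.0
for _ in range(100):
    x -= 0.1 * grad(x)            # plain gradient step
print(round(x, 4))                # → 3.0 (converges to the minimum)

x_newton = 0.0 - grad(0.0) / 2.0  # Newton step: x - f'(x)/f''(x)
print(x_newton)                   # → 3.0
```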

4. Constrained optimization – sometimes we just want to operate on a subset of the space, so constraints apply only to that set.

All of these come into play in some way, directly or indirectly, and a basic understanding of them and their constraints goes a long way.

I have had to explain this in most #AI-related conversations I have had – and lately those have been quite frequent. In my experience, most people use these terms interchangeably when they mean one rather than the other.

Whilst they are all (inter)related, and one might help trigger the other, they are still fundamentally different, and at some point it is good to understand the differences. I like the image below (source): whilst it shows a timeline, what is interesting is the correlation between the terms and how one is a subset of the other.

#AI vs #ML vs #DNN

#AI is getting more powerful, and the potential that personally excites me is the paradigm shift we are starting to see. Fundamentally, it is changing how we use, interact with, and value computers and technology.

It is shifting from us learning machines and their idiosyncrasies (remember when being computer literate was a differentiator on a resume?) to technology learning us and interacting with us in a more natural and, dare I say, human manner.

AI paradigm shift

I almost see it as Star Trek (and now I am showing my age) – the computer is everywhere, yet it is nowhere. It is embedded and woven into everything done on the Enterprise, rather than some “thing” one interacts with.

And it is awesome to start seeing some of this come to life, even if only in a demo, as outlined at Build a couple of weeks ago: #AI in the workplace, interacting with objects in real time and able to invoke and interact with business workflows (such as workplace policies).

AI in the workplace: policy violation

The degree of calculation is pretty phenomenal – 27 million per second [separately, I would love to understand the definition of a “calculation” 🙂]. But then, given where we are heading, with a fully autonomous car generating about 100 GB of data each second, this isn’t small potatoes.

And whilst you can read up more on these terms and how they link, I really like to move away from the terms – which most people confuse in the first place – and start thinking about business outcomes and how enterprises and people will use these technologies.

AI

To that end, the three buckets of Intelligent Automation, Robotic Process Automation (RPA), and Physical Automation are what we have found work better. On RPA, the one caveat is that it is not about robots, but rather the automation of a (business) process. The robot aspect falls under physical automation – which is essentially anything that interacts with the real/physical world.

Update: Modified the script to handle multiple instances but pay heed to the warning here.

Similar to last year, I have a PowerShell script that allows you to download the various PowerPoint decks and videos to watch locally rather than stream. It makes some improvements over the earlier scripts (e.g. if a file is already downloaded, it will skip downloading it again) and does the following:

Creates the relevant folder, named with the session details (including the title and the presenters)

For each session, saves the description in a text file in the created folder.

Downloads the relevant presentation (if any)

Downloads a jpg of the session image – sometimes it is easier just to see the title slide. I thought it better to have it and not use it than the other way around.

And finally downloads the high-quality video of that session.

In the script, you can change the following (and if you understand PowerShell then this should be easy):

Change the path to download to (the default is d:\build)

Choose a lower-quality video if you prefer (which takes less space and might not be bad depending on which device you are viewing on). This also uses less bandwidth.

And if you want only the decks, you can comment out the parts of the script that download the video.

The script will report some basic errors and will “eat” the exceptions that are expected (e.g. not every session has a pptx or a video). Those won’t break the script; it will just move on to the next session.
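
The skip-if-present and swallow-expected-errors pattern the script follows can be sketched like this (in Python purely for illustration – the actual script is PowerShell, and the URL and function name below are placeholders):

```python
import os
import urllib.error
import urllib.request

# Illustrative sketch (the real script is PowerShell): skip files already
# downloaded, and treat a missing deck/video (HTTP 404) as expected rather
# than as an error that stops the whole run.
def fetch(url: str, dest: str) -> bool:
    if os.path.exists(dest):
        return True                  # already downloaded -- skip it
    try:
        urllib.request.urlretrieve(url, dest)
        return True
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False             # not every session has a pptx or video
        raise                        # anything else is a real error
```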

In case you did not see the Story Remix demos from Build, it is awesome. Here is my first take on it, using just the photos I took at Build 2017. Some of the things shown in the keynote are not in the RS3 build I am running, but there are interesting possibilities nevertheless.

There are of course many, but for someone coming from computer science and software engineering, where the environment is relatively clean and certain (deterministic), it usually is a leap to understand that machine learning (and other elements of #AI) is not.

Machine learning is based on probability theory and deals with stochastic (non-deterministic) elements all the time. Nearly all activities in machine learning require the ability to factor in uncertainty and, more importantly, to represent and reason with it.

To that end, when designing a system, it is recommended to use a simple but uncertain rule (with some non-deterministic aspects) rather than a complex but certain rule.

For example, a simple but uncertain rule such as “most birds fly” is easier and more effective than a certain rule such as “birds can fly, except flightless species, those that are sick, babies, etc.”
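
That trade-off can be sketched as code: attach one probability to a simple rule instead of enumerating every exception. The 0.9 figure is an illustrative assumption, not a measured statistic.

```python
# The "most birds fly" idea as a simple uncertain rule: a single probability
# instead of an ever-growing list of exceptions.
P_FLIES_GIVEN_BIRD = 0.9   # hypothetical estimate for illustration

def likely_flies(is_bird: bool, threshold: float = 0.5) -> bool:
    if not is_bird:
        return False
    return P_FLIES_GIVEN_BIRD >= threshold

print(likely_flies(True))   # → True: the simple rule covers most cases
print(likely_flies(False))  # → False
```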

The only way I think this is possible in a fool-proof way in the near future is that everyone absolutely has to implement two-factor-DDA-authentication. There is no better #security today – period! There ain’t no stinking #AI, #RNN, #DNN, or Boltzmann machine in the world, or #quantum computer worth its #qubits, that can crack this – at least not in the near future.

And of course, when you have friends and family involved, the group authentication is a sure-fire way to stop anyone snooping in. #security