November 21, 2016

I normally use Encog and a self-written learning framework when I do audio pipeline learning. I've been tempted by CNTK and TensorFlow. CNTK uses tools whose license is, sadly, too restrictive. TensorFlow's ecosystem is more in line with what I need.

I'm a Windows guy, and I can use TensorFlow via Docker. But I want to use my GPU. I have a CUDA-compliant GPU on one of my machines, along with Windows 10 and Visual Studio Community. The official readme is written for VS Pro, not Community. The key difference is that VS Community doesn't officially support building TensorFlow 32-bit with CUDA, only 64-bit.

Here are the steps I've figured out so far:

Prerequisites.

You'll need SWIG, CUDA, the NVIDIA cuDNN library for CUDA, Git, CMake, Python 3.5, and numpy 1.11. You can use Anaconda to satisfy the Python/numpy requirement: install Anaconda, then run conda install numpy in an elevated command prompt. For the rest, you'll have to download and run the installers. Oh, and Visual Studio Community 2015. I'll assume a default install drive of C:. I've adapted the steps from the official GitHub readme here:

September 22, 2016

Experian just released a survey of 180 Echo owners about their Echo experience. You can read the report here. It showed some great findings: the Echo's NPS was 19, which is very high but not extreme (Google Chrome, for example, is around 35). Most impressive is that 35% of Echo owners are shopping online, right now, with voice! This means people like the Echo and spend money with it.

Experian believes that voice is now entering the "early adopter" phase of the hype cycle. I'm surprised that it's taken voice so long to get to this phase, but then, I'm an early adopter, having used the Echo since its late betas. I also have a VR setup, and I code for a living.

When voice dialers ("Siri, call my wife!") became mainstream, they changed the world. I use one every time I make a phone call. Voice dialing gave me hands-free calling while I drive, and it's now a mainstream use case. I fully expect voice computing to go mainstream too, and the market here to grow in leaps and bounds.

June 23, 2016

Want to learn to make 3D objects like this cool tiara from Adafruit? Come to the OpenSCAD class at Metrix Create Space on Aug 4, 2016. We'll go over the basics of drawing 3D objects by describing them in a free, open-source, C-like language.

June 7, 2016

For my voice AI project, I've been looking at genetic algorithms and neural nets. I wrote a gate-array learner and created a truth table with 4 input points and 2 output points. I knew ahead of time that 2 XOR gates, wired to inputs 1,2 and 3,4 respectively, would perfectly fit the space.
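As a minimal sketch (the function and variable names here are mine, not from the original code), the target that those two XOR gates define can be enumerated like this:

```python
from itertools import product

def target(a, b, c, d):
    """The function the learner must fit: two XOR gates,
    one wired to inputs 1,2 and one to inputs 3,4."""
    return (a ^ b, c ^ d)

# Enumerate the full 16-row truth table over 4 binary inputs.
truth_table = {bits: target(*bits) for bits in product((0, 1), repeat=4)}

for bits, outs in truth_table.items():
    print(bits, "->", outs)
```

Any candidate circuit can then be scored by how many of these 16 rows it reproduces.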

I then wrote a genetic algorithm to solve the space problem, and counted how many times the algorithm tried to solve the problem before succeeding.

The genetic algorithm tried between 811 and about 28,000 times before solving the space. A neural net solved the same problem in anywhere from 43 tries to never, even when given the same number of nodes. A massively overfitted neural net with 2,000 nodes in the hidden layer converged far faster than a barely overfitted network with only 5 nodes in the hidden layer.

So, I'd probably call NN the winner, but only when massively overfitted.

May 26, 2016

I've been working with a BitArray pattern recognition system for sound processing. I implemented a genetic algorithm with a single-point mutation ability and tested it against a data set of sounds: me talking, and a recording of violins playing. The idea is to detect me talking over the violin noise, with the hope of eventually being able to tell speech and noise apart.

It didn't work at all. I could create semi-optimal (aka local minima) solutions that could mostly distinguish me talking from the violin, but not always. There was a global solution; by pure chance I hit it a few times, and the system worked correctly. (About 1 time in 10 I hit the correct global optimum; 9 in 10 I hit a local optimum.)

I wanted to see if I could evolve the local-optimum detectors toward the global one. With a SNIP mutation, it didn't work (though I hypothesized it should work some of the time; the global optimum is a single bit, bit 47, being false in the encoded samples).

From this I calculated the number of mutations needed to reach the global optimum from each of the suboptimal solutions: at least 4 to around 7 serial snips, with add/deletion being far more valuable than transpose.
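A minimal sketch of a single-point mutation operator on a bit genome (the operator mix here, flip/insert/delete, is my guess at what "snip" plus "add/deletion" covers; transpose is omitted, and all names are mine):

```python
import random

def snip_mutate(genome, rng=random):
    """Single-point ('SNIP') mutation on a list of bits: pick one
    position, then flip the bit there, insert a random bit, or
    delete the bit (never deleting the last remaining bit)."""
    genome = list(genome)  # copy so the parent survives unchanged
    i = rng.randrange(len(genome))
    op = rng.choice(("flip", "insert", "delete"))
    if op == "flip":
        genome[i] ^= 1
    elif op == "insert":
        genome.insert(i, rng.randint(0, 1))
    elif op == "delete" and len(genome) > 1:
        del genome[i]
    return genome
```

Chaining 4 to 7 of these serial snips is then one lineage from a local optimum toward the global one.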

Cost tracking indicates that reaching the global optimum takes 318,000 if/then tests in a good case. (500 sample points in the space; small data…)

I have no idea what gradient descent would take here. But I now know an appropriate DNN topology to guess correctly: 25 samples in the input layer, 25 neurons plus a bias in the hidden layer, and 1 output neuron should simulate my genetic selector. Then I can tell which is more efficient. I suspect a poly-snip genetic approach would work, alongside the 25-neuron DNN. I'll have to implement both and see which is more efficient: DNN or genetic?
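As a sketch of that 25-25-1 topology (the post doesn't specify an activation function; sigmoid and random weights are my assumptions here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 25-25-1 topology: 25 inputs, a hidden layer of
# 25 neurons plus a bias term, and a single output neuron.
W1 = rng.standard_normal((25, 25))  # input -> hidden weights
b1 = rng.standard_normal(25)        # hidden-layer bias
W2 = rng.standard_normal(25)        # hidden -> output weights
b2 = rng.standard_normal()          # output bias

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)  # scalar in (0, 1)

y = forward(rng.standard_normal(25))
```

Training this with gradient descent versus evolving the gate array would then give the efficiency comparison the post is after.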

May 9, 2016

For the past month, I’ve been working on a machine learning program, accidentally.

A year or so ago, I wrote a little app that uses cloud AI to do language translation. It worked! But only for me! See, I grew up in the American Midwest. I actually went to the University of Nebraska for a while. I speak broadcast-perfect English; I could be a news anchorperson. I also understand AI. In machine translation, I understand that it's just "transcoding" based on word frequency, Kenneth. This means I can have this kind of conversation with myself:

“How many dogs do you have?”/ “I have two dogs”.

So, because of these factors, I can use a translation AI without problem. But I often interact with people who are older, have strong accents, and don’t really understand the processing time and optimal speech patterns for cloud machine translation. They speak differently:

“How many dogs do you have?” / “two”.

Fragmented, fast, impatient, and ambiguous. A machine system won't handle this conversation well. The accented, older human ends up just frustrated with the thing. The system gave them too little indication of what was going on, and it took too long to work. They want "effortless" translation, or they don't believe or trust it at all.

So, I wanted to solve the problem of conversational translation, along with a slew of other problems like contact search. Thus, I stepped through the looking glass and decided it was time I learned AI development. I went looking for frameworks, discovered Encog, a C# neural network/ML framework, and played around with it. I found that the amount of featurization and pre-processing needed for sound NNs was higher and harder than I liked. It could be done, but only with a metric tonne of labeled data, data I don't have.

So, I looked at "small data" ideas. One that interested me was the two-dimensional vector field learner that Numenta has. I began a pure C# implementation (I normally don't code in C# because I hate UWP, but this kind of project uses old .NET APIs and no UWP). And along the way, it hit me: this two-dimensional learner was a neural network, and machine learning is really just pattern recognition. The sparse maps are like labels, another way of saying, "Like these, not those." The two-dimensional field can be represented by a vector of A elements, where A = M × N, the dimensions of the original field.
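The field-to-vector equivalence is just flattening; a tiny sketch (dimensions chosen arbitrarily for illustration):

```python
import numpy as np

# An M x N two-dimensional field viewed as a vector of A = M * N elements.
M, N = 4, 6
field = np.arange(M * N).reshape(M, N)

vector = field.reshape(-1)   # flatten the field to length A
A = vector.size

# Round-trip: the vector representation loses nothing.
restored = vector.reshape(M, N)
```

Because the mapping is lossless both ways, anything defined over the 2D field (masks, clusters) has an exact vector counterpart, and vice versa.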

But there's power in the representation that I hadn't expected. Turns out that viewing a NN as a two-dimensional vector and using masking leads to easier human understanding of what the heck is actually going on in the system. And this leads to new ideas (which I'm not ready to share yet, because they're possibly insane).

Nowadays, I'm developing out the system because it's intellectually engaging. I've started from ideas, seen how they work in existing frameworks, then moved and maybe improved those ideas in my own framework, because I believe "if you don't build it, you don't understand it." My framework is woefully incomplete. It will always create a pattern based on the least significant bits. It's easy to fool, and it doesn't use enough horizontal data when building masks. But it can do something amazing: it can tell apart two sounds given exactly one sample of each sound, and it does so without a label.

And that’s not the most exciting part! As I’ve been playing with these ideas, a new one has emerged about how to stack and parallelize the detectors and make an atemporal representation of sound streams. This seems to match what Noam Chomsky says about how human “Universal Grammar” must work. If this idea pans out ( and it’s maybe months of implementation time to find out ), then there’s a small chance that I’ll figure out some part of the language translation problem.

All that excitement is tempered by the fact that I have limited time. Eventually, I’ll run out of money, and thus time, to do this research. So the problems I must solve are:

Can I build a framework that’s able to solve the problems I’m interested in?

If not, can the pattern detectors solve problems others are interested in?

One thing I learned working in big tech — there’s always someone watching.

Take the Amazon Alexa. You can bet the big 4 tech firms are watching Amazon and trying to decide whether they'll build technology to compete. And I really wish they would. I have an Echo and love it, but programming for the Echo is crap.

Why is programming for the Echo crap? So many issues:

Provisioning services is a nightmare. You don't even know what services you need to provision, much less have access to a configuration file. It takes lots (and I mean lots) of AWS console pages just to get to hello world.

No audio stream. If you want to make a phone app, forget it. Amazon won't give you the voice data. There's AVS, which you can use to send voice you capture to Alexa, but there's no access to the voice inside the Echo.

90-second, fixed-format playback from the API. You literally chunk everything as 90-second-long MP3s.

NodeJS. Voice is not web, and the stateless nature of web design makes no sense in voice apps. The biggest issue is that your app will respond to any of the registered commands in any sequence. Conversations, however, are always sequential. It’s just the wrong language for the job.

NodeJS, outside the web, is sort of a problem. There’s real harm in imposing the async paradigm on problems that are much simpler to read in a stateful manner.

And not just any NodeJS: you can't write the code in your own editor. Amazon wants to make sure they own the coding platform, so you have to write Alexa code in their web editor.

Can't really sell what you make. Amazon won't let you monetize the actual ASK skill; instead, you have to sell something else, like an unlock code, on Android.

AVS platform lock. AVS is essentially only available for Linux/Android. If you want to use AVS on PC/Mac, well, you're SOL.

Overly cloudy. I'm not a fan of the cloud, because it adds complexity that doesn't need to be there. But Alexa takes the cake on too much cloud for no reason. You can't write the code on your local system; it must be in the browser. You can't run any part on your own hardware; it must run in AWS. Every instance requires a Lambda spin-up. You can't sell what you make. Developers give away too much control when using Alexa.

My team won the Echo prize at the recent Seattle VR Hackathon. The team at Amazon is amazing, and the Echo is an amazing product. Again, I own and love my Echo. But without a competitor, the developer experience is really sub-par. I also don't like these cloud companies forcing devs to lock in to them. You can't even use your own editor? Come on!

So that's my argument: the Echo needs competition from the big tech companies. Sure, some start-up could make a great Echo-like product with a better developer experience; I run across small shops making similar products on Kickstarter/Indiegogo. But those companies are vertically focused, with no developer experience at all, whereas the big 4 make APIs…

April 21, 2016

Sparse maps are an idea I saw from an AI firm (Numenta?) about how to visualize and filter vector arrays. The basic idea is that you take a vector (it can be a binary vector, but could also be a vector of ints), assign a color value to some numbers, and spread the vector over a 2D map. From this map, you can find some number of clusters, and those clusters are essentially concepts. You build masks of these concepts to see whether an output contains the concept. They use it in natural-language search.

I took an open-source codec called codec2 and built a sparse map of 71 frames of me saying "ah" and "sh", putting the codec's 51-bit frames into a 640×480 picture.

So, a frame is a vertical line in the picture. Bit 0 is at the top of the line and bit 50 at the bottom. Each 9×9 block represents either a 1 or a 0: red 9×9 blocks are 1, green 9×9 blocks are 0. Frame 0 is the leftmost vertical line, frame 70 the rightmost. There are no spaces between the colored blocks, so the image looks continuous, but it is really discrete blocks. I did this in C#, so there are byte-order issues I haven't corrected for. Essentially, BitArray.CopyTo(byte[]) copies in little-endian order, which I then bit-shift back into order, even though the bit array is in concatenated bit order. It's something I'll fix later, but the error is consistent, so the generated color map is also consistent.
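The layout above can be sketched in a few lines (this is my reconstruction of the rendering scheme, not the original C#; it ignores the byte-order quirk, and the names are mine):

```python
import numpy as np

BLOCK = 9   # each bit becomes a 9x9 block of pixels
BITS = 51   # bits per codec2 frame, per the post

def render(frames):
    """frames: a list of 51-element 0/1 lists, one per codec frame.
    Returns an RGB array: red 9x9 blocks for 1 bits, green for 0.
    Frame 0 is the leftmost column; bit 0 is the top row."""
    bits = np.array(frames).T   # rows = bit index, cols = frame index
    red = np.array([255, 0, 0], dtype=np.uint8)
    green = np.array([0, 255, 0], dtype=np.uint8)
    img = np.where(bits[..., None] == 1, red, green)
    # Scale each bit up to a BLOCK x BLOCK square of pixels.
    return img.repeat(BLOCK, axis=0).repeat(BLOCK, axis=1)

# Two toy frames with alternating bits, just to exercise the layout.
demo = render([[1, 0] * 25 + [1], [0, 1] * 25 + [0]])
```

With 71 real frames, the result is a 459×639 strip of solid red/green blocks, which is what the "ah" and "sh" pictures show.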

The results are staggering. Here’s the picture of “ah”

Here’s the picture of “sh”

These maps look interesting. I think filter masks might be able to detect either:

my voice.

the phones being spoken.

Of course, this could be a dead end. I haven’t seen if I can generate masks from this yet — but it looks super interesting, so I thought I’d share.

March 8, 2016

MS has not been keeping non-UWP .NET up to date. For example, the desktop Cortana APIs are UWP-locked. You can use "Cortana" via Azure in a convoluted way, or you can use straightforward APIs within UWP. But you can't use Cortana in a straightforward way in .NET.

Even when the API is in both UWP and .NET, the documentation is not updated for .NET. I’ve run into cases where the docs are UWP only, and the .NET version of the API has a different calling convention.

UWP detracts from .NET improvement. MS is spending too much developer time maintaining two forked APIs that do the same thing. Nothing is stopping MS from updating .NET and bringing it to all platforms. Nothing is stopping MS from making the store APIs part of .NET. .NET already supports strong cryptography, including strong-name-signed DLLs. Everything that UWP is supposed to solve, .NET already provides.

I hate developing for UWP. So much that I've abandoned .NET development. All new dev work I do is in NodeJS. This is because UWP keeps creeping into things. Starting VS? You get pestered with "Where's this month's license?" even in VS Community.

I love C# and .NET; really, I do. I *want* to develop on the MS stack. UWP has driven me away. I can't trust that the APIs I want are present. I can't trust the API docs. I can't get away from the hassles. Don't get me started on the annoyance of things like NuGet (how do you debug a NuGet package deployment failure? That's a nightmare… npm is so easy: rd /s /q node_modules and npm install…) or the shrinking number of devs.

To get me to reconsider the MS platform as a serious developer platform, UWP must die.