May 9, 2016

What am I up to in 2016?

For the past month, I’ve been working on a machine learning program, accidentally.

A year or so ago, I wrote a little app that uses cloud AI to do language translation. It worked! Only for me! See, I grew up in the American Midwest. I actually went to the University of Nebraska for a while. I speak broadcast-perfect English; I could be a news anchor. I also understand AI, so I know that machine translation is really just "transcoding" based on word frequency, Kenneth. This means I can have this kind of conversation with myself:

"How many dogs do you have?" / "I have two dogs."

So, because of these factors, I can use a translation AI without problem. But I often interact with people who are older, have strong accents, and don’t really understand the processing time and optimal speech patterns for cloud machine translation. They speak differently:

"How many dogs do you have?" / "Two."

Fragmented, fast, impatient, and ambiguous. A machine system won't handle this conversation well. The accented, older human just ends up frustrated with the thing. The system gave them too little feedback about what was going on, and it took too long to work. They want "effortless" translation, or they don't believe in or trust it at all.

So I wanted to solve the problem of conversational translation, along with a slew of other problems like contact search. Thus, I stepped through the looking glass and decided it was time I learned AI development. I went looking for frameworks, discovered Encog, a C# neural network/ML framework, and played around with it. I found that the amount of featurization and pre-processing needed for neural networks on sound was higher, and harder, than I liked. It could be done, but only with a metric tonne of labeled data, which I don't have.

So I looked at "small data" ideas. One that interested me was the two-dimensional vector field learner from Numenta. I began a pure C# implementation (I normally don't code in C# because I hate UWP, but this kind of project uses old .NET APIs and no UWP). Along the way, it hit me: this two-dimensional learner was a neural network, and machine learning is really just pattern recognition. The sparse maps are like labels, another way of saying, "Like these, not those." The two-dimensional field can be represented by a vector of A elements, where A = M × N, the dimensions of the original field.
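That flattening is simple to sketch. Here's a minimal Python illustration (the dimensions and field contents are made up for the example):

```python
# Flatten an M x N two-dimensional field into a vector of A = M * N elements.
M, N = 4, 8
field = [[(r + c) % 2 for c in range(N)] for r in range(M)]  # toy checkerboard field

# Row-major flatten: element (r, c) of the field lands at index r * N + c.
vector = [field[r][c] for r in range(M) for c in range(N)]
assert len(vector) == M * N

# The 2-D view can always be recovered from the flat vector:
restored = [vector[r * N:(r + 1) * N] for r in range(M)]
assert restored == field
```

Nothing is lost in either direction, which is why the two views are interchangeable.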

But there's power in the representation that I hadn't expected. Turns out that viewing the neural network as a two-dimensional vector and using masking makes it much easier for a human to understand what the heck is actually going on in the system. And this leads to new ideas (which I'm not ready to share yet, because they're possibly insane).
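A hedged sketch of what masking over a two-dimensional field might look like; this is my reading of the idea in Python rather than the author's C#, and the helper and patterns are invented for illustration:

```python
# Treat a detector's state as a 2-D bit field, and use a mask to ask
# "like these, not those": how many active cells overlap a stored pattern?
def overlap(field, mask):
    """Count cells that are active in both the field and the mask."""
    return sum(f & m
               for frow, mrow in zip(field, mask)
               for f, m in zip(frow, mrow))

pattern = [[1, 0, 1],
           [0, 1, 0]]
mask    = [[1, 0, 0],
           [0, 1, 0]]

print(overlap(pattern, mask))  # 2: both masked cells are active in the pattern
```

Because field and mask keep their 2-D shape, you can literally look at them side by side and see which regions a detector is paying attention to.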

Nowadays, I'm developing out the system because it's intellectually engaging. I've started from ideas, seen how they work in existing frameworks, then moved, and maybe improved, those ideas into my own framework, because I believe "if you don't build it, you don't understand it." My framework is woefully incomplete. It will always create a pattern based on the least significant bits. It's easy to fool, and it doesn't use enough horizontal data when building masks. But it can do something amazing: it can tell apart two sounds from exactly one sample of each, and it does so without a label.
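A rough sketch of how one-sample, label-free discrimination could work in principle (Python, with an invented featurization and threshold; this is not the actual framework, just the shape of the idea):

```python
# Store one example per sound as a bit pattern, then decide which stored
# example a new utterance resembles most. No labels, just similarity.
def to_bits(samples, threshold=0.5):
    """Crude featurization: one bit per sample, set when it exceeds threshold."""
    return [1 if s > threshold else 0 for s in samples]

def overlap(a, b):
    """Count positions active in both bit patterns."""
    return sum(x & y for x, y in zip(a, b))

sound_a = to_bits([0.9, 0.1, 0.8, 0.2])   # the single sample of the first sound
sound_b = to_bits([0.1, 0.9, 0.2, 0.8])   # the single sample of the second sound

probe = to_bits([0.8, 0.2, 0.9, 0.1])     # a new utterance to classify
best = max([sound_a, sound_b], key=lambda t: overlap(probe, t))
assert best is sound_a  # the probe matches the first stored sound
```

The point is that a single stored pattern per class is enough to separate the two, without ever naming either class.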

And that's not the most exciting part! As I've been playing with these ideas, a new one has emerged about how to stack and parallelize the detectors and build an atemporal representation of sound streams. This seems to match what Noam Chomsky says about how a human "Universal Grammar" must work. If this idea pans out (and it's maybe months of implementation time to find out), then there's a small chance I'll figure out some part of the language translation problem.

All that excitement is tempered by the fact that I have limited time. Eventually, I’ll run out of money, and thus time, to do this research. So the problems I must solve are:

Can I build a framework that’s able to solve the problems I’m interested in?

If not, can the pattern detectors solve problems others are interested in?