We will get to all of these categories, so be patient. Let me give you some background story first.

Three or four years ago I found this. It is a giant board with bird songs visualized as spectrograms "using machine learning". It is an ideal example of a Google project: beautiful, decent-looking, nice, expensive to make, and utterly useless, with no connection to reality. At that moment I was impressed by the project, because I did not know anything about Data Science and machine learning and really thought that producing such visualizations required some "black magic".

Several years later, after starting this website and our Telegram channel, and after our neural chicken coop project, I saw that page again and was a bit more skeptical. Roughly at the same time I stumbled upon this video on YouTube (there are literally dozens of related videos covering different taxonomic units).

I binge-watched several dozen of these videos. Then I remembered this scene from the renowned Silicon Valley TV series.

And then I just happened to remember reading the following articles / news items / blog posts:

How the Silicon Valley "Not Hotdog" app was made using React Native, TensorFlow, and SqueezeNet;

Well, you do not have to be a genius to see the following patterns / projections:

Computation follows cyclical patterns. When some kind of computation can be moved to end-user devices (PCs, notebooks, smartphones) with some benefit, eventually it is;

SqueezeNet enables us to run the inference part of a powerful neural network architecture on a mobile device without significant limitations, and the weights take just 5-10 MB of memory;

As of ~July 2017, nobody had built an app / algorithm for recognizing a bird species by its song;
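A quick back-of-the-envelope check on that memory figure (the ~1.25 million parameter count for SqueezeNet is from memory, so treat it as an approximation):

```python
# SqueezeNet v1.0 has roughly 1.25 million parameters (approximate,
# from memory); stored as 32-bit floats, that comes to about 5 MB.
params = 1.25e6
bytes_per_float32 = 4
size_mb = params * bytes_per_float32 / 1e6
print(round(size_mb, 1))  # ~5 MB, consistent with the 5-10 MB figure above
```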

1. Action plan

So it's decided: let's make an app that will recognize a bird by listening to its song. Sounds cool, right? It takes a lot of time to describe all of this, but when you know all the inputs the plan is born in your head in literally seconds, which is what happened to me. And I got really inspired.

The project can be roughly separated into the following chunks:

Find bird songs, download them, do some basic statistical analysis;

Analyze them, choose a representative subsample for proof of concept;

Learn how to extract features from sound;

Run a lot of experiments with plain vanilla neural networks to see if this project is viable;

If it is, then migrate the NN to the SqueezeNet architecture to reduce the weight matrix size;

Build a real React Native app that will listen to the birds and tell you which bird it is;
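To make the feature-extraction step above less abstract, here is a minimal sketch of turning a waveform into a log-spectrogram with SciPy; the synthetic sine wave below just stands in for a real recording, and the exact STFT parameters are an illustration, not the project's final settings:

```python
import numpy as np
from scipy import signal

# synthetic "bird song": a pure 2 kHz tone, 1 second at 22.05 kHz
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
wave = np.sin(2 * np.pi * 2000 * t)

# short-time Fourier transform -> spectrogram (freq bins x time frames)
freqs, times, spec = signal.spectrogram(wave, fs=sr, nperseg=512, noverlap=256)
log_spec = np.log(spec + 1e-10)  # log scale, the usual input for a CNN

print(log_spec.shape)                        # (freq_bins, time_frames)
print(freqs[spec.mean(axis=1).argmax()])     # peak frequency, close to 2000 Hz
```

An image like `log_spec` is exactly what a convolutional network can consume, which is why sound classification reduces to image classification here.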

Sounds somewhat easy. But there are a lot of tricky parts, especially with collecting data, sampling it, and the NN architecture.

2. Without further ado, let's jump in

After a bit of research I found this spectacular website with ca. 350k bird voice recordings of ca. 10k bird species. For reference, in the Animal kingdom (a taxonomic term) there are ca. 30k animals (birds are also animals).

If you did not study biology well in school, this is your last chance to catch up. Here you can find the interactive version.

This website also features a simple but powerful API. So let's start.
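To give a feel for working with such an API, here is a sketch of parsing one page of recording metadata. The response shape below is a mock: the field names (`recordings`, `gen`, `sp`, `file`) and the protocol-relative file URLs are assumptions for illustration, so check the actual API documentation before relying on them.

```python
import json

# Mock of one page of a recordings API response (schema is assumed).
raw = json.dumps({
    "numRecordings": "2",
    "recordings": [
        {"gen": "Parus", "sp": "major", "file": "//example.org/rec/1.mp3"},
        {"gen": "Parus", "sp": "major", "file": "//example.org/rec/2.mp3"},
    ],
})

data = json.loads(raw)
# collect downloadable URLs and a species label per recording
urls = ["https:" + rec["file"] for rec in data["recordings"]]
labels = [rec["gen"] + " " + rec["sp"] for rec in data["recordings"]]
print(urls)
print(labels)
```

In the real pipeline you would loop over result pages, fetch each page with an HTTP client, and download every file in `urls` to disk.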

Let's include the libraries that we will most likely need (I am lazy).
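A plausible toolbox for this kind of project looks roughly like this (my guess at the import list; the original notebook's exact set may differ):

```python
# standard library
import os
import glob
import json

# the usual data-science stack
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```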

4. So what?

What? Can't we just download all the songs, load them into VGG-16, and be done with it?

Not so fast. Above you could see that we have ~30-40 bird songs per species on average, which is not really enough. And we want to build a classifier that is actually usable in real-world conditions, not in some kind of walled garden. So we need at least hundreds of songs per class (in a balanced dataset!) for neural networks to train properly.
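As a sanity check, counting recordings per class is a one-liner once the metadata sits in a DataFrame; the column names and tiny table below are hypothetical:

```python
import pandas as pd

# hypothetical metadata: one row per downloaded recording
meta = pd.DataFrame({
    "genus":   ["Parus", "Parus", "Turdus", "Turdus", "Turdus"],
    "species": ["major", "major", "merula", "merula", "philomelos"],
})

# recordings per (genus, species) class, largest first
counts = meta.groupby(["genus", "species"]).size().sort_values(ascending=False)
print(counts)

# keep only classes with enough data (threshold is illustrative;
# the text argues for hundreds per class in practice)
viable = counts[counts >= 2]
print(len(viable))  # 2
```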

So the obvious idea is to use the taxonomic tree data to predict not the bird species but, for example, the bird genus, which may be easier (it may be scientifically trivial, but we are talking about the proof-of-concept stage now).
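Collapsing species labels into genus labels is cheap because the genus is the first word of the binomial name; a quick sketch (a simplification — real taxonomy data from a proper source is more reliable than string splitting):

```python
import pandas as pd

meta = pd.DataFrame({
    "species_label": ["Parus major", "Parus caeruleus", "Turdus merula"],
})

# the genus is the first word of the binomial name
meta["genus_label"] = meta["species_label"].str.split().str[0]
print(meta["genus_label"].tolist())  # ['Parus', 'Parus', 'Turdus']
```

Three species classes become two genus classes, so each class accumulates more recordings.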

After some research, I found that the biggest resource in this field is ITIS, and it boasts a direct database download. So we will need to be a bit technical about it.
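Once the dump is loaded into SQLite, querying it is straightforward. The sketch below builds a tiny in-memory stand-in for the real database; the table and column names (`taxonomic_units`, `complete_name`, `rank_id`) and the rank codes (180 = genus, 220 = species) are from memory, so verify them against the actual ITIS download.

```python
import sqlite3

# tiny in-memory stand-in for the ITIS dump (schema names assumed)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomic_units (tsn INTEGER, complete_name TEXT, rank_id INTEGER)"
)
rows = [
    (1, "Parus", 180),        # rank_id 180 = genus (assumption)
    (2, "Parus major", 220),  # rank_id 220 = species (assumption)
]
conn.executemany("INSERT INTO taxonomic_units VALUES (?, ?, ?)", rows)

# pull all species-level names
species = conn.execute(
    "SELECT complete_name FROM taxonomic_units WHERE rank_id = 220"
).fetchall()
print(species)  # [('Parus major',)]
```

Joining such a table against the recording metadata gives the species-to-genus mapping we need for the coarser classifier.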