I got to review the O'Reilly book about AngularJS, which was nice since I was planning on using it for a recent project. I followed through most of the examples, which went OK-ish. Yes, I'm very impressed with how easy it seems to be to make very dynamic and interactive web apps. But as soon as I tried to apply it to my own project, I had quite some problems getting it working. Especially with concepts like directives, or getting functions from multiple modules to work together, I couldn't get the required knowledge from the book.

This book is probably better suited to somebody who's already familiar with JavaScript MVC frameworks and is looking to brush up on AngularJS, rather than a JavaScript rookie like myself.

Background

Human Movement Scientist

Specialized in gait analysis and pressure measurements

Pythonista, Matlab-survivor and reluctant R-user

Normally pressure measurements are done on humans using a pressure plate. In
this case we used a 200x36 cm pressure plate with 256x64 sensors, which
measure at 126 Hz. When you walk over it, each sensor measures the force applied
to it and, after interpolation, you get these nice looking pressure
measurements.

But I needed a version that worked on animals too. The regular software doesn't
work on animals, due to several limitations (not detecting the paws being a
major one).

So I created Pawlabeling, see
Github: https://www.github.com/ivoflipse/pawlabeling

It's called Pawlabeling, because it's used to label paws. Who ever said naming
had to be hard?!?

Scientific Python stack

The application uses a lot of libraries from the scientific python stack:

Interesting problems

While working on this project I ran into lots of interesting problems and, like
any junior developer, I went to Stack Overflow! Luckily the Python community is
awesome and especially Joe Kington's answers helped me a lot (all his Matplotlib answers too BTW).

Focus of the talk

Today, I'll just be focusing on the problem I'm working on right now: Labeling
the paws.

Given the data of a paw, predict its label (LF, LH, RF, RH)

So how can we solve this?

Let's use Machine Learning

Gaël Varoquaux (scikit-learn developer):

Machine Learning is about building programs with tunable parameters that are
adjusted automatically so as to improve their behavior by adapting to previously
seen data.

Today we'll focus on Supervised Learning:

Supervised learning consists of learning the link between two datasets: the
observed data X and an external variable y that we are trying to predict,
usually called target or labels. Most often, y is a 1D array of length
n_samples.

Like finding a line that separates these black and white points

Or predicting the digit given a small image of a digit, like in the MNIST
dataset:

If you have no clue what algorithm to use, you should read the documentation
first or just use the Scikit-Learn cheat-sheet:

Suppose you're very indecisive, so whenever you want to watch a movie, you ask
your friend Willow if she thinks you'll like it. In order to answer, Willow
first needs to figure out what movies you like, so you give her a bunch of
movies and tell her whether you liked each one or not (i.e., you give her a
labeled training set). Then, when you ask her if she thinks you'll like movie X
or not, she plays a 20 questions-like game with IMDB, asking questions like "Is
X a romantic movie?", "Does Johnny Depp star in X?", and so on. She asks more
informative questions first (i.e., she maximizes the information gain of each
question), and gives you a yes/no answer at the end.

Thus, Willow is a decision tree for your movie preferences.
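
To make "maximizing the information gain" a bit more concrete, here is a minimal sketch in plain Python. The movies, the questions and the helper names (entropy, information_gain) are made up for illustration, and this is not how scikit-learn builds its trees internally; it only shows why one question can be more informative than another.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(answers, labels):
    """Entropy reduction obtained by splitting `labels` on yes/no `answers`."""
    yes = [label for answer, label in zip(answers, labels) if answer]
    no = [label for answer, label in zip(answers, labels) if not answer]
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in (yes, no) if part)
    return entropy(labels) - weighted


# Hypothetical training set of four movies: did you like each one?
liked = [True, True, False, False]

# Hypothetical answers to two questions about the same four movies
is_romantic = [True, True, False, False]  # separates liked from disliked
stars_depp = [True, False, True, False]   # says nothing about your taste

gain_romantic = information_gain(is_romantic, liked)
gain_depp = information_gain(stars_depp, liked)
print(gain_romantic, gain_depp)
```

In this toy set Willow would ask "Is X a romantic movie?" first, because that question has the larger gain.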

But Willow is only human, so she doesn't always generalize your preferences
very well (i.e., she overfits). In order to get more accurate recommendations,
you'd like to ask a bunch of your friends, and watch movie X if most of them say
they think you'll like it. That is, instead of asking only Willow, you want to
ask Woody, Apple, and Cartman as well, and they vote on whether you'll like a
movie (i.e., you build an ensemble classifier, aka a forest in this case).

Now you don't want each of your friends to do the same thing and give you the
same answer, so you first give each of them slightly different data. After all,
you're not absolutely sure of your preferences yourself -- you told Willow you
loved Titanic, but maybe you were just happy that day because it was your
birthday, so maybe some of your friends shouldn't use the fact that you liked
Titanic in making their recommendations. Or maybe you told her you loved
Cinderella, but actually you really really loved it, so some of your friends
should give Cinderella more weight. So instead of giving your friends the same
data you gave Willow, you give them slightly perturbed versions. You don't
change your love/hate decisions, you just say you love/hate some movies a little
more or less (you give each of your friends a bootstrapped version of your
original training data). For example, whereas you told Willow that you liked
Black Swan and Harry Potter and disliked Avatar, you tell Woody that you liked
Black Swan so much you watched it twice, you disliked Avatar, and don't mention
Harry Potter at all.

By using this ensemble, you hope that while each of your friends gives
somewhat idiosyncratic recommendations (Willow thinks you like vampire movies
more than you do, Woody thinks you like Pixar movies, and Cartman thinks you
just hate everything), the errors get canceled out in the majority. Thus, your
friends now form a bagged (bootstrap aggregated) forest of your movie
preferences.
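
The two ingredients of bagging described above, resampling the training data and letting the models vote, can be sketched in a few lines of plain Python. The helper names (bootstrap_sample, majority_vote) and the friends' answers are assumptions for illustration only.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data, with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(votes):
    """Return the answer that was given most often."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
movies = ["Black Swan", "Harry Potter", "Avatar", "Titanic", "Inception"]

# Every friend gets their own resampled version of the same training set:
# some movies may show up twice, others not at all.
samples = {friend: bootstrap_sample(movies, rng)
           for friend in ["Willow", "Woody", "Apple", "Cartman"]}

# Hypothetical yes/no answers of the four friends for one new movie
verdict = majority_vote([True, True, False, True])
print(samples["Woody"], verdict)
```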

There's still one problem with your data, however. While you loved both
Titanic and Inception, it wasn't because you like movies that star Leonardo
DiCaprio. Maybe you liked both movies for other reasons. Thus, you don't want
your friends to all base their recommendations on whether Leo is in a movie or
not. So when each friend asks IMDB a question, only a random subset of the
possible questions is allowed (i.e., when you're building a decision tree, at
each node you use some randomness in selecting the attribute to split on, say by
randomly selecting an attribute or by selecting an attribute from a random
subset). This means your friends aren't allowed to ask whether Leonardo DiCaprio
is in the movie whenever they want. So whereas previously you injected
randomness at the data level, by perturbing your movie preferences slightly, now
you're injecting randomness at the model level, by making your friends ask
different questions at different times.

And so your friends now form a random forest.
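
The "random subset of the possible questions" can be sketched as follows. The question names and the helper candidate_questions are hypothetical, and allowing roughly the square root of the number of questions per split is only a common heuristic, not something prescribed by the story above.

```python
import random


def candidate_questions(questions, k, rng):
    """Pick k distinct questions that may be asked at this node."""
    return rng.sample(questions, k)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
questions = ["is_romantic", "stars_dicaprio", "stars_depp", "is_animated",
             "is_sequel", "longer_than_2h", "made_by_pixar", "has_vampires",
             "won_oscar"]

# Heuristic: allow about sqrt(number of questions) per split
k = int(len(questions) ** 0.5)

# A fresh subset is drawn at every node, so "stars_dicaprio" is only
# available some of the time.
allowed_at_node_1 = candidate_questions(questions, k, rng)
allowed_at_node_2 = candidate_questions(questions, k, rng)
print(allowed_at_node_1, allowed_at_node_2)
```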

A Forest of Decision Trees

Random Forests combine
a 'forest' of decision trees, each trained on a random subset of the training
data

So what's a decision tree? Well, here's an example:

Then you train a whole 'forest' of these decision trees on random subsets of
your data so they don't all learn the same features and they don't overfit

Random Forests in Action

Most machine learning algorithms implemented in scikit-learn expect a NumPy
array as input X. The expected shape of X is (n_samples, n_features).

Supervised Learning also needs an array y as input: the labels of the data.

We'll also split the data in a training and testing set.

Never ever ever ever learn on test data

The classifiers learn the patterns in your data and if you test how well the
classifier works on the data you used to train it, of course it's going to work
well. But then you present it with new data and suddenly your performance can go
down the drain. Even better would be to split your training data even
further (check out cross-validation) and take the model that
performs best on all the different splits of your data. Seriously, go read the
docs on this.
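
As a rough illustration of what "splitting even further" means, here is a plain-Python sketch that first sets aside a test set and then cuts the remaining training indices into folds. The function names and the 80-20 / 5-fold numbers are assumptions for the example; in practice you'd use the cross-validation helpers that ship with scikit-learn.

```python
import random


def train_test_split_indices(n_samples, test_fraction, rng):
    """Shuffle the sample indices and split them into (train, test)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k):
    """Yield (train, validation) lists; every index validates exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_idx, test_idx = train_test_split_indices(100, 0.2, rng)

# The test indices are never touched while comparing models
folds = list(k_fold_indices(train_idx, 5))
for train, validation in folds:
    print(len(train), len(validation))
```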

Normally you can take 50-50 splits or 80-20, especially if you want to do Cross
Validation, but this seemed to work just fine.

As you can see, there are 32 features which basically describe the following:

max_force = the peak of the force that paw applied to the ground. The ratio
between front and hind paws seems to be about 60:40, so this should give it a way
to estimate whether it's a front or hind paw.

max_surface = the peak of the contact surface area of the paw. Again, the
front paws are almost always a bit larger, probably because they also have to
bear more force and are compressed more, so again a nice feature for separating
front and hind paws.

max_duration = the number of frames the paw was in contact with the ground.
This isn't particularly useful, but it might put the difference in frames
between paws in perspective

Then for each paw, we look backwards and forwards 2 paws and calculate the same
features (f, s, m) but we also add the distance in x (width), y (length) and z
(frames). This basically tells you where the other paw was located relative to
the current one. Given that the pattern is highly repeatable, I have high hopes
this will work well.

For the first two and last two paws, there may not be 1 or 2 paws in front or
behind it, so I'll fill those numbers with NaN's for now.
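
To illustrate the idea of looking 2 paws backwards and forwards and padding with NaN, here is a simplified sketch. The dictionary keys (force, surface, duration, x, y, frame) and the function neighbour_features are hypothetical, and the sketch produces 27 values per paw, so it does not try to reproduce the exact 32 columns used in this analysis.

```python
from math import isnan, nan


def neighbour_features(paws, offsets=(-2, -1, 1, 2)):
    """Build one feature row per paw from its own values and its neighbours.

    `paws` is a chronologically ordered list of dicts with the hypothetical
    keys force, surface, duration, x, y and frame.
    """
    own_keys = ("force", "surface", "duration")
    rows = []
    for i, paw in enumerate(paws):
        row = [paw[key] for key in own_keys]
        for offset in offsets:
            j = i + offset
            if 0 <= j < len(paws):
                other = paws[j]
                row += [other[key] for key in own_keys]
                # Where was the other paw relative to the current one?
                row += [other["x"] - paw["x"],
                        other["y"] - paw["y"],
                        other["frame"] - paw["frame"]]
            else:
                # No paw that far before/after this one: pad with NaN
                row += [nan] * 6
        rows.append(row)
    return rows


# Five made-up paws in chronological order
paws = [{"force": 10.0 + i, "surface": 5.0, "duration": 20.0,
         "x": float(i % 2), "y": 10.0 * i, "frame": 15.0 * i}
        for i in range(5)]
rows = neighbour_features(paws)

# Only paws with two neighbours on both sides have no NaN's
complete_rows = [row for row in rows if not any(isnan(v) for v in row)]
print(len(rows), len(complete_rows))
```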

The features

So for the yellow encircled paw, we calculate the x and y distances. The white
line connecting the paws shows the chronological order.

Here I added a plot of the total force (for the whole paw) for each frame. Red
are the right front paws, green are the left front paws. For now, we'll just use
the peak value.

from sklearn.ensemble import RandomForestClassifier

# Create a class instance without any hyperparameters, we'll get to that later
rf = RandomForestClassifier()

# Drop any rows that contain NaN's once, features and label together,
# so the features and the labels are guaranteed to stay aligned
train = X_train[cols + ["label"]].dropna()
test = X_test[cols + ["label"]].dropna()

# Fit a model on the training data, using only the feature columns
rf.fit(train[cols], train["label"])
score = rf.score(test[cols], test["label"])
print("Score: {:.2f}".format(score))
Score: 0.93

Ways to improve performance

We scored 93%! Which is most likely better than the interns that labeled the
data for me! Apparently there are such strong patterns in the data that it
already works really well on such a small data set.

Since the Random Forest is indeed random, the score I get above seems to
vary slightly each time I run it. There are probably flags to prevent this
behavior, but it's more fun this way :-)

My example data has a lot of samples (~800, so about 25%) that contain
NaN, so we'll replace them with the mean using Imputation. You
could get more fancy and use regression to predict what values to replace
them with, but we'll keep it simple. Actually, the way I'm imputing the values
is wrong, because I'm replacing them with the mean of each column over all dogs,
even though there are huge differences based on the size of the dog. A better way
would be to group the dataframe by dog and then impute the values, but again:
let's keep it simple.
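
For the curious, this is roughly what imputing per dog instead of over the whole column would look like. It's a plain-Python sketch with made-up numbers and a hypothetical helper impute_by_group, not the code used for the results here; with a dataframe you would group by dog instead.

```python
from collections import defaultdict
from math import isnan, nan


def impute_by_group(rows, group_key, value_key):
    """Replace NaN in value_key with the mean of the same group.

    Assumes every group has at least one value that is not NaN.
    """
    observed = defaultdict(list)
    for row in rows:
        if not isnan(row[value_key]):
            observed[row[group_key]].append(row[value_key])
    means = {group: sum(values) / len(values)
             for group, values in observed.items()}
    return [dict(row, **{value_key: means[row[group_key]]})
            if isnan(row[value_key]) else dict(row)
            for row in rows]


# Made-up data: a small and a large dog with very different forces
rows = [
    {"dog": "small", "max_force": 10.0},
    {"dog": "small", "max_force": 14.0},
    {"dog": "small", "max_force": nan},
    {"dog": "large", "max_force": 90.0},
    {"dog": "large", "max_force": nan},
]
imputed = impute_by_group(rows, "dog", "max_force")
print([row["max_force"] for row in imputed])
```

The mean over all dogs would have been 38 for both missing values, which fits neither the small nor the large dog.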

We also ran the Random Forest without hyperparameters, so we'll use better
settings. I got these using a Grid Search that checked 216 different
combinations of hyperparameters on a slightly bigger portion of the data. Grid
Search can also perform Cross-Validation, so I'm relatively confident that these
hyperparameters will give a better result
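
To give an idea of how quickly a grid grows to 216 combinations, here is a sketch that enumerates a grid by hand. The grid values are purely hypothetical (picked so that 6 x 6 x 6 = 216) and are not the settings I actually searched; toy_score is a stand-in for the cross-validated score a real Grid Search would compute.

```python
from itertools import product

# Hypothetical grid, NOT the one used for the results in this post
param_grid = {
    "n_estimators": [10, 50, 100, 200, 500, 1000],
    "max_depth": [None, 5, 10, 20, 40, 80],
    "min_samples_split": [2, 4, 8, 16, 32, 64],
}


def iter_combinations(grid):
    """Yield one dict per combination of hyperparameter values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, evaluate):
    """Return the (score, params) pair with the highest score."""
    return max(((evaluate(params), params) for params in iter_combinations(grid)),
               key=lambda pair: pair[0])


def toy_score(params):
    """Stand-in for a cross-validated score; higher is better."""
    return -abs(params["n_estimators"] - 100) - params["min_samples_split"]


n_combinations = sum(1 for _ in iter_combinations(param_grid))
best_score, best_params = grid_search(param_grid, toy_score)
print(n_combinations, best_score, best_params)
```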

Evaluating the results

This time we score slightly higher: 95%, but now on 100% of the dataset.
It seems it systematically gets certain samples wrong, so tuning the
parameters and imputing values won't really help here.

By looking at the confusion matrix, we see on the diagonal that it did a kick
ass job of predicting most samples right. Interestingly enough, there are also
some off-diagonal cells with 30-60 errors each. It turns out that sometimes when
you give it a Right Front (RF) sample, it gets labeled as a Left Front (LF) sample
or vice versa. The same thing happens for the hind paws. So clearly, it would
help to have better features that distinguish between left and right.
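
If you've never built a confusion matrix yourself, this small sketch shows what goes into one. The labels below are made up and much fewer than in the real data; the helper confusion_matrix is a plain-Python stand-in for the one in scikit-learn.

```python
from collections import Counter

LABELS = ["LF", "LH", "RF", "RH"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are the true labels, columns are the predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(true, pred)] for pred in labels] for true in labels]


# Made-up labels, not the data from this analysis
y_true = ["LF", "LF", "RF", "RF", "LH", "RH", "RH", "LH"]
y_pred = ["LF", "RF", "RF", "RF", "LH", "RH", "LH", "LH"]

matrix = confusion_matrix(y_true, y_pred)
# Everything on the diagonal was predicted correctly
accuracy = sum(matrix[i][i] for i in range(len(LABELS))) / len(y_true)

for label, row in zip(LABELS, matrix):
    print(label, row)
print(accuracy)
```

Where the left/right mix-ups end up in the matrix depends on the order in which you list the labels.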

I'm also curious how well it performs on each separate dog in our testing set,
perhaps there are some outliers.

For each dog, I printed out their weight and their mean/std score. This was
calculated on each measurement, perhaps there are just some measurements where
it performs poorly.

If the score dropped below 90% I added a small <- behind it and as we can see
there are at least 4 dogs where it performs more poorly. However, for three of
these dogs it has to classify between 141 and 178 samples, so getting at least 80%
of those right is still pretty impressive. On our biggest (or at least heaviest)
dog it got 87% right, but out of only 59 measurements. This dog is probably too
big for the plate, so you get far fewer samples out of each measurement. Also,
a lot of his steps are on the edge of the plate, which I didn't try to predict
in this analysis.

I also looked up the worst measurement, but again randomness made sure that it
didn't align with the example I cooked up below. Because it has some incomplete
steps, it was a bit of a hassle to show that one, so we'll show one of
his other measurements instead:

The white line again connects the paws in chronological order. The color on the
inside is what the algorithm predicted, the color on the outside is what it
should have been. It basically just tries LH, RF everywhere, so by chance it got
that right at least once.

The problem seems to be that the dog walked diagonally, which causes left and
right to cross over. His previous and next right front paw don't even hit the
plate, because his strides are that long, so it's not hard to imagine why the
classifier got it wrong. Obviously, adversarial samples like this can give hints
about what features I should incorporate to make it more robust.

Take home message

Machine Learning isn't going to bite you. While there are plenty of mistakes you
can make and things that can go wrong, as I've hopefully shown you can get a
really good performing classifier with relative ease.

Sadly the book is really basic, so while it was helpful for getting the hang of the syntax, I could have learned that from the MongoDB documentation. They reuse a lot of code between examples, which makes them rather repetitive, and none of the examples really show off the benefits of having a document store like MongoDB. Furthermore, the book is only 53 pages, which really doesn't justify the price. So I'd skip this one and just go for the online documentation and experimenting yourself.

I recently decided to try and create a web app and picked Tornado as my web server, because it is also being used in IPython.
I like learning new tools by reading books about them, so I got my hands on a copy of Introduction to Tornado and got started.

The book is pretty thin, which I think in this case is a good thing. It's not meant to exhaustively describe all the features Tornado has to offer, but rather to be a gentle introduction. The book covers all of the important elements to get you started:

Creating templates

Extending templates with Javascript and CSS

Interacting with databases (MongoDB in this case)

Making your web app asynchronous

Basic security features and authentication

Signing in with Twitter and Facebook's OAuth

The book features several nice examples, like a shopping cart for a bookstore that asynchronously keeps tabs on how many items are remaining, a simple Twitter client displaying your latest tweets and a Facebook client showing your timeline, both dealing with authentication. Most examples worked pretty well, though I had some issues getting the Twitter client working, because of errors I made in the callback URL on localhost. I didn't get the Facebook example working for the same reason, but it's not a big issue.

Overall, I found it a pretty useful book. While I was already somewhat familiar with web apps through Udacity and Coursera courses, it was good to get a more formal explanation of topics like routing, handlers and templates. I also liked the way they explained what each part of the code did, instead of assuming you had already figured it out. So while it's a short read, I think it's a nice introduction to Tornado to get you going.

On the top of my bucket list is fixing a ton of bugs or issues with regards to the workflow, that should make the app run more stable. A lot of things I want to change are caused by common beginner mistakes. Even though I’ve read the Pragmatic Programmer and tried to take a lot of their comments to heart, it’s funny how poorly I understood what it all meant until I really encountered a situation where it applied. Jeff Atwood posted a nice summary of the book here.

Some of the things I’d like to fix are:

Refactoring my early code to be more up to date based on my current experience. This applies to much of the main panel and how the GUI is controlled by PubSub.

Make the code more self-contained. Even though I tried to apply MVC by separating the GUI from the calculations and the database, when I started out a lot of database functions would pass along a message back to the GUI. You can imagine that later on when you want to reuse the same database functions, the GUI got called as well creating all sorts of nasty side effects.

Because I didn’t fully understand how to use debugging, I cheated by simply adding print statements to each function. Obviously that doesn’t scale (though it helped me understand my workflow tremendously), so instead I’d like critical functions to send a PubSub message to which I can subscribe. Then I can simply add a Settings option to print these messages from one central location or simply unsubscribe from them.

Currently when I change the database, I manually change my own MySQL database and then figure out how to replicate it. Up until now I’d advise my one user not to invest a large amount of time in annotating contacts, because she’d risk having all that effort go to waste when I decided to update her table. In the future I’d like to make this easier for the user, especially the non-technical ones, by only dropping tables if it’s trivial to compute the results again, and by updating the database (rather than dropping it) when it’s an important table.

Expose several database actions/settings and more general settings in the interface, like making a backup of the database, tweaking the ratios of the color map or changing the default location where the measurement data can be found.

Add exceptions or make sure the code doesn’t have any expectations about the data. A good example was that my results first required each subject to have 4 paws, else the code wouldn’t run. But some dogs or cats are amputees and humans obviously only have 2 feet, so I did my best to remove any of these expectations. However, I’m sure there are still some of these assumptions hidden in the code waiting to come across a trial that doesn’t meet them.

Document more! This is one major area where I’ve slacked off, telling myself that it’s just a waste of time and didn’t feel like wasting it. However, after spending several days trying to hunt down obscure bugs, because I didn’t fully remember what function triggered what other functions, I’ve definitely changed my mind. Another great advantage I’ve found was that when I write out what I want the code to do, I spot errors in my thinking much faster and get a much better grasp of what I actually want the code to achieve. On top of all this, I want to look into a library that takes all my doc strings and uses that to create proper documentation.
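
The idea from the list above of replacing print statements with messages I can subscribe to could look something like this. MessageBus is a tiny stand-in written for this sketch, not the actual PubSub library, and the topic and function names are made up.

```python
from collections import defaultdict


class MessageBus:
    """Tiny stand-in for a publish/subscribe library."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, listener):
        self._subscribers[topic].append(listener)

    def unsubscribe(self, topic, listener):
        self._subscribers[topic].remove(listener)

    def publish(self, topic, **data):
        # Iterate over a copy so listeners may unsubscribe while being called
        for listener in list(self._subscribers[topic]):
            listener(**data)


bus = MessageBus()
log = []


def record(message):
    """Central place that decides what happens with debug messages."""
    log.append(message)


def load_measurement(name):
    # The function only announces what it does; it never prints or
    # touches the GUI itself.
    bus.publish("debug", message="loading {}".format(name))
    return name.upper()


bus.subscribe("debug", record)
load_measurement("trial_1")
bus.unsubscribe("debug", record)
load_measurement("trial_2")  # nobody is listening any more
print(log)
```

Turning the debug output on or off then becomes a matter of subscribing or unsubscribing in one place, which is also what a Settings option could toggle.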

Furthermore, I’d like to keep adding features that either improve the usability or that make the app more useful. Examples of this are:

Allow the user to edit objects, rather than requiring them to delete one and create a new one. This applies to simple things like adding measurements after a session has been saved, tweaking a single zone location or even being able to redo it without having to recalculate all the results.

Allow the user to drag the zone’s square to the right position rather than requiring them to use the directional keys on the keyboard. Not everyone prefers using a keyboard, so they shouldn’t be forced to use it.

Make it easier to annotate the contacts when the keyboard lacks a numpad. While the numpad is definitely the fastest way to annotate them, laptops generally lack one and pressing Fn while trying to find the keys isn’t as easy. I used to have a version where you could click on the average contacts, but the event required to do so (EVT_CHILD_FOCUS) had the nasty habit of not being very reliable and getting called more often. One ‘shortcut’ would be to make a button and display the imshow() image as the face of the button.

Create a function to manually override the paw detection in a given slice of the entire plate, so that if the automagic detection fails, the user can try to fix it.

Allow different shapes, sizes and number of zones. This should make the application more flexible for measuring other kinds of data, such as humans, horses or elephants (you have to think big!).

There are also features I’d like to add that allow for more data exploration and analysis:

Currently I don’t remove any outliers (other than ignoring incomplete contacts or those that didn’t get recognized properly), so it would be helpful if you could clean up the results before calculating all the definitive results. For example by displaying histograms with the distribution of certain variables or plotting all contacts in one graph. By listing all contacts and allowing the user to delete them if necessary, they can perform any required data cleaning.

At the moment you can only analyze one protocol at a time (while the graphs do allow you to plot both at the same time), but obviously comparing them would be very interesting. As I’ve experienced in the past, just displaying two graphs next to each other is not comparing. It takes a lot of experience to interpret the differences especially without a clear frame of reference.

Another thing that I’d like to analyze are differences on a population level. Even though I already experimented with this, I wasn’t happy with the end result. Simply displaying multiple graphs over one another wasn’t really useful and since the dogs were subdivided into weight groups there was a lot of variety in numbers. Another issue was averaging dogs with a ‘normal’ and amble gait pattern. As you can see in this figure, the step lengths between the left front and right hind paw (bottom right) were either negative (right hind being behind the left front paw) or positive. However, when you calculate an average, you get a value around -10 which doesn’t really describe either pattern. Clearly there’s a need for some more diligence when segmenting the data.

I don’t only want to display the population data alone, but rather compare any dog with the ‘normal’ values based on the medical history of the dog. That way its results are much easier to quantify, because you get a sense of what they should be, so spotting abnormalities should be a lot easier.
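
The averaging problem described above is easy to reproduce with made-up numbers. The step lengths below are invented for illustration (they are not my measurements) and the group names are placeholders; the point is only that the mean of two clearly separated groups lands in a region where neither group has any data.

```python
from statistics import mean

# Invented step lengths between the left front and right hind paw
negative_group = [-32.0, -30.0, -28.0]  # right hind behind the left front
positive_group = [8.0, 10.0, 12.0]      # right hind in front of the left front

# Averaging everything together describes neither pattern
pooled = mean(negative_group + positive_group)

# Averaging per gait pattern keeps both patterns visible
per_group = {"negative": mean(negative_group),
             "positive": mean(positive_group)}
print(pooled, per_group)
```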

Lastly there are several things that I’d like to work on in the future, but that will require me to learn a lot more. First on the bucket list is:
- Reading more books, first of all finishing Code Complete. I’ve got plenty of interesting books, I just haven’t had the time the past few months to try and read them. On top of that, in most cases it’s best to have a small pet project to try out all the new things you’re trying to learn. This obviously doesn’t go so well if you’re under any time constraints.
- Experiment with OpenGL, because though I have an animation working, that library doesn’t work with wxPython. So I’d love to learn some OpenGL so I can learn how to draw to a Canvas myself, without the need of a library to do the heavy lifting.
- I’d love to mess around with Microsoft’s Kinect. Not only because it’s a fun gadget, but also because the kinematic data could be a great asset to the gait analysis. First off, it allows me to measure the angles and make estimations about the moments around the joints. Second, it tells me which paw is where at what point in time, so a synchronized Kinect + pressure measurement would make it much easier to automate the paw detection. But obviously Kinect requires some OpenGL knowledge (for performant displaying), OpenCV knowledge for interpreting the image data and brushing up on my old courses on Inverse Kinematics... So I’m still a long way off ever getting this to work.
- Try out if sorting results is any better when using MongoDB. Now I know that NoSQL isn’t the solution to all my database problems, but the problem I have with MySQL is how it requires me to break my data into pieces and stitch them back together when I need them again. I’d much rather leave things as they are and skip tedious parsing loops every time I need a different result. Furthermore, requiring a schema is an absolute pain in the ass when your design isn’t set in stone. I’ve spent nearly as much time writing code to ‘build’ a database as I needed to put things in the database. Of course, I don’t intend to break something that’s already working, so I’m probably going to work on a small pet project to see if I like this any better.
- Given that I’m writing scientific software I need to be pretty darn sure whether the results are correct. I’d love to build in a dummy data set that can function as a ‘Mock’ object and allow me to check whether the results are correct. This should allow me to catch errors with my calculations that aren’t detectable with the human eye. Especially when you have highly dimensional data, errors easily slip in without a good way to spot them early on.
- Above all else, I desperately want to be able to manage measuring myself! Currently I first need to do measurements in the vendor’s software, without any way to tell if the measurement went ok, then export it from their software and import it into mine. You can understand that most clinicians would find this process far too laborious and decide not to use my app. On top of that, the new drivers allow for continuous measuring in contrast to the 2 second limit of the current software version. This would allow me to greatly increase the pace at which the measurements can be performed and analyzed. However, I first need to convince the vendor to give me this access…
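
One way the dummy data set idea from the list above could work is to generate a contact for which the right answers are known up front, and then check that the calculations reproduce them. Everything in this sketch (the triangular force curve and the helper names) is hypothetical and much simpler than the real calculations.

```python
def make_dummy_contact(n_frames, peak_frame, peak_force):
    """Force curve that rises to a known peak and falls off again."""
    forces = []
    for frame in range(n_frames):
        distance = abs(frame - peak_frame)
        forces.append(max(0.0, peak_force * (1 - distance / n_frames)))
    return forces


def max_force(forces):
    """Peak of the total force over all frames."""
    return max(forces)


def contact_duration(forces):
    """Number of frames in which the paw applied any force."""
    return sum(1 for force in forces if force > 0)


# Because we built the data ourselves, we know what the results should be
dummy = make_dummy_contact(n_frames=20, peak_frame=8, peak_force=100.0)
assert max_force(dummy) == 100.0
assert dummy.index(max_force(dummy)) == 8
assert contact_duration(dummy) == 20
print("all checks passed")
```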

As you can see there’s more than enough work for me in store! On top of all this I have to await what the clinic thinks of my first version and how useful they find it. But I’ll be sure to keep you guys up to date on any progress I’m making. If you have any questions on how or why I did something a certain way, be sure to drop a comment!