5. The Space between the Data Set and the Algorithm
Many people go straight from a data set to applying an algorithm. But there’s a huge space in between of important stuff. It’s easy to run a piece of code that predicts or classifies. That’s not the hard part. The hard part is doing it well.
One needs to conduct exploratory data analysis as I’ve emphasized; and conduct feature selection as Will Cukierski emphasized.

I've highlighted the part of the post which describes exactly what we've been doing!

Unfortunately the accuracy on the cross validation set (10% of the training data) was only 24% which is pretty useless so it's back to the drawing board!

Our next task is to try and work out whether we can derive some features which have a stronger correlation with the label values or combining the new features with the existing pixel values to see if that has any impact.

As you can probably tell I don't really understand how you should go about extracting features so if anybody has ideas or papers/articles I can read to learn more please let me know in the comments!