Peekaboo

Thursday, March 20, 2014

As the title suggests, this is a non-machine-learning, non-vision, non-Python post *gasp*.
Some people in my network posted about Spritz, a startup that recently came out of stealth mode. They built a pretty cool app for speed reading. See this Huffington Post article for a quick demo and explanation.
They say they are still in development, so the app is not available for the public.

The app seemed pretty neat but also pretty easy to build. I said as much, and people came back with "they probably do a lot of natural language processing and parse the sentence to align to the proper character" and similar remarks.
So I reverse engineered it. By which I mean opening the demo GIFs in GIMP and counting letters. And, surprise surprise: they just count letters. The letter they highlight (at least in the demo) depends only on the length of the word.
The magic formula is highlight = 1 + ceil(word_length / 4).
They might also be timing the transitions differently; I must admit I haven't really looked at that.
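If you want to check the formula yourself, it fits in a few lines of Python (this is just a sketch of my letter counting, not Spritz's actual code; positions are 1-based):

```python
from math import ceil

def highlight_position(word):
    # 1-based index of the letter to highlight, per the formula above
    return 1 + ceil(len(word) / 4)

for w in ["cat", "word", "reading", "understanding"]:
    print(w, highlight_position(w))
```

Short words get their second letter highlighted; only fairly long words push the highlight further right.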
After this revelation, I coded up my own little version in JavaScript.
Obviously the real value of an app like this is integration with various ecosystems, browsers, etc.
But if you are just interested in the concept, you can paste your favorite (or, given the nature of the app, maybe rather your least favorite?) e-book into my webpage and give it a shot.
The code is obviously on GitHub and CC0.
I wouldn't try to use it commercially, though, without talking to Spritz.

Monday, July 29, 2013

Yesterday a week-long scikit-learn coding sprint in Paris ended.
And let me just say: a week is pretty long for a sprint. I think most of us were pretty exhausted by the end. But we put together a release candidate for 0.14, which Gael Varoquaux tagged last night.

You can install it via:

pip install -U https://github.com/scikit-learn/scikit-learn/archive/0.14a1.zip

Tuesday, July 2, 2013

ICML has been over for two weeks now, but I still wanted to write about my reading list, as there were some quite interesting papers (the proceedings are here). Also, I haven't blogged in ages, for which I really have no excuse ;)

There are three topics I am particularly interested in that got a lot of attention at this year's ICML: neural networks, feature expansion and kernel approximation, and structured prediction.

Sunday, January 27, 2013

Some time ago I wrote about pystruct, a structured learning project I have been working on.
After lying dormant for a while, it has come quite a long way over the last couple of weeks, as I picked up work on structured SVMs again. So here is a quick update on what you can do with it.

To the best of my knowledge, this is the only tool with ready-to-use functionality for learning structural SVMs (or max-margin CRFs) on loopy graphs - even though this is pretty standard in the (computer vision) literature.

Friday, January 25, 2013

As you hopefully have heard, we at scikit-learn are doing a user survey (which is still open by the way).
One of the requests there was to provide some sort of flow chart on how to do machine learning.

Monday, January 21, 2013

After a little delay, the team finished work on the 0.13 release of scikit-learn.
There is also a user survey that we launched in parallel with the release, to get some feedback from our users.

There is a list of changes and new features on the website.
You can upgrade using easy_install or pip:

pip install -U scikit-learn

or

easy_install -U scikit-learn

There were more than 60 people contributing to this release, with 24 people having 10 commits or more.

Again, many improvements are behind the scenes or only barely noticeable. We improved test coverage a lot, and parameter names are much more consistent now. There is now also a user guide entry for the classification metrics, and their naming was improved.

This was one of the many improvements by Arnaud Joly, who joined the project very recently but nevertheless wound up with the second-most commits in this release!

Now let me get to some of the more visible highlights of this release from my perspective:

- Thanks to Lars and Olivier, the hashing trick finally made it into scikit-learn.
This allows for very fast vectorization of large text corpora using stateless transformers.

- Sample weights were added to the tree module thanks to Noel and Gilles. This enabled the implementation of a smarter resampling for random forests, which speeds up random forests by up to a factor of two! It is also the basis for including AdaBoost with trees in the next release.
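The hashing trick in the first bullet is easy to sketch in plain Python (a toy illustration of the idea, not scikit-learn's actual implementation, which uses a much faster hash): each token is mapped to a bucket by a hash function, so no vocabulary has to be stored and the transformer stays stateless.

```python
import hashlib

def hashed_vector(tokens, n_features=16):
    """Toy feature hashing: tokens are bucketed by hash value,
    so no vocabulary is stored and any stream of text can be
    vectorized into a fixed-size vector."""
    vec = [0] * n_features
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        sign = 1 if h & 1 else -1   # signed hashing reduces collision bias
        vec[(h >> 1) % n_features] += sign
    return vec

print(hashed_vector("the hashing trick is stateless".split()))
```

Because the mapping is a pure function of the token, two documents can be hashed independently (even on different machines) and still land in a consistent feature space.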
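The resampling speed-up in the second bullet can be illustrated in miniature: instead of materializing a bootstrap copy of the data for each tree, you draw per-sample counts once and hand them to the tree as sample weights (a toy sketch of the idea, not the actual scikit-learn internals):

```python
import random
from collections import Counter

def bootstrap_as_weights(n_samples, seed=0):
    # Draw a bootstrap sample of size n_samples, but return it as
    # per-sample counts (usable as sample weights) instead of
    # physically copying rows of the dataset.
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n_samples) for _ in range(n_samples))
    return [counts.get(i, 0) for i in range(n_samples)]

weights = bootstrap_as_weights(10)
print(weights, sum(weights))
```

The counts sum to n_samples by construction, so a tree trained with these weights sees the same statistics as one trained on a materialized bootstrap sample, without the copy.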

Wednesday, December 26, 2012

Recently we added another method for kernel approximation, the Nyström method,
to scikit-learn; it will be featured in the upcoming 0.13 release.
Kernel approximations were my first somewhat bigger contribution to
scikit-learn, and I have been thinking about them for a while.
To dive into kernel approximations, first recall the kernel trick.
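As a teaser, here is a minimal NumPy sketch of the Nyström idea for an RBF kernel (a toy illustration, not scikit-learn's implementation): sample m landmark points, and build an m-dimensional feature map whose inner products approximate the full kernel matrix.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # pairwise RBF kernel between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystroem_features(X, n_components, gamma=1.0, seed=0):
    rng = np.random.RandomState(seed)
    idx = rng.choice(len(X), size=n_components, replace=False)
    landmarks = X[idx]
    W = rbf(landmarks, landmarks, gamma)          # kernel among landmarks
    vals, vecs = np.linalg.eigh(W)                # W is symmetric PSD
    vals = np.maximum(vals, 1e-12)                # guard against tiny eigenvalues
    W_inv_sqrt = vecs @ (vecs / np.sqrt(vals)).T  # W^{-1/2}
    # feature map phi(X) with phi(X) @ phi(X).T ~= full kernel matrix
    return rbf(X, landmarks, gamma) @ W_inv_sqrt
```

When n_components equals the number of samples the approximation is exact (up to numerics); with fewer landmarks you trade accuracy for an m-dimensional explicit feature space that a linear model can consume.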