Tag: machine learning

This week I worked on automated email analysis and storage for side-project #38465 (more on this in future editions) and on bits of UI for a wxPython desktop app (yes desktop app! some of us fortunately still get to make them!) for my current main work project.

Had to make a screencast to demonstrate a milestone deliverable of the above-mentioned main project. Making screencasts is an obscure but longstanding hobby of mine, but I needed to level up slightly, so the business bought me ScreenFlow 7.2. For the first time ever, I recorded the screencast in multiple segments and did the voice-over later. Soon these new skillz will trickle down to my publicly available screencasts.

On that topic, having a good microphone is crucial, not only for screen recordings but also for video meetings. I recently acquired the Samson Go Mic to complement my larger Samson C01U. The Go is brilliant: Recorded voice quality comes close to the C01U in spite of the Go’s compact form factor, and it has a hardware switch to select either of the built-in microphone elements: omni-directional, for meetings, or cardioid, for more dedicated voice recording.

Ironically, an ex-colleague posted “How to Fix Facebook—Before It Fixes Us” on Facebook, a long and worthwhile read on how FB is used to spread fake news that effectively manipulates public opinion, and what should be done to remedy this. Here is a choice quote to get you started:

We still don’t know the exact degree of collusion between the Russians and the Trump campaign. But the debate over collusion, while important, risks missing what should be an obvious point: Facebook, Google, Twitter, and other platforms were manipulated by the Russians to shift outcomes in Brexit and the U.S. presidential election, and unless major changes are made, they will be manipulated again. Next time, there is no telling who the manipulators will be.

In the same vein, I continuously try to spend as few minutes as possible on YouTube, but the one thing I will definitely continue watching is Károly Zsolnai-Fehér’s brilliant Two Minute Papers channel! Most recently, his treatment of Distilling a Neural Network Into a Soft Decision Tree, a paper by Nicholas Frosst and Geoffrey Hinton, caught my interest. In this, they address the problem of neural network explicability (it’s hard to say at a higher level why a neural network makes a particular decision) by deriving a soft decision tree from the trained neural network. The tree is not as accurate as the network, but is able to give plausible explanations for the network’s decisions. See the 4 minute long two minute paper video (hehe) here:

I came across the following on reddit, again quite ironically, and I have since taken to saying it to my genetic offspring units (GOUs) at every possible opportunity:

The week of Monday February 13 to Sunday February 19, 2017 might have appeared to be really pretty boring to any inter-dimensional and also more mundane onlookers.

(I mention both groups, because I’m almost sure I would have detected the second group watching, whereas the first group, being interdimensional, would probably have been able to escape detection. As far as I know, nobody watched.)

I just went through my orgmode journals. They are filled with a mix of notes on the following mostly very nerdy and quite boring topics.

Warning: If you’re not an emacs, python or machine learning nerd, there is a high probability that you might not enjoy this post. Please feel free to skip to the pretty mountain at the end!

Advanced Emacs configuration

Prelude is a fantastic Emacs “distribution” (it’s a simple git clone away!) that truly upgrades one’s Emacs experience in terms of look and feel, and functionality. It played a central role in my return to the Emacs fold after a decade-long hiatus spent with JED, VIM (there was more really weird stuff going on during that time…) and Sublime.

However, it’s a sort of rite of passage constructing one’s own Emacs configuration from scratch, and my time had come.

In parallel with Day Job, I extricated Prelude from my configuration, and filled up the gaps it left with my own constructs. There is something quite addictive about using emacs-lisp to weave together whatever you need in your computing environment.

To celebrate, I decided that it was also time to move my todo system away from todoist (a really great ecosystem) and into Emacs orgmode.

I had sort of settled with todoist for the past few years. However, my yearly subscription is about to end on March 5, and I’ve realised that with the above-mentioned Emacs-lisp weaving and orgmode, there is almost unlimited flexibility also in managing my todo list.

Anyways, I have it set up so that tasks are extracted right from their context in various orgfiles, including my current monthly journal, and shown in a special view. I can add arbitrary metadata, such as attachments and just plain text, and more esoteric tidbits such as live queries into my email database.

The advantage of having the bulk of the tasks in my monthly journal is that I am forced to review all of the remaining tasks at the end of the month before transferring them to the new month’s journal.

We’ll see how this goes!

Jupyter Notebook usage

Due to an interesting machine learning project at work, I had a great excuse to spend some quality time with the Jupyter Notebook (formerly known as IPython Notebook) and the scipy family of packages.

However, the initial exhilaration quickly fizzled out as EIN (the Emacs IPython Notebook client) exhibits some flakiness (primarily broken indentation in cells, which makes it hard to interact with), and I had no time to try to fix it or work around it, because day job deadlines. (When I have a little more time, I will have to get back to the EIN! Apparently they were planning to call this new fork Zwei. Now that would have been awesome.)

So it was back to the Jupyter Notebook. This time I made an effort to learn all of the new hotkeys. (Things have gone modal since I last used this intensively.)

The Notebook is an awe-inspiringly useful tool.

However, the cell-based execution model definitely has its drawbacks. I often wish to re-execute a single line or a few lines after changing something. With the notebook, I have to split the cell at the very least once to do this, resulting in multiple cells that I now have to manage.

In certain other languages, which I cannot mention anymore because I have utterly exhausted my monthly quota, you can easily re-execute any sub-expression interactively, which makes for a more effective interactive coding experience.

The notebook is a good and practical way to document one’s analytical path. However, I sometimes wonder if there are less linear (graph-oriented?) ways of representing the often branching routes one follows during an analysis session.

Dissimilarity representation

Some years ago, I attended a fantastic talk by Prof. Robert P.W. Duin about the history and future of pattern recognition.

In this talk, he introduced the idea of dissimilarity representation.

In much of pattern recognition, it was pretty much the norm that you had to reduce your training samples (and later unseen samples) to feature vectors. The core idea of building a classifier is constructing hyper-surfaces that divide the high-dimensional feature space into classes. An unseen sample can then be positioned in feature space, and its class simply determined by checking on which side of the hyper-surface(s) it finds itself.
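As a rough illustration of that "which side of the hyper-surface" check, here is a minimal sketch for the simplest case, a flat hyperplane in two dimensions. The weights, bias and sample points are made up for the example and not learned from any data; a real classifier would fit them during training.

```python
def side_of_hyperplane(weights, bias, sample):
    """Return +1 or -1 depending on which side of the hyperplane
    weights . x + bias = 0 the sample lies on."""
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    return 1 if score >= 0 else -1


# Hypothetical, hand-picked parameters: the separating line is x0 - x1 = 0.
weights = [1.0, -1.0]
bias = 0.0

# Two hypothetical unseen samples, already reduced to feature vectors.
below_the_line = [3.0, 1.0]
above_the_line = [1.0, 3.0]

print(side_of_hyperplane(weights, bias, below_the_line))
print(side_of_hyperplane(weights, bias, above_the_line))
```

The hard part in practice is not this check, but getting from raw samples to the feature vectors that it needs.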

However, for many types of (heterogeneous) data, determining these feature vectors can be prohibitively difficult.

With the dissimilarity representation, one only has to determine a suitable function that can be used to calculate the dissimilarity between any two samples in the population. Especially for heterogeneous data, or data such as geometric shapes for example, this is a much more tractable exercise.

More importantly, it’s often easier to talk with domain experts about similarity than it is to talk about feature spaces.
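To make the idea a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy categorical records, the labels, and the simple mismatch-fraction dissimilarity are invented, and this is not the setup from my work project. Each sample is described purely by its dissimilarities to a set of prototypes, and an unseen sample gets the label of the least dissimilar training sample.

```python
def mismatch_dissimilarity(a, b):
    """Fraction of categorical attributes on which two samples differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def to_dissimilarity_space(sample, prototypes, dissimilarity):
    """Represent a sample by its dissimilarities to each prototype."""
    return [dissimilarity(sample, p) for p in prototypes]


def nearest_neighbour_label(sample, training, dissimilarity):
    """Label of the training sample that is least dissimilar to `sample`."""
    closest_sample, closest_label = min(
        training, key=lambda pair: dissimilarity(sample, pair[0])
    )
    return closest_label


# Hypothetical toy data: (colour, shape, taste) records with made-up labels.
training = [
    (("red", "round", "sweet"), "fruit"),
    (("yellow", "long", "sweet"), "fruit"),
    (("green", "long", "bitter"), "vegetable"),
    (("green", "round", "bitter"), "vegetable"),
]
prototypes = [record for record, _ in training]

unseen = ("red", "long", "sweet")
print(to_dissimilarity_space(unseen, prototypes, mismatch_dissimilarity))
print(nearest_neighbour_label(unseen, training, mismatch_dissimilarity))
```

Note that no feature vector was ever designed here: swapping in a different kind of data only requires swapping in a different dissimilarity function.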

Due to the machine learning project mentioned above, I had to work with categorical data that will probably later also prove to be of heterogeneous modality. This was of course the best (THE BEST) excuse to get out the old dissimilarity toolbox (in my case, that’s SciPy and friends), and to read a bunch of dissimilarity papers that were still on my list.

Besides the fact that much fun was had by all (me), I am cautiously optimistic, based on first experiments, that this approach might be a good one. I was especially impressed by how much I could put together in a relatively short time with the SciPy ecosystem.

Machine learning peeps in the audience, what is your experience with the dissimilarity representation?

A mountain at the end

By the end of a week filled with nerdery, it was high time to walk up a mountain(ish), and so I did, in the sun and the wind, up a piece of the Kogelberg in Betty’s Bay.

At the top, I made you this panorama of the view:

Click for the 7738 x 2067 full resolution panorama!

At that point, the wind was doing its best to blow me off the mountain, which served as a visceral reminder of my mortality, and thus also kept the big M (for mindfulness) dial turned up to 11.

I was really only planning to go up and down in brisk hike mode due to a whiny knee, but I could not help turning parts of the up and the largest part of the down into an exhilarating lope.

From now on, I would like to limit WHVs to bullets (really) or to named sections, to ease reading. DOWN WITH WALLS OF TEXT!

After a multi-year, completely coincidental, break from medical imaging, I am back to The Real Business since the start of July. I am super excited about the plans we’re cooking up and executing. I can obviously not say too much, unless beer is involved, or you hang around here for muuuuuch longer. I think I am allowed to mention digital pathology and machine learning and beer.

Pro tip: When road tripping with more than 0 (zero) children (babies count double; sick babies +5 hit points), and you have to stay overnight somewhere, invest extra in the biggest suite you (or your children’s college fund) can afford.

On the beach in St Francis Bay (right in the middle of winter, you still seem to get these lovely balmy beach days), it seemed that everybody was surfing. Whole families, with the mom, the dad, all the kids, and grandma and grandpa, were all on various sizes of surfboards in the sea catching some waves.

Here’s a photo from the furthest point on what I call “Not The Ugliest Jogging Route in The World” (in St Francis Bay):

Last night I accidentally discovered that I can pinpoint the exact weekend and location when and where I first tasted my favourite trappist beer (EVER), namely Rochefort #8. It’s all written up in this 2003 post.

When Google sent me an email this weekend asking me exactly how I would like them to use my email (yes, a few months ago I migrated my mail empire back to Google because my self-hosting experiment had started to cost me time and money) to show me custom advertisements, I was reminded that I do actually find the machine learning models they’re building about me quite creepy, and that perhaps I would prefer not also handing them 12 years of emails to make their models more accurate. There and then I migrated said 12 years of emails from GMail to FastMail. So far I’m really impressed by the product, mostly due to the speed and the user experience of the web-app. There might be a more detailed post in the near future, let me know if you’re interested.

Most surprising and interesting (to me) new scientific discovery of the week: A team of scientists at Northeastern University in Boston have shown that one of the many kinds of bacteria living in your stomach eats exclusively GABA, a really important brain chemical (neurotransmitter) that plays a role in keeping you calm. Based on this and other work, it looks like the bacteria in your tummy, also known as your gastrointestinal microbiota, besides being crucially important to your digestive system and your general survival, probably also play quite an important role in your psyche. I find this slightly mind-blowing.

Have a great week kids, I hope to see you on the other side.

P.S. Those bullets made for quite an impressive wall of text, didn’t they?

About

This is my personal blog. You will find posts on science, software, general nerdery, privacy and backyard philosophy. It also hosts the Weekly Head Voices, a weekly (mostly not) personal diary, in which I usually also try to include something entertaining and/or educational.