Machine Learning, Entrepreneurship and more…

PyData happened in San Francisco two weeks ago and I’m happy to say that I was fortunate enough to be one of the speakers at this fine event. It was three exciting days of meeting interesting people and listening to insightful … read more →

Everybody who has taken a machine learning course probably knows the geometric intuition behind a support vector machine (SVM, great book): An SVM is a large-margin classifier. In other words, it maximizes the geometric distance between the decision boundary and the classes of samples. … read more →

A challenge that machine learning practitioners often face is how to deal with skewed classes in classification problems. Such a tricky situation occurs when one class is over-represented in the data set. A common example of this issue is fraud detection: … read more →

Half a year ago, I was working in the heart of Silicon Valley and attended many meetups and networking parties – yes, I would call them parties rather than events. It became obvious to me that I wanted to try … read more →

In one of my previous posts about Nutch, I already mentioned plugins. The plugin system is central to how Nutch works and allows you to customize Nutch to your personal needs in a very flexible and maintainable way. Everybody who … read more →

This is going to be an ongoing article series about various aspects of Machine Learning. In the first post of the series I’m going to explain why I decided to learn and use R, and why it is probably the best statistical software … read more →

As you might have noticed by now, one of my big interests is IT, especially new developments and trends within that area. It just so happens that in my spare time I like to do many kinds of sports and … read more →

After installing Nutch as described in my previous post, you can either follow this tutorial without having to think much, or get a sense of how Nutch actually works beforehand. I recommend doing both in parallel. And since you won’t find … read more →

Nutch is a flexible and powerful open-source tool for web crawling, developed by the Apache Software Foundation and its community. It builds on Apache Solr and comes with an integration of the highly popular Apache Hadoop, which actually started … read more →

About Me

Hi, I'm Florian Hartl. My main interests are data science, software engineering, health, and meaning. Originally from Bavaria, Germany, I currently live in Santa Monica where I work as a Data Scientist.