Worth Reading

Back in September last year, the Guardian published a fantastic visualisation of house price affordability in the United Kingdom. The raw data is easily available from data.gov.uk, which provides monthly, annual and complete-history downloads, so you can work with a reasonably sized subset before running on the complete data set. Recreating the Guardian’s data process in Apache Spark felt like a great way to get an introduction to the platform.

uvloop has been making waves in the Python world lately as a blazingly fast drop-in replacement for asyncio’s default event loop. Sanic is a Flask-like, uvloop-based web framework that’s written to go fast. Sanic is made for Python 3.5+. The framework allows you to take advantage of async/await syntax for defining asynchronous functions. With this, you can write async applications in Python similar to how you would write them in Node.js.
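To make the async/await style concrete, here is a minimal sketch that uses only the standard library's asyncio, not Sanic's own API. The `handle` coroutine and its simulated delay are hypothetical stand-ins for a request handler that waits on I/O; the point is only that several such handlers can wait at the same time on one event loop.

```python
import asyncio
import time


async def handle(request_id: int, delay: float) -> dict:
    """Hypothetical request handler: awaits simulated I/O, then returns a payload."""
    await asyncio.sleep(delay)  # stands in for a database or network call
    return {"request": request_id, "status": "ok"}


async def serve_batch() -> list:
    # Schedule three handlers at once; the event loop interleaves them
    # while each one is waiting on its simulated I/O.
    return await asyncio.gather(
        handle(1, 0.1),
        handle(2, 0.1),
        handle(3, 0.1),
    )


def run_demo():
    """Run the batch and report the results plus the wall-clock time taken."""
    start = time.perf_counter()
    results = asyncio.run(serve_batch())
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three handlers wait concurrently, the batch should finish in roughly the time of one delay rather than the sum of all three. `asyncio.run` needs Python 3.7 or later.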

This blog post describes three prototype solutions for the task of Named Entity Classification in the context of Booking.com. The aim is to present different approaches to the classification task, analyse their implementation and compare them in a small scale prototype use case. Sample code in Python is also provided in the following sections for each model described.

Most of us programmers go through technical interviews every once in a while. At other times, many of us sit on the opposite side of the table running these interviews. Stakes are high, emotions run strong, intellectual pressure builds up. I have found that an unfortunate code review may turn into something similar to a harsh job interview.

Datetimes are a headache to deal with in Python, especially when timezones are involved and when different machines use different locales. Maya exists to do all the hard work for you, so you can focus on what you're trying to do: import or export simple datetime data in known human- and machine-readable formats.
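For a sense of the bookkeeping involved, here is a rough sketch using only the standard library, not Maya's API. The `to_utc` helper is a hypothetical name introduced for illustration: it parses an ISO 8601 string, rejects naive values, and normalises to UTC before exporting in machine- and human-readable formats.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def to_utc(iso_text: str) -> datetime:
    """Parse an ISO 8601 string with an explicit offset and normalise it to UTC."""
    parsed = datetime.fromisoformat(iso_text)
    if parsed.tzinfo is None:
        # A naive datetime carries no offset, so its meaning depends on the machine.
        raise ValueError("refusing to guess the timezone of a naive datetime")
    return parsed.astimezone(timezone.utc)


# The same instant written with two different UTC offsets.
first = to_utc("2017-01-15T10:30:00+01:00")
second = to_utc("2017-01-15T04:30:00-05:00")

print(first == second)         # both strings name the same moment in time
print(first.isoformat())       # ISO 8601 export
print(format_datetime(first))  # RFC 2822 style export
```

Normalising to UTC at the boundary and only formatting on output is one common way to keep results identical across machines.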

Good Stuff

Pre-trained word vectors for 30+ languages. This project has two purposes. First, I'd like to share some of my experience with NLP tasks such as segmentation and word vectors. Second, and more importantly, some people are probably searching for pre-trained word vector models for non-English languages.

Tiny Gradient Boosting Tree. It is a tiny implementation of gradient boosting trees, based on the xgboost algorithm, and supports most features in xgboost. This project aims to help people get deeper insights into GBM, especially XGBoost. The current implementation has little optimization, so the code is easy to follow, but this leads to high memory consumption and slow speed.
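As a rough illustration of the boosting idea, here is a minimal sketch of plain gradient boosting for squared loss with one-split trees (stumps) on a single feature. It is not the project's code and not the XGBoost algorithm, which also uses second-order gradients and regularisation. All function names are hypothetical, and the input is assumed to contain at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on the residuals."""
    best = None
    # Skip the largest value so the right-hand side of a split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 5.1, 4.9]
    print("mean only:", mse(fit_boosted(xs, ys, rounds=0), xs, ys))
    print("boosted:  ", mse(fit_boosted(xs, ys, rounds=50), xs, ys))
```

Each round shrinks the training error a little because the new stump is fitted to whatever the previous rounds left unexplained; the learning rate controls how much of each stump is added.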