data.coalesce() — Automattic Data Division Meets in Montréal

At Automattic, we know you don’t have to come to an office every day to create excellent products. (Ours serve more than 400,000,000 unique visitors a month.) Most of the time, we work from wherever we are, collaborating via the internet. From time to time, however, we meet in person for several days of intense discussions, socializing, and some sightseeing.

Last week, Automattic’s Data Division got together for its annual meetup. This year, the datums, as we call ourselves, were in Montréal. As Automatticians, we share a creed that includes the tenet “I will never stop learning.” This meetup presented an excellent opportunity for our in-house experts to share their knowledge and experience with the rest of the division, something we do often here at Automattic. The Data Division, made up of data scientists, data engineers, front-end developers, and designers, has a wide range of expertise to share.

Search Wrangler Greg Brown kicked things off with a discussion on the data structures that Elasticsearch implements for scaling search and analytics.

Up next, Code Wrangler Ben Lowery demonstrated how to build a Redux-connected component with Calypso, the framework that powers WordPress.com.

We delved into the intricacies of spam detection on WordPress.com. Code Wrangler Peter Westwood shared some of the latest trends in blog spam and the constant work it takes to keep squashing spammers.

Data Scientist Sirin Odrowski gave an overview of text analysis techniques and discussed the theory and implementation of word embeddings.

Data Engineer Anand Nalya walked us through the architecture of Spark and gave us a great overview of the internal tools available for debugging our Spark applications.

Finally, Data Engineer Xiao Yu gave us some tips for optimizing query speed using Impala and Hive, sharing details of how queries are processed by both engines and when it’s beneficial to use one over the other.

After all that, we took a day off to see some sights, throw some axes, and do some whitewater rafting!

The Data Division would fare well in a zombie apocalypse.

Data for Breakfast! Designer Jan Cavan cooked up a new logo for the occasion!

Sirin Odrowski shares her knowledge of word embeddings.

Enjoying an afternoon coffee.

It wasn’t all code, calculus and coffee! Our division lead, Martin Remy, also led a series of strategic discussions centered on better serving our users. The Data Division plays a unique dual role: developing products for WordPress.com users, like the Reader, and building internal tools that connect Automatticians with the data relevant to their needs.

Nothing motivates hard work like a stern boss.

Automattic is a growing company with a huge impact on the open web. The Data Division is just one part of the complex machinery that keeps Automattic running smoothly. We love our work, the challenges it poses, and the impact it creates. We’re also very friendly and love working with smart people from different countries and cultures. You should definitely check out our Work With Us page to see whether one of our open positions is right for you. Who knows, maybe we’ll see you at the next meetup!