Riley Newman on How Airbnb Uses Data Science

Riley Newman, head of data science at Airbnb, recently published an article describing how the Californian startup defines and uses data science. He explains that data can be seen as the voice of the customers, and data science as an act of interpretation. He also details several initiatives that have been particularly important for scaling data science, including partnering data scientists directly with other teams, integrating data science in every business process, and building a fast and stable data infrastructure.

In the earliest days of the company, Airbnb’s founders - Brian Chesky, Joe Gebbia, and Nathan Blecharczyk - used to meet personally with guests and hosts to improve the service. The company is still doing so but, with 30M guests per year, it is now impossible to connect with everyone. As an alternative, Airbnb records various events and actions from its booking platform and uses this data as a proxy to understand what users like or dislike. This kind of feedback is particularly valuable to make decisions about “community growth, product development, and resource prioritization”, but it first needs to be deciphered and translated by data scientists “into a language more suitable for decision-making”.

While strongly connected to the company's history, this vision of data as the “customer’s voice” and of data scientists as “interpreters” has not been easy to preserve as the company has grown. Among the many aspects and initiatives described in the article, three stand out.

First, data scientists should not be seen as passive gatherers of statistics. They should interact directly with other business functions, not only to fully understand the problem to be solved, but also to make sure decision makers fully understand the results of their analysis and can therefore act upon them. Airbnb's data science team is accordingly organized in sub-teams that partner directly with engineers, designers, product managers, and others.

Second, data and data science should be present at every stage of the decision process. Airbnb typically breaks this process into four stages: Learn, Plan, Test, and Measure, each of which benefits from different elements of data science. According to Riley: “the more disciplined we’ve become about following each step sequentially, the more impactful everyone at Airbnb has become”.

Last but not least, data science should rely upon a fast and stable infrastructure to minimize the time spent querying the data, and to allow non-scientists to answer basic data questions on their own. This is particularly useful for democratizing the use of data across all business functions, and it also allows data scientists to stay focused on more complex problems.

Contacted earlier by email, Riley gave us some additional insights into Airbnb’s data science team.

InfoQ: What type of people are you looking for to join the data science team?

Riley: We look for people coming from a wide range of backgrounds. I don’t think there’s a standard template for who will be successful in this field, but the traits that tend to be correlated with success are: a curious/inquisitive mind, an eagle-eye for detail, and an effective communicator. Understanding statistics and R or Python are also critical.

InfoQ: How do you know if you have picked a great candidate?

Riley: We’ve tweaked our interview process a lot over the years to maximize the signal we get from the least possible friction for candidates and our employees. Today it mostly consists of giving candidates some data, a broad question, and then seeing how they attack it. When they’re ready, they present their work back to a couple of people from our team so we can discuss how they thought about the problem, the path they chose to solve it, and what about their results is actionable. Great candidates demonstrate the traits above through this challenge project.

InfoQ: What tools or technologies do you use the most as a data scientist?

Riley: Most of the team spends their time with just a few tools: Hive and Presto (aka SQL) for extracting data from our Hadoop cluster, R and Python for analyzing it, and Tableau for visualization (sometimes other tools for custom viz projects). If someone hoping to break into the field only knew SQL and (R or Python), they could go a long way.

InfoQ: What technical improvements are you looking forward to seeing in the next few years?

Riley: There are currently a lot of steps between the decision to instrument an aspect of our product experience with logging that will capture some data, and turning that data into an actionable insight that informs a business decision. The closer we can get to tightening that feedback loop, the more effective we’ll be.

In conclusion, reflecting on his first five years at Airbnb, Riley explains that “measuring the impact of data science is ironically difficult”, but that seeing a robust infrastructure, an increasing ability to determine causal impacts, and a systematic use of data to make all kinds of decisions is certainly a good signal.