How Teem Uses Talend in Our Data Pipeline

11.21.2016 | Michael Moulton

Our dev team is one of the best in the business. From time to time, we like to highlight what they’ve been up to behind the scenes. (Check out our Product Updates blog to see what feature releases have resulted from behind-the-scenes innovation.)

We talked to Joe Reis and Ken Myers from our data team to learn how they set up our data pipeline so that our analytics and data platform could reach new heights and offer even more value to customers.

It all started way back in mid-2015 …

To say that Teem processes a lot of third-party data would be a huge understatement. Teem generates millions of events per day from its thousands of devices and integrated calendars. The load is formidable.

“When Ken and I started, we were tasked with having to load very large amounts of data that was expected to scale rapidly,” says Joe.

“We could have taken the route of writing endless lines of code, but being two people who were extremely restricted on resources and time, we chose to develop and schedule our ETL jobs with Talend Big Data hosted on Amazon Web Services (AWS) infrastructure. It was the simplest way to get where we wanted to go, fast. It did a lot of the heavy lifting for us that would’ve otherwise required a team of several very senior developers.”

Both Joe and Ken had worked with Talend and AWS at their previous jobs.

“Talend has connectors for most of the prevailing data sources out there,” says Ken. “It’s comprehensive and provides an easy UI for mapping data from source to destination.”

From the beginning, Talend allowed Joe and Ken to pull in data from dozens of sources quickly and build an enterprise-scale data pipeline capable of ingesting Teem’s fast-growing data sets. Its easy-to-use database components also made it quick to create data warehouses in Postgres and Redshift. These data warehouses feed the analytics engine for both internal and external users.

Joe adds, “We like it for the potential it has to support all of our jobs, schedule our recurring jobs, and monitor success and failure.”

In part, Talend was a good choice for us because it supports convenient handling of multiple data formats, including JSON.

Currently, all of Teem’s external source data is in JSON format. That’s because our goal is to help our customers create frictionless workplaces, which means being able to integrate with and pull data from all sorts of systems and tools.

“Especially for IoT, JSON is becoming the lingua franca of how IoT devices communicate,” says Joe.

Ken adds: “JSON is a powerful means of data exchange through an API. We’re accessing a range of APIs to pull in everything that represents your workplace, aiming to learn about everything you use and interact with so that we can build more meaningful reporting and recommended actions.”
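For readers curious what mapping JSON from source to destination can look like, here is a minimal sketch in Python. It is not how our Talend jobs are defined (those are built in Talend's visual designer); it only illustrates the general idea of flattening a nested JSON payload into a single row that could be loaded into a warehouse table. The payload, its field names, and the `flatten` helper are all hypothetical.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys with the parent key.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical event payload; the field names are invented for illustration.
raw = '{"event": "check_in", "room": {"id": 12, "name": "Everest"}, "ts": "2016-11-21T09:00:00Z"}'

row = flatten(json.loads(raw))
print(row)
# {'event': 'check_in', 'room_id': 12, 'room_name': 'Everest', 'ts': '2016-11-21T09:00:00Z'}
```

Each key in the resulting dictionary corresponds to one column in a destination table, which is roughly the source-to-destination mapping a visual ETL tool lets you draw rather than code.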

Joe and Ken point out that the reason we’re working toward being able to collect more – and more diverse – data for our customers isn’t to act as Big Brother, but rather to “anticipate what you need at work, before you need it.”

Eventually, our platform will be able to anticipate that you need more meeting rooms before you need them, or know in advance that you’ll want to schedule a desk for the afternoon.

As Joe puts it: “We’re not trying to control how many meetings you have. We want to make sure you’re in the right meetings with the right people, doing the right things. Optimizing is about eliminating the waste in the workplace.”

“We’re actively seeking out integrations with workplace systems so we can paint that picture,” says Ken. “And as we move closer to near real-time reporting, we will be migrating a number of current batch processes to a hybrid real-time and micro-batch architecture so that both processes can operate to their strengths.”

“And do it without bloating our team and our budget,” says Joe. “We could accomplish the same thing by hiring a crazy number of devs, but we don’t need to.”
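
To give a rough sense of the micro-batch idea Ken mentions, here is a small Python sketch. It is an illustration under our own assumptions, not a description of Teem's architecture: the `micro_batches` helper and the batch size are hypothetical, and a production system would usually also flush batches on a time interval.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield successive lists of up to batch_size events from any iterable."""
    iterator = iter(events)
    while True:
        # Take the next batch_size items; the list is shorter at the end of the stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stream of seven events, processed three at a time.
batches = list(micro_batches(range(7), 3))
print(batches)
# [[0, 1, 2], [3, 4, 5], [6]]
```

Processing events in small groups like this sits between one large nightly batch and handling every event individually, which is why it pairs naturally with near real-time reporting.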