What you'll learn

Explore real-world production applications of open source and cloud technology

Learn how web analytics data enables insights into consumer behavior and which modern machine learning and natural language processing techniques work well on web content

Description

Parse.ly runs a real-time web and content analytics platform that serves 350+ enterprise clients, 30,000+ site operators, and thousands of high-traffic sites. This platform is used to understand audience, content, and attention at a granular level, but the aggregate data exhaust from these integrations provides a front-row seat to what the internet is looking at today.

Andrew Montalenti explains how consumer attention in the web era really works (e.g., to what degree Facebook and Google dominate consumer web attention versus more niche platforms). Andrew also showcases how Parse.ly recently applied modern natural language processing and machine learning techniques to better understand its evolving dataset of more than a million unique pieces of content per day, including how the company classified all web pages into a structured content taxonomy and automatically extracted relevant topics and entities.

Alongside some of these network data findings related to news trends, social networks, search engines, and device usage patterns, Andrew also digs into the technology running under the hood, particularly multicloud setups (in the hundreds) with Elasticsearch, Cassandra, Kafka, Storm, and Spark, and discusses open source projects the company has built and released, such as PyKafka and streamparse. Andrew even talks about Parse.ly’s recent adoption of serverless cloud tooling, which makes machine learning easier.

Andrew concludes by explaining how Parse.ly’s web-wide trend data has been used so far, such as for content strategy inside major newsrooms as well as for predicting offline consumer behavior (e.g., which movies would win at the box office based on the web attention those movies received in the weeks prior).

Andrew Montalenti

Parse.ly

Andrew Montalenti is the cofounder and CTO of Parse.ly, a widely used real-time web content analytics platform. The product is trusted daily by editors at HuffPost, Time, TechCrunch, Slate, Quartz, the Wall Street Journal, and over 350 other leading digital companies. Andrew is a dedicated Pythonista and has presented his team’s work at the PyCon and PyData conferences. He is also the cohost of the web data and analytics podcast The Center of Attention. For more information, check out Parse.ly’s research on internet attention via @parsely.