"time series data" entries

Analytic Monitoring, Commenter Demographics, Math and Empathy, and How We Read

MacroBase — Analytic monitoring for the Internet of Things. The code behind a research paper, written up in the morning paper where Adrian Colyer says, there is another story that also unfolds in the paper – one of careful system design based on analysis of properties of the problem space, of thinking deeply and taking the time to understand the prior art (aka “the literature”), and then building on those discoveries to advance and adapt them to the new situation. “That’s what research is all about!” you may say, but it’s also what we’d (I’d?) love to see more of in practitioner settings, too. The result of all this hard work is a system that comprises just 7,000 lines of code, and I’m sure, many, many hours of thinking!

Survey of Commenters and Comment Readers — Americans who leave news comments, who read news comments, and who do neither are demographically distinct. News commenters are more male, have lower levels of education, and have lower incomes compared to those who read news comments. (via Marginal Revolution)

Moneyball for Book Publishers: A Detailed Look at How We Read (NYT) — On average, fewer than half of the books tested were finished by a majority of readers. Most readers typically give up on a book in the early chapters. Women tend to quit after 50 to 100 pages, men after 30 to 50. Only 5% of the books Jellybooks tested were completed by more than 75% of readers. Sixty percent of books fell into a range where 25% to 50% of test readers finished them. Business books have surprisingly low completion rates. Not surprisingly low to anyone who has ever read a business book. They’re always a 20-page idea stretched to 150 pages because that’s how wide a book’s spine has to be to visible on the airport bookshelf. Fat paper stock and 14-point text with wide margins and 1.5 line spacing help, too. Don’t forget to leave pages after each chapter for the reader’s notes. And summary checklists. And … sorry, I need to take a moment.

How Snapchat Built a Business by Confusing Olds (Bloomberg) — Advertisers don’t have a lot of good options to reach under-30s. The audiences of CBS, NBC, and ABC are, on average, in their 50s. Cable networks such as CNN and Fox News have it worse, with median viewerships near or past Social Security age. MTV’s median viewers are in their early 20s, but ratings have dropped in recent years. Marketers are understandably anxious, and Spiegel and his deputies have capitalized on those anxieties brilliantly by charging hundreds of thousands of dollars when Snapchat introduces an ad product.

Tracking Voters — On the night of the Iowa caucus, Dstillery flagged all the [ad network-mediated ad] auctions that took place on phones in latitudes and longitudes near caucus locations. It wound up spotting 16,000 devices on caucus night, as those people had granted location privileges to the apps or devices that served them ads. It captured those mobile ID’s and then looked up the characteristics associated with those IDs in order to make observations about the kind of people that went to Republican caucus locations (young parents) versus Democrat caucus locations. It drilled down further (e.g., ‘people who like NASCAR voted for Trump and Clinton’) by looking at which candidate won at a particular caucus location.

Data Science, Temporal Graph, Biomedical Superstars, and VR Primer

50 Years of Data Science (PDF) — Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere “scaling up,” but instead the emergence of scientific studies of data analysis science-wide.

The O'Reilly Data Show Podcast: Phil Liu on the evolution of metric monitoring tools and cloud computing.

One of the main sources of real-time data processing tools is IT operations. In fact, a previous post I wrote on the re-emergence of real-time, was to a large extent prompted by my discussions with engineers and entrepreneurs building monitoring tools for IT operations. In many ways, data centers are perfect laboratories in that they are controlled environments managed by teams willing to instrument devices and software, and monitor fine-grain metrics.

During a recent episode of the O’Reilly Data Show Podcast, I caught up with Phil Liu, co-founder and CTO of SignalFx, a SF Bay Area startup focused on building self-service monitoring tools for time series. We discussed hiring and building teams in the age of cloud computing, building tools for monitoring large numbers of time series, and lessons he’s learned from managing teams at leading technology companies.

Evolution of monitoring tools

Having worked at LoudCloud, Opsware, and Facebook, Liu has seen first hand the evolution of real-time monitoring tools and platforms. Liu described how he has watched the number of metrics grow, to volumes that require large compute clusters:

One of the first services I worked on at LoudCloud was a service called MyLoudCloud. Essentially that was a monitoring portal for all LoudCloud customers. At the time, [the way] we thought about monitoring was still in a per-instance-oriented monitoring system. [Later], I was one of the first engineers on the operational side of Facebook and eventually became part of the infrastructure team at Facebook. When I joined, Facebook basically was using a collection of open source software for monitoring and configuration, so these are things that everybody knows — Nagios, Ganglia. It started out basically using just per-instance instant monitoring techniques, basically the same techniques that we used back at LoudCloud, but interestingly and very quickly as Facebook grew, this per-instance-oriented monitoring no longer worked because we went from tens or thousands of servers to hundreds of thousands of servers, from tens of services to hundreds and thousands of services internally.

Teach Don’t Tell — what I think good documentation is and how I think you should go about writing it. Sample common sense: This is obvious when you’re working face-to-face with someone. When you tell them how to play a C major chord on the guitar and they only produce a strangled squeak, it’s clear that you need to slow down and talk about how to press down on the strings properly. As programmers, we almost never get this kind of feedback about our documentation. We don’t see that the person on the other end of the wire is hopelessly confused and blundering around because they’re missing something we thought was obvious (but wasn’t). Teaching someone in person helps you learn to anticipate this, which will pay off (for your users) when you’re writing documentation.