During some recent downtime, I downloaded the raw pitch-by-pitch data for the 2016 MLB season. The complete dataset contains more than 725,000 records. While certainly not GB-scale, 700k-plus records provide a decent population to work with.

Shifting to a Python and pandas mindset for performing the data import and data cleansing was a good challenge for me. As I worked through the subsequent calculations, I started to think about possible uses for the flexibility and power that this approach provided. It’s a complementary set of skills for anyone who has a strong SQL background and dedicates lots of time to data analysis.

Starting from scratch with ELK v5 was a lot of fun. And after replicating the baseball calculations in a Kibana dashboard, I asked several questions of the data by running searches of the play-by-play descriptions. The combination of basic visualizations with Kibana and data exploration in Elasticsearch is where the solution really shines for data analysis situations.

Amazon announced QuickSight at re:Invent 2015 and then released it publicly more than a full year later. After taking QuickSight for a “quick” spin, I’m very bullish on the future of the product. It’s pretty simple to get up and going, and I have every confidence in AWS’s commitment to continued deployment of new features.

The measure of any product’s success is whether the customer uses it and, in nearly all cases, pays for it. I’m going to describe how I have personally moved away from the monolithic ESPN for my sports consumption, moving instead toward sports “microservices.” For the first time ever, I am seriously considering killing my cable/dish subscription with little feeling of loss for ESPN. Here are a couple of examples.

We’ve been using Elasticsearch a lot over the past year. It’s a fantastic distributed data store and can do lots more than just power search in your application. Elastic’s ELK stack includes Elasticsearch, Logstash, and Kibana. Elasticsearch and Logstash each merit their own discussion. Right now I want to focus just on Kibana, which is how you can explore and visualize data in an Elasticsearch index.

I’ve found that on many occasions I want to take a quick look at some data without going through the process of extracting a data set or querying a database. Maybe a colleague has sent me a spreadsheet, and I just want to quickly visualize the data set. Maybe I just want to take a quick look at my progress working on an analysis and see if the data are telling me what I think they are.

The ability to paste data directly into JMP or Tableau is probably one of my favorite underrated features. Here’s how I do it:

I've had a lot of conversations lately regarding the adoption of polyglot programming and persistence in applications and database solutions or, if not actual adoption, at least a move toward the concepts and approach of a polyglot world. I get excited about our solutions harnessing the best features of available technologies.