Martin Kleppmann

In my time exploring the world of data, I often feel intrigued by the unfamiliar. But like most, I’m leery of buzzwords and maybe a little worried that I’m missing out on the thing that we’ll all need to learn to extend our careers beyond the next five years.

Adding to the frustration, I have difficulty evaluating new technologies when the only help I’ve got is a vendor’s brochure, or a blog titled “Why X is way way better than Y”. Balanced, unbiased help is hard to find. And I really value unbiased ideas. Personally, I try not to appear the Microsoft groupie. I value my MVP award, but I also like to stress the “independent” part of that.

I’m also wary of some academic researchers. The theory can sometimes drift too far away from the practical (dba.stackexchange only has 411 results for “normal form” and I suspect many of those are homework questions). Some academics almost seem offended whenever a vendor deviates from relational theory.

That’s why I was so thrilled (and a bit relieved) to discover Martin Kleppmann’s Designing Data-Intensive Applications. Martin is an amazing writer who approached his book with a really balanced style. He’s also a researcher with real-world experience that helps him focus on the practical.

First let me just say that the book has a really cool wild boar on the cover. The boar reminds me of Porcellino at the University of Waterloo.

Designing Data-Intensive Applications covers database systems very comprehensively. Martin covers both relational systems and distributed systems. He covers data models, fault tolerance strategies and so much more. In fact he covers so many topics that the whole book seems like a table of contents for our data industry.

Here’s the thing. When I read the parts I know, he’s a hundred percent right and that helps me trust him when he talks about the parts that I don’t know about.

Martin talks a lot about distributed systems, both the benefits and drawbacks, and even though you may have no plans to write a MapReduce job, you’ll be equipped to talk about it intelligently with those who do. The same goes for other new systems. For example, after reading Martin’s book, you’ll be able to read the spec sheet on Cosmos DB and feel more comfortable reasoning about its benefits (but that’s a post for another day).

Event Sourcing

Martin then goes on to write about event sourcing. Martin is a fan of event sourcing (as are several of my colleagues). In fact I first learned about event sourcing when a friend sent me Martin’s video Turning the Database Inside Out With Apache Samza.
A ridiculously simplified summary is this. Make the transaction log the source of truth and make everything else a materialized view. It allows for some really powerful data solutions and it simplifies some problems you and I have learned to live with. So I wondered whether his chapter on event sourcing would sound like a commercial. Nope, that chapter is still remarkably well balanced.
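
To make that summary concrete, here is a minimal sketch of the idea in Python. It is not taken from the book or the video. The `Event` type, the account names and the `materialize` function are all made up for illustration. The only point is that the log is the one thing that gets written to, and the "current state" is something you compute by replaying it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One immutable entry in the log (hypothetical shape, for illustration)."""
    account: str
    kind: str      # "deposited" or "withdrew"
    amount: int


def apply_event(balances, event):
    """Fold a single event into the view and return a new view."""
    delta = event.amount if event.kind == "deposited" else -event.amount
    updated = dict(balances)
    updated[event.account] = updated.get(event.account, 0) + delta
    return updated


def materialize(log):
    """Rebuild the balances view from scratch by replaying the whole log."""
    balances = {}
    for event in log:
        balances = apply_event(balances, event)
    return balances


# The log is append-only; nothing in it is ever updated in place.
log = [
    Event("alice", "deposited", 100),
    Event("bob", "deposited", 50),
    Event("alice", "withdrew", 30),
]

print(materialize(log))
```

Because the view is derived, it can be thrown away and rebuilt, and replaying a prefix of the log (say `log[:2]`) gives the state as it was at that point in time.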

By the way, when the revolution comes it won’t bother me at all. There’s no longer such thing as a one-size-fits-all data system and the fascinating work involves fitting all the pieces together. I’m working in the right place to explore this brave new world and excited to learn the best way to move from here to there.

Some Other Notes I Made

This feels like the RedBook put together by Michael Stonebraker but it deals with more kinds of systems. I hope Martin refreshes this book every few years with new editions as the industry changes.

Martin suggests that the C in ACID might have been introduced for the purpose of the acronym. I knew it!

Martin calls the CAP theorem unhelpful and explains why. He admits the CAP theorem “encouraged engineers to explore a wider design space” but it is probably better left behind.

My wife Leanne doesn’t like fantasy books and she won’t read a book that has a map in the front. My friend Paul won’t read a book without one. Martin is very professional, but his style shows and I love it. He’s got a map at the beginning of every chapter.

The quotes at the beginning of each chapter are really well chosen. They come from Douglas Adams, Terry Pratchett and others. But my favorite is from Thomas Aquinas: “If the highest aim of a captain were to preserve his ship, he would keep it in port forever.”

Martin writes “The goal of this book is to help you navigate the diverse and fast-changing landscape of technologies for processing and storing data”. I believe he met that goal.

May 5, 2017

You have my permission to skip this post. This one’s just for me. I’ve been drawing again with SQL Server’s spatial results tab. The first time I posted something like this, it was Botticelli’s Birth of Venus in More images from the Spatial Results Tab.

Why Michael??

Because it’s a stupid challenge and I wanted to see what I could do with it. The SSMS spatial tab is a lousy crummy medium for images. It really is quite terrible and using SSMS to draw imposes restrictions and rules. It’s fun to see what I can do by staying within that framework. It’s something to push against just because it’s challenging. Others do crosswords. This week, I did this.

Why Now?

I realized a couple things lately.

The Colors Seem Dull … But Don’t Have to Be
I used to think the spatial results tab uses lousy colors, pastel and dull. I realized that they’re not dull, they’re just transparent. I can overlap polygons inside a geometry collection to get more solid colors. Here are the top 100 colors without transparency.
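
Here is a rough sketch of the stacking trick in Python. I don't know the actual opacity SSMS uses, so the `alpha` argument is an assumption, and both function names are mine. The first function shows how quickly identical layers add up under ordinary alpha blending, and the second builds the WKT for a geometry collection that repeats one polygon several times.

```python
def stacked_opacity(alpha, copies):
    """Combined opacity of `copies` identical layers, each with opacity `alpha` (0..1)."""
    return 1.0 - (1.0 - alpha) ** copies


def solid_polygon_wkt(points, copies):
    """WKT for a GEOMETRYCOLLECTION that holds `copies` identical polygons."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # a WKT polygon ring must end where it starts
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    polygon = f"POLYGON(({coords}))"
    return "GEOMETRYCOLLECTION(" + ", ".join([polygon] * copies) + ")"


# Assuming each layer is 50% opaque, four copies are already about 94% opaque.
print(stacked_opacity(0.5, 4))
print(solid_polygon_wkt([(0, 0), (1, 0), (1, 1)], 2))
```

The resulting string can be pasted into something like `SELECT geometry::STGeomFromText('...', 0);` to see the effect in the spatial results tab.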

The Colors Seem Arbitrary … But Don’t Have to Be
The palette that SSMS uses is terrible. It’s almost as if the nth color is chosen using something like Color.FromArgb(new Random(n).Next()); Notice that colors 6 and 7 (the beige colors on the left side of the grid) are almost indistinguishable from each other. But I can use that. I can overlap different colors to get the color I need. And I can write a program to pick the best combination of overlaps. Here’s a nice red and blue:

But black remains difficult.
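
A program to pick overlaps could look something like the sketch below. This is not the program I used for the pictures in this post. It is a brute-force illustration that assumes plain "over" blending on a white background, a single shared `alpha` for every layer, and a made-up palette. The real SSMS palette and opacity would have to be measured first.

```python
from itertools import product

WHITE = (255.0, 255.0, 255.0)


def blend(under, over, alpha):
    """Draw `over` on top of `under` with opacity `alpha` (0..1), per RGB channel."""
    return tuple((1 - alpha) * u + alpha * o for u, o in zip(under, over))


def composite(layers, alpha, background=WHITE):
    """Color that results from drawing `layers` in order onto `background`."""
    color = background
    for layer in layers:
        color = blend(color, layer, alpha)
    return color


def distance(a, b):
    """Squared RGB distance; crude, but good enough to rank candidates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_overlap(palette, target, alpha, max_layers):
    """Try every ordered stack of up to `max_layers` palette colors."""
    best = None
    for depth in range(1, max_layers + 1):
        for layers in product(palette, repeat=depth):
            d = distance(composite(layers, alpha), target)
            if best is None or d < best[0]:
                best = (d, layers)
    return best[1]


# Hypothetical two-color palette and an assumed 50% opacity per layer.
palette = [(255, 0, 0), (0, 0, 255)]
print(best_overlap(palette, target=(255, 0, 0), alpha=0.5, max_layers=3))
```

The search grows exponentially with `max_layers`, so a real palette of a hundred colors would need a smarter strategy than trying everything. Under this blending model every layer lets some of what is underneath show through, which may be part of why very dark targets are hard to hit.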

Curves Are Supported Now
I can use arc segments called CIRCULARSTRING. SVG files mostly use Bézier curves which cannot be translated easily to arc segments.
Here’s a logo that I rebuilt using arcs instead of Bézier curves:

For some reason, once you begin to use CIRCULARSTRING, a transparent color won’t blend with itself (only with other colors).
Also, arc segments are rendered as several small line segments anyway, so for my purposes, it’s not a super feature.
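
To show what "several small line segments" means, here is a sketch that flattens one arc into points. A CIRCULARSTRING arc is defined by three points (start, a point on the arc, end), so the code finds the circle through them and walks along it. This is my own illustration of the geometry, not how SSMS actually does its rendering, and the function names are made up.

```python
import math


def circle_through(p1, p2, p3):
    """Center and radius of the circle passing through three points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if d == 0:
        raise ValueError("points are collinear, so there is no unique circle")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def flatten_arc(start, mid, end, segments):
    """Approximate the arc start -> mid -> end with `segments` straight pieces."""
    cx, cy, r = circle_through(start, mid, end)
    a0 = math.atan2(start[1] - cy, start[0] - cx)
    a1 = math.atan2(mid[1] - cy, mid[0] - cx)
    a2 = math.atan2(end[1] - cy, end[0] - cx)
    tau = 2 * math.pi
    sweep = (a2 - a0) % tau          # counter-clockwise angle from start to end
    if (a1 - a0) % tau > sweep:      # mid is not on that path, so go clockwise
        sweep -= tau
    return [
        (cx + r * math.cos(a0 + sweep * i / segments),
         cy + r * math.sin(a0 + sweep * i / segments))
        for i in range(segments + 1)
    ]


# Upper half of the unit circle, cut into 8 straight pieces.
points = flatten_arc((1, 0), (0, 1), (-1, 0), segments=8)
print("LINESTRING(" + ", ".join(f"{x:.3f} {y:.3f}" for x, y in points) + ")")
```

A full circle can't be flattened in one call because its start and end points coincide. It has to be split into two arcs first.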

Polly

One last picture/query of a scarlet macaw. Click on it or any other picture in this post to get the query that generated it.