How do we become a high performing organization, to move faster and build more secure and resilient systems? That’s the $64,000 question!

A16Z strikes again! Andreessen Horowitz’s epic podcast hosts world-class guests on all sorts of startup & new technology topics. This week they interview Jez Humble and Nicole Forsgren, who run DORA, DevOps Research and Assessment, which shows organizations just how to get the advantages of devops in the real world.

Technology does not drive organizational performance

Check out section 16:04 in the podcast…

“the point of distinction comes from how you tie process and culture together with technology through devops”

It’s the classic Amazon model. They’re running hundreds of experiments in production at any one time!

Mainframes or Kubernetes?

What about tooling? Is that important? Here’s what Jez has to say. Jump to 29:30…

“Implementing those technologies does *not* give you those outcomes. You can achieve those results with Mainframes. Equally you can use Kubernetes, Docker and microservices and not achieve those outcomes.”

2. Pushbutton builds

You’ve heard it before. Automate your builds. That means putting everything in version control, from environment building scripts, to configs, artifacts & reference data. Once you can do that, you’re on your way to automating production deploys completely.
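Once everything lives in version control, the deploy itself can be reduced to code too. Here’s a minimal sketch in Python of what a push-button pipeline boils down to; the step names are hypothetical placeholders, not any particular tool’s API.

```python
# Minimal sketch of a push-button build: each stage is code, so the
# whole pipeline can live in version control alongside the app.

def run_pipeline(steps):
    """Run each (name, fn) step in order; stop at the first failure."""
    completed = []
    for name, fn in steps:
        if not fn():
            return completed, name  # steps that ran, plus the failing step
        completed.append(name)
    return completed, None

# Example: three stages, with the deploy gate failing.
steps = [
    ("checkout", lambda: True),
    ("build",    lambda: True),
    ("deploy",   lambda: False),
]
done, failed = run_pipeline(steps)
print(done, failed)  # ['checkout', 'build'] deploy
```

The point isn’t the ten lines of Python; it’s that once the steps are explicit and versioned, “deploy” is one button, not tribal knowledge.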

Actually it was the first time I had heard of 12 factor, so I did a bit of reading.

1. How to treat your data resources

12 factor recommends that backing services be treated like attached resources. Databases are loosely coupled to the applications, making it easier to replace a badly behaving database, or connect multiple ones.
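A minimal sketch of that idea, assuming the conventional DATABASE_URL environment variable: the database is named by config, so swapping it is a config change, not a code change.

```python
import os

# 12-factor style: the database is an attached resource named by an
# environment variable. The URLs below are illustrative only.

def database_url(default="sqlite:///local.db"):
    """Resolve the backing database from config, with a local fallback."""
    return os.environ.get("DATABASE_URL", default)

# Point the app at a different database without touching code:
os.environ["DATABASE_URL"] = "postgres://replica.example.com/app"
print(database_url())  # postgres://replica.example.com/app
```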

2. Stay loosely coupled

In 12 Fractured Apps Kelsey Hightower adds that this loose coupling can be taken a step further. Applications shouldn’t even assume the database is available. Why not fall back to some useful state, even when the database isn’t online? Great idea!
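A sketch of that fallback, with a hypothetical fetch_profile function: the app prefers the live database, but degrades to a cached copy instead of erroring out when the connection fails.

```python
def fetch_profile(user_id, db_lookup, cache):
    """Prefer the live database, but degrade to a cached copy."""
    try:
        return db_lookup(user_id), "live"
    except ConnectionError:
        return cache.get(user_id), "cached"

# Simulate the database being offline:
cache = {42: {"name": "Ada"}}

def down(_user_id):
    raise ConnectionError("db offline")

print(fetch_profile(42, down, cache))  # ({'name': 'Ada'}, 'cached')
```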

3. Degrade gracefully

A read-only or browse-only mode is another example of this. Allow your application to have multiple decoupled database resources, some of them read-only. The application then behaves intelligently based on what’s available. I’ve advocated this before in Why Dropbox didn’t have to fail.

Many of you know I publish a newsletter monthly. One thing I love about it is that after almost a decade of writing it regularly, the list has grown considerably. And I’m always surprised at how many former colleagues are actually reading it.

So that is a really gratifying thing. Thanks to those who are, and if you’re not already on there, sign up here.

“I’m interested to hear your thoughts on the pros and cons of using a json column to embed data (almost like a poor-man’s Mongo) vs having a separate table for the bill of materials.”

Interesting question. Here are my thoughts.

1. Be clean or muddy?

In my view, these types of design decisions are always about tradeoffs.

The old advice was to normalize everything to start off with. Then as you’re performance tuning, denormalize in special cases where it’ll eliminate messy joins. Those special cases would then also need to be handled at insert & update time, as you’d have duplicated data.

NoSQL & MongoDB introduce all sorts of new choices. So too does Postgres with JSON column data.

We know that putting everything in one table will be blazingly fast, as you don’t need to join. So reads will be cached cleanly, and hopefully based on a single ID or a small set of ID lookups.
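To make the tradeoff concrete, here’s a small sketch using SQLite with illustrative table names: a bill of materials stored as a JSON blob is one fast lookup but opaque to SQL, while a child table lets the database itself filter & aggregate.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product_json (id INTEGER PRIMARY KEY, bom TEXT)")
db.execute("CREATE TABLE bom_rows (product_id INTEGER, part TEXT, qty INTEGER)")

# The same bill of materials, stored both ways.
bom = [{"part": "screw", "qty": 8}, {"part": "panel", "qty": 2}]
db.execute("INSERT INTO product_json VALUES (1, ?)", (json.dumps(bom),))
db.executemany("INSERT INTO bom_rows VALUES (1, ?, ?)",
               [(r["part"], r["qty"]) for r in bom])

# JSON read: one row back, but the parts are just text to the database.
(blob,) = db.execute("SELECT bom FROM product_json WHERE id = 1").fetchone()
print(json.loads(blob)[0]["part"])  # screw

# Relational read: the database can filter & aggregate the parts itself.
(total,) = db.execute(
    "SELECT SUM(qty) FROM bom_rows WHERE product_id = 1").fetchone()
print(total)  # 10
```

The JSON column wins on read simplicity; the child table wins the moment you want arbitrary queries across products.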

2. Go relational

For example you might choose MySQL or Postgres as your datastore, and use it for what it’s good at. Keep your data in rows & columns, so you can later query it in arbitrary ways. That’s the discipline up front, and the benefit & beauty down the line.

I would shy away from the NoSQL add-ons that some relational vendors have added, to compete with their newer database cousins. This starts to feel like a fashion contest after a while.

3. Go distributed

If you’d like to go the NoSQL route, you could for example choose MongoDB. You’ll gain advantages like distribution out of the box, eventual consistency, and easy coding & integration with applications.

The downside: you’ll have to rearrange and/or pipeline the data into a relational database or warehouse (Redshift?) if & when you need arbitrary reports from it. For example there may be new reports & ways of slicing & dicing the data that you can’t foresee right now.

4. Hazards of muddy models

Given those two options, I lean away from muddying the waters. My feeling is that features like JSON blobs in Postgres, and the memcached plugin in MySQL, are features the db designers are adding to compete in the fashion show with the NoSQL offerings, and still keep you in their ecosystem. But those interfaces within the relational (legacy?) databases are often cumbersome and clunky compared to their NoSQL cousins like Mongo.

I was reading one of my favorite blogs again, Todd Hoff’s High Scalability. He has an interesting weekly post format called “Quotable Quotes”. I like them because they’re often choice quotes that highlight some larger truth.

4. Beware the ORM

Anyone who’s a regular reader of this blog knows that I’ve railed against ORMs for a long time. These are object relational mappers, a library on top of a relational database. They allow developers to build software without mucking around in SQL.

@sandromancuso I say yes to frameworks like netty, guice play, say no to all J2EE stuff like hibernate, spring.

Hibernate is a famous one. I remember back in the late ’90s I was brought on to fix some terrible performance problems. At root the problem was the SQL code. It wasn’t being written by their team & tuned carefully. It was written en masse & horribly by this Hibernate library.

I think this tweet gets to the heart of it. Often the decision to use a framework or not is simply hidden in plain sight, an assumption albeit a large one, that this is how you build software.

But these decisions are so fundamental because once your scaffolding is built, it becomes very hard to disentangle. Rip & replace becomes terribly expensive, and scalability becomes a painful unattainable dream.
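The classic failure mode behind stories like that Hibernate one is the N+1 query pattern: one query for a parent list, then one more per row for its children, where a single join would do. A toy sketch that just counts the queries such an ORM-style loop would issue (the SQL strings are illustrative, not a real ORM’s output):

```python
# Count the round trips an N+1 access pattern generates.
queries = []

def fetch(sql):
    """Stand-in for a database call; records the SQL it was given."""
    queries.append(sql)
    return []

def orders_per_customer_naive(customer_ids):
    fetch("SELECT * FROM customers")  # 1 query for the parent list...
    for cid in customer_ids:
        # ...then 1 query per customer for their orders.
        fetch(f"SELECT * FROM orders WHERE customer_id = {cid}")

orders_per_customer_naive(range(100))
print(len(queries))  # 101 round trips, where one join would have done
```

Each of those 101 round trips pays network latency; that’s how an innocent-looking loop becomes the “terrible performance problem” you get called in to fix.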

This story gave me warm fuzzies… I was excited in a similar way when Linux was first released. This was many years back through the mists of time, in 1992. I had recently graduated in computer science, and one of my favorite classes was Operating Systems. We worked to build an OS following Andrew Tanenbaum’s book Minix.

After graduation, I heard about the Linux project & got excited. I was hearing whispers online that Linux could really completely replace Windows. So I bought all the parts to build a 486 tower: graphics card, motherboard, memory cards & IDE drives, keyboard & monitor. This ran into the thousands of dollars. Hardware wasn’t cheap then! I even ordered an optical mouse because it felt like you were sitting at a Sun workstation, at home!

I remember putting all this together, and loading the first floppy disk into the thing. Did I image the disks properly? Will it really load something?

Up comes the BIOS and sure enough it’s booting off of the floppy drive. I thought “Wow, mother of god! This is amazing!”

From there I had init running, and soon the very seat of the soul, the Unix OS itself. That felt so darn cool.

After that I’d spend weeks configuring X Windows, but to have a GUI seemed like mission impossible. And then you’d go about tweaking and rebuilding your kernel for this or that.

Thank you to Karanbir for rekindling this memory. It’s a great one!

For those starting out now as a developer, operations, or cloud site reliability engineer, I would totally recommend following Karanbir’s instructions. Here’s why!

1. Learning by building

My favorite thing about building a server myself is that there’s something physical going on. You’re plugging in a cable for the disk bus. Bus is no longer just a concept, but a thing you can hold. You’re plugging in memory; you can look at it & say, oh this is a chip, it’s different than a disk drive. You can hold the drive and say, oh there’s a miniature little record player in there, with a magnetic arm. Cool!

2. Linux early beginnings

Another thing I remember about those days, was feeling like I was part of something big. I knew operating systems were crucial. And I knew that Windows wasn’t working. I knew it wouldn’t scale to the datacenter anyway.

I realized I wasn’t the only one to think this way. There were many others as excited as me, who were contributing code, and debug reports.

3. Debugging & problem solving

Building your own server involves a ton of debugging. In those days you had to compile all those support programs, using the GNU C compiler. You’d run make and get a whole slew of errors, and fire up your editor and get to work.

Configuring your windowing system meant figuring out where the right driver was, and also buying a graphics card that was *supported*. You would then tweak the refresh frequencies, resolution, and so forth. There was no auto detection. You could actually fry your monitor, if you set those numbers wrong!

4. Ownership of the stack

These days you hear a lot about “fullstack engineers”. There is no doubt in my mind that this is the way to become one. Basic systems administration requires you to compile other people’s software & troubleshoot it. All of those are developer skills that will come in very handy.

Building your own server also forces you to see all the hardware, and how it fits together into a greater performant whole. It gives you an appreciation for speed, too. Use one bus such as IDE, or another such as SCSI, and experience a different performance profile. Because all that paging Unix does, moving software in and out of memory, happens by reading & writing to disk!

4. What to do? Do you like boring?

This comes as a shock to many in the startup world. It sort of flies in the face of open source, or does it?

I worked in the enterprise space as an Oracle DBA for a decade starting in the mid-nineties. Among DBAs there was always a chuckle when a new version of Oracle came out. No one wanted to touch it with a ten-foot pole. Sure we’d install it on test boxes, start learning the new features and so forth. But deploy it? No way. Wait a good 2 or even 3 years before upgrading.

Meanwhile management was eager for the latest software. Don’t we want the newest? The Oracle sales guys would be selling the virtues of all sorts of features that nobody needed right away anyway.

5. Use tried & tested components

One webserver like nginx, one caching server like memcached or redis, one search server like solr or elasticsearch, one database like MySQL or postgres. Standardize all your components on one image, so you can use it for all your servers, regardless of which components a given server runs.

I took away a few key lessons from these that seem to be repeating refrains…

1. UX of data

UX design involves looking at how customers actually use a product in the real world. What parts of the product work for them, how they flow through that product and so on.

That same design sense can be applied to data. At a high level that means exposing data in a measured, meaningful & authoritative way. Not all the tables & all the data points, but rather key ones that help the business make decisions. Then layering on top discovery tools like Looker to allow biz-ops to make more informed decisions.

2. Be iterative

Clean data, presented to business operations in a meaningful way, allows them to explore the data and find useful trends. What’s more, with good discovery tools, biz-ops is empowered to do their own reporting.

All this reduces the need to go to engineering for each report. It reduces friction and facilitates faster iteration. That’s agile!

4. Spot checks & balances

Spot checks on data are like unit tests on code. They keep you honest. Those rules for how your business works, and what your data should look like, can be captured in code, then applied as tests against source data.
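A minimal sketch of spot checks as code, with made-up business rules: each rule is a predicate, and every failing (row, rule) pair surfaces like a failing test.

```python
# Spot checks as unit tests for data: encode business rules as
# predicates, then run them against source rows. The rows and
# rules below are illustrative examples.

rows = [
    {"order_id": 1, "total": 25.00, "email": "a@example.com"},
    {"order_id": 2, "total": -3.00, "email": ""},
]

checks = {
    "total_non_negative": lambda r: r["total"] >= 0,
    "email_present":      lambda r: bool(r["email"]),
}

# Collect every (row, rule) pair that fails, like a test report.
failures = [(r["order_id"], name)
            for r in rows
            for name, check in checks.items()
            if not check(r)]
print(failures)  # [(2, 'total_non_negative'), (2, 'email_present')]
```

Run a report like this on a schedule and a non-empty failure list becomes your data “outage” alert.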

5. Monitoring for data outages

As data is treated as a product, it should be monitored just like other production systems. A data inconsistency or failed spot check then becomes an “outage”. By taking these very seriously, and fire fighting just as you do other production systems, you can build trust in that data, as those fires become less frequent.

Sure there are firms like Netflix, who have turned the fickle cloud into a model of virtue & reliability. But most of the firms I work with every day have moved to Amazon as though it were regular bare metal. And encountered some real problems in the process.

1. Everyday hardware outages

Many of the firms I’ve seen hosted on AWS don’t realize that servers fail so often. Amazon actually chooses cheap commodity components as a cost-saving measure. The assumption is that resilience should be built into your infrastructure using devops practices & automation tools like Chef & Puppet.

The sad reality is most firms provision the usual way, through the dashboard, with no safety net.

2. Ongoing network problems

Network latency is a big problem on Amazon. And it will affect you more than you expect. One reason is that you’re most likely sitting on EBS as your storage. EBS? That’s Elastic Block Storage, Amazon’s NAS solution. Your little cheapo instance has to cross the network to get to storage. That *WILL* affect your performance.

3. Hard to be as resilient as Netflix

From what I’m seeing at startups, most have a bit of devops in place, a bit of automation, such as autoscaling around the webservers. But little in terms of cross-region deployments. What’s more, their database tier is protected only by multi-az or just a read-replica or two. These are fine for what they are, but will require real intervention when (not if) the server fails.

4. Provisioning isn’t your only problem

But the cloud gives me instant infrastructure! I can spin up servers & configure components through an API! Yes, this is a major benefit of the cloud, compared to 1-2 hours in traditional environments like Softlayer or Rackspace. But you can also compare that with an outage every couple of years! Amazon’s hardware may fail a couple of times a year, more if you’re unlucky.

Meanwhile you’re going to deal with seasonal weather problems *INSIDE* your datacenter. Think of these as swarms of customers invading your servers, like a DDOS attack, but self-inflicted.

Amazon is like a weak immune system attacking itself all the time, requiring constant medication to keep the host alive!

5. RDS is going to bite you

Besides all these other problems, I’m seeing more customers build their applications on the managed database solution MySQL RDS. I’ve found RDS terribly hard to manage. It introduces downtime at every turn, where standard MySQL would incur none.

When I picked up Johnson’s book, I knew nothing about cholera. Sure I’d heard the name, but I didn’t know what a plague it was during the 19th century.

Johnson’s book is at once a thriller of the deadly progress of the disease. But within that story, we learn of the squalor the inhabitants of Victorian England endured before public works & sanitation. We learn of architecture & city planning, statistics, & how epidemiology was born. The evidence is woven together with map making, the study of pandemics, information design, environmentalism & modern crisis management.

“It is a great testimony to the connectedness of life on earth that the fates of the largest and the tiniest life should be so closely dependent on each other. In a city like Victorian London, unchallenged by military threats and bursting with new forms of capital and energy, microbes were the primary force reigning in the city’s otherwise runaway growth, precisely because London had offered Vibrio cholerae (not to mention countless other species of bacterium) precisely what it had offered stock-brokers and coffee-house proprietors and sewer-hunters: a whole new way of making a living.”

1. Scientific pollination

John Snow was the investigator who solved the riddle. He didn’t believe that putrid smells carried disease, the prevailing miasma theory of the day.

“Part of what made Snow’s map groundbreaking was the fact that it wedded state-of-the-art information design to a scientifically valid theory of cholera’s transmission.”

3. A Generalist saves the day

The interesting thing about John Snow was how much of a generalist he really was. Because of this he was able to see things across disciplines that others of the time were not able to see.

“Snow himself was a kind of one-man coffeehouse: one of the primary reasons he was able to cut through the fog of miasma was his multi-disciplinary approach, as a practicing physician, mapmaker, inventor, chemist, demographer & medical detective”

4. Enabling the growth of modern cities

The discovery of the cause of cholera prompted the city of London to undertake a huge public works project, building sewers that would flush waste water out to sea. Truly, it was this discovery and its solution that enabled much of the population densities we see in the 21st century. Without modern sanitation, cities of 20 million plus would certainly not be possible.

5. Bird Flu & modern crisis management

In recent years we have seen public health officials around the world push for Thai poultry workers to get their flu shots. Why? Because although avian flu only spreads from animals to humans now, were it to encounter a person who had our run-of-the-mill flu, it could quickly learn to move human to human.

All these disciplines of science, epidemiology and so forth direct decisions we make today. And much of that is informed by what Snow pioneered with the study of cholera.