A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away. ~Antoine de Saint-Exupery -- Note, the opinions stated here are mine alone and are not those of any past, present, or future employer. --

Saturday, December 03, 2011

Many years ago, at a management training class, the instructor went over the four stages of knowledge progression. I found it a reasonable perspective. They are:

Stage 1: We don't know we don't know.

Stage 2: We know we don't know.

Stage 3: We don't know we know.

Stage 4: We know we know.

I relate to this progression quite well. Like everyone, I am in all of these states all the time, on different topics. Recently, though, I came to an interesting realization about the transitions between them.

A while ago I wrote about asking questions instead of simply providing answers. I think that works most effectively when you are helping someone move from stage 3 to stage 4. You know they know, but they don't realize it yet. A series of questions can lead them to the answer they already have and build their confidence on the topic, helping them reach the point of knowing they know. It is probably less effective for moving someone between the earlier stages, because they are often unable to answer your questions.

Helping people move from stage 3 to stage 4 is important. Only when you reach stage 4 do you have the confidence to help others on the subject. If you want your organization to grow and become stronger, you have to not only mentor but also build an organization of mentors. Moving people from not knowing what they know to being confident in what they know is a critical step in creating mentors.

Stage 1 is the one that interests me most lately. We all start there. Every time the lads on Top Gear start a build project, they introduce it with the definitive statement, "How hard can it be?" Every episode ends with them landing firmly in stage 2, realizing exactly how hard it can be. Moving from stage 1 to stage 2 can be a very humbling experience. It involves accepting that you don't know something you think you do know. The key distinction is that in stage 2 we clearly understand that we aren't experts on a topic, while in stage 1 we believe we are.

Because the transition is humbling, it requires us to set aside our egos, to accept that we are fallible, and to open our minds to the possibility that we are, frankly, clueless about the topic. It's incredibly important, though, because until you can accept that, you can't move to stage 2, and there is no path to stage 4 that doesn't pass through stage 2. Someone recently shared with me that the difference between true geniuses and everyone else is that a true genius knows the material but doubts it, while others don't know the material but are sure of it. Stage 3 vs. stage 1.

The entire point of this article, though, is to encourage all of us to be open to the fact that we are in stage 1 on many topics. We all know somebody who is there on a subject and at the same time has too much ego to even be approached about the fact that they are in stage 1, not stage 4. It's unfortunate, because it blocks that person from ever reaching stage 2. There may be little you can do to help them, but you can help yourself. Be honest with yourself about which stage you are actually in for a topic. Have you truly made the journey to the knowledge? If you believe you just know something through divine inspiration, there's a good chance you're stuck in stage 1, not stage 4.

Saturday, October 22, 2011

During one of my first few weeks at eBay, I got involved in a conversation about mark down logic. I had only been at the company for a short while, and since I was working for an e-commerce company, I assumed mark down logic must be some business rules about price discounts. It certainly seemed like a reasonable line of thought. As it turns out, I was completely wrong, and as a result I was introduced to a concept that is critical to any site's ability to achieve high availability.

The term came into existence inside eBay because the DBAs wanted a mechanism to tell the applications that the database was down, regardless of the true state of the database. They wanted to mark the database state as down. The original motivation was a problem with the database listener. With hundreds of application servers (at that time) all waiting for the database to come back up, the moment the listener was turned on, a connection storm would hit and often bring the database down again. Rather than requiring an involved set of start-up procedures that would effectively amount to a total site reboot, the goal was to be able to control the rate of application connections.
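One way to picture that rate control: keep the database marked down for every application server, then mark it up for a few servers at a time, so connections ramp up in waves instead of arriving all at once. This is a hypothetical sketch, not eBay's actual procedure; the `staged_mark_up` helper, the batch size, and the `mark_up` callback are all assumptions for illustration.

```python
import time
from typing import Callable, List


def staged_mark_up(
    servers: List[str],
    batch_size: int,
    mark_up: Callable[[str], None],
    pause_seconds: float = 0.0,
) -> List[List[str]]:
    """Mark a resource up for `batch_size` servers at a time (hypothetical helper).

    Pausing between batches lets the database absorb each wave of new
    connections before the next wave arrives. Returns the batches in order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = [servers[i:i + batch_size] for i in range(0, len(servers), batch_size)]
    for n, batch in enumerate(batches):
        for server in batch:
            mark_up(server)
        # Wait before the next wave, but not after the last one.
        if pause_seconds and n < len(batches) - 1:
            time.sleep(pause_seconds)
    return batches
```

In practice `mark_up` would be whatever pushes the state change to one server, and the pause would be tuned to how quickly the database settles after each wave.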

There are actually two concepts that have to be considered here:

The ability to mark an external resource state as up or down and have the application honor that state.

The ability of the application to behave in a defined way when the resource is down and to return to the proper behavior when the resource returns, without being restarted.

The first one is relatively straightforward as long as your application manages connections through an abstraction, such as a connection pool. Attempts to get a connection receive a "marked down" exception, which part 2 then needs to handle. It is clearly more challenging if you haven't created any abstraction for dealing with external resources. For example, if REST services are called in a variety of ways with no common path for establishing the HTTP connection, then supporting mark down becomes more problematic. There are many other reasons to provide a common HTTP connection path anyway (e.g., managing timeouts, retries, consistency of configuration, etc.).
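A minimal sketch of the check such an abstraction could perform before handing out a connection, assuming a hypothetical `MarkdownAwareResource` wrapper in front of whatever actually opens connections (a database driver, an HTTP client). It shows only the mark-down gate; a real pool would also reuse connections, enforce limits, and so on.

```python
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class MarkedDownError(Exception):
    """Raised instead of connecting when a resource is marked down."""


class MarkdownAwareResource(Generic[T]):
    """Hypothetical wrapper that honors a mark-down flag.

    While the resource is marked down, no connection attempt is made at all,
    regardless of the resource's true state.
    """

    def __init__(self, name: str, connect: Callable[[], T]) -> None:
        self.name = name
        self._connect = connect
        self._marked_down = False
        self._lock = threading.Lock()

    def mark_down(self) -> None:
        with self._lock:
            self._marked_down = True

    def mark_up(self) -> None:
        with self._lock:
            self._marked_down = False

    def get_connection(self) -> T:
        with self._lock:
            down = self._marked_down
        if down:
            raise MarkedDownError(f"{self.name} is marked down")
        return self._connect()
```

Callers catch `MarkedDownError` and apply whatever defined behavior the application has for that resource being unavailable.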

There are nuances to supporting mark down, however. The first is how you will change the state. For a small deployment, something as simple as an HTTP POST to an administrative listener could be used. I've also seen a configuration file that is watched for modifications, with tools like Puppet used to push out state changes. This works well for small deployments. Larger deployments would benefit from configuration service tools like ZooKeeper.
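The watched-configuration-file approach can be sketched as below. The JSON file format (`{"marked_down": ["userdb"]}`) and the `FileMarkdownState` class are assumptions for illustration; the idea is only that the application re-reads the file when its modification time changes, so a tool like Puppet can flip state without a restart.

```python
import json
import os
from typing import FrozenSet, Optional


class FileMarkdownState:
    """Reads mark-down state from a JSON file, reloading when it changes.

    Assumed (hypothetical) file format: {"marked_down": ["userdb", "search"]}.
    A change is detected by comparing the file's modification time.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime_ns: Optional[int] = None
        self._down: FrozenSet[str] = frozenset()

    def _reload_if_changed(self) -> None:
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            # No file means nothing is marked down.
            self._mtime_ns, self._down = None, frozenset()
            return
        if mtime_ns != self._mtime_ns:
            with open(self.path, encoding="utf-8") as f:
                data = json.load(f)
            self._down = frozenset(data.get("marked_down", []))
            self._mtime_ns = mtime_ns

    def is_down(self, resource: str) -> bool:
        self._reload_if_changed()
        return resource in self._down
```

Modification-time resolution varies by filesystem, so a production version might poll on a timer and compare file contents as well; this sketch stats the file on every check for simplicity.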

The second concept is much more involved. The challenge here is that applications need to behave in a predictable way when a resource becomes unavailable. I chose "predictable" because the actual behavior will vary considerably with the application logic and the resource that is down. While simply returning HTTP status 500 may be predictable, that's not what I mean, and it is usually not sufficient to be considered robust.

One of the most challenging but important considerations is what the application can do without the resource that is down. The simple-minded approach is to say "nothing" and return the equivalent of "service temporarily unavailable." Depending on the scenario, this may in fact be the only option. A more robust approach, however, is to design applications to make as much forward progress as they can with the resource missing. Design the application to be resilient to missing resources. Think about what it could do if you took the resource away. What functionality could still be provided?
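As an illustration of making forward progress, here is a hypothetical item page where the item itself is essential but recommendations are optional. The page, the loader callbacks, and `MarkedDownError` are assumptions for the sketch; the point is that only the essential resource being down produces "service temporarily unavailable".

```python
from dataclasses import dataclass, field
from typing import Callable, List


class MarkedDownError(Exception):
    """Raised by a resource client when the resource is marked down."""


@dataclass
class ItemPage:
    title: str
    recommendations: List[str] = field(default_factory=list)
    notices: List[str] = field(default_factory=list)


def render_item_page(
    item_id: str,
    load_item: Callable[[str], str],
    load_recommendations: Callable[[str], List[str]],
) -> ItemPage:
    """Build a page, degrading rather than failing when an optional resource is down.

    If the item cannot be loaded, the error propagates and the caller returns
    "service temporarily unavailable". Recommendations are optional, so the
    page is still served without them.
    """
    page = ItemPage(title=load_item(item_id))
    try:
        page.recommendations = load_recommendations(item_id)
    except MarkedDownError:
        page.notices.append("recommendations unavailable")
    return page
```

Deciding which resources are essential and which are optional is the design exercise the paragraph above describes; the code only encodes that decision.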

Equally challenging is managing state that might get confused when the resource becomes unavailable. When exceptions start coming back from database connections or REST services, the internal state of the application can become corrupt. The result is that even after the resource has returned, the application is unable to use it correctly and ultimately has to be restarted itself.
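One common source of that stuck state is a cached handle that broke while the resource was down. A minimal sketch of one mitigation, assuming a hypothetical `connect` factory: discard the cached connection after any failure so the next call reconnects on its own once the resource returns.

```python
from typing import Callable, Optional, Protocol


class Connection(Protocol):
    def query(self, sql: str) -> str: ...


class SelfHealingClient:
    """Caches a connection but throws it away after any failure.

    Without the reset, a handle that broke during an outage would keep being
    reused after the resource came back, and only a restart would clear it.
    `connect` is a hypothetical connection factory.
    """

    def __init__(self, connect: Callable[[], Connection]) -> None:
        self._connect = connect
        self._conn: Optional[Connection] = None

    def query(self, sql: str) -> str:
        if self._conn is None:
            self._conn = self._connect()
        try:
            return self._conn.query(sql)
        except Exception:
            # Don't trust a connection that has failed; rebuild it next time.
            self._conn = None
            raise
```

The same principle applies to other in-memory state: update it only after a call succeeds, so a failure partway through leaves nothing half-written behind.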

This brings me to another important point. The only way to make sure that your application behaves predictably and recovers correctly when resources go down is to test it. Testing mark down needs to be a standard part of application regression testing. Netflix has taken this to its logical extreme by creating the Chaos Monkey. They turn it loose in production with the sole purpose of randomly killing things and making sure their applications can survive. I'm a fan!
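A mark-down regression test can follow the same three steps every time: verify normal behavior, mark the resource down and check the defined degraded behavior, then mark it up and confirm the same running instance recovers without a restart. This is a deterministic pytest-style sketch, not Chaos Monkey; the pricing `Resource` and `quote` function are hypothetical stand-ins.

```python
class MarkedDownError(Exception):
    """Raised by the stand-in resource while it is marked down."""


class Resource:
    """Minimal stand-in for a mark-down-aware resource client (hypothetical)."""

    def __init__(self) -> None:
        self.marked_down = False

    def fetch_price(self, item_id: str) -> float:
        if self.marked_down:
            raise MarkedDownError("pricing marked down")
        return 9.99


def quote(resource: Resource, item_id: str) -> str:
    """Application logic under test: degrade to a defined message when down."""
    try:
        return f"{item_id}: ${resource.fetch_price(item_id):.2f}"
    except MarkedDownError:
        return f"{item_id}: price temporarily unavailable"


def test_mark_down_and_recovery() -> None:
    resource = Resource()
    assert quote(resource, "42") == "42: $9.99"

    # Mark down: the application must behave in its defined, degraded way.
    resource.marked_down = True
    assert quote(resource, "42") == "42: price temporarily unavailable"

    # Mark up: normal behavior must return on the same instance, no restart.
    resource.marked_down = False
    assert quote(resource, "42") == "42: $9.99"
```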