Eventually Consistent

I was asked to write a post for an external site! It’s available here.

It takes a while to develop “taste” in programming, to figure what you think works well and what doesn’t. Your taste and another excellent engineer’s taste might be different, and result in different tradeoffs, but I find it useful and interesting to write down what I think (so far!). This is how I think about Functional Programming in Python.

I was reading a book recently where a character scuba dives while pregnant. The book is set in the past, so I decided to look up modern medical information on whether this is a good idea.

The answer is “No”. There is no medical reason why the answer is no. To the best of our knowledge, none of the biological systems required for gestation are significantly affected by scuba diving. But most people think about scuba diving as risky, and if they were to guess without doing research, they’d conclude… maybe it’s not a great idea to do while pregnant. So even though the medical establishment agrees that it should theoretically be fine, if something goes wrong, psychologically it is very destructive to look back and see something you could have done differently.

Why am I talking about this in an engineering blog? Because it’s an analogy to failure scenario generation, safety, and verification in any discipline. Many engineers have shipped a piece of code, having told themselves it’s fine to the best of their knowledge, only to see if fall over or emit terrible bugs in production. This is why we often push to a small percentage of real traffic, do bug-bashes and conduct pre-mortems where we imagine different types of failures and what would have caused them. We’re trying to smoke out the unknown unknowns that cause issues. It’s a type of thinking I am actively learning how to lean into. As an optimist, someone who tends to seek out nuance, and a person who has a strong bias towards action, I tend to chafe against risk-aversion without a clear threat model. The term “Cover Your Ass” gets thrown around to describe extreme end of this - wasteful carefulness.

I am now on a team that is business-critical, and one of the lowest layers of the engineering stack. We have many customers for the code we write, including ourselves. There are some kind of things we can break without any consequence - low-priority monitoring, for example. There are things that if they break, it will be obvious, and someone can step in and fix it - like an auto-remediation service that can replaced by hand-operations.

Then there failures that are very hard to reverse, are hard to detect, and that heavily affect other teams and end user - like breaches in our durability and consistency guarantees. If something is permanently deleted, there is no undoing. If an inconsistency is introduced, there may not be repercussions until much later, when system invariant check begin to fire or users notice. These are the areas where defaulting to risk-aversion is the right thing to do.

You can’t guarantee nothing will go wrong, of course. And a good post-mortem will take goodwill into account. But there’s a rule of thumb you can you use: if there’s a serious outage, how would you justify your actions to engineering leadership? If you look at the testing and rollout strategy you’ve planned, and think about ways in which the system might fail, and you can sincerely tell your boss’s boss’s boss that you’ve made reasonable tradeoffs even if something goes wrong, then you’re good to go.

It’s a balance, like all things. There are probably situations where there is an urgent reason to metaphorically scuba dive that is worth the unknown-unknown nature of the risk, even with high stakes. People’s intuitions and risk-friendliness also vary based on personality, and how they’ve seen things fail in the past. A lot of growing as an engineer is fine-tuning that initial response to design decisions. Now when I see code that has a lot of side effects, that relies on implicit state, or that is clever but not clear, it makes me uncomfortable, because I emotionally remember having to debug similar issues in the post. Sometimes the situation is different, and the scary things really are justifiable, or less scary. That discomfort is worth exploring with an open mind, rather than fighting the last war over and over. Going forward, I hope to expand my intuition for system design as well.

I just wrapped up a couple of years working on things in the general space “release engineering” “continuous integration” and “delivery/auto-update”, with a bit of “build scripting”.

I now irrevocably think about software with two things in mind:

versioning

release process

These two bullet points are very closely related but quite the same thing. Just like software architecture, they are easy to over-engineer if you add a lot of structure too early, or to under-invest in if you wait too long. Ad-hoc designs that were totally logical and understandable at the time often become unwieldy and impair productivity.

First, some definitions.

Versioning

Let’s take versioning to mean any way to point to a snapshot of code. The formats can vary widely. Every commit in git is a distinct version described by a hash and some number of “refs” (branches and tags). End user software is usually published with a version - at the time of this example I’m running Chrome 64.0.3282.167 and PyCharm 2017.2.4 Build # PY-127.4343.27, on a MacBook with MacOS Sierra Version 10.12.6 G1651212.

In addition, versions are often used to embed a pointer into another piece of software, specifying that it should be incorporated as a dependency. There are programming language and operating system specific package managers designed to help us keep track of our versions and do upgrades (npm, apt, pip, brew), and configuration management tools to deploy changes of those versions (chef, puppet, salt, ansible). Dependency upgrades often cause cascading work to maintain compatibility, even just for consumer software. I personally have spent an unreasonable amount of time trying to install the right version of Java or Silverlight (and maybe had to upgrade my browser) to view a webpage or video.

The version formats themselves encode information about release ordering and hierarchy, with any number of digits/letter, a date stamp, a unique unordered id or hash, or a catchy name. Numerals may have hidden information. At Dropbox we used to use even/odd middle digits to keep track of whether a Desktop Client build was experimental, and Ubuntu’s major number is the year of release, the minor is the quarter, and even major numbers imply “Long Term Support”.

Release Process

This is a convenient segue into release process. Ubuntu does not want to have to maintain backwards compatibility forever, so it explicitly publishes a contract on the life cycle of the code. We are not all on the same version of Chrome (if you use Chrome), depending on whether you booted your laptop recently, how successful the auto-updater is, and maybe even whether you did some manual intervention like accepting new permissions or performing a force upgrade. Most larger projects have some way to feed higher risk-tolerance users earlier updates, with internal versions and/or alpha/beta testing programs. All of these versions have some cadence, some threshold of testing or quality, and some way they handle “ship-stopper” bugs. Rollouts can go even beyond the binary version and have A/B testing via dynamic flags to change code pathways depending on the user.

There is a proliferation of dependency management tools because when one piece of software relies on another, they have to pay attention to new releases, evaluate whether it’s a breaking change, maybe patch the code, potentially introduce bugs (from dependency change, from the code change, or from unveiled assumption mismatch between the two1), and then release… maybe causing similar effort for someone upstream. Not doing this work often causes a panic when end of support happens on a critical library.

What’s more, the same dependency might be used in more than one place in the same codebase or company. Can they be upgraded at the same time? Did someone assume they’d always be pinned together?2 Do they have dependencies on each other, further expanding the decision/testing matrix?3

The catch

I’m more or less convinced that when people say they like the low overhead of small companies, they’re not just talking about polynomial communication cost or political hierarchy. When you don’t have very much code, you don’t have very many dependencies.

So, why can’t we all share a standard three digit version number with strict hierarchy, and take out some of the complexity? Sometimes versioning feels like continuously reinventing the wheel - I have had to work on several almost-the-same little libraries that parse and do common operations on version numbers.

The problem is that version numbers are a manifestation of the developers’ assumptions of the life cycle of the code. Developers may want to record the date of the release, or have an ordering hierarchy, or specify build flags, or mark a version as backwards incompatible. In some cases the ecosystem enforces some really strict rollout process. Python’s packages are all versioned in the way I described above, because pip provides a contract for the author and end user.4 You get a single snapshot in time, rolling monotonically forward on a single logical branch. End users can cherry pick and patch changes if they want to, but it’s up to them to manage that complexity, eat the consequences of using a new version of a package that is insufficiently battle tested, and upgrade if e.g. a security vulnerability is found.

This doesn’t work well when the consumer is not another software project. Users generally hate having to manage their own updates, and it’s your bad if they end up with a buggy or insecure product. For-profit software orgs also need to figure out what to build, trying experiments early and often. At Dropbox a couple of years ago, we realized that even/odd thing wasn’t going to cut it. We needed to move faster, and to be able to segment metrics by the alpha/beta type even with overlapping rollouts. So we redesigned the system to be a simple three numbers, a major version to indicate a distinct “release train” (ever-hardening logical branch) that would go out to all users, middle/minor version 1-4 to indicate build type, a final point version always incrementing as builds are created on each release train. We made the necessary refactors, thought carefully about the ordering consequences because Dropbox doesn’t support non-manual downgrades, and wrote up a post for our forums users. But we didn’t leave enough space in between the minor versions! When we wanted to indicate new types of thing, like to break out versioning of some subcomponents or introduce a different type of experimental build, we were stuck, and had to do the same headache all over again.

Lessons learned

The end lesson, as hinted above, is to approach versioning and release like architecture. Implement what is useful to you now, but think about assumptions and whether they could change. Design it in a way so that adding new things is possible, and ideally not too good. Evolving is good, but thrashing on design just causes confusion and overhead for you and your collaborators.

Release processes, like testing, should serve some quantifiable good for your quality. Do you actually care if this thing breaks? Then have longer and more verification periods. Just throwing spaghetti on the wall? Not so much. Don’t put code in stage/alpha/etc for a shorter amount of time than you can actually find the bugs you care about, or you’re just doing pretend quality assurance. And if that time is too long, improve you telemetry so you can find bugs.

Regardless, have a really really good plan for last minute changes, including bugfixes for emergent issues The biggest take away I got from writing this series of blog posts for Dropbox is that your product is only as good as the test pass/canary period since the last code change. You can and should design code and processes to default to turning things back to a known good state in an emergency. Hot fix after hot fix only works if you can push quickly and are tolerant to poor quality. It doesn’t work well if you need strong consistency, durability, availability, or care about miffing users with a buggy interface. Quick feedback loops are key - as long as you consciously think about the price you willing to pay for that feedback, whenever it be in hardware costs, user loss, or developer time.

Fun bug I’ve heard about recently: a new hardware specification uncovered a bug in the Go compiler. Go figure ;)

I have seen a system like this. It was good for forcing everyone to be on the same old version forever :).

AKA “Migrapocalypse”.

Turns out this is a bit more flexible than I’d realized, including allowing a variety of pre and post fixes, thanks @Sumana Harihareswara! Check out PEP440 for more detail. Pip is still pretty prescriptive.

I’ve been programming for about 6.5 years, in some capacity or another. I don’t know if you ever try to remember what it was like to be a past version of yourself, but I do sometimes, and it’s hard. Remembering not knowing something is really especially difficult - this is a big challenge in teaching.

I’m better at programming than past me. I’m MUCH better at getting things to work reliably. My effort is more efficiently expended, while my standards for “working” have changed - the code I write might be part of a collaborative project, roll up into a larger system or service with emergent behavior, or have a lot of potential edge cases.

This is what I remember of some key turning points in my “make it work” strategies. Like most learning, a lot of my knowledge came from failure educational misadventure.

Run the code

Like many people I first learned to program in a class. I studied physics, so our goals were usually to get a simulation working to demonstrate a concept, but i could in theory turn in my code without running it at all. If I did, 9 times of of 10 there would be some glaring bug.

So the first line of the defense in getting code to work is to run the code. Sometimes this is still enough - when writing little scripts to assist me I often write once and run once, or a handful of times with minimal troubleshooting. Sometimes, when the environment is tricky or the computation takes a long time, it is actually quite hard to do.

Print statements

Running code only tells you if the inputs and outputs are correct, and maybe any side effects that you can obviously observe. Intermediate checkpoints are incredibly useful. One of the first tools I learned in MatLab for physics computation was disp.

Code review

In my college programming classes, we were encouraged to work with a lab parter. Getting someone else to look at your code can smoke out obvious mistakes - just the other day a coworker pointed out that a bug I thought was complicated was just a misspelling of the word “response”.

Out-of-the-box debugging tools

My second Computer Science course was Data Structures and Algorithms in C++ – which meant memory management. My labmate and I were absolutely stuck on some segfault, until a friend a couple of courses ahead at the time (and later a contributing member of the C++ community) came by and showed us Valgrind, in exchange for oreos. I didn’t totally get what was going on, but having an independent program with a view onto my totally malfunctioning, un-print-statementable code was key.

Static analyzers

You know what’s great? IDEs. During college, I’d used environments given to me by my instructor to learn a language - Dr. Racket, Eclipse, MatLab, etc. I learned Python at Recurse Center (then called Hacker School), in the summer of 2013, and for the first time I had total choice over my workflow. After experimenting with vim plugins, I found PyCharm, and a lot of the rough edges of Python were (and continue) to be smoothed.

You know what’s great about PyCharm? It has a bunch of built in static analyzers, things that parse the code and look for known pitfalls, on top of standard syntax highlighting. PyCharm now supports type hints, which I cannot bang the drum of enough.

You might be thinking “my compiled language doesn’t have these problems”, but languages do not (and maybe should not) implement every feature you want by default. I was delighted to learn recently that there is a comment syntax and checker for null exceptions in Java, and seems to be similar to “strict optional” for Python.

Unittests

My first software engineering job at Juniper Networks was the first time I wrote code other people had to reuse and rely on. I worked on a team of about twelve people. I was encouraged to write unittests, and reviewers would block code reviews if I didn’t.

The thing is, writing unit tests is hard! There is a cost to any kind of testing, and an up-front cost of ramping up on a new set of tools. The big barrier to entry for me was learning how to effectively mock out components. Mocking means replacing real, production-accurate bits of code with fake functions or objects that can parrot whatever you tell them. My tech lead had written his own mocking library for Python, called vmock, and evangelized it to the entire team. Some other parts of code used the standard Python mock library. They had different philosophies, different APIs, and different documentation.

I still run into barriers to entry in learning a new testing paradigm today. I wanted to add a tiny bit of logging to an unfamiliar codebase recently, and reading and understanding the test codebase felt like a huge amount of overhead. Thankfully the existing unit tests on this code saved my butt - that tiny bit of logging had about 3 SyntaxErrors in it.

This brings up why unit testing is useful:

It reduces the chance that small units of code are broken. If 10% of your code has bugs, then the chance that both your test and production code have complimentary bugs that cause false positives should be less than 1%.

It documents the expected behavior for posterity. Assuming the tests are run regularly and kept green, the tests are forced to stay up to date with the code, unlike docstrings or external documentation. It’s a great way to remember what exactly the inputs and outputs to something should look like, or what side effects should hapen.

At this point, unit tests have become such a habit that I often write them for untested code in order to understand it. I also write them for anything new that I will share with others. Unit tests can be the happy outcome of laziness and fear. Writing a unit test when you understand code well is less work than running it over and over by hand, and you’re less likely to embarrass yourself by presenting someone else with broken code.

Integration tests

Early at Dropbox, where I currently work, I was trying to fix a regression in the screenshots feature on a new operating system version that wasn’t yet in wide release. I wrote some new unit tests, ran the code on the target system, and felt pretty confident. In the process, due to subtle differences in the operating system APIs (which I had mocked out) I broke it on every version of the platform except the one I was repairing. It rolled out to some users, who caught it. I could probably point you to the forums post with the reports :/.

Then I learned something about the limitations of unit tests:

Mocks encode your assumptions, and can lead to false positives.

Unit tests are intended to pass or fail deterministically, and therefore cannot rely on outside dependencies. It’s very easy to inaccurately mock a web API, for example. Even if you’re very careful about mocking as little as possible, you will feed the tests a constrained sets of inputs and outputs that may or may not reflect real usage.

Unit tests don’t test UI well.

Any person executing the code would have noticed that a dialog didn’t pop up saying their screenshot was saved. But it would be really annoying and repetitive to run that code on every single platform every time I made a small change.

Enter integration tests. Integration tests often affect multiple system, often from the outside, and usually test features end-to-end. This means they take more thought and time to write. They are are also necessarily more expensive and are more likely to fail inconsistently. Frankly, they are not always worth the expense to write and maintain, but when the tradeoff is between an explosion of manual test cases or integration tests, integration tests are way to go. It is an interesting exercise to figure out how to insert test hooks, and may force you (like unit tests) to improve the modularity of the code.

Slow rollouts, flagging, and logging

My first project at Dropbox touched our installer, a pretty important bit of code for the success of the application. I had written unit tests and done a battery of manual tests. When the new feature was starting to roll out, we got error reports from a handful of users, so we halted rollout, turned off the feature, and furiously investigated.

The root cause was something I never would have thought of - the name of a directory had been changed at some point while the Dropbox was installed, which broke some assumptions in the application surfaced by my code.

My three lessons here: even integration tests or thorough manual tests only catch your known unknowns. You’re still vulnerable to the things you cannot anticipate. Also, having ways to quickly turn off code, from the server if possible, is fantastic for limiting the exposure of risky code. Third, testing in a small subset of users only works if you know the important high level of what’s going on while the code is testing, either from event logging “step complete” or error reporting.

Acceptance testing

This is the rare case where I think I learned from best practices recommended by others. If you are going to cut over from one system to a supposedly parallel one, it is much better the run the new one “dark” and confirm they produce the same output rather than do the swap and wonder why things seem to be different, or why it’s falling over under realistic load. A recent surprise application: hardware misconfigurations. Now I help run a continuous integration cluster, and trying to add new hardware to expand capacity actually bit us when some machines were misconfigured.

“Belt and suspsenders”

A year or so ago I worked a migration of our built application binaries storage. Build scripts are really hard to test end-to-end without just making a build, since they mostly call other scripts and move files around, and I didn’t want to make too many extra builds. I already had a neat little acceptance testing plan with a double write. But it turned out I had runtime errors in the new code, which caused the script to crash.

The takeaway: when there’s something critical and hard to run or write a test for, it’s better to build a moat around your new bit (in this case, a try/except and logging would have done), and fall back to the old code if it errors for any reason. The tricky thing here is that the moat-code itself can be buggy (who amongst us has not mistyped the name of a feature flag on the first try, not I), so that moat-code needs to be well tested using the previous tools.

Configuration is code

A lot of code is at least a little dependent on its running environment. It can be as simple as the flags passed into the main function, or as complicated as the other applications running on the machine at the same time.

I’ve relearned the maxim “configuration is code” over and over in my current context, working on a contiguous integration cluster. For one example, we have a few “pet” machines that have been configured over time by many people. It is neigh on impossible to duplicate these machines, or look to a reference of how they were configured. This is why containers and tools like Chef and Puppet are useful. For another, our scripts run via a coordination service that has a UI with hand-editable JSON configuration. While it may be nice to be able to change the flags passed into scripts with little friction, there is no record of changes and it’s difficult to deploy the changes atomically.

I hope you enjoyed these accounts, and I may add on as I misadventure more and learn more :)

Part two of my blog post for Dropbox, “Accelerating Iteration Velocity on Dropbox’s Desktop Client” came out last week. I find emergent properties and trying to understand and engineer for complexity really interesting. One of the cool things about my job is that I get to think about the emergent properties of both code and people (specifically my coworkers). The article is about what has worked for us.

This week has been a good week for finishing writing. If you’d like to see what happens when I have a dedicated editor, the feedback of other close colleagues, and a professional audience read this post on the Dropbox Tech blog. It is the first of two on the things my larger team (Desktop Platform) did in 2016 to speed up the rate at which we can release new versions of the Dropbox Desktop Client.

Definition and Background

In software, exceptions date from the late 1960s, when they were introduced into Lisp-based languages. By definition, they are intended to indicate something unusual is happening. Hardware often allows execution to continue in its original flow directly after an exception, but over time these kinds of “resumable” exceptions have been entirely phased out of software. You can read more about this in the Wikipedia article, but the important point is that these days, software exceptions are essentially a way to jump to an entirely different logical path.

Intuitively there are good reasons for exceptions to exist. It is difficult to anticipate the full scope of possible inputs to your program, or the potential outputs of your dependencies. Even if you were able to somehow able to guarantee that your program will only ever be called with the same parameters, cosmic rays happen, disks fail, and network connectivity flakes, and these can manifest in a variety ways that all result in your program needing to metaphorically vomit.

In Python

I primarily program in Python, which is a very flexible programming language. Given that it has “duck typing”, values are happily passed around as long as possible until it is proven that computation cannot succeed. I had noticed a tendency in my own programming and in other code I read to use exception handling as program flow control. Here is a really small example that I saw (paraphrased) in code earlier this week:

If I run this, all that will ever happen is either foo will be returned, or None will be returned, right?

(no.)

There are a lot of other possible exceptions in this code. What if the data doesn’t turn out to be a dict? As far as I know, this would raise a TypeError. What if data isn’t valid JSON? With the standard json library, it would be a ValueError or JSONDecodeError. What if there isn’t even a file called filename? IOError. All of these are indicative of malformed data, rather than just a value that hasn’t been set.

When I look at this however, it feels a little weird… like it’s not the best practice. The exception handlers are being used for two fundamentally different things: an expected, but maybe unusual case, and violation of the underlying assumptions of the function.

Plus, trying to think of every single possible source of exceptions is arduous. It makes me nervous now, even in this toy example of four lines of code non-try/except code. What if the filename isn’t a str? Should I add another handler? There are already a lot of except clauses, and they’re starting to get in the way of the readabilily of the code.

Ok, new idea. Clearly the IndexError should be replaced with a simple get. Probably that would be more performant anyway (maybe some other time I’ll delve into the implementation of exception handlers and their perfomance). Around everything else, I could throw in a general try/except.

Suddenly I’m getting a MalformedDataError on every run of this function… Am I missing the file? Is it formatted correctly? Eventually, in a hypothetical debugging session, I’d read closely and add some print statements and figure out that I accidentally used the wrong json load function: load and loads are sneakily similar.

The general try/except is disguising what is clearly a third kind of issue - a bug within my code, locally in this function. Try as I might, the first draft of my code regularly has this kind of “dumb” mistake. Things like misspellings or wrong parameter order are really hard for humans to catch in code review, so we want to fail local bugs early and loudly.

Aside: Exception hierarchies

The possibility of conflating two problems that raise the same exception, but need to be handled differently, is a nerve-wracking part of Python. Say my typo had caused a TypeError in line 16 of the 2nd code snippet, by, for example, trying to index with a mutable type, return data[['foo']]. Even if I tried to switch to a except TypeError around just that line, in case the JSON object was not a dict, the local-code bug would not be uniquely identified.

Conversely, when I write my own libraries, it can be overwhelming to decide whether two situations actually merit the same exception. If I found a list instead of a dict, I could raise TypeError, which is builtin and seems self-explanatory, but might get mixed up with thousands of other TypeErrors from other parts of the stack. Or I could a custom exception WrongJSONObjectError, but then I have to import it into other modules to catch it, and if I make too many my library can become bloated.

I could rabbit-hole further on this code, exploring more potential configurations. Maybe I should check the type of foo before returning. Maybe I could try to catch only the errors I definitely know aren’t my fault, and then stick a pass and retry in there in case there’s some transient error. Hey, filename me be on a network drive and connectivity is flaky. The proliferation is huge, so it’s time to get axiomatic.

It’s worth noting that all of the above full examples pass a standard pep8 linter. They are all patterns that I’ve seen somewhere in thoroughly tested and used production code, and they can be made to work. But the question I want to answer is, what are the best practices to make code as correct, readable, and maintainable as possible?

Classifying exceptions

So far we’ve enumerated three ways exceptions are used:

1. Unusual, expected cases.

It seems better to me to just not use exceptions, and try an if statement instead of relying on the underlying code to fail. The only exception (hehhh) to this rule that I can think of is if there is no way to catch a expected-unusual case early, but I can’t think of a good example for this. The Python language has made the design decision to make StopIteration and handful of other program flow oriented concepts an Exception subclass, but it’s worth noting they are not a subclass of StandardError. Probably our own custom subclasses should respect this nomenclature, using Error for something gone wrong and Exception for something unusual but non-problematic.

Another subtle facet: do we fold these mildly exceptional cases into the expected return type, by maybe returning an empty string if foo expected to be a string, or return something special like None? That is beyond the scope of this specific article, but I will give it more thought. I recommend looking into mypy, a static type analyzer for Python, and its “strict optional” type checking to help keep if foo is None checks under control.

2. A logical error in this section of code.

Logical errors happen, since writing code means writing bugs. Python does its best to find and dispatch obviously line-level buggy code with SyntaxError.

In fact, SyntaxErrors AFAIK can’t be caught, which is a pretty good indicator of what should happen when executing locally buggy code - fail fast and with as much explicit information as possible on location and context. Otherwise, local bugs that are not raised immediately will show up as a violated expectation in the next layer up or down of the code.

That brings us to the next type:

3. Other things are effed.

Some other code broke the rules. Ugh. It feels like it shouldn’t be my job to clean up if something else messed up. It’s really attractive, when starting out on a project, to just let all exceptions bubble up, giving other code the benefit of the doubt. If something consistently comes up in testing or production, maybe then it’s worth adding a handler.

But, there is some some more nuance here. In our examples above, the “culprit” of errors could be many different things. They roughly break down into the following:

Problems with dependent systems. The callee of this code did not meet my expectations. Perhaps I called a function, and it raised an error. Could be an error due to a bug in their code, or it could be something further down the stack, like the disk is full. Maybe it returned the wrong type. Maybe the computer exploded.

Problems with depending systems. The caller of the code did not meet my expectations. They passed in a parameter that isn’t valid. They didn’t do the setup I expected. The program was executed in Python 3 and this is Python 2.

However, there are plenty of places where the distinction between these possibilities doesn’t seem particularly obvious. Did the callee raise an error because it failed, or because I violated its assumptions? Or did I pass through a violated assumption from my caller? In the examples above, IOError would be raised by json.loads(filename), regardless of whether the filename didn’t exist (probably the caller’s problem) or the disk was broken (the callee’s problem, since the ultimate callee of any software is hardware).

Design by Contract

The concepts might sound familiar if you’ve heard much about Design by Contract, a paradigm originated by Bertrand Meyer in the late 1990s. Thinking about our functions explicitly as contracts, and then enforcing them like contracts, is actually potentially the solution to both of the top-level problems we’ve come across: attributing error correctly and handling it effectively.

The basic idea behind Design by Contract is that interfaces should have specific, enforceable expectations, guarantees, and invariants. If a “client” meets the expectations, the “supplier” must complete the guarantee. If the “supplier” cannot complete the guarantee, at a minimum it maintains the invariant, and indicates failure. If the “supplier” cannot maintain the invariants, crash the program, because the world is broken and nothing makes sense anymore. An example of an invariant is something like, list size must not be negative, or there must be 0 or 1 winners of a game.

In application

The language Eiffel was designed by Meyer to include these concepts at the syntax level, but we can port a lot of the same benefit to Python. A first step is documenting what the contract actually is. Here is the list of required elements for a contract, quoted from the Wikipedia page:

Acceptable and unacceptable input values or types, and their meanings

Return values or types, and their meanings

Error and exception condition values or types that can occur, and their meanings

Side effects

Preconditions

Postconditions

Invariants

Most Python comment styles have standard ways of specifying the first three item. Static analyzers, like mypy or Pycharm’s code inspection, can even identify bugs by parsing comments before you ever run the code. Taking the last version of our example from above, and applying the Google comment style + type hints, we might end up with something like this:

We even got a new piece of information! I hadn’t been forced to think about the return type previously, but with this docstring, I realized foo is expected to be an int, and added a bit of code to enforce that.

Side effects and invariants

Generally speaking there aren’t good ways of enumerating or statically checking the execution of side effects, which is a reason to avoid them when possible. A common example would be updating an instance variable on a class, in which case the side effect would be hopefully obvious from the method name and specified in the dostring. At a minimum, side effects should obey any invariants.

This tiny code snippet doesn’t have an obvious invariant, and I theorize this is because it is side-effect free. Conceivably, another function in the module would mutate the dictionary and write it, and our invariant would be that filename would preserve a JSON encoded dictionary, maybe even of a certain schema. The other classic example is a bank account: a transaction can succeed or fail, due to caller error (inputting an invalid currency) or callee error (account database isn’t available), but the accounts should never go negative.

Preconditions and postconditions

Preconditions are things that must be true before a function can execute. They are they responsibility of the “client” in the business contract analogy, and if they are not met, the “supplier” doesn’t have to do anything. Postconditions are the criteria by which the function is graded. If they are not met, and no error is indicated, the bug can be attributed to that function. This maps really cleanly to the caller and callee problems we delineated above.

Left up to the programmer is where to draw the line between pre- and post-. In our toy example above, I’ve implicitly decided to make the preconditions pretty strict: filename must exist and be a JSON encoded dictionary. I could have chosen to, for example, accept a file with a single integer ascii-encoded integer, or return None if the file doesn’t exist. However, I might have been more strict: I do allow dictionaries with a missing foo value, and simply return None, and I’m willing to cast the value at foo to an integer even if it’s a string or float.

Having seen a lot of code that tries to lump similar things together with zillions of if statements, usually in the name of deduplication or convenience and at the cost of unreadability, I would argue that stronger preconditions are better. The deduplication is better done via shared helpers with strong preconditions themselves. Casting it int is probably even a bad idea… but I’ll leave it as is given that the requirements of this program are pretty arbitrary.

My postconditions are pretty simple in this case: return the int or None that corresponds to the value of foo at that path.

Identifying violations

A lot of the ideas in this post are based on this article by Meyer, which has a bit of a vendetta against “Defensive Programming”. In the defensive paradigm, we would anticipate all possible problems and end up with a check at every stage. I added a client to the get_foo function to demonstrate this:

Indeed, this is really verbose. But there isn’t a hard and fast rule that the article offers for who is supposed to check. The main thing it argues explicitly against is the exception handler usage I first identified as potentially dangerous above: special but expected cases. My interpretation of Meyer’s work, in the context of Python, is that assertions can be used to make it really clear what is at fault when something goes wrong. In fact, that should be the primary goal of all exception handling: ease of debugging.

I think this the final version I would go with:

import json
FOO_FILE = 'somefile.json'
def get_foo(filename: str) -> Optional[int]:
"""Return the integer value of `foo` in JSON-encoded dictionary, if found.
Args:
filename: Full or relative path at which to look for the value of 'foo'. File must exist and be a valid JSON-encoded dictionary.
Returns:
The integer-casted value at key `foo`, or `None` if not found.
"""
with open(filename, 'rb') as fobj:
data = json.loads(fobj)
assert isinstance(data, dict), 'Must be a JSON-encoded dictionary'
if 'foo' in data:
try:
foo = int(dict.get('foo'))
except TypeError:
return None
else:
return None
def foo_times_five(): -> int
"""Print 'hello' the number of times that foo specifies"""
if not os.path.isfile(FOO_FILE)):
return 0
foo = get_foo(FOO_FILE)
assert isintance(foo, (int, None)), "Foo should be an integer value or None"
if foo is None:
return 0
else:
return foo * 5

My rationale is the following:

A static type checker would take care of the filename not being a string.

A failure of opening the file would be easily attributed to the caller, given the docstring. So, my caller takes care of that check, since it’s using a module-level variable it may not entirely trust (it could get mutated at runtime).

JSONDecodeError is also pretty easily to interpret. Plus, there isn’t really a way to assert for the validity of the file other than to use the json library’s own contracts.

I generally find failing to index a presumed dictionary to be confusing error to debug, especially since mypy and and json itself can’t help us out to catch type issues from validly deserialized data. Therefore I make an explicit check for the type of data.

foo_times_five wants to ensure its own guarantees, and since the * operator works in strings and lists and so on as well as ints, I added an assertion.

Also, in addressing the custom error vs builtin error question, I didn’t think it was necessary to make my own exception subclass. All the assertions I added should completely disrupt the flow of the program, and AssertionError comments are effective for communicating what went wrong.

Handling violations

You’ll notice that my two functions have drastically different ways of handling exceptional cases once they occur. foo_times_five returns an int almost at all costs, while get_foo raises more and asserts more. Once again, this is a really toy example, but the goal is to originate the exception as close to the source of the error as possible. As I think about i, further, there seems to be b

At the boundary of a self-contained component, because that component has violated an invarient and needs to be reset. Self-contained should be emphasized - if the component shares any state, then it could leak that invariant violation to other components before being torn down. This is an argument for distinct processes per component, and I think good part of the reason why microservices, that are deployed and rebooted independently, are increasingly popular. The Midori language is really opinionated about this, (and that blog post is a long-but-good read on how they came to their exception model). Any error that can not be caught statically results in the immediate teardown and recreation of an entire process, which is called “abandonment”.

A great big try/except and automatic reboot clause at the highest application level. This is essentially the equivalent of #1, but we’re giving up on everything. The downside is that any bug that makes it to production will cause crashloops, so this needs to be paired with good logging and reporting.

Really close to code that failed. An important distinction is that it should be near the code that could not fulfill its contract - not at the first level that happened to not be able to continue execution. Interestingly, most of the examples I can think of also qualify as #1, like retrying after network errors, server 500, or refused database connections. We just happened to already have well-defined ways of communicating that something has gone wrong across these “component” boundaries.

Wrapping callee libraries that are designed to have control-flow oriented exceptions.

I would add two more debug-facilitating rules: leave some trace every time an exceptions is strategically swallowed, and be as consistent as possible (perhaps, via documented invariants) about the criteria for exception recoverability across environments and parts of the codebase. Here is why:

A war story

At work, we used to have a general-purpose exception handler that would report stacktraces to a central server, but behaved somewhat differently in “debug” and “stable” contexts. In debug situations, it would bubble up the exception, causing a crash for the developer to notice and fix, since they were right there editing code anyway. In stable, it would continue as is. The intention was noble - we want to minimize crashes for end users. But this made it really difficult for developers to reason about what would happen when their code was promoted to stable, and in a worst case scenario the application could continue running after an error in a untested, unknown state. This problem was thrust into the light when a unittest began failing only when the stable environment flag was on. We systematically got rid of the handler, and also now run all tests with a mock-stable configuration on every commit to guard against other configuration-dependent regressions.

I’m going to talk about something really boring. That thing is packaging and distributing Python code.

Motivation

I have been programming primarily in Python for about 3.5 years now, happily creating virtualenvs and pip installing things - and sometimes yak shaving non-Python dependencies of those libraries - without any idea how python code is packaged and distributed. And then, one day I had to share a library my team had written with another team, and lo and behold, I wrote my first requirements.txt file. I was inspired to peek under the hood.

It turns out some people think that Python packaging is sucky. For most of my purposes - working on small pet projects or in fully isolated development environments maintained by another engineer, this works fine. The problem is that the ONLY thing that Python tries to encapsulate is Python code - not dependencies, not environment variables, etc. There is some more exposition about that from other people if you’re interested: example.

The biggest hurdle to getting started was understanding the lingo.

Important terms

The PyPI (Python Package Index), previously known as the Cheese Shop, has a glossary with everything that you might want to know. Here are the ones I was previously confused about:

module - In Python, a single .py file is a module. This means that Python can be organized without classes quite nicely. You can put some methods and constants into a file and tada, a logically isolated bit of code.

package - A collection of modules. The tricky thing is that it can either be an import package, which is module that contains other modules that you can import, or a distribution package, described by distribution below. In colloquial usage it seems to mean “a collection of Python code published as a unit”.

distribution - An archive with a version and all kinds of files necessary to make the code run. This term tends to be avoided in the Python community to prevent confusion with Linux distros or larger software projects. Many distributions are binary or built, shortened to bdist, and are binary, platform-specific, and python-version specific. sdist or source distributions are made up only of source files and various other data files you would need. As you might expect, they must be built before they are run.

egg - A built packaging format of Python files with some metadata, introduced by the setuptools library, but gradually being phased out. It can be put direcly on the PYTHONPATH without unpacking, which is kind of nice.

wheel - The new version of the egg. It has many purported benefits, like being more implementaton-agnosticmm and faster than building from source. More info. About two thirds of the most popular Python packages are now wheels on PyPI rather than eggs. Citation.

I also found it mildly interesting to figure out how this somewhat fractured environment came to me.

A brief history

There is a chapter of The Architecture of Open Source dedicated to nitty gritty of python packaging. The CliffNotes version is that there was some turbulence over Python’s packinging libraries, setuptools and distutils. distutils came first, but it lacked some really integral features… like uninstallation. Setuptools was written to be a superset of distutils, but inherited some of the same problems. One key issue is that the same code was written to take care of both publishing and installing Python packages, which meant bad separation of responsibility.

Meanwhile, there was a also some friction between easy_install and pip. Easy_install comes automatically with Python, can install from eggs, and has perfect feature parity on Windows, but pip has a much richer feature set, like keeping track of requirements heirarchies and (most of the time) uninstallation, but is more finicky.

I may follow up with more on how virtualenv and pip actually work, stay tuned!

One day at work I was presented with a problem. We use Amazon Web Services’ EC2
for a lot of things. An instance was storing lots of data in a specific directory, and it ran out of space.

The machine at hand: an Ubuntu 14.04 instance, the Trusty Tahr. I made a blank micro instance for demo purposes.

The solution: Elastic Block Store, or EBS.

What is an EBS volume?

From the Amazon documentation,

An Amazon EBS volume is a durable, block-level storage device that you can attach to a single EC2 instance.

Basically it’s a bit of storage that can have a filesystem and be treated by the operating system mostly like it would a physical drive you add or remove from your computer. However, it’s connected the the server over the network, and can therefore be umounted and saved or shifted from one instance to the next. They’re a really core AWS product - most EC2 instances these days are “EBS-backed”, meaning an EBS volume taken from a standard snapshot is mounted at the root directory. You can turn off the instance and keep all its data intact if you want.

The other kind of instance is “instance store”, which doesn’t persist and mostly seems to be around for legacy reasons.

Mounting

What does all of this mounting mean? Presumably no steeds are involved.

Well, a computer’s files can be spread across a bunch of different storage devices. Mount is the Unix command that tells your operating system that some storage with a filesystem is available and what its “mount point” is, i.e. the part of the directory tree for which it should be used. If you were dealing with a physical computer sitting on your lap or desk, you might use mount to indicate a new SD drive or DVD. As you might expect, umount is its opposite, and disengages the storage.

So… where’s my EBS volume?

First of all, talking specifically about AWS, you have to do some configuration dances before you get an actual extra EBS volume anywhere near your instance. As you can see from the output above, there is already a device called /dev/xvda1 hanging out at the root directory, which is the root device.

When you’re creating an instance, there’s a handy button that says “Add new volume” and then you have to choose a name for your volume, use a specific snapshot, and various other specifications. The names are things like “/dev/sdb1”.

What does this name mean? The helpful popover says:

Depending on the block device driver of the selected AMI’s kernel, the device may be attached with a different name than what you specify.

Wat?

A little history of technology

EBS volumes are “block devices”, which means they read data in and out in chunks, do some buffering, allow random access, and in general have nice high level features. Their counterpoint is the character device, which is less abstracted from its hardware implementation. They are sound cards, modems, etc. Their interfaces exist as “special” or “device” files which live in the /dev directory.

The options for block device names in the AWS console pretty much all start with sd[a-z]{2}. This stands for “SCSI disk”, which is pronounced like “scuzzy disk”. SCSI is a protocol for interacting with storage developed long ago. It was the newfangled way to connect storage to your computer compared to IDE cables, which are those ribbons of wire I at least remember my dad pulling out of the computer to detach a floppy drive in the ’90s.

More nomenclature: hd would indicate a hard disk, and xvd is Xen virtual disk. Everything in Amazon is virtualized, and the tricky thing is that only some kinds of instances understand that and adjust operations to account for it. Those that do, like Ubuntu 14.04, convert the sd name you chose to xvd. Anyway, you can suss out what exactly your instance’s OS will think the device is name using this reference.

What does this all mean PRACTICALLY? Like how do I actually get to the volume?

Filesystems

I originally thought that filesystems were part of the operating system, since with OSX or Windows (which is what I’ve used historically), you’d have to go out of your way to use a non-standard one. With Linux, choice is abundant. This article has a lot of good things to say clarifying the many Linux filsystem options. For devices that are not crazily large, the standard on Linux these days is ext4 - it’s journaling (meaning it checks in both before and after writes which prevents corruption issues), it’s backwards compatible (so you can mount its predecessor ext filesystems as ext4), and it’s fast.

Important note: EBS volumes do not come with a filesystem already set up. That line about telling you to “specify” in the error above is a red herring. First, you have to make a filesystem with mkfs.

Some more resources

Welcome to my blog! I was going to get right down to business, but this felt like a good time to set out what exactly
I aim to accomplish by writing things publically available on the internet.

This is primarily a “technical” blog. I certainly love writing witty one-liners, posting about my personal life, or taking pictures of myself eating things - but I have twitter, facebook, and a silly tumblr for all of those respectively.

My goals:

To keep a record of stuff that I’ve learned recently for reinforcement and future reference.

To be useful to others looking for resources. For right now that will be a lot of systems and command-line utility oriented things, hopefully accessible to those who (like me) don’t have a computer architecture course under their belts.

To give myself a reason to go in depth into topics outside of work, like for example, learning how to set up a Jekyll-based blog.