At edX, we’ve gone through several evolutions of how we configure our primary Django application (edx-platform). We started with a single settings.py, as is standard for Django projects. I completed the migration from that single settings file to a directory of environment-specific settings files in this commit. One of those environment-specific files was aws.py, which read specific settings from JSON configuration files that we could deploy with configuration management software. That setup continues to be the way we inject production configuration to this day.

One downside of this style of configuration, however, is that it requires a redeploy/reboot cycle to update the configuration. It would be better to have the configuration stored in a central location that the system could read from (and update) on demand.

In part 1, we saw a memory leak that was in some sense static: the memory use was unintentional and undesired, but it was allocated only once and didn’t grow as the process ran. The second leak, on the other hand, was a true leak in the classic sense: the memory use appeared to be unbounded.

This is the second of a 3-part series on memory leaks in edx-platform. Read the first part for an introduction to memsee, the tool I used for much of my debugging.

Over the past several weeks, we’ve been contending with several memory issues on edx.org. The first manifested as a sudden increase in the resting memory footprint of our production web processes. The second presented as a classic memory leak while doing offline grading of an entire course. In this post, I’ll go through the steps and tools we used to identify the causes of the first of these leaks.

This is the first of a 3-part series of posts detailing my investigations into two separate memory leaks in edx-platform. In this post, I’ll describe memsee, a tool that Ned and I built during a previous memory investigation, which can help provide insight while diagnosing a leak.

One of the difficulties when writing unit tests is picking input data (or test cases) that expose all of the potential bugs in your code. Often, you write test cases to catch the bugs you know exist (or thought about guarding against), but miss the input that would lead to the bugs that still exist in your code. An alternative to this is property-based testing, where rather than choosing inputs yourself, you let the computer choose inputs for you.

Property-based testing is an alternative approach to unit testing: rather than describing specific input values and the results of executing operations on those values, you write a test that should hold for all input data of a certain form, and then let the test framework feed in data until it either finds an example that fails or runs out of attempts. The granddaddy of all property-based testing libraries is QuickCheck, but re-implementations exist in many languages.
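The idea can be sketched in a few lines of plain Python, with no framework at all (real libraries like QuickCheck or Hypothesis add much more, notably shrinking a failing input down to a minimal counterexample; the run-length encoder and all names here are my own illustration, not from the post):

```python
import random
import string

def encode(s):
    """Run-length encode a string into [char, count] pairs."""
    pairs = []
    for ch in s:
        if pairs and pairs[-1][0] == ch:
            pairs[-1][1] += 1
        else:
            pairs.append([ch, 1])
    return pairs

def decode(pairs):
    """Invert encode: expand [char, count] pairs back into a string."""
    return "".join(ch * count for ch, count in pairs)

def check_property(prop, make_input, attempts=200):
    """Feed random inputs to prop until one fails or attempts run out."""
    for _ in range(attempts):
        value = make_input()
        if not prop(value):
            return value  # a counterexample
    return None  # no failing input found

def random_string():
    """Generate short strings over a small alphabet, so runs are likely."""
    return "".join(random.choice("abc") for _ in range(random.randrange(20)))

# The property: decoding an encoded string returns the original string.
counterexample = check_property(lambda s: decode(encode(s)) == s, random_string)
assert counterexample is None
```

The test never names a specific input; it states a property ("decode inverts encode") and lets the generator hunt for a violation.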

My name is Calen Pennington. I'm a software developer and lead architect at edX,
father to a two-year-old, part-time Haskell hacker, board/card/video-gamer. This blog will primarily
focus on the first of those.
Find me on: