As a vi user who codes in a GNOME environment with sloppy keyboard habits (you'd think there wouldn't be that many of us, but... surprise!), I hate the GNOME default of having F1 bound to the help menu. I'm constantly hitting F1 when reaching for ESC, so I regularly get annoying help windows popping up.

For a long time, I assumed that it was the top-level GNOME help document -- you know, like "how to use GNOME." I'm still not sure if this was ever the case. But today, I noticed that it's actually showing me the help file for GNOME Terminal. I scoffed at the idea that I would actually care to read the help file for my terminal emulator.

The irony is that if I had read that annoying help file, I might have actually found the answer to my problem. You can disable it (and set a lot of other nice shortcuts) under Edit | Keyboard Shortcuts.

A lot of the code I write these days is for running experiments (or analyzing data from experiments). And I'm always trying to automate my testing more, in order to keep the system from being idle when there are more cases to test. So, given some experimental program E, and a set of inputs to test {1,2,3}, it's tempting to give E the power (through the magic of computer programming) to take in a list of tasks to work on -- so you can tell it "do {1,2,3}" instead of telling it "do 1", then telling it "do 2", and then "do 3".

But there's a problem here, especially if your tests take a long time to complete. If your code crashes (or a test fails) in the middle of "do {1,2,3}" it may not be obvious where in the test sequence the code failed. Or, even if it is obvious, it may not be easy to pick up the tasks from where things crashed.

Instead, it's more robust to write a simple wrapper W that, when given the inputs {1,2,3}, picks them off one at a time, calling E with the individual tasks. It may take a little extra work to do things this way, but it will pay dividends in the long run.

For example, another benefit is intelligent management of repetitions of experiments. Suppose you have tasks {1,2,3}, but you want to run each of them 3 times. A naive approach will run {1,1,1,2,2,2,3,3,3}, but then you have potential ordering bias. And if (say) all three runs of test 1 succeed, but the 2nd run of test 2 fails, what happens to your results?

Using a wrapper W, you can generate the list of experiments, randomize them, and write them to a file. W can track the success or failure of each test, keeping your experiments running but also separating the wheat from the chaff. This approach can also give you a good way to track progress through a set of experiments.
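Here is a minimal sketch of the "generate, randomize, write to a file" step in Python. The function names, the optional seed, and the one-task-per-line file layout are assumptions made for illustration, not a description of any particular tool.

```python
import random


def build_task_list(tasks, repetitions, seed=None):
    """Return every task repeated `repetitions` times, in shuffled order."""
    rng = random.Random(seed)  # a fixed seed makes the ordering reproducible
    schedule = [task for task in tasks for _ in range(repetitions)]
    rng.shuffle(schedule)      # interleave repetitions to reduce ordering bias
    return schedule


def write_task_log(path, schedule):
    """Write one task per line (hypothetical task-log format)."""
    with open(path, "w") as log:
        for task in schedule:
            log.write(f"{task}\n")
```

Calling `write_task_log("tasks.txt", build_task_list([1, 2, 3], 3))` would leave nine lines in `tasks.txt` (a made-up file name), three per task, in random order.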

My "task log" usually looks like a shuffled file with one line for each task, or a directory of files with each file representing one task; this also allows you to modify the task set on the fly -- adding tests to the end of the task log. W doesn't remove a task until it's been completed successfully, but it moves on to subsequent tasks so it doesn't get stuck in a pathological case. (You could have it put failed tests in a separate list.) Anyway, all this allows for better recovery and progress tracking.
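A rough sketch of such a wrapper W in Python, assuming the one-line-per-task file variant. `run_one` is a hypothetical callable that runs E on a single task and returns True on success; the sketch also assumes that nothing but appends happen to the file while W is running.

```python
def read_tasks(path):
    """Return the non-empty lines of the task log."""
    with open(path) as log:
        return [line.strip() for line in log if line.strip()]


def run_task_log(path, run_one):
    """Call run_one(task) for each task in the log; return the failed tasks.

    A task's line is deleted only after it succeeds. Failed tasks stay in
    the file, and W steps past them instead of retrying forever.
    """
    failed = []
    position = 0  # index of the next line to try
    while True:
        tasks = read_tasks(path)  # re-read so tasks appended on the fly are seen
        if position >= len(tasks):
            return failed
        task = tasks[position]
        try:
            ok = bool(run_one(task))
        except Exception:
            ok = False  # a crash inside E counts as a failed task
        if ok:
            tasks = read_tasks(path)
            del tasks[position]  # drop only the line that just succeeded
            with open(path, "w") as log:
                log.writelines(f"{t}\n" for t in tasks)
        else:
            failed.append(task)
            position += 1  # leave the failed line in place and move on
```

If E is a separate program, `run_one` could be something like `lambda task: subprocess.run(["./E", task]).returncode == 0`, where `./E` is a placeholder path. Whatever is left in the file afterwards is exactly the set of tasks that still needs attention.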

Anyway, I know there are lots of names for this general principle of abstraction or compartmentalization, but I'm calling it the "Principle of Least Responsibility" -- with the first working definition being "give a piece of code the least amount of responsibility that is practical". I say "practical", not "possible" because taking it to the extreme could be burdensome.

This is pretty similar to the UNIX philosophy of "Do one thing and do it well." In my case, it specifically means "don't make the code hold the state of a sequence of experiments because all it needs to be responsible for is the single experiment it is currently performing!"

In the moment, it may seem like a great idea to let your code do all the heavy lifting for you, but a little abstraction/functional decomposition can save you a lot of grief in the long run.

I generally have a real aversion to using a full-on SQL implementation like MySQL or Postgres when it's not really necessary... and I typically think it's unnecessary until something forces me to change my mind. When I started writing small projects in Perl, I used to use Berkeley DB key-value pairs — lately I've used Python Pickles for similar purposes. It's simple and quick, which is nice, but it basically forces you to roll your own code.

Lately, everyone's been getting on the SQLite bandwagon, and it's pretty awesome. I've moved to using SQLite as my first choice when storing anything beyond flat text data. It has nice portability characteristics (unlike your homemade solution) and simple backup and export formats. And being able to make queries on the data is great for me, since these days most of my data are experimental results, etc.
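To give a flavor of how little code this takes, here is a small sketch using Python's built-in `sqlite3` module. The `results` table and its columns (`task`, `run`, `seconds`) are invented for the example; a real experiment would have its own schema.

```python
import sqlite3


def open_results(path=":memory:"):
    """Open (or create) a results database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " task TEXT NOT NULL, run INTEGER NOT NULL, seconds REAL NOT NULL)"
    )
    return conn


def record(conn, task, run, seconds):
    """Store one measurement."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO results (task, run, seconds) VALUES (?, ?, ?)",
            (task, run, seconds),
        )


def mean_seconds(conn):
    """Return {task: average seconds}, computed by the database."""
    rows = conn.execute(
        "SELECT task, AVG(seconds) FROM results GROUP BY task ORDER BY task"
    )
    return dict(rows.fetchall())
```

The aggregation happens in the query rather than in hand-written loops, which is most of the appeal over a pickle or a key-value file.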

But another good reason for using an external, portable data sink became obvious when I started to visualize and analyze my data using R. R has the RSQLite package, which made importing data into R for plotting and analysis a breeze. (Or as much of a breeze as anything is in R.) The thing is, roll-your-own formats may be perfectly "good enough" for isolated projects, but the minute you start wanting to use a different tool to view the data, having it already be in a format that is easily accessible is a major win. And if you're like me, you won't always know that in two months you're going to want to use the data in some completely different way. So I feel like I got that capability for "free" just because I decided to store my data in a more structured and portable way.

But SQLite is targeted at a specific niche — projects that would benefit from SQL behaviors but don't need all the robustness and consistency guarantees of "enterprise" databases. While you get good performance (because SQLite works inside your code, rather than through RPCs), if you start wishing for read/write concurrency (say, importing new data and plotting other unrelated data at the same time) you may find yourself frustrated with SQLite's limitations. That's what happened to me. As I started to generate large numbers of plots, I wanted to be able to run multiple scripts (some of which generate plots, some of which update other tables) at the same time — SQLite can balk at this.
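One form the balking can take is a "database is locked" error. The sketch below reproduces it on purpose with two connections from the same process using Python's `sqlite3` module. It only shows the write/write case, the table is made up, and `timeout=0` is chosen so the second connection fails immediately instead of waiting; it is not a reconstruction of the scripts described above.

```python
import sqlite3


def second_writer_is_blocked(path):
    """Return True if a second connection cannot start a write transaction
    while the first connection is still in the middle of one."""
    # isolation_level=None: no implicit transactions, we issue BEGIN ourselves
    first = sqlite3.connect(path, isolation_level=None)
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS results (value REAL)")
        first.execute("BEGIN IMMEDIATE")  # take the write lock
        first.execute("INSERT INTO results VALUES (1.0)")
        try:
            second.execute("BEGIN IMMEDIATE")  # wants the same lock
        except sqlite3.OperationalError:
            return True  # "database is locked"
        second.execute("ROLLBACK")
        return False
    finally:
        # closing discards the uncommitted insert
        first.close()
        second.close()
```

With the default timeout the second connection would wait a few seconds before giving up, which is friendlier but still no fun when one script is importing data for minutes at a time.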

So, I switched to a more traditional SQL database, which itself was relatively painless because of the underlying SQL standard. (R has libraries for it as well — another good reason not to roll your own even if you don't need all that SQL provides.) And in turn, this highlighted another unexpected value for the more "enterprise" systems: caching. Re-running long queries just to tweak a plot is considerably quicker with the big server as opposed to SQLite.

None of this is new information; in fact, it's a pretty textbook tour of the hows and whys of data storage. But for me, it inspired a change of heart, thanks to the convenience factor. In the future, I'll probably start with a Big Dog SQL server for research projects (running on my personal laptop) because it avoids the papercuts encountered when my projects get too big for SQLite. But I'll stick to SQLite for simpler things, to avoid the dependencies created by the more enterprise approach.

Lately I've been really interested in what a craftsman goes through, intentionally or unintentionally, in the process of creation. I use the word craftsman because I'm interested not just in "art" per se, but also (not coincidentally) in things like academic writing, problem solving and engineering. This is a fascinating treasure trove, not just for the answers provided, but for who provided them...

In the Boy Scouts, there is a thing called a "Totin' Chip". It is "both an award and contract in Boy Scouts of America that shows Scouts understand and agree to certain principles of using different tools with blades" (WP). To get the Totin' Chip, which is a paper card (something like a library card), scouts must demonstrate a certain amount of knowledge and responsibility. The Wikipedia page has more on it, of course. The main thing (besides the rules) is that violations of the Totin' Chip code result in one or more corners of the card being removed; when all the corners are gone, you lose your right to tote a blade.

Anyway, I think there should be a "Codin' Chip" -- maybe it's a card, maybe it's an actual chip. If it's a card you lose corners; if it's a chip, you lose pins. Either way, when you lose 'em all, you're done.

Violations can be large or small; for example, not commenting code meant for others to read counts as one, as does using equality to test floating point numbers inappropriately. Using strcpy and the like is definitely in there.
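For the floating point one, here is a quick Python illustration of the violation and a common alternative. The helper name and the tolerance values are arbitrary choices for the example; the right tolerances depend on the computation.

```python
import math


def nearly_equal(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two floats within a tolerance instead of exactly."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


total = 0.1 + 0.2

# The corner-losing version: exact equality on values with rounding error.
exact_match = (total == 0.3)               # False

# The tolerant version.
tolerant_match = nearly_equal(total, 0.3)  # True
```

Exact comparison is still fine when the values really are exact (small integers stored as floats, say), which is why the violation is using it inappropriately rather than using it at all.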