The Basic Toolbox

In the software engineering course I’m teaching this spring, I often find myself saying things like “you need to know a scripting language” or “everyone should be able to run a code coverage tool.” Finally, the other day, a student stopped me and asked for the whole list: what, in my opinion, is the collection of tools that someone graduating with a CS degree should know how to use? Of course I couldn’t answer this on the spot, but I’ve been thinking about it since then. The basic idea is that for almost any common situation, you should have a decent tool at hand and be able to start solving problems with it without too much fumbling around. (Keep in mind that this is a wish list for self-study: I doubt that any CS program teaches all of these. Also, I didn’t have all of these tool skills when I got my undergraduate CS degree, though I did by the time I got a PhD.)

A version control system: Git is the obvious choice; the main thing you should have is a basic GitHub-centric workflow including pull requests, remotes, dealing with merge conflicts, etc.

A text editor: We all end up using different editors from time to time, but we should each have a solid default choice that does a good job with most editing tasks. It should highlight and indent any common programming language, integrate with a spellchecker, easily load gigantic files, have nice regex-based search and replace, etc. There are plenty of choices; many CS people migrate to vim or emacs.

A graphing program: I routinely use gnuplot, graphviz, and Powerpoint to make figures. Lots of people like matplotlib.

A scripting language: This is for low-grade automation, quick and dirty data analysis tasks, etc. Python and JavaScript would seem like natural choices. Around 20 years ago I was an intern at a networking company and my supervisor popped out of a meeting with some data concerning switch errors, and asked me to do some analysis to locate the underlying pattern. I wasn’t sure how; he handed me a Perl book and I was able to get the job done before the meeting ended.
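As a rough illustration of this kind of quick and dirty analysis, here is a minimal Python sketch that tallies errors by switch, port, and error type. The log format, the sample lines, and the names LOG_LINES and count_errors are all made up for the example; real switch logs will look different.

```python
from collections import Counter

# Hypothetical log lines; the format is an assumption made for illustration.
LOG_LINES = [
    "2024-01-05 10:00:01 switch-a port=3 error=crc",
    "2024-01-05 10:00:02 switch-b port=1 error=timeout",
    "2024-01-05 10:00:05 switch-a port=3 error=crc",
    "2024-01-05 10:00:09 switch-a port=7 error=crc",
]


def count_errors(lines):
    """Count occurrences of each (switch, port, error) combination."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        switch = fields[2]
        # Remaining fields are key=value pairs such as port=3.
        attrs = dict(field.split("=", 1) for field in fields[3:])
        counts[(switch, attrs["port"], attrs["error"])] += 1
    return counts


# Print the most frequent combinations first.
for key, n in count_errors(LOG_LINES).most_common():
    print(n, *key)
```

Sorting by frequency is often enough to make a pattern jump out; anything fancier can be added once the data has been looked at.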

A shell language: This is probably bash or PowerShell, but there are plenty of other choices. There’s some overlap with scripting languages but I think there are two distinct niches here: a shell language is for scripting a smallish number of commands, doing a bit of error checking, and perhaps looping or interacting with the user slightly. This sort of job is a bit too cumbersome in Python, Perl, or JavaScript.

A systems language: This is for creating servers, daemons, and other code that wants to go fast, use little memory, have few dependencies, and interact tightly with the OS. C or C++ would be the obvious choices, but Rust and Go may be fine too.

A workhorse language: This is your default programming language for most tasks. It should have a huge collection of high-quality libraries, be pretty fast, run on all common platforms, have a great tool ecosystem, etc. Racket, Java, Scala, OCaml, C#, Swift, or Haskell would be great; even C++ would work.

A pocket calculator: This is your go-to REPL for basic arithmetic and conversions between number representations. It should be near-instantaneous to get answers. For reasons I no longer remember, I use gdb for this, typically multiple times in any work day. Like gdb, old standbys such as bc and dc seem like bad choices. I’m curious what other people do here.

Tools for Programming Languages

There’s no reason these days to use a language that doesn’t have a good tool ecosystem. For any given language you should know how to use its interactive debugger, static and dynamic bug-finding tools, a profiler, a code coverage tool, a build system, a package manager, and perhaps a random test-case generator.

Secondary Tools

There are a lot of other tools that could have gone into my basic toolbox, such as a data analysis tool, a browser language, a cloud-based testing service, a statistics language, a typesetting system, a spreadsheet, a database, and a GUI builder/toolkit. I don’t consider these as fundamental; of course, your mileage may vary.

17 replies on “The Basic Toolbox”

Great list!! I have some questions that I’d love to hear your feedback on, but I wanted to ask you this question up front:

Do you think that knowing a “profiler” is something that is required? I’ve found it invaluable in plenty of circumstances but I’m not sure that it applies generally. Also, the profiler is unfortunately linked to the source language so I’m not sure that it belongs in its own category. Just a question.

Thanks again for the great list and your writing, overall. I love reading your work!

I haven’t found a good pocket calculator. I switch between bc (only used when scripting), awk, shell $((…)) forms, and calc in Emacs. Emacs’ calc can do it all, but it’s a huge thing to learn and I tend to forget how to use it.

As an emacs user I probably should use calc for calculation needs, but I’ve never gotten the hang of it. I vote for the Python REPL as a calculator.

If you are at a command prompt, Python is right at your fingertips. It has infix entry, follows order of operations for arithmetic, uses GNU readline for command editing and history, can do hex, octal, and binary, has FP and arbitrary precision integer arithmetic, features loops and user-defined functions, and operates the same way in Unix or Windows. The ability to save results in variables and refer to them later can also be a big help. The ipython shell is an even nicer Python REPL, although startup is a little slow so you might want to start it in a command window and leave it running.
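A small sketch of the calculator-style features mentioned above, written as a script rather than a REPL session. It uses only Python built-ins; the variable names are just for illustration.

```python
# Conversions between number representations.
as_hex = hex(255)            # '0xff'
as_bin = bin(10)             # '0b1010'
as_oct = oct(8)              # '0o10'
from_hex = int("ff", 16)     # 255

# Literals in different bases mix freely in arithmetic.
mixed = 0x1F + 0b101         # 31 + 5 = 36

# Integers have arbitrary precision; / is float division, // is floor division.
big = 2 ** 100
ratio = 7 / 2                # 3.5
quotient = 7 // 2            # 3

# Results saved in variables can be reused in later expressions.
total = from_hex + mixed     # 291

print(as_hex, as_bin, as_oct, from_hex, mixed, big, ratio, quotient, total)
```

In an interactive session the same expressions can be typed directly, and the most recent result is available as `_`.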

I had hoped not to see PowerPoint mentioned as the first option for presentation tools. Nowadays there’s reveal.js, and other web-based presentation methods. There’s also the old-but-gold Beamer, which is very easy to export to from Org mode in Emacs. Org also exports to reveal.js slides.

For the shell, I’ve started using python a lot more for this–my auto-creduce-assertions script is written in python. Bash gets way too fiddly when you start needing to process output more complex than a grep or do iteration more complex than “for every file.” After having had to rewrite scripts in python several times, I’ve become more proactive about guessing that I’ll need to use python and starting with that instead of trying to drive bash to the frustration limit.

Another point about python is that you can get the networkx package and have fairly ready access to graph library routines, which is invaluable if you’re a compiler developer and spend a lot of time having to think about different kinds of graphs.

My main pocket calculator is Google. If it doesn’t recognize your search query as a calculation, try adding “=” at the beginning. I routinely use it for arithmetic, unit conversion, hex/decimal conversion, even basic trigonometry. My backup calculators are irb (Ruby REPL) and Wolfram Alpha.

I’d say every engineer needs a scripting language in their toolbox, definitely. Even if you only do embedded C development, being able to write a Ruby script to quickly generate a lookup table or test an algorithm is really handy.
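To make the lookup-table idea concrete, here is a minimal sketch in Python rather than Ruby (a swap made only to keep the examples in one language). It prints a sine table as a C array; the table size, amplitude, element type, and the names sine_table, as_c_array, and sine_lut are all assumptions chosen for the example.

```python
import math


def sine_table(size=16, amplitude=127):
    """Return one period of a sine wave, scaled to fit a signed 8-bit range."""
    return [round(amplitude * math.sin(2 * math.pi * i / size))
            for i in range(size)]


def as_c_array(name, values, per_line=8):
    """Format the values as a C array definition (int8_t is an assumed type)."""
    rows = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        rows.append("    " + ", ".join(f"{v:4d}" for v in chunk) + ",")
    body = "\n".join(rows)
    return f"static const int8_t {name}[{len(values)}] = {{\n{body}\n}};"


# Paste the printed text into a C source file that includes <stdint.h>.
print(as_c_array("sine_lut", sine_table()))
```

Regenerating the table with a different size or amplitude is then a one-line change to the script instead of hand-editing a C file.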

Nice article! I used dc for a very long time, but I’d always found it lacking. Last year I wrote clac (https://github.com/soveran/clac), a command line, stack-based calculator with postfix notation that displays the stack contents at all times and updates it as you type. It can be installed with homebrew (just brew install clac) and with package managers for some Linux distributions, but it’s also trivial to compile and install by hand with make install.

If I’m already in the terminal I just use the Python REPL as a calculator. Otherwise I use whatever the native OS one is. Using Python is nice because anything more complicated than simple arithmetic is just an import away.

Joshua, I don’t have any particular feelings about Mercurial but I do think that git+github are so dominant that people should know them.

Tony, I did mean to say that gdb, bc, and dc are all pretty bad options these days. The only reason I keep using gdb is that it has a familiar syntax and I happen to need machine-word-sized computations a lot.
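For anyone who wants machine-word-sized arithmetic in the Python REPL instead, a minimal sketch is to mask results by hand. The helper names u32 and s32 are made up for this example, and the 32-bit width is an assumption; Python integers do not wrap on their own.

```python
MASK32 = 0xFFFFFFFF


def u32(x):
    """Truncate x to an unsigned 32-bit value."""
    return x & MASK32


def s32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


print(hex(u32(0xFFFFFFFF + 1)))   # 0x0: unsigned overflow wraps to zero
print(hex(u32(-1)))               # 0xffffffff
print(s32(0xFFFFFFFF))            # -1
print(s32(0x7FFFFFFF + 1))        # -2147483648: signed overflow wraps
```

This is clumsier than a tool with native word-sized types, which may be part of why a debugger remains convenient for this kind of work.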

Will, I thought about the profiler issue a bit. I do think they are fundamental to performance-oriented programming but on the other hand a lot of what we do is never performance critical so I relegated it to the “PL-specific tool” section.

The “pocket calculator” question hit a nerve. I realize that I use a totally over-powered tool for that: the interactive command-line of our full-system simulator Simics. I usually have a few sessions running at any point in time, and for dealing with binary-hex-decimal conversions and grouping digits, it does a really good job. Probably makes sense since I most often deal with hardware register values or hardware addresses that are very close to the core value prop of Simics.

I really hope that in a few years, people will naturally add a model checking tool to this list, such as the TLA+ toolbox.

To me, the ability to write a bit of “pseudo” code (actually a really precise, yet concise, specification of an algorithm or concurrent/distributed system) and then have the model checker verify that invariants are maintained does feel like a superpower: designing no longer means “playing computer” in your head (or much less so).

One could argue that this goes beyond the “basic toolbox”, but merely knowing about this class of tools is important, so you can either write your own (simpler) variant, or learn to use the current state of the art.
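As a rough idea of what a "simpler variant" might look like, here is a toy explicit-state checker in Python. It is not TLA+ and is nowhere near a real model checker; it only does a breadth-first search over the reachable states of a tiny model and reports the first state that violates an invariant. The model (two processes doing a non-atomic increment of a shared counter) and all names are assumptions made for the example.

```python
from collections import deque

# A state is (pcs, tmps, counter): per-process program counters,
# per-process local temporaries, and the shared counter.
INITIAL = ((0, 0), (0, 0), 0)


def successors(state):
    """Yield every state reachable by letting one process take one step."""
    pcs, tmps, counter = state
    for i in (0, 1):
        if pcs[i] == 0:    # step 1: read the shared counter into a local
            new_pcs = pcs[:i] + (1,) + pcs[i + 1:]
            new_tmps = tmps[:i] + (counter,) + tmps[i + 1:]
            yield (new_pcs, new_tmps, counter)
        elif pcs[i] == 1:  # step 2: write local + 1 back to the counter
            new_pcs = pcs[:i] + (2,) + pcs[i + 1:]
            yield (new_pcs, tmps, tmps[i] + 1)


def invariant(state):
    """Once both processes have finished, the counter must equal 2."""
    pcs, _, counter = state
    return pcs != (2, 2) or counter == 2


def check(initial, successors, invariant):
    """Breadth-first search; return a trace to a violation, or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


trace = check(INITIAL, successors, invariant)
for step in trace or []:
    print(step)
```

The search finds the classic lost-update interleaving, in which both processes read the counter before either writes it. Real tools add a specification language, temporal properties, and far better scaling, but the core loop of enumerating states and checking an invariant is the same idea.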