Sex, software, politics, and firearms. Life's simple pleasures…



A teaching story

The craft of programming is not a thing easily taught. It’s not so much that low-level details like language syntax are difficult to convey; it’s more that (as I’ve written before) “the way of the hacker is a posture of mind”.

The posture of mind is more essential than the details. I only know one way to teach that, and it looks like this…

19:51:23 esr | You know, at some point you should build Open Adventure and play it. For that geek heritage experience, like admiring Classical temple friezes.

ianbruene | note to self: play advent

19:53:56 esr | I actually think this should count as (a very minor) part of your training, though I’m not sure I can fully articulate why. Mumble mumble something mimesis and mindset. It was written by two guys with the mindset of great hackers. If playing that game gets you inside their heads even a little bit, you’ll have gained value…

19:55:08 ianbruene | I already had a flag set to fix the non-human-readable save problem someday, if no one else got to it first. Kind of hard to do that without playing at least *some* of the game.

19:55:15 esr | :-)

19:55:29 ianbruene | (non human readable saves irk me)

19:55:35 esr | An excellently chosen exercise, apprentice!

19:57:57 esr | Since you’ve brought it up, let’s think through some of the tradeoffs.

20:08:20 esr | 1. Format is rather less fragile than you think (I’ll explain that).

20:08:33 ianbruene | I only knew of the ludicrous example of the MS formats previously

20:09:27 esr | 2. The FORTRAN save/restore code was really nasty and complicated. It doesn’t get as simple as it now is unless you have fread/fwrite and a language with structs.

20:10:07 esr | Now, why this isn’t as bad a choice as you think:

20:10:31 ianbruene | (pre-guess: everything is pre-swizzled)

20:11:56 esr | No. It’s because any processor this is likely to run on uses the same set of struct-member alignment rules, what’s called “self-aligned” members. So padding won’t bite you, just endianness and word size.
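
For readers who haven't seen it, here is a minimal sketch of the fwrite/fread dump being defended. It uses a hypothetical, cut-down `struct game_state`; the real Open Adventure struct is much larger. With self-aligned members, `score` lands at an offset that is a multiple of `sizeof(long)`, so the padding before it is predictable. Byte order and the width of `long` still differ between machines, and that is exactly the portability hit esr is accepting.

```c
#include <stdio.h>

/* Hypothetical stand-in for the game-state struct (illustration only). */
struct game_state {
    int  loc;       /* current location */
    int  turns;     /* turns taken so far */
    long score;     /* wider member: alignment padding may precede it */
    char flags[3];  /* trailing padding likely follows */
};

/* Dump the whole struct as raw bytes, padding included.
 * Returns 0 on success, -1 on a short write. */
int save_binary(FILE *fp, const struct game_state *g)
{
    return fwrite(g, sizeof *g, 1, fp) == 1 ? 0 : -1;
}

/* Read the bytes back. This is only meaningful if the file came from a
 * build with the same layout: same alignment rules, endianness, and
 * word size. */
int restore_binary(FILE *fp, struct game_state *g)
{
    return fread(g, sizeof *g, 1, fp) == 1 ? 0 : -1;
}
```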

20:12:54 ianbruene | *blink* oh…. another win from the intervening steamroller of standardization

20:13:17 esr | Precisely, another steamroller win.

(Editor’s note: ianbruene is aware that when the original ADVENT was written, greater diversity in processor architectures meant that structure-member alignment rules were more variable and harder to predict. So back then you took a bigger portability hit from using a structure dump as a save format.)

20:13:26 ianbruene | read it before, been quite a while (and didn’t have anything to put it into practice)

20:14:22 esr | So, reread with this save format in mind. Go through the reasoning to satisfy yourself about what I just claimed.

20:14:32 esr | It won’t take long.

20:16:06 esr | Now, this does *not* mean a memory dump would be a good format for anything much more complex than this game state. We’re sort of just below the max-complexity threshold here.

20:16:44 esr | And we do get screwed by endianness and word-size differences.

20:17:39 esr | But…let’s get real, how often are these save files going to move between machines? This is not data with a long service lifetime!

20:19:49 esr | OK. Continuation of exercise:

20:20:28 esr | What’s the simplest way you can think of to design an eyeballable save format?

20:21:14 ianbruene | *thinks* (given that I know nothing of the internal structure of advent)

20:22:45 esr | You don’t need to. Look at the structure definition.

20:23:47 * | ianbruene looking up struct

20:28:34 ianbruene | well *one* simple way of doing it would be to do a (I forget what the format is called) var=value\n format, with the save function being a giant printf of doom and the load function being a giant switch of doom.

20:28:44 ianbruene | I don’t think that is *the* simple one though

20:30:06 esr | That sort of thing is generally called a “keyword/value” format. It is the most obvious choice here. Can you think of a simpler one?
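
Here is what the keyword/value dump, the “giant printf of doom,” might look like. This is a minimal sketch against a hypothetical four-member struct with only `int` fields. The real game state has many more members, and each one would add a line to the format string and an argument to the call.

```c
#include <stdio.h>

/* Hypothetical, cut-down game state (illustration only). */
struct game_state {
    int loc;
    int turns;
    int score;
    int lamp_on;
};

/* One keyword=value line per member, all in a single fprintf().
 * Returns 0 on success, -1 if the write failed. */
int save_text(FILE *fp, const struct game_state *g)
{
    int n = fprintf(fp,
                    "loc=%d\n"
                    "turns=%d\n"
                    "score=%d\n"
                    "lamp_on=%d\n",
                    g->loc, g->turns, g->score, g->lamp_on);
    return n < 0 ? -1 : 0;
}
```

The output is eyeballable (`loc=3`, `turns=42`, …). The cost is that the format string and the argument list must be kept in sync by hand.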

20:30:42 esr | (I’m not sure I can.)

20:31:16 ianbruene | ok, we know the struct “shape”, could arrange for all of type X, all of type Y, etc. to be in contiguous spans. Sequence either hardcoded or using a build time generator for the var names. Hmmmmmmmm….. while it has a certain elegance it seems brittle and complex

20:31:25 esr | Yes. It would be that.

20:31:38 esr | Pretty classic example of “too clever by half”, there.

20:33:35 ianbruene | ok, ignore assuming a shape, it would be *possible* for a code generator to simply look at the struct and create a pair of load/save functions from it, using either the internal names, or special comments in the definition
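
Short of an external code generator, C’s own preprocessor can do a limited version of this. The sketch below uses the X-macro idiom: a single hypothetical `GAME_FIELDS` list drives the struct definition, the dumper, and the restorer, so the three cannot drift apart. It assumes every saved member is an `int`; mixed types would need a type argument in each `X()` entry.

```c
#include <stdio.h>
#include <string.h>

/* The one place the saved members are listed (hypothetical fields). */
#define GAME_FIELDS \
    X(loc)          \
    X(turns)        \
    X(score)        \
    X(lamp_on)

/* Expand the list into struct members. */
struct game_state {
#define X(name) int name;
    GAME_FIELDS
#undef X
};

/* Expand the list into one fprintf() per member: "loc=%d\n", etc. */
int save_fields(FILE *fp, const struct game_state *g)
{
#define X(name) \
    if (fprintf(fp, #name "=%d\n", g->name) < 0) return -1;
    GAME_FIELDS
#undef X
    return 0;
}

/* Expand the list into the strcmp() chain for restore. */
int restore_fields(FILE *fp, struct game_state *g)
{
    char line[128], key[64];
    int value;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%63[^=]=%d", key, &value) != 2)
            return -1;  /* not a keyword=integer line */
#define X(name) \
        if (strcmp(key, #name) == 0) { g->name = value; continue; }
        GAME_FIELDS
#undef X
        /* unknown keyword: ignored */
    }
    return ferror(fp) ? -1 : 0;
}
```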

20:33:59 ianbruene | I don’t think the format itself can get any simpler than key=value though

20:34:12 ianbruene | there isn’t much complex structure in the save.

20:34:21 esr | I think you’re right.

20:34:33 ianbruene | it isn’t a bunch of logical blocks of different rooms and characters

20:35:09 ianbruene | if there were there would be useful tradeoffs in how you grouped things

20:35:45 esr | Good! That was a sound insight, there.

20:34:59 esr | Now, you’ve correctly described a way to implement dump as a giant printf.

20:35:27 esr | Do you as yet know enough C to sketch restore?

20:35:39 ianbruene | *thinks*

20:36:24 ianbruene | ok, the template I’m thinking is similar to the packet handling code for mode 6 (python side). but it is more complicated due to C

20:37:29 ianbruene | read until get a token, slice off the token, feed the token into the Switch of Doom

20:37:47 ianbruene | the SoD sets any vars it gets tokens for

20:38:07 ianbruene | if the file is well formed you get all the data you need

20:39:49 esr | Alas, you’ll find actually doing restore in C is a PITA for a couple of reasons. One is that you can’t switch on a string’s content – C switch only accepts integer values.

20:40:31 ianbruene | grrrr, so you have a big ugly set of str compares in if statements

20:40:37 esr | Indeed, you’re going to write a big fscking if () with a whole bunch of strcmp() guards.
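
A restore along those lines might look like this. It is a minimal sketch against a hypothetical four-member struct: read one line at a time, split it at `=`, and test the keyword with `strcmp()` in turn. With the real struct, the chain would have one branch per saved member.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down game state (illustration only). */
struct game_state {
    int loc;
    int turns;
    int score;
    int lamp_on;
};

/* Restore from keyword=value lines. switch can't dispatch on string
 * contents, so each keyword gets its own strcmp() guard.
 * Unknown keywords are ignored; returns -1 on a malformed line. */
int restore_text(FILE *fp, struct game_state *g)
{
    char line[128], key[64];
    int value;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* %63[^=] grabs everything up to the '=' (at most 63 chars). */
        if (sscanf(line, "%63[^=]=%d", key, &value) != 2)
            return -1;
        if (strcmp(key, "loc") == 0)
            g->loc = value;
        else if (strcmp(key, "turns") == 0)
            g->turns = value;
        else if (strcmp(key, "score") == 0)
            g->score = value;
        else if (strcmp(key, "lamp_on") == 0)
            g->lamp_on = value;
        /* else: unknown keyword, skipped */
    }
    return ferror(fp) ? -1 : 0;
}
```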

(Editor’s note: There’s another way to do it, driven from iterating through a table of struct initializers, that would be slightly more elegant but no simpler.)
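
One plausible reading of that table-driven approach: each entry pairs a keyword with the member’s `offsetof()` position, so a single loop handles save and another handles restore. This is a minimal sketch assuming a hypothetical struct whose saved members are all `int`. A real table would also need a type tag per entry, which is part of why it ends up no simpler than the if/strcmp chain.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down game state (illustration only). */
struct game_state {
    int loc;
    int turns;
    int score;
    int lamp_on;
};

/* One initializer per saved member: its keyword and where it lives. */
struct field {
    const char *name;
    size_t offset;
};

static const struct field fields[] = {
    { "loc",     offsetof(struct game_state, loc) },
    { "turns",   offsetof(struct game_state, turns) },
    { "score",   offsetof(struct game_state, score) },
    { "lamp_on", offsetof(struct game_state, lamp_on) },
};
#define NFIELDS (sizeof fields / sizeof fields[0])

/* Save: iterate the table instead of hand-writing each printf. */
int save_table(FILE *fp, const struct game_state *g)
{
    for (size_t i = 0; i < NFIELDS; i++) {
        const int *p = (const int *)((const char *)g + fields[i].offset);
        if (fprintf(fp, "%s=%d\n", fields[i].name, *p) < 0)
            return -1;
    }
    return 0;
}

/* Restore: a scan of the table replaces the if/strcmp chain. */
int restore_table(FILE *fp, struct game_state *g)
{
    char line[128], key[64];
    int value;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%63[^=]=%d", key, &value) != 2)
            return -1;  /* malformed line */
        for (size_t i = 0; i < NFIELDS; i++) {
            if (strcmp(key, fields[i].name) == 0) {
                int *p = (int *)((char *)g + fields[i].offset);
                *p = value;
                break;
            }
        }
    }
    return ferror(fp) ? -1 : 0;
}
```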

20:40:38 ianbruene | I forgot about that

20:41:29 ianbruene | this is something where your code style becomes *very* important or it will be an ugly, incomprehensible mess

20:41:45 esr | Yes. Now you begin to see why I went to stupid fread/fwrite and stayed there.

20:42:31 ianbruene | and the obvious way to do it in something like python (magic introspection to class elements or dict keys) doesn’t work here

20:43:03 esr | Right. Replacing this binary dump with something clean and textual is not a terrible idea, but really only justifies itself as a finger exercise for a trainee, like playing scales to learn an instrument.

20:43:17 ianbruene | I see

20:43:39 ianbruene | hence why you mentioned in the blog that it was very low priority

20:43:46 esr | Right. The absence of introspection is the other lack in C that makes it a PITA.

20:44:17 esr | And you’ve extracted most of the value of the finger exercise by thinking through the design issues.

20:45:02 ianbruene | when you get to do it introspection is a gigantic win, makes it difficult to remember how bad it is when you don’t get it

20:45:08 esr | Yes.

20:46:05 esr | Those of us who started in LISP learned this early. Took forty years for the rest of the world to catch up, and they’re only getting there now.

20:46:40 ianbruene | I have done only the barest toying with lisp, barely even hello world level

20:47:00 ianbruene | but even that (coupled with some reading of a lisp book) changed the way I thought

20:47:35 ianbruene | plenty of times I’ve hit a snag of “this would be *so much easier* if I could do a lisp macro in python”

20:47:52 esr | Indeed.

20:48:33 ianbruene | incidentally, has GvR used lisp at all? the impression I’ve heard is that he doesn’t like the lispy features?

20:49:11 esr | He doesn’t. Back in the late ’90s I practically had to arm-wrestle him into not killing lambdas.

20:51:37 esr | I think I might edit this dialog into a blog post. Start it with the Heinlein quote about the ideal university: a log with a teacher on one end and a student on the other.

20:51:51 * | ianbruene grins

The foregoing was transcribed from IRC and lightly edited to fix typos, fill out sentence fragments, and complete 80%-articulated ideas we mutually glarked from context. A few exchanges have been slightly reordered.


34 thoughts on “A teaching story”

Anti-cheat could be solved by putting a (non-standard, or at least not obtainable from the text representation with standard tools) hash of the rest of the file as a data element. They could hack the code to remove the check, but if they can do that they can hack to cheat anyway.
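
A sketch of that idea, under some assumptions: a 32-bit FNV-1a hash, seeded with a hypothetical salt compiled into the game, is written as a final `check=` line after the keyword/value body. FNV-1a is not cryptographic, and anyone who reads the source or binary can recover the salt. As the commenter says, this only deters editing the save with a text editor.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical salt; recoverable by anyone with the source or binary. */
static const char SAVE_SALT[] = "xyzzy";

/* 32-bit FNV-1a over the salt, then over the save body. */
uint32_t save_checksum(const char *data, size_t len)
{
    uint32_t h = 2166136261u;                   /* FNV offset basis */
    const unsigned char *p = (const unsigned char *)SAVE_SALT;

    for (size_t i = 0; i < sizeof SAVE_SALT - 1; i++) {
        h ^= p[i];
        h *= 16777619u;                         /* FNV prime */
    }
    p = (const unsigned char *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Write the newline-terminated body, then "check=<hex>" as the last line. */
int write_checked(FILE *fp, const char *body)
{
    size_t len = strlen(body);
    if (fwrite(body, 1, len, fp) != len)
        return -1;
    return fprintf(fp, "check=%08lx\n",
                   (unsigned long)save_checksum(body, len)) < 0 ? -1 : 0;
}

/* Returns 1 if the last "check=" line matches the text before it. */
int verify_checked(const char *text)
{
    const char *tag = NULL, *p = text;
    unsigned long stored;

    /* Find the last line that starts with "check=". */
    while ((p = strstr(p, "check=")) != NULL) {
        if (p == text || p[-1] == '\n')
            tag = p;
        p++;
    }
    if (tag == NULL || sscanf(tag, "check=%lx", &stored) != 1)
        return 0;
    return (uint32_t)stored == save_checksum(text, (size_t)(tag - text));
}
```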

As for avoiding the if/strcmp chain: Solvable with an alphabetized table of { name, information needed for offset or type (or perhaps just an integer that can be switched on) } and bsearch.
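
Here is a minimal sketch of the bsearch variant, using the “integer that can be switched on” option. The keyword table, the `field_id` codes, and the struct are hypothetical. The table must stay sorted in `strcmp()` order, or `bsearch()` will silently miss entries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down game state (illustration only). */
struct game_state {
    int loc;
    int turns;
    int score;
    int lamp_on;
};

/* Integer codes that a switch can dispatch on. */
enum field_id { F_LAMP_ON, F_LOC, F_SCORE, F_TURNS };

struct keyword {
    const char *name;
    enum field_id id;
};

/* Alphabetized by name; bsearch() depends on this ordering. */
static const struct keyword keywords[] = {
    { "lamp_on", F_LAMP_ON },
    { "loc",     F_LOC },
    { "score",   F_SCORE },
    { "turns",   F_TURNS },
};

static int cmp_keyword(const void *key, const void *elem)
{
    const char *name = key;
    const struct keyword *kw = elem;
    return strcmp(name, kw->name);
}

/* Map a keyword to its id; -1 if the keyword is unknown. */
int lookup_field(const char *name)
{
    const struct keyword *kw = bsearch(name, keywords,
                                       sizeof keywords / sizeof keywords[0],
                                       sizeof keywords[0], cmp_keyword);
    return kw ? (int)kw->id : -1;
}

/* The lookup turns the string into something C's switch accepts. */
int apply_field(struct game_state *g, const char *name, int value)
{
    switch (lookup_field(name)) {
    case F_LAMP_ON: g->lamp_on = value; return 0;
    case F_LOC:     g->loc = value;     return 0;
    case F_SCORE:   g->score = value;   return 0;
    case F_TURNS:   g->turns = value;   return 0;
    default:        return -1;          /* unknown keyword */
    }
}
```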

>Anti-cheat could be solved by putting a (non-standard, or at least not obtainable from the text representation with standard tools) hash of the rest of the file as a data element. They could hack the code to remove the check, but if they can do that they can hack to cheat anyway.

Yes, true. I thought this through and concluded it was not worth the bother.

Aww, but cheating on my games is one of the things that taught me about binary file formats to begin with. And since I had to do it in DOS, without a decent hex editor, it taught me some C file IO and typecasting too. Don’t encrypt your saves on single-player games: That deprives kids of a learning experience! :-P

I did that! In DOS, using the internal editor of Norton Commander. It was a text editor, but it did not truncate on a null byte, and preserved line endings, so good enough to patch a couple of bytes as long as you didn’t need to enter a null byte from the keyboard. (For all other byte values, the Alt+nnn method worked.)

The lack of introspection is inherent in C because of its simplicity. C++ is what you get when C gets the Christmas-tree treatment. Because C is so simple, you could write a metaprocessor (like the C “beautifiers”; there aren’t many for Python?) to add something like introspection.

As for key/value or some other format, there are JSON or XML libraries that will do it for you. They aren’t standard the way printf is, but if you want one, that’s where to look.

(A C – to Javascript so it could run on any web browser…).

I find it strange that sometimes you seem to want to be plain, bare metal, and other times you want to assume a modern processor (e.g. IEEE floating point or better).

Doom (sans sound) runs on an ESP32. And some IoT thermostats.

Much of the “why” of certain practices involved squeezing things into 8k of memory. See the ATtiny85-based Arduinos. Even if you don’t hack hardware, fitting things into the finite and unexpandable space of a $2 microcontroller is a strengthening exercise and a test. I think my feature-complete (but ugly; it isn’t fread/fwrite) Arduino FAT32 code takes under 12k, with notes for reentrancy and FAT16 support. And that includes full SD card support over SPI at full clock speed (with some assembly).

I had an apprentice too, though I interacted with him less. I told him to learn on a small-format system. It forced him to learn to be efficient. It wasn’t easy to debug, though it was easier than when I came up.

I can often be found on the following Freenode channels: #ntpsec #gpsd #libertarians #newguard #icei #reposurgeon #irker. If you want me rather than one of my projects, ##esr. This conversation took place on #ntpsec

With text adventures, there’s a third option for save files. Instead of recording the current state of the game, record the sequence of valid commands the player gave to reach that state from the (known) start state. The program already knows how to convert that sequence into a valid state, because that’s how the game is played. And it’s an uncheatable format, because no sequence of valid commands can put the game in a state the player can’t reach.

The disadvantage, of course, is that while a copy of the current state has a fixed size, a copy of commands that reach that state can grow without limit – unless the game imposes a time limit, restricting the number of commands a player may issue in a single run.
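
Here is a toy sketch of the command-log approach. The game, its three commands, and the start state are all hypothetical. Only commands the game accepts are appended to the log. Restore replays the log from the known start state, and a hand-edited log containing an illegal move is rejected.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy game: just a location and a turn count. */
struct game_state {
    int loc;
    int turns;
};

/* Apply one command deterministically; returns -1 if it is invalid. */
int apply_command(struct game_state *g, const char *cmd)
{
    if (strcmp(cmd, "north") == 0) {
        g->loc += 1;
    } else if (strcmp(cmd, "south") == 0) {
        if (g->loc == 0)
            return -1;          /* can't go south from the start */
        g->loc -= 1;
    } else if (strcmp(cmd, "wait") != 0) {
        return -1;              /* unknown command */
    }
    g->turns++;
    return 0;
}

/* Play a command; only valid commands reach the log. */
int play(struct game_state *g, FILE *log, const char *cmd)
{
    if (apply_command(g, cmd) != 0)
        return -1;
    return fprintf(log, "%s\n", cmd) < 0 ? -1 : 0;
}

/* Restore by replaying the log from the known start state. */
int replay(FILE *log, struct game_state *g)
{
    char line[64];
    struct game_state s = {0, 0};   /* known start state */

    while (fgets(line, sizeof line, log) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        if (apply_command(&s, line) != 0)
            return -1;              /* not a legal playthrough */
    }
    *g = s;
    return 0;
}
```

The log grows by one line per accepted command, which is the unbounded-size cost described above.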

Eric, I don’t know when/if the conversation did take place, or even if you perhaps already know this, but there are macros in Python: https://github.com/lihaoyi/macropy
(apologies for the nonclickable URL)

Just save the commands that are input by the user, so that they can be written to the savefile (replacing meaningless ones with something short that will eat a turn); then all that’s left to do is saving the initial PRNG seed so that it can be written out as a ‘seed’ command.
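
Here is a sketch of the seed part, under stated assumptions. The game uses its own hypothetical PRNG rather than `rand()`, so replays match across C libraries; the constants are the well-known Numerical Recipes LCG. The save file begins with a `seed N` line. In this toy, every logged line is simply a turn-eating command, and a real game would parse and execute each one.

```c
#include <stdio.h>

/* Hypothetical toy state: PRNG state, turn count, and a random event tally. */
struct game_state {
    unsigned long rng;
    int turns;
    int monsters;
};

/* 32-bit LCG; masked so results don't depend on the width of long. */
static unsigned long next_random(struct game_state *g)
{
    g->rng = (g->rng * 1664525UL + 1013904223UL) & 0xFFFFFFFFUL;
    return g->rng;
}

/* One turn: a purely illustrative 1-in-4 wandering-monster roll. */
static void take_turn(struct game_state *g)
{
    g->turns++;
    if (next_random(g) % 4 == 0)
        g->monsters++;
}

/* The save file starts with the initial seed. */
int write_save_header(FILE *fp, unsigned long seed)
{
    return fprintf(fp, "seed %lu\n", seed) < 0 ? -1 : 0;
}

/* Reseed from the header, then replay one turn per logged command. */
int replay_save(FILE *fp, struct game_state *g)
{
    char line[64];
    unsigned long seed;

    if (fgets(line, sizeof line, fp) == NULL ||
        sscanf(line, "seed %lu", &seed) != 1)
        return -1;
    g->rng = seed;
    g->turns = 0;
    g->monsters = 0;
    while (fgets(line, sizeof line, fp) != NULL)
        take_turn(g);   /* a real game would execute the command here */
    return 0;
}
```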

This would be more complicated in the end because you’d need to (rigorously) test it, and for that you need a (human-readable) save format for the game state, thus getting you back to the initial problem (with some slight differences), but apart from that it’s simpler.

And the best part: this makes it impossible to cheat!

— EDIT: it looks like I was in a race condition with somebody else, and lost. Whoops :-)

The first is to read the “21st Century C” book by Ben Klemens. I think it is a good book about modern C and its surrounding ecosystem, but I am not an expert.

The second: if possible, try to avoid reinventing the wheel. There are nowadays many open-source single file (micro)libraries for C/C++; here is one list: https://github.com/nothings/single_file_libs (including parsing ini-files).

If all else fails, you can use libMarpa library, which can create a parser (and lexer) based on EBNF notation (or its own extension, SLIF notation, which includes notes for lexer, parser config and procedural parsing).

On those microlibraries: just a few days ago I rejected a patch from someone that added PNG support to DeuTex by bundling lodepng into the source. I didn’t want the burden of manually maintaining lodepng whenever it got updated, and I thought linking to libpng was the better idea.

I suppose everyone has their own tastes, but I certainly didn’t feel comfortable with such a thing :)

I’ve had the pleasure of being in these kinds of situations myself. Sometimes the best ones were where you couldn’t tell which end of the log was which, i.e., the “teacher” was learning as much as the “student”.

Ahhhh, now that is the right kind of clever. Bravo! If we’re going to take on this sort of additional complexity, that would be the smart way to do it.

For anyone wondering why, it’s because programs are better at generating this kind of boilerplate code correctly than people. The Python to generate the parsing code would be easier to verify and extend than the handwritten parser.

This is pretty much the approach used by Naughty Dog to define game structures and even data (initial spawn points, particle parameters, enemy-AI parameters, animation scripts, etc.), for their Uncharted and The Last of Us games. They used Scheme instead of Python as the language in which to implement the code generators, and from there it was easy to define a DSL for game data, allowing programmers and content artists alike to write new scripts or tweak existing ones. The .h files defining the structures, and C++ serialization/deserialization code, were generated by their compiler, written in Scheme, alongside a binary blob that contained the data itself.

Sidenote: the idea of muggles being productive in Lisp is not new; there are reports of secretaries writing their own macros in Lisp in Multics Emacs. In this case, with an appropriate DSL and tooling to hide the nasty details, for an artist to script a new animation that got snarfed directly into the C++ engine to run in the live game was a doddle compared to having the programmer code the animation in C++ with artist input.

Sorry if that doesn’t seem obvious to those of us from later generations. Most of us grew up with Windows or the Macintosh; asking someone my age or younger to touch a config file, let alone computer code, when their primary work isn’t development or “devops” is asking too much.

Emacs itself has been in slow decline since the nineties, perhaps dimly remembered by old-timers who saw it running on VAX iron; its nonstandard, RSI-inducing UI and the abstrusity of elisp compared to Python or JavaScript are oft-cited contributors to its decline. The best way to get a young programmer to try Emacs these days is to reskin it so it works like vim. Subjecting nonprogrammers to it is virtually unconscionable.

In particular the game_t struct. As you can see advent’s data is both simple, and there isn’t very much of it. In the IRC chat I mention that it doesn’t have data for multiple rooms or characters. If it did then some of the fancier serialization methods would be useful because you could write something like saveRoom() and then use it several times on different rooms.

As it stands implementing a different system would bloat the code tremendously, adding a crapton of potential bugs along the way.

> 20:42:31 ianbruene | and the obvious way to do it in something like python (magic introspection to class elements or dict keys) doesn’t work here

I am curious as to why the “obvious way” isn’t to store game state in a big dict and toss it to json.dump() and json.load()? Sure it isn’t simple under the hood, but it’s already written and it comes with the language. The json module is always my first instinct when I have a bunch of data to serialize from python.