I’m going through The Elements of Computing Systems right now. In chapter six, the task is to write an assembler in the language of your choice. I’m doing it in Rust.

I already have a (horribly hacky) parser for the specified assembly language, but I’d like to do it a little more properly. To that end, I thought I’d use the nom crate to build the parser. (I realize that this is probably massive overkill for writing an assembler, but I want to learn to do things properly in Rust while I’m at it.)

Here’s where I’m at: every command instruction in the assembly language contains one of eight jump strings, specifying whether to perform a jump if the output of the command is less than, equal to, or greater than zero. For instance, “JNE” means “jump if the output is not equal to zero (i.e. either less than or greater than)”.

I would like to parse the eight different strings into a struct

Jump { ng: bool, zr: bool, ps: bool }

and I can basically see two ways to accomplish this using nom.

Write a parsing function, i.e. a function jump(input: &str) -> IResult<&str, Jump>, by hand. This would mean that I have to implement the error handling myself.

Write a separate parser for each of the eight strings and combine them, all using combinators.

Which of these options should I choose? Are there alternatives that I’m missing altogether? Am I completely on the wrong track with this?
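For comparison, here is a minimal sketch of the first option written with the standard library only (no nom). It assumes the seven Hack mnemonics JGT, JEQ, JGE, JLT, JNE, JLE and JMP, and treats the eighth case as the absence of a jump. The `Result<(&str, Jump), String>` return type is a stand-in that only loosely mirrors the shape of nom's `IResult`; the function name and error message are illustrative.

```rust
/// Which signs of the command's output trigger a jump.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Jump {
    ng: bool, // jump if the output is negative
    zr: bool, // jump if the output is zero
    ps: bool, // jump if the output is positive
}

/// Hypothetical hand-written parser (not nom): on success it returns the
/// remaining input together with the parsed `Jump`.
fn jump(input: &str) -> Result<(&str, Jump), String> {
    // Assumed mnemonic table: (mnemonic, ng, zr, ps).
    const TABLE: [(&str, bool, bool, bool); 7] = [
        ("JGT", false, false, true),
        ("JEQ", false, true, false),
        ("JGE", false, true, true),
        ("JLT", true, false, false),
        ("JNE", true, false, true),
        ("JLE", true, true, false),
        ("JMP", true, true, true),
    ];
    for &(name, ng, zr, ps) in TABLE.iter() {
        // `strip_prefix` returns the rest of the input if `name` matches.
        if let Some(rest) = input.strip_prefix(name) {
            return Ok((rest, Jump { ng, zr, ps }));
        }
    }
    Err(format!("expected a jump mnemonic, found {:?}", input))
}
```

With a table like this, the error handling is a single `Err` at the end, so writing the function by hand may be less work than it first appears.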

Does each jump instruction in your assembly language start with J? If no other instructions do, you could start by recognizing “J” with the tag!() macro and then use alt!() for the other two characters (assuming three-letter codes).

Could you post some more info about the jump instructions in your language?
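The prefix-then-alternatives idea can be sketched without nom as follows. This is only an illustration of the control flow: `strip_prefix` plays the role that tag!() would, and the `match` plays the role that alt!() would. The function name `jump_after_j` is made up, and the two-letter suffixes assume the mnemonics JGT, JEQ, JGE, JLT, JNE, JLE and JMP.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Jump {
    ng: bool,
    zr: bool,
    ps: bool,
}

/// Hypothetical helper: returns the remaining input and the parsed `Jump`,
/// or `None` if the input does not start with a known mnemonic.
fn jump_after_j(input: &str) -> Option<(&str, Jump)> {
    // Step 1: recognize the shared leading 'J'.
    let rest = input.strip_prefix('J')?;
    // Step 2: branch on the next two characters.
    // `get(..2)` yields `None` if fewer than two bytes remain.
    let code = rest.get(..2)?;
    let (ng, zr, ps) = match code {
        "GT" => (false, false, true),
        "EQ" => (false, true, false),
        "GE" => (false, true, true),
        "LT" => (true, false, false),
        "NE" => (true, false, true),
        "LE" => (true, true, false),
        "MP" => (true, true, true),
        _ => return None,
    };
    Some((&rest[2..], Jump { ng, zr, ps }))
}
```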

I actually have another question. But first, I need to explain the assembly syntax a little more. Every instruction is of the form <dest>=<command>;<jump>, where each of <dest>, <command>, <jump> has one of finitely many possible values. The ones for <jump> are the ones I was talking about above. <command> is the computation to perform, <dest> is where to save the result of <command>, and <jump> says whether we should jump, depending on the result of <command>.

So I can create parsers for each of these parts separately, and I expect that combining them won’t be much of a problem. There’s just one difficulty: both <dest>= and ;<jump> are actually optional, but <command> is mandatory. Missing <dest> and <jump> should be treated as “don’t save anywhere” and “don’t jump”, respectively.
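To make the shape of the problem concrete, here is a rough standard-library sketch that only splits a line into its three fields, with the optional ones as `Option`. The `Fields` struct and `split_instruction` function are hypothetical names, and the sketch does not validate the field values themselves; that would be the job of the per-part parsers.

```rust
/// The three raw fields of an instruction `<dest>=<command>;<jump>`.
#[derive(Debug, PartialEq, Eq)]
struct Fields<'a> {
    dest: Option<&'a str>, // None means "don't save anywhere"
    command: &'a str,      // mandatory
    jump: Option<&'a str>, // None means "don't jump"
}

fn split_instruction(line: &str) -> Result<Fields<'_>, String> {
    // Everything before the first '=' is the destination, if there is one.
    let (dest, rest) = match line.split_once('=') {
        Some((d, r)) => (Some(d), r),
        None => (None, line),
    };
    // Everything after the first ';' is the jump, if there is one.
    let (command, jump) = match rest.split_once(';') {
        Some((c, j)) => (c, Some(j)),
        None => (rest, None),
    };
    if command.is_empty() {
        return Err(format!("missing command in {:?}", line));
    }
    Ok(Fields { dest, command, jump })
}
```

For example, `split_instruction("0;JMP")` yields no destination, the command `0`, and the jump text `JMP`.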

So my question is, assuming I know how to parse <jump>, how do I parse ;<jump> optionally?

But this parser would return None if there’s no semicolon or if the jump mnemonic can’t be parsed, right? If so, then it can’t distinguish between the legitimate case of there being no semicolon + jump mnemonic at all and the error case of there being a semicolon and an invalid jump mnemonic.
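One way to keep the two cases apart is to commit once the semicolon has been seen: no semicolon means "no jump", but a semicolon followed by anything other than a valid mnemonic is a hard error. Recent nom versions provide a `cut` combinator for this kind of commitment; the sketch below only imitates the idea with the standard library and does not use nom. The function names, the error type and the mnemonic list are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Jump {
    ng: bool,
    zr: bool,
    ps: bool,
}

/// Hypothetical mnemonic parser: returns the rest of the input and the `Jump`.
fn jump(input: &str) -> Option<(&str, Jump)> {
    let code = input.get(..3)?;
    let (ng, zr, ps) = match code {
        "JGT" => (false, false, true),
        "JEQ" => (false, true, false),
        "JGE" => (false, true, true),
        "JLT" => (true, false, false),
        "JNE" => (true, false, true),
        "JLE" => (true, true, false),
        "JMP" => (true, true, true),
        _ => return None,
    };
    Some((&input[3..], Jump { ng, zr, ps }))
}

/// Parses an optional `;<jump>` suffix.
/// - no leading ';'            -> Ok with `None`, input left untouched
/// - ';' plus a valid mnemonic -> Ok with `Some(jump)`
/// - ';' plus anything else    -> Err, because we committed at the ';'
fn optional_jump(input: &str) -> Result<(&str, Option<Jump>), String> {
    match input.strip_prefix(';') {
        None => Ok((input, None)),
        Some(after) => match jump(after) {
            Some((rest, j)) => Ok((rest, Some(j))),
            None => Err(format!("invalid jump mnemonic after ';': {:?}", after)),
        },
    }
}
```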