

A popular interview question is coding FizzBuzz. I’ve had experiences similar to what is shown at Coding Horror: I had a description of a trivial piece of code, with carefully posed requirements, that would offer insight into the depth of an applicant’s engineering skill.

I had a sheet with this written down, so I would always say it in the same carefully-crafted way. I lost that some years ago, but here is the problem from memory:

Write a class whose constructor takes two integral arguments, and has members that return their sum and their difference.

Trivial enough? Not even a “problem” with logic to be considered; just a routine microtask that anyone with any fluency can express without having to think about it.

What I expected was to see how well the applicant processed the statement of requirements: making assumptions, or asking for clarification. I figure an experienced engineer would ask what was meant by integral, or use a typedef to abstract the decision of which integer type to use and specify it at only one point. More fluent coders would write it as a template.
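A minimal sketch of the kind of answer I had in mind, written as a template so the choice of integer type is made at a single point (the class and member names here are illustrative, not a required interface):

```cpp
// A sketch of one possible answer: the template parameter abstracts
// "integral", defaulting to int. A typedef inside a non-template class
// would be the other single-point-of-decision approach.
template <typename T = int>
class sum_diff {
    T a, b;
public:
    sum_diff(T a, T b) : a{a}, b{b} {}
    T sum() const { return a + b; }
    T difference() const { return a - b; }
};
```

Nothing here is hard; the point is whether the applicant can produce it at all, and what questions they ask first.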

I was very surprised to find that most applicants who got past the screening process thus far had trouble writing a C++ class at all! The best insight I had was from someone who explained that he usually uses the IDE and doesn’t know the details of syntax on how to write a class definition. Because of the reliance on what Visual Studio calls wizards, we dubbed these people Wizards’ Apprentices with a reference to the segment from Disney’s Fantasia. The story is simple: the young apprentice tries to use his master’s power but cannot control it; what we would recognize as a buggy program.

What are you showing them?

Now suppose that you are fluent in C++ (or whichever language you are using) and would have no trouble jotting down a piece of code that does what you want.

I see many solutions to FizzBuzz, as well as to many other problems, that are written in a way that is completely unlike what we see in “real” code. Someone who can effortlessly produce this might be in “scripting mode,” which is used for single-use small programs — written quickly, sloppily, and highly specialized. It does not illustrate good software engineering practices, and the interviewer can’t tell whether the applicant always (and only) writes like that, or is capable of working on projects like the team actually has: large bodies of code, a need for maintainability and testability, adherence to good practices, and so on.

So, I thought about FizzBuzz as if it were a small utility feature that was ordered for use in a large project.

Goodness

Certainly, a short piece of code can be written using up-to-date style and best practices as applicable for software development at the scale that is involved in real projects at work.

But I also thought about how much “good engineering” I can illustrate without making the result not-so-trivial. Here is what I came up with:

Separate the logic from I/O.
Most solutions to simple problems like this mix console I/O with the logic, which is completely unlike real programs in multiple ways. The code should not directly solicit input from the user as if he were a text file, or print output in the middle of the work; the “solution” should feature a function that takes parameters and produces results to return.
As with real projects, this can be called by a unit-testing main program as well as find a home in the application that needs that feature.
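The principle in miniature (the function here is a trivial stand-in, not the FizzBuzz solution itself): because the logic is a pure function, a unit-test driver can exercise it without any console I/O at all.

```cpp
#include <cassert>
#include <string>

// Hypothetical example: the logic is parameters in, result out.
std::string classify(int n)
{
    return n % 2 == 0 ? "even" : "odd";
}

// What a unit-testing main would call; no console interaction needed.
void test_classify()
{
    assert(classify(2) == "even");
    assert(classify(7) == "odd");
}
```

The same function body can later be dropped into the application that needs the feature, untouched.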

One thing that bugs me about the typical implementation of FizzBuzz is the explicit repetition of code, checking each factor and often the combination as well. Don’t duplicate code that varies only by a data value! Instead, store the values and loop over them.

Be mindful of future enhancements.
Code always gets more complex over time. It is an art forged by long experience to balance future maintenance with extra work now. Don’t program in features that are not needed — rather, anticipate what will be changed or added later and consider that in the overall design, and allow for such things when it does not add complexity to the task at hand. (I can probably write a whole essay on this topic. Message me if you have examples or general thoughts.)

Results

This worked out well: the second point (looping over a data table) makes the code simpler and shorter. The third point takes the form of putting the configuration at the very top of the file and showing how it can be extended; the extensibility comes free thanks to the architecture from the second point.

There are the needed #includes at the top of the file, and then the interesting part starts off with:

When I wrote this, Visual Studio’s C++17 compiler was still in pre-release, and part of the exercise for me was to use the new features and see where my style needed to be updated.

“constexpr is the new static const.” Naturally, this tabular data will be declared as constexpr.

Now, why did I use const char* rather than an object type? This was a deliberate choice. First of all, the use of this table (described later) is flexible in what it can take, so I don’t require a std::string here even though that’s what will be used in the constructed object. There is no need to store a std::string object here, which would make another copy of the lexical string literals, and string does not have a constexpr constructor.

The usual worries about primitive C-style strings do not apply since this is constant data. After all, do you worry about writing cout<<"hello"; and demand that you store that in a string object before passing it to the function?

A hot-off-the-press alternative would be std::string_view. But there is simply no need here. I chose not to use gsl::zstring, since it would be the only use of the Guidelines Support Library in the program, and there is no question that C-strings are being used since they are right there, and only right there. This is not a function parameter that needs documenting.

Likewise for the use of the plain array, rather than std::array. Arrays are not as poor as they used to be: with the preference for free functions begin, end, size, etc., what does the object wrapper do that the plain array doesn’t? Only the regular value semantics of passing and assigning — and I’m not doing that.

Here is the meat: a simple stateless function that accepts the number as a parameter and produces the results to return as a string.

The loop has a one-line body, and it is automatically iterated over the data table shown earlier. That is why adding another item just works. This code is smaller and simpler than the cascade of tests you normally see.
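The original listing is not reproduced here, but piecing together the description above, the table and the function might look roughly like this (the names are my own guesses, not the original code):

```cpp
#include <string>

// Plain constexpr data: const char* rather than std::string, and a plain
// array rather than std::array, for the reasons discussed above.
struct fb_entry { int divisor; const char* word; };

constexpr fb_entry fb_table[] {
    { 3, "Fizz" },
    { 5, "Buzz" },
    // adding, say, { 7, "Boom" } is all an extension would need
};

// Stateless: number in, string out; the loop body is one line.
std::string fizzbuzz(int n)
{
    std::string result;
    for (const auto& entry : fb_table)
        if (n % entry.divisor == 0) result += entry.word;
    return result.empty() ? std::to_string(n) : result;
}
```

Note how the combination case (15) falls out of the loop for free, with no explicit test.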

In the main, I avoided a legacy for-loop by using Boost.Range 2.0. That is tried and true for production work. I’ve not tried to get the latest Visual Studio compiler drop to swallow Range-v3 yet.

Discussion

Not only does this show fluency in C++ with up-to-date skills, it is well crafted and provides a number of discussion points. As summarized along with the code samples, I can explain why I did things a certain way, alternatives considered, what I might do under different circumstances, etc.

It is good to understand the constructs used and choose them purposefully, not apply things in a cargo-cult or voodoo programming situation. On either end of the interview process, understand that simple coding problems can give far more insight than simply finding out whether the applicant can “write code” at all.

In Ancient Times, I was privileged to be among those who deliberated upon a fundamental question in C++ programming philosophy: “Whither exception handling?” Exception handling was still in the conceptual stages then, and the air was charged with excitement.

Twenty-two years ago, I wrote an article for Windows Tech Journal recounting my experience a few years earlier comparing the error handling mechanisms of exceptions vs. error code returns. In particular, I discovered that using both was an impedance mismatch, with code constantly wrapping one kind to convert it to the other: catch the exception and return an error code, or test for an error code and throw an exception. That experience set me on a road toward a pure exception handling approach.

Seriously, in this presentation he also said, “Remember when I told you the worst thing that happened to humankind was? I see the std::optional.” He introduces the expected class (still being argued about in committee as I write this) as the right solution, giving the best of both worlds: it can return a result or an error code, and the caller can take the step of checking the error code or just access the result without checking; in the latter case it throws an exception if it’s actually in an error state.
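To make the idea concrete, here is a toy miniature of an expected-like type; this is my own sketch, not the committee’s interface:

```cpp
#include <stdexcept>
#include <string>

// Toy sketch: holds either a value or an error code; accessing the value
// in an error state throws instead of silently misbehaving.
template <typename T, typename E>
class expected_lite {
    bool ok_;
    T value_{};
    E error_{};
public:
    expected_lite(T v) : ok_(true), value_(v) {}
    static expected_lite failure(E e) {
        expected_lite r{T{}};
        r.ok_ = false;
        r.error_ = e;
        return r;
    }
    bool has_value() const { return ok_; }
    E error() const { return error_; }
    T value() const {
        if (!ok_) throw std::runtime_error("bad expected access");
        return value_;
    }
};

enum class parse_error { bad_digit };

// Hypothetical use: return a result or an error code.
expected_lite<int, parse_error> to_int(const std::string& s)
{
    if (s.empty() || s[0] < '0' || s[0] > '9')
        return expected_lite<int, parse_error>::failure(parse_error::bad_digit);
    return std::stoi(s);
}
```

The caller can inspect error() when it cares, or just call value() and let the exception propagate when it doesn’t.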

There’s no substitute for experience

This is part of a series of posts I’m writing about using toy projects and other exploration to get a hands-on feel for the new C++17 features, library, and style, as well as the behavior and real limitations of the compiler.

I coded up a very simple recursive descent parser, since it’s been noted that the new std::string_view is especially suited for parsers. (There is a lot of sentiment about string_view being problematic and evil, but that’s another story.)

Now a recursive descent parser is one of the few places, it is generally acknowledged, where throwing an exception as a form of control flow is genuinely a good decision. But, this simple grammar doesn’t have enough depth to need anything like that.

The interpreter (it does the work as it parses, as opposed to building a tree representation to be evaluated later) throws exceptions in case of errors. The user of the class will know this as the way it gives errors on bad input or run-time conditions.

The parsing step itself, though, makes heavy use of std::optional. As is the nature of such a parser, a production (grammar rule) might be called where this thing may or may not exist: optional parts in the syntax and alternatives and lists all lead to logic that says, “read one of these; did you get one? No? OK, skip it (or try reading one of those instead).”

Other callers need to read something, period. In that case, the caller needs to check for an error (just as it did when it was optional) and throw an exception. This code is what gave me déjà vu:

The function read_required throws an exception itself if it can’t do what was asked. read_terms and read_identifier, like most functions modeling grammar productions, return an optional-wrapped value.

Call a function, then check the return value and throw. This is done repeatedly in these functions. That is exactly the kind of mismatch I saw all those years ago.
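In miniature, the pattern looks something like this (the function names and grammar are stand-ins for the ones described, not the actual parser):

```cpp
#include <cctype>
#include <optional>
#include <stdexcept>
#include <string>

// A production returns an optional: "read one of these; did you get one?"
std::optional<std::string> read_identifier(std::string& input)
{
    std::size_t i = 0;
    while (i < input.size() && std::isalpha((unsigned char)input[i])) ++i;
    if (i == 0) return std::nullopt;   // no match: caller may skip or try alternatives
    std::string id = input.substr(0, i);
    input.erase(0, i);
    return id;
}

// A caller that *requires* the item must check and throw -- the
// boilerplate that gets repeated at every such call site.
std::string read_required_identifier(std::string& input)
{
    auto id = read_identifier(input);
    if (!id)
        throw std::runtime_error("identifier expected");
    return *id;
}
```

Every required-item call site repeats that check-and-throw step, which is the impedance mismatch in action.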

From the nature of the optional returns, it is the caller who decides on the error code. In a more complex grammar, I can easily see wanting to propagate that or modify it as it passes back up the call tree. But in the case of optional, there is no error code coming up — just the lack of a result.

In testing this with different syntax-error test cases, I found places where I was not checking the return. This can core-dump, because dereferencing the optional does not do any checking on whether it contains a value. (On the other hand, there is a value() member function that does check.) I guess I’m so used to a style where functions do what they are asked (or don’t return at all) that writing in a style where every call is followed by an explicit test is challenging, as well as ugly and obfuscating.
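The distinction fits in a few lines (illustrative only):

```cpp
#include <optional>

// operator* is unchecked: dereferencing an empty optional is undefined
// behavior (the core dump scenario). value() checks, and throws
// std::bad_optional_access on an empty optional instead.
int checked_get(const std::optional<int>& o)
{
    // return *o;      // UB if o is empty
    return o.value();  // throws std::bad_optional_access if o is empty
}
```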

It’s back to the assessment I made when promoting exceptions in the first place: look at this block of code — what does it do? Its main content is a bunch of if statements and error handling. Where is the real logic I came here for? The testing is drowning out the real work.

Would using expected/outcome be better for this?

If the called function loaded the result with an error code rather than just the lack of result, and the attempted dereference would throw an exception, then I would not have to check after every call but still could when I wanted to.

But… it is the caller that knows what error code to assign. Furthermore, what about the specifics of the exception and its payload? If productions returned an error code (just an enumeration), the exception still needs to be the parser_error type and include the cursor position.

It’s like the (deeply nested) called function needs to interact with the caller to formulate the proper error. Catching, modifying, and re-throwing is something that will not go over well with modern implementations, as throwing is very slow.

Parsing is one case where we really care about details across several levels of function call, in order to get meaningful feedback on the error. In more normal circumstances, you tend to rely on logging. Even so, having a speculative log — include this line if an error is logged; flush it upon success — is very helpful and cuts back on the spew of irrelevant information logged.
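A sketch of the speculative-log idea (the class and names here are my own illustration): buffer detail lines as work proceeds, emit them only if an error actually occurs, and discard them on success.

```cpp
#include <sstream>
#include <string>
#include <vector>

class speculative_log {
    std::vector<std::string> pending;
public:
    // Record a detail line speculatively; it costs nothing if discarded.
    void note(std::string line) { pending.push_back(std::move(line)); }

    // On error: include all the buffered context, then the error itself.
    std::string fail(const std::string& error) {
        std::ostringstream out;
        for (const auto& line : pending) out << line << '\n';
        out << "ERROR: " << error << '\n';
        pending.clear();
        return out.str();
    }

    // On success: drop the spew of irrelevant detail.
    void succeed() { pending.clear(); }
};
```

The normal path logs nothing; the failing path gets the full trail of what led up to the problem.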

But the same technique could be used to generate a more meaningful exception, adding information as the stack is unwound.

Besides catch to deal with a problem after it occurs, and destructors to automatically clean up resources, we need some way to prepare for a possible error. This would be primarily for reporting, not for recovery actions. That’s probably why it has been neglected as a topic — destructors are perfectly good at cleanup and using them means we don’t have to explicitly prepare.

There was no built-in bool type or true/false keywords at this time! You might find that hard to fathom, but as I write this in 2017 the existence of std::byte is brand new and almost no code uses it yet.

Of course, when I learned C there was no such thing as the void type for making void*, names of structure members had to be globally unique, and if you used the wrong member name for a variable there was no error — you just got the fixed offset represented by that member name, applied to the wrong type of variable. And function declarations did not have their arguments declared, and the compiler did not check what you passed when calling a function anyway.

Some of the names were scoped as being in cmdl, but those are only the flag enumerations defined inside the class (enumerators scoped to the enumeration name itself didn’t come until C++11). The various classes used, though, are defined globally. Why? Because there was no such thing as namespaces.

The cmdl_int type is written specifically for the int type. Why not a template? Because there was no such thing as templates. And by the way, int for me was 16 bits, running in “real mode” 8086 code.

The makefile shows a symbol that’s defined if I’m compiling under Borland C++ 3, which supports the 2.1 version of the C++ specification.

Turbo C++ 1.0 was released in May 1990, and TC++ 1.01 was released February 28, 1991. Borland C++ 3.0 was released in 1991. That should indicate the true vintage of this code.

Wikipedia chronicles that C++ 2.0 was released in 1989. As it happens, I was a reviewer of the spec and documentation before it was finished, and got my name in the Annotated C++ Reference Manual. Of note, 2.0 added multiple inheritance, abstract classes, static member functions, const member functions, and placement new.

Version 2.1 (noted above: this code was in use when both 2.0 and 2.1 compilers were current) added partial nesting of classes. So that explains why none of the other types were nested inside cmdl — you could not do such a thing!

Once upon a time, I developed an easily-used command line parser in C++ with review and feedback from the community (Fellow TeamB members, and regulars on CompuServe’s DDJ and CLM forums including other authors). Note that this pre-dated the world wide web and mass access to the Internet.

I ended up using it a lot, for most every testing, benchmark, or demo program that I created.

Philosophically, declaring global variables that reflect the command line parameters is essentially declaring the parameters being passed to the program. It was designed to be composable so that a library could come with its own (possibly obscure) arguments to control it, and the program would just automatically respond to them. For example, a logging component might have options for specifying the output location, verbosity, archiving behavior, etc. Any program that used that logger would have those options available.

Here is the original documentation, last saved December 22, 1992.

I’ll continue in the next post with some observations about the C++ language circa 1992.

C++ Library for Easy Command-Line Parsing
by John M. Dlugosz

I’ve always felt that the argv[] array was difficult to use. Not bad, just primitive. If all you have are a couple of arguments, it is not too hard. But you still have to check for the correct count and convert each value to the proper type.

If your program has various flags and switches, things can get much more difficult. How many programs have you written and suffered through the argument processing? In how many programs have you wished you had a better way? In my case, I’ve written many simple programs that could benefit from command line arguments, but found it more trouble than it was worth. So I was stuck with a simpler, less flexible program. For test code and such, I would even change a value and recompile, instead of adding nice command line processing.

Now, I do have a simple way. It has revolutionized the way I write small programs. Rich command line argument processing, sign-on messages, and help on usage are now trivial.

Here is an example. Consider a program that takes a -v switch for verbose mode. Using this library, this is accomplished by including the definition

cmdl_flag v ('v', "requests verbose mode");

to make the program recognize the flag, and code such as

if (v()) { //do this in verbose mode
//whatever...
}

to respond to the state of this flag. There is no messy string manipulation, error checking, or anything. The library automatically handles -v or /v forms, disabling a switch with -v-, cascading switches such as -vbx, and other features.

Notice the definition of v above takes two constructor arguments. The second argument is a string that provides usage information. The library will automatically generate the usage message, collecting the messages from all the parameters in the program.

Concepts

The basic idea is to model command-line parameters as program arguments. That is, they should be analogous to arguments passed to a function. In a function call, each value passed is bound to a name in the called function. By analogy, a program argument is a name which gets bound to something which can be specified on the command line. To provide for command line input, you declare those arguments you want to receive, along with their types.

The cmdl library has a type for each type of command line parameter: flags, integers, strings (more can be added).

The constructor is given the name of the parameter, as used on the command line. It can also be given a help string, and flags. Here are some examples:

Type cmdl_flag is a simple switch. Using the flag makes the parameter TRUE; if absent, it is FALSE. You can also turn off the switch by giving the name with a trailing - sign. (The library takes care of cascading switches, too.)

Type cmdl_string allows input of an arbitrary string. The syntax is somewhat flexible, with the argument separated from the keyword by a space or an =, and the string can be in quotes.

Type cmdl_int allows input of an integer. The input is checked for valid syntax.

Type cmdl_help provides for an automatically generated help screen if the command line is empty, or with the -? switch.

Except for the special cmdl_help class, the constructors take two or three arguments. The first is the name of the command-line parameter. This can be given as a single char or as a string. If passed (char*)0, there will be no name and it is taken to be a positional parameter, explained later.

The second constructor argument is the usage help string.

The optional third argument to the constructors is a bank of flags. once indicates that the argument can only appear once in the command line. Ordinarily, repeating it will override the previous mention. The required flag means that it is an error to omit the parameter. There are others, detailed in the code listing.

A flag worth particular attention is keyword. If present, then the command-line parameter name will not use the switchchar (- or /) to indicate that this is a parameter. If a keyword is found anyplace outside of a quoted string it will be used as an instance of the parameter.

Using class cmdl in a program

The program that contains these definitions will kick off everything by calling cmdl::parseit();.

This works because the constructor for each command-line argument class links them together into a linked list. The command-line argument objects should be global, or defined in main before calling parseit(). In any case, no command-line object should ever go out of scope before parseit() is called.

Because the objects link themselves up, the complete collection of defined command line parameters is known. parseit() will parse the command line, and compare what it finds with the list of possible arguments. It takes care of usage errors and such, so the program aborts if the command line is invalid. No error checking is required by the main program.

Each command-line-parameter object contains an operator() which provides a succinct way to get the value of that parameter. There is a default value in case it was not specified on the command line. If you would rather check for its presence, use the hasvalue() member.

Before calling parseit(), you can use the static member signon() to note a string used during the usage help message.

See the listing of TEST3.CPP and other files for usage of the examples described above.

Kinds of argument names: char, string, and positional

The first argument to the constructor of the command-line parameter objects is the name of the parameter that will be used on the command line. The constructor has two forms. It can take a char, used for a single letter switch. Or it can take a string (char*), for arbitrary names.

In addition, the string form responds to a special name of NULL. Passing in (char*)0 for the name makes it a positional parameter. The parser will not assign it based on a name. Instead, it is used for unnamed parameters.

If a parameter does not start with a - or /, and it does not match the name of a keyword parameter (those that don’t use the -), it is taken to be a positional parameter. It is assigned to the first unused positional parameter you defined. This lets you mix switches with non-named parameters such as filenames.

Note that positional parameters can be flagged as required.

The Use of C++

A few C++ language concepts may need explaining.

Note the syntax of the flags in the third constructor argument.

cmdl::required | cmdl::keyword

The names here are enumeration constants. They are created with an enum definition (see CMDL.H, line 50). The names are defined within the class, and are in the scope of the class. They are not global, and don’t pollute the global namespace. So, you have no conflict with a name keyword used elsewhere in the program, for example. The downside of this is that you qualify the name with its classname, as shown. Note that in C, you probably would have seen CMDL_KEYWORD instead — the name would contain its “family” identifier as part of itself. So it really is not additional typing to use class-scoped names like these.

The enumeration constants are given explicit values as powers of 2, so they behave as flags which can be combined with | or +. The function parameters taking the flags are defined as unsigned, not as an enum type (in fact, the enum type has no name; it just defines the constants). This is necessary because the result of | or + is an int, not an enum type.
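The pattern in miniature (cmdl_demo is a hypothetical stand-in, not the real class):

```cpp
// An unnamed enum inside the class defines power-of-two constants
// scoped to the class name; there is no enum type to speak of.
class cmdl_demo {
public:
    enum { once = 1, required = 2, keyword = 4 };

    // Takes unsigned, since the result of | on the constants is an int,
    // not an enum type.
    static unsigned accept(unsigned flags) { return flags; }
};
```

A caller writes cmdl_demo::required | cmdl_demo::keyword, qualified by the class name, with no global-namespace pollution.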

The class contains two definitions of enum names for flags that share the same flags variables. But some are public and some are protected.

Another interesting feature is the use of operator(). See CMDL.H lines 84, 97, and 109. The operator is defined with the name operator() which is then followed by the parameter list. Here it has no parameters, so you see two sets of ()’s in a row. The operator is invoked by following the object name with the parameter list, as shown in the test programs.

The positional parameter ability requires you to pass (char*)0 instead of just 0 because 0 is ambiguous — 0 can be a char '\0' or a null pointer.
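Both points can be shown in a hypothetical miniature (flag_like is my own stand-in, not the library’s class):

```cpp
// operator() with no parameters gives the "two sets of ()'s in a row"
// look, and overloading the constructor on char vs. char* is exactly
// why a bare 0 argument would be ambiguous.
struct flag_like {
    bool state;
    flag_like(char)        : state(false) {}   // single-letter switch name
    flag_like(const char*) : state(false) {}   // string name; (char*)0 = positional
    bool operator()() const { return state; }  // note: operator() ()
};
```

Writing flag_like v(0); would not compile, since 0 converts equally well to char and to a null pointer; the cast (char*)0 resolves the overload.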

The Parser

The core parser code breaks up the command line into tokens and looks up names of parameters. The value of parameters is sent to the matched object for conversion to the proper type. The virtual scan() function does this final part. An earlier version of this library had a seemingly more flexible system that allowed significant customization in the specific parameter type’s code. However, it proved too clumsy and was never really used. This points out a good design philosophy: Make a thing just flexible enough. If it is too configurable, it can become as difficult, or more so, to use as writing code each time; which is exactly what the library is supposed to avoid.

The parser uses a class cmdlscan for low-level character manipulation and tokenizing. It is planned to give this more power in the future, for better error reporting; some of the implementation details are the way they are for that reason. The need for a parser class was indicated because several related values, including the string and its current scan position, were always being passed together. When things like this happen, think about combining them into an object.

Error and Report Output

I did not want the code to simply use cerr and cout for output. This may be used in programs that have their own idea of I/O, including programs that run in graphics mode. For maximum flexibility, all output is separated. The final results are funneled through a pair of functions called cmdl::output(), both defined in OUTPUT.CPP. If linking in CMDL.LIB, you can supply your own versions of these two functions to handle output your way, without having to recompile the cmdl library.

This is not a “story” because it doesn’t have any conflict. It would be included as exposition within a story where this setting is important. It could be a sci-fi story, but it really doesn’t have to be.

How can a modern-day kingdom exist, outside of any normal jurisdiction? There is already a concept called a micronation. But more interesting is the case of cruise ships and oil platforms: you can find news stories about how “things happen” and no country will investigate, even when family members demand that somebody do something.

Story excerpt/scene first draft —
“What Country is This, Anyway?”

Copyright 2017 by John M. Długosz

It was the summer in which she had moved out of the kids’ bunk room and into her own place. It was just a bedroom and tiny bathroom, adjacent to the main house, but hidden away a little so it didn’t exactly feel like next door. She covered the walls with drawings and images torn from magazines or printed out from web sites, making the space her own. She also had a large map of the Bahamas and Caribbean; an old nautical chart from the looks of it. They were near the right edge about two thirds of the way down, among the Lesser Antilles. But their little home wasn’t even on the map! It was probably too old.

“What country are we in?” 15-year-old Beth asked her mother.

“Well… nobody’s sure exactly. And we go to some effort to keep it that way.”

“How can that happen?”

“History, same as everything else. You’ve studied the history of the region, right? You recall how different islands, or even parts of the same island, were claimed by different European governments; and that until fairly recently they switched around or were even fought over sometimes.”

“A hundred years ago, the ground we’re standing on did not exist. If you were on a boat, you’d see the Worthless Duck Rock nearby. That’s all there was — a rock that looked like a duck. Nobody wanted it for anything. Even pirates didn’t like it because there was no place to bury treasure.” Liz smiled in a way to show that this part was a bit of a joke and not literal. “Unless you count the bird poop which was piled up as deep as your waist!”

Beth involuntarily imagined herself standing up to her waist in a pool of bird poop. “EEwwwwww!”

“Even the birds didn’t want it! They would land because they saw something that wasn’t ocean, but there was nothing to eat and no place to build a nest, so they would immediately take off again. And they habitually lighten their load when they take off, so it was just their poop stop.”

“Now England and France both had historical claims, but neither country had it under any specific administrative organization; it was just ignored.”

“Less than a hundred years ago, a rich tycoon named Plait bought the place by paying both England and France. It was just a token amount really, and then they could stop worrying about which of them owned it or if they would ever fight over it.”

“Still, neither country has that rock listed as being part of any region (a region in England or France is like a province). Each country divested the land completely, rather than showing it still as a part of themselves but with someone listed to pay taxes. Know what I mean? Normally in such cases the implication is that it is transferred to the government of the person who bought it. But Plait was English! When England did the paperwork, it slipped through the cracks, as they say.”

“That’s just for the original rock that was the only land that existed at the time. All of this,” she waved her arms to indicate everything, “was built. So it’s like a moored ship or an oil platform: maritime law would apply. And it’s, as they say, ‘hopelessly convoluted’. Cruise ships that are outside of anyone’s national waters have been in the news from time to time when no jurisdiction will do anything about a reported crime. The cruise ships in particular want to keep independent from the country where it’s from, so they can have gambling, alcohol, and whatnot. There should be a ‘port of registry’ that the owners get to choose; but from those news stories we can see that it doesn’t always help.”

“So look at the ownership. We have U.S. citizenship so that should do, right? Nope! The real ownership is by holding companies which are wholly owned by family members who live in different countries. The holding company that directly owns this resort is in Amsterdam. But this property, and the others on this artificial cay, is leased from the owner of the ground itself. We have a large interest in that company, which your Daddy bought into in order to fix up the place; have the town where staff can live, electricity that always works, and so on. But that company itself is incorporated in Switzerland.”

Beth’s head was spinning. “So no country has it on its books officially, and anyone who tried to figure out who ought to have it will be confused.”

“More practically, anyone who suggests that some specific country has jurisdiction will be chased by lawyers saying that one or more others should, instead, so it’s not theirs. And not just our lawyers; cruise lines and shipping companies have a huge interest in making sure that legal precedent is not established.”

“I get it, but what is it? What’s it called? Where are we, if you needed to name that state of affairs?” Beth loved words, and it would bother her if some nebulous concept didn’t have a name. Liz knew that about her daughter, and knew what she needed.

“I might not be getting this exactly right; you can look it up. But we’re a de-facto non-secessionist sovereign micronation. I’m basically the queen.” Liz took a mock bow. “As long as we don’t harbor fugitives or otherwise get others upset with us in a major way, cooperate with the neighbors, and stay under the radar, we’re an independent state. If anyone does try to muscle in, it will create international incidents out of proportion to what they’re trying for. That is, for example, if Guadeloupe tries to annex us, France will be starting a war with the UK. Kings and presidents don’t rule the world: lawyers do!”

“Also, recognition isn’t an on/off thing like a light switch. It’s more like a dimmer switch knob. Different governments, agencies within a single government, corporations, and NGOs can independently recognize us or not. And the extended family overall — not just the Taft branch but distant relatives; the fortune goes back over 150 years to the Industrial Revolution — are close-knit with many important NGOs. Having organizations like the Red Cross and Médecins Sans Frontières recognize us as a neutral place, with an open invitation to use the grounds if a neutral location is needed, is not as good as recognition by the U.N., but it’s pretty close.”

Beth had had no idea her little home, Lizard’s Hideaway, was actually a player on the world stage. Yet it wasn’t on the map!

Last time, I found that a Haswell core executes 8.57 instructions per clock cycle, which is odd because there is a hard limit of 4 instructions per clock.

If the benchmark is calibrated by setting the performance of the VAX 11/780 and IBM System/370 model 158-3 as equal to 1 MIPS, I can think of a couple reasons:

First, those machines were marketed and claimed as 1 MIPS, but their real performance on the Dhrystone benchmark is considerably less than that theoretical value.

Second, each “instruction” in a high-level-language procedure might translate to more than one machine opcode. Modern compilers are more efficient and generate far better code. In fact, it’s gotten to the point where it is hard to compile such a benchmark program as intended, since the compiler will eliminate it as being pre-computed or useless!

Finally, maybe the modern CPU takes fewer machine instructions to do the same amount of work. That’s certainly true, but does the benchmark code use the larger registers? Looking at an implementation, I notice a couple of things along these lines. Control statements can produce fused macro-ops combining the test and branch, which picks up another simultaneous instruction. Procedure calls might be “free”, as the call simply starts prefetching from the new location and the unconditional jump itself is not seen as an instruction at all. Even if the compiler is not optimizing out work, the CPU itself might be! Assignment statements are also part of the mix, and if these variables are in registers then a MOV can be free, taken care of in the register mappings in the pipeline without ever actually moving anything. Finally, string manipulation might take place in bigger gulps, not one byte at a time.

The OCD Method

I brought up the fully-optimized version of the program in the debugger, and counted the instructions. On the “short” path where it does the early-out, it does as few as 85 instructions and sometimes over a hundred. To err on the side of caution (fewer actual instructions done in the given time) let’s use a weighted average of 90. BTW, the early-out part at the top of the loop is only 21 instructions; the bulk of the expense of one iteration is permuting the array.

On the “long” path, the full code to decode the three strings with the current values is 110 instructions, plus 3 more to do the final comparison. Adding the ~90 that every iteration does anyway, let’s say 200 instructions in this case.

Without counting the rare times where it does a skip by reversing a few elements, this is about 135,693 full iterations at 200 and 219,360 short iterations at 90, or a total of 46,881,000 instructions.

Doing 47 million instructions in 3.4 milliseconds is about 13.8 billion instructions per second. At 3.6 GHz this is about 3.8 instructions per clock, where now “instruction” means an x64 opcode.

In my previous post, I wondered what the power of Knuth’s computers was, at the time TAOCP was being written. Someone suggested the IBM S/360 series as an exemplar. That turned out to be an especially good idea, since I’ve written programs for the S/370 in assembly language, so I’m familiar with it. Models 30, 40, and 50 were released in April 1965. On the pricier side were models 65 and 75. Here is a scanned “System Summary” describing the various models in detail. So, I suppose somewhere between 100 and 900 kilo instructions per second. A larger machine would probably be servicing multiple users. Fifty years later, my Xeon E3-1276 is supposedly around 133 billion instructions per second.

Interestingly, the S/360 takes many (like 13 on average) clock cycles to perform one instruction. Meanwhile each core of the Xeon performs 8½ instructions in one clock cycle. I suppose the clock speed of the S/360 is the cycle time for the internal microcode.

But what’s an instruction? On the S/360, I would not need the decode function at all, but would just sum the digits directly using unpacked decimal.

The modern CPU knows only binary arithmetic on various word sizes. So converting from a decimal digit-per-byte requires 4 iterations on two operands doing a multiply and add: at least 16 distinct instructions (if the loop is unrolled), plus the actual add once that’s all done.

Interestingly, the x64 code generated by the compiler doesn’t actually issue a multiply instruction in this loop. In fact, the entire expression does not use the regular ALU! There is neither a MUL nor an ADD instruction there. Instead, it exploits the address generator to do work that has nothing to do with actual pointer addresses. The complicated addressing modes of a CISC processor mean that a separate address-generator unit has a variety of things it can compute, yet it is far more limited than a completely general ALU. So, it is much simpler and thus faster.

In particular, Scaled Index mode looks like this: [ebx + ecx*S + constant]. Register ebx is the base, and ecx is used as the index here. The index can be used directly, or scaled by 2, 4, or 8. If the same register is used in both positions, you can multiply it by five! The LEA instruction is Load Effective Address: it gives the processed address without fetching what it resolves to, like a MOV would. So, if we have the total in EAX and the digit in EBX,

LEA EDX, [EAX+EAX*4]
LEA EAX, [EBX+EDX*2]

The first instruction multiplies by five. The second instruction not only multiplies by two, but also adds in the digit as the base of the addressing mode.

I also found it interesting how the S/360 line anticipated what we have today: one compatible instruction set, but pricey implementations have more pipelining and faster clocks; also they keep adding more layers of cache memory. The “processor storage” housed with the CPU is analogous to the L2 cache. Adding external memory modules gives more storage but slower: 8 microsecond access time. If you add pairs of modules you can go dual-channel and double the throughput. Finally, later high-end models added extra-high-speed memory to keep up with the CPU, and that is analogous to our L1 cache.

Back to the quantitative comparisons: The modern machine has 4 independent cores, but my program only used one. If a brute-force problem required a significant amount of time, it could be split up into 4 tasks. At full utilization, 133 billion vs 133 thousand, more or less. That’s a factor of about one million. With the single thread, a quarter of that. 30 ms on one core would be about two hours on a S/360-50 using it exclusively for this job.

Knuth’s suggestion of 10! can be scaled up by a million. That puts it roughly midway between 15! and 16!. Now in terms of exponential growth, realize that an institutional computer like that cost about 1000 times more than a personal desktop computer today. In computing power per constant dollar (not adjusting for inflation), the improvement is indeed about a factor of one billion.

For floating-point calculations, the difference in power over 50 years is a few orders of magnitude higher. A $300 GPU card can do 4 teraflops? That means it would be a world-class supercomputer as recently as 2005!

I saw this puzzle on Bartosz Milewski’s blog, in an entry on using monads in C++. I’d like to hold it up as an example of a completely different lesson for C++ programmers, a central idea I want to teach: know your libraries.

I recall it was in the early ’90s, when the STL for C++ was submitted as a proposal for inclusion in the upcoming standard. I noticed there were algorithms called next_permutation and prev_permutation, and wondered how they work—how do you define an ordering over permutations, and re-arrange your collection into the next one, without keeping any auxiliary state? Then I wondered what I would ever use such a thing for. Well, nearly 25 years later, I found out!

You should look through the list of algorithms every once in a while just to see what’s there. Otherwise you might only know about the common ones that you use. Consider the analogy with a tool (say, a special bent screwdriver only used to attach doorknobs) that you know is in the very back of the drawer, though you may need to rummage around to locate it. Remembering you have that tool makes for a quick job. Having to rig up something from common parts instead (to continue the analogy, use locking pliers to grab a short screwdriver bit from the side) is not as good, and certainly more work.

So… 8 nested for loops followed by 7 if statements containing 28 conditions? Get real! If you have a line that reads });});});});});});});}); then the code review will be a sight indeed.

You’ll notice that besides the uniform initialization syntax introduced in C++11, this uses something you might not have seen before (if you’re reading this in 2017). Hot off the press in C++17 is structured bindings.

auto [ig1,ig2, s,e,n,d,m,o,r,y] {A};

This defines 10 variables and assigns all the elements of the array to them. The first two are ignored so I used scratch names, and the others are simply the names of the letters in the puzzle.

One thing I have to point out from Milewski’s listing is the call to turn a word into a numeric value. He writes:

int send = asNumber(vector{s, e, n, d});

This creates a std::vector on every use. Let me elaborate: it allocates a block of memory from the heap (vectors can’t use a small-vector optimization). Then, after the call returns, it is deallocated. Then the next two lines do the same thing. And that happens on every iteration.

The constructor for std::vector takes this handy literal list. Now in C++ these containers are not special language features, but ordinary libraries. It should be clear that anything they do — cool syntax or unusual ability — you can do yourself in your own code! My version of the same construct does not create a vector, doesn’t require more words to make the call, and most importantly does not have any overhead.

The solving function doesn’t print the results because I want to time just the solving logic. So the solutions are pushed onto a vector, and the caller prints them after stopping the clock. In a real program, this might be an object (not globals) and the results available in the object afterwards, or as the return value from the solving function. In another post I’ll make it lazy.

Make it faster

This simple function found 50 solutions, one of which doesn’t have a leading zero. It ran in 39.6 milliseconds, trying all 3,628,800 permutations. That’s 92 million iterations per second.

The value of 10 factorial is an interesting number here. Donald Knuth, in The Art of Computer Programming, wrote that this is about the size that separates things that may be done by brute force from things that are impractical to simply try all possibilities. Volume 3 was published in 1973. I hazard to guess that computers now are about 2³⁰ (or about a billion) times the power of the machines that were around when he wrote that. A billion times 30 milliseconds is about a year. So, I revise that to more like ten million times the speed, if I postulate that he could have run this program to completion in a matter of days.

Anyway, to make it faster, I need to skip over permutations that are “the same” as one I already rejected. The order of the two ignored digits doesn’t change the solution, so if I decide that one order is canonical, and when the other is detected I skip over the whole lot, that will cut the number of iterations in half.

So how do you skip over states in the next_permutation algorithm? I looked it up — a StackOverflow post described it well and also pointed to a Wikipedia page on it. The states are generated in lexicographical order: just before a given digit changes, everything to its right is in reverse sorted order; then it “rolls over”, advancing that digit by one, and everything to the right is now in forward sorted order — the lowest value of that substring.

So, when I identify a digit value that I know will not be a solution no matter what the other digits are, I can skip to when that digit is advanced again by reversing everything to the right of it.

Indeed, it still found 50 solutions but the iteration_count showed half the number: only 1.8 million times through the loop. However, the time only went down to 26ms — about two thirds the time, not half.

We also don’t want solutions with a leading zero, so filter those out too. Notice in the listing above I changed the order of declaring the digit variables. It doesn’t matter to the solution algorithm, but putting these farther to the left means I can skip over more.

Another big constraint can be found on the right side of the puzzle. I would think that parsing the arrangement of digits into ints would be slow, seeing as that involves multiplying by 10 and adding each digit. Looking at the rightmost (units) digit only, the puzzle has d+e=y with a possible carry. Test that before parsing the int values, and furthermore skip ahead until one of those three digits changes again. To that end, the declaration order has d, e, and y next after the previous items we wanted on the left. This leaves only 3 letters to the right, so each time the units digits don’t work it can skip 6 iterations.

I added a counter to that, and see that it happened 219,360 times. The loop only executed 355,053 times now, taking a mere 4.7 ms.

Faster yet?!

Notice in the listing that I declared a type CellT for the array of digits and anything that holds a digit. My thought was that keeping the array small would save in parameter passing to decode. Keeping the size of the data small is generally good for memory caching, but it probably doesn’t matter here because I don’t have enough data to exceed the L1 cache anyway.

But, I can change the type in one place in the code and it affects everything in the program. I changed it from int8_t to plain int (int is 32 bits here), and… the time was down to 3.4 ms!

64-bit was the same. 16-bit values were the slowest, at 5.2 ms. So, the CPU is inefficient at loading 8-bit values and worse with 16. I don’t know if the actual load is slower, or if the compiler generated somewhat different code rather than just using MOVSX instead of MOV. This was a difference of a quarter of the execution time over the entire algorithm, and loading values is only a part of what it does, with values kept in registers once loaded.

The Memoir

When I was a kid, my family of 4 played a game called Aggravation. According to the history recounted on Wikipedia, this was the original version and is no longer made. I recall it was a piece of thick material similar to what I’d buy today as high-density fiberboard, with holes in it. This was placed on top of the bottom of the box which was printed with colored patches that would show through the holes. The box had short sides so the marbles could be placed within and the lid put on, for storage. (Here is the same board from 1962, but the marbles were different and I think the lid was slightly different too.) I think it was obtained in 1964 by my parents.

Vintage 1962 Aggravation board courtesy of LeolasAttic

Later, I got a 6-player version for a birthday present. This had a folding board like many board games used, but was this single board only—there was no backing, so the holes went through and you could see the table through the holes. Instead, colors were drawn around the holes to indicate different zones, and this gave it a completely different character.

Now I see the 6-player set for sale as “Rare! Original vintage deluxe party edition antique” circa 1972 for $130. I believe I’m missing one of the colors of marbles and have substituted ordinary cats-eyes for that color. Even so, I hope it turns up in my parents’ garage in the same good condition!

I must say, it lived up to its name. Even the die (rolled by a child barely old enough to play) might fly across the table and knock into marbles in play, fall off the table, or (most irritating to some) be dropped in place so gently that it was not sufficiently randomized.

I don’t remember personally, but I’m told that my maternal grandfather called the center the “knockout hole” and also enjoyed the game with family, when it was first released.

This is another family game that I’d like to revive, particularly with my mother-in-law visiting.

It would be easy enough to make a copy of the classic version, with a couple pieces of hardboard. But not easy enough, right this moment. It would be even simpler to just print out the board on paper. After all, the historical predecessor that this is a variation of was normally made out of cloth, and many other modern versions are flat boards that don’t need holes. It’s just a matter of using more typical game tokens rather than marbles. These can be borrowed from another game, or, to preserve more of the original flavor, I can use glass stones from a Pente set. (I see my “Vintage Pente Game (1977)” being sold for $125 now!)

The Project

My idea is to make it modular. Not only does this make each module fit easily on a standard-sized sheet of printer paper, but it can then be customized for the specific situation. Any number of players can be accommodated by using 3, 4, 5, 6 (or more?) modules, and there will be no dead zones or asymmetry from using a board that accommodates more players than are in play. It can also arrange the different colors in any order, to suit seating preference.

So, each module is the section of track that is a “W” shape, with the center row being one player’s home. The modules will overlap at the first/last spots, which form the inside corners. A spot for the knockout hole needs to be positioned in the center, separately.

The track module is 6 spots tall and 5 wide. So, it should be easy to lay out on a grid using Adobe Illustrator.

New Variations

Now, Aggravation is a specific variation owned by Hasbro. The larger family of Pachisi (पचीसी) games is ancient, though, and there are many westernized commercial versions of the idea. So, if I come up with my own unique variation, I can publish it as my own game that needs a suitable name.

I think the modular nature is key to making a new variation. Adding and removing modules can be a dynamic effect of game play, not just a configuration made before starting the game. I found a similar game called “marbles (or pegs) and Jokers” that has a modular track. But it doesn’t have the knock-out hole or the shortcut track of the 6-player board. And that’s the best part! So my variation will feature more self-intersecting tracks and shortcuts.

I have a general idea of adding/removing sections that provides for a loop at each corner, and then a flying bridge shortcut between these loops. A player can spend chips (use poker chips) to perform the actions of adding, removing, or moving track modules.

Now here comes the clever part: whenever starting a new token out of the base, the player gets a chip. This means that attacking a player — knocking out his tokens repeatedly so he has to start over — also makes him stronger in terms of these board-changing powers.

Another variation would be to use different types of dice, with 4, 8, 12, or 20 sides, as used in role-playing games. Simply using a different die, perhaps scaling the board to match the range of motion from one throw, isn’t much of a change. It would be more interesting to use different dice throughout the game, giving a speed boost for example by rolling a d12, or making it easier to snuggle up the tokens in the “home” by using a d4. I don’t have any good ideas as yet on when to allow these choices.

…work in progress…

Files

My mother-in-law is visiting from China, and one of the games my family always played while I was growing up is Yahtzee. Although now I have official published score cards, I recall as a small child that we originally had home-made sheets produced by my Great Aunt Harriett. They probably date from about the time the game was first introduced and popularized: I read in Wikipedia that innovations made by Yahtzee® over the traditional forms include the upper-section bonus. The sheets I remember did have an upper-section bonus, but had the Big/Little straight idea as shown for Yacht, and I recall they had a Pair. So, they had to have been influenced by the E. S. Lowe product some time after 1956, and I know my parents were playing it in the 1960’s.

I’ve played Yahtzee since I was old enough to understand it, sometimes in large family gatherings with parents and grandparents. It was always a favorite of my Mom’s.

So naturally I thought it would be great to play during the holiday season with my mother-in-law’s visit. The catch is that she doesn’t speak English.

I had an idea to make, not a translated score sheet to use in place of our English sheets, but an adaptor. Originally, I thought to make a stiff card, printed on letter-size paper, that the sheet would attach to using paperclips. So, it would contain translations for the score information (the first two columns of the printed sheet) that exactly line up with the rows of the score sheet, to the left of the sheet; and general notes and instructions that could point exactly to the rows it referred to.

So, the main design work involved exactly duplicating the grid spacing and position on the sheet. That did not seem as simple as it should be in Illustrator, so I posted a question on StackExchange. I quickly got comments that Illustrator doesn’t do that easily but InDesign has a table tool.

While playing (initially using translations on a separate sheet), I noticed that it was proving difficult to use columns beyond the first two games. So I modified the design to take care of this also: I reversed the attachment idea, and now have the card attached on top of the score sheet. The new legend appears on the right, and can be shifted over for each new game. This turned out to be a good design in general as now the explanation can continue to the right of the name, as wide as desired.

The question then became how to attach the papers when using the rightmost columns on the score sheet? There is no paper beyond that to clip under the card. I solved this by having the card both in front and behind the score sheet at the same time: a careful cutout and fold, and the letter-size page can provide a “tail” that supports the entire width of the score sheet in any position, from behind.

As planned, the adaptor is lined up such that the papers can be aligned on their vertical edges, and then two paperclips will hold them perfectly. I do suggest using smaller clips than I have in the photos: less than one inch long, so they’ll naturally cover only the title area above the part you write on. The photo above shows the adaptor card positioned to the right of the Game 5 column (nearly at the right border); the score sheet is clipped to the folded-back strip along the entire top edge and holds securely. I printed on 32# HP Premium printer paper.

A final improvement concerned the scoring. I noticed some confusion in remembering that some rows used “total of all dice” and others used just the matching dice. So I color-coded the different scoring schemes in the “score description” column, as well as color matching the row where the upper total is copied down below. And as long as I was shading things, I shaded the “name” column to indicate which rows represented turns to be scored, as opposed to totals and bonuses.

Here is the PDF File of the Simplified Chinese Yahtzee Scorecard Adaptor. Be sure to print as “actual size” and simply allow the margins to be cut off as needed. This is free to use with attribution, under the CC-BY 4.0 license, so here is the InDesign file. If you make one for a different language (just change the text in my layout) or other variations, let me know and I’ll link to it or include it here.