Someone must have thought it was important to include the input number (IN1, IN2 and IN3) — possibly so they would not need to look through defines to see what value goes to what input — so I kept those in my names.

But now, when I find this code months down the line, I’ll immediately know which function is most likely detecting that the robot is on fire.

NOTE: This article was originally written a few years ago, so some references may be out of date.

I have been enjoying working on the SirSound project the past month, as well as some fun challenges at my day job. Every now and then I run into something I’d like to do that is not doable in C, or doable but not proper to do in C, or … maybe doable. It’s sometimes difficult for me to find answers when I do not know how to ask the question.

With that said, I have a C question for anyone who might be able to answer it.

Using my multi-track music sequencer as an example, consider representing data like this:

It seems like all I’m doing lately is regurgitating things that Robin of 8-Bit Show and Tell has already done.

So let’s do that again.

Don’t blame me. Blame YouTube!

YouTube did what YouTube does and it showed me another of Robin’s well-done videos. This one caught my attention because it dealt with the Commodore VIC-20 and its Super Expander cartridge.

The main thing that pulled me away from Commodore was seeing the TRS-80 Color Computer’s Extended Color BASIC. The CoCo had simple commands to play music and draw lines, boxes and circles. It also had this wondrous ELSE command I’d only heard rumors about.

On the VIC-20, it seemed you needed to use POKE and PEEK for just about anything graphics or sound related. Thus I gave up a computer with programmable characters and a hardware sound chip for a machine that had neither. On my new CoCo, at least I could draw a circle and play music without needing pages of DATA statements and cryptic PEEKS and POKEs.

Commodore was aware of this shortcoming, and they sold the Super Expander as a way to make up for it. Not only did it provide an extra 3K of memory (giving a whopping 6.5K for BASIC), it also added new commands to do “high resolution” graphics including drawing lines and circles, as well as ways to PRINT music using simple notation.

I used the Super Expander to do TV titles for fishing videos my father shot and edited. It was a thrill to see my VIC-20 graphics on TV screens at the Houston Boat Show.

But no one else could run my programs unless they had purchased the Super Expander as well.

But I digress.

(And besides, the Commodore 64 was $600 when it came out, and I was able to get a 64K CoCo 1 for $300 at the time.)

Don’t blame YouTube. Blame Twitter.

Robin’s video was making use of the Super Expander to let the VIC-20 solve a challenge initiated by Twitter user Dataram_57. On June 8th, 2019, they wrote:

That’s not a very good example. It doesn’t erase itself, nor does it use the bottom line to avoid screen scrolling when the ball hits the bottom right position. It does show how I would use X and Y coordinates then an XM (X movement) and YM (Y movement) variable to increment or decrement them based on if they hit an edge.

The parameters of Dataram_57’s challenge were as follows:

Width: 90

Height: 80

Starting Position: 0,0

Time: ???

I wrote a quick graphical program do do this using my X/Y/XM/YM method:

The first thing to notice is that I draw a filled box from 0,0 to 89,79 and then set black pixels in it. This lets me visually verify my line is going all the way to the edge of the 90×80 target area. Also, I am using the CoCo 1/2 double speed poke since this is time consuming. If you do this on a CoCo 3, feel free to use POKE 65497,0 instead.

Twitter user Dataram_57’s challenge running on a CoCo.

Eventually the area should be entirely black when every dot has been erased.

How long has this been going on?

I did some tests and figured out that it takes 7032 iterations (0-7031) for the dot to cycle through the entire 90×80 area before it has erased all the other dots.

With that in mind, I propose we turn this into both a logic and optimization challenge. On the CoCo, let’s see if we can use the PMODE 0 screen (128×96 resolution with 1 color). We can put this in a modified version of benchmark framework for 7032 cycles and see how fast we can do it. (By modified, I am removing the outer “try this three times and average the results” loop.)

My original “nicer to read, slower to run” version took 25.28 seconds to run. Here are some things that can speed it up.

RENUMber by 1

Renumbering the program by 1 can save program memory (since “GOTO 77” takes less space than “GOTO 1500”, and if the program is large enough, you get a slight speed increase since it’s also less work to parse shorter line numbers. But this program is too small for this to matter.

Remove comments / avoid comments at the end of lines

Removing comments, in general, is a good way to make a program smaller (and thus, a bit faster since BASIC will have less lines to skip). But, as I found out, comments at the end of lines are a real problem, since BASIC cannot just skip the entire line like it can if the line starts with a comment:

10 X=42:REM SET X TO THE ANSWER.

10 REM SET X TO THE ANSWER. 20 X=42

The second version is faster, since even though BASIC has to scan an extra line, it can immediately skip it, while the first example must be parsed, character-by-character, to find the end of that line. But for this example, this did not help. It would have, if there were more GOTOs to earlier line numbers, since they would have to parse through a few lines with end comments, which would be slower. Since our looping was done with a FOR/NEXT, we didn’t have to do that.

Remove unnecessary spaces

Spaces are great for readability, but bad for code size and speed. There are only a few times when a space is needed, such as when BASIC can’t tell if the next character is a keyword or part of a variable:

10 IF A=B THEN 100

BASIC must have the space between the variable and the keyword THEN, else the parser isn’t sure if the variable is “B” or “BT” followed by the characters HEN. BUT, if it’s a number, it can figure it out:

10 IFA=42THEN100

Removing all other spaces reduces the amount of code BASIC has to parse through. My program had plenty of spaces. Removing them in this tiny program only reduced it to 25.15 seconds, but it can have a bigger impact on larger programs.

Combine lines

This one is easy. The less lines BASIC has to scan through, the quicker it will run. But, there are exceptions to this. Any time BASIC has to do more work at the end of a line, such as checking an ELSE, if that condition is not satisfied, BASIC has to scan forward character-by-character to get to the end of the line:

10 IF X=42 THEN 100 ELSE PRINT "I DON'T KNOW THE ANSWER"

Above, if X is 42, the ELSE code will not be ran. BUT, before the program can find 100, it has to scan through all the rest of the bytes in that line to find the end of it, then it continues scanning a line at a time looking for 100. Based on line lengths, sometime it might be faster to do:

10 IF X=42 THEN 10020 IF X<>42 THEN PRINT "I DON'T KNOW THE ANSWER"

Combining lines in the program reduced the speed slightly to 25.13 seconds. The more code, the more you save by doing this (and it makes the program smaller, since every line number takes up space).

NEXT without variable

Unless you are doing really weird stuff with FOR/NEXT loops, it’s faster to leave off the variable in the NEXT. For example:

10 FOR A=1 TO 10:NEXT A

…will be slower than:

10 FOR A=1 TO 10:NEXT

NEXT without a variable will go to the most recent FOR anyway, so this works fine:

10 FOR A=1 TO 1020 FOR B=1 TO 2030 PRINT A,B40 NEXT50 NEXT

For this program, just removing the variables from NEXT A and NEXT Z increased the speed to 24.66 seconds.

Use HEX instead of decimal

It takes much more work to convert base-10 (i.e., “normal”) decimal numbers than it does to convert base-16 (hex) values. Every time a number is encountered, time can be saved by simply changing the number to a HEX — even though a hex digit has to start with “&H”. It will make the program larger, since 15 takes less bytes than &HF. For this example, it reduced time down to 21.66 seconds!

Pre-render strings

If you have to PRINT a generated string more than once, you are better off pre-rendering that string first so it only gets generated one time. This includes using things like STRING$, CHR$, etc. For example:

10 PRINT CHR$(128);"HELLO";CHR$(128)

Each time BASIC has to print that, it allocates a new string a few times and copies things over. See my string theory article for more details. For this program, I use STRING$() to generate a repeating block of characters. But, since it’s in a loop to print the block, it generates the same string over and over for each line of the block. Instead of this:

10 FOR A=1 TO 1020 PRINT STRING$(10,128)30 NEXT A

…I could generate that string ahead of time and just print it in the loop:

5 LN$=STRING$(10,128)10 FOR A=1 TO 1020 PRINT LN$30 NEXT A

In this program, doing this trick reduces the time to 20.98 seconds! String are slow, you see…

The time savings so far…

Combining all of these (the ones that matter, at least), reduces time taken down to 18.45 seconds – saving 6.83 seconds overall.

Here is what it looks like, with the stuff outside of the benchmark timing loop left alone:

If we’d done any GOTO/GOSUBing to earlier lines, that would have to scan through those lines with ending REMarks, so cleaning all that up would be useful. If this was a larger program, RENUMbering by 1s might also squeak out a bit more speed.

But wait! There’s more!

I originally wrote this follow-up the evening that I wrote the original article. However, several folks have contributed their own optimizations. Let’s see what they did…

First, we go to the comments of the original article…

George Phillips – inner loop optimization

My current best effort is going full strength reduction on the inner loop. Replace lines 120 to 170 with:

The STRING$ requires Extended Color Basic which starts at a baseline of 20.4 seconds and those changes get it down to 13.

George Phillips

George pre-rendered the string, removed spaces, and vastly sped things up by simplifying the math. On Xroar, it reports 16.15 – faster than any of my efforts. Nice job!

I took his updated code and converted the integers to HEX and that dropped it to 15.1 seconds! Even faster.

Jason Pittman – hex and start row calculation

Interestingly, changing the blue square value on line 130 to hex (“&HAF”) saves about 11.5% for me (27.55 to 24.33). Then, moving the row start calculation out of the print statement takes it down to 22.87.

120 FOR A=32 TO H*32 STEP 32130 PRINT@P+A,STRING$(W,&HAF)140 NEXT A

Jason Pittman

Ah, the magic of hex! Jason did a similar trick with the row calculation. I tried Jason’s updates, and it was 20.28 on my Xroar. George is still in the lead.

Adam – DIM, reductions and variable substituion

So I was able to cut out 5 seconds doing the following:– DIM all variables– Adding variables for constants (2, 32, etc.) for lines 130, 160.– Removing the math in lines 10, 20.– Removing the variables after NEXT.– Removing all remarks.

Ciaran reorganized the code, placing initialization at the end. This is a habbit I need to get into since GOTO/GOSUB to an earlier line number have to start at the FIRST line of the program and scan forward EVERY TIME. Code that only needs to run once, if placed at the top, slows down every GOTO to an earlier line number.

I just ran this and got 13.13 seconds!

There is alot to dissect here. Numbers are replaced with variables. Stuff is pre-calculated. And he’s using a variable for the end of a FOR/NEXT, for some reason. I’m going to have to dig in to that one.

Twice as fast, indeed!

I feel like there were a few more responses, but I can’t locate them at the moment, so maybe there will be a second attempt article down the line.

Until next time, kudos to all of you for giving me new things to consider — especially Ciaran with that amazing improvement. Very impressive.

Yesterday, I wrote a short bit about how I wasted a work day trying to figure out why we would tell our hardware to go to “902.1 Mhz” but it would not.

The root cause was a limitation in the floating point used in the C program I was working with. A float cannot represent every value, and it turns out values we were using were some of them. I showed a sample like this:

The float representation of 902.1 would display as 902.099976, and this caused a problem when we multiplied it by one million to turn it into hertz and send it to the hardware.

I played WHAC-A-MOLE trying to find all that places in the code that needed to change floats to doubles, and then played more WHAC-A-MOLE to update the user interface to change fields that use floats to use doubles…

Then, I realized we don’t actually care about precision past one decimal point. Here is the workaround I decided to go with, which means I can stop whacking moles.

I wasted most of my work day trying to figure out why some hardware was not going to the proper frequency. In my case, it looked fine going to 902.0 mHz, but failed on various odd values such as 902.1 mHz, 902.3 mHz, etc. But, I was told, there was an internal “frequency sweep” function that went through all those frequencies and it worked fine.

I finally ended up looking at the difference between what our host system sent (“Go to frequency X”) and what the internal function was doing (“Scan from frequency X to frequency Y”).

Then I saw it.

The frequency was being represented in hertz as a large 32-bit value, such as 902000000 for 902 mHz, or 902100000 for 902.1 mHz. The host program was taking its 902.1 floating point value and converting it to a 32-bit integer by multiplying that by 1,000,000… which resulted it it sending 902099975… which was then fed into some formula and resulted in enough drift due to being slightly off that the end results was also off.

After this runs, it will report the approximate number of seconds it took. It does this by resetting the TIMER at the start, then printing the current TIMER value divided by 60 (since the CoCo timer is based on the NTSC video interrupt that happens 60 times a second).

NOTE: If you run this on a PAL system, you will need to change the 60 to a 50 in line 190. (edit: thanks, George P., for catching my typo.)

On the Xroar emulator running on my Mac it reports 25.25 seconds.

Color BASIC scaling demo.

Your challenge, should you decide to accept it, is to take this code and make it run faster.

Rules

You must leave the basic algorithm intact (the SW, SH, S and SH stuff with all the math). You can rename variables, change the representation of values, speed up PRINTing, etc. but the core program flow should remain the same.

For bonus points, you are welcome to rewrite the program (in BASIC) to improve upon the algorithm in any way that makes sense, provided it achieves the same results (including the 1 to 100 benchmark loop).

There are some very (very!) simple things that can be done to dramatically improve the speed to his code.

Feel free to share your efforts in the comments. If you post your code, be sure to post the resulting time, too.

There is a PIC compiler from CCS that I use at work. Every now and then we run across a bug (most of which get quickly fixed by CCS). The latest is a rather bizarre issue where strings are getting mixed up.

Attempting to do something like this:

printf("Value %d", value);

…created output like:

XXXXXX42

…because somewhere else in the code was this:

printf("XXXXXXXXXXXXX");

The compiler was using the correct format string (to know where to put the %d variable on output) but was displaying the string from a different bit of code.

And that other bit of code was unused (not being called at all).

This may be a compiler optimizer bug, but I found it amusing. I haven’t seen a mixed up printf() issue before.

That example is backwards from the topic of this post, but hopefully you can see the approach.

The question today is … which would you choose?

Clean Code would say writing it out in the simplest form is the best, since it would take less time for someone to understand what it was doing. But if you needed every last byte of code space, or every last clock cycle, you might want to do something else.

Every now and then, I run across a bug in a C program that makes me say “oh, one of those again?” For instance, a very common one is when you want to compare two values using equal (“==”) but you leave one off and do an assignment by mistake (“=”):

if (a = 10)
{
DoSomething();
}

When you do this, the result is the value, thus “a = 10” resolves to 10, so it’s like saying “if (10) …”. It would work for any non-zero value, since “if (0)” would not run. A fun bug to find!

Most modern C compilers will emit a warning about that, because they know it’s probably not what the programmer intended.

But a lot of crappy small embedded compilers don’t. And since I work with a lot of small embedded compilers, I stumble across bugs like this from time to time.

I found a new one to add to my list, this time using a logical “and” comparison (“&&”) versus an accidental bitwise “and’ (“&”). Consider this:

if ( (a==100) & (b==42) )
{
DoSomething();
}

The intent of that code was to only DoSomething() if a was 100 AND b was 42. But, there was only one ampersand, so it was actually taking the result of “a==100” (either true or false) and mathematically ANDing it to the result of “b==42” (either true or false).

If a was 100 and b was 42, it was doing this:

if ( (true) and (true) )
{
DoSomething();
}

And that worked, since “true” resolves to a 1, and “1 & 1” is 1. (My grade school math teacher never taught it that way…)

But, clearly this was a bug and it should have been the logical AND (“&&”).

Which made me look at the generated code and see what the difference is. And the difference is that using the bitwise AND generates far more code because it has to save the results of two comparisons then AND them together and check that result. It was basically doing this: