A long, long time ago, Matt [Godbolt] and I submitted a program to Acorn User's "*Info" column which generated Filled Julia Sets on the Beeb - we'd just learnt how to do multiplication on the 6502, and it seemed like a good application for it. But it wasn't a particularly fast routine, and although we were proud of it at the time, there was undoubtedly a ton of room for improvement. It did earn us fifty quid though.

Fast forward, erm, 25 years... ...and I was thinking about something I'd seen in the past where Mandelbrots were rendered using a border tracing algorithm. Basically the idea is that, if you can identify where the colour boundaries are, you only need to iterate the Mandelbrot function there, and you can then do a simple flood fill on any areas which are left. I was wondering if this could be done on the Beeb, so, a few evenings later and voilà!, a new Mandelbrot program was born!
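For anyone who wants to play with the idea off the Beeb, here is a minimal Python sketch of one queue-based form of border tracing. It is not the Mandelbeeb code: it uses floating point rather than fixed point, and the names `escape_count` and `render_border_traced` are made up for illustration. Every screen-edge pixel starts on the stack; a pixel whose neighbours differ in colour pushes those neighbours, and anything never visited is filled from its left-hand neighbour at the end.

```python
UNKNOWN = -1  # marker for "colour not calculated yet"

def escape_count(cx, cy, max_iter=32):
    """Iterate z -> z*z + c and return the step at which |z| exceeds 2."""
    x = y = 0.0
    for i in range(max_iter):
        x2, y2 = x * x, y * y
        if x2 + y2 > 4.0:
            return i
        x, y = x2 - y2 + cx, 2.0 * x * y + cy
    return max_iter

def render_border_traced(width, height, x0, y0, x1, y1, max_iter=32):
    """Return (grid of colours, number of pixels actually iterated)."""
    colour = [[UNKNOWN] * width for _ in range(height)]
    queued = [[False] * width for _ in range(height)]
    stack = []
    evaluated = 0

    def load(px, py):
        # Calculate a pixel only once; the grid doubles as a cache.
        nonlocal evaluated
        if colour[py][px] == UNKNOWN:
            cx = x0 + (x1 - x0) * px / (width - 1)
            cy = y0 + (y1 - y0) * py / (height - 1)
            colour[py][px] = escape_count(cx, cy, max_iter)
            evaluated += 1
        return colour[py][px]

    def push(px, py):
        # The 'queued' flag stops a pixel being added by several neighbours.
        if 0 <= px < width and 0 <= py < height and not queued[py][px]:
            queued[py][px] = True
            stack.append((px, py))

    # Seed with the edge of the screen.
    for px in range(width):
        push(px, 0)
        push(px, height - 1)
    for py in range(height):
        push(0, py)
        push(width - 1, py)

    while stack:
        px, py = stack.pop()
        c = load(px, py)
        near = [(px - 1, py), (px + 1, py), (px, py - 1), (px, py + 1)]
        near = [(nx, ny) for nx, ny in near
                if 0 <= nx < width and 0 <= ny < height]
        # Only keep exploring where a colour boundary passes through.
        if any(load(nx, ny) != c for nx, ny in near):
            for nx, ny in near:
                push(nx, ny)

    # Simple fill: untouched pixels take the colour to their left.
    for py in range(height):
        for px in range(1, width):
            if colour[py][px] == UNKNOWN:
                colour[py][px] = colour[py][px - 1]
    return colour, evaluated
```

Calling `render_border_traced(80, 64, -2.0, -1.25, 0.5, 1.25)` returns the grid plus a count of how many pixels were really iterated, which is a quick way to see the saving. Like any border tracer, it can miss a detached island that no traced boundary touches.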

The disk image and source code are attached. Once it's finished drawing, you can move the zoom selection box around with ZX:/ and Return to select. For simplicity, there's no way to resize the box - it always zooms to 1/4 of the previous zoom (how convenient). Escape returns you to the fully zoomed out parameters. In very detailed bits of fractal, sometimes it runs out of stack and makes a total mess of everything, need to try and correct that!

If anyone can think of more ways to speed it up, I'd love to know! I actually expected it to run a bit faster than it does with the super optimized multiplication routine, but I think I'm finally at peace with the idea that the 6502 was never really meant to calculate Mandelbrots. Still, it's fun to see it working on the Beeb, even if it takes a minute or so to get there...

Just uploaded a new version, but it'll never run on a second processor as-is. It's doing everything illegal - screen access and keyboard access - and turns off IRQs and shuts the OS out, mostly so I can use as much of the OS's memory as possible in order to give it more space for its stack.

This imposes a maximum zoom level (as precision issues mean that it looks rubbish after the 5th zoom in), and checks for stack becoming full so it doesn't walk all over the multiplication tables and make everything look horrible. Also, lurkio pointed out that it needs a higher load address so that the open !Boot file doesn't complain when the executable is loaded - now fixed too, thanks!

The next way it could be improved would be to increase the precision, maybe using 64 bit values, and/or a floating point type model rather than the fixed point representation it currently uses. It could also do with a higher maximum number of iterations as the zoom level gets higher.
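To show why fixed point runs out of steam when zooming, here is a rough Python sketch. The format is an assumption (13 fractional bits in a 16-bit word); the post doesn't say what the program really uses, and `FRAC_BITS`, `fixed_mul` and `pixel_step` are illustrative names only.

```python
FRAC_BITS = 13          # assumed number of fractional bits (hypothetical)
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0

def to_fixed(value):
    """Convert a float to the nearest fixed-point integer."""
    return int(round(value * ONE))

def to_float(fixed):
    """Convert a fixed-point integer back to a float."""
    return fixed / ONE

def fixed_mul(a, b):
    """Multiply two fixed-point values; the raw product carries
    2*FRAC_BITS fractional bits, so shift back down (rounds towards
    minus infinity)."""
    return (a * b) >> FRAC_BITS

def pixel_step(view_width, pixels_across):
    """Distance between neighbouring pixels, in fixed-point units."""
    return to_fixed(view_width / pixels_across)
```

With these assumed numbers, `pixel_step(2.5, 160)` is 128 units, and every halving of the view width halves it again until neighbouring pixels can no longer be told apart. How many zooms that takes depends on the real format, so treat the figures as illustrative.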

I'd forgotten quite how bad the 6502 is at doing this sort of thing, so I decided to dig out the Acorn User disc from August 1990 to look at their Mandelbrot program, which promised "Mandelbrots at a speed you have never seen before"!

Here's the output compared to my program:

The Acorn User program took 496 seconds to generate that image at a resolution of 80x128, with a maximum of 40 iterations before deeming a pixel to be in the Mandelbrot set.

Mandelbeeb takes 98 seconds at double the resolution (160x128), with a maximum of 32 iterations.

The big win comes from the border tracing algorithm. The pixels which lie in the Mandelbrot set (the black ones) are the slowest to calculate as they have to undergo the maximum number of iterations, and we save a huge amount of processing by only needing to evaluate the pixels which form the outline. The table-based multiplication also makes a difference - my guess is that a more naive multiplication routine would maybe double the time taken.
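For reference, one common way to do table-based multiplication on the 6502 is the quarter-square method. The post doesn't say which scheme the program uses, so this is an assumption, shown as a Python sketch rather than 6502 code. It relies on the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4), which is exact for integers because a+b and a-b are always both odd or both even.

```python
# Table of floor(n*n/4) for every possible sum of two 8-bit values.
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]

def table_multiply(a, b):
    """Unsigned 8-bit x 8-bit multiply using two table lookups
    and one subtraction instead of a shift-and-add loop."""
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

For example, `table_multiply(200, 123)` gives 24600. The multiply itself is replaced by an add, a subtract and two lookups, at the cost of memory for the table.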

I've been thinking about how this would be adapted to run on the second processor. It'd have to use legal methods, i.e. using the VDU PLOT codes to output pixels, which is fine. A small issue is that the code currently uses the top bit of each pixel as a flag - each Mandelbrot pixel is made up of two screen pixels; the top bit of the top pixel is used to indicate that the pixel's colour hasn't yet been calculated, and the top bit of the bottom pixel is used to indicate that this pixel has already been put in the stack to be processed (so that it's not added again by a different neighbour). It'd be a big bottleneck to convert this to use the OSWORD to read a screen pixel, so I'd need to maintain a couple of buffers instead. Perfectly doable with all the second processor memory, but not such a trivial change. When I have time I'll give it a go.
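As a rough idea of what the buffered version might look like, here is a Python sketch that keeps the colour and both flags in one byte per Mandelbrot pixel, rather than in the top bits of screen memory. The bit layout and the names (`CellBuffer`, `NOT_CALCULATED`, `ON_STACK`) are hypothetical, and it uses a single buffer where the post talks about a couple.

```python
NOT_CALCULATED = 0x80  # hypothetical flag: colour not yet computed
ON_STACK = 0x40        # hypothetical flag: already pushed for processing
COLOUR_MASK = 0x3F     # low bits hold the colour / iteration band

class CellBuffer:
    """One byte per Mandelbrot pixel, kept in parasite memory."""

    def __init__(self, width=160, height=128):
        self.width = width
        self.cells = bytearray([NOT_CALCULATED]) * (width * height)

    def _index(self, x, y):
        return y * self.width + x

    def is_calculated(self, x, y):
        return not (self.cells[self._index(x, y)] & NOT_CALCULATED)

    def set_colour(self, x, y, colour):
        # Storing a colour clears NOT_CALCULATED but keeps ON_STACK.
        i = self._index(x, y)
        self.cells[i] = (self.cells[i] & ON_STACK) | (colour & COLOUR_MASK)

    def colour(self, x, y):
        return self.cells[self._index(x, y)] & COLOUR_MASK

    def try_mark_on_stack(self, x, y):
        """Return True only the first time a pixel is marked, so a
        second neighbour cannot push it again."""
        i = self._index(x, y)
        if self.cells[i] & ON_STACK:
            return False
        self.cells[i] |= ON_STACK
        return True
```

At 160x128 that is 20K of buffer, which is no problem in second processor memory.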

If you're manipulating pixels directly, the simplest thing to do would be to have a buffered mirror of the screen memory, manipulate that, then blat across the whole screen when you've finished. You would probably be able to use the same code with just a final 'if on CoPro then copy screen buffer'.

The default copy-across-tube calls (OSWORD 5 and 6) are very slow for large amounts of data, so it would probably be worth installing OSWORD &FF and calling it to chuck the entire 20K across in one go.

Then you wouldn't see it building up the image as it drew the outlines, which is the fun thing about the program and probably the only thing that makes the wait bearable! I don't know what the overhead is for sending two GCOL and PLOT commands (VDU 18,0,c,25,a,x;y;) across the TUBE to the I/O processor - I would assume that a single GCOL/PLOT call is probably pretty quick as it fits in the 10 byte FIFO, but that queuing two of them would cause a bottleneck while the TUBE FIFO was flushed. I'd certainly need to keep a mirror of the screen anyway, because it also acts as a cache of already-calculated values (along with the two flag bits).

Seems to me like the only way to handle this would be to adopt Elite's method and intercept WRCHV on the host, and have it interpret VDU a,b,c as GCOL 0,a; PLOT 69,b*8,c*4. Sounds like a lot more work though...
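To put numbers on the saving, here is a small Python sketch of what the host-side handler would be standing in for: the three bytes a,b,c expanded into the full GCOL 0,a and PLOT 69,b*8,c*4 sequence (VDU 18 is GCOL, VDU 25 is PLOT, with coordinates sent low byte first). The function name `expand_pixel_command` is made up for illustration.

```python
def expand_pixel_command(a, b, c):
    """Expand the compact 'VDU a,b,c' form into the standard VDU
    stream for GCOL 0,a followed by PLOT 69,b*8,c*4."""
    x = b * 8   # MODE 2 pixels are 8 graphics units wide
    y = c * 4   # ...and 4 graphics units high
    return bytes([
        18, 0, a,              # GCOL 0,a
        25, 69,                # PLOT 69 (plot a point, absolute)
        x & 0xFF, x >> 8,      # x as a 16-bit value, low byte first
        y & 0xFF, y >> 8,      # y as a 16-bit value, low byte first
    ])
```

So each pixel costs 3 bytes across the TUBE instead of 9, and the host does the expansion (or plots directly) on its side.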

Ah right! I was looking at http://www.sprow.co.uk/bbc/hardware/armcopro/004.pdf which said it was a 10 byte FIFO, but maybe the hardware buffer is 10 bytes, while the TUBE interrupt system implements a 24 byte FIFO which fills the TUBE hardware without blocking? Or maybe that documentation is just wrong?

That said, I quite like the elegance of a custom host WRCHV which can plot double pixels in colour 0-32 (performing the appropriate stippling), just with a simple VDU sequence (it could also do direct screen access to permit larger-than-MODE 2 screens). But it's a bit more effort to do!

I think it meant the maximum VDU command length was 10 bytes.

Later on it says: Register 1 (24 byte FIFO read only)

The Acorn TUBE ULA is 24 bytes, the Pi Tube Direct is 24 bytes, the Matchbox Co Pro is 32 bytes, and I think some of John Kortink's designs (e.g. ReTuLa) are only one byte. So your mileage may vary.

Yep, OSWORD 5 and 6 I knew about. What I'm wondering is how to get the host to execute arbitrary code. Can this only be done by intercepting vectors and going through the supported protocols (OSWRCH, OSBYTE etc), or is there some other way specifically designed to allow this? Obviously I can do anything by just literally writing WRCHV on the host with OSWORD 6 and reserving various VDU codes to perform different host operations, but I was wondering if there were a less hacky way.

I can't think of a cleaner way to execute arbitrary code on the host.

What the Z80 Co Pro does is install its own User Vector handler (with OSWORD 6), which then gets called by *LINE, *CODE and OSWORD >=&E0. The problem is that it's slow, and it's blocking (on the parasite side).

I think you do need to intercept the OSWRCH handler and install your own. That's the only way you'll then benefit from the 24-byte FIFO.

Thanks. I'll give it a go some time! Would definitely expect to see a 2x speed increase on a 4MHz 6502. To be honest though, there's unlikely to be a lot of speedup from running the calculations and plotting in parallel - the plotting code is a tiny fraction of the processing compared to iterating the Mandelbrot function. Unlike Elite, which benefits from both the faster 6502 and the parallelisation of the maths and the line drawing.

To intercept and extend the VDU protocol properly, you'd need the host-side code to model the lengths of all the VDU control operations. Which is not terribly hard, I suppose. But to do a quick and dirty job for a specific purpose, it's enough to model only the lengths which you'll use. It might even be possible to ignore the length problem, if you can arrange that some values will never appear as operands to control codes: FF and FE for example might not be used by your application, and could therefore be used for extended operations, such as a one-byte update.

Now it looks like all the outlines are grown in parallel from the outside inwards - not sure if this is more or less visually interesting, but it has lower space requirements, so it's probably here to stay.

@jgh: Thanks for the tip - nice to know what my options are.

@Ed: This would be a complete replacement for the VDU drivers, it wouldn't be supporting any of the real control codes, so I'd be free to implement it as I wished. I'd probably just say VDU c, x, y plots two pixels in colour 0-32 (with dithering), at (x,y) in screen pixels from the top left. Then would make exceptions for special values of c, to do things like clearing the screen, setting palettes, plotting the selection box, etc.
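A replacement driver like that boils down to a tiny state machine fed one byte at a time, which is how a WRCHV handler sees the stream. Here is a Python sketch of the idea. The special value `CLEAR_SCREEN` and the fixed three-byte command length are assumptions for illustration; nothing in the thread pins them down.

```python
CLEAR_SCREEN = 0xFF  # hypothetical special value of c

class PixelProtocolDecoder:
    """Collects bytes into fixed-length (c, x, y) commands.

    Because every command is exactly three bytes, the host never
    needs a table of control-code lengths."""

    def __init__(self):
        self.pending = []   # bytes received for the current command
        self.events = []    # decoded commands, oldest first

    def write(self, byte):
        """Accept one byte, as a WRCHV handler would."""
        self.pending.append(byte & 0xFF)
        if len(self.pending) < 3:
            return
        c, x, y = self.pending
        self.pending = []
        if c == CLEAR_SCREEN:
            self.events.append(("clear",))       # operands ignored
        else:
            self.events.append(("plot", c, x, y))
```

Feeding it the bytes 5,10,20 queues up one plot of colour 5 at (10,20); the real thing would write to screen memory at that point instead of recording an event.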

Of course the next possible speedup is to realise that if the zero line of the imaginary axis lies somewhere between your top y and bottom y, then the pixels with negative imaginary values are the same as those with positive values (i.e. the image is mirrored about 0 on the imaginary axis).
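A minimal Python sketch of the mirroring trick, assuming the view is centred exactly on the real axis so that every row has a partner; `render_mirrored` and its parameters are illustrative names, and it uses floating point rather than the fixed point of the Beeb program.

```python
def escape_count(cx, cy, max_iter=32):
    """Iterate z -> z*z + c and return the step at which |z| exceeds 2."""
    x = y = 0.0
    for i in range(max_iter):
        x2, y2 = x * x, y * y
        if x2 + y2 > 4.0:
            return i
        x, y = x2 - y2 + cx, 2.0 * x * y + cy
    return max_iter

def row_counts(cy, width, x0, x_step, max_iter=32):
    """Calculate one full row of the image."""
    return [escape_count(x0 + px * x_step, cy, max_iter)
            for px in range(width)]

def render_mirrored(width, height, x0, x_step, y_step, max_iter=32):
    """Calculate only the rows on one side of the real axis and copy
    each one to its mirror-image row."""
    rows = [None] * height
    half = (height - 1) / 2          # row position of imaginary = 0
    for py in range((height + 1) // 2):
        cy = (py - half) * y_step
        rows[py] = row_counts(cy, width, x0, x_step, max_iter)
        rows[height - 1 - py] = list(rows[py])   # mirrored copy
    return rows
```

This halves the work for a view centred on the axis. For an off-centre view, only the rows that have a partner on the other side of the axis can be copied.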

I'm writing a game where you can change your character from a Wizard to a monkey to a cat.

It's an interesting coincidence that similar threads exist on the Amstrad CPC and Commodore +4 forums, although they are about programs in BASIC. I have just converted one from Commodore BASIC to BBC BASIC. The original program.

Funnily enough, I have been writing the Mandelbrot in a BASIC program on the ZX Spectrum recently as I am a member of a facebook group called BASIC on the ZX Spectrum.

My code includes a zoom rectangle and three dithering methods (it is in monochrome, to avoid the famous Spectrum colour clash): simple alternating black/white iteration bands; black, checkerboard, white, opposite checkerboard, black (and so on); or smoothed bands running from solid black to solid white and back to solid black, with dithering in between. The only thing I haven't included is saving and loading of locations.

Needless to say, the code is pretty slow, but it has progressed as far as I wish it to for now. I tweak now and then to add another bit of functionality.

Does anyone wish to see the code and/or samples of the generated fractals? I'm not sure if links to the images on Facebook's servers would display inline in the post, so here's hoping:

The codes like \{vi} or \{vn} are ZX BASIC codes for setting inverse video or normal video, and \* (in line 20) is the copyright character, so can safely be ignored. There is a programming environment that runs in Windows, called BasinC which allows one window to edit code, another to see the output (e.g. an emulator) and is very complete. Installing it and pasting in my code is sufficient to run the program. There is a turbo mode within BasinC which activates on a press of the F1 key (which makes program execution in this case almost bearable).

The whole code is a bit spaghetti-like and could be improved, but Sinclair BASIC doesn't have PROCedures and REPEAT UNTIL etc.
