RetroBSD: Run old BSD Unix on a microcontroller

Modern microcontrollers are becoming quite beefy. The Microchip PIC32 line is actually an implementation of the MIPS32 4K architecture – and with 512K of flash and 128K of RAM you can even run Unix! RetroBSD is a port of BSD 2.11 for the PIC32. You might not be able to run X11, but it is still very useful and a great reminder of how small Unix used to be – and how far it has come.


While theoretically you could run a UNIX on this part, it would not really make sense. It would be better to run a standard RTOS and save memory for your application code. I’m not a fan of these parts over the ARM microcontrollers due to the lack of on-board pull resistors.

I think this story is more in the spirit of “People who want a full MIPS32 core with some decent memory to run a retro UNIX on” than “People who want a UNIX to run on a micro controller”, if you get my meaning.

So basically get the kernel running on the part and then do nothing with it? I see the point of the article about early kernels being small enough to run on a uC, but it’s a pointless exercise. Other, more useful things can be run on the part to actually do stuff with it. Articles like this give people the dumb idea that clusters can or should be built with microcontrollers. PIC32 is far from a microprocessor.

The PIC32 is a microcontroller not a microprocessor. It’s a microprocessor with Microchip peripherals wrapped around it to do microcontroller type applications. I am not even sure how a general computing “cluster” could even be implemented. Would love to know how this could be pulled off, still laughing at the concept.

Yes, I stand corrected on the MX; after re-reviewing the datasheet, the uC does have pull-up resistors on it.

I like the ARMs better because there is a lot more support for the family development tool-wise. The ARMs have a virtual monopoly on 32 bit microcontrollers. The STM32 being my favorite family. Not really sure what advantage the PIC32 would have over something like the STM32 or ATSam.

Also most RTOSes like Micrium, CoOS or FreeRTOS can all run in under 10K of RAM, not 128K. 128K is a lot of consumption on a uC and is not really suited for real-world applications.

The PIC32 is a microcontroller not a microprocessor. It’s a microprocessor with Microchip peripherals wrapped around it to do microcontroller type applications. I am not even sure how a general computing “cluster” could even be implemented. Would love to know how this could be pulled off, still laughing at the concept.

It would clearly be possible to attach many many micro-controllers to one or more interlink buses with peripherals, storage, etc. However I think what you might be missing is that it would be for boasting rights more than anything else. If someone built a cluster of these things, I think it would be osnews-worthy. Not because it’s particularly powerful mind you (a 4k cluster of pics still has a fraction of the ram and storage in my laptop, LOL), but it would be impressive just because someone managed to do it.

Don’t take it too seriously, look at it more as a project to have fun with. Additionally, don’t underestimate what people can do with primitive technology.

A good example is NASA’s original tech for the Apollo missions. Their tech was extremely primitive; your run-of-the-mill office worker today has much more computing power than the entirety of NASA back then, but it remains so fascinating because of what they managed to do with it.

Now there may not be much commercial merit in going back and doing things using primitive tools, but it doesn’t have to be commercially viable in order for us to recognize the technical skills of whoever took up the challenge.

It’s just for geeks and Unix zealots. Total waste of time and life. It could be done with a lot of coding to run the bus, but you are right, it would not make any sense capability-wise. Even then, you might not be able to run actual Unix applications on such an animal. I just don’t understand why people jump to the conclusion that a microcontroller is the same as a processor. I can see the device driver routines with a bunch of TRIS statements in them already… I am well aware of what can be done with simple technology, as I program microcontrollers daily. Funny, but CoOS actually takes up less than 1K and is a base RTOS!

Revisiting the datasheets…the PIC32 is actually a fine product. Since you can compile most of your C code to other microcontrollers pretty easily, the difference between using ARM vs. MIPS is pretty small.

I do a lot of things that only geeks enjoy, but so what? This isn’t really about unix zealotry. It could be any OS and still be interesting, IMHO.

Total waste of time and life.

Sure, some people enjoy legacy computers, some people photograph pots, some people fly model airplanes, some people collect dolls, some people watch TV… arguably it’s all a “total waste of time and life”. But if it’s what people enjoy doing, then who are you to judge? Seriously, if old stuff is so uninteresting to you then why even bother posting to an article titled “retrobsd”? Find something that actually does interest you and spend your time enjoying it! Fair enough?

Microcontrollers interest me, but using BSD on a small part makes zero sense considering the alternatives. Unless you use a large part, just getting the kernel running and not actually doing anything will take the entire part’s memory up. The practicality of the idea is what I am questioning. And the fact that people think microcontrollers are the same as a raw processor.

It is both. All microcontrollers are microprocessors, but not all microprocessors are microcontrollers.

A microcontroller is a microprocessor that includes memory (ie RAM) and I/O on a single die. Most microcontrollers also include Flash or [E][E][P]ROM to hold the program, but not all do. The XMOS chips load their program from an SPI Flash chip into memory at power up.

I am not sure I would go that far. ARM is very popular to be sure, but there are a whole lot of MIPS cores out there. It is probably the most common core used in routers, and those chips would qualify as microcontrollers. PowerPC microcontrollers remain very popular in telecommunications, etc.

ARM has a very large marketshare, but it isn’t quite to virtual monopoly yet.

Also most RTOSes like Micrium, CoOS or FreeRTOS can all run in under 10K of RAM, not 128K. 128K is a lot of consumption on a uC and is not really suited for real-world applications.

First off, let’s not focus only on RTOSes, as that is only one segment of the overall embedded OS world.

Yes there are plenty of tiny barebones embedded OSs out there, but that doesn’t mean they rule the roost or solve all problems.

If I am working on a resource limited chip and need to be close to the metal, then I will use an OS like the ones you mentioned; but there is a reason why they are so small: they do almost nothing.

But if you need more advanced scheduling, installable device drivers, networking, POSIX support (so you can use advanced libraries like Pocket Sphinx, etc), memory management, hardware security, etc; then you need a more capable OS, and that means larger footprints and beefier hardware.

That is why we have Prex, VxWorks, uCLinux, Unison, etc etc etc.

It isn’t a one size fits all solution. I can personally see using RetroBSD on the right project. I have learned never to say never…

Are you going to compile and debug all of these advanced libraries under MPLAB? It would be challenging to do all that from PIC32-gcc via command line. Running POSIX applications on a PIC32 not compiled into the application code is something that I would love to see in person, to see how it would be done. Also seeing all this fit into RAM or Flash would be interesting and difficult. You would need file system storage and a driver.

Also if you look at the major 32-bit volume vendors, they are mostly ARM based… And if they are not, then customers are starting to reject their proprietary platforms… This is what is happening to Renesas. The only reason Renesas introduced their ARM series is because they are having trouble selling the RX. Besides Microchip, the volume vendors are all ARM… This includes ST, NXP, Atmel (they pretty much gave up on the AVR32, which is an excellent product), Cypress, Luminary, TI, Freescale and several others. In the future we are going to see many other older proprietary platforms drop off. PIC32 will probably be one of the only popular non-ARM micros in the future because the large 8-bit base of Microchip developers are sticking with a brand name they are familiar with, but some of them are also seeing benefits in other lines and are questioning Microchip’s choice of MIPS over ARM considering differences in the two architectures… Context switching being one of them.

Are you going to compile and debug all of these advanced libraries under MPLAB? It would be challenging to do all that from PIC32-gcc via command line.

#1 why not? and #2 not everyone likes MPLAB. I think it is fine, but I know more than a couple people who just use makefiles and the command line. It isn’t that hard.

Running POSIX applications on a PIC32…

You just made up that requirement. Remember I was talking about cases where you would need a lot of resources, and a much more capable OS than CoOS, or FreeRTOS.

When I did the very thing I described, I was using a MIPS core and uCLinux, where I had an embedded filesystem and could run multiple applications… not just one that is compiled into the kernel.

Though I should point out that embedded OSs like Unison and DSPnano allow you to have a full filesystem that is then compiled into a single executable image, which can include multiple executables.

not compiled into the application code is something that I would love to see in person, to see how it would be done. Also seeing all this fit into RAM or Flash would be interesting and difficult. You would need file system storage and a driver.

The world of embedded OSes and embedded apps is vast and wide. It goes from CPUs that may have only a handful of bytes of RAM, all the way up to 64-bit cores and gigs of RAM. It isn’t just one thing, and there is cause for lots of different approaches to the OSes, including something like RetroBSD…

Also if you look at the major 32 bit volume vendors, they are mostly ARM based..

So you are saying that because the majority of “32 bit volume vendors” have ARM in their catalogs, ARM is dominant? I would suggest that you take a look at unit sales.

By that same logic, one would think that AVR and PIC had a virtual duopoly over the market, except that when you look at sales, the venerable 8051 sells more units.

The number of 8051’s shipped inside of SD cards is shocking in and of itself.

And if they are not then customers are starting to reject their proprietary platforms… [elided] Freescale…

Have you looked at Freescale’s catalog? It is filled with 68XX variants, 680X0 variants (ie ColdFire, and they are embedded in Freescale’s accelerometers), PowerPC, etc.

Atmel makes AVR and ARM sure, but they also make 8051s and SPARC cores because the market in those things is too big to be ignored.

ARM is widely popular, and I use it a lot, but that doesn’t mean it eclipses everything else. You should go off and look up actual sales numbers.

and several others. In the future we are going to see many other older proprietary platforms drop off. PIC32 will probably be one of the only popular non-ARM micros in the future because the large 8-bit base of Microchip developers are sticking with a brand name they are familiar with, but some of them are also seeing benefits in other lines and are questioning Microchip’s choice of MIPS over ARM considering differences in the two architectures… Context switching being one of them.

My points are: 1. ARM will dominate in the future and proprietary platforms will drop off for new designs; 2. you were talking about running POSIX applications on PIC32 and I told you that would be very difficult; 3. you are expanding your argument into larger platforms, and it would be difficult to implement real-world applications on the PIC32 because of size; and lastly 4. operating systems like Micrium are more than sufficient for practically every application on a microcontroller.

The 8051 is a venerable platform and there are lot of products still being produced with it, especially older designs.

1. ARM will dominate in the future and proprietary platforms will drop off for new designs,

Actually you said that ARM has a virtual monopoly today, and I challenged that. Also dominating and having a virtual monopoly are different things.

I then challenged your methodology. If you can prove that shipments of ARM CPUs represent the vast majority of all 32 bit cores, then I would like to see those numbers.

2. You were talking about running POSIX applications on PIC32 and I told you that would be very difficult…

First off I have never limited my arguments to PIC32, and have said so in other comments. You keep imposing this limit, not me.

But let us say we limit my argument to just PIC32 until we get to the next quoted passage.

PIC32s can support up to 512K, or much more if you use external RAM. 512K would give a lot more breathing room to RetroBSD; and if you use external RAM you could put uCLinux on it.

However if you just want to stick to RetroBSD, it is, by definition, a subset of POSIX. So long as the code I wanted to use could fit in the amount of RAM I have (which could be quite large, since the MX line supports external RAM) then it shouldn’t be that hard to port.

However the way you phrased your argument implied that this would somehow be very difficult in MPLAB, which you don’t explain; or that cross compiling using the PIC32 compiler would somehow be exceptionally difficult.

Except that I don’t understand why that should be. People cross compile for bare bones, or other OS’s all the time. It may take a bit to get all your compile flags in your makefile right the first time, but once you do you have a template to use on every other project. Most of the time it isn’t even as hard as that.

This isn’t some unusual practice. It is done every day by embedded devs across the world.

3. You are expanding your argument into larger platforms. It would be difficult to implement real world applications on the PIC32 because of size

Again no. I have always been talking about the wide world of all microcontrollers and pointing out that they don’t have to be small like the ones you are using. The definition of a microcontroller is quite broad.

I have never just limited myself to just PIC32, or just small microcontrollers, in my comment.

You repeatedly said that RetroBSD was overkill, and I have repeatedly tried to explain that it depends on what you are doing, and what your requirements are.

4. Operating systems like Micrium are more than sufficient for practically every application on a microcontroller.

Small nit: Micrium is a vendor, uC/OS is the OS.

Please define microcontroller for me. I already have, and pointed out they can refer to systems much more powerful than you imply.

and there is no need to limit yourself to an OS like uC/OS II. There is VxWorks, Prex, uCLinux, and even full Linux… depending on the microcontroller and the project’s needs.

As much as I like ARM as a core, and Atmel as a vendor, when I set out on a project for a client I don’t make any assumptions about core, bit depth, or vendor. I make a thorough examination of the requirements and then pick the best solution for the job.

I pick the best core and vendor for the job. If I need Bluetooth 4.0 then I see if I can use the Nordic nRF51822. If I need wifi and a lot of OS services, then I will look at MediaTek or Atheros.

The same applies to operating systems. Pick the right one for the job. Your hammer shouldn’t force all problems to be nail-shaped…

The 8051 is a venerable platform and there are lot of products still being produced with it, especially older designs.

First off this was my point, that using your “virtual monopoly” methodology, you would have missed the 8051 because it doesn’t dominate the catalogs; and yet it remains the most popular core today.

Secondly, not just old designs. SD Cards, RC toys, toaster ovens, etc etc etc and etc. It isn’t surviving off of its legacy, but on its merits.

A7 is a SOIC…even more advanced peripherals like a high end video card, wrapped into one part with an ARM core. PIC32 is a MIPS core wrapped with simpler Microchip peripherals and onboard RAM & Flash. MIPS processors are just the core itself in a part, put on a bus used in a PC with external RAM. Totally different target applications for each.

I think you mean SoC (System on a Chip). SOIC is an IC package type. While the A7 is definitely a SoC, it is most assuredly not a SOIC.

Thanks for spell checking my Windows Phone keyboard!

Good luck making your SPI cluster. Hope that you run the SPI interface in fast mode at a whopping 400 kHz. Better over clock the interface and add buffers to accommodate fanout interfacing those 4000 PIC32s!

Good luck making your SPI cluster. Hope that you run the SPI interface in fast mode at a whopping 400 kHz. Better over clock the interface and add buffers to accommodate fanout interfacing those 4000 PIC32s!

Dano:

I genuinely don’t want to sound mean, or hostile; but most of your comments sound like you looked at the summary, made some unfounded assumptions, and are sticking to them.

For example: that RetroBSD is just a kernel (it is a full distribution), that the kernel alone needs 128K (128K is what you need for the kernel, userland, and enough space to run apps), that the PIC32 doesn’t have internal pull-up/down resistors, and now the speed of SPI.

If you bothered to glance at the datasheet for the PIC32 you would see the maximum SPI speed is 25 MHz.

Indeed, I have yet to find a microcontroller whose SPI speed wasn’t in the MHz range. Even the AVRs work at 4 MHz.

I2C, however, typically works at around 400 kHz, but comes in MHz variants as well. The PIC32 supports 1 MHz I2C. It also supports 20 Mbps serial, and 10/100 Ethernet.

The Transputer’s link speed topped out at 20 Mbps, and I personally saw real-time 640×480 rendering demos done at the time. I don’t know how many you would need, but it is theoretically possible.

But back to rendering (which wasn’t my suggestion, but makes an interesting thought experiment): 25 MHz SPI provides enough bandwidth to transmit a 640×480 8-bit image uncompressed in under 1 second. Apply some mild compression to that and it should fit fairly easily.

To render in 24-bit colors, you would divide the problem across three PIC32s so you can handle all three color channels.

We need 30 fps, so we would need 30 rendering pipelines of 3 PIC32s each, for a total of 90. Of course I haven’t factored in rendering time, just transmission time, so that is really one frame every two seconds. So add in another 90 cores, so group A can be transmitting while group B is rendering.

You might need some extra in there for managing the pipelines, etc., so call it 200 tops.

Just like the transputer you would probably use a traditional computer on either side of the pipeline to send the data in, and collect the accumulated data and display it.

No one claims it would be easy, no one claims it would be useful, and there are a lot of assumptions (for example, that the scene can be rendered in one second); but the point of all of this is that it is theoretically possible. Nothing more. No one is running out to spend $1600 for 200 PIC32s ($8 each at QTY 100).

It is like you are complaining that people are having fun the wrong way…

I did mention an overclocked bus. Even at 25 Mhz, interfacing even 200 parts serially bandwidth-wise would be much slower than the processing power of all of the parts. A cluster won’t gain because of this, you just can’t get the results out. You are also comparing a cluster of microcontrollers to a parallel processed architecture that has a unique kernel that can actually split the thread properly between processor cores. The whole discussion using microcontrollers in a cluster given their output bandwidth is ridiculous.

I did mention an overclocked bus. Even at 25 Mhz, interfacing even 200 parts serially bandwidth-wise would be much slower than the processing power of all of the parts. A cluster won’t gain because of this, you just can’t get the results out. You are also comparing a cluster of microcontrollers to a parallel processed architecture that has a unique kernel that can actually split the thread properly between processor cores. The whole discussion using microcontrollers in a cluster given their output bandwidth is ridiculous.

I’ve been programming micros for almost 20 years. The transputer had very fast serial links for the time, but it was still limited to 32 cores, as they hit a limit as to what was feasible bandwidth-wise over the bus. The same thing would happen with large numbers of PICs. The transputer guys never really solved the larger distributed-core bus scaling issues, which was my point many posts ago. If you decide to cluster with SPI you have to divide the bandwidth by the number of bits in the transmission, and then divide by the number of parts that need to talk on the bus. If you don’t use chip select lines for each part, you also need to add address bits to the SPI transmission blocks. All this dividing of bandwidth goes up with the number of parts networked. You can do this division without me.

Yes and people got around that by having clusters of clusters, or parallel clusters working on different datasets. As I mentioned before I personally witnessed a real time rendering demo that involved 80-90 transputers.

There are solutions, it can be done, and has been done. This is not theoretical. It isn’t easy but it is doable.

Also the transputer’s fastest bus was 20 Mbps and the PIC32’s SPI bus is 25 Mbps; that, along with the fact that the PIC32 is much faster than the transputer ever was, should compensate for the need for addressing and dispatching services.

Also the transputer’s fastest bus was 20 Mbps and the PIC32’s SPI bus is 25 Mbps; that, along with the fact that the PIC32 is much faster than the transputer ever was, should compensate for the need for addressing and dispatching services.

And almost all of the PIC32s have more than one SPI module (many of them have 4).

This is an interesting conversation…I did learn and was reminded of many things.

Despite this, every time you add a new part to the system, you add more computing power… The problem is the computing output has to be put on a bus for output. The bus has limited bandwidth based on how many parts have to share it. The more parts you put on, the more division that takes place, even though you are adding computing power.

The PICs have limited bandwidth based on what they have for IO pins and interfaces. This limitation is what makes very large clusters difficult to implement with diminishing returns.

If you decide to cluster with SPI you have to divide the bandwidth by the number of bits in the transmission, and then divide by the number of parts that need to talk on the bus. If you don’t use chip select lines for each part, you also need to add address bits to the SPI transmission blocks. All this dividing of bandwidth goes up with the number of parts networked.

Despite this, every time you add a new part to the system, you add more computing power… The problem is the computing output has to be put on a bus for output. The bus has limited bandwidth based on how many parts have to share it. The more parts you put on, the more division that takes place, even though you are adding computing power.

You keep making the bad assumption that all nodes need to be on one common bus. I don’t know where you are getting this assumption from. I don’t want to sound condescending, but you NEED to erase this assumption from your mind so that you can see that many concurrency problems don’t require all nodes connected via a single bus.

The PICs have limited bandwidth based on what they have for IO pins and interfaces. This limitation is what makes very large clusters difficult to implement with diminishing returns.

Take a look at “embarrassingly parallel” problems. Graphics are a prime example of these.

Let’s consider a complex physics simulation for a humongous 3d game world that requires thousands of nodes for computation. Will it work if we connect them all to a single bus? No, of course not; as you’ve already pointed out, that could never work due to bus contention and overloading its electrical characteristics. You seem to throw in the towel at this point, however it is premature to suggest the problem cannot be solved without even considering alternative designs.

Edit: Obviously, don’t take this to mean I would use micro-controllers for this, I’m referring to the cluster connectivity characteristics here, which would be similar on PIC32, x86, etc.

In 3D space, it’s extremely likely for physical events to mostly occur between direct neighbors, and our network topology could optimize for that. For example, each physical node can be connected directly to its 3D virtual space neighbors. Hypothetically this design could scale to millions of nodes if need be. Traffic between (0,0) and (0,1) will never collide with the traffic between (2320,102) and (2320,103) for the simple reason that they’re not directly connected together. We could create more interconnections as needed by the game world, but nobody here is suggesting having hundreds/thousands of nodes on the same electrical bus, so there’s no need to argue in those terms.

You guys need to dredge up the November 1988 BYTE magazine. In the Circuit Cellar column, Steve Ciarcia built a supercomputer for calculating Mandelbrot sets out of 8051s, which communicated with each other using their built-in serial ports (IIRC, the 8051 serial port has a 9-bit mode that includes the ability to prefix a message with an address).

November 1988 is part 2 of the 3 part series. I’ve not found it online and I don’t know where my copy is.

You guys need to dredge up the November 1988 BYTE magazine. In the Circuit Cellar column, Steve Ciarcia built a supercomputer for calculating Mandelbrot sets out of 8051s, which communicated with each other using their built-in serial ports…

How in the world would you remember that? It’s definitely a good example.

I couldn’t find the byte article online either, but maybe this is related to the machine you are referring to? It certainly sounds like they designed the entire cluster for the very specific purpose of calculating mandelbrot recursions.

It would be pretty awesome if these old articles could be re-published online. I think they’d still give us tons of things to talk about, and they might even be more interesting than the news we have today.

I genuinely don’t want to sound mean, or hostile; but most of your comments sound like you looked at the summary, made some unfounded assumptions, and are sticking to them.

Haha, I got the same impression. I assumed Dano was in agreement with the points in my last post that he didn’t respond to.

But back to rendering (which wasn’t my suggestion, but makes an interesting thought experiment): 25 MHz SPI provides enough bandwidth to transmit a 640×480 8-bit image uncompressed in under 1 second. Apply some mild compression to that and it should fit fairly easily.

To render in 24-bit colors, you would divide the problem across three PIC32s so you can handle all three color channels.

I wouldn’t divide it this way, let me explain why.

While it would technically work, it would result in 3 controllers duplicating the exact same polyfill/raytracing computations (other than a different intensity per color channel, obviously).

If one controller calculated all 3 color channels, that controller would be slightly slower. However with 2 other controllers being liberated from the need to (re)compute those same polygons/rays, it would result in a net gain.

No one claims it would be easy, no one claims it would be useful, there are a lot of assumptions (for example the scene can be rendered in one second for example); but the point of all of this is that it is theoretically possible. Nothing more. No one is running out to spend $1600 for 200 PIC32s ($8 each at QTY 100).

I think one second per frame is much too conservative, actually. These links are for AVRs and not PIC microcontrollers, but nevertheless they show Doom and Quake running in real time. Quake is a bit slow on a 140MHz controller from six years ago, but still much faster than 1 FPS.

These ran at 320×240 (just like my Pentium did back in the day). Today’s controllers can probably do better. This would be a total hack, but since there are no “rules” for DIY inventors, you could even hook up an array (say 4×4) of low-resolution 320×240 LCDs into a larger panel of 1280×960, driven by 16 microcontrollers.

You understand that this is the full distribution of BSD 2.11 right? Shell, games, awk, etc?

So you can do anything that people who used BSD on a PDP-11 or early VAX could, which was a lot. It is smaller than uCLinux, smaller than Prex, but supports a very similar feature set. It is very mature and well understood.

You keep calling it big, but have you actually looked at its memory footprint? From what I can see it is roughly on par with VxWorks “new” microcontroller Kernel. And of course you don’t have to use the full BSD distro if you were using it in an embedded project.

Oh I realised the context; I just wonder what actually are pull up or pull down resistors… :p (and prefer to ask than to google; I might still have to do the latter, this thread is probably going to close soon :p )

A pull-up resistor is one that is attached to a GPIO pin and positive voltage. This way it is being “pulled up” and requires the circuit to pull it down. Pull-down resistors are the inverse: the resistor is attached to ground, and requires the circuit to provide positive current.

They are used for various things, but buttons are a common case. When the button isn’t closed (i.e. pressed), the GPIO pin is “floating” and may return 0 or 1 until it is pressed. To resolve this you use pull-up or pull-down resistors.

Internal pull resistors are built into the chip itself and can be enabled via software. It is a nice-to-have, but far from essential. Some engineers don’t like to trust the internal resistors for things like buttons because they can be fairly weak, and so sometimes you can get false positives or false negatives.

I’m not an EE, but basically these resistors are there to pull up or down the voltage by default (when no transistors/latches are active) – to prevent the pin from being in a floating state that can produce erratic behavior with other components.

Consider that without the resistor, the pin’s on/off states might be 5v/unknown. But by adding a pull-down resistor (internally or externally) we can make the pin’s on/off states be 5v/0v. The resistance needs to be chosen such that it’s strong enough to pull the voltage down to 0v, but weak enough to allow the transistor to pull it back up to 5v.

Sometimes you want a floating state to implement tri-state logic (on/off/unset), and in such cases a pull-down resistor is bad. Consider a 485/1wire/etc bus where you can hook up dozens of devices together on the same wires. If each device had a resistor pulling down the voltage to 0 on the same line, the sum of all those would exceed the transistor’s power to pull it back up to high voltage. So in these cases two transistors are used to explicitly set on/off states on exactly one device at a time.

It compares rather well in terms of features, with one weak area: threading. I might be wrong about that, but I don’t think I am.

Perhaps a 4k pic cluster could be used for raytracing, it would be an impressive feat. Although I suspect the limited ram of the pics would pose problems even for relatively moderate scenes. Protein folding applications might be feasible. If nothing else, it could do bitcoin mining.

Of course anyone tackling these problems seriously would opt for high-powered GPGPUs or custom-designed FPGAs/ASICs. Using a PIC cluster is just a neat challenge.

The fact that you don’t see the problem is the problem. You have to have a kernel running on every part in the cluster, drivers and hardware to tie them all together, enough onboard memory to run user code, and an interface to launch user code, along with a hundred other issues, including clocking and bus arbitration problems that would slow a cluster to a halt even if you had enough RAM and glue to get it all working. Has anyone here actually programmed a microcontroller before? Even after all of this you would need custom app code, and perhaps a custom compiler, to make a program that actually runs on the platform.

And no, this wouldn’t be that bad if the cluster master (a normal system) and the cluster were designed right. In fact, it’d just be a really, really large version of the Cell processor used by Sony in the PS3.

Yes, the Cell was a PITA to program for, but it did work. Now scale that out to 4k nodes with about the same amount of RAM. A PITA but doable.

The Cell is a multi-core processor with an internal bus interconnecting all of the cores. You don’t have anything like that available to interconnect thousands of PIC32s. The software that would run on a theoretical PIC cluster would not be a standard UNIX application. You won’t have enough I/O and clocks to make it run efficiently. As a matter of fact, you would have a limit on how many PICs you could interconnect, even if you wanted to do it, just because of I/O.

All depends on the design.

If you take a handful and put them on one board controlled by a master on that board, then interconnect those boards you could probably do it and scale pretty well.

It’s not as if that kind of scaling hasn’t been done before.

I didn’t say it would be easy either – it’d be a massive PITA. But it is doable if you expand your thinking on how they would all interconnect.

Personally I would take a page out of the Transputer playbook and implement a 4-way interconnect (possibly over SPI) to the other chips on the board. Give each chip in the cluster some form of unique ID, and then dedicate one chip per board to connect to three other boards.

Or you could just use I2C as a crude network. Not as fast but requires less overhead from the code. This would limit the number of nodes per network, but there are solutions for that.

You would still need custom device drivers to interface the kernel in each micro to a particular interface. 100 Mbit Ethernet or a custom parallel interface would be the only options with enough speed to get the data out of each part in a cluster.

Ray tracing? What would the output be presented on? You guys are on crack and I am sure that you don’t understand the depth of the problem.

With all due respect, you can dismiss the practical utility all you want, but if you are trying to dismiss the technical feasibility then I think you lack vision and ingenuity.

Standard-definition TV has roughly 704×240 pixels at 60 fps (interlaced); that’s 10,137,600 pixels/second (assuming every single pixel gets computed every single frame). With a ~4k cluster, that’s 2,534.4 pixels/processor/second. Depending on the PIC32 model, we might get 100–330 MIPS; let’s choose conservatively: 100 MIPS. That’s 39,457 instructions per pixel. Let’s assume 75% of that is lost to various overhead, so about 10K instructions are conservatively available for raytracing a single ray. It certainly seems this could produce some stunning realtime ray tracing.

Granted, I haven’t done it, and there may be pitfalls (I suspect the lack of RAM will limit scene complexity), but that gets solved as you come to it. Fractal/procedurally generated scenery can produce infinite detail without infinite RAM, etc. The trick is to optimize, optimize, optimize… Creativity can overcome huge obstacles, but only if we allow ourselves to envision it first instead of being dismissive.

Your math dramatically underestimates bus arbitration and software overhead, if you could even make the connection on a bus at all. And you know that it’s not really even feasible with this family.

Of course it’s naive to stick 4k controllers on one massive bus, but that is part of the problem you need to solve creatively. It seems you have strong preconceived ideas about the limitations of microcontrollers, stemming from the designs you’ve already worked with. But try to think creatively, and other more scalable designs will pop out that are better suited to the problem at hand.

The internet doesn’t croak with millions of hosts online because it’s subdivided into hierarchies. Our bus doesn’t need to be flat; by applying appropriate network hierarchies we can increase parallelism while reducing I/O contention.

However, if you think about raytracing for a second, you should realize that one raytracing controller has very little reason to communicate with any of its peers, and we can use this to our advantage in designing a bus. Right there, a lot of your theoretical problem goes away. For the most part the controllers could be write-only: take rendering instructions, then output the rendered results into a separate buffer feeding the DAC. For that matter, since the controllers will be rendering the exact same scene at a tiny offset, we might even get away with just broadcasting the same scene instructions to all controllers simultaneously and letting them render the scene on their own with no further I/O at all.

This is exactly what I mean. When we take some time to think about the problem, we find that the problems we thought were intractable (e.g. bus I/O overhead) can be optimized away if we dare to embrace the challenge. That’s all part of the fun. If we jump the gun and simply assume it’s not feasible, then not only does this attitude hold back our ingenuity, it also prevents us from enjoying the thrill of solving the challenge.