The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Part 2

Comments

What's the point of non-long-aligned hubexec? From what I understand, it just complicates addressing and makes even streamer-aligned jumps take a tick longer: the first instruction spans two longs, so it can't run until both of them have been fetched. Of course, I could just ignore it by putting "long" at the top of all my PASM...

Otherwise, I'm very excited for the P2 and wish I had the time and money for an FPGA!

I thought all instruction fetches are long aligned, including in hubexec mode.

Byte addressing is natural and normal. The smallest "atom" of data in any machine nowadays is the byte.

That does not imply that instructions can sit at any byte address. There are many machines, especially those with fixed-size 16- or 32-bit instructions, where instructions have to be located on two- or four-byte boundaries.

If you are going to have byte addressing in HUB, it makes sense to be consistent and have byte addressing in COG too, even if you can never address an arbitrary byte in COG.

First of all, let me clarify that I'm only talking about instruction fetching. Data should definitely be byte-addressable. I love that (R|W)F(BYTE|WORD|LONG) can operate on misaligned words and longs.

All instructions are exactly one long. The hub streamer reads one aligned long at a time, so an instruction that isn't long-aligned must straddle a long boundary. That means the first instruction after a jump can't run until two aligned longs have been fetched by the streamer. An optimizing compiler that aligns hubexec jump targets, so that the streamer is ready to deliver the target instruction immediately after the jump, would never misalign an instruction, because doing so would slow down the first instruction after every jump. Yes, this is a matter of ticks, but they add up in inner loops.
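The fetch cost described above comes down to simple arithmetic. The Python sketch below is a simplified model, not the actual streamer logic: it assumes only that the streamer delivers whole, long-aligned longs and that an instruction can't start until every long it touches has arrived. The function names are hypothetical.

```python
LONG_BYTES = 4  # every instruction is exactly one 32-bit long


def longs_spanned(byte_addr: int) -> range:
    """Indices of the aligned longs that an instruction at byte_addr occupies."""
    first = byte_addr // LONG_BYTES
    last = (byte_addr + LONG_BYTES - 1) // LONG_BYTES
    return range(first, last + 1)


def fetches_before_first_instruction(jump_target: int) -> int:
    """Aligned-long fetches needed before the jump target can execute (simplified model)."""
    return len(longs_spanned(jump_target))


for target in (0x1000, 0x1001, 0x1002, 0x1003, 0x1004):
    print(f"${target:05X}: {fetches_before_first_instruction(target)} long(s)")
```

Only targets whose low two address bits are clear need a single fetch, which is why an alignment-aware compiler would keep its jump targets long-aligned.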

Caveats/Questions:
1. COGSTART/COGINIT will have to clear the special registers in the cog. Or could that be done at BOOT and COGSTOP?
2. Will the remainder of COG RAM/registers & LUT be cleared?
If possible, I suggest NO: then we can actually keep whatever was in cog RAM from a previous start.
Or do you power down the whole cog and its RAM & LUT at boot and/or COGSTOP?
3. Will HUB RAM be cleared on BOOT?
Again, I suggest NO. With the P2 FPGA it's been nice to be able to reboot and examine hub RAM by jumping into the monitor.

How is the HUB ROM going to work? Will it be part of the HUB address space, or will it be switched out after boot-up?

1. The reset hardware causes all the cog I/O to reset. However, it is still necessary to clear the RAM registers for OUTA/OUTB/DIRA/DIRB. This will be done by the now-simplified cog startup code that is realized in logic, as it's only a 5-long program:

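MOV OUTA,#0   ' clear port A output register
MOV OUTB,#0   ' clear port B output register
MOV DIRA,#0   ' clear port A direction register (all pins become inputs)
MOV DIRB,#0   ' clear port B direction register (all pins become inputs)
JMP PTRB      ' jump to the address held in PTRB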

2. Nope. They will retain whatever they had in them before.

3. Ultimately, on the chip, hub RAM will have to be cleared on a failed startup when security is enabled, to get rid of all SHA-256 residue and any other data left behind. Without security enabled, though, there's no need to clear hub RAM.

(4.) The hub ROM is read via COGID WC, sequentially. This happens at boot-up and the contents are loaded into the first 16KB of RAM and executed by cog0. 16KB is complete overkill, for now, but it is sufficient room for a complete on-chip development system in the future.

I was thinking about how addresses above $1000 are hub-exec, while addresses $0..$7FF are cog-exec and $800..$FFF are LUT-exec, and what a pain it is that you can't have hub-executable code below $1000. Then, it dawned on me that cog/LUT-exec could be restricted to long-aligned addresses, only, allowing hub exec to occur on non-long-aligned addresses below $1000. Here's the new way:
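A minimal sketch of the decoding rule described above, comparing the current map with the proposed one. It assumes byte-addressed PCs and uses the ranges quoted in the post ($0..$7FF cog, $800..$FFF LUT, $1000 and up hub); the names `Space`, `decode_current` and `decode_proposed` are hypothetical and this is not the actual P2 logic.

```python
from enum import Enum


class Space(Enum):
    COG = "cog-exec"
    LUT = "LUT-exec"
    HUB = "hub-exec"


def decode_current(pc: int) -> Space:
    """Current map: $0..$7FF cog, $800..$FFF LUT, $1000 and up hub."""
    if pc < 0x800:
        return Space.COG
    if pc < 0x1000:
        return Space.LUT
    return Space.HUB


def decode_proposed(pc: int) -> Space:
    """Proposed map: cog/LUT-exec only at long-aligned PCs below $1000;
    any non-long-aligned PC below $1000 falls through to hub-exec."""
    if pc < 0x1000 and pc & 0b11 == 0:
        return Space.COG if pc < 0x800 else Space.LUT
    return Space.HUB


for pc in (0x000, 0x001, 0x7FC, 0x802, 0x804, 0xFFF, 0x1000):
    print(f"${pc:04X}: {decode_current(pc).value:9} -> {decode_proposed(pc).value}")
```

Under this rule the 3072 non-aligned byte addresses below $1000 become reachable by hub-exec, while each of the 1024 long-aligned addresses there keeps its cog or LUT meaning.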

In other words, since you appear to be supporting 20-bit addressing (1MB), put cog and lut execution above the hub address range. Then there is absolutely no overlap in the addressing and you still get full coverage. Data addressing is not an issue anyhow, since you use different instructions for each type of memory.

Yes, it means our cog code would have to have "ORG $80000", but I don't see an issue with that. Now that all of the jumps are either long jumps from a register value or relative jumps, it doesn't really matter where the cog (and LUT) address range is actually located.
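A rough sketch of this alternative, assuming 512KB of hub RAM at $00000..$7FFFF, cog code at $80000 as suggested, and 2KB (512 longs) each for cog and LUT. Placing the LUT range immediately after the cog range at $80800 is an assumption made here; the suggestion doesn't specify it. All names are hypothetical.

```python
HUB_SIZE = 0x80000  # 512KB of hub RAM: $00000..$7FFFF
COG_BASE = 0x80000  # cog code would use "ORG $80000"
LUT_BASE = 0x80800  # hypothetical: LUT placed right after the cog range
REGION = 0x800      # 512 longs = 2KB each for cog and LUT


def decode_high_cog(pc: int) -> str:
    """Classify a 20-bit PC under the 'cog/LUT above hub' suggestion."""
    if not 0 <= pc <= 0xFFFFF:
        raise ValueError("PC must fit in 20 bits")
    if pc < HUB_SIZE:
        return "hub"  # every hub address, aligned or not, is hub-exec
    if COG_BASE <= pc < COG_BASE + REGION:
        return "cog"
    if LUT_BASE <= pc < LUT_BASE + REGION:
        return "lut"
    return "unmapped"
```

No address ever has two interpretations in this layout; the cost is the large ORG constant in cog code.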

Ahh. Now, the FPGA argument I agree with. But, only just. It's going to have to be a pretty beefy FPGA to provide 1MB of hub ram plus ram for the cogs and LUT (and internal registers). Seems like the tail wagging the dog.

(Note: I suspect the more economical approach will be to use external DDR and map it in, one page at a time, via an additional address range above $7FFFF.)

Additional note: I'm not attached to the suggestion, by the way. Just throwing it out there as an alternative suggestion in case Chip is determined to make it work one way or another. The instruction address space from $80000 is not used by the P2, even though it has instructions that support it. How about this instead:

COG addresses stay simple, HUB code starts at $1000, done. Large address constants for COG code are goofy far more often than the occasional non-aligned code is, and the latter can be managed by software too, if we want.

Now that I think about it, the booter, monitor, crypt, etc. can go in the non-aligned space, leaving some room for user debug or dev code and/or hooks that integrate nicely with those and with the dev system.

Call this the system area, with the expectation that it gets used that way.

And if it is really needed, it can be used anyway, unlike the hot ROM.

It just seems rather ugly to have two address spaces overlaid on top of each other with a slight offset. What is this? Parallel universes? :-)