"There is a fifth dimension, beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition, and it lies between the pit of man's fears and the summit of his knowledge. This is the dimension of imagination. It is an area which we call the COG Shadow Zone."

Pit of man's fear. It leans in that direction. It is a raspberry-seed-in-your-wisdom-tooth kind of situation.

Okay...

This is ugly as all get-out, but here's what would work very nicely (consider that this allows for a 1K x 32 LUT):

Nobody would notice these funny %01, %10 and %11 LSB's in cog/LUT addresses because they would be contained in symbols, with their LSB's established by the particular ORGCOG/ORGLUT/ORGLUT2 directive used before their declaration.

I think someone suggested something like this before.

Increasing the LUTs to 1K x 32 would only take about 1.2 mm² of die area. If we couldn't fit it into this device, it could certainly go into a future smaller-geometry chip. We could implement it on the FPGA, in any case.

This would give 1,528 internal instructions per cog.
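
For anyone checking the arithmetic, here is one way to arrive at that figure. It is only a sketch, and it assumes the count is the 512 cog longs minus the 8 special registers ($1F8..$1FF), plus the proposed 1K LUT. The names are made up for illustration.

```python
COG_LONGS = 512          # cog register RAM, in longs
SPECIAL_REGISTERS = 8    # $1F8..$1FF, assumed not usable for code
LUT_LONGS = 1024         # the proposed 1K x 32 LUT


def internal_instruction_slots(cog_longs=COG_LONGS,
                               special=SPECIAL_REGISTERS,
                               lut_longs=LUT_LONGS):
    """Longs available for instructions inside one cog (cog RAM plus LUT)."""
    return (cog_longs - special) + lut_longs


print(internal_instruction_slots())
```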

P.S. It was Seairth who had proposed something like this on the prior page.

But, why do we have to have hub-exec able to run from non-long aligned code???
Seems to me that we have the cart before the horse and that is complicating the PC counter.

Why couldn't we just address all instructions on long boundaries and save the 2 bits (and their complications for the masses to understand)?
The PC would contain an extra 2 (hidden) bits (that could be extended in future P2's) to designate COG/LUT/HUB.
The jump/call/return instructions would still contain these 2 bits, but the compiler would insert these depending on whether the address was in COG/LUT/HUB.

But simplifying even further, there should be no reason to differentiate the COG/LUT, so we can have seamless instruction addresses from COG $000-$3FF (or $5FF), ignoring the special register gap. The compiler will just insert these 2 address bits.

So, in reality, the PC would be the same as you have now, just that it would increment by 4, and the last 2 bits would be defined as you have suggested here but would be hidden from the user (except in the case of actual hand assembly).

So, I am just saying, hide these 2 bits from the user. Hope I have made this clear enough.

BTW We can live with the extra 2 bits being the address for simplifying your pnut compiler.

The trouble with getting rid of the two LSB's of the PC and insisting that hub-exec be long-aligned is that we lose the address matching between... wait a minute. I understand what you are saying now. While we need the full address range, we don't need to track the bottom two LSB's of the PC if we are always long-aligned. I get it. We still need 20-bit addressing for bytes/words/longs.

Currently, there is only one long-aligned rule for hub memory: When RDFAST/WRFAST wrap around to repeat a block, you must use long-aligned addresses or you will get some errant data in the partial longs during the wrap. There's no way around this. That's the only time when you really need to think about long alignment: on block-wrapping fast reads and writes.

Hub-exec doesn't care about the alignment. This saves memory when placing odd-size strings or data in-between PASM code. By forcing long alignment for all instructions, we could save on some adder chains, and maybe 20 flops per cog. It would introduce a caveat that hub-exec instructions must be long-aligned. I don't know if it's worth it. The one good thing I can see is that it would clear the air on cog-exec and lut-exec, and erase ambiguities surrounding the execution-address LSB's (that only exist in people's minds).

What do you guys think??

P.S. We can't have seamless cog-to-LUT execution because of the special I/O registers from $1F8..$1FF.

Wait! We CAN have seamless cog-to-LUT execution IF we move the special-function registers down to the start of cog memory (ie registers $000..$007). Then, the PC could flow right into LUT space, making a seamless 1k instruction space.

Since cog-exec code doesn't necessarily need to load starting at $000, anymore, we can and will be putting it everywhere.

What do you guys think about that? It means ORG'ing your cog code at $008 (x4). ORG(COG) could automatically do that, if no operand was given.

P.S. This would mean that you could load your interrupt vectors as part of your code, without having to make discrete writes to $1F0..$1F5.

Sounds worthwhile. Anything simple that the tools can do, and check, is good.

Does the overflow happen (largely?) without caveats for most code?
I.e., could users simply write code, and the expanded code just runs?

What happens when they then go over the top of LUT?
Does that flow into HUB?

The only caveat to cog/LUT code would be that the portion of code that exists in LUT could not be self-modifying in the normal sense, because it is out-of-range of the D-specified registers. You would have to use RDLUT/WRLUT or SETQ2+RDLONG to access it. If you hit the top address of $3FF (x4), the assembler would error out because you just crossed a boundary that needs to be handled with distinct intent.
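
To make that caveat concrete, here is a rough sketch of the checks an assembler could make. It assumes the layout discussed above (special registers at $000..$007, cog code up to $1FF, LUT from $200 to $3FF, all as long indexes). The function names are hypothetical, not anything in PNut.

```python
SPECIAL_TOP = 0x007   # special-function registers moved down to $000..$007
COG_TOP = 0x1FF       # last cog register, the limit of a 9-bit D/S field
LUT_TOP = 0x3FF       # last long of the combined 1K cog+LUT space


def region(index):
    """Name the region that a long index falls in."""
    if not 0 <= index <= LUT_TOP:
        raise ValueError(f"${index:03X} is outside cog/LUT space")
    if index <= SPECIAL_TOP:
        return "special"
    if index <= COG_TOP:
        return "cog"
    return "lut"


def reachable_by_d_field(index):
    """True if an instruction's 9-bit D field can name this long directly."""
    return 0 <= index <= COG_TOP


def check_code_fits(start, length):
    """Error out if code would run past the top of LUT at $3FF."""
    last = start + length - 1
    if last > LUT_TOP:
        raise ValueError(f"code ends at ${last:03X}, past the top of LUT")
    return region(start), region(last)


print(check_code_fits(0x008, 1016))      # code fills the whole space
print(reachable_by_d_field(0x200))       # LUT code needs RDLUT/WRLUT instead
```

Code that lands where `reachable_by_d_field` is false is the part that can't be self-modified in the normal sense.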

Too many options! I'm not saying that you shouldn't change it, but can you wait (or settle on one) until after you get an initial version of the FPGA image released? That way, we can start testing while you continue to tweak.

>move the special-function registers down to the start of cog memory (ie registers $000..$007).

Sounds OK, as on the Prop1 we never got an option for the special registers to be included in the 512-long transfer.
You would only use org 8 (if it is possible to fill in at an offset?) if you don't want to clear or set the first 8 longs.

org 0
long 0 ' default is to clear the special registers but can be pre-set.
long 0
long 0
long 0
long 0
long 0
long 0
long 0
org(cog) 'optional only needed if you forget to write the above 8 longs or wanted them non-initialized
code goes here

In both cases the ASM can give clear error messages, so it sounds well worth doing.
It is also not a bad thing to have the INIT constants first in code, as it forces users to do the housekeeping, and means the smallest loadable program is one compact piece.

I presume those 0..7 declared as constants will load into the registers?
What is then the smallest executable visible program? Init... then INC Port & Loop?

I would prefer a segment approach so users can comment out any line, and not break things.
Things like SEGREG and SEGCOG would encapsulate the ORG and also tell the assembler whether ASM code was legal or not.

I was about to suggest having LUT first $000-1FF/3FF followed by COG Registers.

This way the address space is contiguous.
The COG Registers are at the top of COG/LUT $3F8..3FF(or $5F8..5FF)
The register space still works for all instructions because the D & S results are only 9 bits, so effectively registers are still addressed as $000..$1FF (the compiler takes care of this)
We can still use COG Registers as lookups based on $000+offset
Self-modifying code only works in the register space (ie code above $1FF/$3FF) - easy to do an ORG $200/$400 (The compiler can catch errors here)
We can still use FIT to check the boundaries (registers & special registers)
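
As a sketch of what "the compiler takes care of this" might look like under this LUT-first layout: since D and S are 9 bits, the register number is just the low 9 bits of the address. The function name and the `lut_longs` parameter are made up for illustration, and the code assumes the cog registers sit directly above a 512-long or 1K-long LUT.

```python
REG_FIELD_MASK = 0x1FF   # D and S fields are 9 bits wide
COG_REGISTERS = 0x200    # 512 cog registers


def register_number(addr, lut_longs=0x200):
    """Map an address in the LUT-first space to the 9-bit register number."""
    cog_base = lut_longs                      # registers start right after LUT
    if not cog_base <= addr < cog_base + COG_REGISTERS:
        raise ValueError(f"${addr:03X} is not in cog register space")
    return addr & REG_FIELD_MASK


print(hex(register_number(0x3F8)))                    # 512-long LUT
print(hex(register_number(0x5F8, lut_longs=0x400)))   # 1K-long LUT
```

Either way the special registers still come out as $1F8..$1FF in the instruction fields.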

For a later revision of P2, perhaps the instructions could be modified (by global cog switch?) such that either S or D when read from cog (clock2) could read one of S or D from LUT by fetching/using 11 bits from the S or D address rather than 9 bits. LUT space could even be expanded further in a later revision.

Chip,
Assuming you moved special regs and made it so you could seamlessly go from cog to LUT with execution... what's stopping you from seamlessly going to hubexec from LUT? Isn't it essentially the same thing? You'd, of course, have the stall for the fifo to fill up to begin execution from hub, but that's the same as if you branched to hub, right?

I guess you could just put a branch in the last instruction slot of the LUT to go into hubexec.

Also, I still really strongly prefer not having the weird handling of the first 4k of hub. Just start hubexec at $1000. The ROM can still load starting at $0000, just that the entry point of the ROM image is at $1000 in...

I got the special registers moved to the bottom of cog RAM. It makes code with interrupts a lot better. Since the interrupt vectors are located right after the special registers, they can conveniently precede code now.

There is one huge headache with all this, which was latent, all along: It's that this addressing scheme needs you to shift everything up by two bits to go from cog register address to actual assembler address. And then you must divide by 4 to get offset counts.

See all that divide-by-4 and multiply-by-4 stuff in the first several lines? It's tricky. My problem at a few different points in getting the new register locations to work was getting all this address stuff straight. It's too treacherous. This needs to be simplified, somehow, so that all that div/shl math goes away. It's very fatiguing to deal with, as it must be perfect before anything works right. It's just too complicated.

In looking at that code, I realized that some of the math could go away by using labels. This is much more tolerable, but still not a cake walk:

I think we just have to live with this register<<2 addressing. There seems to be no way out, given the greater hub context that must be regarded for other code and data.
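
For reference, the register<<2 bookkeeping boils down to something like the sketch below. The helper names are invented for illustration; the only facts taken from the thread are the shift by two bits and the divide by 4 for offset counts.

```python
def reg_to_addr(reg):
    """Cog/LUT register (long index) to assembler byte address."""
    if not 0 <= reg <= 0x3FF:
        raise ValueError(f"register ${reg:03X} is out of cog/LUT range")
    return reg << 2


def addr_to_reg(addr):
    """Assembler byte address back to a register (long index)."""
    if addr & 3:
        raise ValueError(f"${addr:05X} is not long-aligned")
    return addr >> 2


def long_offset(from_addr, to_addr):
    """Number of longs between two long-aligned byte addresses."""
    return addr_to_reg(to_addr) - addr_to_reg(from_addr)


print(hex(reg_to_addr(0x008)))     # code start after the special registers
print(long_offset(0x020, 0x040))   # offset count in longs
```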

There will always be some of that, but labels make sense, and a SEGCOG or similar could swallow the ORG 8<<2 - the tools should be able to help the users here, and catch any concentration lapses with error messages.

Chip,
I think you feel the need to do the x<<2 thing for addresses in cog space because you are used to the P1 method where cog memory was addressed in longs (which was the "odd" way of doing it). Just use the byte addresses for everything and eventually you'll get used to using them.

I don't think it would be a good idea to have div-by-4 happen automatically for #, because it creates a discontinuity between behavior from immediate values and register contents.

Yeah, maybe instead of # doing it, we could have another symbol that means "immediate with div by 4". Maybe ##?

I think labels should be the byte address, not the long address, otherwise we'll have oddity between labels in cog code vs hub code, and labels should be able to mark data that can be unaligned, so doing #label should require a div 4 in cog space. Right?
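
Here is a small sketch of that idea, with labels held as byte addresses and the divide made explicit. The `divide_by_4` flag stands in for the suggested "##" form, which is only a proposal in this thread, so treat the whole thing as hypothetical.

```python
def immediate_value(label_addr, divide_by_4=False):
    """Resolve a label used as an immediate operand.

    Plain '#label' passes the byte address through unchanged. The
    divide-by-4 form yields a register (long) index and only makes
    sense for long-aligned labels.
    """
    if not divide_by_4:
        return label_addr
    if label_addr % 4:
        raise ValueError(f"label at ${label_addr:05X} is not long-aligned")
    return label_addr // 4


print(hex(immediate_value(0x040)))                    # byte address
print(hex(immediate_value(0x040, divide_by_4=True)))  # register index
```

An unaligned label (say, a string in hub) still works with the plain form, and gets rejected only when you ask for the divided form.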

Also, when you have the orgh $0001, that means the code starts at 1 byte into hub ram right? So your IP starts at 1 instead of 0. I still really hate that.
I'd much rather just have hub exec space start at $01000.

The only difference would be having orgh $01000 in front, and making the IP start at $01000 instead of $00001. You can have the booter either read in the ROM starting at $01000, or have it start at 0, and just have the entry point in the ROM be at offset $01000. Seriously, it's better than having non-aligned memory addresses below $1000 = hub exec, and aligned ones = cog/lut exec. This is just ugh... seriously, people are going to see that and scratch their heads going "WTF? What kind of kluge mess is that?"

I'll change the hub-exec rule to $01000+.
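
A sketch of the two rules side by side, as I read them from the posts above. Function names are made up, and ignoring the two LSB's below $01000 under the new rule is my assumption, not something stated here.

```python
HUB_EXEC_START = 0x01000   # hub-exec begins here under the new rule


def exec_mode_new(byte_addr):
    """New rule: $01000 and up is hub-exec, below that is cog/LUT."""
    if byte_addr >= HUB_EXEC_START:
        return "hub"
    reg = byte_addr >> 2               # assumption: low two bits ignored
    return "cog" if reg <= 0x1FF else "lut"


def exec_mode_old(byte_addr):
    """Old rule: below $1000, unaligned means hub-exec, aligned means cog/LUT."""
    if byte_addr >= HUB_EXEC_START or byte_addr & 3:
        return "hub"
    reg = byte_addr >> 2
    return "cog" if reg <= 0x1FF else "lut"


for addr in (0x00001, 0x00020, 0x00800, 0x01000):
    print(f"${addr:05X}: old={exec_mode_old(addr)} new={exec_mode_new(addr)}")
```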

I made the ORG (for cog/lut) use register (long index) addresses, not byte addresses. That should probably be changed to byte addresses, right?

Wouldn't all/most of that addressing bit shift stuff go away if you made hub instructions long-aligned and starting at byte $1000? In this case, all instruction addressing is in terms of longs, not bytes:

Cog: $000-$1FF
LUT: $200-$3FF
Hub: $00400-$1FFFF
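
In sketch form, decoding a long-based instruction address under this map could look like the following. The function name is hypothetical, and no upper limit is checked for hub since that depends on how much hub RAM there is.

```python
def decode(long_addr):
    """Split a long-based instruction address into (region, location).

    For cog and LUT the location is a long index within that memory.
    For hub the location is the byte address, i.e. the long address x4.
    """
    if long_addr < 0:
        raise ValueError("address must be non-negative")
    if long_addr <= 0x1FF:
        return ("cog", long_addr)
    if long_addr <= 0x3FF:
        return ("lut", long_addr - 0x200)
    return ("hub", long_addr << 2)


print(decode(0x1FF))
print(decode(0x200))
print(decode(0x400))   # first hub instruction sits at byte $1000
```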

This would have some other advantages as well:

* If relative addresses were byte-oriented, this gives relative addresses a greater range. If relative addresses were long-oriented, then it is now more consistent.

* The 20-bit address would now cover 4x as much instruction space. Of course, that won't do much for the P2, but a few people have expressed extending memory on an FPGA. And then there is the P3.

Also, I will make a plug for one variation on the above scheme:

Instruction Addressing that is local to a cog is in the form %0xxx_xxxxxxxx_xxxxxxxx. Instruction Addressing that is global to all cogs is in the form %1xxx_xxxxxxxx_xxxxxxxx. This would make the current P2 implementation look like:

Cog: $000-$1FF
LUT: $200-$3FF
Hub: $80000-9FFFF

This makes the entire hub memory executable. Because addressing is long-aligned, the hub can still be extended to $FFFFF (an additional 384K instructions), so you certainly aren't limiting your options. Further, this provides additional cog-local addressing space, should that ever be desired (e.g. LUT2).
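
A sketch of how the top bit would split the space in this variation. The names are made up; the numbers come from the map above, with bit 19 of the 20-bit long address selecting global (hub) space.

```python
GLOBAL_BIT = 1 << 19    # %1xxx_xxxxxxxx_xxxxxxxx means global (hub)
ADDR_MAX = 0xFFFFF      # 20-bit instruction address


def is_global(addr):
    """True for hub (global) instruction addresses, False for cog-local."""
    if not 0 <= addr <= ADDR_MAX:
        raise ValueError("address does not fit in 20 bits")
    return bool(addr & GLOBAL_BIT)


def hub_byte_address(addr):
    """Convert a global long address to the hub byte address it names."""
    if not is_global(addr):
        raise ValueError(f"${addr:05X} is cog-local, not hub")
    return (addr & (GLOBAL_BIT - 1)) << 2


print(hex(hub_byte_address(0x80000)))   # first hub long
print(hex(hub_byte_address(0x9FFFF)))   # last long of a 512KB hub
print((ADDR_MAX - 0x9FFFF) // 1024)     # extra instruction space, in K
```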

And this does not affect data addressing, since each memory type has its own instruction set:

Thanks Chip!

I think everything should use byte addressing as far as what you are expected to type into it, and have the assembler convert things that can be safely/cleanly converted automatically (such as the org stuff). I, also, like the idea of using ## or & (as Jac suggests) for cases when we have immediate values that should be shifted. To me this is a lot cleaner and easier to wrangle than having to manage all the /4 and <<2 stuff in your code.