Prop2 FPGA files!!! - Updated 1 January 2018 - Version 31

Comments

What loader protocol is required to load code into a P2? Is it the same as it was with P2-hot? I'd like to make a loader that will run on the Mac and Linux without having to resort to using wine to run PNut.exe.

It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.

I think I almost understand the blinky example, but some things look magic...

What does "orgh 1" do? Why not just orgh ? Does this code start at $1000 (I think so)?

The last two lines with org and res x are hurting my brain...
Does the compiler load anything after "org" into cog before starting?
Or does this only work for "res" reserved space that doesn't need initializing?

That ORGH 1 is there because that's where the loader jumps into your code. It's that non-aligned hub exec below $1000 that people hate. I just haven't changed it yet. I kind of don't want to, because it allows the most efficient use of memory. You could always just put a JMP #$1000 after it and pretend it's not really happening.

That ORG + RES business was just a quick way to get some symbolic cog registers declared. It doesn't generate any code. Each blinking cog will use its own instance of those registers.

The ROM_Booter.spin is what runs on boot. It doesn't handle anything, yet, but serial loading. Then, MainLoader.spin gets downloaded by PNut.exe and it receives all the memory data and JMPs to your app. The last three longs get customized by PNut.exe for the board's RAM size and speed.
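For anyone writing a loader outside PNut.exe, the "last three longs get customized" step could look roughly like the sketch below. This is only an assumption-laden illustration: the function name is made up, the meaning and order of the three values are not given in this thread, and it assumes the longs are stored as 32-bit little-endian words.

```python
import struct

def patch_last_three_longs(image, values):
    """Return a copy of `image` with its final three 32-bit longs replaced.

    `values` is a sequence of three unsigned 32-bit integers. What each one
    means (RAM size, speed, ...) is NOT defined here; they are placeholders.
    """
    if len(values) != 3:
        raise ValueError("exactly three longs are required")
    if len(image) < 12 or len(image) % 4 != 0:
        raise ValueError("image must be a whole number of longs, at least three")
    # Assumption: longs are little-endian in the loader image.
    return bytes(image[:-12]) + struct.pack("<3I", *values)

# Example with a dummy 16-byte image and placeholder values.
patched = patch_last_three_longs(bytes(16), (1, 2, 3))
print(patched.hex())
```

The rest of the image is left untouched, so the same second-stage binary could be reused for boards with different settings.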

Not everyone hates it... :) I think it's a perfectly great idea that makes good use of the addressing scheme.

I really, really, really don't think we should change it. Having that region under $1000, non aligned as it is, makes for a perfect boot code area.

If I were king, I would add a write protect bit and some instruction state type latch to that bit so the region can be loaded from ROM, then kept from being trashed easily to simulate having a real ROM there. This is one pretty great feature P2 "hot" had, and I think we should keep it in this P2.

The beauty of doing it this way is there being no need to compromise on ROM / RAM like we had to in the "hot" chip. In that one, we all wanted to keep the ROM small, because it cost us RAM.

In this one, if we were to add a write protect bit for that region, or maybe the whole 16K region ($4000), debug, dev, monitor, and whatever else could go there. The chip would ship with some stuff, and a binary image could ship with other stuff. We get the choice of default ROM, use it all as RAM, or a newer "rom" and I think people would take great advantage of this facility to offer up tools, utilities, etc...

Say one is done with development and/or just wants to use the RAM as RAM. Simply include whatever you want in your binary image and write over the area, using it as a data, font, or whatever buffer. Ignore the non-aligned code feature and carry on.

Okay, so ROM_Booter.spin is the only thing that is fixed. MainLoader.spin is the second-stage loader and could be different for different loader implementations, right?

It's been a while since I've looked at the new P2 instruction set. What does "@" mean? Is that relative addressing? Also, there are lots of CALLx (CALLA, CALLB, CALLD) instructions. Where do they put their return addresses? Are there still PTRA and PTRB registers? I assume CALL still works like before.

That is all the same, but with one difference. Now, there are only five bits of offset, so you have a +15 to -16 range. Those SUP bits have moved down by one. Now, if the MSB is zero, you access immediate addresses 00 to FF.
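To make the five-bit range concrete, here is a minimal Python sketch that sign-extends a 5-bit field. The function name is hypothetical, and it models only the two's-complement arithmetic, not the actual bit layout of the instruction, which isn't spelled out in this thread.

```python
def sign_extend_5bit(field):
    """Interpret the low five bits of `field` as a two's-complement offset."""
    field &= 0x1F            # keep only the five offset bits
    if field & 0x10:         # bit 4 set means the offset is negative
        return field - 0x20
    return field

# Walk all 32 possible field values and report the extremes.
offsets = [sign_extend_5bit(v) for v in range(32)]
print(min(offsets), max(offsets))
```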

The kluge to make hub exec work offset by 1 at addresses below $1000 is just that... a kluge. It's obviously a hack, and looks really lame to have in hardware as the way things work. Also, it doesn't give any advantage at all, I don't understand why anyone thinks it does?!?!

Having the 16K ROM loaded and then jumping to $0001 instead of $1000 is the difference in practice. I don't see how it uses memory more efficiently at all. There is still 16K of ROM loaded into the first 16K of hub. The only difference is where the entry point is. However, now things work differently when you branch to addresses below $1000, depending on whether your branch target is aligned or not. This applies all the time, and it's odd and makes little sense.

Things should work in straightforward, easy-to-describe ways, not with weird gotcha kluges.

Ummm... Is there any reason that the ROM has to be loaded starting at $0000? Why can't it be loaded starting at $1000 in the first place?

Thanks Chip. So the encoding now looks like this? And the I bit in the instruction determines whether the S field is interpreted as a register or as below.

Here's the thing, though. The assembler will never let you create cog code at non-long-aligned addresses. In other words, you will never be jumping to cog locations that are not long-aligned. If you jump to low addresses that are not long aligned, it can only be hub code.
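Put as a rule of thumb, the alignment argument above can be modeled like this. It is a sketch of the rule as described in this thread only, not of the real hardware decode; the function name and the constant name are made up for illustration.

```python
HUB_EXEC_BASE = 0x1000  # the $1000 boundary discussed in this thread

def branch_target_kind(addr):
    """Classify a branch target as "cog" or "hub" under the rule described above."""
    if addr >= HUB_EXEC_BASE:
        return "hub"
    # Below $1000: assumed that a long-aligned target is cog code,
    # and anything not long-aligned can only be hub code.
    return "cog" if addr % 4 == 0 else "hub"

for target in (0x0001, 0x0004, 0x1000):
    print(hex(target), branch_target_kind(target))
```

Under this model the loader's jump to $0001 lands in hub code, which is the behaviour the ORGH 1 trick relies on.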

Do we also get an FPGA image for the DE0-Nano with the Adapter board?
It was a supported board for the P2-hot, and I and maybe others bought it specially for P2 development and tests.
The pin-assignment files and other necessary files should already exist, and the adapter board would not go to waste.

Not sure how many cogs it can fit, but I was quite happy with the one P2-hot cog. Okay there were 4 hardware tasks then.
Perhaps now 2 cogs of the new simplified P2 will fit, without the cordic.

I hope some others also want to play with the DE0-Nano; otherwise it doesn't make much sense to go to the effort.
I understand that you can't support too many different boards.

Also, I thought there was something special about calling COGID twice in a row. Or was that a P2-hot thing? Or am I just recalling incorrectly?

Nothing special. Could replace the second coginit with mov pin,x

Re your following post about REP:
I agree something nicer would be better. It will be the job of the compiler (not the FPGA image) so perhaps we can wait for this. We get by with Chip's pnut.exe and save complexities for Roy's open compiler later.

I would also like to see CALL -> CALLS since it uses the internal stack.
CALLD is fine for me although CALLR (cog register) would work fine too.

BTW I edited the coginit after you read my code. I think it should be coginit #3,#blink ???

I think the first parameter is:

#0 - #15 : specific cog
#16 : next available cog (WC indicates whether a cog was started or not)
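Going by that guess, the first parameter could be decoded along these lines. This is a sketch of the guess above only; the function name and the returned labels are hypothetical, and the real COGINIT behaviour may differ.

```python
def decode_coginit_target(value):
    """Decode the first COGINIT parameter as guessed in this post.

    Returns ("specific", cog_number) for 0..15 and ("next_available", None)
    for 16. Other values are not described in this thread, so they raise.
    """
    if 0 <= value <= 15:
        return ("specific", value)
    if value == 16:
        return ("next_available", None)
    raise ValueError("value outside the range described in this thread")

print(decode_coginit_target(3))
print(decode_coginit_target(16))
```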

Since the cog memory is no longer automatically cleared/reloaded, I think it would also be possible to have one cog force another cog to "jmp". For instance, an auxiliary cog could run a snippet (instigated by another cog), then call cogstop on itself.

Of course. The #blink is the start address - need more coffee ;)
I wonder how you find out which cog started when using the next available cog?
