Yes, if the PC is between $200 and $3FF, the cog is fetching instructions (at full speed) from LUT instead of COG. As you suggest, you could execute from the LUT and use the COG RAM purely as data registers. Combine that with shared LUT mode, where the paired cog could dynamically swap out executable code, and you end up with some really interesting execution options!
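To make the address split concrete, here is a rough C sketch of how the PC value maps to the memory being executed from. The $200..$3FF range for LUT comes from the post above; treating everything from $400 upward as hub execution is my assumption, and the names `exec_region`, `exec_region_for_pc` and `lut_index_for_pc` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Where a cog fetches its next instruction from (illustrative names). */
typedef enum { EXEC_COG, EXEC_LUT, EXEC_HUB } exec_region;

/* Classify a program counter value.
 * $000..$1FF -> cog register RAM, $200..$3FF -> LUT RAM (per the post).
 * ASSUMPTION: $400 and above is treated as hub execution. */
exec_region exec_region_for_pc(uint32_t pc)
{
    if (pc < 0x200u) return EXEC_COG;
    if (pc < 0x400u) return EXEC_LUT;
    return EXEC_HUB;
}

/* Offset of the instruction within the LUT for a PC in the LUT range. */
uint32_t lut_index_for_pc(uint32_t pc)
{
    assert(exec_region_for_pc(pc) == EXEC_LUT);
    return pc - 0x200u;
}
```

With this mapping, a PC of $200 is the first LUT long and $3FF is the last, so code living entirely in that window leaves all of the COG registers below $200 free for data.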

I wonder how elastic the LUT size is?

If we are wildly optimistic for a moment, and presume the routed device has spare space after the 512K RAM is included, how easy is it to increase the LUT size to the next notch?

If there is any spare space after the 512K RAM is included, I'd rather see it filled up with even more hub RAM. It seems like that would be much easier to do than increasing the LUT size.

I just found a bug in XBYTE. If the next instruction in the pipeline following the _RET_/RET to $1F8..$1FF had an immediate D field, it wouldn't read the LUT byte. Someone had said earlier that they had some funny problem with XBYTE. I imagine this was it. I just discovered this in optimizing the D mux.

100MHz for the next FPGA release is going to be no problem. I'm almost wondering if we could get 120MHz.

All the more reason to truly lock the design. That could have been a subtle bug to find. Moreover, it shows that the P2 will need a lot of testing, which will be much more successful if the design is not a moving target.

Here's a suggestion: create a new Google spreadsheet or document called "P3 ideas" and make it editable. If anyone has a new idea or suggestion for the P2, we are all responsible for redirecting that person to add the idea/suggestion to that document instead. That way, the person's idea is not being shot down just because the P2 itself is locked down, and we capture the idea in a place that will be more discoverable than sifting back through forum posts.

The trouble is, a design freeze has been announced several times already. How do we know that a new one will really stick?

All it takes is Chip saying "add it to the P3 Ideas document" for every suggestion that comes up. And if Chip floats a new idea, it's up to the rest of us to say "add it to the P3 Ideas document" instead of discussing it. And if you are wondering if even this will happen, we shall see. I'm sure that jmg will be testing Chip's resolve as soon as that document is created.

I was not just addressing the matter of new ideas, but the general idea of a design freeze; that includes refraining from fixing things that don't work.

While that statement still holds, the design isn't ready to be frozen.

The feature set should be frozen now, which is where the "ideas for P3" repository can be used as a parking lot.

I fully understand the desire to continue to improve the design while fixing errors. That's where I rely on my managers to say "good enough" and deliver. Without a manager to make that call, Chip needs to exercise that rigour for himself.

I'm not saying that it is ready to ship (not my place). My comments were in response to the statement about design freeze.

As this work isn't happening in response to a contract from a particular customer, Chip has the freedom to fix "everything". If this were a contracted delivery with a declared shipping date, he would have to meet that, even at the expense of leaving things "broken".
I see many comments that seem to miss this important distinction.
Without a declared end date, projects run the risk of never being finished. The cost of such a date is that the end product may contain known problems. You generally can't have both a declared end date and a perfect product.

The errata lists I have seen for some chips are quite large, and the problems often take many revisions to fix. Some of the published workarounds just say "don't use xxx", which is not really a workaround.

I don't recall seeing any errata on the P1, nor am I aware of any bugs. There is a PLL issue if you don't connect all the power and ground pins but I would consider this as a user issue, not a bug.

That being said, the P2 is considerably more complex, and there hasn't really been any concerted testing effort since the P2HOT. There will most likely be bugs. As long as they don't break everything, they will likely just block a specific piece from working. Because the P2 is so flexible, there will be other ways around any such problems.

Even in a worst case where the smart pins didn't work, we can still drive them directly. We have 16 cogs after all. Maybe there would be a few things it wouldn't do, but it wouldn't break the chip.

Many of us just wanted a P1 with more HUB RAM, faster, and more I/O. The P2 kills that hands down.

There hasn't been any formal testing, but people have been testing a few things since P2-Hot. If everyone would re-run what they've done on previous versions it would go a long way toward testing out new FPGA images. I'm hoping that any changes after the next version will be small and incremental. So I would encourage everyone to run whatever they have on each new FPGA image from now on.

Chip did build a test chip containing Smart pins. This should have provided an effective test bed for the analog circuitry, and whatever digital circuitry he included in the test chip.

I feel that the P2 is getting very close. There is light at the end of the tunnel. Hopefully it's not the headlights of a high speed train coming straight at us.

It would be nice if people could restrain themselves from proposing their own pet ideas. There are several things that I would have liked to see in the P2, but I feel it would be reckless to propose them at this point. I think the bit ops are a good example of something that wasn't absolutely necessary for the P2. This ended up consuming a couple of weeks, and caused a bit of reshuffling of the instruction set. There are other ways to do bit operations at the expense of a few extra cycles.
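As a rough illustration of the "other ways" to do bit operations, here is the usual shift-and-mask approach written in C. This is only a sketch of the general technique, not P2 code, and the function names are made up. Each helper builds a one-bit mask with a shift and then applies one logic operation, which is roughly where the few extra cycles go compared with a dedicated bit instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with only `bit` set (bit must be 0..31). */
static uint32_t bit_mask(unsigned bit)
{
    assert(bit < 32u);
    return UINT32_C(1) << bit;
}

/* Set, clear, or toggle one bit: a shift to build the mask, then one logic op. */
uint32_t bit_set(uint32_t value, unsigned bit)    { return value | bit_mask(bit); }
uint32_t bit_clear(uint32_t value, unsigned bit)  { return value & ~bit_mask(bit); }
uint32_t bit_toggle(uint32_t value, unsigned bit) { return value ^ bit_mask(bit); }

/* Test one bit: returns 1 if set, 0 if clear. */
int bit_test(uint32_t value, unsigned bit)
{
    return (value & bit_mask(bit)) != 0u;
}
```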

I am all for having discussion on the P3, but can it wait a few months until the P2 is sent off to the foundry? I don't think this forum is capable of having a completely separate discussion on the P3 without it spilling over into the P2. It would be more productive for everyone to test the P2 than to discuss new features on the P3 right now.

I'm hanging out for a Prop2 loader that works on Linux. PNut.exe plays havoc with the DTR line, preventing the Prop2 from accepting the program download.

Dave Hein's loadp2 program works great on Linux; that's what I've been using. He also posted a p2asm assembler, which I think works well, but I haven't used it as much (I mostly use spin2cpp/fastspin for my P2 development).

I'm glad the loader works for you. I've been doing a little more work on GCC for the P2, and I hope to post an update soon. The loader hasn't changed much, except that I added a -v option to enable a verbose mode. If the -v option isn't specified, the new version disables the prints so that it runs silently.

I've modified p2asm to generate an object file, and I wrote a linker called p2link to produce an executable binary file. The mods to p2asm and p2link are based on the work I did on the Taz C compiler. All the tools are tied together with a bash script called p2gcc. It's actually working out pretty well. I can compile a C program and load it on the P2 board by typing "p2gcc -r -t hello.c". The -r option causes loadp2 to run, and the -t option is passed to loadp2 to run the terminal emulator. Eventually I'll update spinsim and tie that into p2gcc so that programs can be compiled and run on the simulator by typing "p2gcc -sim hello.c".

At 120MHz, it's moving 64KB of data within the hub in 500us. That's with a read-then-write transfer buffer of 32 longs. I can almost double that speed by going to a 256-long buffer, but there won't be that much free space in the interpreter. As it is, we are a little better than half of theoretical full speed with a 32-long buffer.
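For anyone who wants to check the "a little better than half" figure, here is the arithmetic as a small C sketch. It rests on two assumptions of mine: that 64KB means 65,536 bytes, and that "theoretical full speed" means one clock to read each long plus one clock to write it. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Clock cycles that elapse in `micros` microseconds at `clock_hz`. */
uint64_t clocks_in_us(uint64_t clock_hz, uint64_t micros)
{
    assert(clock_hz > 0u);
    return clock_hz * micros / 1000000u;
}

/* ASSUMPTION: an ideal hub-to-hub move costs 2 clocks per long
 * (one to read it, one to write it). */
uint64_t ideal_move_clocks(uint64_t bytes)
{
    return (bytes / 4u) * 2u;
}

/* Ideal clocks as a whole-number percentage of measured clocks (rounded down). */
unsigned efficiency_percent(uint64_t ideal_clocks, uint64_t measured_clocks)
{
    assert(measured_clocks > 0u);
    return (unsigned)(ideal_clocks * 100u / measured_clocks);
}
```

Under those assumptions, 500us at 120MHz is 60,000 clocks, the ideal move of 16,384 longs takes 32,768 clocks, and the ratio comes out at about 54%, which matches the description of the 32-long buffer.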

This thing took me three days to write. Because there are a few levels of performance possible in the Prop2, optimizing general-purpose things like this gets complicated.

Chip, that looks great. However, I wonder if you could get a speed improvement by separating xxxxFILL from xxxxMOVE. Also, because the P2 does unaligned reads and writes, the BYTE and WORD operations could be almost as fast as the LONG operations. It just requires performing 0 to 3 BYTE accesses at the beginning, and then doing LONG accesses after that. This is the code I used for memset() in p2gcc, which is basically the same as BYTEFILL() in Spin.