General questions :
What is the current USB / Smart pin / Code status ? ISTR there were regression issues around USB ?

I have been unable to find the root cause of my USB demo code failure on any FPGA image released after v18. At this time I don't think the USB smart pin implementation is the problem, because the USB traffic looks good on the analyzer. I've checked and re-checked that I've got the syntax correct for all of the P2 instructions that were changed post-v18, but still the only thing I can say with certainty is that the v18 code was stable, and on v19+ it isn't.

I haven't made any progress since my last code posting to the "USB Testing" thread. It's a significant amount of code, and the list of v18-to-v19 instruction changes was fairly long. Maybe some fresh eyes can spot something I've been missing.

What is the current USB / Smart pin / Code status ? ...

This is the one area I'm concerned about.

Right now, I'm getting timing information together for our I/O pad, but this USB matter will need resolution very soon.

More the latter, I think. If the stars align properly, there is repeatable expected behavior. Then if you make a code change, which can be as trivial as adding a NOP, what was previously repeatable now isn't.

About the only thing that looks consistent is that when it breaks, it's in code that uses a timer event, and the code has gone into a spin loop waiting for the event or, less frequently, will spin until timer wrap-around occurs. It's like the timer event mechanism somehow gets pushed out of step just enough to miss the trigger point. But I can't say for certain that it's timer related, because I built a small stand-alone test to exercise two timers in a similar manner, and it works as expected.

More the latter, I think. If the stars align properly, there is repeatable expected behavior. ...

Just to clarify, in this 'stable state', your complete USB test suite works fine ? ie - survives power cycles and resets, and only breaks on a code edit ?
These are 120MHz tests ? - should the code work at 96MHz & 80MHz if a build was done for those ?
I think the 120MHz is 'overclocked', so maybe you 'hit the jackpot' with a SW-HW timing aperture failure ?

Then the edit can be seemingly trivial/unrelated and that can break things...
I've seen MCUs with cache reads vary timing with code alignment, but I don't think the P2 does that.

The other effect could be aperture related, eg where a timer event is seen in one half of an opcode but not the other (or similar) ?
Does P2 trigger on ==, or on >= ?
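For what it's worth, the distinction matters. Here is a minimal sketch (my own illustration, not actual P2 or smartpin logic; all names are hypothetical) of why an equality-only compare can silently miss the trigger point when a code change shifts how often the counter is sampled, while a wrap-aware >= style compare cannot:

```python
# Hypothetical sketch (not real P2 behavior): comparing a free-running
# counter against a target with '==' versus a wrap-aware '>='.

WIDTH = 32
MASK = (1 << WIDTH) - 1

def fires_eq(counter, target):
    # Fires only on exact equality -- fragile if samples skip the target.
    return counter == target

def fires_ge(counter, target):
    # Wrap-aware: treat (counter - target) as a signed 32-bit delta,
    # so 'at or past the target' is detected even across wrap-around.
    delta = (counter - target) & MASK
    return delta < (1 << (WIDTH - 1))

# A code change (e.g. an added NOP) shifts when the counter is sampled.
# Here the counter is sampled every 3 ticks, so it steps 99 -> 102 and
# never equals the target of 100.
target = 100
counter = 0
eq_fired = ge_fired = False
for _ in range(200):
    counter = (counter + 3) & MASK
    eq_fired = eq_fired or fires_eq(counter, target)
    ge_fired = ge_fired or fires_ge(counter, target)

print(eq_fired, ge_fired)
```

In this toy run the '==' event never fires while the '>=' event does, which is the same "pushed out of step just enough to miss the trigger point" symptom described above.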

About the only thing that looks consistent is that when it breaks, it's in code that uses a timer event, ...
Frustrating, indeed.

Is there room/time to toggle a pin while it waits, or clr/set a pin either side of the wait ?

The trouble is, synthesis and integration are increasingly reliant upon everything being exactly-known quantities. It doesn't fly anymore to say, "Here are my setup and hold times, along with my clock delay." Nope. Now they want to simulate your design from a schematic extracted from the actual layout. This is maybe a $50k expense. In the end, they have a "Liberty" file which makes your foreign I/O pin all part of the highly-detailed soup. I suppose this has to do with humans being trusted less and less to provide raw input to the process. Everything must be known, exactly. I think this paranoia comes from the high end, where tens of millions of dollars are at stake and no unknowns are permitted.

So, it's easy to make a chip that uses a foundry's own IP, or some other proven IP, but to introduce your own transistor-level circuitry creates a huge ripple in the process.

If all that was required for this project was submitting a set of Verilog files, it would be so easy. Without good analog pins, though, we'd be missing a lot of functionality.

This is maybe a $50k expense. In the end, they have a "Liberty" file which makes your foreign I/O pin all part of the highly-detailed soup. ...

Since so much is automated, where does the $50k come from ?
You have a SCH that the layout was made from, so do they need to re-create a new SCH from the physical design, to exactly include the routing delays etc ?

Since so much is automated, where does the $50k come from ? ...

Even there, $50k is not that large a cost, as it is a one-off.

It's for extracting a schematic from layout and then simulating it to learn the exact performance. The extraction includes all parasitic capacitances and resistances of wires. The layout-to-schematic extraction is an automated process. They just need to know how to stimulate various nodes to evoke the timing behavior on internal flop nodes and external pins. $50k feels like a lot of money. And if you make a change...

1. You and Beau laid out the ring soup.
2. OnSemi built a test chip that failed due to integration of the CPU logic with the ring soup.
3a. Treehouse (recommended by OnSemi) then re-laid out the ring soup for $$$
3b. OnSemi produced a test chip of the ring soup around a year ago.
4. OnSemi are producing another test chip of the ring soup.
5. You go to join the ring soup and the verilog CPU. Now OnSemi says there is an integration problem!

Time for some serious heated discussions with OnSemi senior management!!!

And raise the issue of integrating their FLASH/EEPROM IP too. They do have a solution - you just haven't been given access to the right people.

My observation is OnSemi is just giving you access to beginners who don't properly understand their own processes and are learning at your expense. Remind them that your process is public, and it's their reputation that is at stake too. If this fails, they must bear a lot of the blame for not being able to explain the process up front.

1) True.
2) That was OpenSilicon that did that work. And, yes, it was an integration problem.
3a) True.
3b) True.
4) True.
5) Yes, but this is not the GDS2-handoff department that only does a design-rule check. This is the design-resource department that does complete chip design.

Nobody is at fault here. It's just the way things work. The people that do the synthesis and integration rely increasingly on complex tools to know that things are correct. Everything must pass through their extrusion die to reach the homogeneity that the soup-processing system demands. It does feel like it's all headed towards government paperwork, though.

It's for extracting a schematic from layout and then simulating it to learn the exact performance. ...

If that's pretty much all automated, yes, it does feel like a lot of money.
Does that include a calibrate pass, where they use the actual silicon to confirm the models ?

What would a smart pin look like if made from onsemi IP (or the other IP you refer to) ?

Would you still be able to squeeze an onsemi fast dac or two + ADC into the area currently reserved for each pin ? I know you've put so much effort into those pin drivers, but still worth asking even at this late stage.

What would a smart pin look like if made from onsemi IP (or the other IP you refer to) ? ...

I don't know about their analog IP, but I'm sure it would cost something. For sure, we could get CMOS I/O pins, probably similar to what the Prop1 has.

Chip,
Are you sure it's not worth producing a P1 with completely their IP, with just Hub Addressing extended to 20 bits (1MB) ???
ie Keep everything else the same: 8 cogs, 32 I/O, I2C external eeprom, etc.
It would be much faster than a P1.

Things needing to be done...
1. IIRC there is a PLL issue that needs resolving, but I presume this is done with P2.
2. Add instruction (HUBOP) to switch the Cog to use 1MB HUB. Default is 64KB so that SPIN Interpreter still works.
3. Option: Change the BOOTER code to load 64KB from external I2C 24C512 EEPROM.
4. Option: Change the BOOTER code to download up to 64KB.
5. Minimal BOOT ROM to load the boot code and Interpreter (booter, runner, Interpreter) into the same hub locations as P1 (hub ~60-64KB).
6. Crystal / RC oscillator circuit needs solving - from P2 ?
7. Option: Hub slot could be 8:1. I have this working but no idea if there is a timing problem. It's a 1-line change. Enable with a HUBOP Instruction ???
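To illustrate item 2 in the list above, here is a rough sketch (purely hypothetical; the class, the mask values, and the HUBOP naming are my own illustration, not real P1/P2 hardware) of a per-cog mode flag that selects between the P1-compatible 64KB hub address wrap and the extended 1MB space, so existing code and the Spin interpreter keep working by default:

```python
# Hypothetical model of a hub-address wrap mode, per the proposal above.
# Default masks addresses to 64KB (P1-compatible); a HUBOP-style call
# switches the cog to the full 20-bit (1MB) space.

HUB_64K = 0x0FFFF    # 16-bit wrap: P1-compatible default
HUB_1MB = 0xFFFFF    # 20-bit wrap: extended mode

class Cog:
    def __init__(self):
        self.hub_mask = HUB_64K          # power-on default: compatible

    def hubop_set_1mb(self):             # the proposed mode-switch HUBOP
        self.hub_mask = HUB_1MB

    def hub_addr(self, addr):
        # Every hub access is masked by the current mode's wrap.
        return addr & self.hub_mask

cog = Cog()
print(hex(cog.hub_addr(0x12345)))        # default mode wraps into 64KB
cog.hubop_set_1mb()
print(hex(cog.hub_addr(0x12345)))        # extended mode keeps all 20 bits
```

The design point this is meant to show: because the default is the 64KB wrap, unmodified P1 binaries see exactly the address space they expect, and only code that issues the new HUBOP ever observes the larger hub.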

This could be tested very quickly by lots of us. It's immediately saleable.

In my above P1x example, I have gone for total backward compatibility. Hardly anything new except hub RAM, so minimal risk. It should at least run at 2X the clock speed, although it's quite possibly much faster. And a possible 2X Hub access (1:8) too.
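As a sanity check on the "2X Hub access (1:8)" claim, the round-robin arithmetic can be sketched like this (a toy calculation under the assumption of a simple rotating hub slot, not a model of real P1 instruction timing):

```python
# With a round-robin hub, a cog's worst-case wait for its slot equals the
# rotation period minus one clock, and the average wait is half the period.
# Halving the period (16 -> 8 clocks) therefore halves both figures.

def waits(period):
    # Wait (in clocks) until the cog's next slot, for each possible
    # phase at which the cog becomes ready to access the hub.
    return [(-phase) % period for phase in range(period)]

for period in (16, 8):
    w = waits(period)
    print(period, max(w), sum(w) / len(w))
```

Under that assumption, going from a 16-clock to an 8-clock rotation cuts the worst-case wait from 15 to 7 clocks and the average from 7.5 to 3.5, which is where the "2X" comes from.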

With a huge increase in Hub RAM, an increase in I/Os is not so much of a problem. Perhaps a nicer QFP IC package with 0.8mm pitch?

Of course it would be better if the EEPROM/FLASH could be internal.

Hopefully good sales would bring a later variant with a few more features such as more I/O and ADC.

Chip,
Are you sure it's not worth producing a P1 with completely their IP, ...

The Prop2 Smartpins are still feature-rich without the analogue portion. It is entirely possible to do a logic-pins-only version of the Prop2. In fact, that was already a suggested route for a smaller/cheaper Prop2 when Chip has the time and money to do variants.

I can see Chip is already pondering this route now, but the investment in the analogue features is mostly complete, so it would be a small crime not to finish the job.

Cluso, have I introduced you to the benefits of max10 yet? Has internal flash...

Have an iCE40UP5K QFN48 board, as does Peter. Has OTP on chip or loads from SPI Flash. ~$6.50

IMHO, P2 is too much, too late!

Unfortunately I can no longer see P2 gaining much traction, if/when it finally arrives.
There has been little verification done on the later images. Many bugs likely still exist.
There are major software tools still to be done.
There are only a few hobbyists left.

Without a lot of external support, much of which has left for greener pastures, P2 won't get the traction to gain many commercial design wins. Such an expensive design process cannot be supported by the education market alone.

That is why I keep suggesting a P1 extension. IMHO it's the best short term strategy to get a financial return to help fund the extra costs of P2 to get it to market.

IMHO, P2 is too much, too late! ...

Too much, too late.

Perhaps. It's amazing how much work has gone into this thing. And I mean EVERYONE's work here.

I'm plagued by doubt, myself. I worry that what people want these days is not necessarily something precise, but just something that lets them connect their Arduino-level creation to the internet. Oh, and with 2-way video capability. That's the modern expectation. And it should be able to play video games that already exist. Better have Bluetooth, too. Even a kid would expect that as a baseline function set. What they want is a computer with lots of "connectivity".

I just want something that would be fun to program and let me play with analog signals, trigonometry, and logarithms without any hiccups or hijacking. Maybe not many people care about the basics, anymore. The game's been upped to something more communal than esoteric. It's all about being a part of the "ecosystem", in every regard. I wish I found all that more interesting.

What to do? We are about to spend $250k, at least, to make this chip, which may have some bugs. Is it worth it? Such questions have gotten old, I know. I wish I had a stronger sense of what to do. I find programming the Prop2 rather tedious, at times, which I think is kind of ominous. It's because lots of optimization is possible. Maybe I'm from the generation that considered the matter of "implementation" to be paramount, while the newer generation assumes that whatever you want to do should just flow effortlessly.

On the other hand, could there be some pent-up, unrecognized desire for things that work precisely and with low latency? Things that let you really tinker? Are there many engineers out there who need FPGA timing exactness, but without the raw fabric - rather, with bigger functional chunks? We could shine there. If we make something really snappy to use, people could learn on it very quickly - but, maybe those same people hold computer-level expectations of what should be possible. I wish I knew.

I just want something that would be fun to program and let me play with analog signals, trigonometry, and logarithms without any hiccups or hijacking. ...

Chip,
IMHO, what you are after is more unusual for your age, and even more so for those younger. A number of us on the forum are older, and we are searching for the easy way to do things that are a joy to do. We have usually been there, done that, and realise that the modern way doesn't give us the enjoyment it does when you have total control of your design. The P1 gives us the enthusiasm and reward because we can do things with total control, do it simply because we can use Spin when parts don't need the speed or precision, and use assembler when something has to be done fast and/or timing specific. Having 8 cores gives us the ability to mix these program block without the usual interrupt problems that plague other micros.

I keep on looking at ARM processors. But every time I do, I look at the instruction set and go shock, horror! Then I look at the registers - way too complex! Need an ARM Degree just to get a LED or PIN to flash/toggle. They say just use C or some other HLL. Now I don't have any control. Goodbye ARM till I look again.

We love the P1 concept, but a lot of us have outgrown its capabilities years ago. The world has moved on. But, I just want a BIG P1.

Unfortunately, the P2 is no longer a Bigger P1. It has evolved into a much more complex design that is never-ending. Sure, we are all to blame for this, and the longer it has taken, the more has been added.

The original expectation for P1 was that you throw a core or two at making a smart peripheral. It's quite easy to do, using the same old instructions. There were a couple of difficulties that could do with a little hardware and/or instruction assistance. But now we have the smart pins, which have become quite complicated to understand.

P2 also has these great mathematical instructions, which added a lot of complexity. Do we need them? I cannot answer that, as I don't know where I would use them. Sure, I can just ignore them, but newbies may not recognise this. The P2 manual needs to be many, many pages. Maybe it will end up being as big and frightening as the ARM manuals.
And who will write these? Remember, the P1 is still lacking in some areas, over 10 years after release.

We all wanted a BIGGER P1. Many have given up and moved on.
IMHO, those left are in two camps...
1. Those who want the P2 whenever it becomes available, and
2. Those who just want a Bigger P1 yesterday.

I am now very firmly in the #2 camp. As you know, I have been here for a couple of years. I am now so desperate I am willing to sacrifice almost everything to just get a P1 with a lot more HUB RAM !!! I am definitely not alone.

There are a number of users that would use a Bigger P1 now. How many commercial users require the current P2, and couldn't use a Bigger P1 instead ???

Remember, P2 requires lots of things before it will be useful for most commercial users. How many year(s) will this take?

A Bigger P1 which is backward compatible would be usable from day one of silicon release.

Two years ago I still thought P2 would be viable. Unfortunately I no longer think that is the case. If the P2 does get done, I hope you prove me wrong.

Chip,
Further to your comments, IMHO the Arduino is irrelevant for commercial users. For a lot of hobbyists, this is still where the action is. But the Arduino removes a lot of hardware and software understanding.

The requirements you perceive seem to be from the hobbyists point of view. You need to be considering the requirements from a commercial view. The hobbyists are not going to financially support the P2 any time soon.

I share your desire to make your programs run faster using all the tricks. But there aren't many of us who think this way, and in most cases it's not a requirement anymore. FWIW, the original Macs had a largish but very tight ROM to operate the basics. It gave the Macintosh an advantage. I am fairly sure the iPhone doesn't have this level of tweaked code.

I realise the next step of $250K++ is a lot to spend, particularly when there are no guarantees it will be bug free. At the risk of repeating myself again, even when the silicon is done, there is so much software and documentation to be done before the P2 can get any worthwhile traction.

Certainly, there are a lot more people playing with, or using, FPGAs. They are quite complex and the tools are cumbersome. I find them fun to play with, but there is a lack of good info for learning the ins and outs. For instance, I have had trouble getting info on simple build options for the number of cogs in P1V. So yes, there is a possibility here. Currently most FPGAs seem way overpriced, though the lower end is changing.

From where I sit, what is missing in the marketplace is a fast micro, simple and easy to use, fun to program, lots of RAM, and deterministic. This requires the multiple core style micro, and that is solved by the P1/P2 concept.

Chip, please continue on with the P2. Many of us are looking forward to using it in the future. I'm sure it will be used in many projects and products. I believe it will be a big success. I thank you for your persistence on the P2, and I look forward to using it soon.