The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

Comments

I'm in favor of adjacent COG comms, provided it does not start another painful set of tradeoffs. COGS already need to be designated based on near PIN mapping.

(Greets all, I'm helping build a startup. Been reading and writing some PASM. The advanced comms is a topic I feel I don't add a lot of value to. I got a fun project just about done!

The comms look great, and I stand a chance of understanding them.)

@Chip: looks like we are really in a place to wrap it up. Can I put in my request for a write only flag for low RAM one more time?

The purpose is to prevent errant code from trashing on chip dev tools. A couple state changes done through COG ID should make unsetting that flag unlikely enough to be worth doing. If this causes any grief, or has more than a small amount of time attached to it, ignore this one.

No, not mutually exclusive.
I'm not sure where Chip is on this, but once you have Dual Port, it becomes easier to overlap portions left/right to get quite large, user-selectable buffers between COGs.
There may be some routing impacts, as this does spread the bus more across the COG, so I'm not sure COG-COG Dual Port is outside the critical timing path.

Of course, I'd prefer that the Smart Pins are polished and gaps closed, and handshakes added to the Streamer, before things like Dual Port are added. Dual Port could even be a compile-time switch: when the final die P&R is done, some elasticity on resources could be useful to pack things in.

It's easy to miss a "not" when reading. We tend to read what we expect (and we all expect good things of the next chip). And apparently we missed your misread, else we would have pounced all over you like a pack of wolves.

Thanks for your comments about the potential package size limitations. I briefly wondered about that when posting and should have explicitly asked.

So the package does sound like a big limiting factor. However, I did find a table on pg. 9 of the following Cirrus Logic .pdf that lists a 100-pin LQFP with a max die size of 10.48x10.48 mm in a 14x14 mm package (bond form 843-5036), at least if I interpreted it correctly. And that's right in the ballpark of what it would take to double HUB RAM. But that doesn't mean such a package is available to Chip or necessarily a good choice. (I do like the 0.5mm lead pitch, though.)

... that lists a 100-pin LQFP with a max die size of 10.48x10.48 mm in a 14x14 mm package (bond form 843-5036), ...

Nice document. I do see many options for 100pin that have smaller maximums too. The other thing to remember is there will need to be room around the edge of the die for the GNDs to reach the underside heat spreader.

Ah, 75x75um for standard bonding pad dimensions.

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

In the 24/16/8 bit mode the group is positioned at the top of PortB.
Currently we have the P63, P62 (Rx, Tx) pins, and pins P61 to P58 will be SPI/EEPROM/SD(?).
This will limit these modes to nbits - 4 usable bits; 8 bit mode will be affected in particular.
Can this group be moved elsewhere, maybe to the top of PortA?
The new 4/2/1 bit modes may require the same move too.

With a range of possible sources, and higher-MHz crystals being smaller these days, being able to divide the Xtal by a modest amount before the PFD is useful. Something like a 4-5 bit Xtal divide and a 6-7 bit PLL divide gives better PLL control, but is still light on logic.
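A rough sketch of what a pre-divide plus PLL divide buys: the snippet below searches for the Xtal divider and PLL divider pair that lands closest to a target frequency. The 5-bit and 7-bit widths are taken from the ranges suggested above. The model `f_out = f_xtal * mult / xdiv`, the function name, and the example frequencies are assumptions for illustration only, not the actual P2 clock design.

```python
def best_dividers(f_xtal_hz, f_target_hz, xdiv_bits=5, pll_bits=7):
    """Return (xdiv, mult, f_out) closest to f_target_hz.

    Assumed model: f_out = f_xtal * mult / xdiv, with xdiv in 1..2**xdiv_bits
    and mult in 1..2**pll_bits. This is a hypothetical PLL arrangement.
    """
    best = None
    for xdiv in range(1, 2 ** xdiv_bits + 1):
        for mult in range(1, 2 ** pll_bits + 1):
            f_out = f_xtal_hz * mult / xdiv
            err = abs(f_out - f_target_hz)
            # Strict "<" keeps the smallest xdiv when two pairs tie.
            if best is None or err < best[0]:
                best = (err, xdiv, mult, f_out)
    return best[1], best[2], best[3]


# Example: a 20 MHz crystal and an arbitrary 148.5 MHz target.
xdiv, mult, f_out = best_dividers(20_000_000, 148_500_000)
print(xdiv, mult, f_out)
```

With no Xtal divide at all (xdiv fixed at 1), the same 20 MHz crystal could only reach multiples of 20 MHz, which is the point of the modest pre-divider.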

I found more info and measurements on CL with Xtals, that may be useful.
The Si5351A PLL has CL choices of 6, 8 and 10pF, and on the board tested, 8pF is +7.8ppm and 10pF is -14.3ppm, which gives approx 11ppm/pF for that crystal, and indicates it's probably a 9pF CL device.
Lower CL are more common, as that needs less Oscillator current.
Murata compact ones (2.0 x 1.6mm) are 24~48MHz in 11 stocked values @ Digikey, and 6pF CL
In 20ppm spec, 8 values from 24~32MHz are stocked, 6pF CL

Both SiLabs and Intel look to spec the CL that results, which makes for easier matching with crystal specs.
The very smallest packages have higher ESR, & a quick scan at Digikey shows the largish 3.2 x 2.5mm package has ESR as low as 40 ohms for 32MHz
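The ppm/pF estimate above can be reproduced with a short calculation. This is a minimal sketch that assumes the frequency error is linear in CL between the two measured points (real crystal pulling is only approximately linear); the function name is made up for the example.

```python
def pulling_sensitivity(cl_a_pf, ppm_a, cl_b_pf, ppm_b):
    """Return (slope in ppm/pF, CL in pF where the error crosses zero).

    Assumes a straight line through the two measured (CL, ppm) points.
    """
    slope = (ppm_b - ppm_a) / (cl_b_pf - cl_a_pf)
    cl_zero = cl_a_pf - ppm_a / slope
    return slope, cl_zero


# Measurements quoted above: 8 pF gives +7.8 ppm, 10 pF gives -14.3 ppm.
slope, cl_zero = pulling_sensitivity(8.0, 7.8, 10.0, -14.3)
print(f"sensitivity: {abs(slope):.2f} ppm/pF, zero error near {cl_zero:.2f} pF")
```

The slope comes out at about 11 ppm/pF and the zero crossing a little under 9 pF, which matches the guess that the crystal is a 9pF CL part.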

My understanding has been that to get 7.5pF loading capacitance, you need one 15pF on each crystal pin, since they are effectively in series, causing the net capacitance to be half of each cap. With this in mind, we actually have 7.5pF and 15pF options in the current design, which are about right. The main target is a 20MHz crystal.

Broadly, yes, that is correct, but the CL values should be verified in a working circuit, as the feedback capacitance of the linear amplifier gets into the mix as well, as do bonding wires etc.
I would specify the actual effective CL on the data sheets, not some internal value that is only part of the story.
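The arithmetic in this exchange can be sketched as follows: two pin capacitors in series, plus a lumped stray term standing in for amplifier feedback capacitance, bonding wires and so on. The stray value used in the example is purely hypothetical; as noted above, the real figure has to be verified in a working circuit.

```python
def effective_load_pf(c1_pf, c2_pf, c_stray_pf=0.0):
    """Effective crystal load capacitance in pF.

    Series combination of the two pin capacitors plus a lumped stray term.
    c_stray_pf is an assumed placeholder, not a measured P2 value.
    """
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + c_stray_pf


print(effective_load_pf(15.0, 15.0))        # 7.5 pF: two 15 pF caps, no stray
print(effective_load_pf(15.0, 15.0, 2.0))   # same caps with a hypothetical 2 pF stray
```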

What range of crystals is P2 designed to support? Is there an ESR max target value yet?

Chip,
There are two guys who are extremely serious about wanting a P1V.
Perhaps they may be willing to fund the licence section to have a P1V placed inside the frame. What are the chances the frame will work in its basic mode for P1V?

Curiosity killed the cat, but I'm still curious about the feasibility of increasing the die size to make room for 1MB. I just reread Chip's musing (however brief, and assuming he wasn't kidding) about resorting to a 17x17mm die for P2-Hot to dissipate a possible 5W of heat. Okay, that sounds big and expensive, but what about a 10x10mm die size? ...

Doh! How come no one corrected me on the not? I was babbling away happily about choosing double the HubRAM.

Regarding the die size, I'd think it's seriously constrained by the QFP100 package size. Any talk of larger for the Prop2-Hot was when targeting a package with closer to 200 pins I'd guess. The Hot design did have a full extra 32 I/O.

The bigger the die gets, the lower the yield goes. To do 1MB, we really need a finer process, which costs a lot more.
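To put a rough number on the yield point, here is a sketch using the simple Poisson defect model, Y = exp(-A * D0), a common first-order approximation. The defect density below is a made-up value for illustration, not a figure for the process being used; the die sizes are the 10x10mm and 17x17mm cases mentioned earlier in the thread.

```python
import math


def poisson_yield(die_w_mm, die_h_mm, defects_per_cm2):
    """Fraction of good dies under a simple Poisson defect model."""
    area_cm2 = (die_w_mm / 10.0) * (die_h_mm / 10.0)
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.5  # hypothetical defect density in defects per cm^2
for side_mm in (10, 17):
    y = poisson_yield(side_mm, side_mm, D0)
    print(f"{side_mm}x{side_mm} mm die: estimated yield {y:.2f}")
```

Whatever the real defect density is, the model shows the same trend: yield falls off exponentially with die area, on top of fewer dies fitting on each wafer.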

I could support the BeMicroCV which has a Cyclone V -A2 device on it, which holds more than the DE0-Nano. I think that board is only $49. It would do one or two cogs and several smart pins, maybe 64KB hub RAM. If you wanted that working, I could start doing compiles for it, along with the others.

I would be extremely pleased to see such a build for the BeMicroCV. Like many of us, I have an awaiting application... Thanks for keeping us in the loop, it is appreciated.

The difference between theory and practice is that, in theory, there is no difference between theory and practice, but in practice, there is.

It seems the main area of improvement could be in the simplification of the development environments.

Given that just about every uC coming onto the market now has internal debugging circuitry, either JTAG or proprietary, and associated debugging 'pods' for around the $30-$50 mark, how will the P2 stack up against them?

The main bonus is the target Cog code packing is not disturbed. If timing is critical then the debugging will have to be incidentally gleaned in real time by sampling what's appearing in HubRAM or physical I/O.

Hand crafted debugging is always the best method anyway.

Single stepping through time critical processes...hmmm. So, there is no way to just freeze all of the cogs and look at the states of all the variables?... hmmm.

If it really is time critical, the general approach is to break after that section, rather than single-step.
Capturing tracking info into memory from inside the time-critical process is also common (it adds a few SysCLKs).
Or, if it is pin-I/O based, another COG can do the capture.
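The capture-to-memory idea can be illustrated with a small ring buffer. This is a conceptual Python sketch only: on the chip it would presumably be a few instructions writing to HubRAM, and the class, field names and sample values here are invented for the example.

```python
from collections import deque


class TraceBuffer:
    """Fixed-depth trace log; the oldest samples are overwritten when full."""

    def __init__(self, depth):
        self.samples = deque(maxlen=depth)

    def capture(self, clock, tag, value):
        # One cheap append inside the time-critical loop, no stopping.
        self.samples.append((clock, tag, value))

    def dump(self):
        # Read out afterwards, once the critical section has finished.
        return list(self.samples)


trace = TraceBuffer(depth=4)
for clock in range(6):
    trace.capture(clock, "loop", clock * 10)
print(trace.dump())  # only the 4 most recent samples remain
```

The cost inside the loop stays small and constant, and the inspection happens after the section that matters has already run at full speed.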

ok... I got it. I am older and I have been between moments of perfect clarity.

Next time the light looks right, I am going to go back and look at a problem again and figure it out.
But... I wouldn't have to be thinking exactly correctly if I knew exactly what was happening at a particular moment in two different cogs... cycling through two different loops: STOP@ClockPlusX.

There could be different ways to manage multiple COGS debug.
With each Debug-stub supporting a PinCell UART, you probably could run multiple PC-Side sessions, and then use some absolute-time-stamp reporting to figure out how to align trace-reports.
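On the PC side, aligning the reports could be as simple as merging the per-COG logs on their timestamps. A minimal sketch, assuming each debug channel delivers records that are already in time order and that all channels share one absolute clock; the record layout `(timestamp, cog, message)` is hypothetical.

```python
import heapq


def align_traces(*per_cog_traces):
    """Merge time-ordered (timestamp, cog, message) lists into one timeline."""
    return list(heapq.merge(*per_cog_traces, key=lambda rec: rec[0]))


# Made-up sample reports from two COGs.
cog0 = [(100, 0, "loop A start"), (180, 0, "loop A end")]
cog1 = [(120, 1, "loop B start"), (150, 1, "pin toggled")]

for timestamp, cog, message in align_traces(cog0, cog1):
    print(timestamp, cog, message)
```

This gives a single interleaved view of what two different COGs were doing around the same clock count, without having to freeze either of them.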

Sounds like another use for the 4 UART device I suggested

I think you could even pair 2 EV-Boards (idling one P2) to connect 4 more Debug Channels to the active P2, and have 2 developers working on 4 Debug Channels each!

By the way, I don't believe in competing. I have always found success when I found a place where there was no competition.
I hate to create losers... and find no joy in "beating" anyone. My father taught me... "never stand in line. If there is a line, go somewhere else."

Chip is right "there" with this design, totally unique, hard to compare... and the answer to a lot of questions.

Does your solution exist in other comparable MCUs?

I agree with your dad. I'm going to tell my kids to stay out of lines, too. I have always avoided them, too, out of some survival sensibility.