Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

I registered all the incoming pin signals, and now I can't get a download failure anymore. This should have been done from day one, but I was worried about adding clock delays. One of the first rules in synchronous design is to register all inputs. I wonder what intermittent flakiness we've all experienced that may now be gone. Not registering those incoming pins was really reckless, in hindsight. Now everything is registered, coming and going, as it should be.

The FPGA pins which make RX and TX are both tristate-able, so they can do open-drain.

Is that true open drain (OD) via the OUT pin, or emulated via the DIR pin?

From what I recall, there is an open-drain pin mode, which means the OUT pin is driven.
In emulated open drain, using tristate (TS), you set OUT low and drive DIR instead, but that changes how the Smart Pin connects, and it inverts the signal.

There are 13 config bits that go directly to the I/O pad and control the high and low drive types (normal, 1.5k, 15k, 150k, 10uA, 100uA, 1mA, and float). You would set the HHH bits to %111 so that the pin floats when driven high, giving you open drain. This circuitry is AFTER the smart pin.
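A rough model of why HHH = %111 gives open drain, treating a shared line as a wired-AND with an external pull-up. Only the %111 "float" code comes from the post above; treating %000 as normal drive, the function names, and ignoring where these fields actually sit among the 13 config bits are my assumptions, so check the real pin-config documentation before relying on any encoding.

```python
# HHH value quoted in the post: the pin floats when driven high.
FLOAT = 0b111
# Assumption: %000 selects normal (strong) drive; the post does not say so.
NORMAL = 0b000


def pin_drive(out_bit, hhh, lll):
    """Return the level a pad drives (1 or 0), or None if it floats.

    The OUT state picks which 3-bit field applies: HHH when high, LLL when low.
    Resistive and current-limited modes are treated as plain drives here.
    """
    mode = hhh if out_bit else lll
    if mode == FLOAT:
        return None
    return out_bit


def bus_level(drives, pullup=True):
    """Resolve a shared line: any driven 0 wins, otherwise the pull-up gives 1.

    Contention between a driven 1 and a driven 0 is not modelled; 0 simply wins.
    """
    driven = [d for d in drives if d is not None]
    if 0 in driven:
        return 0
    if 1 in driven:
        return 1
    return 1 if pullup else None
```

With HHH = %111 and LLL = normal, two pins sharing a line with a pull-up can only pull it low, which is exactly open-drain behaviour.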

Turns out the reason only the Prop123-A9 worked, at first, was because I had changed its reset timer from 50ms down to 3ms, like the actual silicon has. I had forgotten to update the other top-level FPGA board files. This is why tight timing from the PC didn't work on the other boards. So, I'm shortening all FPGA versions to the same 3ms, so that they'll all run faster. The TX signal handling was fine. Just the reset timer was the problem.

IMHO, all this OD stuff, single-pin UART, and multiple P2s are way less important than being able to support (pinout-wise) Quad SPI FLASH and Boot SD (without FLASH).

I2C EEPROM is only important if you want to save more pins, which IMHO is not necessary, at least for the first P2. But this can be done easily, so the possibility is there.

Pullups can be determined easily by just reading the pins. No need to output to them first. A pullup and a driven input pin will give the same result but I don't think that matters. The user just has to be aware that a few pins have special boot requirements/consequences.

I mean, if we allow this quad SPI setup of CS, CLK, D3, D2, D1, D0, then we'd need to drive all those pins on boot-up or the user would have to put pull-ups on what were HOLDn and WPn. It just looks like a sprawling mess to me with lots of ugly contingencies. I DO LIKE SD, though.

All boot sources are important, to those users who deploy them - but I agree, being able to connect QuadSPI parts is important.

Octal-connected parts should also be possible, as they are now becoming more common.
Still missing from P2 is DTR/DDR support.

To simplify pin-strapping boot-choice issues, I have suggested before that the OTP/Config areas of newer Flash be used as a 'Flash-DIP sw'.
Whilst the very cheapest parts do not have this, the next level up costs only a few cents more and adds a Unique-ID and protected areas.

And I mention again that the life-cycle of Parallax products is way longer than usual in the industry, and NEEDING a part from a different supplier just to store the boot configuration seems very unwise to me.

Having pull-ups or pull-downs on pins will certainly still be doable in 10-15 years, but whether Quad-SPI Flash with protected areas will still be on the market then is, hmm, not so sure.

That is why I think serial boot is the most future-proof way: if all else fails, one can bit-bang serial data from any future host.
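To illustrate the fallback argument, here is a sketch of standard 8N1 UART framing (idle high, one start bit, 8 data bits LSB first, one stop bit) that any host could bit-bang. The function names and the tick-schedule idea are illustrative only; this is not the P2 ROM's loader protocol, which is not shown here.

```python
def uart_frame(byte, data_bits=8):
    """Line levels for one 8N1 frame: start bit, data LSB first, stop bit."""
    bits = [0]                                          # start bit (line idles high)
    bits += [(byte >> i) & 1 for i in range(data_bits)]  # data, LSB first
    bits.append(1)                                      # one stop bit
    return bits


def bitbang_schedule(data, baud, clock_hz):
    """List of (tick, level) pairs a host would drive onto TX, one entry per bit."""
    ticks_per_bit = round(clock_hz / baud)
    schedule = []
    tick = 0
    for byte in data:
        for level in uart_frame(byte):
            schedule.append((tick, level))
            tick += ticks_per_bit
    return schedule
```

Nothing here needs special hardware: a host that can toggle one pin with roughly 1/baud timing accuracy can generate it.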

The ability to boot directly from SD is nice, but with the exponential growth in SD card sizes I see FAT32 on its way out, just as FAT16 and FAT12 disappeared. Floppy disks, anyone?

One question comes to mind: the fastest serial auto-baud is around 2 Mbaud, so what is the slowest?

SD card in 4-bit mode is not important. Nobody has done it on the Propeller yet; it is only possible in secure mode, which requires calculating a CRC per data line, and I'm not sure its use is legal without a license.
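For a sense of what "a CRC per data line" involves: in 4-bit data transfers each DAT line carries its own CRC16 (polynomial x^16 + x^12 + x^5 + 1, i.e. 0x1021, initial value 0), computed only over the bits that travel on that line. The mapping below (high nibble first, nibble bit n on DATn) is my reading of the SD spec and should be checked against it; the function names are hypothetical.

```python
CRC16_POLY = 0x1021  # x^16 + x^12 + x^5 + 1


def crc16_bits(bits):
    """Bitwise CRC16 (poly 0x1021, initial value 0) over an MSB-first bit stream."""
    crc = 0
    for bit in bits:
        feedback = ((crc >> 15) & 1) ^ bit
        crc = (crc << 1) & 0xFFFF
        if feedback:
            crc ^= CRC16_POLY
    return crc


def bytes_to_bits(data):
    """Serialise bytes MSB first, as on a single data line."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def dat_line_bits(data):
    """Split a block onto DAT0..DAT3 (assumed: high nibble first, nibble bit n on DATn)."""
    lines = [[], [], [], []]
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            for n in range(4):
                lines[n].append((nibble >> n) & 1)
    return lines


def per_line_crc16(data):
    """Four independent CRC16 values, one per DAT line."""
    return [crc16_bits(bits) for bits in dat_line_bits(data)]
```

In software that means four CRC engines running in parallel on interleaved bits, which is part of why 4-bit mode is awkward without hardware help.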

If you go a page back in the thread from last year, you can see how it's possible to connect Flash and SD card, also with Flash in 3 wire mode.

The reason for Quad SPI Flash access is speed: it is 4 times faster if the hardware supports it. On the P1, quad access is slower than the fastest single SPI because of the software overhead. If the streamer can't be used for Quad on the P2, it may be the same there.

DTR/DDR mode may make it 8 times faster, which the P2 might support at streamer speeds.
The streamer looks to support 1-, 2-, 4- and 8-pin modes.

I've not seen much actual testing reported for the streamer combined with Clk/2, checking Tsu, Thold and other timing margins?

SD card in 4-bit mode is not important. Nobody has done it on the Propeller yet; it is only possible in secure mode, which requires calculating a CRC per data line, and I'm not sure its use is legal without a license.

If you go a page back in the thread from last year, you can see how it's possible to connect Flash and SD card, also with Flash in 3 wire mode.

The reason for Quad SPI Flash access is speed: it is 4 times faster if the hardware supports it. On the P1, quad access is slower than the fastest single SPI because of the software overhead. If the streamer can't be used for Quad on the P2, it may be the same there.

Andy

Yes, I remember the discussion taking place, but I couldn't recall the conclusions.

Thanks for the link. I will re-read the discussion.

I know quad SPI is faster. I just don't know if it's worth tying up three more pins at boot. I mean, how many times will you need to read data over and over from a quad SPI flash? With one data wire, we can read 512KB in 52ms. What difference will 4x faster really make? I hear about pulling in bitmap resources, but what size? 400KB, maybe? You can't fit enough of those in a quad SPI flash to make much of an animation. So, I'm not getting it. And this quad interface does nothing for writing, only reading. I don't mean to be a stick-in-the-mud about this, by the way. I'm just not getting it.
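The 52 ms figure is consistent with one data line clocked at about 80 MHz (my assumption; the post doesn't give the clock). A small calculation, ignoring command and address overhead, shows what quad and quad DTR would change for the same 512KB.

```python
def read_time_ms(nbytes, sck_hz, data_lines=1, ddr=False):
    """Time to stream nbytes from SPI flash, ignoring command/address overhead."""
    bits_per_clock = data_lines * (2 if ddr else 1)  # DTR/DDR moves data on both edges
    clocks = nbytes * 8 / bits_per_clock
    return clocks / sck_hz * 1000


SIZE = 512 * 1024   # 512KB, as in the post
SCK = 80e6          # assumed flash clock

single = read_time_ms(SIZE, SCK)                          # about 52.4 ms
quad = read_time_ms(SIZE, SCK, data_lines=4)              # about 13.1 ms
quad_dtr = read_time_ms(SIZE, SCK, data_lines=4, ddr=True)  # about 6.6 ms
```

For a one-time boot load the saving is tens of milliseconds; the difference only really matters for repeated streaming such as audio samples or executing from flash.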

SD Card

This requires 4 pins because, as you receive data in from DO, you must also clock out highs on DI. They cannot be combined.
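Why DI and DO can't share a pin: SPI is full duplex, so every clock that samples a bit from DO also presents a bit on DI. As I understand SD SPI mode, a command token starts with a 0 bit, so DI is held high (0xFF) while reading so the card never sees a stray command. The card is simulated below by an iterator of bits; `spi_exchange` and `read_byte` are hypothetical names.

```python
def spi_exchange(out_byte, card_out_bits):
    """One 8-clock exchange, MSB first: drive a bit on DI and sample a bit from DO per clock."""
    di_levels = []
    value = 0
    for i in range(8):
        di_levels.append((out_byte >> (7 - i)) & 1)   # bit presented on DI this clock
        value = (value << 1) | next(card_out_bits)    # bit sampled from DO this clock
    return value, di_levels


def read_byte(card_out_bits):
    """Receive one byte from the card while clocking out all highs (0xFF) on DI."""
    return spi_exchange(0xFF, card_out_bits)
```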

I am using the 4pin scheme on my boards, and so is Peter AFAIK.

The initialisation sequence must run below 400kHz. It's only a preamble, and I run it at 50kHz.
Currently, at 80MHz, the P1 can achieve 2.85MHz clocking (excluding card delays, etc.). P2 will double this. Looking at the code, it should be possible to increase this slightly.

Using SD 4-bit data mode would increase this 4 times. Then we can also enlist the SmartPins for another increase.
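The clock figures above can be checked with a little arithmetic. This is a sketch only: the helper names are hypothetical, and the data rates ignore command overhead and card delays, as the post does.

```python
SYSCLK_HZ = 80_000_000   # P1 system clock quoted in the post
INIT_LIMIT_HZ = 400_000  # SD initialisation must stay below this


def clocks_per_sck(sysclk_hz, sck_hz):
    """System clocks available per SD clock period."""
    return sysclk_hz / sck_hz


def min_init_divider(sysclk_hz, limit_hz=INIT_LIMIT_HZ):
    """Smallest integer divider giving an SD clock strictly below limit_hz."""
    return sysclk_hz // limit_hz + 1


def bytes_per_second(sck_hz, data_lines=1):
    """Raw data rate, ignoring command overhead and card delays."""
    return sck_hz * data_lines / 8
```

So a 50kHz preamble leaves 1600 system clocks per bit at 80MHz, the quoted 2.85MHz is about 28 clocks per bit, and doubling the clock on P2 plus 4-bit mode would give roughly 2.85MB/s raw.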

Currently the cheapest microSD cards are SanDisk SDHC Class 10 16GB at ~$15 (8GB is no longer available; those were $6). They will run at 10MB/s minimum.
Faster UHS-3 (U-III) microSD cards run at 30MB/s minimum, and there are even faster versions.

The SD Card can have all 6 pins connected without any driving problems (when two pins are unused). I currently leave them floating.

AT25SF041 4Mbit FLASH $0.32/100 Mouser
Read Single/Dual/Quad up to 85MHz
Continuous Read Dual/Quad Reset (ie XIP) up to 108MHz
Both /WP and /HOLD have internal pullups and can be left floating, but tying them to Vcc is preferred.

Maybe it's not fast and big enough for video, but it is for audio, for example. You can keep the waveforms for the instruments in Flash and read them in realtime to play at different frequencies, and not just one voice at a time.
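The audio idea is classic wavetable playback: a phase accumulator steps through a stored single-cycle waveform, and the step size sets the pitch. In this sketch a Python list stands in for the waveform in Flash; the 256-entry table, 32-bit accumulator and function names are all assumptions for illustration.

```python
import math

TABLE_SIZE = 256
# Stand-in for one instrument waveform stored in flash: one sine cycle.
WAVE = [round(127 * math.sin(2 * math.pi * i / TABLE_SIZE)) for i in range(TABLE_SIZE)]


def render_voice(freq_hz, sample_rate, n_samples, table=WAVE):
    """Play a stored single-cycle waveform at freq_hz using a 32-bit phase accumulator."""
    step = round(freq_hz * 2**32 / sample_rate)  # phase increment per output sample
    phase = 0
    out = []
    for _ in range(n_samples):
        index = phase >> 24                  # top 8 bits pick one of 256 entries
        out.append(table[index])             # in hardware: a read from the flash table
        phase = (phase + step) & 0xFFFFFFFF  # wrap like a 32-bit register
    return out


def mix(*voices):
    """Sum several voices sample by sample."""
    return [sum(samples) for samples in zip(*voices)]
```

Each extra voice adds one table read per output sample, which is where sustained flash read bandwidth starts to matter.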

Another application is executing bytecode directly from Flash (XIP). That gives you a really big code memory for one cog.

I'm not sure anyone is pushing for initial boot in Quad mode, rather that users can connect a Quad-mode device.
e.g. that's why the SPI boot sends a quad-mode exit command (hopefully that's still there?!)

Initial boot would be 1-wide SPI, and users can optionally change to QuadSPI (DTR) if they want, given they know the Flash part info.

The PCB pins need to be Quad-mode compliant, which I think means light pullups on the HOLD# and WP# pins.
So they are not really lost pins if someone wants single SPI only; they are just defined as light pullups during boot.

I would say users will expect Quad SPI and Quad SPI DTR to be possible. If you tell them the boot decisions excluded those choices, their reaction is easy to predict...

So here are two ways the bootloader can use single SPI while you can still switch to Quad mode in your own application:
The first requires that you send the commands and addresses in bit-reversed order:
The second doubles the D0 pin, so you lose one pin, but you get the right bit order:

Neither really shines:
- having to reverse bits is a kludge that will annoy many, and losing another pin may break streamer alignment and octal support?

It will be important to have the streamer able to use this, in x4 and x8 settings.
Not sure what constraints that imposes on pin allocation, but I'd work back from there...

I can see why Chip does not like the Quad connection of the Flash. Besides needing more pins, it also leaves a gap in the pin order if you only want single SPI, which is mostly the case.

So here are two ways the bootloader can use single SPI while you can still switch to Quad mode in your own application:

The first requires that you send the commands and addresses in bit-reversed order:

The second doubles the D0 pin, so you lose one pin, but you get the right bit order:

If you don't want quad mode, you can replace all the resistors with wires. And no change in the current bootcode required.

Yes. If we are going to support quad SPI, we might as well mandate it.

I like your alternatives for hooking up quad SPI while still maintaining a tidy 3-pin situation at boot. Having to bit-reverse commands is really no big deal.
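Pre-reversing the command and address bytes is indeed cheap. This sketch assumes the bytes are reversed one at a time and that the part takes a 3-byte address, MSB byte first; 0x03 is the common SPI flash READ command, but check the specific part's datasheet. The function names are hypothetical.

```python
def reverse_bits(value, width=8):
    """Mirror the bit order of a width-bit value (bit 0 <-> bit width-1)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reversed_command(cmd, address, addr_bytes=3):
    """Command byte plus address bytes, each bit-reversed, in transmit order."""
    out = [reverse_bits(cmd)]
    for shift in range(8 * (addr_bytes - 1), -1, -8):
        out.append(reverse_bits((address >> shift) & 0xFF))
    return out
```

Since the command set is fixed, the reversed values can simply be constants in the application, so the runtime cost is essentially zero.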

Using pin 58 to add SD card support, in lieu of SPI flash presence, would be fine, I think. We would need that one more pin for data.

In a case where boot was from a 3-pin flash, a removable SD card could be supported on nearby pins, perhaps sharing the CLK and DI/DO pins. For boot, though, it would be an either/or situation.

Can you add a map for OctaFlash?
Does the streamer allow nibble-aligned byte loads? If so, I think it is just 4 more pins.

That OctaFlash you linked to doesn't seem to be available through the usual channels. Hence no price either.
There is a DQS pin but I cannot find what it does, nor if it's an input or output.
It is a big IC! Then again, 64MB is big too!

IMHO it would just map to the next 4 lower pins P55..P52 for D7..D4.
IIRC we can swap nibbles around in one instruction.
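Why the nibble swap comes up: if D7..D4 sit on P55..P52 (from the post) and D3..D0 stay on P59..P56 (my assumption about the existing quad pins), an 8-bit read starting at P52 arrives with its nibbles exchanged. A sketch using a 64-bit integer as a snapshot of P0..P63; the names are hypothetical.

```python
OCTAL_BASE_PIN = 52  # lowest pin of the assumed 8-pin group P59..P52


def read_octal_byte(pins):
    """Recover D7..D0 from a 64-bit pin snapshot (bit n = Pn)."""
    raw = (pins >> OCTAL_BASE_PIN) & 0xFF       # bits 7..4 = D3..D0, bits 3..0 = D7..D4
    return ((raw & 0x0F) << 4) | (raw >> 4)     # swap nibbles back into D7..D0 order


def pins_for_byte(value):
    """Inverse mapping: the pin snapshot a flash would present for one byte."""
    return ((value >> 4) & 0x0F) << 52 | (value & 0x0F) << 56
```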

I have no idea about the streamer.

BTW it's funny... we are going full circle, from 1-bit serial back to 8-bit parallel. We just need ALE and we'll be there.

That OctaFlash you linked to doesn't seem to be available through the usual channels. Hence no price either.
There is a DQS pin but I cannot find what it does, nor if it's an input or output.

I think that's similar to the HyperFLASH, where that signal is a clock-back, to delay-compensate and so allow them to push up the clock speeds.
At more modest P2 type speeds, it may not be required.
Less clear from the data sheet is whether DQS ever has missing pulses, e.g. as wait states for reads across boundaries.
If it does, either DQS needs to be used, or the 'cache sw' needs to align its reads to those boundaries.