Escape Hatch Labs - Ian Hanschen's random tinkerings

Ever since I saw a YDPG18A mod post where flash was upgraded (unsuccessfully), I've been curious about upgrading NAND flash. After I started investigating I realized the upgrade mentioned wouldn't work, so I called the guy on it and he confirmed it hadn't worked; he just hadn't gotten around to posting about it. If he'd found the exact same model flash chip as the existing one, soldered it on, and reflashed, it would have worked. But a lot of the flash chips used in these devices are old and hard to get ahold of. Luckily, they use a standard footprint (TSOP48) and a standard pinout used for SLC and MLC NAND flash.

When a system is flashed using LiveSuit, LiveSuit first sends over some modules located within the flash image file to load into system memory to assist with the flashing. One of these modules partitions and formats the NAND flash - then individual blobs of data are sent for each partition. The thing about these flash chips is that even though they often come in a TSOP48 package, they all have different model numbers, which can be queried from the chip. They also have different geometry - number of dies in one chip, number of sectors, number of blocks, and page size. The driver only supports the same model of flash for every flash chip the board is populated with - otherwise it will ignore everything but the first chip.

AllWinner has a generic NAND flash driver which is compiled into Android, the bootloader's sprite.axf and U-Boot (on 4.x devices), the generic 'boot' image that jumps to the bootloader, and the binaries that LiveSuit sends over to assist with flashing the device. This generic driver contains several tables, one for each supported NAND vendor, containing each supported chip ID and the geometry of those chips. You can see the list here - it's a short list. The largest flash chips supported are 16GB, and every one I've looked for so far has been hard to get ahold of.

So what I figured I'd do is get an understanding of how the NAND driver works, add support for a new chip, remove the old NAND and solder in the new chip(s), and flash the device. I wanted to try maxing out the possible storage - the largest TSOP48 flash you can get is 32 gigabytes. So I purchased some flash from Avnet - a couple of the Micron MT29F256G08CJAAAWP (256 gigabit chips, meaning 32 gigabytes each). This chip isn't in the Micron table in the AllWinner NAND driver. Worse, I found out I had to add support to each of the modules I mentioned above individually - even though they use the same NAND driver, source code is not available for them so it's not possible to rebuild them.

Desoldering the old flash was something I had to take great care with. In the Yinlips case, these flash chips are glued to the board with a weak glue, but one strong enough that the chip won't easily lift once you liquefy the solder. So I slid a razor blade under the chip, gently, not pushing the front against the board (which could have cut traces), and only enough to separate the glue on the chip casing from the board a little bit - the razor blade isn't used to cut the pins of the chip from the board...

The choice then is whether to remove the chip with a hot air rework station or with your soldering iron. My hot air rework station was elsewhere and I was in a hurry to try the new flash out. So I applied flux to the pins on either side of the chip, and then melted enough solder onto the pins on each side until I had 2 rows of solder that just covered the space of all of the pins. I started at one side again and kept the solder liquid with the iron until the solder on all of the pads melted and started to give way. Then I gently lifted from underneath the flash with the razor blade, *gently*, until the device separated from the board. When this happened I used a solder sucker to suck up the solder I'd added to that side of the chip. I repeated the process on the other side, and removed the chip. Then I used flux and solder wick to clean up the pads.

To solder both of the new 32 gigabyte chips I used the 'drag' method with a hoof tip, and then went over and cleaned up shorts either by dragging away from the chip onto the leads, or using more solder wick. A video of the drag method for soldering can be found here:

A Micron NAND table entry looks like this in the driver:

The hard part was figuring out the new 'geometry' to use in the table entry. Unfortunately I looked at the datasheets for several supported chips and the values in the tables mostly didn't match the contents of the datasheets. After trial and error, I found out the following - use the page size and the block size specified by the datasheet. Use the chip count (or die count) specified by the datasheet. To figure out the sector count, multiply block * page * die (chip) count, and divide the total flash capacity by this number. So for the flash I wanted to add, it had 4 dies, 256 blocks, a page size of 8192 - yielding 8388608. Multiply this by 4 and you get 33554432 - the 32 gigabyte capacity of the chip expressed in kilobytes (remember 1k is 1024 bytes, not 1000). So I used 4 for the sector value.

I decided to unpack the flash .img, modify the individual binaries, repack, and try to flash the new chips. I used 'imgrepacker' to unpack the flash, and the awesome free hex editor from hexedit.com to modify the files - it can open several files at once and allow you to search across all of them for a pattern.

I chose an existing Micron table entry, since I had Micron flash. Each table entry points to an 'architecture' structure which has the special commands needed to talk to that brand of chip. These are pretty generic for simple flash commands but unique for the advanced ones, so it's important to reuse a table entry that matches the brand of your flash. I also confirmed from the chip datasheet that the commands for the new chip matched the commands in the architecture structure for the brand.
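The sector-count rule of thumb above can be checked with a few lines of Python. This is only a sketch of the arithmetic as described, not anything taken from the driver: the function name is made up, and it assumes the capacity is expressed in units of 1024 bytes, since that is the unit in which 33554432 equals 32 gigabytes.

```python
def sector_count(capacity_kib, dies, blocks, page_size):
    """Sector value per the rule of thumb: capacity / (dies * blocks * page size)."""
    product = dies * blocks * page_size
    if capacity_kib % product != 0:
        # The rule only makes sense when the division comes out even.
        raise ValueError("capacity is not a whole multiple of dies * blocks * page size")
    return capacity_kib // product


# Values quoted in the text for the 32 gigabyte Micron chip.
CAPACITY_KIB = 32 * 1024 * 1024  # 32 gigabytes in kilobytes = 33554432
sectors = sector_count(CAPACITY_KIB, dies=4, blocks=256, page_size=8192)
print(sectors)  # 4
```

If the division does not come out even for another chip, that is a hint that one of the datasheet values is being read in the wrong unit, which fits the observation that the driver's tables don't map cleanly onto the datasheets.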

Revisiting the driver entry, it looked like this:

In hex, that looks like this:

So what I did was to open all of the binaries in the unpacked flash image, and search for this exact pattern. Once I found it, I overwrote the entry with the hex version of the new table entry. If the new entry were in the code it would look like this:

As hex, it looks like this:

I searched for and replaced the block manually (by hitting F3) - you can't do a mass hex-based search and replace in the tool. Once I was done, I saved and closed all of the files, and used imgrepacker to re-pack everything into an .img file. Then I used LiveSuit to flash the device. It flashed! But it didn't work, it kept rebooting! For shame!

So I needed to debug the issue. On the Yinlips boards, there are pads labeled 'rx' and 'tx' - these are used for serial output on the device, to watch the boot process, interact with U-Boot, and as a shell when the device is fully booted. These are 3.3v pads, so you need a 3.3v TTL serial cable. I hooked up device ground, rx, and tx, and used a terminal program to watch the boot process. It turned out that part of the bootloader was doing a CRC on one of the files I needed to modify, and refusing to boot from it if it had changed. The assembly looked like this:

So what I did was to figure out the bytecode modification necessary to disable the CRC check on the file - again, the CRC didn't match what was hardcoded in the code because I changed a table entry in the file. I specifically changed "check_file" in sprite.axf to return 0 (success in this case) even if the CRC didn't match. To do this I searched for the bytecode pattern for the old assembly, in the same way as I did the table entries above, with the unpacked image, and replaced it everywhere I found it (multiple images in the flash file will contain sprite.axf) with the new bytecode. Then I saved and closed all the files, repacked the image, and re-flashed. This time it worked!

Something LiveSuit does, after it creates all the images hardcoded into the partition config file in the flash image, is to create a partition with whatever is left called 'UDISK' - this partition is used for the 'internal sdcard' partition. So on my 4GB device, I went from a 2GB internal sdcard partition to a 58GB one. I did this modification on a YDPG18A and a YDPG16 and both worked fine. I imagine this will work on any Allwinner A10 based device.
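The search-and-replace step, done by hand in the hex editor for both the table entry and the CRC check, could also be scripted. Below is a minimal Python sketch of that idea. The function names are made up, the byte patterns are placeholders rather than the real table entry, and `zlib.crc32` is only a stand-in to show why a checksum stops matching after a patch (the bootloader's actual CRC routine is not known here).

```python
import zlib
from pathlib import Path


def patch_bytes(data, old, new):
    """Replace every occurrence of `old` with `new`; return (patched data, hit count)."""
    if not old:
        raise ValueError("empty search pattern")
    if len(old) != len(new):
        # An in-place binary patch must not shift the bytes that follow it.
        raise ValueError("old and new patterns must be the same length")
    return data.replace(old, new), data.count(old)


def patch_tree(root, old, new):
    """Patch every regular file under `root` containing `old`; return hits per file."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        patched, count = patch_bytes(path.read_bytes(), old, new)
        if count:
            path.write_bytes(patched)
            hits[str(path)] = count
    return hits


# Placeholder patterns for illustration only, NOT the real driver table bytes.
OLD = bytes.fromhex("deadbeef00000001")
NEW = bytes.fromhex("deadbeef00000004")

sample = b"\x00" * 4 + OLD + b"\xff" * 4
patched, count = patch_bytes(sample, OLD, NEW)
print(count)                                      # occurrences replaced
print(len(patched) == len(sample))                # file size is unchanged
print(zlib.crc32(sample) != zlib.crc32(patched))  # but any checksum over it changes
```

Running `patch_tree` on the directory of the unpacked image would touch every file containing the pattern, so it is worth working on a copy and checking the returned hit counts before repacking.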

Yinlips YDPG18A hacks/mods

A while ago I purchased the Yinlips YDPG18A. It's a PSP look-alike from China that runs Android and has a pretty beefy processor. For the cost (~$130) it's nice, but some things about it could be improved.

Specs

Battery

I replaced my battery with the biggest battery I could fit - a 4200 mah battery. It makes the device feel better in my hands since it is a touch heavier.

Heatsink

The device gets warm when playing games. I followed Ashen's advice about adding a heatsink, but I went small. Just a thermal adhesive pad and a copper shim (search ebay for copper shim and you'll find several small ones meant for low end GPUs and such).

Screen

The only differences I can find between the YDPG18 and the YDPG18A, hardware-wise, are that on the YDPG18A the LCD that would normally have built-in resistive touch lacks the resistive touch layer (even though it still brings out the signals to the host), the YDPG18A case has a capacitive digitizer built into it, and there is a connector soldered on the board. Other than that I think it's all in the software. It might be possible to create a ROM that is compatible with both the YDPG18A and the YDPG18.

Replacing the screen if it breaks
I was eager to do this, since I figured the display was shoddy, and part of my display had been affected by poor heat insulation. The KENTEC K50DWN0-V1-F fits the bill. Same size, same signals. The display that came with the device seems like a clone of it. This display will work for both the YDPG18 and YDPG18A - if you have the latter, carefully use a razor blade to remove the resistive digitizer overlay and snip the flex PCB connecting it to the rest of the display. After replacing, the area that was marred by heat is gone, but I can't really tell a difference in quality. Certainly this did not affect brightness - brightness is just a function of the backlight (which is the same) and the backlight driver.

Fixing screen brightness

Standard disclaimer applies. I got 2x brightness by removing the 22 ohm resistor pointed to by the red arrow and turning it into a solder bridge. So far the jury is out as to how much it affects power usage, but this will definitely suck more battery. Update: I swapped out the solder bridge for a 10 ohm 0603 resistor. It's not nearly as bright as the solder bridge but it's better than before.

Before:

After:
Note: the display looks great afterwards, my camera just washed out the colors because it was...so bright.

Washed colors/oily looking screen
As near as I can tell, this is from the oxidized materials that are a part of the multitouch digitizer. Unfortunately nothing can be done about this.

Gamepad

Gamepad does not work without running Game Manager
Yinlips' IOC driver waits for a specific ioctl before enabling the gamepad. Game Manager sends it while a game is running, and sends another to disable it. To enable the gamepad at boot just add the following to your /init.sun4i.rc file after "insmod /drv/ioc.ko":

wait /dev/ioc
exec /system/bin/ioctl -d /dev/ioc 0x0 0x1
Note: the 0x prefix is important, this will not work otherwise.
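For reference, here is roughly what that ioctl line asks the kernel to do, written as a small Python sketch. It assumes the `-d` flag means the argument is passed directly as an integer rather than through a buffer, and it reuses the request number 0x0 and argument 0x1 from the line above. The function name is made up for illustration and this is untested on the device itself.

```python
import fcntl
import os


def send_direct_ioctl(device, request, arg):
    """Open `device` and issue ioctl(request, arg) with `arg` passed as a plain integer."""
    fd = os.open(device, os.O_RDWR)
    try:
        return fcntl.ioctl(fd, request, arg)
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        # Same values as the init script line: request 0x0, argument 0x1.
        send_direct_ioctl("/dev/ioc", 0x0, 0x1)
        print("gamepad enable ioctl sent")
    except OSError as err:
        # Expected on any machine that does not have the Yinlips ioc driver loaded.
        print(f"could not send ioctl: {err}")
```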

Analog stick and d-pad send the same input
I disassembled the gamepad driver - there is no way to fix this. The driver receives the same button flags from the hardware for both the stick and the d-pad.

Resources for making my own OLEDs/OLEPs

I'm planning to make my own OLED devices with the help of a biologist. And maybe some other things that light up too! Below are links to labs with instructions, easy-to-deal-with suppliers, and a whole lot of extra info. This page is here both as a resource for building information on how to do this, as well as a log of my own experience.

Warning - I write this standing on a soapbox, and this is pretty open and unorganized. This is how I brainstorm in general. I've decided to start putting these up on my wiki instead of keeping them in my head.

I am big on thin client lately. I believe there is a general sine-wavyness in the industry that causes things to move towards ubiquitous computing and back to having everything localized again because some things about thin client just plain suck. Namely, bandwidth, latency, and the power of the client.

Lately though, you can get a cortex a8-based SOC for fairly cheap. And ubiquitous (buzzword alert) bandwidth really is starting to become a reality. So I think we're hitting the era of the rich thin client. It's cheap, too. Think of the rich thin client as everything you see in a couple-hundred dollar netbook today. There really is a lot of power there.

Netbooks are able to sell themselves. I haven't seen any artsy commercials telling you how cool you will be if you have one. They're just cheap and useful. And these devices are not only economically feasible, they're nearly disposable. I have a friend who drives one of those boxy Scion xBs - he calls it his throwaway car and jokes that he has another one still in the plastic.

They have a right-sizing problem though. They're starting to get discrete GPUs, larger & higher resolution screens, multiple cores, etc - and that's going to blur things quite a bit. You can already configure one to cost 8 times the amount that the eeepc goes for. So I think that the enthusiasm created behind them will also be their downfall - unless the need for these devices to become more powerful goes away, the line between a netbook and a full blown laptop is going to get very, very blurry.

Again, I believe the reason a netbook is popular, is because it's cheap and it's 'rich' enough to do what you need. Heck, you could call it a different factoring of the hardware platform in the iPod touch if you want.

That hardware platform, respun in any form factor is the perfect rich thin client. It gives me accelerated access to a framebuffer, plenty of kinds of input, and a fat link to the ether. And this wonderful bit in between where it doesn't matter if I'm not always connected because the machine is powerful enough on its own to do some things.

Wouldn't it be great if someone stepped up to the plate and created a software platform for these devices that made the line between what work is done on the client versus what work is done in the (dammit) cloud transparent to the user?

I think some folks when asked this question would say "Hey we're done, it's called web 2.0, and Palm WebOS is the perfect realization of that."

Unfortunately it's not. It's a good step forward, that I believe was delivered before completion. The reasons WebOS sucks can be fixed, but so far they are things that any good operating system should have. Heck, I'd love to take the webkit porting layer and build that directly on top of a super thin hardware abstraction, and add modern operating system features. WebOS sits on top of linux, but does not make use of a lot of benefits that a full blown operating system provides, simply because it tries to be the operating system in the webkit abstraction.

Inside of its "nearly everything is javascript running in a browser" model (which is fine, given canvas & local storage), WebOS lacks real support for isolating tasks from each other, and this includes prioritization and accounting, both brain-dead important things for any multitasking operating system.

Because of this I had a phone call go to voicemail simply because I was loading a traffic webpage and the machine couldn't render the incoming call info - it drew black, then zoomed out from the web browser app, and then sent my friend to voicemail, all without me touching it. I also tried out an html 5 webpage that used canvas, and it leaked a ton of JS allocations to the point where the machine popped an out of memory dialog. Fine. Navigate away from the page and the device should have memory again right? Nope. Close the web browser? Nope. Not even enough available memory to open the system dialog to reboot.

WebOS also does not use acceleration for anything outside of video, from what I've found so far (I expect at least fast blts between memory allocated by the application and the framebuffer or window backing store).

These things can be fixed but really should have been designed in from the ground up. Fortunately or unfortunately, this is the best realized standard I've seen for a software platform on a rich thin client.

Jury-rigging existing architectures to support thin client is an interesting spectrum - at one end you have "gimme a frame and the capabilities of html5 or just flash and I'll do the rest" which I look at as google gears, adobe flex, etc. I don't have enough depth in that area but so far I really believe it's the wrong direction. At the other end of that spectrum you have the Pano Logic - a shiny little box that offers most of the IO you get from an actual PC - audio, video, USB, ethernet, all glued together by a xilinx spartan 3e 1600k. That may well have an ARM core on it, but they call it "cpu-less." This device talks over ethernet to a server that can be in their (dammit) cloud, which runs software that arbitrates between the device and a vmware session.

The entry-level Pano Logic 'license' costs around $2000 for 5 devices and a year of service, unfortunately. (Tangent warning) I picked up 5 on eBay for $20 to check them out. They have a pretty incredible design and if I can't get them to work with vmware, I'll just probe all the connections to the FPGA and make the device do something different. I have what I believe is an original, expedient hack to do this that I'll write about later - if you think about it, it'll probably be obvious. I used the same thing for the 32x24 pixel RGB LED displays.

Anyway, to summarize:

The netbook is the hardware platform headed towards rich thin client computing. Forcing the evolution of that platform by souping it up with expensive hardware is the wrong way to go. That expensive hardware will eventually get cheaper because the process to make it will become cheaper, while at the same time more powerful and expensive hardware is becoming available. It's not a rich thin client if it's using the latest powerful and expensive hardware. It's a rich thin client because it's using proven hardware that is now cheaper to make. That 'rich thin client' hardware will always be a constant target in the shallow end of the affordable computing spectrum just as the souped-up hardware is at the other edge. Blurring the line is foolish and will kill this hardware platform's ability to sell itself. The nature of this platform lends itself towards the development of shared computing in a self-sustaining way.

For the software, the line between execution on the device and execution in the cloud should be very blurred. Some things are better right next to the input and framebuffer. Other things are expensive, but also pipelinable, cacheable, and should be run on the big iron in the cloud. Often these are both a part of the same task. The external requirements are being filled by the evolution of computing. We have the pervasiveness of the rich client hardware, a low latency high bandwidth network, and are riding yet another wave towards attempting to do more work in the cloud. The software platform for this is currently evolving in a genetic manner that is much more haphazard than the evolution of the hardware. 'Survival of the species' lends itself more towards physical things. With hardware we are eager to throw out the old for the new, but after hitting a certain point of pervasiveness, software just won't die. It is much more expensive and time consuming for specific hardware to hit the same level of pervasiveness.

There have been many interesting attempts to build platforms for different pieces of the offload-work software puzzle, and currently developers pick and choose those pieces to put together. For example, there are already C compilers for offloading parallelizable work to DSPs, FPGAs, GPUs, and CPU clusters. But there is no overall coherent vision for doing this. It is meta to the role of today's operating system design, but ideally should not be completely on the shoulders of the task being performed either. It is a reaching statement, but I believe any efforts towards a good rich thin client platform will step into and help standardize this arena with the realization of a structure that helps engineers more easily balance between the location of the resources, the latency between them, and their execution properties with the tasks themselves. I believe this is the meta operating system, and it will need to span machines.