My XBOX 360 stuff

Thursday, 30 August 2012

Hey, as you might have noticed, I resumed my work on nulldc-360 and libxenon not long ago.

I'm currently working on 3 things: compatibility/sound/speed.

6 months ago I was badly stuck on 2 bugs: a texture endianness problem, and a random crash/infinite loop in the dynarec.
The first thing I did was look at that texture bug; within a few days I found the exact case where it happened and fixed it for good.
Then I fixed the dynarec one. It was quite an awful one: I forgot to save/restore the SH4 condition flag on Dreamcast interrupts, so it was randomly corrupted as the emulated console handled its IRQs!
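As a rough illustration of that kind of bug, here is a minimal sketch. It assumes the generated code keeps the T (condition) flag in a separate cached variable instead of inside SR; that layout, the Sh4State struct and all function names are hypothetical and not taken from nulldc-360. Only the SR bit positions and the SR/SSR, PC/SPC save-and-restore behaviour come from the SH4 architecture.

```c
#include <stdint.h>

/* SH4 status register bits */
#define SR_T  (1u << 0)   /* condition flag */
#define SR_BL (1u << 28)  /* block further exceptions */
#define SR_RB (1u << 29)  /* register bank select */
#define SR_MD (1u << 30)  /* privileged mode */

typedef struct {
    uint32_t sr, ssr;   /* status register and its saved copy */
    uint32_t pc, spc;   /* program counter and its saved copy */
    uint32_t t_cached;  /* hypothetical: T flag held outside SR by generated code */
} Sh4State;

/* Write the cached flag back into SR. */
void flush_t(Sh4State *c) { c->sr = (c->sr & ~SR_T) | (c->t_cached & 1u); }

/* Reload the cached flag from SR. */
void reload_t(Sh4State *c) { c->t_cached = c->sr & SR_T; }

void enter_interrupt(Sh4State *c, uint32_t handler_pc) {
    flush_t(c);                       /* the easy-to-forget step */
    c->ssr = c->sr;                   /* hardware behaviour: SR -> SSR */
    c->spc = c->pc;                   /* hardware behaviour: PC -> SPC */
    c->sr |= SR_MD | SR_RB | SR_BL;
    c->pc = handler_pc;
}

/* Models RTE, ignoring its delay slot. */
void return_from_interrupt(Sh4State *c) {
    c->sr = c->ssr;
    c->pc = c->spc;
    reload_t(c);                      /* the other half of the fix */
}
```

Without the flush_t/reload_t pair, whatever the interrupt handler leaves in the cached flag leaks back into the interrupted code, which would show up as the kind of random corruption described above.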
I fixed a few more bugs until it was clear I needed proper sound emulation for more games to boot.

Adding sound was relatively straightforward. Of course there were the usual endianness-related bugs, but I guess I'm getting used to it ^^
The Dreamcast sound chip (AICA) is a complex design, it has an ARM processor core, a 64 channel sound generator, and a DSP.
I don't emulate the DSP for now, many games don't really need it.
The main problem with sound is the slowdown it causes in the emulator, though thanks to the 360's multicore CPU, I was able to make it almost free.

Updating peripherals in the dynarec works this way: each code block knows how many SH4 CPU cycles it emulates, and each time a fixed number of cycles (448) is reached, it calls a procedure that updates those peripherals.
To multithread sound and, by the way, other peripherals, I run parts of that update on a separate core.
It runs concurrently with the dynarec, and, basically, every 448 SH4 cycles both get synchronized. So as long as peripheral emulation takes less time than SH4 emulation, the dynarec doesn't have to wait for it!
That makes them almost free to emulate; almost, because they still stress the 360's L2 cache and memory controller a little.
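The scheme can be sketched roughly as follows, using POSIX threads instead of libxenon's own threading. The PeriphSync struct and every function name are made up for illustration; only the 448-cycle slice comes from the text. The emulation thread hands one slice of peripheral work to a worker and only blocks if the previous slice has not finished yet.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define SLICE_CYCLES 448  /* from the post: sync point every 448 SH4 cycles */

typedef struct {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool work_pending;     /* a slice has been handed to the worker */
    bool quit;
    uint64_t slices_done;  /* stands in for real peripheral emulation */
    int cycles_left;       /* cycles remaining before the next sync point */
} PeriphSync;

void *periph_worker(void *arg) {
    PeriphSync *ps = arg;
    pthread_mutex_lock(&ps->lock);
    for (;;) {
        while (!ps->work_pending && !ps->quit)
            pthread_cond_wait(&ps->cond, &ps->lock);
        if (!ps->work_pending)
            break;                         /* quit requested, nothing left to do */
        pthread_mutex_unlock(&ps->lock);
        /* real peripheral emulation would run here, outside the lock */
        pthread_mutex_lock(&ps->lock);
        ps->slices_done++;
        ps->work_pending = false;
        pthread_cond_broadcast(&ps->cond); /* wake the emulation thread if it waits */
    }
    pthread_mutex_unlock(&ps->lock);
    return NULL;
}

void periph_start(PeriphSync *ps) {
    pthread_mutex_init(&ps->lock, NULL);
    pthread_cond_init(&ps->cond, NULL);
    ps->work_pending = false;
    ps->quit = false;
    ps->slices_done = 0;
    ps->cycles_left = SLICE_CYCLES;
    pthread_create(&ps->thread, NULL, periph_worker, ps);
}

/* Called by the (hypothetical) dynarec at the end of each code block. */
void block_executed(PeriphSync *ps, int block_cycles) {
    ps->cycles_left -= block_cycles;
    while (ps->cycles_left <= 0) {
        ps->cycles_left += SLICE_CYCLES;
        pthread_mutex_lock(&ps->lock);
        while (ps->work_pending)           /* blocks only if the worker is behind */
            pthread_cond_wait(&ps->cond, &ps->lock);
        ps->work_pending = true;
        pthread_cond_broadcast(&ps->cond);
        pthread_mutex_unlock(&ps->lock);
    }
}

/* Stops the worker and returns how many slices it processed. */
uint64_t periph_stop(PeriphSync *ps) {
    pthread_mutex_lock(&ps->lock);
    ps->quit = true;
    pthread_cond_broadcast(&ps->cond);
    pthread_mutex_unlock(&ps->lock);
    pthread_join(ps->thread, NULL);
    pthread_cond_destroy(&ps->cond);
    pthread_mutex_destroy(&ps->lock);
    return ps->slices_done;
}
```

In this sketch the wait inside block_executed is the only place where the emulation thread can stall, which matches the "free as long as peripherals are faster than the SH4" behaviour described above.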

Anyway, compatibility seems pretty good now, sound works, speed is more or less the same as it was before sound emulation, and a proper binary release *might* not be that far off ;)

Monday, 29 August 2011

When I started this blog, I said it would be about open source libraries, emulators, reverse engineering and hacking. My libxenon stuff easily covers the first 3 items, but I didn't write much about hacking.
That word means a lot of different things, but it was synonymous with 'finding exploits' for me when I wrote it.

And the fact is I already had a working reset glitch hack when I wrote it. Hell, all my libxenon stuff was coded on a glitched console!

It started around February 2010. I don't really know why, but I wanted to hack the 360, badly. I had a few ideas on how to do it, but even though I read xboxhacker, I had very little knowledge of how the 360 worked.
Nonetheless, I started disassembling a Zephyr I bought, wiring the POST port, JTAG, and a few other things I had read about on xboxhacker.
Then I met Tiros, he was already well known for his hacking work on the 360. He told me my ideas were junk, but he also started to 'teach' me how previous exploits worked, how the boot process worked,...
Then we took the 'regular' way of hacking it: first searching for a way to prevent the boot process from deactivating JTAG, then trying to find a kernel that was vulnerable to the SMC/System flash controller DMA attack but wasn't blacklisted by bootloaders, then trying to find hypervisor flaws, then ... nothing. We searched for flaws for months and found a few interesting bugs, but nothing that could lead to unsigned code running...
It is now late summer 2010, I'm starting to lose hope, then as a desperate move, I start to think about glitching.
From that moment it took me no more than a few weeks to run Xell for the first time on my Zephyr!
I had incredible luck: there was a bug in my CPU reset code. Instead of sending a millisecond-like pulse, it sent a 100-nanosecond one, which was in fact the time for 2 consecutive GPIO write instructions to complete on my microcontroller! Call it while the CPU is slowed down and it glitches!

Tuesday, 28 June 2011

Sorry for the lack of updates, I was busy with other stuff for some months.

As you can see on my github, ( https://github.com/gligli/libxenon/commits/master ), I think libXenon has improved a lot lately, with lots of stuff added from Xell (NAND access, lwip & network code,...), a new ELF loader, a unified ATA driver (that can access both HDD and DVD), the return of opendir/readdir/... functions and many bug fixes and smaller improvements in almost all drivers.

The reason for many of those changes is to be able to make most of Xell a regular libXenon app: Xell would be split into 2 stages, one stage which recovers from the exploit and then decompresses and launches a second stage, which is a libXenon ELF. To maintain backwards compatibility, both stages would have to fit in the 256KB limit for a Xell binary.

Last but not least, I'm working on mupen64-360, my Wii64 port, these days. I already added sound and did some optimisations to try to get more speed.
I multithreaded a good part of sound processing so it's done almost for free, and in fact anything that isn't multithreaded (RSP emulation) was already running in my version from January. I might be able to multithread the RSP too, it would probably give a nice speed boost :)
I also redid the port of the Wii64 dynarec. I did my first port from the PS3 branch and it seems it wasn't up to date with the trunk speed-wise. Now it's using Wii64 1.1 code. I also changed the way stores are handled in the dynarec, trying to generate more code and rely less on a (slow) generic C function to do the job.
By the way, source code is now available on my github ( https://github.com/gligli/mupen64-360 ). Anybody who can compile it can try it, but please don't distribute binaries! It's not a good idea at all to release unofficial versions of someone else's work in progress, so I hope you can be responsible about this.

Here's a new video showing the progress on Mario64, jerkiness is due to the video capture card, trust me that game runs smooth :)

Thursday, 27 January 2011

During the last weeks, I worked a bit on improving my mupen64 port, here are the things I did.

As SM64 started to work well, I switched to the Zelda rom for my testing. It is a much more complex game to emulate graphics-wise, so obviously it was completely buggy and painfully slow on the first run. The biggest problem was that unlike SM64, most of the rendering was done with my slowest pixel shader, and fixing the bugs would have made it even slower, so I decided to do a complete rewrite of the shader. This time I designed it around something I had just discovered: constant boolean registers, which allow flow control without a big performance hit. It took me a few tries to get it fast and accurate, but now that pixel shader is almost as fast as the old one on simple cases while being more accurate and much faster on complex cases. I also made my old shader as accurate as I could; it is now used for some rare cases the new one can't emulate. With a few more fixes to libxenon and the emulator (implementing 2D rendering for example), this makes Zelda reasonably fast and playable.

The next game was Mario Kart 64. This time it was fast and looking good on the first try, but it crashed after a few races with some 'out of memory' message. It turns out something very important was missing from the libxenon 3D driver: a way to free what you allocate! (textures/vertex buffers/...) So I replaced the very basic GFX memory allocator with a malloc-like one I found in the libxenon source code, and modified the emulator's texture cache to actually free old textures when needed.
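To make the "free old textures when needed" idea concrete, here is a minimal sketch of a texture cache that evicts its least recently used entry when a fixed memory budget is exceeded. The eviction policy, the budget, the struct layout and all names are assumptions for illustration; the actual emulator cache and the libxenon allocator calls are not shown in the post.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TEXTURES 8
#define GFX_BUDGET   1024  /* hypothetical GPU memory budget, in bytes */

typedef struct {
    uint32_t key;        /* e.g. a hash of the N64 texture address and format */
    size_t size;
    uint64_t last_used;  /* value of the cache clock at the last access */
    int in_use;
} CachedTexture;

typedef struct {
    CachedTexture slots[MAX_TEXTURES];
    size_t bytes_used;
    uint64_t clock;
} TextureCache;

/* Frees the least recently used texture; returns 0 if the cache is empty. */
int cache_evict_oldest(TextureCache *c) {
    int victim = -1;
    for (int i = 0; i < MAX_TEXTURES; i++)
        if (c->slots[i].in_use &&
            (victim < 0 || c->slots[i].last_used < c->slots[victim].last_used))
            victim = i;
    if (victim < 0)
        return 0;
    c->bytes_used -= c->slots[victim].size;  /* real code would free GPU memory here */
    c->slots[victim].in_use = 0;
    return 1;
}

/* Returns the slot holding the texture, loading it (and evicting) if needed. */
int cache_get(TextureCache *c, uint32_t key, size_t size) {
    for (int i = 0; i < MAX_TEXTURES; i++)
        if (c->slots[i].in_use && c->slots[i].key == key) {
            c->slots[i].last_used = ++c->clock;  /* cache hit */
            return i;
        }
    if (size > GFX_BUDGET)
        return -1;                               /* can never fit */
    for (;;) {
        int free_slot = -1;
        for (int i = 0; i < MAX_TEXTURES && free_slot < 0; i++)
            if (!c->slots[i].in_use)
                free_slot = i;
        if (free_slot >= 0 && c->bytes_used + size <= GFX_BUDGET) {
            CachedTexture *t = &c->slots[free_slot];
            t->key = key;                        /* real code would allocate and upload here */
            t->size = size;
            t->last_used = ++c->clock;
            t->in_use = 1;
            c->bytes_used += size;
            return free_slot;
        }
        if (!cache_evict_oldest(c))
            return -1;
    }
}
```

A zero-initialised TextureCache is a valid empty cache, so a caller only needs `TextureCache c = {0};` before the first cache_get.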

Sunday, 9 January 2011

As you probably already know, I added support for Jasper EDRAM init in libxenon not long ago.

Next was adding support for more display resolutions. To add one in libxenon, you basically need 2 things: one ana chip registers dump and one xenos GPU registers dump. From the GPU dumps, you can guess video timings. So you add those timings to libxenon along with the ana dump and I thought it was supposed to work.
It turns out it wasn't that easy, you have to edit ana dumps a bit before they work in libxenon. I had to do some reverse engineering on the official kernel to find out exactly how.
Anyway I was able to add 1280x720, 1440x900 and 1280x1024 (all VGA). 1440x900 is rendered as 1440x896 because EDRAM isn't big enough to render it. This is the highest resolution I can add for now because unlike the official games, libxenon doesn't render 3D at 1280x720 and then scale it up; instead it renders to the native resolution directly.
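One way to see why 900 lines do not fit while 896 do is the back-of-the-envelope check below. It relies on assumptions that are not stated in the post: a 10 MB EDRAM, a 32-bit color buffer plus a 32-bit depth buffer, and surfaces rounded up to 80x16-pixel tiles. The function names are made up for this sketch.

```c
enum {
    EDRAM_BYTES  = 10 * 1024 * 1024,  /* assumed Xenos EDRAM size */
    TILE_W       = 80,                /* assumed tile width in pixels at 32 bpp */
    TILE_H       = 16,                /* assumed tile height in pixels */
    BYTES_PER_PX = 4,
    SURFACES     = 2                  /* assumed: one color and one depth buffer */
};

/* EDRAM needed for a w x h render target, rounded up to whole tiles. */
unsigned edram_bytes_needed(unsigned w, unsigned h) {
    unsigned tiles_x = (w + TILE_W - 1) / TILE_W;
    unsigned tiles_y = (h + TILE_H - 1) / TILE_H;
    return tiles_x * tiles_y * TILE_W * TILE_H * BYTES_PER_PX * SURFACES;
}

int fits_in_edram(unsigned w, unsigned h) {
    return edram_bytes_needed(w, h) <= (unsigned)EDRAM_BYTES;
}
```

Under these assumptions 1440x900 rounds up to 57 tile rows and needs 10,506,240 bytes, slightly more than the 10,485,760 available, while 1440x896 (56 rows) needs 10,321,920 bytes and 1280x1024 fills EDRAM exactly.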

Next was adding HDMI support. I quickly had an idea for this: logging I2C/SMBus and GPIO accesses inside an official kernel (I thought those were the 3 ways of talking to HDMI hardware). I need to thank cOz for doing the kernel patching and logging. From those logs I was able to guess which registers I needed to write to activate HDMI output.

So we now have Jasper support, HDMI, and more available resolutions in libxenon :)

Now I think I'll work on updating Xell a bit, so my next post will probably be about it.

Friday, 7 January 2011

I have already made quite a few improvements to my mupen64 port since the video.

About half an hour after making it, I fixed almost all the missing screens/skies and the clipped Mario by adjusting the Z buffer range (previous range was 0..1, now it is -1..1).

I then improved N64 gfx combiners emulation (combiners are sort of early primitive pixel shaders). I use 360 pixel shaders to emulate them. At first it was really slow because I used many switches and loops in it and it seems doing that in a pixel shader isn't such a good idea. I got everything back to playable speeds by using mainly 3 techniques:

a math formula that handles all the possible cases for the combiner operation (ie mul/add/sub/...)

having different pixel shaders (one fast that can only do simple things, one intermediate, and one slow that emulates everything) and switching between them when needed.
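The first technique can be illustrated with the N64 combiner equation itself, which computes (A - B) * C + D for each channel. The sketch below is plain C rather than shader code, and it is an assumption that the shader used this exact form; the helper names are invented for the example. The point is that multiply, add, subtract and blend are all the same expression with different inputs, so no switch is needed.

```c
/* One combiner stage for a single channel: (a - b) * c + d. */
float combine(float a, float b, float c, float d) {
    return (a - b) * c + d;
}

/* Common operations expressed through the single formula. */
float op_modulate(float x, float y)      { return combine(x, 0.0f, y, 0.0f); }
float op_add(float x, float y)           { return combine(x, 0.0f, 1.0f, y); }
float op_sub(float x, float y)           { return combine(x, y, 1.0f, 0.0f); }
float op_blend(float x, float y, float t) { return combine(x, y, t, y); }
```

Here op_blend interpolates from y (t = 0) to x (t = 1), the usual way a combiner mixes two colors.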

So now gfx emulation is quite fast, but the emu still ran slowly when emulating floating point intensive scenes like the Mario head demo at the beginning of SM64, so I started looking at how mupen64 emulates floating point operations. I quickly discovered that the whole floating point unit was running in interpreter mode, oops!
A few #defines later the emu was up to 50% faster.

Next I had an idea: why not try to get the X360 GPU to render my current frame in the background instead of actively waiting for it to finish rendering?
Usually you do this:

/* resolve (and clear) */
Xe_Resolve(xe);

/* wait for render finish */
Xe_Sync(xe);

Now I do this:

/* resolve (and clear) */
Xe_Resolve(xe);

/* begin rendering in background */
Xe_Execute(xe);

and then I call Xe_Sync() at the last moment, right before beginning my next frame.
I got a huge speed boost with this: Super Mario 64 now runs at around 100fps in-game!

Obviously it isn't about my (boring) life; instead I will try to post updates about what I work on for the Xbox 360, so it will probably be about open source libraries, emulators, reverse engineering and hacking.

Just to make it clear, my end goal is to revive the free/legal homebrew scene on the 360, because it's the only next-gen console without a proper one. It's a shame that even the PS3, which has been hacked only recently, has a bigger one.