How can it take 3GHz to emulate a Super Nintendo? The man behind a major SNES …

Emulators for playing older games are immensely popular online, with regular arguments breaking out over which emulator is best for which game. Today we present another point of view from a gentleman who has created the Super Nintendo emulator bsnes. He wants to share his thoughts on the most important part of the emulation experience: accuracy.

It doesn't take much raw power to play Nintendo or SNES games on a modern PC; emulators could do it in the 1990s with a mere 25MHz of processing power. But emulating those old consoles accurately—well, that's another challenge entirely; accurate emulators may need up to 3GHz of power to faithfully recreate aging tech. In this piece we'll take a look at why accuracy is so important for emulators and why it's so hard to achieve.

Put simply, accuracy is the measure of how well emulation software mimics the original hardware. Apparent compatibility is the most obvious measure of accuracy—will an old game run on my new emulator?—but such a narrow view can paper over many small problems. In truth, most software runs with great tolerance to timing issues and appears to be functioning normally even if timing is off by as much as 20 percent.

So the question becomes: if we can achieve basic compatibility, why care about improving accuracy further when such improvement comes at a great cost in speed? Two reasons: performance and preservation.

First, performance. Let's take the case of Speedy Gonzales. This is an SNES platformer with no save functionality, and it's roughly 2-3 hours long. At first glance, it appears to run fine in any emulator. Yet once you reach stage 6-1, you can quickly spot the difference between an accurate emulator and a fast one: there is a switch, required to complete the level, where the game will deadlock if a rare hardware edge case is not emulated. One can imagine the frustration of instantly losing three hours of progress and being met with an unbeatable game. Unless the software does everything in the exact same way the hardware used to, the game remains broken.

Or consider Air Strike Patrol, where a shadow is drawn under your aircraft. This is done using mid-scanline raster effects, which are extraordinarily resource intensive to emulate. But without the raster effects, your aircraft's shadow will not show up, as you see in the screenshot below. It's easy to overlook, especially if you do not know that it is supposed to be there. But once you actually see it, you realize that it's quite helpful. Your aircraft has the ability to drop bombs, and this shadow acts as a sort of targeting system to determine where they will land, something that's slightly more difficult without this seemingly minor effect.

The second issue is preservation. Take a look at Nintendo's Game & Watch hardware. These devices debuted in 1980, and by now most of the 43 million produced have failed due to age or have been destroyed. Although they are still relatively obtainable, their scarcity will only increase, as no additional units will ever be produced. This same problem extends to any hardware: once it's gone, it's gone for good. At that point, emulators are the only way to experience those old games, so they should be capable of doing so accurately.

But this accuracy comes at a serious cost. Making an emulator twice as accurate will make it roughly twice as slow; double that accuracy again and you're now four times slower. At the same time, the rewards for this accuracy diminish quickly, as most games look and feel "playable" at modest levels of emulator accuracy. (Most emulators target a "sweet spot" of around 95 percent compatibility with optimal performance.)

There's nothing wrong with less accurate but speedy emulators, and such code can run on lower-powered hardware like cell phones and handheld gaming devices. These emulators are also more suited for use on laptops where battery life is a concern. But there's something to be said for chasing accuracy, too, and it's what I've attempted to do in my own work. Here's why it matters to me.

Doing it in software

Back in the late '90s, Nesticle was easily the NES emulator of choice, with system requirements of roughly 25MHz. This performance came at a significant cost: game images were hacked to run on this emulator specifically. Fan-made translations and hacks relied on emulation quirks that rendered games unplayable on both real hardware and other emulators, creating a sort of lock-in effect that took a long while to break. At the time, people didn't care about how the games originally looked and played in general; they just cared about how they looked and played in this arbitrary and artificial environment.

These days, the most dominant emulators are Nestopia and Nintendulator, requiring 800MHz and 1.6GHz, respectively, to attain full speed. The need for speed isn't because the emulators aren't well optimized: it's because they are a far more faithful recreation of the original NES hardware in software.

Now compare these to the older N64 emulator, UltraHLE, whose system requirements were a meager 350MHz Pentium II system. To the casual observer, it can be quite perplexing to see Mario 64 requiring less processing power than the original Mario Bros.

My experience in emulation is in the SNES field, working on the bsnes emulator. I adored the ideal behind Nestopia, and wanted to recreate this level of accuracy for the Super Nintendo. As it turns out, the same level of dedication to accuracy pushed requirements up into the 2-3GHz range, depending on the title.

TimeCop, on two very different emulators

Nestopia caught on because its system requirements were paltry for its time, but I have no doubt that releasing it in 1997 would have been disastrous. Since my emulator ultimately required a computing system with more power than half the market, I've seen first-hand the effect of high system specs and the backlash it causes. It's easier to blame the program than to admit your computer isn't powerful enough, but the reality is that faking an entire gaming console in software is an intensive process.

Why accuracy matters

So if an emulator appears to run all games correctly, why should we then improve upon it? The simple answer is because it improves the things we don't yet know about. This is particularly prominent in less popular software.

As an example, compare the spinning triforce animation from the opening to Legend of Zelda on the ZSNES and bsnes emulators. On the former, the triforces will complete their rotations far too soon as a result of the CPU running well over 40 percent faster than a real SNES. These are little details, but if you have an eye for accuracy, they can be maddening.

I've encountered dozens of titles with obscure quirks. Sometimes the correct, more accurate emulation actually produces a "wrong" result. Super Bonk's attract mode demo actually desynchronizes, causing Bonk to get stuck near a wall on most real systems. And Starfox suffers from significant slowdown issues throughout the game. These are certainly not desirable attributes, but they are correct nonetheless. We wouldn't round pi down to 3 simply because irrational numbers are inconvenient, right?

I don't deny the advantages of treating classic games as something that can be improved upon: N64 emulators employ stunning high-resolution texture packs and 1080p upscaling, while SNES emulators often provide 2x anti-aliasing for Mode7 graphics and cubic-spline interpolation for audio samples. Such emulated games look and sound better. While there is nothing wrong with this, it is contrary to the goal of writing a hardware-accurate emulator. These improvement techniques typically make it more difficult even to allow for the option of accurate emulation, in fact.

Another major area where accuracy is a benefit is in fan-created works from translators, ROM hackers, and homebrew developers. Few of them have access to run code on real hardware, so they will often develop their software using emulators. Unfortunately, speed-oriented emulators will often ignore hardware limitations. This is never a problem for a commercially developed game: upon required testing on real hardware, the bug would quickly be discovered and fixed. But if you can only test on a specific emulator, such bugs tend to persist.

I can name a few examples. The fan translations for Dragon Quest 1&2, Dual Orb 2, Sailor Moon: Another Story and Ys 4 all suffered invisible text issues as a result of writing to video RAM while the video processor had it locked out for rendering the screen. Only half of these titles have subsequently been fixed.

We've known about this hardware limitation since 1997, and supporting it takes a one-line code fix, but the most popular emulator still does not emulate this behavior. As a result, translations made solely for this emulator continue to cause problems and lock-in. Who would want to use a more accurate emulator that couldn't run a large number of their favorite fan translations?
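To make the video RAM lockout concrete, here is a minimal Python sketch. It is not code from any real emulator: the `Ppu` class, the `rendering` flag, and the `enforce_lockout` switch are hypothetical names, and real hardware has more conditions than a single flag. It only shows why text uploaded while the screen is being drawn can end up invisible on an emulator that honors the lockout.

```python
class Ppu:
    """Toy video processor: VRAM writes are dropped while the screen is drawn."""

    def __init__(self, enforce_lockout=True):
        self.vram = bytearray(64 * 1024)  # size chosen for illustration
        self.rendering = False            # True while the screen is being drawn
        self.enforce_lockout = enforce_lockout

    def write_vram(self, address, value):
        # The whole fix is this check: ignore the write while VRAM is locked out.
        if self.enforce_lockout and self.rendering:
            return
        self.vram[address % len(self.vram)] = value & 0xFF


def draw_text_during_rendering(ppu):
    """Stand-in for a buggy patch that uploads a glyph byte mid-frame."""
    ppu.rendering = True
    ppu.write_vram(0x1000, 0x41)
    ppu.rendering = False
    return ppu.vram[0x1000]


lenient = draw_text_during_rendering(Ppu(enforce_lockout=False))
strict = draw_text_during_rendering(Ppu(enforce_lockout=True))
print(lenient, strict)
```

With the lockout disabled the byte lands in memory and the text shows up; with it enabled the write is silently dropped, which is the behavior the affected translations never saw during testing.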

It doesn't stop there, though. The original hardware had a delay upon asking the math unit for multiplication and division results. Again, any commercial game ever released would respect those delays, but fan hacks led to a Zelda translation's music cutting out and to the Super Mario World chain-chomp patch going haywire.
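A rough Python sketch of that delay follows. Everything in it is an assumption made for illustration: the `MathUnit` class, the `LATENCY` value, and the choice to return the previous result on an early read (the article does not say what real hardware returns in that window).

```python
class MathUnit:
    LATENCY = 8  # cycles; placeholder value for illustration only

    def __init__(self, model_delay=True):
        self.model_delay = model_delay
        self.result = 0      # value currently visible to the CPU
        self.pending = None  # (cycle at which the result is ready, value)

    def start_multiply(self, a, b, now):
        value = (a * b) & 0xFFFF
        if self.model_delay:
            self.pending = (now + self.LATENCY, value)
        else:
            self.result = value  # speed-oriented shortcut: result is instant

    def read_result(self, now):
        # Publish the pending result only once enough cycles have elapsed.
        if self.pending is not None and now >= self.pending[0]:
            self.result = self.pending[1]
            self.pending = None
        return self.result


fast = MathUnit(model_delay=False)
fast.start_multiply(12, 10, now=0)
early_fast = fast.read_result(now=2)

accurate = MathUnit(model_delay=True)
accurate.start_multiply(12, 10, now=0)
early = accurate.read_result(now=2)  # read too soon: stale value
late = accurate.read_result(now=8)   # read after the delay: correct value
print(early_fast, early, late)
```

A hack that reads too early gets the right answer on the shortcut version and a stale one on the delayed version, so it appears to work until it meets hardware or an emulator that models the wait.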

Or an emulator might ignore the fact that the sound processor writes echo samples into shared RAM. Not a problem until you wind up with hacks that use wildly unrealistic echo buffer sizes, which in turn end up overwriting the entire audio program in memory, crashing and burning in spectacular fashion. This one issue single-handedly renders dozens of Super Mario World fan-made levels unplayable.
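The echo problem can be sketched the same way. In this hypothetical Python model, the load address, buffer start, and buffer lengths are made-up values, and `run_echo` stands in for whatever the sound processor does each sample. The point is only that echo data and the audio program live in the same memory.

```python
RAM_SIZE = 64 * 1024    # shared audio RAM; size assumed for this sketch
PROGRAM_START = 0x0200  # hypothetical load address for the audio program


def load_program(ram):
    program = bytes([0x8F, 0x01, 0x02, 0x03])  # arbitrary stand-in bytes
    ram[PROGRAM_START:PROGRAM_START + len(program)] = program
    return program


def run_echo(ram, echo_start, echo_length, emulate_writes=True):
    """Fill the echo buffer once, wrapping around at the end of RAM."""
    if not emulate_writes:
        return  # emulator that ignores echo writes entirely
    for offset in range(echo_length):
        ram[(echo_start + offset) % RAM_SIZE] = 0x00


def program_survives(echo_length, emulate_writes):
    ram = bytearray(RAM_SIZE)
    program = load_program(ram)
    run_echo(ram, echo_start=0xF000, echo_length=echo_length,
             emulate_writes=emulate_writes)
    return bytes(ram[PROGRAM_START:PROGRAM_START + len(program)]) == program


print(program_survives(0x0800, True))   # modest buffer, writes emulated
print(program_survives(0x7800, False))  # huge buffer, writes ignored
print(program_survives(0x7800, True))   # huge buffer, writes emulated
```

Only the last case destroys the program: the oversized buffer wraps past the end of memory and lands on top of the code that is supposed to be running.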

I found out that in ZSNES, battles actually run at about 50-100% faster than they should. Considering battles in SO are real-time, tactics are thrown out the window because you don't have enough time to react.

It's actually how I discovered bSNES, when someone recommended I try it instead. I've been using it since.

I do want to point out that this isn't a "please use bsnes" article, it's an appeal for all accuracy-oriented emulators. Such as Nemesis' Genesis emulator 'Exodus', sinamas' Game Boy Color emulator 'gambatte', and so forth. I was going to get into the importance of open source with regards to preservation, but that's a whole other article.

In fact, I actually do not recommend bsnes anymore to casual gamers. Features are an important part of an emulator's popularity, and I used to go out of my way up to v070 or so (you can still download it from the archives of my Google Code page), but lately I've given up on the whole popularity thing. Since I'm a sole developer, and 90% of my time was spent on the GUI, I decided I had to scale back. I've gutted most of the functionality to produce an easier-to-maintain project. I've also removed lots of legacy formats and conveniences, so new users will have to jump through a few hoops. I'm under no delusions that my current builds could ever catch on mainstream-wise.

> Accuracy is great, but so is portability.

I agree, but we already have Snes9X v1.43 for iOS/Android/etc. Up until 2010, the 9X team was afraid to ever improve anything, for fear that people would stop using the emulator at all. It wasn't until the third generation of core developers that we started to improve its emulation. And things have worked out just as I predicted: handheld porters keep improving v1.43, and desktop users keep improving v1.52+.

> Star Ocean is a big example of an accuracy problem with ZSNES.

Not only do the battles run twice as fast, the game constantly crashes all over itself. The game has actually earned a widespread reputation for being notoriously buggy. But let's be honest, it would have been recalled if it were that bad. In truth, while the game is far from perfect, it's extremely playable. The crashes are caused by ZSNES.

> I wonder if virtualization technology could be applied to emulators.

Virtualization technology relies on the same processor being used. As much as I'd kill to have it, I don't see a resurgence of the 6502 family in any modern systems. Virtualization also relies on loose timing requirements as a result of eg x86 processors having countless variants and clock rates.

> Another problem is that accurate emulation will require much more developer effort? Most emulators seem to be done by a single guy or a small group in their spare time.

Yes, it took me several years for a relatively old system with a decade worth of research already shared. I can't imagine the burden of pulling this off for eg a PS3 emulator. I don't realistically even see a bright future for high-level emulators for modern systems. The complexity is becoming far too great for hobbyists.

> I've been thinking for years of emulation like DICE uses (transistor level emulation), but I had no idea it would be that computationally expensive.

Oh yes, the amount of shortcuts emulators take is absolutely staggering. I'm not sure transistor-level emulation will ever be practical, but it is at least nice for the documentation-aspect. We run into new problems there, though. Modern chips have multiple layers of transistors sandwiched onto a single PCB. You can't simply scan in the surface of said chips and painfully digitize them: you'd have to slice through each layer, and now you are talking astronomical costs.

> When we're talking 3GHz, what does that actually mean?

In this article's case, I am referring to a 3GHz Core series chip, as compared to a Pentium II 300MHz. You are indeed correct, those old Pentium 4s were extremely inefficient.

Without the pixel-based video renderer, bsnes can run on a 1.6GHz Atom, but that compromise kind of defeats the purpose. I was hoping to provide something for slower machines that was at least better, but with the entire underlying design striving for accuracy (separate threads for each chip, every individual memory address invoking function callbacks, etc), it's not really practical to match the speed of eg Snes9X (processor enslavement, passing raw block pointers with multi-byte reads, etc).
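The two memory-access styles contrasted above can be sketched in Python. This is an illustration under stated assumptions, not the design of bsnes or Snes9X: `CallbackBus`, `RawBus`, and `StatusRegister` are hypothetical names, and the register at address 0x10 is invented for the example.

```python
class CallbackBus:
    """Every address is routed through a reader function."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.readers = [self._read_ram] * size  # one callback per address

    def _read_ram(self, address):
        return self.ram[address]

    def map_reader(self, address, reader):
        self.readers[address] = reader

    def read(self, address):
        return self.readers[address](address)


class RawBus:
    """Memory is a flat block; multi-byte reads go straight to it."""

    def __init__(self, size):
        self.ram = bytearray(size)

    def read16(self, address):
        return int.from_bytes(self.ram[address:address + 2], "little")


class StatusRegister:
    """Hypothetical register whose flag is cleared by the act of reading it."""

    def __init__(self):
        self.flag = 0x80

    def read(self, address):
        value = self.flag
        self.flag = 0x00
        return value


bus = CallbackBus(256)
bus.map_reader(0x10, StatusRegister().read)
first, second = bus.read(0x10), bus.read(0x10)

raw = RawBus(256)
raw.ram[0x10] = 0x80
raw_first, raw_second = raw.read16(0x10), raw.read16(0x10)
print(first, second, raw_first, raw_second)
```

The callback version pays a function call on every byte but can model reads that change hardware state; the raw version is far cheaper and returns the same bytes every time, which is exactly the kind of side effect it cannot express.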

> I would have loved even more technical detail.. maybe some assembly code snippets or something.

I'd love to go into more detail, but it would certainly limit the audience. Explaining the pipelining behavior in detail is really fun. For instance, Marko's Magic Football will clear an IRQ and still expect it to trigger right after. This works because the two-stage pipeline performs the work-cycle IRQ test one cycle behind the bus-cycle next-opcode fetch. The same does not hold true for clearing the IRQ with REP, because that instruction clears the flags more than one cycle prior to the opcode termination. Attempting to simulate this pipeline is quite the challenge, you have to turn one serial thread into two parallel processes.

The costs are ridiculous ($8 for Wii SNES, or $2 for local video game shop; that should be reversed since Nintendo endures no manufacturing or distribution costs), 90% of the library is unavailable, and you aren't buying the games anymore. Many of us refuse to lease games.

Games are also hacked up in interesting ways: Wave Runner lost the license to Kawasaki, so the game was edited to remove all references to it. Not really a big deal, I admit. Perhaps a more passionate example, I hear that the earlier Zelda OoT builds had red blood, subsequently edited out? Some are even questioning whether we should be preserving these VC variants.

> If a transistor-perfect pong is so difficult, we have a long way to go before we're emulating neuron-accurate brains!

I suspect AI will work much the same as higher-level emulators: we cut out the inefficient parts of the brain that seem unimportant to non-organic lifeforms. A more entertaining question is: do we have the necessary intelligence to create an artificial intelligence that exceeds our own? We can certainly do it with Chess, at least. But there's no learning involved there.

> So I presume wii-ware classic titles are specifically re-coded and tweaked to operate properly with the Wii's emulator? Or do they tend to have bizarre timing issues as well?

Each Wii Virtual Console game comes as an emulator+ROM package. The emulator included is tweaked to work with that specific game. The N64 emulation uses the same high-level and dynamic recompilation techniques of UltraHLE. They are not designed for accuracy, just playability. Which is understandable, given the Wii's processing power.

> Could it be easily threaded and some of the upcoming 8+ core chips make incredibly light work of emulation?

Not practically, no. At least not for the SNES. Consider that the fastest single-cores already run bsnes at full speed. Requiring an 8-core system kind of makes the system requirements even more steep.

But from a timing perspective: emulation is inherently serial. When you have multiple cores, you cannot finely control the exact interleaving of one virtual processor's operations with another virtual processor. The clock rates are not exact, the OS will interfere with scheduling, etc.

You could perhaps get away with this with faster and faster systems, as your tolerance window to timing errors will undoubtedly increase. But trying this for a 2-3MHz system ... it's not really practical. And right off the bat, the whole idea goes against the goal of accuracy. You're sacrificing timing precision in exchange for performance. I suspect it'll be a necessary compromise for the first few years of eg Xbox 360 emulation, at least.
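The serial nature of the problem can be illustrated with a small Python sketch. It assumes two virtual processors whose clock rates are given as a simple integer ratio (`freq_a` and `freq_b` are hypothetical parameters) and always steps whichever one is furthest behind in emulated time, all on one host thread.

```python
def interleave(freq_a, freq_b, steps):
    """Return the order in which two virtual processors execute cycles."""
    # Integer time scale: one cycle of A lasts freq_b units and one cycle
    # of B lasts freq_a units, so no floating point is needed.
    time_a = time_b = 0
    order = []
    for _ in range(steps):
        if time_a <= time_b:
            order.append("A")
            time_a += freq_b
        else:
            order.append("B")
            time_b += freq_a
    return "".join(order)


order = interleave(3, 2, 10)
print(order)  # ABABAABABA
```

Because a single loop decides every step, the interleaving is identical on every run. Two host cores running A and B independently would have no such guarantee, since the operating system and the host clocks decide who gets ahead.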

> Given the image that shows multiple cores at a high percentage of usage, I'm assuming that it is already threaded.

No, when I speak of bsnes being threaded, I am using what are known as cooperative threads (or fibers, or green threads, or coroutines, or ... [everyone who implements them comes up with their own names for them]). They are threads that only the application knows about. They're like function calls that also switch out the stack (call frame).

bsnes only uses a single thread; the task manager picture shown is of some other program =)
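For readers who have not met cooperative threads, Python generators give a rough feel for them. This is an analogy rather than how bsnes is built: generators keep only their own local state instead of switching a full stack, and the `cpu`, `audio`, and `run` names are hypothetical.

```python
def cpu(log):
    for step in range(3):
        log.append(f"cpu {step}")
        yield  # hand control back; the local variable `step` is preserved


def audio(log):
    for step in range(3):
        log.append(f"audio {step}")
        yield


def run(threads):
    """Round-robin scheduler: resume each cooperative thread until all finish."""
    threads = list(threads)
    while threads:
        for thread in list(threads):
            try:
                next(thread)
            except StopIteration:
                threads.remove(thread)


log = []
run([cpu(log), audio(log)])
print(log)
```

Everything runs on one operating system thread, and each emulated chip decides exactly where it gives up control, so the order of events is fully under the program's control.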

> I could have sworn that MAME was sharing or planning on using the libsnes core (which is the core to BSNES)

They are trying to port some of it, but won't use it outright due to differences in the underlying design. The porting is producing countless bugs, unfortunately. Emulation code, especially accuracy focused code, is very precise. The subtlest change from a >= to a > can totally break things.
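A tiny Python sketch shows how a single comparison operator shifts timing. The counter, the target value, and the function names are all invented for illustration and do not come from bsnes or MAME.

```python
def irq_line(counter, target, inclusive=True):
    """Is the interrupt line asserted for this counter value?"""
    return counter >= target if inclusive else counter > target


def first_irq_tick(target, inclusive, ticks=10):
    """Return the first tick at which the interrupt fires, if any."""
    for counter in range(ticks):
        if irq_line(counter, target, inclusive):
            return counter
    return None


on_time = first_irq_tick(5, inclusive=True)
one_late = first_irq_tick(5, inclusive=False)
print(on_time, one_late)
```

The two versions differ by one tick, which is invisible in most games and fatal in the few that depend on the exact moment an interrupt arrives.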

The bsnes approach is: code specialized to handle the system it is emulating. The MAME/MESS approach is: hammer all square pegs into round slots. MAME produces unified code that only needs to be understood once to work with thousands of hardware systems. bsnes produces cleaner code, but learning how it all works will only help you with the SNES.

> I don't think of outdated hardware in the same way.

I certainly understand where you're coming from.

To me, I see that we can either preserve 4,000 games individually; or one system. By preserving one system, we automatically preserve all games. By preserving one game, we've done nothing for the other 3,999. What you see with game-focused emulators is that the big-name games are preserved great. Who here has trouble running Super Mario World or Chrono Trigger or Zelda 3 in any emulator? Now how about the oddballs like Speedy Gonzales, Air Strike Patrol, Mecarobot Golf, Timecop, etc? Sure, those games are shit to most of us. But everyone has an obscure game they love, that nobody else has heard of. For me, that SNES game is Jaki Crush.

I sincerely doubt that. And even if they did, they'd never share it or release new hardware designs anyway. What good is the existence of Starfox 2 to everyone if only Nintendo ever had the one copy of it?

> Actually we do round pi down because irrational numbers are 'inconvenient'

I was meaning all the way down to 3, but yeah, the line was kind of cheesy.

> I'm torn. I totally respect the effort and the output, and the article is interesting... but it's video games. Not nearly as important as the "life-or-death" tone of the article seems to imply.

This is one of the hardest things to explain to people.

I can write an article with a, "hey this is great but eh who really cares?" vibe to it. And with that, why would I expect anyone to care?

A penny on the ground is garbage to most, but a mint 1939 C-stamped wheat penny (making this up) can be priceless to someone else.

This is most definitely a hobby to me. I also study Japanese, play the guitar, watch a lot of Sci-Fi, enjoy cooking and fine dining, draw a bit, and work on other programming projects like lightweight alternatives to SDL and Qt, and so forth. It's really not life or death to me, it's just my hobby. I know I find it more important than the average person, but I'm honestly not obsessed with it.

> It would also be possible to implement a SNES emulator in an FPGA, which would give much more predictable results at a lower clock speed.

I've wanted to do that for a long time. An FPGA would indeed be a far more faithful and consistent representation, but unfortunately then only a dozen people could ever enjoy the result.

> Is there any information on when bsnes will be available for the PS3 or 360?

We are capped at about 50-55fps on the PS3, sadly. An in-order PowerPC core really needs code hand-crafted for it: branch predictions, unaligned memory, uint8 types, and cache are all far more expensive there. The SPEs are also useless for our purposes. I would love to work on a PS3, if Sony weren't so hostile toward potential indie developers. Not willing to jump through hoops and use stolen devkits to do it, though.

> Byuu is also working on a private archive of SNES game boxes, manuals, cartridge labels, etc, before they are lost to time.

Indeed, and that is a very expensive and difficult project. I hope to recover losses upon selling the scanned set, but yeah. I am having great difficulty in finding good condition copies of certain boxes and manuals, and I've only targeted the US set so far.

It seems that I cannot find anyone with complete sets that are willing to let me scan them in. I find it very unfortunate that most are content to completely ignore the boxes, manuals, cartridge labels, maps, guides, etc that came with our games.

> See, I'd always pointed to ZSNES as an example of a great emulator.

Yes, it's designed to run the most popular games the best. Hence I very often hear that it's 'perfect'. See the ZSNES SVN bug reports list on page 2, though.

Supported 7-zip, multi-archive files, an integrated database of all known Game Genie/PAR cheat codes, pixel shaders, far more software video filters, all the color adjustments you could think of, overscan cropping, widescreen scaling, full GUI theming support, custom graphic borders in fullscreen mode, screen shots and metadata in the file load window, multiple keyboard+mouse support with crazy mapping capabilities (you could play Mario with only a mouse, or map fullscreen toggle to Shift+Left Click+Joypad button B, etc), a state manager that gives you descriptions for each save, rewind support, multiple emulation profiles that would trade accuracy for speed, hotkey bindings for absolutely everything, a really powerful debugger, all kinds of stuff. Almost all of it not supported elsewhere.

It didn't really help at all, it just distracted from my initial goal. Being the only developer, that was a really big deal.

I would have loved even more technical detail.. maybe some assembly code snippets or something.

I think people are missing the point of the article in complaining about the need to play these on iOS/Android, etc. The value of these games is not necessarily in playing them; it's in preserving them, and the author recognizes that preservation is important. You can't place portability above preservation. Both are important, but portability is useless without preservation.

"And Starfox suffers from significant slowdown issues throughout the game. These are certainly not desirable attributes, but they are correct nonetheless. We wouldn't round pi down to 3 simply because irrational numbers are inconvenient, right?"

I stopped reading right here.

Why? It would be rewriting history to change that. Imperfection is a part of everything, and it's part of the experience.

I'm torn. I totally respect the effort and the output, and the article is interesting... but it's video games. Not nearly as important as the "life-or-death" tone of the article seems to imply.

That's not to denigrate the actual work that goes into something like this, but get a little perspective. Your entertainment desires aren't exactly the stuff of history no matter how much you love SNES roms.

People said largely the same thing about the work of Vincent van Gogh while he was still alive. The presence of his works in the world's most famous museums (and four of them on my own living room wall) would suggest that none of us is equipped to know exactly what will and will not become "the stuff of history."

When we're talking 3GHz, what does that actually mean? Do improvements to CPUs (a 3GHz P4 core doesn't match a 3GHz i7 core) matter at all, or would building CPUs geared towards emulation help? Could it be easily threaded and some of the upcoming 8+ core chips make incredibly light work of emulation?

I've been thinking for years of emulation like DICE uses (transistor level emulation), but I had no idea it would be that computationally expensive. It must be similar to software that Intel or AMD would use to test and debug CPUs before putting them into production.

Also, I loved the idea of scanning the physical chips and emulating them that way, it's even closer to virtual hardware! You'd almost think that all you'd need was the scan, no custom code at all, plug different processor scans in and out since all you'd really need code for was following the flow of electrons. Very exciting!

I'm torn. I totally respect the effort and the output, and the article is interesting... but it's video games. Not nearly as important as the "life-or-death" tone of the article seems to imply.

That's not to denigrate the actual work that goes into something like this, but get a little perspective. Your entertainment desires aren't exactly the stuff of history no matter how much you love SNES roms.

It's a preservation thing. Why wouldn't we want to preserve accurately the history of a medium that will only get more robust and interactive?

Just because paintings are more intricate now doesn't mean the cave drawings we unearth are any less art. That's how these games may look someday and they deserve to be preserved as accurately as possible.

This reminds me of the work that was done in MAME, the arcade emulator. A lot of very smart people spent a great deal of time reverse engineering the hardware to preserve it. I was involved in the project for a couple years in the early 2000s and I was amazed at the diversity of talent it took to make the project successful. Fun times.

With companies re-releasing games I am surprised the emulation market hasn't dried up.

Mostly because they have not re-released the whole library. They have barely scratched the surface, and some games probably won't ever be ported (Mother 3 comes to mind; the best we can do is the fan translation).

I found BSNES when I was looking for a better emulator to play Super Mario RPG. There were issues with using the "newer" version of ZSNES (the older version of which I had used for many years) so I had to keep the older one around. Keeping old software around is very unsettling... so when I found this I gave it a try and I have never looked back. It is very accurate and very fast. This comes into play with things like the Super Jump / Ultra-jump requirements for Mario's final armor, and whether or not the Equip menu would freeze when changing equipment, which could be quite frustrating even with a Save State feature.

So Byuu, you have my support and I have recommended BSNES to anyone looking for an emulator.

I agree that we need to emulate these old systems as close as possible for history's sake, but with the huge push towards emus on iOS/Android (as Exelius pointed out) I think it's going to be a tough road...

Quote:

Take the case of DICE, the digital integrated circuit emulator. Here is an emulator that works at the transistor level for absolutely perfect recreation of the very first video games ever created.

Bah -- come back to me when you're emulating at the atomic level, and then you'll have "perfect" emulation

You're not alone working on accurate emulation. I'm part of the openMSX team and we're doing similar things.

Central in openMSX is the concept of EmuTime, which is a timestamp in emulated time. Every I/O operation includes such a timestamp. So for example if a register of a peripheral chip is read or written, the peripheral will first update its internal state to the given EmuTime stamp and then it will process the register read or write action. This is probably what you call "just-in-time synchronization".
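The timestamped access pattern described above might look roughly like this in Python. It is a sketch of the idea only and makes no claim about openMSX's real classes or method names: `Timer`, `sync`, `read_counter`, and `write_counter` are hypothetical, and the peripheral is assumed to count up once per unit of emulated time.

```python
class Timer:
    """Peripheral that advances its state only when someone accesses it."""

    def __init__(self):
        self.last_sync = 0   # emulated time of the last catch-up
        self.counter = 0     # 8-bit counter, one increment per time unit
        self.sync_calls = 0

    def sync(self, emu_time):
        # Catch up from the last synchronized time to the given timestamp.
        self.sync_calls += 1
        elapsed = emu_time - self.last_sync
        self.counter = (self.counter + elapsed) & 0xFF
        self.last_sync = emu_time

    def read_counter(self, emu_time):
        self.sync(emu_time)
        return self.counter

    def write_counter(self, emu_time, value):
        self.sync(emu_time)
        self.counter = value & 0xFF


timer = Timer()
a = timer.read_counter(100)   # first access: catch up 100 units at once
timer.write_counter(150, 0)   # reset the counter at time 150
b = timer.read_counter(160)   # 10 units after the reset
print(a, b, timer.sync_calls)
```

The peripheral does no work between accesses, yet every read or write sees exactly the state it would have had if the device had been stepped one unit at a time.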

We also have an internal scheduler on which devices can register when they need to initiate communication with the outside world, for example requesting an interrupt (emulated machine event) or deliver a rendered frame (emulator event). Is this what you call "cooperative multithreading"?
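A scheduler of the kind described here can be sketched with a priority queue. Again this is an assumption-laden illustration, not openMSX code: the `Scheduler` class, its `register` and `run_until` methods, and the event names are hypothetical.

```python
import heapq


class Scheduler:
    def __init__(self):
        self.queue = []    # heap of (emulated time, sequence, callback)
        self.sequence = 0  # tie-breaker: keeps registration order stable
        self.now = 0

    def register(self, emu_time, callback):
        heapq.heappush(self.queue, (emu_time, self.sequence, callback))
        self.sequence += 1

    def run_until(self, emu_time):
        # Fire every event due at or before emu_time, earliest first.
        while self.queue and self.queue[0][0] <= emu_time:
            when, _, callback = heapq.heappop(self.queue)
            self.now = when
            callback(when)
        self.now = emu_time


events = []
scheduler = Scheduler()
scheduler.register(300, lambda t: events.append(("frame", t)))
scheduler.register(120, lambda t: events.append(("irq", t)))

scheduler.run_until(200)
after_first = list(events)
scheduler.run_until(400)
print(after_first, events)
```

Devices register the moment they next need attention, and the main loop only has to run the processor up to the earliest such moment before handing control over.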

The MSX is a relatively light system to accurately emulate because there is not a lot of communication going on: there is no DMA and no shared memory. Therefore openMSX can run on mobile systems like the Nokia N900 or the Dingoo A320 at usable frame rates, without sacrificing accuracy. The most demanding peripherals to emulate are some of the sound chips (FM synthesis of Yamaha OPL1/OPLL/OPL4) and the video command engine. We have special cases to reduce the amount of synchronization required, for example if the command engine writes to an area of the video RAM that will not be rasterized this frame, there is no need to sync command engine writes with the rasterization process.

openMSX is certainly not 100% accurate yet. The main bottleneck is reverse engineering accurate specs of the hardware: the documentation that was written by the manufacturers states how you can use the chip, but not how the chip works internally. To emulate accurately, you need more information than can be found in the official data books. So far we've reverse engineered with MSX (Z80) assembly test programs and logic analyzers. We would be interested in decapping the OPLL (YM2413), since it contains parameters for the 15 built-in FM instruments that are very hard to obtain in any other way (people have tried by ear, but they end up with slightly different results).

"And Starfox suffers from significant slowdown issues throughout the game. These are certainly not desirable attributes, but they are correct nonetheless. We wouldn't round pi down to 3 simply because irrational numbers are inconvenient, right?"

Actually we do round pi down because irrational numbers are 'inconvenient'. However I did not stop reading because of a bad analogy... good article. I've been using ZSNES almost forever now, but I will likely give BSNES a spin in the near future.

I've done quite a bit of coding and I have yet to see anyone ever round PI down to "3." I mean, maybe at the absolute laziest "3.14" for convenience' sake, but I don't think I've ever seen anyone use PI at integer precision... We "round PI down to some 'convenient' precision," but we don't typically, if ever, round PI down to "3."