Unfortunately, it seems to have devolved into mostly a get-rich-quick scheme for nerds, and by nearly any measure it’s turning into a spectacular catastrophe. Its “success” is measured in how much a bitcoin is worth in US dollars, which is pretty close to an admission from its own investors that its only value is in converting back to “real” money — all while that same “success” is making it less useful as a distinct currency.

Blah, blah, everyone already knows this.

What concerns me slightly more is the gold rush hype cycle, which is putting cryptocurrency and “blockchain” in the news and lending it all legitimacy. People have raked in millions of dollars on ICOs of novel coins I’ve never heard mentioned again. (Note: again, that value is measured in dollars.) Most likely, none of the investors will see any return whatsoever on that money. They can’t, really, unless a coin actually takes off as a currency, and that seems at odds with speculative investing since everyone either wants to hoard or ditch their coins. When the coins have no value themselves, the money can only come from other investors, and eventually the hype winds down and you run out of other investors.

I fear this will hurt a lot of people before it’s over, so I’d like for it to be over as soon as possible.

That said, the hype itself has gotten way out of hand too. First it was the obsession with “blockchain” like it’s a revolutionary technology, but hey, Git is a fucking blockchain. The novel part is the way it handles distributed consensus (which in Git is basically left for you to figure out), and that’s uniquely important to currency because you want to be pretty sure that money doesn’t get duplicated or lost when moved around.

But now we have startups trying to use blockchains for website backends and file storage and who knows what else? Why? What advantage does this have? When you say “blockchain”, I hear “single Git repository” — so when you say “email on the blockchain”, I have an aneurysm.

Bitcoin seems to have sparked imagination in large part because it’s decentralized, but I’d argue it’s actually a pretty bad example of a decentralized network, since people keep forking it. The ability to fork is a feature, sure, but the trouble here is that the Bitcoin family has no notion of federation — there is one canonical Bitcoin ledger and it has no notion of communication with any other. That’s what you want for currency, not necessarily other applications. (Bitcoin also incentivizesfrivolous forking by giving the creator an initial pile of coins to keep and sell.)

And federation is much more interesting than decentralization! Federation gives us email and the web. Federation means I can set up my own instance with my own rules and still be able to meaningfully communicate with the rest of the network. Federation has some amount of tolerance for changes to the protocol, so such changes are more flexible and rely more heavily on consensus.

Federation is fantastic, and it feels like a massive tragedy that this rekindled interest in decentralization is mostly focused on peer-to-peer networks, which do little to address our current problems with centralized platforms.

Again, the tech is cool and all, but the marketing hype is getting way out of hand.

Maybe what I really want from 2018 is less marketing?

For one, I’ve seen a huge uptick in uncritically referring to any software that creates or classifies creative work as “AI”. Can we… can we not. It’s not AI. Yes, yes, nerds, I don’t care about the hair-splitting about the nature of intelligence — you know that when we hear “AI” we think of a human-like self-aware intelligence. But we’re applying it to stuff like a weird dog generator. Or to whatever neural network a website threw into production this week.

And this is dangerously misleading — we already had massive tech companies scapegoating The Algorithm™ for the poor behavior of their software, and now we’re talking about those algorithms as though they were self-aware, untouchable, untameable, unknowable entities of pure chaos whose decisions we are arbitrarily bound to. Ancient, powerful gods who exist just outside human comprehension or law.

It’s weird to see this stuff appear in consumer products so quickly, too. It feels quick, anyway. The latest iPhone can unlock via facial recognition, right? I’m sure a lot of effort was put into ensuring that the same person’s face would always be recognized… but how confident are we that other faces won’t be recognized? I admit I don’t follow all this super closely, so I may be imagining a non-problem, but I do know that humans are remarkably bad at checking for negative cases.

Hell, take the recurring problem of major platforms like Twitter and YouTube classifying anything mentioning “bisexual” as pornographic — because the word is also used as a porn genre, and someone threw a list of porn terms into a filter without thinking too hard about it. That’s just a word list, a fairly simple thing that any human can review; but suddenly we’re confident in opaque networks of inferred details?

I don’t know. “Traditional” classification and generation are much more comforting, since they’re a set of fairly abstract rules that can be examined and followed. Machine learning, as I understand it, is less about rules and much more about pattern-matching; it’s built out of the fingerprints of the stuff it’s trained on. Surely that’s just begging for tons of edge cases. They’re practically made of edge cases.

I’m reminded of a point I saw made a few days ago on Twitter, something I’d never thought about but should have. TurnItIn is a service for universities that checks whether students’ papers match any others, in order to detect cheating. But this is a paid service, one that fundamentally hinges on its corpus: a large collection of existing student papers. So students pay money to attend school, where they’re required to let their work be given to a third-party company, which then profits off of it? What kind of a goofy business model is this?

And my thoughts turn to machine learning, which is fundamentally different from an algorithm you can simply copy from a paper, because it’s all about the training data. And to get good results, you need a lot of training data. Where is that all coming from? How many for-profit companies are setting a neural network loose on the web — on millions of people’s work — and then turning around and selling the result as a product?

This is really a question of how intellectual property works in the internet era, and it continues our proud decades-long tradition of just kinda doing whatever we want without thinking about it too much. Nothing if not consistent.

A bit tougher, since computers are pretty alright now and everything continues to chug along. Maybe we should just quit while we’re ahead. There’s some real pie-in-the-sky stuff that would be nice, but it certainly won’t happen within a year, and may never happen except in some horrific Algorithmic™ form designed by people that don’t know anything about the problem space and only works 60% of the time but is treated as though it were bulletproof.

The giants are getting more giant. Maybe too giant? Granted, it could be much worse than Google and Amazon — it could be Apple!

Amazon has its own delivery service and brick-and-mortar stores now, as well as providing the plumbing for vast amounts of the web. They’re not doing anything particularly outrageous, but they kind of loom.

Ad company Google just put ad blocking in its majority-share browser — albeit for the ambiguously-noble goal of only blocking obnoxious ads so that people will be less inclined to install a blanket ad blocker.

Twitter is kind of a nightmare but no one wants to leave. I keep trying to use Mastodon as well, but I always forget about it after a day, whoops.

Facebook sounds like a total nightmare but no one wants to leave that either, because normies don’t use anything else, which is itself direly concerning.

IRC is rapidly bleeding mindshare to Slack and Discord, both of which are far better at the things IRC sadly never tried to do and absolutely terrible at the exact things IRC excels at.

The problem is the same as ever: there’s no incentive to interoperate. There’s no fundamental technical reason why Twitter and Tumblr and MySpace and Facebook can’t intermingle their posts; they just don’t, because why would they bother? It’s extra work that makes it easier for people to not use your ecosystem.

I don’t know what can be done about that, except that hope for a really big player to decide to play nice out of the kindness of their heart. The really big federated success stories — say, the web — mostly won out because they came along first. At this point, how does a federated social network take over? I don’t know.

I… don’t really have a solid grasp on what’s happening in tech socially at the moment. I’ve drifted a bit away from the industry part, which is where that all tends to come up. I have the vague sense that things are improving, but that might just be because the Rust community is the one I hear the most about, and it puts a lot of effort into being inclusive and welcoming.

So… more projects should be like Rust? Do whatever Rust is doing? And not so much what Linus is doing.

I haven’t heard this brought up much lately, but it would still be nice to see. The Bay Area runs on open source and is raking in zillions of dollars on its back; pump some of that cash back into the ecosystem, somehow.

I’ve seen a couple open source projects on Patreon, which is fantastic, but feels like a very small solution given how much money is flowing through the commercial tech industry.

One might wonder where the money to host a website comes from, then? I don’t know. Maybe we should loop this in with the above thing and find a more informal way to pay people for the stuff they make when we find it useful, without the financial and cognitive overhead of A Transaction or Giving Someone My Damn Credit Card Number. You know, something like Bitco— ah, fuck.

Last fall, Epic Games released Fortnite’s free-to-play “Battle Royale” game mode for the PC and other platforms, generating massive interest among gamers.

This also included thousands of cheaters, many of whom were subsequently banned. Epic Games then went a step further by taking several cheaters to court for copyright infringement.

In the months that have passed several cases have been settled with undisclosed terms, but it appears that not all defendants are easy to track down. In at least two cases, Epic had to retain the services of private investigators to locate their targets.

In a case filed in North Carolina, the games company was unable to serve the defendant (now identified as B.B) so they called in the help of Klatt Investigations, with success.

“[A]fter having previously engaged two other process servers that were unable to locate and successfully serve B.B., Epic engaged Klatt Investigations, a Canadian firm that provides various services related to the private service of process in civil matters.

“In this case, we engaged Klatt Investigations to locate and effect service of process by personal service on Defendant,” Epic informs the court.

As Epic Games didn’t know the age of the defendant beforehand they chose to approach the person as a minor, which turned out to be a wise choice. The alleged cheater indeed appears to be a minor, so both the Defendant and Defendant’s mother were served.

Based on this new information, Epic Games asked the court to redact any court documents that reveal personal information of the defendant, which includes his or her full name.

Epic’s request to seal

This is not the first time Epic Games has used a private investigator to locate a defendant. It hired S&H Investigative Services in another widely reported case, where the defendant also turned out to be a minor.

It’s fair to say that of all video games anti-piracy technologies, Denuvo is perhaps the most hated of recent times. That hatred unsurprisingly stems from both its success and complexity.

Those with knowledge of the system say it’s fiendishly difficult to defeat but in recent times, cracks have been showing. In 2017, various iterations of the anti-tamper system were defeated by several cracking groups, much to the delight of the pirate masses.

Now, however, a new development has the potential to herald a new lease of life for the Austria-based anti-piracy company. A few moments ago it was revealed that the company has been bought by Irdeto, a global anti-piracy company with considerable heritage and resources.

“Irdeto has acquired Denuvo, the world leader in gaming security, to provide anti-piracy and anti-cheat solutions for games on desktop, mobile, console and VR devices,” Irdeto said in a statement.

“Denuvo provides technology and services for game publishers and platforms, independent software vendors, e-publishers and video publishers across the globe. Current Denuvo customers include Electronic Arts, UbiSoft, Warner Bros and Lionsgate Entertainment, with protection provided for games such as Star Wars Battlefront II, Football Manager, Injustice 2 and others.”

Irdeto says that Denuvo will “continue to operate as usual” with all of its staff retained – a total of 45 across Austria, Poland, the Czech Republic, and the US. Denuvo headquarters in Salzburg, Austria, will also remain intact along with its sales operations.

“The success of any game title is dependent upon the ability of the title to operate as the publisher intended,” says Irdeto CEO Doug Lowther.

“As a result, protection of both the game itself and the gaming experience for end users is critical. Our partnership brings together decades of security expertise under one roof to better address new and evolving security threats. We are looking forward to collaborating as a team on a number of initiatives to improve our core technology and services to better serve our customers.”

Denuvo was founded relatively recently in 2013 and employs less than 50 people. In contrast, Irdeto’s roots go all the way back to 1969 and currently has almost 1,000 staff. It’s a subsidiary of South Africa-based Internet and media group Naspers, a corporate giant with dozens of notable companies under its control.

While Denuvo is perhaps best known for its anti-piracy technology, Irdeto is also placing emphasis on the company’s ability to hinder cheating in online multi-player gaming environments. This has become a hot topic recently, with several lawsuits filed in the US by companies including Blizzard and Epic.

Denuvo CEO Reinhard Blaukovitsch

“Hackers and cybercriminals in the gaming space are savvy, and always have been. It is critical to implement robust security strategies to combat the latest gaming threats and protect the investment in games. Much like the movie industry, it’s the only way to ensure that great games continue to get made,” says Denuvo CEO Reinhard Blaukovitsch.

“In joining with Irdeto, we are bringing together a unique combination of security expertise, technology and enhanced piracy services to aggressively address security challenges that customers and gamers face from hackers.”

While it seems likely that the companies have been in negotiations for some, the timing of this announcement also coincides with negative news for Denuvo.

Yesterday it was revealed that the latest variant of its anti-tamper technology – Denuvo v4.8 – had been defeated by online cracking group CPY (Conspiracy). Version 4.8 had been protecting Sonic Forces since its release early November 2017 but the game was leaked out onto the Internet late Sunday with all protection neutralized.

It’s also noteworthy that Irdeto supplied behind-the-scenes support in two of the largest IPTV provider raids of recent times, one focused on Spain in 2017 and more recently in Cyprus, Bulgaria, Greece and the Netherlands (1,2,3).

This is all aimed at frivolous pursuits like video games. Hell, even video games where money is at stake should be deferring to someone who knows way more than I do. Otherwise you might find out that your deck shuffles in your poker game are woefully inadequate and some smartass is cheating you out of millions. (If your random number generator has fewer than 226 bits of state, it can’t even generate every possible shuffling of a deck of cards!)

Say you want to pitch up a sound by a random amount, perhaps up to an octave. Your audio API probably has a way to do this that takes a pitch multiplier, where I say “probably” because that’s how the only audio API I’ve used works.

Easy peasy. If 1 is unchanged and 2 is pitched up by an octave, then all you need is rand() + 1. Right?

No! Pitch is exponential — within the same octave, the “gap” between C and C♯ is about half as big as the gap between B and the following C. If you pick a pitch multiplier uniformly, you’ll have a noticeable bias towards the higher pitches.

One octave corresponds to a doubling of pitch, so if you want to pick a random note, you want 2 ** rand().

For two dimensions, you can just pick a random angle with rand() * TAU.

If you want a vector rather than an angle, or if you want a random direction in three dimensions, it’s a little trickier. You might be tempted to just pick a random point where each component is rand() * 2 - 1 (ranging from −1 to 1), but that’s not quite right. A direction is a point on the surface (or, equivalently, within the volume) of a sphere, and picking each component independently produces a point within the volume of a cube; the result will be a bias towards the corners of the cube, where there’s much more extra volume beyond the sphere.

No? Well, just trust me. I don’t know how to make a diagram for this.

Anyway, you could use the Pythagorean theorem a few times and make a huge mess of things, or it turns out there’s a really easy way that even works for two or four or any number of dimensions. You pick each coordinate from a Gaussian (normal) distribution, then normalize the resulting vector. In other words, using Python’s random module:

Note that it is possible to get zero (or close to it) for every component, in which case the result is nonsense. You can re-roll all the components if necessary; just check that the magnitude (or its square) is less than some epsilon, which is equivalent to throwing away a tiny sphere at the center and shouldn’t affect the distribution.

Since I brought it up: the Gaussian distribution is a pretty nice one for choosing things in some range, where the middle is the common case and should appear more frequently.

That said, I never use it, because it has one annoying drawback: the Gaussian distribution has no minimum or maximum value, so you can’t really scale it down to the range you want. In theory, you might get any value out of it, with no limit on scale.

In practice, it’s astronomically rare to actually get such a value out. I did a hundred million trials just to see what would happen, and the largest value produced was 5.8.

But, still, I’d rather not knowingly put extremely rare corner cases in my code if I can at all avoid it. I could clamp the ends, but that would cause unnatural bunching at the endpoints. I could reroll if I got a value outside some desired range, but I prefer to avoid rerolling when I can, too; after all, it’s still (astronomically) possible to have to reroll for an indefinite amount of time. (Okay, it’s really not, since you’ll eventually hit the period of your PRNG. Still, though.) I don’t bend over backwards here — I did just say to reroll when picking a random direction, after all — but when there’s a nicer alternative I’ll gladly use it.

And lo, there is a nicer alternative! Enter the beta distribution. It always spits out a number in [0, 1], so you can easily swap it in for the standard normal function, but it takes two “shape” parameters α and β that alter its behavior fairly dramatically.

With α = β = 1, the beta distribution is uniform, i.e. no different from rand(). As α increases, the distribution skews towards the right, and as β increases, the distribution skews towards the left. If α = β, the whole thing is symmetric with a hump in the middle. The higher either one gets, the more extreme the hump (meaning that value is far more common than any other). With a little fiddling, you can get a number of interesting curves.

Screenshots don’t really do it justice, so here’s a little Wolfram widget that lets you play with α and β live:

Note that if α = 1, then 1 is a possible value; if β = 1, then 0 is a possible value. You probably want them both greater than 1, which clamps the endpoints to zero.

Also, it’s possible to have either α or β or both be less than 1, but this creates very different behavior: the corresponding endpoints become poles.

Anyway, something like α = β = 3 is probably close enough to normal for most purposes but already clamped for you. And you could easily replicate something like, say, NetHack’s incredibly bizarre rnz function.

Say you want some event to have an 80% chance to happen every second. You (who am I kidding, I) might be tempted to do something like this:

1
2

ifrandom()<0.8*dt:do_thing()

In an ideal world, dt is always the same and is equal to 1 / f, where f is the framerate. Replace that 80% with a variable, say P, and every tic you have a P / f chance to do the… whatever it is.

Each second, f tics pass, so you’ll make this check f times. The chance that any check succeeds is the inverse of the chance that every check fails, which is \(1 – \left(1 – \frac{P}{f}\right)^f\).

For P of 80% and a framerate of 60, that’s a total probability of 55.3%. Wait, what?

Consider what happens if the framerate is 2. On the first tic, you roll 0.4 twice — but probabilities are combined by multiplying, and splitting work up by dt only works for additive quantities. You lose some accuracy along the way. If you’re dealing with something that multiplies, you need an exponent somewhere.

But in this case, maybe you don’t want that at all. Each separate roll you make might independently succeed, so it’s possible (but very unlikely) that the event will happen 60 times within a single second! Or 200 times, if that’s someone’s framerate.

If you explicitly want something to have a chance to happen on a specific interval, you have to check on that interval. If you don’t have a gizmo handy to run code on an interval, it’s easy to do yourself with a time buffer:

Using while means rolls still happen even if you somehow skipped over an entire second.

(For the curious, and the nerds who already noticed: the expression \(1 – \left(1 – \frac{P}{f}\right)^f\) converges to a specific value! As the framerate increases, it becomes a better and better approximation for \(1 – e^{-P}\), which for the example above is 0.551. Hey, 60 fps is pretty accurate — it’s just accurately representing something nowhere near what I wanted. Er, you wanted.)

Of course, you can fuss with the classic [0, 1] uniform value however you want. If I want a bias towards zero, I’ll often just square it, or multiply two of them together. If I want a bias towards one, I’ll take a square root. If I want something like a Gaussian/normal distribution, but with clearly-defined endpoints, I might add together n rolls and divide by n. (The normal distribution is just what you get if you roll infinite dice and divide by infinity!)

It’d be nice to be able to understand exactly what this will do to the distribution. Unfortunately, that requires some calculus, which this post is too small to contain, and which I didn’t even know much about myself until I went down a deep rabbit hole while writing, and which in many cases is straight up impossible to express directly.

Here’s the non-calculus bit. A source of randomness is often graphed as a PDF — a probability density function. You’ve almost certainly seen a bell curve graphed, and that’s a PDF. They’re pretty nice, since they do exactly what they look like: they show the relative chance that any given value will pop out. On a bog standard bell curve, there’s a peak at zero, and of course zero is the most common result from a normal distribution.

(Okay, actually, since the results are continuous, it’s vanishingly unlikely that you’ll get exactly zero — but you’re much more likely to get a value near zero than near any other number.)

For the uniform distribution, which is what a classic rand() gives you, the PDF is just a straight horizontal line — every result is equally likely.

If there were a calculus bit, it would go here! Instead, we can cheat. Sometimes. Mathematica knows how to work with probability distributions in the abstract, and there’s a free web version you can use. For the example of squaring a uniform variable, try this out:

(The \[Distributed] is a funny tilde that doesn’t exist in Unicode, but which Mathematica uses as a first-class operator. Also, press shiftEnter to evaluate the line.)

This will tell you that the distribution is… \(\frac{1}{2\sqrt{u}}\). Weird! You can plot it:

1

Plot[%, {u, 0, 1}]

(The % refers to the result of the last thing you did, so if you want to try several of these, you can just do Plot[PDF[…], u] directly.)

The resulting graph shows that numbers around zero are, in fact, vastly — infinitely — more likely than anything else.

What about multiplying two together? I can’t figure out how to get Mathematica to understand this, but a great amount of digging revealed that the answer is -ln x, and from there you can plot them both on Wolfram Alpha. They’re similar, though squaring has a much better chance of giving you high numbers than multiplying two separate rolls — which makes some sense, since if either of two rolls is a low number, the product will be even lower.

What if you know the graph you want, and you want to figure out how to play with a uniform roll to get it? Good news! That’s a whole thing called inverse transform sampling. All you have to do is take an integral. Good luck!

This is all extremely ridiculous. New tactic: Just Simulate The Damn Thing. You already have the code; run it a million times, make a histogram, and tada, there’s your PDF. That’s one of the great things about computers! Brute-force numerical answers are easy to come by, so there’s no excuse for producing something like rnz. (Though, be sure your histogram has sufficiently narrow buckets — I tried plotting one for rnz once and the weird stuff on the left side didn’t show up at all!)

By the way, I learned something from futzing with Mathematica here! Taking the square root (to bias towards 1) gives a PDF that’s a straight diagonal line, nothing like the hyperbola you get from squaring (to bias towards 0). How do you get a straight line the other way? Surprise: \(1 – \sqrt{1 – u}\).

I don’t claim to have a very firm grasp on this, but I had a hell of a time finding it written out clearly, so I might as well write it down as best I can. This was a great excuse to finally set up MathJax, too.

Say \(u(x)\) is the PDF of the original distribution and \(u\) is a representative number you plucked from that distribution. For the uniform distribution, \(u(x) = 1\). Or, more accurately,

Remember that \(x\) here is a possible outcome you want to know about, and the PDF tells you the relative probability that a roll will be near it. This PDF spits out 1 for every \(x\), meaning every number between 0 and 1 is equally likely to appear.

We want to do something to that PDF, which creates a new distribution, whose PDF we want to know. I’ll use my original example of \(f(u) = u^2\), which creates a new PDF\(v(x)\).

The trick is that we need to work in terms of the cumulative distribution function for \(u\). Where the PDF gives the relative chance that a roll will be (“near”) a specific value, the CDF gives the relative chance that a roll will be less than a specific value.

The conventions for this seem to be a bit fuzzy, and nobody bothers to explain which ones they’re using, which makes this all the more confusing to read about… but let’s write the CDF with a capital letter, so we have \(U(x)\). In this case, \(U(x) = x\), a straight 45° line (at least between 0 and 1). With the definition I gave, this should make sense. At some arbitrary point like 0.4, the value of the PDF is 1 (0.4 is just as likely as anything else), and the value of the CDF is 0.4 (you have a 40% chance of getting a number from 0 to 0.4).

Calculus ahoy: the PDF is the derivative of the CDF, which means it measures the slope of the CDF at any point. For \(U(x) = x\), the slope is always 1, and indeed \(u(x) = 1\). See, calculus is easy.

Okay, so, now we’re getting somewhere. What we want is the CDF of our new distribution, \(V(x)\). The CDF is defined as the probability that a roll \(v\) will be less than \(x\), so we can literally write:

$$V(x) = P(v \le x)$$

(This is why we have to work with CDFs, rather than PDFs — a PDF gives the chance that a roll will be “nearby,” whatever that means. A CDF is much more concrete.)

What is \(v\), exactly? We defined it ourselves; it’s the do something applied to a roll from the original distribution, or \(f(u)\).

$$V(x) = P\!\left(f(u) \le x\right)$$

Now the first tricky part: we have to solve that inequality for \(u\), which means we have to do something, backwards to \(x\).

$$V(x) = P\!\left(u \le f^{-1}(x)\right)$$

Almost there! We now have a probability that \(u\) is less than some value, and that’s the definition of a CDF!

$$V(x) = U\!\left(f^{-1}(x)\right)$$

Hooray! Now to turn these CDFs back into PDFs, all we need to do is differentiate both sides and use the chain rule. If you never took calculus, don’t worry too much about what that means!

Wait! Where did that absolute value come from? It takes care of whether \(f(x)\) increases or decreases. It’s the least interesting part here by far, so, whatever.

There’s one more magical part here when using the uniform distribution — \(u(\dots)\) is always equal to 1, so that entire term disappears! (Note that this only works for a uniform distribution with a width of 1; PDFs are scaled so the entire area under them sums to 1, so if you had a rand() that could spit out a number between 0 and 2, the PDF would be \(u(x) = \frac{1}{2}\).)

$$v(x) = \left|\frac{d}{dx}f^{-1}(x)\right|$$

So for the specific case of modifying the output of rand(), all we have to do is invert, then differentiate. The inverse of \(f(u) = u^2\) is \(f^{-1}(x) = \sqrt{x}\) (no need for a ± since we’re only dealing with positive numbers), and differentiating that gives \(v(x) = \frac{1}{2\sqrt{x}}\). Done! This is also why square root comes out nicer; inverting it gives \(x^2\), and differentiating that gives \(2x\), a straight line.

Incidentally, that method for turning a uniform distribution into any distribution — inverse transform sampling — is pretty much the same thing in reverse: integrate, then invert. For example, when I saw that taking the square root gave \(v(x) = 2x\), I naturally wondered how to get a straight line going the other way, \(v(x) = 2 – 2x\). Integrating that gives \(2x – x^2\), and then you can use the quadratic formula (or just ask Wolfram Alpha) to solve \(2x – x^2 = u\) for \(x\) and get \(f(u) = 1 – \sqrt{1 – u}\).

Multiply two rolls is a bit more complicated; you have to write out the CDF as an integral and you end up doing a double integral and wow it’s a mess. The only thing I’ve retained is that you do a division somewhere, which then gets integrated, and that’s why it ends up as \(-\ln x\).

And that’s quite enough of that! (Okay but having math in my blog is pretty cool and I will definitely be doing more of this, sorry, not sorry.)

Sometimes, random isn’t actually what you want. We tend to use the word “random” casually to mean something more like chaotic, i.e., with no discernible pattern. But that’s not really random. In fact, given how good humans can be at finding incidental patterns, they aren’t all that unlikely! Consider that when you roll two dice, they’ll come up either the same or only one apart almost half the time. Coincidence? Well, yes.

If you ask for randomness, you’re saying that any outcome — or series of outcomes — is acceptable, including five heads in a row or five tails in a row. Most of the time, that’s fine. Some of the time, it’s less fine, and what you really want is variety. Here are a couple examples and some fairly easy workarounds.

The nature of games is such that NPCs will eventually run out of things to say, at which point further conversation will give the player a short brush-off quip — a slight nod from the designer to the player that, hey, you hit the end of the script.

Some NPCs have multiple possible quips and will give one at random. The trouble with this is that it’s very possible for an NPC to repeat the same quip several times in a row before abruptly switching to another one. With only a few options to choose from, getting the same option twice or thrice (especially across an entire game, which may have numerous NPCs) isn’t all that unlikely. The notion of an NPC quip isn’t very realistic to start with, but having someone repeat themselves and then abruptly switch to something else is especially jarring.

The easy fix is to show the quips in order! Paradoxically, this is more consistently varied than choosing at random — the original “order” is likely to be meaningless anyway, and it already has the property that the same quip can never appear twice in a row.

If you like, you can shuffle the list of quips every time you reach the end, but take care here — it’s possible that the last quip in the old order will be the same as the first quip in the new order, so you may still get a repeat. (Of course, you can just check for this case and swap the first quip somewhere else if it bothers you.)

Random drops are often implemented as a flat chance each time. Maybe enemies have a 5% chance to drop health when they die. Legally speaking, over the long term, a player will see health drops for about 5% of enemy kills.

Over the short term, they may be desperate for health and not survive to see the long term. So you may want to put a thumb on the scale sometimes. Games in the Metroid series, for example, have a somewhat infamous bias towards whatever kind of drop they think you need — health if your health is low, missiles if your missiles are low.

I can’t give you an exact approach to use, since it depends on the game and the feeling you’re going for and the variables at your disposal. In extreme cases, you might want to guarantee a health drop from a tough enemy when the player is critically low on health. (Or if you’re feeling particularly evil, you could go the other way and deny the player health when they most need it…)

The problem becomes a little different, and worse, when the event that triggers the drop is relatively rare. The pathological case here would be something like a raid boss in World of Warcraft, which requires hours of effort from a coordinated group of people to defeat, and which has some tiny chance of dropping a good item that will go to only one of those people. This is why I stopped playing World of Warcraft at 60.

Dialing it back a little bit gives us Enter the Gungeon, a roguelike where each room is a set of encounters and each floor only has a dozen or so rooms. Initially, you have a 1% chance of getting a reward after completing a room — but every time you complete a room and don’t get a reward, the chance increases by 9%, up to a cap of 80%. Once you get a reward, the chance resets to 1%.

The natural question is: how frequently, exactly, can a player expect to get a reward? We could do math, or we could Just Simulate The Damn Thing.

We’ve got kind of a hilly distribution, skewed to the left, which is up in this histogram. Most of the time, a player should see a reward every three to six rooms, which is maybe twice per floor. It’s vanishingly unlikely to go through a dozen rooms without ever seeing a reward, so a player should see at least one per floor.

Of course, this simulated a single continuous playthrough; when starting the game from scratch, your chance at a reward always starts fresh at 1%, the worst it can be. If you want to know about how many rewards a player will get on the first floor, hey, Just Simulate The Damn Thing.

Cool. Though, that’s assuming exactly 12 rooms; it might be worth changing that to pick at random in a way that matches the level generator.

(Enter the Gungeon does some other things to skew probability, which is very nice in a roguelike where blind luck can make or break you. For example, if you kill a boss without having gotten a new gun anywhere else on the floor, the boss is guaranteed to drop a gun.)

Say you have a battle sim where every attack has a 6% chance to land a devastating critical hit. Presumably the same rules apply to both the player and the AI opponents.

Consider, then, that the AI opponents have exactly the same 6% chance to ruin the player’s day. Consider also that this gives them an 0.4% chance to critical hit twice in a row. 0.4% doesn’t sound like much, but across an entire playthrough, it’s not unlikely that a player might see it happen and find it incredibly annoying.

Perhaps it would be worthwhile to explicitly forbid AI opponents from getting consecutive critical hits.

An emerging theme here has been to Just Simulate The Damn Thing. So consider Just Simulating The Damn Thing. Even a simple change to a random value can do surprising things to the resulting distribution, so unless you feel like differentiating the inverse function of your code, maybe test out any non-trivial behavior and make sure it’s what you wanted. Probability is hard to reason about.

Allow your robots to join in the fun this Christmas with a round of Channel 4’s Countdown. https://www.rosietheredrobot.com/2017/12/tea-minus-30.html

Rosie the Red Robot

First, a little bit of backstory. Challenged by his eldest daughter to build a robot, technology-loving Alan got to work building Rosie.

I became (unusually) determined. I wanted to show her what can be done… and the how can be learnt later. After all, there is nothing more exciting and encouraging than seeing technology come alive. Move. Groove. Quite literally.

Originally, Rosie had a Raspberry Pi 3 brain controlling ultrasonic sensors and motors via Python. From there, she has evolved into something much grander, and Alan has documented her upgrades on the Rosie the Red Robot blog. Using GPS trackers and a Raspberry Pi camera module, she became Rosie Patrol, a rolling, walking, interactive bot; then, with further upgrades, the Tea Minus 30 project came to be. Which brings us back to Countdown.

T(ea) minus 30

In case it hasn’t been a big part of your life up until now, Countdown is one of the longest running televisions shows in history, and occupies a special place in British culture. Contestants take turns to fill a board with nine randomly selected vowels and consonants, before battling the Countdown clock to find the longest word they can in the space of 30 seconds.

I’ve had quite a few requests to show just the Countdown clock for use in school activities/own games etc., so here it is! Enjoy! It’s a brand new version too, using the 2010 Office package.

There’s a numbers round involving arithmetic, too – but for now, we’re going to focus on letters and words, because that’s where Rosie’s skills shine.

Using an online resource, Alan created a dataset of the ten thousand most common English words.

Many words, listed in order of common-ness. Alan wrote a Python script to order them alphabetically and by length

Next, Alan wrote a Python script to select nine letters at random, then search the word list to find all the words that could be spelled using only these letters. He used the randint function to select letters from a pre-loaded alphabet, and introduced a requirement to include at least two vowels among the nine letters.

Words that match the available letters are displayed on the screen.

Putting it all together

With the basic game-play working, it was time to bring the project to life. For this, Alan used Rosie’s camera module, along with optical character recognition (OCR) and text-to-speech capabilities.

Alan writes, “Here’s a very amateurish drawing to brainstorm our idea. Let’s call it a design as it makes it sound like we know what we’re doing.”

Alan’s script has Rosie take a photo of the TV screen during the Countdown letters round, then perform OCR using the Google Cloud Vision API to detect the nine letters contestants have to work with. Next, Rosie runs Alan’s code to check the letters against the ten-thousand-word dataset, converts text to speech with Python gTTS, and finally speaks her highest-scoring word via omxplayer.

You can follow the adventures of Rosie the Red Robot on her blog, or follow her on Twitter. And if you’d like to build your own Rosie, Alan has provided code and tutorials for his projects too. Thanks, Alan!

Frustrated by thousands of cheaters who wreak havoc in Fortnite’s “Battle Royale,” game publisher Epic Games decided to take several of them to court.

One of the defendants is Minnesota resident Charles Vraspir, a.k.a. “Joreallean,”

The game publisher accused him of copyright infringement and breach of contract, by injecting unauthorized computer code in order to cheat.

According to Epic’s allegations, Vraspir was banned at least nine times but registered new accounts to continue his cheating. In addition, he was also suspected of having written code for the cheats.

“Defendant’s cheating, and his inducing and enabling of others to cheat, is ruining the game playing experience of players who do not cheat,” Epic games wrote.

While the complaint included all the elements for an extensive legal battle, both sides chose to resolve the case without much of a fight. Yesterday, they informed the court that a settlement had been reached.

Epic Games’ counsel asked the court to enter the agreement as well as a permanent injunction, which both have agreed on.

The proposed injunction, signed today, forbids Vraspir from carrying out any copyright infringements in the future, to destroy all cheats, and to never cheat again.

Among other things, he is prohibited from “creating, writing, developing, advertising, promoting, and/or distributing anything that infringes Epic’s works now or hereafter protected by any of Epic’s copyrights.”

While there is no mention of a settlement fee or fine, Vraspir will have to pay $5,000 if he breaches the agreement.

From the injunction

Based on the swift settlement, it can be assumed that Epic Games is not aiming to bankrupt the cheaters. Instead, it’s likely that the company wants to set an example and deter others from cheating in the future.

In addition to the settlement, Epic Games also responded to the mother of the 14-year-old cheater who was sued in a separate case. After we first covered the news last week it was quickly picked up by mainstream media, and it hasn’t gone unnoticed by the game publisher either.

The mother accused Epic of taking a minor to court and making his personal info known to the public.

In a response this week, the company notes that it had no idea of the age of the defendant when it filed the complaint. In addition, Epic notes that by handing over his full name and address in the unredacted letter, she exposed her son.

The rules dictate that filings mentioning an individual known to be a minor should use the minor’s initials only, not the full name as the mother did. While the mother may have waived this protection with her letter, Epic says it will stick to the initials going forward.

“Although there is an argument that by submitting the Letter to the Court containing Defendant’s name and address, Defendant’s mother waived this protection […] we plan to include only Defendant’s initials or redact his name entirely in all future filings with the Court, including this letter.”

Given the quick settlement in the Vraspir case, it’s likely that the case against the 14-year-old boy will also be resolved without much additional damage. That is, if both sides can come to an agreement.

—

A copy of the stipulation and injunction is available here (pdf). The reply to the mother can be found here (pdf).

It’s never too early for Christmas-themed resources — especially when you want to make the most of them in your school, Code Club or CoderDojo! So here’s the ever-wonderful Laura Sach with an introduction of our newest festive projects.

In the immortal words of Noddy Holder: “it’s Christmaaaaaaasssss!” Well, maybe it isn’t quite Christmas yet, but since the shops have been playing Mariah Carey on a loop since the last pumpkin lantern hit the bargain bin, you’re hopefully well prepared.

To get you in the mood with some festive fun, we’ve put together a selection of seasonal free resources for you. Each project has a difficulty level in line with our Digital Making Curriculum, so you can check which might suit you best. Why not try them out at your local Raspberry Jam, CoderDojo, or Code Club, at school, or even on a cold day at home with a big mug of hot chocolate?

Jazzy jumpers

Jazzy jumpers (Creator level): as a child in the eighties, you’d always get an embarrassing and probably badly sized jazzy jumper at Christmas from some distant relative. Thank goodness the trend has gone hipster and dreadful jumpers are now cool!

This resource shows you how to build a memory game in Scratch where you must remember the colour and picture of a jazzy jumper before recreating it. How many jumpers can you successfully recall in a row?

Sense HAT advent calendar

Sense HAT advent calendar (Builder level): put the lovely lights on your Sense HAT to festive use by creating an advent calendar you can open day by day. However, there’s strictly no cheating with this calendar — we teach you how to use Python to detect the current date and prevent would-be premature peekers!

Press the Enter key to open today’s door:

(Note: no chocolate will be dispensed from your Raspberry Pi. Sorry about that.)

Code a carol

Code a carol (Developer level): Have you ever noticed how much repetition there is in carols and other songs? This resource teaches you how to break down the Twelve days of Christmas tune into its component parts and code it up in Sonic Pi the lazy way: get the computer to do all the repetition for you!

Your browser does not support the audio element.

No musical knowledge required — just follow our lead, and you’ll have yourself a rocking doorbell tune in no time!

Naughty and nice

Naughty and nice (Maker level): Have you been naughty or nice? Find out by using sentiment analysis on your tweets to see what sort of things you’ve been talking about throughout the year. For added fun, why not use your program on the Twitter account of your sibling/spouse/arch nemesis and report their level of naughtiness to Santa with an @ mention?

raspberry_pi is 65.5 percent NICE, with an accuracy of 0.9046692607003891

It’s Christmaaaaaasssss

With the festive season just around the corner, it’s time to get started on your Christmas projects! Whether you’re planning to run your Christmas lights via a phone app, install a home assistant inside an Elf on a Shelf, or work through our Christmas resources, we would like to see what you make. So do share your festive builds with us on social media, or by posting links in the comments.

I play Pokémon Go. (There, I’ve admitted it.) One of the interesting aspects of the game I’ve been watching is how the game’s publisher, Niantec, deals with cheaters.

There are three basic types of cheating in Pokémon Go. The first is botting, where a computer plays the game instead of a person. The second is spoofing, which is faking GPS to convince the game that you’re somewhere you’re not. These two cheats are often used together — and you see the results in the many high-level accounts for sale on the Internet. The third type of cheating is the use of third-party apps like trackers to get extra information about the game.

None of this would matter if everyone played independently. The only reason any player cares about whether other players are cheating is that there is a group aspect of the game: gym battling. Everyone’s enjoyment of that part of the game is affected by cheaters who can pretend to be where they’re not, especially if they have lots of powerful Pokémon that they collected effortlessly.

Niantec has been trying to deal with this problem since the game debuted, mostly by banning accounts when it detects cheating. Its initial strategy was basic — algorithmically detecting impossibly fast travel between physical locations or super-human amounts of playing, and then banning those accounts — with limited success. The limiting factor in all of this is false positives. While Niantec wants to stop cheating, it doesn’t want to block or limit any legitimate players. This makes it a very difficult problem, and contributes to the balance in the attacker/defender arms race.

Recently, Niantic implemented twonewanti-cheating measures. The first is machine learning to detect cheaters. About this, we know little. The second is to limit the functionality of cheating accounts rather than ban them outright, making it harder for cheaters to know when they’ve been discovered.

“This is may very well be the beginning of Niantic’s machine learning approach to active bot countering,” user Dronpes writes on The Silph Road subreddit. “If the parameters for a shadowban are constantly adjusted server-side, as they can now easily be, then Niantic’s machine learning engineers can train their detection (classification) algorithms in ever-improving, ever more aggressive ways, and botters will constantly be forced to re-evaluate what factors may be triggering the detection.”

One of the expected future features in the game is trading. Creating a market for rare or powerful Pokémon would add a huge additional financial incentive to cheat. Unless Niantec can effectively prevent botting and spoofing, it’s unlikely to implement that feature.

Cheating detection in virtual reality games is going to be a constant problem as these games become more popular, especially if there are ways to monetize the results of cheating. This means that cheater detection will continue to be a critical component of these games’ success. Anything Niantec learns in Pokémon Go will be useful in whatever games come next.

Mystic, level 39 — if you must know.

And, yes, I know the game tracks works by tracking your location. I’m all right with that. As I repeatedly say, Internet privacy is all about trade-offs.

Founded in 1991, Epic has developed and published computer games for over a quarter century.

The North Carolina company is known for titles such as Unreal, Gears of War, Infinity Blade, and most recently, the popular co-op survival and building action game Fortnite.

A few weeks ago, Fortnite released the free-to-play “Battle Royale” game mode for the PC and other platforms, generating massive interest from gamers. Unfortunately, this also included thousands of cheaters, many whom have been banned since.

Last week, Epic stressed that addressing Fortnite cheaters is the company’s highest priority, hinting that they wouldn’t stop at banning users.

“We are constantly working against both the cheaters themselves and the cheat providers. And it’s ongoing, we’re exploring every measure to ensure these cheaters are removed and stay removed from Fortnite Battle Royale and the Epic ecosystem,” the company wrote.

It turns out that this wasn’t an idle threat. TorrentFreak has obtained two complaints that were filed in a North Carolina federal court this week, which show that Epic is launching a legal battle against two prolific cheaters.

The two alleged cheaters are identified as Mr. Broom and Mr. Vraspir. Both are accused of violating Fortnite’s terms of service and EULA by cheating. This involves modifying and changing the game’s code, committing copyright infringement in the process.

“The software that Defendant uses to cheat infringes Epic’s copyrights in the game and breaches the terms of the agreements to which Defendant agreed in order to have access to the game,” the company notes.

From the complaints

The two complaints are largely the same and both defendants are accused of ruining the fun for others.

“Nobody likes a cheater. And nobody likes playing with cheaters. These axioms are particularly true in this case. Defendant uses cheats in a deliberate attempt to destroy the integrity of, and otherwise wreak havoc in, the Fortnite game.

“As Defendant intends, this often ruins the game for the other players, and for the many people who watch ‘streamers’,” the complaint adds.

Both defendants are connected to the cheat provider AddictedCheats.net, either as moderators or support personnel. They specifically target streamers and boast about their accomplishments, making comments such as ‘LOL I f*cked them’ after killing them.

According to Epic’s complaint, Vraspir was banned at least nine times but registered new accounts to continue his cheating. He also stands accused of having written code for the cheats.

Broom was banned once and previously stated that he’s also working on his own cheat. He publicly stated that he aims to create “unwanted chaos and disorder” in Fortnite and said the game was the highest priority of the cheat provider.

With the two lawsuits, the game publisher hopes to put an end to the cheating.

Both defendants face $150,000 in statutory damages for copyright infringement. The complaint further lists breach of contract and circumvention of technological measures as additional claims.

While taking out two cheaters is just a drop in the ocean, Epic is sending a stark warning to people who don’t play by the rules.

The Boston Red Sox admitted to eavesdropping on the communications channel between catcher and pitcher.

Stealing signs is believed to be particularly effective when there is a runner on second base who can both watch what hand signals the catcher is using to communicate with the pitcher and can easily relay to the batter any clues about what type of pitch may be coming. Such tactics are allowed as long as teams do not use any methods beyond their eyes. Binoculars and electronic devices are both prohibited.

In recent years, as cameras have proliferated in major league ballparks, teams have begun using the abundance of video to help them discern opponents’ signs, including the catcher’s signals to the pitcher. Some clubs have had clubhouse attendants quickly relay information to the dugout from the personnel monitoring video feeds.

But such information has to be rushed to the dugout on foot so it can be relayed to players on the field — a runner on second, the batter at the plate — while the information is still relevant. The Red Sox admitted to league investigators that they were able to significantly shorten this communications chain by using electronics. In what mimicked the rhythm of a double play, the information would rapidly go from video personnel to a trainer to the players.

This is ridiculous. The rules about what sorts of sign stealing are allowed and what sorts are not are arbitrary and unenforceable. My guess is that the only reason there aren’t more complaints is because everyone does it.

The Red Sox responded in kind on Tuesday, filing a complaint against the Yankees claiming that the team uses a camera from its YES television network exclusively to steal signs during games, an assertion the Yankees denied.

Boston’s mistake here was using a very conspicuous Apple Watch as a communications device. They need to learn to be more subtle, like everyone else.

A kind anonymous patron offers this prompt, which I totally fucked up getting done in July:

Something to do with programming languages? Alternatively, interesting game mechanics!

It’s been a while since I’ve written a thing about programming languages, eh? But I feel like I’ve run low on interesting things to say about them. And I just did that level design article, which already touched on some interesting game mechanics… oh dear.

Okay, how about this. It’s something I’ve been neck-deep in for quite some time, and most of the knowledge is squirrelled away in obscure wikis and ancient forum threads: getting data out of Pokémon games. I think that preserves the spirit of your two options, since it’s sort of nestled in a dark corner between how programming languages work and how game mechanics are implemented.

In the grand scheme of things, I don’t know all that much about this. I know more than people who’ve never looked into it at all, which I suppose is most people — but there are also people who basically do this stuff full-time, and that experience is crucial since so much of this work comes down to noticing patterns. While it sure helped to have a technical background, I wouldn’t have gotten anywhere at all if I weren’t acquainted with a few people who actually know what they’re doing. Most of what I’ve done is take their work and run with it.

Also, I am not a lawyer and cannot comment on any legal questions here. Is it okay to download ROMs of games you own? Is it okay to dump ROMs yourself if you have the hardware? Does this count as reverse engineering, and do the DMCA protections apply? I have no idea. But that said, it’s not exactly hard to find ROM hacking communities, and there’s no way Nintendo isn’t aware of them (or of the fact that every single Pokémon fansite gets their info from ROMs), so I suspect Nintendo simply doesn’t care unless something risks going mainstream — and thus putting a tangible dent in the market for their own franchise.

Still, I don’t want to direct an angry legal laser at anyone, so I’m going to be a bit selective about what resources I link to and what I merely allude to the existence of.

This is, necessarily, a pretty technical topic. It starts out in binary data and spirals down into microscopic details that even most programmers don’t need to care about. Sometimes people approach me to ask how they can help with this work, and all I can do is imagine the entire contents of this post and shrug helplessly.

Still, as usual, I’ll do my best to make this accessible without also making it a 500-page introduction to all of computing. Here is some helpful background stuff that would be clumsy to cram into the rest of the post.

Computers deal in bytes. Pop culture likes to depict computers as working in binary (individual bits or “binary digits”), which is technically true down on the level of the circuitry, but virtually none of the actual logic in a computer cares about individual bits. In fact, computers can’t access individual bits directly; they can only fetch bytes, then extract bits from those bytes as a separate step.

A byte is made of eight bits, which gives it 2⁸ or 256 possible values. It’s helpful to see what a byte is, but writing them in decimal is a bit clumsy, and writing them in binary is impossible to read. A clever compromise is to write them in hexadecimal, base sixteen, where the digits run from 0 to 9 and then A to F. Because sixteen is 2⁴, one hex digit is exactly four binary digits, and so a byte can conveniently be written as exactly two hex digits.

(A running theme across all of this is that many of the choices are arbitrary; in different times or places, other choices may have been made. Most likely, other choices were made, and they’re still in use somewhere. Virtually the only reliable constant is that any computer you will ever encounter will have bytes made out of eight bits. But even that wasn’t always the case.)

The Unix program xxd will print out bytes in a somewhat readable way. Here’s its output for a short chunk of English text.

Each line shows sixteen bytes. The left column shows the position (or “offset”) in the data, in hex. (It starts at zero, because programmers like to start counting at zero; it makes various things easier.) The middle column shows the bytes themselves, written as two hex digits each, with just enough space that you can tell where the boundaries between bytes are. The right column shows the bytes interpreted as ASCII text, with any non-characters replaced with a ..

(ASCII is a character encoding, a way to represent text as bytes — which are only numbers — by listing a set of characters in some order and then assigning numbers to them. Text crops up in a lot of formats, and this makes it easy to spot at a glance. Alas, ASCII is only one of many schemes, and it only really works for English text, but it’s the most common character encoding by far and has some overlap with the runners-up as well.)

Since everything is made out of bytes, there are an awful lot of schemes for how to express various kinds of information as bytes. As a result, a byte is meaningless on its own; it only has meaning when something else interprets it. It might be a plain number ranging from 0 to 255; it might be a plain number ranging from −128 to 127; it might be part of a bigger number that spans multiple bytes; it might be several small numbers crammed into one byte; it might be part of a color value; it might be a letter.

A meaningful arrangement for a whole sequence of bytes is loosely referred to as a format. If it’s intended for an entire file, it’s a file format. A file containing only bytes that are intended as text is called a plain text file (or format); this is in contrast to a binary file, which is basically anything else.

Some file formats are very common and well-understood, like PNG or MP3. Some are very common but were invented behind closed doors, like Photoshop’s PSD, so they’ve had to be reverse engineered for other software to be able to read and write them. And a great many file formats are obscure and ad hoc, invented only for use by one piece of software. Programmers invent file formats all the time.

Reverse engineering a format is largely a matter of identifying common patterns and finding data that’s expected to be present somewhere. Of course, in cases like Photoshop’s PSD, the most productive approach is to make small changes to a file in Photoshop and then see what changed in the resulting PSD. That’s not always an option — say, if you’re working with a game for a handheld that won’t let you easily run modified games.

Okay, hopefully that’s enough of that and you can pick up the rest along the way!

Before Diamond and Pearl, all of veekun’s data was just copied from other sources. Like, when I was in high school, I would spend lunch in the computer lab meticulously copy/pasting the Gold and Silver Pokédex text from another website into mine. Hey, I started the thing when I was 12.

But then… something happened. I can’t remember what it was, which makes this a much less compelling story. I assume veekun got popular enough that a couple other Pokénerds found out about it and started hanging around. Then when Diamond and Pearl came out, they started digging into the games, and I thought that was super interesting, so I did it too.

This is what led veekun into being much more about ripped data, though its track record has been… bumpy.

Everything in a computer is, on some level, a sequence of bytes. Game consoles and handhelds, being computers, also deal in bytes. A game cartridge is just a custom disk, and a ROM is a file containing all the bytes on that disk. (It’s a specific case of a disk image, like an ISO is for CDs and DVDs. You can take a disk image of a hard drive or a floppy disk or anything else, too; they’re all just bytes.)

But what are those bytes? That’s the fundamental and pervasive question. In the case of a Nintendo DS cartridge, the first thing I learned was that they’re arranged in a filesystem. Most disks have a filesystem — it’s like a table of contents for the disk, explaining how the one single block of bytes is divided into named files.

That is fantastically useful, and I didn’t even have to figure out how it works, because other people already had. Let’s have a look at it, because seeing binary formats is the best way to get an idea of how they might be designed. Here’s the beginning of the English version of Pokémon Diamond.

How do we make sense of this? Let us consult the little tool I started writing for this, porigon-z. It’s abandoned and unfinished and not terribly well-written; I would just link to the documentation I consulted when writing this, but it’s conspicuously 404ing now, so this’ll have to do. I described the format using an old version of the Construct binary format parsing library, and it looks like this:

A String is text of a fixed length, either truncated or padded with NULs (character zero) to fit. The clumsy ULInt16 means an Unsigned, Little-endian, 16-bit (two byte) integer.

(What does little-endian mean? I’m glad you asked! When a number spans multiple bytes, there’s a choice to be made: what order do those bytes go in? The way we write numbers is big-endian, where the biggest part appears first; but most computers are little-endian, putting the smallest part first. That means a number like 0x1234 is actually stored in two bytes as 34 12.)

Alas, this is a terrible example, since most of this is goofy internal stuff we don’t actually care about. The interesting bit is the “file table”. A little ways down my description of the format is this block of ULInt32s, which start at position 0x40 in the file.

I included one previous line for context; starting right after a whole bunch of ffs or 00s is a pretty good sign, since those are likely to be junk used to fill space. So we’re probably in the right place, or at least a right place. Also we’re definitely in the right place since I already know porigon-z works, but, you know.

The beginning part of this is a bunch of numbers that start out relatively low and gradually get bigger. That’s a pretty good indication of an offset table — a list of “where this thing starts” and “how long it is”, just like the offset/length pairs that pointed us here in the first place. The only difference here is that we have a whole bunch of them. And porigon-z confirms that this is a list of:

My code does a bit more than this, but I don’t want this post to be about the intricacies of an old version of Construct. The short version is that each entry is eight bytes long and corresponds to a directory; this list actually describes the directory tree. Decoding the first few produces:

Again, we encounter some mild weirdness. The parent ids seem to count upwards, except for the first one, and where did that f come from? It turns out that for the first record only — which is the root directory and therefore has no parent — the parent id is actually the total number of records to read. So there are 0x0045 or 69 records here. As for the f, well, I have no idea! I just discard it entirely when linking directories together.

So let’s fully decode entry 3 (the fourth one, since we started at zero). It has offset 0x000002fd, which is relative to where the table starts, so we need to add that to 0x00336400 to get 0x003366fd. We don’t have a length, but starting from there we see:

I called the structure here a filename_list_struct. Also, as I read this code, I really wish I’d made it more sensible; sorry, I guess I’ll clean it up when I get around to re-ripping gen 4. The Construct code is a bit goofy, but the idea is:

Read a byte. If it’s zero, stop here. Otherwise, the top bit is a flag indicating whether this entry is a directory; the rest is a length.

The next length bytes are the filename.

Iff this is a directory, the next two bytes are the directory id.

Repeat.

(Ah yes, bits and flags. A flag is something that can only be true or false, so it really only needs one bit to store. So programmers like to cram flags into the same byte as other stuff to save space. Computers can’t examine individual bits directly, but it’s easy to manipulate them from code with a little math. Of course, using 1 bit for a flag means only 7 are left for the length, so it’s limited to 127 instead of 255.)

Let’s try this. The first byte is 0c. I can tell you right away that the top bit is zero; if the top bit is one, then the first hex digit will be 8 or greater. So this is just a file, and it’s 0c or 12 bytes long. The next twelve bytes are cb_data.narc, so that’s the filename. Repeat from the beginning: the next byte is 00, which is zero, so we’re done. This directory only contains a single file, cb_data.narc.

But wait, what is this directory? We know its id is 3; its name would appear somewhere in the filename list for its parent directory, 2, along with an extra two bytes indicating it matches to directory 3. To get the name for directory 2, we’d consult directory 1; and directory 1’s parent is directory 0. Directory 0 is the root, which is just / and has no name, so at that point we’re done. Of course, if we read all these filename lists in order rather than skipping straight to the third one, then we’d have already seen all these names and wouldn’t have to look them up.

One final question: where’s the data? All we have are filenames. It turns out the data is in a totally separate table at fat_offset — “FAT” is short for “file allocation table”. That’s a vastly simpler list of pairs of start offset and end offset, giving the positions of the individual files, and nothing else.

All we have to do is match up the filenames to those offset pairs. This is where the “top file id” comes in: it’s the id of the first file in the directory, and the others count up from there. This directory’s top file id is 0x57, so cb_data.narc has file id 0x57. (If there were a next file, it would have id 0x58, and so on.) Its data is given by the 0x57th (87th) pair of offsets.

Phew! We haven’t even gotten anywhere yet. But this is important for figuring out where anything even is. And you don’t have to do it by hand, since I wrote a program to do it. Run:

Hey, it’s our friend cb_data.narc, with its full path! On the left is its file id, 87. Next are its start and end offsets, followed by its filesize.

You may notice that before the filenames start, you’ll get a list of unnamed files. These are entries in the FAT that have no corresponding filename. I learned only recently that they’re code — overlays, in fact, though I don’t know what that means yet.

This was fantastic. All the game data, nearly arranged into files, and even named sensibly for us. A goldmine. It didn’t used to be so easy, as we will see later.

Other people had already noticed the file /poketool/personal/personal.narc contains much of the base data about Pokémon. You’ll notice it has a “501” in brackets next to it, indicating that it’s actually a NARC file — a “Nitro archive”, though I’m not sure what “Nitro” refers to. This is a generic uncompressed container that just holds some number of sub-files — in this case, 501. The subfiles can have names, but the ones in this game generally don’t, so the only way to refer to them is by number.

You may also notice that evo.narc and wotbl.narc, in the same directory, are also NARCs with 501 records. It’s a pretty safe bet that they all have one record per Pokémon. That’s a little odd, since Pokémon Diamond only has 493 Pokémon, but we’ll figure that out later.

NARC is, as far as I can tell, an invention of Nintendo. I think it’s in other DS games, though I haven’t investigated any others very much, so I can’t say how common it is. It’s a very simple format, and it uses basically the same structure as the entire DS filesystem: a list of start/end offsets and a list of filenames. It doesn’t have the same directory nesting, so it’s much simpler, and also the filenames are usually missing, so it’s simpler still. But you don’t have to care, because you can examine the contents of a file with:

This will print every record as an unbroken string of hex, one record per line. (I admit this is not the smartest format; it’s hard to see where byte boundaries are. Again, hopefully I’ll fix this up a bit when I rerip gen 4.) Here are the first six Pokémon records.

That first one is pretty conspicuous, what with its being all zeroes. It’s probably some a dummy entry, for whatever reason. That does make things a little simpler, though! Numbering from zero has caused some confusion in the past: Bulbasaur (National Dex number 1) would be record 0, and I’ve had all kinds of goofy bugs from forgetting to subtract or add 1 all over the place. With a dummy record at 0, that means Bulbasaur is 1, and everything is numbered as expected.

So, what is any of this? The heart of figuring out data formats is looking for stuff you know. That might mean looking for data you know should be there, or it might mean identifying common schemes for storing data.

A good start, then, would be to look at what I already know about Bulbasaur. Base stats are a pretty fundamental property, and Bulbasaur’s are 45, 49, 49, 65, 65, and 45. In hex, that’s 2d 31 31 41 41 2d. Hey, that’s the beginning of the first line! It’s just slightly out of order; Speed comes before the special stats.

You can also pick out some differences by comparing rows. About 60% of the way along the line, I see 03 for Bulbasaur, Ivysaur, and Venusaur, but then 00 for Charmander and Charmeleon. That’s different between families, which seems like a huge hint; does that continue to hold true? (As it turns out, no! It fails for Butterfree — because it indicates the Pokémon’s color, used in the Pokédex search. Most families are similar colors.)

Sometimes a byte will seem to only take one of a few small values, which usually means it’s an enum (one of a list of numbered options), like the colors are. A byte that only ranges from 1 to 17 (or perhaps 0 to 16) is almost certainly type, for example, since there are 17 types.

Noticing common patterns — very tiny formats, I suppose — is also very helpful (and saves you from wild goose chases). For example, Pokémon can appear in the wild holding held items, and there are more than 256 items, so referring to an item requires two bytes. But there are only slightly more than 256 items in this game, so the second byte is always 00 or 01. If you remember that some fields must span multiple bytes, that’s an incredible hint that you’re looking at small 16-bit numbers; if you forget, you might thing the 01 is a separate field that only stores a single flag… and drive yourself mad trying to find a pattern to it.

The games have a number of TMs which can teach particular moves, and each Pokémon can learn a unique set of TMs. These are stored as a longer block of bytes, where each individual bit is either 1 or 0 to indicate compatibility. Those are a bit harder to identify with certainty, since (a) the set of TMs changes in every game so you can’t just check what the expected value is, and (b) bitflags can produce virtually any number with no discernible pattern.

Thankfully, there’s a pretty big giveaway for TMs in particular. Here are Caterpie, Metapod, and Butterfree:

Butterfree can learn TMs. Caterpie and Metapod are almost unique in that they can’t learn any. Guess where the TMs are! Even better, Caterpie is only #9, so this shows up very early on.

And, well, that’s the basic process. It’s mostly about cheating, about leveraging every possible trick you can come up with to find patterns and landmarks. I even wrote a script for this (and several other files) that dumped out a huge HTML table with the names of the (known) Pokémon on the left and byte positions as columns. When I figured something out, or at least had a suspicion, I labelled the column and changed that byte to print as something more readable (e.g., printing the names of types instead of just their numbers).

Of course, if you have a flash cartridge or emulator (both of which were hard to come by at the time), you can always invoke the nuclear option: change the data and see what changes in the game.

What we really really wanted were the sprites. This was a new generation with new Pokémon, after all, and sprites were the primary way we got to see them. Unlike nearly everything else, this hadn’t already been figured out by other people by the time I showed up.

Finding them was easy enough — there’s a file named /poketool/pokegra/pokegra.narc, which is conspicuously large. It’s a NARC containing 2964 records. A little factoring reveals that 2964 is 494 × 6 — aha! There are 493 Pokémon, plus one dummy.

This will extract the contents of pokegra.narc to a directory called pokemon-diamond.nds:data, which I guess might be invalid on Windows or something, so use -d to give another directory name if you need to. Anyway, in there you’ll find a directory called pokegra.narc, inside of which are 2964 numbered binary files.

Some brief inspection reveals that they definitely come in groups of six: the filesizes consistently repeat 6.5K, 6.5K, 6.5K, 6.5K, 72, 72. Sometimes a couple of the files are empty, but the pattern is otherwise very distinct. Four sprites per Pokémon, then?

Let’s have a look at the first file! Since it’s a dummy sprite, it should be blank or perhaps a question mark, right? Oh boy I’m so excited.

Hm. Okay, so, this is a problem. No matter what the actual contents are, this is a sprite, and virtually all Pokémon sprites have a big ol’ blob of completely empty space in the upper-left corner. Every corner, in fact. Except for a handful of truly massive species, the corners should be empty. So no matter what scheme this is using or what order the pixels are in, I should be seeing a whole lot of zeroes somewhere. And I’m not.

Compression? Seems very unlikely, since every file is either 0, 72, or 6448 bytes, without exception.

Well, let’s see what we’ve got here. RGCN and RAHC are almost certainly magic numbers, so this is one file format nested inside another. (A lot of file formats start with a short fixed string identifying them, a so-called “magic number”. Every GIF starts with the text GIF89a, for example. A NARC file starts with CRAN — presumably it’s “backwards” because it’s being read as an actual little-endian number.) I assume the real data begins at 0x30.

Without that leading 0x30 (48) bytes, the file is 6400 bytes large, which is a mighty conspicuous square number! Pokémon sprites have always been square, so this could mean they’re 80×80, one byte per pixel. (Hm, but Pokémon sprites don’t need anywhere near 256 colors?)

I see a 30 in the first line, which is probably the address of the data. I also see a 10, which is probably the (16-bit?) length of that initial header, or the address of the second header. What about in the second header? Well, uh, hm. I see a lot of what seem to be small 16-bit or 32-bit numbers: 0x000a is 10, 0x0014 is 20, 0x0003 is 3; 0x0018 is 24. A quick check reveals that 0x1900 is 6400 (the size of the data), and so 0x1920 is the size of the data plus this second header.

This hasn’t really told me anything I don’t already know. It seems very conspicuous that there’s no 0x50, which is 80, my assumed size of the sprite.

Well, hm, let’s look at the second file. It’s in the block for the same “Pokémon”, so maybe it’ll provide some clues.

Ah. No. It starts out completely identical. In fact, md5sum reveals that all four of these first sprites are identical. Might make sense for a dummy Pokémon. Does that pattern hold for the next Pokémon, which I assume is Bulbasaur? Not quite! Files 6 and 7 are identical, and 8 and 9 are identical, but they’re distinct from each other.

What’s the point of them then? Further inspection reveals that most Pokémon have paired sprites like this, but Pikachu does not — suggesting (correctly) that the sprites are male versus female, so Pokémon that don’t have gender differences naturally have identical pairs of sprites.

Okay, then, let’s look at Pikachu’s first sprite, 150. The key is often in the differences, remember. If the dummy sprite is either blank or a question mark, then it should still have a lot of corner pixels in common with the relatively small Pikachu.

Make a histogram? Every possible value appears with roughly the same frequency. Now that is interesting, and suggests some form of encryption — most likely one big “mask” xor’d with the whole sprite. But how to find the mask?

(It doesn’t matter exactly what xor is, here. It only has two relevant properties. One is that it’s self-reversing, making it handy for encryption like this — (data xor mask) xor mask produces the original data. The other is that anything xor’d with zero is left unchanged, so if I think the original data was zero — as it ought to be for the blank pixels in the corners of a sprite — then the encrypted data is just the mask! So I know at least the beginning parts of the mask for most sprites; I just have to figure out how to use a little bit of that to reconstitute the whole thing.)

I stared at this for days. I printed out copies of sprite hex and stared at them more on breaks at work. I threw everything I knew, which admittedly wasn’t a lot, at this ridiculous problem.

And slowly, some patterns started to emerge. Consider the first digit of the last column in the above hex dump: it goes e, d, d, c, c, b, b, a. In fact, if you look at the entire byte, they go e0, d8, d0, c8, etc. That’s just subtracting 08 on each row.

Are there other cases like this? Kinda! In the third column, the second digit alternates between 7 and f; closer inspection reveals that byte’s increasing by 18 every row. Oh, the sixth column too. Hang on — in every column, the second digit alternates between two values. That seems true for every other file we’ve seen so far, too.

This is extremely promising! Let’s try this. Take the first two rows, which are bytes 0–15 and bytes 16–31. Subtract the second row from the first row bytewise, making a sort of “delta row”. For the second Pikachu, that produces:

1

d82b 3830 1809 789f 58ae b82c 9828 f827

As expected, the second digit in each column is an 8. Now just start with the first row and keep adding the delta to it to produce enough rows to cover the whole file, and xor that with the file itself. Results:

Promising! We got a bunch of zeroes, as expected, though everything else is still fairly garbled. It might help if we, say, printed this out to a file.

By now it had become clear that the small files were palettes of sixteen colors stored as RGB555 — that is, each color is packed into two bytes, with five bits each for the red, green, and blue channels. Sixteen colors means two pixels can be crammed into a single byte, so the sprites are actually 160×80, not 80×80. Combining this knowledge with the above partially-decrypted output, we get:

Success!

Kinda!

Meanwhile another fansite found our code and put up a full set of these ugly-ass corrupt sprites, so that was nice.

It took me a while to notice another pattern, which emerges if you break the sprite into blocks that are 512 bytes wide (rather than only 16). You get this:

This time, the byte in the first column is always identical all the way down. Well, kind of. This is encrypted data, remember, and I only know what the mask is because the beginning of the data is usually blank. The exceptions are when the mask is hitting actual colored pixels, at which point it becomes garbage.

But even better, look at the second byte in each column. Now they’re all separated by a constant, all the way down! That means I can repeat the same logic as before, except with two “rows” that are 512 bytes long, and as long as the first 1024 bytes of the original data are all zeroes, I’ll get a perfect sprite out!

And indeed, I did! Mostly. Legendary Pokémon and a handful of others tend to be quite large, so they didn’t start with as many zeroes as I needed for this scheme to work. But it mostly worked, and that was pretty goddamn cool.

magical, a long-time co-conspirator, managed to scrounge up my final “working” code from that era (which then helped me find my own local copy of all my D/P research stuff, which I oughta put up somewhere). It’s total nonsense, but it came pretty close to working.

…

Hm? What? You want to know the real answer? Yeah, I bet you do.

Okay, here you go. So the games have a random number generator, for… generating random numbers. This being fairly simple hardware with fairly simple (non-crypto) requirements, the random number generator is also fairly simple. It’s an LCG, a linear congruential generator, which is a really bizarre name for a very simple idea:

1

ax + b

The generator is defined by the numbers a and b. (You have to pick them carefully, or you’ll get numbers that don’t look particularly random.) You pick a starting number (a seed) and call that x. When you want a random number, you compute ax + b. You then take a modulus, which really means “chop off the higher digits because you only have so much space to store the answer”. That’s your new x, which you’ll plug in to get the next random number, and so on.

In the case of the gen 4 Pokémon games, a = 0x4e6d and b = 0x6073.

What does any of this have to do with the encryption? Well! The entire sprite is interpreted as a bunch of 16-bit integers. The last one is used as the seed and plugged into the RNG, and then it keeps spitting out a sequence of numbers. Reverse them, since you’re starting at the end, and that’s the mask.

The seed technically overlaps with the last four pixels, but it happens to work since no Pokémon sprites touched the bottom-right corner in Diamond and Pearl. In Platinum a couple very large sprites broke that rule, so they ended up switching this around and starting from the beginning. Same idea, though.

Of course, porigon-z knows how to handle this… though it’s currently hardcoded to use the Platinum approach. Funny story: the algorithm was originally thought to go from the beginning, not the end, and it used an LCG with different constants. Turns out someone had just accidentally (?) discovered the reverse of the Pokémon LCG, which would produce exactly the same sequence, backwards. Super cool.

Why did that thing with subtracting the two rows kinda-sorta work, then? Well! It’s because… when you… and… uh… wow, I still have no goddamn idea. That makes no sense at all. I’d sure love for someone to explain that to me. I’m sure I could explain it if I sat down and thought about it for a while, but I suspect it’s something subtle and I’m not that interested.

I’d also like to know: why were the sprites encrypted in the first place? What possible point is there? They must have known we cracked the encryption, but then they used it again for Platinum, and Heart Gold and Soul Silver. Maybe it was only intended to be enough to delay us, during the gap between the Japanese and worldwide releases…? Hm.

Incidentially, the entire game text is also encrypted in much the same way. Without the encryption, it’s just UTF-16 — a common character encoding that uses two bytes for every character. I have no idea why.

So Nintendo DS cartridge have a little filesystem on them, making them act kinda like any other disk. Nice.

Game Boy cartridges… don’t. A Game Boy cartridge is essentially just a single file, a program. You pop the cartridge in, and the Game Boy runs that program.

Where is the data, then? Baked into the program — referred to as hard-coded. Just, somewhere, mixed in alongside all the program code. There’s no nice index of where it is; rather, somewhere there’s some code that happens to say “look at the byte at 0x006f9d10 in this program and treat it as data”.

I wasn’t involved in data dumping in these days; I was copying stuff straight out of the wildly inaccurate Prima strategy guide. (Again, you know, I was 12.) It’s hard to say exactly how people fished out the data, though I can take a few guesses.

To our advantage is the fact that Game Boy cartridges are much smaller than DS cartridges, so there’s much less to sift through. Pokémon Red and Blue are on 1 MB cartridges, and even those are half empty (unused NULs); the original Japanese Red and Green barely fit into 512 KB, and Red and Blue ended up just slightly bigger.

To our disadvantage is that these are the very first games, so we don’t have any pre-existing knowledge to look for. We don’t know any Pokémon’s base stats; we may not even know that “base stats” are a thing yet. Also, it’s not immediately obvious, but the Pokémon aren’t even stored in order. Oh, and Mew is completely separate; it really was a last-minute addition.

What do we know? Well, by playing the game, we can see what moves a Pokémon learns and when. There don’t seem to be all that many moves, so it’s reasonable to assume that a move would be represented with a single byte. Levels are capped at 100, so that’s probably also a single byte. Most likely, the level-up moves are stored as either level move level move... or move level move level....

Great! All we need to do is put together a string of known moves both ways and find them.

Except, ah, hm. We don’t actually know how the moves are numbered. But we still know the levels, so maybe we can get somewhere. Let’s take Bulbasaur, which we know learns Leech Seed at level 7, Vine Whip at level 13, and Poison Powder at level 20. (Or, I guess, that should be LEECHSEED, VINEWHIP, and POISONPOWDER.) No matter whether the levels or moves come first, this will result in a string like:

1

07 ?? 0D ?? 14

So we can do my favorite thing and slap together a regex for that. (A regex is a very compact way to search text — or bytes — for a particular pattern. A lone . means any single character, so the regex below is a straightforward translation of the pattern above.)

This seems pretty promising! It looks like the same set of moves is repeated 16 bytes later, but with different (slightly higher) levels after a certain point, which matches how evolved Pokémon behave. So this looks to be at least Bulbasaur and Ivysaur, though I’m not quite sure what happened to Venusaur.

By repeating this process with some other Pokémon, we can start to fill in a mapping of moves to their ids. Eventually we’ll realize that a Pokémon’s starting moves don’t seem to appear within this structure, and so we’ll go searching for those for a Pokémon that starts with moves we know the ids for. That will lead us to the basic Pokémon data along with base stats, because starting moves happen to be stored there in these early games.

The text isn’t encrypted, but also isn’t ASCII, but it’s possible to find it in much the same way by treating it as a cryptogram (or a substitution cipher). I assume that there’s some consistent scheme, such that the letter “A” is always represented with the same byte. So I pick some text that I know has a few repeated letters, like BULBASAUR, and I recognize that it could be substituted in some way to read as 123145426. I can turn that into a regex!

Unfortunately, this produces a zillion matches, most of them solid strings of NUL bytes. The problem is that nothing in the regex requires that the different groups are, well, different. You could write extra code to filter those cases out, or if you’re masochistic enough, you could express it directly within the regex using (?!...) negative lookahead assertions:

That’s much more reasonable. (The set of matches, I mean, not the regex.) It wouldn’t be hard to write a script with a bunch of known strings in it, generate appropriate regexes for each, eliminate inconsistent matches, and eventually generate a full alphabet. (Or you could assume that “B” immediately follows “A” and in general the letters are clustered together, which would lead you to correctly suspect that the strings at 0x0001c80e and 0x00094e75 are the ones you want.)

Even better, once you have an alphabet, you can use it to translate entire ROM — plenty of it will be garbage, but you’ll find quite a lot of blocks of human-readable text! And now you have all the names of everything and also the entire game script.

But like I said, I wasn’t involved in any of that. Until recently! I’ve been working on an experiment for veekun where I re-dump all the games to a new YAML-based format. Long story short: the current database is a pain to work with, and some old data has been lost entirely. Also, most of the data was extracted bits at a time with short hacky scripts that we then threw away, and I’d like to have more permanent code that can dump everything at once. It’ll be nice to have an audit trail, too — multiple times in the past, we’ve discovered that some old thing was dumped imperfectly or copied from an unreliable source.

So I started re-dumping Red and Blue, from scratch. I’ve made modest progress, though it’s taken a backseat to Sun and Moon lately.

It’s helped immensely that there’s an open source disassembly of Red and Blue floating around. What on earth is a disassembly? I’m so glad you asked!

Game Boy games were written in assembly code, which is just about as simple and painful as you can get. It’s human-readable, kinda, but it’s built from the basic set of instructions that a particular CPU family understands. It can’t directly express concepts like “if this, do that, otherwise do something else” or “repeat this code five times”. Instead, it’s a single long sequence of basic operations like “compare two numbers” and “jump ahead four instructions”. (Very few programmers work with assembly nowadays, but for various reasons, no other programming languages would work on the Game Boy at the time.)

To give you a more concrete idea of what this is like to work with: the Game Boy’s CPU doesn’t have an instruction for multiplying, so you have to do it yourself by adding repeatedly. I thought that would make a good example, but it turns out that Pokémon’s multiply code is sixty lines long. Division is even longer! Here’s something a bit simpler, which fills a span of memory:

CPUs tend to have a small number of registers, which can hold values while the CPU works on them — even as fast as RAM is, it’s considered much slower than registers. The downside is, well, you only have a few registers. The Game Boy CPU (a modified Z80) has eight registers that can each hold one byte: a through f, plus h and l.. They can be used together in pairs to store 16-bit values, giving the four combinations af, bc, de, and hl.

(If you need more than 16 bits, well, that’s your problem! 16 bits is the most the CPU understands; it can’t even access memory addresses beyond that range, so you’re limited to 64K. “But wait”, you ask, “how can a Game Boy cartridge be 512K or 1M?” Very painfully, that’s how.)

Now we can understand the comment in the above code. Starting at the memory address given by the 16-bit number in hl, it will copy the value in a into each byte, for a total of bc bytes. Translated into English, the above means something like this:

Save copies of d and e, so I can mess with them without losing any important data that was in them. (This code doesn’t use e, but there’s no push d instruction.)

Copy a, the fill value, into d.

Copy d, the fill value, into a.

Copy a, the fill value, into the memory at address hl. Then increase hl (the actual registers, not the memory) by 1.

Decrease bc, the number of bytes to fill, by 1.

Copy b, part of the number of bytes to fill, into a.

ORa with c, the other part of the number of bytes to fill, and leave the result in a. The result will be zero only if bc itself is zero, in which case the “zero” flag will be set.

If the zero flag is not set (i.e., if bc isn’t zero, meaning we’re not done yet), jump back to the instruction marked by .loop, which is step 3.

Restore d and e to their previous values.

Return to whatever code jumped here in the first place.

Even this relatively simple example has to resort to a weird trick — ORing b and c together — just to check if bc, a value the CPU understands, is zero or not.

CPUs don’t execute assembly code directly. It has to be assembled into machine code, which is (surprise!) a sequence of bytes corresponding to CPU instructions. When the above code is compiled, it produces these bytes, which you can verify for yourself appear in Pokémon Red and Blue in exactly one place:

1

d5 57 7a 22 0b 78 b1 20 f9 d1 c9

I stress that this is way beyond anything virtually any programmer actually needs to know. Even the few programmers working with assembly, as far as I know, don’t usually care about the actual bytes that are spat out. I’ve actually had trouble tracking down lists of opcodes before — almost no one is trying to read machine code. We are out in the weeds a bit here.

To finally answer your hypothetical question: disassembly is the process of converting this machine code back into assembly. Most of it can be done automatically, but it takes extra human effort to make the result sensible. Let’s consult the Game Boy CPU’s (rather terse) opcode reference and see if we can make sense of this, pretending we don’t know what the original code was.

Find d5 in the table — it’s in row Dx, column x5. That’s push de. The first number in the box is 1, meaning the instruction is only one byte long, so the next instruction is the very next byte. That’s 57, which is ld d, a. Keep on going. Eventually we hit 20, which is jr nz, r8 and two bytes long — the notes at the bottom explain that r8 is 8-bit signed data. That means the next byte is part of this instruction; it’s f9, but it’s signed, so really that’s -7. We end up with:

This looks an awful lot like what we started with, but there are a couple notable exceptions. First, the FillMemory:: line is completely missing. That’s just another kind of label, and the only way to know that the first line should be labelled at all is to find some other place in the code that tries to jump here. Given just these bytes, we can’t even tell if this is a complete snippet. Once we find that out, there’s still no way to recover the name FillMemory; even that is just a fan name and not the name from the original code. Someone came up with that name by reading this assembly code, understanding what it’s intended to do, and giving it a name.

Second, the .loop label is missing. The jr line forgot about the label and ended up with a number, which is how many bytes to jump backwards or forwards. (You can imagine how a label is much easier to work with than a number of bytes, especially when some instructions are one byte long and some are two!) An automated disassembler would be smart enough to notice this and would put a label in the right place. A really good disassembler might even recognize that this code is a simple loop that executes some number of times, and name that label .loop; otherwise, or for more complicated kinds of jumps, it would have a meaningless name that a human would have to improve.

And there’s a whole project where people have done the work of restoring names like this and splitting code up sensibly! The whole thing even assembles into a byte-for-byte identical copy of the original games. It’s really quite impressive, and it’s made tinkering with the original games vastly more accessible. You still have to write assembly, of course, but it’s better than editing machine code. Imagine trying to add a new instruction in the middle of the loop above; you’d screw up the jr‘s byte count, and every single later address in the ROM.

But more relevant to this post, a disassembly makes it easy to figure out where data is, since I don’t have to go hunting for it! When the code is assembled, it can generate a .sym file, which lists every single “global” label and the position it ended up in the final ROM. Many of those labels are for functions, like FillMemory is, but some of them are for blocks of data.

I set out to write some code to dump data from Game Boy games. Red/Green, Red/Blue, and Yellow were all fairly similar, so I wanted to use as much of the same code as possible for all of those games (and their various translations).

A very early pain point was, well, the existence of all those translations. Because there’s no filesystem, the only obvious way to locate data is to either search for it (which requires knowing it ahead of time, a catch-22 for a script that’s meant to extract it) or to bake in the addresses. The games contain quite a lot of data I want, and they exist in quite a few distinct versions, so that would make for a lot of addresses.

Also, with a disassembly readily available, it was now (relatively) easy for people to modify the games as they saw fit, in much the same way as it’s easy to modify most aspects of modern games by changing the data files. But if I simply had a list of addresses for each known release, then my code wouldn’t know what to do with modified games. It’s not a huge deal — obviously I don’t intend to put fan modifications into veekun — but it seemed silly to write all this extraction code and then only have it work on a small handful of specific files.

I decided to at least try to find data automatically. How can I do that, when the positions of the data only existed buried within machine code somewhere?

Obviously, I just need to find that machine code. See, that whole previous section was actually relevant!

I set out to do that. Remember the goofy regex from earlier, which searched for particular patterns of bytes? I did something like that, except with machine code. And by machine code, I mean assembly. And by assembly, I mean— okay just look at this.

I wrote my own little assembler that can convert Game Boy assembly into Game Boy machine code. The difference is that when it sees something like #foo, it assumes that’s a value I don’t know yet and sticks in a regex capturing group instead. It’s smart enough to know whether the value has to be one or two bytes, based on the instruction. It also knows that if the same placeholder appears twice, it must have the same value both times. I can also pass in a placeholder value, if I only know it at runtime.

I have half a dozen or so chunks like this. Every time I wanted to find something new, I went looking for code that referenced it and copied the smallest unique chunk I could (to reduce the risk that the code itself is different between games, or in a fan hack). I did run into a few goofy difficulties, such as code that changed completely in Yellow, but I ended up with something that seems to be pretty robust and knows as little as possible about the games.

I even auto-detect the language… by examining the name of the TOWNMAP, the first item that has a unique name in every official language.

This is probably ridiculous overkill, but it was a blast to get working. It also leaves the door open for some… shenanigans I’ve wanted to do for a while.

Recent games have been slightly more complicated, though the complexity is largely in someone else’s court. The 3DS uses encryption — real, serious encryption, not baby stuff you can work around by comparing rows of bytes.

When X and Y came out, the encryption still hadn’t been broken, so all of veekun’s initial data was acquired by… making a Google Docs spreadsheet and asking for volunteers to fill it in. It wasn’t great, but it was better than nothing.

This was late 2013, and I suppose it’s around when veekun’s gentle decline into stagnation began. When X and Y were finally ripped, I was… what was I doing? I guess I was busy at work? For whatever reason, I had barely any involvement in it. Then Omega Ruby and Alpha Sapphire came out, and now everyone was busy, and it took forever just to get stuff like movesets dumped.

Now I’m working on Sun and Moon again. It’s not especially hard — much of the basic structure has been preserved since Diamond and Pearl, and a lot of the Omega Ruby and Alpha Sapphire code I wrote works exactly the same with Sun and Moon — but there are a lot of edge cases.

The most obvious wrinkle is that the filenames are gone. This has actually been the case since Heart Gold and Soul Silver — all the files now simply have names like /a/0/0/0 and count upwards from there. I don’t know the reason for the change, but I assume the original filenames weren’t intended to be left in the game in the first place. The files move around in every new pair of games, too, requiring bunches of people to go through the files by hand and note down what each of them appears to be.

Newer games use GARC instead of NARC. I don’t know what the G stands for now. (Gooder? Gen 6?) It’s basically the same idea, except that now a single GARC archive has two levels of nesting — that is, a GARC contains some number of sub-archives, and each of those in turn contains some number of subfiles. Usually there’s only one subfile per subarchive, but I’ve seen zanier schemes once or twice.

Also, just in case that’s not enough levels of containers for you, there are also a number of other single-level containers embedded inside GARCs. They’re all very basic and nearly identical: just a list of start and end offsets.

Oh, and some of the data is compressed now. (Maybe that was the case before X/Y? I don’t remember.) Compression is fun. Any given data might be compressed with one of two flavors of LZSS, and it seems completely arbitrary what’s compressed and what’s not. There’s no indication of what’s compressed or what’s not, either; the only “header” that compressed data has is that the first byte is either 0x10 or 0x11, which isn’t particularly helpful since plenty of valid data also begins with one of those bytes.

But there was one much bigger practical problem with X and Y, one I’d been dreading for a while. X and Y, you see, use models — which means they don’t have any sprites for us to rip at all. And that kind of sucks.

The community’s solution has been for a few people (who have screen-capture hardware) to take screenshots and cut them out. It works, but it’s not great. The thing I’ve wanted for a very long time is rips of the actual models.

(Later games went back to including “sprites” that are just static screenshots of the models. Maybe out of kindness to us? Okay, yeah, doubtful. Oh, and those sprites are in the incredibly obtuse ETC1 format, which I had never heard of and needed help to identify, and which I will let you just read about yourself.)

The Pokémon models are an absolute massive chunk of the game data. All the data combined is 3GB; the Pokémon models are a hair under half of that, despite being compressed.

At least this makes them easy to find, since they’re all packed into a single GARC file.

That file contains, I don’t know, a zillion other files. And many of those files are generic containers, containing more files. And none of these files are named. Of course. It’s easy enough to notice that there are nine files per Pokémon, since the sizes follow a rough pattern like the sprites did in Diamond and Pearl. (You’d think that they’d use GARC’s two levels of nesting to group the per-Pokémon files together, but apparently not.)

At this point, I had zero experience with 3D — in fact, working on this was my introduction to 3D and Blender — so I didn’t get very far on my own. I basically had to wait a few years for other people to figure it out, look at their source code, replicate it myself, and then figure out some of the bits they missed. The one thing I did get myself was texture animations, which are generally used to make Pokémon change expressions — last I saw, no one had gotten those ripped, but I managed it. Hooray. I’m sure someone else has done the same by now.

Anyway, I bring up models because of two very weird things that I never would’ve guessed in a million years.

One was the mesh data itself. A mesh is just the description of a 3D model’s shape — its vertices (points), the edges between vertices, and the faces that fill in the space between the edges.

And, well, those are the three parts to a basic mesh. A few very simple model formats are even just those things written out: a list of vertices (defined by three numbers each, x y z), a list of edges (defined by the two vertices they connect), and a list of faces (defined by their edges).

It should be easy to find models by looking for long lists of triplets of numbers — vertex coordinates. Well, not quite. Pokémon models are stored as compiled shaders.

A shader is a simple kind of program that runs directly on a video card, since video cards tend to be a more appropriate place for doing a bunch of graphics-related math. On a desktop or phone or whatever, you’d usually write a shader as text, then compile it when your program/game runs. In fact, you have to do this, since the compilation is different for each kind of video card.

But Pokémon games only have to worry about one video card: the graphics chip in the 3DS. And there’s absolutely no reason to waste time compiling shaders while the game is running, when they could just do it ahead of time and put the compiled shader in the game directly. (Incidentally, the Dolphin emulator recently wrote about how GameCube games do much the same thing.)

So they did that. Thankfully, the compiled shader is much simpler than machine code, and the parts I care about are just the parts that load in the mesh data — which mostly looks like opcodes for “here’s some data”, followed by some data. It would probably be possible to figure out without knowing anything about the particular graphics chip, but if you didn’t know it was supposed to be a shader, you’d be mighty confused by all the mesh data surrounded by weird extra junk that doesn’t look at all like mesh data.

The other was skeletal animation. The basic idea is that you want to make a high-resolution model move around, but it would be a huge pain in the ass to describe the movement of every single vertex. Instead, you make an invisible “skeleton” — a branching tree of bones. The bones tend to follow major parts of the body, so they do look like very simple skeletons, with spines and arms and whatnot (though of course skeletons aren’t limited only to living creatures). Every vertex attaches to one or more of those bones — a rough first pass of this can be done automatically — and then by animating the much simpler skeleton, vertices will all move to match the bones they’re attached to.

The skeleton itself isn’t too surprising. It’s a tree, whatever; we’ve seen one of those already, with the DS directory structure. The skeletons and models are in a neutral pose by default: T for bipeds, simply standing on all fours for quadrupeds, etc. All of this is pretty straightforward.

But then there are the animations themselves.

An animation has some number of keyframes which specify a position, rotation, and size for each bone. Animating the skeleton involves smoothly moving each bone from one keyframe’s position to the next.

Position, rotation, and size each exist in three dimensions, so there are nine possible values for each keyframe. You might expect a set of nine values, then, times the number of keyframes, times the number of bones.

But no! These animations are stored the other way: each of those nine values is animated separately per bone. Also, each of those nine values can have a different number of keyframes, even for the same bone. Also, each of those nine values is optional, and if it’s not animated then its keyframes are simply missing, and there’s a set of bitflags indicating which values are present.

Okay, well, you might at least expect that a single value’s keyframes are given by a list of numbers, right?

Not quite! Such a set of keyframes has an initial “scale” and “offset”, given as single-precision floating point numbers (which are fairly accurate). Each keyframe then gives a “value” as an integer, which is actually the numerator of a fraction whose denominator is 65,535. So the actual value of each keyframe is:

1

offset + scale * value / 65535

Maybe this is a more common scheme than I suspect. Animation does take up an awful lot of space, and this isn’t an entirely unreasonable way to squish it down. The fraction thing is just incredibly goofy at first blush. I have no idea how anyone figured out what was going on there. (It’s used for texture animation, too.)

Anyway, thanks mostly to other people’s hard work, I managed to write a script that can dump models and then play them with a pretty decent in-browser model viewer. I never got around to finishing it, which is a shame, because it took so much effort and it’s so close to being really good. (My local copy has texture animation mostly working; the online version doesn’t yet.)

Hopefully I will someday, because I think this is pretty dang cool, and there’s a lot of interesting stuff that could be done with it. (For example, one request: applying one Pokémon’s animations to another Pokémon’s model. Hm.)

The one thing that really haunts me about it is the outline effect. It’s not actually the effect from the games; I had to approximate it, and there are a few obvious ways it falls flat. I would love to exactly emulate what the games do, but I just don’t know what that is. But maybe… maybe there’s a chance I can find the compiled shader and figure it out. Maybe. Somehow.

Let’s finish up with some small bumps in the road that are still fresh in my mind.

TMs are still in the Pokémon data, as is compatibility with move tutors. Alas, the lists of what the TMs and tutor moves are are embedded in the code, just like in the Game Boy days. You don’t really need to know the TM order, since they have a numbering exposed to the player in-game, and TM compatibility is in that same order… but move tutors have no natural ordering, so you have to either match them up by hand or somehow find the list in the binary.

I had a similar problem with incense items, which each affect the breeding outcome for a specific Pokémon. In Ruby and Sapphire, the incense effects were hardcoded. I don’t mean they were a data structure baked into the code; I mean they were actually “if the baby is this species and one parent is holding this incense, do this; otherwise,” etc. I spent a good few hours hunting for something similar in later games, to no avail — I’d searched for every permutation of machine code I could think of and come up with nothing. I was about to give up when someone pointed out to me that incense is now a data structure; it’s just in the one format I’d forgotten to try searching for. Alas.

Moves have a bunch of metadata, like “status effect inflicted” or “min/max number of turns to last”. Trouble is, I’m pretty sure that same information is duplicated within the code for each individual move — most moves get their own code, and there’s no single “generic move” function. Which raises the question… what is this metadata actually used for? Is it useful to expose on veekun? Is it guaranteed to be correct? I already know that some of it is a little misleading; for example, Tri Attack is listed as inflicting some mystery one-off status effect, because the format doesn’t allow expressing what it actually does (inflict one of burn, freezing, or paralysis at random).

Items have a similar problem: they get a bunch of their own data, but it’s not entirely clear what most of it is used for. It’s not even straightforward to identify how the game decides which items go in which pocket.

Moves also have flags, and it took some effort to figure out what each of them meant. Sun and Moon added a new flag, and I agonized over it for a while before I was fed the answer: it’s an obscure detail relating to move animations. No idea how anyone figured that out.

In Omega Ruby and Alpha Sapphire, there are two lists of item names. They’re exactly identical, with one exception: in Korean, the items “PP Up” and “PP Max” have their names written with English letters “PP” in one list but with Hangul in the other list. Why? No idea.

Evolution types are numbered. Method 4 is a regular level up; method 5 is a trade; method 21 is levelling up while knowing a specific move, which is given as a parameter. Cool. But there are two oddities. Karrablast and Shelmet only evolve when traded with each other, but the data doesn’t indicate this in any way; they both get the same unique evolution method, but there’s no parameter to indicate what they need to be traded with, as you might expect. Also, Shedinja isn’t listed as an evolution at all, since it’s produced as a side effect of Nincada’s evolution (which is listed as a normal level-up). To my considerable chagrin, that means neither of these cases can be ripped 100% automatically.

Pokémon are listed in a different order, depending on context. Sometimes they’re identified by species, e.g. Deoxys. Sometimes they’re identified by form, e.g. Attack Forme Deoxys. Sometimes they’re identified by species and also a separate per-species form number. Sometimes the numbering includes aesthetic-only forms, like Unown, that only affect visuals. But sprites and models both seem to have their own completely separate numberings, which are (of course) baked into the binary.

Incidentally, it turns out that all of the Totem Pokémon in Sun and Moon count as distinct forms! They’re just not obtainable. Do I expose them on veekun, then? I guess so?

Encounters are particularly thorny. The data is simple enough: for each map, there’s a list of Pokémon that can be encountered by various methods (e.g. walking in grass, fishing, surfing). But each of those Pokémon appears at a different rate, and those rates are somewhere in the code, not in the data. And there are some weird cases like swarms, which have special rules. And there are unique encounters that aren’t listed in this data at all, and which veekun has thus never had. And how do you even figure out where a map is anyway, when a named place can span multiple maps, and the encounters are only very slightly different in each map?

Anyway, that’s why veekun is taking so long. Also because I’ve spent several days not working on veekun so I could write this post, which could be much longer but has gone on more than long enough already. I hope some of this was interesting!

Oh, and all my recent code is on the pokedex GitHub. The model extraction stuff isn’t up yet, but it will be… eventually? Next time I work on it, maybe?

How do you go about learning about yourself? Has your view of yourself changed recently? How did you handle it?

Whoof. That’s incredibly abstract and open-ended — there’s a lot I could say, but most of it is hard to turn into words.

The first example to come to mind — and the most conspicuous, at least from where I’m sitting — has been the transition from technical to creative since quitting my tech job. I think I touched on this a year ago, but it’s become all the more pronounced since then.

I quit in part because I wanted more time to work on my own projects. Two years ago, those projects included such things as: giving the Python ecosystem a better imaging library, designing an alternative to regular expressions, building a Very Correct IRC bot framework, and a few more things along similar lines. The goals were all to solve problems — not hugely important ones, but mildly inconvenient ones that I thought I could bring something novel to. Problem-solving for its own sake.

Now that I had all the time in the world to work on these things, I… didn’t. It turned out they were almost as much of a slog as my job had been!

The problem, I think, was that there was no point.

This was really weird to realize and come to terms with. I do like solving problems for its own sake; it’s interesting and educational. And most of the programming folks I know and surround myself with have that same drive and use it to create interesting tools like Twisted. So besides taking for granted that this was the kind of stuff I wanted to do, it seemed like the kind of stuff I should want to do.

But even if I create a really interesting tool, what do I have? I don’t have a thing; I have a tool that can be used to build things. If I want a thing, I have to either now build it myself — starting from nearly zero despite all the work on the tool, because it can only do so much in isolation — or convince a bunch of other people to use my tool to build things. Then they’d be depending on my tool, which means I have to maintain and support it, which is even more time and effort poured into this non-thing.

Despite frequently being drawn to think about solving abstract tooling problems, it seems I truly want to make things. This is probably why I have a lot of abandoned projects boldly described as “let’s solve X problem forever!” — I go to scratch the itch, I do just enough work that it doesn’t itch any more, and then I lose interest.

I spent a few months quietly flailing over this minor existential crisis. I’d spent years daydreaming about making tools; what did I have if not that drive? I was having to force myself to work on what I thought were my passion projects.

Meanwhile, I’d vaguely intended to do some game development, but for some reason dragged my feet forever and then took my sweet time dipping my toes in the water. I did work on a text adventure, Runed Awakening, on and off… but it was a fractal of creative decisions and I had a hard time making all of them. It might’ve been too ambitious, despite feeling small, and that might’ve discouraged me from pursuing other kinds of games earlier.

A big part of it might have been the same reason I took so long to even give art a serious try. I thought of myself as a technical person, and art is a thing for creative people, so I’m simply disqualified, right? Maybe the same thing applies to games.

Lord knows I had enough trouble when I tried. I’d orbited the Doom community for years but never released a single finished level. I did finally give it a shot again, now that I had the time. Six months into my funemployment, I wrote a three-part guide on making Doom levels. Three months after that, I finally released one of my own.

I suppose that opened the floodgates; a couple weeks later, glip and I decided to try making something for the PICO-8, and then we did that (almost exactly a year ago!). Then kept doing it.

It’s been incredibly rewarding — far moreso than any “pure” tooling problem I’ve ever approached. Moreso than even something like veekun, which is a useful thing. People have thoughts and opinions on games. Games give people feelings, which they then tell you about. Most of the commentary on a reference website is that something is missing or incorrect.

I like doing creative work. There was never a singular moment when this dawned on me; it was a slow process over the course of a year or more. I probably should’ve had an inkling when I started drawing, half a year before I quit; even my early (and very rough) daily comics made people laugh, and I liked that a lot. Even the most well-crafted software doesn’t tend to bring joy to people, but amateur art can.

I still like doing technical work, but I prefer when it’s a means to a creative end. And, just as important, I prefer when it has a clear and constrained scope. “Make a library/tool for X” is a nebulous problem that could go in a great many directions; “make a bot that tweets Perlin noise” has a pretty definitive finish line. It was interesting to write a little physics engine, but I would’ve hated doing it if it weren’t for a game I were making and didn’t have the clear scope of “do what I need for this game”.

It feels like creative work is something I’ve been wanting to do for a long time. If this were a made-for-TV movie, I would’ve discovered this impulse one day and immediately revealed myself as a natural-born artistic genius of immense unrealized talent.

That didn’t happen. Instead I’ve found that even something as mundane as having ideas is a skill, and while it’s one I enjoy, I’ve barely ever exercised it at all. I have plenty of ideas with technical work, but I run into brick walls all the time with creative stuff.

How do I theme this area? Well, I don’t know. How do I think of something? I don’t know that either. It’s a strange paradox to have an urge to create things but not quite know what those things are.

It’s such a new and completely different kind of problem. There’s no right answer, or even an answer I can check for “correctness”. I can do anything. With no landmarks to start from, it’s easy to feel completely lost and just draw blanks.

I’ve essentially recalibrated the texture of stuff I work on, and I have to find some completely new ways to approach problems. I haven’t found them yet. I don’t think they’re anything that can be told or taught. But I’m starting to get there, and part of it is just accepting that I can’t treat these like problems with clear best solutions and clear algorithms to find those solutions.

A particularly glaring irony is that I’ve had a really tough problem designing abstract spaces, even though that’s exactly the kind of architecture I praise in Doom. It’s much trickier than it looks — a good abstract design is reminiscent of something without quite being that something.

I suppose it’s similar to a struggle I’ve had with art. I’m drawn to a cartoony style, and cartooning is also a mild form of abstraction, of whittling away details to leave only what’s most important. I’m reminded in particular of the forest background in fox flux — I was completely lost on how to make something reminiscent of a tree line. I knew enough to know that drawing trees would’ve made the background far too busy, but trees are naturally busy, so how do you represent that?

The answer glip gave me was to make big chunky leaf shapes around the edges and where light levels change. Merely overlapping those shapes implies depth well enough to convey the overall shape of the tree. The result works very well and looks very simple — yet it took a lot of effort just to get to the idea.

It reminds me of mathematical research, in a way? You know the general outcome you want, and you know the tools at your disposal, and it’s up to you to make some creative leaps. I don’t think there’s a way to directly learn how to approach that kind of problem; all you can do is look at what others have done and let it fuel your imagination.

I think I’m getting a little distracted here, but this is stuff that’s been rattling around lately.

If there’s a more personal meaning to the tree story, it’s that this is a thing I can do. I can learn it, and it makes sense to me, despite being a huge nerd.

Two and a half years ago, I never would’ve thought I’d ever make an entire game from scratch and do all the art for it. It was completely unfathomable. Maybe we can do a lot of things we don’t expect we’re capable of, if only we give them a serious shot.

And ask for help, of course. I have a hell of a time doing that. I did a painting recently that factored in mountains of glip’s advice, and on some level I feel like I didn’t quite do it myself, even though every stroke was made by my hand. Hell, I don’t even look at references nearly as much as I should. It feels like cheating, somehow? I know that’s ridiculous, but my natural impulse is to put my head down and figure it out myself. Maybe I’ve been doing that for too long with programming. Trust me, it doesn’t work quite so well in a brand new field.

I’m getting distracted again!

To answer your actual questions: how do I go about learning about myself? I don’t! It happens completely by accident. I’ll consciously examine my surface-level thoughts or behaviors or whatever, sure, but the serious fundamental revelations have all caught me completely by surprise — sometimes slowly, sometimes suddenly.

Most of them also came from listening to the people who observe me from the outside: I only started drawing in the first place because of some ridiculous deal I made with glip. At the time I thought they just wanted everyone to draw because art is their thing, but now I’m starting to suspect they’d caught on after eight years of watching me lament that I couldn’t draw.

I don’t know how I handle such discoveries, either. What is handling? I imagine someone discovering something and trying to come to grips with it, but I don’t know that I have quite that experience — my grappling usually comes earlier, when I’m still trying to figure the thing out despite not knowing that there’s a thing to find out. Once I know it, it’s on the table; I can’t un-know it or reject it meaningfully. All I can do is figure out what to do with it, and I approach that the same way I approach every other problem: by flailing at it and hoping for the best.

This isn’t quite 2000 words. Sorry. I’ve run out of things to say about me. This paragraph is very conspicuous filler. Banana. Atmosphere. Vocation.

Doom is 100% deterministic. Its random number generator is really a list of shuffled values; each request for a random number produces the next value in the list. There is no seed, either; a game always begins at the first value in the list. Thus, if you play the game twice with exactly identical input, you’ll see exactly the same playthrough: same damage, same monster behavior, and so on.

And that’s exactly what a Doom demo is: a file containing a recording of player input. To play back a demo, Doom runs the game as normal, except that it reads input from a file rather than the keyboard.

Multiplayer works the same way. Rather than passing around the entirety of the world state, Doom sends the player’s input to all the other players. Once a node has received input from every connected player, it advances the world by one tic. There’s no client or server; every peer talks to every other peer.

You can read the code if you want to, but at a glance, I don’t think there’s anything too surprising here. Only sending input means there’s not that much to send, and the receiving end just has to queue up packets from every peer and then play them back once it’s heard from everyone. The underlying transport was pluggable (this being the days before we’d even standardized on IP), which complicated things a bit, but the Unix port that’s on GitHub just uses UDP. The Doom Wiki has some further detail.

This approach is very clever and has a few significant advantages. Bandwidth requirements are fairly low, which is important if it happens to be 1993. Bandwidth and processing requirements are also completely unaffected by the size of the map, since map state never touches the network.

Unfortunately, it has some drawbacks as well. The biggest is that, well, sometimes you want to get the world state back in sync. What if a player drops and wants to reconnect? Everyone has to quit and reconnect to one another. What if an extra player wants to join in? It’s possible to load a saved game in multiplayer, but because the saved game won’t have an actor for the new player, you can’t really load it; you’d have to start fresh from the beginning of a map.

It’s fairly fundamental that Doom allows you to save your game at any moment… but there’s no way to load in the middle of a network game. Everyone has to quit and restart the game, loading the right save file from the command line. And if some players load the wrong save file… I’m not actually sure what happens! I’ve seen ZDoom detect the inconsistency and refuse to start the game, but I suspect that in vanilla Doom, players would have mismatched world states and their movements would look like nonsense when played back in each others’ worlds.

Ah, yes. Having the entire game state be generated independently by each peer leads to another big problem.

Maybe this wasn’t as big a deal with Doom, where you’d probably be playing with friends or acquaintances (or coworkers). Modern games have matchmaking that pits you against strangers, and the trouble with strangers is that a nontrivial number of them are assholes.

Doom is a very moddable game, and it doesn’t check that everyone is using exactly the same game data. As long as you don’t change anything that would alter the shape of the world or change the number of RNG rolls (since those would completely desynchronize you from other players), you can modify your own game however you like, and no one will be the wiser. For example, you might change the light level in a dark map, so you can see more easily than the other players. Lighting doesn’t affect the game, only how its drawn, and it doesn’t go over the network, so no one would be the wiser.

Or you could alter the executable itself! It knows everything about the game state, including the health and loadout of the other players; altering it to show you this information would give you an advantage. Also, all that’s sent is input; no one said the input had to come from a human. The game knows where all the other players are, so you could modify it to generate the right input to automatically aim at them. Congratulations; you’ve invented the aimbot.

I don’t know how you can reliably fix these issues. There seems to be an entire underground ecosystem built around playing cat and mouse with game developers. Perhaps the most infamous example is World of Warcraft, where people farm in-game gold as automatically as possible to sell to other players for real-world cash.

Egregious cheating in multiplayer really gets on my nerves; I couldn’t bear knowing that it was rampant in a game I’d made. So I will probably not be working on anything with random matchmaking anytime soon.

Starbound is a procedurally generated universe exploration game — like Terraria in space. Or, if you prefer, like Minecraft in space and also flat. Notably, it supports multiplayer, using the more familiar client/server approach. The server uses the same data files as single-player, but it runs as a separate process; if you want to run a server on your own machine, you run the server and then connect to localhost with the client.

I’ve run a server before, but that doesn’t tell me anything about how it works. Starbound is an interesting example because of the existence of StarryPy — a proxy server that can add some interesting extra behavior by intercepting packets going to and from the real server.

I modded StarryPy to print out every single decoded packet it received (from either the client or the server), then connected and immediately disconnected. (Note that these aren’t necessarily TCP packets; they’re just single messages in the Starbound protocol.) Here is my quick interpretation of what happens:

The client and server briefly negotiate a connection. The password, if any, is sent with a challenge and response.

The client sends a full description of its “ship world” — the player’s ship, which they take with them to other servers. The server sends a partial description of the planet the player is either on, or orbiting.

From here, the server and client mostly communicate world state in the form of small delta updates. StarryPy doesn’t delve into the exact format here, unfortunately. The world basically freezes around you during a multiplayer lag spike, though, so it’s safe to assume that the vast bulk of game simulation happens server-side, and the effects are broadcast to clients.

The protocol has specific message types for various player actions: damaging tiles, dropping items, connecting wires, collecting liquids, moving your ship, and so on. So the basic model is that the player can attempt to do stuff with the chunk of the world they’re looking at, and they’ll get a reaction whenever the server gets back to them.

(I’m dimly aware that some subset of object interactions can happen client-side, but I don’t know exactly which ones. The implications for custom scripted objects are… interesting. Actually, those are slightly hellish in general; Starbound is very moddable, but last I checked it has no way to send mods from the server to the client or anything similar, and by default the server doesn’t even enforce that everyone’s using the same set of mods… so it’s possible that you’ll have an object on your ship that’s only provided by a mod you have but the server lacks, and then who knows what happens.)

Starbound’s “fire and forget” approach reminds me a lot of IRC — a protocol I’ve even implemented, a little bit, kinda. IRC doesn’t have any way to match the messages you send to the responses you get back, and success is silent for some kinds of messages, so it’s impossible (in the general case) to know what caused an error. The most obvious fix for this would be to attach a message id to messages sent out by the client, and include the same id on responses from the server.

It doesn’t look like Starbound has message ids or any other solution to this problem — though StarryPy doesn’t document the protocol well enough for me to be sure. The server just sends a stream of stuff it thinks is important, and when it gets a request from the client, it queues up a response to that as well. It’s TCP, so the client should get all the right messages, eventually. Some of them might be slightly out of order depending on the order the client does stuff, but that’s not a big deal; anyway, the server knows the canonical state.

I bring up IRC because I’m kind of at the limit of things that I know. But one of those things is that IRC is simultaneously very rickety and wildly successful: it’s a decade older than Google and still in use. (Some recent offerings are starting to eat its lunch, but those are really because clients are inaccessible to new users and the protocol hasn’t evolved much. The problems with the fundamental design of the protocol are only obvious to server and client authors.)

Doom’s cheery assumption that the game will play out the same way for every player feels similarly rickety. Obviously it works — well enough that you can go play multiplayer Doom with exactly the same approach right now, 24 years later — but for something as complex as an FPS it really doesn’t feel like it should.

So while I don’t have enough experience writing multiplayer games to give you a run-down of how to do it, I think the lesson here is that you can get pretty far with simple ideas. Maybe your game isn’t deterministic like Doom — although there’s no reason it couldn’t be — but you probably still have to save the game, or at least restore the state of the world on death/loss/restart, right? There you go: you already have a fragment of a concept of entity state outside the actual entities. Codify that, stick it on the network, and see what happens.

I don’t know if I’ll be doing any significant multiplayer development myself; I don’t even play many multiplayer games. But I’d always assumed it would be a nigh-impossible feat of architectural engineering, and I’m starting to think that maybe it’s no more difficult than anything else in game dev. Easy to fudge, hard to do well, impossible to truly get right so give up that train of thought right now.

Also now I am definitely thinking about how a multiplayer puzzle-platformer would work.

While most gamers do their best to win fair and square, there are always those who try to cheat themselves to victory.

With the growth of the gaming industry, the market for “cheats,” “hacks” and bots has also grown spectacularly. The German company Bossland is one of the frontrunners in this area.

Bossland created cheats and bots for several Blizzard games including World of Warcraft, Diablo 3, Heroes of the Storm, Hearthstone, and Overwatch, handing its users an unfair advantage over the competition. Blizzard is not happy with these and the two companies have been battling in court for quite some time, both in the US and Germany.

Last week a prominent US case came to a conclusion in the California District Court. Because Bossland decided not to represent itself, it was a relatively easy for Blizzard, which was awarded several million in copyright damages.

The court agreed that hacks developed by Bossland effectively bypassed Blizzard’s cheat protection technology “Warden,” violating the DMCA. By reverse engineering the games and allowing users to play modified versions, Bossland infringed Blizzard’s copyrights and allowed its users to do the same.

“Bossland materially contributes to infringement by creating the Bossland Hacks, making the Bossland Hacks available to the public, instructing users how to install and operate the Bossland Hacks, and enabling users to use the software to create derivative works,” the court’s order reads (pdf).

The WoW Honorbuddy

The infringing actions are damaging to the game maker as they render its anti-cheat protection ineffective. The cheaters, subsequently, ruin the gaming experience for other players who may lose interest, causing additional damage.

“Blizzard has established a showing of resulting damage or harm because Blizzard expends a substantial amount of money combating the use of the Bossland Hacks to ensure fair game play,” the court writes.

“Additionally, players of the Blizzard Games lodge complaints against cheating players, which has caused users to grow dissatisfied with the Blizzard Games and cease playing. Accordingly, the in-game cheating also harms Blizzard’s goodwill and reputation.”

As a result, the court grants the statutory copyright damages Blizzard requested for 42,818 violations within the United States, totaling $8,563,600. In addition, the game developer is entitled to $174,872 in attorneys’ fees.

To prevent further damage, Bossland is also prohibited from marketing or sellings its cheats in the United States. This applies to hacks including “Honorbuddy,” “Demonbuddy,”
“Stormbuddy,” “Hearthbuddy,” and “Watchover Tyrant,” as well as any other software designed to exploit Blizzard games.

While its a hefty judgment, the order doesn’t really come as a surprise given that the German cheat maker failed to defend itself.

Bossland CEO Zwetan Letschew previously informed TorrentFreak that his company would continue the legal battle after the issue of a default judgment. Whatever the outcome, the cheats will remain widely available outside of the US for now.

Wired is reporting on a new slot machine hack. A Russian group has reverse-engineered a particular brand of slot machine — from Austrian company Novomatic — and can simulate and predict the pseudo-random number generator.

The cell phones from Pechanga, combined with intelligence from investigations in Missouri and Europe, revealed key details. According to Willy Allison, a Las Vegas­-based casino security consultant who has been tracking the Russian scam for years, the operatives use their phones to record about two dozen spins on a game they aim to cheat. They upload that footage to a technical staff in St. Petersburg, who analyze the video and calculate the machine’s pattern based on what they know about the model’s pseudorandom number generator. Finally, the St. Petersburg team transmits a list of timing markers to a custom app on the operative’s phone; those markers cause the handset to vibrate roughly 0.25 seconds before the operative should press the spin button.

“The normal reaction time for a human is about a quarter of a second, which is why they do that,” says Allison, who is also the founder of the annual World Game Protection Conference. The timed spins are not always successful, but they result in far more payouts than a machine normally awards: Individual scammers typically win more than $10,000 per day. (Allison notes that those operatives try to keep their winnings on each machine to less than $1,000, to avoid arousing suspicion.) A four-person team working multiple casinos can earn upwards of $250,000 in a single week.

The easy solution is to use a random-number generator that accepts local entropy, like Fortuna. But there’s probably no way to easily reprogram those old machines.

Now it’s time for NaNoWriMo! Almost. I don’t have any immediate interest in writing a novel, but I do have plenty of other stuff that needs writing — blog posts, my book, Runed Awakening, etc. So I’m going to try to write 100,000 words this month, spread across whatever.

Rules:

I’m only measuring, like, works. I’ll count this page, as short as it is, because it’s still a single self-contained thing that took some writing effort. But no tweets or IRC or the like.

I’m counting with vim’s gC-gorwc -w, whichever is more convenient. The former is easier for single files I edit in vim; the latter is easier for multiple files or stuff I edit outside of vim.

I’m making absolutely zero effort to distinguish between English text, code, comments, etc.; whatever the word count is, that’s what it is. So code snippets in the book will count, as will markup in blog posts. Runed Awakening is a weird case, but I’m choosing to count it because it’s inherently a text-based game, plus it’s written in a prosaic language. On the other hand, dialogue for Isaac HD does not count, because it’s a few bits of text in what is otherwise just a Lua codebase.

Only daily net change counts. This rule punishes me for editing, but that’s the entire point of NaNoWriMo’s focus on word count: to get something written rather than linger on a section forever and edit it to death. I tend to do far too much of the latter.

This rule already bit me on day one, where I made some significant progress on Runed Awakening but ended up with a net word count of -762 because it involved some serious refactoring. Oops. Turns out word-counting code is an even worse measure of productivity than line-counting code.

These rules are specifically crafted to nudge me into working a lot more on my book and Runed Awakening, those two things I’d hoped to get a lot further on in the last three months. And unlike Inktober, blog posts contribute towards my preposterous goal rather than being at odds with it.

With one week down, so far I’m at +8077 words. I got off to a pretty slow (negative, in fact) start, and then spent a day out of action from an ear infection, so I’m a bit behind. Hoping I can still catch up as I get used to this whole “don’t rewrite the same paragraph over and over for hours” approach.

I posted a little “what type am I” meme on Twitter and drew some of the interesting responses. I intended to draw a couple more, but then I got knocked on my ass and my brain stopped working. I still might get back to them later.

runed awakening: I realized I didn’t need all the complexity of (and fallout caused by) the dialogue extension I was using, so I ditched it in favor of something much simpler. I cleaned up some stuff, fixed some stuff, improved some stuff, and started on some stuff. You know.

book: I’m working on the PICO-8 chapter, since I’ve actually finished the games it describes. I’m having to speedily reconstruct the story of how I wrote Under Construction, which is interesting. I hope it still comes out like a story and not a tutorial.

As for the three big things, well, they sort of went down the drain. I thought they might; I don’t tend to be very good at sticking with the same thing for a long and contiguous block of time. I’m still making steady progress on all of them, though, and I did some other interesting stuff in the last three months, so I’m satisfied regardless.

With November devoted almost exclusively to writing, I’m really hoping I can finally have a draft chapter of the book ready for Patreon by the end of the month. That $4 tier has kinda been languishing, sorry.

Now it’s time for NaNoWriMo! Almost. I don’t have any immediate interest in writing a novel, but I do have plenty of other stuff that needs writing — blog posts, my book, Runed Awakening, etc. So I’m going to try to write 100,000 words this month, spread across whatever.

Rules:

I’m only measuring, like, works. I’ll count this page, as short as it is, because it’s still a single self-contained thing that took some writing effort. But no tweets or IRC or the like.

I’m counting with vim’s gC-gorwc -w, whichever is more convenient. The former is easier for single files I edit in vim; the latter is easier for multiple files or stuff I edit outside of vim.

I’m making absolutely zero effort to distinguish between English text, code, comments, etc.; whatever the word count is, that’s what it is. So code snippets in the book will count, as will markup in blog posts. Runed Awakening is a weird case, but I’m choosing to count it because it’s inherently a text-based game, plus it’s written in a prosaic language. On the other hand, dialogue for Isaac HD does not count, because it’s a few bits of text in what is otherwise just a Lua codebase.

Only daily net change counts. This rule punishes me for editing, but that’s the entire point of NaNoWriMo’s focus on word count: to get something written rather than linger on a section forever and edit it to death. I tend to do far too much of the latter.

This rule already bit me on day one, where I made some significant progress on Runed Awakening but ended up with a net word count of -762 because it involved some serious refactoring. Oops. Turns out word-counting code is an even worse measure of productivity than line-counting code.

These rules are specifically crafted to nudge me into working a lot more on my book and Runed Awakening, those two things I’d hoped to get a lot further on in the last three months. And unlike Inktober, blog posts contribute towards my preposterous goal rather than being at odds with it.

With one week down, so far I’m at +8077 words. I got off to a pretty slow (negative, in fact) start, and then spent a day out of action from an ear infection, so I’m a bit behind. Hoping I can still catch up as I get used to this whole “don’t rewrite the same paragraph over and over for hours” approach.

I posted a little “what type am I” meme on Twitter and drew some of the interesting responses. I intended to draw a couple more, but then I got knocked on my ass and my brain stopped working. I still might get back to them later.

runed awakening: I realized I didn’t need all the complexity of (and fallout caused by) the dialogue extension I was using, so I ditched it in favor of something much simpler. I cleaned up some stuff, fixed some stuff, improved some stuff, and started on some stuff. You know.

book: I’m working on the PICO-8 chapter, since I’ve actually finished the games it describes. I’m having to speedily reconstruct the story of how I wrote Under Construction, which is interesting. I hope it still comes out like a story and not a tutorial.

As for the three big things, well, they sort of went down the drain. I thought they might; I don’t tend to be very good at sticking with the same thing for a long and contiguous block of time. I’m still making steady progress on all of them, though, and I did some other interesting stuff in the last three months, so I’m satisfied regardless.

With November devoted almost exclusively to writing, I’m really hoping I can finally have a draft chapter of the book ready for Patreon by the end of the month. That $4 tier has kinda been languishing, sorry.

art: Almost the last of the ink drawings of Pokémon, all of them done in fountain pen now. I filled up the sketchbook I’d been using and switched to a 9”×12” one. Much to my surprise, that made the inks take longer.

mario maker: One of the level themes I got was “The Wreckage”, and I didn’t know how to speedmap that in Doom in only an hour, but it sounded like an interesting concept for a Mario level.

I managed to catch up on writing by the end of the month (by cheating slightly), so I’m starting fresh in November. The “three big things” obviously went out the window in favor of Inktober, but I’m okay with that. I’ve got something planned for this next month that should make up for it, anyway.

art: Almost the last of the ink drawings of Pokémon, all of them done in fountain pen now. I filled up the sketchbook I’d been using and switched to a 9”×12” one. Much to my surprise, that made the inks take longer.

mario maker: One of the level themes I got was “The Wreckage”, and I didn’t know how to speedmap that in Doom in only an hour, but it sounded like an interesting concept for a Mario level.

I managed to catch up on writing by the end of the month (by cheating slightly), so I’m starting fresh in November. The “three big things” obviously went out the window in favor of Inktober, but I’m okay with that. I’ve got something planned for this next month that should make up for it, anyway.

Tags

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.