NOTE: This post is a little less thought-out than my usual programming posts. This was written pretty much on the fly as I was experimenting with stuff and not after I’d reflected on it. I’m not even sure it will make sense. Give it a try.

Today we’re going to be talking about Threes!, an iOS game. I haven’t played that version, but I’ve played this web-based clone. For the purposes of this discussion, you should probably go play the game, get addicted for a few days (everyone does) and then come back here once the mania passes. It will be easier to follow the discussion that way.

If you don’t have that kind of time, then here’s a basic run-down of the gameplay:

You play using the arrow keys. Tiles will attempt to move in the given direction. If a blue slides into a red (or vice-versa) they merge to form a 3. From there it follows a simple pattern of matching like with like. 3+3=6. 6+6=12. 12+12=24. 24+24=48. And so on. The trick is that every time you move, a new tile is added to the board. If I shift the pieces up, then a new tile slides in on the bottom row. The game ends when the board fills up such that no more moves are possible.
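For readers who haven't played, here is a minimal C++ sketch of one row sliding left under the original rules. It assumes the Threes! convention that each tile moves at most one cell per swipe, so at most one merge happens per row. The names Row, canMergeOriginal and slideLeft are hypothetical; this is not the code used for this post.

```cpp
#include <array>
#include <cassert>

// Hypothetical row representation: 0 = empty, 1 = blue, 2 = red, 3+ = white tiles.
using Row = std::array<int, 4>;

// Original rules: blue + red (1 + 2) makes 3; from 3 upward only equal tiles combine.
bool canMergeOriginal(int a, int b) {
    assert(a >= 0 && b >= 0);
    if (a == 0 || b == 0) return false;
    if (a + b == 3) return true;  // 1 + 2 in either order
    return a >= 3 && a == b;      // 3+3, 6+6, 12+12, ...
}

// Slide a row toward index 0. Each tile moves at most one cell (assumed Threes!
// behavior). Once something moves or merges, every tile behind it simply shifts
// into the freed cell, so at most one merge happens per row.
// Returns true if anything moved.
bool slideLeft(Row& row) {
    bool moved = false;
    for (int i = 1; i < 4; ++i) {
        if (row[i] == 0) continue;
        if (row[i - 1] == 0) {
            row[i - 1] = row[i];  // step into an empty cell
            row[i] = 0;
            moved = true;
        } else if (canMergeOriginal(row[i - 1], row[i])) {
            row[i - 1] += row[i];  // merged value is the sum
            row[i] = 0;
            moved = true;
        }
    }
    return moved;
}
```

Sliding right, up, or down is the same operation applied to a reversed row or to a column. A row of four reds (2, 2, 2, 2) does not move at all.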

So your apparent objective is to keep merging tiles to make ever-larger numbers. But the actual challenge is to simply merge tiles faster than they appear to keep the board from filling in. If you play a couple of times, you’ll probably get a score of a few hundred.

You normally expect your scores to go up as you play a game. Over time, your skill improves and you’re able to do better. Except, that’s not quite how things went for me. Sure, I repeatedly broke my high score, eventually playing a game all the way to about 7,500 (it’s been reported that scores in excess of 21,000 are possible). But mixed in there were still a lot of 150-point games. When that kind of thing happened I always assumed that I had stopped paying attention. But this kept happening, no matter how hard I “tried”. Some games dead-ended early and some went a long way, and my results didn’t seem to line up with how much effort I was putting in.

This makes me think that the game has a huge element of luck. I wanted to play around with this idea, so I decided to make my own version of the game so I could explore the mechanics.

(All of this is written in C++ using old-school OpenGL. It’s overkill for an afternoon project like this, but I’ve already got the boilerplate code handy and using that is faster than learning Python or whatever you kids use for your prototyping work these days.)

First off, this business with the red and blue tiles is kind of suspect. The player needs equal numbers of red and blue tiles in order to combine them. So if I get four red tiles in succession they will eat up a quarter of my play area and I’ll have no way to get rid of them. (According to the Touch Arcade article, the RNG compensates for this, but it takes several moves for the needed pieces to show up. That’s enough time to kill a game.) That can doom a game through no fault of the user. Having four more of one color than the other is rare (for varying definitions of “rare”) at any particular moment. But in the course of a game that lasts 100 or so moves it starts to become likely. Actually, it’s worse than that. It’s “likely” in the sense that it will happen in some games and not others. I suspect that my long-running games are ones where this sort of thing – against the odds – didn’t happen, thus letting me squeak through those tough points in the game where you’ve got a lot of high-value pieces on the board that aren’t quite ready to combine.
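To put a rough number on "starts to become likely", here is a small C++ sketch that treats the red/blue imbalance as a random walk. It assumes every new low tile is red or blue with equal odds and ignores the compensating RNG, so it overstates the risk for the real game. Because each merge removes one red and one blue together, the imbalance on the board equals the running difference between reds and blues drawn. The function name is hypothetical.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Probability that |reds - blues| reaches `limit` at some point within `draws`
// low-tile draws, assuming each draw is red or blue with probability 1/2.
double probImbalanceReached(int draws, int limit) {
    assert(limit > 0 && draws >= 0);
    // dist[k] = probability of sitting at imbalance (k - limit) without having hit ±limit yet.
    std::vector<double> dist(2 * limit + 1, 0.0);
    dist[limit] = 1.0;  // start balanced
    double hit = 0.0;
    for (int step = 0; step < draws; ++step) {
        std::vector<double> next(dist.size(), 0.0);
        for (int k = 1; k < 2 * limit; ++k) {  // interior (not yet absorbed) states
            double p = dist[k] * 0.5;
            for (int nk : {k - 1, k + 1}) {
                if (nk == 0 || nk == 2 * limit) hit += p;  // imbalance reached ±limit
                else next[nk] += p;
            }
        }
        dist = next;
    }
    return hit;
}
```

The four-draw case can be checked by hand: only all-red or all-blue reaches an imbalance of four, so the probability is 2/16 = 0.125. A 100-move game gives the walk far more chances to wander that far.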

If this were done the other way with a series of direct combinations, then this randomness would be mitigated. If 1+1=2 and (humor me here) 2+2=3, then there wouldn’t be any combination of four low-value tiles it could throw at the player that would be mutually inert. Something would be able to combine.

It’s entirely possible the original designer had a good reason for setting things up this way, but I don’t know what it was. Maybe the concern was that it would be too easy to “solve” the game without this randomness. Maybe everyone would end up with about the same score without it. (If this is true, then it means this is a game of luck where you use skill to reach as much of your determined-by-luck potential as possible. That’s not bad or anything. Lots of games work that way.)

I don’t know. But I’m going to build an alternate rule set for my version. In my rules, I’m going to use direct progression using powers of two. I know my powers of two (and more importantly, my square roots) a lot better than I know all these multiples of three, which will make it easier to wrap my head around the game. So 1+1=2, 2+2=4, 4+4=8, 8+8=16, and so on.
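The difference between the two rule sets fits in a few lines of C++. This sketch (hypothetical names, tiles stored as their face values) checks whether a set of tiles is "mutually inert", ignoring board position. Under the original rules four reds can never combine with each other; under equal-tiles-combine rules, any three tiles drawn from just 1s and 2s must include a matching pair, by the pigeonhole principle. The same argument covers the 2+2=3 hypothetical above. In both rule sets the merged tile's value is simply the sum of the two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Rules { Original, Shamus };

// Can two tiles (face values, both > 0) combine?
bool canMerge(int a, int b, Rules rules) {
    assert(a > 0 && b > 0);
    if (rules == Rules::Shamus) return a == b;  // 1+1, 2+2, 4+4, ...
    return a + b == 3 || (a >= 3 && a == b);    // 1+2, then 3+3, 6+6, ...
}

// True if no two tiles in the set could ever combine, ignoring board position.
bool mutuallyInert(const std::vector<int>& tiles, Rules rules) {
    for (std::size_t i = 0; i < tiles.size(); ++i)
        for (std::size_t j = i + 1; j < tiles.size(); ++j)
            if (canMerge(tiles[i], tiles[j], rules)) return false;
    return true;
}
```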

I’ll explain the information on the left a little later on. Let’s just get the basics down first.

Basing things on powers of two avoids the obviously ridiculous business of having 2+2=3. It also lets us use a cool shorthand for high-value tiles. 1,024 can be 1k and 1,048,576 can be 1M. (kilobyte and megabyte, respectively. It’s educational!)
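A sketch of the shorthand, assuming tiles are stored as 64-bit powers of two. Continuing the suffixes past M with G and T is an assumption (the post only mentions k and M), and tileLabel is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Label a power-of-two tile with 1024-based suffixes: 1024 -> "1k", 1048576 -> "1M".
std::string tileLabel(std::uint64_t value) {
    assert(value > 0 && (value & (value - 1)) == 0);  // powers of two only
    const char* suffixes[] = {"", "k", "M", "G", "T"};
    int idx = 0;
    while (value >= 1024 && idx < 4) {
        value /= 1024;  // exact, since value is a power of two >= 1024
        ++idx;
    }
    return std::to_string(value) + suffixes[idx];
}
```

Tiles below 1,024 print as plain numbers, so 512 stays "512" and 2,048 becomes "2k".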

So now I’m going to build an AI to play the game for me. It’s not very bright. It only looks at the next move and doesn’t attempt to plan several moves in advance. It just attempts to keep the board as clear as possible. Barring that, it will try to move combine-able tiles into place next to each other. For scoring, we don’t actually care about “points”. We’re just interested in how long a game lasts.
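A minimal sketch of the kind of board rating described above. The weights (10 per empty cell, 1 per adjacent equal pair) are illustrative guesses, not the values behind the post, and it assumes the power-of-two rules where equal tiles combine. Board and rateBoard are hypothetical names.

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;
using Board = std::array<std::array<int, N>, N>;  // 0 = empty, otherwise tile value

// Hypothetical rating: empty cells matter most, and every orthogonally adjacent
// pair of equal tiles (combinable under the power-of-two rules) adds a bonus.
int rateBoard(const Board& b) {
    int empty = 0, pairs = 0;
    for (int r = 0; r < N; ++r) {
        for (int c = 0; c < N; ++c) {
            assert(b[r][c] >= 0);
            if (b[r][c] == 0) { ++empty; continue; }
            if (c + 1 < N && b[r][c + 1] == b[r][c]) ++pairs;  // right neighbor
            if (r + 1 < N && b[r + 1][c] == b[r][c]) ++pairs;  // neighbor below
        }
    }
    return 10 * empty + pairs;
}
```

A one-move AI along these lines would apply each of the four moves to a copy of the board, rate each result, and keep the best.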

So, I’ll have my AI play a round of 32 games. First it will play according to the original rules where the first two tiles must combine to make the third. Then I’ll do another run where the first tile combines with itself to make the second, and the second combines with itself to make the third, etc.

The results? Kind of a surprise:

This is a run of 32 games, as played by the same AI, using the same pseudo-random sequence, under the two rule sets. The red line represents the game according to my rules. The blue one is the original rules. The higher the line, the longer the games. So in the very first game the AI – playing under the original rules – lost the game just before turn 100. Then playing the exact same game under my rules, the AI ended somewhere past 300 turns.

You can see my rules are quite a bit easier. (The games are longer overall.) But what I didn’t expect is that both rule sets are still incredibly random. My rules allow the AI to score anywhere from 150 to 525. The original rules have games that run from 50 to about 225 or so. Which one is “more random”? Original rules have a lower delta between the top and bottom of the range, although the delta is a larger portion of the average. Roughly:

The best Original-rule games scored five times higher than the worst ones, while the top Shamus-rule games only scored about three times higher than the worst.

HOWEVER:

The best Original-rule games were about 175 higher than the worst, and the best Shamus-rule games were ~375 higher.
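Both ways of describing the spread are easy to compute from a run's move counts. A sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Spread {
    double ratio;  // longest game / shortest game
    int delta;     // longest game - shortest game, in moves
};

Spread spreadOf(const std::vector<int>& moves) {
    assert(!moves.empty());
    auto mm = std::minmax_element(moves.begin(), moves.end());
    int shortest = *mm.first, longest = *mm.second;
    assert(shortest > 0);
    return {static_cast<double>(longest) / shortest, longest - shortest};
}
```

Fed only the approximate endpoints quoted above, spreadOf({50, 225}) gives a ratio of 4.5 and a delta of 175, while spreadOf({150, 525}) gives 3.5 and 375.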

I think I’d need the help of statistics nerds to explore this further. There are a lot of ways to look at this data. The point is, I don’t know which one of these counts as “less random” in the totally subjective sense of feeling more fair to the player.

Now let’s see what happens when we make the play area larger. We’ll do the same run, comparing Original and Shamus games on a 5×5 grid instead of a 4×4.

A game I played myself. (No AI.) Here we’re pretty close to the end.

In case you’re curious about the text: (Some of which is debugging info.)

Score: The scoring system used by the original game is a little mysterious. For my program, I’m just adding up all the tiles currently in play.

Moves: The real measure of success, in terms of appraising your strategy.

Ruleset: Original or Shamus.

Highest: This is used when figuring out what the next tile will be. In my game, it halves the exponent of the highest piece on the board. So if your highest tile is 256, that’s 2^8. Halving the exponent gives us 2^4, which is 16. So the “Next Tile” will give us 1, 2, 4, 8, or 16. Without this, games can take bloody ages before you start running out of room.
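A sketch of the "Highest" rule in C++. The post only gives an even exponent, so rounding odd exponents down with integer division is an assumption. Picking uniformly among the allowed exponents is also an assumption. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Highest tile on the board is 2^n; new tiles may be 2^0 .. 2^(n/2).
int maxNextExponent(int highestExponent) {
    assert(highestExponent >= 0);
    return highestExponent / 2;  // integer division rounds odd exponents down
}

// Draw the next tile uniformly from the allowed powers of two.
std::uint64_t nextTile(int highestExponent, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, maxNextExponent(highestExponent));
    return std::uint64_t{1} << pick(rng);
}
```

With a 256 on the board (exponent 8), maxNextExponent(8) is 4, so nextTile can only return 1, 2, 4, 8, or 16.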

AI Rating: This is how much the AI “likes” this particular board layout. More empty space=better. More combine-able pieces next to each other=better. This is just for my own debugging purposes.

AI Movement: This number just tells me which direction[s] the AI can move. Again, debugging.

Filled: What percent of the board is filled. The game begins with mostly empty space, but quickly rises to about 60% full. It then plateaus in the 60-70 range for the course of the game. Once you hit 85%+, you hit a tipping point where the lack of movement options leads to having even fewer options, and the game usually ends.

Playtime: This is how long the current game would take if played by a human that made a move every 1.5 seconds or so. This is important later.

So if we make the game area 5×5, the outcomes look like this:

Okay, so let’s try it again on a 7×7 board:

For the record, a game of 11,000 moves or so would take you right around 5 hours, assuming you averaged a second and a half per move. (It takes my AI about 4 seconds to play through the same game.)
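The playtime figure is just moves multiplied by a fixed pace. A sketch assuming the 1.5-seconds-per-move estimate from the Playtime readout; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Estimated human playtime in seconds at a fixed pace.
double playtimeSeconds(long moves, double secondsPerMove = 1.5) {
    assert(moves >= 0 && secondsPerMove > 0);
    return moves * secondsPerMove;
}

// Format a duration as "Xh Ym", dropping leftover seconds.
std::string formatHoursMinutes(double seconds) {
    assert(seconds >= 0);
    long totalMinutes = static_cast<long>(seconds / 60.0);
    return std::to_string(totalMinutes / 60) + "h " + std::to_string(totalMinutes % 60) + "m";
}
```

For 11,000 moves this gives 16,500 seconds, formatted as "4h 35m", which rounds up to the five-hour figure.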

Well, it looks like I was wrong. My rule set is somehow more random, not less. I don’t know how. Maybe I’ve got a bug or design flaw in my AI that’s keeping it from performing properly. In the end, this was less illuminating than I’d hoped.

An AI game in progress. Note that the “playtime” is how long a human would take. The AI had only been playing for about two minutes at this point.

Still, we did learn a few interesting things:

As you might expect, making the board larger adds dramatically to the length of the game. A 4×4 takes a few minutes. A 6×6 takes about half an hour. An 8×8 takes about 5 hours. 10×10 is a couple of days. 12×12 is about ten days. (Again: This is assuming non-stop rapid-fire movements.)

For anything larger than 5×5, I think the game needs a little something else. Some special pieces or a powerup or something.

On larger boards, a lot of the interesting activity happens in the very last stages of the game. It’s kind of like starting a game of Tetris at level zero. You’ve got half an hour of really boring play. Then three minutes of challenge, then a minute of sheer chaos where it all falls apart. But unlike Tetris, we can’t just “start” the player near the endgame, because how they fare in the endgame is a measure of how careful and disciplined they have been at managing the board during the “boring” parts. This probably means the ideal board size is 6×6 or less. Anything larger, and it just takes too dang long before you can see the results of your efforts.

There are a lot of interesting things you can do with the “next tile” logic. You could have it only give you 1’s and 2’s, which would make the game stupidly long and boring. But maybe my approach is too conservative. Maybe instead of 2^(n/2), it would be more interesting to use 2^(n-2), or just 2^n. The latter would mean that once you get a 256, then it will start randomly giving you 256’s. That would make the difficulty ramp up quickly. It might also make the game more random.
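The three candidate caps are easy to compare side by side. In this sketch n is the exponent of the highest tile. The enum and function names are hypothetical, the half-exponent case uses integer division, and clamping at zero on a nearly empty board is an assumption.

```cpp
#include <algorithm>
#include <cassert>

enum class NextTilePolicy { HalfExponent, ExponentMinusTwo, FullExponent };

// Largest exponent the "next tile" may use when the highest tile is 2^n.
int maxExponent(int n, NextTilePolicy policy) {
    assert(n >= 0);
    switch (policy) {
        case NextTilePolicy::HalfExponent:     return n / 2;              // 2^(n/2)
        case NextTilePolicy::ExponentMinusTwo: return std::max(0, n - 2); // 2^(n-2)
        case NextTilePolicy::FullExponent:     return n;                  // 2^n
    }
    return 0;  // not reached for valid enum values
}
```

With a 256 on the board (n = 8), the largest possible new tile is 16, 64, or 256 respectively.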

Then again, this post proves I’m probably bad at intuiting how “random” a system is.

I offer this post as an example of why the constant “cloning” of mobile games isn’t necessarily a bad thing. Threes! is a dead-simple game, but here I’ve stumbled on several interesting variants of number-combining that are thus far left totally unexplored. I’m sure there are other variations you could play with. You could have a half dozen of these games on iOS and each one of them would be unique and worthwhile. Or someone could put out a mindless re-skin with identical mechanics. A lot of it depends on who is making the game and why they’re doing it.

It goes back to the “We make games to make money” vs. “We make money to make games” problem. If all you want is money and you don’t care about games, then you’ll look at what’s selling and do a straight-up clone. If you love thinking about and exploring mechanics then you would probably find direct cloning to be tedious and boring. You’ll be driven to make something different – something you want to play that doesn’t already exist – and you’ll put it up for sale as a way of getting paid for your efforts.

In any case, this is a gem of a game. Lots of neat stuff to think about. Do give it a try if you haven’t already.

Not really, because the standard deviation only measures the dispersion from the mean. Because the mean numbers of turns are already so different, it’s unlikely to be the “right” measure: two games that went 20 moves above the mean would make the same contribution to the standard deviation, but I think a discrepancy of 20 should be considered more exceptional under the original rules than under Shamus’s rules.

An alternative would be to measure number of moves as percentage of the average, and look at the standard deviation of that.

The more standard approach is the coefficient of variation, which is the standard deviation divided by the mean. The CV has a bit of literature behind it, and there is a distribution associated with it under the usual normality assumptions (and a non-zero mean assumption, which clearly applies here).
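The coefficient of variation takes a few lines of C++. This sketch uses the population standard deviation (dividing by n); a sample estimate would divide by n - 1. Multiplying the CV by 100 gives exactly the standard deviation of moves expressed as a percentage of the average, so the two suggestions agree. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Coefficient of variation: population standard deviation divided by the mean.
double coefficientOfVariation(const std::vector<double>& xs) {
    assert(!xs.empty());
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= xs.size();
    assert(mean != 0.0);
    double var = 0.0;
    for (double x : xs) var += (x - mean) * (x - mean);
    var /= xs.size();
    return std::sqrt(var) / mean;
}
```

Because it is scale-free, doubling every game length leaves the CV unchanged, which is what makes it usable across rule sets with very different averages.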

The even lazier (less precise but visually more intuitive) approach would be to plot the graphs you showed in the post with a logarithmic y-axis. That way, a constant vertical distance means a constant factor between two points.

Also, as someone else mentioned, doing a few more runs and putting the results in histograms would be neat, but it’s work and you may not want to spend the time.

Game 1: 715(or so) points. Also, “Wait, how does this game work?”
Game 2: 900-ish points
Game 3: 3117 points. Bored now. (This is a defense mechanism. Otherwise I’d still be playing and wouldn’t have finished reading the blog post.)

Yeah, after the second or third game, I finally “got” the rules.
I was bored on game one, however.
Mostly because I want games with lots of interactive stuff to mess around with, lots of skill required, or cool graphics/aesthetics.
I guess technically doing well at this game requires skill, but the idea of moving numbers around makes it seem utterly pointless to me.
At least in Skyrim, the numbers are shaped like orcs and zombies! :)

My scores steadily went up the first three games, then dropped drastically on the fourth. For the fifth game I just spastically hit the direction buttons at random…and ended up blowing away my previous scores.

Small sample to be sure, but interesting nonetheless.

Edit: After more playing, I’ve realized that I didn’t really “get” the game before. Now that I understand the rules better, pure luck is no longer superior to thought-out actions. However, like Shamus and many others, my results are all over the place.

Doing well requires some planning and forethought. In the third game, I think I actually got one of the blocks to 192 (mostly luck after I got a pair of 96s on the board), mostly by looking at what I had and trying to combine numbers both as opportune AND in such a way that they would end up nearer to where I wanted to use the results.

I suspect a skilled player could reliably score higher than the AI that Shamus put together, but the AI itself is probably sufficient for gauging the “complexity” of a particular puzzle.

32 is a pretty small sample size. If your AI can play a game in a few seconds, then leave it running in the background and get 1,000 or so games (~3 hours at 10 seconds a game). Then display the results as a histogram; that should show what is happening more clearly.
If you’re more interested in the difference between the rulesets, check the difference in number of moves between games with the same random number seed.
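Both suggestions are cheap to compute once the move counts are logged. In this sketch, game i of each run is assumed to share a seed, as in the post's experiment; the bucket width and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Count games into fixed-width buckets of game length (key = bucket start).
std::map<int, int> histogram(const std::vector<int>& moves, int bucketWidth) {
    assert(bucketWidth > 0);
    std::map<int, int> counts;
    for (int m : moves) {
        assert(m >= 0);
        ++counts[(m / bucketWidth) * bucketWidth];
    }
    return counts;
}

// Mean per-seed difference (b - a), where a[i] and b[i] used the same seed.
double meanPairedDifference(const std::vector<int>& a, const std::vector<int>& b) {
    assert(!a.empty() && a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += b[i] - a[i];
    return sum / a.size();
}
```

A positive mean paired difference means the second rule set lasted longer on the same seeds; looking at the spread of those per-seed differences says more than comparing the two runs' overall ranges.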

1k+ in my second game, and 7668 after a few. It seems like a good idea is to try to keep one big box in the middle, and add to it, but I think a better idea is to keep a string of high numbers towards the center- two 24s, a 48, and a 96 strung in there is better than a 192 and two blank spots.

It feels like the game gives a 1/3 chance of getting a 1, a 1/3 chance of getting a 2, and a 1/3 chance of getting anything else. That might be why your game is seemingly more random, you’re giving a bigger spread of what’s coming in.

Maybe more information could reduce randomness. But to account for that, you need your AI to know what to do with this additional information. I was thinking about showing the next N cards to come.

re: Randomness. This game, the original flavor at least, isn’t actually all that random. It’s /coarse/, and that seems to produce the arbitrariness you’re experiencing. The RNG can only select between three possible outcomes when generating a new tile, and as the game progresses the chances of you having somewhere to put any given new tile dwindle since you have to build up tiles to match the ones you already have and new tiles can become trapped between stacks they can’t join.

If, as you suggest, the game were to draw from tile sizes you’ve surpassed as you progress, I would posit this to have a smoothing effect. The game might become technically more random in that the RNG would have a wider range of possible outcomes on generating a new tile, but it would tend to feel less arbitrary as the game would be more likely to hand you a tile you either already have a place to merge or could easily build something for from what you already have on the board. The worst possible results would seem to be no worse for your board than what the vanilla game produces.

Another angle to experiment with would be tile entry position. I believe this is where the game experiences its highest degree of entropy. The player has control over which side of the board a new tile enters on, but not which specific space on that side it fills, and there are more ways for a tile to enter the board overall than there are things for that tile to start as (28 total starting positions for 3 possible tiles in the vanilla game… that’s what, 21952 possible new tile entry configurations, ignoring previous board state? That narrows as the board fills and sides lose open and movable tiles, but it’s still significant. Amusingly, it does mean that in some game configurations the player /can/ exert some control over which space on a side a tile enters on, but they’re very likely closing in on game end or otherwise hosed at that point.) I’m a bit fuzzy on this, but I think corners have slightly higher odds of filling over a number of turns, since a given corner has a chance to fill from two sides whereas the spaces between corners can only fill when their specific side is selected. This may have the effect of amplifying the collision/tile entrapment hazard in corners at any board size.

Disclaimer: I’m not actually a mathematician of any kind, so please correct me if I’m wildly off base here.

Interesting article, even if I seem to be immune to Threes’ charms (I really don’t get it). That said, I noticed a small nit to pick. You said the new tiles that could appear were half of the maximum on the board. That would be 2^(n-1), not 2^(n/2).

Example: 256=2^8 is max on the board, 128=2^7=2^(8-1) is the maximum allowed in (2^n/2 doesn’t work since you could get 2^1.5 for example).

“Highest: This is used when figuring out what the next tile will be. In my game, it halves the exponent of the highest piece on the board. So if your highest tile is 256, that’s 2^8. Halving the exponent gives us 2^4, which is 16. So the “Next Tile” will give us 1, 2, 4, 8, or 16. Without this, games can take bloody ages before you start running out of room.”

So 2^(n/2) looks right, and I don’t see anywhere he describes it as halving the number (it says “halves the exponent” in that paragraph.)

This is really cool! I like seeing breakdowns of design decisions like this, and looking into how game mechanics work.

It would also be interesting to look at the AI and see how it’s playing. You gave us a rundown of how it works, but would you be willing to post the code somewhere? Something like GitHub that makes it easy to work with would be awesome, but even just a zip file you host would make it possible to play around with it and see if we can find anything else interesting.

First impressions, this game punishes high skill. Or, rather, it gets harder when you get better? Basically, when more of the board is clear, the new tiles have more spaces to show up in, so it becomes more difficult to plan for. I feel like the game is punishing me for clearing the board.

Also, the way tiles end up pushed against the side before they can combine is frustrating. It feels like sloshing stuff around in a pan or something. Ooh, that could be an interesting mechanic: combined tiles pushed against a far side with room adjacent to them could split back into smaller pieces.

Anyway, if you want to reduce the randomness, there should be some procedural way to produce starting positions and new tiles, instead of positioning them purely randomly.

Higher skill is actually leaving matchable pairs unmatched; they clog up the board and reduce the number of places new tiles show up, and as long as you don’t accidentally shift them away from each other they’re as safe as free spaces.

On a more general note about randomness, skill and the mixed scores, this is something that happens a lot in many games with a strong random factor, even with a much more complex system, for example roguelikes.

The online version of Dungeon Crawl provides some interesting data because it gives access to scores of all kinds, information on how the character died, etc. Basically, what happens is that the “pros”, people who can fairly reliably sit down to the game and beat it, will generally rush through the initial part of the game in hopes of getting a good start, rather than carefully inching their way through it. What’s more, they will likely burn through several early-game starts, either dying or even outright quitting when they don’t feel they stand a good chance of preparing for the mid-game. If they had a good early game, they can usually sail through the mid-game without a lot of problems and even tackle the endgame relatively confidently, barring the RNG providing something pretty extreme. So basically, it’s the same low-score/high-score situation. Interestingly enough, a large part of the appeal for seasoned players seems to be arming themselves for those situations where randomness will lash out at them, and trying to survive. There is even a kind of self-imposed extra challenge mode where players will decide to worship Xom, the god of chaos, whose primary interest is his own amusement. Whenever he feels bored, he will do random stuff to the player or the surroundings: he may heal you and strike your enemies with lightning, or he may teleport you into the middle of a room filled with killer bees.

Random is a synonym for unexpected. When people hear “random” they expect white noise, but this is not correct.

I’ll link to two videos. First, see this one: http://www.youtube.com/watch?v=QEM9EgnIcQU
And you will see Derren do a series of coin tosses that defies what one would expect. Do note there is no trick photography, that is a real unmodified coin, and no magnets or sneaky things like that are used.

Conclusion: we only have a narrow view of the world. Even in a long series of random numbers or data, there will be patterns.

Even if there is a 1-in-a-million probability of something happening, the chance of it happening is always there. The likelihood (probability) is very small, but the chance is always there. If the chance is not there, then it’s a fixed point in time (thus either always true or always false, and now we’re suddenly playing with relativity, observers, string theory and time itself, and it all gets messy fast).

What you want usually in games is uniformly distributed pseudo-random numbers, and these resemble pretty much white noise.

Disagree. Random is a synonym for unpredictable. Unpredictable is not a synonym for unexpected – unexpected brings in the notion of what normal humans EXPECT, which is a different kettle of fish entirely (as you point out).

A person playing a Martingale “double up” betting strategy might not EXPECT to lose, but that losing (and when it’s likely to occur) can certainly be PREDICTED with high accuracy. People are bad at predicting things like “how many coin flips would I need to do before I should expect to see 6 tails in a row?” Which of course is the same point you were making.
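The six-tails question has a tidy answer. Let E_k be the expected number of fair-coin flips needed to see k tails in a row. To get there you first reach k-1 in a row, then flip once more; half the time that flip is heads and you start over. So E_k = E_{k-1} + 1 + E_k/2, i.e. E_k = 2E_{k-1} + 2 with E_0 = 0, which solves to 2^(k+1) - 2. A C++ sketch with a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// Expected fair-coin flips until the first run of k tails,
// via the recurrence E_k = 2 * E_{k-1} + 2, E_0 = 0.
std::uint64_t expectedFlipsForRun(int k) {
    assert(k >= 0 && k < 62);  // keep 2^(k+1) within 64 bits
    std::uint64_t e = 0;
    for (int i = 1; i <= k; ++i) e = 2 * e + 2;
    return e;
}
```

expectedFlipsForRun(6) is 126, which is far more flips than most people would guess.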

Always reminds me of the opening scene of Rosencrantz And Guildenstern Are Dead.

Hm. So how did you get the original rules from the actual game, to put into your simulator? (I’m wondering if your simulation baseline is accurate enough to draw conclusions; if you just made an educated guess at the rules of ‘3’, then your simulations… well, they don’t exactly measure the ‘3’ game itself; rather just a contest between two rulesets of yours).

But I think you can see from your post why they went with their original rules: Your game is longer!

I mean, the game probably wouldn’t be interesting on a 3 by 3 grid, so they made this “1 + 2 = 3” rule the first step, to make the game harder, to make the game shorter, so it is in the “sweet spot” of length for a mobile game.

I mean, that’s just my guess, but I think games that take hundreds of steps every time you play wouldn’t be too optimal for a mobile, on-the-go experience.

“Highest: This is used when figuring out what the next tile will be. In my game, it halves the exponent of the highest piece on the board. So if your highest tile is 256, that’s 2^8. Halving the exponent gives us 2^4, which is 16. So the “Next Tile” will give us 1, 2, 4, 8, or 16. Without this, games can take bloody ages before you start running out of room.”

So in short: the maximum tile next to be added has a value of sqrt(highest on board)? ^^

As an experiment I decided to try playing a few games where I just pushed the keys in a pattern, rather than actually trying to play the game. First I just went in a circle (up, right, down, left) and got scores ranging from 500-something to just over 1000 (over the course of about five games). Then I switched to doing up, down, left, right, and earned my lowest score of 90, and didn’t get any scores above 150 or so. It’d be interesting to see what kind of results you’d get doing this over a larger sample size.

The game seems to be a lot more left-brain centric; as in, I can play it pretty well. The main game is definitely made for people who aren’t me, and enjoy the more Rubik’s Cube-like feel of lots of moving parts that can’t easily be isolated.

So this version actually makes me kind of sad. It’s like if someone made a Surgery Simulator clone where the controls were fluid and intuitive.

It might be worth giving a measure of change over time, too, and then you would want a timeline rather than a histogram.

For example, I’m guessing what some designers would be aiming for would be lots of small playtimes, with the occasional long game, so you wouldn’t want the large times to cluster together too much.

Maybe set some sort of threshold on the number of turns, record the number of games between those that exceed the threshold, then take the mean and variation of those short-game intervals. I’m not sure how you’d tweak that to compare between different rule sets, though.
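A sketch of that threshold idea, with hypothetical names: games longer than the threshold count as "long", and the function returns the mean and population variance of the number of short games between consecutive long ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GapStats {
    double mean;      // average number of short games between long ones
    double variance;  // population variance of those gaps
};

// Requires at least two long games so there is at least one gap.
GapStats gapsBetweenLongGames(const std::vector<int>& moves, int threshold) {
    std::vector<double> gaps;
    long last = -1;  // index of the previous long game, -1 if none yet
    for (std::size_t i = 0; i < moves.size(); ++i) {
        if (moves[i] <= threshold) continue;
        long idx = static_cast<long>(i);
        if (last >= 0) gaps.push_back(static_cast<double>(idx - last - 1));
        last = idx;
    }
    assert(!gaps.empty());
    double mean = 0.0;
    for (double g : gaps) mean += g;
    mean /= gaps.size();
    double var = 0.0;
    for (double g : gaps) var += (g - mean) * (g - mean);
    var /= gaps.size();
    return {mean, var};
}
```

A low variance means long games arrive at fairly regular intervals; a high one means they clump together.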

Seems to me that on the randomness front, there are two sides of the issue and this test handles only one, and maybe the less important of them.

I mean, this AI doesn’t learn; it’s always the same. The fact that it gets scores that vary widely shows some sort of randomness. But what we don’t know from testing the AI is whether different play approaches, handed the same game inputs, would result in different scores–that is, whether the randomness of what the game hands you actually dominates over player skill in determining outcomes. If you had ten different AIs with different strategies and they all ended up nobbled by the same input tile sequences, I think that would say something; if on the other hand some AIs pushed through difficult tile feeds while others didn’t, that would tell you something different.

Not that I recommend spending ages designing a host of different AIs. For one thing, the whole question is complicated by the fact that the randomness of the game interacts with player actions–where tiles can appear is constrained by which tiles the player clears, so different play will result in tiles appearing differently. So I’m not sure you can really hand different players the “same” game in any meaningful sense.