Pan Galactic Affluence is in a league of its own. Many of the prestige cards are well liked.

Kesterer is also in a league of his own, wow. The AI does very well on the leaderboard.

There are some known analysis bugs.

* Doomed World is over-credited as a homeworld. When homeworlds blow up or are replaced, DW gets the credit. Non-DW homeworlds are replaced a lot more often now, due to Terraforming Engineers and the increased frequency of takeovers.

* Some games have no homeworlds associated, because Keldon's server only partially renders the game.

Selection bias: near game end, if you have lots of prestige, then placing PGA is a no-brainer and will often be correlated with a win. This overstates its strength (and PGA is definitely a strong card).

(PGA will fare better than FedCap in stats because PGA is also a (dev-spam) "strategy in a box" card, in a way that FedCap isn't.)

This "dual-use" nature (end-game-scorer plus strategy-in-a-box) is the same reason that Terraforming Guild fared so well in GS-only individual card stats. GalFed is similar (and also fares quite well in individual stats).

In all three cases, the individual cards end up with "credit" for other play actions/investments (gathering lots of prestige, placing lots of windfall worlds, placing other dev-spam enablers (IC, IB, PW)), similar to how the top stars of an ensemble show can sometimes earn awards and credit for what is both a good performance by them and by their unsung supporting cast.

It's one reason why single card stats are only a partial (and somewhat distorted) metric/proxy for a card's overall "strength".

I also don't know when these stats were gathered -- or, more precisely, how far along the fairly extensive BoW learning curve the players who generated these stats were -- but my suspicion is that all the prestige-related cards will be somewhat overvalued in early BoW stats, given that it definitely takes a while to learn how to beat prestige strategies. Stats collected from players who haven't yet learned how to do this will overstate the value of prestige-related cards.

Looking at the stats, the fact that Gal. Bankers (which grants a prestige) is now faring quite a bit better than GalFed leads me to believe that this bias is quite strong in this data set. My rather extensive experience (1000s of BoW games in a variety of play environments) says otherwise -- Gal. Bankers and GalFed should both be performing well, with GalFed still having a slight edge. Which one would you rather see in your hand in a BoW environment?

My other takeaway from a quick look at these stats is that they are heavily biased towards a 2PA play environment. I believe this also will tend to overstate Prestige effects since, in 2PA, if you're not in the Prestige lead, then your opponent usually is. This is not as true in multi-player games, where the prestige lead moves around a lot more and early prestige leads don't tend to hold up as well (since the odds that *someone* eventually builds a prestige engine to take it is quite high), reducing the value of the cards that grant a single prestige.

I dunno how big this effect is in quantitative terms, but I definitely know it exists.

Are any of these stats collected from games playing Keldon's AI? I do believe that Keldon's AI overvalues prestige (a tendency I often punish it for when I play and beat the AI), which, in turn, could also bias these stats in favor of prestige-related cards.

Quote:

Are any of these stats collected from games playing Keldon's AI? I do believe that Keldon's AI overvalues prestige (a tendency I often punish it for when I play and beat the AI), which, in turn, could also bias these stats in favor of prestige-related cards.

As rrenaud hinted at, the AIs themselves are also in the stats, and are highly rated. [AI] Data is 43rd out of 4612 in BoW (106th out of 11470 overall), quite the feat.

One can view the 5 AIs' individual stats; I'm sure there are interesting patterns therein:

1) "PGA is only good because it is earning credit for other cards." This is true of every 6-dev. In fact, that's pretty much what a 6-dev does. But there isn't any 6-dev that earns as much credit, or hogs the limelight nearly as much, as PGA. It's a no-brainer to put down New Econ / NGO / Gal Fed after a lot of IV powers, Military, or developments, but those aren't subject to the same selection effect. This also contradicts the next point: since PGA is a dual-use strategy-in-a-box card, you'd also expect the card to be played early on and not just when it scores overwhelmingly high.

2) "PGA is very strong because it's a dual-use strategy-in-a-box card." This has been true of the other very strong 6-devs (GalFed and TGuild), as you pointed out, but none of those have ever been this far out of the curve. People complained a lot about TGuild and GalFed, but the data never backed them up in the same way PGA does now.

3) "Prestige cards are overrated because bad players don't know how to beat prestige." This is definitely true. On the other hand, you would therefore expect bad players to play PGA and then lose to good players who can beat prestige and drag the winning rate down. And since the skill-normalized winning rate is so high, this suggests that the opposite is true: that good players are losing to bad players who build PGA. Plus, if you look at the graphs of the top players, PGA is a top card for just about all of them.

4) "Gal Bankers is faring better than Gal Fed, which indicates that players are overvaluing Prestige." The two are mostly balanced, insofar as Gal Fed is played quite a bit more often than Gal Bankers and therefore (as expected) has a lower winning rate. More interestingly, when this discussion first cropped up, you used Spice World and Alien Burial Site as an example of two cards that should be somewhat balanced, arguing that the Prestige is balanced out by other factors. Well, ABS is played three times as often and wins staggeringly more than Spice World. In other words, this argument works both ways: either everyone is still overvaluing Prestige, or maybe Prestige really is that strong and we underestimate its power.

5) "This is biased in favor of 2pa and AI games." This is completely true.

This data is consistent both with the hypothesis that Prestige is overvalued and that Prestige is incredibly strong (in 2pa). 12 of the 14 cards above the 0.07 curve are Prestige cards (with only Gal Fed and PGR in their company). The strategy space of "super strong cards" that used to be occupied by Gal Fed + TGuild is now almost entirely the Big Prestige cards. The bottom end of the graph contains almost no BoW cards (I think Lifeforms Inc. is the closest to the bottom) and definitely no Prestige cards.

I never said this. Please don't falsely put words in my mouth. Especially when you next contend that this false characterization contradicts something else that I did say. That's just sleazy debate tactics, which doesn't push forward a discussion.

Quote:

People complained a lot about TGuild and GalFed, but the data never backed them up in the same way PGA does now.

Shrug. How much of the seemingly extreme results for PGA is due to various systematic biases in this data set regarding prestige simply isn't clear. If, after factoring those biases out, PGA would end up roughly where GalFed is, I don't think there's any real problem.

Run the animation from 2P to 6P and you'll see PGA not being a problem at all in 6P (however, Rebel Seat then looks like a huge problem -- in fairly specialized circumstances). That said, I'm pretty suspicious of the 6P results due to the very small sample size. Still, the movement of PGA "inward" as the number of players increases is quite striking.

So, it's unclear to me how much weight I should give to this data.

Quote:

since the skill-normalized winning rate is so high, this suggests that the opposite is true: that good players are losing to bad players who build PGA.

Yes; if both the AI overvalues prestige and most players haven't figured out how to stop prestige, then it becomes, overall, a self-fulfilling prophecy (group-think at work). In such a situation, of course PGA looks extremely good.

In both the original game and with each expansion, I've seen a very strong learning curve effect. At each stage of the learning curve, group-think can be very powerful.

For example, very early on players complained that Military was too powerful. And data generated *at that point in time* might have supported this claim. Let's suppose -- for a moment -- that this was true. What should I, as the designer with lots more experience with the game, then have done? Follow this data blindly, or argue (as I did -- you can look at my old posts) that groupthink and a learning effect were going on?

Next, players argued that consume/produce was too strong. And, you can still see new RFTG players, who are clearly competent gamers, posting that they have "solved" RFTG after some 25 plays. And, I read those threads and it is very clear to me that those players have not yet discovered "dev-spam", which tends to reset the strategy space in base RFTG a fair amount ... Again, group-think.

One of the fundamental problems (and there are several, imo) with this statistical approach to gauge individual card strength is that the data from players further up the learning curve tends to dominate. And, further, in some cases, it *will* continue to dominate (through the magic of group-think) until a large % of the group migrates further down the learning curve. Looking at just the winners' data often doesn't work, as they're not beating people further down the learning curve.

This is not unique to RFTG. For many, many years, Titan was played locally in the SF Bay Area as a long, slow development game (8-14 hrs). It wasn't until a few very aggressive, very skilled players moved to the Bay Area and started winning consistently that the rest of the players adapted. Then the game length came down very dramatically (to 3-6 hrs).

But, any data collected earlier from the SF Bay Area would have supported the notion that the winning strategy in Titan was to follow these extremely long-term developmental strategies. After all, they *were* winning! Players who win are often fairly conservative and don't change their play styles until they are forced to.

I understand that you guys are very "mathematical" and "data-driven" in your orientation, but as someone who has previously done modeling, at the professional level, I tend to be a lot more cautious about what data is actually telling me. It is extremely easy to go wrong.

I am just not into Race stats hacking like I used to be, so I am going to refrain from joining the balance discussion or delving deeper into the data. But the code/data is there for anyone who wants to do it themselves.

Re: TrueSkill, I will happily accept patches, with much less than 9 months' latency.

The "PGA gets credit for wins it didn't cause" claim is somewhat falsifiable with the data. If you also keep track of the tableau position a card was built in, then a card that is getting wins it didn't earn would likely show an up-and-to-the-right pattern in the played/won graph as a function of tableau position.
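As a rough sketch of what that check might look like (the record format, field names, and the `win_rate_by_position` helper are all hypothetical -- the actual rftgstats data layout may differ):

```python
from collections import defaultdict

def win_rate_by_position(records):
    """records: iterable of (card, tableau_position, won) tuples.

    Returns {card: {position: win_rate}}. A card that is merely
    collecting credit for already-winning tableaus should show win
    rates that climb as tableau position increases (played late,
    won anyway)."""
    # card -> position -> [wins, games]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for card, pos, won in records:
        tallies[card][pos][0] += int(won)
        tallies[card][pos][1] += 1
    return {card: {pos: wins / games for pos, (wins, games) in by_pos.items()}
            for card, by_pos in tallies.items()}

# Toy illustration: PGA placed early is 50/50, placed late always wins.
records = [("PGA", 2, False), ("PGA", 2, True),
           ("PGA", 10, True), ("PGA", 10, True)]
rates = win_rate_by_position(records)
# rates["PGA"] -> {2: 0.5, 10: 1.0}
```

A rising curve here wouldn't prove the card is weak, but it would support the "end-game credit hog" reading over the "strategy-in-a-box" one.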

Quote:

I never said this. Please don't falsely put words in my mouth. Especially when you next contend that this false characterization contradicts something else that I did say. That's just sleazy debate tactics, which doesn't push forward a discussion.

...

I understand that you guys are very "mathematical" and "data-driven" in your orientation, but as someone who has previously done modeling, at the professional level, I tend to be a lot more cautious about what data is actually telling me. It is extremely easy to go wrong.

He is a lawyer now, which approximately means he is committed to winning arguments regardless of the truth. I have done no such shameful thing in my career.

With all due respect Tom, I don't buy the groupthink argument. It just isn't appropriate to the Keldon server. The server doesn't support that kind of behaviour. Games are usually played quickly with little communication between players, and since this is the first kind of ranking we've had for the server, there hasn't been any kind of high-level competitive interaction going on. It is a very different place from any of the web servers (genie, flex, boardgamearena), where top players compete daily for status. And the scale of any of those is much greater than that of a local group. One more thing: Keldon is sporadic in terms of population, and not dependable for setting up games, so you can't expect to see the same people online without arranging it in advance.

The dataset is 70,000+ games from an international group of players with a wide range of experience with the game. I can't think of a better sample set for considering how people play Race, from newbies to veterans (I've played Brink easily more than 500 times, although a great deal of that is against the AI). The only complaint I can think of is the great bias towards 2pa games. And that isn't even a great problem: 2pa is the easiest format to set up (only requires 2 bodies!), and is strategically engaging. If a card is not working in 2pa, that is still a significant issue.

... love ... rftg ... but ... must ... get ... back ... to ... work ...

My regular 2p opponent and I (200 BoW games on Keldon plus hundreds of pre-BoW ones IRL) have houseruled that if you have PGA in your starting hand you have to discard it. We also regard a win with PGA in tableau as a lesser win, and beating a tableau with PGA as a special achievement. If Keldon supported it, we'd remove the card from the game entirely. So I wasn't at all surprised to see it a huge distance away from the rest of the cluster in the stats breakdown.

My only problem with the game with all three expansions is that development strategies are not just very, very powerful (in 2pa at least) but also very boring to play.

Perhaps it will help clarify if I make an overall point so that we can evaluate the evidence, rather than just slinging evidence at each other without context.

My claim is this: Brink of War, specifically 2pa, would have been a "better" game if Prestige (and Prestige-related cards) were weaker than they are now.

My evidence: RFTGstats indicates that 12 of the top 14 cards are Prestige-related. PGA is far and away the card in the game most correlated with winning the game, to a degree that no other card has ever been in online RftG play.

As far as I understand it, your arguments are (and I think this is a fair characterization):

1) Data isn't conclusive.
2) People on Keldon are probably bad and/or suffering from groupthink, which biases the data towards Prestige.

The problem with all this is that you're just criticizing the data without any real basis for it, as if the fact that the data is flawed invalidates any argument made on the basis of that data.

Yes, data isn't conclusive. Yes, the data is probably somewhat biased. Yes, we don't know how biased. But at some point, you've gotta throw up your hands and say, yes, this data demonstrates something. How extreme does the data need to be? If it won twice as often as it did now, would you still dismiss it as a product of groupthink? Everything about the data aligns with expectations, except the stuff that doesn't?

Now, you can say, "I don't care about the data. It doesn't matter what you claim it proves, because I am sufficiently knowledgeable about this game to be able to tell you for a fact that PGA is not that overpowered." This suffers from a couple of flaws:

1) You, as the designer, have a vested interest in the proposition that the game is balanced. Doesn't mean I don't believe you, just means I need a little more objective evidence.

2) It assumes that the designer plays on a significantly higher level of skill and understanding. Fair enough, I think this is mostly accurate. I am not so sure that this is true for the entire playtest group.

3) Even if you assume that the imbalance disappears at a sufficiently high level, this is one damn high level if it is beyond the reach of even the most skilled players in the world outside of the designer and his circle.

4) It is true that there have been stupid arguments like, "Military is too strong". But: no global data has ever supported such hypotheses; and the existence of stupid criticisms does not mean that other criticisms are invalid.

I want to specifically discuss your Titan analogy, which I think is a very important point in this debate. It is entirely possible that the Keldon data is irrelevant if you can demonstrate that it is analogous to the Bay Area Titan gamers.

But it seems to me that the Titan analogy works the other way around. You have a playtesting group that has played hundreds or thousands of games, compared to the entire world and tens of thousands of games. One of these is more subject to groupthink, and with all due respect, it's not the global community of players. It's completely backwards to say, "Well, a group of international skilled players representing hundreds of different game groups, they're subject to groupthink, but the people I play with! They're not subject to groupthink!"

So this is why I don't believe the criticism that Keldon data should just be ignored because of "groupthink", because it just sounds like an automatic way to criticize any batch of data that doesn't support one's claim. How much less groupthinky can a gaming environment be than Keldon? Especially when the rest of the data is pretty spot-on. Would you just never trust data, ever, from any gaming environment? If that's the case, how can you believe that your own experiences are not similarly tainted by groupthink?

Perhaps a better analogy is Backgammon, which is a good illustration of how even the best players in the world were forced to reconsider long-held beliefs thanks to data.

To repeat: I claim that 2pa BoW would be a better game if Prestige was less strong than it is now. As support, I have a lot of data gathered from a lot of games. You can correctly point out that the data is flawed, but unless it is fundamentally, deeply flawed to the point of irrelevancy, it still supports my claim.

Whether or not Prestige is "unacceptably" strong is a different matter, of course. BoW is still fun, and it's far from A Few Acres of Snow-level broken. But I tend to believe, both based on data and personal experience, that the Prestige mechanic in BoW is too strong. Beatable, but too strong.

Thank you for providing the aggregated data. Is the raw data available? If yes, in which form (flat file, database,...)?

I like the game as it is with PGA and prestige points. I have to admit that initially I saw PGA and PP as too dominant, too. After "a few" games it somehow lost its shock and awe effect. That is of course only true for my play group. I have played a lot against Keldon's AI, but even there it doesn't bother me anymore. I suppose I somehow adapted. Or I just don't care if I lose when my opponent lucks into PGA. ;) Terraforming Guild was harder to deal with, IMHO.

I think the dataset is unavoidably biased if you are looking for the cards that perform best at the higher levels of play, which is mainly what Tom is talking about. This dataset is from all players playing somewhat random sets of other players. Most of the matchups do not involve the top ranked players playing against each other, so it is only going to show the card play/win rate across all levels of play.

As for the stats, is there any chance to read the start of the game message log to determine start worlds to correct for TE and takeovers?

Quote:

I think the dataset is unavoidably biased if you are looking for the cards that perform best at the higher levels of play, which is mainly what Tom is talking about. This dataset is from all players playing somewhat random sets of other players. Most of the matchups do not involve the top ranked players playing against each other, so it is only going to show the card play/win rate across all levels of play.

The data is skill-normalized, so the win rates displayed are relative to existing win probabilities based on Elo.
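For what it's worth, "relative to existing win probabilities" can be computed along these lines -- a minimal sketch using the standard Elo expected-score formula; the exact normalization rftgstats applies may differ, and the function names here are illustrative:

```python
def elo_expected(r_player, r_opponent):
    """Standard Elo expected score for the player."""
    return 1.0 / (1.0 + 10.0 ** ((r_opponent - r_player) / 400.0))

def skill_normalized_win_rate(games):
    """games: list of (player_rating, opponent_rating, won) tuples
    for games in which some card of interest was played.

    Returns the observed win rate minus the Elo-expected win rate,
    so a positive value means the card's players won more often
    than their ratings alone would predict."""
    observed = sum(int(won) for _, _, won in games)
    expected = sum(elo_expected(rp, ro) for rp, ro, _ in games)
    return (observed - expected) / len(games)

# Evenly matched players (expected score 0.5) who won every game:
games = [(1500, 1500, True)] * 4
print(skill_normalized_win_rate(games))  # -> 0.5
```

Under this kind of normalization, "good players beating bad players" is already priced in, which is why a high normalized rate for PGA is harder to explain away by skill differences alone.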

Quote:

As for the stats, is there any chance to read the start of the game message log to determine start worlds to correct for TE and takeovers?

When I asked Ben (aka raistlin) about this, he said,

Quote:

In principle one could find out what the homeworlds are from the log, but I don't think I implemented that, partly because the log doesn't load for many of the games. As I recall, what I wrote only looks at the cards down and the point/prestige/cards-in-hand totals.

Quote:

My other takeaway from a quick look at these stats is that they are heavily biased towards a 2PA play environment. I believe this also will tend to overstate Prestige effects since, in 2PA, if you're not in the Prestige lead, then your opponent usually is. This is not as true in multi-player games, where the prestige lead moves around a lot more and early prestige leads don't tend to hold up as well (since the odds that *someone* eventually builds a prestige engine to take it is quite high), reducing the value of the cards that grant a single prestige.

I think this is a very good point, and I'd be interested to see what the graph looked like with all data from 2PA games removed. I'd guess that PGA would still be really powerful but not quite so out-there.


Quote:

As far as I understand it, your arguments are (and I think this is a fair characterization):

1) Data isn't conclusive.
2) People on Keldon are probably bad and/or suffering from groupthink, which biases the data towards Prestige.

No, this is NOT what I'm saying.

I'm saying that BoW, like base RFTG and its previous expansions, has a learning curve, and this data is *biased* by being taken primarily from A) players who are still early on the learning curve, B) AIs which over-value prestige, and C) players who are playing those AIs.

I'm not saying that the players are necessarily "bad" -- you can be a good player, winning lots of RFTG games, and still be early on the learning curve. I've run into many such players since RFTG was released.

For example, Rob Renaud, with respect to BoW immediately after it was released. I think we can all agree that Rob is a skilled RFTG player -- his multiple tournament wins clearly indicates that. At the con where BoW was introduced, Rob won the small tournament there as well.

However, Rob came up to me at one point as I was showing a new game prototype to Richard Garfield and discussing it with him. Rob clearly wanted to play me 1:1 RFTG. So, when Richard and I were done, I did so. Rob and I played 5??? games of 2PA BoW RFTG and I believe I won all??? all but 1??? of them. Rob's comment when we were done was something along the lines of, "I don't understand how to evaluate prestige right now."

I don't think this proves anything about our relative strength as players. I do think this says something about the BoW learning curve. At that point, I had played roughly 750-1000 games of BoW RFTG and Rob had played maybe 20-30??? games. Rob was a "good player" for that number of BoW games, as demonstrated by his BoW tournament win.

I'm reasonably confident I could have restricted myself to mostly prestige based strategies (which I didn't do, though I did take the opportunity in one game to demonstrate a prestige win using CB's consume power, just so Rob could see it), played Rob some 50 games at this point, won most of them, gathered the data from doing so, and it would show a tremendous bias in favor of prestige.

And, I'm reasonably confident I could do this against any player new to BoW. Do this enough times (against a pool of new players) and, voila, a biased data set.

That's the sort of experience that makes me very wary of the approach used here.

Quote:

The problem with all this is that you're just criticizing the data without any real basis for it.

I've suggested a number of ways in which this data could be systematically biased.

But, as for evidence, consider the many players who state that PGA/prestige was a huge problem initially in their play group and that it no longer is. There are now a lot of posts along those lines in a lot of threads. That's not hard data, but it is certainly suggestive that there is a real "learning curve effect" going on.

Quote:

I want to specifically discuss your Titan analogy, which I think is a very important point in this debate. It is entirely possible that the Keldon data is irrelevant if you can demonstrate that it is analogous to the Bay Area Titan gamers.

I am not arguing that my playtest group is much *better* than any other group. I will argue that the group, as a whole, has walked the learning curve and we don't find prestige to be so dominant.

Quote:

How much less groupthinky can a gaming environment be than Keldon?

If the AIs are biased strongly in favor of prestige, then they do bias the play group towards prestige strategies (at least initially) due to the "fight fire with fire" response.

I personally went through that phase for a while playing the AIs until I figured out when to do this and when not to do so.

Among the things I have noted when playing the AIs in BoW TOs Goals games (I have over 1000 AI games under my belt in various configs) is that I win a lot more with military strategies versus the AIs in BoW than I do when playing live players.

After a while, I figured out it was, in part, because the AIs keep building Imperium Invasion Fleet, then blowing it up to conquer a non-military world, just to quickly gain 3 Prestige (one from IIF, 2 from the conquest). Watch them. The AIs *love* doing this. They will often spend their entire hand to do so at the start of the game. I'm far more likely to build IIF, then use it as part of a larger military strategy to conquer the 8s and 9s and end up beating the AI (which went prestige) militarily.

Note that IIF doesn't show up as a huge card in this data set (since it's very unlikely to end up in winning AI tableaus, due to being blown up all the time). I suspect that if you take the AIs out, its ranking will improve significantly (since players do this less often and it is a reasonably strong card). That's the sort of effect I'm talking about.

The 2p data (at the beginning of the animation) is particularly interesting. It is almost a straight line, which is not surprising considering that it is the richest data set. The tightness of the results shows a direct correlation between the impact of a card on winning and the amount it is played. Tom has mentioned before that it is important that not all of the cards in Race are completely balanced against each other. For one thing, non-linear effects are an important part of making a game interesting. That said, PGA is way ahead of the pack again in this graph.

Tom Lehmann wrote:

groupthink

necessitates having a group in the first place, and there isn't really one on Keldon.

Tom, your points about playing against the AI are true. I play a lot more military tableaux against the AI than against humans. But even if what you say is true, that there is some point on the learning curve where PGA is no longer overpowered, then it is beyond people who have played this game nearly as much as you have. If a given card can only be beaten regularly if you know the secret handshake, then I can't see how this isn't problematic. This is a card that warps the play space for the vast majority of players.

I don't expect you to change BoW. It is very, very fun. But I hope that you are wary in future iterations of Race of the strength of dev-spam. There is no other strategy that reinforces itself so well. It is a multiplicative effect. With settle or engine strategies, there are cards that provide resources and cards that score, but with dev-spam you often have both in the same card. And the effect of these cards is that the momentum of such a tableau increases exponentially as the game progresses, since you don't need to divert your actions to replenish your hand or add other useful pieces to your board, and you aren't reliant on others to call phases for you. Dev-spam, particularly in 2p, seems to be the problematic part of Race that we keep coming back to.