Official Data Miner

So based on the fact that the distributions of "stalliness" in most of the metagames basically look like bell curves (rather than showing peaks around certain values), it looks like the only good way to come up with stall-score-based cutoffs for "Hyper Offense" vs. "Offense" vs. "Balance" vs. "Semi-Stall" vs. "Full Stall" will be for people to give me some teams to score.

So here's my request: post a team in this thread in PO/PS-importable plaintext, tell me how YOU would classify the team, and I'll post the stall score. Please use the CODE tags so that this thread doesn't turn into a massive block of text.

"T/KO" means "turns per KO," as in the number of turns the stalliness metric predicts should elapse in your average battle between KOs. Keeping in mind that what you'll more likely see in a battle is something based on the average stalliness of the two teams, this actually matches my experience decently well, as when this team battles other stall teams, the battles will regularly stretch into the 70-80 turn range and will end 5-0 or 4-0.

i'd look at the rmt archive for this; it's got tons of great stuff and it even has the team style/importables written down for convenience. i'll just pluck out two to start with and speculate on what i think their stalliness should look like.

for example: http://www.smogon.com/forums/showthread.php?t=3463417. the archive deems this balanced. it consists of utility physdef jellicent, tspike/spin forry, sdef heatran, sd virizion, np mew and subdd nite. i offer this example mainly because it's a balanced team - 3 bulky mons and 3 boosting sweepers - and could land on any side of the stalliness metric, so it's interesting to see where teams like this actually end up.

http://www.smogon.com/forums/showthread.php?t=3458118 textbook heavy offense in the deoxys-S era. screen/sr deoxys-S, sd viriz, all-out-offensive ddnite, all-out-offensive dd gyara, taunt/dd wallbreaking haxorus and offensive sd scizor. this should be really easy to categorize as extreme offense, every single mon has a 252/252 spread and a boosting move except for an obvious dual screen deoxys.

Official Data Miner

i'd look at the rmt archive for this; it's got tons of great stuff and it even has the team style/importables written down for convenience. i'll just pluck out two to start with and speculate on what i think their stalliness should look like.

I thought about that, but I'm lazy.

for example: http://www.smogon.com/forums/showthread.php?t=3463417. the archive deems this balanced. it consists of utility physdef jellicent, tspike/spin forry, sdef heatran, sd virizion, np mew and subdd nite. i offer this example mainly because it's a balanced team - 3 bulky mons and 3 boosting sweepers - and could land on any side of the stalliness metric, so it's interesting to see where teams like this actually end up.

A little stallier than I would have expected for balanced (I was thinking balanced was going to be -0.5 to +0.5). Note that the "bias" score falls into the "balanced" range (-1500<bias<600). Really this just appears to be a reflection that the distribution is a bit skewed (that is, you see a larger range on the stall side than on the offense side).

http://www.smogon.com/forums/showthread.php?t=3458118 textbook heavy offense in the deoxys-S era. screen/sr deoxys-S, sd viriz, all-out-offensive ddnite, all-out-offensive dd gyara, taunt/dd wallbreaking haxorus and offensive sd scizor. this should be really easy to categorize as extreme offense, every single mon has a 252/252 spread and a boosting move except for an obvious dual screen deoxys.

No importable in the RMT.

Note that going off bias alone, Innocent Criminal would have classified this team as merely "offense," due to the mixed EVs on Scizor and Haxorus and the HP investment on Deo-S. Of course, he had the exception to the rule that a screener makes it HO. Meanwhile, take the Light Clay off Deo-S, and the team still has a stall score of -1.92, which is most definitely HO.

^ EDIT: i think i corrected my math now
anyway the stalliness metric is the base-2 logarithm of how many turns it would take for a mon to kill itself. in that case log2 X = 4.27, X = 2^4.27 which is over 19. it does sound pretty legit when chansey would be softboiling while throwing seismic tosses at its mirror image; it'd probably take a lot of turns for it to kill itself.

moreover, mons with low attack and big defenses are pretty much only useful for their defensive utility. when you have that much defense and that little attack, the number is unlikely to be an accurate reflection of the turns taken to KO yourself, but i still think such mons should significantly swing the team in favor of stall because of their massive lean to defense. just including one of them on a team should dramatically alter it. i could call a team bulky offense if it had a cb ferrothorn on it, but i'd be rather dubious if it had a chansey

If this is purely about how long it would take for a pokemon to kill itself in a mirror, then Chansey should be 2.81, considering it realistically takes 7 turns to kill itself with seismic toss. I guess if Chansey was using ice beam it would take 19 turns, but that's pretty unrealistic (I think you run out of PP before then lol). Blissey could be 3, but I guess in theory 4 does apply if Blissey decides to go with flamethrower (it has just enough PP for 16 turns...).

Not saying there's anything wrong, but Chansey with such a high number is admittedly an interesting quirk of the system.
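The numbers being traded in the last few posts all follow from the "base-2 logarithm of turns to KO yourself" description. Here's a minimal sketch of that conversion in both directions. The function names are made up for illustration, and this only covers the per-Pokemon value as described in this thread, not however the team-level stall score is built from it.

```python
import math

def stalliness_from_turns(turns: float) -> float:
    """Per-Pokemon stalliness as described in the thread: log2 of turns to KO its mirror."""
    return math.log2(turns)

def turns_from_stalliness(score: float) -> float:
    """Inverse of the above: how many turns a given stalliness value corresponds to."""
    return 2.0 ** score

# Chansey's quoted 4.27 corresponds to "over 19" turns.
print(round(turns_from_stalliness(4.27), 1))  # 19.3
# A 7-turn Seismic Toss mirror would instead score about 2.81.
print(round(stalliness_from_turns(7), 2))     # 2.81
# Scores of 3 and 4 correspond to 8 and 16 turns.
print(turns_from_stalliness(3), turns_from_stalliness(4))  # 8.0 16.0
```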

Here's a little rain offense-balance team. I'd classify it offense with some defensive pivots, mainly that the strategy is to have offense with things to take hits, not stally mons with things to clean up. It'll be interesting to see how this gets classified.

I'm actually wondering if you can use this method to make some sort of mixed bulk defense tiers. I am sure it can be done, but I am unaware of the logarithms involved to do this. Maybe you can give some insight, antar.

Official Data Miner

So I ran my algorithm over the 54 non-LC, non-VGC teams in the RMT archive and plotted bias and stalliness vs. how the RMT archive classifies the teams. The results are in this blog post.

As you can see, neither metric does a particularly good job of suggesting defined cutoffs for the playstyles. As I say in the post,

That’s really disappointing, because what it says is that I’m missing something. What makes a team heavy offense vs. offense? balance vs. semi-stall? Is it just a judgement call, or is there something concrete that I can try to incorporate?

brute force is really a textbook example of hyper offense: dual screen lead + 5 setup sweepers. you can't really get any more hyper than that. however, i would say hawaiian air is ALSO a hyper offense team (it might not have screens, but it's all about attacking attacking attacking and establishing momentum by sheer force and revenge potential), but the line between "offense" and "hyper offense" is one that needs a bit more defining, just in general.

jimera's definition of hyper offense, which i believe revolutionized the term for me, is that a) the team maintains momentum by quickly revenge killing anything that attempts to exert offensive pressure on it, so that it can resume pushing back, and b) the team's attackers all take out one another's checks, counters and walls so that no defensive team can hope to wall any mon safely, lest it give a free switch to something else that will proceed to set up and end the game. twash's team totally looks like that to me: several strong fast attackers with revenge potential (eg mamo), plus a setup sweeper or two that closes out the show. offense might follow a similar concept, but for me, i'd guess that the line is between a team whose supporters drive towards that goal, and a team that has no supporters and simply uses every single pokemon for it (bar a suicide lead that might establish favorable conditions to develop play). please feel free to argue because i think the terms *are* a bit loose and could use discussion.

when i look at yee's sand team though, it reeks of full stall. i don't think it would be considered semistall at all, although perhaps that's what the archive labels it? sdef jirachi and standard slowbro tank stuff and serve as the team's walls, rade/hippo/dtail all provide a mixture of respectable bulk and residual damage (hazards and phazing), and excadrill is the cleaner. some will disagree with me, but i think one fast, aggressive pokemon cannot take away from the full-stall-ness of a stall team, since that mon often ends up in the dual role of killing mons that the stall team has allowed to set up too much, and cleaning up worn-down teams in the mid-late game. such a pokemon does not prevent a team from being full stall... in my opinion, at least.

i think the real problem is that there is a ton of overlap between the team types that we're trying to define, and i think establishing the boundaries is really difficult, unless we start looking at the roles played by a mon, and how the team actually functions in the hands of its creator. those are things that are really difficult to look at when breaking the team down into mons, moves or EVs but they make all the difference for those subtle lines between bulky offense, balance, and semistall, or that kind of stuff. perhaps it is time to delve even deeper and come up with ways of defining a pokemon's roles: does it generate momentum? set up residual damage? serve as an all-out wall? sweep? offensive pivot? defensive pivot? where are the lines in between? is it even possible to define them? those are the questions we need to answer, i think. if we can put together the data we have about each mon and convert it into a list of roles that the mon plays on a team, we can see the roles that the team as a whole puts the most stress on, and those most important roles define what the team needs to function properly - ie how it works. that is the core of playstyle, imo.

Official Data Miner

Defining playstyle as a combination of roles rather than through a metric like stalliness is definitely one route we could go down, but I'm not quite ready yet to give up on stalliness.

It's looking like I can improve on agreement by adjusting my moveset modifications, raising and lowering some weights while adding or removing other modifications.

For example, hail and sand should count towards stall, rapid spin is really a stall move (magic bounce, though, I think plays better for offense, so I'm leaving it off for now), I was obviously weighting will-o-wisp WAY too strongly in favor of stall, and setup moves are not all created equal (Shell Smash is far more offensive than Calm Mind).

I don't think that Sand should count towards stall; a lot of people use TTar simply to fight rain and sun teams (not to mention that TTar is an excellent pokemon all by itself). Sun, on the other hand, should clearly count towards offense because the number of sun stall teams is completely negligible, which brings me to the next point: Sun teams almost always carry a spinner, often even a defensive one like Forretress or Donphan, without being defensive in any way outside of this pokemon (and often Ninetales). Maybe the easiest solution would be to simply always categorize Ninetales as an ultra offensive Pokemon to make up for this flaw of the metric.
Counting Rapid Spin as a stall move is a bit one-sided, as Starmie is one of the most useful partners for Pokemon such as Cloyster, Volcarona, Dragonite and many other heavily offensive but SR-weak Pokemon. You are absolutely right that many offensive teams prefer Magic Bounce, but Espeon/Xatu are not that easy to fit on any team and require some prediction, so maybe you shouldn't count Rapid Spin too much towards stall.
I am not too sure about set-up moves. Sure, bulk up and calm mind are not as offensive as the other boosts (and could be treated more as defensive boosts), but i wouldn't say that SD or DD are less offensive than shell smash (which is also a special case when you look at how often it also involves baton pass), and i wouldn't rank Cloyster as a more offensive mon than SD Terrakion or DD Salamence.

Overall big thanks to you for doing all the work to get even better statistics

it remains to be seen if these adjustments really do make the metric more accurate, so it's worth trying, but i wouldn't agree that sand should be stally, and rapid spin is not always stally either (although it's definitely more defensive than magic bounce). both of those things are all-around useful and even offensive teams will often incorporate a spinner if it doesn't hurt their momentum excessively to do so. perhaps make magic bounce offensive, because if anything it's an ability that's really all about momentum and offensive teams tend to appreciate it much more than defensive ones. just take a look at lavos's archetypal sun team lol

i also think that if anything cm and bulk up should be rated down in offensiveness, rather than shell smash being rated up. shell smash's effect is way more aggressive than say sd or nasty plot, but in the end it tends to wind up on the same kind of team as those other boosters would. on the other hand, things like cm and bulk up can just as easily work on a slow and steady late game finisher as they would on an all-out offensive mon - in fact, now that i think about it, i'd say cm and bulk up actually work BETTER as late game bulky setup moves than they do as offensive ones. perhaps they should actually be rated as stall moves! just think about how much rarer CM latios is than specs/recover+3/4 attacks. only 10% of latios run CM, whereas a whopping 65% of latias do. the difference between the two mons, and the reason they do or do not run CM, is obvious.

Official Data Miner

With my revisions complete, I also feel confident in defining the cutoffs between the various stall-related playstyles. First, a graph, showing how my original and newly revised stall scores score the 54 non-LC, non-VGC teams in the RMT archive:

The limitation of this would be that my system has no way of differentiating between offense and bulky offense, which is fine with me.

Also, to quote the blog post,

As you can see, it’s not perfect, but a lot of the teams that are incorrectly classified are pathological cases (for example, the Balance team that almost scores high enough to be Full Stall was designed by Molk and is built around a Scraggy). Frankly, no one has adequately explained to me the difference between Offense and Heavy Offense (Hawaiian Air is the Offense team with the lowest stall score, and it features two exploders and two more Pokemon that set up and, frankly, seems much more offensive than the "Hyper Offense" team Reflections).

results are looking pretty legit. curious to know which teams landed where in the metric vs what they were classified as in the archive index. perhaps it could give us some insight as to how we classify teams, vs how a computer would like to classify teams. otherwise the results seem to speak for themselves; i don't think there are any arguments left to make. let's see what happens if this is run on the official ladders!

it's the toxic staller gliscor and the sdef heatran here. with mons as offensive as hydreigon and cb terrakion, this team definitely plays offensive despite those two. the score looks pretty legit though so not much room to complain, it's PRACTICALLY on the offense side of the split.

the fact that it's all choiced except a hitmontop makes me think HO right away so the score seems to agree with a surface analysis. i guess it was mainly called bulky offense because the mons themselves (azumarill, zapdos in particular) are a bit bulkier and not as aggressive as setup-based HO would be.

i had individual things to say for each of these but they all sounded the same.. basically i agree. they all have enough bulky mons that they might as well be balance, and really the line between bulky offense and balance is a fiiiine one

yeah this team is actually really stally even though the objective is obviously to set up for a garchomp sweep. i would rather say that it's semistall with garchomp as the cleaner, than offense with the rest of the team as support lol

Team WOLF GANG @ -0.75. Two SDers and a Nasty Plot Celebi. How is this not offense?

this team is kinda on the defensive side what with flareon/tangela/quag FWG and a very bulky variant of scraggy that probably looks like a stallmon. however it's obviously not THAT stally. i have an interesting suggestion here because i don't think this team deserves that high of a stalliness: perhaps eviolite should count for less in lower tiers? as the higher evolutions get caught in higher tiers, lower tiers can often make use of eviolite to turn their pre-evos into viable competitors, especially if those mons have either boosting moves with which to turn their bulk into a sweep, or some natural bulk to begin with. the obvious culprit here is the scraggy whose moveset, EVs and item look stall-minded at first glance, but are actually meant to play as an offensive late-game cleaner. this could have significant effect in LC as well because even the most aggressive of LC mons can run eviolite and suddenly go from all-or-nothing suicide sweeper to bulky setup mon.

i'd categorize this team as bordering on full stall actually if you ask me. the victini is probably what makes all the difference. interspersing even ONE offensive uturner into a team swings it towards offense quite significantly, in terms of actual play. score looks reasonable.

overall looks pretty legit. also, lol at how all full stall teams were classified correctly. full stall is just THAT obvious lol

drown all is actually pretty stallish in the sense that you mainly get damage from spikes + lugia with a check-all in kyogre

i guess you could call it balance because it's not exactly as stally as groudon / blissey / forretress / giratina / latias / filler, along with the fact that tr is used to full stall so he probably rates it as a more offensive team

Read through all the blog posts and this thread. Interesting project, and cool scatter plots. Mostly looks very good, though there were a few things which seem like they may be questionable. If you're not keen on revisiting things that's fair enough, but my thoughts:

Leftovers do nothing to the metric. I based this decision on my observations of the differences in metric between bulky- and fully-offensive sets. In some ways, Leftovers are the “anti-Life Orb” in that it adds health where Life Orb takes it away, but the difference between a Life Orb and Leftovers set shouldn’t be a whopping 1.0. Fine then, you might suggest, split the difference and have it be Life Orb -0.25, Leftovers +0.25. The two problems with this are that (1) I truly believe that Life Orb should have the same effect as Choice items, and (2) in my experience, Leftovers is the item you throw on your Pokemon when you don’t have anything better to give it. I see plenty of Leftovers Pokemon who run offensive (even heavily offensive) sets. On the other hand, you rarely see a bulky Pokemon go with Life Orb.

I don't think Leftovers should have an equal effect to Life Orb since Leftovers both has a smaller per-turn effect on HP and does not change damage output, however giving it no effect (equal to no item, Quick Claw, BrightPowder, or even the 20% type boosting items in the initial blog post) seems.. not right. Leftovers may be used as a default item in a few cases, but even in those cases it is very clearly increasing the defensive ability of the holder, and it seems like this should be represented by a + on the metric, albeit one smaller than Life Orb's -.

Also, why exactly do you feel LO should have exactly the same effect as Choice items? From my experience Choice users tend to be more wallbreakers than full sweepers, and unlike choice items Life Orb has a significant direct harm to the holder's defensive ability. My gut feeling is to give LO a higher rating, though I'm not entirely sure about that.

Second and much more major point, you seem to be discarding the lower offensive and defensive stats entirely.

I propose to measure “stalliness” based on the number of hits of a (non-STAB) base-120* neutrally effective move it would take for a Pokemon to KO itself, or, more precisely, its mirror (ignoring items, abilities, status and actual movesets, and assuming the Pokemon is using its stronger attack stat against its stronger defense stat).
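To make the quoted definition concrete, here is a rough sketch of how one might compute it. It assumes the standard level-100 damage formula with no STAB, neutral effectiveness, no crit and no random roll, and it treats "number of hits" as a fraction rather than rounding up. The function names and the example stats are hypothetical, and this is not necessarily how the actual script does it.

```python
import math

def damage_per_hit(attack: int, defense: int, power: int = 120, level: int = 100) -> int:
    """Base damage of one neutral, non-STAB hit (no crit, no random roll, no items)."""
    base = (2 * level // 5 + 2) * power * attack // defense
    return base // 50 + 2

def mirror_stalliness(hp: int, atk: int, spa: int, dfn: int, spd: int) -> float:
    """log2 of the hits a Pokemon needs to KO its own mirror.

    Follows the quoted wording: the stronger attack stat is pitted against
    the stronger defense stat, and the weaker two are ignored.
    """
    attack = max(atk, spa)
    defense = max(dfn, spd)
    hits = hp / damage_per_hit(attack, defense)  # fractional hits: an assumption
    return math.log2(hits)

# Hypothetical final stats (not any real Pokemon):
# 400 HP, 300 Atk, 100 SpA, 200 Def, 250 SpD.
print(damage_per_hit(300, 250))                           # 122
print(round(mirror_stalliness(400, 300, 100, 200, 250), 2))  # 1.71
```

Because only the larger of each pair of stats goes in, a lopsided defender and an evenly bulky one with the same best defense come out identical.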

This will mean your formula cannot take into account the advantages of being a mixed sweeper, or, more importantly, the fact that some Pokemon may have one decent defensive stat but be extremely frail to the other kind of attacks (Cloyster, Aggron, Blissey, and Mantine are excellent examples, but even more mildly unbalanced defenses will cause a Pokémon's stallishness to be overestimated to a lesser extent). I can see why you'd want to make that simplification, dealing with both stats can get kind of messy, but this seems likely to be the biggest issue with your formula correctly assigning stallishness from stats. For attacks perhaps raising both to a power, adding them, then taking that power's root of the result would be effective? A larger power would mean a smaller boost for mixed attackers, and vice versa. Ideally this would only be applied if the set used both physical and special moves. A similar method (perhaps with a different power) could be used for defenses.

Doing this may complicate the effects of certain items. In particular, Eviolite and the Choice items could no longer reasonably be said to grant exactly the same boosts. Applying the item boosts in the initial calculation would solve this. Doing the same with Life Orb also changes the previous point: applying the boost to both stats, then having a smaller modifier simply from HP loss which is near equal or equal, seems sane.

One-time use items subtract 0.5 from the metric. The idea here is that consumption is antithetical to stall. Stall teams are often in pretty much the exact same position 50 turns in as they are 25 turns in. It’s what makes stall so annoying. There is an exception to this reasoning: Harvest and Recycle. See below. Note that this negates the effect of Red Card, which I believe is well and good.

Generally a good idea, but I'd suggest some change to how healing berries and berry juice are handled. In LC holding an item like that gives a massive boost to endurance, even though Eviolite seems much more popular in 5th gen and Berry Juice is banned from both. Making these items have the same effect as Salac or a Gem seems backwards. I'd suggest making one time use items which heal health either have +0.5 or at least be neutral (also helps with VGC/doubles/triples, where Sitrus is somewhat viable, and clearly more defensive than other one-time use items). Status healing berries are more debatable. They're used with Rest for one time healing, but of course that's still just a one time thing, not full stall's style, but also not hyper offense style.

The move Protect (and variants) adds 1 to the metric. From a mathematical standpoint, it’ll take you at least twice as many turns to KO this Pokemon.

This is assuming Protect is used purely as a stall tactic, rather than to activate a status orb, delay for more Speed Boosts, or for scouting dangerous moves as a frail sweeper. Also, while Protect is used to stall along with Wish/Leech Seed/Toxic/other passive damage, those forms of passive damage already give a significant + score. Adding a whole +1 to that seems too much. Many offensive sets will use Protect only occasionally, and even defensive seed/toxic stallers risk giving free turns by using it predictably, so the mathematical double turns is not generally applicable (except for Stallrien and friends).
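To spell out where the +1 comes from, and how much smaller it gets when Protect is only used now and then: since the metric is a base-2 logarithm of turns, doubling the turns adds exactly 1. The second line is a simple assumed model, not something from the blog post, in which Protect blocks a fraction f of the opponent's attacking turns.

```latex
\log_2(2T) = \log_2 2 + \log_2 T = 1 + \log_2 T

T' = \frac{T}{1 - f}
\quad\Longrightarrow\quad
\Delta = \log_2 T' - \log_2 T = -\log_2(1 - f)

f = \tfrac{1}{2} \Rightarrow \Delta = 1,
\qquad
f = \tfrac{1}{4} \Rightarrow \Delta = \log_2\tfrac{4}{3} \approx 0.415
```

Under that assumption the full +1 only applies to a set that protects every other turn, while a set that protects one turn in four would get well under half of it.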

The ability Regenerator adds 0.5 to the metric. It’s less simply because it recovers less health.

Halving the change to the metric because of a fairly small difference in health gained, when it can be activated on the switch rather than needing a turn to just heal.. hm, maybe it's not quite as stally as others, but 0.5 does seem slightly low.
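For reference, the modifiers quoted in this post are all flat additions to the base score, so they can be pictured as a simple lookup. This is only a sketch: the values are the ones quoted above from the blog post, but the key names and function are made up, and things like the Harvest/Recycle exception or the Life Orb and Choice item values (which aren't quoted here) are left out.

```python
# Flat modifiers as quoted from the blog post; key names are hypothetical.
MODIFIERS = {
    "protect": 1.0,         # "adds 1 to the metric"
    "regenerator": 0.5,     # "adds 0.5 to the metric"
    "one_time_item": -0.5,  # "subtract 0.5 from the metric"
    "leftovers": 0.0,       # "Leftovers do nothing to the metric"
}

def adjusted_stalliness(base: float, traits: list[str]) -> float:
    """Add every applicable flat modifier to a Pokemon's base stalliness."""
    return base + sum(MODIFIERS.get(trait, 0.0) for trait in traits)

# A hypothetical mon with base stalliness 2.0:
print(adjusted_stalliness(2.0, ["protect", "regenerator"]))  # 3.5
print(adjusted_stalliness(2.0, ["one_time_item"]))           # 1.5
print(adjusted_stalliness(2.0, ["leftovers"]))               # 2.0
```

Trying out the suggestions above (a small plus for Leftovers, a neutral or positive value for healing berries, a smaller Protect bonus) would just mean changing entries in the table.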