Even I feel like I'm missing the kick of gratification, and seem to be slightly less enthusiastic to cast flags - although my actual behaviour seems unchanged. It just doesn't feel like a game any more. I think this is a healthy effect and abolishing it was a good step - too much gamification is bad IMO.

Stack Overflow seems unaffected in any discernible way. We still get about the same number of flags each day, at the same relative level of quality.
–
Robert Harvey Jan 28 '12 at 21:35

1

@yoda yeah, that is possible. Although I suppose a weight fanatic would check in pretty frequently, and would by now have realized that the metric has gone?... Robert - interesting, thanks. That might make a good answer for now?
–
Pëkka Jan 28 '12 at 21:36

You mean you aren't more enticed now that it's been transformed into a mysterious number that operates in the dark shadows of the system, away from the prying eyes of users?
–
Tim Stone Jan 28 '12 at 21:37

2

@Tim apparently not. I seem to need the carrot of a metric dangling in front of me :)
–
Pëkka Jan 28 '12 at 21:38

8

I feel less concerned about flagging the borderline cases now, but I wouldn't assign any weight to evidence of that quality.
–
Flexo Jan 28 '12 at 21:41

1

I no longer fear it being attributed to flag-weight-whoring, so I will actively use it from now on. And presumably there will be fewer trivial flags piling up, so you can summon moderators for actual judgement calls henceforth.
–
mario Jan 28 '12 at 22:01

1

@mario yeah. And I guess non-answers will no longer get piles of flags within 3 minutes as must have been the case until now... the motivation of "cool, a non-answer! Free flag weight! click" is gone :)
–
Pëkka Jan 28 '12 at 22:04

1

(On a side note you can still infer your flag weight from the number of available daily flags)
–
Flexo Jan 28 '12 at 22:04

2

Once I reached 750 I'd just continue doing as before. Getting the "helpful" check is always nice to have, feels like a little "Yay, I did something right today" pat on the shoulder. I don't think people are less enthusiastic flagging. You'll probably just get a few more negative flag records.
–
slhck Jan 28 '12 at 22:11

Really, no offense intended, but that plot is not a very good way to determine if there is a change. It has weekly cycles, for a total of 12 weeks, and some of those overlap with some major holidays (Thanksgiving, Christmas, and New Year's). I could go on, but would love to just see the raw data. Perhaps it could be posted to stats.SE? :) I don't have a high flag weight on meta.SE, so the mods might never take care of that. Couldja do that for me? :D
–
Iterator Feb 1 '12 at 22:06

1

@Iterator we need more time here ... we only collected a week of data
–
waffles Feb 1 '12 at 22:09

@waffles Ah, the flag weight was dropped so recently? In any case, I would also investigate flagging frequency relative to prior flag weight, or, more simply, the proportion of flags created by low versus high weight users. Time will tell, of course.
–
Iterator Feb 1 '12 at 22:39

If anything, it has reduced the number of questions we get on Meta when we deny a flag from someone with high flag weight.

On SF I haven't noticed a diminution of flags. We still have one or two people on an old-question-off-topic holy war who are still at it, even though there is no more Marshal badge at the end of their troubles.

First my opinion, and then the data (on SO, and only insomuch as I can give you).

From what I've seen on Stack Overflow, the tying of moderation activities to a gaming aspect of the site severely hampered the moderators in their ability to moderate effectively.

On a fundamental level, the gaming aspects and the moderation of the site are diametrically opposed to each other. A moderator's first responsibility to any Stack Exchange site is to the content (making the Internet better), not to the game. The gaming aspects provided by the system are there to provide an incentive for users to that end as well.

However, as moderators, we sometimes have to take action that has an impact on the game (converting answers to comments, deleting answers, deleting posts, CW conversions, etc.). While that usually has an unfortunate impact on the game for someone, until flag weight, the worst impact was incredibly minimal, and it was almost always something that could be overcome easily by simply trying again.

All in all though, we had little impact on the actual game and we could focus on the primary goal of the site: curating great content.

When flag weight was introduced, moderators became the singular source for the minimal benefits and drastic penalties of that gaming aspect of the site. Granted, we should never let the gaming aspect of the site prevent us from the primary goal (curating great content), but as we've all seen, there were numerous posts on meta regarding the rejection of a single flag.

Granted, every user on Stack Exchange is completely justified in asking for an explanation of a moderator action on meta; unfortunately, the driving force behind these questions generally wasn't to further the first goal (curating great content) or a desire to understand how we work to achieve that goal, but usually a desire to further themselves in the game.

An additional impact was the ordering of the items in the flag queue; because flag weight contributes to the order of items in the flag queue, moderators were seeing flags ordered in part by a metric that was meaningless.

All-in-all, this placed a tremendous amount of pressure on the moderators to place more emphasis on the (distant) secondary goal (to play referee for a specific gaming aspect of the site) of the Stack Exchange sites, instead of the primary goal (to curate great content).

Now the data.

From what I've seen on Stack Overflow, the removal of flag weight as a gaming mechanism has had a tremendous impact.

First, I can't recall a single post on meta asking why a flag was rejected since it's been removed. Why would there be? If a flag is rejected, the user can simply try again; the severe penalties are no longer there.

This means that there is less noise on meta, and that's a good thing, as it allows the moderators to focus on the primary goal of the Stack Exchange sites.

Additionally, it's allowed us to effectively use the rejection of flags as teaching moments, taking action (or none) where it was needed and conveying to the user that the flag was incorrect (and possibly why depending on the rejection reason).

This will eventually lead to better quality flags over time.

However, on Stack Overflow, from what I've observed of the flag queue, the sheer number of flags is typically triple or quadruple the number that I saw from the time I was elected in November of 2011.

Granted, we've been able to handle them, but to me, it seems people are flagging much more with generally good results.

To your point about missing the kick of gratification, from my perspective, you are an outlier in that regard, as I've seen a few individuals go absolutely nuts with certain types of flags, unearthing dark, horrible content that I didn't know existed in the bowels of Stack Overflow.

I've cleared 16k flags on Stack Overflow and the vast overwhelming majority of them were indeed helpful. In general, where there is smoke (flags), there is fire (a problem of some kind). Where we saw issues with this is moderators who were excessively literal and anal -- "oh, you flagged this as X, but it is actually some other problem, so I will decline this flag." Anyway, THAT is why the guidance is to clear as helpful if there is any problem whatsoever with the post, not because of flag weight. Just to clarify.
–
Jeff Atwood♦ Jan 29 '12 at 6:43

I'll grant you that the flag weight discussions on meta got very tedious. But in general, the problem with the flag weight system had to do with subjectivity; such a system can really only be used (or rather, should only be used) when the criteria for judgment are completely unambiguous: consider "Is this spam?" versus "Is this a valid answer to the question?", which is WILDLY ambiguous; even moderators can't agree among themselves whether it is or isn't.
–
Jeff Atwood♦ Jan 29 '12 at 7:08

3

@jeff I disagree that the primary flaw of the flag weight was subjectivity; moderators were thrown into the middle of the gaming system, which forced us to focus less on the primary goal of curating content. Moderators were now in fact part of the game when they never should have been (or should have had minimal impact) in the first place. We live by the motto "narrowly scoped power" and flag weight pushed us beyond a narrowly defined scope. I believe that now, with flag weight no longer an aspect of the game, that definition of scope is restored.
–
casperOne Jan 29 '12 at 7:11

The primary flaw was absolutely subjectivity; a strict system like that requires virtually unanimous agreement on what is and isn't correct. When someone flags something as "not an answer", there's just no way that there can ever be absolute agreement on whether that particular flag was correct or incorrect. It's always a judgment call. Flag weight didn't "push you beyond a narrowly defined scope", whatever that means; it simply made you the final single arbiter of a completely subjective decision that had strict binary GOOD/BAD results, and a strong "penalty" at >500 flag weight.
–
Jeff Atwood♦ Jan 29 '12 at 7:14

4

My point in bringing up the flag number is that I've seen a lot of flags, too, and the intent of flags is to improve the system; therefore, all flags that improve the system in any way -- even if they are technically incorrect -- are helpful. Of the 16k flags I have seen, easily 99% are helpful by this criterion. I don't like having these discussions with people who haven't cleared a few thousand flags (outside of mods on SO, that's almost nobody) because they haven't seen the underlying flag data; they haven't walked a mile in the shoes of the people looking at these flags.
–
Jeff Atwood♦ Jan 29 '12 at 8:00

"it's allowed us to effectively use the rejection of flags as teaching moments" - do you really think that people are checking to see if their flag was accepted/rejected? Without flag weight a user is more likely to flag and move on, not caring how accurate the flag really was (and not caring how much work they've created for a mod).
–
slugster Jan 29 '12 at 11:17

@slugster that's an interesting argument. I can confirm that I no longer check the flags list as eagerly as before (although I still do check it and scan it for "declined" entries. But not as often as I used to.)
–
Pëkka Jan 29 '12 at 11:56

1

"the severe penalties are no longer there." Source? From what I read the flag-weight penalty is unchanged. The only difference is that it is hidden from the user interface.
–
CodesInChaos Jan 29 '12 at 14:40

@slugster I can say that it absolutely has been used effectively as a teaching moment. One particular scenario is the use of the "off topic" flag on career advice questions, sometimes asking for migration to Programmers and sometimes not. A simple message indicating that those questions are not suitable for Programmers (as per their FAQ) but are really "not constructive" has helped people flag more efficiently. Granted, there isn't a widespread denial of this flag in this situation, but if a single user is doing it often, one denial with a good message has improved the quality of flagging.
–
casperOne Jan 29 '12 at 15:19

1

@CodeInChaos perhaps that should be rephrased to "the impression you had that you were penalized so severely when a flag was rejected is no longer there"; yes, it's still there and calculated the same way, but the impact is not observable and not tied to any gaming aspects, effectively reducing the agita around that number. Unless you really care about the order of the flags in the queue, in which case, there's not much we can do (and how you'd know is beyond me, since you can't see the weight or the queue anyways =) ).
–
casperOne Jan 29 '12 at 15:25

I think that removing the visibility of flag weight now is a positive, but if Sam expanded his graph back to when flag weight was first made visible, I bet you'd see a huge increase in flag volume as a result. Flag weight brought needed attention to flagging, so it did have a positive impact. It's just outlived its usefulness, and I'm glad to see the Meta complaints go away.
–
Brad Larson Jan 30 '12 at 22:01

@BradLarson I don't deny that flag weight brought needed attention to flagging, but I can't say that the way that flagging is currently incorporated into the gaming aspect of the site wouldn't have brought more attention than flag weight. The way it is now, the risk-reward ratio is much lower, giving incentive to many more people to flag, instead of heavily penalizing people who were doing the best job of it.
–
casperOne Jan 30 '12 at 22:05

@casperOne - Perhaps just having the Deputy and Marshal badges would have provided enough incentive to kickstart the flagging volume we see now, given the sharp increase upon the requirement of the former for the last election. Unfortunately, we can't go back and test that now. It will be interesting to see if flag volume and quality are sustained in the long term with just the badges as motivation. There's momentum right now, but will it fade off over time?
–
Brad Larson Jan 30 '12 at 22:29

For better or worse, I think the removal of flag weight has destroyed any fun that used to exist in flagging. In my opinion it has now simply become a chore and this question made me realize that I have (unintentionally) stopped most flagging (except for the obvious spam/"me too" posts).

Badges and rep flow like milk and honey, but a high flag weight was the only thing that was actually difficult to attain. When I noticed that FW was removed it made me sad for a few minutes.

I'd say the complete opposite - it focuses on the positive now "you helped fix N problems" and feels far less like a game where the objective is to get a high score and not wipe out.
–
Flexo Jan 28 '12 at 21:54

4

As said, I think removing the gamification from this was a healthy step, but this is a valid observation nevertheless, especially "a high flag weight was the only thing that was actually difficult to attain"
–
Pëkka Jan 28 '12 at 22:00

4

I agree with this, but the decision of "helpful"/"declined" can be quite subjective on things like, say, "is this 'not an answer' flag valid?", which made the flag weight game kind of... random, at times. It is much clearer on things like spam flags. Overall I'd say the #1 problem was using such a strict system on decisions that are so subjective that even the sites' moderators can't definitively agree among themselves.
–
Jeff Atwood♦ Jan 29 '12 at 8:15

I completely agree. Contrary to popular belief, I don't spend my days flagging just because I'm a nice person. There is an element of that (which is why I'm also a very, very active editor), but...
–
Lightness Races in Orbit Jan 31 '12 at 20:53

Revisiting this a year on, I've barely flagged a damn thing since my last comment! I stick to close votes -- things that I can see.
–
Lightness Races in Orbit Jan 17 '13 at 3:12