I want to implement something similar to Stack Exchange's reputation system. What would be the better practice: using an SQL aggregate function to recalculate every time it changes, or storing reputations in a table and manually altering it every time reputation goes up or comes down? How does Stack Exchange's implementation work?

@Matt - And since it's a solar calculator ("going green" and all that), he has to work really fast.
–
cdeszaqMar 26 '12 at 13:27

1

I'm not working on a question and answer site like stackoverflow, I give points to users on each file upload, article post etc. I have a field on each users row in the database, I'll manually increment the users points/reputation when the user makes an upload or post, and will subtract if the user deletes.
–
ChibuzoMar 26 '12 at 13:33

1 Answer
1

So do we do this in SQL? Well, not really. We query to see if you're at the daily rep cap via SQL and make some decisions based on that, but we don't determine what reputation value you should have from that...it'd be a little crazy. If you want a detailed answer, keep reading.

First, credit goes to the entire team here, I didn't write the original rep code, or even all of the current code. The last major revision was from SuperDalgas and I, but there are a lot of hands that go into making this system tick.

We grab everything involved in your history and crawl over it doing a variety of things:

totalling the votes

gather time-based aggregates (rep this week, month, etc.)

fixing vote rep changes

for the rep cap

for the rep floor

other changes like merged users, etc.

see if you should be earning the association bonus, and grant it if missing

That's just the votes, we run over them and figure out what should be what, what totals each day has, etc. Now the way we store votes has 4 vital components (with regards to rep):

Voter

VoterRepChange (e.g. -1 on an answer downvote, or -400 for a bounty, +2 for accept, etc.)

Target

TargetRepChange

While this is great for storing the votes in a concise way, it suuuuuuuucks for determining reputation, because you have to do a Union of several sets of events (e.g. a Post deleted here counts but not here, unless it's a spam/offensive vote which always counts, etc.)

Once the votes are totaled up then we need to record a denormalized history of everything that affects a user:

Asked Accepts Answer

Asked Unaccepts Answer

Answer Accepted

Answer Unccepted

Voter Downvotes

Voter Undownvotes

Post Downvoted

Post Undownvoted

Post Upvoted

Post Unupvoted

Reveived Suggested Edit Approval

Spam Flag

Offensive Flag

Bounty Given

Bounty Earned

Bounty Cancelled

Post Deleted

Post Undeleted (not used, in practice, because a recalc happens)

Association Bonus

Arbitrary Reputation Change (StackApps API beta participation badge)

Vote Fraud Reversal

Post Migrated (not used in the DB, but shown in the UI, for a few reasons)

User Deleted

These are generated by looking at the results of the votes from the previous step, and recorded in order (chronologically) for the user as a ledger of all events affecting their rep (these RepHistory entries are what's shown on the reputation tab in the profile).

All of the above is for a recalc only. The process itself is atomic and non-commit-dependent. We track all of this through in-memory objects so we can fully simulate a recalc and all fixed votes without actually running it, this helps immensely with debugging (I highly recommend this for anyone considering a reputation system - I have built several developer debug views just for this purpose).

When you do anything that changes your rep, or someone else does it, we need to evaluate what's happened. A variety of things may happen at this point to determine the delta and record history, the simplest example is an upvote on a non-capped user:

User upvotes your non-CW post

Are you at the rep cap? Would this post put you over?

If yes, the delta of effective rep, which may be as low as 0, is determined

Otherwise the delta is +5 or +10 depending on post type

Record the Reputation change (Update User Set Reputation=@NewRep...)

Record the RepHistory entry (Insert Into RepHistory...)

Every other case is more complicated. For example an upvote reversal:

Were you at the cap?

If so, queue for a recalc since we need to re-walk votes and see if others that day counted now

Deduct the TargetRepChange

Record the Reputation change (Update User Set Reputation=@NewRep...)

Record the RepHistory entry (Insert Into RepHistory...)

Then there's the more extreme cases where we need to walk votes for everyone involved, let's say a post is deleted (or undeleted, this includes migrations):

Find anyone with a rep impact:

The owner

Downvoters (if an answer, question downvotes were free)

Anyone with an approved suggested edit, and a few others.

If they were impacted, queue them for a recalc, again this requires a re-crawl of votes

There are another dozen or so cases off the top of my head, but there's no reason to go into detail on every single one, you get the idea. If there's some crazy corner case (there are many) that you're curious about in particular, comment.

Wow! Now I understand your first comment. I guess I need to read this answer again and again. Thanks for your time.
–
ChibuzoMar 26 '12 at 21:17

2

@Chibuzo - there are other pieces of course, like how the background recalc queue works, etc...if you have specific questions feel free to ask, none of this is secret stuff really, just not laid out anywhere until now.
–
Nick Craver♦Mar 27 '12 at 0:35

5

Before this gets closed out to 'thank you\'s' ......... THANK YOU for the excellent answer :)
–
d-_-bSep 10 '12 at 23:20