My brain says hotness of a post should be calculated on every request because old posts should have a low hotness score.

I implemented this function in MySQL but, as you might guess, it's slow. Very slow, even with indexes in place.

Then I thought about updating the hotness field of the post on every upvote or downvote. Suppose a post has just been submitted, has 100 upvotes and 1 downvote, and its hotness is 78 (an example value). It then gets no votes for 3 days. After 3 days a new post appears, gets 100 upvotes and 1 downvote, and also scores 78. If I only update the hotness score on votes, the 3-day-old post and the new post will be shown together. But in theory, the 3-day-old post shouldn't even show up on the front page.

3 Answers

My brain says hotness of a post should be calculated on every request because old posts should have a low hotness score.

Your brain is wrong. The date in the function is the original time of the post, not the current time. The function's value is therefore independent of when it is calculated and remains constant between votes. Keeping this in mind, you only need to update it when votes are cast. You just store the number in the database and query on an indexed column. The performance issues disappear.

This also addresses your second issue - a post with 100 upvotes made today will always be hotter than a post at +100 made yesterday.
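To make that concrete, here is a minimal sketch of a Reddit-style hotness function. The thread does not show the actual formula, so the logarithmic vote term, the 45000-second divisor, and the use of the Unix epoch are assumptions for illustration, and the function name hot is hypothetical. The point it shows is that the score takes only the vote counts and the post's creation time as input, never the current time.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Assumption: plain Unix epoch as the time origin (any fixed origin works).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hot(ups: int, downs: int, posted_at: datetime) -> float:
    """Reddit-style hotness from votes and the post's creation time only."""
    score = ups - downs
    # Votes count logarithmically: 10 net votes = 1 point, 100 = 2 points, ...
    order = log10(max(abs(score), 1))
    sign = (score > 0) - (score < 0)
    # Newer posts get a larger time term; "now" is never an input.
    seconds = (posted_at - EPOCH).total_seconds()
    # Assumption: 45000 s (12.5 h) of recency is worth one order of magnitude of votes.
    return round(sign * order + seconds / 45000, 7)


now = datetime(2013, 1, 24, 13, 0, tzinfo=timezone.utc)
old_post = hot(100, 1, now - timedelta(days=3))
new_post = hot(100, 1, now)
print(new_post > old_post)  # the newer post ranks higher with identical votes
```

Because the stored value only changes when ups or downs change, it can be written to an indexed column at vote time and sorted on directly.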

In my humble opinion (I'm not an expert), you can update the hotness in the DB on any upvote or downvote, but to distinguish new posts from old ones you can apply some kind of equation to the hotness that you keep in the DB. Something like:

This means posts lose hotness over time, and posts more than 150 days old start getting negative hotness for their age. Likewise, a post that received votes in the last 15 days gains hotness for that, and after 15 days without votes it starts getting negative hotness.

I upvoted your answer :) But the "effect of votes" problem is already solved with the current algorithm. 1 upvote doesn't mean "1" if it's given 15 days after the submission of the post. But if no one votes in the last 3 days, the calculated hotness should "decay".
– Cnkt Jan 24 '13 at 13:07

When you have long-running processes, it is best to not make a request dependent on the completion of the process. If giving the user the most up-to-date hotness score is not of utmost importance, you could probably get away with giving the user a relatively close score.

Instead of launching this calculation on each request, you should have some sort of mechanism to trigger a background process that pulls the needed data, calculates the hotness, and updates a stored value for the post. Your database would look something like:

/* Other Post Data */
postViews - BIGINT // Number of actual post views
postViewsLastRun - BIGINT // Number of post views at last hotness job
lastViewed - DATETIME // Timestamp of the last viewing of the post
lastCalculated - DATETIME // Timestamp of the last hotness job
hotness - INT // Hotness rating to display to the user

Depending on what you feel is appropriate, you may wish to recalculate hotness after every n number of post views, or perhaps you want to calculate it every hour.

For the former, you would pull the postViews and postViewsLastRun columns, increment postViews, and check it against your view threshold. If the difference between postViews and postViewsLastRun meets or exceeds your threshold, you would update postViewsLastRun and spin off a process with some sort of job scheduler to handle the processing and updating of the hotness score.
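The view-count trigger might look roughly like the sketch below. It is an in-memory illustration only: the Post class, the record_view function, and the enqueue callback are hypothetical names standing in for your real table and job scheduler, and a real implementation would do the increment and comparison in the database.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Post:
    post_views: int = 0           # mirrors the postViews column
    post_views_last_run: int = 0  # mirrors the postViewsLastRun column


def record_view(post: Post, threshold: int, enqueue: Callable[[Post], None]) -> bool:
    """Count one view; queue a hotness job once `threshold` new views accumulate."""
    post.post_views += 1
    if post.post_views - post.post_views_last_run >= threshold:
        # Remember where this run started so the next one waits for new views.
        post.post_views_last_run = post.post_views
        enqueue(post)  # hand off to the background job; do not compute inline
        return True
    return False


jobs = []
post = Post()
for _ in range(7):
    record_view(post, threshold=3, enqueue=jobs.append)
print(len(jobs))  # number of hotness jobs queued after 7 views
```

The request path only pays for an increment and a comparison; the expensive calculation happens in whatever worker consumes the queue.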

For the latter, you would have a scheduled process that runs, say, every hour. For each post, it would check whether a recalculation is due by comparing the current time against the lastCalculated column, and whether the post has been viewed in that window. Assuming the criteria for another calculation are met, it would spin off a job to handle processing and continue checking the relevant data for other posts in the database.
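A sketch of the selection step for the scheduled variant follows. It assumes one possible reading of the criteria: a post is due when at least one interval has passed since lastCalculated and it has been viewed since then. The Post class and posts_due function are hypothetical; in practice this would be a single query over the posts table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Post:
    post_id: int
    last_viewed: datetime      # mirrors the lastViewed column
    last_calculated: datetime  # mirrors the lastCalculated column


def posts_due(posts: List[Post], now: datetime, interval: timedelta) -> List[Post]:
    """Select posts whose hotness is stale and that were viewed since the last run."""
    return [
        p for p in posts
        if now - p.last_calculated >= interval  # last job is old enough
        and p.last_viewed > p.last_calculated   # someone has looked at it since
    ]


now = datetime(2013, 1, 24, 12, 0)
posts = [
    Post(1, now - timedelta(minutes=10), now - timedelta(hours=2)),  # stale and viewed
    Post(2, now - timedelta(hours=3), now - timedelta(hours=2)),     # stale, not viewed
    Post(3, now - timedelta(minutes=5), now - timedelta(minutes=30)),  # recently calculated
]
print([p.post_id for p in posts_due(posts, now, timedelta(hours=1))])
```

Each selected post would then be handed to a background job that recalculates hotness and sets lastCalculated to the current time.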

Now whenever a user requests a post's data, you can give them a relatively accurate hotness rating, based on your own criteria for accuracy, and it will not add latency from a long-running calculation. In your case, you won't want any stale information to show up, because this would cause a bad user experience, so you would want to use the approach that runs at a regular time interval.