99.9 percent of comments are either spam, off-message or simply wasted electrons.—ithacaindy (Obviously part of the 0.1%.)

Stack Exchange does not have that problem. Thanks to flagging, our spam and offensive comments have a half-life of minutes. So the goal of this proposal is not to mess with a good thing. Instead, I propose we increase our information density by hiding comments that are not bad, but just trivial and likely obsolete.

Currently on Stack Overflow and other non-meta sites the top 5 comments by vote are displayed in chronological order. The sixth and following comments are hidden behind a link. For posts with many comments that means that the earliest comments have a strong tendency to be shown even if they are not particularly useful or friendly. For posts with 5 or fewer, the comments are a nearly permanent feature even if an edit or the passage of time makes them obsolete. To rectify the situation, I propose using:

Comment Weight

Each comment is given a weight from 0-29 based on the following criteria:

One point for every vote up to the tenth.

One point for each 15 characters beyond the minimum (15) capped at 9.

10 - age (in days) down to 0.

A comment with no votes, less than 30 characters long, and older than 10 days will have a weight of zero. If a comment of length 150 gets ten votes on the first day, it will temporarily weigh in at 29.

Comments with a weight less than 10 are hidden.[1] All remaining comments are shown in chronological order.[2]
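A minimal sketch of the weight calculation described above, assuming age is counted in whole days and length in characters. The `Comment` class, the constant names, and the function names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass

MIN_LENGTH = 15         # comments at or below this length earn no length points
CHARS_PER_POINT = 15    # one point per 15 characters beyond the minimum
MAX_VOTE_POINTS = 10
MAX_LENGTH_POINTS = 9
MAX_AGE_POINTS = 10
DISPLAY_THRESHOLD = 10  # comments weighing less than this are hidden


@dataclass
class Comment:
    votes: int
    length: int    # characters in the comment body
    age_days: int  # whole days since posting (assumption: fractions are dropped)


def comment_weight(c: Comment) -> int:
    """Sum of the three capped factors: votes, length, and freshness."""
    vote_points = min(c.votes, MAX_VOTE_POINTS)
    length_points = min(max(c.length - MIN_LENGTH, 0) // CHARS_PER_POINT,
                        MAX_LENGTH_POINTS)
    age_points = max(MAX_AGE_POINTS - c.age_days, 0)
    return vote_points + length_points + age_points


def is_shown(c: Comment) -> bool:
    """A comment stays visible while its weight is at least the threshold."""
    return comment_weight(c) >= DISPLAY_THRESHOLD
```

With these definitions, the two examples from the text come out as 0 (no votes, 29 characters, 11 days old) and 29 (ten votes, 150 characters, posted today).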

Let's look at each factor individually:

Age

All comments start at a weight of 10 and decay over the next 10 days. That means that every comment is displayed for at least one day and most[3] will be displayed longer. Several proposals have suggested hiding old comments. The common thread is that most comments don't matter after they have been seen by whomever they are directed to.[4] Under the top 5 system, comments are assumed to be valuable unless there are too many. Under the weighting system, comments must demonstrate their value in order to be displayed. The age factor gives comments the time they need to gather support.

Length

Lukas Mathis conducted a survey that showed that, statistically, longer is better when it comes to comments. While I could find no other studies to support or reject this claim, it does match my experience.[5] Not every long comment is worth keeping around, but as the comment length approaches 150 or so, the odds are a lot better. The length factor is capped at 9 so that a commenter can't pad their work to force display. Even the longest comment must be validated by a second person upvoting it.

Score

It's very difficult to properly evaluate comments in isolation from their post. This is where users come in. Only people can tell which comments deserve pride of place. Voting allows you to decide whether or not a comment gets shown to future readers.

People aren't perfect: I've observed that votes on short comments tend to mean "Funny". But votes on longer comments tend to mean (to take a page from Slashdot) "Insightful", "Interesting", or "Informative". Comments that we want to keep around have a combination of length and upvotes.

How you can help

As a baseline, the top five scheme hides 3,059,691 comments on Stack Overflow as of November 4, 2013. There were 24,136,126 undeleted comments in total. The weight algorithm would hide 22,517,301 comments. Changing the algorithm would hide 93% of comments, compared with 13% now. I feel confident that much of that is noise, but there's bound to be some signal lost as well.
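The percentages follow directly from the counts quoted above. A quick check, with a hypothetical helper name:

```python
TOTAL_COMMENTS = 24_136_126    # undeleted comments on Stack Overflow
HIDDEN_TOP_FIVE = 3_059_691    # hidden by the current top five scheme
HIDDEN_BY_WEIGHT = 22_517_301  # hidden by the proposed weight algorithm


def percent_hidden(hidden: int, total: int = TOTAL_COMMENTS) -> int:
    """Share of comments hidden, rounded to the nearest whole percent."""
    return round(100 * hidden / total)


print(percent_hidden(HIDDEN_TOP_FIVE))   # 13
print(percent_hidden(HIDDEN_BY_WEIGHT))  # 93
```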

I've written a query that displays the comments that are shown by the comment weight algorithm. Please take a few minutes to explore the comments on some posts with problematic comments.[6] Fork my query and tweak the algorithm. Write up your findings in an answer below. Let us know if you find significantly useful comments that would be lost or long comment threads that would be even noisier under this weighting algorithm.

[1] By which I mean, they will be behind the add / show X more comments link.

[2] There is a significant edge case surrounding very long, very heated comment threads. If we don't cap the number of comments somehow there will be posts with up to 55(!) comments shown. (See the top 100 which includes election posts.) The simplest solution is to keep the current caps (5 for main sites and 15 for metas). I'm certainly open to other ways of preventing comments from overwhelming answers in these cases.

Will every comment in existence eventually become hidden given enough time?
–
BoltClock's a Unicorn Nov 6 '13 at 5:16

1

@BoltClock's a Unicorn: Nope. If a comment is 150 characters or longer and has one upvote, it will be shown forever or until deleted, whichever comes first. Shorter comments need more votes, but even the shortest gets displayed permanently with 10 or more votes.
–
Jon Ericson♦ Nov 6 '13 at 5:19

26

I like this idea. Problem I foresee: in comment conversations between the OP of a question and someone who is helping them, it is very common to see upvotes on the "helper's" comments, and none on the "helpee's" comments. I feel like this might lead to unnecessary annoyance where every conversation in which an asker has sought help from an answerer will be partially obscured by a link. Perhaps the question owner should have extra weight on their comments?
–
Asad Nov 6 '13 at 5:34

11

@Asad: That's a very interesting idea. A counter argument is that the OP should probably edit their question in addition to/instead of having a long comment conversation. There would also have to be some way to avoid letting the OP get their comments written in stone by making them long enough. I still think a comment needs to be validated via at least one upvote.
–
Jon Ericson♦ Nov 6 '13 at 5:41

5

Third class citizens finally getting some love, nice! One thing I would add is that each flag on a comment will reduce its weight. I think 2 points is sensible. Many times old comments get one or two flags that are not enough to auto-delete and get buried too deep in the queue, so it can take a moderator some days to handle them.
–
Shadow Wizard Nov 6 '13 at 9:16

5

@Asad That happens already if, in a back and forth, only one side of the conversation is getting upvotes (or one is getting more) and the comment thread goes on long enough. So in that case, while it may not be desirable, it's still just staying the same, not getting any worse. In fact, given the proposed change, the less voted on comments would have enough "weight" to be shown for several days without votes, unlike now, where they'd be hidden as soon as there are enough comments for some to be hidden. It's a net wash or win in every way, even if it's not perfect.
–
Servy Nov 6 '13 at 17:14

4

@Sha Wiz Dow Ard: Hmmm... I hate to mess with comment flagging, which really is a strength of ours. But it does seem like the obsolete and too chatty flags could be used to reduce comment weight rather than outright deletion. As a moderator, I found these flags very difficult to handle since they are so often judgement calls. But wiping out 2 upvotes seems disproportionate to me. (I'll keep thinking about this; seems like a tweak we could apply after we've seen how the feature works in practice.)
–
Jon Ericson♦ Nov 6 '13 at 17:55

4

My problem with this is that it's another one of these good ideas that suits the heavily trafficked tags where many experienced users routinely vote on everything. On some of the less popular tags, one or two upvotes would be considered a popular comment and I would expect to see every comment disappear over time quite frankly. I would suggest some kind of weighting in the calculation that takes overall voting patterns in the tags into account.
–
McNab Nov 6 '13 at 18:10

2

Would there be some minimum threshold number of comments posted before hiding takes effect, even if the comments aren't high enough score? I'm lazy and don't want to click to see the single comment on an older question. This would also help alleviate the low traffic problem that McNab raises.
–
Esoteric Screen Name Nov 6 '13 at 19:19

2

@Esoteric Screen Name: That sort of takes us back to square one in my opinion. Now waiting until a question has enough views might be a more promising threshold. For a low traffic question, there just aren't enough potential voters to even see a comment. Hmmmm...
–
Jon Ericson♦ Nov 6 '13 at 19:30

7 Answers
7

The biggest change here is that all comments which get no upvotes will be hidden by default after 10-20 days. I am not sure this is a good idea.

20 days isn't very long at all in the total lifetime of a question. I'm not sure we've demonstrated that most comments that don't get upvoted are, on the whole, detrimental. I suspect that we get very poor vote coverage on comments, so there is not much to distinguish "This comment got 0 votes because it's noise" vs. "This comment got 0 votes because nobody bothered to upvote it yet"

I'd like to see more research into 0-score comments: how many of them are actually signal vs. noise? Then we can decide how much signal we're willing to lose in exchange for eliminating the noise.

If this were deletion then yes, it'd be a major problem, but it's just de-emphasizing them by adding a clickthrough. Something doesn't need to be detrimental to warrant less emphasis, it just needs to not be specifically called out as worth emphasizing.
–
Servy Nov 6 '13 at 20:36

1

But keen users can always expand the comments and upvote those they deem helpful, right? I would cancel or at least lower the 5-second rate limit to help them do this; I see many users finding that limit very frustrating, and it might be part of the reason for the low comment votes in general.
–
Shadow Wizard Nov 6 '13 at 20:37

5

@Servy If I have to click that link every time just in case there's something useful in the comments, haven't we failed? We should be pretty sure there's nothing useful if we're hiding something by default. Right now a question with hidden comments is the exception, not the norm. This is reversing that.
–
David Fullerton♦ Nov 6 '13 at 20:44

1

It's difficult to know how to study this statistically. However, here's a random sample of singleton comments that don't have upvotes. It occurs to me that so far there's no practical reason for people to upvote a singleton comment since there's no display difference. I'm not sure how many people make that calculation, but I know I have. Perhaps skipping the hiding algorithm for posts with, say, 1 to 2 comments would preserve more of the signal?
–
Jon Ericson♦ Nov 6 '13 at 20:49

4

Keeping N comments always displayed doesn't seem like a bad idea to me (basically the current system but with a better way of determining what gets shown before you have to click through). I haven't seen much below the threshold I'd care to read personally, but I'll admit I've been looking at very active questions when I composed my answer. Very inactive ones merit some attention too...
–
voretaq7 Nov 6 '13 at 20:54

What about basing the comment time cutoff on post age and activity?
–
hexafraction Nov 6 '13 at 21:42

1

@hexafraction I think we'd get the same results by only applying the age-out aspects if there are more comments than get displayed by default (i.e. show everything until we have 6 comments; when there are 6 or more, apply the algorithm to determine which comments are "above the fold") -- simplicity is elegance :)
–
voretaq7 Nov 6 '13 at 23:33

The click through is already annoying in the current implementation, especially since the hidden comments don't get downloaded with the page, and thus clicking through requires internet connectivity.
–
CodesInChaos Feb 3 at 19:19

Thank you all for the feedback! I'd like to propose a few tweaks to cover some of the concerns that have been raised so far:

We need Top N

If you look at a post with an insane number of comments it becomes clear that we need some sort of limiting factor. The top 5 (15 on meta) has worked out pretty well in terms of showing the right number of comments though I'd argue it doesn't always pick the best comments. voretaq7 wrote a query that orders comments by weight. Look at the query with some long comment threads and consider how sad you'd be to have to click a link to see the 6th and following comments.

Grace Period

Something that bugs me about the top N system is that new comments aren't displayed and don't have much chance of getting displayed. It's an excellent example of wealth condensation. That's why I designed the weight system to show every comment for at least one day. But after I wrote up the question, I started thinking about the "day in court" concept. If the grace period is tied to the other parts of the weight system, some comments will be seen for a day and others for 9 days before their fate is decided. That's not exactly fair. (Though being fair isn't a primary goal of this feature.)

Including age also makes the implementation a bit more complex than it should be. So instead of making age part of the weight calculation, I propose giving every comment a grace period of 7 days. That should be long enough for good comments to gather the votes they need to be permanently eligible to be shown. So the weight factors become just upvotes and length.

To be clear, the grace period should override the top N criteria. If a comment is less than a week old, it will be shown no matter how many comments are in a thread. That means there will be some huge comment blocks at times, but they won't last more than a week.
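Here is a rough sketch of how the grace period and top N could combine. It assumes the top N are picked by weight (votes plus length only) across the whole thread, with older comments winning ties, and that anything inside the grace period is added on top. Those choices, along with the `Comment` class and function names, are assumptions made for illustration and not a settled design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=7)


@dataclass(frozen=True)
class Comment:
    comment_id: int
    votes: int
    length: int        # characters in the comment body
    created: datetime


def static_weight(c: Comment) -> int:
    """Weight without the age factor: capped votes plus capped length points."""
    vote_points = min(c.votes, 10)
    length_points = min(max(c.length - 15, 0) // 15, 9)
    return vote_points + length_points


def visible_comments(comments, now, top_n=5):
    """Top N by weight, plus every comment still inside its grace period."""
    # Assumption: ties in weight go to the older comment.
    ranked = sorted(comments, key=lambda c: (-static_weight(c), c.created))
    keep = {c.comment_id for c in ranked[:top_n]}
    # The grace period overrides the top N limit.
    keep |= {c.comment_id for c in comments if now - c.created < GRACE_PERIOD}
    # Whatever is shown appears in chronological order.
    return sorted((c for c in comments if c.comment_id in keep),
                  key=lambda c: c.created)
```

On a thread with six old comments and two fresh ones, this shows the five heaviest old comments plus both new ones, so the block temporarily grows past N and shrinks again once the week is up.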

Don't hide just one comment

Singleton comments are a little bit different than longer comment blocks. One of the goals of the proposal is to increase information density. This line is very nearly devoid of content if there is only one comment:

add / show 1 more comments

In terms of density, it would be far better to show this instead:

Try ulimit -s. If that is not unlimited, set the stack segment size with ulimit -s unlimited. – Jon Ericson Mar 15 at 0:02

Or even:

Welcome to Stack Overflow and thank you for the suggestions! – Jon Ericson Apr 18 at 15:35

There's no need to substitute a nearly content-free link for a potentially useful comment. If we suspect that even half of all comments are useful, it probably doesn't make sense to hide twin comments either. As Esoteric Screen Name demonstrated we'd still be hiding a lot if we set the threshold for invoking the hiding algorithm to 2 comments on a post.

I would expand Don't hide just one comment to Don't hide just N comments -- basically always show the Top N. The weighting algorithm is certainly not perfect at picking out the "most interesting" comments, but in all the cases I've looked at (even low-traffic) it's pretty darn good. (In my poking on SF 5 seems to be a good value for N -- Reducing N often loses signal, increasing it almost always adds noise.) This doesn't "declutter" (which is a big benefit to hiding even the Top N), but it does vastly improve what we've got now.
–
voretaq7 Nov 7 '13 at 21:58

2

(Replying to my own comment, the major concern I have is the "insightful" comments that may not cross the weight threshold - "On FooOS 11 the command is frotz, not blorple" - +2 for length, and +5 for hypothetical upvotes gives it a long-term weight of 7, so it will age out even though it's got useful information)
–
voretaq7 Nov 7 '13 at 22:06

RE: If you look at a post with an insane number of comments... Two thoughts on that: (1) It stands to reason that a post with an insane number of upvotes might also have an insane number of comments; (2) Why rely on some algorithm here? Why not just have a moderator clean it up? It seems like a human can figure out which comments are extraneous, and which should endure as part of the conversation. (Take Single most informative answer ever made on stackoverflow and Best. Answer. Ever!, e.g. – if the comments are becoming "too much", just pick one and delete the other.)
–
J.R. Dec 13 '13 at 10:46

A number of folks have inquired as to what the current status on this is, so here's an update:

Our internal discussions and analysis have led at least some of us to conclude that the proposal may do more harm than good.

While the selection method Jon proposed, particularly the length input, is really clever, when we reviewed an (admittedly small) sample of questions where it would hide comments, it's pretty safe to say the following:

It essentially hides everything but a small percent that seem to have a very strong quality indicator.

This led a couple of us to conclude that:

It clears out a bunch of noise, but takes a fair amount of signal with it.

In reviewing some posts where the comments would be hidden, there were a lot of examples where the comments, while far from awesome, seemed to add some value to a reader by being default visible, by adding information, clarifying a point, or simply asking a question so no one else needs to. Some examples:

Requests for clarification or details

Comments that add color, like the pros and cons of a proposed solution

Comments that explain why an answer that sounds good won't actually work

Requests for additional code, and the response explaining why the OP can't share it (which prevents others from asking the same thing).

Feedback that the suggested solution didn't work (conveying to others that it looks good, but isn't actually effective)

You actually need to start here and get more information before asking this question

Note that some of the above are not ideal uses of comments. In some of the places where it felt harmful to hide comments, they were being slightly misused, because an edit would be more appropriate. But the key is that edits weren't happening, so the comment being visible to a visitor who'd otherwise miss the warning, etc. is better than it being hidden.

Another observation:

There appears to be more low-hanging-fruit for comment hiding on Questions than answers.

Answer comments were more often signal.

Plus the noise harm is much bigger on questions, as comments there could be pushing the top answer off the page (as opposed to bumping a lower-ranked answer for questions).

We haven't given up on this, as there's a lot of strong internal belief that we can do much better here, but this particular implementation has left us fairly split on whether it's a net win or loss.

The top N rows in the results are what will be shown (where N=5 for main sites, 15 for meta) - I'm using Creation Date as the tie breaker value when multiple comments have the same Weight. Everything else would be behind the "view/add more comments" link.

TODO (for someone feeling ambitious): Play with tie-breaker algorithms.
It's possible that Score would be a better tie-breaker than Creation Date.
A modified "weight" (just summing score, length, age) may also work well.
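For anyone who wants to play with tie-breakers outside of SQL, here is a small sketch of the same selection. It assumes each row already carries a precomputed weight; the dictionary keys and the `top_n` function are hypothetical names, not columns from the actual query.

```python
def top_n(rows, n, tie_key, tie_reverse=False):
    """Return the n heaviest rows, breaking weight ties with tie_key."""
    # Sort by the tie-breaker first, then rely on the stable sort so that
    # rows with equal weight keep their tie-breaker order.
    by_tie = sorted(rows, key=tie_key, reverse=tie_reverse)
    by_weight = sorted(by_tie, key=lambda r: r["weight"], reverse=True)
    return by_weight[:n]


rows = [
    {"id": 1, "weight": 12, "score": 1, "created": "2013-11-01"},
    {"id": 2, "weight": 12, "score": 4, "created": "2013-11-03"},
    {"id": 3, "weight": 15, "score": 2, "created": "2013-11-05"},
    {"id": 4, "weight": 9, "score": 0, "created": "2013-11-02"},
]

# Creation Date as tie-breaker (oldest first among equal weights).
oldest_first = top_n(rows, 2, tie_key=lambda r: r["created"])
# Score as tie-breaker (highest score first among equal weights).
best_score = top_n(rows, 2, tie_key=lambda r: r["score"], tie_reverse=True)
```

With the sample rows, comments 1 and 2 tie on weight, so the creation-date tie-breaker keeps comment 1 while the score tie-breaker keeps comment 2.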

There seem to be two goals here: better automatic identification of comments which can be hidden, and reducing the incidence of immediately visible and lengthy comment chains.

As David Fullerton and McNab have pointed out, hiding potentially helpful information without just cause is detrimental, and would frustrate the first goal. Also, users are lazy (this one certainly is), and clicking the "show comments" link is just so much effort. Thus, I think that hiding 93% of comments is too aggressive.

I suggest modifying the algorithm to hide comments as dictated, but also only when the hosting post has at least X number of comments (requiring # comments > X). For example, a one month old, 20 character comment with no upvotes scores 0 and would be hidden. But, if it's the only comment on a post, there's no reason at all not to show it.

Using the proposed point based algorithm and requiring a minimum of 6 comments on a post (a distant parallel to the current display-top-5-voted approach) still more than doubles the number of hidden comments on SO, and runs a much smaller risk of hiding useful content.
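As a sketch of that rule, assuming the weights have already been computed and reading the threshold as "a minimum of 6 comments on the post" (the function and parameter names are hypothetical):

```python
def hidden_indexes(weights, min_comments=6, threshold=10):
    """Positions of comments to hide, given per-comment weights for one post."""
    # Posts with fewer than min_comments comments are left alone entirely.
    if len(weights) < min_comments:
        return []
    return [i for i, w in enumerate(weights) if w < threshold]
```

A lone comment with weight 0 stays visible, since `hidden_indexes([0])` returns an empty list, while a post with six comments has every comment below the threshold hidden.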

For reference, here's a count of the number of posts with a given number of comments on SO (included in the SEDE query):

I can't quite write a query for this (not enough SQL-fu), however it shouldn't be hard to implement:

Why not hide long conversations by default? I've seen these happening, and they seem to generate comment votes, too, and clutter up the answer area.

If there are more than 3 comments from 2 users, replying to each other, hide them all (except for the more upvoted ones maybe)
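A rough sketch of that rule in Python rather than SQL. It assumes each comment records who it replies to (for example, parsed from a leading @name), which is a simplification; the dictionary keys and the `keep_votes` cutoff for "the more upvoted ones" are hypothetical.

```python
from collections import defaultdict


def conversation_comment_ids(comments, max_exchange=3, keep_votes=None):
    """Ids of comments to hide because they belong to a two-user back-and-forth.

    comments: dicts with "id", "author", "reply_to" (or None), and "votes".
    """
    by_pair = defaultdict(list)
    for c in comments:
        if c["reply_to"] is not None and c["reply_to"] != c["author"]:
            by_pair[frozenset((c["author"], c["reply_to"]))].append(c)

    hidden = set()
    for exchange in by_pair.values():
        # "Replying to each other" means both users must have written a reply.
        both_replied = len({c["author"] for c in exchange}) == 2
        if len(exchange) > max_exchange and both_replied:
            for c in exchange:
                # Optionally spare comments with enough upvotes.
                if keep_votes is None or c["votes"] < keep_votes:
                    hidden.add(c["id"])
    return hidden
```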

Why not give comment replies in general more weight/points? It seems like it's pretty common for them to contain useful info after being asked for it. Especially replies by the OP, except maybe excluding replies that start with "Thanks" and/or contain the words "updated"/"edited". Ideally, these should be incorporated into the post, but we can't automate that.

How would you tell the difference between a reply in a useless conversation and a reply to a clarification request or a "reply" to the OP with a correction to their answer?
–
Anna Lear♦ Nov 6 '13 at 18:50

@AnnaLear \@replies to the OP get excluded, \@replies by the OP get excluded unless in a long conversation. Though that gets a bit unwieldy.
–
Manishearth Nov 6 '13 at 19:02

Interestingly, I've considered adding weight to comments that are replies to other people. On reflection, I'm not sure there is much difference between a reply and a straight comment. (Also, replies get a small weight boost by virtue of having extra characters that don't really add much to the content. I could see removing those from the length calculation being a subsequent tweak if it weren't costly and difficult.)
–
Jon Ericson♦ Nov 6 '13 at 19:11

@JonEricson On reflection, that makes more sense. A lot more sense. Time to U-turn :P What I was considering the minority case is actually the majority case, on looking at most SO posts.
–
Manishearth Nov 6 '13 at 19:15

@JonEricson Yes, but that's been 6-8 week'd for so long :/
–
Manishearth Nov 7 '13 at 17:52

@JonEricson Also, sometimes I see discussions that get over in 6 messages or so. No point moving them to chat after the fact. Besides, on a site like SO moderating comments is the last priority for most mods.
–
Manishearth Nov 7 '13 at 17:54

Commenting is a way to balance out posts which are either inaccurate or disputed. When an issue is raised, then the post's votes may be affected by the comment's content because it raises an unaddressed issue. However, I do not believe the mere presence of a comment negatively affects a post.

Always show the first two

Always show the first two because it sets the tone for the "thread" which follows. I know Stack Exchange is not a forum, but the comments sure seem to follow that path where the post (question or answer) which is the parent to the comments is the start of the thread.

Apply your metric to any past the first two to get another 2 as a summary

Of 19 points, 10(max) for votes, 9(max) for content length - I am not sold on the days metric. Take the set of comments with points > 1, order by total votes then by date posted descending (oldest first), and then display another 2 of those.
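A small sketch of this scheme. The ordering instruction above can be read two ways, so this assumes "oldest first" among comments with equal votes; the dictionary keys and function names are hypothetical.

```python
def points(c):
    """Up to 19 points: 10 (max) for votes plus 9 (max) for content length."""
    vote_points = min(c["votes"], 10)
    length_points = min(max(c["length"] - 15, 0) // 15, 9)
    return vote_points + length_points


def summary_comments(comments, always=2, extra=2, min_points=1):
    """First `always` comments, plus up to `extra` of the rest chosen by votes."""
    ordered = sorted(comments, key=lambda c: c["created"])
    first, rest = ordered[:always], ordered[always:]
    candidates = [c for c in rest if points(c) > min_points]
    # Most votes first; assumption: the older comment wins a tie.
    candidates.sort(key=lambda c: (-c["votes"], c["created"]))
    picked = sorted(candidates[:extra], key=lambda c: c["created"])
    return first + picked
```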