If you’ve ever worked on a website where users are allowed to comment, it’s likely that you’ve been part of a discussion where somebody asked for comment length to be restricted. I guess the assumption is that people will only write useless drivel anyway, so the shorter the comment, the less drivel they can put on your site. But if that’s the assumption, why have comments to begin with?

In my subjective experience, comment length correlates with quality: when reading comments online, I’ve always felt that the longer a comment is, the better it probably is. If this is true, restricting comment length is likely to cut off the best comments, while leaving poorer comments intact.

To find out whether there really is a correlation between comment length and quality, I set up a little experiment this weekend: I loaded a bunch of comments from MetaFilter and YouTube, and asked people to rate them.

Since YouTube restricts comment length to 500 characters and MetaFilter doesn’t, mixing the two sites’ comments would skew the results: YouTube comments are obviously worse than MetaFilter comments, so pooling them would unfairly hurt the average quality of short comments. To get around this problem, I’ve kept the two data sets separate.

I should also note that some short comments are replies to other comments and don’t make sense out of context, so short comments may be at a slight disadvantage. Additionally, I’ve removed all formatting from the comments, which may have made some of them look slightly incoherent; this probably hits longer comments more often. I don’t believe either issue has a large influence on the results, though: the context of most comments is obvious even without any formatting, and both problems affect all comment lengths to some degree.

In total, you guys have kindly rated 9310 comments (4470 from MetaFilter, 4840 from YouTube) with 3972 distinct comment texts (1469 from MetaFilter, 2503 from YouTube). For duplicate comment texts (some comments were rated more than once, and other, mostly short, comments had identical text), I’ve taken the average rating. For the correlations themselves, I’m providing charts, but with one exception, I haven’t calculated correlation coefficients.
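The averaging step can be sketched in a few lines of Python; the comment texts below are made up for illustration, since the actual corpus was never released:

```python
from collections import defaultdict

def average_ratings(ratings):
    """Collapse duplicate comment texts into a single average rating each.

    `ratings` is a list of (comment_text, rating) pairs; a text appears
    several times if it was rated more than once or posted verbatim twice.
    """
    totals = defaultdict(lambda: [0.0, 0])  # text -> [rating sum, count]
    for text, rating in ratings:
        totals[text][0] += rating
        totals[text][1] += 1
    return {text: s / n for text, (s, n) in totals.items()}

# "lol" was rated twice, so its ratings 1 and 2 average to 1.5.
rated = [("lol", 1), ("lol", 2), ("Fascinating link, thanks!", 5)]
print(average_ratings(rated))  # {'lol': 1.5, 'Fascinating link, thanks!': 5.0}
```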

Finally, I’ve assigned a number to each «quality group», from 1 for the worst to 5 for the best. Sometimes you’ll see the group’s smiley, sometimes you’ll see the number.

Okay, with all the disclaimers out of the way, let’s look at the data.

A Word On Comment Length

Before I answer any of the interesting questions, I have to point out that most comments are short. Here’s what MetaFilter’s comment length distribution looks like with a linear x-axis:

YouTube’s comment length drops off even more quickly, since no comment can be longer than 500 characters. To get around this problem, I sometimes use an exponential scale with base 2 for comment length: when grouping comments by length, I double the group size from one group to the next. The same comment length distribution as above looks like this on an exponential scale:
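The doubling group sizes amount to bucketing each comment by the power of two its length falls under. A minimal Python sketch, assuming the smallest bucket covers 3-4 characters (matching the tables further down):

```python
def length_bucket(n_chars):
    """Return the (lo, hi) bucket that a comment of n_chars characters
    falls into, with bucket widths doubling: 3-4, 5-8, 9-16, 17-32, ..."""
    hi = 4  # the smallest bucket in the data is 3-4 characters
    while n_chars > hi:
        hi *= 2
    return (hi // 2 + 1, hi)

print(length_bucket(7))    # (5, 8)
print(length_bucket(500))  # (257, 512)
```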

Since the sizes of the groups are mostly irrelevant for my results, using an exponential scale allows for much more readable charts.

Is there a Correlation Between Comment Length and Comment Quality?

First, I’ve taken the average rating for each comment, rounded it to the nearest «quality group», and calculated how long the average comment in each group is. Here’s the result for YouTube:

| Quality | Average Comment Length (Characters) | Avg. Word Length | Avg. Sentence Length | Number of Comments |
|---------|-------------------------------------|------------------|----------------------|--------------------|
| 1       | 71.16                               | 4.04             | 38.52                | 1150               |
| 2       | 75.74                               | 4.01             | 40.59                | 669                |
| 3       | 79.27                               | 4.04             | 42.51                | 426                |
| 4       | 80.48                               | 4.03             | 43.22                | 143                |
| 5       | 57.74                               | 3.92             | 35.56                | 115                |

Each chart is accompanied by a table showing the numbers used in that chart, as well as some additional data, such as the number of comments in each group and the average word length in each group.

On YouTube at least, there doesn’t seem to be a correlation between comment length and quality. MetaFilter paints a different picture:

| Quality | Average Comment Length (Characters) | Avg. Word Length | Avg. Sentence Length | Number of Comments |
|---------|-------------------------------------|------------------|----------------------|--------------------|
| 1       | 248.31                              | 4.44             | 72.04                | 108                |
| 2       | 239.27                              | 4.48             | 72.56                | 214                |
| 3       | 373.83                              | 4.55             | 87.30                | 533                |
| 4       | 527.07                              | 4.53             | 91.94                | 452                |
| 5       | 751.80                              | 4.50             | 106.78               | 162                |

It is clear that higher-rated comments are, on average, longer than lower-rated comments. The group of highest-rated comments has an average length of over 750 characters, well beyond what YouTube even allows.

Let’s turn this chart around and plot quality based on length. Note that I’ve made the group sizes grow exponentially on the x-axis since there are fewer long comments.

Here’s how this looks on YouTube:

| Number of Characters | Average Rating | Number of Comments |
|----------------------|----------------|--------------------|
| 3-4                  | 1.69           | 33                 |
| 5-8                  | 1.81           | 76                 |
| 9-16                 | 2.00           | 238                |
| 17-32                | 1.95           | 565                |
| 33-64                | 1.90           | 676                |
| 65-128               | 1.87           | 537                |
| 129-256              | 1.96           | 271                |
| 257-512              | 1.97           | 107                |

Again, this looks pretty flat. (There actually is a correlation between length and average rating here, with a correlation coefficient of around 0.4, but the ratings are all so similar that I don’t think it matters much.)
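For reference, the coefficient mentioned here is easy to compute. The sketch below is a minimal Pearson implementation; note that the 0.4 figure was presumably computed on the raw per-comment ratings, which aren’t available, so applying it to the per-bucket averages from the table gives a different value:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Average rating per length bucket, from the YouTube table above.
bucket_ratings = [1.69, 1.81, 2.00, 1.95, 1.90, 1.87, 1.96, 1.97]
r = pearson(list(range(len(bucket_ratings))), bucket_ratings)
print(round(r, 2))  # positive, but the ratings barely move
```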

Let’s look at MetaFilter’s chart:

| Number of Characters | Average Rating | Number of Comments |
|----------------------|----------------|--------------------|
| 3-4                  | 1.67           | 3                  |
| 5-8                  | 1.50           | 5                  |
| 9-16                 | 2.52           | 27                 |
| 17-32                | 2.18           | 47                 |
| 33-64                | 2.73           | 138                |
| 65-128               | 2.88           | 240                |
| 129-256              | 3.09           | 323                |
| 257-512              | 3.39           | 319                |
| 513-1024             | 3.57           | 209                |
| 1025-2048            | 3.70           | 119                |
| 2049-4096            | 3.74           | 33                 |
| 4097-8192            | 3.50           | 4                  |

This chart clearly shows that the longer a comment is, the higher it is rated. However, the «ratings growth» flattens out at a comment length of about 2000 characters, at a rating of about 3.70 – long before hitting the «rating limit» of 5.

Conclusion

Given this data, my recommendation is to not restrict comment length, or to restrict it at a high level.

Looking at MetaFilter comments, there seems to be a clear correlation between comment quality and comment length. At least on websites with an audience that is not actively malevolent, longer comments seem to be better. Since longer comments were rated higher on average, restricting comment length may cut off the best comments, or force comment writers to trim their comments until they fit – and it’s pretty hard to trim a 2000-character comment into 500 characters without removing substance.

If, for some reason, you are required to restrict comment length, I would recommend setting the limit at around 2000 to 4000 characters: at that point, the return in quality for additional comment length seems to diminish. In my data, comment quality actually decreases beyond 4000 characters; however, since there are only four comments longer than that, this may not be representative. Even so, the data points before that final group suggest that average quality will at least not grow further beyond 4000 characters.

Note that I am not claiming a causal relationship between comment length and quality. Comment quality is determined by many factors; length may be one of them. The data does seem to suggest that restricting comment length will not improve comment quality, but it doesn’t prove it: even given this data, it’s possible that comments on MetaFilter would be more succinct and to the point if their length were restricted. It might also be that people simply rated longer comments higher because shorter comments were more likely to lack context, and that there is no actual correlation between quality and length even on MetaFilter. I should also repeat that YouTube did not exhibit a similar correlation between comment length and comment quality, so clearly, different sites operate under different rules.

What Else?

Of course, at this point, all of this got me thinking. Is there anything about the content of these comments that acts as a predictor for quality? You betcha.

Please note that this is not a statistically valid study. The selection of comments was not truly random (which should be okay when evaluating comment length, but probably isn’t when evaluating words, especially words that occur rarely). The people who voted were self-selected, and the sites that advertised the voting form further skewed who was able to vote.

Please also note that I am not advocating using any of this data for filtering purposes. This is just for fun.

For each of these charts, I’ve started with the average rating, followed by the ratings for the given subset of comments. For those «subset» ratings, I’ve marked them red if they were below average, and green if they were above average.

Weird Punctuation

First, let’s look at whether there’s a connection between poor punctuation and quality. I’ve looked at four different punctuation mistakes: a missing apostrophe in «dont», an ellipsis with only two dots, an ellipsis with four dots, and four consecutive question marks (though arguably, the last one isn’t really a mistake).

| Mistake | Average Rating | Number of Comments | Standard Deviation |
|---------|----------------|--------------------|--------------------|
| Average | 2.38           | 3972               |                    |
| ....    | 2.00           | 147                | 1.17               |
| dont    | 1.84           | 69                 | 1.12               |
| ..      | 2.00           | 91                 | 1.10               |
| ????    | 1.81           | 15                 | 0.81               |

All of these mistakes seem to correlate with a decrease in perceived comment quality. The four consecutive question marks are hit the hardest, followed by the missing apostrophe.
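Flagging these mistakes is a simple pattern-matching exercise. Here is one way the four checks might look; the exact matching rules used for the study aren’t specified, so the patterns below are assumptions:

```python
import re

# One regular expression per punctuation mistake discussed above.
MISTAKES = {
    "missing apostrophe": re.compile(r"\bdont\b", re.IGNORECASE),
    "two-dot ellipsis":   re.compile(r"(?<!\.)\.\.(?!\.)"),   # exactly two dots
    "four-dot ellipsis":  re.compile(r"(?<!\.)\.{4}(?!\.)"),  # exactly four dots
    "????":               re.compile(r"\?{4}"),
}

def find_mistakes(comment):
    """Return the names of all punctuation mistakes found in a comment."""
    return [name for name, rx in MISTAKES.items() if rx.search(comment)]

print(find_mistakes("i dont know.. why????"))
# ['missing apostrophe', 'two-dot ellipsis', '????']
```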

CAPS LOCK

Looking at the comments with at least 8 or 20 consecutive capital letters, we get a similar picture. Caps seem to be an indicator of a poor comment; the more caps, the worse the comment:

|                     | Average Rating | Number of Comments | Standard Deviation |
|---------------------|----------------|--------------------|--------------------|
| Average             | 2.38           | 3972               |                    |
| 8 Consecutive Caps  | 2.01           | 256                | 1.18               |
| 20 Consecutive Caps | 1.90           | 46                 | 1.06               |

Colloquialisms

Next, let’s look at terms that are typically not used in formal language. This includes abbreviations like «OMG» or «LOL», as well as spellings like «u» instead of «you». Here’s a list, and how they stack up:

| Colloquialism | Average Rating | Number of Comments | Standard Deviation |
|---------------|----------------|--------------------|--------------------|
| all           | 2.38           | 3972               | 1.24               |
| omg           | 2.16           | 54                 | 1.25               |
| LOL           | 1.79           | 245                | 1.00               |
| u             | 1.82           | 173                | 1.06               |
| dat           | 1.58           | 12                 | 0.93               |
| bro           | 1.60           | 5                  | 0.73               |

Unsurprisingly, they’re all indicators for poor comments. So let’s move on to something more interesting.

Swearing

What’s interesting about swear words is that not all of them are created equal: some actually correlate with above-average comments. See for yourself:

| Swear Word | Average Rating | Number of Comments | Standard Deviation |
|------------|----------------|--------------------|--------------------|
| all        | 2.38           | 3972               | 1.24               |
| suck       | 1.87           | 60                 | 1.12               |
| fuck       | 2.10           | 229                | 1.25               |
| cock       | 1.98           | 14                 | 1.29               |
| cunt       | 1.40           | 5                  | 0.37               |
| asshole    | 1.99           | 14                 | 1.21               |
| jackass    | 2.70           | 11                 | 1.30               |
| crap       | 2.70           | 30                 | 1.31               |
| shit       | 1.94           | 140                | 1.16               |
| bullshit   | 1.88           | 17                 | 1.05               |
| holy shit  | 2.25           | 2                  | 1.25               |
| douchebag  | 2.57           | 7                  | 1.59               |

Most notably, both «jackass» and «crap» seem to be reasonably good indicators for above-average comments.

Addressing Each Other

Comments that refer to the writer or to another person (and contain words such as «I», «you», or «me») rank above average. I guess people are nicer and write more thoughtfully when they address each other personally.

| Word | Average Rating | Number of Comments | Standard Deviation |
|------|----------------|--------------------|--------------------|
| all  | 2.38           | 3972               | 1.24               |
| you  | 2.64           | 1117               | 1.28               |
| I    | 2.72           | 1527               | 1.24               |
| me   | 2.76           | 386                | 1.30               |
| your | 2.69           | 299                | 1.32               |
| my   | 2.69           | 461                | 1.30               |

Other Positive Words

There are many more words which are predictors for good comments. Here are a few.

| Word          | Average Rating | Number of Comments | Standard Deviation |
|---------------|----------------|--------------------|--------------------|
| all           | 2.38           | 3972               | 1.24               |
| consider      | 3.61           | 48                 | 0.91               |
| interestingly | 3.50           | 2                  | 0.50               |
| presumably    | 3.89           | 5                  | 0.40               |

Note that «interestingly» and «presumably» occurred only twice and five times, respectively. What’s more, none of these words appeared on YouTube at all, which might imply that they’re only above average because they only appear in MetaFilter comments. However, MetaFilter’s average rating is 3.17, so they are above average even when only considering MetaFilter comments.

Average Word Length

Next, let’s look at word length. Is there a connection between average word length and comment quality? Let’s see. Here’s the average word length for every rating group for YouTube comments:

| Rating | Avg. Word Length |
|--------|------------------|
| 1      | 4.04             |
| 2      | 4.01             |
| 3      | 4.04             |
| 4      | 4.03             |
| 5      | 3.92             |

And here’s the same chart for MetaFilter comments:

| Rating | Avg. Word Length |
|--------|------------------|
| 1      | 4.44             |
| 2      | 4.48             |
| 3      | 4.55             |
| 4      | 4.53             |
| 5      | 4.50             |

I guess the interesting part here is that there is no interesting part. I would have expected to see a clear correlation between comment quality and word length. That does not seem to be the case.

Average Sentence Length

Same deal as average word length, but this time comparing average sentence length. On YouTube, it looks like this:

| Rating | Avg. Sentence Length |
|--------|----------------------|
| 1      | 38.52                |
| 2      | 40.59                |
| 3      | 42.51                |
| 4      | 43.22                |
| 5      | 35.56                |

Again, that doesn’t look like much of a correlation. But look at MetaFilter:

| Rating | Avg. Sentence Length |
|--------|----------------------|
| 1      | 72.04                |
| 2      | 72.56                |
| 3      | 87.30                |
| 4      | 91.94                |
| 5      | 106.78               |

Clearly, MetaFilter comments with longer sentences are perceived as «better.»

Context

So, can you use this data to predict whether a comment is good or bad? No. It’s all about context. Consider the words «gay» and «dog». On YouTube, these words have clear negative connotations. On MetaFilter, however, comments containing the word «gay» or «dog» are rated above average.

| Word  | Average Rating on YouTube | Average Rating on MetaFilter |
|-------|---------------------------|------------------------------|
| all   | 1.92                      | 3.17                         |
| bush  | 2.50                      | 3.04                         |
| obama | 2.22                      | 2.95                         |
| god   | 1.60                      | 3.26                         |
| dog   | 1.74                      | 3.78                         |
| gay   | 1.73                      | 3.85                         |

One possible explanation may be that «gay» and «dog» are used as insults on YouTube, but are used with their neutral meanings on MetaFilter.

As for Obama and Bush, perhaps political discussions are above average compared to the typical YouTube discussion, but below average compared to the typical MetaFilter discussion.

Comparing the Quality of YouTube and MetaFilter Comments

And finally: if I didn’t provide this comparison, you’d ask me for it, so here goes.

I’m not going to compare comment length, since YouTube restricts the length of its comments. However, what I can compare is word length. On average, MetaFilter commenters use longer words:

|         | Average Word Length |
|---------|---------------------|
| All     | 4.40                |
| Mefi    | 4.52                |
| YouTube | 4.02                |

Despite using words that are only slightly longer on average, their sentences are more than twice as long:

|         | Average Sentence Length (Characters) |
|---------|--------------------------------------|
| All     | 70.38                                |
| Mefi    | 89.87                                |
| YouTube | 39.92                                |

That means that MetaFilter commenters use roughly twice as many words per sentence as YouTube commenters.
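That follows from the two tables above: dividing the average sentence length by the average word length (plus roughly one character per word for the separating space) gives an approximate words-per-sentence figure. A quick back-of-the-envelope check:

```python
def words_per_sentence(avg_sentence_chars, avg_word_chars):
    # Each word occupies its own characters plus ~1 separator character.
    return avg_sentence_chars / (avg_word_chars + 1)

mefi = words_per_sentence(89.87, 4.52)     # roughly 16 words per sentence
youtube = words_per_sentence(39.92, 4.02)  # roughly 8 words per sentence
print(round(mefi / youtube, 1))  # ratio of about 2
```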

And the average quality of their comments is better:

|         | Average Comment Rating |
|---------|------------------------|
| All     | 2.38                   |
| Mefi    | 3.17                   |
| YouTube | 1.92                   |

And that’s all, folks.

To be perfectly clear, none of the data I have collected will be stored or released except for the data in this blog post. The collected list of rated comments will not be released for privacy and copyright reasons. In fact, I have already deleted the corpus of rated comments, so if you have additional ideas for evaluation, I unfortunately can’t implement them.

A word on statistical significance. Given that some of these results are based on a very small number of comments, it is possible or even likely that they have occurred by accident. I haven’t calculated the significance for any of the results. For some results, I have provided the standard deviation to indicate how closely clustered the results were. Again, I’m not a statistician, and this part is just for fun; don’t rely on any of these results.

If you liked this, you'll love my book. It's called Designed for Use: Create Usable Interfaces for Applications and the Web. In it, I cover the whole design process, from user research and sketching to usability tests and A/B testing. But I don't just explain techniques, I also talk about concepts like discoverability, when and how to use animations, what we can learn from video games, and much more.

Hi. My name is Lukas Mathis. I studied Computer Science/Software Engineering and Ergonomics/Usability at ETH Zürich. I work as a software engineer and user interface designer for a Swiss software company creating process management software. I’ve written a book about usability; it’s been translated into Chinese and Japanese. My first computer was a Performa 450, my first programming language was HyperTalk, my first electric guitar was a cheap Peavey, my first video game console was a VCS 2600, and my current snowboard is from Lib Tech. I live in a small cottage in a remote part of the Swiss Alps, and you can reach me on twitter.