Now that we have over 3 million questions on Stack Overflow, the "Duplicate Question" problem is becoming much harder to handle. We are seeing over 130 possibly duplicate questions on SO daily, and handling them is always tricky.

We have been kicking around some radical ideas to help improve this process and make it friendlier.

My current thinking is to switch the focus when we deal with duplication. Instead of presenting the OP with a "duplicate question" and shutting them down, I think we should involve the OP and the community at large in the process by focusing on the "duplicate answer".

The way I see this working is to allow any user with more than X rep to select an exact duplicate answer from another question and "teleport" it into the question she thinks is a dupe.

Once this "shadow community wiki answer" shows up, it can be voted on and accepted.

If it is accepted by the OP, the question is clearly a duplicate, so the system can close it.

If it is downvoted by the community because it does not answer the question, no loss: we can automatically remove it once it gets to -1 or -2.

For these shadow answers, editing should be locked (and only accessible on the original question, so we do not create fragments).
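
A minimal sketch of that lifecycle, to make the rules concrete. All names here (`ShadowAnswer`, `resolve`, `REMOVAL_SCORE`) are hypothetical, and the removal threshold of -2 is an assumption, since the proposal only says "-1 or -2":

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    KEEP = "keep"
    CLOSE_AS_DUPLICATE = "close_as_duplicate"
    REMOVE_SHADOW = "remove_shadow"


# Hypothetical threshold; the proposal leaves it open ("-1 or -2").
REMOVAL_SCORE = -2


@dataclass
class ShadowAnswer:
    source_answer_id: int       # the answer on the original question
    target_question_id: int     # the question suspected of being a dupe
    score: int = 0              # votes cast on the shadow copy
    accepted_by_op: bool = False
    editable: bool = False      # editing is locked on the shadow copy


def resolve(shadow: ShadowAnswer) -> Outcome:
    """Decide what the system should do with a shadow answer."""
    # Acceptance by the OP wins: the question is a duplicate.
    if shadow.accepted_by_op:
        return Outcome.CLOSE_AS_DUPLICATE
    # Enough downvotes means the answer did not fit; drop it quietly.
    if shadow.score <= REMOVAL_SCORE:
        return Outcome.REMOVE_SHADOW
    return Outcome.KEEP
```

Checking acceptance first means an OP who accepts a shadow answer closes the question even if the community had downvoted it; whether that ordering is right is open to debate.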

I think that introducing such a system would be far more effective than what we have today, would heavily reduce mod workload, and would lead to better canonical answers. It would also eliminate the annoyingly long comment threads on possible duplicate questions.

We can also use this to give people "badges" for finding lots of great canonical answers and so on.

My first thought is that this seems really confusing. It's also more trouble for the people doing the selecting, since now instead of pointing the OP at a set of applicable answers, you'd have to reason which of those answers would be best for the OP's scenario. I'm also not sure what about this would necessarily lead to better canonical answers; can you elaborate on that point?
– Tim Stone, May 10 '12 at 6:26

This doesn't solve the problem of getting new answers to the question posted on the "master" duplicate, so you're potentially creating a lot more duplication, with good answers to the same general problem scattered around in multiple places and therefore hard to find.
– Cody Gray, May 10 '12 at 6:27

@TheEstablishment how is it any worse than what we have now?
– waffles, May 10 '12 at 6:28

When a question gets closed as a duplicate, you can't post new answers to it. All new answers have to go on the "master" question.
– Cody Gray, May 10 '12 at 6:30

@TheEstablishment it still closes in the proposed system; in fact it would close faster ... e.g. shadow answer gets N votes ... close as dupe or something. It also auto-closes if the OP accepts, which shortcuts the wait time.
– waffles, May 10 '12 at 6:31

@TimStone a "teleported" answer would have to stand in the context of a slightly re-worded question, so it may need to be widened.
– waffles, May 10 '12 at 6:32

But who is going to bother to go to another question (as required by the edit lock) to update the answer so that it's also applicable to a different question than the one it was posted on originally? Just seems like a lot of jumping back and forth.
– Tim Stone, May 10 '12 at 6:33

@TimStone then why are we closing as duplicates stuff that is not really a duplicate ...
– waffles, May 10 '12 at 6:35

Are we? I was just saying that I don't see this increasing the likelihood of people editing to create more canonical answers, so I'm not sold that that's actually a benefit. Also, it's not clear in your proposal, but I assume that the shadow answers will be copied over with score = 0?
– Tim Stone, May 10 '12 at 6:46

I guess I'm also having a hard time envisioning a scenario wherein the OP would be satisfied with a shadow answer, but would have also felt shut down by a traditional close. I imagine the real benefit here is the part about better involving the OP in the process, but it still seems like a very complex system to achieve that. If, at the end of the day, the problem actually does lie with people doing a poor job at knowing what to close as duplicate, is there anything that could be done about that?
– Tim Stone, May 10 '12 at 7:02

...Otherwise we might have OPs going "WTF is this random answer on my question?" instead of "OMG my question is so totally not a duplicate!"
– Tim Stone, May 10 '12 at 7:02

@tim that already happens. But "good" (read: simple, clear) duplicates that are interesting, even if they have been duplicated a zillion times over get voted up anyway, which negates that factor and is a fault of the system. At any given time every question, no matter how duplicated, is new to someone
– Jeff Atwood♦, May 13 '12 at 17:48

Somehow this is more appealing to me than the original proposal. To me the hardest part about duplicate questions is search, finding the question you are about to duplicate. If I found that question more easily, that would help a lot.
– joshp, May 10 '12 at 7:03

I'm especially concerned with disadvantage number two, given that people already do that, and certainly don't need to be encouraged. Of course, between the auto-comment functionality and that proposal about link to text ratio that Robert has out there, perhaps that risk could be mitigated.
– Tim Stone, May 10 '12 at 7:10

What should happen when somebody under 3k uses a flag? No change (i.e. report the duplicate to us as usual)?
– BoltClock's a Unicorn, May 10 '12 at 8:54

@TimStone from some time today, link-only answers are going to be banned; I added an HTML parser that strips to text before measuring the length of the post
– waffles, May 11 '12 at 0:02
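
As a rough illustration of the check waffles describes (strip the HTML to text, then measure the length), here is a minimal sketch using Python's standard `html.parser`. The function names and the 30-character threshold are assumptions made for the example, not the actual Stack Exchange implementation:

```python
from html.parser import HTMLParser

# Hypothetical minimum; the real threshold is not stated in the thread.
MIN_TEXT_CHARS = 30


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of a rendered post body."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html):
    """Return the post's text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


def looks_link_only(html):
    """True when almost nothing is left once the markup is stripped."""
    return len(visible_text(html)) < MIN_TEXT_CHARS
```

Because the URL lives in the `href` attribute rather than in a text node, a post such as `<p><a href="http://example.com">here</a></p>` measures only four characters and is caught, however long the link itself is.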

@waffles no more link-only answers? O frabjous day! Callooh! Callay! I'm going to be happy about this for several more minutes before I finally accept that it's going to lead to answers like "w3schools.com more wrds to get aruond t3h stupid link filter."
– Pops♦, May 11 '12 at 0:44

well @PopularDemand those answers become way easier to flag as delete then :)
– waffles, May 11 '12 at 1:24

@waffles does it remove comments as well?
– Manishearth, May 11 '12 at 2:24

@TimManishEarth feel free to try and stump it on the meta sandbox question
– waffles, May 11 '12 at 2:26

@TimManishEarth do I really need to chuck a unicode filter into that function
– waffles, May 11 '12 at 4:45

@waffles: Not really... I just added that since you sort of challenged me :) Even if you add a Unicode filter, there probably are other workarounds. Leave it as it is and hope that nobody finds out. If a user is repeatedly (ab)using Unicode to post link-only answers, he'll get caught sooner or later.
– Manishearth, May 11 '12 at 4:47

@waffles the interstitial Matt proposed is the way to go; neither this nor the solution proposed in the Q addresses the root problem of finding the duplicate, but adding a mandatory interstitial duplicate selection page after submitting the question does.
– Jeff Atwood♦, May 12 '12 at 14:55

I ♥ this. How about, on acceptance of the answer, awarding the finder of the duplicate the accepted answer rep, in effect combining this proposal with meta.stackexchange.com/questions/90620 ? Although the reward would then depend on the asker's acceptance of the fact that he asked a dupe, which would almost certainly lead to tensions. Difficult. Still, plz consider awarding dupe-finders with rep points. At 3m questions, finding good dupes is one of the greatest services one can do to the site.
– Pëkka, May 24 '12 at 19:35

Wait, does <3 trigger the ♥ or is pekka just typing ♥ by hand, like a maniac?
– jcolebrand, Jun 8 '12 at 19:50

If this was made an answer-but-not-an-answer I can sort of see this working; the problem with posting the dupe as a comment is:

It's sort of easy to be skipped over.

Only people of whatever privilege can react and vote-to-close on it.

... but the problem with posting it as an answer is:

The original poster of the answer can miss out on up-votes (as up-votes are on the CW rather than their own post).

Is there anything to distinguish a random answer from the duplicate answer?

A user finding and posting the dupe answer may have taken more time to find it than the FGITW, who could have posted a placeholder answer and attracted a few upvotes already; the dupe-answer is then lost in the sea of answers.

Could the dupe-answer be posted in the format of an answer, but in its own category, above the answers?

However, I think a much less radical idea would be to simply make the list of possible duplicates more visible to the OP when typing the question.

Currently the first list of duplicates is presented after they've entered a title, which is almost entirely useless: the system has barely anything to go on, and it doesn't even know which language the question is about (which is entered in tags at the end). Even on Meta, I've asked questions which have shown no duplicates in this list, only to be closed within seconds with a link to a spitting image of the question I was asking.

If we move the tag input above the body of the question and show suggestions after that, the system has a lot more information to draw a list of suggestions from.

The second list of duplicates is presented whilst the user is typing the body of a question, but:

Its feedback is very delayed. I'm not sure whether it's gauged by number of chars or time spent typing, but it has always taken more than 1 minute to show up whenever I've typed a question.

It's at the side, which is not where the user is looking.

I'd like to see the user finish typing their question, click post, and then be presented with a page saying:

Hey, here's a list of duplicates, do any of them answer your question?

Yes, really that big.

If the user scrolls to the bottom and doesn't see a match, let them post their question.

Whether or not this replaces the list of suggestions along the side or is shown in tandem with the list of questions, I'm impartial. It could even be the same list of suggestions; let the people who didn't even see the list have their first scan, and let the people who've read the list scroll past it.

Some alternatives:

Make the vote-to-close barrier lower when the question is first posted (e.g. only 3 votes during the first hour, 5 after).
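
A minimal sketch of that sliding threshold, using the numbers from the example above (3 votes in the first hour, 5 afterwards). The function name and parameters are hypothetical:

```python
from datetime import datetime, timedelta


def close_votes_required(
    posted_at,
    now,
    early_window=timedelta(hours=1),  # "during the first hour"
    early_votes=3,
    normal_votes=5,
):
    """Return how many close votes a question needs at time `now`."""
    age = now - posted_at
    return early_votes if age < early_window else normal_votes


posted = datetime(2012, 5, 10, 6, 0)
print(close_votes_required(posted, posted + timedelta(minutes=30)))
print(close_votes_required(posted, posted + timedelta(hours=2)))
```

Keeping the window and both thresholds as parameters makes it easy to restrict the lower barrier to duplicate votes only, or to tune the numbers per site.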

I don't have a problem with submitting to "here's a page of very likely duplicates, click any to open in a new window", and for now, make them the exact duplicates you've already tried to alert them to.
– jcolebrand, May 10 '12 at 23:16

Alternative number one (reduce vote to close threshold for new posts and dupe votes only) makes a lot of sense regardless of what else is done.
– Jeff Atwood♦, May 12 '12 at 14:57

An interstitial "is this a duplicate of..." page after submitting a question (for newish users only? All users?) is something I have considered for a long time. Perhaps sheer question volume now justifies such an approach.
– Jeff Atwood♦, May 12 '12 at 15:35

We need to think a little outside the box here. Sometimes the duplicates have nuances which are threaded in the back and forth across the various answers and comments regarding the motivation of the question. It would be nice if this could somehow be synthesized into a definitive "answer".

I have mentioned before about the need for Stack Overflow to support a more didactic long form - i.e. articles or blog posts which examine a problem's facets more fully.

For whatever reason, I find that Community Wiki isn't really rich enough for a real article - and long Community Wiki things are difficult to read here for some reason - not sure if it's the layout or just the way it fits into SO's existing Q&A format.

Long text is difficult to read, period. Adding more text just makes it worse.
– Jeff Atwood♦, May 12 '12 at 14:52

@JeffAtwood So the definitive answer to a question must always be short? And just link answers (especially out of the site which may die due to linkrot) are discouraged? My point then is that some definitive answers to VERY repetitive questions here are not best handled by StackOverflow at all (because of the facets), yet people still feel compelled to ask them here. So a community has been created which people feel comfortable asking such questions and expecting such questions, yet which does not welcome such questions.
– Cade Roux, May 13 '12 at 2:05

the best answers are as short as they can be. You can also fit some pretty comprehensive answers in 30k chars. Did you know I measured once and the average length of a Steve Yegge blog post is about the same as the max length of a SE post, that is, 30k chars? And if you can't fit a "definitive" answer in One Metric Yegge, then, well.. maybe these aren't the droids you're looking for.
– Jeff Atwood♦, May 13 '12 at 4:38

How about adding a second canonical form of answers which are used as duplicates, instead of copying the original?

One problem with shadowing an answer between questions is that the answer to one question may look confusing or incorrect when posted with a different question, even if the two questions are duplicates. For example, if the answer references code in the first question, those references wouldn't make sense in the context of the second. Another possibility is that the original answer was a quick, oversimplified description of how to solve the problem, while the OP of the second question is a beginner and needs a more detailed explanation.

To make a single answer suitable for both questions (and possibly others in the future), it may have to be edited heavily. During this process, it is possible that some extraneous detail specific to the first question is removed, making the answer less helpful for that question. It is also possible that the original answerer may not like the edits, or not understand why they were made, and revert them, making the answer unsuitable again. Also, anyone who wants to edit that answer in the future would have to check every question it is shadowed onto to make sure it still makes sense in that context.

A possible solution to some of these problems is to create a canonical form of the answer which is connected to it, but not the same. When choosing an answer as a duplicate, the user would be presented with an editor pre-filled with the original answer for them to modify. The user creates a canonical answer which would be suitable for both questions and submits it. The second question shows this new answer as a possible duplicate, with a link to the original added automatically. The first question continues to show the original answer, but adds a link at the bottom which can be used to view the canonical answer and a list of the questions it is used on. The original answerer has the choice to replace the original with the canonical version (hiding the original behind the link). If the same answer is used as a duplicate for a different question in the future, the canonical form of the answer is automatically used.

This would allow the first question to keep its specific answer and no information is lost, but there is a single canonical answer used for all other questions. The canonical answer could be edited from any of the questions it is posted on without modifying the original. The original answerer can keep their answer, so they won't be upset or revert the edits.

Obviously, the edits required to create a canonical form of an answer could be significant. Therefore, I would suggest giving reputation to the user if the answer meets the requirements to close the question. If the answer never meets the requirements, no reputation would be awarded. Perhaps 2 reputation per upvote for both the original answerer and the person who found the duplicate. The reputation for finding the duplicate could be awarded only if the finder provides significant edits to the post, to be more like the current system. This could prevent others from using the same answer for a different question since they couldn't get reputation from it, so perhaps the editing window should be provided with the canonical answer for future duplicates so that the second finder still has an opportunity to gain reputation for edits.
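
To make the suggested reward rule concrete, here is a minimal sketch. The function and parameter names are hypothetical; only the "2 reputation per upvote", the close requirement, and the significant-edit condition come from the suggestion above:

```python
REP_PER_UPVOTE = 2  # "perhaps 2 reputation per upvote"


def dupe_rep_awards(upvotes, met_close_requirements, finder_made_significant_edits):
    """Return (original_answerer_rep, finder_rep) for a canonical answer."""
    # Nothing is awarded unless the answer actually closed the question.
    if not met_close_requirements:
        return (0, 0)
    answerer_rep = REP_PER_UPVOTE * upvotes
    # The finder is only rewarded for real editing work.
    finder_rep = answerer_rep if finder_made_significant_edits else 0
    return (answerer_rep, finder_rep)
```

What counts as a "significant" edit is left to the caller here; in practice that judgment would be the hard part of the rule.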

The only duplicate closes I agree with are questions asked by the same person on the same day.

Other duplicates never really are exact duplicates. The OP's context might be slightly different. Better solutions may have come up since the last time the question was asked. Problems with the old best answer might have surfaced. As a case in point, this question itself is also a duplicate, see here and here, among many others. But it's still useful.

The worst aspect of a duplicate close is how it alienates new users. A colleague of mine came to SO and asked a good question. It got closed as a duplicate. That might have been technically correct. But it also alienates fresh blood exactly like the RTFM comments on Usenet used to do.

So while I think your solution moves in the right direction, I'd go even further. The OP should be treated with respect and his question should not be closed as a duplicate without his agreement. The OP should be given a chance to respond to the duplicate suggestion. With maybe an exception for same-user same-day questions.