Really, people, do you want to be answering these same questions ten years from now? How about when you’re 65? That doesn’t sound so appealing now, does it?

We predicted this problem, even before we launched Stack Overflow. Why? Because the same thing happened on Usenet, where:

Most users could only see a few days or, at best, one month of archives for any given newsgroup. It was literally impossible to search the archives. I know that’s hard to imagine for you youngun’s, but this was before the web was even invented(!), and every Usenet site had to store their own copy of every newsgroup they were interested in reading, so there just wasn’t enough room for archives.

As a result, newbies would frequently ask the same beginner questions.

Ye Olde Timers got Ye Olde Tired of this.

What happened next depended on the newsgroup.

If the old timers were feeling generous, someone would write a FAQ, and re-post it every week or two. This was supposed to prevent simple questions getting asked again and again. These FAQs evolved into one of the early, great reference sources on the Internet before the WWW was invented.

Otherwise, the basic questions would just get asked again and again, and the old timers would grow bored and leave. The quality of the newsgroup would then deteriorate to approximately the level you would expect if seventh-graders were left to themselves, in other words, Lord of the Flies.

Stack Overflow does have a memory, and it has numerous mechanisms to prevent duplicates from getting through. We try to detect them even before the duplicate question has been asked. We have ways to close questions that are exact duplicates, we have ways of merging two questions that are identical if both questions already have quality answers, and we have extensive editing capabilities.

The editing capabilities are sometimes overlooked. With 100 reputation points, you can edit any post that has been marked as “community wiki.” With 2000 reputation points, you can edit anything.
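As a rough illustration of those two thresholds, here is a minimal sketch in Python. The function and constant names are made up for this example, and it ignores everything else a real permission check would consider (such as editing your own posts); it is not Stack Overflow's actual code.

```python
# Thresholds taken from the paragraph above; the names are hypothetical.
COMMUNITY_WIKI_EDIT_REP = 100
EDIT_ANYTHING_REP = 2000


def can_edit(reputation: int, is_community_wiki: bool) -> bool:
    """Return True if a user with this reputation may edit the post."""
    if reputation >= EDIT_ANYTHING_REP:
        # Enough reputation to edit any post at all.
        return True
    # Below that, only community wiki posts are editable.
    return is_community_wiki and reputation >= COMMUNITY_WIKI_EDIT_REP


print(can_edit(150, is_community_wiki=True))   # True
print(can_edit(150, is_community_wiki=False))  # False
```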

The editing feature is there so that old question/answer pairs can get better and better. For every person who asks a question and gets an answer on Stack Overflow, hundreds or thousands of people will come read that conversation later. Even if the original asker got a decent answer and moved on, the question lives on and may continue to be useful for decades.

This is fundamentally different from Usenet or any of the web-based forums. It means that Stack Overflow is not just a historical record of questions and answers. It’s a lot more than that: it’s actually a community-edited wiki of narrow, “long-tail” questions — questions that aren’t quite important enough to deserve a page on Wikipedia, but which come up over and over again.

When you see a question that seems like it might reflect a common problem, don’t just answer it to get a few points. That doesn’t make the Internet any better. Instead, help us build up a library of canonical questions and answers that are more generic versions of the same question, and then start closing all the exact duplicates. Here are some guidelines:

Don’t answer questions that have already been answered elsewhere. Yeah, you might earn a couple of points of reputation, but, because you are duplicating content, you are actually making the internet worse. Why? Because that answer might be true today, but as technology changes, it might not be true tomorrow. There are almost certainly thousands of wrong facts on Stack Overflow already, which may have been true when they were written but are no longer true. These facts will pollute the Internet for years. This problem is not tractable if we allow Stack Overflow to become just an endless river of questions and answers. It has to be more like a Wikipedia of Questions and Answers, with canonical answers that can be edited in one place, if we are ever going to stand a chance of keeping all the information that we expose to the Internet at least reasonably correct.

If you’re going to close a user’s question as a duplicate, it has to be a real duplicate. For example, if a user asks, “What does the IP address 128.0.1.1/24 mean?” it’s OK to close that as a duplicate of a more general question like “What do IP addresses of the form a.b.c.d/e mean?” But it’s not OK to close it as a duplicate of a twenty-seven page guide to netmasks. That’s the moral equivalent of saying “RTFM.” Stack Overflow is not meant to be a library of reference manuals. It’s supposed to contain the same information as a library of reference manuals, in the form of millions of questions and answers. Combined with Google, that gives us the magical power of a library of reference manuals you never have to read! It’s like, you go to the library, and there’s a wizard there at the door, and you ask your question, and, instead of being told to read a book, you just get (are you sitting down?) the actual answer!

It is OK to edit a question to make it more general. With the power of editing comes the power to take someone’s selfish, very specific question, and edit it a little bit until they’re asking the more general question that hundreds of people encounter. For example, if someone asks, “I set up a web server at home but I can’t access it from work,” it’s OK to rewrite the question as, “What things should I check when a web server running at home is not visible on the Internet?” In fact, sometimes selfish, stupid questions of the “do my homework” variety can be easily edited into a form where the answer will provide an extremely valuable resource for the internet at large.

Help us build a great library of canonical answers. If you keep seeing the same form of questions, whether it’s mod_rewrite rules on Server Fault, freezing computers on Super User, or how to use regular expressions to parse HTML, write a great, canonical answer, once and for all. Make it community wiki so that as many other people as possible can make it great. Work really hard on writing something that is clear, concise, and understandable by as wide an audience as possible.
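To make the IP address example from the guidelines above concrete, here is a small sketch of what a direct answer to “What does 128.0.1.1/24 mean?” might compute, using Python's standard `ipaddress` module. The helper name `describe` is invented for this illustration.

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Split an a.b.c.d/e address into the pieces the notation encodes."""
    iface = ipaddress.ip_interface(cidr)
    network = iface.network
    return {
        "host": str(iface.ip),                  # the a.b.c.d part
        "prefix_length": network.prefixlen,     # the /e part: leading network bits
        "netmask": str(network.netmask),        # the same prefix written as a mask
        "network": str(network),                # the block the host belongs to
        "addresses_in_block": network.num_addresses,
    }


for key, value in describe("128.0.1.1/24").items():
    print(f"{key}: {value}")
```

For 128.0.1.1/24 this reports a prefix length of 24, a netmask of 255.255.255.0, the network 128.0.1.0/24, and 256 addresses in the block. That is the kind of short, specific answer the asker wanted, and the more general a.b.c.d/e question can carry the same explanation.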

89 Comments

I agree on the premise of having One Reference Question with One Reference Answer — usually, if many people stumble on the same problem, that’s because there’s no one good enough reference on the topic, so we might as well make one!

I’m not comfortable, however, with the idea of taking over one question and blowing it up beyond the particular case. That goes against the “respect the author” fundamental premise of editing and will get in the way of the asker if his particular problem didn’t happen to be one of the general pitfalls.

@badp As long as a user gets an answer they can understand to the specific question they asked, it doesn’t hurt to edit their question. It certainly doesn’t hurt to edit it for grammar and typos, and it doesn’t even hurt to remove unnecessary specifics (changing someone’s domain name to “example.com” for example).

A lot of times the original question is so hopeless, either because it’s in extremely poor English, or because the asker is deeply, deeply confused about something, that the question, as written, is not answerable. In those cases I argue that you should feel free to edit it into an answerable form, making your best guess as to what the asker was looking for. If you get it wrong, the asker can always come back and ask another question.

Also — sometimes a user will ask a specific question (“My server has been hacked! Help!”) with a lot of specifics (see for example http://serverfault.com/questions/218005/my-servers-been-hacked-emergency ). Once the asker gets their original question asked, if someone wants to edit it to clean it up enough for posterity (removing confusing anecdotes, etc), it’s Making the Internet Better so why not?

One thing that could usefully be done would be to find a way to remove all the questions on malloc/realloc/free. There only needs to be a single one. You could probably cut the size down by a double-digit percentage if you cracked that one!!

This post confuses me. The second bullet when editing a question is “clarify meaning without changing it” – which is the general consensus in meta discussions. I don’t think we can just edit questions to a general question we’d like to answer.
As for FAQs – they are almost useless. It is almost impossible for a new user to find the per-tag FAQs – the closest they get is the automatic “Related Questions”, which isn’t even based on the tags…
I don’t have a suggestion for recurring questions, but I don’t really see one here…

Are you overlooking the laziness factor? It is much easier to quickly compose a new question and have it answered than it is to search and read through a bunch of questions that may be close but not totally relevant.

The person who knows the answer can easily understand it is a duplicate question. It requires a lot of effort on the person who is asking to determine this.

I agree with Joel, it certainly does not hurt the originator of the question if his or her question is clarified. As he said, the best thing is this all works like Wikipedia, the author can then re-edit back (knowing that again it may be edited). I’ve seen cases where the subject (the actual question) didn’t match the actual content of that question. Someone edited it and I was easily able to go back and answer that question.

I think in the long term, there is no alternative to rewarding the identifying of good duplicates. After all, it’s not only about finding a duplicate question – if at all possible, it’s about finding one in which the OP is actually going to find a good answer! This process often takes more time than answering the question yourself, and will need some kind of reward.

In the same vein, I think the system will need some way of quickly finding questions with the best answers. As it stands, you need to browse through potential duplicate questions to identify those with highly-upvoted answers.

Also, as an addendum, maybe Stack Overflow needs a new close reason: “Localized duplicate of generic question X”. This could help put a plug e.g. into the constant stream of “how do I rewrite an URL” mod_rewrite questions, most of which can be answered with a generic quick mod_rewrite introduction, and a small amount of common sense. At the moment, there is no way to deal with those questions other than “too localized” – and that is simply not being used, because SOers are mostly nice people and hesitant to close valid questions.

Answer #N:
The SO search function almost never finds anything useful. You can tell search isn’t and wasn’t a priority just by the UI design- no highlighting, same color as the background, and its placement on the page ignores Fitts’s Law.

My solution to the deficient search function is to use Google’s advanced search features to filter out scrape sites and use a few trusted sources who I’ve identified to be expert in narrow fields as complementary search terms.

Answer #N+1:
Since your users provide your content, you’re dependent on their ability to frame questions and tags in a meaningful way. Good luck on that one- the overlap between skilled programmers and willingly proficient technical writers is vanishingly small.

1) It is unusable with a text mode browser (elinks) or on a small screen (handheld).

Try it with elinks. Shudder.

Then imagine you are partially blind and use a speech synthesizer for many things. Feed the output to a speech synthesizer if you like. Listen to how many irrelevant things and links are part of the “answer” to a given question. Even a sighted user has to filter all that stuff out to go from question to answer.

2) It doesn’t integrate with emacs. I know, I’m being snarky…

Usenet and usenet faqs were far more flexible in these regards.

Because I actually use stackoverflow, and like the reputation based concept, maybe I’ll take a stab at the json interface with gnugol and emacs to see if I can make it more navigable.

Great post. I agree that editing is important to prevent answers from becoming obsolete. The goal of creating canonical answers to everything is laudable.

One issue with editing old questions and answers is that they are basically dead. They receive no attention from the answerer community. True, people find them in searches, but when a question is first asked there is a buzz of activity around it. People are correcting answers, asking clarifying questions in the comments, etc. There should be a way of rejuvenating a question. All the old answerers should be notified to take a look at it again as it is now obsolete and needs more attention. It should show back up on the main page or something. I don’t have a solution to this problem. Maybe questions can be marked as expired by a moderator and old answerers will get a message in their inbox to review their work?

There is no incentive (outside of altruism) to edit old questions. There should be more badges and reputation points awarded in this area.

There is only a short window of time to edit a question to make it more general. After that people answered it already and you can’t fundamentally change a question after it has already been answered. The right thing to do is probably to answer the specific question and within the answer link to the general one. If there is no general form of the question you should create one and answer it. The FAQ mentions asking and answering your own question, but I don’t see much of that on SO.

I agree with David, but besides users who don’t know how to search there are people who don’t even know *what* to search for.

Anyway, IMHO the Related Questions displayed after you type the question title could be improved. If I type “is select * good?” none of those related question brings me something useful. The same for “should I use free or delete in c++” or “Can I allocate memory using malloc in c++”.

@David I agree. The problem is with the Questioner. If a person wants the question answered quickly, then the best way is to *look for duplicates* before asking. Or am I naive? And in fact, as you ask a question on SO, it displays possible matching questions. (However, I just tested it with “What’s better: SELECT * or SELECT c1, c2, c3?”, and an actual duplicate was about 10 questions down.)
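For what it's worth, the kind of title matching being tested in these comments can be sketched in a few lines of Python. This is only an illustration under simple assumptions (word overlap between titles, a hand-picked stopword list, an arbitrary threshold); nothing here is known to reflect how Stack Overflow's “Related Questions” actually works, and all names are invented for the example.

```python
import re

# A tiny, hand-picked stopword list; a real system would need far more care.
STOPWORDS = {"a", "an", "the", "is", "or", "to", "in", "of", "i",
             "how", "do", "what", "s", "can", "with"}


def tokens(title: str) -> set:
    """Lowercase the title and keep word-like tokens such as 'c++' or '*'."""
    words = re.findall(r"[a-z0-9+#*]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two titles have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def likely_duplicates(new_title, existing_titles, threshold=0.5):
    """Return (score, title) pairs at or above the threshold, best first."""
    scored = [(jaccard(new_title, t), t) for t in existing_titles]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


existing = [
    "free or delete in C++?",
    "How do I rewrite a URL with mod_rewrite",
]
print(likely_duplicates("Should I use free or delete in C++", existing))
```

A word-overlap score like this also shows why the problem is hard: “Is SELECT * good?” and “What's better: SELECT * or SELECT c1, c2, c3?” share only two tokens, so they score low even though a person would call them duplicates.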

Is it possible that many of the duplicate questions are coming from reputation whores?

I agree with Mr. Heffernan, at least in part. SO is often used by those looking for quick answers and when they don’t find those in the first three results of a search, they will create a new, often poorly worded, version.

Some of the searches don’t return the best results either. Search for str int python compare and the first three results don’t have the word “compare” in them. It isn’t until the fourth search result that you will find something which actually addresses the question at hand.

Moreover, from a UI perspective, I think that there is a mistake in the placement of SO’s “search tips”. It is also visually “out of the way”: it is in a right column, an area generally relegated to menu bars and meta data which is generally “unimportant”. In Bing and Google “advanced search” is something which is fairly prominent in a search result display page. Yahoo a little less so (the user has to click options). But access to advanced is still in prime real estate in all cases.

Another thing which SO might benefit from is the format which is used by the search engines. Google will generally return four lines: Title, two line description, and then a short version of the URL. It takes up about half as much space as the typical SO response. This means that I can see twice as many results without any interaction beyond the initial search.

With 49% of the general public unwilling to search beyond page 1 of search results (and developers are people too), there should be real concern over making those results appear bloated. I know that a lot of the data presented is important (and I will always prefer to look at questions with positive votes), but there is a lot of extra in there as well. (Is the user icon really relevant to a search?)

Perhaps this is unfair. I don’t have to deal with laying out changes. I certainly am not involved in the CSS which would be involved (though I think that this would largely be just that: CSS and nothing more). But this is what the rest of the world does with searching. Further, if searches cannot reliably return pages with all search terms in them (or at least cannot place them as the first result), maybe it would be worth considering.

Right now the reward system on stackoverflow is geared very much towards asking and answering questions. If you want stackoverflow to become more of a wiki then you need to adjust the rewards accordingly.

Personally I actually like that stackoverflow is a river of questions. If a topic requires a single collaboratively edited webpage then it belongs on wikipedia. I see and use stackoverflow as more of a conversational site and as in real life not every conversation has to be wholly unique to be interesting.

The way I see it duplicates also provide some value in and of themselves. The questioner gets a specific answer and the answerer gets experience in writing a complete solution and both parties get feedback on how well they did.

Unfortunately, my experience with Stack Overflow (and all other Q/A sites too, including Expert Exchange, MSDN Forums, etc.) is that the hard questions never get answered. The easiest thermometer of “will this question get an accurate answer?” is, “have I been able to find the answer with a search engine?” If not, you probably won’t get an answer on a Q/A site. At least, that’s been my experience. In fact, one of my principles is that whenever I discover a solution to a problem that I could not get an answer to on the Q/A sites (including Stack Overflow), I answer my own question there and mark it as “answer”, and then blog about it as well. This is my contribution to the world in this regard. :)

There’s an ongoing Catch-22. When you are new to a problem domain, you lack the background (in particular, the vocabulary) to search effectively.

It’s very hard to tell which search results are relevant, because you’re not entirely sure what the search terms mean in the first place. In a world of jargon and domain specific language, you can’t effectively read or write and so the archives are (largely) inaccessible to you.

So you seek out a person who speaks natural language and can interpret. AKA posting a question.

There will *always* be people who can’t search. Not because they are lazy, but because really… they can’t.

People tend to optimize around what is measured. You created an incentives system that was effective enough at keeping people interested and getting the site where it is today. And I mean that in both the positive and negative way. It has a lot of questions and answers, because that is what people are rewarded for. Now you’ve detected a flaw in the reward system – this post sounds like you’ve realized you need to fix it (but aren’t exactly saying that).

It’s just the way incentives work now: for an average contributor, answering lots of easy but popular questions that were already answered dozens of times before is a much safer strategy for earning reputation.

There is no penalty for asking a question whose answer already exists on the site; on the contrary, the more popular the question, the easier it is to get reputation for asking it.

I don’t think that writing a blog entry will change the situation. The incentives system needs to be adjusted to reward the kind of behaviour site founders expect.

This might not pertain exactly to the problem that this post is discussing but I’d like to share my thoughts.

Problem with Stack Overflow is that you need high reputation points to edit. Way too high. I often see answers that can be improved through my quick edits but I can’t, so I leave.

Here is what typically happens:

1. I have a problem
2. I search google
3. I click on a Stack Overflow link
4. It either solves my problem or at least points me in the right direction.
5. I finally solve my problem and gain a knowledge that I can contribute.
6. I come back to the site to add my knowledge but all I can do is add another answer.

At this point, Stack Overflow becomes just another Q&A forum for me. So, I leave.

Whereas Wikipedia just lets me edit. Although I rarely create new content, I constantly make good edits. This has been the biggest turn off for Stack Overflow for me.

Stack Overflow advertises all these great features but they’re never available to me. I can’t even vote down from the start. This high barrier has demotivated me from creating useful content for Stack Overflow.

Maybe I’m a special case, but if Stack Overflow is about cultivating the long tail shouldn’t the barrier of entry be low?

Those are my *exact* experiences with Stack Overflow as well! It’s too hard to get to the point where I can make meaningful contributions in a way that doesn’t involve answering lame questions where “RTFM” or “learn to use a search engine” would not be valid answers. I can’t even knock down rep for people who give “answers” to my questions where they clearly didn’t read the question.

I’ve basically stopped paying attention to Stack Overflow because, at least in the Perl section, it’s been overrun by people who have realized that there are easy answers to get with almost no work. There are a lot of good people answering questions, but Stack Overflow is now chewing through them just like Usenet and Wikipedia did. The consensus system doesn’t work. Someone needs to be the boss.

I think Stack Overflow had the potential to make the internet better, but now its noise level is equalizing with all the other sources. The system of incentives is skewed in the wrong direction. I agree with much of what George proposes.

Despite Joel commenting “Once the asker gets their original question asked,”, that’s not how it is. I have a pretty high edit count, and there has always been pressure to not edit. I agree with Joel on how it should be, but in practice that’s not how it has been and how people think of “their questions” or “their answers”.

This is fine as long as the generic question/answer is adaptable enough to be easily understood by the person wanting the specific answer.

What to do with the blue wire may be easily answered by what to do with a five wire hookup on system x, but it doesn’t really help someone looking for what to do when they can’t find the blue wire, unless the topic also addresses missing wires (and covers differences in case if there are any).

Those arguing for “remove reputation from dupes” have no idea of the Frankenstein monster you’d be creating.

That would be an awesome rep-denial tool for griefing others, first of all.

Second, it creates a MASSIVE disincentive to participation — even if you had no idea there were duplicates, you could be bushwhacked at any time even if you contributed a *fantastic* answer. Oops sorry dupe! Zero rep, buddy! Bzzzzt! Go away! Take your awesome answer elsewhere!

Incentives are fine, but penalties are an incredibly bad idea and would actively hurt users and the network.

I was a little miffed by the comparison of stack overflow to a USENET FAQ, so…

I spent the afternoon prototyping a gnugol engine that could parse the json output that stackoverflow provides into a format that I can deal with. I got it sort of working a few minutes ago. Screenshot:

This was on a monster 3 screen beast, I have about 1/2 as much screen space available now.

Since I’m down from 3 screens to two, at present, I hope you’ll understand why I find it difficult to participate in the modern internet to the extent that I did in the USENET days, and I find interacting on most forums deucedly timewasting and inconvenient. USENET at least had kill files and scoring – built in and invisible rather than up front, on every page.

Some asides:

+ your json api is well documented. Thank you.
- There seems to be no way to go from a question to an answer in a single query. It would be nice if the API would supply both the text of the question and the recommended answer in one go.
- The search titles interface to the json api takes 7-10 seconds to complete. Searching google with a site:stackoverflow.com restriction takes 384ms and yields a screen-full of more relevant results.
+ Your search (not json) engine interface uses an opensearch API
- The fastest I’ve seen a result with that API took 709ms.

(I am based in colorado if that helps your speed of light calculation)

- Searching titles with json or using your opensearch interface is nowhere near as effective as searching google with a site restriction of your site.

Again, it’s the EXTRA distance stackoverflow invokes between a question and an answer that’s the problem here.

It might help (others) somewhat if the question(s) and (best) answers were actually on the same page on the search results, perhaps folded, using javascript.

- from reading the api, I don’t see support for vote up/down built in. (then again, my eyes hath glazed over)

- Stackoverflow UI could take better advantage of the wider screens now available as there is a lot of whitespace in the second screenshot.

Please note that I would not make these comments unless I actually liked using stack overflow. I do, I just can’t stand the UI.

Old school FAQs and usenet still have their place, and I would like it if it were possible to meet somewhere closer to the middle between these two extremes.

I have a tendency to agree. Someone who gives a “go look here” answer should be rewarded, because the answer is correct. Someone who essentially rewrites the “go look here” answer should get a LOT less of a reward, though, because now they’re just duplicating content. Why *should* they be rewarded for repeating what has already been said? Someone giving a unique answer to a duplicate question should get the biggest reward of the three, because they’ve added a lot of value.

The solution to the duplicate post issue, as many have hinted at (including Joel’s original post) or stated outright, is better duplicate detection by the system at the time of post, and better search functionality.

But there has to be something more to the solution. Every time I’ve used Stack Overflow (and again, this is not unique to Stack Overflow… the only similar place that ever gets better than “no results” is MSDN Forums, because I can get an MVP or Microsoft employee to answer, and play the “Gold Partner Card” as needed), I could not get an answer to any question where the answer did not exist on a search engine. Period, end of story. If Stack Overflow can’t answer a question better than Google, Bing, or Yahoo!, what’s the point? I get the feeling like the only questions that truly get answered are the ones where it’s an obvious answer or a top 10 search results answer.

Sorry, but I think suggesting people should edit questions significantly (by e.g. making them more general) just to ‘make the internet (or SO) better’ is a terrible idea. If I had asked a bad question, I would rather it be closed than be morphed into someone else’s interpretation of the right thing to ask. This happens sometimes already in the form of smart-arsed answerers telling people they are asking the wrong question, and then going on to sidestep the question completely. Unless it’s complete crap, the question should belong to the asker.

I suppose I would advocate continued ‘education’ for question askers. Oh, and for dupes – please improve the search engine on SO! It’s not very good at the moment. I almost always need to use Google (why not just use Google internally?)

Yes, IMHO you need to disincentivize participation in duplicate questions. Why should I search for duplicate questions and try to close them when people can answer them and earn reps? Where is the incentive to follow good behaviour here? Am I missing something?

If you are answering a possible duplicate question you should have the “I’m walking on a mine field” feeling.

I much prefer incentives to disincentives – my suggestion for shooting down in flames would be some sort of award or page ranking to rejuvenate the older quality responses. There have been times where I’ve posted a reply where I’ve referenced and linked an earlier Q&A because of its value. Would there be some way to recognize the value of the earlier replies from the number of links or references to the Q&A? Recognition in the form of rep points, bump its popularity in some way, either higher rankings in the search or related searches etc.

Disincentivising answering duplicates isn’t a great idea, not all duplicates are easy to find and quite often they’re found because a user recalls seeing the same question before – obviously that doesn’t happen right away all the time so some answers are posted before the dupe is found. In that case, the user deserves to keep their reputation if they’ve taken the time to throw in a good answer.

The problem I have with the system is that people often ignore the close votes when they have enough rep to agree with them – especially if they’ve already “invested” an answer in the question. I have a hard time believing they don’t see it when they’re actively involved in answering comments and updating their answers whilst conveniently not noticing the “Possible duplicate of…” comments.

The idea of incentivising finding duplicates has been tossed around and I’m surprised that nothing has been put in place, considering this blog post appears to be encouraging this behavior.

I think a part of the problem is that SO lacks support for reusing answers. Often, these “duplicates” are really different questions but with overlapping answers.

But our only tools are either “close entirely, and redirect to an existing answer”, or “leave open, and duplicate a lot of information”.

What if there was a middle road, where we could write new answers to new questions, and it was just made easier to create and refer to small “factoids” too small to constitute a full answer, but describing a single piece of commonly used information?

Then when I come across a question where I know the answer overlaps another common answer, I could write a thin answer “shell” providing a bit of context and relating it to the OP’s problem, and then just drop in a reference to the canonical description of whatever the OP needs to know.

Just thinking out loud here.

But whatever the solution, I think a big part of the problem is that we lack flexible tools for dealing with “non-duplicate questions with duplicate or overlapping answers”, or more generally, a way to reuse answers (or pieces of answers) we already wrote.

Refactoring tools for answers, I guess is a good way to summarize my comment above.

We’re programmers, it seems like this should be an obvious idea, doesn’t it? When we keep having to write the same information in our answers, allow us to factor it out so it is only stored once, and can be maintained easily, but while allowing us to refer to it from all our different answers.
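To make the “refactoring for answers” idea a little more tangible, here is a minimal sketch in Python. Everything in it is hypothetical: the `{{factoid:name}}` placeholder syntax, the `render` function, and the sample factoid are invented for illustration and do not correspond to any existing Stack Overflow feature.

```python
import re

# Hypothetical placeholder syntax: {{factoid:some-name}}
PLACEHOLDER = re.compile(r"\{\{factoid:([a-z0-9-]+)\}\}")


def render(answer_text: str, factoids: dict) -> str:
    """Expand factoid references so shared text is stored and edited once."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown references untouched rather than guessing.
        return factoids.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, answer_text)


factoids = {
    "cidr-prefix": "In a.b.c.d/e, e is the number of leading bits that "
                   "identify the network.",
}

answer = "Your address is 128.0.1.1 with a 24-bit prefix. {{factoid:cidr-prefix}}"
print(render(answer, factoids))
```

The thin answer “shell” carries the context for one asker, while the factoid is maintained in a single place and picked up by every answer that refers to it.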

Part of the problem, as I see it, is that people asking the question may not be asking the question the same way it was previously asked. If you don’t use the right wording, or something close to it, you won’t find it, so it will be asked again and again and again.

This is especially true when trying to solve a coding problem, there are always multiple ways of asking the same question, so why not allow for multiple headings?

The way a new programmer asks a question is not at all the same way an expert will. Chances are the newbie doesn’t have the vocabulary to ask it the same way, and isn’t this about teaching?

> but we have to look at reasons why people are asking
> duplicate questions. The main reason is that they can
> not find the same question asked earlier.

That may be one reason, however I suspect that for many they can’t be bothered to look for the earlier answer, no matter how good the search engine is. There are several users who seem to have a scatter gun approach popping off several questions in a matter of minutes. Hoping that the others will do the searching for them.

> And they are supposed to know this is a duplicate, how?
> Perhaps for THAT PERSON, this is the first time they
> have ever seen that question.

You are absolutely right, but some people are answering questions even when they already have one or more close votes, and that’s not fair to the other people who spent time searching for duplicates.
I totally agree that it is a difficult beast to tackle, but since it’s not working at all maybe it could work a little better with a little bit of “fist of justice”.

In my opinion the point is: to deter people from asking duplicate questions, discourage people from answering them by giving incentives to find duplicates instead.

You’re not going to stop people from being lazy and creating duplicates instead of searching, that’s always a factor, and until someone writes some pretty cool code to automatically detect duplicates, that problem will exist.

But I have always found it a nightmare to find answers to some questions only to find out that the answer is there if only you knew the question (how it was originally asked).

Having multiple headings for the same answer will help greatly. And by this I mean you can see all the headings for the same answer listed line by line. As in (another way this question was asked)

This solves many problems:
1. It helps people find the answer.
2. It teaches people better ways of asking the question (new vocabulary).
3. Cuts down on duplicates.
4. Better Internet.
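As a rough illustration of the "multiple headings" idea, the sketch below keeps every phrasing of a question as an alias that points at one canonical entry. The `QuestionIndex` class, its method names and the whitespace/case normalization are all assumptions made for this example, not a description of how the site works.

```python
class QuestionIndex:
    """Toy index: many headings (phrasings) map to one canonical question."""

    def __init__(self):
        self.canonical_of = {}  # normalized heading -> canonical question id
        self.headings = {}      # canonical question id -> every heading seen

    @staticmethod
    def _normalize(title):
        # Assumption: ignoring case and extra whitespace is enough for a demo.
        return " ".join(title.lower().split())

    def add_heading(self, canonical_id, title):
        self.canonical_of[self._normalize(title)] = canonical_id
        self.headings.setdefault(canonical_id, []).append(title)

    def lookup(self, title):
        """Return the canonical question id for a heading, or None."""
        return self.canonical_of.get(self._normalize(title))

    def also_asked_as(self, canonical_id):
        """List every heading recorded for a question, one per line in the UI."""
        return list(self.headings.get(canonical_id, []))


index = QuestionIndex()
index.add_heading(1, "Difference between x++ and ++x")
index.add_heading(1, "What does the increment operator do?")
for heading in index.also_asked_as(1):
    print("Another way this question was asked:", heading)
```

A newcomer who lands on the question through either heading sees the other one too, which is where the "new vocabulary" benefit would come from.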

@systempuntoout: nor is it fair to those who go to the effort of actually providing quality answers, if someone arbitrarily decides “this looks like a duplicate” and votes to close it. It works both ways. And personally, when there is a conflict I tend to side with the people answering questions, over the people *closing* questions.

Seems people might be losing sight of the goal here. Catching duplicates has no intrinsic value. It only matters in so far as it makes SO a better place to find answers to your questions.

If people answer duplicate questions instead of closing them, the only real downside is that we end up with a lot of duplicated information, which we as programmers tend to dislike, and which makes it harder to ensure that all of it is correct and up to date.

Perhaps the most constructive suggestion I can make is that I would like there to be a “short stack overflow”, much as I just outlined on my blog (http://nex-6.taht.net/posts/Screen_Space/), that wouldn’t compete for screen space with all my other applications. It would be kind of cool to have a jabber interface as well; that would make interacting with the site faster and simpler.

The Search on Stack Overflow is not its best feature but I would suggest just using Google (or Bing or others) as the default. I just searched for some key words I’d looked at days ago and Search did not offer any meaningful help – EXCEPT that it suggested using:

site:stackoverflow.com/questions

at Google and that returned terrific results. Why not just do that in the Search box? Take the keywords and pass them to Google with that command line?
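For what it’s worth, handing the keywords to Google with that `site:` restriction is only a few lines. This is a sketch under assumptions: the function name `google_site_search_url` is made up, and it only builds the ordinary public search URL rather than calling any search API.

```python
from urllib.parse import urlencode


def google_site_search_url(keywords, site="stackoverflow.com/questions"):
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"site:{site} {keywords}"
    # urlencode percent-escapes the query, so characters like '+' in "x++"
    # survive the trip instead of being read as spaces.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_site_search_url("difference between x++ and ++x"))
```

The search box could redirect to that URL, or show it as a fallback link when the built-in search comes back empty.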

On the other hand, I’m afraid most people don’t like taking the time to search for existing questions/answers before asking their question. But I think Stack Overflow is a great step forward. Not perfect, but you are getting better.

There are a lot of very good thoughts and observations presented here. I hope they’re all given due consideration.

I’d like to underline two themes that have been true in my experience at stack exchange sites. They’ve both been mentioned before but I think I have something to add.

(a) I’m “guilty” of several duplicate questions. I *always* search before I post, and I conscientiously read through at *least* the first page of results (though admittedly sometimes at speed and without attention to detail). If I ask a dupe it’s because I couldn’t find the pre-existing one. Sometimes it’s because I don’t know the right words, and sometimes it’s because the SO search engine returns too many irrelevant results. It’s never because I’m too lazy to work for it.

It’s a lot easier to be conscientious and thorough now that I have a comfortable amount of reputation under my belt. When you’re a raw newbie, there’s a lot of pressure to up the rep count as quickly as possible in order to participate more meaningfully. The memory of being chastised and downvoted for adding a new answer to a question instead of a comment still stings a bit (I didn’t have enough rep to comment). This means the stratum of users most likely to ask dupes, newbies, is the same stratum most likely to benefit from asking a dupe (a lot of dupes get an upvote or two before the dupe is noticed/discovered). (As an aside, this same pressure for fast upward mobility also contributes to asking questions without thinking about them carefully first.)

(b) Old questions are basically dead to the community of experienced answerers. It’s very hard to revive an old question and get attention focussed on it.

Agree with the many comments about the search. I couldn’t find an answer to a question last week even though I knew it was there, having found it a few days earlier.

Here’s a thought no one has brought up yet:

Answering simple questions earns you quick reputation. Answering difficult questions often earns you very little, because so few people can verify the answer. Most of the simple questions have already been answered and had been answered long before I became active on SO.

Despite knowing this, I have answered a good few difficult questions while I’ve been here answering easy ones. Reputation is not a major driving goal for me, but if I were getting none by answering only difficult questions, I would be less inclined to turn up.

If you take away my ability to earn easy points, you possibly don’t get my participation.

However, I understand the need to keep the site’s archive clean. So why not allow people to mark a question as a duplicate and then later (say, one week after the asker has accepted an answer) remove it or, better still, merge the answer pool somehow?

And reward that high-effort job well. Make it something that the experienced members feel a need to do. Make it more valuable to them than spending time answering questions which have been answered before; let the new members answer those for a first time?

If someone has a reputation of over 100K, 10 points for each upvote on a difficult question is worth nothing to them. But 100 points for taking the effort to merge a marked duplicate with the original might be worth plenty.

Given that we do have multiple questions in the wild, and in some cases many copies have useful answers, would it be possible to add references to other usefully answered questions on the same topic?

At the moment the closest we have is the Related questions, but they don’t seem to have any weighting to questions with highly uprated answers.

Alternatively, we could migrate useful answers from duplicate questions to the canonical question. However that would reduce the richness of the search terms for the question and the answer would not make much sense in the question it was moved to, unless it was given context.

For what it’s worth — you’ve created a great Mechanical-Turk-style general-coding-questions site that really encourages people to answer low-to-mid-level questions.

I mean that sincerely — I realise it’s not your intent — and I think the competition to understand, translate and rephrase questions quickly is actually useful.

I agree with @pdr; I’d suggest keeping it as it is, and rewarding people richly (archivists) for trawling through the duplicates and merging them up *somewhere else* if you want to maintain a clean archive of canonical questions. You’ve almost got two distinct populations of experts; it might be good to make use of that.

Unless the number of duplicates is unmanageable, I really do not understand what the problem is.
There are much bigger (at least IMHO) problems with stackoverflow that prevent me from participating more.
I much prefer the MS forums (http://social.msdn.microsoft.com/Forums/en-US/categories), even if their scope is limited to the Windows and MS specific questions.

1. E-mail notifications: they are only sent for the questions that I asked. How about the questions that I *answered*? How about the threads that I do not participate in (because I cannot add anything of value), but still want to track?
The notifications are far from being real-time. If somebody responds to a topic that I am interested in, let me know *now* so that I can provide real-time feedback.

2. I really do not understand the “no signatures” policy. Yes, I point people to my web site and plug my products. Selling my products is how I make money. Yes, I want people to see my signature when I provide a relevant answer and one of my products might help them. Why is it wrong? The only way for me to get more impressions is to *answer* questions. Isn’t this what everybody wants?

@jalf:
“If people answer duplicate questions instead of closing them, the only real downside is that we end up with a lot of duplicated information, which we as programmers tend to dislike, and which makes it harder to ensure that all of it is correct and up to date.”

I fully agree, except for the word “only”. IMO duplicates _are_ a real problem. The quality of answers depends a lot on the day of the week and time of day (U.S. business hours?) the question was asked (plus a few other factors) and varies wildly. If you have dozens of duplicates for a question, you have dozens of answers voted to the top which are of a wildly varying quality. If the community’s efforts would be channeled into answering one question instead, this one question could have high quality answers.

This would reverse the current situation. Instead of asking questions and praying that someone will answer, why not have people answer them before it gets asked?

Reputation and all that can work the same, you can even force the same format that most recipe books have. A problem statement, a solution, and a short discussion. Then allow people to comment/edit it if something is not clear.

Something like this would have infinitely more value than the current stack exchange setup.

That is easier said than done. Think about it for a minute: how are you going to write up an FYI for a specific problem that involves various technologies?

It’s really not that easy so an FYI is not the answer to this challenge. Also think about the number of people that will come to a site like this and look through a “general” fyi list, surely it’s not going to cover even 50% of the questions asked.

I have two problems with the Stack Overflow model.
Before I post I look for dups.
When I search and find 5 pages of questions, I am not going to read 5 pages. 1 page yes, but not 5.

If it’s a hard question (normally I have spent many hours on it before posting), there might be few answers. It’s possible none are the answer. In a day or so the question is essentially dead. No one bothers to answer it.

Now, it’s a fantastic system. I am stunned how fast I can get answers, which may be why there are so many dups. It’s still early and I am optimistic these problems can be solved. PLEASE keep up the good work.

IMO, a little structure can’t hurt and could even improve the search results. Why not create a new question that links to similar questions? Add a collapsible element to each question that contains forward links to related questions and also an automatically generated element that contains links to referring questions. These lists would be sortable by time, score, etc. There’s no need to delete any question unless it’s just a bad question. Over time the best questions with the best answers will percolate to the top and bad questions will sink to the bottom toward negative infinity, certainly below any reasonable noise floor.
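The "automatically generated element that contains links to referring questions" is essentially an inverted index over the forward links. A small sketch, assuming questions are identified by integer ids and that a score per question is available (both assumptions; `referring_questions` and `sorted_links` are hypothetical names):

```python
from collections import defaultdict


def referring_questions(forward_links):
    """Invert {question: [questions it links to]} into
    {question: [questions that link to it]}."""
    back = defaultdict(list)
    for source, targets in forward_links.items():
        for target in targets:
            back[target].append(source)
    return dict(back)


def sorted_links(question_ids, scores):
    """Order a list of linked questions by score, best first."""
    return sorted(question_ids, key=lambda qid: scores.get(qid, 0), reverse=True)


# Questions 10 and 11 were written later and link forward to question 1.
forward = {10: [1], 11: [1, 2], 12: []}
scores = {10: 3, 11: 7}

back = referring_questions(forward)
print(sorted_links(back[1], scores))
```

Sorting by time instead of score would only mean passing a different mapping to `sorted_links`.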

-> If a person wants the question answered quickly, then the best way is to *look for duplicates* before asking.

I think often the best way is the other way round: write the question, as that helps you understand your problem. Post the question, then look for duplicates and *vote to close your own question* if you find any good duplicates.

Not to harp on the same issue, but search-ability remains a major problem. This problem: “Diffrence between x++ and ++x” is an exact duplicate of any number of stackoverflow questions, but if you put the title into search, you get no results. Now, if you were a newbie, how would you know how to phrase this question so that you could find the duplicates? ++x, x++ do not show anything. Is it fair to expect newbies to know the name of incrementers?

What about ternary operators? I know that there are a number of duplicates there too. How would a newbie know that?

Every time I see a duplicate where someone has literally copied my answer from another question (instead of voting to close as duplicate), I die a little inside.

I work very hard to write very good answers and, though this may sound vain, I like getting credit for my hard work. To see someone lift my answers (sometimes with attribution) and receive reputation for it, sometimes more than my original answer, is frustrating for me.

I realize that you don’t want to disincentivise, but I can’t help but feel this is the result of not having a better MERGE operation. If questions that were closed as duplicates were able to merge answers (in some fashion), then people would be less likely to lift answers on to every new duplicate.

Now I realize discussing a MERGE operation is no easy feat, but I do think it could be accomplished as long as you also provide for a FORK; this would make for easier unwinding of MERGEs and even allow for splitting of questions with disparate parts.
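A toy model of why MERGE and FORK belong together: if the merge records which answers came from which question, the fork has what it needs to unwind it. This is only a sketch; the `Question` class and the `merge` and `fork` functions are hypothetical, and a real implementation would also have to deal with votes, comments and accepted answers.

```python
class Question:
    def __init__(self, qid, answers):
        self.qid = qid
        self.answers = list(answers)  # answer ids, in display order
        self.merged_from = {}         # source question id -> answers it brought


def merge(target, source):
    """Move source's answers into target, remembering where they came from."""
    target.merged_from[source.qid] = list(source.answers)
    target.answers.extend(source.answers)
    source.answers = []


def fork(target, source):
    """Undo a merge: hand the recorded answers back to the source question."""
    moved = target.merged_from.pop(source.qid)
    target.answers = [a for a in target.answers if a not in moved]
    source.answers = moved


original = Question(1, ["a1"])
duplicate = Question(2, ["b1", "b2"])

merge(original, duplicate)
print(original.answers)   # answers from both questions now live on the original

fork(original, duplicate)
print(duplicate.answers)  # the duplicate has its own answers back
```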

Unfortunately, as it stands it feels as though people are disincentivised to close as duplicate, and that is harmful to YE OLDE TIMERS.

So many people mentioned how to prevent dross by improving search. And I think SO has the ability to search better than the current one (or Google site:stackoverflow.com).

My idea is that SO can run the search once users have typed in their detailed question and are ready to submit it.

With the old way of searching, users can only input a few words. It’s hard to match the existing questions using just a few words. Also, one thing can be described in many different ways.
But if users are going to submit a question, they must input a lot of detailed description of it.
The search engine can get better results using these detailed descriptions, then show the results before submission.
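Here is a minimal sketch of that "search with the whole draft before submit" step, using plain bag-of-words cosine similarity. It is an illustration, not how SO’s search works: the tokenizer, the `likely_duplicates` function and the 0.3 threshold are all assumptions, and a real system would want something smarter (stemming, term weighting and so on).

```python
import math
import re
from collections import Counter


def tokens(text):
    # Keep '+', '#' and '-' inside tokens so that "x++" or "c#" are not lost.
    return re.findall(r"[a-z0-9_+#\-]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def likely_duplicates(draft, existing, threshold=0.3, limit=5):
    """Return ids of existing questions similar to the full draft, best first.

    `existing` maps question id -> question text. The threshold is an
    arbitrary value picked for this example.
    """
    draft_vec = Counter(tokens(draft))
    scored = [
        (cosine(draft_vec, Counter(tokens(body))), qid)
        for qid, body in existing.items()
    ]
    hits = [(score, qid) for score, qid in scored if score >= threshold]
    return [qid for score, qid in sorted(hits, reverse=True)[:limit]]


existing = {
    1: "What is the difference between x++ and ++x in a for loop?",
    2: "How do I center a div with CSS?",
}
draft = "In my for loop, is there any difference between ++x and x++?"
print(likely_duplicates(draft, existing))
```

Because the whole draft is used rather than a few keywords, two differently worded questions can still overlap heavily in their bodies, which is the point being made above.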

I agree with the general principle of not polluting the internet with huge amounts of duplicate Q&A (by “duplicate” I mean the same “root cause” and same “fix”) by closing duplicates and generalizing canonical answers.
The problem is that you have to get people to do this smartly, which is hard, or it might hurt the newcomers who don’t even have the skills to ask a ‘good’ question yet.
A couple of ways of achieving this:
a) Only allow smart people to do this. I can’t think of a way to implement this.
b) If people do it unsmartly, provide a way to detect and fix it. It might help to allow the question opener to give a thumbs up/down on the edit and count this toward the editor’s credit.
c) Only allow edits (marking as duplicate or generalizing) after a certain period, e.g. one month. That way it won’t hurt the newbies and lets them get an immediate answer to what they need. And it’s OK to leave the jam questions searchable on Google for a month, as they will go away eventually.

You want us to help make the Stack Overflow wiki better. That’s great; we love what you have built and it’s super useful, BUT… if you want us to take part in the effort, help us in return: make the Stack Overflow knowledge database freely accessible! Not just by HTTP but by a simple download of the whole context, much like Wikipedia allows people to download its database. Give us the option of setting up an internal Stack Overflow site (release the code!). Help the community.

After using the SE sites for a while and gaining quite some reputation on some of them, I have found two major flaws in the system and wonder how they can be addressed.

1) How does the system handle it when a wrong answer has been selected? In this case, right answers can get more votes, but the wrong one is still selected and shows up first. By wrong I mean objectively non-optimal when others are objectively more optimal (we are programmers, after all). This can be because a neophyte asked and eagerly accepted the first answer, which was good enough, and is no longer participating and so never notices the other answers. Or perhaps the question has been edited to become more general, duplicate (or near-duplicate) questions point here, and more general answers are submitted. Again, the original asker is not involved, and therefore the selected “right” answer does not get updated. How does the SE system handle this?

2) The second major problem is somewhat related: staleness. Ask a general question, it is answered for OS X 10.5. A while later 10.6 comes out, the question is still valid, but the answer is now totally wrong, perhaps even dangerous on the new system. There is a preference for new questions to be active, gain votes, gain answers, etc. Hence the issue of staleness. I can edit the question to make it clear it was for a 10.5 system, edit the ‘best’ answer with 10s of votes to make it mention it is a 10.5 solution, add a 10.6 solution, but the 10.6 solution will just not gain much notice because the original question is a year old.

People might google the Q, come across the A, read the 10.6 stuff as that is relevant, but not being part of the community, they’ll just use the info and go on their way, not bothering to vote what is now the ‘more correct’ (as more people are now running 10.6 than 10.5) answer up. It is stale.

I think Eric Zhao is on the right track – search for their answer and present them with the results before they can actually submit.
Additionally all the original questions and edited questions are a gold mine for making the system smarter about finding duplicates and directing anyone who does not know “how” to ask the question to the right answer. If I ask a question about “x++, ++x” and similar questions exist that are closed as duplicates then point me to the right answer about using increment operators anyway. Get the system to learn the way real people ask questions by mining the questions people ask, or at least searching all the questions asked (before any editing) and returning corrected results.

One problem that I have had with SO is simply those smart alecs trying to point out that the question is wrong and this would be a wrong feature to build and point me to a remotely similar question (that uses a different DB server and different SQL dialect, but is “otherwise just the same question”).