There will always be fringe cases and judgment calls. If a proposed change is that controversial, it could be discussed here (meta.SO), where appropriate.

Maintaining a completely sanitized tag cloud is somewhere between difficult and impossible, and I'm not suggesting that. But there might be a place for a bit of routine maintenance and sanity checking. This is best done on an ongoing basis rather than all at once.

I think a moderator should be able to mark one tag as a duplicate of another, and have the engine automatically retag every post that uses the duplicate so that it uses the correct tag.

Given the dangers of a mistake in merging tags, I'd suggest taking (stealing!) a technique from the genealogists of the Mormon church ... yes, really!

When they find that records of two people are actually records of the same person, they don't merge those into a single record, just in case there's a mistake. Instead, each is tagged with the ID of the other, and it's a presentation issue to merge the details together.

Then, if an error is discovered later on - even years later - the link can be removed and the data cleanly split apart.

Therefore, I propose that people with sufficiently high reputation be able to define "tag synonyms" that get merged in the presentation layer.

For example, you could define visual-studio and visual+studio as synonyms of visualstudio, with the effect that only visualstudio would appear to the user. The other two forms would be silently converted.

Reversing an inappropriate merge would be easy: just delete the synonym link.
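
As a rough illustration of the synonym idea, here is a minimal Python sketch. The `TagSynonyms` class and its method names are hypothetical, not anything the site actually implements. The point it tries to show is that posts keep the tags they were given, the merge happens only when tags are displayed, and undoing a merge is just removing a link.

```python
class TagSynonyms:
    """Hypothetical synonym table; stored post tags are never rewritten."""

    def __init__(self):
        self._canonical = {}  # synonym -> canonical tag

    def link(self, synonym, canonical):
        self._canonical[synonym] = canonical

    def unlink(self, synonym):
        # Reversing a bad merge is just deleting the link.
        self._canonical.pop(synonym, None)

    def display(self, tags):
        # Map each stored tag to its canonical form, dropping duplicates
        # while keeping first-seen order.
        shown = []
        for tag in tags:
            canonical = self._canonical.get(tag, tag)
            if canonical not in shown:
                shown.append(canonical)
        return shown


synonyms = TagSynonyms()
synonyms.link("visual-studio", "visualstudio")
synonyms.link("visual+studio", "visualstudio")

stored = ["visual-studio", "c#", "visual+studio"]  # what the post really has
merged_view = synonyms.display(stored)

synonyms.unlink("visual+studio")                   # undo one merge
split_view = synonyms.display(stored)
```

Because `stored` is never modified, removing the link brings the original tag back with no data lost.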

The major challenge with this approach is making it sufficiently performant ... but there are ways to achieve that.

I like the idea of synonyms. I don't expect them to get too crazy about this here, but out of personal interest... it would be cool if, when I tried to create a tag (say, c#.net), the system would tell me that it is a known synonym and would be mapped to the tag "c#"... or, no, maybe the other way around... if I searched a tag (say, c#), the system would tell me it has synonyms (c#2.0, c#3.0, c#.net, etc.) which would be included. And maybe I would be able to individually deselect them if I wanted. Hmm....
–
Robert Cartaino♦ Jul 1 '09 at 14:14

One thing to be aware of is that synonyms are not transitive. In other words, if "a" means the same as "b", and "b" means the same as "c", it does not necessarily follow that "a" means the same as "c". Wolfram Alpha encountered this with the words "black", "dim", and "dumb", and made the connection that "black" and "dumb" meant the same thing. blog.wolframalpha.com/2009/05/24/…
–
Kyle Cronin, Jul 1 '09 at 15:26
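
To make the transitivity point concrete, here is a small Python sketch using the "a", "b", "c" example from the comment above. Both function names are made up for illustration. It contrasts looking up only explicitly linked tags with following links in a chain, which is the behaviour to be wary of.

```python
# Pairwise synonym links: "a" ~ "b" and "b" ~ "c".
links = {("a", "b"), ("b", "c")}


def direct_synonyms(tag):
    """Only tags explicitly linked to `tag` (no chaining)."""
    found = set()
    for left, right in links:
        if left == tag:
            found.add(right)
        elif right == tag:
            found.add(left)
    return found


def chained_synonyms(tag):
    """Follows links transitively, so unrelated tags can get pulled in."""
    reached, frontier = set(), [tag]
    while frontier:
        current = frontier.pop()
        for neighbour in direct_synonyms(current):
            if neighbour != tag and neighbour not in reached:
                reached.add(neighbour)
                frontier.append(neighbour)
    return reached


direct = direct_synonyms("a")    # {"b"}
chained = chained_synonyms("a")  # {"b", "c"}
```

With direct lookup, "a" is only ever treated as "b". With chaining, "a" silently becomes a synonym of "c" even though nobody defined that link.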

@rcar Having auto-magic suggestions for related (relevant) tags could be really useful. I suggest you raise this as a separate meta.stackoverflow question so that it's more visible than a comment buried here.
–
Bevan, Jul 1 '09 at 19:57

@Kyle - you raise a good point about synonyms not being transitive. I guess they're not always reversible either. Having them defined one way would raise some interesting capabilities - to search for [C#] and find [C#3] and [C#4] as well, for example.
–
Bevan, Jul 1 '09 at 19:59
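
A one-way definition like the one described in the comment above could look roughly like this in Python. The `INCLUDES` table and `expand_search` function are hypothetical names used only for this sketch: a broad tag pulls in narrower ones when searching, but not the other way round.

```python
# Hypothetical one-way mapping: a broad tag includes narrower tags.
INCLUDES = {"c#": ["c#3", "c#4"]}


def expand_search(tag):
    """Tags that a search for `tag` would match under one-way synonyms."""
    return [tag] + INCLUDES.get(tag, [])


broad = expand_search("c#")    # also finds the version-specific tags
narrow = expand_search("c#3")  # stays specific
```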

Like Bill the Lizard mentioned, automatic changing of tags can be very problematic. I think that re-tagging really needs to be a manual effort because you have to read the post to figure out what the tags should be.

However, the re-tagging process needs some serious streamlining. I occasionally try to do some re-tagging because I have slight OCD when it comes to organization of things, but it becomes incredibly tedious in a short amount of time.

What we need is a system that will support re-tagging in batches:

Step 1: Select Source tag
Step 2: Select Target tag
Step 3: System shows you posts with the Source tag one at a time. For each post you say "Source", "Target", "Other", or "Remove" to choose whether to leave the Source tag alone, convert it to the Target tag, convert it to some other tag, or remove the tag entirely.

This would make it very easy to convert tags but allow manual confirmation that the tag should be converted.
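
The three steps above can be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the post layout (a dict with a `tags` list), the `batch_retag` name, and the `decide` callback, which stands in for a person reading each post and picking an option.

```python
def batch_retag(posts, source, target, decide):
    """Walk posts tagged `source`. `decide(post)` returns "source",
    "target", "remove", or a tuple ("other", new_tag)."""
    for post in posts:
        if source not in post["tags"]:
            continue
        choice = decide(post)
        if choice == "source":
            continue  # leave the source tag alone
        tags = [t for t in post["tags"] if t != source]
        if choice == "target":
            new_tag = target
        elif choice == "remove":
            new_tag = None
        else:
            _, new_tag = choice  # ("other", "some-tag")
        if new_tag is not None and new_tag not in tags:
            tags.append(new_tag)
        post["tags"] = tags
    return posts


posts = [
    {"title": "Query is slow in Access", "tags": ["access", "sql"]},
    {"title": "Slow disk access on Linux", "tags": ["access", "linux"]},
    {"title": "List comprehension question", "tags": ["python"]},
]


def decide(post):
    # Stand-in for the human decision made after reading the post.
    if "Query" in post["title"]:
        return "target"
    return ("other", "disk")  # "disk" is a made-up tag for the example


batch_retag(posts, "access", "ms-access", decide)
```

The system only does the bookkeeping; the judgment about each post still comes from whoever answers the prompt.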

This can do more harm than good. I once took it upon myself to change a bunch of posts tagged 'access' to 'ms-access' since the latter was more popular by 10:1. Once I had retagged a couple dozen posts I ran across one that was asking about disk access and clearly had nothing to do with MS-Access the application. I then had to go back through and check each one I had already changed to make sure I hadn't mis-tagged anything (I had). If retagging had been easier, it might have been some time before my mistake had been discovered.

I've said it elsewhere, but a lot of tag duplication happens because of things like dashes and punctuation. For instance, you will have vb.net and vbdotnet, or vb6 and vb-6. Such errors could easily be fixed on the fly. Even if you had an admin maintain a lookup table, say, one day a month, I think you would see drastic improvement. You could also have a Google-like "did you mean X?".
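
Here is a minimal Python sketch of both ideas, the punctuation lookup and the "did you mean" prompt. The `KNOWN_TAGS` list and the function names are assumptions for the example; the fuzzy matching uses `difflib.get_close_matches` from the standard library, and the cutoff value is an arbitrary choice.

```python
import difflib

KNOWN_TAGS = ["vb.net", "vb6", "c#"]  # sample data, not the real tag list


def key(tag):
    # Comparison key only, never shown to users: lower-case,
    # spell out dots, drop dashes and spaces.
    return tag.lower().replace(".", "dot").replace("-", "").replace(" ", "")


KEY_TO_TAG = {key(t): t for t in KNOWN_TAGS}


def canonical(tag):
    """Return the known tag that differs only by punctuation, or None."""
    return KEY_TO_TAG.get(key(tag))


def did_you_mean(tag):
    """Suggest the closest known tags for an unrecognized tag."""
    return difflib.get_close_matches(tag, KNOWN_TAGS, n=3, cutoff=0.6)


fixed_dotnet = canonical("vbdotnet")  # maps to "vb.net"
fixed_dash = canonical("vb-6")        # maps to "vb6"
suggestions = did_you_mean("vb.nte")  # typo, so fall back to suggestions
```

The lookup handles the mechanical variants automatically, while the suggestion list leaves the final choice to the person typing the tag.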

I would object to trying to clean up tags that have become obsolete, or irrelevant. I believe it would be too difficult to determine what technologies are no longer relevant. Perhaps with beta versions, it might be safe. But, outside of that, many people and companies are stuck, for various reasons, with technologies that perhaps should be obsolete.

That said, I'm sure a lot of the tags that have only been used once could be cleaned up, and that might trim things down quite a bit. By my search, pages 190-332 of tags contain tags that have only one associated question; even just removing the bulk of those would help.
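
Finding the single-use tags is simple to sketch in Python. The post layout (a dict with a `tags` list) and the function name are assumptions made for this example, and the sample data is invented.

```python
from collections import Counter


def single_use_tags(posts):
    """Tags that appear on exactly one post."""
    counts = Counter(tag for post in posts for tag in set(post["tags"]))
    return sorted(tag for tag, n in counts.items() if n == 1)


sample_posts = [
    {"tags": ["c#", "visualstudio"]},
    {"tags": ["c#", "vb-6"]},
    {"tags": ["visualstudio", "gae"]},
]
candidates = single_use_tags(sample_posts)
```

A list like `candidates` would only be a starting point for review, since a tag used once is not necessarily a bad tag.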

Tag cleanup is good in principle. The last tag I cleaned up was 'gae'. 'google-app-engine' was much more popular, and 14/15 of the 'gae' questions were also tagged with 'google-app-engine'. Well, I got hit with CAPTCHA seven times in that little stint of 15 retags. That is a serious disincentive to do any kind of mass edit.

So I'm not really surprised we've had a proliferation of redundant and pointless tags.
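
The 14/15 figure above suggests a quick check before folding one tag into another: what fraction of posts with the rarer tag already carry the popular one? A rough Python sketch, with a made-up function name and invented sample data shaped to match those numbers:

```python
def overlap_ratio(posts, source, target):
    """Fraction of posts tagged `source` that also carry `target`."""
    with_source = [p for p in posts if source in p["tags"]]
    if not with_source:
        return 0.0
    both = sum(1 for p in with_source if target in p["tags"])
    return both / len(with_source)


# 15 posts tagged 'gae', 14 of which are also tagged 'google-app-engine'.
gae_posts = [{"tags": ["gae", "google-app-engine"]} for _ in range(14)]
gae_posts.append({"tags": ["gae"]})

ratio = overlap_ratio(gae_posts, "gae", "google-app-engine")
```

A high ratio is only a hint that the tags mean the same thing; the remaining posts still need a human look.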

Adding an extra interface for adjusting tags has been proposed before. The SO Team felt that it could do a lot more harm than good, and that even a minor mistake could cause catastrophic problems.

I don't quite see what the title "Needs Tag Editors" means, because retagging is one of the very first moderation abilities given out. So that should mean that there are even more users (and more every day) who gain this ability to retag things as they come along (or go back and attempt to clean up the tags done in the past).

I was suggesting database-level editing where you would rename or combine an entire tag entry. Not on a message-by-message basis. I wouldn't grant this to just anyone who has the current "edit tag" capability.
–
Robert Cartaino♦ Jun 29 '09 at 14:12

Database-level editing is what I am talking about here. Making such massive and sweeping changes could have potentially disastrous effects and would undoubtedly drag along some questions that did not need to be changed in that way.
–
TheTXI, Jun 29 '09 at 14:21

Gotcha. I was clarifying what "Needs Tag [Database] Editors" was referring to.
–
Robert Cartaino♦ Jun 29 '09 at 15:30

It seems to me that the biggest service that Rich B et al. provide to the community is their tag updates. Most of these formatting variations get edited to match the common tags. I think you'll notice that a large number of these 23000 tags have only one or two questions.

Basically, given the current level of moderation, I would think this is more or less a non-issue.