Breaking up phrases does not seem like such a good idea. Phrases like "strongly typed" get broken up into "strongly" and "typed", "ad hoc" into "ad" and "hoc", "very wonderful" into "very" and "wonderful", and so on. But what's the use of "strongly", "ad" or "very" as tag words?! Either restrict entries to single words, or keep the words of a phrase together.

The fact is that I originally made the mistake (or is it one?) of allowing people to use "short sentences" in addition to single words.

At this stage (> 150 entries), not breaking up short sentences 1) gives very bad visual results and 2) reduces the overall frequency of the most common words (tag clouds are no longer interesting in such a scenario).

So, I'm still looking for a solution here. I don't like the idea of requiring single words only, and I doubt people will actually comply (unless the form forces them to, of course). What about filtering out adverbs, coordinating conjunctions and the like?

I would do two things. For a specific language I would keep phrases together, maybe breaking them up on conjunctions, even if the visual entries get bigger that way. For me, color and spatial orientation do enough to keep things readable and interesting, while nonsensical entries annoy.
For the overall frequency I would filter much more (adverbs etc.) to restrict it to only the essential words, even if that incurs some false negatives.
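
To make the two suggestions concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the word lists are tiny and hand-written, the function names (phrase_tags, essential_words, tag_counts) are invented, and entries are assumed to be plain whitespace-separated text without punctuation. A real filter would probably need a proper stop-word list or a part-of-speech tagger.

```python
from collections import Counter

# Hypothetical, deliberately tiny word lists, for illustration only.
CONJUNCTIONS = {"and", "or", "but"}
FILLER_WORDS = {"very", "strongly", "quite", "really", "the", "a", "an"}


def phrase_tags(entry):
    """Split an entry on conjunctions only, keeping each phrase together."""
    phrases, current = [], []
    for word in entry.lower().split():
        if word in CONJUNCTIONS:
            # A conjunction ends the current phrase and is itself dropped.
            if current:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


def essential_words(entry):
    """Split an entry into single words, dropping conjunctions and fillers."""
    skipped = CONJUNCTIONS | FILLER_WORDS
    return [word for word in entry.lower().split() if word not in skipped]


def tag_counts(entries, tagger):
    """Count the tags produced by `tagger` over all entries."""
    counts = Counter()
    for entry in entries:
        counts.update(tagger(entry))
    return counts
```

With phrase_tags, an entry such as "strongly typed and ad hoc" yields the two tags "strongly typed" and "ad hoc", while essential_words turns "strongly typed and very wonderful" into "typed" and "wonderful". Note that essential_words still splits "ad hoc" into "ad" and "hoc", so some nonsensical entries can remain in the overall count unless the word list grows.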