Introduction

WordCloud visualizes word frequency: how many times each word is used within a given body of text. It reads plain text, filters out "stop words", counts how many times each remaining word is used, and displays the results in a Squarified Treemap. (In the images above, the larger a node and the more saturated its color, the more frequently the word is used.)
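The read, filter, and count steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: the `WordCounter` class, its `Count` method, and the regular expression used to split words are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the WordCloud source.
public static class WordCounter
{
    // Counts how often each word occurs in text, skipping stop words.
    // Words are compared case-insensitively.
    public static Dictionary<string, int> Count(string text, IEnumerable<string> stopWords)
    {
        var stops = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        // Assumption: a word is a run of letters, optionally with internal apostrophes.
        foreach (Match m in Regex.Matches(text, @"[A-Za-z]+(?:'[A-Za-z]+)*"))
        {
            string word = m.Value;
            if (stops.Contains(word))
                continue;

            int n;
            counts.TryGetValue(word, out n); // n stays 0 for a word not seen before
            counts[word] = n + 1;
        }
        return counts;
    }
}
```

Each entry in the returned dictionary would then become one treemap node, with the count driving the node's size.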

Background

At best, I'm a hobbyist with the technologies used in this example, so I'll defer to the various articles I read that led to creating WordCloud.

The Squarified Treemap

Display is handled by Microsoft's TreemapGenerator, part of the Data Visualization Components suite. While a true treemap utilizes both hierarchical and proportional attributes, WordCloud only uses proportional attributes to show word count.

Drawing Nodes

The treemap uses custom drawing for nodes, which is called from OnPaint.

// We want to do owner drawing, so handle the DrawItem event.
m_TreemapGenerator.DrawItem +=
    new TreemapGenerator.TreemapDrawItemEventHandler(DrawItem);

...

protected override void OnPaint(PaintEventArgs e)
{
    AssertValid();

    // Save the Graphics object so it can be accessed by OnDrawItem().
    m_Graphics = e.Graphics;

    // Tell the TreemapGenerator to draw the treemap using owner-
    // implemented code. This causes the DrawItem event to get fired for
    // each node in the treemap.
    m_TreemapGenerator.Draw(this.ClientRectangle);

    // All DrawItem events have been fired. Make sure the Graphics object
    // doesn't get used again.
    m_Graphics = null;
}

Node rendering is handled in DrawItem(). Within this method we extract the NodeInfo object, get name and count, set color and text size based on count, and then draw the node. Final node result: the greater the count, the larger the text and more saturated the color.
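As a rough sketch of the "set color and text size based on count" step, the helper below maps a count onto a color blend and a font size by linear interpolation. The `NodeStyle` class, its method names, and the choice of linear scaling are assumptions for illustration; the project's DrawItem() may compute these values differently.

```csharp
using System;
using System.Drawing;

// Hypothetical helper for illustration; not part of the WordCloud source.
public static class NodeStyle
{
    // Position of count within [minCount, maxCount], clamped to 0..1.
    // If every word has the same count, all nodes get the full value.
    public static double Fraction(int count, int minCount, int maxCount)
    {
        if (maxCount <= minCount) return 1.0;
        double t = (double)(count - minCount) / (maxCount - minCount);
        return Math.Max(0.0, Math.Min(1.0, t));
    }

    // Blends from the "low" color toward the "high" color as count rises.
    public static Color FillColor(int count, int minCount, int maxCount, Color low, Color high)
    {
        double t = Fraction(count, minCount, maxCount);
        return Color.FromArgb(
            Lerp(low.R, high.R, t),
            Lerp(low.G, high.G, t),
            Lerp(low.B, high.B, t));
    }

    // Scales the font size between minSize and maxSize (in points).
    public static float FontSize(int count, int minCount, int maxCount, float minSize, float maxSize)
    {
        return (float)(minSize + (maxSize - minSize) * Fraction(count, minCount, maxCount));
    }

    private static int Lerp(int a, int b, double t)
    {
        return (int)Math.Round(a + (b - a) * t);
    }
}
```

Inside a DrawItem handler, values like these could feed a SolidBrush and a Font used with the saved Graphics object to fill the node rectangle and draw the word.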

The Demo Application

Controls

Input Text: Paste text from another document into this dialog to visualize it (128k max, but the limit can be changed to your liking)

Stop Words: A dialog allowing you to modify the set of stop words**

Font: A dialog allowing you to set the display font

Node Color: A dialog allowing you to set the gradient colors for node display

Scale Text: Toggle for scaling text relative to count

Show Count: Toggle for showing/hiding word count in nodes**

Minimum word count slider: Dynamically controls how many nodes to display based on word frequency

Save as image: Save the treemap as a GIF image

**NOTE: Document text is not retained in memory; it's parsed, added to the treemap as nodes, and then discarded. The Show Count and Stop Words settings therefore only take effect if set before opening or inputting text; changing them afterward doesn't show/hide node counts or reapply stop words to an existing treemap.
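The minimum word count slider amounts to a threshold filter over the counted words. A minimal sketch, assuming the word counts are kept in a dictionary after parsing (the `NodeFilter` class and `AtLeast` method are hypothetical names, not taken from the project):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the WordCloud source.
public static class NodeFilter
{
    // Returns the words whose count is at least minCount, most frequent first.
    public static List<KeyValuePair<string, int>> AtLeast(IDictionary<string, int> counts, int minCount)
    {
        return counts
            .Where(pair => pair.Value >= minCount)
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }
}
```

Because only the counts are needed, a filter like this can be rerun each time the slider moves without keeping the original document text around.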

Input Data

I've tried various document sizes, ranging from 400 to 6000 words - mostly presidential speeches and the like. In the project, I've included two text files: mlk.txt and kennedy.txt. These are Martin Luther King's "I Have a Dream" address at the March on Washington, August 28, 1963, and former United States President John F. Kennedy's 1961 State of the Union Address - 1,588 and 5,184 words respectively.

Stop words also deserve attention. I've added a default set of stop words, which is user configurable and greatly affects word parsing. The 430 stop words provided are fairly standard and cover a broad range of common words without being too aggressive.

Conclusion

While crude, unoptimized, not web-based, and entry-level at best compared to other tag/word cloud generators, the example could perhaps be a starting point for someone interested in the idea. It may also serve as a basic example of using Microsoft's TreemapGenerator from the Data Visualization Components suite.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Hi,
Your project is very helpful to me. Can I use your code in my university project? Also, can you please tell me how I can show the words in a data grid instead of the tree panel? I don't need to show all words: I have some words in a list box, and I need to show in the data grid how many times each of those words was repeated.

You are free to use the code any way you like, but I don't understand your questions. You want to use a DataGrid control instead of the Treemap? I'm not sure how you want to show the number of words, or weighting, in the data grid. Please explain.

With regards to not showing certain words, do you mean the "stop words"? These are contained in a text file.

The link to Microsoft's Data Visualization Components might not work. If so, you can go to the browse downloads page, click on the page three (3) link, and click "Data Visualization Components" which is the second link on the page.