I got an email yesterday from a government customer that asked a very good question:

It seems like Wordnet might be used to construct synonym lists [for SAS Text Miner] that could map terms "up" to more general synonyms possibly reducing noise and enhancing concept extraction. Has anyone in TM R+D ever considered using Wordnet?

WordNet is a freely available thesaurus/lexical database for English. It contains synsets, or synonym rings, that show all the words related to a given word. Since we allow the user to create synonym lists for SAS Text Miner, it seems reasonable to assume that a generic, free source of a huge list of synonyms might be beneficial. We have in fact looked at WordNet before, but found that the reality does not live up to the expectation. Generic synonym substitution usually turns out to generate worse results than doing nothing at all.
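To make the idea of mapping terms "up" concrete, here is a minimal sketch in Python. It is only an illustration: the synonym list, the function names (map_terms, term_counts) and the sample sentence are all hypothetical, and it does not show the SAS Text Miner synonym list format or use WordNet itself.

```python
from collections import Counter

# Hypothetical, hand-built synonym list: each term maps "up" to a more
# general parent term. This is NOT the SAS Text Miner format.
SYNONYMS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
}


def map_terms(tokens, synonyms):
    """Replace each token with its parent term if listed; otherwise keep it."""
    return [synonyms.get(tok, tok) for tok in tokens]


def term_counts(text, synonyms=None):
    """Count terms in a text, optionally after synonym substitution."""
    tokens = text.lower().split()
    if synonyms:
        tokens = map_terms(tokens, synonyms)
    return Counter(tokens)


doc = "The car hit a truck and the automobile stalled"
print(term_counts(doc))            # car, truck, automobile counted separately
print(term_counts(doc, SYNONYMS))  # all three collapse into "vehicle"
```

With the substitution applied, three separate terms become one term with a count of three. Note that a plain lookup like this replaces a word the same way everywhere it appears, regardless of the context it is used in.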

To see why, we need to look at when synonym substitution is helpful.

First prompts are silent.
Subsequent prompts loud and clear.
Now all prompts are heard.

Poem from R&D staff?
Yes. Rhyming sonnets were Shakespeare-like and complex;
they wrote Japanese haiku, as shown above.

SAS R&D staff should complete some paperwork in the defects system before changing code. They use informal descriptive language (haiku!) in the early stage. Chris Hemedinger, a senior software engineer at SAS, collected some haikus on his blog to show the humorous side of SAS R&D staff. It’s interesting to cite one of the most famous haikus by Matsuo Bashō for comparison:

I read this verse in W. Bennett’s popular book, The Book of Virtues, on the bus ride to work this morning. It was interesting to read Stevenson’s Treasure Island, in a Chinese edition of course, when I was young.

Yes, it sounds “uncool”: I went to work with technical documents in my bag and read a children's book. A grown-up with childlike innocence? I dare not say. I just read the book to refresh my mind and my English.

I just wanted to quickly introduce myself as the SAS R&D manager for SAS Text Miner. With my research-oriented background, I will be posting distinctly different types of blog entries than you will see from Manya, Barry and Mary.

I will be looking at detailed technical approaches and algorithms being researched for handling text data, i.e. the grungy details. So if you are more interested in a bird's eye view, you may want to skim over my postings. On the other hand, if you want to understand how things work, why we've decided to take the approaches that we have, and what we are considering doing for the future, then tune right in. And I encourage you to make comments and suggestions. I am not tied to particular approaches, and I would love to find out "better" ways to do things that we may not even have considered.

The onslaught of blogs and social media sites has initiated a huge power shift INTO the hands of customers. This can be good (if your customer is singing your praises), bad (if they are not), or more likely both.

The reality is that this represents a huge opportunity for businesses to use all available data in decision making, to understand not only what their customers look like, but what they think. During a downturn in any economy, the customer is the last bastion, THE touch point that helps you better understand what people think about your products and services.

Text mining is the technology that integrates structured and unstructured data to help you better understand your customers, enabling you to surpass the competition, save time and save money. While blogs and social media sites put power into customers’ hands, they can also empower businesses.

Consider the JetBlue fiasco, which generated outrage across the Internet. The JetBlue CEO publicly apologized via YouTube! Since then, some 338,000 YouTube users have viewed the apology. They gave David Neeleman four stars for his performance. What also came out of this medium was the opportunity to mine additional information: customer comments in response to the YouTube apology about the flight cancellations, and peer ratings of those comments. All the makings of a “goldmine” for text and data mining for decision making.

Current manual processes are inconsistent, costly and time-consuming, with information typically organized by functional area, not across the enterprise. Decisions get made in isolation. It's clear that companies must have automated processes to mine data and consistently identify and quantify customer/product issues. Text mining is that technology. Businesses are rapidly embracing this technology. Are you one of them?

To take yesterday’s quote from a social media friend: “we live in a world of unlimited ideas”. When it comes to analyzing text, this quote would probably have to be my mantra. Analyzing text itself isn’t exactly a new idea. Government agencies have been doing this behind closed doors for a long time. What’s new is the ability to understand textual information while NOT being behind closed doors. Text mining/text analytics technology is available for commercial businesses to understand data about their customers, their competitors and much more. We use text for analysis, and combine it with related numeric fields. But even numbers can be saved as text strings, and voice signals can be translated to text and used for better understanding of information. Imagine a bullet pushing the sound barrier. That’s what I picture when I think of The Text Frontier. We like to push boundaries, hard. And we’d like to share our experiences with you. We encourage you to join us in pushing boundaries while sharing your experiences, or just watch us and comment. Whether you watch, wait or dive in with your thoughts, here’s something for you to think about:

Treasures and memories and trash are what I found in my closet during a much needed cleaning. This was one of those deep cleans that only happens once every few years. I looked in every box, bag, and dark corner. You wouldn't believe the things I found -- treasures, memories, and trash. During one archeological dig into a plastic storage bag, I found a purse that had been long forgotten. As I was preparing it for the charity pile, I noticed a brilliant blue corner of cloth peeking out from inside the purse. I found this:

Do you recognize it? It is a SAS T-shirt from the early 90s. I got this shirt shortly after coming to work for SAS. (I'm guessing that it was about the time we released SAS 6.08.) Running SAS on Windows was new and exciting in the early 90s and this was a hot shirt. Finding this pristine, never-worn T-shirt started me thinking. I can't be the only person with old SAS memorabilia stashed in a closet or drawer.

This post from Tom Hide on SAS-L assures me that I am not the only person keeping stuff. Tom has a copy of Guide to Using SAS 76. I've only seen pictures of manuals this old. See the pictures at the end of this post for a glimpse of old manuals as well as some other items from the past.

As you prepare your 2009 travel and education budgets, keep SAS Global Forum 2009 in mind. This conference is a great way to share and expand your existing SAS knowledge. The Gaylord National Resort in National Harbor will be overrun with SAS professionals from around the world. You can

meet the authors of your favorite conference paper

discover a new favorite paper

chat with SAS Technical Support staff, SAS R&D staff, and other SAS users.

If you are as excited about SAS Global Forum 2009 as I am, here's some information that you need to know.

SAS Global Forum registration and housing are now open. Register online for prompt confirmation and payment processing. Visit the home page at www.sasglobalforum.com and click REGISTER NOW.

The online registration system has a number of features that we hope you will find helpful. Those features include:

The ability to connect directly to hotel reservations at the end of your registration

The ability to log back into the system to review or modify your registration and hotel

Separate meal selections for each guest

If you have questions while registering or you are just wondering what else SAS Global Forum has to offer, look for the Chat Now option on the right side of most of our conference pages.

SAS Global Forum has reserved rooms at discounted group rates for conference attendees at hotels in the National Harbor, Maryland, and Alexandria, Virginia, areas. Reservations must be made by booking online.

SAS Press wants you to get to know your favorite SAS authors, so they are taping interviews with authors and making them available to you as podcasts. One of the very first podcasts was with Chris Hemedinger, co-author of SAS for Dummies.

One of the most recent podcasts is with Ron Cody, who is the author of seven SAS books, the first of which was published in 1985. His Learning SAS® by Example: A Programmer’s Guide is one of SAS’ best-selling titles. Listen to this podcast to find out more about the quiet life of this popular SAS author.

In between Chris and Ron, we have interviewed Michael Raithel, Art Carpenter, Phil Mason, Howard Schreier, and many more. To hear Ron and the other authors talk about SAS, their reasons for writing the books that you love, and their writing process, visit the SAS Press podcasts page on support.sas.com.

Let us know what you think
We created a quick poll to gather a little information about how you use podcasts. Please take a minute to visit the poll in the right column of this page and tell us what you think.