Sunday, June 21, 2009

Figure 1: Top-20 query set similarity, or, what fraction of the top-20 queries, separated by a time lag (X-axis) are the same.

Figure 2: Frequency of occurrence of a query name vs. the number of unique query names in the top-20 (hourly) query sets over 6 months. About half the queries appear twice or once.

Twitter has become the latest darling Silicon Valley start-up. What started off as a seemingly incremental idea ("micro-blogging" in 140 characters or less) seems to have caught on big time - more people are twittering than ever before. Twitter has shown its prowess in everything from influencing the American presidential election to challenging Iranian theocracy. Its popularity makes it a very compelling service, but how can it make money for its founders, Evan Williams and Biz Stone, and its promoters?

Needless to say, many Twitter posts (tweets) are inane ("A dog bit me") rather than newsworthy ("I bit a dog"). Many users underestimate how hard it is to produce a constant stream of interesting 140-character messages from their everyday lives and experiences. Fortunately, Twitter comes with a search engine optimized to index tweets in real time, so users can query Twitter for useful information within the deluge of tweets being posted every moment. This makes Twitter a real-time application and a perfect vehicle for propagating news on the Internet. Product releases, reviews, security bugs and vulnerabilities, company press releases, executives' and analysts' statements, etc. previously had to wait for a search engine to crawl and index them on the Internet (sometimes a lag of weeks). But because tweets come to Twitter directly from users, these announcements are instantly indexed and available for search via Twitter's search engine.

Twitter provides a great API for studying the queries submitted to Twitter over a time period. The API allowed me to download information about the top-20 most-popular queries submitted to Twitter in every 1-hour interval over 6 months. Parsing this information sheds light on what users search for when they go online looking for time-sensitive information. It also suggests ways of monetizing this vast treasure of users' mind-space: what they think about, search for, and find as time goes by.

Figure 1 shows top-20 query name set similarity, that is, the fraction of top-20 queries that two hourly sets separated by a given time lag (X-axis) have in common. Twitter reported the set of 20 most-popular query names in each hour. The figure is plotted by finding the fraction of common elements between any two such sets separated by a certain lag (X-axis). The similarity falls off quickly within 48 hours, and after 14 days it settles at about 0.1 (meaning only 2 of 20 query names remain the same between the compared query name sets). There is also a noticeable drop in similarity at 24 and 48 hours (probably due to periodicity effects). Also note the diurnal bumps. Why does that happen?
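For readers who want to try this on their own data, here is a minimal sketch of one way such a similarity-vs-lag curve could be computed. It is not necessarily the exact procedure behind Figure 1. It assumes the hourly top-20 lists are already loaded as a list of Python sets in chronological order, and it averages the overlap over every pair of hours separated by the same lag (the averaging is my assumption). The names `lag_similarity` and `hourly_sets` are made up for illustration.

```python
from typing import Dict, List, Set


def lag_similarity(
    hourly_sets: List[Set[str]], max_lag: int, set_size: int = 20
) -> Dict[int, float]:
    """Average fraction of shared query names between hourly sets `lag` hours apart.

    Assumes one set per hour, in chronological order, each with `set_size` names.
    """
    result: Dict[int, float] = {}
    for lag in range(1, max_lag + 1):
        # Overlap fraction for every pair (t, t + lag) that fits in the data.
        overlaps = [
            len(hourly_sets[t] & hourly_sets[t + lag]) / set_size
            for t in range(len(hourly_sets) - lag)
        ]
        if overlaps:  # skip lags longer than the data allows
            result[lag] = sum(overlaps) / len(overlaps)
    return result


# Toy data: four "hours" with two query names each (instead of 20).
toy_hours = [{"a", "b"}, {"a", "c"}, {"c", "d"}, {"d", "e"}]
print(lag_similarity(toy_hours, max_lag=3, set_size=2))
# {1: 0.5, 2: 0.0, 3: 0.0}
```

With real data, a value of 0.1 at a given lag would correspond to an average of 2 shared names out of 20.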

Figure 2 plots the frequency of occurrence of a query name vs. the number of unique query names in the top-20 (hourly) query sets over 6 months. The majority of top-20 queries held users' interest for only a few hours. Only about 20% of the queries remained in the top-20 lists for more than 10 hours in the 6-month period.
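The counting behind a plot like Figure 2 can be sketched in a few lines. This is an illustration under assumptions, not the original analysis script: it again assumes the hourly top-20 lists are available as Python sets, and the function names (`hours_in_top`, `occurrence_histogram`, `fraction_longer_than`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def hours_in_top(hourly_sets: Iterable[Set[str]]) -> Counter:
    """Count, for each query name, the number of hourly sets it appears in."""
    counts: Counter = Counter()
    for names in hourly_sets:
        counts.update(names)  # each name in a set is counted once for that hour
    return counts


def occurrence_histogram(hours_per_query: Dict[str, int]) -> Counter:
    """Map 'number of hours in the top list' -> 'number of unique query names'."""
    return Counter(hours_per_query.values())


def fraction_longer_than(hours_per_query: Dict[str, int], threshold: int) -> float:
    """Fraction of unique query names that stayed for more than `threshold` hours."""
    if not hours_per_query:
        return 0.0
    lasting = sum(1 for hours in hours_per_query.values() if hours > threshold)
    return lasting / len(hours_per_query)


# Toy data: three "hours" with two query names each.
toy_hours = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
per_query = hours_in_top(toy_hours)
print(dict(occurrence_histogram(per_query)))   # 1 name lasted 3 hours, 3 names lasted 1 hour
print(fraction_longer_than(per_query, 1))      # 0.25
```

On the real data set, `fraction_longer_than(per_query, 10)` would be the quantity quoted above as about 20%.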

But what does all this mean for making money via Twitter? Well, here are my 2 cents:

1. Marketing benchmarks: Companies can use Twitter query popularity to measure their marketing success: for instance, how far an advertising campaign has entered customers' minds, measured in terms of query frequency (say, as compared to their competitors). Twitter can develop and sell analytic tools for companies to measure this. In my opinion, looking for this information in user queries is more effective than looking in the tweets themselves, because the latter can be gamed (spam tweets) and because tweeting users are still a minority of people passionate about posting messages (as compared to the silent majority that does not post). Monitoring product mentions in query names is also a great way to keep tabs on marketing success. For example, in Figures 2 and 3, a product name occurring frequently is good news, but if the rate of occurrence decreases over time then it's time to launch more marketing efforts or to improve the product's visibility in some way.

2. Real-time Customer feedback: Product groups can use Twitter query information to pinpoint product bugs, fast. There is a certain cost for a user to go on the Internet and search Twitter for, say, "iPhone screen blank". If such a query bubbles up in query popularity (say, a top-XX query), then the bug is almost certainly a widespread issue. Twitter's real-time feature highlights problems very quickly and efficiently. Selling such product-specific query information to companies may create a nice revenue stream for Twitter.

3. Keyword Analytics: The occurrence of a product name with another query word may signal a selling opportunity. For example, the query "iPhone anti-virus" may point to market demand for Apple to sell anti-virus software with its iPhone. Keyword analytics can also be used to help in choosing keywords for online advertising.

4. Risk Management: Twitter quickly captures the viral spread of information on the Internet. This could allow a company to react to, say, a malicious video posted about its product on YouTube. Twitter is your fast-response Internet guardian. For example, Twitter can offer a service to subscribed companies that notifies them about any information (positive or negative) gaining traction on Twitter.
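To make the keyword-analytics idea in item 3 a little more concrete, here is a minimal sketch that counts which words show up alongside a product name in a list of query strings. It is only an illustration: the function name `co_occurring_words` and the sample queries are invented, and splitting on whitespace is a deliberately crude stand-in for real tokenization.

```python
from collections import Counter
from typing import Iterable


def co_occurring_words(queries: Iterable[str], product: str) -> Counter:
    """Count words that appear in the same query as `product` (case-insensitive)."""
    product = product.lower()
    counts: Counter = Counter()
    for query in queries:
        words = query.lower().split()  # crude whitespace tokenization
        if product in words:
            counts.update(word for word in words if word != product)
    return counts


# Invented sample queries, for illustration only.
sample_queries = [
    "iPhone anti-virus",
    "iphone screen blank",
    "android anti-virus",
    "iPhone screen cracked",
]
print(co_occurring_words(sample_queries, "iPhone").most_common(1))
# [('screen', 2)]
```

A word that keeps climbing in this count is a candidate advertising keyword, or, as with "screen" here, a hint at a problem worth investigating.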

And finally, here is a list of Twitter search strings that were in the top-20 most-popular search lists for 100 hours or more (over a 6-month period). Funny how Apple dominates the top 3 slots, and then there is AT&T in the 4th slot (probably due to selling the iPhone in the USA). Hats off to Apple's marketing for having captured so much of users' mind-space. Or are they gaming Twitter? Or are Twitter users disproportionately Apple fans? Or is Twitter the newest Apple rumor-spreading mechanism?