Online Techniques to find sub-trending phrases in Streaming Documents

Abstract

Real-world events are almost always discussed, and in some cases documented, on popular social networks in the form of hashtags. Hashtags are the Internet's analogue of topics that are popular among users at a given moment. The text chosen for a hashtag often depends heavily on some salient characteristic of the real-world event and is frequently drawn from vernacular or common parlance, so it may not reflect the central aspects of the event. As a result, we encounter many hashtags that represent the same real-world event but have seemingly unrelated names. The problem of finding the definition of a trend is distinctive in its chief constraint: it is dynamic, because the real-world event keeps evolving. A procedure that identifies the main aspects of a trending event should therefore be temporally sensitive, shifting its results as the event develops over time. We discuss an approach that identifies candidate representative phrases by finding trends within trends: if certain phrases are observed frequently within a time interval, they are taken to represent a contemporary topic of rising interest among the documents being generated at that time. This technique emphasizes the various aspects of the trending event, which are analogous to sub-events within the main event. One of its cornerstones is an online mode of operation, in which the incoming stream of documents is observed and processed only once. This also makes it practicable in most real-world scenarios, where information systems consume feeds from various sources and ingest various forms of metadata to enhance their information extraction processes.
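The idea of observing phrases that occur frequently within a time interval, while reading each document only once, can be illustrated with a minimal sketch. This is not the method of the work itself, whose details the abstract does not give. It assumes that a phrase is a word bigram, that the interval is a fixed sliding window measured in seconds, and that a simple count threshold decides what is frequent. The names SlidingPhraseCounter, window, n, and min_count are hypothetical.

```python
from collections import Counter, deque


class SlidingPhraseCounter:
    """Counts word n-grams seen in the last `window` seconds of a stream.

    Illustrative assumption: phrases are plain word n-grams and trend
    strength is a raw count. The original work may define both differently.
    """

    def __init__(self, window=60.0, n=2):
        self.window = window
        self.n = n
        self.counts = Counter()
        self.events = deque()  # (timestamp, phrases) in arrival order

    def _phrases(self, text):
        words = text.lower().split()
        last_start = len(words) - self.n + 1
        return [" ".join(words[i:i + self.n]) for i in range(last_start)]

    def add(self, timestamp, text):
        """Ingest one document; timestamps are assumed non-decreasing."""
        phrases = self._phrases(text)
        self.events.append((timestamp, phrases))
        self.counts.update(phrases)
        # Expire documents that have fallen out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old = self.events.popleft()
            self.counts.subtract(old)
            for phrase in set(old):
                if self.counts[phrase] <= 0:
                    del self.counts[phrase]

    def top(self, k=3, min_count=2):
        """Return up to k phrases seen at least min_count times in the window."""
        return [(p, c) for p, c in self.counts.most_common(k) if c >= min_count]


if __name__ == "__main__":
    counter = SlidingPhraseCounter(window=10, n=2)
    for ts, doc in [(0, "power outage downtown"),
                    (1, "power outage reported"),
                    (2, "huge power outage")]:
        counter.add(ts, doc)
    print(counter.top(1))
```

Each document is tokenized once on arrival. Only the phrases of documents still inside the window are retained so that their counts can be withdrawn when they expire, which keeps memory proportional to the window rather than to the whole stream.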