An Incremental Algorithm for Data Mining Based on Rough Sets

Abstract:

Data mining offers many methods for learning interesting rules (knowledge) from a database or an information table. Pawlak’s Rough Set Theory (RST) is a useful method for data mining and data analysis, and successful applications have shown that learning approaches based on rough sets are helpful and valuable for obtaining interesting rules. Such approaches, however, assume that the training examples recorded in a database will eventually converge to a stable state; that is, the information table is a finite, fixed set of records that share a common set of properties. In practice this assumption rarely holds, because the volume of data grows rapidly. For example, in 2006 eBay’s massive Oracle database had over 212 million registered users and held two petabytes of user data. The database ran on Teradata, with over 20 billion transactions per day. For management and market decisions in such a business environment, an efficient rule learning algorithm capable of real-time processing is extraordinarily valuable. In this presentation, we introduce an incremental RST algorithm based on the assumption that the objects in the information table change over time.
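
To make the setting concrete, the following is a minimal Python sketch of how the rough-set view of an information table might be maintained as objects arrive and leave. It is an illustration only and is not the algorithm introduced in the presentation. The class name `IncrementalTable`, its methods, and the attribute names in the usage note are all hypothetical. The sketch groups objects into equivalence classes by their condition attribute values. A class whose members all share one decision value lies in a lower approximation and yields a certain rule. A class with mixed decisions lies only in upper approximations and yields possible rules.

```python
class IncrementalTable:
    """Hypothetical sketch: an information table updated one object at a time.

    Objects with identical condition attribute values form one equivalence
    class. Only per-class decision counts are stored, so an update touches
    a single class and the table is never rescanned.
    """

    def __init__(self, condition_attrs, decision_attr):
        self.condition_attrs = list(condition_attrs)
        self.decision_attr = decision_attr
        # condition value tuple -> {decision value: number of objects}
        self.classes = {}

    def _key(self, obj):
        return tuple(obj[a] for a in self.condition_attrs)

    def add(self, obj):
        """Insert one object, updating only its own equivalence class."""
        counts = self.classes.setdefault(self._key(obj), {})
        decision = obj[self.decision_attr]
        counts[decision] = counts.get(decision, 0) + 1

    def remove(self, obj):
        """Delete one previously added object."""
        key = self._key(obj)
        decision = obj[self.decision_attr]
        counts = self.classes.get(key)
        if counts is None or counts.get(decision, 0) == 0:
            raise KeyError("object is not in the table")
        counts[decision] -= 1
        if counts[decision] == 0:
            del counts[decision]
        if not counts:
            del self.classes[key]

    def certain_rules(self):
        """Classes inside a lower approximation: exactly one decision value."""
        return {
            key: next(iter(counts))
            for key, counts in self.classes.items()
            if len(counts) == 1
        }

    def possible_rules(self):
        """Every class with each decision whose upper approximation contains it."""
        return {key: set(counts) for key, counts in self.classes.items()}
```

As a usage example with made-up data: after adding `{"weather": "sunny", "temp": "hot", "play": "no"}` to a table with condition attributes `["weather", "temp"]` and decision attribute `"play"`, the class `("sunny", "hot")` yields a certain rule. Adding a second object with the same condition values but `"play": "yes"` turns it into a possible rule only, and removing that object restores the certain rule. Each update costs time proportional to the number of condition attributes rather than the size of the table.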