Abstract

Clustering is well suited to Web mining because it automatically organizes Web pages into categories, each of which contains pages with similar content. However, one problem in clustering is the lack of general methods for automatically determining the number of categories or clusters. For the Web domain in particular, no such method suitable for Web page clustering has existed until now. To address this problem, we discovered a constant factor that characterizes the Web domain, and based on it we propose a new method for automatically determining the number of clusters in Web page datasets. We also propose a new Bidirectional Hierarchical Clustering algorithm, which arranges individual Web pages into clusters, then arranges those clusters into larger clusters, and so on until the average inter-cluster similarity approaches the constant factor. Combining the new constant factor with the new algorithm, we have developed a clustering system suitable for mining the Web.
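
The stopping rule described above can be illustrated with a minimal sketch. It is not the Bidirectional Hierarchical Clustering algorithm itself, whose details the abstract does not give. It is a plain bottom-up (agglomerative) procedure that keeps merging the two most similar clusters until the average inter-cluster similarity is no longer above a target value. The function names, the use of cosine similarity between cluster centroids, and the `target` parameter are all assumptions made for illustration; the abstract does not state the value of the constant factor.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centroids_of(clusters, vectors):
    """One centroid per cluster; each cluster is a list of page indices."""
    return [centroid([vectors[i] for i in members]) for members in clusters]


def avg_inter_cluster_similarity(clusters, vectors):
    """Mean cosine similarity over all pairs of cluster centroids."""
    cents = centroids_of(clusters, vectors)
    pairs = list(combinations(range(len(cents)), 2))
    if not pairs:
        return 0.0
    return sum(cosine(cents[i], cents[j]) for i, j in pairs) / len(pairs)


def cluster_until(vectors, target):
    """Merge the most similar pair of clusters until the average
    inter-cluster similarity is no longer above `target`.

    `target` stands in for the constant factor mentioned in the abstract;
    its actual value is not given there, so callers must supply one.
    """
    clusters = [[i] for i in range(len(vectors))]  # start with one page per cluster
    while (len(clusters) > 1
           and avg_inter_cluster_similarity(clusters, vectors) > target):
        cents = centroids_of(clusters, vectors)
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: cosine(cents[p[0]], cents[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters
```

Calling `cluster_until(page_vectors, target)` returns a list of clusters, each a list of indices into `page_vectors`. The number of clusters is not passed in; it falls out of where the similarity threshold stops the merging, which is the idea the abstract describes.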