"... Abstract. We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive miningbased operations. Requirements of any such system include speed and minimal end-user e ort. Athena satis es these requirements through linear-time classi cation ..."

Abstract. We describe Athena: a system for creating, exploiting, and maintaining a hierarchy of textual documents through interactive mining-based operations. Requirements of any such system include speed and minimal end-user effort. Athena satisfies these requirements through linear-time classification and clustering engines which are applied interactively to speed the development of accurate models. Naive Bayes classifiers are recognized to be among the best for classifying text. We show that our specialization of the Naive Bayes classifier is considerably more accurate (7 to 29% absolute increase in accuracy) than a standard implementation. Our enhancements include using Lidstone's law of succession instead of Laplace's law, under-weighting long documents, and over-weighting author and subject. We also present a new interactive clustering algorithm, C-Evolve, for topic discovery. C-Evolve first finds highly accurate cluster digests (partial clusters), gets user feedback to merge and correct these digests, and then uses the classification algorithm to complete the partitioning of the data. By allowing this interactivity in the clustering process, C-Evolve achieves considerably higher clustering accuracy (10 to 20% absolute increase in our experiments) than the popular K-Means and agglomerative clustering methods.

...ed to base our classifier on the Naive-Bayes model [Goo65] for the following reasons:
- Naive-Bayes classifiers are very competitive with other techniques for text classification [CDAR97] [LR94] [Lan95] [PB97] [MN98].
- They stabilize quickly [Koh96], which supports automated hierarchy reorganization with a limited number of examples.
- They are fast. They can be constructed quickly with a single pass ov...
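
To illustrate the smoothing change named in the Athena abstract above, here is a minimal sketch of a multinomial Naive Bayes text classifier that uses Lidstone's law of succession, which estimates P(w | c) = (n(w, c) + λ) / (n(c) + λ|V|); setting λ = 1 recovers Laplace's law. This is not Athena's implementation: the function names, the default λ, and the data layout are assumptions made for illustration, and the document-length and author/subject weighting mentioned in the abstract are not modeled.

```python
from collections import Counter, defaultdict
import math


def train(docs, lam=0.1):
    """Build a model from docs, a list of (label, tokens) pairs.

    lam is the Lidstone smoothing constant (hypothetical default);
    lam = 1.0 gives Laplace's law.
    """
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocab = set()
    for label, tokens in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_docs": class_docs,
        "word_counts": word_counts,
        "vocab": vocab,
        "lam": lam,
        "n_docs": len(docs),
    }


def word_prob(model, label, word):
    """Lidstone estimate: (count + lam) / (total + lam * |V|)."""
    counts = model["word_counts"][label]
    total = sum(counts.values())
    lam = model["lam"]
    return (counts[word] + lam) / (total + lam * len(model["vocab"]))


def classify(model, tokens):
    """Return the class with the highest log posterior score."""
    best_label, best_score = None, -math.inf
    for label, n in model["class_docs"].items():
        score = math.log(n / model["n_docs"])  # log prior
        for w in tokens:
            if w in model["vocab"]:            # skip out-of-vocabulary words
                score += math.log(word_prob(model, label, w))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, training on `[("sports", ["ball", "goal", "ball"]), ("tech", ["code", "bug", "code"])]` with `lam=0.5` gives `word_prob(model, "sports", "ball") == (2 + 0.5) / (3 + 0.5 * 4) == 0.5`. A smaller λ moves less probability mass to unseen words than Laplace's law does.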

"... Document filtering is increasingly deployed in Web environments to reduce information overload of users. We formulate online information filtering as a reinforcement learning problem, i.e. TD(0). The goal is to learn user profiles that best represent his information needs and thus maximize the expec ..."

Document filtering is increasingly deployed in Web environments to reduce users' information overload. We formulate online information filtering as a reinforcement learning problem, i.e., TD(0). The goal is to learn user profiles that best represent each user's information needs and thus maximize the expected value of user relevance feedback. A method is then presented that acquires reinforcement signals automatically by estimating the user's implicit feedback from direct observations of browsing behavior. This "learning by observation" approach is contrasted with conventional relevance feedback methods, which require explicit user feedback. Field tests were performed in which 10 users read a total of 18,750 HTML documents over 45 days. Compared to existing document filtering techniques, the proposed learning method showed superior performance in information quality and in speed of adaptation to user preferences in online filtering.
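
For readers unfamiliar with TD(0), the generic update rule is V(s) ← V(s) + α (r + γ V(s') − V(s)). The sketch below shows only that textbook update. How the paper maps documents, profiles, and implicit feedback onto states and rewards is not stated in the abstract, so the state keys, the reward values, and the function name `td0_update` are hypothetical.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) step in place and return the TD error.

    values: dict mapping a state to its estimated value (missing = 0.0).
    alpha:  learning rate; gamma: discount factor (both assumed defaults).
    """
    v_s = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    td_error = reward + gamma * v_next - v_s   # r + gamma * V(s') - V(s)
    values[state] = v_s + alpha * td_error     # move V(s) toward the target
    return td_error
```

In a filtering setting, the reward could be an estimate of relevance feedback observed after a document is shown, but that mapping is an assumption here, not a description of the paper's method.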

"... The problem of assigning conference paper submissions to suitable reviewers can be viewed asavariant of the general problem of technical paper recommendation. In both cases one would ideally like to direct only those papers that are of the greatest interest to the appropriate set of people. Current ..."

The problem of assigning conference paper submissions to suitable reviewers can be viewed as a variant of the general problem of technical paper recommendation. In both cases one would ideally like to direct only those papers that are of the greatest interest to the appropriate set of people. Current attempts to automate the conference reviewing process have typically converted it into a task that requires reviewers to rate keywords and sift through long lists of abstracts to find those that are appropriate for their interests and background. In this paper, we propose an automated method for recommending small focused sets of papers to reviewers. We show how intelligent paper recommendation can be performed by combining techniques from information retrieval and database technology, and by mining multiple information sources from the Web. We use abstracts of papers submitted to AAAI-98 and data mined from the home pages of its program committee members, and we evaluate our approach based on actual reviewing preferences given by the committee members.

"... In this paper we present a framework analysis for managing the feedback explicitly given by visitors of a Web site. We introduce the concepts of scope, ltering, and relevance proles for managing users ' feedback, and show their applicability by using Gugubarra as a reference system, a prototype ..."

In this paper we present a framework analysis for managing the feedback explicitly given by visitors of a Web site. We introduce the concepts of scope, filtering, and relevance profiles for managing users' feedback, and show their applicability by using Gugubarra as a reference system, a prototype developed by DBIS at the Goethe University of Frankfurt, for creating and managing user profiles of Web visitors.

...spect to their reputation. For example, using a different filter f for expert and non-expert users or with a specific consistency history, we can get different weights reflecting their reputation levels. In [15] algorithms for learning and revising user profiles are defined that can determine which World Wide Web sites on a given topic would be interesting to a user. The authors use a Bayesian classifier for th...

"... In this paper we describe Calvin, an intelligent agent that learns user interests by monitoring user activities while he/she searches and browses the Web. The user profile is created and maintained from a contentbased and event-based analysis of the visited pages using Inductive Logic Programming. T ..."

In this paper we describe Calvin, an intelligent agent that learns user interests by monitoring user activities while he/she searches and browses the Web. The user profile is created and maintained from a content-based and event-based analysis of the visited pages using Inductive Logic Programming. The user submits queries which are expanded considering the information represented in her/his profile. Once the expanded query is submitted to and answered by a search engine, the agent performs a relevance ranking of the results based on the user interests. After some experiments, Calvin has demonstrated to be capable of learning and adapting user interests without any explicit feedback from her/him.

"... We describeAthena: a system for creating, exploiting, and maintaining a hierarchical arrangement of textual documents through interactive mining-based operations. Requirements of any suchsystem include speed and minimal end-user e ort. Athena satis es these requirements through linear-time classi ca ..."

We describe Athena: a system for creating, exploiting, and maintaining a hierarchical arrangement of textual documents through interactive mining-based operations. Requirements of any such system include speed and minimal end-user effort. Athena satisfies these requirements through linear-time classification and clustering engines which are applied interactively to speed the development of accurate models. Naive Bayes classifiers are recognized to be among the best for classifying text. We show that our specialization of the Naive Bayes classifier is considerably more accurate (7 to 29% absolute increase in accuracy) than a standard implementation. Our enhancements include using Lidstone's law of succession instead of Laplace's law, under-weighting long documents, and over-weighting author and subject. We also present a new interactive clustering algorithm, C-Evolve, for topic discovery. C-Evolve first finds highly accurate cluster digests (partial clusters), gets user feedback to merge and correct these digests, and then uses the classification algorithm to complete the partitioning of the data. By allowing this interactivity in the clustering process, C-Evolve achieves considerably higher clustering accuracy (10 to 20% absolute increase in our experiments) than the popular K-Means and agglomerative clustering methods.