Inductive Databases and Knowledge Scouts

(Michalski, Kaufman, Pietrzykowski, Wojtusiak, Sharma, Fischthal, Alkharouf, Draminski, Glowinski)

The objectives of this research are to develop, implement, and test a methodology for building inductive databases, which extend conventional databases by integrating inductive inference capabilities into them. These capabilities allow a database to answer queries that require synthesizing plausible knowledge. Such knowledge is not directly or deductively obtainable from the database, but can be hypothesized through inductive inference. It may take the form of hypotheses about future datapoints, likely consequences of the data, generalized data summaries, emerging global patterns, exceptions to hypothesized patterns, suspected errors and implied inconsistencies, hypothetical plans synthesized from the data, etc.

These capabilities are obtained by implementing a new type of database operator based on methods for inductive inference developed in the fields of machine learning and approximate reasoning. These operators, together with conventional DB operators, are integrated into a knowledge generation language, which allows a user to create scripts for synthesizing desired knowledge (target knowledge). A script includes a plan of operations to be performed on a database and an abstract definition of the target knowledge. A script can run continuously in the background and output its findings when an alert criterion is satisfied or at the user's request. Because inductively derived knowledge normally has lower certainty than directly or deductively obtained knowledge, results of inductive queries are annotated with a certainty measure.
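As an illustration of the ideas in the paragraph above, the following minimal Python sketch shows an inductive operator whose results carry a certainty measure, and a script that reports only the findings meeting an alert criterion. It is not KGL-1 syntax and not one of the project's learning programs: the names `Hypothesis`, `induce_rules`, and `run_script`, the one-condition rule form, the use of rule precision as the certainty measure, and the toy patient records are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hypothesis:
    attribute: str
    value: object
    predicted: object
    certainty: float  # fraction of covered rows that agree with the prediction
    support: int      # number of rows covered by the condition


def induce_rules(rows, target):
    """Hypothetical inductive operator.

    Proposes one-condition rules of the form
    'attribute = value => target = class' and annotates each with a certainty.
    """
    covered = {}
    for row in rows:
        for attr, val in row.items():
            if attr == target:
                continue
            covered.setdefault((attr, val), Counter())[row[target]] += 1

    hypotheses = []
    for (attr, val), counts in covered.items():
        predicted, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        hypotheses.append(Hypothesis(attr, val, predicted, hits / total, total))
    return hypotheses


def run_script(rows, target, min_certainty=0.9, min_support=2):
    """Hypothetical script: a plan (induce rules) plus an alert criterion."""
    return sorted(
        (h for h in induce_rules(rows, target)
         if h.certainty >= min_certainty and h.support >= min_support),
        key=lambda h: (-h.certainty, -h.support, h.attribute, str(h.value)),
    )


# Made-up records, used only to exercise the sketch.
PATIENTS = [
    {"smoker": "yes", "age_group": "old", "diagnosis": "positive"},
    {"smoker": "yes", "age_group": "young", "diagnosis": "positive"},
    {"smoker": "yes", "age_group": "old", "diagnosis": "positive"},
    {"smoker": "no", "age_group": "old", "diagnosis": "negative"},
    {"smoker": "no", "age_group": "young", "diagnosis": "negative"},
    {"smoker": "no", "age_group": "old", "diagnosis": "positive"},
]

alerts = run_script(PATIENTS, "diagnosis")
for h in alerts:
    print(f"{h.attribute} = {h.value} => diagnosis = {h.predicted} "
          f"(certainty {h.certainty:.2f}, support {h.support})")
```

A script that runs continuously in the background could call `run_script` whenever new rows arrive and report only when the returned list is non-empty.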

An inductive database can be used to build knowledge scouts, which are specialized agents operating on a system of databases (e.g., one or more distributed temporal databases, web, etc.). Their function is to synthesize and manage knowledge that is tailored to a specific user or a defined group of users. During the course of its existence, a knowledge scout builds a model of interests and experiences of the user, and employs that model in synthesizing the target knowledge (e.g., builds a data summary on a specific topic, generates a personal travel plan, etc.). Our initial efforts toward the development of the concept of an inductive database have resulted in a preliminary system, which integrates a database with a simple knowledge base and several machine learning and inference programs. The system includes a preliminary knowledge generation language (KGL-1) for creating simple scripts for building knowledge scouts.

A simple knowledge scout based on these principles was experimentally implemented for problems of determining multidimensional patterns in a medical database; this work is described in an accompanying paper.

Current research has focused on methodologies for building an inductive database system that includes domain-specific knowledge systems, access to a relational DBMS through SQL, an addressable knowledge base stored in relational tables, a knowledge query language (KQL) for creating and applying knowledge scouts, and a functionally oriented graphical interface that facilitates use. One distinguishing feature of this approach is the storage of knowledge in relational tables along with the data. The hierarchical storage scheme allows for easy access, querying, and manipulation of the knowledge, either manually or through KQL.
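The idea of keeping knowledge in relational tables next to the data can be sketched with Python's standard `sqlite3` module. The schema below is an assumption for illustration only, not the project's actual table layout: the table names `rulesets`, `rules`, and `conditions`, the helper `rules_for`, and the sample rows are hypothetical, and plain SQL stands in for KQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data and knowledge live in the same database.
# Knowledge is stored hierarchically: ruleset -> rules -> conditions.
conn.executescript("""
CREATE TABLE patients (
    id INTEGER PRIMARY KEY, smoker TEXT, diagnosis TEXT
);
CREATE TABLE rulesets (
    id INTEGER PRIMARY KEY, target TEXT
);
CREATE TABLE rules (
    id INTEGER PRIMARY KEY,
    ruleset_id INTEGER REFERENCES rulesets(id),
    predicted TEXT,
    certainty REAL
);
CREATE TABLE conditions (
    rule_id INTEGER REFERENCES rules(id),
    attribute TEXT,
    value TEXT
);
""")

# Made-up rows, used only to exercise the sketch.
conn.executemany(
    "INSERT INTO patients (smoker, diagnosis) VALUES (?, ?)",
    [("yes", "positive"), ("no", "negative"), ("yes", "positive")],
)
conn.execute("INSERT INTO rulesets (id, target) VALUES (1, 'diagnosis')")
conn.execute(
    "INSERT INTO rules (id, ruleset_id, predicted, certainty) "
    "VALUES (1, 1, 'positive', 0.9)"
)
conn.execute("INSERT INTO conditions VALUES (1, 'smoker', 'yes')")


def rules_for(conn, target, min_certainty):
    """Return stored rule conditions for a target attribute, best first."""
    query = """
        SELECT c.attribute, c.value, r.predicted, r.certainty
        FROM rulesets s
        JOIN rules r ON r.ruleset_id = s.id
        JOIN conditions c ON c.rule_id = r.id
        WHERE s.target = ? AND r.certainty >= ?
        ORDER BY r.certainty DESC
    """
    return conn.execute(query, (target, min_certainty)).fetchall()


found = rules_for(conn, "diagnosis", 0.8)
print(found)
```

Because the rules are ordinary rows, they can be filtered, joined with the data tables, or updated using the same SQL access path as the data itself.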

In conjunction with this research, we are developing a system, VINLEN, that makes these capabilities available through an understandable graphical interface representing the knowledge system and its available operators. Among the areas to which VINLEN is to be applied are computer intrusion and misuse detection through user profiling, and the discovery of climatological patterns and relationships.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

References

Wojtusiak, J., Michalski, R. S., Kaufman, K. and Pietrzykowski, J., "The AQ21 Natural Induction Program for Pattern Discovery: Initial Version and its Novel Features," Proceedings of The 18th IEEE International Conference on Tools with Artificial Intelligence, Washington D.C., November 13-15, 2006.

Kaufman, K., Michalski, R. S., Pietrzykowski, J. and Wojtusiak, J., "An Integrated Multi-task Inductive Database and Decision Support System VINLEN: An Initial Implementation and First Results," Presented at the 5th International Workshop on Knowledge Discovery in Inductive Databases, KDID'06, in conjunction with ECML/PKDD, Berlin, Germany, September 18, 2006.