DataKnowledgeEngJour.. - School of Computing

3. Conceptual Design

For many years, library users who had trouble finding what they needed with a library's search system have turned to the reference librarians, who are more knowledgeable about and proficient with the catalog search system. In general, a librarian asks what the user is looking for, clarifies the subject or topic where necessary, and then constructs a query in boolean logic to search the catalog. If too few records are retrieved, the librarian either changes some of the boolean operators in the query, for example replacing AND with OR to get more records, or reformulates the query with similar keywords or broader subject headings. Similarly, if too many records are retrieved, the librarian may modify the boolean operators or use narrower subject headings to reduce the search results. This process continues until the user is satisfied with the search result.

Basically, there are two types of knowledge present in a search session:

1. Domain knowledge: classification information of records in OPACs, such as Subject Headings and Dewey and Library of Congress Call Numbers. This information can be used to help users clarify their search topic and locate the relevant documents. The hierarchy present in the various classification schemes can also be used to broaden or narrow a search.

2. Domain-independent knowledge: search strategies that librarians use to formulate the user's original query in the format expected by the OPAC (boolean logic), and to reformulate the query by modifying the boolean operators to get more or fewer records.

In the design of the E-Referencer, we decided to incorporate both types of knowledge into the system. A conceptual knowledge base of domain knowledge has been incorporated into the E-Referencer to map keywords to concepts represented in the subject headings. The domain-independent knowledge of formulation and reformulation rules has been implemented as search strategies in the E-Referencer. For a complete description of the search strategies used in the E-Referencer, the reader is referred to an earlier published paper [21].
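The broaden-or-narrow behaviour described in this section can be illustrated with a minimal sketch. It is not the E-Referencer's actual rule set (that is described in [21]); the toy catalog, the function names `search` and `run_session`, and the `min_hits` and `max_hits` thresholds are all hypothetical and chosen only to show the idea of switching between AND and OR depending on how many records come back.

```python
# Illustrative sketch only: every name and threshold here is an assumption,
# not part of the E-Referencer.

# Hypothetical toy catalog: record id -> set of indexed keywords.
CATALOG = {
    1: {"neural", "networks", "learning"},
    2: {"neural", "anatomy"},
    3: {"computer", "networks"},
    4: {"neural", "networks", "hardware"},
}


def search(keywords, operator):
    """Return sorted ids of records matching the keywords joined by AND or OR."""
    match = all if operator == "AND" else any
    return sorted(
        rid for rid, terms in CATALOG.items()
        if match(k in terms for k in keywords)
    )


def run_session(keywords, start="AND", min_hits=1, max_hits=3, max_rounds=3):
    """Search, then reformulate by swapping the boolean operator.

    Too few hits with AND -> broaden by switching to OR.
    Too many hits with OR -> narrow by switching to AND.
    Returns the list of (operator, hits) pairs that were tried.
    """
    operator = start
    history = []
    for _ in range(max_rounds):  # bounded, so the loop cannot oscillate forever
        hits = search(keywords, operator)
        history.append((operator, hits))
        if len(hits) < min_hits and operator == "AND":
            operator = "OR"   # broaden the query
        elif len(hits) > max_hits and operator == "OR":
            operator = "AND"  # narrow the query
        else:
            break             # result size is acceptable, or no rule applies
    return history


if __name__ == "__main__":
    for op, hits in run_session(["neural", "hardware", "anatomy"]):
        print(op, hits)
```

In this sketch the first query (all three keywords joined by AND) matches nothing, so the session retries with OR. A fuller version would also try similar keywords and broader or narrower subject headings, as the text describes, instead of relying on the operator swap alone.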

4. System Design and Implementation

Having illustrated the potential of the E-Referencer, we now describe its design and implementation.

4.1 Design Approach and Considerations

The approach we adopted in developing our system is that of rapid prototyping and incremental development. We first implemented an initial prototype using simple strategies specified by an experienced librarian. We then carried out experiments to evaluate the system and compare its performance with that of experienced librarians. From these, we identified the areas in which the system was deficient and how it could be improved. The prototype system was designed to study why experienced librarians are superior to ordinary users in their searches, and how expert search systems can be designed to match what experienced librarians can do. Rapid prototyping was necessary because, at the start of the project, we did not know which strategy would be the most effective. A detailed up-front planning and design approach was thus not feasible. This cycle of incremental development, testing and redevelopment has allowed us to add new features and refine the system gradually.

The increasingly popular Z39.50 Information Retrieval protocol was used to provide a common interface to multiple online catalogs. Search strategies as used by librarians have been incorporated into the knowledge base of the prototype system using JESS.

E-Referencer uses a three-tier design architecture consisting of a client, a proxy and a server. The client handles user interaction, the server (a Z39.50 server) contains the data and search strategies, and the proxy sits between the client and the server. This approach is necessary because the subject heading database required by the conceptual knowledge base is huge, around 300 megabytes. It is not feasible to send this large database across the network to every client merely to extract the subject headings for what is usually only a few keywords (at most ten). In our design, the proxy houses the database and handles all the