Our project, Context Based Search Engine, is a research-oriented project in which we study how search engines such as Google and MSN work, examine the concepts of data mining and data warehousing, and survey the available search algorithms. We will also carry out a comparative analysis of the Google Search API and the Lucene framework.

Technology Used

Java

Platform

Windows Machine

Software and Hardware Requirements

JDK 1.5

Microsoft Windows XP Professional SP2

512 MB RAM

80 GB HDD

Pentium 4 processor

Context Based Search Engine Project Description

Although several new operating systems attempt to provide users with content-based search capabilities, they are limited to text documents. A key challenge in implementing a content-based similarity search system for feature-rich data is that such data is noisy and complex. For example, consider two different photographs of an identical scene, or two separate recordings of a person speaking the same sentence. Despite the high degree of similarity between the two images or between the two audio files, their digital patterns differ at a very low level. Because of this noise, such data usually has to be matched on similarity of pattern rather than on an exact match of the digital representation. Similarity search in high-dimensional data, however, is notoriously difficult. As a result, today's search tools, such as database tools and search engines, are largely limited to searching for exact matches. Search engines of this kind work only on textual data and text annotations. To date, there is no practical content-based search engine for massive amounts of inherently noisy, feature-rich data.
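
The difference between exact matching and similarity matching can be illustrated with a small Java sketch. It is only an illustration and not part of the project: the class name SimilarityDemo, the three-component feature vectors, and the choice of cosine similarity as the measure are all assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration: the class name, the feature values and the use of
// cosine similarity are assumptions, not taken from the project itself.
class SimilarityDemo {

    // Exact match: true only if every component of the two vectors is equal.
    static boolean exactMatch(double[] a, double[] b) {
        return Arrays.equals(a, b);
    }

    // Cosine similarity: close to 1.0 when the vectors point in nearly the
    // same direction, close to 0.0 when they have little in common.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction to compare
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] photo1 = {0.80, 0.10, 0.30}; // made-up features of one photograph
        double[] photo2 = {0.79, 0.12, 0.31}; // same scene, slightly different values
        double[] other  = {0.05, 0.90, 0.10}; // an unrelated image

        System.out.println("exact match:       " + exactMatch(photo1, photo2));
        System.out.println("similar to photo2: " + (cosineSimilarity(photo1, photo2) > 0.99));
        System.out.println("similar to other:  " + (cosineSimilarity(photo1, other) > 0.99));
    }
}
```

In this sketch the two noisy vectors fail the exact comparison but still score above the 0.99 threshold, while the unrelated vector does not. The threshold itself is arbitrary; a real system would have to choose its features and its similarity measure to suit the data.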

Our application will be a code indexing and search application built on the Search API and the Lucene framework. It is a specialized domain that branches out from context-based searching.
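
To give a rough idea of what code indexing involves, the following is a minimal sketch of an inverted index written in plain Java. It does not use Lucene or the Search API and does not reproduce their interfaces; the class CodeIndexSketch, its methods, and the file names in the usage example are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an inverted index over source files.
// This is NOT Lucene's API; every name here is an assumption.
class CodeIndexSketch {

    // Maps each lowercase token to the sorted set of files that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits source text into lowercase identifier-like tokens.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String t : source.split("[^A-Za-z0-9_]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Records every token of the given source text under the file name.
    void addFile(String fileName, String source) {
        for (String token : tokenize(source)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(fileName);
        }
    }

    // Returns the files containing the term (case-insensitive), or an empty set.
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        if (hits == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        CodeIndexSketch index = new CodeIndexSketch();
        index.addFile("Parser.java", "class Parser { void parseQuery(String q) {} }");
        index.addFile("Main.java", "public class Main { Parser p; }");
        System.out.println(index.search("Parser"));     // files mentioning "parser"
        System.out.println(index.search("parseQuery")); // files mentioning "parsequery"
    }
}
```

A framework such as Lucene adds much more on top of this basic idea, for example configurable text analysis, relevance ranking, and persistent on-disk indexes, which is part of what the comparative analysis would need to look at.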