Practical Distributed Web Crawler

Sunday, June 8, 2008

The main problem this project addresses is the very high level of resources required for successful web crawling. Most web crawlers in use today rely on server farms to meet their needs. This puts the area out of reach for ordinary developers. My goal is to reduce the resources needed for web crawling by using a distributed system.

The distributed system will be used for both the web crawling and the data processing, with a single database server to store the data. The project will also provide search by page details and image tags, to offer a better image search.
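As a rough illustration of the data processing a client might do before sending results to the database server, here is a minimal sketch that pulls the page title and the image tags out of a downloaded page. It uses Python's standard `html.parser`; the class name `PageDetailsParser`, the function `extract_page_details`, and the shape of the returned dictionary are my own assumptions for the example, not the project's actual design.

```python
from html.parser import HTMLParser


class PageDetailsParser(HTMLParser):
    """Collects the page title and the (src, alt) pair of every image."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []  # list of (src, alt) tuples
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            attr_map = dict(attrs)
            # Attribute values can be None (e.g. a bare "alt"), so fall back to "".
            src = attr_map.get("src") or ""
            alt = attr_map.get("alt") or ""
            self.images.append((src, alt))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page_details(html):
    """Hypothetical helper: return the details a client would report for one page."""
    parser = PageDetailsParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "images": parser.images}
```

Only the alt text is treated as an image tag here; a real client would probably look at file names and surrounding text as well.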

Friday, June 6, 2008

The final goal of this project is to create a web crawler that can work in practical environments. To achieve this, the project will look in depth at the usability and flexibility of the product.

The final product will be able to attract users to the system and to distribute a client among them. The web crawler will be capable of collecting information under given keywords, so that clients may customize the search patterns according to their needs.

This will give the common user a website where he or she can search for web addresses or images under given keywords. For image search, users may choose whether to search by image tags or by page content.
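The two image search modes described above could look something like the sketch below. It is only an in-memory example: the record fields (`image_url`, `image_tags`, `page_content`) and the function `search_images` are assumptions made for illustration, and the real system would query the database server instead of a Python list.

```python
def search_images(records, keywords, mode="tags"):
    """Return URLs of images whose chosen field contains every keyword.

    mode="tags" matches against the image's own tags;
    mode="content" matches against the text of the page the image was found on.
    Matching is a simple case-insensitive substring test.
    """
    if mode not in ("tags", "content"):
        raise ValueError("mode must be 'tags' or 'content'")
    field = "image_tags" if mode == "tags" else "page_content"
    wanted = [keyword.lower() for keyword in keywords]
    matches = []
    for record in records:
        haystack = record[field].lower()
        if all(keyword in haystack for keyword in wanted):
            matches.append(record["image_url"])
    return matches


# Made-up sample records, only to show the two modes giving different results.
records = [
    {
        "image_url": "http://example.com/cat.jpg",
        "image_tags": "black cat",
        "page_content": "A page about pets and their owners.",
    },
    {
        "image_url": "http://example.com/logo.png",
        "image_tags": "site logo",
        "page_content": "Our cat shelter opens daily.",
    },
]
```

With these records, searching for "cat" by tags finds the first image, while searching by page content finds the second one.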

Other developers will be able to download an editable version of the client, so that they can edit and distribute a client that searches the specific area they need to focus on.

The final deliverables are as follows:

· An online web server application to distribute the workload to the clients.

· An online website to publicize the project and to distribute the web crawler.
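To give an idea of how the server application might distribute the workload, here is a minimal sketch of a coordinator that hands out batches of URLs to clients and accepts the links they discover. Everything in it (the `WorkCoordinator` class, its method names, the batch size) is hypothetical, and it leaves out networking, timeouts and storage on purpose.

```python
from collections import deque


class WorkCoordinator:
    """Hands out URL batches to clients and queues newly discovered URLs."""

    def __init__(self, seed_urls, batch_size=2):
        unique_seeds = list(dict.fromkeys(seed_urls))  # drop duplicates, keep order
        self._pending = deque(unique_seeds)
        self._seen = set(unique_seeds)
        self._assigned = {}  # client_id -> batch the client is still working on
        self.batch_size = batch_size

    def request_work(self, client_id):
        """Give the client its unfinished batch, or a new one if it has none."""
        if client_id in self._assigned:
            return self._assigned[client_id]
        batch = []
        while self._pending and len(batch) < self.batch_size:
            batch.append(self._pending.popleft())
        if batch:
            self._assigned[client_id] = batch
        return batch

    def submit_results(self, client_id, discovered_urls):
        """Mark the client's batch as done and queue URLs not seen before."""
        self._assigned.pop(client_id, None)
        for url in discovered_urls:
            if url not in self._seen:
                self._seen.add(url)
                self._pending.append(url)
```

A client would call `request_work` to get its URLs, crawl them, and then call `submit_results` with the links it found. Keeping the set of seen URLs on the server is one simple way to stop two clients from crawling the same page.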

Wednesday, June 4, 2008

The development methodology I have selected for this project is Extreme Programming. Extreme Programming is a methodology that encourages the developer to start from the simplest form of the product.

This keeps the development flexible for future changes and extra functionality. Extreme Programming favors simple designs, common metaphors, collaboration between users and programmers, frequent verbal communication, and feedback. Since user involvement is a must in this project, this will also make it easier to respond to user requirements and change accordingly.

The project will go through three major stages of development. Each stage expands on the previous one and adds functionality to it.

Areas developed under the first stage:

The web server application to distribute information and collect user support and comments

Human testers will be used in the testing phase to identify errors and to make sure that the system works in any working environment. Voluntary participants are very important for this project, since it is a distributed system.

I'm a final year undergraduate at APIIT Sri Lanka (B.Sc. in Computing). 221BoT is my final year project. This project is supervised by Mr. Ashan Fonseka and assessed by Dr. Damith Mudugamuwa, of APIIT Sri Lanka.