Web-Harvest is Open Source Web Data Extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well established techniques and technologies for text/xml manipulation such as XSLT, XQuery and Regular Expressions. Web-Harvest mainly focuses on HTML/XML based web sites which still make vast majority of the Web content. On the other hand, it could be easily supplemented by custom Java libraries in order to augment its extraction capabilities.

Process of extracting data from Web pages is also referred as Web Scraping or Web Data Mining. World Wide Web, as the largest database, often contains various data that we would like to consume for our needs. The problem is that this data is in most cases mixed together with formatting code – that way making human-friendly, but not machine-friendly content. Doing manual copy-paste is error prone, tedious and sometimes even impossible. Web software designers usually discuss how to make clean separation between content and style, using various frameworks and design patterns in order to achieve that. Anyway, some kind of merge occurs usually at the server side, so that the bunch of HTML is delivered to the web client.

Piwik is a downloadable, open source (GPL licensed) real time web analytics software program. It provides you with detailed reports on your website visitors: the search engines and keywords they used, the language they speak, your popular pages… and so much more.

Piwik is a PHP MySQL software program that you download and install on your own webserver. At the end of the five minute installation process you will be given a JavaScript tag. Simply copy and paste this tag on websites you wish to track (or use an existing plugin to do it automatically for you) and access your analytics reports in real time.

Red Hat Enterprise Linux 3 with a 2.4 kernel base uses a single, robust, general purpose I/O elevator. The I/O schedulers provided in Red Hat Enterprise Linux 4, embedded in the 2.6 kernel, have advanced the I/O capabilities of Linux significantly. With Red Hat Enterprise Linux 4, applications can now optimize the kernel I/O at boot time, by selecting one of four different I/O schedulers to accommodate different I/O usage patterns:

The following are the tunable files for the deadline scheduler. They can be tuned to any suitable value according to hardware performance and software requirements:
/sys/block/DEVNAME/queue/iosched/read_expire
/sys/block/DEVNAME/queue/iosched/write_expire
/sys/block/DEVNAME/queue/iosched/fifo_batch
/sys/block/DEVNAME/queue/iosched/write_starved
/sys/block/DEVNAME/queue/iosched/front_merges

DEVNAME is the name of block device (such as sda, sdb, hda, etc)

A detailed description of the deadline I/O scheduler can be found at:
/usr/share/doc/kernel-[version]/Documentation/block/deadline-iosched.txt.