ht://Check Features

ht://Check is made up of two logical parts: a "spider", which checks URLs
starting from a single URL or from a list of them; and an "analyser", which
takes the spider's results and shows summaries (either on the console or
through the PHP interface served by a web server).

The "Spider" or "Crawler"

- HTTP/1.1 compliant, with support for persistent connections and cookies
- HTTP Basic authentication supported
- HTTP Proxy support (basic authentication included)
- Crawl customisable through many configuration attributes, which let the user
limit the crawl by URL pattern matching and by distance ("hops") from the first URL
- MySQL databases directly created by the spider
- MySQL connections through user or general option files as defined by the
database system (/etc/my.cnf or ~/.my.cnf)
- No support for JavaScript, or for anything other than HTTP (such as HTTPS,
FTP, NNTP and local files)
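
The way hop and URL pattern limits can bound a crawl may be sketched as
follows. This is only an illustration in Python, not the spider's own code:
the in-memory LINKS graph stands in for pages fetched over HTTP, and the
names crawl, max_hops, limit_pattern and exclude_pattern are hypothetical,
not ht://Check configuration attributes.

```python
import re
from collections import deque

# Hypothetical link graph standing in for pages fetched over HTTP.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/b", "http://other.example.org/"],
    "http://example.com/b": ["http://example.com/c"],
    "http://example.com/c": [],
}

def crawl(start, max_hops, limit_pattern, exclude_pattern):
    """Breadth-first walk recording the hop distance of every accepted URL."""
    limit = re.compile(limit_pattern)
    exclude = re.compile(exclude_pattern)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if hops[url] >= max_hops:
            continue  # links found on this page would exceed the hop limit
        for target in LINKS.get(url, []):
            if target in hops:
                continue  # already scheduled
            if not limit.search(target) or exclude.search(target):
                continue  # rejected by the URL patterns
            hops[target] = hops[url] + 1
            queue.append(target)
    return hops

result = crawl(
    "http://example.com/",
    max_hops=2,
    limit_pattern=r"^http://example\.com/",
    exclude_pattern=r"/private/",
)
print(result)
```

With these sample values the walk stays on example.com, skips the /private/
URL, and stops following links once a page is two hops from the start.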

The "Analyser"

A preface: since all the data gathered during a crawl are stored in a MySQL
database, it is fairly easy to get the information you want by querying the
database. The spider is included in the 'htcheck' application, which prints a
small text report of its own when it finishes. Afterwards you can retrieve
information from the database at any time, either by building your own
interface (in PHP or Perl, for instance) or by using the default one written
in PHP.
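
As a rough idea of what "querying the database" can look like, here is a
small Python sketch. It makes several assumptions: it uses an in-memory
SQLite database as a stand-in for MySQL so that it runs on its own, and the
table demo_url with its columns url and status_code is invented for the
example. It is not the real ht://Check schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL crawl database.
conn = sqlite3.connect(":memory:")

# Hypothetical table and columns, NOT the actual ht://Check schema.
conn.execute("CREATE TABLE demo_url (url TEXT PRIMARY KEY, status_code INTEGER)")
conn.executemany(
    "INSERT INTO demo_url VALUES (?, ?)",
    [
        ("http://example.com/", 200),
        ("http://example.com/a", 200),
        ("http://example.com/missing", 404),
        ("http://example.com/error", 500),
    ],
)

def status_summary(connection):
    """Count the stored URLs per HTTP status code."""
    cursor = connection.execute(
        "SELECT status_code, COUNT(*) FROM demo_url "
        "GROUP BY status_code ORDER BY status_code"
    )
    return cursor.fetchall()

def broken_urls(connection):
    """List the URLs whose status code signals a client or server error."""
    cursor = connection.execute(
        "SELECT url FROM demo_url WHERE status_code >= 400 ORDER BY url"
    )
    return [row[0] for row in cursor]

summary = status_summary(conn)
broken = broken_urls(conn)
print(summary)
print(broken)
```

Against a real crawl database the same kind of SELECT would be sent through
a MySQL client, using the table and column names of the actual schema.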

I also believe that ht://Check builds a data source that can be used for
Web structure mining, revealing knowledge about the relationships within
and between documents. Web usage mining tools can also find interesting
information in ht://Check data, and use it as an auxiliary data source
to build a sort of site map.
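
To make the idea a little more concrete, the Python sketch below derives two
simple structural views from a list of (source, target) link pairs: how many
links point to each page, and a site map in which each page is placed under
the first page found linking to it. The LINK_PAIRS data and the function
names are made up for the example; in practice the pairs would come from the
crawl database.

```python
from collections import Counter, defaultdict, deque

# Hypothetical (source, target) link pairs, as a crawl might record them.
LINK_PAIRS = [
    ("/", "/docs"),
    ("/", "/news"),
    ("/docs", "/docs/install"),
    ("/news", "/docs/install"),
    ("/docs/install", "/"),
]

def inbound_counts(pairs):
    """Count how many links point to each page."""
    return Counter(target for _, target in pairs)

def site_map(pairs, root):
    """Breadth-first spanning tree: page -> list of child pages."""
    outgoing = defaultdict(list)
    for source, target in pairs:
        outgoing[source].append(target)
    tree = {root: []}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in outgoing[page]:
            if target not in tree:  # first time this page is reached
                tree[target] = []
                tree[page].append(target)
                queue.append(target)
    return tree

inbound = inbound_counts(LINK_PAIRS)
tree = site_map(LINK_PAIRS, "/")
print(inbound)
print(tree)
```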

The database schema

Here you can find a very *sketchy* entity-relationship diagram of every
ht://Check database created by the spider.

The tables as dumped by the 'mysqldump' program

Here follows the structure of the tables of a typical ht://Check database,
as produced by the mysqldump program. Please refer to the MySQL
documentation for further information. And if you have any useful advice or
suggestions regarding the database (and of course anything else), please
send me an e-mail! :-)