> Please think about this a little more. In essence you will have to
> proxy/filter the retrieved HTML page. This means that the base URL will
> be different so all relative links and URLs for any components
> referenced from that page (images, ActiveX controls, applets, etc.) will
> have to be modified. The proxy will have to interpret the HTML and
> modify the right tags.
> Now think about the problems with CSS, Layers, JavaScript, Frames,
> etc...

for the project this will be used in (at least one part of it), there
will be a set of files (pure HTML, no scripts) to be indexed
(about 1.5 GB of data currently). these files will be used only
for this search engine, so this particular instance should be ok with
this solution (once i get the spaces thing working right).
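
for the pure-HTML case, the link rewriting described above could be
sketched roughly like this. it is only a minimal sketch, not what any
real proxy does: it assumes Python's standard html.parser and
urllib.parse modules, the names LinkRewriter, rewrite_links and
URL_ATTRS are made up for illustration, example.com is a placeholder
host, and the attribute list is not exhaustive. it ignores CSS,
scripts and frames entirely.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes that commonly carry URLs in plain HTML (assumption: not exhaustive).
URL_ATTRS = {"href", "src", "action", "background"}


class LinkRewriter(HTMLParser):
    """Re-emit an HTML page with relative URLs resolved against base_url."""

    def __init__(self, base_url):
        # Keep character references in text as-is so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []

    def _emit_tag(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if value is None:          # attribute without a value, e.g. "ismap"
                parts.append(name)
                continue
            if name in URL_ATTRS:      # resolve relative URLs; absolute ones pass through
                value = urljoin(self.base_url, value)
            # The parser unescapes attribute values, so escape them again.
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def rewrite_links(html_text, base_url):
    """Return html_text with URL attributes made absolute relative to base_url."""
    parser = LinkRewriter(base_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

calling rewrite_links(page, "http://example.com/dir/page.html") would
turn href="docs/a.html" into href="http://example.com/dir/docs/a.html"
and leave already-absolute links alone. re-serializing the tags like
this normalizes quoting and tag case, which should be fine for files
that are only fed to the indexer.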

>
> I know this is not impossible to solve since my company actually does
> this... I know how big a job this is to get right. :-)

that is a good feeling. at least i know it's possible ;)

cam

Cam Proctor, System Administrator, PowerNET ISP bcp1@kopower.com
---------------------------------------------------------------------------
I've had answers for years...Nobody'd ever listen...They burned themselves
with a choice made long ago by a conspiracy of men who thought they could
sleep with the enemy, only to awaken another enemy...(What does that mean?)
It means, the future is here... All bets are off.
