On Wed, Dec 03, 2003 at 04:21:10AM -0800, John Angel wrote:
> Is it possible to have default configuration for all sites (using Prog
> mode)?
>
> E.g. I want to index several sites using the same settings for agent, email,
> keep_alive, etc. Is it necessary to repeat all that parameters for all
> websites in the list?
I'm not quite clear. You want to reuse the settings for different
sites?
The spider config file is loaded with a do() call in Perl, which simply
executes the commands in that file. All that's required by that file is
to set a variable called @servers. So, that file can do anything -- it
can read parameters from a database if you like.
If you just want to use the same settings for a bunch of servers that
get indexed at the same time you can do something like this:
    my %default_config = (
        agent    => 'swish-e spider http://swish-e.org/',
        email    => 'swish@domain.invalid',

        # limit to only .html files
        test_url => sub { $_[0]->path =~ /\.html?$/ },
    );

    my @hosts_to_spider = qw(
        http://first.host/index.html
        http://second.host/index.html
        http://third.host/index.html
    );

    # Now push the configs onto the @servers array
    for my $cur_host ( @hosts_to_spider ) {
        my %this_host = (
            base_url => $cur_host,
            %default_config,  # copy in default parameters
        );
        push @servers, \%this_host;
    }
> The another problem with that is local setting of use_md5=1. How to avoid
> duplicates from different servers if this option is not set globally?
The md5 hashes are stored globally, but the option to check them is per
server.
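So if you want duplicate detection across every server spidered in one run, one approach is to put use_md5 into the shared defaults so each per-server config inherits it (a sketch, reusing the %default_config pattern from earlier):

```perl
my %default_config = (
    agent    => 'swish-e spider http://swish-e.org/',
    email    => 'swish@domain.invalid',
    use_md5  => 1,   # hash page content; skip documents already seen this run
);
```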
--
Bill Moseley
moseley@hank.org