nutch-dev mailing list archives

Can I add a url to be crawled without putting it in a file and feeding it to "Inject"?

Date: Wed, 05 Aug 2009 16:57:46 GMT

I want to do some site-specific crawling: crawl one site with one set of
urls to accept/reject, then reset and crawl another site with another set
of urls to accept/reject, and so on. I'm writing my own wrapper that puts
the accept/reject urls into the Configuration, plus a URLFilter that reads
that configuration item to do the accepting/rejecting. What I don't see is
how to make the crawl start at a given url other than by making a dir/url
file containing just that url. In this case that's inefficient, and I'd
rather parse one file holding a list of urls and the accept/reject list
for each url, then say "Inject this url", run my own
generate/fetch/updatedb cycle, then inject the next url and repeat.
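
For what it's worth, the accept/reject core of such a filter can be sketched
on its own, independent of the Nutch plumbing. This is only a hypothetical
illustration: in a real plugin the class would implement
org.apache.nutch.net.URLFilter and pull its patterns out of the Hadoop
Configuration in setConf(); the class name, constructor, and pattern lists
below are all made up for the example.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the accept/reject logic a per-site URLFilter might
// use. The Nutch/Hadoop wiring (implements URLFilter, setConf/getConf) is
// omitted so the logic stands alone.
public class SiteFilter {
    private final List<Pattern> accept;
    private final List<Pattern> reject;

    public SiteFilter(List<String> acceptRegexes, List<String> rejectRegexes) {
        this.accept = acceptRegexes.stream().map(Pattern::compile).toList();
        this.reject = rejectRegexes.stream().map(Pattern::compile).toList();
    }

    // Mirrors URLFilter.filter(String): return the url to keep it, null to drop it.
    public String filter(String url) {
        for (Pattern p : reject) {
            if (p.matcher(url).find()) return null;  // explicit reject wins
        }
        for (Pattern p : accept) {
            if (p.matcher(url).find()) return url;   // matched an accept rule
        }
        return null;                                 // default: reject
    }

    public static void main(String[] args) {
        SiteFilter f = new SiteFilter(
            List.of("^https?://example\\.com/"),
            List.of("\\.(gif|jpg|png)$"));
        System.out.println(f.filter("http://example.com/page.html")); // kept
        System.out.println(f.filter("http://example.com/logo.png"));  // dropped: null
    }
}
```

Swapping the two pattern lists between sites is then just a matter of
constructing a new filter (or, in the wrapper described above, rewriting the
configuration item) before each inject/generate/fetch/updatedb cycle.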
--
http://www.linkedin.com/in/paultomblin