Plucker Perl HTML Spider

The community of Plucker users could benefit from a Perl-based HTML spider that takes a parent URL, follows its links to a specified depth, and converts the retrieved pages to the standard Plucker document format.

Relevant resources:

Plucker Document Format

Plucker Workshop

Plucker Homepage
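To make "follow the links to a specified depth" concrete, here is a rough sketch of a depth-limited, breadth-first crawl. It is written in Python only for illustration (the deliverable itself must be Perl) and makes several assumptions: the root page counts as depth 0, which may not match the existing spider's convention; `crawl`, `get_links`, and the toy `SITE` graph are hypothetical names; and no real HTTP fetching or Plucker conversion is shown.

```python
from collections import deque


def crawl(root_url, get_links, max_depth):
    """Breadth-first walk from root_url, following links at most max_depth hops.

    get_links is a hypothetical callable: URL -> iterable of linked URLs.
    Returns a dict mapping each visited URL to the depth at which it was found.
    Assumption: the root page is depth 0.
    """
    depth_of = {root_url: 0}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if depth >= max_depth:
            continue  # the page itself is kept, but its links are not followed
        for link in get_links(url):
            if link not in depth_of:  # skip pages that were already queued
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of


# Toy link graph standing in for real HTTP fetches and HTML parsing.
SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": ["http://example.com/d"],
}

pages = crawl("http://example.com/", lambda u: SITE.get(u, []), max_depth=2)
```

With `max_depth=2`, the page `/c` is collected at depth 2 but its link to `/d` is not followed. Tracking visited URLs in the same dictionary also keeps the back-link from `/b` to the root from causing a loop.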

I started to rewrite the Python spider in Perl several years ago, but had to give up on the effort due to time constraints.

The spider will take several arguments, matching those of the existing Python spider currently used for this task. These include (but are not limited to):

--url: Home/parent URL for the root document (starting page)

--file: Final output filename for the completed document

--maxdepth: Maximum depth to spider the content

--bpp: Bits per pixel for images: 1, 2, 4, 8, or 16

--no-urlinfo: Do not include info about the URLs

--compression: Compression type; one of none, doc, zlib

--stayonhost: Do not follow external URLs

--stayondomain: Do not follow URLs outside this domain

--staybelow: Stay below this URL prefix

--launchable, --not-launchable: Set/unset the Launchable attribute

--backup, --no-backup: Set/unset the Backup attribute

--beamable, --not-beamable: Set/unset the Beam attribute
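The options above could be modeled roughly as follows. This is a sketch in Python for illustration only (the deliverable must be Perl, where a module such as Getopt::Long would play the same role). The defaults, the program name, the helper names `build_parser`, `naive_domain`, and `in_scope`, and the exact meaning given to the three "stay" options are all assumptions; the existing Python spider remains the reference for real behavior.

```python
import argparse
from urllib.parse import urlsplit


def build_parser():
    # Program name and every default below are illustrative assumptions.
    p = argparse.ArgumentParser(prog="plucker-spider")
    p.add_argument("--url", required=True, help="home/parent URL (starting page)")
    p.add_argument("--file", required=True, help="output filename")
    p.add_argument("--maxdepth", type=int, default=1, help="maximum spider depth")
    p.add_argument("--bpp", type=int, choices=[1, 2, 4, 8, 16], default=1)
    p.add_argument("--no-urlinfo", action="store_true")
    p.add_argument("--compression", choices=["none", "doc", "zlib"], default="zlib")
    p.add_argument("--stayonhost", action="store_true")
    p.add_argument("--stayondomain", action="store_true")
    p.add_argument("--staybelow", metavar="PREFIX")
    # Paired on/off flags share one destination; None means "not specified".
    for on, off in (("launchable", "not-launchable"),
                    ("backup", "no-backup"),
                    ("beamable", "not-beamable")):
        p.add_argument("--" + on, dest=on, action="store_true", default=None)
        p.add_argument("--" + off, dest=on, action="store_false", default=None)
    return p


def naive_domain(host):
    """Last two labels of a hostname. Naive: wrong for suffixes like co.uk."""
    return ".".join((host or "").split(".")[-2:])


def in_scope(link, args):
    """Decide whether a link may be followed under the 'stay' options."""
    root, target = urlsplit(args.url), urlsplit(link)
    if args.stayonhost and target.hostname != root.hostname:
        return False
    if args.stayondomain and naive_domain(target.hostname) != naive_domain(root.hostname):
        return False
    if args.staybelow and not link.startswith(args.staybelow):
        return False
    return True


args = build_parser().parse_args([
    "--url", "http://www.example.com/docs/", "--file", "out.pdb",
    "--maxdepth", "2", "--stayondomain", "--no-backup",
])
```

Here `args.backup` is `False`, while `args.launchable` stays `None` because neither flag of that pair was given, which lets the spider tell "explicitly unset" apart from "left at the default".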

The document format is open and available, and I can provide as many examples and resources as needed to make the work as easy as possible.

At a bare minimum, the spider must be able to handle HTML, RSS and text formats.
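One way to picture the three required formats is a small dispatcher that picks a handler from the HTTP Content-Type header. The sketch below is Python for illustration only (the deliverable must be Perl). The function name `classify` is hypothetical, and treating generic XML types as RSS is an assumption that a real spider would need to confirm by inspecting the document.

```python
def classify(content_type):
    """Map a Content-Type header value to 'html', 'rss', 'text', or None."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in ("text/html", "application/xhtml+xml"):
        return "html"
    # Assumption: generic XML is treated as a possible RSS feed.
    if mime in ("application/rss+xml", "application/xml", "text/xml"):
        return "rss"
    if mime == "text/plain":
        return "text"
    return None  # unsupported type; the caller decides whether to skip it


kinds = [classify(ct) for ct in (
    "text/html; charset=utf-8",
    "Application/RSS+XML",
    "text/plain",
    "image/png",
)]
```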

The project must leverage as many existing CPAN modules as possible (LWP, GD, XML::RSS, LWP::Parallel::UserAgent, etc.) to keep the spider itself small and compact.

For someone familiar with these modules, this should be easy to complete in a short period of time.

When bidding, please provide examples of relevant previous work you've done in this area.

The entire Plucker Open Source community of several thousand users would use this tool if it were available, so please make the code as clean, extensible, and well-commented as possible; it will be seen and reviewed by thousands of eyes across the planet.