get_links

This function extracts all URLs from a web page.

Syntax:

When called on an HTML::Miner object:
$return_element = $html_miner->get_links();
When called directly:
$return_element = get_links( $url, $optionally_html_of_url );
The direct call is intended as a simplified version of the OO call,
so it does not allow customization of the user agent and similar settings.

Output:

This function (regardless of how it is called) returns a reference to an array of hashes, one hash per URL found.

So, to access the title of the second URL found (the order is maintained), you would use:

$return_element->[1]{TITLE}
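
As a rough illustration of working with that return value, the sketch below walks a hand-built stand-in for the array reference. The sample values are invented and the variable names are hypothetical; only the key names (URL, TITLE, ANCHOR, ABS_EXISTS, DOMAIN, DOMAIN_IS_BASE, PROTOCOL, URI) are taken from the NOTES for this function. In real use the reference would come from get_links.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the array reference returned by get_links().
# Values are invented; only the key names come from the documentation.
my $links = [
    {
        URL        => '/about.html',
        TITLE      => 'About',
        ANCHOR     => '<span>About us</span>',
        ABS_EXISTS => 0,
    },
    {
        URL            => 'http://example.com/faq',
        TITLE          => 'FAQ',
        ANCHOR         => 'FAQ',
        ABS_EXISTS     => 1,
        DOMAIN         => 'example.com',
        DOMAIN_IS_BASE => 1,
        PROTOCOL       => 'http',
        URI            => '/faq',
    },
];

# Order is maintained, so index 1 is the second link found.
my $second_title = $links->[1]{TITLE};

# Read DOMAIN only for entries where ABS_EXISTS is true,
# since it is undefined otherwise.
my @domains = map { $_->{DOMAIN} } grep { $_->{ABS_EXISTS} } @$links;

# Crude tag removal for ANCHOR; HTML::Strip is the more robust choice.
( my $plain_anchor = $links->[0]{ANCHOR} ) =~ s/<[^>]+>//g;

print "$second_title\n";
print "@domains\n";
print "$plain_anchor\n";
```

The regular expression here is only a shortcut for well-formed tags like the one in the sample; it is not a general HTML stripper.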

NOTES:

If ABS_EXISTS is 0, then DOMAIN, DOMAIN_IS_BASE, PROTOCOL and URI will be undefined.
To extract URLs from an HTML snippet when the URL of the page does not matter, pass any placeholder string as the URL
and ignore everything except URL, TITLE and ANCHOR.
ANCHOR might contain HTML such as <span>; use HTML::Strip to remove it if required.

get_page_css_and_js

This function extracts all CSS style sheets and JS script files used on a web page.

Syntax:

When called on an HTML::Miner object:
$return_element = $html_miner->get_page_css_and_js(
CONVERT_URLS_TO_ABS => 0/1      # Optional argument, default is 1
);
When called directly:
$return_element = get_page_css_and_js(
URL => $url,
HTML => $optionally_html_of_url,      # Optional argument, the HTML is fetched if not provided
CONVERT_URLS_TO_ABS => 0/1,      # Optional argument, default is 1
);
The direct call is intended as a simplified version of the OO call,
so it does not allow customization of the user agent and similar settings.

Output:

This function (regardless of how it is called) returns a reference to a hash, keyed by JS and CSS, of arrays containing the URLs.
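
A minimal sketch of reading that hash of arrays follows. The data is a hand-built stand-in with invented URLs, and the variable names are hypothetical; only the JS and CSS keys come from the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by
# get_page_css_and_js(); the URLs are invented for illustration.
my $assets = {
    CSS => [ 'http://example.com/main.css', 'http://example.com/print.css' ],
    JS  => [ 'http://example.com/app.js' ],
};

# Walk both keys in a stable order and list the files under each.
for my $type ( sort keys %$assets ) {
    my @urls = @{ $assets->{$type} };
    printf "%s: %d file(s)\n", $type, scalar @urls;
    print "  $_\n" for @urls;
}

my $css_count = scalar @{ $assets->{CSS} };
my $js_count  = scalar @{ $assets->{JS} };
```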

get_redirect_destination is NOT thread safe. Use get_redirect_destination_thread_safe (described below) if it is
being called within a thread and there is a chance that any of the interim redirect URLs are HTTPS.

get_redirect_destination_thread_safe

This function takes, as its argument, a URL that may be redirected through a chain of further URLs, and returns the FINAL destination URL. It is thread safe.

This function CANNOT be called on the HTML::Miner object.
This function hits the web once for each redirect that it tracks, so finding the destination of a URL that redirects 15 times
takes 15 web requests. Do NOT use this function instead of get_redirect_destination unless you have to.
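
To make the cost concrete, here is a small simulation of following a redirect chain one hop at a time. It is not the module's implementation: the redirect table, the follow_redirects helper and the hop limit are all hypothetical, and a hash lookup stands in for each web request.

```perl
use strict;
use warnings;

# Hypothetical redirect table standing in for real HTTP responses:
# each key redirects to its value; a URL with no entry is final.
my %redirects = (
    'http://a.example/' => 'http://b.example/',
    'http://b.example/' => 'https://c.example/',
);

# Follow one hop at a time, counting one simulated web access per redirect.
# $max_hops guards against redirect loops.
sub follow_redirects {
    my ( $url, $max_hops ) = @_;
    my $hops = 0;
    while ( exists $redirects{$url} && $hops < $max_hops ) {
        $url = $redirects{$url};
        $hops++;
    }
    return ( $url, $hops );
}

my ( $final, $hops ) = follow_redirects( 'http://a.example/', 15 );
print "$final reached after $hops redirect(s)\n";
```

The number of lookups grows linearly with the length of the chain, which is why the text above advises against this approach unless thread safety requires it.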

get_images

This function extracts all images from a web page.

Syntax:

When called on an HTML::Miner object:
$return_element = $html_miner->get_images();
When called directly:
$return_element = get_images( $url, $optionally_html_of_url );
The direct call is intended as a simplified version of the OO call,
so it does not allow customization of the user agent and similar settings.

Output:

This function (regardless of how it is called) returns a reference to an array of hashes, one hash per image found.

So, to access the alt text of the second image found (the order is maintained), you would use:

$return_element->[1]{TITLE}

NOTE:

If ABS_EXISTS is 0, then IMG_DOMAIN and DOMAIN_IS_BASE will be undefined.
To extract images from an HTML snippet when the URL of the page does not matter, pass any placeholder string as
the URL and ignore the absolute locations and domains, which are derived from that placeholder.
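
The ABS_EXISTS check can be sketched as follows, counting images per domain. The data is a hand-built stand-in with invented values and hypothetical variable names; only the ABS_EXISTS, IMG_DOMAIN and DOMAIN_IS_BASE key names come from the note above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the array reference returned by get_images().
# Values are invented; only the key names come from the documentation.
my $images = [
    { ABS_EXISTS => 1, IMG_DOMAIN => 'cdn.example.com', DOMAIN_IS_BASE => 0 },
    { ABS_EXISTS => 0 },
    { ABS_EXISTS => 1, IMG_DOMAIN => 'cdn.example.com', DOMAIN_IS_BASE => 0 },
    { ABS_EXISTS => 1, IMG_DOMAIN => 'example.com',     DOMAIN_IS_BASE => 1 },
];

# Count images per domain, skipping entries where ABS_EXISTS is 0
# because IMG_DOMAIN is undefined for them.
my %count;
for my $img (@$images) {
    next unless $img->{ABS_EXISTS};
    $count{ $img->{IMG_DOMAIN} }++;
}

printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```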

get_meta_elements

This function retrieves the following meta elements for a given URL (or HTML snippet):

Page Title
Meta Description
Meta Keywords
Page RSS Feeds

Syntax:

It is called through the HTML::Miner Object as follows:

$return_hash = $html_miner->get_meta_elements( );

It is called directly as follows:

$return_hash = get_meta_elements(
URL => "url_of_page",
HTML => "html_of_page"
);
Note: The above function requires either the HTML or the URL. If the
HTML is not provided then the URL is used to retrieve the HTML.
If neither is provided this function will croak.
Again, this function does not allow customization of the user agent
and timeout when called directly.
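
The argument rule in the note can be sketched as below. This is an illustration of the described behavior, not the module's code: the check_meta_args helper and its return strings are hypothetical.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper mirroring the rule in the note: HTML wins if given,
# otherwise the URL is used to fetch it, and having neither is fatal.
sub check_meta_args {
    my (%args) = @_;
    croak "Either URL or HTML must be provided"
        unless defined( $args{URL} ) || defined( $args{HTML} );
    return defined( $args{HTML} ) ? 'use supplied HTML' : 'fetch HTML from URL';
}

my $with_url  = check_meta_args( URL => 'http://example.com/' );
my $with_html = check_meta_args(
    URL  => 'http://example.com/',
    HTML => '<html><head><title>Example</title></head></html>',
);

# Trap the croak with eval so the script can report it and carry on.
my $ok = eval { check_meta_args(); 1 };

print "$with_url\n";
print "$with_html\n";
print "croaked: $@" unless $ok;
```

Wrapping a direct call in eval in the same way lets a caller handle the missing-argument case without the program dying.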

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

