Global symbol errors - all along the way... need help with a little script

Hello dear Perl experts,

I'm pretty new to programming, and to OO programming especially. Nonetheless, I'm trying to build a very simple spider for web crawling.

Here's what I can't get to work:

Code

#!C:\Perl\bin\perl

use strict;    # You always want to include both strict and warnings
use warnings;

use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;

# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open my $file1, "+>>", "links.txt" or die "Cannot open links.txt: $!";
select($file1);

# The URL I want it to start at.
# Note that I've made this an array, @urls, rather than a scalar, $URL
my @urls = ('http://www.computersecrets.eu.pn/');

# I'm not positive, but this should only need to be set up once, not
# on every pass through the loop
my $browser = LWP::UserAgent->new(agent => 'IE 6');
$browser->timeout(10);

# Request and receive contents of a web page.
# Need to use a while loop instead of a for loop because @urls will
# be changing as we go
while (@urls) {
    my $url = shift @urls;
    my $request = HTTP::Request->new(GET => $url);    # $url, not $URL: names are case-sensitive
    my $response = $browser->request($request);

    # Tell me if there is an error
    if ($response->is_error()) { printf "%s\n", $response->status_line; }
    my $contents = $response->content();
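
For reference, a "Global symbol ... requires explicit package name" error under use strict means that a variable is used without having been declared with my. A common cause is a case mismatch such as $URL versus $url, since Perl variable names are case-sensitive. A minimal sketch that reproduces the message; the helper name compile_error and the example.com URL are made up for illustration:

```perl
use strict;
use warnings;

# Compile a snippet inside a string eval so that the compile-time
# error message can be captured instead of aborting the script.
# (compile_error is a hypothetical helper, not part of any module.)
sub compile_error {
    my ($code) = @_;
    eval "use strict; $code; 1";
    return $@;    # empty string when the snippet compiled and ran
}

# $url is declared, but $URL (upper case) is a different, undeclared variable.
my $broken = 'my $url = "http://www.example.com/"; my $copy = $URL;';
my $fixed  = 'my $url = "http://www.example.com/"; my $copy = $url;';

print compile_error($broken);
print "fixed version compiles cleanly\n" if compile_error($fixed) eq '';
```

The first print should show a message starting with Global symbol "$URL" requires explicit package name; the second snippet differs only in the case of the variable name.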

Re: [dilbert] Global symbol errors - all along the way... need help with a little script

Hello,

Just one problem I see in your fixed script.

The line: next if $visited{$url};

really should be: next if $visited{$url}++;

The first time you see a particular URL, its entry in %visited is undefined (false), so the next doesn't apply. The next time you find the same URL, the entry is defined (value == 1 == true), so the next executes.

The '++' is a post-increment: the value is tested first and incremented afterwards. That is why the check lets a URL through the first time it is found and skips it every time after that.
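
A small sketch of that idiom on its own, assuming the URLs arrive as a plain list; the sub name unique_urls and the example hosts are made up for illustration:

```perl
use strict;
use warnings;

# Return the URLs in first-seen order, skipping any repeats.
sub unique_urls {
    my @queue = @_;
    my %visited;
    my @fresh;
    for my $url (@queue) {
        # Post-increment: the old value (undef, i.e. false, on the first
        # sighting) is tested, then the counter is bumped to 1, 2, ...
        next if $visited{$url}++;
        push @fresh, $url;
    }
    return @fresh;
}

print "$_\n" for unique_urls(qw(http://a.example/ http://b.example/ http://a.example/));
```

With plain next if $visited{$url}; nothing ever sets the hash entry, so the test would never become true and duplicates would not be skipped.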

Re: [Chris Charley] Global symbol errors - all along the way... need help with a little script

Hello dear Chris,

Many thanks for the reply; great to hear from you.

Well, great. I want to fetch all URLs that contain a certain set of characters, "bar", for example:

Code

"http://www.foo.com/bar"

So the code from the start of the thread should search for all the links containing

Code

bar

so we have to rewrite this a bit...
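
One possible way to do that filtering step, sketched under the assumption that the links have already been collected into a list; the sub name urls_containing and the sample links are invented for illustration:

```perl
use strict;
use warnings;

# Return only the URLs that contain the given substring.
# index() does a plain substring search, so no regex escaping is needed.
sub urls_containing {
    my ($needle, @urls) = @_;
    return grep { index($_, $needle) >= 0 } @urls;
}

my @links = (
    'http://www.foo.com/bar',
    'http://www.foo.com/baz',
    'http://www.foo.com/bar/page.html',
);

print "$_\n" for urls_containing('bar', @links);
```

Note that this matches "bar" anywhere in the URL, including the host name; searching for "/bar" instead would narrow it to path segments that start with it.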

Re: [dilbert] Global symbol errors - all along the way... need help with a little script

You've posted 185 questions since Sept 2010, often cross-posting them on other sites, and in a large number of those you were told how to fix Global symbol errors, so you should be able to fix that part on your own.

Please post a short sample "links.txt" file so we can run some tests.

I see several problems in your code, but before I comment on them I want to run a few tests using your links file.

Part of the solution will be to add another module, most likely either URI or URI::Split.
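
As a rough idea of what the URI module offers, here is a minimal sketch. It assumes URI is installed from CPAN (it normally comes along with LWP) and that the input is an http or https URL; the sub name url_parts is invented for illustration:

```perl
use strict;
use warnings;
use URI;

# Break an http(s) URL into a few of its components.
sub url_parts {
    my ($url) = @_;
    my $uri = URI->new($url);
    return {
        scheme => $uri->scheme,
        host   => $uri->host,    # only available for server-based schemes such as http
        port   => $uri->port,    # falls back to the scheme's default port
        path   => $uri->path,
    };
}

my $parts = url_parts('http://www.xy.com/participants-database/');
print "$parts->{scheme} $parts->{host} $parts->{path}\n";
```

URI::Split provides a lighter, function-based alternative (uri_split) that returns scheme, authority, path, query and fragment without building an object.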

Re: [FishMonger] Global symbol errors - all along the way... need help with a little script

Hello dear FishMonger,

Many, many thanks for the reply; great to hear from you. You're right.

I have made some efforts in PHP and Perl; for some tasks Perl is the language of choice.

Below is the code that works and that is the base for some further changes. The new task: I want to modify the script a bit (tailoring and tinkering is the way to learn) so that it fetches URLs with a certain content in the URL string.

In other words, the aim is:

- I need to fetch all the URLs that contain the term "/bar".
- After fetching the URLs, I want to cut out the "bar" part so that what remains is the URL of the whole construct: http://www.xy.com/participants-database/

But first of all, here is the code that works, the base of my weekend project:

Code

#!C:\Perl\bin\perl

use strict;    # You always want to include both strict and warnings
use warnings;

use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;

# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open my $file1, "+>>", "links.txt" or die "Cannot open links.txt: $!";
select($file1);

The results: I got back more than 200 lines; see the output sample below:

Well, to achieve this I need to tailor the script a bit. And yes, I think I need split. Why? Once I have the results (I guess 200 or more lines), I want to extract parts of the URLs using regular expressions. I have to strip the URL

- scheme://domain:port..../participants-database/

so that I can get the domain, and with it the URLs alone.

Well, that can be done with Perl like so:

The general format of a URL is scheme://domain:port/path?query_string#fragment_id

While the domain (and possibly other parts of the URL) may contain Unicode characters, in the following we assume that only ASCII characters are used. Furthermore, we assume that

Code

scheme only consists of letters a–z and A–Z;
domain does not contain :, ?, # or /;
port is a natural number, :port is optional;
path does not contain ? or #, path is optional;
query_string does not contain #, ?query_string is optional;
fragment_id can contain arbitrary characters, #fragment_id is optional.
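
Under exactly these assumptions, the splitting could be sketched with a single regular expression. The names $URL_RE and parse_url and the sample URL are made up for illustration, and real-world URLs can violate the assumptions, so this is not a general-purpose parser:

```perl
use strict;
use warnings;

# One capture group per URL part, following the assumptions listed above.
my $URL_RE = qr{
    \A
    ([A-Za-z]+) ://          # scheme: letters only
    ([^:/?\#]+)              # domain: no colon, question mark, hash or slash
    (?: : (\d+) )?           # optional port: digits after a colon
    (?: / ([^?\#]*) )?       # optional path: no question mark or hash
    (?: \? ([^\#]*) )?       # optional query string: no hash
    (?: \# (.*) )?           # optional fragment: anything
    \z
}x;

# Return a hash reference with the parts, or nothing if the URL does not match.
sub parse_url {
    my ($url) = @_;
    my ($scheme, $domain, $port, $path, $query, $fragment) = $url =~ $URL_RE
        or return;
    return {
        scheme   => $scheme,
        domain   => $domain,
        port     => $port,
        path     => $path,
        query    => $query,
        fragment => $fragment,
    };
}

my $parts = parse_url('http://www.xy.com:8080/participants-database/?a=1#top');
print "$parts->{domain}\n" if $parts;
```

Optional parts that are absent come back as undef, so a caller only interested in the domain can ignore everything else in the hash.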