On Wed, Mar 03, 2004 at 12:31:14PM -0800, Gil Vidals wrote:
> Well, if it will only take you a minute, could you do it cheaper than by the
> day ;-) If it's as easy as you say, can you just show me how this is done?
How what is done? Indexing?
moseley@bumby:~$ cat c
HTMLLinksMetaName links
moseley@bumby:~$ cat 1.html
<html>
<head>
<title>Title</title>
</head>
<body>
text <a href="http://www.abc.com">abc site</a>
</body>
moseley@bumby:~$ swish-e -c c -i 1.html -T indexed_words -v0
Adding:[1:swishdefault(1)] 'title' Pos:2 Stuct:0x7 ( HEAD TITLE FILE )
Adding:[1:links(10)] 'http' Pos:5 Stuct:0x9 ( BODY FILE )
Adding:[1:links(10)] 'www' Pos:6 Stuct:0x9 ( BODY FILE )
Adding:[1:links(10)] 'abc' Pos:7 Stuct:0x9 ( BODY FILE )
Adding:[1:links(10)] 'com' Pos:8 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'text' Pos:9 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'abc' Pos:10 Stuct:0x9 ( BODY FILE )
Adding:[1:swishdefault(1)] 'site' Pos:11 Stuct:0x9 ( BODY FILE )
OK, so the link http://www.abc.com was indexed as four words: http, www,
abc, and com. (That can be changed with WordCharacters, but I like being
able to search for "abc.com" and still find it.)
So to search:
moseley@bumby:~$ swish-e -w 'links=("www.abc.com")' -H0
1000 1.html "Title" 108
> It should search <a href> tags; however, javascript links should be searched
> as well.
All bets are off with javascript. You need a javascript interpreter to
figure those links out. If they are simple, you could filter the files and
convert the javascript links into something that swish-e can index (e.g.
convert them to meta tags).
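
As a rough sketch of that kind of filter (not part of swish-e itself), the
Python below looks for one simple pattern, a literal URL assigned to
location or location.href, and adds a meta tag for each URL it finds. The
function name add_link_metas, the regular expression, and the meta name
"links" are all assumptions made for illustration. The index configuration
would also have to recognize that meta name, and real javascript links may
need different patterns.

```python
import re
from html import escape

# Assumed pattern: only catches simple assignments such as
#   window.location = 'http://...'   or   location.href = "http://..."
JS_LINK = re.compile(r"""location(?:\.href)?\s*=\s*(['"])(https?://[^'"]+)\1""")

# Opening <head> tag (but not <header>)
HEAD_TAG = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def add_link_metas(html_text, meta_name="links"):
    """Return html_text with one meta tag added per javascript URL found."""
    urls = [m.group(2) for m in JS_LINK.finditer(html_text)]
    if not urls:
        return html_text
    metas = "".join(
        '<meta name="%s" content="%s">\n' % (meta_name, escape(u, quote=True))
        for u in urls
    )
    # Insert right after the opening <head> tag; if the document has no
    # <head>, the text comes back unchanged.
    return HEAD_TAG.sub(lambda m: m.group(0) + "\n" + metas, html_text, count=1)
```

The filtered output, rather than the original file, is what would be handed
to the indexer.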
You can use the included swish.cgi or search.cgi examples for creating a
search interface. Look at http://search.apache.org/ -- it has a way to
search "HTML Links".
> The code should search the entire site up to N pages deep.
Filter results by number of path segments.
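
A minimal sketch of that filtering, assuming the result paths or URLs have
already been collected into a list (for example from the search output).
The names path_depth and within_depth are made up for this example. Note
that counting path segments only approximates "N pages deep"; it does not
measure how many links away a page is.

```python
from urllib.parse import urlparse


def path_depth(path_or_url):
    """Count the non-empty path segments of a file path or URL."""
    path = urlparse(path_or_url).path
    return len([seg for seg in path.split("/") if seg])


def within_depth(results, max_depth):
    """Keep only the results whose path has at most max_depth segments."""
    return [r for r in results if path_depth(r) <= max_depth]


if __name__ == "__main__":
    hits = [
        "http://www.abc.com/index.html",
        "http://www.abc.com/a/b/c.html",
    ]
    print(within_depth(hits, 2))
```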
>
>
>
> -----Original Message-----
> From: Bill Moseley [mailto:moseley@hank.org]
> Sent: Wednesday, March 03, 2004 12:28 PM
> To: Gil Vidals
> Cc: Multiple recipients of list
> Subject: Re: contract work for a site search utility
>
>
> On Wed, Mar 03, 2004 at 12:12:27PM -0800, Gil Vidals wrote:
> > I've downloaded and studied Swish-e. My company, Position Research, has a
> > small project which involves locating a given URL on a given website. For
> > example, use Swish-e to see if the url www.123.com is anywhere on the site
> > www.abc.com. If it is, then return the page from www.abc.com where the
> > link to www.123.com was found.
>
> You mean search href tags?
>
> > Let me know if you are interested and approximately how many hours of work
> > is required to produce the perl code.
>
> HTMLLinksMetaName links
>
> Less than a minute. But I charge by the day. Invoice to follow.
>
> Or do you mean something more custom than that?
>
> --
> Bill Moseley
> moseley@hank.org
>
>
>
>
>
--
Bill Moseley
moseley@hank.org