The ineffecient method:
-pull all results
-check each field (substring(search)). ie. If substr($row[$result]) == $whatever_the_user_searched_for) { then put in results array.}

print the results array. that's just a quick idea.

welo

11-04-2003, 03:36 AM

phpCodeGenie (http://phpcodegenie.sourceforge.net) might help you with this. I don't think the code it generates is compatible with register_globals off yet (been awhile since I looked at it), but it gives you lots of ideas for search form functions that should help you along considerably.

weronpc

11-11-2003, 12:45 AM

Celtboy:
I like yoiur little quick idea, but is there a little more advancing method for searching? More complicate one??

welo:
that's for the site, what I am looking for is idea, not code, but thanx still...

Mike

firepages

11-11-2003, 02:48 AM

ok :)

you can either grab a ready-made spider like phpdig or roll your own .

how you populate is up to you , but the basic plan is ...to write a script that logs every page of your site and keeps a reference to it in the spider table , including the path to the page , and optionally some sample content.

(when I say every page of the site that does not actually mean the page itself , rather the database that provides the content for that page so your spider may or may not contain an actual path to the real page , perhaps just the _GET data required to build it)

At the same time your script will take each individual word in that page and store it in keywords (but only if it does not already exist (eg keywords is unique)
(Most systems will filter out common words like 'like , and ,the' etc)

Each word that is added to the keywords table is also added to the engine table (or rather its key_id from the keyword table) along with the spider_id , if a word is found in the same page more than once then instead of adding a new record to the engine table you increment the weight field to show that that word shows up $x times in that page.

Thats why its important to filter out 'I,the,and' etc as the engine table gets very big very quickly , though as it only holds integer values its still very fast to search.

How you run your scripts to populate the DB is up to you , some systems (eg this forum) will run the above script on this post as soon as I submit it ... others will run the script on the whole site at regular intervals.

To search your tables you would (at the simplest level)

SELECT key_id FROM keywords WHERE keyword='$searchword'

then

SELECT * FROM spider where key_id=$keyword_id_result ORDER by weight

you may want to join those and its normally more complex as you may expect several words to search , search queries for these types of searches are indvidually quite complex so its common to first fetch just the spider.spider_id , store those id's somewhere (flatfile/session/temp_db) and then query those results later for actual content.

there are lots of more complex variations for weighting but they all start to make the engine table slow and bulky or rely on stored procedures in more robust databases than mysql.

The main advantage to rolling your own as it were is that you can customize what pages you index and to what extent , eg I have a site where I search all the tables of a DB and index them but give double-bubble to a field 'list_keywords' eg I double the weight of any words from that particular field which makes sense in that particular application.

weronpc

01-28-2004, 03:09 AM

That's really great answer you gave me, thank you so much for your time.

I still having so misunderstanding, I know it's a many to many relationship, that's why the engine table exist.

question 1,
do you know where I can find more of those common key workds? such as 'like, and, the' etc.

question 2
Just making sure, $keyword_id_result is the words user entered or is the result from the select statement? Also, I thought the spider table does not have weight field.

SELECT * FROM spider where key_id=$keyword_id_result ORDER by weight

question 3
I don't understand what you are trying to say.

How you run your scripts to populate the DB is up to you , some systems (eg this forum) will run the above script on this post as soon as I submit it ... others will run the script on the whole site at regular intervals.