I just realized that "screen" is a stopword. Could it be removed? There is an application called screen (everyone probably knows that), and I have answered questions about it at least 2 or 3 times.

I can understand the reasons for having a list like this, but I'd say that having this many words on it is just overkill. Many of the words on the list are very useful under some circumstances.

For example, let's say you're having a problem compiling kde 3.4. Such a problem would usually mean that there's something wrong with the ebuild, so other users have likely seen the same problem and posted a fix for it. I'd expect a search for "kde 3.4 compile error" to turn up some useful posts within the first few results, but the current search would just turn up useless crap about kde 3.4.

I've run into the problem several times without knowing why I got useless results. I'd say that slightly slower servers would be better than people thinking that the forums are just full of crap.

Actually, it'll just turn up useless crap about kde. "3.4" (like other similar numbers) isn't indexed. The stopwords list was generated based on how often each word is used. For example, "the" and "gentoo" are not likely to be helpful search words.

Well, I guess that makes my example somewhat less than useful, but the underlying point still stands.

If the list was just automatically generated and there aren't any specific objections to removing the more useful words, I'd be glad to go through and find such words.

Well, it wasn't a script that did it without human intervention. Someone looked at the worst offenders and filtered out the obviously useful terms, such as kde.
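For anyone wondering what "generated by frequency, then filtered by hand" could look like, here is a minimal sketch. It is not the script that was actually used on these forums; the function names, the sample posts, and the `keep` set are all made up for illustration. It counts how many posts each word appears in, takes the most common ones as candidates, and then drops any word a human has marked as useful.

```python
import re
from collections import Counter

def tokenize(text):
    # Alphabetic words only. Tokens such as "3.4" yield nothing,
    # which mirrors the remark that such numbers are not indexed.
    return re.findall(r"[a-z]+", text.lower())

def build_stopwords(posts, top_n, keep):
    # Count the number of posts each word occurs in (not total occurrences).
    counts = Counter()
    for post in posts:
        counts.update(set(tokenize(post)))
    # The most frequent words are the stopword candidates...
    candidates = [word for word, _ in counts.most_common(top_n)]
    # ...and a hand-maintained set of useful terms is taken back out.
    return {word for word in candidates if word not in keep}

# Hypothetical sample data, only for demonstration.
posts = [
    "the kde 3.4 compile error on gentoo",
    "gentoo and the screen session",
    "the kde panel crashed on gentoo",
]
stopwords = build_stopwords(posts, top_n=4, keep={"kde"})
print(sorted(stopwords))
```

In this toy data "kde" is frequent enough to become a candidate, but it survives because it is in the `keep` set. A word like "screen" would need to be added to that set in the same way.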

what happens when you put a phrase in speech marks like in a search engine?

Wow. No wonder searching for anything in the forums is such a whippin'. I agree with this concept for keywords, but it is completely counterproductive for phrases, such as the "you have mail" example. As a sysadmin, I can sympathize with the resources being consumed, but that's what we have computers for: to do things that are hard for humans to do. If the computer has to work hard, then it's doing its job. Breaking the system for humans so that things are easy on the computer doesn't seem like the proper way to handle a problem.

Breaking the system for humans so that things are easy on the computer doesn't seem like the proper way to handle a problem.

When the hardware can no longer perform the tasks demanded by humans, and there is no reasonable means to ensure power is available as load increases, tradeoffs need to be made. If you'd like to donate an 8-way dual-core Opteron system with SCSI disks and many gigs of RAM, that might delay it for a while.

Is it possible to, say, not index those words in the actual posts (but take them from the post subject), and get rid of the stopwords altogether? Or block them when they appear alone, but allow some of them when they are combined with other (valid) words? For instance, have a complete block list (like RTFM, the, or LOL) and a partial block list (error, screen, compile...) for words that only count in combination?

Because just stripping them seems like it will lower the accuracy of the search results a lot...
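A rough sketch of the two-list idea, to make the proposal concrete. This is only one reading of it and not how phpBB works; the list contents come from the post above, while the names `ALWAYS_BLOCKED`, `BLOCKED_WHEN_ALONE`, and `filter_query` are made up. Words on the first list are always dropped. Words on the second list are kept only if the query also contains at least one word that is on neither list.

```python
# Hypothetical lists, using the examples from the post.
ALWAYS_BLOCKED = {"the", "rtfm", "lol"}               # never searchable
BLOCKED_WHEN_ALONE = {"error", "screen", "compile"}   # searchable only in combination

def filter_query(query):
    words = query.lower().split()
    # Fully blocked words are removed no matter what.
    kept = [w for w in words if w not in ALWAYS_BLOCKED]
    # If only partial-list words remain, there is no "valid" word
    # to combine them with, so the query is rejected as a whole.
    if all(w in BLOCKED_WHEN_ALONE for w in kept):
        return []
    return kept

print(filter_query("screen"))              # rejected: partial-list word on its own
print(filter_query("the screen detach"))   # "screen" survives next to "detach"
print(filter_query("kde compile error"))   # all three words are used
```

Under this reading a query like "compile error" would still be rejected, because both words are on the partial list. Whether that is the desired behaviour is a design choice the sketch does not settle.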

I agree with cokehabit: a heavy forum needs an advanced search widget to help sift through the crap and find what you want. Encapsulating whole strings (as Google can) would soon make the word shitlist irrelevant, and I have always felt this is a missing feature in phpBB.

I understand we might not make such changes, and this may be the only acceptable solution for now, but a better solution does exist.
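To illustrate why quoted-string search can coexist with a stopword list, here is a small sketch. It is an assumption about how such a feature could be built, not a description of phpBB or of Google; the `STOPWORDS` set, the sample posts, and all function names are invented. The word index is used only to narrow down candidate posts, and the exact phrase is then checked against the text of those candidates.

```python
import re

STOPWORDS = {"you", "have", "the"}  # hypothetical stopword list

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def build_index(posts):
    # Map each non-stopword to the set of post ids containing it.
    index = {}
    for post_id, text in posts.items():
        for w in set(words(text)):
            if w not in STOPWORDS:
                index.setdefault(w, set()).add(post_id)
    return index

def phrase_search(phrase, posts, index):
    indexed = [w for w in words(phrase) if w not in STOPWORDS]
    if indexed:
        # Candidates must contain every indexed word of the phrase.
        candidates = set.intersection(*(index.get(w, set()) for w in indexed))
    else:
        # Phrase consists only of stopwords: fall back to scanning everything,
        # which is exactly the expensive case the stopword list tries to avoid.
        candidates = set(posts)
    needle = phrase.lower()
    # Verify the exact phrase on the (hopefully small) candidate set.
    return sorted(pid for pid in candidates if needle in posts[pid].lower())

# Hypothetical sample posts.
posts = {
    1: 'Login prints "You have mail" every time',
    2: "I have no mail client installed, do you?",
    3: "mail server setup",
}
index = build_index(posts)
print(phrase_search("you have mail", posts, index))
```

The cost depends on how many candidates the indexed words leave over. A phrase containing at least one rare word stays cheap, while a phrase made up entirely of stopwords degenerates into a full scan.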

Having some of those words removed from searches WILL lead to more double postings and, in the end, very frustrated users who cannot find the solution they want to a "serious" problem. We cater not only for the home user but for companies as well, and having a sysadmin go through pages upon pages of worthless search results will land Gentoo in a bit of hot water at the end of the day. I have been there myself this week, trying to find out why GLX was suddenly broken, and all the search strings I entered gave me less than helpful results. It was only by chance that I found the topic that helped me, while browsing the forums a bit. In many cases now I have resorted to using Google to look for an answer rather than the forum search.
For example, look at when I joined and at the number of posts I have made since. Very few, and most of them were made within this year because the search function did not return usable results anymore. Get my drift?

I think "strings" are important, like searching the error string output. That will lead you exactly to the right place almost every time.

That said, I know that the servers are taking an enormous load, but isn't there a way to make the searching a bit more effective? Limiting the results to say 3 months (with maybe an advanced option for 6, 9, 12 months), upgrading the board maybe to better software, rallying for donations to buy a better server?
We once bought, from donations, an Opteron server for a BitTorrent tracker here in South Africa with only 300 members! US $10 might not sound like much, but multiply it by a few thousand users and you have yourself a new server. There must be a better way to filter the stopwords, improve the search engine software, or something?

Wow, just wow. If your business relies on the gentoo forums for tech. support, maybe they should fire the tech guys and use the saved cash on a redhat contract.

Well, I ended up here because I could not get relevant search results anymore and I wanted to know if there was a problem with the search. I guess I've found the answer.

My opinion: bad idea. Like many have said, it's the combination of words that matters. This solution will not hold up in the long term, I'm afraid. Like one poster said, more irrelevant search results will lead to more posting, which will lead to more words, which will lead to a bigger database anyway, and we're back to square one.

The cause is quantity and the problem is twofold, depending on how you see things:

1- More data requires a better search algorithm. I wonder how the db is set up. Maybe the problem is right there. Ever thought of trying Oracle instead of MySQL? How are the indexes made?

2- Do you really need to index all the posts made since April 2002? I know it's been mentioned before, and other big forums do it: after some time, the threads should be made static. It has to be. You can't keep accumulating and preserving live threads and posts ad vitam aeternam. Makes no sense (for proof).

So either find (make) a good search algorithm, or diminish the amount of data. Don't pretend that removing words is a solution, because any of these words in combination with other words is relevant. With time, you'll block all the words in the dictionary and then we'll be able to say "wow man, searches are blazing fast now!"

1- More data requires a better search algorithm. I wonder how the db is set up. Maybe the problem is right there. Ever thought of trying Oracle instead of MySQL?

Oracle is rather expensive and I don't think that the Gentoo Project is willing to spend their money on a license for an Oracle Database-Server.

YMMV

I have to agree with some of the others here that have said that these stopwords suck. Obviously the desktop thread isn't a great example since it's not a support thread (and can be found by skimming through the pages in Desktop Environments), but I have had problems trying to find support issues too.

And I don't understand something. If it's not a hardware problem (as ian! mentioned) what exactly is wrong with the software? I, like others, would rather have a slow relevant search than a quick irrelevant one.

We are of course aware that the stopwords list isn't the perfect solution and has some limitations. However, we think that the positive effects outweigh the negative ones.