Searching

Vanilla 2 search uses MySql full text searching which doesn't behave intuitively. In fact, our users find the our forums' search completely unusable. The information about how to correct this as best possible is somewhat buried, as are the customisation options for searching.

Using Google Custom Search is a poor solution, because you have to wait for the next page crawl.

A mainstream search function should try to mirror the same query syntax used for Google, since that is what most users will expect. You don't need the full Google syntax. You just need the following rules:

That would go a long way. You could get slightly more fancy and allow prefixing terms or phrases with "-" to exclude them. "+" would be unnecessary (but could be recognised and ignored), since in Google it means *precise* phrase, not relevant for mysql boolean full text searching.

The problems with the existing behaviour are:

You enter 2 words, and get something along the lines of word1 OR word2, completely counter what you expect

Words of 3 characters or less result in no hits at all

There's no explanation of existing behaviour

You often get back either nothing, or completely irrelevant results. Often you *know* a page exists but just can't find it.

The SQL query appears to be wrongly structured; with boolean searching, and the query "+term1 +term2", you only get results where *just* the discussion, or *just* the a single comment, contains both terms, but what I'm after is where the [discussion and all comments] contains both terms anywhere - for example, the discussion contains term1, and the 3rd comment contains term2.

match = default behaviour; natural language searching which attempts to sort by relevance and largely fails, in part due to the min-4-character word indexing, and OR operator

boolean = use mysql boolean searching, where you enter "+term1 +term2 -term3" to mean "term1 AND term2 AND NOT term 3"

matchboolean = hybrid; if there are any pluses or minuses, it uses boolean searching - this is probably the best default option (in the absence of below improvements to vanilla search), but you'll need to explain to users that the Google search eggs sausages "scrambled eggs" needs to be written +eggs +sausages +"scrambled eggs"

like = use SQL LIKE, low performance but OK for small forums

Change the SQL query so it is effectively searching the page you see when you view the discussion - i.e. the discussion title, body, and all comment bodies. Allow each term to "hit" anywhere in the discussion and comments.

Comments

Iterate through all characters in the search query.Track whether we are "in quotes" or not.When not "in quotes", and a word start is encountered (the first non-space character, or any non-space character following a space): If it is a plus, keep If it is a minus, keep Otherwise, insert a plus.

I think that's right. I'm going to try hacking this up on our site's PHP.

To me programming a custom search parsing like this is a wild goose chase that the core development team won't go down. I tried doing a custom search before the launch of Vanilla 2 and it just didn't perform well and the parsing broke in unicode. Using an OR query on a large data set is also very slow.

The fact of the matter is MySQL is not built for search. The other fact is that no matter how good of a search engine we use for a site, nothing will beat Google. And real-time searching is a a goal that is really not achievable for full-text search. In a way, putting an internal search on a forum is just for the graphic design.

Our default search sucks because MySQL full-text search sucks and that's what we have to support. However, search is not our goal with Vanilla. Building a great forum is.

We may change the default search, but if we do the following will be used.

1. A simple like search on search terms.2. More search options like author and date and tags.

@Todd, I agree, the approach I'm describing may not work for multi-byte unicode, but should for unicode, and certainly does for western/ascii.

I'm not trying to completely solve the problem of search and come up with a perfect solution. I'm only trying to do an 80:20 - to make a few simple tweaks that measurably improve the user experience for the most common search requirements, while keeping the basic paradigm (MySQL full text searching).

(Unfortunately, we can't rely on Google CSE, since half of our site is restricted-access, only available to logged-in users in permissioned groups. For public sites, a "fig-leaf" search isn't an issue if you can swap it for CSE, but for ours, it's a really big deal, since the entire point of our forums is to be a searchable knowledge repository for users of our product).

To sum up, I'm trying to allow users to search, for example, for eggs bacon -"scrambled eggs" as they would in Google and have that be interpreted as "Any discussion or comment containing "eggs" AND "bacon" AND NOT EXACT PHRASE "scrambled eggs", rather than the default behaviour of natural language searching which returns results that don't containing all required phrases and misses out words 3 letters or shorter.

The limitations of the changes I'm outlining are:

Unlike Google, search terms must be exact. You can't search for "colour" and get hits for "color" or "colouring"

I don't know how this will work for unicode. It certainly won't work for multi-byte, but there are a great number of non-multibyte locales using Vanilla, I should imagine. I would expect unicode to work OK; all I'm doing is inserting a + character before every search term.

They don't address the issue where all your search terms must match against a single discussion or comment. So, if you are searching for [term1 term2], and your discussion contains "term1" in one comment and "term2" in another comment, you still won't get any results.

Performance - I don't know how the full-text indexing tweaks will perform on very large sites; on our modest 1000+ discussion site, they have no noticeable impact.

For those who wish to try this out, here are the instructions:

In conf/config.php, add the following to the end of the 'Garden' section:$Configuration['Garden']['Search']['Mode'] = 'google';

In applications/dashboard/models/class.searchmodel.php

insert a plus before every search term and switch to "boolean" search mode, if the search mode is initially 'google'. I'll paste the tested & working changes necessary in my next comment.

Update the MySQL database to perform full-text indexing of every word without any constraints. This is described in step 3 of my original post, above.