Wikia's Search Strategy Heavy On Buzzwords And FUD

from the like-a-broken-record dept

Wikia, Jimmy Wales' for-profit venture, has been talking for a while about taking on Google in the search space. The company believes it can do better by augmenting traditional algorithm-based search with wiki-like collaboration and human editors. So far, the company doesn't have anything to show for its efforts, but it recently announced the purchase of the open source web crawler Grub from LookSmart (remember them?). As part of the company's PR efforts, Wales has tried to make the case that existing search tools are "broken" and that another party needs to come along and fix them. This same line, that Google is broken for whatever reasons, gets repeated by every fledgling search startup out there. While Google has its share of problems (spam, etc.), it's unlikely that most users would see things as being so bad. In the end, neither FUD nor buzzwords, like "wiki", "open source" or "semantic web", will be enough to dethrone Google if the underlying product isn't clearly superior.

Obvious

That's a pretty obvious statement, I should think. But unaware as I am of the internal workings at Wikia as a company - how many programmers, projects, etc. they have at the moment - perhaps someone could come up with clear numbers about who's doing what, and by when? If they're transparent enough, this shouldn't be a problem.

That data is imperative to classify their statements as FUD or brash Google-bashing - without it, I would give them the benefit of the doubt.

Google *is* broken. Because of the importance of search you are
needlessly at the mercy of commercial search engines, your privacy is
compromised, you cannot 'trust' them to yield unbiased results (if for
no other reason than that your biases are unique), and you are helpless to
try and improve the search engines (algorithms, interface, API, etc.).

In principle people can manage a distributed search engine themselves.
Whether Wikiasari is the answer is another question (all signs point to
no so far).

Re:

"Google *is* broken. Because of the importance of search you are needlessly at the mercy of commercial search engines.." - there are so many things wrong with that statement that my head just exploded!

This, I guess, is really what happens in open forums where "comments" are open to readers. Don't get me wrong, I wouldn't want it any other way, but no one yet knows the number of people MS "hired" to hype the Zune and Xbox and slam the PlayStation and iPod on forums and groups. This is in fact the danger with the wiki model when used for profit. If you don't get by now that the fundamental value of (and motivation for collaboration on) things like Wikipedia is that they are not profit models, you just won't get it at all. Just wait around for "Web 3.0" or another buzzword you can get an early jump on and hopefully sell off before anyone figures out what you pulled.

Have to agree, Google is broken in some ways

Google is broken to a considerable degree and they broke it themselves.

When I do a search, my results contain entries for millions of worthless blogs, each trying to game Google for ranking so they can make a couple of cents from my viewing their pages, cents paid mostly by -- you guessed it -- Google. I have been a Google user for many years now, but I increasingly find their search results less than useful because I have to wade through millions of blogs to find what it was I searched for in the first place.

The first search engine that tags sites by type and actual content and allows me to ignore certain types of sites will get my searches. If Wikia does this, they will do very well in the marketplace, as I'm not the only person complaining about this swamp of worthless blogs.

Not quite there yet

The potential problem with Jimbo's desired scheme is that if the results are 'controlled' by anyone, à la Wikipedia, we'll end up with situations where people game Wikia search the same way that trolls and idiots game the most crucial Wikipedia articles: those on real people, politics, and pivotal social matters like global warming, abortion, and the Iraq War. If it turns into another scenario like Wikipedia, with cliques of Jimbo's "trusted people", it'll be a farce.

DMOZ.org is a better example of how to do what he's after, for a starting point.

Re: Have to agree, Google is broken in some ways

By that token, there are a lot of shit blogs out there, but there are also many whose search hits either serve up good information or link back in turn to exactly what I'm after in searches. It's a double-edged sword. You could always just add -blog to your search query, or craft a search URL ahead of time that does it for you and bookmark that directly.
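
A minimal sketch of the "craft a search URL ahead of time" idea. The function name, the default exclusion list, and the base URL are assumptions made for illustration; the only thing taken from the comment above is the -blog trick itself. Note that a leading minus drops pages containing the word "blog", which is not the same thing as dropping every blog.

```python
from urllib.parse import urlencode


def build_search_url(terms, excluded=("blog",), base="https://www.google.com/search"):
    """Return a bookmarkable search URL whose query excludes the given words.

    `build_search_url`, `excluded` and `base` are hypothetical names for this sketch.
    """
    # Prefix each unwanted word with '-' so the engine omits pages containing it.
    exclusions = " ".join(f"-{word}" for word in excluded)
    query = f"{terms} {exclusions}".strip()
    # urlencode takes care of escaping spaces and special characters.
    return f"{base}?{urlencode({'q': query})}"


print(build_search_url("wiki search engine"))
# prints: https://www.google.com/search?q=wiki+search+engine+-blog
```

Bookmarking the printed URL (or a template of it) saves retyping the exclusion on every search.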

There's definitely a niche for this

The discussion on this post seems to be a debate on whether or not Google is, in fact, 'broken.' Broken's probably a bad word to use, since Google is obviously not in a useless state, but there are many things people don't like about it, particularly the number of ways of raising PageRank without raising the actual popularity or usefulness of your site. I hardly think this makes Google 'broken', but it's certainly flawed. Of course, everything's flawed, and there's always room for improvement. The question for any potential Google competition is very obvious, but much harder to answer - how will you do it better?

Google was successful on a very simple, but important, idea - have an algorithm that has a way of determining a site's value without just scanning for keywords in meta tags or in the page itself. Initially this was simply link count, though of course that's been refined over time. Still, the PageRank system alone doesn't work well in some cases - especially when you're researching something controversial... the more supported position is, of course, always going to get linked to more. In cases where it's the WAY more supported position, the less popular side won't even appear on the first page.

DMOZ was mentioned, and while that's a great directory, very few things are going to appear there unless they're already easily searched for, or well-known in their community - and that's if you have a GOOD editor on the topic. Assuming the topic in question has a good editor, this often makes it a great place to quickly find some relevant sites, and it DOES do a great job of filtering out the crap that gets good search results but has no real value, but it's hard for a relatively unknown site to ever get known through something like DMOZ. It's a great supplement to a search engine, but it really can't replace one. It's also based heavily on each category having (usually one) editor. Even assuming the editor is very professional and mostly unbiased, you run into another issue - what happens when he simply gets sick of updating his section? An entire category can go years without updates, because not only do people have to realize it's not getting updated, but someone with an interest in fixing that problem - and enough knowledge to do it - needs to appear.
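
To make the link-based idea concrete, here is a rough sketch of a textbook-style PageRank iteration on a toy web. It is not Google's actual implementation; the function name, the damping value of 0.85, the iteration count, and the page names are all assumptions chosen for illustration. It shows the effect described above: the side with more inbound links ends up ranked higher, regardless of merit.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three sites link to one position, one site to the other.
toy_web = {"a": ["majority"], "b": ["majority"], "c": ["majority"], "d": ["minority"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy case "majority" comes out on top simply because more pages point at it, which is the controversial-topic problem in miniature.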

The wiki approach, in theory anyway, makes it much easier to keep stuff up to date, and the idea of a discussion page for all the search results allows you to quickly scan for dissenting opinions on the results - very good when you're trying to fully understand all sides of an issue - especially when the discussion page may have a link you want that the search results don't.

The problem is that for that approach to work, you need a good system of checks and balances, and you need VERY fast responses to vandalism (Wikipedia gets hit often enough, but a search engine as popular as Google would get hit CONSTANTLY). Making page edits take a few days to actually commit would probably help (especially if you have section editors, as DMOZ does, to check periodically for crap).
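
One way the "edits take a few days to commit" idea could look, as a sketch only. The class names, the three-day hold period, and the reject step are assumptions for illustration, not anything Wikia or DMOZ has described. Edits sit in a queue, a section editor can reject them during the hold, and whatever survives the hold goes live.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PendingEdit:
    url: str
    author: str
    submitted_at: datetime
    rejected: bool = False


@dataclass
class EditQueue:
    hold: timedelta = timedelta(days=3)  # assumed hold period
    pending: list = field(default_factory=list)
    live: list = field(default_factory=list)

    def submit(self, url, author, now):
        """Anyone can submit; nothing goes live immediately."""
        self.pending.append(PendingEdit(url, author, now))

    def reject(self, url):
        """A section editor marks a pending edit as vandalism or spam."""
        for edit in self.pending:
            if edit.url == url:
                edit.rejected = True

    def commit_due(self, now):
        """Publish edits that survived the hold period; drop rejected ones."""
        still_waiting = []
        for edit in self.pending:
            if edit.rejected:
                continue
            if now - edit.submitted_at >= self.hold:
                self.live.append(edit.url)
            else:
                still_waiting.append(edit)
        self.pending = still_waiting


queue = EditQueue()
start = datetime(2007, 1, 1)
queue.submit("example.org/useful-page", "alice", start)
queue.submit("example.org/spam-page", "troll", start)
queue.reject("example.org/spam-page")
queue.commit_due(start + timedelta(days=3))
print(queue.live)
```

The delay trades freshness for safety: vandalism never becomes visible if an editor catches it within the hold window, but legitimate edits also wait.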

Ideally, I think you need a mixture of the wiki approach and the DMOZ approach - let anyone edit it, but make sure someone's there to actually review the edits - ideally more than one someone. Also make sure that there's a way to confirm all of these chief editors are actually still active (simply checking login frequency should work) and have some sort of way of removing a heavily biased section leader.

Of course, none of this works if new but relevant pages aren't discovered... easy enough in a topic where there's a strong community and the editors are probably active in said community (though this creates another issue... if the community itself is controversial, you need to ensure links to sites AGAINST it can appear too...), but much harder for a more academic topic, or simply a relatively obscure one... so you're still going to need crawler bots finding stuff, and then people manually sifting through it.

This raises another problem though... manually sifting through bot results is not exactly interesting, and unlike Wikipedia, this would be a for-profit company - people might be more than happy to submit known links on a favorite topic, but VERY few people are going to do raw data processing for free. You'd need a pretty big staff to fully support the effort.

Search engines are one of those things where almost anyone can think of an improvement. Actually getting it implemented is another story, however. I do think it's only a matter of time before a viable Google competitor appears. Wales has a chance, but he's going to need more than just wiki mechanics, and it's almost certainly going to take a lot of trial and error to get right.

If they set it up like Google but allowed you to flag a web page as relevant or not to your search terms, and to leave comments or user-recommended URLs, it would be nice. You could refine the results and weed out the crap.
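
A rough sketch of how such relevance flags might feed back into results. Everything here (the class name, the net-vote scoring, and the hide threshold of -2) is a made-up illustration, not a description of any real search engine. Users flag a result as relevant or not for a given query, and the engine's original ordering is then re-sorted by net votes, with heavily down-voted results hidden.

```python
from collections import defaultdict


class FeedbackRanker:
    """Hypothetical re-ranker driven by per-query user flags."""

    def __init__(self):
        # (query, url) -> net votes: +1 per "relevant" flag, -1 per "not relevant"
        self.votes = defaultdict(int)

    def flag(self, query, url, relevant):
        self.votes[(query.lower(), url)] += 1 if relevant else -1

    def rerank(self, query, results, hide_threshold=-2):
        """Re-sort the engine's results; hide URLs at or below the threshold."""
        q = query.lower()
        kept = [url for url in results if self.votes.get((q, url), 0) > hide_threshold]
        # sorted() is stable, so unflagged results keep the engine's original order.
        return sorted(kept, key=lambda url: -self.votes.get((q, url), 0))


ranker = FeedbackRanker()
engine_results = ["spamblog.example", "useful.example", "neutral.example"]
ranker.flag("wiki search", "useful.example", relevant=True)
ranker.flag("wiki search", "spamblog.example", relevant=False)
ranker.flag("wiki search", "spamblog.example", relevant=False)
print(ranker.rerank("wiki search", engine_results))
# prints: ['useful.example', 'neutral.example']
```

As other comments in this thread point out, votes like these can be gamed just as links can, so a real system would need the review and anti-vandalism measures discussed above.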