Posted by Zonk on Tuesday December 26, 2006 @09:26AM
from the many-hands-make-light-workd dept.

An anonymous reader writes "Jimmy Wales, founder of the Wikia corporation, has revealed plans to offer a user-driven search engine. Ars Technica reports that the plan is to leverage user preferences to pick the 'best' site for any given search term, while at the same time utilizing advertising for commercial gain. The article admits this may not be the ideal solution: 'Users may be reluctant to contribute to the betterment of a commercial site that may end up being bought by a bigger company. Consider, for example, the tragic death of TV Tome, a comprehensive community-driven television content guide that was eventually bought by CNET and transformed into a garish, excessively commercialized Web 2.0 monstrosity of significantly less value to users.' Just the same, Wales seems very enthusiastic in the Times Online article highlighting this venture."

Results for controversial subjects will be just as accurate as those for more commonsensical topics: All search results will helpfully direct the user to discounted prescription drugs and aphrodisiacs. Contact information for local singles will also be provided. I daresay they shall be "hot".

Well, there are many mechanisms to choose from, some of which are already implemented on Wikipedia. There are many commercial and political interests in the biggest and most popular online encyclopedia, so this search venture is not very different in that sense.

The real answer, I guess, is that you can't control it for all cases, but you can be sure that the most popular terms will have enough eyes on them to be safe from it.

Early rumors had him working with Amazon in the effort, but this [mashable.com] should clear things up.

Google, Amazon, Opera, Mozilla, all are good ideas but as they expand their reach, they turn to crap. Google is going to Hell, Amazon is there, Opera likes the road, and Moz? They seem to be eyeing it.

"Google, Amazon, Opera, Mozilla, all are good ideas but as they expand their reach, they turn to crap. Google is going to Hell, Amazon is there, Opera likes the road, and Moz? They seem to be eyeing it."

WED Fan

True but heaven for trolls, imagine a group working together...what they could accomplish with this type of tool at our disposal.

I don't believe WikiMedia will ever solve the problem of coding the actual search system. Namely, who'd do it? None of the core developers. In order to be viable, the system would be too complex...

It'd have to solve many of the current probs with the W, for one. Prob's such as accuracy, which apparently, said proposer [arstechnica.com] doesn't believe W should be trusted for. Not to mention filtering for biased-users who'd get all their friends to promote irrelevant attachments to search terms, using the engine as a source

Back in 1999 a company called Direct Hit Technologies developed what they coined a "popularity engine" that ranked results based on tracking user behavior. They basically partnered with existing search engines and mined their web logs looking for patterns among users. If a lot of people who entered search term "A" went to website "X" but within a minute or two went to website "Y" then "Y" would be ranked higher than "X". Direct Hit was bought in the middle of the internet boom by Ask Jeeves for a cool $512 million. Some of that technology likely still exists within their full search engine, and I'm sure others like Google, Yahoo, etc. all use similar methods of tracking user behavior for helping with their rankings.
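The ranking rule described above can be sketched in a few lines. This is only an illustration of the behavior as the comment describes it, not Direct Hit's actual algorithm: the log format, the 120-second window, and the plus-one/minus-one scoring are all assumptions made for the example.

```python
from collections import defaultdict

# "Within a minute or two" is read here as 120 seconds (an assumed threshold).
BOUNCE_WINDOW_SECONDS = 120


def rank_by_click_behavior(click_log):
    """Rank URLs per query from (user, query, url, timestamp) click records.

    A click that is followed by another click from the same user on the same
    query within BOUNCE_WINDOW_SECONDS counts as a "bounce" and loses a point.
    Every other click earns a point.
    """
    # Group clicks into one session per (user, query) pair.
    sessions = defaultdict(list)
    for user, query, url, timestamp in click_log:
        sessions[(user, query)].append((timestamp, url))

    scores = defaultdict(lambda: defaultdict(int))
    for (_user, query), clicks in sessions.items():
        clicks.sort()  # chronological order within the session
        for i, (timestamp, url) in enumerate(clicks):
            is_last = i == len(clicks) - 1
            bounced = (not is_last
                       and clicks[i + 1][0] - timestamp <= BOUNCE_WINDOW_SECONDS)
            scores[query][url] += -1 if bounced else 1

    # Highest score first; ties broken alphabetically for stable output.
    return {
        query: sorted(url_scores, key=lambda u: (-url_scores[u], u))
        for query, url_scores in scores.items()
    }


# Hypothetical log: two users leave X for Y quickly, one user stays on X.
log = [
    ("alice", "A", "X", 0), ("alice", "A", "Y", 60),
    ("bob", "A", "X", 10), ("bob", "A", "Y", 95),
    ("carol", "A", "X", 5),
]
ranking = rank_by_click_behavior(log)
print(ranking["A"])
```

With this toy log, "Y" ends up ahead of "X" for search term "A", since two of the three visits to "X" were abandoned within the window. A real system would also have to cope with the manipulation other posters worry about, for instance by limiting how much any one user or address can contribute.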

Moreover, doesn't Google already have the ability to make something like this possible, and, in fact, has already implemented part of it? On my Google toolbar, when I type in a common search term, the input text also outputs how many users searched for the same item. It also suggests popular search terms (listed next to the number of searches), as I type...

Disclaimer: I am a founder of a competing search engine concept based on volunteers running distributed software that crawls pages and conducts partial analysis and indexing before passing the results to a central server, which indexes the data into the main central index. The human aspect here is the people who take part in the project: participants can actually change the ranking formulas, and shortly will be able to assist in human detection of spam, etc.

Searching the Web is a very challenging problem (that's why few companies do it): the volume of data is huge, and one only appreciates the value of good algorithms when poor algorithms make a job run for weeks, fail near the end, and force you to restart the run and wait another week. You can either try to handle this very big problem, which is very hard even if you have the money (look at Amazon's A9, funded with millions, yet they licensed Google's code and database), or you can try to reduce the problem: only focus on a handful of "important" pages. Yahoo did that when they were a human-edited directory/search engine hybrid.

It seems to me that Mr Wales entertains the illusion that a very small number of manually checked pages in the Web space will be sufficient to satisfy the vast majority (and it has got to be 98%+, as I won't be hopping from one search engine to another) of search queries. If this was the case then we would still be using Yahoo, which did pretty much just that, yet almost everyone (including Yahoo) moved to algorithmic search engines because it is the only way to handle billions of pages, and billions of pages you will have to handle: even if you just index the homepages of all registered domain names you will be dealing with 100 mln+ pages. That's a good 20 times more than the articles in Wikipedia, and checking pages can be far duller than reading a nice article you have some personal interest in.

What I find ironic is that our own search engine concept was removed from Wikipedia because we were supposedly "not notable enough". That's a sign of how they handle the problem of "too much data" in Wikipedia: they just reduce the problem by reducing datasets greatly. Sometimes this is done wrongly, sometimes rightly, and it might well work for Wikipedia, but it sure as hell won't work for Web-scale searches. Oh, and by the way, who said Google and others don't use human reviewers? They sure do, just check TrustRank [wikipedia.org]; this link is ranked as the #1 match on Google for the search TrustRank! Notice what Wikipedia tells us: "While human experts can easily identify spam, it is too expensive to evaluate manually a large number of pages."

Human input plays an important (although fairly unknown, as they prefer to keep it secret) role in state-of-the-art search engines. However, the suggestion that humans can handle billions of pages and/or that a handful of pages will be sufficient for a general purpose search engine is wrong, and a very backwards move that will result in exactly the kind of wrong attitude present in Wikipedia now.

OK, perhaps the new generation of Web entrepreneurs had better learn something from their "elders". We're about to see a lot of concepts that didn't work in the 90's be resurrected and funded. This has been tried and failed, badly:

It wasn't so much that so many "ideas" failed in the 90s, it was just the one really bad idea that failed: That you can build a profitable company by just getting people to come to your website without any idea of how to get them to come back or how you were going to make money from them while you had their attention. I think just about every actual content idea has since made money for some company or other, but with many casualties along the way. But given how many businesses fail, let alone new busines

I thought a lot of the failures were down to selling unsuitable things, due to shipping costs, urgency of items, etc. Or, just having a site that launches late, overbudget, and doesn't work. Or, (though not originally a bad idea, but some sites launched when this was obvious), selling books and CDs, just like Amazon. Or having dodgy people on the board and launch parties costing millions. Ad revenue turned out to not be as hot an idea as first expected, but it worked for TV and radio, so, who knew? What was

In the summary quote he pretty much announces his intention to sell out at some point. More attention now leads to more money later, either through having a higher-profile name, or through suckering more people into developing his search rankings. So the guy founded Wikipedia. Good for him. It doesn't mean he walks on water, and the advent of yet another search engine doesn't deserve the front page of Slashdot. Especially when you know it's going to get swamped by spammers (or their bots) and quickly become u

If Mr. Wales uses some kind of Netflix like system to guard against spam, we will have a problem. People who have entrenched beliefs will read sources that further those beliefs, further entrenching them. Every day we see how it's become harder to have meaningful debate due to polarization. So far, Europe has remained immune, but if the demagogues gain power it will happen there also. If This Goes On-- [wikipedia.org]

SearchSays.com [searchsays.com] is a user-based search engine that just recently launched, and has a growing contributor base. Looks like it might be a race, but I guess I'm not surprised it's already been started. There are so few original ideas on the Internet these days.
As to the "it's already been done and failed" remarks - timing is everything isn't it? I could see something like this taking off with the current Web 2.0 craze.

Back in the late 90s, I used to recommend browsing downward to see if there was anything in the hierarchical categorization of web content Yahoo set up, as a complement to the many, but often irrelevant, results people got back from AltaVista. Fast forward to the days of Google and Wikipedia and you have infinitely better "dumb" search, and an equally easy to use, generally decently accurate, and well contained treatment of a dizzying array of topics.

That's easy. You need to leverage the new digital paradigm offered to us by Web 2.0 and the Semantic Web to effectively harness and integrate user-generated and user-driven content in a dynamic framework accessible over a simple user-oriented interface via a wireless broadband multiplexed link. Fool.

TVTome is an excellent example of why free licensing matters. When a community has free licensing as the social framework to allow for forking, the infrastructure providers are forced to continue to provide good service, to prevent the community from forking and leaving. Even Dmoz, for which I have great fondness and respect, has been crippled for years by a non-free license that allowed AOL to run it into the dirt. (See the recent 6 week server outage, for which there is simply no excuse.) (The Dmoz lice

And how would he handle shills? Bots? Trolls? Ok, google is struggling bad enough with link farms and the like but I can't really imagine end-users being the answer to this, even with some sort of meta-recommendation system. Personally, I find either a wikipedia search (for example now, recently I've been answering some christmas quizzes which it just excels at) or a very targeted google search (3-4 search terms) finds 99.9% of what I want, and the rest just isn't there.

How come advertising is an acceptable business model in polite company?! How come thumping one's chest with no backup data is an acceptable form of communication?! I'm getting REALLY tired by all the Web wannabes that think advertisement is a valid business model. Just give me a clean per-pay service for once!

Ads require me to AVOID looking at them. I really don't like being brainwashed. Just give me the service I'm looking for and leave my brain alone. It is not yours to try to tamper with. Just look at what are/were the most advertised things around:
Cigarettes (Thank god we finally got laws to keep those off)
Soft Drinks
Fast Food
SUVs
All total crap that we would live much better without. But they hang around by blatantly manipulating innocent bystanders' brains with constant exposure to absolutely unsubstantiated

It's not only about searches. It's about *everything* on the Web becoming an advertisement dissemination vehicle, instead of providing clean information. Brought to you by Gooooooogle. And by everybody else.

So that means a site with no content other than links to a bunch of ads is easy to spot (i.e., same "class" of generic layouts). I've never seen a custom-made AdSense farm. Such things just don't exist; if anyone's willing to spend a good amount of their time designing a non-generic-looking site, they wouldn't submit to the pollution of no content.

You might be willing to go out of your way and pay money to avoid a few, nonintrusive ads (i.e. they're there, but not in your way), but most people aren't. Google/Yahoo/etc. don't cater to you, they cater to everyone. And there is no reason to solve the pay-per-use problem. Even if it was easy, it would still cost money, which would be a major barrier to attracting people to the service no matter how cheap. Ads aren't like that. There's no "fund" where you can only view so many ads, and you don't have to w

'Everyone' is a sum of individuals. I am one of them. Don't pretend I don't exist. We are past the age of mass dissemination of ads, err information, i.e. the TV age. Customize your software to cater to all. Including people who value their brains.

Part of me welcomes new methods and new technology where search is concerned. However, the involvement of Mr Wales into this arena isn't one I welcome at all.

The only good thing about this is that possibly Wikipedia might be ousted from the primary or secondary page rank for most subjects. That is an authority most highly undeserved, and proof of nothing more than how far we need to go in terms of achieving accurate search.

I think (hope) this is just a piece of self publicity. I doubt they have the technology - judging by the fact that at peak times Wikipedia search shuts down and defaults to Google and Yahoo.

Interesting too, that while Google employs seriously smart people and is founded by seriously smart people, that Jimbo and whomever he cobbles together from the smart-search-technologists-who-decided-not-to-work-for-google-exactly-why? are likely to succeed where they have failed. Sure, brains aren't necessarily everything, but they really do help. I think no small amount of Google's success is the size of the brains behind it. It's why they have a competitive advantage in most markets they enter.

We have seen very clearly that Wikipedia is extremely vulnerable to, and tainted with, group-think manipulation. (Jimbo's icon, Ayn Rand, is one very tiny example of many.) Why would anyone think this search will be in any way different? This looks just as vulnerable and easy to manipulate if you get a group together. Which every SEO blackhat on the planet will do on the day of launch. This looks much easier to manipulate than meta tags, or page rank.

I'm sure SEO blackhats and right wing organisations are foaming at the mouth with excitement at this wonderful Christmas announcement.

I'm sure SEO blackhats and right wing organisations are foaming at the mouth with excitement at this wonderful Christmas announcement.

The one thing we all learned post 9/11 is how hard it can be to tell the difference between foam and saliva. I'm coming to the opinion that foam is just a glandular camouflage used to disguise malice as outrage.

Why start by creating an entirely separate search engine? I often find myself using Google just to search Wikipedia because I am not sure how to spell the item I am looking for and the search box on Wikipedia needs an exact phrase. Why not improve that first and then consider expanding?

After reading your points, I'm getting the impression we need a pre-built form like the "Why your anti-spam idea will not work" one but for "Why your new Internet idea will not work" and include some of your above points..

You're going to run out of money, because..

( ) - Your ass will get sued for patent violation
( ) - No-one would be stupid enough to buy you out
( ) - No-one would be stupid enough to pay for your product
etc..

Excellent. Just what we need, more hermetic negativism designed to throw the baby out with the bath water so that the earth can continue to spin on its present axis.
war on spam = Iraq
war on botnets = Afghanistan

While we're at it, let's do one for the war on drugs and the war on terror for good measure. Let's do one for poverty in Africa, and dementia in the elderly. Let's do one for hieroglyphics, the alphabet, the digital number system, and man-made fire. Think of the untold failures and aggravation ca

I entered a bunch of my CDs into the music CD database CDDB [wikipedia.org] (now Gracenote), thinking that the database's contents would remain public domain or at least freely usable. Then it was sold and the company that owned it forbade access by media applications without a fee. Thousands of people like me voluntarily built that thing, and now they wanted to sell it back to us. Any organization asking for my help on future projects had better have an ironclad guarantee that my work product will remain free to users.

I actually started a company doing much the same thing back in 2000, probably a bit ahead of its time. Even took out a patent [uspto.gov] . Unfortunately investors (actually almost everyone we talked to..) didn't have a clue what we were talking about, so we ran out of money.

Why do I live in such a small country, where nobody has a clue.... sigh...

Anybody got a job working with interesting people that can actually think ????

Before Mr. Wales expands his empire to cash in on search, maybe he ought to invest some development in improving Wikipedia's search feature, which is almost useless, one of the least useful search features I've ever seen. To search Wikipedia, I use Google.

Why not create a project that takes the Google search results, and then creates a collaborative layer on top of that? Sort of like GreaseMonkey for Google?
I know Google will never alter their automated search results - this is where a project like this could come in. It would have all the power of Google, plus all the power of collaborative human input. Right off the bat, it would be at least as powerful as Google - and assuming human input is more beneficial than detrimental, it would only get better

Why not create a project that takes the Google search results, and then creates a collaborative layer on top of that?

I just posted this the other day on the Wisdom of Crowds article; here is the link [slashdot.org]:

What's the next logical step?

Search engines. Google's PageRank algorithm may point to highly rated *websites*, but searches themselves can be rated. Since most queries are less than 3 words, track where all less-than-3-word-queries go to, and rate *those* sites higher. Since humans are doing the searchi

Yes, that's a great idea - track what people actually click on, and weight those results higher. It's almost like meta-moderating Google's search results.
I really don't know anything about search, or if there's an open-source search engine? I wonder if anyone else does this?
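For what it's worth, the "weight the clicked results higher" idea can be sketched without any real search engine behind it. Everything here is assumed for the sake of the example: the function name, the 1/(position+1) base score, and the `click_weight` knob are made up, and the "engine results" are just a plain list rather than anything fetched from Google.

```python
def rerank_with_clicks(base_results, click_counts, click_weight=1.0):
    """Blend an engine's original ordering with observed click counts.

    base_results: URLs in the order the underlying engine returned them.
    click_counts: mapping of URL -> number of recorded clicks for this query.
    click_weight: how strongly click share counts against original position.
    """
    total_clicks = sum(click_counts.get(url, 0) for url in base_results)

    def score(position, url):
        base_score = 1.0 / (position + 1)  # 1, 1/2, 1/3, ...
        if total_clicks == 0:
            return base_score  # no feedback yet: keep the engine's order
        click_share = click_counts.get(url, 0) / total_clicks
        return base_score + click_weight * click_share

    # Sort by score, falling back to the original position on ties.
    scored = [(score(i, url), -i, url) for i, url in enumerate(base_results)]
    scored.sort(reverse=True)
    return [url for _score, _neg_position, url in scored]


# Hypothetical data: the third result gets most of the clicks.
base = ["a.example", "b.example", "c.example"]
clicks = {"a.example": 1, "b.example": 2, "c.example": 17}
reranked = rerank_with_clicks(base, clicks)
print(reranked)
```

With no click data the original order is returned unchanged, so the layer starts out no worse than the engine underneath it. Raw click counts are exactly what a group of shills or bots would inflate, so anything real would need some defense on top of this.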

That is what I can never figure out with these community driven things. All these saps putting their time and energy into it, when they know that behind the scenes someone owns it, and for them, regardless of the personal investment or emotional attachment, the bottom line (money) is going to call the shots at the end of the day. I have quite a few of my old customers who know me personally from my webhosting company and they still can't understand how I would abandon them and sell the company, not that I wan

While I'm not talking about the current venture that Mr. Wales is doing, the point about Wikipedia is that it isn't "owned" by any single individual or even organization. With the GFDL, it is very easy to "fork" the service if a group like the WMF decides to "sell out" or do something that really pisses off the community. Trust me, I've had offers to do exactly that, due to some huge infighting that took place on one particularly prominent Wikimedia project. I politely refused, preferring to stick within th

Wikipedia was a reasonably inspired idea... this one, of a user-driven search engine, is NOT. As a not-exactly-run-of-the-mill human who is already nearly drowning in the mediocrity and stupidity that surrounds me, I'm frankly terrified at the prospect that AVERAGE people will be entirely responsible for determining what results are returned by my search engine! I want the exceptional results that *I* can comprehend and appreciate, not the moronic ones that appeal to people of average intelligence and abi