Unofficial news and tips about Google

July 28, 2008

Cuil, a New Search Engine

Cuil, the start-up founded by Tom Costello and two former Google employees: Anna Patterson and Russell Power, unveiled a search engine that claims to have more than 120 billion pages in the index. According to Cuil, that's "three times as many as Google and ten times as many as Microsoft."

At Google, Anna Patterson designed TeraGoogle, a system that is able to index a large number of documents, while Russell Power worked on web ranking and automatic spam detection.

"Cuil's goal is to solve the two great problems of search: how to index the whole Internet - not just part of it - and how to analyze and sort out its pages so you get relevant results." Cuil thinks that today's search engines can't index all the information that is available on the web (more than one trillion pages, according to Google). Even Google admits that it's selective: "many [web pages] are similar to each other, or represent auto-generated content that isn't very useful to searchers".

Regarding ranking, Cuil combines metrics that measure popularity with information about the context of each web page. "Cuil prefers to find all the pages with your keyword or phrase and then analyze the rest of the content on those pages. During this analysis we discover that your keywords have different meanings in different contexts. Once we've established the context of the pages, we're in a much better position to help you in your search."

The most striking new idea is the way search results are formatted. Instead of the ten blue links displayed linearly, Cuil makes better use of the space by using columns. The search engine also shows thumbnails next to some of the results, but they don't always represent images included in the adjacent web page. Another interesting idea is the explorative category section that shows related Wikipedia categories and topics. Cuil has an excellent auto-complete feature and it displays a list of related searches using a design pattern that suggests exploration.

It's probably not fair to compare Cuil with Google, but when Google was launched, users could see substantially better results. Cuil returns results that are either similar to Google's results or substantially worse. In some cases, the site doesn't return any results for your queries, probably because of the huge traffic from the launch day.

Cuil has problems with relevancy, spam and robots.txt (the site indexes albums from Picasa Web), and the number of search results for almost every query is smaller than the number of Google results. This is especially obvious for queries that return a small number of results.

All in all, Cuil is the best search engine launched this year, but it doesn't offer convincing reasons to switch from Google. If Cuil focuses on developing technologies that allow faster indexing of web pages, it's probably the perfect match for existing search engines with less frequently updated indexes like Live Search or Ask.com.

Searched for my name in Google and my website (my domain is name with .com) shows up #3, with many more results in following pages. Same search in Cuil showed my LinkedIn profile #1 but my great website domain never showed up in the first 20 PAGES. Also, it keeps repeating the same results over and over and over again. What's the point of that?

Tried again while searching my work. In Google our official website is result #1. Cuil never showed our homepage in the first 10 pages (I quit then). Odder yet is that my LinkedIn profile showed up again. Although that makes sense since I work here and I listed it, why was my LinkedIn profile listed in the search results on page 2, page 3, page 4, and page 5? Why does it keep repeating the exact same results?

Not terribly impressed. They claimed to have an index three times the size of Google's. Instead, it seems Google has an index twice the size of theirs.

Also, I'm not sure if this is buried somewhere in the preferences (I doubt it), but I would generally prefer there NOT be images for most of my searches. I don't mind the few that Google shows at the top, but to have images strewn about just means I have to look past several images to find the text I am looking for.

Also, a magazine layout is ok for actual content worth reading. The goal of the design is to get people to scan the page for something interesting to read. It is also meant to make advertisements more noticeable. In this case, I am performing a search to find something... I don't need my search results obfuscated further so that I must further search the screen for the relevant results.

For claiming to have a huge index, a search for STARE CLIPS shows many results on Google... no results whatsoever on CRUIL. A search for STARECLIPS shows some results on CRUIL, but it doesn't even bother showing the stareclips.com site. A search for STARECLIPS on Google at least shows the stareclips.com in the number one spot.

If CRUIL had made a claim that they are just tinkering with search technology and hope to eventually do better, this would be one thing. However, coming out of the gate, they claimed to have a bigger index and claimed to have a better way of doing things than Google. However, right out of the gates, it looks like they are worse off than ASK.COM. I sure hope those who invested in CRUIL are hoping for a buyout by a competitor.

I stand corrected. Now a search for STARE CLIPS shows some results on CRUIL. However, the results are still not terribly impressive. I also find their choice of images to be amusing and often irrelevant. When I do a search for Bob Oliver Bigellow XLII, it shows a link to StareClips.com (not entirely certain why, considering this appears nowhere on the site AND a more relevant search for stareclips won't even show StareClips.com). What's weirder is it shows a picture of some random guy next to the result. Clicking the picture of this guy takes you to StareClips.com. I don't know if this guy's name happens to be Bob Oliver Bigellow XLII (I seriously doubt it)... or if he is just some random blogger who happened to blog about the site... but, to a researcher, CUIL could lead to a lot of misleading information. A very bad idea to trust CRUIL for any form of relevant research.

I'll check back in a year or so and see if they have a better algorithm.

Most of the results that I found in my initial searches were for pretty crappy .com results. There's no way that I'd recommend this to my students for research, at this point.

The impression that I get at this point is of a giant step backwards to pre-Google, spam-filled results, but with a snazzier graphic interface. I grant that it may improve, but I will wait to hear others talking about its great results before trying again.

Can't say that I'm very impressed with the search results, yet. I'm sure just like any start up business they are going to experience their share of bumps, but taking on Google is like putting a 50 foot speed bump on a mini go-cart track with carts that are solar powered as you try to run them in the dead of night!

Another area in which Google has the advantage is that it's much more intuitive to pronounce. I'd hesitate to recommend that anyone use "cool" or "queel" or "quill" or whatever else it may be, for fear that they wouldn't be able to figure out how to spell it.

"Recently, even our search engineers stopped in awe about just how big the web is these days -- when our systems that process links on the web to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once! (...) We don't index every one of those trillion pages -- many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers."

Re: spam.. it probably didn't help that the then Cuill (where did that last "L" go?) crawler spent several weeks locked in a deadly embrace with my web site spam trap - 9000 in 5 days - completely ignoring my robots.txt. Even worse, it completely ignored the *real* content on my web site. Multiply this scenario thousands of times and you can understand the problem!

I’ve tried about four searches on topics I know well and got pretty skewed results. Then it occurred to me to google, er — CUIL my own field in which I come up at TOP of Google: HANDWRITING ANALYSIS SAN FRANCISCO.

It brought up 9 KANSAS handwriting/graphology sites as well as one site for a “spiritual healer,” and this crazy site, osidjfios11df.angelfire.com/handwri…, whose subheadings are all possible misspellings of “handwriting” and started to INFECT my computer with three viruses, one a “fatal”! So look out, folks.

And CUIL.com is using the San Francisco Bay Area/Silicon Valley PEDIGREE of their officers as credibility???? Only Dorothy and Toto would offer 9 KS sites when San Francisco was specified!

When a friend asked today if I’d heard of the new search engine, at first I thought it was kind of cute for my purposes as a handwriting expert, cuil = quill? Then I found that it’s pronounced COOL. That’s beyond stupid. With their precious little blue “i” it appears they’re trying to hitch their wagon to Steve Jobs’s iTrain.

Relax Google, and just keep on getting better all the time. I’m pleased to tell people I provided entertainment for one of your company parties.

Bad results are one thing, but their bot is very, very abusive. It actually hacks URLs it finds, searching for "hidden" content. I have read many posts and made a few of my own about Cuil's abusive Twiceler Bot.

Also, its search results are full of pages Google would never show, like 404, 403 and 500 error pages, as well as results from HTTP proxy servers and obvious spam, for example when misspelling my name as "alex hggins".

Solution looking for a problem. Google is all anyone needs for search... using address bar adders like site: etc. make it very easy to focus a search. And if you hate Google, you have Microsoft and Yahoo - both acceptable.... so what is the point of this?

It's just not "Cuil" (plus, the name is laughable). It ranks right up there with Xobni (Inbox backwards... get it?) and other contrived Web 2.0 nonsense.

It is quite interesting that a search engine targeting _all_ pages on the Internet does not handle character encodings well. Try to search for something like "űr" (space in Hungarian), and you will get _no_ relevant pages.

A pocosin has been defined, roughly, as a swamp on a hill--a type of wetland that may be regulated by some states and the federal government of the USA. When you search for pocosin, you get pages/results from sites that deal with the wetland type rather than pages that just have this word on the page or within the HTML...

Therefore, cuil does limit the SPAM pages that often have more words defined for search engines than it has content on its pages...

What it misses is other items that are somewhat related to the term...

My domain, pocosin.com, does not appear within the first 23 pages of their results. While another 54,250 or so results were identified, there were only 23 pages of results. I saw no way of getting to the "other" results--perhaps they were duplicates?

While the technology may severely reduce those "SPAM" sites that contain those search terms for search engines, the technology is not working for me--even if it does work for other non-profit environmental organizations...

In my opinion, the technology needs further refining and should probably have been refined prior to being released as "ready" for use. A "beta" for refining might be more applicable at this point in time...

I'm not a computer science engineer, just a guy who enjoys browsing the web looking for stuff to use in my web-based business. I have to say that I used Yahoo! for some time in the late '90s, and since 2000 Google has been my choice for whatever I'm looking for. I tried Cuil and was very disappointed with the results. Not with the presentation, which I like very much, but with the relevancy: I ended up reading the URLs and noticed quite a bit of spam all over the place. Right now it's just a waste of time. Mostly irrelevant results and mismatched images, but here's my surprise, Cuil: my website appears number one with my basic keywords, and with the matching logo. Wow! My competition is nowhere to be found.

I wish they could fix it up, but the way I see it, it will take a long, long time.... By then we'll be talking about Cruel or whatever they want to call it.

Google's advantage is not just its search technology but the unmatchable data processing capabilities that it can harness to respond swiftly to queries. Last I heard, MS was pumping in something north of a billion dollars to match Google's processing capacity. Even if someone comes up with a great algorithm, they would find it tough to replicate Google's datacenter, with millions of machines wired together to give an advanced computing environment. Also, the search business is about speed, relevance and completeness - graphics etc. don't even count.

Some of the results on Cuil were actually good, but not better than Google. However, I found a new search engine called SupremeSearch.net. I believe this search engine may have what it takes to become one of the top 3 or 5 search engines.