Wednesday, December 29, 2004

Clive Thompson at Wired has an interesting article on BitTorrent, the filesharing software that has tens of millions of users and generates about a third of all internet traffic.

The entire article is worth reading, but I wanted to highlight this excerpt on using BitTorrent for watching TV:

BitTorrent is something deeper and more subtle. It's a technology that is changing the landscape of broadcast media.

"All hell's about to break loose," says Brad Burnham, a venture capitalist with Union Square Ventures ... BitTorrent does not require the wires or airwaves that the cable and network giants have spent billions constructing and buying ... BitTorrent transforms the Internet into the world's largest TiVo.

If enough people start getting their TV online, it will drastically change the nature of the medium ... The whole concept of must-see TV changes from being something you stop and watch every Thursday to something you gotta check out right now, dude. Just click here.

What exactly would a next-generation broadcaster look like? The VCs at Union Square Ventures ... suspect the network of the future will resemble Yahoo! or Amazon.com - an aggregator that finds shows, distributes them in P2P video torrents, and sells ads or subscriptions to its portal. The real value of the so-called BitTorrent broadcaster would be in highlighting the good stuff, much as the collaborative filtering of Amazon and TiVo helps people pick good material.

In a flood of information, we need focus. With tens of thousands of TV shows, we need personalization to filter, to help us find what we need.

Tuesday, December 28, 2004

The problem for newspapers isn't Craigslist. The problem for newspapers is the newspapers themselves. Specifically, that class of slow-blink-rate executive who refuses to see today through the lens of today....They recite from business self-help manuals and reduce the hard work of innovation and creativity to comic book parables. Meanwhile, they lose market share, circulation and audience. Ultimately these people will cost an industry its future.

Harsh words. I'd like to say that Cauthorn is being unreasonable, but moves by newspapers such as mandatory registration seem to support his fears.

Newspapers used to have localized monopolies on distribution. Reading the local newspaper was the only way to see local news and local classifieds.

Increasingly, newspapers have to live in a world of decentralized distribution. Advertisements that used to run in a local paper may now run on Craigslist, Yahoo Local, Monster, or eBay. More visitors will come to read local news not through the front page of the newspaper's website, but via RSS feeds or aggregators like Google News.

Newspapers know local better than anyone. They know the local advertisers. They know the local news. They are the kings of local content.

Tom Curley (CEO of AP) said it best: "The franchise is not the newspaper; it's not the broadcast; it's not even the Web site. The franchise is the content itself."

Newspapers should take advantage of decentralized distribution. Before, advertising and classifieds would run in a print newspaper to a small subscriber base. Now, newspapers could distribute local advertisements out across many channels, with the newspaper managing the key relationship with the local advertisers. Before, reporters for the paper often found their articles condemned to the back pages, read by only a few thousand readers. Now, a vast audience of readers can discover their work through RSS and news aggregators, pulled to the newspaper's website by the strength of the content.

Grasping for the fading monopoly on local distribution will only cause it to slip away faster. Focus on the content. Embrace change.

Thursday, December 23, 2004

There's no shortage of people at Google who are disappointed with the way Orkut is not catching on. Google really wanted to build a powerful community, and it isn't going to happen through Orkut.

When I read Nathan's post, I realized it had been months since I used Orkut. Like many people, I played with it a bit when it first came out, set up my little network, got in contact with a few old friends and colleagues.

It was a fun toy. But the fun died quickly. The discussion forums were useless, all noise, no signal. The messaging system was full of spam, people foolishly broadcasting inane messages out to all friends of friends. And Orkut became so slow as to be unusable (something that, I can only assume, is quite embarrassing to the rest of Google).

My visits, initially a couple times a day, dropped to once a week, then dropped off entirely.

The toy wasn't fun anymore, so I stopped playing with it. Had it been more than a toy -- had it been a useful tool that helped me in my life -- I would have stuck with it, but there was no real value to Orkut.

Checking it out again now, it seems that everyone else abandoned Orkut too. The only ones left seem to be Brazilian teenagers. Oh, Orkut. What has become of you?

Talking to novices about webfeeds is like trying to explain the World Wide Web in 1995 to someone who'd never used a browser. But as soon as browser software became easily accessible and there was good content to view through it, the significance of the Web became clear to most everyone.

Because the Web (and XML) already existed when RSS was invented, it was relatively easy to generate webfeeds with interesting content. But we're still waiting for the equivalent of the first Netscape browser -- the software that makes ordinary consumers ... go, "Aha. I get it."

It's not at all clear to me that ordinary users want to know what a webfeed is. They just want news. They want their news to be quick to access, easy to read, and relevant to their lives.

Focusing on webfeeds confuses the tool with the goal. Webfeeds are a means to an end, not the end itself.

Information overload. Its the next big issue in publishing, and technology in general.

With the internet still growing and changing at such a rapid rate, the raw amount of information your brain processes will see a huge increase ... The flow of information into our lives is only going up and our free time is only going down ...

The key to our information gathering lives is all about smart aggregation. The days of media companies deciding whats on your "front page" are numbered. Within five years, I believe customizable newsreader technology ... will be as prevalent as the web is right now.

Wednesday, December 22, 2004

It's that time of year again. John Battelle has his predictions for the search war for 2005. It's a great list.

Personalized news and search aren't mentioned explicitly, but they're implied in the long tail (#4) and in redefining what's possible in search (#10).

The further entry of Yahoo and Google into e-commerce (#7) seems to me like a bigger threat to Amazon and eBay than John says. If Google's AdWords intrudes on classified advertising (see "Google, small business, and eBay") and Froogle becomes the place to find and buy anything online (see "Froogle adds product reviews"), eBay and Amazon will be hurt.

In 2004, Google is the leader of GYM -- the triumvirate of Google/Yahoo/Microsoft, which in turn leads a dozen other related companies in the web-related innovations that improve peoples' lives.

Google lays down one gauntlet after another -- a better email experience and a Gig of storage, and a better desktop experience in searching my stuff, to name two examples from 2004 alone -- and Yahoo and Microsoft follow the leader by improving their email experiences and announcing their desktop search tools. Often then others follow the troika -- even if, as in X1's and Lycos's cases, they actually had desktop search before Google did, once Google plants a flag it's like a shot heard round the world, and everyone seems like a follower.

Together GYM and their followers offer a suite of tools that give me hope that I can manage my personal Web -- and accelerate my ability to search and research simply, to discover and find again easily, to filter and incorporate suggestions collaboratively. As the web grows, so does each of our personal Webs, and tools become not just important but critical to productivity.

And, dipping my knife into that peanut butter, a "Googlecalifragilisticexpialidocious" and "Yahoocalifragilisticexpialidocious" 2005 to you too, Adam.

Monday, December 20, 2004

Jeremy Zawodny argues that search engines should stop using links in weblog comments for PageRank in order to reduce the incentive for comment spam.

As with e-mail spam, the basic problem is that, at least for some, the benefits of posting spam exceed the costs. So, how do you attack the problem? Increase the costs or reduce the benefits.

Not counting links in weblog comments for PageRank reduces the benefits. People won't be able to use weblog comments to inflate their PageRank.

But this alone is not sufficient. There's value for a spammer in just having a link, or even just a product name, mentioned in a public forum. Since the costs are so low -- just like with e-mail spam -- a spammer only needs a tiny fraction of spammed people to respond to make their campaign of annoyance worthwhile.

Increasing the costs will have to be part of the solution. Spammers rely on being able to hit tens of thousands of weblogs automatically, so anything that makes this automation more difficult increases costs.

And there are many strategies out there to make weblog spam more difficult. Blacklists ban specific IP addresses from posting comments. Some weblogs require an account or a verified e-mail address before posting. Requiring the user to enter a code from a distorted image (one that is difficult for a robot to read) is another technique. Even asking a simple question (e.g. "What's the third word in this sentence?") before posting can be enough of a hassle to block spammers if everyone asks a different question.
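That last technique is simple enough to sketch. The questions and answers below are invented for illustration; the point is that every weblog asks a different question, so a spammer's script can't hardcode the answer:

```python
import secrets

# Each blog defines its own challenges, so automation across tens of
# thousands of weblogs becomes expensive.
CHALLENGES = {
    "q1": ("What's the third word in this sentence?", "the"),
    "q2": ("What color is the sky on a clear day?", "blue"),
}

def issue_challenge():
    """Pick a random challenge and return (challenge_id, question)."""
    challenge_id = secrets.choice(list(CHALLENGES))
    question, _ = CHALLENGES[challenge_id]
    return challenge_id, question

def check_answer(challenge_id, answer):
    """Accept the comment only if the poster answered correctly."""
    _, expected = CHALLENGES[challenge_id]
    return answer.strip().lower() == expected
```

Trivial for a human, but a robot posting to thousands of weblogs would need a human in the loop for each one, which is exactly the cost increase we want.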

But this will be an ongoing problem. The full costs of spam are not borne by the spammers. As long as someone, somewhere finds comment spam rewarding, the problem will exist.

The search requests did not need to originate from a web browser visiting Google.com.

Integration is triggered by observing outgoing packets, and occurs after packets are received, but before they are given to the web browser or application.

This is pretty cool. Google Desktop Search integrates local results into a Google search by intercepting the request out to Google and rewriting it before it gets to the web browser.
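The rewriting step might look, very roughly, like this. The real integration happens down at the packet level inside the GDS proxy; here a hypothetical function splices a block of local results into the HTML of a Google results page before the browser renders it. The marker string is my invention, not Google's actual page structure:

```python
# Assumed placeholder marking where web results begin in the page HTML.
RESULTS_MARKER = "<!--web-results-->"

def splice_local_results(google_html: str, local_results: list[str]) -> str:
    """Insert local hits ahead of the web results in the response HTML."""
    block = "".join(f"<p class=local>{r}</p>" for r in local_results)
    # If the marker isn't found, pass the page through untouched.
    if RESULTS_MARKER not in google_html:
        return google_html
    return google_html.replace(RESULTS_MARKER, block + RESULTS_MARKER, 1)
```

The browser never knows the page was modified, which is what makes the integration seamless -- and, as the researchers found, also what makes it a tempting attack surface.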

At this point, Nielson et al. had already found the chink in the armor: the request doesn't have to come from a web browser directly. They tried a few tricks to get Google Desktop Search to show local data inappropriately, and they were successful.

We found that the Google Desktop personal search engine contained serious security flaws that would allow a third party to read the search result summaries that are embedded in normal Google web searches by the local search engine. While an attacker would not be able to read the victims files directly, the search results often contain snippets of the file results that will be visible to the attacker.

Doh. No need to panic though. Google has already patched the problem and automatically updated everyone.

Google Desktop Search's integration of the local search results into a Google web search was really clever. Ever since I saw it, I've been curious about the details of how it was implemented. This paper was an enjoyable read.

Friday, December 17, 2004

"AdWords on Steroids" ... Any article or feed I'm interested in [has] content that can be mined and transformed into relevant pay-per-click advertising.

While Google and Overture sell advertising based on a limited number of keywords, the content in feeds is rich with information that can be mined to laser-target the advertising.

[Bloglines CEO Mark Fletcher] commented that the aggregate of subscriptions could also be mined to provide additional inventory, e.g., if I subscribe to Engadget and Gizmodo there is A) a strong chance I am a personal technology person and B) I am probably subscribed to other blogs that are gadget-relevant.

Mark's idea makes sense and is a better idea than injecting advertisements into my feeds.

There is a lot of rich data here. There is an opportunity to do well-targeted, relevant, useful, and unobtrusive advertising. We at Findory are planning something similar for our advertising engine.
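Mark's co-subscription idea is easy to sketch. A back-of-the-envelope version: treat each feed as the set of people subscribed to it, and rank other feeds by audience overlap to find related inventory. The feed names and subscriber IDs below are invented:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two feeds' subscriber sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def related_feeds(subscribers: dict[str, set], feed: str, k: int = 2):
    """Rank other feeds by how much their audience overlaps with `feed`."""
    scores = [
        (other, jaccard(subscribers[feed], subs))
        for other, subs in subscribers.items()
        if other != feed
    ]
    scores.sort(key=lambda pair: -pair[1])
    return [other for other, score in scores[:k] if score > 0]
```

An Engadget subscriber with heavy Gizmodo overlap is probably "a personal technology person", so gadget advertising can be targeted across both audiences.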

While companies such as Google and Microsoft are also experimenting with the idea of letting outsiders tap into their databases and use their content in unpredictable ways, none is proceeding more aggressively than Amazon.

The company has, in essence, outsourced much of its R&D, and a growing portion of its actual sales, to an army of thousands of software developers ... The result: a syndicate of mini-Amazons operating at very little cost to Amazon itself and capturing customers who might otherwise have gone elsewhere.

It's as if Starbucks were to recruit 50,000 of its most loyal caffeine addicts to strap urns of coffee to their backs each morning and, for a small commission, spend the day dispensing the elixir to their officemates.

The strategy behind Amazon Web Services is to give programmers virtually unlimited access to the very foundation of Amazon's business -- its product database -- whether they are inside or outside the company's walls.

Enhancing personalized results is a large near-term goal for Web search.

"Our challenge is to read a user's mind," says Daniel Read, vice president of product management for Ask Jeeves. It's an intriguing challenge, given that most Web searches today still contain just two to three words.

The example given of personalized search -- learning of a general interest in cooking and biasing all search results toward cooking -- is coarse-grained and doesn't capture the potential of personalization. Biasing all my searches toward a general subject interest isn't likely to work very well. How does my interest in cooking help when I'm searching for a camera? Fine-grained personalization focuses on your mission -- what you are doing right now -- and how to help you find what you want faster.

There's a brief mention of implicit vs. explicit personalization in the article. While it's true that implicit personalization is hard, working from sparse and noisy data, the article missed the major issue with explicit personalization: Most people won't do it. It takes work. People don't want more work. The entire point of personalization is to make things easier.

There's also a brief mention of privacy, something that can be handled by making users anonymous.

Personalized web, news, and blog search on Findory is fine-grained, implicit, and anonymous. We keep our eye on the goal, helping searchers find what they want quickly and easily.
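To make the coarse vs. fine-grained contrast concrete: instead of one static interest ("cooking") biasing everything, a fine-grained approach uses only your last few actions. A toy sketch, with all data invented:

```python
def session_rerank(results: list[str], recent_queries: list[str]) -> list[str]:
    """Boost results that share terms with what you're doing right now."""
    # Only the current mission matters: look at the last few queries.
    context = {w for q in recent_queries[-3:] for w in q.lower().split()}
    def score(result: str) -> int:
        return len(context & set(result.lower().split()))
    return sorted(results, key=score, reverse=True)
```

Under this scheme, a long-standing interest in cooking stops mattering the moment you start searching for a camera.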

For example, if you're reading a web page on RSS, perhaps they would surface some relevant video clips of news programs talking about RSS. It's a very hard problem, but it'd be pretty cool if they can do it right.

Google supposedly is also working on TV search, but it's still vaporware.

Wednesday, December 15, 2004

After hearing about Google's library project, David Coursey at eWeek says that Google is losing its focus.

My Google searches today are significantly less useful than the searches I made just a year ago. This is partially a reflection of the ever-increasing size of Google's collection, but it also shows how information providers have learned to spoof Google's robotic system.

I'd rather see Google concentrate on getting search right than trumpet how much is being added to its sea of information.

The company's first task should be throwing us a line, not building a bigger ocean.

Help me find focus in a flood of information. Help me find order in chaos. Help me find what I need.

The real threat remains the Web and how a vendor like Google has found a new way to exploit the Internet's utility beyond Windows.

Search is one of several mechanisms (fast data connectivity is another) that could catalyze alternative platforms. Search would give tremendous utility to portable devices connected to the Internet or home or corporate networks. With so much computing focus on information and so much information stored somewhere else (meaning not locally), ubiquitous search could unify the utility of many disparate types of devices.

Just as Microsoft integrated the browser into Windows to fight off the threat posed by the Web, the company is looking to tie the utility of search to its operating system. Any technology that is useful without Windows threatens Microsoft's core franchise.

The threat is much larger than just Google. It's about the future of Windows as the dominant computing platform.

Microsoft has been fighting this battle for many years. They worried about the rising power of handheld devices like Palm Pilots and cell phones, so they launched Windows CE. They worried about the additional functionality being built into game consoles, so they launched XBox. They worried about the rise of entertainment devices like TiVo and Replay, so they launched Windows Media Center. They worried about the threat from web-based applications, so they launched IE and MSN.

The latest shining star is Google. It's popular to talk about the search war as involving just two players, Microsoft and Google. In fact, the search war involves many players: Google, Yahoo, AOL, Microsoft, Amazon, and many smaller firms. And, the search war is only one front in the broader war Microsoft must fight to continue its dominance.

Tuesday, December 14, 2004

Charles Ferguson publishes a long article in MIT Tech Review on Google's "war with Microsoft":

Google's defeat is not a foregone conclusion. Indeed, if it does everything right, it could become an enormously powerful and profitable company, representing the most serious challenge Microsoft has faced since the Apple Macintosh. But if Microsoft gets serious about search -- and there is every reason to believe that it will -- Google will need brilliant strategy and flawless execution simply to survive.

What should Google do? Google should understand that it faces an architecture war and act accordingly. Its most urgent task must be to turn its website into a major platform, as [Amazon has] already done.

Google should first create APIs for Web search services and make sure they become the industry standard. Second, it should spread those standards and APIs, through some combination of technology licensing, alliances, and software products, over all of the major server software platforms, in order to cover the dark Web and the enterprise market.

The Microsoft giant is awake, says Charles, and it's hungering for a Google snack.

The impressive Google cluster is one part of Google's competitive advantage. I'm curious to see if Google does start offering better web services APIs. I'd certainly love to get my hands on that juicy cluster.

But will Google lose the search war if it doesn't offer better web service APIs? I doubt it.

Google has an impressive track record of innovation on its own. Amazon has web services APIs because it is seeking outside developers to boost innovation. Yahoo is considering them for similar reasons. But it's not clear Google has a problem with innovation. Google's biggest problem seems to be getting all the innovations available internally out the door and available to the public.

Furthermore, Google's lifeblood is advertising. Google is in the middle of building an advertising revolution. I think it is the AdSense revolution that will empower small websites and businesses, wrapping them around Google, not a software API into Google's infrastructure.

That being said, I do expect Google to launch services that allow users to further exploit the power of the Google cluster. But I expect these to be finished services like GMail that target end users, not web services targeting developers.

Monday, December 13, 2004

John Battelle reports that Google is digitizing the collections of major libraries.

Google is working with Stanford, the University of Michigan, Harvard, Oxford, and the New York Public Library to make millions of books available in its index.

The idea that the world's knowledge, as held in books and libraries, is opening up to all via a web browser cannot be overstated. It's one thing to have an original copy of The Origin of Species on the shelves, where students and interested parties have to travel to find it. It's another to have it available to everyone via a search index and a web browser.

This could well be a step toward diversifying Google's revenue streams away from advertising and into ... the content business ... Google is not doing this only out of the kindness of its heart - there is a lot of money to be made in selling books, in particular books with no copyright.

Are you paying attention to this, Amazon?

Update: Felicia Lee from the New York Times writes about Google Library and the reaction to it from scholars.

Update: Tara Calishain posts that the Internet Archive is expanding its text archive with support from ten international universities.

I'm not sure why these companies are launching poorly differentiated products into a crowded space. Google has some nifty integration with their web search that I kind of like. Other than that, I can't tell the difference between these offerings.

Even if one were better, it's not clear that'd be enough. The only reason this market opportunity exists is that the default search in WinXP and MS Office is poor. As soon as MSN integrates their desktop search into Windows, this game is over.

MSN seems to be using Lookout (a company they acquired) for much of their desktop search. Yahoo is licensing X1's desktop search. AOL will be using Copernic. Only Google decided to build their own.

MSN's entry is a little amusing since, aside from searching your browsing history, it seems like most of what they're doing is just fixing the miserable file search functionality built into WinXP and MS Office products.

I do find the hype behind desktop search mystifying. You're searching a few thousand files and e-mails on a desktop box. The major problem is grokking thousands of different file formats -- painful, sure, but not exciting. With web search, it's the scale -- billions of documents -- that makes the problem interesting.

That being said, there is some interesting innovation going on in desktop search. Dashboard and Blinkx are trying to do personalized information retrieval. They have taken early, first steps toward making your computer figure out what you are doing and what information might be helpful for that task. Very cool.

Saturday, December 11, 2004

For example, when I go to Wired magazine on Findory, because of my reading history, two articles are marked as personalized, "Troops stay in touch on the internet" and "Yahoo searches desktops too". When I go to Scobleizer on Findory, four articles are highlighted for me.

The problem with current web feed readers is that they don't solve the information overload problem. Sure, I can pick and choose which RSS feeds I subscribe to. But, once you have tens of subscribed feeds, reading them becomes this cumbersome process. Click on a feed, skim the articles. Anything interesting in that one? No. Click, skim. Click, skim. Click, skim. Ugh.

With Findory, the important news bubbles to the top. On the home page, interesting articles are selected just for you, pulled from thousands of news and blogs. On a search, relevant articles are highlighted. When you read a blog on Findory, important posts are highlighted.

Current RSS readers merely reformat XML for display. That isn't enough. They need to filter and prioritize. Show me what matters. Help me find what I need. Next-generation RSS readers will be personalized.
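What "filter and prioritize" might mean, in a deliberately crude sketch: score each incoming article by how much it overlaps with what the reader has actually clicked on, then sort. Findory's real algorithms are not public; this is only a toy illustration of the idea:

```python
from collections import Counter

def build_profile(clicked_titles: list[str]) -> Counter:
    """Term counts from what the reader has actually chosen to read."""
    profile = Counter()
    for title in clicked_titles:
        profile.update(title.lower().split())
    return profile

def rank_articles(profile: Counter, titles: list[str]) -> list[str]:
    """Order new articles by affinity to the reading history."""
    def score(title: str) -> int:
        return sum(profile[w] for w in title.lower().split())
    return sorted(titles, key=score, reverse=True)
```

No clicking and skimming through tens of feeds; the reader's own history does the triage.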

This is about more than just reading news. This is about information. Where before there was an undifferentiated glut of information, now there is focus. Where before there was noise, now there is knowledge.

What will this future look like? Findory has taken the first steps. Come and take a look.

Friday, December 10, 2004

In a long post, John Battelle describes the difference between Yahoo and Google:

Yahoo is a natural media company - the company is willing to have overt editorial and commercial agendas, and to let humans intervene in search results so as to create media which supports those agendas. Google, on the other hand, is repelled by the idea of becoming a content- or editorially-driven company.

While both companies ... lay claim to the mission of "organizing the world's information and making it accessible" ... they approach the task with vastly different stances.

Google sees the problem as one that can be solved mainly through technology - clever algorithms and sheer computational horsepower will prevail. Humans enter the search picture only when algorithms fail.

But Yahoo has always viewed the problem as one where human beings, with all their biases and brilliance, are integral to the solution ... Humans first, technology second.

Update: Okay, I take it back. Looking at this more, I'm impressed, not with the data, but with the UI. Google is using some clever Javascript tricks (you can see in the code at http://google.com/ac.js) to constantly talk back to the server and retrieve data about how to expand your search string.

Neat-o-jet. Like GMail, this is a remarkable use of Javascript to create a simple, clean, functional UI within the web browser.
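The client-side Javascript in ac.js is only half the trick; the server has to answer each keystroke with ranked completions. A toy sketch of that server side, ranking completions by query-log frequency (the queries and counts below are invented, not Google's data):

```python
import bisect

# A sorted (query, count) log lets us find all completions of a prefix
# with one binary search plus a short scan.
QUERY_COUNTS = sorted(
    [("amazon", 900_000), ("apple", 700_000), ("applesauce", 5_000)]
)

def suggest(prefix: str, n: int = 10) -> list[str]:
    """Return the n most frequent logged queries starting with prefix."""
    lo = bisect.bisect_left(QUERY_COUNTS, (prefix, 0))
    matches = []
    for query, count in QUERY_COUNTS[lo:]:
        if not query.startswith(prefix):
            break
        matches.append((count, query))
    matches.sort(reverse=True)
    return [query for count, query in matches[:n]]
```

The browser fires a request like this on every few keystrokes, which is why keeping the lookup cheap matters.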

Thursday, December 09, 2004

Chris Sherman writes about the new partnership between Eurekster and Friendster. I suppose these "-ster" companies just can't help but get together.

Their new search engine personalizes Yahoo search results using your Friendster social network. From Chris' article:

Search results are prioritized with results viewed by anyone in your personal network appearing at the top of the list. These results are highlighted with a smiley face icon.

It's an interesting approach, but it remains to be seen how well it works. On the one hand, you trust your friends, so things your friends clicked on might be interesting for you to know about.

On the other hand, I'm not sure how often this will change the search results, whether the changes will focus your attention on the most relevant result for your search, and whether it is scalable to access search and clickstream history for everyone in your social network on every web search you do.

Nevertheless, it's an interesting development, an unusual use of a large social network to do a version of personalized web search.
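The core reranking Chris describes can be sketched as a stable partition: results anyone in your network has clicked float to the top (those would get the smiley icon), with the engine's original order preserved within each group. The URLs here are made up:

```python
def rerank(results: list[str], network_clicks: set[str]) -> list[str]:
    """Friend-endorsed results first, original order preserved otherwise."""
    endorsed = [r for r in results if r in network_clicks]
    rest = [r for r in results if r not in network_clicks]
    return endorsed + rest
```

The hard part, as noted above, isn't this step -- it's gathering everyone's clickstreams and doing this lookup on every search, for every user, at scale.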

Tuesday, December 07, 2004

Neowin reports that MSN will be launching a new e-commerce service called Messenger Marketplace:

Buy and Sell within social network, also list wants and share recommendations. List items you want to sell, things you are looking for, and your recommendations. Your buddies notice new items youve listed when they login. They can either buy, sell or refer you to one of their buddies.

It is like eBay except with people that you already know and trust directly (or a few degrees out).

Clever idea. Hard to do this without already having a good social network built, but MSN does have that for MSN Messenger users.

Danny Sullivan (Founder of Search Engine Watch) says the solution to manipulation of search result rankings is to:

... involve human editors as part of the search equation. At one time, several search engines allowed human beings to make editorial choices about what would be shown in response to a query, to complement technological selections. Today, all the major services have sadly followed Google's lead in assuming all things can be solved through automation and search algorithms.

I assume Danny doesn't literally mean human editors hardcoding which results are returned for queries. How do human editors scale to billions of web pages? How do you do this efficiently and effectively, at low cost with high quality?

You might imagine that humans could provide canned responses to the most frequent queries. But this would only apply to a small subset of queries, and even this would be prohibitively expensive to maintain.

A more scalable and more common form of this is shortcuts where a search engine will detect particular categories of queries and return some results from a specialized data source. This is automated, of course, but humans are involved in identifying and creating the shortcuts.
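A shortcut dispatcher is simple to sketch: human-written patterns detect a category of query and route it to a specialized source, falling through to general web search otherwise. The patterns and sources below are invented for illustration:

```python
import re

def shortcut(query: str):
    """Return (source, payload) if a human-built shortcut matches."""
    # e.g. "weather in seattle" -> ask the weather data source
    if re.fullmatch(r"weather in [a-z ]+", query):
        city = query.removeprefix("weather in ")
        return ("weather", city)
    # e.g. "AMZN stock quote" -> ask the finance data source
    if re.fullmatch(r"[A-Z]{1,5} stock quote", query):
        ticker = query.split()[0]
        return ("finance", ticker)
    return None  # no shortcut; fall through to the web index
```

The execution is automated, but humans choose which categories deserve a shortcut and write the patterns -- which is the sense in which editors are "part of the search equation."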

I do wonder how much this debate of human vs. robots is a real issue. Truth be told, search engines have teams of good ol' humans analyzing data behind the scenes. These humans discover patterns in the data that are lowering the quality of the relevance rank, such as search engine spam, and change the algorithms to adapt.

Is this different than using "human editors as part of the search equation"?

Monday, December 06, 2004

Jakob Nielsen summarizes research on users' perceptions of online advertising and which advertising practices (popups, playing sound, blinking) are most annoying.

Jakob ends with some "Lessons for Websites":

Sites that accept advertising should think twice before accepting ads that 80 to 90% of users strongly dislike. The resulting drop in customer satisfaction will damage your long-term prospects.

Advertisers themselves might be tempted to continue with these nasty design techniques as long as they can find sites that will run them. After all, they typically yield higher clickthrough rates. But clickthrough is not the only goal. Users who are deceived into clicking on a misleading ad might drive up your CTR, but they're unlikely to convert into paying customers. And your brand suffers a distinct negative impact when you antagonize customers.

Rayg (from Feedster) asks, "If search engines are so willing to pimp their space to sponsored links, why ... [not add] an affiliate ID in the [search result] links?" Ray goes on to argue that the move would largely go unnoticed and wouldn't damage credibility.

Jeremy Zawodny (from Yahoo) disagrees, saying that this would blur the lines between sponsored and non-sponsored results too badly. Even if the relevance rank is unbiased by affiliate revenue, the perception that some links are paid would damage the credibility of the search engine.

I have wondered if Yahoo and Google have considered adding affiliate links, not to their search engine, but to their metashopping searches (Yahoo Shopping and Froogle).

But Jeremy's point probably applies to shopping search as well. It would look like a conflict of interest and potentially damage credibility, even if the affiliate revenue did not influence the relevance rank.

Sunday, December 05, 2004

Gary Price reports that Google recently registered a few new domains, including googlereviews.com.

As Nathan Weinberg points out, there isn't really a Google Reviews product out there yet. Closest to it is the merchant reviews spidered into Froogle.

I'm curious to see if Google will be releasing a review service that is more of a competitor with Epinions, CitySearch, Zagat, and Amazon.com's customer reviews. What would this look like?

One version of this could be a combination of product reviews spidered from the web and reviews entered by Google users. I imagine these product reviews would be integrated into Froogle, supplementing the store reviews. Currently, Froogle is a price comparison engine, helping users find a specific product at a low price. With product reviews, Froogle gives users the information they need to differentiate between products, helping them find the right product at the right price. It's a much more useful service.

Or perhaps we'll see small business merchant reviews integrated into Google Local, as Yahoo Local already has done. Currently, Google Local is essentially the Yellow Pages, helping users find a local merchant. With merchant reviews, Google Local would help users differentiate between merchants and find the right merchant for the task. Another much more useful service.

Friday, December 03, 2004

Fortune has an interview with Eric Schmidt (CEO of Google). It's interesting and worth reading.

One particular quote on helping small businesses caught my eye:

The longer term goal is to have businesses give us very timely local information. So, for example, they'll say we have too much of this or too much of that product, and we want to have a sale. The goal is to have the computers arrange that real time and send out targeted advertising to interested parties nearby.

Eric is saying that they want to help small businesses tell interested people about individual products.

Is this more than just targeted advertising? To me, it's starting to look like allowing merchants to use Google to sell their products.

What do I mean? Consider eBay for a moment. What is eBay really? It's classified advertising. Small merchants post advertisements for their products on eBay. Buyers come to eBay, find what they need, and close the sale, often using a non-eBay site for payment. eBay's business is essentially classified advertising.

Now, look back at what Eric Schmidt said. Google will help merchants target individual products to interested people. Advertising at this level of granularity is very similar to eBay's product. Using Froogle, AdWords, and AdSense, small merchants could sell their products through Google instead of eBay.

Update: Three years later, a BusinessWeek article reports the eBay "magic is gone ... Shoppers are simply not buying all the inventory anymore. Some items languish without a single bidder. Many shoppers opt for other sites including Amazon.com, use sophisticated search engines such as Google and Yahoo!, or head to store sites directly."

Thursday, December 02, 2004

Eric Peterson is one of many to look at what's going on at Las Ultimas Noticias. The Chilean newspaper is using click data to see what stories are popular and picking headlines for the next day's paper based on that data. Clever use of online data (clicks on their website) to change an offline publication (their print newspaper).

As Danna Harman describes, the technique has been part of turning the paper from "a middle-of-the-road piece of nothing" into "Chile's most widely read newspaper today."

But some are concerned that newspapers "just cater to the lowest common denominator" if they use this kind of data without exercising good judgment.

Wednesday, December 01, 2004

Personalized Search: We're the first commercial search engine to modify web search results in real-time based on the searcher's behavior. Our changes are modest for now, but will increase over the coming months. Our News and Blogs search engine is also personalized.

New Look: Our redesigned website combines Findory News and Blogory, making it easy for you to keep up with current events.

Source Pages: Now every news source and blog has its own page. This makes it easy to find recently published articles. Our readers are also using it to discover related articles and explore related sources.

Search History: Findory keeps track of all your web, news, and blog searches in one convenient place, so you can easily retrace your steps.

Fountains of Feeds: Findory content, both personalized and unpersonalized, is now accessible from 44 categorized RSS feeds.

Findory Blogs by E-mail: Weblog headlines are now available as a daily e-mail delivery, alongside Findory News daily e-mails.

Last but not least, just today we launched My Recent Sources, a new feature which makes it easy for you to keep track of what news and blog sources you've been reading.

We're so pleased by the positive reaction to all our hard work. In the last month, traffic to Findory more than doubled! Thanks to all our readers for using our service and providing valuable feedback.

I'm curious where Yahoo is in all of this. Given all their other community and content creation features, you'd expect Yahoo to be faster out of the gate on a Blogger knockoff than Microsoft. But not this time.

Rex Hammock responded: "Someone much smarter than I can speculate what algorithm causes that to happen. Whatever it is, it makes perfect sense to me as we discovered each other through our blogs and have since become friends and even got together for lunch recently when I was in New York."

Nathan Weinberg at InsideGoogle called Findory "a powerful, smart news site". Nathan even went as far as to say, "I think I can now replace Google with Findory for [some] searches," since Findory "'just works', and works far better than anything out there."

Cindy Chick at Law Lib Tech said: "Personalization obviously has a lot of advantages in many different areas. Personalization is what Amazon uses to display other items that might interest you, and it's what Findory uses to give you the news that you want to read."

Bruce Schneier responds to some of the hype ([1][2][3]) over so-called security flaws in Google Desktop Search:

Google's desktop search software is so good that it exposes vulnerabilities on your computer that you didn't know about.

Some people blame Google for these problems and suggest, wrongly, that Google fix them. What if Google were to bow to public pressure and modify GDS to avoid showing confidential information? The underlying problems would remain: The private Web pages would still be in the browser's cache; the encryption program would still be leaving copies of the plain-text files in the operating system's cache; and the administrator could still eavesdrop on anyone's computer to which he or she has access. The only thing that would have changed is that these vulnerabilities once again would be hidden from the average computer user.

GDS is very good at searching. It's so good that it exposes vulnerabilities on your computer that you didn't know about. And now that you know about them, pressure your software vendors to fix them. Don't shoot the messenger.

Monday, November 29, 2004

Want to try it? Read a few news or blog articles on Findory, then do a news or blog search for something related to some of the articles you read.

For example, if you read the Wired article "Google Treads on Microsoft's Turf" through Findory, then do a news search for "desktop search", you'll see some of the articles will be marked with our orange personalized icon. Clicking on the icon will explain why the article was recommended.

As with our personalized web search, our personalized news and blogs search is a first step. As we learn more about how to help people find what they need, we'll begin to make more dramatic changes to the search results.

Personalized search is the future. We at Findory are excited to be part of it.

Steve Outing talks about the launch of Newsbreak, an Australian news aggregator from Fairfax Digital. The aggregator itself seems indistinguishable from many others, but what is interesting is that it comes from a traditional news organization, Fairfax Digital.

Steve says Newsbreak is a reaction to Google News and other online news aggregators. And he thinks this is a positive sign:

This continues the trend -- a good one, I think -- of traditional news organizations realizing that they can't continue to operate as islands on the Internet. Linking to other sources (even competitors, in many cases) serves the interests of readers, and establishes the news entity as a portal to the world of news, not just its own coverage. Such services give readers of a news brand less of a reason to turn to Google News, et al.

If only they would embrace the opportunity, traditional news organizations should be better positioned to innovate in online news than Google or Yahoo. Up to this point, innovation has been coming from elsewhere.

Stefanie Olsen at CNet reports on Google, Yahoo, and MSN's efforts on search for video streams.

One particularly interesting excerpt on Google TV search:

Google's project for TV search is ultra-secretive; only a handful of broadcast executives have seen it demonstrated so far. To build the service, the company is recording live TV shows and indexing the related closed-caption text of the programming. It uses the text to identify themes, concepts and relevant keywords for video so they can be triggers for searching.

The software allows people to type in keywords, such as "Jon Stewart," to retrieve video clips of the comedian's TV appearances, marked with a thumbnail picture with some captioning text, for example. Refining the search results for the show "Crossfire" would display a page that looks similar to a film reel, with various still images paired with excerpts of closed captioned text of the now-infamous fight between Stewart and CNN's "Crossfire" hosts. The searcher could click on and watch a specific segment of the show.
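The indexing approach in the excerpt -- mapping caption keywords to show segments -- can be sketched as a toy inverted index. The shows and captions below are made up, and this is only a guess at the general shape, not Google's actual system:

```python
# Toy inverted index over closed-caption segments:
# keyword -> set of (show, segment_id) hits.
from collections import defaultdict

def build_index(captions):
    """captions: dict mapping (show, segment_id) -> caption text."""
    index = defaultdict(set)
    for key, text in captions.items():
        for word in text.lower().split():
            index[word].add(key)
    return index

captions = {
    ("Crossfire", 12): "Jon Stewart argues with the hosts",
    ("Daily Show", 3): "Jon Stewart opens the show",
    ("Local News", 1): "weather and traffic",
}
index = build_index(captions)
hits = index["stewart"]  # every segment mentioning Stewart in its captions
```

A real system would add ranking, phrase matching, and theme extraction on top, but the caption text is what makes the video searchable at all.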

Watch out TiVo (and Comcast, DirecTV, ...).

See also my earlier post, "Query-free news search", which mentions a Google paper on searching television closed caption text to find related news articles.

Chris DiBona digs up an interesting old article by Mimi Sheraton that criticizes Zagat's user reviews:

The Zagat surveys stand or fall on their central premise: that thousands of separate opinions add up to something like the truth ... [But] the majority can be wrong, and one well-informed opinion is worth more than those of a thousand amateurs.

It's a great point. How do you find the authoritative, well-informed, useful opinions? Not only does this apply to community-generated content like customer reviews and product ratings, but even to blog postings and discussion forum comments where the signal-to-noise ratio is equally poor.

One common approach is to allow people to rate the reviews. Amazon.com does this for customer reviews, allowing people to vote on whether the review was helpful. Slashdot takes this a step further, not only allowing users to moderate (rate comments), but also allowing users to metamoderate (rate the rating of the comment).

Mimi Sheraton would probably criticize this approach as just layering a popularity contest on top of a popularity contest. And it does have problems. For example, positive reviews on Amazon.com seem to get many more "helpful" votes than negative reviews. Slashdot moderators seem to have an adolescent sense of humor and favor ill-informed rants, perhaps seeking entertainment more than information.

So, what else can we do? Another approach is to attempt to identify authoritative people and treat all of their reviews or comments as higher quality. This is closer to what Mimi wants, well-informed reviewers counting more than uninformed reviewers. The trick is identifying informed reviewers. Amazon and Slashdot both emphasize active users, presumably on the theory that those who bother to put in the effort to be involved probably have something useful to say. Users could rate each other, but this again devolves into a popularity contest.

This does seem like a spot where social networks actually could be useful. Who is an authoritative reviewer? Someone who is considered authoritative by other authoritative users. Yes, it's circular, but identifying a seed set of authoritative users is enough to start the process going.
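That circular definition is essentially the same fixed-point idea behind PageRank and HITS: iterate until the scores stabilize. A hypothetical sketch of propagating authority from a seed set of trusted reviewers through user endorsements (all users and the graph are invented):

```python
# Sketch: propagate "authoritativeness" from a seed set of trusted
# reviewers through an endorsement graph, PageRank-style.
def authority_scores(endorsements, seeds, iterations=20, damping=0.85):
    """endorsements: dict mapping user -> list of users they endorse."""
    users = set(endorsements) | {u for vs in endorsements.values() for u in vs}
    scores = {u: (1.0 if u in seeds else 0.0) for u in users}
    for _ in range(iterations):
        # Seeds keep a baseline of authority; everyone else starts at zero.
        new = {u: ((1.0 - damping) if u in seeds else 0.0) for u in users}
        for endorser, endorsed in endorsements.items():
            if not endorsed:
                continue
            # Each user passes a damped share of their score to endorsees.
            share = damping * scores[endorser] / len(endorsed)
            for u in endorsed:
                new[u] += share
        scores = new
    return scores

graph = {
    "alice": ["bob", "carol"],   # alice (trusted seed) endorses bob and carol
    "bob": ["carol"],
    "carol": ["bob"],
    "dave": ["dave_fan"],        # clique disconnected from the seeds
}
scores = authority_scores(graph, seeds={"alice"})
```

Users endorsed (directly or transitively) by the seed set accumulate authority; a mutual-admiration clique disconnected from any trusted user gets none, which is exactly the property that defeats a pure popularity contest.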

Most e-mails to Findory are either suggestions or oh-my-god-this-is-so-great fan letters. We're thrilled by the feedback we've been getting. It's great to have such an enthusiastic and supportive community using and enjoying Findory.

If you do have ideas, suggestions, or things you'd like to see at Findory, please feel free to drop us an e-mail anytime at suggestions@findory.com or comment on this post. We'd always love to hear from you.

Sunday, November 28, 2004

Gary Stein (analyst at Jupiter Research) posts that Mamma, a metasearch engine based in Canada, just purchased Copernic, one of the leading desktop search companies.

Gary doesn't seem to think this was a very smart move by the so-called "mother of all search engines". He says, "Desktop Search has fully entered into the world of hype," and criticizes desktop search as having no business model: "No one's going to be cool with seeing ads -- contextual or otherwise -- displayed with their desktop results."

It also seems to me that, if Microsoft makes the default file and e-mail search on Windows "good enough" for most users -- perhaps by releasing MSN Desktop Search as part of Windows -- most of the opportunity for third-party desktop search applications will evaporate.

Update: Five months later, Copernic kills the deal due to an ongoing SEC investigation of Mamma.

It's an interesting point, particularly since Yahoo started as a web directory.

Google also deemphasized its directory a few months ago. Google Directory is based on DMOZ, the "largest, most comprehensive human-edited directory of the Web." At the time Google deemphasized Google Directory, I thought Google would be releasing a new, automated version of a web directory soon. That hasn't happened.

Keyword web search is great, but there are times when a browseable web directory is really useful, such as when you want a list of related sites, a comprehensive list of sites, or when you're having a hard time specifying a search query that gets you what you need.

Update: Andrew Goodman says, "The lack of a definitive directory or two is the single biggest glaring hole in online search."

Wednesday, November 24, 2004

[Young] focus-group participants declared they wouldn't accept a Washington Post subscription even if it were free. The main reason (and I'm not making this up): They didn't like the idea of old newspapers piling up in their houses.

Don't think for a minute that young people don't read ... They access The Washington Post website or surf Google News, where they select from literally thousands of information sources. They receive RSS feeds on their PDAs or visit bloggers whose views mesh with their own.

In short, they customize their news-gathering experience in a way a single paper publication could never do. And their hands never get dirty from newsprint.

But should newspapers be worried? This trend toward online news is an opportunity. No longer are newspaper articles competing for scarce space on the front page and the limited space available on the newsprint. No longer are articles limited to distribution in localized markets.

The online news audience is massive and worldwide. It's hungry for your content. All you have to do is give it to them.

The company's signature cartoon butler, known as Jeeves, was a symbol of dot-com excess ... "We had great marketing, but the product just didn't deliver," [CEO Steve] Berkowitz admits about Jeeves' early days.

Jeeves was initially known for its gimmick: It promised to answer any query formed in a question. Most of the time, though, Jeeves replied with irrelevant links, sending millions away to alternatives such as Google.

[Acquiring Teoma in 2001] enabled Jeeves to acquire its own search technology and make its search results more relevant to queries ... Jeeves' most profitable move of all [was] deciding to partner with rival Google. Google-placed text ads, which appear atop Jeeves' search results, represent nearly 70% of Jeeves' income.

"We look at the Web differently  at the credibility of a source, as opposed to just the popularity of a site," says Jim Lanzone, Jeeves' senior vice president.

For instance, a search for "Bay Area airports" on Jeeves displays official airport sites for San Francisco, Oakland and San Jose. The same search on Google highlights local newspaper articles about the airports.

The biggest problem I have with Ask Jeeves is the focus on advertising. SiliconBeat illustrates this well with screenshots of the same search on Google, Yahoo, and Ask Jeeves.

On Google, search results are at the top. On Ask Jeeves, advertising (sponsored results) fill the top of the page. Which is more appealing to someone trying to find something?

Ask Jeeves' advertising-focused page may result in higher short-term revenue, but Ask is crippling its long-term growth with its obnoxious and intrusive advertising.

Brian Dennis points to an interesting paper by Brian Whitman and Steve Lawrence, "Inferring Descriptions and Similarity for Music from Community Metadata".

If that title didn't turn you off completely, the paper does have an interesting idea. Basically, they mine text in web pages, discussion groups, and blogs (which they call "community metadata") to discover information about music artists. They extract phrases from the community metadata and use it to find relationships. Because they analyze the web pages and discussion groups continuously, they claim to be able to capture short-term trends, like a groundswell of buzz around a particular song or artist.

This idea of extracting data and relationships from community metadata is clever. AllConsuming.net is an interesting example of this for books. It "watches weblogs for books that they're talking about". Memeorandum is an interesting example for news. It watches blogs to see what news articles they are talking about.
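The core of the Whitman and Lawrence approach -- representing each artist by the terms found near it in web text, then comparing artists by vector similarity -- can be sketched in a few lines. The artists and text below are invented for illustration, and the real paper uses richer phrase extraction than simple word counts:

```python
# Sketch of "community metadata": represent each artist by a term-count
# vector from web text about them, then compare artists by cosine similarity.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Pretend these are terms scraped from pages and posts about each artist.
pages = {
    "artist_a": "moody electronic downtempo chillout electronic",
    "artist_b": "electronic downtempo ambient",
    "artist_c": "country guitar twang",
}
vectors = {artist: Counter(text.split()) for artist, text in pages.items()}

sim_ab = cosine(vectors["artist_a"], vectors["artist_b"])  # overlapping terms
sim_ac = cosine(vectors["artist_a"], vectors["artist_c"])  # no shared terms
```

Because the vectors come from continuously recrawled text rather than static catalog data, recomputing them frequently is what lets the technique pick up short-term buzz.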

By the way, one of the authors of this paper, Steve Lawrence, is now at Google.

Every article on Findory has a link to the appropriate source page just under the title. Each source page has recent articles, related sources, and related articles. Related sources and related articles are a great way for readers to discover interesting news stories and sources.

Raul Valdes-Perez (CEO of Vivisimo) has a CNet article attacking personalized web search and touting the virtues of document clustering.

Raul makes some excellent points on the difficulties of doing personalized web search well. He says people's interests are fleeting and noisy, and that it's difficult to accurately infer interests from clickstream data, which is also noisy and imprecise.

It's true that personalized search is challenging. But Raul's criticism is overstated. If personalized search learns immediately in response to new data, it can react to people's immediate goals and interests, even if they differ from their long-term behavior. If the personalization helps in more cases than it hurts, then the personalization has value, even if the data is noisy and the assumptions made from the data are speculative.

Raul's solution is to give up on personalized search and do document clustering instead. Vivisimo's Clusty is an excellent clustering web search -- if you haven't tried it, go try it, it's great -- but it requires effort. Users have to refine their query repeatedly using the clusters to find what they want.

People are lazy. They want what they want and they want it now. Google recognizes this, providing an "I'm feeling lucky" button that just sends you to the top search result immediately. They recognize that it's better to just find what the searcher wants on the first try, no refining, no effort.

Personalized search offers improvements to relevance rank by recognizing that relevance differs from individual to individual. Personalized search makes it more likely that you find what you need on the first try.

Sunday, November 21, 2004

Robin Sloan produced a clever Flash movie called "EPIC 2004" speculating about the future of personalized news and information. Worth watching.

After a brief recap of events of the last decade, the movie speculates about a future product from Google, the Google Grid, a vast file and content-sharing network that appears to be some combination of Blogger, TiVo, Napster, and the Google cluster. In Robin's vision, this is followed by MSN Newsbotster, a personalized news site that appears to be some combination of Findory, Slashdot, Memeorandum, and social networking tools like Friendster. Next up is Googlezon and EPIC, an "evolving personalized media construct" that provide personalized information by summarizing and rewriting content dynamically for each user.

Summarizing news and documents as described in EPIC is very difficult, but it is an active area of research. One of the most interesting examples out there now is Columbia Newsblaster. Microsoft Research also is doing work in this area.

Aside from the silly brand names of Newsbotster and Googlezon, Robin Sloan has created an interesting and thought-provoking vision of the future. Definitely watch the movie.

The movie ends with criticism of this new world of personalized news and information, complaining that it will be dominated by "narrow, shallow, sensationalist trivia", apparently what Robin Sloan thinks is all people really want and all they'll get from personalized news. He also claims Googlezon and EPIC will cause the death of large and well-respected news organizations like the New York Times.

The death of the New York Times? Clearly hyperbole. At best, Google News is another distribution channel for news. While it may reduce traffic to the front page of online newspapers, it drives traffic to their content, to individual articles. As the CEO of AP said recently, the "content will be more important than its container." News organizations will continue and thrive in the future. The only differences are that content -- the work of talented reporters and writers -- will be emphasized and that the content will be distributed more widely.

Are personalized news sites more shallow or more narrow? Compare a personalized news site to the current front page of CNN. The unpersonalized front page of CNN provides only a shallow view targeting some mishmash of the general interests of millions of readers. By trying to satisfy everyone, it satisfies no one, a bland blend of interests that results in mediocrity. And, I only get the perspective of CNN, what they think is important to their readers.

Personalized news provides an opportunity to broaden readers' interests, exposing them to news sources, perspectives, and viewpoints they otherwise would never have seen. A personalized news aggregator provides both breadth and focus, sorting through huge numbers of sources and articles and helping you find what you need.

Personalized news helps you discover news you would otherwise miss. It makes it easier to get the information you need to be well-informed about the events that impact your life. If this is the future, it is a future which should excite us.

Friday, November 19, 2004

A couple weeks ago, Findory launched search history for web, news, and blog search. As I've said before, search history is not personalized search.

This week, Findory took our first step toward true personalized web search. In subtle and small ways, we are starting to modify web search results based on your history at Findory.com.

To see the impact, do a web search at Findory, then click on one or two of the search results, then do another search for something fairly similar. In cases where we believe we can help, we'll modify and highlight some of the search results.

Here are a couple of specific examples:

Search for "Yahoo".Click on the top link for Yahoo.com.Search for "Dilbert".The Dilbert page at Google a few results down will be highlighted and modestly reranked.

Search for "Incredibles"Click on the IMDB link (fourth down).Search for "Nemo".The IMDB page on Finding Nemo will be highlighted and popped up to the top slot.

Please keep in mind these are our first, early, baby steps. The changes are small, infrequent, and subtle. Findory needs to learn to walk before it can run. Over time, Findory will better understand how to help people find what they need and the changes will become larger and more frequent.

As small as this step may be, we believe it is a first for a commercial web search engine. Many are talking about personalized search, but no one is doing it. Our personalized web search learns from your behavior, modifies your search results, and helps you find what you need.
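The behavior in the examples above could be approximated by something as simple as a domain-affinity boost: results from domains the searcher recently clicked get a small score bump. This is a hypothetical sketch, not Findory's actual algorithm; the function names, scores, and boost value are all made up:

```python
# Hypothetical sketch of domain-based reranking: results from domains the
# user recently clicked get a score boost and a "personalized" flag.
from urllib.parse import urlparse

def rerank(results, click_history, boost=0.5):
    """results: list of (url, base_score); click_history: clicked URLs."""
    clicked_domains = {urlparse(u).netloc for u in click_history}
    reranked = []
    for url, score in results:
        personalized = urlparse(url).netloc in clicked_domains
        reranked.append((url, score + (boost if personalized else 0.0),
                         personalized))
    # Highest adjusted score first.
    reranked.sort(key=lambda r: r[1], reverse=True)
    return reranked

# The searcher clicked an IMDB result earlier...
history = ["http://www.imdb.com/title/incredibles"]
results = [
    ("http://example.com/nemo", 1.0),
    ("http://www.imdb.com/title/nemo", 0.8),
]
top = rerank(results, history)
```

In the "Nemo" example, the IMDB result overtakes an otherwise higher-scoring page and carries a flag the interface could render as the orange personalized icon.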

Update: It took many months, but a new version of Findory personalized web search has launched that makes more substantial changes in the relevance rank.

The Google Kirkland open house party was last night. It was a great time. Quite a turnout, totally packed. Strong UW presence, which wasn't surprising, but I was amazed by the number of Amazon and MSN people there.

Brady Forrest (PM at MSN Search, frequent poster on MSN Search blog) was there. Scott Pitasky (former Amazon.com, now head of HR at MSN). Erik Selberg (author of Metacrawler, one of the first metasearch engines, now at MSN Search). Robert Scoble was apparently there, but I didn't bump into him.

I got a chance to catch up with a few Googlers, Joshua Redstone (old friend from graduate school, works on GFS), Peter Norvig, Jeff Dean. Jeff Dean and I had an interesting discussion about the potential for abuse of MapReduce; I was arguing you might see tragedy of the commons issues because the system makes it so easy to consume vast resources on the Google cluster, but Jeff said everyone plays nice and that it isn't an issue. I was hoping to see Joe Beda, but he couldn't make it, unfortunately. David Krane was there, but I didn't see him.

Bumped into a couple of the Slashcode guys too, Brian Aker and Chris Nandor. Unbelievable that Slashdot uses NFS in a production system, but Brian and Chris insist it's not a serious problem.

I finally got a chance to meet Todd Bishop from the Seattle PI in person. Great to see you there, Todd.

Our lives are overwhelmed by all the information coming at us in a very disorganized way. We're going to hunger for something that will make sense of all the chaos--that will look at all the things happening in the world and filter and order them in a way that's personalized to us. That will be the next great revolution--that is something that doesn't take an index of the dead information on the Net, but the live information of things as they are occurring and as they are relevant to us.

The next great revolution is finding focus and relevance in the flood of new information. The next great revolution is personalized news.

Thursday, November 18, 2004

Adam Bosworth (who left BEA for Google recently) at his ICSOC 2004 talk:

You want to see the future. Don’t look at Longhorn. Look at Slashdot. 500,000 nerds coming together everyday just to manage information overload. Look at BlogLines. What will be the big enabler? Will it be Attention.XML as Steve Gillmor and Dave Sifry hope? Or something else less formal and more organic? It doesn’t matter. The currency of reputation and judgment is the answer to the tragedy of the commons and it will find a way. This is where the action will be. Learning Avalon or Swing isn’t going to matter. Machine learning and inference and data mining will. For the first time since computers came along, AI is the mainstream.

In addition to coining some odd words like "g-nius" and "glog", the article has an interesting piece on Google AdWords and AdSense:

In 2003, Google derived 97% of its revenues from advertising.

Google's simple text-based ad results often segue so well with the search that they can hold actual interest for the user. Even the most grudging critic probably has to acknowledge that there was a time when he or she clicked on one of Google's ads because of its relevance to their interests.

Relevance. That's where Google's got it down, especially with its AdWords and AdSense programs.

Charlene Li at Forrester writes about personalized advertising, which she labels "intent marketing":

Today, publishers announce that they have content and an audience that is attracted to that content -- so if you're a marketer interested in that audience, the publisher will sell you access to those users in the form of advertising at a specified price. The onus falls to the marketer to figure out where the audience is, hence the important role of media buyers and ad agencies.

In the future, marketers will announce that they want to reach a certain segment -- let's say, women in-market for a car -- and are willing to pay $25 per qualified lead. The onus now falls to the publisher to deliver that audience to the marketer. Publishers will be able to see what the "bids" are within the system for a particular user profile and optimize their ad serving to maximize revenue per page.

This is the development of what I call "intent marketing" where the marketer targets intent, in this case, inferred from past behaviors.

I'd like to see this go one step further. I'd like to see the entire process of targeting advertisements handled as an optimization problem.

In this future, marketers create a large pool of advertisements with specific segments in mind for each ad. The advertisements go out on the network of publishers, mostly showing to people who match the segments, but also sampling related segments outside of the marketer's intent. Quickly, the advertisements focus in on narrow clusters of readers who are interested or, if no one seems interested, the advertisements are dropped completely.

I'm not alone in having this vision. Many have talked about it. But it's quite a challenge to implement. It requires a massive amount of data, only possible at scale.

But Google AdWords seems close to doing it. They suggest alternative keywords, show ads for queries that aren't exact matches to the specified keywords, and drop ads that perform poorly. The next step is to use the vast amount of data they have on what ads are effective to start showing ads for other keywords than what was specified and to further narrow the targets when responsive audiences are found.
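One simple way to frame that explore/exploit tradeoff is an epsilon-greedy policy: usually show a segment its best-known ad, occasionally sample another to discover unexpected matches. This is a generic bandit sketch, not how AdWords actually works; the segment, ads, and clickthrough numbers are all invented:

```python
# Sketch of ad targeting as an explore/exploit problem (epsilon-greedy):
# mostly exploit the best-performing ad for a segment, but with small
# probability explore another ad to gather data on it.
import random

def choose_ad(segment, ctr_estimates, epsilon=0.1, rng=random):
    """ctr_estimates: dict segment -> {ad: estimated clickthrough rate}."""
    ads = ctr_estimates[segment]
    if rng.random() < epsilon:
        return rng.choice(list(ads))   # explore: sample any ad for this segment
    return max(ads, key=ads.get)       # exploit: best known ad so far

estimates = {"women_in_market_for_car": {"suv_ad": 0.04, "sedan_ad": 0.02}}

# With epsilon=0.0 the policy is purely greedy and deterministic.
ad = choose_ad("women_in_market_for_car", estimates, epsilon=0.0)
```

The observed clicks from the explore traffic feed back into the estimates, which is how poorly performing ads get dropped and responsive audiences get found.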

Already, I click on Google ads much more than other ads because they're relevant, especially when I do a Google search for a specific product. Perhaps advertising actually can be informative, unobnoxious, and useful.