A year after its launch, the computer-generated aggregator is still taking flak for how it defines news. But Krishna Bharat has had the satisfaction of seeing growing acceptance of his news site. In a Q&A with OJR, he explains how things work behind the screens, and why he calls the site "a force for democracy."

The beta tag still hangs on Google News, but in the year since it launched, the computer-generated news portal has achieved alpha status for millions of users looking for the latest news.

The 2003 Webby Award winner for best news site wasn't dreamed up by marketers looking for a way to extend the ubiquitous Google brand. It started with one avid news consumer trying to manage the flood of information in the days after Sept. 11, 2001.

With his personal news tool, Bharat could read headlines from around the world at a glance, and with a click he could follow the links to the source.

His news portal quickly became an in-house favorite at Google. The search company started a headline service later that year and posted the first beta of Google News in the spring of 2002. The beta now in use launched in September 2002.

The algorithm that defines which news is gathered and where it's posted on the page is still being tweaked. Google has added eight international versions, including India, Canada and France. All nine editions update continuously as the program sifts through millions of pages on more than 4,500 news sites.

Users can search the site and sort the news by date or relevance but have no say in the news front or section-front layouts.

Bharat's tool has taken flak for how it defines news -- including press releases, for example, and excluding many blogs -- and for the mistakes his algorithm can make in playing or selecting news.

Despite these criticisms, the site has become one of the best tools available for keeping tabs on running stories or watching news unfold beyond the usual-suspect array of news sites. The recent addition of breaking news alerts has increased its utility.

It's not the only news search engine, but it is a popular favorite: Google News had 2.24 million unique visitors in August, making it the 17th most popular general news site, according to comScore's Media Metrix.

The following is an edited transcript of a series of phone interviews and e-mail follow-ups with Bharat. Nathan Tyler, a Google spokesman, participated in the interview process.

Online Journalism Review: Walk me through that "ah ha" moment when you thought this was something you wanted to do for yourself?

Krishna Bharat: This is not my first experiment with journalism. When I was a graduate student, I was interested in selecting online news and trying to personalize it, so I had some experience with getting hold of news content and trying to categorize it.

OJR: Was that the Krakatoa Chronicle?

KB: Exactly.

OJR: Explain that, because that was back when Java was still a cup of coffee to most people.

KB: I was excited, because it allowed content on the Web to become dynamic. I was fetching essentially AP stories and trying to patch them into a format that allowed me to match people's actions with the kind of topics they liked. People would spend a lot of time reading [a particular] article, [the site] would remember that, and the next day the newspaper would look different. The layout of the newspaper was customized by people's behavior. But also it was deliberate -- move the slider and [the site] would change from extremely personalized to an extremely general one.

OJR: So you'd had this experience with journalism before with how to use people's actions and interests to determine the appearance. With Google News, you went in the other direction.

KB: I went in the other direction because I was not looking at users. I was looking at news content coming from different sources trying to understand what was common between this article and that article. Were they talking about the same event? How could I detect the fact that they're talking about the same event? Even if they used the same words but had different points of view and one was technical and one wasn't, one addressed the common man and one was business-oriented. These are all the challenges. You remember the way it came about?

OJR: Yes, but not everybody reading this may remember.

KB: After Sept. 11, when all the newspapers were recording who, what, when, where -- there was a big question of why. Why did this happen? What's going to happen in the future? A lot of people were spending a lot of time looking for news, and I was one of them. All the servers were slow and it took a long time to find the content. Fundamentally, I wanted to build a tool that would automate this: Here's a new development, let's find all the articles that talk about this development.

OJR: How many Web sites did you start with?

KB: I started with 20; my list grew to about 200. So the first demo of 20 newspapers was the top names, and I had something that visited each newspaper every hour and checked all the content trying to find out which one was new.

OJR: How would it know what the content was?

KB: There's a whole field of study called "information retrieval," which deals with text analysis -- trying to find which documents match the query, which documents match other documents. So I drew on a lot of technical work that I knew of in order to make this happen. ... I had to bring in a lot of intuition specific to the news domain to try to bring in diverse articles.
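
As a rough illustration of what "which documents match other documents" can mean in practice, here is a minimal sketch of one textbook information-retrieval technique: cosine similarity over word counts. It is not Google's method, which the interview does not describe. The function names, the sample headlines and the 0.5 threshold are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document matches nothing
    return dot / (norm_a * norm_b)


def same_event(text_a, text_b, threshold=0.5):
    """Guess whether two texts cover the same event.

    The threshold is a hypothetical value picked for illustration only.
    """
    return cosine_similarity(term_counts(text_a), term_counts(text_b)) >= threshold


# Made-up headlines for demonstration.
story_a = "Plane quarantined at San Jose airport over SARS fears"
story_b = "SARS fears: plane quarantined at San Jose airport"
story_c = "Coach retires from University of Georgia"

print(same_event(story_a, story_b))  # shares nearly every word
print(same_event(story_a, story_c))  # shares no words
```

Word overlap alone cannot separate two points of view on the same event, which is the harder problem Bharat describes; a sketch like this only shows the starting point.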

OJR: Did you wonder if you had the right to do this with other people's sites?

KB: The nice thing about research is until you actually make a product, you just want to find out if it works. In the long run, the issue of rights had to be addressed. In a sense, it was a no-brainer because what we do fundamentally at Google is we take people to the content -- and this is another way to take people to the content. We don't manufacture content. We don't substitute our content for theirs.

Usually when people come to Google search, they tell us what they're looking for. In the context of news it's unfair to ask them to tell us what they're looking for -- because it's news. It's new. They may not know it's happening, so the burden is on us to tell them what's interesting and new. ...

OJR: In terms of project lifetime, this seems to have moved pretty rapidly.

KB: That's not atypical for Google. We're a young company at heart.

OJR: At what point did you start thinking this moves beyond personal use to become a potential Google application?

KB: We had an internal demo that was updating every hour and people were taking it quite seriously. ... Somebody said, 'Hey, if you could actually categorize this into sections, it would be almost a newspaper.' That wasn't very hard to do.

OJR: How do you define news?

KB: Honestly, I didn't spend a lot of time thinking about it. Reporting and commentary on current events from a verifiable source. I know that's pretty broad. The thing I think is indicative of a news source is that some portion of the world believes it's a news source. Who are we to tell them it's not news? ... Since you asked for my personal definitions of "news" and "news source," these shouldn't be interpreted as Google News or Google Inc.'s definition. Google News has a team of reviewers who decide what should go in our crawl. I provide input but do not make the selections.

OJR: Where do press releases fit in?

KB: Press releases we don't consider to be a news source, that's for sure. Historically, we started out with a search where we believe all information is good to make accessible to people.

We're attuned to journalists. But we're more inclined to listen than to follow the rules blindly. I don't want to go and police all the news out there. I've seen lots of articles where the press release appears verbatim. Do we wait for that to show up hours late, or do we allow people to use it and act on it -- especially when it's a business item?

There are no press releases on the browsable pages or news pages. We have a higher editorial responsibility on those because we're telling you where you should look. On the news pages, we do not intend to use press releases. We would never do anything to compromise the objectivity of the product. We don't even show advertising -- we do this because we think it's useful. Making a press release available as part of the search results gives the full facts that were available to the reporter when they wrote it.

OJR: You can sort of see when a story is hitting the critical mass of interest that pushes it into your news area.

KB: Take the SARS epidemic. When it was just in Hong Kong, it was just a small story from a news point of view. When it came to Canada, it became a much bigger deal. And when a plane was quarantined at San Jose airport, it became a mega story. ... We try to give you more of a global perspective. You may not have heard about as much of SARS [during] the war coverage in the newspaper, but on Google News there is enough diversity to pick up on stories such as this.

OJR: Sometimes this diversity can work against Google News. I'm thinking, for instance, of the morning the shuttle disintegrated and how long it took for the story -- that was instantly the uppermost story for so many people -- to move up the ranks of Google News. And it kept slipping in and out all day.

KB: There were two problems there. One is fundamentally [that] we'd focused on adding value almost as a research tool to bring together different perspectives once a story's fairly mature. We hadn't spent that much time worrying about breaking news. Secondly, for that one particular incident there were some technical problems that got in the way. But the fundamental issue still remains that Google News was originally intended as a tool to put together diversity of opinions beyond who, what, where. We are addressing the problem and soon we hope to be able to have a much fresher response to breaking news. ...

We are fundamentally a big-picture newspaper, but we can't neglect the breaking news aspect and will probably have to get into a different strategy.

OJR: You have taken some criticism for that.

KB: We have. That's certainly the most glaring problem. People have complained about inappropriate images and so forth, but those don't really get in the way of what we're doing. This does.

OJR: You also have another issue where people in the journalism industry seem to go on the defensive when it comes to Google News. When a picture is mismatched or a press release is at the top of the page, people seem to take a sort of delight in Google News being wrong. Especially if they're in the business, they like to say, "See, you can't replace a human editor."

KB: I notice that. If it happens all the time, I think they have a very good point. If it happens in one edition out of 100, it's misrepresentation to make it sound like it's the norm. That's one problem. The second thing is I think they take it personally -- and that was never the intention. ...

OJR: I think they resent the idea that a computer can do their work sometimes.

KB: It wouldn't be there at all without these editors in the first place. We've been dragged into a comparison we never intended.

OJR: I've seen Google News all along as complementary, as a way of bringing more voices to the mix, more information to my fingertips and also distributing an audience back to sites that may not be the first place you would go.

KB: You might not even know that site exists.

OJR: Or that they're covering that particular story in a certain way. That's the plus side of it. Some of the possible minuses: weight. Sometimes I think that with Google Search we're so used to seeing things ranked in the order of their potential value to us, that it's a little jarring when Google News doesn't seem to do the same thing.

KB: We are ranking it, but maybe not by the same criteria as the general search. What's an example?

OJR: Here's an example somebody else threw out -- [when] Coach Jim Harrick retired [from the University of Georgia] under a cloud of smoke and scandal. This [was] very highly covered on the national level, but very intensely on the local level. And ... somebody pointed out from the newspaper in Athens, Ga.: You would think when you go to Google News, you might see the stories that would give you the most information about this, and you might start with the local coverage. But instead, what he saw was something like two pages of AP stories. He does have a point. You would expect that the Athens Banner-Herald and The Red and Black and the Atlanta Journal-Constitution would have the most in-depth information.

KB: Two possibilities. One is there was a technical problem with the way things work and it didn't get to be in the running. The second possibility is that we had a number of other criteria to use and we emphasized one. We may look at how popular the source was in general, how well the content matches, at the time it appeared. Sometimes the local newspaper has the first article and everybody else comes later.

In general, we use a number of techniques. One of them is the fact that the newspaper is local, but we can't overemphasize that. We throw all of these different criteria in the mix. A human would probably say, "Let's emphasize this one," but the computer is after all a machine, and it has limitations. In many other cases people have commented to me, "Oh, you featured a local news item as No. 1 and it makes perfect sense. That's great." There is a component of that in our ranking, but we can't get every cluster right all the time. But we try our best.

There's a limit to how much we can do that (rank local stories above wire stories). Imagine if all the Iraqi stories were like that -- that would be terrible. We have to draw a line. That's why editors are wonderful. If we had an editor looking at every story, they could make the right judgment every time. We have so many issues to balance, sometimes we get it wrong. We audit the site to make sure we're getting it right -- and very often we are.

The thing that you must remember: It's hard for a computer to tell when an article has something that's fundamentally different from other articles, to find that nugget that's different. So we have to generally emphasize freshness just to make sure we find that nugget. ... We also want to have the most reputable sources. We have all these things that are tugging at us. One solution for that is to browse over the news for many editions.
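
To make the idea of throwing several criteria "in the mix" concrete, here is a minimal sketch of a weighted-sum ranking. Bharat names the kinds of signals involved (source popularity, content match, timing, whether the source is local) but not how they are combined, so the formula, the field names and the weights below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    source_popularity: float  # 0.0 to 1.0, how widely read the source is
    content_match: float      # 0.0 to 1.0, how well the text fits the story
    freshness: float          # 0.0 to 1.0, higher means more recent
    is_local: bool            # True if the source is local to the event


# Hypothetical weights, chosen only for illustration. Local coverage gets
# a small bonus so that it helps without overriding the other signals.
WEIGHTS = {
    "source_popularity": 0.3,
    "content_match": 0.3,
    "freshness": 0.3,
    "local": 0.1,
}


def score(article, weights=WEIGHTS):
    """Combine the signals into a single number with a weighted sum."""
    return (
        weights["source_popularity"] * article.source_popularity
        + weights["content_match"] * article.content_match
        + weights["freshness"] * article.freshness
        + weights["local"] * (1.0 if article.is_local else 0.0)
    )


def rank(articles, weights=WEIGHTS):
    """Return the articles ordered from highest to lowest score."""
    return sorted(articles, key=lambda a: score(a, weights), reverse=True)


# Made-up articles for demonstration.
fresh_local = Article("Fresh local story", 0.4, 0.9, 0.9, True)
stale_local = Article("Stale local story", 0.4, 0.9, 0.1, True)
wire = Article("Wire story", 0.9, 0.8, 0.5, False)

for article in rank([wire, stale_local, fresh_local]):
    print(article.title, round(score(article), 2))
```

With these invented numbers, the fresh local story edges out the wire story while the stale one falls below it, which mirrors the trade-off in the answer above: locality counts, but it cannot be overemphasized.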

OJR: A human editor wouldn't be able to do what this is doing.

KB: Because we get 100,000 articles a day. A human editor couldn't read that many. We have people who try to create an aggregate of what's been done in the media on a given topic and they write a report about it. Journalists do that all the time, and they do an extremely good job. But imagine doing that for every story in the world, every time. We want to give you speed in addition to timeliness.

OJR: Another aspect that your average user might not be attuned to even though it says it right at the top is -- no matter how many sources you have -- it's not complete. It's selective. You're not claiming to access all the news sites, and there are times when you're going to come online and think there may be certain publications in the mix and find they're not there.

KB: We are only able to crawl sites that allow us to crawl. Any news search that tries to link you to new content is going to come up against a barrier -- either they specify that robots are not allowed to access this site, or they put the content behind registration that the machine cannot get by. This is a fundamental issue. It has to do with how people monetize their content. ... The news community needs to figure out how they're going to get traffic from us. The New York Times has a nice solution. They allow us to connect to the content and send traffic to one page. If people want to browse beyond that then they have to register. If people are really happy with the content they'll register. I think that's a great model.
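
Sites usually tell robots to stay out through a robots.txt file. As a small sketch of how a well-behaved crawler might honor one, the example below uses the `urllib.robotparser` module from Python's standard library. The crawler name, the URLs and the robots.txt contents are invented for the example and have no connection to Google's actual crawler.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that closes off one directory to all robots.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if the robots.txt rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleNewsBot" and news.example.com are placeholders.
print(may_crawl(EXAMPLE_ROBOTS_TXT, "ExampleNewsBot",
                "https://news.example.com/today/story.html"))
print(may_crawl(EXAMPLE_ROBOTS_TXT, "ExampleNewsBot",
                "https://news.example.com/archive/2003/story.html"))
```

The second barrier Bharat mentions, content behind registration, has no equivalent check: a crawler that cannot log in simply never sees the page.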

OJR: What kind of response have you had from journalists?

KB: I think the first reaction that everybody had was, "Wow, this is a cool tool." They didn't know it could be done. Then afterwards there was this whole issue of, "Are you trying to be competitive?" Lately all of that has settled down. Newspapers are trying to make sure their stories are well represented.

OJR: Do you have people who ask you, "How can we make our story the lead story?"

KB: No, because the lead story is hard to define because it keeps changing every 15 minutes. They will ask us why our coverage is not as much as they would like. Usually the answer is it's exactly as much as comparable sources are getting. ... We don't usually change our ranking, but if the problem is more fundamental -- like if we fail to crawl their content -- we would certainly act on that.

OJR: So where's the list of all your resources?

KB: We don't disclose that list for competitive reasons. We've never made that public.

OJR: What percent of the sites are domestic and foreign?

KB: Google is international. I don't have the number off the top of my head, but there's a huge number of sites from the U.S. -- I'm guessing 50%. There's a huge bias towards English-speaking countries. It's a bit misleading, because a lot of sources just run wire stories.

OJR: That takes me back again to the issue of weight. We talked about how stories end up at the top of the list. I've also heard some comments about the repetitive nature of some of the stories, about the duplicates that show up. What are you doing to deal with the fact that the first three pages some people might see could be the same AP story?

KB: On the home page, we make a conscious effort not to duplicate. On the results page, you're supposed to get the non-duplicates first and the duplicates afterwards. On the home page, we try to be a little bit more picky. After you run out of all the original stories we've detected, you'll start seeing duplicates.
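
The ordering Bharat describes for results pages, non-duplicates first and duplicates afterwards, can be sketched as follows. This assumes the simplest possible test, that two articles are duplicates when their text is identical after normalizing case and whitespace. The interview does not say how Google News detects duplicates, and the function names and sample data are invented.

```python
import hashlib


def fingerprint(body):
    """Hash the article text after normalizing case and whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def originals_first(articles):
    """Reorder (source, body) pairs so repeated bodies come last.

    The first article seen with a given body counts as the original;
    later copies of the same body are treated as duplicates.
    """
    seen = set()
    originals, duplicates = [], []
    for source, body in articles:
        key = fingerprint(body)
        if key in seen:
            duplicates.append((source, body))
        else:
            seen.add(key)
            originals.append((source, body))
    return originals + duplicates


# Made-up results: two sites running the same wire copy, one original story.
results = [
    ("Site A", "Coach retires amid scandal."),
    ("Site B", "Coach  retires amid SCANDAL."),
    ("Site C", "Local reaction to the coach's retirement."),
]

for source, _ in originals_first(results):
    print(source)
```

Real wire copy is often lightly edited by each outlet, so exact matching like this would miss many duplicates; it is shown only to make the ordering rule concrete.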

OJR: Part of what we'll see has to do with when we're doing the search?

KB: If it just happened, we might only have duplicates to show you.

OJR: At what point do you -- or do you -- take the power of Google News and give it to people to personalize?

KB: That's an excellent question. At some point we will address that. Right now we have a lot on our plate. People have asked for personalization. We have nothing planned at this point though.

OJR: What could personalization do? What would be the positives?

KB: The positives for personalization: You'd be able to say, "I like baseball, not cricket," -- and only get baseball. In the extreme, that kind of puts you in a shell where you only see what you like. But between that extreme and where we are now, there's a lot of space.

There are lots of possibilities. The kind of things people like to personalize -- geographically, particular sport, celebrity, or company or industry. They might just be random searches they happen to like or all of the above. Anyway, we have no immediate plans.

We've been responsive to what has been asked thus far and we have a very diverse pool of users. We want to create a newspaper that's suitable for everyone. Personalization is a much grander challenge.

OJR: Is there a chance that this would never come out of beta?

KB: No. Beta certainly means something. It basically means that our design is in flux, that we are evaluating the design and we are trying to make the engineering work. Before long we will have enough stability in engineering and we will have defined the pattern to the extent that we can take it out of beta. That's not to say we won't make any changes after that. We are still evolving.

OJR: Who are your competitors?

KB: Other news search engines. We don't think of newspapers as competition because I think we have a complementary relationship with newspapers.

OJR: What about the portals?

KB: We send a lot of traffic to them. I see them more as partners.

OJR: If people were to pay you to be part of this or were to be in partnership with you in a revenue-producing way ...

KB: Then that would compromise our objectivity.

OJR: Is it important that this be a part of any financial model going forward?

KB: To me personally?

OJR: As the creator.

KB: I want this to be a force for democracy. I want us to be an honest broker, and I want newspapers featured on our site to get traffic from us. ... There's never been a more controversial time on the planet. I think it's great to be a news source at this point because there's so much hunger for news. You see a lot more diversity in the news coverage on our site than on others. I think the diversity is a mirror to the diversity of opinion there is worldwide. One of the things that makes us objective is we show all points of view. Even if you disagree with one, we give you both -- the majority and the minority point of view. The ones you don't agree with are education. It's nice to know what the other side is thinking. You'll see left-leaning ones as much as you see right-leaning ones. Frankly, the software doesn't know the difference between left and right, which is good.

OJR: You could train it to if you wanted.

KB: Yes, but this is code that is set down and is on the public record. We are very proud of what we are doing.

OJR: What you're trying to do is give people a way to go into a story from as many different possible approaches as they can.

KB: Even within Google, people have different political leanings. Even if we did want to bias it, fundamentally we are committed so strongly to objectivity we couldn't possibly do it. I think no matter what political association you belong to, it's valuable to see what the other side is saying.

OJR: You took an idea that was meant to help you. Now, how does it make you feel to know that people are coming to this site during difficult times and getting what they need from it?

KB: I think it's wonderful. I think there's a profound sense of satisfaction. It's almost like this is the contingency we planned for when there's so much turmoil and difference. People are at odds with understanding each other's point of view. I think we are helping the cause of democracy.