Sunday, September 30, 2007

Update: The shut down of Findory has been rescheduled to occur on November 22, 2007.

Update: The shut down of Findory has been postponed. The website will remain active past November 1, 2007. More information when I can.

Findory will be shutting down on November 1, 2007. I posted the following on the site:

Thank you, Findory readers, for the last four years. I hope you not only found interesting news on Findory you otherwise would have missed, but also, just by using Findory, took pleasure in helping others find news they might enjoy.

Sadly, all good things must come to an end. On November 1, 2007, Findory will be shutting down.

It was a wonderful experiment building Findory. Information personalization is in our future.

Some day, online newspapers will focus on your interests, building you your own unique, customized front page of news. Some day, search engines will learn from what you do to help you find what you need. Some day, your computer will adapt to your needs to help with the task at hand. Some day, information overload will be tamed.

But not today. Findory will be shutting down on November 1. The website will no longer carry news, blogs, videos, podcasts, or favorites. The daily e-mails will cease. To ease the transition for users of Findory Inline and the Findory API, empty feeds will be served for a couple weeks into November.

I am sorry to see Findory go. Though Findory will not be the one doing it, I continue to believe that the future will be personalized.

Analyst Henry Blodget calls Yahoo's declining Comscore numbers "an absolute disaster", says the company is a "train wreck", and predicts that "if the company can't reverse this trend in short order, its only hope will be to sell itself."

Yowsers. It's not all that bad, is it?

Like Microsoft, Yahoo may have been a feeble competitor lately, often appearing distracted, slow to react, and unable to do more than follow in most areas.

Yet, like Microsoft, Yahoo remains a giant in the field. In fact, not only does Yahoo have second place market share in many Web products, but also, according to at least one study, Yahoo has the strongest search brand around.

Yahoo's biggest problem at the moment seems to be that they try to do too much and end up doing nothing well. Even in their core business of advertising, they are playing second fiddle to Google. There's no reason for people to use the second best when switching costs are so low, but Yahoo seems bizarrely content to stay behind the leader even as it sees its audience trickling away.

There are as many opinions on what Yahoo should focus on as Yahoo has products, but, I have to say, I am amazed Yahoo has not done more with personalization.

Yahoo has hundreds of millions of signed in users with long histories of what they wanted and enjoyed. Rather than chasing Google's tail, they could lead in personalized advertising, search, e-mail, and news. Yahoo could use their knowledge of what their users need to focus attention, surface relevant information, and be as helpful as possible.

It is something Yahoo could do better than anyone else and would make Yahoo different. As Jeff Bezos used to say about Amazon, when shoppers try other stores, the experience should seem "hollow and pathetic". "Why doesn't this store know me?", shoppers should ask. "Why doesn't it know what I want?"

Yahoo should be the same way. Yahoo should know you. Elsewhere, the experience should feel vaguely unpleasant, like jumping from talking to your friends to being alone in a group of strangers. "Why is this site showing me that?", they should ask. "Doesn't it know I don't like that?"

Friday, September 28, 2007

[After a long break, I am returning to my "Starting Findory" series, a group of posts about my experience starting and building Findory.]

From the beginning of Findory, I was obsessed with performance and scaling.

The problem with personalization is that it breaks caching, the most common strategy for scaling. When every visitor sees a different page, there are far fewer good opportunities to cache.

No longer can you just grab the page you served the last guy and serve it up again. With personalization, every time someone asks for a page, you have to serve it up fresh.

But you can't just serve up any old content fresh. With personalization, when a visitor asks for a web page, first you need to ask, who is this person and what do they like? Then, you need to ask, what do I have that they might like?

So, when someone comes to your personalized site, you need to load everything you need to know about them, find all the content that person might like, rank and lay out that content, and serve up a pipin' hot page. All while the customer is waiting.

Findory works hard to do all that quickly, almost always in well under 100ms. Time is money, after all, both in terms of customer satisfaction and the number of servers Findory has to pay for.

The way Findory does this is by pre-computing as much of the expensive personalization as it can. Much of the task of matching interests to content is moved to an offline batch process. The online personalization task, the part done while the user is waiting, is reduced to a few thousand data lookups.

Even a few thousand database accesses could be prohibitive given the time constraints. However, much of the content and pre-computed data is effectively read-only data. Findory replicates the read-only data out to its webservers, making these thousands of lookups lightning fast local accesses.
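As a rough illustration of that offline/online split (all names, numbers, and the similarity signal here are hypothetical, not Findory's actual code):

```python
from collections import Counter

def precompute_related(cooccurrence):
    """Offline batch job: for each article, rank the other articles most
    often read by the same readers (a simple item-to-item signal)."""
    related = {}
    for article, counts in cooccurrence.items():
        related[article] = [a for a, _ in Counter(counts).most_common(10)]
    return related  # replicated read-only to every webserver

def personalize(history, related, num_results=5):
    """Online step, done while the user waits: a few lookups in the
    read-only table and a cheap merge, no expensive computation."""
    scores = Counter()
    for article in history:
        for rank, candidate in enumerate(related.get(article, [])):
            if candidate not in history:
                scores[candidate] += 10 - rank  # simple rank-based weight
    return [a for a, _ in scores.most_common(num_results)]

related = precompute_related({
    "a1": {"a2": 5, "a3": 2},
    "a2": {"a1": 5, "a4": 3},
})
print(personalize(["a1", "a2"], related))  # ['a3', 'a4']
```

The point of the sketch is where the cost lands: all the expensive matching happens in the batch job, and the per-request work is just dictionary lookups and a small merge.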

Read-write data, such as each reader's history on Findory, is in MySQL. MyISAM works well for this task since the data is not critical and speed is more important than transaction support.

The read-write user data in MySQL can be partitioned by user id, making the database trivially scalable. The online personalization task scales independently of the number of Findory users. Only the offline batch process faced any issue of scaling as Findory grew, but that batch process can be done in parallel.
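The partitioning by user id can be as simple as this sketch (a toy; real systems need to handle resharding, but the idea is that each user's data lives on exactly one shard):

```python
NUM_SHARDS = 4  # hypothetical; grows with the user base

def shard_for(user_id: int) -> int:
    """Map a user id to a database shard. All reads and writes for a
    given user hit the same shard, so shards scale independently."""
    return user_id % NUM_SHARDS

# e.g. connection = connections[shard_for(user_id)]
print(shard_for(12345))  # shard 1
```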

In the end, it is blazingly fast. Readers receive fully personalized pages in under 100ms. As they read new articles, the page changes immediately, no delay. It all just works.

Even so, I wonder if I have been too focused on scaling and performance. For example, there have been some features in the crawl, search engine, history, API, and Findory Favorites that were not implemented because of the concern about how they might scale. That may have been foolish.

The architecture, the software, the hardware cluster, these are just tools. They serve a purpose, to help users, and have little value on their own. A company should focus on users first and infrastructure second. Despite the success in the design of the core personalization engine, perhaps I was too focused on keeping performance high and avoiding scaling traps when I should have been giving readers new features they wanted.

Wednesday, September 26, 2007

An enjoyable WWW 2007 paper out of UC Santa Cruz, "A Content-Driven Reputation System for the Wikipedia" (PDF), builds on a great but simple idea: High quality authors usually do not have their Wikipedia edits reversed.

From the paper:

In our system, authors gain reputation when the edits they perform to Wikipedia articles are preserved by subsequent authors, and they lose reputation when their edits are rolled back or undone in short order.

Most reputation systems are user-driven: they are based on users rating each other's contributions or behavior ... In contrast, ... [our] system ... requires no user input ... authors are evaluated on the basis of how their contributions fare.

A content-driven reputation system has an intrinsic objectivity advantage over user-driven reputation systems. In order to badmouth (damage the reputation of) author B, an author A cannot simply give a negative rating to a contribution by B. Rather, to discredit B, A needs to undo some contribution of B, thus running the risk that if subsequent authors restore B's contribution, it will be A's reputation, rather than B's, to suffer, as A's edit is reversed. Likewise, authors cannot simply praise each other's contributions to enhance their reputations: their contributions must actually withstand the test of time.
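The update rule in the excerpt might be sketched like this (a toy version; the paper's actual formula is more involved and, in particular, weighs each judgment by the reviewing author's own reputation):

```python
# Toy sketch of content-driven reputation: an author gains reputation
# when an edit survives subsequent authors, and loses reputation when
# the edit is rolled back in short order.

def update_reputation(reputation, author, edit_survived, step=1.0):
    reputation[author] = reputation.get(author, 0.0) + (step if edit_survived else -step)
    return reputation

rep = {}
update_reputation(rep, "alice", edit_survived=True)
update_reputation(rep, "alice", edit_survived=True)
update_reputation(rep, "bob", edit_survived=False)
print(rep)  # {'alice': 2.0, 'bob': -1.0}
```

Note how the incentive structure falls out of the rule: the only way to hurt someone's score is to revert their edit, which exposes you to the same penalty if your revert is itself reverted.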

A fun demo of the technique is available that colors the text of some Wikipedia articles based on the reputation of the authors, providing some measure of how trustworthy particular passages of text might be.

It is curious how this simple but clever technique seems less susceptible to gaming. I was trying to think of ways the system could be manipulated -- Would people retaliate for having their edits reversed? Would they make lots of non-controversial but useless edits to increase their reputation? -- but these and other obvious attacks seem like they might have a fairly high risk of damaging your own reputation as people caught on and reversed the changes.

I also was trying to think how this might be applied elsewhere. For example, on eBay, rather than have sellers and buyers rate each other with inane things like "A++++!", perhaps eBay seller reputation could be determined instead by how often the transaction is reversed? What if eBay implemented a 30-day unconditional return policy on all transactions, then reported buyer reputation based on payment rate and seller reputation based on return rate?

There is an interesting group of papers in the KDD Cup 2007 on the Netflix Prize recommender contest. The full papers are particularly good and detail techniques that are doing well in the contest, but also don't miss the much lighter introductory paper, "The Netflix Prize" (PDF), and its discussion and charts of the progress on the prize.

On a related note, you might also be interested in Simon Funk's enjoyable write-ups ([1][2][3]) on his explorations into using SVD for the Netflix contest. I particularly liked his focus on the speed of and twiddles to SVD.

By the way, I am no longer playing with the contest, but I should admit that I never got anywhere near the performance of these contenders. For more on that, see my post, "Latest on the Netflix prize". Interesting that my prediction in that post -- that additional data on the movies from another source will be necessary -- so far has turned out to be wrong.

Microsoft Research is offering $1M in research grants for "research in the area of Semantic Computing, Internet Economics and Online Advertising".

A juicy dataset is provided for that research, a "Microsoft adCenter Search query log excerpt with 100 million search queries along with ad click logs sampled over a few months, and a Live Search query log excerpt with 15 million search queries with per-query search result clickthroughs."

Sadly, the grants and access to the data are open only to academic research groups. I suppose that is to be expected after the ruckus that followed the now-defunct AOL Research group's naive attempt to offer their search logs more widely. I guess non-academics will just have to buy search logs from ISPs.

By the way, I thought it was amusing, when looking over the terms of Microsoft's request for proposals, to see that participants are discouraged from "relying exclusively on non-Microsoft technologies." Ah, yes, what good is research if it doesn't use Microsoft products?

As the second of the two papers mentions, there are some amusing examples of where people direct their attention. Here in Seattle, a "small, very bright point on the shore of Lake Washington points out Bill Gates' house."

Update: One month later, the NYT reviews a commercial device called the Dash Express that apparently "broadcasts information about its travels back to the Dash network", allowing users to "warn each other through the network [anonymously] the second they hit a traffic slowdown."

Update: Two months later, Saul Hansell at the NYT posts about the Yahoo Inbox 2.0 project, an extension to Yahoo Mail "that can automatically determine the strength of your relationship to someone by how often you exchange e-mail and instant messages with him or her" and displays "other information about your friends ... much like the news feed on Facebook."

As you might expect in an online app, the focus appears to be on collaboration, sharing, and virtual conferencing (using chat and synchronized online viewing of the presentation).

Stepping back and looking at the bigger picture here, I find myself getting to the point where my entire day is spent in the browser. Even on machines where I have Microsoft Office installed, I often find it faster to quickly view documents using the GMail integration with Google Docs than open other applications.

I was skeptical that Google would get us to that point, but they have. Google appears to be making remarkable progress chipping away at the utility of a desktop PC environment.

Monday, September 17, 2007

Bob Cringely's latest column proposes that Google spend several billion to buy the 700 MHz band, sell "Google Cubes" that act as small fileservers, WiFi points, and the mesh of a 700 MHz network, and then "overnight" become the "biggest and lowest-cost ISP" and the "biggest and lowest-cost mobile phone company" while "dominating local- and location-based search".

Friday, September 14, 2007

Security research guru Ross Anderson has a talk up on Google Video, "Searching for Evil", that, among other things, surveys some of the more unusual Web-based financial schemes.

If you only have a few minutes, jump to 20:23 to check out Ross' frightening examples of some phishing-like schemes that are popping up on the web. The first example shows how people recruit mules on the Web to sit in the middle of a fraudulent financial transaction, with the person who accepted a too-good-to-be-true job offer getting badly screwed in the end.

If you have more time to dive in deeper and watch the whole thing, I enjoyed Ross' discussion at the beginning of the talk about using evolutionary game theory in simulations of network attacks. He refers to a WEIS 2006 paper, "The topology of covert conflict" (PDF), for more details. That paper starts to "build a bridge between network science and evolutionary game theory" and to "explore ... sophisticated [network] defensive strategies" including "cliques ... the cell structure often used in revolutionary warfare" which turn out to be "remarkably effective" for defending a network against adaptive attackers.

Similarly, though not mentioned in his talk, Ross has a ESAS 2007 paper, "New Strategies for Revocation in Ad-Hoc Networks" (PDF) which looks at how to "remove nodes that are observed to be behaving badly" from ad-hoc networks. They come up with a remarkable conclusion that "the most effective way of doing revocation in general ad-hoc networks is the suicide attack ... [where] a node observing another node behaving badly simply broadcasts a signed message declaring both of them to be dead."

Wednesday, September 12, 2007

The Netflix Prize leaderboard continues to be a fascinating proof of the value of experimentation when working with big data.

The top entries include teams of graduate students from around the world, with eastern Europe particularly well represented. The second best entry at the moment is from Princeton undergraduates (kudos, Dinosaur Planet).

Some of the teams disclose information about their solutions, enough to make it clear that the teams are playing with a wide variety of techniques.

I love the "King of the Hill" approach to these kinds of problems. There should be no sacred cows, no egos preventing people from trying and testing new techniques. From the seasoned researcher to the summer intern, anyone should be able to try their hand at the problem and build on what works.

Tuesday, September 11, 2007

According to Philipp Lenssen, an internal Google talk with confidential information on Google Reader was briefly available on Google Video.

Philipp posts a summary from someone referred to as "Fanboy". Ionut Alex Chitu also posts two summaries ([1][2]) of the content of the talk. Worth a look.

There is mention of planned social sharing features, details on the internal operations of Google Reader, and various statistics on feeds and feed reading. It also sounds like they plan on launching feed recommendations soon.

Impressive that the team working on Google Reader is so small, just seven people.

Filip Radlinski and Thorsten Joachims had a paper at KDD 2007, "Active Exploration for Learning Rankings from Clickthrough Data" (PDF), with a good discussion of strategies for experimenting with changes to the search results to maximize a search engine's ability to learn from clickstream data.

Some excerpts:

[When] learning rankings of documents from search engine logs .... all previous work has only used logs collected passively, simply using the recorded interactions that take place anyway. We instead propose techniques to guide users so as to provide more useful training data for a learning search engine.

[With] passively collected data ... users very rarely evaluate results beyond the first page, so the data obtained is strongly biased toward documents already ranked highly. Highly relevant results that are not initially ranked highly may never be observed and evaluated.

One possibility would be to intentionally present unevaluated results in the top few positions, aiming to collect more feedback on them. However, such an ad-hoc approach is unlikely to be useful in the long run and would hurt user satisfaction.

We instead introduce ... changes ... [designed to] not substantially reduce the quality of the ranking shown to users, produce much more informative training data and quickly lead to higher quality rankings being shown to users.

The strategy they propose is to come up with some rough estimate of the cost of ranking incorrectly, then twiddle with the search results in such a way that the data produced will help us minimize that cost.
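To give a flavor of such twiddles, here is a much simpler cousin of what the paper proposes (this is not their actual algorithm): randomly swapping disjoint adjacent pairs in the ranking barely changes what the searcher sees, but a click on the lower result of a swapped pair yields a pairwise preference that is far less position-biased than passively logged clicks.

```python
import random

def present_with_swaps(ranking, swap_prob=0.5, rng=random):
    """Swap each disjoint adjacent pair with some probability before
    showing the results, so clicks give position-bias-free pairwise
    preferences while the ranking shown stays close to the original."""
    presented = list(ranking)
    i = 0
    while i + 1 < len(presented):
        if rng.random() < swap_prob:
            presented[i], presented[i + 1] = presented[i + 1], presented[i]
        i += 2  # move to the next disjoint pair
    return presented

random.seed(0)
print(present_with_swaps(["d1", "d2", "d3", "d4"]))
```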

There are a bunch of questions raised by the paper that could use further discussion: Is the loss function proposed a good one (in particular, with how it deals with lack of data)? How do other loss functions perform on real data? How much computation does the proposed method require to determine which experiment to run? Are there simpler strategies that require less online computation (while the searcher is waiting) that perform nearly as well on real data?

But, such quibbles are beside the point. The interesting thing about this paper is the suggestion of learning from clickstream data, not just passively from what people do, but also actively by changing what people see depending on what we need to learn. The system should explore the data, constantly looking for whether what it believes to be true actually is true, constantly looking for improvements.

On a broader point, this paper appears to be part of an ongoing trend in search relevance ranking away from link and text analysis and toward analysis of searcher behavior. Rather than trying to get computers to understand the content and whether it is useful, we watch the people who read the content and look at whether they found it useful.

People are great at reading web pages and figuring out which ones are useful to them. Computers are bad at that. But, people do not have time to compile all the pages they found useful and share that information with billions of others. Computers are great at that. Let computers be computers and people be people. Crowds find the wisdom on the web. Computers surface that wisdom.

See also my June 2007 post, "The perils of tweaking Google by hand", where I discussed treating every search query as an experiment where results are frequently twiddled, predictions made on the impact of those changes, and unexpected outcomes result in new optimizations.

Monday, September 10, 2007

The paper describes finding similar songs using very short audio snippets from the songs, a potential step toward building a music recommendation system. The similarity algorithm used is described as a variant of locality-sensitive hashing.

First, an excerpt from the paper describing LSH:

The general idea is to partition the feature vectors into subvectors and to hash each point into separate hash tables... Neighbors can be [found by] ... each hash casting votes for the entries of its indexed bin, and retaining the candidates that receive some minimum number of votes.

What is interesting (and a bit odd) about this work is that they used neural networks to learn the hash functions for LSH:

Our goal is to create a hash function that also groups "similar" points in the same bin, where similar is defined by the task. We call this a forgiving hash function in that it forgives differences that are small.

We train a neural network to take as input an audio spectrogram and to output a bin location where similar audio spectrograms will be hashed.

A curious detail is that initializing the training data by picking output bins randomly worked poorly, so instead they gradually change the output bins over time, allowing them to drift together.

The primary difficulty in training arises in finding suitable target outputs for each network.

Every snippet of a song is labeled with the same target output. The target output for each song is assigned randomly.

The drawback of this randomized target assignment is that different songs that sound similar may have entirely different output representations (large Hamming distance between their target outputs). If we force the network to learn these artificial distinctions, we may hinder or entirely prevent the network from being able to correctly perform the mapping.

Instead of statically assigning the outputs, the target outputs shift throughout training ... We ... dynamically reassign the target outputs for each song to the [bin] ... that is closest to the network's response, aggregated over that song's snippets.

By letting the network adapt its outputs in this manner, the outputs across training examples can be effectively reordered to avoid forcing artificial distinctions .... Without reordering ... performance was barely above random.
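The reordering step they describe might be sketched like this (names, codes, and the distance measure are illustrative, not the paper's): after an epoch, each song's target is reassigned to whichever candidate bin code lies closest to the network's mean output over that song's snippets.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def reassign_targets(mean_outputs, bin_codes):
    """mean_outputs: song -> network's mean output over its snippets.
    bin_codes: list of candidate bin codes.
    Returns song -> index of the nearest bin code, the song's new target."""
    return {
        song: min(range(len(bin_codes)),
                  key=lambda i: squared_distance(bin_codes[i], out))
        for song, out in mean_outputs.items()
    }

bin_codes = [(0, 0), (0, 1), (1, 0), (1, 1)]
mean_outputs = {"song_a": (0.9, 0.2), "song_b": (0.1, 0.8)}
print(reassign_targets(mean_outputs, bin_codes))  # {'song_a': 2, 'song_b': 1}
```

Letting targets drift toward whatever the network already outputs is what avoids forcing artificial distinctions between similar-sounding songs that happened to draw distant random targets.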

I have not quite been able to make up my mind about whether this is clever or a hack. On the one hand, reinitializing the target outputs to eliminate biases introduced by random initialization seems like a clever idea, perhaps even one that might have broad applicability. On the other hand, it seems like their learning model has a problem in that it does not automatically learn to shift the outputs from their initial settings, and the reordering step seems like a hack to force it to do so.

In the end, I leave this paper confused. Is this a good approach? Or are there ways to solve this problem more directly?

Perhaps part of my confusion is my lack of understanding of what the authors are using for their similarity metric. It never appears to be explicitly stated. Is it that songs should be considered similar if the difference between their snippets is small? If so, is it clear that is what the NNet is learning?

Moreover, as much as I want to like the authors' idea, the evaluation only compares their approach to LSH, not to other classifiers. If the goal is to minimize the differences between snippets in the same bin while also minimizing the number of bins used for snippets of any given song, are there better tools for that task?

Starting at 03:52 in the talk, Nicholas begins describing how he thinks the desktop should work, listing four categories of user goals: Search, Summon, Browse, and Act.

Search is a fast, comprehensive, and easy-to-use desktop search tool, such as Google Desktop. This may not sound new but, amazingly, this has only recently begun to be a common part of the desktop experience.

Summon is using desktop search for navigation. You know something exists, you just want to get back to it immediately.

Browse may sound close to what the desktop does now, but Nicholas seems to mean browse not as navigating a folder hierarchy but as finding objects related to or near other objects. For example, you might not remember the name of the specific song you want, but you might be able to remember the artist who wrote it; getting to the first allows you to recall the second.

Act is when you want to immediately do a task (e.g. play a music track) without any intervening steps such as opening an application.

Note the deemphasis of the traditional file hierarchy, the focus on objects, and the shift away from applications and toward actions on objects.

The desktop should seek to satisfy our goals immediately. We should not have to start to adjust the lighting in a photo album by navigating a hierarchical menu, locating an application that allows you to edit photos, waiting for the application to load, and then opening the files using the open menu in that application. We should just ask to adjust the lighting in a photo album.

The next few minutes of the talk further break down some of the constraints on the desktop metaphor. Nicholas advocates fast, universal access that ignores the boundaries of the machine, reaching out to the network to whatever data and code is needed to act. The focus should be on the task -- getting work done -- using whatever resources are necessary, requiring as little effort as possible.

The vision is fantastic and inspiring. However, while Quicksilver is an interesting example, from what I saw, it appears to be only a baby step toward these lofty goals. The learning and automation appears primitive, and the effort required to customize severe, which may make Quicksilver closer to a geek tool than a realization of the broader ambition.

Even so, Nicholas is offering intriguing thoughts on where the desktop should go. It is well worth listening to.

Friday, September 07, 2007

Again, I would recommend reading the whole thing, but here I will focus on the parts about personalization.

In particular, there are a few tidbits on personalized advertising in this round of interviews. Some excerpts:

Chris Sherman: ... As they get to know you and your preferences, you know... "I never click on that video ad," they’ll gradually stop showing you [those] ads ... and maybe increase the ads ... that you do click on.

Larry Cornett: ... The more they understand about what a specific user is looking for in their context, the more intelligent they can be about what they're actually offering ... By being more targeted it will add more value for the users and hopefully, be a better experience for them as well .... Do [users] really want to spend time in the context where they're seeing a lot of stuff that’s not targeted and not appropriate and might even be annoying or would they rather ... [see ads that] could be beneficial for them.

[Gord Hotchkiss:] Personalization of advertising will happen incrementally and the ability to target accurately will improve over time. For many users, it will be a mixed environment, with some very well targeted, relevant ads in some locations that don’t even look like advertising and the more typical forms of untargeted advertising we're more familiar with.

The impression I got from this is that personalized advertising is now seen as inevitable. Privacy concerns may make it appear incrementally, but most seem to agree that it will happen.

On a different topic, usability guru Jakob Nielsen used his time to promote NLP over personalization and pick on Amazon.com's recommendations yet again. Gord asked me to respond.

On the one hand, I agree with Jakob about the long-term promise of natural language techniques (though I think he may be underestimating the challenges and overestimating the likelihood of rapid progress there) and his criticism of inaccuracies in personalization and recommendations (and they are inaccurate, no doubt).

On the other hand, I think Jakob is using an absolute measure of the effectiveness of personalization where a relative measure is more appropriate. Specifically, the metric should not be how often does personalized content accurately reflect your interests; it should be how much better does personalized content predict your interests than whatever unpersonalized content you otherwise would have to put in the space.

That is a much lower bar. Bestsellers and other unpersonalized content tend to be very poor predictors of individual interest. By knowing even a little bit about you, it is easy to do better.

Is personalization ever going to be perfect? No, but it does not have to be. It just has to be more useful than the alternative. Personalized content only has to be marginally more interesting than unpersonalized content to be helpful.
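A toy numeric example of that relative bar (the numbers are made up): measured as lift over the unpersonalized baseline, even a modest absolute hit rate looks very good.

```python
def lift(personalized_hits, baseline_hits):
    """Relative measure: how many times better the personalized content
    performs than the unpersonalized content filling the same slot."""
    return personalized_hits / baseline_hits

# e.g. 3 clicks per 100 impressions vs. 1 per 100 for a bestseller list:
# "only" 3% accuracy, but three times more useful than the alternative.
print(lift(3, 1))  # 3.0
```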

Wednesday, September 05, 2007

Some research out of Penn State, "The Effect of Brand Awareness on the Evaluation of Search Engine Results" (PDF) puts some hard numbers on the hurdles for Google's web search competitors.

The study showed participants Google search results on all queries, but switched around branding elements at the top and bottom of the page to label the results as from Yahoo, Microsoft Live Search, a startup called AI2RS, and Google.

From the paper:

Based on average relevance ratings, there was a 25% difference between the most highly rated search engine and the lowest, even though search engine results were identical in both content and presentation.

The 25% difference was between the results branded with the AI2RS startup and the results branded as Yahoo.

Curiously, Yahoo was rated substantially higher than Google, despite the fact that these were Google's search results. Yahoo has failed to gain web search market share, but, if you believe this study, brand weakness is not the reason why.

It is true that this study is small, just 32 participants across 4 different queries. It would be nice to see a broader study that confirms these results.

Even so, it probably is safe to say that the strength of the Google and Yahoo brands (and Microsoft's ownership of the defaults in Internet Explorer) make it very difficult for any web search startup. As Rich Skrenta once said, "A conventional attack against Google's search product will fail ... A copy of their product with your brand has no pull."

Data compiled from Hitwise shows that Google, Microsoft, and Yahoo now have a meager 15% combined market share in shopping search, down from nearly 50% combined three years ago.

Considering that a substantial percentage of web search queries are shopping-related (see "A taxonomy of web search" (PDF)) and the ease of extracting advertising revenue and revenue sharing where there is such strong purchase intent, I would think that Google, Yahoo, and Microsoft would be pursuing shopping metasearch more aggressively.

Saturday, September 01, 2007

A SIGIR 2007 paper out of Microsoft Research, "HITS on the Web: How does it Compare?" by Marc Najork, Hugo Zaragoza, and Michael Taylor, is a large-scale study of several ranking algorithms using a substantial web crawl and data from the MSN query logs.

The authors appear to have expected the HITS algorithm to outperform the others in their tests, but found instead that a combination of BM25F and simple in-degree link analysis outperformed everything else. From the paper:

We were quite surprised to find that HITS, a query-dependent feature, is about as effective as web page in-degree, the most simpleminded query-independent link-based feature.

As expected, BM25F outperforms all link-based features by a large margin. The link-based features are divided into two groups, with a noticeable performance drop between the groups. The better-performing group consists of the features that are based on the number and/or quality of incoming links (in-degree, PageRank, and HITS authority scores); and the worse-performing group consists of the features that are based on the number and/or quality of outgoing links (outdegree and HITS hub scores).

The combination of BM25F with ... in-degree consistently outperforms the combination of BM25F with PageRank or HITS authority scores, and can be computed much easier and faster.
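A linear combination like the one the paper evaluates might look like this sketch (the weight and the log transform are my assumptions for illustration, not the paper's exact recipe):

```python
import math

def combined_score(bm25f_score, in_degree, link_weight=0.5):
    """Blend a text relevance score (e.g. BM25F) with a simple
    query-independent link feature (raw in-degree, log-damped)."""
    return bm25f_score + link_weight * math.log1p(in_degree)

# doc -> (BM25F score, number of incoming links); values are made up
docs = {"d1": (12.0, 5), "d2": (10.0, 400)}
ranked = sorted(docs, key=lambda d: combined_score(*docs[d]), reverse=True)
print(ranked)  # ['d2', 'd1']
```

The appeal the authors note is that in-degree is a single per-document count, far cheaper to compute than PageRank iterations or query-time HITS.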

PageRank performed poorly in their tests. However, their explanation of why struck me as unconvincing. From the paper:

The fact that in-degree features outperform PageRank under all measures is quite surprising. A possible explanation is that link-spammers have been targeting the published PageRank algorithm for many years, and that this has led to anomalies in the web graph that affect PageRank.

This raises the question of whether they picked the right PageRank algorithm. In particular, there are variants of PageRank that they could have used that appear less sensitive to spam and may have performed much better. Unfortunately, without results for those variants, it is hard to know whether the criticisms in this paper of naive PageRank are applicable to the algorithms evolved from PageRank used by search engines today.

Even so, the results of the study are interesting, both for the overview of several relevance ranking algorithms and the conclusions about their effectiveness. Particularly intriguing is the evidence that computationally expensive algorithms such as the query-dependent HITS algorithm seem to hold no advantage over much simpler techniques.

Update: Marc Najork, one of the authors of the paper, expands on the PageRank algorithm issue and the performance of HITS in the comments for this post.