It's a metadata profile definition that lots of social media sites link to. It's pretty much just declared in the page header (it lets people add extra attributes to HTML). It's up there for the same reason w3.org is: people link to it when setting a doctype.

There's a convention that overloads the <link> tag: when rel="profile", instead of pointing at actual related data (as the tag was intended to do), the target of the link defines a schema / data format. The URL is then essentially a globally unique key for the data format; parsers that recognize the format see the key and know how to parse other information on the page. gmpg.org hosts one of the early ones, XFN [gmpg.org], which is linked in default WordPress installs [wordpress.org].
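A minimal sketch of how a parser might pick up such a profile key, using Python's html.parser (the head fragment mirrors what a default WordPress install emits; the class name is made up for illustration):

```python
from html.parser import HTMLParser

class ProfileLinkFinder(HTMLParser):
    """Collects href targets of <link rel="profile"> tags, whose
    URLs act as globally unique keys for a metadata format."""
    def __init__(self):
        super().__init__()
        self.profiles = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Real-world rel values can be space-separated token lists;
        # this sketch only handles the exact value "profile".
        if tag == "link" and a.get("rel") == "profile":
            self.profiles.append(a.get("href"))

# Head fragment like the one a default WordPress install emits:
html = '<head><link rel="profile" href="http://gmpg.org/xfn/11"></head>'
finder = ProfileLinkFinder()
finder.feed(html)
print(finder.profiles)  # ['http://gmpg.org/xfn/11']
```

A crawler that recognizes the gmpg.org URL would then know to interpret rel values on the page's anchors as XFN relationships.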

Up top you have the sites that have their fingers in damned near everything, because the web keeps centralizing around them. More and more websites embed video, and who better than YouTube to host it? Need to provide a way to search your website? Google has already done it for you. Need to tell your 3 billion fans what you're having for lunch? Facebook and Twitter have you covered. I can't see the list from work, but I'd wager Facebook is up there too, with its ever-present "like" buttons. What's surprising is Wikipedia: you'll only sometimes see a link to Wikipedia, even in discussions on Slashdot, and it doesn't go around waving its hands saying "everybody link to me" like the other sites do.

What about other aspects that would make a website "good"? Such as ease of navigation (finding what you want in five clicks or fewer)? The size or amount of useful content? The number of external sites that link to their content?
If we included that sort of data, YouTube could potentially be far up there with Wikipedia. I would think Google and Bing would be ruled out entirely, since by their very design they don't hold real data of their own.

If you look at the way they developed this list, it's close to how Google ranks search results. The metric scores sites on how many other pages link to them. For example, Reddit and Slashdot aren't high on the list because they link out to other sites but very few link back. Creative Commons is in the top ten because everyone links there. It also explains why Myspace is so darn high.

It is now official. The Università degli studi di Milano has confirmed: Linux is dying.

One more crippling bombshell hit the already beleaguered Linux community when UNIMI confirmed that Linux's flagship domain, kernel.org [kernel.org], fell to a shocking #1797 in the Common Crawl rankings. You don't need to be the Amazing Kreskin to predict Linux's future. Its domain now ranks just behind Excite.com, the now-irrelevant search engine from the 1990s, which edges it out at #1796.

The glaring gap between Linux's ranking and the rankings of those in the vibrant, enterprise-ready world is in itself embarrassing enough: Apple #8, Microsoft #17, even Oracle #248. But what seals the coffin is that Linux has fallen behind even the notoriously moribund FreeBSD operating system in these industry-leading metrics, trailing it by nearly one thousand, five hundred positions.

Whatever they are doing here does not reflect anything too useful (from my perspective). Source: I have a number of sites in the top 10,000, and nothing here makes any sense. It doesn't correlate with any real-world metrics I can see. E.g.: sites that receive 140,000 visitors a day and have millions of incoming links are showing up in the 1 million range, while sites of mine with little to no pull are showing up in the top 100,000. Weird.

The default ranking we show you is by harmonic centrality. If you want, you can find its definition on Wikipedia. But we can explain it easily.

Suppose your site is example.com. Your score by harmonic centrality is, as a start, the number of sites with a link towards example.com. They are called sites at distance one. Say, there are 50 such sites: your score is now 50.

There will also be sites with a link towards sites that have a link towards example.com, but which are not themselves at distance one. They are called sites at distance two. Say there are 80 such sites: they are not as important as the first group, so we give each of them just half a point. You get 40 more points and your score is now 90.

We can go on: there will also be sites with a link towards sites that have a link towards sites that have a link towards example.com (!), but they are not at distance one or two. They are called sites at distance three. Say there are 100 such sites: as you can guess, we give each of them just one third of a point. You get 33.333 more points and your score is now 123.333.
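The walkthrough above can be sketched in Python. This is a minimal sketch, not Common Crawl's actual pipeline: the toy `links` graph is made up, and it assumes the shortest link distance is what counts.

```python
from collections import deque

def harmonic_centrality(target, links):
    """Score a site by summing 1/d over every site at shortest
    link-distance d from it. `links` maps site -> sites it links to."""
    # Reverse the graph so a BFS from the target walks incoming links.
    reverse = {}
    for src, dsts in links.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    score, dist, queue = 0.0, {target: 0}, deque([target])
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, ()):
            if pred not in dist:
                dist[pred] = dist[node] + 1   # one hop farther out
                score += 1.0 / dist[pred]
                queue.append(pred)
    return score

# Toy graph: a and b link straight to example.com (distance 1),
# c links to a (distance 2), d links to c (distance 3).
links = {"a": {"example.com"}, "b": {"example.com"},
         "c": {"a"}, "d": {"c"}}
print(harmonic_centrality("example.com", links))  # 2 + 0.5 + 1/3 ≈ 2.833

# The running example in the text: 50 sites at distance 1,
# 80 at distance 2, 100 at distance 3.
print(sum(n / d for d, n in {1: 50, 2: 80, 3: 100}.items()))  # ≈ 123.333
```

The second print reproduces the 50 + 40 + 33.333 = 123.333 tally from the explanation.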

My intuition:

Incoming links at distance one should be allocated 1 point. *yep*
Incoming links at distance two should be allocated half of 1 point = 0.5 points. *yep*
Incoming links at distance three should be allocated half of 0.5 points = 0.25 points. *NOPE* They actually get allocated 0.33 points each.

This means distance-ten links still get 0.1 point? Ten hops away and they're still showing up significantly? That measure is broken. Ten hops away should score practically nothing.

Upon further reading (http://en.wikipedia.org/wiki/Centrality), a measure with a geometric attenuation factor like I described is Katz centrality; Katz and PageRank are both variants of eigenvector centrality.