We are pleased to announce the release of a host-level web graph of recent monthly crawls (February, March, April 2017). The graph consists of 385 million nodes and 2.5 billion edges.
The development of this graph has produced the following:
- a ranked list of hosts to expand the crawl frontier;
- pages ranked by Harmonic Centrality with less influence from spam, among other attributes (for comparison we include PageRank);
- the template/process for Common Crawl to produce graphs and page rankings at regular intervals.
We produced this graph, and intend to produce similar graphs going forward, because the Common Crawl community has expressed a strong interest in using Common Crawl data for graph processing, particularly with respect to:
- web graph and page rankings produced by Common Search in 2016;
- the Hyperlink Graph data set produced in 2013 by Web Data Commons (WDC);
- the “WWW Ranking” from WDC, along with a second set of hyperlink graphs based on crawl data from April 2014.
Please note: the graph includes dangling nodes, i.e. hosts that have not been crawled yet are pointed to by a link on a crawled page. Seventeen percent (65 million) of the hosts represented have been crawled in one of the three monthly crawls. Thus, 320 million of the hosts represented in the graph are known only from links. (Host names are not wholly verified: host names that are obviously invalid are skipped, but the remaining names are not resolved in DNS.)
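To make the notion of a dangling node concrete, here is a minimal sketch on toy data. The function name `dangling_hosts` and the sample host names are illustrative assumptions, not part of the released code; the last lines only re-check the figures quoted above.

```python
def dangling_hosts(edges, crawled):
    """Return hosts that appear in a link but were never crawled themselves."""
    linked = {host for edge in edges for host in edge}
    return linked - set(crawled)


# Toy data (hypothetical): two crawled hosts linking to three other hosts.
crawled = {"com.example", "org.example"}
edges = [
    ("com.example", "org.example"),
    ("com.example", "com.googleapis.fonts"),
    ("org.example", "com.twitter"),
    ("org.example", "net.example.cdn"),
]
dangling = dangling_hosts(edges, crawled)
print(sorted(dangling))

# Consistency check of the figures in the text (in millions of hosts).
total_hosts, crawled_hosts = 385, 65
share_crawled = round(100 * crawled_hosts / total_hosts)  # 17 (percent)
known_only_from_links = total_hosts - crawled_hosts       # 320
```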
Extraction of links and construction of the graph
Links are taken from the WAT extracts; we also included redirects from the WARC files of the redirect and 404 dataset. All types of links are included, even purely “technical” ones pointing to JavaScript libraries, web fonts, etc.
The host names are reversed and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain. Node IDs are assigned sequentially to the node list sorted by reversed host name. This keeps links between hosts of the same domain or in the same country-code top-level domain close together and allows for an efficient delta-compression of edges.
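The naming and numbering scheme can be sketched in a few lines of Python. This is an illustration under assumptions, not the cc-pyspark code: the helper names `reverse_host`, `assign_ids` and `gaps` are hypothetical, and `gaps` only shows why neighbouring IDs yield small differences; it is not the encoding the WebGraph framework actually uses.

```python
def reverse_host(host):
    """Reverse the dot-separated labels of a host name, dropping a leading 'www.'."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def assign_ids(hosts):
    """Give each distinct reversed host name a sequential ID in sorted order."""
    names = sorted({reverse_host(h) for h in hosts})
    return {name: node_id for node_id, name in enumerate(names)}


def gaps(sorted_ids):
    """Differences between consecutive IDs; the first ID is kept as-is."""
    return [b - a for a, b in zip([0] + sorted_ids[:-1], sorted_ids)]


hosts = [
    "www.subdomain.example.com",
    "example.com",
    "blog.example.com",
    "example.org",
    "www.example.de",
]
ids = assign_ids(hosts)
for name, node_id in ids.items():
    print(node_id, name)

# Hosts of one domain receive adjacent IDs, so a sorted list of link targets
# inside that domain turns into a first value followed by small gaps.
print(gaps([1000, 1001, 1003]))
```

Because all hosts below example.com sort next to each other, links within a domain point to nearby IDs, which is what makes the gaps small.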
The extraction is done in three steps:
- links are extracted, reduced to host-level links and stored as pairs 〈reversed host from, reversed host to〉;
- IDs are assigned to host names and edges are represented as 〈from id, to id〉 pairs;
- ranks are computed.
The first two steps are done by Spark and Python; the code is part of the project cc-pyspark. To compute the rankings, the web graph is loaded into the WebGraph framework.
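The first two steps can be illustrated on a single machine with plain Python. This is only a sketch of the idea, not the Spark job from cc-pyspark: the function names and the sample URLs are hypothetical, and self-links are kept because the text does not say how they are treated.

```python
from urllib.parse import urlsplit


def reverse_host(host):
    """Reverse the labels of a host name, dropping a leading 'www.'."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def host_level_edges(url_links):
    """Step 1: reduce page-level links to distinct pairs of reversed host names."""
    edges = set()
    for src_url, dst_url in url_links:
        src = urlsplit(src_url).hostname
        dst = urlsplit(dst_url).hostname
        if src and dst:  # skip links without a usable host name
            edges.add((reverse_host(src), reverse_host(dst)))
    return edges


def to_id_edges(edges):
    """Step 2: number the sorted host names and rewrite edges as ID pairs."""
    names = sorted({host for edge in edges for host in edge})
    ids = {name: node_id for node_id, name in enumerate(names)}
    return ids, sorted((ids[src], ids[dst]) for src, dst in edges)


# Hypothetical page-level links (source URL, target URL).
links = [
    ("http://www.example.com/a.html", "https://fonts.googleapis.com/css"),
    ("http://www.example.com/b.html", "https://fonts.googleapis.com/css?x"),
    ("http://blog.example.com/", "http://www.example.com/"),
    ("http://www.example.com/", "http://example.org/page"),
]
ids, id_edges = to_id_edges(host_level_edges(links))
print(ids)
print(id_edges)
```

Note that the two links to fonts.googleapis.com collapse into a single host-level edge, which is why the host graph is much smaller than the underlying page graph.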
Hosts ranked by Harmonic Centrality and PageRank
We provide a list of nodes (host names) ranked by:
- Harmonic Centrality (calculated by HyperBall);
- PageRank (calculated by PageRankParallelGaussSeidel).
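For readers unfamiliar with the measure: the harmonic centrality of a node is the sum of the reciprocals of the shortest-path distances from every other node that can reach it. The sketch below computes it exactly with a breadth-first search on a toy graph. It is not how the released ranks were produced (HyperBall approximates these sums on large graphs), and the function name and the toy edges are assumptions for illustration.

```python
from collections import deque


def harmonic_centrality(edges, num_nodes):
    """Exact harmonic centrality: for each node, sum 1/d over all nodes
    that can reach it, where d is the shortest-path distance to the node."""
    # Predecessor lists let the search walk links backwards from the target.
    preds = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        preds[dst].append(src)

    scores = []
    for target in range(num_nodes):
        dist = {target: 0}
        queue = deque([target])
        score = 0.0
        while queue:
            node = queue.popleft()
            for pred in preds[node]:
                if pred not in dist:
                    dist[pred] = dist[node] + 1
                    score += 1.0 / dist[pred]
                    queue.append(pred)
        scores.append(score)
    return scores


# Toy graph as (from id, to id) pairs; node 2 is linked from most other nodes.
edges = [(0, 2), (1, 2), (2, 3), (3, 2), (4, 0)]
scores = harmonic_centrality(edges, 5)
for node_id, score in enumerate(scores):
    print(node_id, round(score, 3))
```

Nodes that cannot be reached from anywhere get a score of 0, and unreachable pairs simply contribute nothing to the sum.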
You can download the ranks of all 385 million hosts. Below are the top 1000 hosts ranked by Harmonic Centrality.
Top 1000 hosts ranked by harmonic centrality
Harmonic Centrality rank | Harmonic Centrality value | PageRank rank | PageRank value | reversed host name |
---|---|---|---|---|
1 | 38039408 | 1 | .00775205 | com.facebook |
2 | 34958084 | 3 | .00428814 | com.twitter |
3 | 33540440 | 2 | .00540973 | com.googleapis.fonts |
4 | 32887486 | 4 | .00286281 | com.youtube |
5 | 31128886 | 7 | .00167971 | com.google.plus |
6 | 30375812 | 6 | .00183995 | com.google |
7 | 28535442 | 12 | .00101167 | com.linkedin |
8 | 28304832 | 10 | .00118472 | com.instagram |
9 | 28086834 | 8 | .00158847 | com.blogger |
10 | 27583494 | 22 | .00055368 | com.pinterest |
11 | 27218188 | 40 | .00029768 | org.wikipedia.en |
12 | 27078504 | 15 | .00080352 | org.wordpress |
13 | 26959586 | 29 | .00043823 | com.apple.itunes |
14 | 26666682 | 26 | .00053618 | com.blogspot.bp.2 |
15 | 26645472 | 25 | .00053637 | com.blogspot.bp.4 |
16 | 26627026 | 68 | .00020569 | be.youtu |
17 | 26593046 | 24 | .00053727 | com.blogspot.bp.3 |
18 | 26572996 | 20 | .00057575 | com.blogspot.bp.1 |
19 | 26522710 | 58 | .00022707 | com.amazon |
20 | 26491596 | 33 | .00035486 | com.google.play |
21 | 26469056 | 19 | .00059326 | com.google.maps |
22 | 26467424 | 50 | .00024854 | com.vimeo |
23 | 26443448 | 45 | .00027360 | com.flickr |
24 | 26424044 | 5 | .00219270 | org.gmpg |
25 | 26324636 | 37 | .00031984 | com.google.mail |
26 | 26318060 | 43 | .00028895 | gl.goo |
27 | 26286836 | 44 | .00028723 | com.github |
28 | 26280188 | 65 | .00021446 | com.microsoft |
29 | 26124690 | 79 | .00018142 | com.google.support |
30 | 26103492 | 32 | .00043580 | com.adobe |
31 | 26085138 | 100 | .00016162 | ly.bit |
32 | 26046002 | 98 | .00016696 | com.google.docs |
33 | 26023404 | 108 | .00013636 | org.w3 |
34 | 26014732 | 152 | .00010471 | com.facebook.developers |
35 | 25949012 | 30 | .00043749 | me.wp |
36 | 25905092 | 222 | .00007208 | com.nytimes |
37 | 25845124 | 72 | .00019664 | com.google.sites |
38 | 25833394 | 186 | .00008012 | com.facebook.m |
39 | 25790176 | 138 | .00011458 | com.weibo |
40 | 25732096 | 147 | .00010760 | com.apple |
41 | 25715634 | 39 | .00029787 | com.paypal |
42 | 25692290 | 38 | .00030937 | co.t |
43 | 25689954 | 407 | .00004073 | com.huffingtonpost |
44 | 25679134 | 115 | .00013061 | org.creativecommons |
45 | 25657428 | 188 | .00007975 | com.facebook.apps |
46 | 25609926 | 11 | .00112239 | com.blogblog.resources |
47 | 25583454 | 414 | .00004019 | com.forbes |
48 | 25546588 | 131 | .00012114 | com.imgur |
49 | 25535844 | 416 | .00004002 | net.slideshare |
50 | 25508504 | 669 | .00002026 | com.mashable |
51 | 25503482 | 454 | .00003445 | com.tinyurl |
52 | 25498396 | 52 | .00024415 | com.etsy |
53 | 25478394 | 659 | .00002054 | com.businessinsider |
54 | 25475376 | 201 | .00007711 | com.google.drive |
55 | 25475334 | 27 | .00050381 | com.wordpress |
56 | 25458606 | 481 | .00003153 | com.washingtonpost |
57 | 25411174 | 189 | .00007959 | com.myspace |
58 | 25387732 | 146 | .00010823 | com.medium |
59 | 25383148 | 417 | .00003969 | uk.co.bbc |
60 | 25381774 | 415 | .00004003 | com.imdb |
61 | 25381672 | 603 | .00002224 | com.wired |
62 | 25361154 | 572 | .00002431 | com.techcrunch |
63 | 25349038 | 428 | .00003804 | com.bing |
64 | 25347080 | 183 | .00008131 | com.google.groups |
65 | 25339734 | 807 | .00001619 | com.time |
66 | 25325246 | 473 | .00003265 | com.theguardian |
67 | 25313804 | 447 | .00003533 | uk.co.amazon |
68 | 25313562 | 307 | .00006529 | com.reddit |
69 | 25309096 | 162 | .00009843 | com.feedburner.feeds |
70 | 25281640 | 306 | .00006533 | com.tumblr |
71 | 25275352 | 190 | .00007951 | com.soundcloud |
72 | 25266250 | 694 | .00001917 | uk.co.dailymail |
73 | 25255004 | 570 | .00002446 | com.surveymonkey |
74 | 25252200 | 181 | .00008142 | org.archive.web |
75 | 25250540 | 76 | .00018708 | com.eventbrite |
76 | 25236658 | 171 | .00008740 | com.google.developers |
77 | 25235286 | 59 | .00022308 | com.vimeo.player |
78 | 25226754 | 613 | .00002197 | org.mozilla.addons |
79 | 25220086 | 777 | .00001695 | com.wsj.online |
80 | 25203382 | 316 | .00006277 | com.yahoo |
81 | 25192820 | 1356 | .00001042 | com.wsj.blogs |
82 | 25177614 | 336 | .00005412 | com.issuu |
83 | 25174772 | 828 | .00001586 | com.latimes |
84 | 25173482 | 66 | .00021367 | com.vk |
85 | 25164782 | 628 | .00002133 | com.cnn |
86 | 25163448 | 396 | .00004196 | com.dropbox |
87 | 25154430 | 588 | .00002310 | me.about |
88 | 25149472 | 698 | .00001900 | org.npr |
89 | 25121032 | 504 | .00002963 | com.meetup |
90 | 25105294 | 112 | .00013267 | com.gravatar |
91 | 25103488 | 1094 | .00001268 | com.theatlantic |
92 | 25100710 | 652 | .00002072 | com.gmail |
93 | 25099516 | 528 | .00002776 | org.wikimedia.upload |
94 | 25093888 | 119 | .00012789 | com.imgur.i |
95 | 25090464 | 215 | .00007402 | me.fb |
96 | 25089908 | 1156 | .00001216 | com.venturebeat |
97 | 25084918 | 391 | .00004336 | com.google.picasaweb |
98 | 25083712 | 526 | .00002804 | com.dropboxusercontent.dl |
99 | 25081758 | 1083 | .00001277 | com.ted |
100 | 25068848 | 740 | .00001773 | com.reuters |
101 | 25067392 | 1439 | .00000979 | com.economist |
102 | 25060080 | 971 | .00001483 | com.adweek |
103 | 25040748 | 14 | .00084868 | com.macromedia.download |
104 | 25036148 | 1619 | .00000895 | com.thenextweb |
105 | 25029640 | 681 | .00001990 | com.goodreads |
106 | 25025954 | 1168 | .00001199 | com.cnn.money |
107 | 25024186 | 476 | .00003199 | com.google.translate |
108 | 25022650 | 530 | .00002745 | com.dailymotion |
109 | 25022502 | 170 | .00008762 | com.facebook.fr-fr |
110 | 25012510 | 713 | .00001855 | gov.whitehouse |
111 | 24989682 | 501 | .00002977 | gov.nih.nlm.ncbi |
112 | 24984344 | 80 | .00018111 | com.twitter.mobile |
113 | 24965374 | 1457 | .00000967 | ca.cbc |
114 | 24959346 | 718 | .00001832 | org.archive |
115 | 24943750 | 216 | .00007377 | com.facebook.es-la |
116 | 24936772 | 1053 | .00001316 | org.pbs |
117 | 24936464 | 219 | .00007270 | com.microsoft.windows |
118 | 24931290 | 804 | .00001628 | com.scribd |
119 | 24930846 | 1067 | .00001298 | com.cnbc |
120 | 24930722 | 2132 | .00000684 | com.youtube.m |
121 | 24922360 | 1299 | .00001085 | com.buzzfeed |
122 | 24915482 | 426 | .00003817 | org.wikimedia.commons |
123 | 24914368 | 1522 | .00000924 | com.newyorker |
124 | 24912590 | 600 | .00002235 | com.microsoft.msdn |
125 | 24906018 | 671 | .00002023 | com.geocities |
126 | 24900624 | 2899 | .00000490 | net.boingboing |
127 | 24888838 | 1318 | .00001070 | com.gizmodo |
128 | 24878576 | 355 | .00005095 | com.netvibes |
129 | 24869944 | 826 | .00001589 | com.prnewswire |
130 | 24864830 | 1824 | .00000811 | com.examiner |
131 | 24854006 | 1340 | .00001059 | com.engadget |
132 | 24852978 | 54 | .00024174 | net.sourceforge |
133 | 24847988 | 1951 | .00000758 | com.storify |
134 | 24846402 | 598 | .00002245 | com.stackoverflow |
135 | 24845912 | 955 | .00001522 | uk.co.guardian |
136 | 24843904 | 1164 | .00001206 | uk.co.independent |
137 | 24836756 | 168 | .00009144 | com.google.code |
138 | 24835642 | 1080 | .00001280 | com.nature |
139 | 24830762 | 782 | .00001688 | com.bizjournals |
140 | 24828682 | 42 | .00029373 | com.twimg.pbs |
141 | 24825094 | 141 | .00011273 | com.google.feedburner |
142 | 24823614 | 185 | .00008036 | com.macromedia |
143 | 24821860 | 151 | .00010505 | org.mozilla |
144 | 24821036 | 212 | .00007504 | com.facebook.web |
145 | 24820042 | 1785 | .00000829 | com.pcworld |
146 | 24815138 | 589 | .00002307 | com.ebay |
147 | 24808886 | 196 | .00007819 | com.qq.t |
148 | 24802454 | 159 | .00010021 | com.amazonaws.s3 |
149 | 24787304 | 1360 | .00001039 | com.foxnews |
150 | 24783168 | 1482 | .00000940 | com.slate |
151 | 24781108 | 432 | .00003772 | net.behance |
152 | 24779900 | 648 | .00002090 | com.ibm |
153 | 24774872 | 2022 | .00000727 | com.arstechnica |
154 | 24771608 | 2715 | .00000522 | com.indiatimes.timesofindia |
155 | 24769380 | 2004 | .00000732 | au.net.abc |
156 | 24768702 | 1660 | .00000885 | com.marketwatch |
157 | 24768442 | 424 | .00003843 | de.amazon |
158 | 24768200 | 472 | .00003274 | com.google.feedproxy |
159 | 24756046 | 221 | .00007213 | com.facebook.en-gb |
160 | 24753102 | 97 | .00017011 | com.facebook.l |
161 | 24753098 | 373 | .00004743 | com.digg |
162 | 24751480 | 1407 | .00001004 | com.ft |
163 | 24747080 | 356 | .00005021 | com.microsoft.support |
164 | 24746920 | 1621 | .00000895 | com.quora |
165 | 24745500 | 1929 | .00000769 | com.gigaom |
166 | 24741868 | 1728 | .00000861 | com.sfgate |
167 | 24740150 | 1121 | .00001241 | tv.ustream |
168 | 24727386 | 1454 | .00000968 | com.chicagotribune |
169 | 24726066 | 1432 | .00000981 | com.wikihow |
170 | 24719856 | 174 | .00008585 | com.messenger |
171 | 24713146 | 136 | .00011598 | com.istockphoto |
172 | 24711160 | 344 | .00005293 | com.stumbleupon |
173 | 24706692 | 1055 | .00001315 | uk.co.bbc.news |
174 | 24705698 | 2007 | .00000732 | com.boston |
175 | 24702288 | 2028 | .00000723 | com.searchengineland |
176 | 24699804 | 1193 | .00001154 | com.cafepress |
177 | 24699314 | 869 | .00001553 | fm.last |
178 | 24691258 | 17 | .00069783 | com.google.accounts |
179 | 24686178 | 609 | .00002203 | com.usatoday |
180 | 24682302 | 1538 | .00000919 | com.indiegogo |
181 | 24659484 | 997 | .00001423 | com.google.books |
182 | 24658388 | 963 | .00001506 | com.yahoo.finance |
183 | 24655636 | 637 | .00002121 | com.yahoo.groups |
184 | 24652858 | 452 | .00003483 | com.google.news |
185 | 24651408 | 349 | .00005207 | com.googleusercontent.lh4 |
186 | 24648530 | 339 | .00005363 | jp.co.amazon |
187 | 24646736 | 46 | .00026535 | com.wix |
188 | 24642202 | 522 | .00002841 | to.amzn |
189 | 24638684 | 3434 | .00000408 | com.nationalgeographic |
190 | 24636846 | 1070 | .00001295 | org.mozilla.developer |
191 | 24629212 | 1838 | .00000805 | com.businessweek |
192 | 24626124 | 1186 | .00001167 | com.dropbox.dl |
193 | 24626058 | 1200 | .00001149 | com.fortune |
194 | 24624978 | 1812 | .00000816 | com.mtv |
195 | 24624080 | 2074 | .00000707 | com.go.espn |
196 | 24619310 | 142 | .00011187 | com.facebook.de-de |
197 | 24617294 | 1492 | .00000936 | com.gofundme |
198 | 24615060 | 517 | .00002905 | uk.gov |
199 | 24610044 | 1036 | .00001338 | com.cargocollective |
200 | 24609318 | 1059 | .00001310 | com.zazzle |
201 | 24609048 | 462 | .00003391 | com.nbcnews |
202 | 24607704 | 1398 | .00001010 | ly.ow |
203 | 24601154 | 2048 | .00000714 | com.politico |
204 | 24597452 | 2010 | .00000731 | com.cnet.news |
205 | 24593614 | 2482 | .00000576 | au.com.smh |
206 | 24592606 | 743 | .00001767 | com.kickstarter |
207 | 24589922 | 105 | .00014729 | org.w3.validator |
208 | 24583110 | 758 | .00001739 | ca.google |
209 | 24571000 | 687 | .00001965 | com.delicious |
210 | 24568576 | 1376 | .00001028 | com.yahoo.news |
211 | 24564844 | 1444 | .00000975 | com.prweb |
212 | 24563108 | 1486 | .00000937 | com.technologyreview |
213 | 24562216 | 2925 | .00000486 | com.csmonitor |
214 | 24556706 | 1047 | .00001324 | com.go.abcnews |
215 | 24556260 | 3095 | .00000455 | com.merriam-webster |
216 | 24555618 | 645 | .00002095 | com.spotify.open |
217 | 24555456 | 1153 | .00001218 | com.zdnet |
218 | 24553626 | 1052 | .00001319 | com.wiley.onlinelibrary |
219 | 24553552 | 2208 | .00000661 | com.yahoo.sports |
220 | 24548350 | 2209 | .00000661 | com.nymag |
221 | 24548034 | 1865 | .00000789 | net.researchgate |
222 | 24543784 | 1760 | .00000841 | com.cnn.edition |
223 | 24540216 | 1862 | .00000791 | com.angelfire |
224 | 24538684 | 5476 | .00000249 | com.thenation |
225 | 24538034 | 87 | .00017752 | com.wp.i1 |
226 | 24537674 | 658 | .00002055 | uk.co.telegraph |
227 | 24536988 | 366 | .00004837 | uk.co.google |
228 | 24534654 | 1151 | .00001221 | com.entrepreneur |
229 | 24531580 | 34 | .00033395 | com.twitter.blog |
230 | 24531188 | 372 | .00004753 | com.tripadvisor |
231 | 24528734 | 2999 | .00000473 | com.thedailybeast |
232 | 24527348 | 1099 | .00001264 | fr.amazon |
233 | 24525918 | 1286 | .00001099 | gov.nps |
234 | 24522910 | 1215 | .00001137 | tv.twitch |
235 | 24522716 | 213 | .00007442 | com.facebook.pt-br |
236 | 24511620 | 2409 | .00000597 | uk.co.theregister |
237 | 24509882 | 1930 | .00000769 | com.prezi |
238 | 24509572 | 1333 | .00001065 | org.change |
239 | 24508852 | 229 | .00007108 | com.google.chrome |
240 | 24507186 | 368 | .00004829 | com.apple.support |
241 | 24505498 | 67 | .00021293 | com.addthis |
242 | 24504130 | 567 | .00002465 | com.google.video |
243 | 24498714 | 345 | .00005293 | de.google |
244 | 24492670 | 4843 | .00000285 | au.com.theage |
245 | 24489828 | 2894 | .00000491 | com.salon |
246 | 24484826 | 1873 | .00000784 | org.arxiv |
247 | 24482800 | 691 | .00001931 | org.wikipedia.fr |
248 | 24482310 | 1346 | .00001054 | com.microsoft.office |
249 | 24482184 | 198 | .00007767 | jp.ameblo |
250 | 24476510 | 3078 | .00000459 | com.xkcd |
251 | 24474790 | 1694 | .00000872 | com.pcmag |
252 | 24474556 | 1459 | .00000965 | gov.nasa |
253 | 24473994 | 1969 | .00000747 | com.mixcloud |
254 | 24470958 | 6235 | .00000222 | com.reuters.blogs |
255 | 24469354 | 1649 | .00000886 | com.feedburner.feeds2 |
256 | 24468246 | 1232 | .00001129 | com.cnet |
257 | 24463680 | 204 | .00007689 | eu.europa.ec |
258 | 24455228 | 5903 | .00000232 | com.laughingsquid |
259 | 24454312 | 1324 | .00001067 | com.fastcompany |
260 | 24451544 | 5791 | .00000235 | com.forbes.blogs |
261 | 24450950 | 3224 | .00000442 | com.vox |
262 | 24449488 | 1088 | .00001270 | com.reverbnation |
263 | 24444814 | 1518 | .00000926 | ca.amazon |
264 | 24442382 | 324 | .00005879 | com.weebly |
265 | 24442150 | 1747 | .00000847 | com.blogspot.googleblog |
266 | 24440634 | 3692 | .00000376 | com.google.images |
267 | 24439882 | 3248 | .00000437 | com.billboard |
268 | 24438356 | 347 | .00005214 | com.googleusercontent.lh5 |
269 | 24437560 | 331 | .00005727 | com.yelp |
270 | 24434144 | 2445 | .00000585 | com.google.productforums |
271 | 24430074 | 218 | .00007273 | com.facebook.business |
272 | 24429478 | 489 | .00003099 | com.windowsphone |
273 | 24428674 | 226 | .00007160 | me.m |
274 | 24428294 | 1443 | .00000975 | com.newsweek |
275 | 24426648 | 206 | .00007652 | com.facebook.es-es |
276 | 24426188 | 4379 | .00000317 | com.theonion |
277 | 24425694 | 1499 | .00000935 | it.scoop |
278 | 24425110 | 3838 | .00000358 | com.pandora |
279 | 24423972 | 546 | .00002567 | org.wikipedia.es |
280 | 24423302 | 608 | .00002209 | com.bloomberg |
281 | 24420222 | 57 | .00023139 | com.twitter.support |
282 | 24419178 | 1797 | .00000822 | com.adage |
283 | 24418438 | 104 | .00014890 | com.adobe.get |
284 | 24417464 | 2056 | .00000711 | com.walmart |
285 | 24412442 | 2704 | .00000524 | com.rollingstone |
286 | 24409082 | 223 | .00007206 | com.facebook.id-id |
287 | 24408340 | 332 | .00005712 | com.deviantart |
288 | 24407960 | 749 | .00001751 | jp.ne.hatena.d |
289 | 24407810 | 2058 | .00000711 | com.variety |
290 | 24404858 | 1635 | .00000891 | com.webmd |
291 | 24404134 | 2590 | .00000551 | com.thehill |
292 | 24403500 | 1784 | .00000829 | com.adobe.blogs |
293 | 24401770 | 2377 | .00000600 | com.usnews |
294 | 24396586 | 2450 | .00000583 | me.fb.on |
295 | 24396528 | 764 | .00001718 | com.wsj |
296 | 24392360 | 2414 | .00000595 | com.bleacherreport |
297 | 24390596 | 459 | .00003408 | com.technorati |
298 | 24388858 | 2251 | .00000646 | com.shutterstock |
299 | 24385530 | 2357 | .00000608 | com.qz |
300 | 24385054 | 193 | .00007869 | com.facebook.it-it |
301 | 24384310 | 2420 | .00000594 | org.sciencemag |
302 | 24383358 | 5654 | .00000242 | com.esquire |
303 | 24383180 | 1198 | .00001150 | au.com.google |
304 | 24380792 | 985 | .00001445 | com.foursquare |
305 | 24379916 | 2339 | .00000612 | edu.stanford |
306 | 24379554 | 539 | .00002638 | jp.livedoor.blog |
307 | 24377848 | 1309 | .00001078 | com.theverge |
308 | 24374422 | 13402 | .00000109 | com.hackaday |
309 | 24371332 | 1367 | .00001029 | co.vine |
310 | 24368150 | 1452 | .00000969 | com.msn.msnbc |
311 | 24367912 | 6479 | .00000213 | com.ted.blog |
312 | 24365996 | 3719 | .00000372 | gd.is |
313 | 24365318 | 4181 | .00000331 | com.vice |
314 | 24365106 | 2983 | .00000476 | com.nbc |
315 | 24363772 | 679 | .00001994 | gov.cdc |
316 | 24363062 | 191 | .00007939 | com.xing |
317 | 24362254 | 2891 | .00000492 | com.scientificamerican |
318 | 24361736 | 1069 | .00001297 | com.cbsnews |
319 | 24361436 | 413 | .00004029 | us.icio.del |
320 | 24359470 | 6886 | .00000201 | com.scienceblogs |
321 | 24358744 | 3043 | .00000466 | com.microsoft.research |
322 | 24356420 | 3502 | .00000398 | com.bestbuy |
323 | 24356010 | 988 | .00001441 | com.bbc |
324 | 24354886 | 5002 | .00000275 | com.gawker |
325 | 24352878 | 3988 | .00000346 | com.startribune |
326 | 24349014 | 2078 | .00000706 | fr.lemonde |
327 | 24348074 | 4191 | .00000330 | com.allrecipes |
328 | 24346274 | 5087 | .00000270 | com.space |
329 | 24344730 | 1878 | .00000783 | com.smashingmagazine |
330 | 24344018 | 5668 | .00000241 | com.treehugger |
331 | 24342878 | 747 | .00001758 | es.google |
332 | 24340844 | 2655 | .00000535 | uk.co.huffingtonpost |
333 | 24335980 | 8358 | .00000164 | com.techdirt |
334 | 24334066 | 1068 | .00001297 | org.wikipedia |
335 | 24333934 | 4872 | .00000284 | com.nytimes.blogs.well |
336 | 24333436 | 1313 | .00001075 | br.com.google |
337 | 24332660 | 542 | .00002617 | com.timeout |
338 | 24332538 | 18 | .00063404 | com.googleusercontent.lh3 |
339 | 24331738 | 2088 | .00000699 | com.redbubble |
340 | 24330804 | 3878 | .00000353 | com.miamiherald |
341 | 24329910 | 2057 | .00000711 | com.msdn.blogs |
342 | 24329688 | 3651 | .00000381 | com.refinery29 |
343 | 24328628 | 2118 | .00000688 | com.ign |
344 | 24328504 | 310 | .00006387 | com.livejournal |
345 | 24327956 | 4641 | .00000298 | com.panoramio |
346 | 24324718 | 2281 | .00000634 | edu.mit.web |
347 | 24324402 | 4910 | .00000281 | com.answers |
348 | 24323460 | 980 | .00001451 | com.apple.developer |
349 | 24321142 | 2632 | .00000542 | com.apple.phobos |
350 | 24320832 | 974 | .00001466 | com.example |
351 | 24318370 | 155 | .00010226 | com.polyvore |
352 | 24314874 | 1003 | .00001410 | com.marriott |
353 | 24314410 | 593 | .00002278 | com.dribbble |
354 | 24314292 | 3608 | .00000386 | org.greenpeace |
355 | 24314094 | 667 | .00002032 | com.sxsw |
356 | 24310974 | 2928 | .00000485 | com.newscientist |
357 | 24310954 | 205 | .00007659 | com.facebook.nl-nl |
358 | 24310176 | 5470 | .00000250 | com.dreamstime |
359 | 24309666 | 4235 | .00000326 | com.chronicle |
360 | 24308702 | 480 | .00003159 | net.php |
361 | 24307464 | 5030 | .00000273 | org.moma |
362 | 24303536 | 7255 | .00000191 | org.grist |
363 | 24303532 | 207 | .00007629 | com.facebook.pl-pl |
364 | 24303454 | 2426 | .00000592 | com.ehow |
365 | 24302498 | 56 | .00023600 | com.wp.i2 |
366 | 24301188 | 28 | .00045486 | com.urbandictionary |
367 | 24299868 | 1190 | .00001160 | com.fb |
368 | 24297602 | 246 | .00006938 | org.cwa-union |
369 | 24297130 | 360 | .00004944 | com.disqus |
370 | 24290242 | 441 | .00003574 | com.alexa |
371 | 24288344 | 1991 | .00000736 | com.lifehacker |
372 | 24287342 | 2687 | .00000528 | gov.fws |
373 | 24287314 | 2416 | .00000594 | uk.co.mirror |
374 | 24285594 | 4702 | .00000294 | com.rottentomatoes |
375 | 24283856 | 722 | .00001816 | com.bitly |
376 | 24283856 | 2850 | .00000499 | gov.archives |
377 | 24280836 | 4042 | .00000343 | com.vogue |
378 | 24280580 | 4998 | .00000276 | com.patheos |
379 | 24276936 | 5141 | .00000268 | com.snopes |
380 | 24275338 | 3214 | .00000443 | com.zimbio |
381 | 24273644 | 9247 | .00000147 | com.infowars |
382 | 24271098 | 3246 | .00000438 | com.technet.blogs |
383 | 24267918 | 2140 | .00000683 | com.hubspot.blog |
384 | 24265636 | 3639 | .00000382 | com.marthastewart |
385 | 24265482 | 235 | .00007000 | com.facebook.pt-pt |
386 | 24264716 | 721 | .00001818 | com.salesforce |
387 | 24261210 | 736 | .00001784 | com.nwsource.seattletimes |
388 | 24259634 | 5424 | .00000252 | com.gq |
389 | 24259334 | 5525 | .00000247 | uk.org.tate |
390 | 24255346 | 614 | .00002189 | com.orkut |
391 | 24254838 | 3037 | .00000467 | com.gallup |
392 | 24253488 | 5260 | .00000261 | com.oregonlive |
393 | 24253108 | 271 | .00006760 | com.facebook.zh-tw |
394 | 24251676 | 1023 | .00001376 | com.wunderground |
395 | 24251452 | 4356 | .00000318 | com.mlb.mlb |
396 | 24250682 | 4493 | .00000307 | com.motherjones |
397 | 24250028 | 1084 | .00001272 | com.inc |
398 | 24245212 | 1766 | .00000838 | com.target |
399 | 24245044 | 3265 | .00000435 | com.google.profiles |
400 | 24243106 | 6758 | .00000204 | com.sheknows |
401 | 24242896 | 4403 | .00000315 | li.paper |
402 | 24242100 | 1931 | .00000768 | com.lulu |
403 | 24240898 | 6136 | .00000225 | com.petapixel |
404 | 24240804 | 154 | .00010339 | net.fbcdn.xx.scontent |
405 | 24240404 | 3693 | .00000376 | au.com.news |
406 | 24238646 | 6525 | .00000211 | com.ndtv |
407 | 24237706 | 1983 | .00000740 | gov.nih.nlm |
408 | 24235826 | 10370 | .00000131 | com.cnn.blogs.politicalticker |
409 | 24234434 | 352 | .00005179 | org.gnu |
410 | 24233398 | 53 | .00024219 | us.peeep |
411 | 24232492 | 4365 | .00000317 | org.thinkprogress |
412 | 24232402 | 2681 | .00000529 | com.nba |
413 | 24230752 | 559 | .00002517 | com.android.market |
414 | 24227512 | 9722 | .00000140 | com.kodak |
415 | 24224448 | 4797 | .00000289 | edu.brookings |
416 | 24223792 | 3583 | .00000389 | com.css-tricks |
417 | 24219520 | 4692 | .00000295 | com.latimes.latimesblogs |
418 | 24218162 | 7020 | .00000197 | com.dailykos |
419 | 24217922 | 2643 | .00000540 | com.popsugar |
420 | 24216202 | 4721 | .00000293 | com.rt |
421 | 24214564 | 1249 | .00001102 | in.co.google |
422 | 24210576 | 263 | .00006836 | com.facebook.sv-se |
423 | 24205520 | 2501 | .00000571 | com.nfl |
424 | 24205488 | 863 | .00001556 | org.doi.dx |
425 | 24203562 | 7641 | .00000181 | com.care2 |
426 | 24199082 | 4847 | .00000285 | com.plurk |
427 | 24199002 | 14486 | .00000101 | com.makezine.blog |
428 | 24194726 | 519 | .00002888 | com.mozilla |
429 | 24192646 | 642 | .00002103 | com.barnesandnoble |
430 | 24192184 | 4227 | .00000326 | org.raspberrypi |
431 | 24187602 | 5989 | .00000230 | com.jezebel |
432 | 24187016 | 1145 | .00001231 | org.python |
433 | 24184408 | 2410 | .00000596 | com.psychologytoday |
434 | 24183280 | 5080 | .00000271 | com.mediabistro |
435 | 24182428 | 3657 | .00000380 | com.instructables |
436 | 24181806 | 4487 | .00000308 | com.baltimoresun |
437 | 24179686 | 1354 | .00001044 | com.google.scholar |
438 | 24178630 | 261 | .00006882 | net.akamaihd.fbcdn-sphotos-a-a |
439 | 24178612 | 23 | .00054639 | com.bootstrapcdn.maxcdn |
440 | 24178472 | 1234 | .00001124 | com.linkedin.ca |
441 | 24177928 | 6438 | .00000214 | com.dezeen |
442 | 24176854 | 2474 | .00000578 | com.people |
443 | 24175148 | 1002 | .00001411 | com.mediafire |
444 | 24174352 | 2470 | .00000578 | com.indeed |
445 | 24171920 | 4428 | .00000312 | net.comcast.home |
446 | 24171428 | 2657 | .00000535 | com.readwriteweb |
447 | 24168478 | 3905 | .00000349 | com.macworld |
448 | 24167630 | 1481 | .00000940 | com.box |
449 | 24165256 | 3575 | .00000390 | es.elmundo |
450 | 24165178 | 746 | .00001760 | com.microsoft.technet |
451 | 24163738 | 1844 | .00000801 | com.500px |
452 | 24162276 | 8600 | .00000159 | com.consumerist |
453 | 24161110 | 8423 | .00000163 | com.uproxx |
454 | 24160036 | 13745 | .00000107 | com.dawn |
455 | 24157468 | 2049 | .00000714 | com.sciencedaily |
456 | 24157122 | 7100 | .00000195 | org.alternet |
457 | 24156724 | 12559 | .00000117 | com.chicagonow |
458 | 24156126 | 959 | .00001516 | com.photobucket |
459 | 24154688 | 6392 | .00000216 | com.designboom |
460 | 24153780 | 3864 | .00000355 | com.blurb |
461 | 24151830 | 4664 | .00000296 | com.weheartit |
462 | 24148122 | 502 | .00002977 | com.opera |
463 | 24144302 | 1063 | .00001300 | gov.epa |
464 | 24143138 | 11507 | .00000127 | com.wbir |
465 | 24140646 | 4335 | .00000319 | com.foreignpolicy |
466 | 24139430 | 458 | .00003416 | org.ietf |
467 | 24136252 | 4034 | .00000343 | com.nikkei |
468 | 24132110 | 31 | .00043664 | com.statcounter |
469 | 24131460 | 237 | .00006978 | com.facebook.th-th |
470 | 24130856 | 6366 | .00000217 | com.geekwire |
471 | 24126266 | 1625 | .00000893 | com.linkedin.in |
472 | 24124158 | 8222 | .00000167 | com.appleinsider |
473 | 24123730 | 7141 | .00000194 | com.avclub |
474 | 24121082 | 252 | .00006926 | net.akamaihd.fbcdn-profile-a |
475 | 24120082 | 15904 | .00000093 | com.wonkette |
476 | 24118694 | 3871 | .00000354 | com.chron |
477 | 24114758 | 977 | .00001461 | com.houzz |
478 | 24114164 | 269 | .00006781 | com.facebook.tr-tr |
479 | 24112388 | 623 | .00002160 | gov.ftc |
480 | 24111450 | 8902 | .00000153 | com.reason |
481 | 24109710 | 3459 | .00000403 | tv.blip |
482 | 24106846 | 1484 | .00000938 | com.google.photos |
483 | 24106650 | 543 | .00002611 | com.oracle |
484 | 24105370 | 7602 | .00000182 | com.pastemagazine |
485 | 24103328 | 1483 | .00000938 | gov.copyright |
486 | 24100786 | 3055 | .00000464 | org.aclu |
487 | 24094758 | 3076 | .00000460 | com.philly |
488 | 24093252 | 1685 | .00000878 | com.squareup |
489 | 24089356 | 1086 | .00001272 | com.samsung |
490 | 24087978 | 3118 | .00000451 | com.me.web |
491 | 24087660 | 1231 | .00001129 | com.cdbaby |
492 | 24087564 | 6720 | .00000205 | com.deseretnews |
493 | 24083176 | 6367 | .00000217 | com.io9 |
494 | 24081856 | 402 | .00004145 | org.wikipedia.de |
495 | 24081124 | 11338 | .00000128 | org.peta |
496 | 24080908 | 12982 | .00000113 | com.hongkiat |
497 | 24080680 | 4684 | .00000295 | com.tmz |
498 | 24077924 | 818 | .00001605 | com.amazon.aws |
499 | 24077256 | 8766 | .00000156 | org.pri |
500 | 24074186 | 2270 | .00000638 | com.oreilly |
501 | 24074110 | 2291 | .00000630 | com.freewebs |
502 | 24074088 | 1235 | .00001123 | org.wikipedia.it |
503 | 24073390 | 3848 | .00000356 | com.azcentral |
504 | 24073192 | 6245 | .00000221 | com.mentalfloss |
505 | 24069232 | 523 | .00002830 | fr.google |
506 | 24068848 | 17311 | .00000085 | com.tor |
507 | 24067132 | 2904 | .00000490 | org.worldbank |
508 | 24067054 | 1762 | .00000841 | de.heise |
509 | 24066464 | 8159 | .00000169 | com.liveleak |
510 | 24066038 | 8772 | .00000155 | com.gothamist |
511 | 24064742 | 3396 | .00000415 | com.latimes.articles |
512 | 24064626 | 9244 | .00000147 | com.extremetech |
513 | 24061640 | 3705 | .00000374 | com.yahoo.answers |
514 | 24061430 | 32734 | .00000046 | com.wreg |
515 | 24061166 | 8511 | .00000161 | com.nybooks |
516 | 24060780 | 5379 | .00000254 | com.pbase |
517 | 24059834 | 5049 | .00000272 | edu.nap |
518 | 24058554 | 6572 | .00000210 | com.cnn.sportsillustrated |
519 | 24057656 | 6644 | .00000208 | com.grantland |
520 | 24056572 | 1639 | .00000887 | gov.loc |
521 | 24055816 | 4418 | .00000313 | org.nobelprize |
522 | 24054494 | 4260 | .00000324 | com.eonline |
523 | 24053380 | 7372 | .00000189 | com.haaretz |
524 | 24053122 | 5199 | .00000264 | com.bhphotovideo |
525 | 24052964 | 4827 | .00000287 | com.esri |
526 | 24052152 | 9995 | .00000136 | org.commondreams |
527 | 24052148 | 5161 | .00000266 | com.glamour |
528 | 24051456 | 1460 | .00000960 | com.fineartamerica |
529 | 24049980 | 6715 | .00000205 | edu.uchicago.press |
530 | 24048432 | 9103 | .00000149 | gov.nasa.science |
531 | 24046894 | 35700 | .00000042 | com.bossip |
532 | 24046632 | 19728 | .00000075 | com.neatorama |
533 | 24044690 | 720 | .00001819 | org.acm |
534 | 24044462 | 4081 | .00000340 | org.weforum |
535 | 24044308 | 1897 | .00000777 | it.amazon |
536 | 24044006 | 1959 | .00000752 | me.flavors |
537 | 24043394 | 9459 | .00000143 | com.howstuffworks |
538 | 24041872 | 6466 | .00000213 | com.9to5mac |
539 | 24040142 | 4517 | .00000306 | com.uber |
540 | 24039284 | 680 | .00001992 | com.bloglovin |
541 | 24037882 | 16205 | .00000091 | com.highsnobiety |
542 | 24037316 | 4560 | .00000303 | com.audible |
543 | 24036490 | 7371 | .00000189 | com.complex |
544 | 24036176 | 15916 | .00000093 | com.time.swampland |
545 | 24034654 | 4025 | .00000344 | com.lonelyplanet |
546 | 24033934 | 11839 | .00000123 | com.dilbert |
547 | 24033436 | 3125 | .00000450 | com.deezer |
548 | 24033252 | 4168 | .00000333 | com.lynda |
549 | 24033222 | 9648 | .00000141 | com.discovermagazine.blogs |
550 | 24030840 | 4421 | .00000313 | com.cbs |
551 | 24030616 | 3761 | .00000367 | net.daringfireball |
552 | 24029860 | 1933 | .00000766 | com.patreon |
553 | 24027164 | 7181 | .00000194 | com.deadspin |
554 | 24023692 | 8065 | .00000171 | com.bostonherald |
555 | 24023434 | 6210 | .00000223 | com.cosmopolitan |
556 | 24022760 | 829 | .00001585 | jp.ne.goo.blog |
557 | 24021384 | 18750 | .00000079 | com.hotair |
558 | 24021238 | 6632 | .00000208 | com.librarything |
559 | 24021014 | 6000 | .00000230 | cc.arduino |
560 | 24019102 | 6652 | .00000207 | com.logitech |
561 | 24017382 | 3257 | .00000437 | com.asahi |
562 | 24015414 | 4050 | .00000342 | com.nationalgeographic.news |
563 | 24015322 | 20178 | .00000073 | com.matadornetwork |
564 | 24015270 | 4638 | .00000298 | com.observer |
565 | 24012214 | 5943 | .00000231 | com.copyblogger |
566 | 24011444 | 5514 | .00000247 | com.seekingalpha |
567 | 24010644 | 227 | .00007141 | mp.j |
568 | 24010056 | 236 | .00006995 | com.xiami |
569 | 24009786 | 3429 | .00000408 | com.elpais |
570 | 24008040 | 2638 | .00000541 | com.ew |
571 | 24007730 | 7976 | .00000173 | com.bonappetit |
572 | 24006944 | 4514 | .00000306 | org.lds |
573 | 24006798 | 3385 | .00000416 | com.cbssports |
574 | 24006242 | 7137 | .00000194 | com.cbslocal.newyork |
575 | 24004794 | 11215 | .00000129 | com.modelmayhem |
576 | 24004760 | 982 | .00001449 | eu.europa |
577 | 24003960 | 731 | .00001790 | com.google.hangouts |
578 | 24003550 | 6718 | .00000205 | com.vancouversun |
579 | 24002320 | 7253 | .00000191 | com.talkingpointsmemo |
580 | 24001352 | 1975 | .00000743 | com.google.spreadsheets |
581 | 24000984 | 2570 | .00000556 | cn.com.sina.blog |
582 | 24000402 | 3222 | .00000443 | com.ravelry |
583 | 23999440 | 1795 | .00000824 | com.amazon.astore |
584 | 23997902 | 1755 | .00000843 | org.eff |
585 | 23997856 | 573 | .00002431 | com.adobe.helpx |
586 | 23997488 | 7093 | .00000195 | uk.ac.vam |
587 | 23997220 | 4633 | .00000299 | com.vice.motherboard |
588 | 23996036 | 16595 | .00000089 | com.thesmokinggun |
589 | 23994000 | 11829 | .00000124 | com.imore |
590 | 23993422 | 101 | .00016094 | com.tinypic |
591 | 23993226 | 545 | .00002574 | com.msn |
592 | 23993162 | 4385 | .00000316 | ca.globalnews |
593 | 23993118 | 8243 | .00000167 | com.discovery.dsc |
594 | 23992702 | 5617 | .00000244 | com.pitchfork |
595 | 23991424 | 5858 | .00000233 | com.blogspot.youtube-global |
596 | 23990288 | 8750 | .00000156 | com.realclearpolitics |
597 | 23987906 | 678 | .00001996 | it.google |
598 | 23987890 | 7105 | .00000195 | com.scmp |
599 | 23986188 | 2777 | .00000508 | jp.or.nhk |
600 | 23982278 | 686 | .00001969 | com.hubpages |
601 | 23981872 | 2779 | .00000508 | gov.uspto |
602 | 23981698 | 1374 | .00001028 | com.timeanddate |
603 | 23981438 | 8335 | .00000165 | com.christianitytoday |
604 | 23980168 | 4066 | .00000341 | net.faz |
605 | 23978898 | 7230 | .00000192 | com.theweek |
606 | 23977900 | 28318 | .00000054 | com.gottabemobile |
607 | 23977382 | 7199 | .00000193 | org.plos.blogs |
608 | 23977204 | 4627 | .00000299 | com.howtogeek |
609 | 23975494 | 2009 | .00000731 | com.getpocket |
610 | 23973182 | 5047 | .00000272 | com.kotaku |
611 | 23972704 | 3324 | .00000425 | cc.tiny |
612 | 23971046 | 12039 | .00000122 | com.perezhilton |
613 | 23969426 | 13727 | .00000107 | com.mcclatchydc |
614 | 23968350 | 651 | .00002075 | com.aol |
615 | 23968112 | 7323 | .00000190 | com.lmgtfy |
616 | 23966254 | 822 | .00001598 | com.businesswire |
617 | 23966092 | 4345 | .00000319 | org.ibiblio |
618 | 23965478 | 3362 | .00000420 | org.unicef |
619 | 23965442 | 2417 | .00000594 | com.hollywoodreporter |
620 | 23960550 | 1041 | .00001333 | int.who |
621 | 23959580 | 1026 | .00001373 | com.android.developer |
622 | 23957294 | 5198 | .00000264 | edu.cmu |
623 | 23956930 | 6817 | .00000203 | com.sbnation |
624 | 23956548 | 7619 | .00000182 | com.marvel |
625 | 23956496 | 6517 | .00000211 | edu.harvard.law.blogs |
626 | 23952002 | 3640 | .00000382 | com.fiverr |
627 | 23950554 | 2718 | .00000520 | gov.dhs |
628 | 23949964 | 2653 | .00000535 | com.smashwords |
629 | 23948856 | 209 | .00007583 | com.facebook.ja-jp |
630 | 23948760 | 5633 | .00000243 | com.stagram.web |
631 | 23948680 | 14134 | .00000104 | com.nytimes.blogs.thelede |
632 | 23948676 | 6768 | .00000204 | com.nme |
633 | 23946666 | 4902 | .00000281 | com.hbo |
634 | 23946272 | 13062 | .00000112 | org.counterpunch |
635 | 23946038 | 11613 | .00000126 | com.cultofmac |
636 | 23943994 | 1543 | .00000913 | com.evernote |
637 | 23943942 | 270 | .00006766 | com.360doc |
638 | 23942228 | 9626 | .00000141 | com.cracked |
639 | 23940820 | 2486 | .00000575 | com.blogtalkradio |
640 | 23939540 | 357 | .00004971 | com.gravatar.en |
641 | 23937964 | 389 | .00004423 | org.icann |
642 | 23937816 | 684 | .00001977 | com.ggpht.lh3 |
643 | 23937786 | 8491 | .00000161 | com.teenvogue |
644 | 23937342 | 16658 | .00000088 | com.flickriver |
645 | 23936480 | 4027 | .00000344 | com.smithsonianmag |
646 | 23935070 | 5480 | .00000249 | com.codeproject |
647 | 23934200 | 260 | .00006883 | net.fbcdn.ak.static |
648 | 23934102 | 1816 | .00000815 | gov.census |
649 | 23933432 | 657 | .00002057 | com.linkedin.uk |
650 | 23932600 | 577 | .00002400 | com.w3schools |
651 | 23931740 | 4463 | .00000310 | com.mac.homepage |
652 | 23929122 | 10155 | .00000134 | com.rawstory |
653 | 23927662 | 3404 | .00000414 | com.squidoo |
654 | 23924746 | 2044 | .00000715 | com.dell |
655 | 23922980 | 488 | .00003104 | com.4shared |
656 | 23922814 | 14131 | .00000104 | org.mediamatters |
657 | 23920802 | 8760 | .00000156 | com.parents |
658 | 23920526 | 7287 | .00000191 | com.opera.my |
659 | 23920482 | 8124 | .00000169 | org.ieee.spectrum |
660 | 23920150 | 1038 | .00001337 | jp.geocities |
661 | 23915244 | 6512 | .00000212 | com.townhall |
662 | 23913292 | 399 | .00004167 | org.mozilla.support |
663 | 23912792 | 2006 | .00000732 | org.oecd |
664 | 23911804 | 1048 | .00001324 | org.eclipse |
665 | 23911584 | 20701 | .00000072 | com.hellogiggles |
666 | 23910298 | 8868 | .00000154 | com.clarin |
667 | 23909680 | 827 | .00001589 | com.symantec |
668 | 23909040 | 7558 | .00000184 | org.aaas |
669 | 23908752 | 2311 | .00000621 | com.justgiving |
670 | 23908444 | 4671 | .00000296 | org.coursera |
671 | 23906482 | 1938 | .00000763 | com.nydailynews |
672 | 23905082 | 343 | .00005302 | com.googleusercontent.lh6 |
673 | 23904092 | 387 | .00004468 | com.soundcloud.w |
674 | 23902250 | 1027 | .00001368 | gov.irs |
675 | 23902232 | 15133 | .00000097 | com.craveonline |
676 | 23902136 | 4011 | .00000345 | com.channel4 |
677 | 23899916 | 20101 | .00000074 | com.nature.blogs |
678 | 23897218 | 9313 | .00000146 | com.myspace.blog |
679 | 23895526 | 9205 | .00000147 | com.klout |
680 | 23894208 | 1412 | .00000996 | com.steampowered.store |
681 | 23894002 | 10218 | .00000133 | com.boredpanda |
682 | 23893512 | 707 | .00001860 | com.friendster |
683 | 23893504 | 36 | .00032755 | com.godaddy |
684 | 23893052 | 2950 | .00000481 | com.amzn |
685 | 23892424 | 13276 | .00000110 | ca.globalresearch |
686 | 23891298 | 17990 | .00000082 | org.calacademy |
687 | 23890756 | 6671 | .00000207 | net.box |
688 | 23889636 | 4900 | .00000282 | com.fanpop |
689 | 23889094 | 5845 | .00000234 | com.datacenterknowledge |
690 | 23887196 | 27479 | .00000056 | com.americanrhetoric |
691 | 23886556 | 5185 | .00000265 | com.threadless |
692 | 23884966 | 4246 | .00000325 | ms.1drv |
693 | 23883530 | 10188 | .00000134 | com.barackobama |
694 | 23883098 | 12688 | .00000116 | com.spin |
695 | 23883092 | 13664 | .00000107 | com.yahoo.pipes |
696 | 23882956 | 8684 | .00000157 | com.comedycentral |
697 | 23882896 | 1066 | .00001299 | com.googleartproject |
698 | 23882788 | 2652 | .00000535 | com.computerworld |
699 | 23881862 | 16912 | .00000087 | com.giantbomb |
700 | 23881530 | 276 | .00006705 | com.weibo.vdisk |
701 | 23881520 | 23165 | .00000064 | com.wattsupwiththat |
702 | 23879146 | 3942 | .00000347 | com.screencast |
703 | 23877632 | 10235 | .00000133 | org.tvtropes |
704 | 23877470 | 4460 | .00000310 | com.megaupload |
705 | 23876764 | 16116 | .00000092 | com.catholicnewsagency |
706 | 23876700 | 1476 | .00000947 | org.hbr |
707 | 23876552 | 24017 | .00000062 | com.cnn.blogs.religion |
708 | 23874712 | 161 | .00009861 | com.mailchimp |
709 | 23874368 | 2437 | .00000589 | com.alibaba |
710 | 23874060 | 2992 | .00000474 | com.ezinearticles |
711 | 23873944 | 3058 | .00000463 | uk.co.ebay |
712 | 23873742 | 1146 | .00001228 | org.un |
713 | 23873062 | 1710 | .00000869 | org.iso |
714 | 23867822 | 999 | .00001415 | com.snapchat |
715 | 23867776 | 6315 | .00000219 | com.victoriassecret |
716 | 23866136 | 13917 | .00000105 | com.washingtonian |
717 | 23865960 | 25817 | .00000059 | com.humanevents |
718 | 23865938 | 1722 | .00000864 | com.newgrounds |
719 | 23864840 | 1677 | .00000882 | com.biblegateway |
720 | 23860126 | 2516 | .00000567 | com.friendfeed |
721 | 23858304 | 18020 | .00000082 | com.moddb |
722 | 23857708 | 27085 | .00000057 | com.singularityhub |
723 | 23854512 | 4265 | .00000324 | com.pixlr |
724 | 23853694 | 9675 | .00000140 | com.marieclaire |
725 | 23851816 | 242 | .00006954 | com.facebook.ar-ar |
726 | 23851144 | 8505 | .00000161 | org.ams |
727 | 23851132 | 2494 | .00000572 | com.createspace |
728 | 23849798 | 1615 | .00000899 | com.ebay.stores |
729 | 23849790 | 1169 | .00001199 | com.sciencedirect |
730 | 23849238 | 5291 | .00000259 | com.tampabay |
731 | 23848164 | 4113 | .00000337 | com.ibtimes |
732 | 23847692 | 8114 | .00000170 | com.oreilly.radar |
733 | 23846722 | 16808 | .00000088 | com.escapistmagazine |
734 | 23846690 | 1233 | .00001125 | org.seomoz |
735 | 23845474 | 140 | .00011415 | com.ytimg.i |
736 | 23845438 | 1618 | .00000895 | com.net-a-porter |
737 | 23845312 | 1829 | .00000810 | com.cnet.download |
738 | 23845008 | 13611 | .00000108 | org.brooklynmuseum |
739 | 23844826 | 8231 | .00000167 | net.fanfiction |
740 | 23844270 | 15039 | .00000097 | com.flavorwire |
741 | 23843454 | 3472 | .00000402 | com.modcloth |
742 | 23841740 | 30133 | .00000050 | org.jihadwatch |
743 | 23841066 | 1160 | .00001210 | com.weather |
744 | 23841000 | 4818 | .00000287 | to.gplus |
745 | 23840946 | 1537 | .00000919 | com.viadeo |
746 | 23840690 | 7902 | .00000174 | org.edutopia |
747 | 23840514 | 3327 | .00000425 | org.apa |
748 | 23839446 | 5982 | .00000230 | de.tagesschau |
749 | 23837820 | 2994 | .00000474 | me.paypal |
750 | 23837530 | 18909 | .00000078 | edu.hawaii |
751 | 23837158 | 576 | .00002411 | com.images-amazon.ecx |
752 | 23836826 | 2544 | .00000561 | gov.fbi |
753 | 23836740 | 5216 | .00000263 | com.manta |
754 | 23836306 | 6318 | .00000219 | uk.org.nationaltrust |
755 | 23834992 | 5593 | .00000244 | com.googleusercontent.webcache |
756 | 23834916 | 8158 | .00000169 | org.truth-out |
757 | 23834858 | 4989 | .00000276 | com.typepad.sethgodin |
758 | 23833432 | 21393 | .00000069 | org.spectator |
759 | 23832456 | 7393 | .00000188 | com.mendeley |
760 | 23830808 | 3077 | .00000460 | tr.com.google |
761 | 23830588 | 2961 | .00000480 | org.cancer |
762 | 23830048 | 3361 | .00000421 | com.networkworld |
763 | 23829968 | 14271 | .00000103 | com.topix |
764 | 23829470 | 5491 | .00000249 | com.starwars |
765 | 23829128 | 2735 | .00000517 | com.hulu |
766 | 23826852 | 7672 | .00000181 | com.discovery.news |
767 | 23826450 | 2600 | .00000550 | org.dmoz |
768 | 23825662 | 6630 | .00000208 | com.villagevoice |
769 | 23824160 | 5362 | .00000255 | com.dpreview |
770 | 23823926 | 5207 | .00000263 | edu.cmu.cs |
771 | 23823176 | 15187 | .00000097 | com.dazeddigital |
772 | 23822508 | 3709 | .00000374 | org.mozilla.wiki |
773 | 23821830 | 1167 | .00001201 | gov.fda |
774 | 23820892 | 1433 | .00000981 | gov.justice |
775 | 23820826 | 3603 | .00000387 | gov.cia |
776 | 23820332 | 439 | .00003639 | com.posterous |
777 | 23820328 | 12812 | .00000114 | au.com.sbs |
778 | 23820324 | 7682 | .00000180 | com.gamasutra |
779 | 23819870 | 6665 | .00000207 | com.epicurious |
780 | 23819266 | 3771 | .00000365 | com.socialmediaexaminer |
781 | 23819210 | 9190 | .00000148 | org.sierraclub |
782 | 23819200 | 3366 | .00000420 | net.earthlink.home |
783 | 23818136 | 2005 | .00000732 | com.gartner |
784 | 23817246 | 2354 | .00000608 | com.theglobeandmail |
785 | 23817136 | 1336 | .00001063 | org.wikipedia.pt |
786 | 23816512 | 6963 | .00000198 | com.suntimes |
787 | 23816392 | 3390 | .00000416 | nl.xs4all |
788 | 23814692 | 25030 | .00000060 | com.elephantjournal |
789 | 23812264 | 6672 | .00000207 | com.cntraveler |
790 | 23812086 | 1004 | .00001409 | com.linkedin.fr |
791 | 23811986 | 4753 | .00000291 | com.nationalreview |
792 | 23811646 | 1999 | .00000735 | com.thefreedictionary |
793 | 23811380 | 280 | .00006674 | com.facebook.zh-cn |
794 | 23809264 | 4474 | .00000309 | uk.ac.ucl |
795 | 23806406 | 4613 | .00000299 | com.denverpost |
796 | 23806360 | 231 | .00007094 | kr.flic |
797 | 23806322 | 7693 | .00000180 | com.instyle |
798 | 23805592 | 23068 | .00000064 | edu.usra.lpi |
799 | 23804122 | 10350 | .00000131 | com.scobleizer |
800 | 23804022 | 4182 | .00000331 | uk.co.metro |
801 | 23803192 | 233 | .00007056 | jp.co.google |
802 | 23803018 | 4971 | .00000277 | com.nvidia |
803 | 23802870 | 3887 | .00000352 | com.irishtimes |
804 | 23802070 | 1287 | .00001099 | co.g |
805 | 23801772 | 6674 | .00000207 | edu.washington.depts |
806 | 23801418 | 7466 | .00000187 | com.tennessean |
807 | 23800658 | 2439 | .00000589 | com.hp |
808 | 23800322 | 3644 | .00000382 | com.aljazeera |
809 | 23800138 | 16549 | .00000089 | com.wmagazine |
810 | 23798820 | 579 | .00002383 | uk.co.eventbrite |
811 | 23798338 | 421 | .00003858 | com.googleadservices |
812 | 23797806 | 10117 | .00000134 | com.eater |
813 | 23797390 | 8292 | .00000166 | com.yahoo.movies |
814 | 23795640 | 6997 | .00000197 | com.bhg |
815 | 23794498 | 26069 | .00000058 | org.fair |
816 | 23793616 | 2379 | .00000600 | com.bostonglobe |
817 | 23793136 | 6353 | .00000217 | com.autoblog |
818 | 23792828 | 10025 | .00000136 | com.linuxjournal |
819 | 23792666 | 21056 | .00000070 | com.thisiscolossal |
820 | 23791402 | 15130 | .00000097 | com.radaronline |
821 | 23791170 | 14444 | .00000101 | com.fourhourworkweek |
822 | 23790400 | 3416 | .00000412 | com.thestar |
823 | 23789630 | 8992 | .00000151 | gov.nasa.apod |
824 | 23789108 | 3558 | .00000391 | com.twitpic |
825 | 23789038 | 85 | .00017941 | com.twitter.status |
826 | 23787750 | 7183 | .00000194 | com.financialexpress |
827 | 23787178 | 7036 | .00000197 | mx.com.eluniversal |
828 | 23787170 | 4196 | .00000330 | edu.princeton |
829 | 23786954 | 12725 | .00000115 | edu.uvm |
830 | 23786442 | 1244 | .00001108 | com.skype |
831 | 23785620 | 1350 | .00001049 | com.flickr.static.farm3 |
832 | 23784050 | 19151 | .00000077 | com.slashfilm |
833 | 23783102 | 8462 | .00000162 | org.nypl |
834 | 23781314 | 13888 | .00000106 | com.associatedcontent |
835 | 23780690 | 3787 | .00000363 | org.gutenberg |
836 | 23779958 | 217 | .00007285 | org.bbb |
837 | 23779836 | 7685 | .00000180 | com.macrumors |
838 | 23778094 | 13271 | .00000110 | com.theroot |
839 | 23776510 | 5072 | .00000271 | com.akamai |
840 | 23775880 | 3305 | .00000428 | au.com.theaustralian |
841 | 23775714 | 13415 | .00000109 | com.factmag |
842 | 23774368 | 18373 | .00000080 | com.marykay |
843 | 23774028 | 7646 | .00000181 | com.viddler |
844 | 23771308 | 1937 | .00000763 | com.android |
845 | 23770202 | 7311 | .00000191 | gov.loc.memory |
846 | 23769696 | 1765 | .00000840 | com.yahoo.search |
847 | 23769628 | 1149 | .00001223 | com.feedburner |
848 | 23769516 | 1142 | .00001233 | com.google.adwords |
849 | 23769416 | 5410 | .00000253 | com.diigo |
850 | 23768256 | 10007 | .00000136 | com.mattcutts |
851 | 23767426 | 15826 | .00000093 | org.davidsuzuki |
852 | 23766438 | 11933 | .00000123 | com.break |
853 | 23766120 | 475 | .00003242 | org.drupal |
854 | 23765766 | 33614 | .00000045 | com.animalnewyork |
855 | 23765754 | 19037 | .00000078 | com.crooksandliars |
856 | 23765726 | 1920 | .00000772 | com.steamcommunity |
857 | 23765702 | 13001 | .00000113 | com.weeklystandard |
858 | 23765452 | 10549 | .00000130 | com.tuaw |
859 | 23764140 | 21194 | .00000070 | com.inthesetimes |
860 | 23761976 | 5530 | .00000247 | org.hrc |
861 | 23761956 | 1860 | .00000793 | com.networkedblogs |
862 | 23760534 | 3114 | .00000452 | com.theknot |
863 | 23760368 | 33353 | .00000045 | com.littlegreenfootballs |
864 | 23760260 | 3635 | .00000382 | com.barnesandnoble.search |
865 | 23759984 | 3656 | .00000380 | com.globo.g1 |
866 | 23758774 | 8577 | .00000159 | com.smittenkitchen |
867 | 23758726 | 2079 | .00000705 | es.amazon |
868 | 23758722 | 14721 | .00000100 | com.ktla |
869 | 23758592 | 19539 | .00000076 | com.rediff |
870 | 23758238 | 21443 | .00000069 | com.artofmanliness |
871 | 23758220 | 17645 | .00000083 | org.whitney |
872 | 23757664 | 9843 | .00000138 | com.menshealth |
873 | 23757282 | 2211 | .00000660 | com.nypost |
874 | 23756390 | 2958 | .00000481 | com.gstatic.t0 |
875 | 23756362 | 13499 | .00000109 | com.ffffound |
876 | 23755804 | 9503 | .00000143 | com.inhabitat |
877 | 23755314 | 4301 | .00000322 | edu.columbia |
878 | 23754390 | 10407 | .00000131 | com.hypebeast |
879 | 23753534 | 4665 | .00000296 | com.thinkgeek |
880 | 23753136 | 7586 | .00000183 | com.foodandwine |
881 | 23752360 | 9750 | .00000139 | org.wikibooks.en |
882 | 23750508 | 11492 | .00000127 | com.gocomics |
883 | 23750416 | 153 | .00010465 | ru.yandex |
884 | 23749606 | 21336 | .00000069 | edu.rochester |
885 | 23749390 | 43018 | .00000035 | com.2dopeboyz |
886 | 23747876 | 19010 | .00000078 | org.nycgovparks |
887 | 23747726 | 17755 | .00000083 | com.justjared |
888 | 23746300 | 15964 | .00000092 | com.blogspot.googlemobile |
889 | 23745588 | 6710 | .00000206 | com.wwd |
890 | 23745272 | 333 | .00005701 | org.fedoraproject |
891 | 23742476 | 13660 | .00000107 | com.hollywoodlife |
892 | 23742310 | 9994 | .00000136 | mil.navy |
893 | 23741808 | 967 | .00001490 | com.staticflickr.farm8 |
894 | 23741366 | 20244 | .00000073 | com.coolhunting |
895 | 23740522 | 17885 | .00000082 | com.magcloud |
896 | 23739544 | 2770 | .00000509 | com.gettyimages |
897 | 23737558 | 15787 | .00000093 | com.washingtonpost.articles |
898 | 23737374 | 272 | .00006752 | net.akamaihd.fbstatic-a |
899 | 23737334 | 3662 | .00000380 | uk.co.thesun |
900 | 23735470 | 5484 | .00000249 | edu.yale |
901 | 23734544 | 19149 | .00000077 | org.labnol |
902 | 23734358 | 449 | .00003503 | nl.google |
903 | 23733716 | 25945 | .00000058 | org.globalvoicesonline |
904 | 23732498 | 2489 | .00000574 | dk.google |
905 | 23732492 | 1345 | .00001055 | com.staticflickr.farm9 |
906 | 23732404 | 228 | .00007126 | com.facebook.da-dk |
907 | 23732036 | 350 | .00005193 | us.imageshack |
908 | 23731808 | 3550 | .00000392 | com.mercurynews |
909 | 23731588 | 24915 | .00000060 | com.celebuzz |
910 | 23731524 | 5769 | .00000236 | com.yahoo.groups.tech |
911 | 23730844 | 5249 | .00000261 | int.esa |
912 | 23730468 | 6716 | .00000205 | com.linkedin.blog |
913 | 23730430 | 15982 | .00000092 | com.eurasiareview |
914 | 23730204 | 163 | .00009658 | com.blogblog.img1 |
915 | 23729604 | 16528 | .00000089 | com.redstate |
916 | 23728680 | 13070 | .00000112 | com.torrentfreak |
917 | 23728206 | 23033 | .00000065 | com.movieweb |
918 | 23727744 | 13113 | .00000112 | com.seroundtable |
919 | 23726214 | 1733 | .00000856 | edu.cornell.law |
920 | 23726080 | 3120 | .00000451 | com.nifty.homepage3 |
921 | 23725478 | 19732 | .00000075 | com.craphound |
922 | 23724518 | 700 | .00001890 | com.ggpht.lh6 |
923 | 23723162 | 1737 | .00000851 | com.hotmail |
924 | 23723128 | 31743 | .00000048 | org.thesocietypages |
925 | 23722898 | 2015 | .00000729 | com.spotify.play |
926 | 23722214 | 9125 | .00000149 | com.scientificamerican.blogs |
927 | 23722022 | 3445 | .00000405 | com.digitaltrends |
928 | 23721446 | 3659 | .00000380 | com.jamendo |
929 | 23721156 | 2859 | .00000498 | com.netflix |
930 | 23720582 | 15182 | .00000097 | com.stereogum |
931 | 23719906 | 84 | .00017961 | com.twitter.business |
932 | 23719456 | 180 | .00008153 | com.blogblog.img2 |
933 | 23719080 | 8438 | .00000163 | com.dailydot |
934 | 23717794 | 4118 | .00000337 | edu.bu |
935 | 23717000 | 4558 | .00000303 | com.zara |
936 | 23715354 | 8040 | .00000171 | net.asp.weblogs |
937 | 23714546 | 2453 | .00000583 | com.ebay.rover |
938 | 23713182 | 8138 | .00000169 | com.marketingprofs |
939 | 23712890 | 11959 | .00000122 | com.takepart |
940 | 23711162 | 6443 | .00000214 | org.propublica |
941 | 23708912 | 4866 | .00000284 | com.makeuseof |
942 | 23706304 | 13298 | .00000110 | com.models |
943 | 23705808 | 8228 | .00000167 | com.sportingnews |
944 | 23705042 | 6501 | .00000212 | com.digitaljournal |
945 | 23704958 | 5382 | .00000254 | com.active |
946 | 23704410 | 8643 | .00000158 | ar.com.lanacion |
947 | 23704224 | 11282 | .00000129 | com.ssrn |
948 | 23704176 | 14358 | .00000102 | com.gazette |
949 | 23704156 | 3007 | .00000471 | org.pewinternet |
950 | 23703788 | 12315 | .00000119 | org.caringbridge |
951 | 23702268 | 6669 | .00000207 | fm.ask |
952 | 23701806 | 7727 | .00000179 | com.politifact |
953 | 23701452 | 12920 | .00000113 | com.theoatmeal |
954 | 23700692 | 77 | .00018590 | com.twitter.api |
955 | 23700068 | 11585 | .00000126 | org.brainpickings |
956 | 23699570 | 6414 | .00000215 | com.harpercollins |
957 | 23699250 | 19689 | .00000075 | net.360cities |
958 | 23699166 | 55038 | .00000028 | nz.co.sciblogs |
959 | 23697458 | 4465 | .00000310 | com.starbucks |
960 | 23697382 | 5868 | .00000233 | com.elle |
961 | 23696812 | 28485 | .00000054 | com.listverse |
962 | 23695400 | 564 | .00002484 | com.booking |
963 | 23693668 | 2996 | .00000473 | com.dallasnews |
964 | 23693160 | 3621 | .00000384 | com.pastebin |
965 | 23692226 | 6156 | .00000225 | com.purevolume |
966 | 23691954 | 1310 | .00001077 | com.amazon.smile |
967 | 23691362 | 32716 | .00000046 | com.truthdig |
968 | 23691286 | 5706 | .00000240 | com.knowyourmeme |
969 | 23687582 | 25990 | .00000058 | com.babble.blogs |
970 | 23686080 | 3355 | .00000421 | com.vanityfair |
971 | 23685698 | 32746 | .00000046 | net.fubiz |
972 | 23685224 | 562 | .00002501 | com.giphy |
973 | 23684482 | 2113 | .00000689 | com.intel |
974 | 23683854 | 3139 | .00000447 | com.livescience |
975 | 23683060 | 13332 | .00000110 | uk.org.iwm |
976 | 23682760 | 5190 | .00000265 | com.randomhouse |
977 | 23681520 | 1191 | .00001159 | es.google.maps |
978 | 23681176 | 37675 | .00000040 | com.tucsonweekly |
979 | 23680142 | 9980 | .00000136 | com.gilt |
980 | 23678464 | 2799 | .00000503 | com.gstatic.t2 |
981 | 23678408 | 8966 | .00000152 | org.thisamericanlife |
982 | 23678164 | 22636 | .00000066 | uk.co.creativereview |
983 | 23676348 | 5792 | .00000235 | com.microsofttranslator |
984 | 23676190 | 1633 | .00000891 | gov.sec |
985 | 23674500 | 17492 | .00000084 | com.penny-arcade |
986 | 23673996 | 1210 | .00001140 | com.springer.link |
987 | 23673626 | 2412 | .00000596 | com.redhat |
988 | 23673126 | 19650 | .00000075 | org.newsbusters |
989 | 23672884 | 238 | .00006976 | com.facebook.el-gr |
990 | 23672818 | 8628 | .00000158 | com.heavy |
991 | 23672144 | 9097 | .00000150 | com.globalpost |
992 | 23671496 | 14074 | .00000104 | com.wisegeek |
993 | 23670910 | 7992 | .00000172 | com.animoto |
994 | 23670052 | 778 | .00001694 | com.naver.blog |
995 | 23669322 | 8386 | .00000164 | com.time.techland |
996 | 23669190 | 8652 | .00000158 | com.jalopnik |
997 | 23668684 | 3562 | .00000391 | com.indiatimes.economictimes |
998 | 23667394 | 474 | .00003251 | ru.vkontakte |
999 | 23666552 | 1487 | .00000937 | com.msnbc |
1000 | 23666094 | 16350 | .00000090 | com.rockpapershotgun |
Data and download instructions
The host-level graph and the rankings are available on AWS S3 under the path
s3://commoncrawl/projects/hyperlinkgraph/cc-main-2017-feb-mar-apr-hostgraph/
Alternatively, you can use
https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2017-feb-mar-apr-hostgraph/
as a prefix to access the files over HTTPS from anywhere.
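As a minimal sketch of how the HTTPS prefix can be combined with a file name from the table of download files, the following Python snippet builds a download URL and streams a file to disk. The helper names `file_url` and `download` are hypothetical and only serve as illustration; the snippet relies on the Python standard library alone.

```python
import shutil
import urllib.request

# Prefix as given in the text; file names come from the table of download files.
HTTPS_PREFIX = (
    "https://data.commoncrawl.org/projects/hyperlinkgraph/"
    "cc-main-2017-feb-mar-apr-hostgraph/"
)


def file_url(name: str, prefix: str = HTTPS_PREFIX) -> str:
    """Join the prefix and a file name into a download URL."""
    return prefix.rstrip("/") + "/" + name.lstrip("/")


def download(name: str, target: str) -> None:
    """Stream one file to disk without loading it fully into memory.

    Hypothetical helper: it performs a plain HTTPS GET and does not
    retry or resume interrupted transfers.
    """
    with urllib.request.urlopen(file_url(name)) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Only prints the URL; call download(...) explicitly to fetch data.
    print(file_url("bvgraph.properties"))
```

Starting with one of the small files (for example `bvgraph.properties` or `bvgraph.stats`, about 1 kB each) is a cheap way to check access before fetching the multi-gigabyte graph files.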
The following files and formats are provided:
Download files of the Common Crawl Feb/Mar/Apr 2017 host-level webgraph
Size | File | Description |
---|---|---|
2.72 GB | vertices.txt.gz | nodes 〈id, rev host〉 |
9.42 GB | edges.txt.gz | edges 〈from_id, to_id〉 |
4.51 GB | bvgraph.graph | graph in BVGraph format |
0.22 GB | bvgraph.offsets | |
1 kB | bvgraph.properties | |
5.06 GB | bvgraph-t.graph | transpose of the graph (outlinks mapped to inlinks) |
0.47 GB | bvgraph-t.offsets | |
1 kB | bvgraph-t.properties | |
1 kB | bvgraph.stats | WebGraph statistics |
6.26 GB | ranks.txt.gz | harmonic centrality and PageRank |
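The text formats above can be read with a few lines of Python. The sketch below parses node lines 〈id, rev host〉 and edge lines 〈from_id, to_id〉 and turns a reversed host name back into its usual form. It assumes one record per line with tab-separated fields; the separator is an assumption and is not stated in the table, so check a few lines of the real files first. All function names are hypothetical.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def unreverse_host(rev_host: str) -> str:
    """Turn 'com.example.subdomain' back into 'subdomain.example.com'.

    A leading 'www.' that was stripped during graph construction
    cannot be restored.
    """
    return ".".join(reversed(rev_host.split(".")))


def read_vertices(lines: TextIO, sep: str = "\t") -> Iterator[Tuple[int, str]]:
    """Yield (node id, reversed host name); the separator is an assumption."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        node_id, rev_host = line.split(sep, 1)
        yield int(node_id), rev_host


def read_edges(lines: TextIO, sep: str = "\t") -> Iterator[Tuple[int, int]]:
    """Yield (from_id, to_id) pairs; the separator is an assumption."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        from_id, to_id = line.split(sep, 1)
        yield int(from_id), int(to_id)


def open_gzip_text(path: str) -> TextIO:
    """Open a downloaded .txt.gz file for line-by-line reading."""
    return gzip.open(path, "rt", encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real data set.
    sample = io.StringIO("0\tcom.example\n1\tcom.example.subdomain\n")
    for node_id, rev_host in read_vertices(sample):
        print(node_id, unreverse_host(rev_host))
```

Because the files are several gigabytes in size, iterating line by line through `open_gzip_text("vertices.txt.gz")` keeps memory use low; loading all 385 million nodes into a Python dictionary at once is likely to be impractical on a typical workstation.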
We hope the data will be useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!
Credits
Thanks to
- Web Data Commons, for their web graph data set and everything related.
- Common Search; we first used their web graph to expand the crawler frontier, and Common Search’s cosr-back project was an important source of inspiration for how to process our data using PySpark.
- the authors of the WebGraph framework, whose software simplifies the computation of rankings.