We are pleased to announce a new release of host-level and domain-level web graphs based on the published crawls of February, March and April 2018. Additional information about data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases (e.g., Nov/Dec/Jan 2017-2018 Webgraphs).

What’s new?

The graphs now contain links from sitemap announcements in robots.txt files. This small addition of 2.5 million inter-host links is motivated by the fact that sitemap directives are sometimes (see examples 1, 2, 3) used for link spam or aggressive SEO, often in combination with excessive use of inter-host hyperlinks on HTML pages. We hope that this addition helps link spam detection algorithms achieve a higher detection rate.

Host-level graph

The graph consists of 2.14 billion nodes and 10.15 billion edges and includes dangling nodes, i.e., hosts that have not been crawled, yet are pointed to from a link on a crawled page. There are 2.02 billion dangling nodes (94%), and the largest strongly connected component contains only 77 million nodes (3.6%). Host names are reversed and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
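The host name normalization described above can be sketched as follows. This is a minimal illustration, not the code used to build the graph; the function name `reverse_host` is hypothetical, and lowercasing the host name is an assumption that the announcement does not state.

```python
def reverse_host(host: str) -> str:
    """Strip a leading 'www.' and reverse the dot-separated labels.

    Assumption: host names are lowercased first (not stated in the post).
    """
    host = host.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # "subdomain.example.com" -> ["subdomain", "example", "com"] -> reversed
    return ".".join(reversed(host.split(".")))


if __name__ == "__main__":
    print(reverse_host("www.subdomain.example.com"))
```

Applying the same label reversal to a reversed name (without the www. step) gives back the host name minus any stripped www. prefix.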

You can download the graph and the ranks of all 2 billion hosts from AWS S3 at the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-feb-mar-apr/host/. Alternatively, you can use https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2018-feb-mar-apr/host/ as a prefix to access the files from anywhere.
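The two prefixes differ only in their leading part, so an S3 path can be turned into the corresponding HTTPS URL by swapping it. The sketch below shows this; the helper name `s3_to_https` is hypothetical, and the file name in the example is taken from the download table of this post.

```python
S3_PREFIX = "s3://commoncrawl/"
HTTPS_PREFIX = "https://commoncrawl.s3.amazonaws.com/"


def s3_to_https(s3_path: str) -> str:
    """Map an s3://commoncrawl/... path to its HTTPS equivalent."""
    if not s3_path.startswith(S3_PREFIX):
        raise ValueError(f"not a commoncrawl S3 path: {s3_path}")
    return HTTPS_PREFIX + s3_path[len(S3_PREFIX):]


HOST_GRAPH_DIR = (
    "s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-feb-mar-apr/host/"
)

if __name__ == "__main__":
    # Full URL of one of the files listed in the host-level table.
    print(s3_to_https(HOST_GRAPH_DIR + "cc-main-2018-feb-mar-apr-host.stats"))
```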

The following files and formats are provided:

Download files of the Common Crawl Feb/Mar/Apr 2018 host-level webgraph

Size | File | Description
12.45 GB | cc-main-2018-feb-mar-apr-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 80 vertices files
50.22 GB | cc-main-2018-feb-mar-apr-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 160 edges files
20.68 GB | cc-main-2018-feb-mar-apr-host.graph | graph in BVGraph format
2 kB | cc-main-2018-feb-mar-apr-host.properties |
24.82 GB | cc-main-2018-feb-mar-apr-host-t.graph | transpose of the graph (outlinks inverted to inlinks)
2 kB | cc-main-2018-feb-mar-apr-host-t.properties |
1 kB | cc-main-2018-feb-mar-apr-host.stats | WebGraph statistics
28.84 GB | cc-main-2018-feb-mar-apr-host-ranks.txt.gz | harmonic centrality and pagerank
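As an illustration of the vertices and edges formats, the following sketch joins edge lines with vertex lines to list outlinks by host name. It assumes that the fields of each line are separated by a tab character, which this post does not state; the function names and the sample ids are hypothetical. The real files are gzip-compressed and far too large to hold in an in-memory dictionary like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_vertex(line: str) -> Tuple[int, str]:
    """Parse one vertices line: <id, rev host> (tab-separated, assumed)."""
    node_id, rev_host = line.rstrip("\n").split("\t")
    return int(node_id), rev_host


def parse_edge(line: str) -> Tuple[int, int]:
    """Parse one edges line: <from_id, to_id> (tab-separated, assumed)."""
    from_id, to_id = line.rstrip("\n").split("\t")
    return int(from_id), int(to_id)


def outlinks_by_host(
    vertex_lines: Iterable[str], edge_lines: Iterable[str]
) -> Dict[str, List[str]]:
    """Resolve node ids to reversed host names and group edges by source."""
    names = dict(parse_vertex(line) for line in vertex_lines)
    out: Dict[str, List[str]] = defaultdict(list)
    for line in edge_lines:
        src, dst = parse_edge(line)
        out[names[src]].append(names[dst])
    return dict(out)


# Made-up sample data for illustration only.
sample_vertices = ["0\tcom.example\n", "1\tcom.example.subdomain\n", "2\torg.example\n"]
sample_edges = ["0\t1\n", "0\t2\n", "1\t2\n"]

if __name__ == "__main__":
    print(outlinks_by_host(sample_vertices, sample_edges))
```

In this sample, org.example has no outgoing edges and therefore does not appear as a key in the result.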

Domain-level graph

The domain graph was built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org.
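The aggregation step can be sketched on reversed host names as shown below. This is only an illustration under stated assumptions: `SAMPLE_SUFFIXES` is a tiny hand-picked stand-in for the real public suffix list, the function names are hypothetical, and dropping links that stay within one domain is an assumption, not something this post specifies.

```python
from typing import Iterable, Optional, Set, Tuple

# Tiny illustrative subset; the real list from publicsuffix.org is much larger.
SAMPLE_SUFFIXES = {"com", "org", "co.uk"}


def pay_level_domain(rev_host: str) -> Optional[str]:
    """Return the reversed pay-level domain: public suffix plus one label."""
    labels = rev_host.split(".")
    # Try the longest candidate suffix first, leaving at least one extra label.
    for n in range(len(labels) - 1, 0, -1):
        suffix = ".".join(reversed(labels[:n]))
        if suffix in SAMPLE_SUFFIXES:
            return ".".join(labels[: n + 1])
    return None  # no known suffix, or the name is itself a bare suffix


def aggregate_edges(
    host_edges: Iterable[Tuple[str, str]]
) -> Set[Tuple[str, str]]:
    """Collapse host-level edges into distinct domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        s, d = pay_level_domain(src), pay_level_domain(dst)
        # Assumption: links within the same domain are not kept.
        if s is not None and d is not None and s != d:
            domain_edges.add((s, d))
    return domain_edges


if __name__ == "__main__":
    edges = [
        ("com.example.blog", "com.example.shop"),
        ("com.example.blog", "uk.co.bbc.news"),
        ("com.example.shop", "uk.co.bbc"),
    ]
    print(aggregate_edges(edges))
```

Counting how many distinct hosts map to each pay-level domain in the same pass would give a value like the num hosts field of the domain vertices file.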

The domain-level graph has 98 million nodes and 1.5 billion edges. 56 million nodes (57%) are dangling nodes, and the largest strongly connected component covers 37 million nodes (38%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-feb-mar-apr/domain/ or, over HTTPS, under https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2018-feb-mar-apr/domain/.

Download files of the Common Crawl Feb/Mar/Apr 2018 domain-level webgraph

Size | File | Description
0.68 GB | cc-main-2018-feb-mar-apr-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩
6.04 GB | cc-main-2018-feb-mar-apr-domain-edges.txt.gz | edges ⟨from_id, to_id⟩
3.32 GB | cc-main-2018-feb-mar-apr-domain.graph | graph in BVGraph format
2 kB | cc-main-2018-feb-mar-apr-domain.properties |
3.53 GB | cc-main-2018-feb-mar-apr-domain-t.graph | transpose of the graph
2 kB | cc-main-2018-feb-mar-apr-domain-t.properties |
1 kB | cc-main-2018-feb-mar-apr-domain.stats | WebGraph statistics
2.06 GB | cc-main-2018-feb-mar-apr-domain-ranks.txt.gz | harmonic centrality and pagerank

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 98 million domains is available for download.
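For readers who want to re-rank the list themselves, here is a minimal sketch that selects the top entries by either measure. It assumes that each line of the ranks file carries the same five columns as the table in this post (harmonic centrality rank, harmonic centrality value, PageRank rank, PageRank value, reversed name), separated by tabs; that layout is an assumption, so check the file before relying on it. The names `RankRow`, `parse_rank_line` and `top_k` are hypothetical, and the sample rows are copied from the table.

```python
import heapq
from typing import Iterable, List, NamedTuple


class RankRow(NamedTuple):
    hc_rank: int
    hc_value: float
    pr_rank: int
    pr_value: float
    rev_name: str


def parse_rank_line(line: str) -> RankRow:
    """Parse one line with five tab-separated columns (assumed layout)."""
    f = line.rstrip("\n").split("\t")
    return RankRow(int(f[0]), float(f[1]), int(f[2]), float(f[3]), f[4])


def top_k(rows: Iterable[RankRow], k: int, by: str = "harmonic") -> List[RankRow]:
    """Return the k best rows by harmonic centrality rank or PageRank rank."""
    if by == "harmonic":
        key = lambda r: r.hc_rank
    elif by == "pagerank":
        key = lambda r: r.pr_rank
    else:
        raise ValueError("by must be 'harmonic' or 'pagerank'")
    # nsmallest streams over the rows, so the full list need not be sorted.
    return heapq.nsmallest(k, rows, key=key)


# Three rows taken from the table in this post.
sample_lines = [
    "9\t20871022\t10\t0.003315\tcom.linkedin",
    "118\t17844996\t9\t0.003327\tcom.godaddy",
    "1\t27160490\t1\t0.016926\tcom.googleapis",
]

if __name__ == "__main__":
    rows = [parse_rank_line(line) for line in sample_lines]
    print([r.rev_name for r in top_k(rows, 2, by="pagerank")])
```

The sample shows how far the two measures can diverge: com.godaddy is ranked 118 by harmonic centrality but 9 by PageRank.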

Top 1000 domains ranked by harmonic centrality (Feb/Mar/Apr 2018)

harmonic centrality rank | hc value | PageRank rank | PageRank value | reversed hostname
1 | 27160490 | 1 | 0.016926 | com.googleapis
2 | 26917400 | 2 | 0.013683 | com.facebook
3 | 25023208 | 3 | 0.009981 | com.google
4 | 24241492 | 4 | 0.008519 | com.twitter
5 | 23923990 | 5 | 0.007249 | com.youtube
6 | 23055944 | 6 | 0.006339 | org.w
7 | 21208132 | 7 | 0.004853 | org.gmpg
8 | 21126452 | 8 | 0.003674 | com.instagram
9 | 20871022 | 10 | 0.003315 | com.linkedin
10 | 20308692 | 14 | 0.002286 | com.wordpress
11 | 20254236 | 12 | 0.002787 | org.wordpress
12 | 20083558 | 23 | 0.001537 | com.gravatar
13 | 20073594 | 26 | 0.001290 | org.wikipedia
14 | 19978538 | 18 | 0.001671 | com.pinterest
15 | 19658798 | 11 | 0.002973 | com.bootstrapcdn
16 | 19570742 | 20 | 0.001624 | com.apple
17 | 19538492 | 31 | 0.000948 | com.blogspot
18 | 19437996 | 25 | 0.001294 | com.vimeo
19 | 19203616 | 35 | 0.000867 | gl.goo
20 | 19193288 | 37 | 0.000853 | com.amazon
21 | 19158752 | 17 | 0.001701 | com.adobe
22 | 19158570 | 29 | 0.001124 | com.microsoft
23 | 19126248 | 45 | 0.000704 | com.tumblr
24 | 19082854 | 15 | 0.001792 | com.macromedia
25 | 19011582 | 36 | 0.000856 | com.wp
26 | 19005558 | 41 | 0.000794 | ly.bit
27 | 18975638 | 54 | 0.000499 | com.yahoo
28 | 18971200 | 16 | 0.001786 | com.googletagmanager
29 | 18852580 | 44 | 0.000711 | be.youtu
30 | 18806694 | 33 | 0.000921 | com.amazonaws
31 | 18791732 | 32 | 0.000922 | com.paypal
32 | 18760344 | 19 | 0.001634 | com.cloudflare
33 | 18706230 | 42 | 0.000775 | com.flickr
34 | 18680604 | 61 | 0.000433 | org.mozilla
35 | 18680160 | 30 | 0.001017 | com.github
36 | 18678006 | 96 | 0.000297 | com.googleusercontent
37 | 18625646 | 69 | 0.000409 | com.weebly
38 | 18604632 | 49 | 0.000594 | org.w3
39 | 18547840 | 66 | 0.000424 | org.creativecommons
40 | 18536644 | 83 | 0.000354 | com.soundcloud
41 | 18495276 | 24 | 0.001407 | net.doubleclick
42 | 18465844 | 149 | 0.000207 | com.blogger
43 | 18401620 | 147 | 0.000212 | com.imgur
44 | 18400744 | 40 | 0.000809 | me.wp
45 | 18397664 | 28 | 0.001163 | com.gstatic
46 | 18392744 | 55 | 0.000498 | com.list-manage
47 | 18391416 | 174 | 0.000162 | com.myspace
48 | 18390430 | 67 | 0.000417 | com.medium
49 | 18373270 | 154 | 0.000194 | net.slideshare
50 | 18360766 | 87 | 0.000337 | io.github
51 | 18348888 | 177 | 0.000161 | com.wsj
52 | 18332910 | 51 | 0.000589 | co.t
53 | 18256278 | 228 | 0.000115 | com.reuters
54 | 18245054 | 39 | 0.000818 | org.apache
55 | 18231016 | 38 | 0.000837 | com.statcounter
56 | 18220940 | 236 | 0.000110 | uk.co.telegraph
57 | 18214734 | 68 | 0.000415 | eu.europa
58 | 18207382 | 232 | 0.000111 | org.npr
59 | 18200074 | 292 | 0.000087 | com.appspot
60 | 18195584 | 53 | 0.000550 | com.jquery
61 | 18185738 | 166 | 0.000178 | com.android
62 | 18184272 | 266 | 0.000096 | com.cnbc
63 | 18183666 | 146 | 0.000213 | com.issuu
64 | 18181476 | 78 | 0.000387 | com.cnn
65 | 18170136 | 261 | 0.000099 | com.about
66 | 18161744 | 133 | 0.000240 | com.nytimes
67 | 18160852 | 137 | 0.000231 | com.yelp
68 | 18152792 | 253 | 0.000102 | me.about
69 | 18132888 | 170 | 0.000172 | com.spotify
70 | 18111336 | 190 | 0.000146 | uk.co.bbc
71 | 18104748 | 151 | 0.000201 | com.wixsite
72 | 18099394 | 171 | 0.000168 | com.tripadvisor
73 | 18098982 | 178 | 0.000160 | org.gnu
74 | 18086240 | 204 | 0.000134 | org.wikimedia
75 | 18085918 | 356 | 0.000072 | edu.washington
76 | 18084938 | 43 | 0.000730 | net.cloudfront
77 | 18080872 | 201 | 0.000135 | com.oracle
78 | 18075484 | 282 | 0.000088 | org.python
79 | 18065858 | 427 | 0.000063 | org.chromium
80 | 18063268 | 165 | 0.000182 | org.ietf
81 | 18058042 | 72 | 0.000391 | com.huffingtonpost
82 | 18050730 | 403 | 0.000067 | edu.ucla
83 | 18036482 | 449 | 0.000059 | edu.princeton
84 | 18028720 | 368 | 0.000070 | com.slate
85 | 18023116 | 120 | 0.000254 | com.reddit
86 | 18017944 | 241 | 0.000107 | com.mozilla
87 | 18013952 | 305 | 0.000085 | com.mysql
88 | 18004788 | 63 | 0.000428 | com.ytimg
89 | 17983326 | 60 | 0.000457 | com.bing
90 | 17979552 | 139 | 0.000220 | com.dropbox
91 | 17970658 | 414 | 0.000065 | com.pixabay
92 | 17969240 | 213 | 0.000122 | com.nbcnews
93 | 17948072 | 164 | 0.000182 | com.forbes
94 | 17947722 | 332 | 0.000077 | gov.loc
95 | 17943522 | 380 | 0.000069 | com.googlecode
96 | 17938852 | 167 | 0.000176 | org.archive
97 | 17936504 | 311 | 0.000083 | com.foursquare
98 | 17935122 | 160 | 0.000186 | net.sourceforge
99 | 17922956 | 271 | 0.000093 | com.go
100 | 17914992 | 511 | 0.000053 | edu.gatech
101 | 17914728 | 189 | 0.000148 | com.theguardian
102 | 17911318 | 57 | 0.000476 | org.schema
103 | 17908700 | 207 | 0.000129 | es.google
104 | 17904466 | 268 | 0.000094 | com.example
105 | 17893934 | 234 | 0.000110 | com.hubspot
106 | 17893100 | 34 | 0.000899 | com.squarespace
107 | 17882030 | 71 | 0.000392 | com.paypalobjects
108 | 17876036 | 91 | 0.000322 | com.mashable
109 | 17871792 | 390 | 0.000068 | com.steampowered
110 | 17871338 | 329 | 0.000079 | gov.fda
111 | 17866752 | 81 | 0.000369 | com.wix
112 | 17863474 | 56 | 0.000486 | net.fbcdn
113 | 17861988 | 384 | 0.000069 | com.tinypic
114 | 17860528 | 431 | 0.000062 | com.variety
115 | 17859474 | 382 | 0.000069 | org.nodejs
116 | 17856378 | 203 | 0.000135 | edu.stanford
117 | 17848552 | 221 | 0.000119 | com.dribbble
118 | 17844996 | 9 | 0.003327 | com.godaddy
119 | 17835638 | 210 | 0.000126 | com.tinyurl
120 | 17833096 | 62 | 0.000433 | com.fb
121 | 17827518 | 252 | 0.000102 | com.businessinsider
122 | 17820142 | 585 | 0.000048 | edu.utah
123 | 17816830 | 558 | 0.000050 | edu.illinois
124 | 17814998 | 211 | 0.000125 | com.imdb
125 | 17807936 | 273 | 0.000093 | com.live
126 | 17798828 | 229 | 0.000112 | com.washingtonpost
127 | 17795696 | 286 | 0.000088 | au.com.google
128 | 17794884 | 665 | 0.000046 | com.chrome
129 | 17794014 | 310 | 0.000083 | edu.mit
130 | 17793162 | 215 | 0.000121 | com.typepad
131 | 17792828 | 246 | 0.000104 | com.techcrunch
132 | 17792612 | 391 | 0.000068 | com.sun
133 | 17788146 | 459 | 0.000059 | org.sciencemag
134 | 17786288 | 463 | 0.000058 | org.eclipse
135 | 17785568 | 436 | 0.000061 | com.withgoogle
136 | 17763488 | 104 | 0.000272 | com.addthis
137 | 17762648 | 144 | 0.000217 | com.jimdo
138 | 17751854 | 89 | 0.000332 | net.akamaihd
139 | 17750766 | 505 | 0.000053 | com.nike
140 | 17746698 | 281 | 0.000088 | com.bloomberg
141 | 17746522 | 454 | 0.000059 | org.ampproject
142 | 17744596 | 306 | 0.000084 | edu.harvard
143 | 17742618 | 677 | 0.000045 | com.hbo
144 | 17740448 | 308 | 0.000083 | com.cnet
145 | 17739928 | 466 | 0.000058 | co.g
146 | 17733410 | 510 | 0.000053 | com.chron
147 | 17732706 | 544 | 0.000051 | com.jetbrains
148 | 17731054 | 531 | 0.000052 | edu.tamu
149 | 17726236 | 172 | 0.000167 | com.etsy
150 | 17724162 | 479 | 0.000055 | com.sap
151 | 17723422 | 591 | 0.000047 | com.wikidot
152 | 17723074 | 445 | 0.000060 | com.libsyn
153 | 17708952 | 27 | 0.001246 | ru.yandex
154 | 17700802 | 307 | 0.000084 | com.wired
155 | 17699308 | 678 | 0.000045 | uk.org.tate
156 | 17696186 | 323 | 0.000080 | com.aol
157 | 17695480 | 156 | 0.000192 | com.twimg
158 | 17692140 | 278 | 0.000090 | com.usatoday
159 | 17691756 | 381 | 0.000069 | uk.co.dailymail
160 | 17689780 | 537 | 0.000051 | com.cc
161 | 17689300 | 274 | 0.000092 | io.codepen
162 | 17688614 | 360 | 0.000071 | com.cdbaby
163 | 17688360 | 592 | 0.000047 | org.virtualbox
164 | 17686966 | 514 | 0.000053 | edu.berkeley
165 | 17684366 | 599 | 0.000047 | com.googleblog
166 | 17676682 | 694 | 0.000044 | com.discogs
167 | 17673938 | 327 | 0.000079 | com.time
168 | 17669642 | 110 | 0.000259 | com.shopify
169 | 17668396 | 291 | 0.000087 | com.images-amazon
170 | 17668298 | 183 | 0.000155 | gov.nih
171 | 17666218 | 136 | 0.000237 | com.mailchimp
172 | 17656016 | 386 | 0.000069 | com.bbc
173 | 17654450 | 674 | 0.000045 | com.quora
174 | 17649714 | 434 | 0.000062 | com.marketingland
175 | 17642560 | 297 | 0.000086 | com.ibm
176 | 17638146 | 422 | 0.000064 | gov.nasa
177 | 17637004 | 318 | 0.000081 | com.photobucket
178 | 17636812 | 181 | 0.000159 | com.eventbrite
179 | 17636508 | 542 | 0.000051 | com.theverge
180 | 17635956 | 500 | 0.000054 | edu.cornell
181 | 17635416 | 418 | 0.000064 | com.git-scm
182 | 17634154 | 417 | 0.000065 | com.livejournal
183 | 17633590 | 328 | 0.000079 | com.msn
184 | 17625340 | 516 | 0.000052 | com.strikingly
185 | 17624154 | 255 | 0.000102 | com.mapquest
186 | 17620876 | 727 | 0.000042 | id.co.blogspot
187 | 17618460 | 574 | 0.000049 | com.yellowpages
188 | 17617522 | 551 | 0.000050 | org.rubyonrails
189 | 17612342 | 472 | 0.000057 | com.xrea
190 | 17610712 | 383 | 0.000069 | au.gov.nsw
191 | 17609080 | 363 | 0.000071 | com.latimes
192 | 17608010 | 475 | 0.000056 | org.kernel
193 | 17603658 | 376 | 0.000069 | com.gmail
194 | 17597398 | 99 | 0.000283 | edu.utexas
195 | 17595922 | 296 | 0.000086 | net.windows
196 | 17595102 | 465 | 0.000058 | me.paypal
197 | 17590876 | 243 | 0.000105 | com.stackoverflow
198 | 17584474 | 481 | 0.000055 | com.buzzfeed
199 | 17578072 | 334 | 0.000077 | com.meetup
200 | 17573542 | 212 | 0.000124 | com.ebay
201 | 17568048 | 303 | 0.000085 | com.staticflickr
202 | 17568024 | 358 | 0.000072 | com.npmjs
203 | 17567892 | 718 | 0.000042 | io.itch
204 | 17567116 | 370 | 0.000070 | com.ted
205 | 17560926 | 557 | 0.000050 | org.aarp
206 | 17552274 | 240 | 0.000108 | uk.co.amazon
207 | 17551168 | 59 | 0.000473 | com.vk
208 | 17548922 | 206 | 0.000130 | com.opera
209 | 17530956 | 535 | 0.000051 | net.codecanyon
210 | 17529384 | 176 | 0.000161 | com.feedburner
211 | 17527604 | 522 | 0.000052 | com.cbsnews
212 | 17519948 | 666 | 0.000046 | com.scribd
213 | 17519336 | 550 | 0.000050 | com.neilpatel
214 | 17519120 | 326 | 0.000079 | com.googlesyndication
215 | 17518734 | 397 | 0.000067 | com.springer
216 | 17515958 | 248 | 0.000103 | net.php
217 | 17515196 | 410 | 0.000066 | uk.co.blogspot
218 | 17513346 | 570 | 0.000049 | com.zdnet
219 | 17513208 | 437 | 0.000061 | com.angieslist
220 | 17513158 | 325 | 0.000079 | com.nypost
221 | 17511876 | 529 | 0.000052 | com.venturebeat
222 | 17504546 | 471 | 0.000057 | com.theatlantic
223 | 17504004 | 467 | 0.000058 | com.fortune
224 | 17502878 | 100 | 0.000282 | de.google
225 | 17502814 | 97 | 0.000292 | com.livestream
226 | 17499932 | 64 | 0.000427 | com.qq
227 | 17496208 | 263 | 0.000097 | com.surveymonkey
228 | 17494348 | 671 | 0.000046 | com.vice
229 | 17492276 | 161 | 0.000185 | uk.co.google
230 | 17488066 | 225 | 0.000117 | com.getclicky
231 | 17486500 | 577 | 0.000049 | net.daringfireball
232 | 17485306 | 586 | 0.000048 | org.maven
233 | 17481380 | 272 | 0.000093 | io.atom
234 | 17481032 | 1012 | 0.000034 | org.arxiv
235 | 17478092 | 242 | 0.000107 | com.digg
236 | 17473982 | 425 | 0.000063 | gov.whitehouse
237 | 17473120 | 525 | 0.000052 | com.box
238 | 17472022 | 699 | 0.000043 | com.newyorker
239 | 17467094 | 394 | 0.000068 | com.wiley
240 | 17465666 | 109 | 0.000264 | cc.co
241 | 17465268 | 739 | 0.000041 | com.arstechnica
242 | 17462818 | 373 | 0.000069 | org.mediawiki
243 | 17461856 | 442 | 0.000060 | com.kickstarter
244 | 17461472 | 789 | 0.000039 | com.jsbin
245 | 17457848 | 564 | 0.000050 | com.ft
246 | 17454848 | 94 | 0.000305 | me.fb
247 | 17454110 | 541 | 0.000051 | com.citysearch
248 | 17452346 | 267 | 0.000094 | com.hp
249 | 17452146 | 289 | 0.000087 | gov.ftc
250 | 17451720 | 314 | 0.000083 | gov.cdc
251 | 17450692 | 198 | 0.000137 | com.zendesk
252 | 17450430 | 716 | 0.000042 | com.unsplash
253 | 17450132 | 259 | 0.000101 | com.disqus
254 | 17448446 | 150 | 0.000204 | net.behance
255 | 17445628 | 162 | 0.000184 | com.salesforce
256 | 17439380 | 387 | 0.000069 | com.prnewswire
257 | 17437300 | 576 | 0.000049 | org.pbs
258 | 17436496 | 440 | 0.000060 | com.entrepreneur
259 | 17431734 | 837 | 0.000036 | edu.yale
260 | 17429230 | 491 | 0.000054 | com.inc
261 | 17429186 | 573 | 0.000049 | com.wikihow
262 | 17426100 | 101 | 0.000280 | com.baidu
263 | 17423628 | 681 | 0.000045 | com.dropboxusercontent
264 | 17423234 | 726 | 0.000042 | com.nationalgeographic
265 | 17422796 | 756 | 0.000040 | com.foxnews
266 | 17421910 | 562 | 0.000050 | com.wikia
267 | 17419728 | 195 | 0.000140 | jp.co.yahoo
268 | 17419544 | 794 | 0.000038 | com.naturalnews
269 | 17418472 | 566 | 0.000050 | com.deviantart
270 | 17414396 | 276 | 0.000091 | com.webs
271 | 17413872 | 348 | 0.000075 | fr.free
272 | 17412210 | 317 | 0.000081 | org.acm
273 | 17402816 | 396 | 0.000067 | ly.ow
274 | 17394884 | 1453 | 0.000026 | edu.purdue
275 | 17393680 | 345 | 0.000075 | gov.ca
276 | 17390674 | 218 | 0.000120 | com.stumbleupon
277 | 17389254 | 918 | 0.000035 | edu.psu
278 | 17385144 | 423 | 0.000063 | org.un
279 | 17384400 | 501 | 0.000054 | com.cisco
280 | 17383434 | 1349 | 0.000029 | edu.ucsd
281 | 17381706 | 474 | 0.000056 | com.giphy
282 | 17380516 | 736 | 0.000041 | com.economist
283 | 17379374 | 251 | 0.000103 | com.wufoo
284 | 17374174 | 1265 | 0.000032 | com.gizmodo
285 | 17373752 | 369 | 0.000070 | com.dailymotion
286 | 17372864 | 1291 | 0.000031 | gov.fbi
287 | 17372780 | 556 | 0.000050 | com.office
288 | 17372482 | 753 | 0.000041 | org.aclweb
289 | 17372370 | 173 | 0.000165 | com.constantcontact
290 | 17371880 | 453 | 0.000059 | com.businesswire
291 | 17370058 | 415 | 0.000065 | com.skype
292 | 17369740 | 738 | 0.000041 | org.amnesty
293 | 17366926 | 684 | 0.000045 | com.nature
294 | 17366574 | 932 | 0.000034 | edu.columbia
295 | 17365858 | 518 | 0.000052 | org.postgresql
296 | 17364668 | 683 | 0.000045 | org.tigris
297 | 17361368 | 1547 | 0.000023 | com.hotmail
298 | 17356586 | 746 | 0.000041 | com.storify
299 | 17352004 | 1587 | 0.000022 | com.vanityfair
300 | 17351342 | 321 | 0.000081 | it.placehold
301 | 17348558 | 740 | 0.000041 | com.yarnpkg
302 | 17341886 | 357 | 0.000072 | com.oreilly
303 | 17338208 | 494 | 0.000054 | com.snapchat
304 | 17337314 | 1032 | 0.000033 | edu.upenn
305 | 17335204 | 807 | 0.000037 | com.googledrive
306 | 17334830 | 1277 | 0.000032 | com.qz
307 | 17333774 | 1079 | 0.000033 | com.evernote
308 | 17331758 | 322 | 0.000080 | com.tripod
309 | 17330634 | 704 | 0.000043 | com.googlesource
310 | 17329636 | 426 | 0.000063 | int.who
311 | 17326366 | 710 | 0.000043 | com.intel
312 | 17325786 | 734 | 0.000041 | com.sublimetext
313 | 17324870 | 1135 | 0.000033 | com.shutterstock
314 | 17322072 | 182 | 0.000158 | com.weibo
315 | 17318946 | 944 | 0.000034 | org.ieee
316 | 17318594 | 991 | 0.000034 | gov.uspto
317 | 17316354 | 1826 | 0.000018 | com.fifa
318 | 17315094 | 790 | 0.000039 | ly.snip
319 | 17314668 | 693 | 0.000044 | com.bandsintown
320 | 17314008 | 230 | 0.000111 | com.bandcamp
321 | 17313572 | 892 | 0.000035 | com.statista
322 | 17309656 | 572 | 0.000049 | com.goodreads
323 | 17309586 | 911 | 0.000035 | com.bizcommunity
324 | 17309092 | 1485 | 0.000025 | uk.co.theregister
325 | 17308842 | 749 | 0.000041 | com.engadget
326 | 17306184 | 1072 | 0.000033 | org.eff
327 | 17303336 | 1406 | 0.000027 | com.trello
328 | 17302060 | 194 | 0.000141 | de.amazon
329 | 17301380 | 444 | 0.000060 | org.hbr
330 | 17301080 | 905 | 0.000035 | com.psychologytoday
331 | 17300476 | 1470 | 0.000025 | com.elpais
332 | 17299750 | 1576 | 0.000023 | com.upwork
333 | 17295340 | 725 | 0.000042 | com.samsung
334 | 17291766 | 719 | 0.000042 | org.gnupg
335 | 17291566 | 1532 | 0.000024 | edu.northwestern
336 | 17291454 | 324 | 0.000080 | com.smugmug
337 | 17290484 | 846 | 0.000036 | com.stackexchange
338 | 17288864 | 435 | 0.000061 | com.force
339 | 17282768 | 1602 | 0.000022 | org.khanacademy
340 | 17281430 | 1284 | 0.000031 | com.hootsuite
341 | 17278686 | 1396 | 0.000028 | com.ning
342 | 17278552 | 1001 | 0.000034 | com.thenextweb
343 | 17278098 | 955 | 0.000034 | com.weather
344 | 17277624 | 906 | 0.000035 | org.webmproject
345 | 17277242 | 887 | 0.000035 | com.timeanddate
346 | 17276782 | 1500 | 0.000025 | com.pexels
347 | 17276050 | 764 | 0.000040 | com.manta
348 | 17275578 | 797 | 0.000038 | com.mysanantonio
349 | 17273800 | 1384 | 0.000028 | co.vine
350 | 17272326 | 784 | 0.000039 | uk.co.guardian
351 | 17269704 | 1660 | 0.000021 | com.speakerdeck
352 | 17268970 | 774 | 0.000039 | com.uk
353 | 17265152 | 1259 | 0.000032 | org.iso
354 | 17263298 | 1560 | 0.000023 | com.billboard
355 | 17260422 | 874 | 0.000035 | com.marketwatch
356 | 17258760 | 765 | 0.000040 | com.sciencedaily
357 | 17258600 | 1442 | 0.000026 | com.thinkwithgoogle
358 | 17258120 | 769 | 0.000040 | com.digitaljournal
359 | 17255602 | 1379 | 0.000028 | in.blogspot
360 | 17255046 | 93 | 0.000313 | org.networkadvertising
361 | 17254784 | 1519 | 0.000024 | com.pcworld
362 | 17253308 | 1590 | 0.000022 | com.posterous
363 | 17253158 | 1432 | 0.000026 | com.pcmag
364 | 17251690 | 1281 | 0.000032 | com.mckinsey
365 | 17249758 | 744 | 0.000041 | com.blackberry
366 | 17248626 | 904 | 0.000035 | org.unesco
367 | 17247916 | 692 | 0.000044 | gov.noaa
368 | 17247316 | 1351 | 0.000029 | com.airbnb
369 | 17240632 | 1356 | 0.000029 | com.istockphoto
370 | 17240122 | 1350 | 0.000029 | org.altervista
371 | 17239506 | 553 | 0.000050 | com.githubusercontent
372 | 17238936 | 264 | 0.000096 | to.amzn
373 | 17238552 | 1719 | 0.000020 | org.owasp
374 | 17237848 | 108 | 0.000266 | com.bleacherreport
375 | 17237484 | 244 | 0.000105 | org.joomla
376 | 17236790 | 1275 | 0.000032 | com.netflix
377 | 17235170 | 917 | 0.000035 | edu.utep
378 | 17234928 | 140 | 0.000218 | com.googleadservices
379 | 17234584 | 793 | 0.000038 | com.merchantcircle
380 | 17231562 | 158 | 0.000190 | org.bbb
381 | 17227502 | 1782 | 0.000018 | uk.co.metro
382 | 17227096 | 504 | 0.000053 | com.bizjournals
383 | 17225090 | 1274 | 0.000032 | ca.blogspot
384 | 17224548 | 1556 | 0.000023 | com.merriam-webster
385 | 17224328 | 333 | 0.000077 | com.rawgit
386 | 17222966 | 1663 | 0.000021 | edu.usc
387 | 17222606 | 1337 | 0.000029 | uk.ac.ox
388 | 17222120 | 717 | 0.000042 | gov.sec
389 | 17221794 | 1436 | 0.000026 | de.spiegel
390 | 17221688 | 791 | 0.000039 | com.vagrantup
391 | 17219850 | 803 | 0.000038 | com.sfgate
392 | 17215736 | 1273 | 0.000032 | us.imageshack
393 | 17214432 | 1083 | 0.000033 | com.lifehacker
394 | 17213262 | 366 | 0.000070 | com.sxsw
395 | 17212900 | 1850 | 0.000017 | net.boingboing
396 | 17211916 | 985 | 0.000034 | com.americanexpress
397 | 17211422 | 696 | 0.000044 | com.moz
398 | 17210936 | 492 | 0.000054 | net.openid
399 | 17209724 | 1266 | 0.000032 | com.indiegogo
400 | 17209430 | 288 | 0.000088 | com.windowsphone
401 | 17209366 | 313 | 0.000083 | ca.google
402 | 17208872 | 105 | 0.000271 | com.people
403 | 17208682 | 1628 | 0.000021 | edu.jhu
404 | 17206274 | 219 | 0.000120 | org.drupal
405 | 17202352 | 1842 | 0.000018 | com.nfl
406 | 17199606 | 1360 | 0.000028 | gov.usgs
407 | 17199448 | 295 | 0.000086 | com.fc2
408 | 17198942 | 1914 | 0.000017 | ca.uwaterloo
409 | 17197332 | 131 | 0.000252 | jp.co.google
410 | 17196900 | 1789 | 0.000018 | com.socialmediaexaminer
411 | 17194310 | 1608 | 0.000022 | com.mcafee
412 | 17193946 | 1965 | 0.000016 | com.tutsplus
413 | 17192092 | 2256 | 0.000014 | com.twitpic
414 | 17191606 | 416 | 0.000065 | com.booking
415 | 17190504 | 330 | 0.000078 | com.bitly
416 | 17189420 | 532 | 0.000052 | com.w3schools
417 | 17189296 | 1457 | 0.000026 | com.boston
418 | 17189268 | 580 | 0.000048 | com.squareup
419 | 17188306 | 1728 | 0.000019 | com.technologyreview
420 | 17186672 | 1461 | 0.000026 | com.gumroad
421 | 17186610 | 1297 | 0.000031 | com.redhat
422 | 17186498 | 1677 | 0.000020 | com.hulu
423 | 17186032 | 1132 | 0.000033 | gov.nist
424 | 17185478 | 1469 | 0.000025 | com.discovery
425 | 17185444 | 1300 | 0.000031 | fr.blogspot
426 | 17182664 | 806 | 0.000037 | gov.nps
427 | 17181414 | 617 | 0.000047 | uk.co.independent
428 | 17178670 | 1306 | 0.000030 | com.politico
429 | 17176106 | 536 | 0.000051 | com.typeform
430 | 17173968 | 1418 | 0.000027 | com.zoho
431 | 17173742 | 1753 | 0.000019 | com.ehow
432 | 17173098 | 1985 | 0.000016 | com.cbs
433 | 17172420 | 1575 | 0.000023 | com.codeplex
434 | 17172134 | 961 | 0.000034 | ca.cbc
435 | 17172074 | 826 | 0.000037 | com.whitepages
436 | 17171146 | 1431 | 0.000026 | com.alibaba
437 | 17170396 | 223 | 0.000118 | org.icann
438 | 17170314 | 497 | 0.000054 | org.doi
439 | 17170074 | 931 | 0.000034 | net.researchgate
440 | 17169700 | 962 | 0.000034 | au.net.abc
441 | 17169428 | 1658 | 0.000021 | org.gnome
442 | 17168726 | 2928 | 0.000010 | com.hubpages
443 | 17168310 | 185 | 0.000153 | it.google
444 | 17166836 | 343 | 0.000075 | com.nielsen
445 | 17166198 | 197 | 0.000138 | com.histats
446 | 17163834 | 891 | 0.000035 | com.gofundme
447 | 17162768 | 1594 | 0.000022 | com.mtv
448 | 17161940 | 852 | 0.000036 | gov.copyright
449 | 17161736 | 530 | 0.000052 | com.sciencedirect
450 | 17160994 | 600 | 0.000047 | org.doxygen
451 | 17159968 | 469 | 0.000058 | us.icio
452 | 17159358 | 711 | 0.000042 | com.slack
453 | 17159294 | 1634 | 0.000021 | edu.academia
454 | 17158884 | 1355 | 0.000029 | com.pingdom
455 | 17157134 | 828 | 0.000036 | it.binged
456 | 17157124 | 735 | 0.000041 | com.java
457 | 17157112 | 1014 | 0.000034 | edu.alamo
458 | 17156452 | 1611 | 0.000022 | edu.unc
459 | 17155728 | 458 | 0.000059 | com.sitelock
460 | 17155514 | 199 | 0.000137 | com.xing
461 | 17155400 | 438 | 0.000061 | com.adweek
462 | 17155328 | 1538 | 0.000024 | gov.nyc
463 | 17154360 | 728 | 0.000042 | org.vim
464 | 17153026 | 1770 | 0.000019 | edu.cuny
465 | 17151530 | 1623 | 0.000021 | com.nba
466 | 17151282 | 495 | 0.000054 | mp.j
467 | 17151272 | 1062 | 0.000033 | tv.ustream
468 | 17151014 | 936 | 0.000034 | com.groupspaces
469 | 17149484 | 1631 | 0.000021 | com.udemy
470 | 17147710 | 489 | 0.000054 | edu.cmu
471 | 17145750 | 1474 | 0.000025 | com.over-blog
472 | 17143862 | 2110 | 0.000015 | com.mentalfloss
473 | 17143238 | 701 | 0.000043 | de.blogspot
474 | 17143202 | 1397 | 0.000028 | uk.ac.cam
475 | 17142926 | 1922 | 0.000017 | com.fiverr
476 | 17141536 | 748 | 0.000041 | com.webmd
477 | 17140340 | 747 | 0.000041 | com.questionpro
478 | 17139384 | 960 | 0.000034 | gov.fcc
479 | 17136972 | 1332 | 0.000030 | edu.wisc
480 | 17136500 | 1318 | 0.000030 | org.postimg
481 | 17135820 | 922 | 0.000035 | edu.umich
482 | 17133282 | 193 | 0.000141 | com.eepurl
483 | 17132390 | 1321 | 0.000030 | com.deloitte
484 | 17132300 | 103 | 0.000278 | me.m
485 | 17131484 | 1841 | 0.000018 | com.gamespot
486 | 17128758 | 579 | 0.000048 | org.whatbrowser
487 | 17125616 | 795 | 0.000038 | br.com.uol
488 | 17123506 | 741 | 0.000041 | org.jenkins-ci
489 | 17123454 | 825 | 0.000037 | gov.senate
490 | 17120690 | 1544 | 0.000023 | com.hollywoodreporter
491 | 17119678 | 1911 | 0.000017 | uk.ac.ucl
492 | 17118776 | 3297 | 0.000009 | com.blog
493 | 17115232 | 1534 | 0.000024 | net.recode
494 | 17113492 | 873 | 0.000035 | com.att
495 | 17113076 | 1752 | 0.000019 | com.angelfire
496 | 17112818 | 1516 | 0.000024 | com.techrepublic
497 | 17111558 | 231 | 0.000111 | fr.google
498 | 17108292 | 1846 | 0.000017 | com.ikea
499 | 17108114 | 1512 | 0.000024 | com.prezi
500 | 17107216 | 723 | 0.000042 | com.adage
501 | 17106974 | 1368 | 0.000028 | com.gigaom
502 | 17105096 | 1817 | 0.000018 | com.canva
503 | 17104938 | 1553 | 0.000023 | edu.uchicago
504 | 17104528 | 1558 | 0.000023 | com.econsultancy
505 | 17103910 | 1295 | 0.000031 | com.formstack
506 | 17103802 | 457 | 0.000059 | com.bigcartel
507 | 17103324 | 1549 | 0.000023 | com.scientificamerican
508 | 17103298 | 832 | 0.000036 | gov.census
509 | 17102038 | 1052 | 0.000033 | com.searchengineland
510 | 17101930 | 361 | 0.000071 | com.fastcompany
511 | 17100982 | 631 | 0.000046 | com.symantec
512 | 17100972 | 2041 | 0.000016 | ca.ualberta
513 | 17099190 | 971 | 0.000034 | com.pinimg
514 | 17099162 | 724 | 0.000042 | com.geocities
515 | 17098592 | 992 | 0.000034 | com.hotfrog
516 | 17098588 | 21 | 0.001560 | com.wixstatic
517 | 17097902 | 451 | 0.000059 | gov.ed
518 | 17097814 | 1668 | 0.000021 | net.daum
519 | 17095170 | 2040 | 0.000016 | ch.ethz
520 | 17094160 | 1314 | 0.000030 | org.redcross
521 | 17094148 | 392 | 0.000068 | com.naver
522 | 17093848 | 588 | 0.000048 | tv.twitch
523 | 17093696 | 1835 | 0.000018 | com.sky
524 | 17093646 | 513 | 0.000053 | cn.com.sina
525 | 17092720 | 1088 | 0.000033 | com.collegian
526 | 17092508 | 409 | 0.000066 | net.themeforest
527 | 17092068 | 1876 | 0.000017 | tv.periscope
528 | 17089560 | 2348 | 0.000013 | com.flipboard
529 | 17089370 | 1959 | 0.000016 | com.ign
530 | 17088698 | 269 | 0.000094 | com.myshopify
531 | 17084698 | 309 | 0.000083 | com.whatsapp
532 | 17083820 | 1792 | 0.000018 | com.starbucks
533 | 17081802 | 1796 | 0.000018 | com.aliexpress
534 | 17080940 | 1894 | 0.000017 | com.ibtimes
535 | 17080760 | 1521 | 0.000024 | com.target
536 | 17079892 | 1185 | 0.000033 | com.fotolia
537 | 17079836 | 731 | 0.000041 | gov.hhs
538 | 17077972 | 1862 | 0.000017 | edu.msu
539 | 17077562 | 1328 | 0.000030 | com.animoto
540 | 17077096 | 952 | 0.000034 | jp.ac.kobe-u
541 | 17075208 | 929 | 0.000034 | com.ubuntu
542 | 17074750 | 2693 | 0.000011 | com.klout
543 | 17074224 | 1949 | 0.000016 | it.scoop
544 | 17073610 | 379 | 0.000069 | nl.google
545 | 17072992 | 1440 | 0.000026 | com.com
546 | 17071784 | 1806 | 0.000018 | com.mac
547 | 17070640 | 1689 | 0.000020 | edu.umd
548 | 17068586 | 2309 | 0.000013 | org.gimp
549 | 17068116 | 512 | 0.000053 | com.msdn
550 | 17067276 | 1460 | 0.000026 | com.nydailynews
551 | 17066386 | 340 | 0.000076 | edu.nyu
552 | 17065428 | 1961 | 0.000016 | org.aclu
553 | 17065140 | 1599 | 0.000022 | com.intuit
554 | 17063438 | 814 | 0.000037 | com.indiatimes
555 | 17062788 | 1499 | 0.000025 | com.nymag
556 | 17062402 | 924 | 0.000035 | com.lighthouseapp
557 | 17061776 | 878 | 0.000035 | com.insiderpages
558 | 17061346 | 675 | 0.000045 | com.delicious
559 | 17061120 | 673 | 0.000046 | com.cargocollective
560 | 17061032 | 1443 | 0.000026 | fm.last
561 | 17060834 | 1566 | 0.000023 | gov.uscourts
562 | 17060788 | 1261 | 0.000032 | org.worldbank
563 | 17059018 | 1940 | 0.000016 | com.getsatisfaction
564 | 17058980 | 2248 | 0.000014 | edu.gmu
565 | 17057770 | 1282 | 0.000032 | org.change
566 | 17057436 | 1310 | 0.000030 | gov.sba
567 | 17056518 | 885 | 0.000035 | com.cbslocal
568 | 17056116 | 563 | 0.000050 | gov.usda
569 | 17055868 | 2122 | 0.000015 | org.d3js
570 | 17055830 | 1357 | 0.000029 | com.500px
571 | 17055746 | 1517 | 0.000024 | com.businessweek
572 | 17054568 | 441 | 0.000060 | com.clicky
573 | 17053266 | 1656 | 0.000021 | com.vox
574 | 17052848 | 1125 | 0.000033 | net.digitalcongo
575 | 17052276 | 1548 | 0.000023 | de.heise
576 | 17052030 | 555 | 0.000050 | in.co.google
577 | 17051898 | 1889 | 0.000017 | edu.arizona
578 | 17050912 | 92 | 0.000320 | com.atdmt
579 | 17050520 | 2118 | 0.000015 | edu.asu
580 | 17048098 | 1033 | 0.000033 | com.citysquares
581 | 17047646 | 583 | 0.000048 | gov.epa
582 | 17047432 | 945 | 0.000034 | com.feedly
583 | 17046294 | 1456 | 0.000026 | edu.si
584 | 17046024 | 2091 | 0.000015 | uk.co.wired
585 | 17046012 | 1475 | 0.000025 | com.globo
586 | 17045960 | 1669 | 0.000021 | fr.lemonde
587 | 17045812 | 1454 | 0.000026 | edu.umn
588 | 17044250 | 184 | 0.000154 | com.youtube-nocookie
589 | 17043428 | 777 | 0.000039 | com.usnews
590 | 17043342 | 3029 | 0.000010 | gd.is
591 | 17041134 | 1825 | 0.000018 | com.autodesk
592 | 17040548 | 1978 | 0.000016 | com.exacttarget
593 | 17039484 | 1016 | 0.000034 | com.alexa
594 | 17039038 | 976 | 0.000034 | com.wsoctv
595 | 17036946 | 2085 | 0.000015 | com.yolasite
596 | 17036316 | 155 | 0.000193 | com.google-analytics
597 | 17036062 | 1659 | 0.000021 | it.blogspot
598 | 17034640 | 163 | 0.000183 | com.ggpht
599 | 17034000 | 1051 | 0.000033 | org.plos
600 | 17032394 | 757 | 0.000040 | es.com.blogspot
601 | 17032330 | 1552 | 0.000023 | org.cancer
602 | 17031224 | 861 | 0.000036 | com.tiddlywiki
603 | 17029514 | 1833 | 0.000018 | au.com.blogspot
604 | 17028688 | 430 | 0.000062 | gov.irs
605 | 17026900 | 1784 | 0.000018 | edu.virginia
606 | 17026496 | 279 | 0.000089 | com.getbootstrap
607 | 17026218 | 285 | 0.000088 | jp.co.amazon
608 | 17025160 | 157 | 0.000191 | ru.mail
609 | 17024932 | 2147 | 0.000015 | com.bestbuy
610 | 17024890 | 220 | 0.000120 | jp.ameblo
611 | 17024600 | 1362 | 0.000028 | com.ycombinator
612 | 17024552 | 74 | 0.000390 | com.messenger
613 | 17023206 | 1693 | 0.000020 | com.zazzle
614 | 17023136 | 447 | 0.000060 | com.barnesandnoble
615 | 17021784 | 1128 | 0.000033 | com.reverbnation
616 | 17021018 | 1114 | 0.000033 | site.tenerifeforum
617 | 17020856 | 845 | 0.000036 | org.bouncycastle
618 | 17020632 | 998 | 0.000034 | com.chamberofcommerce
619 | 17020566 | 2010 | 0.000016 | com.marketingprofs
620 | 17020456 | 2061 | 0.000015 | com.invisionapp
621 | 17017808 | 1416 | 0.000027 | com.searchenginejournal
622 | 17017782 | 1518 | 0.000024 | org.apa
623 | 17016824 | 714 | 0.000042 | fr.amazon
624 | 17015234 | 1505 | 0.000024 | com.kissmetrics
625 | 17015162 | 401 | 0.000067 | com.nasdaq
626 | 17014764 | 999 | 0.000034 | com.2findlocal
627 | 17014494 | 1036 | 0.000033 | net.brownbook
628 | 17013554 | 1860 | 0.000017 | com.msnbc
629 | 17013428 | 1383 | 0.000028 | com.bufferapp
630 | 17013012 | 187 | 0.000151 | com.amazon-adsystem
631 | 17012132 | 2774 | 0.000011 | com.popsci
632 | 17011438 | 1601 | 0.000022 | kr.flic
633 | 17011354 | 208 | 0.000128 | jp.ne.hatena
634 | 17011188 | 702 | 0.000043 | com.herokuapp
635 | 17010902 | 429 | 0.000062 | com.custhelp
636 | 17010798 | 2412 | 0.000013 | com.starwars
637 | 17010634 | 1773 | 0.000019 | com.knowyourmeme
638 | 17010568 | 2634 | 0.000011 | org.kiva
639 | 17010310 | 1435 | 0.000026 | com.cafepress
640 | 17009556 | 1482 | 0.000025 | com.searchenginewatch
641 | 17008884 | 1620 | 0.000021 | com.newsweek
642 | 17008746 | 1533 | 0.000024 | com.uber
643 | 17008694 | 452 | 0.000059 | com.stripe
644 | 17008600 | 2206 | 0.000014 | com.freep
645 | 17007290 | 1732 | 0.000019 | com.salon
646 | 17006696 | 839 | 0.000036 | com.newsbank
647 | 17006680 | 2214 | 0.000014 | com.wikispaces
648 | 17006380 | 1685 | 0.000020 | com.splashthat
649 | 17005464 | 811 | 0.000037 | gov.state
650 | 17003144 | 1799 | 0.000018 | google.blog
651 | 17002000 | 1268 | 0.000032 | com.patreon
652 | 17001412 | 3347 | 0.000009 | com.space
653 | 17001010 | 1836 | 0.000018 | edu.rutgers
654 | 17000930 | 2161 | 0.000015 | net.comcast
655 | 17000912 | 1657 | 0.000021 | com.today
656 | 16998832 | 1559 | 0.000023 | org.coursera
657 | 16997998 | 844 | 0.000036 | com.tandfonline
658 | 16997358 | 1734 | 0.000019 | org.filezilla-project
659 | 16997118 | 1574 | 0.000023 | int.coe
660 | 16996872 | 767 | 0.000040 | com.photoshelter
661 | 16996786 | 928 | 0.000035 | jp.co.fujixerox
662 | 16996146 | 1027 | 0.000034 | com.uservoice
663 | 16995782 | 2084 | 0.000015 | uk.ac.ed
664 | 16995128 | 1327 | 0.000030 | com.walmart
665 | 16994982 | 1823 | 0.000018 | com.semrush
666 | 16994576 | 3929 | 0.000007 | com.wolframalpha
667 | 16993686 | 1808 | 0.000018 | com.blogtalkradio
668 | 16993648 | 1680 | 0.000020 | au.com.smh
669 | 16992638 | 978 | 0.000034 | com.independent
670 | 16992102 | 1067 | 0.000033 | com.lacartes
671 | 16990396 | 2153 | 0.000015 | edu.bu
672 | 16990096 | 703 | 0.000043 | com.emarketer
673 | 16990058 | 987 | 0.000034 | uk.co.eventbrite
674 | 16990044 | 1380 | 0.000028 | org.iana
675 | 16989692 | 672 | 0.000046 | com.houzz
676 | 16989078 | 893 | 0.000035 | com.chambermaster
677 | 16989032 | 2303 | 0.000014 | com.xerox
678 | 16988976 | 2226 | 0.000014 | com.instructables
679 | 16988406 | 1721 | 0.000020 | uk.co.mirror
680 | 16987668 | 2163 | 0.000015 | nl.blogspot
681 | 16987224 | 2093 | 0.000015 | nl.xs4all
682 | 16986930 | 1404 | 0.000027 | com.investopedia
683 | 16986668 | 2162 | 0.000015 | de.bild
684 | 16986504 | 709 | 0.000043 | org.sonatype
685 | 16985824 | 2105 | 0.000015 | com.newscientist
686 | 16985532 | 981 | 0.000034 | com.strawberryperl
687 | 16985100 | 1330 | 0.000030 | gov.dot
688 | 16983528 | 1564 | 0.000023 | com.mediafire
689 | 16982966 | 1621 | 0.000021 | org.weforum
690 | 16982964 | 1767 | 0.000019 | com.thedailybeast
691 | 16982646 | 186 | 0.000152 | com.sharethis
692 | 16981478 | 3018 | 0.000010 | com.tvguide
693 | 16981346 | 2192 | 0.000014 | com.foxsports
694 | 16980324 | 1845 | 0.000018 | ru.narod
695 | 16978844 | 2621 | 0.000012 | edu.caltech
696 | 16978428 | 1731 | 0.000019 | org.cambridge
697 | 16978344 | 1502 | 0.000025 | com.mixcloud
698 | 16977896 | 1536 | 0.000024 | com.smashingmagazine
699 | 16977506 | 2819 | 0.000011 | org.greenpeace
700 | 16976818 | 1652 | 0.000021 | com.timeout
701 | 16976528 | 2282 | 0.000014 | com.homedepot
702 | 16975962 | 1481 | 0.000025 | com.ssrn
703 | 16975782 | 1895 | 0.000017 | uk.ac.lse
704 | 16975462 | 2923 | 0.000010 | org.icrc
705 | 16973580 | 2792 | 0.000011 | pt.sapo
706 | 16973088 | 1718 | 0.000020 | br.com.blogspot
707 | 16972998 | 2323 | 0.000013 | com.googlepages
708 | 16972748 | 1540 | 0.000024 | com.playstation
709 | 16972196 | 1428 | 0.000026 | com.elsevier
710 | 16972182 | 1733 | 0.000019 | mil.navy
711 | 16970806 | 2133 | 0.000015 | com.britannica
712 | 16970328 | 1025 | 0.000034 | org.gwtproject
713 | 16969454 | 2160 | 0.000015 | com.gawker
714 | 16968516 | 2157 | 0.000015 | au.com.news
715 | 16968430 | 1392 | 0.000028 | com.dell
716 | 16968084 | 2914 | 0.000010 | com.fivethirtyeight
717 | 16967948 | 1316 | 0.000030 | net.java
718 | 16967542 | 1936 | 0.000017 | com.vogue
719 | 16966846 | 836 | 0.000036 | com.gartner
720 | 16965886 | 1805 | 0.000018 | com.deezer
721 | 16965706 | 362 | 0.000071 | com.heroku
722 | 16965626 | 1031 | 0.000033 | org.twinery
723 | 16964740 | 1288 | 0.000031 | net.azurewebsites
724 | 16964378 | 378 | 0.000069 | com.ea
725 | 16964240 | 2141 | 0.000015 | com.seattletimes
726 | 16963600 | 2793 | 0.000011 | org.angularjs
727 | 16960694 | 921 | 0.000035 | com.prweb
728 | 16960014 | 1858 | 0.000017 | edu.ucdavis
729 | 16959918 | 2026 | 0.000016 | edu.uci
730 | 16958606 | 1391 | 0.000028 | com.bostonglobe
731 | 16958432 | 1395 | 0.000028 | com.hostgator
732 | 16958290 | 1581 | 0.000023 | com.nokia
733 | 16957718 | 1950 | 0.000016 | edu.ucsf
734 | 16957126 | 1625 | 0.000021 | com.bmj
735 | 16956474 | 2230 | 0.000014 | com.nbc
736 | 16956220 | 2138 | 0.000015 | ca.ubc
737 | 16955840 | 800 | 0.000038 | st.assi
738 | 16955820 | 2090 | 0.000015 | uk.co.thesun
739 | 16954528 | 2541 | 0.000012 | edu.brown
740 | 16954474 | 1788 | 0.000018 | com.zillow
741 | 16954474 | 1049 | 0.000033 | org.swi-prolog
742 | 16953520 | 1701 | 0.000020 | edu.duke
743 | 16953518 | 1783 | 0.000018 | com.csmonitor
744 | 16953450 | 2890 | 0.000010 | com.voanews
745 | 16952050 | 1038 | 0.000033 | com.zwire
746 | 16952034 | 1702 | 0.000020 | com.yandex
747 | 16951812 | 1635 | 0.000021 | com.rollingstone
748 | 16951272 | 2178 | 0.000014 | com.webnode
749 | 16951212 | 3119 | 0.000009 | edu.uic
750 | 16949574 | 1683 | 0.000020 | com.examiner
751 | 16948794 | 499 | 0.000054 | com.marriott
752 | 16947258 | 1433 | 0.000026 | com.digiday
753 | 16946826 | 3247 | 0.000009 | com.lmgtfy
754 | 16946300 | 2915 | 0.000010 | com.campaignmonitor
755 | 16945992 | 1617 | 0.000022 | com.thomsonreuters
756 | 16945804 | 1567 | 0.000023 | com.vmware
757 | 16945728 | 1018 | 0.000034 | com.themonitor
758 | 16945334 | 1942 | 0.000016 | edu.indiana
759 | 16944864 | 1269 | 0.000032 | net.yahoo
760 | 16944796 | 1296 | 0.000031 | org.openstreetmap
761 | 16944696 | 1043 | 0.000033 | com.sagepub
762 | 16943548 | 1462 | 0.000026 | jp.blogspot
763 | 16943408 | 294 | 0.000087 | com.wpengine
764 | 16942720 | 2381 | 0.000013 | com.rt
765 | 16942360 | 1802 | 0.000018 | com.pastebin
766 | 16942116 | 2846 | 0.000011 | int.esa
767 | 16939958 | 2126 | 0.000015 | net.oauth
768 | 16939804 | 1648 | 0.000021 | com.css-tricks
769 | 16939376 | 2154 | 0.000015 | org.acs
770 | 16939202 | 2064 | 0.000015 | com.aljazeera
771 | 16938812 | 817 | 0.000037 | io.getmdl
772 | 16938164 | 2193 | 0.000014 | com.redbubble
773 | 16935928 | 919 | 0.000035 | com.sacurrent
774 | 16935754 | 377 | 0.000069 | com.monster
775 | 16935418 | 2448 | 0.000013 | edu.unl
776 | 16935302 | 1458 | 0.000026 | gov.congress
777 | 16935298 | 1780 | 0.000018 | org.ap
778 | 16935136 | 1787 | 0.000018 | org.golang
779 | 16935112 | 3194 | 0.000009 | ru.blogspot
780 | 16934608 | 1727 | 0.000019 | net.battle
781 | 16933748 | 1280 | 0.000032 | com.marketwired
782 | 16932996 | 1624 | 0.000021 | com.hyatt
783 | 16932774 | 1834 | 0.000018 | int.wipo
784 | 16932588 | 169 | 0.000173 | info.aboutads
785 | 16931964 | 1055 | 0.000033 | com.hatenadiary
786 | 16931446 | 2027 | 0.000016 | com.irishtimes
787 | 16930742 | 1445 | 0.000026 | com.mlb
788 | 16930558 | 1026 | 0.000034 | com.showmelocal
789 | 16929958 | 2069 | 0.000015 | com.getfirebug
790 | 16929732 | 2220 | 0.000014 | com.crunchbase
791 | 16929508 | 2592 | 0.000012 | ms.1drv
792 | 16929354 | 1716 | 0.000020 | ru.spb
793 | 16928860 | 829 | 0.000036 | com.engineyard
794 | 16927652 | 1388 | 0.000028 | com.justgiving
795 | 16926126 | 2144 | 0.000015 | org.hrw
796 | 16925806 | 2862 | 0.000010 | com.threadless
797 | 16925672 | 862 | 0.000036 | com.qualaroo
798 | 16925146 | 2788 | 0.000011 | com.lynda
799 | 16924758 | 907 | 0.000035 | com.proofpoint
800 | 16924708 | 1600 | 0.000022 | com.xbox
801 | 16924552 | 1508 | 0.000024 | com.steamcommunity
802 | 16924384 | 2334 | 0.000013 | com.theonion
803 | 16924160 | 2600 | 0.000012 | edu.hawaii
804 | 16923374 | 815 | 0.000037 | gov.justice
805 | 16923170 | 567 | 0.000050 | com.bigcommerce
806 | 16922882 | 1425 | 0.000026 | com.docker
807 | 16920896 | 1986 | 0.000016 | com.si
808 | 16920648 | 2017 | 0.000016 | com.allthingsd
809 | 16920338 | 2527 | 0.000012 | com.madmimi
810 | 16919718 | 1289 | 0.000031 | com.atlassian
811 | 16919684 | 1035 | 0.000033 | com.bitballoon
812 | 16919080 | 1541 | 0.000024 | com.theglobeandmail
813 | 16918114 | 2551 | 0.000012 | org.phys
814 | 16917590 | 1918 | 0.000017 | edu.umass
815 | 16917306 | 982 | 0.000034 | com.judysbook
816 | 16916882 | 1871 | 0.000017 | com.buffer
817 | 16915704 | 367 | 0.000070 | jp.co.rakuten
818 | 16915586 | 1059 | 0.000033 | com.spoke
819 | 16915360 | 3179 | 0.000009 | uk.bl
820 | 16915232 | 82 | 0.000360 | com.parallels
821 | 16915094 | 2467 | 0.000013 | com.fox
822 | 16914570 | 347 | 0.000075 | com.youku
823 | 16913934 | 2185 | 0.000014 | com.topsy
824 | 16913414 | 1674 | 0.000021 | com.fedex
825 | 16912930 | 439 | 0.000061 | com.taobao
826 | 16911624 | 3362 | 0.000009 | com.friendfeed
827 | 16911362 | 1057 | 0.000033 | com.ibegin
828 | 16908382 | 2684 | 0.000011 | org.semanticscholar
829 | 16908112 | 1688 | 0.000020 | com.ecwid
830 | 16905344 | 280 | 0.000089 | com.automattic
831 | 16905164 | 365 | 0.000070 | me.t
832 | 16904014 | 446 | 0.000060 | com.cracked
833 | 16902880 | 752 | 0.000041 | ru.google
834 | 16902568 | 2464 | 0.000013 | com.groupon
835 | 16902230 | 2137 | 0.000015 | org.videolan
836 | 16901150 | 1636 | 0.000021 | org.unicode
837 | 16900016 | 1653 | 0.000021 | com.gettyimages
838 | 16899790 | 934 | 0.000034 | com.gfmag
839 | 16899376 | 2156 | 0.000015 | com.lonelyplanet
840 | 16898958 | 2455 | 0.000013 | edu.hbs
841 | 16898564 | 1649 | 0.000021 | com.unity3d
842 | 16898400 | 1060 | 0.000033 | dk.brics
843 | 16898330 | 1087 | 0.000033 | com.salespider
844 | 16897458 | 2341 | 0.000013 | hk.com.google
845 | 16897240 | 1323 | 0.000030 | org.oecd
846 | 16896204 | 2102 | 0.000015 | com.livestrong
847 | 16895900 | 1786 | 0.000018 | com.freewebs
848 | 16895046 | 283 | 0.000088 | com.garmin
849 | 16894806 | 1546 | 0.000023 | gov.wa
850 | 16894290 | 2116 | 0.000015 | com.netvibes
851 | 16892110 | 1708 | 0.000020 | com.lulu
852 | 16890928 | 3579 | 0.000008 | com.aviary
853 | 16890402 | 1346 | 0.000029 | es.amazon
854 | 16890092 | 2656 | 0.000011 | org.wikiquote
855 | 16889952 | 2011 | 0.000016 | com.getresponse
856 | 16889400 | 2023 | 0.000016 | com.espn
857 | 16889116 | 1479 | 0.000025 | com.xkcd
858 | 16888708 | 3547 | 0.000008 | org.edx
859 | 16888596 | 1437 | 0.000026 | gov.va
860 | 16888566 | 860 | 0.000036 | com.infusionsoft
861 | 16888418 | 2074 | 0.000015 | com.history
862 | 16888374 | 2263 | 0.000014 | me.flavors
863 | 16887748 | 2806 | 0.000011 | org.moma
864 | 16887652 | 7233 | 0.000004 | com.weheartit
865 | 16887596 | 1987 | 0.000016 | com.popsugar
866 | 16885864 | 1592 | 0.000022 | org.mayoclinic
867 | 16885784 | 964 | 0.000034 | com.chicagotribune
868 | 16885782 | 179 | 0.000160 | com.fontawesome
869 | 16885584 | 1530 | 0.000024 | com.oup
870 | 16885026 | 2044 | 0.000016 | com.w3techs
871 | 16884320 | 1449 | 0.000026 | gov.bls
872 | 16882438 | 3534 | 0.000008 | com.squidoo
873 | 16881512 | 3331 | 0.000009 | edu.rochester
874 | 16880454 | 2432 | 0.000013 | gov.cia
875 | 16880366 | 2101 | 0.000015 | com.mercurynews
876 | 16880258 | 1666 | 0.000021 | se.haxx
877 | 16880238 | 1938 | 0.000017 | org.jstor
8781688011216700.000021com.adjust
8791688002223600.000013com.hindustantimes
8801687989410420.000033com.enterprisenetworkingplanet
8811687789438850.000008com.4shared
8821687658217450.000019com.ssllabs
8831687617217580.000019com.freepik
8841687600817570.000019de.welt
8851687539426730.000011com.panasonic
8861687529414650.000025com.forrester
8871687498022650.000014com.readwriteweb
8881687445029680.000010com.softonic
8891687380630720.000010org.psychologicalscience
8901687297238090.000008org.libreoffice
8911687121024020.000013net.faz
8921687047420080.000016com.urbandictionary
893168704489680.000034com.marketersmedia
8941687020412920.000031com.brightcove
8951687019814390.000026com.techtarget
8961686967810110.000034com.shareasale
897168695828190.000037gov.house
898168688581110.000259com.namecheap
8991686859634450.000008edu.rit
9001686849613290.000030gov.archives
9011686833211310.000033place.yellow
902168681264320.000062pl.google
9031686789624830.000012com.shutterfly
9041686782216030.000022mil.army
9051686779032440.000009com.avast
9061686694023760.000013com.iconfinder
9071686692856560.000005ro.blogspot
9081686654416260.000021com.warnerbros
9091686651610760.000033org.rethinkingschools
9101686625230770.000010cc.tiny
9111686514625940.000012com.html5rocks
9121686497832430.000009com.askmen
9131686469819520.000016com.nvidia
9141686335222730.000014com.cbssports
9151686262622750.000014edu.osu
9161686171632690.000009edu.missouri
9171686141428570.000010edu.oregonstate
9181686049619790.000016com.colourlovers
9191686046010730.000033com.elocal
9201686033222090.000014com.ask
921168602107870.000039org.bitbucket
9221686001620420.000016com.me
9231685993025560.000012com.technet
9241685951428910.000010com.ezinearticles
9251685897229830.000010com.scmp
926168588044730.000057br.com.google
9271685830031290.000009com.fitbit
9281685788421120.000015com.comcast
929168578844560.000059com.udacity
9301685764227090.000011edu.pitt
9311685707017560.000019com.findlaw
9321685540220630.000015org.openoffice
9331685527244820.000007net.minecraft
9341685507420120.000016org.kde
935168548764960.000054jp.ne.sakura
9361685473626550.000011edu.tufts
9371685437238230.000008com.techsmith
9381685405418130.000018com.screencast
9391685373422740.000014net.earthlink
9401685373217600.000019uk.co.thetimes
941168533529160.000035com.campaign-archive1
9421685326230310.000010edu.ucsc
9431685284215860.000022com.outlook
9441685264613440.000029com.usps
9451685248417040.000020gov.uscis
9461685224026390.000011com.virgin
9471685192026850.000011ly.cl
9481685190427630.000011com.asus
9491685136021640.000014fr.lefigaro
9501685085820030.000016org.poynter
9511685044639620.000007edu.byu
9521685029828220.000011com.rottentomatoes
9531685012623720.000013uk.co.express
9541684940619290.000017org.craigslist
9551684921818160.000018com.smartinsights
956168490682260.000117me.line
9571684895815770.000023com.yoast
9581684791221950.000014com.podbean
9591684712227340.000011com.tesla
9601684677231770.000009com.9to5mac
9611684562236790.000008org.wikibooks
9621684557615310.000024com.business2community
9631684455219340.000017tech.ces
9641684403427850.000011ca.huffingtonpost
9651684378825780.000012com.sophos
9661684353210630.000033com.fixr
9671684350229720.000010edu.dartmouth
9681684338037330.000008org.laptop
9691684230422760.000014com.denverpost
9701684220427730.000011com.business
9711684217414410.000026jp.geocities
9721684203411410.000033com.microbiologybytes
9731684163621760.000014ca.sfu
9741684129018530.000017com.deadline
9751684037219990.000016com.jamanetwork
9761684013420150.000016jp.ne.biglobe
9771683988215140.000024org.fao
978168398767330.000041com.sprinklr
9791683893215200.000024jp.ne.goo
9801683869642880.000007com.panoramio
9811683854430780.000010com.freelancer
9821683851643860.000007com.ladygaga
9831683791832350.000009be.blogspot
984168370207500.000041com.aweber
9851683686474390.000004com.xanga
9861683653429520.000010se.blogspot
9871683626613240.000030com.linksynergy
9881683603417590.000019com.ew
9891683492435200.000008com.diigo
9901683365424180.000013com.asahi
991168336429700.000034de.bund
9921683275823060.000014com.teenvogue
9931683174435620.000008edu.fsu
994168314849530.000034net.viainfo
9951683112410660.000033com.superiorthreads
9961683049811060.000033lt.yn
9971682987213350.000029it.amazon
998168294348210.000037ca.amazon
9991682929218480.000017com.bigthink
1000168286403350.000077org.debian
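
As a minimal sketch of how a row of this table could be consumed, the Python snippet below splits a whitespace-separated row into five fields and turns the reversed name back into ordinary notation. The column interpretation (harmonic centrality rank and value, PageRank rank and value, reversed name), the whitespace-separated layout, and the names `RankRow`, `unreverse_name` and `parse_rank_row` are assumptions made for illustration; check them against the header of the rank files you download.

```python
from typing import NamedTuple


class RankRow(NamedTuple):
    # Field names reflect an assumed column meaning, not a documented schema.
    hc_rank: int      # position when sorted by harmonic centrality (assumed)
    hc_value: float   # harmonic centrality value (assumed)
    pr_rank: int      # position when sorted by PageRank (assumed)
    pr_value: float   # PageRank value (assumed)
    name: str         # name in ordinary (un-reversed) notation


def unreverse_name(rev_name: str) -> str:
    """Turn reversed notation ('uk.co.thetimes') into the usual form ('thetimes.co.uk')."""
    return ".".join(reversed(rev_name.strip().split(".")))


def parse_rank_row(row: str) -> RankRow:
    """Parse one whitespace-separated row with exactly five fields."""
    fields = row.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {row!r}")
    hc_rank, hc_value, pr_rank, pr_value, rev_name = fields
    return RankRow(
        int(hc_rank),
        float(hc_value),
        int(pr_rank),
        float(pr_value),
        unreverse_name(rev_name),
    )


if __name__ == "__main__":
    row = parse_rank_row("1000 16828640 335 0.000077 org.debian")
    print(row.name, row.hc_rank, row.pr_rank)  # prints: debian.org 1000 335
```

Un-reversing only restores the label order. A leading www. that was stripped when a host name was reversed cannot be recovered from the reversed form.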

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data proves useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!