We are pleased to announce a new release of host-level and domain-level web graphs based on the published crawls of May, June and July 2018. Additional information about data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases (e.g., Nov/Dec/Jan 2017-2018 Webgraphs).

Host-level graph

The graph consists of 886 million nodes and 5.4 billion edges. It includes dangling nodes, i.e., hosts that have not been crawled yet but are pointed to by a link on a crawled page. There are 793 million dangling nodes (89.5%), and the largest strongly connected component contains only 67 million nodes (7.5%). Host names are reversed and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
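The naming convention can be illustrated with a small Python sketch. It is not the code used to build the graph: the helper names `reverse_host` and `unreverse_host` are hypothetical, and lowercasing the input and dropping a trailing dot are assumptions that go beyond the rule stated above.

```python
def reverse_host(host: str) -> str:
    """Strip one leading 'www.' and reverse the order of the host labels.

    Hypothetical helper; lowercasing and trailing-dot removal are assumptions.
    """
    host = host.strip().lower().rstrip(".")
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def unreverse_host(rev_host: str) -> str:
    """Restore the usual label order (the stripped 'www.' is not restored)."""
    return ".".join(reversed(rev_host.split(".")))


print(reverse_host("www.subdomain.example.com"))  # com.example.subdomain
print(unreverse_host("com.example.subdomain"))    # subdomain.example.com
```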

You can download the graph and the ranks of all 886 million hosts from AWS S3 under the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-may-jun-jul/host/. Alternatively, use https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2018-may-jun-jul/host/ as a prefix to access the files over HTTPS from anywhere.
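To fetch files without AWS tooling, an s3:// location can be turned into an HTTPS URL by swapping the bucket prefix. The sketch below also accepts bare keys, on the assumption that entries listed in the *.paths.gz files are keys relative to the bucket root (check a few lines of a listing before relying on this); `to_https` is a hypothetical helper name.

```python
S3_PREFIX = "s3://commoncrawl/"
HTTPS_PREFIX = "https://commoncrawl.s3.amazonaws.com/"


def to_https(location: str) -> str:
    """Map an s3://commoncrawl/... location, or a bare bucket key, to an HTTPS URL."""
    if location.startswith(S3_PREFIX):
        location = location[len(S3_PREFIX):]
    return HTTPS_PREFIX + location.lstrip("/")


host_dir = "s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-may-jun-jul/host/"
print(to_https(host_dir))
# https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2018-may-jun-jul/host/
print(to_https(host_dir + "cc-main-2018-may-jun-jul-host-ranks.txt.gz"))
```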

The following files and formats are provided:

Download files of the Common Crawl May/June/July 2018 host-level webgraph

Size      File                                             Description
5.60 GB   cc-main-2018-may-jun-jul-host-vertices.paths.gz  nodes ⟨id, rev host⟩, paths of 98 vertices files
25.12 GB  cc-main-2018-may-jun-jul-host-edges.paths.gz     edges ⟨from_id, to_id⟩, paths of 196 edges files
9.99 GB   cc-main-2018-may-jun-jul-host.graph              graph in BVGraph format
2 kB      cc-main-2018-may-jun-jul-host.properties
11.30 GB  cc-main-2018-may-jun-jul-host-t.graph            transpose of the graph (outlinks inverted to inlinks)
2 kB      cc-main-2018-may-jun-jul-host-t.properties
1 kB      cc-main-2018-may-jun-jul-host.stats              WebGraph statistics
13.35 GB  cc-main-2018-may-jun-jul-host-ranks.txt.gz       harmonic centrality and PageRank
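The vertices and edges files are plain-text tables. Assuming each line holds whitespace-separated (typically tab-separated) fields in the order given above and each part file is gzip-compressed, a single part file could be read as sketched below. The function names are hypothetical, and holding the full 886-million-node graph in Python dictionaries is not feasible; the BVGraph files are the better fit for whole-graph work.

```python
import gzip
import io
from collections import defaultdict
from typing import BinaryIO, Dict, List


def read_vertices(fileobj: BinaryIO) -> Dict[int, str]:
    """Map node id -> reversed host name from a gzipped vertices part file."""
    vertices: Dict[int, str] = {}
    with gzip.open(fileobj, "rt", encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            vertices[int(fields[0])] = fields[1]
    return vertices


def read_edges(fileobj: BinaryIO) -> Dict[int, List[int]]:
    """Build out-link adjacency lists from a gzipped edges part file."""
    out_links: Dict[int, List[int]] = defaultdict(list)
    with gzip.open(fileobj, "rt", encoding="utf-8") as f:
        for line in f:
            src, dst = line.split()
            out_links[int(src)].append(int(dst))
    return dict(out_links)


def gz(text: str) -> io.BytesIO:
    """Demo helper: gzip a string into an in-memory file object."""
    return io.BytesIO(gzip.compress(text.encode("utf-8")))


names = read_vertices(gz("0\tcom.example\n1\tcom.example.blog\n2\torg.example\n"))
links = read_edges(gz("0\t2\n1\t0\n1\t2\n"))
for src, dsts in sorted(links.items()):
    print(names[src], "->", [names[d] for d in dsts])
# com.example -> ['org.example']
# com.example.blog -> ['com.example', 'org.example']
```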

Domain-level graph

The domain graph was built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org.
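The aggregation step can be sketched on a toy graph. This is an illustration under assumptions, not Common Crawl's implementation: the suffix set is a tiny hypothetical excerpt written in reversed label order, public suffix wildcard and exception rules are ignored, domain ids are assigned in lexicographic order of the reversed domain names, and links between hosts of the same domain are dropped.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical excerpt of the public suffix list, in reversed label order.
# The real list has thousands of entries plus wildcard and exception rules.
REV_SUFFIXES: Set[str] = {"com", "org", "uk", "uk.co", "au", "au.com"}


def rev_pld(rev_host: str) -> str:
    """Map a reversed host name to its reversed pay-level domain."""
    labels = rev_host.split(".")
    suffix_len = 1  # PSL default rule: an unlisted TLD counts as a public suffix
    for i in range(1, len(labels)):
        if ".".join(labels[:i]) in REV_SUFFIXES:
            suffix_len = i  # keep the longest matching suffix
    return ".".join(labels[: suffix_len + 1])


def aggregate_to_domains(
    hosts: Dict[int, str], edges: Iterable[Tuple[int, int]]
) -> Tuple[List[Tuple[int, str, int]], List[Tuple[int, int]]]:
    """Collapse a host graph into (id, rev domain, num hosts) vertices and domain edges."""
    host_pld = {host_id: rev_pld(rev_host) for host_id, rev_host in hosts.items()}
    # Assumption: domain ids follow the lexicographic order of the reversed names.
    domain_ids = {dom: i for i, dom in enumerate(sorted(set(host_pld.values())))}
    num_hosts = [0] * len(domain_ids)
    for dom in host_pld.values():
        num_hosts[domain_ids[dom]] += 1
    domain_edges: Set[Tuple[int, int]] = set()
    for src, dst in edges:
        a, b = domain_ids[host_pld[src]], domain_ids[host_pld[dst]]
        if a != b:  # assumption: links within the same domain are dropped
            domain_edges.add((a, b))
    vertices = [(i, dom, num_hosts[i]) for dom, i in domain_ids.items()]
    return vertices, sorted(domain_edges)


hosts = {0: "com.example", 1: "com.example.blog", 2: "org.example", 3: "uk.co.bbc.news"}
edges = [(1, 0), (1, 2), (0, 2), (3, 2), (3, 1)]
vertices, domain_edges = aggregate_to_domains(hosts, edges)
print(vertices)      # [(0, 'com.example', 2), (1, 'org.example', 1), (2, 'uk.co.bbc', 1)]
print(domain_edges)  # [(0, 1), (2, 0), (2, 1)]
```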

The domain-level graph has 92 million nodes and 1.45 billion edges. 53 million nodes (57%) are dangling nodes, and the largest strongly connected component covers 34 million nodes (37%).
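A dangling node is one without outgoing edges, so the dangling share is a simple count, while the largest strongly connected component needs a graph traversal. The in-memory sketch below (node ids assumed dense, 0 to n-1, as in the vertices files) uses Kosaraju's two-pass algorithm with iterative depth-first search. It only clarifies the two statistics on a toy graph; graphs of this size call for dedicated tools such as the WebGraph framework.

```python
from typing import List, Sequence, Tuple


def dangling_count(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> int:
    """Count nodes that have no outgoing edge."""
    has_outlink = [False] * num_nodes
    for src, _dst in edges:
        has_outlink[src] = True
    return has_outlink.count(False)


def largest_scc_size(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> int:
    """Size of the largest strongly connected component (Kosaraju's algorithm)."""
    out_adj: List[List[int]] = [[] for _ in range(num_nodes)]
    in_adj: List[List[int]] = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        out_adj[src].append(dst)
        in_adj[dst].append(src)

    # Pass 1: iterative DFS over outlinks, recording nodes in finishing order.
    visited = [False] * num_nodes
    order: List[int] = []
    for start in range(num_nodes):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, iter(out_adj[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(out_adj[nxt])))
                    break
            else:  # all neighbors explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: traverse inlinks in reverse finishing order; each tree is one SCC.
    assigned = [False] * num_nodes
    best = 0
    for start in reversed(order):
        if assigned[start]:
            continue
        assigned[start] = True
        size, todo = 0, [start]
        while todo:
            node = todo.pop()
            size += 1
            for prev in in_adj[node]:
                if not assigned[prev]:
                    assigned[prev] = True
                    todo.append(prev)
        best = max(best, size)
    return best


num_nodes = 5
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (0, 4)]
dangling = dangling_count(num_nodes, edges)
scc = largest_scc_size(num_nodes, edges)
print(f"dangling nodes: {dangling} ({dangling / num_nodes:.0%})")  # dangling nodes: 2 (40%)
print(f"largest SCC: {scc} ({scc / num_nodes:.0%})")  # largest SCC: 3 (60%)
```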

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-may-jun-jul/domain/ or, via HTTPS, under https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2018-may-jun-jul/domain/.

Download files of the Common Crawl May/June/July 2018 domain-level webgraph

Size      File                                             Description
0.64 GB   cc-main-2018-may-jun-jul-domain-vertices.txt.gz  nodes ⟨id, rev domain, num hosts⟩
5.85 GB   cc-main-2018-may-jun-jul-domain-edges.txt.gz     edges ⟨from_id, to_id⟩
3.21 GB   cc-main-2018-may-jun-jul-domain.graph            graph in BVGraph format
2 kB      cc-main-2018-may-jun-jul-domain.properties
3.43 GB   cc-main-2018-may-jun-jul-domain-t.graph          transpose of the graph
2 kB      cc-main-2018-may-jun-jul-domain-t.properties
1 kB      cc-main-2018-may-jun-jul-domain.stats            WebGraph statistics
1.96 GB   cc-main-2018-may-jun-jul-domain-ranks.txt.gz     harmonic centrality and PageRank
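Harmonic centrality scores a node x as the sum of 1/d(y, x) over all other nodes y, where d(y, x) is the length of the shortest path from y to x and unreachable nodes contribute 0. PageRank is the stationary distribution of a random surfer who either follows an outlink or jumps to a random node. The toy sketch below computes both exactly. The damping factor 0.85, the fixed 50 iterations and the uniform redistribution of rank held by dangling nodes are assumptions, not the settings used for the published ranks files, and one BFS per node does not scale to graphs with millions of nodes.

```python
from collections import deque
from typing import List


def harmonic_centrality(in_adj: List[List[int]]) -> List[float]:
    """H(x) = sum of 1 / d(y, x) over all y != x; unreachable y contribute 0."""
    scores = [0.0] * len(in_adj)
    for x in range(len(in_adj)):
        dist = {x: 0}
        queue = deque([x])
        while queue:  # BFS backwards along inlinks yields d(y, x) for every y
            node = queue.popleft()
            for prev in in_adj[node]:
                if prev not in dist:
                    dist[prev] = dist[node] + 1
                    queue.append(prev)
        scores[x] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores


def pagerank(out_adj: List[List[int]], damping: float = 0.85,
             iterations: int = 50) -> List[float]:
    """Power iteration; rank held by dangling nodes is spread uniformly."""
    n = len(out_adj)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        dangling_mass = 0.0
        for node, targets in enumerate(out_adj):
            if targets:
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                dangling_mass += damping * rank[node]
        rank = [r + dangling_mass / n for r in new_rank]
    return rank


# Toy graph: 0->1, 0->2, 1->2, 2->0, 3->2
out_adj = [[1, 2], [2], [0], [2]]
in_adj: List[List[int]] = [[] for _ in out_adj]
for src, targets in enumerate(out_adj):
    for dst in targets:
        in_adj[dst].append(src)

hc = harmonic_centrality(in_adj)
pr = pagerank(out_adj)
print([round(h, 3) for h in hc])                     # [2.0, 1.833, 3.0, 0.0]
print(sorted(range(len(pr)), key=lambda v: -pr[v]))  # [2, 0, 1, 3]
```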

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 92 million domains is available for download.
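To extract your own top-N list from the full ranks file, for example ordered by PageRank instead of harmonic centrality, a single streaming pass is enough. The sketch assumes the layout of the table below: header lines starting with #, then tab-separated columns for harmonic centrality rank and value, PageRank rank and value, and the reversed name; any further columns are ignored. `iter_ranks`, `top_by_pagerank` and the sample header are hypothetical.

```python
import gzip
import heapq
import io
from typing import BinaryIO, Iterator, List, NamedTuple


class RankRow(NamedTuple):
    hc_rank: int
    hc_value: float
    pr_rank: int
    pr_value: float
    rev_name: str


def iter_ranks(fileobj: BinaryIO) -> Iterator[RankRow]:
    """Stream rows from a gzipped ranks file, skipping '#' header lines."""
    with gzip.open(fileobj, "rt", encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            yield RankRow(int(fields[0]), float(fields[1]),
                          int(fields[2]), float(fields[3]), fields[4])


def top_by_pagerank(fileobj: BinaryIO, n: int) -> List[RankRow]:
    """Keep only the n rows with the smallest PageRank position (memory O(n))."""
    return heapq.nsmallest(n, iter_ranks(fileobj), key=lambda row: row.pr_rank)


# Usage with a tiny in-memory sample built from the first rows of the table below.
sample = (
    "#hc_rank\thc_value\tpr_rank\tpr_value\trev_name\n"  # hypothetical header
    "1\t25381622\t2\t0.013272\tcom.facebook\n"
    "2\t24767500\t1\t0.016429\tcom.googleapis\n"
    "3\t23574566\t3\t0.009596\tcom.google\n"
)
top = top_by_pagerank(io.BytesIO(gzip.compress(sample.encode("utf-8"))), 2)
print([row.rev_name for row in top])  # ['com.googleapis', 'com.facebook']
```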

Top 1000 domains ranked by harmonic centrality (May/June/July 2018)

HC = harmonic centrality, PR = PageRank; rank 1 is the highest.

HC rank  HC value  PR rank  PR value  reversed hostname
1        25381622  2        0.013272  com.facebook
2        24767500  1        0.016429  com.googleapis
3        23574566  3        0.009596  com.google
4        22826384  4        0.008408  com.twitter
5        22398676  5        0.007043  com.youtube
6        21446850  6        0.006211  org.w
7        20000170  7        0.004495  org.gmpg
8        19917792  9        0.003686  com.instagram
9        19565892  11       0.003123  com.linkedin
10       18904142  25       0.001434  com.gravatar
11       18886866  14       0.002009  com.wordpress
12       18791656  26       0.001378  org.wikipedia
13       18683474  23       0.001591  com.pinterest
14       18605644  13       0.002616  org.wordpress
15       18523062  21       0.001661  com.apple
16       18506550  12       0.002795  com.bootstrapcdn
17       18299454  33       0.000893  com.blogspot
18       18261082  24       0.001454  com.vimeo
19       18104372  37       0.000799  com.amazon
20       18101742  34       0.000875  gl.goo
21       18052860  38       0.000756  be.youtu
22       18015066  28       0.001162  com.microsoft
23       17990876  16       0.001783  com.googletagmanager
24       17945844  19       0.001702  com.adobe
25       17942552  44       0.000652  com.tumblr
26       17901968  15       0.001947  com.cloudflare
27       17853502  20       0.001684  com.macromedia
28       17832868  45       0.000641  com.wp
29       17823530  61       0.000483  com.yahoo
30       17781678  40       0.000719  com.flickr
31       17733538  46       0.000626  ly.bit
32       17680810  48       0.000606  me.wp
33       17674312  35       0.000857  com.paypal
34       17654864  32       0.000904  com.amazonaws
35       17602138  22       0.001598  com.github
36       17588460  104      0.000250  com.nytimes
37       17550714  54       0.000545  org.mozilla
38       17517872  70       0.000400  com.weebly
39       17506702  89       0.000291  com.googleusercontent
40       17435634  41       0.000714  io.github
41       17400536  184      0.000140  com.wsj
42       17353638  144      0.000209  com.dropbox
43       17341062  166      0.000161  org.wikimedia
44       17336710  141      0.000217  com.imgur
45       17319686  57       0.000497  com.medium
46       17318280  68       0.000411  org.creativecommons
47       17316806  65       0.000434  com.bing
48       17274976  147      0.000198  com.blogger
49       17261700  29       0.001115  com.gstatic
50       17257470  66       0.000422  com.jquery
51       17236104  211      0.000119  com.businessinsider
52       17211698  155      0.000182  net.slideshare
53       17211430  207      0.000120  com.wired
54       17203902  53       0.000577  co.t
55       17197252  56       0.000520  eu.europa
56       17187720  182      0.000142  com.myspace
57       17184156  92       0.000278  com.mailchimp
58       17153616  36       0.000843  org.apache
59       17131630  31       0.000912  net.doubleclick
60       17127558  69       0.000402  com.statcounter
61       17120814  63       0.000477  com.list-manage
62       17112850  246      0.000100  org.npr
63       17105220  145      0.000203  com.issuu
64       17099450  27       0.001250  ru.yandex
65       17089826  314      0.000078  com.theverge
66       17089012  321      0.000077  com.appspot
67       17080688  168      0.000159  org.gnu
68       17059280  142      0.000216  com.yelp
69       17056948  52       0.000581  org.w3
70       17053438  331      0.000075  com.about
71       17048062  267      0.000090  me.about
72       17035726  176      0.000148  com.oracle
73       17031278  8        0.004428  com.godaddy
74       17016312  175      0.000148  org.ietf
75       17014942  377      0.000065  com.slate
76       16988198  42       0.000702  net.cloudfront
77       16987550  301      0.000082  com.buzzfeed
78       16986236  226      0.000111  com.tinyurl
79       16984038  436      0.000056  edu.princeton
80       16970148  336      0.000074  com.deviantart
81       16945098  206      0.000122  com.cnn
82       16943242  366      0.000066  edu.washington
83       16941304  105      0.000250  com.reddit
84       16922914  393      0.000062  edu.ucla
85       16917390  85       0.000302  com.soundcloud
86       16917338  449      0.000055  com.nike
87       16909478  193      0.000136  uk.co.bbc
88       16901808  60       0.000485  org.schema
89       16899732  378      0.000064  org.arxiv
90       16896852  397      0.000060  org.chromium
91       16894404  181      0.000142  com.theguardian
92       16888586  165      0.000163  com.forbes
93       16887802  380      0.000064  com.stackexchange
94       16886350  161      0.000172  com.android
95       16885880  343      0.000070  gov.loc
96       16881304  437      0.000056  com.qz
97       16878064  333      0.000074  com.foursquare
98       16871092  242      0.000102  com.nbcnews
99       16862884  312      0.000079  gov.fda
100      16856244  389      0.000063  org.ieee
101      16855606  30       0.000991  com.squarespace
102      16849916  443      0.000055  org.sciencemag
103      16828640  82       0.000323  net.akamaihd
104      16825050  284      0.000085  com.example
105      16821922  423      0.000057  com.trello
106      16815472  172      0.000152  com.whatsapp
107      16812836  215      0.000119  es.google
108      16812598  298      0.000082  com.typeform
109      16793418  616      0.000043  com.flipboard
110      16787856  59       0.000493  net.fbcdn
111      16782836  151      0.000190  org.bbb
112      16781822  218      0.000118  edu.stanford
113      16781438  427      0.000057  com.libsyn
114      16779444  502      0.000049  google.blog
115      16777134  283      0.000085  com.go
116      16774636  419      0.000057  com.withgoogle
117      16763315  610      0.000043  edu.utah
118      16762919  64       0.000437  com.ytimg
119      16761512  328      0.000076  com.reuters
120      16756209  251      0.000097  com.live
121      16749983  164      0.000163  org.archive
122      16747795  518      0.000048  edu.gatech
123      16741185  75       0.000357  com.fb
124      16739169  106      0.000250  edu.utexas
125      16738538  208      0.000120  com.huffingtonpost
126      16737695  286      0.000084  com.bloomberg
127      16732629  241      0.000103  com.techcrunch
128      16726768  317      0.000078  edu.harvard
129      16724125  205      0.000123  com.dribbble
130      16723935  318      0.000078  com.git-scm
131      16719142  169      0.000159  gov.nih
132      16707450  146      0.000199  net.sourceforge
133      16706716  340      0.000072  com.msn
134      16705396  77       0.000351  com.wix
135      16697236  294      0.000083  uk.co.blogspot
136      16694665  316      0.000078  com.googlecode
137      16688264  369      0.000066  com.bbc
138      16685043  225      0.000111  com.typepad
139      16684281  234      0.000106  com.washingtonpost
140      16683086  213      0.000119  com.imdb
141      16679284  538      0.000047  com.chron
142      16671633  706      0.000042  com.hbo
143      16668347  364      0.000067  com.mashable
144      16665099  87       0.000294  com.shopify
145      16661523  76       0.000351  com.paypalobjects
146      16659375  262      0.000092  edu.mit
147      16650967  394      0.000062  com.tinypic
148      16650908  293      0.000083  au.com.google
149      16647149  308      0.000080  com.cnet
150      16641164  300      0.000082  com.usatoday
151      16640112  296      0.000083  net.windows
152      16637416  416      0.000058  au.gov.nsw
153      16626638  299      0.000082  com.ibm
154      16622823  371      0.000065  uk.co.dailymail
155      16622125  339      0.000073  uk.co.telegraph
156      16619319  362      0.000067  com.gmail
157      16615887  173      0.000151  com.eventbrite
158      16612204  227      0.000110  net.php
159      16609048  523      0.000048  com.fastcodesign
160      16596577  334      0.000074  com.time
161      16594204  421      0.000057  com.ted
162      16593660  73       0.000366  de.google
163      16591616  531      0.000047  org.rubyonrails
164      16575402  271      0.000088  com.mapquest
165      16575133  561      0.000045  edu.illinois
166      16571670  170      0.000154  com.opera
167      16568994  395      0.000061  com.latimes
168      16565169  819      0.000037  com.dezeen
169      16560426  306      0.000081  com.hp
170      16556998  180      0.000143  com.stackoverflow
171      16554184  452      0.000055  org.eclipse
172      16549466  236      0.000105  com.ebay
173      16543706  440      0.000055  com.kickstarter
174      16540301  425      0.000057  gov.nasa
175      16538974  231      0.000106  uk.co.amazon
176      16533349  486      0.000051  edu.cornell
177      16531917  188      0.000139  com.etsy
178      16529057  410      0.000058  com.aol
179      16526090  542      0.000046  com.quora
180      16525042  288      0.000084  com.meetup
181      16520185  520      0.000048  com.googleblog
182      16518640  761      0.000039  io.itch
183      16517972  414      0.000058  com.variety
184      16517219  499      0.000050  edu.berkeley
185      16508733  622      0.000043  uk.co.pinterest
186      16506112  100      0.000257  com.livestream
187      16502945  514      0.000049  com.ft
188      16501376  466      0.000053  co.g
189      16499713  439      0.000056  com.theatlantic
190      16498131  519      0.000048  com.zdnet
191      16492014  244      0.000101  com.surveymonkey
192      16488627  199      0.000130  com.tripadvisor
193      16485175  390      0.000063  com.cnbc
194      16483162  1037     0.000031  com.engadget
195      16481038  463      0.000054  com.mixcloud
196      16476002  605      0.000044  com.vogue
197      16470277  752      0.000039  com.nationalgeographic
198      16468657  750      0.000040  com.creativebloq
199      16467451  565      0.000045  com.yellowpages
200      16467125  90       0.000290  com.addthis
201      16466147  239      0.000103  org.drupal
202      16464114  311      0.000079  com.udacity
203      16463605  898      0.000035  com.sfgate
204      16456191  748      0.000040  com.discogs
205      16455549  255      0.000095  com.digg
206      16453381  817      0.000037  com.wikia
207      16449510  528      0.000047  com.nature
208      16448944  185      0.000140  com.spotify
209      16448054  560      0.000045  org.pbs
210      16448046  86       0.000300  com.twimg
211      16444439  431      0.000056  com.angieslist
212      16437315  384      0.000063  com.skype
213      16435380  469      0.000053  com.fortune
214      16434501  102      0.000255  net.jsfiddle
215      16433743  602      0.000044  com.newyorker
216      16431871  492      0.000051  com.cbsnews
217      16430182  405      0.000059  gov.whitehouse
218      16427414  354      0.000069  org.python
219      16425704  280      0.000086  com.hubspot
220      16424605  324      0.000076  gov.cdc
221      16421283  547      0.000046  org.aarp
222      16420623  541      0.000046  com.findlaw
223      16419061  194      0.000135  com.zendesk
224      16416565  879      0.000036  com.arstechnica
225      16415324  493      0.000051  org.hbr
226      16415039  160      0.000173  com.wixsite
227      16414264  513      0.000049  com.cisco
228      16414231  58       0.000495  com.vk
229      16414007  361      0.000067  com.photobucket
230      16408388  357      0.000069  com.springer
231      16407667  509      0.000049  com.superpages
232      16406036  728      0.000041  com.intel
233      16405431  407      0.000058  com.giphy
234      16404519  253      0.000096  to.amzn
235      16404240  800      0.000038  com.manta
236      16403321  47       0.000608  com.qq
237      16403260  1188     0.000029  com.gizmodo
238      16403076  498      0.000050  com.entrepreneur
239      16402950  608      0.000043  com.venturebeat
240      16402108  1047     0.000031  edu.upenn
241      16401145  360      0.000068  com.nypost
242      16401094  406      0.000059  org.un
243      16395874  1405     0.000028  uk.ac.ox
244      16395078  626      0.000043  com.scribd
245      16394056  961      0.000034  com.thenextweb
246      16393237  587      0.000044  com.unsplash
247      16392164  470      0.000053  com.xrea
248      16390829  764      0.000039  com.hackernoon
249      16390735  1013     0.000032  edu.columbia
250      16385680  713      0.000041  com.box
251      16383953  230      0.000108  com.stumbleupon
252      16383638  1551     0.000024  edu.purdue
253      16381649  593      0.000044  com.vice
254      16381302  913      0.000035  ly.snip
255      16380414  229      0.000109  net.behance
256      16380111  600      0.000044  com.symantec
257      16379592  111      0.000236  com.jimdo
258      16378318  959      0.000034  com.googledrive
259      16377172  257      0.000094  com.salesforce
260      16375969  387      0.000063  com.images-amazon
261      16374033  504      0.000049  org.unicode
262      16372683  445      0.000055  com.office
263      16370954  635      0.000043  com.citysearch
264      16365786  876      0.000036  com.healthgrades
265      16363957  303      0.000081  org.acm
266      16363446  240      0.000103  com.disqus
267      16363144  338      0.000073  com.tripod
268      16356913  992      0.000033  com.pixabay
269      16354234  370      0.000066  com.oreilly
270      16343963  1048     0.000031  com.indiegogo
271      16342521  1040     0.000031  com.evernote
272      16337476  735      0.000040  gov.noaa
273      16337226  785      0.000038  com.spreadshirt
274      16332902  1364     0.000029  com.searchengineland
275      16330466  1597     0.000023  uk.co.theregister
276      16327360  620      0.000043  com.avvo
277      16325045  189      0.000138  com.constantcontact
278      16324354  489      0.000051  com.inc
279      16322948  920      0.000035  com.naturalnews
280      16321702  376      0.000065  org.ampproject
281      16321578  450      0.000055  me.paypal
282      16320188  342      0.000071  com.livejournal
283      16319952  422      0.000057  com.businesswire
284      16319387  974      0.000033  au.net.abc
285      16319144  259      0.000093  org.joomla
286      16314140  885      0.000035  com.dropboxusercontent
287      16312103  853      0.000036  com.statista
288      16308378  558      0.000045  com.goodreads
289      16308361  1398     0.000028  com.sciencedaily
290      16307378  1377     0.000029  com.storify
291      16302792  796      0.000038  com.curbed
292      16302729  186      0.000139  com.feedburner
293      16300939  1548     0.000024  com.pcmag
294      16300342  644      0.000042  gov.defense
295      16300060  726      0.000041  org.eff
296      16298671  383      0.000064  com.sxsw
297      16298398  1594     0.000023  com.mcafee
298      16296843  471      0.000052  com.snapchat
299      16295047  1002     0.000032  com.shutterstock
300      16294087  700      0.000042  com.moz
301      16293401  158      0.000175  uk.co.google
302      16291354  433      0.000056  com.adweek
303      16289733  337      0.000073  gov.ca
304      16288869  237      0.000105  com.bandcamp
305      16285197  233      0.000106  de.amazon
306      16281279  909      0.000035  gov.census
307      16279095  627      0.000043  site.business
308      16278411  767      0.000039  com.economist
309      16273403  346      0.000070  com.wiley
310      16271595  157      0.000177  com.weibo
311      16271436  911      0.000035  gov.uspto
312      16270078  93       0.000273  me.fb
313      16266403  990      0.000033  gov.fcc
314      16263211  1624     0.000022  com.pcworld
315      16263070  1366     0.000029  org.worldbank
316      16262404  1695     0.000021  com.fifa
317      16261893  892      0.000035  com.merchantcircle
318      16261024  749      0.000040  tv.twitch
319      16257455  1719     0.000020  edu.unc
320      16255699  954      0.000034  com.steampowered
321      16255672  1637     0.000022  org.khanacademy
322      16255397  880      0.000036  com.indiatimes
323      16255076  309      0.000080  com.smugmug
324      16254667  564      0.000045  com.wikihow
325      16253704  949      0.000034  org.unesco
326      16253384  1590     0.000023  edu.northwestern
327      16252481  1056     0.000031  com.redhat
328      16250050  1599     0.000023  com.scientificamerican
329      16247225  808      0.000037  gov.nist
330      16246611  1563     0.000024  com.smashingmagazine
331      16241341  822      0.000037  com.deloitte
332      16240554  1388     0.000028  com.politico
333      16239514  325      0.000076  com.googlesyndication
334      16238580  787      0.000038  org.tigris
335      16237777  367      0.000066  com.prnewswire
336      16235711  1017     0.000032  edu.yale
337      16235543  925      0.000035  com.ubuntu
338      16233613  742      0.000040  org.aiga
339      16232678  1487     0.000026  com.pexels
340      16230507  1493     0.000026  com.thinkwithgoogle
341      16229621  1440     0.000027  com.alibaba
342      16226635  304      0.000081  ca.google
343      16226232  359      0.000068  com.dailymotion
344      16226122  1703     0.000021  com.vanityfair
345      16223766  1794     0.000019  com.udemy
346      16223352  277      0.000086  com.windowsphone
347      16222378  621      0.000043  com.slack
348      16219936  1409     0.000028  ca.blogspot
349      16219473  332      0.000074  com.bitly
350      16214229  962      0.000034  gov.nps
351      16213218  279      0.000086  com.wufoo
352      16213088  755      0.000039  com.webmd
353      16212554  794      0.000038  de.blogspot
354      16209004  1019     0.000032  com.prweb
355      16204931  1766     0.000019  edu.usc
356      16204520  546      0.000046  com.homeadvisor
357      16203590  991      0.000033  com.deepmind
358      16203266  322      0.000077  com.mozilla
359      16201535  1652     0.000022  org.weforum
360      16200784  1912     0.000017  com.ehow
361      16199879  586      0.000044  com.netflix
362      16199442  724      0.000041  com.samsung
363      16197387  477      0.000052  com.webs
364      16197196  1833     0.000018  com.ikea
365      16196228  183      0.000141  jp.co.yahoo
366      16195807  2548     0.000012  com.sophos
367      16195059  813      0.000037  org.amnesty
368      16191501  996      0.000033  org.spie
369      16191482  1635     0.000022  com.billboard
370      16191120  1374     0.000029  com.hootsuite
371      16188221  1001     0.000032  com.whitepages
372      16188111  319      0.000078  fr.free
373      16187195  187      0.000139  com.xing
374      16186531  815      0.000037  com.java
375      16186252  1827     0.000018  org.coursera
376      16184139  1381     0.000029  com.speakerdeck
377      16182369  152      0.000190  com.youtube-nocookie
378      16178099  1908     0.000017  com.tutsplus
379      16177836  899      0.000035  com.marketwatch
380      16175545  995      0.000033  edu.psu
381      16174492  1688     0.000021  com.chrome
382      16173930  1077     0.000031  com.airbnb
383      16173485  1740     0.000020  au.com.smh
384      16173008  952      0.000034  gov.senate
385      16172258  276      0.000087  com.getbootstrap
386      16171414  1443     0.000027  com.marketingland
387      16169883  1104     0.000030  com.ycombinator
388      16168541  413      0.000058  int.who
389      16168403  1021     0.000032  edu.umich
390      16167647  1511     0.000025  com.xkcd
391      16165208  1617     0.000022  com.merriam-webster
392      16164062  986      0.000033  it.binged
393      16162437  1052     0.000031  com.sun
394      16162150  769      0.000039  com.googlesource
395      16161882  1439     0.000027  edu.ucsd
396      16160824  399      0.000060  com.mysql
397      16156698  418      0.000057  com.bigcartel
398      16154861  846      0.000036  gov.state
399      16153725  981      0.000033  com.itsnicethat
400      16153693  1459     0.000027  uk.ac.cam
401      16152535  260      0.000093  com.myshopify
402      16152053  1482     0.000026  co.vine
403      16150830  540      0.000047  gov.usda
404      16147936  1934     0.000017  edu.ucdavis
405      16147539  1840     0.000018  com.autodesk
406      16147195  795      0.000038  org.aclweb
407      16146633  1403     0.000028  com.css-tricks
408      16144670  2154     0.000014  edu.ncsu
409      16143655  1646     0.000022  com.playstation
410      16143349  946      0.000034  io.material
411      16143314  1105     0.000030  org.iso
412      16142931  836      0.000036  gov.justice
413      16141858  827      0.000037  com.foxnews
414      16141188  829      0.000037  com.gartner
415      16140967  1721     0.000020  uk.ac.ucl
416      16137787  382      0.000064  com.booking
417      16136856  945      0.000034  com.psychologytoday
418      16136353  81       0.000337  com.baidu
419      16135804  786      0.000038  gov.copyright
420      16135574  1593     0.000023  com.target
421      16135083  2021     0.000016  edu.arizona
422      16131304  1450     0.000027  io.codepen
423      16130058  415      0.000058  com.monster
424      16126494  482      0.000052  gov.irs
425      16126161  1732     0.000020  com.freepik
426      16123562  1430     0.000027  com.gumroad
427      16122500  1519     0.000025  de.spiegel
428      16120242  285      0.000085  gov.ftc
429      16117722  1660     0.000021  com.com
430      16116716  363      0.000067  com.githubusercontent
431      16115063  1977     0.000016  com.msnbc
432      16113873  572      0.000045  in.co.google
433      16110022  1722     0.000020  com.gigaom
434      16109670  1438     0.000027  com.dell
435      16108948  907      0.000035  com.tandfonline
436      16108826  396      0.000060  net.themeforest
437      16108552  1442     0.000027  com.businessweek
438      16107032  533      0.000047  gov.epa
439      16106841  1039     0.000031  com.gofundme
440      16106821  326      0.000076  com.rawgit
441      16106609  1875     0.000018  com.angelfire
442      16105593  1824     0.000018  com.yoast
443      16104950  2525     0.000012  com.fiverr
444      16104946  1623     0.000022  com.nymag
445      16103599  1616     0.000022  com.hollywoodreporter
446      16103457  1027     0.000032  ca.cbc
447      16103234  1648     0.000022  com.sap
448      16103036  1098     0.000030  com.nielsen
449      16102998  426      0.000057  org.nodejs
450      16102709  2510     0.000012  edu.hbs
451      16102701  195      0.000135  com.eepurl
452      16102688  741      0.000040  com.blackberry
453      16102121  2618     0.000012  edu.caltech
454      16101515  1576     0.000024  com.ning
455      16100948  582      0.000044  uk.co.independent
456      16099080  967      0.000033  com.underconsideration
457      16098787  1904     0.000017  com.semrush
458      16096514  2562     0.000012  com.popsci
459      16096221  1903     0.000017  com.howstuffworks
460      16096111  734      0.000040  gov.hhs
461      16094981  792      0.000038  com.usnews
462      16093468  17       0.001739  com.wixstatic
463      16093240  1591     0.000023  org.fao
464      16091056  1985     0.000016  tv.periscope
465      16090419  2152     0.000014  com.cbs
466      16089586  1457     0.000027  org.altervista
467      16088656  480      0.000052  us.icio
468      16087093  451      0.000055  com.force
469      16086649  1023     0.000032  com.500px
470      16085221  2114     0.000015  uk.ac.ed
471      16083763  2228     0.000014  com.instructables
472      16083658  1936     0.000017  org.filezilla-project
473      16082249  1675     0.000021  com.nba
474      16081782  2599     0.000012  com.codecademy
475      16081087  1531     0.000025  com.elpais
476      16080942  1091     0.000030  es.iac
477      16078592  120      0.000226  com.google-analytics
478      16078096  305      0.000081  com.staticflickr
479      16077806  861      0.000036  uk.co.guardian
480      16076776  1517     0.000025  com.warnerbros
481      16076399  496      0.000050  com.cargocollective
482      16076336  1868     0.000018  com.canva
483      16076212  2072     0.000015  com.gamespot
484      16075519  1665     0.000021  edu.jhu
485      16074907  1437     0.000027  edu.wisc
486      16072844  976      0.000033  com.uservoice
487      16070548  963      0.000033  net.researchgate
488      16070276  1420     0.000028  com.istockphoto
489      16069880  1022     0.000032  com.insiderpages
490      16068798  790      0.000038  tv.ustream
491      16068337  2191     0.000014  au.com.news
492      16068265  4005     0.000007  com.space
493      16067417  881      0.000036  gov.arts
494      16067121  281      0.000085  com.fc2
495      16067028  444      0.000055  com.sciencedirect
496      16066807  1726     0.000020  com.hulu
497      16066412  1434     0.000027  gov.usgs
498      16064058  1804     0.000019  com.fedex
499      16063290  1460     0.000027  com.forrester
500      16061979  1897     0.000017  org.pnas
501      16061256  799      0.000038  com.feedly
502      16060176  2986     0.000010  com.hubpages
503      16060072  2107     0.000015  com.crunchbase
504      16059447  1725     0.000020  com.mercurynews
505      16057745  1415     0.000028  com.reverbnation
506      16057012  1004     0.000032  com.lighthouseapp
507      16056491  1609     0.000023  com.indeed
508      16056336  2396     0.000013  com.programmableweb
509      16055614  746      0.000040  com.gotowebinar
510      16055125  1234     0.000029  com.mlb
511      16054055  1005     0.000032  com.timeanddate
512      16053957  1713     0.000020  kr.flic
513      16053898  159      0.000174  com.googleadservices
514      16053749  1600     0.000023  edu.si
515      16052692  247      0.000099  com.getclicky
516      16052033  222      0.000114  jp.co.amazon
517      16051826  1749     0.000020  com.today
518      16051338  1376     0.000029  ly.ow
519      16050686  441      0.000055  edu.cmu
520      16050498  1375     0.000029  org.redcross
521      16050390  601      0.000044  com.squareup
522      16048735  1809     0.000019  com.domain
523      16048179  1692     0.000021  edu.uchicago
524      16047498  1385     0.000028  de.heise
525      16046589  1042     0.000031  com.googlelabs
526      16046497  935      0.000034  com.patreon
527      16045843  2144     0.000015  com.ibtimes
528      16045451  401      0.000059  com.clicky
529      16043594  1826     0.000018  com.socialmediaexaminer
530      16042787  1416     0.000028  com.americanexpress
531      16042447  411      0.000058  com.w3schools
532      16042132  2179     0.000014  org.gimp
533      16041800  727      0.000041  com.photoshelter
534      16041645  386      0.000063  edu.nyu
535      16039704  910      0.000035  org.scala-lang
536      16038413  2877     0.000010  com.oxforddictionaries
537      16038164  888      0.000035  ca.amazon
538      16037904  1761     0.000019  com.upwork
539      16036849  1604     0.000023  org.apa
540      16036736  1549     0.000024  com.accenture
541      16036020  2320     0.000013  com.csmonitor
542      16035521  2772     0.000011  com.lynda
543      16034570  2092     0.000015  com.bestbuy
544      16034530  943      0.000034  com.emarketer
545      16032865  403      0.000059  com.herokuapp
546      16032606  998      0.000033  au.com.yellowpages
547      16032334  581      0.000045  com.houzz
548      16031562  1657     0.000022  com.codeplex
549      16027393  108      0.000243  jp.co.google
550      16027146  1654     0.000022  com.theglobeandmail
551      16027065  1817     0.000019  com.zillow
552      16026139  3127     0.000009  org.notepad-plus-plus
553      16025586  1035     0.000031  com.uber
554      16025512  2158     0.000014  com.aljazeera
555      16025193  481      0.000052  org.doi
556      16024531  1402     0.000028  gov.fbi
557      16024228  278      0.000086  com.youku
558      16024147  1135     0.000030  edu.alamo
559      16023519  1774     0.000019  org.letsencrypt
560      16023515  1847     0.000018  com.lulu
561      16022000  1418     0.000028  com.unity3d
562      16021724  472      0.000052  com.iconfinder
563      16019791  198      0.000133  com.histats
564      16019669  1878     0.000017  com.norton
565      16019565  703      0.000042  uk.co.tripadvisor
566      16019111  1386     0.000028  com.walmart
567      16018429  2180     0.000014  edu.asu
568      16017954  1556     0.000024  com.prezi
569      16016889  989      0.000033  gov.usa
570      16015420  1760     0.000020  com.thehill
571      16013857  2053     0.000015  com.thestar
572      16013612  1561     0.000024  in.blogspot
573      16012265  1041     0.000031  jp.co.fujixerox
574      16011010  1980     0.000016  com.trendmicro
575      16010772  1533     0.000025  com.bufferapp
576      16009533  1579     0.000024  com.intuit
577      16008583  1507     0.000025  edu.umn
578      16007713  2512     0.000012  edu.wustl
579      16006379  1029     0.000032  com.chamberofcommerce
580      16005982  1119     0.000030  net.brownbook
581      16005447  1679     0.000021  com.hotmail
582      16004096  430      0.000056  cn.com.sina
583      16003516  1392     0.000028  com.techrepublic
584      16002880  1666     0.000021  com.econsultancy
585      16002231  4188     0.000007  com.boredpanda
586      16001968  107      0.000247  com.messenger
587      16001601  2167     0.000014  com.icloud
588      16001177  957      0.000034  com.outlook
589      16000754  2412     0.000013  com.twitpic
590      16000697  2045     0.000015  com.ifttt
591      16000659  2134     0.000015  com.lonelyplanet
592      16000307  1851     0.000018  edu.virginia
593      16000043  329      0.000075  com.naver
594      15999112  2103     0.000015  com.mentalfloss
595      15998718  2260     0.000014  com.refinery29
596      15997519  3501     0.000008  net.minecraft
597      15997019  235      0.000106  fr.google
598      15996815  1554     0.000024  com.jetbrains
599      15996044  1399     0.000028  com.aweber
600      15995815  2389     0.000013  com.animoto
601      15995749  1395     0.000028  us.imageshack
602      15995631  1839     0.000018  com.zazzle
603      15995483  1111     0.000030  com.ezlocal
604      15995033  400      0.000060  com.newrelic
605      15994874  1754     0.000020  com.posterous
606      15994709  1986     0.000016  org.aclu
607      15994590  633      0.000043  gov.sec
608      15994083  980      0.000033  uk.co.eventbrite
609      15992997  2533     0.000012  edu.unl
610      15991024  3390     0.000009  com.fitbit
611      15990563  3562     0.000008  com.wolfram
612      15990135  1049     0.000031  edu.utep
613      15989699  1765     0.000019  org.owasp
614      15989560  1914     0.000017  com.people
615      15989353  1700     0.000021  com.irishtimes
616      15987982  1658     0.000021  org.cambridge
617      15987488  1714     0.000020  com.aliexpress
618      15986613  2745     0.000011  org.kiva
619      15986099  2201     0.000014  com.getresponse
620      15985328  870      0.000036  ca.yelp
621      15984766  2650     0.000011  com.klout
622      15984420  1724     0.000020  edu.academia
623      15984370  3826     0.000008  edu.byu
624      15984363  1918     0.000017  edu.cuny
625      15983934  3152     0.000009  edu.dartmouth
626      15983757  3907     0.000007  com.lmgtfy
627      15982277  1397     0.000028  com.alexa
628      15982117  2841     0.000011  com.lastpass
629      15981087  1479     0.000026  com.mckinsey
630      15980938  2078     0.000015  it.scoop
631      15980757  98       0.000261  org.reactjs
632      15979441  88       0.000292  net.facebook
633      15979247  2766     0.000011  com.campaignmonitor
634      15978259  3392     0.000009  edu.uic
635      15977194  2122     0.000015  ch.ethz
636      15976084  148      0.000197  ru.mail
637      15975453  2328     0.000013  com.glamour
638      15975183  197      0.000135  it.google
639      15974182  1128     0.000030  fr.blogspot
640      15973189  1773     0.000019  com.foxbusiness
641      15972517  1993     0.000016  edu.msu
642      15972470  2206     0.000014  ca.ualberta
643      15972287  956      0.000034  com.city-data
644      15970326  2029     0.000016  edu.uci
645      15970066  1729     0.000020  com.newsweek
646      15969815  797      0.000038  org.jenkins-ci
647      15969741  1676     0.000021  com.marketo
648      15969666  1012     0.000032  com.cdbaby
649      15969585  1848     0.000018  com.hostgator
650      15969244  2216     0.000014  com.softpedia
651      15969035  4712     0.000006  com.diigo
652      15968736  997      0.000033  au.com.truelocal
653      15968524  1705     0.000021  com.yandex
654      15968467  3507     0.000008  com.starwars
655      15967022  3277     0.000009  com.softonic
656      15966584  1387     0.000028  com.lifehacker
657      15964769  417      0.000057  com.stripe
658      15964236  1681     0.000021  com.thomsonreuters
659      15964012  1967     0.000016  com.nfl
660      15962731  780      0.000038  com.uk
661      15962641  1451     0.000027  com.weather
662      15962045  2262     0.000014  edu.bu
663      15960841  177      0.000146  org.icann
664      15960580  2896     0.000010  org.ala
665      15960416  835      0.000036  org.openstreetmap
666      15957618  1647     0.000022  mp.j
667      15957395  289      0.000084  com.maxcdn
668      15957230  91       0.000290  org.networkadvertising
669      15957144  3168     0.000009  com.avast
670      15955925  2664     0.000011  org.virtualbox
671      15955293  2008     0.000016  edu.umass
672      15954737  1829     0.000018  gov.nyc
673      15954491  2343     0.000013  com.homedepot
674      15953595  1942     0.000017  edu.ufl
675      15952668  1691     0.000021  com.nokia
676      15952333  2161     0.000014  com.livestrong
677      15951184  2124     0.000015  com.history
678      15948690  409      0.000058  com.fastcompany
679      15947723  2289     0.000013  com.newscientist
680      15946716  1706     0.000021  com.vox
681      15944352  313      0.000078  com.taobao
682      15944010  516      0.000048  net.openid
683      15943715  1584     0.000023  fm.last
684      15943623  2190     0.000014  org.craigslist
685      15943310  878      0.000036  br.com.uol
686      15942512  2950     0.000010  ca.uwaterloo
687      15940080  374      0.000065  com.netdna-ssl
688      15938438  1730     0.000020  com.pwc
689      15935620  988      0.000033  gov.sba
690      15935207  458      0.000054  com.barnesandnoble
691      15935045  2653     0.000011  org.moma
692      15934238  2744     0.000011  org.phys
693      15933975  708      0.000041  com.docker
694      15933262  1055     0.000031  com.adage
695      15933136  1114     0.000030  com.formstack
696      15932731  3680     0.000008  cc.co
697      15932160  969      0.000033  com.pinimg
698      15932109  1670     0.000021  com.xbox
699      15931176  500      0.000050  com.cracked
700      15930661  349      0.000070  nl.google
701      15930196  204      0.000123  jp.ameblo
702      15929700  2800     0.000011  edu.hawaii
703      15929305  2102     0.000015  com.blogtalkradio
704      15929173  857      0.000036  com.delicious
705      15928584  2993     0.000010  com.123rf
706      15928210  2163     0.000014  com.britannica
707      15927984  2946     0.000010  org.greenpeace
708      15927067  1859     0.000018  com.stitcher
709      15926910  1831     0.000018  com.marketwired
710      15926723  1514     0.000025  gov.ny
711      15926477  3027     0.000010  uk.bl
712      15926226  1998     0.000016  net.boingboing
713      15926205  352      0.000070  org.opensource
714      15925493  697      0.000042  fr.amazon
715      15925417  2059     0.000015  com.templatemonster
716      15925384  1928     0.000017  com.networkworld
717      15925363  1038     0.000031  com.infusionsoft
718      15925035  1424     0.000028  com.shareasale
719      15924932  1020     0.000032  au.com.yelp
720      15924102  1025     0.000032  org.designmuseum
721      15924071  4109     0.000007  org.libreoffice
722      15922360  3072     0.000010  com.wikidot
723      15921986  1536     0.000024  com.globo
724      15921710  2960     0.000010  ca.globalnews
725      15921257  2464     0.000012  com.fox
726      15920771  771      0.000039  com.163
727      15920695  3399     0.000009  org.edx
728      15919256  1961     0.000016  com.mac
729      15918811  1842     0.000018  gov.treasury
730      15918386  2085     0.000015  com.urbandictionary
731      15917993  1528     0.000025  gov.bls
732      15917781  202      0.000126  jp.ne.hatena
733      15916788  1400     0.000028  com.arcgis
734      15915656  543      0.000046  com.technologyreview
735      15915609  1709     0.000021  com.gettyimages
736      15915236  872      0.000036  com.msdn
737      15915195  1704     0.000021  com.windows
738      15914569  1094     0.000030  com.mtnonline
739      15912776  2802     0.000011  com.knowyourmeme
740      15912472  223      0.000111  com.automattic
741      15912083  783      0.000038  com.discordapp
742      15912041  1213     0.000029  com.gloworld
743      15910756  4260     0.000007  com.trulia
744      15910422  948      0.000034  com.mysanantonio
745      15910398  238      0.000104  com.parallels
746      15910326  1092     0.000030  com.cbslocal
747      15909788  454      0.000055  com.mapbox
748      15909109  1694     0.000021  com.mtv
749      15909076  2695     0.000011  com.imageshack
750      15908308  1855     0.000018  edu.duke
751      15907269  1406     0.000028  com.accuweather
752      15907201  4130     0.000007  com.techsmith
753      15907143  1742     0.000020  uk.co.wired
754      15907097  3594     0.000008  com.makezine
755      15906461  2759     0.000011  edu.pitt
756      15906416  2077     0.000015  edu.indiana
757      15905735  1085     0.000030  edu.uah
758      15905631  1496     0.000025  me.m
759      15905303  1083     0.000030  com.judysbook
760      15905116  1494     0.000026  com.buffer
761      15904923  1612     0.000023  com.searchenginewatch
762      15904458  4619     0.000006  org.edublogs
763      15902765  1428     0.000028  com.ups
764      15902175  777      0.000038  gov.ed
765      15902033  960      0.000034  au.com.whitepages
766      15901135  2288     0.000013  uk.co.metro
767      15900981  2136     0.000015  com.ign
768      15900568  1940     0.000017  net.codecanyon
769      15899526  1987     0.000016  com.pastebin
770      15898425  2015     0.000016  com.nvidia
771      15898288  1068     0.000031  com.womentechmakers
772      15898118  3296     0.000009  org.code
773      15898010  2976     0.000010  edu.oregonstate
774      15897795  1963     0.000016  com.espn
775      15897477  1723     0.000020  org.gnome
776      15896920  833      0.000036  com.proofpoint
777      15896661  1463     0.000027  gov.dot
778      15896612  1067     0.000031  com.zoho
779      15895176  2534     0.000012  com.producthunt
780      15895150  751      0.000039  com.atlassian
781      15894496  2188     0.000014  ca.ubc
782      15894439  576      0.000045  com.us
783      15894285  1815     0.000019  com.contentmarketinginstitute
784      15894081  1501     0.000025  com.investopedia
785      15893267  2508     0.000012  com.bankofamerica
786      15893187  1685     0.000021  gov.wa
787      15892999  1921     0.000017  com.deadline
788      15892117  2544     0.000012  com.nhl
789      15892001  3657     0.000008  org.lifehack
790      15891790  1683     0.000021  com.vmware
791      15891718  2451     0.000012  com.starbucks
792      15891455  2362     0.000013  ly.visual
793      15891399  1204     0.000029  org.change
794      15891348  1923     0.000017  uk.ac.lse
795      15891194  2115     0.000015  com.magentocommerce
796      15890881  248      0.000098  org.iana
797      15890376  2631     0.000012  com.lifewire
798      15889967  1015     0.000032  com.visualstudio
799      15889397  1559     0.000024  jp.blogspot
800      15889184  1606     0.000023  com.sky
801      15889094  1601     0.000023  com.gotomeeting
802      15889052  1060     0.000031  com.bizcommunity
803      15888940  3281     0.000009  com.smashwords
804      15887168  1651     0.000022  com.mediafire
805      15886599  1578     0.000024  com.ssrn
806      15886095  1686     0.000021  net.recode
807      15885832  2593     0.000012  com.asus
808      15884393  1636     0.000022  se.haxx
809      15884138  1370     0.000029  es.amazon
810      15883786  474      0.000052  com.teamviewer
811      15883621  1787     0.000019  com.outbrain
812      15882577  699      0.000042  com.getpocket
813      15881184  3053     0.000010  com.macrumors
814      15880772  3268     0.000009  net.battle
815      15880362  1581     0.000024  com.nydailynews
816      15879641  1797     0.000019  edu.vanderbilt
817      15879231  2417     0.000013  com.thestreet
818      15878673  941      0.000034  net.azurewebsites
819      15878499  1641     0.000022  fr.lemonde
820      15878493  1620     0.000022  org.postimg
821      15877702  3393     0.000009  com.formula1
822      15877565  1455     0.000027  com.oup
823      15876621  2624     0.000012  gov.cia
824      15876304  2710     0.000011  org.olympic
825      15875628  2576     0.000012  org.7-zip
826      15875577  3906     0.000007  uk.ac.warwick
827      15875481  2573     0.000012  com.tesla
828      15873726  2350     0.000013  hk.com.google
829      15873629  1872     0.000018  com.ecwid
830      15873120  1003     0.000032  com.mlstatic
831      15872169  1663     0.000021  com.glassdoor
832      15872163  2076     0.000015  ca.utoronto
833      15871583  2303     0.000013  net.comcast
834      15870416  2118     0.000015  com.readwrite
835      15870130  2185     0.000014  ca.qc.gouv
836      15869916  1785     0.000019  gov.congress
837      15867724  947      0.000034  com.att
838      15867327  1801     0.000019  uk.co.mirror
839      15867299  495      0.000050  com.marriott
840      15866239  2956     0.000010  com.coinbase
841      15866134  2024     0.000016  com.me
842      15866035  3256     0.000009  gd.is
843      15865519  1421     0.000028  org.plos
844      15864929  1518     0.000025  com.business2community
845      15863777  1079     0.000030  com.sagepub
846      15863406  2876     0.000010  com.fineartamerica
847      15863024  1177     0.000029  me.pxlme
848      15862995  1569     0.000024  com.over-blog
849      15862909  1520     0.000025  com.techtarget
850      15862687  1970     0.000016  ru.narod
851      15862496  1770     0.000019  com.ssllabs
852      15861835  2423     0.000013  com.ge
853      15861301  1838     0.000018  org.unicef
854      15860987  2032     0.000016  int.wipo
855      15860621  179      0.000143  de.bund
856      15859232  982      0.000033  gov.house
857      15858772  2112     0.000015  uk.co.thesun
858      15858732  3078     0.000010  net.sucuri
859      15858417  2184     0.000014  com.yolasite
860      15858413  2567     0.000012  ms.1drv
861      15858271  2123     0.000015  au.com.blogspot
862      15857854  2686     0.000011  com.fool
863      15857834  3950     0.000007  com.thenation
864      15857610  4418     0.000007  edu.temple
865      15857583  3341     0.000009  com.makeuseof
866      15857018  1832     0.000018  edu.umd
867      15856943  782      0.000038  es.com.blogspot
868      15855364  1583     0.000023  com.pingdom
869      15855334  2292     0.000013  com.macworld
870      15855290  455      0.000054  jp.ne.sakura
871      15855088  2563     0.000012  com.webnode
872      15854840  3044     0.000010  com.freelancer
873      15854000  2297     0.000013  gov.nsf
874      15853872  2623     0.000012  edu.brown
875      15852601  4914     0.000006  ca.gc.statcan
876      15852595  1997     0.000016  com.getfirebug
877      15851933  2282     0.000013  com.wikispaces
878      15850737  2009     0.000016  org.jstor
879      15850691  1856     0.000018  co.angel
880      15850665  2912     0.000010  edu.tufts
881      15850001  743      0.000040  org.bitbucket
882      15849764  2243     0.000014  edu.osu
883      15849214  2121     0.000015  edu.tamu
884      15849065  2127     0.000015  org.wpmudev
885      15847636  834      0.000036  net.noscript
886      15847232  3735     0.000008  com.appleinsider
887      15847068  149      0.000195  com.ggpht
888      15846532  1763     0.000019  it.blogspot
8891584648524900.000012org.documentcloud
8901584590219260.000017com.cc
8911584458822070.000014us.zoom
8921584403217550.000020com.rollingstone
8931584349642050.000007li.paper
8941584317120520.000015edu.rutgers
8951584266923650.000013com.theonion
896158422927230.000041com.geocities
8971584215733280.000009com.indiewire
8981584170229780.000010int.esa
899158409549390.000034com.netdna-cdn
9001584066726040.000012ly.generalassemb
9011584060235710.000008edu.buffalo
902158400944850.000051br.com.google
9031584003810780.000030com.bitballoon
904158397034840.000051com.1and1
9051583897025710.000012com.sony
906158387706740.000042com.trustpilot
9071583737912000.000029org.oecd
9081583703420640.000015com.azcentral
9091583608111420.000030com.communitywalk
9101583582121970.000014org.videolan
9111583490725460.000012com.pandora
9121583328750620.000006org.anitaborg
9131583328423880.000013gov.in
9141583308130170.000010com.4shared
9151583279330580.000010org.metmuseum
9161583253321490.000015com.theknot
917158324798970.000035org.osgeo
918158301142010.000127me.line
919158276103720.000065com.bizjournals
9201582687425840.000012com.fujitsu
9211582674626740.000011com.blogs
922158262442720.000087org.debian
9231582489377700.000004edu.du
9241582376923370.000013com.bleacherreport
925158234105660.000045com.quantcast
9261582291724090.000013uk.co.express
9271582247921480.000015com.redbubble
9281582241328560.000010com.cosmopolitan
9291582221018030.000019org.cancer
9301582181711360.000030com.graphis
9311582106820580.000015de.zeit
9321582105323460.000013ca.sfu
933158207297320.000040com.wunderground
9341582067523110.000013com.convinceandconvert
9351582037228880.000010org.bitcoin
9361582019214640.000027com.usps
9371581987245450.000006com.blog
9381581959020670.000015com.salon
9391581727727790.000011com.technet
9401581726117990.000019net.daringfireball
9411581718925600.000012com.googlepages
942158166471150.000229com.bluehost
9431581655720890.000015com.w3techs
9441581626017710.000019com.calendly
9451581546930500.000010com.rottentomatoes
9461581499972680.000004com.elance
9471581453225290.000012com.createspace
948158139438550.000036com.comscore
9491581361024040.000013edu.colorado
9501581359910890.000030com.2findlocal
9511581356510570.000031org.tpr
9521581290816430.000022com.bt
953158128413650.000067com.rackcdn
9541581123434100.000009com.kotaku
9551581100837830.000008edu.syr
956158104977290.000041com.verisign
9571580990210090.000032com.tiddlywiki
9581580895717360.000020com.strikingly
9591580886761970.000005com.mercedes-benz
9601580886027360.000011com.oprah
9611580885117070.000021com.bmj
9621580836721400.000015com.popsugar
9631580740821750.000014org.hrw
9641580735520340.000016com.shareholder
9651580731618880.000017com.digicert
9661580708716450.000022com.steamcommunity
9671580703640570.000007com.pastemagazine
9681580666830990.000010com.voanews
9691580662610440.000031org.travelblog
9701580647014410.000027org.heart
9711580629029630.000010com.thrillist
9721580577241640.000007com.youcaring
9731580572910750.000031com.independent
9741580541121770.000014net.atlassian
9751580497145290.000006com.secondlife
9761580431115460.000024int.coe
9771580425223630.000013com.xerox
9781580208019070.000017com.computerworld
9791580186126080.000012com.groupon
9801580125638090.000008edu.rochester
9811580070934280.000009com.sas
9821579966320370.000016com.getsatisfaction
983157994674760.000052com.aliyuncs
9841579862665250.000005com.threatpost
9851579827118250.000018ru.spb
9861579794422240.000014com.gawker
9871579780924680.000012me.flavors
9881579773041970.000007com.slides
9891579764127940.000011com.madmimi
9901579750924110.000013com.hindustantimes
9911579697843380.000007org.teamusa
9921579657014670.000026gov.va
9931579567218700.000018mil.navy
994157954783500.000070jp.co.rakuten
995157953897200.000041com.hilton
9961579503610460.000031com.chicagotribune
9971579502414860.000026com.cafepress
9981579428323360.000013org.dyndns
9991579411322710.000014com.teenvogue
1000157936988960.000035gov.export
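Domain names in the ranking are stored reversed, so they can be turned back into ordinary names by reversing their dot-separated labels. The Python sketch below assumes each row holds five whitespace-separated fields, in order: harmonic-centrality rank, harmonic-centrality value, PageRank rank, PageRank value and reversed domain name. This is our reading of the columns, not a documented file format. The `DomainRank` class and the `parse_row` and `unreverse` helpers are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DomainRank:
    # Column order is an assumption based on the ranking table, not a spec.
    hc_rank: int      # position when sorted by harmonic centrality
    hc_value: float   # harmonic centrality score
    pr_rank: int      # position when sorted by PageRank
    pr_value: float   # PageRank score
    domain: str       # domain name restored to normal (non-reversed) order


def unreverse(rev_name: str) -> str:
    """Turn a reversed name such as 'hk.com.google' back into 'google.com.hk'."""
    return ".".join(reversed(rev_name.split(".")))


def parse_row(line: str) -> DomainRank:
    """Parse one row of five whitespace-separated fields (hypothetical helper)."""
    hc_rank, hc_val, pr_rank, pr_val, rev_domain = line.split()
    return DomainRank(int(hc_rank), float(hc_val), int(pr_rank),
                      float(pr_val), unreverse(rev_domain))


rows = [
    "774 15897795 1963 0.000016 com.espn",
    "828 15873726 2350 0.000013 hk.com.google",
    "855 15860621 179 0.000143 de.bund",
]
parsed = [parse_row(r) for r in rows]

# Domains in this sample whose PageRank position beats their harmonic-centrality position
pr_favoured = [d.domain for d in parsed if d.pr_rank < d.hc_rank]
print(pr_favoured)  # ['bund.de']
```

Because the list is ordered by harmonic centrality, comparing `pr_rank` with `hc_rank` is one quick way to spot domains that the two measures place very differently; among the three sample rows, only de.bund has a better PageRank position.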

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data will be useful for all kinds of research on ranking, graph analysis, link spam detection, and more. Let us know about your results via Common Crawl’s Google Group!