We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of June, July/August and September 2021. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases. The projects cc-webgraph and cc-pyspark include all scripts and tools required to construct the graphs. Instructions for exploring the graphs in the webgraph format are given in our collection of webgraph notebooks.

Host-level graph

The graph consists of 766 million nodes and 4.95 billion edges. Hyperlinks, HTTP redirects, and link headers are all used as edges to span the graph. All types of links are included, even purely “technical” ones pointing to images, JavaScript libraries, web fonts, etc. However, only host names with a valid IANA TLD are used. Consequently, URLs with an IP address as host component are not taken into account when building the host-level graph.

There are 701 million dangling nodes (91.6%), and the largest strongly connected component contains 47.0 million nodes (6.1%). Dangling nodes stem from:

  • hosts that have not been crawled, yet are pointed to from a link on a crawled page
  • hosts without any links pointing to a different host name
  • hosts that returned only an error page (e.g. HTTP 404)
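All three cases above lead to a node without outgoing edges. As a minimal sketch, assuming that a dangling node is simply a node with out-degree zero and that edges are given as ⟨from_id, to_id⟩ pairs, such nodes could be found like this (the function name and the toy data are illustrative only):

```python
def dangling_nodes(num_nodes, edges):
    """Return the ids of all nodes that have no outgoing edge."""
    has_outlink = [False] * num_nodes
    for from_id, _to_id in edges:
        has_outlink[from_id] = True
    return [node for node in range(num_nodes) if not has_outlink[node]]


# Toy graph with three nodes: node 2 is linked to but never links out.
toy_edges = [(0, 1), (1, 0), (0, 2)]
print(dangling_nodes(3, toy_edges))
```

For the full host-level graph a plain Python list would be rather memory-hungry; the sketch only illustrates the definition.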

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
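The conversion can be sketched in a few lines of Python. The helper names below are hypothetical and not taken from the cc-webgraph code; the sketch only mirrors the rule stated above (strip a leading www., then reverse the dot-separated labels).

```python
def reverse_host(host: str) -> str:
    """Convert a host name to reverse domain name notation."""
    if host.startswith("www."):
        host = host[len("www."):]  # a leading "www." is stripped
    return ".".join(reversed(host.split(".")))


def restore_host(rev_host: str) -> str:
    """Turn a reversed host name back into the usual notation."""
    return ".".join(reversed(rev_host.split(".")))


print(reverse_host("www.subdomain.example.com"))
print(restore_host("com.example.subdomain"))
```

Note that the stripped www. prefix cannot be recovered when converting back.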

You can download the graph and the ranks of all 766 million hosts from AWS S3 on the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/host/. Alternatively, you can use https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/host/ as a prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 42 gzip-compressed files listed in two path listings: one for the nodes (vertices), one for the edges (arcs). First, download the path listing and decompress it using gzip. Then add the prefix s3://commoncrawl/ or https://commoncrawl.s3.amazonaws.com/ to each line of the path listing to get the list of URLs needed to download the entire graph.
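As a rough sketch of this step, the following Python snippet turns a path listing into download URLs. It assumes the listing has already been downloaded to a local file; the function names and the example path are hypothetical and only illustrate the prefixing.

```python
import gzip

HTTPS_PREFIX = "https://commoncrawl.s3.amazonaws.com/"


def urls_from_lines(lines, prefix=HTTPS_PREFIX):
    """Prepend the prefix to every non-empty line of a path listing."""
    return [prefix + line.strip() for line in lines if line.strip()]


def urls_from_listing(listing_gz_path, prefix=HTTPS_PREFIX):
    """Read a gzip-compressed path listing and return the download URLs."""
    with gzip.open(listing_gz_path, "rt", encoding="utf-8") as listing:
        return urls_from_lines(listing, prefix)


# Hypothetical listing line, for illustration only.
example = ["projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/host/vertices/part-00000.txt.gz\n"]
print(urls_from_lines(example))
```

Passing prefix="s3://commoncrawl/" instead yields S3 URIs for use with S3 tooling.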

| Size | File | Description |
| --- | --- | --- |
| 4.50 GB | cc-main-2021-jun-jul-sep-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 24 vertices files |
| 20.76 GB | cc-main-2021-jun-jul-sep-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 48 edges files |
| 8.43 GB | cc-main-2021-jun-jul-sep-host.graph | graph in BVGraph format |
| 2 kB | cc-main-2021-jun-jul-sep-host.properties | |
| 9.83 GB | cc-main-2021-jun-jul-sep-host-t.graph | transpose of the graph (outlinks inverted to inlinks) |
| 2 kB | cc-main-2021-jun-jul-sep-host-t.properties | |
| 1 kB | cc-main-2021-jun-jul-sep-host.stats | WebGraph statistics |
| 11.06 GB | cc-main-2021-jun-jul-sep-host-ranks.txt.gz | harmonic centrality and pagerank |

Domain-level graph

The domain graph is built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org. Version (commit) a5b046d of the public suffix list was used (commit date 2021-10-06).
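The aggregation step can be illustrated with a small sketch. It is not the cc-webgraph implementation: the suffix set below is a tiny stand-in for the public suffix list (which also has wildcard and exception rules that are ignored here), the function names are hypothetical, and dropping links within the same domain is an assumption of this sketch.

```python
# Toy stand-in for the public suffix list, in reversed notation (assumption).
TOY_SUFFIXES = {"com", "org", "uk", "uk.co"}


def rev_host_to_rev_domain(rev_host, suffixes=TOY_SUFFIXES):
    """Map a reversed host name to its reversed pay-level domain."""
    labels = rev_host.split(".")
    # Look for the longest public suffix and keep one more label after it.
    for n in range(len(labels) - 1, 0, -1):
        if ".".join(labels[:n]) in suffixes:
            return ".".join(labels[: n + 1])
    return rev_host


def aggregate_edges(host_edges):
    """Collapse host-level edges into distinct domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        src_dom, dst_dom = rev_host_to_rev_domain(src), rev_host_to_rev_domain(dst)
        if src_dom != dst_dom:  # assumption: links inside one domain are dropped
            domain_edges.add((src_dom, dst_dom))
    return sorted(domain_edges)


host_edges = [
    ("com.example.blog", "com.example.shop"),
    ("com.example.blog", "uk.co.bbc.news"),
    ("com.example.shop", "uk.co.bbc"),
]
print(aggregate_edges(host_edges))
```

In practice a maintained public suffix list library should replace the toy suffix set.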

The domain-level graph has 88 million nodes and 1.56 billion edges. 43 million nodes (49%) are dangling nodes, and the largest strongly connected component covers 35 million nodes (40%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/domain/ or https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/domain/.

Download files of the Common Crawl Jun/Jul/Sep 2021 domain-level webgraph

| Size | File | Description |
| --- | --- | --- |
| 0.61 GB | cc-main-2021-jun-jul-sep-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩ |
| 6.34 GB | cc-main-2021-jun-jul-sep-domain-edges.txt.gz | edges ⟨from_id, to_id⟩ |
| 3.59 GB | cc-main-2021-jun-jul-sep-domain.graph | graph in BVGraph format |
| 2 kB | cc-main-2021-jun-jul-sep-domain.properties | |
| 3.44 GB | cc-main-2021-jun-jul-sep-domain-t.graph | transpose of the graph |
| 2 kB | cc-main-2021-jun-jul-sep-domain-t.properties | |
| 1 kB | cc-main-2021-jun-jul-sep-domain.stats | WebGraph statistics |
| 1.89 GB | cc-main-2021-jun-jul-sep-domain-ranks.txt.gz | harmonic centrality and pagerank |

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 88 million domain ranks is available for download.
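For working with the downloaded ranks file, a parser along the following lines may be a useful starting point. It is a sketch under assumptions: the file is taken to be tab-separated, with the first five columns in the same order as in the table on this page, and with comment or header lines starting with "#"; the names Rank, parse_rank_line and read_ranks are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Rank(NamedTuple):
    hc_rank: int      # harmonic centrality rank
    hc_value: float   # harmonic centrality value
    pr_rank: int      # PageRank rank
    pr_value: float   # PageRank value
    rev_domain: str   # reversed domain name


def parse_rank_line(line: str) -> Rank:
    """Parse one line; only the first five columns are used (assumed layout)."""
    fields = line.rstrip("\n").split("\t")
    return Rank(int(fields[0]), float(fields[1]), int(fields[2]), float(fields[3]), fields[4])


def read_ranks(lines: Iterable[str]) -> Iterator[Rank]:
    """Yield ranks, skipping empty lines and lines starting with '#' (assumed header)."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_rank_line(line)


sample = [
    "1\t31634524\t1\t0.018526\tcom.googleapis\n",
    "2\t30590448\t3\t0.013879\tcom.facebook\n",
    "3\t29344102\t2\t0.015082\tcom.google\n",
]
by_pagerank = sorted(read_ranks(sample), key=lambda r: r.pr_rank)
print([r.rev_domain for r in by_pagerank])
```

Combined with gzip.open(path, "rt"), read_ranks can stream the compressed file without loading it fully into memory.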

Top 1000 domains ranked by harmonic centrality (Jun/Jul/Sep 2021)

| harmonic centrality rank | hc value | page rank | page rank value | reversed domain name |
| --- | --- | --- | --- | --- |
| 1 | 31634524 | 1 | 0.018526 | com.googleapis |
| 2 | 30590448 | 3 | 0.013879 | com.facebook |
| 3 | 29344102 | 2 | 0.015082 | com.google |
| 4 | 26579208 | 5 | 0.007506 | org.w |
| 5 | 26490886 | 4 | 0.008344 | com.twitter |
| 6 | 26194504 | 6 | 0.007270 | com.youtube |
| 7 | 25598364 | 8 | 0.006802 | com.instagram |
| 8 | 24808618 | 7 | 0.006829 | com.googletagmanager |
| 9 | 24314334 | 9 | 0.004946 | org.gmpg |
| 10 | 23501518 | 14 | 0.003442 | com.linkedin |
| 11 | 23191984 | 11 | 0.003661 | com.gstatic |
| 12 | 22552330 | 10 | 0.004093 | com.cloudflare |
| 13 | 22306348 | 17 | 0.001939 | com.gravatar |
| 14 | 22254410 | 12 | 0.003556 | org.wordpress |
| 15 | 22159662 | 22 | 0.001575 | com.pinterest |
| 16 | 21706482 | 23 | 0.001396 | org.wikipedia |
| 17 | 21593592 | 16 | 0.001952 | com.apple |
| 18 | 21537622 | 37 | 0.001024 | com.wordpress |
| 19 | 21459498 | 27 | 0.001176 | com.vimeo |
| 20 | 21360146 | 41 | 0.000898 | be.youtu |
| 21 | 21289640 | 15 | 0.002252 | com.bootstrapcdn |
| 22 | 21211768 | 20 | 0.001717 | com.jquery |
| 23 | 21069122 | 28 | 0.001166 | com.microsoft |
| 24 | 21014760 | 54 | 0.000699 | com.blogspot |
| 25 | 20975968 | 46 | 0.000793 | com.amazon |
| 26 | 20916000 | 45 | 0.000798 | gl.goo |
| 27 | 20909476 | 50 | 0.000720 | com.wp |
| 28 | 20878568 | 62 | 0.000580 | ly.bit |
| 29 | 20832704 | 35 | 0.001053 | com.google-analytics |
| 30 | 20815940 | 33 | 0.001068 | com.amazonaws |
| 31 | 20731242 | 77 | 0.000417 | com.tumblr |
| 32 | 20705584 | 42 | 0.000841 | org.mozilla |
| 33 | 20703870 | 31 | 0.001080 | net.cloudfront |
| 34 | 20700046 | 18 | 0.001797 | com.adobe |
| 35 | 20687574 | 49 | 0.000720 | eu.europa |
| 36 | 20682862 | 19 | 0.001772 | com.github |
| 37 | 20663412 | 34 | 0.001054 | net.jsdelivr |
| 38 | 20641588 | 29 | 0.001134 | com.wixstatic |
| 39 | 20534372 | 75 | 0.000442 | com.googleusercontent |
| 40 | 20518204 | 21 | 0.001630 | com.fontawesome |
| 41 | 20499484 | 96 | 0.000366 | com.yahoo |
| 42 | 20489920 | 51 | 0.000711 | com.paypal |
| 43 | 20470068 | 57 | 0.000645 | co.t |
| 44 | 20460526 | 47 | 0.000784 | com.whatsapp |
| 45 | 20453542 | 53 | 0.000704 | com.flickr |
| 46 | 20429610 | 106 | 0.000315 | com.reddit |
| 47 | 20405542 | 72 | 0.000449 | com.medium |
| 48 | 20396096 | 38 | 0.000956 | com.googlesyndication |
| 49 | 20388958 | 48 | 0.000748 | io.github |
| 50 | 20364584 | 125 | 0.000241 | com.nytimes |
| 51 | 20363658 | 59 | 0.000616 | org.w3 |
| 52 | 20350454 | 98 | 0.000358 | com.weebly |
| 53 | 20330952 | 97 | 0.000363 | org.creativecommons |
| 54 | 20324280 | 65 | 0.000510 | com.shopify |
| 55 | 20315624 | 110 | 0.000281 | com.soundcloud |
| 56 | 20308274 | 85 | 0.000402 | me.wp |
| 57 | 20300136 | 60 | 0.000610 | org.schema |
| 58 | 20300004 | 32 | 0.001072 | ru.yandex |
| 59 | 20272874 | 132 | 0.000215 | com.forbes |
| 60 | 20266896 | 64 | 0.000527 | com.vk |
| 61 | 20216640 | 114 | 0.000268 | com.spotify |
| 62 | 20214818 | 175 | 0.000146 | com.cnn |
| 63 | 20172836 | 52 | 0.000708 | net.doubleclick |
| 64 | 20156654 | 56 | 0.000653 | com.addthis |
| 65 | 20153642 | 190 | 0.000132 | uk.co.bbc |
| 66 | 20151882 | 230 | 0.000113 | com.wsj |
| 67 | 20149822 | 135 | 0.000208 | gov.nih |
| 68 | 20146194 | 40 | 0.000924 | com.baidu |
| 69 | 20144952 | 161 | 0.000166 | com.theguardian |
| 70 | 20127172 | 151 | 0.000179 | int.who |
| 71 | 20121134 | 240 | 0.000108 | com.bloomberg |
| 72 | 20115270 | 130 | 0.000217 | org.archive |
| 73 | 20111568 | 163 | 0.000163 | com.giphy |
| 74 | 20107092 | 94 | 0.000388 | com.list-manage |
| 75 | 20100830 | 192 | 0.000130 | org.wikimedia |
| 76 | 20095176 | 55 | 0.000682 | com.macromedia |
| 77 | 20091806 | 229 | 0.000113 | com.oracle |
| 78 | 20081572 | 145 | 0.000193 | com.imdb |
| 79 | 20074580 | 187 | 0.000134 | com.businessinsider |
| 80 | 20066912 | 297 | 0.000091 | edu.mit |
| 81 | 20066436 | 300 | 0.000090 | edu.stanford |
| 82 | 20061314 | 115 | 0.000267 | com.mailchimp |
| 83 | 20059906 | 44 | 0.000800 | net.facebook |
| 84 | 20058050 | 140 | 0.000204 | us.zoom |
| 85 | 20056682 | 426 | 0.000065 | com.googleblog |
| 86 | 20053210 | 171 | 0.000153 | com.unsplash |
| 87 | 20047020 | 284 | 0.000095 | com.reuters |
| 88 | 20043722 | 185 | 0.000137 | com.imgur |
| 89 | 20035754 | 142 | 0.000199 | com.wixsite |
| 90 | 20025300 | 191 | 0.000131 | com.stackoverflow |
| 91 | 20022100 | 128 | 0.000223 | com.weibo |
| 92 | 20016560 | 162 | 0.000163 | com.issuu |
| 93 | 20016026 | 423 | 0.000065 | gov.nasa |
| 94 | 20014282 | 36 | 0.001043 | net.fbcdn |
| 95 | 20013910 | 273 | 0.000098 | com.android |
| 96 | 20012778 | 119 | 0.000259 | me.t |
| 97 | 20012366 | 169 | 0.000160 | org.ietf |
| 98 | 20004530 | 139 | 0.000206 | com.ytimg |
| 99 | 20001182 | 148 | 0.000183 | org.apache |
| 100 | 19999464 | 318 | 0.000086 | com.theverge |
| 101 | 19993026 | 325 | 0.000084 | com.slack |
| 102 | 19992850 | 258 | 0.000103 | edu.harvard |
| 103 | 19989870 | 224 | 0.000116 | com.washingtonpost |
| 104 | 19988302 | 268 | 0.000099 | com.bbc |
| 105 | 19986642 | 460 | 0.000060 | edu.cornell |
| 106 | 19984712 | 155 | 0.000174 | com.ft |
| 107 | 19979856 | 117 | 0.000261 | com.npmjs |
| 108 | 19978744 | 416 | 0.000066 | com.ted |
| 109 | 19962738 | 424 | 0.000065 | com.myspace |
| 110 | 19957322 | 352 | 0.000078 | com.wired |
| 111 | 19956296 | 526 | 0.000052 | com.livejournal |
| 112 | 19945484 | 355 | 0.000078 | com.appspot |
| 113 | 19944274 | 263 | 0.000101 | org.un |
| 114 | 19943458 | 215 | 0.000119 | org.gnu |
| 115 | 19933112 | 390 | 0.000070 | com.goodreads |
| 116 | 19932600 | 69 | 0.000476 | com.godaddy |
| 117 | 19931522 | 387 | 0.000071 | org.hbr |
| 118 | 19917440 | 309 | 0.000087 | org.npr |
| 119 | 19916612 | 322 | 0.000085 | com.prnewswire |
| 120 | 19911628 | 332 | 0.000083 | net.researchgate |
| 121 | 19910122 | 307 | 0.000088 | com.githubusercontent |
| 122 | 19909276 | 24 | 0.001390 | io.polyfill |
| 123 | 19908276 | 286 | 0.000095 | com.wiley |
| 124 | 19905960 | 247 | 0.000106 | com.tiktok |
| 125 | 19894990 | 193 | 0.000130 | com.blogger |
| 126 | 19892346 | 71 | 0.000466 | com.unpkg |
| 127 | 19884978 | 102 | 0.000322 | de.google |
| 128 | 19869556 | 432 | 0.000064 | com.gmail |
| 129 | 19869236 | 565 | 0.000049 | com.vice |
| 130 | 19868406 | 627 | 0.000045 | org.chromium |
| 131 | 19867146 | 146 | 0.000186 | gle.forms |
| 132 | 19866916 | 113 | 0.000269 | com.youtube-nocookie |
| 133 | 19866834 | 179 | 0.000142 | org.ampproject |
| 134 | 19856608 | 354 | 0.000078 | com.time |
| 135 | 19856272 | 668 | 0.000043 | edu.upenn |
| 136 | 19856040 | 370 | 0.000074 | com.example |
| 137 | 19854818 | 633 | 0.000045 | com.economist |
| 138 | 19850884 | 727 | 0.000040 | com.evernote |
| 139 | 19842434 | 500 | 0.000055 | com.steampowered |
| 140 | 19841380 | 692 | 0.000042 | google.blog |
| 141 | 19841354 | 462 | 0.000060 | com.theatlantic |
| 142 | 19840036 | 586 | 0.000047 | org.weforum |
| 143 | 19837596 | 628 | 0.000045 | com.deviantart |
| 144 | 19837072 | 239 | 0.000109 | uk.co.google |
| 145 | 19833470 | 357 | 0.000077 | org.arxiv |
| 146 | 19833402 | 395 | 0.000070 | com.scribd |
| 147 | 19831882 | 407 | 0.000067 | uk.co.telegraph |
| 148 | 19829878 | 368 | 0.000074 | com.huffingtonpost |
| 149 | 19829876 | 642 | 0.000045 | com.mysql |
| 150 | 19821800 | 555 | 0.000050 | org.worldbank |
| 151 | 19818100 | 242 | 0.000107 | com.sciencedirect |
| 152 | 19816676 | 347 | 0.000080 | com.nature |
| 153 | 19814992 | 228 | 0.000113 | com.twimg |
| 154 | 19812336 | 120 | 0.000259 | com.statcounter |
| 155 | 19810098 | 246 | 0.000106 | org.acm |
| 156 | 19805214 | 554 | 0.000050 | org.ieee |
| 157 | 19799686 | 381 | 0.000071 | com.fastcompany |
| 158 | 19796892 | 255 | 0.000104 | org.python |
| 159 | 19794906 | 701 | 0.000041 | com.apnews |
| 160 | 19790820 | 431 | 0.000064 | com.meetup |
| 161 | 19790276 | 738 | 0.000040 | com.qz |
| 162 | 19789858 | 581 | 0.000047 | com.globenewswire |
| 163 | 19788416 | 365 | 0.000074 | com.docker |
| 164 | 19788402 | 444 | 0.000062 | com.pixabay |
| 165 | 19788038 | 453 | 0.000061 | uk.co.dailymail |
| 166 | 19784442 | 321 | 0.000085 | com.springer |
| 167 | 19783552 | 251 | 0.000105 | com.bandcamp |
| 168 | 19779288 | 256 | 0.000103 | net.behance |
| 169 | 19778076 | 361 | 0.000075 | com.gitlab |
| 170 | 19768948 | 603 | 0.000046 | com.git-scm |
| 171 | 19764224 | 540 | 0.000051 | io.readthedocs |
| 172 | 19763982 | 715 | 0.000041 | com.engadget |
| 173 | 19763034 | 647 | 0.000044 | com.trello |
| 174 | 19761140 | 183 | 0.000138 | com.bing |
| 175 | 19758120 | 346 | 0.000080 | com.usatoday |
| 176 | 19757326 | 220 | 0.000117 | com.squarespace |
| 177 | 19752428 | 158 | 0.000169 | com.yelp |
| 178 | 19747884 | 353 | 0.000078 | com.dribbble |
| 179 | 19747756 | 417 | 0.000066 | com.digg |
| 180 | 19747532 | 143 | 0.000199 | com.dropbox |
| 181 | 19744818 | 276 | 0.000097 | com.ibm |
| 182 | 19734092 | 464 | 0.000060 | uk.co.independent |
| 183 | 19733310 | 398 | 0.000069 | com.w3schools |
| 184 | 19723800 | 544 | 0.000051 | ee.linktr |
| 185 | 19723334 | 674 | 0.000043 | uk.co.blogspot |
| 186 | 19723200 | 412 | 0.000067 | com.staticflickr |
| 187 | 19721228 | 551 | 0.000050 | com.pexels |
| 188 | 19721224 | 144 | 0.000194 | gov.cdc |
| 189 | 19718802 | 539 | 0.000051 | org.pbs |
| 190 | 19718388 | 775 | 0.000038 | com.stackexchange |
| 191 | 19718222 | 703 | 0.000041 | org.cambridge |
| 192 | 19717352 | 1103 | 0.000029 | org.eclipse |
| 193 | 19716696 | 43 | 0.000819 | com.fb |
| 194 | 19712516 | 678 | 0.000042 | edu.columbia |
| 195 | 19710308 | 100 | 0.000351 | com.wix |
| 196 | 19706256 | 730 | 0.000040 | edu.washington |
| 197 | 19705612 | 290 | 0.000094 | com.tinyurl |
| 198 | 19705126 | 367 | 0.000074 | com.sagepub |
| 199 | 19704258 | 673 | 0.000043 | me.about |
| 200 | 19699044 | 219 | 0.000117 | net.slideshare |
| 201 | 19694818 | 653 | 0.000044 | org.sciencemag |
| 202 | 19689094 | 294 | 0.000091 | org.pewresearch |
| 203 | 19685770 | 601 | 0.000046 | com.withgoogle |
| 204 | 19684956 | 458 | 0.000061 | com.herokuapp |
| 205 | 19675188 | 440 | 0.000063 | com.quora |
| 206 | 19673070 | 123 | 0.000246 | com.sharethis |
| 207 | 19672408 | 39 | 0.000941 | com.qq |
| 208 | 19671982 | 176 | 0.000145 | org.doi |
| 209 | 19670088 | 438 | 0.000063 | co.ibb |
| 210 | 19667674 | 614 | 0.000046 | com.newyorker |
| 211 | 19664724 | 1183 | 0.000027 | com.nike |
| 212 | 19664240 | 316 | 0.000087 | com.typeform |
| 213 | 19658254 | 248 | 0.000106 | com.outlook |
| 214 | 19654436 | 736 | 0.000040 | com.hp |
| 215 | 19654104 | 791 | 0.000037 | com.foxnews |
| 216 | 19651026 | 234 | 0.000112 | com.cloudinary |
| 217 | 19648214 | 933 | 0.000035 | edu.princeton |
| 218 | 19646574 | 572 | 0.000048 | com.moz |
| 219 | 19642260 | 437 | 0.000063 | com.getpocket |
| 220 | 19639898 | 485 | 0.000057 | com.nbcnews |
| 221 | 19638836 | 645 | 0.000044 | org.bitbucket |
| 222 | 19635872 | 202 | 0.000124 | page.g |
| 223 | 19635142 | 154 | 0.000176 | gov.privacyshield |
| 224 | 19634924 | 277 | 0.000096 | com.disqus |
| 225 | 19624936 | 141 | 0.000203 | com.opera |
| 226 | 19623614 | 463 | 0.000060 | com.airbnb |
| 227 | 19620318 | 960 | 0.000034 | com.dropboxusercontent |
| 228 | 19618624 | 372 | 0.000073 | com.force |
| 229 | 19617796 | 923 | 0.000035 | co.elastic |
| 230 | 19614006 | 214 | 0.000119 | com.wpengine |
| 231 | 19613164 | 679 | 0.000042 | org.semver |
| 232 | 19613160 | 305 | 0.000089 | com.typepad |
| 233 | 19611254 | 972 | 0.000033 | com.nypost |
| 234 | 19610798 | 726 | 0.000040 | com.ubuntu |
| 235 | 19610070 | 1263 | 0.000025 | se.haxx |
| 236 | 19605234 | 303 | 0.000089 | com.live |
| 237 | 19603428 | 683 | 0.000042 | au.net.abc |
| 238 | 19603360 | 465 | 0.000060 | com.mozilla |
| 239 | 19599764 | 382 | 0.000071 | com.criteo |
| 240 | 19593184 | 1232 | 0.000026 | uk.co.thesun |
| 241 | 19581354 | 1434 | 0.000023 | edu.rutgers |
| 242 | 19581020 | 278 | 0.000096 | com.feedburner |
| 243 | 19577288 | 969 | 0.000033 | com.politico |
| 244 | 19572414 | 1054 | 0.000030 | co.g |
| 245 | 19572372 | 1849 | 0.000019 | com.instructables |
| 246 | 19570874 | 778 | 0.000038 | com.sap |
| 247 | 19568694 | 1123 | 0.000028 | org.greenpeace |
| 248 | 19564134 | 1127 | 0.000028 | org.kernel |
| 249 | 19562280 | 1650 | 0.000022 | com.googlesource |
| 250 | 19561484 | 124 | 0.000245 | com.filesusr |
| 251 | 19559444 | 1150 | 0.000028 | com.unity3d |
| 252 | 19558056 | 682 | 0.000042 | com.freepik |
| 253 | 19557106 | 542 | 0.000051 | com.fortune |
| 254 | 19553954 | 638 | 0.000045 | uk.ac.ox |
| 255 | 19553634 | 209 | 0.000121 | org.iana |
| 256 | 19551872 | 241 | 0.000108 | com.eepurl |
| 257 | 19551398 | 958 | 0.000034 | com.ssrn |
| 258 | 19551010 | 866 | 0.000035 | com.nvidia |
| 259 | 19549544 | 1569 | 0.000022 | com.storify |
| 260 | 19549036 | 1000 | 0.000032 | com.sun |
| 261 | 19548514 | 624 | 0.000045 | uk.co.eventbrite |
| 262 | 19544466 | 829 | 0.000036 | edu.jhu |
| 263 | 19541150 | 607 | 0.000046 | net.azurewebsites |
| 264 | 19539076 | 1291 | 0.000025 | com.reverbnation |
| 265 | 19538660 | 439 | 0.000063 | gov.fda |
| 266 | 19538044 | 99 | 0.000351 | com.stripe |
| 267 | 19534860 | 927 | 0.000035 | com.podbean |
| 268 | 19532612 | 341 | 0.000082 | net.windows |
| 269 | 19530966 | 1256 | 0.000025 | uk.co.ebay |
| 270 | 19530706 | 260 | 0.000102 | com.calendly |
| 271 | 19526632 | 804 | 0.000037 | com.chrome |
| 272 | 19525016 | 1804 | 0.000020 | com.martinfowler |
| 273 | 19524922 | 926 | 0.000035 | edu.academia |
| 274 | 19523484 | 568 | 0.000049 | site.business |
| 275 | 19521568 | 231 | 0.000113 | com.office |
| 276 | 19521106 | 328 | 0.000084 | com.netdna-ssl |
| 277 | 19518874 | 253 | 0.000105 | com.newsweek |
| 278 | 19516880 | 210 | 0.000120 | tv.twitch |
| 279 | 19515182 | 1220 | 0.000026 | com.vogue |
| 280 | 19514726 | 2307 | 0.000017 | com.diigo |
| 281 | 19514662 | 1214 | 0.000026 | org.postgresql |
| 282 | 19513292 | 691 | 0.000042 | com.xinhuanet |
| 283 | 19508378 | 1368 | 0.000024 | de.mpg |
| 284 | 19508326 | 525 | 0.000052 | com.squareup |
| 285 | 19507892 | 408 | 0.000067 | org.debian |
| 286 | 19502082 | 121 | 0.000252 | com.paypalobjects |
| 287 | 19501986 | 716 | 0.000041 | gov.senate |
| 288 | 19501788 | 2392 | 0.000016 | com.pearltrees |
| 289 | 19500240 | 1147 | 0.000028 | com.500px |
| 290 | 19499046 | 343 | 0.000081 | com.googlecode |
| 291 | 19497980 | 712 | 0.000041 | org.change |
| 292 | 19497516 | 491 | 0.000056 | com.tandfonline |
| 293 | 19496848 | 67 | 0.000505 | net.akamaihd |
| 294 | 19492076 | 1265 | 0.000025 | com.aljazeera |
| 295 | 19492054 | 706 | 0.000041 | com.qualtrics |
| 296 | 19490750 | 783 | 0.000037 | com.theconversation |
| 297 | 19490388 | 1078 | 0.000029 | com.theglobeandmail |
| 298 | 19486228 | 134 | 0.000211 | de.bund |
| 299 | 19485788 | 1299 | 0.000025 | edu.illinois |
| 300 | 19485786 | 327 | 0.000084 | com.cnbc |
| 301 | 19484140 | 858 | 0.000036 | uk.co.guardian |
| 302 | 19483768 | 483 | 0.000057 | com.msn |
| 303 | 19475416 | 129 | 0.000219 | com.rawgit |
| 304 | 19472372 | 474 | 0.000059 | com.stumbleupon |
| 305 | 19469278 | 198 | 0.000125 | net.sourceforge |
| 306 | 19468982 | 356 | 0.000077 | com.optimizely |
| 307 | 19468656 | 470 | 0.000059 | org.openstreetmap |
| 308 | 19467526 | 335 | 0.000083 | com.techcrunch |
| 309 | 19465998 | 434 | 0.000064 | com.ssl-images-amazon |
| 310 | 19464188 | 1405 | 0.000023 | edu.ufl |
| 311 | 19462912 | 1395 | 0.000023 | edu.gatech |
| 312 | 19462816 | 165 | 0.000162 | com.hubspot |
| 313 | 19462736 | 310 | 0.000087 | com.mapbox |
| 314 | 19462404 | 441 | 0.000063 | com.go |
| 315 | 19459072 | 608 | 0.000046 | gov.noaa |
| 316 | 19458846 | 1470 | 0.000023 | com.channel4 |
| 317 | 19458794 | 1237 | 0.000026 | ca.sfu |
| 318 | 19457698 | 592 | 0.000047 | com.healthline |
| 319 | 19457660 | 794 | 0.000037 | org.fao |
| 320 | 19457528 | 507 | 0.000054 | ca.google |
| 321 | 19457170 | 2431 | 0.000016 | com.wattpad |
| 322 | 19454268 | 1280 | 0.000025 | uk.co.standard |
| 323 | 19452678 | 667 | 0.000043 | gov.house |
| 324 | 19451936 | 1309 | 0.000025 | uk.co.wired |
| 325 | 19450294 | 1862 | 0.000019 | com.invisionapp |
| 326 | 19449754 | 743 | 0.000039 | com.pinimg |
| 327 | 19449738 | 184 | 0.000138 | com.amazon-adsystem |
| 328 | 19449480 | 1836 | 0.000019 | org.maven |
| 329 | 19446242 | 2507 | 0.000015 | com.openai |
| 330 | 19443698 | 803 | 0.000037 | org.pypi |
| 331 | 19442142 | 262 | 0.000101 | net.azureedge |
| 332 | 19441000 | 490 | 0.000056 | com.kickstarter |
| 333 | 19439570 | 1962 | 0.000019 | uk.bl |
| 334 | 19439402 | 1236 | 0.000026 | au.com.smh |
| 335 | 19436904 | 1339 | 0.000024 | com.vanityfair |
| 336 | 19435504 | 261 | 0.000102 | uk.org.ico |
| 337 | 19434368 | 153 | 0.000176 | com.addtoany |
| 338 | 19427034 | 986 | 0.000032 | com.uk |
| 339 | 19426114 | 1097 | 0.000029 | com.scmp |
| 340 | 19424036 | 981 | 0.000033 | org.pnas |
| 341 | 19422474 | 433 | 0.000064 | com.cnet |
| 342 | 19422174 | 348 | 0.000080 | com.statista |
| 343 | 19421360 | 149 | 0.000180 | org.nodejs |
| 344 | 19418836 | 793 | 0.000037 | us.icio |
| 345 | 19416290 | 2916 | 0.000013 | com.instapaper |
| 346 | 19415752 | 489 | 0.000056 | gov.epa |
| 347 | 19405682 | 1003 | 0.000032 | com.mixcloud |
| 348 | 19404500 | 677 | 0.000042 | org.d3js |
| 349 | 19401872 | 851 | 0.000036 | com.britannica |
| 350 | 19400204 | 593 | 0.000047 | uk.gov.service |
| 351 | 19399466 | 177 | 0.000143 | org.allaboutcookies |
| 352 | 19396954 | 546 | 0.000050 | edu.berkeley |
| 353 | 19395162 | 296 | 0.000091 | me.telegram |
| 354 | 19393670 | 1436 | 0.000023 | com.irishtimes |
| 355 | 19392716 | 944 | 0.000034 | int.coe |
| 356 | 19388874 | 157 | 0.000170 | com.zendesk |
| 357 | 19388222 | 1507 | 0.000022 | org.hrc |
| 358 | 19387040 | 1251 | 0.000025 | com.history |
| 359 | 19386872 | 152 | 0.000178 | io.shields |
| 360 | 19386294 | 1686 | 0.000021 | ms.1drv |
| 361 | 19384766 | 548 | 0.000050 | com.biomedcentral |
| 362 | 19384760 | 473 | 0.000059 | com.latimes |
| 363 | 19381234 | 1062 | 0.000030 | org.jstor |
| 364 | 19381206 | 1124 | 0.000028 | com.jetbrains |
| 365 | 19376152 | 1084 | 0.000029 | org.ilo |
| 366 | 19376036 | 777 | 0.000038 | edu.psu |
| 367 | 19375334 | 137 | 0.000207 | com.youronlinechoices |
| 368 | 19374986 | 895 | 0.000035 | com.ecwid |
| 369 | 19371930 | 1067 | 0.000030 | com.brightcove |
| 370 | 19371460 | 748 | 0.000039 | it.scoop |
| 371 | 19369410 | 329 | 0.000084 | ru.ok |
| 372 | 19364910 | 1226 | 0.000026 | com.digitaltrends |
| 373 | 19362788 | 1004 | 0.000032 | uk.co.thetimes |
| 374 | 19359762 | 1404 | 0.000023 | com.thedailybeast |
| 375 | 19359122 | 1604 | 0.000022 | edu.osu |
| 376 | 19354290 | 513 | 0.000054 | edu.yale |
| 377 | 19351764 | 116 | 0.000262 | com.jimdo |
| 378 | 19349800 | 2354 | 0.000016 | com.fastcodesign |
| 379 | 19349466 | 988 | 0.000032 | uk.parliament |
| 380 | 19348128 | 456 | 0.000061 | org.freecodecamp |
| 381 | 19343488 | 2209 | 0.000018 | com.us |
| 382 | 19340214 | 573 | 0.000048 | com.deloitte |
| 383 | 19339976 | 1010 | 0.000031 | uk.co.huffingtonpost |
| 384 | 19339738 | 605 | 0.000046 | com.zdnet |
| 385 | 19339098 | 181 | 0.000139 | ru.mail |
| 386 | 19337572 | 415 | 0.000066 | com.elsevier |
| 387 | 19335810 | 1185 | 0.000027 | org.nejm |
| 388 | 19335572 | 2422 | 0.000016 | com.instructure |
| 389 | 19335456 | 427 | 0.000065 | net.imgix |
| 390 | 19334596 | 1714 | 0.000021 | com.citrix |
| 391 | 19330958 | 2590 | 0.000014 | org.aclweb |
| 392 | 19330886 | 1964 | 0.000018 | org.haskell |
| 393 | 19326754 | 649 | 0.000044 | gov.state |
| 394 | 19326634 | 1350 | 0.000024 | app.netlify |
| 395 | 19326068 | 801 | 0.000037 | com.venturebeat |
| 396 | 19323896 | 217 | 0.000118 | com.eventbrite |
| 397 | 19323876 | 622 | 0.000045 | com.seattletimes |
| 398 | 19323284 | 1868 | 0.000019 | jp.ac.u-tokyo |
| 399 | 19322408 | 3048 | 0.000012 | org.uxplanet |
| 400 | 19319794 | 1186 | 0.000027 | com.dw |
| 401 | 19318404 | 1248 | 0.000026 | org.undp |
| 402 | 19317950 | 211 | 0.000120 | com.etsy |
| 403 | 19317948 | 1763 | 0.000020 | com.itv |
| 404 | 19317548 | 233 | 0.000112 | net.php |
| 405 | 19314522 | 58 | 0.000638 | com.googleadservices |
| 406 | 19314004 | 271 | 0.000098 | com.surveymonkey |
| 407 | 19310004 | 245 | 0.000107 | org.aboutcookies |
| 408 | 19309574 | 1900 | 0.000019 | edu.vt |
| 409 | 19307434 | 2477 | 0.000015 | org.wikibooks |
| 410 | 19305780 | 687 | 0.000042 | gov.nist |
| 411 | 19297658 | 1423 | 0.000023 | com.thehindu |
| 412 | 19297142 | 1120 | 0.000028 | org.hrw |
| 413 | 19295118 | 600 | 0.000046 | com.thinkwithgoogle |
| 414 | 19294574 | 1682 | 0.000021 | gov.usembassy |
| 415 | 19294274 | 656 | 0.000044 | com.intel |
| 416 | 19292884 | 1725 | 0.000021 | int.unfccc |
| 417 | 19291806 | 394 | 0.000070 | com.ebay |
| 418 | 19289908 | 2419 | 0.000016 | google.ai |
| 419 | 19286562 | 2270 | 0.000017 | com.netvibes |
| 420 | 19285156 | 2562 | 0.000015 | io.material |
| 421 | 19283106 | 2235 | 0.000017 | ly.rebrand |
| 422 | 19280870 | 1954 | 0.000019 | org.archlinux |
| 423 | 19280114 | 878 | 0.000035 | uk.co.pinterest |
| 424 | 19279216 | 2833 | 0.000013 | org.doctorswithoutborders |
| 425 | 19277590 | 2069 | 0.000018 | org.accessnow |
| 426 | 19272506 | 1482 | 0.000023 | com.findlaw |
| 427 | 19271938 | 800 | 0.000037 | net.clickbank |
| 428 | 19270876 | 3536 | 0.000010 | com.viki |
| 429 | 19269508 | 980 | 0.000033 | edu.brookings |
| 430 | 19265638 | 2363 | 0.000016 | co.carrd |
| 431 | 19265518 | 2950 | 0.000012 | org.neocities |
| 432 | 19261700 | 1294 | 0.000025 | com.wikia |
| 433 | 19260608 | 657 | 0.000044 | com.mashable |
| 434 | 19260346 | 798 | 0.000037 | com.thelancet |
| 435 | 19259074 | 974 | 0.000033 | uk.ac.cam |
| 436 | 19258964 | 2416 | 0.000016 | org.rsf |
| 437 | 19258786 | 1841 | 0.000019 | net.daringfireball |
| 438 | 19255684 | 708 | 0.000041 | com.canva |
| 439 | 19255674 | 495 | 0.000056 | gov.whitehouse |
| 440 | 19253374 | 204 | 0.000122 | com.salesforce |
| 441 | 19252328 | 1191 | 0.000027 | com.thenextweb |
| 442 | 19251934 | 2076 | 0.000018 | com.france24 |
| 443 | 19251914 | 534 | 0.000052 | io.codepen |
| 444 | 19250670 | 2896 | 0.000013 | com.laweekly |
| 445 | 19250196 | 1218 | 0.000026 | com.licdn |
| 446 | 19248894 | 2839 | 0.000013 | cc.uxdesign |
| 447 | 19245136 | 2368 | 0.000016 | edu.kit |
| 448 | 19240312 | 1071 | 0.000030 | watch.fb |
| 449 | 19239224 | 2583 | 0.000014 | org.scala-lang |
| 450 | 19238926 | 2513 | 0.000015 | au.com.theage |
| 451 | 19234622 | 2644 | 0.000014 | com.hubpages |
| 452 | 19232860 | 1533 | 0.000022 | ch.ipcc |
| 453 | 19232358 | 1092 | 0.000029 | com.digitaloceanspaces |
| 454 | 19231078 | 2435 | 0.000015 | org.vim |
| 455 | 19227408 | 1721 | 0.000021 | com.refinery29 |
| 456 | 19227102 | 292 | 0.000093 | net.secureservercdn |
| 457 | 19225996 | 769 | 0.000038 | com.marketwatch |
| 458 | 19224690 | 2608 | 0.000014 | app.web |
| 459 | 19224666 | 1681 | 0.000022 | org.unwomen |
| 460 | 19223622 | 2447 | 0.000015 | com.fineartamerica |
| 461 | 19222710 | 2566 | 0.000015 | nl.blogspot |
| 462 | 19219678 | 537 | 0.000051 | edu.cmu |
| 463 | 19219662 | 505 | 0.000054 | fr.free |
| 464 | 19217122 | 831 | 0.000036 | com.box |
| 465 | 19216944 | 1159 | 0.000027 | com.imageshack |
| 466 | 19215358 | 2409 | 0.000016 | edu.usf |
| 467 | 19214138 | 2396 | 0.000016 | nz.co.nzherald |
| 468 | 19212598 | 2758 | 0.000013 | com.smashwords |
| 469 | 19208544 | 364 | 0.000074 | net.datatables |
| 470 | 19207930 | 419 | 0.000066 | com.nationalgeographic |
| 471 | 19207832 | 287 | 0.000095 | com.iubenda |
| 472 | 19206220 | 2470 | 0.000015 | re.appsto |
| 473 | 19205746 | 235 | 0.000112 | com.adnxs |
| 474 | 19203826 | 2509 | 0.000015 | org.gentoo |
| 475 | 19203684 | 1982 | 0.000018 | com.voanews |
| 476 | 19203064 | 2498 | 0.000015 | com.superuser |
| 477 | 19201090 | 379 | 0.000072 | com.businesswire |
| 478 | 19200162 | 643 | 0.000045 | int.wipo |
| 479 | 19199426 | 2201 | 0.000018 | org.biorxiv |
| 480 | 19198874 | 1451 | 0.000023 | org.amnesty |
| 481 | 19198454 | 2436 | 0.000015 | com.oregonlive |
| 482 | 19198448 | 2033 | 0.000018 | org.nobelprize |
| 483 | 19197824 | 81 | 0.000414 | net.jsfiddle |
| 484 | 19197384 | 1963 | 0.000019 | com.ew |
| 485 | 19195648 | 1008 | 0.000031 | com.arstechnica |
| 486 | 19194700 | 1826 | 0.000020 | org.ocks |
| 487 | 19194656 | 265 | 0.000101 | com.aliyuncs |
| 488 | 19192314 | 2649 | 0.000014 | com.dezeen |
| 489 | 19191524 | 2678 | 0.000014 | org.transparency |
| 490 | 19191474 | 925 | 0.000035 | org.mediawiki |
| 491 | 19191020 | 2853 | 0.000013 | com.scribblelive |
| 492 | 19190866 | 1357 | 0.000024 | io.gitlab |
| 493 | 19190776 | 1683 | 0.000021 | org.aiga |
| 494 | 19188994 | 1985 | 0.000018 | uk.gov.tfl |
| 495 | 19186250 | 428 | 0.000064 | com.adweek |
| 496 | 19185528 | 1941 | 0.000019 | org.unep |
| 497 | 19185444 | 430 | 0.000064 | org.js |
| 498 | 19184030 | 371 | 0.000073 | com.atlassian |
| 499 | 19183274 | 2282 | 0.000017 | com.foreignpolicy |
| 500 | 19181288 | 2696 | 0.000014 | org.democracynow |
| 501 | 19181160 | 985 | 0.000032 | com.webs |
| 502 | 19181124 | 1198 | 0.000026 | com.wetransfer |
| 503 | 19179532 | 1369 | 0.000024 | org.altervista |
| 504 | 19177814 | 2501 | 0.000015 | google.research |
| 505 | 19177102 | 2959 | 0.000012 | za.co.iol |
| 506 | 19175572 | 1028 | 0.000031 | com.slate |
| 507 | 19175528 | 2713 | 0.000014 | org.cpj |
| 508 | 19174456 | 2016 | 0.000018 | org.example |
| 509 | 19173782 | 2390 | 0.000016 | com.googlegroups |
| 510 | 19167902 | 243 | 0.000107 | com.naver |
| 511 | 19164304 | 2266 | 0.000017 | net.openid |
| 512 | 19164154 | 3058 | 0.000012 | com.deepmind |
| 513 | 19164022 | 269 | 0.000099 | org.drupal |
| 514 | 19163716 | 264 | 0.000101 | gov.ca |
| 515 | 19163714 | 414 | 0.000067 | com.livechatinc |
| 516 | 19155586 | 2328 | 0.000016 | com.washingtontimes |
| 517 | 19153082 | 637 | 0.000045 | com.cbsnews |
| 518 | 19151824 | 759 | 0.000038 | com.oreilly |
| 519 | 19150012 | 3034 | 0.000012 | com.podomatic |
| 520 | 19149538 | 664 | 0.000043 | gov.loc |
| 521 | 19147634 | 136 | 0.000208 | org.networkadvertising |
| 522 | 19146702 | 718 | 0.000041 | com.buzzfeed |
| 523 | 19144896 | 1333 | 0.000024 | link.page |
| 524 | 19143440 | 838 | 0.000036 | com.pcmag |
| 525 | 19142224 | 956 | 0.000034 | com.verisign |
| 526 | 19133476 | 2892 | 0.000013 | com.thoughtworks |
| 527 | 19132758 | 2669 | 0.000014 | uk.co.timesonline |
| 528 | 19131302 | 274 | 0.000097 | com.getbootstrap |
| 529 | 19130872 | 3082 | 0.000012 | com.mariadb |
| 530 | 19129650 | 1116 | 0.000028 | com.jekyllrb |
| 531 | 19128992 | 938 | 0.000034 | com.vox |
| 532 | 19127508 | 127 | 0.000234 | info.aboutads |
| 533 | 19122362 | 492 | 0.000056 | com.patreon |
| 534 | 19122034 | 2488 | 0.000015 | com.curbed |
| 535 | 19121736 | 514 | 0.000054 | it.placehold |
| 536 | 19121622 | 1807 | 0.000020 | com.ascentlawfirm |
| 537 | 19118802 | 227 | 0.000114 | to.amzn |
| 538 | 19117462 | 580 | 0.000047 | com.visualstudio |
| 539 | 19117426 | 1281 | 0.000025 | com.smashingmagazine |
| 540 | 19116926 | 499 | 0.000055 | com.sxsw |
| 541 | 19116922 | 978 | 0.000033 | com.hootsuite |
| 542 | 19116384 | 282 | 0.000095 | gov.ftc |
| 543 | 19113570 | 2336 | 0.000016 | com.snopes |
| 544 | 19110698 | 1392 | 0.000023 | com.upwork |
| 545 | 19109920 | 1388 | 0.000024 | com.haaretz |
| 546 | 19108142 | 1701 | 0.000021 | com.firebaseapp |
| 547 | 19104686 | 920 | 0.000035 | com.zoho |
| 548 | 19103768 | 3023 | 0.000012 | org.peta |
| 549 | 19100220 | 1174 | 0.000027 | com.att |
| 550 | 19097502 | 1687 | 0.000021 | com.techrepublic |
| 551 | 19097348 | 1776 | 0.000020 | com.surveygizmo |
| 552 | 19097050 | 2394 | 0.000016 | com.treehugger |
| 553 | 19095294 | 3359 | 0.000011 | com.letterboxd |
| 554 | 19094856 | 3025 | 0.000012 | gov.anl |
| 555 | 19094436 | 2568 | 0.000015 | com.kaggle |
| 556 | 19091430 | 2479 | 0.000015 | fm.omny |
| 557 | 19091422 | 2999 | 0.000012 | com.bangkokpost |
| 558 | 19089542 | 478 | 0.000058 | gov.irs |
| 559 | 19086962 | 1790 | 0.000020 | ca.bc.gov |
| 560 | 19086880 | 823 | 0.000036 | com.emarketer |
| 561 | 19086854 | 1173 | 0.000027 | com.mediaplex |
| 562 | 19086370 | 259 | 0.000103 | uk.co.amazon |
| 563 | 19084140 | 2737 | 0.000014 | int.au |
| 564 | 19083740 | 2514 | 0.000015 | no.google |
| 565 | 19083612 | 3618 | 0.000010 | com.newgrounds |
| 566 | 19083596 | 186 | 0.000134 | jp.co.yahoo |
| 567 | 19082852 | 3108 | 0.000012 | org.hypotheses |
| 568 | 19082332 | 337 | 0.000082 | mp.mailchi |
| 569 | 19082228 | 2808 | 0.000013 | com.usmagazine |
| 570 | 19080648 | 1674 | 0.000022 | com.routledge |
| 571 | 19078904 | 2881 | 0.000013 | org.polymer-project |
| 572 | 19077732 | 2446 | 0.000015 | org.unctad |
| 573 | 19077060 | 180 | 0.000140 | com.caniuse |
| 574 | 19076142 | 344 | 0.000080 | com.onesignal |
| 575 | 19073714 | 2714 | 0.000014 | int.interpol |
| 576 | 19072522 | 3248 | 0.000011 | org.elasticsearch |
| 577 | 19072218 | 620 | 0.000045 | com.entrepreneur |
| 578 | 19072108 | 2550 | 0.000015 | uk.gov.metoffice |
| 579 | 19071016 | 2921 | 0.000013 | org.jenkins-ci |
| 580 | 19070052 | 632 | 0.000045 | com.samsung |
| 581 | 19070008 | 1367 | 0.000024 | org.unicode |
| 582 | 19069412 | 3381 | 0.000011 | uk.mod |
| 583 | 19069358 | 2715 | 0.000014 | org.mozillazine |
| 584 | 19067032 | 3148 | 0.000011 | edu.ucpress |
| 585 | 19066706 | 1156 | 0.000027 | com.gizmodo |
| 586 | 19064356 | 1901 | 0.000019 | org.americanbar |
| 587 | 19063358 | 3593 | 0.000010 | org.scala-sbt |
| 588 | 19062892 | 330 | 0.000084 | ai.shortpixel |
| 589 | 19059914 | 2912 | 0.000013 | in.indiatoday |
| 590 | 19058702 | 301 | 0.000090 | gg.discord |
| 591 | 19058544 | 3687 | 0.000010 | jp.riken |
| 592 | 19056820 | 2440 | 0.000015 | com.timesofisrael |
| 593 | 19056518 | 3064 | 0.000012 | com.manta |
| 594 | 19056166 | 785 | 0.000037 | com.fandom |
| 595 | 19056052 | 1274 | 0.000025 | com.sfgate |
| 596 | 19054812 | 1994 | 0.000018 | com.knightlab |
| 597 | 19053684 | 2042 | 0.000018 | org.donorbox |
| 598 | 19053668 | 2251 | 0.000017 | eu.politico |
| 599 | 19051874 | 2494 | 0.000015 | org.gnupg |
| 600 | 19051168 | 84 | 0.000402 | me.ogp |
| 601 | 19050784 | 575 | 0.000048 | com.cisco |
| 602 | 19050146 | 2757 | 0.000013 | uk.ac.york |
| 603 | 19048588 | 1189 | 0.000027 | com.buffer |
| 604 | 19048534 | 3065 | 0.000012 | uk.org.wwf |
| 605 | 19047192 | 929 | 0.000035 | com.variety |
| 606 | 19045218 | 3517 | 0.000010 | com.flightradar24 |
| 607 | 19044416 | 3387 | 0.000011 | com.flock |
| 608 | 19044368 | 579 | 0.000048 | com.sedo |
| 609 | 19044084 | 871 | 0.000035 | com.libsyn |
| 610 | 19043162 | 2561 | 0.000015 | com.thenation |
| 611 | 19042992 | 2569 | 0.000015 | com.monday |
| 612 | 19042226 | 482 | 0.000057 | com.arcgis |
| 613 | 19042098 | 3223 | 0.000011 | net.inquirer |
| 614 | 19038824 | 2699 | 0.000014 | com.real |
| 615 | 19036762 | 2195 | 0.000018 | com.secondlife |
| 616 | 19035414 | 732 | 0.000040 | org.unesco |
| 617 | 19035292 | 932 | 0.000035 | com.wikihow |
| 618 | 19034196 | 2609 | 0.000014 | uk.ac.leeds |
| 619 | 19033676 | 80 | 0.000415 | com.livestream |
| 620 | 19031566 | 3100 | 0.000012 | org.cato |
| 621 | 19029288 | 2716 | 0.000014 | org.sonatype |
| 622 | 19028098 | 3169 | 0.000011 | com.intensedebate |
| 623 | 19028006 | 1089 | 0.000029 | com.symantec |
| 624 | 19027462 | 4048 | 0.000009 | org.jw |
| 625 | 19027208 | 2793 | 0.000013 | com.wayfair |
| 626 | 19026864 | 1922 | 0.000019 | com.scene7 |
| 627 | 19026030 | 76 | 0.000421 | com.messenger |
| 628 | 19025656 | 1689 | 0.000021 | org.coursera |
| 629 | 19024592 | 1196 | 0.000026 | edu.umn |
| 630 | 19024030 | 3093 | 0.000012 | org.rferl |
| 631 | 19023590 | 2512 | 0.000015 | org.wikidata |
| 632 | 19022464 | 644 | 0.000045 | com.psychologytoday |
| 633 | 19021606 | 3091 | 0.000012 | com.vancouversun |
| 634 | 19021016 | 2496 | 0.000015 | uk.org.nationaltrust |
| 635 | 19020356 | 1553 | 0.000022 | ly.ow |
| 636 | 19020300 | 1264 | 0.000025 | edu.ucsd |
| 637 | 19020198 | 2905 | 0.000013 | tr.com.aa |
| 638 | 19018958 | 3767 | 0.000010 | it.polito |
| 639 | 19017674 | 3389 | 0.000011 | org.sourcewatch |
| 640 | 19017264 | 3207 | 0.000011 | ch.qos |
| 641 | 19017144 | 3785 | 0.000010 | jp.ac.kobe-u |
| 642 | 19016500 | 1659 | 0.000022 | com.speakerdeck |
| 643 | 19015604 | 2974 | 0.000012 | com.sciencealert |
| 644 | 19015034 | 636 | 0.000045 | com.photobucket |
| 645 | 19015026 | 3087 | 0.000012 | com.hsbc |
| 646 | 19014412 | 3852 | 0.000009 | edu.uah |
| 647 | 19013454 | 188 | 0.000133 | com.jimcdn |
| 648 | 19012700 | 1195 | 0.000027 | com.rollingstone |
| 649 | 19012558 | 2897 | 0.000013 | org.osce |
| 650 | 19012150 | 4205 | 0.000009 | com.gust |
| 651 | 19009874 | 1684 | 0.000021 | org.webkit |
| 652 | 19009396 | 945 | 0.000034 | com.shutterstock |
| 653 | 19008674 | 2945 | 0.000012 | com.townnews |
| 654 | 19008318 | 2650 | 0.000014 | org.wri |
| 655 | 19006196 | 520 | 0.000053 | com.inc |
| 656 | 19004672 | 648 | 0.000044 | com.gartner |
| 657 | 19004226 | 2502 | 0.000015 | ru.rg |
| 658 | 19003942 | 2648 | 0.000014 | io.bower |
| 659 | 19003780 | 3371 | 0.000011 | net.thedailystar |
| 660 | 19003722 | 2564 | 0.000015 | net.dwcdn |
| 661 | 19003574 | 2770 | 0.000013 | com.articulate |
| 662 | 19003510 | 221 | 0.000117 | com.myshopify |
| 663 | 19002868 | 172 | 0.000151 | jp.co.google |
| 664 | 19002548 | 1354 | 0.000024 | gov.uspto |
| 665 | 19000824 | 998 | 0.000032 | edu.ucla |
| 666 | 18998632 | 685 | 0.000042 | com.investopedia |
| 667 | 18998502 | 3161 | 0.000011 | com.mongabay |
| 668 | 18997700 | 532 | 0.000052 | com.aol |
| 669 | 18994210 | 2352 | 0.000016 | ca.citizenlab |
| 670 | 18993878 | 1284 | 0.000025 | com.today |
| 671 | 18992626 | 160 | 0.000167 | org.whatwg |
| 672 | 18992320 | 1048 | 0.000030 | com.smartadserver |
| 673 | 18992054 | 2995 | 0.000012 | org.pewforum |
| 674 | 18991266 | 2997 | 0.000012 | org.sierraclub |
| 675 | 18991062 | 2964 | 0.000012 | net.vnexpress |
| 676 | 18990848 | 1069 | 0.000030 | com.about |
| 677 | 18989274 | 3081 | 0.000012 | uk.co.spectator |
| 678 | 18988482 | 480 | 0.000058 | com.dmca |
| 679 | 18987042 | 1462 | 0.000023 | ly.cutt |
| 680 | 18986890 | 3224 | 0.000011 | ru.interfax |
| 681 | 18986546 | 3670 | 0.000010 | uk.co.zoopla |
| 682 | 18985004 | 2843 | 0.000013 | org.iucnredlist |
| 683 | 18984682 | 194 | 0.000130 | com.tripadvisor |
| 684 | 18984256 | 3473 | 0.000010 | fm.audioboo |
| 685 | 18983654 | 2762 | 0.000013 | uk.co.bbci |
| 686 | 18983098 | 3416 | 0.000011 | edu.sjsu |
| 687 | 18982290 | 1444 | 0.000023 | edu.northwestern |
| 688 | 18982040 | 543 | 0.000051 | com.googleoptimize |
| 689 | 18980160 | 2817 | 0.000013 | int.iom |
| 690 | 18979598 | 1386 | 0.000024 | edu.umd |
| 691 | 18978832 | 629 | 0.000045 | org.eff |
| 692 | 18978224 | 2297 | 0.000017 | uk.org.ofcom |
| 693 | 18978038 | 2450 | 0.000015 | int.reliefweb |
| 694 | 18977708 | 3502 | 0.000010 | com.torontosun |
| 695 | 18975348 | 493 | 0.000056 | com.indeed |
| 696 | 18973238 | 1657 | 0.000022 | com.nngroup |
| 697 | 18972898 | 351 | 0.000078 | com.constantcontact |
| 698 | 18972774 | 1780 | 0.000020 | co.lpages |
| 699 | 18972434 | 1329 | 0.000024 | edu.utexas |
| 700 | 18971302 | 3386 | 0.000011 | com.iconarchive |
| 701 | 18971240 | 312 | 0.000087 | com.pubmatic |
| 702 | 18971078 | 1043 | 0.000030 | org.reactjs |
| 703 | 18969428 | 975 | 0.000033 | edu.umich |
| 704 | 18968428 | 1219 | 0.000026 | com.tableau |
| 705 | 18968208 | 1914 | 0.000019 | com.hatenablog |
| 706 | 18967574 | 1138 | 0.000028 | com.chicagotribune |
| 707 | 18967196 | 3999 | 0.000009 | info.spain |
| 708 | 18965740 | 547 | 0.000050 | gov.copyright |
| 709 | 18965442 | 4171 | 0.000009 | org.gwtproject |
| 710 | 18964656 | 650 | 0.000044 | com.netflix |
| 711 | 18963714 | 753 | 0.000039 | net.adform |
| 712 | 18961462 | 2847 | 0.000013 | uk.ac.jisc |
| 713 | 18961172 | 2865 | 0.000013 | com.ringcentral |
| 714 | 18960292 | 976 | 0.000033 | com.redhat |
| 715 | 18960280 | 3237 | 0.000011 | com.city-data |
| 716 | 18959938 | 2982 | 0.000012 | uk.org.stonewall |
| 717 | 18958764 | 3646 | 0.000010 | za.co.timeslive |
| 718 | 18957560 | 3989 | 0.000009 | com.programmableweb |
| 719 | 18957204 | 403 | 0.000068 | com.bigcommerce |
| 720 | 18957166 | 3053 | 0.000012 | com.flippa |
| 721 | 18955710 | 3293 | 0.000011 | com.multiscreensite |
| 722 | 18955294 | 2719 | 0.000014 | com.bloglines |
| 723 | 18954560 | 2629 | 0.000014 | mp.j |
| 724 | 18952520 | 2807 | 0.000013 | uk.org.rspb |
| 725 | 18952358 | 2949 | 0.000012 | com.foreignaffairs |
| 726 | 18952314 | 2206 | 0.000018 | co.pcdn |
| 727 | 18951462 | 3549 | 0.000010 | in.theprint |
| 728 | 18951006 | 4227 | 0.000009 | com.symbaloo |
| 729 | 18950992 | 4478 | 0.000008 | com.algorithmia |
| 730 | 18950356 | 1733 | 0.000021 | com.billboard |
| 731 | 18949998 | 742 | 0.000039 | com.splashthat |
| 732 | 18948332 | 3255 | 0.000011 | com.cleantechnica |
| 733 | 18947926 | 3604 | 0.000010 | com.businessdailyafrica |
| 734 | 18947204 | 1108 | 0.000028 | com.dell |
| 735 | 18947190 | 2825 | 0.000013 | com.yell |
| 736 | 18947008 | 443 | 0.000062 | net.hubspot |
| 737 | 18946944 | 3838 | 0.000010 | org.rfa |
| 738 | 18946618 | 3495 | 0.000010 | za.co.mg |
| 739 | 18945038 | 4346 | 0.000008 | com.apsense |
| 740 | 18944960 | 1772 | 0.000020 | com.alibabagroup |
| 741 | 18944660 | 2267 | 0.000017 | to.dev |
| 742 | 18944558 | 3471 | 0.000010 | ru.mid |
| 743 | 18944274 | 3498 | 0.000010 | com.itsnicethat |
| 744 | 18942634 | 527 | 0.000052 | org.unicef |
| 745 | 18942358 | 2364 | 0.000016 | net.noscript |
| 746 | 18940606 | 1390 | 0.000024 | com.techradar |
| 747 | 18938348 | 1857 | 0.000019 | edu.uci |
| 748 | 18937064 | 1118 | 0.000028 | com.windowsphone |
| 749 | 18936626 | 2734 | 0.000014 | com.doubleclickbygoogle |
| 750 | 18936326 | 3524 | 0.000010 | org.350 |
| 751 | 18935080 | 3076 | 0.000012 | org.aei |
| 752 | 18934558 | 3075 | 0.000012 | gov.arts |
| 753 | 18934248 | 671 | 0.000043 | gov.sec |
| 754 | 18933682 | 2298 | 0.000017 | com.urbandictionary |
| 755 | 18933596 | 3925 | 0.000009 | com.forbesimg |
| 756 | 18933310 | 487 | 0.000056 | com.fc2 |
| 757 | 18931436 | 3352 | 0.000011 | com.brill |
| 758 | 18931406 | 2491 | 0.000015 | com.infoworld |
| 759 | 18930782 | 1352 | 0.000024 | com.bazaarvoice |
| 760 | 18930350 | 3600 | 0.000010 | de.uni-konstanz |
| 761 | 18930232 | 1187 | 0.000027 | com.alexa |
| 762 | 18929982 | 2240 | 0.000017 | org.linuxfoundation |
| 763 | 18929738 | 3580 | 0.000010 | edu.dukeupress |
| 764 | 18929218 | 4038 | 0.000009 | com.hotfrog |
| 765 | 18928884 | 512 | 0.000054 | com.mckinsey |
| 766 | 18928710 | 2539 | 0.000015 | org.crossref |
| 767 | 18928328 | 3893 | 0.000009 | com.environmentalleader |
| 768 | 18927900 | 2236 | 0.000017 | tv.ustream |
| 769 | 18927290 | 1101 | 0.000029 | fm.last |
| 770 | 18926906 | 1951 | 0.000019 | com.businessweek |
| 771 | 18926816 | 413 | 0.000067 | org.opensource |
| 772 | 18925308 | 750 | 0.000039 | org.whatbrowser |
| 773 | 18925012 | 1296 | 0.000025 | com.merriam-webster |
| 774 | 18924400 | 425 | 0.000065 | com.proofpoint |
| 775 | 18923302 | 3196 | 0.000011 | com.alchemer |
| 776 | 18922680 | 3684 | 0.000010 | com.arfadia |
| 777 | 18922120 | 1692 | 0.000021 | com.kinstacdn |
| 778 | 18921334 | 3619 | 0.000010 | com.ecowatch |
| 779 | 18921302 | 2215 | 0.000018 | net.leadpages |
| 780 | 18920188 | 3448 | 0.000010 | com.total |
| 781 | 18920112 | 3878 | 0.000009 | uk.org.npg |
| 782 | 18919836 | 3054 | 0.000012 | io.crates |
| 783 | 18919320 | 2517 | 0.000015 | com.lego |
| 784 | 18919150 | 503 | 0.000055 | com.wufoo |
| 785 | 18915478 | 2742 | 0.000014 | io.redis |
| 786 | 18914956 | 2289 | 0.000017 | uk.co.metro |
| 787 | 18914016 | 4161 | 0.000009 | uk.co.theweek |
| 788 | 18913930 | 2204 | 0.000018 | gd.is |
| 789 | 18913640 | 4196 | 0.000009 | io.coda |
| 790 | 18913482 | 196 | 0.000128 | com.hackerone |
| 791 | 18912094 | 1483 | 0.000023 | com.msdn |
| 792 | 18911500 | 156 | 0.000170 | org.nginx |
| 793 | 18911390 | 3197 | 0.000011 | com.klokantech |
| 794 | 18911286 | 1668 | 0.000022 | com.sky |
| 795 | 18910142 | 4272 | 0.000009 | de.fernuni-hagen |
| 796 | 18909064 | 1832 | 0.000020 | de.hessen |
| 797 | 18908834 | 953 | 0.000034 | com.adroll |
| 798 | 18907960 | 1986 | 0.000018 | com.windows |
| 799 | 18906254 | 4316 | 0.000008 | com.tupalo |
| 800 | 18904242 | 218 | 0.000118 | org.icann |
| 801 | 18904004 | 1148 | 0.000028 | net.atlassian |
| 802 | 18903442 | 4234 | 0.000009 | net.ccm |
| 803 | 18903372 | 3808 | 0.000010 | com.oilprice |
| 804 | 18903128 | 1956 | 0.000019 | org.khanacademy |
| 805 | 18902896 | 4072 | 0.000009 | net.iwpr |
| 806 | 18902418 | 324 | 0.000084 | eu.youronlinechoices |
| 807 | 18902318 | 3890 | 0.000009 | uk.ac.mmu |
| 808 | 18901968 | 1700 | 0.000021 | edu.usc |
| 809 | 18901674 | 1146 | 0.000028 | com.playstation |
| 810 | 18900854 | 4268 | 0.000009 | uk.ac.ceh |
| 811 | 18900236 | 1060 | 0.000030 | com.akamai |
| 812 | 18897754 | 2851 | 0.000013 | com.hindustantimes |
| 813 | 18896694 | 979 | 0.000033 | gov.fcc |
| 814 | 18896368 | 990 | 0.000032 | com.gumroad |
| 815 | 18896128 | 4383 | 0.000008 | et.com.google |
| 816 | 18894652 | 3952 | 0.000009 | com.theoutline |
| 817 | 18894586 | 4348 | 0.000008 | org.cgsociety |
| 818 | 18892502 | 4093 | 0.000009 | edu.mtsu |
| 819 | 18892446 | 2636 | 0.000014 | com.html5rocks |
| 820 | 18892222 | 4578 | 0.000008 | com.blockchair |
| 821 | 18891572 | 3603 | 0.000010 | org.spie |
| 822 | 18891278 | 1208 | 0.000026 | at.gv.bka |
| 823 | 18890814 | 3768 | 0.000010 | uk.co.lrb |
| 824 | 18888196 | 410 | 0.000067 | com.heroku |
| 825 | 18888062 | 815 | 0.000036 | edu.wisc |
| 826 | 18887938 | 1009 | 0.000031 | com.yoast |
| 827 | 18887842 | 3807 | 0.000010 | za.co.dailymaverick |
| 828 | 18885614 | 1864 | 0.000019 | org.json |
| 829 | 18885194 | 2969 | 0.000012 | org.thinkprogress |
| 830 | 18884928 | 700 | 0.000041 | com.feedly |
| 831 | 18883562 | 4278 | 0.000009 | com.ingress |
| 832 | 18883012 | 3706 | 0.000010 | google.design |
| 833 | 18882952 | 4666 | 0.000008 | com.bmwblog |
| 834 | 18881526 | 3862 | 0.000009 | com.thepetitionsite |
| 835 | 18881312 | 3993 | 0.000009 | in.bbc |
| 836 | 18880860 | 2237 | 0.000017 | com.w3techs |
| 837 | 18880532 | 3257 | 0.000011 | org.carbonbrief |
| 838 | 18880428 | 272 | 0.000098 | jp.ne.hatena |
| 839 | 18880124 | 2614 | 0.000014 | ru.mk |
| 840 | 18880064 | 1924 | 0.000019 | edu.hbs |
| 841 | 18879486 | 935 | 0.000034 | com.pingdom |
| 842 | 18878780 | 1199 | 0.000026 | com.ycombinator |
| 843 | 18876422 | 4065 | 0.000009 | com.gifer |
| 844 | 18876050 | 3709 | 0.000010 | uk.org.amnesty |
| 845 | 18875700 | 3961 | 0.000009 | com.africanews |
| 846 | 18875478 | 4535 | 0.000008 | com.the-dots |
| 847 | 18875350 | 992 | 0.000032 | so.notion |
| 848 | 18869944 | 3375 | 0.000011 | org.commondreams |
| 849 | 18869250 | 4236 | 0.000009 | com.flutterwave |
| 850 | 18868302 | 3731 | 0.000010 | org.refworld |
| 851 | 18866446 | 3285 | 0.000011 | uk.gov.charitycommission |
8521886624443960.000008com.newsru
8531886581035070.000010uk.org.oxfam
8541886578642060.000009uk.org.somersethouse
8551886459024840.000015in.scroll
8561886451414100.000023com.intuit
8571886442842570.000009uk.co.harpercollins
858188639423310.000084jp.ameblo
8591886362233700.000011ke.co.nation
8601886358010910.000029com.insurancejournal
8611886336033940.000011com.cbsistatic
8621886330226970.000014com.spreaker
8631886235827200.000014com.springernature
8641886182422280.000017com.firefox
8651886122246880.000008co.iglobal
8661886096047140.000008io.devdocs
8671886024627050.000014com.verywellhealth
868188600205380.000051com.booking
869188598385350.000051com.gofundme
8701885979812340.000026com.indiegogo
8711885932847810.000008com.kdpcommunity
8721885840423440.000016build.bazel
8731885801611190.000028com.foursquare
874188574525450.000051com.snapchat
87518856948930.000390com.trustpilot
8761885654625860.000014com.avast
8771885616018130.000020com.pcworld
8781885527841490.000009com.hybris
8791885498249370.000008com.jetphotos
880188540147110.000041com.yandex
8811885396210230.000031com.css-tricks
8821885390213600.000024org.golang
8831885390041680.000009uk.ac.mdx
8841885359626730.000014com.flipboard
8851885298828480.000013com.discovery
8861885182239040.000009at.kleinezeitung
887188514223020.000090de.amazon
888188505961110.000276me.wa
889188505584290.000064com.skype
8901885029411340.000028com.scientificamerican
8911884864625160.000015org.raspberrypi
8921884777044260.000008com.armorgames
8931884759617580.000020com.fiverr
894188471107210.000040org.iso
8951884646427070.000014com.codecademy
8961884394837270.000010net.middleeasteye
8971884287630570.000012org.man7
8981884107445310.000008com.e-estonia
8991884047418270.000020fr.blogspot
9001884028210470.000030com.huffpost
9011883994447250.000008net.gebco
9021883982247780.000008com.slite
9031883977617990.000020com.visa
904188392587220.000040com.newrelic
9051883721441630.000009com.cnsnews
906188360048070.000037br.com.uol
9071883449433880.000011com.lithub
9081883403639420.000009net.bostonreview
9091883205210220.000031au.com.google
9101883161629580.000012com.hackernoon
9111883149028020.000013com.unity
912188312302490.000106net.2mdn
9131883109012240.000026gov.usgs
914188307144000.000068com.semrush
9151882973238550.000009com.indexmundi
916188296425160.000053com.dailymotion
917188288726860.000042com.accenture
918188273146650.000043org.poynter
9191882657420860.000018org.aclu
9201882633639740.000009org.jython
9211882579211520.000027com.searchengineland
9221882521240030.000009com.inthesetimes
9231882491017530.000020com.over-blog
924188244065180.000053nl.google
9251882412833240.000011de.bfarm
9261882354411260.000028com.techtarget
9271882310644170.000008za.co.ewn
9281882269442510.000009uk.co.bristolpost
9291882210047990.000008community.studiopress
930188218008460.000036gov.justice
9311882068217070.000021com.technologyreview
9321882067239230.000009com.recyclenow
9331881975042930.000009lb.com.dailystar
934188193283840.000071com.bitly
9351881908841010.000009org.occrp
9361881905837860.000010com.theyworkforyou
9371881879840220.000009org.ifaw
938188186481680.000161com.jimstatic
939188182429660.000033sh.brew
9401881806643430.000008com.yahoosites
9411881760423210.000016com.fool
9421881736022810.000017com.pastebin
9431881692836070.000010com.gr-assets
9441881524238720.000009com.climatechangenews
9451881388647710.000008in.ac.iith
946188135867020.000041org.plos
9471881281240680.000009com.chamberofcommerce
9481881260644380.000008us.tuugo
9491881234411420.000028com.buzzsprout
9501881211011220.000028com.timeanddate
951188114187600.000038com.discordapp
9521881108211530.000027com.sitepoint
9531880979845790.000008com.desmogblog
954188096385960.000047com.aliexpress
9551880890625220.000015com.sendgrid
9561880797241280.000009uk.ac.rcplondon
9571880793420040.000018com.ssllabs
9581880773839550.000009org.soilassociation
9591880659213590.000024com.xkcd
960188061925360.000051gov.hhs
9611880558835840.000010com.hearstapps
9621880515012890.000025com.searchenginejournal
963188044584420.000062me.fb
9641880413247110.000008net.sott
9651880362845940.000008com.gpsvisualizer
966188035482130.000120com.discord
9671880275818200.000020org.mitre
9681880191647440.000008com.natureindex
9691880137434990.000010uk.org.rspca
9701880132240870.000009org.c2es
9711880129638240.000010com.qgiv
9721880128837990.000010ug.co.monitor
9731880049245960.000008com.lacartes
974188003841780.000142com.xing
9751879996639730.000009com.svbtle
9761879894438750.000009uk.org.savethechildren
9771879893246160.000008com.slurl
9781879833224660.000015com.sophos
9791879795621980.000018com.twilio
9801879700846620.000008za.co.moneyweb
9811879628842650.000009com.menafn
9821879604838210.000010org.usip
9831879557245840.000008com.power-technology
9841879536846010.000008org.heartland
985187948346750.000042com.usnews
9861879420025570.000015org.usenix
9871879396228990.000013net.privacypolicytemplate
9881879390040810.000009org.theecologist
9891879297041320.000009org.neweconomics
9901879268012900.000025com.netlify
9911879195441890.000009com.businessgreen
9921879075246110.000008org.monthlyreview
9931879045023690.000016uk.ac.ed
9941879019422450.000017ch.ethz
995187899742670.000099com.nielsen
9961878930226740.000014ca.uwaterloo
9971878881042840.000009org.unep-wcmc
9981878770240420.000009org.ramsar
9991878695243840.000008com.googlelabs
10001878680846340.000008org.berkeleyearth
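
The host names listed above are written in reverse domain name notation, so a reader may want to turn them back into the usual order. A minimal Python sketch of that conversion follows; the helper name `unreverse_host` is hypothetical and not part of the cc-webgraph tools. Note that a stripped leading www. cannot be recovered this way.

```python
def unreverse_host(host_rev: str) -> str:
    """Convert reverse domain name notation back to the usual order.

    Hypothetical helper for illustration only; it simply reverses the
    dot-separated labels, e.g. "uk.co.metro" becomes "metro.co.uk".
    """
    return ".".join(reversed(host_rev.split(".")))


# A few reversed host names taken from the listing above.
for host_rev in ["org.unicef", "uk.co.metro", "de.uni-konstanz"]:
    print(host_rev, "->", unreverse_host(host_rev))
```

Applying the same function twice returns the original string, so it can also be used to produce the reversed form from a regular host name.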

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data proves useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!