We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of October, November/December 2021 and January 2022. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases. You may also visit the projects cc-webgraph and cc-pyspark which include all scripts and tools required to construct the graphs. Instructions to explore the graphs in the webgraph format are given in our collection of webgraph notebooks.

Host-level graph

The graph consists of 384 million nodes and 2.47 billion edges. Hyperlinks, HTTP redirects, and link headers are all used as edges to span the graph. All types of links are included, even purely “technical” ones pointing to images, JavaScript libraries, web fonts, etc. However, only host names with a valid IANA TLD are used. Consequently, URLs with an IP address as host component are not taken into account for building the host-level graph.

There are 326 million dangling nodes (84.6%), and the largest strongly connected component contains 45.2 million nodes (11.7%). Dangling nodes stem from

  • hosts that have not been crawled, yet are pointed to from a link on a crawled page
  • hosts without any links pointing to a different host name
  • or hosts that returned only an error page (e.g. HTTP 404)
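
As a rough illustration of how such a count can be derived, the sketch below assumes the usual definition of a dangling node (a node without any outgoing edge) and works on a small invented edge list of ⟨from_id, to_id⟩ pairs. The function name and the toy data are assumptions made for this example and are not part of the release tooling.

```python
def dangling_nodes(num_nodes, edges):
    """Return the ids of all nodes that have no outgoing edge."""
    has_outlink = [False] * num_nodes
    for from_id, _to_id in edges:
        has_outlink[from_id] = True
    return [node for node in range(num_nodes) if not has_outlink[node]]


# Toy graph with 5 hosts (ids 0..4); the edges are invented for illustration.
# Node 3 is linked to but never links out (e.g. an uncrawled host),
# node 4 has no links at all.
toy_edges = [(0, 1), (0, 2), (1, 0), (2, 3)]

dangling = dangling_nodes(5, toy_edges)
share = 100.0 * len(dangling) / 5
print(dangling, f"{share:.1f}%")
```

In the toy graph, nodes 3 and 4 are dangling, which is 40% of all nodes.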

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
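
The naming rule can be sketched in a few lines of Python. The helper name `reverse_host` is hypothetical, and the function covers only the two steps described above (stripping a leading www. and reversing the label order); any further normalization done by the actual pipeline is not reproduced here.

```python
def reverse_host(host):
    """Strip a leading 'www.' and reverse the order of the dot-separated labels."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


print(reverse_host("www.subdomain.example.com"))
```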

You can download the graph and the ranks of all 384 million hosts from AWS S3 on the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/host/ (this requires an account on AWS). Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/host/ as a prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 96 gzip-compressed files listed in two path listings – one for the nodes (vertices), one for the edges (arcs). First, download the path listing and decompress it using “gzip -d” or “gunzip”. By adding the prefix s3://commoncrawl/ or https://data.commoncrawl.org/ to each line in the path listing, you get the list of URLs to download the entire graph.
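
The prefixing step might look as follows in Python. This is a minimal sketch: it assumes the listing holds one relative path per line, the function name is hypothetical, and the sample listing content is made up so that the example runs without network access.

```python
import gzip

HTTPS_PREFIX = "https://data.commoncrawl.org/"
S3_PREFIX = "s3://commoncrawl/"


def urls_from_path_listing(gz_bytes, prefix=HTTPS_PREFIX):
    """Decompress a gzip-compressed path listing and prefix every non-empty line."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [prefix + line.strip() for line in text.splitlines() if line.strip()]


# Made-up listing content for illustration only; a real listing contains
# the actual paths of the vertices or edges files.
sample_listing = gzip.compress(
    b"path/to/part-00000.txt.gz\npath/to/part-00001.txt.gz\n"
)

for url in urls_from_path_listing(sample_listing):
    print(url)
```

Passing `prefix=S3_PREFIX` yields s3:// URLs instead, for use with tools that read from S3 directly.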

Download files of the Common Crawl Oct/Nov/Jan 2021-2022 host-level webgraph

| Size | File | Description |
| --- | --- | --- |
| 2.66 GB | cc-main-2021-22-oct-nov-jan-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 32 vertices files |
| 11.76 GB | cc-main-2021-22-oct-nov-jan-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 64 edges files |
| 5.32 GB | cc-main-2021-22-oct-nov-jan-host.graph | graph in BVGraph format |
| 2 kB | cc-main-2021-22-oct-nov-jan-host.properties | |
| 5.78 GB | cc-main-2021-22-oct-nov-jan-host-t.graph | transpose of the graph (outlinks inverted to inlinks) |
| 2 kB | cc-main-2021-22-oct-nov-jan-host-t.properties | |
| 1 kB | cc-main-2021-22-oct-nov-jan-host.stats | WebGraph statistics |
| 6.38 GB | cc-main-2021-22-oct-nov-jan-host-ranks.txt.gz | harmonic centrality and pagerank |

Domain-level graph

The domain graph is built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org. Version (commit) 68b67d3 of the public suffix list was used (commit date 2022-03-04).
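
To make the aggregation step more concrete, here is a simplified sketch in Python. It is not the cc-webgraph implementation. The tiny `TOY_SUFFIXES` set stands in for the real public suffix list (whose wildcard and exception rules are ignored here), all function names are hypothetical, and dropping duplicate and intra-domain links is an assumption made for the example.

```python
# Assumption: a tiny stand-in for the public suffix list, written in normal
# (non-reversed) order. The real list is far larger and has extra rule types.
TOY_SUFFIXES = {"com", "org", "uk", "co.uk"}


def pay_level_domain(rev_host, suffixes=TOY_SUFFIXES):
    """Map a reversed host name to its reversed pay-level domain.

    The PLD is the longest matching public suffix plus one more label.
    Returns None if no suffix matches or the host is itself a suffix.
    """
    labels = rev_host.split(".")
    longest = 0
    for n in range(1, len(labels) + 1):
        # In reversed notation the suffix is a prefix of the label list.
        suffix = ".".join(reversed(labels[:n]))
        if suffix in suffixes:
            longest = n
    if longest == 0 or longest == len(labels):
        return None
    return ".".join(labels[: longest + 1])


def aggregate_edges(host_edges):
    """Collapse host-level edges into unique domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        src_pld, dst_pld = pay_level_domain(src), pay_level_domain(dst)
        # Assumption: skip unresolvable hosts and links within one domain.
        if src_pld is None or dst_pld is None or src_pld == dst_pld:
            continue
        domain_edges.add((src_pld, dst_pld))
    return sorted(domain_edges)


# Invented host-level edges between reversed host names.
host_edges = [
    ("com.example.blog", "uk.co.bbc.news"),
    ("com.example.shop", "uk.co.bbc"),
    ("com.example.blog", "com.example.shop"),
]
print(aggregate_edges(host_edges))
```

In this toy input, the three host-level links collapse into a single domain-level edge from com.example to uk.co.bbc.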

The domain-level graph has 90 million nodes and 1.55 billion edges. 45 million nodes (50%) are dangling nodes, and the largest strongly connected component covers 36 million nodes (40%).
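
For readers unfamiliar with the term, a strongly connected component is a set of nodes in which every node can reach every other node by following links. The sketch below finds the size of the largest one with Kosaraju's algorithm, which uses both the graph and its transpose. It is an illustrative, in-memory version on an invented toy graph and not the tooling behind the published figures.

```python
def largest_scc_size(num_nodes, edges):
    """Size of the largest strongly connected component (Kosaraju's algorithm)."""
    graph = [[] for _ in range(num_nodes)]
    transpose = [[] for _ in range(num_nodes)]
    for u, v in edges:
        graph[u].append(v)
        transpose[v].append(u)

    # Pass 1: iterative DFS on the graph, recording nodes in order of completion.
    visited = [False] * num_nodes
    order = []
    for start in range(num_nodes):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, 0)]
        while stack:
            node, idx = stack.pop()
            if idx < len(graph[node]):
                stack.append((node, idx + 1))
                nxt = graph[node][idx]
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, 0))
            else:
                order.append(node)

    # Pass 2: DFS on the transpose in reverse completion order;
    # every search tree found here is exactly one component.
    assigned = [False] * num_nodes
    largest = 0
    for start in reversed(order):
        if assigned[start]:
            continue
        assigned[start] = True
        size = 0
        stack = [start]
        while stack:
            node = stack.pop()
            size += 1
            for nxt in transpose[node]:
                if not assigned[nxt]:
                    assigned[nxt] = True
                    stack.append(nxt)
        largest = max(largest, size)
    return largest


# Invented toy graph: nodes 0, 1, 2 form a cycle; 3 and 4 hang off it.
toy_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
size = largest_scc_size(5, toy_edges)
print(size, f"{100.0 * size / 5:.0f}%")
```

Graphs with tens of millions of nodes call for compressed representations such as the BVGraph files listed in this post, rather than Python lists.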

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/domain/ or on https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/domain/.

Download files of the Common Crawl Oct/Nov/Jan 2021-2022 domain-level webgraph

| Size | File | Description |
| --- | --- | --- |
| 0.62 GB | cc-main-2021-22-oct-nov-jan-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩ |
| 6.36 GB | cc-main-2021-22-oct-nov-jan-domain-edges.txt.gz | edges ⟨from_id, to_id⟩ |
| 3.65 GB | cc-main-2021-22-oct-nov-jan-domain.graph | graph in BVGraph format |
| 2 kB | cc-main-2021-22-oct-nov-jan-domain.properties | |
| 3.53 GB | cc-main-2021-22-oct-nov-jan-domain-t.graph | transpose of the graph |
| 2 kB | cc-main-2021-22-oct-nov-jan-domain-t.properties | |
| 1 kB | cc-main-2021-22-oct-nov-jan-domain.stats | WebGraph statistics |
| 1.93 GB | cc-main-2021-22-oct-nov-jan-domain-ranks.txt.gz | harmonic centrality and pagerank |

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 90 million domain ranks is available for download.

Top 1000 domains ranked by harmonic centrality (Oct/Nov/Jan 2021-2022)

| harmonic centrality rank | harmonic centrality value | PageRank rank | PageRank value | reversed domain name |
| --- | --- | --- | --- | --- |
| 1 | 31448418 | 1 | 0.017921 | com.googleapis |
| 2 | 30105086 | 3 | 0.013006 | com.facebook |
| 3 | 28991726 | 2 | 0.014028 | com.google |
| 4 | 26233886 | 6 | 0.007154 | org.w |
| 5 | 26230152 | 4 | 0.008081 | com.twitter |
| 6 | 25864994 | 8 | 0.006261 | com.youtube |
| 7 | 25325250 | 7 | 0.006558 | com.instagram |
| 8 | 24665160 | 5 | 0.007716 | com.googletagmanager |
| 9 | 24097224 | 9 | 0.004716 | org.gmpg |
| 10 | 23219788 | 11 | 0.003948 | com.gstatic |
| 11 | 23145196 | 13 | 0.003418 | com.linkedin |
| 12 | 22256010 | 10 | 0.004364 | com.cloudflare |
| 13 | 21953630 | 17 | 0.001942 | com.gravatar |
| 14 | 21841462 | 21 | 0.001594 | com.pinterest |
| 15 | 21786278 | 14 | 0.003223 | org.wordpress |
| 16 | 21441268 | 25 | 0.001417 | org.wikipedia |
| 17 | 21311496 | 16 | 0.002057 | com.apple |
| 18 | 21176702 | 33 | 0.001077 | com.wordpress |
| 19 | 21045484 | 32 | 0.001141 | com.vimeo |
| 20 | 21032900 | 15 | 0.002070 | com.bootstrapcdn |
| 21 | 20995456 | 41 | 0.000913 | be.youtu |
| 22 | 20796928 | 18 | 0.001658 | com.jquery |
| 23 | 20726342 | 28 | 0.001191 | com.microsoft |
| 24 | 20673758 | 45 | 0.000789 | com.blogspot |
| 25 | 20634464 | 23 | 0.001451 | io.polyfill |
| 26 | 20617114 | 47 | 0.000775 | gl.goo |
| 27 | 20589400 | 49 | 0.000736 | com.amazon |
| 28 | 20486426 | 64 | 0.000550 | ly.bit |
| 29 | 20476754 | 29 | 0.001178 | com.wixstatic |
| 30 | 20455806 | 50 | 0.000729 | com.wp |
| 31 | 20452992 | 22 | 0.001551 | net.cloudfront |
| 32 | 20445298 | 40 | 0.000962 | com.amazonaws |
| 33 | 20410144 | 43 | 0.000863 | org.mozilla |
| 34 | 20403210 | 31 | 0.001162 | net.jsdelivr |
| 35 | 20395958 | 51 | 0.000721 | eu.europa |
| 36 | 20389728 | 37 | 0.000994 | com.google-analytics |
| 37 | 20343648 | 20 | 0.001595 | com.fontawesome |
| 38 | 20336692 | 91 | 0.000384 | com.tumblr |
| 39 | 20333242 | 19 | 0.001648 | com.adobe |
| 40 | 20324398 | 24 | 0.001421 | com.github |
| 41 | 20181488 | 75 | 0.000480 | com.googleusercontent |
| 42 | 20150384 | 55 | 0.000684 | com.flickr |
| 43 | 20122758 | 104 | 0.000326 | com.yahoo |
| 44 | 20112952 | 57 | 0.000670 | com.paypal |
| 45 | 20106820 | 48 | 0.000752 | io.github |
| 46 | 20102750 | 106 | 0.000314 | com.reddit |
| 47 | 20060046 | 117 | 0.000268 | com.soundcloud |
| 48 | 20049174 | 38 | 0.000966 | com.googlesyndication |
| 49 | 20043166 | 81 | 0.000425 | com.medium |
| 50 | 20007290 | 53 | 0.000703 | org.w3 |
| 51 | 19994710 | 127 | 0.000231 | com.nytimes |
| 52 | 19974404 | 62 | 0.000611 | co.t |
| 53 | 19956144 | 102 | 0.000338 | com.weebly |
| 54 | 19955606 | 114 | 0.000277 | com.spotify |
| 55 | 19925440 | 58 | 0.000656 | com.whatsapp |
| 56 | 19906786 | 34 | 0.001038 | ru.yandex |
| 57 | 19901846 | 85 | 0.000401 | org.creativecommons |
| 58 | 19894214 | 136 | 0.000208 | org.archive |
| 59 | 19870846 | 183 | 0.000139 | com.cnn |
| 60 | 19863762 | 63 | 0.000608 | org.schema |
| 61 | 19855684 | 60 | 0.000645 | com.addthis |
| 62 | 19851344 | 146 | 0.000194 | com.forbes |
| 63 | 19836582 | 203 | 0.000127 | uk.co.bbc |
| 64 | 19834544 | 70 | 0.000513 | com.shopify |
| 65 | 19811758 | 234 | 0.000113 | com.washingtonpost |
| 66 | 19808664 | 69 | 0.000523 | com.vk |
| 67 | 19804862 | 185 | 0.000138 | com.bing |
| 68 | 19803622 | 147 | 0.000193 | gov.cdc |
| 69 | 19802914 | 157 | 0.000172 | int.who |
| 70 | 19798680 | 92 | 0.000383 | me.wp |
| 71 | 19780956 | 44 | 0.000809 | net.doubleclick |
| 72 | 19760738 | 141 | 0.000201 | gov.nih |
| 73 | 19749646 | 59 | 0.000649 | com.macromedia |
| 74 | 19748198 | 71 | 0.000506 | com.unpkg |
| 75 | 19747664 | 260 | 0.000103 | net.researchgate |
| 76 | 19734110 | 252 | 0.000106 | com.wsj |
| 77 | 19733152 | 293 | 0.000093 | edu.stanford |
| 78 | 19731674 | 232 | 0.000115 | com.imdb |
| 79 | 19724760 | 188 | 0.000134 | org.wikimedia |
| 80 | 19712934 | 39 | 0.000965 | net.fbcdn |
| 81 | 19700958 | 206 | 0.000124 | com.businessinsider |
| 82 | 19690114 | 143 | 0.000200 | com.dropbox |
| 83 | 19687820 | 284 | 0.000095 | edu.mit |
| 84 | 19683060 | 100 | 0.000363 | com.list-manage |
| 85 | 19682338 | 308 | 0.000089 | com.tinyurl |
| 86 | 19665976 | 27 | 0.001204 | org.apache |
| 87 | 19657446 | 163 | 0.000157 | com.theguardian |
| 88 | 19651458 | 299 | 0.000092 | com.android |
| 89 | 19647768 | 445 | 0.000067 | com.quora |
| 90 | 19643018 | 165 | 0.000156 | org.doi |
| 91 | 19636594 | 328 | 0.000085 | com.go |
| 92 | 19629826 | 240 | 0.000109 | com.bloomberg |
| 93 | 19629762 | 274 | 0.000098 | edu.harvard |
| 94 | 19627444 | 501 | 0.000060 | com.msn |
| 95 | 19625870 | 164 | 0.000157 | com.issuu |
| 96 | 19625646 | 254 | 0.000106 | com.oracle |
| 97 | 19624678 | 344 | 0.000083 | com.springer |
| 98 | 19621946 | 149 | 0.000190 | com.wixsite |
| 99 | 19615182 | 140 | 0.000203 | us.zoom |
| 100 | 19612484 | 131 | 0.000220 | com.npmjs |
| 101 | 19608812 | 111 | 0.000294 | me.t |
| 102 | 19608314 | 334 | 0.000084 | com.slack |
| 103 | 19602354 | 124 | 0.000241 | com.mailchimp |
| 104 | 19596192 | 244 | 0.000108 | com.stackoverflow |
| 105 | 19592312 | 303 | 0.000091 | com.reuters |
| 106 | 19587800 | 321 | 0.000087 | com.techcrunch |
| 107 | 19586192 | 505 | 0.000059 | com.myspace |
| 108 | 19586126 | 241 | 0.000109 | com.twimg |
| 109 | 19584238 | 166 | 0.000155 | com.giphy |
| 110 | 19583908 | 292 | 0.000093 | com.example |
| 111 | 19577384 | 52 | 0.000709 | com.fb |
| 112 | 19576726 | 167 | 0.000153 | com.yelp |
| 113 | 19576020 | 169 | 0.000151 | com.office |
| 114 | 19572502 | 341 | 0.000083 | com.prnewswire |
| 115 | 19567966 | 190 | 0.000133 | com.unsplash |
| 116 | 19564492 | 108 | 0.000309 | de.google |
| 117 | 19561016 | 300 | 0.000091 | com.wiley |
| 118 | 19555072 | 46 | 0.000785 | net.facebook |
| 119 | 19552510 | 268 | 0.000101 | org.un |
| 120 | 19549098 | 256 | 0.000105 | com.sciencedirect |
| 121 | 19548162 | 489 | 0.000061 | com.latimes |
| 122 | 19547508 | 594 | 0.000050 | com.livejournal |
| 123 | 19547452 | 145 | 0.000196 | gle.forms |
| 124 | 19544300 | 466 | 0.000063 | uk.co.telegraph |
| 125 | 19543690 | 402 | 0.000078 | com.nature |
| 126 | 19542384 | 340 | 0.000083 | org.npr |
| 127 | 19541420 | 484 | 0.000062 | com.ted |
| 128 | 19535332 | 514 | 0.000057 | edu.berkeley |
| 129 | 19533106 | 647 | 0.000046 | com.vice |
| 130 | 19531328 | 237 | 0.000110 | org.gnu |
| 131 | 19529798 | 198 | 0.000130 | org.ietf |
| 132 | 19527852 | 714 | 0.000042 | uk.ac.cam |
| 133 | 19524938 | 434 | 0.000071 | com.time |
| 134 | 19522382 | 289 | 0.000094 | com.bbc |
| 135 | 19517460 | 497 | 0.000060 | com.goodreads |
| 136 | 19513616 | 412 | 0.000076 | org.arxiv |
| 137 | 19510034 | 306 | 0.000090 | com.cnbc |
| 138 | 19508184 | 144 | 0.000197 | com.ytimg |
| 139 | 19492206 | 700 | 0.000043 | edu.columbia |
| 140 | 19483642 | 433 | 0.000071 | com.sagepub |
| 141 | 19475908 | 153 | 0.000186 | com.ft |
| 142 | 19474592 | 173 | 0.000149 | org.acm |
| 143 | 19473316 | 323 | 0.000086 | com.githubusercontent |
| 144 | 19473132 | 438 | 0.000069 | com.cnet |
| 145 | 19468042 | 120 | 0.000254 | com.youtube-nocookie |
| 146 | 19458540 | 409 | 0.000077 | com.wired |
| 147 | 19455634 | 205 | 0.000125 | com.imgur |
| 148 | 19454210 | 518 | 0.000057 | uk.co.dailymail |
| 149 | 19449628 | 202 | 0.000127 | com.blogger |
| 150 | 19448116 | 79 | 0.000459 | com.godaddy |
| 151 | 19443158 | 26 | 0.001330 | cn.gov.miit |
| 152 | 19437758 | 343 | 0.000083 | com.theverge |
| 153 | 19435568 | 509 | 0.000058 | edu.yale |
| 154 | 19434236 | 171 | 0.000150 | org.ampproject |
| 155 | 19426446 | 472 | 0.000063 | com.nationalgeographic |
| 156 | 19423252 | 281 | 0.000096 | com.squarespace |
| 157 | 19419880 | 804 | 0.000037 | org.chromium |
| 158 | 19416530 | 696 | 0.000043 | uk.ac.ox |
| 159 | 19413138 | 488 | 0.000061 | com.googleblog |
| 160 | 19407848 | 458 | 0.000064 | gov.whitehouse |
| 161 | 19407614 | 304 | 0.000090 | com.usatoday |
| 162 | 19407228 | 499 | 0.000060 | com.staticflickr |
| 163 | 19407066 | 829 | 0.000036 | com.evernote |
| 164 | 19404734 | 192 | 0.000131 | com.hubspot |
| 165 | 19404414 | 476 | 0.000062 | org.ieee |
| 166 | 19402398 | 587 | 0.000051 | org.worldbank |
| 167 | 19402004 | 407 | 0.000077 | com.dribbble |
| 168 | 19400214 | 132 | 0.000219 | com.statcounter |
| 169 | 19399796 | 512 | 0.000058 | ee.linktr |
| 170 | 19399358 | 527 | 0.000056 | edu.cornell |
| 171 | 19395500 | 123 | 0.000243 | com.sharethis |
| 172 | 19388246 | 519 | 0.000057 | com.theatlantic |
| 173 | 19387454 | 408 | 0.000077 | com.docker |
| 174 | 19385182 | 673 | 0.000044 | com.git-scm |
| 175 | 19383304 | 214 | 0.000122 | com.wpengine |
| 176 | 19383030 | 797 | 0.000037 | org.sciencemag |
| 177 | 19381750 | 811 | 0.000037 | com.arstechnica |
| 178 | 19381474 | 275 | 0.000098 | gg.discord |
| 179 | 19381222 | 159 | 0.000167 | com.zendesk |
| 180 | 19379042 | 264 | 0.000101 | uk.co.google |
| 181 | 19376664 | 138 | 0.000206 | me.line |
| 182 | 19373526 | 246 | 0.000107 | uk.co.amazon |
| 183 | 19373448 | 602 | 0.000050 | com.zdnet |
| 184 | 19372154 | 239 | 0.000109 | net.slideshare |
| 185 | 19369046 | 325 | 0.000086 | com.appspot |
| 186 | 19368866 | 702 | 0.000042 | com.economist |
| 187 | 19368620 | 737 | 0.000041 | org.cambridge |
| 188 | 19368112 | 571 | 0.000052 | com.cisco |
| 189 | 19367144 | 792 | 0.000038 | edu.washington |
| 190 | 19366344 | 648 | 0.000046 | org.weforum |
| 191 | 19361760 | 668 | 0.000044 | com.box |
| 192 | 19361692 | 624 | 0.000047 | org.pbs |
| 193 | 19356220 | 479 | 0.000062 | org.python |
| 194 | 19355604 | 436 | 0.000070 | com.huffingtonpost |
| 195 | 19354284 | 226 | 0.000117 | com.outlook |
| 196 | 19353112 | 542 | 0.000055 | com.typepad |
| 197 | 19350044 | 288 | 0.000095 | org.pewresearch |
| 198 | 19345610 | 558 | 0.000054 | com.cbsnews |
| 199 | 19342774 | 267 | 0.000101 | net.windows |
| 200 | 19334638 | 556 | 0.000054 | com.deloitte |
| 201 | 19333688 | 1058 | 0.000028 | com.rollingstone |
| 202 | 19333260 | 496 | 0.000060 | com.pixabay |
| 203 | 19333260 | 554 | 0.000054 | gov.usda |
| 204 | 19332390 | 720 | 0.000042 | google.blog |
| 205 | 19331286 | 572 | 0.000052 | site.business |
| 206 | 19330802 | 544 | 0.000055 | uk.co.independent |
| 207 | 19325162 | 1223 | 0.000025 | ly.cutt |
| 208 | 19323700 | 42 | 0.000906 | com.qq |
| 209 | 19323606 | 760 | 0.000039 | com.apnews |
| 210 | 19323302 | 745 | 0.000040 | ca.cbc |
| 211 | 19322838 | 639 | 0.000046 | org.unesco |
| 212 | 19322434 | 493 | 0.000061 | com.gitlab |
| 213 | 19307948 | 704 | 0.000042 | com.mysql |
| 214 | 19307400 | 611 | 0.000049 | com.pexels |
| 215 | 19302186 | 586 | 0.000051 | gov.loc |
| 216 | 19297622 | 728 | 0.000041 | edu.upenn |
| 217 | 19296262 | 713 | 0.000042 | edu.wisc |
| 218 | 19295886 | 485 | 0.000062 | com.getpocket |
| 219 | 19295020 | 553 | 0.000054 | com.nbcnews |
| 220 | 19294010 | 451 | 0.000065 | com.fastcompany |
| 221 | 19292592 | 1381 | 0.000022 | com.ikea |
| 222 | 19291294 | 178 | 0.000143 | com.tripadvisor |
| 223 | 19286758 | 1171 | 0.000026 | org.eclipse |
| 224 | 19285934 | 676 | 0.000043 | com.scribd |
| 225 | 19285000 | 768 | 0.000039 | com.shutterstock |
| 226 | 19284178 | 634 | 0.000047 | com.mozilla |
| 227 | 19282738 | 1225 | 0.000025 | org.kernel |
| 228 | 19277868 | 798 | 0.000037 | uk.co.blogspot |
| 229 | 19277480 | 784 | 0.000038 | com.qz |
| 230 | 19272246 | 859 | 0.000034 | com.ggpht |
| 231 | 19271988 | 318 | 0.000087 | com.live |
| 232 | 19271080 | 979 | 0.000030 | uk.co.guardian |
| 233 | 19268934 | 301 | 0.000091 | com.w3schools |
| 234 | 19265398 | 1924 | 0.000018 | com.lego |
| 235 | 19264382 | 498 | 0.000060 | gov.irs |
| 236 | 19262148 | 885 | 0.000034 | edu.jhu |
| 237 | 19260006 | 672 | 0.000044 | com.buzzfeed |
| 238 | 19259708 | 674 | 0.000044 | uk.co.eventbrite |
| 239 | 19259366 | 796 | 0.000038 | com.trello |
| 240 | 19258458 | 1181 | 0.000025 | com.technologyreview |
| 241 | 19256192 | 984 | 0.000030 | com.playstation |
| 242 | 19255014 | 1057 | 0.000028 | fr.lemonde |
| 243 | 19251458 | 552 | 0.000054 | com.squareup |
| 244 | 19251266 | 550 | 0.000054 | com.fortune |
| 245 | 19249814 | 507 | 0.000059 | gov.nasa |
| 246 | 19249644 | 751 | 0.000040 | me.about |
| 247 | 19244186 | 596 | 0.000050 | com.oup |
| 248 | 19243256 | 283 | 0.000095 | net.behance |
| 249 | 19241750 | 973 | 0.000031 | com.foursquare |
| 250 | 19240364 | 2212 | 0.000017 | com.hbo |
| 251 | 19238806 | 635 | 0.000047 | fm.anchor |
| 252 | 19237212 | 327 | 0.000085 | com.disqus |
| 253 | 19235880 | 915 | 0.000033 | com.slate |
| 254 | 19234328 | 54 | 0.000697 | co.g |
| 255 | 19232270 | 36 | 0.000996 | com.baidu |
| 256 | 19231094 | 456 | 0.000065 | com.bigcommerce |
| 257 | 19229430 | 175 | 0.000146 | jp.co.google |
| 258 | 19229270 | 249 | 0.000107 | com.calendly |
| 259 | 19229228 | 756 | 0.000040 | com.vox |
| 260 | 19227564 | 557 | 0.000054 | com.dailymotion |
| 261 | 19226480 | 654 | 0.000045 | com.investopedia |
| 262 | 19225524 | 860 | 0.000034 | com.ubuntu |
| 263 | 19225146 | 272 | 0.000099 | com.bandcamp |
| 264 | 19224258 | 1564 | 0.000021 | com.hatenablog |
| 265 | 19222446 | 1087 | 0.000027 | co.elastic |
| 266 | 19221444 | 724 | 0.000042 | com.newyorker |
| 267 | 19221360 | 854 | 0.000035 | com.about |
| 268 | 19220762 | 460 | 0.000064 | com.arcgis |
| 269 | 19219440 | 814 | 0.000037 | com.variety |
| 270 | 19219024 | 781 | 0.000038 | au.net.abc |
| 271 | 19217228 | 551 | 0.000054 | com.elpais |
| 272 | 19215612 | 822 | 0.000036 | edu.ucla |
| 273 | 19214942 | 802 | 0.000037 | gov.congress |
| 274 | 19214422 | 677 | 0.000043 | org.apa |
| 275 | 19212402 | 740 | 0.000041 | com.freepik |
| 276 | 19211346 | 1126 | 0.000026 | com.steamcommunity |
| 277 | 19210898 | 250 | 0.000107 | gov.ca |
| 278 | 19209850 | 773 | 0.000039 | org.pypi |
| 279 | 19209526 | 719 | 0.000042 | com.libsyn |
| 280 | 19208308 | 865 | 0.000034 | edu.princeton |
| 281 | 19207924 | 142 | 0.000200 | com.opera |
| 282 | 19207284 | 940 | 0.000032 | com.nypost |
| 283 | 19205296 | 813 | 0.000037 | edu.umich |
| 284 | 19204192 | 1339 | 0.000023 | com.billboard |
| 285 | 19204056 | 312 | 0.000088 | com.typeform |
| 286 | 19201904 | 280 | 0.000097 | com.feedburner |
| 287 | 19200550 | 871 | 0.000034 | com.ssrn |
| 288 | 19199714 | 546 | 0.000055 | com.tandfonline |
| 289 | 19197908 | 905 | 0.000033 | com.podbean |
| 290 | 19197754 | 224 | 0.000117 | page.g |
| 291 | 19197604 | 827 | 0.000036 | org.fao |
| 292 | 19197538 | 868 | 0.000034 | com.foxnews |
| 293 | 19197126 | 964 | 0.000031 | com.merriam-webster |
| 294 | 19196242 | 1093 | 0.000027 | edu.purdue |
| 295 | 19190330 | 1579 | 0.000021 | ca.ubc |
| 296 | 19186772 | 710 | 0.000042 | org.bitbucket |
| 297 | 19186418 | 101 | 0.000349 | com.wix |
| 298 | 19183972 | 1075 | 0.000028 | org.owasp |
| 299 | 19182372 | 273 | 0.000099 | com.ibm |
| 300 | 19180986 | 955 | 0.000031 | com.newsweek |
| 301 | 19178290 | 688 | 0.000043 | org.semver |
| 302 | 19177950 | 251 | 0.000106 | org.bbb |
| 303 | 19170374 | 1403 | 0.000022 | ca.sfu |
| 304 | 19169910 | 2155 | 0.000017 | com.discovery |
| 305 | 19169706 | 1454 | 0.000022 | uk.co.metro |
| 306 | 19168948 | 403 | 0.000078 | org.openstreetmap |
| 307 | 19168128 | 928 | 0.000032 | com.webs |
| 308 | 19164224 | 277 | 0.000098 | com.eepurl |
| 309 | 19163312 | 399 | 0.000079 | com.netdna-ssl |
| 310 | 19162670 | 310 | 0.000089 | com.wistia |
| 311 | 19160768 | 1301 | 0.000023 | app.netlify |
| 312 | 19160242 | 910 | 0.000033 | com.nasdaq |
| 313 | 19158884 | 637 | 0.000047 | gov.senate |
| 314 | 19158734 | 135 | 0.000212 | com.filesusr |
| 315 | 19156792 | 528 | 0.000056 | com.snapchat |
| 316 | 19156732 | 317 | 0.000088 | tv.twitch |
| 317 | 19155472 | 1352 | 0.000023 | uk.co.standard |
| 318 | 19153634 | 977 | 0.000030 | com.uk |
| 319 | 19150454 | 669 | 0.000044 | org.eff |
| 320 | 19148706 | 1433 | 0.000022 | io.gitlab |
| 321 | 19146762 | 1609 | 0.000021 | com.warnerbros |
| 322 | 19145448 | 1025 | 0.000029 | com.techradar |
| 323 | 19144618 | 1173 | 0.000026 | com.500px |
| 324 | 19143630 | 1158 | 0.000026 | com.pastebin |
| 325 | 19141644 | 583 | 0.000051 | gov.epa |
| 326 | 19140574 | 820 | 0.000036 | com.theconversation |
| 327 | 19139294 | 1147 | 0.000026 | org.semanticscholar |
| 328 | 19139130 | 125 | 0.000238 | com.rawgit |
| 329 | 19138842 | 1228 | 0.000025 | com.sky |
| 330 | 19136534 | 1747 | 0.000019 | com.flipboard |
| 331 | 19134802 | 454 | 0.000065 | com.ebay |
| 332 | 19133816 | 204 | 0.000125 | com.amazon-adsystem |
| 333 | 19133028 | 599 | 0.000050 | edu.cmu |
| 334 | 19132926 | 1282 | 0.000024 | edu.illinois |
| 335 | 19132710 | 1327 | 0.000023 | org.greenpeace |
| 336 | 19131792 | 429 | 0.000073 | com.optimizely |
| 337 | 19130562 | 1668 | 0.000020 | com.urbandictionary |
| 338 | 19130312 | 201 | 0.000127 | org.iana |
| 339 | 19129546 | 609 | 0.000049 | gov.house |
| 340 | 19129270 | 98 | 0.000373 | com.stripe |
| 341 | 19127004 | 463 | 0.000064 | org.opensource |
| 342 | 19125378 | 247 | 0.000107 | com.cloudinary |
| 343 | 19122456 | 903 | 0.000033 | edu.academia |
| 344 | 19120070 | 1182 | 0.000025 | org.mitre |
| 345 | 19119992 | 1008 | 0.000030 | gov.usgs |
| 346 | 19119044 | 215 | 0.000122 | net.sourceforge |
| 347 | 19118924 | 2117 | 0.000018 | com.channel4 |
| 348 | 19118136 | 1489 | 0.000022 | uk.co.thesun |
| 349 | 19117516 | 1558 | 0.000021 | com.deadline |
| 350 | 19114610 | 996 | 0.000030 | com.thehill |
| 351 | 19113576 | 897 | 0.000033 | edu.umn |
| 352 | 19113226 | 758 | 0.000040 | gov.justice |
| 353 | 19110186 | 1643 | 0.000020 | org.maven |
| 354 | 19110016 | 156 | 0.000173 | com.addtoany |
| 355 | 19108184 | 404 | 0.000077 | com.criteo |
| 356 | 19103220 | 2356 | 0.000016 | com.freep |
| 357 | 19101368 | 128 | 0.000230 | com.paypalobjects |
| 358 | 19098770 | 1133 | 0.000026 | com.nikkei |
| 359 | 19098758 | 457 | 0.000065 | es.google |
| 360 | 19096448 | 678 | 0.000043 | org.oecd |
| 361 | 19093328 | 1291 | 0.000024 | org.postgresql |
| 362 | 19092214 | 1941 | 0.000018 | com.euronews |
| 363 | 19091722 | 772 | 0.000039 | gov.archives |
| 364 | 19091262 | 1584 | 0.000021 | com.reverbnation |
| 365 | 19090376 | 1138 | 0.000026 | uk.co.mirror |
| 366 | 19088776 | 564 | 0.000053 | com.kickstarter |
| 367 | 19087460 | 2766 | 0.000013 | edu.byu |
| 368 | 19087070 | 1338 | 0.000023 | edu.hbs |
| 369 | 19085414 | 1382 | 0.000022 | com.googlesource |
| 370 | 19084726 | 2116 | 0.000018 | edu.wustl |
| 371 | 19084458 | 1010 | 0.000030 | com.politico |
| 372 | 19083222 | 2226 | 0.000017 | org.nobelprize |
| 373 | 19082584 | 1111 | 0.000027 | com.dw |
| 374 | 19082284 | 1022 | 0.000029 | com.pingdom |
| 375 | 19082194 | 548 | 0.000054 | com.walmart |
| 376 | 19078996 | 84 | 0.000405 | net.jsfiddle |
| 377 | 19078568 | 1284 | 0.000024 | ch.ethz |
| 378 | 19073766 | 2299 | 0.000016 | gov.cia |
| 379 | 19073478 | 983 | 0.000030 | com.salon |
| 380 | 19073042 | 715 | 0.000042 | org.change |
| 381 | 19072810 | 1184 | 0.000025 | com.theglobeandmail |
| 382 | 19071472 | 462 | 0.000064 | com.elsevier |
| 383 | 19071222 | 1837 | 0.000019 | com.storify |
| 384 | 19066438 | 137 | 0.000207 | de.bund |
| 385 | 19066384 | 121 | 0.000250 | com.jimdo |
| 386 | 19066344 | 1429 | 0.000022 | edu.gatech |
| 387 | 19064084 | 67 | 0.000527 | net.typekit |
| 388 | 19062738 | 1310 | 0.000023 | com.digitaltrends |
| 389 | 19062478 | 1286 | 0.000024 | int.unfccc |
| 390 | 19061312 | 810 | 0.000037 | au.com.google |
| 391 | 19059746 | 1026 | 0.000029 | gov.treasury |
| 392 | 19059514 | 2283 | 0.000016 | com.mystrikingly |
| 393 | 19059110 | 906 | 0.000033 | com.britannica |
| 394 | 19058170 | 1272 | 0.000024 | edu.ucdavis |
| 395 | 19057558 | 904 | 0.000033 | uk.parliament |
| 396 | 19056260 | 76 | 0.000468 | me.fb |
| 397 | 19054320 | 1027 | 0.000029 | com.mdpi |
| 398 | 19052746 | 1244 | 0.000024 | com.aljazeera |
| 399 | 19052144 | 207 | 0.000124 | com.etsy |
| 400 | 19052062 | 269 | 0.000100 | net.azureedge |
| 401 | 19051812 | 1063 | 0.000028 | gov.fbi |
| 402 | 19051720 | 1608 | 0.000021 | ms.1drv |
| 403 | 19048836 | 824 | 0.000036 | com.bmj |
| 404 | 19048654 | 1325 | 0.000023 | de.mpg |
| 405 | 19047424 | 2538 | 0.000014 | com.virustotal |
| 406 | 19047122 | 1106 | 0.000027 | org.nejm |
| 407 | 19046336 | 211 | 0.000123 | com.tiktok |
| 408 | 19045754 | 194 | 0.000131 | org.nodejs |
| 409 | 19044892 | 2566 | 0.000014 | com.diigo |
| 410 | 19043614 | 1183 | 0.000025 | com.scmp |
| 411 | 19042800 | 1095 | 0.000027 | au.com.smh |
| 412 | 19042258 | 718 | 0.000042 | org.d3js |
| 413 | 19041804 | 1311 | 0.000023 | com.history |
| 414 | 19041506 | 1178 | 0.000026 | org.hrw |
| 415 | 19039754 | 1235 | 0.000025 | uk.ac.ucl |
| 416 | 19038356 | 1213 | 0.000025 | com.socialmediatoday |
| 417 | 19035776 | 1038 | 0.000029 | edu.uchicago |
| 418 | 19033774 | 2710 | 0.000013 | com.thecvf |
| 419 | 19030356 | 968 | 0.000031 | org.readthedocs |
| 420 | 19030118 | 61 | 0.000612 | com.googleadservices |
| 421 | 19029814 | 1018 | 0.000029 | org.jstor |
| 422 | 19029070 | 833 | 0.000035 | com.pinimg |
| 423 | 19028696 | 2486 | 0.000015 | com.oxforddictionaries |
| 424 | 19028410 | 2145 | 0.000017 | com.discogs |
| 425 | 19027476 | 2568 | 0.000014 | edu.buffalo |
| 426 | 19023352 | 1810 | 0.000019 | com.buzzfeednews |
| 427 | 19023196 | 957 | 0.000031 | watch.fb |
| 428 | 19022492 | 1136 | 0.000026 | org.sphinx-doc |
| 429 | 19022490 | 1851 | 0.000018 | com.spreaker |
| 430 | 19022078 | 1548 | 0.000021 | com.irishtimes |
| 431 | 19021850 | 608 | 0.000049 | com.biomedcentral |
| 432 | 19019438 | 1452 | 0.000022 | uk.ac.lse |
| 433 | 19018308 | 469 | 0.000063 | org.hbr |
| 434 | 19018106 | 396 | 0.000079 | com.statista |
| 435 | 19017386 | 1045 | 0.000029 | com.substack |
| 436 | 19014182 | 285 | 0.000095 | ru.ok |
| 437 | 19013374 | 3509 | 0.000010 | com.quizlet |
| 438 | 19012470 | 807 | 0.000037 | com.deviantart |
| 439 | 19010602 | 1115 | 0.000027 | org.undp |
| 440 | 19005532 | 1959 | 0.000018 | com.rt |
| 441 | 19004652 | 1004 | 0.000030 | org.ilo |
| 442 | 19004416 | 3352 | 0.000011 | cc.uxdesign |
| 443 | 19003306 | 2234 | 0.000017 | org.wto |
| 444 | 19002690 | 1673 | 0.000020 | org.rfc-editor |
| 445 | 19002034 | 1302 | 0.000023 | com.penguinrandomhouse |
| 446 | 19001536 | 920 | 0.000032 | de.spiegel |
| 447 | 18999940 | 1318 | 0.000023 | com.producthunt |
| 448 | 18997582 | 705 | 0.000042 | gov.sec |
| 449 | 18997128 | 520 | 0.000057 | com.meetup |
| 450 | 18994718 | 2162 | 0.000017 | com.ibtimes |
| 451 | 18993898 | 1061 | 0.000028 | com.sun |
| 452 | 18992788 | 2120 | 0.000018 | gov.federalreserve |
| 453 | 18991624 | 1760 | 0.000019 | edu.arizona |
| 454 | 18990732 | 600 | 0.000050 | edu.utah |
| 455 | 18990642 | 1861 | 0.000018 | com.newscientist |
| 456 | 18989730 | 537 | 0.000055 | com.gmail |
| 457 | 18989410 | 1123 | 0.000026 | net.java |
| 458 | 18986644 | 1885 | 0.000018 | com.itv |
| 459 | 18986450 | 504 | 0.000059 | com.ssl-images-amazon |
| 460 | 18986010 | 243 | 0.000108 | uk.org.ico |
| 461 | 18985542 | 1781 | 0.000019 | ca.blogspot |
| 462 | 18985454 | 73 | 0.000503 | net.akamaihd |
| 463 | 18985010 | 840 | 0.000035 | in.co.google |
| 464 | 18984802 | 1391 | 0.000022 | de.zeit |
| 465 | 18984480 | 1005 | 0.000030 | uk.co.thetimes |
| 466 | 18984458 | 1042 | 0.000029 | com.prweb |
| 467 | 18983876 | 2661 | 0.000013 | com.twitpic |
| 468 | 18983790 | 1195 | 0.000025 | io.pypa |
| 469 | 18982648 | 2956 | 0.000012 | com.openai |
| 470 | 18980494 | 447 | 0.000067 | net.imgix |
| 471 | 18979850 | 1901 | 0.000018 | com.martinfowler |
| 472 | 18979062 | 242 | 0.000108 | org.purl |
| 473 | 18978596 | 655 | 0.000045 | de.gesetze-im-internet |
| 474 | 18978450 | 397 | 0.000079 | net.themeforest |
| 475 | 18978408 | 216 | 0.000121 | jp.co.yahoo |
| 476 | 18977992 | 1409 | 0.000022 | edu.ufl |
| 477 | 18977350 | 425 | 0.000073 | com.atlassian |
| 478 | 18975784 | 1373 | 0.000023 | edu.duke |
| 479 | 18974486 | 236 | 0.000111 | to.amzn |
| 480 | 18974130 | 2285 | 0.000016 | edu.gmu |
| 481 | 18974004 | 1056 | 0.000028 | edu.nyu |
| 482 | 18973414 | 575 | 0.000052 | org.debian |
| 483 | 18973148 | 1249 | 0.000024 | com.jetbrains |
| 484 | 18973108 | 235 | 0.000111 | com.mapbox |
| 485 | 18973012 | 296 | 0.000092 | me.telegram |
| 486 | 18972556 | 1287 | 0.000024 | com.wikia |
| 487 | 18972192 | 82 | 0.000409 | com.oversightboard |
| 488 | 18971130 | 342 | 0.000083 | com.proofpoint |
| 489 | 18970830 | 994 | 0.000030 | com.jimdofree |
| 490 | 18970782 | 2638 | 0.000014 | org.nypl |
| 491 | 18968834 | 806 | 0.000037 | edu.brookings |
| 492 | 18968470 | 2213 | 0.000017 | org.wfp |
| 493 | 18968366 | 2417 | 0.000015 | mp.j |
| 494 | 18967966 | 2624 | 0.000014 | app.web |
| 495 | 18967448 | 2363 | 0.000015 | com.instructables |
| 496 | 18966950 | 1267 | 0.000024 | org.imf |
| 497 | 18965952 | 1134 | 0.000026 | org.unhcr |
| 498 | 18965650 | 1610 | 0.000021 | edu.virginia |
| 499 | 18965586 | 3000 | 0.000012 | ph.telegra |
| 500 | 18963358 | 2122 | 0.000018 | org.propublica |
| 501 | 18963068 | 1655 | 0.000020 | edu.brown |
| 502 | 18961020 | 1430 | 0.000022 | com.seattletimes |
| 503 | 18960454 | 313 | 0.000088 | io.shields |
| 504 | 18960174 | 2348 | 0.000016 | org.archlinux |
| 505 | 18959730 | 355 | 0.000081 | com.surveymonkey |
| 506 | 18958892 | 747 | 0.000040 | gov.state |
| 507 | 18956864 | 1019 | 0.000029 | com.yarnpkg |
| 508 | 18956706 | 2142 | 0.000017 | org.phys |
| 509 | 18956672 | 1688 | 0.000020 | org.unwomen |
| 510 | 18955200 | 809 | 0.000037 | com.fiverr |
| 511 | 18954884 | 2930 | 0.000012 | org.vim |
| 512 | 18954634 | 3447 | 0.000010 | com.instapaper |
| 513 | 18953814 | 221 | 0.000119 | com.eventbrite |
| 514 | 18953136 | 878 | 0.000034 | edu.psu |
| 515 | 18948514 | 1599 | 0.000021 | com.asahi |
| 516 | 18948206 | 2338 | 0.000016 | ca.ualberta |
| 517 | 18947706 | 3130 | 0.000011 | com.rd |
| 518 | 18947260 | 748 | 0.000040 | com.intel |
| 519 | 18947088 | 2414 | 0.000015 | com.gfycat |
| 520 | 18946976 | 2649 | 0.000013 | org.icrc |
| 521 | 18944700 | 2467 | 0.000015 | org.biorxiv |
| 522 | 18941214 | 1658 | 0.000020 | org.r-project |
| 523 | 18939874 | 257 | 0.000105 | com.aliyuncs |
| 524 | 18936846 | 139 | 0.000205 | com.weibo |
| 525 | 18936474 | 1852 | 0.000018 | com.gettyimages |
| 526 | 18931798 | 525 | 0.000057 | com.googlecode |
| 527 | 18930224 | 3699 | 0.000010 | com.plurk |
| 528 | 18929320 | 2136 | 0.000017 | org.unep |
| 529 | 18929114 | 1737 | 0.000019 | com.howstuffworks |
| 530 | 18928922 | 2617 | 0.000014 | com.udacity |
| 531 | 18927534 | 1830 | 0.000019 | edu.georgetown |
| 532 | 18926738 | 2327 | 0.000016 | com.esri |
| 533 | 18925612 | 685 | 0.000043 | uk.gov.service |
| 534 | 18924274 | 2742 | 0.000013 | jp.co.japantimes |
| 535 | 18924242 | 2406 | 0.000015 | com.kobo |
| 536 | 18921598 | 555 | 0.000054 | com.samsung |
| 537 | 18921494 | 1447 | 0.000022 | fr.gouvernement |
| 538 | 18920518 | 2821 | 0.000013 | org.wikibooks |
| 539 | 18919550 | 2454 | 0.000015 | it.scoop |
| 540 | 18917542 | 3441 | 0.000010 | net.openreview |
| 541 | 18916950 | 2137 | 0.000017 | es.abc |
| 542 | 18913684 | 2422 | 0.000015 | jp.geocities |
| 543 | 18913494 | 2748 | 0.000013 | edu.uoregon |
| 544 | 18912900 | 2119 | 0.000018 | google.ai |
| 545 | 18909392 | 2655 | 0.000013 | co.carrd |
| 546 | 18907616 | 2115 | 0.000018 | uk.co.huffingtonpost |
| 547 | 18904404 | 690 | 0.000043 | com.mashable |
| 548 | 18903570 | 604 | 0.000049 | com.steampowered |
| 549 | 18902828 | 1730 | 0.000020 | org.torproject |
| 550 | 18902570 | 449 | 0.000066 | com.netflix |
| 551 | 18901684 | 2215 | 0.000017 | google.research |
| 552 | 18900478 | 2699 | 0.000013 | at.ac.univie |
| 553 | 18899428 | 2143 | 0.000017 | edu.tufts |
| 554 | 18897774 | 913 | 0.000033 | com.thelancet |
| 555 | 18894916 | 3791 | 0.000009 | goog.translate |
| 556 | 18893828 | 982 | 0.000030 | org.ohchr |
| 557 | 18892464 | 3880 | 0.000009 | com.bravesites |
| 558 | 18890432 | 2887 | 0.000012 | org.rsf |
| 559 | 18890342 | 1792 | 0.000019 | gov.usembassy |
| 560 | 18886662 | 3016 | 0.000012 | com.architecturaldigest |
| 561 | 18886098 | 1103 | 0.000027 | cn.news |
| 562 | 18885566 | 2367 | 0.000015 | uk.bl |
| 563 | 18883186 | 3407 | 0.000010 | uk.co.walesonline |
| 564 | 18882344 | 2461 | 0.000015 | org.accessnow |
| 565 | 18880856 | 2291 | 0.000016 | com.france24 |
| 566 | 18879976 | 3309 | 0.000011 | com.pearltrees |
| 567 | 18878902 | 2815 | 0.000013 | org.freedomhouse |
| 568 | 18875528 | 219 | 0.000120 | com.salesforce |
| 569 | 18874868 | 2575 | 0.000014 | org.scala-lang |
| 570 | 18874266 | 1142 | 0.000026 | be.google |
| 571 | 18874220 | 2684 | 0.000013 | re.appsto |
| 572 | 18873246 | 2400 | 0.000015 | org.ap |
| 573 | 18873008 | 3183 | 0.000011 | do.bit |
| 574 | 18872314 | 2833 | 0.000012 | com.sputniknews |
| 575 | 18871926 | 2165 | 0.000017 | org.americanprogress |
| 576 | 18871186 | 1568 | 0.000021 | com.chron |
| 577 | 18871072 | 2792 | 0.000013 | org.unaids |
| 578 | 18869902 | 2689 | 0.000013 | com.ajc |
| 579 | 18868558 | 2251 | 0.000016 | app.vercel |
| 580 | 18868076 | 495 | 0.000060 | com.visualstudio |
| 581 | 18866746 | 1333 | 0.000023 | net.daringfireball |
| 582 | 18865464 | 2654 | 0.000013 | org.csis |
| 583 | 18863760 | 2350 | 0.000016 | com.ew |
| 584 | 18862092 | 1293 | 0.000024 | link.page |
| 585 | 18861286 | 2320 | 0.000016 | fr.gouv.diplomatie |
| 586 | 18860790 | 189 | 0.000133 | ru.mail |
| 587 | 18860636 | 852 | 0.000035 | org.mediawiki |
| 588 | 18859930 | 622 | 0.000048 | com.thinkwithgoogle |
| 589 | 18858612 | 2865 | 0.000012 | com.duolingo |
| 590 | 18855586 | 2112 | 0.000018 | com.domaintools |
| 591 | 18855394 | 291 | 0.000094 | net.secureservercdn |
| 592 | 18855236 | 2860 | 0.000012 | com.biography |
| 593 | 18853964 | 1648 | 0.000020 | jp.ne.goo |
| 594 | 18853434 | 1827 | 0.000019 | com.lifewire |
| 595 | 18853160 | 2479 | 0.000015 | ie.independent |
| 596 | 18850136 | 2903 | 0.000012 | uk.ac.leeds |
| 597 | 18849484 | 2951 | 0.000012 | com.allure |
| 598 | 18849452 | 2241 | 0.000016 | com.timeout |
| 599 | 18848206 | 3135 | 0.000011 | org.cpj |
| 600 | 18847016 | 4137 | 0.000009 | com.bonanza |
| 601 | 18844324 | 1640 | 0.000020 | ca.globalnews |
| 602 | 18843380 | 1774 | 0.000019 | gov.in |
| 603 | 18843170 | 1731 | 0.000020 | com.images-amazon |
| 604 | 18843116 | 2782 | 0.000013 | com.depositphotos |
| 605 | 18843074 | 1843 | 0.000018 | com.thebalance |
| 606 | 18842016 | 86 | 0.000401 | com.livestream |
| 607 | 18841682 | 287 | 0.000095 | com.naver |
| 608 | 18841558 | 477 | 0.000062 | com.force |
| 609 | 18840756 | 1422 | 0.000022 | net.codecanyon |
| 610 | 18838890 | 3346 | 0.000011 | io.ghost |
| 611 | 18838884 | 2419 | 0.000015 | com.teenvogue |
| 612 | 18838836 | 2366 | 0.000015 | nz.co.stuff |
| 613 | 18837438 | 2732 | 0.000013 | com.123rf |
| 614 | 18836408 | 2508 | 0.000014 | com.motherjones |
| 615 | 18836354 | 679 | 0.000043 | int.wipo |
| 616 | 18836332 | 2615 | 0.000014 | edu.kit |
| 617 | 18834684 | 1706 | 0.000020 | com.routledge |
| 618 | 18834336 | 695 | 0.000043 | io.readthedocs |
| 619 | 18834038 | 3692 | 0.000010 | com.laweekly |
| 620 | 18832208 | 441 | 0.000069 | com.businesswire |
| 621 | 18831266 | 2914 | 0.000012 | org.oxfam |
| 622 | 18829852 | 486 | 0.000062 | com.adweek |
| 623 | 18829616 | 2498 | 0.000014 | edu.hawaii |
| 624 | 18829216 | 3543 | 0.000010 | com.udn |
| 625 | 18828270 | 726 | 0.000042 | com.canva |
| 626 | 18828034 | 2985 | 0.000012 | com.slides |
| 627 | 18827908 | 627 | 0.000047 | io.codepen |
| 628 | 18827552 | 2664 | 0.000013 | com.googlegroups |
| 629 | 18826348 | 3901 | 0.000009 | cn.org.china |
| 630 | 18825700 | 2698 | 0.000013 | com.coca-colacompany |
| 631 | 18825602 | 888 | 0.000033 | uk.co.pinterest |
| 632 | 18823058 | 2551 | 0.000014 | org.fas |
| 633 | 18822830 | 1110 | 0.000027 | net.clickbank |
| 634 | 18822568 | 3099 | 0.000011 | uk.co.timesonline |
| 635 | 18822432 | 363 | 0.000081 | net.php |
| 636 | 18821410 | 2612 | 0.000014 | edu.iastate |
| 637 | 18821392 | 2144 | 0.000017 | com.refinery29 |
| 638 | 18820314 | 988 | 0.000030 | gov.dhs |
| 639 | 18819896 | 4216 | 0.000008 | com.alamy |
| 640 | 18819802 | 991 | 0.000030 | de.t-online |
| 641 | 18817700 | 279 | 0.000097 | com.iubenda |
| 642 | 18817434 | 1612 | 0.000021 | com.haaretz |
| 643 | 18816616 | 1565 | 0.000021 | mil.army |
| 644 | 18816484 | 3053 | 0.000011 | com.hm |
| 645 | 18816156 | 1605 | 0.000021 | uk.gov.ons |
| 646 | 18813740 | 359 | 0.000081 | mp.mailchi |
| 647 | 18811428 | 3469 | 0.000010 | org.heritage |
| 648 | 18808928 | 1222 | 0.000025 | org.eugdpr |
| 649 | 18808748 | 2341 | 0.000016 | za.co.google |
| 650 | 18808348 | 563 | 0.000053 | org.unicef |
| 651 | 18808206 | 2938 | 0.000012 | com.theonion |
| 652 | 18807932 | 345 | 0.000083 | com.akismet |
| 653 | 18807216 | 150 | 0.000190 | org.networkadvertising |
| 654 | 18805384 | 834 | 0.000035 | com.venturebeat |
| 655 | 18803670 | 2671 | 0.000013 | com.timesofisrael |
| 656 | 18802968 | 3182 | 0.000011 | com.ogilvy |
| 657 | 18800978 | 134 | 0.000217 | info.aboutads |
| 658 | 18800344 | 1265 | 0.000024 | tw.com.google |
| 659 | 18799472 | 549 | 0.000054 | com.fc2 |
| 660 | 18798170 | 2715 | 0.000013 | com.theintercept |
| 661 | 18798140 | 2377 | 0.000015 | com.foreignpolicy |
| 662 | 18797944 | 4009 | 0.000009 | com.zara |
| 663 | 18796758 | 3042 | 0.000012 | org.project-syndicate |
| 664 | 18796410 | 2790 | 0.000013 | cn.gov.fmprc |
| 665 | 18795586 | 541 | 0.000055 | com.patreon |
| 666 | 18794332 | 2912 | 0.000012 | org.ballotpedia |
| 667 | 18793786 | 3675 | 0.000010 | uk.co.guim |
| 668 | 18793668 | 1203 | 0.000025 | com.thenextweb |
| 669 | 18793250 | 2460 | 0.000015 | nz.co.nzherald |
| 670 | 18791956 | 2329 | 0.000016 | gov.faa |
| 671 | 18791606 | 664 | 0.000045 | com.entrepreneur |
| 672 | 18790298 | 1699 | 0.000020 | com.nike |
| 673 | 18788220 | 2257 | 0.000016 | com.voanews |
| 674 | 18785682 | 3646 | 0.000010 | com.podomatic |
| 675 | 18783652 | 394 | 0.000080 | jp.ameblo |
| 676 | 18781732 | 3665 | 0.000010 | nz.co.scoop |
| 677 | 18779826 | 2668 | 0.000013 | com.jpost |
| 678 | 18779770 | 755 | 0.000040 | org.js |
| 679 | 18778874 | 2589 | 0.000014 | de.tagesspiegel |
| 680 | 18778436 | 591 | 0.000051 | com.gofundme |
| 681 | 18778236 | 651 | 0.000046 | it.placehold |
| 682 | 18778130 | 692 | 0.000043 | gov.nist |
| 683 | 18777528 | 3090 | 0.000011 | no.uib |
| 684 | 18777264 | 3759 | 0.000010 | com.clustrmaps |
| 685 | 18776238 | 2533 | 0.000014 | com.channelnewsasia |
| 686 | 18775626 | 2332 | 0.000016 | com.carto |
| 687 | 18774970 | 2544 | 0.000014 | edu.usf |
| 688 | 18774910 | 4401 | 0.000008 | uk.ac.essex |
| 689 | 18774474 | 2203 | 0.000017 | de.br |
| 690 | 18773102 | 4256 | 0.000008 | org.marxists |
| 691 | 18769558 | 2826 | 0.000013 | br.com.blogspot |
| 692 | 18769428 | 680 | 0.000043 | com.photobucket |
| 693 | 18769216 | 3395 | 0.000010 | com.parade |
| 694 | 18768412 | 3731 | 0.000010 | com.mongabay |
| 695 | 18768244 | 742 | 0.000041 | com.moz |
| 696 | 18768070 | 3519 | 0.000010 | ar.com.lanacion |
| 697 | 18767480 | 1162 | 0.000026 | com.digitaloceanspaces |
| 698 | 18767252 | 3833 | 0.000009 | com.scribblelive |
| 699 | 18766284 | 3667 | 0.000010 | ru.msk |
| 700 | 18765148 | 1707 | 0.000020 | org.oxfordjournals |
| 701 | 18765018 | 1846 | 0.000018 | com.speakerdeck |
| 702 | 18764342 | 1177 | 0.000026 | com.jekyllrb |
| 703 | 18764272 | 1320 | 0.000023 | com.imageshack |
| 704 | 18763628 | 698 | 0.000043 | com.withgoogle |
| 705 | 18763518 | 2775 | 0.000013 | com.fineartamerica |
| 706 | 18763306 | 1627 | 0.000020 | org.amnesty |
| 707 | 18762512 | 2572 | 0.000014 | org.unctad |
| 708 | 18761940 | 3049 | 0.000012 | int.au |
| 709 | 18761162 | 109 | 0.000306 | me.wa |
| 710 | 18760936 | 2184 | 0.000017 | org.ncsl |
| 711 | 18759340 | 2857 | 0.000012 | uk.org.nationaltrust |
| 712 | 18758396 | 4435 | 0.000008 | com.mysanantonio |
| 713 | 18758278 | 3154 | 0.000011 | fr.rfi |
| 714 | 18757674 | 1300 | 0.000023 | gov.federalregister |
| 715 | 18756920 | 5968 | 0.000006 | org.arkive |
| 716 | 18756858 | 3385 | 0.000011 | com.nationalreview |
| 717 | 18756398 | 2232 | 0.000017 | org.worldcat |
| 718 | 18756388 | 4160 | 0.000009 | com.turkishairlines |
| 719 | 18756100 | 3077 | 0.000011 | uk.ac.york |
| 720 | 18755876 | 3029 | 0.000012 | org.nationalgeographic |
| 721 | 18755390 | 3360 | 0.000011 | org.tigris |
| 722 | 18755228 | 362 | 0.000081 | com.adnxs |
| 723 | 18753668 | 1765 | 0.000019 | com.indianexpress |
| 724 | 18753482 | 3293 | 0.000011 | org.neocities |
| 725 | 18751702 | 3058 | 0.000011 | ly.genial |
| 726 | 18750392 | 3274 | 0.000011 | uk.co.penguin |
| 727 | 18750242 | 919 | 0.000032 | com.hootsuite |
| 728 | 18750002 | 3494 | 0.000010 | com.nme |
| 729 | 18747334 | 2735 | 0.000013 | com.kaggle |
| 730 | 18746894 | 193 | 0.000131 | com.discord |
| 731 | 18746600 | 3357 | 0.000011 | de.taz |
| 732 | 18746298 | 3147 | 0.000011 | edu.bc |
| 733 | 18746190 | 3041 | 0.000012 | tr.com.aa |
| 734 | 18745544 | 3156 | 0.000011 | com.cgtn |
| 735 | 18745132 | 2431 | 0.000015 | org.unodc |
| 736 | 18744550 | 282 | 0.000096 | gov.ftc |
| 737 | 18743682 | 2344 | 0.000016 | eu.politico |
| 738 | 18743052 | 1364 | 0.000023 | com.symantec |
| 739 | 18740834 | 2520 | 0.000014 | net.openid |
| 740 | 18740752 | 3637 | 0.000010 | il.ac.tau |
| 741 | 18739398 | 2453 | 0.000015 | ru.ria |
| 742 | 18738948 | 3031 | 0.000012 | com.allafrica |
| 743 | 18738772 | 1172 | 0.000026 | jp.ac.keio |
| 744 | 18738200 | 2869 | 0.000012 | edu.educause |
| 745 | 18738156 | 3402 | 0.000010 | org.firstmonday |
| 746 | 18738122 | 2642 | 0.000014 | org.wikidata |
| 747 | 18737788 | 491 | 0.000061 | com.placeholder |
| 748 | 18734980 | 2900 | 0.000012 | com.simonandschuster |
| 749 | 18734860 | 3796 | 0.000009 | org.amnestyusa |
| 750 | 18734696 | 2168 | 0.000017 | com.justia |
| 751 | 18733308 | 2197 | 0.000017 | ca.on.gov |
| 752 | 18733088 | 3707 | 0.000010 | uk.gov.scotland |
| 753 | 18733026 | 4563 | 0.000008 | com.flightradar24 |
| 754 | 18732944 | 4326 | 0.000008 | com.interviewmagazine |
| 755 | 18732742 | 3744 | 0.000010 | com.afp |
| 756 | 18732154 | 4112 | 0.000009 | org.scala-sbt |
| 757 | 18730992 | 2780 | 0.000013 | ae.google |
| 758 | 18730824 | 1303 | 0.000023 | org.webkit |
| 759 | 18729882 | 2947 | 0.000012 | com.superuser |
| 760 | 18729678 | 1014 | 0.000029 | com.highcharts |
| 761 | 18729020 | 3802 | 0.000009 | com.wusa9 |
| 762 | 18728900 | 2550 | 0.000014 | jp.nicovideo |
| 763 | 18728094 | 2141 | 0.000017 | gov.pa |
| 764 | 18727990 | 3937 | 0.000009 | org.one |
| 765 | 18726672 | 2908 | 0.000012 | edu.uky |
| 766 | 18725034 | 3052 | 0.000011 | in.businessinsider |
| 767 | 18724474 | 3382 | 0.000011 | org.hypotheses |
| 768 | 18723378 | 2681 | 0.000013 | org.wbur |
| 769 | 18723236 | 569 | 0.000052 | com.inc |
| 770 | 18721066 | 1434 | 0.000022 | com.upwork |
| 771 | 18720908 | 4232 | 0.000008 | org.sourcewatch |
| 772 | 18720676 | 3366 | 0.000011 | com.sciencealert |
| 773 | 18720222 | 1316 | 0.000023 | de.rki |
| 774 | 18719114 | 3466 | 0.000010 | org.royalsociety |
| 775 | 18718524 | 2354 | 0.000016 | ru.rbc |
| 776 | 18718198 | 1034 | 0.000029 | com.videojs |
| 777 | 18717530 | 3383 | 0.000011 | org.polymer-project |
| 778 | 18717228 | 487 | 0.000062 | ee.lin |
| 779 | 18717008 | 3202 | 0.000011 | org.texastribune |
| 780 | 18716598 | 1054 | 0.000028 | fm.last |
| 781 | 18716564 | 3658 | 0.000010 | se.gu |
| 782 | 18715916 | 2181 | 0.000017 | it.redd |
| 783 | 18715552 | 1219 | 0.000025 | com.smashingmagazine |
| 784 | 18715536 | 2743 | 0.000013 | org.undocs |
| 785 | 18713460 | 2713 | 0.000013 | org.iucn |
| 786 | 18713092 | 3662 | 0.000010 | com.hashicorp |
| 787 | 18712944 | 1712 | 0.000020 | scot.gov |
| 788 | 18712564 | 976 | 0.000031 | com.jwplayer |
| 789 | 18712330 | 3843 | 0.000009 | edu.wayne |
| 790 | 18711122 | 576 | 0.000052 | com.booking |
| 791 | 18710906 | 803 | 0.000037 | com.fandom |
| 792 | 18710858 | 3850 | 0.000009 | com.triplepundit |
| 793 | 18709690 | 329 | 0.000085 | com.hackerone |
| 794 | 18709314 | 4235 | 0.000008 | com.letterboxd |
| 795 | 18707654 | 1151 | 0.000026 | com.alexa |
| 796 | 18707534 | 2258 | 0.000016 | com.knightlab |
| 797 | 18706964 | 770 | 0.000039 | com.sedo |
| 798 | 18706750 | 3394 | 0.000010 | org.iucnredlist |
| 799 | 18706380 | 1944 | 0.000018 | com.firebaseapp |
| 800 | 18705808 | 4036 | 0.000009 | com.manta |
| 801 | 18702546 | 3072 | 0.000011 | au.com.theage |
| 802 | 18701450 | 3475 | 0.000010 | org.sierraclub |
| 803 | 18700424 | 400 | 0.000078 | com.onesignal |
| 804 | 18700348 | 2652 | 0.000013 | ru.kommersant |
| 805 | 18700250 | 3379 | 0.000011 | com.hasbro |
| 806 | 18699356 | 3821 | 0.000009 | edu.unu |
| 807 | 18699146 | 3656 | 0.000010 | com.crashlytics |
| 808 | 18698894 | 817 | 0.000037 | com.marketwatch |
| 809 | 18698494 | 4071 | 0.000009 | ru.aif |
| 810 | 18698224 | 4419 | 0.000008 | com.folkd |
| 811 | 18698152 | 1285 | 0.000024 | gov.uspto |
| 812 | 18697734 | 3358 | 0.000011 | net.ipsnews |
| 813 | 18697484 | 2632 | 0.000014 | org.unfpa |
| 814 | 18697336 | 987 | 0.000030 | com.stackexchange |
| 815 | 18696510 | 3207 | 0.000011 | ly.plot |
| 816 | 18696184 | 532 | 0.000056 | com.indeed |
| 817 | 18695334 | 1667 | 0.000020 | fr.blogspot |
| 818 | 18695246 | 972 | 0.000031 | com.css-tricks |
| 819 | 18694382 | 933 | 0.000032 | org.reactjs |
| 820 | 18693292 | 4083 | 0.000009 | com.marinetraffic |
| 821 | 18692940 | 2706 | 0.000013 | ru.rg |
| 822 | 18692900 | 4365 | 0.000008 | com.balenciaga |
| 823 | 18692468 | 2561 | 0.000014 | com.kinstacdn |
| 824 | 18691710 | 2586 | 0.000014 | build.bazel |
| 825 | 18690746 | 545 | 0.000055 | com.digg |
| 826 | 18690190 | 3997 | 0.000009 | jp.co.tepco |
| 827 | 18690182 | 1457 | 0.000022 | io.webflow |
| 828 | 18690148 | 4685 | 0.000008 | com.gmanetwork |
| 829 | 18689994 | 3779 | 0.000009 | org.rferl |
| 830 | 18689870 | 4416 | 0.000008 | kr.co.koreatimes |
| 831 | 18689452 | 821 | 0.000036 | com.oreilly |
| 832 | 18689426 | 953 | 0.000031 | gov.fcc |
| 833 | 18689024 | 2957 | 0.000012 | com.articulate |
| 834 | 18688634 | 3054 | 0.000011 | site.notion |
| 835 | 18688256 | 2441 | 0.000015 | int.reliefweb |
| 836 | 18688218 | 2722 | 0.000013 | com.insidehighered |
| 837 | 18687872 | 1086 | 0.000027 | so.notion |
| 838 | 18687810 | 4466 | 0.000008 | org.sfpl |
| 839 | 18687672 | 3580 | 0.000010 | uk.co.spectator |
| 840 | 18687622 | 2840 | 0.000012 | com.suntimes |
| 841 | 18687452 | 917 | 0.000032 | com.verisign |
| 842 | 18686880 | 2967 | 0.000012 | org.cfr |
| 843 | 18686622 | 2819 | 0.000013 | org.panda |
| 844 | 18686298 | 1016 | 0.000029 | com.mixcloud |
| 845 | 18686002 | 83 | 0.000405 | com.messenger |
| 846 | 18685934 | 533 | 0.000056 | jp.co.rakuten |
| 847 | 18685764 | 4343 | 0.000008 | com.upworthy |
| 848 | 18685494 | 2644 | 0.000014 | ru.kremlin |
| 849 | 18684806 | 278 | 0.000097 | com.sxsw |
8501868421235420.000010com.flippa
851186840065680.000052com.mckinsey
8521868395625340.000014net.convio
8531868351012310.000025com.buffer
8541868300631000.000011com.yougov
8551868292851100.000007com.viki
8561868243646740.000008org.birdlife
8571868196644130.000008com.itsnicethat
858186813746500.000046com.gartner
8591868137229270.000012uk.gov.metoffice
860186810844940.000061com.dmca
8611868084035970.000010org.jenkins-ci
8621868035830890.000011int.iom
8631867867040820.000009com.iconarchive
8641867738444800.000008com.oriflame
8651867651845380.000008net.middleeasteye
8661867585450410.000007com.waitbutwhy
8671867553443160.000008org.pen
8681867527428300.000013fm.omny
8691867428239020.000009org.icij
8701867404441880.000008org.constitutioncenter
8711867397238570.000009ch.qos
8721867347440700.000009com.9to5google
8731867343235260.000010uk.gov.companieshouse
8741867340639940.000009uk.ac.sussex
8751867325832660.000011com.foreignaffairs
8761867324628350.000012com.news24
8771867320441320.000009re.cli
8781867269042270.000008jp.ac.kobe-u
879186717149020.000033br.com.uol
8801867155237180.000010com.nybooks
8811867144618180.000019com.over-blog
8821867136255780.000006com.symbaloo
8831866961231700.000011uk.co.bbci
884186692009340.000032com.pubmatic
8851866901023850.000015com.scene7
8861866881034670.000010org.wikileaks
8871866724246370.000008org.foodandwaterwatch
8881866643833050.000011at.derstandard
889186660748760.000034com.zoho
8901866541031670.000011org.adb
8911866436235180.000010com.benzinga
892186639887300.000041com.usnews
8931866345055920.000006io.postach
8941866303035470.000010com.palgrave
8951866246411090.000027net.media
8961866212040910.000009net.datasociety
897186614265400.000055com.googleoptimize
8981866104441160.000009au.com.heraldsun
8991865906834400.000010ru.kp
9001865798826750.000013com.thenation
901186576906630.000045me.zalo
9021865705230010.000012com.unity
9031865579616350.000020org.altervista
9041865458048260.000007it.polito
9051865449044820.000008edu.odu
9061865420031710.000011org.sonatype
9071865379028530.000012net.vnexpress
908186532847350.000041com.alibaba
9091865251044540.000008com.muckrack
9101865240029370.000012com.lexology
9111865227049620.000007kr.co.hani
9121865081837230.000010com.tradingeconomics
9131865060638820.000009com.study
914186505945950.000050com.airbnb
9151864977636630.000010gov.ustr
9161864967449000.000007com.theodysseyonline
9171864952037240.000010uk.gov.homeoffice
9181864827810310.000029com.pcmag
919186471346330.000047org.joomla
9201864577033670.000011br.scielo
92118645448740.000486com.trustpilot
9221864489055520.000006au.edu.vu
9231864471031270.000011tw.com.pchome
924186442607250.000042com.splashthat
9251864356027450.000013ca.citizenlab
9261864262845990.000008com.condenast
9271864235416760.000020com.techrepublic
9281864143021250.000018io.pantheonsite
9291864132232810.000011ru.cbr
9301864124028660.000012ca.uwaterloo
9311864091243840.000008uk.co.belfasttelegraph
932186406885970.000050com.wufoo
9331863919432310.000011org.ellenmacarthurfoundation
9341863904648210.000007com.zimbio
9351863882433210.000011com.rabbitmq
936186384225470.000054com.herokuapp
9371863685256570.000006org.cgsociety
9381863491257040.000006in.teletype
939186349005310.000056com.aol
9401863402637520.000010edu.ucpress
9411863397633750.000011com.scotsman
9421863349432770.000011com.kroger
943186322724050.000077com.constantcontact
944186319308700.000034com.emarketer
9451863087456430.000006com.dbs
9461863083844070.000008au.edu.deakin
9471863063033150.000011org.osce
9481862945829710.000012com.euractiv
9491862864247880.000007com.latercera
9501862610238810.000009com.bloombergquint
951186255809520.000031com.digitalocean
9521862510236380.000010org.ushmm
9531862488837510.000010com.lawfareblog
9541862466448770.000007ke.co.google
9551862440038350.000009com.thenationalnews
9561862437847160.000007com.kongregate
9571862424051320.000007com.apsense
9581862402013420.000023com.nvidia
959186238386170.000048gov.copyright
9601862350444370.000008com.jacobinmag
9611862339629340.000012net.dwcdn
962186226806430.000046com.accenture
9631862232045290.000008uk.ac.soas
9641862116634450.000010de.test
9651862056816610.000020com.createjs
9661862014032180.000011com.obsproject
9671861997628920.000012org.gnupg
9681861987043180.000008com.washingtonian
9691861939249080.000007uk.co.birminghammail
9701861915445480.000008io.meduza
9711861905840340.000009ru.mid
9721861888212070.000025org.golang
9731861853439300.000009org.cgiar
9741861711624110.000015co.pcdn
9751861630426130.000014com.olark
9761861556210070.000030com.gumroad
9771861365227550.000013ru.tass
9781861351048250.000007com.selfridges
9791861281437700.000009fr.capital
9801861221443910.000008za.co.mg
981186121689110.000033net.atlassian
982186120448440.000035com.redhat
9831861151817490.000019com.indiegogo
9841861143850090.000007edu.utep
9851861085617270.000020org.linuxfoundation
986186104469560.000031com.att
9871860918628900.000012org.transparency
9881860858839180.000009com.encyclopedia
98918606828720.000505com.oculus
990186067726990.000043com.psychologytoday
9911860669830910.000011com.sharefile
992186065041510.000189org.whatwg
993186063547540.000040org.poynter
9941860626833860.000011com.alchemer
995186046487070.000042co.ibb
996186044322860.000095com.caniuse
9971860440227380.000013com.springeropen
9981860438624900.000014studio.flourish
9991860413843730.000008com.googledrive
10001860401444900.000008tw.com.books
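
To work with ranking rows programmatically, a minimal Python sketch might look like the following. It assumes each row of a ranks file holds five whitespace-separated fields (harmonic centrality rank and value, PageRank rank and value, and the name in reverse domain name notation); this column order is inferred from the listing, not confirmed by it. The names RankRow, unreverse, and parse_rank_line are illustrative and not part of cc-webgraph. The three sample rows are taken from the ranking above.

```python
from typing import NamedTuple


class RankRow(NamedTuple):
    harmonic_rank: int
    harmonic_value: float
    pagerank_rank: int
    pagerank_value: float
    name: str  # name converted back to the usual left-to-right notation


def unreverse(reversed_name: str) -> str:
    """Turn a reversed name such as 'uk.gov.scotland' into 'scotland.gov.uk'."""
    return ".".join(reversed(reversed_name.split(".")))


def parse_rank_line(line: str) -> RankRow:
    # Assumed layout: exactly five whitespace-separated fields per row.
    h_rank, h_val, pr_rank, pr_val, rev_name = line.split()
    return RankRow(
        int(h_rank), float(h_val), int(pr_rank), float(pr_val), unreverse(rev_name)
    )


# Sample rows copied from the ranking (ranks 736, 752, and 1000).
sample = [
    "736 18744550 282 0.000096 gov.ftc",
    "752 18733088 3707 0.000010 uk.gov.scotland",
    "1000 18604014 4490 0.000008 tw.com.books",
]
rows = [parse_rank_line(s) for s in sample]

for row in rows:
    print(row.name, row.harmonic_rank, row.pagerank_rank)
```

Because `str.split()` without arguments splits on any run of whitespace, the same function would handle tab-separated and space-aligned rows alike.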

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data proves useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!