We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of July, August and September 2020. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases (e.g., Nov/Dec/Jan 2017-2018 Webgraphs). You may also visit the projects cc-webgraph and cc-pyspark which host all scripts and tools required to construct the graphs.

Host-level graph

The graph consists of 539 million nodes and 3.02 billion edges. It includes dangling nodes, i.e., hosts that have not been crawled but are pointed to by a link on a crawled page. There are 467 million dangling nodes (86.7%), and the largest strongly connected component contains 46 million nodes (8.5%).
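As a rough illustration of the dangling-node count, the following sketch finds nodes without outgoing edges in a toy edge list. Treating "no outgoing edge" as "dangling" is an assumption made for this example (an uncrawled host contributes no outlinks), and the node ids are made up; this is not the pipeline's actual code.

```python
def dangling_nodes(edges, num_nodes):
    """Return the ids of nodes that never appear as the source of an edge."""
    has_outlink = {src for src, _ in edges}
    return [node for node in range(num_nodes) if node not in has_outlink]


# Toy graph with made-up ids: 0 -> 1, 0 -> 2, 1 -> 0. Node 2 has no outlinks.
toy_edges = [(0, 1), (0, 2), (1, 0)]
toy_dangling = dangling_nodes(toy_edges, num_nodes=3)
toy_share = len(toy_dangling) / 3  # fraction of dangling nodes in the toy graph
```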

You can download the graph and the ranks of all 539 million hosts from AWS S3 on the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2020-jul-aug-sep/host/. Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2020-jul-aug-sep/host/ as a prefix to access the files from anywhere.

| Size | File | Description |
|---|---|---|
| 3.32 GB | cc-main-2020-jul-aug-sep-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 12 vertices files |
| 13.7 GB | cc-main-2020-jul-aug-sep-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 24 edges files |
| 5.95 GB | cc-main-2020-jul-aug-sep-host.graph | graph in BVGraph format |
| 2 kB | cc-main-2020-jul-aug-sep-host.properties | |
| 6.76 GB | cc-main-2020-jul-aug-sep-host-t.graph | transpose of the graph (outlinks inverted to inlinks) |
| 2 kB | cc-main-2020-jul-aug-sep-host-t.properties | |
| 1 kB | cc-main-2020-jul-aug-sep-host.stats | WebGraph statistics |
| 7.77 GB | cc-main-2020-jul-aug-sep-host-ranks.txt.gz | harmonic centrality and pagerank |
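The S3 and HTTPS locations given above differ only in their prefix, so one can be derived from the other. A minimal sketch, assuming every object under s3://commoncrawl/ is reachable under https://data.commoncrawl.org/ with the same key (the helper name `s3_to_https` is made up for this example):

```python
S3_PREFIX = "s3://commoncrawl/"
HTTPS_PREFIX = "https://data.commoncrawl.org/"


def s3_to_https(s3_uri: str) -> str:
    """Rewrite an s3://commoncrawl/ URI into the corresponding HTTPS URL."""
    if not s3_uri.startswith(S3_PREFIX):
        raise ValueError(f"not a commoncrawl S3 URI: {s3_uri}")
    return HTTPS_PREFIX + s3_uri[len(S3_PREFIX):]


host_dir = "s3://commoncrawl/projects/hyperlinkgraph/cc-main-2020-jul-aug-sep/host/"
stats_url = s3_to_https(host_dir + "cc-main-2020-jul-aug-sep-host.stats")
```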

Note that the host names are reversed and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
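The reversal can be sketched in a few lines of Python. This is an illustration of the rule stated above, not the code used to build the graph; the function names are made up, and the original `www.` prefix cannot be recovered when converting back.

```python
def reverse_host(host: str) -> str:
    """Strip a leading 'www.' and reverse the order of the dot-separated labels."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def unreverse_host(rev_host: str) -> str:
    """Turn a reversed host name back into the usual notation."""
    return ".".join(reversed(rev_host.split(".")))


example = reverse_host("www.subdomain.example.com")
```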

Domain-level graph

The domain graph was built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org.
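To make the aggregation step concrete, here is a simplified sketch that maps reversed host names to reversed pay-level domains and collapses host edges into domain edges. The suffix set is a tiny hypothetical stand-in for the public suffix list, wildcard and exception rules are ignored, and dropping links within the same domain is an assumption of this example rather than a documented property of the release.

```python
# Hypothetical, tiny stand-in for the public suffix list (the real list is far larger).
TOY_SUFFIXES = {"com", "org", "uk", "co.uk"}


def pay_level_domain(rev_host: str) -> str:
    """Map a reversed host name to its reversed pay-level domain."""
    labels = rev_host.split(".")
    suffix_len = 1  # default: treat the top-level label as the public suffix
    for n in range(1, len(labels) + 1):
        # Rebuild the candidate suffix in normal order, e.g. ["uk", "co"] -> "co.uk".
        if ".".join(reversed(labels[:n])) in TOY_SUFFIXES:
            suffix_len = n
    # The pay-level domain is the public suffix plus one more label.
    return ".".join(labels[: suffix_len + 1])


def aggregate(host_edges):
    """Collapse host-level edges into a set of distinct domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        s, d = pay_level_domain(src), pay_level_domain(dst)
        if s != d:  # assumption: links within one domain are dropped
            domain_edges.add((s, d))
    return domain_edges


toy_domain_edges = aggregate([
    ("com.example.a", "com.example.b"),
    ("com.example.a", "uk.co.bbc.news"),
    ("com.example.b", "uk.co.bbc"),
])
```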

The domain-level graph has 89 million nodes and 1.71 billion edges. 45 million nodes (51%) are dangling nodes, and the largest strongly connected component covers 35 million nodes (39%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2020-jul-aug-sep/domain/ or via https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2020-jul-aug-sep/domain/.

Download files of the Common Crawl Jul/Aug/Sep 2020 domain-level webgraph

| Size | File | Description |
|---|---|---|
| 0.61 GB | cc-main-2020-jul-aug-sep-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩ |
| 6.80 GB | cc-main-2020-jul-aug-sep-domain-edges.txt.gz | edges ⟨from_id, to_id⟩ |
| 3.75 GB | cc-main-2020-jul-aug-sep-domain.graph | graph in BVGraph format |
| 2 kB | cc-main-2020-jul-aug-sep-domain.properties | |
| 3.69 GB | cc-main-2020-jul-aug-sep-domain-t.graph | transpose of the graph |
| 2 kB | cc-main-2020-jul-aug-sep-domain-t.properties | |
| 1 kB | cc-main-2020-jul-aug-sep-domain.stats | WebGraph statistics |
| 1.91 GB | cc-main-2020-jul-aug-sep-domain-ranks.txt.gz | harmonic centrality and pagerank |

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 89 million domain ranks is available for download.
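A ranks listing can be processed line by line. The sketch below assumes a tab-separated layout with five leading columns (harmonic centrality rank, harmonic centrality value, PageRank rank, PageRank value, reversed name) and comment lines starting with `#`; that layout is an assumption of this example and should be checked against the downloaded file. The sample rows are the first three entries of the list that follows.

```python
from typing import NamedTuple


class RankRow(NamedTuple):
    hc_rank: int
    hc_value: float
    pr_rank: int
    pr_value: float
    rev_name: str


def parse_rank_lines(lines):
    """Yield RankRow tuples, skipping blank lines and '#' comment/header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Only the first five fields are used; any extra columns are ignored.
        hc_rank, hc_value, pr_rank, pr_value, rev_name = line.split("\t")[:5]
        yield RankRow(int(hc_rank), float(hc_value), int(pr_rank), float(pr_value), rev_name)


def top_by_pagerank(rows, k):
    """Return the k rows with the best (lowest) PageRank rank."""
    return sorted(rows, key=lambda row: row.pr_rank)[:k]


sample = [
    "1\t32027928\t1\t0.018888\tcom.googleapis",
    "2\t30312944\t3\t0.012001\tcom.facebook",
    "3\t29025948\t2\t0.013237\tcom.google",
]
best_two = [row.rev_name for row in top_by_pagerank(parse_rank_lines(sample), 2)]
```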

Top 1000 domains ranked by harmonic centrality (Jul/Aug/Sep 2020)

| harmonic centrality rank | harmonic centrality value | PageRank rank | PageRank value | reversed hostname |
|---|---|---|---|---|
| 1 | 32027928 | 1 | 0.018888 | com.googleapis |
| 2 | 30312944 | 3 | 0.012001 | com.facebook |
| 3 | 29025948 | 2 | 0.013237 | com.google |
| 4 | 26560472 | 4 | 0.007343 | org.w |
| 5 | 26516534 | 5 | 0.007172 | com.twitter |
| 6 | 26016464 | 6 | 0.006600 | com.youtube |
| 7 | 24614190 | 9 | 0.004795 | com.instagram |
| 8 | 24220712 | 8 | 0.005190 | org.gmpg |
| 9 | 23572970 | 7 | 0.005599 | com.googletagmanager |
| 10 | 23188190 | 11 | 0.003202 | com.linkedin |
| 11 | 22457894 | 15 | 0.002590 | com.gravatar |
| 12 | 22451350 | 10 | 0.003967 | com.cloudflare |
| 13 | 22364152 | 14 | 0.002726 | com.gstatic |
| 14 | 22350042 | 12 | 0.003105 | org.wordpress |
| 15 | 21926906 | 22 | 0.001505 | com.pinterest |
| 16 | 21699168 | 21 | 0.001752 | com.wordpress |
| 17 | 21599006 | 26 | 0.001181 | org.wikipedia |
| 18 | 21538264 | 16 | 0.002431 | com.bootstrapcdn |
| 19 | 21497526 | 18 | 0.001836 | com.apple |
| 20 | 21314410 | 30 | 0.001106 | com.vimeo |
| 21 | 21248994 | 41 | 0.000830 | be.youtu |
| 22 | 21186566 | 20 | 0.001794 | com.jquery |
| 23 | 21081822 | 23 | 0.001444 | com.microsoft |
| 24 | 21073240 | 45 | 0.000773 | com.blogspot |
| 25 | 20994964 | 39 | 0.000952 | com.amazonaws |
| 26 | 20975988 | 46 | 0.000732 | gl.goo |
| 27 | 20971574 | 25 | 0.001384 | com.wp |
| 28 | 20921220 | 47 | 0.000723 | com.amazon |
| 29 | 20788608 | 72 | 0.000439 | com.tumblr |
| 30 | 20716256 | 19 | 0.001804 | com.adobe |
| 31 | 20694562 | 67 | 0.000535 | ly.bit |
| 32 | 20675418 | 34 | 0.001018 | com.google-analytics |
| 33 | 20627694 | 53 | 0.000673 | org.mozilla |
| 34 | 20618998 | 17 | 0.001975 | com.github |
| 35 | 20617620 | 31 | 0.001059 | net.cloudfront |
| 36 | 20579928 | 71 | 0.000449 | com.yahoo |
| 37 | 20571130 | 29 | 0.001127 | com.googlesyndication |
| 38 | 20570586 | 60 | 0.000612 | eu.europa |
| 39 | 20562028 | 52 | 0.000679 | com.flickr |
| 40 | 20560188 | 42 | 0.000818 | net.jsdelivr |
| 41 | 20526264 | 97 | 0.000347 | com.googleusercontent |
| 42 | 20481758 | 62 | 0.000606 | co.t |
| 43 | 20480218 | 109 | 0.000313 | com.reddit |
| 44 | 20451670 | 24 | 0.001419 | com.fontawesome |
| 45 | 20436180 | 83 | 0.000389 | com.weebly |
| 46 | 20387228 | 56 | 0.000628 | com.paypal |
| 47 | 20375802 | 40 | 0.000910 | com.macromedia |
| 48 | 20372972 | 70 | 0.000450 | com.medium |
| 49 | 20370180 | 43 | 0.000808 | com.addthis |
| 50 | 20360678 | 28 | 0.001156 | ru.yandex |
| 51 | 20338498 | 27 | 0.001156 | me.wp |
| 52 | 20331252 | 64 | 0.000559 | org.w3 |
| 53 | 20326560 | 79 | 0.000411 | io.github |
| 54 | 20292836 | 138 | 0.000223 | com.nytimes |
| 55 | 20275824 | 76 | 0.000414 | org.creativecommons |
| 56 | 20274244 | 59 | 0.000615 | org.schema |
| 57 | 20255326 | 150 | 0.000192 | com.forbes |
| 58 | 20246068 | 173 | 0.000151 | com.imgur |
| 59 | 20227930 | 36 | 0.000979 | net.doubleclick |
| 60 | 20219612 | 194 | 0.000133 | uk.co.bbc |
| 61 | 20210924 | 114 | 0.000285 | com.soundcloud |
| 62 | 20171070 | 66 | 0.000548 | com.vk |
| 63 | 20155222 | 195 | 0.000133 | com.cnn |
| 64 | 20142696 | 44 | 0.000803 | org.apache |
| 65 | 20134806 | 63 | 0.000587 | com.whatsapp |
| 66 | 20129582 | 314 | 0.000082 | edu.mit |
| 67 | 20123032 | 180 | 0.000146 | com.imdb |
| 68 | 20118310 | 208 | 0.000124 | net.slideshare |
| 69 | 20116626 | 243 | 0.000101 | com.wsj |
| 70 | 20115768 | 197 | 0.000128 | org.wikimedia |
| 71 | 20089462 | 85 | 0.000388 | com.shopify |
| 72 | 20082204 | 215 | 0.000120 | edu.stanford |
| 73 | 20076684 | 154 | 0.000181 | gov.cdc |
| 74 | 20075632 | 328 | 0.000079 | com.wired |
| 75 | 20069724 | 268 | 0.000094 | com.techcrunch |
| 76 | 20057066 | 255 | 0.000096 | edu.harvard |
| 77 | 20051336 | 353 | 0.000076 | com.appspot |
| 78 | 20051292 | 207 | 0.000124 | net.sourceforge |
| 79 | 20051264 | 257 | 0.000096 | com.oracle |
| 80 | 20051250 | 155 | 0.000177 | int.who |
| 81 | 20050888 | 206 | 0.000124 | com.businessinsider |
| 82 | 20046050 | 137 | 0.000227 | org.archive |
| 83 | 20038198 | 230 | 0.000113 | com.washingtonpost |
| 84 | 20035810 | 250 | 0.000097 | com.live |
| 85 | 20029940 | 164 | 0.000163 | com.bing |
| 86 | 20028210 | 549 | 0.000054 | com.livejournal |
| 87 | 20027622 | 424 | 0.000069 | com.go |
| 88 | 20024666 | 456 | 0.000066 | com.msn |
| 89 | 20019992 | 407 | 0.000072 | uk.co.telegraph |
| 90 | 20009306 | 170 | 0.000154 | com.theguardian |
| 91 | 20002514 | 527 | 0.000056 | edu.cornell |
| 92 | 19997146 | 199 | 0.000128 | org.ietf |
| 93 | 19996714 | 486 | 0.000063 | gov.nasa |
| 94 | 19995476 | 259 | 0.000096 | com.android |
| 95 | 19986252 | 302 | 0.000084 | com.reuters |
| 96 | 19983946 | 51 | 0.000702 | net.fbcdn |
| 97 | 19974890 | 240 | 0.000102 | com.bloomberg |
| 98 | 19966464 | 162 | 0.000164 | com.giphy |
| 99 | 19960428 | 77 | 0.000414 | com.list-manage |
| 100 | 19959046 | 520 | 0.000057 | com.googleblog |
| 101 | 19956558 | 269 | 0.000093 | com.bbc |
| 102 | 19955204 | 409 | 0.000071 | com.slack |
| 103 | 19942056 | 143 | 0.000205 | com.spotify |
| 104 | 19938828 | 591 | 0.000049 | com.zdnet |
| 105 | 19936894 | 48 | 0.000721 | net.facebook |
| 106 | 19935010 | 586 | 0.000050 | com.quora |
| 107 | 19931072 | 126 | 0.000265 | com.ytimg |
| 108 | 19922774 | 444 | 0.000067 | com.myspace |
| 109 | 19922046 | 757 | 0.000038 | edu.umich |
| 110 | 19920178 | 715 | 0.000040 | edu.upenn |
| 111 | 19917482 | 151 | 0.000185 | gov.nih |
| 112 | 19907886 | 344 | 0.000077 | com.usatoday |
| 113 | 19903896 | 654 | 0.000045 | com.economist |
| 114 | 19903722 | 313 | 0.000082 | com.cnbc |
| 115 | 19902700 | 308 | 0.000083 | com.example |
| 116 | 19896552 | 525 | 0.000056 | com.pixabay |
| 117 | 19895014 | 418 | 0.000070 | net.researchgate |
| 118 | 19882790 | 449 | 0.000066 | com.latimes |
| 119 | 19881164 | 188 | 0.000138 | com.blogger |
| 120 | 19870046 | 387 | 0.000075 | org.python |
| 121 | 19864804 | 65 | 0.000555 | com.wix |
| 122 | 19860760 | 433 | 0.000068 | com.githubusercontent |
| 123 | 19858732 | 693 | 0.000042 | org.ieee |
| 124 | 19854254 | 499 | 0.000061 | com.mashable |
| 125 | 19850918 | 571 | 0.000052 | edu.berkeley |
| 126 | 19847554 | 135 | 0.000241 | com.youtube-nocookie |
| 127 | 19845130 | 160 | 0.000167 | com.issuu |
| 128 | 19843068 | 218 | 0.000118 | org.acm |
| 129 | 19839736 | 834 | 0.000036 | org.chromium |
| 130 | 19839550 | 235 | 0.000106 | uk.co.google |
| 131 | 19835790 | 551 | 0.000054 | org.arxiv |
| 132 | 19833020 | 246 | 0.000099 | net.behance |
| 133 | 19832682 | 291 | 0.000086 | org.npr |
| 134 | 19831994 | 108 | 0.000320 | com.unpkg |
| 135 | 19831136 | 884 | 0.000034 | com.arstechnica |
| 136 | 19826840 | 213 | 0.000121 | com.unsplash |
| 137 | 19822884 | 341 | 0.000078 | com.outlook |
| 138 | 19822670 | 110 | 0.000303 | de.google |
| 139 | 19812430 | 54 | 0.000654 | com.googleadservices |
| 140 | 19810872 | 347 | 0.000077 | com.prnewswire |
| 141 | 19806458 | 678 | 0.000043 | edu.columbia |
| 142 | 19805382 | 171 | 0.000153 | me.t |
| 143 | 19804886 | 297 | 0.000085 | com.dribbble |
| 144 | 19804142 | 256 | 0.000096 | com.squarespace |
| 145 | 19799032 | 139 | 0.000215 | gov.privacyshield |
| 146 | 19798806 | 306 | 0.000083 | com.huffingtonpost |
| 147 | 19797964 | 260 | 0.000096 | com.bandcamp |
| 148 | 19795112 | 398 | 0.000074 | com.time |
| 149 | 19793874 | 37 | 0.000975 | com.baidu |
| 150 | 19792082 | 616 | 0.000048 | com.gitlab |
| 151 | 19790406 | 334 | 0.000079 | com.nationalgeographic |
| 152 | 19788214 | 443 | 0.000067 | com.nature |
| 153 | 19785178 | 794 | 0.000037 | com.stackexchange |
| 154 | 19782114 | 179 | 0.000147 | gle.forms |
| 155 | 19781676 | 258 | 0.000096 | org.ampproject |
| 156 | 19778534 | 548 | 0.000054 | com.fortune |
| 157 | 19777902 | 813 | 0.000036 | com.git-scm |
| 158 | 19776608 | 33 | 0.001030 | com.wixstatic |
| 159 | 19774030 | 771 | 0.000038 | com.qz |
| 160 | 19772390 | 281 | 0.000089 | com.wiley |
| 161 | 19772268 | 646 | 0.000046 | au.net.abc |
| 162 | 19770930 | 638 | 0.000046 | edu.yale |
| 163 | 19769582 | 428 | 0.000068 | com.meetup |
| 164 | 19767876 | 468 | 0.000064 | com.ted |
| 165 | 19761386 | 1160 | 0.000026 | com.hatenablog |
| 166 | 19759052 | 448 | 0.000066 | com.patreon |
| 167 | 19757472 | 283 | 0.000089 | com.disqus |
| 168 | 19756748 | 936 | 0.000032 | edu.ucla |
| 169 | 19753998 | 147 | 0.000195 | com.dropbox |
| 170 | 19753380 | 168 | 0.000158 | com.yelp |
| 171 | 19750678 | 271 | 0.000093 | org.un |
| 172 | 19746384 | 212 | 0.000122 | com.twimg |
| 173 | 19743118 | 254 | 0.000096 | org.drupal |
| 174 | 19741474 | 689 | 0.000042 | org.bitbucket |
| 175 | 19736540 | 422 | 0.000069 | com.statista |
| 176 | 19735440 | 903 | 0.000033 | uk.ac.cam |
| 177 | 19731940 | 718 | 0.000040 | com.evernote |
| 178 | 19731916 | 682 | 0.000043 | com.newyorker |
| 179 | 19725638 | 603 | 0.000049 | com.buzzfeed |
| 180 | 19719544 | 606 | 0.000049 | me.about |
| 181 | 19718654 | 722 | 0.000040 | com.mysql |
| 182 | 19716804 | 850 | 0.000035 | com.thenextweb |
| 183 | 19715420 | 495 | 0.000061 | com.theatlantic |
| 184 | 19710920 | 279 | 0.000091 | com.sciencedirect |
| 185 | 19710826 | 403 | 0.000073 | com.getpocket |
| 186 | 19705326 | 669 | 0.000043 | uk.co.blogspot |
| 187 | 19702126 | 1293 | 0.000023 | com.tinypic |
| 188 | 19696730 | 450 | 0.000066 | com.booking |
| 189 | 19695652 | 514 | 0.000058 | com.xinhuanet |
| 190 | 19694904 | 743 | 0.000039 | org.weforum |
| 191 | 19694268 | 247 | 0.000098 | gov.ca |
| 192 | 19692322 | 602 | 0.000049 | gov.loc |
| 193 | 19690998 | 1282 | 0.000023 | org.postgresql |
| 194 | 19689908 | 828 | 0.000036 | edu.princeton |
| 195 | 19687954 | 239 | 0.000103 | uk.co.amazon |
| 196 | 19685942 | 480 | 0.000063 | com.dailymotion |
| 197 | 19679672 | 1452 | 0.000021 | ru.narod |
| 198 | 19678926 | 189 | 0.000138 | com.xing |
| 199 | 19675914 | 879 | 0.000034 | edu.jhu |
| 200 | 19673670 | 500 | 0.000060 | gov.whitehouse |
| 201 | 19671846 | 665 | 0.000044 | org.worldbank |
| 202 | 19668706 | 1365 | 0.000022 | org.eclipse |
| 203 | 19667770 | 400 | 0.000073 | com.springer |
| 204 | 19667684 | 445 | 0.000067 | com.nypost |
| 205 | 19665872 | 316 | 0.000081 | com.ft |
| 206 | 19660930 | 61 | 0.000606 | com.fb |
| 207 | 19658986 | 204 | 0.000125 | com.feedburner |
| 208 | 19658394 | 826 | 0.000036 | org.cambridge |
| 209 | 19654762 | 476 | 0.000063 | uk.co.dailymail |
| 210 | 19654386 | 766 | 0.000038 | edu.washington |
| 211 | 19654242 | 496 | 0.000061 | org.eff |
| 212 | 19653044 | 32 | 0.001054 | com.qq |
| 213 | 19650144 | 473 | 0.000064 | com.goodreads |
| 214 | 19649524 | 264 | 0.000095 | org.doi |
| 215 | 19649502 | 512 | 0.000058 | com.w3schools |
| 216 | 19641242 | 1311 | 0.000023 | edu.virginia |
| 217 | 19641212 | 440 | 0.000067 | com.googlecode |
| 218 | 19638348 | 633 | 0.000047 | com.vice |
| 219 | 19633128 | 506 | 0.000059 | com.force |
| 220 | 19632976 | 723 | 0.000040 | com.trello |
| 221 | 19632780 | 836 | 0.000035 | com.about |
| 222 | 19630562 | 523 | 0.000056 | com.inc |
| 223 | 19629482 | 453 | 0.000066 | com.scribd |
| 224 | 19629368 | 2053 | 0.000016 | com.wikidot |
| 225 | 19628436 | 619 | 0.000048 | org.semver |
| 226 | 19614496 | 607 | 0.000049 | com.cbsnews |
| 227 | 19607794 | 651 | 0.000045 | com.withgoogle |
| 228 | 19605512 | 146 | 0.000196 | me.line |
| 229 | 19603410 | 2089 | 0.000016 | com.googlesource |
| 230 | 19601476 | 219 | 0.000118 | org.iana |
| 231 | 19601452 | 546 | 0.000054 | gov.usda |
| 232 | 19599800 | 309 | 0.000083 | com.tinyurl |
| 233 | 19598290 | 1090 | 0.000027 | com.techradar |
| 234 | 19597674 | 858 | 0.000035 | com.dropboxusercontent |
| 235 | 19597446 | 384 | 0.000076 | com.ibm |
| 236 | 19595200 | 1284 | 0.000023 | co.elastic |
| 237 | 19594024 | 289 | 0.000087 | com.squareup |
| 238 | 19593336 | 1434 | 0.000021 | org.linuxfoundation |
| 239 | 19592388 | 1134 | 0.000026 | org.coursera |
| 240 | 19589830 | 1027 | 0.000029 | gov.fbi |
| 241 | 19588284 | 1158 | 0.000026 | edu.unc |
| 242 | 19586008 | 705 | 0.000041 | com.vox |
| 243 | 19583350 | 193 | 0.000134 | de.amazon |
| 244 | 19583096 | 550 | 0.000054 | uk.co.independent |
| 245 | 19580554 | 1423 | 0.000021 | ms.1drv |
| 246 | 19578950 | 383 | 0.000076 | com.digg |
| 247 | 19567612 | 1393 | 0.000022 | org.kernel |
| 248 | 19563948 | 113 | 0.000287 | com.sharethis |
| 249 | 19563468 | 751 | 0.000039 | org.d3js |
| 250 | 19557490 | 801 | 0.000037 | gov.fcc |
| 251 | 19557292 | 1026 | 0.000029 | com.hollywoodreporter |
| 252 | 19556258 | 1369 | 0.000022 | com.howstuffworks |
| 253 | 19553700 | 430 | 0.000068 | com.cnet |
| 254 | 19552068 | 804 | 0.000037 | com.foxnews |
| 255 | 19547134 | 152 | 0.000183 | com.addtoany |
| 256 | 19547006 | 644 | 0.000046 | com.indiatimes |
| 257 | 19546928 | 995 | 0.000029 | com.steamcommunity |
| 258 | 19546864 | 1105 | 0.000026 | cn.com.chinadaily |
| 259 | 19545628 | 584 | 0.000050 | com.psychologytoday |
| 260 | 19544130 | 823 | 0.000036 | uk.co.guardian |
| 261 | 19543920 | 1463 | 0.000021 | it.scoop |
| 262 | 19543754 | 133 | 0.000247 | com.mailchimp |
| 263 | 19542234 | 837 | 0.000035 | com.slate |
| 264 | 19542214 | 153 | 0.000182 | com.opera |
| 265 | 19538412 | 589 | 0.000050 | com.mckinsey |
| 266 | 19536816 | 1020 | 0.000029 | com.sap |
| 267 | 19536418 | 2605 | 0.000013 | org.wikiquote |
| 268 | 19534334 | 307 | 0.000083 | com.bitly |
| 269 | 19533308 | 627 | 0.000047 | com.mozilla |
| 270 | 19533054 | 262 | 0.000095 | jp.ameblo |
| 271 | 19531260 | 735 | 0.000039 | org.sciencemag |
| 272 | 19528246 | 116 | 0.000284 | com.paypalobjects |
| 273 | 19528108 | 2345 | 0.000014 | org.wikibooks |
| 274 | 19527104 | 176 | 0.000151 | com.amazon-adsystem |
| 275 | 19526948 | 688 | 0.000042 | gov.noaa |
| 276 | 19524868 | 305 | 0.000083 | com.netdna-ssl |
| 277 | 19524544 | 310 | 0.000083 | com.nbcnews |
| 278 | 19523330 | 989 | 0.000030 | com.target |
| 279 | 19522776 | 1523 | 0.000020 | com.instructables |
| 280 | 19517526 | 975 | 0.000030 | edu.umn |
| 281 | 19516530 | 965 | 0.000031 | com.merriam-webster |
| 282 | 19516260 | 1431 | 0.000021 | hk.com.google |
| 283 | 19514852 | 185 | 0.000140 | com.tripadvisor |
| 284 | 19514608 | 2377 | 0.000014 | com.diigo |
| 285 | 19503916 | 497 | 0.000061 | ca.google |
| 286 | 19499262 | 236 | 0.000106 | com.wpengine |
| 287 | 19499246 | 1029 | 0.000028 | com.sun |
| 288 | 19496562 | 1189 | 0.000025 | com.digitaltrends |
| 289 | 19496340 | 391 | 0.000075 | com.stumbleupon |
| 290 | 19491846 | 115 | 0.000284 | com.weibo |
| 291 | 19491638 | 1626 | 0.000019 | com.ign |
| 292 | 19491210 | 1314 | 0.000023 | com.mercurynews |
| 293 | 19490964 | 1352 | 0.000022 | de.zeit |
| 294 | 19490636 | 229 | 0.000114 | com.etsy |
| 295 | 19489106 | 797 | 0.000037 | uk.ac.ox |
| 296 | 19487454 | 284 | 0.000089 | com.optimizely |
| 297 | 19485106 | 73 | 0.000425 | net.akamaihd |
| 298 | 19484368 | 1207 | 0.000025 | net.speedtest |
| 299 | 19484284 | 1522 | 0.000020 | org.greenpeace |
| 300 | 19483622 | 1553 | 0.000020 | net.seesaa |
| 301 | 19479450 | 720 | 0.000040 | au.com.google |
| 302 | 19478604 | 904 | 0.000033 | de.spiegel |
| 303 | 19476336 | 1077 | 0.000027 | com.podbean |
| 304 | 19475142 | 628 | 0.000047 | org.pbs |
| 305 | 19474722 | 516 | 0.000058 | com.gofundme |
| 306 | 19474484 | 416 | 0.000070 | com.kickstarter |
| 307 | 19473590 | 1340 | 0.000022 | com.urbandictionary |
| 308 | 19472422 | 472 | 0.000064 | org.pewresearch |
| 309 | 19471320 | 519 | 0.000057 | com.bigcommerce |
| 310 | 19467912 | 2137 | 0.000015 | de.bild |
| 311 | 19467240 | 231 | 0.000112 | com.eepurl |
| 312 | 19465300 | 515 | 0.000058 | com.theverge |
| 313 | 19464792 | 273 | 0.000092 | com.stackoverflow |
| 314 | 19464598 | 926 | 0.000032 | com.politico |
| 315 | 19463036 | 811 | 0.000036 | co.ibb |
| 316 | 19462394 | 332 | 0.000079 | it.google |
| 317 | 19462162 | 2110 | 0.000016 | ly.visual |
| 318 | 19461840 | 955 | 0.000031 | org.unicef |
| 319 | 19460932 | 2020 | 0.000016 | org.tensorflow |
| 320 | 19457592 | 1688 | 0.000018 | com.itv |
| 321 | 19457150 | 1013 | 0.000029 | com.lifehacker |
| 322 | 19456512 | 106 | 0.000334 | com.stripe |
| 323 | 19456272 | 1349 | 0.000022 | edu.msu |
| 324 | 19455412 | 312 | 0.000083 | net.windows |
| 325 | 19453374 | 805 | 0.000037 | edu.academia |
| 326 | 19450284 | 1391 | 0.000022 | com.storify |
| 327 | 19449638 | 1257 | 0.000024 | com.crunchbase |
| 328 | 19449386 | 595 | 0.000049 | com.tandfonline |
| 329 | 19449132 | 1958 | 0.000017 | com.lego |
| 330 | 19444682 | 1187 | 0.000025 | com.jetbrains |
| 331 | 19443796 | 677 | 0.000043 | gov.senate |
| 332 | 19443664 | 855 | 0.000035 | com.chicagotribune |
| 333 | 19443234 | 2301 | 0.000014 | com.rottentomatoes |
| 334 | 19440224 | 770 | 0.000038 | ca.cbc |
| 335 | 19439934 | 205 | 0.000125 | com.eventbrite |
| 336 | 19439496 | 1273 | 0.000023 | hk.hku |
| 337 | 19436402 | 1035 | 0.000028 | edu.wisc |
| 338 | 19436104 | 691 | 0.000042 | com.libsyn |
| 339 | 19435742 | 1051 | 0.000028 | edu.northwestern |
| 340 | 19433212 | 944 | 0.000031 | com.scientificamerican |
| 341 | 19432798 | 1043 | 0.000028 | edu.uchicago |
| 342 | 19431182 | 1288 | 0.000023 | uk.co.wired |
| 343 | 19425546 | 190 | 0.000137 | jp.co.google |
| 344 | 19424346 | 2002 | 0.000016 | org.maven |
| 345 | 19423732 | 1030 | 0.000028 | com.mediafire |
| 346 | 19423350 | 415 | 0.000070 | me.telegram |
| 347 | 19418440 | 396 | 0.000074 | com.criteo |
| 348 | 19417208 | 357 | 0.000076 | fr.google |
| 349 | 19417038 | 664 | 0.000044 | us.icio |
| 350 | 19416402 | 1477 | 0.000020 | com.deadline |
| 351 | 19415808 | 640 | 0.000046 | com.sagepub |
| 352 | 19414256 | 730 | 0.000039 | com.ecwid |
| 353 | 19413466 | 1275 | 0.000023 | org.aclu |
| 354 | 19413258 | 576 | 0.000051 | com.typepad |
| 355 | 19412168 | 471 | 0.000064 | com.photobucket |
| 356 | 19407294 | 533 | 0.000055 | com.oup |
| 357 | 19407168 | 1199 | 0.000025 | com.reverbnation |
| 358 | 19406968 | 1514 | 0.000020 | de.mpg |
| 359 | 19405330 | 1389 | 0.000022 | edu.rutgers |
| 360 | 19404790 | 1067 | 0.000027 | com.scmp |
| 361 | 19403976 | 81 | 0.000392 | net.jsfiddle |
| 362 | 19403692 | 421 | 0.000069 | com.calendly |
| 363 | 19403618 | 844 | 0.000035 | com.sciencedaily |
| 364 | 19403468 | 727 | 0.000039 | gov.justice |
| 365 | 19400830 | 575 | 0.000051 | gov.hhs |
| 366 | 19398258 | 919 | 0.000032 | com.theconversation |
| 367 | 19397596 | 991 | 0.000030 | com.apnews |
| 368 | 19397442 | 938 | 0.000032 | com.huffpost |
| 369 | 19394934 | 1518 | 0.000020 | com.newscientist |
| 370 | 19394656 | 608 | 0.000049 | org.openstreetmap |
| 371 | 19393300 | 1287 | 0.000023 | com.aljazeera |
| 372 | 19393230 | 216 | 0.000119 | com.hubspot |
| 373 | 19390018 | 645 | 0.000046 | gov.house |
| 374 | 19388118 | 2682 | 0.000012 | uk.co.timesonline |
| 375 | 19388034 | 2564 | 0.000013 | com.space |
| 376 | 19383910 | 700 | 0.000041 | com.pinimg |
| 377 | 19383504 | 432 | 0.000068 | page.g |
| 378 | 19381990 | 1241 | 0.000024 | com.sky |
| 379 | 19381844 | 866 | 0.000035 | gov.congress |
| 380 | 19381026 | 912 | 0.000033 | com.500px |
| 381 | 19380632 | 1217 | 0.000024 | org.wiktionary |
| 382 | 19380340 | 958 | 0.000031 | com.ssrn |
| 383 | 19379742 | 1709 | 0.000018 | edu.bu |
| 384 | 19377640 | 1757 | 0.000018 | gov.cia |
| 385 | 19375740 | 214 | 0.000120 | org.bbb |
| 386 | 19375634 | 1438 | 0.000021 | com.foxbusiness |
| 387 | 19371814 | 624 | 0.000047 | ru.gov |
| 388 | 19371056 | 1598 | 0.000019 | ca.mcgill |
| 389 | 19367926 | 790 | 0.000037 | com.qualtrics |
| 390 | 19366054 | 1290 | 0.000023 | org.semanticscholar |
| 391 | 19365778 | 761 | 0.000038 | site.business |
| 392 | 19365760 | 267 | 0.000094 | ru.ok |
| 393 | 19363798 | 977 | 0.000030 | edu.si |
| 394 | 19363758 | 887 | 0.000034 | br.com.google |
| 395 | 19363688 | 847 | 0.000035 | co.g |
| 396 | 19363204 | 1021 | 0.000029 | uk.co.thetimes |
| 397 | 19362122 | 2663 | 0.000012 | com.discovermagazine |
| 398 | 19359920 | 182 | 0.000142 | us.zoom |
| 399 | 19359492 | 889 | 0.000034 | org.fao |
| 400 | 19359352 | 683 | 0.000043 | org.change |
| 401 | 19357866 | 1469 | 0.000020 | com.salon |
| 402 | 19356650 | 228 | 0.000114 | com.aliyuncs |
| 403 | 19356280 | 997 | 0.000029 | com.thehill |
| 404 | 19354818 | 973 | 0.000030 | gov.usgs |
| 405 | 19351584 | 298 | 0.000085 | com.ebay |
| 406 | 19350988 | 1222 | 0.000024 | com.nikkei |
| 407 | 19350142 | 338 | 0.000078 | com.rawgit |
| 408 | 19349660 | 578 | 0.000051 | it.placehold |
| 409 | 19348824 | 157 | 0.000173 | com.wixsite |
| 410 | 19348122 | 1238 | 0.000024 | com.smithsonianmag |
| 411 | 19346552 | 758 | 0.000038 | org.oecd |
| 412 | 19346514 | 1088 | 0.000027 | ee.linktr |
| 413 | 19345254 | 3312 | 0.000011 | com.openai |
| 414 | 19342288 | 1048 | 0.000028 | uk.co.mirror |
| 415 | 19341656 | 679 | 0.000043 | com.deviantart |
| 416 | 19341332 | 1576 | 0.000019 | org.phys |
| 417 | 19340598 | 413 | 0.000070 | tv.twitch |
| 418 | 19340138 | 404 | 0.000072 | com.mapbox |
| 419 | 19335246 | 1546 | 0.000020 | ca.sfu |
| 420 | 19332464 | 2754 | 0.000012 | com.instapaper |
| 421 | 19330656 | 244 | 0.000100 | org.gnu |
| 422 | 19330504 | 2115 | 0.000016 | au.edu.unimelb |
| 423 | 19328724 | 1044 | 0.000028 | int.coe |
| 424 | 19328320 | 2078 | 0.000016 | org.nobelprize |
| 425 | 19328286 | 667 | 0.000043 | pl.google |
| 426 | 19327680 | 1333 | 0.000022 | com.irishtimes |
| 427 | 19327578 | 293 | 0.000086 | com.office |
| 428 | 19327536 | 1962 | 0.000017 | org.torproject |
| 429 | 19324936 | 484 | 0.000063 | net.imgix |
| 430 | 19324628 | 1281 | 0.000023 | uk.ac.ucl |
| 431 | 19320926 | 1054 | 0.000028 | org.ohchr |
| 432 | 19318772 | 1213 | 0.000025 | com.strikingly |
| 433 | 19315502 | 509 | 0.000059 | org.hbr |
| 434 | 19315040 | 1411 | 0.000021 | uk.co.metro |
| 435 | 19314304 | 123 | 0.000270 | com.statcounter |
| 436 | 19313468 | 972 | 0.000030 | gov.dhs |
| 437 | 19313380 | 287 | 0.000088 | com.thedailybeast |
| 438 | 19313234 | 1811 | 0.000017 | com.bankofamerica |
| 439 | 19312534 | 1265 | 0.000024 | com.buzzsprout |
| 440 | 19311940 | 863 | 0.000035 | gov.nps |
| 441 | 19309868 | 2426 | 0.000014 | au.com.theage |
| 442 | 19307472 | 933 | 0.000032 | com.aweber |
| 443 | 19306766 | 1557 | 0.000020 | blog.home |
| 444 | 19305448 | 848 | 0.000035 | gov.bls |
| 445 | 19305296 | 490 | 0.000062 | edu.nyu |
| 446 | 19304346 | 2087 | 0.000016 | com.oxforddictionaries |
| 447 | 19304074 | 1162 | 0.000025 | gov.nyc |
| 448 | 19303568 | 93 | 0.000356 | org.reactjs |
| 449 | 19302778 | 1382 | 0.000022 | au.com.news |
| 450 | 19300882 | 2291 | 0.000014 | sg.edu.nus |
| 451 | 19299900 | 1429 | 0.000021 | com.flipboard |
| 452 | 19299896 | 481 | 0.000063 | com.scorecardresearch |
| 453 | 19298010 | 2517 | 0.000013 | com.dummies |
| 454 | 19295840 | 2465 | 0.000013 | org.rsc |
| 455 | 19295472 | 1010 | 0.000029 | com.britannica |
| 456 | 19294984 | 714 | 0.000040 | gov.state |
| 457 | 19294216 | 1700 | 0.000018 | org.gutenberg |
| 458 | 19292892 | 3565 | 0.000010 | fm.ask |
| 459 | 19290866 | 2970 | 0.000011 | com.pearltrees |
| 460 | 19289990 | 793 | 0.000037 | com.zapier |
| 461 | 19286494 | 2562 | 0.000013 | com.mystrikingly |
| 462 | 19284092 | 876 | 0.000034 | com.cctv |
| 463 | 19283500 | 816 | 0.000036 | com.healthline |
| 464 | 19283044 | 1955 | 0.000017 | com.chrome |
| 465 | 19282638 | 1484 | 0.000020 | com.rt |
| 466 | 19282550 | 967 | 0.000031 | com.newsweek |
| 467 | 19280538 | 2362 | 0.000014 | com.biography |
| 468 | 19279646 | 1005 | 0.000029 | ch.google |
| 469 | 19270504 | 1412 | 0.000021 | com.ifttt |
| 470 | 19270238 | 1584 | 0.000019 | com.axios |
| 471 | 19270042 | 466 | 0.000065 | es.google |
| 472 | 19269658 | 882 | 0.000034 | au.gov.nsw |
| 473 | 19267444 | 3483 | 0.000010 | hk.edu.cuhk |
| 474 | 19267150 | 862 | 0.000035 | com.stitcher |
| 475 | 19267000 | 2520 | 0.000013 | com.boredpanda |
| 476 | 19265582 | 1192 | 0.000025 | fr.lemonde |
| 477 | 19263992 | 554 | 0.000053 | com.steampowered |
| 478 | 19263878 | 1055 | 0.000028 | org.jstor |
| 479 | 19262150 | 1335 | 0.000022 | org.imf |
| 480 | 19261918 | 873 | 0.000034 | com.venturebeat |
| 481 | 19261196 | 825 | 0.000036 | org.poynter |
| 482 | 19259574 | 1684 | 0.000018 | com.straitstimes |
| 483 | 19259452 | 3390 | 0.000010 | com.chosun |
| 484 | 19259322 | 1502 | 0.000020 | edu.asu |
| 485 | 19258762 | 2351 | 0.000014 | io.gitlab |
| 486 | 19256810 | 956 | 0.000031 | ru.google |
| 487 | 19255996 | 952 | 0.000031 | sg.com.google |
| 488 | 19253798 | 1331 | 0.000022 | uk.co.standard |
| 489 | 19252906 | 612 | 0.000048 | de.gesetze-im-internet |
| 490 | 19251516 | 948 | 0.000031 | gov.archives |
| 491 | 19250270 | 2385 | 0.000014 | th.co.google |
| 492 | 19249730 | 423 | 0.000069 | io.codepen |
| 493 | 19248930 | 3033 | 0.000011 | com.nola |
| 494 | 19248894 | 2023 | 0.000016 | edu.gmu |
| 495 | 19245246 | 2836 | 0.000012 | app.netlify |
| 496 | 19245158 | 1116 | 0.000026 | com.wikia |
| 497 | 19242656 | 1353 | 0.000022 | com.history |
| 498 | 19242160 | 1007 | 0.000029 | com.thelancet |
| 499 | 19241830 | 2918 | 0.000011 | com.coca-colacompany |
| 500 | 19240640 | 2654 | 0.000012 | google.ai |
| 501 | 19240600 | 856 | 0.000035 | com.freepik |
| 502 | 19240430 | 1548 | 0.000020 | com.buzzfeednews |
| 503 | 19238648 | 2894 | 0.000012 | org.cato |
| 504 | 19237700 | 431 | 0.000068 | net.datatables |
| 505 | 19237456 | 501 | 0.000060 | com.rackcdn |
| 506 | 19236168 | 1590 | 0.000019 | gov.supremecourt |
| 507 | 19233302 | 2534 | 0.000013 | edu.byu |
| 508 | 19233268 | 642 | 0.000046 | fr.amazon |
| 509 | 19232920 | 2872 | 0.000012 | tw.blogspot |
| 510 | 19231944 | 803 | 0.000037 | in.co.google |
| 511 | 19231530 | 1977 | 0.000017 | org.edx |
| 512 | 19231228 | 1309 | 0.000023 | com.tunein |
| 513 | 19231156 | 1779 | 0.000018 | org.ocks |
| 514 | 19230478 | 522 | 0.000057 | nl.google |
| 515 | 19228370 | 555 | 0.000053 | com.gmail |
| 516 | 19227068 | 2398 | 0.000014 | com.nationalpost |
| 517 | 19226910 | 1867 | 0.000017 | edu.ucsb |
| 518 | 19226418 | 2383 | 0.000014 | edu.nd |
| 519 | 19226392 | 1372 | 0.000022 | com.dw |
| 520 | 19226256 | 127 | 0.000262 | com.jimdo |
| 521 | 19225860 | 2412 | 0.000014 | no.uio |
| 522 | 19225400 | 1006 | 0.000029 | google.blog |
| 523 | 19222398 | 1409 | 0.000021 | cn.cntv |
| 524 | 19222164 | 3285 | 0.000011 | cn.org.china |
| 525 | 19221136 | 1639 | 0.000019 | org.unwomen |
| 526 | 19218950 | 946 | 0.000031 | com.airtable |
| 527 | 19217788 | 2510 | 0.000013 | edu.uoregon |
| 528 | 19215376 | 2172 | 0.000015 | org.britishcouncil |
| 529 | 19214674 | 2668 | 0.000012 | org.icrc |
| 530 | 19214462 | 951 | 0.000031 | com.gallup |
| 531 | 19213378 | 2265 | 0.000015 | ru.kremlin |
| 532 | 19212894 | 1332 | 0.000022 | com.globalsign |
| 533 | 19210850 | 875 | 0.000034 | gov.uspto |
| 534 | 19210492 | 959 | 0.000031 | edu.psu |
| 535 | 19210022 | 1509 | 0.000020 | com.penguinrandomhouse |
| 536 | 19209318 | 1345 | 0.000022 | com.netdna-cdn |
| 537 | 19208686 | 3269 | 0.000011 | is.archive |
| 538 | 19208344 | 1531 | 0.000020 | uk.ac.lse |
| 539 | 19207952 | 2503 | 0.000013 | fi.helsinki |
| 540 | 19207620 | 2042 | 0.000016 | edu.pitt |
| 541 | 19207236 | 2170 | 0.000015 | net.openid |
| 542 | 19206256 | 1155 | 0.000026 | edu.brookings |
| 543 | 19205290 | 786 | 0.000037 | com.imageshack |
| 544 | 19204770 | 172 | 0.000152 | com.npmjs |
| 545 | 19204486 | 3290 | 0.000011 | de.diplo |
| 546 | 19204380 | 1956 | 0.000017 | edu.unl |
| 547 | 19203832 | 1544 | 0.000020 | edu.georgetown |
| 548 | 19203210 | 2125 | 0.000015 | org.metmuseum |
| 549 | 19202750 | 1240 | 0.000024 | org.nejm |
| 550 | 19202244 | 726 | 0.000040 | com.adage |
| 551 | 19200434 | 1990 | 0.000017 | com.channel4 |
| 552 | 19200290 | 1511 | 0.000020 | com.findlaw |
| 553 | 19200030 | 2224 | 0.000015 | com.france24 |
| 554 | 19198938 | 282 | 0.000089 | net.php |
| 555 | 19198698 | 1784 | 0.000017 | com.csmonitor |
| 556 | 19197866 | 419 | 0.000069 | com.proofpoint |
| 557 | 19195320 | 192 | 0.000135 | com.iubenda |
| 558 | 19194372 | 1011 | 0.000029 | gov.treasury |
| 559 | 19194028 | 1708 | 0.000018 | com.euronews |
| 560 | 19191446 | 2286 | 0.000014 | com.thoughtco |
| 561 | 19190136 | 3742 | 0.000009 | com.doodlekit |
| 562 | 19189862 | 107 | 0.000320 | com.godaddy |
| 563 | 19189334 | 1298 | 0.000023 | edu.duke |
| 564 | 19188652 | 2071 | 0.000016 | com.foreignpolicy |
| 565 | 19185118 | 1996 | 0.000017 | org.documentcloud |
| 566 | 19183756 | 1300 | 0.000023 | com.livescience |
| 567 | 19183706 | 2508 | 0.000013 | com.upi |
| 568 | 19183104 | 2085 | 0.000016 | com.gq |
| 569 | 19182260 | 178 | 0.000148 | com.zendesk |
| 570 | 19182074 | 3020 | 0.000011 | com.authorstream |
| 571 | 19182074 | 3915 | 0.000009 | com.mysanantonio |
| 572 | 19181694 | 4133 | 0.000008 | tw.edu.sinica |
| 573 | 19177894 | 2719 | 0.000012 | org.wikisource |
| 574 | 19177382 | 2220 | 0.000015 | com.insider |
| 575 | 19177180 | 851 | 0.000035 | gov.nist |
| 576 | 19177000 | 1625 | 0.000019 | com.thestar |
| 577 | 19176642 | 181 | 0.000145 | jp.co.yahoo |
| 578 | 19174546 | 1304 | 0.000023 | au.com.smh |
| 579 | 19174028 | 2025 | 0.000016 | org.ncsl |
| 580 | 19173800 | 4252 | 0.000008 | hk.edu.cityu |
| 581 | 19173744 | 3349 | 0.000010 | com.sina |
| 582 | 19173108 | 2197 | 0.000015 | ie.independent |
| 583 | 19172266 | 2156 | 0.000015 | edu.uky |
| 584 | 19171704 | 96 | 0.000349 | me.ogp |
| 585 | 19170936 | 3413 | 0.000010 | uk.ac.sussex |
| 586 | 19170792 | 1755 | 0.000018 | gov.doc |
| 587 | 19170704 | 131 | 0.000250 | org.networkadvertising |
| 588 | 19169566 | 320 | 0.000080 | io.shields |
| 589 | 19168058 | 649 | 0.000045 | gov.usa |
| 590 | 19166990 | 4291 | 0.000008 | org.china-embassy |
| 591 | 19166810 | 3137 | 0.000011 | com.udn |
| 592 | 19163774 | 161 | 0.000166 | ru.mail |
| 593 | 19163712 | 3474 | 0.000010 | com.worldatlas |
| 594 | 19163522 | 505 | 0.000060 | com.netflix |
| 595 | 19163254 | 857 | 0.000035 | com.thinkwithgoogle |
| 596 | 19162356 | 1441 | 0.000021 | gov.defense |
| 597 | 19161952 | 1318 | 0.000023 | tw.com.google |
| 598 | 19160826 | 1604 | 0.000019 | org.hrw |
| 599 | 19159812 | 1495 | 0.000020 | com.asahi |
| 600 | 19159570 | 785 | 0.000037 | io.readthedocs |
| 601 | 19158768 | 2688 | 0.000012 | org.freedomhouse |
| 602 | 19158654 | 1413 | 0.000021 | tv.ustream |
| 603 | 19157822 | 893 | 0.000034 | org.mediawiki |
| 604 | 19156446 | 1715 | 0.000018 | org.pypi |
| 605 | 19151800 | 3028 | 0.000011 | org.adb |
| 606 | 19151406 | 2099 | 0.000016 | fr.leparisien |
| 607 | 19151152 | 2615 | 0.000013 | com.abc7news |
| 608 | 19150650 | 2063 | 0.000016 | com.voanews |
| 609 | 19150048 | 1019 | 0.000029 | com.pcmag |
| 610 | 19148698 | 447 | 0.000067 | org.nodejs |
| 611 | 19148554 | 4288 | 0.000008 | com.theundefeated |
| 612 | 19147816 | 3860 | 0.000009 | org.gephi |
| 613 | 19147176 | 1327 | 0.000023 | org.undp |
| 614 | 19146462 | 3277 | 0.000011 | org.iucnredlist |
| 615 | 19146454 | 2583 | 0.000013 | com.sacbee |
| 616 | 19146204 | 1594 | 0.000019 | com.treehugger |
| 617 | 19145608 | 2292 | 0.000014 | no.google |
| 618 | 19144462 | 2471 | 0.000013 | co.ello |
| 619 | 19143354 | 1986 | 0.000017 | com.msnbc |
| 620 | 19143354 | 252 | 0.000097 | com.myshopify |
| 621 | 19142810 | 981 | 0.000030 | uk.parliament |
| 622 | 19142520 | 2287 | 0.000014 | co.pcdn |
| 623 | 19141942 | 1255 | 0.000024 | gov.uscourts |
| 624 | 19141896 | 1422 | 0.000021 | co.lpages |
| 625 | 19140780 | 2344 | 0.000014 | org.fas |
| 626 | 19139768 | 781 | 0.000037 | com.intel |
| 627 | 19138740 | 807 | 0.000036 | com.marketwatch |
| 628 | 19136914 | 2047 | 0.000016 | com.infogram |
| 629 | 19133848 | 2538 | 0.000013 | com.sputniknews |
| 630 | 19133704 | 2430 | 0.000014 | ie.google |
| 631 | 19132582 | 1344 | 0.000022 | se.google |
| 632 | 19131798 | 990 | 0.000030 | com.netlify |
| 633 | 19131000 | 925 | 0.000032 | com.jekyllrb |
| 634 | 19130612 | 3055 | 0.000011 | int.interpol |
| 635 | 19130308 | 524 | 0.000056 | fr.free |
| 636 | 19130180 | 1198 | 0.000025 | be.google |
| 637 | 19129750 | 1575 | 0.000019 | uk.co.huffingtonpost |
| 638 | 19129310 | 2323 | 0.000014 | ly.rebrand |
| 639 | 19129104 | 1504 | 0.000020 | link.page |
| 640 | 19128704 | 1794 | 0.000017 | com.sched |
| 641 | 19127724 | 2218 | 0.000015 | jp.co.japantimes |
| 642 | 19127254 | 2829 | 0.000012 | org.tigris |
| 643 | 19127152 | 2839 | 0.000012 | org.pri |
| 644 | 19127006 | 2319 | 0.000014 | nz.co.nzherald |
| 645 | 19125622 | 1204 | 0.000025 | at.google |
| 646 | 19125464 | 5292 | 0.000007 | org.arkive |
| 647 | 19125326 | 222 | 0.000116 | com.salesforce |
| 648 | 19123296 | 650 | 0.000045 | br.com.uol |
| 649 | 19121018 | 4242 | 0.000008 | kr.co.kbs |
| 650 | 19119374 | 1665 | 0.000018 | com.thebalance |
| 651 | 19119126 | 1455 | 0.000021 | org.oxfordjournals |
| 652 | 19118638 | 3738 | 0.000009 | com.encyclopedia |
| 653 | 19117262 | 2204 | 0.000015 | org.eji |
| 654 | 19116506 | 2818 | 0.000012 | org.heritage |
| 655 | 19116298 | 2371 | 0.000014 | com.popsci |
| 656 | 19114518 | 2199 | 0.000015 | com.snopes |
| 657 | 19114098 | 2601 | 0.000013 | org.oas |
| 658 | 19113348 | 156 | 0.000174 | com.aspnetcdn |
| 659 | 19112712 | 1031 | 0.000028 | org.ilo |
| 660 | 19109654 | 2263 | 0.000015 | com.insidehighered |
| 661 | 19108980 | 1587 | 0.000019 | gov.usembassy |
| 662 | 19108932 | 1622 | 0.000019 | dk.google |
| 663 | 19108040 | 3392 | 0.000010 | org.jenkins-ci |
| 664 | 19107388 | 2827 | 0.000012 | org.project-syndicate |
| 665 | 19106556 | 1963 | 0.000017 | com.justia |
| 666 | 19104120 | 1563 | 0.000019 | gov.govinfo |
| 667 | 19103152 | 1699 | 0.000018 | com.firebaseapp |
| 668 | 19102068 | 2093 | 0.000016 | edu.uga |
| 669 | 19102028 | 3678 | 0.000010 | edu.wm |
| 670 | 19101614 | 3284 | 0.000011 | com.cgtn |
| 671 | 19101596 | 1881 | 0.000017 | org.worldcat |
| 672 | 19101226 | 900 | 0.000033 | com.zoho |
| 673 | 19100590 | 392 | 0.000074 | com.atlassian |
| 674 | 19100290 | 2676 | 0.000012 | org.transparency |
| 675 | 19099776 | 1317 | 0.000023 | org.aarp |
| 676 | 19099686 | 1675 | 0.000018 | org.americanbar |
| 677 | 19099164 | 2239 | 0.000015 | com.timeshighereducation |
| 678 | 19097964 | 3270 | 0.000011 | com.pastemagazine |
| 679 | 19095902 | 2598 | 0.000013 | org.csis |
| 680 | 19094342 | 629 | 0.000047 | com.samsung |
| 681 | 19094058 | 774 | 0.000038 | com.pexels |
| 682 | 19093374 | 1964 | 0.000017 | com.washingtontimes |
| 683 | 19092714 | 2016 | 0.000016 | gov.usaid |
| 684 | 19090166 | 1334 | 0.000022 | org.heart |
| 685 | 19088764 | 191 | 0.000136 | com.automattic |
| 686 | 19088428 | 865 | 0.000035 | com.verisign |
| 687 | 19087660 | 2108 | 0.000016 | com.motherjones |
| 688 | 19087034 | 2944 | 0.000011 | org.vim |
| 689 | 19086498 | 2062 | 0.000016 | edu.nap |
| 690 | 19086172 | 924 | 0.000032 | com.webs |
| 691 | 19084778 | 1593 | 0.000019 | org.amnesty |
| 692 | 19084344 | 2101 | 0.000016 | ua.com.google |
| 693 | 19083552 | 3988 | 0.000009 | org.globalnetworkinitiative |
| 694 | 19083196 | 2546 | 0.000013 | org.globalcitizen |
| 695 | 19082500 | 1754 | 0.000018 | com.surveygizmo |
| 696 | 19082058 | 2262 | 0.000015 | org.wbur |
| 697 | 19081048 | 2353 | 0.000014 | uk.gov.companieshouse |
| 698 | 19080398 | 2468 | 0.000013 | jp.mainichi |
| 699 | 19080286 | 3181 | 0.000011 | com.podomatic |
| 700 | 19078116 | 1751 | 0.000018 | org.unhcr |
| 701 | 19076276 | 2118 | 0.000016 | ca.ctvnews |
| 702 | 19075310 | 2565 | 0.000013 | uk.co.bbci |
| 703 | 19073812 | 968 | 0.000031 | uk.gov.legislation |
| 704 | 19071522 | 2681 | 0.000012 | com.nationalreview |
| 705 | 19070832 | 2523 | 0.000013 | com.cleveland |
| 706 | 19070474 | 3814 | 0.000009 | org.neocities |
| 707 | 19069884 | 1073 | 0.000027 | ly.snip |
| 708 | 19068864 | 438 | 0.000067 | com.herokuapp |
| 709 | 19068510 | 656 | 0.000045 | com.oreilly |
| 710 | 19066730 | 1154 | 0.000026 | cz.google |
| 711 | 19066464 | 2164 | 0.000015 | org.nrdc |
| 712 | 19065768 | 2671 | 0.000012 | org.thinkprogress |
| 713 | 19065654 | 1795 | 0.000017 | ca.globalnews |
| 714 | 19065106 | 270 | 0.000093 | jp.co.amazon |
| 715 | 19062840 | 1328 | 0.000023 | org.altervista |
| 716 | 19061732 | 3119 | 0.000011 | uk.ac.nottingham |
| 717 | 19061168 | 1267 | 0.000024 | uk.gov.nationalarchives |
| 718 | 19060934 | 2106 | 0.000016 | au.edu.anu |
| 719 | 19060236 | 3035 | 0.000011 | com.intensedebate |
| 720 | 19060102 | 2734 | 0.000012 | de.hu-berlin |
| 721 | 19059802 | 736 | 0.000039 | com.airbnb |
| 722 | 19059800 | 2326 | 0.000014 | de.auswaertiges-amt |
| 723 | 19059376 | 2316 | 0.000014 | nz.co.google |
| 724 | 19059170 | 2672 | 0.000012 | org.unenvironment |
| 725 | 19058978 | 3132 | 0.000011 | org.rsf |
| 726 | 19057932 | 4110 | 0.000008 | com.koreaherald |
| 727 | 19057778 | 1960 | 0.000017 | org.pewtrusts |
| 728 | 19057678 | 2867 | 0.000012 | com.techinasia |
| 729 | 19057488 | 2276 | 0.000014 | com.thecut |
| 730 | 19056174 | 3700 | 0.000009 | com.viki |
| 731 | 19056068 | 2724 | 0.000012 | org.gnupg |
| 732 | 19054590 | 2469 | 0.000013 | ro.google |
| 733 | 19054394 | 2057 | 0.000016 | edu.gwu |
| 734 | 19054116 | 3057 | 0.000011 | com.bangkokpost |
| 735 | 19053626 | 2572 | 0.000013 | fr.rfi |
| 736 | 19052868 | 414 | 0.000070 | com.pubmatic |
| 737 | 19051906 | 2309 | 0.000014 | com.tutsplus |
| 738 | 19051648 | 1079 | 0.000027 | tr.com.google |
| 739 | 19051516 | 248 | 0.000098 | com.getbootstrap |
| 740 | 19050908 | 4424 | 0.000008 | com.wonderhowto |
| 741 | 19050626 | 3619 | 0.000010 | com.upworthy |
| 742 | 19050496 | 2883 | 0.000012 | org.sonatype |
| 743 | 19050382 | 288 | 0.000087 | com.typeform |
| 744 | 19049574 | 2806 | 0.000012 | il.co.google |
| 745 | 19049384 | 2739 | 0.000012 | uk.ac.leeds |
| 746 | 19048116 | 201 | 0.000127 | to.amzn |
| 747 | 19047986 | 2703 | 0.000012 | vn.com.google |
| 748 | 19047578 | 274 | 0.000092 | com.surveymonkey |
| 749 | 19047380 | 922 | 0.000032 | int.wipo |
| 750 | 19046288 | 1057 | 0.000028 | com.gizmodo |
| 751 | 19046144 | 874 | 0.000034 | com.box |
| 752 | 19045578 | 2298 | 0.000014 | com.oregonlive |
| 753 | 19044916 | 547 | 0.000054 | gg.discord |
| 754 | 19044444 | 3356 | 0.000010 | com.theepochtimes |
| 755 | 19044400 | 2480 | 0.000013 | ar.com.google |
| 756 | 19044144 | 2943 | 0.000011 | bg.google |
| 757 | 19043632 | 2061 | 0.000016 | com.squarespace-cdn |
| 758 | 19043400 | 3479 | 0.000010 | io.soup |
| 759 | 19042778 | 2545 | 0.000013 | com.webbyawards |
| 760 | 19042384 | 2744 | 0.000012 | io.fabric |
| 761 | 19042298 | 1588 | 0.000019 | com.speakerdeck |
| 762 | 19041684 | 136 | 0.000232 | info.aboutads |
| 763 | 19040606 | 907 | 0.000033 | com.docker |
| 764 | 19038814 | 1817 | 0.000017 | com.miamiherald |
| 765 | 19037924 | 3191 | 0.000011 | ph.com.google |
| 766 | 19037762 | 2463 | 0.000013 | com.channelnewsasia |
| 767 | 19037556 | 3198 | 0.000011 | uk.co.vogue |
| 768 | 19037554 | 2619 | 0.000013 | edu.fsu |
| 769 | 19035870 | 485 | 0.000063 | com.staticflickr |
| 770 | 19035284 | 2495 | 0.000013 | za.co.google |
| 771 | 19033678 | 2696 | 0.000012 | com.thejakartapost |
| 772 | 19032442 | 1236 | 0.000024 | edu.ucsd |
| 773 | 19032258 | 487 | 0.000062 | com.fc2 |
| 774 | 19032038 | 5415 | 0.000007 | com.armorgames |
| 775 | 19031944 | 2155 | 0.000015 | fi.google |
| 776 | 19031234 | 3885 | 0.000009 | com.alamy |
| 777 | 19030868 | 2221 | 0.000015 | id.co.google |
| 778 | 19030462 | 2794 | 0.000012 | com.rd |
| 779 | 19029712 | 2951 | 0.000011 | com.cartodb |
| 780 | 19029584 | 2092 | 0.000016 | com.newrepublic |
| 781 | 19029348 | 3436 | 0.000010 | com.benzinga |
| 782 | 19028364 | 661 | 0.000044 | com.entrepreneur |
| 783 | 19027960 | 5376 | 0.000007 | org.gwtproject |
| 784 | 19026660 | 2988 | 0.000011 | com.sciencealert |
| 785 | 19026538 | 2763 | 0.000012 | org.iaea |
| 786 | 19026402 | 2376 | 0.000014 | com.thenation |
| 787 | 19023692 | 3411 | 0.000010 | si.google |
| 788 | 19023046 | 2400 | 0.000014 | pt.google |
| 789 | 19020124 | 2965 | 0.000011 | au.gov.nla |
| 790 | 19019838 | 3513 | 0.000010 | com.dailykos |
| 791 | 19019756 | 494 | 0.000061 | com.aol |
| 792 | 19019128 | 2519 | 0.000013 | edu.emory |
| 793 | 19019012 | 3573 | 0.000010 | com.inhabitat |
| 794 | 19018956 | 3415 | 0.000010 | uk.ac.soas |
| 795 | 19018402 | 666 | 0.000044 | com.deloitte |
| 796 | 19018230 | 1185 | 0.000025 | com.today |
| 797 | 19016838 | 978 | 0.000030 | com.windowsphone |
| 798 | 19016186 | 3659 | 0.000010 | org.cpj |
| 799 | 19016164 | 2119 | 0.000016 | kr.co.google |
| 800 | 19015906 | 2981 | 0.000011 | se.lu |
| 801 | 19015780 | 2774 | 0.000012 | org.cfr |
| 802 | 19014856 | 429 | 0.000068 | me.fb |
| 803 | 19013678 | 3288 | 0.000011 | com.joins |
| 804 | 19012980 | 4264 | 0.000008 | sa.com.google |
| 805 | 19012878 | 2814 | 0.000012 | com.politifact |
| 806 | 19012292 | 964 | 0.000031 | com.alexa |
| 807 | 19011442 | 4131 | 0.000008 | edu.utm |
| 808 | 19011068 | 2735 | 0.000012 | com.law360 |
| 809 | 19010546 | 983 | 0.000030 | com.engadget |
| 810 | 19008662 | 3583 | 0.000010 | hr.google |
| 811 | 19008538 | 2146 | 0.000015 | hu.google |
| 812 | 19006860 | 631 | 0.000047 | fm.last |
| 813 | 19006540 | 2476 | 0.000013 | eu.politico |
| 814 | 19006248 | 4047 | 0.000009 | com.chinatimes |
| 815 | 19006116 | 2521 | 0.000013 | mx.com.google |
| 816 | 19006060 | 3141 | 0.000011 | com.jezebel |
| 817 | 19005942 | 3868 | 0.000009 | com.iconarchive |
| 818 | 19005318 | 3471 | 0.000010 | com.ogilvy |
| 819 | 19004866 | 2399 | 0.000014 | gr.google |
| 820 | 19004086 | 2816 | 0.000012 | com.monday |
| 821 | 19003252 | 2738 | 0.000012 | com.digitaljournal |
| 822 | 19003248 | 3149 | 0.000011 | com.nyt |
| 823 | 19003220 | 3300 | 0.000011 | audio.breaker |
| 824 | 19002640 | 2823 | 0.000012 | uk.co.guim |
| 825 | 19002384 | 625 | 0.000047 | com.cisco |
| 826 | 19002038 | 3391 | 0.000010 | cn.globaltimes |
| 827 | 19001808 | 2648 | 0.000012 | com.instructure |
| 828 | 19000646 | 3321 | 0.000011 | com.crashlytics |
| 829 | 18999720 | 2723 | 0.000012 | au.com.businessinsider |
| 830 | 18999338 | 3430 | 0.000010 | org.grist |
| 831 | 18998280 | 1209 | 0.000025 | com.pastebin |
| 832 | 18998118 | 315 | 0.000082 | ai.shortpixel |
| 833 | 18998078 | 3990 | 0.000009 | org.constitutioncenter |
| 834 | 18997960 | 4842 | 0.000007 | jp.hatenadiary |
| 835 | 18996780 | 3770 | 0.000009 | edu.ttu |
| 836 | 18996076 | 2997 | 0.000011 | uk.ac.york |
| 837 | 18995936 | 1671 | 0.000018 | com.eater |
| 838 | 18995084 | 90 | 0.000364 | com.livestream |
| 839 | 18995036 | 2772 | 0.000012 | com.bepress |
| 840 | 18994752 | 2898 | 0.000012 | org.wri |
| 841 | 18992262 | 2043 | 0.000016 | my.com.thestar |
| 842 | 18991122 | 3775 | 0.000009 | com.minds |
| 843 | 18990592 | 2352 | 0.000014 | mp.j |
| 844 | 18990570 | 3708 | 0.000009 | app.web |
| 845 | 18990062 | 3410 | 0.000010 | org.carnegieendowment |
| 846 | 18989786 | 3645 | 0.000010 | tr.com.aa |
| 847 | 18989418 | 711 | 0.000041 | gov.sec |
| 848 | 18987746 | 3812 | 0.000009 | com.hyperallergic |
| 849 | 18987282 | 3408 | 0.000010 | com.foreignaffairs |
| 850 | 18986640 | 3797 | 0.000009 | au.edu.uts |
| 851 | 18985392 | 470 | 0.000064 | com.fastcompany |
| 852 | 18985032 | 3560 | 0.000010 | org.hypotheses |
| 853 | 18984468 | 3896 | 0.000009 | com.japantoday |
| 854 | 18982752 | 3507 | 0.000010 | edu.wayne |
| 855 | 18982048 | 3713 | 0.000009 | uk.ac.kent |
| 856 | 18981988 | 3697 | 0.000009 | rs.google |
| 857 | 18980532 | 4071 | 0.000009 | org.sourcewatch |
| 858 | 18979366 | 832 | 0.000036 | com.symantec |
| 859 | 18978424 | 2539 | 0.000013 | fr.paris |
| 860 | 18977996 | 2942 | 0.000011 | com.prweek |
| 861 | 18977902 | 1765 | 0.000018 | ch.ipcc |
| 862 | 18976960 | 2217 | 0.000015 | com.kinstacdn |
| 863 | 18976262 | 1046 | 0.000028 | edu.cmu |
| 864 | 18975462 | 2039 | 0.000016 | int.unfccc |
| 865 | 18975062 | 4196 | 0.000008 | eg.com.google |
| 866 | 18974804 | 3180 | 0.000011 | org.nationalgeographic |
| 867 | 18974548 | 2643 | 0.000013 | gov.doi |
| 868 | 18973940 | 3406 | 0.000010 | de.uni-frankfurt |
| 869 | 18973494 | 4243 | 0.000008 | by.google |
| 870 | 18972022 | 5050 | 0.000007 | com.symbaloo |
| 871 | 18971010 | 3417 | 0.000010 | nl.wur |
| 872 | 18969950 | 2328 | 0.000014 | org.unodc |
| 873 | 18968430 | 1599 | 0.000019 | com.routledge |
| 874 | 18968412 | 4509 | 0.000008 | com.ipsos-mori |
| 875 | 18966962 | 3658 | 0.000010 | ae.google |
| 876 | 18966152 | 4482 | 0.000008 | com.etymonline |
| 877 | 18965888 | 4982 | 0.000007 | build.bazel |
| 878 | 18965566 | 3320 | 0.000011 | org.brainpickings |
| 879 | 18964544 | 3143 | 0.000011 | com.scotsman |
| 880 | 18963796 | 4295 | 0.000008 | com.oilprice |
| 881 | 18963380 | 3597 | 0.000010 | uk.ac.westminster |
| 882 | 18963266 | 4545 | 0.000008 | lk.google |
| 883 | 18962576 | 1260 | 0.000024 | fr.blogspot |
| 884 | 18961360 | 3412 | 0.000010 | org.rferl |
| 885 | 18961310 | 3173 | 0.000011 | org.epi |
8861895990041150.000008lv.google
8871895981239090.000009au.edu.griffith
8881895942242190.000008kr.ac.snu
8891895728013120.000023com.upwork
8901895707624360.000014com.html5rocks
8911895671454930.000007me.nimbusweb
8921895650229400.000011fr.archives-ouvertes
8931895639842930.000008com.delawareonline
8941895546217920.000017ru.rbc
895189549687450.000039com.gartner
8961895493011270.000026edu.utexas
8971895364225260.000013net.noscript
8981895346627170.000012ae.thenational
8991895333633800.000010com.study
900189530924270.000068com.hp
9011895307436410.000010uk.co.spectator
9021895276238690.000009com.cleantechnica
9031895220828030.000012org.unctad
9041895120042550.000008com.teslamotors
9051895011816140.000019com.billboard
9061894936630740.000011com.theculturetrip
9071894789624540.000013com.multiscreensite
908189477387040.000041com.visualstudio
9091894758839850.000009uk.ac.plymouth
9101894745426600.000012sk.google
9111894731238110.000009net.aljazeera
9121894711024130.000014com.theintercept
9131894655634210.000010uk.ac.exeter
9141894649433320.000010social.mastodon
9151894587628280.000012com.euractiv
9161894586436350.000010com.db
9171894273644470.000008org.mises
9181894231646800.000008ng.com.google
9191894201627950.000012org.panda
9201894162224660.000013uk.gov.justice
9211894143056020.000007net.chinadialogue
9221894092441180.000008cat.uab
9231894074642270.000008com.spokesman
9241894008235230.000010co.com.google
9251893923044730.000008lu.google
9261893899641890.000008pe.com.google
9271893861833660.000010com.nybooks
9281893860643810.000008uk.ac.core
9291893820622280.000015com.termsfeed
9301893819416690.000018com.pcworld
9311893811238460.000009kr.co.yna
9321893800247930.000007com.gust
9331893778838800.000009org.cgiar
9341893730042310.000008pk.com.google
9351893653035750.000010net.inquirer
9361893600830830.000011ru.lenta
9371893400014680.000020com.nokia
9381893367629320.000011tw.com.pchome
9391893349612230.000024com.ycombinator
9401893335029110.000011nl.volkskrant
94118933194780.000411com.oculus
9421893261234550.000010cl.google
9431893186239490.000009org.polymer-project
9441893088826370.000013com.washingtonexaminer
9451893062239450.000009sk.sme
9461893053433890.000010edu.monash
947189300869180.000032com.canva
948189295524540.000066org.opensource
9491892939839770.000009com.rappler
9501892863040000.000009org.plan-international
9511892651845610.000008cr.co.google
9521892641235870.000010lt.google
9531892583238100.000009ca.macleans
954189256468170.000036net.adform
9551892504648730.000007com.blogto
9561892495235080.000010uk.ac.nhm
9571892492832110.000011edu.ua
9581892355428150.000012com.articulate
959189232882490.000098com.sxsw
9601892286639930.000009org.wilsoncenter
9611892267640820.000009edu.lehigh
962189223364170.000070com.skype
9631892154646990.000008com.out
9641892071410850.000027com.redhat
9651892068032660.000011my.com.google
9661891906420310.000016gov.ecfr
9671891890045850.000008org.nsidc
968189187784120.000070net.secureservercdn
9691891811245360.000008kz.google
9701891759032950.000011org.osce
971189175625570.000053org.whatwg
9721891741840960.000009com.wsoctv
9731891738025870.000013uk.org.nationaltrust
9741891722032010.000011uk.gov.london
9751891704819730.000017scot.gov
9761891698238650.000009uk.ac.qub
9771891646038070.000009com.governing
978189164305280.000056com.businesswire
9791891630022530.000015wales.gov
9801891506634220.000010com.afp
9811891498230800.000011uk.ac.qmul
9821891487851540.000007com.ingress
9831891454045960.000008com.webcindario
9841891431634020.000010org.psychiatryonline
9851891323041480.000008org.marxists
9861891309640730.000009me.thinglink
9871891297016600.000018com.css-tricks
9881891285847320.000008ie.nuigalway
9891891251443480.000008com.asiaone
9901891236833540.000010com.kaspersky-labs
9911891211012490.000024com.smashingmagazine
9921891206437870.000009org.nationalinterest
993189118485560.000053com.adweek
9941891143644980.000008ec.com.google
9951891140447220.000008bd.com.google
9961891000648460.000007uy.com.google
9971890999842330.000008com.match
9981890974640210.000009ee.google
9991890968839620.000009com.adn
10001890947443100.000008com.wnd
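
Entries like the ones listed above can be processed programmatically once each row is split into its fields. The following is a minimal sketch, not part of the official tooling: it assumes five whitespace-separated columns in the order harmonic centrality rank, harmonic centrality value, PageRank rank, PageRank value, and reversed domain name. The names `RankRow`, `unreverse` and `parse_rank_line` are hypothetical helpers introduced for illustration only.

```python
from typing import NamedTuple


class RankRow(NamedTuple):
    harmonic_pos: int    # rank by harmonic centrality
    harmonic_val: float  # harmonic centrality value
    pagerank_pos: int    # rank by PageRank
    pagerank_val: float  # PageRank value
    domain: str          # domain name in normal (un-reversed) order


def unreverse(rev_name: str) -> str:
    """Turn a reversed name such as 'uk.ac.soas' back into 'soas.ac.uk'."""
    return ".".join(reversed(rev_name.split(".")))


def parse_rank_line(line: str) -> RankRow:
    """Parse one row; the column order is an assumption, see the text above."""
    h_pos, h_val, pr_pos, pr_val, rev_name = line.split()
    return RankRow(int(h_pos), float(h_val), int(pr_pos), float(pr_val),
                   unreverse(rev_name))


# Three sample rows taken from the listing above.
sample = [
    "771 19033678 2696 0.000012 com.thejakartapost",
    "794 19018956 3415 0.000010 uk.ac.soas",
    "838 18995084 90 0.000364 com.livestream",
]
rows = [parse_rank_line(line) for line in sample]

# Pick the sample domain with the best (lowest) PageRank position.
best_by_pagerank = min(rows, key=lambda r: r.pagerank_pos)
print(best_by_pagerank.domain)  # prints "livestream.com"
```

When reading a downloaded ranks file rather than sample strings, header or comment lines would need to be skipped before calling `parse_rank_line`.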

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data proves useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!