We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of October, November/December 2020 and January 2021. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases (e.g., Nov/Dec/Jan 2017-2018 Webgraphs). You may also visit the projects cc-webgraph and cc-pyspark which include all scripts and tools required to construct the graphs. Instructions to explore the graphs in the webgraph format are given in our collection of webgraph notebooks.

Host-level graph

The graph consists of 490 million nodes and 2.57 billion edges and includes dangling nodes, i.e., hosts that have not been crawled but are pointed to by a link on a crawled page. There are 414 million dangling nodes (84.4%), and the largest strongly connected component contains 42.6 million nodes (8.7%).

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
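The conversion can be sketched in a few lines of Python. This is only an illustration of the notation described above, not the code used to build the graphs; the function names `reverse_host` and `unreverse_host` are made up for this example.

```python
def reverse_host(host: str) -> str:
    """Return the host name in reverse domain name notation.

    A leading "www." is stripped first, as described in the text.
    """
    host = host.strip()
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def unreverse_host(rev_host: str) -> str:
    """Turn a reversed name back into the usual left-to-right order."""
    return ".".join(reversed(rev_host.split(".")))


print(reverse_host("www.subdomain.example.com"))
print(unreverse_host("com.example.subdomain"))
```

Because the leading www. is dropped, the reverse mapping cannot tell whether the original host name carried it.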

You can download the graph and the ranks of all 490 million hosts from AWS S3 at the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/host/. Alternatively, you can use https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/host/ as the prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 36 gzip-compressed files listed in two path listings: one for the nodes and one for the edges. First, download the path listing and decompress it using gzip. Adding the prefix s3://commoncrawl/ or https://commoncrawl.s3.amazonaws.com/ to each line of the path listing yields the list of URLs for downloading the entire graph.
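The step from a path listing to download URLs could look roughly like the following Python sketch. It assumes the listing has already been downloaded and is passed in as gzip-compressed bytes; the function name `paths_to_urls` and the constant names are illustrative only.

```python
import gzip
from typing import List

# The two prefixes named in the text.
HTTPS_PREFIX = "https://commoncrawl.s3.amazonaws.com/"
S3_PREFIX = "s3://commoncrawl/"


def paths_to_urls(listing_gz: bytes, prefix: str = HTTPS_PREFIX) -> List[str]:
    """Decompress a *.paths.gz listing and prepend the prefix to each line."""
    text = gzip.decompress(listing_gz).decode("utf-8")
    # Skip blank lines so a trailing newline does not produce a bare prefix.
    return [prefix + line.strip() for line in text.splitlines() if line.strip()]
```

For example, `paths_to_urls(open("cc-main-2020-21-oct-nov-jan-host-vertices.paths.gz", "rb").read())` would return one HTTPS URL per vertices file, and passing `S3_PREFIX` as the second argument would return the corresponding S3 URLs.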

Download files of the Common Crawl Oct/Nov/Jan 2020-2021 host-level webgraph

| Size | File | Description |
| --- | --- | --- |
| 3.08 GB | cc-main-2020-21-oct-nov-jan-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 12 vertices files |
| 11.76 GB | cc-main-2020-21-oct-nov-jan-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 24 edges files |
| 5.18 GB | cc-main-2020-21-oct-nov-jan-host.graph | graph in BVGraph format |
| 2 kB | cc-main-2020-21-oct-nov-jan-host.properties | |
| 5.63 GB | cc-main-2020-21-oct-nov-jan-host-t.graph | transpose of the graph (outlinks inverted to inlinks) |
| 2 kB | cc-main-2020-21-oct-nov-jan-host-t.properties | |
| 1 kB | cc-main-2020-21-oct-nov-jan-host.stats | WebGraph statistics |
| 7.04 GB | cc-main-2020-21-oct-nov-jan-host-ranks.txt.gz | harmonic centrality and pagerank |
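As a rough illustration of how the text representation fits together, the Python sketch below reads vertices ⟨id, rev host⟩ and edges ⟨from_id, to_id⟩ and resolves edge IDs to host names. It assumes the fields on each line are separated by a tab, which this announcement does not state; the function names are hypothetical. Holding the full 490-million-node mapping in a dictionary needs a lot of memory, so this is meant for samples or the smaller domain-level files.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_vertices(lines: Iterable[str]) -> Dict[int, str]:
    """Map node ID to reversed host name (assumes tab-separated fields)."""
    id_to_name = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # Only the first two columns are used; extra columns are ignored.
        id_to_name[int(fields[0])] = fields[1]
    return id_to_name


def parse_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (from_id, to_id) pairs (assumes tab-separated fields)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        from_id, to_id = line.split("\t")[:2]
        yield int(from_id), int(to_id)


def named_edges(id_to_name: Dict[int, str],
                edges: Iterable[Tuple[int, int]]) -> Iterator[Tuple[str, str]]:
    """Replace node IDs in each edge with the reversed host names."""
    for from_id, to_id in edges:
        yield id_to_name[from_id], id_to_name[to_id]
```

The line iterables can come from `gzip.open(path, "rt", encoding="utf-8")` applied to a downloaded vertices or edges file.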

Domain-level graph

The domain graph was built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org.

The domain-level graph has 86 million nodes and 1.47 billion edges. 43 million nodes (50%) are dangling nodes, and the largest strongly connected component covers 34 million (39%) of the nodes.

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/domain/ or, via HTTPS, under https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/domain/.

Download files of the Common Crawl Oct/Nov/Jan 2020-2021 domain-level webgraph

| Size | File | Description |
| --- | --- | --- |
| 0.59 GB | cc-main-2020-21-oct-nov-jan-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩ |
| 6.00 GB | cc-main-2020-21-oct-nov-jan-domain-edges.txt.gz | edges ⟨from_id, to_id⟩ |
| 3.40 GB | cc-main-2020-21-oct-nov-jan-domain.graph | graph in BVGraph format |
| 2 kB | cc-main-2020-21-oct-nov-jan-domain.properties | |
| 3.26 GB | cc-main-2020-21-oct-nov-jan-domain-t.graph | transpose of the graph |
| 2 kB | cc-main-2020-21-oct-nov-jan-domain-t.properties | |
| 1 kB | cc-main-2020-21-oct-nov-jan-domain.stats | WebGraph statistics |
| 1.85 GB | cc-main-2020-21-oct-nov-jan-domain-ranks.txt.gz | harmonic centrality and pagerank |

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 86 million domain ranks is available for download.
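To work with the full ranks file rather than the top 1000, a reader along the following lines may be enough. This Python sketch assumes the file is tab-separated with the same five columns as the table below (harmonic centrality rank and value, PageRank rank and value, reversed domain name) and that header lines start with #; both points are assumptions to verify against the downloaded file. `DomainRank` and `parse_ranks` are illustrative names.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainRank(NamedTuple):
    hc_rank: int       # position by harmonic centrality
    hc_value: float    # harmonic centrality value
    pr_rank: int       # position by PageRank
    pr_value: float    # PageRank value
    rev_domain: str    # reversed domain name, e.g. com.example


def parse_ranks(lines: Iterable[str]) -> Iterator[DomainRank]:
    """Yield one DomainRank per data line (assumed tab-separated)."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' header lines carry no data.
        if not line or line.startswith("#"):
            continue
        hc_rank, hc_value, pr_rank, pr_value, rev_domain = line.split("\t")[:5]
        yield DomainRank(int(hc_rank), float(hc_value),
                         int(pr_rank), float(pr_value), rev_domain)


# The first three rows of the table below, written as tab-separated lines.
sample = [
    "1\t30355566\t1\t0.017956\tcom.googleapis",
    "2\t29427164\t3\t0.012871\tcom.facebook",
    "3\t28173562\t2\t0.012899\tcom.google",
]
by_pagerank = sorted(parse_ranks(sample), key=lambda r: r.pr_rank)
print([r.rev_domain for r in by_pagerank])
```

Sorting by `pr_rank` instead of `hc_rank` reorders the same records by PageRank, which is how the two rankings in the table relate to each other.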

Top 1000 domains ranked by harmonic centrality (Oct/Nov/Jan 2020-2021)

| harmonic centrality rank | hc value | page rank | page rank value | reversed domain name |
| --- | --- | --- | --- | --- |
| 1 | 30355566 | 1 | 0.017956 | com.googleapis |
| 2 | 29427164 | 3 | 0.012871 | com.facebook |
| 3 | 28173562 | 2 | 0.012899 | com.google |
| 4 | 25702812 | 5 | 0.007348 | com.twitter |
| 5 | 25628314 | 4 | 0.007628 | org.w |
| 6 | 25297808 | 6 | 0.007231 | com.youtube |
| 7 | 24195466 | 9 | 0.005352 | com.instagram |
| 8 | 23355356 | 8 | 0.005532 | org.gmpg |
| 9 | 23233674 | 7 | 0.006500 | com.googletagmanager |
| 10 | 22492432 | 11 | 0.003277 | com.linkedin |
| 11 | 21576402 | 10 | 0.004076 | com.cloudflare |
| 12 | 21468510 | 14 | 0.002649 | com.gravatar |
| 13 | 21395642 | 13 | 0.003020 | org.wordpress |
| 14 | 21353798 | 22 | 0.001726 | com.pinterest |
| 15 | 20946722 | 30 | 0.001242 | org.wikipedia |
| 16 | 20926308 | 19 | 0.001834 | com.wordpress |
| 17 | 20877776 | 16 | 0.002056 | com.gstatic |
| 18 | 20799472 | 15 | 0.002451 | com.bootstrapcdn |
| 19 | 20795402 | 18 | 0.001943 | com.apple |
| 20 | 20626472 | 32 | 0.001165 | com.vimeo |
| 21 | 20527986 | 41 | 0.000886 | be.youtu |
| 22 | 20419038 | 21 | 0.001769 | com.jquery |
| 23 | 20391686 | 28 | 0.001246 | com.microsoft |
| 24 | 20327544 | 24 | 0.001500 | com.wp |
| 25 | 20314602 | 45 | 0.000769 | com.blogspot |
| 26 | 20231490 | 37 | 0.001025 | com.amazonaws |
| 27 | 20208912 | 51 | 0.000691 | com.amazon |
| 28 | 20199388 | 47 | 0.000740 | gl.goo |
| 29 | 20093688 | 71 | 0.000448 | com.tumblr |
| 30 | 20070176 | 35 | 0.001070 | com.google-analytics |
| 31 | 20050256 | 61 | 0.000598 | ly.bit |
| 32 | 20030452 | 20 | 0.001794 | com.adobe |
| 33 | 19998314 | 17 | 0.002005 | com.github |
| 34 | 19989010 | 50 | 0.000715 | org.mozilla |
| 35 | 19962834 | 58 | 0.000639 | eu.europa |
| 36 | 19945306 | 34 | 0.001103 | net.cloudfront |
| 37 | 19849112 | 52 | 0.000682 | com.flickr |
| 38 | 19843288 | 40 | 0.000909 | net.jsdelivr |
| 39 | 19833032 | 91 | 0.000369 | com.googleusercontent |
| 40 | 19823560 | 105 | 0.000347 | com.yahoo |
| 41 | 19752300 | 56 | 0.000650 | co.t |
| 42 | 19722088 | 33 | 0.001114 | com.googlesyndication |
| 43 | 19712406 | 23 | 0.001517 | com.fontawesome |
| 44 | 19708354 | 81 | 0.000392 | com.weebly |
| 45 | 19706054 | 55 | 0.000653 | com.paypal |
| 46 | 19695288 | 109 | 0.000308 | com.reddit |
| 47 | 19641534 | 31 | 0.001231 | me.wp |
| 48 | 19640398 | 73 | 0.000435 | com.medium |
| 49 | 19635162 | 67 | 0.000491 | io.github |
| 50 | 19590444 | 137 | 0.000225 | com.nytimes |
| 51 | 19587880 | 121 | 0.000280 | com.soundcloud |
| 52 | 19585192 | 27 | 0.001262 | ru.yandex |
| 53 | 19583494 | 43 | 0.000786 | com.addthis |
| 54 | 19582250 | 44 | 0.000776 | com.macromedia |
| 55 | 19560416 | 66 | 0.000504 | org.w3 |
| 56 | 19549714 | 70 | 0.000451 | com.shopify |
| 57 | 19518672 | 146 | 0.000201 | com.forbes |
| 58 | 19502448 | 144 | 0.000205 | org.archive |
| 59 | 19496300 | 90 | 0.000371 | org.creativecommons |
| 60 | 19490348 | 194 | 0.000131 | uk.co.bbc |
| 61 | 19482926 | 59 | 0.000630 | org.schema |
| 62 | 19479528 | 39 | 0.000910 | com.baidu |
| 63 | 19464572 | 36 | 0.001035 | net.doubleclick |
| 64 | 19459966 | 200 | 0.000129 | com.cnn |
| 65 | 19451100 | 53 | 0.000677 | com.whatsapp |
| 66 | 19449068 | 60 | 0.000611 | com.vk |
| 67 | 19444966 | 206 | 0.000126 | net.slideshare |
| 68 | 19443956 | 158 | 0.000169 | com.bing |
| 69 | 19419878 | 174 | 0.000152 | com.imdb |
| 70 | 19385956 | 186 | 0.000140 | com.imgur |
| 71 | 19372520 | 236 | 0.000112 | com.washingtonpost |
| 72 | 19371076 | 176 | 0.000150 | com.theguardian |
| 73 | 19356952 | 254 | 0.000102 | com.wsj |
| 74 | 19356474 | 210 | 0.000123 | org.wikimedia |
| 75 | 19352128 | 219 | 0.000117 | com.businessinsider |
| 76 | 19347698 | 209 | 0.000123 | com.stackoverflow |
| 77 | 19342712 | 409 | 0.000065 | com.msn |
| 78 | 19326654 | 327 | 0.000079 | com.appspot |
| 79 | 19324334 | 157 | 0.000172 | int.who |
| 80 | 19321112 | 216 | 0.000119 | edu.stanford |
| 81 | 19316796 | 179 | 0.000148 | org.apache |
| 82 | 19310390 | 333 | 0.000078 | com.ibm |
| 83 | 19309354 | 337 | 0.000077 | edu.mit |
| 84 | 19304938 | 225 | 0.000116 | net.sourceforge |
| 85 | 19292932 | 116 | 0.000288 | com.ytimg |
| 86 | 19287812 | 57 | 0.000649 | net.fbcdn |
| 87 | 19282486 | 285 | 0.000091 | com.techcrunch |
| 88 | 19276500 | 269 | 0.000094 | com.bbc |
| 89 | 19275480 | 155 | 0.000181 | com.wixsite |
| 90 | 19275222 | 152 | 0.000189 | gov.nih |
| 91 | 19275200 | 220 | 0.000117 | com.livejournal |
| 92 | 19270650 | 233 | 0.000113 | uk.co.google |
| 93 | 19270610 | 440 | 0.000062 | gov.nasa |
| 94 | 19263354 | 54 | 0.000666 | com.googleadservices |
| 95 | 19243404 | 262 | 0.000097 | edu.harvard |
| 96 | 19243154 | 270 | 0.000094 | com.oracle |
| 97 | 19243126 | 276 | 0.000093 | org.acm |
| 98 | 19238650 | 218 | 0.000117 | org.ietf |
| 99 | 19238450 | 185 | 0.000142 | com.blogger |
| 100 | 19238426 | 223 | 0.000116 | gov.ca |
| 101 | 19234630 | 465 | 0.000059 | fr.free |
| 102 | 19232058 | 259 | 0.000098 | com.bloomberg |
| 103 | 19221844 | 275 | 0.000093 | com.android |
| 104 | 19218636 | 304 | 0.000085 | com.live |
| 105 | 19210812 | 126 | 0.000271 | com.jimdo |
| 106 | 19208896 | 169 | 0.000159 | com.issuu |
| 107 | 19205802 | 166 | 0.000162 | com.giphy |
| 108 | 19194156 | 438 | 0.000062 | com.ted |
| 109 | 19190178 | 348 | 0.000075 | com.huffingtonpost |
| 110 | 19187782 | 130 | 0.000254 | com.weibo |
| 111 | 19186862 | 154 | 0.000186 | us.zoom |
| 112 | 19185794 | 252 | 0.000103 | org.gnu |
| 113 | 19176332 | 403 | 0.000066 | com.myspace |
| 114 | 19162122 | 1039 | 0.000030 | com.wikia |
| 115 | 19152582 | 373 | 0.000071 | net.researchgate |
| 116 | 19150058 | 343 | 0.000075 | com.usatoday |
| 117 | 19148332 | 309 | 0.000084 | com.reuters |
| 118 | 19143988 | 400 | 0.000067 | uk.co.telegraph |
| 119 | 19141202 | 446 | 0.000061 | com.latimes |
| 120 | 19130976 | 372 | 0.000071 | com.example |
| 121 | 19129552 | 345 | 0.000075 | com.githubusercontent |
| 122 | 19127344 | 93 | 0.000366 | com.unpkg |
| 123 | 19127116 | 384 | 0.000069 | com.nature |
| 124 | 19125396 | 336 | 0.000077 | com.wired |
| 125 | 19124320 | 25 | 0.001485 | com.wixstatic |
| 126 | 19114842 | 299 | 0.000087 | org.npr |
| 127 | 19111018 | 308 | 0.000084 | com.cnbc |
| 128 | 19107772 | 328 | 0.000079 | com.ebay |
| 129 | 19103704 | 293 | 0.000088 | com.wiley |
| 130 | 19102814 | 111 | 0.000299 | de.google |
| 131 | 19097732 | 191 | 0.000135 | com.npmjs |
| 132 | 19095454 | 344 | 0.000075 | com.hp |
| 133 | 19088550 | 539 | 0.000050 | com.cisco |
| 134 | 19084048 | 932 | 0.000034 | com.stackexchange |
| 135 | 19081736 | 132 | 0.000251 | com.youtube-nocookie |
| 136 | 19080638 | 134 | 0.000250 | com.ft |
| 137 | 19078814 | 213 | 0.000120 | org.ampproject |
| 138 | 19077232 | 532 | 0.000051 | com.steampowered |
| 139 | 19074638 | 365 | 0.000072 | com.patreon |
| 140 | 19072918 | 455 | 0.000061 | com.theatlantic |
| 141 | 19072880 | 476 | 0.000057 | com.gitlab |
| 142 | 19072344 | 890 | 0.000035 | com.pcmag |
| 143 | 19068436 | 195 | 0.000131 | com.unsplash |
| 144 | 19065494 | 877 | 0.000036 | edu.psu |
| 145 | 19063926 | 376 | 0.000070 | com.time |
| 146 | 19061142 | 208 | 0.000125 | com.twimg |
| 147 | 19061064 | 164 | 0.000165 | com.yelp |
| 148 | 19059332 | 873 | 0.000036 | edu.washington |
| 149 | 19057196 | 533 | 0.000051 | edu.cornell |
| 150 | 19054152 | 148 | 0.000197 | com.dropbox |
| 151 | 19051738 | 603 | 0.000046 | org.arxiv |
| 152 | 19047626 | 379 | 0.000070 | com.statista |
| 153 | 19043050 | 324 | 0.000080 | org.un |
| 154 | 19042602 | 249 | 0.000104 | com.bandcamp |
| 155 | 19040914 | 824 | 0.000038 | com.venturebeat |
| 156 | 19040684 | 75 | 0.000432 | me.fb |
| 157 | 19039882 | 841 | 0.000037 | org.chromium |
| 158 | 19033464 | 65 | 0.000519 | com.wix |
| 159 | 19026244 | 284 | 0.000092 | com.sciencedirect |
| 160 | 19019766 | 629 | 0.000045 | edu.yale |
| 161 | 19016326 | 584 | 0.000047 | com.pexels |
| 162 | 19015230 | 826 | 0.000038 | org.bitbucket |
| 163 | 19010452 | 832 | 0.000038 | org.ieee |
| 164 | 19007636 | 388 | 0.000068 | com.springer |
| 165 | 19001810 | 765 | 0.000041 | com.evernote |
| 166 | 18997506 | 855 | 0.000037 | edu.upenn |
| 167 | 18994926 | 258 | 0.000098 | jp.ameblo |
| 168 | 18993772 | 149 | 0.000195 | me.t |
| 169 | 18992834 | 416 | 0.000065 | org.hbr |
| 170 | 18992028 | 296 | 0.000088 | com.outlook |
| 171 | 18985954 | 168 | 0.000160 | jp.co.yahoo |
| 172 | 18983238 | 577 | 0.000048 | com.cbsnews |
| 173 | 18982546 | 792 | 0.000040 | me.about |
| 174 | 18981228 | 891 | 0.000035 | com.git-scm |
| 175 | 18980336 | 829 | 0.000038 | com.economist |
| 176 | 18980328 | 150 | 0.000193 | com.opera |
| 177 | 18978056 | 138 | 0.000223 | me.line |
| 178 | 18974996 | 450 | 0.000061 | com.goodreads |
| 179 | 18973364 | 645 | 0.000044 | com.mysql |
| 180 | 18973114 | 842 | 0.000037 | com.docker |
| 181 | 18969708 | 562 | 0.000048 | com.buzzfeed |
| 182 | 18969566 | 565 | 0.000048 | com.mashable |
| 183 | 18968398 | 587 | 0.000047 | com.mozilla |
| 184 | 18964540 | 951 | 0.000034 | com.about |
| 185 | 18962632 | 797 | 0.000040 | org.worldbank |
| 186 | 18956128 | 815 | 0.000039 | com.newyorker |
| 187 | 18954668 | 342 | 0.000076 | com.dribbble |
| 188 | 18954236 | 265 | 0.000096 | net.behance |
| 189 | 18951876 | 390 | 0.000068 | com.theverge |
| 190 | 18951838 | 501 | 0.000054 | gov.whitehouse |
| 191 | 18950142 | 456 | 0.000061 | uk.co.dailymail |
| 192 | 18943890 | 347 | 0.000075 | com.xinhuanet |
| 193 | 18942812 | 320 | 0.000080 | com.w3schools |
| 194 | 18941124 | 378 | 0.000070 | com.fc2 |
| 195 | 18936488 | 1151 | 0.000027 | edu.wisc |
| 196 | 18935074 | 764 | 0.000041 | gov.noaa |
| 197 | 18932396 | 294 | 0.000088 | com.disqus |
| 198 | 18931228 | 1337 | 0.000023 | co.elastic |
| 199 | 18927646 | 38 | 0.000956 | com.qq |
| 200 | 18926694 | 448 | 0.000061 | com.bigcommerce |
| 201 | 18926442 | 624 | 0.000045 | gov.loc |
| 202 | 18925620 | 156 | 0.000179 | gov.cdc |
| 203 | 18924632 | 929 | 0.000035 | gov.fcc |
| 204 | 18922816 | 136 | 0.000228 | info.aboutads |
| 205 | 18921630 | 821 | 0.000039 | com.qz |
| 206 | 18921308 | 2295 | 0.000015 | com.wikidot |
| 207 | 18919240 | 385 | 0.000069 | com.scribd |
| 208 | 18915104 | 748 | 0.000042 | org.unesco |
| 209 | 18914418 | 959 | 0.000033 | com.apnews |
| 210 | 18912426 | 375 | 0.000070 | com.digg |
| 211 | 18911082 | 779 | 0.000040 | com.vox |
| 212 | 18910370 | 180 | 0.000147 | com.amazon-adsystem |
| 213 | 18910110 | 272 | 0.000094 | com.squareup |
| 214 | 18907410 | 495 | 0.000054 | uk.co.independent |
| 215 | 18906224 | 256 | 0.000100 | org.iana |
| 216 | 18905608 | 1251 | 0.000025 | edu.uchicago |
| 217 | 18901398 | 420 | 0.000064 | com.force |
| 218 | 18898702 | 646 | 0.000044 | com.usnews |
| 219 | 18898108 | 647 | 0.000044 | com.gartner |
| 220 | 18894918 | 295 | 0.000088 | com.nbcnews |
| 221 | 18890160 | 470 | 0.000058 | com.dailymotion |
| 222 | 18883488 | 1004 | 0.000031 | com.dropboxusercontent |
| 223 | 18878276 | 617 | 0.000045 | org.pbs |
| 224 | 18876454 | 181 | 0.000147 | jp.co.google |
| 225 | 18876164 | 113 | 0.000292 | com.sharethis |
| 226 | 18875824 | 467 | 0.000059 | com.nationalgeographic |
| 227 | 18874112 | 811 | 0.000039 | uk.co.blogspot |
| 228 | 18873340 | 844 | 0.000037 | au.net.abc |
| 229 | 18868000 | 934 | 0.000034 | com.foxnews |
| 230 | 18865322 | 1559 | 0.000020 | org.eclipse |
| 231 | 18859464 | 399 | 0.000067 | com.getpocket |
| 232 | 18859228 | 947 | 0.000034 | com.slate |
| 233 | 18859062 | 266 | 0.000095 | org.doi |
| 234 | 18858866 | 63 | 0.000541 | com.fb |
| 235 | 18856638 | 968 | 0.000033 | com.politico |
| 236 | 18849992 | 907 | 0.000035 | com.playstation |
| 237 | 18849334 | 600 | 0.000046 | org.semver |
| 238 | 18848468 | 1565 | 0.000020 | gd.is |
| 239 | 18847004 | 1311 | 0.000024 | edu.unc |
| 240 | 18846758 | 1523 | 0.000021 | org.kernel |
| 241 | 18846310 | 839 | 0.000037 | org.sciencemag |
| 242 | 18846038 | 257 | 0.000099 | com.typepad |
| 243 | 18844998 | 1152 | 0.000027 | com.hatenablog |
| 244 | 18844004 | 1981 | 0.000018 | com.googlesource |
| 245 | 18842180 | 202 | 0.000128 | com.naver |
| 246 | 18840548 | 248 | 0.000104 | com.feedburner |
| 247 | 18839830 | 1028 | 0.000030 | edu.umn |
| 248 | 18837518 | 421 | 0.000064 | com.ecwid |
| 249 | 18833048 | 332 | 0.000078 | net.windows |
| 250 | 18831042 | 914 | 0.000035 | com.trello |
| 251 | 18829176 | 554 | 0.000049 | com.tandfonline |
| 252 | 18829172 | 1369 | 0.000023 | cn.com.chinadaily |
| 253 | 18828382 | 189 | 0.000138 | org.allaboutcookies |
| 254 | 18825844 | 746 | 0.000042 | gov.senate |
| 255 | 18823946 | 119 | 0.000286 | com.paypalobjects |
| 256 | 18819980 | 1005 | 0.000031 | ly.ow |
| 257 | 18818724 | 2014 | 0.000017 | org.tensorflow |
| 258 | 18818710 | 901 | 0.000035 | edu.umich |
| 259 | 18817936 | 291 | 0.000089 | com.tinyurl |
| 260 | 18817212 | 479 | 0.000056 | org.pewresearch |
| 261 | 18815000 | 76 | 0.000423 | com.list-manage |
| 262 | 18811132 | 239 | 0.000111 | com.wpengine |
| 263 | 18806908 | 834 | 0.000038 | ca.cbc |
| 264 | 18805144 | 740 | 0.000043 | co.ibb |
| 265 | 18804044 | 477 | 0.000057 | gov.fda |
| 266 | 18802934 | 222 | 0.000117 | com.eepurl |
| 267 | 18802462 | 318 | 0.000081 | it.google |
| 268 | 18798744 | 79 | 0.000413 | net.facebook |
| 269 | 18797046 | 2019 | 0.000017 | com.instructables |
| 270 | 18795562 | 1200 | 0.000026 | edu.northwestern |
| 271 | 18794710 | 752 | 0.000042 | org.change |
| 272 | 18793610 | 394 | 0.000068 | es.google |
| 273 | 18793484 | 893 | 0.000035 | org.cambridge |
| 274 | 18790202 | 251 | 0.000103 | com.calendly |
| 275 | 18784862 | 962 | 0.000033 | gov.congress |
| 276 | 18784862 | 1022 | 0.000030 | uk.co.guardian |
| 277 | 18782014 | 555 | 0.000049 | com.bigcartel |
| 278 | 18777808 | 1348 | 0.000023 | org.semanticscholar |
| 279 | 18776340 | 1006 | 0.000031 | com.gumroad |
| 280 | 18775690 | 637 | 0.000044 | org.plos |
| 281 | 18774956 | 1341 | 0.000023 | com.nikkei |
| 282 | 18773712 | 313 | 0.000083 | com.optimizely |
| 283 | 18772988 | 405 | 0.000066 | com.googlecode |
| 284 | 18766674 | 896 | 0.000035 | gov.justice |
| 285 | 18764788 | 1044 | 0.000029 | com.huffpost |
| 286 | 18764312 | 153 | 0.000186 | com.addtoany |
| 287 | 18763408 | 398 | 0.000067 | me.m |
| 288 | 18761658 | 80 | 0.000403 | com.wsimg |
| 289 | 18760046 | 411 | 0.000065 | com.tripod |
| 290 | 18754884 | 957 | 0.000033 | ee.linktr |
| 291 | 18754526 | 1021 | 0.000030 | gov.usgs |
| 292 | 18753164 | 1459 | 0.000021 | uk.co.wired |
| 293 | 18752728 | 338 | 0.000077 | fr.google |
| 294 | 18751846 | 1059 | 0.000029 | com.500px |
| 295 | 18751636 | 452 | 0.000061 | ca.google |
| 296 | 18749418 | 1996 | 0.000017 | com.amd |
| 297 | 18744444 | 1944 | 0.000018 | com.azure |
| 298 | 18742964 | 777 | 0.000040 | au.com.google |
| 299 | 18742506 | 481 | 0.000056 | com.163 |
| 300 | 18741292 | 1091 | 0.000028 | com.ssrn |
| 301 | 18740758 | 1065 | 0.000029 | com.newsweek |
| 302 | 18734910 | 1688 | 0.000019 | ca.utoronto |
| 303 | 18734620 | 139 | 0.000218 | com.spotify |
| 304 | 18731112 | 744 | 0.000042 | cn.com.people |
| 305 | 18730384 | 334 | 0.000078 | page.g |
| 306 | 18730074 | 2751 | 0.000012 | com.nabble |
| 307 | 18728400 | 1454 | 0.000021 | com.howstuffworks |
| 308 | 18722938 | 2107 | 0.000016 | com.lego |
| 309 | 18719762 | 1675 | 0.000019 | com.storify |
| 310 | 18719332 | 1140 | 0.000027 | uk.co.thetimes |
| 311 | 18717930 | 801 | 0.000039 | site.business |
| 312 | 18717726 | 884 | 0.000036 | uk.ac.ox |
| 313 | 18716206 | 311 | 0.000083 | com.bitly |
| 314 | 18715060 | 1218 | 0.000026 | com.scmp |
| 315 | 18713618 | 798 | 0.000040 | com.adage |
| 316 | 18713552 | 654 | 0.000044 | com.indiatimes |
| 317 | 18712564 | 1908 | 0.000018 | de.mpg |
| 318 | 18712368 | 1057 | 0.000029 | com.thehill |
| 319 | 18705466 | 519 | 0.000052 | com.criteo |
| 320 | 18704754 | 1078 | 0.000028 | org.ohchr |
| 321 | 18704474 | 1531 | 0.000020 | com.aljazeera |
| 322 | 18703348 | 802 | 0.000039 | uk.gov.service |
| 323 | 18701482 | 1545 | 0.000020 | org.greenpeace |
| 324 | 18699064 | 331 | 0.000078 | com.netdna-ssl |
| 325 | 18698378 | 967 | 0.000033 | ch.google |
| 326 | 18693994 | 784 | 0.000040 | us.icio |
| 327 | 18693690 | 1153 | 0.000027 | int.coe |
| 328 | 18692556 | 933 | 0.000034 | org.d3js |
| 329 | 18690456 | 1499 | 0.000021 | com.history |
| 330 | 18689794 | 1018 | 0.000030 | com.netlify |
| 331 | 18688064 | 1320 | 0.000023 | com.nymag |
| 332 | 18687064 | 1363 | 0.000023 | org.wiktionary |
| 333 | 18684868 | 287 | 0.000091 | ru.ok |
| 334 | 18683792 | 1293 | 0.000024 | com.intuit |
| 335 | 18682796 | 1419 | 0.000022 | uk.co.standard |
| 336 | 18681388 | 1995 | 0.000017 | edu.arizona |
| 337 | 18679058 | 944 | 0.000034 | gov.archives |
| 338 | 18678794 | 953 | 0.000034 | ru.google |
| 339 | 18677084 | 1054 | 0.000029 | sg.com.google |
| 340 | 18675890 | 900 | 0.000035 | br.com.google |
| 341 | 18674402 | 85 | 0.000385 | co.g |
| 342 | 18674068 | 1975 | 0.000018 | com.wattpad |
| 343 | 18673754 | 526 | 0.000051 | ru.gov |
| 344 | 18673370 | 1351 | 0.000023 | com.ikea |
| 345 | 18668598 | 1461 | 0.000021 | com.reverbnation |
| 346 | 18668444 | 2681 | 0.000013 | edu.drexel |
| 347 | 18668276 | 1121 | 0.000027 | edu.si |
| 348 | 18666994 | 1174 | 0.000027 | uk.co.mirror |
| 349 | 18666846 | 2572 | 0.000013 | org.maven |
| 350 | 18666724 | 412 | 0.000065 | com.cnet |
| 351 | 18664542 | 580 | 0.000048 | org.openstreetmap |
| 352 | 18663710 | 1373 | 0.000023 | com.jetbrains |
| 353 | 18663688 | 1032 | 0.000030 | com.theconversation |
| 354 | 18663542 | 1921 | 0.000018 | com.newscientist |
| 355 | 18661472 | 847 | 0.000037 | gov.state |
| 356 | 18661146 | 1572 | 0.000020 | ms.1drv |
| 357 | 18660152 | 2644 | 0.000013 | com.mystrikingly |
| 358 | 18655360 | 973 | 0.000032 | org.fao |
| 359 | 18654458 | 590 | 0.000047 | cn.google |
| 360 | 18653472 | 235 | 0.000112 | com.etsy |
| 361 | 18652232 | 1485 | 0.000021 | com.flipboard |
| 362 | 18651820 | 767 | 0.000041 | com.deviantart |
| 363 | 18651514 | 1375 | 0.000023 | com.thedailybeast |
| 364 | 18651404 | 1220 | 0.000026 | org.jstor |
| 365 | 18649024 | 1270 | 0.000024 | com.strikingly |
| 366 | 18647422 | 2045 | 0.000017 | blog.home |
| 367 | 18646812 | 634 | 0.000044 | com.zdnet |
| 368 | 18644828 | 325 | 0.000079 | tv.twitch |
| 369 | 18642272 | 2781 | 0.000012 | com.diigo |
| 370 | 18640482 | 1123 | 0.000027 | com.britannica |
| 371 | 18639254 | 1904 | 0.000018 | ca.ubc |
| 372 | 18638840 | 367 | 0.000072 | com.jotform |
| 373 | 18635188 | 1959 | 0.000018 | com.gettyimages |
| 374 | 18634254 | 1685 | 0.000019 | com.channel4 |
| 375 | 18631278 | 1494 | 0.000021 | org.pypi |
| 376 | 18630386 | 813 | 0.000039 | in.co.google |
| 377 | 18627814 | 417 | 0.000064 | com.ssl-images-amazon |
| 378 | 18626978 | 161 | 0.000166 | gle.forms |
| 379 | 18623310 | 1982 | 0.000018 | org.hrw |
| 380 | 18623132 | 281 | 0.000092 | com.cloudinary |
| 381 | 18618612 | 1382 | 0.000022 | au.com.smh |
| 382 | 18617234 | 1566 | 0.000020 | uk.co.metro |
| 383 | 18617180 | 2031 | 0.000017 | hk.com.google |
| 384 | 18617072 | 1599 | 0.000020 | edu.ufl |
| 385 | 18613590 | 2332 | 0.000015 | ly.rebrand |
| 386 | 18612786 | 457 | 0.000061 | net.imgix |
| 387 | 18609746 | 418 | 0.000064 | com.webflow |
| 388 | 18609050 | 2311 | 0.000015 | com.shutterfly |
| 389 | 18607782 | 568 | 0.000048 | com.feedly |
| 390 | 18603850 | 538 | 0.000050 | gov.epa |
| 391 | 18602470 | 104 | 0.000348 | com.stripe |
| 392 | 18601118 | 83 | 0.000391 | net.jsfiddle |
| 393 | 18599796 | 3423 | 0.000010 | org.aclweb |
| 394 | 18597166 | 2348 | 0.000014 | com.yarnpkg |
| 395 | 18596278 | 69 | 0.000461 | net.akamaihd |
| 396 | 18596202 | 1907 | 0.000018 | gov.supremecourt |
| 397 | 18595244 | 2344 | 0.000014 | com.thefreedictionary |
| 398 | 18593816 | 468 | 0.000058 | nl.google |
| 399 | 18592072 | 1578 | 0.000020 | com.dw |
| 400 | 18588294 | 2955 | 0.000012 | com.upi |
| 401 | 18587932 | 981 | 0.000032 | com.thelancet |
| 402 | 18587926 | 425 | 0.000064 | com.slack |
| 403 | 18587680 | 396 | 0.000067 | com.kickstarter |
| 404 | 18587378 | 787 | 0.000040 | com.urldefense |
| 405 | 18585950 | 1713 | 0.000019 | ca.sfu |
| 406 | 18583582 | 460 | 0.000060 | com.livechatinc |
| 407 | 18581082 | 623 | 0.000045 | com.quora |
| 408 | 18580964 | 428 | 0.000063 | com.rackcdn |
| 409 | 18580620 | 1967 | 0.000018 | com.euronews |
| 410 | 18580552 | 451 | 0.000061 | com.go |
| 411 | 18580130 | 1368 | 0.000023 | com.tunein |
| 412 | 18578076 | 594 | 0.000046 | ru.liveinternet |
| 413 | 18576712 | 475 | 0.000057 | com.googleblog |
| 414 | 18571776 | 2597 | 0.000013 | pt.sapo |
| 415 | 18571212 | 2109 | 0.000016 | com.itv |
| 416 | 18570630 | 1945 | 0.000018 | uk.co.huffingtonpost |
| 417 | 18570542 | 1286 | 0.000024 | edu.brookings |
| 418 | 18570528 | 4423 | 0.000008 | tl.page |
| 419 | 18570058 | 2369 | 0.000014 | com.angelfire |
| 420 | 18568882 | 2614 | 0.000013 | org.wikibooks |
| 421 | 18567302 | 1692 | 0.000019 | com.ifttt |
| 422 | 18564134 | 861 | 0.000036 | com.freepik |
| 423 | 18563246 | 2244 | 0.000015 | com.netvibes |
| 424 | 18562602 | 133 | 0.000251 | com.mailchimp |
| 425 | 18562564 | 364 | 0.000072 | me.telegram |
| 426 | 18562400 | 561 | 0.000048 | com.microsoftonline |
| 427 | 18562224 | 1976 | 0.000018 | uk.co.express |
| 428 | 18559206 | 2888 | 0.000012 | sg.edu.nus |
| 429 | 18559092 | 1928 | 0.000018 | io.webflow |
| 430 | 18557292 | 772 | 0.000041 | pl.google |
| 431 | 18555900 | 480 | 0.000056 | com.meetup |
| 432 | 18555482 | 4752 | 0.000007 | com.newgrounds |
| 433 | 18554944 | 2397 | 0.000014 | google.ai |
| 434 | 18554512 | 2439 | 0.000014 | com.yolasite |
| 435 | 18553912 | 2124 | 0.000016 | jp.geocities |
| 436 | 18552986 | 3394 | 0.000011 | com.instapaper |
| 437 | 18551338 | 362 | 0.000072 | com.proofpoint |
| 438 | 18548844 | 1358 | 0.000023 | com.people |
| 439 | 18546296 | 64 | 0.000531 | net.typekit |
| 440 | 18543694 | 2104 | 0.000016 | org.c-span |
| 441 | 18541918 | 159 | 0.000169 | ru.mail |
| 442 | 18541834 | 2043 | 0.000017 | com.avg |
| 443 | 18540650 | 2249 | 0.000015 | app.netlify |
| 444 | 18539394 | 3004 | 0.000011 | com.000webhostapp |
| 445 | 18539316 | 485 | 0.000055 | com.elsevier |
| 446 | 18538008 | 3494 | 0.000010 | cn.edu.pku |
| 447 | 18536872 | 1609 | 0.000020 | com.asahi |
| 448 | 18535422 | 876 | 0.000036 | org.worldwildlife |
| 449 | 18535204 | 1127 | 0.000027 | uk.parliament |
| 450 | 18534822 | 1956 | 0.000018 | uk.gov.ons |
| 451 | 18533694 | 188 | 0.000138 | com.iubenda |
| 452 | 18532790 | 2113 | 0.000016 | org.documentcloud |
| 453 | 18532338 | 3074 | 0.000011 | uk.co.timesonline |
| 454 | 18531118 | 264 | 0.000096 | com.office |
| 455 | 18527764 | 237 | 0.000112 | com.eventbrite |
| 456 | 18527012 | 2699 | 0.000013 | com.self |
| 457 | 18526172 | 2511 | 0.000013 | com.foreignpolicy |
| 458 | 18524804 | 2421 | 0.000014 | org.sundance |
| 459 | 18524702 | 214 | 0.000120 | com.aliyuncs |
| 460 | 18524140 | 1213 | 0.000026 | be.google |
| 461 | 18523242 | 2200 | 0.000016 | ie.google |
| 462 | 18523000 | 1432 | 0.000022 | gov.weather |
| 463 | 18522694 | 3136 | 0.000011 | com.openai |
| 464 | 18522588 | 879 | 0.000036 | org.mediawiki |
| 465 | 18521124 | 2806 | 0.000012 | com.pearltrees |
| 466 | 18520306 | 1704 | 0.000019 | com.firebaseapp |
| 467 | 18516520 | 3620 | 0.000010 | com.dailycaller |
| 468 | 18514512 | 498 | 0.000054 | it.placehold |
| 469 | 18514168 | 2695 | 0.000013 | com.france24 |
| 470 | 18513026 | 644 | 0.000044 | edu.berkeley |
| 471 | 18512138 | 492 | 0.000055 | cn.360 |
| 472 | 18511428 | 2296 | 0.000015 | com.msnbc |
| 473 | 18510986 | 2089 | 0.000017 | com.thestar |
| 474 | 18510258 | 3732 | 0.000009 | me.site123 |
| 475 | 18509392 | 2133 | 0.000016 | com.gfycat |
| 476 | 18508906 | 341 | 0.000076 | com.rawgit |
| 477 | 18507920 | 521 | 0.000052 | com.gmail |
| 478 | 18507686 | 1952 | 0.000018 | org.ocks |
| 479 | 18506872 | 2739 | 0.000012 | org.rsc |
| 480 | 18504310 | 2486 | 0.000014 | edu.hawaii |
| 481 | 18503766 | 2366 | 0.000014 | de.br |
| 482 | 18503250 | 2447 | 0.000014 | edu.colostate |
| 483 | 18502578 | 171 | 0.000154 | com.zendesk |
| 484 | 18501424 | 2222 | 0.000015 | org.nobelprize |
| 485 | 18501096 | 3293 | 0.000011 | net.pixnet |
| 486 | 18500188 | 1528 | 0.000020 | net.seesaa |
| 487 | 18500164 | 2471 | 0.000014 | com.motherjones |
| 488 | 18499720 | 756 | 0.000042 | com.vice |
| 489 | 18499378 | 4234 | 0.000008 | com.masslive |
| 490 | 18496634 | 2355 | 0.000014 | com.cision |
| 491 | 18495058 | 101 | 0.000361 | com.godaddy |
| 492 | 18492104 | 886 | 0.000036 | gov.nist |
| 493 | 18491956 | 1249 | 0.000025 | org.ilo |
| 494 | 18490654 | 2070 | 0.000017 | com.surveygizmo |
| 495 | 18490628 | 3378 | 0.000011 | com.minds |
| 496 | 18490576 | 635 | 0.000044 | com.matterport |
| 497 | 18489858 | 2656 | 0.000013 | ph.com.google |
| 498 | 18488106 | 369 | 0.000071 | org.python |
| 499 | 18487032 | 980 | 0.000032 | gov.va |
| 500 | 18485800 | 1166 | 0.000027 | at.google |
| 501 | 18485152 | 1318 | 0.000023 | se.google |
| 502 | 18483644 | 1961 | 0.000018 | ru.ucoz |
| 503 | 18482996 | 2401 | 0.000014 | com.freep |
| 504 | 18482190 | 3874 | 0.000009 | com.wizards |
| 505 | 18481738 | 3583 | 0.000010 | edu.uvm |
| 506 | 18478142 | 3711 | 0.000010 | org.tvtropes |
| 507 | 18476988 | 1506 | 0.000021 | com.cognitoforms |
| 508 | 18476516 | 1493 | 0.000021 | gov.uscourts |
| 509 | 18476024 | 3530 | 0.000010 | org.oxfam |
| 510 | 18473992 | 2235 | 0.000015 | cn.t |
| 511 | 18473054 | 4331 | 0.000008 | fm.ask |
| 512 | 18473034 | 1708 | 0.000019 | dk.google |
| 513 | 18470526 | 3122 | 0.000011 | de.dw |
| 514 | 18467204 | 2009 | 0.000017 | ua.com.google |
| 515 | 18467126 | 3935 | 0.000009 | com.youdao |
| 516 | 18464016 | 128 | 0.000262 | org.networkadvertising |
| 517 | 18462968 | 1031 | 0.000030 | com.arstechnica |
| 518 | 18462674 | 2310 | 0.000015 | int.unfccc |
| 519 | 18461844 | 3323 | 0.000011 | ch.nzz |
| 520 | 18460156 | 123 | 0.000276 | com.statcounter |
| 521 | 18460126 | 3757 | 0.000009 | net.hinet |
| 522 | 18460018 | 2484 | 0.000014 | com.washingtontimes |
| 523 | 18459778 | 3391 | 0.000011 | edu.miami |
| 524 | 18459648 | 5025 | 0.000007 | tw.com.gamer |
| 525 | 18459120 | 4313 | 0.000008 | ch.qos |
| 526 | 18458774 | 788 | 0.000040 | com.intel |
| 527 | 18456584 | 2220 | 0.000015 | mx.com.google |
| 528 | 18455734 | 2241 | 0.000015 | gov.ky |
| 529 | 18455504 | 3426 | 0.000010 | com.nwsource |
| 530 | 18454948 | 856 | 0.000037 | io.readthedocs |
| 531 | 18453730 | 2187 | 0.000016 | gov.cisa |
| 532 | 18451988 | 2256 | 0.000015 | com.straitstimes |
| 533 | 18449466 | 371 | 0.000071 | io.codepen |
| 534 | 18447006 | 361 | 0.000072 | com.prnewswire |
| 535 | 18446224 | 4097 | 0.000009 | com.smore |
| 536 | 18446132 | 2188 | 0.000016 | pt.google |
| 537 | 18445920 | 2719 | 0.000012 | net.bplaced |
| 538 | 18445802 | 5349 | 0.000007 | net.wargaming |
| 539 | 18445232 | 3272 | 0.000011 | org.csis |
| 540 | 18444732 | 1435 | 0.000022 | org.aarp |
| 541 | 18444080 | 289 | 0.000090 | net.php |
| 542 | 18443758 | 2282 | 0.000015 | no.google |
| 543 | 18443228 | 3924 | 0.000009 | com.steemit |
| 544 | 18443146 | 1304 | 0.000024 | tw.com.google |
| 545 | 18442018 | 314 | 0.000083 | com.squarespace |
| 546 | 18440872 | 743 | 0.000043 | com.oreilly |
| 547 | 18440596 | 199 | 0.000130 | com.hubspot |
| 548 | 18439354 | 4877 | 0.000007 | com.bonanza |
| 549 | 18438802 | 2020 | 0.000017 | co.lpages |
| 550 | 18438606 | 1079 | 0.000028 | net.ovh |
| 551 | 18438208 | 835 | 0.000037 | com.imageshack |
| 552 | 18437874 | 4023 | 0.000009 | com.doodlekit |
| 553 | 18436818 | 2425 | 0.000014 | com.voanews |
| 554 | 18436680 | 358 | 0.000073 | ru.rambler |
| 555 | 18436048 | 2805 | 0.000012 | com.nationalpost |
| 556 | 18435420 | 4534 | 0.000008 | by.google |
| 557 | 18435256 | 614 | 0.000045 | org.nodejs |
| 558 | 18435200 | 397 | 0.000067 | com.onesignal |
| 559 | 18434470 | 3374 | 0.000011 | fr.rfi |
| 560 | 18434466 | 463 | 0.000060 | gov.irs |
| 561 | 18434444 | 2584 | 0.000013 | com.snopes |
| 562 | 18434230 | 1899 | 0.000018 | link.page |
| 563 | 18434190 | 3637 | 0.000010 | org.vim |
| 564 | 18434018 | 2240 | 0.000015 | th.co.google |
| 565 | 18433782 | 3395 | 0.000010 | org.scala-lang |
| 566 | 18432434 | 3142 | 0.000011 | com.inquirer |
| 567 | 18430898 | 2887 | 0.000012 | org.ballotpedia |
| 568 | 18430888 | 3324 | 0.000011 | com.real |
| 569 | 18428600 | 649 | 0.000044 | br.com.uol |
| 570 | 18428004 | 513 | 0.000052 | com.pixabay |
| 571 | 18426658 | 2142 | 0.000016 | uk.co.which |
| 572 | 18426634 | 4070 | 0.000009 | com.viki |
| 573 | 18425674 | 1038 | 0.000030 | com.thenextweb |
| 574 | 18424302 | 3146 | 0.000011 | org.aps |
| 575 | 18424050 | 2764 | 0.000012 | com.post-gazette |
| 576 | 18423516 | 2499 | 0.000014 | net.openid |
| 577 | 18422702 | 2627 | 0.000013 | edu.usf |
| 578 | 18421138 | 82 | 0.000391 | com.livestream |
| 579 | 18420414 | 961 | 0.000033 | jp.shinobi |
| 580 | 18420272 | 956 | 0.000033 | int.wipo |
| 581 | 18417146 | 4450 | 0.000008 | com.bravesites |
| 582 | 18415542 | 2881 | 0.000012 | ru.aif |
| 583 | 18414574 | 2906 | 0.000012 | io.gitlab |
| 584 | 18414284 | 3387 | 0.000011 | org.pri |
| 585 | 18414276 | 1932 | 0.000018 | gov.ct |
| 586 | 18413984 | 2602 | 0.000013 | il.co.google |
| 587 | 18413906 | 1910 | 0.000018 | org.oxfordjournals |
| 588 | 18413218 | 4664 | 0.000008 | com.ucoz |
| 589 | 18412422 | 566 | 0.000048 | com.photobucket |
| 590 | 18412344 | 2191 | 0.000016 | com.xrea |
| 591 | 18412198 | 2234 | 0.000015 | nz.co.google |
| 592 | 18410920 | 2088 | 0.000017 | net.cnki |
| 593 | 18410828 | 2847 | 0.000012 | com.webbyawards |
| 594 | 18410164 | 433 | 0.000063 | com.staticflickr |
| 595 | 18409934 | 3675 | 0.000010 | org.heritage |
| 596 | 18408908 | 1993 | 0.000018 | tr.com.google |
| 597 | 18408574 | 2053 | 0.000017 | com.treehugger |
| 598 | 18406062 | 1695 | 0.000019 | net.leadpages |
| 599 | 18405282 | 2112 | 0.000016 | fi.google |
| 600 | 18402764 | 5153 | 0.000007 | kz.google |
| 601 | 18402708 | 211 | 0.000121 | to.amzn |
| 602 | 18402670 | 569 | 0.000048 | com.deloitte |
| 603 | 18402662 | 1100 | 0.000028 | cz.google |
| 604 | 18402526 | 4562 | 0.000008 | com.freehostia |
| 605 | 18402334 | 2156 | 0.000016 | gov.faa |
| 606 | 18402326 | 2724 | 0.000012 | com.detroitnews |
| 607 | 18402220 | 2774 | 0.000012 | com.slidesharecdn |
| 608 | 18402102 | 346 | 0.000075 | com.adnxs |
| 609 | 18396726 | 812 | 0.000039 | com.thinkwithgoogle |
| 610 | 18392816 | 1471 | 0.000021 | com.trustwave |
| 611 | 18392376 | 2640 | 0.000013 | org.iea |
| 612 | 18392262 | 2883 | 0.000012 | jp.blog |
| 613 | 18391148 | 4426 | 0.000008 | com.goal |
| 614 | 18390184 | 3284 | 0.000011 | com.financialpost |
| 615 | 18389140 | 3636 | 0.000010 | net.alarabiya |
| 616 | 18389082 | 3570 | 0.000010 | org.neocities |
| 617 | 18388580 | 3784 | 0.000009 | co.ello |
| 618 | 18388256 | 207 | 0.000126 | com.salesforce |
| 619 | 18386478 | 3500 | 0.000010 | com.archdaily |
| 620 | 18385984 | 4517 | 0.000008 | com.alamy |
| 621 | 18385924 | 2297 | 0.000015 | gr.google |
| 622 | 18385398 | 160 | 0.000168 | gov.privacyshield |
| 623 | 18385020 | 2569 | 0.000013 | org.kqed |
| 624 | 18383196 | 277 | 0.000093 | org.drupal |
| 625 | 18382110 | 354 | 0.000074 | com.snapchat |
| 626 | 18381496 | 2338 | 0.000015 | ro.google |
| 627 | 18381392 | 3367 | 0.000011 | uk.ac.leeds |
| 628 | 18381316 | 271 | 0.000094 | com.mapbox |
| 629 | 18380144 | 3907 | 0.000009 | uk.gov.scotland |
| 630 | 18379620 | 1946 | 0.000018 | hu.google |
| 631 | 18378224 | 4399 | 0.000008 | co.aeon |
| 632 | 18377446 | 374 | 0.000070 | com.cdninstagram |
| 633 | 18376062 | 3545 | 0.000010 | gov.fec |
| 634 | 18376022 | 3312 | 0.000011 | com.virgin |
| 635 | 18375628 | 2219 | 0.000015 | ar.com.google |
| 636 | 18375060 | 4128 | 0.000009 | cn.globaltimes |
| 637 | 18374688 | 4333 | 0.000008 | com.corel |
| 638 | 18374066 | 464 | 0.000059 | com.herokuapp |
| 639 | 18373200 | 4062 | 0.000009 | jp.go.ndl |
| 640 | 18373110 | 791 | 0.000040 | google.blog |
| 641 | 18372316 | 2208 | 0.000016 | com.justia |
| 642 | 18372216 | 2320 | 0.000015 | za.co.google |
| 643 | 18370616 | 2216 | 0.000016 | ru.ria |
| 644 | 18370232 | 3694 | 0.000010 | com.intensedebate |
| 645 | 18369794 | 3793 | 0.000009 | com.visualcapitalist |
| 646 | 18369094 | 2722 | 0.000012 | si.google |
| 647 | 18368512 | 4182 | 0.000008 | com.rediff |
| 648 | 18367604 | 3834 | 0.000009 | ca.uvic |
| 649 | 18367236 | 2577 | 0.000013 | ru.rosminzdrav |
| 650 | 18365918 | 439 | 0.000062 | com.nypost |
| 651 | 18365880 | 4678 | 0.000008 | org.wikimapia |
| 652 | 18365350 | 3439 | 0.000010 | com.nationalreview |
| 653 | 18364962 | 2134 | 0.000016 | uk.org.asa |
| 654 | 18364282 | 3850 | 0.000009 | tw.edu.ntu |
| 655 | 18363974 | 598 | 0.000046 | com.samsung |
| 656 | 18363190 | 2703 | 0.000012 | is.google |
| 657 | 18362598 | 3869 | 0.000009 | com.podomatic |
| 658 | 18361242 | 316 | 0.000082 | cn.bshare |
| 659 | 18360424 | 3484 | 0.000010 | org.wri |
| 660 | 18360028 | 4160 | 0.000009 | uk.co.spectator |
| 661 | 18359858 | 1711 | 0.000019 | ly.cutt |
| 662 | 18358316 | 4989 | 0.000007 | to.gplus |
| 663 | 18358086 | 4908 | 0.000007 | com.atwebpages |
| 664 | 18357826 | 177 | 0.000150 | com.tripadvisor |
| 665 | 18357438 | 5003 | 0.000007 | org.scala-sbt |
| 666 | 18356488 | 4276 | 0.000008 | ru.msu |
| 667 | 18356450 | 1161 | 0.000027 | com.udemy |
| 668 | 18355358 | 2973 | 0.000011 | com.timesofisrael |
| 669 | 18352506 | 5213 | 0.000007 | edu.csulb |
| 670 | 18351622 | 4744 | 0.000007 | com.authorstream |
| 671 | 18350944 | 4127 | 0.000009 | gy.rb |
| 672 | 18350110 | 3204 | 0.000011 | us.ny.state |
| 673 | 18349876 | 3644 | 0.000010 | com.linuxquota |
| 674 | 18349798 | 3563 | 0.000010 | com.udn |
| 675 | 18349578 | 3845 | 0.000009 | org.jenkins-ci |
| 676 | 18349508 | 1686 | 0.000019 | com.pcworld |
| 677 | 18349104 | 2481 | 0.000014 | uk.ac.imperial |
| 678 | 18348784 | 5238 | 0.000007 | com.etymonline |
| 679 | 18348026 | 3492 | 0.000010 | eg.com.google |
| 680 | 18347774 | 3363 | 0.000011 | uk.co.bbci |
| 681 | 18347338 | 2386 | 0.000014 | com.name |
| 682 | 18346938 | 3745 | 0.000009 | com.novell |
| 683 | 18345924 | 1487 | 0.000021 | com.digitaloceanspaces |
| 684 | 18345376 | 6040 | 0.000006 | net.vingle |
| 685 | 18345350 | 2615 | 0.000013 | us.pa.state |
| 686 | 18345040 | 642 | 0.000044 | com.xiti |
| 687 | 18345006 | 2302 | 0.000015 | fr.pagesjaunes |
| 688 | 18344246 | 4604 | 0.000008 | by.tut |
| 689 | 18341982 | 78 | 0.000417 | com.messenger |
| 690 | 18341502 | 1672 | 0.000019 | id.co.google |
| 691 | 18341492 | 4012 | 0.000009 | com.donaldjtrump |
| 692 | 18339724 | 2359 | 0.000014 | co.pcdn |
| 693 | 18338674 | 606 | 0.000046 | com.indeed |
| 694 | 18338446 | 459 | 0.000060 | com.sxsw |
| 695 | 18337870 | 2379 | 0.000014 | sk.google |
| 696 | 18337126 | 246 | 0.000105 | uk.co.amazon |
| 697 | 18336826 | 351 | 0.000074 | com.atlassian |
| 698 | 18336810 | 1225 | 0.000025 | com.dell |
| 699 | 18336442 | 4947 | 0.000007 | fr.online |
| 700 | 18336226 | 1933 | 0.000018 | com.law |
| 701 | 18335648 | 3783 | 0.000009 | com.wmtransfer |
| 702 | 18335422 | 2242 | 0.000015 | kr.co.google |
| 703 | 18335402 | 4709 | 0.000008 | edu.odu |
| 704 | 18335130 | 2971 | 0.000011 | cl.google |
| 705 | 18335024 | 4300 | 0.000008 | il.ac.huji |
| 706 | 18334782 | 4271 | 0.000008 | tw.gov.cdc |
| 707 | 18333794 | 2886 | 0.000012 | my.com.google |
| 708 | 18333014 | 3385 | 0.000011 | com.scotsman |
| 709 | 18332864 | 3322 | 0.000011 | com.instructure |
| 710 | 18332832 | 4563 | 0.000008 | com.hackaday |
| 711 | 18332194 | 2131 | 0.000016 | gov.pa |
| 712 | 18332054 | 627 | 0.000045 | com.withgoogle |
| 713 | 18331108 | 1997 | 0.000017 | scot.gov |
| 714 | 18330912 | 3178 | 0.000011 | com.broadwayworld |
| 715 | 18330804 | 858 | 0.000036 | com.canva |
| 716 | 18330694 | 4525 | 0.000008 | com.mongabay |
| 717 | 18329802 | 4508 | 0.000008 | com.macobserver |
| 718 | 18329686 | 3725 | 0.000010 | org.sonatype |
| 719 | 18328118 | 2391 | 0.000014 | gov.wi |
| 720 | 18327736 | 2683 | 0.000013 | org.usgbc |
| 721 | 18327662 | 4113 | 0.000009 | gov.peacecorps |
| 722 | 18327624 | 4652 | 0.000008 | cn.tianya |
| 723 | 18326710 | 3495 | 0.000010 | pk.com.google |
| 724 | 18326302 | 870 | 0.000036 | com.marketwatch |
| 725 | 18326164 | 1490 | 0.000021 | com.billboard |
| 726 | 18324976 | 107 | 0.000316 | net.gandi |
| 727 | 18324878 | 2845 | 0.000012 | com.thecut |
| 728 | 18324686 | 89 | 0.000372 | me.ogp |
| 729 | 18323980 | 4585 | 0.000008 | io.meduza |
| 730 | 18323898 | 2827 | 0.000012 | uk.org.nationaltrust |
| 731 | 18323758 | 3911 | 0.000009 | au.edu.adelaide |
| 732 | 18323398 | 4766 | 0.000007 | de.uni-erlangen |
| 733 | 18322482 | 3759 | 0.000009 | uk.org.rspb |
| 734 | 18322376 | 3773 | 0.000009 | cv.google |
| 735 | 18321256 | 5135 | 0.000007 | cat.bcn |
| 736 | 18319736 | 3728 | 0.000009 | com.ipage |
| 737 | 18319726 | 5311 | 0.000007 | com.brother |
| 738 | 18318148 | 2410 | 0.000014 | my.com.thestar |
| 739 | 18317872 | 3401 | 0.000010 | uk.ac.york |
| 740 | 18317504 | 3315 | 0.000011 | com.politifact |
| 741 | 18317408 | 3128 | 0.000011 | ee.google |
| 742 | 18317178 | 3326 | 0.000011 | org.thinkprogress |
| 743 | 18317034 | 2102 | 0.000016 | se.haxx |
| 744 | 18316764 | 4554 | 0.000008 | au.edu.rmit |
| 745 | 18316272 | 2959 | 0.000011 | hr.google |
| 746 | 18315296 | 5212 | 0.000007 | com.selfridges |
| 747 | 18315244 | 3772 | 0.000009 | au.com.telstra |
| 748 | 18313746 | 1436 | 0.000022 | com.fiverr |
| 749 | 18313044 | 3420 | 0.000010 | de.hu-berlin |
| 750 | 18311516 | 3572 | 0.000010 | com.nola |
| 751 | 18311094 | 3458 | 0.000010 | sa.com.google |
| 752 | 18310436 | 4145 | 0.000009 | ca.dal |
| 753 | 18310126 | 6237 | 0.000006 | org.arkive |
| 754 | 18309422 | 2759 | 0.000012 | bg.google |
| 755 | 18308696 | 3429 | 0.000010 | com.monday |
| 756 | 18308664 | 4635 | 0.000008 | at.tugraz |
| 757 | 18308432 | 3508 | 0.000010 | com.eiseverywhere |
| 758 | 18308298 | 3764 | 0.000009 | uk.co.cfdr |
| 759 | 18308102 | 3298 | 0.000011 | org.iucn |
| 760 | 18307444 | 3571 | 0.000010 | app.web |
| 761 | 18306932 | 3702 | 0.000010 | org.iucnredlist |
| 762 | 18306908 | 292 | 0.000088 | com.surveymonkey |
| 763 | 18306390 | 3806 | 0.000009 | gi.com.google |
| 764 | 18306038 | 5056 | 0.000007 | ec.com.google |
| 765 | 18305962 | 3875 | 0.000009 | de.uni-freiburg |
| 766 | 18305528 | 4244 | 0.000008 | au.com.heraldsun |
| 767 | 18305222 | 515 | 0.000052 | io.shields |
| 768 | 18304914 | 610 | 0.000046 | org.eff |
| 769 | 18304878 | 3829 | 0.000009 | com.psmag |
| 770 | 18304506 | 4721 | 0.000007 | ua.at |
| 771 | 18302798 | 930 | 0.000034 | gov.uspto |
| 772 | 18302648 | 190 | 0.000137 | com.automattic |
| 773 | 18301286 | 3948 | 0.000009 | com.mozello |
| 774 | 18300612 | 1108 | 0.000028 | com.gizmodo |
| 775 | 18300418 | 3596 | 0.000010 | pl.wp |
| 776 | 18300322 | 3471 | 0.000010 | org.royalsociety |
| 777 | 18299622 | 2819 | 0.000012 | org.unep |
| 778 | 18299452 | 3606 | 0.000010 | com.realclearpolitics |
| 779 | 18298298 | 3531 | 0.000010 | jp.coocan |
| 780 | 18298296 | 2613 | 0.000013 | vn.com.google |
| 781 | 18298218 | 4434 | 0.000008 | jp.hatenablog |
| 782 | 18297896 | 4281 | 0.000008 | com.waitrose |
| 783 | 18297876 | 4676 | 0.000008 | info.webry |
| 784 | 18297852 | 4427 | 0.000008 | net.inquirer |
| 785 | 18297704 | 4274 | 0.000008 | jp.gree |
| 786 | 18297178 | 4611 | 0.000008 | org.nationalinterest |
| 787 | 18296330 | 2981 | 0.000011 | edu.uconn |
| 788 | 18295610 | 946 | 0.000034 | edu.columbia |
| 789 | 18295554 | 5531 | 0.000006 | org.mises |
| 790 | 18295452 | 1274 | 0.000024 | com.smashingmagazine |
| 791 | 18295224 | 3303 | 0.000011 | uk.gov.companieshouse |
| 792 | 18294866 | 4442 | 0.000008 | gov.ourdocuments |
| 793 | 18294666 | 3894 | 0.000009 | sl.com.google |
| 794 | 18292912 | 6218 | 0.000006 | com.rhino3d |
| 795 | 18292842 | 3435 | 0.000010 | org.cfr |
| 796 | 18292780 | 790 | 0.000040 | com.airbnb |
| 797 | 18292712 | 283 | 0.000092 | jp.co.amazon |
| 798 | 18291570 | 413 | 0.000065 | com.pubmatic |
| 799 | 18290920 | 878 | 0.000036 | com.box |
| 800 | 18290426 | 5610 | 0.000006 | com.coroflot |
| 801 | 18290346 | 4348 | 0.000008 | com.thediplomat |
| 802 | 18286902 | 4066 | 0.000009 | com.inhabitat |
| 803 | 18286668 | 3277 | 0.000011 | com.bp |
| 804 | 18286522 | 4592 | 0.000008 | cat.uab |
| 805 | 18283480 | 3827 | 0.000009 | uk.co.villiers-london |
| 806 | 18283014 | 4140 | 0.000009 | org.grist |
| 807 | 18282452 | 4016 | 0.000009 | com.foreignaffairs |
| 808 | 18281324 | 1081 | 0.000028 | com.tapad |
| 809 | 18280378 | 1347 | 0.000023 | org.altervista |
| 810 | 18280358 | 382 | 0.000069 | com.skype |
| 811 | 18280324 | 4349 | 0.000008 | com.worldsecuresystems |
| 812 | 18279680 | 2409 | 0.000014 | com.volusion |
| 813 | 18279516 | 2907 | 0.000012 | ru.nethouse |
| 814 | 18279480 | 3527 | 0.000010 | pe.com.google |
| 815 | 18279438 | 4779 | 0.000007 | be.lesoir |
| 816 | 18278874 | 3288 | 0.000011 | co.com.google |
| 817 | 18278816 | 3885 | 0.000009 | de.uni-koeln |
| 818 | 18278778 | 2910 | 0.000012 | org.gnupg |
| 819 | 18278022 | 4656 | 0.000008 | com.mihanblog |
| 820 | 18277554 | 3360 | 0.000011 | org.panda |
| 821 | 18277186 | 3440 | 0.000010 | lv.google |
| 822 | 18276674 | 5300 | 0.000007 | lu.google |
| 823 | 18276442 | 484 | 0.000055 | com.inc |
| 824 | 18275676 | 5103 | 0.000007 | cn.com.caijing |
| 825 | 18275134 | 3331 | 0.000011 | uk.gov.metoffice |
| 826 | 18274258 | 68 | 0.000471 | com.oculus |
| 827 | 18273732 | 2364 | 0.000014 | org.donorbox |
| 828 | 18273312 | 3038 | 0.000011 | rs.google |
| 829 | 18273256 | 1197 | 0.000026 | com.merriam-webster |
| 830 | 18271448 | 5051 | 0.000007 | ee.ut |
| 831 | 18271060 | 2519 | 0.000013 | com.amebaownd |
| 832 | 18270922 | 4482 | 0.000008 | com.marksandspencer |
| 833 | 18270780 | 6447 | 0.000006 | su.clan |
| 834 | 18269948 | 4096 | 0.000009 | ru.interfax |
| 835 | 18269620 | 3852 | 0.000009 | org.rferl |
| 836 | 18268756 | 2904 | 0.000012 | gov.nd |
| 837 | 18267994 | 548 | 0.000049 | com.fortune |
| 838 | 18267776 | 4693 | 0.000008 | it.unitn |
| 839 | 18267714 | 5665 | 0.000006 | am.google |
| 840 | 18266762 | 3502 | 0.000010 | org.iaea |
| 841 | 18263748 | 3893 | 0.000009 | pr.com.google |
| 842 | 18262158 | 5045 | 0.000007 | com.tok2 |
| 843 | 18261938 | 1901 | 0.000018 | ch.ethz |
| 844 | 18261922 | 3342 | 0.000011 | gov.la |
| 845 | 18261182 | 4507 | 0.000008 | org.democracynow |
| 846 | 18261176 | 2593 | 0.000013 | net.noscript |
| 847 | 18260216 | 836 | 0.000037 | com.mix |
| 848 | 18259862 | 408 | 0.000066 | net.adform |
| 849 | 18259608 | 5208 | 0.000007 | tn.google |
| 850 | 18257978 | 4212 | 0.000008 | jp.hateblo |
| 851 | 18257888 | 6029 | 0.000006 | hk.edu.hkbu |
| 852 | 18257680 | 3884 | 0.000009 | nl.wur |
| 853 | 18257594 | 5009 | 0.000007 | gr.auth |
| 854 | 18257406 | 997 | 0.000031 | com.webs |
| 855 | 18256760 | 4512 | 0.000008 | com.mnn |
| 856 | 18256702 | 5759 | 0.000006 | ru.nnov |
| 857 | 18256238 | 3954 | 0.000009 | com.afp |
| 858 | 18255744 | 1365 | 0.000023 | com.format |
| 859 | 18255662 | 5209 | 0.000007 | nf.co |
| 860 | 18253954 | 329 | 0.000079 | com.getbootstrap |
| 861 | 18252988 | 4961 | 0.000007 | jp.hatenadiary |
| 862 | 18252154 | 4728 | 0.000007 | hk.com.hkex |
| 863 | 18251258 | 1193 | 0.000026 | com.redhat |
| 864 | 18250974 | 5600 | 0.000006 | com.gust |
| 865 | 18250088 | 1067 | 0.000029 | com.symantec |
| 866 | 18249466 | 2562 | 0.000013 | net.ucoz |
| 867 | 18249320 | 268 | 0.000095 | com.typeform |
| 868 | 18248694 | 6327 | 0.000006 | com.x10host |
| 869 | 18248332 | 3547 | 0.000010 | uk.co.saveourschools |
| 870 | 18247898 | 2934 | 0.000012 | com.squarespace-cdn |
| 871 | 18247292 | 2977 | 0.000011 | lt.google |
| 872 | 18246872 | 525 | 0.000051 | com.adweek |
8731824684442950.000008com.scienceblogs
8741824647248480.000007de.uni-konstanz
8751824556263620.000006com.ueuo
8761824504838560.000009uk.gov.data
8771824475640050.000009tr.com.hurriyet
8781824365230700.000011ae.google
8791824357018910.000019com.speakerdeck
8801824333050790.000007com.blogsky
8811824313420440.000017tv.ustream
8821824037467110.000006su.moy
883182392987610.000041gov.copyright
8841823909652920.000007ru.novayagazeta
8851823904427890.000012gov.nh
8861823899040570.000009org.hathitrust
8871823894836480.000010org.annualreviews
8881823893211540.000027pl.home
8891823888238150.000009com.businesscatalyst
890182377404720.000058com.ea
8911823772630870.000011uk.gov.hmrc
8921823694039300.000009cc.uxdesign
8931823689460150.000006com.artfire
894182367043660.000072org.opensource
8951823653034670.000010it.beniculturali
8961823613225070.000014gov.mn
8971823607610190.000030com.engadget
8981823590236820.000010ve.co.google
8991823545249730.000007com.teslamotors
9001823403874750.000005com.hangame
901182339664270.000063com.fastcompany
9021823360042630.000008com.hsbc
9031823307424620.000014com.netsolhost
9041823258255560.000006me.google
9051823234456430.000006mu.google
9061823157055290.000006com.yam
9071823124239690.000009tz.co.google
908182309989740.000032com.verisign
9091823091633640.000011tw.com.pchome
9101823066272930.000005com.addr
9111823062826360.000013com.shell
9121823060265990.000006com.dropmark
9131822970856350.000006li.google
9141822911650020.000007com.gab
9151822910644930.000008com.tapatalk
9161822819413250.000023edu.ucla
9171822795835570.000010uk.co.newmedianow
9181822793849880.000007edu.whoi
9191822781037380.000009ng.com.google
9201822763054440.000007ni.com.google
9211822607641100.000009uk.co.sainsburys
9221822545844120.000008com.iconarchive
9231822508053800.000007gr.ntua
9241822492461520.000006com.epochtimes
9251822471651980.000007org.birdlife
9261822461035320.000010uk.co.intersol
9271822417856150.000006id.co.kaskus
928182237629500.000034com.zoho
9291822316654030.000007cr.co.google
9301822304656950.000006sv.com.google
9311822288240740.000009vn.zing
9321822271445370.000008uk.co.zoopla
9331822248040390.000009uk.ac.jisc
9341822103438360.000009com.prweek
9351822042230980.000011int.wmo
9361822041054660.000006mz.co.google
9371822020249660.000007edu.umb
9381822019612900.000024uk.co.freeukbusinessdirectory
9391822006814760.000021org.owasp
9401821972666690.000006net.comunidades
9411821897641410.000009com.scotusblog
9421821884056360.000006com.cyberlink
9431821873838280.000009do.com.google
9441821867229660.000011io.termly
9451821826247350.000007com.fatcow
9461821817238510.000009mt.com.google
9471821811035890.000010uk.org.oxonaa
9481821795837740.000009gt.com.google
9491821690837370.000009com.solidworks
9501821678236410.000010uk.co.profilebusiness
9511821627036250.000010uk.co.heatall
9521821603445060.000008com.theringer
9531821538825580.000013nl.jouwweb
954182153208000.000039com.wikihow
9551821506059530.000006com.symbaloo
9561821476851710.000007pl.cba
9571821416257400.000006kg.google
9581821359423210.000015com.freeprivacypolicy
9591821285012220.000026com.att
9601821268052030.000007pl.lublin
9611821267215410.000020edu.umd
9621821217454850.000006uk.org.labour
9631821207442880.000008us.ms.state
9641821182834490.000010com.wantedly
9651821157043960.000008org.ametsoc
9661821154237010.000010uy.com.google
9671821148655530.000006jp.ifdef
9681821143852180.000007es.usal
969182113987690.000041com.netflix
9701821119663290.000006org.cgsociety
9711821085438970.000009hn.google
9721821054456020.000006org.svoboda
9731820782844320.000008org.ascd
9741820778445000.000008uk.co.dailystar
9751820771236510.000010uk.co.articlelistings
976182073705030.000054com.dmca
977182071149160.000035com.ggpht
9781820703251990.000007com.curseforge
9791820643252650.000007org.nsidc
9801820634015200.000021com.technologyreview
9811820590856680.000006ug.co.google
9821820582240300.000009org.lacity
9831820534848430.000007com.cbn
984182047164340.000063com.businesswire
9851820471258600.000006mn.google
9861820439468680.000005kr.ac.postech
9871820433256130.000006it.unige
9881820352633140.000011uk.gov.food
9891820331463530.000006com.skepticalscience
990182030529090.000035org.weforum
9911820243449070.000007com.globalpost
9921820241651720.000007com.weightwatchers
9931820200034030.000010com.lexology
9941820073859440.000006tt.google
9951820021052820.000007com.betfair
9961819996854280.000007py.com.google
9971819892848150.000007com.abcnews
998181986987630.000041com.psychologytoday
9991819851269740.000005org.toile-libre
10001819841432910.000011net.vnexpress
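
The names in the rank rows are in reverse domain name notation, so it can help to turn them back into the usual order when reading the list. The following is a minimal Python sketch, not part of the released tooling. It assumes that a rank row is available as five tab-separated fields (harmonic centrality position and value, PageRank position and value, reversed name); that column layout is an assumption read off the values, and `RankRow`, `unreverse` and `parse_rank_row` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class RankRow(NamedTuple):
    """One rank row; the field meanings are an assumption (see text)."""
    harmonic_pos: int    # position when sorted by harmonic centrality
    harmonic_val: float  # harmonic centrality value
    pr_pos: int          # position when sorted by PageRank
    pr_val: float        # PageRank value
    name: str            # name in the usual (non-reversed) order


def unreverse(name_rev: str) -> str:
    """Turn reverse domain name notation back into the usual order."""
    return ".".join(reversed(name_rev.split(".")))


def parse_rank_row(line: str) -> RankRow:
    """Parse one tab-separated rank row (assumed five-column layout)."""
    h_pos, h_val, pr_pos, pr_val, name_rev = line.rstrip("\n").split("\t")
    return RankRow(int(h_pos), float(h_val), int(pr_pos), float(pr_val),
                   unreverse(name_rev))


row = parse_rank_row("762\t18306908\t292\t0.000088\tcom.surveymonkey")
print(row.name, row.harmonic_pos, row.pr_pos)  # surveymonkey.com 762 292
```

Keeping both positions in one record makes it easy to spot names whose two rankings diverge strongly, such as com.oculus at position 826 by harmonic centrality and 68 by PageRank.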

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data will be useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!