We are pleased to announce a new release of host-level and domain-level web graphs based on the published crawls of May, June and July 2019. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases (e.g., Nov/Dec/Jan 2017-2018 Webgraphs). You may also visit the projects cc-webgraph and cc-pyspark on GitHub, which host all scripts and tools required to construct the graphs.

What’s new?

Links from Content-Location and Link HTTP headers are now also used to build the web graphs. This is in accordance with RFC 5988, which defines the Link HTTP header as semantically equivalent to the <link> element in HTML. It is also consistent with previous web graph releases, which included all kinds of links, including technical ones and redirects.

Host-level graph

The graph consists of 445 million nodes and 3.14 billion edges. It includes dangling nodes, i.e., hosts that have not been crawled but are pointed to by a link on a crawled page. There are 382 million dangling nodes (86%), and the largest strongly connected component contains 48 million nodes (11%).
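To make the notion of a dangling node concrete, here is a minimal sketch on a toy graph. It assumes that a dangling node can be identified as a node without any outgoing edge, which is the graph-side view of "not crawled but linked to". The host names and the helper `dangling_nodes` are hypothetical and not part of the released tools.

```python
def dangling_nodes(nodes, edges):
    """Return the nodes that have no outgoing edge.

    Assumption: an uncrawled host contributes no outlinks, so it shows up
    in the graph as a node that is only ever the target of an edge.
    """
    has_outlink = {src for src, _ in edges}
    return set(nodes) - has_outlink


# Toy graph with hypothetical hosts in reversed notation.
nodes = {"com.example", "org.example", "net.example", "org.example.blog"}
edges = [
    ("com.example", "org.example"),
    ("org.example", "com.example"),
    ("com.example", "net.example"),
    ("org.example", "org.example.blog"),
]

dangling = dangling_nodes(nodes, edges)
share = len(dangling) / len(nodes)
print(sorted(dangling), f"{share:.0%}")
```

On the real graph the same ratio is 382 million out of 445 million nodes, which rounds to the 86% quoted above.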

You can download the graph and the ranks of all 445 million hosts from AWS S3 under the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2019-may-jun-jul/host/. Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2019-may-jun-jul/host/ as a prefix to access the files from anywhere.
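The two prefixes differ only in their leading part, so an S3 path can be turned into the matching HTTPS URL by swapping it. The sketch below illustrates this; the helper name `s3_to_https` is hypothetical, and the mapping is inferred from the two prefixes quoted above rather than from any official tool.

```python
S3_PREFIX = "s3://commoncrawl/"
HTTPS_PREFIX = "https://data.commoncrawl.org/"


def s3_to_https(s3_path):
    """Map an s3://commoncrawl/... path to its HTTPS counterpart."""
    if not s3_path.startswith(S3_PREFIX):
        raise ValueError(f"not a commoncrawl S3 path: {s3_path}")
    return HTTPS_PREFIX + s3_path[len(S3_PREFIX):]


host_dir = (
    "s3://commoncrawl/projects/hyperlinkgraph/"
    "cc-main-2019-may-jun-jul/host/"
)
stats_url = s3_to_https(host_dir + "cc-main-2019-may-jun-jul-host.stats")
print(stats_url)
```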

Download files of the Common Crawl May/June/July 2019 host-level webgraph

Size | File | Description
3.02 GB | cc-main-2019-may-jun-jul-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 28 vertices files
14.75 GB | cc-main-2019-may-jun-jul-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 56 edges files
6.42 GB | cc-main-2019-may-jun-jul-host.graph | graph in BVGraph format
2 kB | cc-main-2019-may-jun-jul-host.properties |
7.01 GB | cc-main-2019-may-jun-jul-host-t.graph | transpose of the graph (outlinks inverted to inlinks)
2 kB | cc-main-2019-may-jun-jul-host-t.properties |
1 kB | cc-main-2019-may-jun-jul-host.stats | WebGraph statistics
7.22 GB | cc-main-2019-may-jun-jul-host-ranks.txt.gz | harmonic centrality and pagerank

Note that the host names are reversed and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
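The reversal rule can be sketched in a few lines. The function names below are hypothetical, and the sketch covers only what is stated above (strip one leading `www.`, then reverse the dot-separated labels); any further normalization done by the actual pipeline is not reproduced here.

```python
def reverse_host(host):
    """Strip a leading 'www.' and reverse the dot-separated labels."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def unreverse_host(rev_host):
    """Turn a reversed host name back into the usual notation."""
    return ".".join(reversed(rev_host.split(".")))


print(reverse_host("www.subdomain.example.com"))
print(unreverse_host("com.example.subdomain"))
```

Note that the stripped `www.` cannot be recovered when reversing back.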

Domain-level graph

The domain graph was built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org.
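As a rough illustration of this aggregation step, the sketch below maps host names to pay-level domains and collapses host-level edges accordingly. Everything in it is an assumption made for illustration: `PUBLIC_SUFFIXES` is a tiny stand-in for the real public suffix list (which also has wildcard and exception rules that are ignored here), the function names are hypothetical, and dropping links within the same domain is a choice of this sketch, not a documented property of the release.

```python
# Tiny stand-in for the public suffix list (assumption, far from complete).
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}


def pay_level_domain(host):
    """Return the public suffix plus one more label, or None if not found."""
    labels = host.split(".")
    # Scanning from the left finds the longest matching suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None


def aggregate_edges(host_edges):
    """Collapse host-level edges into a set of domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        s, d = pay_level_domain(src), pay_level_domain(dst)
        # Assumption of this sketch: skip unknown suffixes and self-links.
        if s and d and s != d:
            domain_edges.add((s, d))
    return domain_edges


host_edges = [
    ("blog.example.com", "news.bbc.co.uk"),
    ("www.example.com", "bbc.co.uk"),
    ("blog.example.com", "www.example.com"),
    ("example.org", "shop.example.com"),
]
domain_edges = aggregate_edges(host_edges)
print(sorted(domain_edges))
```

The first two host-level edges collapse into a single domain-level edge, which is why the domain graph has far fewer nodes and edges than the host graph.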

The domain-level graph has 88 million nodes and 1.9 billion edges. 46 million nodes (52%) are dangling nodes, and the largest strongly connected component covers 35 million nodes (40%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2019-may-jun-jul/domain/ or via https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2019-may-jun-jul/domain/.

Download files of the Common Crawl May/June/July 2019 domain-level webgraph

Size | File | Description
0.61 GB | cc-main-2019-may-jun-jul-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩
7.50 GB | cc-main-2019-may-jun-jul-domain-edges.txt.gz | edges ⟨from_id, to_id⟩
4.06 GB | cc-main-2019-may-jun-jul-domain.graph | graph in BVGraph format
2 kB | cc-main-2019-may-jun-jul-domain.properties |
3.99 GB | cc-main-2019-may-jun-jul-domain-t.graph | transpose of the graph
2 kB | cc-main-2019-may-jun-jul-domain-t.properties |
1 kB | cc-main-2019-may-jun-jul-domain.stats | WebGraph statistics
1.91 GB | cc-main-2019-may-jun-jul-domain-ranks.txt.gz | harmonic centrality and pagerank

Below you’ll find the top 1000 domains ranked by harmonic centrality, together with their PageRank. The full list of all 88 million domain ranks is available for download.

Top 1000 domains ranked by harmonic centrality (May/June/July 2019)

harmonic centrality rank | hc value | page rank | page rank value | reversed hostname
12997766810.020841com.googleapis
22786770430.011812com.facebook
32741998020.012857com.google
42519603040.007273com.twitter
52455883650.006439org.w
62453370260.005984com.youtube
72259209890.003799com.instagram
82206065070.004857org.gmpg
921829028130.002863com.linkedin
102159544680.004481com.googletagmanager
1120930920220.001704com.gravatar
1220912076240.001531com.pinterest
1320730700110.003384com.cloudflare
1420698732170.002180com.wordpress
1520613210120.003087org.wordpress
1620607942260.001241org.wikipedia
1720408594140.002452com.bootstrapcdn
1820351540200.001823com.apple
1920148418410.000904com.blogspot
2020103846300.001124com.vimeo
2120036764210.001719com.jquery
2219874716500.000673com.wp
2319870332290.001130com.microsoft
2419839912430.000816gl.goo
2519828406450.000769com.amazon
2619793040180.002021com.gstatic
2719790998190.002015com.adobe
2819788744570.000573com.tumblr
2919754126310.001104com.amazonaws
3019619798250.001407com.macromedia
3119616602340.001057com.googlesyndication
3219585788470.000744be.youtu
3319585670390.000937com.google-analytics
3419583342620.000531ly.bit
3519572994680.000440com.yahoo
3619549710330.001080com.flickr
3719526876350.001023net.cloudfront
3819526762230.001676com.github
3919503814600.000553me.wp
4019467672270.001170ru.yandex
4119467424580.000568org.mozilla
42194548901060.000305com.googleusercontent
4319411724490.000725net.doubleclick
4419374766520.000658co.t
4519366860440.000776com.baidu
4619322188700.000401com.weebly
47193217541050.000310com.reddit
48193170941230.000234com.nytimes
4919313908460.000749com.paypal
50193080941040.000312com.soundcloud
5119278436670.000448com.medium
5219268558660.000451io.github
5319266970630.000517org.w3
5419255616800.000379org.creativecommons
55192280341840.000143uk.co.bbc
56192194701750.000151com.imgur
57191914841370.000194com.forbes
58191843201680.000154net.slideshare
5919169524560.000588org.schema
60191662141620.000162com.bing
61191633881800.000144net.sourceforge
62191558821820.000143org.wikimedia
6319145618480.000738com.googleadservices
64191438402150.000109com.businessinsider
65191360402330.000104com.techcrunch
66191251982730.000089com.reuters
67191127301520.000169com.theguardian
68190917081770.000147com.imdb
6919081148640.000496net.jsdelivr
70190766421450.000177org.apache
71190689122020.000120org.gnu
72190677202500.000097com.ibm
73190659042740.000089com.cnet
74190604021940.000124com.washingtonpost
75190561621640.000159com.blogger
76190496223360.000073gov.nasa
77190434062710.000090com.android
7819038878320.001080com.fontawesome
79190308241960.000123com.huffingtonpost
80190227642430.000100com.oracle
8119022114990.000323com.shopify
82190100921780.000147com.stackoverflow
83190088662640.000092com.bbc
84189915041380.000194com.wixsite
85189797941930.000128org.ampproject
86189796063310.000074com.latimes
87189669243340.000073com.livejournal
88189543521480.000171com.eventbrite
89189529144060.000061com.zdnet
9018951470380.000950com.addthis
91189411682600.000093com.usatoday
92189303062610.000093com.wired
93189299484730.000052com.economist
94189248941220.000237com.ytimg
95189158202950.000083com.prnewswire
96189077841070.000304com.whatsapp
97189055622410.000101com.appspot
98189037502890.000086org.npr
99188998266050.000046com.thenextweb
100188987321390.000192com.issuu
101188971301980.000122org.ietf
102188931881810.000143jp.co.yahoo
103188890961420.000183com.spotify
104188887604490.000055com.venturebeat
10518888186550.000590eu.europa
106188862403820.000064com.goodreads
10718880882370.000994com.qq
108188808806010.000046org.ieee
109188769882090.000114com.bandcamp
110188744483590.000068com.quora
111188726664260.000058com.cisco
112188696402110.000112net.behance
113188665604740.000052org.arxiv
114188520803940.000062com.buzzfeed
11518844806950.000330com.sharethis
116188345024270.000058com.deviantart
117188341668990.000031com.ibtimes
118188297621850.000141com.giphy
11918828960960.000328com.statcounter
120188250746490.000043com.stackexchange
121188236241700.000152uk.co.google
122188188482830.000087com.cnbc
123188173848250.000034org.eclipse
124188145663330.000074com.aol
125188143924850.000051com.pixabay
126188069442060.000117com.disqus
127188009124580.000054com.about
12818793968420.000849com.squarespace
129187935725220.000048com.mysql
130187927401440.000180com.yelp
131187907943550.000068com.theatlantic
132187874244170.000059me.about
133187870063170.000077com.skype
134187826364760.000052com.visualstudio
135187805382320.000104me.t
136187726669480.000030com.nvidia
137187725604680.000053com.wikihow
138187683582760.000089com.sciencedirect
139187678222240.000106com.dribbble
140187622663240.000075com.scribd
141187592367120.000039google.blog
142187568861830.000143com.salesforce
143187562365510.000048com.slate
144187539681310.000208com.dropbox
145187516964070.000061uk.co.independent
146187512422990.000081com.fastcompany
147187465902570.000094com.googlecode
148187461422130.000111com.hubspot
149187444704400.000057com.newyorker
150187444524300.000058com.box
151187433321200.000249org.networkadvertising
152187369566670.000042org.chromium
153187359184630.000053gov.loc
154187341902970.000082com.example
155187337622000.000121com.cnn
156187314286710.000041com.tinypic
157187281602690.000090com.fc2
158187261047900.000035com.nymag
159187231847070.000039com.smashingmagazine
160187197046160.000045com.evernote
161187185762720.000090com.nbcnews
162187163965480.000048net.azurewebsites
163187106062190.000108com.npmjs
164187097701550.000167org.archive
165187087683060.000079com.w3schools
1661870509010240.000028ca.utoronto
167187038801910.000130jp.ne.hatena
168186999744770.000052io.codepen
16918699212610.000544com.vk
170186990169690.000029com.ign
171186946927030.000039com.speakerdeck
172186942568530.000033com.mediafire
173186916285060.000049com.foursquare
174186861028940.000031com.nike
175186841366080.000046com.trello
176186793021190.000251info.aboutads
177186761683760.000066com.mozilla
17818670790530.000604com.wix
179186697806390.000044uk.ac.ox
180186642641460.000174com.amazon-adsystem
181186611621030.000317com.paypalobjects
18218658320840.000366com.bizjournals
183186534383420.000072com.getpocket
184186390783160.000077ca.google
185186363244980.000050com.indiatimes
186186283165960.000047com.pinimg
187186261626240.000045com.cbslocal
188186242783110.000078edu.mit
189186238789420.000030com.chron
190186224201140.000272net.windows
1911861878611580.000025org.tensorflow
192186183267260.000038ca.blogspot
193186176028420.000033com.sap
194186156788410.000033com.css-tricks
195186121443600.000068com.entrepreneur
196186060506230.000045com.libsyn
197186033401340.000205com.unpkg
198186023021170.000253com.stripe
199186003523080.000079edu.harvard
200185974642260.000106com.wsj
2011859521410700.000026com.hackernoon
202185941748360.000033com.thehill
20318592786590.000557com.fb
204185905106250.000045ca.cbc
205185901729120.000031org.unicode
206185866107920.000035com.buffer
207185858803690.000067com.elsevier
208185811267940.000035com.theglobeandmail
20918580570150.002238com.wixstatic
210185793263630.000068me.telegram
211185788586620.000042com.searchengineland
212185764021790.000147org.bbb
213185749006560.000043site.business
214185744764810.000051com.withgoogle
215185743462520.000097es.google
216185726168740.000032org.kernel
217185724986440.000044com.flipboard
218185711067250.000038co.ibb
219185650146580.000042com.huffpost
2201856358810050.000028edu.rutgers
221185627888480.000033uk.co.wired
222185607447590.000036com.ssrn
223185606061130.000272com.weibo
2241855764610390.000027com.aljazeera
225185558607360.000037gov.archives
226185543383460.000071com.mapbox
227185540086370.000044org.d3js
228185532781510.000170com.yimg
2291855109810930.000026org.hrw
230185491046030.000046gg.discord
2311854683414680.000020com.hm
2321854670811460.000025ly.visual
233185459689850.000029com.geekwire
234185454642010.000120com.optimizely
2351854475412510.000023ca.huffingtonpost
236185441142120.000111edu.stanford
237185428609430.000030uk.co.huffingtonpost
238185423346180.000045co.elastic
2391853924218770.000017com.pearltrees
2401853588018290.000017cn.people
2411853069014020.000021com.diigo
242185288042960.000082com.tinyurl
243185280344550.000054com.mapquest
244185255509790.000029org.slashdot
2451852427611060.000025edu.osu
24618523478650.000473net.akamaihd
247185223168820.000032com.theconversation
248185174362780.000089org.purl
249185173623750.000066com.mashable
2501851427210970.000026com.dw
251185137709340.000030com.bt
252185115128440.000033com.today
253185114908770.000032com.marketwired
2541851032412670.000023jp.co.ntv
2551850984210990.000026com.mentalfloss
256185095889860.000029com.computerworld
2571850527613760.000021jp.ac.u-tokyo
258185052248400.000033co.g
259185030728470.000033com.healthline
260185027047820.000035com.ecwid
2611850161014280.000021com.sas
262185015264650.000053com.yoast
2631849900611010.000025edu.gatech
264184900784540.000054com.moz
2651848797616360.000019com.kaggle
2661848777213710.000021com.makeuseof
267184874005010.000050me.m
268184871642800.000088com.bloomberg
269184871547710.000036com.econsultancy
270184868388800.000032uk.parliament
271184837708200.000034com.newsweek
2721848373013000.000022com.googlesource
2731848064014260.000021blog.home
274184802547610.000036com.outbrain
2751847830610770.000026com.sfchronicle
276184755702040.000119org.iana
277184736423130.000077com.scorecardresearch
278184714661540.000169gov.nih
2791846176212790.000022com.avg
280184598224600.000054com.theverge
281184577326360.000044jp.shinobi
2821845503610090.000028org.postgresql
2831845205614150.000021com.dailydot
284184495729400.000030com.foxbusiness
285184483308140.000034com.adjust
286184470547640.000036edu.brookings
287184460747670.000036com.business2community
2881843792818610.000017com.uniqlo
2891843586614460.000020com.dezeen
290184333103470.000071com.trustpilot
291184327228050.000035com.contentmarketinginstitute
2921843196810360.000027com.trendmicro
293184312229090.000031org.aarp
294184309228500.000033com.searchenginewatch
295184280423090.000078org.python
296184279841600.000163com.twimg
297184274545080.000049edu.berkeley
298184252007680.000036uk.co.pinterest
299184233964460.000055com.bigcommerce
3001841826613580.000021edu.iastate
3011841717213070.000022com.motherjones
302184169307150.000039com.techtarget
303184140682810.000087com.myspace
3041841352211540.000025com.hostgator
3051841131410460.000027com.medicalnewstoday
3061841069610250.000028com.bustle
30718410084690.000420com.list-manage
308184095623230.000076uk.co.telegraph
309184093623300.000074com.meetup
3101840869011680.000024org.openoffice
3111840601212960.000022com.contently
312184035327200.000038com.cdbaby
313184020045140.000049com.adage
3141840140413370.000022org.wnyc
315184002867140.000039com.neilpatel
3161839861015270.000020com.mathworks
317183975184780.000052net.researchgate
318183945989380.000030co.apple
319183945863290.000074com.go
32018393232940.000347com.godaddy
321183920264110.000060com.msn
322183916744040.000061com.ted
3231839014814840.000020io.material
324183899948170.000034com.arstechnica
325183895088600.000033com.wikia
326183881889700.000029com.vogue
327183849123770.000066me.wa
3281838202014750.000020se.blogspot
329183806707890.000035edu.washington
330183803621650.000158com.opera
331183771062480.000098com.rawgit
332183769388190.000034com.bandsintown
3331837455810660.000026com.convinceandconvert
3341837415412380.000023com.convertkit
3351837385018710.000017io.soup
3361837031014380.000020com.secondlife
3371836615217210.000018com.zara
338183644382870.000086com.live
339183626282380.000102com.surveymonkey
340183586541880.000132com.etsy
341183563881690.000153com.feedburner
3421835609419440.000016edu.uark
3431835504819110.000017com.mysanantonio
344183547262660.000092uk.org.ico
345183526304290.000058org.hbr
346183524066020.000046com.livechatinc
3471835205814930.000020com.thenation
348183515867500.000037com.yellowpages
349183499221120.000281com.mailchimp
350183491268150.000034com.wordstream
3511834906215060.000020com.toptal
3521834703612160.000024io.itch
353183426264940.000050com.kickstarter
354183415722350.000104com.typepad
355183406084200.000059com.googleblog
356183380463660.000068com.aliyuncs
3571833776016700.000018com.manta
3581833760014630.000020com.amcharts
3591833665214190.000021com.indiewire
360183353784560.000054com.fortune
36118333310510.000663net.fbcdn
362183332862310.000105uk.co.amazon
3631833109215870.000019ly.adobe
364183297609220.000030com.searchenginejournal
3651832860215690.000019ms.nyti
366183257443740.000066com.ft
3671832351620040.000016com.zoominfo
3681832344211790.000024com.grammarly
3691832178016200.000019li.paper
3701832175012100.000024com.csmonitor
3711832151221480.000015com.brandyourself
3721832071620760.000015me.websta
373183108423400.000072com.getclicky
374183104009110.000031uk.gov.nationalarchives
375183068468630.000033com.engadget
376183040661590.000165com.zendesk
377183009809620.000029com.cio
37818300896870.000360de.google
3791830085216250.000019id.co.blogspot
3801830075217350.000018org.unfpa
381182994786860.000040com.intel
382182977164310.000058com.nationalgeographic
3831829768419420.000016com.cinemablend
3841829549219390.000016com.wral
385182953726630.000042com.vice
386182946684430.000056com.oreilly
3871829455410200.000028com.weddingwire
388182930344610.000053com.nature
3891829299814400.000020com.harpercollins
390182915702900.000085gov.cdc
391182906583640.000068com.githubusercontent
392182903685200.000048com.photobucket
393182903649260.000030com.socialmediaexaminer
394182900209980.000028com.firebaseapp
395182891508750.000032com.angieslist
396182888429010.000031com.sendpulse
397182886288220.000034edu.columbia
398182877508230.000034com.pexels
3991828660015410.000019com.mindbodygreen
4001827945215160.000020com.mailjet
401182783561490.000171com.tripadvisor
402182781683190.000077com.wiley
4031827693018500.000017com.merchantcircle
404182764542680.000090com.digg
4051827608818900.000017fr.huffingtonpost
4061827574616950.000018com.thoughtworks
4071827376010140.000028org.ocks
4081827322020620.000015jp.pinterest
409182727684840.000051com.cbsnews
410182718783520.000069int.who
411182705288160.000034com.format
412182701082550.000096net.php
4131826992414640.000020com.thecut
4141826865820550.000015org.spie
415182645542140.000110org.aboutcookies
4161826330012310.000023com.mynewsdesk
417182617324090.000060com.office
4181826162410710.000026com.fastcodesign
4191826085614520.000020fr.liberation
420182607743350.000073com.time
421182603664440.000056org.freecodecamp
4221826002016060.000019com.dummies
4231825940017780.000018com.instapaper
424182589307550.000036com.mediapost
425182558426300.000044com.proofpoint
4261825411818780.000017it.binged
4271825408613210.000022ly.snip
428182528584160.000059uk.co.dailymail
429182492606040.000046org.nodejs
430182485903920.000062fr.free
431182484924640.000053com.statista
432182473568790.000032com.gizmodo
433182466463150.000077com.st-hatena
4341824538816600.000018com.superpages
4351824407811200.000025com.theknot
436182436783570.000068com.unsplash
4371824149413970.000021com.jeffbullas
4381823620815220.000020com.biography
4391823594621460.000015de.huffingtonpost
4401823482014320.000021com.csoonline
4411823472614860.000020com.louisvuitton
442182335121210.000246com.jimdo
4431823292010400.000027uk.ac.cam
4441823234813380.000022google.ai
4451823158621900.000014com.mango
4461823090212270.000023com.activecampaign
447182263369640.000029com.netlify
448182261729530.000030com.eater
4491822398410040.000028com.smallbiztrends
4501822356421050.000015site.negocio
451182231002770.000089com.ebay
4521822177813010.000022ca.yellowpages
453182204226890.000040com.windowsphone
454182203667750.000035com.marketwatch
4551821971411470.000025com.redhat
4561821797221700.000015edu.scad
4571821766012900.000022com.digitaltrends
4581821731811230.000025org.mathjax
4591821667016580.000019com.politifact
4601821554622250.000014com.dexknows
461182147904900.000050gov.whitehouse
4621821004412250.000023com.quicksprout
463182074941760.000150com.slack
4641820520816550.000019uk.co.bbci
4651820319413360.000022com.cmswire
46618202308790.000382net.jsfiddle
4671819967416830.000018com.nyt
4681819849019280.000016com.itsnicethat
469181974928350.000034edu.psu
470181968563540.000068com.booking
471181967966880.000040com.webs
472181958529600.000030edu.ucla
473181913647010.000039gov.nist
474181911389450.000030com.sprinklr
475181911023070.000079gov.ca
47618188332760.000389com.livestream
4771818690813750.000021net.openid
4781818675011310.000025gov.fbi
479181858344750.000052tv.twitch
4801818349819820.000016google.design
4811817695017900.000017com.psmag
482181757887740.000036com.oath
4831817381614980.000020org.gnupg
484181721443510.000069com.hp
485181716302910.000085org.acm
4861816729624880.000013org.travelblog
4871816703222430.000014com.ingress
4881816557812640.000023com.coschedule
4891816476617460.000018com.financialexpress
4901816464818680.000017com.allafrica
4911816436011100.000025edu.princeton
4921816367223050.000014com.tommy
4931816337616290.000019org.whatbrowser
4941816234412990.000022com.kinsta
4951816131224410.000014com.algorithmia
496181611405300.000048net.brightcove
4971815843220240.000016jp.riken
498181576668310.000034com.msdn
499181576405110.000049edu.cornell
5001815656822790.000014com.theminimalists
501181539402360.000103to.amzn
5021815384014600.000020net.noscript
503181470563050.000079com.typeform
5041814683817270.000018com.iconarchive
505181453609280.000030org.weforum
506181446228380.000033com.git-scm
5071814354022010.000014net.organicfacts
5081814024217240.000018com.gap
509181389007090.000039org.bitbucket
510181366304030.000061com.dailymotion
511181346483930.000062com.nypost
5121813442422440.000014com.bonfire
5131813383220950.000015it.polito
514181325729030.000031com.sfgate
515181305442390.000101com.stumbleupon
5161813038622730.000014net.brownbook
5171812959420730.000015com.zynga
518181272965230.000048edu.yale
5191812613416560.000019com.wayfair
520181259982540.000096org.drupal
521181259263810.000065org.un
5221812381223000.000014com.23hq
523181228366140.000045gov.sec
524181201564150.000059com.gmail
5251811968411960.000024com.playstation
5261811823417170.000018org.polymer-project
5271811332817720.000018za.co.iol
5281811254819940.000016au.com.huffingtonpost
5291811092223440.000014com.marksdailyapple
5301811089415070.000020com.impactbnd
531181097406480.000043com.jwplatform
5321810923615130.000020com.instapage
5331810729213740.000021com.ning
5341810699020350.000015com.dreamgrow
535181061222920.000085cn.com.sina
5361810501021880.000014net.openreview
5371810480616040.000019com.aolcdn
538181040042100.000113com.constantcontact
5391810387418890.000017uk.ac.jisc
5401810365619930.000016com.towardsdatascience
5411810183016690.000018com.thermofisher
5421810027019240.000016com.city-data
543180999009410.000030uk.co.guardian
5441809982621360.000015com.whitepages
5451809950018910.000017com.deepmind
546180984966110.000046com.mobirise
547180974403560.000068com.springer
5481809627819290.000016org.elasticsearch
549180943907430.000037com.steampowered
5501809204810910.000026com.auth0
551180920081920.000128com.eepurl
5521809169412140.000024kr.or.kisa
553180908008320.000034gov.senate
55418090404710.000398me.fb
5551809018839300.000010com.artstation
556180901106840.000040org.eff
5571808850620750.000015com.quickanddirtytips
5581808822018560.000017com.googledrive
5591808789022670.000014lb.com.dailystar
5601808737010830.000026de.spiegel
5611808718421890.000014com.oilprice
5621808659813770.000021io.bower
5631808658619970.000016com.batchgeo
5641808636010600.000027com.clicky
5651808599011720.000024com.merriam-webster
5661808474619270.000016com.nytco
567180842722840.000087com.histats
5681808385616130.000019org.jenkins-ci
5691808358019660.000016com.underconsideration
5701808309022210.000014com.swatch
571180818686170.000045uk.co.blogspot
572180789363430.000071com.sxsw
573180785746190.000045com.patreon
5741807738814710.000020io.getmdl
5751807650610810.000026com.hollywoodreporter
576180753946100.000046com.163
577180753481560.000166ru.mail
5781807494018450.000017com.rabbitmq
5791807463617830.000017com.lexology
5801807455016650.000018com.invisionapp
5811807427219870.000016com.lightreading
5821807390613510.000021edu.northwestern
583180735569960.000028com.ubuntu
5841807345421110.000015edu.dukeupress
5851807185221730.000015org.onegreenplanet
5861807148021410.000015com.hotfrog
5871807059223610.000014edu.uah
5881806887213690.000021org.khanacademy
5891806843814610.000020uk.co.thesun
5901806690434870.000012com.wikidot
5911806637016140.000019com.digitaloceanspaces
5921806560224320.000014net.sott
5931806550015470.000019com.technologyreview
594180652823490.000070com.staticflickr
59518063590780.000383org.reactjs
596180619546600.000042com.xinhuanet
5971806177625550.000013com.idt
598180616782470.000098de.amazon
599180612687390.000037com.qz
6001805793619310.000016com.googleapps
6011805788417530.000018io.pantheon
6021805776821320.000015net.eenews
603180571167790.000035com.deloitte
6041805703816510.000019com.checkatrade
605180546226570.000043com.psychologytoday
606180545969000.000031gov.nps
6071805107819730.000016com.shoutmeloud
6081804917427610.000013ca.411
6091804862014960.000020com.citysearch
6101804818417480.000018com.tutsplus
6111804499021260.000015io.flutter
6121804403620640.000015com.vanguardngr
6131804324214730.000020edu.unc
6141804317419340.000016com.gimletmedia
6151804262616270.000019com.fifa
6161804143624630.000013org.simile-widgets
617180412309320.000030edu.upenn
6181804093020980.000015com.designobserver
619180408346610.000042org.pbs
6201804055224100.000014com.ubu
6211804042211180.000025net.recode
6221803956612080.000024jobs.amazon
623180384463450.000071com.tripod
6241803656213150.000022edu.purdue
625180351969210.000030com.variety
626180345029800.000029com.alexa
6271803415012110.000024us.imageshack
6281803317421980.000014edu.arizona
6291803286220080.000016in.huffingtonpost
6301803073415920.000019com.yell
631180302789740.000029org.sciencemag
6321802972813200.000022uk.co.theregister
6331802824616790.000018com.verywellmind
634180259548520.000033org.worldbank
635180256388650.000033io.readthedocs
636180251041300.000208com.youku
6371802471421780.000015com.epochtimes
6381802430621860.000015info.bem
639180233982210.000107com.taobao
6401802228811440.000025com.elpais
6411802180019630.000016org.dartlang
6421802156610880.000026org.altervista
643180212943580.000068org.debian
644180209924450.000056com.force
6451802094012750.000023com.ifttt
6461802046622090.000014com.youm7
6471801964010730.000026com.vox
6481801956819330.000016com.hulu
6491801903222560.000014au.com.yellowpages
6501801898025050.000013com.pushwoosh
6511801661211770.000024com.nydailynews
652180161306980.000039gov.noaa
6531801460016570.000019com.yext
654180140229580.000030com.shutterstock
6551801362823200.000014com.gifyu
6561801332012620.000023com.storify
657180132566760.000041com.samsung
6581801294410950.000026edu.ucsd
659180119784220.000058edu.nyu
660180097366960.000040com.tandfonline
661180095824470.000055com.atlassian
662180092468960.000031com.geocities
663180088124390.000057edu.cmu
6641800874624330.000014com.yelloyello
665180086027800.000035com.netflix
6661800744012910.000022tv.ustream
667180071046200.000045us.icio
6681800681211380.000025edu.utexas
669180059244480.000055com.gitlab
6701800579020930.000015com.targetmarketingmag
6711800430621660.000015com.cargurus
672180042068860.000032com.docker
6731800293211910.000024com.trustedshops
6741800221824790.000013com.analyticsvidhya
6751800143424450.000013com.2findlocal
676179985208450.000033com.foxnews
6771799714620800.000015jp.huffingtonpost
6781799573626370.000013com.instructables
6791799523819450.000016com.nokia
6801799510011970.000024edu.academia
681179926647560.000036com.gettyimages
682179912302450.000099com.wpengine
6831799108422120.000014ca.uwaterloo
6841798868625470.000013com.cmgdigital
685179870908660.000033edu.umich
686179869746930.000040com.symantec
687179866348100.000034net.2mdn
6881798662621290.000015com.mondaq
689179861649520.000030com.ycombinator
6901798579422060.000014com.keepersecurity
691179850963880.000063com.newrelic
6921798474620540.000015com.doctoroz
693179845349080.000031com.uservoice
694179838622070.000115com.naver
6951798268415570.000019com.pastebin
696179804161890.000132com.xing
6971797873618570.000017com.duckduckgo
698179783129560.000030com.thinkwithgoogle
6991797812819070.000017se.haxx
7001797698420070.000016com.thecvf
7011797592622550.000014au.com.truelocal
7021797464035340.000012com.9to5mac
7031797453421300.000015uk.co.yelp
7041797416214440.000020fm.last
705179740868910.000032com.dropboxusercontent
7061797335415540.000019com.sankei
7071797292422050.000014com.tiddlywiki
7081797185823150.000014com.galvanize
7091797124021490.000015es.huffingtonpost
710179711082490.000098com.automattic
711179697289200.000031com.investopedia
7121796799422350.000014com.bizcommunity
7131796745811560.000025org.cambridge
7141796729612200.000023com.freeprivacypolicy
715179672869170.000031org.change
7161796654221450.000015com.winemag
7171796632424440.000014com.maritime-executive
7181796542410520.000027gov.uspto
7191796446425560.000013com.alternion
7201796335818340.000017com.autodesk
7211796312424110.000014com.communitywalk
7221796272618390.000017org.coursera
7231796220212550.000023com.upwork
7241796068223410.000014net.futurecdn
7251795997420890.000015com.kudzu
7261795985823520.000014com.ericsson
7271795832018320.000017com.adespresso
7281795692225270.000013edu.alamo
7291795678412600.000023com.irishtimes
7301795677823420.000014com.filedn
7311795666013530.000021edu.usc
7321795615810410.000027com.wunderground
733179557228640.000033br.com.uol
734179557186970.000039com.gartner
7351795538422540.000014com.gamespot
7361795525420740.000015com.btplc
7371795443220580.000015com.showmelocal
7381795425623860.000014com.massimodutti
7391795377420200.000016edu.virginia
7401795360017310.000018com.ikea
7411795347622600.000014com.insiderpages
7421795341612740.000023com.indiegogo
7431795267620300.000016com.goinswriter
7441794969424400.000014com.bershka
7451794935021840.000015com.almanac
746179491607700.000036gov.census
7471794688012330.000023com.intuit
748179459144130.000060com.inc
749 17944532 4347 0.000009 com.programmableweb
750 17943268 1132 0.000025 com.pcmag
751 17942666 2194 0.000014 com.writersdigest
752 17942134 2283 0.000014 com.citysquares
753 17941646 1535 0.000020 com.fiverr
754 17941316 1872 0.000017 com.csswizardry
755 17941296 1373 0.000021 com.vanityfair
756 17941172 1903 0.000017 jp.sankeibiz
757 17940796 2456 0.000013 com.live5news
758 17939234 1117 0.000025 gov.usgs
759 17938916 914 0.000031 com.zoho
760 17938282 3400 0.000012 com.freep
761 17937408 830 0.000034 com.blackberry
762 17937256 2163 0.000015 jp.booklog
763 17936844 2351 0.000014 com.thedrinksbusiness
764 17935426 1057 0.000027 com.politico
765 17935388 2197 0.000014 com.winefolly
766 17934768 687 0.000040 com.alibaba
767 17934366 2970 0.000013 com.jeeran
768 17934080 2494 0.000013 io.stackedit
769 17933862 1958 0.000016 ca.ubc
770 17933764 161 0.000163 me.line
771 17933538 3940 0.000010 org.greenpeace
772 17933062 2381 0.000014 com.yellowbook
773 17932336 2459 0.000013 za.co.bdlive
774 17932212 2396 0.000014 com.asianage
775 17932090 1631 0.000019 com.udemy
776 17932058 3415 0.000012 com.glamour
777 17931830 1635 0.000019 com.chrome
778 17931642 1194 0.000024 com.techrepublic
779 17931614 849 0.000033 com.unity3d
780 17931590 2033 0.000015 mp.j
781 17931536 598 0.000047 gov.usda
782 17931004 2365 0.000014 net.islamweb
783 17929334 808 0.000034 int.wipo
784 17928466 2355 0.000014 com.wsoctv
785 17927592 665 0.000042 com.marketo
786 17927048 1049 0.000027 edu.umn
787 17926932 414 0.000060 mp.mailchi
788 17926354 968 0.000029 com.aliexpress
789 17925900 2608 0.000013 org.torproject
790 17925360 2322 0.000014 com.utah
791 17925228 954 0.000030 com.sciencedaily
792 17924502 1932 0.000016 org.ap
793 17924098 724 0.000038 gov.house
794 17923976 2208 0.000014 com.chamberofcommerce
795 17923594 1953 0.000016 com.urbandictionary
796 17923558 2570 0.000013 com.spoke
797 17922794 2807 0.000013 com.salespider
798 17921176 2369 0.000014 com.ibmbigdatahub
799 17921126 981 0.000029 au.net.abc
800 17921074 1565 0.000019 com.problogger
801 17921008 533 0.000048 com.snapchat
802 17920922 1425 0.000021 fr.lemonde
803 17919346 141 0.000185 jp.co.google
804 17917592 5539 0.000006 cc.co
805 17917020 2017 0.000016 com.posterous
806 17916882 1129 0.000025 com.canva
807 17916104 1638 0.000019 com.britannica
808 17915782 2379 0.000014 com.wpxi
809 17915468 2040 0.000015 edu.cuny
810 17915450 2519 0.000013 com.americantowns
811 17914364 675 0.000041 gov.hhs
812 17913996 2287 0.000014 org.themoth
813 17913328 1420 0.000021 com.rollingstone
814 17913022 1245 0.000023 com.xkcd
815 17912984 3582 0.000011 edu.brown
816 17912880 634 0.000044 com.feedly
817 17912458 2447 0.000013 com.hdnux
818 17912396 2609 0.000013 com.zionsbank
819 17912316 2515 0.000013 com.pacegallery
820 17911434 2645 0.000013 com.tupalo
821 17911136 640 0.000044 au.com.google
822 17911060 843 0.000033 com.uk
823 17909056 128 0.000215 com.youtube-nocookie
824 17908862 1322 0.000022 com.vmware
825 17908568 2094 0.000015 org.semanticscholar
826 17908342 1970 0.000016 com.sanspo
827 17908248 1013 0.000028 com.java
828 17908178 2239 0.000014 it.scoop
829 17907842 470 0.000053 com.adweek
830 17907550 2301 0.000014 uk.co.dennis
831 17907474 2108 0.000015 jp.co.sankei
832 17906950 2576 0.000013 za.co.sowetanlive
833 17906748 504 0.000049 gov.copyright
834 17906250 362 0.000068 com.wufoo
835 17905562 2310 0.000014 edu.uci
836 17904908 2052 0.000015 jp.ne.iza
837 17904810 2543 0.000013 org.foodrevolution
838 17904456 2491 0.000013 com.thewritepractice
839 17904454 2147 0.000015 com.parksassociates
840 17904410 1111 0.000025 fr.blogspot
841 17904034 2636 0.000013 au.com.whitepages
842 17903878 1329 0.000022 com.billboard
843 17903360 1272 0.000023 com.prezi
844 17902216 2172 0.000015 com.local
845 17901414 320 0.000076 gov.ftc
846 17899970 1831 0.000017 edu.illinois
847 17899586 975 0.000029 com.indeed
848 17899066 811 0.000034 org.unesco
849 17898986 1852 0.000017 com.hatenablog
850 17898184 2497 0.000013 dk.brics
851 17898006 2118 0.000015 uk.ac.ed
852 17897618 1173 0.000024 org.unicef
853 17897232 425 0.000058 com.criteo
854 17896398 2151 0.000015 org.linuxfoundation
855 17896068 2215 0.000014 com.vendio
856 17895718 1981 0.000016 uk.ac.ucl
857 17894996 279 0.000089 com.marriott
858 17894196 6753 0.000005 com.blog
859 17893762 1187 0.000024 com.steamcommunity
860 17893582 834 0.000034 com.gofundme
861 17893354 2022 0.000016 net.privacypolicytemplate
862 17893038 4067 0.000009 com.virustotal
863 17892018 467 0.000053 com.iconfinder
864 17891616 2541 0.000013 com.lacartes
865 17891274 2269 0.000014 ai.fast
866 17891232 1796 0.000017 com.howstuffworks
867 17888984 1222 0.000023 com.dell
868 17888866 2473 0.000013 com.ibegin
869 17888166 1470 0.000020 com.over-blog
870 17887404 350 0.000069 net.themeforest
871 17887376 302 0.000080 com.netdna-ssl
872 17887272 3576 0.000011 edu.tufts
873 17886572 2487 0.000013 za.co.moneyweb
874 17886136 1689 0.000018 com.twilio
875 17886042 887 0.000032 com.hootsuite
876 17884576 1230 0.000023 com.gallup
877 17884332 2394 0.000014 com.machinelearningmastery
878 17882906 2389 0.000014 io.dropwizard
879 17882512 992 0.000028 com.att
880 17881416 2308 0.000014 com.ehow
881 17880788 3660 0.000011 com.discogs
882 17880724 2333 0.000014 com.blogs
883 17880688 2139 0.000015 com.dandb
884 17879696 486 0.000051 com.squareup
885 17879532 1037 0.000027 gov.bls
886 17879478 401 0.000061 com.bitly
887 17878540 3665 0.000011 com.twitpic
888 17878358 3064 0.000013 com.invoicesherpa
889 17877578 650 0.000043 com.herokuapp
890 17877102 2265 0.000014 ru.narod
891 17876570 1875 0.000017 com.tunein
892 17875252 1570 0.000019 com.com
893 17874650 1980 0.000016 jp.co.zakzak
894 17874052 613 0.000046 com.airbnb
895 17873488 2123 0.000015 uk.co.realbusiness
896 17872208 837 0.000033 gov.justice
897 17871838 1951 0.000016 co.gcdn
898 17871618 267 0.000091 com.myshopify
899 17870844 3504 0.000012 de.bild
900 17870478 234 0.000104 jp.co.amazon
901 17869976 1905 0.000017 org.filezilla-project
902 17869722 2574 0.000013 com.growtix
903 17869710 1922 0.000016 com.newsfactor
904 17868626 2775 0.000013 org.earthmagazine
905 17868240 3595 0.000011 cc.tiny
906 17868102 339 0.000072 org.opensource
907 17867628 1710 0.000018 org.owasp
908 17867434 1678 0.000018 org.cancer
909 17865220 370 0.000067 org.doi
910 17864752 1215 0.000024 ly.ow
911 17864458 2820 0.000013 co.iglobal
912 17863556 1330 0.000022 edu.uchicago
913 17863262 133 0.000206 de.bund
914 17862854 259 0.000094 com.getbootstrap
915 17862186 499 0.000050 com.nasdaq
916 17861824 1000 0.000028 com.lifehacker
917 17861748 1271 0.000023 org.pnas
918 17861644 395 0.000062 io.atom
919 17861324 1458 0.000020 in.blogspot
920 17860306 2644 0.000013 ai.becominghuman
921 17860248 2680 0.000013 com.googlemaps
922 17858530 2009 0.000016 net.nend
923 17856868 4730 0.000008 com.colourlovers
924 17856798 1413 0.000021 com.splashthat
925 17856676 982 0.000029 com.jetbrains
926 17856176 915 0.000031 jp.livedoor
927 17856152 303 0.000080 com.ssl-images-amazon
928 17854224 2643 0.000013 nl.zeelandnet
929 17853620 869 0.000032 com.pingdom
930 17853534 3078 0.000013 com.sophos
931 17852908 2525 0.000013 gr.huffingtonpost
932 17852142 1002 0.000028 de.blogspot
933 17850682 2760 0.000013 com.fox13memphis
934 17850488 2114 0.000015 com.richmediagallery
935 17850378 1821 0.000017 com.hotmail
936 17850366 72 0.000395 com.messenger
937 17850276 2231 0.000014 edu.asu
938 17850052 995 0.000028 org.iso
939 17849976 1389 0.000021 com.imimg
940 17849372 1145 0.000025 com.uber
941 17849120 2356 0.000014 com.tuck
942 17848556 1726 0.000018 com.nba
943 17848232 2404 0.000014 jp.news24
944 17847732 1559 0.000019 com.ogilvy
945 17847318 2772 0.000013 com.addustour
946 17846800 2831 0.000013 org.grayarea
947 17846714 2103 0.000015 com.homestars
948 17846650 1136 0.000025 com.seattletimes
949 17846580 265 0.000092 ru.rambler
950 17845988 2362 0.000014 edu.utah
951 17845868 3862 0.000010 com.starwars
952 17845640 479 0.000051 jp.ne.sakura
953 17844718 1063 0.000027 gov.congress
954 17843102 1410 0.000021 dk.datatilsynet
955 17842932 859 0.000033 com.stitcher
956 17842798 2971 0.000013 com.oilandgas360
957 17842486 1785 0.000017 edu.umd
958 17842430 758 0.000036 com.yandex
959 17840100 1885 0.000017 com.wetransfer
960 17839628 2457 0.000013 ms.1drv
961 17838212 977 0.000029 com.prweb
962 17838086 423 0.000058 com.smugmug
963 17837702 2414 0.000014 com.delta
964 17836356 2306 0.000014 edu.bu
965 17836156 1141 0.000025 com.500px
966 17834668 2796 0.000013 org.cmlibrary
967 17834248 2565 0.000013 com.fixr
968 17833764 1312 0.000022 com.firefox
969 17833368 2050 0.000015 edu.ufl
970 17831610 2409 0.000014 ca.ualberta
971 17831386 3977 0.000010 com.thingiverse
972 17830888 400 0.000061 com.discordapp
973 17830714 3817 0.000010 edu.unl
974 17829748 2746 0.000013 tw.com.ibon
975 17829134 2490 0.000013 au.com.hotfrog
976 17828966 2350 0.000014 de.mpg
977 17828928 1160 0.000025 com.timeanddate
978 17828580 2495 0.000013 com.figure-eight
979 17828574 2370 0.000014 com.codecademy
980 17827964 890 0.000032 gov.usa
981 17827518 256 0.000096 it.google
982 17827064 2427 0.000014 com.outboundengine
983 17826874 1308 0.000022 com.strikingly
984 17826840 1243 0.000023 com.target
985 17825758 2455 0.000013 com.theblogpress
986 17825300 2585 0.000013 com.expressbusinessdirectory
987 17825288 2216 0.000014 com.nfl
988 17825192 2607 0.000013 com.elocal
989 17825120 2628 0.000013 au.com.news
990 17824314 1116 0.000025 com.scientificamerican
991 17824168 1325 0.000022 co.vine
992 17823710 747 0.000037 com.cargocollective
993 17823530 691 0.000040 com.caniuse
994 17821930 2107 0.000015 com.angelfire
995 17820788 3005 0.000013 com.hbo
996 17820648 1639 0.000019 uk.co.screamingfrog
997 17820304 2617 0.000013 com.ovoenergy
998 17820010 737 0.000037 uk.co.eventbrite
999 17819726 2691 0.000013 com.normacomics
1000 17819576 752 0.000037 com.sagepub
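
The last column of the table lists names in reversed domain name notation (for example, com.pcmag stands for pcmag.com). As a minimal sketch, assuming a reversed name is simply the dot-separated labels in opposite order, the following Python snippet converts such entries back to the familiar form. The function name unreverse_domain is hypothetical and not part of the cc-webgraph or cc-pyspark tooling.

```python
def unreverse_domain(rev: str) -> str:
    """Convert a reversed domain name (e.g. 'uk.co.eventbrite')
    back to its usual form (e.g. 'eventbrite.co.uk')."""
    # Split into labels, reverse their order, and join them again.
    return ".".join(reversed(rev.split(".")))


# Sample reversed names copied from the last column of the table above.
samples = ["com.pcmag", "za.co.bdlive", "uk.co.eventbrite", "org.filezilla-project"]

for rev in samples:
    print(rev, "->", unreverse_domain(rev))
```

Applying the function twice returns the original string, so the same helper can be used to produce the reversed form when looking up a domain in the ranks file.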

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data will be useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!