We are pleased to announce a new release of host-level and domain-level web graphs based on the published crawls of August, September and October 2018. Additional information about data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases (e.g., the Feb/Mar/Apr 2017 Webgraphs). You may also visit the cc-webgraph and cc-pyspark projects, which host all the scripts and tools required to construct the graphs.

Host-level graph

The graph consists of 903 million nodes and 5.25 billion edges. It includes dangling nodes, i.e., hosts that have not been crawled yet but are pointed to by a link on a crawled page. There are 819 million dangling nodes (91%), and the largest strongly connected component contains only 60 million (6.5%) nodes. Host names are reversed and a leading "www." is stripped: www.subdomain.example.com becomes com.example.subdomain.
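The host name convention can be sketched in a few lines of Python. This is a minimal illustration that assumes only the two rules stated above (drop one leading "www.", then reverse the dot-separated labels); the function names are hypothetical, and edge cases such as internationalized names or trailing dots are not handled.

```python
def reverse_host(host: str) -> str:
    """Drop one leading 'www.' and reverse the dot-separated labels."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def unreverse_host(rev_host: str) -> str:
    """Restore normal label order (a stripped 'www.' cannot be recovered)."""
    return ".".join(reversed(rev_host.split(".")))


if __name__ == "__main__":
    print(reverse_host("www.subdomain.example.com"))  # com.example.subdomain
    print(unreverse_host("com.example.subdomain"))    # subdomain.example.com
```

One practical effect of the reversed form is that a plain lexicographic sort places all hosts of the same domain next to each other (com.example, com.example.blog, com.example.subdomain, ...).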

You can download the graph and the ranks of all 903 million hosts from AWS S3 under the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-aug-sep-oct/host/. Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2018-aug-sep-oct/host/ as a prefix to access the files from anywhere.
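A minimal Python sketch of both access paths follows. It assumes, as the two prefixes above suggest, that the HTTPS path mirrors the S3 key below the `commoncrawl` bucket; `s3_to_https` and `download` are hypothetical helper names, and only the standard library is used.

```python
import shutil
import urllib.request

# Prefixes quoted in the text above.
S3_PREFIX = "s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-aug-sep-oct/host/"
HTTPS_PREFIX = "https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2018-aug-sep-oct/host/"


def s3_to_https(s3_url: str) -> str:
    """Map an s3://commoncrawl/... URL to the corresponding HTTPS URL (assumed mirror)."""
    bucket_prefix = "s3://commoncrawl/"
    if not s3_url.startswith(bucket_prefix):
        raise ValueError(f"not a commoncrawl S3 URL: {s3_url}")
    return "https://data.commoncrawl.org/" + s3_url[len(bucket_prefix):]


def download(file_name: str, dest_path: str) -> None:
    """Stream one file from the HTTPS prefix to a local path without loading it into memory."""
    url = HTTPS_PREFIX + file_name
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    print(s3_to_https(S3_PREFIX + "cc-main-2018-aug-sep-oct-host.stats"))
```

For example, `download("cc-main-2018-aug-sep-oct-host.stats", "host.stats")` would fetch the small statistics file listed below; streaming with `copyfileobj` keeps memory use flat even for the multi-gigabyte files.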

The following files and formats are provided:

| Size | File | Description |
|---:|---|---|
| 5.66 GB | cc-main-2018-aug-sep-oct-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 42 vertices files |
| 23.60 GB | cc-main-2018-aug-sep-oct-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 98 edges files |
| 9.63 GB | cc-main-2018-aug-sep-oct-host.graph | graph in BVGraph format |
| 2 kB | cc-main-2018-aug-sep-oct-host.properties | |
| 10.83 GB | cc-main-2018-aug-sep-oct-host-t.graph | transpose of the graph (outlinks inverted to inlinks) |
| 2 kB | cc-main-2018-aug-sep-oct-host-t.properties | |
| 1 kB | cc-main-2018-aug-sep-oct-host.stats | WebGraph statistics |
| 13.47 GB | cc-main-2018-aug-sep-oct-host-ranks.txt.gz | harmonic centrality and PageRank |
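To make "transpose" concrete, here is a small Python sketch over the edge list. It assumes (this is not stated above) that each file listed in the edges paths file is a gzipped text file with one tab-separated `from_id to_id` pair per line; the helper names are hypothetical.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Parse lines of the (assumed) form 'from_id<TAB>to_id'."""
    for line in lines:
        line = line.strip()
        if line:
            src, dst = line.split("\t")
            yield int(src), int(dst)


def read_edges_file(path: str) -> Iterator[Tuple[int, int]]:
    """Read one local, gzipped edges file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from parse_edges(f)


def transpose(edges: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Invert outlinks into inlinks: map each target id to its sorted source ids."""
    inlinks: Dict[int, List[int]] = defaultdict(list)
    for src, dst in edges:
        inlinks[dst].append(src)
    return {node: sorted(srcs) for node, srcs in inlinks.items()}


if __name__ == "__main__":
    sample = ["0\t1", "0\t2", "1\t2"]
    print(transpose(parse_edges(sample)))  # {1: [0], 2: [0, 1]}
```

This in-memory version only suits small samples; at 5.25 billion edges, the precomputed transpose in BVGraph format is the practical way to get inlinks.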

Domain-level graph

The domain graph was built by aggregating the host graph at the level of pay-level domains (PLDs), based on the public suffix list maintained at publicsuffix.org.
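The aggregation step can be sketched as follows. This is an illustration under stated assumptions, not the cc-webgraph implementation: `PUBLIC_SUFFIXES` is a tiny hypothetical excerpt of the public suffix list (wildcard and exception rules are ignored), unknown TLDs fall back to the list's implicit "*" rule, and links between hosts of the same domain are dropped, which is a choice of this sketch. The per-domain host count presumably corresponds to the "num hosts" column of the domain vertices file.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical excerpt of public suffix rules, written reversed ("uk.co" is co.uk).
PUBLIC_SUFFIXES = {"com", "org", "uk", "uk.co", "au", "au.com", "jp", "jp.co"}


def reversed_pld(rev_host: str, suffixes: Set[str] = PUBLIC_SUFFIXES) -> Optional[str]:
    """Return the reversed pay-level domain: longest public suffix plus one label."""
    labels = rev_host.split(".")
    longest = 1  # implicit "*" rule: any TLD counts as a public suffix
    for i in range(1, len(labels) + 1):
        if ".".join(labels[:i]) in suffixes:
            longest = i
    if longest >= len(labels):
        return None  # the host is itself a public suffix
    return ".".join(labels[: longest + 1])


def aggregate(host_names: Dict[int, str], host_edges: Iterable[Tuple[int, int]]):
    """Collapse a host graph into (hosts per domain, set of distinct domain edges)."""
    host_to_domain: Dict[int, str] = {}
    num_hosts = Counter()
    for host_id, rev_host in host_names.items():
        domain = reversed_pld(rev_host)
        if domain is not None:
            host_to_domain[host_id] = domain
            num_hosts[domain] += 1
    domain_edges = set()
    for src, dst in host_edges:
        a, b = host_to_domain.get(src), host_to_domain.get(dst)
        if a is not None and b is not None and a != b:  # drop intra-domain links
            domain_edges.add((a, b))
    return num_hosts, domain_edges


if __name__ == "__main__":
    hosts = {0: "com.example", 1: "com.example.blog", 2: "uk.co.bbc.news", 3: "org.wikipedia.en"}
    counts, edges = aggregate(hosts, [(0, 2), (1, 2), (2, 3), (0, 1)])
    print(dict(counts))  # {'com.example': 2, 'uk.co.bbc': 1, 'org.wikipedia': 1}
    print(sorted(edges))  # [('com.example', 'uk.co.bbc'), ('uk.co.bbc', 'org.wikipedia')]
```

Because several host edges can collapse onto the same domain pair, collecting domain edges in a set keeps each pair once.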

The domain-level graph has 87 million nodes and 1.48 billion edges. Of these nodes, 49 million (56%) are dangling nodes, and the largest strongly connected component covers 33.5 million (38%) of them.
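Both statistics quoted for the host and domain graphs can be reproduced on a toy graph. In this Python sketch, dangling nodes are approximated as nodes without outgoing edges (an uncrawled host has no known outlinks, but a crawled host without outlinks would be counted too), and the largest strongly connected component is found with Kosaraju's algorithm using explicit stacks. This is an illustration, not the tooling used for the release.

```python
from typing import Iterable, List, Tuple


def graph_stats(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (number of nodes without outlinks, size of the largest SCC)."""
    out_adj: List[List[int]] = [[] for _ in range(num_nodes)]
    in_adj: List[List[int]] = [[] for _ in range(num_nodes)]
    for s, t in edges:
        out_adj[s].append(t)
        in_adj[t].append(s)
    dangling = sum(1 for v in range(num_nodes) if not out_adj[v])

    # Pass 1: iterative DFS on the graph, recording nodes in order of completion.
    visited = [False] * num_nodes
    order: List[int] = []
    for root in range(num_nodes):
        if visited[root]:
            continue
        visited[root] = True
        dfs_stack = [(root, iter(out_adj[root]))]
        while dfs_stack:
            v, neighbors = dfs_stack[-1]
            for w in neighbors:
                if not visited[w]:
                    visited[w] = True
                    dfs_stack.append((w, iter(out_adj[w])))
                    break
            else:  # all neighbors done: v is finished
                dfs_stack.pop()
                order.append(v)

    # Pass 2: traverse the transpose in reverse completion order; each tree is one SCC.
    assigned = [False] * num_nodes
    largest = 0
    for root in reversed(order):
        if assigned[root]:
            continue
        assigned[root] = True
        size, todo = 0, [root]
        while todo:
            v = todo.pop()
            size += 1
            for w in in_adj[v]:
                if not assigned[w]:
                    assigned[w] = True
                    todo.append(w)
        largest = max(largest, size)
    return dangling, largest


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3), (4, 5)]
    dangling, largest = graph_stats(6, edges)
    print(f"dangling: {dangling}/6, largest SCC: {largest}/6")  # dangling: 1/6, largest SCC: 3/6
```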

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2018-aug-sep-oct/domain/ or, over HTTPS, under https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2018-aug-sep-oct/domain/.

Download files of the Common Crawl Aug/Sep/Oct 2018 domain-level webgraph

| Size | File | Description |
|---:|---|---|
| 0.60 GB | cc-main-2018-aug-sep-oct-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩ |
| 5.95 GB | cc-main-2018-aug-sep-oct-domain-edges.txt.gz | edges ⟨from_id, to_id⟩ |
| 3.24 GB | cc-main-2018-aug-sep-oct-domain.graph | graph in BVGraph format |
| 2 kB | cc-main-2018-aug-sep-oct-domain.properties | |
| 3.39 GB | cc-main-2018-aug-sep-oct-domain-t.graph | transpose of the graph |
| 2 kB | cc-main-2018-aug-sep-oct-domain-t.properties | |
| 1 kB | cc-main-2018-aug-sep-oct-domain.stats | WebGraph statistics |
| 1.89 GB | cc-main-2018-aug-sep-oct-domain-ranks.txt.gz | harmonic centrality and PageRank |
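The two measures in the ranks files can be illustrated on a toy graph. This Python sketch uses the common definition of harmonic centrality, H(x) = Σ_{y≠x} 1/d(y, x), summing over nodes y that can reach x (one BFS along inlinks per node), and PageRank by power iteration. The damping factor 0.85, the iteration count, and the uniform redistribution of dangling-node mass are assumptions of this sketch, not parameters stated for the release; on graphs of this size, harmonic centrality is usually approximated (for example with HyperBall) rather than computed with one BFS per node.

```python
from collections import deque
from typing import Iterable, List, Tuple


def harmonic_centrality(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[float]:
    """Exact harmonic centrality: for each x, sum 1/d(y, x) over all y that reach x."""
    in_adj: List[List[int]] = [[] for _ in range(num_nodes)]
    for s, t in edges:
        in_adj[t].append(s)
    scores = []
    for x in range(num_nodes):
        dist = {x: 0}
        queue = deque([x])
        total = 0.0
        while queue:  # BFS backwards along inlinks
            v = queue.popleft()
            for u in in_adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    total += 1.0 / dist[u]
                    queue.append(u)
        scores.append(total)
    return scores


def pagerank(num_nodes: int, edges: Iterable[Tuple[int, int]],
             damping: float = 0.85, iterations: int = 100) -> List[float]:
    """Power-iteration PageRank; the rank of dangling nodes is spread uniformly."""
    out_adj: List[List[int]] = [[] for _ in range(num_nodes)]
    for s, t in edges:
        out_adj[s].append(t)
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        dangling_mass = sum(rank[v] for v in range(num_nodes) if not out_adj[v])
        base = (1.0 - damping) / num_nodes + damping * dangling_mass / num_nodes
        new_rank = [base] * num_nodes
        for v in range(num_nodes):
            if out_adj[v]:
                share = damping * rank[v] / len(out_adj[v])
                for w in out_adj[v]:
                    new_rank[w] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (0, 2), (3, 2)]
    print(harmonic_centrality(4, edges))  # [2.0, 1.8333333333333333, 3.0, 0.0]
    print([round(p, 4) for p in pagerank(4, edges)])
```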

Below you’ll find the top 1000 domains ranked by harmonic centrality or PageRank. The full list of ranks for all 87 million domains is available for download.
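A list like the one below can be derived from the downloadable ranks file. This Python sketch assumes a tab-separated layout with the same five columns as the table (harmonic centrality rank and value, PageRank rank and value, reversed name) and `#`-prefixed header lines; that layout is an assumption, and `RankRow`, `parse_ranks` and `top_n` are hypothetical names.

```python
import gzip
import heapq
from operator import attrgetter
from typing import Iterable, Iterator, List, NamedTuple


class RankRow(NamedTuple):
    hc_rank: int
    hc_value: float
    pr_rank: int
    pr_value: float
    rev_name: str


def parse_ranks(lines: Iterable[str]) -> Iterator[RankRow]:
    """Parse tab-separated rank lines; the column order is assumed, extra columns are ignored."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hc_rank, hc_value, pr_rank, pr_value, rev_name = line.rstrip("\n").split("\t")[:5]
        yield RankRow(int(hc_rank), float(hc_value), int(pr_rank), float(pr_value), rev_name)


def read_ranks_file(path: str) -> Iterator[RankRow]:
    """Stream rows from a local copy of a *-ranks.txt.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from parse_ranks(f)


def top_n(rows: Iterable[RankRow], n: int, by: str = "hc") -> List[RankRow]:
    """Return the n best rows by harmonic centrality ("hc") or PageRank ("pr") rank."""
    if by not in ("hc", "pr"):
        raise ValueError("by must be 'hc' or 'pr'")
    return heapq.nsmallest(n, rows, key=attrgetter(f"{by}_rank"))


if __name__ == "__main__":
    sample = [
        "1\t24993276\t2\t0.012750\tcom.facebook",
        "2\t24671056\t1\t0.017210\tcom.googleapis",
        "3\t23453366\t3\t0.010761\tcom.google",
    ]
    rows = list(parse_ranks(sample))
    print([r.rev_name for r in top_n(rows, 2, by="pr")])  # ['com.googleapis', 'com.facebook']
```

`heapq.nsmallest` keeps only `n` rows in memory, so the same call works when streaming all 87 million rows with `read_ranks_file`.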

Top 1000 domains ranked by harmonic centrality (Aug/Sept/Oct 2018)

| harmonic centrality rank | hc value | PageRank rank | PageRank value | reversed hostname |
|---:|---:|---:|---:|---|
| 1 | 24993276 | 2 | 0.012750 | com.facebook |
| 2 | 24671056 | 1 | 0.017210 | com.googleapis |
| 3 | 23453366 | 3 | 0.010761 | com.google |
| 4 | 22371572 | 4 | 0.008252 | com.twitter |
| 5 | 22136836 | 5 | 0.006786 | com.youtube |
| 6 | 21115246 | 6 | 0.006404 | org.w |
| 7 | 19598338 | 9 | 0.003741 | com.instagram |
| 8 | 19499472 | 8 | 0.004616 | org.gmpg |
| 9 | 19210640 | 10 | 0.003396 | com.linkedin |
| 10 | 18557384 | 14 | 0.002133 | com.wordpress |
| 11 | 18475084 | 13 | 0.002729 | org.wordpress |
| 12 | 18409784 | 25 | 0.001383 | org.wikipedia |
| 13 | 18366100 | 24 | 0.001489 | com.gravatar |
| 14 | 18361154 | 21 | 0.001623 | com.pinterest |
| 15 | 17992020 | 12 | 0.002819 | com.bootstrapcdn |
| 16 | 17977192 | 19 | 0.001774 | com.apple |
| 17 | 17931956 | 32 | 0.001000 | com.blogspot |
| 18 | 17718492 | 41 | 0.000758 | be.youtu |
| 19 | 17629132 | 34 | 0.000918 | gl.goo |
| 20 | 17616074 | 26 | 0.001308 | com.microsoft |
| 21 | 17591076 | 16 | 0.001984 | com.googletagmanager |
| 22 | 17583026 | 37 | 0.000840 | com.amazon |
| 23 | 17567490 | 15 | 0.002013 | com.cloudflare |
| 24 | 17539566 | 46 | 0.000658 | com.tumblr |
| 25 | 17512444 | 23 | 0.001557 | com.adobe |
| 26 | 17478800 | 28 | 0.001256 | com.vimeo |
| 27 | 17351964 | 61 | 0.000482 | com.yahoo |
| 28 | 17328248 | 22 | 0.001618 | com.macromedia |
| 29 | 17262608 | 45 | 0.000669 | com.wp |
| 30 | 17247612 | 36 | 0.000867 | com.paypal |
| 31 | 17245380 | 20 | 0.001675 | com.github |
| 32 | 17243016 | 27 | 0.001272 | com.gstatic |
| 33 | 17237096 | 33 | 0.000928 | com.amazonaws |
| 34 | 17218638 | 47 | 0.000617 | me.wp |
| 35 | 17175034 | 52 | 0.000568 | org.mozilla |
| 36 | 17148704 | 98 | 0.000318 | com.googleusercontent |
| 37 | 17124438 | 40 | 0.000793 | co.t |
| 38 | 17114262 | 75 | 0.000425 | com.weebly |
| 39 | 17088226 | 115 | 0.000256 | com.nytimes |
| 40 | 17080324 | 39 | 0.000796 | net.cloudfront |
| 41 | 17058308 | 78 | 0.000424 | org.creativecommons |
| 42 | 17046746 | 51 | 0.000584 | org.w3 |
| 43 | 17034446 | 150 | 0.000171 | org.wikimedia |
| 44 | 16978138 | 65 | 0.000465 | com.medium |
| 45 | 16977018 | 59 | 0.000490 | com.flickr |
| 46 | 16969818 | 60 | 0.000482 | ly.bit |
| 47 | 16934330 | 38 | 0.000830 | io.github |
| 48 | 16925564 | 139 | 0.000188 | net.slideshare |
| 49 | 16907918 | 149 | 0.000171 | com.theguardian |
| 50 | 16886648 | 71 | 0.000433 | com.jquery |
| 51 | 16848068 | 133 | 0.000196 | com.imgur |
| 52 | 16838070 | 165 | 0.000145 | com.myspace |
| 53 | 16825278 | 53 | 0.000551 | eu.europa |
| 54 | 16813968 | 159 | 0.000152 | com.imdb |
| 55 | 16800572 | 35 | 0.000899 | net.fbcdn |
| 56 | 16751537 | 127 | 0.000208 | com.issuu |
| 57 | 16731296 | 113 | 0.000267 | org.apache |
| 58 | 16730104 | 30 | 0.001101 | net.doubleclick |
| 59 | 16713084 | 188 | 0.000124 | com.tinyurl |
| 60 | 16690164 | 278 | 0.000085 | com.theverge |
| 61 | 16686224 | 105 | 0.000288 | com.reddit |
| 62 | 16681526 | 120 | 0.000229 | com.yelp |
| 63 | 16673945 | 17 | 0.001886 | com.wixstatic |
| 64 | 16666427 | 260 | 0.000091 | com.appspot |
| 65 | 16661248 | 274 | 0.000086 | com.buzzfeed |
| 66 | 16660287 | 143 | 0.000182 | com.oracle |
| 67 | 16653660 | 134 | 0.000191 | com.spotify |
| 68 | 16642850 | 241 | 0.000097 | me.about |
| 69 | 16641262 | 123 | 0.000216 | com.android |
| 70 | 16635142 | 335 | 0.000071 | org.chromium |
| 71 | 16619845 | 7 | 0.004633 | com.godaddy |
| 72 | 16619118 | 155 | 0.000164 | com.tripadvisor |
| 73 | 16611356 | 31 | 0.001041 | com.squarespace |
| 74 | 16609865 | 285 | 0.000083 | com.mysql |
| 75 | 16599943 | 308 | 0.000076 | com.about |
| 76 | 16593164 | 346 | 0.000067 | org.arxiv |
| 77 | 16592299 | 131 | 0.000204 | org.ietf |
| 78 | 16588645 | 95 | 0.000343 | com.soundcloud |
| 79 | 16583851 | 337 | 0.000071 | edu.upenn |
| 80 | 16581591 | 435 | 0.000057 | edu.princeton |
| 81 | 16574803 | 358 | 0.000066 | org.ieee |
| 82 | 16573729 | 156 | 0.000164 | org.gnu |
| 83 | 16567342 | 124 | 0.000215 | com.dropbox |
| 84 | 16567126 | 324 | 0.000073 | com.deviantart |
| 85 | 16543836 | 148 | 0.000174 | com.forbes |
| 86 | 16528334 | 130 | 0.000206 | com.whatsapp |
| 87 | 16521358 | 87 | 0.000400 | com.statcounter |
| 88 | 16521254 | 427 | 0.000058 | google.blog |
| 89 | 16520460 | 430 | 0.000057 | com.ssrn |
| 90 | 16519609 | 62 | 0.000481 | org.schema |
| 91 | 16516790 | 132 | 0.000199 | org.archive |
| 92 | 16511114 | 144 | 0.000182 | net.sourceforge |
| 93 | 16502431 | 180 | 0.000130 | com.cnn |
| 94 | 16499128 | 320 | 0.000074 | gov.loc |
| 95 | 16489333 | 191 | 0.000121 | com.foursquare |
| 96 | 16488380 | 189 | 0.000122 | edu.stanford |
| 97 | 16487861 | 88 | 0.000386 | com.bing |
| 98 | 16483708 | 371 | 0.000065 | edu.ucla |
| 99 | 16479852 | 389 | 0.000062 | com.stackexchange |
| 100 | 16478936 | 494 | 0.000050 | edu.gatech |
| 101 | 16478408 | 414 | 0.000059 | org.sciencemag |
| 102 | 16476180 | 183 | 0.000129 | com.dribbble |
| 103 | 16475471 | 203 | 0.000109 | com.nbcnews |
| 104 | 16470933 | 413 | 0.000059 | com.withgoogle |
| 105 | 16450054 | 255 | 0.000092 | com.example |
| 106 | 16447977 | 359 | 0.000066 | com.googlecode |
| 107 | 16443987 | 108 | 0.000278 | com.ytimg |
| 108 | 16438657 | 169 | 0.000142 | uk.co.bbc |
| 109 | 16436761 | 236 | 0.000098 | edu.mit |
| 110 | 16428294 | 230 | 0.000099 | com.mozilla |
| 111 | 16424352 | 224 | 0.000102 | com.githubusercontent |
| 112 | 16414627 | 458 | 0.000054 | com.sap |
| 113 | 16412253 | 565 | 0.000046 | com.flipboard |
| 114 | 16411958 | 205 | 0.000109 | com.washingtonpost |
| 115 | 16408402 | 146 | 0.000176 | com.blogger |
| 116 | 16406897 | 546 | 0.000047 | com.chrome |
| 117 | 16404143 | 49 | 0.000590 | com.fb |
| 118 | 16402707 | 571 | 0.000045 | edu.utah |
| 119 | 16398122 | 402 | 0.000060 | com.jetbrains |
| 120 | 16396163 | 492 | 0.000050 | com.chron |
| 121 | 16395449 | 322 | 0.000073 | com.git-scm |
| 122 | 16394655 | 186 | 0.000125 | com.huffingtonpost |
| 123 | 16393901 | 223 | 0.000102 | com.businessinsider |
| 124 | 16393651 | 97 | 0.000330 | com.wix |
| 125 | 16391015 | 89 | 0.000380 | com.paypalobjects |
| 126 | 16379847 | 122 | 0.000225 | org.bbb |
| 127 | 16379547 | 243 | 0.000097 | com.live |
| 128 | 16372670 | 287 | 0.000082 | gov.fda |
| 129 | 16372641 | 280 | 0.000084 | au.com.google |
| 130 | 16372405 | 54 | 0.000524 | com.list-manage |
| 131 | 16371002 | 286 | 0.000082 | edu.harvard |
| 132 | 16367016 | 532 | 0.000047 | com.fastcodesign |
| 133 | 16366146 | 365 | 0.000066 | com.tinypic |
| 134 | 16364937 | 194 | 0.000117 | com.wsj |
| 135 | 16347562 | 410 | 0.000059 | tv.ustream |
| 136 | 16344105 | 298 | 0.000080 | com.cnet |
| 137 | 16342765 | 334 | 0.000071 | com.bbc |
| 138 | 16339948 | 387 | 0.000062 | com.variety |
| 139 | 16339248 | 407 | 0.000060 | org.eclipse |
| 140 | 16338636 | 441 | 0.000056 | co.g |
| 141 | 16336819 | 304 | 0.000078 | com.reuters |
| 142 | 16333116 | 258 | 0.000091 | org.doi |
| 143 | 16326228 | 262 | 0.000090 | com.ibm |
| 144 | 16321163 | 266 | 0.000088 | com.wired |
| 145 | 16318957 | 312 | 0.000076 | uk.co.telegraph |
| 146 | 16317274 | 197 | 0.000112 | com.typepad |
| 147 | 16317262 | 349 | 0.000067 | com.gmail |
| 148 | 16316342 | 418 | 0.000058 | org.iana |
| 149 | 16309796 | 269 | 0.000087 | com.bloomberg |
| 150 | 16309363 | 248 | 0.000095 | net.windows |
| 151 | 16308394 | 104 | 0.000290 | com.shopify |
| 152 | 16304961 | 456 | 0.000054 | co.ibb |
| 153 | 16304891 | 181 | 0.000129 | com.stackoverflow |
| 154 | 16303280 | 240 | 0.000097 | com.techcrunch |
| 155 | 16297167 | 55 | 0.000519 | net.akamaihd |
| 156 | 16296675 | 272 | 0.000087 | com.go |
| 157 | 16296217 | 154 | 0.000166 | gov.nih |
| 158 | 16289519 | 394 | 0.000061 | gov.nasa |
| 159 | 16288695 | 339 | 0.000071 | com.msn |
| 160 | 16287525 | 352 | 0.000067 | com.latimes |
| 161 | 16285423 | 162 | 0.000147 | com.etsy |
| 162 | 16282310 | 109 | 0.000274 | com.google-analytics |
| 163 | 16282286 | 508 | 0.000049 | edu.rutgers |
| 164 | 16282232 | 545 | 0.000047 | ca.utoronto |
| 165 | 16275672 | 170 | 0.000142 | com.twimg |
| 166 | 16275582 | 103 | 0.000293 | com.mailchimp |
| 167 | 16274598 | 90 | 0.000378 | de.google |
| 168 | 16271240 | 265 | 0.000088 | org.acm |
| 169 | 16268707 | 366 | 0.000066 | com.mashable |
| 170 | 16267430 | 498 | 0.000050 | com.quora |
| 171 | 16264746 | 416 | 0.000058 | au.gov.nsw |
| 172 | 16264148 | 116 | 0.000242 | com.jimdo |
| 173 | 16261109 | 50 | 0.000589 | com.fontawesome |
| 174 | 16252668 | 550 | 0.000047 | com.vogue |
| 175 | 16251642 | 467 | 0.000053 | com.zdnet |
| 176 | 16250818 | 357 | 0.000067 | uk.co.dailymail |
| 177 | 16247730 | 663 | 0.000044 | com.hbo |
| 178 | 16247621 | 447 | 0.000055 | com.googleblog |
| 179 | 16245680 | 761 | 0.000039 | com.dezeen |
| 180 | 16244687 | 277 | 0.000085 | com.usatoday |
| 181 | 16244320 | 158 | 0.000162 | com.eventbrite |
| 182 | 16243163 | 582 | 0.000045 | edu.osu |
| 183 | 16239459 | 263 | 0.000090 | com.meetup |
| 184 | 16229462 | 429 | 0.000058 | gov.archives |
| 185 | 16224180 | 450 | 0.000055 | edu.cornell |
| 186 | 16223175 | 461 | 0.000053 | edu.berkeley |
| 187 | 16218763 | 396 | 0.000061 | com.ted |
| 188 | 16217936 | 151 | 0.000170 | com.opera |
| 189 | 16214980 | 581 | 0.000045 | edu.washington |
| 190 | 16211363 | 299 | 0.000080 | com.udacity |
| 191 | 16208363 | 580 | 0.000045 | org.hrw |
| 192 | 16197869 | 208 | 0.000107 | com.surveymonkey |
| 193 | 16195499 | 316 | 0.000075 | com.time |
| 194 | 16192449 | 486 | 0.000051 | com.ecwid |
| 195 | 16187438 | 409 | 0.000060 | com.kickstarter |
| 196 | 16187407 | 321 | 0.000074 | org.npr |
| 197 | 16187344 | 696 | 0.000042 | com.discogs |
| 198 | 16181108 | 700 | 0.000042 | io.itch |
| 199 | 16177807 | 496 | 0.000050 | org.unicode |
| 200 | 16177748 | 313 | 0.000076 | com.springer |
| 201 | 16176015 | 29 | 0.001149 | ru.yandex |
| 202 | 16174231 | 446 | 0.000055 | org.kernel |
| 203 | 16173199 | 370 | 0.000065 | com.aol |
| 204 | 16173059 | 701 | 0.000042 | com.economist |
| 205 | 16171404 | 290 | 0.000081 | com.hp |
| 206 | 16168983 | 231 | 0.000099 | com.mapquest |
| 207 | 16167485 | 48 | 0.000602 | com.qq |
| 208 | 16163785 | 758 | 0.000039 | org.wikibooks |
| 209 | 16160518 | 362 | 0.000066 | com.cnbc |
| 210 | 16154540 | 390 | 0.000062 | org.un |
| 211 | 16152769 | 333 | 0.000072 | org.python |
| 212 | 16152422 | 488 | 0.000051 | com.ft |
| 213 | 16151474 | 210 | 0.000107 | org.drupal |
| 214 | 16148981 | 401 | 0.000060 | me.paypal |
| 215 | 16148740 | 690 | 0.000042 | com.strava |
| 216 | 16148152 | 417 | 0.000058 | com.angieslist |
| 217 | 16144222 | 267 | 0.000088 | com.hubspot |
| 218 | 16142474 | 136 | 0.000191 | com.zendesk |
| 219 | 16141403 | 504 | 0.000049 | org.aarp |
| 220 | 16139927 | 364 | 0.000066 | com.giphy |
| 221 | 16138024 | 741 | 0.000040 | org.amnesty |
| 222 | 16136086 | 552 | 0.000046 | com.yellowpages |
| 223 | 16133616 | 343 | 0.000069 | com.nypost |
| 224 | 16132797 | 767 | 0.000038 | com.wikia |
| 225 | 16132223 | 714 | 0.000041 | com.dropboxusercontent |
| 226 | 16131615 | 419 | 0.000058 | com.fortune |
| 227 | 16128988 | 70 | 0.000439 | net.jsfiddle |
| 228 | 16128428 | 330 | 0.000072 | com.wiley |
| 229 | 16127117 | 91 | 0.000355 | com.baidu |
| 230 | 16126449 | 201 | 0.000110 | uk.co.amazon |
| 231 | 16124648 | 509 | 0.000049 | com.unsplash |
| 232 | 16123335 | 145 | 0.000179 | uk.co.google |
| 233 | 16122860 | 361 | 0.000066 | com.prnewswire |
| 234 | 16119093 | 821 | 0.000037 | com.slate |
| 235 | 16117800 | 482 | 0.000051 | com.cisco |
| 236 | 16114323 | 353 | 0.000067 | com.photobucket |
| 237 | 16112036 | 561 | 0.000046 | com.venturebeat |
| 238 | 16111166 | 873 | 0.000036 | com.pixabay |
| 239 | 16108436 | 976 | 0.000034 | com.arstechnica |
| 240 | 16105654 | 198 | 0.000111 | org.purl |
| 241 | 16102642 | 206 | 0.000108 | com.ebay |
| 242 | 16101242 | 798 | 0.000038 | com.manta |
| 243 | 16099223 | 137 | 0.000189 | com.wixsite |
| 244 | 16098318 | 702 | 0.000042 | com.intel |
| 245 | 16097598 | 685 | 0.000043 | com.nationalgeographic |
| 246 | 16096399 | 442 | 0.000056 | com.entrepreneur |
| 247 | 16090287 | 405 | 0.000060 | gov.whitehouse |
| 248 | 16090076 | 459 | 0.000054 | com.nature |
| 249 | 16089802 | 319 | 0.000074 | com.oreilly |
| 250 | 16088319 | 376 | 0.000064 | com.office |
| 251 | 16087148 | 576 | 0.000045 | com.samsung |
| 252 | 16084044 | 57 | 0.000494 | com.vk |
| 253 | 16082814 | 479 | 0.000052 | com.matterport |
| 254 | 16080553 | 475 | 0.000052 | org.postgresql |
| 255 | 16078046 | 601 | 0.000045 | com.newyorker |
| 256 | 16075255 | 297 | 0.000081 | gov.cdc |
| 257 | 16075119 | 173 | 0.000138 | com.constantcontact |
| 258 | 16073933 | 697 | 0.000042 | com.vice |
| 259 | 16073113 | 829 | 0.000037 | edu.psu |
| 260 | 16071402 | 1136 | 0.000031 | com.gizmodo |
| 261 | 16071402 | 551 | 0.000047 | com.scribd |
| 262 | 16070353 | 923 | 0.000035 | com.qz |
| 263 | 16068616 | 356 | 0.000067 | org.ampproject |
| 264 | 16068603 | 557 | 0.000046 | gov.nist |
| 265 | 16068356 | 294 | 0.000081 | me.telegram |
| 266 | 16067025 | 490 | 0.000051 | com.wikihow |
| 267 | 16066386 | 812 | 0.000037 | ly.snip |
| 268 | 16066142 | 220 | 0.000104 | com.disqus |
| 269 | 16065694 | 973 | 0.000034 | edu.yale |
| 270 | 16062922 | 474 | 0.000052 | com.cbsnews |
| 271 | 16062264 | 779 | 0.000038 | edu.kit |
| 272 | 16060241 | 607 | 0.000044 | org.eff |
| 273 | 16059786 | 583 | 0.000045 | com.box |
| 274 | 16059328 | 237 | 0.000098 | net.php |
| 275 | 16057917 | 126 | 0.000209 | com.feedburner |
| 276 | 16057179 | 476 | 0.000052 | com.theatlantic |
| 277 | 16055682 | 828 | 0.000037 | com.engadget |
| 278 | 16052013 | 264 | 0.000089 | gov.ftc |
| 279 | 16047507 | 791 | 0.000038 | com.merchantcircle |
| 280 | 16044901 | 252 | 0.000093 | com.digg |
| 281 | 16044141 | 448 | 0.000055 | org.hbr |
| 282 | 16042660 | 707 | 0.000041 | org.nodejs |
| 283 | 16042189 | 453 | 0.000055 | com.inc |
| 284 | 16040892 | 374 | 0.000064 | com.images-amazon |
| 285 | 16039745 | 379 | 0.000064 | com.skype |
| 286 | 16038801 | 212 | 0.000107 | com.salesforce |
| 287 | 16038291 | 716 | 0.000041 | com.statista |
| 288 | 16035364 | 1421 | 0.000027 | edu.utexas |
| 289 | 16034294 | 293 | 0.000081 | com.staticflickr |
| 290 | 16033600 | 291 | 0.000081 | com.fastcompany |
| 291 | 16033037 | 1439 | 0.000027 | com.pexels |
| 292 | 16030476 | 694 | 0.000042 | edu.columbia |
| 293 | 16028618 | 876 | 0.000036 | com.marketwatch |
| 294 | 16027153 | 600 | 0.000045 | com.avvo |
| 295 | 16024349 | 1436 | 0.000027 | com.storify |
| 296 | 16023986 | 340 | 0.000070 | int.who |
| 297 | 16023292 | 106 | 0.000284 | com.addthis |
| 298 | 16021320 | 775 | 0.000038 | com.indiatimes |
| 299 | 16016177 | 1445 | 0.000026 | com.thinkwithgoogle |
| 300 | 16015721 | 406 | 0.000060 | org.maven |
| 301 | 16014399 | 449 | 0.000055 | com.w3schools |
| 302 | 16013658 | 1408 | 0.000028 | com.smashingmagazine |
| 303 | 16012164 | 878 | 0.000036 | com.mysanantonio |
| 304 | 16011714 | 372 | 0.000064 | co.elastic |
| 305 | 16011705 | 215 | 0.000105 | com.stumbleupon |
| 306 | 16011229 | 226 | 0.000101 | to.amzn |
| 307 | 16008183 | 1492 | 0.000025 | edu.purdue |
| 308 | 16006545 | 195 | 0.000116 | net.behance |
| 309 | 16006038 | 560 | 0.000046 | org.pbs |
| 310 | 16005359 | 63 | 0.000476 | me.fb |
| 311 | 16003375 | 302 | 0.000079 | com.googlesyndication |
| 312 | 16002994 | 969 | 0.000034 | au.net.abc |
| 313 | 16002951 | 1601 | 0.000022 | com.vanityfair |
| 314 | 16002791 | 499 | 0.000050 | com.slack |
| 315 | 16001492 | 270 | 0.000087 | gov.ca |
| 316 | 15999972 | 311 | 0.000076 | com.tripod |
| 317 | 15995276 | 338 | 0.000071 | com.sxsw |
| 318 | 15993036 | 408 | 0.000060 | uk.co.blogspot |
| 319 | 15990251 | 141 | 0.000185 | com.weibo |
| 320 | 15987681 | 684 | 0.000043 | net.researchgate |
| 321 | 15985567 | 1415 | 0.000027 | com.alexa |
| 322 | 15984335 | 355 | 0.000067 | com.dailymotion |
| 323 | 15982835 | 1401 | 0.000028 | edu.ucsd |
| 324 | 15982138 | 686 | 0.000042 | com.blackberry |
| 325 | 15981891 | 1014 | 0.000033 | org.worldbank |
| 326 | 15979100 | 315 | 0.000075 | fr.free |
| 327 | 15978004 | 472 | 0.000052 | net.leadpages |
| 328 | 15976584 | 974 | 0.000034 | com.thenextweb |
| 329 | 15971688 | 554 | 0.000046 | com.moz |
| 330 | 15970463 | 1699 | 0.000020 | org.owasp |
| 331 | 15969404 | 360 | 0.000066 | com.sciencedirect |
| 332 | 15968776 | 762 | 0.000039 | com.uservoice |
| 333 | 15968498 | 1003 | 0.000033 | com.shutterstock |
| 334 | 15965774 | 375 | 0.000064 | edu.cmu |
| 335 | 15962210 | 176 | 0.000134 | org.icann |
| 336 | 15961179 | 732 | 0.000040 | com.proofpoint |
| 337 | 15958400 | 903 | 0.000035 | edu.uark |
| 338 | 15957323 | 1082 | 0.000032 | com.evernote |
| 339 | 15956890 | 373 | 0.000064 | com.livejournal |
| 340 | 15955233 | 691 | 0.000042 | com.googlesource |
| 341 | 15951480 | 1006 | 0.000033 | ly.ow |
| 342 | 15949779 | 589 | 0.000045 | gov.sec |
| 343 | 15946301 | 955 | 0.000034 | com.speakerdeck |
| 344 | 15944949 | 1351 | 0.000029 | com.lifehacker |
| 345 | 15941679 | 584 | 0.000045 | com.citysearch |
| 346 | 15941304 | 879 | 0.000035 | org.unesco |
| 347 | 15940525 | 814 | 0.000037 | com.psychologytoday |
| 348 | 15937813 | 1319 | 0.000031 | com.trello |
| 349 | 15937606 | 913 | 0.000035 | com.sfgate |
| 350 | 15936285 | 994 | 0.000033 | com.designobserver |
| 351 | 15934103 | 1536 | 0.000024 | edu.northwestern |
| 352 | 15933505 | 457 | 0.000054 | com.snapchat |
| 353 | 15932071 | 1320 | 0.000031 | uk.ac.ox |
| 354 | 15931696 | 671 | 0.000043 | tv.twitch |
| 355 | 15931551 | 1021 | 0.000032 | gov.fcc |
| 356 | 15930923 | 678 | 0.000043 | org.bitbucket |
| 357 | 15929820 | 1778 | 0.000019 | com.fifa |
| 358 | 15929724 | 412 | 0.000059 | com.businesswire |
| 359 | 15928928 | 803 | 0.000037 | org.aiga |
| 360 | 15926617 | 244 | 0.000096 | com.wufoo |
| 361 | 15926593 | 569 | 0.000045 | com.atlassian |
| 362 | 15926203 | 214 | 0.000106 | de.amazon |
| 363 | 15925496 | 327 | 0.000072 | com.typeform |
| 364 | 15924766 | 1603 | 0.000022 | com.mcafee |
| 365 | 15922998 | 1047 | 0.000032 | com.libsyn |
| 366 | 15922233 | 1878 | 0.000017 | org.coursera |
| 367 | 15921752 | 916 | 0.000035 | com.zynga |
| 368 | 15921549 | 961 | 0.000034 | com.kudzu |
| 369 | 15921529 | 1852 | 0.000018 | com.semrush |
| 370 | 15920541 | 674 | 0.000043 | com.ubuntu |
| 371 | 15920148 | 1673 | 0.000021 | com.econsultancy |
| 372 | 15918200 | 1440 | 0.000027 | com.indiegogo |
| 373 | 15917275 | 1383 | 0.000028 | com.politico |
| 374 | 15917261 | 295 | 0.000081 | org.mediawiki |
| 375 | 15916739 | 754 | 0.000039 | org.aclweb |
| 376 | 15916639 | 963 | 0.000034 | com.deloitte |
| 377 | 15914854 | 930 | 0.000034 | org.spie |
| 378 | 15914760 | 981 | 0.000033 | com.livestream |
| 379 | 15912203 | 1449 | 0.000026 | co.vine |
| 380 | 15910119 | 1553 | 0.000023 | org.khanacademy |
| 381 | 15908826 | 516 | 0.000048 | com.goodreads |
| 382 | 15908339 | 989 | 0.000033 | gov.uspto |
| 383 | 15907762 | 303 | 0.000079 | org.joomla |
| 384 | 15906992 | 1398 | 0.000028 | com.zoho |
| 385 | 15902686 | 908 | 0.000035 | me.websta |
| 386 | 15901516 | 825 | 0.000037 | com.foxnews |
| 387 | 15900302 | 350 | 0.000067 | com.booking |
| 388 | 15899160 | 1013 | 0.000033 | io.codepen |
| 389 | 15898382 | 129 | 0.000206 | com.youtube-nocookie |
| 390 | 15897026 | 163 | 0.000146 | jp.co.yahoo |
| 391 | 15896378 | 1613 | 0.000022 | edu.unc |
| 392 | 15896141 | 1652 | 0.000021 | com.technologyreview |
| 393 | 15894277 | 1709 | 0.000020 | com.digitaltrends |
| 394 | 15893383 | 1217 | 0.000031 | org.iso |
| 395 | 15893229 | 1569 | 0.000023 | com.pingdom |
| 396 | 15893117 | 914 | 0.000035 | gov.senate |
| 397 | 15892860 | 289 | 0.000082 | com.smugmug |
| 398 | 15890595 | 199 | 0.000111 | com.bandcamp |
| 399 | 15889383 | 975 | 0.000034 | com.mckinsey |
| 400 | 15888601 | 920 | 0.000035 | it.binged |
| 401 | 15888102 | 1389 | 0.000028 | com.udemy |
| 402 | 15885995 | 991 | 0.000033 | com.what3words |
| 403 | 15885247 | 2561 | 0.000012 | com.sophos |
| 404 | 15884644 | 1622 | 0.000022 | org.weforum |
| 405 | 15884560 | 380 | 0.000064 | net.themeforest |
| 406 | 15884272 | 626 | 0.000044 | gov.noaa |
| 407 | 15882777 | 1896 | 0.000017 | com.ehow |
| 408 | 15881215 | 718 | 0.000041 | org.vim |
| 409 | 15879594 | 1490 | 0.000025 | com.elpais |
| 410 | 15879197 | 1343 | 0.000030 | com.sciencedaily |
| 411 | 15879074 | 445 | 0.000056 | com.squareup |
| 412 | 15879052 | 816 | 0.000037 | com.gartner |
| 413 | 15877064 | 439 | 0.000056 | com.netflix |
| 414 | 15873982 | 470 | 0.000053 | com.webs |
| 415 | 15873498 | 271 | 0.000087 | com.rawgit |
| 416 | 15873485 | 1035 | 0.000032 | edu.uah |
| 417 | 15873257 | 1710 | 0.000020 | uk.co.wired |
| 418 | 15871503 | 463 | 0.000053 | com.bizjournals |
| 419 | 15871161 | 1002 | 0.000033 | com.americanexpress |
| 420 | 15870167 | 1568 | 0.000023 | org.pnas |
| 421 | 15869151 | 433 | 0.000057 | com.monster |
| 422 | 15869106 | 1024 | 0.000032 | com.nielsen |
| 423 | 15866670 | 1317 | 0.000031 | com.redhat |
| 424 | 15866441 | 667 | 0.000044 | com.java |
| 425 | 15865156 | 76 | 0.000425 | org.reactjs |
| 426 | 15864692 | 1949 | 0.000017 | ch.ethz |
| 427 | 15862191 | 400 | 0.000060 | com.force |
| 428 | 15861930 | 404 | 0.000060 | com.herokuapp |
| 429 | 15861377 | 1798 | 0.000019 | com.socialmediaexaminer |
| 430 | 15860803 | 1108 | 0.000031 | com.adage |
| 431 | 15860212 | 892 | 0.000035 | com.googledrive |
| 432 | 15859653 | 1899 | 0.000017 | com.tutsplus |
| 433 | 15858284 | 114 | 0.000263 | jp.co.google |
| 434 | 15857073 | 1696 | 0.000020 | edu.usc |
| 435 | 15856570 | 984 | 0.000033 | com.prweb |
| 436 | 15856191 | 760 | 0.000039 | gov.justice |
| 437 | 15855748 | 1481 | 0.000025 | com.playstation |
| 438 | 15855432 | 1734 | 0.000020 | com.canva |
| 439 | 15854961 | 514 | 0.000049 | us.icio |
| 440 | 15852813 | 172 | 0.000138 | com.xing |
| 441 | 15850197 | 866 | 0.000036 | re.cli |
| 442 | 15850114 | 1572 | 0.000023 | edu.uchicago |
| 443 | 15849567 | 1400 | 0.000028 | com.bostonglobe |
| 444 | 15848723 | 801 | 0.000038 | com.steampowered |
| 445 | 15844441 | 292 | 0.000081 | ca.google |
| 446 | 15844259 | 437 | 0.000057 | com.bigcartel |
| 447 | 15843423 | 2150 | 0.000015 | com.urbandictionary |
| 448 | 15842728 | 844 | 0.000036 | io.material |
| 449 | 15841425 | 481 | 0.000051 | com.bigcommerce |
| 450 | 15839196 | 1540 | 0.000024 | com.caniuse |
| 451 | 15838309 | 245 | 0.000096 | com.getclicky |
| 452 | 15834448 | 1384 | 0.000028 | com.dell |
| 453 | 15834375 | 808 | 0.000037 | gov.state |
| 454 | 15834214 | 1732 | 0.000020 | com.hotmail |
| 455 | 15833672 | 250 | 0.000094 | es.google |
| 456 | 15831338 | 1692 | 0.000021 | au.com.smh |
| 457 | 15830767 | 1632 | 0.000022 | com.upwork |
| 458 | 15830199 | 737 | 0.000040 | org.gnupg |
| 459 | 15829812 | 998 | 0.000033 | edu.utep |
| 460 | 15829095 | 354 | 0.000067 | com.stripe |
| 461 | 15828852 | 901 | 0.000035 | com.msdn |
| 462 | 15828183 | 422 | 0.000058 | com.adweek |
| 463 | 15826915 | 1701 | 0.000020 | com.codeplex |
| 464 | 15826257 | 2005 | 0.000016 | ca.uwaterloo |
| 465 | 15825896 | 107 | 0.000283 | org.networkadvertising |
| 466 | 15824996 | 2475 | 0.000013 | com.twitpic |
| 467 | 15823630 | 1375 | 0.000029 | uk.ac.cam |
| 468 | 15823242 | 225 | 0.000101 | com.myshopify |
| 469 | 15823088 | 1752 | 0.000019 | com.nike |
| 470 | 15822418 | 845 | 0.000036 | com.outlook |
| 471 | 15822314 | 1498 | 0.000025 | com.gettyimages |
| 472 | 15821233 | 1341 | 0.000030 | com.istockphoto |
| 473 | 15820921 | 1189 | 0.000031 | de.heise |
| 474 | 15819474 | 1600 | 0.000022 | com.marketo |
| 475 | 15818475 | 520 | 0.000048 | com.cargocollective |
| 476 | 15818334 | 1368 | 0.000029 | ca.blogspot |
| 477 | 15817231 | 1990 | 0.000016 | com.norton |
| 478 | 15815744 | 1459 | 0.000026 | de.spiegel |
| 479 | 15814626 | 846 | 0.000036 | jp.co.fujixerox |
| 480 | 15813630 | 997 | 0.000033 | com.chicagotribune |
| 481 | 15812887 | 1807 | 0.000018 | com.ikea |
| 482 | 15812477 | 1550 | 0.000023 | com.ning |
| 483 | 15812254 | 2052 | 0.000016 | com.crunchbase |
| 484 | 15811034 | 699 | 0.000042 | com.webmd |
| 485 | 15808826 | 202 | 0.000110 | com.windowsphone |
| 486 | 15808375 | 1521 | 0.000024 | com.scientificamerican |
| 487 | 15808267 | 239 | 0.000097 | com.getbootstrap |
| 488 | 15808212 | 2567 | 0.000012 | com.codecademy |
| 489 | 15807983 | 1099 | 0.000031 | edu.alamo |
| 490 | 15807399 | 507 | 0.000049 | com.npmjs |
| 491 | 15806866 | 1585 | 0.000022 | com.billboard |
| 492 | 15806166 | 1052 | 0.000032 | com.theschooloflife |
| 493 | 15805533 | 2014 | 0.000016 | com.msnbc |
| 494 | 15804337 | 2303 | 0.000014 | com.instructables |
| 495 | 15803920 | 725 | 0.000040 | gov.copyright |
| 496 | 15803640 | 1530 | 0.000024 | uk.ac.ucl |
| 497 | 15803572 | 1676 | 0.000021 | fr.lemonde |
| 498 | 15802334 | 925 | 0.000034 | edu.umich |
| 499 | 15800853 | 1378 | 0.000028 | edu.wisc |
| 500 | 15800747 | 140 | 0.000188 | ru.mail |
| 501 | 15800371 | 2358 | 0.000013 | com.starwars |
| 502 | 15797878 | 865 | 0.000036 | de.blogspot |
| 503 | 15797791 | 1508 | 0.000024 | com.kissmetrics |
| 504 | 15797047 | 1115 | 0.000031 | com.beautifulpixels |
| 505 | 15796947 | 1386 | 0.000028 | com.airbnb |
| 506 | 15796853 | 2451 | 0.000013 | edu.hbs |
| 507 | 15796325 | 166 | 0.000145 | com.eepurl |
| 508 | 15795637 | 768 | 0.000038 | com.css-tricks |
| 509 | 15795363 | 233 | 0.000098 | com.bitly |
| 510 | 15794822 | 1615 | 0.000022 | edu.jhu |
| 511 | 15793691 | 1362 | 0.000029 | com.alibaba |
| 512 | 15792654 | 1164 | 0.000031 | com.sun |
| 513 | 15792271 | 772 | 0.000038 | com.tandfonline |
| 514 | 15791593 | 893 | 0.000035 | com.underconsideration |
| 515 | 15790633 | 518 | 0.000048 | in.co.google |
| 516 | 15789087 | 793 | 0.000038 | com.uber |
| 517 | 15788507 | 704 | 0.000042 | com.photoshelter |
| 518 | 15787332 | 566 | 0.000046 | com.symantec |
| 519 | 15787196 | 2936 | 0.000010 | uk.bl |
| 520 | 15786855 | 683 | 0.000043 | gov.hhs |
| 521 | 15783209 | 807 | 0.000037 | io.getmdl |
| 522 | 15782831 | 1691 | 0.000021 | com.irishtimes |
| 523 | 15781874 | 2154 | 0.000015 | edu.ncsu |
| 524 | 15781333 | 1393 | 0.000028 | com.searchenginejournal |
| 525 | 15781203 | 67 | 0.000448 | com.messenger |
| 526 | 15780051 | 517 | 0.000048 | org.sonatype |
| 527 | 15778677 | 979 | 0.000033 | ca.cbc |
| 528 | 15778587 | 1592 | 0.000022 | com.yandex |
| 529 | 15777890 | 431 | 0.000057 | com.clicky |
| 530 | 15777427 | 1768 | 0.000019 | com.hulu |
| 531 | 15776625 | 1443 | 0.000026 | com.accenture |
| 532 | 15774420 | 1610 | 0.000022 | edu.academia |
| 533 | 15773314 | 528 | 0.000047 | gov.epa |
| 534 | 15772833 | 1403 | 0.000028 | com.marketingland |
| 535 | 15772472 | 972 | 0.000034 | uk.co.guardian |
| 536 | 15771759 | 2040 | 0.000016 | tv.periscope |
| 537 | 15769999 | 1616 | 0.000022 | com.today |
| 538 | 15768447 | 2453 | 0.000013 | ly.visual |
| 539 | 15766945 | 369 | 0.000065 | edu.nyu |
| 540 | 15766843 | 1409 | 0.000028 | org.apa |
| 541 | 15766561 | 2894 | 0.000011 | com.girlswhocode |
| 542 | 15766409 | 1546 | 0.000024 | com.hollywoodreporter |
| 543 | 15765504 | 549 | 0.000047 | uk.co.independent |
| 544 | 15764791 | 2378 | 0.000013 | com.glamour |
| 545 | 15764760 | 2131 | 0.000015 | au.com.news |
| 546 | 15764489 | 553 | 0.000046 | gov.ed |
| 547 | 15763374 | 1714 | 0.000020 | com.invisionapp |
| 548 | 15763321 | 2450 | 0.000013 | org.gimp |
| 549 | 15763104 | 709 | 0.000041 | com.feedly |
| 550 | 15763001 | 1322 | 0.000031 | org.change |
| 551 | 15761645 | 2072 | 0.000015 | com.ibtimes |
| 552 | 15761255 | 1598 | 0.000022 | com.thomsonreuters |
| 553 | 15760281 | 1517 | 0.000024 | gov.nyc |
| 554 | 15760200 | 1826 | 0.000018 | com.posterous |
| 555 | 15759019 | 943 | 0.000034 | com.bravesites |
| 556 | 15758123 | 3649 | 0.000008 | com.space |
| 557 | 15758105 | 1434 | 0.000027 | gov.bls |
| 558 | 15756979 | 443 | 0.000056 | cn.com.sina |
| 559 | 15756712 | 397 | 0.000061 | com.custhelp |
| 560 | 15755071 | 2389 | 0.000013 | com.tesla |
| 561 | 15753906 | 1476 | 0.000025 | com.businessweek |
| 562 | 15753434 | 774 | 0.000038 | com.uk |
| 563 | 15753186 | 1782 | 0.000019 | com.zillow |
| 564 | 15752235 | 1814 | 0.000018 | com.zapier |
| 565 | 15751997 | 2583 | 0.000012 | com.dreamstime |
| 566 | 15751546 | 3003 | 0.000010 | com.klout |
| 567 | 15750992 | 1623 | 0.000022 | com.thehill |
| 568 | 15750722 | 234 | 0.000098 | com.wpengine |
| 569 | 15750076 | 2978 | 0.000010 | com.rottentomatoes |
| 570 | 15749795 | 2693 | 0.000012 | com.campaignmonitor |
| 571 | 15749130 | 1629 | 0.000022 | uk.ac.ed |
| 572 | 15748626 | 2246 | 0.000014 | com.wikidot |
| 573 | 15748387 | 2252 | 0.000014 | com.123rf |
| 574 | 15748038 | 217 | 0.000105 | fr.google |
| 575 | 15747873 | 1611 | 0.000022 | com.intuit |
| 576 | 15747479 | 1641 | 0.000021 | org.letsencrypt |
| 577 | 15746782 | 875 | 0.000036 | com.questionpro |
| 578 | 15744807 | 664 | 0.000044 | com.gotowebinar |
| 579 | 15744526 | 1987 | 0.000016 | com.nokia |
| 580 | 15742939 | 2658 | 0.000012 | edu.brown |
| 581 | 15742494 | 3600 | 0.000008 | com.formula1 |
| 582 | 15742364 | 2184 | 0.000014 | com.mentalfloss |
| 583 | 15742342 | 451 | 0.000055 | gov.irs |
| 584 | 15742266 | 491 | 0.000050 | net.openid |
| 585 | 15740664 | 1688 | 0.000021 | com.nba |
| 586 | 15739222 | 1593 | 0.000022 | org.pewresearch |
| 587 | 15738724 | 2222 | 0.000014 | com.aljazeera |
| 588 | 15738356 | 1058 | 0.000032 | com.ezlocal |
| 589 | 15737381 | 1437 | 0.000027 | org.altervista |
| 590 | 15737002 | 1478 | 0.000025 | in.blogspot |
| 591 | 15736320 | 279 | 0.000084 | it.placehold |
| 592 | 15735548 | 3233 | 0.000009 | edu.uic |
| 593 | 15735280 | 2251 | 0.000014 | com.programmableweb |
| 594 | 15735247 | 2153 | 0.000015 | com.cbs |
| 595 | 15734807 | 1153 | 0.000031 | gov.sba |
| 596 | 15734218 | 2198 | 0.000014 | com.techradar |
| 597 | 15734158 | 826 | 0.000037 | gov.census |
| 598 | 15733513 | 1747 | 0.000019 | org.postimg |
| 599 | 15732878 | 506 | 0.000049 | gov.usda |
| 600 | 15732533 | 1535 | 0.000024 | com.target |
| 601 | 15731343 | 721 | 0.000041 | com.docker |
| 602 | 15731122 | 1519 | 0.000024 | com.gigaom |
| 603 | 15731013 | 2800 | 0.000011 | com.oxforddictionaries |
| 604 | 15728172 | 1693 | 0.000021 | net.daum |
| 605 | 15727989 | 962 | 0.000034 | com.gofundme |
| 606 | 15727980 | 1639 | 0.000022 | kr.flic |
| 607 | 15726681 | 1156 | 0.000031 | com.formstack |
| 608 | 15726256 | 763 | 0.000039 | org.sqlite |
| 609 | 15725436 | 1661 | 0.000021 | com.autodesk |
| 610 | 15724812 | 1396 | 0.000028 | com.techrepublic |
| 611 | 15724806 | 817 | 0.000037 | com.patreon |
| 612 | 15721240 | 970 | 0.000034 | com.insiderpages |
| 613 | 15721227 | 1795 | 0.000019 | com.us |
| 614 | 15720198 | 1031 | 0.000032 | com.hotfrog |
| 615 | 15720166 | 966 | 0.000034 | com.whitepages |
| 616 | 15719605 | 1900 | 0.000017 | edu.illinois |
| 617 | 15719578 | 1642 | 0.000021 | com.pwc |
| 618 | 15718460 | 2039 | 0.000016 | edu.asu |
| 619 | 15718391 | 2486 | 0.000013 | com.animoto |
| 620 | 15717320 | 249 | 0.000094 | com.fc2 |
| 621 | 15717052 | 1825 | 0.000018 | org.rubyonrails |
| 622 | 15716625 | 742 | 0.000040 | com.wunderground |
| 623 | 15716037 | 213 | 0.000106 | org.debian |
| 624 | 15715960 | 1124 | 0.000031 | org.cmlibrary |
| 625 | 15715829 | 1062 | 0.000032 | com.idt |
| 626 | 15715705 | 1359 | 0.000029 | com.investopedia |
| 627 | 15715452 | 1856 | 0.000018 | com.howstuffworks |
| 628 | 15714753 | 1326 | 0.000030 | org.redcross |
| 629 | 15714617 | 1493 | 0.000025 | com.indeed |
| 630 | 15713762 | 2101 | 0.000015 | com.lonelyplanet |
| 631 | 15713705 | 2054 | 0.000016 | com.gamespot |
| 632 | 15713431 | 910 | 0.000035 | gov.nps |
| 633 | 15713159 | 1084 | 0.000032 | com.thesprintbook |
| 634 | 15712729 | 1141 | 0.000031 | com.smartguy |
| 635 | 15711951 | 832 | 0.000037 | com.att |
| 636 | 15711660 | 2049 | 0.000016 | com.refinery29 |
| 637 | 15709292 | 522 | 0.000048 | com.vendio |
| 638 | 15709144 | 2851 | 0.000011 | com.domaintools |
| 639 | 15708842 | 874 | 0.000036 | com.itsnicethat |
| 640 | 15707939 | 1801 | 0.000018 | org.filezilla-project |
| 641 | 15707760 | 1395 | 0.000028 | com.vmware |
| 642 | 15707005 | 171 | 0.000139 | it.google |
| 643 | 15706361 | 3994 | 0.000007 | com.boredpanda |
| 644 | 15705364 | 1391 | 0.000028 | gov.va |
| 645 | 15705335 | 849 | 0.000036 | com.pinimg |
| 646 | 15704965 | 1428 | 0.000027 | com.reverbnation |
| 647 | 15704604 | 2016 | 0.000016 | ca.ubc |
| 648 | 15704346 | 1995 | 0.000016 | com.nfl |
| 649 | 15703768 | 666 | 0.000044 | com.houzz |
| 650 | 15703700 | 1516 | 0.000024 | com.prezi |
| 651 | 15703074 | 1912 | 0.000017 | edu.indiana |
| 652 | 15702174 | 3049 | 0.000010 | com.hubpages |
| 653 | 15701522 | 436 | 0.000057 | com.nasdaq |
| 654 | 15701359 | 2734 | 0.000011 | com.9to5mac |
| 655 | 15701239 | 1579 | 0.000023 | com.pcworld |
| 656 | 15700785 | 1824 | 0.000018 | edu.ucdavis |
| 657 | 15700731 | 1416 | 0.000027 | gov.usgs |
| 658 | 15700075 | 886 | 0.000035 | com.500px |
| 659 | 15699652 | 1001 | 0.000033 | com.acninc |
| 660 | 15699448 | 2157 | 0.000015 | com.livestrong |
| 661 | 15699048 | 1328 | 0.000030 | org.oecd |
| 662 | 15698519 | 2267 | 0.000014 | com.newscientist |
| 663 | 15697206 | 1846 | 0.000018 | com.espn |
| 664 | 15697101 | 1484 | 0.000025 | edu.umn |
| 665 | 15697074 | 1703 | 0.000020 | com.freepik |
| 666 | 15696322 | 1902 | 0.000017 | edu.virginia |
| 667 | 15694887 | 1605 | 0.000022 | com.vox |
| 668 | 15694606 | 1858 | 0.000018 | com.deadline |
| 669 | 15693525 | 483 | 0.000051 | org.whatbrowser |
| 670 | 15692991 | 1499 | 0.000025 | com.mixcloud |
| 671 | 15691728 | 847 | 0.000036 | com.emarketer |
| 672 | 15691597 | 1360 | 0.000029 | fr.blogspot |
| 673 | 15691539 | 695 | 0.000042 | com.flippa |
| 674 | 15691203 | 256 | 0.000092 | com.elegantthemes |
| 675 | 15690356 | 1590 | 0.000022 | com.newsweek |
| 676 | 15689675 | 2170 | 0.000015 | com.getresponse |
| 677 | 15688589 | 460 | 0.000054 | io.atom |
| 678 | 15688584 | 1700 | 0.000020 | com.gallup |
| 679 | 15688318 | 2187 | 0.000014 | edu.bu |
| 680 | 15687369 | 2815 | 0.000011 | org.moma |
| 681 | 15686394 | 1888 | 0.000017 | com.findlaw |
| 682 | 15683775 | 1542 | 0.000024 | edu.si |
| 683 | 15683516 | 2094 | 0.000015 | com.pastebin |
| 684 | 15682917 | 1155 | 0.000031 | dk.fcm |
| 685 | 15682640 | 1547 | 0.000024 | com.globo |
| 686 | 15682617 | 368 | 0.000065 | org.openstreetmap |
| 687 | 15681889 | 1142 | 0.000031 | org.writersleague |
| 688 | 15680577 | 1884 | 0.000017 | edu.cuny |
| 689 | 15680551 | 1925 | 0.000017 | com.starbucks |
| 690 | 15680465 | 1447 | 0.000026 | com.warnerbros |
| 691 | 15679238 | 2075 | 0.000015 | com.socialmediatoday |
| 692 | 15678966 | 1150 | 0.000031 | com.prosperent |
| 693 | 15678673 | 1114 | 0.000031 | org.grayarea |
| 694 | 15678448 | 1984 | 0.000016 | org.aclu |
| 695 | 15677879 | 739 | 0.000040 | org.jenkins-ci |
| 696 | 15674592 | 2002 | 0.000016 | com.mercurynews |
| 697 | 15674443 | 1552 | 0.000023 | com.business2community |
| 698 | 15674402 | 1836 | 0.000018 | mp.j |
| 699 | 15674363 | 4368 | 0.000007 | com.petapixel |
| 700 | 15673782 | 2630 | 0.000012 | com.googlepages |
| 701 | 15673492 | 1894 | 0.000017 | com.hostgator |
| 702 | 15673279 | 745 | 0.000039 | com.geocities |
| 703 | 15672825 | 1336 | 0.000030 | org.mayoclinic |
| 704 | 15672261 | 167 | 0.000143 | gov.privacyshield |
| 705 | 15671234 | 1049 | 0.000032 | com.ycombinator |
| 706 | 15670571 | 1496 | 0.000025 | net.java |
| 707 | 15670017 | 1463 | 0.000026 | us.imageshack |
| 708 | 15669905 | 2432 | 0.000013 | com.psychcentral |
| 709 | 15669061 | 1624 | 0.000022 | com.boston |
| 710 | 15667800 | 1509 | 0.000024 | org.fao |
| 711 | 15666438 | 1980 | 0.000016 | edu.arizona |
| 712 | 15665639 | 1581 | 0.000023 | com.nydailynews |
| 713 | 15665192 | 1832 | 0.000018 | de.welt |
| 714 | 15665099 | 238 | 0.000098 | com.youku |
| 715 | 15664693 | 1915 | 0.000017 | com.salon |
| 716 | 15664554 | 2365 | 0.000013 | edu.gmu |
| 717 | 15663687 | 1017 | 0.000032 | com.aweber |
| 718 | 15663661 | 242 | 0.000097 | jp.co.amazon |
| 719 | 15663656 | 2299 | 0.000014 | com.yourdomain |
| 720 | 15662150 | 2021 | 0.000016 | com.domain |
| 721 | 15661511 | 2285 | 0.000014 | com.ew |
| 722 | 15659706 | 1149 | 0.000031 | com.collegian |
| 723 | 15659043 | 796 | 0.000038 | org.elasticsearch |
| 724 | 15658487 | 1380 | 0.000028 | com.mlb |
| 725 | 15658455 | 899 | 0.000035 | com.delicious |
| 726 | 15658257 | 2239 | 0.000014 | ca.ualberta |
| 727 | 15657830 | 3265 | 0.000009 | org.edx |
| 728 | 15655920 | 988 | 0.000033 | google.design |
| 729 | 15655666 | 2776 | 0.000011 | org.kiva |
| 730 | 15654526 | 1410 | 0.000028 | com.weather |
| 731 | 15654299 | 1837 | 0.000018 | net.codecanyon |
| 732 | 15654288 | 2743 | 0.000011 | com.lynda |
| 733 | 15654205 | 1503 | 0.000024 | com.merriam-webster |
| 734 | 15654147 | 1042 | 0.000032 | com.womentechmakers |
| 735 | 15654091 | 1065 | 0.000032 | net.brownbook |
| 736 | 15653789 | 986 | 0.000033 | com.hootsuite |
| 737 | 15653450 | 3979 | 0.000007 | com.lmgtfy |
| 738 | 15652070 | 426 | 0.000058 | com.ea |
| 739 | 15651918 | 1779 | 0.000019 | edu.umd |
| 740 | 15651858 | 1497 | 0.000025 | com.thedrum |
| 741 | 15650961 | 1735 | 0.000020 | com.aliexpress |
| 742 | 15650884 | 204 | 0.000109 | com.automattic |
| 743 | 15650243 | 1514 | 0.000024 | int.coe |
| 744 | 15650019 | 2247 | 0.000014 | org.openoffice |
| 745 | 15649988 | 1658 | 0.000021 | com.firefox |
| 746 | 15649772 | 1599 | 0.000022 | com.searchenginewatch |
| 747 | 15649620 | 1853 | 0.000018 | com.zazzle |
| 748 | 15648960 | 2027 | 0.000016 | com.gq |
| 749 | 15648656 | 1574 | 0.000023 | org.cambridge |
| 750 | 15648651 | 1904 | 0.000017 | edu.msu |
| 751 | 15647741 | 444 | 0.000056 | com.barnesandnoble |
| 752 | 15647388 | 2149 | 0.000015 | com.azcentral |
| 753 | 15647181 | 2429 | 0.000013 | edu.wustl |
| 754 | 15647110 | 2554 | 0.000012 | org.semanticscholar |
| 755 | 15646960 | 1887 | 0.000017 | edu.umass |
| 756 | 15646650 | 1555 | 0.000023 | fm.last |
| 757 | 15646167 | 2060 | 0.000016 | au.com.blogspot |
| 758 | 15645607 | 1191 | 0.000031 | site.tenerifeforum |
| 759 | 15645549 | 3140 | 0.000010 | com.copyblogger |
| 760 | 15645390 | 1102 | 0.000031 | uk.gov.peterborough |
| 761 | 15644870 | 2289 | 0.000014 | com.topsy |
| 762 | 15644590 | 897 | 0.000035 | com.unity3d |
| 763 | 15644472 | 1628 | 0.000022 | com.over-blog |
| 764 | 15643911 | 1501 | 0.000025 | com.waze |
| 765 | 15642164 | 2320 | 0.000014 | com.gawker |
| 766 | 15642103 | 2466 | 0.000013 | ms.1drv |
| 767 | 15641828 | 1370 | 0.000029 | com.timeanddate |
| 768 | 15641339 | 3477 | 0.000009 | com.answers |
| 769 | 15641169 | 1325 | 0.000030 | com.arcgis |
| 770 | 15640859 | 794 | 0.000038 | com.clkmg |
| 771 | 15639069 | 1080 | 0.000032 | com.cbslocal |
| 772 | 15638930 | 2576 | 0.000012 | org.phys |
| 773 | 15638348 | 1695 | 0.000021 | com.stitcher |
| 774 | 15637681 | 1663 | 0.000021 | com.gumroad |
| 775 | 15637217 | 1366 | 0.000029 | gov.fbi |
| 776 | 15637050 | 2334 | 0.000013 | com.fiverr |
| 777 | 15636227 | 1800 | 0.000019 | com.lulu |
| 778 | 15635676 | 1671 | 0.000021 | com.rollingstone |
| 779 | 15635417 | 1880 | 0.000017 | com.nvidia |
| 780 | 15635094 | 2702 | 0.000012 | com.headspace |
| 781 | 15634767 | 341 | 0.000070 | org.opensource |
| 782 | 15634400 | 1684 | 0.000021 | com.neilpatel |
| 783 | 15633999 | 1771 | 0.000019 | uk.co.metro |
| 784 | 15633208 | 990 | 0.000033 | jp.ac.kobe-u |
| 785 | 15632996 | 1741 | 0.000020 | com.mtv |
| 786 | 15632518 | 56 | 0.000499 | net.facebook |
| 787 | 15632516 | 2649 | 0.000012 | edu.tufts |
| 788 | 15632419 | 915 | 0.000035 | br.com.uol |
| 789 | 15631894 | 2562 | 0.000012 | com.fox |
| 790 | 15631850 | 1057 | 0.000032 | com.brightcove |
| 791 | 15631313 | 1505 | 0.000024 | com.sky |
| 792 | 15630776 | 2933 | 0.000010 | com.popsci |
| 793 | 15630301 | 3389 | 0.000009 | com.wolfram |
| 794 | 15627572 | 2434 | 0.000013 | com.theonion |
| 795 | 15627567 | 1348 | 0.000029 | org.readthedocs |
| 796 | 15627335 | 1927 | 0.000017 | com.trendmicro |
| 797 | 15626654 | 388 | 0.000062 | com.marriott |
| 798 | 15626569 | 342 | 0.000070 | nl.google |
| 799 | 15626305 | 2721 | 0.000011 | edu.caltech |
| 800 | 15625896 | 1060 | 0.000032 | com.2findlocal |
| 801 | 15625270 | 1472 | 0.000025 | uk.co.theregister |
| 802 | 15625157 | 891 | 0.000035 | uk.co.eventbrite |
| 803 | 15625156 | 1121 | 0.000031 | com.fotolia |
| 804 | 15624596 | 1849 | 0.000018 | com.history |
| 805 | 15624274 | 306 | 0.000077 | com.naver |
| 806 | 15623770 | 2985 | 0.000010 | edu.dartmouth |
| 807 | 15623624 | 1606 | 0.000022 | com.bmj |
| 808 | 15623028 | 2515 | 0.000012 | ch.cern |
| 809 | 15622919 | 1914 | 0.000017 | it.scoop |
| 810 | 15621936 | 1357 | 0.000029 | com.walmart |
| 811 | 15621746 | 1930 | 0.000017 | org.kde |
| 812 | 15621344 | 1898 | 0.000017 | com.nrf |
| 813 | 15619330 | 1649 | 0.000021 | im.gitter |
| 814 | 15619286 | 2379 | 0.000013 | com.bestbuy |
| 815 | 15619283 | 473 | 0.000052 | com.iconfinder |
| 816 | 15618356 | 1866 | 0.000018 | org.jstor |
| 817 | 15618109 | 1377 | 0.000028 | com.searchengineland |
| 818 | 15616272 | 184 | 0.000128 | jp.ne.hatena |
| 819 | 15615800 | 1543 | 0.000024 | com.splashthat |
| 820 | 15614563 | 3110 | 0.000010 | org.notepad-plus-plus |
| 821 | 15614110 | 1627 | 0.000022 | com.com |
| 822 | 15613738 | 1529 | 0.000024 | org.heart |
| 823 | 15612896 | 2529 | 0.000012 | edu.uiuc |
| 824 | 15612666 | 2730 | 0.000011 | com.fitbit |
| 825 | 15611859 | 1026 | 0.000032 | com.company |
| 826 | 15610954 | 2489 | 0.000012 | com.wikispaces |
| 827 | 15610875 | 1541 | 0.000024 | com.cafepress |
| 828 | 15610542 | 1738 | 0.000020 | com.ssllabs |
| 829 | 15610139 | 2352 | 0.000013 | de.bild |
| 830 | 15608795 | 69 | 0.000447 | com.parallels |
| 831 | 15608630 | 917 | 0.000035 | gov.usa |
| 832 | 15608624 | 1806 | 0.000018 | com.buffer |
| 833 | 15608543 | 1966 | 0.000016 | com.discordapp |
| 834 | 15607778 | 1206 | 0.000031 | com.infusionsoft |
| 835 | 15607523 | 2031 | 0.000016 | edu.uci |
| 836 | 15607224 | 838 | 0.000036 | org.openweathermap |
| 837 | 15606632 | 3159 | 0.000010 | gd.is |
| 838 | 15605502 | 182 | 0.000129 | jp.ameblo |
| 839 | 15604837 | 900 | 0.000035 | com.cdbaby |
| 840 | 15604598 | 1000 | 0.000033 | com.newsbank |
| 841 | 15604393 | 1815 | 0.000018 | com.deezer |
| 842 | 15603804 | 1822 | 0.000018 | com.discovery |
| 843 | 15602784 | 765 | 0.000038 | org.doxygen |
| 844 | 15602226 | 1030 | 0.000032 | org.travelblog |
| 845 | 15602213 | 1034 | 0.000032 | org.tpr |
| 846 | 15601034 | 428 | 0.000058 | net.launchpad |
| 847 | 15600189 | 777 | 0.000038 | com.sagepub |
| 848 | 15598493 | 1059 | 0.000032 | com.chamberofcommerce |
| 849 | 15598065 | 510 | 0.000049 | com.cracked |
| 850 | 15597590 | 749 | 0.000039 | org.plos |
| 851 | 15597251 | 4943 | 0.000006 | com.checkpoint |
| 852 | 15597031 | 1936 | 0.000017 | uk.co.thesun |
| 853 | 15597000 | 99 | 0.000302 | com.namecheap |
| 854 | 15596654 | 3162 | 0.000009 | com.spreaker |
| 855 | 15596199 | 1533 | 0.000024 | com.xkcd |
| 856 | 15593759 | 1331 | 0.000030 | com.tableau |
| 857 | 15593642 | 1488 | 0.000025 | com.pcmag |
| 858 | 15593497 | 1934 | 0.000017 | edu.ufl |
| 859 | 15591634 | 3453 | 0.000009 | edu.buffalo |
| 860 | 15591344 | 2771 | 0.000011 | com.producthunt |
| 861 | 15591279 | 3424 | 0.000009 | org.lifehack |
| 862 | 15591113 | 1977 | 0.000016 | com.examiner |
| 863 | 15591022 | 1073 | 0.000032 | net.azurewebsites |
| 864 | 15590091 | 2360 | 0.000013 | com.bleacherreport |
| 865 | 15589566 | 1016 | 0.000033 | com.bizcommunity |
| 866 | 15589420 | 996 | 0.000033 | com.chambermaster |
| 867 | 15589294 | 1147 | 0.000031 | com.oup |
| 868 | 15589126 | 1889 | 0.000017 | com.thedailybeast |
| 869 | 15588805 | 2640 | 0.000012 | com.snopes |
| 870 | 15588137 | 2084 | 0.000015 | com.ign |
| 871 | 15588059 | 3592 | 0.000008 | com.appleinsider |
| 872 | 15587571 | 1096 | 0.000031 | com.lookuppage |
| 873 | 15587541 | 2059 | 0.000016 | com.mac |
| 874 | 15587446 | 746 | 0.000039 | com.usnews |
| 875 | 15586928 | 669 | 0.000043 | com.163 |
| 876 | 15586259 | 2966 | 0.000010 | org.greenpeace |
| 877 | 15586114 | 3672 | 0.000008 | edu.temple |
| 878 | 15586108 | 919 | 0.000035 | com.tiddlywiki |
| 879 | 15585180 | 1993 | 0.000016 | de.zeit |
| 880 | 15584327 | 1660 | 0.000021 | com.strikingly |
| 881 | 15584124 | 1854 | 0.000018 | co.angel |
| 882 | 15583419 | 2237 | 0.000014 | com.yolasite |
| 883 | 15583266 | 541 | 0.000047 | com.1and1 |
| 884 | 15583228 | 1650 | 0.000021 | com.windows |
| 885 | 15583160 | 2454 | 0.000013 | net.comcast |
| 886 | 15582183 | 4541 | 0.000007 | com.blog |
| 887 | 15581911 | 1350 | 0.000029 | com.shareasale |
8881558158411030.000031com.spoke
8891558071226760.000012com.macrumors
8901558023521060.000015com.si
8911558005529470.000010com.avast
8921557973111040.000031com.communitywalk
8931557953410390.000032com.independent
8941557948418280.000018it.blogspot
8951557888222350.000014com.icloud
8961557881322270.000014ca.sfu
8971557833117590.000019edu.duke
8981557813414060.000028gov.ny
8991557805830480.000010edu.ucsc
9001557748217810.000019com.lithium
9011557744730810.000010com.marieclaire
902155772478900.000035com.mariadb
9031557696232020.000009com.brainyquote
9041557676825980.000012ca.globalnews
9051557603428950.000011edu.oregonstate
906155759677380.000040es.com.blogspot
907155756076810.000043fr.amazon
9081557504025700.000012com.nintendo
909155749891530.000166de.bund
9101557462920780.000015com.popsugar
9111557404011160.000031com.lacartes
9121557364119290.000017com.angelfire
9131557358820670.000015org.poynter
9141557357310710.000032com.citysquares
9151557341722020.000014com.movember
9161557326618820.000017uk.ac.lse
9171557301710450.000032com.thegreatdiscontent
9181557291720980.000015org.wpmudev
9191557216425220.000012com.fineartamerica
9201557208324930.000012edu.vt
9211557196528330.000011edu.hawaii
9221557173021710.000015com.teenvogue
9231557164115310.000024com.calendly
9241557163515580.000023com.steamcommunity
9251557142930260.000010org.thinkprogress
9261557142514350.000027com.techtarget
9271557124820450.000016com.blogtalkradio
928155711886870.000042uk.co.tripadvisor
9291557110915260.000024com.glassdoor
9301557051215440.000024com.xbox
9311557031213810.000028me.m
9321556988522180.000014uk.co.express
9331556928816750.000021uk.co.mirror
934155685031190.000232info.aboutads
9351556839325080.000012com.blogs
9361556802721150.000015com.templatemonster
937155675783280.000072com.netdna-ssl
9381556728913710.000029gov.dol
9391556723015540.000023org.unicef
9401556718811350.000031com.netdna-cdn
941155663904110.000059com.mapbox
9421556606110880.000032com.americantowns
9431556593024560.000013org.7-zip
9441556529038520.000008com.thenation
945155647728530.000036ca.amazon
9461556462448170.000006com.depositphotos
9471556455526870.000012edu.pitt
9481556453826030.000012nl.uva
9491556428421110.000015sg.com.google
9501556423610200.000032com.galvanize
9511556376310430.000032com.judysbook
9521556333010860.000032org.twinery
9531556302717560.000019com.timeout
9541556278716970.000020com.mediafire
9551556170920860.000015com.w3techs
9561556167613730.000029com.ups
957155612919450.000034gov.house
958155609948550.000036io.pantheon
9591556047723950.000013com.me
9601555916932070.000009cc.tiny
9611555909819580.000016com.apnews
9621555788332480.000009org.code
963155577275370.000047com.getpocket
964155575276720.000043com.elsevier
965155568487290.000040com.prestashop
9661555683323880.000013com.homedepot
9671555679814460.000026com.bufferapp
9681555676734160.000009com.virustotal
9691555655116940.000021com.outbrain
9701555603650220.000006com.wechat
9711555572223400.000013com.pandora
9721555571323010.000014com.foxmovies
9731555561641400.000007com.kpcb
9741555548528600.000011com.lanyrd
975155553147240.000041com.redbubble
9761555392432620.000009org.catalyst
9771555330720960.000015tech.ces
9781555328415910.000022gov.wa
9791555316014770.000025jp.blogspot
9801555294821100.000015com.twilio
981155527384770.000052mp.mailchi
9821555267135810.000008com.biography
9831555206519320.000017com.healthline
9841555157610850.000032com.pacegallery
9851555154130020.000010com.iconosquare
9861555083620000.000016com.baltimoresun
9871555016628030.000011com.imageshack
9881554994319450.000017gov.uscourts
9891554966328750.000011int.esa
9901554914727310.000011com.virgin
9911554839245880.000006com.diigo
9921554839018100.000018com.people
9931554838614730.000025se.haxx
9941554775917060.000020com.visualstudio
9951554773730640.000010com.freelancer
9961554764023070.000014com.xerox
997155475055120.000049com.myportfolio
9981554743813640.000029es.amazon
9991554642433560.000009com.complex
1000155459774690.000053br.com.google
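The ranking uses the same reversed notation as the graphs, so com.stitcher stands for stitcher.com. The Python sketch below parses one row and restores the usual domain order. It assumes each row splits on whitespace into five columns in the order used in this ranking: harmonic centrality rank, harmonic centrality value, PageRank rank, PageRank value, reversed domain. RankRow, parse_row and unreverse are hypothetical helpers. The published ranks file may use a different column layout, so check its header before reusing this.

```python
from typing import NamedTuple


class RankRow(NamedTuple):
    """One row of the domain ranking (column order assumed, see lead-in)."""
    hc_rank: int       # position when sorted by harmonic centrality
    hc_value: float    # harmonic centrality value (whole numbers in the table)
    pr_rank: int       # position when sorted by PageRank
    pr_value: float    # PageRank value
    domain_rev: str    # reversed domain name, e.g. "com.stitcher"


def unreverse(name_rev: str) -> str:
    """Reverse the dot-separated labels: 'br.com.google' -> 'google.com.br'.

    This is a full inverse for pay-level domains. For host names it cannot
    restore a leading 'www.', because that prefix was stripped.
    """
    return ".".join(reversed(name_rev.split(".")))


def parse_row(line: str) -> RankRow:
    """Parse 'hc_rank hc_value pr_rank pr_value domain_rev' (whitespace-separated)."""
    hc_rank, hc_value, pr_rank, pr_value, domain_rev = line.split()
    return RankRow(int(hc_rank), float(hc_value), int(pr_rank), float(pr_value), domain_rev)


row = parse_row("1000  15545977  469   0.000053  br.com.google")
print(row.hc_rank, row.pr_rank, unreverse(row.domain_rev))  # 1000 469 google.com.br
```

Putting both rank columns into one record makes it easy to compare them. For example, net.facebook sits far lower by harmonic centrality than by PageRank.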

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.
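WebGraph computed the actual values in the ranking over the full graph. To show what the two measures express, the sketch below computes both exactly on a made-up four-node graph. Harmonic centrality follows the usual definition, H(x) = sum over y != x of 1/d(y, x). Here d(y, x) is the shortest-path length along links from y to x, found by breadth-first search over the transposed graph. PageRank is a plain power iteration. Its parameters are common textbook choices assumed here, not necessarily those used for this release: damping 0.85, rank from dangling nodes spread uniformly, and an L1 convergence tolerance.

```python
from collections import deque


def harmonic_centrality(edges: list[tuple[int, int]], n: int) -> list[float]:
    """Exact harmonic centrality: H(x) = sum over y != x of 1 / d(y, x).

    d(y, x) is the shortest directed path length from y to x. Nodes that
    cannot reach x contribute nothing. Distances *to* x come from a BFS
    from x that follows links backwards (i.e. over the transpose).
    """
    inlinks: list[list[int]] = [[] for _ in range(n)]
    for u, v in edges:
        inlinks[v].append(u)

    scores = []
    for x in range(n):
        dist = {x: 0}
        queue = deque([x])
        while queue:
            node = queue.popleft()
            for pred in inlinks[node]:
                if pred not in dist:
                    dist[pred] = dist[node] + 1
                    queue.append(pred)
        scores.append(sum(1.0 / d for y, d in dist.items() if y != x))
    return scores


def pagerank(edges: list[tuple[int, int]], n: int, damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 1000) -> list[float]:
    """Power-iteration PageRank (textbook variant; parameters are assumptions)."""
    outdeg = [0] * n
    for u, _ in edges:
        outdeg[u] += 1

    ranks = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by dangling nodes (no outlinks) is spread over all nodes.
        dangling = sum(ranks[u] for u in range(n) if outdeg[u] == 0)
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for u, v in edges:
            new[v] += damping * ranks[u] / outdeg[u]
        delta = sum(abs(a - b) for a, b in zip(new, ranks))
        ranks = new
        if delta < tol:
            break
    return ranks


# Hypothetical toy graph: 0 -> 1, 1 -> 2, 2 -> 0, 3 -> 2
toy_edges = [(0, 1), (1, 2), (2, 0), (3, 2)]
print([round(s, 3) for s in harmonic_centrality(toy_edges, 4)])  # [2.0, 1.833, 2.5, 0.0]
print(pagerank(toy_edges, 4))
```

Node 2 is linked directly from two nodes and scores highest on both measures. Node 3 has no inlinks, so it gets a harmonic centrality of 0 and only the baseline PageRank (1 - 0.85) / 4 = 0.0375. Running one BFS per node is only practical for toy graphs, which is why computing these ranks for hundreds of millions of nodes depends on a framework like WebGraph.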

We hope the data will be useful for any kind of research on ranking, graph analysis, link spam detection, and more. Let us know about your results via Common Crawl's Google Group!