We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls published in November and December 2017 and January 2018. These graphs, along with ranked lists of hosts and domains, follow the prior web graph releases (Feb/Mar/Apr 2017, May/Jun/Jul 2017 and Aug/Sep/Oct 2017). Additional information about data formats, the processing pipeline, our objectives, and credits can be found in the preceding announcements.

What’s new?

Here is a summary of notable aspects and changes of this web graph release:

  • the host graph has shrunk significantly in size compared to the preceding release
  • a bug has been fixed that caused protocol-relative links pointing to a different host (e.g. //www.example.com/index.html) not to be added as edges of the host- and domain-level webgraphs
  • the domain graph now contains the number of hosts per domain as an additional column in the vertices and rankings files
  • the naming scheme has changed – the release name is now part of the file name
  • webgraph offset files are no longer released; they can be created by running:
    java it.unimi.dsi.webgraph.BVGraph -O -L cc-main-2017-18-nov-dec-jan-host
    java it.unimi.dsi.webgraph.BVGraph -O -L cc-main-2017-18-nov-dec-jan-domain
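To see why a link like //www.example.com/index.html crosses hosts: a protocol-relative reference inherits only the scheme from the page it appears on, while the host comes from the reference itself. The following is a minimal sketch using Python's standard `urllib.parse`; the helper names `link_target_host` and `is_cross_host` are hypothetical, and this is not the extraction code used to build the graphs.

```python
from urllib.parse import urljoin, urlsplit


def link_target_host(page_url: str, href: str) -> str:
    """Resolve href against the page it was found on and return the target host."""
    # urljoin follows RFC 3986 reference resolution: a protocol-relative
    # reference ("//host/path") takes only the scheme from page_url.
    absolute = urljoin(page_url, href)
    return urlsplit(absolute).hostname or ""


def is_cross_host(page_url: str, href: str) -> bool:
    """True if the link points to a host other than the page's own host."""
    return link_target_host(page_url, href) != urlsplit(page_url).hostname


if __name__ == "__main__":
    page = "https://blog.example.org/posts/1.html"
    print(urljoin(page, "//www.example.com/index.html"))  # https://www.example.com/index.html
    print(is_cross_host(page, "//www.example.com/index.html"))  # True
    print(is_cross_host(page, "/about.html"))  # False
```

A link extractor that treated every reference without a scheme as same-host would miss exactly the first case, which becomes a host-to-host edge once resolved.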

Host-level graph

The graph consists of 775 million nodes and 2.7 billion edges. It includes dangling nodes, i.e. hosts that have not been crawled yet but are pointed to by a link on a crawled page. There are 719 million dangling nodes (93%). Host names are reversed and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
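The host name transformation described above can be sketched in a few lines. This is an illustration, not the release's code; lower-casing the input is our own assumption, and the function names are hypothetical.

```python
def reverse_host(host: str) -> str:
    """Reverse a host name as in the vertices files, dropping a leading "www.".

    www.subdomain.example.com -> com.example.subdomain
    """
    host = host.lower()  # assumption: not stated in the release notes
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def unreverse_host(rev_host: str) -> str:
    """Turn a reversed name back into normal order (a stripped "www." is not restored)."""
    return ".".join(reversed(rev_host.split(".")))


if __name__ == "__main__":
    print(reverse_host("www.subdomain.example.com"))  # com.example.subdomain
    print(unreverse_host("com.example.subdomain"))    # subdomain.example.com
```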

You can download the graph and the ranks of all 775 million hosts from AWS S3 under the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2017-18-nov-dec-jan/host/. Alternatively, you can use https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2017-18-nov-dec-jan/host/ as a prefix to access the files over HTTPS from anywhere.
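Because file names follow the pattern release name + level + suffix, download locations can be assembled programmatically. A minimal sketch using only the Python standard library; `graph_file_url` and `head_lines` are hypothetical helpers, `head_lines` needs network access, and it assumes the text files are UTF-8. Streaming just the first lines lets you inspect a multi-gigabyte file without downloading all of it.

```python
import gzip
import urllib.request

S3_PREFIX = "s3://commoncrawl/projects/hyperlinkgraph/"
HTTPS_PREFIX = "https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/"


def graph_file_url(release: str, level: str, filename: str, https: bool = True) -> str:
    """Location of one webgraph file; level is "host" or "domain"."""
    if level not in ("host", "domain"):
        raise ValueError(f"unknown graph level: {level!r}")
    prefix = HTTPS_PREFIX if https else S3_PREFIX
    return f"{prefix}{release}/{level}/{filename}"


def head_lines(url: str, n: int = 5) -> list:
    """Stream and decompress only the first n lines of a remote .txt.gz file."""
    with urllib.request.urlopen(url) as resp:
        # gzip.open accepts an open binary file object, including an HTTP response
        with gzip.open(resp, "rt", encoding="utf-8") as text:
            return [line.rstrip("\n") for _, line in zip(range(n), text)]


if __name__ == "__main__":
    release = "cc-main-2017-18-nov-dec-jan"
    url = graph_file_url(release, "host", f"{release}-host-vertices.txt.gz")
    print(url)
    # print(head_lines(url))  # requires network access
```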

The following files and formats are provided:

Download files of the Common Crawl Nov/Dec/Jan 2017-18 host-level webgraph

Size      File                                              Description
4.84 GB   cc-main-2017-18-nov-dec-jan-host-vertices.txt.gz  nodes ⟨id, rev host⟩
10.21 GB  cc-main-2017-18-nov-dec-jan-host-edges.txt.gz     edges ⟨from_id, to_id⟩
4.90 GB   cc-main-2017-18-nov-dec-jan-host.graph            graph in BVGraph format
2 kB      cc-main-2017-18-nov-dec-jan-host.properties
5.94 GB   cc-main-2017-18-nov-dec-jan-host-t.graph          transpose of the graph (outlinks mapped to inlinks)
2 kB      cc-main-2017-18-nov-dec-jan-host-t.properties
1 kB      cc-main-2017-18-nov-dec-jan-host.stats            WebGraph statistics
10.79 GB  cc-main-2017-18-nov-dec-jan-host-ranks.txt.gz     harmonic centrality and pagerank
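For small samples, the text files can be read directly. A minimal sketch, assuming whitespace-separated columns (e.g. tab) in the ⟨id, rev host⟩ and ⟨from_id, to_id⟩ layouts listed above; the domain vertices file has an extra "num hosts" column that `parse_vertices` ignores. Building the transpose is just indexing the edges by target, which is what the -t files store in BVGraph form. For the full 775-million-node graph, Python dictionaries are impractical; use the BVGraph files with the WebGraph framework instead.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_vertices(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (id, reversed name) from lines of a *-vertices.txt.gz file."""
    for line in lines:
        fields = line.split()  # assumption: whitespace (e.g. tab) separated
        if len(fields) >= 2:
            yield int(fields[0]), fields[1]


def parse_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (from_id, to_id) from lines of a *-edges.txt.gz file."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield int(fields[0]), int(fields[1])


def build_adjacency(edges: Iterable[Tuple[int, int]]):
    """Return (outlinks, inlinks); inlinks is the transposed graph."""
    outlinks: Dict[int, List[int]] = defaultdict(list)
    inlinks: Dict[int, List[int]] = defaultdict(list)
    for src, dst in edges:
        outlinks[src].append(dst)
        inlinks[dst].append(src)
    return outlinks, inlinks


def read_gz_lines(path: str) -> Iterator[str]:
    """Iterate over the lines of a local gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from f


if __name__ == "__main__":
    names = dict(parse_vertices(["0\tcom.example", "1\torg.example.blog"]))
    out, inn = build_adjacency(parse_edges(["0\t1", "1\t0"]))
    print(names, dict(out), dict(inn))
```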

Domain-level graph

The domain graph was built by aggregating the host graph at the level of pay-level domains (PLDs). The extraction of PLDs is based on the public suffix list from publicsuffix.org. Only “ICANN” domains are accepted; “private” domains are not (cf. the section “divisions” in the documentation on publicsuffix.org). For example, foo.blogspot.com and commoncrawl.s3.amazonaws.com are not accepted as pay-level domains; they are aggregated to the domains blogspot.com and amazonaws.com, respectively, and stored in reversed form as com.blogspot and com.amazonaws.
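The aggregation rule can be illustrated with a toy suffix set. This sketch assumes a tiny, hypothetical excerpt of the ICANN division (the real list has thousands of rules, including wildcards and exceptions, which are not handled here); private entries like blogspot.com are deliberately absent, which is why foo.blogspot.com maps to blogspot.com. Counting hosts per resulting domain is one plausible way to obtain a "num hosts" value, not necessarily how the release computes it.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical stand-in for the ICANN section of the public suffix list.
# Private-division entries (blogspot.com, s3.amazonaws.com, ...) are omitted
# on purpose, mirroring the "ICANN domains only" rule.
ICANN_SUFFIXES = {"com", "org", "net", "uk", "co.uk", "ac.uk", "jp", "co.jp"}


def pay_level_domain(rev_host: str) -> Optional[str]:
    """Map a reversed host name to its reversed pay-level domain.

    The PLD is the longest matching public suffix plus one more label.
    Returns None if no suffix matches or the name is itself a suffix.
    """
    labels = rev_host.split(".")  # e.g. ["uk", "co", "bbc", "news"]
    suffix_len = 0
    for i in range(1, len(labels) + 1):
        # labels[:i] put back into normal order, e.g. "co.uk"
        if ".".join(reversed(labels[:i])) in ICANN_SUFFIXES:
            suffix_len = i
    if suffix_len == 0 or suffix_len == len(labels):
        return None
    return ".".join(labels[: suffix_len + 1])


def hosts_per_domain(rev_hosts: Iterable[str]) -> Counter:
    """Count reversed host names per reversed pay-level domain."""
    counts: Counter = Counter()
    for rev_host in rev_hosts:
        domain = pay_level_domain(rev_host)
        if domain is not None:
            counts[domain] += 1
    return counts


if __name__ == "__main__":
    print(pay_level_domain("com.blogspot.foo"))              # com.blogspot
    print(pay_level_domain("com.amazonaws.s3.commoncrawl"))  # com.amazonaws
    print(pay_level_domain("uk.co.bbc.news"))                # uk.co.bbc
```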

The domain-level graph has 70 million nodes and 835 million edges. 42 million nodes (60%) are dangling nodes, and the largest strongly connected component covers 22 million nodes (31%).
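Both statistics can be reproduced on a toy edge list. This sketch uses the graph-theoretic definition of a dangling node (no outgoing edges), which may not match the release's "not yet crawled" definition exactly, and Kosaraju's algorithm for strongly connected components; it is not necessarily how the published figures were computed. The DFS is iterative to stay clear of Python's recursion limit.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def dangling_nodes(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> List[int]:
    """Nodes without any outgoing edge."""
    has_outlink = [False] * num_nodes
    for src, _ in edges:
        has_outlink[src] = True
    return [v for v in range(num_nodes) if not has_outlink[v]]


def largest_scc_size(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> int:
    """Size of the largest strongly connected component (Kosaraju's algorithm)."""
    outlinks, inlinks = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        outlinks[src].append(dst)
        inlinks[dst].append(src)

    # Pass 1: iterative DFS on the graph, recording nodes in order of completion.
    visited = [False] * num_nodes
    finish_order = []
    for root in range(num_nodes):
        if visited[root]:
            continue
        visited[root] = True
        stack = [(root, iter(outlinks[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, iter(outlinks[nxt])))
                    break
            else:  # all neighbours explored: node is finished
                stack.pop()
                finish_order.append(node)

    # Pass 2: DFS on the transposed graph in reverse finishing order;
    # each tree found this way is exactly one strongly connected component.
    assigned = [False] * num_nodes
    largest = 0
    for root in reversed(finish_order):
        if assigned[root]:
            continue
        assigned[root] = True
        size, stack = 0, [root]
        while stack:
            node = stack.pop()
            size += 1
            for prev in inlinks[node]:
                if not assigned[prev]:
                    assigned[prev] = True
                    stack.append(prev)
        largest = max(largest, size)
    return largest


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3), (4, 5)]
    print(dangling_nodes(6, edges))    # [5]
    print(largest_scc_size(6, edges))  # 3
```

Pass the edges as a list (not a one-shot generator) so both functions can iterate over them.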

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2017-18-nov-dec-jan/domain/ or, via HTTPS, under https://commoncrawl.s3.amazonaws.com/projects/hyperlinkgraph/cc-main-2017-18-nov-dec-jan/domain/.

Download files of the Common Crawl Nov/Dec/Jan 2017-18 domain-level webgraph

Size      File                                                Description
0.49 GB   cc-main-2017-18-nov-dec-jan-domain-vertices.txt.gz  nodes ⟨id, rev domain, num hosts⟩
3.30 GB   cc-main-2017-18-nov-dec-jan-domain-edges.txt.gz     edges ⟨from_id, to_id⟩
1.80 GB   cc-main-2017-18-nov-dec-jan-domain.graph            graph in BVGraph format
2 kB      cc-main-2017-18-nov-dec-jan-domain.properties
1.89 GB   cc-main-2017-18-nov-dec-jan-domain-t.graph          transpose of the graph
2 kB      cc-main-2017-18-nov-dec-jan-domain-t.properties
1 kB      cc-main-2017-18-nov-dec-jan-domain.stats            WebGraph statistics
1.46 GB   cc-main-2017-18-nov-dec-jan-domain-ranks.txt.gz     harmonic centrality and pagerank
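Harmonic centrality scores a node x as H(x) = Σ_{y≠x} 1/d(y, x), where d(y, x) is the shortest-path distance from y to x and unreachable nodes contribute 0. The exact sketch below runs one breadth-first search per node over the inlinks. That is fine for toy graphs but infeasible for 70 million nodes, where approximate algorithms such as HyperBall (part of the WebGraph framework) are the usual tool. We do not claim this is how the released values were computed.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Tuple


def harmonic_centrality(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[float]:
    """Exact harmonic centrality, H(x) = sum over y != x of 1 / d(y, x)."""
    inlinks = defaultdict(list)
    for src, dst in edges:
        inlinks[dst].append(src)
    scores = []
    for x in range(num_nodes):
        # BFS over inlinks (the transpose) yields d(y, x) for every y that reaches x.
        dist = {x: 0}
        queue = deque([x])
        total = 0.0
        while queue:
            v = queue.popleft()
            for u in inlinks[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    total += 1.0 / dist[u]
                    queue.append(u)
        scores.append(total)
    return scores


def rank_by(scores: List[float]) -> List[int]:
    """Node ids from highest to lowest score (ties keep id order)."""
    return sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)


if __name__ == "__main__":
    scores = harmonic_centrality(4, [(0, 2), (1, 2), (2, 3)])
    print(scores)           # [0.0, 0.0, 2.0, 2.0]
    print(rank_by(scores))  # [2, 3, 0, 1]
```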

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 70 million domains is available for download.
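For comparison with harmonic centrality, PageRank can be sketched as a power iteration. The damping factor of 0.85 and the uniform redistribution of rank held by dangling nodes are common textbook choices and are assumptions here, not the parameters documented for this release.

```python
from typing import Iterable, List, Tuple


def pagerank(num_nodes: int, edges: Iterable[Tuple[int, int]],
             damping: float = 0.85, tol: float = 1e-12, max_iter: int = 200) -> List[float]:
    """PageRank by power iteration; returns a probability vector summing to 1."""
    outlinks: List[List[int]] = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        outlinks[src].append(dst)
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(max_iter):
        # Rank held by dangling nodes (no outlinks) is spread uniformly over all nodes.
        dangling_mass = sum(rank[v] for v in range(num_nodes) if not outlinks[v])
        base = (1.0 - damping + damping * dangling_mass) / num_nodes
        new = [base] * num_nodes
        for v, targets in enumerate(outlinks):
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Nodes 0 and 1 both link to node 2, which has no outlinks.
    print(pagerank(3, [(0, 2), (1, 2)]))
```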

Top 1000 domains ranked by harmonic centrality (Nov/Dec/Jan 2017-2018)

(hc = harmonic centrality, pr = PageRank)
hc rank  hc value  pr rank  pr value  reversed domain
1        18607906  2        0.015085  com.facebook
2        18373148  1        0.017459  com.googleapis
3        16792480  3        0.009910  com.twitter
4        16680463  4        0.009269  com.google
5        15991626  5        0.008473  com.youtube
6        14925478  6        0.007038  org.w
7        14315252  8        0.004298  com.instagram
8        14157377  7        0.005270  org.gmpg
9        14104842  9        0.003235  com.linkedin
10       13746553  11       0.002881  org.wordpress
11       13700700  13       0.002424  com.wordpress
12       13531057  21       0.001612  com.gravatar
13       13521560  17       0.001875  com.pinterest
14       13392743  31       0.001190  org.wikipedia
15       13270455  33       0.001086  com.blogspot
16       13174998  20       0.001667  com.apple
17       13121895  10       0.003063  com.bootstrapcdn
18       12985827  14       0.002380  com.adobe
19       12970584  47       0.000877  com.amazon
20       12960764  15       0.002272  com.macromedia
21       12950338  55       0.000770  com.tumblr
22       12938063  68       0.000665  be.youtu
23       12907253  35       0.001015  com.microsoft
24       12907114  23       0.001383  com.flickr
25       12864289  70       0.000596  com.yahoo
26       12852628  57       0.000715  gl.goo
27       12851330  18       0.001868  net.doubleclick
28       12800323  16       0.001965  com.googletagmanager
29       12754354  25       0.001281  com.vimeo
30       12701701  34       0.001062  com.github
31       12693023  42       0.000952  com.amazonaws
32       12686071  26       0.001271  com.cloudflare
33       12666506  43       0.000899  com.paypal
34       12623324  36       0.001011  io.github
35       12595270  81       0.000396  eu.europa
36       12586323  75       0.000456  org.creativecommons
37       12573253  77       0.000414  co.t
38       12540176  79       0.000398  com.weebly
39       12510518  114      0.000212  net.slideshare
40       12489117  115      0.000207  com.myspace
41       12486341  54       0.000777  com.blogger
42       12478293  19       0.001751  com.medium
43       12469958  89       0.000308  com.googleusercontent
44       12431470  72       0.000553  com.list-manage
45       12431014  48       0.000847  com.bing
46       12427733  83       0.000371  com.android
47       12419386  112      0.000215  org.archive
48       12412973  138      0.000171  com.wsj
49       12385832  44       0.000898  org.apache
50       12353794  126      0.000187  com.digg
51       12342433  49       0.000840  me.wp
52       12320720  165      0.000142  com.livejournal
53       12311819  82       0.000394  ly.bit
54       12292545  118      0.000204  uk.co.google
55       12280039  113      0.000214  com.issuu
56       12271212  109      0.000223  com.nytimes
57       12265960  84       0.000368  org.mozilla
58       12259679  203      0.000113  com.about
59       12259559  101      0.000248  com.jimdo
60       12252961  186      0.000124  com.ted
61       12244596  190      0.000122  me.about
62       12240602  29       0.001231  com.gstatic
63       12239701  147      0.000156  com.staticflickr
64       12235297  46       0.000880  com.wp
65       12227011  125      0.000190  com.stumbleupon
66       12221630  56       0.000724  com.statcounter
67       12214149  144      0.000164  com.oracle
68       12212700  69       0.000601  org.w3
69       12206475  102      0.000248  com.yelp
70       12199028  124      0.000191  com.spotify
71       12197557  214      0.000105  com.scribd
72       12196921  53       0.000790  net.cloudfront
73       12191691  184      0.000126  com.dailymotion
74       12191171  196      0.000118  com.webs
75       12172637  123      0.000191  com.wixsite
76       12170188  86       0.000330  com.ytimg
77       12161794  237      0.000095  com.storify
78       12156052  224      0.000100  gov.loc
79       12138789  219      0.000101  com.quora
80       12129745  139      0.000171  net.behance
81       12112695  302      0.000081  edu.virginia
82       12111216  182      0.000127  uk.co.bbc
83       12110174  183      0.000126  com.cnn
84       12105239  117      0.000205  com.tripadvisor
85       12088100  38       0.000980  net.fbcdn
86       12087599  73       0.000534  org.schema
87       12072802  130      0.000177  org.gnu
88       12069134  45       0.000882  com.fb
89       12063819  122      0.000192  com.youtube-nocookie
90       12057857  40       0.000974  com.squarespace
91       12051348  303      0.000080  au.com.blogspot
92       12050976  166      0.000140  com.typepad
93       12050495  80       0.000397  com.paypalobjects
94       12050337  258      0.000092  ca.blogspot
95       12045019  290      0.000085  co.g
96       12044193  121      0.000195  com.imgur
97       12043080  136      0.000172  com.disqus
98       12039433  92       0.000301  com.soundcloud
99       12038214  296      0.000083  kr.flic
100      12037222  180      0.000128  edu.stanford
101      12029041  158      0.000147  com.dropbox
102      12021944  242      0.000095  com.500px
103      12010476  146      0.000159  com.forbes
104      12009499  105      0.000240  com.reddit
105      12001680  215      0.000103  com.symantec
106      11983071  226      0.000100  com.googlecode
107      11976359  528      0.000061  gov.nasa
108      11973956  157      0.000147  com.nbcnews
109      11973663  222      0.000100  com.washingtonpost
110      11972372  234      0.000097  com.getpocket
111      11967394  326      0.000075  ms.1drv
112      11964037  195      0.000119  com.huffingtonpost
113      11954832  355      0.000069  com.4shared
114      11953215  91       0.000305  com.wix
115      11947631  193      0.000119  com.imdb
116      11947150  141      0.000169  net.sourceforge
117      11945880  51       0.000809  net.akamaihd
118      11945523  279      0.000086  edu.harvard
119      11945467  110      0.000220  com.mailchimp
120      11941550  231      0.000098  com.images-amazon
121      11940604  255      0.000093  com.businessinsider
122      11931301  319      0.000077  com.reuters
123      11923687  603      0.000052  edu.berkeley
124      11923484  22       0.001408  ru.yandex
125      11921955  339      0.000073  com.wikidot
126      11920688  200      0.000115  org.wikimedia
127      11914730  155      0.000151  com.live
128      11909964  300      0.000081  com.bloomberg
129      11909384  169      0.000138  com.eventbrite
130      11899770  289      0.000085  com.go
131      11899011  366      0.000069  com.time
132      11896201  338      0.000073  com.wired
133      11895604  307      0.000080  com.ibm
134      11891392  288      0.000085  com.photobucket
135      11887970  187      0.000124  com.udacity
136      11884261  107      0.000232  com.sharethis
137      11882070  52       0.000798  me.fb
138      11881766  320      0.000076  com.crunchbase
139      11879927  350      0.000070  org.npr
140      11876717  386      0.000067  uk.co.telegraph
141      11872395  159      0.000146  com.theguardian
142      11872272  333      0.000073  com.meetup
143      11869501  266      0.000089  com.cnet
144      11866935  543      0.000058  org.un
145      11865568  331      0.000074  com.mashable
146      11861854  394      0.000066  edu.mit
147      11855987  746      0.000045  com.vice
148      11854021  236      0.000096  net.php
149      11851530  435      0.000063  org.python
150      11847889  301      0.000081  com.stackoverflow
151      11846984  12       0.002428  com.godaddy
152      11843998  707      0.000049  com.theverge
153      11838101  373      0.000068  com.klout
154      11837805  116      0.000205  com.feedburner
155      11837604  526      0.000061  com.appspot
156      11835629  322      0.000076  edu.utah
157      11834227  208      0.000108  com.ebay
158      11824917  260      0.000091  com.opera
159      11822889  154      0.000152  com.zendesk
160      11819855  317      0.000078  com.tinyurl
161      11819853  808      0.000039  edu.washington
162      11813301  314      0.000078  com.msn
163      11810088  340      0.000072  com.discogs
164      11807284  425      0.000063  com.gmail
165      11806814  265      0.000090  au.gov.nsw
166      11800938  739      0.000045  com.dropboxusercontent
167      11796674  520      0.000062  gov.whitehouse
168      11792725  74       0.000517  com.vk
169      11790085  269      0.000088  org.acm
170      11787547  78       0.000408  com.jquery
171      11783592  860      0.000036  edu.yale
172      11779227  640      0.000050  com.zdnet
173      11777651  407      0.000065  com.fotolog
174      11777274  273      0.000088  com.mozilla
175      11775700  855      0.000036  org.nodejs
176      11773717  381      0.000067  co.ello
177      11772986  629      0.000051  edu.cornell
178      11771426  625      0.000051  com.cbsnews
179      11768797  372      0.000068  com.naturalnews
180      11767467  85       0.000341  com.qq
181      11767137  819      0.000038  org.arxiv
182      11766241  536      0.000060  uk.co.blogspot
183      11763003  271      0.000088  com.usatoday
184      11759036  188      0.000123  com.etsy
185      11758812  308      0.000080  com.salesforce
186      11757823  722      0.000047  com.buzzfeed
187      11756733  534      0.000060  com.cnbc
188      11754751  37       0.000993  com.yimg
189      11751220  914      0.000033  edu.psu
190      11747845  732      0.000046  org.hbr
191      11746923  385      0.000067  com.googledrive
192      11746859  354      0.000069  com.oreilly
193      11746138  525      0.000061  com.foursquare
194      11745363  335      0.000073  com.bandsintown
195      11739496  295      0.000084  com.surveymonkey
196      11735637  835      0.000037  com.gizmodo
197      11735112  811      0.000039  com.foxnews
198      11731092  163      0.000143  com.weibo
199      11730711  216      0.000103  com.techcrunch
200      11730368  727      0.000047  com.deviantart
201      11729298  311      0.000079  com.exacttarget
202      11726624  299      0.000081  fr.free
203      11726515  597      0.000053  com.kickstarter
204      11725394  172      0.000136  com.twimg
205      11720356  181      0.000127  gov.nih
206      11720279  321      0.000076  com.theknot
207      11718838  389      0.000067  com.pearltrees
208      11717915  646      0.000050  com.box
209      11716850  714      0.000048  org.pbs
210      11716738  391      0.000066  com.instapaper
211      11710995  220      0.000101  org.ietf
212      11706564  710      0.000049  com.cisco
213      11705543  747      0.000045  uk.co.guardian
214      11705106  826      0.000038  com.economist
215      11702511  287      0.000085  com.tripod
216      11701161  841      0.000037  com.weather
217      11698528  540      0.000059  com.wiley
218      11698370  318      0.000077  com.aol
219      11697385  377      0.000068  com.merchantcircle
220      11696723  736      0.000046  com.herokuapp
221      11691551  152      0.000155  com.getclicky
222      11690957  264      0.000090  com.hp
223      11688310  403      0.000065  com.googlelabs
224      11687822  349      0.000070  org.sqlite
225      11687466  521      0.000062  com.skype
226      11686983  617      0.000052  com.bbc
227      11685451  88       0.000313  com.addthis
228      11683713  1515     0.000018  com.twitpic
229      11682650  134      0.000172  com.constantcontact
230      11682120  362      0.000069  com.bravesites
231      11681612  800      0.000040  com.blackberry
232      11679159  1019     0.000028  com.posterous
233      11679078  211      0.000107  uk.co.amazon
234      11676752  958      0.000030  edu.columbia
235      11671771  406      0.000065  tl.page
236      11669773  856      0.000036  co.vine
237      11658971  576      0.000054  uk.co.dailymail
238      11658348  294      0.000084  com.prnewswire
239      11658157  641      0.000050  com.githubusercontent
240      11656289  1053     0.000027  com.mtv
241      11654460  792      0.000040  com.newyorker
242      11654204  794      0.000040  com.wikihow
243      11653940  106      0.000240  de.google
244      11647512  817      0.000038  com.arstechnica
245      11646401  233      0.000097  com.smugmug
246      11646299  312      0.000079  com.yellowpages
247      11641373  980      0.000030  com.netflix
248      11640335  370      0.000068  org.gradle
249      11637974  329      0.000075  com.googlesource
250      11634521  1008     0.000028  com.ning
251      11631272  560      0.000056  com.latimes
252      11631061  205      0.000109  com.bandcamp
253      11630510  1036     0.000028  com.airbnb
254      11626214  761      0.000043  com.venturebeat
255      11622370  218      0.000101  to.amzn
256      11620722  388      0.000067  com.example
257      11619027  774      0.000042  com.googleblog
258      11619027  865      0.000035  com.wikia
259      11618238  397      0.000066  com.jigsy
260      11617776  365      0.000069  com.manta
261      11610740  786      0.000041  com.squareup
262      11608056  858      0.000036  us.imageshack
263      11607192  402      0.000065  io.material
264      11605876  368      0.000069  com.booking
265      11605599  1409     0.000019  org.khanacademy
266      11604769  1255     0.000022  mp.j
267      11604652  621      0.000051  com.entrepreneur
268      11599175  60       0.000696  me.m
269      11598004  988      0.000029  in.blogspot
270      11597370  744      0.000045  int.who
271      11596937  238      0.000095  com.hubspot
272      11596647  821      0.000038  com.timeanddate
273      11595861  920      0.000032  com.discovery
274      11595787  90       0.000308  com.shopify
275      11593677  954      0.000031  edu.ucla
276      11591773  849      0.000036  com.nature
277      11589767  1049     0.000027  com.vox
278      11587351  410      0.000064  com.authorstream
279      11587084  177      0.000131  org.drupal
280      11585910  1038     0.000028  edu.upenn
281      11585501  947      0.000031  tv.ustream
282      11584407  822      0.000038  de.blogspot
283      11583557  718      0.000047  edu.cmu
284      11583057  422      0.000064  com.adsoftheworld
285      11582074  1234     0.000022  edu.jhu
286      11581671  910      0.000033  com.politico
287      11581309  940      0.000031  com.engadget
288      11579866  1302     0.000021  org.chromium
289      11579810  602      0.000052  com.fortune
290      11579345  772      0.000042  gov.noaa
291      11576274  630      0.000051  com.inc
292      11574852  944      0.000031  ca.cbc
293      11574541  1122     0.000026  org.eclipse
294      11573624  846      0.000036  com.nationalgeographic
295      11572803  976      0.000030  gov.copyright
296      11570765  840      0.000037  gov.senate
297      11566914  374      0.000068  org.openweathermap
298      11566458  140      0.000169  com.blogblog
299      11566013  424      0.000064  gov.cdc
300      11565002  341      0.000071  gov.ftc
301      11560155  202      0.000114  org.joomla
302      11560008  802      0.000040  com.adage
303      11559383  1024     0.000028  ly.ow
304      11555956  948      0.000031  com.variety
305      11555741  390      0.000067  com.mysanantonio
306      11554854  873      0.000035  com.samsung
307      11553779  709      0.000049  com.goodreads
308      11551376  843      0.000036  com.proofpoint
309      11551263  945      0.000031  com.reverbnation
310      11550814  1140     0.000025  net.researchgate
311      11549337  1461     0.000018  com.aliexpress
312      11548976  235      0.000097  fr.google
313      11548595  875      0.000035  gov.census
314      11548409  1098     0.000026  org.kernel
315      11547950  754      0.000044  com.giphy
316      11546952  108      0.000229  ru.mail
317      11546634  1063     0.000027  com.searchengineland
318      11546119  162      0.000144  com.eepurl
319      11546086  941      0.000031  com.businessweek
320      11545535  893      0.000034  com.psychologytoday
321      11544478  198      0.000116  de.amazon
322      11539676  1134     0.000025  org.altervista
323      11539316  538      0.000059  gov.ca
324      11539214  613      0.000052  com.force
325      11538200  616      0.000052  com.wunderground
326      11535711  1087     0.000026  com.gumroad
327      11535272  778      0.000042  es.com.blogspot
328      11533059  1154     0.000024  com.pixabay
329      11532795  939      0.000031  com.unsplash
330      11532550  1073     0.000026  org.unesco
331      11532520  708      0.000049  com.geocities
332      11532319  1025     0.000028  com.hotmail
333      11530141  1210     0.000023  uk.co.theregister
334      11529502  361      0.000069  ca.google
335      11529275  535      0.000060  com.springer
336      11527731  414      0.000064  com.getskeleton
337      11526535  731      0.000046  com.theatlantic
338      11526071  357      0.000069  ly.snip
339      11525718  1193     0.000023  com.nba
340      11525710  369      0.000068  com.themonitor
341      11525540  1384     0.000020  tv.periscope
342      11525233  1243     0.000022  edu.umn
343      11524925  1879     0.000014  edu.gatech
344      11524863  1007     0.000029  com.stackexchange
345      11523983  1269     0.000022  com.zazzle
346      11522855  358      0.000069  org.jenkins-ci
347      11522368  816      0.000038  br.com.uol
348      11520810  1200     0.000023  edu.northwestern
349      11520597  119      0.000200  com.baidu
350      11520230  1072     0.000027  edu.umich
351      11519902  1189     0.000023  com.unity3d
352      11519676  548      0.000057  com.businesswire
353      11519568  1205     0.000023  com.indiegogo
354      11519142  1191     0.000023  edu.utexas
355      11519049  568      0.000055  com.tinypic
356      11518248  1175     0.000024  com.angelfire
357      11517127  418      0.000064  com.hotfrog
358      11516905  991      0.000029  com.steampowered
359      11515409  1632     0.000016  com.animoto
360      11514883  111      0.000216  com.google-analytics
361      11514388  1783     0.000014  edu.msu
362      11514218  1202     0.000023  uk.ac.ox
363      11513961  812      0.000038  uk.co.independent
364      11513098  720      0.000047  gov.irs
365      11512942  1044     0.000027  org.iso
366      11512273  864      0.000035  com.sun
367      11512238  363      0.000069  com.newsbank
368      11510786  1089     0.000026  com.libsyn
369      11510624  376      0.000068  org.osgeo
370      11510598  430      0.000063  com.communitywalk
371      11509194  890      0.000034  org.eff
372      11509179  1308     0.000021  edu.wisc
373      11506978  1233     0.000023  com.target
374      11505543  1398     0.000020  com.deadline
375      11504665  839      0.000037  com.java
376      11504353  888      0.000034  com.ubuntu
377      11504247  953      0.000031  gov.nps
378      11504013  907      0.000033  com.evernote
379      11503883  419      0.000064  com.sxsw
380      11503007  1066     0.000027  com.prweb
381      11502080  1219     0.000023  com.scientificamerican
382      11502078  917      0.000032  com.uk
383      11501432  758      0.000043  com.marketwatch
384      11501268  153      0.000154  com.xing
385      11500116  175      0.000133  jp.co.yahoo
386      11499170  1279     0.000022  com.xbox
387      11498779  379      0.000067  io.getmdl
388      11495899  456      0.000063  org.simile-widgets
389      11494406  120      0.000196  jp.co.google
390      11493099  982      0.000030  fm.last
391      11492094  179      0.000128  it.google
392      11490632  623      0.000051  com.delicious
393      11490465  1425     0.000019  com.nfl
394      11490102  530      0.000061  com.newsweek
395      11487945  1003     0.000029  com.cafepress
396      11487539  1167     0.000024  com.over-blog
397      11487245  262      0.000090  com.pingdom
398      11487205  1119     0.000026  com.elpais
399      11486827  942      0.000031  com.formstack
400      11486540  880      0.000035  com.shutterstock
401      11486242  1184     0.000023  com.billboard
402      11486001  1123     0.000025  com.livestream
403      11485549  1730     0.000015  com.instructables
404      11484679  1503     0.000018  com.yfrog
405      11484467  201      0.000114  jp.co.amazon
406      11484316  127      0.000186  com.googleadservices
407      11483946  356      0.000069  com.ea
408      11481672  413      0.000064  org.tpr
409      11481471  1353     0.000020  com.msnbc
410      11479956  1196     0.000023  uk.ac.cam
411      11479433  1896     0.000014  com.colourlovers
412      11479041  804      0.000040  gov.fda
413      11478278  1611     0.000016  edu.usc
414      11476910  704      0.000049  com.houzz
415      11476418  291      0.000085  com.newrelic
416      11475676  1290     0.000022  com.mercurynews
417      11474573  399      0.000066  com.folkd
418      11474492  1235     0.000022  br.com.blogspot
419      11473985  448      0.000063  com.northsails
420      11473200  996      0.000029  org.ieee
421      11473152  1077     0.000026  com.thinkwithgoogle
422      11468700  261      0.000091  com.myshopify
423      11468404  39       0.000979  com.messenger
424      11467527  104      0.000241  org.networkadvertising
425      11467022  1377     0.000020  google.blog
426      11465860  1619     0.000016  edu.ufl
427      11465627  441      0.000063  com.weddingbee
428      11465387  129      0.000178  org.bbb
429      11465097  1239     0.000022  org.unicode
430      11463772  831      0.000037  com.intel
431      11463368  336      0.000073  es.google
432      11462577  1505     0.000018  ca.ualberta
433      11462547  438      0.000063  com.quandl
434      11462235  1549     0.000017  edu.duke
435      11462218  59       0.000699  com.atdmt
436      11460665  782      0.000041  gov.sec
437      11460395  784      0.000041  com.moz
438      11459979  1432     0.000019  com.allthingsd
439      11459641  928      0.000032  com.slate
440      11459376  1240     0.000022  com.codeplex
441      11459114  1664     0.000016  org.ap
442      11458307  332      0.000074  com.bitly
443      11457754  1512     0.000018  org.aclu
444      11456288  1213     0.000023  com.justgiving
445      11456010  1852     0.000014  ca.utoronto
446      11455366  275      0.000088  com.fc2
447      11454040  1002     0.000029  org.change
448      11453683  415      0.000064  com.sacurrent
449      11453427  820      0.000038  gov.state
450      11452915  938      0.000032  com.ggpht
451      11452574  1249     0.000022  com.mixcloud
452      11452221  156      0.000148  jp.ne.hatena
453      11449661  1078     0.000026  com.com
454      11449550  1061     0.000027  org.worldbank
455      11449444  443      0.000063  com.tupalo
456      11448943  1446     0.000019  com.readwriteweb
457      11447099  1420     0.000019  edu.uchicago
458      11445346  1274     0.000022  se.haxx
459      11445269  964      0.000030  gov.nist
460      11445163  2101     0.000012  org.gimp
461      11444955  431      0.000063  com.scribblemaps
462      11444746  1675     0.000016  edu.purdue
463      11444237  1476     0.000018  edu.umd
464      11444150  2861     0.000009  cc.co
465      11442803  442      0.000063  com.zwire
466      11441589  743      0.000045  com.photoshelter
467      11441554  429      0.000063  com.citysquares
468      11439040  1198     0.000023  com.dell
469      11437219  194      0.000119  info.aboutads
470      11437029  1960     0.000013  com.aljazeera
471      11435940  1367     0.000020  com.macworld
472      11435433  1050     0.000027  com.sciencedaily
473      11435190  531      0.000061  com.adweek
474      11434989  1997     0.000013  edu.ncsu
475      11434829  450      0.000063  edu.alamo
476      11434782  1215     0.000023  com.econsultancy
477      11434203  829      0.000037  com.ft
478      11434171  1592     0.000017  com.fox
479      11433993  970      0.000030  fr.blogspot
480      11433979  1551     0.000017  com.makezine
481      11433870  768      0.000042  com.cargocollective
482      11433602  436      0.000063  net.brownbook
483      11432786  440      0.000063  org.linux-foundation
484      11432108  582      0.000054  au.com.google
485      11431895  959      0.000030  com.patreon
486      11431324  1357     0.000020  com.playstation
487      11430875  1533     0.000017  fr.lemonde
488      11430174  628      0.000051  com.office
489      11428706  527      0.000061  com.barnesandnoble
490      11428165  1700     0.000015  net.comcast
491      11427462  432      0.000063  au.com.yelp
492      11427126  1097     0.000026  com.techrepublic
493      11426180  445      0.000063  com.live5news
494      11425795  533      0.000060  net.themeforest
495      11424261  1858     0.000014  com.redbubble
496      11422618  1720     0.000015  com.udemy
497      11421459  3544     0.000008  edu.iastate
498      11419553  2540     0.000010  ca.uwaterloo
499      11419363  423      0.000064  com.showmelocal
500      11418739  1915     0.000013  edu.tamu
501      11418024  2423     0.000011  org.ampproject
502      11417438  1247     0.000022  org.owasp
503      11417385  2045     0.000012  com.tutsplus
504      11417278  1725     0.000015  com.yolasite
505      11417179  1608     0.000016  com.pastebin
506      11414779  453      0.000063  org.rethinkingschools
507      11414635  1528     0.000018  com.gamespot
508      11413545  1812     0.000014  org.hrc
509      11413417  967      0.000030  com.redhat
510      11412916  1168     0.000024  com.getfirebug
511      11412847  814      0.000038  tv.twitch
512      11412591  1951     0.000013  com.aviary
513      11412028  730      0.000046  fr.amazon
514      11411449  460      0.000063  com.storeboard
515      11410894  352      0.000069  au.com.hotfrog
516      11410569  1646     0.000016  com.autodesk
517      11410363  974      0.000030  gov.usgs
518      11410272  1834     0.000014  org.kiva
519      11409694  382      0.000067  com.tractorsupply
520      11409315  1655     0.000016  com.ign
521      11408484  293      0.000084  com.dribbble
522      11408314  3295     0.000009  com.squidoo
523      11408049  1674     0.000016  org.weforum
524      11408023  799      0.000040  ca.amazon
525      11407344  1064     0.000027  com.ssrn
526      11406583  3352     0.000009  com.blog
527      11406569  1125     0.000025  com.walmart
528      11405050  1344     0.000021  com.getsatisfaction
529      11404402  1376     0.000020  com.prezi
530      11404079  2796     0.000009  com.lynda
531      11403657  324      0.000076  com.nypost
532      11403564  870      0.000035  gov.usa
533      11403434  1548     0.000017  edu.ucsd
534      11402769  559      0.000056  com.nwsource
535      11401958  3043     0.000009  edu.rice
536      11399936  1868     0.000014  com.laughingsquid
537      11399641  408      0.000064  com.bigcartel
538      11398730  465      0.000063  org.hedgebrook
539      11398478  1076     0.000026  au.net.abc
540      11398476  1475     0.000018  it.blogspot
541      11398332  1041     0.000027  com.cbslocal
542      11397081  1464     0.000018  com.topsy
543      11396288  1516     0.000018  com.us
544      11396090  416      0.000064  org.asciidoctor
545      11396009  551      0.000057  com.bizjournals
546      11394929  793      0.000040  com.quantcast
547      11393938  1485     0.000018  edu.academia
548      11393091  1488     0.000018  int.wipo
549      11392682  2638     0.000010  com.hubpages
550      11392274  2029     0.000013  edu.uci
551      11391505  281      0.000086  net.yahoo
552      11391295  1256     0.000022  com.trello
553      11390357  1691     0.000015  com.hulu
554      11389415  1656     0.000016  com.wikispaces
555      11389406  574      0.000055  nl.google
556      11389013  1545     0.000017  it.scoop
557      11388880  1225     0.000023  com.searchenginewatch
558      11387526  1268     0.000022  edu.princeton
559      11387076  1647     0.000016  com.movember
560      11386671  1699     0.000015  org.commonsensemedia
561      11386113  924      0.000032  com.slack
562      11385721  1922     0.000013  com.searchenginejournal
563      11385250  1300     0.000021  de.spiegel
564      11384609  927      0.000032  com.thenextweb
565      11384308  1294     0.000021  com.mediafire
566      11383554  1142     0.000025  com.deloitte
567      11382436  1791     0.000014  org.unicef
568      11379051  452      0.000063  com.fixr
569      11378185  1058     0.000027  com.gofundme
570      11376956  2143     0.000012  com.technet
571      11376557  1368     0.000020  edu.si
572      11376102  1319     0.000021  com.csmonitor
573      11376050  1162     0.000024  com.globo
574      11375623  904      0.000033  com.nielsen
575      11375372  128      0.000181  com.windowsphone
576      11375170  1349     0.000021  com.techradar
577      11375164  781      0.000041  com.w3schools
578      11374149  1149     0.000024  jp.blogspot
579      11373249  1546     0.000017  uk.co.mirror
580      11371019  1265     0.000022  com.freewebs
581      11369584  584      0.000054  jp.ne.sakura
582      11369537  946      0.000031  com.pinimg
583      11368907  477      0.000062  com.risevision
584      11368811  2759     0.000009  ly.cl
585      11368575  660      0.000049  com.whatsapp
586      11367966  2089     0.000012  ca.ubc
587      11367783  863      0.000035  com.emarketer
588      11367609  1643     0.000016  com.oxforddictionaries
589      11367326  549      0.000057  com.naver
590      11367233  1218     0.000023  com.nokia
591      11366111  987      0.000029  gov.house
592      11364883  1591     0.000017  uk.ac.lse
593      11364717  451      0.000063  com.expressbusinessdirectory
594      11364552  1717     0.000015  com.nike
595      11364257  1236     0.000022  com.vanityfair
596      11364174  1899     0.000014  org.craigslist
597      11364018  1797     0.000014  com.googlepages
598      11363507  1994     0.000013  com.webnode
599      11362828  209      0.000108  org.purl
600      11362679  726      0.000047  com.typeform
601      11360760  851      0.000036  gov.ed
602      11360741  458      0.000063  com.mapbox
603      11360464  1825     0.000014  org.wfp
604      11358502  1143     0.000025  com.boston
605      11358139  1231     0.000023  com.jotform
606      11358099  809      0.000039  com.att
607      11357951  1501     0.000018  com.vmware
608      11357932  420      0.000064  com.gerritcodereview
609      11357075  428      0.000063  uk.co.collectbritain
610      11356495  1765     0.000015  ch.ethz
611      11356138  1463     0.000018  ru.spb
612      11355359  876      0.000035  ru.google
613      11355356  2027     0.000013  com.nbcolympics
614      11355136  1296     0.000021  gov.uscis
615      11354679  989      0.000029  com.git-scm
616      11354323  1016     0.000028  com.mlb
617      11354242  1297     0.000021  com.qz
618      11352962  268      0.000089  org.icann
619      11352563  790      0.000040  br.com.google
620      11352199  2676     0.000010  org.greenpeace
621      11349914  1421     0.000019  com.indeed
622      11349728  1560     0.000017  com.cbs
623      11348884  897      0.000034  org.bitbucket
624      11348405  2131     0.000012  org.sundance
625      11347924  4829     0.000006  com.weheartit
626      11347846  1555     0.000017  edu.umass
627      11347429  1388     0.000020  uk.co.thetimes
628      11347216  4314     0.000007  edu.byu
629      11347012  1262     0.000022  com.docker
630      11346807  1378     0.000020  com.firefox
631      11346327  1341     0.000021  com.adjust
632      11345659  5329     0.000005  edu.rpi
633      11345192  3706     0.000008  com.indiewire
634      11343232  1542     0.000017  com.examiner
635      11342458  433      0.000063  org.blogging
636      11340929  3792     0.000008  edu.syr
637      11339984  1429     0.000019  com.gettyimages
638      11339837  1389     0.000020  com.withgoogle
639      11338017  1556     0.000017  com.domain
640      11337715  1329     0.000021  ru.narod
641      11337035  2168     0.000012  gd.is
642      11335394  577      0.000054  me.t
643      11335283  1854     0.000014  edu.rutgers
644      11334876  174      0.000134  jp.ameblo
645      11333593  297      0.000083  com.wufoo
646      11333170  1455     0.000019  com.uber
647      11333004  459      0.000063  uk.co.myvouchercodes
648      11332725  1477     0.000018  org.apa
649      11332028  1017     0.000028  com.chicagotribune
650      11331656  474      0.000063  ca.hotfrog
651      11331087  1989     0.000013  net.daum
652      11330874  1332     0.000021  com.steamcommunity
653      11330622  1460     0.000018  com.flipboard
654      11330495  481      0.000062  com.lekkoo
655      11330042  305      0.000080  net.freenode
656      11329857  585      0.000054  com.youku
657      11329361  1536     0.000017  com.nabble
658      11329068  522      0.000062  com.atlassian
659      11328620  467      0.000063  com.locality
660      11328507  1326     0.000021  com.lulu
661      11327937  1068     0.000027  com.hootsuite
662      11327725  1752     0.000015  com.ikea
663      11327551  3551     0.000008  edu.gmu
664      11327058  2053     0.000012  com.sony
665      11326492  1452     0.000019  com.fedex
666      11326245  2345     0.000011  edu.unl
667      11326150  1071     0.000027  com.bloglovin
668      11325863  1299     0.000021  com.gigaom
669      11325789  1271     0.000022  org.redcross
670      11325685  1358     0.000020  com.xkcd
671      11325608  1638     0.000016  com.ehow
672      11325455  955      0.000030  com.linksynergy
673      11325022  3490     0.000008  cc.tiny
674      11324858  1601     0.000016  com.people
675      11324391  76       0.000443  com.quantserve
676      11323859  3536     0.000008  com.answers
677      11323508  1651     0.000016  com.jetbrains
678      11322996  482      0.000062  fm.company
679      11322818  2036     0.000013  edu.colorado
680      11322655  1018     0.000028  com.optimizely
681      11322331  375      0.000068  com.fastcompany
682      11321982  2606     0.000010  com.asus
683      11321952  2140     0.000012  com.rt
684      11321811  5512     0.000005  nr.co
685      11321626  1437     0.000019  com.ecwid
686      11320594  1141     0.000025  org.sciencemag
687      11320396  2799     0.000009  ca.mcgill
688      11320268  3638     0.000008  com.techsmith
689      11320190  796      0.000040  com.mysql
690      11319865  485      0.000062  com.tucando
691      11319832  58       0.000705  org.internet
692      11319661  1568     0.000017  net.oauth
693      11319499  1799     0.000014  org.cancer
694      11319236  2098     0.000012  tv.blip
695      11319028  638      0.000050  net.launchpad
696      11317825  994      0.000029  com.hostgator
697      11317719  1666     0.000016  uk.co.timesonline
698      11317056  2025     0.000013  me.cash
699      11316212  479      0.000062  au.net.businesslistings
700      11315453  3468     0.000009  net.battle
701      11315370  1065     0.000027  com.tandfonline
702      11314617  1426     0.000019  com.smartinsights
703      11314479  1126     0.000025  com.windows
704      11313728  4730     0.000006  com.treehugger
705      11313006  4131     0.000007  edu.buffalo
706      11312404  4095     0.000007  tt.db
707      11312320  3604     0.000008  edu.oregonstate
708      11312137  1982     0.000013  com.podbean
709      11311558  1658     0.000016  com.computerworld
710      11311320  1612     0.000016  com.gq
711      11311237  3658     0.000008  com.appstore
712      11310905  393      0.000066  us.icio
713      11310683  1186     0.000023  com.intuit
714      11310129  998      0.000029  com.hilton
715      11309205  1390     0.000020  com.sap
716      11309120  1757     0.000015  edu.illinois
717      11309080  3773     0.000008  com.secondlife
718      11308576  957      0.000030  com.webmd
719      11307778  2026     0.000013  uk.co.metro
720      11307594  1740     0.000015  co.angel
721      11307157  461      0.000063  org.bravenewvoices
722      11306803  1314     0.000021  me.paypal
723      11306617  1712     0.000015  com.cbssports
724      11306153  3596     0.000008  ru.org
725      11305780  1312     0.000021  com.ssllabs
726      11304709  1513     0.000018  mil.army
727      11303329  969      0.000030  com.lifehacker
728      11303291  1180     0.000024  com.pcmag
729      11302939  245      0.000094  construction.homebuilders
730      11302830  2293     0.000011  com.hbo
731      11302646  1847     0.000014  com.canneslions
732      11302601  801      0.000040  gov.usda
733      11302132  1684     0.000015  com.digitaltrends
734      11301816  1727     0.000015  com.gigya
735      11299693  537      0.000059  com.list-manage1
736      11299386  3789     0.000008  edu.colostate
737      11298664  1580     0.000017  gov.nyc
738      11297222  937      0.000032  com.amazonwebservices
739      11296816  1618     0.000016  com.stitcher
740      11295039  4433     0.000006  com.skyrock
741      11292566  1334     0.000021  com.blogs
742      11290880  325      0.000075  org.debian
743      11290067  1537     0.000017  com.twilio
744      11290033  755      0.000044  com.aweber
745      11289263  1793     0.000014  com.freepik
746      11288594  833      0.000037  com.msdn
747      11288487  360      0.000069  com.heroku
748      11288164  962      0.000030  com.npmjs
749      11287448  1635     0.000016  com.marketingland
750      11286386  1561     0.000017  com.netvibes
751      11286222  2317     0.000011  com.carbonmade
752      11285201  1588     0.000017  de.welt
753      11285055  1394     0.000020  com.philly
754      11283946  1521     0.000018  com.blogtalkradio
755      11283553  1109     0.000026  gov.fcc
756      11283361  2084     0.000012  com.ford
757      11282558  4309     0.000007  to.gplus
758      11282269  2467     0.000010  com.fiverr
759      11281968  3997     0.000007  com.makeuseof
760      11281583  2122     0.000012  com.cocolog-nifty
761      11280591  1703     0.000015  org.networkforgood
762      11280203  2014     0.000013  com.nbc
763      11280110  1564     0.000017  com.marketingprofs
764      11280055  1910     0.000013  au.com.news
765      11279876  1293     0.000021  com.deezer
766      11279527  837      0.000037  com.alexa
767      11279340  1583     0.000017  com.chronicle
768      11277106  1930     0.000013  org.donorschoose
769      11275428  2573     0.000010  com.lego
770      11275260  610      0.000052  org.opensource
771      11274761  2024     0.000013  org.chillingeffects
772      11274751  4436     0.000006  com.elle
773      11274709  3618     0.000008  am.instagr
774      11274659  2472     0.000010  edu.uic
775      11274358  1137     0.000025  com.forrester
776      11273698  611      0.000052  com.snapchat
777      11273515  5031     0.000005  com.stagram
778      11272452  1745     0.000015  com.hackerone
779      11272379  4115     0.000007  com.epicurious
780      11271805  1587     0.000017  com.pwc
781      11271401  1336     0.000021  com.oup
782      11271081  3654     0.000008  com.nbcsports
783      11270744  2583     0.000010  com.seekingalpha
784      11270424  2090     0.000012  edu.ucsb
785      11270182  3056     0.000009  au.edu.anu
786      11270049  4022     0.000007  com.spreaker
787      11269735  740      0.000045  org.doi
788      11269391  1351     0.000020  se.google
789      11269379  2392     0.000011  com.lonelyplanet
790      11268730  1043     0.000027  gov.sba
791      11268692  961      0.000030  org.moodle
792      11268660  723      0.000047  com.gotowebinar
793      11268468  3861     0.000008  com.sparkfun
794      11267525  1884     0.000014  uk.co.thesun
795      11267126  5131     0.000005  com.rapidshare
796      11266867  480      0.000062  com.webdesign499
797      11265746  2227     0.000011  net.wordle
798      11265627  2557     0.000010  com.dallasnews
799      11265608  884      0.000034  it.amazon
800      11265550  823      0.000038  com.digiday
801      11265087  1438     0.000019  com.today
802      11264718  1282     0.000022  com.amzn
803      11263988  1602     0.000016  com.canva
804      11263827  2457     0.000010  ly.adobe
805      11263126  1440     0.000019  com.si
806      11262884  2155     0.000012  hk.com.google
807      11261612  5313     0.000005  com.dafont
808      11261236  2015     0.000013  org.charitywater
809      11261132  2892     0.000009  com.expedia
810      11260842  7384     0.000004  com.friendster
811      11259295  915      0.000033  com.infusionsoft
812      11259276  2125     0.000012  com.programmableweb
813      11259105  1775     0.000015  jp.or.nhk
814      11259104  4067     0.000007  com.tistory
815      11258170  1346     0.000021  org.r-project
816      11258089  2385     0.000011  me.flavors
817      11257813  223      0.000100  me.line
818      11257642  5884     0.000005  com.archdaily
819      11257497  1354     0.000020  com.pcworld
820      11257418  2176     0.000012  com.pbwiki
821      11257120  1954     0.000013  org.icu-project
822      11256559  595      0.000053  pl.google
823      11256232  2322     0.000011  com.chow
824      11256226  1817     0.000014  nl.blogspot
825      11255194  1830     0.000014  net.bplaced
826      11255114  2062     0.000012  com.smithsonianmag
827      11254764  1920     0.000013  com.society6
828      11254276  1342     0.000021  com.w3techs
829      11254128  272      0.000088  au.com.blissfulrealestate
830      11253885  3532     0.000008  org.notepad-plus-plus
831      11253578  896      0.000034  com.marketo
832      11252869  27       0.001256  com.wixstatic
833      11252317  3533     0.000008  com.codecademy
834      11251749  1284     0.000022  gov.fbi
835      11251268  1579     0.000017  au.com.smh
836      11250246  4089     0.000007  gr.blogspot
837      11248900  330      0.000075  edu.nyu
838      11247632  5226     0.000005  edu.temple
839      11246262  1431     0.000019  com.americanexpress
840      11245348  898      0.000034  com.cdbaby
841      11244784  1145     0.000025  com.disney
842      11244720  2405     0.000011  com.current
843      11244368  2547     0.000010  edu.hawaii
844      11244349  2079     0.000012  ca.bell
845      11243460  850      0.000036  gov.hhs
846      11243251  4554     0.000006  com.freelancer
847      11243096  3710     0.000008  edu.uiowa
848      11243053  3844     0.000008  com.panoramio
849      11242329  9704     0.000003  com.universetoday
850      11242232  1086     0.000026  com.usps
851      11242127  5190     0.000005  ca.uvic
852      11242047  5870     0.000005  com.theta360
853      11241952  1493     0.000018  com.sproutsocial
854      11241871  1427     0.000019  us.tx.state
8551124166355200.000005com.barackobama
856112410368440.000036com.feedly
8571124095546360.000006net.box
8581124063615840.000017au.com.theaustralian
8591124046527710.000009com.mentalfloss
8601124044917180.000015org.aarp
8611124040912170.000023com.mastercard
8621124029235290.000008com.avast
863112396719230.000032org.openstreetmap
864112396239110.000033com.shareasale
8651123938215730.000017com.zillow
8661123933013330.000021org.d3js
8671123849118030.000014com.merriam-webster
8681123847426880.000010com.infoq
8691123837515430.000017uk.co.ebay
8701123759417550.000015com.mediabistro
8711123758015020.000018com.bleacherreport
8721123753436890.000008com.space
873112371047050.000049net.clickbank
874112363918850.000034jp.livedoor
875112363908660.000035org.postimg
8761123618922490.000011com.britannica
8771123615418450.000014com.getresponse
878112358942980.000081ru.ok
8791123543917500.000015edu.indiana
88011235284240.001333com.cdninstagram
8811123485042080.000007li.paper
8821123333712500.000022com.kissmetrics
883112328605440.000058ru.vkontakte
8841123279415170.000018com.html5rocks
8851123272222790.000011se.blogspot
8861123229355420.000005com.ucoz
8871123224923520.000011org.kde
8881123188319180.000013com.vogue
889112316714110.000064com.ripple
8901123110235340.000008edu.nd
8911123101615620.000017com.tradedoubler
8921123093912480.000022int.coe
8931123068628260.000009int.esa
8941123017417260.000015com.xerox
8951123015617040.000015com.createspace
8961123007413800.000020com.mac
8971122985835900.000008com.polyvore
8981122961243130.000007com.cheezburger
8991122879018590.000014uk.ac.ed
9001122786344920.000006es.esy
9011122774216960.000015org.computer
9021122757154970.000005com.threadless
9031122753158870.000005com.plurk
9041122703270380.000004com.shapeways
9051122626143240.000007it.libero
9061122591615940.000017com.bestbuy
9071122551638740.000008com.canon
9081122536018370.000014com.nvidia
9091122487015540.000017com.strikingly
9101122424672230.000004com.sendspace
9111122389010930.000026jp.ne.goo
9121122299722950.000011com.denverpost
913112228535450.000058com.cracked
9141122253212240.000023org.fao
9151122244027720.000009com.invisionapp
9161122218314940.000018com.webex
917112221359780.000030com.sfgate
9181122199648040.000006net.minecraft
9191122175414360.000019com.splashthat
9201122134513390.000021com.foxbusiness
9211122126168350.000004ua.pp
9221121961122180.000012tv.pscp
9231121947122670.000011org.unicefusa
9241121912918800.000014org.openoffice
925112191087490.000044com.netdna-cdn
9261121879923570.000011edu.osu
9271121857416820.000016com.accenture
9281121851022690.000011org.glaad
9291121846316270.000016com.delta
9301121836343680.000007mx.blogspot
931112183195940.000053net.windows
9321121810862760.000004com.rdio
9331121753921370.000012org.thehotline
9341121751720960.000012com.tweetdeck
9351121736426640.000010jp.main
9361121734111590.000024jp.exblog
937112169378620.000036com.usnews
938112168569030.000033com.brightcove
9391121682221030.000012org.wiktionary
9401121607325090.000010de.fraunhofer
9411121577345420.000006net.uk
9421121522713350.000021com.business2community
9431121520823340.000011org.undp
9441121485339980.000007com.asos
9451121474314470.000019gov.dot
9461121462623810.000011com.motherjones
9471121386666240.000004com.collegehumor
94811213486136070.000002de.myblog
9491121272842570.000007com.pluralsight
9501121231823100.000011edu.brown
9511121227743760.000006com.dailykos
9521121156550670.000005ro.blogspot
953112115623830.000067com.nasdaq
9541121148526420.000010com.gopro
9551121059526610.000010pt.sapo
9561121047015760.000017org.mayoclinic
9571121032221100.000012net.sf
958112097648420.000037ru.liveinternet
9591120972020580.000012com.bigthink
9601120971015060.000018org.rubyforge
9611120951948330.000006edu.ku
9621120899513470.000021org.poynter
9631120854111510.000024org.oecd
9641120721468560.000004fm.ask
9651120719721470.000012com.creativemarket
966112068884050.000065jp.co.rakuten
9671120676218740.000014com.fineartamerica
9681120646325180.000010mil.af
9691120575110880.000026org.plos
9701120535114100.000019com.warnerbros
9711120521828400.000009com.pitchfork
9721120465221040.000012com.networkworld
9731120460390360.000003com.50webs
9741120356327680.000009com.channel4
9751120267350990.000005com.boardgamegeek
9761120232021390.000012com.hatenablog
9771120225020350.000013org.samaritans
9781120134114330.000019gov.congress
9791120035751150.000005in.linkd
9801120021124600.000010com.dreamstime
9811120009617710.000015com.c2
9821120000812010.000023de.ebay
9831119996913030.000021org.json
9841119973139350.000007us.zoom
9851119934910700.000027jp.jugem
9861119899825690.000010com.flaticon
9871119828130980.000009com.podomatic
9881119824984600.000003com.steemit
9891119693812770.000022com.mcafee
9901119662948120.000006com.viglink
9911119655816850.000015com.bt
9921119646455900.000005com.smore
993111962468740.000035es.amazon
9941119601812260.000023com.thedrum
995111957228280.000038com.list-manage2
9961119551654930.000005com.inhabitat
9971119509310000.000029org.telegram
9981119487114230.000019edu.unc
9991119468818560.000014com.yumpu
100011194633610.000686com.facebookmarketingpartners
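
If you want to process ranking rows like the ones listed above, a small parser is handy. The Python sketch below is only an illustration under stated assumptions. It assumes each row is split into whitespace-separated columns in this order: harmonic centrality rank, harmonic centrality value, PageRank rank, PageRank value, reversed domain name. That order is our reading of the listing, not an official specification of the downloadable ranks files. The names `RankedDomain` and `parse_rank_line` are hypothetical. The sketch also reverses the name labels back, as described for host names above.

```python
from dataclasses import dataclass


@dataclass
class RankedDomain:
    """One ranking row (hypothetical type; the column order is an assumption)."""

    hc_rank: int     # assumed: position when sorted by harmonic centrality
    hc_value: float  # assumed: harmonic centrality value
    pr_rank: int     # assumed: position when sorted by PageRank
    pr_value: float  # assumed: PageRank value
    name_rev: str    # reversed name, e.g. "au.com.news"

    @property
    def name(self) -> str:
        """Undo the label reversal: "au.com.news" -> "news.com.au".

        For host names, a leading "www." stripped before reversal cannot be restored.
        """
        return ".".join(reversed(self.name_rev.split(".")))


def parse_rank_line(line: str) -> RankedDomain:
    """Parse one whitespace-separated ranking row.

    Extra columns assumed to follow the name (e.g. a number-of-hosts column)
    are ignored.
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}: {line!r}")
    hc_rank, hc_value, pr_rank, pr_value, name_rev = fields[:5]
    return RankedDomain(int(hc_rank), float(hc_value), int(pr_rank), float(pr_value), name_rev)


row = parse_rank_line("764  11280055  1910   0.000013  au.com.news")
print(row.name, row.hc_rank, row.pr_rank)  # news.com.au 764 1910
```

Slicing to the first five fields keeps the sketch tolerant of the additional hosts-per-domain column introduced in this release, provided that column comes after the name.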

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data will be useful for your research on ranking, graph analysis, link spam detection, and more. Let us know about your results via Common Crawl’s Google Group!