We are pleased to announce a new release of host-level and domain-level web graphs based on the published crawls of August, September, and October 2017. These graphs, along with ranked lists of hosts and domains, follow the first (February, March, April 2017) and second (May, June, July 2017) web graph releases. Additional information about data formats, the processing pipeline, our objectives, and credits can be found in a prior announcement.

What’s new?

Here is a summary of notable aspects of this web graph release:

  • Tools and scripts to produce the web graph and rank the graph vertices are released as part of the project “cc-webgraph” on GitHub.
  • Compared to prior web graphs, the large size of this host-level graph (5.1 billion hosts) causes two changes:
    • the text dump of the graph is split into multiple files;
    • there is no page rank calculation at this time. We currently provide ranking by harmonic centrality and hope to add page rank values in the coming weeks.
    • Update Feb 7, 2018: the host-level ranks file now also contains the page ranks. Thanks to Sebastiano Vigna, one of the authors of the WebGraph framework, for the kind support!
  • For the domain-level graph, we provide ranking by both harmonic centrality and page rank.
  • The host-level graph contains a significant portion of hosts related to link spam clusters (possibly 50% or more of the hosts). This data set, therefore, is a useful tool for the study of link spam; from it, we have identified 300,000 spam domains. 2.25 billion hosts in the host-level webgraph belong to these domains. However, in the October crawl archive, these domains comprise less than 2% of the crawled HTML pages (56 million pages out of 3.6 billion) and less than 0.3% of the crawled domains (70,000 out of 26 million). We will start to penalize pages from these domains going forward.

Host-level graph

The graph consists of 5.1 billion nodes and 18.8 billion edges. The graph includes dangling nodes, i.e. hosts that have not been crawled but are pointed to by a link on a crawled page. The host names are reversed and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
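The host name transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by cc-webgraph; the function names `reverse_host` and `unreverse_host` are hypothetical, and any further normalization the real pipeline may apply is not covered.

```python
def reverse_host(host: str) -> str:
    """Strip a leading 'www.' and reverse the dot-separated labels."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


def unreverse_host(rev_host: str) -> str:
    """Turn a reversed name back into the usual order (a stripped 'www.' is not restored)."""
    return ".".join(reversed(rev_host.split(".")))


print(reverse_host("www.subdomain.example.com"))
print(unreverse_host("com.example.subdomain"))
```

Reversing the labels keeps hosts of the same domain next to each other when the vertex list is sorted.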

You can download the graph and the ranks of all 5.1 billion hosts from AWS S3 on the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2017-aug-sep-oct/hostgraph/. Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2017-aug-sep-oct/hostgraph/ as a prefix to access the files from anywhere.

The following files and formats are provided:

| Size | File | Description |
|---|---|---|
| 27.9 GB | vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 48 vertices files |
| 95.2 GB | edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 72 edge files |
| 37.9 GB | bvgraph.graph | graph in BVGraph format |
| 2.0 GB | bvgraph.offsets | |
| 2 kB | bvgraph.properties | |
| 56.9 GB | bvgraph-t.graph | transpose of the graph (outlinks mapped to inlinks) |
| 6.7 GB | bvgraph-t.offsets | |
| 2 kB | bvgraph-t.properties | |
| 1 kB | bvgraph.stats | WebGraph statistics |
| 74 GB | ranks.txt.gz | hosts ranked by harmonic centrality and pagerank |

To download the graph in text format, you need to download all files listed in the two path listings (vertices.paths.gz and edges.paths.gz).
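Expanding a path listing into download URLs might look like the sketch below. It assumes that each line of a listing is a path relative to https://data.commoncrawl.org/, which is not stated here and should be checked against the actual files; the helper name `urls_from_path_listing` and the sample paths are hypothetical.

```python
import gzip
from typing import List

# Assumption: each line of a path listing is relative to this prefix.
BASE_URL = "https://data.commoncrawl.org/"


def urls_from_path_listing(listing_gz: bytes, base_url: str = BASE_URL) -> List[str]:
    """Expand the bytes of a gzip-compressed path listing into full URLs."""
    text = gzip.decompress(listing_gz).decode("utf-8")
    return [base_url + line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical listing content for illustration; real file names will differ.
sample = gzip.compress(b"path/to/part-00000.txt.gz\npath/to/part-00001.txt.gz\n")
urls = urls_from_path_listing(sample)
for url in urls:
    print(url)
```

The resulting URLs can then be passed to any download tool.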

Domain-level graph

The domain graph was built by aggregating the host graph on the level of pay-level domains (PLDs). The extraction of PLDs is based on the public suffix list from publicsuffix.org. Only “ICANN” domains are accepted; “private” domains are not accepted (cf. section “divisions” in the documentation on publicsuffix.org). For example, foo.blogspot.com and commoncrawl.s3.amazonaws.com are not accepted as pay-level domains; they are aggregated as the domains blogspot.com and amazonaws.com, respectively.
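To illustrate the aggregation rule, here is a minimal sketch of PLD extraction. It is not the implementation used for this release: the suffix set is a tiny hand-picked excerpt standing in for the ICANN section of the public suffix list, wildcard and exception rules are ignored, and the name `pay_level_domain` is hypothetical.

```python
from typing import Optional

# Toy excerpt standing in for the ICANN section of the public suffix list.
# Private-section entries such as blogspot.com are deliberately absent.
ICANN_SUFFIXES = {"com", "org", "uk", "co.uk", "au", "com.au"}


def pay_level_domain(host: str) -> Optional[str]:
    """Return the longest matching public suffix plus one more label, or None."""
    labels = host.lower().split(".")
    # Candidates run from the longest suffix to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in ICANN_SUFFIXES:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # no known suffix in the toy list


print(pay_level_domain("foo.blogspot.com"))
print(pay_level_domain("www.bbc.co.uk"))
```

Because private suffixes are left out of the set, every host under blogspot.com collapses into the single domain blogspot.com.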

The domain-level graph has 93 million nodes and 1,258 million edges. 56 million nodes (60%) are dangling nodes. The largest strongly connected component covers 31 million nodes (33%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2017-aug-sep-oct/domaingraph/ or via https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2017-aug-sep-oct/domaingraph/.

Download files of the Common Crawl Aug/Sept/Oct 2017 domain-level webgraph

| Size | File | Description |
|---|---|---|
| 0.65 GB | vertices.txt.gz | nodes ⟨id, rev host⟩ |
| 5.0 GB | edges.txt.gz | edges ⟨from_id, to_id⟩ |
| 2.7 GB | bvgraph.graph | graph in BVGraph format |
| 0.09 GB | bvgraph.offsets | |
| 2 kB | bvgraph.properties | |
| 3.0 GB | bvgraph-t.graph | transpose of the graph (outlinks mapped to inlinks) |
| 0.14 GB | bvgraph-t.offsets | |
| 2 kB | bvgraph-t.properties | |
| 1 kB | bvgraph.stats | WebGraph statistics |
| 1.9 GB | ranks.txt.gz | harmonic centrality and pagerank |
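A ranks file can be read line by line. The sketch below assumes tab-separated columns in the order shown in the top-domains table on this page (harmonic centrality rank, harmonic centrality value, page rank position, page rank value, reversed name) and that header lines start with "#". Neither assumption is confirmed here, so inspect the file before relying on it; `RankRow` and `parse_ranks` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple


class RankRow(NamedTuple):
    hc_rank: int
    hc_value: float
    pr_rank: int
    pr_value: float
    rev_name: str


def parse_ranks(lines: Iterable[str]) -> Iterator[RankRow]:
    """Yield one RankRow per data line, skipping blank and '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        hc_rank, hc_value, pr_rank, pr_value, rev_name = line.split("\t")
        yield RankRow(int(hc_rank), float(hc_value), int(pr_rank), float(pr_value), rev_name)


# In practice the lines would come from the downloaded file, e.g.
#   with gzip.open("ranks.txt.gz", "rt", encoding="utf-8") as f: ...
sample_lines = ["1\t24624394\t1\t0.016162\tcom.facebook\n"]
rows = list(parse_ranks(sample_lines))
print(rows[0].rev_name, rows[0].pr_rank)
```

Streaming the rows through a generator avoids loading the whole file into memory.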

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 93 million domains is available for download.

Top 1000 domains ranked by harmonic centrality (Aug/Sept/Oct 2017)

| Harmonic centrality rank | HC value | Page rank | Page rank value | Reversed domain name |
|---|---|---|---|---|
| 1 | 24624394 | 1 | 0.016162 | com.facebook |
| 2 | 22489142 | 3 | 0.009682 | com.google |
| 3 | 22230666 | 4 | 0.009431 | com.twitter |
| 4 | 21920250 | 2 | 0.011812 | com.googleapis |
| 5 | 21446462 | 5 | 0.008148 | com.youtube |
| 6 | 19158272 | 6 | 0.005169 | org.gmpg |
| 7 | 18918596 | 7 | 0.003893 | com.instagram |
| 8 | 18780074 | 9 | 0.003043 | com.linkedin |
| 9 | 18322072 | 10 | 0.002469 | org.wordpress |
| 10 | 18163650 | 20 | 0.001295 | org.wikipedia |
| 11 | 17977714 | 16 | 0.001665 | com.pinterest |
| 12 | 17934400 | 23 | 0.001146 | com.wordpress |
| 13 | 17769666 | 30 | 0.000940 | com.blogspot |
| 14 | 17754502 | 15 | 0.001705 | com.apple |
| 15 | 17668520 | 19 | 0.001310 | com.gravatar |
| 16 | 17559336 | 21 | 0.001278 | com.vimeo |
| 17 | 17418572 | 11 | 0.002239 | com.adobe |
| 18 | 17384530 | 26 | 0.001022 | com.microsoft |
| 19 | 17371196 | 36 | 0.000819 | com.amazon |
| 20 | 17338218 | 14 | 0.001979 | com.macromedia |
| 21 | 17291274 | 56 | 0.000617 | com.flickr |
| 22 | 17174674 | 48 | 0.000707 | com.tumblr |
| 23 | 17140484 | 46 | 0.000716 | be.youtu |
| 24 | 17134992 | 39 | 0.000792 | gl.goo |
| 25 | 17053582 | 27 | 0.001006 | com.paypal |
| 26 | 16948294 | 72 | 0.000500 | com.yahoo |
| 27 | 16940548 | 73 | 0.000498 | ly.bit |
| 28 | 16907622 | 25 | 0.001066 | com.amazonaws |
| 29 | 16905114 | 35 | 0.000836 | me.wp |
| 30 | 16860996 | 97 | 0.000263 | com.nytimes |
| 31 | 16822716 | 38 | 0.000794 | com.github |
| 32 | 16802262 | 32 | 0.000884 | io.github |
| 33 | 16747451 | 80 | 0.000376 | org.creativecommons |
| 34 | 16713617 | 107 | 0.000223 | com.googleusercontent |
| 35 | 16708091 | 86 | 0.000325 | com.weebly |
| 36 | 16691336 | 110 | 0.000213 | com.blogger |
| 37 | 16675803 | 62 | 0.000602 | net.cloudfront |
| 38 | 16652535 | 42 | 0.000756 | com.huffingtonpost |
| 39 | 16626251 | 78 | 0.000398 | eu.europa |
| 40 | 16602228 | 77 | 0.000421 | org.mozilla |
| 41 | 16600054 | 154 | 0.000144 | org.wikimedia |
| 42 | 16598199 | 130 | 0.000165 | net.slideshare |
| 43 | 16569992 | 129 | 0.000166 | com.myspace |
| 44 | 16564424 | 22 | 0.001153 | net.fbcdn |
| 45 | 16558860 | 79 | 0.000378 | com.medium |
| 46 | 16542265 | 31 | 0.000902 | com.cloudflare |
| 47 | 16520668 | 66 | 0.000558 | org.w3 |
| 48 | 16520419 | 44 | 0.000732 | com.cnn |
| 49 | 16509535 | 17 | 0.001657 | com.bootstrapcdn |
| 50 | 16488554 | 102 | 0.000239 | com.android |
| 51 | 16467545 | 192 | 0.000118 | com.photobucket |
| 52 | 16418457 | 85 | 0.000342 | com.soundcloud |
| 53 | 16413233 | 165 | 0.000135 | com.ebay |
| 54 | 16409927 | 214 | 0.000103 | com.about |
| 55 | 16407381 | 127 | 0.000169 | org.apache |
| 56 | 16404812 | 70 | 0.000505 | com.wp |
| 57 | 16404114 | 274 | 0.000081 | gov.nasa |
| 58 | 16362115 | 105 | 0.000230 | com.yelp |
| 59 | 16346871 | 88 | 0.000315 | co.t |
| 60 | 16344031 | 248 | 0.000090 | com.livejournal |
| 61 | 16332901 | 145 | 0.000151 | uk.co.bbc |
| 62 | 16319548 | 112 | 0.000198 | com.issuu |
| 63 | 16317299 | 285 | 0.000078 | com.cnbc |
| 64 | 16312613 | 132 | 0.000164 | org.ietf |
| 65 | 16312538 | 106 | 0.000224 | com.dropbox |
| 66 | 16303111 | 243 | 0.000091 | uk.co.telegraph |
| 67 | 16302093 | 300 | 0.000075 | com.appspot |
| 68 | 16302037 | 135 | 0.000158 | com.forbes |
| 69 | 16294126 | 133 | 0.000162 | net.sourceforge |
| 70 | 16285841 | 71 | 0.000502 | com.gstatic |
| 71 | 16256975 | 234 | 0.000093 | org.npr |
| 72 | 16240414 | 84 | 0.000342 | com.wix |
| 73 | 16232633 | 180 | 0.000126 | com.live |
| 74 | 16224880 | 143 | 0.000152 | com.spotify |
| 75 | 16220531 | 358 | 0.000061 | com.alexa |
| 76 | 16220429 | 40 | 0.000787 | com.statcounter |
| 77 | 16219025 | 29 | 0.000952 | com.squarespace |
| 78 | 16216103 | 52 | 0.000667 | com.mashable |
| 79 | 16206276 | 288 | 0.000077 | edu.mit |
| 80 | 16197041 | 185 | 0.000120 | com.oracle |
| 81 | 16196376 | 50 | 0.000691 | com.bing |
| 82 | 16184193 | 146 | 0.000150 | com.imgur |
| 83 | 16181303 | 331 | 0.000067 | gov.loc |
| 84 | 16166139 | 181 | 0.000126 | com.disqus |
| 85 | 16162488 | 47 | 0.000714 | net.akamaihd |
| 86 | 16160177 | 184 | 0.000124 | com.typepad |
| 87 | 16156797 | 226 | 0.000096 | com.mozilla |
| 88 | 16149031 | 266 | 0.000083 | me.about |
| 89 | 16147987 | 115 | 0.000196 | com.baidu |
| 90 | 16144928 | 104 | 0.000233 | com.reddit |
| 91 | 16139722 | 152 | 0.000145 | com.theguardian |
| 92 | 16139491 | 431 | 0.000050 | edu.ucla |
| 93 | 16137233 | 403 | 0.000054 | com.evernote |
| 94 | 16133815 | 117 | 0.000192 | org.archive |
| 95 | 16129264 | 153 | 0.000144 | org.gnu |
| 96 | 16123931 | 286 | 0.000078 | com.foursquare |
| 97 | 16123753 | 325 | 0.000069 | com.buzzfeed |
| 98 | 16119679 | 219 | 0.000099 | com.techcrunch |
| 99 | 16111561 | 176 | 0.000128 | com.imdb |
| 100 | 16095828 | 376 | 0.000058 | com.slate |
| 101 | 16084632 | 189 | 0.000119 | edu.stanford |
| 102 | 16083049 | 8 | 0.003658 | com.godaddy |
| 103 | 16081934 | 348 | 0.000063 | com.mysql |
| 104 | 16081543 | 186 | 0.000120 | com.wsj |
| 105 | 16081479 | 329 | 0.000068 | com.w3schools |
| 106 | 16077599 | 149 | 0.000148 | com.etsy |
| 107 | 16075216 | 471 | 0.000047 | uk.ac.ox |
| 108 | 16070390 | 235 | 0.000093 | com.nbcnews |
| 109 | 16067264 | 271 | 0.000082 | com.wired |
| 110 | 16058778 | 223 | 0.000097 | com.tinyurl |
| 111 | 16052972 | 241 | 0.000091 | com.cnet |
| 112 | 16047962 | 292 | 0.000077 | com.reuters |
| 113 | 16037078 | 37 | 0.000802 | com.fb |
| 114 | 16025773 | 554 | 0.000042 | edu.princeton |
| 115 | 16021981 | 76 | 0.000429 | com.paypalobjects |
| 116 | 16009160 | 291 | 0.000077 | com.meetup |
| 117 | 16007189 | 228 | 0.000096 | com.businessinsider |
| 118 | 16007189 | 169 | 0.000132 | com.digg |
| 119 | 16000273 | 290 | 0.000077 | edu.harvard |
| 120 | 15992746 | 415 | 0.000052 | org.nodejs |
| 121 | 15987529 | 224 | 0.000097 | gov.ca |
| 122 | 15981478 | 122 | 0.000186 | com.feedburner |
| 123 | 15981363 | 158 | 0.000142 | com.opera |
| 124 | 15978615 | 125 | 0.000171 | com.twimg |
| 125 | 15978398 | 412 | 0.000053 | edu.berkeley |
| 126 | 15975429 | 168 | 0.000132 | com.dribbble |
| 127 | 15967402 | 268 | 0.000082 | com.bloomberg |
| 128 | 15967162 | 310 | 0.000071 | uk.co.dailymail |
| 129 | 15962655 | 284 | 0.000078 | com.msn |
| 130 | 15962653 | 445 | 0.000049 | org.worldbank |
| 131 | 15959796 | 116 | 0.000196 | com.jquery |
| 132 | 15959089 | 81 | 0.000376 | net.doubleclick |
| 133 | 15952785 | 68 | 0.000520 | org.schema |
| 134 | 15946127 | 308 | 0.000072 | com.bbc |
| 135 | 15945993 | 252 | 0.000090 | com.aol |
| 136 | 15943161 | 212 | 0.000105 | com.go |
| 137 | 15941345 | 254 | 0.000089 | com.usatoday |
| 138 | 15933198 | 492 | 0.000046 | com.theverge |
| 139 | 15931703 | 277 | 0.000080 | com.ibm |
| 140 | 15926311 | 90 | 0.000284 | com.addthis |
| 141 | 15926283 | 128 | 0.000166 | com.eventbrite |
| 142 | 15924888 | 504 | 0.000045 | co.g |
| 143 | 15924857 | 231 | 0.000095 | com.staticflickr |
| 144 | 15921145 | 302 | 0.000074 | com.time |
| 145 | 15919014 | 210 | 0.000105 | com.surveymonkey |
| 146 | 15918521 | 208 | 0.000106 | com.washingtonpost |
| 147 | 15916491 | 414 | 0.000053 | org.pbs |
| 148 | 15916267 | 332 | 0.000067 | uk.co.blogspot |
| 149 | 15915021 | 164 | 0.000138 | com.zendesk |
| 150 | 15912031 | 350 | 0.000062 | com.images-amazon |
| 151 | 15911961 | 141 | 0.000153 | gov.nih |
| 152 | 15911136 | 295 | 0.000076 | com.latimes |
| 153 | 15907787 | 270 | 0.000082 | au.com.google |
| 154 | 15904282 | 460 | 0.000047 | gov.wa |
| 155 | 15904089 | 526 | 0.000044 | org.eclipse |
| 156 | 15900204 | 430 | 0.000050 | uk.co.guardian |
| 157 | 15897748 | 45 | 0.000720 | me.fb |
| 158 | 15897738 | 318 | 0.000070 | com.gmail |
| 159 | 15891803 | 505 | 0.000045 | com.pixabay |
| 160 | 15888886 | 375 | 0.000058 | org.un |
| 161 | 15884396 | 420 | 0.000051 | com.variety |
| 162 | 15882443 | 120 | 0.000187 | com.list-manage |
| 163 | 15881628 | 560 | 0.000041 | edu.washington |
| 164 | 15880410 | 113 | 0.000198 | uk.co.google |
| 165 | 15877859 | 642 | 0.000039 | org.chromium |
| 166 | 15866419 | 557 | 0.000042 | org.sciencemag |
| 167 | 15861689 | 591 | 0.000040 | com.chron |
| 168 | 15858537 | 429 | 0.000050 | org.python |
| 169 | 15856599 | 813 | 0.000036 | it.scoop |
| 170 | 15851608 | 298 | 0.000076 | com.example |
| 171 | 15844579 | 888 | 0.000033 | edu.gatech |
| 172 | 15833612 | 800 | 0.000037 | com.arstechnica |
| 173 | 15829201 | 1091 | 0.000026 | com.panoramio |
| 174 | 15826638 | 845 | 0.000035 | edu.illinois |
| 175 | 15822824 | 213 | 0.000104 | com.rackcdn |
| 176 | 15821250 | 225 | 0.000096 | net.php |
| 177 | 15819582 | 313 | 0.000071 | org.acm |
| 178 | 15816637 | 461 | 0.000047 | com.scribd |
| 179 | 15815448 | 468 | 0.000047 | com.dropboxusercontent |
| 180 | 15812590 | 368 | 0.000059 | com.kickstarter |
| 181 | 15811866 | 551 | 0.000042 | com.quora |
| 182 | 15811831 | 118 | 0.000188 | com.jimdo |
| 183 | 15810448 | 215 | 0.000102 | gov.ftc |
| 184 | 15807293 | 303 | 0.000074 | com.stackoverflow |
| 185 | 15801389 | 194 | 0.000117 | org.drupal |
| 186 | 15801000 | 101 | 0.000239 | org.bbb |
| 187 | 15799715 | 644 | 0.000039 | com.withgoogle |
| 188 | 15798347 | 69 | 0.000516 | com.vk |
| 189 | 15797645 | 919 | 0.000032 | edu.utah |
| 190 | 15795447 | 395 | 0.000055 | com.theatlantic |
| 191 | 15792110 | 406 | 0.000054 | edu.cornell |
| 192 | 15789477 | 886 | 0.000033 | com.flipboard |
| 193 | 15789159 | 433 | 0.000050 | com.cisco |
| 194 | 15782967 | 276 | 0.000080 | fr.free |
| 195 | 15780935 | 456 | 0.000048 | com.getpocket |
| 196 | 15774946 | 28 | 0.000982 | ru.yandex |
| 197 | 15774174 | 202 | 0.000110 | uk.co.amazon |
| 198 | 15773176 | 346 | 0.000063 | gov.whitehouse |
| 199 | 15769731 | 807 | 0.000036 | edu.columbia |
| 200 | 15769040 | 459 | 0.000047 | com.venturebeat |
| 201 | 15768429 | 363 | 0.000060 | com.webs |
| 202 | 15762156 | 848 | 0.000035 | edu.yale |
| 203 | 15760325 | 493 | 0.000046 | com.zdnet |
| 204 | 15759707 | 632 | 0.000039 | org.kernel |
| 205 | 15755584 | 895 | 0.000032 | com.businessweek |
| 206 | 15754511 | 656 | 0.000038 | com.economist |
| 207 | 15751466 | 916 | 0.000032 | com.jetbrains |
| 208 | 15744171 | 938 | 0.000031 | uk.org.tate |
| 209 | 15743842 | 606 | 0.000040 | com.libsyn |
| 210 | 15742882 | 142 | 0.000152 | com.windowsphone |
| 211 | 15742594 | 447 | 0.000049 | au.gov.nsw |
| 212 | 15734592 | 407 | 0.000053 | com.inc |
| 213 | 15733632 | 991 | 0.000029 | com.googlecode |
| 214 | 15732803 | 187 | 0.000120 | com.mailchimp |
| 215 | 15729191 | 55 | 0.000619 | com.people |
| 216 | 15728871 | 200 | 0.000112 | net.behance |
| 217 | 15727994 | 343 | 0.000064 | com.wiley |
| 218 | 15726723 | 359 | 0.000061 | com.wikihow |
| 219 | 15725173 | 49 | 0.000698 | com.googletagmanager |
| 220 | 15724381 | 95 | 0.000279 | de.google |
| 221 | 15722940 | 87 | 0.000324 | com.qq |
| 222 | 15720539 | 1011 | 0.000028 | com.storify |
| 223 | 15720127 | 612 | 0.000039 | com.box |
| 224 | 15714779 | 175 | 0.000129 | jp.co.yahoo |
| 225 | 15710657 | 835 | 0.000035 | org.unicode |
| 226 | 15710342 | 694 | 0.000038 | com.vogue |
| 227 | 15709514 | 532 | 0.000043 | com.samsung |
| 228 | 15708779 | 240 | 0.000092 | com.salesforce |
| 229 | 15704117 | 453 | 0.000049 | com.deviantart |
| 230 | 15703085 | 870 | 0.000034 | com.ecwid |
| 231 | 15701297 | 314 | 0.000071 | com.barnesandnoble |
| 232 | 15699018 | 372 | 0.000058 | com.oreilly |
| 233 | 15698383 | 900 | 0.000032 | org.arxiv |
| 234 | 15698294 | 1012 | 0.000028 | edu.rice |
| 235 | 15697335 | 111 | 0.000208 | com.shopify |
| 236 | 15694572 | 317 | 0.000070 | com.tripod |
| 237 | 15693750 | 506 | 0.000045 | com.wikia |
| 238 | 15692548 | 327 | 0.000068 | com.dailymotion |
| 239 | 15692268 | 1119 | 0.000025 | com.diigo |
| 240 | 15691299 | 590 | 0.000040 | com.nationalgeographic |
| 241 | 15690673 | 140 | 0.000153 | com.tripadvisor |
| 242 | 15690554 | 379 | 0.000057 | com.office |
| 243 | 15689753 | 147 | 0.000149 | com.stumbleupon |
| 244 | 15687628 | 57 | 0.000615 | com.bleacherreport |
| 245 | 15686291 | 263 | 0.000085 | gov.cdc |
| 246 | 15682833 | 978 | 0.000029 | com.discogs |
| 247 | 15679965 | 960 | 0.000030 | ms.1drv |
| 248 | 15679700 | 966 | 0.000030 | com.hbo |
| 249 | 15678267 | 868 | 0.000034 | org.eff |
| 250 | 15678215 | 470 | 0.000047 | gov.dot |
| 251 | 15674964 | 660 | 0.000038 | gov.fcc |
| 252 | 15674650 | 705 | 0.000038 | com.tinypic |
| 253 | 15673888 | 566 | 0.000041 | com.vice |
| 254 | 15673605 | 330 | 0.000068 | com.skype |
| 255 | 15669074 | 387 | 0.000056 | com.cbsnews |
| 256 | 15667839 | 518 | 0.000044 | com.blackberry |
| 257 | 15664466 | 926 | 0.000031 | org.ieee |
| 258 | 15663927 | 802 | 0.000037 | com.googleblog |
| 259 | 15662453 | 898 | 0.000032 | gov.ky |
| 260 | 15661561 | 342 | 0.000064 | int.who |
| 261 | 15658654 | 735 | 0.000037 | com.unsplash |
| 262 | 15658214 | 737 | 0.000037 | com.indiatimes |
| 263 | 15655416 | 482 | 0.000046 | com.git-scm |
| 264 | 15647888 | 366 | 0.000060 | com.ted |
| 265 | 15647883 | 227 | 0.000096 | com.mapquest |
| 266 | 15645919 | 1073 | 0.000026 | com.sublimetext |
| 267 | 15644112 | 864 | 0.000034 | gov.mo |
| 268 | 15643731 | 570 | 0.000041 | com.foxnews |
| 269 | 15643086 | 805 | 0.000036 | com.livestream |
| 270 | 15635833 | 360 | 0.000061 | com.springer |
| 271 | 15635534 | 811 | 0.000036 | gov.michigan |
| 272 | 15633180 | 474 | 0.000046 | com.npmjs |
| 273 | 15631559 | 967 | 0.000030 | com.ning |
| 274 | 15630137 | 578 | 0.000041 | com.java |
| 275 | 15630040 | 546 | 0.000043 | de.blogspot |
| 276 | 15629832 | 933 | 0.000031 | gov.mt |
| 277 | 15626210 | 972 | 0.000029 | in.blogspot |
| 278 | 15626170 | 669 | 0.000038 | com.sfgate |
| 279 | 15621493 | 1101 | 0.000026 | com.trello |
| 280 | 15621434 | 1076 | 0.000026 | org.amnesty |
| 281 | 15620756 | 963 | 0.000030 | com.hatenablog |
| 282 | 15618941 | 499 | 0.000045 | com.ft |
| 283 | 15617683 | 834 | 0.000035 | com.marketwatch |
| 284 | 15615909 | 150 | 0.000145 | com.ytimg |
| 285 | 15615850 | 832 | 0.000036 | com.yellowpages |
| 286 | 15614805 | 1042 | 0.000027 | edu.psu |
| 287 | 15614789 | 862 | 0.000034 | gov.oregon |
| 288 | 15613749 | 1049 | 0.000027 | edu.ucsd |
| 289 | 15612913 | 1273 | 0.000022 | com.codeplex |
| 290 | 15612778 | 922 | 0.000031 | com.ubuntu |
| 291 | 15611819 | 1161 | 0.000024 | edu.purdue |
| 292 | 15610229 | 405 | 0.000054 | com.goodreads |
| 293 | 15609712 | 1896 | 0.000017 | com.fifa |
| 294 | 15608683 | 218 | 0.000099 | com.wufoo |
| 295 | 15607880 | 275 | 0.000080 | com.hubspot |
| 296 | 15607085 | 894 | 0.000033 | gov.nist |
| 297 | 15605771 | 345 | 0.000063 | com.sxsw |
| 298 | 15605337 | 530 | 0.000043 | gov.state |
| 299 | 15603425 | 964 | 0.000030 | edu.upenn |
| 300 | 15602070 | 1084 | 0.000026 | com.alibaba |
| 301 | 15601468 | 1025 | 0.000028 | com.boston |
| 302 | 15599982 | 851 | 0.000035 | com.timeanddate |
| 303 | 15598270 | 675 | 0.000038 | org.aarp |
| 304 | 15597733 | 195 | 0.000117 | de.amazon |
| 305 | 15595192 | 344 | 0.000063 | com.prnewswire |
| 306 | 15594528 | 1143 | 0.000025 | com.posterous |
| 307 | 15590525 | 795 | 0.000037 | com.atlassian |
| 308 | 15588435 | 1298 | 0.000022 | net.comcast |
| 309 | 15587626 | 1029 | 0.000028 | edu.wisc |
| 310 | 15587161 | 1067 | 0.000027 | com.qz |
| 311 | 15586347 | 481 | 0.000046 | com.intel |
| 312 | 15585861 | 1188 | 0.000024 | com.instapaper |
| 313 | 15585841 | 935 | 0.000031 | com.politico |
| 314 | 15585103 | 1385 | 0.000020 | com.ehow |
| 315 | 15584724 | 962 | 0.000030 | com.pcmag |
| 316 | 15584332 | 1068 | 0.000026 | uk.ac.cam |
| 317 | 15584130 | 1210 | 0.000023 | com.vox |
| 318 | 15579785 | 399 | 0.000054 | edu.cmu |
| 319 | 15574628 | 386 | 0.000056 | com.symantec |
| 320 | 15573666 | 385 | 0.000056 | com.snapchat |
| 321 | 15573517 | 438 | 0.000049 | com.entrepreneur |
| 322 | 15573187 | 497 | 0.000045 | com.nature |
| 323 | 15571901 | 792 | 0.000037 | com.weather |
| 324 | 15569480 | 968 | 0.000030 | com.gizmodo |
| 325 | 15566198 | 974 | 0.000029 | com.nintendo |
| 326 | 15565922 | 901 | 0.000032 | com.lifehacker |
| 327 | 15564424 | 487 | 0.000046 | com.xrea |
| 328 | 15563896 | 159 | 0.000141 | com.weibo |
| 329 | 15562578 | 979 | 0.000029 | edu.utexas |
| 330 | 15562258 | 264 | 0.000084 | com.getbootstrap |
| 331 | 15560672 | 390 | 0.000056 | com.businesswire |
| 332 | 15557969 | 1097 | 0.000026 | com.hotmail |
| 333 | 15554890 | 821 | 0.000036 | us.imageshack |
| 334 | 15552184 | 294 | 0.000076 | net.themeforest |
| 335 | 15550866 | 338 | 0.000065 | org.debian |
| 336 | 15550791 | 131 | 0.000164 | com.bandcamp |
| 337 | 15549249 | 822 | 0.000036 | net.daringfireball |
| 338 | 15547805 | 1230 | 0.000023 | edu.northwestern |
| 339 | 15545647 | 1134 | 0.000025 | com.discovery |
| 340 | 15544046 | 2268 | 0.000014 | com.wikidot |
| 341 | 15540991 | 990 | 0.000029 | com.indiegogo |
| 342 | 15539810 | 997 | 0.000028 | co.vine |
| 343 | 15539129 | 880 | 0.000033 | com.engadget |
| 344 | 15538263 | 934 | 0.000031 | com.stackexchange |
| 345 | 15537892 | 256 | 0.000088 | com.smugmug |
| 346 | 15537882 | 548 | 0.000043 | com.newyorker |
| 347 | 15536239 | 1907 | 0.000017 | edu.cuny |
| 348 | 15535776 | 1028 | 0.000028 | id.co.blogspot |
| 349 | 15533870 | 197 | 0.000116 | com.wixsite |
| 350 | 15533376 | 846 | 0.000035 | tv.ustream |
| 351 | 15533354 | 882 | 0.000033 | fr.blogspot |
| 352 | 15532951 | 119 | 0.000188 | com.constantcontact |
| 353 | 15532138 | 629 | 0.000039 | us.mn.state |
| 354 | 15532035 | 2187 | 0.000014 | com.twitpic |
| 355 | 15531885 | 495 | 0.000046 | com.moz |
| 356 | 15530153 | 1513 | 0.000019 | org.khanacademy |
| 357 | 15529512 | 333 | 0.000067 | mp.j |
| 358 | 15529460 | 952 | 0.000030 | gov.fbi |
| 359 | 15528428 | 512 | 0.000044 | com.giphy |
| 360 | 15527665 | 1005 | 0.000028 | au.net.abc |
| 361 | 15526603 | 1297 | 0.000022 | ie.thejournal |
| 362 | 15525234 | 599 | 0.000040 | com.uk |
| 363 | 15524467 | 925 | 0.000031 | gov.usgs |
| 364 | 15524258 | 876 | 0.000033 | edu.umich |
| 365 | 15523358 | 906 | 0.000032 | org.change |
| 366 | 15522235 | 625 | 0.000039 | org.redcross |
| 367 | 15520950 | 258 | 0.000087 | to.amzn |
| 368 | 15518905 | 883 | 0.000033 | gov.maryland |
| 369 | 15518473 | 364 | 0.000060 | com.fastcompany |
| 370 | 15516021 | 13 | 0.002020 | com.wixstatic |
| 371 | 15515303 | 1378 | 0.000020 | org.owasp |
| 372 | 15514772 | 1206 | 0.000023 | com.googledrive |
| 373 | 15513946 | 986 | 0.000029 | org.plos |
| 374 | 15513913 | 1525 | 0.000019 | org.cambridge |
| 375 | 15513359 | 869 | 0.000034 | ca.cbc |
| 376 | 15510150 | 871 | 0.000033 | com.mlb |
| 377 | 15509159 | 1043 | 0.000027 | com.dell |
| 378 | 15509019 | 1153 | 0.000024 | com.nba |
| 379 | 15508147 | 980 | 0.000029 | com.pingdom |
| 380 | 15507985 | 872 | 0.000033 | com.slack |
| 381 | 15507614 | 661 | 0.000038 | com.chicagotribune |
| 382 | 15506774 | 1826 | 0.000018 | org.gnome |
| 383 | 15505375 | 171 | 0.000131 | fr.google |
| 384 | 15504823 | 229 | 0.000095 | com.scorecardresearch |
| 385 | 15504231 | 874 | 0.000033 | com.searchengineland |
| 386 | 15501664 | 1199 | 0.000023 | com.mtv |
| 387 | 15500750 | 2182 | 0.000014 | ca.uwaterloo |
| 388 | 15499743 | 347 | 0.000063 | org.whatbrowser |
| 389 | 15498496 | 1324 | 0.000021 | com.nike |
| 390 | 15497371 | 1719 | 0.000019 | net.boingboing |
| 391 | 15496711 | 1328 | 0.000021 | edu.jhu |
| 392 | 15496579 | 1322 | 0.000021 | edu.academia |
| 393 | 15494830 | 853 | 0.000035 | com.sun |
| 394 | 15493312 | 652 | 0.000038 | br.com.uol |
| 395 | 15492698 | 58 | 0.000615 | me.m |
| 396 | 15492117 | 289 | 0.000077 | com.hp |
| 397 | 15491322 | 1095 | 0.000026 | com.manta |
| 398 | 15490293 | 1831 | 0.000018 | com.blogs |
| 399 | 15489363 | 713 | 0.000037 | com.sciencedaily |
| 400 | 15487527 | 545 | 0.000043 | com.geocities |
| 401 | 15485237 | 816 | 0.000036 | gov.census |
| 402 | 15484388 | 33 | 0.000876 | com.messenger |
| 403 | 15480933 | 890 | 0.000033 | ca.blogspot |
| 404 | 15480347 | 1268 | 0.000022 | com.jigsy |
| 405 | 15480054 | 312 | 0.000071 | us.icio |
| 406 | 15480048 | 392 | 0.000056 | com.force |
| 407 | 15478407 | 829 | 0.000036 | us.pa.state |
| 408 | 15477943 | 1149 | 0.000024 | com.target |
| 409 | 15476319 | 555 | 0.000042 | uk.co.independent |
| 410 | 15474849 | 441 | 0.000049 | com.squareup |
| 411 | 15474804 | 1135 | 0.000025 | com.prezi |
| 412 | 15474620 | 440 | 0.000049 | gov.noaa |
| 413 | 15474277 | 178 | 0.000127 | org.icann |
| 414 | 15474275 | 1278 | 0.000022 | com.elpais |
| 415 | 15473471 | 995 | 0.000029 | org.altervista |
| 416 | 15473455 | 1141 | 0.000025 | edu.si |
| 417 | 15472680 | 1394 | 0.000020 | com.gawker |
| 418 | 15471696 | 335 | 0.000066 | com.bitly |
| 419 | 15470693 | 918 | 0.000032 | fm.last |
| 420 | 15470552 | 1181 | 0.000024 | com.hulu |
| 421 | 15470341 | 1501 | 0.000019 | tv.periscope |
| 422 | 15469316 | 1074 | 0.000026 | edu.umn |
| 423 | 15468872 | 1917 | 0.000017 | com.sky |
| 424 | 15468611 | 193 | 0.000117 | it.google |
| 425 | 15465583 | 1175 | 0.000024 | com.pcworld |
| 426 | 15462509 | 1168 | 0.000024 | com.teespring |
| 427 | 15460362 | 341 | 0.000065 | com.booking |
| 428 | 15458683 | 1272 | 0.000022 | com.upwork |
| 429 | 15458633 | 1096 | 0.000026 | de.spiegel |
| 430 | 15456464 | 457 | 0.000047 | org.doi |
| 431 | 15453451 | 987 | 0.000029 | com.vimeopro |
| 432 | 15451494 | 299 | 0.000075 | ly.ow |
| 433 | 15450701 | 830 | 0.000036 | com.fastcodesign |
| 434 | 15450422 | 137 | 0.000155 | com.eepurl |
| 435 | 15449841 | 1001 | 0.000028 | net.researchgate |
| 436 | 15449833 | 1302 | 0.000022 | com.salon |
| 437 | 15449395 | 961 | 0.000030 | org.unesco |
| 438 | 15448777 | 2017 | 0.000016 | com.ikea |
| 439 | 15447906 | 1015 | 0.000028 | com.airbnb |
| 440 | 15447624 | 2263 | 0.000014 | com.wikispaces |
| 441 | 15447260 | 1275 | 0.000022 | edu.uchicago |
| 442 | 15446865 | 1169 | 0.000024 | gov.nyc |
| 443 | 15446299 | 610 | 0.000039 | es.com.blogspot |
| 444 | 15445625 | 849 | 0.000035 | com.bandsintown |
| 445 | 15443204 | 689 | 0.000038 | com.cbslocal |
| 446 | 15442794 | 2080 | 0.000015 | com.speakerdeck |
| 447 | 15442432 | 1815 | 0.000018 | edu.virginia |
| 448 | 15442282 | 1813 | 0.000018 | com.csmonitor |
| 449 | 15441838 | 1448 | 0.000020 | com.vanityfair |
| 450 | 15440110 | 1239 | 0.000023 | com.scientificamerican |
| 451 | 15439458 | 936 | 0.000031 | com.thenextweb |
| 452 | 15439164 | 1955 | 0.000017 | edu.msu |
| 453 | 15438677 | 1326 | 0.000021 | com.freewebs |
| 454 | 15438547 | 850 | 0.000035 | com.shutterstock |
| 455 | 15435909 | 2395 | 0.000013 | org.nypl |
| 456 | 15434875 | 439 | 0.000049 | com.delicious |
| 457 | 15434850 | 454 | 0.000048 | com.sciencedirect |
| 458 | 15433820 | 913 | 0.000032 | gov.wi |
| 459 | 15433080 | 1292 | 0.000022 | edu.unc |
| 460 | 15432813 | 336 | 0.000066 | com.technorati |
| 461 | 15431314 | 237 | 0.000093 | ca.google |
| 462 | 15428443 | 1281 | 0.000022 | com.lulu |
| 463 | 15428051 | 1038 | 0.000028 | com.over-blog |
| 464 | 15427519 | 1395 | 0.000020 | edu.usc |
| 465 | 15427022 | 1189 | 0.000024 | uk.co.theregister |
| 466 | 15426856 | 1852 | 0.000018 | uk.ac.ed |
| 467 | 15426682 | 520 | 0.000044 | com.githubusercontent |
| 468 | 15425947 | 521 | 0.000044 | org.hbr |
| 469 | 15425391 | 1066 | 0.000027 | com.istockphoto |
| 470 | 15425349 | 653 | 0.000038 | gov.copyright |
| 471 | 15424494 | 859 | 0.000034 | com.msdn |
| 472 | 15424044 | 2140 | 0.000015 | fr.lefigaro |
| 473 | 15424014 | 1362 | 0.000020 | au.com.smh |
| 474 | 15421339 | 607 | 0.000040 | org.bitbucket |
| 475 | 15421040 | 53 | 0.000656 | com.atdmt |
| 476 | 15419617 | 860 | 0.000034 | com.ooyala |
| 477 | 15419493 | 1329 | 0.000021 | com.searchenginewatch |
| 478 | 15418920 | 136 | 0.000156 | com.xing |
| 479 | 15418361 | 843 | 0.000035 | com.psychologytoday |
| 480 | 15417558 | 1250 | 0.000023 | com.thinkwithgoogle |
| 481 | 15416891 | 2112 | 0.000015 | au.com.theaustralian |
| 482 | 15416295 | 2276 | 0.000014 | ca.ualberta |
| 483 | 15415618 | 2241 | 0.000014 | com.softpedia |
| 484 | 15415548 | 2180 | 0.000015 | de.bild |
| 485 | 15413194 | 2053 | 0.000016 | org.moma |
| 486 | 15412912 | 1184 | 0.000024 | com.cbs |
| 487 | 15412881 | 930 | 0.000031 | com.ggpht |
| 488 | 15410659 | 637 | 0.000039 | gov.nps |
| 489 | 15408840 | 2212 | 0.000014 | edu.gmu |
| 490 | 15407643 | 253 | 0.000090 | com.fc2 |
| 491 | 15407570 | 2142 | 0.000015 | edu.asu |
| 492 | 15407046 | 1402 | 0.000020 | edu.duke |
| 493 | 15405694 | 2120 | 0.000015 | com.gamespot |
| 494 | 15405561 | 1527 | 0.000019 | com.nfl |
| 495 | 15403868 | 1008 | 0.000028 | com.zoho |
| 496 | 15402737 | 529 | 0.000043 | cn.com.sina |
| 497 | 15398565 | 1858 | 0.000018 | edu.umd |
| 498 | 15398497 | 2107 | 0.000015 | com.yfrog |
| 499 | 15398058 | 1124 | 0.000025 | com.globo |
| 500 | 15397979 | 367 | 0.000060 | com.photoshelter |
| 501 | 15396257 | 2252 | 0.000014 | com.mentalfloss |
| 502 | 15395215 | 565 | 0.000041 | fr.amazon |
| 503 | 15394332 | 2103 | 0.000015 | com.yolasite |
| 504 | 15393973 | 1490 | 0.000019 | au.com.blogspot |
| 505 | 15392266 | 1257 | 0.000022 | com.billboard |
| 506 | 15391272 | 937 | 0.000031 | us.fl.state |
| 507 | 15390893 | 465 | 0.000047 | com.fortune |
| 508 | 15389726 | 1167 | 0.000024 | com.forrester |
| 509 | 15388867 | 174 | 0.000129 | com.youtube-nocookie |
| 510 | 15387526 | 1151 | 0.000024 | org.filezilla-project |
| 511 | 15386771 | 2191 | 0.000014 | com.pastebin |
| 512 | 15386754 | 1265 | 0.000022 | it.blogspot |
| 513 | 15385485 | 41 | 0.000777 | org.networkadvertising |
| 514 | 15384939 | 1870 | 0.000018 | ru.narod |
| 515 | 15384906 | 743 | 0.000037 | com.att |
| 516 | 15384404 | 1930 | 0.000017 | com.computerworld |
| 517 | 15383297 | 803 | 0.000037 | gov.justice |
| 518 | 15383264 | 3100 | 0.000010 | com.xanga |
| 519 | 15380743 | 2580 | 0.000012 | org.kiva |
| 520 | 15379900 | 2001 | 0.000016 | nl.blogspot |
| 521 | 15379898 | 2284 | 0.000014 | com.googlepages |
| 522 | 15379595 | 3058 | 0.000010 | cc.co |
| 523 | 15379486 | 1479 | 0.000019 | com.colourlovers |
| 524 | 15379092 | 183 | 0.000125 | com.nielsen |
| 525 | 15378656 | 1427 | 0.000020 | fr.lemonde |
| 526 | 15378212 | 1036 | 0.000028 | org.postimg |
| 527 | 15378043 | 242 | 0.000091 | es.google |
| 528 | 15377701 | 923 | 0.000031 | com.gofundme |
| 529 | 15377000 | 2554 | 0.000012 | edu.unl |
| 530 | 15376832 | 297 | 0.000076 | nl.google |
| 531 | 15376563 | 921 | 0.000031 | org.postgresql |
| 532 | 15376372 | 216 | 0.000102 | com.myshopify |
| 533 | 15375315 | 602 | 0.000040 | gov.senate |
| 534 | 15375181 | 63 | 0.000602 | com.shareaholic |
| 535 | 15375078 | 1063 | 0.000027 | com.gigaom |
| 536 | 15374432 | 863 | 0.000034 | com.steampowered |
| 537 | 15373201 | 361 | 0.000061 | edu.nyu |
| 538 | 15372956 | 1841 | 0.000018 | com.techradar |
| 539 | 15372552 | 917 | 0.000032 | com.sagepub |
| 540 | 15371944 | 638 | 0.000039 | com.quantcast |
| 541 | 15371504 | 2453 | 0.000013 | com.citysearch |
| 542 | 15370774 | 1360 | 0.000021 | com.semrush |
| 543 | 15370511 | 1376 | 0.000020 | com.deezer |
| 544 | 15370278 | 485 | 0.000046 | br.com.google |
| 545 | 15370278 | 1017 | 0.000028 | com.mckinsey |
| 546 | 15368963 | 910 | 0.000032 | com.redhat |
| 547 | 15368455 | 958 | 0.000030 | net.azurewebsites |
| 548 | 15367444 | 854 | 0.000034 | com.prweb |
| 549 | 15366697 | 355 | 0.000062 | io.atom |
| 550 | 15366349 | 1125 | 0.000025 | gov.uspto |
| 551 | 15366186 | 1643 | 0.000019 | com.ibtimes |
| 552 | 15366092 | 2384 | 0.000013 | ly.visual |
| 553 | 15365287 | 1356 | 0.000021 | de.zeit |
| 554 | 15365040 | 1116 | 0.000025 | com.ycombinator |
| 555 | 15362746 | 162 | 0.000139 | org.joomla |
| 556 | 15362001 | 373 | 0.000058 | com.naver |
| 557 | 15361416 | 1462 | 0.000020 | com.pexels |
| 558 | 15359899 | 569 | 0.000041 | com.webmd |
| 559 | 15358171 | 1477 | 0.000019 | com.nbc |
| 560 | 15357605 | 887 | 0.000033 | net.leadpages |
| 561 | 15357271 | 126 | 0.000170 | ru.mail |
| 562 | 15357180 | 617 | 0.000039 | org.openstreetmap |
| 563 | 15356987 | 1887 | 0.000017 | edu.ufl |
| 564 | 15356792 | 1050 | 0.000027 | org.tigris |
| 565 | 15355908 | 1177 | 0.000024 | com.smashingmagazine |
| 566 | 15355445 | 1355 | 0.000021 | com.ssllabs |
| 567 | 15354953 | 1361 | 0.000021 | se.haxx |
| 568 | 15354144 | 1307 | 0.000021 | com.econsultancy |
| 569 | 15353069 | 98 | 0.000256 | jp.co.google |
| 570 | 15351292 | 458 | 0.000047 | gov.fda |
| 571 | 15349006 | 1289 | 0.000022 | com.timeout |
| 572 | 15348715 | 971 | 0.000029 | gov.nh |
| 573 | 15348496 | 2203 | 0.000014 | com.bestbuy |
| 574 | 15346338 | 2238 | 0.000014 | com.codecademy |
| 575 | 15346329 | 1256 | 0.000022 | com.whitepages |
| 576 | 15346261 | 1483 | 0.000019 | com.philly |
| 577 | 15346115 | 2130 | 0.000015 | edu.caltech |
| 578 | 15344481 | 1918 | 0.000017 | com.deadline |
| 579 | 15343922 | 977 | 0.000029 | org.iana |
| 580 | 15341679 | 2157 | 0.000015 | com.gq |
| 581 | 15341443 | 1201 | 0.000023 | com.bostonglobe |
| 582 | 15340856 | 1935 | 0.000017 | com.starwars |
| 583 | 15339524 | 2249 | 0.000014 | com.instructables |
| 584 | 15338407 | 1359 | 0.000021 | com.xkcd |
| 585 | 15338034 | 2459 | 0.000013 | edu.bu |
| 586 | 15336970 | 1995 | 0.000016 | edu.indiana |
| 587 | 15336662 | 1301 | 0.000022 | com.animoto |
| 588 | 15336583 | 1382 | 0.000020 | com.vmware |
| 589 | 15335607 | 1270 | 0.000022 | com.zazzle |
| 590 | 15334096 | 549 | 0.000042 | gov.hhs |
| 591 | 15333272 | 444 | 0.000049 | com.marriott |
| 592 | 15332387 | 3068 | 0.000010 | cc.tiny |
| 593 | 15331948 | 2716 | 0.000011 | com.sophos |
| 594 | 15331939 | 1833 | 0.000018 | com.angelfire |
| 595 | 15331003 | 3404 | 0.000009 | com.blog |
| 596 | 15330787 | 1152 | 0.000024 | com.com |
| 597 | 15329429 | 1051 | 0.000027 | com.superpages |
| 598 | 15327531 | 1285 | 0.000022 | com.nokia |
| 599 | 15326681 | 1884 | 0.000018 | com.me |
| 600 | 15325341 | 2195 | 0.000014 | uk.ac.ucl |
| 601 | 15324942 | 196 | 0.000116 | com.googleadservices |
| 602 | 15324850 | 239 | 0.000092 | jp.co.amazon |
| 603 | 15324725 | 293 | 0.000077 | com.list-manage1 |
| 604 | 15324671 | 989 | 0.000029 | com.hootsuite |
| 605 | 15324436 | 2073 | 0.000015 | edu.umass |
| 606 | 15320812 | 2989 | 0.000010 | org.icrc |
| 607 | 15320731 | 897 | 0.000032 | com.formstack |
| 608 | 15320554 | 1062 | 0.000027 | com.nydailynews |
| 609 | 15320137 | 2458 | 0.000013 | org.gimp |
| 610 | 15320032 | 2341 | 0.000013 | edu.uiuc |
| 611 | 15319217 | 2702 | 0.000011 | com.klout |
| 612 | 15318501 | 1919 | 0.000017 | org.aclu |
| 613 | 15318473 | 170 | 0.000131 | jp.ameblo |
| 614 | 15318003 | 2375 | 0.000013 | com.canalblog |
| 615 | 15317657 | 1006 | 0.000028 | org.unicef |
| 616 | 15317145 | 388 | 0.000056 | in.co.google |
| 617 | 15316269 | 2654 | 0.000012 | com.technet |
| 618 | 15316214 | 1881 | 0.000018 | edu.rutgers |
| 619 | 15314472 | 1837 | 0.000018 | com.gettyimages |
| 620 | 15313957 | 1946 | 0.000017 | com.thedailybeast |
| 621 | 15313931 | 2498 | 0.000012 | uk.co.metro |
| 622 | 15313538 | 550 | 0.000042 | com.cargocollective |
| 623 | 15313474 | 2478 | 0.000012 | gov.cia |
| 624 | 15313434 | 875 | 0.000033 | com.pinimg |
| 625 | 15313153 | 1333 | 0.000021 | com.brandyourself |
| 626 | 15313042 | 488 | 0.000046 | com.nwsource |
| 627 | 15312657 | 2682 | 0.000011 | com.nabble |
| 628 | 15311990 | 2410 | 0.000013 | com.fiverr |
| 629 | 15311590 | 1366 | 0.000020 | com.reference |
| 630 | 15311208 | 2243 | 0.000014 | edu.uci |
| 631 | 15310681 | 2169 | 0.000015 | com.denverpost |
| 632 | 15310451 | 519 | 0.000044 | gov.usa |
| 633 | 15310254 | 946 | 0.000031 | org.iso |
| 634 | 15309261 | 354 | 0.000062 | com.newrelic |
| 635 | 15308400 | 573 | 0.000041 | com.herokuapp |
| 636 | 15308142 | 836 | 0.000035 | tv.twitch |
| 637 | 15308029 | 1552 | 0.000019 | com.mac |
| 638 | 15307787 | 374 | 0.000058 | com.bizjournals |
| 639 | 15307496 | 1924 | 0.000017 | org.rubyonrails |
| 640 | 15307485 | 727 | 0.000037 | com.usnews |
| 641 | 15306050 | 902 | 0.000032 | com.fotolia |
| 642 | 15305652 | 3165 | 0.000009 | com.laughingsquid |
| 643 | 15305451 | 349 | 0.000063 | com.bigcartel |
| 644 | 15304790 | 2010 | 0.000016 | com.ign |
| 645 | 15304773 | 3586 | 0.000008 | edu.syr |
| 646 | 15302826 | 1960 | 0.000017 | edu.ucdavis |
| 647 | 15301597 | 1271 | 0.000022 | com.playstation |
| 648 | 15301507 | 369 | 0.000059 | gov.irs |
| 649 | 15300774 | 3377 | 0.000009 | com.answers |
| 650 | 15300201 | 2642 | 0.000012 | edu.hawaii |
| 651 | 15300118 | 2064 | 0.000016 | com.topsy |
| 652 | 15300099 | 2316 | 0.000014 | org.videolan |
| 653 | 15299454 | 1311 | 0.000021 | com.underconsideration |
| 654 | 15298943 | 1941 | 0.000017 | com.investopedia |
| 655 | 15298571 | 2725 | 0.000011 | com.hubpages |
| 656 | 15298437 | 2699 | 0.000011 | org.greenpeace |
| 657 | 15298405 | 2299 | 0.000014 | org.webkit |
| 658 | 15298109 | 1304 | 0.000022 | com.accenture |
| 659 | 15297707 | 1882 | 0.000018 | com.howstuffworks |
| 660 | 15297485 | 992 | 0.000029 | us.ma.state |
| 661 | 15296995 | 1089 | 0.000026 | com.netflix |
| 662 | 15296641 | 3260 | 0.000009 | com.bigthink |
| 663 | 15296190 | 1232 | 0.000023 | com.firefox |
| 664 | 15296189 | 449 | 0.000049 | com.angieslist |
| 665 | 15295815 | 1218 | 0.000023 | com.techrepublic |
| 666 | 15295556 | 3520 | 0.000008 | edu.ucsc |
| 667 | 15295331 | 2044 | 0.000016 | ca.utoronto |
| 668 | 15294445 | 424 | 0.000051 | com.houzz |
| 669 | 15293381 | 4486 | 0.000007 | com.skyrock |
| 670 | 15293286 | 2247 | 0.000014 | com.urbandictionary |
| 671 | 15292886 | 2815 | 0.000011 | org.donorschoose |
| 672 | 15292847 | 1306 | 0.000021 | com.wfaa |
| 673 | 15292368 | 1018 | 0.000028 | com.justgiving |
| 674 | 15292016 | 1913 | 0.000017 | com.getfirebug |
| 675 | 15291955 | 1160 | 0.000024 | com.king5 |
| 676 | 15291413 | 1877 | 0.000018 | com.us |
| 677 | 15291199 | 1225 | 0.000023 | uk.co.mirror |
| 678 | 15290829 | 852 | 0.000035 | com.reverbnation |
| 679 | 15290632 | 842 | 0.000035 | gov.sba |
| 680 | 15289434 | 2049 | 0.000016 | com.ndtv |
| 681 | 15288265 | 2970 | 0.000010 | edu.brown |
| 682 | 15288029 | 2379 | 0.000013 | org.coursera |
| 683 | 15287961 | 2190 | 0.000014 | com.readwriteweb |
| 684 | 15287447 | 1611 | 0.000019 | ru.spb |
| 685 | 15286885 | 1238 | 0.000023 | ly.snip |
| 686 | 15285464 | 1349 | 0.000021 | br.com.blogspot |
| 687 | 15285356 | 1947 | 0.000017 | com.ey |
| 688 | 15285088 | 416 | 0.000052 | gov.usda |
| 689 | 15284567 | 334 | 0.000066 | kr.flic |
| 690 | 15284507 | 1856 | 0.000018 | uk.co.thesun |
| 691 | 15282524 | 563 | 0.000041 | gov.sec |
| 692 | 15281588 | 1140 | 0.000025 | com.merchantcircle |
| 693 | 15281481 | 1873 | 0.000018 | edu.tamu |
| 694 | 15281105 | 2795 | 0.000011 | com.techsmith |
| 695 | 15279280 | 1224 | 0.000023 | net.vnexpress |
| 696 | 15279179 | 2269 | 0.000014 | edu.osu |
| 697 | 15278501 | 1166 | 0.000024 | com.mixcloud |
| 698 | 15278297 | 3369 | 0.000009 | edu.rochester |
| 699 | 15277931 | 156 | 0.000143 | jp.ne.hatena |
| 700 | 15277896 | 3307 | 0.000009 | org.oxfam |
| 701 | 15277882 | 1236 | 0.000023 | com.examiner |
| 702 | 15277496 | 1832 | 0.000018 | int.wipo |
| 703 | 15277085 | 2087 | 0.000015 | edu.arizona |
| 704 | 15276954 | 808 | 0.000036 | com.hostgator |
| 705 | 15276571 | 1139 | 0.000025 | jp.blogspot |
| 706 | 15276267 | 3446 | 0.000009 | com.allbusiness |
| 707 | 15275991 | 1895 | 0.000017 | com.msnbc |
| 708 | 15275573 | 469 | 0.000047 | com.wunderground |
| 709 | 15274401 | 1438 | 0.000020 | org.craigslist |
| 710 | 15273961 | 483 | 0.000046 | gov.ny |
| 711 | 15273916 | 1984 | 0.000016 | com.udemy |
| 712 | 15272770 | 1007 | 0.000028 | com.cafepress |
| 713 | 15271907 | 1325 | 0.000021 | ca.kijiji |
| 714 | 15270985 | 2469 | 0.000013 | com.tutsplus |
| 715 | 15269961 | 2591 | 0.000012 | com.wolfram |
| 716 | 15269647 | 1818 | 0.000018 | de.welt |
| 717 | 15268436 | 2468 | 0.000013 | com.makezine |
| 718 | 15267469 | 2907 | 0.000010 | com.friendfeed |
| 719 | 15267315 | 847 | 0.000035 | net.openid |
| 720 | 15267116 | 2491 | 0.000012 | it.repubblica |
| 721 | 15266908 | 1380 | 0.000020 | it.justpaste |
| 722 | 15264794 | 1039 | 0.000028 | com.500px |
| 723 | 15264333 | 1369 | 0.000020 | com.bizcommunity |
| 724 | 15264318 | 2172 | 0.000015 | com.pbworks |
| 725 | 15263255 | 1277 | 0.000022 | com.gumroad |
| 726 | 15262891 | 1196 | 0.000023 | com.mysanantonio |
| 727 | 15262769 | 1389 | 0.000020 | com.foxbusiness |
| 728 | 15262355 | 1524 | 0.000019 | com.canva |
| 729 | 15261885 | 2337 | 0.000013 | com.glamour |
| 730 | 15260928 | 3282 | 0.000009 | com.avast |
| 731 | 15260868 | 1129 | 0.000025 | de.heise |
| 732 | 15260080 | 3333 | 0.000009 | com.news |
| 733 | 15260065 | 3269 | 0.000009 | gd.is |
| 734 | 15258443 | 2043 | 0.000016 | uk.ac.lse |
| 735 | 15258381 | 2461 | 0.000013 | com.tv |
| 736 | 15258369 | 428 | 0.000050 | com.typeform |
| 737 | 15257788 | 2295 | 0.000014 | com.theonion |
| 738 | 15257354 | 1455 | 0.000020 | io.codepen |
| 739 | 15257242 | 370 | 0.000059 | com.adweek |
| 740 | 15257204 | 957 | 0.000030 | org.mediawiki |
| 741 | 15256000 | 2255 | 0.000014 | org.acs |
| 742 | 15255931 | 1902 | 0.000017 | com.getsatisfaction |
| 743 | 15254860 | 2733 | 0.000011 | com.expedia |
| 744 | 15254608 | 260 | 0.000086 | com.windows |
| 745 | 15254582 | 496 | 0.000046 | gov.ed |
| 746 | 15253723 | 1295 | 0.000022 | com.sohu |
| 747 | 15252995 | 1058 | 0.000027 | org.oecd |
| 748 | 15252244 | 1555 | 0.000019 | org.ap |
| 749 | 15252208 | 1241 | 0.000023 | com.city-data |
| 750 | 15251628 | 443 | 0.000049 | gov.epa |
| 751 | 15250980 | 3583 | 0.000008 | edu.missouri |
| 752 | 15249746 | 2189 | 0.000014 | com.ask |
| 753 | 15249300 | 1880 | 0.000018 | com.norton |
| 754 | 15249295 | 2164 | 0.000015 | edu.ncsu |
| 755 | 15248729 | 817 | 0.000036 | net.launchpad |
| 756 | 15248611 | 3534 | 0.000008 | com.scobleizer |
| 757 | 15248299 | 912 | 0.000032 | com.redbubble |
| 758 | 15245757 | 1547 | 0.000019 | com.newsweek |
| 759 | 15245385 | 1521 | 0.000019 | com.blogtalkradio |
| 760 | 15245282 | 1287 | 0.000022 | com.garmin |
| 761 | 15245212 | 1027 | 0.000028 | com.hollywoodreporter |
| 762 | 15245183 | 1637 | 0.000019 | com.yandex |
| 763 | 15245175 | 2278 | 0.000014 | com.foodnetwork |
| 764 | 15243922 | 2092 | 0.000015 | com.si |
| 765 | 15243100 | 2194 | 0.000014 | com.mediabistro |
| 766 | 15242393 | 1183 | 0.000024 | com.mediafire |
| 767 | 15242360 | 134 | 0.000161 | com.parallels |
| 768 | 15242173 | 2110 | 0.000015 | com.delta |
| 769 | 15241483 | 337 | 0.000066 | org.microformats |
| 770 | 15241345 | 1442 | 0.000020 | th.co.lazada |
| 771 | 15241235 | 2123 | 0.000015 | com.ford |
| 772 | 15240434 | 1865 | 0.000018 | com.allthingsd |
| 773 | 15240082 | 1299 | 0.000022 | com.technologyreview |
| 774 | 15239583 | 2193 | 0.000014 | com.freep |
| 775 | 15239251 | 356 | 0.000062 | com.teamviewer |
| 776 | 15239065 | 2652 | 0.000012 | edu.pitt |
| 777 | 15238897 | 207 | 0.000107 | org.purl |
| 778 | 15238532 | 1850 | 0.000018 | org.openoffice |
| 779 | 15237565 | 2457 | 0.000013 | com.podbean |
| 780 | 15237162 | 371 | 0.000059 | com.nypost |
| 781 | 15237103 | 2429 | 0.000013 | uk.co.timesonline |
| 782 | 15235925 | 2074 | 0.000015 | com.real |
| 783 | 15235282 | 2363 | 0.000013 | com.oxforddictionaries |
| 784 | 15234189 | 2331 | 0.000013 | ch.ethz |
| 785 | 15233592 | 3334 | 0.000009 | org.notepad-plus-plus |
| 786 | 15233550 | 2292 | 0.000014 | com.nhl |
| 787 | 15233542 | 2449 | 0.000013 | net.jsfiddle |
| 788 | 15232983 | 3327 | 0.000009 | net.battle |
| 789 | 15232781 | 2555 | 0.000012 | com.chrome |
| 790 | 15232780 | 1209 | 0.000023 | org.iihs |
| 791 | 15232749 | 1180 | 0.000024 | org.fao |
| 792 | 15232078 | 2280 | 0.000014 | org.gutenberg |
| 793 | 15231865 | 1939 | 0.000017 | com.ssrn |
| 794 | 15231276 | 3438 | 0.000009 | net.wordle |
| 795 | 15231085 | 1441 | 0.000020 | net.brownbook |
| 796 | 15230217 | 3648 | 0.000008 | com.dreamstime |
| 797 | 15229969 | 3435 | 0.000009 | be.blogspot |
| 798 | 15229857 | 953 | 0.000030 | us.nm.state |
| 799 | 15229697 | 1901 | 0.000017 | org.weforum |
| 800 | 15229356 | 2346 | 0.000013 | org.lifehack |
| 801 | 15229182 | 3444 | 0.000009 | edu.umaryland |
| 802 | 15228293 | 528 | 0.000043 | com.list-manage2 |
| 803 | 15227693 | 382 | 0.000057 | net.windows |
| 804 | 15227013 | 2496 | 0.000012 | edu.vt |
| 805 | 15226905 | 3239 | 0.000009 | com.voanews |
| 806 | 15225763 | 2425 | 0.000013 | com.shutterfly |
| 807 | 15225225 | 2762 | 0.000011 | edu.ucsf |
| 808 | 15225176 | 1962 | 0.000017 | com.starwoodhotels |
| 809 | 15224438 | 435 | 0.000050 | com.ea |
| 810 | 15223961 | 911 | 0.000032 | com.gartner |
| 811 | 15223450 | 4115 | 0.000007 | it.libero |
| 812 | 15222361 | 826 | 0.000036 | org.w |
| 813 | 15222226 | 1879 | 0.000018 | org.slashdot |
| 814 | 15222185 | 1467 | 0.000019 | org.cancer |
| 815 | 15221037 | 396 | 0.000055 | jp.ne.sakura |
| 816 | 15220922 | 1358 | 0.000021 | org.json |
| 817 | 15220813 | 1093 | 0.000026 | com.bufferapp |
| 818 | 15219986 | 1357 | 0.000021 | com.unity3d |
| 819 | 15219634 | 2215 | 0.000014 | net.earthlink |
| 820 | 15219035 | 2059 | 0.000016 | com.digitaltrends |
| 821 | 15218571 | 2158 | 0.000015 | uk.co.huffingtonpost |
| 822 | 15217732 | 2950 | 0.000010 | edu.tufts |
| 823 | 15217533 | 1420 | 0.000020 | com.wsoctv |
| 824 | 15217060 | 2435 | 0.000013 | com.thefreedictionary |
| 825 | 15216593 | 353 | 0.000062 | net.freenode |
| 826 | 15216469 | 1337 | 0.000021 | com.kgw |
| 827 | 15215867 | 1871 | 0.000018 | gov.uscourts |
| 828 | 15215804 | 1108 | 0.000025 | com.steamcommunity |
| 829 | 15214737 | 2050 | 0.000016 | com.kaspersky |
| 830 | 15214400 | 1428 | 0.000020 | com.ripple |
| 831 | 15214306 | 1413 | 0.000020 | mil.navy |
| 832 | 15212636 | 3281 | 0.000009 | edu.emory |
| 833 | 15212571 | 858 | 0.000034 | com.linksynergy |
| 834 | 15212046 | 3884 | 0.000008 | com.4shared |
| 835 | 15211666 | 1392 | 0.000020 | com.chamberofcommerce |
| 836 | 15211429 | 2271 | 0.000014 | com.chronicle |
| 837 | 15211268 | 1071 | 0.000026 | be.google |
| 838 | 15211197 | 3057 | 0.000010 | com.squidoo |
| 839 | 15211139 | 3033 | 0.000010 | com.esquire |
| 840 | 15211059 | 2122 | 0.000015 | com.azcentral |
| 841 | 15210567 | 1418 | 0.000020 | ly.list |
| 842 | 15210307 | 2364 | 0.000013 | com.sony |
| 843 | 15210129 | 909 | 0.000032 | com.deloitte |
| 844 | 15209068 | 628 | 0.000039 | com.webnode |
| 845 | 15208892 | 1367 | 0.000020 | net.yahoo |
| 846 | 15208875 | 2952 | 0.000010 | com.threadless |
| 847 | 15208572 | 1411 | 0.000020 | org.whatwg |
| 848 | 15208528 | 4053 | 0.000007 | edu.udel |
| 849 | 15208357 | 381 | 0.000057 | ru.vkontakte |
| 850 | 15208257 | 3102 | 0.000010 | ca.mcgill |
| 851 | 15208136 | 1261 | 0.000022 | com.zillow |
8521520733727440.000011edu.uga
853152073329850.000029com.163
8541520691511500.000024com.techtarget
8551520649322810.000014org.wnyc
8561520608711060.000025com.yoast
8571520582530360.000010pt.sapo
8581520565919920.000016org.jstor
8591520564411580.000024com.fedex
8601520470013520.000021org.oxfordjournals
8611520457513470.000021com.thomsonreuters
8621520452011820.000024org.gnupg
8631520441319650.000017org.ampproject
8641520427722620.000014com.css-tricks
865152037283890.000056com.monster
866152032876330.000039com.cdbaby
8671520291811940.000023com.business2community
8681520233970970.000005com.zimbio
8691520225933950.000009com.macrumors
8701520164030100.000010edu.dartmouth
8711520162741140.000007gr.blogspot
872152012765890.000040com.friendster
8731520117125360.000012org.computer
8741520099910530.000027gov.dhs
8751520075813650.000020com.bmj
8761520026412280.000023com.nymag
877152002563830.000057com.youku
8781520009020890.000015org.npmjs
8791519996722820.000014ca.ubc
8801519958313050.000021com.oup
8811519912537910.000008org.laptop
8821519869738620.000008org.wikiquote
8831519865021470.000015com.gopro
8841519837520140.000016me.flavors
8851519714113870.000020com.hotfrog
8861519670126350.000012com.aviary
8871519657311700.000024com.googleapps
8881519626122170.000014com.popsugar
8891519600720510.000016com.patch
8901519597514510.000020com.communitywalk
8911519576944240.000007com.pbase
8921519551110480.000027us.wi.state
8931519518736400.000008tv.wat
8941519459520380.000016com.autodesk
8951519409130500.000010edu.oregonstate
896151939991600.000141info.aboutads
8971519354013310.000021gov.uscis
8981519320923520.000013com.harpercollins
8991519313032370.000009com.blackgirlscode
9001519248446260.000006com.boredpanda
901151923755830.000040com.hilton
9021519228768290.000005com.weheartit
9031519183921620.000015com.ifttt
9041519173857800.000005com.rebelmouse
9051519100931970.000009int.esa
9061519029421550.000015org.raspberrypi
9071519029339410.000008com.pixlr
908151900787990.000037com.adage
9091518988721000.000015com.netvibes
9101518988731730.000009edu.iastate
911151896804800.000046com.taobao
9121518929713360.000021com.today
9131518866434880.000009edu.usf
9141518865820090.000016com.thestar
9151518826022050.000014org.r-project
9161518800814060.000020com.sap
9171518771530400.000010org.charitywater
9181518720424830.000012tv.blip
9191518708421170.000015com.livestrong
9201518704623210.000013com.britannica
9211518598834080.000009au.com.theage
9221518598115610.000019com.mercurynews
9231518546618970.000017com.foxsports
9241518501212960.000022org.apa
9251518377933480.000009tv.arte
9261518352133020.000009com.bhphotovideo
9271518349112660.000022com.comodo
928151833638010.000037com.brightcove
9291518296234830.000009uk.org.bhf
9301518252341550.000007mx.unam
9311518155311110.000025com.convertkit
9321518153322070.000014com.geekwire
9331518141426130.000012nl.xs4all
9341518121728520.000011com.newsvine
9351518079024720.000013com.w3techs
9361518037721920.000014com.newscientist
9371517998131510.000009com.popsci
9381517990718350.000018gov.in
9391517943511950.000023com.gotomeeting
9401517916124240.000013com.fox
9411517884913400.000021com.rollingstone
9421517789513460.000021co.angel
9431517752525440.000012com.avg
944151770881090.000213com.yimg
945151733434640.000047com.zenfolio
9461517234223660.000013org.kde
947151720159390.000031uk.co.eventbrite
9481517187240820.000007net.minecraft
9491517171866270.000005fr.unblog
9501517144626160.000012com.ezinearticles
9511517111619560.000017com.macworld
9521517109210260.000028com.uservoice
9531517047235190.000008cc.arduino
9541517031824360.000013net.oauth
955151700614460.000049com.gotowebinar
9561517003026070.000012us.zoom
9571516970319720.000016com.wetransfer
958151692215970.000040ru.google
959151690014500.000049com.meyerweb
9601516887634200.000009org.britishcouncil
9611516868833940.000009net.deviantart
9621516862311720.000024com.intuit
9631516843810880.000026com.americanexpress
9641516842620080.000016com.amzn
9651516777620600.000016net.speedtest
9661516750532060.000009se.blogspot
967151673654090.000053com.mapbox
9681516702436940.000008edu.uky
9691516670710220.000028gov.va
9701516666340380.000007com.secondlife
9711516655120610.000016com.comcast
9721516593414960.000019com.espn
9731516573410540.000027com.walmart
9741516562141690.000007org.edublogs
9751516505321280.000015com.mydomain
976151645789820.000029us.tx.state
9771516437241600.000007ca.yorku
9781516429410560.000027jp.ne.goo
979151641038120.000036com.emarketer
9801516308618220.000018uk.gov.legislation
9811516290336680.000008tt.db
9821516264359400.000005edu.ua
9831516235733910.000009com.mlive
9841516220513480.000021com.arabianbusiness
9851516213419420.000017org.c-span
9861516131519160.000017uk.co.wired
9871516107624820.000012org.worldcat
9881515960823140.000014net.daum
9891515921526310.000012com.chow
9901515918233010.000009org.ibiblio
9911515904112230.000023gov.archives
9921515859125390.000012com.univision
9931515814819380.000017com.aliexpress
9941515704043770.000007com.vodpod
9951515698318530.000018com.merriam-webster
9961515690824270.000013org.hrc
9971515631925160.000012com.crunchbase
9981515616233510.000009fr.cnrs
9991515594714310.000020com.ktvb
1000151555583260.000068pl.google

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data proves useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!