Host- and Domain-Level Web Graphs May, June/July and August 2022

We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of May, June/July and August 2022. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases. You may also visit the projects cc-webgraph and cc-pyspark which include all scripts and tools required to construct the graphs. Instructions to explore the graphs in the webgraph format are given in our collection of webgraph notebooks. See below for a summary of changes and improvements implemented for the current web graph release.

Changes, improvements and bug fixes

  • Unicode internationalized domain names are now always converted into their ASCII equivalents (IDNA). This is ensured for node labels in the host-level webgraph (see cc-pyspark#35) and consequently also for the domain-level webgraph, where non-ASCII characters were previously replaced by question marks (see cc-webgraph#6).
  • The nodes of the domain graph are now strictly sorted lexicographically by node label (the reverse domain name). This should allow for more efficient compression of the list of domain nodes.
  • The strict sorting was implemented to address a bug (cc-webgraph#3) which could cause duplicated nodes (two or more nodes with the same label) in the domain graph.
  • The domain graph now includes domain names equal to multi-part public suffixes. Previously, the assumption was that names of registered domains are exactly one level below any ICANN suffix in the public suffix list, and host names equal to multi-part suffixes (those including at least one dot) were excluded. Such host names are now included, e.g. gov.uk, freight.aero or altoadige.it. Beyond requiring a syntactically valid domain name with a valid TLD or public suffix, no further validation (e.g. a DNS lookup) is performed for any domain name, so invalid domain names may also be included. For more details, see cc-webgraph#1.
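
The IDNA conversion mentioned in the first item can be illustrated with a minimal sketch. It assumes Python's built-in "idna" codec (which implements IDNA 2003); the function name to_ascii_host is hypothetical and the actual cc-pyspark code may work differently.

```python
def to_ascii_host(host: str) -> str:
    """Convert a (possibly internationalized) host name to its ASCII form.

    Assumption: Python's built-in "idna" codec is good enough for
    illustration; the real pipeline may use another IDNA implementation.
    """
    return host.encode("idna").decode("ascii").lower()


# A Unicode label becomes a Punycode label with the "xn--" prefix,
# while plain ASCII names pass through (lowercased).
print(to_ascii_host("bücher.example"))
print(to_ascii_host("Example.COM"))
```

With labels normalized this way, two spellings of the same internationalized name map to one node label instead of producing labels with question marks.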

Host-level graph

The graph consists of 449 million nodes and 2.69 billion edges. Hyperlinks, HTTP redirects and link headers are all used as edges to span the graph. All types of links are included, including purely “technical” ones pointing to images, JavaScript libraries, web fonts, etc. However, only host names with a valid IANA TLD are used. Consequently, URLs with an IP address as host component are not taken into account for building the host-level graph.

There are 389 million dangling nodes (86.6%), and the largest strongly connected component contains 46.4 million nodes (10.3%). Dangling nodes stem from:

  • hosts that have not been crawled, yet are pointed to from a link on a crawled page
  • hosts without any links pointing to a different host name
  • or hosts which returned only an error page (e.g. HTTP 404)
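
In graph terms, a dangling node is a node without any outgoing edge. The following sketch shows this on a tiny, made-up edge list; the function name dangling_nodes and the toy data are assumptions for illustration and are not part of the cc-webgraph tooling.

```python
def dangling_nodes(num_nodes, edges):
    """Return the ids of nodes that have no outgoing edge to another node."""
    # Self-links are ignored: a host linking only to itself is still dangling.
    has_outlink = {src for src, dst in edges if src != dst}
    return [node for node in range(num_nodes) if node not in has_outlink]


# Toy graph with 4 nodes: node 2 is only linked to, node 3 is isolated.
edges = [(0, 1), (1, 0), (1, 2)]
dangling = dangling_nodes(4, edges)
share = 100.0 * len(dangling) / 4
print(dangling, f"{share:.1f}%")

# The same arithmetic applied to the reported numbers (in millions).
print(round(100 * 389 / 449, 1))
```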

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
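
A minimal sketch of this notation, assuming a hypothetical helper named reverse_host (the actual implementation in cc-pyspark may handle more edge cases):

```python
def reverse_host(host: str) -> str:
    """Strip a leading "www." and reverse the order of the host name labels."""
    host = host.lower().rstrip(".")
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


print(reverse_host("www.subdomain.example.com"))
```

Reversing the labels places hosts of the same domain next to each other when the node labels are sorted.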

You can download the graph and the ranks of all 449 million hosts from AWS S3 at the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2022-may-jun-aug/host/ (this requires an AWS account). Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2022-may-jun-aug/host/ as a prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 96 gzip-compressed files, which are listed in two path listings: one for the nodes (vertices) and one for the edges (arcs). First, download a path listing and decompress it using “gzip -d” or “gunzip”. Adding the prefix s3://commoncrawl/ or https://data.commoncrawl.org/ to each line of the path listing yields the list of URLs needed to download the entire graph.
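
The step from a path listing to download URLs can be sketched as follows. The function name listing_to_urls and the sample paths are hypothetical; the sketch only assumes that a listing is a gzip-compressed text file with one path per line, as described above.

```python
import gzip
import io

PREFIX = "https://data.commoncrawl.org/"


def listing_to_urls(gz_bytes, prefix=PREFIX):
    """Decompress a gzipped path listing and prepend the prefix to each line."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as listing:
        return [prefix + line.strip() for line in listing if line.strip()]


# Stand-in for a downloaded *.paths.gz file (made-up paths, not real ones).
sample = gzip.compress(b"path/to/part-00000.txt.gz\npath/to/part-00001.txt.gz\n")
urls = listing_to_urls(sample)
for url in urls:
    print(url)
```

For access via S3, the same function can be called with prefix="s3://commoncrawl/".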

Download files of the Common Crawl May/Jun/Aug 2022 host-level webgraph

Size | File | Description
3.09 GB | cc-main-2022-may-jun-aug-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 32 vertices files
11.91 GB | cc-main-2022-may-jun-aug-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 64 edges files
5.76 GB | cc-main-2022-may-jun-aug-host.graph | graph in BVGraph format
2 kB | cc-main-2022-may-jun-aug-host.properties |
6.20 GB | cc-main-2022-may-jun-aug-host-t.graph | transpose of the graph (outlinks inverted to inlinks)
2 kB | cc-main-2022-may-jun-aug-host-t.properties |
1 kB | cc-main-2022-may-jun-aug-host.stats | WebGraph statistics
7.46 GB | cc-main-2022-may-jun-aug-host-ranks.txt.gz | harmonic centrality and pagerank

Domain-level graph

The domain graph is built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org. Version (commit) e5ff0c7 of the public suffix list was used (commit date 2022-09-15).
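
The aggregation can be sketched as follows, under loudly stated assumptions: SUFFIXES is a tiny hand-picked stand-in for the public suffix list, wildcard and exception rules are ignored, links within the same domain are dropped, and both function names are hypothetical. The cc-webgraph tools operate on reversed names and the full list, so this only illustrates the idea.

```python
from typing import Optional

# Assumption: a tiny illustrative subset of the public suffix list.
SUFFIXES = {"com", "uk", "co.uk", "gov.uk"}


def pay_level_domain(host: str) -> Optional[str]:
    """Return the longest matching public suffix plus one more label."""
    labels = host.lower().split(".")
    for i in range(len(labels)):  # i = 0 tests the longest candidate first
        if ".".join(labels[i:]) in SUFFIXES:
            if i > 0:
                return ".".join(labels[i - 1:])
            # The host equals a suffix: keep it only if it is multi-part.
            return host.lower() if len(labels) > 1 else None
    return None  # no known suffix, host is skipped


def aggregate(host_edges):
    """Map host-level edges to unique domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        s, d = pay_level_domain(src), pay_level_domain(dst)
        if s and d and s != d:  # assumption: intra-domain links are dropped
            domain_edges.add((s, d))
    return sorted(domain_edges)


host_edges = [
    ("news.bbc.co.uk", "www.example.com"),
    ("bbc.co.uk", "example.com"),
    ("a.example.com", "b.example.com"),
]
print(aggregate(host_edges))
```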

The domain-level graph has 91 million nodes and 1.57 billion edges. Of these, 45 million nodes (50%) are dangling nodes, and the largest strongly connected component covers 37 million nodes (40%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2022-may-jun-aug/domain/ or on https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2022-may-jun-aug/domain/.

Download files of the Common Crawl May/Jun/Aug 2022 domain-level webgraph

Size | File | Description
0.63 GB | cc-main-2022-may-jun-aug-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩
6.52 GB | cc-main-2022-may-jun-aug-domain-edges.txt.gz | edges ⟨from_id, to_id⟩
3.77 GB | cc-main-2022-may-jun-aug-domain.graph | graph in BVGraph format
2 kB | cc-main-2022-may-jun-aug-domain.properties |
3.59 GB | cc-main-2022-may-jun-aug-domain-t.graph | transpose of the graph
2 kB | cc-main-2022-may-jun-aug-domain-t.properties |
1 kB | cc-main-2022-may-jun-aug-domain.stats | WebGraph statistics
1.96 GB | cc-main-2022-may-jun-aug-domain-ranks.txt.gz | harmonic centrality and pagerank
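
As a rough sketch of how the text representation might be used, the following joins edges ⟨from_id, to_id⟩ with node labels from the vertices file. It assumes that the fields are tab-separated (an assumption, not stated above); the function names and the sample data are made up for illustration.

```python
def load_vertices(text):
    """Map node id -> reversed domain name from 'id<TAB>rev domain<TAB>num hosts' lines."""
    labels = {}
    for line in text.splitlines():
        node_id, rev_domain, _num_hosts = line.split("\t")
        labels[int(node_id)] = rev_domain
    return labels


def label_edges(text, labels):
    """Translate 'from_id<TAB>to_id' lines into pairs of node labels."""
    pairs = []
    for line in text.splitlines():
        from_id, to_id = line.split("\t")
        pairs.append((labels[int(from_id)], labels[int(to_id)]))
    return pairs


# Made-up sample content in the assumed format.
vertices_text = "0\tcom.example\t3\n1\torg.example\t1\n"
edges_text = "0\t1\n"
labels = load_vertices(vertices_text)
print(label_edges(edges_text, labels))
```

For the real files, which hold millions of nodes, a streaming approach or the BVGraph representation is likely the better choice than an in-memory dictionary.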

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 91 million domain ranks is available for download.

Top 1000 domains ranked by harmonic centrality (May/Jun/Aug 2022)

harmonic centrality rank | hc value | PageRank rank | PageRank value | reversed domain name
1 | 32914686 | 1 | 0.018077 | com.googleapis
2 | 32131562 | 3 | 0.012273 | com.facebook
3 | 31770190 | 2 | 0.015371 | com.google
4 | 27971714 | 5 | 0.007018 | com.twitter
5 | 27839074 | 7 | 0.006164 | com.youtube
6 | 27662708 | 6 | 0.006892 | org.w
7 | 27254468 | 8 | 0.005701 | com.instagram
8 | 26739486 | 4 | 0.007602 | com.googletagmanager
9 | 25688664 | 10 | 0.004673 | org.gmpg
10 | 25578590 | 9 | 0.004792 | com.gstatic
11 | 25033650 | 12 | 0.003435 | com.linkedin
12 | 24026588 | 11 | 0.004116 | com.cloudflare
13 | 23679430 | 17 | 0.002013 | com.gravatar
14 | 23526052 | 13 | 0.002488 | org.wordpress
15 | 23497052 | 24 | 0.001546 | com.pinterest
16 | 23111634 | 28 | 0.001244 | org.wikipedia
17 | 23073066 | 14 | 0.002254 | com.apple
18 | 22828954 | 25 | 0.001434 | com.wordpress
19 | 22794652 | 31 | 0.001150 | com.vimeo
20 | 22725476 | 39 | 0.000940 | be.youtu
21 | 22500412 | 18 | 0.001913 | com.bootstrapcdn
22 | 22420202 | 32 | 0.001128 | com.microsoft
23 | 22394764 | 15 | 0.002193 | net.cloudfront
24 | 22370010 | 22 | 0.001568 | com.jquery
25 | 22285798 | 23 | 0.001553 | io.polyfill
26 | 22278972 | 51 | 0.000652 | com.blogspot
27 | 22275856 | 44 | 0.000799 | gl.goo
28 | 22242208 | 35 | 0.001012 | com.amazonaws
29 | 22199000 | 47 | 0.000701 | com.amazon
30 | 22170846 | 27 | 0.001252 | net.jsdelivr
31 | 22147346 | 46 | 0.000764 | eu.europa
32 | 22143092 | 41 | 0.000874 | ly.bit
33 | 22058786 | 42 | 0.000835 | org.mozilla
34 | 22050970 | 38 | 0.000958 | com.google-analytics
35 | 22028542 | 21 | 0.001626 | com.fontawesome
36 | 21967818 | 36 | 0.001001 | com.adobe
37 | 21947388 | 20 | 0.001865 | com.github
38 | 21939440 | 94 | 0.000371 | com.tumblr
39 | 21919148 | 19 | 0.001882 | com.googleusercontent
40 | 21916910 | 49 | 0.000687 | com.wp
41 | 21896858 | 52 | 0.000647 | com.paypal
42 | 21790948 | 61 | 0.000550 | co.t
43 | 21769982 | 48 | 0.000695 | com.whatsapp
44 | 21761882 | 54 | 0.000605 | com.flickr
45 | 21753952 | 99 | 0.000356 | com.yahoo
46 | 21729404 | 69 | 0.000515 | io.github
47 | 21713518 | 133 | 0.000248 | com.nytimes
48 | 21675054 | 34 | 0.001031 | ru.yandex
49 | 21669788 | 91 | 0.000382 | com.medium
50 | 21638440 | 30 | 0.001195 | com.wixstatic
51 | 21614440 | 67 | 0.000526 | com.shopify
52 | 21601372 | 119 | 0.000315 | com.reddit
53 | 21594356 | 158 | 0.000193 | com.forbes
54 | 21576744 | 40 | 0.000925 | com.googlesyndication
55 | 21576166 | 63 | 0.000546 | org.w3
56 | 21562668 | 131 | 0.000257 | com.soundcloud
57 | 21539526 | 108 | 0.000328 | com.weebly
58 | 21506586 | 59 | 0.000571 | org.schema
59 | 21485114 | 122 | 0.000306 | org.creativecommons
60 | 21473180 | 153 | 0.000207 | gov.nih
61 | 21464862 | 179 | 0.000156 | int.who
62 | 21460442 | 65 | 0.000529 | com.vk
63 | 21421598 | 177 | 0.000158 | com.theguardian
64 | 21419292 | 203 | 0.000129 | com.cnn
65 | 21414034 | 147 | 0.000213 | org.archive
66 | 21410902 | 218 | 0.000122 | uk.co.bbc
67 | 21408640 | 50 | 0.000660 | net.doubleclick
68 | 21393638 | 62 | 0.000550 | com.unpkg
69 | 21390194 | 217 | 0.000122 | com.businessinsider
70 | 21384262 | 149 | 0.000212 | com.tiktok
71 | 21379450 | 198 | 0.000134 | com.imgur
72 | 21369906 | 106 | 0.000332 | me.wp
73 | 21362530 | 78 | 0.000407 | com.android
74 | 21361648 | 148 | 0.000213 | com.wixsite
75 | 21359636 | 56 | 0.000603 | com.addthis
76 | 21350540 | 281 | 0.000098 | com.bloomberg
77 | 21338508 | 60 | 0.000564 | com.fb
78 | 21332614 | 337 | 0.000083 | edu.stanford
79 | 21330884 | 355 | 0.000078 | com.theverge
80 | 21308772 | 57 | 0.000588 | com.macromedia
81 | 21306812 | 240 | 0.000109 | com.imdb
82 | 21305630 | 117 | 0.000324 | me.t
83 | 21301714 | 181 | 0.000154 | com.bing
84 | 21299082 | 92 | 0.000379 | com.giphy
85 | 21277366 | 300 | 0.000093 | com.bbc
86 | 21270958 | 100 | 0.000353 | com.list-manage
87 | 21266506 | 43 | 0.000827 | net.fbcdn
88 | 21265192 | 143 | 0.000218 | gle.forms
89 | 21263378 | 252 | 0.000106 | com.wsj
90 | 21237826 | 368 | 0.000075 | com.go
91 | 21236654 | 321 | 0.000087 | com.reuters
92 | 21236614 | 220 | 0.000120 | org.ietf
93 | 21233526 | 132 | 0.000253 | com.statcounter
94 | 21223988 | 93 | 0.000375 | com.stripe
95 | 21222332 | 194 | 0.000137 | uk.gov
96 | 21221406 | 302 | 0.000093 | edu.mit
97 | 21219964 | 254 | 0.000105 | org.un
98 | 21218302 | 295 | 0.000096 | edu.harvard
99 | 21218140 | 184 | 0.000151 | com.issuu
100 | 21215566 | 175 | 0.000159 | gov.cdc
101 | 21213292 | 120 | 0.000314 | de.google
102 | 21212876 | 285 | 0.000097 | com.oracle
103 | 21208848 | 150 | 0.000209 | com.ytimg
104 | 21206504 | 396 | 0.000068 | com.cnet
105 | 21204790 | 338 | 0.000083 | com.techcrunch
106 | 21203072 | 365 | 0.000075 | gov.nasa
107 | 21198090 | 157 | 0.000198 | com.dropbox
108 | 21197416 | 476 | 0.000055 | com.msn
109 | 21196722 | 249 | 0.000107 | com.twimg
110 | 21191914 | 357 | 0.000077 | com.quora
111 | 21190924 | 367 | 0.000075 | com.wired
112 | 21184172 | 289 | 0.000097 | net.slideshare
113 | 21184136 | 190 | 0.000142 | com.unsplash
114 | 21183394 | 73 | 0.000469 | com.wix
115 | 21181708 | 174 | 0.000160 | org.apache
116 | 21171102 | 455 | 0.000058 | com.googleblog
117 | 21169330 | 135 | 0.000237 | com.mailchimp
118 | 21168736 | 182 | 0.000153 | com.etsy
119 | 21167548 | 364 | 0.000075 | org.hbr
120 | 21162976 | 125 | 0.000284 | com.spotify
121 | 21159700 | 248 | 0.000107 | com.stackoverflow
122 | 21145710 | 206 | 0.000127 | com.blogger
123 | 21144796 | 397 | 0.000067 | org.arxiv
124 | 21140336 | 292 | 0.000096 | com.slack
125 | 21139536 | 270 | 0.000101 | net.researchgate
126 | 21138540 | 263 | 0.000104 | uk.co.amazon
127 | 21136788 | 345 | 0.000080 | org.npr
128 | 21134310 | 374 | 0.000073 | com.example
129 | 21128534 | 156 | 0.000200 | us.zoom
130 | 21128018 | 236 | 0.000110 | com.washingtonpost
131 | 21124494 | 363 | 0.000076 | com.appspot
132 | 21117290 | 124 | 0.000298 | com.ft
133 | 21115674 | 325 | 0.000086 | com.cnbc
134 | 21115136 | 309 | 0.000091 | com.wiley
135 | 21112604 | 356 | 0.000078 | com.nature
136 | 21104960 | 510 | 0.000052 | edu.berkeley
137 | 21100090 | 483 | 0.000055 | com.myspace
138 | 21095354 | 216 | 0.000122 | com.outlook
139 | 21093680 | 298 | 0.000095 | org.acm
140 | 21091496 | 155 | 0.000203 | com.weibo
141 | 21088936 | 152 | 0.000208 | org.networkadvertising
142 | 21080382 | 572 | 0.000047 | com.cbsnews
143 | 21079152 | 279 | 0.000099 | org.gnu
144 | 21074760 | 437 | 0.000061 | uk.co.telegraph
145 | 21073516 | 68 | 0.000520 | com.godaddy
146 | 21068068 | 291 | 0.000096 | uk.co.google
147 | 21065050 | 130 | 0.000266 | com.youtube-nocookie
148 | 21065030 | 195 | 0.000136 | org.wikimedia
149 | 21062066 | 393 | 0.000069 | com.usatoday
150 | 21061852 | 645 | 0.000041 | com.intel
151 | 21052752 | 478 | 0.000055 | com.goodreads
152 | 21048112 | 378 | 0.000072 | com.time
153 | 21041138 | 481 | 0.000055 | com.theatlantic
154 | 21040374 | 633 | 0.000042 | com.box
155 | 21038502 | 274 | 0.000100 | com.squarespace
156 | 21031308 | 204 | 0.000129 | com.eventbrite
157 | 21030242 | 37 | 0.000977 | com.qq
158 | 21030040 | 185 | 0.000150 | com.yelp
159 | 21025664 | 137 | 0.000230 | com.opera
160 | 21022330 | 360 | 0.000076 | ee.linktr
161 | 21022182 | 1141 | 0.000026 | com.wikia
162 | 21020670 | 349 | 0.000079 | com.springer
163 | 21017524 | 465 | 0.000056 | com.latimes
164 | 21017202 | 170 | 0.000165 | com.zendesk
165 | 21015596 | 424 | 0.000062 | com.huffingtonpost
166 | 21014202 | 162 | 0.000185 | org.ampproject
167 | 21011874 | 574 | 0.000046 | com.indiatimes
168 | 21009830 | 145 | 0.000217 | info.aboutads
169 | 21006182 | 886 | 0.000035 | com.qz
170 | 21005110 | 704 | 0.000039 | org.chromium
171 | 21003538 | 682 | 0.000040 | com.buzzfeed
172 | 21000898 | 221 | 0.000120 | org.doi
173 | 20999442 | 585 | 0.000045 | com.vice
174 | 20989458 | 1116 | 0.000027 | com.thenextweb
175 | 20987914 | 304 | 0.000092 | com.typeform
176 | 20983612 | 261 | 0.000104 | com.sciencedirect
177 | 20982702 | 506 | 0.000053 | edu.cornell
178 | 20982294 | 544 | 0.000049 | com.mashable
179 | 20977172 | 626 | 0.000043 | com.scribd
180 | 20973696 | 523 | 0.000051 | edu.yale
181 | 20971220 | 501 | 0.000053 | uk.co.independent
182 | 20970866 | 258 | 0.000105 | net.behance
183 | 20970776 | 679 | 0.000040 | com.economist
184 | 20968290 | 747 | 0.000037 | edu.upenn
185 | 20964282 | 278 | 0.000099 | org.pewresearch
186 | 20960828 | 545 | 0.000049 | com.cisco
187 | 20960582 | 451 | 0.000058 | com.bigcommerce
188 | 20956062 | 564 | 0.000047 | com.psychologytoday
189 | 20942726 | 513 | 0.000052 | com.fortune
190 | 20942618 | 193 | 0.000139 | page.g
191 | 20940302 | 382 | 0.000071 | com.gitlab
192 | 20939136 | 462 | 0.000057 | uk.co.dailymail
193 | 20936146 | 432 | 0.000061 | com.pixabay
194 | 20933922 | 306 | 0.000091 | com.tinyurl
195 | 20932536 | 497 | 0.000053 | com.deloitte
196 | 20932040 | 956 | 0.000031 | com.evernote
197 | 20925436 | 542 | 0.000049 | io.codepen
198 | 20924450 | 212 | 0.000125 | com.calendly
199 | 20923226 | 694 | 0.000039 | com.vox
200 | 20919494 | 731 | 0.000038 | com.git-scm
201 | 20918582 | 610 | 0.000044 | org.unesco
202 | 20917448 | 1008 | 0.000030 | com.about
203 | 20916974 | 71 | 0.000469 | net.facebook
204 | 20915832 | 571 | 0.000047 | org.weforum
205 | 20915084 | 419 | 0.000062 | com.w3schools
206 | 20914978 | 326 | 0.000086 | com.typepad
207 | 20911268 | 515 | 0.000052 | com.squareup
208 | 20907742 | 904 | 0.000034 | com.arstechnica
209 | 20900952 | 473 | 0.000055 | com.nbcnews
210 | 20899924 | 373 | 0.000074 | co.ibb
211 | 20899532 | 632 | 0.000042 | com.withgoogle
212 | 20898928 | 809 | 0.000036 | edu.washington
213 | 20896616 | 521 | 0.000051 | com.inc
214 | 20891820 | 898 | 0.000034 | uk.ac.cam
215 | 20886310 | 405 | 0.000066 | com.sagepub
216 | 20878134 | 543 | 0.000049 | fm.anchor
217 | 20876826 | 683 | 0.000040 | com.apnews
218 | 20875670 | 967 | 0.000031 | com.slate
219 | 20875650 | 442 | 0.000059 | gov.whitehouse
220 | 20872664 | 689 | 0.000040 | com.venturebeat
221 | 20871102 | 530 | 0.000050 | com.pexels
222 | 20866632 | 242 | 0.000109 | org.iana
223 | 20865428 | 260 | 0.000105 | de.amazon
224 | 20862054 | 549 | 0.000048 | gov.noaa
225 | 20860884 | 755 | 0.000037 | me.about
226 | 20858432 | 33 | 0.001073 | com.baidu
227 | 20856406 | 1312 | 0.000023 | org.eclipse
228 | 20854214 | 609 | 0.000044 | com.mysql
229 | 20847014 | 244 | 0.000108 | com.live
230 | 20846064 | 654 | 0.000041 | com.nationalgeographic
231 | 20844358 | 876 | 0.000035 | edu.asu
232 | 20842882 | 299 | 0.000094 | com.ibm
233 | 20839080 | 196 | 0.000136 | jp.co.google
234 | 20835838 | 351 | 0.000078 | com.dribbble
235 | 20835472 | 716 | 0.000038 | ca.cbc
236 | 20828032 | 558 | 0.000048 | org.worldbank
237 | 20827500 | 1278 | 0.000023 | com.nike
238 | 20814914 | 459 | 0.000057 | gov.fda
239 | 20813098 | 603 | 0.000044 | org.pbs
240 | 20811434 | 586 | 0.000045 | gov.loc
241 | 20810490 | 467 | 0.000056 | gov.usda
242 | 20810284 | 485 | 0.000054 | com.gofundme
243 | 20807808 | 316 | 0.000088 | com.feedburner
244 | 20807006 | 329 | 0.000084 | net.windows
245 | 20805276 | 1132 | 0.000027 | com.hollywoodreporter
246 | 20804930 | 161 | 0.000187 | com.staticflickr
247 | 20804690 | 1003 | 0.000030 | org.greenpeace
248 | 20802310 | 492 | 0.000054 | com.tandfonline
249 | 20802308 | 339 | 0.000081 | eu.youronlinechoices
250 | 20801724 | 992 | 0.000031 | app.netlify
251 | 20801376 | 1281 | 0.000023 | com.billboard
252 | 20799338 | 642 | 0.000042 | com.newyorker
253 | 20798194 | 875 | 0.000035 | edu.wisc
254 | 20796936 | 702 | 0.000039 | au.net.abc
255 | 20796272 | 916 | 0.000033 | org.pypi
256 | 20795900 | 176 | 0.000159 | com.office
257 | 20795318 | 1266 | 0.000024 | com.technologyreview
258 | 20784974 | 477 | 0.000055 | com.theconversation
259 | 20782898 | 887 | 0.000035 | org.sciencemag
260 | 20782640 | 253 | 0.000105 | com.jotform
261 | 20779480 | 984 | 0.000031 | com.gizmodo
262 | 20778708 | 673 | 0.000040 | org.cambridge
263 | 20777714 | 1294 | 0.000023 | com.500px
264 | 20777238 | 730 | 0.000038 | com.walmart
265 | 20775900 | 525 | 0.000051 | com.oup
266 | 20773216 | 608 | 0.000044 | com.xinhuanet
267 | 20772144 | 429 | 0.000061 | com.getpocket
268 | 20770590 | 1181 | 0.000025 | edu.umd
269 | 20768772 | 500 | 0.000053 | gov.epa
270 | 20767862 | 709 | 0.000039 | org.bitbucket
271 | 20767390 | 1144 | 0.000026 | edu.purdue
272 | 20763440 | 1383 | 0.000022 | ms.1drv
273 | 20762974 | 1084 | 0.000028 | co.elastic
274 | 20760120 | 891 | 0.000034 | org.semver
275 | 20755544 | 430 | 0.000061 | org.debian
276 | 20753630 | 1308 | 0.000023 | org.kernel
277 | 20749768 | 757 | 0.000037 | com.britannica
278 | 20749716 | 963 | 0.000031 | com.nypost
279 | 20747130 | 638 | 0.000042 | com.elpais
280 | 20744652 | 929 | 0.000032 | com.foxnews
281 | 20738360 | 502 | 0.000053 | com.dailymotion
282 | 20736612 | 1154 | 0.000026 | com.sky
283 | 20735678 | 1000 | 0.000030 | com.uk
284 | 20729688 | 246 | 0.000108 | com.wpengine
285 | 20728890 | 1623 | 0.000019 | com.googlesource
286 | 20726846 | 1007 | 0.000030 | edu.princeton
287 | 20725440 | 548 | 0.000048 | gov.house
288 | 20722436 | 592 | 0.000045 | com.mozilla
289 | 20721772 | 86 | 0.000393 | com.wsimg
290 | 20721658 | 1404 | 0.000021 | com.over-blog
291 | 20718906 | 488 | 0.000054 | com.ted
292 | 20716838 | 1660 | 0.000018 | com.lego
293 | 20715928 | 754 | 0.000037 | gov.justice
294 | 20714832 | 1005 | 0.000030 | uk.co.guardian
295 | 20713856 | 425 | 0.000062 | com.arcgis
296 | 20713486 | 1319 | 0.000023 | com.digitaltrends
297 | 20711750 | 795 | 0.000036 | edu.umich
298 | 20710650 | 428 | 0.000061 | org.openstreetmap
299 | 20709586 | 241 | 0.000109 | net.sourceforge
300 | 20708624 | 947 | 0.000032 | com.ssrn
301 | 20703088 | 1697 | 0.000018 | org.usenix
302 | 20700354 | 386 | 0.000070 | com.netdna-ssl
303 | 20698318 | 935 | 0.000032 | com.ggpht
304 | 20697518 | 232 | 0.000113 | com.amazon-adsystem
305 | 20696838 | 314 | 0.000090 | tv.twitch
306 | 20696330 | 950 | 0.000032 | uk.co.blogspot
307 | 20696136 | 1416 | 0.000021 | com.hatenablog
308 | 20692884 | 1149 | 0.000026 | co.g
309 | 20691964 | 230 | 0.000114 | gov.ca
310 | 20689588 | 801 | 0.000036 | com.politico
311 | 20689246 | 1315 | 0.000023 | com.socialmediatoday
312 | 20686440 | 728 | 0.000038 | org.change
313 | 20685528 | 239 | 0.000110 | uk.org.ico
314 | 20685498 | 223 | 0.000119 | jp.co.yahoo
315 | 20685212 | 588 | 0.000045 | uk.gov.service
316 | 20684354 | 171 | 0.000162 | com.rawgit
317 | 20684232 | 280 | 0.000098 | net.azureedge
318 | 20681826 | 1210 | 0.000025 | io.itch
319 | 20680346 | 1318 | 0.000023 | de.mpg
320 | 20678714 | 1552 | 0.000019 | com.euronews
321 | 20676490 | 964 | 0.000031 | edu.jhu
322 | 20676186 | 940 | 0.000032 | edu.umn
323 | 20675058 | 531 | 0.000050 | site.business
324 | 20672708 | 169 | 0.000166 | com.addtoany
325 | 20671774 | 474 | 0.000055 | gov.hhs
326 | 20670114 | 412 | 0.000064 | com.ebay
327 | 20668938 | 1550 | 0.000019 | com.urbandictionary
328 | 20664866 | 1182 | 0.000025 | com.axios
329 | 20664008 | 1242 | 0.000024 | org.semanticscholar
330 | 20663034 | 1103 | 0.000027 | com.udemy
331 | 20662500 | 1395 | 0.000021 | com.reverbnation
332 | 20659858 | 1505 | 0.000020 | edu.indiana
333 | 20656824 | 1481 | 0.000020 | au.com.news
334 | 20654924 | 1079 | 0.000028 | edu.uchicago
335 | 20654274 | 752 | 0.000037 | org.fao
336 | 20653112 | 622 | 0.000043 | gov.census
337 | 20652698 | 1178 | 0.000025 | net.speedtest
338 | 20650808 | 1771 | 0.000017 | org.phys
339 | 20650016 | 74 | 0.000424 | net.akamaihd
340 | 20647938 | 229 | 0.000115 | com.hubspot
341 | 20642594 | 995 | 0.000030 | com.scientificamerican
342 | 20641466 | 1328 | 0.000023 | com.nymag
343 | 20638870 | 1788 | 0.000017 | com.martinfowler
344 | 20638382 | 1663 | 0.000018 | edu.gatech
345 | 20637680 | 555 | 0.000048 | com.kickstarter
346 | 20635558 | 187 | 0.000146 | com.xing
347 | 20635506 | 1129 | 0.000027 | org.wiktionary
348 | 20634592 | 1042 | 0.000029 | edu.utexas
349 | 20633912 | 2314 | 0.000015 | com.flipboard
350 | 20633634 | 567 | 0.000047 | com.snapchat
351 | 20633246 | 3204 | 0.000011 | com.openai
352 | 20630316 | 1423 | 0.000021 | ch.ethz
353 | 20629522 | 1420 | 0.000021 | com.businessweek
354 | 20629100 | 873 | 0.000035 | watch.fb
355 | 20626018 | 154 | 0.000206 | com.sharethis
356 | 20625672 | 948 | 0.000032 | com.timeanddate
357 | 20625036 | 720 | 0.000038 | org.d3js
358 | 20624578 | 1744 | 0.000017 | com.itv
359 | 20620690 | 1267 | 0.000024 | uk.ac.ucl
360 | 20618834 | 1455 | 0.000020 | uk.co.metro
361 | 20617718 | 320 | 0.000087 | com.statista
362 | 20617602 | 529 | 0.000050 | com.googlecode
363 | 20617378 | 1147 | 0.000026 | com.jetbrains
364 | 20617292 | 614 | 0.000044 | org.ohchr
365 | 20617266 | 915 | 0.000033 | de.spiegel
366 | 20616654 | 472 | 0.000055 | com.meetup
367 | 20616580 | 322 | 0.000086 | com.disqus
368 | 20615966 | 399 | 0.000067 | com.optimizely
369 | 20615414 | 2802 | 0.000013 | com.diigo
370 | 20615048 | 287 | 0.000097 | jp.ne.hatena
371 | 20614360 | 1285 | 0.000023 | com.smithsonianmag
372 | 20614098 | 1152 | 0.000026 | com.scmp
373 | 20614010 | 1110 | 0.000027 | com.foursquare
374 | 20610490 | 2636 | 0.000014 | blog.home
375 | 20610202 | 2020 | 0.000016 | com.knowyourmeme
376 | 20607856 | 353 | 0.000078 | net.themeforest
377 | 20607506 | 733 | 0.000038 | au.gov.nsw
378 | 20606218 | 1078 | 0.000028 | com.chicagotribune
379 | 20603548 | 1164 | 0.000026 | au.com.smh
380 | 20603248 | 1589 | 0.000019 | uk.co.express
381 | 20599986 | 1121 | 0.000027 | edu.nyu
382 | 20599508 | 268 | 0.000102 | com.npmjs
383 | 20598566 | 641 | 0.000042 | gov.senate
384 | 20594974 | 639 | 0.000042 | com.zdnet
385 | 20594264 | 1128 | 0.000027 | link.page
386 | 20591568 | 968 | 0.000031 | com.usps
387 | 20588732 | 890 | 0.000035 | gov.congress
388 | 20586888 | 293 | 0.000096 | com.eepurl
389 | 20585314 | 1002 | 0.000030 | com.history
390 | 20584024 | 677 | 0.000040 | com.pinimg
391 | 20582266 | 141 | 0.000221 | com.paypalobjects
392 | 20581216 | 66 | 0.000528 | com.googleadservices
393 | 20580344 | 450 | 0.000058 | es.google
394 | 20579052 | 2736 | 0.000014 | edu.byu
395 | 20577748 | 899 | 0.000034 | au.com.google
396 | 20577588 | 1450 | 0.000021 | uk.co.standard
397 | 20576632 | 711 | 0.000039 | com.istockphoto
398 | 20572810 | 97 | 0.000357 | net.jsfiddle
399 | 20572202 | 283 | 0.000097 | me.telegram
400 | 20568536 | 1333 | 0.000022 | cn.com.chinadaily
401 | 20568322 | 552 | 0.000048 | ca.google
402 | 20567936 | 1174 | 0.000025 | de.bild
403 | 20566704 | 1394 | 0.000022 | com.producthunt
404 | 20566074 | 392 | 0.000069 | com.proofpoint
405 | 20564788 | 955 | 0.000031 | edu.si
406 | 20562416 | 635 | 0.000042 | org.oecd
407 | 20559584 | 1479 | 0.000020 | ca.ubc
408 | 20559174 | 1467 | 0.000020 | com.wattpad
409 | 20558142 | 2132 | 0.000015 | app.web
410 | 20557956 | 888 | 0.000035 | google.blog
411 | 20557880 | 1095 | 0.000028 | com.dw
412 | 20554318 | 719 | 0.000038 | gov.archives
413 | 20553172 | 1491 | 0.000020 | com.buzzfeednews
414 | 20552996 | 503 | 0.000053 | nl.google
415 | 20551926 | 1921 | 0.000016 | com.mystrikingly
416 | 20551384 | 458 | 0.000057 | com.criteo
417 | 20550866 | 1035 | 0.000029 | uk.co.thetimes
418 | 20549656 | 352 | 0.000078 | com.prnewswire
419 | 20548982 | 1463 | 0.000020 | uk.ac.lse
420 | 20548708 | 974 | 0.000031 | in.co.google
421 | 20548238 | 380 | 0.000071 | com.sohu
422 | 20544956 | 1448 | 0.000021 | uk.co.wired
423 | 20544686 | 389 | 0.000069 | com.atlassian
424 | 20544326 | 359 | 0.000077 | net.php
425 | 20542034 | 527 | 0.000050 | com.matterport
426 | 20540666 | 1638 | 0.000018 | de.ebay
427 | 20536506 | 777 | 0.000036 | com.livejournal
428 | 20535454 | 328 | 0.000085 | ru.ok
429 | 20535130 | 1059 | 0.000029 | gov.treasury
430 | 20534250 | 1194 | 0.000025 | com.sun
431 | 20533698 | 1787 | 0.000017 | com.channel4
432 | 20532900 | 381 | 0.000071 | net.imgix
433 | 20532638 | 1932 | 0.000016 | gov.cia
434 | 20532370 | 1054 | 0.000029 | org.telegram
435 | 20531500 | 1053 | 0.000029 | uk.parliament
436 | 20531096 | 2759 | 0.000013 | ph.telegra
437 | 20530556 | 1509 | 0.000020 | uk.co.thesun
438 | 20529928 | 593 | 0.000045 | edu.cmu
439 | 20529694 | 1070 | 0.000028 | int.coe
440 | 20528058 | 494 | 0.000053 | com.media-amazon
441 | 20528052 | 1814 | 0.000017 | com.hindustantimes
442 | 20527716 | 919 | 0.000033 | com.iconfinder
443 | 20526664 | 1004 | 0.000030 | org.jstor
444 | 20525290 | 1590 | 0.000019 | com.straitstimes
445 | 20524680 | 1767 | 0.000017 | edu.tufts
446 | 20523352 | 418 | 0.000062 | com.elsevier
447 | 20520868 | 490 | 0.000054 | ru.gov
448 | 20520208 | 1083 | 0.000028 | gov.fbi
449 | 20516310 | 1371 | 0.000022 | edu.duke
450 | 20514968 | 408 | 0.000065 | com.adroll
451 | 20514226 | 1344 | 0.000022 | int.itu
452 | 20513026 | 1382 | 0.000022 | de.zeit
453 | 20512422 | 1654 | 0.000018 | com.newscientist
454 | 20511574 | 372 | 0.000074 | com.githubusercontent
455 | 20511532 | 1454 | 0.000021 | com.unity3d
456 | 20509014 | 1712 | 0.000018 | org.maven
457 | 20508604 | 988 | 0.000031 | de.focus
458 | 20508282 | 2525 | 0.000015 | com.storify
459 | 20506534 | 1475 | 0.000020 | com.irishtimes
460 | 20506474 | 627 | 0.000043 | gov.state
461 | 20505268 | 705 | 0.000039 | uk.nhs
462 | 20505188 | 1711 | 0.000018 | com.mercurynews
463 | 20505146 | 1196 | 0.000025 | edu.unc
464 | 20504400 | 311 | 0.000090 | com.mapbox
465 | 20503420 | 600 | 0.000044 | net.ctfassets
466 | 20503084 | 1406 | 0.000021 | jp.ne.goo
467 | 20501372 | 1690 | 0.000018 | org.propublica
468 | 20499916 | 900 | 0.000034 | gov.sba
469 | 20499600 | 2765 | 0.000013 | me.ogp
470 | 20498960 | 1541 | 0.000020 | com.mcafee
471 | 20498292 | 1564 | 0.000019 | com.nydailynews
472 | 20497012 | 1322 | 0.000023 | org.unhcr
473 | 20492506 | 1980 | 0.000016 | com.csmonitor
474 | 20491416 | 1645 | 0.000018 | ca.mcgill
475 | 20490972 | 496 | 0.000053 | org.python
476 | 20488474 | 259 | 0.000105 | gg.discord
477 | 20487728 | 3569 | 0.000010 | net.docdroid
478 | 20485818 | 1881 | 0.000016 | app.vercel
479 | 20484976 | 2557 | 0.000015 | com.instructure
480 | 20483828 | 1263 | 0.000024 | ch.ipcc
481 | 20481150 | 1943 | 0.000016 | io.gitlab
482 | 20481144 | 267 | 0.000102 | com.aliyuncs
483 | 20480398 | 1963 | 0.000016 | com.thoughtco
484 | 20478522 | 1025 | 0.000030 | gov.dhs
485 | 20477934 | 1635 | 0.000019 | com.lenovo
486 | 20477440 | 1124 | 0.000027 | gov.usgs
487 | 20475314 | 1037 | 0.000029 | org.ilo
488 | 20472878 | 1246 | 0.000024 | org.hrw
489 | 20472770 | 95 | 0.000363 | me.wa
490 | 20472152 | 453 | 0.000058 | com.samsung
491 | 20470420 | 142 | 0.000219 | com.salesforce
492 | 20467196 | 2818 | 0.000013 | com.oxforddictionaries
493 | 20466550 | 2539 | 0.000015 | au.com.sbs
494 | 20465652 | 436 | 0.000061 | com.filesusr
495 | 20464104 | 2051 | 0.000016 | com.brave
496 | 20462132 | 1107 | 0.000027 | com.thehill
497 | 20462008 | 1249 | 0.000024 | com.aljazeera
498 | 20461328 | 1327 | 0.000023 | com.brightcove
499 | 20460532 | 780 | 0.000036 | com.thinkwithgoogle
500 | 20460298 | 576 | 0.000046 | org.worldwildlife
501 | 20457830 | 2834 | 0.000013 | sg.edu.nus
502 | 20455892 | 435 | 0.000061 | com.visualstudio
503 | 20454454 | 3817 | 0.000009 | com.minds
504 | 20454024 | 1029 | 0.000029 | edu.brookings
505 | 20452798 | 1088 | 0.000028 | sg.com.google
506 | 20452018 | 296 | 0.000095 | gov.ftc
507 | 20451250 | 2081 | 0.000016 | com.rt
508 | 20450574 | 1335 | 0.000022 | de.welt
509 | 20450326 | 889 | 0.000035 | com.fandom
510 | 20449110 | 1307 | 0.000023 | de.sueddeutsche
511 | 20448850 | 498 | 0.000053 | com.fastcompany
512 | 20448284 | 768 | 0.000037 | com.oreilly
513 | 20448152 | 3182 | 0.000011 | cc.uxdesign
514 | 20447408 | 905 | 0.000034 | com.deviantart
515 | 20442684 | 449 | 0.000058 | com.ssl-images-amazon
516 | 20442572 | 2891 | 0.000013 | org.accessnow
517 | 20442500 | 3809 | 0.000009 | org.edublogs
518 | 20441072 | 159 | 0.000192 | com.jimdo
519 | 20439556 | 2245 | 0.000015 | tl.we
520 | 20438646 | 3143 | 0.000012 | com.instapaper
521 | 20437842 | 208 | 0.000125 | ru.mail
522 | 20436396 | 457 | 0.000057 | com.patreon
523 | 20436198 | 2841 | 0.000013 | com.bloglovin
524 | 20434984 | 1539 | 0.000020 | com.firebaseapp
525 | 20432342 | 3587 | 0.000010 | com.pearltrees
526 | 20429974 | 2565 | 0.000015 | edu.oregonstate
527 | 20428120 | 369 | 0.000074 | com.surveymonkey
528 | 20426222 | 403 | 0.000066 | com.businesswire
529 | 20425900 | 2907 | 0.000013 | org.wikibooks
530 | 20423662 | 1108 | 0.000027 | de.stern
531 | 20423310 | 1653 | 0.000018 | com.warnerbros
532 | 20418994 | 1407 | 0.000021 | be.google
533 | 20418380 | 1148 | 0.000026 | ly.rebrand
534 | 20416106 | 1913 | 0.000016 | edu.ucsb
535 | 20415774 | 598 | 0.000044 | com.airbnb
536 | 20414102 | 98 | 0.000356 | com.messenger
537 | 20413100 | 1566 | 0.000019 | org.rfc-editor
538 | 20413010 | 303 | 0.000093 | net.secureservercdn
539 | 20412834 | 1911 | 0.000016 | co.carrd
540 | 20412638 | 2555 | 0.000015 | it.scoop
541 | 20411920 | 787 | 0.000036 | com.zoho
542 | 20411722 | 539 | 0.000050 | com.gmail
543 | 20411284 | 923 | 0.000033 | com.thelancet
544 | 20410478 | 2023 | 0.000016 | com.dictionary
545 | 20408662 | 4662 | 0.000008 | com.folkd
546 | 20408416 | 953 | 0.000032 | edu.psu
547 | 20408248 | 1975 | 0.000016 | org.documentcloud
548 | 20407678 | 1203 | 0.000025 | org.undp
549 | 20406464 | 697 | 0.000039 | io.readthedocs
550 | 20406272 | 1403 | 0.000021 | net.codecanyon
551 | 20405676 | 3142 | 0.000012 | com.hubpages
552 | 20403958 | 640 | 0.000042 | com.entrepreneur
553 | 20402310 | 1855 | 0.000017 | com.france24
554 | 20400524 | 237 | 0.000110 | to.amzn
555 | 20399576 | 2550 | 0.000015 | gov.lbl
556 | 20397996 | 3296 | 0.000011 | google.ai
557 | 20397392 | 2812 | 0.000013 | com.aboutamazon
558 | 20395738 | 1284 | 0.000023 | com.snopes
559 | 20394748 | 1415 | 0.000021 | int.unfccc
560 | 20394362 | 954 | 0.000032 | com.ubuntu
561 | 20394264 | 213 | 0.000125 | com.aspnetcdn
562 | 20393036 | 561 | 0.000047 | com.steampowered
563 | 20392344 | 3041 | 0.000012 | com.dreamstime
564 | 20391666 | 1527 | 0.000020 | gov.defense
565 | 20390246 | 1829 | 0.000017 | org.iea
566 | 20387722 | 2991 | 0.000012 | com.oregonlive
567 | 20383972 | 3395 | 0.000011 | org.neocities
568 | 20383752 | 1652 | 0.000018 | io.ghost
569 | 20379568 | 2625 | 0.000014 | org.nature
570 | 20378934 | 1180 | 0.000025 | com.prweb
571 | 20378242 | 565 | 0.000047 | com.netflix
572 | 20377878 | 1459 | 0.000020 | mil.army
573 | 20377388 | 499 | 0.000053 | org.nodejs
574 | 20377144 | 2229 | 0.000015 | uk.bl
575 | 20377008 | 2049 | 0.000016 | org.archlinux
576 | 20376392 | 1119 | 0.000027 | com.dell
577 | 20376074 | 3049 | 0.000012 | org.paho
578 | 20375762 | 2103 | 0.000016 | com.thefreedictionary
579 | 20373064 | 701 | 0.000039 | com.docker
580 | 20372496 | 2721 | 0.000014 | org.computer
581 | 20372436 | 2650 | 0.000014 | com.googlegroups
582 | 20370720 | 2233 | 0.000015 | org.ap
583 | 20369470 | 3116 | 0.000012 | com.webbyawards
584 | 20369394 | 138 | 0.000229 | me.line
585 | 20369374 | 606 | 0.000044 | com.investopedia
586 | 20369154 | 3126 | 0.000012 | org.scala-lang
587 | 20368348 | 2738 | 0.000014 | com.msnbc
588 | 20365992 | 1547 | 0.000019 | ca.sfu
589 | 20363512 | 1764 | 0.000017 | com.patch
590 | 20362914 | 1239 | 0.000024 | net.clickbank
591 | 20362088 | 3223 | 0.000011 | de.chip
592 | 20359640 | 3207 | 0.000011 | org.vim
593 | 20357948 | 936 | 0.000032 | org.js
594 | 20357518 | 1390 | 0.000022 | io.shields
595 | 20357372 | 2965 | 0.000012 | org.rsf
596 | 20352942 | 1798 | 0.000017 | gov.usembassy
597 | 20351154 | 1385 | 0.000022 | com.mixpanel
598 | 20350540 | 102 | 0.000349 | com.uservoice
599 | 20350118 | 4086 | 0.000009 | com.bravesites
600 | 20350052 | 2739 | 0.000014 | edu.iastate
601 | 20349940 | 3202 | 0.000011 | com.slides
602 | 20349860 | 893 | 0.000034 | com.office365
603 | 20349118 | 3443 | 0.000010 | org.aclweb
604 | 20349104 | 3375 | 0.000011 | org.google
605 | 20347138 | 3411 | 0.000011 | uk.co.yougov
606 | 20345188 | 579 | 0.000046 | org.unicef
607 | 20345092 | 3195 | 0.000011 | com.dummies
608 | 20345072 | 4395 | 0.000008 | it.justpaste
609 | 20344796 | 3394 | 0.000011 | org.globalcitizen
610 | 20344590 | 1874 | 0.000016 | ca.globalnews
611 | 20343884 | 507 | 0.000053 | com.fc2
612 | 20342034 | 524 | 0.000051 | com.adweek
613 | 20341362 | 2867 | 0.000013 | jp.co.japantimes
614 | 20340504 | 1270 | 0.000023 | com.loom
615 | 20339376 | 937 | 0.000032 | com.digitaloceanspaces
616 | 20339258 | 72 | 0.000469 | com.oculus
617 | 20339126 | 744 | 0.000038 | uk.co.pinterest
618 | 20338770 | 1091 | 0.000028 | com.webs
619 | 20338486 | 3416 | 0.000011 | com.thecvf
620 | 20332428 | 2657 | 0.000014 | ca.ualberta
621 | 20331874 | 2252 | 0.000015 | com.channelnewsasia
622 | 20331560 | 2968 | 0.000012 | in.businessinsider
623 | 20330608 | 982 | 0.000031 | org.mediawiki
624 | 20330438 | 1525 | 0.000020 | com.bol
625 | 20328810 | 2325 | 0.000015 | com.foreignpolicy
626 | 20328038 | 557 | 0.000048 | com.digg
627 | 20326124 | 288 | 0.000097 | com.bandcamp
628 | 20325846 | 959 | 0.000031 | com.variety
629 | 20325100 | 1293 | 0.000023 | org.imf
630 | 20324962 | 1130 | 0.000027 | ly.cutt
631 | 20323872 | 3221 | 0.000011 | org.freedomhouse
632 | 20323384 | 1746 | 0.000017 | us.mn.state
633 | 20323358 | 4970 | 0.000007 | com.sendspace
634 | 20322480 | 4337 | 0.000008 | org.marxists
635 | 20322396 | 64 | 0.000540 | com.trustpilot
636 | 20321834 | 332 | 0.000084 | me.fb
637 | 20320766 | 1699 | 0.000018 | com.ipsos
638 | 20320202 | 1698 | 0.000018 | gov.uscis
639 | 20319680 | 211 | 0.000125 | org.whatwg
640 | 20319120 | 1811 | 0.000017 | eu.politico
641 | 20318744 | 5519 | 0.000006 | com.edocr
642 | 20318266 | 3351 | 0.000011 | de.diplo
643 | 20316524 | 2077 | 0.000016 | com.spreaker
644 | 20316416 | 2961 | 0.000012 | com.space
645 | 20315134 | 1866 | 0.000017 | com.voanews
646 | 20314896 | 2750 | 0.000014 | org.wikidata
647 | 20313896 | 2047 | 0.000016 | dk.google
648 | 20309998 | 1033 | 0.000029 | me.onelink
649 | 20309160 | 3695 | 0.000010 | com.prweek
650 | 20308644 | 3578 | 0.000010 | com.virgin
651 | 20307882 | 3206 | 0.000011 | com.slidesharecdn
652 | 20306230 | 605 | 0.000044 | com.canva
653 | 20305826 | 1615 | 0.000019 | com.indianexpress
654 | 20305362 | 3379 | 0.000011 | com.reason
655 | 20303486 | 908 | 0.000034 | com.imageshack
656 | 20303212 | 3815 | 0.000009 | org.cpj
657 | 20302922 | 1157 | 0.000026 | com.att
658 | 20302106 | 759 | 0.000037 | uk.co.eventbrite
659 | 20300748 | 3233 | 0.000011 | com.hm
660 | 20300012 | 761 | 0.000037 | com.gumroad
661 | 20299804 | 3151 | 0.000012 | de.taz
662 | 20297678 | 3671 | 0.000010 | uk.ac.nhm
663 | 20297472 | 945 | 0.000032 | com.fiverr
664 | 20297324 | 2729 | 0.000014 | com.verywellhealth
665 | 20297110 | 573 | 0.000046 | com.globenewswire
666 | 20296832 | 883 | 0.000035 | com.wikihow
667 | 20293998 | 2251 | 0.000015 | org.ocks
668 | 20293488 | 3211 | 0.000011 | org.iucnredlist
669 | 20293434 | 3008 | 0.000012 | edu.uoregon
670 | 20293204 | 2658 | 0.000014 | com.gfycat
671 | 20293094 | 3409 | 0.000011 | org.oxfam
672 | 20293066 | 805 | 0.000036 | int.wipo
673 | 20292834 | 2851 | 0.000013 | com.fineartamerica
674 | 20292720 | 1501 | 0.000020 | pl.gov
675 | 20292124 | 4536 | 0.000008 | com.backblazeb2
676 | 20291708 | 1818 | 0.000017 | com.jimdosite
677 | 20290958 | 1774 | 0.000017 | com.thestar
678 | 20290480 | 3139 | 0.000012 | org.eji
679 | 20290432 | 4260 | 0.000008 | com.theodysseyonline
680 | 20289286 | 1533 | 0.000020 | com.routledge
681 | 20286712 | 3550 | 0.000010 | uk.co.timesonline
682 | 20285432 | 2900 | 0.000013 | org.gnupg
683 | 20284918 | 2570 | 0.000015 | com.infogram
684 | 20284746 | 1961 | 0.000016 | uk.org.greenend
685 | 20284694 | 2269 | 0.000015 | org.rand
686 | 20284380 | 1901 | 0.000016 | com.surveygizmo
687 | 20283570 | 621 | 0.000043 | br.com.uol
688 | 20283218 | 511 | 0.000052 | org.drupal
689 | 20282834 | 3423 | 0.000011 | org.democracynow
690 | 20281418 | 1057 | 0.000029 | org.unicode
691 | 20279680 | 4248 | 0.000008 | com.roche
692 | 20279102 | 4978 | 0.000007 | re.cli
693 | 20278472 | 2959 | 0.000012 | com.kaggle
694 | 20278344 | 1208 | 0.000025 | cn.news
695 | 20278082 | 2182 | 0.000015 | cc.tiny
696 | 20277564 | 3513 | 0.000010 | org.bitcointalk
697 | 20276968 | 3180 | 0.000011 | com.gawker
698 | 20275524 | 3459 | 0.000010 | com.bigthink
699 | 20275396 | 1362 | 0.000022 | com.jekyllrb
700 | 20274250 | 1735 | 0.000017 | com.justia
701 | 20273300 | 1077 | 0.000028 | com.css-tricks
702 | 20272378 | 2821 | 0.000013 | com.motherjones
703 | 20272156 | 2850 | 0.000013 | edu.nd
704 | 20272076 | 1691 | 0.000018 | org.ourworldindata
705 | 20271098 | 1885 | 0.000016 | ca.on.gov
706 | 20270166 | 2803 | 0.000013 | com.timesofisrael
707 | 20270140 | 3646 | 0.000010 | org.project-syndicate
708 | 20269998 | 509 | 0.000052 | com.mckinsey
709 | 20269596 | 192 | 0.000140 | com.discord
710 | 20269514 | 2553 | 0.000015 | net.openid
711 | 20269466 | 1405 | 0.000021 | org.amnesty
712 | 20269452 | 2842 | 0.000013 | net.vnexpress
713 | 20268296 | 4161 | 0.000009 | com.crayola
714 | 20268292 | 1446 | 0.000021 | gov.uscourts
715 | 20267682 | 2029 | 0.000016 | gov.faa
716 | 20267344 | 484 | 0.000055 | com.onesignal
717 | 20266670 | 2380 | 0.000015 | com.lexisnexis
718 | 20265730 | 3297 | 0.000011 | com.nme
719 | 20265006 | 1231 | 0.000024 | ms.aka
720 | 20264948 | 2043 | 0.000016 | gov.usaid
721 | 20263632 | 1066 | 0.000028 | com.pcmag
722 | 20263480 | 2976 | 0.000012 | com.mathworks
723 | 20263422 | 2796 | 0.000013 | uk.ac.kcl
724 | 20263002 | 2746 | 0.000014 | fr.gouv.diplomatie
725 | 20262136 | 1854 | 0.000017 | org.worldcat
726 | 20260522 | 553 | 0.000048 | ca.youradchoices
727 | 20257736 | 3050 | 0.000012 | org.csis
728 | 20257216 | 3330 | 0.000011 | org.repec
729 | 20257192 | 2003 | 0.000016 | de.ndr
730 | 20256910 | 1193 | 0.000025 | com.playstation
731 | 20256308 | 3083 | 0.000012 | ru.kp
732 | 20254870 | 3376 | 0.000011 | no.uib
733 | 20254712 | 660 | 0.000041 | gov.nist
734 | 20253674 | 3129 | 0.000012 | org.ewg
735 | 20253570 | 2588 | 0.000014 | de.web
736 | 20253132 | 901 | 0.000034 | com.mobirise
737 | 20252678 | 3005 | 0.000012 | au.com.businessinsider
738 | 20252022 | 3586 | 0.000010 | org.polymer-project
739 | 20251846 | 541 | 0.000049 | com.sxsw
740 | 20249958 | 688 | 0.000040 | com.usnews
741 | 20248448 | 209 | 0.000125 | com.myshopify
742 | 20247520 | 342 | 0.000081 | mp.mailchi
743 | 20247494 | 812 | 0.000036 | net.b-cdn
744 | 20246430 | 4236 | 0.000008 | com.mail
745 | 20246388 | 2571 | 0.000015 | com.sina
746 | 20245658 | 1540 | 0.000020 | com.pastebin
747 | 20244724 | 4633 | 0.000008 | com.mysanantonio
748 | 20244720 | 2656 | 0.000014 | org.unctad
749 | 20243924 | 3492 | 0.000010 | com.thejakartapost
750 | 20243400 | 1289 | 0.000023 | org.coursera
751 | 20243002 | 1296 | 0.000023 | com.smashingmagazine
752 | 20242396 | 3714 | 0.000010 | io.fabric
753 | 20242344 | 164 | 0.000176 | de.bund
754 | 20241670 | 3606 | 0.000010 | com.shell
755 | 20241424 | 3064 | 0.000012 | com.biography
756 | 20241116 | 3751 | 0.000010 | com.nwsource
757 | 20240468 | 4289 | 0.000008 | build.bazel
758 | 20240144 | 2529 | 0.000015 | org.medrxiv
759 | 20236914 | 2914 | 0.000013 | com.coca-colacompany
760 | 20236502 | 946 | 0.000032 | com.shutterstock
761 | 20236242 | 949 | 0.000032 | uk.gov.legislation
762 | 20235876 | 518 | 0.000052 | com.herokuapp
763 | 20234654 | 629 | 0.000042 | it.placehold
764 | 20234380 | 5770 | 0.000006 | com.filedropper
765 | 20234228 | 4331 | 0.000008 | org.globalnetworkinitiative
766 | 20233566 | 1598 | 0.000019 | org.altervista
767 | 20233562 | 3298 | 0.000011 | com.sacbee
768 | 20233426 | 2538 | 0.000015 | org.biorxiv
769 | 20232540 | 3213 | 0.000011 | fr.rfi
770 | 20232534 | 2974 | 0.000012 | com.ericsson
771 | 20232530 | 4073 | 0.000009 | com.kinja
772 | 20230382 | 991 | 0.000031 | com.trello
773 | 20228230 | 3108 | 0.000012 | org.oas
774 | 20227650 | 1183 | 0.000025 | com.ycombinator
775 | 20226700 | 1810 | 0.000017 | org.donorbox
776 | 20226310 | 2717 | 0.000014 | com.e-monsite
777 | 20225712 | 1022 | 0.000030 | gov.fcc
778 | 20225274 | 2099 | 0.000016 | org.unodc
779 | 20224068 | 1159 | 0.000026 | com.tableau
780 | 20223780 | 75 | 0.000419 | net.cpanel
781 | 20222306 | 3580 | 0.000010 | org.tigris
782 | 20222078 | 1366 | 0.000022 | com.alexa
783 | 20221990 | 1248 | 0.000024 | gov.uspto
784 | 20221206 | 3828 | 0.000009 | com.wasabisys
785 | 20221088 | 1809 | 0.000017 | com.speakerdeck
786 | 20219794 | 2416 | 0.000015 | com.miamiherald
787 | 20219138 | 3771 | 0.000010 | com.bangkokpost
788 | 20218368 | 1125 | 0.000027 | gov.cms
789 | 20216750 | 1143 | 0.000026 | org.reactjs
790 | 20216026 | 562 | 0.000047 | com.gartner
791 | 20215672 | 1111 | 0.000027 | com.jwplayer
792 | 20215008 | 2844 | 0.000013 | edu.usf
793 | 20214258 | 2903 | 0.000013 | com.thenation
794 | 20213606 | 2730 | 0.000014 | com.washingtontimes
795 | 20213058 | 3355 | 0.000011 | com.wikidot
796 | 20212972 | 960 | 0.000031 | com.hp
797 | 20210904 | 651 | 0.000041 | gov.sec
798 | 20210232 | 2082 | 0.000016 | com.squarespace-cdn
799 | 20209450 | 2647 | 0.000014 | jp.nicovideo
800 | 20208430 | 4413 | 0.000008 | de.otto
801 | 20207608 | 2679 | 0.000014 | ru.kremlin
802 | 20207088 | 251 | 0.000106 | com.cloudinary
803 | 20206412 | 580 | 0.000046 | fr.free
804 | 20206384 | 1016 | 0.000030 | com.podbean
805 | 20206236 | 6475 | 0.000006 | com.uberant
806 | 20206196 | 714 | 0.000039 | org.apa
807 | 20204880 | 2626 | 0.000014 | se.haxx
808 | 20204770 | 4090 | 0.000009 | com.bloombergquint
809 | 20203792 | 1996 | 0.000016 | org.khanacademy
810 | 20203308 | 1081 | 0.000028 | com.engadget
8112020323037050.000010com.allafrica
8122020319032860.000011vn.com.google
8132020274651230.000007to.gplus
8142020174034680.000010my.com.thestar
8152020148436170.000010uk.org.asa
8162020094828960.000013com.simonandschuster
8172020083029040.000013com.lowes
8182020080422230.000015org.wto
819201999822070.000126com.caniuse
820201999822240.000118com.getbootstrap
8212019996626680.000014tv.ustream
8222019981441180.000009uk.co.spectator
823201988702270.000117org.icann
824201983946530.000041org.eff
8252019752234710.000010com.sputniknews
8262019634039820.000009com.manta
8272019599435090.000010uk.ac.qmul
8282019593033780.000011com.eiu
8292019540631850.000011com.financialpost
8302019539829540.000012uk.gov.metoffice
831201939822690.000102com.naver
8322019390022400.000015gov.gao
8332019313211140.000027edu.ucla
8342019242017130.000018fr.blogspot
8352019234234150.000011org.heritage
8362019205244450.000008org.scala-sbt
8372019181436520.000010com.thenationalnews
8382019140841790.000009com.rappler
8392019136042640.000008com.wusa9
8402019039632420.000011org.rferl
8412018971828070.000013ru.kommersant
8422018962238140.000009org.grist
8432018925414850.000020us.imageshack
8442018882213910.000022com.freeprivacypolicy
8452018878026280.000014org.wbur
8462018823650020.000007com.picsart
8472018803048270.000007org.frontlinedefenders
8482018770038440.000009com.newatlas
849201857805360.000050com.wufoo
8502018499614010.000021edu.northwestern
8512018341226730.000014com.fivethirtyeight
852201832527030.000039com.moz
8532018254420500.000016to.dev
8542018186037990.000009de.wwf
8552018174442660.000008com.iconarchive
8562018136637750.000009org.pri
857201794789430.000032com.redhat
85820178530550.000603com.dan
8592017848444900.000008tw.blogspot
8602017761028760.000013com.infoworld
861201757366640.000041com.aliexpress
862201756566850.000040com.photobucket
8632017454637100.000010int.au
8642017238636130.000010org.jenkins-ci
8652017196038070.000009com.obsproject
8662017018427270.000014com.discogs
8672017014845320.000008com.koreaherald
8682016968438380.000009ru.forbes
869201693809750.000031com.stackexchange
8702016757230070.000012com.yougov
8712016722835420.000010ly.plot
8722016687630860.000012org.panda
8732016668034000.000011com.law360
8742016575410640.000028com.emarketer
8752016381045490.000008org.article19
8762016377011090.000027com.merriam-webster
877201632063980.000067com.bitly
8782016195842270.000008com.prevention
8792016171263140.000006org.arkive
8802016165019360.000016com.hackerone
8812016140434500.000010com.news24
8822016138832770.000011com.foreignaffairs
8832016123244510.000008fr.huffingtonpost
884201605264430.000059com.skype
8852015839066530.000006com.booklikes
886201582829130.000033com.marketwatch
8872015800611990.000025org.webkit
8882015772640760.000009au.com.heraldsun
8892015756044760.000008org.siggraph
8902015695016480.000018com.newrelic
8912015638438840.000009gov.fec
8922015590238500.000009org.brainpickings
8932015515036310.000010de.uni-frankfurt
8942015484823390.000015com.w3techs
8952015442832670.000011edu.unh
8962015430243870.000008br.unicamp
89720153190580.000586com.afternic
8982015287657420.000006cc.kknews
8992015261010460.000029com.pwc
9002015235841820.000008com.wallethub
9012015149239870.000009com.collinsdictionary
902201502823120.000090com.webflow
9032015019445690.000008org.firstmonday
9042015016611620.000026com.appnexus
9052014971243710.000008uk.ac.westminster
9062014858447700.000007com.selfridges
9072014852233410.000011com.scotsman
9082014839419420.000016com.ssllabs
9092014767241690.000009com.datacenterknowledge
9102014658229530.000012com.washingtonexaminer
911201463964170.000063com.force
9122014571647390.000007br.ufrgs
9132014559826120.000014ru.ria
9142014507060180.000006com.armorgames
9152014441444780.000008net.middleeasteye
9162014300437430.000010com.thediplomat
9172014193043400.000008com.the-scientist
9182014185236760.000010gov.ornl
9192014133226180.000014gov.energystar
9202014031829270.000013org.wri
9212013986612730.000023org.owasp
9222013750635720.000010org.wilsoncenter
9232013722641940.000008uk.co.manchestereveningnews
924201371984520.000058gov.consumerfinance
9252013718013540.000022com.symantec
926201369368770.000035com.libsyn
9272013693014720.000020com.twilio
9282013678011770.000025com.semrush
9292013675657630.000006net.postheaven
9302013674241070.000009com.crashlytics
9312013634016080.000019com.techrepublic
9322013627814690.000020com.createjs
933201361749110.000033edu.columbia
934201355269580.000031com.buzzsprout
935201350185910.000045net.azurewebsites
9362013485827880.000013org.iucn
9372013445444840.000008com.googledrive
9382013417635560.000010org.sonatype
9392013411818040.000017ly.ow
9402013410445220.000008io.meduza
9412013399415590.000019net.msecnd
9422013397214170.000021com.weather
9432013279613750.000022com.rollingstone
9442013255041670.000009ru.aif
9452013248816490.000018com.upwork
9462013207818320.000017com.chrome
947201314184450.000059com.dmca
9482013093840970.000009org.avaaz
9492012964253340.000007cn.edu.sdu
9502012859025740.000015ru.rbc
951201285266610.000041com.figma
9522012748433910.000011nl.rug
9532012654052100.000007org.sourcewatch
9542012586651480.000007com.wsoctv
9552012554651490.000007com.linodeobjects
9562012546227240.000014int.reliefweb
9572012486031300.000012org.cfr
9582012454228590.000013com.springeropen
959201239703350.000083com.wistia
960201221989900.000031org.json
9612012198258400.000006com.grabcad
9622012111835510.000010ru.vedomosti
9632012088455050.000006org.sfpl
9642012018846370.000008ch.qos
9652011971636930.000010org.escholarship
9662011914839770.000009uk.ac.sussex
967201189823440.000081com.automattic
9682011877841310.000009com.gannett-cdn
9692011734640520.000009edu.scu
9702011727440680.000009org.nationalinterest
9712011708435100.000010com.tradingeconomics
9722011705237020.000010org.thinkprogress
9732011696442400.000008com.dawn
9742011628441660.000009cc.taplink
9752011517831060.000012ca.citizenlab
9762011482826520.000014com.bankrate
9772011468220300.000016com.tutsplus
9782011446214120.000021org.golang
9792011383257060.000006com.london2012
9802011363620330.000016org.linuxfoundation
9812011328018400.000017edu.rutgers
9822011304831790.000011org.undocs
9832011267248640.000007za.co.dailymaverick
9842011262831980.000011com.springernature
9852011255237400.000010au.edu.adelaide
9862011182246650.000008com.mnn
9872011028043120.000008ae.google
9882011027430890.000012org.crossref
9892011026235450.000010com.vox-cdn
9902011025641080.000009com.dailykos
9912010954848820.000007uk.ac.lancs
992201094728630.000036org.ieee
993201080507290.000038ca.canada
9942010684834440.000010org.cato
9952010659437720.000009gov.ustr
996201065624690.000056com.indeed
9972010655434790.000010com.cityam
9982010617241930.000008de.ebay-kleinanzeigen
9992010580410190.000030com.techtarget
1000201044946690.000040gov.copyright

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data will be useful for you to do any kind of research on ranking, graph analysis, link spam detection, etc. Let us know about your results via Common Crawl’s Google Group!

Host- and Domain-Level Web Graphs October, November/December 2021 and January 2022

We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of October, November/December 2021 and January 2022. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases. You may also visit the projects cc-webgraph and cc-pyspark which include all scripts and tools required to construct the graphs. Instructions to explore the graphs in the webgraph format are given in our collection of webgraph notebooks.

Host-level graph

The graph consists of 384 million nodes and 2.47 billion edges. Hyperlinks, HTTP redirects and link headers are all used as edges to span the graph. All types of links are included, even purely “technical” ones pointing to images, JavaScript libraries, web fonts, etc. However, only host names with a valid IANA TLD are used. Consequently, URLs with an IP address as host component are not taken into account for building the host-level graph.

There are 326 million dangling nodes (84.6%) and the largest strongly connected component contains 45.2 million (11.7%) nodes. Dangling nodes stem from

  • hosts that have not been crawled, yet are pointed to from a link on a crawled page
  • hosts without any links pointing to a different host name
  • or hosts which returned only an error page (e.g. HTTP 404)
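
All three cases above leave a host without any outgoing edge in the graph. As a minimal sketch, assuming the edges are available as ⟨from_id, to_id⟩ pairs and node ids run from 0 to the number of nodes minus one, dangling nodes could be found like this (the function name and the toy graph are illustrative, not part of the release tooling):

```python
def dangling_nodes(num_nodes, edges):
    """Return the ids of all nodes that have no outgoing edge."""
    has_outlink = [False] * num_nodes
    for from_id, _to_id in edges:
        has_outlink[from_id] = True
    return [node for node, linked in enumerate(has_outlink) if not linked]


# Toy graph with 4 hosts and the edges 0 -> 1, 0 -> 2, 1 -> 0.
# Hosts 2 and 3 never appear as a source, so they are dangling.
toy_edges = [(0, 1), (0, 2), (1, 0)]
dangling = dangling_nodes(4, toy_edges)
dangling_share = len(dangling) / 4
```

For the full host graph this single pass would have to stream the edges files instead of holding them in a list.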

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
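
The conversion can be sketched in a few lines. This is only an illustration of the notation described above; the function name is hypothetical and the actual pipeline in cc-pyspark may handle further cases:

```python
def reverse_host(host):
    """Strip a leading 'www.' and reverse the order of the name's labels."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


example = reverse_host("www.subdomain.example.com")
```

Applying the same function to a reversed name without a leading www. restores the original order, which is handy when turning node labels back into host names.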

You can download the graph and the ranks of all 384 million hosts from AWS S3 on the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/host/ (this requires an account on AWS). Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/host/ as a prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 96 gzip-compressed files listed in two path listings: one for the nodes (vertices), one for the edges (arcs). First, download the path listing and decompress it using “gzip -d” or “gunzip”. Adding the prefix s3://commoncrawl/ or https://data.commoncrawl.org/ to each line of the path listing yields the list of URLs needed to download the entire graph.
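
The same steps can be scripted. The following is a rough sketch, assuming the path listing has already been downloaded and is passed in as gzip-compressed bytes; the function name and the two sample paths are made-up placeholders, not real file names from the release:

```python
import gzip


def listing_to_urls(listing_gz_bytes, prefix="https://data.commoncrawl.org/"):
    """Decompress a path listing and prepend the prefix to every line."""
    text = gzip.decompress(listing_gz_bytes).decode("utf-8")
    return [prefix + line.strip() for line in text.splitlines() if line.strip()]


# Placeholder listing with two hypothetical paths, compressed in memory
# so the example runs without network access.
sample_listing = gzip.compress(
    b"projects/hyperlinkgraph/example/part-00000.txt.gz\n"
    b"projects/hyperlinkgraph/example/part-00001.txt.gz\n"
)
urls = listing_to_urls(sample_listing)
```

Passing prefix="s3://commoncrawl/" instead produces the corresponding S3 URLs.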

Download files of the Common Crawl Oct/Nov/Jan 2021-2022 host-level webgraph

Size | File | Description
2.66 GB | cc-main-2021-22-oct-nov-jan-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 32 vertices files
11.76 GB | cc-main-2021-22-oct-nov-jan-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 64 edges files
5.32 GB | cc-main-2021-22-oct-nov-jan-host.graph | graph in BVGraph format
2 kB | cc-main-2021-22-oct-nov-jan-host.properties |
5.78 GB | cc-main-2021-22-oct-nov-jan-host-t.graph | transpose of the graph (outlinks inverted to inlinks)
2 kB | cc-main-2021-22-oct-nov-jan-host-t.properties |
1 kB | cc-main-2021-22-oct-nov-jan-host.stats | WebGraph statistics
6.38 GB | cc-main-2021-22-oct-nov-jan-host-ranks.txt.gz | harmonic centrality and pagerank

Domain-level graph

The domain graph is built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained at publicsuffix.org. Version (commit) 68b67d3 of the public suffix list was used (commit date 2022-03-04).
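
To illustrate the aggregation, here is a simplified sketch that maps reversed host names to their pay-level domain and counts the hosts per domain. It assumes a tiny hand-written suffix set in place of the real public suffix list, ignores the list's wildcard and exception rules, and skips names that are themselves a suffix; all names in it are hypothetical and it is not the code used in cc-webgraph:

```python
# Toy stand-in for the public suffix list, written in reverse notation.
TOY_SUFFIXES = {"com", "org", "uk", "uk.co", "uk.ac"}


def pay_level_domain(rev_host, suffixes=TOY_SUFFIXES):
    """Return the reversed name one label below the longest matching suffix."""
    labels = rev_host.split(".")
    # Try the longest candidate suffix first.
    for n in range(len(labels), 0, -1):
        if ".".join(labels[:n]) in suffixes:
            if n < len(labels):
                return ".".join(labels[: n + 1])
            return None  # the name is a suffix itself (skipped in this sketch)
    return None  # no known suffix


def aggregate_hosts(rev_hosts):
    """Count the number of hosts per pay-level domain."""
    counts = {}
    for rev_host in rev_hosts:
        domain = pay_level_domain(rev_host)
        if domain is not None:
            counts[domain] = counts.get(domain, 0) + 1
    return counts


hosts = ["com.example", "com.example.blog", "uk.co.bbc", "uk.co.bbc.news"]
domain_counts = aggregate_hosts(hosts)
```

The per-domain counts correspond to the "num hosts" column of the domain vertices file; edges would be aggregated the same way by mapping both endpoints to their domain.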

The domain-level graph has 90 million nodes and 1.55 billion edges. 45 million nodes (50%) are dangling nodes, and the largest strongly connected component covers 36 million nodes (40%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/domain/ or on https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-22-oct-nov-jan/domain/.

Download files of the Common Crawl Oct/Nov/Jan 2021-2022 domain-level webgraph

Size | File | Description
0.62 GB | cc-main-2021-22-oct-nov-jan-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩
6.36 GB | cc-main-2021-22-oct-nov-jan-domain-edges.txt.gz | edges ⟨from_id, to_id⟩
3.65 GB | cc-main-2021-22-oct-nov-jan-domain.graph | graph in BVGraph format
2 kB | cc-main-2021-22-oct-nov-jan-domain.properties |
3.53 GB | cc-main-2021-22-oct-nov-jan-domain-t.graph | transpose of the graph
2 kB | cc-main-2021-22-oct-nov-jan-domain-t.properties |
1 kB | cc-main-2021-22-oct-nov-jan-domain.stats | WebGraph statistics
1.93 GB | cc-main-2021-22-oct-nov-jan-domain-ranks.txt.gz | harmonic centrality and pagerank

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 90 million domain ranks is available for download.
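
For working with the full ranks file, a small reader along the following lines may be a useful starting point. It is a sketch under assumptions: it presumes the file is tab-separated with the five columns of the table in this post (harmonic centrality rank and value, PageRank rank and value, reversed domain name) and that header lines start with "#". Please check the actual file before relying on this layout. The sample rows mirror the first three entries of the table.

```python
import csv
import gzip
import io


def read_ranks(fileobj):
    """Yield one dict per data row of a tab-separated ranks file."""
    for row in csv.reader(fileobj, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip empty lines and the assumed header line
        hc_rank, hc_value, pr_rank, pr_value, rev_domain = row
        yield {
            "hc_rank": int(hc_rank),
            "hc_value": float(hc_value),
            "pr_rank": int(pr_rank),
            "pr_value": float(pr_value),
            "rev_domain": rev_domain,
        }


# In-memory sample in the assumed layout (column names are illustrative).
sample = gzip.compress(
    b"#hc_rank\thc_value\tpr_rank\tpr_value\trev_domain\n"
    b"1\t31448418\t1\t0.017921\tcom.googleapis\n"
    b"2\t30105086\t3\t0.013006\tcom.facebook\n"
    b"3\t28991726\t2\t0.014028\tcom.google\n"
)
with gzip.open(io.BytesIO(sample), "rt", encoding="utf-8", newline="") as f:
    rows = list(read_ranks(f))

# Re-sort the sample by PageRank instead of harmonic centrality.
by_pagerank = [r["rev_domain"] for r in sorted(rows, key=lambda r: r["pr_rank"])]
```

Because the reader is a generator, the real file can be streamed through gzip.open without loading all 90 million rows into memory.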

Top 1000 domains ranked by harmonic centrality (Oct/Nov/Jan 2021-2022)

harmonic centrality rank | hc value | page rank | page rank value | reversed domain name
13144841810.017921com.googleapis
23010508630.013006com.facebook
32899172620.014028com.google
42623388660.007154org.w
52623015240.008081com.twitter
62586499480.006261com.youtube
72532525070.006558com.instagram
82466516050.007716com.googletagmanager
92409722490.004716org.gmpg
1023219788110.003948com.gstatic
1123145196130.003418com.linkedin
1222256010100.004364com.cloudflare
1321953630170.001942com.gravatar
1421841462210.001594com.pinterest
1521786278140.003223org.wordpress
1621441268250.001417org.wikipedia
1721311496160.002057com.apple
1821176702330.001077com.wordpress
1921045484320.001141com.vimeo
2021032900150.002070com.bootstrapcdn
2120995456410.000913be.youtu
2220796928180.001658com.jquery
2320726342280.001191com.microsoft
2420673758450.000789com.blogspot
2520634464230.001451io.polyfill
2620617114470.000775gl.goo
2720589400490.000736com.amazon
2820486426640.000550ly.bit
2920476754290.001178com.wixstatic
3020455806500.000729com.wp
3120452992220.001551net.cloudfront
3220445298400.000962com.amazonaws
3320410144430.000863org.mozilla
3420403210310.001162net.jsdelivr
3520395958510.000721eu.europa
3620389728370.000994com.google-analytics
3720343648200.001595com.fontawesome
3820336692910.000384com.tumblr
3920333242190.001648com.adobe
4020324398240.001421com.github
4120181488750.000480com.googleusercontent
4220150384550.000684com.flickr
43201227581040.000326com.yahoo
4420112952570.000670com.paypal
4520106820480.000752io.github
46201027501060.000314com.reddit
47200600461170.000268com.soundcloud
4820049174380.000966com.googlesyndication
4920043166810.000425com.medium
5020007290530.000703org.w3
51199947101270.000231com.nytimes
5219974404620.000611co.t
53199561441020.000338com.weebly
54199556061140.000277com.spotify
5519925440580.000656com.whatsapp
5619906786340.001038ru.yandex
5719901846850.000401org.creativecommons
58198942141360.000208org.archive
59198708461830.000139com.cnn
6019863762630.000608org.schema
6119855684600.000645com.addthis
62198513441460.000194com.forbes
63198365822030.000127uk.co.bbc
6419834544700.000513com.shopify
65198117582340.000113com.washingtonpost
6619808664690.000523com.vk
67198048621850.000138com.bing
68198036221470.000193gov.cdc
69198029141570.000172int.who
7019798680920.000383me.wp
7119780956440.000809net.doubleclick
72197607381410.000201gov.nih
7319749646590.000649com.macromedia
7419748198710.000506com.unpkg
75197476642600.000103net.researchgate
76197341102520.000106com.wsj
77197331522930.000093edu.stanford
78197316742320.000115com.imdb
79197247601880.000134org.wikimedia
8019712934390.000965net.fbcdn
81197009582060.000124com.businessinsider
82196901141430.000200com.dropbox
83196878202840.000095edu.mit
84196830601000.000363com.list-manage
85196823383080.000089com.tinyurl
8619665976270.001204org.apache
87196574461630.000157com.theguardian
88196514582990.000092com.android
89196477684450.000067com.quora
90196430181650.000156org.doi
91196365943280.000085com.go
92196298262400.000109com.bloomberg
93196297622740.000098edu.harvard
94196274445010.000060com.msn
95196258701640.000157com.issuu
96196256462540.000106com.oracle
97196246783440.000083com.springer
98196219461490.000190com.wixsite
99196151821400.000203us.zoom
100196124841310.000220com.npmjs
101196088121110.000294me.t
102196083143340.000084com.slack
103196023541240.000241com.mailchimp
104195961922440.000108com.stackoverflow
105195923123030.000091com.reuters
106195878003210.000087com.techcrunch
107195861925050.000059com.myspace
108195861262410.000109com.twimg
109195842381660.000155com.giphy
110195839082920.000093com.example
11119577384520.000709com.fb
112195767261670.000153com.yelp
113195760201690.000151com.office
114195725023410.000083com.prnewswire
115195679661900.000133com.unsplash
116195644921080.000309de.google
117195610163000.000091com.wiley
11819555072460.000785net.facebook
119195525102680.000101org.un
120195490982560.000105com.sciencedirect
121195481624890.000061com.latimes
122195475085940.000050com.livejournal
123195474521450.000196gle.forms
124195443004660.000063uk.co.telegraph
125195436904020.000078com.nature
126195423843400.000083org.npr
127195414204840.000062com.ted
128195353325140.000057edu.berkeley
129195331066470.000046com.vice
130195313282370.000110org.gnu
131195297981980.000130org.ietf
132195278527140.000042uk.ac.cam
133195249384340.000071com.time
134195223822890.000094com.bbc
135195174604970.000060com.goodreads
136195136164120.000076org.arxiv
137195100343060.000090com.cnbc
138195081841440.000197com.ytimg
139194922067000.000043edu.columbia
140194836424330.000071com.sagepub
141194759081530.000186com.ft
142194745921730.000149org.acm
143194733163230.000086com.githubusercontent
144194731324380.000069com.cnet
145194680421200.000254com.youtube-nocookie
146194585404090.000077com.wired
147194556342050.000125com.imgur
148194542105180.000057uk.co.dailymail
149194496282020.000127com.blogger
15019448116790.000459com.godaddy
15119443158260.001330cn.gov.miit
152194377583430.000083com.theverge
153194355685090.000058edu.yale
154194342361710.000150org.ampproject
155194264464720.000063com.nationalgeographic
156194232522810.000096com.squarespace
157194198808040.000037org.chromium
158194165306960.000043uk.ac.ox
159194131384880.000061com.googleblog
160194078484580.000064gov.whitehouse
161194076143040.000090com.usatoday
162194072284990.000060com.staticflickr
163194070668290.000036com.evernote
164194047341920.000131com.hubspot
165194044144760.000062org.ieee
166194023985870.000051org.worldbank
167194020044070.000077com.dribbble
168194002141320.000219com.statcounter
169193997965120.000058ee.linktr
170193993585270.000056edu.cornell
171193955001230.000243com.sharethis
172193882465190.000057com.theatlantic
173193874544080.000077com.docker
174193851826730.000044com.git-scm
175193833042140.000122com.wpengine
176193830307970.000037org.sciencemag
177193817508110.000037com.arstechnica
178193814742750.000098gg.discord
179193812221590.000167com.zendesk
180193790422640.000101uk.co.google
181193766641380.000206me.line
182193735262460.000107uk.co.amazon
183193734486020.000050com.zdnet
184193721542390.000109net.slideshare
185193690463250.000086com.appspot
186193688667020.000042com.economist
187193686207370.000041org.cambridge
188193681125710.000052com.cisco
189193671447920.000038edu.washington
190193663446480.000046org.weforum
191193617606680.000044com.box
192193616926240.000047org.pbs
193193562204790.000062org.python
194193556044360.000070com.huffingtonpost
195193542842260.000117com.outlook
196193531125420.000055com.typepad
197193500442880.000095org.pewresearch
198193456105580.000054com.cbsnews
199193427742670.000101net.windows
200193346385560.000054com.deloitte
2011933368810580.000028com.rollingstone
202193332604960.000060com.pixabay
203193332605540.000054gov.usda
204193323907200.000042google.blog
205193312865720.000052site.business
206193308025440.000055uk.co.independent
2071932516212230.000025ly.cutt
20819323700420.000906com.qq
209193236067600.000039com.apnews
210193233027450.000040ca.cbc
211193228386390.000046org.unesco
212193224344930.000061com.gitlab
213193079487040.000042com.mysql
214193074006110.000049com.pexels
215193021865860.000051gov.loc
216192976227280.000041edu.upenn
217192962627130.000042edu.wisc
218192958864850.000062com.getpocket
219192950205530.000054com.nbcnews
220192940104510.000065com.fastcompany
2211929259213810.000022com.ikea
222192912941780.000143com.tripadvisor
2231928675811710.000026org.eclipse
224192859346760.000043com.scribd
225192850007680.000039com.shutterstock
226192841786340.000047com.mozilla
2271928273812250.000025org.kernel
228192778687980.000037uk.co.blogspot
229192774807840.000038com.qz
230192722468590.000034com.ggpht
231192719883180.000087com.live
232192710809790.000030uk.co.guardian
233192689343010.000091com.w3schools
2341926539819240.000018com.lego
235192643824980.000060gov.irs
236192621488850.000034edu.jhu
237192600066720.000044com.buzzfeed
238192597086740.000044uk.co.eventbrite
239192593667960.000038com.trello
2401925845811810.000025com.technologyreview
241192561929840.000030com.playstation
2421925501410570.000028fr.lemonde
243192514585520.000054com.squareup
244192512665500.000054com.fortune
245192498145070.000059gov.nasa
246192496447510.000040me.about
247192441865960.000050com.oup
248192432562830.000095net.behance
249192417509730.000031com.foursquare
2501924036422120.000017com.hbo
251192388066350.000047fm.anchor
252192372123270.000085com.disqus
253192358809150.000033com.slate
25419234328540.000697co.g
25519232270360.000996com.baidu
256192310944560.000065com.bigcommerce
257192294301750.000146jp.co.google
258192292702490.000107com.calendly
259192292287560.000040com.vox
260192275645570.000054com.dailymotion
261192264806540.000045com.investopedia
262192255248600.000034com.ubuntu
263192251462720.000099com.bandcamp
2641922425815640.000021com.hatenablog
2651922244610870.000027co.elastic
266192214447240.000042com.newyorker
267192213608540.000035com.about
268192207624600.000064com.arcgis
269192194408140.000037com.variety
270192190247810.000038au.net.abc
271192172285510.000054com.elpais
272192156128220.000036edu.ucla
273192149428020.000037gov.congress
274192144226770.000043org.apa
275192124027400.000041com.freepik
2761921134611260.000026com.steamcommunity
277192108982500.000107gov.ca
278192098507730.000039org.pypi
279192095267190.000042com.libsyn
280192083088650.000034edu.princeton
281192079241420.000200com.opera
282192072849400.000032com.nypost
283192052968130.000037edu.umich
2841920419213390.000023com.billboard
285192040563120.000088com.typeform
286192019042800.000097com.feedburner
287192005508710.000034com.ssrn
288191997145460.000055com.tandfonline
289191979089050.000033com.podbean
290191977542240.000117page.g
291191976048270.000036org.fao
292191975388680.000034com.foxnews
293191971269640.000031com.merriam-webster
2941919624210930.000027edu.purdue
2951919033015790.000021ca.ubc
296191867727100.000042org.bitbucket
297191864181010.000349com.wix
2981918397210750.000028org.owasp
299191823722730.000099com.ibm
300191809869550.000031com.newsweek
301191782906880.000043org.semver
302191779502510.000106org.bbb
3031917037414030.000022ca.sfu
3041916991021550.000017com.discovery
3051916970614540.000022uk.co.metro
306191689484030.000078org.openstreetmap
307191681289280.000032com.webs
308191642242770.000098com.eepurl
309191633123990.000079com.netdna-ssl
310191626703100.000089com.wistia
3111916076813010.000023app.netlify
312191602429100.000033com.nasdaq
313191588846370.000047gov.senate
314191587341350.000212com.filesusr
315191567925280.000056com.snapchat
316191567323170.000088tv.twitch
3171915547213520.000023uk.co.standard
318191536349770.000030com.uk
319191504546690.000044org.eff
3201914870614330.000022io.gitlab
3211914676216090.000021com.warnerbros
3221914544810250.000029com.techradar
3231914461811730.000026com.500px
3241914363011580.000026com.pastebin
325191416445830.000051gov.epa
326191405748200.000036com.theconversation
3271913929411470.000026org.semanticscholar
328191391301250.000238com.rawgit
3291913884212280.000025com.sky
3301913653417470.000019com.flipboard
331191348024540.000065com.ebay
332191338162040.000125com.amazon-adsystem
333191330285990.000050edu.cmu
3341913292612820.000024edu.illinois
3351913271013270.000023org.greenpeace
336191317924290.000073com.optimizely
3371913056216680.000020com.urbandictionary
338191303122010.000127org.iana
339191295466090.000049gov.house
34019129270980.000373com.stripe
341191270044630.000064org.opensource
342191253782470.000107com.cloudinary
343191224569030.000033edu.academia
3441912007011820.000025org.mitre
3451911999210080.000030gov.usgs
346191190442150.000122net.sourceforge
3471911892421170.000018com.channel4
3481911813614890.000022uk.co.thesun
3491911751615580.000021com.deadline
350191146109960.000030com.thehill
351191135768970.000033edu.umn
352191132267580.000040gov.justice
3531911018616430.000020org.maven
354191100161560.000173com.addtoany
355191081844040.000077com.criteo
3561910322023560.000016com.freep
357191013681280.000230com.paypalobjects
3581909877011330.000026com.nikkei
359190987584570.000065es.google
360190964486780.000043org.oecd
3611909332812910.000024org.postgresql
3621909221419410.000018com.euronews
363190917227720.000039gov.archives
3641909126215840.000021com.reverbnation
3651909037611380.000026uk.co.mirror
366190887765640.000053com.kickstarter
3671908746027660.000013edu.byu
3681908707013380.000023edu.hbs
3691908541413820.000022com.googlesource
3701908472621160.000018edu.wustl
3711908445810100.000030com.politico
3721908322222260.000017org.nobelprize
3731908258411110.000027com.dw
3741908228410220.000029com.pingdom
375190821945480.000054com.walmart
37619078996840.000405net.jsfiddle
3771907856812840.000024ch.ethz
3781907376622990.000016gov.cia
379190734789830.000030com.salon
380190730427150.000042org.change
3811907281011840.000025com.theglobeandmail
382190714724620.000064com.elsevier
3831907122218370.000019com.storify
384190664381370.000207de.bund
385190663841210.000250com.jimdo
3861906634414290.000022edu.gatech
38719064084670.000527net.typekit
3881906273813100.000023com.digitaltrends
3891906247812860.000024int.unfccc
390190613128100.000037au.com.google
3911905974610260.000029gov.treasury
3921905951422830.000016com.mystrikingly
393190591109060.000033com.britannica
3941905817012720.000024edu.ucdavis
395190575589040.000033uk.parliament
39619056260760.000468me.fb
3971905432010270.000029com.mdpi
3981905274612440.000024com.aljazeera
399190521442070.000124com.etsy
400190520622690.000100net.azureedge
4011905181210630.000028gov.fbi
4021905172016080.000021ms.1drv
403190488368240.000036com.bmj
4041904865413250.000023de.mpg
4051904742425380.000014com.virustotal
4061904712211060.000027org.nejm
407190463362110.000123com.tiktok
408190457541940.000131org.nodejs
4091904489225660.000014com.diigo
4101904361411830.000025com.scmp
4111904280010950.000027au.com.smh
412190422587180.000042org.d3js
4131904180413110.000023com.history
4141904150611780.000026org.hrw
4151903975412350.000025uk.ac.ucl
4161903835612130.000025com.socialmediatoday
4171903577610380.000029edu.uchicago
4181903377427100.000013com.thecvf
419190303569680.000031org.readthedocs
42019030118610.000612com.googleadservices
4211902981410180.000029org.jstor
422190290708330.000035com.pinimg
4231902869624860.000015com.oxforddictionaries
4241902841021450.000017com.discogs
4251902747625680.000014edu.buffalo
4261902335218100.000019com.buzzfeednews
427190231969570.000031watch.fb
4281902249211360.000026org.sphinx-doc
4291902249018510.000018com.spreaker
4301902207815480.000021com.irishtimes
431190218506080.000049com.biomedcentral
4321901943814520.000022uk.ac.lse
433190183084690.000063org.hbr
434190181063960.000079com.statista
4351901738610450.000029com.substack
436190141822850.000095ru.ok
4371901337435090.000010com.quizlet
438190124708070.000037com.deviantart
4391901060211150.000027org.undp
4401900553219590.000018com.rt
4411900465210040.000030org.ilo
4421900441633520.000011cc.uxdesign
4431900330622340.000017org.wto
4441900269016730.000020org.rfc-editor
4451900203413020.000023com.penguinrandomhouse
446190015369200.000032de.spiegel
4471899994013180.000023com.producthunt
448189975827050.000042gov.sec
449189971285200.000057com.meetup
4501899471821620.000017com.ibtimes
4511899389810610.000028com.sun
4521899278821200.000018gov.federalreserve
4531899162417600.000019edu.arizona
454189907326000.000050edu.utah
4551899064218610.000018com.newscientist
456189897305370.000055com.gmail
4571898941011230.000026net.java
4581898664418850.000018com.itv
459189864505040.000059com.ssl-images-amazon
460189860102430.000108uk.org.ico
4611898554217810.000019ca.blogspot
46218985454730.000503net.akamaihd
463189850108400.000035in.co.google
4641898480213910.000022de.zeit
4651898448010050.000030uk.co.thetimes
4661898445810420.000029com.prweb
4671898387626610.000013com.twitpic
4681898379011950.000025io.pypa
4691898264829560.000012com.openai
470189804944470.000067net.imgix
4711897985019010.000018com.martinfowler
472189790622420.000108org.purl
473189785966550.000045de.gesetze-im-internet
474189784503970.000079net.themeforest
475189784082160.000121jp.co.yahoo
4761897799214090.000022edu.ufl
477189773504250.000073com.atlassian
4781897578413730.000023edu.duke
479189744862360.000111to.amzn
4801897413022850.000016edu.gmu
4811897400410560.000028edu.nyu
482189734145750.000052org.debian
4831897314812490.000024com.jetbrains
484189731082350.000111com.mapbox
485189730122960.000092me.telegram
4861897255612870.000024com.wikia
48718972192820.000409com.oversightboard
488189711303420.000083com.proofpoint
489189708309940.000030com.jimdofree
4901897078226380.000014org.nypl
491189688348060.000037edu.brookings
4921896847022130.000017org.wfp
4931896836624170.000015mp.j
4941896796626240.000014app.web
4951896744823630.000015com.instructables
4961896695012670.000024org.imf
4971896595211340.000026org.unhcr
4981896565016100.000021edu.virginia
4991896558630000.000012ph.telegra
5001896335821220.000018org.propublica
5011896306816550.000020edu.brown
5021896102014300.000022com.seattletimes
503189604543130.000088io.shields
5041896017423480.000016org.archlinux
505189597303550.000081com.surveymonkey
506189588927470.000040gov.state
5071895686410190.000029com.yarnpkg
5081895670621420.000017org.phys
5091895667216880.000020org.unwomen
510189552008090.000037com.fiverr
5111895488429300.000012org.vim
5121895463434470.000010com.instapaper
513189538142210.000119com.eventbrite
514189531368780.000034edu.psu
5151894851415990.000021com.asahi
5161894820623380.000016ca.ualberta
5171894770631300.000011com.rd
518189472607480.000040com.intel
5191894708824140.000015com.gfycat
5201894697626490.000013org.icrc
5211894470024670.000015org.biorxiv
5221894121416580.000020org.r-project
523189398742570.000105com.aliyuncs
524189368461390.000205com.weibo
5251893647418520.000018com.gettyimages
526189317985250.000057com.googlecode
5271893022436990.000010com.plurk
5281892932021360.000017org.unep
5291892911417370.000019com.howstuffworks
5301892892226170.000014com.udacity
5311892753418300.000019edu.georgetown
5321892673823270.000016com.esri
533189256126850.000043uk.gov.service
5341892427427420.000013jp.co.japantimes
5351892424224060.000015com.kobo
536189215985550.000054com.samsung
5371892149414470.000022fr.gouvernement
5381892051828210.000013org.wikibooks
5391891955024540.000015it.scoop
5401891754234410.000010net.openreview
5411891695021370.000017es.abc
5421891368424220.000015jp.geocities
5431891349427480.000013edu.uoregon
5441891290021190.000018google.ai
5451890939226550.000013co.carrd
5461890761621150.000018uk.co.huffingtonpost
547189044046900.000043com.mashable
548189035706040.000049com.steampowered
5491890282817300.000020org.torproject
550189025704490.000066com.netflix
5511890168422150.000017google.research
5521890047826990.000013at.ac.univie
5531889942821430.000017edu.tufts
554188977749130.000033com.thelancet
5551889491637910.000009goog.translate
556188938289820.000030org.ohchr
5571889246438800.000009com.bravesites
5581889043228870.000012org.rsf
5591889034217920.000019gov.usembassy
5601888666230160.000012com.architecturaldigest
5611888609811030.000027cn.news
5621888556623670.000015uk.bl
5631888318634070.000010uk.co.walesonline
5641888234424610.000015org.accessnow
5651888085622910.000016com.france24
5661887997633090.000011com.pearltrees
5671887890228150.000013org.freedomhouse
568188755282190.000120com.salesforce
5691887486825750.000014org.scala-lang
5701887426611420.000026be.google
5711887422026840.000013re.appsto
5721887324624000.000015org.ap
5731887300831830.000011do.bit
5741887231428330.000012com.sputniknews
5751887192621650.000017org.americanprogress
5761887118615680.000021com.chron
5771887107227920.000013org.unaids
5781886990226890.000013com.ajc
5791886855822510.000016app.vercel
580188680764950.000060com.visualstudio
5811886674613330.000023net.daringfireball
5821886546426540.000013org.csis
5831886376023500.000016com.ew
5841886209212930.000024link.page
5851886128623200.000016fr.gouv.diplomatie
586188607901890.000133ru.mail
587188606368520.000035org.mediawiki
588188599306220.000048com.thinkwithgoogle
5891885861228650.000012com.duolingo
5901885558621120.000018com.domaintools
591188553942910.000094net.secureservercdn
5921885523628600.000012com.biography
5931885396416480.000020jp.ne.goo
5941885343418270.000019com.lifewire
5951885316024790.000015ie.independent
5961885013629030.000012uk.ac.leeds
5971884948429510.000012com.allure
5981884945222410.000016com.timeout
5991884820631350.000011org.cpj
6001884701641370.000009com.bonanza
6011884432416400.000020ca.globalnews
6021884338017740.000019gov.in
6031884317017310.000020com.images-amazon
6041884311627820.000013com.depositphotos
6051884307418430.000018com.thebalance
60618842016860.000401com.livestream
607188416822870.000095com.naver
608188415584770.000062com.force
6091884075614220.000022net.codecanyon
6101883889033460.000011io.ghost
6111883888424190.000015com.teenvogue
6121883883623660.000015nz.co.stuff
6131883743827320.000013com.123rf
6141883640825080.000014com.motherjones
615188363546790.000043int.wipo
6161883633226150.000014edu.kit
6171883468417060.000020com.routledge
618188343366950.000043io.readthedocs
6191883403836920.000010com.laweekly
620188322084410.000069com.businesswire
6211883126629140.000012org.oxfam
622188298524860.000062com.adweek
6231882961624980.000014edu.hawaii
6241882921635430.000010com.udn
625188282707260.000042com.canva
6261882803429850.000012com.slides
627188279086270.000047io.codepen
6281882755226640.000013com.googlegroups
6291882634839010.000009cn.org.china
6301882570026980.000013com.coca-colacompany
631188256028880.000033uk.co.pinterest
6321882305825510.000014org.fas
6331882283011100.000027net.clickbank
6341882256830990.000011uk.co.timesonline
635188224323630.000081net.php
6361882141026120.000014edu.iastate
6371882139221440.000017com.refinery29
638188203149880.000030gov.dhs
6391881989642160.000008com.alamy
640188198029910.000030de.t-online
641188177002790.000097com.iubenda
6421881743416120.000021com.haaretz
6431881661615650.000021mil.army
6441881648430530.000011com.hm
6451881615616050.000021uk.gov.ons
646188137403590.000081mp.mailchi
6471881142834690.000010org.heritage
6481880892812220.000025org.eugdpr
6491880874823410.000016za.co.google
650188083485630.000053org.unicef
6511880820629380.000012com.theonion
652188079323450.000083com.akismet
653188072161500.000190org.networkadvertising
654188053848340.000035com.venturebeat
6551880367026710.000013com.timesofisrael
6561880296831820.000011com.ogilvy
657188009781340.000217info.aboutads
6581880034412650.000024tw.com.google
659187994725490.000054com.fc2
6601879817027150.000013com.theintercept
6611879814023770.000015com.foreignpolicy
6621879794440090.000009com.zara
6631879675830420.000012org.project-syndicate
6641879641027900.000013cn.gov.fmprc
665187955865410.000055com.patreon
6661879433229120.000012org.ballotpedia
6671879378636750.000010uk.co.guim
6681879366812030.000025com.thenextweb
6691879325024600.000015nz.co.nzherald
6701879195623290.000016gov.faa
671187916066640.000045com.entrepreneur
6721879029816990.000020com.nike
6731878822022570.000016com.voanews
6741878568236460.000010com.podomatic
675187836523940.000080jp.ameblo
6761878173236650.000010nz.co.scoop
6771877982626680.000013com.jpost
678187797707550.000040org.js
6791877887425890.000014de.tagesspiegel
680187784365910.000051com.gofundme
681187782366510.000046it.placehold
682187781306920.000043gov.nist
6831877752830900.000011no.uib
6841877726437590.000010com.clustrmaps
6851877623825330.000014com.channelnewsasia
6861877562623320.000016com.carto
6871877497025440.000014edu.usf
6881877491044010.000008uk.ac.essex
6891877447422030.000017de.br
6901877310242560.000008org.marxists
6911876955828260.000013br.com.blogspot
692187694286800.000043com.photobucket
6931876921633950.000010com.parade
6941876841237310.000010com.mongabay
695187682447420.000041com.moz
6961876807035190.000010ar.com.lanacion
6971876748011620.000026com.digitaloceanspaces
6981876725238330.000009com.scribblelive
6991876628436670.000010ru.msk
7001876514817070.000020org.oxfordjournals
7011876501818460.000018com.speakerdeck
7021876434211770.000026com.jekyllrb
7031876427213200.000023com.imageshack
704187636286980.000043com.withgoogle
7051876351827750.000013com.fineartamerica
7061876330616270.000020org.amnesty
7071876251225720.000014org.unctad
7081876194030490.000012int.au
709187611621090.000306me.wa
7101876093621840.000017org.ncsl
7111875934028570.000012uk.org.nationaltrust
7121875839644350.000008com.mysanantonio
7131875827831540.000011fr.rfi
7141875767413000.000023gov.federalregister
7151875692059680.000006org.arkive
7161875685833850.000011com.nationalreview
7171875639822320.000017org.worldcat
7181875638841600.000009com.turkishairlines
7191875610030770.000011uk.ac.york
7201875587630290.000012org.nationalgeographic
7211875539033600.000011org.tigris
722187552283620.000081com.adnxs
7231875366817650.000019com.indianexpress
7241875348232930.000011org.neocities
7251875170230580.000011ly.genial
7261875039232740.000011uk.co.penguin
727187502429190.000032com.hootsuite
7281875000234940.000010com.nme
7291874733427350.000013com.kaggle
730187468941930.000131com.discord
7311874660033570.000011de.taz
7321874629831470.000011edu.bc
7331874619030410.000012tr.com.aa
7341874554431560.000011com.cgtn
7351874513224310.000015org.unodc
736187445502820.000096gov.ftc
7371874368223440.000016eu.politico
7381874305213640.000023com.symantec
7391874083425200.000014net.openid
7401874075236370.000010il.ac.tau
7411873939824530.000015ru.ria
7421873894830310.000012com.allafrica
7431873877211720.000026jp.ac.keio
7441873820028690.000012edu.educause
7451873815634020.000010org.firstmonday
7461873812226420.000014org.wikidata
747187377884910.000061com.placeholder
7481873498029000.000012com.simonandschuster
7491873486037960.000009org.amnestyusa
7501873469621680.000017com.justia
7511873330821970.000017ca.on.gov
7521873308837070.000010uk.gov.scotland
7531873302645630.000008com.flightradar24
7541873294443260.000008com.interviewmagazine
7551873274237440.000010com.afp
7561873215441120.000009org.scala-sbt
7571873099227800.000013ae.google
7581873082413030.000023org.webkit
7591872988229470.000012com.superuser
7601872967810140.000029com.highcharts
7611872902038020.000009com.wusa9
7621872890025500.000014jp.nicovideo
7631872809421410.000017gov.pa
7641872799039370.000009org.one
7651872667229080.000012edu.uky
7661872503430520.000011in.businessinsider
7671872447433820.000011org.hypotheses
7681872337826810.000013org.wbur
769187232365690.000052com.inc
7701872106614340.000022com.upwork
7711872090842320.000008org.sourcewatch
7721872067633660.000011com.sciencealert
7731872022213160.000023de.rki
7741871911434660.000010org.royalsociety
7751871852423540.000016ru.rbc
7761871819810340.000029com.videojs
7771871753033830.000011org.polymer-project
778187172284870.000062ee.lin
7791871700832020.000011org.texastribune
7801871659810540.000028fm.last
7811871656436580.000010se.gu
7821871591621810.000017it.redd
7831871555212190.000025com.smashingmagazine
7841871553627430.000013org.undocs
7851871346027130.000013org.iucn
7861871309236620.000010com.hashicorp
7871871294417120.000020scot.gov
788187125649760.000031com.jwplayer
7891871233038430.000009edu.wayne
790187111225760.000052com.booking
791187109068030.000037com.fandom
7921871085838500.000009com.triplepundit
793187096903290.000085com.hackerone
7941870931442350.000008com.letterboxd
7951870765411510.000026com.alexa
7961870753422580.000016com.knightlab
797187069647700.000039com.sedo
7981870675033940.000010org.iucnredlist
7991870638019440.000018com.firebaseapp
8001870580840360.000009com.manta
8011870254630720.000011au.com.theage
8021870145034750.000010org.sierraclub
803187004244000.000078com.onesignal
8041870034826520.000013ru.kommersant
8051870025033790.000011com.hasbro
8061869935638210.000009edu.unu
8071869914636560.000010com.crashlytics
808186988948170.000037com.marketwatch
8091869849440710.000009ru.aif
8101869822444190.000008com.folkd
8111869815212850.000024gov.uspto
8121869773433580.000011net.ipsnews
8131869748426320.000014org.unfpa
814186973369870.000030com.stackexchange
8151869651032070.000011ly.plot
816186961845320.000056com.indeed
8171869533416670.000020fr.blogspot
818186952469720.000031com.css-tricks
819186943829330.000032org.reactjs
8201869329240830.000009com.marinetraffic
8211869294027060.000013ru.rg
8221869290043650.000008com.balenciaga
8231869246825610.000014com.kinstacdn
8241869171025860.000014build.bazel
825186907465450.000055com.digg
8261869019039970.000009jp.co.tepco
8271869018214570.000022io.webflow
8281869014846850.000008com.gmanetwork
8291868999437790.000009org.rferl
8301868987044160.000008kr.co.koreatimes
831186894528210.000036com.oreilly
832186894269530.000031gov.fcc
8331868902429570.000012com.articulate
8341868863430540.000011site.notion
8351868825624410.000015int.reliefweb
8361868821827220.000013com.insidehighered
8371868787210860.000027so.notion
8381868781044660.000008org.sfpl
8391868767235800.000010uk.co.spectator
8401868762228400.000012com.suntimes
841186874529170.000032com.verisign
8421868688029670.000012org.cfr
8431868662228190.000013org.panda
8441868629810160.000029com.mixcloud
84518686002830.000405com.messenger
846186859345330.000056jp.co.rakuten
8471868576443430.000008com.upworthy
8481868549426440.000014ru.kremlin
849186848062780.000097com.sxsw
8501868421235420.000010com.flippa
851186840065680.000052com.mckinsey
8521868395625340.000014net.convio
8531868351012310.000025com.buffer
8541868300631000.000011com.yougov
8551868292851100.000007com.viki
8561868243646740.000008org.birdlife
8571868196644130.000008com.itsnicethat
858186813746500.000046com.gartner
8591868137229270.000012uk.gov.metoffice
860186810844940.000061com.dmca
8611868084035970.000010org.jenkins-ci
8621868035830890.000011int.iom
8631867867040820.000009com.iconarchive
8641867738444800.000008com.oriflame
8651867651845380.000008net.middleeasteye
8661867585450410.000007com.waitbutwhy
8671867553443160.000008org.pen
8681867527428300.000013fm.omny
8691867428239020.000009org.icij
8701867404441880.000008org.constitutioncenter
8711867397238570.000009ch.qos
8721867347440700.000009com.9to5google
8731867343235260.000010uk.gov.companieshouse
8741867340639940.000009uk.ac.sussex
8751867325832660.000011com.foreignaffairs
8761867324628350.000012com.news24
8771867320441320.000009re.cli
8781867269042270.000008jp.ac.kobe-u
879186717149020.000033br.com.uol
8801867155237180.000010com.nybooks
8811867144618180.000019com.over-blog
8821867136255780.000006com.symbaloo
8831866961231700.000011uk.co.bbci
884186692009340.000032com.pubmatic
8851866901023850.000015com.scene7
8861866881034670.000010org.wikileaks
8871866724246370.000008org.foodandwaterwatch
8881866643833050.000011at.derstandard
889186660748760.000034com.zoho
8901866541031670.000011org.adb
8911866436235180.000010com.benzinga
892186639887300.000041com.usnews
8931866345055920.000006io.postach
8941866303035470.000010com.palgrave
8951866246411090.000027net.media
8961866212040910.000009net.datasociety
897186614265400.000055com.googleoptimize
8981866104441160.000009au.com.heraldsun
8991865906834400.000010ru.kp
9001865798826750.000013com.thenation
901186576906630.000045me.zalo
9021865705230010.000012com.unity
9031865579616350.000020org.altervista
9041865458048260.000007it.polito
9051865449044820.000008edu.odu
9061865420031710.000011org.sonatype
9071865379028530.000012net.vnexpress
908186532847350.000041com.alibaba
9091865251044540.000008com.muckrack
9101865240029370.000012com.lexology
9111865227049620.000007kr.co.hani
9121865081837230.000010com.tradingeconomics
9131865060638820.000009com.study
914186505945950.000050com.airbnb
9151864977636630.000010gov.ustr
9161864967449000.000007com.theodysseyonline
9171864952037240.000010uk.gov.homeoffice
9181864827810310.000029com.pcmag
919186471346330.000047org.joomla
9201864577033670.000011br.scielo
92118645448740.000486com.trustpilot
9221864489055520.000006au.edu.vu
9231864471031270.000011tw.com.pchome
924186442607250.000042com.splashthat
9251864356027450.000013ca.citizenlab
9261864262845990.000008com.condenast
9271864235416760.000020com.techrepublic
9281864143021250.000018io.pantheonsite
9291864132232810.000011ru.cbr
9301864124028660.000012ca.uwaterloo
9311864091243840.000008uk.co.belfasttelegraph
932186406885970.000050com.wufoo
9331863919432310.000011org.ellenmacarthurfoundation
9341863904648210.000007com.zimbio
9351863882433210.000011com.rabbitmq
936186384225470.000054com.herokuapp
9371863685256570.000006org.cgsociety
9381863491257040.000006in.teletype
939186349005310.000056com.aol
9401863402637520.000010edu.ucpress
9411863397633750.000011com.scotsman
9421863349432770.000011com.kroger
943186322724050.000077com.constantcontact
944186319308700.000034com.emarketer
9451863087456430.000006com.dbs
9461863083844070.000008au.edu.deakin
9471863063033150.000011org.osce
9481862945829710.000012com.euractiv
9491862864247880.000007com.latercera
9501862610238810.000009com.bloombergquint
951186255809520.000031com.digitalocean
9521862510236380.000010org.ushmm
9531862488837510.000010com.lawfareblog
9541862466448770.000007ke.co.google
9551862440038350.000009com.thenationalnews
9561862437847160.000007com.kongregate
9571862424051320.000007com.apsense
9581862402013420.000023com.nvidia
959186238386170.000048gov.copyright
9601862350444370.000008com.jacobinmag
9611862339629340.000012net.dwcdn
962186226806430.000046com.accenture
9631862232045290.000008uk.ac.soas
9641862116634450.000010de.test
9651862056816610.000020com.createjs
9661862014032180.000011com.obsproject
9671861997628920.000012org.gnupg
9681861987043180.000008com.washingtonian
9691861939249080.000007uk.co.birminghammail
9701861915445480.000008io.meduza
9711861905840340.000009ru.mid
9721861888212070.000025org.golang
9731861853439300.000009org.cgiar
9741861711624110.000015co.pcdn
9751861630426130.000014com.olark
9761861556210070.000030com.gumroad
9771861365227550.000013ru.tass
9781861351048250.000007com.selfridges
9791861281437700.000009fr.capital
9801861221443910.000008za.co.mg
981186121689110.000033net.atlassian
982186120448440.000035com.redhat
9831861151817490.000019com.indiegogo
9841861143850090.000007edu.utep
9851861085617270.000020org.linuxfoundation
986186104469560.000031com.att
9871860918628900.000012org.transparency
9881860858839180.000009com.encyclopedia
98918606828720.000505com.oculus
990186067726990.000043com.psychologytoday
9911860669830910.000011com.sharefile
992186065041510.000189org.whatwg
993186063547540.000040org.poynter
9941860626833860.000011com.alchemer
995186046487070.000042co.ibb
996186044322860.000095com.caniuse
9971860440227380.000013com.springeropen
9981860438624900.000014studio.flourish
9991860413843730.000008com.googledrive
10001860401444900.000008tw.com.books

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data proves useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!

Host- and Domain-Level Web Graphs June, July/August and September 2021

We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of June, July/August and September 2021. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases. You may also visit the projects cc-webgraph and cc-pyspark which include all scripts and tools required to construct the graphs. Instructions to explore the graphs in the webgraph format are given in our collection of webgraph notebooks.

Host-level graph

The graph consists of 766 million nodes and 4.95 billion edges. Hyperlinks, HTTP redirects and link headers are all used as edges to build the graph. Every type of link is included, even purely “technical” ones pointing to images, JavaScript libraries, web fonts, etc. However, only host names with a valid IANA TLD are used. Consequently, URLs with an IP address as host component are not taken into account for building the host-level graph.

There are 701 million dangling nodes (91.6%) and the largest strongly connected component contains 47.0 million (6.1%) nodes. Dangling nodes stem from

  • hosts that have not been crawled, yet are pointed to from a link on a crawled page
  • hosts without any links pointing to a different host name
  • or hosts which returned only an error page (e.g. HTTP 404)
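All three cases above leave a host without any outgoing edge in the graph. As a minimal sketch, assuming nodes are numbered from 0 and edges are given as ⟨from_id, to_id⟩ pairs, dangling nodes could be found like this (the function name and the sample edges are made up for illustration):

```python
def dangling_nodes(num_nodes, edges):
    """Return the ids of all nodes that have no outgoing edge."""
    # Every node that appears as the source of an edge has an outlink.
    has_outlink = {src for src, _ in edges}
    return [n for n in range(num_nodes) if n not in has_outlink]


# Hypothetical toy graph with 4 nodes: 0 -> 1, 0 -> 2, 1 -> 2
edges = [(0, 1), (0, 2), (1, 2)]
print(dangling_nodes(4, edges))
```

In this toy graph node 2 is only linked to and node 3 is isolated, so both are dangling.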

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
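The conversion can be sketched in a few lines of Python. This is only an illustration of the notation, not the code used in the pipeline; the function names are hypothetical and lowercasing the host name is an assumption of this sketch:

```python
def reverse_host(host):
    """Convert a host name to reverse domain name notation."""
    host = host.lower()  # assumption: labels are compared case-insensitively
    if host.startswith("www."):
        host = host[len("www."):]  # strip a single leading "www."
    return ".".join(reversed(host.split(".")))


def restore_host(rev_host):
    """Turn a reversed name back into the usual notation (without www.)."""
    return ".".join(reversed(rev_host.split(".")))


print(reverse_host("www.subdomain.example.com"))  # com.example.subdomain
print(restore_host("com.example.subdomain"))      # subdomain.example.com
```

Note that the stripped www. prefix cannot be recovered from the reversed name.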

You can download the graph and the ranks of all 766 million hosts from AWS S3 on the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/host/. Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/host/ as a prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 40 gzip-compressed files listed in two path listings – one for the nodes (vertices), one for the edges (arcs). First, download the paths listing and decompress it using “gzip”. By adding the prefix s3://commoncrawl/ or https://data.commoncrawl.org/ to each line in the path listing you get the list of URLs to download the entire graph.
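A minimal sketch of the prefixing step, assuming the path listing is a gzip-compressed text file with one relative path per line. The function name is hypothetical and the path in the example is invented, not taken from the real listing:

```python
import gzip
import io


def listing_to_urls(listing_gz, prefix="https://data.commoncrawl.org/"):
    """Decompress a paths listing and prepend the prefix to every line."""
    with gzip.open(io.BytesIO(listing_gz), "rt", encoding="utf-8") as lines:
        return [prefix + line.strip() for line in lines if line.strip()]


# Stand-in for a downloaded *.paths.gz file (the path below is made up).
sample = gzip.compress(b"projects/example/part-00000.txt.gz\n")
for url in listing_to_urls(sample):
    print(url)
```

Passing prefix="s3://commoncrawl/" instead yields the corresponding S3 paths.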

Size      File                                             Description
4.50 GB   cc-main-2021-jun-jul-sep-host-vertices.paths.gz  nodes ⟨id, rev host⟩, paths of 16 vertices files
20.76 GB  cc-main-2021-jun-jul-sep-host-edges.paths.gz     edges ⟨from_id, to_id⟩, paths of 24 edges files
8.43 GB   cc-main-2021-jun-jul-sep-host.graph              graph in BVGraph format
2 kB      cc-main-2021-jun-jul-sep-host.properties
9.83 GB   cc-main-2021-jun-jul-sep-host-t.graph            transpose of the graph (outlinks inverted to inlinks)
2 kB      cc-main-2021-jun-jul-sep-host-t.properties
1 kB      cc-main-2021-jun-jul-sep-host.stats              WebGraph statistics
11.06 GB  cc-main-2021-jun-jul-sep-host-ranks.txt.gz       harmonic centrality and pagerank
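The transpose listed in the table above is the same graph with every edge reversed, so that the successors of a node in the transpose are the hosts linking to it. On a plain list of ⟨from_id, to_id⟩ pairs the idea can be sketched as follows (a toy illustration with made-up edges, not how the BVGraph files are produced):

```python
def transpose(edges):
    """Invert every edge so that outlinks become inlinks."""
    return sorted((dst, src) for src, dst in edges)


def inlinks(node, edges):
    """List the nodes linking to `node`, read off the transposed graph."""
    return [dst for src, dst in transpose(edges) if src == node]


# Hypothetical toy graph: 0 -> 2, 1 -> 2, 2 -> 0
edges = [(0, 2), (1, 2), (2, 0)]
print(transpose(edges))
print(inlinks(2, edges))
```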

Domain-level graph

The domain graph is built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org. Version (commit) a5b046d of the public suffix list was used (commit date 2021-10-06).
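The aggregation step can be illustrated with a small sketch. It assumes host names in reverse notation and uses a tiny hardcoded stand-in for the public suffix list; the real list is far larger and also has wildcard and exception rules, which are ignored here. The function names are hypothetical, and dropping links within the same domain is an assumption of this sketch:

```python
# Tiny stand-in for the public suffix list, in reversed notation (assumption).
SUFFIXES = {"com", "org", "uk", "uk.co"}


def pay_level_domain(rev_host):
    """Return the reversed pay-level domain: public suffix plus one label."""
    labels = rev_host.split(".")
    # Length (in labels) of the longest matching public suffix.
    n = max((i for i in range(1, len(labels) + 1)
             if ".".join(labels[:i]) in SUFFIXES), default=0)
    if n == 0 or len(labels) <= n:
        return None  # unknown suffix, or nothing registered below the suffix
    return ".".join(labels[:n + 1])


def aggregate(host_edges):
    """Collapse host-level edges into distinct domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        s, d = pay_level_domain(src), pay_level_domain(dst)
        if s and d and s != d:  # assumption: links inside a domain are dropped
            domain_edges.add((s, d))
    return sorted(domain_edges)


host_edges = [
    ("uk.co.bbc.news", "com.example"),
    ("uk.co.bbc", "com.example.blog"),
    ("com.example.a", "com.example.b"),
]
print(aggregate(host_edges))
```

All three host-level edges collapse into the single domain-level edge from uk.co.bbc to com.example.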

The domain-level graph has 88 million nodes and 1.56 billion edges. 43 million nodes (49%) are dangling nodes, and the largest strongly connected component covers 35 million nodes (40%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/domain/ or via https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-jun-jul-sep/domain/.

Download files of the Common Crawl Jun/Jul/Sep 2021 domain-level webgraph

Size      File                                             Description
0.61 GB   cc-main-2021-jun-jul-sep-domain-vertices.txt.gz  nodes ⟨id, rev domain, num hosts⟩
6.34 GB   cc-main-2021-jun-jul-sep-domain-edges.txt.gz     edges ⟨from_id, to_id⟩
3.59 GB   cc-main-2021-jun-jul-sep-domain.graph            graph in BVGraph format
2 kB      cc-main-2021-jun-jul-sep-domain.properties
3.44 GB   cc-main-2021-jun-jul-sep-domain-t.graph          transpose of the graph
2 kB      cc-main-2021-jun-jul-sep-domain-t.properties
1 kB      cc-main-2021-jun-jul-sep-domain.stats            WebGraph statistics
1.89 GB   cc-main-2021-jun-jul-sep-domain-ranks.txt.gz     harmonic centrality and pagerank
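As a rough sketch of how the vertices file might be read, the snippet below parses ⟨id, rev domain, num hosts⟩ records. It assumes the three fields are tab-separated, one record per line, which should be checked against the actual file; the function name and the sample records are invented for illustration:

```python
import gzip
import io


def read_domain_vertices(stream):
    """Yield (id, reversed domain, number of hosts) tuples from text lines."""
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: fields are separated by tabs.
        node_id, rev_domain, num_hosts = line.split("\t")
        yield int(node_id), rev_domain, int(num_hosts)


# Stand-in for the downloaded vertices file (records below are made up).
sample = gzip.compress(b"0\tcom.example\t3\n1\torg.example\t1\n")
with gzip.open(io.BytesIO(sample), "rt", encoding="utf-8") as stream:
    for vertex in read_domain_vertices(stream):
        print(vertex)
```

Reading the file as a stream avoids loading all 88 million records into memory at once.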

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 88 million domain ranks is available for download.

Top 1000 domains ranked by harmonic centrality (Jun/Jul/Sep 2021)

harmonic centrality rank   hc value   page rank   page rank value   reversed domain name
13163452410.018526com.googleapis
23059044830.013879com.facebook
32934410220.015082com.google
42657920850.007506org.w
52649088640.008344com.twitter
62619450460.007270com.youtube
72559836480.006802com.instagram
82480861870.006829com.googletagmanager
92431433490.004946org.gmpg
1023501518140.003442com.linkedin
1123191984110.003661com.gstatic
1222552330100.004093com.cloudflare
1322306348170.001939com.gravatar
1422254410120.003556org.wordpress
1522159662220.001575com.pinterest
1621706482230.001396org.wikipedia
1721593592160.001952com.apple
1821537622370.001024com.wordpress
1921459498270.001176com.vimeo
2021360146410.000898be.youtu
2121289640150.002252com.bootstrapcdn
2221211768200.001717com.jquery
2321069122280.001166com.microsoft
2421014760540.000699com.blogspot
2520975968460.000793com.amazon
2620916000450.000798gl.goo
2720909476500.000720com.wp
2820878568620.000580ly.bit
2920832704350.001053com.google-analytics
3020815940330.001068com.amazonaws
3120731242770.000417com.tumblr
3220705584420.000841org.mozilla
3320703870310.001080net.cloudfront
3420700046180.001797com.adobe
3520687574490.000720eu.europa
3620682862190.001772com.github
3720663412340.001054net.jsdelivr
3820641588290.001134com.wixstatic
3920534372750.000442com.googleusercontent
4020518204210.001630com.fontawesome
4120499484960.000366com.yahoo
4220489920510.000711com.paypal
4320470068570.000645co.t
4420460526470.000784com.whatsapp
4520453542530.000704com.flickr
46204296101060.000315com.reddit
4720405542720.000449com.medium
4820396096380.000956com.googlesyndication
4920388958480.000748io.github
50203645841250.000241com.nytimes
5120363658590.000616org.w3
5220350454980.000358com.weebly
5320330952970.000363org.creativecommons
5420324280650.000510com.shopify
55203156241100.000281com.soundcloud
5620308274850.000402me.wp
5720300136600.000610org.schema
5820300004320.001072ru.yandex
59202728741320.000215com.forbes
6020266896640.000527com.vk
61202166401140.000268com.spotify
62202148181750.000146com.cnn
6320172836520.000708net.doubleclick
6420156654560.000653com.addthis
65201536421900.000132uk.co.bbc
66201518822300.000113com.wsj
67201498221350.000208gov.nih
6820146194400.000924com.baidu
69201449521610.000166com.theguardian
70201271721510.000179int.who
71201211342400.000108com.bloomberg
72201152701300.000217org.archive
73201115681630.000163com.giphy
7420107092940.000388com.list-manage
75201008301920.000130org.wikimedia
7620095176550.000682com.macromedia
77200918062290.000113com.oracle
78200815721450.000193com.imdb
79200745801870.000134com.businessinsider
80200669122970.000091edu.mit
81200664363000.000090edu.stanford
82200613141150.000267com.mailchimp
8320059906440.000800net.facebook
84200580501400.000204us.zoom
85200566824260.000065com.googleblog
86200532101710.000153com.unsplash
87200470202840.000095com.reuters
88200437221850.000137com.imgur
89200357541420.000199com.wixsite
90200253001910.000131com.stackoverflow
91200221001280.000223com.weibo
92200165601620.000163com.issuu
93200160264230.000065gov.nasa
9420014282360.001043net.fbcdn
95200139102730.000098com.android
96200127781190.000259me.t
97200123661690.000160org.ietf
98200045301390.000206com.ytimg
99200011821480.000183org.apache
100199994643180.000086com.theverge
101199930263250.000084com.slack
102199928502580.000103edu.harvard
103199898702240.000116com.washingtonpost
104199883022680.000099com.bbc
105199866424600.000060edu.cornell
106199847121550.000174com.ft
107199798561170.000261com.npmjs
108199787444160.000066com.ted
109199627384240.000065com.myspace
110199573223520.000078com.wired
111199562965260.000052com.livejournal
112199454843550.000078com.appspot
113199442742630.000101org.un
114199434582150.000119org.gnu
115199331123900.000070com.goodreads
11619932600690.000476com.godaddy
117199315223870.000071org.hbr
118199174403090.000087org.npr
119199166123220.000085com.prnewswire
120199116283320.000083net.researchgate
121199101223070.000088com.githubusercontent
12219909276240.001390io.polyfill
123199082762860.000095com.wiley
124199059602470.000106com.tiktok
125198949901930.000130com.blogger
12619892346710.000466com.unpkg
127198849781020.000322de.google
128198695564320.000064com.gmail
129198692365650.000049com.vice
130198684066270.000045org.chromium
131198671461460.000186gle.forms
132198669161130.000269com.youtube-nocookie
133198668341790.000142org.ampproject
134198566083540.000078com.time
135198562726680.000043edu.upenn
136198560403700.000074com.example
137198548186330.000045com.economist
138198508847270.000040com.evernote
139198424345000.000055com.steampowered
140198413806920.000042google.blog
141198413544620.000060com.theatlantic
142198400365860.000047org.weforum
143198375966280.000045com.deviantart
144198370722390.000109uk.co.google
145198334703570.000077org.arxiv
146198334023950.000070com.scribd
147198318824070.000067uk.co.telegraph
148198298783680.000074com.huffingtonpost
149198298766420.000045com.mysql
150198218005550.000050org.worldbank
151198181002420.000107com.sciencedirect
152198166763470.000080com.nature
153198149922280.000113com.twimg
154198123361200.000259com.statcounter
155198100982460.000106org.acm
156198052145540.000050org.ieee
157197996863810.000071com.fastcompany
158197968922550.000104org.python
159197949067010.000041com.apnews
160197908204310.000064com.meetup
161197902767380.000040com.qz
162197898585810.000047com.globenewswire
163197884163650.000074com.docker
164197884024440.000062com.pixabay
165197880384530.000061uk.co.dailymail
166197844423210.000085com.springer
167197835522510.000105com.bandcamp
168197792882560.000103net.behance
169197780763610.000075com.gitlab
170197689486030.000046com.git-scm
171197642245400.000051io.readthedocs
172197639827150.000041com.engadget
173197630346470.000044com.trello
174197611401830.000138com.bing
175197581203460.000080com.usatoday
176197573262200.000117com.squarespace
177197524281580.000169com.yelp
178197478843530.000078com.dribbble
179197477564170.000066com.digg
180197475321430.000199com.dropbox
181197448182760.000097com.ibm
182197340924640.000060uk.co.independent
183197333103980.000069com.w3schools
184197238005440.000051ee.linktr
185197233346740.000043uk.co.blogspot
186197232004120.000067com.staticflickr
187197212285510.000050com.pexels
188197212241440.000194gov.cdc
189197188025390.000051org.pbs
190197183887750.000038com.stackexchange
191197182227030.000041org.cambridge
1921971735211030.000029org.eclipse
19319716696430.000819com.fb
194197125166780.000042edu.columbia
195197103081000.000351com.wix
196197062567300.000040edu.washington
197197056122900.000094com.tinyurl
198197051263670.000074com.sagepub
199197042586730.000043me.about
200196990442190.000117net.slideshare
201196948186530.000044org.sciencemag
202196890942940.000091org.pewresearch
203196857706010.000046com.withgoogle
204196849564580.000061com.herokuapp
205196751884400.000063com.quora
206196730701230.000246com.sharethis
20719672408390.000941com.qq
208196719821760.000145org.doi
209196700884380.000063co.ibb
210196676746140.000046com.newyorker
2111966472411830.000027com.nike
212196642403160.000087com.typeform
213196582542480.000106com.outlook
214196544367360.000040com.hp
215196541047910.000037com.foxnews
216196510262340.000112com.cloudinary
217196482149330.000035edu.princeton
218196465745720.000048com.moz
219196422604370.000063com.getpocket
220196398984850.000057com.nbcnews
221196388366450.000044org.bitbucket
222196358722020.000124page.g
223196351421540.000176gov.privacyshield
224196349242770.000096com.disqus
225196249361410.000203com.opera
226196236144630.000060com.airbnb
227196203189600.000034com.dropboxusercontent
228196186243720.000073com.force
229196177969230.000035co.elastic
230196140062140.000119com.wpengine
231196131646790.000042org.semver
232196131603050.000089com.typepad
233196112549720.000033com.nypost
234196107987260.000040com.ubuntu
2351961007012630.000025se.haxx
236196052343030.000089com.live
237196034286830.000042au.net.abc
238196033604650.000060com.mozilla
239195997643820.000071com.criteo
2401959318412320.000026uk.co.thesun
2411958135414340.000023edu.rutgers
242195810202780.000096com.feedburner
243195772889690.000033com.politico
2441957241410540.000030co.g
2451957237218490.000019com.instructables
246195708747780.000038com.sap
2471956869411230.000028org.greenpeace
2481956413411270.000028org.kernel
2491956228016500.000022com.googlesource
250195614841240.000245com.filesusr
2511955944411500.000028com.unity3d
252195580566820.000042com.freepik
253 | 19557106 | 542 | 0.000051 | com.fortune
254 | 19553954 | 638 | 0.000045 | uk.ac.ox
255 | 19553634 | 209 | 0.000121 | org.iana
256 | 19551872 | 241 | 0.000108 | com.eepurl
257 | 19551398 | 958 | 0.000034 | com.ssrn
258 | 19551010 | 866 | 0.000035 | com.nvidia
259 | 19549544 | 1569 | 0.000022 | com.storify
260 | 19549036 | 1000 | 0.000032 | com.sun
261 | 19548514 | 624 | 0.000045 | uk.co.eventbrite
262 | 19544466 | 829 | 0.000036 | edu.jhu
263 | 19541150 | 607 | 0.000046 | net.azurewebsites
264 | 19539076 | 1291 | 0.000025 | com.reverbnation
265 | 19538660 | 439 | 0.000063 | gov.fda
266 | 19538044 | 99 | 0.000351 | com.stripe
267 | 19534860 | 927 | 0.000035 | com.podbean
268 | 19532612 | 341 | 0.000082 | net.windows
269 | 19530966 | 1256 | 0.000025 | uk.co.ebay
270 | 19530706 | 260 | 0.000102 | com.calendly
271 | 19526632 | 804 | 0.000037 | com.chrome
272 | 19525016 | 1804 | 0.000020 | com.martinfowler
273 | 19524922 | 926 | 0.000035 | edu.academia
274 | 19523484 | 568 | 0.000049 | site.business
275 | 19521568 | 231 | 0.000113 | com.office
276 | 19521106 | 328 | 0.000084 | com.netdna-ssl
277 | 19518874 | 253 | 0.000105 | com.newsweek
278 | 19516880 | 210 | 0.000120 | tv.twitch
279 | 19515182 | 1220 | 0.000026 | com.vogue
280 | 19514726 | 2307 | 0.000017 | com.diigo
281 | 19514662 | 1214 | 0.000026 | org.postgresql
282 | 19513292 | 691 | 0.000042 | com.xinhuanet
283 | 19508378 | 1368 | 0.000024 | de.mpg
284 | 19508326 | 525 | 0.000052 | com.squareup
285 | 19507892 | 408 | 0.000067 | org.debian
286 | 19502082 | 121 | 0.000252 | com.paypalobjects
287 | 19501986 | 716 | 0.000041 | gov.senate
288 | 19501788 | 2392 | 0.000016 | com.pearltrees
289 | 19500240 | 1147 | 0.000028 | com.500px
290 | 19499046 | 343 | 0.000081 | com.googlecode
291 | 19497980 | 712 | 0.000041 | org.change
292 | 19497516 | 491 | 0.000056 | com.tandfonline
293 | 19496848 | 67 | 0.000505 | net.akamaihd
294 | 19492076 | 1265 | 0.000025 | com.aljazeera
295 | 19492054 | 706 | 0.000041 | com.qualtrics
296 | 19490750 | 783 | 0.000037 | com.theconversation
297 | 19490388 | 1078 | 0.000029 | com.theglobeandmail
298 | 19486228 | 134 | 0.000211 | de.bund
299 | 19485788 | 1299 | 0.000025 | edu.illinois
300 | 19485786 | 327 | 0.000084 | com.cnbc
301 | 19484140 | 858 | 0.000036 | uk.co.guardian
302 | 19483768 | 483 | 0.000057 | com.msn
303 | 19475416 | 129 | 0.000219 | com.rawgit
304 | 19472372 | 474 | 0.000059 | com.stumbleupon
305 | 19469278 | 198 | 0.000125 | net.sourceforge
306 | 19468982 | 356 | 0.000077 | com.optimizely
307 | 19468656 | 470 | 0.000059 | org.openstreetmap
308 | 19467526 | 335 | 0.000083 | com.techcrunch
309 | 19465998 | 434 | 0.000064 | com.ssl-images-amazon
310 | 19464188 | 1405 | 0.000023 | edu.ufl
311 | 19462912 | 1395 | 0.000023 | edu.gatech
312 | 19462816 | 165 | 0.000162 | com.hubspot
313 | 19462736 | 310 | 0.000087 | com.mapbox
314 | 19462404 | 441 | 0.000063 | com.go
315 | 19459072 | 608 | 0.000046 | gov.noaa
316 | 19458846 | 1470 | 0.000023 | com.channel4
317 | 19458794 | 1237 | 0.000026 | ca.sfu
318 | 19457698 | 592 | 0.000047 | com.healthline
319 | 19457660 | 794 | 0.000037 | org.fao
320 | 19457528 | 507 | 0.000054 | ca.google
321 | 19457170 | 2431 | 0.000016 | com.wattpad
322 | 19454268 | 1280 | 0.000025 | uk.co.standard
323 | 19452678 | 667 | 0.000043 | gov.house
324 | 19451936 | 1309 | 0.000025 | uk.co.wired
325 | 19450294 | 1862 | 0.000019 | com.invisionapp
326 | 19449754 | 743 | 0.000039 | com.pinimg
327 | 19449738 | 184 | 0.000138 | com.amazon-adsystem
328 | 19449480 | 1836 | 0.000019 | org.maven
329 | 19446242 | 2507 | 0.000015 | com.openai
330 | 19443698 | 803 | 0.000037 | org.pypi
331 | 19442142 | 262 | 0.000101 | net.azureedge
332 | 19441000 | 490 | 0.000056 | com.kickstarter
333 | 19439570 | 1962 | 0.000019 | uk.bl
334 | 19439402 | 1236 | 0.000026 | au.com.smh
335 | 19436904 | 1339 | 0.000024 | com.vanityfair
336 | 19435504 | 261 | 0.000102 | uk.org.ico
337 | 19434368 | 153 | 0.000176 | com.addtoany
338 | 19427034 | 986 | 0.000032 | com.uk
339 | 19426114 | 1097 | 0.000029 | com.scmp
340 | 19424036 | 981 | 0.000033 | org.pnas
341 | 19422474 | 433 | 0.000064 | com.cnet
342 | 19422174 | 348 | 0.000080 | com.statista
343 | 19421360 | 149 | 0.000180 | org.nodejs
344 | 19418836 | 793 | 0.000037 | us.icio
345 | 19416290 | 2916 | 0.000013 | com.instapaper
346 | 19415752 | 489 | 0.000056 | gov.epa
347 | 19405682 | 1003 | 0.000032 | com.mixcloud
348 | 19404500 | 677 | 0.000042 | org.d3js
349 | 19401872 | 851 | 0.000036 | com.britannica
350 | 19400204 | 593 | 0.000047 | uk.gov.service
351 | 19399466 | 177 | 0.000143 | org.allaboutcookies
352 | 19396954 | 546 | 0.000050 | edu.berkeley
353 | 19395162 | 296 | 0.000091 | me.telegram
354 | 19393670 | 1436 | 0.000023 | com.irishtimes
355 | 19392716 | 944 | 0.000034 | int.coe
356 | 19388874 | 157 | 0.000170 | com.zendesk
357 | 19388222 | 1507 | 0.000022 | org.hrc
358 | 19387040 | 1251 | 0.000025 | com.history
359 | 19386872 | 152 | 0.000178 | io.shields
360 | 19386294 | 1686 | 0.000021 | ms.1drv
361 | 19384766 | 548 | 0.000050 | com.biomedcentral
362 | 19384760 | 473 | 0.000059 | com.latimes
363 | 19381234 | 1062 | 0.000030 | org.jstor
364 | 19381206 | 1124 | 0.000028 | com.jetbrains
365 | 19376152 | 1084 | 0.000029 | org.ilo
366 | 19376036 | 777 | 0.000038 | edu.psu
367 | 19375334 | 137 | 0.000207 | com.youronlinechoices
368 | 19374986 | 895 | 0.000035 | com.ecwid
369 | 19371930 | 1067 | 0.000030 | com.brightcove
370 | 19371460 | 748 | 0.000039 | it.scoop
371 | 19369410 | 329 | 0.000084 | ru.ok
372 | 19364910 | 1226 | 0.000026 | com.digitaltrends
373 | 19362788 | 1004 | 0.000032 | uk.co.thetimes
374 | 19359762 | 1404 | 0.000023 | com.thedailybeast
375 | 19359122 | 1604 | 0.000022 | edu.osu
376 | 19354290 | 513 | 0.000054 | edu.yale
377 | 19351764 | 116 | 0.000262 | com.jimdo
378 | 19349800 | 2354 | 0.000016 | com.fastcodesign
379 | 19349466 | 988 | 0.000032 | uk.parliament
380 | 19348128 | 456 | 0.000061 | org.freecodecamp
381 | 19343488 | 2209 | 0.000018 | com.us
382 | 19340214 | 573 | 0.000048 | com.deloitte
383 | 19339976 | 1010 | 0.000031 | uk.co.huffingtonpost
384 | 19339738 | 605 | 0.000046 | com.zdnet
385 | 19339098 | 181 | 0.000139 | ru.mail
386 | 19337572 | 415 | 0.000066 | com.elsevier
387 | 19335810 | 1185 | 0.000027 | org.nejm
388 | 19335572 | 2422 | 0.000016 | com.instructure
389 | 19335456 | 427 | 0.000065 | net.imgix
390 | 19334596 | 1714 | 0.000021 | com.citrix
391 | 19330958 | 2590 | 0.000014 | org.aclweb
392 | 19330886 | 1964 | 0.000018 | org.haskell
393 | 19326754 | 649 | 0.000044 | gov.state
394 | 19326634 | 1350 | 0.000024 | app.netlify
395 | 19326068 | 801 | 0.000037 | com.venturebeat
396 | 19323896 | 217 | 0.000118 | com.eventbrite
397 | 19323876 | 622 | 0.000045 | com.seattletimes
398 | 19323284 | 1868 | 0.000019 | jp.ac.u-tokyo
399 | 19322408 | 3048 | 0.000012 | org.uxplanet
400 | 19319794 | 1186 | 0.000027 | com.dw
401 | 19318404 | 1248 | 0.000026 | org.undp
402 | 19317950 | 211 | 0.000120 | com.etsy
403 | 19317948 | 1763 | 0.000020 | com.itv
404 | 19317548 | 233 | 0.000112 | net.php
405 | 19314522 | 58 | 0.000638 | com.googleadservices
406 | 19314004 | 271 | 0.000098 | com.surveymonkey
407 | 19310004 | 245 | 0.000107 | org.aboutcookies
408 | 19309574 | 1900 | 0.000019 | edu.vt
409 | 19307434 | 2477 | 0.000015 | org.wikibooks
410 | 19305780 | 687 | 0.000042 | gov.nist
411 | 19297658 | 1423 | 0.000023 | com.thehindu
412 | 19297142 | 1120 | 0.000028 | org.hrw
413 | 19295118 | 600 | 0.000046 | com.thinkwithgoogle
414 | 19294574 | 1682 | 0.000021 | gov.usembassy
415 | 19294274 | 656 | 0.000044 | com.intel
416 | 19292884 | 1725 | 0.000021 | int.unfccc
417 | 19291806 | 394 | 0.000070 | com.ebay
418 | 19289908 | 2419 | 0.000016 | google.ai
419 | 19286562 | 2270 | 0.000017 | com.netvibes
420 | 19285156 | 2562 | 0.000015 | io.material
421 | 19283106 | 2235 | 0.000017 | ly.rebrand
422 | 19280870 | 1954 | 0.000019 | org.archlinux
423 | 19280114 | 878 | 0.000035 | uk.co.pinterest
424 | 19279216 | 2833 | 0.000013 | org.doctorswithoutborders
425 | 19277590 | 2069 | 0.000018 | org.accessnow
426 | 19272506 | 1482 | 0.000023 | com.findlaw
427 | 19271938 | 800 | 0.000037 | net.clickbank
428 | 19270876 | 3536 | 0.000010 | com.viki
429 | 19269508 | 980 | 0.000033 | edu.brookings
430 | 19265638 | 2363 | 0.000016 | co.carrd
431 | 19265518 | 2950 | 0.000012 | org.neocities
432 | 19261700 | 1294 | 0.000025 | com.wikia
433 | 19260608 | 657 | 0.000044 | com.mashable
434 | 19260346 | 798 | 0.000037 | com.thelancet
435 | 19259074 | 974 | 0.000033 | uk.ac.cam
436 | 19258964 | 2416 | 0.000016 | org.rsf
437 | 19258786 | 1841 | 0.000019 | net.daringfireball
438 | 19255684 | 708 | 0.000041 | com.canva
439 | 19255674 | 495 | 0.000056 | gov.whitehouse
440 | 19253374 | 204 | 0.000122 | com.salesforce
441 | 19252328 | 1191 | 0.000027 | com.thenextweb
442 | 19251934 | 2076 | 0.000018 | com.france24
443 | 19251914 | 534 | 0.000052 | io.codepen
444 | 19250670 | 2896 | 0.000013 | com.laweekly
445 | 19250196 | 1218 | 0.000026 | com.licdn
446 | 19248894 | 2839 | 0.000013 | cc.uxdesign
447 | 19245136 | 2368 | 0.000016 | edu.kit
448 | 19240312 | 1071 | 0.000030 | watch.fb
449 | 19239224 | 2583 | 0.000014 | org.scala-lang
450 | 19238926 | 2513 | 0.000015 | au.com.theage
451 | 19234622 | 2644 | 0.000014 | com.hubpages
452 | 19232860 | 1533 | 0.000022 | ch.ipcc
453 | 19232358 | 1092 | 0.000029 | com.digitaloceanspaces
454 | 19231078 | 2435 | 0.000015 | org.vim
455 | 19227408 | 1721 | 0.000021 | com.refinery29
456 | 19227102 | 292 | 0.000093 | net.secureservercdn
457 | 19225996 | 769 | 0.000038 | com.marketwatch
458 | 19224690 | 2608 | 0.000014 | app.web
459 | 19224666 | 1681 | 0.000022 | org.unwomen
460 | 19223622 | 2447 | 0.000015 | com.fineartamerica
461 | 19222710 | 2566 | 0.000015 | nl.blogspot
462 | 19219678 | 537 | 0.000051 | edu.cmu
463 | 19219662 | 505 | 0.000054 | fr.free
464 | 19217122 | 831 | 0.000036 | com.box
465 | 19216944 | 1159 | 0.000027 | com.imageshack
466 | 19215358 | 2409 | 0.000016 | edu.usf
467 | 19214138 | 2396 | 0.000016 | nz.co.nzherald
468 | 19212598 | 2758 | 0.000013 | com.smashwords
469 | 19208544 | 364 | 0.000074 | net.datatables
470 | 19207930 | 419 | 0.000066 | com.nationalgeographic
471 | 19207832 | 287 | 0.000095 | com.iubenda
472 | 19206220 | 2470 | 0.000015 | re.appsto
473 | 19205746 | 235 | 0.000112 | com.adnxs
474 | 19203826 | 2509 | 0.000015 | org.gentoo
475 | 19203684 | 1982 | 0.000018 | com.voanews
476 | 19203064 | 2498 | 0.000015 | com.superuser
477 | 19201090 | 379 | 0.000072 | com.businesswire
478 | 19200162 | 643 | 0.000045 | int.wipo
479 | 19199426 | 2201 | 0.000018 | org.biorxiv
480 | 19198874 | 1451 | 0.000023 | org.amnesty
481 | 19198454 | 2436 | 0.000015 | com.oregonlive
482 | 19198448 | 2033 | 0.000018 | org.nobelprize
483 | 19197824 | 81 | 0.000414 | net.jsfiddle
484 | 19197384 | 1963 | 0.000019 | com.ew
485 | 19195648 | 1008 | 0.000031 | com.arstechnica
486 | 19194700 | 1826 | 0.000020 | org.ocks
487 | 19194656 | 265 | 0.000101 | com.aliyuncs
488 | 19192314 | 2649 | 0.000014 | com.dezeen
489 | 19191524 | 2678 | 0.000014 | org.transparency
490 | 19191474 | 925 | 0.000035 | org.mediawiki
491 | 19191020 | 2853 | 0.000013 | com.scribblelive
492 | 19190866 | 1357 | 0.000024 | io.gitlab
493 | 19190776 | 1683 | 0.000021 | org.aiga
494 | 19188994 | 1985 | 0.000018 | uk.gov.tfl
495 | 19186250 | 428 | 0.000064 | com.adweek
496 | 19185528 | 1941 | 0.000019 | org.unep
497 | 19185444 | 430 | 0.000064 | org.js
498 | 19184030 | 371 | 0.000073 | com.atlassian
499 | 19183274 | 2282 | 0.000017 | com.foreignpolicy
500 | 19181288 | 2696 | 0.000014 | org.democracynow
501 | 19181160 | 985 | 0.000032 | com.webs
502 | 19181124 | 1198 | 0.000026 | com.wetransfer
503 | 19179532 | 1369 | 0.000024 | org.altervista
504 | 19177814 | 2501 | 0.000015 | google.research
505 | 19177102 | 2959 | 0.000012 | za.co.iol
506 | 19175572 | 1028 | 0.000031 | com.slate
507 | 19175528 | 2713 | 0.000014 | org.cpj
508 | 19174456 | 2016 | 0.000018 | org.example
509 | 19173782 | 2390 | 0.000016 | com.googlegroups
510 | 19167902 | 243 | 0.000107 | com.naver
511 | 19164304 | 2266 | 0.000017 | net.openid
512 | 19164154 | 3058 | 0.000012 | com.deepmind
513 | 19164022 | 269 | 0.000099 | org.drupal
514 | 19163716 | 264 | 0.000101 | gov.ca
515 | 19163714 | 414 | 0.000067 | com.livechatinc
516 | 19155586 | 2328 | 0.000016 | com.washingtontimes
517 | 19153082 | 637 | 0.000045 | com.cbsnews
518 | 19151824 | 759 | 0.000038 | com.oreilly
519 | 19150012 | 3034 | 0.000012 | com.podomatic
520 | 19149538 | 664 | 0.000043 | gov.loc
521 | 19147634 | 136 | 0.000208 | org.networkadvertising
522 | 19146702 | 718 | 0.000041 | com.buzzfeed
523 | 19144896 | 1333 | 0.000024 | link.page
524 | 19143440 | 838 | 0.000036 | com.pcmag
525 | 19142224 | 956 | 0.000034 | com.verisign
526 | 19133476 | 2892 | 0.000013 | com.thoughtworks
527 | 19132758 | 2669 | 0.000014 | uk.co.timesonline
528 | 19131302 | 274 | 0.000097 | com.getbootstrap
529 | 19130872 | 3082 | 0.000012 | com.mariadb
530 | 19129650 | 1116 | 0.000028 | com.jekyllrb
531 | 19128992 | 938 | 0.000034 | com.vox
532 | 19127508 | 127 | 0.000234 | info.aboutads
533 | 19122362 | 492 | 0.000056 | com.patreon
534 | 19122034 | 2488 | 0.000015 | com.curbed
535 | 19121736 | 514 | 0.000054 | it.placehold
536 | 19121622 | 1807 | 0.000020 | com.ascentlawfirm
537 | 19118802 | 227 | 0.000114 | to.amzn
538 | 19117462 | 580 | 0.000047 | com.visualstudio
539 | 19117426 | 1281 | 0.000025 | com.smashingmagazine
540 | 19116926 | 499 | 0.000055 | com.sxsw
541 | 19116922 | 978 | 0.000033 | com.hootsuite
542 | 19116384 | 282 | 0.000095 | gov.ftc
543 | 19113570 | 2336 | 0.000016 | com.snopes
544 | 19110698 | 1392 | 0.000023 | com.upwork
545 | 19109920 | 1388 | 0.000024 | com.haaretz
546 | 19108142 | 1701 | 0.000021 | com.firebaseapp
547 | 19104686 | 920 | 0.000035 | com.zoho
548 | 19103768 | 3023 | 0.000012 | org.peta
549 | 19100220 | 1174 | 0.000027 | com.att
550 | 19097502 | 1687 | 0.000021 | com.techrepublic
551 | 19097348 | 1776 | 0.000020 | com.surveygizmo
552 | 19097050 | 2394 | 0.000016 | com.treehugger
553 | 19095294 | 3359 | 0.000011 | com.letterboxd
554 | 19094856 | 3025 | 0.000012 | gov.anl
555 | 19094436 | 2568 | 0.000015 | com.kaggle
556 | 19091430 | 2479 | 0.000015 | fm.omny
557 | 19091422 | 2999 | 0.000012 | com.bangkokpost
558 | 19089542 | 478 | 0.000058 | gov.irs
559 | 19086962 | 1790 | 0.000020 | ca.bc.gov
560 | 19086880 | 823 | 0.000036 | com.emarketer
561 | 19086854 | 1173 | 0.000027 | com.mediaplex
562 | 19086370 | 259 | 0.000103 | uk.co.amazon
563 | 19084140 | 2737 | 0.000014 | int.au
564 | 19083740 | 2514 | 0.000015 | no.google
565 | 19083612 | 3618 | 0.000010 | com.newgrounds
566 | 19083596 | 186 | 0.000134 | jp.co.yahoo
567 | 19082852 | 3108 | 0.000012 | org.hypotheses
568 | 19082332 | 337 | 0.000082 | mp.mailchi
569 | 19082228 | 2808 | 0.000013 | com.usmagazine
570 | 19080648 | 1674 | 0.000022 | com.routledge
571 | 19078904 | 2881 | 0.000013 | org.polymer-project
572 | 19077732 | 2446 | 0.000015 | org.unctad
573 | 19077060 | 180 | 0.000140 | com.caniuse
574 | 19076142 | 344 | 0.000080 | com.onesignal
575 | 19073714 | 2714 | 0.000014 | int.interpol
576 | 19072522 | 3248 | 0.000011 | org.elasticsearch
577 | 19072218 | 620 | 0.000045 | com.entrepreneur
578 | 19072108 | 2550 | 0.000015 | uk.gov.metoffice
579 | 19071016 | 2921 | 0.000013 | org.jenkins-ci
580 | 19070052 | 632 | 0.000045 | com.samsung
581 | 19070008 | 1367 | 0.000024 | org.unicode
582 | 19069412 | 3381 | 0.000011 | uk.mod
583 | 19069358 | 2715 | 0.000014 | org.mozillazine
584 | 19067032 | 3148 | 0.000011 | edu.ucpress
585 | 19066706 | 1156 | 0.000027 | com.gizmodo
586 | 19064356 | 1901 | 0.000019 | org.americanbar
587 | 19063358 | 3593 | 0.000010 | org.scala-sbt
588 | 19062892 | 330 | 0.000084 | ai.shortpixel
589 | 19059914 | 2912 | 0.000013 | in.indiatoday
590 | 19058702 | 301 | 0.000090 | gg.discord
591 | 19058544 | 3687 | 0.000010 | jp.riken
592 | 19056820 | 2440 | 0.000015 | com.timesofisrael
593 | 19056518 | 3064 | 0.000012 | com.manta
594 | 19056166 | 785 | 0.000037 | com.fandom
595 | 19056052 | 1274 | 0.000025 | com.sfgate
596 | 19054812 | 1994 | 0.000018 | com.knightlab
597 | 19053684 | 2042 | 0.000018 | org.donorbox
598 | 19053668 | 2251 | 0.000017 | eu.politico
599 | 19051874 | 2494 | 0.000015 | org.gnupg
600 | 19051168 | 84 | 0.000402 | me.ogp
601 | 19050784 | 575 | 0.000048 | com.cisco
602 | 19050146 | 2757 | 0.000013 | uk.ac.york
603 | 19048588 | 1189 | 0.000027 | com.buffer
604 | 19048534 | 3065 | 0.000012 | uk.org.wwf
605 | 19047192 | 929 | 0.000035 | com.variety
606 | 19045218 | 3517 | 0.000010 | com.flightradar24
607 | 19044416 | 3387 | 0.000011 | com.flock
608 | 19044368 | 579 | 0.000048 | com.sedo
609 | 19044084 | 871 | 0.000035 | com.libsyn
610 | 19043162 | 2561 | 0.000015 | com.thenation
611 | 19042992 | 2569 | 0.000015 | com.monday
612 | 19042226 | 482 | 0.000057 | com.arcgis
613 | 19042098 | 3223 | 0.000011 | net.inquirer
614 | 19038824 | 2699 | 0.000014 | com.real
615 | 19036762 | 2195 | 0.000018 | com.secondlife
616 | 19035414 | 732 | 0.000040 | org.unesco
617 | 19035292 | 932 | 0.000035 | com.wikihow
618 | 19034196 | 2609 | 0.000014 | uk.ac.leeds
619 | 19033676 | 80 | 0.000415 | com.livestream
620 | 19031566 | 3100 | 0.000012 | org.cato
621 | 19029288 | 2716 | 0.000014 | org.sonatype
622 | 19028098 | 3169 | 0.000011 | com.intensedebate
623 | 19028006 | 1089 | 0.000029 | com.symantec
624 | 19027462 | 4048 | 0.000009 | org.jw
625 | 19027208 | 2793 | 0.000013 | com.wayfair
626 | 19026864 | 1922 | 0.000019 | com.scene7
627 | 19026030 | 76 | 0.000421 | com.messenger
628 | 19025656 | 1689 | 0.000021 | org.coursera
629 | 19024592 | 1196 | 0.000026 | edu.umn
630 | 19024030 | 3093 | 0.000012 | org.rferl
631 | 19023590 | 2512 | 0.000015 | org.wikidata
632 | 19022464 | 644 | 0.000045 | com.psychologytoday
633 | 19021606 | 3091 | 0.000012 | com.vancouversun
634 | 19021016 | 2496 | 0.000015 | uk.org.nationaltrust
635 | 19020356 | 1553 | 0.000022 | ly.ow
636 | 19020300 | 1264 | 0.000025 | edu.ucsd
637 | 19020198 | 2905 | 0.000013 | tr.com.aa
638 | 19018958 | 3767 | 0.000010 | it.polito
639 | 19017674 | 3389 | 0.000011 | org.sourcewatch
640 | 19017264 | 3207 | 0.000011 | ch.qos
641 | 19017144 | 3785 | 0.000010 | jp.ac.kobe-u
642 | 19016500 | 1659 | 0.000022 | com.speakerdeck
643 | 19015604 | 2974 | 0.000012 | com.sciencealert
644 | 19015034 | 636 | 0.000045 | com.photobucket
645 | 19015026 | 3087 | 0.000012 | com.hsbc
646 | 19014412 | 3852 | 0.000009 | edu.uah
647 | 19013454 | 188 | 0.000133 | com.jimcdn
648 | 19012700 | 1195 | 0.000027 | com.rollingstone
649 | 19012558 | 2897 | 0.000013 | org.osce
650 | 19012150 | 4205 | 0.000009 | com.gust
651 | 19009874 | 1684 | 0.000021 | org.webkit
652 | 19009396 | 945 | 0.000034 | com.shutterstock
653 | 19008674 | 2945 | 0.000012 | com.townnews
654 | 19008318 | 2650 | 0.000014 | org.wri
655 | 19006196 | 520 | 0.000053 | com.inc
656 | 19004672 | 648 | 0.000044 | com.gartner
657 | 19004226 | 2502 | 0.000015 | ru.rg
658 | 19003942 | 2648 | 0.000014 | io.bower
659 | 19003780 | 3371 | 0.000011 | net.thedailystar
660 | 19003722 | 2564 | 0.000015 | net.dwcdn
661 | 19003574 | 2770 | 0.000013 | com.articulate
662 | 19003510 | 221 | 0.000117 | com.myshopify
663 | 19002868 | 172 | 0.000151 | jp.co.google
664 | 19002548 | 1354 | 0.000024 | gov.uspto
665 | 19000824 | 998 | 0.000032 | edu.ucla
666 | 18998632 | 685 | 0.000042 | com.investopedia
667 | 18998502 | 3161 | 0.000011 | com.mongabay
668 | 18997700 | 532 | 0.000052 | com.aol
669 | 18994210 | 2352 | 0.000016 | ca.citizenlab
670 | 18993878 | 1284 | 0.000025 | com.today
671 | 18992626 | 160 | 0.000167 | org.whatwg
672 | 18992320 | 1048 | 0.000030 | com.smartadserver
673 | 18992054 | 2995 | 0.000012 | org.pewforum
674 | 18991266 | 2997 | 0.000012 | org.sierraclub
675 | 18991062 | 2964 | 0.000012 | net.vnexpress
676 | 18990848 | 1069 | 0.000030 | com.about
677 | 18989274 | 3081 | 0.000012 | uk.co.spectator
678 | 18988482 | 480 | 0.000058 | com.dmca
679 | 18987042 | 1462 | 0.000023 | ly.cutt
680 | 18986890 | 3224 | 0.000011 | ru.interfax
681 | 18986546 | 3670 | 0.000010 | uk.co.zoopla
682 | 18985004 | 2843 | 0.000013 | org.iucnredlist
683 | 18984682 | 194 | 0.000130 | com.tripadvisor
684 | 18984256 | 3473 | 0.000010 | fm.audioboo
685 | 18983654 | 2762 | 0.000013 | uk.co.bbci
686 | 18983098 | 3416 | 0.000011 | edu.sjsu
687 | 18982290 | 1444 | 0.000023 | edu.northwestern
688 | 18982040 | 543 | 0.000051 | com.googleoptimize
689 | 18980160 | 2817 | 0.000013 | int.iom
690 | 18979598 | 1386 | 0.000024 | edu.umd
691 | 18978832 | 629 | 0.000045 | org.eff
692 | 18978224 | 2297 | 0.000017 | uk.org.ofcom
693 | 18978038 | 2450 | 0.000015 | int.reliefweb
694 | 18977708 | 3502 | 0.000010 | com.torontosun
695 | 18975348 | 493 | 0.000056 | com.indeed
696 | 18973238 | 1657 | 0.000022 | com.nngroup
697 | 18972898 | 351 | 0.000078 | com.constantcontact
698 | 18972774 | 1780 | 0.000020 | co.lpages
699 | 18972434 | 1329 | 0.000024 | edu.utexas
700 | 18971302 | 3386 | 0.000011 | com.iconarchive
701 | 18971240 | 312 | 0.000087 | com.pubmatic
702 | 18971078 | 1043 | 0.000030 | org.reactjs
703 | 18969428 | 975 | 0.000033 | edu.umich
704 | 18968428 | 1219 | 0.000026 | com.tableau
705 | 18968208 | 1914 | 0.000019 | com.hatenablog
706 | 18967574 | 1138 | 0.000028 | com.chicagotribune
707 | 18967196 | 3999 | 0.000009 | info.spain
708 | 18965740 | 547 | 0.000050 | gov.copyright
709 | 18965442 | 4171 | 0.000009 | org.gwtproject
710 | 18964656 | 650 | 0.000044 | com.netflix
711 | 18963714 | 753 | 0.000039 | net.adform
712 | 18961462 | 2847 | 0.000013 | uk.ac.jisc
713 | 18961172 | 2865 | 0.000013 | com.ringcentral
714 | 18960292 | 976 | 0.000033 | com.redhat
715 | 18960280 | 3237 | 0.000011 | com.city-data
716 | 18959938 | 2982 | 0.000012 | uk.org.stonewall
717 | 18958764 | 3646 | 0.000010 | za.co.timeslive
718 | 18957560 | 3989 | 0.000009 | com.programmableweb
719 | 18957204 | 403 | 0.000068 | com.bigcommerce
720 | 18957166 | 3053 | 0.000012 | com.flippa
721 | 18955710 | 3293 | 0.000011 | com.multiscreensite
722 | 18955294 | 2719 | 0.000014 | com.bloglines
723 | 18954560 | 2629 | 0.000014 | mp.j
724 | 18952520 | 2807 | 0.000013 | uk.org.rspb
725 | 18952358 | 2949 | 0.000012 | com.foreignaffairs
726 | 18952314 | 2206 | 0.000018 | co.pcdn
727 | 18951462 | 3549 | 0.000010 | in.theprint
728 | 18951006 | 4227 | 0.000009 | com.symbaloo
729 | 18950992 | 4478 | 0.000008 | com.algorithmia
730 | 18950356 | 1733 | 0.000021 | com.billboard
731 | 18949998 | 742 | 0.000039 | com.splashthat
732 | 18948332 | 3255 | 0.000011 | com.cleantechnica
733 | 18947926 | 3604 | 0.000010 | com.businessdailyafrica
734 | 18947204 | 1108 | 0.000028 | com.dell
735 | 18947190 | 2825 | 0.000013 | com.yell
736 | 18947008 | 443 | 0.000062 | net.hubspot
737 | 18946944 | 3838 | 0.000010 | org.rfa
738 | 18946618 | 3495 | 0.000010 | za.co.mg
739 | 18945038 | 4346 | 0.000008 | com.apsense
740 | 18944960 | 1772 | 0.000020 | com.alibabagroup
741 | 18944660 | 2267 | 0.000017 | to.dev
742 | 18944558 | 3471 | 0.000010 | ru.mid
743 | 18944274 | 3498 | 0.000010 | com.itsnicethat
744 | 18942634 | 527 | 0.000052 | org.unicef
745 | 18942358 | 2364 | 0.000016 | net.noscript
746 | 18940606 | 1390 | 0.000024 | com.techradar
747 | 18938348 | 1857 | 0.000019 | edu.uci
748 | 18937064 | 1118 | 0.000028 | com.windowsphone
749 | 18936626 | 2734 | 0.000014 | com.doubleclickbygoogle
750 | 18936326 | 3524 | 0.000010 | org.350
751 | 18935080 | 3076 | 0.000012 | org.aei
752 | 18934558 | 3075 | 0.000012 | gov.arts
753 | 18934248 | 671 | 0.000043 | gov.sec
754 | 18933682 | 2298 | 0.000017 | com.urbandictionary
755 | 18933596 | 3925 | 0.000009 | com.forbesimg
756 | 18933310 | 487 | 0.000056 | com.fc2
757 | 18931436 | 3352 | 0.000011 | com.brill
758 | 18931406 | 2491 | 0.000015 | com.infoworld
759 | 18930782 | 1352 | 0.000024 | com.bazaarvoice
760 | 18930350 | 3600 | 0.000010 | de.uni-konstanz
761 | 18930232 | 1187 | 0.000027 | com.alexa
762 | 18929982 | 2240 | 0.000017 | org.linuxfoundation
763 | 18929738 | 3580 | 0.000010 | edu.dukeupress
764 | 18929218 | 4038 | 0.000009 | com.hotfrog
765 | 18928884 | 512 | 0.000054 | com.mckinsey
766 | 18928710 | 2539 | 0.000015 | org.crossref
767 | 18928328 | 3893 | 0.000009 | com.environmentalleader
768 | 18927900 | 2236 | 0.000017 | tv.ustream
769 | 18927290 | 1101 | 0.000029 | fm.last
770 | 18926906 | 1951 | 0.000019 | com.businessweek
771 | 18926816 | 413 | 0.000067 | org.opensource
772 | 18925308 | 750 | 0.000039 | org.whatbrowser
773 | 18925012 | 1296 | 0.000025 | com.merriam-webster
774 | 18924400 | 425 | 0.000065 | com.proofpoint
775 | 18923302 | 3196 | 0.000011 | com.alchemer
776 | 18922680 | 3684 | 0.000010 | com.arfadia
777 | 18922120 | 1692 | 0.000021 | com.kinstacdn
778 | 18921334 | 3619 | 0.000010 | com.ecowatch
779 | 18921302 | 2215 | 0.000018 | net.leadpages
780 | 18920188 | 3448 | 0.000010 | com.total
781 | 18920112 | 3878 | 0.000009 | uk.org.npg
782 | 18919836 | 3054 | 0.000012 | io.crates
783 | 18919320 | 2517 | 0.000015 | com.lego
784 | 18919150 | 503 | 0.000055 | com.wufoo
785 | 18915478 | 2742 | 0.000014 | io.redis
786 | 18914956 | 2289 | 0.000017 | uk.co.metro
787 | 18914016 | 4161 | 0.000009 | uk.co.theweek
788 | 18913930 | 2204 | 0.000018 | gd.is
789 | 18913640 | 4196 | 0.000009 | io.coda
790 | 18913482 | 196 | 0.000128 | com.hackerone
791 | 18912094 | 1483 | 0.000023 | com.msdn
792 | 18911500 | 156 | 0.000170 | org.nginx
793 | 18911390 | 3197 | 0.000011 | com.klokantech
794 | 18911286 | 1668 | 0.000022 | com.sky
795 | 18910142 | 4272 | 0.000009 | de.fernuni-hagen
796 | 18909064 | 1832 | 0.000020 | de.hessen
797 | 18908834 | 953 | 0.000034 | com.adroll
798 | 18907960 | 1986 | 0.000018 | com.windows
799 | 18906254 | 4316 | 0.000008 | com.tupalo
800 | 18904242 | 218 | 0.000118 | org.icann
801 | 18904004 | 1148 | 0.000028 | net.atlassian
802 | 18903442 | 4234 | 0.000009 | net.ccm
803 | 18903372 | 3808 | 0.000010 | com.oilprice
804 | 18903128 | 1956 | 0.000019 | org.khanacademy
805 | 18902896 | 4072 | 0.000009 | net.iwpr
806 | 18902418 | 324 | 0.000084 | eu.youronlinechoices
807 | 18902318 | 3890 | 0.000009 | uk.ac.mmu
808 | 18901968 | 1700 | 0.000021 | edu.usc
809 | 18901674 | 1146 | 0.000028 | com.playstation
810 | 18900854 | 4268 | 0.000009 | uk.ac.ceh
811 | 18900236 | 1060 | 0.000030 | com.akamai
812 | 18897754 | 2851 | 0.000013 | com.hindustantimes
813 | 18896694 | 979 | 0.000033 | gov.fcc
814 | 18896368 | 990 | 0.000032 | com.gumroad
815 | 18896128 | 4383 | 0.000008 | et.com.google
816 | 18894652 | 3952 | 0.000009 | com.theoutline
817 | 18894586 | 4348 | 0.000008 | org.cgsociety
818 | 18892502 | 4093 | 0.000009 | edu.mtsu
819 | 18892446 | 2636 | 0.000014 | com.html5rocks
820 | 18892222 | 4578 | 0.000008 | com.blockchair
821 | 18891572 | 3603 | 0.000010 | org.spie
822 | 18891278 | 1208 | 0.000026 | at.gv.bka
823 | 18890814 | 3768 | 0.000010 | uk.co.lrb
824 | 18888196 | 410 | 0.000067 | com.heroku
825 | 18888062 | 815 | 0.000036 | edu.wisc
826 | 18887938 | 1009 | 0.000031 | com.yoast
827 | 18887842 | 3807 | 0.000010 | za.co.dailymaverick
828 | 18885614 | 1864 | 0.000019 | org.json
829 | 18885194 | 2969 | 0.000012 | org.thinkprogress
830 | 18884928 | 700 | 0.000041 | com.feedly
831 | 18883562 | 4278 | 0.000009 | com.ingress
832 | 18883012 | 3706 | 0.000010 | google.design
833 | 18882952 | 4666 | 0.000008 | com.bmwblog
834 | 18881526 | 3862 | 0.000009 | com.thepetitionsite
835 | 18881312 | 3993 | 0.000009 | in.bbc
836 | 18880860 | 2237 | 0.000017 | com.w3techs
837 | 18880532 | 3257 | 0.000011 | org.carbonbrief
838 | 18880428 | 272 | 0.000098 | jp.ne.hatena
839 | 18880124 | 2614 | 0.000014 | ru.mk
840 | 18880064 | 1924 | 0.000019 | edu.hbs
841 | 18879486 | 935 | 0.000034 | com.pingdom
842 | 18878780 | 1199 | 0.000026 | com.ycombinator
843 | 18876422 | 4065 | 0.000009 | com.gifer
844 | 18876050 | 3709 | 0.000010 | uk.org.amnesty
845 | 18875700 | 3961 | 0.000009 | com.africanews
846 | 18875478 | 4535 | 0.000008 | com.the-dots
847 | 18875350 | 992 | 0.000032 | so.notion
848 | 18869944 | 3375 | 0.000011 | org.commondreams
849 | 18869250 | 4236 | 0.000009 | com.flutterwave
850 | 18868302 | 3731 | 0.000010 | org.refworld
851 | 18866446 | 3285 | 0.000011 | uk.gov.charitycommission
852 | 18866244 | 4396 | 0.000008 | com.newsru
853 | 18865810 | 3507 | 0.000010 | uk.org.oxfam
854 | 18865786 | 4206 | 0.000009 | uk.org.somersethouse
855 | 18864590 | 2484 | 0.000015 | in.scroll
856 | 18864514 | 1410 | 0.000023 | com.intuit
857 | 18864428 | 4257 | 0.000009 | uk.co.harpercollins
858 | 18863942 | 331 | 0.000084 | jp.ameblo
859 | 18863622 | 3370 | 0.000011 | ke.co.nation
860 | 18863580 | 1091 | 0.000029 | com.insurancejournal
861 | 18863360 | 3394 | 0.000011 | com.cbsistatic
862 | 18863302 | 2697 | 0.000014 | com.spreaker
863 | 18862358 | 2720 | 0.000014 | com.springernature
864 | 18861824 | 2228 | 0.000017 | com.firefox
865 | 18861222 | 4688 | 0.000008 | co.iglobal
866 | 18860960 | 4714 | 0.000008 | io.devdocs
867 | 18860246 | 2705 | 0.000014 | com.verywellhealth
868 | 18860020 | 538 | 0.000051 | com.booking
869 | 18859838 | 535 | 0.000051 | com.gofundme
870 | 18859798 | 1234 | 0.000026 | com.indiegogo
871 | 18859328 | 4781 | 0.000008 | com.kdpcommunity
872 | 18858404 | 2344 | 0.000016 | build.bazel
873 | 18858016 | 1119 | 0.000028 | com.foursquare
874 | 18857452 | 545 | 0.000051 | com.snapchat
875 | 18856948 | 93 | 0.000390 | com.trustpilot
876 | 18856546 | 2586 | 0.000014 | com.avast
877 | 18856160 | 1813 | 0.000020 | com.pcworld
878 | 18855278 | 4149 | 0.000009 | com.hybris
879 | 18854982 | 4937 | 0.000008 | com.jetphotos
880 | 18854014 | 711 | 0.000041 | com.yandex
881 | 18853962 | 1023 | 0.000031 | com.css-tricks
882 | 18853902 | 1360 | 0.000024 | org.golang
883 | 18853900 | 4168 | 0.000009 | uk.ac.mdx
884 | 18853596 | 2673 | 0.000014 | com.flipboard
885 | 18852988 | 2848 | 0.000013 | com.discovery
886 | 18851822 | 3904 | 0.000009 | at.kleinezeitung
887 | 18851422 | 302 | 0.000090 | de.amazon
888 | 18850596 | 111 | 0.000276 | me.wa
889 | 18850558 | 429 | 0.000064 | com.skype
890 | 18850294 | 1134 | 0.000028 | com.scientificamerican
891 | 18848646 | 2516 | 0.000015 | org.raspberrypi
892 | 18847770 | 4426 | 0.000008 | com.armorgames
893 | 18847596 | 1758 | 0.000020 | com.fiverr
894 | 18847110 | 721 | 0.000040 | org.iso
895 | 18846464 | 2707 | 0.000014 | com.codecademy
896 | 18843948 | 3727 | 0.000010 | net.middleeasteye
897 | 18842876 | 3057 | 0.000012 | org.man7
898 | 18841074 | 4531 | 0.000008 | com.e-estonia
899 | 18840474 | 1827 | 0.000020 | fr.blogspot
900 | 18840282 | 1047 | 0.000030 | com.huffpost
901 | 18839944 | 4725 | 0.000008 | net.gebco
902 | 18839822 | 4778 | 0.000008 | com.slite
903 | 18839776 | 1799 | 0.000020 | com.visa
904 | 18839258 | 722 | 0.000040 | com.newrelic
905 | 18837214 | 4163 | 0.000009 | com.cnsnews
906 | 18836004 | 807 | 0.000037 | br.com.uol
907 | 18834494 | 3388 | 0.000011 | com.lithub
908 | 18834036 | 3942 | 0.000009 | net.bostonreview
909 | 18832052 | 1022 | 0.000031 | au.com.google
910 | 18831616 | 2958 | 0.000012 | com.hackernoon
911 | 18831490 | 2802 | 0.000013 | com.unity
912 | 18831230 | 249 | 0.000106 | net.2mdn
913 | 18831090 | 1224 | 0.000026 | gov.usgs
914 | 18830714 | 400 | 0.000068 | com.semrush
915 | 18829732 | 3855 | 0.000009 | com.indexmundi
916 | 18829642 | 516 | 0.000053 | com.dailymotion
917 | 18828872 | 686 | 0.000042 | com.accenture
918 | 18827314 | 665 | 0.000043 | org.poynter
919 | 18826574 | 2086 | 0.000018 | org.aclu
920 | 18826336 | 3974 | 0.000009 | org.jython
921 | 18825792 | 1152 | 0.000027 | com.searchengineland
922 | 18825212 | 4003 | 0.000009 | com.inthesetimes
923 | 18824910 | 1753 | 0.000020 | com.over-blog
924 | 18824406 | 518 | 0.000053 | nl.google
925 | 18824128 | 3324 | 0.000011 | de.bfarm
926 | 18823544 | 1126 | 0.000028 | com.techtarget
927 | 18823106 | 4417 | 0.000008 | za.co.ewn
928 | 18822694 | 4251 | 0.000009 | uk.co.bristolpost
929 | 18822100 | 4799 | 0.000008 | community.studiopress
930 | 18821800 | 846 | 0.000036 | gov.justice
931 | 18820682 | 1707 | 0.000021 | com.technologyreview
932 | 18820672 | 3923 | 0.000009 | com.recyclenow
933 | 18819750 | 4293 | 0.000009 | lb.com.dailystar
934 | 18819328 | 384 | 0.000071 | com.bitly
935 | 18819088 | 4101 | 0.000009 | org.occrp
936 | 18819058 | 3786 | 0.000010 | com.theyworkforyou
937 | 18818798 | 4022 | 0.000009 | org.ifaw
938 | 18818648 | 168 | 0.000161 | com.jimstatic
939 | 18818242 | 966 | 0.000033 | sh.brew
940 | 18818066 | 4343 | 0.000008 | com.yahoosites
941 | 18817604 | 2321 | 0.000016 | com.fool
942 | 18817360 | 2281 | 0.000017 | com.pastebin
943 | 18816928 | 3607 | 0.000010 | com.gr-assets
944 | 18815242 | 3872 | 0.000009 | com.climatechangenews
945 | 18813886 | 4771 | 0.000008 | in.ac.iith
946 | 18813586 | 702 | 0.000041 | org.plos
947 | 18812812 | 4068 | 0.000009 | com.chamberofcommerce
948 | 18812606 | 4438 | 0.000008 | us.tuugo
949 | 18812344 | 1142 | 0.000028 | com.buzzsprout
950 | 18812110 | 1122 | 0.000028 | com.timeanddate
951 | 18811418 | 760 | 0.000038 | com.discordapp
952 | 18811082 | 1153 | 0.000027 | com.sitepoint
953 | 18809798 | 4579 | 0.000008 | com.desmogblog
954 | 18809638 | 596 | 0.000047 | com.aliexpress
955 | 18808906 | 2522 | 0.000015 | com.sendgrid
956 | 18807972 | 4128 | 0.000009 | uk.ac.rcplondon
957 | 18807934 | 2004 | 0.000018 | com.ssllabs
958 | 18807738 | 3955 | 0.000009 | org.soilassociation
959 | 18806592 | 1359 | 0.000024 | com.xkcd
960 | 18806192 | 536 | 0.000051 | gov.hhs
961 | 18805588 | 3584 | 0.000010 | com.hearstapps
962 | 18805150 | 1289 | 0.000025 | com.searchenginejournal
963 | 18804458 | 442 | 0.000062 | me.fb
964 | 18804132 | 4711 | 0.000008 | net.sott
965 | 18803628 | 4594 | 0.000008 | com.gpsvisualizer
966 | 18803548 | 213 | 0.000120 | com.discord
967 | 18802758 | 1820 | 0.000020 | org.mitre
968 | 18801916 | 4744 | 0.000008 | com.natureindex
969 | 18801374 | 3499 | 0.000010 | uk.org.rspca
970 | 18801322 | 4087 | 0.000009 | org.c2es
971 | 18801296 | 3824 | 0.000010 | com.qgiv
972 | 18801288 | 3799 | 0.000010 | ug.co.monitor
973 | 18800492 | 4596 | 0.000008 | com.lacartes
974 | 18800384 | 178 | 0.000142 | com.xing
975 | 18799966 | 3973 | 0.000009 | com.svbtle
976 | 18798944 | 3875 | 0.000009 | uk.org.savethechildren
977 | 18798932 | 4616 | 0.000008 | com.slurl
978 | 18798332 | 2466 | 0.000015 | com.sophos
979 | 18797956 | 2198 | 0.000018 | com.twilio
980 | 18797008 | 4662 | 0.000008 | za.co.moneyweb
981 | 18796288 | 4265 | 0.000009 | com.menafn
982 | 18796048 | 3821 | 0.000010 | org.usip
983 | 18795572 | 4584 | 0.000008 | com.power-technology
984 | 18795368 | 4601 | 0.000008 | org.heartland
985 | 18794834 | 675 | 0.000042 | com.usnews
986 | 18794200 | 2557 | 0.000015 | org.usenix
987 | 18793962 | 2899 | 0.000013 | net.privacypolicytemplate
988 | 18793900 | 4081 | 0.000009 | org.theecologist
989 | 18792970 | 4132 | 0.000009 | org.neweconomics
990 | 18792680 | 1290 | 0.000025 | com.netlify
991 | 18791954 | 4189 | 0.000009 | com.businessgreen
992 | 18790752 | 4611 | 0.000008 | org.monthlyreview
993 | 18790450 | 2369 | 0.000016 | uk.ac.ed
994 | 18790194 | 2245 | 0.000017 | ch.ethz
995 | 18789974 | 267 | 0.000099 | com.nielsen
996 | 18789302 | 2674 | 0.000014 | ca.uwaterloo
997 | 18788810 | 4284 | 0.000009 | org.unep-wcmc
998 | 18787702 | 4042 | 0.000009 | org.ramsar
999 | 18786952 | 4384 | 0.000008 | com.googlelabs
1000 | 18786808 | 4634 | 0.000008 | org.berkeleyearth

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data will be useful for any kind of research on ranking, graph analysis, link spam detection, etc. Let us know about your results via Common Crawl’s Google Group!

Host- and Domain-Level Web Graphs February/March, April and May 2021

We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of February/March, April and May 2021. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases. You may also visit the projects cc-webgraph and cc-pyspark which include all scripts and tools required to construct the graphs. Instructions to explore the graphs in the webgraph format are given in our collection of webgraph notebooks.

What’s new?

The host-level graph now includes all hosts visited by the crawler, even if no link points to the host and all visited URLs of the host failed (HTTP 404 and other error codes) or the host’s robots.txt does not allow crawling. Note that the links leading to these hosts may have been found in a prior crawl, not in one of the three crawls used to build this web graph.

Host-level graph

The graph consists of 515 million nodes and 2.82 billion edges. Hyperlinks, HTTP redirects and link headers are all used as edges to span the graph. All types of links are included, even purely “technical” ones pointing to images, JavaScript libraries, web fonts, etc. However, only host names with a valid IANA TLD are used. Consequently, URLs with an IP address as host component are not taken into account for building the host-level graph.

There are 452 million dangling nodes (87.9%) and the largest strongly connected component contains 45.2 million (8.8%) nodes. Dangling nodes stem from

  • hosts that have not been crawled, yet are pointed to from a link on a crawled page
  • hosts without any links pointing to a different host name
  • or hosts which returned only an error page (e.g. HTTP 404)
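
A dangling node is a node without any outgoing edge. The following is a minimal sketch on a toy edge list, assuming node ids and edges fit in memory; the function name `dangling_nodes` and the sample data are made up for illustration and are not part of cc-webgraph.

```python
def dangling_nodes(node_ids, edges):
    """Return the set of nodes that never appear as the source of an edge."""
    has_outlink = {src for src, _dst in edges}
    return set(node_ids) - has_outlink


# Toy graph (made-up ids): node 2 is only linked to, node 3 is isolated.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 0), (1, 2)]

dangling = dangling_nodes(nodes, edges)
share = len(dangling) / len(nodes)
print(sorted(dangling), f"{share:.1%}")
```

For the real graphs, the WebGraph statistics files listed in the download tables are the more practical source for such figures.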

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
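
The conversion can be sketched in a few lines of Python. This is an illustration only: the helper name `reverse_host` is hypothetical, and the actual pipeline may apply further normalization that is not shown here.

```python
def reverse_host(host: str) -> str:
    """Strip a leading 'www.' and reverse the order of the dot-separated labels."""
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


print(reverse_host("www.subdomain.example.com"))
```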

You can download the graph and the ranks of all 515 million hosts from AWS S3 on the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-feb-apr-may/host/. Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-feb-apr-may/host/ as a prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 72 gzip-compressed files listed in two path listings – one for the nodes (vertices), one for the edges (arcs). First, download the paths listing and decompress it using “gzip”. By adding the prefix s3://commoncrawl/ or https://data.commoncrawl.org/ to each line in the path listing you get the list of URLs to download the entire graph.
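
The two steps (decompress the listing, then prefix every line) could look as follows in Python. This is a sketch under assumptions: the function name `listing_to_urls` is hypothetical, and the sample listing is built in memory from made-up paths so that the example runs without network access.

```python
import gzip


def listing_to_urls(gz_bytes, prefix="https://data.commoncrawl.org/"):
    """Decompress a gzip-compressed path listing and prefix each non-empty line."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    return [prefix + line.strip() for line in text.splitlines() if line.strip()]


# Stand-in for a downloaded *.paths.gz file; these paths are invented.
sample = gzip.compress(
    b"projects/example/vertices/part-00000.txt.gz\n"
    b"projects/example/vertices/part-00001.txt.gz\n"
)

urls = listing_to_urls(sample)
for url in urls:
    print(url)
```

Passing `prefix="s3://commoncrawl/"` instead yields the corresponding S3 paths.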

Download files of the Common Crawl Feb/Apr/May 2021 host-level webgraph

Size | File | Description
3.31 GB | cc-main-2021-feb-apr-may-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 24 vertices files
12.94 GB | cc-main-2021-feb-apr-may-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 48 edges files
5.57 GB | cc-main-2021-feb-apr-may-host.graph | graph in BVGraph format
2 kB | cc-main-2021-feb-apr-may-host.properties |
6.22 GB | cc-main-2021-feb-apr-may-host-t.graph | transpose of the graph (outlinks inverted to inlinks)
2 kB | cc-main-2021-feb-apr-may-host-t.properties |
1 kB | cc-main-2021-feb-apr-may-host.stats | WebGraph statistics
7.69 GB | cc-main-2021-feb-apr-may-host-ranks.txt.gz | harmonic centrality and pagerank

Domain-level graph

The domain graph is built by aggregating the host graph on the level of pay-level domains (PLDs) based on the public suffix list maintained on publicsuffix.org.
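
The aggregation idea can be illustrated with a simplified sketch. It is not the cc-webgraph implementation: the function names are hypothetical, the suffix set is a tiny hand-written stand-in for the public suffix list (the real list also has wildcard and exception rules, which are ignored here), and dropping links within the same domain is an assumption made for this example.

```python
def pay_level_domain(rev_host, public_suffixes):
    """Map a reversed host name to its reversed pay-level domain:
    the longest matching public suffix plus one more label."""
    labels = rev_host.split(".")
    # Try the longest candidate suffix first.
    for n in range(len(labels) - 1, 0, -1):
        if ".".join(labels[:n]) in public_suffixes:
            return ".".join(labels[:n + 1])
    return rev_host  # no known suffix: leave the name unchanged


def aggregate_edges(host_edges, public_suffixes):
    """Collapse host-level edges into a set of domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        s = pay_level_domain(src, public_suffixes)
        d = pay_level_domain(dst, public_suffixes)
        if s != d:  # assumption: links within one domain are dropped
            domain_edges.add((s, d))
    return domain_edges


# Toy suffix set in reversed notation ("uk.co" stands for co.uk).
suffixes = {"com", "uk", "uk.co"}
host_edges = [
    ("com.example.blog", "com.example.shop"),
    ("com.example.blog", "uk.co.bbc.news"),
    ("uk.co.bbc.news", "com.example"),
]
domain_edges = aggregate_edges(host_edges, suffixes)
print(sorted(domain_edges))
```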

The domain-level graph has 88 million nodes and 1.58 billion edges. 44 million nodes (50%) are dangling nodes, and the largest strongly connected component covers 34 million nodes (39%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2021-feb-apr-may/domain/ and via https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2021-feb-apr-may/domain/.

Download files of the Common Crawl Feb/Apr/May 2021 domain-level webgraph

Size | File | Description
0.61 GB | cc-main-2021-feb-apr-may-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩
6.37 GB | cc-main-2021-feb-apr-may-domain-edges.txt.gz | edges ⟨from_id, to_id⟩
3.58 GB | cc-main-2021-feb-apr-may-domain.graph | graph in BVGraph format
2 kB | cc-main-2021-feb-apr-may-domain.properties |
3.42 GB | cc-main-2021-feb-apr-may-domain-t.graph | transpose of the graph
2 kB | cc-main-2021-feb-apr-may-domain-t.properties |
1 kB | cc-main-2021-feb-apr-may-domain.stats | WebGraph statistics
1.89 GB | cc-main-2021-feb-apr-may-domain-ranks.txt.gz | harmonic centrality and pagerank

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 88 million domain ranks is available for download.

Top 1000 domains ranked by harmonic centrality (Feb/Apr/May 2021)

harmonic centrality rank | hc value | page rank | page rank value | reversed domain name
1 | 31920934 | 1 | 0.017627 | com.googleapis
2 | 31032784 | 3 | 0.013762 | com.facebook
3 | 29681304 | 2 | 0.013832 | com.google
4 | 27101692 | 4 | 0.007844 | com.twitter
5 | 26954660 | 5 | 0.007519 | org.w
6 | 26886624 | 7 | 0.006967 | com.youtube
7 | 25515850 | 8 | 0.005718 | com.instagram
8 | 25031490 | 6 | 0.007143 | com.googletagmanager
9 | 24396116 | 9 | 0.005506 | org.gmpg
10 | 23807122 | 12 | 0.003347 | com.linkedin
11 | 22970992 | 13 | 0.003048 | com.gstatic
12 | 22854052 | 10 | 0.003951 | com.cloudflare
13 | 22698594 | 19 | 0.001914 | com.gravatar
14 | 22504168 | 14 | 0.002908 | org.wordpress
15 | 22434542 | 22 | 0.001564 | com.pinterest
16 | 22100870 | 25 | 0.001270 | org.wikipedia
17 | 21950578 | 17 | 0.002031 | com.wordpress
18 | 21940826 | 18 | 0.001958 | com.apple
19 | 21766696 | 15 | 0.002258 | com.bootstrapcdn
20 | 21762964 | 30 | 0.001174 | com.vimeo
21 | 21722198 | 38 | 0.000914 | be.youtu
22 | 21556142 | 21 | 0.001842 | com.jquery
23 | 21478118 | 29 | 0.001182 | com.microsoft
24 | 21432212 | 53 | 0.000703 | com.blogspot
25 | 21354260 | 35 | 0.001025 | com.amazonaws
26 | 21337432 | 44 | 0.000765 | com.amazon
27 | 21320702 | 43 | 0.000789 | gl.goo
28 | 21170722 | 62 | 0.000600 | ly.bit
29 | 21149628 | 99 | 0.000409 | com.tumblr
30 | 21148242 | 50 | 0.000739 | com.wp
31 | 21136818 | 45 | 0.000758 | org.mozilla
32 | 21110018 | 57 | 0.000689 | eu.europa
33 | 21104262 | 20 | 0.001894 | com.adobe
34 | 21048760 | 16 | 0.002200 | com.github
35 | 21040284 | 34 | 0.001026 | com.google-analytics
36 | 21027350 | 36 | 0.001015 | net.jsdelivr
37 | 20998320 | 27 | 0.001218 | com.wixstatic
38 | 20995232 | 31 | 0.001119 | net.cloudfront
39 | 20946148 | 47 | 0.000744 | com.flickr
40 | 20913104 | 107 | 0.000338 | com.yahoo
41 | 20851316 | 83 | 0.000436 | com.googleusercontent
42 | 20843068 | 37 | 0.000929 | io.github
43 | 20840670 | 111 | 0.000317 | com.reddit
44 | 20834398 | 58 | 0.000677 | com.paypal
45 | 20816886 | 23 | 0.001554 | com.fontawesome
46 | 20773582 | 103 | 0.000368 | com.weebly
47 | 20764576 | 79 | 0.000455 | com.medium
48 | 20764512 | 33 | 0.001035 | com.googlesyndication
49 | 20757582 | 32 | 0.001118 | ru.yandex
50 | 20741944 | 48 | 0.000743 | com.whatsapp
51 | 20708152 | 68 | 0.000520 | org.w3
52 | 20705826 | 132 | 0.000240 | com.nytimes
53 | 20696906 | 59 | 0.000673 | co.t
54 | 20678088 | 102 | 0.000375 | org.creativecommons
55 | 20675822 | 115 | 0.000290 | com.soundcloud
56 | 20644978 | 60 | 0.000624 | org.schema
57 | 20627114 | 74 | 0.000479 | com.shopify
58 | 20621162 | 66 | 0.000543 | com.vk
59 | 20604726 | 181 | 0.000149 | org.wikimedia
60 | 20604724 | 147 | 0.000204 | com.dropbox
61 | 20579720 | 55 | 0.000702 | com.addthis
62 | 20572950 | 138 | 0.000211 | org.archive
63 | 20570610 | 198 | 0.000133 | com.cnn
64 | 20558114 | 152 | 0.000187 | gov.cdc
65 | 20550306 | 80 | 0.000446 | me.wp
66 | 20538816 | 193 | 0.000136 | com.imgur
67 | 20530078 | 49 | 0.000740 | net.doubleclick
68 | 20512294 | 199 | 0.000133 | uk.co.bbc
69 | 20505964 | 200 | 0.000133 | net.slideshare
70 | 20499864 | 171 | 0.000155 | com.theguardian
71 | 20489756 | 158 | 0.000175 | int.who
72 | 20482256 | 120 | 0.000263 | com.spotify
73 | 20481118 | 175 | 0.000151 | com.bing
74 | 20478320 | 213 | 0.000124 | com.businessinsider
75 | 20477478 | 253 | 0.000104 | com.bloomberg
76 | 20477300 | 144 | 0.000206 | gov.nih
77 | 20473648 | 46 | 0.000748 | com.macromedia
78 | 20440520 | 254 | 0.000103 | com.wsj
79 | 20434320 | 224 | 0.000118 | edu.stanford
80 | 20419762 | 41 | 0.000847 | net.fbcdn
81 | 20417930 | 39 | 0.000885 | org.apache
82 | 20409636 | 157 | 0.000175 | org.ietf
83 | 20397792 | 90 | 0.000420 | com.list-manage
84 | 20395594 | 368 | 0.000071 | com.googleblog
85 | 20395350 | 217 | 0.000123 | com.stackoverflow
86 | 20393172 | 170 | 0.000155 | com.giphy
87 | 20391226 | 314 | 0.000085 | edu.mit
88 | 20381948 | 223 | 0.000118 | com.washingtonpost
89 | 20372602 | 134 | 0.000232 | com.ytimg
90 | 20363592 | 362 | 0.000073 | com.appspot
91 | 20360236 | 351 | 0.000076 | com.theverge
92 | 20359610 | 286 | 0.000093 | com.bbc
93 | 20358870 | 396 | 0.000067 | uk.co.telegraph
94 | 20356036 | 499 | 0.000056 | edu.berkeley
95 | 20348048 | 266 | 0.000101 | edu.harvard
96 | 20346012 | 330 | 0.000080 | com.go
97 | 20341676 | 237 | 0.000112 | com.office
98 | 20338710 | 145 | 0.000206 | us.zoom
99 | 20335782 | 247 | 0.000109 | com.android
100 | 20335366 | 327 | 0.000082 | com.wired
101 | 20334160 | 288 | 0.000092 | com.techcrunch
102 | 20331782 | 238 | 0.000111 | com.oracle
103 | 20323638 | 547 | 0.000051 | com.livejournal
104 | 20296670 | 164 | 0.000170 | com.issuu
105 | 20295840 | 296 | 0.000090 | com.cnbc
106 | 20292146 | 211 | 0.000124 | gov.ca
107 | 20291754 | 402 | 0.000066 | com.ted
108 | 20288380 | 379 | 0.000069 | gov.nasa
109 | 20283426 | 149 | 0.000195 | com.forbes
110 | 20283050 | 148 | 0.000199 | com.wixsite
111 | 20282972 | 151 | 0.000192 | com.npmjs
112 | 20282524 | 518 | 0.000054 | com.zdnet
113 | 20279656 | 447 | 0.000062 | com.msn
114 | 20277752 | 292 | 0.000091 | com.reuters
115 | 20275540 | 350 | 0.000076 | com.nature
116 | 20273474 | 78 | 0.000459 | com.godaddy
117 | 20271718 | 371 | 0.000070 | com.myspace
118 | 20270494 | 222 | 0.000119 | com.etsy
119 | 20268832 | 321 | 0.000084 | com.prnewswire
120 | 20255726 | 209 | 0.000125 | org.ampproject
121 | 20252386 | 407 | 0.000065 | org.arxiv
122 | 20252292 | 312 | 0.000085 | org.npr
123 | 20252218 | 263 | 0.000101 | com.sciencedirect
124 | 20248804 | 98 | 0.000410 | com.unpkg
125 | 20246402 | 265 | 0.000101 | com.example
126 | 20245616 | 67 | 0.000524 | net.akamaihd
127 | 20237056 | 215 | 0.000123 | com.eventbrite
128 | 20234532 | 367 | 0.000072 | org.hbr
129 | 20232338 | 176 | 0.000151 | com.blogger
130 | 20231658 | 127 | 0.000247 | org.networkadvertising
131 | 20231552 | 399 | 0.000066 | com.latimes
132 | 20228690 | 268 | 0.000101 | org.acm
133202232423380.000079com.statista
134202094343890.000068com.fastcompany
135202058486600.000043com.economist
136202024823430.000078com.time
137202024522260.000117com.twimg
138202019026790.000042edu.upenn
139202015305500.000050edu.yale
140202008422580.000102com.githubusercontent
141201912724740.000060com.steampowered
142201898241430.000206com.opera
143201886204440.000062uk.co.dailymail
144201884863530.000076com.springer
145201868065760.000047com.scribd
146201847847800.000041edu.columbia
147201801005350.000052org.chromium
148201758765910.000046me.about
149201757326040.000046google.blog
150201752842850.000094com.squarespace
151201740503350.000079com.huffingtonpost
152201713564310.000063com.nationalgeographic
153201687882210.000119uk.co.google
154201653722080.000125com.unsplash
155201635803880.000068com.w3schools
156201589563390.000079com.dribbble
157201547863400.000079com.tiktok
158201533562930.000091org.un
159201379247940.000040com.qz
160201338142480.000108com.bandcamp
161201295984850.000058edu.cornell
162201259548210.000039edu.umich
163201211201190.000267com.ft
164201153424350.000063com.theatlantic
165201110289660.000033edu.princeton
166201108083410.000078com.usatoday
167201055567860.000040com.evernote
168201054821330.000235info.aboutads
169201048104080.000065com.meetup
170201026384380.000062com.goodreads
171201008946250.000045org.ieee
172200989728780.000036com.slate
173200978706770.000042com.mysql
174200976564530.000061com.patreon
175200975301370.000216me.t
176200956005150.000055com.cbsnews
177200842046560.000043com.docker
178200833362910.000092com.wiley
179200825204800.000059gov.usda
180200806644540.000061com.dailymotion
181200788188170.000039edu.washington
182200771604930.000057com.withgoogle
183200750645230.000054io.readthedocs
184200710146440.000044com.marketwatch
185200650106500.000043uk.co.blogspot
186200627348680.000037com.shutterstock
18720062652540.000703com.fb
188200596644970.000056uk.co.independent
18920056344760.000467com.wix
190200559328110.000039org.cambridge
191200518445590.000049com.pexels
192200485767790.000041org.sciencemag
193200480045920.000046com.buzzfeed
194200442488190.000039com.stackexchange
195200434661790.000149ru.mail
196200434468440.000038com.webs
197200430745730.000048com.git-scm
198200402084640.000060com.inc
199200373542720.000100net.behance
200200297444250.000063gov.whitehouse
201200253428320.000038com.apnews
202200235187690.000041com.vox
2032002203013650.000024uk.co.thesun
204200185482740.000098com.outlook
205200183187720.000041org.bitbucket
20620017276400.000871com.qq
207200148722440.000110org.doi
208200120828120.000039uk.ac.cam
209200119982550.000103com.disqus
210200073122360.000112com.feedburner
211200056306700.000043org.worldbank
212200012305840.000047org.unicef
213200009324190.000064com.mozilla
214199997405930.000046co.ibb
21519999080260.001261io.polyfill
216199979285250.000054com.booking
21719993488420.000808com.baidu
218199897842600.000101com.cloudinary
219199858562890.000092com.tinyurl
220199839803450.000077com.ibm
2211998302211630.000027com.speakerdeck
222199825065970.000046gov.noaa
223199782066120.000045ee.linktr
224199773105690.000048com.psychologytoday
225199737105310.000053gov.loc
226199729204000.000066com.getpocket
2271997276010410.000031edu.utexas
228199717943200.000084org.pewresearch
2291997131013660.000024edu.rutgers
230199708945510.000050com.sagepub
231199702003090.000087com.nbcnews
2321996796211340.000028org.eclipse
233199655866480.000043com.trello
234199642803260.000082net.windows
235199641943840.000068com.quora
236199614306000.000046net.azurewebsites
237199599102750.000098gov.ftc
2381995593810570.000030edu.uchicago
239199533083110.000086com.netdna-ssl
240199519607820.000041org.semver
241199512861240.000252com.mailchimp
242199502944360.000063com.nypost
2431994929611950.000027com.hatenablog
244199471426520.000043com.newyorker
245199439389850.000033uk.co.guardian
246199435645900.000046com.usnews
247199404982200.000119tv.twitch
248199397387840.000041au.net.abc
249199388201660.000167com.amazon-adsystem
2501993630812780.000025com.vogue
251199354662300.000113com.wpengine
252199340981060.000338com.stripe
2531993326612610.000025org.kernel
254199297389410.000034com.politico
2551992641611930.000027org.unicode
256199256025800.000047org.eff
257199251745410.000051br.com.uol
258199248068520.000037com.about
2591992364413580.000024edu.hbs
260199236009540.000034com.dropboxusercontent
261199234649110.000035edu.jhu
262199220629930.000032co.elastic
263199218889130.000035com.steamcommunity
2641992015019710.000018com.googlesource
265199197605220.000054com.tandfonline
266199180102770.000097com.criteo
267199157085520.000050org.pbs
2681991298611060.000029edu.umd
26919912224640.000549co.g
270199083408650.000037com.foxnews
271199074561230.000261com.sharethis
2721990417810270.000031com.rollingstone
273199030822280.000115com.imdb
274199027749770.000033com.scientificamerican
2751990194013920.000023com.urbandictionary
276199008767750.000041uk.ac.ox
277199004063910.000067com.arcgis
2781989852020160.000018com.lego
279198984202510.000107page.g
280198983186310.000044gov.census
281198900565300.000053com.oup
282198879683460.000077com.optimizely
283198874245820.000047com.indiatimes
284198871943760.000069com.cnet
285198840244220.000064com.wufoo
286198829307040.000042uk.co.eventbrite
287198828064210.000064com.bigcommerce
2881988030613500.000024ca.blogspot
289198790168330.000038org.fao
290198787329080.000035com.jetbrains
2911987104414670.000022ca.ubc
2921986765019380.000018com.warnerbros
293198660124460.000062org.d3js
294198655189460.000034org.greenpeace
295198646322060.000127net.sourceforge
296198634503230.000083fr.google
2971986291612790.000025com.history
298198618068510.000038com.gumroad
299198617509190.000035com.chicagotribune
300198598446360.000044gov.archives
301198589022840.000095com.googlecode
302198535023420.000078com.slack
303198519322290.000114com.eepurl
304198456261140.000292com.paypalobjects
305198417029270.000035com.sap
306198398301530.000180com.addtoany
307198374662900.000092com.typepad
3081983408215620.000021de.mpg
309198300546640.000043com.pinimg
310198281482820.000095com.calendly
311198275304910.000057gov.epa
312198257563540.000076com.proofpoint
3131982112814300.000023ch.ethz
3141982109410280.000031com.500px
3151982055417320.000019com.diigo
316198203983340.000079com.live
3171982003412770.000025org.postgresql
3181981854412570.000025org.wiktionary
3191981791012740.000025org.aclu
320198176989810.000033edu.si
3211981658613940.000023edu.msu
3221981621010290.000031com.thehill
323198149368900.000036de.spiegel
324198131729160.000035com.huffpost
325198112824720.000060gov.hhs
3261980924011140.000028com.scmp
32719806650730.000484me.fb
328198063067640.000042org.change
329198050703780.000069com.sohu
3301980433613290.000024edu.illinois
331198041641850.000147com.xing
3321980119213230.000024org.tensorflow
3331980108610080.000032com.ssrn
334198001841620.000171com.zendesk
335197984289040.000035com.netlify
336197972945080.000056com.squareup
3371979702013520.000024com.sky
338197944001960.000134org.iana
3391979271410780.000029uk.co.thetimes
340197924948470.000038gov.congress
341197887048090.000039org.pypi
3421978387814220.000023cn.com.chinadaily
343197811429720.000033edu.academia
344197809744560.000061com.kickstarter
345197800848020.000040gov.senate
3461977912824150.000015org.pydata
3471977812411400.000027org.semanticscholar
348197757166200.000045site.business
3491977501212750.000025com.over-blog
350197748667920.000040org.oecd
3511977484616600.000020org.phys
352197743349990.000032com.yarnpkg
353197722488160.000039com.deviantart
3541977093610840.000029uk.co.mirror
355197705221870.000145com.rawgit
3561977011413150.000024com.axios
357197697006230.000045gov.house
358197689988940.000036com.discordapp
359197688668800.000036com.sciencedaily
360197662925110.000055com.gmail
361197656784230.000064com.technorati
362197639442160.000123com.hubspot
3631976163814330.000023com.unity3d
3641976076821370.000017org.threejs
3651976023813640.000024com.aljazeera
366197595802450.000109org.nodejs
367197588468960.000036com.bmj
368197555642610.000101com.ebay
3691975519811970.000026au.com.smh
370197536282340.000113org.gnu
3711975196415160.000021edu.osu
3721975136210250.000031int.coe
373197503029940.000032com.britannica
3741974840813120.000024edu.gatech
3751974681826910.000013com.openai
376197443704950.000056org.openstreetmap
377197430864370.000062com.ssl-images-amazon
378197415827910.000040br.com.google
379197410308550.000037ca.cbc
380197404848690.000037com.theconversation
3811973985225820.000014edu.toronto
3821973865210440.000031gov.usgs
3831973830615560.000021com.newscientist
384197362263010.000088net.themeforest
385197356986050.000046com.udacity
386197356684730.000060edu.nyu
3871973408417160.000019edu.ucsc
3881972370817000.000020org.emojipedia
3891972219420680.000017it.scoop
3901972202427540.000013com.slides
3911972187214590.000022ca.sfu
392197200048450.000038au.gov.nsw
3931971790819030.000019org.propublica
3941971758613860.000023com.firebaseapp
3951971609422470.000016com.skyrock
396197105167760.000041com.freepik
39719707962970.000412net.facebook
3981970490014540.000022com.penguinrandomhouse
399197035721950.000135org.bbb
4001970343219340.000018jp.co.japantimes
4011970103017620.000019com.itv
40219700818820.000437net.jsfiddle
4031970061619850.000018org.maven
4041969974623700.000015com.deepmind
405196978446170.000045com.healthline
406196953245060.000056de.gesetze-im-internet
407196947204650.000060org.python
4081969442823310.000015com.mystrikingly
409196915368840.000036gov.dhs
4101968823812330.000026com.wikia
4111968598620900.000017org.sqlite
4121968297615440.000021ms.1drv
413196820941780.000150com.salesforce
414196799143220.000084net.php
415196714843240.000083com.surveymonkey
416196709626340.000044com.mashable
4171967033816280.000020com.motherjones
418196687241390.000211com.weibo
4191966855424530.000014com.fastcodesign
4201966744415060.000021com.flipboard
4211966674624350.000015edu.byu
4221966548217480.000019edu.cuny
423196648863170.000085ru.ok
424196626182870.000092net.azureedge
4251966210813390.000024com.thedailybeast
426196596722460.000109org.aboutcookies
4271965883822830.000015com.shutterfly
4281965610814130.000023com.reverbnation
4291965572226660.000013io.material
430196552545370.000052io.codepen
4311965277612960.000025com.dw
432196519861250.000250com.youtube-nocookie
4331965041617240.000019com.esri
434196501884900.000057fr.free
4351964841615090.000021com.substack
436196474385610.000049com.matterport
4371964658419560.000018com.hindustantimes
4381964583019090.000019com.insider
4391964234221100.000017edu.oregonstate
4401964181423900.000015org.wikibooks
441196408388910.000036int.wipo
4421964024428200.000013org.aclweb
443196392266070.000045gov.state
4441963889423660.000015com.wattpad
445196386521600.000172gle.forms
4461963669210520.000030org.jstor
4471963639819510.000018com.channel4
4481963612617520.000019edu.ucsb
4491963594213200.000024gov.supremecourt
45019633994560.000697com.googleadservices
4511963176024410.000015at.ac.univie
4521962909629240.000013com.pbase
453196265722780.000097uk.org.ico
454196248026390.000044com.licdn
4551962342215180.000021ch.ipcc
456196218749370.000034com.gallup
457196217804960.000056com.herokuapp
4581961858411410.000027edu.brookings
459196173889630.000033edu.psu
4601961679013330.000024mil.army
461196166264340.000063com.rackcdn
462196144843850.000068com.atlassian
4631961176012260.000026com.smashingmagazine
4641960963422270.000016blog.home
4651960845013620.000024gov.defense
4661960769811310.000028com.photoshelter
467196074644830.000058net.imgix
468196070121820.000149jp.co.yahoo
4691960531622840.000015com.contently
470196020408260.000039com.oreilly
4711959770811740.000027com.mediafire
4721959559621170.000017com.thecut
4731959460419600.000018google.ai
4741959456831510.000012cc.uxdesign
4751959428031610.000012edu.uvm
476195941005200.000054edu.cmu
4771959308631370.000012com.instapaper
4781959109015910.000020com.thestar
479195883783690.000071net.researchgate
4801958721435020.000011com.raywenderlich
481195870085270.000053com.thinkwithgoogle
4821958486821490.000016fr.liberation
483195822301090.000336de.google
4841958141815740.000021com.buzzfeednews
485195776487670.000041org.worldwildlife
4861957666210130.000032com.ecwid
4871957611814770.000022com.findlaw
4881957480410120.000032com.thelancet
489195739367740.000041com.vice
490195735068130.000039gov.nist
4911957287219640.000018org.google
4921957250815310.000021org.hrw
493195704107650.000042com.intel
4941956823826950.000013uk.co.ibtimes
4951956779023720.000015com.oprah
49619567558870.000428com.workplace
4971956719433290.000011com.pearltrees
4981956717421030.000017com.voanews
499195667629650.000033com.engadget
500195661881260.000247com.statcounter
5011956477233650.000011org.edublogs
5021956398012600.000025org.aiga
5031956282810310.000031de.stern
5041956206815830.000020fr.francetvinfo
5051956019626200.000014com.hm
506195593423150.000085org.drupal
5071955913237360.000010fr.unblog
508195587867470.000042com.canva
5091955836228700.000013edu.ucf
5101955806432040.000012ph.telegra
511195575349260.000035uk.co.pinterest
5121955707224020.000015edu.kit
513195563585440.000051it.placehold
5141955552822190.000016net.corporate-ir
5151955391027680.000013co.ello
516195534268810.000036com.arstechnica
5171955301814490.000022com.livescience
5181955096821500.000016com.gq
5191955083619530.000018uk.gov.tfl
520195502542100.000125com.iubenda
521195500425330.000053com.pixabay
5221954832814080.000023org.undp
523195476688070.000039ca.amazon
5241954722610200.000031it.smarturl
5251954703226450.000014org.icrc
5261954693424470.000015com.webbyawards
5271954556424230.000015uk.ac.kcl
528195455549490.000034edu.ucla
5291954446214440.000022link.page
5301954396828610.000013com.dummies
5311954136615810.000021org.ocks
53219540748650.000544net.typekit
5331954002211220.000028org.ilo
5341953888225640.000014com.depositphotos
5351953886625020.000014com.unilever
5361953695013480.000024org.acs
53719536262810.000440com.livestream
5381953509826720.000013org.rsf
539195350764890.000057com.adweek
5401953404420500.000017com.msnbc
5411953022025090.000014com.slidesharecdn
5421953008420350.000018com.chronicle
5431952983630880.000012com.bepress
5441952958025710.000014com.biography
5451952932233840.000011tl.de
546195278863320.000079com.typeform
5471952642821850.000016com.newrepublic
5481952540023030.000015com.thoughtco
549195238566060.000045com.samsung
5501952311211000.000029org.ohchr
551195226687900.000040com.fiverr
5521952151817430.000019io.gitlab
553195212401210.000262com.jimdo
5541952029211570.000027com.thenextweb
5551952007020090.000018fr.orange
5561951961832720.000012net.openreview
5571951893622940.000015com.channelnewsasia
5581951709012830.000025org.aarp
5591951691826340.000014org.pewsocialtrends
5601951647619980.000018com.straitstimes
5611951393623100.000015edu.nd
5621951095620990.000017com.dallasnews
5631951073221300.000017de.br
5641950881822780.000015org.fas
5651950800012970.000024org.altervista
566195079782560.000103uk.co.amazon
567195072902190.000121to.amzn
5681950662428350.000013com.thejakartapost
5691950512822110.000016gov.lbl
5701950455616100.000020de.berlin
5711950436210860.000029com.popularmechanics
5721950370627430.000013uk.ac.leeds
573195036444590.000061com.staticflickr
5741950321033970.000011org.neocities
5751950235829960.000012org.vim
5761950218628830.000013org.globalcitizen
577194994505720.000048com.deloitte
578194993929220.000035com.zoho
579194989642330.000113io.shields
5801949893623280.000015com.indianexpress
5811949890238890.000010com.stratechery
5821949772828190.000013app.web
5831949635833860.000011org.zotero
5841949362429390.000013uk.gov.scotland
585194933145670.000048com.photobucket
5861949152437560.000010com.bravesites
5871949055214640.000022org.iea
588194899764320.000063com.hp
5891948995427130.000013uk.co.timesonline
590194894783650.000073com.quantserve
591194893364040.000066com.digg
592194866605600.000049com.cisco
5931948661811550.000027uk.parliament
5941948501429140.000013com.nwsource
5951948501223620.000015com.fineartamerica
596194845982670.000101com.onesignal
5971948423822340.000016com.foreignpolicy
598194842007980.000040org.weforum
5991948339829900.000012com.thoughtworks
6001948320215480.000021com.treehugger
601194823983070.000087com.aliyuncs
602194822246020.000046org.js
6031948023215270.000021gov.uscis
6041947904032560.000012uk.ac.city
6051947747620770.000017com.washingtontimes
6061947719835040.000011com.mariadb
6071947631625650.000014org.oas
608194752364170.000065com.gitlab
6091947225825840.000014com.mathworks
6101947175228300.000013com.dezeen
611194712848350.000038com.investopedia
6121947063824970.000014uk.co.yougov
6131946931629340.000013org.heritage
614194693086140.000045com.netflix
6151946625232810.000011com.shell
6161946538825400.000014fr.paris
617194649564480.000061gov.irs
6181946273240880.000009tl.page
6191946133013610.000024com.upwork
620194611704620.000061com.sxsw
6211946091412550.000025com.digitaloceanspaces
6221946054840910.000009com.jigsy
623194600668610.000037com.venturebeat
6241945841812150.000026com.dell
6251945734810160.000031gov.fcc
6261945682832290.000012uk.co.walesonline
6271945634629610.000013org.project-syndicate
6281945569620240.000018com.fivethirtyeight
629194552429200.000035fm.last
6301945505620860.000017info.worldometers
631194542529310.000034org.mediawiki
6321945367023770.000015ly.rebrand
6331945315840770.000009net.myanimelist
6341945282420750.000017cn.gov.fmprc
6351945201215020.000021org.amnesty
636194505483490.000077com.adnxs
6371944935019450.000018com.justia
6381944871240190.000009edu.usfca
6391944829827050.000013com.monday
6401944657615150.000021ca.bc.gov
641194464869430.000034org.reactjs
6421944612622850.000015net.openid
643194459043830.000068com.newrelic
6441944536613630.000024com.imageshack
6451944514435680.000010org.globalnetworkinitiative
6461944394025490.000014com.kaggle
6471944356236930.000010com.doodlekit
648194397922590.000102com.getbootstrap
6491943867028310.000013uk.co.inews
6501943831231290.000012com.bangkokpost
651194382304090.000065com.force
6521943790821070.000017uk.ac.imperial
6531943543446290.000008net.vingle
6541943415019820.000018be.kuleuven
6551943406635300.000011com.intensedebate
656194329265680.000048com.entrepreneur
6571943235035180.000011be.blogspot
6581942974031660.000012se.blogspot
6591942971213180.000024co.lpages
6601942899232660.000012org.carnegieendowment
661194286748370.000038com.globenewswire
6621942866231750.000012is.good
6631942809822460.000016com.instructure
6641942769829650.000012net.alarabiya
6651942720440900.000009com.kongregate
6661942651427950.000013com.discovermagazine
6671942574626130.000014org.gnupg
668194255185560.000049com.visualstudio
669194241301910.000139com.atdmt
6701942352837730.000010com.openlearning
6711942323037940.000010ch.swissinfo
6721942198235470.000010com.pixar
6731942008021540.000016com.livemint
674194197089570.000033com.variety
6751941714228160.000013uk.gov.metoffice
6761941434620040.000018com.surveygizmo
6771941299433370.000011cn.globaltimes
678194112129290.000035uk.gov.legislation
6791941107026390.000014org.ballotpedia
680194097362430.000110org.whatwg
6811940862031480.000012com.coca-colacompany
6821940834213430.000024uk.gov.nationalarchives
6831940616823260.000015com.thebalancesmb
6841940482231450.000012uk.gov.companieshouse
6851940308835320.000011com.dailykos
686194010081650.000170com.yelp
687194005122570.000103com.automattic
6881940027041690.000009com.penzu
6891939968624890.000014com.bloomberglaw
690193996624120.000065org.opensource
6911939812615470.000021org.khanacademy
6921939737638340.000010com.sfweekly
6931939523627790.000013com.thumbtack
6941939420228800.000013org.royalsociety
6951939368416740.000020kr.co.google
6961939367825310.000014com.post-gazette
6971939352028000.000013org.panda
6981939064824210.000015com.thenation
6991938971428230.000013io.fabric
7001938897449360.000008org.arkive
7011938875626890.000013uk.co.bbci
7021938762440420.000009hk.edu.cityu
7031938740631940.000012com.scribblelive
7041938635235530.000010com.gimletmedia
7051938587234890.000011com.tweetmeme
7061938483025410.000014de.uni-heidelberg
707193842842980.000089ai.shortpixel
7081938387219200.000019gov.gao
7091938297444250.000008com.storeboard
7101938165028140.000013com.politifact
7111938020233490.000011org.cato
7121937928248890.000008com.uberant
7131937730631830.000012fr.lepoint
7141937719438090.000010edu.depaul
7151937612638440.000010net.thedailystar
716193755904060.000066com.aol
7171937557040460.000009edu.umt
7181937279419480.000018tv.ustream
7191937262810340.000031com.verisign
7201936958832790.000011com.theweek
721193679349050.000035com.box
7221936717037240.000010com.eklablog
7231936585034880.000011com.militarytimes
724193658328660.000037gov.uspto
7251936558034830.000011com.multiscreensite
7261936409831030.000012uk.ac.york
7271935948831650.000012org.openweathermap
7281935857415260.000021com.techrepublic
7291935807033150.000011org.jenkins-ci
7301935796828150.000013org.wnyc
731193574586380.000044gov.copyright
7321935683434330.000011com.lawfareblog
7331935461023570.000015co.pcdn
7341935300432630.000012com.nyt
7351935276631010.000012se.svt
7361935186610480.000030net.clickbank
7371935154631210.000012com.scotsman
7381934872011820.000027com.foursquare
7391934866012390.000026com.pingdom
7401934804824750.000014com.squarespace-cdn
7411934667823230.000015com.natlawreview
7421934635027690.000013org.wri
7431934580034300.000011com.bigthink
7441934505441320.000009com.newgrounds
7451934469238620.000010org.sourcewatch
7461934235637200.000010re.cli
7471934178831560.000012gov.ncjrs
7481934145830870.000012my.com.thestar
7491934069833070.000011gov.anl
7501933993231170.000012com.nationalreview
7511933913225970.000014ca.newswire
7521933809016030.000020org.webkit
7531933740237000.000010org.elasticsearch
754193352769280.000035com.hootsuite
755193349363000.000088com.caniuse
7561933425232360.000012gov.fec
7571933391023270.000015ru.rg
7581933312437410.000010org.constitutioncenter
7591933210216020.000020com.jwplayer
7601933175442530.000009com.etymonline
7611933167836200.000010it.eventbrite
7621933151029600.000013com.madmimi
7631933146034910.000011com.afp
7641933019219070.000019com.kinstacdn
7651932813631630.000012gov.ornl
766193270424610.000061com.pubmatic
767193258664010.000066gg.discord
7681932551812890.000025com.intuit
7691932548211680.000027com.ycombinator
7701932525832920.000011com.crashlytics
7711932430242700.000009com.underconsideration
7721932285625990.000014com.articulate
7731932223032460.000012de.uni-frankfurt
7741932149636920.000010uk.co.spectator
775193210968670.000037com.wikihow
7761932101042750.000009to.gplus
7771932080249200.000008pl.pastebin
7781932062237910.000010uk.co.manchestereveningnews
7791931985429380.000013edu.unh
7801931897625530.000014de.tagesschau
7811931880221160.000017gov.energystar
782193183724290.000063com.businesswire
783193180508290.000038com.moz
7841931484835500.000010org.avaaz
7851931455436830.000010com.mnn
7861931447611720.000027com.alexa
7871931415023320.000015net.vnexpress
788193132683480.000077com.constantcontact
7891931273236000.000010com.heraldscotland
7901931232638430.000010fm.audioboo
7911931175044810.000008tv.eurovision
792193116469740.000033com.fandom
7931931125637170.000010uk.ac.uea
7941931117436970.000010uk.ac.core
7951931026835140.000011com.hsbc
7961931025434920.000011org.sciencenews
7971931024249160.000008com.blackplanet
7981931009632890.000011com.realclearpolitics
7991930936616980.000020com.pastebin
8001930919631900.000012uk.org.rspb
8011930832213770.000023com.techradar
802193080945290.000053com.indeed
8031930754849850.000007dk.bloggersdelight
8041930714444910.000008com.xtgem
8051930610820730.000017ca.on.gov
8061930550035360.000011uk.co.thisismoney
807193049087970.000040gov.sec
8081930233011280.000028net.atlassian
8091930224039370.000009com.collinsdictionary
8101929994414790.000022edu.purdue
8111929902031790.000012com.wayfair
8121929890836110.000010org.chathamhouse
8131929790032180.000012org.rferl
814192972163970.000066com.skype
8151929653647380.000008edu.ualr
8161929601635230.000011org.diva-portal
8171929567227850.000013org.cfr
8181929480612490.000025com.merriam-webster
8191929296848350.000008com.designobserver
8201929273433990.000011org.pewforum
821192922002700.000100jp.co.amazon
8221929146839940.000009uk.co.dailyrecord
8231929093639510.000009edu.swarthmore
8241929057033390.000011com.ubs
8251928974810750.000030so.notion
8261928974228470.000013us.govtrack
8271928923612560.000025com.udemy
828192890403330.000079com.hackerone
8291928871637870.000010org.nationalinterest
8301928862631380.000012com.doubleclickbygoogle
831192880002790.000097de.amazon
8321928724420360.000018org.doxygen
8331928684016610.000020scot.gov
8341928665239330.000009de.berliner-zeitung
8351928586815190.000021com.billboard
836192839106810.000042com.gartner
8371928339046980.000008net.writeablog
8381928268824650.000014com.infoworld
839192820848230.000039com.sedo
8401928170032000.000012org.aei
84119280820710.000502com.oculus
8421928065215800.000021edu.ucsd
843192803963290.000081mp.mailchi
8441928028839170.000009edu.umaine
8451927922232620.000012org.iucnredlist
8461927913028270.000013com.lexology
8471927830448510.000008com.nation2
8481927815652900.000007com.anotepad
8491927805641280.000009za.co.mg
85019276824770.000467com.messenger
8511927646020830.000017org.dejure
8521927600244940.000008net.blogfreely
8531927563013020.000024org.owasp
8541927514233090.000011com.foreignaffairs
8551927509240670.000009tw.com.books
8561927491642670.000009ca.nfb
857192748223640.000073com.bitly
8581927456032250.000012org.osce
8591927402837260.000010uk.org.wwf
8601927400639710.000009org.truthout
861192731041550.000178gov.privacyshield
8621927270819810.000018edu.uci
8631927236820440.000017se.haxx
864192722888970.000036com.emarketer
8651927211045320.000008com.symbaloo
8661927150810040.000032com.playstation
8671927133821960.000016org.sundance
868192712163630.000073eu.youronlinechoices
8691927119634960.000011com.rev
8701927108040710.000009in.thewire
871192709761590.000174org.nginx
872192705289030.000036com.libsyn
8731926865024000.000015us.pa.state
874192676101460.000205me.line
8751926747852020.000007net.bravejournal
8761926738631400.000012ru.kp
8771926733440140.000009com.ecowatch
878192667005140.000055org.debian
879192663025390.000052com.gofundme
880192661949760.000033com.pcmag
8811926491441510.000009com.theoutline
8821926451243160.000009org.icj-cij
8831926362614700.000022org.coursera
8841926161020760.000017gov.healthcare
8851926062637210.000010com.iconarchive
8861925973416570.000020net.leadpages
8871925903414860.000022com.technologyreview
8881925803223670.000015ca.citizenlab
8891925788436900.000010com.governing
8901925778233220.000011com.wikidot
8911925726023850.000015org.raspberrypi
8921925645246210.000008jp.ac.kobe-u
8931925545410730.000030com.timeanddate
8941925483610960.000029com.buffer
8951925403239780.000009com.ogilvy
896192515309400.000034com.css-tricks
8971925109615010.000021com.msdn
8981925013839580.000009com.gab
8991924999436730.000010com.what3words
9001924926012410.000026com.tableau
9011924831613190.000024com.xkcd
9021924822436950.000010com.nestle
9031924767849820.000007net.postheaven
904192464284700.000060com.fc2
9051924623817950.000019com.pcworld
9061924602825890.000014mp.j
9071924575443180.000009org.kuow
9081924530039060.000009org.migrationpolicy
909192452825850.000047com.fortune
9101924432437690.000010de.morgenpost
9111924412032820.000011uk.gov.data
9121924355849520.000007cz.webgarden
9131924310021180.000017org.donorbox
9141924219239090.000009de.uni-konstanz
9151924168442180.000009org.birdlife
9161924098238750.000010org.people-press
9171924077821320.000017to.dev
918192398469060.000035org.golang
9191923873224250.000015net.noscript
9201923774212230.000026com.podbean
9211923590641300.000009com.scienceblogs
9221923570649480.000007it.clyp
9231923549833550.000011edu.fordham
9241923169640760.000009org.oyez
9251923065634410.000011com.joebiden
9261922996028670.000013com.washingtonexaminer
927 | 19229728 | 1115 | 0.000028 | com.gizmodo
928 | 19229112 | 2757 | 0.000013 | org.healthaffairs
929 | 19228910 | 1232 | 0.000026 | com.searchengineland
930 | 19228678 | 854 | 0.000037 | fm.anchor
931 | 19227412 | 5084 | 0.000007 | com.zcubes
932 | 19227258 | 1995 | 0.000018 | com.ssllabs
933 | 19225964 | 1072 | 0.000030 | org.poynter
934 | 19224436 | 1644 | 0.000020 | net.java
935 | 19223632 | 1514 | 0.000021 | edu.usc
936 | 19223252 | 3680 | 0.000010 | org.carbonbrief
937 | 19221502 | 5165 | 0.000007 | org.csgrid
938 | 19221286 | 308 | 0.000087 | jp.ameblo
939 | 19220064 | 1578 | 0.000021 | com.sun
940 | 19220010 | 3959 | 0.000009 | org.rfa
941 | 19218588 | 2616 | 0.000014 | uk.gov.defra
942 | 19218556 | 3912 | 0.000009 | com.exxonmobil
943 | 19218102 | 5249 | 0.000007 | com.topsitenet
944 | 19217732 | 3012 | 0.000012 | com.html5rocks
945 | 19217494 | 3660 | 0.000010 | ca.yelp
946 | 19216576 | 2940 | 0.000013 | com.instructables
947 | 19215582 | 2212 | 0.000016 | org.linuxfoundation
948 | 19215410 | 4069 | 0.000009 | uk.org.woodlandtrust
949 | 19213854 | 2058 | 0.000017 | org.json
950 | 19213790 | 214 | 0.000124 | com.tripadvisor
951 | 19212490 | 5233 | 0.000007 | net.squareblogs
952 | 19212378 | 3864 | 0.000010 | ru.mid
953 | 19212170 | 231 | 0.000113 | com.myshopify
954 | 19211108 | 3310 | 0.000011 | com.flippa
955 | 19211092 | 3850 | 0.000010 | com.townandcountrymag
956 | 19210938 | 1292 | 0.000025 | build.bazel
957 | 19210816 | 5295 | 0.000007 | net.werite
958 | 19210212 | 1240 | 0.000026 | com.uk
959 | 19209742 | 2354 | 0.000015 | com.storify
960 | 19209508 | 3280 | 0.000011 | org.cjr
961 | 19208854 | 3158 | 0.000012 | org.acog
962 | 19208448 | 3921 | 0.000009 | br.com.sebrae
963 | 19208380 | 250 | 0.000107 | org.icann
964 | 19207876 | 1635 | 0.000020 | fr.blogspot
965 | 19207582 | 122 | 0.000262 | com.bizjournals
966 | 19207280 | 3406 | 0.000011 | org.cites
967 | 19207102 | 2687 | 0.000013 | com.tutsplus
968 | 19207058 | 3409 | 0.000011 | tr.com.aa
969 | 19206380 | 1109 | 0.000028 | org.whatbrowser
970 | 19205750 | 4680 | 0.000008 | org.learner
971 | 19205144 | 3424 | 0.000011 | no.yr
972 | 19203738 | 4271 | 0.000009 | com.s-nbcnews
973 | 19203150 | 4166 | 0.000009 | org.spie
974 | 19203082 | 1335 | 0.000024 | com.indiegogo
975 | 19202634 | 708 | 0.000042 | com.airbnb
976 | 19202288 | 4217 | 0.000009 | com.revolut
977 | 19201514 | 4339 | 0.000009 | org.atsjournals
978 | 19201386 | 1033 | 0.000031 | com.redhat
979 | 19200760 | 4066 | 0.000009 | uk.co.zoopla
980 | 19199826 | 318 | 0.000084 | it.google
981 | 19199246 | 1137 | 0.000028 | com.windowsphone
982 | 19198666 | 1485 | 0.000022 | edu.unc
983 | 19198508 | 466 | 0.000060 | gov.fda
984 | 19198408 | 653 | 0.000043 | com.zapier
985 | 19198272 | 2161 | 0.000016 | com.gigaom
986 | 19197316 | 4457 | 0.000008 | ru.novayagazeta
987 | 19196504 | 1936 | 0.000018 | br.com.correios
988 | 19196468 | 4101 | 0.000009 | google.design
989 | 19195350 | 2194 | 0.000016 | org.eu
990 | 19192238 | 3758 | 0.000010 | com.mail-archive
991 | 19191310 | 4437 | 0.000008 | com.out
992 | 19191000 | 4759 | 0.000008 | tw.focustaiwan
993 | 19190946 | 4235 | 0.000009 | org.insideclimatenews
994 | 19190774 | 2038 | 0.000017 | com.freeprivacypolicy
995 | 19190442 | 4265 | 0.000009 | org.escardio
996 | 19190354 | 4663 | 0.000008 | com.theschooloflife
997 | 19189766 | 241 | 0.000111 | com.naver
998 | 19188362 | 4711 | 0.000008 | edu.uah
999 | 19188230 | 1611 | 0.000020 | com.nike
1000 | 19187360 | 4319 | 0.000009 | edu.mtsu

Credits

Thanks to the authors of the WebGraph framework, whose software made the computation of graph properties and ranks possible.

We hope the data proves useful for research on ranking, graph analysis, link spam detection, and related topics. Let us know about your results via Common Crawl’s Google Group!

Host- and Domain-Level Web Graphs October, November/December 2020 and January 2021

We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of October, November/December 2020 and January 2021. Additional information about the data formats, the processing pipeline, our objectives, and credits can be found in the announcements of prior webgraph releases (e.g., Nov/Dec/Jan 2017-2018 Webgraphs). You may also visit the projects cc-webgraph and cc-pyspark which include all scripts and tools required to construct the graphs. Instructions to explore the graphs in the webgraph format are given in our collection of webgraph notebooks.

Host-level graph

The graph consists of 490 million nodes and 2.57 billion edges. It includes dangling nodes, i.e. hosts that have not been crawled but are pointed to by a link on a crawled page. There are 414 million dangling nodes (84.4%), and the largest strongly connected component contains 42.6 million nodes (8.7%).

Host names in the graph are in reverse domain name notation and a leading www. is stripped: www.subdomain.example.com becomes com.example.subdomain.
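The naming rule can be illustrated with a minimal Python sketch. The function name reverse_host is hypothetical and not taken from the cc-webgraph or cc-pyspark code; the sketch only strips a leading www. and reverses the labels, and does no other normalization.

```python
def reverse_host(host: str) -> str:
    """Return the host name in reverse domain name notation.

    A leading "www." is dropped first, then the dot-separated
    labels are reversed. Illustrative helper, not project code.
    """
    if host.startswith("www."):
        host = host[len("www."):]
    return ".".join(reversed(host.split(".")))


print(reverse_host("www.subdomain.example.com"))  # com.example.subdomain
```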

You can download the graph and the ranks of all 490 million hosts from AWS S3 at the path s3://commoncrawl/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/host/. Alternatively, you can use https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/host/ as a prefix to access the files from anywhere.

Please note that the text representation of the host-level graph is shipped in 36 gzip-compressed files listed in two path listings, one for the nodes and one for the edges. First, download the path listings and uncompress them using “gzip”. Then add the prefix s3://commoncrawl/ or https://data.commoncrawl.org/ to each line of a path listing to get the list of URLs needed to download the entire graph.
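As a rough sketch of the prefixing step, the following Python snippet turns the bytes of a gzip-compressed path listing into a list of download URLs. The function name urls_from_listing and the two sample paths are placeholders made up for illustration; a real listing contains the actual file paths of the release.

```python
import gzip
import io

PREFIX = "https://data.commoncrawl.org/"


def urls_from_listing(gz_bytes: bytes, prefix: str = PREFIX) -> list[str]:
    """Decompress a path listing and prepend the prefix to every non-empty line."""
    with gzip.open(io.BytesIO(gz_bytes), "rt", encoding="utf-8") as listing:
        return [prefix + line.strip() for line in listing if line.strip()]


# Placeholder listing with two made-up paths, compressed in memory.
sample_listing = gzip.compress(b"path/to/part-00000.txt.gz\npath/to/part-00001.txt.gz\n")
for url in urls_from_listing(sample_listing):
    print(url)
```

Passing prefix="s3://commoncrawl/" instead yields S3 paths for use with an S3 client.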

Download files of the Common Crawl Oct/Nov/Jan 2020-2021 host-level webgraph

Size | File | Description
3.08 GB | cc-main-2020-21-oct-nov-jan-host-vertices.paths.gz | nodes ⟨id, rev host⟩, paths of 12 vertices files
11.76 GB | cc-main-2020-21-oct-nov-jan-host-edges.paths.gz | edges ⟨from_id, to_id⟩, paths of 24 edges files
5.18 GB | cc-main-2020-21-oct-nov-jan-host.graph | graph in BVGraph format
2 kB | cc-main-2020-21-oct-nov-jan-host.properties |
5.63 GB | cc-main-2020-21-oct-nov-jan-host-t.graph | transpose of the graph (outlinks inverted to inlinks)
2 kB | cc-main-2020-21-oct-nov-jan-host-t.properties |
1 kB | cc-main-2020-21-oct-nov-jan-host.stats | WebGraph statistics
7.04 GB | cc-main-2020-21-oct-nov-jan-host-ranks.txt.gz | harmonic centrality and pagerank
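
The text representation pairs a vertices file ⟨id, rev host⟩ with an edges file ⟨from_id, to_id⟩. The sketch below shows one way to join the two in Python. It assumes tab-separated columns, which is not stated above, and uses a few made-up sample lines instead of the real files; read_vertices and read_edges are hypothetical helper names.

```python
import io


def read_vertices(lines):
    """Map node id -> reversed host name from lines '<id>\\t<rev host>' (assumed layout)."""
    vertices = {}
    for line in lines:
        node_id, rev_host = line.rstrip("\n").split("\t")
        vertices[int(node_id)] = rev_host
    return vertices


def read_edges(lines):
    """Yield (from_id, to_id) integer pairs from lines '<from_id>\\t<to_id>' (assumed layout)."""
    for line in lines:
        from_id, to_id = line.split()
        yield int(from_id), int(to_id)


# Made-up sample data standing in for the uncompressed files.
vertices = read_vertices(io.StringIO("0\tcom.example\n1\tcom.example.blog\n2\torg.example\n"))
edges = list(read_edges(io.StringIO("0\t1\n0\t2\n1\t2\n")))

# Resolve the numeric ids back to host names.
named_edges = [(vertices[src], vertices[dst]) for src, dst in edges]
for src, dst in named_edges:
    print(src, "->", dst)
```

For the full graph, an in-memory dictionary of 490 million hosts is likely too large for most machines; the BVGraph files are the better fit for analysis at that scale.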

Domain-level graph

The domain graph was built by aggregating the host graph at the level of pay-level domains (PLDs) based on the public suffix list maintained at publicsuffix.org.
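
The aggregation idea can be sketched in a few lines of Python. This is a simplified illustration and not the cc-webgraph implementation: SUFFIXES is a tiny hand-picked stand-in for the public suffix list, wildcard and exception rules are ignored, and dropping links within the same domain is an assumption of this sketch.

```python
# Tiny illustrative subset of public suffixes (the real list is far longer).
SUFFIXES = {"com", "org", "uk", "co.uk"}


def pay_level_domain(rev_host: str, suffixes=SUFFIXES) -> str:
    """Return the pay-level domain of a host given in reverse domain name notation."""
    labels = rev_host.split(".")
    # Default rule: treat the top-level domain as a public suffix.
    suffix_len = 1
    for n in range(1, len(labels) + 1):
        # In reversed notation the public suffix is a prefix of the labels.
        if ".".join(reversed(labels[:n])) in suffixes:
            suffix_len = n
    # The PLD is the public suffix plus one more label (if there is one).
    return ".".join(labels[:suffix_len + 1])


def aggregate(host_edges):
    """Collapse host-level edges into a set of domain-level edges."""
    domain_edges = set()
    for src, dst in host_edges:
        a, b = pay_level_domain(src), pay_level_domain(dst)
        if a != b:  # assumption: links within one domain are dropped
            domain_edges.add((a, b))
    return domain_edges


host_edges = [
    ("com.example.shop", "com.example.blog"),
    ("com.example.blog", "uk.co.bbc.news"),
    ("com.example", "uk.co.bbc"),
]
print(aggregate(host_edges))  # {('com.example', 'uk.co.bbc')}
```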

The domain-level graph has 86 million nodes and 1.47 billion edges. 43 million nodes (50%) are dangling nodes, and the largest strongly connected component covers 34 million nodes (39%).

All files related to the domain graph are available on AWS S3 under s3://commoncrawl/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/domain/ or via https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-2020-21-oct-nov-jan/domain/.

Download files of the Common Crawl Oct/Nov/Jan 2020-2021 domain-level webgraph

Size | File | Description
0.59 GB | cc-main-2020-21-oct-nov-jan-domain-vertices.txt.gz | nodes ⟨id, rev domain, num hosts⟩
6.00 GB | cc-main-2020-21-oct-nov-jan-domain-edges.txt.gz | edges ⟨from_id, to_id⟩
3.40 GB | cc-main-2020-21-oct-nov-jan-domain.graph | graph in BVGraph format
2 kB | cc-main-2020-21-oct-nov-jan-domain.properties |
3.26 GB | cc-main-2020-21-oct-nov-jan-domain-t.graph | transpose of the graph
2 kB | cc-main-2020-21-oct-nov-jan-domain-t.properties |
1 kB | cc-main-2020-21-oct-nov-jan-domain.stats | WebGraph statistics
1.85 GB | cc-main-2020-21-oct-nov-jan-domain-ranks.txt.gz | harmonic centrality and pagerank

Below you’ll find the top 1000 domains ranked by Harmonic Centrality or PageRank. The full list of all 86 million domain ranks is available for download.

Top 1000 domains ranked by harmonic centrality (Oct/Nov/Jan 2020-2021)

harmonic centrality rank | hc value | page rank | page rank value | reversed domain name
1 | 30355566 | 1 | 0.017956 | com.googleapis
2 | 29427164 | 3 | 0.012871 | com.facebook
3 | 28173562 | 2 | 0.012899 | com.google
4 | 25702812 | 5 | 0.007348 | com.twitter
5 | 25628314 | 4 | 0.007628 | org.w
6 | 25297808 | 6 | 0.007231 | com.youtube
7 | 24195466 | 9 | 0.005352 | com.instagram
8 | 23355356 | 8 | 0.005532 | org.gmpg
9 | 23233674 | 7 | 0.006500 | com.googletagmanager
10 | 22492432 | 11 | 0.003277 | com.linkedin
11 | 21576402 | 10 | 0.004076 | com.cloudflare
12 | 21468510 | 14 | 0.002649 | com.gravatar
13 | 21395642 | 13 | 0.003020 | org.wordpress
14 | 21353798 | 22 | 0.001726 | com.pinterest
15 | 20946722 | 30 | 0.001242 | org.wikipedia
16 | 20926308 | 19 | 0.001834 | com.wordpress
17 | 20877776 | 16 | 0.002056 | com.gstatic
18 | 20799472 | 15 | 0.002451 | com.bootstrapcdn
19 | 20795402 | 18 | 0.001943 | com.apple
20 | 20626472 | 32 | 0.001165 | com.vimeo
21 | 20527986 | 41 | 0.000886 | be.youtu
22 | 20419038 | 21 | 0.001769 | com.jquery
23 | 20391686 | 28 | 0.001246 | com.microsoft
24 | 20327544 | 24 | 0.001500 | com.wp
25 | 20314602 | 45 | 0.000769 | com.blogspot
26 | 20231490 | 37 | 0.001025 | com.amazonaws
27 | 20208912 | 51 | 0.000691 | com.amazon
28 | 20199388 | 47 | 0.000740 | gl.goo
29 | 20093688 | 71 | 0.000448 | com.tumblr
30 | 20070176 | 35 | 0.001070 | com.google-analytics
31 | 20050256 | 61 | 0.000598 | ly.bit
32 | 20030452 | 20 | 0.001794 | com.adobe
33 | 19998314 | 17 | 0.002005 | com.github
34 | 19989010 | 50 | 0.000715 | org.mozilla
35 | 19962834 | 58 | 0.000639 | eu.europa
36 | 19945306 | 34 | 0.001103 | net.cloudfront
37 | 19849112 | 52 | 0.000682 | com.flickr
38 | 19843288 | 40 | 0.000909 | net.jsdelivr
39 | 19833032 | 91 | 0.000369 | com.googleusercontent
40 | 19823560 | 105 | 0.000347 | com.yahoo
41 | 19752300 | 56 | 0.000650 | co.t
42 | 19722088 | 33 | 0.001114 | com.googlesyndication
43 | 19712406 | 23 | 0.001517 | com.fontawesome
44 | 19708354 | 81 | 0.000392 | com.weebly
45 | 19706054 | 55 | 0.000653 | com.paypal
46 | 19695288 | 109 | 0.000308 | com.reddit
47 | 19641534 | 31 | 0.001231 | me.wp
48 | 19640398 | 73 | 0.000435 | com.medium
49 | 19635162 | 67 | 0.000491 | io.github
50 | 19590444 | 137 | 0.000225 | com.nytimes
51 | 19587880 | 121 | 0.000280 | com.soundcloud
52 | 19585192 | 27 | 0.001262 | ru.yandex
53 | 19583494 | 43 | 0.000786 | com.addthis
54 | 19582250 | 44 | 0.000776 | com.macromedia
55 | 19560416 | 66 | 0.000504 | org.w3
56 | 19549714 | 70 | 0.000451 | com.shopify
57 | 19518672 | 146 | 0.000201 | com.forbes
58 | 19502448 | 144 | 0.000205 | org.archive
59 | 19496300 | 90 | 0.000371 | org.creativecommons
60 | 19490348 | 194 | 0.000131 | uk.co.bbc
61 | 19482926 | 59 | 0.000630 | org.schema
62 | 19479528 | 39 | 0.000910 | com.baidu
63 | 19464572 | 36 | 0.001035 | net.doubleclick
64 | 19459966 | 200 | 0.000129 | com.cnn
65 | 19451100 | 53 | 0.000677 | com.whatsapp
66 | 19449068 | 60 | 0.000611 | com.vk
67 | 19444966 | 206 | 0.000126 | net.slideshare
68 | 19443956 | 158 | 0.000169 | com.bing
69 | 19419878 | 174 | 0.000152 | com.imdb
70 | 19385956 | 186 | 0.000140 | com.imgur
71 | 19372520 | 236 | 0.000112 | com.washingtonpost
72 | 19371076 | 176 | 0.000150 | com.theguardian
73 | 19356952 | 254 | 0.000102 | com.wsj
74 | 19356474 | 210 | 0.000123 | org.wikimedia
75 | 19352128 | 219 | 0.000117 | com.businessinsider
76 | 19347698 | 209 | 0.000123 | com.stackoverflow
77 | 19342712 | 409 | 0.000065 | com.msn
78 | 19326654 | 327 | 0.000079 | com.appspot
79 | 19324334 | 157 | 0.000172 | int.who
80 | 19321112 | 216 | 0.000119 | edu.stanford
81 | 19316796 | 179 | 0.000148 | org.apache
82 | 19310390 | 333 | 0.000078 | com.ibm
83 | 19309354 | 337 | 0.000077 | edu.mit
84 | 19304938 | 225 | 0.000116 | net.sourceforge
85 | 19292932 | 116 | 0.000288 | com.ytimg
86 | 19287812 | 57 | 0.000649 | net.fbcdn
87 | 19282486 | 285 | 0.000091 | com.techcrunch
88 | 19276500 | 269 | 0.000094 | com.bbc
89 | 19275480 | 155 | 0.000181 | com.wixsite
90 | 19275222 | 152 | 0.000189 | gov.nih
91 | 19275200 | 220 | 0.000117 | com.livejournal
92 | 19270650 | 233 | 0.000113 | uk.co.google
93 | 19270610 | 440 | 0.000062 | gov.nasa
94 | 19263354 | 54 | 0.000666 | com.googleadservices
95 | 19243404 | 262 | 0.000097 | edu.harvard
96 | 19243154 | 270 | 0.000094 | com.oracle
97 | 19243126 | 276 | 0.000093 | org.acm
98 | 19238650 | 218 | 0.000117 | org.ietf
99 | 19238450 | 185 | 0.000142 | com.blogger
100 | 19238426 | 223 | 0.000116 | gov.ca
101 | 19234630 | 465 | 0.000059 | fr.free
102 | 19232058 | 259 | 0.000098 | com.bloomberg
103 | 19221844 | 275 | 0.000093 | com.android
104 | 19218636 | 304 | 0.000085 | com.live
105 | 19210812 | 126 | 0.000271 | com.jimdo
106 | 19208896 | 169 | 0.000159 | com.issuu
107 | 19205802 | 166 | 0.000162 | com.giphy
108 | 19194156 | 438 | 0.000062 | com.ted
109 | 19190178 | 348 | 0.000075 | com.huffingtonpost
110 | 19187782 | 130 | 0.000254 | com.weibo
111 | 19186862 | 154 | 0.000186 | us.zoom
112 | 19185794 | 252 | 0.000103 | org.gnu
113 | 19176332 | 403 | 0.000066 | com.myspace
114 | 19162122 | 1039 | 0.000030 | com.wikia
115 | 19152582 | 373 | 0.000071 | net.researchgate
116 | 19150058 | 343 | 0.000075 | com.usatoday
117 | 19148332 | 309 | 0.000084 | com.reuters
118 | 19143988 | 400 | 0.000067 | uk.co.telegraph
119 | 19141202 | 446 | 0.000061 | com.latimes
120 | 19130976 | 372 | 0.000071 | com.example
121 | 19129552 | 345 | 0.000075 | com.githubusercontent
122 | 19127344 | 93 | 0.000366 | com.unpkg
123 | 19127116 | 384 | 0.000069 | com.nature
124 | 19125396 | 336 | 0.000077 | com.wired
125 | 19124320 | 25 | 0.001485 | com.wixstatic
126 | 19114842 | 299 | 0.000087 | org.npr
127 | 19111018 | 308 | 0.000084 | com.cnbc
128 | 19107772 | 328 | 0.000079 | com.ebay
129 | 19103704 | 293 | 0.000088 | com.wiley
130 | 19102814 | 111 | 0.000299 | de.google
131 | 19097732 | 191 | 0.000135 | com.npmjs
132 | 19095454 | 344 | 0.000075 | com.hp
133 | 19088550 | 539 | 0.000050 | com.cisco
134 | 19084048 | 932 | 0.000034 | com.stackexchange
135 | 19081736 | 132 | 0.000251 | com.youtube-nocookie
136 | 19080638 | 134 | 0.000250 | com.ft
137 | 19078814 | 213 | 0.000120 | org.ampproject
138 | 19077232 | 532 | 0.000051 | com.steampowered
139 | 19074638 | 365 | 0.000072 | com.patreon
140 | 19072918 | 455 | 0.000061 | com.theatlantic
141 | 19072880 | 476 | 0.000057 | com.gitlab
142 | 19072344 | 890 | 0.000035 | com.pcmag
143 | 19068436 | 195 | 0.000131 | com.unsplash
144 | 19065494 | 877 | 0.000036 | edu.psu
145 | 19063926 | 376 | 0.000070 | com.time
146 | 19061142 | 208 | 0.000125 | com.twimg
147 | 19061064 | 164 | 0.000165 | com.yelp
148 | 19059332 | 873 | 0.000036 | edu.washington
149 | 19057196 | 533 | 0.000051 | edu.cornell
150 | 19054152 | 148 | 0.000197 | com.dropbox
151 | 19051738 | 603 | 0.000046 | org.arxiv
152 | 19047626 | 379 | 0.000070 | com.statista
153 | 19043050 | 324 | 0.000080 | org.un
154 | 19042602 | 249 | 0.000104 | com.bandcamp
155 | 19040914 | 824 | 0.000038 | com.venturebeat
156 | 19040684 | 75 | 0.000432 | me.fb
157 | 19039882 | 841 | 0.000037 | org.chromium
158 | 19033464 | 65 | 0.000519 | com.wix
159 | 19026244 | 284 | 0.000092 | com.sciencedirect
160 | 19019766 | 629 | 0.000045 | edu.yale
161 | 19016326 | 584 | 0.000047 | com.pexels
162 | 19015230 | 826 | 0.000038 | org.bitbucket
163 | 19010452 | 832 | 0.000038 | org.ieee
164 | 19007636 | 388 | 0.000068 | com.springer
165 | 19001810 | 765 | 0.000041 | com.evernote
166 | 18997506 | 855 | 0.000037 | edu.upenn
167 | 18994926 | 258 | 0.000098 | jp.ameblo
168 | 18993772 | 149 | 0.000195 | me.t
169 | 18992834 | 416 | 0.000065 | org.hbr
170 | 18992028 | 296 | 0.000088 | com.outlook
171 | 18985954 | 168 | 0.000160 | jp.co.yahoo
172 | 18983238 | 577 | 0.000048 | com.cbsnews
173 | 18982546 | 792 | 0.000040 | me.about
174 | 18981228 | 891 | 0.000035 | com.git-scm
175 | 18980336 | 829 | 0.000038 | com.economist
176 | 18980328 | 150 | 0.000193 | com.opera
177 | 18978056 | 138 | 0.000223 | me.line
178 | 18974996 | 450 | 0.000061 | com.goodreads
179 | 18973364 | 645 | 0.000044 | com.mysql
180 | 18973114 | 842 | 0.000037 | com.docker
181 | 18969708 | 562 | 0.000048 | com.buzzfeed
182 | 18969566 | 565 | 0.000048 | com.mashable
183 | 18968398 | 587 | 0.000047 | com.mozilla
184 | 18964540 | 951 | 0.000034 | com.about
185 | 18962632 | 797 | 0.000040 | org.worldbank
186 | 18956128 | 815 | 0.000039 | com.newyorker
187 | 18954668 | 342 | 0.000076 | com.dribbble
188 | 18954236 | 265 | 0.000096 | net.behance
189 | 18951876 | 390 | 0.000068 | com.theverge
190 | 18951838 | 501 | 0.000054 | gov.whitehouse
191 | 18950142 | 456 | 0.000061 | uk.co.dailymail
192 | 18943890 | 347 | 0.000075 | com.xinhuanet
193 | 18942812 | 320 | 0.000080 | com.w3schools
194 | 18941124 | 378 | 0.000070 | com.fc2
195 | 18936488 | 1151 | 0.000027 | edu.wisc
196 | 18935074 | 764 | 0.000041 | gov.noaa
197 | 18932396 | 294 | 0.000088 | com.disqus
198 | 18931228 | 1337 | 0.000023 | co.elastic
199 | 18927646 | 38 | 0.000956 | com.qq
200 | 18926694 | 448 | 0.000061 | com.bigcommerce
201 | 18926442 | 624 | 0.000045 | gov.loc
202 | 18925620 | 156 | 0.000179 | gov.cdc
203 | 18924632 | 929 | 0.000035 | gov.fcc
204 | 18922816 | 136 | 0.000228 | info.aboutads
205 | 18921630 | 821 | 0.000039 | com.qz
206 | 18921308 | 2295 | 0.000015 | com.wikidot
207 | 18919240 | 385 | 0.000069 | com.scribd
208 | 18915104 | 748 | 0.000042 | org.unesco
209 | 18914418 | 959 | 0.000033 | com.apnews
210 | 18912426 | 375 | 0.000070 | com.digg
211 | 18911082 | 779 | 0.000040 | com.vox
212 | 18910370 | 180 | 0.000147 | com.amazon-adsystem
213 | 18910110 | 272 | 0.000094 | com.squareup
214 | 18907410 | 495 | 0.000054 | uk.co.independent
215 | 18906224 | 256 | 0.000100 | org.iana
216 | 18905608 | 1251 | 0.000025 | edu.uchicago
217 | 18901398 | 420 | 0.000064 | com.force
218 | 18898702 | 646 | 0.000044 | com.usnews
219 | 18898108 | 647 | 0.000044 | com.gartner
220 | 18894918 | 295 | 0.000088 | com.nbcnews
221 | 18890160 | 470 | 0.000058 | com.dailymotion
222 | 18883488 | 1004 | 0.000031 | com.dropboxusercontent
223 | 18878276 | 617 | 0.000045 | org.pbs
224 | 18876454 | 181 | 0.000147 | jp.co.google
225 | 18876164 | 113 | 0.000292 | com.sharethis
226 | 18875824 | 467 | 0.000059 | com.nationalgeographic
227 | 18874112 | 811 | 0.000039 | uk.co.blogspot
228 | 18873340 | 844 | 0.000037 | au.net.abc
229 | 18868000 | 934 | 0.000034 | com.foxnews
230 | 18865322 | 1559 | 0.000020 | org.eclipse
231 | 18859464 | 399 | 0.000067 | com.getpocket
232 | 18859228 | 947 | 0.000034 | com.slate
233 | 18859062 | 266 | 0.000095 | org.doi
234 | 18858866 | 63 | 0.000541 | com.fb
235 | 18856638 | 968 | 0.000033 | com.politico
236 | 18849992 | 907 | 0.000035 | com.playstation
237 | 18849334 | 600 | 0.000046 | org.semver
238 | 18848468 | 1565 | 0.000020 | gd.is
239 | 18847004 | 1311 | 0.000024 | edu.unc
240 | 18846758 | 1523 | 0.000021 | org.kernel
241 | 18846310 | 839 | 0.000037 | org.sciencemag
242 | 18846038 | 257 | 0.000099 | com.typepad
243 | 18844998 | 1152 | 0.000027 | com.hatenablog
244 | 18844004 | 1981 | 0.000018 | com.googlesource
245 | 18842180 | 202 | 0.000128 | com.naver
246 | 18840548 | 248 | 0.000104 | com.feedburner
247 | 18839830 | 1028 | 0.000030 | edu.umn
248 | 18837518 | 421 | 0.000064 | com.ecwid
249 | 18833048 | 332 | 0.000078 | net.windows
250 | 18831042 | 914 | 0.000035 | com.trello
251 | 18829176 | 554 | 0.000049 | com.tandfonline
252 | 18829172 | 1369 | 0.000023 | cn.com.chinadaily
253 | 18828382 | 189 | 0.000138 | org.allaboutcookies
254 | 18825844 | 746 | 0.000042 | gov.senate
255 | 18823946 | 119 | 0.000286 | com.paypalobjects
256 | 18819980 | 1005 | 0.000031 | ly.ow
257 | 18818724 | 2014 | 0.000017 | org.tensorflow
258 | 18818710 | 901 | 0.000035 | edu.umich
259 | 18817936 | 291 | 0.000089 | com.tinyurl
260 | 18817212 | 479 | 0.000056 | org.pewresearch
261 | 18815000 | 76 | 0.000423 | com.list-manage
262 | 18811132 | 239 | 0.000111 | com.wpengine
263 | 18806908 | 834 | 0.000038 | ca.cbc
264 | 18805144 | 740 | 0.000043 | co.ibb
265 | 18804044 | 477 | 0.000057 | gov.fda
266 | 18802934 | 222 | 0.000117 | com.eepurl
267 | 18802462 | 318 | 0.000081 | it.google
268 | 18798744 | 79 | 0.000413 | net.facebook
269 | 18797046 | 2019 | 0.000017 | com.instructables
270 | 18795562 | 1200 | 0.000026 | edu.northwestern
271 | 18794710 | 752 | 0.000042 | org.change
272 | 18793610 | 394 | 0.000068 | es.google
273 | 18793484 | 893 | 0.000035 | org.cambridge
274 | 18790202 | 251 | 0.000103 | com.calendly
275 | 18784862 | 962 | 0.000033 | gov.congress
276 | 18784862 | 1022 | 0.000030 | uk.co.guardian
277 | 18782014 | 555 | 0.000049 | com.bigcartel
278 | 18777808 | 1348 | 0.000023 | org.semanticscholar
279 | 18776340 | 1006 | 0.000031 | com.gumroad
280 | 18775690 | 637 | 0.000044 | org.plos
281 | 18774956 | 1341 | 0.000023 | com.nikkei
282 | 18773712 | 313 | 0.000083 | com.optimizely
283 | 18772988 | 405 | 0.000066 | com.googlecode
284 | 18766674 | 896 | 0.000035 | gov.justice
285 | 18764788 | 1044 | 0.000029 | com.huffpost
286 | 18764312 | 153 | 0.000186 | com.addtoany
287 | 18763408 | 398 | 0.000067 | me.m
288 | 18761658 | 80 | 0.000403 | com.wsimg
289 | 18760046 | 411 | 0.000065 | com.tripod
290 | 18754884 | 957 | 0.000033 | ee.linktr
291 | 18754526 | 1021 | 0.000030 | gov.usgs
292 | 18753164 | 1459 | 0.000021 | uk.co.wired
293 | 18752728 | 338 | 0.000077 | fr.google
294 | 18751846 | 1059 | 0.000029 | com.500px
295 | 18751636 | 452 | 0.000061 | ca.google
296 | 18749418 | 1996 | 0.000017 | com.amd
297 | 18744444 | 1944 | 0.000018 | com.azure
298 | 18742964 | 777 | 0.000040 | au.com.google
299 | 18742506 | 481 | 0.000056 | com.163
300 | 18741292 | 1091 | 0.000028 | com.ssrn
301 | 18740758 | 1065 | 0.000029 | com.newsweek
302 | 18734910 | 1688 | 0.000019 | ca.utoronto
303 | 18734620 | 139 | 0.000218 | com.spotify
304 | 18731112 | 744 | 0.000042 | cn.com.people
305 | 18730384 | 334 | 0.000078 | page.g
306 | 18730074 | 2751 | 0.000012 | com.nabble
307 | 18728400 | 1454 | 0.000021 | com.howstuffworks
308 | 18722938 | 2107 | 0.000016 | com.lego
309 | 18719762 | 1675 | 0.000019 | com.storify
310 | 18719332 | 1140 | 0.000027 | uk.co.thetimes
311 | 18717930 | 801 | 0.000039 | site.business
312 | 18717726 | 884 | 0.000036 | uk.ac.ox
313 | 18716206 | 311 | 0.000083 | com.bitly
314 | 18715060 | 1218 | 0.000026 | com.scmp
315 | 18713618 | 798 | 0.000040 | com.adage
316 | 18713552 | 654 | 0.000044 | com.indiatimes
317 | 18712564 | 1908 | 0.000018 | de.mpg
318 | 18712368 | 1057 | 0.000029 | com.thehill
319 | 18705466 | 519 | 0.000052 | com.criteo
320 | 18704754 | 1078 | 0.000028 | org.ohchr
321 | 18704474 | 1531 | 0.000020 | com.aljazeera
322 | 18703348 | 802 | 0.000039 | uk.gov.service
323 | 18701482 | 1545 | 0.000020 | org.greenpeace
324 | 18699064 | 331 | 0.000078 | com.netdna-ssl
325 | 18698378 | 967 | 0.000033 | ch.google
326 | 18693994 | 784 | 0.000040 | us.icio
327 | 18693690 | 1153 | 0.000027 | int.coe
328 | 18692556 | 933 | 0.000034 | org.d3js
329 | 18690456 | 1499 | 0.000021 | com.history
330 | 18689794 | 1018 | 0.000030 | com.netlify
331 | 18688064 | 1320 | 0.000023 | com.nymag
332 | 18687064 | 1363 | 0.000023 | org.wiktionary
333 | 18684868 | 287 | 0.000091 | ru.ok
334 | 18683792 | 1293 | 0.000024 | com.intuit
335 | 18682796 | 1419 | 0.000022 | uk.co.standard
336 | 18681388 | 1995 | 0.000017 | edu.arizona
337 | 18679058 | 944 | 0.000034 | gov.archives
338 | 18678794 | 953 | 0.000034 | ru.google
339 | 18677084 | 1054 | 0.000029 | sg.com.google
340 | 18675890 | 900 | 0.000035 | br.com.google
341 | 18674402 | 85 | 0.000385 | co.g
342 | 18674068 | 1975 | 0.000018 | com.wattpad
343 | 18673754 | 526 | 0.000051 | ru.gov
344 | 18673370 | 1351 | 0.000023 | com.ikea
345 | 18668598 | 1461 | 0.000021 | com.reverbnation
346 | 18668444 | 2681 | 0.000013 | edu.drexel
347 | 18668276 | 1121 | 0.000027 | edu.si
348 | 18666994 | 1174 | 0.000027 | uk.co.mirror
349 | 18666846 | 2572 | 0.000013 | org.maven
350 | 18666724 | 412 | 0.000065 | com.cnet
351 | 18664542 | 580 | 0.000048 | org.openstreetmap
352 | 18663710 | 1373 | 0.000023 | com.jetbrains
353 | 18663688 | 1032 | 0.000030 | com.theconversation
354 | 18663542 | 1921 | 0.000018 | com.newscientist
355 | 18661472 | 847 | 0.000037 | gov.state
356 | 18661146 | 1572 | 0.000020 | ms.1drv
357 | 18660152 | 2644 | 0.000013 | com.mystrikingly
358 | 18655360 | 973 | 0.000032 | org.fao
359 | 18654458 | 590 | 0.000047 | cn.google
360 | 18653472 | 235 | 0.000112 | com.etsy
361 | 18652232 | 1485 | 0.000021 | com.flipboard
362 | 18651820 | 767 | 0.000041 | com.deviantart
363 | 18651514 | 1375 | 0.000023 | com.thedailybeast
364 | 18651404 | 1220 | 0.000026 | org.jstor
365 | 18649024 | 1270 | 0.000024 | com.strikingly
366 | 18647422 | 2045 | 0.000017 | blog.home
367 | 18646812 | 634 | 0.000044 | com.zdnet
368 | 18644828 | 325 | 0.000079 | tv.twitch
369 | 18642272 | 2781 | 0.000012 | com.diigo
370 | 18640482 | 1123 | 0.000027 | com.britannica
371 | 18639254 | 1904 | 0.000018 | ca.ubc
372 | 18638840 | 367 | 0.000072 | com.jotform
373 | 18635188 | 1959 | 0.000018 | com.gettyimages
374 | 18634254 | 1685 | 0.000019 | com.channel4
375 | 18631278 | 1494 | 0.000021 | org.pypi
376 | 18630386 | 813 | 0.000039 | in.co.google
377 | 18627814 | 417 | 0.000064 | com.ssl-images-amazon
378 | 18626978 | 161 | 0.000166 | gle.forms
379 | 18623310 | 1982 | 0.000018 | org.hrw
380 | 18623132 | 281 | 0.000092 | com.cloudinary
381 | 18618612 | 1382 | 0.000022 | au.com.smh
382 | 18617234 | 1566 | 0.000020 | uk.co.metro
383 | 18617180 | 2031 | 0.000017 | hk.com.google
384 | 18617072 | 1599 | 0.000020 | edu.ufl
385 | 18613590 | 2332 | 0.000015 | ly.rebrand
386 | 18612786 | 457 | 0.000061 | net.imgix
387 | 18609746 | 418 | 0.000064 | com.webflow
388 | 18609050 | 2311 | 0.000015 | com.shutterfly
389 | 18607782 | 568 | 0.000048 | com.feedly
390 | 18603850 | 538 | 0.000050 | gov.epa
391 | 18602470 | 104 | 0.000348 | com.stripe
392 | 18601118 | 83 | 0.000391 | net.jsfiddle
393 | 18599796 | 3423 | 0.000010 | org.aclweb
394 | 18597166 | 2348 | 0.000014 | com.yarnpkg
395 | 18596278 | 69 | 0.000461 | net.akamaihd
396 | 18596202 | 1907 | 0.000018 | gov.supremecourt
397 | 18595244 | 2344 | 0.000014 | com.thefreedictionary
398 | 18593816 | 468 | 0.000058 | nl.google
399 | 18592072 | 1578 | 0.000020 | com.dw
400 | 18588294 | 2955 | 0.000012 | com.upi
401 | 18587932 | 981 | 0.000032 | com.thelancet
402 | 18587926 | 425 | 0.000064 | com.slack
403 | 18587680 | 396 | 0.000067 | com.kickstarter
404 | 18587378 | 787 | 0.000040 | com.urldefense
405 | 18585950 | 1713 | 0.000019 | ca.sfu
406 | 18583582 | 460 | 0.000060 | com.livechatinc
407 | 18581082 | 623 | 0.000045 | com.quora
408 | 18580964 | 428 | 0.000063 | com.rackcdn
409 | 18580620 | 1967 | 0.000018 | com.euronews
410 | 18580552 | 451 | 0.000061 | com.go
411 | 18580130 | 1368 | 0.000023 | com.tunein
412 | 18578076 | 594 | 0.000046 | ru.liveinternet
413 | 18576712 | 475 | 0.000057 | com.googleblog
414 | 18571776 | 2597 | 0.000013 | pt.sapo
415 | 18571212 | 2109 | 0.000016 | com.itv
416 | 18570630 | 1945 | 0.000018 | uk.co.huffingtonpost
417 | 18570542 | 1286 | 0.000024 | edu.brookings
418 | 18570528 | 4423 | 0.000008 | tl.page
419 | 18570058 | 2369 | 0.000014 | com.angelfire
420 | 18568882 | 2614 | 0.000013 | org.wikibooks
421 | 18567302 | 1692 | 0.000019 | com.ifttt
422 | 18564134 | 861 | 0.000036 | com.freepik
423 | 18563246 | 2244 | 0.000015 | com.netvibes
424 | 18562602 | 133 | 0.000251 | com.mailchimp
425 | 18562564 | 364 | 0.000072 | me.telegram
426 | 18562400 | 561 | 0.000048 | com.microsoftonline
427 | 18562224 | 1976 | 0.000018 | uk.co.express
428 | 18559206 | 2888 | 0.000012 | sg.edu.nus
429 | 18559092 | 1928 | 0.000018 | io.webflow
430 | 18557292 | 772 | 0.000041 | pl.google
431 | 18555900 | 480 | 0.000056 | com.meetup
432 | 18555482 | 4752 | 0.000007 | com.newgrounds
433 | 18554944 | 2397 | 0.000014 | google.ai
434 | 18554512 | 2439 | 0.000014 | com.yolasite
435 | 18553912 | 2124 | 0.000016 | jp.geocities
436 | 18552986 | 3394 | 0.000011 | com.instapaper
437 | 18551338 | 362 | 0.000072 | com.proofpoint
438 | 18548844 | 1358 | 0.000023 | com.people
439 | 18546296 | 64 | 0.000531 | net.typekit
440 | 18543694 | 2104 | 0.000016 | org.c-span
441 | 18541918 | 159 | 0.000169 | ru.mail
442 | 18541834 | 2043 | 0.000017 | com.avg
443 | 18540650 | 2249 | 0.000015 | app.netlify
444 | 18539394 | 3004 | 0.000011 | com.000webhostapp
445 | 18539316 | 485 | 0.000055 | com.elsevier
446 | 18538008 | 3494 | 0.000010 | cn.edu.pku
447 | 18536872 | 1609 | 0.000020 | com.asahi
448 | 18535422 | 876 | 0.000036 | org.worldwildlife
449 | 18535204 | 1127 | 0.000027 | uk.parliament
450 | 18534822 | 1956 | 0.000018 | uk.gov.ons
451 | 18533694 | 188 | 0.000138 | com.iubenda
452 | 18532790 | 2113 | 0.000016 | org.documentcloud
453 | 18532338 | 3074 | 0.000011 | uk.co.timesonline
454 | 18531118 | 264 | 0.000096 | com.office
455 | 18527764 | 237 | 0.000112 | com.eventbrite
456 | 18527012 | 2699 | 0.000013 | com.self
457 | 18526172 | 2511 | 0.000013 | com.foreignpolicy
458 | 18524804 | 2421 | 0.000014 | org.sundance
459 | 18524702 | 214 | 0.000120 | com.aliyuncs
460 | 18524140 | 1213 | 0.000026 | be.google
461 | 18523242 | 2200 | 0.000016 | ie.google
462 | 18523000 | 1432 | 0.000022 | gov.weather
463 | 18522694 | 3136 | 0.000011 | com.openai
464 | 18522588 | 879 | 0.000036 | org.mediawiki
465 | 18521124 | 2806 | 0.000012 | com.pearltrees
466 | 18520306 | 1704 | 0.000019 | com.firebaseapp
467 | 18516520 | 3620 | 0.000010 | com.dailycaller
468 | 18514512 | 498 | 0.000054 | it.placehold
469 | 18514168 | 2695 | 0.000013 | com.france24
470 | 18513026 | 644 | 0.000044 | edu.berkeley
471 | 18512138 | 492 | 0.000055 | cn.360
472 | 18511428 | 2296 | 0.000015 | com.msnbc
473 | 18510986 | 2089 | 0.000017 | com.thestar
474 | 18510258 | 3732 | 0.000009 | me.site123
475 | 18509392 | 2133 | 0.000016 | com.gfycat
476 | 18508906 | 341 | 0.000076 | com.rawgit
477 | 18507920 | 521 | 0.000052 | com.gmail
478 | 18507686 | 1952 | 0.000018 | org.ocks
479 | 18506872 | 2739 | 0.000012 | org.rsc
480 | 18504310 | 2486 | 0.000014 | edu.hawaii
481 | 18503766 | 2366 | 0.000014 | de.br
482 | 18503250 | 2447 | 0.000014 | edu.colostate
483 | 18502578 | 171 | 0.000154 | com.zendesk
484 | 18501424 | 2222 | 0.000015 | org.nobelprize
485 | 18501096 | 3293 | 0.000011 | net.pixnet
486 | 18500188 | 1528 | 0.000020 | net.seesaa
487 | 18500164 | 2471 | 0.000014 | com.motherjones
488 | 18499720 | 756 | 0.000042 | com.vice
489 | 18499378 | 4234 | 0.000008 | com.masslive
490 | 18496634 | 2355 | 0.000014 | com.cision
491 | 18495058 | 101 | 0.000361 | com.godaddy
492 | 18492104 | 886 | 0.000036 | gov.nist
493 | 18491956 | 1249 | 0.000025 | org.ilo
494 | 18490654 | 2070 | 0.000017 | com.surveygizmo
495 | 18490628 | 3378 | 0.000011 | com.minds
496 | 18490576 | 635 | 0.000044 | com.matterport
497 | 18489858 | 2656 | 0.000013 | ph.com.google
498 | 18488106 | 369 | 0.000071 | org.python
499 | 18487032 | 980 | 0.000032 | gov.va
500 | 18485800 | 1166 | 0.000027 | at.google
501 | 18485152 | 1318 | 0.000023 | se.google
502 | 18483644 | 1961 | 0.000018 | ru.ucoz
503 | 18482996 | 2401 | 0.000014 | com.freep
504 | 18482190 | 3874 | 0.000009 | com.wizards
505 | 18481738 | 3583 | 0.000010 | edu.uvm
506 | 18478142 | 3711 | 0.000010 | org.tvtropes
507 | 18476988 | 1506 | 0.000021 | com.cognitoforms
508 | 18476516 | 1493 | 0.000021 | gov.uscourts
509 | 18476024 | 3530 | 0.000010 | org.oxfam
510 | 18473992 | 2235 | 0.000015 | cn.t
511 | 18473054 | 4331 | 0.000008 | fm.ask
512 | 18473034 | 1708 | 0.000019 | dk.google
513 | 18470526 | 3122 | 0.000011 | de.dw
514 | 18467204 | 2009 | 0.000017 | ua.com.google
515 | 18467126 | 3935 | 0.000009 | com.youdao
516 | 18464016 | 128 | 0.000262 | org.networkadvertising
517 | 18462968 | 1031 | 0.000030 | com.arstechnica
518 | 18462674 | 2310 | 0.000015 | int.unfccc
519 | 18461844 | 3323 | 0.000011 | ch.nzz
520 | 18460156 | 123 | 0.000276 | com.statcounter
521 | 18460126 | 3757 | 0.000009 | net.hinet
522 | 18460018 | 2484 | 0.000014 | com.washingtontimes
523 | 18459778 | 3391 | 0.000011 | edu.miami
524 | 18459648 | 5025 | 0.000007 | tw.com.gamer
525 | 18459120 | 4313 | 0.000008 | ch.qos
526 | 18458774 | 788 | 0.000040 | com.intel
527 | 18456584 | 2220 | 0.000015 | mx.com.google
528 | 18455734 | 2241 | 0.000015 | gov.ky
529 | 18455504 | 3426 | 0.000010 | com.nwsource
530 | 18454948 | 856 | 0.000037 | io.readthedocs
531 | 18453730 | 2187 | 0.000016 | gov.cisa
532 | 18451988 | 2256 | 0.000015 | com.straitstimes
533 | 18449466 | 371 | 0.000071 | io.codepen
534 | 18447006 | 361 | 0.000072 | com.prnewswire
535 | 18446224 | 4097 | 0.000009 | com.smore
536 | 18446132 | 2188 | 0.000016 | pt.google
537 | 18445920 | 2719 | 0.000012 | net.bplaced
538 | 18445802 | 5349 | 0.000007 | net.wargaming
539 | 18445232 | 3272 | 0.000011 | org.csis
540 | 18444732 | 1435 | 0.000022 | org.aarp
541 | 18444080 | 289 | 0.000090 | net.php
542 | 18443758 | 2282 | 0.000015 | no.google
543 | 18443228 | 3924 | 0.000009 | com.steemit
544 | 18443146 | 1304 | 0.000024 | tw.com.google
545 | 18442018 | 314 | 0.000083 | com.squarespace
546 | 18440872 | 743 | 0.000043 | com.oreilly
547 | 18440596 | 199 | 0.000130 | com.hubspot
548 | 18439354 | 4877 | 0.000007 | com.bonanza
549 | 18438802 | 2020 | 0.000017 | co.lpages
550 | 18438606 | 1079 | 0.000028 | net.ovh
551 | 18438208 | 835 | 0.000037 | com.imageshack
552 | 18437874 | 4023 | 0.000009 | com.doodlekit
553 | 18436818 | 2425 | 0.000014 | com.voanews
554 | 18436680 | 358 | 0.000073 | ru.rambler
555 | 18436048 | 2805 | 0.000012 | com.nationalpost
556 | 18435420 | 4534 | 0.000008 | by.google
557 | 18435256 | 614 | 0.000045 | org.nodejs
558 | 18435200 | 397 | 0.000067 | com.onesignal
559 | 18434470 | 3374 | 0.000011 | fr.rfi
560 | 18434466 | 463 | 0.000060 | gov.irs
561 | 18434444 | 2584 | 0.000013 | com.snopes
562 | 18434230 | 1899 | 0.000018 | link.page
563 | 18434190 | 3637 | 0.000010 | org.vim
564 | 18434018 | 2240 | 0.000015 | th.co.google
565 | 18433782 | 3395 | 0.000010 | org.scala-lang
566 | 18432434 | 3142 | 0.000011 | com.inquirer
567 | 18430898 | 2887 | 0.000012 | org.ballotpedia
568 | 18430888 | 3324 | 0.000011 | com.real
569 | 18428600 | 649 | 0.000044 | br.com.uol
570 | 18428004 | 513 | 0.000052 | com.pixabay
571 | 18426658 | 2142 | 0.000016 | uk.co.which
572 | 18426634 | 4070 | 0.000009 | com.viki
573 | 18425674 | 1038 | 0.000030 | com.thenextweb
574 | 18424302 | 3146 | 0.000011 | org.aps
575 | 18424050 | 2764 | 0.000012 | com.post-gazette
576 | 18423516 | 2499 | 0.000014 | net.openid
577 | 18422702 | 2627 | 0.000013 | edu.usf
578 | 18421138 | 82 | 0.000391 | com.livestream
579 | 18420414 | 961 | 0.000033 | jp.shinobi
580 | 18420272 | 956 | 0.000033 | int.wipo
581 | 18417146 | 4450 | 0.000008 | com.bravesites
582 | 18415542 | 2881 | 0.000012 | ru.aif
583 | 18414574 | 2906 | 0.000012 | io.gitlab
584 | 18414284 | 3387 | 0.000011 | org.pri
585 | 18414276 | 1932 | 0.000018 | gov.ct
586 | 18413984 | 2602 | 0.000013 | il.co.google
587 | 18413906 | 1910 | 0.000018 | org.oxfordjournals
588 | 18413218 | 4664 | 0.000008 | com.ucoz
589 | 18412422 | 566 | 0.000048 | com.photobucket
590 | 18412344 | 2191 | 0.000016 | com.xrea
591 | 18412198 | 2234 | 0.000015 | nz.co.google
592 | 18410920 | 2088 | 0.000017 | net.cnki
593 | 18410828 | 2847 | 0.000012 | com.webbyawards
594 | 18410164 | 433 | 0.000063 | com.staticflickr
595 | 18409934 | 3675 | 0.000010 | org.heritage
596 | 18408908 | 1993 | 0.000018 | tr.com.google
597 | 18408574 | 2053 | 0.000017 | com.treehugger
598 | 18406062 | 1695 | 0.000019 | net.leadpages
599 | 18405282 | 2112 | 0.000016 | fi.google
600 | 18402764 | 5153 | 0.000007 | kz.google
601 | 18402708 | 211 | 0.000121 | to.amzn
602 | 18402670 | 569 | 0.000048 | com.deloitte
603 | 18402662 | 1100 | 0.000028 | cz.google
604 | 18402526 | 4562 | 0.000008 | com.freehostia
605 | 18402334 | 2156 | 0.000016 | gov.faa
606 | 18402326 | 2724 | 0.000012 | com.detroitnews
607 | 18402220 | 2774 | 0.000012 | com.slidesharecdn
608 | 18402102 | 346 | 0.000075 | com.adnxs
609 | 18396726 | 812 | 0.000039 | com.thinkwithgoogle
610 | 18392816 | 1471 | 0.000021 | com.trustwave
611 | 18392376 | 2640 | 0.000013 | org.iea
612 | 18392262 | 2883 | 0.000012 | jp.blog
613 | 18391148 | 4426 | 0.000008 | com.goal
614 | 18390184 | 3284 | 0.000011 | com.financialpost
615 | 18389140 | 3636 | 0.000010 | net.alarabiya
616 | 18389082 | 3570 | 0.000010 | org.neocities
617 | 18388580 | 3784 | 0.000009 | co.ello
618 | 18388256 | 207 | 0.000126 | com.salesforce
619 | 18386478 | 3500 | 0.000010 | com.archdaily
620 | 18385984 | 4517 | 0.000008 | com.alamy
621 | 18385924 | 2297 | 0.000015 | gr.google
622 | 18385398 | 160 | 0.000168 | gov.privacyshield
623 | 18385020 | 2569 | 0.000013 | org.kqed
624 | 18383196 | 277 | 0.000093 | org.drupal
625 | 18382110 | 354 | 0.000074 | com.snapchat
626 | 18381496 | 2338 | 0.000015 | ro.google
627 | 18381392 | 3367 | 0.000011 | uk.ac.leeds
628 | 18381316 | 271 | 0.000094 | com.mapbox
629 | 18380144 | 3907 | 0.000009 | uk.gov.scotland
630 | 18379620 | 1946 | 0.000018 | hu.google
631 | 18378224 | 4399 | 0.000008 | co.aeon
632 | 18377446 | 374 | 0.000070 | com.cdninstagram
633 | 18376062 | 3545 | 0.000010 | gov.fec
634 | 18376022 | 3312 | 0.000011 | com.virgin
635 | 18375628 | 2219 | 0.000015 | ar.com.google
636 | 18375060 | 4128 | 0.000009 | cn.globaltimes
637 | 18374688 | 4333 | 0.000008 | com.corel
638 | 18374066 | 464 | 0.000059 | com.herokuapp
639 | 18373200 | 4062 | 0.000009 | jp.go.ndl
640 | 18373110 | 791 | 0.000040 | google.blog
641 | 18372316 | 2208 | 0.000016 | com.justia
642 | 18372216 | 2320 | 0.000015 | za.co.google
643 | 18370616 | 2216 | 0.000016 | ru.ria
644 | 18370232 | 3694 | 0.000010 | com.intensedebate
645 | 18369794 | 3793 | 0.000009 | com.visualcapitalist
646 | 18369094 | 2722 | 0.000012 | si.google
647 | 18368512 | 4182 | 0.000008 | com.rediff
648 | 18367604 | 3834 | 0.000009 | ca.uvic
649 | 18367236 | 2577 | 0.000013 | ru.rosminzdrav
650 | 18365918 | 439 | 0.000062 | com.nypost
651 | 18365880 | 4678 | 0.000008 | org.wikimapia
652 | 18365350 | 3439 | 0.000010 | com.nationalreview
653 | 18364962 | 2134 | 0.000016 | uk.org.asa
654 | 18364282 | 3850 | 0.000009 | tw.edu.ntu
655 | 18363974 | 598 | 0.000046 | com.samsung
656 | 18363190 | 2703 | 0.000012 | is.google
657 | 18362598 | 3869 | 0.000009 | com.podomatic
658 | 18361242 | 316 | 0.000082 | cn.bshare
659 | 18360424 | 3484 | 0.000010 | org.wri
660 | 18360028 | 4160 | 0.000009 | uk.co.spectator
661 | 18359858 | 1711 | 0.000019 | ly.cutt
662 | 18358316 | 4989 | 0.000007 | to.gplus
663 | 18358086 | 4908 | 0.000007 | com.atwebpages
664 | 18357826 | 177 | 0.000150 | com.tripadvisor
665 | 18357438 | 5003 | 0.000007 | org.scala-sbt
666 | 18356488 | 4276 | 0.000008 | ru.msu
667 | 18356450 | 1161 | 0.000027 | com.udemy
668 | 18355358 | 2973 | 0.000011 | com.timesofisrael
669 | 18352506 | 5213 | 0.000007 | edu.csulb
670 | 18351622 | 4744 | 0.000007 | com.authorstream
671 | 18350944 | 4127 | 0.000009 | gy.rb
672 | 18350110 | 3204 | 0.000011 | us.ny.state
673 | 18349876 | 3644 | 0.000010 | com.linuxquota
674 | 18349798 | 3563 | 0.000010 | com.udn
675 | 18349578 | 3845 | 0.000009 | org.jenkins-ci
676 | 18349508 | 1686 | 0.000019 | com.pcworld
677 | 18349104 | 2481 | 0.000014 | uk.ac.imperial
678 | 18348784 | 5238 | 0.000007 | com.etymonline
679 | 18348026 | 3492 | 0.000010 | eg.com.google
680 | 18347774 | 3363 | 0.000011 | uk.co.bbci
681 | 18347338 | 2386 | 0.000014 | com.name
682 | 18346938 | 3745 | 0.000009 | com.novell
683 | 18345924 | 1487 | 0.000021 | com.digitaloceanspaces
684 | 18345376 | 6040 | 0.000006 | net.vingle
685 | 18345350 | 2615 | 0.000013 | us.pa.state
686 | 18345040 | 642 | 0.000044 | com.xiti
687 | 18345006 | 2302 | 0.000015 | fr.pagesjaunes
688 | 18344246 | 4604 | 0.000008 | by.tut
689 | 18341982 | 78 | 0.000417 | com.messenger
690 | 18341502 | 1672 | 0.000019 | id.co.google
691 | 18341492 | 4012 | 0.000009 | com.donaldjtrump
692 | 18339724 | 2359 | 0.000014 | co.pcdn
693 | 18338674 | 606 | 0.000046 | com.indeed
694 | 18338446 | 459 | 0.000060 | com.sxsw
695 | 18337870 | 2379 | 0.000014 | sk.google
696 | 18337126 | 246 | 0.000105 | uk.co.amazon
697 | 18336826 | 351 | 0.000074 | com.atlassian
698 | 18336810 | 1225 | 0.000025 | com.dell
699 | 18336442 | 4947 | 0.000007 | fr.online
700 | 18336226 | 1933 | 0.000018 | com.law
701 | 18335648 | 3783 | 0.000009 | com.wmtransfer
702 | 18335422 | 2242 | 0.000015 | kr.co.google
703 | 18335402 | 4709 | 0.000008 | edu.odu
704 | 18335130 | 2971 | 0.000011 | cl.google
705 | 18335024 | 4300 | 0.000008 | il.ac.huji
706 | 18334782 | 4271 | 0.000008 | tw.gov.cdc
707 | 18333794 | 2886 | 0.000012 | my.com.google
708 | 18333014 | 3385 | 0.000011 | com.scotsman
709 | 18332864 | 3322 | 0.000011 | com.instructure
710 | 18332832 | 4563 | 0.000008 | com.hackaday
711 | 18332194 | 2131 | 0.000016 | gov.pa
712 | 18332054 | 627 | 0.000045 | com.withgoogle
713 | 18331108 | 1997 | 0.000017 | scot.gov
714 | 18330912 | 3178 | 0.000011 | com.broadwayworld
715 | 18330804 | 858 | 0.000036 | com.canva
716 | 18330694 | 4525 | 0.000008 | com.mongabay
717 | 18329802 | 4508 | 0.000008 | com.macobserver
718 | 18329686 | 3725 | 0.000010 | org.sonatype
719 | 18328118 | 2391 | 0.000014 | gov.wi
720 | 18327736 | 2683 | 0.000013 | org.usgbc
721 | 18327662 | 4113 | 0.000009 | gov.peacecorps
722 | 18327624 | 4652 | 0.000008 | cn.tianya
723 | 18326710 | 3495 | 0.000010 | pk.com.google
724 | 18326302 | 870 | 0.000036 | com.marketwatch
725 | 18326164 | 1490 | 0.000021 | com.billboard
726 | 18324976 | 107 | 0.000316 | net.gandi
727 | 18324878 | 2845 | 0.000012 | com.thecut
728 | 18324686 | 89 | 0.000372 | me.ogp
729 | 18323980 | 4585 | 0.000008 | io.meduza
730 | 18323898 | 2827 | 0.000012 | uk.org.nationaltrust
731 | 18323758 | 3911 | 0.000009 | au.edu.adelaide
732 | 18323398 | 4766 | 0.000007 | de.uni-erlangen
733 | 18322482 | 3759 | 0.000009 | uk.org.rspb
734 | 18322376 | 3773 | 0.000009 | cv.google
735 | 18321256 | 5135 | 0.000007 | cat.bcn
736 | 18319736 | 3728 | 0.000009 | com.ipage
737 | 18319726 | 5311 | 0.000007 | com.brother
738 | 18318148 | 2410 | 0.000014 | my.com.thestar
739 | 18317872 | 3401 | 0.000010 | uk.ac.york
740 | 18317504 | 3315 | 0.000011 | com.politifact
741 | 18317408 | 3128 | 0.000011 | ee.google
742 | 18317178 | 3326 | 0.000011 | org.thinkprogress
743 | 18317034 | 2102 | 0.000016 | se.haxx
744 | 18316764 | 4554 | 0.000008 | au.edu.rmit
745 | 18316272 | 2959 | 0.000011 | hr.google
746 | 18315296 | 5212 | 0.000007 | com.selfridges
747 | 18315244 | 3772 | 0.000009 | au.com.telstra
748 | 18313746 | 1436 | 0.000022 | com.fiverr
749 | 18313044 | 3420 | 0.000010 | de.hu-berlin
750 | 18311516 | 3572 | 0.000010 | com.nola
751 | 18311094 | 3458 | 0.000010 | sa.com.google
752 | 18310436 | 4145 | 0.000009 | ca.dal
753 | 18310126 | 6237 | 0.000006 | org.arkive
754 | 18309422 | 2759 | 0.000012 | bg.google
755 | 18308696 | 3429 | 0.000010 | com.monday
756 | 18308664 | 4635 | 0.000008 | at.tugraz
757 | 18308432 | 3508 | 0.000010 | com.eiseverywhere
758 | 18308298 | 3764 | 0.000009 | uk.co.cfdr
759 | 18308102 | 3298 | 0.000011 | org.iucn
760 | 18307444 | 3571 | 0.000010 | app.web
761 | 18306932 | 3702 | 0.000010 | org.iucnredlist
762 | 18306908 | 292 | 0.000088 | com.surveymonkey
763 | 18306390 | 3806 | 0.000009 | gi.com.google
764 | 18306038 | 5056 | 0.000007 | ec.com.google
765 | 18305962 | 3875 | 0.000009 | de.uni-freiburg
766 | 18305528 | 4244 | 0.000008 | au.com.heraldsun
767 | 18305222 | 515 | 0.000052 | io.shields
768 | 18304914 | 610 | 0.000046 | org.eff
769 | 18304878 | 3829 | 0.000009 | com.psmag
770 | 18304506 | 4721 | 0.000007 | ua.at
771 | 18302798 | 930 | 0.000034 | gov.uspto
772 | 18302648 | 190 | 0.000137 | com.automattic
773 | 18301286 | 3948 | 0.000009 | com.mozello
774 | 18300612 | 1108 | 0.000028 | com.gizmodo
775 | 18300418 | 3596 | 0.000010 | pl.wp
776 | 18300322 | 3471 | 0.000010 | org.royalsociety
777 | 18299622 | 2819 | 0.000012 | org.unep
778 | 18299452 | 3606 | 0.000010 | com.realclearpolitics
779 | 18298298 | 3531 | 0.000010 | jp.coocan
780 | 18298296 | 2613 | 0.000013 | vn.com.google
781 | 18298218 | 4434 | 0.000008 | jp.hatenablog
782 | 18297896 | 4281 | 0.000008 | com.waitrose
783 | 18297876 | 4676 | 0.000008 | info.webry
784 | 18297852 | 4427 | 0.000008 | net.inquirer
785 | 18297704 | 4274 | 0.000008 | jp.gree
786 | 18297178 | 4611 | 0.000008 | org.nationalinterest
787 | 18296330 | 2981 | 0.000011 | edu.uconn
788 | 18295610 | 946 | 0.000034 | edu.columbia
789 | 18295554 | 5531 | 0.000006 | org.mises
790 | 18295452 | 1274 | 0.000024 | com.smashingmagazine
791 | 18295224 | 3303 | 0.000011 | uk.gov.companieshouse
792 | 18294866 | 4442 | 0.000008 | gov.ourdocuments
793 | 18294666 | 3894 | 0.000009 | sl.com.google
794 | 18292912 | 6218 | 0.000006 | com.rhino3d
795 | 18292842 | 3435 | 0.000010 | org.cfr
796 | 18292780 | 790 | 0.000040 | com.airbnb
797 | 18292712 | 283 | 0.000092 | jp.co.amazon
798 | 18291570 | 413 | 0.000065 | com.pubmatic
799 | 18290920 | 878 | 0.000036 | com.box
800 | 18290426 | 5610 | 0.000006 | com.coroflot
8011829034643480.000008com.thediplomat
8021828690240660.000009com.inhabitat
8031828666832770.000011com.bp
8041828652245920.000008cat.uab
8051828348038270.000009uk.co.villiers-london
8061828301441400.000009org.grist
8071828245240160.000009com.foreignaffairs
8081828132410810.000028com.tapad
8091828037813470.000023org.altervista
810182803583820.000069com.skype
8111828032443490.000008com.worldsecuresystems
8121827968024090.000014com.volusion
8131827951629070.000012ru.nethouse
8141827948035270.000010pe.com.google
8151827943847790.000007be.lesoir
8161827887432880.000011co.com.google
8171827881638850.000009de.uni-koeln
8181827877829100.000012org.gnupg
8191827802246560.000008com.mihanblog
8201827755433600.000011org.panda
8211827718634400.000010lv.google
8221827667453000.000007lu.google
823182764424840.000055com.inc
8241827567651030.000007cn.com.caijing
8251827513433310.000011uk.gov.metoffice
82618274258680.000471com.oculus
8271827373223640.000014org.donorbox
8281827331230380.000011rs.google
8291827325611970.000026com.merriam-webster
8301827144850510.000007ee.ut
8311827106025190.000013com.amebaownd
8321827092244820.000008com.marksandspencer
8331827078064470.000006su.clan
8341826994840960.000009ru.interfax
8351826962038520.000009org.rferl
8361826875629040.000012gov.nd
837182679945480.000049com.fortune
8381826777646930.000008it.unitn
8391826771456650.000006am.google
8401826676235020.000010org.iaea
8411826374838930.000009pr.com.google
8421826215850450.000007com.tok2
8431826193819010.000018ch.ethz
8441826192233420.000011gov.la
8451826118245070.000008org.democracynow
8461826117625930.000013net.noscript
847182602168360.000037com.mix
848182598624080.000066net.adform
8491825960852080.000007tn.google
8501825797842120.000008jp.hateblo
8511825788860290.000006hk.edu.hkbu
8521825768038840.000009nl.wur
8531825759450090.000007gr.auth
854182574069970.000031com.webs
8551825676045120.000008com.mnn
8561825670257590.000006ru.nnov
8571825623839540.000009com.afp
8581825574413650.000023com.format
8591825566252090.000007nf.co
860182539543290.000079com.getbootstrap
8611825298849610.000007jp.hatenadiary
8621825215447280.000007hk.com.hkex
8631825125811930.000026com.redhat
8641825097456000.000006com.gust
8651825008810670.000029com.symantec
8661824946625620.000013net.ucoz
867182493202680.000095com.typeform
8681824869463270.000006com.x10host
8691824833235470.000010uk.co.saveourschools
8701824789829340.000012com.squarespace-cdn
8711824729229770.000011lt.google
872182468725250.000051com.adweek
8731824684442950.000008com.scienceblogs
8741824647248480.000007de.uni-konstanz
8751824556263620.000006com.ueuo
8761824504838560.000009uk.gov.data
877 18244756 4005 0.000009 tr.com.hurriyet