Researchers index dark web, find most of it contains illegal material

The Internet is less of a wild west than it once was, but there are still corners of it that are obscured from view and quite shady. For instance, Tor hidden services have included numerous criminal enterprises like The Silk Road. Two researchers from King’s College London set out to discover just how much of Tor was devoted to illegal content. The result? Most of it.

Tor (which originally stood for The Onion Router) is a network composed of layers of encrypted relays through which data is passed. Each node in the network only knows where a packet just was and where it’s going next. After a few hops, the source of a packet is (almost) impossible to discern. Most people use Tor to reach sites on the open Internet anonymously, but there are also sites that are hosted entirely within Tor, called hidden services. The Silk Road was a hidden service, but there are innumerable others. It is these sites Daniel Moore and Thomas Rid sought to quantify.
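To make the layering idea concrete, here is a toy sketch in Python. It is not Tor's actual protocol: the relay names, the keys, and the XOR "cipher" are all assumptions made up for illustration, and the cipher is deliberately simplistic and not secure. It only shows that each relay can strip one layer and learn just the next hop.

```python
import hashlib
import json


def keystream(key, length):
    """Derive a toy keystream by hashing key + counter (NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key, data):
    """Apply or remove one toy layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(payload, destination, route):
    """Wrap the payload in one layer per relay, innermost for the last relay."""
    next_hop = destination
    blob = payload.encode()
    for name, key in reversed(route):
        layer = json.dumps({"next": next_hop, "data": blob.hex()}).encode()
        blob = xor_layer(key, layer)
        next_hop = name
    return blob


def peel(key, blob):
    """What one relay does: remove its own layer and learn only the next hop."""
    layer = json.loads(xor_layer(key, blob))
    return layer["next"], bytes.fromhex(layer["data"])


# Hypothetical three-relay route; names and keys are invented for this sketch.
route = [("relay-a", b"key-a"), ("relay-b", b"key-b"), ("relay-c", b"key-c")]
onion = wrap("hello", "example.onion", route)

hops = []
blob = onion
for name, key in route:
    next_hop, blob = peel(key, blob)
    hops.append((name, next_hop))

final_payload = blob.decode()
```

In this sketch, `hops` records what each relay learns: only the relay it forwards to. Only the last relay sees the destination, and none of them sees the full path.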

It’s no easy task to find all the hidden services on Tor, let alone get a look at the data they’re hosting. Hidden services are ephemeral, often switching addresses and server locations without notice. To top it off, Tor addresses are just long strings of characters with a .onion domain at the end. In order to get a proper sample of all the hidden services lurking out there, the pair built a Python script that crawled the dark web, starting with the popular Tor search engines Onion City and Ahmia.

The bot’s job was to scrape the content from each page and upload it for analysis. When the bot found a link to another hidden service (the main way you find things on the dark web), it would hop to that one and scrape it too. The pair used an algorithm to process all the content collected and sort it into categories like drugs, social, pornography, and financial. The sorting was spot checked and found to be very accurate overall.
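The crawl-and-sort loop described above can be sketched in a few lines of Python. This is not the researchers' script: the pages, the addresses, and the keyword lists below are hypothetical stand-ins, and simple keyword matching is swapped in for whatever classification algorithm the study actually used. A real crawler would also fetch pages through a Tor proxy rather than from a dictionary.

```python
import re
from collections import deque

# Hypothetical in-memory "network" standing in for real hidden services.
FAKE_PAGES = {
    "seedsearch.onion": "index of links: marketaaa.onion forumbbb.onion",
    "marketaaa.onion": "buy drugs here, see also forumbbb.onion and walletccc.onion",
    "forumbbb.onion": "a social forum for chat and discussion",
    "walletccc.onion": "bitcoin wallet and financial escrow services",
}

# Hypothetical keyword lists; the study's real classifier is not described.
KEYWORDS = {
    "drugs": {"drugs", "pills"},
    "social": {"forum", "chat"},
    "financial": {"bitcoin", "wallet", "escrow"},
}

# Onion addresses use base32 characters (a-z and 2-7) before ".onion".
ONION_LINK = re.compile(r"\b[a-z2-7]+\.onion\b")


def classify(text):
    """Pick the category with the most keyword hits, or "none" if no hits."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "none"


def crawl(seeds, fetch):
    """Breadth-first crawl: scrape each page, classify it, follow new links."""
    seen = set(seeds)
    queue = deque(seeds)
    results = {}
    while queue:
        address = queue.popleft()
        text = fetch(address)
        if text is None:  # dead or unreachable service
            continue
        results[address] = classify(text)
        for link in ONION_LINK.findall(text):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results


results = crawl(["seedsearch.onion"], FAKE_PAGES.get)
```

Note that a crawler like this only ever reaches services that something else links to, which matters for how representative the resulting sample is.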

After the script had run its course, it had indexed 5,205 live websites, of which 2,723 could be classified by content. Pages with fewer than 50 words and those with no content were dropped into the “none” category. According to the analysis, 57% of the classified sites hosted illicit content like drugs and child pornography. The Tor Project estimates there are about 35,000 total hidden services active, so this is far from a full survey, but enough to be a representative sample.
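For readers who want to check the arithmetic, here is a short Python sketch. The 5,205, 2,723, and 35,000 figures come from the paragraph above; the 1,547 count of illicit sites is the number quoted from the study in the comments below. The variable names are just labels chosen for this example.

```python
# Figures reported for the study.
indexed_sites = 5_205       # live sites the crawler indexed
classified_sites = 2_723    # sites with enough content to classify
illicit_sites = 1_547       # classified sites hosting illicit material
estimated_total = 35_000    # Tor Project estimate of active hidden services

# Share of classified sites that hosted illicit content.
illicit_share = illicit_sites / classified_sites

# How much of the estimated hidden-service population the sample covers.
indexed_coverage = indexed_sites / estimated_total
classified_coverage = classified_sites / estimated_total

print(f"Illicit share of classified sites: {illicit_share:.1%}")
print(f"Indexed share of all hidden services: {indexed_coverage:.1%}")
print(f"Classified share of all hidden services: {classified_coverage:.1%}")
```

The 57% figure is a share of the classified sites, not of all 5,205 indexed sites or of the estimated 35,000 hidden services.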

Moore and Rid say their goal with this project was to establish a more moderate perspective on the role of encryption. Politicians are currently demanding unworkable backdoors to encryption, but Moore and Rid say that privacy activists fail to fully acknowledge the potential for abuse. They don’t have a solution in mind — they’re just making the data available.

I suppose if the content were legal, there would be no need to hide it out of sight.

Keith Koons

Sure. That’s why the constitution says everyone must submit to a home search by armed military every day to make sure everyone is safe and legal. Oh.. wait, no that’s not how that works.

AlCarn

Just looking for justification to ban encryption.

charlie november november

So, despite ExtremeTech writers continually hand-wringing about how the Dark Web is mostly used by innocent people trying to hide from government persecution and other legit, defensible reasons, it turns out that it IS mostly illegal and illicit activities that maybe you shouldn’t be shedding tears about?

http://www.funstufftosee.com/ Dozerman

That’s a false dichotomy. The vast majority of people using the “dark web” are using it for completely defensible purposes. These hidden services are in no way linked to that demographic. In addition, what this data doesn’t show is what percentage of those pages are the same “site”. For instance, if someone were to do the same analysis on the clearweb as we see it today, ET would probably get thousands of counts, since each article has its own page.

Good sir, you are confusing the darknet with the deepweb. The darknet is a section of the deepweb that can only be accessed anonymously through the Tor browser, hence “Tor Hidden Services.”

Additionally, the darknet itself contains a *lot* of illegal content, but there are legitimate websites mixed in. One of my primary reasons for accessing the darknet is to view conspiracy theory forums. They’re great for a good laugh, especially with friends.

LostAlone

It’s crazy important to differentiate between hidden services ONLY available through Tor and regular services that people choose (or have) to access via Tor to be safe. Tor allows for people in censorious countries to visit such shocking content as non-state run news and contact their families who have escaped. While I agree that hidden services are largely unsavory, that in no way changes what Tor is and the good that it can do.

Nick2000

“the pair found 5,205 live websites, 2,723 of which they were able to
classify by content. Of those, 1,547 hosted illicit material—around 57
percent.” “around 35,000 unique .onion addresses exist”

So they found that about 57% of less than 10% of TOR hidden services serve child porn and illegal drugs. I assume that these got forwarded to the FBI (which should already know about them).

Unfortunately, this does NOT give any idea if 57% or 0% or 100% of the other 32000 sites host any of it as there is no real way of determining if the snapshot they got is ‘representative’ or not. It would be like walking in a town without any map or anything and asking anybody you encounter for an opinion!
In short, while the data is interesting, it does not allow ANY conclusion…

It also does not move the needle too much on the discussion: is TOR worthwhile because of the good it allows despite the bad? It seems that the State Department thinks so. In addition, considering that the FBI occasionally takes control of illegal websites (Silk Road, child porn sites, …) to catch members, then I suppose that this “honey pot” still has its uses.

Javier Martinez

“In short, while the data is interesting, it does not allow ANY conclusion…”
This is incorrect, it all depends on the statistical analysis the team made. It’s similar to how voter polls work: it’s not necessary to call EVERY person to know the answer to a question, but it’s necessary to collect data from a smaller group that correctly represents the whole group.

Orumus

Which is also why voter polls are not considered anything close to factual data. I use parts of the tor technology as part of my job every day and I can tell you from a whole lot of actual experience that while yes there are illegal activities that use the inherently more secure and anonymized tor network, there is also a lot of legitimate business. The way I see it, it is similar to an old bazaar. A normal user will find sites very similar to sites on the clear web selling things, but if you go looking you will stumble into the seedier part of the bazaar.

People read these articles having never actually used the dark web and think they know what it is all about when in reality they do not. Government agencies do not want normal people to have access to encryption that they do not have an easy way to break. So they sponsor misinformation to sway public opinion away from things that protect their basic rights so they can take away these protections without public outcry. It’s an old trick and it’s sad people still fall for it.

Javier Martinez

The devil is in the details, you’re throwing a blanket statement about sampling as a methodology being bad when in reality the issue is either bad implementation or more likely abuse to influence people’s opinions as you mention in your post.

In any case I agree with your sentiment about the need for encryption without back doors and we as citizens need to make it clear to our representatives.

Orumus

No I am not. Without getting into all the ways sampling can be done incorrectly to either suit an agenda or simply out of ignorance, I stated voter polls are not considered factual data…. and they aren’t.

I would also point out that saying this ratio of sites is illegal does not mean that everything people do there is illegal. There is a huge difference between web traffic and number of sites. A study like this would have been somewhat helpful if it monitored the traffic of the “illegal” sites and the legitimate sites but oh wait they can’t… which is the point. Also people need to understand that the primary reason people like myself use tor technology is for secure communication around the world with people trying to flee bad situations or for legit business that needs a certain level of security without a huge investment (and other reasons). Illegal sites mean nothing and when they get taken down no one that I know that uses the dark web sheds a tear. But to try to convince everyone that tor/dark web/encryption is bad, and that if you don’t need it no good person does, based on this type of half truth sickens me.

Javier Martinez

In case you haven’t noticed we’re talking about different things. I’m saying that polls and sampling methods as a technique when done correctly do represent the population. On the other hand you’re saying that given the many ways for voter polls to be done incorrectly (all) voter polls are not considered factual data. Different things on a related topic. I’m also neither supporting nor opposing your statement; just acknowledging it.

Orumus

We were not until you decided we were. Your original comment was holding up voter polls as a great example of how sampling can be done correctly. I pointed out that voter polls are not considered factual or reliable data. If you want to point out accurate sampling, voter polls would not be the one to use.

greybirdtoo

Well, if you listen to the fivethirtyeight blog, which is considered an excellent source of voter poll information, they say that voter polls of primaries and caucuses are not reliable. Voter polls done correctly just prior to an election are much more reliable. So you need to differentiate exactly which voter polls.

Nick2000

In this case, they actually have no mechanism to assess whether the sample is representative of all the sites. That is the issue really. Maybe it is, maybe it is not. Great for debate but useless for writing actual policy unless you have an agenda one way or the other.
For example, looking at all the fishes I could find in one small geographical area does not necessarily mean anything regarding the entire ocean. You have no idea if you stumbled across specific local conditions or not (especially if you cannot do some spot checks in other spots).

charlie november november

Why do I suspect that IF the results were that only 5.7% of the randomly chosen sites were found to have illegal content, your response would be that “PROVES” that most of the Dark Web is legit. With no issues with how the sampling was done!

Orumus

I am sorry but no. At no point does the OP conclude that this data proves anything one way or the other, which is the point of the post. The problem he points out is the people assuming such a small sample proves that it is inherently bad.

Nick2000

Because you would be wrong.

http://www.funstufftosee.com/ Dozerman

Copied from my reply to another commenter on this article:
What this data doesn’t show is what percentage of those pages are the same “site”. For instance, if someone were to do the same analysis on the clearweb as we see it today, ET would probably get thousands of counts, since each article has its own page.

cah404

Best place to hide government secrets as they communicate to their agents. A wealth of the world’s dirty laundry that has the truth about how the world really works and how the elite pull the levers of control but the public can’t dare go on it without being inundated with illegal material…clever and devious

http://www.LaserGuidedLoogie.com Ken

Imagine what this is like for a cop.

You don’t even have to leave your office, just sit there and troll through the “dark web” for a bit and your only problem will be deciding which criminals you want to go after first.

Personally, I think it’s prone to make them lazy.

Cops get ruined by this type of easy police work.

It makes them less likely to want to put in the effort to do REAL police work, and instead set up 80 IQ morons as “Terrorist Masterminds!” Or set up road blocks, with 40 cops on a snowy mountain road, and shoot the hell out of a car full of anti-government protesters because they were “trespassing.”

-Ken
LaserGuidedLoogie.com

Zunalter

I’ll take “Obvious Conclusions to Soak Up Research Grants” for 600, Alex

Kyle

My initial response to the title was “You don’t say”. The only thing here that surprised me is “After the script had run its course, 5,205 live websites were indexed”.

WTF

With the warranted distrust many have over their government and its agencies, do we honestly believe much of what really goes on or is trickled down to us? It’s a sad state of affairs that whether it’s discussing the dark web, immigration, gun control or any hot topic, there are too many interest groups and especially partisan government having their own agenda instead of being transparent.

Excuse me for being a skeptic, but I don’t take anything a government says as being fact as they’ve lied and been caught out too many times. In fact I believe it’s got so bad now that even if the authorities say it as it is without spin, obfuscation or omission, most still disbelieve them.

As for this article, I suspect most content is illegal and although I’ve toyed with the idea of using secure systems to stop prying eyes just out of principle, in the end, I couldn’t be bothered. The problem is although I have nothing to hide, when a person hides something, the natural response from ‘big brother’ is to ask what or why you are hiding something, and as innocent as you are you’re immediately under suspicion. Perhaps if everyone used super secure systems, the work load for the authorities would overwhelm their ability to filter out who is doing what and almost guarantee security.

pete moss

Politicians are demanding unworkable backdoors to encryption. WTF does that mean???
Politicians might find that their extremely deep dark secrets will be uncovered.
Maybe learn the real facts of Obama and his APARTHEID promoting administration.

http://www.asplint.com/ Jeffrey Deutsch

Your headline says that “most” dark web sites contain illegal material. To my mind, most is overwhelming, like 75% or 80% if not 90 or 95+%. If it’s 57%, a much better term IMHO would be “much” or “a majority of”.

In other words, yes it’s a concern, but we need to not insinuate that there aren’t plenty of legitimate dark web sites too. Including, say, human rights networks and dissidents in certain countries.

I can think

Name one site. Cheers

http://www.asplint.com/ Jeffrey Deutsch

I don’t need to. The authors themselves have pointed out that 43% — more than two sites out of five — are legitimate. I’m just calling them out for using an unwarranted click-bait term “most” in their headline.

I can think

Most people do have trouble naming legitimate onion sites. 3 divided by 5 is 60%. “Most” seems like a reasonable statement by the authors.

I can think

Good report. All you need to do is download Tor Browser and cruise the darknet. It is the cesspool of the universe. Way to go Tor and the Electronic Frontier Foundation and JP Barlow. BTW, Garcia and Hunter wrote songs SOOOOO much better than yours. Keep on supporting Child Pornography pervs.

Use of this site is governed by our Terms of Use and Privacy Policy. Copyright 1996-2016 Ziff Davis, LLC.PCMag Digital Group All Rights Reserved. ExtremeTech is a registered trademark of Ziff Davis, LLC. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis, LLC. is prohibited.