Knowing where to start a deep web search is easy. You hit Google.com, and when you hit a brick wall there, you go to scholar.google.com, Google's academic search engine. Once you hit a brick wall there too, your true deep web search begins.

You need to know something about your topic in order to choose the next tool. To be fair, some of these sites have made themselves easier for Google to index and are now technically no longer Deep Web, but rather kind-of-deep-web. Only a few have done so, however.

Multi Search Engines

Deeperweb.com – (broken as of Sept 2016, hopefully not dead) This is my favorite search engine. It breaks your results down into categories – general web, blogs, news, academic, cloud, metrics, research, etc. This allows you to quickly focus on the type of answer you were looking for. Makes my top 10 websites!

Surfwax – They have a 2011 interface for RSS and a 2009 interface that I think is better. It takes 60 seconds to understand how to use it.

Dogpile – Another multi-engine aggregator.

Scout Project – scout.wisc.edu – Since 1994, the Scout Project has focused on developing better tools and services for finding, filtering, and presenting online information and metadata.

www.findsmarter.com – You can filter the search by domain extension or by topic, which is quite neat. It draws its results from Yahoo, Bing, Wiki, Blekko, and Alltheweb.
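Multi-engine aggregators like Dogpile and FindSmarter pull results from several engines and merge them into one list. None of these sites documents how it does the merging, so treat the following purely as an illustration: a minimal Python sketch of reciprocal rank fusion, one well-known merging technique, assuming each engine hands back an ordered list of URLs with no duplicates. The function name, the constant `k=60`, and the sample URLs are assumptions for the example, not anything these services publish.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked URL lists into one.

    Each URL scores the sum of 1 / (k + rank) over every list it appears in,
    so results ranked highly by several engines float to the top.
    Assumes each individual list contains no duplicate URLs.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists from two engines.
engine_a = ["a.com", "b.com", "c.com"]
engine_b = ["b.com", "d.com", "a.com"]
print(reciprocal_rank_fusion([engine_a, engine_b]))
# -> ['b.com', 'a.com', 'd.com', 'c.com']
```

Here b.com wins because both engines rank it near the top, even though engine A put a.com first. That is the appeal of rank-based fusion: it needs only positions, not the engines' internal scores, which an aggregator never sees.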

Some Deep Web search engines talk to onion services via Tor and its relays, resolve the .onion links, and then deliver the final output to your regular browser on the ordinary World Wide Web.

There is one consequence of browsing the Deep or Dark Web through a regular browser, however: working this way makes these .onion search results visible not only to you and me, but also to Google.

Tracker-less search engines such as Disconnect, DDG, and IXQuick are also popular in the Tor culture, because they keep your searches private.

Cluster Analysis Engines

TouchGraph – A brilliant clustering tool that shows you relationships in your search results using a damn spiffy visualization. The smart way to use it is to let it help you find new sources for your search topic. I have to add that the wiggly effect on the visualization is damn cool; just grab the center item and move it to see what I’m talking about. (Sometimes it doesn’t wiggle, however. Java issue?)

Yippy.com – A useful, non-graphical clustering of results. Give it 2 minutes of your time to understand how it works and it will give back hours of saved research time. They are actively funded and are acquiring other search tech assets, so it’s worth using.

www.navagent.com/ – Not a web-based search engine; it requires you to download software. Highly rated, and very interesting, especially to the 35F intel types.

National Security Archive – Declassified papers and such. In their words – “National Security Archive Electronic Briefing Books provide online access to critical declassified records on issues including U.S. national security, foreign policy, diplomatic and military history, intelligence policy, and more. Updated frequently, the Electronic Briefing Books represent just a small sample of the documents in our published and unpublished collections.”

www.osti.gov – Government research archives: if your tax dollars paid for it, the results are here. There is also a huge collection of science presentation videos.

San Francisco Public Library – A great online library. This is just one example of many such local public libraries that offer similar services. Sorry, you can’t use their access to commercial archives unless you are a member.

Xrefer (commercial) – Fee-based database of 236 titles and over 2.9 million entries.

LexisNexis (commercial) – Billed as the world’s largest collection of public records, unpublished opinions, legal, news, and business information. Over 35,000 individual sources are claimed to be searchable. I’ve not been able to justify subscribing, so I can’t vouch for it.

Forrester Research (commercial) – An independent technology and market research company, publishing in-depth research reports on a variety of subjects.

Books Online

Archive.org – Has books online in epub, txt, and pdf formats. The collection encompasses others, such as Project Gutenberg, so this is the best site to start with. Again, this makes my top 10 websites. Share the love.

HathiTrust – http://www.hathitrust.org/ – A partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future. There are more than sixty partners in HathiTrust, and membership is open to institutions worldwide.

Books.google.com – They are putting the squeeze on all the book-scanning businesses. They want to scan the world to add it to the Google Borg. You will be assimilated.

The Online Books Page — A searchable database of more than 28,000 English works with full text available for free online.

Bibliomania — A database of free literature from more than 2,000 classic texts. Archive.org crushes this.

Project Gutenberg – The granddaddy of online books, with a catalog of more than 20,000 free books with full text available online. Included in Archive.org.

getAbstract (commercial) – Large online library of more than 8,000 business book summaries. It is the most efficient way to get the best business titles.

Getty Research Institute – http://www.getty.edu/research/library/ – The Getty Research Institute library collections include over one million books, periodicals, study photographs, and auction catalogs as well as extensive special collections of rare and unique materials. Focusing on art history, architecture, and related fields, they begin with the archaeology of prehistory and extend to the contemporary moment.

Audio Books Online

Videos

www.liveleak.com – A video news aggregator of citizen-supplied videos. Great for OSINT in foreign countries. The site appears to have some good aggregation functions that turn randomly submitted videos into a logical collection around a topic. It proved useful for aggregating footage of the Feb 15, 2013 meteorite strike in Russia.

Government Printing Office – Big catalog of stuff published by the Government Printing Office. Has business stuff but much, much more: environmental reports, legal docs, nature stuff. Hell, I typed in ‘mushroom’ and pulled up 34 entries.

United States Government Printing Office (GPO) – I mentioned this earlier; they seem to have everything. A search engine for multiple government databases: US budgets, campaign reform hearings, the Code of Federal Regulations, congressional bills, etc.

heinonline.org (commercial) – Claims to be the ‘world’s largest image-based database of legal documents’. I was able to find an obscure document on using Bayes’ Theorem for fact-finding in a criminal case.

Project Vote Smart – A database of government officials and election candidates, searchable by last name or ZIP code.

Medical and Health

PubMed – This U.S. National Library of Medicine database contains over 16 million citations from MEDLINE and other life science journals going back to the 1950s, with links to full-text articles and external resources. Supposed to be the best damn medical resource out there.

Science and Academic

Academic Index – The main search is a filtered Google search aimed at high-authority sites, mainly .edu and .gov, which filters a great deal out. A second search ties into deep web academic and non-academic databases geared toward librarians and educators.
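Filtering by domain extension, as Academic Index and FindSmarter do, boils down to keeping only results whose hostname ends in an accepted suffix. The actual rules these sites use aren't published, so here is a minimal Python sketch under assumed rules: the suffix tuple, the function name, and the sample URLs are all hypothetical.

```python
from urllib.parse import urlparse

# Assumed "high authority" suffixes; not Academic Index's documented criteria.
AUTHORITY_SUFFIXES = (".edu", ".gov")


def filter_by_domain(urls, suffixes=AUTHORITY_SUFFIXES):
    """Keep only URLs whose parsed hostname ends with one of the suffixes."""
    kept = []
    for url in urls:
        # urlparse() returns None for .hostname when there is no network location,
        # and lowercases the hostname when there is one.
        host = urlparse(url).hostname or ""
        if host.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
            kept.append(url)
    return kept


results = [
    "https://www.mit.edu/research",
    "https://www.nasa.gov/missions",
    "https://example.com/blog",
    "https://notreally.edu.example.com/",
]
print(filter_by_domain(results))
# -> ['https://www.mit.edu/research', 'https://www.nasa.gov/missions']
```

Matching on the parsed hostname rather than the raw URL string keeps a look-alike such as notreally.edu.example.com from slipping through. Note that this sketch would also drop country-coded academic domains like .ac.uk unless you add them to the suffix list.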

Science.gov — Gateway to science info provided by US government agencies.

VideoLectures.net – Phenomenal video lecture coverage from high-authority sources. A great go-to place to find peer-reviewed, in-depth coverage of a topic as presented at a conference. A nice bonus is that the presentation slides are shown separately, and you can jump to the slides that interest you. Heavily technology-based, and 66% of it is in English. Most lectures run 45 minutes or longer.

WebCASPAR – A horrible interface to an alleged wealth of statistical info on science and engineering. I found the site slow, kludgy, and apparently designed around 1965 to run on candle power. From their website: “The WebCASPAR database provides easy access to a large body of statistical data resources for science and engineering (S&E) at U.S. academic institutions. WebCASPAR emphasizes S&E, but its data resources also provide information on non-S&E fields and higher education in general.”

The Complete Work of Charles Darwin – Charles Darwin’s published works, searchable and available online. He’s still old and his works still ramble. Scanning didn’t help him much.

arXiv – arxiv.org/ – A Cornell University repository giving access to 700,000+ technical papers on everything from quantitative biology to computer science. Appears to offer full text in several formats.

VADLO – www.vadlo.com/ — Life Science Search Engine. Very hit and miss. Don’t have high expectations.

DeepDyve (commercial) – www.deepdyve.com – DeepDyve has aggregated millions of articles across thousands of journals from the world’s leading publishers, including Springer, Nature Publishing Group, Wiley-Blackwell, and more. I haven’t paid the premium to give it a test ride; if someone has, please write a review below.

Cyber War / InfoSec

Security Tube – securitytube.net — A large library of videos covering many topics in InfoSec, cyberwar, and most of the hacking conferences.

DefCon – The main hacker con, so well known that the Feds now send their folks there, and it has become a Wild West training ground for coming trends. Archives go back to DefCon 1. They are now on DefCon 20, I think.