Using deep web search engines for academic and scholarly research

You may have heard the term in passing before: the “deep web”, a rumored but rarely discussed web underneath the web, filled with petabytes of data and information that’s out of reach of your standard Google, Bing, or Yahoo search bar.

But what exactly is the deep web, and what purpose does it serve for the research community? Read on to find out everything you need to know about the deep web, including what it means, where it lives, and how you can use it to your advantage.

The Deep Web: A Proper Definition

Google uses what’s known as a “spider-based crawler” to trawl the web for static webpages by following links from one page to the next, and then returns those pages as results when you punch the right terms into the search bar. This covers only a very small portion of the information that’s actually available on the web.
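To make the crawler idea a little more concrete, here is a minimal sketch in Python. The page names and the tiny link map are made up for illustration, and a real search engine crawler is far more involved; the point is only that a crawler which follows static links never reaches pages that exist solely behind a search form.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "home": ["blog", "shop", "search-form"],
    "blog": ["home", "post-1"],
    "post-1": ["blog"],
    "shop": ["home"],
    "search-form": [],  # results appear only after a query is submitted
}

# Hypothetical pages generated on demand by the search form.
# No static link points at them, so a link-following crawler cannot see them.
FORM_ONLY = {"census-table-1990", "patent-record-123"}


def crawl(start, links):
    """Breadth-first crawl that follows static links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("home", LINKS)
print(sorted(indexed))              # pages the crawler reached
print(sorted(FORM_ONLY - indexed))  # pages it never found
```

In this toy setup the crawler reaches every linked page, including the search form itself, but none of the records that the form would return.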

Any results you get back from a basic Google search are from what’s known as the “Surface Web”. The Surface Web covers your basics: social media, news sites, shopping, blogs, etc.

Then there’s the Deep Web, which is not to be confused with the “Dark Web”, a portion of the internet most often associated with privacy-focused connection services like Tor and online drug marketplaces like the now-defunct Silk Road.

The deep web contains a constantly updated torrent of raw, unchecked information, surging with complex technical terms and so many diagrams it’s enough to make Google’s Deep Dream AI blow a circuit board. This is where you’ll find records like census data, NASA mission data, patents, and databases of academic papers.

It’s estimated that the surface web amounts to only about 20 terabytes of information, while the deep web holds about 7.5 petabytes. The split is often put at 5 percent surface web to 95 percent deep web, but taken at face value the byte counts imply a surface share of well under 1 percent, so treat all of these figures as rough estimates.
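A quick back-of-the-envelope check shows how the byte counts and the percentage split quoted above relate to each other. This sketch assumes decimal units (1 petabyte = 1,000 terabytes) and simply takes the two size estimates at face value.

```python
# Figures quoted above, in decimal units (1 petabyte = 1,000 terabytes).
SURFACE_TB = 20
DEEP_TB = 7.5 * 1000


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole


total_tb = SURFACE_TB + DEEP_TB
surface_pct = share(SURFACE_TB, total_tb)
deep_pct = share(DEEP_TB, total_tb)
print(f"surface: {surface_pct:.2f}%  deep: {deep_pct:.2f}%")
# prints "surface: 0.27%  deep: 99.73%"
```

On these numbers the surface web comes out closer to a quarter of one percent than to 5 percent, which is a good reminder that size estimates for the deep web vary widely depending on who is counting and how.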

How to Search the Deep Web

Knowing where to look is the first, and probably most important, step to take before diving into the deep end of the web. The deep web is almost infinitely vast in the amount of information it holds, but unlike what most people are used to when searching in Google, all of that data isn’t centralized in one place.

This means that for as many different subjects as you can think of (finance, software, business, economics, academia, etc.), there are just as many search engines designed to dive into the deep web archives of those particular subjects.

One issue that some researchers run into, though, is paywalls. There’s no getting around it: in order to keep the lights on, many of the sites mentioned below keep their content behind a paywall, either charging upwards of $50 to read a single document or offering monthly subscription plans that get you access to all content for a flat fee.

If paywalls are a problem for you, one tool we recommend checking out is the Google Chrome browser extension Unpaywall. Unpaywall automatically scours the web for a free version of any paywalled content you’re trying to access. You won’t get back a free result for every paper you search, but it’s still nice to know the option is there if you need it in a pinch.
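If you’d rather check a batch of papers from a script than click through them one by one, Unpaywall also offers a web API that looks up a paper by its DOI. The sketch below is a rough illustration, not official client code: the endpoint shape and the field names (best_oa_location, url_for_pdf, url) reflect our understanding of the public API and should be checked against the current documentation, while the DOI, email address, and sample response are made up so the example runs without a network connection.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed API root; confirm against the current Unpaywall documentation.
API_ROOT = "https://api.unpaywall.org/v2/"


def lookup_url(doi, email):
    """Build the lookup URL for one DOI; the service asks for a contact email."""
    return API_ROOT + quote(doi, safe="/") + "?" + urlencode({"email": email})


def free_copy(record):
    """Return a link to a free copy from a parsed response, or None."""
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")


def fetch(doi, email):
    """Query the live service (needs network access; not called below)."""
    with urlopen(lookup_url(doi, email)) as response:
        return json.load(response)


# Made-up response, trimmed to the fields used above, for offline illustration.
sample = {
    "doi": "10.1234/example",
    "is_oa": True,
    "best_oa_location": {
        "url": "https://repository.example.org/paper",
        "url_for_pdf": None,
    },
}

print(lookup_url("10.1234/example", "you@example.org"))
print(free_copy(sample))
print(free_copy({"is_oa": False, "best_oa_location": None}))
```

The last call shows the case mentioned above where no free version turns up: the helper returns None and you’re back to the paywall or your library’s subscription.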

Below we’ve included a list of the services we think do the best job of cataloging the information you might need during your next research project, with special note of those that are easier to search than most.

JSTOR – The first – and probably most obvious – addition to this list is the JSTOR database. Established in 1995, this treasure trove of research continues to be one of the first stops for any academic researcher on their way down the rabbit hole. Offering full-text searches for over 2,000 individual journals and 15,000+ books, JSTOR is a must-have for anyone who prefers a more “one-stop shop” approach during their data deep dives. JSTOR allows you access to up to three books for free, while a subscription to the JPASS service ($19.50 a month/$199 per year) will give you unlimited reading and 10 PDF downloads every 30 days (up to 120 per year). If you can’t afford that, many universities (more specifically, their professors) should have a subscription they’d be willing to let you use as long as you ask nicely enough!

Archive.org – A gigantic database of media that’s been entered into the public domain. Sound recordings, old videos, rare books, pretty much anything you might need to build your next great presentation at school, work, or both! Partnered with the Wayback Machine, which has over 280 billion webpages that have been indexed since nearly the inception of the internet itself.

Library of Congress – Digitized archives of everything that’s entered the Library of Congress. Over 200 years of historical information as well as up-to-date volumes.

Osti.gov – Government research archives from the U.S. Department of Energy’s Office of Scientific and Technical Information, covering research funded by the department. Your tax dollars paid for these, so why shouldn’t they belong to you? 100 percent searchable, and capable of returning results from within the full text of documents.

General

The National Archives — National Archives’ research tools and online database. If there is anything you need to know about America’s history or the current state of the nation, this is the place.

HighWire Press – Online catalog of the largest repository of peer-reviewed content, both free full-text and paid, from over 1,000 different journals. It’s hit or miss as far as what’s behind a paywall and what isn’t. The only way to find out is to narrow your search terms until you can see enough publications on both the paid and free side of the aisle to decide whether or not you’ll need to pull out your wallet.

Encyclopedia Britannica – The original Google, now online with all the great pictures and text you still remember from the books!

FRED – Up-to-date financial data covering 470,000 time series from 85 different sources. This database is provided free of charge thanks to the helping hands over at the Federal Reserve Bank of St. Louis. FRED also links out to a number of other equally impressive resources for economic data. It should be the primary resource for anyone doing research in the fields of finance and economic theory in the US.

Books

Google Books – The most obvious choice. The other listings below are fine for what they do, but none can quite measure up to Google’s book-scanning prowess. Some books will have partial previews, others will be fully available, and many more won’t let you see anything at all. All text is digitized (and searchable), but whether or not you’ll be able to read your results depends entirely on the copyright status of that particular piece of text.

Scribd – This may not exactly fill the role of your ultimate academic research database, but the monthly subscription service is still a good way to stay up to date on new articles running in your favorite magazines or to search through books that just hit the shelves. The documents section allows users to upload pretty much anything with few restrictions, so it’s become a repository for many textbooks and other academic content.

Project Gutenberg — 53,000 free e-books available online, also part of the Archive.org searchable database.

The Online Books Page — A searchable database of more than 28,000 English books with the complete text available online.

Getty Research Institute – The Getty Research Institute library collections include over one million books, study photographs, periodicals and auction catalogs. There’s also a pretty deep collection of rare or unique materials that focus on art history and architecture.

Law and Politics

Law Library of Congress – Claims to be the largest collection of legal materials in the world, with over 2 million volumes available.

THOMAS (Library of Congress) – Legislative information from the Library of Congress, archiving current and past bills presented on the floor of the House of Representatives. Note that THOMAS was retired in 2016 and its content now lives on Congress.gov.

LexisNexis – Solid resource for any aspiring law student or practicing lawyer. Daily updated database of information, though it doesn’t come cheap. Prices vary depending on the service and even the state you’re searching in, but expect to spend upwards of $125/month for services like Lexis Advance, which lets you search through millions of court and legal documents submitted in actual cases from all around the United States. Your local library or university might have a subscription you can use.

Medical and Health

Science.gov — Gateway to science info provided by US government agencies. Searches an aggregated database of 200 million different publications and journals, best for anyone trying to do research on topics that are covered specifically under the “science” category.

PubMed – From the U.S. National Library of Medicine, PubMed contains over 16 million citations from MEDLINE and other life science journals reaching all the way back to the 1950s. One of the first, and still one of the best, medical databases available online today.

Globalhealthfacts.org – Indexed database of world health information, searchable by disease type, country, conditions, symptoms, and more. Great resource complete with hundreds of infographics that can be used to explain the statistics of certain health problems on a broader scale.

New England Journal of Medicine – One of the leading medical journals, with full-text past issues available online. Be ready to pay for some content, but quite a bit is available for free as well.

Science and Academic

Geography and Geology

US Geological Survey – Packed with as many maps and images as you can stomach, covering many different aspects of US geology and topography.

US National Map by USGS – The source for current geospatial data from the USGS. All maps provided are available both interactively on the web and in downloadable formats.

USGS Real-Time Water Data – A map of the United States showing realtime water quality data of the country’s rivers and reservoirs.

USGS Earthquake Hazards Program – Maps of the world showing realtime earthquake data. Uses an interactive map that you can use to jump from location to location, fun for anyone who’s even got a passing level of interest in what’s really happening just under our feet.

Physics and Astronomy

The SAO/NASA Astrophysics Data System – A physics and astronomy data engine for academic papers. Every paper you want to read must be individually requested, which can be a hassle, but it’s still one of the best ways to get your hands on the raw data pouring out of telescopes and physics experiments all around the globe.

Academic Index – Splits into two different types of searches: the main search, which basically returns more fine-tuned Google results, and a second search that covers deep web academic troves.

Engineering and Technology

IEEE Xplore Digital Library – Contains over 1.4 million documents from the Institute of Electrical and Electronics Engineers. Searchable database of up-to-date materials regarding almost anything and everything to do with electrical engineering and technology as a whole.

TechXtra – Free access to reports, e-materials, research, industry news, and even job listings in the math, science, and engineering fields.

Misc

ScienceResearch.com – Searchable access to scientific journals and databases. Huge database of aggregated papers and research, all text-searchable. Should be your first stop for any early research that may not require as deep of a dive as somewhere else.

arXiv – Cornell University repository. Access to 700,000+ technical papers on everything from quantitative biology to computer science. Appears to offer full text in several formats.

DeepDyve – DeepDyve is a commercial trawler that has aggregated quite literally millions of articles across thousands of scientific journals. If you’re searching for anything in the way of STEM projects, this is a great place to start (you’ll have to pay for the privilege, however).

Video Resources

VideoLectures.net – Really strong set of video lectures from high authority sources, nearly 20,000 lectures to choose from and over 22,000 informational videos in total.

The web is a giant, wonderful place filled with just about any information you could possibly dream up and then some. By using these sites and search engines to trawl the deep web, you can make sure that your next academic paper, Ph.D. thesis, or college entry essay is packed with the richest sources possible.