In Six Clicks or Less, LeRumba Connects People of the World to 1 Trillion Pages on the Web

2/20/2016
Anant Goel & Alan Kyle Goel

LeRumba is a web portal that connects people around the world, through 50,000+ virtual Town Squares, to what they need on the Web in six clicks or less. Most importantly, you can choose whether custom searches rank results by “relevance” or “timeliness”.

The secret lies in connecting global communities to, and leveraging, the best-of-breed global search engines, social media sites, real-time location-based data feeds, and customized consumer services.

We knew the web was big...

In March of 2013, Google revealed: How do you run 100 billion web searches a month on 30 trillion unique individual pages?

The 100 billion figure is a towering number, but when you consider that Google.com is just a small part of our relationship with Google, it’s an overwhelming statistic. Consider Google Maps, Google Navigation, Gmail, Google Docs, Google AdSense ads seen on sites across the web, Android phones, Chrome browsers, and the list goes on and will continue to grow.

And it sure has grown…

“…to over 60 trillion pages worldwide and growing at the rate of several billions daily.”

Google gave an inside peek into how web search works today, revealing some fascinating numbers in the process.

According to Google, search starts, of course, with crawling and indexing, and the web now has 60 trillion unique individual pages. That’s an astonishing 60-fold increase in five years: Google reported in 2008 that the web had just one trillion pages.

Google says that it stores information about those 60 trillion pages in the Google Index, which is now over 100 million gigabytes.
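To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes a gigabyte is one billion bytes and that the growth from 2008 to 2013 was smooth, neither of which Google stated, and the variable names are our own.

```python
# Back-of-the-envelope arithmetic on the figures quoted above.
# Assumptions (ours, not Google's): 1 GB = 10**9 bytes, steady yearly growth.

PAGES_2008 = 1 * 10**12          # one trillion pages reported in 2008
PAGES_2013 = 60 * 10**12         # 60 trillion pages reported five years later
INDEX_GIGABYTES = 100 * 10**6    # index size: 100 million gigabytes
BYTES_PER_GIGABYTE = 10**9       # decimal gigabyte (assumption)
YEARS = 5

# Overall growth factor between the two reports.
growth_factor = PAGES_2013 / PAGES_2008

# Equivalent constant yearly multiplier that compounds to the same growth.
yearly_multiplier = growth_factor ** (1 / YEARS)

# Average amount of index data stored per page.
index_bytes_per_page = INDEX_GIGABYTES * BYTES_PER_GIGABYTE / PAGES_2013

print(f"Growth factor: {growth_factor:.0f}x")
print(f"Equivalent yearly multiplier: {yearly_multiplier:.2f}x")
print(f"Index data per page: about {index_bytes_per_page:.0f} bytes")
```

Under these assumptions the web more than doubled every year, and the index holds on the order of a couple of kilobytes of information per page.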

How Google does the searches for you…

When you search, Google tries to figure out not just what you’re typing into the box, but what you mean. So algorithms for spelling, autocompletion, synonyms, and query understanding jump into action. When Google thinks it knows what you want, it pulls results from those 60 trillion pages and 100 million gigabytes, but it doesn’t just give you what it finds.

How Google Ranks the Searches…

First, a ranking procedure uses over 200 closely guarded secret factors that look at the freshness of the results, quality of the website, age of the domain, safety and appropriateness of the content, and user context like location, prior searches, Google+ history and connections, and much more.

How fast is a Google Search?

In just over an eighth of a second, Google then delivers the results to your computer, tablet, or phone.

To test how well its searches are actually performing, Google also uses real, live humans as search evaluators. Forty thousand times a year, Google’s search testers check results, see what’s working, and provide suggestions on how to improve.

And What about Web Spam?

Web spam consists of useless pages that are crafted to rank well on Google, draw your attention and clicks, and then monetize your eyeballs or send your clicks off to somewhere else. Google said that it notifies sites when it considers them spam, or finds that they have been hacked, at a rate of 40,000 to 60,000 per month.

How big is the World Wide Web?

No one knows for sure how many individual pages are on the web, but right now, it’s estimated that there are more than 1 trillion that are connected.

Of the roughly 60 trillion web documents in existence, excluding the aforementioned 1 trillion pages (a figure that includes every image, video, or other file hosted on each of them), the vast majority are poorly connected, linked to perhaps just a few other pages or documents.

Distributed across the entire web, a minority of pages (search engines, indexes, and aggregators) are very highly connected and can be used to move from one area of the web to another. These nodes serve as the “Kevin Bacons” of the web, allowing users to navigate from most areas to most others in 19 clicks or less.
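The effect of such highly connected pages can be illustrated with a toy example. The sketch below is not how anyone measured the real web: the page names, the ten-page chain, and the single "hub" are all hypothetical, and the click count is simply a breadth-first search over links.

```python
from collections import deque


def clicks_between(links, start, target):
    """Return the fewest clicks (link hops) from start to target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, clicks = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == target:
                return clicks + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, clicks + 1))
    return None  # target cannot be reached by following links


# Hypothetical toy web: ten poorly connected pages, each linking only
# to the next one in the chain.
chain = {f"page{i}": [f"page{i + 1}"] for i in range(9)}
chain["page9"] = []

# Same pages, plus one highly connected hub (think of a search engine)
# that every page links to and that links back to every page.
with_hub = {page: targets + ["hub"] for page, targets in chain.items()}
with_hub["hub"] = list(chain)

print(clicks_between(chain, "page0", "page9"))     # 9 clicks along the chain
print(clicks_between(with_hub, "page0", "page9"))  # 2 clicks via the hub
```

A single well-connected page cuts the trip from nine clicks to two, which is the same shortcut effect, on a tiny scale, that search engines and aggregators provide on the real web.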

However, LeRumba Connects People of the World to 1 Trillion Pages in Six Clicks or Less

How Do They Do It in 19 Clicks or Less?

Albert-László Barabási, a Hungarian-American physicist, credits this “small world” of the web to human nature: the fact that we tend to group into communities, whether in real life or the virtual world. The pages of the web aren’t linked randomly, he says: they’re organized in an interconnected hierarchy of organizational themes, including region, country, and subject area.

Interestingly, this means that no matter how large the web grows, the same interconnectedness will rule. Barabási analyzed the network at a variety of levels, examining anywhere from a tiny slice to the full 1 trillion documents, and found that regardless of scale, the same 19-click-or-less rule applied.

How Does LeRumba Do It in Six Clicks or Less?

It’s the way we have structured the LeRumba portal…

“… It invokes the theory of six degrees of separation, and not the small-world properties, by eliminating repeatedly duplicated clicks and keeping it relevant, fresh and live, thus driving down the clicks from 19 to 6 or less.”

This arrangement, though, reveals some cybersecurity risks. Barabási writes that knocking out a relatively small number of the crucial nodes that connect the web could isolate various pages and make it impossible to move from one to another. Of course, these vital nodes are among the most robustly protected parts of the web, but the findings still underline the significance of a few key pages.
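The risk is easy to see in miniature. The following sketch uses a made-up five-page web (the names a1, a2, b1, b2 and hub are purely illustrative) in which two small communities are joined only through one hub, and then checks which pages remain reachable once that hub is knocked out.

```python
from collections import deque


def reachable(links, start, removed=frozenset()):
    """Return the set of pages reachable from start, skipping removed pages."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy web: communities "a" and "b" connect only through "hub".
links = {
    "a1": ["a2", "hub"],
    "a2": ["a1"],
    "b1": ["b2", "hub"],
    "b2": ["b1"],
    "hub": ["a1", "b1"],
}

before = reachable(links, "a2")
after = reachable(links, "a2", removed={"hub"})

print(sorted(before))  # every page, including community "b"
print(sorted(after))   # only community "a" is left
```

With the hub in place, a visitor starting at a2 can reach all five pages; without it, community "b" is cut off entirely.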

To get an idea of what this massive interconnected network actually looks like, head over to the Opte Project, an endeavor started by Barrett Lyon in 2003 to create publicly available visualizations of the web. In one of its maps, for example, red lines represent links between web pages in Asia, green for Europe, the Middle East and Africa, blue for North America, yellow for Latin America, and white for unknown IP addresses.

Although the most recent visualization is several years old, Lyon reports that he’s currently working on a new version of the project that will be released soon.

[Curated content based on excerpts from posts, blogs, media articles, and sponsored research]