Posted
by
CmdrTaco
on Wednesday August 02, 2000 @05:04PM
from the is-anybody-out-there?--is-anybody-listening? dept.

NTT writes "The Telcordia Internet Sizer provides daily updates on the size of the Internet. The Telcordia solution to quantifying Internet growth statistics is based on an internally developed unique sampling method. In this approach, over 150,000 randomly generated IP addresses are sampled on a daily basis and checked for their existence. Check out the other stats they have here."

My own world, twisted as it may be, has a keen fascination with the word "starfish". I have come to discover that starfish look like anuses. So let us do an experiment: I'm going to replace the word "starfish" with the word "anus" every time you say it. Let's take a look at the ensuing hilarity, shall we?:

Just like an anus: big, brown, and hairy. Cut it and you just eventually end up with more anuses. Each operates in its own little anus world, not really worrying about what the other anus is doing. Ejaculation (eventually) reroutes around dead anuses. Only in the old DARPANET days could one logically speak of a central anus.

I would be willing to bet that we will need a major upgrade of the backbone in the next 3-7 years.

I have to disagree with that. The bandwidth is always being added. The bandwidth of the internet may lag behind what is needed by a bit; but it is maintaining a steady pace with bandwidth growth. New pipes are added every day.

If there is an overhaul, it may need to be at major peering points. Not so much upgrades as much as new ones.

There are many instances where internal hosts (that is, those behind a firewall) have real, registered public address space. RFC 1918 addresses are nice, but nowhere near every company uses them, even for its internal network.

Also, this test doesn't really consider network-address-translated addresses with public DNS entries. For example, suppose I have an address for www.mydomain.com on my own authoritative domain server. The address is, say, 203.0.113.1 (a public address), and anyone can connect to it. However, I actually have my firewall round-robin the requests for that address to my web farm of 10 machines, 192.168.1.1-10, none of which are in external DNS. The survey would only catch one address, which actually has *no* machine directly associated with it. DNS is a reasonable measure of the size of the internet, but it is hardly an authoritative one.

This isn't even counting DMZ machines (those outside the firewall) that are connected to the internet "directly" but don't have a DNS entry. Why would you want a machine like that? Well, how about IP addresses on routers? Would you want those in DNS? How about intrusion detection servers, which monitor incoming traffic for attempted break-ins? Do you really want to make yourself publicly known, making it easier for script kiddies to find you?

A better test would be an aggregate test of DNS reverse resolution, ping, and traceroute. I'm sure that there are many machines out there that are open to some of these but not all three.
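A rough sketch of such an aggregate probe, assuming a Unix-like box with the usual ping and traceroute binaries on the path (the commands, flags, and timeouts here are illustrative, not anyone's actual survey code):

```python
# Aggregate probe: for one IP, record which of reverse DNS, ping, and
# traceroute succeed. Any single check can miss a live host, so keep
# all three results rather than collapsing to a yes/no.
import socket
import subprocess

def probe(ip):
    results = {}
    # Reverse DNS: does a PTR record exist for this address?
    try:
        results["reverse_dns"] = socket.gethostbyaddr(ip)[0]
    except OSError:
        results["reverse_dns"] = None
    # Ping: one ICMP echo request with a short timeout (GNU ping flags).
    try:
        ping = subprocess.run(["ping", "-c", "1", "-W", "2", ip],
                              capture_output=True)
        results["ping"] = ping.returncode == 0
    except OSError:
        results["ping"] = False
    # Traceroute: did the trace complete at all?
    try:
        trace = subprocess.run(["traceroute", "-m", "15", ip],
                               capture_output=True, timeout=60)
        results["traceroute"] = trace.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        results["traceroute"] = False
    return results
```

A host that answers any one of the three probably exists; a host that answers none might still exist behind a filter, which is exactly the undercounting problem being discussed.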

According to this Netsizer site there are 18.2m users of the internet in the UK (click the Java world map, then go to Europe).

However, a recent consumer association (I think) survey reported that 40% of UK households now have internet access. That would make about 25m, and then you need to include the number of people that have access at work, etc.

Whilst the method they use may return an accurate(ish) report on the number of hosts on the internet, I can't see how they have extrapolated the number of users.

In a recent survey, 45% of those surveyed admitted that they lied in surveys.

Immediately following the posting on slashdot.org of the statistic of approximately 87 million hosts being connected to the internet, the count jumped to 186 trillion hosts. "How in the hell is this possible?" one spokesman was quoted as saying. "There aren't that many people on this whole damn planet, and Hell! There can't be _THAT_ many addresses under IPv4!" Logs indicate connections above and beyond the standard 255.255.255.255 range, showing such IP addresses as 1.4m.3l337.b147ch and 666.666.666.666. Federal officers have subsequently been summoned to investigate whether or not this is actually a function of a new Distributed Denial of Service [DDoS] attack such as the one that struck Yahoo! and other major sites recently. This phenomenon is being classed as a new variant of the well-known Trinoo and TFN, curiously labelled the "Slashdot Effect".

The error induced by the sample size is dwarfed by the error in the sampling methodology.

They presume reverse DNS implies IP address usage. This is not correct, of course. There are many machines that don't reverse-resolve. Also, there are many IP addresses that reverse-resolve but aren't there. The most glaring example is Lucent in their enterprise list [netsizer.com]: apparently, Lucent has 48 machines for each employee. Lucent will successfully reverse-DNS every IP that they are asked about, into something like h135-1-1-1.outland.lucent.com. Splitrock.net apparently has a similar scheme, although the naming method is a little more opaque.

When your estimate is 87 million, of which 8.3 million of your count are highly suspect, it's not the 3 per cent sampling error that you should be concerned about.
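One way to flag suspect counts like Lucent's is to forward-confirm the reverse lookup: resolve the PTR name back to an address and see whether it matches. A minimal sketch of that check (my own illustration, not what Netsizer does; a site whose wildcard names also resolve forward would still slip through):

```python
# Forward-confirmed reverse DNS: a PTR record alone doesn't prove a
# machine exists, so check that the name it returns resolves back to
# the same address.
import socket

def forward_confirmed(ip):
    try:
        name = socket.gethostbyaddr(ip)[0]        # reverse (PTR) lookup
    except OSError:
        return False                              # no PTR record at all
    try:
        addrs = socket.gethostbyname_ex(name)[2]  # forward (A) lookup
    except OSError:
        return False                              # name doesn't resolve back
    return ip in addrs                            # do the two lookups agree?
```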

Mine just dials my usual selection of ISPs. As it happens, one of them is a fixed-IP dialup (Demon Internet, since they have mobile phone access numbers for very fast connecting), and what I was pinging was definitely my phone, since the ping time was about 900ms from a cable modem and when I hung the phone up the responses stopped.

Since my freephone ISP (LineOne) is stopping access soon, I'll have to start paying for access again soon. :( Does anyone else know any particularly WAP-friendly 0800 ISPs in the UK?

I'm always curious as to why people are interested in the size of the internet. As long as it works, and people think it's running nicely, does it really matter?

I'm guessing you're not a network administrator. It works because the infrastructure can support all the traffic that's currently on it. If your infrastructure is built to support 10 billion hosts, and your survey reveals you have 10 million active hosts, you can relax.

On the other hand, if it reveals you have 900 million hosts, and you only had 500 million two weeks ago, you're in trouble and you need to get some new hardware, fast.

So aside from a general curiosity as to how many people out there want to download my mp3's, there's a legitimate reason for the 'net community to be interested and even concerned by the size and growth rate of the Internet.

I'm always curious as to why people are interested in the size of the internet. As long as it works, and people think it's running nicely, does it really matter? I can't see any competitors to the internet for various institutions to be battling down, so I'm assuming that these reports are issued as nothing more than a cheap way to get some hits on a reporter's website and to raise their meagre profile a little.

This only measures how many hosts are listed within DNS, not the total number of machines on the internet. It doesn't measure IPs used by dialups, machines behind firewalls, IP-masqueraded machines, etc. In other words, there are more than 87 million computers on the internet, quite a few more I would guess. In fact, I would say that the exact number is almost impossible to figure out.

I doubt that machines using IPs reserved for local networks (machines that therefore never can be reached directly from the Internet) really should be counted as "hosts on the internet"... (this is the case with masqueraded machines, etc)

Now that this survey has shown us how deep the penetration of the Internet is, let's figure out the stats for the real meaning of the net: PORNO!!!!!

Assuming that each host were to have 10 megs of original, non-duplicate pr0n online (yes, I know that's a very, very low estimate), with 87 million hosts out there, that would mean there are 870 terabytes of luscious, luscious pr0n out for your downloading pleasure! Excuse me while I check out the newsgroups...

"The most fortunate of persons is he who has the most means to satisfy his vagaries."

With all the mostly unused but allocated Class As and Class Bs that were given out long before we ever knew how popular the net was going to be, firewalls, masquerading, dynamic IPs and God knows what else, how good can sampling be on this network?

A general rule of thumb for errors in counting in a sample is sqrt(n).

Err... If I remember correctly from my stats class, the general rule for margin of error is actually 1/sqrt(n). Common sense says that as the sample size increases, the margin of error should decrease, so sqrt(n) doesn't seem right.

(IIRC, this is because we are sampling from binomial distribution (either an IP exists or it doesn't), where the margin of error in the normal approximation is given by z_star*sqrt(p_hat*(1-p_hat))/sqrt(n). Using z_star~2 for 95% confidence and p_hat=.5 in the worst case, this reduces to 1/sqrt(n)).

Anyway, a sample size of 150,000 is incredibly good, and I think margin of error will be so small that it's not worth calculating (yes I'm lazy). So a better statistical question is whether the IP addresses tested were a random sample of all possible IP addresses? (For example, I know that some addresses are reserved and may not be used, so it would be a mistake to sample such addresses.)
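For what it's worth, the formula above works out like this in plain Python (worst case p = 0.5 and z of roughly 2, as stated):

```python
# Margin of error for a binomial proportion, normal approximation:
# z * sqrt(p*(1-p)) / sqrt(n). With z = 2 and p = 0.5 this reduces
# to 1/sqrt(n), matching the rule of thumb quoted above.
import math

def margin_of_error(n, p=0.5, z=2.0):
    return z * math.sqrt(p * (1 - p)) / math.sqrt(n)

print(margin_of_error(150_000))  # ~0.0026, i.e. about a quarter of a percent
```

So the poster's intuition holds: at n = 150,000 the sampling error really is tiny, and the bigger question is whether the sample itself is representative.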

ipchains -P input DENY ... to be continued... long live IP address surveys.

This is a good point. If they do a scan at an IP address and none of the privileged ports are responding (accepting connections or indicating that they're closed), the best you can do is assume that there's no computer there. Right?

Does this measure all IPs, or just addresses in DNS, or what? If it's all IPs, then that means there are only 87,000,000 out of a possible 256^4 = 4,294,967,296, which means we are only using 2% of the possible address space. So why all the noise about IPv6?

Every host, router, web server, DNS server, FTP server, and every computer connected to the internet needs an IP. There are ways to share an IP, but I have to think that of all the hosts on the internet, more than 98% are just grandmothers browsing the web, which leaves a smaller and smaller amount for everyone else. The crunch is on: just try to get your ISP to give you a static IP or your own block without some explaining.

If I remember correctly from my stats class the general rule for margin of error is actually 1/sqrt(n). Common sense says that as the sample size increases, the margin of error should decrease, so sqrt(n) doesn't seem right.

1/sqrt(n) gives you the margin of error as a fraction; to figure out the number of IP addresses this accounts for, the calculation is 1/sqrt(n) * n, which becomes sqrt(n), which is what he was talking about.

Anyway, a sample size of 150,000 is incredibly good, and I think margin of error will be so small that it's not worth calculating

I would be willing to bet that we will need a major upgrade of the backbone in the next 3-7 years, simply because of the growth of the broadband market and the fact that the internet is catering to that market. I mean, imagine the volume of network traffic that /. produces through raw text alone, and then imagine what a site like Joe Cartoon [joecartoon.com] or Shockwave.com [shockwave.com] might produce with their Flash and other multimedia products being transported across the net.

What about those ignorant website administrators who filter out ICMP completely (because there are many evil attacks that use ICMP ;) and don't know what consequences it will have? (Besides not being counted, that is. ;) Samba Information HQ

I've whiled away many a dreary evening trying to guess at the amount of data available on the Web. Once upon a time I thought there were no more than maybe a couple of terabytes out there. Now I know that this is wrong by several orders of magnitude. I'm starting to think that we're on the order of tens of petabytes here, maybe more. The *useful* information is of course a tiny fraction of that.

The difficulty I think is that I have no concept of size above about a megabyte. It just loses all meaning, and becomes purely "big". The same applies to the count of the hosts on the net, or the number of people reading this. It means nothing more than a number to me...

You don't need to be running a "server" off of your pc in order to be an internet host. You just need to be a "computer" on the internet. Hell, you don't even need to be a computer. In my senior year, I developed software and hardware to put a coffee machine on the web.

All you need is an IP address and a TCP/IP stack (oh, and since they probably check for the existence of hosts using ICMP messages, you might want to have your host reply to ping requests in order to become part of this statistic).
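In that spirit, here's about the smallest software "host" you can write: something that owns a socket and answers a connection. (Ping replies need raw sockets, so this sketch answers TCP instead; the coffee-machine status line is made up for illustration.)

```python
# A bare-bones "internet host": bind a TCP socket, answer exactly one
# connection with a fixed status line, then go away. Anything that can
# do this much -- a coffee machine included -- is a host in the sense
# the parent post means.
import socket
import threading

def serve_one(message=b"coffee: ready\n"):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def handle():
        conn, _ = srv.accept()        # wait for one caller
        conn.sendall(message)
        conn.close()
        srv.close()

    threading.Thread(target=handle, daemon=True).start()
    return port                       # tell the caller where we're listening
```

Connect to the returned port with anything (telnet, netcat) and you get the status line back.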

A general rule of thumb for errors in counting in a sample is sqrt(n). So if they sample 150,000, the error is around +/-400. But of course, they then scale this up to apply to the internet with its ~2^30 (usable) IP addresses. So they're scaling by a factor of ~7200, and their error is of the order of 7200*400, or nearly 3 million.
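Spelling out that arithmetic (the 2^30 figure for usable addresses is the rough estimate above, not an exact count):

```python
# Back-of-the-envelope scaling of the sampling error. The counting error
# in a sample of n is roughly sqrt(n); scaling the sample up to the whole
# address space multiplies that error by the same factor.
import math

n = 150_000                        # addresses sampled per day
address_space = 2 ** 30            # rough usable IPv4 space, per the post
count_error = math.sqrt(n)         # ~387, "around +/-400"
scale = address_space / n          # ~7158, "~7200"
print(round(count_error))          # 387
print(round(scale * count_error))  # roughly 2.8 million, "nearly 3 million"
```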

It's something to be borne in mind when you see polls on TV. Frequently the sample is so small that any lead one party has is lost in statistical noise. Say in a poll of 400 people, you have a statistical error of 20, or in other words, 1 in 20 or 5%. Thus if, for example, in such a small-sample poll, Bush leads Gore by 8%, with a 5% error on each candidate's popularity, or 10% overall, it's statistically insignificant and doesn't show a thing.

I'd say that it is kind of like taking the census. People want to know the size of the internet in the same way that they want to know the size of the population. I'm just waiting until they try to figure out everyone's age/sex/race. Now THAT will be fun.

These behave very strangely. The TCP/IP stack on my Nokia 7110 does respond to pings (although I'd hazard a guess that not all models do). It does produce very odd results in nmap, which kicks up dozens of errors about unrecognised responses.

Remember that Nokia expects there to be a worldwide market for 500 million WAP-enabled phones!? That'll eat into the IP space.

...with most providers using their own gateway, generally the phones are allocated IPs in the private address space.

This does make some sense, since a random probe (you know, you get them all the time on any machine that's been connected for longer than 10 minutes) could upset this little (generally buggy) device. An nmap scan is enough to crash some of the early 7110s (which, btw, have had a lot of stack problems, so don't be surprised about different ping replies between different firmware versions). Even with always-on GPRS, it's likely some kind of cut-down and reshaped DHCP will be used, since even though the phones can stay online all the time, a lot of people don't leave them switched on all the time... :) Silly really; I have about 3 weeks' uptime on my Motorola Timeport. :)

This only measures how many hosts are listed within DNS, not the total number of machines on the internet. It doesn't measure IPs used by dialups, machines behind firewalls, IP-masqueraded machines, etc. In other words, there are more than 87 million computers on the internet, quite a few more I would guess. In fact, I would say that the exact number is almost impossible to figure out.

The Telcordia solution to quantifying Internet growth statistics is based on an internally developed unique sampling method. In this approach, over 150,000 randomly generated IP addresses are sampled on a daily basis and checked for their existence.

I haven't checked http://www.argreenhouse.com/netsizer [argreenhouse.com] ('cause it seems to be taking just forever to load), but they appear to be generating random quads and seeing if they exist, and running statistics against their findings over time.
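If that's right, the generator is presumably something along these lines: random dotted quads with the obviously unusable ranges skipped. (The exclusion list below is my guess and is certainly incomplete; a real survey would need a far more careful one.)

```python
# Generate random IPv4 dotted quads, skipping first octets that can't
# belong to ordinary internet hosts. Illustrative only.
import random

def random_quad():
    while True:
        octets = [random.randint(0, 255) for _ in range(4)]
        if octets[0] in (0, 10, 127):   # "this" network, RFC 1918, loopback
            continue
        if octets[0] >= 224:            # multicast and reserved space
            continue
        return ".".join(map(str, octets))

print([random_quad() for _ in range(3)])
```

Probe enough of these, divide hits by probes, multiply by the size of the usable space, and you have an estimate of live hosts, which is presumably the statistic they track over time.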

Now I know why she was complaining earlier when she tried to connect and it wouldn't. First I am going to sue you for the theft of the internet. Then I am going to sue AOL for trying to sell a product they no longer owned.

I always look at these kinds of stats with extreme prejudice. Just exactly how do they determine if a host is alive, where it is, how many IPs are on the same physical machine, etc? "Ping" is a VERY bad way to locate devices. It has the lowest possible priority (read: NONE) and in fact is ignored/filtered at most sites.

As for location... with the increasing use of non-IP transports for IP (read: ATM and SONET), it's even harder to guess where something is. Looking at the traceroute to one of my machines, one might think it's in CLT -- that's just the last place it was IP along the ATM path to the IDSL router 200 miles away. (Heck, ALL of Interpath's dialup -- three entire states -- looks like the machines are at RDU.)

I have to wonder if all of this hype about the ever-growing size of the internet is just blind optimism. Growth doesn't seem to have brought very much of any good. For every Slashdot or Linux kernel there's a thousand new pr0n or warez sites, a thousand badly designed web pages...

Corporations patenting obvious ideas left and right to try to gain some control over the network. So-called "intellectual property concerns" become dominant features in internet policy making. Commercialism seems more dominant than community. What happened to it all? Is there any hope for a populist revival to restore more of the old community feel?

"-iR This option tells Nmap to generate its own hosts to scan by simply picking random numbers :). It will never end. This can be useful for statistical sampling of the Internet to estimate various things. If you are ever really bored, try nmap -sS -iR -p 80 to find some web servers to look at."

The only difference is that most normal people aren't bored enough to keep going after the 500th or so 403 Forbidden error.