Posted by Soulskill on Wednesday October 14, 2009 @09:09AM from the as-long-as-he's-proud-of-the-dots dept.

Stony Stevenson writes "A light has been shone on one of the great mysteries of the internet. What is the point of the two forward slashes that sit directly in front of the 'www' in every internet website address? The answer, according to Tim Berners-Lee, who had an important role in the creation of the web, is that there isn't one. Berners-Lee revisited that design decision during a recent talk with Paul Mohr of the NY Times when Mohr asked if he would do any differently, given the chance. 'Look at all the paper and trees, he said, that could have been saved if people had not had to write or type out those slashes on paper over the years — not to mention the human labor and time spent typing those two keystrokes countless millions of times in browser address boxes.'"

From a technical point of view, *not* having the // could create problems more easily. For example, if you include a port number in the URL and the browser or program tries to work out the protocol from whatever comes before the first colon:

http://tech.slashdot.org:80/story/09/10/14/1219215/Tim-Berners-Lee-Is-Sorry-About-the-Slashes
http:tech.slashdot.org:80/story/09/10/14/1219215/Tim-Berners-Lee-Is-Sorry-About-the-Slashes

Now if you don't write that http: in the browser:

tech.slashdot.org:80/story/09/10/14/1219215/Tim-Berners-Lee-Is-Sorry-About-the-Slashes

Now the browser would think the protocol is tech.slashdot.org and would try to pass it to a responsible program instead of loading it. This means you would now need to actually type in the http:, which none of us do now. Or browsers, IM windows and any other programs would have to drop general URI support (you know, all those irc://, spotify: and so on URIs). Or else typing in the :80 would be mandatory.
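The ambiguity can be sketched in a few lines of Python. Both helper names are hypothetical, and the "everything before the first colon is the protocol" rule is only an assumption about how such a program might behave, not a description of any real browser.

```python
def naive_scheme(address):
    """Treat whatever precedes the first ':' as the protocol (hypothetical rule)."""
    scheme, sep, _rest = address.partition(":")
    return scheme if sep else None


def scheme_with_slashes(address):
    """Only accept a protocol when it is followed by '://'."""
    scheme, sep, _rest = address.partition("://")
    return scheme if sep else None


slashless = "tech.slashdot.org:80/story/09/10/14/1219215/"
print(naive_scheme(slashless))         # the hostname is mistaken for a protocol
print(scheme_with_slashes(slashless))  # None, so a default such as http can apply
```

With the "://" marker, a missing protocol is detectable even when a port is present, which is the point made above.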

But, thinking of that.... many many pieces of software allows you to write URLs directly in a body of text, no tags needed, and finds the URLs and turns them into links, but searching for "://". So, what would you regexp for if all you had was a ":"? Normal text quite often does contain colons....
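A rough illustration of that linkifying problem, assuming a deliberately simple pattern (real linkifiers are more careful, and both pattern names here are made up for the example):

```python
import re

# Hypothetical pattern: a scheme name, then "://", then non-space characters.
URL_WITH_SLASHES = re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*://\S+")
# What is left to search for if the marker shrinks to a bare colon.
COLON_ONLY = re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*:\S+")

text = "See section A:2 and http://slashdot.org/faq today"

print(URL_WITH_SLASHES.findall(text))  # only the real URL
print(COLON_ONLY.findall(text))        # also picks up "A:2" from ordinary text
```

The colon-only pattern has no way to tell a section reference from an address, so it would need extra heuristics that the "://" marker makes unnecessary.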

So, what would you regexp for if all you had was a ":"? Normal text quite often does contain colons....

Back in the early 1980s, before the folks at CERN gave us the first browser, there was another notation that was implemented by an assortment of networking software. It originated, as far as I can tell, with The Newcastle Connection (from the U of Newcastle-upon-Tyne in England), one of the first fully-distributed unix file systems. What it did was to define a conceptual network directory one level above your root directory, named "/../". So to reference a file on machine X.Y.Z, you'd use a path like "/../X.Y.Z/...". The actual server on each machine typically wouldn't export its "/" directory, but rather would do what web servers do, and supply only a server-root directory (which could also be mounted by other machines by the unix mount command). So if you tried to access the file /../X.Y.Z/some/dir/foo.txt, you'd get the file that the remote machine had at /server-root/some/dir/foo.txt, so files outside the /server-root/ directory would be invisible to outsiders.

This is, of course, merely another syntax for what the WWW calls "http://X.Y.Z/some/dir/foo.txt", but without the protocol field. The TNC implementation made the file readable or writable, depending on what the permission module allowed, via the usual open(), read(), write(), etc. library routines. This meant that all of the software on your machine was automatically able to use accessible files on other machines without any special coding. As with the Web, you just needed the machine name and the file's location relative to the server-root directory.

The advantage of the Web's "http://" notation, of course, is that it allows the explicit use of different protocols. TNC's "/../" notation doesn't do that; the implementation gives direct access via the usual file-system routines, and hides the comm protocol inside the kernel's file-system code just as is done with local file I/O.

Note that the "/../" notation isn't any more difficult to match than "http://", and it's a string that's equally unlikely to occur anywhere but in a TNC-style file reference. And note that there's no problem with adding a ":port" to the machine name with either notation.

I've sometimes wondered why various browsers, especially the Mozilla suite, haven't quietly implemented TNC notation and invited users to start using it. You don't need permission from any standards body to do this. It would only take a few lines of new code, wherever the software parses URLs. You'd have to add "/\.\./" as an alternative to "(\w*)://" at the start of the match, and make 'HTTP' the default protocol if omitted. While you're at it, add another * after the //, so omitting the second / will also work. But that's probably too user-friendly for any real web developer to bother implementing. ;-)
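For what it's worth, the change described above might look something like this in Python. This is only a sketch: the pattern, the `normalize` helper and the choice to rewrite everything into "scheme://host/path" form are assumptions made for illustration, not code from any browser.

```python
import re

# Either "/../" (TNC style) or "scheme:/" with an optional second slash,
# followed by the machine name (with optional :port) and the path.
ADDRESS = re.compile(
    r"^(?:/\.\./|(?P<scheme>\w*)://*)(?P<host>[^/]+)(?P<path>/.*)?$"
)


def normalize(address):
    """Rewrite a TNC-style or sloppy address as scheme://host/path."""
    m = ADDRESS.match(address)
    if m is None:
        raise ValueError(f"not a recognised address: {address!r}")
    scheme = m.group("scheme") or "http"  # HTTP is the default if omitted
    path = m.group("path") or "/"
    return f"{scheme}://{m.group('host')}{path}"


print(normalize("/../X.Y.Z/some/dir/foo.txt"))
print(normalize("http:/X.Y.Z/some/dir/foo.txt"))
print(normalize("ftp://X.Y.Z:21/pub"))
```

All three forms come out in the familiar notation, with the first two both mapping to http://X.Y.Z/some/dir/foo.txt.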

(Actually, I've done this in a few projects that I've worked on. It doesn't break anything, and when people see that notation, they usually really like it and the new conceptual model of the Net that it puts into their mind. The Net becomes just a large, slow bus connecting millions of machines and their disks, joining them into one huge virtual computer. Replacing a big, messy communication protocol with a big, tree-structured file system gives a major reduction in complexity and points to a much easier way to do things.)

Don't be an ass. Using ports allows someone to set up an ad hoc server for testing or whatever easily. The last thing they want to do is dick about having to update DNS's bastard child before they can access it from the browser.

Don't be an ass. Using ports allows someone to set up an ad hoc server for testing or whatever easily. The last thing they want to do is dick about having to update DNS's bastard child before they can access it from the browser.

One: I don't think the GP was the one being "an ass."

Two: I don't think the GP was suggesting that there be no way at all to force your browser to another port for testing purposes (at the very least, a command-line option to browsers could be provided). The point was that for general usage when talking to remote systems, there's no reason not to use DNS, and it would have solved one of the largest problems with IP proliferation: the need to lock SSL to a specific port, due to the fact that the URI used is

Then you still need to have a format that allows programmatic parseability when the port is required. I mean, I have no problem with using more DNS records for port numbers, but the format of the URI needs to account for all supported possibilities, not just the common ones. That's why the OP suggested getting rid of the port number ability when getting rid of the double slash, but when suggesting to allow a default but still allowing overriding, you get back to finding a mechanism that can be determinist

Or just use a different punctuation character for ports. If you think about how the design of URLs could have been better, other decisions are not cast in stone. The ':' also clashes with the separator in IPv6 addresses (which is an oversight on the part of those who designed IPv6).

I've always liked that idea (which I've seen before) because it treats subdomains in the same way it treats subdirectories under the document root, which opens a lot of possibilities for creative web hosting schemes (e.g., Slashdot could have put this story on a site at slashdot.org/tech, but when that server's load became high enough to justify a separate host the site could move to tech.slashdot.org - there would be no difference between /tech and tech. from a navigational standpoint).

Also, the dot-com boom would have been the com-slash boom, which would have been much cooler.

I disagree with you. You are thinking in terms of what we have today. So if you removed the slashes today we would have issues. If, however, they had designed the system to use something other than the //, they would have created a new convention to avoid easily created issues. IM windows and other programs like IRC would have also used a different convention. They use the current convention because that was what they designed their systems for.

Because a URL is not just for webpages - it's used for other protocols too. Just like you can click on an http link in another program, you can click another program's link in your browser. This can be irc://, mailto:, spotify: and so on. And there are many instances where websites have such links to launch an external application directly.

My examples showed that the standard :port option, which is optional, would have to be removed for it to work. Or it would have had to be made mandatory, or the : changed to something

If we're mobilizing the torch and pitchfork mob, I'd rather send them after the person (Bill Gates???) who decided in MS-DOS to substitute backward slashes where convention for a long time had been forward slashes as the separator in pathnames

DOS got it from CP/M. I don't know why you think the UNIX way of doing it was the convention. Macs and a lot of microcomputers used colons, and a lot of mainframe and older time-sharing operating systems used dots, as did VMS.

I understand that. But my point is that if your public website does not route from domain.com to www.domain.com automatically, you do not know what customer service is. The fact that you can also have different services running at support.domain.com and docs.domain.com is beside the point.

Probably the single biggest reason to have domain.com redirect to www.domain.com is so that you can have specific domains for resources, like images.domain.com... Further the reason for this is to limit the unneeded transmission of cookies. If your website uses domain.com, then using images.domain.com serves very little good as any cookies assigned to domain.com will be sent with each request. Using www.domain.com, scripts.domain.com and images.domain.com extends the ability to have localized (or no) cookies in the resource-only domains. This is less necessary as more and more browsers are breaking older standards and allowing more than two connections to a given host name at a time. (IE8 extended it to six by default, and IIRC firefox already exceeded this limit as well).

Regarding the port, this is very much necessary for development purposes, as using high-range ports is the most reasonable method of allowing a User to run a development server instance.

I used my time modem to log in to the Internet3 in 2022 and pulled this review from cdweggbuy (yes, that's a full URL, because people thought it was OK to remove gTLDs and also got rid of that pesky http://) for a VeriLogiSoft Computer Interface device. But of course I got infected by a future virus because my Firefox plugin that matches malicious content didn't know how to identify it as a URL.

Ok back to the present.

The problem with letting people have what they want is that the majority of people don't understand why things are the way they are. Tim made the right choice; he just feels that it is wrong now because he's had to hear people complain about it for the past 15-20 years. But when it comes down to it you need some parts of a URL to indicate what something is.

I love that a post that begins "I used my time modem..." can be modded as Insightful. God bless you, you crazy mods.

And crazier Slashdot admins. Because they want to discourage smart-ass comments [slashdot.org], "Funny" gives no karma on slashdot.org. An alternating sequence of "Funny" and the allegedly M2-proof "Overrated" quickly drains a poster's karma. "Insightful", on the other hand, invites no such danger.

ObTopic: Without the slashes, Slashdot would have been called something else.

There are a number of https websites I have used/use that (for whatever reason) don't automatically redirect if you simply type the web-address. Hence you have to manually type "https://..." to get the secure site.

No, it's a joke on the URL syntax. You read it "H T T P colon slash slash slash dot dot org." The FAQ addresses this somewhere, and reluctantly admits that, although funny, it was perhaps an ill-thought-out joke since it does make it difficult to speak the URL aloud without confusing your listener. /faq/slashmeta.shtml#sm150 [slashdot.org]

I'm pretty sure they are sorry about that. I can't remember who it was, Paul Allen maybe? But one of the early MS programmers said once that he hugely regretted using / for switches in DOS 1.0. When they added directories in a later version, / was already taken so they had to use \ instead.

MS-DOS was a copy of CP/M. CP/M used / for switches so MS did the same. Maybe CP/M machines didn't have a backslash? I know they sure didn't have a pipe command. Also, QDOS was designed to run on S-100 bus machines and used terminals. Microsoft bought it and made it into PC-DOS and then made it into MS-DOS.

Well, they added directories in MS-DOS 2 and had already used forward slashes for switches in MS-DOS 1, so what could they do? Can someone older than me confirm that they 'researched' the slash for switches from CPM?

Nah. Slashes are fine, but Microsoft should be sorry about backslashes!

Thanks for using the proper terminology. There are slashes, and there are backslashes. There are NO 'forward' slashes. And though Microsoft is culpable, I blame Windows users for unnecessarily complicating the language in a vain attempt to sound like they know what they are talking about. Those extra seven letters are superfluous. It's like saying "CPU processor" or "DNS server." If I see it used again, I'm really going to forwardflip out!

What's worse is people who say "forward slash". There is no such animal. It's either a backslash or a slash. Does anybody say full colon as opposed to a semi-colon ? I use it as a natural filter against people who don't know what they're talking about.

I use the term "forwardslash" fairly frequently, because a good number of times when I say "slash" people ask "which one?" While "slash" and "backslash" are technically correct, "forwardslash" is a descriptive synonym for "slash". Yes, it adds unnecessary syllables, but it's not nearly as bad as the myriad (and sometimes very ambiguous) names for "*" (asterisk, star, splat, bang, etc) and "#" (number, pound, hash, octothorp, etc).

I do not use "full colon" except when I've had too much curry and am waiting in line at a restroom asking the person in the stall to please hurry up lest they exit the stall into a sudden Superfund Site.

Doesn't the same logic hold for the person that decided it should be 'http' for hypertext transfer protocol and not just simply 'h'? Yes, http is more descriptive but unnecessary. Had another protocol come along starting with 'h' they could have opted for another letter or -- if they were all taken -- gone to a two-letter protocol name. I mean, if we're going to get into pedantic apologies for lack of brevity I would assume the three unnecessary letters in http are a greater crime than the double slashes, right? Of course, rarely do I find myself typing anything other than the domain and TLD (i.e. slashdot.org, mail.google.com, woot.com) so this has really become a non-issue.

Come on man, humanity chooses wrong paths all the time with the best of intentions, because none of us (apart from you apparently) can predict the future. We do our best to evaluate the future results of our actions, but our foreknowledge is always sketchy at best.

I think it's interesting to be able to talk to someone who picked something that affects so many people on a daily basis. Of course, it's a really tiny effect, but very visible. He could have picked two colons or dollar signs or any random thing. It's not often you get to make a decision that ends up being used globally.

Back when I wrote a thesis on dissemination of company-internal information via the world-wide web, in 1994 or so, I remember stating that originally, an indication of which network protocol to use was meant to go between the slashes. But since, in the real world, the network protocol was always TCP/IP, this was made the default and whatever was once put between the slashes was dropped.

As far as I understand, it was never envisioned that users would actually type "http://www.whatever.com" in an "address bar"; users were not supposed to see this at all - it was purely to be used by software and mark-up pages to specify the protocol.

I remember stating that originally, an indication of which network protocol to use was meant to go between the slashes.

I don't think so, since the double slashes only apply to Internet schemes anyway. RFC1738 says:

//<user>:<password>@<host>:<port>/<url-path>

Some or all of the parts "<user>:<password>@", ":<password>",
":<port>", and "/<url-path>" may be excluded. The scheme specific
data start with a double slash "//" to indicate that it complies with
the common Internet scheme syntax.
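As a quick illustration of those optional parts, Python's standard `urllib.parse.urlsplit` can pull them out of an address. The host name and credentials below are made up, and the module follows later URL specifications rather than RFC 1738 to the letter, so treat this as a sketch of the layout only.

```python
from urllib.parse import urlsplit

# All optional parts present (hypothetical host and credentials).
full = urlsplit("ftp://user:secret@ftp.example.com:2121/pub/file.txt")
print(full.scheme, full.username, full.password,
      full.hostname, full.port, full.path)

# Everything optional left out: only the host after the double slash.
minimal = urlsplit("http://www.example.com")
print(minimal.username, minimal.password, minimal.port, repr(minimal.path))
```

In the minimal case the user, password and port all come back as `None` and the path is empty.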

Back in the '80s, the double slashes were invented to indicate that the following token was a machine name and not a local directory or local mount-point. The first time I met the double slashes was on an Apollo workstation, which ran Domain, one of the first OSes where you could access remote files and local files without special software--it was built into the OS. At the time, on UNIX, you had to use commands like FTP, or RCP. On Domain, you could also make a soft symlink on your local computer that pointed to another server, so you could move directories around the network and the local programs didn't need to change. (I would not be surprised if the double slashes came from DEC VMS, but I don't know.)

Compare syntaxes:
cp //machine1/dir/dir/filename //machine2/dir/dir -- copy a file from machine1 to machine 2
cp /dir/dir/filename //machine2/dir/dir -- copy a file from the current machine to machine 2

In RCP the syntax was rather more cumbersome:
rcp user@machine1:/dir/dir/filename /dir/dir/filename -- copy a file from machine1 to my local machine
In RCP, the assumption is that a path name is a "remote" path if it contains the character ':'.

Windows NT and Novell Netware both used double slashes to denote machine names, although Novell's implementation wasn't originally transparent to application programs. Because of the history of PC-DOS, they used backslashes instead of forward slashes.

I had occasion to have an email conversation with Berners-Lee at one time (he bought a license for a program of mine), and I asked if he regretted choosing "www" instead of "web". I was very surprised that this was not something he'd change if he could do the whole thing over...

Saying "double u double u double u" takes about twice as long as saying "web" so that would have been far more beneficial than worrying about the slashes.

There was a bit of a drive to use "web" some years ago, but unfortunately that fizzled..

WWW is no quicker to type than web, and in fact web is more natural to type quickly because my hands can pre-prepare the "e" and "b" while I'm still pressing the "w", and I think that's the same for anyone who's done any decent amount of typing in their lives (i.e. almost everyone over the age of 18 by now!)

I think web is a better idea, in retrospect, but I can't remember the last time I typed www either - it comes naturally and I don't even notice, but http:// is still a pain in the bum to speak over the phone, especially when people aren't used to the syntax.

Neither, I haven't dealt with a domain in more than a year that didn't either automatically redirect foo.com to www.foo.com, or had the web server running on the foo.com host itself. I frankly don't know why anyone uses protocol-specific subdomains at all any more. Either ftp.foo.com and www.foo.com are the same machine or they aren't. If they are, then there's no reason to have the subdomains because the machine is already listening on ports 21 and 80. If they aren't the same machine, then at best only the

What I wonder is why the designers of DNS put the name in reverse? If the name had been in most-significant-first order, one could have tabcompleted it properly (using history and maybe zonetransfers of smaller zones). Also, if http had included a way to get _parsable_ directory listings, the tab-completion could have gone even further...

My guess is that having the domains in that order allows you to copy them directly to/from DNS packets. And the reason for the order in the DNS packets is that it allows compression by back-references. Roughly, if a packet contains multiple names:
some.domain.example.com
other.domain.example.com
they can be transmitted like:
some.domain.example.com
other<go back in packet at offset X>
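A small sketch of that back-reference idea, loosely modelled on the label-and-pointer layout DNS messages use. The `encode_names` function is hypothetical, offsets here count from the start of this buffer rather than from the start of a real DNS message (which has a header first), and a 14-bit pointer can only reach offsets below 16384.

```python
def encode_names(names):
    """Encode names into one buffer, reusing suffixes that were already written."""
    packet = bytearray()
    offsets = {}  # suffix (tuple of labels) -> offset where it starts
    for name in names:
        labels = name.split(".")
        for i in range(len(labels)):
            suffix = tuple(labels[i:])
            if suffix in offsets:
                # Two-byte pointer: top two bits set, then a 14-bit offset.
                pointer = 0xC000 | offsets[suffix]
                packet += pointer.to_bytes(2, "big")
                break
            offsets[suffix] = len(packet)
            label = labels[i].encode("ascii")
            packet.append(len(label))  # length byte, then the label itself
            packet += label
        else:
            packet.append(0)  # a zero-length label ends a name with no pointer
    return bytes(packet)


encoded = encode_names(["some.domain.example.com", "other.domain.example.com"])
print(len(encoded), encoded)
```

The second name costs only its first label plus a two-byte pointer, because "domain.example.com" is a trailing part of a name already in the buffer. That sharing works precisely because the most general part of the name comes last.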

Essentially, every line ends with one byte (in order to save space) stating how many of the following lines are direct subdomains of this one. The last line would thus identify org.another.yet.blah. Granted, this would be slightly more computationally intensive than the currently used one.

Come to think of it, if space and computational complexity is that imp

People don't consider the TLD to be most significant, and especially did not consider it so before any domain names used anything other than .com or maybe .edu/.net/.org.

Unless you happen to use both slashdot.com and slashdot.org (and .net, etc.), odds are you can almost always tab complete the name "sla" and avoid ever typing "com." or "co". The TLD is essentially totally insignificant, and would have to be typed every single time if the order were reversed.

I have to say that now I regret that the syntax is so clumsy. I would like http://www.example.com/foo/bar/baz [example.com] to be just written http:com/example/foo/bar/baz [com] where the client would figure out that www.example.com existed and was the server to contact. But it is too late now. It turned out the shorthand "//www.example.com/foo/bar/baz" is rarely used and so we could dispense with the "//".

It might be habit, but IMHO www.example.com looks much more natural than com/example on any media, from business cards to TV commercials. We use dots in normal writing, not slashes.
And what's better at highlighting a brand: org/slashdot or slashdot.org?

Luckily B-L got it right the first time. Maybe the web wouldn't have had all this success if he had designed the addresses the other way.

What's natural about www.example.com? It looks nothing like any other kind of address. Phone numbers maybe but those are rendered in a very non-uniform way and ".", "/", "-" and " " are all very common separators there.

You just feel that that format is natural for domain names because that's the way domain names are usually written. If Berners-Lee had gone with com/example/www, you'd find "//com/example" to be the natural format.

What I wonder is why the designers of DNS put the name in reverse? If the name had been in most-significant-first order, one could have tabcompleted it properly (using history and maybe zonetransfers of smaller zones).

Not only that, but it would go a _long_ way towards preventing phishing scams.

People manage OK with directories being a nested list, and there is a certain unnamable protocol which uses names that way around. Unfortunately, we're stuck with the backwards system in use now, so there's no point worrying about it.

Everybody knows that the parent of C:\ in windows is Desktop. Which is located somewhere in C:\ making Desktop its own great grand parent. What kind of sick twisted incestuous OS are they pushing on the world?

It is explained by TBL at http://www.w3.org/People/Berners-Lee/FAQ.html#etc

"I wanted the syntax of the URI to separate the bit which the web browser has to know about (www.example.com) from the rest (the opaque string which is blindly requested by the client from the server). Within the rest of the URI, slashes (/) were the clear choice to separate parts of a hierarchical system, and I wanted to be able to make a link without having to know the name of the service (www.example.com) which was publishing the data. The relative URI syntax is just unix pathname syntax reused without apology. Anyone who had used unix would find it quite obvious. Then I needed an extension to add the service name (hostname). In fact this was similar to the problem the Apollo domain system had had when they created a network file system. They had extended the filename syntax to allow //computername/file/path/as/usual. So I just copied Apollo. Apollo was a brand of unix workstation."

I *like* unique, easily-visually-identifiable structures. a@b.c is an email address. If you're in the U.S. you know that XXX-XXX-XXXX is a phone number and that XXX-XX-XXXX is a social security number. You know that X/Y/Z is a date, even if it's not always clear if it's M/D/Y or D/M/Y.

"://", while verbose, is very clear and you always know EXACTLY what it is and what it means--that it's the START of a COMPLETE Web address. If it would have been just a : or a / it wouldn't always be clear because those symbols, by themselves, are often used elsewhere and it would lead to confusion.

Now if we could just teach a planet full of lusers the difference between "slash" and "backslash." People always say "backslash" because they've heard computer guys say it every so often when talking about logging onto MS servers so they call EVERY slash a backslash. Damn you Paul Allen!!! [nytimes.com]

Ever since I started working as a contractor for the Air Force, I've been using DDMMMYYYY for my dates (e.g. 14OCT2009), and technically, it is shorter than MM/DD/YYYY by a single character. It's also less ambiguous for all parties involved as not everyone has MM and DD in the same location.

As for web addresses (to stay sort-of on topic...), most people do not use their web browser for anything other than http addresses. So the http:// is automatically filled in for them; worrying about whether they were

OK, maybe it could have been reduced to one slash, since there's no :/ smiley elsewhere in the URL pattern, but you need to be able to distinguish relative URLs from absolute ones. Without some unique token sequence that was guaranteed not to occur elsewhere in a URI you're going to run into problems. Start removing components from a fully specified URI and see how quickly you run into ambiguities:

The reasons for the // convention for the "super root" in networks like OpenNet and FutureNet, that he was copying, are still valid in URIs. You need something that's easily parsed by computers, and easily recognized by humans. When I first saw the syntax I was all "slash slash whiskey tango foxtrot?", but after using it for a while I was convinced that I was wrong and he was right, and even if he's forgotten why... I still think he was right the first time.

I agree, the // does serve a purpose. Having a marker for the start of the hostname makes it possible to construct a scheme-agnostic URL.

Suppose you had a web page that might be served via either HTTP or HTTPS. You need to ensure that any resources (images and stylesheets) it references use the same protocol, else the browser will warn of a secure/insecure mix. Suppose also that the resources are hosted on a separate server (a common performance-enhancing technique).
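The scenario above can be tried out with Python's standard `urllib.parse.urljoin`, which resolves a reference that starts with "//" against whatever scheme the page itself was loaded with. The host names are placeholders, and browsers apply the same kind of resolution to links inside a page; this is just a sketch of the behaviour.

```python
from urllib.parse import urljoin

# Scheme-agnostic reference to a resource on a separate (hypothetical) server.
resource = "//static.example.com/css/site.css"

over_http = urljoin("http://www.example.com/page.html", resource)
over_https = urljoin("https://www.example.com/page.html", resource)

print(over_http)   # http://static.example.com/css/site.css
print(over_https)  # https://static.example.com/css/site.css
```

The same markup works for both versions of the page, so there is no secure/insecure mix to warn about.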

Many website addresses don't begin with "www", including the address of the page you're currently reading.

The physicist admitted that if he had his time again, he might have made a change, or more specifically, two.

Well, what's the other one? I'm waiting, don't keep me in suspense here... (Not to mention, correctly speaking he would have done it differently, not have made a change.)

"Boy, now people on the radio are calling it ‘backslash backslash’," Sir Tim told his audience, even though he knows they are, in fact, forward slashes.

He does? Whew, glad they cleared that one up.

Showing them his index finger he added: "People are having to use that finger so much."

I type the slash key with my pinky finger, not my index finger. I even checked the British keyboard [google.com] to make sure it's not a culture disconnect. The British keyboard seems to have it in the same place as the keyboard I'm familiar with.

He knows that no one has calculated the number of exasperated groans emitted at the sight of a "syntax error" message generated by the grave omission of a single slash.

I've never seen such an error message, and both Firefox and IE correctly convert http:/google.com to http://google.com [google.com].

Nowadays web browsers such as Explorer

Explorer is Windows' file system manager. The web browser is called Internet Explorer.

the British scientist who created the world wide web

Sir Tim Berners-Lee, who wrote the code that transformed a private computer network into the web two decades ago

The physicist is credited with being the architect of the world wide web, which was to transform the internet into something usable and understandable by more than just computer programmers.

Shouldn't they say it a third time, in case someone missed it the first two times?

Today the URLs — better known as web addresses — that Sir Tim created, beginning http://www, are familiar to anyone navigating their way around the internet.

Every time I set-up a sub-domain for work I always have to tell my boss "http://subdomain." out loud first, in the hope that he'll not prefix "www".

Sometimes he still just does both, then asks me why it isn't working. This results in a lengthy conversation where we're both saying "http colon slash slash" and "www" to each other. Makes me want to stab him in the face.