Posted
by
CmdrTaco
on Wednesday November 22, 2000 @03:31PM
from the stuff-to-think-about dept.

Whanana sent us an article about information objects as visualized by Robert Kahn.
The article is written from a fairly childish place (it explains DNS for crying out loud, and the bulk of it is a history lesson obviously designed for a mainstream paper) but Kahn's Digital Object Identifier concept is interesting. If anyone has links to RFCs and the like, please post them in the comments.

This may be a dumb question, but is this "Handle" approach compatible with URIs?

It sounds like a good idea to abstract the identifier for a document (or whatever) from its location, but will this mean an incompatibility with current Web names? Are we going to have two different standards of access to information resources, or can they peacefully coexist?

Personally, I love the Franklin quote:
'We must all hang together, or assuredly we shall all hang separately.'

If you read the MSNBC article carefully, you notice a few scary things mentioned, like "[it] is using it to build the Defense Virtual Library" and "another problem is with copyrights and other protections of intellectual property."

First of all, what's scary about the DoD putting its library on-line?

Second of all, only people who create nothing think that the creative work of others should be free. If copyright holders want to be able to track their work and make sure that their work is only available to people who have acquired a license, I don't see a problem. In fact, it will be a HUGE help to individual authors/musicians/artists/whatevers, since they can take care of managing distribution all by themselves without needing a big company to handle it. Of course, promotion is still an issue, but that's another debate...

If you want to be a thief, you'll hate this. If you want to actually use the net to find stuff and be reimbursed for the things you create, you'll love it.

The idea of objects being passed around by handles is the original concept for the Internet as espoused by Dr. Alan Kay. This is how he originally envisioned Object-Oriented information models. Now the Internet is being re-invented to change it from a simple collection of connection paths to a real highway where real self-contained objects can be passed around. This may be better, maybe not. I guess it depends on how it's implemented. If each object has to be accompanied by a slew of "helpers" to allow the receiving node to interpret it, this could get ugly. But if a single, open method is used, this could be beautiful. Imagine a fully portable object going from platform to platform, totally transparent to the user!

Of course, it'll have to compete with .NET, and I just hope the geniuses who are behind this idea don't get mown down by Microsoft's marketing muscle.

A central database would not necessarily have the same problems as our current DNS system if OIDs were not human-readable. Unfortunately, there would still be two serious problems:

(1) Who would do the human-readable -> OID translation?

(2) Using a centralized database to find things would make censorship really easy. I've seen a lot of people here asking "who would own the centralized database." This question is totally irrelevant, as any government would strongly regulate the database owners. The real question is "what country would be able to pass laws about it?", i.e. whose version of censorship are we going to force on the world.

First, there may be a solution to (1), but it's not totally clear how to implement it. Specifically, you need a "philosophical" cross between search engines and alternative DNS servers. I do not see how to do this, but it seems like you want the "authoritative" qualities of DNS while allowing people to switch as easily as going to a different search engine.

Second, the only real solution to (2) is to eliminate the centralized database. Actually, you really should just junk all this guy's ideas and use Freenet. Now, information on Freenet is not permanent, but there are solutions to that too. Specifically, get people to permanently rehost things they think are important.

Anyway, issue (1) is central to Freenet too, so there is really no point in even considering this guy's proposals. Freenet beats them in every way.

I've looked everywhere... can't find anything current. Most info on Xanadu stops at about September 1999. Hell, I even submitted an "Ask Slashdot" about it.

Had no idea that some people hated it so much (see first reply to "What about Xanadu"). Is it really that crappy? Damn, and I thought that it would rock and it would change a bunch of stuff or something.

This is damned difficult! The idea of labeling information instead of data is a good one - but we need to sort out what is information, what are labels, what is data, how to make it all work, and why!

I think it would be great to be able to access the closest copy of an article (or music, or the drawings of a historical organ, or the latest Linux kernel) without worrying whose computer it is on, or whether they have moved it to a different location.

As far as I can see, this scheme does nothing towards solving the (admittedly real) problems of intellectual property. If I can fire up nslookup or one of its relatives, and translate the ID to a URL and then to an IP address and a filename, then at most it can obscure the path to direct access. And we all know how badly "security by obscurity" has performed...

This brings up the whole philosophical discussion of what is information, and how it can be or should be owned or controlled. Not all information wants to be free - at least my credit card number wants not.

No matter how the legal and philosophical discussions go, this scheme may provide a valuable tool for identifying information, and I see that as something positive. But will it take off? Only time will tell.

Organizing and accessing my data by object/content over my own network would be pretty cool. However, in the "wild" of the Internet, I don't necessarily need some central organizer knowing the "what" of what I am accessing. I guess I must be a privacy paranoid.

However, just because selfish and immature cynics exist doesn't mean that real radicals don't.

I'm guessing that real radicals like to eat and have shelter. If they are going to give away what they produce for the Greater Good, they have to live off of the money/goods/etc. from other people, and they can get it either by force or the good graces of other people. The second option is wonderful in theory, but the first is more common in practice.

The sole reason why the Open Source movement exists is students and professors who have been living off of parents and/or government grants.

Granted, there are companies which are trying to make money from Open Source projects, but they are trying to profit from obscurity: their products are so hard to use that people are willing to pay for support. I don't see that happening with books or music any time soon, and when people start putting easy-to-use interfaces on these Open Source products, these companies are sunk.

I've written code in my spare time which I've given away, but if I (or my company) were unable to make money from the code I write for them, I wouldn't be writing code, and I probably wouldn't have the spare time to write code that I give away. As much as I love to code, I love to take care of my family even more.

If there is already a DNS root in place, why not visualize setting up a central object database at each specific site, e.g. cod.slashdot.org/whatever.object.you.want? That way, each individual site would administer its own object database. This would eliminate the need for standardizing the server systems and would take the power away from a central authority (that could possibly screw it up...).

Where does the censorship come in? It sounds to me like the owner of the domain (in the example given, MSNBC) would be responsible for maintaining the object ID table. DNS would resolve the domain name to an IP address, then the handle would be resolved by the object ID table that resides on that domain's server. Different than how it works today, but I don't see any new censorship opportunities.

Can you please explain where the possibility for censorship lies? I think I'm missing something here.

Of course most real radicals enjoy eating and having shelter. Putting patentable ideas into the public domain, or GPL'ing them, doesn't prevent the creator from making money by being first to sell products based on them or by creating better products based on those ideas than others can make.

Many Open Source developers are students and professors, it is true. But there are others: Linus Torvalds has a day job and still finds time to direct kernel development, the KDE team is largely made up of people who work for TrollTech, and there are many, many sysadmins who open-source tools they have created to help themselves in their jobs. Furthermore, we can assume that there are *some* developers who have made themselves independently wealthy through their own hard work and can therefore afford to code for free. If I am lucky enough to find myself in such a position, that is what I hope to do.

You are mistaken when you imply that those who don't believe in the ownership of ideas are themselves incapable of making a valuable creative contribution. If you believe in "Intellectual Property", that's your business, but it doesn't give you the right to denigrate the work of those who believe differently than you. There is not *yet* a law that states that all intellectual activity must be undertaken in service of the profit motive.

DNS has a record called "HINFO", for Host Information; however, due to security concerns, not many people use it now. The record is just a pair of free-text strings (nominally CPU and OS) that can be almost anything describing the machine, including hardware information, physical location, etc.

We could use this record for the IHS information without any changes to the current DNS system.
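For what it's worth, a HINFO line in a zone file is trivial to pick apart; here's a minimal sketch (the record contents are made up, but the two-field CPU/OS layout comes from RFC 1035):

```python
import shlex

# Hypothetical zone-file line; HINFO carries two free-text fields
# (nominally CPU and OS, per RFC 1035).
line = 'host.example.com. IN HINFO "INTEL-386" "UNIX"'

def parse_hinfo(zone_line):
    """Return the two quoted HINFO strings from a zone-file line."""
    tokens = shlex.split(zone_line)  # shlex strips the surrounding quotes
    return tokens[-2], tokens[-1]

print(parse_hinfo(line))  # ('INTEL-386', 'UNIX')
```

Since the fields are free text, nothing stops a site from stuffing other identifying data in there, which is exactly the security concern mentioned above.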

Check me on this... how would you determine who gets to be the Object ID Root owner? Highest bidder? First bid over a certain amount of cash? What? Personally, I have no idea how ICANN got to where they are, but wouldn't this be likely to go the same way, however that may have been?

Or even worse... ICANN somehow gets to be the Root owner for this too....

I have gigantic mixed feelings on DOIs, handle systems, and other metadata schemes. I come at this as an anarchist, a librarian, and as a person who has actually purchased a DOI prefix for my employer. I've even been to the DOI workshop that was held several months back at CNRI in Reston, VA.

First, the positive side of DOIs. As most of you know, there is a lot of information on the Internet, and it isn't organized logically or in a way that a library would organize it. Librarians have been trying to instill some order on the Internet for years, mostly via various metadata schemes. A metadata-based system like the DOI handle system would get us away from identifying content based on location (URLs) and back to identifying content based on classification (i.e. like Dewey or LC call numbers in your local library). So, if you've installed the proper DOI plugin into your browser and you click on a DOI-enabled link, you'll be given a choice of where you want to get the item. The article by Professor X on nanotechnology is identified by a number, not a URL. You can choose to get it from a variety of sources, some of which will give you free access, say if you are a student at a particular university.
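The "choice of sources" idea boils down to one identifier resolving to many copies, with the pick depending on who is asking. A toy sketch, with every identifier, URL, and affiliation invented for illustration:

```python
# Hypothetical multiple-resolution table: one identifier maps to several
# copies of the same article, some of which are free for certain readers.
SOURCES = {
    "10.12345/nanotech-article": [
        {"url": "http://publisher.example/article", "free_for": set()},
        {"url": "http://library.univ.example/article", "free_for": {"univ.example"}},
    ],
}

def pick_source(identifier, affiliation):
    """Prefer a copy that grants the reader free access; otherwise
    fall back to the first (possibly paywalled) copy."""
    copies = SOURCES[identifier]
    for copy in copies:
        if affiliation in copy["free_for"]:
            return copy["url"]
    return copies[0]["url"]
```

So a student at the hypothetical univ.example gets routed to the library mirror, while everyone else lands on the publisher's copy.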

In other words, DOIs would greatly help people find information on the Internet.

Now for the flip side. If you read the MSNBC article carefully, you notice a few scary things mentioned, like "[it] is using it to build the Defense Virtual Library" and "another problem is with copyrights and other protections of intellectual property." If you care about the free flow of information on the Internet, which tech like Napster has enabled, DOI and handle schemes should throw up lots of red flags. The music industry is salivating over the DOI project. They are involved in it, the extent to which is unknown to the public. I suspect that the DOI system will be sold as a cool way to find Internet content and that its use to police the Internet for intellectual property owners will be downplayed. If Microsoft and the AAP are involved, you can bet that they don't have the interests of Internet freedom in mind. They simply want to protect the profits they make from the intellectual work of other people.

This is another example of why technology is never neutral. There are always socio-political ramifications from every new tech. Is this new system, which allows you to find content more easily, worth the tradeoff in how it makes intellectual-property fascism easier?

Forget censorship. This would lead to a bold new world of broken links. It's bad enough to have to update hyperlinks on pages on your own server. Imagine trying to get someone like Network Solutions to update their database every time you move a file.

Did anybody notice that to be able to assign handles you had to have a "naming authority", as in:

Under the handle system, my last column might have an identifier like: "10.12345/nov0700-zaret". "10.12345" is MSNBC's naming authority, and "nov0700-zaret" is the name of the object. MSNBC would then keep a record in its handle registry that told the computer what server the object is on, what file it's stored in, as well as the copyright information and anything else it may want in that record.
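The prefix/suffix structure in the quoted example is purely mechanical; a minimal sketch of pulling the two parts out (the handle string is the one from the article):

```python
def split_handle(handle):
    """Split a handle into (naming_authority, local_name) at the first '/'."""
    authority, _, local_name = handle.partition("/")
    return authority, local_name

# The example handle from the article:
print(split_handle("10.12345/nov0700-zaret"))  # ('10.12345', 'nov0700-zaret')
```

Everything before the first slash identifies who gets to assign names; everything after it is entirely up to that authority.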

Scary stuff, given the recently introduced $2000 price of the .biz domains. I mean, if I as a person want to use this new scheme, I've not only got to apply for an ICANN-controlled domain name, I've now got to apply and pay for a "naming authority". What's to keep them from pricing this naming authority out of reach of the common person? I think this is a large looming threat to independent posting of material on the Internet. Or am I being paranoid (again, heh)?

Putting patentable ideas into the public domain, or GPL'ing them, doesn't prevent the creator from making money by being first to sell products based on them or by creating better products based on those ideas than others can make.

And that's simply not true. If you remove the cost of creation, then distribution and mass production costs predominate. It would be trivial for a large company to steal a novel, song, movie, whatever from a person who works alone and produces something. It's virtually certain that the creator will be screwed and the large company will profit. This is what people refuse to understand: copyright and patent are intended to protect the little guy against the big guy and the tyranny of the masses, not the other way around.

others: Linus Torvalds has a day job and still finds time to direct kernel development, the KDE team is largely made up of people who work for TrollTech, and there are many, many sysadmins who open-source tools they have created to help themselves in their jobs.

Linus started Linux when he was at school. His work for Transmeta is owned by Transmeta, not him. It pays for him to spend time working on Linux. If he didn't get paid by Transmeta, he probably wouldn't be working on Linux.

I'm not sure about TrollTech's funding, but how do they make money? VC funding? How profitable is the company? Companies based on open source are probably long-term doomed.

Sys admins who are contributing stuff done during work hours are technically stealing from their company; it's almost certain that they signed a work contract which stated that anything they create during work hours is company property, and anything RELATED to company work created at any time is owned by the company, too. Just wait until the first lawsuits which try to remove that sort of code from an Open Source project...

If you believe in "Intellectual Property", that's your business, but it doesn't give you the right to denigrate the work of those who believe differently than you. There is not *yet* a law that states that all intellectual activity must be undertaken in service of the profit motive.

If you subsidize your salable creative work with non-creative work (even non-creative work you don't enjoy) solely because you think it's morally wrong to profit from creative work, you're either a saint or a moron. I can't decide which. If you're doing some creative work for money and some creative work for free, then you're a hypocrite.

Can you please explain where the possibility for censorship lies? I think I'm missing something here.

The point of this exercise seems to be making navigation revolve around the 'what' of the information instead of the 'where'. So, if I want to go looking for yahoo.com/nazi_auctions, I no longer simply ask the DNS server for the IP of yahoo.com and then ask that server for nazi_auctions. I ask a distributed DOI database for the whole thing, if I understood it correctly. Conceptually, this would mean that intermediaries will know 'what' I'm looking for and not just 'where' I'm looking. That's what I mean.

The sort of read capabilities Kahn is talking about were the cornerstone of the Xanadu project and its plans for handling copyright protection and payments for creators. Systems like Mojo Nation [mojonation.net] and Freenet [sourceforge.net] create these sorts of absolute references (usually based on SHA1 hashes and the like), and flexible addressing schemes a la SPKI [std.com]/SDSI [mit.edu] deal with all of the namespace issues Kahn is talking about. This is basically a not-well-researched rehash of some old ideas; the bits of those old ideas which are of value are already being incorporated into systems, but the central-registry/indirection-via-tollbooths bit is new and does not seem to add much real value to the users of such systems.
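The "absolute reference" trick those systems use is simple: name the object by a hash of its bytes, so the name is independent of any host. A minimal sketch of that idea:

```python
import hashlib

def content_key(data):
    """A location-independent 'absolute reference': the SHA-1 of the bytes.

    Anyone holding the same bytes derives the same key, so the reference
    survives the object moving between hosts -- and no central registry
    is needed to hand out names.
    """
    return hashlib.sha1(data).hexdigest()

print(content_key(b"hello world"))
# -> 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
```

The flip side is that the key changes whenever the content changes, which is why systems like Freenet layer mutable naming schemes on top of raw content hashes.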

I just tried to check out zigzag: that site is a nightmare of broken hyperlinks. There are several on every page, and you pretty much have to navigate by figuring out what to type into the location bar. Is that ironic, or is it just me?

There was nothing particularly crappy about Xanadu; it just tried to do too much and expected the rest of the world to stop what it was doing for a few years while they finished this uber-cool thing. It goes something like this: Ted Nelson has an idea of the 6 things hypertext "must have" to work and gathers too many mad scientists and not enough hunchbacks to work on things. A couple of years later, Tim Berners-Lee figures out that you only need two of the six "requirements" and creates the web. Five years later, Ted presents a variation of the original idea that is trying to find footing against a system (HTTP/HTML) which is demonstrably inferior, but good enough. The rest, as they say, is history...

BTW, if you are looking for the current incarnation of Xanadu, look for zigzag [xanadu.net].

Of course, an OID system still doesn't solve the problem it purports to address. So you have a registry and an object handle in that registry - what happens when the object is removed, or moved to a different place?

If you change service providers, will you still be using your old OIDs? I doubt it... use of the registries is hardly going to be free. So you're back to the 404 problem... only this time you have to remember what looks like a phone number with a name on the end, instead of a nice simple URL.

Oh, and while we're at it... let's throw DNS into a new crisis by negating the value of everyone's domain names... whoops!

It seems to me this sort of discussion has been had many times with the same results: object descriptions and addresses ought to remain separate. DOI looks like a big directory structure for the net; your objects, be they computers, printers, or individual files, are given handles which are then in turn given directory registrations. Am I following so far? It seems like this is just restructuring overhead without making it particularly more efficient or effective.

TCP/IP packets can be run through a stack which pretty quickly gives the receiver information about the packet but leaves the content alone. This is very simple and amorphous, which is why it caught on (you can even use different routing/addressing schemes as long as they follow the header-has-little-to-do-with-the-packet concept). Directory structures, on the other hand, need a lot of overhead, because something somewhere has to know exactly where everything is.

Let's say all of the DNS servers around today had to hold references for every file available on the Internet. That is amazing overhead just to access a text file on a server someplace, overhead that is distributed over the WHOLE network (the entire Internet), as you've only got so many directory servers you can possibly access. TCP/IP instead transfers that overhead to the computers that are actually talking rather than the entire network. It's easy to upgrade the speed of your hardware to handle the increased demand that is generating the extra overhead, but it is truly hard to squeeze more oomph out of a network that is forced to access a limited number of nodes to do absolutely everything.

That's what the dudes who came up with EJB wanted to do: objects pass around the net, and the person accessing an object only becomes aware of it if there is some response lag. Let's say that instead of just a reference to a static web page you accessed a web page object. You could then pass that object on to someone else. But you could also substitute the object's contents with your own content that someone will access. Even with a powerful encryption and validation system in place you could still spoof the contents of an object. You could also make the contents of an object interactive, so you could make a BO-type spoofed object that totally fucks someone over if it is ever accessed. The host-based Internet sidesteps some of the security problems of an all-peer Internet by centralizing files and objects. Besides, I don't want to share my comparatively paltry dialup bandwidth with some other dude because I have a web page object on my system he wants to see.

The handle approach wants to extend and replace URLs. Right now you type a URL into your browser and it goes to a DNS server and places a query. The DNS server matches up slashdot.org with an IP address, to which your browser sends an HTTP request. The Slashdot server then finds the file and sends it to you. With the handle approach, the file and its address are stored in a directory, so you type in slashdot.org.index and your browser automatically goes to the index file on the Slashdot server. That directory entry for slashdot.org.index is dynamic, though: if the file moves to a different computer or server, the name still points to it.
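That extra level of indirection can be pictured as a table keyed by the handle rather than by the location (the handle and server names here are illustrative, matching the example above):

```python
# Hypothetical handle registry: the name stays fixed, the location doesn't.
registry = {"slashdot.org.index": "http://server-a.example/htdocs/index.html"}

def resolve(handle):
    """Look up the handle's current location."""
    return registry[handle]

# The file moves to another server: only the registry entry changes,
# and every link that uses the handle keeps working.
registry["slashdot.org.index"] = "http://server-b.example/www/index.html"
print(resolve("slashdot.org.index"))
```

The cost, of course, is that every access now involves whoever runs the registry, which is where the centralization worries elsewhere in this discussion come from.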

What do you think HTTP already does? You store (or link) the files you want people to access in your www directory. Your server receives an HTTP request looking for a file; the HTTP server either finds it and sends it back, or doesn't find it and tells the person so. You can transfer anything you want over HTTP, and it allows contextual information (size and MIME type) about the requested object to be transferred. DNS doesn't touch your system's internal components; it just helps computers find each other, which is all it ought to do.

Making this stuff obligatory for everybody who wants to use the Internet? That should make tech support really interesting... Hi, I just put this shiny new AOL CD in my shiny new Compaq Presario, and now it makes me answer all sorts of funny questions. I didn't do anything wrong. This thing is broken! I want to talk to your supervisor!!!

Why don't we just use the current DNS system to resolve the hostname, and have each host keep its own database of object IDs? This seems most reasonable to me. Each site can (if it chooses to) migrate to using OIDs at its own leisure. Then we could use this along with the current protocols and filesystems, without having to create a whole new Internet. It sounds like a good solution for administering a single domain, but not for the entire Internet. Can you imagine the size of the database necessary to store the ID & location of every page on the net? Geez...

Freenet stores files under a unique name in a distributed filesystem (i.e. Freenet). All you need to retrieve a file is its name. It appears to me that this is Kahn's idea taken to the extreme: Freenet takes care of storing and retrieving objects with a unique identifier. The system could easily be extended with databases coupling relevant keywords to the identifier. Also, it is safe: Freenet is explicitly designed to hide the location of the files. Even the owner can't touch a file after it has been put into Freenet.

In 1982, Apple, Atari, Packet Cable, AT&T, Knight-Ridder News and Xerox PARC were all part of a group I was putting together to push for a standard object identifier for network communications. It was going to be 64 bits with 2 pieces:

A system serial number with bits reversed, and packed against the top of the 64 bit word.
An object creation counter for that system serial number -- under localized control/increment.

I had to continually fight off people who wanted to subdivide the 64 bits into fields, the way IP was. The primary discipline I wanted people to follow was to keep routing information out of the object identifier so that object locations could be changed dynamically. It was amazing how many times I had to explain this to people who should have known better.
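The packing described above is easy to sketch. Note the 32/32 split between serial and counter below is my own illustrative assumption (the original deliberately resisted fixing subfields), but the two-piece construction — bit-reversed serial at the top, per-system counter in the low bits — follows the description:

```python
def reverse_bits(value, width):
    """Reverse the low `width` bits of `value`."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out

def make_oid(serial, counter, serial_bits=32):
    """64-bit object ID: bit-reversed system serial number packed against
    the top of the word, object-creation counter in the remaining low bits.
    No routing information appears anywhere in the identifier."""
    top = reverse_bits(serial, serial_bits) << (64 - serial_bits)
    return top | (counter & ((1 << (64 - serial_bits)) - 1))

print(hex(make_oid(1, 5)))  # 0x8000000000000005
```

Because the identifier encodes only who minted it and in what order, an object can move anywhere without its ID changing, which is exactly the discipline being argued for.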

Unfortunately, I didn't explain it to the right people at DARPA, although I did have a couple of meetings with David P. Reed about it when he was still at MIT's LCS.

I touch on some of this history in a couple of documents, one written recently [geocities.com] and one written at the time [geocities.com].

Until I read the article about Kahn, I didn't realize that DARPA chose the IP nonsense at almost exactly the time that the AT&T/Knight-Ridder project that was funding me made a bad choice of vendors, which resulted in my resignation from that particular high-profile effort and my attempt to strike out on my own, turning 8 MHz PCs into multiuser network servers (which I actually succeeded in doing after a lot of bloodletting, but that's another story).

It sounded pretty interesting, but I guess I missed something, because they started saying it would help people protect copyrights. At first I thought they must be talking about publishers who want an easy way to identify the copyright holder of a given object (article, song, whatever) so they can electronically pay their royalties, etc. For example, when colleges want to create course packets for a class by photocopying various articles, they send faxes to various publishers asking for prices for permission to copy, and then send in payments. (I was once working on a database to help this process; I never bothered to finish it, though.) But it's not like the photocopier machines require authorization to copy; the people selling coursepacks get permission manually.
So the question is: is this object thing they are talking about a way to make things easier for these types of people (and others who wish to maintain copyright compliance because they happen to be operating publicly and will get sued if they don't, an online radio station for example), or is this supposed to be part of a trusted-client (the perpetual motion machine of the information age) scheme like the failed Divx DVD players?

This is similar to Akamai's setup. Akamai (for those who don't know) is a cache that a lot of bigger sites use (Yahoo, etc.). The way it basically works is as follows:

ISPs are part of Akamai's network. They maintain a certain amount of cache for the websites involved and, in turn, provide fast content to their users (dial-up, etc.) and others.

Something similar to this would work. Multiple distributed, _independent_ machines would be responsible for maintaining some segment of the library. They would provide this content to their users (and everybody else), and everybody should be happy.

The objects would be fast for the local users, and the objects would be permanent (because the system is distributed).

Of course it would need to be more complicated than this... but the idea is there.
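One crude way to split a library across independent machines with no central index is to hash each object ID to a node, so every participant computes the same placement. The node names below are hypothetical, and real CDNs like Akamai use far more sophisticated mappings:

```python
import hashlib

# Hypothetical mirror nodes, each holding a segment of the library.
NODES = ["cache-a.example", "cache-b.example", "cache-c.example"]

def node_for(object_id):
    """Hash the object ID to pick a node deterministically; any client
    with the same node list computes the same placement on its own."""
    digest = hashlib.sha1(object_id.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

print(node_for("10.12345/nov0700-zaret"))
```

For real permanence you would store each object on several nodes (e.g. the next k nodes in the ring), so losing one machine loses nothing.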

> A paper which provides a summary of the current thinking on DOI has
> just been published in D-Lib Magazine at
> http://www.dlib.org/dlib/may99/05paskin.html

This does answer a lot of questions we had, mostly in what seems
to be the right direction. The relationship with INDECS on metadata
issues looks like a particularly good resolution ("functional granularity"
is essentially what I was looking for in one of my earlier
questions). It looks like a specific metadata "Genre" needs to be
worked out in detail for journal articles (re reference linking) - and
it's not clear who has responsibility for this (the IDF or someone else?)
but at least at the level specified in this article it looks workable.

But to some extent the paper shows the DOI is a solution in search
of a "killer application" (mentioned several times in the article).
There's a chicken-and-egg problem here: the potential applications seem
to require widespread adoption before they become useful.
As one of the final bullets says: "Internet solutions are unlikely to
succeed unless they are globally applicable and show convincing power
over alternatives" - does the DOI as described show convincing power
over the alternatives?

It's sometimes hard to know what counts as an alternative, but the
following systems (some listed in the article) could be
alternatives for at least some of the things the DOI does:

Alternatives 1-4 provide a variety of routes for creating a unique
digital identifier for something - we really don't NEED the DOI just
to have digital identifiers, though DOI does provide a handy rallying
point for those of us providing intellectual property in digital form.

Alternative 2 is the highest level of digital identifier, but perhaps
that is all we really need? There is room for many "naming authorities" -
perhaps even each publisher could be their own naming authority. That
would depend on widespread adoption of (3) which may or may not happen,
and resolution of general registration processes too.
As the article mentions, general implementation of URNs is quite
limited even after almost a decade of work. Is there a reason why
nobody has found them particularly useful yet?

Alternative 1 is, to some extent, a non-issue (a DOI is, after all,
just a handle) and is also, to some extent, the same issue. Any
publisher could, with or without DOI, register as a handle naming
authority and create handles for its digital objects. Is some of
the DOI work duplicating what has already been done (or should have
been done) for the handle system itself? As the handle system web
pages mention (http://www.handle.net/) it is at least receiving some
use as a digital identifier of intellectual property by NCSTRL,
the Library of Congress, DTIC, NLM, etc. Does the DOI provide
convincing power over using the handle system directly?

Alternative 4 (PURLs) is critiqued at length in the article,
particularly on the issue of resolution (section 3). Perhaps I
don't understand properly, but I don't quite agree with some of
the arguments against PURLs. Any digital identifier can be used to
offer great flexibility in resolution - a local proxy can redirect to a local
cache or resource, for example, for ANY of the unique identifiers
under question. Once resolved, the "document" resolved to can
itself contain multiple alternative resolutions. And a handle is only
going to have multiple resolutions if the publisher puts it there
(who else has the authority to insert the data?). So I think the
single vs. multiple redirection issue is a red herring. I do agree it's
nice to have a more direct protocol (though from looking at the details
of the way handles are supposed to resolve there is a lot of
back-and-forth there too). As far as being a URN or not, there's
no reason why PURLs couldn't be treated as legitimate digital identifiers,
even if they are simply URL's at the moment. On "scalability" - the
current handle implementation doesn't seem particularly scalable
either. Only 4 million handles per server? Only 4 global servers
(with 4 backups that seem to point to the very same machines on
different ports)? And those servers seem to all be in the D.C. area...
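The local-proxy argument above is easy to sketch: any identifier scheme can be resolved through a proxy that checks a local cache first and only then falls back to a global resolver. Everything below (the cache entries, resolver URLs, and the `resolve` helper) is an invented illustration, not part of any real handle or PURL implementation.

```python
# Sketch: local-proxy resolution works for ANY identifier scheme.
# All cache entries and resolver addresses here are hypothetical.

LOCAL_CACHE = {
    "doi:10.12345/example": "http://mirror.local/example.pdf",
}

GLOBAL_RESOLVERS = {
    "doi": "http://doi.example.org/",    # stand-in global resolver
    "purl": "http://purl.example.org/",
}

def resolve(identifier: str) -> str:
    """Return a location for `identifier`, preferring a local copy."""
    if identifier in LOCAL_CACHE:           # local cache or mirror wins
        return LOCAL_CACHE[identifier]
    scheme, _, name = identifier.partition(":")
    return GLOBAL_RESOLVERS[scheme] + name  # fall back to the global service

print(resolve("doi:10.12345/example"))  # served from the local mirror
print(resolve("doi:10.67890/other"))    # redirected upstream
```

The point is that nothing in this flow depends on whether the identifier is a DOI, a PURL, or anything else - the flexibility lives in the proxy, not the naming scheme.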

Not that I think PURLs are wonderful, but does the DOI provide
convincing power over using PURLs, as far as identification and
resolution goes?

Which is presumably why we've been told DOI's have to do
more than just identification and resolution. Hence metadata, to
provide standard information to allow "look-up", multiple-resolution,
and digital commerce applications. This actually makes a lot of
sense. And the other id/resolution alternatives do not
seem to meet the INDECS criteria as well as the DOI can.

But what does this have to do with reference linking, the
first "killer application" mentioned? The look-ups required there
are almost certainly going to be more easily performed with
specialized databases (A&I services) or direct rule-based
linking (alternative 5) and in fact this is already
being done, generally without the use of DOI's. The DOI does not seem to
make the linking process easier, so there's no "convincing power"
here it would seem.

I added alternative 6 (global directory service) as a wild-card -
this seems to be a major focus of "network operating system" vendors -
Novell's NDS, Oracle's OID, Microsoft's Active Directory - these seem
to be systems intended to hold information on hundreds of
millions of "objects" available on a network - an example being the
personal information of a subscriber to an internet service provider.
But another potential application of these is to identify and provide
data on objects available on the net - intellectual property or other
things available for commerce. Is this something the DOI could
fit into, or is it something that could sweep URN's, handles, DOI and
all the rest away? I really don't know, but it seems like
something to watch closely over the next year or so.

It's virtually certain that the creator will be screwed and the large company will profit.

Telling me that performing an operation on a patient will kill the patient would not dissuade me from performing it if the patient were already dead. That is to say, strong copyright and patent protection have not prevented the large companies from screwing the little guy, so it's not a valid reason to maintain the status quo. It is a well known fact that the only people who can count on making a profit from most publishing are the publishers themselves. Maybe big wheels like Stephen King can afford to walk away from publishers who refuse to allow them to retain copyright, but most authors that fit the description "little guy" wind up having to give all rights to the publisher. How many recording artists and authors have been screwed by their publishers? It is clear that copyright is not protecting *them*.

In the software world, distribution and mass production costs have now been brought to virtually nothing. The same could be said of e-publishing of copyright material. The large companies to which you refer would probably not touch anything in the public domain anyway, because they would know that they couldn't prevent other large companies from competing with them. GPL'd materials will be avoided by most large companies for the same reason and also because of the redistribution clauses in the GPL. Even where these factors do not prevent the large companies from trying to crush the little guy, many customers would make the decision to buy from more ethically clean sources, because Barnes and Noble and the GNU web page are just about equally accessible from the web.

Besides, according to those who wrote the Copyright Laws, the reason for those laws is *not* to protect anyone, but to benefit *society* by encouraging people to add to the public domain by giving them exclusive rights to publish *for a limited time*. Disney and other major copyright holders are being quite successful at removing the time limits; it is widely acknowledged that nothing published by a major outlet today will *ever* enter the public domain. Add to that that the technological protections being put in (DVD-CSS, DOI, etc.), together with the DMCA, have made it illegal to access copyright material without the publisher's permission, even after (or if) it has passed into the public domain. In this setting, copyright holders are no longer obligated to compensate society for the exclusive right to publish, and so are basically getting a free ride.

As for the issues of paying the bills while producing public domain and GPL work, maybe it is true that those who believe that intellectual property *should* be free have to make compromises to the current legal environment in which it is not. Releasing as much of your work into the public domain as you can afford to is for many a way of giving back to society. The same motivation leads many ISPs whose businesses are largely based on Open Source software to encourage their sysadmins to GPL the tools they develop.

In closing, I'm not calling you or people who believe as you do morons or hypocrites, or corporate shills, or any of the other things I might like to in a fit of emotion, but instead attempting to reason with you. If you do not understand why I believe the way I do, I will try to explain, but please refrain in any reply from this unseemly name-calling.

> [...] I agree with
> Stu's comments on policy development being key. In talks about the
> handle system I usually describe DOI and other handle uses as policy
> laid on top of infrastructure.

I found myself agreeing with Stu's comments on this too. But policies
and practices won't be adopted unless they are either evolutionary,
based on existing well-tested standards, or truly revolutionary,
allowing some wonderful new thing to be accomplished that can't
be done any other way. As I was trying to convey earlier, we have a lot of
choices for both the technology and the content of unique identifiers,
including long-lived ones, and it doesn't look like DOI's or even handles
meet the revolutionary criteria. There are also more application-specific
alternatives to the DOI (such as SICI) that I didn't include earlier, many
of which have also not received much use despite their ease of creation.
If we're talking about identification for the purposes of intellectual
property, shouldn't the Copyright Clearance Center and the other
Reproduction Rights Organizations be at the center of
determining such standards? Don't they already have unique identifiers
that they use (there is some CCC number at the foot of every page
we publish now)?

> [...] there
> are hard technical issues around ease of use, both from an end user as
> well as an administrative point of view. Especially from an
> administrative side, there is a 'good intentions' factor that I believe
> has been here since we all starting talking about this stuff almost ten
> years ago now. The net makes it easy to distribute information in an ad
> hoc fashion. It also makes it easy to lose things.

Things get "lost" either through neglect, deliberate removal, or
relocation (though I would call that "misplaced" rather than "lost").
DOI is unlikely to help either of the first two situations.
If there is no economic incentive for anybody to
support the preservation of some piece of digital information, there
will certainly be no incentive to keep the DOI pointer up to date
for it. And if the owner of a piece of information wants to remove
it, how could a DOI stop them?

Where the DOI would help is if a piece of information is relocated -
but so would any other unique identifier coupled with a location
system (PURL in general, and S-Link-S, Urania, PubMed, etc. specifically
for scholarly articles already exist - A&I services are also doing a lot
in this area). The more such systems pop up
and gain "market share" in different applications, the stronger the
incentive for the publisher never to change the location of anything
ever again because of the work required to keep them all up to date.

Administrative ease is basically a factor of how much work is required
to register each new published item, plus how much work is required
to change all the location information when things are relocated.
One can even write an equation for this:

    total admin burden ~= (items published)*B + (items relocated)*R

where B is the "burden" associated with inserting a new item,
and R is the "burden" associated with updating an existing item.
Even if much of this is handled with automated systems that make
the initial per-item burdens tiny, there is still a need for quality
control, ensuring the interoperability of systems (for example,
what is the standard for representation of author names containing
special characters? mathematics in titles? etc) and programming
work whose complexity is at least proportional to the per-item
information and translations required. DOI without metadata
had the advantage that the per-item information required
was minimal. With metadata it's not clear which would have
lowest burdens, though the unfamiliarity and lack of applications
for the handle system could be a disadvantage to DOI here (increasing
the required programming effort).

Except that this formula does not apply to S-Link-S, and in
some cases not to PURLs. S-Link-S uses rules to locate ALL the articles
for a particular scholarly journal, rather than working article by article.
PURLs can handle relocation of a large number of URL's with a single
change - but the "suffix" URL's must be unchanged for this to work,
which is not true of many publisher relocations. In those cases
where it is true, and especially for S-Link-S, the burden becomes:

    total admin burden ~= (journals)*B' + (journal relocations)*R'

where B' and R' are probably larger than B and R, but comparable
at least for smaller publishers that don't have enough items
to justify a lot of programming work. Once a journal has 10 or so
items to publish, rule-based locating is the easiest approach, and
for larger publishers the zero per-item burden would always be
an advantage.
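A toy calculation makes the break-even point concrete (rule-based locating winning once a journal has roughly 10 items). All numbers here are invented for illustration; the per-journal burdens are simply assumed to be ten times the per-item ones.

```python
# Toy comparison of per-item vs. rule-based administrative burden.
# Burden values are arbitrary illustrative units, not measurements.

B, R = 1.0, 1.0      # per-item burdens: insert a new item, relocate one
Bp, Rp = 10.0, 10.0  # per-journal burdens (assumed ~10x the per-item ones)

def per_item_burden(n_items, n_relocations):
    """Burden of a scheme that registers every item individually."""
    return n_items * B + n_relocations * R

def rule_based_burden(n_journals, n_relocations):
    """Burden of a rule-based scheme: independent of item count."""
    return n_journals * Bp + n_relocations * Rp

# For a single journal, rule-based wins once it publishes ~10+ items:
for n in (5, 10, 50):
    print(n, per_item_burden(n, 0), rule_based_burden(1, 0))
```

Under these made-up numbers a five-item journal is better off per-item, a fifty-item journal is clearly better off with rules, and the crossover sits near ten items, matching the estimate above.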

Now rule-based locating systems are not global unique digital identifiers -
but they keep the administrative burden very low, and so are by
far the most likely candidates to solve the "lost" information problem
as far as it can be solved.

> [...]
> Re. Arthur Smith's wondering about handle system scalability and the
> number of current servers: the global system currently consists of four
> servers - two on each of the US coasts. The primary use of the global
> service currently is to point to other services, e.g., the DOI service,
> for clients who don't know where to start. Most handle clients, e.g.,
> the http proxy, do know where to start most of the time since they cache
> this information, so in fact the global service is not much stressed and
> four servers are plenty at the moment.

Thanks for the clarification - however if we're proposing to put direct
HDL or DOI clients in every web browser, that burden is going to
go way up, unless we get cracking on installing local handle
resolvers in the same way we have local DNS resolvers all
over the place. And then who's going to administer them and ensure
that every client is configured to point to the local servers rather
than the global ones? We at least have an established system for DNS:
when new machines are configured with an IP address, they are
also assigned a local DNS resolver, with several backups. Are we
proposing to add another "local HDL resolver" to the setup
procedure of every machine on the net?

The http proxy of course is even less scalable, since it's a single
machine somewhere (admittedly http servers can be scaled pretty
large, but this really doesn't solve the problem).

And as far as I could tell, the handle system doesn't seem to have
the same redundancy built in that DNS has. Perhaps I misunderstood,
but the four global handle servers seem not to contain duplicate
information - rather they each are responsible for a different group
of handles based on the MD5 hash. The redundancy is really just
a single secondary server, which also as far as I could tell right
now resides on the same physical machine (at least the same IP
address) for all four existing global servers.
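The partitioning described above (each global server owning a hash bucket of handles rather than holding a full replica) can be sketched as follows. The bucket-assignment rule and server names here are guesses at the general idea, not the handle system's actual algorithm.

```python
import hashlib

N_GLOBAL_SERVERS = 4  # as described above; names below are invented

SERVERS = [f"global-{i}.handle.example" for i in range(N_GLOBAL_SERVERS)]

def server_for(handle: str) -> str:
    """Pick the one server responsible for this handle's hash bucket."""
    digest = hashlib.md5(handle.encode()).digest()
    bucket = digest[0] % N_GLOBAL_SERVERS  # illustrative bucket rule
    return SERVERS[bucket]

# Each handle maps to exactly one server -- lose that server (and its
# lone secondary) and that whole bucket of handles becomes unresolvable.
print(server_for("10.12345/nov0700-zaret"))
```

Contrast this with DNS, where any of many independent caching resolvers and redundant authoritative servers can answer for the same name.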

And remember the DOI/HDL system needs to be able to handle
hundreds of millions or billions of digital objects - that is
one or two orders of magnitude beyond what DNS has to deal with now.

> [...]
> The four million handles per server is a specific implementation
> limit that will go away later this year, to be replaced by some
> extremely large number that escapes me at the moment.

Well that's good. I'm guessing a 2GB or 4GB file size limit was
the problem? The DOI has several hundred thousand items with
handles - how many do the global handle servers contain right now
for DOI and other uses?

An article by L. Davidson and K. Douglas in the December 1998 issue of
the Journal of Electronic Publishing raised in a different sense many
of the issues I recently expressed some concern on with the DOI, as
well as other issues I haven't seen discussed here at all. Was there
ever a discussion here of the points in the Davidson and Douglas paper?
The authors indicate a feeling of encouragement that these problems
will be resolved, but has much changed in the six months
since their paper appeared? I'm enclosing their "summary of selected
concerns" below. Point 2 was the one that I particularly was concerned
with in the most recent exchange.

Arthur Smith (apsmith@aps.org)

----------------------------
Summary of Selected Concerns
The importance of the work being done on the design of the DOI
System, and its consequences with respect to digital identifiers in
general, would be difficult to overrate. Solving the problems of
identifying specific objects on the Internet is extremely important,
and the work being done on the DOI System will help with that
solution. Still, there are a number of current issues concerning this
system that have no easy solutions and particularly concern us:

1.At present, only established commercial and society
publishers are purchasing publisher prefixes and so are
allowed to issue DOIs. This means that most individual or
non-traditional publishers are not participating directly in
the DOI System, but are merely acting as end users. Since
the biggest problems with URL stability and the lack of
persistence of Internet objects lies outside the products
provided through large publishers, it is unclear how the DOI
System is going to have any generally beneficial effect on the
solution of the Internet's problems.

2.Those who participate in the DOI System will need to
include in their operating costs the overhead of detailed
housekeeping of the DOIs and each item's associated
metadata, upon which many of the DOI's more advanced
functions will depend. In addition, there are the fees that the
Foundation will need to levy to support the maintenance of
the resolver-databases server for the continued tracking of
traded, retired, erased, or simply forgotten and abandoned
identifiers. Even with computerized aids, the cost to
publishers of maintaining the robust and persistent matrix of
numbers and descriptive text that a handle-based system
requires will be considerable. Under the current model, the
annual fees exacted by the Foundation from its participating
publishers must cover operating expenses. Since no one yet
knows how high these fees might be, we are concerned that
costs for smaller publishers and not-for-profit participants
might be so prohibitive that they will be largely excluded.

3.At up to 128 characters, DOIs are simply too long to be
practical outside of the digital universe. The Publisher Item
Identifier (PII), for example, at seventeen characters, is a
much more reasonable length and probably is still long
enough to identify every item we will ever need to identify.
Indeed, Norman Paskin estimates that only 10^11 digital
objects will ever require identification.[33] Since it is
unlikely that we will never need to copy DOIs manually
from print into electronic format, and since both their length
and limited affordance (mnemonic content) will make it
difficult to transfer them accurately by any manual means,
this could turn out to be a nuisance factor that will hinder
their widespread acceptance. Long identifiers are also
harder to code into watermarks, especially in text objects
that lack background noise in which to hide such data.

4.DOIs will probably not lead to more open access to online
materials, at least to those commercially published. In fact,
most DOI queries from most users, except for those that can
demonstrate access rights, will probably lead to invoice
forms of one sort or another rather than directly to the
primarily requested object. This aspect of the DOI System
could make the Internet even more frustrating for the
majority of the users than it is now.

I happen to be a producer of "intellectual property" who thinks that the creative work of myself and others should be free. There are many intellectual workers who have sufficient humility to know that their best ideas are discoveries and not creations.

Does this mean that I will not seek patent protection for my ideas? No, because the fact is that, without such protection, some unscrupulous person may deprive me of my rights to use the thoughts in my own head. However, my reluctant willingness to pay protection to the patent lawyers does not change the fact of my non-belief in "intellectual property", nor does it change the fact that I would be extremely reluctant to prosecute someone for infringing on my IP.

While I think it is nice to be compensated for my work, I can't just proceed from there on purely economic and non-philosophical grounds to a belief in something which doesn't exist. I think there are many in the Open Source community who feel this way; in fact, I think this belief is a large part of the philosophical foundation of the Open Source community.

All that having been said, I think it is probably true that many of the people exclaiming "Information wants to be free!" simply want to get movies, music, etc. for free for less lofty reasons. However, just because selfish and immature cynics exist doesn't mean that real radicals don't.

duh, this isn't that revolutionary, and it wouldn't require anything special. Just have a database and a primary redirect/include page, and you can change your data around all you want. It still doesn't fix the problem that if something is DELETED (not simply moved), people aren't gonna find it. All you could do is either keep a list of the 'resourceid's that have been used and tell the user the item was deleted, or have a standard way of generating ids and, if an id isn't found in the reference database, tell them it's deleted (even though the id in that case might never have been used; it would just fit the criteria for generation).

Kahn is a smart man, but this is a classic go-nowhere idea. Too many people are invested in a host-centric model. This is DOA.
"(The) architecture can not write the law, but it provides a technical design that matches the legal structure that is expected to emerge," the Library of Congress says on its Web site.
Heh.

An alternative to the current (mis)use of the DNS would be great, but I don't think one could easily drop the current system and introduce a new one. It would have to be supported by most of the people on the net. On the other hand, what if firms (e.g. MS, AOL) started to support it - who could resist? Or maybe we'll finally get a second Internet...

People won't take the IP protection crap and once it leaves the protected lands for the real world they'll rip it to bits.

The idea of content objects with unique IDs isn't at all new, but it is a good one. I always liked the idea of using encryption signatures as the keys: give each object a signature for itself and one for its owner, build a simple search-engine mechanism into the Net itself, and you have a nice lil system. An important note might be that such a system does not need to, and possibly should not, replace TCP/IP, or even rely on TCP/IP as its only supported carrier. It should be as agnostic about transports as possible for the most flexibility.

Jabber might be a good start for this layer, since it is a very flexible system for transporting XML-ized content and contact-type information. I really expect something like this to assimilate the web in a couple of years. Maybe Jabber merged with FreeNet.

Someone who doesn't know the resource they want could search for it by known facts, just as they do now at Yahoo, Google, etc. Once they find it, they could store the object's unique id, and then every time they needed that object again they could ask the net for it and the closest copy found would be returned.
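A minimal sketch of the "signature as key" idea: derive the object's id from a hash of its own content, store copies under that id, and fetch by id later. SHA-256 stands in here for whatever signature scheme would actually be used, and a plain dict stands in for the network-wide store.

```python
import hashlib

STORE = {}  # content-id -> bytes; a dict standing in for "the net"

def content_id(data: bytes) -> str:
    """Derive a globally unique id from the content itself."""
    return hashlib.sha256(data).hexdigest()

def publish(data: bytes) -> str:
    """Store a copy under its content-derived id and return the id."""
    cid = content_id(data)
    STORE[cid] = data  # any node holding these bytes can serve them
    return cid

def fetch(cid: str) -> bytes:
    """Retrieve by id; a real network would return the closest copy."""
    return STORE[cid]

cid = publish(b"some document")
assert fetch(cid) == b"some document"
print(cid[:16], "...")
```

A nice property of content-derived ids is that the same bytes always yield the same id, so the id doubles as an integrity check on whatever copy comes back.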

Global scope: A URN is a name with global scope which does not imply a location. It has the same meaning everywhere.

Global uniqueness: The same URN will never be assigned to two different resources.

Persistence: It is intended that the lifetime of a URN be permanent. That is, the URN will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name.

Scalability: URNs can be assigned to any resource that might conceivably be available on the network, for hundreds of years.

Legacy support: The scheme must permit the support of existing legacy naming systems, insofar as they satisfy the other requirements described here. For example, ISBN numbers, ISO public identifiers, and UPC product codes seem to satisfy the functional requirements, and allow an embedding that satisfies the syntactic requirements described here.

Extensibility: Any scheme for URNs must permit future extensions to the scheme.

Independence: It is solely the responsibility of a name issuing authority to determine the conditions under which it will issue a name.

Resolution: A URN will not impede resolution (translation into a URL, q.v.). To be more specific, for URNs that have corresponding URLs, there must be some feasible mechanism to translate a URN to a URL.

URNs are actually specified in RFC2141: URN Syntax, which gives identifiers in the form "urn:NAMESPACE:NAME", where NAMESPACE could be something like "dns" and NAME could be "slashdot.org".

The actual method used to retrieve the object that a URN refers to is left as an exercise to the reader.
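The RFC 2141 form can at least be parsed mechanically, even if resolution is out of scope. A minimal sketch (the `dns` namespace is the hypothetical example from above, not a registered one):

```python
def parse_urn(urn: str):
    """Split a urn:NAMESPACE:NAME string per RFC 2141's basic shape."""
    scheme, namespace, name = urn.split(":", 2)  # NAME may contain ':'
    if scheme.lower() != "urn":
        raise ValueError("not a URN")
    return namespace, name

print(parse_urn("urn:dns:slashdot.org"))  # ('dns', 'slashdot.org')
```

Note the `split(":", 2)`: only the first two colons are structural, so names containing further colons survive intact.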

---
The Hotmail address is my decoy account. I read it approximately once per year.

First, how is 10.12345/nov0700-zaret different from 207.46.238.109/nov0700-zaret? Or
msnbc-news/nov0700-zaret from www.msnbc.com/news/nov0700-zaret? This means that using a handle is no different from using the current DNS scheme to name content.

Second, the article says nothing about how they are planning to solve the persistence problem. If something is deleted, what will your handle point to? Simply implementing some new arbitrary naming scheme doesn't make the problem go away.

What is needed is something much like Freenet, but with a different twist. Let's call this system infonet. Now, the number one priority on infonet would be that information should never disappear. To make this work, there should always be at least three hosts at different locations storing each file; the infonet software should ensure this. If one host goes down, another should take over, so there will always be redundancy. Furthermore, we need a central organization governing infonet; let's call it infonet-adm. It would have two tasks: (1) managing the namespace (Usenet is a fine model for this); (2) managing content and ensuring enough machine resources are available. The last task is the most difficult. I see several business models for infonet-adm:

charging a one-time fee for each file stored in infonet

charging users for downloading content (probably stupid)

charging ISP's for connecting to infonet

requiring ISP's to contribute a suitable amount of machine resources to infonet; the amount can be a function of the number of users and their average activity on infonet.

A combination of options 1 and 4 sounds most reasonable to me. Note that as long as Moore's law is in effect, it will be no problem to save the old content forever.
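The three-copies rule could be sketched as a simple re-replication loop: when a host fails, any file that drops below three replicas gets copied to another host. Host names and the in-memory tables are invented for illustration.

```python
REPLICATION_FACTOR = 3  # infonet rule: at least 3 hosts per file

hosts = {"host-a", "host-b", "host-c", "host-d"}          # live hosts
placements = {"file1": {"host-a", "host-b", "host-c"}}    # file -> replicas

def host_failed(host: str):
    """Drop a host and re-replicate any file left under-replicated."""
    hosts.discard(host)
    for name, replicas in placements.items():
        replicas.discard(host)
        while len(replicas) < REPLICATION_FACTOR:
            spare = next(iter(hosts - replicas))  # any host lacking a copy
            replicas.add(spare)                   # "another takes over"

host_failed("host-b")
print(placements["file1"])  # back to three replicas, host-b gone
```

Of course the hard part in practice is not this loop but detecting failures and moving the bytes; that is exactly the machine-resource management task called the most difficult above.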

The general consensus is that RFCs have been obsoleted by patents [ibm.com]. The only difference is that with a patent, comments must be officially sanctioned by the holder. Woohoo! fuzzylogic [slashdot.org]!

Anyone else notice the part about the central Object ID database?
Just think how much grief ICANN has caused with the DNS root.
I'd love to think what the Object ID root owner will be like.
This is a lame duck. Companies will probably love it (in theory); however, it can't function in the real world unless almost everyone adopts it.
And God knows that won't happen.

It's actually a good idea. DNS only tells you
the IP of the machine. IHS gives you the exact
location on the Internet of some subject/information handle,
so the handle is something generic that lasts a lifetime.
If you query the handle against the IHS server, it
returns the exact URL, which can be dynamic
and change every week.

People can't even get DNS right. Does this guy seriously think that people will actually make a more complex system work? To quote the article (bold by me):

Under the handle system, my last column might have an
identifier like: "10.12345/nov0700-zaret". In this fictional
example (since MSNBC doesn't use the handle system),
"10.12345" is MSNBC's naming authority, and
"nov0700-zaret" is the name of the object. MSNBC would
then keep a record in its handle registry that told the
computer what server the object is on, what file it's stored
in, as well as the copyright information and anything else it
may want in that record. No matter where the file is moved,
you would be able to use the handle to get to it,

as long as
the record is updated.
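The quoted scheme is trivial to mock up, which is rather the point: the lookup is the easy part, and everything hinges on that registry record actually being maintained. The record fields and URLs below are taken from or invented around the article's fictional example.

```python
# Mock handle registry for the article's fictional example.
registry = {
    ("10.12345", "nov0700-zaret"): {
        "url": "http://server1.example/cols/nov0700.html",  # invented
        "copyright": "(c) 2000 MSNBC",
    },
}

def resolve_handle(handle: str) -> str:
    """Split NAMING_AUTHORITY/NAME and look up the current location."""
    naming_authority, name = handle.split("/", 1)
    return registry[(naming_authority, name)]["url"]

# The file moves; the handle keeps working *only if* someone updates
# the record -- exactly the caveat emphasized above.
registry[("10.12345", "nov0700-zaret")]["url"] = (
    "http://server2.example/nov0700.html")
print(resolve_handle("10.12345/nov0700-zaret"))
```

Swap the dict for a database and this is the whole mechanism - the unsolved problem is operational discipline, not technology.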

Oh, and think of the new and exciting ways things can be censored. Filtering information by its nature sounds like a real possible evolution of this.

Can't we just have IPv6 and take it from there? I'll take my class A and give an IP address to everything I want people to go straight to.