Just to supplement yesterday's lightning talk with some technical
beef.
Below we walk through DDDS (RFC 3401-3405) with the URL:
http://www.asemantics.com//n/index.html
First, the a-priori rule, valid for all URIs (URLs are a subset of URIs);
this is the chicken-and-egg solution [4]:
s/^([^:]+)/\1/i; [1]
which is hardcoded in the application's URI parser. Then:
http://www.asemantics.com//n/index.html =~ s/^([^:]+)/\1/i;
gives
http
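As a rough sketch of this first step in Python (my choice of language;
the function names are made up for illustration): read literally as a
perl s///, the rule above would leave the URL unchanged, so the sketch
returns only the captured group, which is the result the walk-through
uses. It also builds the name for the lookup in uri.arpa.

```python
import re


def scheme_of(uri: str) -> str:
    """Apply the hardcoded first rule: capture everything before the first ':'."""
    match = re.match(r"^([^:]+)", uri)
    if match is None:
        raise ValueError(f"no scheme in {uri!r}")
    # Return only the captured group, not the input with the match spliced in.
    return match.group(1)


def first_lookup_key(uri: str) -> str:
    """Build the first DNS name to query, e.g. 'http.uri.arpa.'."""
    return scheme_of(uri) + ".uri.arpa."


print(first_lookup_key("http://www.asemantics.com//n/index.html"))
# prints: http.uri.arpa.
```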
Then look up 'http' in the well-known domain uri.arpa (another hardcoded
chicken-and-egg step [6]). For the HTTP URI scheme that is:
$ dig -t NAPTR http.uri.arpa.
...
http.uri.arpa. 21600 IN
NAPTR 0 0 "" "" "!^http://([^:/?#]*).*$!\\1!i" .
...
So what we get back is a NAPTR [5] record with six fields:
    #  value  meaning
    1  0      Order: if multiple NAPTR records are returned,
              this is the order in which to process them.
    2  0      Preference: preference within an order block.
    3  ""     Flags: several are possible; generally they
              denote a terminal rule (see below).
    4  ""     Services: list of protocols and services
              supported by the end point (i.e. when it
              is a terminal rule - see below).
    5  ...    Regular expression
    6  ...    Replacement [2]
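To make the six fields concrete, here is a small Python sketch that
splits the record data as printed by dig into named fields. The class
and function names are my own, and shlex is only a rough stand-in for a
real DNS presentation-format parser; it does turn the doubled backslash
in the dig output ("\\1") into the single one ("\1") the regex needs.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Naptr:
    order: int        # field 1
    preference: int   # field 2
    flags: str        # field 3
    services: str     # field 4
    regexp: str       # field 5
    replacement: str  # field 6


def parse_naptr_rdata(rdata: str) -> Naptr:
    """Split the part after 'NAPTR' in dig's output into its six fields."""
    fields = shlex.split(rdata)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    order, pref, flags, services, regexp, replacement = fields
    return Naptr(int(order), int(pref), flags, services, regexp, replacement)


record = parse_naptr_rdata(r'0 0 "" "" "!^http://([^:/?#]*).*$!\\1!i" .')
print(record.regexp)
```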
So we have a new regex:
"!^http://([^:/?#]*).*$!\\1!i"
Now apply this to our URL:
http://www.asemantics.com//n/index.html =~
"!^http://([^:/?#]*).*$!\\1!i"
results in
www.asemantics.com
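A minimal Python sketch of applying such a substitution expression,
under a few assumptions of mine: the first character is the delimiter
and never occurs inside the pattern, Python's re module stands in for
the POSIX regexes of the RFCs, and the backslashes have already been
unescaped ("\1" rather than "\\1"). The function name is hypothetical.
It returns only the expanded replacement, as the walk-through does.

```python
import re
from typing import Optional


def apply_subst(expr: str, aus: str) -> Optional[str]:
    """Apply '!regex!replacement!flags' to the string `aus`.

    Returns the expanded replacement, or None if the regex does not match.
    """
    delim = expr[0]
    parts = expr[1:].split(delim)
    if len(parts) != 3:
        raise ValueError(f"cannot parse substitution expression {expr!r}")
    pattern, replacement, flags = parts
    match = re.search(pattern, aus, re.IGNORECASE if "i" in flags else 0)
    if match is None:
        return None
    # expand() fills \1, \2, ... in the replacement from the match groups.
    return match.expand(replacement)


url = "http://www.asemantics.com//n/index.html"
print(apply_subst(r"!^http://([^:/?#]*).*$!\1!i", url))
# prints: www.asemantics.com
```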
Note that this was the last centrally defined, standards-mandated step;
everything from here on is up to the manager of the FQDN (i.e.
whoever manages asemantics.com).
Now continue our DDDS loop (which is NOT recursive):
dig -t NAPTR www.asemantics.com
And we get
www.asemantics.com. 1800 IN
NAPTR 100 20 "" "" "!^http://([^:/?#]*).*$!bali.asemantics.com!i" .
NAPTR 100 10 "" ""
"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i" .
Again we apply the regexes to the URL in the right order: sorted by
order (first field), then by preference (second field).
order 100, pref 10
http://www.asemantics.com//n/index.html
=~"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"
no match. Ok, next one.
order 100, pref 20
http://www.asemantics.com//n/index.html =~
"!^http://([^:/?#]*).*$!bali.asemantics.com!i"
and we get
bali.asemantics.com
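The ordering step can be sketched in Python like this. The records are
reduced to (order, preference, regexp) tuples and the helper names are
made up; the regex handling is deliberately simplified (delimiter taken
from the first character, Python's re instead of POSIX regexes).

```python
import re


def apply_subst(expr, aus):
    """Return the expanded replacement, or None if the regex does not match."""
    delim = expr[0]
    pattern, replacement, flags = expr[1:].split(delim)
    match = re.search(pattern, aus, re.IGNORECASE if "i" in flags else 0)
    return match.expand(replacement) if match else None


def first_match(records, aus):
    """Try the rules sorted by (order, preference); return the first result."""
    for order, pref, regexp in sorted(records, key=lambda r: (r[0], r[1])):
        result = apply_subst(regexp, aus)
        if result is not None:
            return result
    return None


records = [
    (100, 20, r"!^http://([^:/?#]*).*$!bali.asemantics.com!i"),
    (100, 10, r"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"),
]
print(first_match(records, "http://www.asemantics.com//n/index.html"))
# prints: bali.asemantics.com
```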
So what has happened here is that we are routing the request to the
right place: some URIs, those on our FOAF server, are special cases,
whereas most of them go to the server Bali.
Then, you've guessed it, we do another lookup:
dig -t NAPTR bali.asemantics.com
and get back:
bali.asemantics.com 1800 IN
NAPTR 100 10 "u" "http+I2L"
"!^http://([^:/?#]*)(.*)$!http://\\1/url.pl/\\2!i" .
NAPTR 100 10 "a" "z3950+I2C"
"!^http://([^:/?#]*)(.*)$!209.132.96.45!i" .
NAPTR 100 10 "u" "http+I2C"
"!^http://([^:/?#]*)(.*)$!http://\\1/rdf.pl/\\2!i" .
NAPTR 100 10 "u" "http+I2R" "!^(.*)$!\\1!i" .
Note that this time there is a value in the 'flags' field: a 'U'
(written "u" in the records; case does not matter). This signals that
a match of the corresponding regex means:
-> Terminal; do not evaluate any further.
-> And the result of the regex (if it matched) MUST be a URI.
Several other flags are defined.
Secondly you'll notice that the 'service' field contains something. The
syntax is
[ protocol ] [ '+' service ]
where protocol is any valid IANA service (see your /etc/services file),
http and ftp being well-known examples, and 'service' can take several
values; shown above are:
I2R  Identifier to Resource       -> give me the thing
I2L  Identifier to Location       -> give me the location
I2C  Identifier to Characteristic -> give me metadata about the resource
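Splitting the service field along that syntax is a one-liner; a small
Python sketch follows (the function name is mine, and it accepts more
than one '+service' part, which I believe the RFC grammar allows):

```python
def parse_services(field: str):
    """Split e.g. 'http+I2C' into ('http', ['I2C']); '' gives ('', [])."""
    if not field:
        return "", []
    protocol, *services = field.split("+")
    return protocol, services


print(parse_services("http+I2C"))
# prints: ('http', ['I2C'])
```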
So let's now assume that we started this procedure wanting to learn
ABOUT the URL, and that we speak HTTP; then apply the above
rules:
NAPTR 100 10 "u" "http+I2R" "!^(.*)$!\\1!i" .
would match fine; we can do http, but we're not interested in I2R, so
next we try
NAPTR 100 10 "a" "z3950+I2C"
"!^http://([^:/?#]*)(.*)$!209.132.96.45!i" .
and this matches, and we want I2C - but we're no dinosaurs, so we do
not speak z3950. So next we try:
NAPTR 100 10 "u" "http+I2C"
"!^http://([^:/?#]*)(.*)$!http://\\1/rdf.pl/\\2!i" .
this matches, we can do http, and we want I2C; so the final result
is
http://www.asemantics.com/rdf.pl///n/index.html
and the terminal type is 'U' - so I should interpret the above result
as a URI. [3]
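Putting the whole loop together, here is a hedged Python sketch. It does
no DNS at all: ZONE is a hypothetical in-memory table holding the
records from this walk-through, and resolve() and apply_subst() are
names I made up. It filters on protocol/service before trying the regex
(the walk-through does it the other way round; the outcome is the same),
and it keeps looping, not recursing, until a rule with a flag matches.

```python
import re

# Hypothetical stand-in for DNS: name -> (order, pref, flags, services, regexp)
ZONE = {
    "http.uri.arpa.": [
        (0, 0, "", "", r"!^http://([^:/?#]*).*$!\1!i"),
    ],
    "www.asemantics.com": [
        (100, 20, "", "", r"!^http://([^:/?#]*).*$!bali.asemantics.com!i"),
        (100, 10, "", "", r"!^http://foaf.([^:/?#]*).*$!foaf.asemantics.com!i"),
    ],
    "bali.asemantics.com": [
        (100, 10, "u", "http+I2L", r"!^http://([^:/?#]*)(.*)$!http://\1/url.pl/\2!i"),
        (100, 10, "a", "z3950+I2C", r"!^http://([^:/?#]*)(.*)$!209.132.96.45!i"),
        (100, 10, "u", "http+I2C", r"!^http://([^:/?#]*)(.*)$!http://\1/rdf.pl/\2!i"),
        (100, 10, "u", "http+I2R", r"!^(.*)$!\1!i"),
    ],
}


def apply_subst(expr, aus):
    """Return the expanded replacement, or None if the regex does not match."""
    delim = expr[0]
    pattern, replacement, flags = expr[1:].split(delim)
    match = re.search(pattern, aus, re.IGNORECASE if "i" in flags else 0)
    return match.expand(replacement) if match else None


def resolve(uri, protocol, service, max_steps=10):
    """Run the loop; return (flags, result) of the first terminal rule."""
    key = re.match(r"^([^:]+)", uri).group(1) + ".uri.arpa."
    for _step in range(max_steps):
        for order, pref, flags, services, regexp in sorted(
            ZONE.get(key, []), key=lambda r: r[:2]
        ):
            if services:
                proto, *wanted = services.split("+")
                if proto != protocol or service not in wanted:
                    continue  # we do not speak this, or do not want it
            result = apply_subst(regexp, uri)  # always applied to the URI
            if result is None:
                continue
            if flags:
                return flags, result  # terminal rule: stop here
            key = result  # non-terminal: result is the next name to look up
            break
        else:
            raise LookupError(f"no usable rule found at {key!r}")
    raise LookupError("too many steps")


flags, result = resolve("http://www.asemantics.com//n/index.html", "http", "I2C")
print(flags, result)
```

Swapping ZONE for real NAPTR queries is the only part that needs a DNS
library; the loop itself stays the same.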
Apologies for any typos/cut-and-paste errors in the above - I'll spend
some cycles in the next week to simplify the rules in our demo domain
above to make it a bit easier to follow.
On http://foaf-demo.asemantics.com/ex.html you can find some very rough
and ready code in perl/java which does the above; OR (better) you can
cut and paste the algorithm from RFC 3402 and 3404, which is probably
much quicker.
(Though I'd love to hear if you open source your
python/perl/php/ruby/assembler version of it :-).
C'est ça
Dw.
Notes
1: Using abbreviated/simplified and not quite correct
regexes in perl style to make it easier to follow; the
real ones are more complex, to deal with escaping
and to match the URI definition in the RFC exactly.
2: See RFC 3401-3405 for exactly how and when this is used; in
general, use the regex if there is a value, or, if it is
empty, do outright substitution with the replacement value.
3: Other options are an SRV record or simply an IP
address.
4: RFC 2396, Uniform Resource Identifiers
5: RFC 2915, NAPTR record
6: RFC 3405, http://uri.net/ddds.html