Larry Masinter Musings: personal blog of Larry Masinter on mainly technical topics.<br />
<h2>IETF "Security Considerations" and PDF (2016-09-10)</h2>
One of the things I've been doing lately is trying to dampen some of the misconceptions and misdirections concerning the Portable Document Format (PDF). I'm not sure why, except people seem to forget what was good about it and what wasn't (isn't).<br />
<br />
<h3>
Some background about PDF </h3>
Everyone has heard of PDF, but I'm not sure there's widespread understanding of its role and history. <a href="https://en.wikipedia.org/wiki/Portable_Document_Format">Wikipedia's PDF article</a> isn't too bad: page-independent data structures, but based on PostScript; a way of getting licenses to embed fonts; first released in 1993. Originally a "distilled" printed page; over the years, features were added: forms, 3D, compression, reflow, accessibility.<br />
<br />
PDF is over 20 years old ... "as old as the Web" -- I first heard about it at GopherCon '93. Has it run its course? Is it time for something new? But PDF supplies a unique solution for an application that spans paper and electronic documents, with assurances of fidelity: if I send you a document, and you say you got it and read it (using a conforming reader), then I know what you saw.<br />
<h3>
MIME types</h3>
In email and on the web, file types are labeled by a two-part label, like text/html, image/jpeg, or application/pdf. This "Internet Media Type" is (supposed to be) used in Content-Type headers in email and on the web as a way for the sender to say how a receiver should interpret the content (except for "sniffing", but that's another blog post).<br />
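As a quick illustration (mine, not from the registry itself): Python's standard-library <code>mimetypes</code> module carries a copy of this two-part mapping, so you can see which label a sender would put in a Content-Type header. The filenames below are made up.

```python
import mimetypes

# Map made-up filenames to the two-part media type a sender would
# declare in a Content-Type header. strict=True limits the lookup
# to IANA-registered types.
for name in ["page.html", "photo.jpeg", "report.pdf"]:
    media_type, _encoding = mimetypes.guess_type(name, strict=True)
    print(f"{name} -> Content-Type: {media_type}")
# report.pdf -> Content-Type: application/pdf
```

Note that this only maps file extensions; actually labeling content correctly is the sender's job, which is exactly why the registry matters.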
<br />
There's an official list of media types managed by IANA (in the news lately for other reasons; another blog post). IANA, the Internet Assigned Numbers Authority, is in charge of maintaining the registries, as directed by the IETF. <br />
<br />
IETF has a different decision-making process than other standards groups. However, as usual, the process involves creating and distributing a draft, and asking for comments. Comments need to be responded to, even if you don't make changes because of the comment. Different kinds of documents have different criteria for advancement, and it's sometimes hard to figure out what rules apply.<br />
<h3>
Getting <a href="https://tools.ietf.org/pdf/draft-hardy-pdf-mime-04.pdf">draft-hardy-pdf-mime</a> to RFC </h3>
I got into updating the registration of PDF a while back, while
working on "<a href="https://datatracker.ietf.org/doc/draft-iab-rfc-use-of-pdf/">PDF for RFCs</a>", and, after consultation, took the path of
revising the RFC which authorized the current registration, RFC 3778, in the form of <a href="https://datatracker.ietf.org/doc/draft-hardy-pdf-mime/">a document that replaced 3778, including the registration template for application/pdf</a>. That's the document I'm trying to get passed.<br />
<br />
There were lots of comments during the review period, and I responded to most of them last week, in <a href="https://mailarchive.ietf.org/arch/msg/ietf/yKbjLYUcxHHdj63PbrWbte12jBEd">a single email</a>.<br />
<br />
<h3>
Which process?</h3>
I won't go into the detailed rules, but the path we chose involved getting IESG approval for an Informational specification, one of the paths laid out in <a href="https://tools.ietf.org/html/rfc6838">RFC 6838</a>, which defines the rules for the IANA media type registry.<br />
<br />
But which rules apply?&nbsp; RFC 6838 Section 3.1 for types "registered by a recognized standards-related organization using the 'Specification Required' IANA registration policy [<a href="https://tools.ietf.org/html/rfc5226" title="&quot;Guidelines for Writing an IANA Considerations Section in RFCs&quot;">RFC5226</a>]"? Or do we follow Section 6.1, "in cases where the original definition of the scheme is contained in an IESG-approved document, updates of the specification also requires IESG approval."?<br />
<br />
And does the "DISCUSS" laid on the document's approval meet any of the criteria of <a href="https://www.ietf.org/iesg/statement/discuss-criteria.html">the rules for a DISCUSS</a>?<br />
<br />
But I'd like to accommodate the common request that the document say more about the security of PDF software. It's well known that PDF has been a vector for several infamous exploits... why can't we say more?<br />
<br />
<h3>
"Security Considerations"</h3>
IETF has an unusual policy of requiring ALL documents (<a href="https://tools.ietf.org/html/rfc3552">RFC 3552</a>, Section 5) to consider security and to document threats and possible mitigations. ISO has no such rule; security is considered the responsibility of the implementation. W3C nominally does, through TAG review, I think, but WHATWG is more haphazard. The question remains: does conformance <i>require</i> that an implementation expose the user to security risks?<br />
<br />
I'm sure we could say more. And if this were a new registration, or the PDF spec itself, I'd try. But application/pdf has been around for over 20 years, and its exploits and their prevention are well publicized.<br />
But there is no single valid account of software vulnerabilities; the paper suggested (in a COMMENT, not a DISCUSS) isn't anything I could cite; I disagree with too many parts.
<br />
I’ll go back to the question of the purpose of “Security Considerations” in MIME registrations: for whom should it be written? For a novice, it is not enough. For an expert, you wind up enumerating the exploits that are understood and can be explained. The situation is fluid because the deployment of browser-based PDF interpreters is changing for desktop and mobile, and PDF is just another part of the web.<br />
<br />
I agree with the reasoning behind the requirement: making everyone write about security might make them think a little more about it.<br />
<br />
But I think there’s another view: security is a feature of the implementation. It’s the implementation’s job to mitigate vulnerabilities. So for any security problem, blame the implementation, not the protocol. And implementors need to worry about not writing buggy code in general, not just about security per se.<br />
<br />
And there is no point in saying “write your implementations carefully”, because there are so many ways to write software badly. Talking about the obvious, easy-to-describe exploits isn’t really useful, because we know how to avoid those.<br />
<br />
Now perhaps this is just "don't set a bad precedent". So maybe the answer is to follow text/html, and suggest that "entire novels" have been written about PDF security, but not here in the Internet Media Type registration.
<h2>Birthday greetings, Packed committees, community, standards (2015-02-15)</h2>
<div class="tr_bq">
Today is my birthday. I woke to find many birthday greetings on Facebook, and more roll in throughout the day. It's hard to admit how pleasing it is, embarrassing. I haven't asked for birthday greetings and don't usually give them. Maybe I'll change my mind.</div>
Perhaps I'm late to the party but I'm still trying to understand the 'why' of social networking -- why does Facebook encourage birthday greetings? What human pleasure does getting 11 "happy birthday" notes trigger?<br />
But it fits into the need to have and build community, and the mechanism for community requires periodic acknowledgement. We engage in sharing our humanity (everyone has a birthday) by greeting. Hello, goodbye, I'm here, poke. But not too often, once a year is enough.<br />
I wrote about <a href="http://www.ietf.org/mail-archive/web/ietf/current/msg91783.html" target="_blank">standards and community</a> yesterday on the IETF list, but people didn't get it. <br />
Explaining that message and its relationship to birthday greetings is hard.<br />
The topic of discussion was "Updating BCP 10 -- NomCom ELEGIBILITY".<br />
<ul>
<li><a href="https://www.ietf.org/" target="_blank">IETF</a>&nbsp;: group that does Internet Standards</li>
<li><a href="https://tools.ietf.org/html/bcp10" target="_blank">BCP10</a>: the unique process for how IETF recruits and picks leadership</li>
<li><a href="https://www.ietf.org/nomcom/" target="_blank">NOMCOM</a>: the "Nominating Committee" which picks leadership amongst volunteers</li>
<li><a href="http://tools.ietf.org/html/rfc7437#section-4.14" target="_blank">eligibility</a>: qualifications for getting on the NOMCOM</li>
</ul>
<div>
I think BCP10 is a remarkable piece of social engineering, at the center of the question of the governance of the Internet: how to make hard decisions, who has final authority, who gets to choose them, and how to choose the choosers. Most standards groups get their authority from governments and treaties, or are consortia. But the IETF developed a complex algorithm for trying to create structure without any other final authority to resolve disputes.</div>
<div>
<br /></div>
<div>
But it looks like this complex algorithm is buggy, and the IETF is trying to debug the process without being too open about the problem. The idea was to let people volunteer, and to choose randomly among qualified volunteers. But what qualifications? There's been some concern about the latest round of NomCom volunteers; that's what started this thread.</div>
<div>
<br /></div>
<div>
During the long email thread on the topic, the discussion turned to the tradeoffs between attending a meeting in person vs. using new Internet tools for virtual meetings or more support for remote participation. &nbsp;Various people noted that the advantage of meeting in person is the ability to have conversations in the hallways outside the formal, minuted meetings.&nbsp;</div>
<div>
<br /></div>
<div>
I thought people were too focused on their personal preferences rather than the needs of the community. What are we trying to accomplish, and how do meetings help with that? How would we satisfy the requirements for effective work?</div>
<div>
<br /></div>
<div>
A few more bits: I mention some of the conflicts between IETF and other standards groups over URLs and JSON because W3C, WHATWG, ECMA are different tribes, different communities.</div>
<div>
<br /></div>
<blockquote class="tr_bq">
Creating effective standards is a community activity to avoid the&nbsp;<a href="http://en.wikipedia.org/wiki/Tragedy_of_the_commons" rel="nofollow">Tragedy of the Commons</a>&nbsp;that would result if individuals and organizations all went their own way. The common good is “the Internet works consistently for everyone” which needs to compete against “enough of the Internet works ok for my friends” where everyone has different friends.</blockquote>
<blockquote>
For voluntary standards to happen, you need rough consensus — enough people agree to force the remainder to go along.&nbsp;</blockquote>
<blockquote>
It’s a community activity, and for that to work there has to be a sense of community. And video links with remote participation aren’t enough to create a sense of community.&nbsp;</blockquote>
<blockquote>
There are groups that purport to manage with minimal face-to-face meetings, but I think those mainly have narrow scope and a small number of relevant players, or an already-established community, and they rely heavily on 24/7 online chat, social media, open-source tools, and wikis, which become requirements for full participation.<br />
<br />
The “hallway conversations” are not a nice-to-have, they’re how the IETF preserves community with open participation.<br />
One negative aspect of IETF “culture” (loosely, the way in which the IETF community interacts) is that it isn’t friendly or easy to match and negotiate with other SDOs, so we see the WHATWG / W3C / IETF unnecessary forking of URL / URI / IRI, encodings, MIME sniffing, and the RFC7159-JSON competing specs based at least partly on cultural misunderstandings.<br />
The main thing nomcom needs to select for is technical leadership (the skill of getting people to follow) in service of the common good. And nomcom members should have enough experience to have witnessed successful leadership. One hopes there might be some chance of that just by attending 3 meetings, although the most effective leadership is often exercised in those private hallway conversations where compromises are made.</blockquote>
<h2>Ambiguity, Semantic web, speech acts, truth and beauty (2014-11-20)</h2>
<div>
<br /></div>
<div>
(I think this post is pretty academic for the web dev crowd, oh well)</div>
<div>
<br /></div>
<div>
When talking about URLs and URNs, or the semantic web, or linked data, I keep returning to one topic. Carl Hewitt gave me a paper about inconsistency, which this post reacts to.</div>
<div>
<br /></div>
<div>
The traditional AI model of semantics and meaning doesn't work well for the web.</div>
<div>
Maybe this is old-hat somewhere but if you know any writings on this topic, send me references.</div>
<div>
<br /></div>
<div>
In the traditional model (from Bobrow's essay in <i>Representation and Understanding</i>), the real world has objects and people and places and facts; there is a Knowledge Representation Language (KRL) in which statements about the world are written, using terms that refer to the objects in the real world. Experts use their expertise to write additional statements about the world, and an "Inference Engine" processes those statements together to derive new statements of fact.</div>
<div>
<br /></div>
<div>
This is like classic deduction ("Socrates is a man; all men are mortal; thus Socrates is mortal") or arithmetic: computing 37+53 by adding 7+3, writing 0 and carrying 1, then 1 plus 3 plus 5, writing 9, giving 90.</div>
<div>
<br /></div>
<div>
And to a first approximation, the semantic web was based on the idea of using URLs as the terms that refer to the real world and to relationships, with RDF as an underlying KRL where statements consist of triples.</div>
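To make that model concrete, here is a toy sketch (mine, not from any RDF spec) of "URLs as terms, triples as statements, plus an inference engine". The example.org URLs and the single subclass rule are purely illustrative.

```python
# Toy semantic-web model: facts are (subject, predicate, object)
# triples whose terms are URLs; an "inference engine" derives new
# triples. The example.org URLs are made up for illustration.
facts = {
    ("http://example.org/Socrates", "rdf:type", "http://example.org/Man"),
    ("http://example.org/Man", "rdfs:subClassOf", "http://example.org/Mortal"),
}

def infer(triples):
    """Apply one classic rule to a fixed point:
    if (x type C) and (C subClassOf D), then (x type D)."""
    derived = set(triples)
    changed = True
    while changed:
        changed = False
        for (x, p, c) in list(derived):
            if p != "rdf:type":
                continue
            for (c2, q, d) in list(derived):
                if q == "rdfs:subClassOf" and c2 == c:
                    new = (x, "rdf:type", d)
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived

closure = infer(facts)
print(("http://example.org/Socrates", "rdf:type",
       "http://example.org/Mortal") in closure)  # True
```

The "Socrates is mortal" deduction falls out mechanically -- which is exactly the assumption the rest of this post pokes at: the machinery works only if the terms unambiguously refer.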
<div>
<br /></div>
<div>
Now we get to the great and horrible debate over "what is the range of the http function" which has so many untenable presumptions that it's almost impossible to discuss. That the question makes sense.</div>
<div>
That you can talk about two resources being "the same". That URLs are 'unambiguous enough', and the only question is to deal with some niggly ambiguity problems, with a proposal for new HTTP result codes.</div>
<div>
<br /></div>
<div>
So does http://larry.masinter.net refer to me or my web page? To my web page now or for all history, to just the HTML of the home page or does it include the images loaded, or maybe the whole site?</div>
<div>
<br /></div>
<div>
"http://larry.masinter.net" "looks" "good".</div>
<div>
<br /></div>
<div>
So I keep on coming back to the fundamental assumption, the model for the model.</div>
<div>
<br /></div>
<div>
Coupled with my concern that we're struggling with identity (what is a customer, what is a visitor) in every field, and phishing and fraud on another front.</div>
<div>
<br /></div>
<div>
Another influence has been thinking about "speech acts". It's one thing to say "Socrates is a man" and completely different thing to say "Wow!". "Wow!" isn't an assertion (by itself), so what is it? It's a "speech act" and you distinguish between assertions and questions and speech acts.</div>
<div>
<br /></div>
<div>
A different model for models, with some different properties:</div>
<h4>
Every utterance is a speech act.</h4>
<div>
There are no separate categories of assertion, question, and speech act. Each message passed is just a message intending to cause a reaction on receipt. And information theory applies: you can't convey more than the bits sent will carry. "http://larry.masinter.net" doesn't intrinsically carry any more than the entropy of the string can hold. You can't tell by any process whether it was intended to refer to me or to my web page.</div>
<h4>
Truth is too simple; make belief fundamental.</h4>
<div>
So in this model, individuals do not 'know' assertions; they only 'believe' them to a degree. Some things are believed so strongly that they are treated as if they were known. Some things we don't believe at all. A speech act accomplishes its mission if the belief of the recipient changes in the way the sender wanted. Trust is a measure of influence: your speech acts that look like statements influence my beliefs about the world insofar as I trust you. The web page telling me my account balance influences my beliefs about how much I owe.<br />
<br />
<h4>
Changing the model helps think about security</h4>
Part of the problem with security and authorization is that we don't have a good model for reasoning about it. Usually we divide the world into "good guys" and "bad guys": good guys make true statements ("this web page comes from bank trustme") while bad guys lie. (Let's block the bad guys.) By putting trust and ambiguity at the base of the model, and not as an after-patch, we have a much better way of describing what we're trying to accomplish.</div>
<h4>
Inference, induction, intuition are just different kinds of processing</h4>
<div>
In this model, you would like the influence of belief to resemble logic in the cases where there is trust and those communicating have some agreement about what the terms used refer to. But inference is subject to its own flaws ("Which Socrates? What do you mean by 'mortal'? Or 'all men'?").</div>
<h4>
Every identifier is intrinsically ambiguous</h4>
<div>
Among all of the meanings the speaker might have meant, there is no in-band right way to disambiguate. Other context, out of band, might give the receiver of a message containing a URL more information about what the sender might have meant. But part of the inference, part of the assessment of trust, has to take into account beliefs about the sender's model of what the sender might have meant. Precision of terms is not absolute.</div>
<h4>
URNs are not 'permanent' nor 'unambiguous', they're just terms with a registrar</h4>
<div>
I've written more on this, which I'll expand elsewhere. But URNs aren't exempt from ambiguity; they're generally just URLs with different organizations assigned to disambiguate if called on.</div>
<h4>
Metadata, linked data, are speech acts too.</h4>
<div>
When you look in or around an object on the net, you can often find additional data trying to tell you things about the object. This is the metadata. But it isn't "truth"; metadata is also a communication act, just one where one of the terms used is the object itself.</div>
<div>
<br /></div>
<div>
There's more but I think I'll stop here. What do you think?</div>
<h2>Living Standards: "Who Needs IANA?" (2014-09-14)</h2>
<div dir="ltr" style="text-align: left;" trbidi="on">
I'm reading about two tussles, which seem completely disconnected, although they are about the same thing, and I'm puzzled why there isn't a connection.<br />
<br />
This is about the IANA protocol parameter registries. Over in <a href="http://www.ietf.org/mail-archive/web/ianaplan/current/maillist.html" target="_blank">ianaplan@ietf.org</a>, people are worrying about preserving the IANA function and the relationship between the IETF and IANA, because it is working well and shouldn't be disturbed (by misplaced US political maneuvering claiming that the long-planned transition from NTIA oversight somehow means the administration is giving something away).<br />
<br />
Meanwhile, over in <a href="http://lists.w3.org/Archives/Public/www-international/" target="_blank">www-international@w3.org</a>, there's a discussion of the Encodings document, being copied from WHATWG's document of that name into a W3C Recommendation. See the thread (started by me) about the "false statement".<br />
<br />
Living Standards don't need or want registries for most things the web uses registries for now: encodings, MIME types, URL schemes. A Living Standard has an exhaustive list, and if you want to add a new entry or change one, you just change the standard. Who needs IANA, with its fussy separate set of rules? Who needs any registry, really? <br />
<br />
So that's the contradiction: why doesn't the web need registries while other applications do? Or is IANAPLAN deluded?</div>
<h2>The multipart/form-data mess (2014-09-09)</h2>
<div dir="ltr" style="text-align: left;" trbidi="on">
OK, this is only a tiny mess, in comparison with <a href="http://masinter.blogspot.com/2014/09/the-url-mess.html" target="_blank">the URL mess</a>, &nbsp;and I have more hope for this one.<br />
<br />
Way back when (1995), I spec'ed a way of doing "file upload" in <a href="http://tools.ietf.org/html/rfc1867" target="_blank">RFC 1867</a>. I got into this because some Xerox printing product in the 90s wanted it, and enough other folks in the web community seemed to want it too. I was happy to find something that a Xerox product actually wanted from Xerox research.<br />
<br />
It seemed natural, if you were sending files, to use MIME's methods for doing so, in the hopes that the design constraints were similar and that implementors would already be familiar with email MIME implementations. &nbsp;The original file upload spec was done in IETF because at the time, all of the web, including HTML, was being standardized in the IETF. &nbsp; RFC 1867 was "experimental," which in IETF used to be one way of floating a proposal for new stuff without having to declare it ready.<br />
<br />
After some experimentation we wanted to move the spec toward standardization. Part of making the proposal standard was to modularize the specification, so that it wasn't just about uploading files in web pages. Rather, all the stuff about extending forms and the names of form fields and so forth went with HTML. And the container, the holder of "form data" -- independent of what kind of form you had or whether it had any files at all -- went into the definition of multipart/form-data (in <a href="http://tools.ietf.org/html/rfc2388" target="_blank">RFC 2388</a>). Now, I don't know if it was "theoretical purity" or just some sense of building general-purpose things that allow unintended mash-ups, but RFC 2388 was pretty general, and HTML 3.2 and HTML 4.0 were being developed by people more interested in spec'ing a markup language than a form-processing application, so there was a specification gap between RFC 2388 and HTML 4.0 about when and how and what browsers were supposed to do to process a form and produce multipart/form-data.<br />
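For concreteness, here's a rough sketch (mine; the boundary string, field names, and file contents are all made up) of the container RFC 2388 describes: each form field becomes one part, labeled with a Content-Disposition: form-data header.

```python
# Minimal sketch of the multipart/form-data container described in
# RFC 2388: one part per form field, each introduced by the boundary
# and a Content-Disposition header. "----XyZ" is an arbitrary example.
boundary = "----XyZ"

def form_data_body(fields, files):
    """Build a multipart/form-data body from simple text fields and
    (filename, content_type, data) file entries."""
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", value]
    for name, (filename, content_type, data) in files.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"',
                  f"Content-Type: {content_type}",
                  "", data]
    lines.append(f"--{boundary}--")   # closing boundary
    return "\r\n".join(lines)

body = form_data_body({"user": "larry"},
                      {"upload": ("note.txt", "text/plain", "hello")})
print(body)
```

The specification gap mentioned above is about exactly the details this sketch glosses over: escaping of names and filenames, character encodings of field values, and so on.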
<br />
February of last year (2013) I got a <a href="https://github.com/masinter/multipart-form-data/issues/18" target="_blank">request</a> to find someone to update RFC 2388. After many months of trying to find another volunteer (most declined for lack of time to deal with the politics), I went ahead and started work: update the spec, investigate what browsers did, make some known changes. See the <a href="https://github.com/masinter/multipart-form-data" target="_blank">GitHub repo for multipart/form-data</a> and <a href="http://tools.ietf.org/html/draft-ietf-appsawg-multipart-form-data" target="_blank">the latest Internet Draft spec</a>.<br />
<br />
Now, I admit I got distracted trying to build a test framework for a "test the web forward" kind of automated test, and spent way too much time building what wound up being a fairly arcane system. But I've updated the document and recommended its "working group last call". The only problem is that I just made stuff up based on some unvalidated guesswork reported second hand ... there is no working group of people willing to do work. No browser implementor has reviewed the latest drafts, as far as I can tell.<br />
<br />
I'm not sure what it takes to get technical reviewers who will actually read the document and compare it to one or more implementations to justify the changes in the draft.<br />
<br />
Go to it! Review the spec! Make concrete suggestions for change, comments or even better, send GitHub pull requests!<br />
</div>
<h2>The URL mess (2014-09-07)</h2>
<div dir="ltr" style="text-align: left;" trbidi="on">
(updated 9/8/14)<br />
<br />
One of the main inventions of the Web was the URL. And I've gotten stuck trying to help fix up the standards so that they actually work.<br />
<br />
The standards around URLs, though, have gotten themselves into an organizational political quandary to the point where it's like many other situations where a polarized power struggle keeps the right thing from happening.<br />
<br />
Here's an update to&nbsp;<a href="http://lists.w3.org/Archives/Public/www-archive/2014Apr/0014.html" target="_blank">an earlier description of the situation</a>:<br />
<br />
URLs were originally defined as ASCII-only. Although it was quickly determined that allowing non-ASCII characters was desirable, shoehorning UTF-8 into ASCII-only systems was unacceptable; at the time, Unicode was not so widely deployed, and there were other issues. The tack taken was to leave "URI" alone and define a new protocol element, "IRI"; RFC 3987 was published in 2005 (in sync with the RFC 3986 update to the URI definition). (This is a very compressed history of what really happened.)<br />
<br />
The IRI-to-URI transformation specified in RFC 3987 had options; it wasn't a deterministic path. The URI-to-IRI transformation was also heuristic, since there was no guarantee that %xx-encoded bytes in the URI were actually meant to be percent-hex-encoded bytes of a UTF-8 encoding of a Unicode string.<br />
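A small sketch of both directions using Python's urllib (the path "/café" is just an example): IRI-to-URI percent-encodes the UTF-8 bytes, while the reverse direction is only a guess that the %xx bytes were UTF-8 in the first place.

```python
from urllib.parse import quote, unquote

# IRI -> URI for a path component: percent-encode the UTF-8 bytes
# of each non-ASCII character (domain names go through IDNA instead).
iri_path = "/caf\u00e9"            # "/café"
uri_path = quote(iri_path, safe="/")
print(uri_path)                    # /caf%C3%A9

# URI -> IRI is heuristic: nothing guarantees %xx runs were UTF-8.
print(unquote(uri_path))           # /café
```

The round trip works here only because we know the bytes were UTF-8; a URI containing, say, %E9 from a Latin-1 source has no reliable way back to Unicode, which is the ambiguity described above.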
<br />
To address issues and to fix URLs for HTML5, a new working group was established in the IETF in 2009 (<a href="https://tools.ietf.org/wg/iri/charters" target="_blank">the IRI working group</a>). Despite years of development, the group didn't get the attention of those active in the WHATWG, W3C, or Unicode Consortium, and the IRI group was closed in 2014, with the consolation that the documents being developed there could be updated as individual submissions or within the "applications area" working group. In particular, one of the IRI working group items was to update the "<a href="http://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg" target="_blank">scheme guidelines and registration process</a>", which is currently under development in the IETF's applications area.<br />
<br />
Independently, the HTML5 specs in WHATWG/W3C defined "Web Address", in an attempt to match what some of the browsers were doing. This definition (mainly a published parsing algorithm) was moved out into a separate WHATWG document called "URL".<br />
<br />
The world has also moved on: ICANN has approved non-ASCII top-level domains, and IDNA 2003 and IDNA 2008 didn't really address IRI encoding. The Unicode Consortium is working on UTS #46.<br />
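To illustrate the domain-name side of the problem (the hostname is invented): Python's built-in idna codec implements IDNA 2003 (RFC 3490), turning non-ASCII labels into ASCII "xn--" punycode labels and back.

```python
# IDNA 2003 mapping between Unicode and ASCII-compatible labels.
# "bücher.example" is a made-up hostname.
host = "bücher.example"
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)                                 # xn--bcher-kva.example
print(ascii_host.encode("ascii").decode("idna"))  # bücher.example
```

Note this is the IDNA 2003 mapping; IDNA 2008 and UTS #46 differ for some characters, which is part of the inconsistency the post is complaining about.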
<br />
The big issue is to make the IRI-to-URI transformation non-ambiguous and stable. But I don't know what to do about non-domain-name, non-ASCII 'authority' fields. There is some evidence that some processors %xx-hex-encode the UTF-8 of domain names in some circumstances.<br />
<br />
There are four umbrella organizations (IETF, W3C, WHATWG, Unicode consortium) and multiple documents, and it's unclear whether there's a trajectory to make them consistent:<br />
<br />
<h4 style="text-align: left;">
IETF</h4>
<div style="text-align: left;">
Dave Thaler (mainly) has updated <a href="http://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg" target="_blank">http://tools.ietf.org/html/draft-ietf-appsawg-uri-scheme-reg</a>, which needs community review.</div>
<br />
The IRI working group closed, but work can continue in the APPS area working group. Three drafts (<a href="http://tools.ietf.org/html/draft-ietf-iri-3987bis" target="_blank">iri-3987bis</a>, <a href="http://tools.ietf.org/html/draft-ietf-iri-comparison" target="_blank">iri-comparison</a>, <a href="http://tools.ietf.org/html/draft-ietf-iri-bidi-guidelines" target="_blank">iri-bidi-guidelines</a>), originally intended to obsolete RFC 3987, now sit abandoned, needing update.<br />
<br />
Other relevant IETF work I'm not as familiar with is the IDN/IDNA work for internationalizing domain names, since the rules for canonicalizing, comparing, encoding, parsing, and displaying domain names need to be compatible with the rules for doing those things to URLs that contain domain names.<br />
<br />
In addition, there's quite a bit of activity around URNs and library identifiers in the URN working group, work that is ignored by other organizations.<br />
<h4 style="text-align: left;">
W3C</h4>
The W3C has many existing Recommendations which reference the IETF URI/IRI specs in various ways (for example, XML has its own restricted/expanded syntax for URL-like things). The HTML5 spec references something, the TAG seems to be involved, as well as the sysapps working group, I believe. I haven't tracked what's happened in the last few months.<br />
<br />
<h4 style="text-align: left;">
WHATWG</h4>
<div style="text-align: left;">
The WHATWG spec is <a href="http://url.spec.whatwg.org/">http://url.spec.whatwg.org/</a> (Anne, Leif). This fits the WHATWG principle of focusing on specifying what is important for browsers, so it leaves out many of the topics in the IETF specs. I don't think there is any reference to registration, and (when I checked last) it had a fixed set of relative schemes -- ftp, file, gopher (a mistake?), http, https, ws, wss -- used IDNA 2003, not 2008, and was (perhaps, perhaps not) at odds with the IETF specs.</div>
<br />
<h4 style="text-align: left;">
Unicode consortium</h4>
Early versions of UTS #46 (and, I think, others) recommend translating to ASCII and back using punycode, but weren't specific about which schemes.<br />
<br />
<h3 style="text-align: left;">
Conclusion</h3>
<br />
From a user or developer point of view, it makes no sense for there to be a proliferation of definitions of URL, or a large variety of URL syntax categories. Yes, currently there is a proliferation of slightly incompatible implementations. This shouldn't be a competitive feature. Yet the organizations involved have little incentive to incur the overhead of cooperation, especially since there is an ongoing power struggle for legitimacy and control. The same dynamic applies to the Encoding spec and, to a lesser degree, the handling of MIME types (sniffing) and multipart/form-data.</div>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-14912157451597329922014-09-06T14:55:00.003-07:002016-09-10T20:40:35.935-07:00On blogging, tweeting, facebooking, emailing<div dir="ltr" style="text-align: left;" trbidi="on">
I wanted to try all the social media, just to keep an understanding of how things really work, I say.<br />
<br />
And my curiosity satisfied, I 'get' blogging, tweeting, facebook posting, linking in, although I haven't tried pinning and instagramming. And I'm not sure what about.me is about, really, and quora sends me annoying spam which tempts me to read.<br />
<br />
Meanwhile, I'm hardly blogging at all, even though I have lots of topics I want to write about. My wife Carol, on the other hand, is <a href="http://carolscruise.blogspot.com/" target="_blank">blogging about a trip</a>; I supply photo captions and Internet support.<br />
<br />
So I'm going to follow suit and try to blog daily: Blogspot for technical topics, Facebook for personal, a tweet to announce, a LinkedIn notice when there's more to read. I want to update my site, too; more on that later.<br />
<br /></div>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-36443246792216736102013-09-10T17:09:00.000-07:002013-09-11T00:23:19.579-07:00HTTP/2.0 worriesI tried to explain HTTP/2.0 in my <a href="http://masinter.blogspot.com/2013/09/why-http20-perspective.html" target="_blank">previous post</a>. This post notes some nagging worries about HTTP/2.0 going forward.&nbsp;Maybe these are nonsense, but ... tell me why I'm wrong ....<br />
<h4>
Faster is better, but faster for whom?</h4>
It should be no surprise that using software is more pleasant when it responds quickly; the effect is pronounced, and is the difference between "usable" and "just frustrating". For the web, the critical time is between when the user clicks a link and when the results are legible and useful. <a href="http://googleresearch.blogspot.com/2009/06/speed-matters.html" target="_blank">Studies</a> (and <a href="http://programming.oreilly.com/2009/07/velocity-making-your-site-fast.html" target="_blank">others</a>) show that improving page load time has a significant effect on the use of web sites. And a primary component of web speed is network speed: not just bandwidth but, for the web, latency. Much of the world doesn't have high-speed Internet, and there the web is often close to unusable.<br />
<br />
The problem is -- faster for whom? In general, when optimizing something, one makes changes that speed up common cases, even if making uncommon cases more expensive. Unfortunately, different communities can disagree about what is "common", depending on their perspective.<br />
<br />
Clearly, connection multiplexing helps sites that host all of their data at a single server more than it helps sites that open connections to multiple systems.<br />
<br />
It should be a good thing that the protocol designers are basing optimizations on measurements of real web sites and real data. But the data being used risks bias: little of it has been published, and few results have been independently reproduced. Decisions in the working group are being made on limited data, and often are neither reproducible nor auditable. <br />
<h4>
Flow control at multiple layers can interfere</h4>
This isn't the first time there's been an attempt to revise HTTP/1.1; the HTTP-NG effort also tried. One of HTTP-NG's difficulties was interaction between TCP flow control and the framing of messages at the application layer, resulting in latency spikes. And those working with SPDY report that SPDY isn't effective without server "prioritization", which I understand to be predictively deciding which resources the client will need first, and sending their content chunks sooner. While some servers have added such prioritization and prediction facilities, those mechanisms are unreported and proprietary.<br />
<h4>
Forking &nbsp;</h4>
While HTTP/2.0 started with SPDY, <a href="http://www.chromium.org/spdy/spdy-protocol" target="_blank">SPDY</a> development continues independently of HTTP/2.0. While the intention is to roll good ideas from SPDY into HTTP/2.0, there remains a risk that the projects will fork. Whether the possibility of forking is positive or negative is itself controversial, but I think the bar should be higher.<br />
<h4>
Encryption everywhere&nbsp;</h4>
There is a long-running and still unresolved debate around the guidelines for using, mandating, requiring use of, or implementation of encryption, in both HTTP/1.1 and HTTP/2.0. It's clear that HTTP/2.0 changes the cost of multiple encrypted connections to the same host significantly, thus reducing the overhead of using encryption everywhere: Normally, setting up an encrypted channel is relatively slow, requiring a lot more network round trips to establish. With multiplexing, the setup cost only happens once, so encrypting everything is less of a problem.<br />
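The amortization argument is simple arithmetic. Here is a back-of-envelope sketch; the round-trip counts and the 200 ms latency are illustrative assumptions, not measurements:

```python
RTT_MS = 200   # assumed round-trip time on a high-latency link
TCP_RTTS = 1   # TCP three-way handshake costs about one round trip
TLS_RTTS = 2   # a classic TLS handshake adds roughly two more

def setup_ms(connections):
    """Total connection-setup latency paid before requests can flow."""
    return connections * (TCP_RTTS + TLS_RTTS) * RTT_MS

many = setup_ms(40)  # one encrypted connection per resource
one = setup_ms(1)    # a single multiplexed encrypted connection
```

Forty separate encrypted connections would spend 24 seconds of this hypothetical link's time on handshakes alone; one multiplexed connection pays the 600 ms once.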
<br />
But there are a few reasons why that might not actually be ideal. For example, there is a large market for devices which monitor, adjust, redirect or otherwise interact with unencrypted HTTP traffic; a company might scan and block some kinds of information on its corporate net. Encryption everywhere will have a serious impact on sites and networks that use these interception devices, for better or worse. And adding encryption where the traffic is already protected adds unnecessary overhead.<br />
<br />
In any case, encryption everywhere might be more feasible with HTTP/2.0 than HTTP/1.1 because of the lower overhead, but it doesn't promise any significant advantage for privacy per se.<br />
<br />
<h4>
Need realistic measurement data</h4>
To ensure that HTTP/2.0 is good enough to completely replace HTTP/1.1, it's necessary to ensure that HTTP/2.0 is better in <i>all&nbsp;</i>cases. We do not have agreed, reproducible ways of measuring performance and impact across a wide variety of realistic configurations of bandwidth and latency. Measurement is crucial, lest we introduce changes which make things worse in unanticipated situations, or wind up with protocol changes that only help the use cases important to those who attend the meetings regularly, and not the unrepresented.<br />
<h4>
</h4>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com3tag:blogger.com,1999:blog-418176209892776711.post-41668314606297518432013-09-10T15:44:00.001-07:002013-09-20T19:41:00.880-07:00Why HTTP/2.0? A PerspectiveWhen setting up for the&nbsp;<a href="http://masinter.blogspot.com/2013/09/http-meeting-in-hamburg.html" target="_blank">HTTP meeting in Hamburg</a>, I was asked, reasonably enough, what the group is doing, why it was important, and my prognosis for its success. &nbsp;It was hard to explain, so I thought I'd try to write up my take "why&nbsp;<a href="http://en.wikipedia.org/wiki/HTTP/2.0" target="_blank">HTTP/2.0</a>?" &nbsp;Corrections, additions welcome.<br />
<h4>
HTTP Started Simple</h4>
<div>
The HyperText Transfer Protocol when first proposed was a very simple network protocol, much simpler than FTP (File Transfer Protocol), and quite similar to&nbsp;<a href="http://en.wikipedia.org/wiki/Gopher_(protocol)" target="_blank">Gopher</a>. Basically, the protocol is layered on the Transmission Control Protocol (TCP), which sets up bi-directional reliable streams of data. HTTP/0.9 expected one TCP connection per user click to get a new document. When the user clicks a link, the client takes the URL of the link (which contains the host, port, and path) and:</div>
<ol>
<li>using DNS, the client gets the IP address of the server in the URL</li>
<li>the client opens a TCP connection to that address on the port named in the URL</li>
<li>the client writes "GET" and the path of the URL onto the connection</li>
<li>the server responds with HTML for the page</li>
<li>the client reads the HTML and displays it</li>
<li>the connection is closed</li>
</ol>
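The six steps above amount to only a few lines of socket code. Here is a toy sketch: a hypothetical in-process server stands in for the remote host, and the client speaks bare HTTP/0.9 (no headers, no status line, connection close ends the response).

```python
import socket
import threading

def tiny_server(listener):
    # Stands in for step 4: read the request line, reply with raw HTML, close.
    conn, _ = listener.accept()
    request = conn.recv(1024).decode("ascii")
    assert request.startswith("GET ")
    conn.sendall(b"<html><body>Hello, 1991!</body></html>")
    conn.close()

def http09_get(host, port, path):
    # Steps 2, 3, 5 and 6: open a TCP connection, write "GET <path>",
    # read until the server closes the connection.
    with socket.create_connection((host, port)) as s:
        s.sendall(("GET " + path + "\r\n").encode("ascii"))
        chunks = []
        while data := s.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("ascii")

listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # loopback stands in for the DNS lookup of step 1
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=tiny_server, args=(listener,), daemon=True).start()
page = http09_get("127.0.0.1", port, "/index.html")
```

The entire protocol fits in one function, which is the point: the overhead beyond the document itself is just connection setup.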
<div>
Judged by latency and bandwidth, simple HTTP was adequate: the only overhead beyond transferring the document itself was the time to look up the DNS name and set up the TCP connection.&nbsp;</div>
<h4>
Growing Complexity</h4>
<div>
HTTP got lots more complicated; changes were reflected in a series of specifications, initially with <a href="http://tools.ietf.org/html/rfc1945" target="_blank">HTTP/1.0</a>, and subsequently<a href="http://tools.ietf.org/html/rfc2616" target="_blank"> HTTP/1.1</a>. Evolution has been&nbsp;lengthy, painstaking work; a second edition of the HTTP/1.1 specification (in six parts, only now nearing completion) has been under development for 8 years.&nbsp;</div>
<h4>
Adding Headers</h4>
<div>
HTTP/1.0 request and response (steps 3 and 4 above) added <i>headers</i>: fields and values that modified the meaning of requests and responses. Headers were added to support a wide variety of additional use cases, e.g., a "Content-Type" header to allow images and other kinds of content, a "Content-Encoding" header and others to allow optional compression, quite a number of headers to support caching and cache maintenance, and a "DNT" header to express user privacy preferences.<br />
<br />
While each header has its uses and justification, and many are optional, headers add both size and complexity to every HTTP request. When HTTP headers get big, there is more chance of delay (e.g., the request no longer fits in a single packet), and the same header information is repeated on every request.<br />
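To make the size cost concrete, compare a bare HTTP/0.9-style request with a modest HTTP/1.1 request. The header values below are hypothetical but typical, and this whole block is re-sent for each of a page's 40-odd requests:

```python
# A headerless HTTP/0.9-style request: 17 bytes.
http09_request = "GET /index.html\r\n"

# A modest HTTP/1.1 request with hypothetical but typical headers.
http11_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: ExampleBrowser/1.0 (hypothetical)\r\n"
    "Accept: text/html,application/xhtml+xml\r\n"
    "Accept-Language: en-US,en;q=0.5\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Cookie: session=0123456789abcdef\r\n"
    "\r\n"
)

overhead = len(http11_request) - len(http09_request)
```

Real requests are often much larger (long cookies, many Accept variants), and the per-request overhead multiplies across every resource on the page.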
<h4>
Many More Requests per Web Page</h4>
</div>
<div>
The use of HTTP changed as web expressiveness increased. NCSA Mosaic led the way by supporting embedded images in web pages, using a separate URL and HTTP request for each image. Over time, more page elements have been set up as separate cacheable resources, such as style sheets, JavaScript and fonts. Presently, <a href="https://developers.google.com/speed/articles/web-metrics" target="_blank">the average popular web home page makes over 40 HTTP requests</a>.<br />
<h4>
HTTP is stateless</h4>
<div>
Neither client nor server needs to allocate memory or remember anything from one request/response to the next. This is an important characteristic of the web that allows highly popular web sites to serve many independent clients simultaneously, because the server need not allocate and manage memory for each client. Headers must be repeatedly sent to maintain the stateless nature of the protocol.</div>
<h4>
Congestion and Flow Control</h4>
Flow control in TCP, like traffic metering lights, throttles a sender's output to match the receiver's capacity to read. Using many simultaneous connections does not work well: the streams pass through the same routers and links, but each TCP connection's flow control algorithm cannot take into account the traffic on the other connections. Also, setting up a new connection adds latency, and opening encrypted connections is slower still, since it requires more round trips of communication.</div>
<div>
<h4>
Starting HTTP/2.0</h4>
</div>
While these problems were well-recognized quite a while ago, work on optimizing HTTP labeled "HTTP-NG" (next generation) foundered. But more recent work (and deployment) by Google on a protocol called SPDY shows that, at least in some circumstances, HTTP can be replaced with something that improves page load time. SPDY is already widely deployed, but there is an advantage in making it a standard, at least to get review by those using HTTP for other applications. The IETF working group finishing the HTTP/1.1 second edition ("HTTPbis") has been rechartered to develop HTTP/2.0, which addresses these performance problems. The group decided to start with (a subset of) SPDY and make changes from there.<br />
<br />
HTTP/2.0 builds on HTTP/1.1; for the most part, it is not a reduction of the complexity of HTTP, but rather adds new features primarily for performance.<br />
<h4>
Header Compression</h4>
The obvious way to reduce the size of something is to compress it, and HTTP headers compress well. But the goal is not just to speed transmission; it's also to reduce the time spent parsing headers. The <a href="http://tools.ietf.org/html/draft-ietf-httpbis-header-compression" target="_blank">header compression method</a> is undergoing significant changes.<br />
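Why do headers compress so well? Because from one request to the next they are nearly identical. SPDY exploited this by running zlib across the whole connection, so a repeated header block becomes mostly back-references to the previous one. A sketch with a hypothetical header block (HTTP/2.0's eventual scheme differs, partly for security reasons, but the repetition it exploits is the same):

```python
import zlib

# A hypothetical header block, re-sent on every request of a connection.
headers = (
    b"Host: www.example.com\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n"
    b"Accept: text/html,application/xhtml+xml\r\n"
    b"Cookie: session=0123456789abcdef\r\n"
)

# Compress two identical blocks in one zlib stream: the second block
# is encoded almost entirely as references to the first.
comp = zlib.compressobj()
first = comp.compress(headers) + comp.flush(zlib.Z_SYNC_FLUSH)
second = comp.compress(headers) + comp.flush(zlib.Z_SYNC_FLUSH)
```

The second block compresses to a small fraction of the first, which is where the per-request savings come from.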
<h4>
Connection multiplexing</h4>
One way to ensure coordinated flow control and avoid causing network congestion is to "multiplex" a single connection: rather than opening 40 connections, open only one per destination. A site that serves all of its images, style sheets and JavaScript libraries from the same host can send the data for the page over one connection. The main issue is how to coordinate independent requests and responses, which can be produced or consumed in chunks.<br />
<h4>
Push vs. Pull</h4>
A "push" is when the server sends a response that hadn't been asked for. HTTP semantics are strictly request followed by response, and one of the reasons why HTTP was considered OK to let out through a firewall that filtered out incoming requests. &nbsp;When the server can "push" some content to clients even when the client didn't explicitly request it, it is "server push". &nbsp;Push in HTTP/2.0 uses a promise "A is what you would get if you asked for B", that is, a promise of the result of a potential pull. The HTTP/2.0 semantics are developed in such a way that these "push" requests look like they are responses to requests not made yet, so it is called a "push promise". &nbsp;Making use of this capability requires redesigning the web site and server to make proper use of this capability.<br />
<br />
With this background, I can now talk about some of the ways HTTP/2.0 can go wrong. Coming up!<br />
<div>
<br /></div>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com1tag:blogger.com,1999:blog-418176209892776711.post-20483812436468119102013-09-06T14:07:00.002-07:002013-09-10T17:12:32.798-07:00HTTP meeting in HamburgI was going to do a trip report about the <a href="https://github.com/http2/wg_materials/tree/master/interim-13-08" target="_blank">HTTPbis meeting August 5-7</a> at the <a href="https://www.facebook.com/AdobeHamburg" target="_blank">Adobe Hamburg office</a>, but wound up writing up a longer essay about HTTP/2.0 (which I will post soon, promise.) So, to post the photo:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-7GzAbGwbZso/UionZy16YgI/AAAAAAAABKM/DwJ5psgO8AI/s1600/httpbis-hamburg.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="105" src="http://1.bp.blogspot.com/-7GzAbGwbZso/UionZy16YgI/AAAAAAAABKM/DwJ5psgO8AI/s400/httpbis-hamburg.jpg" width="400" /></a></div>
<br />
It was great to have so many&nbsp;knowledgeable&nbsp;implementors working on live interoperability: 30 people from around the industry and around the world came, including participants from Adobe, Akamai, Canon, Google, Microsoft, Mozilla, Twitter, and many others representing browsers, servers, proxies and other&nbsp;intermediaries.<br />
It's good that development of the standard is being driven by implementation and testing. While testing across the Internet is feasible, meeting face-to-face helped establish coordination on the standard.
<br />
I do have some concerns about things that might go wrong, which I'll also post soon.
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-38946569387397618552013-07-21T08:08:00.001-07:002013-07-21T08:45:39.789-07:00Linking and the Law<a href="https://plus.google.com/115446420930193049836/posts" target="_blank">Ashok Malhotra</a>&nbsp;and I (with help from a few&nbsp;friends) wrote a&nbsp;short blog post&nbsp;&nbsp;"<a href="http://malhotrasahib.blogspot.com/2013/07/linking-and-law.html" target="_blank">Linking and the Law</a>" as a follow-on of the W3C TAG note&nbsp;<a href="http://www.w3.org/TR/publishing-linking/" target="_blank">Publishing and Linking on the Web</a>&nbsp;(which Ashok and I helped with after its original work by Jeni Tennison and Dan Appelquist.)<br />
<br />
Now, we wanted to make this a joint publication, but ... where to host it? Here, Ashok's personal blog, Adobe's, the W3C?<br />
<br />
Well, rather than including the post here (copying the material) and in lieu of real transclusion, I'm linking to Ashok's blog: see "<a href="http://malhotrasahib.blogspot.com/2013/07/linking-and-law.html" target="_blank">Linking and the Law</a>".<br />
<br />
Following this: the problems identified in&nbsp;<a href="http://www.w3.org/2001/tag/doc/governanceFramework-2012-07-19.html" target="_blank">Governance and Web Architecture</a>&nbsp;are visible here:<br />
<ol>
<li>Regulation doesn't match technology</li>
<li>Regulations conflict because of technology mis-match</li>
<li>Jurisdiction is local, the Internet is global</li>
</ol>
<div>
These principles reflect the difficulties ahead for Internet governance. The debates on managing and regulating the Internet are getting more heated. The most serious difficulty for Internet regulation is the risk that the regulation won't actually make sense with the technology (as we're seeing with Do Not Track).<br />
The second most serious problem is that standards for what is or isn't OK to do vary so widely across communities that user-created content cannot reasonably be vetted for general distribution.
<br /></div>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-34388869134818606832013-04-02T07:56:00.001-07:002013-04-02T08:59:17.062-07:00Safe and Secure InternetThe Orlando IETF meeting was sponsored by Comcast/NBC Universal. IETF sponsors get to give a talk on Thursday afternoon of IETF week, and the talk was a panel, <a href="https://www.ietf.org/meeting/86/86-host-speaker.html">"A Safe, Secure, Scalable Internet"</a>.
<p>
What I thought was interesting was the scope of the speakers' definitions of "safe" and "secure", and the mismatch with the technologies and methods being considered. "Safety" included "letting my kids surf the web without coming across pornography or being subject to bullying", while the methods under discussion were things like site blocking by IP address or routing.
<p>
This seems like a complete mismatch. If bullying happens because harassers post nasty pictures on Facebook labeled with the victim's name, that problem cannot be addressed by IP-address blocking. It's looking in the wrong end of the telescope.
<p>
I'm not sure there's a single right answer, but we have to define the question correctly.
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-60343462903934807852013-03-25T07:02:00.000-07:002013-04-02T08:57:25.720-07:00Standardizing JSON<i>Update 4/2/2013: in <a href="http://www.ietf.org/mail-archive/web/json/current/msg00254.html">an email</a> to the <a href="https://www.ietf.org/mailman/listinfo/json">IETF JSON mailing list</a>, Barry Leiba (Applications Area director in IETF) noted that discussions had started with ECMA and ECMA TC 39 to reach agreement on where JSON will be standardized, before continuing with the chartering of an IETF working group.</i>
<p>
JSON (JavaScript Object Notation) is a text representation for data interchange. It is derived from the JavaScript scripting language for representing data structures and arrays. Although derived from JavaScript, it is language-independent, with parsers available for many programming languages.
<p>
JSON is often used for serializing and transmitting structured data over a network connection. It is commonly used to transmit data between a server and web application, serving as an alternative to XML.
</p>
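A two-call round trip shows the serialization role; the record below is an invented example, and language independence means a JavaScript, Java, or C client could parse the same text:

```python
import json

# A structured record, serialized for transmission and parsed back.
record = {"id": 42, "tags": ["alpha", "beta"], "active": True}

wire = json.dumps(record)   # text form sent over the network
parsed = json.loads(wire)   # reconstructed on the receiving side
```

The wire form is plain text, which is much of JSON's appeal: it is human-readable, diff-able, and trivially embeddable in other protocols.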
<p>
JSON was originally specified by Doug Crockford in <a href="http://tools.ietf.org/html/rfc4627">RFC 4627</a>, an "Informational" RFC. &nbsp;IETF specifications known as RFCs come in lots of flavors: an "Informational" RFC isn't a standard that has gone through careful review, while a "standards track" RFC is.</p>
<p>
An increasing number of other IETF documents want to specify a reference to JSON, and the IETF rules generally require references to other documents that are the same or higher levels of stability. For this reason and a few others, the IETF is starting a <a href="http://trac.tools.ietf.org/wg/appsawg/trac/wiki/JSON">JSON working group</a> (<a href="https://www.ietf.org/mailman/listinfo/json">mailing list</a>) to update RFC 4627.</p>
<p>
The JavaScript language itself is standardized by a different committee (<a href="http://www.ecma-international.org/memento/TC39.htm">TC-39</a>) in a different standards organization (<a href="http://www.ecma-international.org/memento/index.html">ECMA</a>). &nbsp;For various reasons, the <b>standard </b>is called "ECMAScript" rather than JavaScript. &nbsp;TC-39 published ECMAScript 5.1, and is working on ECMAScript 6, with a plan to be done in the same time frame as the IETF work.</p>
<p>
The <a href="http://www.w3.org/">W3C&nbsp;</a> also is developing standards that use JSON and need a stable specification.</p>
<h4>
Risk of divergence</h4>
<p>
Unfortunately, there is a possibility of (minor) divergence between the two specifications without coordination, either formally (organizational liaison) or informally, e.g., by making sure there are participants who work in both committees.</p>
<p>
There is a formal liaison between IETF and W3C. There is <strike>currently no</strike> <i>also</i> a <a href="http://www.w3.org/2001/11/StdLiaison">formal liaison between W3C</a> and ECMA (and a mailing list, <a href="http://lists.w3.org/Archives/Public/public-script-coord/">public-script-coord@w3.org</a> ). There is no formal liaison between TC39/ECMA and IETF.</p>
<p>
Having multiple conflicting specifications for JSON would be bad. While some want to avoid the overhead of a formal liaison, there needs to be explicit assignment of responsibility. I'm in favor of a formal liaison as well as informal coordination. I think it makes sense for IETF to specify the "normative" definition of JSON, while ECMA TC-39's ECMAScript 6.0 and W3C specs should all point to the new IETF spec.</p>
<h3>
JSON vs. XML</h3>
<p>
JSON is often considered as an alternative to XML as a way of passing language-independent data structures as part of network protocols.</p>
<p>
In the IETF, <a href="http://tools.ietf.org/html/bcp70">BCP 70</a>&nbsp;(also known as <a href="http://tools.ietf.org/html/rfc3470">RFC 3470</a>)&nbsp;<i>"Guidelines for the Use of Extensible Markup Language (XML) within IETF Protocols"</i> gives guidelines for use of XML in network protocols. However, this was published in 2003. (I was a co-author, with Marshall Rose and Scott Hollenbeck.)</p>
<p>
But of course these guidelines don't answer the question many have: when people want to pass data structures between applications in network protocols, should they use XML or JSON, and when? What is the rough consensus of the community? Is it a free choice? What are the alternatives and considerations (fashion? deployment? expressiveness? extensibility?)&nbsp;</p>
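One concrete axis of comparison is verbosity. A small sketch of the same record in both serializations; the XML element names here are invented for illustration (there is no single canonical XML encoding of a data structure, which is itself part of the trade-off), and terseness is of course only one of the considerations listed above:

```python
import json
import xml.etree.ElementTree as ET

person = {"name": "Ada", "languages": ["en", "fr"]}

# JSON form: maps directly onto the data structure.
as_json = json.dumps(person)

# One possible XML encoding of the same record.
root = ET.Element("person")
ET.SubElement(root, "name").text = "Ada"
langs = ET.SubElement(root, "languages")
for code in person["languages"]:
    ET.SubElement(langs, "language").text = code
as_xml = ET.tostring(root, encoding="unicode")
```

JSON wins on size and on mapping directly to common data structures; XML offers attributes, namespaces, mixed content and schema languages that JSON lacks. Guidelines would help protocol designers weigh these.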
<p>
This is a critical bit of web architecture that needs attention. The community needs guidelines for understanding the competing benefits and costs of XML vs. JSON. &nbsp;If there's interest, I'd like to see an update to BCP 70 which covers JSON as well as XML.</p>Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com9tag:blogger.com,1999:blog-418176209892776711.post-21651988225518110462012-12-30T17:50:00.000-08:002012-12-30T17:50:50.885-08:00Reinventing the W3C TAG<blockquote>
This is the fourth in a series of blog posts about my personal priorities for Web standards and the W3C TAG, as part of the ongoing TAG election.</blockquote>
<p> The <a href="http://www.w3.org/2004/10/27-tag-charter.html#Mission">Mission of the W3C TAG</a> has three aspects:</p>
<ol>
<li> to document and build consensus around principles of Web architecture and to interpret and clarify these principles when necessary;</li>
<li> to resolve issues involving general Web architecture brought to the TAG; and</li>
<li>
to help coordinate cross-technology architecture developments inside and outside W3C.</li>
</ol>
<p>Success has been elusive:</p>
<ol>
<li> After the publication of <a href="http://www.w3.org/TR/webarch/">Architecture of the World Wide Web</a> in 2004, attempts to update it, extend it, or even clarify it have foundered.</li>
<li>Issues involving general Web architecture are rarely brought to the TAG, either by Working Group chairs, W3C staff, or the W3C Director, and those issues that have been raised have rarely been dealt with promptly or decisively.</li>
<li>The TAG's efforts in coordinating cross-technology architectural developments within W3C (XHTML/HTML and RDFa/Microdata) have had mixed results. Coordinating cross-technology architecture developments outside W3C would require far more architectural liaison, primarily with <a href="http://masinter.blogspot.com/2012/12/w3c-and-ietf-coordination.html">IETF's Internet Architecture Board</a> but also with ECMAScript TC39.</li>
</ol>
<h4>Building consensus around principles of Web architecture</h4>
<p> I have long argued that the TAG practice of issuing <a href="http://www.w3.org/2001/tag/findings">Findings</a> is
not within the TAG charter, and does not build consensus. In the W3C,
the issuing of a Recommendation is the stamp of consensus. There may
be a few cases where the TAG is so far in advance of the
community that achieving sufficient consensus for Recommendation is
impossible, but those cases should be extremely rare.</p>
<ul>
<li><b>Recommendation</b>: Review <a href="http://www.w3.org/2001/tag/findings">TAG Findings</a> and triage; either (a) update and bring the Finding to Recommendation, (b) obsolete and withdraw, or (c) hand off to a working group or task force.</li>
</ul>
<p>To build consensus, the TAG's technical focus should match more closely the interest of the Web community.</p>
<ul>
<li><b>Recommendation:</b> Encourage and elect new TAG members with proven leadership skills as well as interest and experience in the architectural topics of most interest to W3C members.</li>
<li><b>Recommendation:</b> The TAG should focus its efforts on the "Web of Applications" at the expense of shedding work on the semantic web and pushing <a href="http://www.w3.org/2001/tag/group/track/issues/57">ISSUE-57</a> and related topics to a working group or task force.</li>
</ul>
<p> Updating AWWW to cover Web applications, Web security and other architectural components of the modern Web is a massive task, and those most qualified to document the architecture are also likely to be inhibited by the overhead and legacy of the TAG.</p>
<ul>
<li><b>Recommendation:</b> Charter a task force or working group to update AWWW.</li>
</ul>
<h4>Resolving issues involving general Web architecture brought to the TAG</h4>
<p>To resolve an issue requires addressing it quickly, decisively, and in a way that is accepted by the parties involved. The infamous ISSUE-57 has been unresolved for over five years. The community has, for the most part, moved on. </p>
<ul>
<li><b>Recommendation:</b> encourage Working Group chairs and staff to bring current architectural issues to the TAG.
</li>
<li><b>Recommendation:</b> drop issues which have not been resolved within a year of being raised.</li>
</ul>
<h4>Coordinate cross-technology architectural developments inside and outside W3C</h4>
<p>Within W3C, one contentious set of issues involve differing perspectives on the role of standards.</p>
<ul>
<li><b>Recommendation:</b> The TAG should define the W3C's perspective on the <a href="http://masinter.blogspot.com/2011/06/irreconcilable-differences.html">Irreconcilable Differences</a> I've identified as disagreements on the role of standards.</li>
</ul>
<p>For coordination with standards outside of W3C:</p>
<ul>
<li><b>Recommendation:</b> The TAG should
meet at least annually with the IETF IAB, review their documents, and ask the IAB to review relevant TAG documents. The TAG should periodically review the status of liaison with other standards groups, most notably ECMA TC39.</li>
</ul>
<hr />
<h4>On the current TAG election</h4>
<p>An influx of new enthusiastic voices to the TAG may well help bring the TAG to more productivity than it's had in the past years, so I am reluctant to discourage those who have newly volunteered to participate, even though their prior interaction with the TAG has been minimal or (in most cases) non-existent. I agree the TAG needs reform, but the platforms offered have not specifically addressed the roadblocks to the TAG accomplishing its Mission.</p>
<p>In these blog posts, I've offered some insights into my personal perspectives and priorities, and recommended concrete steps the TAG could take.</p>
<p>If you're participating in W3C:
<ul>
<li>Review carefully the current output and priorities of the TAG and give feedback.</li>
<li>When voting, consider the record of leadership and thinking, as well as expertise and platform.</li>
<li>Hold elected TAG members accountable for campaign promises made, and their commitment to participate fully in the TAG. </li>
</ul>
<p> Being on the TAG is an honor and a responsibility I take seriously. Good luck to all. </p>Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-27180472396668010062012-12-29T11:26:00.000-08:002012-12-29T11:26:11.429-08:00W3C and IETF coordination<blockquote>This is the third of a series of posts about my personal priorities for Web standards, and the relationship to the W3C TAG.</blockquote>
<h4>Internet Applications = Web Applications</h4>
<p>For better or worse, the Web is becoming <i>the</i> universal Internet application platform. Traditionally, the Web was considered just one of many Internet applications. But the rise of Web applications and the enhancements of the Web platform to accommodate them (HyBi, RTCWeb, SysApps) have further blurred the line between Web and non-Web.</p>
<p>Correspondingly, the line between IETF and W3C, always somewhat fuzzy, has further blurred, and made difficult the assignment of responsibility for developing standards, interoperability testing, performance measurement and other aspects.</p>
<p>Unfortunately, while there is some cooperation in a few areas, coordination over application standards between IETF and W3C is poor, even for the standards that are central to the existing web: HTTP, URL/URI/IRI, MIME, encodings.</p>
<h4>W3C TAG and IETF coordination</h4>
<p>One of the primary aspects of the <a href="http://www.w3.org/2004/10/27-tag-charter.html#Mission">TAG mission</a> is to coordinate with other standards organizations at an architectural level. In practice, the few efforts the TAG has made have been only narrowly successful.</p>
<p>An overall framework for how the Web is becoming a universal Internet application platform is missing from AWWW. The outline of architectural topics the TAG did generate was a bit of a mish-mash, and then was not followed up.</p>
<p>The current TAG document <a href="http://www.w3.org/TR/fragid-best-practices/">Best Practices for Fragment Identifiers and Media Type Definitions</a> is narrow; its first public working draft came too late to affect the primary IETF document that should have referenced it, and it is unlikely to be read by those to whom it is directed.</p>
<p>There cannot be a separate "architecture of the Internet" and "architecture of the Web". The TAG <i>should</i> be coordinating more closely with the IETF <a href="http://www.iab.org">Internet Architecture Board</a> and <a href="http://trac.tools.ietf.org/area/app/trac/wiki/ApplicationsAreaDirectorate">applications area directorate</a>.</p>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com7tag:blogger.com,1999:blog-418176209892776711.post-63860476849420755682012-12-29T11:00:00.000-08:002012-12-29T11:00:40.863-08:00Web Standards and Security<blockquote>This is the second in a series of posts about my personal priorities for the W3C Technical Architecture Group.</blockquote>
<p>Computer security is a complex topic, and it is easy to get lost in detailed accounts of threats and counter-measures; it is hard to get to general architectural principles. Fundamentally, computer security can be thought of as an arms race: new threats are continually being invented, and counter-measures eventually come along to meet them. In the battle between attack and defense of Internet and Web systems, my fear is that the "bad guys" (those who threaten the value of the shared Internet and Web) are winning. My reasoning is simple: as the Internet and the Web become more central to society, the value of attacks on Internet infrastructure and users increases, attracting <a href="http://www.economist.com/news/international/21567886-america-leading-way-developing-doctrines-cyber-warfare-other-countries-may">organized crime and threats of cyber-warfare</a>.</p>
<p>Further, most reasoning about computer security is "anti-architectural":&nbsp; the exploits of security threats cut across the traditional means of architecting scalable systems&mdash;modularity, layering, information hiding. In the Web, many security threats depend on unanticipated information flows through the layer boundaries. (Consider the recently discovered "<a href="http://zoompf.com/2012/09/explaining-the-crime-weakness-in-spdy-and-ssl">CRIME</a>" exploit.) <a href="http://www.us-cert.gov/reading_room/malware-threats-mitigation.pdf">Traditional computer security analysis</a> consists of analyzing the <a href="http://en.wikipedia.org/wiki/Attack_surface">attack surface</a> of a system to discover the security threats and provide for mitigation of those threats.</p>
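<p>As a concrete, simplified illustration of how compression cuts across layer boundaries, here is a sketch of the CRIME-style leak in Python. The cookie value and attacker guesses are hypothetical, and <code>zlib</code> stands in for the transport-layer compression that SPDY/TLS used:</p>

```python
import zlib

# A secret the transport layer compresses together with
# attacker-controlled input (as SPDY/TLS compression did).
SECRET_COOKIE = b"sessionid=S3CR3TT0KEN"

def compressed_size(attacker_guess: bytes) -> int:
    # The attacker cannot read the secret, but can observe the
    # size of the compressed stream that mixes secret and guess.
    stream = SECRET_COOKIE + b"\n" + attacker_guess
    return len(zlib.compress(stream))

# A guess that duplicates the secret compresses into a back-reference,
# so the observable size shrinks: information leaks through the sizes.
right = compressed_size(b"sessionid=S3CR3TT0KEN")
wrong = compressed_size(b"sessionid=QW8ZKPLM4RX")
print(right < wrong)
```

<p>The attacker never sees the secret, only sizes; repeating such probes character by character can recover the cookie, even though encryption and compression each look safe in isolation.</p>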
<h4>New Features Mean New Threats</h4>
<p>Much of the standards community is focused on inventing and standardizing new features. Because security threats are often based on unanticipated consequences of minor details of the use of new features, security analysis cannot easily be completed early in the development process. As new features are added to the Web platform, more ways to attack the web are created. Although the focus of the computer security community is not on standards, we cannot continue to add new features to the Web platform without sufficient regard to security, or to treat security as an implementation issue.</p>
<h4>Governance and Security</h4>
<p> In many ways, every area of <a href="http://masinter.blogspot.com/2012/12/governance-and-web-standards.html">governance</a> is also an area where violation of the governance objectives has increasing value to an attacker. Even without the addition of new features, deployment of existing features in new social and economic applications grows the attack surface. While traditional security analysis was primarily focused on access control, the growth of social networking and novel features increases the ways in which the Web can be misused.</p>
<h4>The W3C TAG and Security</h4>
<p>The original architecture of the Web did not account for security, and the W3C TAG has so far had insufficient expertise and energy to focus on security. While individual security issues may be best addressed in working groups or outside the W3C, the architecture of the Web also needs a security architecture, which gives a better model for trust, authentication, certificates, confidentiality, and other security properties.</p>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com3tag:blogger.com,1999:blog-418176209892776711.post-37739655476564234182012-12-29T10:23:00.000-08:002012-12-29T10:23:08.739-08:00Governance and Web Standards<blockquote>
I promised I would write more about my personal priorities for W3C and the W3C TAG in a series of posts. This is the first. Please note that, as usual, these are my personal opinions. Comments, discussion, disagreements welcome.</blockquote>
<p> A large and growing percentage of the world depends on the Internet as a critical shared resource for commerce, communication, and community. The primary <i>value</i> of the Internet is that it is common: there is <i>one</i> Internet, <i>one</i> Web, and everyone on the planet can communicate with everyone else. But whenever there is a shared resource, opportunities for conflict arise—different individuals, groups, companies, nations, want different things and act in ways that threaten this primary value. There are endless <a href="http://groups.csail.mit.edu/ana/Publications/PubPDFs/Tussle%20in%20Cyberspace%20Defining%20Tomorrows%20Internet%202005's%20Internet.pdf">tussles in cyberspace</a>, including conflicts over economics, social policy, technology, and intellectual property. While some of the conflicts are related to "whose technology wins," many are related to social policy: e.g., whether Internet use can be anonymous or private, whether it should promote, allow, or censor prohibited speech, and whether it should protect or permit the use of copyrighted material.</p>
<p> Shared resources in conflict, left unregulated, are ultimately unsustainable. The choices for sustainability are between voluntary community action and enforced government action; if community action fails, governments may step in, but government action is often slow to adapt to change.</p>
<p> As <a href="http://www.reuters.com/article/2012/12/17/net-us-telecoms-treaty-fail-idUSBRE8BG00K20121217">the recent kerfuffle over ITU vs. "multi-stakeholder" governance of the Internet</a> shows, increased Internet regulation is looming. If the Internet community does not govern itself or provide modes of governance, varying national regulations will be imposed, which will threaten the economic and social value of a common Internet. Resolving conflict between the stakeholders will require direct attention and dedicated resources.</p>
<h4>Governance and W3C</h4>
<p> Standards and community organizations are a logical venue for addressing most Internet governance conflicts. This is primarily because "<a href="http://harvardmagazine.com/2000/01/code-is-law-html">code is law</a>": the technical functioning of the Internet determines how governance can work, and separating governance from technology is usually impossible. Further, the community that gathers at IETF and W3C (whether members or not) is the most affected.</p>
<p> I think W3C needs increased effort and collaboration with ISOC and others to bring "governance" and "Web architecture for governance" to the forefront.</p>
<h4> Governance and the W3C TAG</h4>
<p> The recent TAG first public working draft, "<a href="http://www.w3.org/TR/publishing-linking/">Publishing and Linking on the Web</a>," is an initial foray of the W3C TAG in this space. While some may argue that this work exceeds the charter of the TAG, I think it's valuable work that currently has no other venue, and it should continue in the TAG.</p>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com4tag:blogger.com,1999:blog-418176209892776711.post-24983658886247847422012-12-13T15:47:00.000-08:002012-12-16T09:05:13.895-08:00I Invented the W3C TAG :)As a few of you know, W3C TAG elections are upon us. While this is usually a pretty boring event, this year it's been livened by electioneering. &nbsp;I don't have a long platform document prepared ("stand on my record"), but I'll write some things about where I think web standards need to go.... But first a bit of history:<br />
<br />
<b>I invented the W3C TAG.</b> At least more than Al Gore invented the Internet. I was Xerox's AC representative when I started on the W3C Advisory Board, and in 2000 Steve Zilles and I edited <a href="https://www.w3.org/2000/11/TAG-charter-V5">the initial TAG charter</a>. I think a lot of the details (size, scope, term limits, election method) were arrived at fairly arbitrarily, based on the judgment of a group speculating about the long-term needs of the community. My priorities: a focus on architecture, not design; stability as well as progress; responsibility to the community; a role in dispute resolution. The TAG has <i>no power:</i> it's a leadership responsibility; there is no authority.<br />
<br />
And the main concern then, as now, is finding qualified volunteers who can actually put in the work needed to get "leadership" done.<br />
<br />
In a few future blog posts I'll outline what I think some of the problems for the Web, W3C, and the TAG might be. <a href="http://lists.w3.org/Archives/Public/www-tag/2012Nov/0043.html">I'll write more on</a>:<br />
<br />
1. <a href="http://www.w3.org/2001/tag/doc/governanceFramework.html">Governance</a>. Architectural impact of legislative, regulatory requirements.<br />
2. Security. In the arms race, the bad guys are winning.<br />
3. Coordination with other standards activities (mainly IETF Applications area),&nbsp;fuzziness&nbsp;of the boundary of the "web".<br />
<br />
Questions? Please ask (here, twitter, www-tag@w3.org)<br />
<div>
<h4>
Update 12/16/2012 ... I didn't invent the TAG alone&nbsp;</h4>
Doing a little more research:<br />
<br />
It's easy to find earlier <a href="http://www.w3.org/DesignIssues/Architecture">writings</a> and <a href="https://www.w3.org/Member/1998/11/Talks/tbl-1/">talks</a> about Web Architecture. At the <a href="https://www.w3.org/2000/05/AC-23-minutes.html#23AM-Communication">May 2000 W3C advisory committee meeting</a>, I was part of the discussion of whether Architecture needed a special kind of group or could be handled by an ordinary working group. I think the main concern was long-term maintenance.<br />
By the 6/9/2000 Advisory Board meeting, the notion of an "Architecture Board" was part of the discussion. An initial charter was sent to the Advisory Board by Jean-Francois Abramatic on 8/11/2000, 6:02 AM PST.<br />
<div class="MsoPlainText">
Steve Zilles sent a second proposed charter (forwarded to the AB 8/14/2000, 08:35 PST) with a cover note:</div>
<blockquote class="tr_bq">
<div class="MsoPlainText">
The attached draft charter is modelled on the structure of the Hypertext CG charter. This was done for completeness. Much of the content is based on notes that I took during the discussion with Larry Masinter refered to above, but the words are all mine. The Background section is my creation. The mission is based on our joint notes. The Scope is mostly my creation, but, I belive consistent with our discussion. The Participants section has most of what we discussed. I tried to capture the intent of what Jean-Francios wrote, but I did not borrow any of the words because I was using a different outline. My apologies if I failed in that respect.</div>
</blockquote>
While I contributed to the definition of the TAG and many of the ideas in the TAG charter, others get "invention" credit as well.<br />
<h4>
An Architecture Working Group...&nbsp;</h4>
Reading the discussions about the TAG made me wonder if it's time to reconsider an "architecture working group" whose <b><i>sole</i></b> responsibility is to develop AWWW2. There's a lot of enthusiasm for an AWWW2; can we capture the energy without politicizing it? Given the TAG's poor history of maintaining AWWW, perhaps the work should move to a more focused group (with TAG participation encouraged).<br />
<br />
</div>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-59785231655571300972012-05-20T23:19:00.001-07:002012-05-20T23:19:18.955-07:00Are homepages on the way out?<div class="sA Hn">
<div class="jn gu">
Is the idea of a home page on the way out? I've had a "home page" since at least 1996, but I wonder if the convention is declining. With Facebook, LinkedIn, and the like, there are now many places to look for "identity". A home page is really just a social convention: a page that stands for a person or organization, and that you might sign an email with.</div>
<div class="jn gu">
&nbsp;</div>
<div class="jn gu">
When I sign my email I use http://larry.masinter.net alone: why include a larger signature block when one URL sums it all up? But I'm doing even that less and less; people can find me with a search.</div>
<div class="jn gu">
&nbsp;</div>
<div class="jn gu">
But I wonder -- is the notion of a "home page" underlying the semantic web's use of a URL to stand for some thing, person or group in the real world?</div>
<br /><div class="jn gu">
For example, you might say that there is a correlation, for a URL U and a thing X, between:</div>
<div class="jn gu">
&nbsp;</div>
<div class="jn gu">
* how well the page at U serves as a "home page" for X</div>
<div class="jn gu">
* how appropriate U is as a URI for the concept X in RDF</div>
<div class="jn gu">
&nbsp;</div>
</div>
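The two bullets above can be sketched with a few RDF-style triples, written here as plain Python tuples. The URIs and the <code>#me</code> fragment convention are illustrative (the fragment URI is the common FOAF workaround for naming a person rather than a document):

```python
# Two different roles for the same URL: naming a document vs. a person.
FOAF = "http://xmlns.com/foaf/0.1/"

homepage = "http://larry.masinter.net"      # the page (a document)
person = "http://larry.masinter.net/#me"    # a URI coined for the person

triples = [
    (person, FOAF + "name", "Larry Masinter"),
    (person, FOAF + "homepage", homepage),  # the page is *about* him
]

# Asserting facts about `homepage` itself would describe the document,
# not the person: exactly the ambiguity discussed above.
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

The better the page serves as a home page for the person, the more tempting it is to use its URL as a name for the person, and the more the two readings blur.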
&nbsp;(I talked about this on Google+, but blog is better)<br />
<br />Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com1tag:blogger.com,1999:blog-418176209892776711.post-87888280404674584272011-12-14T23:30:00.001-08:002012-12-16T12:07:19.547-08:00HTTP Status Cat: 418 - I'm a teapot<div xmlns="http://www.w3.org/1999/xhtml">
<div style="font-size: 0.8em; line-height: 1.6em; margin: 0 0 10px 0; padding: 0;">
<a href="http://www.flickr.com/photos/girliemac/6508102407/" title="418 - I'm a teapot"><img alt="418 - I'm a teapot by GirlieMac" height="300" src="http://farm8.staticflickr.com/7006/6508102407_a4de65687b.jpg" width="375" /></a><br />
<span style="margin: 0;"><a href="http://www.flickr.com/photos/girliemac/6508102407/">418 - I'm a teapot</a>, a photo by <a href="http://www.flickr.com/photos/girliemac/">GirlieMac</a> on Flickr.</span></div>
In the <a href="http://www.w3.org/2001/tag/">W3C TAG</a>, I'm working on bringing together a set of threads around the evolution of the web, the <a href="http://larry.masinter.net/1112-extensions-registries-mime.html">use of registries and extension points, and MIME in web standards</a>.<br />
<br />
A delightful collection of <a href="http://www.flickr.com/photos/girliemac/sets/72157628409467125/">HTTP Status Cats</a> includes the cat-in-teapot above; status code 418 comes from <a href="http://tools.ietf.org/html/rfc2324">HTCPCP, "The HyperText Coffee Pot Control Protocol" [RFC 2324]</a>.<br />
<br />
Each April 1st, the IETF also publishes humorous specifications (as "Informational" documents), perhaps to make the point that "not all RFCs are standards", but also to provide fodder for technical debates.<br />
The target of HTCPCP was the wave of proposals we were seeing in the HTTP working group (which I had chaired) for extensions to HTTP supporting what seemed to me to be cockeyed, inappropriate applications.<br />
<br />
I set out in RFC 2324 to misuse as many of the HTTP extensibility points as I could.<br />
<br />
But one of the issues facing registries of codes, values, identifiers is what to do with submissions that are not "serious". Should 418 be in the <a href="http://www.iana.org/assignments/http-status-codes">IANA registry of HTTP status codes</a>? Should the many (not actually valid) URI schemes in it (coffee: in 12 languages) be listed as <a href="http://www.iana.org/assignments/uri-schemes.html">registered URI schemes</a>? </div>
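Whatever the registry decides, deployed clients still encounter codes they don't recognize. HTTP's rule is that an unrecognized status code is treated as the x00 code of its class; a minimal sketch, using a tiny illustrative subset of the IANA registry:

```python
# A tiny subset of the IANA HTTP status code registry (illustrative;
# the real registry is much larger).
REGISTERED = {200: "OK", 400: "Bad Request", 404: "Not Found",
              500: "Internal Server Error"}

def effective_semantics(code: int) -> str:
    # Per HTTP, a client treats an unrecognized status code as the
    # x00 code of its class (an unknown 4xx behaves like 400).
    if code in REGISTERED:
        return f"{code} {REGISTERED[code]}"
    fallback = (code // 100) * 100
    return f"{code} (unrecognized; treat as {fallback} {REGISTERED[fallback]})"

print(effective_semantics(418))  # falls back to its class if unregistered
```

So a client that has never heard of 418 degrades gracefully; the registry question is about shared vocabulary and documentation, not about whether unregistered codes break the protocol.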
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-50574632809932093682011-08-15T00:00:00.000-07:002011-08-15T00:00:42.803-07:00Expert System Scalability and the Semantic WebIn the late 80s, we saw the fall of AI and Expert Systems as a "hot" technology -- the "AI winter". The methodology, in brief: build a representation system (a way of talking about facts about the world) and an inference engine (a way of making logical inferences from a set of facts). Get experts to tell you facts about the world. Grind the inference engine, and get new facts. Voila!<br />
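That methodology (a fact store plus an inference engine run until no new facts appear) can be sketched as a toy forward-chaining loop, with hypothetical facts and a single hand-written rule:

```python
# A toy forward-chaining inference engine in the spirit described:
# facts are tuples; a rule derives new facts from matched ones.
facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

def rule_grandparent(fs):
    new = set()
    for (r1, a, b) in fs:
        for (r2, c, d) in fs:
            if r1 == r2 == "parent" and b == c:
                new.add(("grandparent", a, d))
    return new

# "Grind the inference engine": apply rules until a fixpoint.
while True:
    derived = rule_grandparent(facts) - facts
    if not derived:
        break
    facts |= derived

print(("grandparent", "alice", "carol") in facts)
```

The sketch works only because everyone agrees what "parent" means; the scalability argument below is that this agreement erodes as the knowledge base and its contributors grow.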
<br />
I always felt that the problem with the methodology was the failure of model theory to scale: the more people and time involved in developing the "facts" about the world, the more likely it is that the terminology in the representation system would fuzz -- that different people involved in entering and maintaining the "knowledge base" would disagree about what the terms in the representation system stood for.<br />
<br />
The "semantic web" chose to use URIs as the terminology for grounding abstract assertions and creating a model where those assertions were presumed to be about the real world.<br />
<br />
This exacerbates the scalability problem. URIs are intrinsically ambiguous and were not designed to be precise denotation terms. The semantic web terminology of "definition" and "assignment" of URIs reflects a point of view I fundamentally disagree with.&nbsp; URIs don't "denote". People may use them to denote, but it is a communication act; the fact that I say by "http://larry.masinter.net" I mean *me* does not imbue that URI with any intrinsic semantics.<br />
<br />
I've been trying to get at these issues around ambiguity with the "duri" and "tdb" URI schemes, for example, but I think the fundamental issue still simmers.Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com3tag:blogger.com,1999:blog-418176209892776711.post-42484760682197532742011-08-07T09:05:00.001-07:002011-08-07T09:05:15.272-07:00Internet Privacy: TELLING a friend may mean telling THE ENEMY<div style="margin: 0 0 10px 0; padding: 0; font-size: 0.8em; line-height: 1.6em;"><a href="http://www.flickr.com/photos/lar4ry/6015607760/" title="In the Quebec maritime museum"><img src="http://farm7.static.flickr.com/6132/6015607760_8da376c46f.jpg" alt="In the Quebec maritime museum by Lar4ry" /></a><br/><span style="margin: 0;"><a href="http://www.flickr.com/photos/lar4ry/6015607760/">In the Quebec maritime museum</a>, a photo by <a href="http://www.flickr.com/photos/lar4ry/">Lar4ry</a> on Flickr.</span></div><p>After the recent IETF in Quebec, I found this poster in a maritime museum.<br /><br />The problem with most Internet privacy initiatives is that they don't seem to start with a threat analysis: who are your friends (those with web sites you want to visit), who are your enemies (those who would use your personal information for purposes you don't want), and how do you tell things to friends without those things getting into the hands of your enemies? It's counter-intuitive to have to treat your friends as if they're a channel to your enemies, but ... 
information leaks.<br /><br /><i>Via Flickr:</i><br />TELLING a friend may mean telling THE ENEMY</p>Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com2tag:blogger.com,1999:blog-418176209892776711.post-72373684425148625652011-07-16T09:17:00.000-07:002011-07-16T09:17:51.876-07:00Leadership: getting others to followOften people talk about something "leading" as whether it is newer, faster, better, more exciting, having more new features, etc.<div><br />
</div><div>But fundamentally, leadership only occurs if others follow: a leading product is imitated by its competitors and has a following of customers; a leading standard is widely implemented.</div><div><br />
</div><div>Where does leadership come from? Can it come from a committee? Not really .... in the end, invention and leadership come from individuals.&nbsp;</div><div><br />
</div><div>In the area of standards and technology, leadership and innovation come from individuals and groups: they make proposals, get feedback, adoption, and agreement, and then get others to follow. A working group, committee, or mailing list can only review, suggest improvements, and push back on alternatives.</div><div><br />
</div><div>It is foolish to desire that "leadership" in a technology area will only come from one segment, one group, one committee... and impossible to mandate, even if it were desirable.</div><div><br />
</div><div>Industry prospers when those who innovate find ways to get others to follow. The web needs innovation from outside the standards organizations; those innovations then can be brought in, reviewed, updated, modified to meet additional requirements discovered or added during the review and "standardization" process.</div>Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0tag:blogger.com,1999:blog-418176209892776711.post-42763994189436986572011-06-26T11:35:00.000-07:002012-12-30T23:56:39.465-08:00Irreconcilable differences<p class='tags'><a href='http://www.technorati.com/tag/html' rel='tag'>html</a>,<a href='http://www.technorati.com/tag/html5' rel='tag'>html5</a>,<a href='http://www.technorati.com/tag/w3c' rel='tag'>w3c</a>,<a href='http://www.technorati.com/tag/standards' rel='tag'>standards</a>,<a href='http://www.technorati.com/tag/open source' rel='tag'>open source</a></p>
<blockquote><p>I've been meaning to post some version of this forever, and it's been getting in the way of me blogging more. So ... here goes... incomplete and warty as this post is.</p>
<p>I've come to think that many of these differences might be from a "implementation" vs. "specification" view, but I'll have to say more about that later....</p>
</blockquote>
<p>The ongoing battle for future control over HTML is dominated not only by the usual forces ("whose technology wins?") but also by some very polarized views of what standards are, what they should be, and how standards should work. The debate over these principles has really slowed down the development of web standards. Many of these issues <a href='http://www.w3.org/2010/11/TPAC/HTMLnext-perspectives.pdf'> were also presented</a> at the November 2010 W3C Technical Plenary in the "HTML Next" session. </p>
<p>I've written down some of these polarized viewpoints, as an extreme position and a counterposition.</p>
<p><strong>Matching Reality:</strong></p>
<ul>
<li><font color='#cc0000'>Standards should be written to "match
reality": the standard should follow what (some, all, most, the
important, the open source) systems have implemented (or are willing
to implement in the very near future.)</font></li>
<li><em><font color='#0000FF'>Standards should try to "lead
reality": The standard should try to move things in directions that
improve modularity, reliability, and other values</font>. </em></li>
</ul>
<p><em>Of course, having standards that do not "match reality" in the long
run is not a good situation, but the question is whether backward
compatibility with (admittedly buggy) implementations should dominate
the discussion of "where standards should go". If new standards
always match the "reality" of existing content and systems, then you
could never add any features at all. But if you're willing to add new
features, why not also try to 'fix' things that are misimplemented or
done badly? There does need to be a transition plan (how to make
changes in a way that doesn't break existing content or viewers), but
that's often feasible. </em></p>
<p><strong>Precision:</strong></p>
<ul>
<li><font color='#cc0000'>Standards should precisely specify
behavior, and give sufficient details for how to implement something
"compatible" with the what is currently deployed, sufficiently that
no user will complain that some implementation doesn't work "the
same". Such behavior MUST be <strong>mandated</strong> by the
standard.</font></li>
<li><em><font color='#0000FF'>Standards should minimize the
compliance requirements to allow widest possible range of
implementations; "interoperability" doesn't necessarily mean that
even badly written web pages must be supported. Conformance
("MUST") should be used very sparingly.</font></em></li>
</ul>
<p><em>Personally, I'm more on the "blue" side: the more precisely
behavior is specified, the narrower the applicability of the
standard. There's a tradeoff, but it seems better to err on the side
of under- rather than over-specifying, if a standard is going to have
a long-term value. If a subset of implementations want a more precise
guideline, doing so could be in a separate implementation guide or
profile.</em></p>
<p><strong>Leading:</strong></p>
<ul>
<li><font color='#cc0000'>Standards should lead the community and
add exciting new features. New features should ideally appear first
in the standard.</font></li>
<li><em><font color='#0000FF'>Standards should follow innovative
practice only after wide experience with technology. Sample
implementations should be widely reviewed and tested; only after
wide experience with technology should it be added to the
standard.</font></em></li>
</ul>
<p><em>In general standards should <strong>follow</strong>
innovation. Refinements during the standardization phase might be
seen as "leading", in order to satisfy the broader requirements
brought to bear as the standard gets reviewed. There's a compromise,
but looking for innovation from a committee.... well, we all know
about "design by committee".</em></p>
<p><strong>Extensibility:</strong></p>
<ul>
<li><font color='#cc0000'>Non-standard extensions should be
avoided. Ideally, we should eliminate any non-standard extensions;
everyone's experience should be the same.</font></li>
<li><em><font color='#0000FF'>Non-standard extensions are
valuable. Innovations have (and will continue to) come from
competing (non-standard) extensions, including plugins. Not all
plugins are universally deployed; sites can choose to use
non-standard extensions if they want.</font></em></li>
</ul>
<p><em>In the past, plugins and other non-standard extensions have
fueled new features; why should this trend stop? There are
trade-offs, but moves to eliminate non-standard extensions or make
them less viable are counter-productive.</em></p>
<p><strong>Modularity:</strong></p>
<ul>
<li><font color='#CC0000'>Modularity is disruptive. Independent
evolution of components leads to divergence and
confusion. Independent committees go their own way. Subsets just
mean unwanted choices and chaos.</font></li>
<li><font color='#0000FF'><em>Modularity is valuable. Specifying
technology into smaller separate parts is beneficial: the ability to
choose subsets extends the range of applications; modules can evolve
independently.</em></font></li>
</ul>
<p><em>Modularity is important, but it has to be done
"right". Architecture recapitulates organizational structure;
separate committees with independent specs require a great deal of
good-faith effort to coordinate, and there's not a lot of "good
faith" going around.</em></p>
<p><strong>Timely:</strong></p>
<ul>
<li><font color='#cc0000'>Standards take too long, move
faster. Implementing and shipping the latest proposal is a good way
to validate proposed standards and get technology in the hands of
users. Standards that take years aren't interesting.</font></li>
<li><font color='#0000FF'><em>Encouraging users to deploy
experimental extensions before they are completed will cause
fragmentation, because not all experiments
succeed.</em></font></li>
</ul>
<p><em>The community can see innovation pretty quickly, but good
standards take time. I'd rather see experimental features circulated
as "proposals" than misleadingly passed around as "the
standard".</em></p>
<p><strong>Web Content Authors Ignore Standards:</strong></p>
<ul>
<li><font color='#cc0000'>Web authors don't care about
standards. Most individual authors, designers, developers and
content providers ignore standards anyway, so any efforts based on
assuming authors will change isn't helpful.</font></li>
<li><em><font color='#0000FF'>Influencing authors is
possible. Authors can and will adopt standards if popular browsers
tie new features to standard-conforming content.</font></em></li>
</ul>
<p><em>I'm not convinced that influencing content authors is
impossible. Doing so requires some agreement from "leading
implementors" to give authors sufficient feedback to make them care,
but this isn't impossible. It's happened in other standards when it
was important.</em></p>
<p><strong>Versionless Standards and Always On Committee:</strong></p>
<ul>
<li><font color='#cc0000'>Standards committees should be chartered
to work forever, because the technology needs to evolve
continuously. A stable "standard" is just a meaningless
snapshot. Standard committees should be "always on", to allow for
rapid evolution. The notion of "version numbers" for standards is
obsolete in a world where there are continual
improvements.</font></li>
<li><em><font color='#0000FF'>Standards should be stable. Continual
innovation is good for technology suppliers, but bad for standards;
evolution should be handled by allowing individual technology
providers to innovate, and then to bring these innovations into
standards in specific versions.</font></em></li>
</ul>
<p><em>We shouldn't guarantee "lifetime employment for standards
writers". A stable document should have a long lifetime, not subject
to constant revision. If we're not ready to settle on a feature, it
should likely move into a separate document and be designed as a
(perhaps proprietary) extension. An "always on" committee is more
likely to concentrate power in the few who can afford to commit
resources, independently of how deeply they are affected by
changes.</em></p>
<p><strong>Open Source:</strong></p>
<ul>
<li><font color='#cc0000'>Standards should always have an open
source implementation. Allowing any company or software developer to
provide their own private extensions is harmful; a content standard
should be managed by the group of major (or major open source)
implementors, so that any "standard" extension is available to
all. </font></li>
<li><em><font color='#0000FF'>Open source is useful but
unnecessary. Proprietary extensions and capabilities (originally
from a single source or a consortium) have benefited the web in the
past and will continue to be sources of innovation. While "open
source" may be beneficial, not everything will or can be open
source.</font></em></li>
</ul>
<p><em>Working on open source implementations can go hand in hand
with working with standards. However, a standard is very different
from open source software. In the end, users care about compatibility
of a wide variety of implementations.</em></p>
<p><strong>The "Web" is defined by "What Browsers Do":</strong></p>
<ul>
<li><font color='#CC0000'>The web is first and foremost “what
browsers do”, and secondly a source of "web applications" technology
(browser technology used for installable applications)</font></li>
<li> <em><font color='#0000FF'>Other needs can dominate browser
needs. Web technologies extend to the widest range of Internet
applications, including email, instant messaging, news
distribution, syndication and aggregation, help systems, electronic
publishing; requirements of these applications should have equal
weight, even when those requirements are meaningless for what
“browsers” are used for.</font></em></li>
</ul>
<p><strong><font color='#000000'>Royalty Free:</font></strong></p>
<ul>
<li><font color='#990000'> Avoid all patented technology. Every
component of a browser MUST be implementable without any restriction
based on patents or copyright (although creation tools, search
engines, analysis, translation gateways, and traffic analysis may not
be).</font></li>
<li><font color='#0000FF'><em>Patented technology has a place. In
some cases, patented technology cannot be avoided, or is so
widespread that “royalty free” is just one more requirement among
many tradeoffs.</em></font></li>
</ul>
<p><strong>Forking:</strong></p>
<ul>
<li><font color='#990000'>Forking a spec allows innovation. Having
multiple specifications which offer different definitions of the same
thing (such as HTML) allows leading features to be widely known and
implemented, and allows groups to work around organizational
bottlenecks.</font></li>
<li><font color='#0000FF'><em>Forking a spec is harmful. Multiple
specifications which claim to define the same thing are a power
trip, causing confusion.</em></font></li>
</ul>
<p><strong>Accessibility:</strong></p>
<ul>
<li><font color='#990000'>Accessibility is just one of many
requirements. Accessibility is an important requirement for the web
platform, but only one of many sets of requirements, to be traded
off against the requirements of other user communities when
developing standards.</font></li>
<li><font color='#0000FF'><em>Accessibility is not an
option. Ensuring that those who deploy products implementing W3C
standards allow building accessible content is necessary before W3C
can endorse or recommend that standard.</em></font></li>
</ul>
<p><strong>Architecture:</strong></p>
<ul>
<li><font color='#990000'> Architecture is mainly theoretical; it
is not a very useful concern. Rather, invoking "architecture" is
mainly a way of adding requirements that serve little practical
purpose.</font></li>
<li><font color='#0000FF'><em>Architecture and consistency are
crucial. Consistency between components of the web architecture and
guidelines for consistency and orthogonality are important enough
that existing work should slow down to ensure architectural
consistency.</em></font></li>
</ul>
<p><strong>And a few other topics I ran out of time to
elaborate:</strong></p>
<p><strong>Digital Rights Management:</strong> DRM is Evil? DRM is an Important feature?</p>
<p><strong>Privacy:</strong> Up to browsers? Mandated in specs?</p>
<p> <strong>Voice:</strong> Integrated? Separate spec?</p>
<p><strong>Applications:</strong> Great? Misuse: use Browser?</p>
<p><strong>JavaScript:</strong> Essential, stable? Fundamentally broken?<br/> </p>
Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com6tag:blogger.com,1999:blog-418176209892776711.post-3470006811574219982010-10-22T16:37:00.001-07:002010-10-22T16:37:01.510-07:00Another take on 'persistence' and 'indirection'<div xmlns='http://www.w3.org/1999/xhtml'><p>I've noodled on the questions of persistence of identifiers, what is a "resource" and so on for a while; <a href='http://www.ietf.org/id/draft-masinter-dated-uri-07.txt'>http://www.ietf.org/id/draft-masinter-dated-uri-07.txt</a> is the latest edition of a "thought experiment". If a 'data:' URI is an immediate address, is a "tdb" URI an indirect one?<br/> </p></div>Larry Masinterhttps://plus.google.com/106838758956333672633noreply@blogger.com0
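The "immediate address" point can be made concrete: a 'data:' URI (RFC 2397) carries the resource's bytes inside the identifier itself, so dereferencing it needs no network access at all, whereas a 'tdb:' URI from the draft names a thing only indirectly, via a description, and has no resolver. A minimal sketch in Python, showing only the 'data:' half of the contrast (the URI string here is an illustrative example, not one from the draft):

```python
from urllib.request import urlopen

# A data: URI is an "immediate address": the content is embedded in
# the identifier, percent-encoded after the media type and a comma.
uri = "data:text/plain;charset=utf-8,Hello%2C%20World%21"

# urllib's default opener resolves the data: scheme locally -- no
# network round trip happens here.
with urlopen(uri) as response:
    body = response.read().decode("utf-8")

print(body)  # -> Hello, World!
```

A 'tdb:' URI, by contrast, could not be "opened" this way: it identifies whatever the referenced description describes at a given date, which is exactly the indirection the thought experiment is probing.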