Claude Bullard writes:
> We seem to be unclear about how to apply it and
> even less clear about how to explain its application
> to a non-information theory specialist.
I hope that the new draft just posted will deal with some of these
concerns [1]. I don't think the point is to explore in detail the nuances
of Chomsky hierarchies or similar formal metrics of complexity. Rather,
this is a finding that is intended to remind a broad audience of Web
contributors that they should be thinking hard about a variety of ways in
which complex or powerful languages can obscure the information being
conveyed on the web. Quoting the pertinent new paragraph from the draft
(note that this comes after the familiar text that has talked about
languages ranging from the "plainly descriptive" up through "those that
are unashamedly imperative and Turing-complete", so the following is to
point out that the Chomsky hierarchy is not the only axis that matters):
"There are many dimensions to language power and complexity that should be
considered when publishing information. For example, a language with a
straightforward syntax may be easier to analyze than an otherwise
equivalent one with more complex structure. A language that wraps simple
computations in unnecessary mechanics, such as object creation or thread
management, may similarly inhibit information extraction. The intention of
this finding is neither to rigorously characterize the many ways in which
a programming language may exhibit power or complexity, nor to suggest
that all such power necessarily interferes with information reuse. Rather,
this finding observes that a variety of characteristics that make
languages powerful can complicate or prevent analysis of programs or
information conveyed in those languages, and it suggests that such risks
be weighed seriously when publishing information on the Web. Indeed, on
the Web, the least powerful language that's suitable should usually be
chosen. This is The Rule of Least Power:
Good Practice: Use the least powerful language suitable for expressing
information, constraints or programs on the World Wide Web."
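As a rough illustration of the quoted paragraph (not taken from the draft itself), the sketch below publishes the same record twice: once as plain descriptive data and once wrapped in object creation and computation. The record, the field names, and the `Reading` class are all hypothetical. The point is only that the first form can be read by parsing, while the second has to be executed or analyzed.

```python
import json

# Hypothetical record, published declaratively (names and values are made up).
DECLARATIVE = '{"city": "Cambridge", "temperature_c": 4}'

# The same record published as a program: the data only exists after it runs.
IMPERATIVE = """
class Reading:
    def __init__(self):
        self.parts = {}

    def build(self):
        self.parts["city"] = "Cam" + "bridge"
        self.parts["temperature_c"] = sum([1, 1, 2])
        return self.parts

result = Reading().build()
"""


def extract_declarative(text):
    """Parsing alone recovers the data; nothing is executed."""
    return json.loads(text)


def extract_imperative(source):
    """The consumer must run the publisher's code to see the data."""
    namespace = {}
    exec(source, namespace)
    return namespace["result"]


print(extract_declarative(DECLARATIVE) == extract_imperative(IMPERATIVE))
```

Both forms yield the same dictionary, but a simple text search for "Cambridge" finds it only in the declarative form.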
I hope this successfully signals the direction that I think we should take.
Noah
[1] http://www.w3.org/2001/tag/doc/leastPower-2006-2-13.html
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
"Bullard, Claude L (Len)" <len.bullard@intergraph.com>
Sent by: www-tag-request@w3.org
02/13/2006 02:58 PM
To: "Harry Halpin" <hhalpin@ibiblio.org>
cc: <www-tag@w3.org>, (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: RE: Principle of Least Power
This principle looks less useful with every message.
We seem to be unclear about how to apply it and
even less clear about how to explain its application
to a non-information theory specialist.
My eyes don't glaze over when I see "Kolmogorov
complexity" except where the description says
'complexity' I substitute 'cost' because a
string's length can be arbitrarily long without
being arbitrarily complex given any function
that produces it, yet to the consumer, it can
be expensive to process and not necessarily
equally expensive for any given run. Cost of
applying a language is good; information reuse
or discovery is a characteristic that contributes
to lower cost.
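The distinction between length and complexity can be sketched roughly as follows. Kolmogorov complexity itself is not computable, so this sketch assumes the zlib-compressed size as a crude upper-bound stand-in; the function name and the sample strings are illustrative only.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes: a crude proxy, not Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


n = 100_000

# Long but simple: a short rule ("repeat 'ab'") produces the whole string.
repetitive = b"ab" * (n // 2)

# Equally long, but with no short description that zlib can find.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(n))

print(len(repetitive), compressed_size(repetitive))
print(len(noisy), compressed_size(noisy))
```

Both strings are the same length, so a consumer pays a similar cost to read either one, even though the first compresses to a tiny fraction of its size and the second barely compresses at all.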
This may be one of those principles that has
a disclaimer on it saying 'don't apply this at home'
or like the 'division by zero' prohibition which
everyone learns but is seldom explained.
If this were operational, I would explain it in
terms of an old sci-fi story about
strategy, tactics and probability: strategy says pick a
sensitive system. Tactics say pick the point
of highest sensitivity such that the lowest cost
or lowest force or least risk action is applied
to get the most effect by taking advantage of the
complexity (density of interconnection over affective
message rate) of the system itself; sometimes known
as 'wasping' because of the example where the
probability of changing the vector of a two-ton
object moving at a given velocity by impacting
it with a one gram object is low until you
consider the effect of hitting a speeding car
with a wasp in the driver's eye.
Again, if this is about data typing, this is
oblique. The fact that a number is stored as
an integer does not necessarily tell you that
it is a person's age.
HTML is a pretty bad example too. As soon as it
became widely available, it was customized and
extended to the point that it is now a collection
of languages ready to bifurcate (say, HTML, CSS,
Forms, XHTML, microformats, XML data islands,
namespaces, and so on). Less power or incomplete
with respect to reapplication to new problems?
len
-----Original Message-----
From: Harry Halpin [mailto:hhalpin@ibiblio.org]
Sent: Sunday, February 12, 2006 2:44 PM
To: Bullard, Claude L (Len)
Cc: www-tag@w3.org
Subject: Re: Principle of Least Power
More power does not always equal less information. For example, both
Haskell and C++ are Turing-complete, but you can argue pretty well that
Haskell via its type system/monads/etc. gives you *more information*
even though they are on the same level of the Chomsky Hierarchy. The
Chomsky Hierarchy is the ranking of formal languages from regular languages
up to recursively enumerable (Turing-recognizable) languages.
It gets more confusing if you have something of a *lower rank*
(DTDs?) in the Chomsky Hierarchy that you want to argue provides *less
information* than something of a *higher rank* (XML Schemas?). What I am
saying is that in general knowing whether a language is Turing-complete
gives you some information: whether you can decide in general if a program
will halt, given the halting problem. But the space of all possible
information may not be objectively measurable - although I do think
Kolmogorov complexity/information theory has something to say about that.
However, what we could argue is that knowing some technology's place in the
Chomsky Hierarchy only gives you some information, but that is far from
the only metric. We can argue XML Schemas give more information by
saying that their typing information and annotations (not present in
DTDs) allow them to express more information even though they may be
higher in the Chomsky Hierarchy.
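The halting point can be sketched roughly like this. A recognizer for a regular language (a DFA) takes exactly one step per input symbol, so you know it terminates before you run it. A general-purpose loop carries no such guarantee. The DFA table and both function names are hypothetical, and the Collatz iteration is used only as a well-known loop for which no termination proof covering every input is known.

```python
def run_dfa(transitions, start, accepting, text):
    """Run a DFA; the step count always equals len(text)."""
    state = start
    steps = 0
    for symbol in text:
        state = transitions[(state, symbol)]
        steps += 1
    return state in accepting, steps


# Hypothetical DFA: accepts binary strings containing an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def collatz_steps(n):
    """General-purpose loop: no step bound is known to hold for every n."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


print(run_dfa(EVEN_ONES, "even", {"even"}, "1001"))
print(collatz_steps(6))
```

Knowing that the first formalism is "only" regular is exactly what tells you its running time in advance; the more powerful formalism gives that information up.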
Bullard, Claude L (Len) wrote:
> That confuses me, Harry. Are you saying that XML Schemas being more
> powerful and more expressive than DTDs (they are) also provide more
> information?
>
> Wouldn't that contradict the principle?
>
> I get the halting example. The language can't be used to determine
> if an answer will return. In that sense of information (the
> probability of halting), it is undecidable. An analog to this
> discussion occurred recently on the CG list concerning the
> "reality or intuition" of infinities. Practical applications
> don't care but schools of mathematics bifurcate around that debate
> (platonism vs intuitionism vs constructivism and so on). All
> computer systems are finite if they work; they may use concepts
> of infinities but these are functional (e.g. limits, or the empty
> set is a member of all sets).
>
> Let me try another example:
>
> If a language automatically casts data types, thus hiding from the
> user what it is doing, it exposes in the syntax less information
> but has more power in the implementation. So in the sense that it
> hides that under the covers, it is more *powerful*. In what it
> documents in the syntax of the program, it is hiding information.
> One of the original principles used to sell object-orientation
> was 'information hiding'.
>
> I'm looking for an example I can explain to the pointy-haired guy
> without him rolling his eyes. "Trust me" isn't good enough. If
> we have to explain the halting problem, he will say "you are making
> my head hurt". That is not a good thing.
>
> len
>
>
> From: Harry Halpin [mailto:hhalpin@ibiblio.org]
>
> Point was that it seems to me the "power" in this note isn't
> Turing-completeness only, but that often less powerful languages give
> you *more information* than more powerful ones. So I'm not sure if
> ranking a bunch of things according to Turing-completeness is really all
> that useful, although it helps!
>
> So an XML Schema gives you more information (i.e. it has more types,
> substitution groups, numeric ranges etc.) than a DTD, and you should use
> XML Schemas instead of DTDs even if both can be implemented as regular
> languages (Now the RELAX NG question is a whole other post...). Same
> with programming in Haskell versus C - although both languages are
> Turing complete, Haskell would give you more information via its typing
> system and pure functional architecture about itself, and so is more
> amenable to analysis without looking at the code or running the
> program. I think this way of thinking about it helps connect sections 2
> and 3 to each other.
>
> One example of this idea of information is Turing-completeness - if you
> know a language is not Turing-complete, then you can often know whether a
> program in it halts or not, while for Turing-complete languages "you don't
> and can't know" -
> which translates into *less information* even if the formalism is *more
> powerful.*
>
> Ditto for traditional complexity computer science re Henry - if I tell
> you a problem is of class L (solvable in logarithmic space), I give you
> more information than if I tell you it's solvable in P (polynomial time),
> and even more than if I told you it was solvable in NP (non-deterministic
> polynomial time), since we don't know if P=NP, but we do have a pretty
> good idea what L is :)
>
> I don't think this requires any major amendments to said document, maybe
> a sentence or two about this as suggested earlier might help clarify
> Henry's issues, which confused me as well when I first read it, as I
> thought it was talking about only Turing-completeness - and so the
> Haskell bit seemed a bit weird, but in retrospect it makes sense.
>
>