Saturday, November 28, 2009

Wednesday, November 11, 2009

go. Hmmmm. Given that it is from Google, this will obviously get lots of interest. Will it kill all other programming languages at Google? How long will it be before someone writes a compiler/translator that turns Python into Go, Go into Python, JavaScript into Go, Go into JavaScript?

The whole point of using XML "upstream" is to allow a multiplicity of transforms downstream. However, care needs to be taken when the documents are critical - like legislation... It is essential that the normative copy is made explicit. The ideal normative copy is one that can partake in author/edit cycles. However, the normative copy is (typically) the result of a printing process because paper is signed by empowered officers with an ink pen. On the way to paper, there are umpteen points of intervention in the typical paper printing workflow. Camera-ready or direct-to-plate workflows in print shops involve page imposition and all sorts of pre-flight work that can - and often does - render the upstream content suspect with respect to the final printed pages.

Legislative artifacts - especially bills - need very close attention to line/page numbers because of the time-honored way in which legislative amendment cycles work. Most knee-jerk "structured" XML approaches fall flat on their faces as a result. With legislation, line/page numbers are not throw-away artifacts. They are as important as the words themselves...

None of the problems are insurmountable but they involve a lot of care and thought. Throwing out PDF isn't the solution. Simply plugging in a structured XML editor with a custom/industry schema won't work either. The solution involves combining structured XML technology with word processor technology and DTP technology. The key is recognizing that a multiplicity of formats/techniques are required in order to serve the needs of the complete legislative workflow; and to be absolutely clear - every step of the way - what the normative copy of the digital text is.
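To make the point about line/page numbers concrete, here is a minimal sketch - hypothetical names, nothing like a real publishing pipeline - of treating page/line locators as first-class data, so that an amendment citation like "page 2, line 1" can be resolved back to the normative words it targets:

```python
def paginate(lines, lines_per_page):
    """Assign (page, line) locators to a flat list of rendered lines."""
    index = {}
    for i, text in enumerate(lines):
        page = i // lines_per_page + 1
        line = i % lines_per_page + 1
        index[(page, line)] = text
    return index

def resolve_citation(index, page, line):
    """Look up the words an amendment citation points at."""
    return index.get((page, line))

# A toy three-line bill, paginated at two lines per page.
bill = ["Section 1. Short title.",
        "This Act may be cited as",
        "the Example Act of 2009."]
index = paginate(bill, lines_per_page=2)
print(resolve_citation(index, page=2, line=1))  # the Example Act of 2009.
```

The point of the sketch is only this: the locators are data to be captured and preserved, not accidents of rendering to be thrown away.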

Saturday, September 05, 2009

Rick Jelliffe is writing some interesting stuff these days on parsing: SGML, XML, HTML etc. Rick talks about GROVEs and that triggered a flashback. Boy, those were the days! DSSSL, HyTime... I remember GROVE being expanded to Graph Representation Of Property Values. Rick says GROVEs—Groupings Of Valid Elements. Tomayto. Tomato. It's very instructive to watch the recent RDF goings-on in the light of the GROVE stuff of old.

Anyway, Rick makes the important point that you cannot linearize SGML parsing because it has feedback. Amen to that. SGML has more feedback loops than a room full of amps and microphones - as anyone who has tried to write a true SGML parser will tell you.
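A toy illustration of that feedback - this is nowhere near a real SGML parser - is tag omission: the same character stream yields different trees depending on what the DTD's content models allow, so the grammar feeds back into tree building itself:

```python
def parse(tokens, can_nest):
    """Toy SGML-ish tree builder with implied end tags.
    tokens: start-tag names like "p" or character data like "#one".
    can_nest: set of element names the (pretend) DTD allows to contain themselves."""
    root = ("root", [])
    stack = [root]
    for tok in tokens:
        if tok.startswith("#"):              # character data
            stack[-1][1].append(tok[1:])
        else:                                # start tag; end tag may be omitted
            # SGML-style implication: close the open element if the
            # content model says it cannot contain this one.
            if stack[-1][0] == tok and tok not in can_nest:
                stack.pop()
            stack[-1][1].append((tok, []))
            stack.append(stack[-1][1][-1])
    return root

stream = ["p", "#one", "p", "#two"]
flat = parse(stream, can_nest=set())      # two sibling <p> elements
nested = parse(stream, can_nest={"p"})    # one <p> containing another
```

Same bytes, two trees. A context-free, linearizable tokenizer cannot do this - it needs the grammar whispering in its ear at every step.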

For me, the big question is this: is that complexity justified? Given that SGML is, after all, an invented language, its degree of parsing computational complexity is in human hands. With invented languages, we make parsing problems that we then have to solve. Does the cost outweigh the benefit?

In the case of SGML, I believe the answer is no. Charles Goldfarb has a brilliant mind but it is the mind of a lawyer more so than a computer scientist, in my opinion.

Now I work with legal texts a lot in my day job. I have read Peter Suber's Paradox of Self Amendment and I'm pretty familiar with the difficulties of creating homomorphisms between concepts from logic and concepts from jurisprudence.

"That legal rules may be bad logic and good jurisprudence at the same time is yet to be established, of course, but I will at least allow myself to proceed as if that conclusion were not foreclosed a priori." -- The Paradox of Self Amendment

I am as fond of hermeneutic circles as the next language nerd but I think we need to create Strange Loops with caution in computer science. We need to bring an awareness of the issues they create downstream from the intellectual delights involved in their creation.

Now insofar as markup languages are attempting to be expressive in a human language sense, we get pulled towards parsing complexity. Insofar as we are designing them for machine readability, we get pulled towards simple models in the Chomsky-esque taxonomies of language types.

It is the age old debate in disguise. Are markup languages a branch of linguistics or a branch of mathematics?

The answer of course is "yes" and there-in lies the heart of the problem.

Thursday, August 13, 2009

Sadly, another year has passed and my 100% record of not attending the Balisage conference is intact.

This year looks to have been particularly interesting with Michael Kay speaking not only on pipelines but on the fascinating overlaps between markup processing and an all-but-forgotten software design methodology known as JSP (Jackson Structured Programming).

At SGML '96, I gave a talk on the relationship between the JSP methodology and SGML processing. At the time, we were using the ideas in JSP extensively in a C++-based SGML processing toolkit we called IDM. It's great to see JSP get some air again as there is a lot of stuff in Michael Jackson's thinking that really resonates today. Not only the ideas in JSP but also the ideas in JSD (Jackson System Development).

I had the great good fortune to work for Dave Croydon of Fiamass when I left college. He introduced me to JSP in the context of building real-time financial trading systems in 8086 assembler! I didn't know it then but what I learned from Dave hugely influenced the approach I would end up taking to everything from programming to system architecture at Propylon.

Tuesday, August 11, 2009

Well, my problems with getting up and running with the MyTouch turned out to be a SIM card problem and have now been fixed. I'm into my second day with this thing now and so far so good. It is taking me some time to get used to the on-screen touch keyboard but I'm getting there. The app store is so far a little disappointing as there are lots of "doesn't work on G2" comments. Some of the apps I've downloaded have flat-out not worked but a good few have worked fine. I'm still exploring. Biggest hardware disappointment so far: that weird cable you need to add to turn the USB connector into a headphone jack. What were they thinking?

Friday, August 07, 2009

When is a mobile phone not a phone? When it is a smart phone apparently.

I got a T-Mobile MyTouch. My SIM needs to be provisioned for data services before I can use the internet stuff on it. Ok fine. I found out today that it might take 48 hours...Hmmmm, not great but ok, fine.

But here is the kicker, unless I am missing something I cannot make a phone call in the interim! My "phone" insists on being a computer first and a phone second it seems.

The Turing Test will soon be passed by a machine - or so some believe. I think we need to up the ante. How about something that simultaneously captures humanity and aesthetics in a way impervious to mere brute calculation attacks?

I want to see a machine be genuinely moved watching this particular rendering of Coldplay's Fix You.

Friday, July 31, 2009

So, I've been playing Lisa Hannigan's debut album a lot lately. I have been nagged by the feeling that she reminded me of someone and I couldn't put my finger on it. I've previously blogged the bits that remind me of Leonard Cohen and Norah Jones but the one I missed has finally come to me: Beth Gibbons of Portishead. It feels like a mystery solved.

Well, without a doubt, the highlight of this trip to D.C. for me was the behind-the-scenes tour of the Library of Congress. The scale of the operation is just astounding. I got to stand next to some books that Gutenberg himself probably touched. I got to breathe that unmistakable aroma that comes from really, really old books.

The Library of Congress does *volume* like no other entity I know and it was truly impressive. I stood in the stacks - the size of two football fields. 2.5 million books. I gazed at a 5 foot high card catalog which, essentially, disappeared over the horizon in both directions. Amazing. A Disneyland for a book nerd like me.

A terribly useful concept for reasoning about URI versioning from Saul Kripke.

I have successfully applied Kripke's rigid designator concepts to some thorny document management problems - in particular legal citation. Sadly, making it work for the Web-at-large would involve centralizing some plumbing in a way that is anathema to the decentralized model of the Web.
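For flavour, a minimal sketch of the idea - all the names here are hypothetical, not any real citation system: the citation acts as a Kripkean rigid designator, naming the same legal text in every context, while resolving it to a concrete version is a separate, time-indexed step:

```python
# The rigid designator "act/1998/42" always names the same Act;
# which concrete text you get depends on the date you resolve at.
versions = {
    "act/1998/42": [
        ("1998-07-01", "as enacted"),
        ("2003-02-15", "as amended by act/2003/7"),
    ],
}

def resolve(designator, as_of):
    """Return the version of the cited text in force on a given date.
    ISO date strings compare correctly as plain strings."""
    in_force = None
    for date, text in versions.get(designator, []):
        if date <= as_of:
            in_force = text
    return in_force

print(resolve("act/1998/42", "2000-01-01"))  # as enacted
print(resolve("act/1998/42", "2009-01-01"))  # as amended by act/2003/7
```

The naming layer never changes; only the resolution layer is version-aware. That split is exactly what is hard to retrofit onto the Web at large without central plumbing.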

Saturday, June 20, 2009

One of the perennial risks with Atkins-style diets is that there comes a point where you really, really want two slices of something to hold your bacon and cheese together in an ergonomic and non-messy way. The urge to grab that wonderful but seditious invention - bread - is very strong. Lettuce works but just doesn't feel right. Especially if you just lurve the smell of toast:-/

I am reminded of Woody Allen's hilarious short story about the Earl of Sandwich...

All joking aside, s/he who invents something that is very low carb (that rules out most of the so-called low-carb breads on the market), and can function like bread in a sandwich, has a bright financial future.

Friday, May 29, 2009

Some of the talks look intriguing. Markup as a nomic game sounds very interesting. Lots about pipelines...

I see that Michael Kay is going to talk about Jackson Structured Programming and how it relates to XML and XML processing. Yay! Back in 1996 I gave a talk about JSP/JSD and SGML processing at SGML '96. I'm looking forward to reading Michael's paper. Jackson's inversion concept was a stroke of genius and his JSP/JSD work has very much influenced how I think about XML processing.

Sunday, April 19, 2009

Atompub is the subject of Hugh Winkler's REST hypothesis post. I think I agree with most of it. When history is written, I think it will show that without the visual nature of the Web, it never would have taken off as an IT substrate. I.e. The web was something you looked at. It attracted end-user eye-balls first. The droves of substrate diver-types came second. No end-user eyeballs to fuel the fires, no web. Or, looked at another way, no compelling end user application out of the box, no reason to engineer a compelling IT substrate.

The slippery bit here is the stuff that eyeballs look at on the web. We all know that the web started as "pages" where electronic "page" had a strong analogy to a paper "page". The content was forged from tags that marked out paragraphs and bold and headings and what not...

...but that was then and this is now. Moving from mere pages to full-on applications requires more than just html. More than just declarative syntax for end-user-facing text. For a while, the technical answer seemed obvious. Allow browsers to work with structured content if they want to and render said structured content using stylesheets.

For reasons I do not profess to understand, this never really happened. Somewhere along the line, the "structured content+stylesheet=dynamically rendered page" equation broke down. Javascript began to flex its Turing Complete muscles and today we are staring down the barrel of a completely different concept of a web "page"...

...In the new world, it seems to me that HTML is taking a back seat and becoming - goodness gracious me - an envelope notation into which you can pour JavaScript for delivery to the client side...

...where it gets turned into HTML (maybe) for rendering using the HTML rendering smarts of the browser.

It is as if declarative information is disappearing into the silos that are not on the web - but interfaced to it. Interfaced by application servers that convert content server-side into JavaScript programs to be EVAL-ed by the client. The EVAL-ed JavaScript is then further EVAL-ed by the end-user eyeballs that birthed the success of the web in the first place.

So what? Well, there is a big loss happening I think. At least, I believe it is a real risk. Maybe I'm just a pessimist. I see content disappearing at a rate of knots into silos that are not on the web. Access to these silos is being controlled by application servers that are spitting out programs. Not pages-o-useful-content but PROGRAMS.

We are doing this because programs are so much more useful than mere content if we want to create compelling end-user-applications and because if you squint just right, content is a trivial special case of a Turing Complete program. Just ask any Lisper.

This is happening somewhat under the covers because HTML - gotta love it - allows JavaScript payloads. But if 99% of my pages are 99% JavaScript and 1% declarative markup of content, am I serving out content or serving out programs?

Maybe JSON is pointing at where this is all headed. Maybe we will see efforts to standardize data representation techniques in JSON so that the JSON can be parsed and used separately from the rendering normally bound to it? Maybe XML-on-the-client-side will have a resurgence?
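A tiny sketch of what that separation buys - illustrative data only: publish the "is" part as JSON and any consumer can parse and use it without ever running the rendering code bound to it:

```python
import json

# The "is" part: declarative data the server emits.
record = {"title": "Fix You", "artist": "Coldplay", "year": 2005}
payload = json.dumps(record)

# The "does" part: one possible presentation, among many.
def render(d):
    return "<h1>%s</h1><p>%s, %d</p>" % (d["title"], d["artist"], d["year"])

# A consumer that never touches the page's JavaScript can still use the data.
data = json.loads(payload)
print(data["artist"])   # Coldplay
print(render(data))     # one rendering; others are free to differ
```

When the data only ever ships pre-fused into a program, that second consumer is locked out - which is the loss I'm worried about.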

I don't know which way it will go but I would suggest that if we are searching for what exactly the web *is*, we have to go further than saying it is HTML, as Hugh does in this piece.

For me, the web is URIs, a standard set of verbs and a standardized EVAL function. The verbs are mostly GET and POST and the standardized EVAL function is the concept of a browser that can EVAL both HTML and JavaScript. I don't think we can afford to leave JavaScript out of the top-level definition of what the Web is because there is too much at stake.

There is a huge difference between a web of algorithms and a web of data. For computing eons, we have known that a combination of algorithms and data structures leads to programs. Less well known (outside computer science) are the problems of trying to build applications using one without the other or trying to fake one using the other.

Lisp, TeX, SGML...all of these evidence the struggle between declarative and imperative methods. Today, the problems are all the same but the buzzwords are different: JavaScript, XSLT, XML...

We have not solved the fundamental problem: separating what information *is* from what information *does* in a way that makes the "is" part usable without the "does" part and yet does not impede the easy creation of the main application which unfortunately (generally) needs to fuse "is" and "does" in a deep and complex way.

Sunday, April 05, 2009

We live in interesting times on the web-o-data front. An interesting discussion is taking place on Jon Udell's blog.

Let's imagine a Web awash with machine-readable numbers and with the ability to perform all sorts of wonderful calculations for you on said numbers behind the scenes. So wonderful in fact that you, as an information consumer, do not need to know or care if the answer to your questions was pre-computed or computed on the fly.

Now. How would you best like to express your questions to this web-o-data/computations? Visually? Textually? What software abstractions exist now for framing such questions? Google has a text box. Geo-systems have maps. RDBs have query-by-example and good old SQL. What else? Lots of programming languages of course, but what else can we use in an end-user, non-programmer context?

Well, I think the spreadsheet is one of the most powerful abstractions for asking questions of data/computations ever discovered. However, there is a huge disconnect between the old spreadsheet concept of offline, localized data/localized formulae and what the Web enables.

We know how to decentralize the data. All the bits are in place. We have HTTP, URIs, XML, CSV, JSON, RSS/Atom. We know how to notify people/processes when data changes: e-mail, RSS/Atom, XMPP, WebHooks, yada, yada.

Two necessary bits are late to the party. The first is well on the way: cloud computing. It is what will allow us to take *a computation* and put it on the web as a first class, always-on, scalable resource. Once we can compute easily in the cloud, we can derive facts from existing facts and - critically - derive notification events when the inter-relationships between fact objects change. (Sidebar: Change... think about how many business functions you know of that are triggered by the changing relationships between facts. Lots, right?)

The second missing bit is the paradigm for asking the questions of the web-o-data. Wolfram Research are making exciting noises in the area of natural language questions. The Geo brigade are creating ever more fantastic stuff for geo-located data. RDF etc. continues to provide a grand unifying theory of it all but...where is the end-user facing paradigm for interacting with a humongous web of smart numeric data and its concomitant legion of web-hosted computations?

I think the spreadsheet metaphor is an interesting place to start. Facts are cells. Questions (known as formulas) are also cells. New facts and new formulas can be created based on existing facts/formulas ad infinitum.
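The metaphor fits in a few lines of Python - a deliberately minimal sketch, assuming for simplicity that the formula cells are listed in dependency order:

```python
# Facts are plain cells; questions are formula cells that read other cells.
cells = {"a": 2, "b": 3}
formulas = {
    "sum": lambda c: c["a"] + c["b"],          # a question about facts
    "double_sum": lambda c: c["sum"] * 2,      # a question about a question
}

def evaluate(cells, formulas):
    """Evaluate formula cells against fact cells.
    Assumes formulas are given in dependency order (a real engine
    would topologically sort the dependency graph)."""
    values = dict(cells)
    for name, formula in formulas.items():
        values[name] = formula(values)
    return values

print(evaluate(cells, formulas))  # {'a': 2, 'b': 3, 'sum': 5, 'double_sum': 10}
```

Replace the dict with URIs and the function calls with HTTP dereferencing and you are most of the way to the web-o-data question-asking paradigm.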

Thursday, March 19, 2009

Giving numbers URIs is something that Hypernumbers is trying to do. Will be interesting to see if they get traction. They are a bit stealthy so not sure how they are progressing since winning seedcamp in 2007.

I'm hoping that soon now, the number of entities in the business of giving numbers homes on the Web will reach the recombinant growth threshold number R.

R is a term we use in Timetric-land to capture the economic/biological concept of "recombinant growth". Thankfully, R is, I think, in the low single digits in this case :-)

It is straightforward to see how entities can independently host numbers that collectively can be mashed up to create new numbers. The more the merrier! Less obvious is how the fabric of the Web can be made to support the critical concept of near real-time calculations and update notifications on those numbers so that "calculating" is as easy as "hosting" is today.

Then of course, there are the minor-sounding but hugely thorny issues of naming the numbers (the URIs), namespaces, normative copies, yada, yada.

Hard and interesting problems one and all. Problems well worth solving for the value they will bring. I would suggest that this emerging space is the future of what today is known as a "spreadsheet". It is not a desktop calculation experience hosted in a web browser. That is not particularly interesting as it just takes a desktop paradigm and puts it on the Web. I'm talking about radically rethinking the whole concept of a spreadsheet. I'm talking about something that is not so much "on the web" as "in the web". Built in. Native. All around you. All the time. A worldwide interlinked network of bazillions of numeric quantities. All updating and being updated in line with their semantic interconnections (known in today's terminology as "spreadsheet formulas"). All ready to be used to create yet more numbers and drive decision making at client and server levels of the ecosystem.

The world is literally awash with very, very useful scalar data types. The big gorilla: integer and fractional quantities that change value over time. I firmly believe that in order to make these things first class members of the Web, they need to live *on* the web.

Simply put, numbers need URIs with RESTian APIs for management. Let's put that layer in place first. Then we can make RDF statements about numbers (and numbers at idempotent points-in-time). We can also provide feeds that describe time series changes using things like XBRL...
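Here is a sketch of the shape I have in mind - the URIs and function names are made up for illustration, not any real API:

```python
# Each numeric quantity lives at its own URI and is managed with the
# usual RESTian verbs. An in-memory stand-in for the real thing.
store = {}

def PUT(uri, value):
    """Create or update the number that lives at this URI."""
    store[uri] = value
    return 200

def GET(uri):
    """Dereference a number. Returns None for an unknown URI
    (a real service would return 404)."""
    return store.get(uri)

PUT("/numbers/gold/spot-usd", 950.25)
print(GET("/numbers/gold/spot-usd"))  # 950.25
```

Once numbers are addressable like this, the RDF statements and the XBRL feeds have something solid to point at.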

Sunday, March 15, 2009

Eve has started blogging about the fattening effect of carbs and is pointing out that low-carb regimes like Atkins are not as crazy as some would have us all believe.

I thought I would weigh in with my story. A few years ago I woke up and found myself weighing 18 stone with high blood pressure, high cholesterol and a stressful, largely exercise-free lifestyle.

I tried the standard approaches: precise calorie counting, fruit for lunch, nothing after 6... all that stuff. No joy. I stumbled upon Atkins and the geek in me was intrigued. I have a soft spot for contrarian conceptual models that put received wisdom through the wood chipper. I decided to give it a go although I was very skeptical. Especially when I read the sentence in the book that says something like "Remember to eat regularly. You may forget to eat." Yeah, that will happen...

But it did. I found that I had essentially complete control over hunger pangs. With Atkins, if you are hungry you eat. That's it. It is just that you are very careful *what* you eat. Hunger pangs are not part of the recipe. You control them so that they do not control you.

So, the weight started to come off. I joined a gym and the weight started to fall off. I lost 2.5 stone without a single hunger pang. "This is trivial", one part of me said, while another part of me was thinking "Perhaps my insides are turning to mush. Perhaps my arteries are disintegrating or getting clogged with lumps of cheesy egg roll wrapped in chicken skin?".

I went to the doctor for a checkup and my blood pressure and cholesterol were both significantly better than they were before. Now, as a geek, I'm always wary of conflating correlation with causation. I don't know if it was purely the weight loss that dropped the BP and cholesterol. Maybe the diet was not a part of it. Perhaps the diet was the primary driver of the weight loss? I don't know... and neither, it seems, does medical science.

I have my weight under control now. Every now and then I fall off the wagon and it starts to climb up again - especially now that I live in the epicenter of the processed-carb universe - the USA. Every now and then I drop some more carbs from my intake and the extra weight goes away. No panic. No problem.

Yes, yes I know that I really should get some exercise too but, heck, who's perfect? I'm working on it. Quit nagging! Yes, yes, I know that the scientific/medical community is very much split on this whole low-carb issue and maybe I'm killing myself.

I've seen scientific controversies before and been involved in a few (if you allow "computing" to be classed as a science that is). This one looks very familiar. It smacks of Thomas Kuhn. It also smacks of "Big Carb". Consider me a walking experiment if you like. If the low-carb lifestyle kills me, the lack of activity on my blog will let you know I was wrong and the "fat makes you fat" brigade were right :-)

Saturday, March 14, 2009

Saturday, March 07, 2009

Some time ago, I had some ideas burning a hole in my head. Ideas I knew I had no time to pursue but felt pretty passionate about. My Two Django ideas in need of good homes posting resulted in some meetings and some false starts...but it also resulted in meeting up with three ace Django/Python programmers / "recovering" scientists, based in the fine city of Cambridge, UK. Three of the smartest folks I've ever met to boot.

It took me all of, uh, 20 seconds to explain one of the ideas to them and the result, a mere matter of months later, is http://timetric.com/. It is now in public beta and you might like to check it out.

The idea is simple but has significant consequences in my opinion.

The basic thought is this: What if numbers were first class members of the Web ecosystem?...

What if numeric quantities had their own "home" on the Web? I don't mean a great big slab of numbers, or a database, I mean the actual numeric quantities themselves: the time it takes you to drive to work, the spot price of gold, the average daily rainfall in Tannu Tuva... whatever.

What if the changes to those quantities are recorded through time? You would end up with a time series for each numeric quantity.

What if each time series was a blog, with its own feeds, its own simple web based update mechanism etc?

What if all these numeric blogs lived out in the cloud so that the system can scale to silly numbers and provide very high availability?

What if the entire system provided simple webby APIs so that developers can upload as well as download stuff easily?

And last but definitely not least... what if new time series could be created using spreadsheet-like formulae - and automatically updated when any of the underlying numeric quantities change?
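The what-ifs above can be sketched in a few lines - a toy observer model, emphatically not the Timetric implementation:

```python
class Series:
    """A time series that notifies watchers when a point is appended."""
    def __init__(self):
        self.points = []     # list of (time, value) pairs
        self.watchers = []   # callbacks fired on every append
    def append(self, t, v):
        self.points.append((t, v))
        for watcher in self.watchers:
            watcher()
    def latest(self):
        return self.points[-1][1] if self.points else None

def derive(formula, *sources):
    """A new series defined by a spreadsheet-like formula over other
    series; it recomputes whenever any source series updates."""
    out = Series()
    def recalc():
        vals = [s.latest() for s in sources]
        if None not in vals:
            out.append(len(out.points), formula(*vals))
    for s in sources:
        s.watchers.append(recalc)
    return out

gold = Series()
ounces = Series()
portfolio = derive(lambda price, n: price * n, gold, ounces)
ounces.append(0, 10)
gold.append(0, 950.0)
print(portfolio.latest())  # 9500.0 -- derived automatically
```

Every `append` is, in effect, a new blog post on the number's feed, and the derived series updates itself in line with its formula.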

That would be pretty interesting. Not a database on the web, not a spreadsheet on the Web. Something new and much more interesting on the Web.

It will be fascinating to see what types of applications get built on top of the Timetric platform.

Sunday, March 01, 2009

Partly in jest and partly in complete seriousness, here is a list of things I believe contribute to docheadedness.

A love of language. This can present itself in a love of reading, puns, typography, poetry, concern for the beautiful renderings of the theorems of Ramanujan.

A love of classification. This can present itself in a love of folder structures, meta-models, abstractions that blur distinctions between nouns and verbs. Formalisms like speech act theory, FRBR, HyTime, RDF. Frameworks like ebXML, TEI etc.

Monday, February 23, 2009

Propylon have some vacancies for committed IT professionals to work with me and my team in the USA. We are based out of Lawrence, Kansas (yes, the place Django was invented) and working in an embedded team with the Kansas State Legislature in Topeka, about 20 mins by car from Lawrence.

The opportunities are for docheads. Seriously, you really need to be comfortable working with complex document-centric workflows and automation in order to enjoy this job and you need to enjoy it to be good at it :-)

The use case that sits ill with Postel's Law is the one where I'm sending you the XML message needed to change the parameters of the [nuclear reactor/medical monitor/industrial process/zillion dollar fund transfer exchange] and you get duff XML. Should you be liberal in what you accept?
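Here is a sketch of the strict alternative - the message format is hypothetical: for a safety-critical message, reject duff XML outright and make the sender try again, rather than liberally guessing at intent:

```python
import xml.etree.ElementTree as ET

def accept_command(xml_text):
    """Strictly validate a (hypothetical) control message.
    Returns the parameter value, or None if the message must be refused."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # duff XML: refuse, ask the sender to resend
    if root.tag != "set-param" or "value" not in root.attrib:
        return None  # well-formed, but not the message we expect
    return root.attrib["value"]

print(accept_command('<set-param value="42"/>'))  # 42
print(accept_command('<set-param value="42">'))   # None (unclosed tag)
```

Being conservative in what you accept is the right call exactly when the cost of a wrong guess dwarfs the cost of a retry.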

Friday, February 20, 2009

So, I upgrade my 8.04 (Hardy) on my ThinkPad X61 and get a new kernel, 2.6.24-23. All is fine except the WIFI, which comprehensively does not work. The flashing light under the monitor does not flash. The hardware pessimist in me thinks hardware failure. The "all upgrades have consequences" Karmic causal correlationist in me blames the kernel upgrade.

Tuesday, January 27, 2009

"There are many aspects of the IT business that have been turned upside down by the internet. One concept which is effectively being rendered meaningless is the concept of a "version" in the sense of an application software package version number/name or an operating system version number/name." -- http://www.itworld.com/offbeat/60990/what-version-are-you-running?

Monday, January 19, 2009

"The cold winds of recessionary pressures were blowing around Pentementi Mountain as the two technologists made their way to Master Foo's cave near the summit. The WIFI signals had long disappeared from their netbook gadgets but text messaging still worked on their cellphones. Contact with the valley below helped them feel connected to the Twenty First, or was that the Twentieth?, Century." -- Master Foo chews on a fork

Tuesday, January 13, 2009

"Is it really sensible to try to soft code everywhere when, whether you like it or not, the host environment is going to make its own hard/soft configuration changes?" -- Hard wired, soft coded, confused

Tuesday, January 06, 2009

Booting from something you carry around in your pocket. There is a trend there I think. A trend that could take the "personal" out of the PC box and into your pocket which, arguably, is where it belongs in this day and age : From Personal Computer to Impersonal Chameleon.