YQL XML open table returns no data when strange characters are in the XML

When I query the Google weather API for a city in France, the returned XML very often contains strange (non-ASCII) characters, e.g. in “Avignon, Provence-Alpes-Côte d’Azur”. When I request this XML through the YQL XML open table, nothing is returned!
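I have not confirmed the cause, but one possible explanation (only a guess) is an encoding mismatch: the response bytes are in ISO-8859-1 while the consumer parses them as UTF-8. The sketch below reproduces that failure mode with Python’s standard XML parser. The `<weather>`/`<city>` document is a made-up stand-in, not the real Google weather format.

```python
import xml.etree.ElementTree as ET

city = "Avignon, Provence-Alpes-Côte d'Azur"

# Hypothetical response body; element and attribute names are illustrative only.
xml_text = '<weather><city data="%s"/></weather>' % city

# Assumption: the server sends ISO-8859-1 bytes.
latin1_bytes = xml_text.encode("iso-8859-1")


def try_parse(raw):
    """Return the city name, or None if the bytes are not well-formed XML."""
    try:
        return ET.fromstring(raw).find("city").get("data")
    except ET.ParseError:
        return None


# Without an encoding declaration the parser assumes UTF-8 and rejects the 'ô' byte.
undeclared = try_parse(latin1_bytes)

# With a matching declaration the same bytes parse fine.
declared = try_parse(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes)

print("without declaration:", undeclared)
print("with declaration:", declared)
```

If this is what happens, fetching the bytes yourself and re-encoding them before handing them on would be one workaround.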

If you are curious about how the Google Data APIs work at the basic level using XML and HTTP, you can read the Protocol Guide. This guide details the requests and responses that the Data API servers expect and return. To learn more about the structure of these requests and responses, read the Reference Guide. This guide defines the API’s feed types, HTTP request parameters, HTTP response codes, and XML elements.

To make working with the API easier, we have a number of client libraries that abstract the API into a language-specific object model. There are Developer’s Guides for Java, .NET, PHP, and Python as well as sample code.

So I guess I’m left with these alternatives:

building a small Java application to create the table

start fiddling about with http console or curl-like tools to POST the request manually

TemplateDemo – creating a first project (targeting Spreadsheet rather than Documents)

Got this error: “java.lang.NoClassDefFoundError: com/google/common/collect/Maps”, because I also need the Google Collections (a.k.a. Guava). Installation is a matter of unzipping and adding guava-r06.jar and guava-src-r06.zip to the project’s libraries.

Now the program runs and gives me a list of my spreadsheets:

Getting Spreadsheet entries…

Terminology liturgical days

spiegelserre

krantenwaaier

Lexfeed defects

Kindermissaal

klapluik

multigrade

Total Entries: 7

A few modifications of the example in “create a table” and some browsing through the client library’s javadoc.

To change the number of rows (if defined as ‘0’, there are no rows – obviously?), I have to remove the table if it already exists (see bold above). When retrieving the table feed, I must supply a table number, which seems to increment when tables are deleted and added. So now I can retrieve the data from: https://spreadsheets.google.com/feeds/0Au659FdpCliwdG44Q2htMWJEQUxVQ3NfRlZUdlZaalE/records/1

But damn me! The records are still printed like “fr: kerstmis, nl: noel, en: christmas”…

The column identifier is the column index (‘A’, ‘B’, …) and not the label. That’s a pity. Could be solved by having another table that makes the relationship between the language names and the column indices, but that’s not very clean.
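The lookup between language names and column indices could be sketched as below. This is plain Python rather than the Google client library, and the header list and the record are made-up sample data.

```python
def column_letter(index):
    """Convert a zero-based column index to a spreadsheet letter (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


# Hypothetical header row: the labels in the order the columns appear.
header = ["fr", "nl", "en"]
label_to_column = {label: column_letter(i) for i, label in enumerate(header)}

# Hypothetical record keyed by column letter, as the table feed identifies columns.
record = {"A": "noel", "B": "kerstmis", "C": "christmas"}


def value_for(label):
    """Look up a record value by language label instead of by column letter."""
    return record[label_to_column[label]]


print(label_to_column)
print(value_for("nl"))
```

The mapping still has to be kept in sync with the sheet by hand, which is exactly why it is not very clean.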

It’s fairly simple to mimic (some of) the behaviour of Dapper using these two tools. I’m into this because I bumped into a Dapper disadvantage (well, in most cases it’s an advantage, but not in my particular case): it reduces the captured HTML fragments to plain text, which is good for a simple RSS update, but bad if you want to retain formatting in captured text (e.g. italics may be important and relevant for reflecting the content of a text).

Procedure:

Install the SelectorGadget in your browser (it’s a bookmarklet, so you can simply drag it into your bookmarks)

Open the webpage you want to retrieve data from (or at least a very similar webpage)

For more complex selections, SelectorGadget returns stuff like this:
:nth-child(3) , .article-meta
It looks like the YQL table can’t cope with this (probably because :nth-child() comes from a newer CSS standard, CSS3, than the one the YQL table supports).
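A possible workaround is to split the selector group and set aside the parts that use the newer pseudo-classes before handing the rest to YQL. The sketch below does that. The list of pseudo-classes is my own assumption about what the YQL table rejects, and the split is naive (it would break on commas inside `:not(...)`).

```python
import re

# Pseudo-classes from the CSS3 selectors module. Assumption: these are the
# parts the YQL table cannot handle; this has not been verified.
CSS3_PSEUDO = re.compile(r":(nth-child|nth-of-type|nth-last-child|last-child|not)\b")


def split_selector_group(selector):
    """Split a comma-separated selector group into trimmed parts."""
    return [part.strip() for part in selector.split(",") if part.strip()]


def partition_by_support(selector):
    """Return (simple, suspect): parts without and with CSS3 pseudo-classes."""
    simple, suspect = [], []
    for part in split_selector_group(selector):
        if CSS3_PSEUDO.search(part):
            suspect.append(part)
        else:
            simple.append(part)
    return simple, suspect


simple, suspect = partition_by_support(":nth-child(3) , .article-meta")
print("try in YQL:", ", ".join(simple))
print("rewrite by hand:", ", ".join(suspect))
```

Dropping a part changes what gets selected, so the suspect parts still need an equivalent written by hand.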

Dapper can do lots more, like capturing multiple fields, grouping, rendering RSS and other output formats,…

Problem

I spent a couple of days figuring out some weird behaviour of Serna. When trying to develop my own plugin, it always rendered my content in editable text fields, rather than word-processor-like inline editable text. Looking at the FO-tree dump, it appeared to be rendered using the serna-extensions.

Strip-down

I decided to strip down the resume project, up to the level at which my own plugin was built up, to see where the text fields would show up.

The trigger for getting the editable text fields is removing this line from the source resume XML:

It’s not a bug per se. That final decision rests with the folks at Syntext. Serna has the capability to support entities (which XML Schema does not have). In order to get that capability, Serna uses an in-line DOCTYPE.

That said, to process those documents in the Toolkit one needs to do one of the following:

a) remove the DOCTYPE from the XML docs; you get XSD validation

b) remove xmlns:xsi namespace and xsi:noNamespaceSchemaLocation attribute from the XML docs; you get DTD validation

c) modify the DTDs in the Toolkit to include xmlns:xsi namespace and xsi:noNamespaceSchemaLocation attribute; you get DTD validation

Changing the XML parser parameters will not do anything useful since it is dependent on what appears in the XML docs. The easiest one to do at this point would be to modify the DTDs in the Toolkit. That would allow you to process the documents without having to modify each one of them.
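If you would rather modify the documents after all, options (a) and (b) can be scripted. The sketch below works on the raw text with regular expressions, which is crude but leaves the rest of the file untouched. The sample document and the file names resume.dtd and resume.xsd are made up, and the DOCTYPE pattern only copes with a simple internal subset.

```python
import re

# Hypothetical document carrying both a DOCTYPE and an XSD hint.
SAMPLE = (
    '<!DOCTYPE resume SYSTEM "resume.dtd">\n'
    '<resume xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xsi:noNamespaceSchemaLocation="resume.xsd">\n'
    "  <name>Jane Doe</name>\n"
    "</resume>"
)


def strip_doctype(xml_text):
    """Option (a): drop the DOCTYPE so that only XSD validation applies."""
    pattern = re.compile(r"<!DOCTYPE[^\[>]*(\[.*?\])?\s*>\s*", re.DOTALL)
    return pattern.sub("", xml_text, count=1)


def strip_xsi_schema_hint(xml_text):
    """Option (b): drop the xsi namespace and schema hint so that only DTD validation applies."""
    xml_text = re.sub(r'\s+xmlns:xsi="[^"]*"', "", xml_text)
    return re.sub(r'\s+xsi:noNamespaceSchemaLocation="[^"]*"', "", xml_text)


without_doctype = strip_doctype(SAMPLE)
without_xsi = strip_xsi_schema_hint(SAMPLE)
print(without_doctype)
print(without_xsi)
```

Keep in mind that dropping the DOCTYPE also drops any entities it declares, which is the very capability Serna uses it for.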

Further stripping down of the resume example shows how the DTD declares a FIXED xmlns attribute on the top-level element. Setting or unsetting this attribute causes Serna to behave differently regarding the presentation of the element text in WYSIWYG view. After removing the DTD altogether, the xmlns attribute doesn’t seem to be needed at all.

Bare-bone Serna plug-in

This is the complete specification of a bare-bone Serna plug-in that can be used for displaying, editing and saving xml-files using a custom stylesheet.

Example XML

Conclusion

I seem to have solved my problem, but I still don’t understand why Serna behaves the way it does. Anyway, lesson learned (once again): if a stylesheet is behaving strangely, 9 times out of 10 a namespace is in your way!

[Update!] Conclusion 2

I must have been completely deceived… Now it seems more likely that the element definition in the XSD controls the behaviour of Serna:
This definition renders text fields: