XML Technologies

I'm working on discussion forums for NearbyGamers and I'm building the first feeds into the site. I worked up a clean way to add them from my controllers similar to my tidy stylesheets code. Here's how to do it.

In the <head> of your app/views/layouts/application.rhtml call the auto_discovery_link_tag to print the tags:

Last week, I was talking about Live Clipboard, and how it was well worth investigating. Now I’m taking a closer look at the technical introduction, to see how it works, and find out how easy it is to add to a page.

This week, Microsoft’s Ray Ozzie presented his Live Clipboard concept at eTech. I saw links to the announcement everywhere I looked for a day or so. At first, I wondered what all the fuss was about, and if the muted reaction on the BoS forum is anything to go by, I might not have been the only one.

CodeSnipers has been growing pleasantly over the past few months and I believe it's reaching a critical tipping point. After the holidays, there are going to be a series of announcements about new functionality, areas, and ideas that will be coming about in this little community. If all goes well, some of them may happen when they're announced.

Anyway, towards this goal, I'm looking for a few brave people to share their thoughts, bare their souls, and generally start a discussion with the community. I'm not looking for just any people, but people with particular skills:

First, I need a .htaccess/mod_rewrite wizard. I'm not looking for any work to be done, but just to get some questions answered and bounce some opinions around.

With Ajax being all the rage, people have been forgetting about client applications. Well, there are times when you’re on a plane or in one of those horrible corners of the world (like your local Starbucks) that don’t have free, accessible WiFi and you’re forced to work with what you have locally. One of the things I love about Ajax apps is the fact that they’re cross-platform and deploy instantly. I’d like to get to the same point with thick client applications and it looks like Mozilla may be the platform of choice for the future.

At this point I’ve only done the bare minimum of research into the subject, but I’d like to share what I’ve found and start a discussion on the topic. I’ve found that the main tools in Mozilla for developing applications are XUL (XML User Interface Language) and XPCOM (Cross-platform Component Object Model). XULRunner seems to be the framework that encapsulates these goodies for creating client applications. XULRunner also provides functionality for networking, file access and some other stuff. It seems that upcoming versions of Firefox will be installed with XULRunner, so any system with the latest Firefox installed will have the framework available. XULRunner can also be installed without Firefox so applications built on the framework don’t have a Firefox deployment requirement.

The author is completely correct that importing old data is normally considered a last step in the process of implementing a new project. Most developers love to start with a fresh, clean codebase, whiteboard, database, etc. and build their projects from the ground up. It is a wonderful feeling starting with a blank slate and actually making something from nothing. Generally, it is much less satisfying to take a (mostly) functional codebase, learn about it, dig through its oddities, and expand or fix the features. I've talked about this tendency before in Scrapping It All vs A Salvage Operation, but I thought it needed some expansion.

With the explosion of international text resources brought by the Internet, the standards for determining file encodings have become more important. This is my attempt at making the text file encoding issues digestible by leaving out some of the unimportant anecdotal stuff. I'm also calling attention to blunders in the MSDN docs.

For Unicode files, the BOM ("Byte Order Mark", also called the signature or preamble) is a sequence of two to four bytes at the beginning of the file that indicates which Unicode encoding is in use. The key to the BOM is that it is generally not included with the content of the file when the file's text is loaded into memory, but it may be used to affect how the file is loaded into memory. Here are the most important BOMs and the encodings they indicate:
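As a rough sketch of how that might look in code, the snippet below checks for the five standard BOMs using the constants in Python's codecs module. The function names detect_bom and decode_text are made up for illustration, and falling back to UTF-8 when no BOM is found is an assumption, not something the standards require.

```python
import codecs

# Standard Unicode byte order marks. The UTF-32 marks are checked first
# because the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE
# mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),   # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),   # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),           # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),   # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),   # FE FF
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_text(data: bytes, fallback: str = "utf-8") -> str:
    """Decode file bytes, using the BOM to pick the encoding and then dropping it."""
    encoding, skip = detect_bom(data)
    # The BOM selects the decoder but is not part of the text itself.
    return data[skip:].decode(encoding or fallback)
```

Reading a file in binary mode and passing the bytes to decode_text gives back the text without the BOM, which matches the point above: the mark affects how the file is loaded but is not part of the content.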

Over the years, I've worked with a number of systems which aggregate data in some way. These have included simple RSS aggregators, the importation of multiple XML formats, and even interacting with multiple bug tracking systems. Regardless of the domain, there is a simple concept here: there is a particular set of actions we wish to perform on the data, but quite often the data sources have a variety of data structures.

For some reason, most people immediately jump to the "simple" solution of building an inheritance hierarchy, with a distinct class beneath the base for each specific data structure. This can work fine for quite a few situations and ends up being quite elegant in patterns such as the Data Access Object and others where you can completely encapsulate the data structure and simply forget about it. Unfortunately, this doesn't work in all scenarios.

For example, I'm involved in a large-scale Java system where we are retrieving XML data structures from a series of different sources (think aggregation). When we started, there were twelve different sources in two different formats and the solution looked simple: we would import each of the structures and be done with it.

Since I've been bitten by this before, I proposed a different solution. Instead of doing the import in a single step, we would use a Two-Step Pattern to first convert each data structure into our custom structure and simply import that. We've been able to implement this by building a single importer class and creating a new XML-to-XML XSL transform for each new data source.
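To make the shape of the Two-Step Pattern concrete, here is a minimal sketch in Python. It swaps the XSL transforms for plain converter functions, and both source formats, the canonical title/link record, and every function name are hypothetical, invented only for illustration. The real system described above is Java with XML-to-XML XSL transforms.

```python
import xml.etree.ElementTree as ET


# Step 1: one small converter per source format. Each one produces the
# same canonical structure (a list of dicts). Both formats are hypothetical.
def convert_format_a(xml_text):
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def convert_format_b(xml_text):
    root = ET.fromstring(xml_text)
    return [
        {"title": entry.get("name"), "link": entry.get("href")}
        for entry in root.iter("entry")
    ]


# Supporting a new source format means adding one entry here.
CONVERTERS = {"a": convert_format_a, "b": convert_format_b}


# Step 2: a single importer that only understands the canonical structure.
def import_records(records, store):
    for record in records:
        store.append(record)
    return store


def import_source(format_name, xml_text, store):
    """Convert the source into canonical records, then run the one importer."""
    records = CONVERTERS[format_name](xml_text)
    return import_records(records, store)
```

The importer never sees a source format directly, so a new format costs one converter and no changes to the import logic.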

Just in time, too: within 2 months, we were up to 7 different formats from 237 different sources...