Changing file-formats is far from free. It fractures your user base as your users find they cannot share their documents with their friends. It makes it difficult for you to regression test your code. And it dissuades third parties from supporting your file-format, because you’re giving them more work each time you change. Overall changing file-formats should be avoided where possible. How then can you extend your application?

Future proof your file-formats!

I try to keep my file-format forwards compatible: the same file should work with older versions of my software. This makes it easier to find bugs that I may inadvertently have added: I can regression test against an older version of the software.

I use a file-format that lets the application store, but not use, the fields it does not understand. There are many such solutions: SQLite databases, trees, XML, Python pickles… The reason to store fields one does not understand is that one can save them back out later.
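As a rough illustration of storing fields without using them, here is a minimal Python sketch. It assumes a JSON file with a flat set of fields, which is not one of the formats listed above, and the Document class, the KNOWN_FIELDS set and the "highlight" field are all hypothetical names made up for the example.

```python
import json

# Fields this (hypothetical) version of the application understands.
KNOWN_FIELDS = {"title", "body"}


class Document:
    def __init__(self, title="", body="", extras=None):
        self.title = title
        self.body = body
        # Fields written by some other version: stored, never interpreted.
        self.extras = dict(extras or {})

    @classmethod
    def loads(cls, text):
        raw = json.loads(text)
        extras = {k: v for k, v in raw.items() if k not in KNOWN_FIELDS}
        return cls(raw.get("title", ""), raw.get("body", ""), extras)

    def dumps(self):
        out = dict(self.extras)  # unknown fields go back out untouched
        out["title"] = self.title
        out["body"] = self.body
        return json.dumps(out, sort_keys=True)


# A file saved by a newer version that added a "highlight" field.
newer_file = '{"title": "Notes", "body": "hello", "highlight": "yellow"}'

doc = Document.loads(newer_file)
doc.body = "hello world"          # edit with the older application
saved = json.loads(doc.dumps())   # "highlight" is still in the saved file
```

The older application never looks inside extras, so it cannot break what it does not understand. It simply hands those fields back when saving.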

For example, consider an HTML document. Even if your HTML editor does not know what the “blink” tag means, it can remember the fact that there is a “blink” tag around the text over there. Even if you edit the text within the blink tag, it can make reasonable guesses about grouping. Of course if you delete the text and type it back in, the blink tag will be gone. But most of the time you don’t just lose your formatting because a colleague used a different tool to edit your file. The key idea here is that you can use an extensible hierarchical data format to keep things associated with their attributes.
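The blink example can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration under simplifying assumptions: the input is well-formed XML rather than real-world HTML, and edit_text is a hypothetical helper standing in for whatever editing the application does.

```python
import xml.etree.ElementTree as ET


def edit_text(root, old, new):
    """Replace text everywhere in the tree without inspecting tag names."""
    for element in root.iter():
        if element.text:
            element.text = element.text.replace(old, new)
        if element.tail:
            element.tail = element.tail.replace(old, new)


# The "editor" below has no idea what <blink> means.
source = "<p>Read <blink>this warning</blink> first.</p>"
root = ET.fromstring(source)

edit_text(root, "warning", "notice")

result = ET.tostring(root, encoding="unicode")
print(result)  # <p>Read <blink>this notice</blink> first.</p>
```

The text inside the unknown tag was edited, yet the tag itself survived the round trip because the tree keeps the text attached to its element.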

Keep the conversion code separate!

Sometimes, to simplify the main application’s code, it does make sense to change file-formats. In that case, write a bidirectional converter. A version field in the file-format specifies (by naming convention) the converter to use, so that the application, when encountering a file of a different file-format, can figure out the chain of conversions it must apply to get something it understands.
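One possible shape for the version field and the converter chain, sketched in Python. Everything here is an assumption made for illustration: the integer "version" field, the vN_to_vM naming convention, and the rename of "name" to "title" between versions. In a real application the converters would more likely be separate tools found on disk than functions in one module.

```python
def v1_to_v2(doc):
    """Upgrade: hypothetical v2 renamed the "name" field to "title"."""
    out = dict(doc)
    out["title"] = out.pop("name")
    out["version"] = 2
    return out


def v2_to_v1(doc):
    """Downgrade: the exact inverse, so no field is thrown away."""
    out = dict(doc)
    out["name"] = out.pop("title")
    out["version"] = 1
    return out


# Converters are looked up by name, following the vN_to_vM convention.
CONVERTERS = {f.__name__: f for f in (v1_to_v2, v2_to_v1)}


def convert(doc, target):
    """Apply one-step converters until doc reaches the target version."""
    while doc["version"] != target:
        current = doc["version"]
        step = current + 1 if current < target else current - 1
        name = f"v{current}_to_v{step}"
        if name not in CONVERTERS:
            raise ValueError(f"no converter named {name}")
        doc = CONVERTERS[name](doc)
    return doc


# A v2 file with a field ("color") that neither converter knows about.
new_file = {"version": 2, "title": "Notes", "color": "red"}
old_file = convert(new_file, 1)      # what a v1 application would load
round_trip = convert(old_file, 2)    # and back again
```

Because each converter copies the whole document and only touches the fields it knows, the unknown "color" field travels through both directions unchanged.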

Now if you downgrade, you’re in almost the same position as your friend who never upgraded: your application can’t read the new format files. There is however a difference: as part of the upgrade process a bidirectional converter was added. It does not need to be removed on a downgrade. Because the older version of the application knows how to figure out which converters to call, it will just find them and call them.

Note that because the original file-format is extensible, it’s possible for the bi-directional converter not to throw away any data. Similarly because the oldest version of the app stores the stuff it does not understand and saves it back out again, changes made by a co-worker with an older version of the software should cause little data loss.

Additional benefits of independent converters

Because you have independent converters you can earn bonus points with people who have not yet been able to upgrade. You can give or sell them the tool to convert the files. The tool can even advertise the features they’d get by upgrading.

You benefit because your code is simpler. The application code is simpler because it only knows its native file-format, the converter’s code is simpler because it’s focussed on one task. You also benefit because it’s easier to require a new base OS for your next version without fracturing your user-base.

Of course writing a converter is a pain, and a bidirectional one even more of a pain, but as I said in the beginning, changing file formats should be avoided. Writing a bidirectional converter forces you to consider more cases, leading to a better upgrade converter. Furthermore you can reduce the pain for third parties by giving or licensing your converter to them.

What about standard file formats?

I think the same idea of independent converters can be applied, although it may not be possible to keep the additional information known to later versions of the file-format in earlier versions of the file format.

Like the folks at TidBits, I found it slowed down my computer significantly when indexing my drive. However one can turn it off using the System Preferences panel it installs. That way I can let it index stuff at night.

Press Command twice and a search panel shows up, which will show the first 10 results. To see more, your browser will be opened to display the results as a page that looks like Google’s generic search page, so it’s running a small web server.

It runs as root, and does not respect your update statistics settings

Google Desktop installs itself as root: the index is at /Library/Google/Google Desktop/Index/(some directory which only root can access). This means it can access anything on your machine and do anything it likes. It doesn’t need to and on a first date, I don’t trust anything that much. Every user on the machine will have their content indexed, even if they don’t agree. You could say that Spotlight also runs as root, but people using an operating system written by Apple do have to trust Apple.

Even more bothersome: I told it not to upload statistics to Google. Their Privacy Policy says:

If you choose to enable Usage Statistics on Google Desktop, it allows Google Desktop to send crash reports and to collect a limited amount of non-personal information from your computer and send it to Google. This includes summary information, such as the number of searches you do and the time it takes for you to see your results, and application reports we’ll use to make the program better.

Well I didn’t, but Little Snitch tells me that a program called StatsUploader wants to talk to dc-in-f99.google.com every 30-odd minutes or so. I happen to trust Little Snitch as I used it to help me make sure that Find It! Keep It! wasn’t loading anything from the Internet (unlike most other “internet page saving solutions”, such as those that use WebArchives).

It silently installs an Input Manager

Find It! Keep It! crashed, and the crash started neither Apple’s CrashReporter nor my built in CrashReporter which is extremely odd. Given my past bad experience with Input Managers, I used Find It! Keep It!’s input manager panel to see whether I had acquired a new one. Indeed I had. It lurks in /Library/InputManagers/GoogleModLoader.

Now this bothers me. I did NOT agree to have an InputManager installed. InputManagers in /Library/InputManagers are loaded into EVERY application running on the computer for every user. So what the #!$! does it do? Simply running
cd /Library/InputManagers/GoogleModLoader/
strings GoogleModLoader.bundle/Contents/MacOS/GoogleModLoader
in the Terminal tells us that it loads modules.

Further investigation using OTX shows that indeed it crawls a Google/Mods directory and loads modifier bundles into the applications specified by the key GoogleModTargetApplications in some dictionary somewhere. It also appears to do a fair amount of stderr, debugging, pthread and system logging.

If you attach gdb to a running copy of Safari and type info sharedl, you can see that SafariSearchResults.gmod and SafariWebHistory.gmod from /Library/Application Support/Google/Mods/ are now loaded. One thing they do is to add a new item to your Google searches: “About 34 results stored on your computer”. I’m guessing that SafariWebHistory allows pages you just visited to be found with Google Desktop.

Nevertheless, Input Managers should not be installed silently. They can easily cause system instabilities and this particular mechanism could be diverted by third parties to install unauthorized gmods in a place no one knows about: a big security risk. Given the furore over Unsanity’s Smart Crash Reporter, I’m surprised Google installs this. It’s not like anybody worries about Unsanity’s secret plans of world domination.

It also installs a Kernel Extension

Again, kernel extensions aren’t something that should be installed silently as they could very easily impact the system’s stability.
For instance, it includes the nice message “socred_fini() failed, which is a known bug with Apple’s socket filters. Sorry but you have to reboot”, as shown by:
cd /Library/Google/Google\ Desktop/GoogleDesktopDaemon.bundle/Contents/Resources
sudo strings GDFSNotifications.kext/Contents/MacOS/GDFSNotifications
I have no idea what it’s doing with the sockets, but a guess would be that they might need something like that to inform Google Desktop when a file changes to reindex it or for their snapshot capability.

Conclusion

I’m disappointed. I was going to look into Google’s open API to speed up searching the Find It! Keep It! Database for those users using Google Desktop. I think I’ll wait.

Hopefully future versions of Google Desktop will respect user preferences, clearly request the right to install any Input Managers and allow paranoid people like me to give it limited permissions (e.g. a single user’s permissions). Alternatively they could release its source code, as they have done with MacFUSE, so that we know what it’s doing. In the meantime, I’m uninstalling it.

LLVM is being added to the OpenGL stack. Right now the feature-set of programs using OpenGL is limited by the machine they run on. Because the Mac has a smaller market size, it is not economical to develop software that relies on more recent capabilities of 3D hardware. Although the Mac’s OpenGL drivers apparently already try to fill in some of the gaps, LLVM should extend the range of supported features because it’s a full blown compiler rather than a set of optimized canned routines.

I find this another interesting example of Apple leveraging open source software to improve its proprietary OS: they employed LLVM’s main author, Chris Lattner, whose PhD topic was LLVM. This may end up being one of Leopard’s most important features: it will make CoreAnimation and CoreVideo available to many more Macs. It continues Mac OS X’s counter-tradition of each new release being faster than the previous one.

In other news, Microsoft has decided to drop VB from the Mac version of Office. They’re open to suggestions. I would have thought the only point of Office on the Mac is to be 100% compatible with Office on the PC. It’s odd that they’re not just migrating to x86 for future versions of Office: Compile the Windows VBA engine with MASM, munge the binary to link into gcc. Yes the ABI isn’t compatible, but that only affects the input/output edge of the engine. I’m surprised that’s a concern to a company that invented thunking…

Having coded and debugged on both platforms (Windows previously at Cyrix/National Semiconductor/AMD), I think the problem lies deeper:

Backwards compatibility: Microsoft avoids breaking old software. This is difficult because the APIs were poorly thought out to begin with, so programmers worked around them relying on the underlying implementation.

Lack of a hardware platform: a tongue-in-cheek comment I’ve often heard from engineers at hardware manufacturers is: “if the driver/chip has a bug, who’ll notice it? It’s just another blue screen of death!” Support for multiple platforms makes Microsoft’s task many times more difficult than Apple’s.

Windows is too big and keeps growing. Windows Vista is allegedly 60 million lines of code, all of them maintained by Microsoft.

Microsoft can’t ship PCs for anti-trust reasons. It could however reduce its backwards compatibility issues by using the second core of x86 processors to run a virtual PC containing a copy of whatever old OS was needed to run some old software. Instead the second core improves the user’s perception of performance by running his malware. Similarly Microsoft could offload some less valuable features by adopting open source solutions: I don’t believe they’ll do this because it would be a very painful institutional reversal, and they have the money to avoid it.

Apple avoids these issues:

Apple doesn’t really mind breaking compatibility: they provide a bridge for a while to help their customers, but then drop it (68000 emulation, Classic environment, Rosetta) so they can move on.

Apple does not support third party platforms: it works on their computers, and that’s it. It discourages the use of third party extensions such as haxies or plugins.

Much of Mac OS X is open sourced. That means Apple has a much larger pool of developers than simply those they employ.

Apple sells wonderful packaging. It avoids much of the invisible grunt work, and it gets to sell something that is much better than the sum of the pieces that went into it. This is a very good business model: reduced costs, better value.

I personally would have preferred the time Apple used at WWDC to bash Microsoft to have been spent on more Leopard details.

Daniel’s new article contains some interesting tidbits, such as the fact BeOS developers now work at Apple.

A Sun developer has posted more information on DTrace, which is now included with Leopard. DTrace is an awesome tool for debugging system issues. Many example scripts already exist, and should apparently work as is. Even better: Apple has added hooks to support Objective-C debugging. This should be very helpful towards understanding what is sometimes a very black box.