Finally something really interesting to talk about. If you've used UNIX or any of its derivatives, you've probably wondered why there's /bin, /sbin, /usr/bin, /usr/sbin in the file system. You may even have a rationalisation for the existence of each and every one of these directories. The thing is, though - all these rationalisations were thought up after these directories were created. As it turns out, the real reasoning is pretty damn straightforward.

The case in question is metadata: the tool 'extract' writes metadata in a set structure to stdout. With that, find, and grep, you could write a "query" that finds all JPEGs taken with a certain camera. Of course, this is a sequential search, and an indexed one would be better. Plenty of programs build custom indexes/databases of metadata for this kind of purpose, but you don't want to index everything in every way, just in case.
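To make the "query" idea concrete, here is a minimal sketch of that sequential scan, assuming extract-style key/value metadata has already been collected into Python dicts; the file names, camera strings, and the `taken_with` helper are all hypothetical stand-ins:

```python
# Sequential "query" over extract-style metadata: scan every record and
# keep the ones whose camera field matches. All field names here are
# made-up stand-ins for whatever 'extract' actually emits.
photos = [
    {"file": "a.jpg", "camera": "Canon EOS 300D"},
    {"file": "b.jpg", "camera": "Nikon D70"},
    {"file": "c.jpg", "camera": "Canon EOS 300D"},
]

def taken_with(records, camera):
    # O(n) scan - the in-memory equivalent of find | extract | grep.
    return [r["file"] for r in records if r["camera"] == camera]

print(taken_with(photos, "Canon EOS 300D"))  # → ['a.jpg', 'c.jpg']
```

An index (a dict keyed by camera, say) would turn this into a lookup, which is exactly the trade-off the comment describes.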

Nope, it's the right tool for the job. The various development problems we are having today are due, to a large degree, to the lack of databases.

Example please. Windows WMI certainly hasn't convinced me.

These GBs that you speak of would be broken down into their individual parts if stored in a database, and they would be indexable, queryable, and discoverable by any program.

What parts exactly? Not all file formats are like that; many are just blobs. What about a video: once you dismiss the metadata stuff, how are you going to store the stream? Binary blob blocks/chunks, just like a filesystem? On its own, one chunk is likely meaningless and won't have anything in it you can search. Then why bother? It just creates lots of needless vacuuming for no gain.

They would support transactions, and they would allow programs to be notified of changes in the data store. All these capabilities are absent, more or less, from today's data storage systems.

As I said before, metadata. Or maybe media information from something like 'mediainfo'.

It's not metadata. It's data. And it's not only media that have useful data.

Nor should it be. Wrong level to have it.

Yes it should. There should be a standard for it, to allow applications to interoperate at the data level.

Not quite, but it's not really hierarchical either, not with links; it's more of a network. Still, it's easy to argue it is a database of a form.

You can't run queries on a filesystem regarding the data inside the files, and therefore it is not a database.

The case in question is metadata

Nope. The discussion is about information management, not metadata only.

the tool 'extract' writes metadata in a set structure to stdout.

The schema of the information output cannot be queried at runtime.

With that, find, and grep, you could write a "query" that finds all JPEGs taken with a certain camera.

But not all jpegs taken on a certain afternoon, or within a specific time period, or with multiple cameras or users, or many other things.
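The difference is that grep can match a literal camera string, but time windows and multi-camera predicates need structured comparisons. A sketch of what such a query looks like over structured records; the fields, dates, and the `query` helper are hypothetical:

```python
from datetime import datetime

# Hypothetical structured metadata records.
photos = [
    {"file": "a.jpg", "camera": "Canon EOS 300D",
     "taken": datetime(2005, 6, 12, 14, 30)},
    {"file": "b.jpg", "camera": "Nikon D70",
     "taken": datetime(2005, 6, 12, 16, 5)},
    {"file": "c.jpg", "camera": "Canon EOS 300D",
     "taken": datetime(2005, 6, 13, 9, 0)},
]

def query(records, cameras, start, end):
    # Range and set-membership predicates that a plain grep over
    # text output cannot express reliably.
    return [r["file"] for r in records
            if r["camera"] in cameras and start <= r["taken"] <= end]

# All photos from either camera on one afternoon:
afternoon = query(photos,
                  {"Canon EOS 300D", "Nikon D70"},
                  datetime(2005, 6, 12, 12, 0),
                  datetime(2005, 6, 12, 18, 0))
print(afternoon)  # → ['a.jpg', 'b.jpg']
```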

Plenty of programs build custom indexes/databases of metadata for this kind of purpose

If this functionality were supported out of the box, these programs would be redundant.

Example please. Windows WMI certainly hasn't convinced me.

Almost every application has a layer of data input/output from/to files. This layer would be redundant if databases were supported out of the box.

What parts exactly? Not all file formats are like that, many are just blobs

Nope. All file formats have an internal structure; otherwise they could not be read back after they were written.
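Even the formats dismissed as "blobs" begin with a recognizable structure. A small sketch that identifies a format from its magic-number prefix; the signatures are the well-known ones, while the `sniff` helper is mine:

```python
# Every readable format carries at least enough structure to be
# recognized. These magic-number prefixes are the standard ones.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF89a": "GIF",
}

def sniff(first_bytes):
    # Compare the start of the file against known signatures.
    for magic, name in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return name
    return "unknown blob"

print(sniff(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # → PNG
```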

What about a video, once you dismiss the metadata stuff, how are you going to store the stream?

As an array of frames, with each frame being an array of pixels.
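The naive decomposition being proposed can be sketched directly; the dimensions here are tiny and arbitrary, purely for illustration:

```python
# A video as an array of frames, each frame a 2D grid of RGB pixels.
# Dimensions are tiny here just to keep the sketch readable.
WIDTH, HEIGHT, FRAMES = 4, 3, 2

video = [
    [[(0, 0, 0) for _ in range(WIDTH)] for _ in range(HEIGHT)]
    for _ in range(FRAMES)
]

# Each pixel is individually addressable - the property that would
# make the stream "queryable" if it were stored relationally.
video[1][2][3] = (255, 0, 0)
print(len(video), len(video[0]), len(video[0][0]))  # → 2 3 4
```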

On its own, one chunk is likely meaningless and won't have anything in it you can search.

Nope. One can search for a particular scene, employing pattern matching with another picture, or even a hand-drawn one, for example.

Many filesystems do offer transactions

Many, but none of the major ones, as far as I know.

The SQLite site has a doc on how they do it generically

SQLite achieves transactions through various mechanisms, as described in that doc. These mechanisms are built on the functionality provided by filesystems, but the filesystems themselves do not have the concept of a transaction.

In order to enable transactions in another application, one has to reimplement the same SQLite mechanisms.
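A minimal sketch of the rollback-journal idea that the SQLite doc describes, rebuilt from plain filesystem primitives; the function name, file names, and error handling are mine, and a real implementation needs much more care around fsync ordering and crash recovery:

```python
import os
import shutil

def transactional_update(path, new_bytes):
    # Rollback-journal style commit built from ordinary filesystem
    # calls - a rough sketch of the mechanism SQLite documents.
    journal = path + "-journal"
    shutil.copy2(path, journal)        # 1. save the old content
    try:
        with open(path, "wb") as f:    # 2. overwrite in place
            f.write(new_bytes)
            f.flush()
            os.fsync(f.fileno())
        os.remove(journal)             # 3. deleting the journal = commit
    except Exception:
        shutil.copy2(journal, path)    # roll back on any failure
        os.remove(journal)
        raise

with open("demo.db", "wb") as f:
    f.write(b"old")
transactional_update("demo.db", b"new")
print(open("demo.db", "rb").read())  # → b'new'
```

Crash recovery in this scheme is just: if the journal file still exists at startup, copy it back over the database.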

Pretty much every OS has a notification system for file or folder changes.

But these notification mechanisms are not compatible with each other, many OSes don't even have them, they don't work over the internet, and applications cannot be notified about what exactly changed inside a file.
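The portable fallback illustrates the last complaint well: polling a whole-file fingerprint tells you *that* a file changed, but nothing about *which* record or field inside it changed. A sketch, with invented file names:

```python
import hashlib

def fingerprint(path):
    # Whole-file hash: the only truly portable change-detection
    # primitive, and the coarsest possible one.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

path = "watched.txt"
with open(path, "w") as f:
    f.write("version 1")
before = fingerprint(path)

with open(path, "w") as f:
    f.write("version 2")
after = fingerprint(path)

# We can tell THAT the file changed, but not what changed inside it.
print(before != after)  # → True
```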

Um... video isn't like that. You CAN treat video that way, but it is too incomplete. For instance, a video frame is actually a collection of blocks, and each block is compressed. Subsequent blocks use tables generated from the first block for further decoding...

And then there is audio... each audio track associated with a frame is blocked, and there can be many separate audio tracks. And for any video track, there can be multiple alternate video tracks.

Now we get to the additional packaging... The video may be encrypted as well... Some tracks encrypted, some not.

NOT GOOD FOR A DATABASE.

And trying to coerce hundreds of different formats into a database would make the database useless. Especially when most data won't need the complexity.

I have worked with weather simulation data. A single run is 100GB or more (one week at low resolution). Some simulations are in 3D for just a few hours (also 100GB). Searches are made for intersections (weather patterns), yet the pattern is not something that can be described in SQL, which is really, really bad at it.

These searches are closer to what is used in gaming - a 3D mathematical intersection from different points of view...
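The kind of geometric predicate meant here can be sketched as an axis-aligned bounding-box intersection test, a staple of game engines; the box coordinates are invented, and real weather-pattern matching is far more involved, but even this trivial test is awkward to express and slow to evaluate as a SQL query:

```python
# Axis-aligned bounding-box intersection in 3D - a bread-and-butter
# geometric test in games, and the kind of predicate SQL handles badly.
def boxes_intersect(a, b):
    # Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    (ax0, ay0, az0), (ax1, ay1, az1) = a
    (bx0, by0, bz0), (bx1, by1, bz1) = b
    return (ax0 <= bx1 and bx0 <= ax1 and
            ay0 <= by1 and by0 <= ay1 and
            az0 <= bz1 and bz0 <= az1)

storm = ((0, 0, 0), (5, 5, 2))   # hypothetical region of one pattern
front = ((4, 4, 1), (9, 9, 3))   # hypothetical region of another
print(boxes_intersect(storm, front))  # → True
```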

Database queries are just really stupid at that. And really really slow.