Posted by timothy on Wednesday April 28, 2010 @02:17PM from the wouldn't-a-lossy-format-make-more-sense? dept.

@10u8 writes "The Vatican Library plans to digitize 80,000 manuscripts and store them in the open data format FITS, originally developed for astronomy and maintained under the IAU. The result is expected to be 40 million pages and 45 petabytes. FITS was chosen because it 'has been used for more than 40 years for the conservation of data concerning spatial missions and, in the past decade, in astrophysics and nuclear medicine. It permits the conservation of images with neither technical nor financial problems in the future, since it is systematically updated by the international scientific community.'"

Not really. Nowhere in TFA does it mention these records being available to the general public, let alone free to download over the net. Just because they are digitizing the archives for some safety/redundancy does NOT mean that the church is suddenly backtracking and opening the archives up to everyone.

We must have read different articles; the second link, to the British Library, is confusing if what you say is true:

I am particularly interested in the business model that the Vatican Library will adopt in making these manuscripts digitally accessible. In particular, I am thinking of the manuscripts that are held across institutions and the potential for aggregating them (or even 'virtually re-uniting' them) in Virtual Research Environments.

While not free, it sounds to me like they want to make them more available and make a little cash on the side too. Either way, they will use the internet not only to spread these manuscripts but also to make money. Still a bit two-faced, wouldn't you say? Although it's not the utmost in transparency, it's still more open than keeping them locked underneath the Vatican where only the most holy scholars on site can read them.

DjVu is a format intended specifically for document distribution that typically uses lossy compression to obtain small files. It's not nearly as flexible as FITS, so you can't use it to represent hyperspectral images, metadata, etc.

Since the Vatican wants a format for data archival, they probably want to preserve as much information as possible for a wide variety of documents, so they can keep the originals in a vault and not touch them for the next 100 years.

No, they read "Catholic Church" and think "pedophile", for the same reason one would read "Christian Conservative" and think "cruising for gay hookers."

It's just the way the human brain works: things that are found together with relatively high frequency, like Catholic priests and child abuse, or Christian "Conservatives" and unseemly acts in public restrooms, tend to conjure each other up.

It might not be around as long as FITS, but isn't DjVu more suited for the digitization of manuscripts?

I don't know DjVu, but I'm an astronomer and I've worked with FITS a lot. It's actually a very simple data format. There's a header with all the document metadata, followed by the binary data. The metadata has a few standard [required] keywords, but as long as it's formatted correctly, you can add any header fields you like. The data is stored as an uncompressed binary array (unsigned char, short, int, long, float, or double types are supported). It's about as non-proprietary and flexible a format as you could ask for. The only downside is that the files are normally uncompressed, so they can be big. On the other hand, you can always gzip them after the fact, so it's not as big a limitation as it might seem.

In short, FITS is a pretty good format to choose if your goal is to make digital copies that will still be readable 100 years from now.
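To make the header-plus-binary-data description concrete, here's a rough sketch in Python (standard library only) of a minimal FITS primary HDU: 80-character ASCII header cards, the required SIMPLE/BITPIX/NAXIS keywords, one extra keyword, an END card, then big-endian 16-bit pixels, with header and data each padded to 2880-byte blocks. It's a hand-rolled illustration under those assumptions, not a fully conforming writer; the function names `card`, `format_value` and `build_fits` and the sample `OBJECT` value are hypothetical, and for real work you'd more likely use a maintained library such as astropy.io.fits. The last lines gzip the result to show that compressing after the fact loses nothing.

```python
import gzip
import struct

BLOCK = 2880  # FITS header and data are each padded to multiples of 2880 bytes
CARD = 80     # every header "card" is exactly 80 ASCII characters


def format_value(value) -> str:
    """Fixed-format value field (columns 11-30 of a card)."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return f"{'T' if value else 'F':>20}"
    if isinstance(value, int):
        return f"{value:>20}"
    if isinstance(value, str):
        # strings are quoted, embedded quotes doubled, padded to >= 8 chars
        quoted = "'" + value.replace("'", "''").ljust(8) + "'"
        return f"{quoted:<20}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def card(keyword: str, value, comment: str = "") -> str:
    """One 'KEYWORD = value / comment' card, padded to 80 characters."""
    if len(keyword) > 8:
        raise ValueError("FITS keywords are at most 8 characters")
    text = f"{keyword:<8}= {format_value(value)}"
    if comment:
        text += f" / {comment}"
    if len(text) > CARD:
        raise ValueError(f"card longer than {CARD} characters: {text!r}")
    return text.ljust(CARD)


def build_fits(rows, extra=None) -> bytes:
    """Primary HDU holding a 2-D image of signed 16-bit pixels.

    rows: list of equal-length lists of ints (one list per image row).
    extra: optional dict of additional (hypothetical) header keywords.
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    cards = [
        card("SIMPLE", True, "conforms to FITS standard"),
        card("BITPIX", 16, "signed 16-bit integers"),
        card("NAXIS", 2, "two-dimensional image"),
        card("NAXIS1", width, "columns (fastest-varying axis)"),
        card("NAXIS2", height, "rows"),
    ]
    for key, value in (extra or {}).items():
        cards.append(card(key.upper(), value))
    cards.append("END".ljust(CARD))

    header = "".join(cards).encode("ascii")
    header += b" " * (-len(header) % BLOCK)  # pad header with ASCII spaces

    # FITS stores binary data big-endian; struct's ">" selects that byte order.
    data = b"".join(struct.pack(f">{width}h", *row) for row in rows)
    data += b"\x00" * (-len(data) % BLOCK)  # pad data with zero bytes
    return header + data


if __name__ == "__main__":
    image = [[0, 1, 2], [300, -5, 7]]
    blob = build_fits(image, {"OBJECT": "Codex"})
    print(len(blob))  # 5760: one header block + one data block
    packed = gzip.compress(blob)
    assert gzip.decompress(packed) == blob  # gzip round-trip is lossless
    print(len(packed))
```

Because the header is plain ASCII and the pixels are a raw array, anyone with the spec could decode this decades from now, which is the archival point being made above; gzip only shrinks the file and restores it byte for byte.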

No, not really. The most important consideration for the ancient Vatican documents is an exact and accurate replication of the document image. If you have a document fragment from the third century, a proper reading of it may hinge on how a particular letter fragment is reconstructed. To do this work properly, you need as exact a replication of the original as possible. FITS seems designed to do just that; DjVu is not. DjVu works with modern documents and focuses on producing high-quality, readable documents that use minimal resources so they can be made available on the web. In some respects, this kind of imaging is more like digitizing astronomical data than digitizing documents.

Isn't the Vatican one of the more reasonable major religions when it comes to science and technology?

Yes, and it was only in 1992 that they admitted they had made a mistake in forcing Galileo to recant his claim that the Earth goes around the Sun. Yes, Galileo was an ass about how he said it, but that doesn't change the fact that the church opposed the science with real physical and political force. Since this is how a "more reasonable major religion" behaves, I think this is an EXCELLENT argument against "moderate" religion.

Just imagine how silly he's going to feel when he realizes that the church is choosing to use technology which was produced by the same scientific community the church had previously persecuted [wikipedia.org].

Since you point out the persecution of the scientific community by the Church, would you care to give an example from the last 2000 years other than Galileo?

It might interest you to know that the Vatican Observatory [wikipedia.org] is one of the oldest astronomical institutes in the world.

The Institute of Physics's magazine Physics World did an article on his trial last year (IIRC; it may have been earlier). He was tried for heresy, but the reason he was tried was not his heliocentric theory but rather that he had insulted the Pope (who had been interested in his theories) over an unrelated, somewhat political matter instead of answering his questions. IOW, he was condemned not for arguing against the church but for publicly insulting the man with the power to have him killed, which is generally regarded as a bad idea.