Posted by timothy on Friday July 04, 2003 @01:04PM from the read-franklin's-autobiography dept.

David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles.
Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. Real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofers and editing one page per day."

As someone who has been involved with Distributed Proofreaders for the past 18 months: yes, we are serious about having Slashdot people proofread. The last time a story about D.P. ran, in November, thousands of new users joined us and helped us grow and expand to our current size.

Go and check it out, there is great work being done there. (I am a bit biased though). Click here [pgdp.net] for a history of DP.

It probably accounts for Achilles shouting "First Greek" when he lands at Troy's beach and all those descriptions of statues of a girl suspiciously similar to a modern actress in Roman poems. And I was really sure Tom Sawyer's pet was not a penguin.

The thing is, this brings up a somewhat serious point. I've proofread professionally in the past, and I know that it's hard and nobody's perfect at it. An open approach might work with software, because anyone can easily test it and find the bugs in the program. But without a wiki-type format (www.wikipedia.org), who is there to make sure it's proofread properly?
If this is proofread incorrectly and distributed to schools and such, I have to worry about the quality level of the texts students are learning from.

They're inviting those who mock and scorn the bad spellers. Obviously, if you've read enough postings here, there appears to be a 20:1 ratio of bad spellers to good spellers. So there are still some from whom they can extract sufficient proofreading capability.

Now all we need is more people promoting this in schools and printing the books. Much like the IA Bookmobile [archive.org]. It seems like the people who could use this the most, don't even know it exists.

Yes, I can agree with this.
We people here won't benefit from it half as much as needy school districts who could use the texts. Methinks what they really need to do is work on some awareness program, distributing the books to teachers... or even letting them know that such a resource exists. With more technology in the classroom, Gutenberg shouldn't be out of reach for many teachers.

The first Gutenberg books I came across were being passed around BBSs at 2400 bps or so. When they started 32 years ago, 110, maybe 300 bps. Who cares? Check the size of the files, these aren't Word documents, you know.

i am going to be teaching modern civ next year in high school (i have been at the junior high for 7 years), and have already gone to the site and gotten works from aristotle, plato, locke, montesquieu, et al. thanks guys. there is still something to be said for a classical education. glad somebody is doing all they can to preserve the classics, especially with all the assaults on it from the social reconstructionists.

There's really a problem, though, about getting the word out to people, much as the popularity of libraries has been dropping today. A good idea would be a separate advocacy site with lists of texts in the project (i.e. What's New?, Most Popular, etc.) to help people wade in immediately.

It currently has 20,000 FREE titles listed, from hundreds (at least!) of sources, in all subjects, beautifully categorized by title, author and subject--and topped off by an up-to-date what's-new listing and a fine search engine. Much props to John Mark Ockerbloom and the University of Pennsylvania.

Just on a whim, I decided to see how much cheaper titles in Microsoft Reader format were than a physical book.

I went to the MS Reader site and followed the links to the on-line publishers' sites (such as B&N and Amazon). In most cases, the Reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).

So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.

If they want to make the electronic formats more attractive, they need to make them a LOT cheaper than the corresponding paper version.

So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.

Well, I don't use MS-Reader myself (for commercial e-books I like the cross-platform Mobipocket), but a major reason I like e-books is that I like to read them on my PDA -- not to save money. I carry my PDA around anyway, and having e-books means less to carry. I would purchase all my books as e-books if I could.

Someone else mentioned the fact that he's got a reader with him all the time anyway, which makes it pretty convenient to have a book or three in there. I'm not going to bring a book around with me everywhere I go just on the off chance that I might get stuck in a long line, or waiting for someone. But when such an event happens, having good reading material right at hand is very nice. Also nice is being able to have a selection of books in there at any one time, just in case I finish one book while waiting somewhere.

Notwithstanding the provisions of section 106(3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.

I went to the MS Reader site and followed the links to the on-line publishers' sites (such as B&N and Amazon). In most cases, the Reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).

These facts being plainly obvious, the logical conclusion is either that A: The cost of setting up the Reader infrastructure is so high that these high prices must be charged to recoup them, or B: They want them to fail.

"...to anyone and everyone then on what later became the web..."
What?? In 1971 the HTTP protocol was around? Or is the author trying to suggest that the internet became the web? I thought the web was part of the internet, not a replacement for it. Perhaps I'm misreading the article.

> "...to anyone and everyone then on what later became the web..." What??

I think they are saying that in 1971 it was distributed to anyone and everyone... Then, on what later became the web, they distributed it there too.

Keeping in mind that the web ripped most of its ideas from gopher, and FTP before that, the web wasn't a breakthrough idea out of nothingness. But I don't think they meant it as 'distributed on one medium which later turned into the web'.

Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult. Some format that preserved chapter headings, footnotes, illustrations etc. would be a massive step forward.

I think they discuss this somewhere. The whole point of ASCII is that it can be accessed simply, by almost any machine. It is as stable a format as you will find for data storage, anywhere. They are committed to these books being widely readable, and ASCII is the best way to assure this.

However, I agree that some books (most, actually) lose something in ASCII. What I would like to see is a project which works off the basic Gutenberg texts and formats them in a readable way, preserves illustrations, etc. But it should be an add-on to the project, not the main project. Also, remember that that level of preservation is much harder than just typing in and proofreading - you have to consider formatting and scanning images as well.

As a temporary measure, it would be nice to see someone do an XML markup that can be easily translated into LaTeX, so people can have pdfs with nice fonts, table of contents, title page, etc. That would be a step up. But to do it properly would take a separate effort, and a very large scale one even by Gutenberg standards. Worthwhile, yes. But involved.

I wonder. Does Gutenberg keep their sources in ASCII, or in something else that they run off to produce the ASCII final version? It might be that they already have formatting information that a smarter runoff process could use. (Heh, I can dream, right?)

The final ASCII version is also produced by hand. After two rounds of proofing, the text gets into a queue. From that queue, a 'post-processor' checks it out and reformats it according to the Gutenberg guidelines, along with any error corrections that might still be necessary. Then she or he uploads the final version to Project Gutenberg, where the 'whitewashers' check the text yet again before posting it to the archive.

About the XML: You are in fact welcome to produce an XML version, I believe some fellows at DP indeed do that already. However, the main version is the simple text version, since you can read that with everything. But nothing keeps you from also posting an XML or PDF or TeX or whatever version.

The whole point of ASCII is that it can be accessed simply, by almost any machine.

Just because you store something in XML, doesn't mean people have to use XML to read it. The whole point of XML is to have a format that you can easily transform. Transforming into ASCII is particularly easy.

XML markup that can be easily translated into LaTeX

If it's a good content-oriented XML app, it's easily transformed into LaTeX, or anything else. If it isn't a good content-oriented XML app (the StarOffice native format comes to mind) then it shouldn't be used for an online document repository.

I think the basic problem with the Gutenberg/DP people is that they've been doing things a certain way for so long, and they don't want to retool. And I can see their point -- changing over to XML is a lot of work. And the core DP team already seems pretty busy keeping the web site going.

On the other hand, I do wish they'd make it a priority. Right now I'm a volunteer proofreader, concentrating on getting out the famous Britannica 11th edition [wikipedia.org]. The amount of information that gets lost in scanning in Greek and other text with weird phonological conventions is just appalling. And the conventions for math and science formulas and equations produce a complex linear format I can't believe anyone would actually want to read.

Then again, it wouldn't be that hard to go back and insert proper markup. For 90% of the text there's a simple transform between the Gutenberg conventions and a reasonable XML format. The other 10% probably need another look anyway, and wouldn't be hard to do if they've saved the scan images. I haven't had the heart to ask if they do.

And the conventions for math and science formulas and equations produce a complex linear format I can't believe anyone would actually want to read.

It's basically TeX, the one true math typesetting system. Most mathematicians and many scientists know it quite well. It beats the heck out of MathML (one example in a MathML tutorial was 8 characters in TeX, and about 50 in MathML.)
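For a rough illustration of the verbosity gap (using a made-up expression, not the tutorial's example), compare the same formula in the two notations:

```python
# The expression "x squared plus y squared" written in TeX and in MathML.
tex = r"x^2 + y^2"

mathml = (
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "<mo>+</mo>"
    "<msup><mi>y</mi><mn>2</mn></msup>"
)

# TeX needs 9 characters; the MathML equivalent needs 76.
print(len(tex), len(mathml))
```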

The entire point of the project is to preserve the content in a format that is both human- and machine-readable. Say I don't have any software from the present here in fifteen years and XML is long dead: I will still be able to read standard ASCII text, even if I am just cat-ing it through less or printing it as is. I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful. I am not saying that it would be hard to write such software.

I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful.

This is complete bullshit. With a proper setup you would convert the source into multiple output formats, including TXT, but you would keep the source in a format that maintains meta information such as formatting, chapters and pages. XML is used in the entire industry exactly with the expectation that it will be around for decades. Even if it won't, the open source code that we have to parse it will not magically disappear -- PG would keep using it to generate output texts from the XML source through all these years. You might as well argue that ASCII will go away.

XML is not a character encoding. XML does not require the use of non-ASCII characters. What can be represented by an XML document is a superset of what can be represented by a plain ASCII document. XML is a human-readable markup.

MS Word 2000.doc is a binary format.

I suspect that you have very little idea what you are talking about.

PG already uses XML-like markup to indicate an emphasized portion of a passage, among other things. If we were to accept your argument, then even this alone should be seen as a problem.

Yeah but the entire point of XML is that it defines structure not presentation. If you want to go off and produce something which is readable in some other format (e.g. text), feed the document through some XSL transformation or perl script and it pops out the other end in any way you desire. Someone else can feed it through something that produces a PDF, someone else a Palm e-Book, someone else braille. And this can all be automated on the server. Everyone is happy.
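As a sketch of what that pipeline could look like (the element names below are made up for illustration, not an actual PG or TEI schema), here is a minimal Python transform from marked-up source to PG-style plain text:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal markup for a Gutenberg-style etext; the element
# names are illustrative only, not a real PG or TEI schema.
SOURCE = """
<book>
  <title>A Sample Etext</title>
  <chapter n="1">
    <p>It was a dark and stormy night; the rain fell in <i>torrents</i>.</p>
  </chapter>
</book>
"""

def to_plain_text(xml_str):
    """Flatten the markup into a PG-style plain-text form: the title and
    chapter headings on their own lines, italics reduced to ALL CAPS."""
    root = ET.fromstring(xml_str)
    lines = [root.findtext("title").upper(), ""]
    for chap in root.iter("chapter"):
        lines.append("CHAPTER " + chap.get("n"))
        for p in chap.iter("p"):
            parts = [p.text or ""]
            for child in p:
                text = child.text or ""
                # Mimic PG's convention of rendering italics as ALL CAPS.
                parts.append(text.upper() if child.tag == "i" else text)
                parts.append(child.tail or "")
            lines.append("".join(parts))
    return "\n".join(lines)

print(to_plain_text(SOURCE))
```

A PDF or braille back end would walk the same tree; only the output step differs, which is the whole point of keeping structure in the source.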

As for XML being long dead, this is highly unlikely. XML is just structured data and is itself just text. It would be trivial 5, 10, or even 100 years from now to pull out the data from the xml format in any way you please. Unless the grammar is horribly mangled (MS Office), it would even be possible to infer it without even knowing the grammar. I would trust Gutenberg to collectively come up with a format which would be simple for proof readers and parsers alike.

Michael Hart has repeatedly made mention that he does not want to get caught up into the fad of the moment with text formatting issues, and that plain old ASCII is one constant that hasn't needed changing. Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine. I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.

With that said, I believe that XML is perhaps going to have the staying power that ASCII text has had for the past many years. And there are many volunteer projects that you can get involved with that do this including:

The HTML Writers Guild [hwg.org] - Originally they were trying to convert all of the Gutenberg texts to HTML, which has admittedly been a reasonable standard for a good number of years. Currently they are going to a version of XML with some standard headings for titles, copyright info (or lack thereof), chapter headings and so forth. More info is on their website.

Project Gutenberg XML [pgxml.org] - This is a group more dedicated to the XML, but has a very similar purpose.

The point here is that once the data is put into ASCII text format, projects like this can be and are being done. If you really feel that you want to help with the effort, please join one of these. Also, at any time you can take the Project Gutenberg files yourself and do this, but at least this gives you a forum to share your work once you are done.

The thing is, XML is just plain ascii too (assuming you mandate not to use Unicode or some weird charset), so therefore you're not reducing the ability of people to read the text. At worst they'd be inconvenienced by extra tags if they tried to read it raw, but then again they wouldn't have to.

The reason for this is XML is easily translatable into just about anything else that the grammar allows for. So I don't see that it would make any difference to the project goals if the 'master copy' for every document were XML.

One of the advantages of XML is that it's very easily transformable. If Project Gutenberg were to produce XML texts, it'd be trivial for them to automatically convert them to plain ASCII and make that version available as well.

The point is that many of us would prefer an XML version. The argument against this was that ASCII is a longer-lasting archive format. My counter-argument was that an ASCII version can trivially be produced from the XML both for archival purposes and for those who would prefer such a version.

I would have to agree that XML does offer some reasonable options that make it much superior to plain ASCII text (or Latin-1, as has been discussed in this thread).

I think you're a little unclear as to what ASCII is. As the "A" in "ASCII" indicates, it's oriented towards American applications. And it consists of a mere 128 characters, which include 32 control characters that you don't use in text.

In point of fact, Project Gutenberg has long outgrown the 96 graphic characters in ASCII, though I think they themselves are ignorant of the fact. They seem to have experimented with characters until they found a set that displays the same on "normal" Windows, Macs and Unix/Linux. The result is something they call "extended ASCII" but that's actually a subset of both ISO's Latin1 character set [czyborra.com] and Microsoft's Latin1 code page [microsoft.com].

When is this an issue? Well, I'm a DP volunteer, and I'm concentrating on the Britannica 11th edition. Lots of geographic entries, all of which contain degree symbols. This symbol is not in ASCII! If you follow the DP instructions, you end up entering byte 186 (decimal). If you're using the ISO or Microsoft Latin1 set (and if your computer is localized for the U.S., Canada, or Western Europe, you probably are) then 186 does in fact display as a degree symbol. But if your system is localized for Eastern Europe, you're probably using Latin2, and this byte stands for an S with a cedilla accent!

In short, "ASCII" is actually less universal than well-formed HTML. In which you represent the degree symbol with a character entity (&deg;) that's the same everywhere.
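The ambiguity is easy to demonstrate. Strictly speaking, byte 186 in Latin-1 is the masculine ordinal indicator (º), which is easily mistaken for a degree sign; under Latin-2 the very same byte is an s with cedilla. A quick check in Python:

```python
# Byte 186 (0xBA) decoded under two different 8-bit "extended ASCII"
# code pages gives two different characters.
raw = bytes([186])

latin1 = raw.decode("iso-8859-1")   # Western Europe
latin2 = raw.decode("iso-8859-2")   # Eastern Europe

print(latin1)  # 'º' -- masculine ordinal indicator, resembles a degree sign
print(latin2)  # 'ş' -- s with cedilla
```

An HTML character entity like &deg; sidesteps this entirely, because the file itself stays in 7-bit ASCII.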

Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine.

Hardly a representative example. The Declaration of Independence [archives.gov] was hand-written, and thus doesn't include a lot of fancy fonts or formatting. A better example is a contemporary novel, such as 1984.

As it happens I just finished re-reading this one. I read a Plucker [plkr.org] file that somebody had transformed from an HTML version [adelaide.edu.au], which in turn came from the Project Gutenberg "ASCII" version. Readable enough. But all the typographic niceties -- italics, boldface, etc. -- were reduced to ALL CAPS in the text version, and that was retained in the HTML version. Pretty distracting -- made me feel like somebody was shouting at me. Double Plus Ungood! Thoughtcrime!

...once the data is put into ASCII text format, projects like this [XML] can be and are being done.

You make it sound easy. A lot of information is lost when your primary version is "ASCII". It all has to be put back by hand. There's no avoiding this for the large body of existing Gutenberg texts. And of course as recently as 5 years ago, there wasn't a real choice anyway. Even HTML had issues, and serious XML tools didn't exist.

But now XML technology is pretty mature. It makes sense to store new Gutenberg texts in XML. If people still want "ASCII" copies, the XML is easily transformed into that. Though I suspect a lot more people will want the HTML version -- a format which is actually accessible to more people than "ASCII".

There are two reasons this won't happen soon.

The first is that somebody will have to design and implement the necessary XML apps for inputting and proofreading the texts. (Which would also eliminate a lot of the errors proofreaders make, like entering [Greek: Tau] when they mean [Greek: T].) A huge project. As it stands, the people who maintain the DP web site have their work cut out just to keep the existing software working. That's a valid point.

In point of fact, Project Gutenberg has long outgrown the 96 graphic characters in ASCII, though I think they themselves are ignorant of the fact.

Then I invite you to actually take a look at some of the texts. The Gutenberg people know quite well when they're using ASCII and when they're using Latin-1. If you'll look at the books that are posted, some are posted just in ASCII, some in 7foo.txt and 8foo.txt files, where 7foo is ASCII and 8foo is Latin-1, and a few just in Latin-1.

Using ASCII presupposes that all the important texts you want to preserve are in American English. Since a fair amount of the important pieces of literature come from mainland Europe (actually even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.

Further, authors often use devices like italics or bold to add emphasis to their work, and nowadays even completely different fonts and typefaces. Translating these works to ASCII with no markup actually destroys some of that information.

Since a fair amount of the important pieces of literature come from mainland Europe (actually even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.

As a matter of fact, the DP web interface allows you to enter the pound sterling symbol even if you don't have it on your keyboard. It also has a lot of accented characters that aren't in English. The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.

The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.

Excuse me? The Gutenberg people know quite well when they're using ASCII and when they're using Latin-1. If you'll look at the books that are posted, some of the books posted from DP are posted just in ASCII, and some in 7foo.txt and 8foo.txt files, where 7foo is ASCII and 8foo is Latin-1.

I hadn't noticed that. But that convention isn't followed consistently. Of the last 10 files posted from DP, only 7 follow this convention. And I haven't seen it documented anywhere.

I shouldn't have spoken categorically about the Gutenberg people. Somebody is aware of this issue, because recent posts from DP say "Character set encoding: ISO-Latin-1", which I guess is some help. My assumption of ignorance was based on the DP Proofing Guidelines [pgdp.net], which refer to 8-bit characters as "Upper ASCII". But I guess that's just loose terminology.

Well I did preview it and it looked OK so that was good enough for me although technically a mistake since I was using HTML mode.

However, even latin-1 does not have the complete range of characters in use by all writing systems based on the Latin alphabet and you're totally screwed if you want to preserve the Iliad or the Bible (to pick two random texts) in the original. Also, to do bold and italics etc you need some sort of markup - so it might as well be XML or HTML.

Actually, you didn't make any mistakes with your input, and I shouldn't have implied that you did.

This all comes down to a simple misunderstanding: people use "ASCII" and "text" interchangeably. Nine times out of ten, when you hear somebody talking about ASCII, they're really talking about Latin1. Usually, this mistake doesn't really matter. But this time it did: the guy who was defending Gutenberg's use of "ASCII" managed to imply that Gutenberg uses an American character set. Which was why you flamed him.

I actually took the time to sit down and learn how to read punchcards from just their hole patterns (which isn't too difficult, compared to reading data files directly from a hex editor when you have to dig into why a program isn't reading a certain file correctly).

I have seen some punchcard machines come into the local thrift store a couple of years ago, I think it would be hard to find one now.

The nice advantage that punch cards have over just about every other data storage medium is that as long as the cards themselves survive, you can read them with no special equipment at all.

Why you'd ZIP an HTML file that you're offering on a web site is beyond me.

Because one common reason to do an HTML edition is pictures, and the system is set up to have one file per document.

we're doing a lot of mark-up anyway

Italics is not a lot of markup. XML calls for a lot of detail that would take work. How many books have you post-processed? They accept XML; why don't you find out firsthand how hard it is to make an XML edition?

I get all the information I need (and more) from "reading" lamb livers (all the Universe is reflected in even its tiniest fragment, you only have to look hard enough). On most days though, I have to resort to using tea leaves (as there aren't too many sheep left in a 20 mile radius), but tea leaves have lower bandwidth and they generate more errors (mostly typos, but when reading Slashdot, I occasionally experience a kind of deja vu). I post to Slashdot by using complicated black magic (it includes drawing several pentagrams).

Unfortunately, with the copyright periods being extended so long, the material will only be of (ancient) historical interest. The 98 percent of copyrighted works that are unpublished and should be on there, unfortunately, gets to sit collecting dust instead of benefitting mankind.

If you have an FTP program (or emulator), please FTP directly to the Project Gutenberg archives: [Mac users, do NOT point and click... type]

Given that a) Macs, being Unix-based, have command-line FTP like everybody else, and b) the idea of a point-and-click interface has now passed so far from being a bizarre and contemptible innovation that lots of people are trying hard to develop nice-looking Linux GUIs... isn't this snarky instruction now more than a little dated?

Putting a flag on your front porch is a great way to celebrate the 4th of July. An even better way to celebrate the United States' birthday would be to go to this site and actually read the documents that define us as a country.

In this day in age when it seems everyone is a suspected terrorist and our liberties are stripped one by one in the name of homeland security, and in the name of the rights of large companies, I wish some of our elected officials would actually read these documents sometime.

A red, white and blue flag isn't what makes this country great, nor does an extremely high gross domestic product -- it is the set of ideas that were written down over 200 years ago that makes the USA great.

So everyone go to this site and read those documents. Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.

Conversion on the fly to many formats. We'll be putting eBooks into XML format (mostly using teixlite.dtd, we think) for conversion on the fly to many other formats.

New ways to donate. "Sponsor a book"

More contemporary content. We receive donations nearly every week from currently published authors who want to make their stuff available to a wider audience (e.g., Cory Doctorow's Down and Out [ibiblio.org])

Your ideas! Visit gutenberg.net [gutenberg.net] to sign up for newsletters, find out how to get started producing an eBook, and find eBooks

Thanks especially to our main and backup distribution sites, iBiblio [ibiblio.org] and The Internet Archive [archive.org]. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.

I know it is complicated, but is it worth also publishing a style sheet for each work, which could be used to replicate the 'look and feel' of the original? It shouldn't interfere with the aims of readability, as one is free to ignore the style sheet and just read the raw XML or text file.

I just looked over the links in earlier replies (PGXML and HTML-Writers) and was surprised: HTML-Writers converted only 20-odd etexts, from Jan to Feb 2000, and hasn't touched the project since; and PGXML doesn't even have the ability to do valid HTML curled quotes.

Both look like amateur do-gooders, and we need more of those; but these efforts should be folded back into the organisation of PG, where they may find a permanent home. The alternative is to drift, due to too few people being involved (only _two_ people do PGXML).

A while back, I used wget to mirror the entire Project Gutenberg works. (I did it off-hours, and contacted them to see if it was a problem, or if there was some other more efficient way to do things.)

Anyhow, with my GBs of text, I used bzip2 -9 to compress each text file. In the end, the entire collection of PG was able to fit on one CD. Since most people don't have bzip2 support I also included the free archiver, Ultimate Zip [ultimatezip.com] on the CD as well. I also put a read-me on the CD (that would appear as the first file) with basic instructions what to do.
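For anyone wanting to try the same thing, the core of it is one compression call per file. A minimal sketch in Python's standard bz2 module (the sample text here is just a stand-in for a real etext file):

```python
import bz2

# Compress a chunk of repetitive English text at the maximum level
# (the equivalent of bzip2 -9), as you would when packing etexts onto
# a CD. Prose compresses very well; repeated prose even better.
text = ("It is a truth universally acknowledged, that a single man in "
        "possession of a good fortune, must be in want of a wife. ") * 200

raw = text.encode("latin-1")
packed = bz2.compress(raw, compresslevel=9)

print(len(raw), "->", len(packed),
      f"({100 * len(packed) / len(raw):.1f}% of original)")
```

Looping that over every text file in the mirror, the whole collection shrinks enough to fit on a single CD, as described above.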

One of the great things about CDs is how easy they are to transfer... One stamp, and a 5cent CD envelope, and you can send 2 CDs anywhere in the country (this predated Netflix AFAIK).

Anyhow, I sent these CDs to two different people, and the next time I talked with them, I found out they'd made several copies of it. Basically, they heard someone mention some subject that related to one of the files on the CD, brought up the CD, and offered to make a copy for them. This happened a few times that I know of, and quite possibly many times that I don't know of. Quite an easy way to spread the word.

Of course, with that said, I don't read the PG texts myself... There are two reasons. The first is that I have yet to come across decent software designed for long-term reading. Something that saves your place (automatically?), something with a legible font, and something with light colored text on a dark background, which brings me to my next point...

The second reason is that monitors are all backlit... That means reading on a computer screen is like reading text on a fluorescent lightbulb. It's possible for a while, but your eyes are quickly fatigued. The only screen I have that doesn't do that is the 640x240 B&W LCD screen on my Psion handheld. As good as that is, it's just too small for effective reading. Someone needs to create a non-backlit LCD screen, approx 6" (about the size of a book page), that is small, light, silent, compatible with everything, and most importantly, has good software that makes reading less work than it normally is on a computer... Until then, electronic reading isn't going to really be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.

Indeed, but after a few "modifications" like this, your solution looks rather hackish. If I wanted, I could write a small shell script and call it "reader.sh" or something like that, and simply have it store your position as the first line in the book, and restore you to that position next time. Of course, once again, things like that get pretty ugly after a few features get added.

That's not actually what I was talking about. What I mean is that page forward/page backward should be one (obvious) keystroke, not a command sequence, or anything like that. Also, movements shouldn't be cursor-based. When people hit "down" everything should go down a page. With a cursor-based system (which is pretty much every editor) you can never be sure what's going to happen. A web browser would make a better book-reader than vi (maybe lynx/links?).

I wasn't listing the final specifications for a device in detail. Yes, it would have HTML support, and CSS would be useful to have as well. With HTML, people are going to want images supported, that means a few different libraries there as well.

Then there are more document formats. SGML, Tex, info, Postscript, etc.

why would you want a document reader when what you're reading are long and lengthy texts?

I wasn't listing the final specifications for a device in detail. Yes, it would have HTML support, and CSS would be useful to have as well. With HTML, people are going to want images supported, that means a few different libraries there as well.

Ok I'm gonna tone myself down a little... this should be a little less of a rant so hang on. The point I was trying to make is that I think HTML should be the one technology an ebook reader should be able to support unlike even standard desktop browsers. I'm not

Yeah and the pdf reader for WinCE needs, uhh, "work". It is by no means comparable to its desktop cousins... a cheap knock-off from a huge company complaining about the limitations of PDAs. IMHO, avantgo is a considerably better "ebook reader" that's easier to code for and is far more compatible. HTML 3.2, that's it...

Well, I do have a WinCE device that I paid several hundred dollars for, sitting around collecting dust. Instead, I use my Psion 5mx all the time, and it has a great PDF reader.

Wouldn't it be possible to rig up a high-speed scanner based on digital video technology?

A large part of the speed problem is turning the pages or moving the page past the sensors. In any case, digital cameras haven't shown enough detail for good scans, and planetary scanners (expensive digital cameras for scanning) cost several thousand dollars.