Posted by timothy on Wednesday March 17, 2010 @02:49PM
from the don't-they-have-any-boffins? dept.

An anonymous reader writes "IEEE Spectrum reports that Tokyo University researchers have developed a superfast book scanner that uses lasers and a high-speed camera to achieve a capture rate of 200 pages per minute. You just quickly flip the book pages in front of the system and it digitizes the pages, building a 3D model of each and reconstructing it as a normal flat page. The prototype is large and bulky, but if this thing could be made smaller, one day we could scan a book or magazine in seconds using a smartphone." The article mentions Google's similar dewarping system; the difference here is speed.

You're absolutely correct! The researchers need to immediately be jailed for contributing to copyright violations. Scientists! They never think about how their inventions will impact our Corporate Overlords.

I know! Can you believe that even now you can go into Borders or Barnes and Noble and -read- an entire book! And guess what? The employees there think it's perfectly natural! There was a man there who said he had spent -3- hours just reading a book and drinking coffee! Talk about outrageous!

Strange the way people seem to make the same typo. The i is rather close to the p but not next to it, so why do so many people put an 'i' in front of the word 'phone'? The summary did not mention any specific phone, just that this technology might be shrunk to fit in a phone.

Don't be an iTool or iDroid! Use normal words! A phone is a phone and does not need a vowel prepended to become viable. If this scanner technology ever comes to fruition the Apple-branded version of it wo

The system is currently a prototype that occupies an entire lab bench. But in the future, they hope to simplify and miniaturize it for integration into portable devices like a smartphone. So one day you might be able to flip the pages of a book in front of your iPhone and get a digitized version in seconds.

(Spoilerish bit follows. Only a spoiler for the worst of purists, but they have been warned.)

Rainbows End [teleread.org] has an act where a virtual book cartel deploys a giant vacuum/shredder/optical scanner to the UCSD Geisel Library. It sucks in books a shelf at a time, feeds them thru a wood chipper, and the shreds pass thru a tunnel lined with optical scanners. A photo is taken of each bit, and software reconstructs the books.

By the way: “handy” is not used as a term for a mobile phone aka cell phone in the English language. I know it’s used in Germany, and people from there are prone to mess it up, because it’s a foreign English word in the German language.

It's a particularly convenient false friend because the "alternatives" are regionalisms (ie either AE or BE) and much longer because phone is tacked onto them or, in their short forms, colloquial and have even stronger associations with one region. Of course, these days you can often get away with simply using phone by itself.

Cut the spine of the book off with a bandsaw with a metal cutting blade (finer pitch teeth than typical wood blade)

Run thru sheet feeder scanner twice, once for each side.

A bit of scripting hackery later, one fresh PDF! Or .djvu, or whatever.
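For anyone wondering what the "scripting hackery" might look like: with a simplex scanner you end up with two batches of images, all the fronts and then all the backs, and they have to be merged back into reading order. Here is a minimal sketch of that step. It assumes the stack is flipped over as a whole for the second pass, so the backs come out last sheet first; the file names and the function name are made up for illustration, and this is not necessarily what the poster above actually ran.

```python
def interleave_scans(fronts, backs_reversed):
    """Merge two simplex passes into reading order.

    fronts: images of the odd pages, in order (1, 3, 5, ...).
    backs_reversed: images of the even pages in the order they come out
    when the whole stack is flipped and fed again (..., 6, 4, 2).
    This ordering is an assumption; check how your feeder behaves.
    """
    if len(fronts) != len(backs_reversed):
        raise ValueError("both passes must contain one image per sheet")
    backs = list(reversed(backs_reversed))  # put the backs in sheet order
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.append(front)
        ordered.append(back)
    return ordered


if __name__ == "__main__":
    # Hypothetical file names: back_001 is the back of the LAST sheet.
    fronts = ["front_001.png", "front_002.png", "front_003.png"]
    backs = ["back_001.png", "back_002.png", "back_003.png"]
    for name in interleave_scans(fronts, backs):
        print(name)
```

The ordered list can then be handed to whatever tool you use to bundle images into a PDF or .djvu. If your feeder keeps the sheet order on the second pass, drop the `reversed` call.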

For those of us brought up that it's sacrilegious to damage a book, realize that many books were printed on acid paper; yellowing, decaying, brittle, and will soon be dust regardless of what you do, so may as well preserve the content and properly recycle the pulp.

The bandsaw trick also works on magazines, you know, the things we used to read before websites.

First, there are guillotine-style shears for cutting bindings off books that do no damage at all to the pages. Second, nearly all the high-speed sheet-fed document scanners out there are duplex scanners. In the case where the owner is willing to cut the binding off the book, there are well-known equipment and well-established techniques that do not involve rubes with bandsaws and script hackery.

First, there are guillotine-style shears for cutting bindings off books that do no damage at all to the pages.

My bandsaw does no damage to the pages either. Clearly you haven't tried this. It worked for me, but I'm a small timer compared to the guys at bitsavers.org. They claim it works on an EXTREMELY large scale. I "saw" an ad for a paper shear (usually used for binding, and sorry for the pun). The shear was about 10 times the cost of my little tabletop bandsaw. If the market has changed and you can now buy a shear for the cost of a good steak dinner, well, I guess I'm out of date then. But even then, I needed a bandsaw for other purposes, and if it's dual use, all the better, and I'd not be amused at buying, storing, maintaining, and eventually disposing of two tools to do a job that one does perfectly well.

Second, nearly all the high-speed sheet-fed document scanners out there are duplex scanners.

New, maybe. Not in the olden times aka longer ago than yesterday. Maybe the new ones even duplex properly with paper other than standard 8.5x11 laser paper, and don't just jam on the cut edge. Maybe the new ones don't duplex at a speed about 4 times slower than non-duplex. You're the expert, I'm merely a guy who's actually done it.

I'm only saying what worked with what I had, and what I know other people have successfully done in the past, I'm not just some dude quoting specs out of a tiger direct catalog with an infinite budget for brand new gadgets.

Maybe the new ones even duplex properly with paper other than standard 8.5x11 laser paper, and don't just jam on the cut edge.

As do the older ones.

Maybe the new ones don't duplex at a speed about 4 times slower than non-duplex.

Same speed duplex as single-sided. I do have to admit that I don't know how long that's been common.

You're the expert, I'm merely a guy who's actually done it.

Well, thanks for the compliment, but I am also a guy who's actually done a lot of scanning, with several different models spanning a fairly wide range of costs & speeds.

I'm only saying what worked with what I had, and what I know other people have successfully done in the past, I'm not just some dude quoting specs out of a tiger direct catalog with an infinite budget for brand new gadgets.

So, who is this mythical dude quoting specs out of a catalog? Must be what some people call a "straw man", because it sure as heck isn't me.

Damage as in think of how the bottom of a piece of plywood looks after you cut it, chips yanked off the edge. Tensile strength of paper is pretty high... with fine tooth blade and a cardboard backer board the pages are not torn, wrinkled, ripped thru the saw, etc. One sneaky way to prevent damage to the cover/last pages of a book you want, is to use a magazine/catalog/cardboard box or whatever as a backer board underneath the book you want to cut.

Fair enough dude, I'll try a shear some time, since you claim it works so well. If it's anything like my old high school sheet metal shear, I'd worry about losing fingers in it, but I'll be careful so I think it will be OK...

I've never used one; I've only stood by and watched someone else use one. Dude, the thing could take your arm off in the blink of an eye. (And you'd definitely want to blink, considering the blood spatter...)

If you're not scanning massive quantities of books (which is what the article is about), then a bandsaw is probably a darn fine hack to get covers off books. If you're doing a library, the difference between a split second per book and a few seconds per book, plus the smoother edge, would be worth it I'

there are well-known equipment and well-established techniques that do not involve rubes with bandsaws and script hackery.

But you keep saying nothing about how to remove the binding, other than recommending that people buy an overpriced and completely unwieldy guillotine (which, incidentally, also doesn't just work). What cheaper methods are there? Is a bandsaw OK or should it be a circular saw? Does a scroll saw work? How do you fix the book? How do you avoid having the pages become jagged?

A $20000 scanner lets you scan a lot faster than a $50 scanner, but you'll probably actually have a harder time getting it to work.

No, you won't. It will have vastly superior paper handling compared to the $50 scanner.

In summary: you don't know what you're talking about, and you would do well to just keep quiet and not give people lousy advice.

I have experience in the area, and know first-hand that an appropriate scanner does make the scanning part very easy. Your last two posts make it clear that you've got no experience with production-level document scanners. Perhaps you should stop denigrating the advice of someone who's worked on projects scanning millions of pages (some portion of which were old and in lousy condition).

Your last two posts make it clear that you've got no experience with production-level document scanners. [...] an appropriate scanner does make the scanning part very easy

That is preposterous. There are so many exceptions when scanning books (stuck pages, brittle pages, bad cuts, foldouts, torn pages, dog-ears, gum, double-feeds, failure of double-feed detection, sticky notes, napkins, and tons of others) that scanning is never "very easy", even if scanners were perfect. But scanners aren't perfect: they

I just place my kindle on my scanner, hit scan, then next page. Rinse and repeat. 10 minutes later I have the book ripped. Then a little OCR work converts to text. This still takes a little time though as I'd have to proofread afterwards as well. Once I've done a few, I'll look at finding out how to re-encode as a .mobi file.

Note to self.. remember not to use Vim's method on priceless, one-off books that are irreplaceable.

You, uh, might have missed the rest of the post:

For those of us brought up that it's sacrilegious to damage a book, realize that many books were printed on acid paper; yellowing, decaying, brittle, and will soon be dust regardless of what you do, so may as well preserve the content and properly recycle the pulp.

I own DEC technical manuals from the 70s that are going in the trash within a decade at most. A decade ago, painfully yellowed. Today, turn a page and it snaps off. Thankfully, someone else did the bandsaw and scanner thing some time ago, so I can still read a .PDF of the same manual.

And I wouldn't exactly call a DEC manual priceless, one-off or irreplaceable.

Not in the 70s, no. But now, they are more or less "irreplaceable" in one sense, just like any other out of print book. As far as priceless, assuming it's not so rare it never, ever hits ebay, I guess it had a recent "price", sort of.

Since DEC enjoyed using acid based paper which is literally rotting away, a 60s/70s era DEC manual will very soon be literally priceless, one-off, and irreplaceable.

Cutting the spine off a book you already own may or may not be sacrilege. But doing that to your friend's book might strain your relationship.

The employees at Borders were not amused when I wheeled my band saw in. They demanded that I pay for the book I'd just sawed up and scanned. I told them "I'm certainly not paying money for that book now, look how ruined it is! Besides, I already have a copy," as I waved my thumb drive in their face.

The employees at Borders were not amused when I wheeled my band saw in. They demanded that I pay for the book I'd just sawed up and scanned. I told them "I'm certainly not paying money for that book now, look how ruined it is! Besides, I already have a copy," as I waved my thumb drive in their face.

Someone with real balls would have asked for a cash refund. "Clearly my copy of the book is faulty, can I get cash refund, or just instore credit?"

1) Yes, but does it run Linux....
2) Imagine a beowulf cluster of these...
3) I can't understand 200 pages/minute, what's that in LOC/furlough?
4) I can't read you insensitive clod.
5) In Soviet Russia, the book scans the book scanner...wait that's not quite right...ah, got it,... the book scans you!
6.1) Scan books real fast
6.2) Tie into massive database that indexes every perceivable medium on the planet
6.3) Get sued by publishers.
6.4)....
6.5) Profit!!
7) How fast can it build a 3d model of Natalie Portman with hot gritz?
8) The CIA will use this to scan every page of the manuscripts you've stored in your apartment and will come for your tin foil.
9) Netcraft confirms: reading is dying...
10) A book scanner is like a car that drives really fast over a highway full of book pages...

That would be a "library". A dynamically linked library, I suppose, since multiple people can borrow/read the same book.

11) If I read something on an LCD, my eyes hurt. And, I refuse to see an optometrist, instead the world has to bend their display technology to my will, ADA style.

12) If I compare, side by side, an expensive ebook reader with a cheap one, the expensive one always subjectively seems to look better. Surprisingly, works for audiophile stuff too. I'm waiting for an ebook reader with those "

I don't really care much either way about LCD vs e-ink, but in a real-life environment, there is an effective difference between reflected and transmitted photons. The brightness of the screen can be drastically different than the surrounding environment with a backlit screen, with e-ink that is generally not the case. Don't optometrists recommend not using a bright monitor in a dark room? Presumably you want the display to be fairly well matched to the background. E-ink allows that, and LCDs don't.

in a real-life environment, there is an effective difference between reflected and transmitted photons.

Show me the physics... other than light polarization weakly depending on reflection. But human eyes have an extremely weak response to polarization.

The brightness of the screen can be drastically different than the surrounding environment with a backlit screen

Then it looks terrible until you adjust brightness/contrast. Which my ipod touch tries to do automatically, albeit very poorly. I think TVs have been available with auto-brightness adjustment since I owned one with that feature in the late 70s.

Don't optometrists recommend not using a bright monitor in a dark room?

This guy has produced some really fascinating work, I strongly recommend checking out some more of it if you have some free time. The high-speed robot hand [youtube.com] he developed literally made my jaw drop.

The article mentions Google's similar dewarping system; the difference here is speed.

There is nothing preventing Google from pushing high speed video through their book software. In fact, they could probably do that with very little work, since you can use an off-the-shelf high speed video recorder and then just push the frames through the regular processing pipeline.

The reason they don't (and nobody else does) is because it's not useful. For getting acceptable quality from book scanning, you need upwards

The prototype is large and bulky, but if this thing could be made smaller, one day we could scan a book or magazine in seconds using a smartphone.

You lost me here. How exactly do I scan an entire book or magazine in seconds using only a smartphone? Somehow I imagine this technology is slightly more than software, unless cameras start coming with super-fast automated page turners attached.

There was an episode of Futurama where Bender is captaining the ship, and Fry asks him if he's read the manual. Bender flips through the several-hundred-page book in about a half second and proclaims "Done", then proceeds to quote it.

It always seemed like a plausible thing to me. Isn't that what they're doing here?

You'd have to be pretty good at flipping pages. Some of them always stick together, and I'd hate to be in a space ship where the Captain is a robot who "read" the manual but skipped the page about turning on the life support systems.

I believe the narrator in the video says that the high speed camera is scanning 1000x1000 pixels, and the book he is scanning has very large type, with fewer than 20 text lines per page. I imagine that this scanner can't scan normal text as fast as the Google book scanner.
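To put rough numbers on that: the resolution you actually get on the page is just sensor pixels divided by page size. A quick back-of-the-envelope sketch follows. The 1000x1000 figure is the one quoted from the video above; the 6 x 9 inch page and the 300 dpi target are my own assumptions (300 dpi is a commonly cited rule of thumb for OCR, not a number from the article), and it pretends the page fills the whole frame, which is optimistic.

```python
def limiting_dpi(sensor_w_px, sensor_h_px, page_w_in, page_h_in):
    """Dots per inch along whichever page axis is resolved worst.

    Assumes the page exactly fills the frame, so this is a best case.
    """
    return min(sensor_w_px / page_w_in, sensor_h_px / page_h_in)


def pixels_needed(page_w_in, page_h_in, target_dpi):
    """Sensor size (width, height) in pixels needed to hit target_dpi."""
    return (round(page_w_in * target_dpi), round(page_h_in * target_dpi))


if __name__ == "__main__":
    # 1000x1000 is from the video; the page size and target are assumptions.
    dpi = limiting_dpi(1000, 1000, page_w_in=6.0, page_h_in=9.0)
    print(f"effective resolution: about {dpi:.0f} dpi")
    width, height = pixels_needed(6.0, 9.0, target_dpi=300)
    print(f"sensor needed for 300 dpi: {width} x {height} pixels")
```

Under those assumptions the camera lands somewhere around a third of the 300 dpi target along the long edge of the page, which would fit with the demo using very large type.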

Technology like this will cause the publishing industry to go the way of the music and movie industries.

Right now the publishing industry is where the music industry was 7 years ago. Multiple incompatible book formats, DRM that lets rights holders yank your paid content away from you, DRM/formats that leave you tied to specific vendor readers, etc.

The barrier of scanning a book has made the publishing industry think that they don't need to provide books in a format that users want and feel that they ca

Why the fuck are we scanning books? Isn't there, you know, a DIGITAL REPRESENTATION which is used during typesetting? This reminds me of that crazy story of the person who printed out a spreadsheet, scanned it in, printed out the scan, laid it on a wooden table, took a digital picture of it, then uploaded it to his web site (or something like that).

There are many (most?) books published before computer aided writing and typesetting became the norm. Even for many books that were published electronically, the electronic files used to create the books may not exist or may be unreadable due to poor archiving, publisher is out of business, hard to parse proprietary file formats, archaic hardware (cobbling together a punched tape reader from the 70's might be harder and more trouble-prone than just scanning the book), etc.

Obviously, books printed before the digital era are not available in digital form. Duh. But I don't understand -- you want to take a very old, presumably fragile book, and run it through a 200-page-per-minute scanner? The only books I'd feel comfortable doing that to are books where the value is mostly in the words, not the paper they are printed on -- and for the most part, those are recently published books where a digital representation is available.

There were around 400,000 books published in the 70's alone (reference [swivel.com]). Most of these books are not rare, nor would they be fragile enough to be significantly damaged by a high speed scanner. And I'd be willing to bet that most of them do not have electronic publishing files.

Some high speed scanners (like Google's) are designed to cause no more harm to a book than a person reading it.

I always assumed licensed, translated Japanese comics were made by acquiring the digital masters from the Japanese publishing companies and using staff translators, maybe even in collaboration with the original author. I was very wrong. Tokyopop, a large importer of Japanese comics, has a video explaining their technique. They have a contact in Japan purchase off-the-shelf tankobon (compiled volumes) and ship them to the states, where they microwave them to loosen the binding, and scan them in. Then they outs

When established industries become prey for new technology, why do they resist and ask for protection? This is a fundamental question of society. We protect indigenous peoples. We have copyright and patents. We do much to preserve the old along with the new - backwards compatibility. Why do we not simply tell such industries that it's time to change and support them through the change? Yes, I get the whole free market thing, but rather than fight them to force them to accept change, why don't we offer them ideas and methods to change their business model to match the change in consumer requirements?

No, I'm not being trollish or suggesting stupidity. Why can't we crowd-source ideas for how these industries can recover from game changing technology? Must we wait for Jobs to tell us?

I buy no versions of MS Word. There is nothing innately wrong with suggesting crowd sourcing of ideas to allow businesses to move forward rather than stagnate and die. Consumers do choose, and there is nothing wrong with telling manufacturers what we are willing to pay for. They spend a lot of money trying to figure that out on their own. Not too many of them are successful at it.

A few months ago I asked my city's transit if they would post pdfs of the schedules on the web page. They print route schedules/maps and provide them in malls, campuses, and larger public places all over the city. Online, they use Navigo trip planner, links to pdfs and gifs of route maps, and text links to the schedules. So obviously they have some graphic designer in a hole somewhere making this stuff, and probably with InDesign. Despite all the obvious cost in printed materials, and huge effort in the w

A medical CT scanner lacks the resolution to scan a book, but there are CT scanners for other purposes which claim to have the resolution. However, I suspect most inks are essentially transparent to X-rays, so it wouldn't work.