Posted by timothy on Saturday November 07, 2015 @04:03AM from the paperless-future dept.

New submitter David Rothman writes: Scan a 300-page book in just five minutes or so? For a mere $199 and shipping — the current price on Indiegogo — a Chinese company says you can buy a device to do just that. And a related video is most convincing. The Czur scanner from CzurTek uses a speedy 32-bit MIPS CPU and fast software for scanning and correction. It comes with a foot pedal and even offers WiFi support. Create a book cloud for your DIY digital library? Imagine the possibilities for Project Gutenberg-style efforts, schools, libraries and the print-challenged as well as for booklovers eager to digitize their paper libraries for convenient reading on cellphones, e-readers and tablets. Even at the $400 expected retail price, this could be quite a bargain if the claims are true. I myself have ordered one at the $199 price.

A digital camera on a tripod PLUS...
Proper lighting
Foot pedal interface
Lots of software to take the pictures, manipulate the images and stitch them all together into an eBook
So a bit more than just a digital camera and a tripod

Video cameras don't have high enough resolution to produce good quality scans of printed material.
A standard 300dpi scan of an 8.5 x 11" sheet of paper results in about 8.4 megapixels (2550 x 3300).
This particular device claims 16 megapixels, which would be about right to cover a scanning surface that appears to be bigger than an 8.5 x 11" sheet.
Another approach might be to detect when a page has been turned using a low resolution video sensor and using that to trigger the higher resolution camera.
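
For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function names are made up for illustration, and 16,000,000 is just the nominal "16 megapixel" figure; the real sensor dimensions are not given anywhere in the thread.

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions and total pixel count for a page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px


def coverable_area_sq_in(sensor_pixels, dpi):
    """Area in square inches a sensor could cover at `dpi`.

    Assumption: ignores aspect-ratio mismatch, lens distortion and any
    border pixels wasted around the page.
    """
    return sensor_pixels / (dpi * dpi)


print(scan_pixels(8.5, 11, 300))                 # letter page at 300 dpi
print(coverable_area_sq_in(16_000_000, 300))     # area a nominal 16 MP sensor covers
print(8.5 * 11)                                  # area of one letter page, in sq in
```

Under those assumptions a nominal 16 MP sensor covers a bit under two letter pages at 300 dpi, so a two-page spread would land somewhat below 300 dpi.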

Both of the smartphone OSes have apps for that, and they perform just as well as a digital camera on a tripod, and rival a good flatbed. Back in the nineteen hundreds, if I wanted to save an article I was reading at the library, I had to check out the volume and bring it home, or bring it to the reserve librarian, who would make a not-very-good paper copy for me at a buck a page - assuming that some horrible copyright objection wasn't raised.

Now, wherever I might be, I just whip out my iPhone and run JotNot, which snaps a picture of each page and saves it as a PDF, just like a flatbed scanner. I love living in the future!

Get the book you want in audio format, then run it through voice recognition software.

You still have to turn pages manually. I had expected they would have automated that (though manual turning is perhaps better if you still want to return the book to the library later).

Any digital camera on a tripod can do the same thing.

In theory, yes, in the same way that anyone can build their own home from raw materials. Scanners like this have been around for a while, and if you can afford the five-figure price tag they do a good job. What these guys have done is lower the cost from five figures to three. If it works as advertised (in other words as well as a $50,000 equivalent), it's a pretty amazing piece of technology. I'd really like to see some independent, third-party reviews of how well it performs before I go out and buy o

1. Get a sheet fed scanner like a Fujitsu ScanSnap ($400)
2. Cut the binding off the book
3. Place the stack of pages into the scanner
4. Get a coffee

And you're done, the thing's 600 DPI and does both sides in the same pass. It creates a PDF directly, and you then want to OCR the PDF, running a sharpen filter on the text, and decide on how much you want to compress the PDF. A 1000 page textbook ends up being about 700 megabytes, in crystal clear quality.

Thanks, but what about those of us who might prefer nondestructive scanning? Also consider other factors--for example, the speed and quality of the scans, as well as the price. The Czur appears to be several times faster than a $600 model from Fujitsu [amazon.com] that allows nondestructive book scans. If you're scanning lots of books, that won't be a trivial detail. As for quality, the Fujitsu is good but not nirvana. Let's see if the Czur will do better.

Prototype 1 could scan the majority of books without damage, but may tear one or two pages in some books. Out of 50 books tested, 45% had one or two of their pages either torn or folded. This is a very early prototype and there are many areas for improvement in the design.

No need to cut anything off with this scanner (if you've seen the demo YouTube video). So will users just check out books from the university/public library and scan them at home? Later they can upload them to BitTorrent or other sharing sites.

Is it likely this device will be banned because it allows easy circumvention of copyright laws?

This only slightly speeds up/makes the process easier. Anything you can read can be transcribed.

The speedup is very high... any book scanned in an hour at zero cost (other than the one-time $199 scanner cost). Try transcribing manually (for example typing the contents of a book into your editor) and see how long and tedious a task that is.

If the OCR quality is as good as they say it is, the book's PDF file size will be really small (less than 50 MB).

Slashdot was actually hiding some of the conversation. But the current non-destructive method still isn't transcription, regardless of what the GP said. It is a phone + app to take pictures, then a few tools to OCR and combine the text into your choice of digital file. This basically just puts the camera on a stand, provides a foot pedal, and automates the toolchain to perform the same process. Not really a huge speedup.

Honestly though, most pirated books come from simply cracking the encryption on the p

The problem I foresee with this is for books that won't stay open on their own, or ones that barely do and have significant page curl. Still possible with the foot pedal I guess, but a lot more annoying.

I have a ScanSnap. Its sheet feeder won't hold an entire book, and the process of scanning hundreds of pages each from many books will generate substantial wear on a ScanSnap. There also tend to be misfeeds that you need to manually fix. The ScanSnap is great at what it does, but I wouldn't want to destroy books and manually feed them through it if a cheaper, faster, non-destructive method existed.

There are two curses of modern book publishing that cause problems whatever hardware you use. The first is so-called 'perfect binding' in which the folds of page gatherings, through which the sections are traditionally sewn together, are instead sliced off and glued to make a rigid spine with an exceedingly narrow angle of opening; the second is the use of low-grade, thin paper with high show-through that mucks up the scan.

The best software I've found to scan and collate is Softi ScanWiz. With it you may scan one stack of pages, flip the stack and scan the other side - the program then shuffles the page images into the correct order. It also automatically adjusts brightness and contrast so as to minimise ink show through.
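
The shuffling step described above is easy to sketch. This is not ScanWiz's actual code, just a minimal Python illustration of the idea, and it assumes that flipping the whole stack means the back sides come out of the scanner in reverse order (last sheet first). The function name `collate` is hypothetical.

```python
def collate(fronts, backs):
    """Interleave front-side scans with back-side scans of a flipped stack.

    fronts: page images in scan order, e.g. pages 1, 3, 5, ...
    backs:  page images in scan order after flipping the stack,
            assumed to be reversed, e.g. pages 6, 4, 2
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must contain the same number of sheets")
    ordered = []
    # Undo the reversal caused by flipping the stack, then alternate sides.
    for front, back in zip(fronts, reversed(backs)):
        ordered.extend([front, back])
    return ordered


print(collate(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
```

A misfeed in either pass breaks the pairing, which is why the length check is there.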

I have run over 1200 pounds of paper from townhouse through Canon to recycling. A few thousand books, mags and newsletters. Basically, I agree with the premise that it can be done with a $400 autofeeder and a spine cutter and I also agree with the objections. Is this a webcam on a tripod and something like gscan2pdf? Maybe. How well the software handles things like page curl is important to how worthwhile it is. But he is only asking a fraction of what an autoscanner setup would cost so it is not that expen

I vaguely recall something recently about an IR scanner or something that could be focused finely enough to read the pages sequentially down through a closed book.

It's incredible that this is even possible, though I suspect it will never become the common way to do this. The common way is for the author of the book to export the digital file to a format compatible with e-readers (Word to mobi/epub, or PDF). All new books are being published like this.

This one came up first when I googled shredders. http://www.staples.com/InfoGua... [staples.com] But it wouldn't let me link to a specific product, just a list. Just picked it out of the history and now it's showing a printer.

The actual big news here: The company doing the indiegogo is located in Shenzhen, China.

This is the first one of these I've seen. It struck me as very odd that the video narrator had an almost perfect Midwest accent but terrible grammar and word choice; after looking at the location of the startup, it became obvious that this was actually an Indiegogo out of China.

Anyway, good on them; I expect that we will be seeing a lot more people doing crowd-sourcing from non-U.S. locations, given that VC tends to be pretty tight outside of specific regions of the U.S. (which is, in turn, why most startups that go anywhere are U.S. based, rather than being in Europe, or elsewhere, where the funding climate is pretty terrible).

The only reason devices that can display printed sheet music like tablets and e-ink readers are not popular is that they are essentially useless for sight reading. A foot pedal for page turns could easily create a reader for musicians. It would catch on like wild fire and the music publishers could finally start to distribute good editions again. I have been saying this for years and no one listens, it is the usual routine with industry not seeing the forest for the trees that are still being cut to print m

Buy a cheap set of USB racing foot pedals and a micro-USB adapter and voila, you can probably already do that. At most you'd need a simple driver to expose the pedals as standard inputs and assign macros to them.

Forget everything you assume about whether or not there is a market for large format e-readers. Categorically there is

Categorically? Have you done any market research? Or are you just projecting your own desire (so strong that you've essentially posted off-topic to bring it up) onto everyone else, because you can't imagine why they wouldn't want the same thing?

A large format e-reader would be considerably heavier than a few dozen pages of sheet music. Yes, it could store more data, but that's not really going to be of much use to someone playing a fixed set. You can't fold it down the middle to save space. You can't make a

The only reason devices that can display printed sheet music like tablets and e-ink readers are not popular is that they are essentially useless for sight reading. A foot pedal for page turns could easily create a reader for musicians. It would catch on like wild fire and the music publishers could finally start to distribute good editions again. I have been saying this for years and no one listens, it is the usual routine with industry not seeing the forest for the trees that are still being cut to print music.

You clearly have done zero research. There are a number of options; the most popular I've come across is the AirTurn [airturn.com], although the Cicada [pageflip.com] works well too from what I've heard.

Strangely, most people seem to disagree with that very idea. Reading is not convenient on electronic devices. Paper still is the best medium for books. If I have the book, why would I want to read it digitally?

The one thing an electronic library is good for is rapid searching. If you need a vast amount of knowledge available at a fingertip, and on the road, not in your library, then it's great.

For everything else, I and most other people prefer to turn around, take the book from the shelf and look it up there.

It is often too difficult to read PDFs and scans on mobile devices. We could use software to identify individual words in the scanned page and reflow the text to match the narrow screen size of phones and tablets. The reflowed document would use the original images of the words; only the rows and pages would change. Then we could read without panning and zooming.
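
A rough Python sketch of that reflow idea, assuming the word segmentation has already been done and all we have is the pixel width of each cropped word image. The function name `reflow_rows` and the `gap` parameter are made up for illustration; real pages would also need handling for paragraphs, headings and figures.

```python
def reflow_rows(word_widths, screen_width, gap=8):
    """Greedily pack word images into rows no wider than `screen_width`.

    word_widths: pixel width of each cropped word image, in reading order
    gap:         pixels of space inserted between neighbouring words
    Returns a list of rows, each a list of word indices.
    """
    rows, current, used = [], [], 0
    for i, width in enumerate(word_widths):
        needed = width if not current else used + gap + width
        if current and needed > screen_width:
            # Word does not fit on this row: start a new one.
            rows.append(current)
            current, used = [i], width
        else:
            # A word wider than the screen still gets a row of its own.
            current.append(i)
            used = needed
    if current:
        rows.append(current)
    return rows


print(reflow_rows([100, 120, 90, 200, 50], screen_width=300, gap=10))
```

Each row of indices can then be rendered by pasting the original word images side by side, so no OCR accuracy is needed, only decent word boundary detection.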

"Strangely, most people seem to disagree with that very idea. Reading is not convenient on electronic devices. Paper still is the best medium for books. If I have the book, why would I want to read it digitally?"

Because you can select the typeface, the font size, the border, there's built-in bookmarks, there's a search function where you can jump from place to place containing the search expression, there's a built-in word explanation/translation/wikipedia search built-in, you can highlight passages without da

Hint: That's why thousands of bookstores are closed, because people prefer eBooks over paper ones.

No, thousands of bookstores are closed because people can select from a much wider selection from Amazon.

THIS. And, well, there's the fact that Amazon can basically undercut any actual physical bookstore's prices, without having to pay for as many facilities (more expensive in high-traffic areas), staff to deal with customers... and of course the fact that Amazon seemingly doesn't actually need to even make a profit (ever, really) to keep investors pouring in.

Physical bookstores obviously have a lot of trouble competing against something like that. Which is why so many have closed.

I prefer paper books. The advantage for me of ebooks is portability. Say you want to take a book with you on a trip. So you carry a book. What if you want to take two? Now you have either 2 books or a machine the size of one.

I personally have an ipad mini (gift from the company, so no monies from me). I commute by train (again paid for by the company) so I use that as a reader.

The obvious downside is that if you break the reader or do not have access to power, a book will be way better.

Wait till you get older, reading normal books gets almost painful. On my Kindle I can make the fonts as big as I want. I hardly read paper books any more since now that I'm over half a century old it's kind of tiring after more than 30 minutes. But I can go for hours on an e-reader

Ah, another aficionado of dead tree technology. I find reading long documents online is very tiring. That is why I prefer using dead tree technology by printing the document.

Dead tree technology has many benefits:
It never needs to be recharged.
It is very portable. Just toss it into your bag. No cords or power supply.
It is very easy to share with someone. Just hand the book to them. Remember to put your name in it.
It has a very user friendly user indexing system called "dog ear".
Simply fold a corner o

An eBook can also be displayed in whatever type size is desired (within reason). I have a relative with a degenerative disease that has affected her eyes. The only reason she can still read is that we gave her a Nook.

I read a lot of books from OpenLibrary [openlibrary.org] (an awesome resource for old books). Most e-books are offered for download in EPUB and PDF format. The PDF is a direct book scan, the EPUB is OCR'd from the scan. Invariably the EPUB is filled with errors caused by OCR - hyphenated words not joined back together, page numbers appearing in the middle of text, words autocorrected to something else, chapter headings screwed up etc. Sometimes the OCR gives up entirely.

It's simply easier to read the PDF although the file size is enormous and you're basically looking at images of some yellowing old book which means lots of panning and zooming particularly on small devices. And forget reading it on an e-reader.

So yeah I think you could automate scanning of books, but the second step of getting it into EPUB format is the tricky part.
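
Two of the OCR problems listed above (hyphenated words left split, stray page numbers in the text) can at least be attacked with simple post-processing. Here is a minimal Python sketch; the function name is hypothetical, and it makes two loud assumptions: any line consisting only of digits is a page number, and any line ending in a hyphen is a word split across lines. The second assumption will wrongly merge genuine compounds such as "well-" / "known", so a dictionary check would be needed for real use.

```python
import re


def clean_ocr_lines(lines):
    """Join OCR text lines into one string, dropping page numbers and
    rejoining words hyphenated across line breaks."""
    # Assumption: a line made up only of digits is a page number.
    kept = [ln.strip() for ln in lines if not re.fullmatch(r"\s*\d+\s*", ln)]
    text = ""
    for ln in kept:
        if not ln:
            continue  # skip blank lines
        if text.endswith("-"):
            # Assumption: trailing hyphen means the word continues on this line.
            text = text[:-1] + ln
        elif text:
            text += " " + ln
        else:
            text = ln
    return text


print(clean_ocr_lines(["The scan-", "ner reads", "42", "each page."]))
```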

I have scanned 100 books from my personal library and realized I can't find nice open source software to OCR the images and search over the text of the entire library for keywords. At some point I created my own clone of Google Books, with OCRopus for translating the images and my own front end for searching and hi-lighting keyword matches. It would be very useful if we had a way to manage searching in hundreds of books, taking notes and remembering the page/citation. It would work like a research library.
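
The search half of a setup like this can start out very small. Below is a minimal Python sketch of an inverted index over OCR'd pages, so a keyword query returns book and page citations. It is not the poster's code; `build_index` and `search` are hypothetical names, the OCR text is assumed to be already available as plain strings, and matching is exact per word (no stemming, no tolerance for OCR errors).

```python
import re
from collections import defaultdict


def build_index(pages):
    """pages: iterable of (book_title, page_number, ocr_text) tuples.
    Returns a mapping from lowercase word to the set of (book, page) it occurs on."""
    index = defaultdict(set)
    for book, page, text in pages:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add((book, page))
    return index


def search(index, query):
    """Return sorted (book, page) pairs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


library = [
    ("Book A", 1, "Page curl correction"),
    ("Book A", 2, "Foot pedal trigger"),
    ("Book B", 7, "Curl and page warp"),
]
idx = build_index(library)
print(search(idx, "page curl"))
```

Keeping the page number in the index is what makes the "remember the page/citation" part possible later.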

Do your high-quality Linux OCR solutions include one that allows me to:
1) select rectangular OCR areas of "image", "text", and "table" types for different OCR behavior;
2) add or subtract rectangular sub-areas to or from these areas;
3) OCR those areas while retaining basic character and paragraph formatting;
- and all of that using a stable GUI-based software?

I'm a professional technical translator who would like to be able to work on Linux. Being free as in speech/beer is not required, I'm prepared to pay th

This is just a camera and some CPU board for image processing and interfacing (Wifi, USB, HDMI). If they opened their algorithms, you could probably do the same with a RPi and its camera module (assuming there is no AF or aperture control built in).

Since this product gets free placement here at /., I figure it is okay to put in a word for the good folks at Distributed Proofreaders.

Books are scanned and [sometimes roughly] OCR'd.
Each and every word, period, hyphen, and ellipsis on each and every page is scrutinized by at least three proofreaders.
Each bold, italic, underline and indent is evaluated by at least two formatters.
The work is finalized in HTML, proofread as a whole, and published to Project Gutenberg in various formats, txt, pdf, html and epub.

The resulting publication typically has far fewer publishing errors than the original book. This is especially true of books from the 17th century where drinking was part of a typesetter's expectation.
Be a part of it.
Sign up at http://www.pgdp.net/c/ [pgdp.net]

I've had one of these for quite some time now, and it looks pretty much the same except more expensive and without the foot pedal option (great idea!)

The important thing is the software rather than the hardware which is meant to be able to detect the curvature of the pages on a bound book and adjust for it. It sort of works most of the time on the SV600 but it's not especially fast and neither is it entirely reliable.

I gave up on it mostly because the software for the Mac was pretty unreliable. I do note th

I have seen this kind of product (lamp-style scanner) for years, all from China (or produced there), under different brands or no name. Now I know where they copy from.
I didn't care much until I saw this appear on Slashdot with a title suggesting an innovation. I expected a scanner like the one that was introduced on Slashdot before:

I use 'scantailor' for post-processing scanned pages (nearly automatic: dewarp, cleaning, etc.), then use cuneiform to OCR (the output must be hOCR format data). It's faster and more accurate than tesseract but has not been updated since 2011. Then I convert to DJVU and embed the OCRed text layer into it.

There are a lot of things that simply aren't available on ebooks. And if I purchased the book and I'm using the pdf for my own use then it's not piracy. At least it's not morally wrong to me, and that's the only thing that matters as far as I am concerned.

how many of those do you reread on a regular basis? how many are so old you can buy them for a dollar or two in the kindle store? or simply put them into a wishlist and wait for the periodic sales to buy them for a dollar or two?
i have over a thousand books in my kindle collection. lots of classics are free. lots of books you can buy on sale and read later. the only one i've ever read more than once is A Song of Ice and Fire

When I read your title, I assumed you were going to comment on how cheap the device was because it breaks even after only 40 books. I didn't expect you to think a 40 count book collection would be considered large.

I've tried scanning some of my books with a camera. This is simply an overhead scanner with manual page turning; you can buy them already. Realistically, it probably takes around 2-3s to scan a page, so it's about 20 minutes to scan a 500 page book. That's a lot of time to sit at a table turning pages.

But let's say you're willing to put in the work. The hard part in making this work is the software, not some $200 digital camera on a stick. And the really hard part in making this work is not on books that ar

To me, the device looks like a camera on a copy stand [wikipedia.org].

My guess is that it uses a camera from a cell phone, some LEDs to provide illumination, and the foot pedal is the shutter trigger.

To scan, you hit the foot pedal to snap a photo, turn the page, hit the foot pedal again to snap another photo, turn the page, snap another photo, turn the page again, snap another photo, etc. Software then combines the photos into a scanned document.

Lots, and lots and lots of reasons to dis this offering here. No new tech, just buy eBooks at $10 a pop, who wants hundreds of books on a device, what's so hard about destroying a book, OSS software already exists that does this, etc. etc.

Here's the thing for me: I want a research library I can take where ever I go. I am a heavy research library book user, and I buy a lot of used books, trying to get out of print texts. When I need a book, I need that exact book, and no substitute will do, because none exis

Re "I want a research library I can take where ever I go." So true :)
Getting the distance, light and lens right makes the capture easier. A fast CPU and good software then take over to convert every word into text.
So many other solutions have difficult methods, resolution-restricted lenses, or huge bulky capture systems. Standalone software for the later OCR step might expect flat scanner pages, color corrected, with perfect text.
The good part about this system is the understanding of the shape of the boo

(Some) Luddites in academia will still object if you show up in class and pull out a tablet with the book digitized on it. The dead-tree-textbook-publishing racket will die a slow and painful death as the publishing professors and companies seek to maintain their monopoly. $400 for a "new" Calculus textbook printed this year when the previous edition of that same book was in print for only 2 years? In most other areas of life this would be called extortion.

And what about a xerox copy of the book? I remember at school: most of us bought copies of textbooks from a shady copy shop for about the same price as a paperback novel. Sometimes the copies were actually better than the real deal for studying because of the format they were printed on. And before you ask: yes, it was commercial scale piracy. But it shows that you don't need eBooks to counter the extortion.

The Indiegogo site says "Your sketches, paintings, and notes can be scanned and stored in the Czur cloud".
Do we have the option to use our choice of server (maybe local)?
What if I don't want everything that I scan going to a company in China?
What if one day the "Czur cloud" is gone - is the scanner then unusable?

Has anybody tracked down these answers? The product seems appealing if non-cloud, independent operation is allowed.

People have mentioned a number of important points like lighting. From the materials presented, there is no built-in lighting. The scans produced in the promotional video are horribly lit, with the top and bottom of the pages very dark, and the middle over-exposed. Horrible.

I would be rather dubious about getting adequate quality images for OCR without controlling the lighting better. (I also wouldn't consider trying a task like this without pretty good OCR. that is near enough a solved problem these d

Your concerns are valid. No, OCR does not handle mathematical formulas very well (in any that I have seen). But remember - OCR does not replace the image, it only augments it (at least if you're doing PDFs, I don't know about eBook formats). You are still reading the scanned image. OCR simply provides fast searching and indexing capabilities, a huge win. So formulas can be searched for? Well, in my experience, no math or science book includes formulas in its indices anyway so this is no different (the names of

I have seen some ads from Chinese companies, for example for their tablets: these can play HD video while multitasking without any lag, the touch response is amazingly fast, etc... AND the price is less than $99. ;)

About this scanner: they claim it can scan 300 pages in 5 minutes, which means 2 pages every 2 seconds (the scanner captures 2 pages at once). That's possible, but what I doubt is the quality of the output at that speed and price.
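
The throughput numbers quoted in this thread are easy to sanity-check. A minimal Python sketch follows; the function names are made up, and the 2.5 s figure is simply the midpoint of the "2-3s to scan a page" estimate from an earlier comment, not a measurement.

```python
def seconds_per_capture(pages, minutes, pages_per_capture=2):
    """Time available per shot if `pages` must be done in `minutes`,
    with each shot covering `pages_per_capture` pages."""
    captures = pages / pages_per_capture
    return minutes * 60 / captures


def scan_minutes(pages, seconds_per_page):
    """Total scanning time in minutes at a fixed per-page pace."""
    return pages * seconds_per_page / 60


print(seconds_per_capture(300, 5))   # the advertised 300 pages in 5 minutes
print(scan_minutes(500, 2.5))        # a 500-page book at an assumed 2.5 s per page
```

So the advertised rate leaves about two seconds per two-page shot for turning the page and pressing the pedal, while the more pessimistic per-page estimate puts a 500-page book at roughly 20 minutes.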