DIY Book Scanner


Hey guys, back with yet another question about ST (Scan Tailor). It's not a huge issue, but I've noticed that the automatic content selection is a bit aggressive and more often than not cuts off a few edges of the content. At the bottom this is usually the lower part of the letters "g", "y" or "p"; at the right it can be the right part of the letters "e", "r" or "f". At the top (or bottom) it can be the edges of the page numbers.

Here's an example:

Notice how the very bottom part of the letter "g" is cut off slightly, and after I've adjusted the selection box manually it's restored.

My question is whether there is a way to adjust how "aggressive" ST is in selecting content. It's a bit cumbersome going through every page adjusting the content selection manually, but that's what I've been doing, since I've been unable to find any automatic way to prevent the edges of the text from being cut off.

FWIW, I work around this (and a few other) auto selection issues by not using the content auto-selection at all (I kept losing footnotes or image portions).

If your pages are fairly close to the same position in the scan field (I use a form feed scanner so they're always relatively close) you can get very good results by selecting just inside the actual physical page margins on one page and then applying that selection to all pages. If your pages alternate or you want better text centering, you can select on one page and apply to "every other" page, and then do the same thing on the following page.

Using that method, I only have to do manual adjustment for a handful of pages that have unusually large images or are out of alignment with the rest. Remember to turn off the extra marginal padding if you use this method.
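To make the "apply to all" versus "every other" idea concrete, here is a minimal Python sketch of the bookkeeping only. It does not talk to Scan Tailor; the function name `assign_boxes` and the `(left, top, right, bottom)` box format are assumptions made up for illustration.

```python
def assign_boxes(num_pages, first_box, second_box=None):
    """Return one content box per page.

    Boxes are hypothetical (left, top, right, bottom) tuples in pixels.
    With only first_box, every page gets the same box ("apply to all").
    With second_box, pages alternate between the two boxes, which mirrors
    selecting on one page, applying to "every other" page, and repeating
    on the following page.
    """
    if second_box is None:
        return [first_box] * num_pages
    return [first_box if i % 2 == 0 else second_box for i in range(num_pages)]


# Example values are invented: left-hand and right-hand pages sit at
# slightly different horizontal offsets in the scan field.
left_page = (120, 150, 1480, 2250)
right_page = (160, 150, 1520, 2250)
boxes = assign_boxes(4, left_page, right_page)
```

Pages that still need manual attention (large images, misaligned scans) would simply get their entry in the list overridden by hand.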

Another alternative, with some similarity to alraban's suggestion, is to use BookCrop to crop all pages first, and then use Scan Tailor Enhanced (an older fork, not the same as Scan Tailor Experimental) with a script to process them from the command line, treating the whole page as content. With that method there are no manual steps inside ScanTailor at all, only one in BookCrop. Works well for text-only scans.

or do I go insane working through hundreds of pages manually selecting content?

I do an automatic selection first, and then check one page after the other and apply manual corrections if needed.

Same with page separation or skewing.

Yeah, that's what I've been doing too. The problem is that every page needs corrections on all 4 sides, so for a 200-page book you basically need to do 800 margin adjustments in order to prevent clipping of the edges. Add that to the usual necessary adjustments and suddenly you need to spend a LOT of time on each book simply because the automatic content detection has a small flaw.

According to Tulon there is apparently a simple fix, but without knowledge of C++ it is beyond my abilities. Maybe one day I can learn some basic C++ and fix it, if nobody else picks up the mantle.

EDIT: Oh hold on, I just saw that somebody is developing Scan Tailor Advanced, a new version of ST! This is amazing, perhaps the problem is fixed/can be fixed there! Excuse me while I try out this new version.

In that case I think that you are worrying too much. It's OK when the content box is drawn very tight.

I have to do corrections on some pages when a spot is erroneously taken for a piece of text, so that I have to draw the content box closer to the actual text. Or when there is a title page with a graphic frame, and the automatic guessing takes the frame for a dark shadow to be removed.

Well, the bug does result in parts of letters being unnecessarily cut off on every single page, simply because the content box is erroneously drawn at a lower resolution. This is not something that happens just some of the time... it happens all the time, on every single page. I guess we're all different; to me it's an issue.
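For anyone wondering how a box "drawn at a lower resolution" can shave off descenders, here is a minimal one-dimensional Python sketch. It assumes the box is found on a downscaled copy of the page and then scaled back up. This is not Scan Tailor's actual code; the function names, the 0.5 darkness threshold and the scale factor of 4 are all invented for illustration.

```python
def detect_extent_lowres(dark_rows, scale, min_fraction=0.5):
    """Find the content extent on a downscaled copy, in full-res rows.

    dark_rows[r] is True if full-resolution row r contains ink.
    Each low-res row stands for `scale` full-res rows and counts as
    content only if at least `min_fraction` of them are dark, so a thin
    sliver of ink at the edge of a block can be missed entirely.
    Returns (first_row, last_row) inclusive, or None if nothing is found.
    """
    n_blocks = (len(dark_rows) + scale - 1) // scale
    hits = []
    for b in range(n_blocks):
        block = dark_rows[b * scale:(b + 1) * scale]
        if sum(block) / len(block) >= min_fraction:
            hits.append(b)
    if not hits:
        return None
    first = hits[0] * scale
    last = min(len(dark_rows), (hits[-1] + 1) * scale) - 1
    return first, last


def pad_extent(extent, scale, length):
    """Grow the extent by one low-res row on each side, clamped to the page.

    For contiguous content this is enough to recover a missed edge block.
    """
    first, last = extent
    return max(0, first - scale), min(length - 1, last + scale)


# Ink really spans rows 11..20 of a 40-row strip.
dark_rows = [11 <= r <= 20 for r in range(40)]
detected = detect_extent_lowres(dark_rows, scale=4)  # (12, 19): rows 11 and 20 are lost
padded = pad_extent(detected, scale=4, length=40)    # (8, 23): covers the ink again
```

In this toy case one row is lost at the top and one at the bottom, which is roughly the "bottom of the g" effect described above. Padding the box outward by one low-res pixel is one possible workaround; whether the real fix looks anything like this is a guess.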