Inefficient scanning is a big issue for me, as I am trying to go paperless in a much smaller office space. I eventually took the time to track down the ongoing problem of large file sizes. I ran two trial scans of a two-page document, one at 75dpi and another at 100dpi, both in grayscale. Both files came out at 350KB, which seemed awfully suspicious. From a web search, I found that I could use Adobe Pro to extract the image files contained within the PDF files. I'm not in the office right now, but from what I can recall, each page image was in the low tens of KB at most, if not below 10KB, and lower pixel density did in fact yield smaller image files.

Xerox's 350KB seems to be a minimum PDF container size, regardless of the images contained therein. That's more than an order of magnitude bigger than the actual scanned image files therein. I could use analytical software to further decrease the bits per pixel so that the files were *well* below 10KB without noticeable degradation in readability. You don't need many levels of grayscale for scanned hand notes.
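For readers who want to see what "decreasing the bits per pixel" amounts to, here is a minimal Python sketch. It is not the Matlab workflow used in this thread. The function name `quantize_and_pack` and the synthetic test page are illustrative assumptions, and the sketch treats the image as one flat run of samples (real image formats usually pad each row to a whole byte).

```python
import zlib

def quantize_and_pack(pixels: bytes, bits: int) -> bytes:
    """Reduce 8-bit grayscale samples to `bits` bits each and pack them tightly.

    `pixels` holds one byte per pixel (0 = black, 255 = white).
    `bits` must be 1, 2, or 4 so that samples divide evenly into bytes.
    """
    if bits not in (1, 2, 4):
        raise ValueError("bits must be 1, 2, or 4")
    per_byte = 8 // bits
    shift = 8 - bits  # keep only the top `bits` bits of each sample
    out = bytearray()
    for i in range(0, len(pixels), per_byte):
        chunk = pixels[i:i + per_byte]
        packed = 0
        for p in chunk:
            packed = (packed << bits) | (p >> shift)
        # Left-align a short final chunk so earlier pixels stay in the high bits.
        packed <<= bits * (per_byte - len(chunk))
        out.append(packed)
    return bytes(out)

# Synthetic "page": white background with one dark horizontal stroke.
width, height = 200, 100
page = bytearray([255]) * (width * height)
for x in range(20, 180):
    page[50 * width + x] = 30
raw8 = bytes(page)

packed2 = quantize_and_pack(raw8, 2)  # 4 gray levels instead of 256
print("raw bytes:", len(raw8), "packed bytes:", len(packed2))
print("after zlib:", len(zlib.compress(raw8)), "vs", len(zlib.compress(packed2)))
```

At 2 bits per pixel the uncompressed data is a quarter of the 8-bit size. How much that helps after compression depends on the page content, so it is worth testing on real scans.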

The scanning capability is of limited utility for going paperless if it creates such unnecessarily large files. Users need to be able to scan documents, often notes of a few pages, without a second thought. In an operational setting, it is impractical (and often impossible) to shuttle documents to other networks where analysis tools can be brought to bear to recreate the scanned documents at a sensible size. Is there actually a solution to this?

As well, is there a way to unmark the response as the answer in the thread linked to above?

Re: PDF files from scan are much larger than the images files therein

You sound very knowledgeable, so this is an unlikely solution, but is it possible that your scan templates default to ‘searchable PDF’, which would increase file size? Also, have you considered the compression settings and other defaults that may increase file size? If the notes are on coloured or grey paper, you could suppress the background where possible.

Re: PDF files from scan are much larger than the images files therein

@JoeDaft: I'm actually not knowledgeable at all about the equipment or image processing. I just spent a lot of time experimenting and reading up on the image handling features of Matlab.

As for scan templates and defaults, I'm not sure how practical it is for a mere user to get into that. Ours is a very corporate environment, and users send print jobs to a queue, then log into one of many printers to print them out. A lot of configuration is done by IT staff, and users only have the scan options accessible via the user panel on the machine. I contacted IT today, but I don't have visibility into the goings-on. I don't know whether and when it will be addressed.

There is another challenge too. If there is a way to access parameters other than those on the control panel on the machine, it wouldn't be practical for me to use that unless the "how" is obvious. This is simply due to the reality that the time available for these detours is limited (and I'm pretty sure I burned up that slack many hours ago).

Re: PDF files from scan are much larger than the images files therein

The defaults on the templates don’t need to be user changeable. It could be that IT have configured these in a way that increases file size such as the points noted above so check with IT or your supplier who can assist.

The ConnectKey devices are extremely productive, but an incorrect set-up can cause problems, like anything else. I would be inclined to raise a service call and organise training. Your supplier should be able to assist remotely.

Re: PDF files from scan are much larger than the images files therein

Everything is changed on the web interface of the printer; this is the same as pretty much any network printer of any brand.

You can change things at the machine on a per-scan basis, but changing the defaults is done on the web interface (from here on I will refer to it as CWIS).

To make your sizes smaller, you disable the things that make the scan larger. The first thing you would want to do is disable the OCR (Searchable) option, because embedding the fonts increases the size dramatically. That trade-off should suit you, since you don't seem to care about print quality and just want a good on-screen look (a small file size is OK on a PC, but it is awful in print).

So take the printer's IP address and put it in the address bar of your web browser to load CWIS.

Go to the Scan tab and select the template you want to change on the left (1) then scroll down on the right, increase compression and decrease file size (2) and disable searchable (3). You will also notice many options for encoding, I can't tell you what is best for your usage as I don't know your usage, so you will just need to test it.

Once you figure out what is needed, apply it to the other templates as needed. To make these settings the default for any templates created in the future, go to CWIS > Properties > Services > Workflow Scanning > Default template and apply the same changes there.

Current firmware levels are also able to produce somewhat smaller files, so update your printer to this version.


Re: PDF files from scan are much larger than the images files therein

Hello, JoeDaft and Joe053204-xrx,

Our IM setup is geared to a large enterprise. Users can't access individual printers on the network. Jobs are sent to a central network printing queue, then users go to individual printers to have it print jobs from the central queue. There is no network path info provided for the individual printers.

I have asked IT staff to look at templates, but so far, they are quite comfortable with 350KB minimum sizes because they can be successfully opened. I have explained my concern, which is how such files quickly accumulate into megabytes, and if sent by email, they also bog down mailboxes.

I also looked at some of the factors affecting scan file size on the user panel on the physical printer. There is no control for OCR, and the compression is "medium". My understanding is that compression is usually JPG based, which means it is good for images with colour and grayscale gradients, but not great for things like schematics and text.

As I describe in the original post, however, the image files themselves are small. The PDF container file is wasteful of space. So image compression is probably not the major culprit, though as I said, it may be a partial culprit just based on the fact that if I repackage the images into PDF, the file size is one fifth of the scan file size.
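One rough way to check this without Adobe Pro is to measure how many bytes of a PDF sit inside stream objects (where image data lives) versus everything else. The Python sketch below is a heuristic under stated assumptions: the name `stream_report` is illustrative, the byte pattern can be fooled by unusual files, and it counts all streams (images, fonts, colour profiles, metadata), not just images. The sample bytes are a synthetic stand-in, not output from any Xerox device.

```python
import re

# Matches the payload between "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def stream_report(pdf_bytes: bytes) -> dict:
    """Summarise how much of a PDF is stream payload versus other bytes."""
    payloads = [m.group(1) for m in STREAM_RE.finditer(pdf_bytes)]
    stream_bytes = sum(len(p) for p in payloads)
    return {
        "file_bytes": len(pdf_bytes),
        "stream_count": len(payloads),
        "stream_bytes": stream_bytes,
        "largest_stream": max((len(p) for p in payloads), default=0),
        "other_bytes": len(pdf_bytes) - stream_bytes,
    }

# Tiny synthetic stand-in for a PDF, just to show the shape of the report.
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Length 5 >>\n"
    b"stream\nHELLO\nendstream\nendobj\n%%EOF\n"
)
report = stream_report(sample)
print(report)
```

On a real scan, you would read the file with `open(path, "rb").read()` and pass the bytes in. If a file holds many more streams than pages, or one very large stream that is not a page image, that would point to extra embedded objects rather than the page images as the source of the bulk.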

As for OCR, I'm not too familiar with how that is done in PDF, but if there is no recognizable text, I wonder if space would be wasted on font information within the PDF file. That's the case with my scans -- hand drawn schematics and hand scrawl, unlikely that there would be recognizable text.

Re: PDF files from scan are much larger than the images files therein

@Joe053204-xrx: I've referred our expert to this thread, and I will also point out the specific post in which you provide pictures of the CWIS controls. Meanwhile, he provided me with the following screen shots of the controls that he was able to find. None of them seem to be the answer. I can access many of them as a normal user at the printer/scanner/copier panel.

Re: PDF files from scan are much larger than the images files therein

Our IT person confirmed that our firmware is up-to-date and that OCR is disabled. I manually tried high compression and found that it made very little difference (the warning displayed to me also said that it has minimal impact on non-photographic content).

I am currently sending the scans to another network, extracting the images, reducing the grayscale levels, and recombining them into a much, much smaller PDF. Looking ahead to the aimed-for paperless office, I find that the scan capability's value is severely compromised by the onerousness of this requirement.
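To give a sense of how little a PDF wrapper itself needs to add, here is a minimal Python sketch that wraps one grayscale image in a single-page PDF using only the standard library. It is not what the scanner or Adobe Pro does internally. The function name `build_gray_pdf`, the 100 dpi page size, and the blank synthetic page are assumptions for illustration, and Flate (zlib) compression is used because it is lossless and available without extra libraries.

```python
import zlib

def build_gray_pdf(samples: bytes, width: int, height: int,
                   bits: int = 8, dpi: int = 100) -> bytes:
    """Wrap packed grayscale samples (top row first) in a one-page PDF."""
    row_bytes = (width * bits + 7) // 8  # PDF pads each image row to a whole byte
    if len(samples) != row_bytes * height:
        raise ValueError("sample data does not match width/height/bits")
    image = zlib.compress(samples, 9)
    page_w, page_h = width * 72 / dpi, height * 72 / dpi  # page size in points
    # Content stream: scale the unit square to the page and paint the image.
    content = f"q {page_w:.2f} 0 0 {page_h:.2f} 0 0 cm /Im0 Do Q".encode("ascii")

    def stream_obj(entries: str, data: bytes) -> bytes:
        head = f"<< {entries} /Length {len(data)} >>\nstream\n".encode("ascii")
        return head + data + b"\nendstream"

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_w:.2f} {page_h:.2f}] "
         "/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>"
         ).encode("ascii"),
        stream_obj(f"/Type /XObject /Subtype /Image /Width {width} "
                   f"/Height {height} /ColorSpace /DeviceGray "
                   f"/BitsPerComponent {bits} /Filter /FlateDecode", image),
        stream_obj("", content),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"
    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")  # 20 bytes per entry
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_at}\n%%EOF\n").encode("ascii")
    return bytes(out)

# Synthetic blank page, 800 x 1000 pixels at 8 bits per pixel.
width, height = 800, 1000
blank_page = bytes([255]) * (width * height)
pdf = build_gray_pdf(blank_page, width, height)
overhead = len(pdf) - len(zlib.compress(blank_page, 9))
print("PDF bytes:", len(pdf), "container overhead:", overhead)
```

In this sketch the container overhead (everything except the compressed image data) comes to less than a kilobyte per single-page file. That is consistent with the observation that a repackaged PDF can be far smaller than the scanner's output, though it does not show what the scanner is actually adding.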

As I have described, the scanner's original PDF is much bigger than the sum of the sizes of the image files contained therein. Setting aside the reduction in grayscale levels, it therefore looks like a large contributor to the problem is the manner in which the scanner assembles the original PDF from the scanned images.