Posted by: RTT

What about a case where we have a textbook archived together with an archive of its CD content, where the scanner recognizes many of the file formats, for example *.txt, but there is ultimately no point in indexing them into the DB?

The archive-within-archive scan depth, which I just finished implementing, tells the scanner how many levels of archives inside archives it should scan. If, in the scenario you describe, these .txt files sit in an archive inside the main archive, then setting the scan depth can indeed exclude them from indexing and speed up the scan. If you just want to scan everything, though, the scan depth check ends up making the process slightly slower. Not by much, and the feature is indeed useful.
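To illustrate the idea, here is a minimal sketch of a depth-limited scan. It is not the actual scanner code: `Entry`, `DepthLimitedScan` and the in-memory tree are hypothetical stand-ins for real archive reading, and it assumes that a depth of 0 means "main archive only".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of an archive entry; not the scanner's real types.
final class Entry {
    final String name;
    final List<Entry> children; // null for a plain file, non-null for a nested archive

    Entry(String name, List<Entry> children) {
        this.name = name;
        this.children = children;
    }

    boolean isArchive() {
        return children != null;
    }
}

final class DepthLimitedScan {
    // Collects the names of plain files, opening nested archives only while
    // the current nesting level is below maxDepth (assumption: 0 = main archive only).
    static List<String> scan(Entry archive, int maxDepth) {
        List<String> found = new ArrayList<>();
        scan(archive, 0, maxDepth, found);
        return found;
    }

    private static void scan(Entry archive, int level, int maxDepth, List<String> found) {
        for (Entry e : archive.children) {
            if (!e.isArchive()) {
                found.add(e.name);
            } else if (level < maxDepth) {
                // This comparison is the extra per-archive check; it is cheap,
                // but it is still paid when everything is scanned anyway.
                scan(e, level + 1, maxDepth, found);
            }
        }
    }
}
```

With a main archive holding book.pdf plus a nested cd.zip full of .txt files, a depth of 0 would return only book.pdf, while a depth of 1 would also return the .txt files.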

Posted by: Padanges

What about a case where we have a textbook archived together with an archive of its CD content, where the scanner recognizes many of the file formats, for example *.txt, but there is ultimately no point in indexing them into the DB?

    if (fileName.indexOf('>') > 0) {
        // remove archive name tag
        fileName = fileName.substring(fileName.indexOf('>') + 1);
    }

After messing around I found out that it would not work properly depending on archive depth.

Currently our file name pattern is: <archive.zip>archive-inside.zip|document-inside.pdf . Wouldn't it be simpler if we had a pattern like this: <archive.zip><archive-inside.zip>document-inside.pdf ?

No. The current format makes it easy to parse with a simple split operation. Whatever comes after the main archive name is handled by the un-archive code, which receives it as the name of the file to extract. It splits that string and follows the resulting array in order until it reaches the last level, which is the file the caller requested.
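As a rough sketch of that parsing (not the actual un-archive code), the name can be cut at the first '>' and the remainder split on '|'. The class `ArchivePath` and its method names are made up for illustration; only the name format itself comes from this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for names like "<archive.zip>archive-inside.zip|document-inside.pdf".
final class ArchivePath {
    // Returns the main archive name, or null when the name carries no archive tag.
    static String mainArchive(String fileName) {
        int close = fileName.indexOf('>');
        if (!fileName.startsWith("<") || close < 0) {
            return null;
        }
        return fileName.substring(1, close);
    }

    // Returns the levels after the main archive tag, in the order they must be
    // opened; the last element is the file the caller requested.
    static List<String> innerLevels(String fileName) {
        int close = fileName.indexOf('>');
        // When there is no '>', close is -1 and the whole name is kept.
        String rest = fileName.substring(close + 1);
        // String.split takes a regex, so the '|' separator must be escaped.
        return Arrays.asList(rest.split("\\|"));
    }
}
```

For the example name above, `mainArchive` would give archive.zip and `innerLevels` would give archive-inside.zip followed by document-inside.pdf, so the nesting depth of a file is simply the number of levels minus one.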

Posted by: Padanges

    if (fileName.indexOf('>') > 0) {
        // remove archive name tag
        fileName = fileName.substring(fileName.indexOf('>') + 1);
    }

After messing around I found out that it would not work properly depending on archive depth.

Currently our file name pattern is: <archive.zip>archive-inside.zip|document-inside.pdf . Wouldn't it be simpler if we had a pattern like this: <archive.zip><archive-inside.zip>document-inside.pdf ?

Posted by: Padanges

Hi, is it possible to limit the depth of archives for document scanning? For example, I have an archive within an archive, and I would like to find only the documents that are directly in the primary archive. Is there a way to do that?