Hi,
I have a few shares indexed on our SAN which total about 12M files and 500K folders. I need to re-scan them daily to update the index.
Having the shares on Indexes->Folders, the re-indexing takes about 16h (and frequently crashes mid-way). Change monitoring is OFF.

I analyzed network accesses during both scans, and the root cause seems clear:
- EFU scanning does a depth-first scan of the tree
- Folder scan does a breadth-first scan of the tree

Breadth-first is extremely inefficient on modern storage because it does not take advantage of folder caching. Imagine a very reduced folder set and a disk with a very small cache, only big enough to hold 5 folders:
Depth first:
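To make the two traversal orders concrete, here is a minimal Python sketch over a made-up folder tree (the tree below is an illustrative stand-in, not the original example):

```python
from collections import deque

def depth_first(tree, root):
    """Visit folders depth-first: each subtree is finished before moving on,
    so the storage cache keeps serving the same few parent folders."""
    order = [root]
    for child in tree.get(root, []):
        order.extend(depth_first(tree, child))
    return order

def breadth_first(tree, root):
    """Visit folders breadth-first: all siblings first, children much later,
    so a small cache has likely evicted the parent by the time we return."""
    order, queue = [], deque([root])
    while queue:
        folder = queue.popleft()
        order.append(folder)
        queue.extend(tree.get(folder, []))
    return order

# A tiny tree standing in for the "very reduced folder set" above.
tree = {
    "share":   ["share/a", "share/b"],
    "share/a": ["share/a/1", "share/a/2"],
    "share/b": ["share/b/1"],
}

print(depth_first(tree, "share"))
# ['share', 'share/a', 'share/a/1', 'share/a/2', 'share/b', 'share/b/1']
print(breadth_first(tree, "share"))
# ['share', 'share/a', 'share/b', 'share/a/1', 'share/a/2', 'share/b/1']
```

With depth-first, consecutive accesses stay within one subtree; with breadth-first, the scanner keeps jumping between distant parts of the tree, defeating a small cache.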

Thanks, awesome RTT from you!
I'll test this new build and let you know the scan time.

For the crashes: Everything would just freeze with no error, forcing me to kill the process. I've had Debug logging enabled for some time, but not anymore - I'll re-enable it and try to reproduce. I remember that on one occasion the debug log ended with something similar to this:

Btw, the printed scan duration times are wrong, as you can see from the timestamps.
Another annoying issue this exposed: after a successful Folder rescan, the DB is not saved to disk. So if Everything crashes, that scan is lost - it just reloads the old database.

Well, it seems index time is similar for me with this version, though I can see in the log that it now does depth-first. There goes my "cache" theory...

Is it possible that the cost of adding new entries to the index keeps growing as the index grows? With 12M files, 500K folders and a tree depth of 25, this seems possible. I know there's also interference from other processes running overnight on the filers (backup, sync), but even so, creating an EFU file is much faster during the same time period:

That's a massive 7x improvement over Everything EFU and a 32x improvement over Everything Folder indexing. This is the solution I've been using for the last few days. I've compared EFUs produced by my tool in update mode with Everything's full scan, and they match.
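For context, an EFU file is just a CSV index that Everything can import. Here is a minimal Python sketch of producing one, assuming the commonly documented EFU columns and Windows FILETIME timestamps (100-ns intervals since 1601-01-01); the Attributes column is left at 0 for simplicity:

```python
import csv
import os

EPOCH_DELTA = 11644473600  # seconds between 1601-01-01 and 1970-01-01

def unix_to_filetime(ts):
    """Convert a Unix timestamp to a Windows FILETIME integer
    (100-nanosecond intervals since 1601-01-01)."""
    return int((ts + EPOCH_DELTA) * 10_000_000)

def write_efu(root, out_path):
    """Walk 'root' (os.walk is depth-first, top-down) and write an
    EFU-style CSV that Everything can load via File > Import."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["Filename", "Size", "Date Modified", "Date Created", "Attributes"])
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                w.writerow([full, st.st_size,
                            unix_to_filetime(st.st_mtime),
                            unix_to_filetime(st.st_ctime), 0])
```

Note this is a sketch, not the author's tool: on non-Windows systems `st_ctime` is the inode change time rather than the creation time, and a real implementation would fill in the file attribute flags.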

Last edited by zybexXL on Wed Mar 06, 2019 5:22 pm, edited 1 time in total.

Please check the log time between "rescan update db" and "leave folder update".
During this time Everything is processing the folder rescan.
I think you will see Everything is spending most of its time here.

Are there many changes to your files over 24 hours? It seems to me you must have millions of changes for Everything to take 10+ hours to update the db.

I think you will also find it faster for Everything to perform a full re-index than to update the db.

Please try disabling folder rescanning and schedule a reindex:

In Everything, from the Tools menu, click Options.

Click the Folders tab.

For each folder listed:

Check Never rescan.

Click OK.

Create a Windows scheduled task to run the following command:
Everything.exe -reindex

The fast scanning method

I've added to my TODO list: use FindFirstFileEx to find folders only to build a list of modified folders to rescan during a folder update.
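That approach can be sketched as follows. This is a cross-platform approximation using Python's os.scandir (on Windows the folder-only pass would use FindFirstFileEx), and `known_mtimes` is a hypothetical snapshot recorded by the previous scan:

```python
import os

def find_modified_folders(root, known_mtimes):
    """Enumerate folders only (much cheaper than listing every file) and
    return those whose modification time changed since the last scan.
    'known_mtimes' maps folder path -> mtime recorded by the previous scan.
    Only the returned folders need a full file rescan afterwards."""
    modified, stack = [], [root]
    while stack:                       # depth-first, matching the EFU scanner
        folder = stack.pop()
        try:
            entries = list(os.scandir(folder))
        except OSError:
            continue                   # folder vanished or access denied
        mtime = os.stat(folder).st_mtime
        if known_mtimes.get(folder) != mtime:
            modified.append(folder)    # new or changed: rescan its files
        for e in entries:
            if e.is_dir(follow_symlinks=False):
                stack.append(e.path)
    return modified
```

A folder's mtime changes when entries are added or removed directly inside it, which matches the new/deleted-file churn described above; files modified in place do not bump the parent folder's mtime, so those still need change notifications or a full pass.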

Please check the log time between "rescan update db" and "leave folder update".
During this time Everything is processing the folder rescan.
I think you will see Everything is spending most of its time here.

Nope - here are those timestamps, for each of the 5 shares that are being scanned:

Total scan time was 15:49h, of which only about 4 minutes were spent on "rescan update DB".
I'll try to anonymize the full debug log and send it to you. My company is quite paranoid about these things, and the filenames/paths in our case do reveal quite a lot of info.

Are there many changes to your files over 24 hours? It seems to me you must have millions of changes for Everything to take 10+ hours to update the db.

No, it can reach some 100,000 daily changes max, usually much lower. Some days when lots of data is compressed/archived it may go up to 500k (mostly deleted files), but that's rare. File churn is really high here... most changes are either new or deleted files, not modified files.

I think you will also find it will be faster for Everything to perform a re-index than update the db.
(...)
Create a Windows scheduled task to run the following command:
Everything.exe -reindex

I'll try this. But does "reindex" also perform a folder scan? Isn't this the same thing?

"I've added to my TODO list: use FindFirstFileEx to find folders only to build a list of modified folders to rescan during a folder update."

Great! You can check my code for the algorithm, though I didn't comment it much. It's a fairly simple concept.

There's always some load on these filers, but I don't have admin access to them to quantify it. They are however massive beasts, with huge IOPS capability. During a 16h scan time there are bound to be other processes running, like volume snapshots, backups and rsyncs, but that doesn't impact performance as much as it sounds.

Everything itself is running on a VM (doing pretty much nothing else), which I've seen can be noticeably slower for network I/O than, for instance, my desktop PC. I've measured some 20-30% speed difference, but not more than that. Perhaps VMware impacts low-priority threads more severely, but I don't know.

@void
Any ETA for implementing the faster rescan mentioned by @zybexXL?
Thanks for your great tool, and thanks also to zybexXL for the SMB rescan optimizations - they have a big impact not only with huge numbers of folders/files, but also on shares (networks) with high RTT!
Thanks!

Great, thanks! I'll check this thread and the beta one when it's released.
Btw, from a technical point of view, what was the difference between your old SMB (re)scanning code and the "optimized" one? Was the problem only cache hits and depth-first/breadth-first reading, or is there something more, like not traversing folders whose modification date hasn't changed?
Thank you very much

Btw, from a technical point of view, what was the difference between your old SMB (re)scanning code and the "optimized" one? Was the problem only cache hits and depth-first/breadth-first reading, or is there something more, like not traversing folders whose modification date hasn't changed?

There is little performance difference between the two versions; theoretically depth-first should be faster, but in real-world scenarios there's no noticeable performance gain.

Everything-1.4.1.935 and earlier would scan in the following order:
c:\folder1
c:\folder2
c:\folder3
c:\folder1\subfolder1
c:\folder1\subfolder2
c:\folder2\subfolder1
c:\folder2\subfolder2
c:\folder3\subfolder1
c:\folder3\subfolder2

Everything-1.4.1.936 and later would scan in the following order:
c:\folder1
c:\folder1\subfolder1
c:\folder1\subfolder2
c:\folder2
c:\folder2\subfolder1
c:\folder2\subfolder2
c:\folder3
c:\folder3\subfolder1
c:\folder3\subfolder2

Thanks guys
That is exactly what I asked for, addressed in void's last sentence.
I know about the change between 935 and 936, but from your discussion with zybexXL I know the rescan speed difference was minimal, as the problem was in thread priority and not in depth-first vs breadth-first reading as such.
My question was about the difference between your old code and zybexXL's latest one, as my suspicion was that the main gain would be "not traversing folders whose modification date hasn't changed" - but I wasn't sure whether that alone could make such a huge speed difference, or whether there were other optimizations not mentioned here.
Your last sentence confirmed my assumption (skipping non-modified folders is the real gain here).
Thanks!