I have a question about enumerating a directory with 100,000 or more files.

Basically, the CBFS drive is capable of displaying even 200,000 files, but it takes quite a long time to enumerate them all.

I have tested the same number of files on an SMB share drive.
What the SMB drive does is add files to Explorer's file list on the fly (is this "NotifyDirectoryChange"?).
For example, when you double-click the target folder, Explorer is not blocked.
You can open, read, and write files while the number of files in the folder continues growing, until all of them are enumerated.

CallbackFS should work similarly to SMB. Windows performs directory enumeration in the following way:
1. Opens the directory.
2. Performs several "enumerate directory" calls to enumerate the files in the directory.
3. Closes the directory.

These "enumerate directory" calls are not blocking calls. I.e. in parallel it's possible to do any other operations with files/directories on the disk. But it's necessary to have free worker threads for it (the CallbackFileSystem.ThreadPoolSize property must be greater than 1, but maybe it's better to set it to 10 or even more) and the CallbackFileSystem.SerializeCallbacks property must be set to false.


Ulughbek Muslimov wrote:
For example, when you double click the target folder, Explorer is not blocked.

Perhaps Explorer is blocked because your directory enumeration is too slow. During one enumerate directory request, which Explorer performs synchronously, it requests several files at once (it's actually the ZwQueryDirectoryFile API call). But CallbackFS, in order to simplify implementation of the user's CallbackFS callbacks, calls the OnEnumerateDirectory callback for each file being enumerated. So Explorer waits until several files (usually about 1 to 10, depending on the buffer size passed to ZwQueryDirectoryFile) have finished enumerating. Try to make the OnEnumerateDirectory callback as fast as possible.
Another possible reason is that you use a local type of mounting point (i.e., any one except those created with the CBFS_SYMLINK_NETWORK flag). In this case Explorer "thinks" that the disk is local (i.e., fast) and during enumeration also opens each enumerated file and reads its thumbnail.

But, in our case, we have to deal with the POSIX readdir() method in DirectoryEnumerationContext() to return mFileList.
This is where our drive blocks.

Is it safe to return mFileList with, let's say, 10,000 FileInfos first, and push the FileInfos from 10,001 onward to the end of mFileList using a worker thread?
What negative issues may come out from using a worker thread inside callbacks?

Ulughbek Muslimov wrote:
What negative issues may come out from using a worker thread inside callbacks?

Explorer will be "frozen" until the OnDirectoryEnumeration callback finishes.


Ulughbek Muslimov wrote:
Is it safe to return mFileList with, let's say, 10,000 FileInfos first, and push the FileInfos from 10,001 onward to the end of mFileList using a worker thread?

In the case of 10,000 files it seems OK. I'm not good at .NET, but as I understand it, the GetFileSystemInfos method allocates an object for each enumerated file. Let's suppose each object is about 50 bytes (~15 characters for the file name plus 20 bytes extra). So 50 * 10,000 = 500,000 bytes, which is not much for desktop/server systems.

I had a similar problem while enumerating directories, and the best solution I came up with in terms of performance and stability was to use the native API to enumerate directory contents. It's much faster and gives you an enumeration context for the directory, something .NET itself lacked.

I haven't used the latest .NET additions to directory enumeration, but it seems to me that the problem will remain in your case, since you still can't have a real enumeration context with them. You should check out the FindFirstFile function on MSDN and build a custom enumeration context based on it.

I agree with you that the native API is much faster than .NET when it comes to file I/O.
But, the thing is, I do use a native API library.
The library is used to communicate with our Linux Servers.

I have solved the problem partially.
What I did is have DirectoryEnumerationContext() return mFileList with only file names in it, which is really fast.
All other file info is retrieved in the EnumerateDirectory() callback.

Originally, mFileList returned full file info, which took a long time to construct, because in our native library you first get the file name (readdir()) and then request the file statistics (statfile()) for that name.
If those two were combined in DirectoryEnumerationContext() alone, it would take at least twice as long to enumerate the directory (100,000 requests for file names + 100,000 requests for file stats).
Moving statfile() to EnumerateDirectory() has solved the problem. It still takes a lot of time to enumerate all 100,000 files, but that is not CBFS's problem.
The good thing is, Explorer is not "freezing" anymore. It is possible to open, read, and write files while the number of files in the given directory continues to grow.
