
I've got many files... and by many, I really mean MANY... close to 3 million files... might even be close to 4.

How do you manage this many files, and how do you structure them on the drive so as not to lose performance? I split my drive into several smaller ones, thinking that might help... but I have no facts behind this. Recently I merged some of them into one bigger drive, and performance is still OK.

Say I've got 2TB of space... hm, would it be OK to have everything under one fat drive letter? (fat as in obese, not FAT :] )

What if my space grows, say to 4TB (still talking about one physical drive)? Would it still be OK to have just one drive letter?

Partitioning was a big thing in the 1990s due to the 2GB partition size limit, then with FAT32 it went away. With modern laptops not having optical drives we see the resurgence of partitions for storing recovery data that should remain hidden from the user.

Anyway, spreading files over partitions does not help performance.

I see, cool, because I'm kinda sick of partitioning right now. In the next PC setup, I'll probably do a small 30GB C: and one other partition for everything else.

"we see the resurgence of partitions for storing recovery"

What do you mean, resurgence? Every single laptop I've owned since the late 90s has had this, regardless of whether they also included CDs or not. It's hardly a resurgence if it never went away.

Personally, I still use at least two partitions: one for the OS and applications, and one for data. This is a habit I started when I first joined the Windows 2000 beta program, because it allows me to wipe and reinstall the OS without having to think about
whether I have all my data. I never put anything that's not easily recoverable from some other source on the system partitions, so I always know it's safe to format that partition.

I also don't like combining drives into one partition, so I currently have several drive letters for my three separate physical drives. This is more because I fear that if one of them failed, I'd lose the data on the others too. If you do want to combine them, and it's performance you're after, I say go the whole nine yards and use RAID 0.

Unlike with FAT, NTFS cluster size does not grow with partition size. There may be some performance impact from the growth in size of the MFT, but it's minimal. You should, however, keep the number of files in a single directory low; NTFS does get slow if that number gets too big.

"You should however keep the number of files in a single directory low, NTFS does get slow if that number gets too big."

...that was my next question. I see.

Is there any theoretical "max" number of files one should have inside one folder to keep performance "good"? My guess is less than 100K, or would it be even less? Like less than 20K?

I don't know what the exact number is. I'd try to keep it under a thousand personally if only for my own sanity in trying to find stuff, and because Explorer will probably start suffering before NTFS itself does.

If you had three million files, and you wanted to keep them in directories containing fewer than 1,000 files, you could arrange them by file hash. Create a directory called Hash or something like that, create hex-named directories from 00-ff, and for each of those create the same set. Hash the files and move them into the directories by the first two bytes of the hash. Or is this silly?

-Josh

That's silly. If you had three million files, you'd organise them by content and subject matter. I doubt anyone has three million files of related data.

Of note: why doesn't the Disk Usage-o-meter say how much space is taken up by the filesystem itself?

...and why can't the filesystem be held in-memory? That way filesystem traversal would be instantaneous.

"I'd try to keep it under a thousand personally if only for my own sanity in trying to find stuff, and because Explorer will probably start suffering before NTFS itself does."

NTFS uses B-trees for its directory structure, so you should be able to put millions of files in a single directory without the time to open a single named file increasing significantly. Of course, Windows Explorer will become slow and use a lot of memory, and don't even think about sharing that many files over SMB. But NTFS itself is fine with it.
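
A quick way to sanity-check that claim is a toy benchmark (at a far smaller scale than millions of files; absolute timings will vary with filesystem, hardware, and cache state):

```python
import os
import tempfile
import time

def named_open_time(n_files: int) -> float:
    """Create n_files empty files in one directory, then time opening one by name."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, f"f{i:06d}.dat"), "w").close()
        target = os.path.join(d, f"f{n_files // 2:06d}.dat")
        start = time.perf_counter()
        with open(target, "rb"):
            pass
        return time.perf_counter() - start

# On a B-tree-indexed directory (as on NTFS), open-by-name should grow only
# roughly logarithmically with the file count, so these two times should stay
# close even though the directory is 20x larger in the second run.
for n in (500, 10_000):
    print(n, named_open_time(n))
```

Opening by exact name exercises the directory index; it is enumerating the whole directory (what Explorer does) that scales with the file count.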

Yes, because of the way the MFT works, max files per volume and max files per folder are the same thing, and how you organise them should make little or no difference to the performance of NTFS.

To a file server handling requests, it's probably much of a muchness... but, of course, Windows Explorer viewing a folder with 2^32 files might be a different matter.

Large volumes and/or folders can suffer from MFT/folder/file fragmentation, so I find it's often good practice to archive rarely used files to separate volumes, rather than mixing rarely accessed files with frequently accessed files... unless you want to use an automated defrag utility.

I'm not sure about XP and earlier (don't recall offhand, and I'm too lazy to go boot XP), but Vista and Win7 will automagically run scheduled defrags in the background. IIRC, one caveat is that you can't defrag the MFT while the volume is in use; it has to be done at boot time (similar to a disk check/repair).

There are automated defrag utilities (Diskeeper, for example) that monitor the MFT and pagefile and defrag them while the volume is in use.

"If you had three million files you organise them by content and subject matter. [...] why can't the filesystem be held in-memory? That way filesystem traversal would be instantaneous."

One could persist the file system index on a system partition on a separate solid state disk to mitigate the issue. I
imagine one could perhaps also use the cache of a hybrid disk to maintain the index.

The problem with organizing files and with tree-structured file systems is that often files do not naturally fall into a single category, so what is needed is a graph-structured layout. On the other hand, few people want to maintain a graph-structured layout
manually. It's just too much work.

The new Semantic Engine that Microsoft presented at the last PDC looks like an attempt to solve this issue by having an engine that applies machine learning techniques to automatically index files - both textual and binary, such as images and audio. It'll be interesting to see how easily extensible it is. It could be one hell of a replacement for IFilters. There are so many interesting types of files that are not indexed currently.

"Of course, Windows Explorer will become slow and use a lot of memory, and don't even think about sharing that many files over SMB."

That Windows Explorer doesn't handle folders with a large number of ("first-generation") files sounds more like a design issue with Windows Explorer than an intrinsic issue with NTFS, as you say...

"The new Semantic Engine that Microsoft presented at the last PDC looks like an attempt to solve this issue by having an engine that applies machine learning techniques to automatically index files."

I thought WinFS was the attempt to manage this... essentially a relational view of the underlying NTFS attributes.

My solution... don't get 3 million files. 3 million files is the real problem here. It simply doesn't make sense.

I dunno, someday soon it might. I don't think a million files is out of the question if you keep compressed copies of images (to send to family and friends) in addition to the edited and original copies. You take many exposures... over half a decade...
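
Back-of-the-envelope, with assumed numbers (the shooting rate and copy count below are illustrative, not from the thread):

```python
# All figures are assumptions for illustration.
exposures_per_week = 300
copies_per_exposure = 3          # original + edited + compressed copy
weeks = 52 * 5                   # half a decade

total_files = exposures_per_week * copies_per_exposure * weeks
print(total_files)  # 234000
```

At 1,000 frames a week the same sum comes to 780,000, so over a few more years the million-file mark isn't far off.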