This patch tries to prevent calling KFileItemModel::retrieveData() when possible in order to save memory and time when loading huge directories.
There are changes in the following members of KFileItemModel:
* data() and setData() are modified such that the QHash "values" for an item is initialized with retrieveData() if that has not happened already.
* indexForKeyboardSearch() uses the "text" from the KFileItem itself, rather than looking than calling data() (which can be expensive if called for many items because data() might call retrieveData()).
* isExpandable() calls data(), rather than accessing the QHash "values" directly. If this hash was not initialized it, isExpandable() would always return the default value "false".
* createItemDataList() does not call retrieveData() for each item any more. However, if the sort role is such that sortRoleCompare() would try to access the QHash "values" when sorting, retrieveData() is called to ensure that sorting works correctly.
Things that could still be improved:
* Currently, the layouting in Compact View calls KFileItemModel::data() for every item. In the case that only the "Name" is shown, this could be prevented (similar to the approach that is used in Icons View, see https://git.reviewboard.kde.org/r/112253/
* When changing the roles (e.g., by adding or removing columns in Details View) or changing the sort role, retrieveData() is currently called for all items. Moreover, a changedItems signal is issued for all items, such that KFileItemModelRolesUpdater tries to load icons/previews for all items (like in Dolphin < 4.11).

Testing Done

I opened a directory with 100,000 files in Icons View and Details View with a release build, and measured the memory usage with KSysGuard. Moreover, I measured the time required to load the directory using the following patch for DolphinView: http://paste.kde.org/pae0a64c1/
The time required for the initial loading is not very meaningful because it varies strongly. We can get a better idea of how much time is spent inside Dolphin for loading if all items are cached, so I re-loaded the folder 8 of times by pressing Alt+Up and then Alt+Left.
In Icons View, the memory consumption is reduced from 198.2 MB to 164.6 MB (-17%), and the average time from 2898 ms to 2814 ms (-2.9%).
In Details View, the memory consumption is reduced from 225.1 MB to 166.6 MB (-26%), and the average time from 1539 ms to 1076 ms (-30%).
The savings are larger in Details View because retrieveData() stores more information about the items in the QHash "values" (like the "expanded parents count").
I could not see any significant differences in Compace View (as expected).

Screenshots

Files

Hi Frank,
I wonder who said that having folders with 100.000 entries is insane? ;) Either You or David said that to me when i first brought up massive folders and optimization ideas for it. It's very nice to see that you're now optimizing for those cases as well! Good job and keep hunting those resources.
Now for the more constructive feedback - or rather, an idea where you might find much more memory being wasted where i haven't found a good solution yet. With massive folders the UDSEntry object is stored for each object. In your case (since you're working from the higher level classes) the UDSEntry is stored within KFileItem. I think you can save a lot of memory in that class by making it smarter or different. KFileItems also has a bunch of private members to store the same info from UDSEntry thus i can imagine that quite a lot of data is being duplicated there.
Good luck in your resource hunting :)

I think that both David and I were a bit skeptical about the "500,000 files in one folder" thing ;-) I still think that optimizing for this case might not be the best thing to do if it causes a very significant cost in terms of code complexity. However, there are far more opportunities for small changes like the one I've proposed here than I first thought, which do not require massive code modifications, but still reduce our resource usage in all cases (also if there are far less files in one folder).
And I admit that opening large folders is indeed the best way to find out about such opportunities, and also to measure how much we gain by a change, so your input in the "massive folder" area was very useful, more useful than I first thought!
About the UDSEntry issue: Yes, I've also noticed that this is one of the major consumers of memory, but improving this is probably not quite trivial, so I've focused on things where we can gain something with less effort inside Dolphin for the time being. In the long term, a way to have most information about files (i.e., everything that is not needed for sorting) lazy-loaded probably makes sense.

Hi Frank,
I'm just glad you now do realise the importance of benchmarking with massive folders :) Ok, granted, i had to take some "who in his sane mind uses such big folders" comments but other then that it turned out wonderfully nice if you ask me. Actually, my initial benchmarks where with "just" 100.000 files, but i soon made enough experimental performance improvements to raise the bar to 500.000. At that level you really begin to see massive memory duplication. I'm playing with compression ideas, but haven't been got anything working in that area.
Lazy loading is awesome, but if you can compress the data enough that you still have all details without lazy loading is even better i think. For example, doing a directory listing on any given folder gives at least the following UDSEntry values:
- Device ID
- Inode
- User
- Group
That are the same for all files! Ok, you can have folders where you have files from multiple users and if you have folders then you also have different inode numbers so a simple static cache won't work. But this is an area where a _lot_ can be gained. Since we're going off-topic here, posting this to kfm-devel as well.

Change Summary:

Oops, it turns out that I forgot two places in the code which still assumed that the QHash is always filled with data. That caused a crash when grouping by "Name" in some situations. Fixing that is straightforward, but I guess I'll better test it a bit more and probably add a unit test to make sure that I don't mess anything up before pushing to master.

Diff:

Change Summary:

Added a unit test (which could be pushed to KDE/4.11 as well - it never hurts to add tests to the stable branch IMHO).
The problem was that isChildItem(int index) would return "false" incorrectly when the QHash for that item was not initialized yet. The grouping code would then try to read the "text" from the empty QHash, which yielded an empty QString, and then accessing the first character of that string caused a crash.
I think that it works OK now, but I'll test it for a few days more and give others the opportunity to shout if they notice anything.