Introduction

WPF provides some clever UI virtualization features for dealing efficiently with large collections, at least from a UI perspective. What isn’t provided is a generic method for achieving data virtualization. While several posts on internet forums discuss data virtualization, no one has (to my knowledge) published a solution. This article presents one such solution.

Background

UI Virtualization

When a WPF ItemsControl is bound to a large collection data source, with UI virtualization enabled, the control will only create visual containers for the items that are actually visible (plus a few above and below). This is typically only a small fraction of the entire collection. When the user scrolls, new visual containers are created as items become visible, and old containers are disposed of when items are no longer visible. When container recycling is enabled, the control reuses visual containers instead of creating and disposing of them, avoiding the overhead of object instantiation and garbage collection.

UI virtualization means that controls can be bound to large collections without incurring a large memory footprint due to visual containers. There is, however, a potentially large memory footprint due to the actual data objects in the collection.

Data Virtualization

Data virtualization means achieving virtualization for the actual data objects that are bound to the ItemsControl. Data virtualization is not provided by WPF. For relatively small collections of basic data objects, the memory consumption is not significant; however, for large collections, it can become very significant. In addition, actually retrieving the data (e.g., from a database) and instantiating all the objects can be time consuming, particularly if a network operation is involved. For these reasons, it is desirable to use some sort of data virtualization mechanism to limit the number of data objects that need to be retrieved and instantiated in memory.

Solution

Overview

This solution makes use of the fact that when an ItemsControl is bound to an IList implementation, rather than an IEnumerable implementation, it does not enumerate the entire list, and instead only accesses the items required for display. It uses the Count property to determine the size of the collection, presumably to set the scroll extents. It then accesses the on-screen items through the list indexer. Thus, it is possible to create an IList that reports a large number of items, and yet only actually retrieves the items when required.
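To make the IList idea concrete, here is a minimal sketch. It is not the article's VirtualizingCollection<T>; OnDemandList and its item factory are hypothetical names. The list reports a large Count but creates an item only when the indexer is called, and it logs which indices were touched so the access pattern is visible.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch (not the article's VirtualizingCollection<T>): a read-only,
// non-generic IList that reports a large Count but creates an item only when the
// indexer asks for it.
public sealed class OnDemandList : IList
{
    private readonly Func<int, object> _createItem; // hypothetical per-index factory

    public OnDemandList(int count, Func<int, object> createItem)
    {
        Count = count;
        _createItem = createItem;
    }

    // Records every index that was actually materialized, to make the access pattern visible.
    public List<int> RequestedIndices { get; } = new List<int>();

    public int Count { get; }

    public object this[int index]
    {
        get
        {
            if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException(nameof(index));
            RequestedIndices.Add(index);
            return _createItem(index);
        }
        set => throw new NotSupportedException();
    }

    public bool IsReadOnly => true;
    public bool IsFixedSize => true;
    public bool IsSynchronized => false;
    public object SyncRoot => this;

    // Searching would force every item to be created, so this sketch does not search.
    public bool Contains(object value) => false;
    public int IndexOf(object value) => -1;

    // Read-only: all mutators throw.
    public int Add(object value) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public void Insert(int index, object value) => throw new NotSupportedException();
    public void Remove(object value) => throw new NotSupportedException();
    public void RemoveAt(int index) => throw new NotSupportedException();

    // Enumerating touches every index; per the article, an ItemsControl bound to an
    // IList uses Count and the indexer instead of this path.
    public void CopyTo(Array array, int index)
    {
        for (int i = 0; i < Count; i++) array.SetValue(this[i], index + i);
    }

    public IEnumerator GetEnumerator()
    {
        for (int i = 0; i < Count; i++) yield return this[i];
    }
}
```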

IItemsProvider<T>

In order to use this solution, the underlying data source must be able to provide the number of items in the collection, and to provide small chunks (or pages) of the entire collection. This requirement is encapsulated in the IItemsProvider<T> interface.

using System.Collections.Generic;

/// <summary>Represents a provider of collection details.</summary>
/// <typeparam name="T">The type of items in the collection.</typeparam>
public interface IItemsProvider<T>
{
    /// <summary>Fetches the total number of items available.</summary>
    /// <returns>The total number of items.</returns>
    int FetchCount();

    /// <summary>Fetches a range of items.</summary>
    /// <param name="startIndex">The start index.</param>
    /// <param name="count">The number of items to fetch.</param>
    /// <returns>The items in the requested range.</returns>
    IList<T> FetchRange(int startIndex, int count);
}

If the underlying data source is a database query, it is relatively simple to implement the IItemsProvider interface using the COUNT() aggregate function, and the OFFSET and LIMIT expressions, provided by most database vendors.
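As a rough illustration, the two queries such a provider would issue might look like this. PagingQueries, the table name and the ordering column are placeholders, and LIMIT ... OFFSET is the syntax used by PostgreSQL, MySQL and SQLite; SQL Server, for example, uses OFFSET ... FETCH NEXT instead.

```csharp
using System;

// Hypothetical helper that builds the two queries a database-backed IItemsProvider<T>
// would need. Table and column names are placeholders and must come from trusted code,
// never from user input.
public static class PagingQueries
{
    // Backs FetchCount().
    public static string CountQuery(string table) => $"SELECT COUNT(*) FROM {table}";

    // Backs FetchRange(startIndex, count), using LIMIT/OFFSET syntax.
    public static string RangeQuery(string table, string orderBy, int startIndex, int count)
    {
        if (startIndex < 0) throw new ArgumentOutOfRangeException(nameof(startIndex));
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
        // A stable ORDER BY is required; otherwise consecutive pages may overlap or skip rows.
        return $"SELECT * FROM {table} ORDER BY {orderBy} LIMIT {count} OFFSET {startIndex}";
    }
}
```

Without a stable ORDER BY the database is free to return rows in a different order for each page. In production code the paging values would normally be passed as command parameters rather than formatted into the SQL text.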

VirtualizingCollection<T>

This is the IList implementation that performs the data virtualization. VirtualizingCollection<T> divides the entire collection space into a number of pages. Pages are then loaded into memory as required, and released when no longer required.
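The page arithmetic itself is plain integer division. The following sketch (PageMath is a hypothetical helper) maps an item index to its page and offset, and computes the range of the last, possibly partial, page.

```csharp
using System;

// Hypothetical helper: integer arithmetic for splitting a flat index space into pages.
public static class PageMath
{
    // Which page an item lives on, and where within that page.
    public static (int PageIndex, int PageOffset) Locate(int index, int pageSize)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (index / pageSize, index % pageSize);
    }

    // Number of pages needed for `count` items; the last page may be partial.
    public static int PageCount(int count, int pageSize) => (count + pageSize - 1) / pageSize;

    // First item index and length of a page, clamped so the last page stops at `count`.
    public static (int Start, int Length) PageRange(int pageIndex, int pageSize, int count)
    {
        int start = pageIndex * pageSize;
        return (start, Math.Min(pageSize, count - start));
    }
}
```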

The interesting parts are discussed below. For all the details, please refer to the attached source code project.

The first aspect of the IList implementation is the Count property. It is used by the ItemsControl to gauge the size of the collection and render the scrollbar appropriately.

The Count property is implemented using the deferred (lazy) loading pattern. It uses the special value -1 to indicate that the count has not yet been loaded. The first time the property is accessed, it loads the actual count from the ItemsProvider.
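A minimal sketch of that lazy Count, assuming a hypothetical Func<int> in place of the ItemsProvider. Setting the field to 0 before loading is a design choice of this sketch: the property then reports 0 while a load is in flight, and an asynchronous override of LoadCount() is never started twice.

```csharp
using System;

// Hypothetical sketch of the lazy Count pattern: -1 is the "not loaded yet" sentinel.
public class LazyCount
{
    private readonly Func<int> _fetchCount; // stands in for ItemsProvider.FetchCount()
    private int _count = -1;

    public LazyCount(Func<int> fetchCount) => _fetchCount = fetchCount;

    public int FetchCalls { get; private set; }

    public virtual int Count
    {
        get
        {
            if (_count == -1)
            {
                // Report 0 until a value arrives; this also stops an asynchronous override
                // of LoadCount() from being started a second time while it is still running.
                _count = 0;
                LoadCount();
            }
            return _count;
        }
        protected set => _count = value;
    }

    // Synchronous version; a derived class could override this to fetch in the background.
    protected virtual void LoadCount()
    {
        FetchCalls++;
        Count = _fetchCount();
    }
}
```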

The other important aspect of the IList interface is the indexer implementation.

The indexer performs the clever bit of the solution. First, it must determine which page the requested item is in (pageIndex), and its offset (pageOffset) within that page. It then calls the RequestPage() method for the required page.

The indexer then also loads the next page or the previous page, depending on the pageOffset. This is based on the assumption that if the user is viewing page 0, there is a good chance they will scroll down to view page 1; fetching it in advance means there is no gap in the display when they do.

CleanUpPages() is then called to clean up (or unload) any pages that are no longer in use.

Finally, a defensive check is in place in case the page is not yet available. This is necessary if RequestPage() does not operate synchronously, as in the derived class AsyncVirtualizingCollection<T>.
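The indexer steps above can be sketched as follows. PagedReader<T> is a hypothetical, synchronous class, not the code from the attached project. Its rule of prefetching the next page past the middle of the current one (and the previous page before the middle) is one plausible reading of "depending on the pageOffset", not necessarily the exact threshold the article uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical synchronous sketch of the paging indexer described above.
public class PagedReader<T>
{
    private readonly Func<int, int, IList<T>> _fetchRange; // stands in for ItemsProvider.FetchRange
    private readonly Dictionary<int, IList<T>> _pages = new Dictionary<int, IList<T>>();

    public PagedReader(int count, int pageSize, Func<int, int, IList<T>> fetchRange)
    {
        Count = count;
        PageSize = pageSize;
        _fetchRange = fetchRange;
    }

    public int Count { get; }
    public int PageSize { get; }
    public IReadOnlyCollection<int> LoadedPages => _pages.Keys;

    public T this[int index]
    {
        get
        {
            int pageIndex = index / PageSize;
            int pageOffset = index % PageSize;

            RequestPage(pageIndex);

            // Prefetch: past the middle of a page, load the next one; before it, the previous one.
            if (pageOffset > PageSize / 2 && (pageIndex + 1) * PageSize < Count)
                RequestPage(pageIndex + 1);
            if (pageOffset < PageSize / 2 && pageIndex > 0)
                RequestPage(pageIndex - 1);

            // (A CleanUpPages() call would go here.)

            // Defensive check: an asynchronous RequestPage may not have stored the page yet.
            return _pages.TryGetValue(pageIndex, out IList<T> page) ? page[pageOffset] : default(T);
        }
    }

    protected virtual void RequestPage(int pageIndex)
    {
        if (_pages.ContainsKey(pageIndex)) return;
        int start = pageIndex * PageSize;
        _pages[pageIndex] = _fetchRange(start, Math.Min(PageSize, Count - start));
    }
}
```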

The pages are stored in a Dictionary, where the page index is used as the key. An additional Dictionary is used to store touch times. Touch times record the time each page was last accessed. This is used by the CleanUpPages() method to remove pages that have not been accessed for a considerable amount of time.

To complete the solution, FetchPage() performs the fetch from the ItemsProvider, and LoadPage() fetches the page and calls PopulatePage() to store it in the Dictionary.
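The page bookkeeping might look like the sketch below, keeping one method per task as the article describes. PageStore<T> and the injected clock are assumptions for illustration (a real implementation would simply read DateTime.Now). Stamping the touch time whenever a page is requested means a CleanUpPages() pass that runs straight afterwards cannot evict the page that was just asked for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of page storage with touch times and timeout-based cleanup.
public class PageStore<T>
{
    private readonly Dictionary<int, IList<T>> _pages = new Dictionary<int, IList<T>>();
    private readonly Dictionary<int, DateTime> _pageTouchTimes = new Dictionary<int, DateTime>();
    private readonly Func<int, int, IList<T>> _fetchRange; // stands in for ItemsProvider.FetchRange
    private readonly Func<DateTime> _now;                 // injected clock (assumption, for testing)

    public PageStore(int pageSize, TimeSpan pageTimeout, Func<int, int, IList<T>> fetchRange, Func<DateTime> now)
    {
        PageSize = pageSize;
        PageTimeout = pageTimeout;
        _fetchRange = fetchRange;
        _now = now;
    }

    public int PageSize { get; }
    public TimeSpan PageTimeout { get; }
    public bool IsLoaded(int pageIndex) => _pages.ContainsKey(pageIndex);

    // Loads the page if needed; either way, records that it was just used.
    public void RequestPage(int pageIndex)
    {
        if (!_pages.ContainsKey(pageIndex)) LoadPage(pageIndex);
        _pageTouchTimes[pageIndex] = _now();
    }

    // Removes every page that has not been touched within PageTimeout.
    public void CleanUpPages()
    {
        DateTime now = _now();
        // Copy the keys first, because the dictionaries are modified inside the loop.
        foreach (int pageIndex in _pageTouchTimes.Keys.ToList())
        {
            if (now - _pageTouchTimes[pageIndex] > PageTimeout)
            {
                _pages.Remove(pageIndex);
                _pageTouchTimes.Remove(pageIndex);
            }
        }
    }

    protected virtual void LoadPage(int pageIndex) => PopulatePage(pageIndex, FetchPage(pageIndex));

    protected IList<T> FetchPage(int pageIndex) => _fetchRange(pageIndex * PageSize, PageSize);

    protected void PopulatePage(int pageIndex, IList<T> page) => _pages[pageIndex] = page;
}
```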

It may seem that there are a few too many inconsequential methods, but they have been designed in that way for a reason. Each method performs exactly one task. This helps to keep the code readable, and also makes it easier to extend or modify functionality in derived classes, as will be observed below.

The VirtualizingCollection<T> class achieves the primary objective of implementing data virtualization. Unfortunately, this class has one severe drawback in use: the data fetch methods are all executed synchronously. This means they run on the UI thread, potentially making the application sluggish.

AsyncVirtualizingCollection<T>

The AsyncVirtualizingCollection<T> class is derived from VirtualizingCollection<T>, and overrides the load methods (LoadCount() and LoadPage()) in order to perform the data loading asynchronously.

The key requirement for an asynchronous data source in WPF is that it must notify the UI, via data binding, once the data has been fetched. In a regular object, this is usually achieved with the INotifyPropertyChanged interface. For a collection implementation, however, it is necessary to use its close relative, INotifyCollectionChanged. This is the interface used by ObservableCollection<T>.

The AsyncVirtualizingCollection<T> implements both INotifyCollectionChanged and INotifyPropertyChanged, providing maximum data binding flexibility. There is nothing really of note in this implementation.
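For reference, the two interfaces can be implemented with very little code. NotifyingCollectionBase and DemoNotifier are hypothetical names; FireCollectionReset() and FirePropertyChanged() follow the method names used in the article, but the bodies here are only a sketch.

```csharp
using System.Collections.Specialized;
using System.ComponentModel;

// Hypothetical base class showing the two notification interfaces a collection can implement.
public class NotifyingCollectionBase : INotifyCollectionChanged, INotifyPropertyChanged
{
    public event NotifyCollectionChangedEventHandler CollectionChanged;
    public event PropertyChangedEventHandler PropertyChanged;

    // A Reset tells bound controls that the contents changed dramatically, so they
    // re-read Count and the visible items through the indexer.
    protected void FireCollectionReset() =>
        CollectionChanged?.Invoke(this, new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));

    protected void FirePropertyChanged(string propertyName) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}

// Hypothetical derived class that simulates the end of a background load.
public sealed class DemoNotifier : NotifyingCollectionBase
{
    public void SimulateLoadCompleted()
    {
        FireCollectionReset();
        FirePropertyChanged("Count");
    }
}
```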

In the overridden LoadCount() method, the fetch is invoked asynchronously via the ThreadPool. Once completed, the new Count is set, and the FireCollectionReset() method is called to update the UI via the INotifyCollectionChanged interface. Note that the LoadCountCompleted method is invoked on the UI thread again, by using the SynchronizationContext. This SynchronizationContext property is set in the constructor, with the assumption that the collection instance will be created on the UI thread.
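The same LoadCount() pattern can be sketched outside WPF. AsyncCountLoader and its CountLoaded event are hypothetical; the event stands in for FireCollectionReset(). The fetch runs on a thread-pool thread, and the result is posted back through the SynchronizationContext captured in the constructor. In a console test SynchronizationContext.Current is null, so the sketch falls back to the default context, which runs posted callbacks on the thread pool.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of an asynchronous count load marshalled back via SynchronizationContext.
public class AsyncCountLoader
{
    private readonly Func<int> _fetchCount; // stands in for ItemsProvider.FetchCount()
    private readonly SynchronizationContext _synchronizationContext;

    public AsyncCountLoader(Func<int> fetchCount)
    {
        _fetchCount = fetchCount;
        // Captured here on the assumption that the instance is created on the UI thread.
        // With no context installed (e.g. a console test), fall back to the default context,
        // which runs posted callbacks on the thread pool.
        _synchronizationContext = SynchronizationContext.Current ?? new SynchronizationContext();
    }

    public int Count { get; private set; }

    // Stands in for FireCollectionReset(): raised on the captured context after Count changes.
    public event EventHandler CountLoaded;

    public void LoadCount()
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            int count = _fetchCount(); // the slow fetch runs on a thread-pool thread
            _synchronizationContext.Post(state => LoadCountCompleted((int)state), count);
        });
    }

    private void LoadCountCompleted(int count)
    {
        Count = count;
        CountLoaded?.Invoke(this, EventArgs.Empty);
    }
}
```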

The asynchronous load of the page data follows the same convention, and again the FireCollectionReset() method is used to update the UI.

Note also the IsLoading property. This is a simple flag, which can be used to indicate to the UI that the collection is loading. When the IsLoading property is changed, the FirePropertyChanged() method is called to update the UI via the INotifyPropertyChanged mechanism.
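A sketch of such a flag, assuming a hypothetical LoadingFlag class. Raising PropertyChanged only when the value actually changes avoids redundant UI updates.

```csharp
using System.ComponentModel;

// Hypothetical sketch of an IsLoading flag that notifies bindings when it changes.
public class LoadingFlag : INotifyPropertyChanged
{
    private bool _isLoading;

    public event PropertyChangedEventHandler PropertyChanged;

    public bool IsLoading
    {
        get => _isLoading;
        set
        {
            if (_isLoading == value) return; // notify only on a real change
            _isLoading = value;
            FirePropertyChanged(nameof(IsLoading));
        }
    }

    private void FirePropertyChanged(string propertyName) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}
```

In XAML, a busy indicator's Visibility could then be bound to IsLoading through the built-in BooleanToVisibilityConverter.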

The window layout is pretty basic, but sufficient to demonstrate the solution.

The user can configure the number of items in the DemoCustomerProvider, and the simulated fetch delay.
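A provider along the lines of DemoCustomerProvider might look like this. SimulatedCustomerProvider and Customer are hypothetical; the IItemsProvider<T> interface is repeated (without its documentation comments) so the block compiles on its own, and Thread.Sleep stands in for the simulated fetch delay.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public interface IItemsProvider<T>
{
    int FetchCount();
    IList<T> FetchRange(int startIndex, int count);
}

// Hypothetical item type.
public sealed class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical stand-in for the demo's DemoCustomerProvider: generates customers on
// demand and sleeps to simulate a slow data source.
public sealed class SimulatedCustomerProvider : IItemsProvider<Customer>
{
    private readonly int _count;
    private readonly TimeSpan _fetchDelay;

    public SimulatedCustomerProvider(int count, TimeSpan fetchDelay)
    {
        _count = count;
        _fetchDelay = fetchDelay;
    }

    public int FetchCount()
    {
        Thread.Sleep(_fetchDelay);
        return _count;
    }

    public IList<Customer> FetchRange(int startIndex, int count)
    {
        Thread.Sleep(_fetchDelay);
        int end = Math.Min(startIndex + count, _count); // clamp the final, partial page
        var page = new List<Customer>(Math.Max(0, end - startIndex));
        for (int i = startIndex; i < end; i++)
            page.Add(new Customer { Id = i, Name = "Customer " + i });
        return page;
    }
}
```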

The demo allows the user to compare a standard List<T> implementation to the VirtualizingCollection<T> and AsyncVirtualizingCollection<T> implementations. With VirtualizingCollection<T> and AsyncVirtualizingCollection<T>, the user can specify the page size and page timeout. These should be chosen to suit the characteristics of the control and the expected usage patterns.

The total (managed) memory usage is shown so that the memory footprints of the different IList implementations can be compared. The rotating square animation is used to indicate stalls on the UI thread. In a fully asynchronous solution, the animation should not stutter or pause.

Points of Interest

Incidentally, while creating this solution, I discovered that the collection must implement the non-generic IList interface (rather than the generic IList<T> interface). This contradicts the current MSDN documentation. However, it is good practice to implement both the IList and IList<T> interfaces in any generic list implementation.

In practice, it appears that the ItemsControl binding will also invoke the IndexOf() method. I can’t explain why this is required, and obviously, if a correct implementation were required, this solution would not be possible. Fortunately, simply returning -1 from the IndexOf() implementation was found to be sufficient.
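A sketch of a read-only list that implements both interfaces, assuming a hypothetical ReadOnlyVirtualList<T> with an item-lookup delegate. The non-generic members are implemented explicitly and forward to the generic ones, and both IndexOf overloads return -1 as described above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical read-only list implementing both IList<T> and the non-generic IList.
public sealed class ReadOnlyVirtualList<T> : IList<T>, IList
{
    private readonly Func<int, T> _getItem; // hypothetical item lookup (e.g. via a page cache)

    public ReadOnlyVirtualList(int count, Func<int, T> getItem)
    {
        Count = count;
        _getItem = getItem;
    }

    public int Count { get; }
    public bool IsReadOnly => true;

    public T this[int index]
    {
        get => _getItem(index);
        set => throw new NotSupportedException();
    }

    // Deliberately not searched: a real search would have to load every page.
    public int IndexOf(T item) => -1;
    public bool Contains(T item) => false;

    public void CopyTo(T[] array, int arrayIndex)
    {
        for (int i = 0; i < Count; i++) array[arrayIndex + i] = this[i];
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < Count; i++) yield return this[i];
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public void Add(T item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public void Insert(int index, T item) => throw new NotSupportedException();
    public bool Remove(T item) => throw new NotSupportedException();
    public void RemoveAt(int index) => throw new NotSupportedException();

    // Non-generic IList members, forwarding to the generic implementation where possible.
    object IList.this[int index]
    {
        get => this[index];
        set => throw new NotSupportedException();
    }
    bool IList.IsFixedSize => true;
    int IList.Add(object value) => throw new NotSupportedException();
    bool IList.Contains(object value) => false;
    int IList.IndexOf(object value) => -1;
    void IList.Insert(int index, object value) => throw new NotSupportedException();
    void IList.Remove(object value) => throw new NotSupportedException();
    bool ICollection.IsSynchronized => false;
    object ICollection.SyncRoot => this;
    void ICollection.CopyTo(Array array, int index)
    {
        for (int i = 0; i < Count; i++) array.SetValue(this[i], index + i);
    }
}
```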

Known Issues and Future Extensions

The above solution assumes that the source collection is read-only and does not change. Ideally, the solution should periodically (or on demand) reload the Count and pages.
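One way to support on-demand reloading is sketched below. RefreshableCache<T> and its Refresh() method are hypothetical: Refresh() discards the cached pages, resets Count to the -1 "not loaded" sentinel and raises a Reset notification so that bound controls query the collection again.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical sketch: a cached count and cached pages that can be discarded on demand.
public class RefreshableCache<T> : INotifyCollectionChanged
{
    private readonly Func<int> _fetchCount; // stands in for ItemsProvider.FetchCount()
    private readonly Dictionary<int, IList<T>> _pages = new Dictionary<int, IList<T>>();
    private int _count = -1;                // -1 = not loaded yet

    public RefreshableCache(Func<int> fetchCount) => _fetchCount = fetchCount;

    public event NotifyCollectionChangedEventHandler CollectionChanged;

    public int Count
    {
        get
        {
            if (_count == -1) _count = _fetchCount();
            return _count;
        }
    }

    public int LoadedPageCount => _pages.Count;

    public void PopulatePage(int pageIndex, IList<T> page) => _pages[pageIndex] = page;

    // Forget everything cached and tell bound controls to re-query Count and the visible items.
    public void Refresh()
    {
        _pages.Clear();
        _count = -1;
        CollectionChanged?.Invoke(this, new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
    }
}
```

Periodic reloading could then be a timer (for example a DispatcherTimer on the UI thread) that calls Refresh().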

The IItemsProvider<T> interface could be extended to provide support for editing and sorting.

Final Remarks

This is my first article on CodeProject. Having lurked in the shadows for several years, it was finally time to come out into the open. I hope you find this article useful, and I would appreciate any comments or suggestions. If you find any errors, please accept my apologies and leave a comment; I will endeavor to fix any errors promptly.

Update

Since I first published this article, Bea Stollnitz has published a much more comprehensive and complete solution to data virtualization. I would refer readers to her solution.

I am greatly encouraged by all the positive comments this article received and would like to thank all those who read it, commented on it or voted. I hope people will continue to find this article and example code useful. As such I would like to remove the license on the source code and place it in the public domain. This means anyone may use it in any application, but without any warranty.

Comments and Discussions

I read the article and found it very interesting and very useful for my project.
It is a great way to view the data.
I am using it to read 190,000 records from my database.
But sometimes, when I display this data in my GUI and scroll through the rows, I get an error saying:

Exception error in the UI thread
"System.OutOfMemoryException"

Before the error, I noticed through a row counter in the GUI
that it does not read all the data in the database, but stops at about 37,000 rows.
After that, when I scroll down with the scrollbar, the error message appears.
What could cause this kind of error?

I use a log file to trace my project. The log shows an error here:
VirtualCollection: catch exception get T method
The catch message says: "the specified key was not in the dictionary"

I have been stuck for some time now on enabling virtualization in my data grid. I have a data grid with row details (the row details can be very large). CanContentScroll is set to true and virtualization is enabled.
The problem is that whenever a row with a large amount of row details is shown, the data is cropped (obviously because it doesn't fit in the window). I cannot afford to turn off virtualization either. I noticed that this can be fixed by placing the data grid directly in a ScrollViewer instead of trying to fit it in the ItemsPresenter. (Currently I have a style in which a control template is defined; the default MSDN DataGrid style is used.) But will doing this turn off virtualization again? How can I verify whether virtualization is enabled?
Thanks

This is an amazing demo, and I'd give it five stars if it were set up with an MVVM pattern. The reason I withhold the last star involves my discovery of an interesting issue with how UI virtualization actually works in relation to VirtualizingStackPanel. I ran face-first into an issue where my MVVM-virtualized collections appeared to be mined endlessly for items, even with recycling virtualization turned on (several other commenters on this article seem to have run into this in various situations as well). I found this perfectly confounding, since the layout is obviously restricted by parent-level regions (so, in theory, it doesn't need to fetch all items in a large collection). After several days of near-insanity trying to figure out why the unlimited polling was occurring, I happened to hard-code the ListView's height, and voila, it worked as expected! Obviously, no one would want to do this in practice, so I figured the following templates could be plugged into a virtualized control to provide an acceptable workaround for the endless fetch problem (note that I provide two templates to account for panel orientation):

I arrived at this particular solution by noting the following facts about my application and this demo project:

- The demo is not using MVVM and loads its data directly to the data context upon clicking a button (and therefore relatively late-bound to the data context of the view that will be displaying the items).
- My application is MVVM (and uses a tabbed layout that exposes a listview that definitely needs to be virtualized in full), and therefore loads its data relatively early via a complete data binding hierarchy. Since data binding operates at higher priority than rendering in WPF's dispatcher model, the layout can't determine its size until it has found all the items it thinks it needs. For obvious reasons, this wouldn't be an issue in the demo.
- Restriction of the height suggests that natural evaluation of bindings leads the layout into somewhat of an indeterminate state where it isn't initially sure of its real height since hard-coding the height stopped the runaway retrieval of items. Again, this wouldn't be an issue with the demo since the data is set well after the layout has been dimensioned, and lack of restriction in the layout (coupled with runaway item fetching) implies that the virtual UI doesn't have a good size reference (infinity?) to base its ability to choose a range of indexes on. This could be an MS bug, but I can only speculate at this point. (BTW, my project is being done in .NET 3.5 SP1 if that helps.)
- By extension, I needed to find an acceptable means of restricting the height of the items panel itself. It turns out that parent containers do wonders in providing constraining size references for expandable regions (as ScrollViewer does for its content), so it made sense to attack VirtualizingStackPanel as the source of the runaway item fetch problem. Note also that the ancestor source of the bindings in the workaround templates can be tweaked to work with other types of ItemsControls (depending on the layout) as needed.

Whether the default behavior is by design is at best debatable in my mind, but without a doubt, it will destroy the ability to perform data virtualization effectively. Nevertheless, I hope that people who see this article will also find this comment as it would save them hours of headache trying to control this behavior.

Neat example. I'm trying to debug why I'm getting an out-of-memory exception with the VS 2013 DataGrid when binding a large (17k elements) list, and this is quite useful for tracking down what's going on.

However, I'm getting a KeyNotFoundException in the indexer method of VirtualizingCollection. The relevant lines are below. The issue is that CleanUpPages() seems to remove the page that was just added by RequestPage(), so the following line then throws.

Hi, it seems to crash the program when using AsyncVirtualizingCollection, and when using VirtualizingCollection it seems to load all the pages up front instead of loading them while scrolling. I'm using VS2013 and WPF 4.5, if that makes any difference.

I've been trying to do something similar myself, virtualizing the data items themselves, but I had an issue where the ListView would, at data binding time, pull all of the million (or however many) items the data source had.

It turned out that on my virtualizing collection I had implemented only IList<T>, not IList. That was certainly not enough; after finally implementing the untyped IList, things went right.

I have a project that I want to release as an open source project under the GPL.
But the problem is that the license (the Code Project license) for the above code is not compatible with the GPL.
Could you please provide the code under a GPL-compatible license?

Hi Paul,
Have you tried using your code in a WPF DataGrid?
In this scenario, all the pages are loaded during the initial load.
I look forward to your reply about this issue.
Best regards,
Polmau