The most universal STXXL container is stxxl::vector. Vector is an array whose size can vary dynamically. The implementation of stxxl::vector is similar to the LEDA-SM array [25]. The content of a vector is striped block-wise over the disks, using an assignment strategy given as a template parameter. Some of the blocks are cached in a vector cache of fixed size (also a parameter). The replacement of cache blocks is controlled by a specified page-replacement strategy. STXXL has implementations of LRU and random replacement strategies. The user can provide his/her own strategy as well. The stxxl::vector has STL-compatible Random Access Iterators.

One random access costs I/Os in the worst case. Same for insertion and removal at the end.

The Architecture of stxxl::vector

The stxxl::vector is organized as a collection of blocks residing on the external storage media (parallel disks). Access to the external blocks is organized through the fully associative cache which consist of some fixed amount of in-memory pages (The page is a collection of consecutive blocks. The number of blocks in the page is constant.). The schema of stxxl::vector is depicted in the following figure. When accessing an element the implementation of stxxl::vector access methods ([]operator, push_back, etc.) first checks whether the page to which the requested element belongs is in the vector's cache. If it is the case the reference to the element in the cache is returned. Otherwise the page is brought into the cache (If the page of the element has not been touched so far, this step is skipped. To keep an eye on such situations there is a special flag for each page.). If there was no free space in the cache, then some page is to be written out. Vector maintains a pager object, that tells which page to kick out. STXXL provides LRU and random paging strategies. The most efficient and default one is LRU. For each page vector maintains the dirty flag, which is set when non-constant reference to one of the page's elements was returned. The dirty flag is cleared each time when the page is read into the cache. The purpose of the flag is to track whether any element of the page is modified and therefore the page needs to be written to the disk(s) when it has to be evicted from the cache.

The schema of stxxl::vector that consists of ten external memory pages and has a cache with the capacity of four pages. The first cache page is mapped to external page 1, the second page is mapped to external page 8, and the fourth cache page is mapped to page 5. The third page is not assigned to any external memory page.

In the worst case scenario when vector elements are read/written in the random order each access takes 2 x blocks_per_page I/Os. The factor two shows up here because one has to write the replaced from cache page and read the required one). However the scanning of the array costs about I/Os using constant vector iterators or const reference to the vector, where n is the number of elements to read or write (read-only access). Using non-const vector access methods leads to I/Os because every page becomes dirty when returning a non const reference. If one needs only to sequentially write elements to the vector in I/Os the currently fastest method is stxxl::generate. Sequential writing to an untouched before vector (e.g. when created using stxxl::vector(size_type n) constructor.) or alone adding elements at the end of the vector, using the push_back(const T&) method, leads also to I/Os.

stxxl::VECTOR_GENERATOR

Besides the type of the elements stxxl::vector has many other template parameters (block size, number of blocks per page, pager class, etc.). To make the configuration of the vector type easier STXXL provides special type generator template meta programs for its containers.

Internal Memory Consumption of stxxl::vector

The cache of stxxl::vector largely dominates in its internal memory consumption. Other members consume very small fraction of stxxl::vector's memory even when the vector size is large. Therefore, the internal memory consumption of stxxl::vector can be estimated as bytes.

More Notes

In opposite to STL, stxxl::vector's iterators do not get invalidated when the vector is resized or reallocated.

Dereferencing a non-const iterator makes the page of the element to which the iterator points to dirty. This causes the page to be written back to the disks(s) when the page is to be kicked off from the cache (additional write I/Os). If you do not want this behavior, use const iterators instead. See following example:

vector_type V;

// ... fill the vector here

vector_type::iterator iter = V.begin();

// ... advance the iterator

a = *iter; // causes write I/Os, although *iter is not changed

vector_type::const_iterator citer = V.begin();

// ... advance the iterator

a = *citer; // read-only access, causes no write I/Os

*citer = b; // does not compile, citer is const

Non const operator[] makes the page of the element dirty. This causes the page to be written back to the disks(s) when the page is to be kicked off from the cache (additional write I/Os). If you do not want this behavior, use const operator[]. For that you need to access the vector via a const reference to it. Example: