Description

Welcome to another installment of C9 Lectures covering the powerful general-purpose C++ library, the STL. Joining us once again is the great Stephan T. Lavavej, Microsoft's keeper of the STL cloth (meaning he manages the partnership between the owners of the STL (Dinkumware) and Microsoft, including, of course, bug fixes and enhancements to the STL that ships as part of Visual C++). Put simply, Stephan is a C++ library developer.

As is Stephan's nature, he elaborates on technical details in a very substantive way. The Standard Template Library, or STL, is a C++ library of container classes, algorithms, and iterators; it provides many fundamental algorithms and data structures. Furthermore, the STL is a general-purpose library: its components are heavily parameterized, such that almost every component in the STL is a template.

In part 6, Stephan guides us into the logical and beautiful world of algorithms. STL shines here.

[AdamSpeight2008]> From the video I get the sense recursive lambdas are a bad idea. Why not? Wouldn't the compiler eliminate the tail recursion and maybe even inline it?

Syntactically, trying to get an *unnamed* function object to call itself is a huge headache. (There are ways, and they're all bad.) Getting *named* function objects to call themselves is trivial, and perfectly fine. Here's what that looks like:

[Garp]> You need to keep doing them, even when you're done with the STL. You're a born teacher!

That's very kind of you to say. I may do an advanced series on the STL's internal implementation, or on Core Language features, especially C++0x features. Even walking through something "simple" like template argument deduction would probably be valuable.

[AdamSpeight2008]> As an observation you may have missed a few subtle things in the snippet I provided.

I only glanced at it; the last time that I read any form of Basic was in the 20th century, and I'd like to keep it that way.

I find lambda calculus-speak to be intensely confusing, and that's despite doing Scheme (and some LISP) for years in college. Surprisingly, C++ is a far better language for functional programming, because its machine model is simple. In C++, function objects are just classes, and everyone knows how member functions and data members work. The only enlightening sentence in http://en.wikipedia.org/wiki/Fixed_point_combinator is this: "In programming languages that support function literals, fixed point combinators allow the definition and use of anonymous recursive functions, i.e. without having to bind such functions to identifiers." In C++, if you want recursion, you should simply use a named function object class, as I demonstrated above.

I may do an advanced series on the STL's internal implementation, or on Core Language features, especially C++0x features. Even walking through something "simple" like template argument deduction would probably be valuable.

Yes, please, to all of the above!

Keep 'em coming Stephan. You are definitely one of the best/clearest presenters on Ch.9.

> Perhaps it has to do with the application you use to play the media?

A valid point, so I tested it in both Windows Media Player and VLC, and it's still the same low quality.

> There is nothing wrong with the audio.

Perhaps you can't hear badness?

> At any rate, tone down your hostility please...

I'm not hostile, not even close; I'm just honest, speaking the truth and my observations.

You just don't like me pointing out the bad parts / problems, and that is your problem, not mine.

If you don't like the feedback you're getting, perhaps it's time for you to actually fix the problems reported and stop trying to sweep them under the rug (like Microsoft likes to do, by the way, sadly enough).

Though I should be a bit hostile, since you ignored my request for your e-mail for the detailed problem report. But I won't be.

But I guess that just shows how interested you are in fixing problems.

But refusing to fix bugs is normal at Microsoft, so...

What is it you call it, "by design"?

PS: That was not hostile, it was an observation.

Anyway, thanks for reading my posts even though you seem to ignore what you read.

@Mr Crash: There is nothing wrong with the audio. If anything, it is a bit too loud (it could stand to be a little quieter). I have no problems with honesty. In this case, the problem you express with passion is not a problem (or at least not one that I can hear using WMP on Windows 7).

Can you be more specific? Also, let's take this conversation off of this thread. Send me a mail through the contact us form (I get those mails...).

Like Parts 4 and 5, Parts 6 and 7 were filmed back-to-back, so Part 7 is already in the pipeline.

> The sound quality of the "High Quality WMV" is not so high quality at all

The first 20 seconds of the High Quality WMV, the MP3, and the MP4 sound perfectly fine to me (through my cheap headphones; my awesome wireless headphones are at home). The MP3 seemed to have greater compression artifacts than the others.

Maybe I just sound (and look) weird.

By the way, Part 7 has a hilarious mishap near the end that was entirely my fault. See if you notice it when the video comes out.

[ryanb]> Keep 'em coming Stephan. You are definitely one of the best/clearest presenters on Ch.9.

Thanks - comments like this definitely motivate me to keep going.

[Tominator2005]> Stephan, is there any way to use any of the STL algorithms on multi-dimensional arrays?

It depends on what you're trying to do. Writing an iterator to present a linear view of a multi-dimensional container is certainly possible, and not too difficult.

> and also, is a vector of a vector better for that?

Yes. Vectors are superior to built-in arrays for many reasons, one of which is that vectors know their own sizes. (std::array also knows its own size. If you look at my Nurikabe solver, I used a std::array when I knew the size at compile time.)

Hi Stephan, thanks for these great lectures! They are very, very useful and unique to have an introduction into the new STL features available in VS2010.

There is one special subject that I would love to hear more about in one of your future STL videos - and that is: Allocators.

Accidentally, I saw on the Dinkumware website that they offer an "Allocators library". Then, taking a closer look at MSDN, I found that this library is indeed also part of VS2010 (the "allocators" header file). Is that new in VS2010? At least I can't remember having this in VS2008.

I'm especially interested in this subject because I am maintaining a library (currently still in VS2008) which deals with very large datasets (detailed road networks for Europe) and does shortest path calculations on the network based on a preprocessed graph, which allows for much faster path calculations.

This preprocessing uses the STL, and I'm having constant trouble with memory management, especially memory fragmentation, because the algorithm used by the preprocessing happens to need a huge number of inserts and removals of small objects from map and list containers. I've already done a lot of changes and optimizations to the algorithm itself, but I am pretty sure that I could make a major step forward by replacing the STL's default allocator with some other allocator that can help reduce memory fragmentation.

So I was happy to now see a thing like "allocator_chunklist" and the other allocators in the library, which allow customizing memory management in the STL.

Well, perhaps this wish is too special for your introduction, but it would be great if you could spend a few minutes on this subject sooner or later!

Anyway, if so or not, I'm prepared for a download of the next lecture and hope you still have a lot of stuff to talk about!

Making a workable Y combinator (fixed-point recursion of anonymous functions) isn't really possible (directly) with the MS version of the STL at this moment, but you can make your own class to build the combinator, since this seems to be what's limiting:

Seeing that in the VS2010 headers makes me sad inside, BUT an implementation should be fine if you use boost::bind or tbb::combine_each to allow strict binding for a solution with one combiner function, for use in C++0x lambda calculation.

First, I just want to say I've been following these videos since they started and I love them; you have a natural knack for disseminating quality information.

It just so happened that as I was watching this video, I was also working on an old and nefarious piece of code: a network message that is essentially a string in char16_t* format containing 5 integers, formatted as strings and separated by spaces, followed by a text message (this is actually a network message for SpatialChat in the MMORPG project I'm working on). The 5 integers at the beginning are IDs for things such as the mood of the speaker and the form of delivery (shouting, whispering, etc.).

The original implementation of this involved converting a char16_t* string to an int8_t* (or from UTF-16 to UTF-8) before trying to work its way through processing and then converting back to char16_t*. This works great for English text, but we recently had a few players from Russia using the Cyrillic alphabet, with which the code blows up due to the loss of precision in switching between string formats. It was my task to resolve this issue, and given that the bulk of the processing is in a while loop, I wanted to see if this could be eliminated using STL algorithms.

Here is the original code as it existed in the source; note the lack of comments, which made figuring out what the code was actually doing a big chore:

After careful examination you can see that the string is converted from UTF-16 to ANSI format, then the string is looped over looking for the 5 integers (in string format), and finally the leftover data is placed into a new string. Very complicated, brittle, and it fails horribly in the case of strings that require UTF-16. Someone in the past had also decided it would be a good idea to typedef a custom BString class to string, which has been an endless source of confusion as well.

Here is my "homework" solution which makes use of several STL algorithms to achieve the desired results:

// Get the Unicode chat data.
std::u16string chat_data = message->getStringUnicode16();
std::vector<std::u16string> tmp;
std::vector<uint64_t> chat_elements;
int elements_size = 0;

// The spatial chat data is all in one string, consisting of 5 chat elements
// followed by the text of the spatial chat. The 5 chat elements are integers
// sent as strings, so we use an istream_iterator, which splits the string
// at spaces.
std::basic_istringstream<char16_t> iss(chat_data);
std::copy_n(std::istream_iterator<std::u16string, char16_t, std::char_traits<char16_t>>(iss), 5,
    std::back_inserter<std::vector<std::u16string>>(tmp));

// Now use std::transform to convert the vector of std::u16string to a vector of uint64_t.
try {
    std::transform(tmp.begin(), tmp.end(), std::back_inserter<std::vector<uint64_t>>(chat_elements),
        [&elements_size] (const std::u16string& s) -> uint64_t {
            // Convert the element to a uint64_t.
            uint64_t output = boost::lexical_cast<uint64_t>(std::string(s.begin(), s.end()));
            // After a successful conversion, store how long the string was
            // (plus 1 for the space delimiter that came after it).
            elements_size += s.size() + 1;
            return output;
        });
} catch (const boost::bad_lexical_cast& e) {
    LOG(ERROR) << e.what();
    return; // We suffered an unrecoverable error; bail out now.
}

// After pulling out the chat elements, store the rest of the data as the spatial text body.
std::u16string spatial_text(chat_data.begin() + elements_size, chat_data.end());

The above works like a charm! My only complaint is the lack of string literal identifiers in VC2010 when working with std::u16string and std::u32string; however, it's not a dealbreaker (i.e., I can't do something like:

std::u16string mystring(u"Some text goes here");

).

Thanks again for a great series, you're definitely making an impact in the C++ community!

[Slauma]> Accidentally, I saw on the Dinkumware website that they offer an "Allocators library". Then, taking a closer look at MSDN, I found that this library is indeed also part of VS2010 (the "allocators" header file). Is that new in VS2010? At least I can't remember having this in VS2008.

We added the Dinkum Allocators Library to VS 2010. Picking it up wasn't too much work, and we hoped that it would help customers with advanced allocator needs.

Because it's a non-Standard library, it's subject to change in the future, especially in response to customer feedback.

> I'm especially interested in this subject because I am maintaining a library...

Several suggestions:

1. Upgrade to VS 2010. Its move semantics increase the performance of STL-using applications. The magnitude of the increase depends on your application (it could range from unobservable to order-of-magnitude), but it certainly can't hurt. Additionally, _ITERATOR_DEBUG_LEVEL (the successor to _SECURE_SCL) now defaults to 0 in release mode. If you weren't aware of this before, it exacted up to a 2x perf penalty in VS 2005 and 2008. It is possible to manually disable it in 2005/2008, but doing so is fraught with peril; upgrading to 2010 is much, much better (also thanks to our deterministic linker checks for _ITERATOR_DEBUG_LEVEL mismatches).

2. If you're compiling for 32-bit and running on XP (or its server counterpart, Server 2003), then you're not getting Windows' Low Fragmentation Heap by default. It can be manually requested, and doing so is worth trying (it takes like 5 lines and immediately affects your entire application). When you compile for 64-bit, the CRT enables the LFH for you. And when you run on Vista or higher, Windows automagically detects allocations that would benefit from the LFH, and enables it for you; I am told that this automagic detection is so good, there is no benefit to manually enabling the LFH. That's why only 32-bit programs running on XP are affected.

3. Consider using the Boost Graph Library; replumbing your application could be a significant amount of work, but the BGL is very advanced.

4. Parallelization, if possible.

5. Looking into allocators is definitely appropriate here.

> Well, perhaps this wish is too special for your introduction, but it would be great if you could spend a few minutes on this subject sooner or later!

As it so happens, I've been experimenting with allocators in my Nurikabe solver, whose performance is currently dominated by set manipulation (list, map, and set are all node-based containers, so they're the same as far as allocators are concerned except for the size of the nodes). More recently, I've discovered that avoiding set manipulation in one crucial part of the algorithm, instead using vectors, is even better for performance - but I may go back and polish up the allocator changes later.

Here is my code, if you'd like to get some ideas from it. WARNING DANGER HAZARD CAUTION, this is HIGHLY EXPERIMENTAL and is best regarded as a SKETCH of what a real allocator should do. Using this verbatim in production code would be an EXTREMELY BAD IDEA. (For example, it currently depends on using namespace std; which makes it unsuitable for inclusion in a header, and that is the least of its issues.)

This increases my performance on nikoli_9 from 172.772 seconds to 111.355 seconds, a 55% speedup. Briefly summarized, the allocator just calls malloc()/free() except for certain sizes. When 128 bytes or less (this is rather arbitrarily chosen; my nodes of interest are 20 bytes on x86), and evenly divisible by the size of a pointer (true for nodes, which contain pointers - I did this to avoid worrying about alignment), it activates a special scheme. For each size, on each thread (remember multithreading!), there's a void * block. If this pointer is null, it malloc()s a fresh chunk of data, and returns that. If this pointer is not null, it removes the block from a singly-linked list by hand (this is the crazy casting game), and returns that block. When a block is available, this requires no locking, and no calls to the CRT or Windows. It's just a few tests and pointer surgery. Deallocation is the reverse; for non-magic sizes it just calls free(). For magic sizes, it adds the block to the singly-linked list.

Note that this does not handle allocating memory from one thread and deallocating it from another, which ordinary new/malloc() handles just fine (at the cost of a lock). Also, blocks are currently never freed; with more thought (and especially stateful allocators) this would be possible. Here are some helper typedefs that I used:

Hopefully this should give you some idea of what is possible with allocators. Anything is possible as long as you are careful to satisfy the requirements - allocation is tricky, so the requirements are not trivial.

[HeavensRevenge]> Seeing that in the VS2010 headers make me sad inside

I don't know what you're referring to. That's a valid implementation of N3126 20.8.10.1.3 "Placeholders" [func.bind.place]. Our bind() implementation has (many) bugs, but that is not one of them.

[Nils]> aha did anybody else notice that the program is called meow?

Nice catch - I love cats.

[devcodex]> First I just want to say I've been following these videos since they started and I love them, you have a natural knack for disseminating quality information

Thanks for continuing to watch them!

> The original implementation of this involved converting a char16_t* string to an int8_t* (or from UTF-16 to UTF-8) before trying to work its way through processing and then converting back to char16_t*. This works great for English text, but we recently had a few players from Russia using the Cyrillic alphabet, with which the code blows up due to the loss of precision in switching between string formats.

Slight correction: UTF-16 can be losslessly converted to and from UTF-8. Both are encodings of Unicode. However, converting UTF-16 (which is Unicode) to ANSI (which is non-Unicode) loses information; I believe you meant ANSI here instead of UTF-8. In VC10, I fixed bugs like this in the CRT.

(By the way, Unicode is full of headaches, but ANSI is far far worse, as I'm sure you've discovered.)

> Here is my "homework" solution which makes use of several STL algorithms to achieve the desired results:

Nice.

Here's a possible further improvement: given your description of the problem, I'd use a const wregex r(L"(\\d+) (\\d+) (\\d+) (\\d+) (\\d+) (.*)") and perform a regex_match() which tests whether the whole string matches the whole regex. Having passed wsmatch m into regex_match(), I could parse m[1] through m[5] as integers (stoi(), new in VC10, can be used for this), and m[6] would be the rest of the string.

> My only complaint is the lack of string literal identifiers in vc2010 when working with std::u16string and std::u32string

Unicode string literals are a C++0x feature that's on our radar, but it's pretty far down on our list of priorities.

> Thanks again for a great series, you're definitely making an impact in the C++ community!

Looking at this solution compared to the original while loop that did the same thing, it's much easier now to determine the intention of the code just by reading it. I am continually amazed at just how powerful and elegant the STL is, and at the impact the new standard has on the language. What an awesome experience it must be to have a career working so closely with it!

[devcodex]> Looking at this solution compared to the original while loop that did the same thing, it's much easier now to determine the intention of the code just by reading it.

Yay! Yep, that's exactly right. It's also less likely to contain bugs, especially as the code is modified over time (what if you add a sixth number or change the format in another way?).

[Mr Crash]> Could you perhaps explain when to use 'get_temporary_buffer'?

It has a very specialized purpose. Note that it doesn't throw exceptions, like new (nothrow), but it also doesn't construct objects, unlike new (nothrow).

It's used internally by the STL in algorithms like stable_partition(). This happens when there are magic words like N3126 25.3.13 [alg.partitions]/11: stable_partition() has complexity "At most (last - first) * log(last - first) swaps, but only linear number of swaps if there is enough extra memory." When the magic words "if there is enough extra memory" appear, the STL uses get_temporary_buffer() to attempt to acquire working space. If it can, then it can implement the algorithm more efficiently. If it can't, because the system is running dangerously close to out-of-memory (or the ranges involved are huge), the algorithm can fall back to a slower technique.

99.9% of STL users will never need to know about get_temporary_buffer().

> Btw have you ever thought about writing a book about STL ?

That's one of the things I'd like to do if I had enough free time.

> Oh and there's nothing wrong with your voice or appearance, it is an encoding/hardware problem

Well, maybe I only sound weird to myself, but my right eye is made of plastic (due to a birth defect), so I know I look weird. Thanks, though. :->

1) I've moved my library to VS2010 to run a performance test: it runs 8% faster compared to VS2008 on the same machine. But this is only the total time, including some file read/write and other operations. The part which works extensively on the STL containers will probably see more than an 8% performance increase (but not more than roughly 20% to 25%, which is still fine for me, considering that I didn't have to do any work to achieve this except recompiling).

2) Why didn't you use the new allocator_chunklist in VS2010 instead of your hand-written block allocator? Isn't it this allocator's intention to provide block allocation out of the box? Did you avoid it because the allocator is in the stdext namespace and not standard-compliant? Does allocator_chunklist have any drawbacks? (I'm asking because it's the allocator I am planning to use in my library.)

[Slauma]> I've moved my library to VS2010 to make a performance test: It runs 8% faster compared to VS2008 on the same machine.

Awesome!

> Why didn't you use the new allocator_chunklist in VS2010 instead of your hand-written block allocator?

I played with it. As I recall, I was confused by having to specify the size of the blocks being allocated. I wanted an allocator that would work for a range of block sizes. I could have been mistaken - I didn't spend very long playing with <allocators>.

Well, that's not the only strangeness I've noticed, but you get my point.

Actually, I'm already counting the days until the next Visual Studio version is released; VS2010 is so incomplete, buggy, slow (going over to WPF was a really bad move performance-wise), etc., compared to VS2008.

I've also watched all of these videos since the beginning, and previous videos from you (STL). TBH, these are the only learning videos I watch. I often go to C9 to see if they have a new video from you.

Keep it up!!!

P.S. A video on STL compile-time calculation would be cool. I forgot the name, but a small example is shown in the Effective C++ book (I don't have it here), using the enum hack/workaround.

In the Standard Library, when we become aware of problems, we fix them if we have sufficient time remaining. When the Working Paper itself is defective, that's a headache for us, but we try to figure out something reasonable to do. In this case, nobody reported problems with to_string() in time for us to fix them, so they didn't get fixed. (to_string() has already been fixed for VC11.)

The moral of this story: always grab beta releases and report problems with them as soon as possible (through Microsoft Connect). If you report bugs too late, you'll have no choice but to wait for the next major version.

> the compiler got some really embarrassing bugs

To put this in the gentlest possible way, mentioning unspecified compiler bugs to someone who isn't a compiler dev (me) is a doubly unproductive use of time. Please report compiler bugs through Microsoft Connect. Due to limited resources, the compiler team has to aggressively triage bugs so that they can spend time on the very worst ones, so you should be prepared for Won't Fix resolutions. However, that's better than not reporting bugs at all, which makes them even less likely to be fixed.

If you maintain software, you should be familiar with these issues; every developer wants their users to report bugs as soon as possible.

[Deraynger]> I also looked at all of these videos since the beginning, and previous videos from you (STL). TBH, these are the only learning videos I watch. I often go to C9 to see if they have a new video from you.

Cool! I'll be filming Part 8 in early November.

> P.S. A video on STL (compile-time calculation) would be cool.

Explaining template metaprogramming with C++0x <type_traits> is an excellent idea. I may do just that for Part 8.

@STL: Just wanted to mention that the compiler bugs have been reported already.

There are only two reasons why I mentioned the compiler bugs: one was to give an example of why I'm waiting for the next version. The second is that, sadly enough, the more you complain and inform the public (other developers, in this case) about the bugs, the higher the chances are that they will get fixed at all (many bugs never get fixed, as you obviously already know).

I do run betas in VMs, but time is limited, so I have to trust that Microsoft's internal beta testers do their work (with the public beta testers' help, of course).

> Due to limited resources..

Yeah, I've never really bought that as the real reason; it smells more like an excuse. Usually it's easily fixable (e.g., a pipeline issue), but due to poor prioritizing it's never fixed, and because of that you get a backlog, etc.

Now that I think about it, it would be interesting to see this process in action. Hey Charles, what do you say about a Channel 9 video following a bug from report to fix, with information about this "limited resources" that Microsoft dev teams talk about all the time?

"An inside look at how the compiler team (tries to) fix bugs"

On another subject:

STL, what processes do/did you go through to optimize the Nurikabe program?

E.g., are there any special techniques you use to detect when you have chosen the wrong container, etc.,

or is it just trial and error?

Perhaps make a video someday on how to optimize usage of the STL?

> Just wanted to mention that the compiler bugs have been reported already.

Excellent.

>> Due to limited resources
> Yeah, I've never really bought that as the real reason; it smells more like an excuse.

It means that we have a finite number of people (devs/testers) with a finite amount of time to work on extremely complicated machinery.

> Usually it's easily fixable (ex, a pipeline issue)

I don't know what "pipeline issue" means.

> STL, what processes do/did you go through to optimize the Nurikabe program?

Profiling, to detect where all of my time was going. This took several forms:

1. Adding more test cases. I developed my solver against wikipedia_hard, which turned out to be sufficient for correctness and functionality. But for performance, I discovered that other test cases were even harder to solve. Since then, I've been focusing on nikoli_9, the hardest puzzle available to me.

2. Emitting more information. I've enhanced the output to display the time taken by each step of analysis, and also to indicate where hypothetical contradiction analysis was unsuccessful. This allows me to notice which steps take longer, and to focus on when and where failed guesses occur (as those are extremely, punishingly expensive).

3. Using VS's profiler, which indicated (and continues to indicate) that confinement analysis (both top-level, and during guessing) is where the solver spends all of its time. This is a classic hotspot, so I don't have to worry about optimizing the rest of the code unless and until (hopefully) confinement analysis becomes so blazingly fast that other steps of analysis start showing up in the profiler.

Also lots of experimentation - I'll try making a change, and see if it makes things better or worse.

Great video, Stephan; even my (non-programmer) wife commented on how enthusiastic you are! The amount of info you pack into a short time is superb; there is no fluff here. Keep 'em coming; this series is turning into a classic guide to the STL that will be valuable for a long time, I suspect.

Actually, yes, the comments about the audio in this and a couple of earlier parts (I think the first part or two were OK) were truthful but unspecific. The recording gain is too high at some stage or another. It seems the recording level has been calibrated to be loud at a soft voice; however, the presenter's enthusiastic voice gets louder at times, and it starts to distort. This is very clearly audible in the first 10 seconds, on the words "there", "Lavavej", "part", "my"...

It's better to record at a low gain, then normalize the audio and remove noise in case the noise floor rises too much, than to record at a high level and get clipping, since nothing can be done about that later.