File size: 157.6 MB

As promised, the great Stephan T. Lavavej is back! Tens of thousands of you have watched STL's (those are his initials, so that's what we call him) introductory series on the STL, or Standard Template Library. If you haven't, you should. This series, Advanced STL, will cover the gory details of the STL's implementation; you will therefore need to be versed in the basics of the STL, competent in C++ (of course), and able to pay attention! Stephan is a great teacher and we are so happy to have him on Channel 9, and C9 is the only place you'll find this level of technical detail regarding the internals of the STL. There are no books. There are no websites. This is Stephan taking us into what is uncharted territory for most of us, even those with a more advanced STL skill set.

In the first part of this n-part series, Stephan digs deeply into shared_ptr. As you already know (since you will have the prerequisites in place in your mind before watching this—remember, watch the intro series first), shared_ptr is a wrapper of sorts: it wraps a reference-counted smart pointer around a dynamically allocated object. shared_ptr is a template class (almost everything in the STL is a template, thus the name...) that describes an object (int, string, vector, etc.) that uses reference counting to manage resources. A shared_ptr object effectively either holds a pointer to the resource that it owns or holds a null pointer. A resource can be owned by more than one shared_ptr object, and when the last shared_ptr object that owns a particular resource is destroyed, the resource is freed.

You will also learn a lot about the beauty and the weirdness inside the STL. You should take Stephan's wisdom to heart and see if you can implement some of the patterns he shares with you in your own code, and you should of course take his advice about what NOT to do in your native compositions.

Finally some C++ goodness (took you a month and a half). Finally something I know will be worth the time. More C++0x please. (BTW Charles, I'm seeing a lot of system errors when trying to post comments today; is it fail fest over there?)

This video was a bit easy for me, but I've dealt with shared_ptr implementations in detail before. Here are some topic ideas:

How exceptions work under the hood (the exception object is stored on the stack!)

How dynamic_cast works, especially in cross-casting situations (I understand there are some nasty x64 hacks)

How you prevent STL code bloat (are the void * + static_cast tricks still necessary?)

Hi STL, excellent video on shared_ptr. I always look forward to your STL videos. I have some STL questions I hope you can answer. My user suggested to me to implement iterators for iterating elements in my open-source xml library. Is it possible to implement my own iterators to work with the STL algorithms in VC8, VC9 and VC10? It will be best if my custom iterators can work with algorithms in all STL implementations out there.

I am thinking of rewriting my own next_combination algorithm to take advantage of bidirectional and random-access iterators. Right now, it only uses the least-common-denominator iterators (bidirectional iterators), which is slow because the algorithm has to increment the iterator one step at a time instead of 'jumping' to the required position. Is it possible to write my algorithm to work with all VC(8,9,10) STL iterators, or with all STL implementations? Are the iterator trait tags the same?

Marek> I'd like to hear something about sorting, trees (maps) and some tricky algorithms.

Good ideas - I've definitely been planning to explore various algorithms, and our sorts and trees contain some of the most interesting machinery.

MikeK> As for ideas for future videos, one on std::string would be great.

std::string's Small String Optimization is worth looking at. And going through its support for move semantics would give me a chance to explain in detail the Standardization Committee's bug that we fixed right before VC10 RTM thanks to an observant customer. Also, I recently optimized string::resize() and string::erase(), and it might be useful to show what I did and how I looked at the generated assembly.

Mr Crash> Finally some C++ goodness (took you a month and a half)

Heh. The studio was still undergoing renovation in January, and also I've been very busy with VC11. Charles said that he wanted to get me in the studio every week, and I was all, "great idea, but I've got this day job that you might have heard about..." :->

Ben Craig> This video was a bit easy for me, but I've dealt with shared_ptr implementations in detail before.

I'll do my best to cover mind-bendingly complicated topics in the future. :->

Ben Craig> How exceptions work under the hood (the exception object is stored on the stack!) How dynamic_cast works, especially in cross casting situations (I understand there are some nasty x64 hacks)

(Un)fortunately, these things are beneath my level of abstraction, which is to say that they're deeply magical to me, I'm glad they just work, and I'm very glad that somebody else has to worry about their implementations. In particular, almost all of that machinery is in the compiler, not the libraries. It might appear that I know a lot about the compiler, but my knowledge mostly ends where the Standard ends.

We rely on /OPT:REF,ICF magic. That's basically guaranteed to merge stuff like vector<X *> and vector<Y *> (note: STL containers of owning raw pointers are leaktrocity, as I explained in the intro series, but STL containers of non-owning raw pointers are perfectly fine and sometimes useful). In 4 years of maintaining the STL, I haven't seen a single customer reporting code bloat problems.

shaovoon> Is it possible to implement my own iterators to work with the STL algorithms in VC8, VC9 and VC10? It will be best if my custom iterators can work with algorithms in all STL implementations out there.

Totally possible. That's the best thing about having an International Standard, and a library designed by a genius (Stepanov) with easy and efficient extensibility in mind.

shaovoon> Right now, it only uses the least-common-denominator iterators (bidirectional iterators), which is slow because the algorithm has to increment the iterator one step at a time instead of 'jumping' to the required position.

@STL: "I'll do my best to cover mind-bendingly complicated topics in the future. :->"

Yes please! I do like to watch the "easier" stuff, but the mind-bending stuff is always fun too. The deeper you dive the better.

@Charles: STL's comment brings another idea to mind, but I suspect it would be hard to pull off. I would REALLY like to see a series with the compiler guys, digging into the gory details of the compiler. (Maybe that is too proprietary to share. I'm sure the compiler guys are busy too.) Or if that is too specific, maybe a more general series on compilation-related topics, parsing, translation, etc.... whatever happened to Phoenix, C# compiler as a service, etc... Just thinking out loud here.

Ah, the download FINALLY completed (C9 is really slow tonight). Off to watch this episode.

I've always wondered why the C++ Standards Committee decided against adding intrusive_ptr to the Standard C++ Library. A shared_ptr is not only sizeof(void*) larger than an intrusive_ptr, it also requires an extra heap allocation for the ref count block (which can be avoided by using make_shared, but make_shared cannot be used in scenarios where allocation needs to take place at a different site than pointer definition). Besides, shared_ptr's approach to thread safety forces a design where reference counting must be done atomically, regardless of whether the pointer is accessed by multiple threads or not. intrusive_ptr, on the other hand, would allow a design where such decisions could be made per object type. So my question is: why?! Why don't we have a std::intrusive_ptr type like Boost?

We could name it Native TV, but I don't want to offend my Native American brothers and sisters (and friends). Nor do I want to cause angst among the native developers out there who don't program in C++...

@Charles: Keep C++ TV video length at 40+ mins, ok? Because I can never wait till I get home to watch the STL videos. I usually watch them at the workplace during lunchtime. 15 mins of lunch and 45 mins of STL goodness!

Interesting lecture. I think it would have been good if you had gone into the assignment operator or copy constructor of shared_ptr, so it's clear how the control block is passed around to the various copies of the shared/weak ptrs. Other than that it was an insightful view into the topic. It was also interesting how you used preprocessor macros and the inclusion of the file multiple times to generate the different combinations of templates required (rather than using a code generation tool or similar). The fact it's included 11 times: does that mean that this STL version only supports up to 11 arguments for the constructor?

@ryanb: That's good stuff, Ryan. As you say, the C++ compiler people are extraordinarily busy - but it's not inconceivable that we go and meet them, dig into how the front end and back end compilers parse, analyze, optimize, etc... There is a huge amount of stuff we need to do further up the stack at the language level as well.

It's also clear that we should consider exploring the jewel in the haystack: the machine. After all, the notion of "native code" we use interchangeably when referring to C++ really means the high-level syntax we humans compose in such a way that it efficiently abstracts what the machine will eventually do with the processing instructions (machine code) created by the compiler (in C++, the back end compiler...). The argument on reddit about my stating that C/C++ is native code (in the description of the Mohsen and Craig interview) is entertaining, by the way. Love the passion out there

C

PS: Let's end this tangential conversation. We can move the C++ TV ideas to the Coffeehouse and leave the comments on this thread for the topic at hand: the STL's shared_ptr implementation. Stephan doesn't have much free time, so let's make it easy for him to parse this thread for related questions/comments. Also, if you feel compelled to debate the meaning of "native code", then Coffeehouse is the place. Thanks for your understanding

Here is something I used to wonder about for a long time. It is impossible to create an array of a user-defined type that has no default constructor (unless you explicitly initialize all elements via an array initializer, of course). How does std::vector pull it off even though it uses an array internally? So if it's not too trivial, you could talk about placement new and explicit destructor calls. (This would be a perfect place to discuss various memory management and object lifetime details/issues.)

Also, I would love to see a guide through the implementation of unordered_set, provided there is any interesting "magic" going on.

I guess <initializer_list> is not practical to talk about yet due to lack of support in VC, right?

One thing: why are shared pointers considered "advanced STL"? I know they're not as well-known as some of the other STL things, but they really shouldn't be considered advanced material.

"How does std::vector pull it off even though it uses an array internally?"

That's pretty easy. std::vector allocates bytes, not arrays. It doesn't internally do a "new ClassType[size]"; it just calls the allocator and asks for a block of memory of size "sizeof(ClassType) * size". When it adds an entry, it constructs the object in that piece of memory with placement new, which invokes the copy/move constructor.

I've been on the Committee's mailing lists for the last 4 years, but I haven't attended any of their meetings, so I can't answer this with perfect precision. (Also, while I worked on getting Dinkumware's implementation of TR1 into VC9 SP1, the design of TR1 happened before my time.) My understanding is that intrusive_ptr wasn't proposed for inclusion in TR1/C++0x, rather than being proposed and rejected. You might be able to get a more detailed answer by asking on the Boost mailing list.

fileoffset> I think it would have been good if you went into the assignment or copy constructor of the shared_ptr so its clear how the control block is passed around to the various copies of the shared/weak ptr's.

Agreed - especially for the converting copy constructor from shared_ptr<Derived> to shared_ptr<Base>. However, I have to cram everything into 40-45 minutes, and there just wasn't time. I also had to spend some time explaining the overall series.

> The fact its included 11 times, does that mean that the STL version only supports up to 11 arguments for the constructor?

0 to 10. 10 is infinity. This Standard Library doesn't go to 11. :->

NotFredSafe> So if it's not too trivial, you could talk about placement new and explicit destructor calls.

I may be able to work that into a future part. My concern wouldn't be that it's trivial - it's actually rather complicated - but that it's not widely useful enough. Using Part 1 as an example, type erasure is an enormously powerful trick that can be used in lots of situations, and knowing about make_shared<T>()'s optimizations can help you to use the STL more effectively. Placement new seems very low-level, given the number of times I've had to explain it to people (not many). Still, when I start poking around the guts of containers I may have to mention it whether I like it or not.

> Also, I would love to see a guide through the implementation of unordered_set, provided there is any interesting "magic" going on.

Some magic - actually, we've been squashing debug perf bugs there in VC11. (Debug perf isn't terribly important, except when it's so slow as to be un-debuggable!) I'm not too familiar with unordered_foo's machinery, but I could probably figure it out pretty quickly.

> I guess <initializer_list> is not practical to talk about yet due to lack of support in VC, right?

Correct, VC10 RTM doesn't support initializer lists. It contains a nonfunctional <initializer_list> header because I simply forgot to remove it. (I carefully scrubbed out Dinkumware's library support for initializer lists, but forgot about a whole header. Go figure.)

JamesG> Fix the blurry text problem with code in VS. Part of the time, I could not read the code.

What blurry text? I downloaded the High Quality WMV, viewed it at 100%, and forwarded to 30:28 - meow.cpp's Consolas font is ginormous (as intended), and I can even clearly read Intellisense's tooltip, which I feared would be invisible.

std::vector allocates bytes, not arrays. It doesn't internally do a "new ClassType[size]"; it just calls the allocator and asks for a block of memory with a size of "sizeof(ClassType) * size". When it adds an entry, it first constructs that piece of memory by calling a placement new, then issues the copy/move constructor.

Yes, I know. That's why I said "used to wonder" and later mentioned placement new.

You are an excellent presenter, Stephan. This series is great to watch and follow even for more experienced developers. Although I have been using the STL for years, its internals have always been intimidating. But your clear presenting and expert knowledge give us mortals a much better understanding of the magic. Keep up the good work.

3) At compile-time, is there any way (even some weird compiler-dependent macro) to determine how deep into template instantiations we are without keeping track of that depth ourselves?

4) for-loops seem to unroll themselves if the conditional expression can be determined at compile time and if the number of iterations is small enough. What is the maximum number of iterations that still allows for-loops to unroll? In other words, when does for-loop unrolling end and assembly jumps begin?

5) What improvements can be made to my loop unrolling code at the end of this post?

Background: Previously, I asked if we could cover loop unrolling (esp. for assignment statements). Normally, if I needed to repeatedly perform a few hundred thousand assignment statements (i.e. copying the buffer of images or video frames for processing) and I wanted to minimize the impact of the "i < size" and "++i" in that for-loop, then I would just write out a for-loop with a bunch of assignment statements in the for-loop body using a script and then copy & paste that code into the relevant cpp file by hand. Of course, this manual for-loop unrolling assumed that the sizes of the loops (i.e. the sizes of the images or video frames) weren't going to change from one compilation to another. A few weeks back, I had to come up with something a little easier to work with, since I was going to be dealing with a number of different buffer sizes (all still known at compile time ... no run-time querying). With the help of pages 314-318 of C++ Templates: The Complete Guide, I ended up writing something like the following:

This code can be compiled in g++ 4.5.2 using the following command line (assuming the code is in a file named main.cpp):

g++ -o main.exe main.cpp -std=c++0x -O3 -Wall -Wextra -Werror

or compiled in VS2010 using Warning Level 4. For VS2010, it takes about two minutes to compile in Release mode if you are also producing the Assembly with Source Code ( /FAs ... or Properties / Configuration Properties / C/C++ / Output Files / Assembler Output ) as well. This code produces the following output:

Burkholder> 3) At compile-time, is there any way (even some weird compiler-dependent macro) to determine how deep into template instantiations we are without keeping track of that depth ourselves?

No. I've never heard of any compiler having such an ability, and it would be extremely problematic.

Burkholder> 4) for-loops seem to unroll themselves if the conditional expression can be determined at compile time and if the number of iterations is small enough. What is the maximum number of iterations that still allows for-loops to unroll? In other words, when does for-loop unrolling end and assembly jumps begin?

This is up to the optimizer and your optimization settings. Crazy magic happens here.

Burkholder> 5) What improvements can be made to my loop unrolling code at the end of this post?

Consider using SSE, etc. Video processing is a perfect scenario for vectorization.

(Of course, for simple copying, just use memcpy()/memmove(). In fact, our implementation of std::copy() calls memmove() when it can get away with it - something I'm very likely to cover in future parts.)

Great lecture as ever from STL of the STL. The pace was spot on; if I needed something clarified I could use the seek bar. The book 'C++ in Action' recommends using a leading underscore to name private data members (http://relisoft.com/book/lang/scopes/2local.html (scroll to bottom)), which I started doing but couldn't stand the ugliness of after a while. Now I know there's an even stronger reason not to use this convention. I would like to see how the STL can be used to implement machine learning, search algorithm optimisation (e.g. iterative deepening, incurred cost estimation, etc.) and other aspects of the AI field. I'll be happy with whatever direction you go in though... good work!!

To clarify, only _Leading_underscore_capital and double__underscoreAnywhere names are reserved everywhere. _leading_underscore names are reserved in the global namespace, but users can use them in classes. (See N3225 17.6.3.3.2 [global.names].)

In my opinion, _member and member_ are terribly ugly. I use m_foo for members, because the lifetime of a data member exceeds that of any individual member function, and it's important to be constantly reminded of that fact. (This isn't Hungarian notation, which is evil - that attempts to encode types into names.)

Philhippus> I would like to see how the STL can be used to implement machine learning, search algorithm optimisation (e.g. iterative deepening, incurred cost estimation, etc) and other aspects in the AI field.

I'm not familiar with those domains, sorry. As I explained back in Intro Part 1, the STL is a library for pure computation, so you get to figure out how to apply it to your field. :->

(My Nurikabe solver was an attempt to demonstrate how the STL could be applied in a nontrivial program - but while I thought it was fascinating, I'm not sure how successful it was. It also took me weeks to write, something I can't easily do again. Even repurposing my code at home for data compression or font rendering would take a while.)

WATCHER> Would be nice if you got the watch window font to a size that can be seen.

I couldn't find an option for it. If somebody could find one, I'd be very grateful.

WATCHER> Be nice if you found out why the watch window was wrong, too

I think I'll reformat my laptop before filming Advanced Part 2. I may have messed with the visualizers in the past, but I thought I put everything back to its original state.

@JamesG: Smooth streaming varies streaming quality from low to high depending on your network conditions. We are aware that this isn't a great experience when there's code on screen and your network isn't capable of a large data stream. The downloadable files are located under the Download section next to the inline player.

Ugh, design patterns. As far as I can tell (I'm looking at the book right now), "Strategy" means "customize behavior". The STL does this in lots of places: functors given to algorithms, comparators given to maps, allocators given to containers. Their bullet point "Strategies as template parameters." covers this.

@Charles: Maybe you should use a 2-pane layout for lectures and show us the slides and code in another view, just like the PDC player or the MSR lectures player. And there should be a formal section for download links to slides or other downloadable materials.

First: Thanks for a fantastic video, Stephan. Type erasure is used in many places now (std::function, boost::any...) and it's a fantastic idiom that helps decouple implementation details from interfaces. One question though: If the make_shared allocation allocates the object and the reference counting block together, do they also have to be freed together? If I have a very big object and no more "uses", but still weak links, will the memory not be freed?

I have a question that I think you can answer or at least clear up a bit.

For some time now I have been confused about which of these to use for dynamic buffers that can range from as small as 1 byte to over 500 megabytes and beyond, and be modifiable (but normally around 2 - 100 megabytes):

vector<BYTE> buf;

or

unique_ptr<BYTE[]> buf(new BYTE[...]);

Marius> If the make_shared allocation allocates the object and the reference counting block together, do they also have to be freed together?

Yep. When all of the shared_ptrs have been destroyed/reset/assigned/etc. the object will be destroyed, but the refcount control block containing space for the object will persist until all of the weak_ptrs have been destroyed/reset/assigned/etc.

Marius> If I have a very big object and no more "uses", but still weak links, will the memory not be freed?

Yep. This is the one scenario (big object, weak_ptrs) where traditional shared_ptr construction is better than make_shared<T>(). Of course, only sizeof(T) matters, not the size of any dynamically allocated memory it might contain - for example, vector<T> is small according to this metric (only 16 bytes in VC10 and 12 bytes in VC11).

Use vector<unsigned char> instead of unique_ptr<unsigned char[]> unless your scenario would strongly benefit from unique_ptr and you know exactly what you're doing. vector is much more powerful (and still insanely efficient), beginning with the fact that it knows its own length. vector stores 3 raw pointers compared to unique_ptr's 1, but that matters only when you've got a zillion of the things - and if your buffers are megabyte-size I can guarantee that you don't have a zillion of the things.

Way back in Intro Part 1, I stressed that vector should be your container of first resort. That's still true.

STL: "Way back in Intro Part 1, I stressed that vector should be your container of first resort. That's still true."

Yes, I did see it. There are so many ways of doing the same thing with the STL that it is sometimes hard/confusing to know what to use in which situation. Your videos are helping in that area though. :)

Is the above still true when you use that vector buffer and cast it to a structure (e.g. LPBITMAP), change values in it, and then save the buffer to a file again? And/or the other way around: new up a vector to use as a temporary buffer, insert structure info/file headers, etc., and then save it to a file?

STL: "...unless your scenario would strongly benefit from unique_ptr and you know exactly what you're doing."

"...strongly benefit..." Can you elaborate on that?

I like to know both sides of the story before I choose a side. Side 1: pros/cons vs. side 2: pros/cons.

Marty: Yep, still true. Also, vector works just fine with C-style APIs, where you can pass v.data() and v.size().

Marty> "...strongly benefit..." Can you elaborate on that ?

Basically, if you have a zillion - like ten million - tiny buffers (so fixed overheads per buffer really matter), AND their sizes aren't known at compiletime (so you can't just use std::array), AND they're not growing/shrinking during execution (so you don't have to worry about performing reallocation yourself), AND you can determine their runtime size from other information you're already storing (otherwise, you'll need to store unique_ptr<T[]> and size_t, which is 8 bytes versus vector's soon-to-be 12). In that contrived case, unique_ptr might be better than vector.

Regarding your question about the pace of these shows, I listen to the audio while driving, so the pace is fine for me. Perhaps if I were to watch the video it would be too slow, but as it is I can usually follow everything that's going on without having too many fatal accidents. I do have a question regarding this show: when using make_shared, the T must sometimes be destroyed before the reference counting block (if weak pointers exist). How is this managed? Do you use a char buffer with placement new and then call the destructor explicitly? If so, how do you guarantee that the memory block for T is correctly aligned (is it as simple as it being the first part of the structure returned from new and therefore aligned for all types)?

Motti: See _Ref_count_obj in <memory>. Yes, we use placement new and explicit destructor calls. We also use a fancy bit of machinery called std::aligned_storage to guarantee alignment - as its name indicates, it's Standard machinery available for public consumption (by experts who know what they're doing).

You have mentioned Standard machinery available to deal with an object's alignment. It'd be great if you could cover the STL allocators and the new C++0x alignment specifiers with the STL. Implementing an SSE-friendly guaranteed-alignment container allocator, along the lines of eastl::allocator discussed in N2271 ( http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html ), would be a fantastic example!

@Matt_PD: In the comments of the previous video (the one about type_traits) I made a comment about std::aligned_storage. I'm almost sure I'm exploiting it wrongly, but it worked fine and really guarantees the alignment of subsequent data. For most multimedia instructions (SSE, AVX, etc.) the initial address must be aligned too; with the info from this series I'm now testing shared_ptr and unique_ptr with the compiler's aligned_alloc and aligned_free. That makes it easy to use std::vector to store, let's say, 4 or more small aligned buffers for audio/video processing, but it is compiler dependent. I believe using placement new and delete would help decouple/shield the code. Anyway, I'm loving the series, and the slides from STL are helping to demystify the source code we see when playing with debug. Let's hope the next version of VS makes it easy to add the code for formatting debug views.

Burkholder> 5) What improvements can be made to my loop unrolling code at the end of this post?

Consider using SSE, etc. Video processing is a perfect scenario for vectorization.

(Of course, for simple copying, just use memcpy()/memmove(). In fact, our implementation of std::copy() calls memmove() when it can get away with it - something I'm very likely to cover in future parts.)

Thanks for the AWESOME suggestions!!! SSE, memcpy(), and memmove() are amazing!

I'm a complete newbie to SSE ... but WOW ... it seems like using one instruction to load multiple floating point values into a 128-bit register and then using another instruction to store those values is a quicker way to go. I have a few questions on SSE:

1) Since I'm a newbie to SSE, I used the following type of code to copy memory from one place to another:

Will this type of code (i.e. using __m128d * the same way I would double * or any other pointer) be valid in the future? Or is this something that works in VS2010, but might not work in future versions? ... If this will work in future versions, then what's up with all that _mm_xxxx_pd() stuff?

3) I have no idea if I'm writing good or bad SSE code. What are the suggested tutorials? Are there any good books?

Lastly, memcpy() and memmove() were faster at copying a single image than anything that I could write ... even with SSE and loop unrolling. I could only beat memcpy() and memmove() when I took into account my specific situation ... copying a single image buffer into two images ... where I could use a single for-loop for both images (as above), vice a separate loop for each image. So my question is:

4) What is the "secret sauce" in memcpy() and memmove() that makes them so much faster? Is the implementation of VS2010's memcpy() or memmove() available? If so, where can I find that code?

If memory serves me well, since VS2008 memcpy and memmove already make use of SSE instructions (including cache bypass) when you compile in release mode with the SSE flags up (/arch:SSE2 or /arch:AVX etc.); I read about it on some specialized sites. Even string search functions can take advantage of SSE4.2 (with specific str intrinsics). You don't need to manually move bytes in your code, unless you're doing something specific like moving and expanding YCbCr to RGB in the same pass. And if you like it, AVX (new instructions from the 2nd generation of Core iX) is 256 bits wide.

Sorry for the typing mistakes above: that should read "I read about it on some specialized sites" and "SSE4.2". Another thing I want to add: you're using #include <emmintrin.h>, but you only need to include <intrin.h>; that header includes all the other headers and checks macros against architectures (some intrinsics are 64-bit or Itanium specific, and the macros null them out).

Unfortunately, the machines that I'm writing for don't seem to have the AVX instruction set and its 256 bit registers. Our machines are about three years old ... and it seems that AVX is relatively new.

I am new to SSE. The only info that I have read is the couple of MSDN Help webpages about MMX/SSE intrinsics and the one GCC webpage that I could find. Where can I learn about how to program for the SSE instruction sets? What are the good book titles? What are the good websites/tutorials?

FYI: The reason that I am making copies of images is that I am at the start of a research project where I will get video streams from a couple of cameras. I need to send each video stream to a couple of real-time algorithms that will execute concurrently in separate threads. Since I don't know how destructive each algorithm will be to the image buffer, I just have another thread capture the images (i.e. fill the image buffer), make copies, and then let those algorithm threads loose on the copies ... where they can destructively edit those copies in complete isolation. The real-time algorithms have changed a few times, so this way I have a system that works without the chance of one thread stomping on another. I'm sure that I'll revisit this image-copying code at the end of the project ... but by then, all the other algorithms will have been decided upon (and hopefully, set in stone).

@Burkholder: (Now I've made an account. ) Unfortunately, I don't know of any books about SSE/AVX, at least none for programming (when you can find any, they're thousand-page manuals). Most of what I learned about intrinsics came from reading Gamasutra articles and Intel (a few times IBM) whitepapers. The Intel ones are complex but portable, as VS uses the same headers (GCC prefers its vectorization classes).

It's safe nowadays to use SSE2 code, and nearly safe to use SSE3. Some notable citizens: OpenCL on AMD CPUs and WARP (Windows Advanced Rasterization Platform). A footnote: intrinsics are mandatory when the compiler is targeting 64-bit.

I need to search for some form of direct mail or a way to send a message to a Niner, so I can stop polluting the comments with off-topic posts. )

Unfortunately, intrin.h does not seem to exist for GCC's g++ and MinGW's port of g++; however, emmintrin.h exists for both GCC's and MinGW's g++ ... and for Visual C++ 2010. Here's the simple test that I ran:

@Burkholder: Good that they added the xxxintrin.h headers; the last time I used gcc (MinGW) was a really long time ago.

intrin.h is VS-specific; it's just a huge all-in-one header that makes those architecture checks for you.

On Intel's site a guy talks about how his program (video processing, under en-us/blogs/2010/12/20/visual-studio-2010-built-in-cpu-acceleration/) got faster simply by using the SSE2 arch option on VS2010. And (en-us/avx/) is the Intel source for articles; the page always points to the newest technology, but serves as a hub for the 'old' ones.

@new2stl: I've tried looking for your comments mentioning that on the "Standard Template Library (STL), 10 of 10" video (that's the one discussing type_traits) but I couldn't see them -- was I looking in the wrong place?

SEH is extremely low level, and in general programs shouldn't mess with it.

Burkholder> I have a few questions on SSE:

I know very little about SSE, other than the fact that it exists and what it's generally useful for.

Burkholder> What is the "secret sauce" in memcpy() and memmove() that makes them so much faster?

They have dedicated assembly implementations, and I believe they're constantly maintained by some combination of our compiler back-end devs and our Intel/AMD contacts. These assembly implementations know the fastest way to copy bytes from one location to another, which can be very processor-specific.

Burkholder> Is the implementation of VS2010's memcpy() or memmove() available?

My vague understanding is that there are actually 3 implementations of memcpy/memmove: assembly, "compiler intrinsic", and plain old C. I don't know how the compiler selects between the 3. See "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\intel\memcpy.asm" and "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\memcpy.c" for the first and third. The compiler intrinsic implementation, which I've been told exists, is a sequence of instructions that the compiler knows how to generate on demand.

Gordon> So when is the next video coming, i'm in withdrawal here.

Filming tomorrow. I finished setting my laptop up today.

theDUFF> Do you think it would be possible to cover allocators at some point?

Yes, although I'll have to think of something useful to say about them.

Mr Crash> I wasn't sure if you kept an eye on the c9 forum so here you go.

I don't monitor the Channel 9 forums, but I occasionally scan the MSDN Visual C++ forums.

Mr Crash> What am i doing wrong ?

It looks like you're trying to write a scope guard. I've written an implementation powered by std::function:

* I haven't extensively tested this, nor have I used it in production code.

* Destructors must not emit exceptions, so if invoking m_f() in ~HyperScopeGuard() throws, we immediately terminate().

* In HyperScopeGuard's constructor, storing f (via perfect forwarding) in m_f might throw, e.g. if the std::function tries to allocate memory and that throws bad_alloc. In that event, we invoke f() before rethrowing, so that we don't leak whatever we're trying to guard. There are a couple of subtleties here. First, f() itself might throw. That's bad (just like in HyperScopeGuard's destructor), so immediate termination is the answer. Second, there are subtleties involving moved-from functors, but I think I'm worrying about nothing there.
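The code block itself appears to have been eaten by the comment system. A minimal sketch consistent with the three notes above (the name HyperScopeGuard is from the post; the body is a reconstruction, not the original code):

```cpp
#include <exception>   // std::terminate
#include <functional>  // std::function
#include <utility>     // std::forward

class HyperScopeGuard {
public:
    template <typename F> explicit HyperScopeGuard(F&& f) {
        try {
            m_f = std::forward<F>(f); // may throw, e.g. bad_alloc
        } catch (...) {
            // Don't leak whatever we're guarding. Subtlety: f could in
            // principle have been moved from; see the notes above.
            try { f(); }
            catch (...) { std::terminate(); } // cleanup itself threw: give up
            throw;                            // rethrow the original exception
        }
    }

    ~HyperScopeGuard() {
        if (m_f) {
            try { m_f(); }
            catch (...) { std::terminate(); } // destructors must not throw
        }
    }

private:
    HyperScopeGuard(const HyperScopeGuard&);            // not copyable
    HyperScopeGuard& operator=(const HyperScopeGuard&); // not assignable

    std::function<void()> m_f;
};
```

Usage is the obvious one: construct a guard with a cleanup functor, and the cleanup runs when the guard goes out of scope.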

With std::aligned_storage I can make the default allocators grant the alignment of the subsequent data, but the start address still needs to be aligned too for SSE and SSE2. Inspecting the source code, the allocators look for the info in std::aligned_storage when possible.

Now, with a more in-depth view of the std::XXX_ptr classes, I feel more comfortable appending a call to _aligned_free for the destructor.

If @STL provides some more info about allocators, I believe the cycle will be complete, making a wrapper around _aligned_malloc.

P.S.: all this is only necessary for MMX/SSE/SSE2, or for playing with cache lines; SSE3 and beyond include unaligned versions of the load and move instructions.

@new2STL: Regarding "SSE3 and beyond include unaligned versions of the LD and MOV" -- would you happen to know what the costs are of using unaligned load/move in SSE3+?

According to Intel, for SSE2, the costs are at least 40% slowdown (going up to possible 500%): "Empirical evaluation using a 2.8 Ghz Pentium® 4 processor system shows that an unaligned 16-byte load contained within one cache line (128 bytes) is only moderately slower–about 40%–compared to an aligned access. The cost rises sharply though when the 16-byte chunk crosses a cache line boundary. Such cache line splitting loads can be up to five times slower!"

@Matt_PD: SSE3 (and SSSE3) works better with the Core architecture (Core, Core 2, Core i#). The Core architecture is different from Prescott (P4); Core has a different way of handling cache and SIMD. Diary of an x264 Developer has some posts about cache and SIMD on Nehalem (Core i#).

Here's an excerpt from Intel about SSE3's LDDQU: "... is a special 128-bit unaligned load designed to avoid cache-line splits. If the address of the load is aligned on a 16-byte boundary, LDDQU loads the 16 bytes requested. If the address of the load is not aligned on a 16-byte boundary, LDDQU loads a 32-byte block starting at the 16-byte aligned address immediately below the load request. It then extracts the requested 16 bytes. The instruction provides significant performance improvement on 128-bit unaligned memory accesses at the cost of some usage-model restrictions."

If I find some numbers, I'll edit this post.

@Burkholder: I could be wrong, but reading en.wikipedia.org/wiki/Decltype, semantic rule 3, the function void f() as it is passed to decltype is an lvalue, so it returns a reference to the function type (the "(*)" denotes an unnamed function type). You can see a lot of this declaration in the OpenGL headers.

"If P is not a reference type:
— If A is an array type, the pointer type produced by the array-to-pointer standard conversion (4.2) is used in place of A for type deduction; otherwise,
— If A is a function type, the pointer type produced by the function-to-pointer standard conversion (4.3) is used in place of A for type deduction; otherwise,
— If A is a cv-qualified type, the top level cv-qualifiers of A’s type are ignored for type deduction."

The second bullet point applies here. When you pass an array or a function (like f) to a function template (like check_similarity_[12]) with a value parameter (like F f_), an array will decay to a pointer and a function will decay to a function pointer. These template argument deduction rules mirror how C and C++ work - any attempt to write a function that takes an array parameter or a function parameter is immediately and forcibly rewritten to take a pointer parameter or a function pointer parameter. (This rewriting is different from decay.) In C++ this is N3225 8.3.5 [dcl.fct]/5 "After determining the type of each parameter, any parameter of type “array of T” or “function returning T” is adjusted to be “pointer to T” or “pointer to function returning T,” respectively." and C has the same rule (C99 6.7.5.3/7). This is why array parameters are widely regarded to be a bad idea (anyone writing array parameters probably doesn't know what they're doing - fortunately almost nobody tries to write function parameters, which is why that rule is so obscure).

The third bullet point, const-dropping, may seem weird but it makes sense. Given template <typename T> void foobar(T t), and const int c = 1729; calling foobar(c) deduces T to be int, not const int. That's because when passed by value, the constness of the source is unrelated to the constness of the destination. foobar()'s author may want to modify its copy of t. (Otherwise, they could write template <typename T> void foobar(const T t).)

This stuff is somewhat subtle but fundamentally important, which is why I've explained at length - hopefully I haven't made things more confusing.

new2STL: Your explanation of what's happening with decltype is not correct.

double (int): This is a function type, "function taking int and returning double".
double (*)(int): This is a pointer to function type, "pointer to function taking int and returning double".
double (&)(int): This is an lvalue reference to function type, "lvalue reference to function taking int and returning double".

The (*) means "pointer". Here's a definition of a function pointer:

double (*fp)(int) = &func;

N3225 7.1.6.2 [dcl.type.simple]/4 specifies how decltype works:

"The type denoted by decltype(e) is defined as follows:
— if e is an unparenthesized id-expression or a class member access (5.2.5), decltype(e) is the type of the entity named by e. If there is no such entity, or if e names a set of overloaded functions, the program is ill-formed;
— otherwise, if e is a function call (5.2.2) or an invocation of an overloaded operator (parentheses around e are ignored), decltype(e) is the return type of the statically chosen function;
— otherwise, if e is an lvalue, decltype(e) is T&, where T is the type of e;
— otherwise, decltype(e) is the type of e.
The operand of the decltype specifier is an unevaluated operand (Clause 5)."

Both decltype(f) and decltype(f_) activate bullet point #1, "unparenthesized id-expression", and return the type of f/f_ without modification.

Bullet point #3 applies to things like decltype(ptr[index]). In this case, it turns out that adding an lvalue reference is desirable.

@STL: Looks like we were both typing at the same time. Thank you very much for the explanation! I'm getting closer to understanding.

Is there any way to go from a "void(*)()" back to "void()"? Or avoid this rewriting?

Lastly, why is the decltype( f ) in my first couple of std::is_same<...> lines in main() __not__ being rewritten to a pointer to function type once it is inside std::is_same<...>? In other words, why is the following output being produced from main()?

I'm still not sure why the rewriting in the std::is_same<>'s in main isn't taking the function type down to a pointer to function type, but I figured out how to go from "void(*)()" back to "void()" ... which was so simple I'm a little embarrassed I asked the question ... just a simple typename std::remove_pointer<...>::type. Here's the code:

Since we have F ff as the parameter of the is_F_same_as_decltype_ff() function, it seems like F should __always__ agree with decltype( ff ). In other words, the bug is that there is type disagreement when explicit template arguments are used (esp. when there is no type disagreement when implicit template arguments are deduced).

Great videos, thanks! Just today I saw a bit of weird behavior. Consider this code:

struct deleter { void operator() (int* p) { delete p; } };
std::unique_ptr<int, deleter> a(new int);
std::unique_ptr<int, deleter> b;
std::unique_ptr<int, deleter> c;
b = a; // does not compile
c = a; // should not compile too; does not link

I checked the STL source code and the operator = is a private member of unique_ptr, but the "c=a" still compiles in VC2010. Am I missing something?

@Burkholder: I filed this as a bug on Microsoft's Connect website. The title of this bug report is "VC++ 2010: Explicit template arguments cause type disagreement for types that decay to pointers" (bug id: 647035) and is under the "Visual Studio and .NET Framework" section.

Note: This bug affects anything that decays ... so both function types and array types. If you start out with say "int arr[3];" and pass arr to those type of functions using implicit and explicit template arguments, then you get the same type of results ... type disagreement ( int[3] versus int * ) when explicit template arguments are used in VC++ 2010, but __no__ type disagreement in g++ 4.5.2.

Mr Crash> Interesting code, though a bit heavy since it uses exception handling.

Exception handling is part of the language, and is used by the STL.

PetrM> I checked the stl source code that the operator = is private member but the "c=a" still compiles in VC2010.

We've already changed unique_ptr such that VC11 emits "error C2679: binary '=' : no operator found which takes a right-hand operand of type 'std::unique_ptr<_Ty>' (or there is no acceptable conversion)".

However, it appears that you've found a compiler bug. I've filed DevDiv#150368 "Access control mysteriously not applied for VC10 RTM's unique_ptr" with a minimal repro:

N3225 7.1.6.2 [dcl.type.simple]/4 says: "The type denoted by decltype(e) is defined as follows: — if e is an unparenthesized id-expression or a class member access (5.2.5), decltype(e) is the type of the entity named by e."

8.3.5 [dcl.fct]/5 says: "After determining the type of each parameter, any parameter of type “array of T” or “function returning T” is adjusted to be “pointer to T” or “pointer to function returning T,” respectively."

This "adjustment" happens before sizeof (which can easily be verified with arrays adjusted to pointers), so it should happen before decltype too.

In fact, in the absence of templates, GCC believes that the adjustment happens before decltype:


If decltype(ff) has to be adjusted to a pointer type, then why doesn't F also have to be adjusted to a pointer type (regardless of the explicit template arguments) ... like it's adjusted in the implicit template arguments case?

If (as you are suggesting) this type-disagreement behavior is intended by the standard, then why? What does this behavior enable? Or prevent? I ask because conflicting types ( i.e. F not being the same type as decltype( ff ) even though we declared F ff ) doesn't make much sense to me (esp. if ... say ... F is char[8] and decltype( ff ) adjusts down to char * where sizeof( F ) would be 8 and sizeof( decltype( ff ) ) would be 4 ).

There are several things going on here, so it's helpful to go step by step.

First, template argument deduction. This happens when you call a function template without providing explicit template arguments (or providing some-but-not-all). Being called in this manner is how most function templates are intended to be used, so template argument deduction is a very important process. (Indeed, providing explicit template arguments when you shouldn't is a subtle way to misuse C++.)

I quoted the relevant rules for this above, N3225 14.8.2.1 [temp.deduct.call]/2. (There are other rules not relevant here.) This says that given "template <typename T> whatever foobar(T t)" and "double func(int)" and "foobar(func)", T is deduced to be double (*)(int), i.e. a function pointer type. That's just how C++ works. Now, there's a reason for these rules - you can't pass around functions, but you can pass around function pointers, so when func is passed around by value, it makes sense for T to be a function pointer. (Hypothetically, the language could be restrictive and simply ban foobar(func), requiring you to say foobar(&func) in which there should be absolutely no mystery whatsoever - but recall that C++ is extremely permissive and allows programmers to say lots of things, then tries to figure out what they meant.)

After template argument deduction runs, what happens is the same as if explicit template arguments were used. So "foobar(func)" is exactly equivalent to "foobar<double (*)(int)>(func)".

The second thing is function parameter adjustment. This rule is ANCIENT, literally, because it comes from C. This says that functions declared as taking arrays or functions are immediately rewritten, or adjusted, to take pointers or function pointers instead. That's just how C works (and how C++ works). Now, there's a reason for THESE rules - in C you can't pass around arrays or functions, but you can pass around pointers or function pointers. So when the language sees a function declared as taking something "impossible", it just says, "okay, you can think it works like that, but I need to compile this into something possible". Personally, I take an extremely harsh view of this syntax, as I mentioned earlier - but the rules are what they are.

The rules for template argument deduction and function parameter adjustment are totally different - they occur in different clauses of the Standard - and yet similar, because they are both dealing with the same thing - you can't pass arrays and functions by value.

Now, you've taken it to the next level by mixing templates and function parameters of function types.

> why doesn't F also have to be adjusted to a pointer type (regardless of the explicit template arguments)

Explicit template arguments don't get messed with. (I believe I'm glossing over a couple of subtleties here, mostly with template non-type arguments - please don't ask about those - but for the most part this is true.) If you tell the compiler that F needs to be a function type instead of a function pointer type, then a function type it shall be.

That still doesn't stop the compiler from performing function parameter adjustment, though - that process is unstoppable.

As you can see, this is subtle enough that two compilers written by experts disagree, but I believe that the Standard speaks with a clear voice here. (Sometimes it's worse and the Standard itself is ambiguous, requiring a Core Language Issue to be filed. Even Standards have bugs.) I believe that function parameter adjustment should happen before decltype inspects the type. I could be wrong - I have been known to be wrong about the Core Language in the past.

> I ask because conflicting types ( i.e. F not being the same type as decltype( ff ) even though we declared F ff )

Take a look at my "absence of templates" example above where VC and GCC are in agreement. There, ff is declared to have function type, but it actually has function pointer type.

Perhaps another example will illustrate why I believe that GCC is incorrect and inconsistent.

Given meow<int[3]>(arr), GCC believes that t is 4 bytes (incontrovertibly correct, it is a pointer), but that t's declared type is 12 bytes. Yet given purr(arr), where x is declared in the source code to be int[3], GCC believes that both x and x's declared type are 4 bytes.

I cannot imagine any possible interpretation of the Standard that permits GCC's behavior here.

@STL: Wow ... time to submit a bug report on g++! Or have you already done so?

I hear what you are saying about C++ staying consistent with C via adjustment ... which is why void ( char[123] ) is the same as void ( char * ) ... and void (*) ( char[123] ) is the same as void (*) ( char * ); however, VC++ seems to be just as inconsistent as g++ ... only in a slightly different way. Here's the inconsistency (switching to character arrays ... because sizeof( void () ) is not allowed by the standard):

If the currently proposed standard allows either one of these inconsistencies, then maybe some compiler warnings should happen ... or just fix the proposed standard before it is finalized.

Lastly, you wrote:"Indeed, providing explicit template arguments when you shouldn't is a subtle way to misuse C++"

I'm gonna go with "what?" on this one. Using implicit or explicit template arguments should produce exactly the same results ... because that would make the most sense (i.e. the whole "intuitive" thing). Don't you agree?

> VC++ seems to be just as inconsistent as g++ ... only in a slightly different way.

VC's working correctly there. In the case you're looking at (where VC prints 20 for T and 4 for t and decltype(t)), you've explicitly specified T to be char[20]. Function parameter adjustment makes t a char *, which is why it's 4 bytes.

Function parameter adjustment does not affect template parameters.

In particular, this behavior (sizeof(T) is 20, sizeof(t) is 4) is shared by VC and GCC, and mandated by C++98/03. It's not new.

> Using implicit or explicit template arguments should produce exactly the same results

It mostly does, when you're careful to specify exactly the same template argument that automatic deduction would have chosen for you - but then, why bother? And if you specify something different, now you're forcing the template into an unusual mode of operation.

(I'm aware of one case where you can specify explicit template arguments identical to what template argument deduction would have chosen, and yet the compilation explodes. This happens when people use explicit template arguments with swap(), which is WRONG and BAD and WRONG. The problem is that there are many swap() overloads, and while the provided explicit template arguments will work for the overload desired by the programmer, the compiler has to look at the *other* overloads too, and plugging those explicit template arguments in can cause a hard error. In contrast, when you rely on template argument deduction like you're supposed to, the undesired overloads fail out of deduction and are silently removed from the overload set. Again, this is subtle - if you don't understand it, simply remember that you shouldn't use explicit template arguments unless the function is documented as being called like that, as with make_shared<T>() for the first template argument.)

Where in the proposed C++0x standard does it state that function parameter adjustment does not affect explicit template parameters? My interpretation of 14.8.2.3 (PDF page 383) and the explicit template arguments in its example seems to suggest that it does (esp. #2, where f<const int> implies that T and decltype( t ) are both const int, but the signature of the explicit f<const int> is adjusted to void(*)(int) ). In other words, T agrees with decltype( t ) in spite of adjustment ... but my interpretation could be wrong.

> In particular, this behavior (sizeof(T) is 20, sizeof(t) is 4) is shared by VC and GCC, and mandated by C++98/03.

I don't have a copy of the C++98/03 standard handy. Is this still in the proposed C++0x standard? If so, where can I look this up? I did a quick search of "sizeof" in the PDF and I didn't see anything applicable to this situation ... but I might have read right over it.

> Using implicit or explicit template arguments should produce exactly the same results

> It mostly does, when you're careful to specify exactly the same template argument that automatic deduction would have chosen for you - but then, why bother? And if you specify something different, now you're forcing the template into an unusual mode of operation.

In order to pass functions around, function types must be known at compile-time. If you have a templated function, then we have to explicitly instantiate the function template in order to pass it around. Contrived Example:

Here's a tip to avoid confusion: when citing the Standard/Working Paper, always mention what you're citing (e.g. C++03 or N3225) and both the numeric and alphabetic section IDs (e.g. 14.8.2.3 [temp.deduct.conv]). Knowing what document is being cited avoids the pitfall of looking at different Working Papers and being confused by wording changes between them. As for the section IDs, numeric IDs are easy to find through the bookmark tree, but are occasionally renumbered as sections are added, removed, or moved. The alphabetic IDs are provided because they're more stable (although very rarely they are modified, as happened to the Standard Library after C++03).

> and the explicit template arguments in it's example seems to suggest that it does

Those examples are depicting what happens to the function parameter types (which affect the overall function type).

Perhaps there's terminology confusion here. In "template <class T> void f(T * p);" the "T" is a template parameter. It'll be given a template argument, either explicitly or through template argument deduction. The "p" is a function parameter, and its type is "T *".

The same applies to "f(T t)". The template type parameter (on the left) and the function parameter type (on the right) are still distinct things, although both appear as "T" in the source code. Function parameter adjustment affects the latter, but not the former.

> (esp. #2, where f<const int> implies that T and decltype( t ) are both const int, but the signature of the explicit f<const int> is adjusted to void(*)(int) ).

That one's special - I've tried to avoid mentioning every possible scenario in the interests of reducing complexity. The thing about const value parameters is that they don't affect the callers of a function, but they do affect the function itself (where the const value parameter cannot be modified). Therefore, const value parameters are stripped out of function types, but still affect function definitions. Note the differences between this and what happens to array/function parameters. For THOSE, they get adjusted to pointers/function pointers in function types, AND this affects function definitions.

> In other words, T agrees with decltype( t ) in spite of adjustment

In that case, yes, because the adjustment has deliberately not been performed on the function definition (where const value parameters still matter).

> I don't have a copy of the C++98/03 standard handy. Is this still in the proposed C++0x standard?

Yes, same behavior. There are breaking changes between C++03 and C++0x, but not many (especially in the Core Language).

> If so, where can I look this up?

N3225 5.3.3 [expr.sizeof]/1: "The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id."

When 8.3.5 [dcl.fct]/5 says "After determining the type of each parameter, any parameter of type “array of T” or “function returning T” is adjusted to be “pointer to T” or “pointer to function returning T,” respectively." it's talking about function parameters.

Because the function parameter has been adjusted to be a pointer, sizeof(t) is 4. t behaves as a pointer in every other respect (e.g. when being passed to other templates).

The template parameter is unaffected - it is still an array type. sizeof(T) therefore returns how many bytes would be in such an array, which is 20.

> If you have a templated function, then we have to explicitly instantiate the function template in order to pass it around.

Ah, but there's a better way to do that (one that avoids the pitfall I mentioned earlier, where explicit template arguments make the compilation explode).

Instead of "f<decltype(s)>" you can pass "static_cast<void (*)(decltype(s))>(f)". (Yes, it's more typing, but it doesn't explode. I'll construct an example if you really want one.) Thanks to N3225 13.4 [over.over], when faced with overloaded and/or templated functions, you can use static_cast to disambiguate exactly which one you want. (This is one of the few good uses of casts).

Note that this will change the output, because as soon as you say static_cast<void (*)(decltype(s))> which is static_cast<void (*)(char[])>, the compiler adjusts that function pointer type to static_cast<void (*)(char *)>, so T is deduced to be char *.

Sorry about that. I didn't mean to confuse you. I should have written "explicit template __arguments__", vice "parameters". Since we were previously writing about the interaction between explicit template arguments for templated functions ( the A in "f<A>( a )" where f is "template<T>void f( T t )" ) and what the function parameters eventually get adjusted into ( where f<A> is adjusted to type "void ( adjusted(A) )" ), I figured that you would understand what I was writing about ... even though I slacked on the terminology. All apologies.


Thanks! I completely forgot about that in C++98/03!!!

I think that I'm starting to get a clearer picture of what is going wrong here ... and what is going right. In order to clarify things even more, I'm trying to test something out using decltype() ... but VC++ 2010 is not cooperating. If I have the following function template and regular function:

g++ 4.5.2 compiles the following decltype of the instantiated function template just fine, but I cannot get VC++ 2010 to compile the same code (I get the following error in VC++: error C3556: 'f': incorrect argument to 'decltype'):

decltype( f<char[20]> )

Note: I can put regular functions in decltype in VC++ 2010 with no issues. In other words, I __can__ compile the following in VC++ 2010:

decltype( g )

Is decltype fully baked in VC++ 2010? Or are there known limitations?

Joshua Burkholder

P.S. - I would love more info and examples of that static_cast< function pointer >( function template name ) thing that you were writing about.

Regarding all this interesting debate about templates and functions: I see The Visual C++ Weekly Vol. 1 Issue 9 (Feb 26, 2011) came with an interesting link titled Expressive C++: Fun With Function Composition (cpp-next.com/archive/2010/11/expressive-c-fun-with-function-composition/); they talk about function composition in C++, like the composition operator . ("dot") in Haskell.

The examples illustrate the use of template metaprogramming and recursion, the result_of protocol, the Boost equivalents (for pre-C++0x compilers), and touch on the type decay discussed by @Burkholder and @STL.

Burkholder> Well, I have to say that I'm really learning a lot about C++0x through this discussion.

Cool!

Burkholder> Hopefully, you're getting something out of this as well ... so that I'm not just irritating you.

Very few things irritate me - chief among them is when people are wasting my time. But when I'm explaining something and people are listening, I'm never wasting my time.

Burkholder> I should have written "explicit template __arguments__", vice "parameters".

Precise terminology is indeed important. It appears that this doesn't affect my response, though. Basically, you've got a function template, like "template <typename T> void f(T& r, T v)", with template parameters (like "T") and function parameters (like "T& r" and "T v"). First, this needs to be fed template arguments. It can get them implicitly (through template argument deduction) or explicitly (through explicit template arguments). Template argument deduction follows certain rules, while explicit template arguments are used as-is. After template arguments have been determined, they're plugged ("substituted") into the function template, in order to instantiate a real function. Suppose that f<int[3]>(blah, blah) has been called. In this case, T is int[3], end of line. When substituted into the signature, we get (int (&r)[3], int v[3]). The former is cool, but the latter is not, so it gets adjusted, and we end up with (int (&r)[3], int * v). Those are the function parameters that the function will use. This is what sizeof(r) and sizeof(v) see, and I claim (and VC agrees) that the same should be true for decltype.

Burkholder> Is decltype fully baked in VC++ 2010? Or are there known limitations?

There are bugs, but there are always bugs. So far they seem to be relatively rare and relatively minor (for example, decltype(expr1, expr2, etc, exprN) wasn't working properly, and Dinkumware wanted that badly, so we got the compiler fixed).

Burkholder> I would love more info and examples of that static_cast< function pointer >( function template name ) thing that you were writing about.

Coconut: That's maintained by another team, you'll have to ask them. Dinkumware and I maintain the One True STL in Visual C++, which other Microsoft toolsets (e.g. the Xbox Development Kit) are derived from.

Just finished watching this video on shared_ptr. As an experienced educator myself, I'd say well done. As an old programmer, though, I'd have wanted something a bit more challenging. ;-)

Anyway, I wonder if you would comment, in a bit more detail, on how this 'type forgetting' works, particularly with regard to inheritance trees. I have often had to deal with complex inheritance trees for modelling ecosystems (often with base classes representing families or genera of related organisms and derived classes representing species: so a genus class might represent canids, and from it would be derived classes representing wolves, dogs, foxes, coyotes, &c.). All of these classes would ultimately derive from an abstract base class (with only a small number of data members and perhaps a dozen pure virtual functions). The simulation engine would have a std::vector of boost::shared_ptr instances pointing to the base class. The most basic base class has an empty but virtual destructor, to ensure that the right destructor is always used when instances are destroyed. In a complete but simple model codebase, there could well be hundreds of derived classes (which will inevitably grow as more life forms get modelled), and a running model would have thousands of instances of these. When a new object is created, the value returned by operator new is cast to the base class. I rely on this, and on the fact that the destructor is virtual, to ensure that each is cleaned up properly.

If you ever saw my production code, you'd find that all pointers are immediately handed over to the most appropriate smart pointer the instant operator new returns them. Except way back when I first started using C++, you would never find a naked pointer in my code. This is the context, and why I try to encourage my junior colleagues to develop the habit of making a virtual destructor whenever they find themselves adding a virtual member function to whatever class they've been assigned to write.

My first question is how your method of 'type forgetting' would fit into the context I often face. And my second is like the first: why? What does this type forgetting provide that I don't already have with virtual destructors and a vector of boost::shared_ptr? (I am not eccentric enough to even consider using malloc/free in my C++ code, so your example left me wanting more.)

My last question relates to habits to be encouraged among junior programmers being mentored by old fossils like me. As I said, I encourage kids to add public virtual destructors whenever they add virtual member functions to a class. But I have been told, recently, by equally old programmers that it is better to encourage a habit of making destructors protected and non-virtual. What I have not been able to get from these guys is an explanation of what significant downside there may be to the habit I encourage, or what the upside is to the practice they recommend.

I am not omniscient, so I will acknowledge there's plenty I don't know; but at the same time, I don't do things just because I can, but rather because there is a demonstrable benefit. My objective is always stable, fast and correct production code.

Can you contribute to my education on this matter?

Thanks,
Ted

I don't know if the reason someone suggested this was that you might be managing deletion from elsewhere. Another reason (not sure about this, though) could be that the vtable will be bigger if you have a virtual dtor?

Creating and destroying the derived class will work in both cases (public virtual dtor or protected dtor) as usual:

CDerived* d = new CDerived();
delete d;

Calls the ctors/dtors in this order:

CBase()
CDerived()
~CDerived()
~CBase()

Hope that was of any help, and hope even more that it's all correct as I explained it.

Thanks Deraynger, you have it right, as far as you go.

It is true that having a virtual destructor increases the memory consumed by the vtable by the size of one pointer, but that is hardly significant on a machine with 8 GB of RAM and objects that can consume several kilobytes each.

The problem the recommendation of making destructors protected creates is that it becomes impossible to make a std::vector of boost::shared_ptr instances pointing to the base class, so that all objects held by the vector are properly deleted. A boost::shared_ptr to one derived type is a different type from a boost::shared_ptr to another. If you have only two derived types, this isn't much of a problem, as you can have two vectors, one for each derived type. But if you have thousands of derived types, it becomes a nightmare. Having a public virtual destructor guarantees that one vector containing instances of all derived types, through pointers to the base class, is sufficient.

And in the context of an event-driven application where the user can set up a simulation by adding, modifying or removing instances of the derived classes, there are numerous places where these instances can be either created or deleted. THAT is a second reason why it is so useful to have all instances of the derived classes managed by instances of shared_ptr containing pointers to the base class.

If I understood Stephan correctly, the latest incarnation(s) of shared_ptr provide a way to make the base class destructor protected, still store pointers to the derived classes in shared_ptr (in turn in a std::vector), and still have things properly deleted. But what I don't see is what benefit this extra magic provides.

Thanks,
Ted

Ok, I get it now, and great that I got my part right (though you seem to know it all already, and better than me).

Regarding a shared_ptr of a base class with protected dtors holding instances of derived classes, I have no idea; you'll have to ask STL (by e-mail, or a comment in the newest video thread: http://channel9.msdn.com/Shows/Going+Deep/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-3-of-n). All I can think of is that maybe it doesn't use the vtable (I'm not sure of the implementation of shared_ptr), and maybe the only benefit is that it won't leave a memory leak, as opposed to not being able to delete the derived class at all.

Ray
