File size

File size

File size

File size

File size

126.6 MB

Welcome to another installment of C9 Lectures covering the powerful general C++ library, STL. Joining us once again is the great Stephan T. Lavavej, Microsoft's keeper of the STL cloth (this means he manages the partnership between the owners of STL (dinkumware) and Microsoft, including, of course, bug fixes and enhancements to the STL that ships as part of Visual C++). Simply, Stephan is a C++ library developer.

As is Stephan's nature, he elaborates on technical details in very substantive way. The Standard Template Library, or STL, is a C++ library of container classes, algorithms, and iterators. STL provides many fundamental algorithms and data structures. Furthermore, the STL is a general-purpose library: its components are heavily parameterized, such that almost every component in the STL is a template.

In part 8, Stephan guides us on a journey into the dense forest of Regular Expressions.

If I match the regular expression "a+" against a sequence of "a" characters, how does the regex determine how many "a" characters to choose? Is there something like a "maximum munch rule" for regular expressions?

My question is really about lecture #1 -- I posted this question at the page for lecture # 1 originally: Why does the DOS screen pop up during your Visual Studio builds? I use VS 2010 and my installation does not do this. Thanks again -

Great lecture series, thank you.My question is really about lecture #1 -- I posted this question at the page for lecture # 1 originally: Why does the DOS screen pop up during your Visual Studio builds? I use VS 2010 and my installation does not do this. Thanks again -Todd

[/quote]STL is putting a breakpoint on the last line and is running in debug mode in most videos. This causes his programs to break before closing.

[NotFredSafe]> If I match the regular expression "a+" against a sequence of "a" characters,> how does the regex determine how many "a" characters to choose? Is there> something like a "maximum munch rule" for regular expressions?

[Vaclav]> @NotFredSafe:If the regex is greedy (default), as many characters as> possible are matched. It is explained in the PPT document on page 16.

Slight correction: greediness is a property of individual quantifiers, not whole regexes. A regex can contain both greedy quantifiers and non-greedy quantifiers:

Greedy quantifiers want to consume as many characters as possible, while still allowing the regex to match the string (regex_match() versus regex_search() determines whether the whole regex must match the whole string, or just a substring).

Non-greedy quantifiers want to consume as few characters as possible, while still allowing the regex to match the string.

In the example above, ([0-9]+) is greedy, so it'll consume 123456 and two of the 1701s. It can't eat the third, though, because ((?:1701)+) comes next and wants to eat at least one 1701. (This might be confusing because two greedy quantifiers, ([0-9]+) and ((?:1701)+), appear next to each other. You can think of the regex as being matched left to right, so ([0-9]+) comes first and gets to decide what it's going to do. It'll satisfy its own preferences, while being aware of whether the rest of the regex will match the rest of the string.)

Because ([0-9]+?) is non-greedy, it wants to eat as few digits as possible. Ideally, it'd like to eat just the 6. However, that would leave 54321172917291729ghi for ((?:1729)+)ghi to match, which wouldn't work. Therefore, the ([0-9]+?) consumes 654321, and leaves all three 1729s for ((?:1729)+) to eat.

[Toddintr]> Why does the DOS screen pop up during your Visual Studio builds?

I'm building console programs, so when I compile-and-execute them, the Command Prompt appears. If I were building graphical programs, their GUIs would appear.

I usually run the compiler from the Command Prompt, as in the example above (if you haven't seen this, Start > All Programs > Microsoft Visual Studio 2010 > Visual Studio Tools > Visual Studio Command Prompt (2010) will launch a Command Prompt where you can run cl.exe, the compiler). But I use the Visual Studio IDE in the videos because most people are familiar with that.

[quote]Nov 17, 2010 at 1:26 PM, Charles wrote9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't![/quote]So it's friday, time is up and oh <sarcasm> surprise </sarcasm> Charles failedi saw that coming.Charles, predicable much ?yes Charles there is still time (time zones, sigh) but i couldnt resist being early with the blaming and hating <sarcasm> since i love microsoft employees so much </sarcasm> well we all have to have a hobby right ? Oh who am i kidding microsoft deserves it :3

[quote] Nov 17, 2010 at 1:26 PM, Charles wrote 9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't!

So it's friday, time is up and oh <sarcasm> surprise </sarcasm> Charles failedi saw that coming.Charles, predicable much ?yes Charles there is still time (time zones, sigh) but i couldnt resist being early with the blaming and hating <sarcasm> since i love microsoft employees so much </sarcasm> well we all have to have a hobby right ? Oh who am i kidding microsoft deserves it :3

@Charles:So you're not the sharpest one when it comes to planing then, shouldnt you be good at that in your job, anyway, oh well could happen to all of us, although shouldnt you have seen this holidays coming since it's every year at the same date ?

@NotFredSafe: Sure, can i have some biscuits too ? Note: I was just pointing out the obvious flaws in Charles excuse.What's next excuse ?"Sorry, santa claus came at Christmas so i couldn't ..." "Sorry, the dog ate my to do list" "Sorry, a butterfly died at the other side of the planet so i couldn't ..."Do i get more cheese now ? with biscuits ? and some tea too ? ooo yummy...

Hi Stephan,Another great lecture. I was working with boost::regex and didn't know tr1 has it already. I found the interface similar to boost::regex, so i wanted to know if tr1 regex is adopted from boost and i should expect similar interface?Thanks,Vikas.

Their interfaces are virtually identical. The biggest differences are in their regular expression grammars. std::regex uses ECMAScript 3's grammar, which was derived from Perl's. boost::regex supports those features, but has recently implemented even more advanced features from Perl (e.g. possessive quantifiers).

std::set<long,mycmp<long> >s; mycmp<long >* p=&s.key_comp(); It looks like this STL version creates a class copy, so the copied class members are hard to reach. p points to the original class instance. I am not able to find info on how to:a/Prevent STL copiing the class, so it works with the original. b/Locate description of some established method to comunicate with the copied class data members.Thanks Jan

@STL:Hello, thanks for quick response!and for excellent lessons on STL! Without the lessons, I'd awoid STL...To drill it down to essentials. I'm trying to do what I used to do with quicksort and its callback. I am aware this is -probably- not intended use of STL. I have a lot of strings I have to sort, or rather, I do not want to sort in a set. I'm putting the strings( counted in 1000+) in a vector, numberig them by arrival as en index in the vector, which I sort in a set. Sacrificing some ram, things used to run much faster that way. In the set<long>'s key_compare, I'm comparing strings. I even use set.find to exclude doublets. That means I need 2 ways of comparing strings, one for set.find and another one for set.insert.In addition, I wish I could hide a lot of data members. Recently, I'm forced to use static/global data(Hm...) So forget about easy concurrency...Otherwise, the code works.// compile with: /EHsc#include <stdafx.h>#include <ppl.h>#include <iostream>#include<set>#include<vector>#include<string>#include<algorithm>#include<functional>using std::binary_function;// the class data members are temporarily referenced as globals because I experience key_comp() // class data member access problems.// there is one code, so DBG tracks key_comp(), 2 data members sets// in addition to constructor, operator () adresses the copy as well std::vector<std::string> v;// instead of sorting strings, put them in an unsorted // vector and sort corresponding vector indexes in a set// Pros: On most architectures it will outperform string sort.// Cons: extra memory allocation of index values//template<class T> class cmp;// compiles OK without declarationstemplate<class T> class cmp{public:// forced to global residence#if 0 int st,mxi; std::string stl;#endif cmp(){}; //~cmp();// linker error bool operator () (const T& t1,const T& t2)const{ switch(st){ case 0:// sort vector arrival indexes in a set return v.at(t1)<v.at(t2); case 1:// find index of the string, mxi is max index value+1,not present in the set if (t1==mxi)return stl<v.at(t2); return v.at(t1)<stl; default: // the program is sick std::cout<<"program error: st,mxi,stl="<<st<<" "<<mxi<<" "<<stl<<std::endl; terminate(); ; }//sw }//op};Another piece of code:const std::string str[10]={"1","9","8","6","5","1","9","3","2","1"}; int maxind=0; std::cout<<std::endl<<"Input:"<<std::endl; for(int i=0;i<10;i++){ std::cout<<str[i]<<" "; // find str st=1;mxi=maxind;stl=str[i];if(s.find(maxind)!=s.end())continue;// found v.push_back(str[i]);// not found st=0; s.insert(maxind);maxind++; } std::cout<<std::endl<<"Output:"<<std::endl; for_each( s.begin(),s.end(),prn); std::cout<<std::endl; // How?? [](int n){std::cout<<std::endl<<v.at(n);} s.clear(); v.clear(); //____________________ std::set<long,cmpo<long> >s1; cmpo<long>* p1=&s1.key_comp();Thanks Jan

Are you saying that your ultimate goal is to sort a sequence of strings? With VC10, just use sort() on vector<string>. Move semantics will automatically avoid unnecessary dynamic memory allocation/deallocation during the sort.

Hello,Please, You have not answered my question: how do I access data members of my set's class key_compare() in STL ?I will not only sort, I will remove about 80% of redundant info, using set.find.Using a vector, I am out of memory on the destination computer, which will not be upgraded.Using a set, things will work.Thanks for taking Your time for the comparations! Sorting strings of a vector can make life easier later on in an application, so its good thing to optimize it. The comparations:Q9450, what an impressing speed! With that power, I would test every case building the containers from scratch for every case, thus re-reading the string file or the like.a/You do not add the time to build up the first vector, and the build up of the other containers may take longer than You think. It may be missleading. b/You sort it, and then feed it(sorted) to the pointer and set versions. The sort times can not be compared, its different data. Some sort versions do not like sorted data.c/ For the vector, You prealocate the final space, but not for the set.But anyway, the pointer and set versions can be somehow compared. I expect the set perfom worse, it has to rebuild the tree for every new item. But in my app, from the pointer vector version, I would have to compare 100% and delete about 80% of the items, which is already done in the set version. So I expect both methods take about the same time for my app.Thanks Jan

> With that power, I would test every case building the containers from scratch for every case, thus re-reading the string file or the like.

Oh, I presented my code in what appears to be a slightly misleading manner in an attempt to save space. The snippets beginning with "const long long start = counter();" are intended to entirely replace the bottom of the original program beginning with that line, forming three independent programs. I believed that this was clear but if it wasn't, I apologize.

> a/You do not add the time to build up the first vector

That was intentional. My understanding at the time was that the data was already in memory.

> b/You sort it, and then feed it(sorted) to the pointer and set versions. The sort times can not be compared, its different data. Some sort versions do not like sorted data.

This was not the case - see above.

> c/ For the vector, You prealocate the final space, but not for the set.

Again, I assumed that the data was already in memory.

(This is why *extremely clear* problem descriptions are vital.)

set<string> should outperform set<const string *, comparator> above due to reduced indirection (but I expect the difference to be slight).

However, there is a performance bug in VC10 RTM that you should be aware of. When inserting lots of duplicate data into a set (which is your scenario), VC10 RTM can perform unnecessary allocations and deallocations which significantly harm performance. (The code allocates a node, tries to insert it in the set, notices that it's a duplicate, and then deallocates the node.) When you insert *const lvalues*, the unnecessary work is avoided. So, T t; s.insert(t); triggers the slow codepath, but s.insert(static_cast<const T&>(t)); triggers the fast codepath.

Hello,I understand You are wondering why I use two containers. It will be compared to another software, which is designed that way. This is the only way to sort strings in that design. Its fast, but wery limited and very, very inconvenient to work with. I think it will be abandoned, regardless of STL performance. Later, I will change STL approach to more simple, one container design, for this gentleman who wants it this way. I already did in other simple projects for myself and others, in fact. I have not experienced any STL introduced troubles at all. But my STL experience and knowlege is still very limited.It is a strange "double container" design that caused me strange troubles.So thanks on performance hints, if that is so easy to do so why not do it.I do not expect top performance from STL in this early release, there are so many other countless reasons to use it. Dot.In a post marked now 4 days ago I posted a small test of "double container software" to show it works with globals. You see if I find an item already in the set, it will not insert it. If I could not design set.find in few minutes, I would not. I heard other peple discovered the node behavior by insert, but it cannot hart performace significantly I believe.Now, I can see while debugging, there is a copy of my key_comp class data somewhere.I understand I have to implement a copy constructor and get a pointer to and work with the copy. But I do not know why the copy is implemented, so I wonder if there is something more I have to do.Thanks Jan

Hi Stephan,Awesome video series. Looking forward to the next parts.One question: Qt's regex engine allows to determine whether a given string is a prefix of something that could match a regex, e.g. for the regext a+b+, it would recognize that "aaaaa" is a valid prefix. This is very useful for validating user input as it comes in. Is there a way to make the std regex engine do this?On a side note, if you use Start Without Debugging (Ctrl+F5) to start programs, VS will insert keep the console open, so you don't need a dummy line with a breakpoint.

Hello,class set with replaced functor comparator is a frenetic copier of key_comp object.It creates 3 copies to begin with, later it copies copy #2 to copy #3 at every acessof its functions. Of course its possible to callback from a class copy creator, but a function using global control variables is maybe better. Questions:Is there any rational explanation of the software's behavior?It is claimed functor object adds a "state". What does that mean in these circumstances?

If I understood correctly Stephan said DO NOT loop incrementally through smatch m[i] results. Would the following code work in context?:static int i;if(regex_match(s,m,r)) {while(i <= m.size()) return foo( m[++i], pf); }

really talking with this Guy on STL is twice as valuable as reading four books on stl . awesome and looking for a presentation like this on MFC. the presentation is not simply about programming but it include scientific computation which helps understand what is under the hood.

Remove this comment

Remove this thread

Comments Closed

Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation,
please create a new thread in our Forums, or
Contact Us and let us know.