
Re: std::string to std::vector<TCHAR> with terminator

Originally Posted by OReubens

Been wondering about this one...

I need to copy a std::string into a vector<TCHAR> (many times) and I'm looking at the most efficient way to do so.

Note that you should profile your code using a profiler and not assume that some method is slow or fast "by eyesight". C++ is a language where many things cannot be judged for speed by just looking at the code.

Re: std::string to std::vector<TCHAR> with terminator

Originally Posted by OReubens

is this the only thing I can do, or is there a better solution that does: reserve size+1, copy size bytes from string to vector, set terminator, without any excess work?

you can use a specialized allocator replacing default construction with a do-nothing operation and hope that the compiler is smart enough to avoid any loop while resizing the vector. If this is the case, you can use a resize() + copy() + back()='\0'.

BTW, you could also insert a null terminator in the original std::string, but I suppose you already thought about that

Re: std::string to std::vector<TCHAR> with terminator

Originally Posted by Paul McKenzie

Note that you should profile your code using a profiler and not assume that some method is slow or fast "by eyesight". C++ is a language where many things cannot be judged for speed by just looking at the code.

There doesn't appear to be a vector::reserve() with 2 parameters.
Did you intend the constructor with a size + filler char? Or vector::resize(size, filler)?

In any case, the constructor approach wouldn't be usable here, since the vector is a member of a class, and the place this is used is a setter member function.

The resize with filler has the overhead of first filling the entire vector buffer with 0, then overwriting all but the last TCHAR with the contents of the string. A fairly costly operation on large strings.

Unfortunately resize without a filler also does an explicit zero-fill.

reserve+copy doesn't work because reserve only changes the capacity, not the size of the vector, so the subsequent copy writes past the end of the valid range and errors (in debug).
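The capacity-versus-size point can be sketched as follows. This is only an illustration, assuming plain char instead of TCHAR; set_buffer is a hypothetical name for the setter body, not code from the thread. After reserve() the size is still 0, so copying to begin() is invalid, but appending with insert() and push_back() grows the size one element at a time without a separate fill pass and without a further reallocation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical setter body: append into storage that was reserved up front.
void set_buffer(std::vector<char>& dst, const std::string& src) {
    dst.clear();                                    // size() becomes 0
    dst.reserve(src.size() + 1);                    // allocates if needed, writes no elements
    dst.insert(dst.end(), src.begin(), src.end());  // constructs each element from the source
    dst.push_back('\0');                            // fits in the reserved capacity
}
```

Whether the insert() call ends up as a single memmove depends on the library implementation, so it would still need to be profiled.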

Originally Posted by VictorN

Also note that a conversion from char* to TCHAR* may be needed in a UNICODE build.

Not an issue in this case, the unicode build is set up to use a std::wstring so it remains a straightforward copy.

Originally Posted by superbonzo

you can use a specialized allocator replacing default construction with a do-nothing operation and hope that the compiler is smart enough to avoid any loop while resizing the vector. If this is the case, you can use a resize() + copy() + back()='\0'.

BTW, you could also insert a null terminator in the original std::string, but I suppose you already thought about that

that custom allocator idea seems very flaky :s

changing the passed std::string is a bad idea. I can't go around changing the caller's data. All of those strings are const for a reason. And it has the nasty side effect that adding the extra terminator might need a reallocation as well (which is what I'm trying to avoid).

still no acceptable vector-based solution.
I'm currently only seeing a way out in changing vector<TCHAR> to unique_ptr<TCHAR> and doing all buffer management myself. I'll need to file a 'breach of specs' form for this, which is going to be a hassle... sigh... let alone changing/testing all the downlevel code to deal with that change.

Re: std::string to std::vector<TCHAR> with terminator

>> Not an issue in this case, the unicode build is set up to use a std::wstring so it remains a straightforward copy.
I'm confused. So the source of the copy is always "std::string", and the ANSI build uses a destination type of "std::vector<TCHAR>", but the Unicode build uses a destination type of "std::wstring"?
Unicode: std::string --> std::wstring
ANSI: std::string --> std::vector<TCHAR>

Re: std::string to std::vector<TCHAR> with terminator

Originally Posted by OReubens

that custom allocator idea seems very flaky :s

anyway, it's reasonable to assume that the compiler will cancel out a do-nothing loop and such an allocator behavior is legal and not immoral from a semantics pov. Actually, the only problem I see now is that older compilers ( those not supporting the new c++11 allocator spec ) use the copy constructor during vector::resize() instead of the default ctor.

Anyway, here is a small test to see if vc2010 effectively optimizes such a scenario or not:

the above, although not technically a valid vector value type, emulates an allocator implementing a do-nothing default-construction( note that vc2010 does not support the new allocator specification ).
On my system, char measures 6-8 ms whilst char_ measures 0, showing that the optimization effectively takes place.
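A timing harness along these lines could be used to repeat the comparison. It is a hypothetical sketch, not the code from the post: time_resize_ms and its parameters are made-up names, and char_ is written in the C++11 form (do-nothing default constructor only) rather than the vc2010 variant with a do-nothing copy constructor.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Stand-in wrapper: the user-provided empty constructor leaves ch uninitialized.
struct char_ {
    char_() {}
    char ch;
};

// Times `repeats` resize() calls on fresh vectors of n elements, in milliseconds.
template <class T>
double time_resize_ms(std::size_t n, int repeats) {
    using clock = std::chrono::steady_clock;
    volatile std::size_t sink = 0;  // keeps the loop from being dropped entirely
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        std::vector<T> v;
        v.resize(n);                // zero-fills for char, may do nothing for char_
        sink = sink + v.size();
    }
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Comparing time_resize_ms<char>(n, repeats) against time_resize_ms<char_>(n, repeats) in an optimized build gives a rough idea; the numbers depend heavily on compiler, flags, and allocator behaviour.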

Out of curiosity, does this code actually have production builds for both ANSI and Unicode?
If so, for what purpose - legacy OS's, or to support legacy app's that consume both targets?
If not, does the opposite of the production target even compile?

Re: std::string to std::vector<TCHAR> with terminator

Originally Posted by Codeplug

>> Not an issue in this case, the unicode build is set up to use a std::wstring so it remains a straightforward copy.
I'm confused. So the source of the copy is always "std::string", and the ANSI build uses a destination type of "std::vector<TCHAR>", but the Unicode build uses a destination type of "std::wstring"?
Unicode: std::string --> std::wstring
ANSI: std::string --> std::vector<TCHAR>

Is that what you're dealing with?

gg

no, the code uses TCHAR, and uses string in ansi and wstring in unicode. Should have simplified the question with char and string rather than mention TCHAR, that's just what the code was like when I copied it.

Re: std::string to std::vector<TCHAR> with terminator

Originally Posted by superbonzo

anyway, it's reasonable to assume that the compiler will cancel out a do-nothing loop and such an allocator behavior is legal and not immoral from a semantics pov. Actually, the only problem I see now is that older compilers ( those not supporting the new c++11 allocator spec ) use the copy constructor during vector::resize() instead of the default ctor.

Compilers tend to be pretty good at optimizing even very complex looking code.

now, replace the vector value type with

Code:

struct char_ { char_(){} char_(char_ const& ){} char _; };

the above, although not technically a valid vector value type, emulates an allocator implementing a do-nothing default-construction( note that vc2010 does not support the new allocator specification ).
On my system, char measures 6-8 ms whilst char_ measures 0, showing that the optimization effectively takes place.

interesting approach...

can you elaborate on the "not technically a valid vector value type" ?

I needed to add an assignment operator to the char_ class to make it work, but

does seem to work.
The zero-fill is effectively removed.
the code behaves as expected for all tests.
The only (big) disadvantage here is that the compiler now fails to optimize the std::copy into a memmove() call, but instead opts for a byte-by-byte copy loop.

For short strings this would be an advantage, for very long strings, it's a noticeable slowdown.

it also has a rather strange syntax to feed into the legacy API (getting the result as an LPTSTR)
LPSTR lpsz = &vec[0].ch;
which isn't as nice as the vec.data() it was before.

it'll need some more tests, and this of course also changes the resulting type, so it's just as much a breach of spec as the unique_ptr<TCHAR> approach. Something tells me they'll be even less keen on this way out.

Re: std::string to std::vector<TCHAR> with terminator

can you elaborate on the "not technically a valid vector value type" ?

char_ no longer meets the CopyConstructible requirement: its copy constructor does not actually copy, so the vector data may become garbage just after a simple push_back that reallocates.

on the contrary, if you have a compliant c++11 compiler a do-nothing default constructor is sufficient because AFAIR the new container specification directly default-constructs elements into raw storage instead of copying a default constructed instance as before. In other words, a "struct char_{ char_(){}; char _; };" would do the trick. However, the resulting char_ would still inhibit the memmove optimization ( yes, you can write char_() = default; to avoid that, but this makes the zeroing kick in again ).

For this reason, I suggested writing an allocator instead ( this use case is specifically allowed by the newest standard ), this would solve the zeroing overhead, the memmove issue and the "strange syntax" issue. That said, I suppose only the latest clang and gcc actually support this.
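A minimal sketch of such an allocator, assuming a C++11 library that routes element construction through allocator_traits. The names default_init_allocator, char_buffer and set_buffer_noinit are hypothetical, and plain char stands in for TCHAR. The only unusual part is the zero-argument construct(), which default-initializes the element (a no-op for char) instead of value-initializing it.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <utility>
#include <vector>

template <class T>
struct default_init_allocator {
    using value_type = T;

    default_init_allocator() noexcept = default;
    template <class U>
    default_init_allocator(const default_init_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>().deallocate(p, n); }

    // Used by resize(n): default-initialization, so char elements are left untouched.
    template <class U>
    void construct(U* p) { ::new (static_cast<void*>(p)) U; }

    // Every other construction forwards its arguments as usual.
    template <class U, class... Args>
    void construct(U* p, Args&&... args) {
        ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
    }
};

template <class T, class U>
bool operator==(const default_init_allocator<T>&, const default_init_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const default_init_allocator<T>&, const default_init_allocator<U>&) noexcept { return false; }

using char_buffer = std::vector<char, default_init_allocator<char>>;

// resize() + copy() + back() = '\0', with every element written before it is read.
void set_buffer_noinit(char_buffer& dst, const std::string& src) {
    dst.resize(src.size() + 1);
    std::copy(src.begin(), src.end(), dst.begin());
    dst.back() = '\0';
}
```

The element type stays char, so dst.data() can still be handed to a legacy API directly. The vector type itself changes, though, because the allocator is part of it.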

anyway, did you consider codeplug's reserve+assign+c_str suggestion ? I think you could even spare the reserve call ( since the const char* pointers obtained from c_str() are random access iterators, the vector can compute the length up front and size its allocation automatically, but I'm not sure though ... )
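A sketch of the assign + c_str idea, assuming plain char instead of TCHAR; set_buffer_from_c_str is a hypothetical name. Since C++11, c_str() guarantees that the range [c_str(), c_str() + size()] is valid and ends with a null character, so the terminator can be copied together with the payload in a single assign() call.

```cpp
#include <string>
#include <vector>

// Copies the characters and the terminator that c_str() already provides.
void set_buffer_from_c_str(std::vector<char>& dst, const std::string& src) {
    const char* first = src.c_str();
    dst.assign(first, first + src.size() + 1);  // +1 picks up the trailing '\0'
}
```

With pointer iterators the length is known before any element is written, so common implementations allocate at most once and often copy with memmove, though that is worth confirming with a profiler on the target compiler.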