How to concatenate, then widen string literals

This is a discussion on How to concatenate, then widen string literals within the C Programming forums, part of the General Programming Boards category.

I have a useful macro:
#ifdef _UNICODE
#define TEXT(x) L##x
#else
#define TEXT(x) x
#endif
I.e. it uses the L prefix to 'widen' string literals when _UNICODE is defined. This part alone works great.

Now, I want to do this:
TEXT("foo" "bar" "baz")
which ideally, #ifdef _UNICODE, would yield:
L"foobarbaz"
which is what I want out of all this - a concatenated, wide string.

The problem is that the C preprocessor expands the TEXT macro and does the token-pasting first (translation phase 4), yielding this:
L"foo" "bar" "baz"
and only afterwards (translation phase 6) does the C compiler concatenate the adjacent string literals. At this stage, one C compiler emits the error "concatenating mismatched wide strings"; other C compilers seem to do what I want: if any string is wide, they are all 'widened', then they are concatenated.
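
To make the difference concrete, here is a minimal sketch of the two spellings. The function names are made up for illustration; the mixed form is only guaranteed to compile on a compiler that follows the C99 (or later) concatenation rule, which is an assumption about your toolchain.

```c
#include <assert.h>
#include <wchar.h>

/* What TEXT("foo" "bar" "baz") expands to in a wide build: only the
   first piece carries the L prefix. Well-defined since C99, undefined
   behaviour in C90, so an older compiler may reject it. */
const wchar_t *mixed_literal(void)
{
    return L"foo" "bar" "baz";
}

/* Every piece widened individually: fine under C90 as well. */
const wchar_t *all_wide_literal(void)
{
    return L"foo" L"bar" L"baz";
}

/* On a C99-or-later compiler both spellings give the same wide string. */
void check_concatenation(void)
{
    assert(wcscmp(mixed_literal(), all_wide_literal()) == 0);
    assert(wcslen(mixed_literal()) == 9);
}
```

If the first function fails to compile, the compiler is treating the mixed case the C90 way.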

What I really want is for the string-literal concatenation pass to be done first, yielding:
TEXT("foobarbaz")
and only afterwards, the token-pasting operator will prepend L, yielding:
L"foobarbaz"
At the moment, I am forcing things to happen the way I want by doing:
#define TEXT2(a, b) TEXT(a) TEXT(b)
#define TEXT3(a, b, c) TEXT(a) TEXT(b) TEXT(c)
... and so on. This works because it forces the L to be prepended first (to each string separately), so all of the strings are already wide by the time the compiler sees them, and any compiler can legally concatenate them.
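
A small sketch of that workaround in use. _UNICODE is forced on here purely for the demo, and text_char and greeting are hypothetical names that are not part of the original macros.

```c
#include <assert.h>
#include <wchar.h>

#define _UNICODE /* assumption: force the wide build for this demo */

#ifdef _UNICODE
#define TEXT(x) L##x
typedef wchar_t text_char; /* hypothetical helper type */
#else
#define TEXT(x) x
typedef char text_char;
#endif

#define TEXT2(a, b)    TEXT(a) TEXT(b)
#define TEXT3(a, b, c) TEXT(a) TEXT(b) TEXT(c)

/* Expands to L"foo" L"bar" L"baz": every piece is already wide, so
   the compiler never sees a mixed wide/narrow pair. */
const text_char *greeting(void)
{
    return TEXT3("foo", "bar", "baz");
}

/* In the wide build the result is a single wide string. */
void check_greeting(void)
{
    assert(wcscmp(greeting(), L"foobarbaz") == 0);
}
```

The cost is the one already noted: one macro per argument count.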

However I will soon reach TEXT<INT_MAX>, and it is a kluge that hurts my eyeballs each time I scroll past it.

Apparently, under C89, the value of "foo" L"bar" is undefined, whereas later C flavours define it to be identical to L"foo" L"bar". I suppose this is why I only encounter the problem with one specific compiler. But I can't upgrade the compiler in question (because it's a closed source POS that I am forced to use and cannot change).

I.e. T_FMT is 'the format string that means the passed pointer points to a foo string', where foo is wide #ifdef UNICODE_SUPPORT, and narrow if not.

Hope you understand what I'm trying to achieve... I want to use one format string to mean 'here comes a string', no matter whether (given the current build settings/OS/phase of the moon) these strings are wide, narrow, or whatever. But this requires different values of T_FMT depending on build settings, OS, phase of the moon, etc. So I am stuck with trying to concatenate a biggish number of separate constant strings. (I think - am I?).
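
For illustration only, this is roughly how such a T_FMT could be spliced into a format by literal concatenation. The actual definition of T_FMT is not shown in the thread, so the "%ls"/"%s" values, app_char, APP_TEXT and format_user below are all assumptions, and the format itself is kept narrow to keep the sketch simple.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

#define UNICODE_SUPPORT /* assumption: wide build for this demo */

#ifdef UNICODE_SUPPORT
typedef wchar_t app_char;   /* hypothetical character type */
#define APP_TEXT(x) L##x    /* hypothetical widening macro */
#define T_FMT "%ls"         /* assumed value for the wide build */
#else
typedef char app_char;
#define APP_TEXT(x) x
#define T_FMT "%s"          /* assumed value for the narrow build */
#endif

/* The format is assembled from three adjacent narrow literals, so the
   call site looks the same in both builds. */
int format_user(char *out, size_t size, const app_char *name)
{
    assert(out != NULL && name != NULL);
    return snprintf(out, size, "user=" T_FMT " (ok)", name);
}
```

If the format string itself also had to go through a widening macro, every one of those pieces would need its own L prefix, which is exactly where the TEXT2/TEXT3 pile-up comes from.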

BTW, you would think that this stuff would be standardised - how to say "this is a string" across platforms. But Microsoft had to go and add their own proprietary non-standard extensions... <sigh>

>you would think that this stuff would be standardised
That's convenient, because it is. Now we won't have confused programmers wandering around.

>how to say "this is a string" across platforms
"this is a string" works just fine for me and everyone else using standard C. L"this is a string" is the wide version of that, and oddly enough there are variants of the standard library that work with wide strings.
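
As a quick sketch of those wide variants, using only standard <wchar.h> functions (the wrapper function names are made up for the example):

```c
#include <assert.h>
#include <wchar.h>

/* wcslen is the wide counterpart of strlen. */
size_t wide_length(const wchar_t *s)
{
    return wcslen(s);
}

/* swprintf is the wide counterpart of snprintf; count is measured in
   wchar_t elements, not bytes. */
int wide_exclaim(wchar_t *out, size_t count, const wchar_t *s)
{
    return swprintf(out, count, L"%ls!", s);
}

void check_wide_library(void)
{
    wchar_t out[8];

    assert(wide_length(L"this is a string") == 16);
    assert(wide_exclaim(out, 8, L"hi") == 3);
    assert(wcscmp(out, L"hi!") == 0);
}
```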

>But Microsoft had to go and add their own proprietary non-standard extensions... <sigh>
The nice thing about proprietary non-standard extensions is that you aren't forced to use them.

>(I think - am I?)
It really looks like you're trying too hard. Sit back and rethink the problem and I'm sure you'll see something a little more elegant.

Elegant solutions...

There're only three elegant solutions I can see at the moment:

1. Preprocess format strings in a wrapper for *printf*, converting %s to %ls or whatever as needed. But this makes every call to *printf* into a malloc, copy/munge, call *printf*, then free. Elegant because it completely hides the problem beneath an abstraction layer; but inefficient and therefore offensive to my C programmer sensibilities :-)

2. Upgrade Visual Crud++ to a newer version that groks newer ISO C dialects, and therefore interprets L"foo" "bar" as being identical to L"foo" L"bar" instead of printing silly error messages.

3. The other 'elegant' solution would be to switch to using GCC under Cygwin or Mingw32 to do the Win32 port of my program. That looks more appealing to me each day I fight^H^H^H^H^Huse Visual C++.
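
Option 1 above could look roughly like the sketch below. widen_format and t_snprintf are hypothetical names, and the rewrite is deliberately naive: it only handles a bare "%s" and passes "%%" through, so something like "%-10s" is left untouched.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of fmt with every bare "%s" turned into "%ls".
   The caller frees the result. Returns NULL if malloc fails. */
char *widen_format(const char *fmt)
{
    size_t len = strlen(fmt);
    /* Worst case: each 2-char "%s" grows by one char. */
    char *out = malloc(len + len / 2 + 1);
    char *dst = out;
    const char *src = fmt;

    if (out == NULL)
        return NULL;
    while (*src != '\0') {
        if (src[0] == '%' && src[1] == '%') {
            *dst++ = *src++;   /* copy "%%" unchanged */
            *dst++ = *src++;
        } else if (src[0] == '%' && src[1] == 's') {
            *dst++ = '%';
            *dst++ = 'l';
            *dst++ = 's';
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return out;
}

/* The malloc, munge, print, free sequence described in option 1. */
int t_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    char *wide_fmt = widen_format(fmt);
    int written;

    if (wide_fmt == NULL)
        return -1;
    va_start(args, fmt);
    written = vsnprintf(buf, size, wide_fmt, args);
    va_end(args);
    free(wide_fmt);
    return written;
}
```

Every call pays for an allocation and a copy, which is the inefficiency complained about above; caching the rewritten formats would be one way to soften that.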

<rant>

It would be much easier to convince the boss to shell out several thousand dollars on a new version of VC++ if I could be sure that the new version does contain these newer ISO C string concatenation semantics. But do you think I can find that information anywhere on Microsoft's site? Grrr.

The only stuff they have is all marketing-type crap ('increase developer productivity', blah blah), or technical information regarding web services, C#, managed code, and other 'fluff', which I couldn't care less about. Where is a Changelog for CL.EXE? Cut the crap about 'managed code' - where is information on the friggin' ANSI C compiler? It's the most important part since almost everything uses C eventually. In fact to me it's the only important part, since I couldn't give a rat's arse about 'Web Services', c sharp, c flat, e sharp minor, whatever.

>2. Upgrade Visual Crud++ to a newer version that groks newer ISO C dialects
The latest version of Visual Studio does not parse C99.

>and therefore interprets L"foo" "bar" as being identical to L"foo" L"bar" instead of printing silly error messages
And what makes you think that this is the wrong behavior? Can you quote me line and verse from the standard that says L"foo" "bar" must be parsed as L"foo" L"bar"? Don't claim that a compiler does it wrong unless you can prove it, and if you can I highly suggest doing so to avoid being flamed.

>This is all rather frustrating...
I can empathize with you, but you could hold back the scathing insults about your tools and spend some of that frustration energy doing more productive things. Reading paragraph after paragraph of Microsoft bashing gets old after a few years.

>And what makes you think that this is the wrong behavior? Can you quote me line and verse from the standard that says L"foo" "bar" must be parsed as L"foo" L"bar"? Don't claim that a compiler does it wrong unless you can prove it, and if you can I highly suggest doing so to avoid being flamed.

IMO printing an error message, instead of compiling L"foo" "bar" as if it had been written L"foo" L"bar", is 'wrong', in the sense that the former behaviour is more surprising and less useful to me than the latter. In this subjective sense of the word 'wrong', then as seen by me, this behaviour is wrong, yes.

I have never claimed that this behaviour is 'wrong' in the sense of not adhering to C standards to which the product claims to adhere. You've misinterpreted me as saying something that I in fact didn't say (deliberately perhaps?).

I do claim that this behaviour does not conform to C99. But VC++ does not claim to implement C99. VC++ does in this matter adhere to C90, and that's all its docs claim to do. So in summary:
- VC++ is not 'wrong' in this objective sense of the word, meaning adherence to specifications as advertised; and
- I never said that it was; and
- I am not saying so now.

According to some web pages I read (no links, sorry), ISO C90 leaves this behaviour undefined (so VC++ complies with C90 as advertised when it rejects my program) but C99 mandates the behaviour I prefer: if any of the strings being concatenated is wide, treat the result as a wide string. Google for 'wide string concatenation iso c' or something similar, and you'll doubtless find the pages that I read that cause me to say this. If you want ISO C spec chapter and verse, I encourage you to obtain the spec and find the appropriate section(s). I don't have time to do this for you, I'm sorry.

I apologise for offending you with my 'microsoft-bashing'. As I said, I was frustrated. That being so, I have to admit that I enjoy Microsoft bashing, and I often find it to be justified. I am not sorry that I engage in Microsoft-bashing; I *am* sorry I offended you with it. I will avoid it here in future for your benefit.

quzah: My problem here (ok, well, one of them) is the different behaviour of GCC 3.3.4 and VC++ 7.something (or ISO C99 and C90 compilers respectively) when attempting to concatenate a wide and a narrow string, thus: L"foo" "bar". Or written more clearly: "foo" L"bar" - only one 'L', but two separate strings in quotes.
Your program doesn't do this - I don't see any string concatenation in there. Your program just shows that your UNIX box is capable of writing a wide string to stdout, which is a cool feature, but it has nothing to do with string concatenation behaviour.

anonytmouse: Thanks for the link - I'll go download that and have a play with it - even though it doesn't solve this particular problem it'll be useful in the future, I bet.

>quzah: My problem here (ok, well, one of them) is the different behaviour of GCC 3.3.4 and VC++ 7.something (or ISO C99 and C90 compilers respectively) when attempting to concatenate a wide and a narrow string, thus: L"foo" "bar". Or written more clearly: "foo" L"bar" - only one 'L', but two separate strings in quotes.
>Your program doesn't do this - I don't see any string concatenation in there. Your program just shows that your UNIX box is capable of writing a wide string to stdout, which is a cool feature, but it has nothing to do with string concatenation behaviour.

I misread their quote. I read it as "Hello World", not "Hello" "World". But if it really matters: