Introduction

I've been doing some work lately with the MSHTML control, which takes most of its arguments as VARIANTs. I can see you shuddering already. What's a
self-respecting C++ programmer doing dirtying his hands with VB/scripting language datatypes? Well, it's the lesser of two evils. Either I can learn about
VARIANTs or I can write my own HTML parser and editor. Which do you think is easier? I knew you'd agree.

Truth be told, I think the VARIANT concept is actually pretty cool. Wrap your data up in a nice little package with a type descriptor or two, throw it
across a function call boundary and let the other side figure it out. If done right it can solve a lot of otherwise nasty problems. I just wish they were easier to work with!

So that's the apology out of the way. Let's look at what a VARIANT is.

Why a VARIANT?

In contrast to a strongly typed language like C++, Visual Basic and many scripting languages are weakly typed. In a strongly typed language
you must pass a function exactly the argument types it was written to accept. If the function expects a pointer to a string you can't pass it an
integer. Try to do so and you'll get a compile-time error.

Weakly typed languages allow you to pass arguments that don't match the types expected. So the question should arise - if you can pass the wrong argument type to a
function how does the language respond? Most weakly typed languages 'coerce' the value that was passed into the expected type. What does this mean? It means that
the language runtime will try and convert the data that was passed into the correct data type. For example, if you were to pass an integer to a function that
expected to see a string the most natural 'coercion' is to convert the integer into a string representation. Pass a date where a string is expected and the natural
'coercion' is to convert it to a string representation.

As C++ programmers we're already used to coercion on a small scale - we're used to the idea that the compiler can do promotions from short
to int and so on. Weakly typed languages just take it a step or two further.

So what has this to do with VARIANTs?

Imagine you're designing your own programming language. You know the kinds of datatypes you want to support. You know the kinds of intrinsic operators you want. You
can design your compiler to keep track of the datatype of everything in your program, so that when the programmer passes the wrong datatype to a function your compiler
knows it and can insert the necessary code to convert the data.

Now imagine you're required to not only support your language but another language (say C++). You have complete control over your own language but no control whatsoever
over the second language. Yet you want to be able to interoperate with that language. Since it's you who wants to interoperate with something you cannot change it's up
to you to adapt to the 'something you cannot change'. So you design your datatypes in such a way that they contain sufficient information over and above the data
they encapsulate to allow anyone else to decipher their contents.

Enter the VARIANT

A VARIANT is a not-so-exotic way of solving this problem. Simplified, a VARIANT looks like this.
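As a rough illustration, a stripped-down stand-in for that layout can be sketched in portable C++. This is not the real definition from oaidl.h: the `Mini*` names and the fixed-width types are assumptions made for illustration, although the member names and the numeric tag values mirror the Windows ones.

```cpp
#include <cstdint>

// Portable stand-in for the VARIANT layout; NOT the real oaidl.h definition.
// The Mini* names are hypothetical. The tag values match the Windows VT_* values.
enum MiniVarType : std::uint16_t {
    MINI_VT_EMPTY = 0,   // VT_EMPTY: nothing stored
    MINI_VT_I2    = 2,   // VT_I2: 16-bit signed integer
    MINI_VT_I4    = 3,   // VT_I4: 32-bit signed integer
    MINI_VT_R4    = 4,   // VT_R4: float
    MINI_VT_BSTR  = 8,   // VT_BSTR: string
    MINI_VT_BOOL  = 11,  // VT_BOOL
    MINI_VT_UI1   = 17   // VT_UI1: unsigned byte
};

struct MiniVariant {
    std::uint16_t vt;          // type tag: says which union member is valid
    std::uint16_t wReserved1;  // reserved, unused in this sketch
    std::uint16_t wReserved2;
    std::uint16_t wReserved3;
    union {                    // the value itself
        std::int32_t   lVal;     // valid when vt == MINI_VT_I4
        std::uint8_t   bVal;     // valid when vt == MINI_VT_UI1
        std::int16_t   iVal;     // valid when vt == MINI_VT_I2
        float          fltVal;   // valid when vt == MINI_VT_R4
        const wchar_t* bstrVal;  // valid when vt == MINI_VT_BSTR
    };
};
```

Only the member that matches the tag may be read; the other union members share the same storage.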

This is a very simplified version of the full VARIANT definition to be found in your nearest copy of oaidl.h. I have no idea what
the wReserved values mean, nor do I care.

What we're interested in are the vt value and the union. vt is the value type and the union is the value. You'll see that the union
encompasses LONG, BYTE, SHORT, FLOAT and so on (there are a bucketload of 'em). vt tells us how to
interpret the value, that is, which member name to use. In C++, you might do it like this.
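A minimal sketch of that kind of dispatch, assuming a cut-down stand-in for the structure (the `MiniVariant` type and the `Describe` function are hypothetical names, not part of any Windows header):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-ins mirroring VT_I4 (3) and VT_BSTR (8); not the Windows types.
enum : std::uint16_t { MINI_VT_I4 = 3, MINI_VT_BSTR = 8 };

struct MiniVariant {
    std::uint16_t vt;
    union {
        std::int32_t   lVal;
        const wchar_t* bstrVal;
    };
};

// Look at the type tag first, then read only the matching union member.
std::string Describe(const MiniVariant& v)
{
    switch (v.vt) {
    case MINI_VT_I4: {
        char buf[32];
        std::snprintf(buf, sizeof buf, "%ld", static_cast<long>(v.lVal));
        return std::string(buf);
    }
    case MINI_VT_BSTR: {
        std::string out;
        for (const wchar_t* p = v.bstrVal; p != nullptr && *p != L'\0'; ++p)
            out += static_cast<char>(*p);  // ASCII-only narrowing, for illustration
        return out;
    }
    default:
        return "(unsupported type)";
    }
}
```

Reading a union member that does not match the tag is the classic way to get garbage out of a VARIANT, which is why every access starts with the check on vt.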

This checks the vt member of the VARIANT. If it's a VT_I4 then the data we want is contained in the lVal member
of the union. Since the lVal member is a LONG, the strictly correct format spec in the printf call is %ld, although %d also works on Win32 where int and long are both 32 bits. If it's a
VT_BSTR then the data is a BSTR contained in the bstrVal member of the union.

Notice how VARIANTs use the BSTR datatype to pass string data. A BSTR is a length-prefixed string allocated by the COM allocator, so the standard
Automation marshaler knows exactly how much data to copy when a VARIANT crosses a process boundary and no custom marshaling code is needed. Some other datatypes (not discussed in this article) need more work to cross a
process boundary, but the passing of strings is so common that having a string type the marshaler understands out of the box is a nice convenience.

Encapsulating a VARIANT in a simple class

Based on the code snippet we saw earlier it might make sense to hide the dirty details of a VARIANT in a class. We might do it thusly
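A minimal sketch of such a wrapper, assuming a portable stand-in for the structure. `MiniVariant`, `SimpleVariant`, `ToInt` and `ToString` are hypothetical names, and the string is held with `new[]` where real code would use `SysAllocString` and `SysFreeString`:

```cpp
#include <cstddef>
#include <cstdint>
#include <cwchar>
#include <string>

// Hypothetical stand-ins; the tag values mirror VT_EMPTY, VT_I4 and VT_BSTR.
enum : std::uint16_t { MINI_VT_EMPTY = 0, MINI_VT_I4 = 3, MINI_VT_BSTR = 8 };

struct MiniVariant {
    std::uint16_t vt;
    union {
        std::int32_t lVal;
        wchar_t*     bstrVal;
    };
};

// Derives from the plain struct and adds no virtual functions, so a
// SimpleVariant* can be handed to code that expects a MiniVariant*.
class SimpleVariant : public MiniVariant {
public:
    explicit SimpleVariant(std::int32_t n) { vt = MINI_VT_I4; lVal = n; }

    explicit SimpleVariant(const wchar_t* s)
    {
        if (s == nullptr) s = L"";
        const std::size_t len = std::wcslen(s);
        vt = MINI_VT_BSTR;
        bstrVal = new wchar_t[len + 1];      // real code: SysAllocString
        std::wmemcpy(bstrVal, s, len + 1);   // copies the terminator too
    }

    ~SimpleVariant() { if (vt == MINI_VT_BSTR) delete[] bstrVal; }

    // No copying in this sketch, to keep ownership of the string simple.
    SimpleVariant(const SimpleVariant&) = delete;
    SimpleVariant& operator=(const SimpleVariant&) = delete;

    // Accessors return a neutral value when the type does not match.
    std::int32_t ToInt() const { return vt == MINI_VT_I4 ? lVal : 0; }
    std::wstring ToString() const
    {
        return vt == MINI_VT_BSTR ? std::wstring(bstrVal) : std::wstring();
    }
};
```

Calling code then reads `v.ToInt()` or `v.ToString()` without ever touching vt or the union directly.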

That simplifies the code a little by hiding the dirty details of figuring out the VARIANT type or converting its contents inside a method call on the
object, but it's hardly enough to warrant a new class, let alone an article about it.

Encapsulating a VARIANT in a more complex class

The simple class I showed above is probably adequate for most casual VARIANT usage. It's certainly adequate for using the MSHTML control I alluded to in
the introduction. It may not be sufficient for other environments. For example, some years ago I wrote a whole bunch of software using the Microsoft Chat Protocol
control, which seems to have been designed by a committee whose members only knew VB. Almost all data passed between the host and the control is passed as
VARIANTs and some of those VARIANTs are arrays. A VARIANT represents an array using the SAFEARRAY structure.

The SAFEARRAY definition looks like this (this is the Win32 definition - it's a trifle different for WinCE).
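For orientation, the shape of the Win32 structure can be sketched with portable types. The member names follow the Win32 ones; the `Mini*` type names and the fixed-width integer types are assumptions made so the sketch compiles anywhere:

```cpp
#include <cstdint>

// Portable stand-ins mirroring the Win32 member names; not the real oaidl.h types.
struct MiniSafeArrayBound {
    std::uint32_t cElements;  // number of elements in this dimension
    std::int32_t  lLbound;    // lowest valid index in this dimension
};

struct MiniSafeArray {
    std::uint16_t cDims;       // number of dimensions
    std::uint16_t fFeatures;   // flags describing allocation and element type
    std::uint32_t cbElements;  // size in bytes of one element
    std::uint32_t cLocks;      // lock count
    void*         pvData;      // the element storage
    MiniSafeArrayBound rgsabound[1];  // one entry per dimension; a 1-D array needs one
};
```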

You're going to love the purpose of the SAFEARRAYBOUND member. It's a structure that specifies the number of elements in this dimension and the
lower bound. This allows an index into a particular dimension of the SAFEARRAY to start at any arbitrary number rather than the 0 that we C/C++
programmers know and love. There's an array of these structures, one for each dimension (cDims of them in total).

So accessing a VARIANT array in C++ involves interpreting the contents of the VARIANT as a pointer to a SAFEARRAY, checking
that the number of indices matches cDims, validating each index against its entry in the rgsabound array (lower bound and element count), then
stepping into pvData in units of cbElements. Phew, what a mouthful!
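For the one-dimensional case the arithmetic can be sketched as below. The `Mini*` types are portable stand-ins and `ElementAddress` is a hypothetical helper, not a Win32 function (the real API offers SafeArrayGetElement and friends for this job):

```cpp
#include <cstdint>

// Portable stand-ins mirroring the Win32 member names; not the real types.
struct MiniSafeArrayBound {
    std::uint32_t cElements;
    std::int32_t  lLbound;
};

struct MiniSafeArray {
    std::uint16_t cDims;
    std::uint16_t fFeatures;
    std::uint32_t cbElements;
    std::uint32_t cLocks;
    void*         pvData;
    MiniSafeArrayBound rgsabound[1];
};

// Return the address of the element with logical index `index` in a
// one-dimensional array, or nullptr if the index is out of range.
void* ElementAddress(const MiniSafeArray& sa, std::int32_t index)
{
    if (sa.cDims != 1 || sa.pvData == nullptr)
        return nullptr;

    const MiniSafeArrayBound& bound = sa.rgsabound[0];

    // Translate the caller's index into a zero-based position.
    const std::int64_t offset = static_cast<std::int64_t>(index) - bound.lLbound;
    if (offset < 0 || offset >= static_cast<std::int64_t>(bound.cElements))
        return nullptr;

    // Step through the storage in units of one element.
    return static_cast<char*>(sa.pvData) + offset * sa.cbElements;
}
```

With a lower bound of 1, index 1 lands on the first element in pvData and index 0 is rejected.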

Suddenly it's starting to look like maybe a class to encapsulate this stuff might be useful.

The class itself

Caveats

The class presented here does not cover all possibilities; not by a long chalk. What it does cover are the situations I've encountered using the Microsoft Chat Protocol
control and the MSHTML control. I suspect the code within Visual Basic that handles all the possibilities of the VARIANT type is orders of magnitude
more complex than the class presented here.

This class can handle simple VARIANTs with signed integer datatypes or strings. It can also handle one-dimensional arrays where each element of the array
is a VARIANT which can be any of the simple types handled by the class. If you want more you can follow the code to see how to handle extra types. I've
not needed types beyond those supported so I haven't written support for those types.

You've already seen the simple constructors. There are two other constructors. The first constructor lets you define an array. It takes the lower bound for an index,
and a count of how many elements. The code looks like this.
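A rough sketch of what such a constructor has to set up, using portable stand-ins. The `VariantArray` class and the `Mini*` names are hypothetical; the flag values are the same as the Win32 FADF_STATIC, FADF_FIXEDSIZE and FADF_VARIANT constants, and plain `new[]` stands in for whatever allocator the real class uses:

```cpp
#include <cstdint>

// Same numeric values as the Win32 FADF_* flags of the same names.
enum : std::uint16_t {
    MINI_FADF_STATIC    = 0x0002,
    MINI_FADF_FIXEDSIZE = 0x0010,
    MINI_FADF_VARIANT   = 0x0800
};

struct MiniVariant {
    std::uint16_t vt;
    union { std::int32_t lVal; const wchar_t* bstrVal; };
};

struct MiniSafeArrayBound { std::uint32_t cElements; std::int32_t lLbound; };

struct MiniSafeArray {
    std::uint16_t cDims;
    std::uint16_t fFeatures;
    std::uint32_t cbElements;
    std::uint32_t cLocks;
    void*         pvData;
    MiniSafeArrayBound rgsabound[1];
};

// Owns a one-dimensional array whose elements are themselves variants.
class VariantArray {
public:
    VariantArray(std::int32_t lowerBound, std::uint32_t count)
    {
        m_sa.cDims      = 1;
        m_sa.fFeatures  = static_cast<std::uint16_t>(
            MINI_FADF_VARIANT | MINI_FADF_FIXEDSIZE | MINI_FADF_STATIC);
        m_sa.cbElements = sizeof(MiniVariant);
        m_sa.cLocks     = 0;
        m_sa.pvData     = new MiniVariant[count]();  // zeroed, so every vt is 0 (empty)
        m_sa.rgsabound[0].cElements = count;
        m_sa.rgsabound[0].lLbound   = lowerBound;
    }

    ~VariantArray() { delete[] static_cast<MiniVariant*>(m_sa.pvData); }

    VariantArray(const VariantArray&) = delete;
    VariantArray& operator=(const VariantArray&) = delete;

    const MiniSafeArray& Array() const { return m_sa; }

private:
    MiniSafeArray m_sa;
};
```

For example, `VariantArray arr(1, 5);` describes five empty elements whose logical indexes run from 1 to 5.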

From my description of the SAFEARRAY structure earlier this should all be pretty clear. We only support one-dimensional arrays so we set the various
members of the newly created SAFEARRAY instance to reflect that fact. The new SAFEARRAY's rgsabound[0] structure is set with
our lower bound and count variables. It's important to remember that the VARIANT we're creating may be used to interoperate with a module created in
another language and we can't assume that indexes start at 0. Where you start your indexes depends on what you're interoperating with.

The fFeatures member needs some explanation. The flag values I used specify that the array contains VARIANTs, that it is of a fixed size, and that it is static (not
created on the stack). In Win32 terms those are the FADF_VARIANT, FADF_FIXEDSIZE and FADF_STATIC flags. I specify that it's static because if I need to allocate memory I do it from the heap.

The other constructor lets you take an existing VARIANT (passed perhaps to an event handler for some foreign object you're hosting) and attach it to
a CVariant. The code looks like this.
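A minimal sketch of the idea with portable stand-ins. `AttachedVariant` and `Target` are hypothetical names; the tag value 12 matches VT_VARIANT, and the only validation shown is a debug-build null check:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the values mirror VT_I4 (3) and VT_VARIANT (12).
enum : std::uint16_t { MINI_VT_I4 = 3, MINI_VT_VARIANT = 12 };

struct MiniVariant {
    std::uint16_t vt;
    union {
        std::int32_t lVal;
        MiniVariant* pvarVal;  // points at another variant
    };
};

// Wraps a variant that somebody else owns. The wrapper never frees it and
// must not outlive it.
class AttachedVariant : public MiniVariant {
public:
    explicit AttachedVariant(MiniVariant* pV)
    {
        assert(pV != nullptr);  // debug-build sanity check only
        vt = MINI_VT_VARIANT;
        pvarVal = pV;           // borrowed pointer, not a copy
    }

    // The variant that operations should really act on.
    const MiniVariant* Target() const
    {
        if (vt == MINI_VT_VARIANT)
            return pvarVal;
        return this;
    }
};
```

Because the pointer is borrowed, the wrapper is only safe for the duration of the call that handed over the original variant.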

If it's a debug build we do some asserts to be sure that it's a pointer to a block of valid memory at least large enough to actually contain a VARIANT.
There's not much more runtime validation we can do. Once we're sure it's something that could be a VARIANT we assign the pointer to the
pvarVal member and set the type to VT_VARIANT. Once that's done we can use any of the other member functions on the VARIANT as
though we'd created it ourselves.

Warning warning warning

Now listen up. Never ever use the CVariant::CVariant(VARIANT *pV) constructor to attempt to preserve a VARIANT across a function boundary.
The only reason you'd use this constructor is to put the class wrapper around a VARIANT you got from somewhere else. I don't want to say the only way you'd
get such a VARIANT is from an event but I'd put it at being asymptotically close to 100% of the time. This is why there's no Attach function.
The Attach idiom is a temptation to try and preserve something across function boundaries. It works for objects that are going to be around for a long time,
such as window handles, but it doesn't work for things like VARIANTs that are created on the fly to communicate with some other module (such as yours).

Note well that there is no attempt at a copy constructor. Life is way too short to try and write such a beast. Think about it. Your code would have to cope with
every possible variation and do deep copies of arrays within arrays within arrays.

VARIANT attributes

Once we've created our CVariant, by whatever method, we use it. You wouldn't use a VARIANT to communicate from one function of your program
to another function in the same program. You probably wouldn't want to use it across a DLL boundary either. There's too much overhead to make a VARIANT an
attractive proposition. So it's almost a given that you're communicating with something you didn't write yourself. Thus, there are a few functions you can call to
check the datatype of something that's been passed to you from the something you didn't write.
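Type queries of this kind can be sketched as a handful of small predicates. The function names below are hypothetical and written as free functions over a portable stand-in structure; the tag values match VT_I2, VT_I4, VT_BSTR, VT_BOOL and the VT_ARRAY flag:

```cpp
#include <cstdint>

// Hypothetical stand-ins with the same numeric values as the Windows VT_* tags.
enum : std::uint16_t {
    MINI_VT_I2    = 2,
    MINI_VT_I4    = 3,
    MINI_VT_BSTR  = 8,
    MINI_VT_BOOL  = 11,
    MINI_VT_ARRAY = 0x2000  // flag OR-ed onto the element type
};

struct MiniVariant {
    std::uint16_t vt;
    union {
        std::int16_t   iVal;
        std::int32_t   lVal;
        const wchar_t* bstrVal;
        void*          parray;
    };
};

// Each predicate answers one yes/no question about the stored type.
bool IsAnInteger(const MiniVariant& v) { return v.vt == MINI_VT_I2 || v.vt == MINI_VT_I4; }
bool IsAString(const MiniVariant& v)   { return v.vt == MINI_VT_BSTR; }
bool IsABool(const MiniVariant& v)     { return v.vt == MINI_VT_BOOL; }
bool IsAnArray(const MiniVariant& v)   { return (v.vt & MINI_VT_ARRAY) != 0; }
```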

These IsAsomething() functions mirror the datatypes the class supports. If you're not sure about the type of a particular VARIANT use
these functions to determine if some operation you're about to perform has any chance of succeeding.

Why don't I encourage access to the vt member via an explicit member function? Glad you asked. Access to that member would return the exact type.
Why is that bad? It's bad because you then have to allow for all the myriad options. It could be VT_USERDEFINED or VT_BLOB_OBJECT or
VT_DISPATCH. Since the class doesn't handle those types you can do nothing useful with the information. Much better, in my opinion, to ask the class,
are you a string? Or are you an integer? If the answer is yes then you can proceed to perform meaningful operations. If not, you do whatever error handling is
appropriate.

Of course there's nothing stopping you accessing the vt member explicitly but if you do you're on your own.

VARIANT access

Once you've determined the data type you call the appropriate accessor. The accessors are used for both simple VARIANT access and for array access and take
a parameter which defaults to zero. The accessors figure out for themselves whether you've got an array or not and do the right thing depending on the exact contents
of the VARIANT.

What's the index base for arrays?

Since these are C++ wrappers for VARIANT and SAFEARRAY operations they treat arrays as being OPTION BASE 0. The underlying array need
not be (it could have come from VB, for example, with OPTION BASE 1 set), but internally the functions correct for the OPTION BASE.

The accessors use the ElementAt() helper function to access the data requested and then apply the appropriate data conversion based on the datatype.
The ElementAt() function looks like this.
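The three cases such a helper has to distinguish can be sketched as follows. This is written as a free function over portable stand-ins, and it assumes the caller passes a zero-based index, so the logical index in the array's own numbering would be index plus lLbound while the position in pvData is simply index:

```cpp
#include <cstdint>

// Hypothetical stand-ins; values mirror VT_I4, VT_VARIANT and the VT_ARRAY flag.
enum : std::uint16_t { MINI_VT_I4 = 3, MINI_VT_VARIANT = 12, MINI_VT_ARRAY = 0x2000 };

struct MiniSafeArrayBound { std::uint32_t cElements; std::int32_t lLbound; };

struct MiniSafeArray {
    std::uint16_t cDims;
    std::uint16_t fFeatures;
    std::uint32_t cbElements;
    std::uint32_t cLocks;
    void*         pvData;
    MiniSafeArrayBound rgsabound[1];
};

struct MiniVariant {
    std::uint16_t vt;
    union {
        std::int32_t   lVal;
        MiniVariant*   pvarVal;
        MiniSafeArray* parray;
    };
};

// Find the variant that an accessor should read.
MiniVariant* ElementAt(MiniVariant* self, std::uint32_t index = 0)
{
    if (self->vt == MINI_VT_VARIANT)      // wrapping someone else's variant
        return self->pvarVal;

    if ((self->vt & MINI_VT_ARRAY) == 0)  // simple value: the index is ignored
        return self;

    MiniSafeArray* sa = self->parray;     // array of variants
    if (sa == nullptr || sa->cDims != 1 || index >= sa->rgsabound[0].cElements)
        return nullptr;                   // invalid index

    return static_cast<MiniVariant*>(sa->pvData) + index;
}
```

The accessors can then call this once and apply their own type check to whatever comes back, treating a null result as "no such element".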

You can see what I was talking about earlier. If the VARIANT is wrapping a VARIANT obtained from somewhere else we return that
VARIANT. If the VARIANT isn't an array we return a pointer to ourselves (remember the class is derived from the VARIANT
structure and has no vtable so this is equivalent to a pointer to the base VARIANT structure).

Otherwise we have an array so we calculate an offset into the SAFEARRAY taking into account the lower bound stored in the rgsabound structure.
Then we check that the offset is greater than or equal to 0 and less than the number of elements in the array and if it is we return a pointer to the
SAFEARRAY element. If you've specified an index that's invalid you get back a NULL pointer.

Notice that the bool overloads use the lowercase bool datatype, not the typedef'd BOOL. This is necessary to
distinguish between the int and bool overloads. We need the different overloads so we can in fact create a VARIANT with the
VT_BOOL type.
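The overload problem is easy to demonstrate in isolation. In this sketch `Kind`, `Classify` and `ToVariantBool` are hypothetical names; the only facts borrowed from Windows are that BOOL is a typedef for int and that a VT_BOOL holds -1 for true (VARIANT_TRUE) and 0 for false:

```cpp
#include <cstdint>

typedef int BOOL;  // this is how the Windows headers define BOOL

enum Kind { KIND_I4, KIND_BOOL };

// Two distinct overloads: int and the built-in bool are different types.
Kind Classify(int)  { return KIND_I4; }
Kind Classify(bool) { return KIND_BOOL; }

// A third overload taking BOOL would not compile: BOOL is only another
// name for int, so it would redefine Classify(int).

// VT_BOOL stores -1 for true and 0 for false.
std::int16_t ToVariantBool(bool b)
{
    return static_cast<std::int16_t>(b ? -1 : 0);
}
```

Note that a value declared as BOOL still selects the int overload, so callers who want a VT_BOOL have to pass a genuine bool.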

Type coercion

The class presented above doesn't do any 'type coercion' (nor does the class implementation in the download). This is by design. While I accept the idea of 'coercion' I
don't think it's appropriate in the C++ environment. I'd much rather know at runtime that an error occurred than have it masked by 'helpful' class design. The
class does play safe by returning a zero if the VARIANT isn't in fact numeric data of the expected kind or an empty string if it's not a string
VARIANT but it doesn't do type coercion. However, if you wanted to implement type coercion you might do it like this.
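A minimal sketch of that kind of coercion, assuming a portable stand-in for the structure. `ToStringCoerced` is a hypothetical name, and `std::to_wstring` stands in for whatever conversion routine the real code would use:

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins; values mirror VT_I2, VT_I4 and VT_BSTR.
enum : std::uint16_t { MINI_VT_I2 = 2, MINI_VT_I4 = 3, MINI_VT_BSTR = 8 };

struct MiniVariant {
    std::uint16_t vt;
    union {
        std::int16_t   iVal;
        std::int32_t   lVal;
        const wchar_t* bstrVal;
    };
};

// Return the value as a string, converting numeric types on the way.
std::wstring ToStringCoerced(const MiniVariant& v)
{
    switch (v.vt) {
    case MINI_VT_BSTR:  // already a string: just copy it
        return v.bstrVal != nullptr ? std::wstring(v.bstrVal) : std::wstring();
    case MINI_VT_I2:    // numeric: build a string representation
        return std::to_wstring(v.iVal);
    case MINI_VT_I4:
        return std::to_wstring(v.lVal);
    default:            // anything else: give up quietly
        return std::wstring();
    }
}
```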

This little snippet returns a converted string if the VARIANT does indeed contain a string. Otherwise it attempts to convert numeric data into a string
representation and returns that. Finally, if the VARIANT type isn't any covered by the switch statement it returns an empty string.

About the Author

I've been programming for 35 years - started in machine language on the National Semiconductor SC/MP chip, moved via the 8080 to the Z80 - graduated through HP Rocky Mountain Basic and HPL - then to C and C++ and now C#.

I used (30 or so years ago when I worked for Hewlett Packard) to repair HP Oscilloscopes and Spectrum Analysers - for a while there I was the one repairing DC to daylight SpecAns in the Asia Pacific area.

Afterward I was the fourth team member added to the Australia Post EPOS project at Unisys Australia. We grew to become an A$400 million project. I wrote a few device drivers for the project under Microsoft OS/2 v 1.3 - did hardware qualification and was part of the rollout team dealing directly with the customer.

Born and bred in Melbourne Australia, now living in Scottsdale Arizona USA, became a US Citizen on September 29th, 2006.

I work for a medical insurance broker, learning how to create ASP.NET websites in VB.Net and C#. It's all good.

Actually, a far better variant type is discussed in CUJ in its 2000 articles. It is templatized, and it does not use type fields the way this class does with the embedded union. Moreover, you can also try the Boost library's any_cast<> function to achieve near-variant functionality.

Maybe I am being picky here, but I see you are using the A2W and W2A string conversion macros in your code. AFAIK your code will crash in UNICODE builds when you use those macros. Would you not be better off using T2OLE and OLE2T instead?

Well I'll be... You're right about using the generic conversion macros - no need for USES_CONVERSION. I wasn't aware of problems with UNICODE builds, which probably indicates how often I do UNICODE builds.

Rob Manderson

Colin Davies wrote: I'm sure Americans could use more of it, and thus reduce the world supply faster. This of course would be good, because the faster we run out globally, the less chance of pollution there will be. (Talking about the price of petrol) The Soapbox, March 5 2004

Hi there! Why do you bother with variants? Just use the concept of PIDLs, which is very common in Windows. A PIDL can contain ANY type, including complex objects.
It is a pointer to a block containing an id (like variants), probably the size of the type, and the object itself, which can be cast.

Ever heard of COleVariant and COleSafeArray ?
Why write your own ? and if you must why not inherit ?
No offense intended but it seems to me like you wasted your time.
Anyway if you ever used COM interface you know why you need to use VARIANT (just try and pass a complex argument between VB and VC without writing your own Type Library).
Cheers,
Asa

1. The support for safe arrays is very useful, but you should include conversion of variables that are passed with the VT_BYREF flag too. I've noticed that VB often uses this flag when passing variants.

2. There are a number of conversion functions in VC++ in the form VarTYPE1FromTYPE2 in OleAuto.h. These can be used to convert numbers to a required type, eg:

Even though you made a good case about the use of VARIANT in certain situations, why even get yourself in those situations for you to have to consider using VARIANT?

Between the type conversion operator in C++, and the use of templates, why bother with VARIANT?

When I first learnt about VARIANT, I didn't like it, and even though your article (which is good BTW) makes it a little easier to digest, I still don't like VARIANT!!

I realize I might be the one losing out on certain benefits it might provide, but until I encounter a situation where I have no other means of using some other tool, I'll only go kicking and screaming.

Heh - I don't like 'em very much either. But like I said, sometimes it's the lesser of two evils.

Rob Manderson

Between the type conversion operator in C++, and the use of templates, why bother with VARIANT?

C++ being the important part of this sentence.

Windows is NOT designed to be programmed using C++. It is designed to be available to more programming languages by sticking to a more basic C style API. Thus beasts like VARIANT must exist to enable more languages to program for Windows.

I have also been using the MSHTML control for quite some time. This encapsulate class would have helped me when I first discovered the VARIANT requirements. Also I had no idea of how to pass a "safe array" to one of the MSHTML methods.

I eventually learned about _variant_t only by searching MS source code. That is what I use for now. Sure wish VARIANT did not exist.