I've updated the code and the patch description to clarify that the
structure base must be checked for NULL before the cpumask_t pointers
are used. I've also renamed CPUMASK_VAR to CPUMASK_PTR to make its
function a bit clearer.

* Provide a generic set of CPUMASK_ALLOC macros patterned after the
SCHED_CPUMASK_ALLOC macros. These are used where multiple cpumask_t
variables are declared on the stack, to reduce the amount of stack
space required when the NR_CPUS count is large enough to warrant it.

Basically, if NR_CPUS <= BITS_PER_LONG then the structure holding the
multiple cpumask_t variables (which needs to be pre-defined) is
declared as a local variable and a pointer to each mask is provided.
The compiler will optimize out the extra dereference, resulting in
code that is the same as it would be without the pointer reference.

If NR_CPUS > BITS_PER_LONG, then instead of declaring the combined
cpumask_t structure on the stack, kmalloc is used to obtain the
memory. In this case, CPUMASK_FREE is kfree instead of a nop.

In both cases, CPUMASK_PTR declares and initializes each cpumask_t
pointer, but these must *not* be used before the structure pointer
has been verified to be non-NULL. (This NULL check will be optimized
out in the case where the structure is declared as a local variable.)
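
The two cases and the NULL check described above can be sketched in
userspace C as follows. This is only an illustration under stated
assumptions, not necessarily the exact definitions in the patch: malloc
and free stand in for kmalloc and kfree, cpumask_t is modelled as a
plain bitmap, NR_CPUS is set to 4096 for the example, and struct
demo_masks, mask_set(), mask_test() and demo_use_masks() are
hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef NR_CPUS
#define NR_CPUS 4096			/* assumed value for this sketch */
#endif

#if ULONG_MAX > 0xffffffffUL
#define BITS_PER_LONG 64
#else
#define BITS_PER_LONG 32
#endif

/* Simplified stand-in for the kernel's cpumask_t. */
typedef struct {
	unsigned long bits[(NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG];
} cpumask_t;

#if NR_CPUS > BITS_PER_LONG
/* Large case: heap allocation (malloc/free stand in for kmalloc/kfree). */
#define CPUMASK_ALLOC(m)	struct m *m = malloc(sizeof(*m))
#define CPUMASK_FREE(m)		free(m)
#else
/* Small case: the structure is a local variable and the free is a nop. */
#define CPUMASK_ALLOC(m)	struct m _m, *m = &_m
#define CPUMASK_FREE(m)
#endif
/* Only computes an address; the result must not be used if m is NULL. */
#define CPUMASK_PTR(v, m)	cpumask_t *v = &(m->v)

/* Hypothetical pre-defined structure holding the masks one function needs. */
struct demo_masks {
	cpumask_t online;
	cpumask_t allowed;
};

static void mask_set(cpumask_t *mask, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	mask->bits[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test(const cpumask_t *mask, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (mask->bits[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL;
}

/* Returns -1 on allocation failure, else 1 if cpu is online but not allowed. */
int demo_use_masks(unsigned int cpu)
{
	int ret;
	CPUMASK_ALLOC(demo_masks);
	CPUMASK_PTR(online, demo_masks);
	CPUMASK_PTR(allowed, demo_masks);

	/* Verify the structure pointer before touching any mask pointer. */
	if (demo_masks == NULL)
		return -1;

	memset(demo_masks, 0, sizeof(*demo_masks));
	mask_set(online, cpu);
	ret = mask_test(online, cpu) && !mask_test(allowed, cpu);

	CPUMASK_FREE(demo_masks);
	return ret;
}
```

Building the same file with -DNR_CPUS=32 selects the on-stack variant,
and the body of demo_use_masks() does not need to change.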

One question remains: should the threshold for using kmalloc be
BITS_PER_LONG or something larger? Sched uses NR_CPUS > 128, though
it has about 7 cpumask_t variables. My (obvious) concern is the case
where NR_CPUS is 4096 (and soon 16384), but where is the line between
a fairly large system and a really huge system?
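
To put rough numbers on the threshold question, here is a small
sketch of the arithmetic. It assumes cpumask_t is a bitmap of NR_CPUS
bits rounded up to a whole number of longs; cpumask_bytes() and
stack_bytes() are hypothetical helpers written for this note, not
kernel functions.

```c
#include <limits.h>
#include <stddef.h>

/* Bytes in one cpumask_t, assuming NR_CPUS bits rounded up to whole longs. */
size_t cpumask_bytes(unsigned int nr_cpus)
{
	size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
	size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/* Stack bytes consumed by nmasks cpumask_t variables declared as locals. */
size_t stack_bytes(unsigned int nr_cpus, unsigned int nmasks)
{
	return (size_t)nmasks * cpumask_bytes(nr_cpus);
}
```

Under that assumption, one mask is 16 bytes at NR_CPUS = 128, 512 bytes
at 4096 and 2048 bytes at 16384, so seven masks at NR_CPUS = 4096 would
take 3584 bytes of stack.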