Re: What is a Boxed Array?

A "box" is a cell representing some value in a program. It generally
contains a pointer to code (a thunk), or to a proper value. When
evaluation of that box is forced for the first time, the code
executes, and when it is done, it updates the pointer with a pointer
to the result value. There are a number of slight variations on this
scheme, but essentially the idea is that there is a layer of
indirection which keeps track of whether the value has been evaluated
yet or not. This sort of thing forms a major role in the
implementation of a lazy language like Haskell.

Ordinary Arrays in Haskell, say, Array i e for some index type i, and
element type e, are arrays of boxed values. That is, each element of
the array is a pointer to either code, or a value, and each may be
evaluated separately and lazily.

By contrast, unboxed arrays, like those of type UArray i e, are arrays
of unboxed values. The layer of indirection which normally allows for
separate lazy evaluation of the elements is missing. Consequently,
such arrays consume much less space in overhead, however, there is a
penalty to be paid in ease of use. The array's box now has code which
determines the value of every element of the array as soon as it is
forced. Thus, even a single lookup in the array will cause the entire
array to be computed. If you were going to compute it anyway, you
don't lose much. What you do lose however, is the ability to define
the array elements recursively in terms of one another.

The following definition works for boxed array types, but not unboxed ones:
a = array (0,20000) $ (0,0):(1,1):[(i, a ! (i-1) + a ! (i-2)) | i <- [2..20000]]

Further, you restrict yourself to only a small collection of possible
element types which have a compact unboxed representation. A list of
these in the case of UArray is available in the documentation for
Data.Array.Unboxed.

Re: What is a Boxed Array?

On Fri, Dec 09, 2005 at 02:29:33PM -0500, Cale Gibbard wrote:

> A "box" is a cell representing some value in a program. It generally
> contains a pointer to code (a thunk), or to a proper value. When
> evaluation of that box is forced for the first time, the code
> executes, and when it is done, it updates the pointer with a pointer
> to the result value. There are a number of slight variations on this
> scheme, but essentially the idea is that there is a layer of
> indirection which keeps track of whether the value has been evaluated
> yet or not. This sort of thing forms a major role in the
> implementation of a lazy language like Haskell.
>
> Ordinary Arrays in Haskell, say, Array i e for some index type i, and
> element type e, are arrays of boxed values. That is, each element of
> the array is a pointer to either code, or a value, and each may be
> evaluated separately and lazily.

I just want to add that many strict languages also have a distinction
between boxed and unboxed arrays (or offer only boxed arrays), because
there are other reasons to box values besides implementing laziness,
like for simplifying the memory model (which can simplify GC
implementation), allowing to intermix values of different types
(different kinds of polymorphism) or allow variable sized values (think
strings, arrays and arbitrary size integers).

Some concrete examples: Arrays in many scripting languages, like Python,
are often boxed. There are some languages, like R, where there are many
types of unboxed arrays, for efficiency reasons.

The situation in OCaml is a bit more complicated - standard arrays are
boxed in general, the tagged 31bit int hack makes them unboxed for ints,
and arrays of doubles are unboxed as an exception - IIUC in some cases
this requires a runtime check, for example in non-inlined functions
which are polymorphic on the type of array elements.

> Some concrete examples: Arrays in many scripting languages, like Python,
> are often boxed. There are some languages, like R, where there are many
> types of unboxed arrays, for efficiency reasons.

There are nice unboxed arrays in Clean as well. The gain of efficiency is
enormous.

==

Hmmm.... I thought that arrays in Python have no reason to be boxed. Lists,
yes, since they are untyped, so they are arrays of pointers, but arrays are
homogeneous.

I don't know why boxing may simplify the *memory model*.

The most dramatic effect of implicit boxing is the possibility -
available in Smalltalk - to change the *identity* of an object. You
write (x become y), and all references to the object assigned to x, point
now to something completely different. A terrible weapon.

Re: What is a Boxed Array?

> Hmmm.... I thought that arrays in Python have no reason to be boxed. Lists,
> yes, since they are untyped, so they are arrays of pointers, but arrays are
> homogeneous.

Python has a different meaning for "list" than the rest of the world.
When python says "list", read "boxed array" (heterogeneous) and when
it says "array" read "unboxed array" (homogenous).

> I don't know why boxing may simplify the *memory model*.

If you can have unboxed types floating around then you need to ether
put them on the stack or your GC needs to be cool with them...
relocating GC I imagine having especial trouble with unboxed types.
Having a common handle on all types would certainly simplify the GC.
I don't know if this is what he meant by "memory model".

> The most dramatic effect of implicit boxing is the possibility -
> available in Smalltalk - to change the *identity* of an object.