The fact that you're copying a matrix piece by piece using MLGetNext() +
MLGet<type> is a truly tiny inefficiency. Seriously! It is an inefficiency, of
course, but I doubt it's the biggest one you're running into. I don't know
exactly what your problem is...based upon your concerns, I'll assume you're
working with large data sets at some level...but in general, for this sort of
problem, there are bigger fish to fry. Let me outline several points below that
you'll want to understand about dealing with large data sets.
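
First big point: are the data being represented as a packed array in
Mathematica? I can tell you right now that if your data set contains
Indeterminate or Infinity, the answer is no! Compare appending an ordinary real
with appending 1./0. (which evaluates to ComplexInfinity): the latter unpacks
the whole array and roughly quadruples its ByteCount. For example: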
In[1]:= ByteCount[N@Range[100000]]
Out[1]= 800168
In[2]:= ByteCount[Append[N@Range[100000], 1.]]
Out[2]= 800176
In[3]:= ByteCount[Append[N@Range[100000], 1./0.]]
Power::infy: Infinite expression 1/0. encountered. >>
Out[3]= 3200088

Let me be very clear. Mathematica certainly understands IEEE math internally,
but it represents the edge cases not as doubles but as symbolic expressions
(Indeterminate, or DirectedInfinity[1], which is the FullForm of Infinity).
You report below that Mathematica correctly reads IEEE NaNs off of MathLink.
You're correct, but this is a one-way conversion: the NaN becomes the symbol
Indeterminate, which cannot live inside a packed array, so you immediately lose
the game if you were trying to maintain an extremely compact form.

Second big point...could you have done things more efficiently in Mathematica,
and avoided the massive inefficiency of pushing data sets through a shared
memory protocol between processes? I can't speak to this at all since I don't
know what you're doing, but you should understand that transmitting data
between processes is never cheap, and it is even less cheap in a highly
structured communications protocol like MathLink. More importantly, this
compounds my first point. Packed arrays are transferred over MathLink in a much
more compact form than a list of mixed floats and symbols, and such expression
lists take much longer to push over the link. That communication time will
dwarf any cost of the particular method by which you retrieve the data from the
link.

Third big point...as I said above, the transmission is going to be more costly
than the individual read calls (which don't affect the mode of
transmission...only how the data are spoon-fed to you). Packed arrays simply
mean that many fewer bytes to transmit, and that turns out to be a pretty big
deal. You could, in Mathematica, do a last-minute conversion to a packed array
by substituting Indeterminates and Infinities with some sort of magic (sentinel)
value that Mathematica will leave alone as a float, but that your C program
recognizes. There might be a better way to do this...I am not the world's
expert on the kernel's internal representations, but I know quite a lot about
MathLink, and this is the best idea I have. Of course, synthesizing the packed
array has its own cost...whether the benefit outweighs the cost is something I
would probably determine experimentally.
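
To make the sentinel idea concrete, here is a minimal C sketch of the receiving
end. It assumes (hypothetically) that the Mathematica side has already replaced
Indeterminate, Infinity and -Infinity with three agreed-upon machine reals, for
instance with a replacement rule followed by Developer`ToPackedArray, so that
MLGetReal64Array() can succeed. The sentinel values and function names are
illustrative only; choose values that can never occur in your own data.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical sentinel values. The Mathematica side must produce exactly
   these doubles. Powers of two avoid any decimal-rounding mismatch between
   the two sides; they must never occur in real data. */
#define SENTINEL_INDETERMINATE (-0x1p1000)
#define SENTINEL_POS_INFINITY  (0x1p1001)
#define SENTINEL_NEG_INFINITY  (-0x1p1001)

/* Nonzero if x is one of the agreed-upon sentinel values. */
static int is_sentinel(double x)
{
    return x == SENTINEL_INDETERMINATE || x == SENTINEL_POS_INFINITY
        || x == SENTINEL_NEG_INFINITY;
}

/* Map one received element back to its IEEE meaning. Ordinary values,
   including a genuine NaN (which compares unequal to everything), pass
   through unchanged. */
static double decode_sentinel(double x)
{
    if (x == SENTINEL_INDETERMINATE) return NAN;
    if (x == SENTINEL_POS_INFINITY)  return INFINITY;
    if (x == SENTINEL_NEG_INFINITY)  return -INFINITY;
    return x;
}

/* Decode n elements from a read-only source into caller-owned storage.
   Returns how many sentinels were replaced. */
static size_t decode_sentinels(const double *src, double *dst, size_t n)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_sentinel(src[i]))
            replaced++;
        dst[i] = decode_sentinel(src[i]);
    }
    return replaced;
}
```

Because decode_sentinel() works on one element at a time, you can also apply it
on the fly while walking the array returned by MLGetReal64Array(), which keeps
the single immutable copy described in the fourth point below (releasing it with
MLReleaseReal64Array() when done). decode_sentinels() is only for when other
code needs a plain IEEE array of its own.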

Fourth big point...the Manual type does *not* preclude efficient reading of
packed arrays when that's what has actually been written. If MLGetReal64Array()
succeeds, then you've won! In all likelihood (although exceptions are
technically possible), you've gotten the fastest transmission available, and
you get to keep only one copy of the data in memory, so long as you're willing
to treat that copy as immutable. If MLGetReal64Array() fails, then you can
construct the array using the more piecemeal approach I suggested in my previous
email.
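
For the piecemeal fallback, the C program has to produce the IEEE values itself.
One detail worth checking in your own code: only Indeterminate arrives as a bare
symbol (MLTKSYM). Infinity evaluates to DirectedInfinity[1] and -Infinity to
DirectedInfinity[-1], so MLGetNext() reports those as a one-argument function
(MLTKFUNC) whose integer argument gives the direction. The sketch below covers
only the MathLink-independent part: mapping those cases to IEEE values and
accumulating elements into a growable buffer. RealBuffer, realbuf_push(),
realbuf_free(), ieee_from_symbol() and ieee_from_direction() are hypothetical
names, and the surrounding MLGetNext()/MLGetReal64()/MLGetSymbol()/
MLGetFunction() loop is left to you.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer for elements read one at a time. */
typedef struct {
    double *data;
    size_t  len;
    size_t  cap;
} RealBuffer;

/* Append one value, doubling the capacity when full.
   Returns 1 on success, 0 if memory could not be allocated. */
static int realbuf_push(RealBuffer *buf, double x)
{
    if (buf->len == buf->cap) {
        size_t new_cap = buf->cap ? 2 * buf->cap : 64;
        double *p = realloc(buf->data, new_cap * sizeof *p);
        if (p == NULL)
            return 0;
        buf->data = p;
        buf->cap = new_cap;
    }
    buf->data[buf->len++] = x;
    return 1;
}

/* Release the storage and reset the buffer to empty. */
static void realbuf_free(RealBuffer *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->len = buf->cap = 0;
}

/* Map a bare symbol (MLTKSYM) to its IEEE value.
   Only Indeterminate is handled; returns 0 for any other symbol. */
static int ieee_from_symbol(const char *name, double *out)
{
    if (strcmp(name, "Indeterminate") == 0) {
        *out = NAN;
        return 1;
    }
    return 0;
}

/* Map DirectedInfinity[direction] to an IEEE infinity.
   Infinity is DirectedInfinity[1] and -Infinity is DirectedInfinity[-1];
   any other direction has no real IEEE counterpart, so return 0. */
static int ieee_from_direction(long direction, double *out)
{
    if (direction == 1)  { *out = INFINITY;  return 1; }
    if (direction == -1) { *out = -INFINITY; return 1; }
    return 0;
}
```

In the reading loop, an MLTKREAL element would be read with MLGetReal64() and
pushed directly, while the MLTKSYM and MLTKFUNC cases would push the value
produced by the matching helper. Anything the helpers reject is a genuinely
unexpected element and is better reported as an error than silently converted.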
Sincerely,
John Fultz
jfultz at wolfram.com
User Interface Group
Wolfram Research, Inc.
On Thu, 15 Sep 2011 04:40:56 -0400 (EDT), Roman wrote:
> Thanks John. The entire reason why I am communicating with external C
> procedures is in order to speed up computation; if much time is spent
> in the communication interface then this defeats the point. In
> particular, when passing large arrays of real numbers (containing NaN
> and/or inf) then receiving the array element by element via
> MLGetNext() seems a very inefficient thing to do.
>
> I appreciate how faithful Mathematica is in the transmission process,
> but when I pass "NaN" from C to Mathematica via MLPutReal64() then
> Mathematica does in fact receive the "Indeterminate" symbol, not an
> IEEE "NaN". So the conversion capability is built into Mathematica in
> one direction but not in the other. What I was hoping for is a trick
> that lets me use the same automatic conversion in the Mathematica->C
> direction via MLGetReal64Array(). For instance, is there a way to
> convert a matrix in Mathematica into a purely numerical representation
> (where every element must be an IEEE number) that could then be
> forwarded immediately (no conversions) to MLGetReal64Array()?
>
> Cheers!
> Roman
>
>
> On Sep 12, 10:23 am, John Fultz <jfu... at wolfram.com> wrote:
>> On Sat, 10 Sep 2011 07:29:23 -0400 (EDT), Roman wrote:
>>> Hello all,
>>> I am setting up a C function which accepts real numbers from MathLink.
>>> The behavior I would like to achieve is that whenever the number is
>>> "Infinity" the C function receives "inf" (which is a valid
>>> double-precision-format number), and whenever the number is
>>> "Indeterminate" the C function receives "nan" (which is also a valid
>>> double-precision-format number).
>>> Unfortunately MathLink (Mathematica 7.0 for Mac OS X x86 (64-bit))
>>> crashes whenever I try to pass either Infinity or Indeterminate
>>> to a MathLink function expecting a double-precision number.
>>> Would you know how to solve this without going into If[] statements on
>>> the Mathematica side of MathLink?
>>> Thanks!
>>> Roman
>>>
>> Mathematica represents Indeterminate and Infinity symbolically in its
>> expression tree, and MathLink is always very faithful about transmitting
>> the expression tree precisely. Note that it's not very difficult to
>> deal with this in your C code, though. You can just declare the
>> function as having a Manual MathLink type and then, in the C function,
>> use MLGetNext() to determine whether the next thing is a symbol or a
>> real. If it's a symbol, then you can just synthesize the IEEE version
>> of the indeterminate value in your C program.
>>
>> About two thirds of the way down this help page:
>>
>> tutorial/HandlingListsArraysAndOtherExpressions
>>
>> there's an example that illustrates how to use Manual as an argument
>> type