Kevin J. McCann wrote:
> I have a very large data set (64000 x 583) in which negative values
> indicate "no data", unfortunately these negatives are not all the same.
> I would like to efficiently set all these negatives to zero. I know that
> I will likely be embarrassed when I see how to do it, but I can't seem
> to remember or figure it out. I should emphasize that because of the
> size of the data set, this needs to be done efficiently. Another
> programming language does it as follows:
>
> x(x < 0) = 0;
Here are a couple of solutions. They work fine, but in terms of
efficiency they are roughly 60 to 70 times *slower* than the vectorized
assignment you used with the other product.
First, we create a small set of data to show the principle.
data = RandomReal[{-10, 100}, {6, 4}]
{{90.6031, 16.644, 15.2568, 88.4432}, {95.3404, -0.391179, 22.6264,
41.0332}, {18.7866, 90.8717, 48.073, 59.3251}, {24.2224, 21.1771,
91.7082, 50.719}, {96.9408, 27.4581, 56.9265, 2.22925}, {31.6366,
0.266302, 68.7124, 7.80917}}
Then we use a replacement rule,
data /. x_ /; x < 0 -> 0.
{{90.6031, 16.644, 15.2568, 88.4432}, {95.3404, 0., 22.6264,
41.0332}, {18.7866, 90.8717, 48.073, 59.3251}, {24.2224, 21.1771,
91.7082, 50.719}, {96.9408, 27.4581, 56.9265, 2.22925}, {31.6366,
0.266302, 68.7124, 7.80917}}
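We can also apply the same rule with *Cases*, but note that it returns
only the matched (and transformed) elements, not the full matrix,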
Cases[data, x_ /; x < 0 -> 0., {-1}]
{0.}
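To make the difference between the two calls concrete, here is a minimal sketch on a made-up 2 x 2 matrix (the names m, replaced and matched are illustrative only and are not part of the data above):

```mathematica
(* Hypothetical small matrix with two negative entries *)
m = {{1.5, -2.}, {-0.5, 3.}};

(* ReplaceAll keeps the shape and zeroes the negatives in place *)
replaced = m /. x_ /; x < 0 -> 0.   (* {{1.5, 0.}, {0., 3.}} *)

(* Cases collects only the matching atoms, after applying the rule *)
matched = Cases[m, x_ /; x < 0 -> 0., {-1}]   (* {0., 0.} *)

(* Only the ReplaceAll result has the dimensions of the input *)
Dimensions[replaced] == Dimensions[m]
```

So the rule with /. is the one that actually produces the cleaned matrix; Cases is mostly useful for inspecting or counting the offending entries.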
Now we test both methods on a matrix of doubles of the size you
specified and check the time spent, in seconds.
data = RandomReal[{-10, 100}, {64000, 583}];
Timing[data /. x_ /; x < 0 -> 0.;][[1]]
Timing[Cases[data, x_ /; x < 0 -> 0., {-1}];][[1]]
62.046
49.797
In comparison, the same replacement on a matrix of the same size done
with the other product takes less than a second.
>> x = -10 + (100 - (-10)).*rand(64000,583);
>> tic; x(x < 0) = 0; toc
Elapsed time is 0.867847 seconds.
>> whos x
Name Size Bytes Class Attributes
x 64000x583 298496000 double
I am confident that the performance can be improved in Mathematica, but
I am drawing a blank right now (though I suspect it has something to do
with the packed array technology used by Mathematica).
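One direction that may be worth timing is to avoid pattern matching altogether and use functions that operate on the whole array at once. The following is only a sketch, not a measured result: the matrix small and the names viaUnitStep, viaClip and viaRule are made up for illustration, the reduced size is arbitrary, and whether these calls keep the array packed is an assumption that the last line is meant to check.

```mathematica
(* Hypothetical, smaller test matrix so the comparison runs quickly *)
small = RandomReal[{-10, 100}, {1000, 583}];

(* UnitStep gives 0 for negative entries and 1 otherwise, so the
   product zeroes the negatives without any pattern matching *)
viaUnitStep = small*UnitStep[small];

(* Clip limits every entry to the range from 0 to Infinity *)
viaClip = Clip[small, {0., Infinity}];

(* Rule-based result from above, used as the reference *)
viaRule = small /. x_ /; x < 0 -> 0.;

(* Do the three approaches agree? *)
{viaUnitStep == viaRule, viaClip == viaRule}

(* Which of the arrays are still packed? *)
Developer`PackedArrayQ /@ {small, viaUnitStep, viaClip, viaRule}
```

If the rule-based result turns out to be unpacked while the others stay packed, that would support the suspicion above; wrapping each line in Timing on the full 64000 x 583 matrix would show whether the gap to the other product closes.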
Regards,
--
Jean-Marc