I have been using http://www.snowflake.net for a new data processing at work for a few months, and it’s just amazing, to be able to run large queries over large data-sets, and the ability to increase the cluster size, when doing development work to get faster turnarounds, that are not impacting the production cluster, brilliant.

One of the thing I have noticed is slower than I would like is joins based on tableA.time being inside a time-range of tableB.start and tableB.end when the time period being is in the months not days.

So the pattern mapping a value from TABLE_I onto all rows in the time span (not including the end)

For one set of data spanning 3 months the above takes 45 minutes on a small cluster for TABLE_P 65M/TABLE_I 10M rows.
Where-as for a similar set of 4 days, and ~45M rows this takes 30 seconds.

So I add some TO_DATE(time), TO_DATE(start_time) columns to the two tables, and then added AND tp.time_day = i.start_time_day and the first query went to ~60 seconds. But I was missing a few million rows as my time ranges span multiple days…

So I did many things that didn’t work (like trying to use a GENERATOR with dynamic input) and settled on a simple solution

So the above code creates another temp table with a row per table B record with every Day is a duplicate row, now we have more rows in the seconds table, but we can do a date based match to speedup the query.

CREATE OR REPLACE TEMPORARY TABLE WORKING_B_B AS
SELECT tp.u_id, tp.time, i.value
FROM TABLE_P_B tp
JOIN TABLE_I_B i
ON tp.u_id = i.u_id AND tp.time_day = i.range_day_part
AND tp.time >= i.start_time AND tp.time < i.end_time;
This code runs in 60 seconds and gives the same results as the 45 minute code.

Things to note, putting LATERAL table joins on a selects with CTE’s presently breaks the SQL parser, in fact even nested selects and LATERAL don’t mix, thus the extra tables with _B etc. Also CTE’s make life so much easier, but as you start joining to them a lot, performance slips, I have found where I do a complex join is a good time to output to a temporary table, and the performance again is crazy…

James has a new post called http://prog21.dadgum.com/206.html in which he wonders if lots of the idioms like const, and static or sealed classes are them to allow developers to protect themselves from themselves. I would have commented on his blog, but he doesn’t like comments, which is a completely sane stance to have, heck the only comments I get here are spam.

So I had always thought those idioms where there to allow complier optimizations because there was extra information that was not tracked on older compilers, or to make the code cleaner.

const variables are to a avoid macro’s “true evil” and to keep the type information.
static allowed to keep the symbol table size down
sealed allowed real optimization, as functions will never to replaced.

And these were needed, so the new language could perform “better” at some key benchmark used to once again prove that poor “real world” problems in assembly, C are faster than in new language X. Thus all problems should use C/asm.

The other day on Google+ I read a post from Ana Andres talking about coding Tiny Planets, and how it was running really slow in Matlab somewhere in the pixel drawing code.

That got me thinking about how I would code it in C#, and Ana had a good blog post showing one way of thinking about how the transformation was happening , one of her goals was to make an animated GIF so people could see the wrap bend happening.

Now the this was plenty fast enough, I was getting 2-3 fps on my laptop using only one core, so from a speed perspective moved on.

My first plan was to interpolate between the points (all though that would really slow, and look bad at the outside), but when I showed my co-worker my method, he said I was silly and should just traverse the output space, and find the input pixel that maps, so I did that and got:

I then had to put the bend code back in, as that was a special aspect needed for the animation and got:

I then started playing with sub-sampling to make the output less jaggie (F), and then I decided blend in YUV (or YCrCb) colour space, and finally settled on five point YUV average where you have a center of the pixel and the four 1/4 towards the pixel diagonal corners. Givening:

We are porting our MSVC/Win32 applications to Clang/GCC/Linux and have just spent the morning tracing why our unit tests fail.

void BadExampleCode()
{
double a = NAN;
ASSERT(!isnan(a));
}

Under MSVC and Clang all good, GCC asserts. We added printf’s and looked at the assembly and the code was hard coded to 0.

Some googling found 2006 posts stating GCC -ffast-math did odd things with isnan, and it’s still a presently reported issue

This came about because we are using Premake, and had the FloatFast flag set, because that’s how our MSVC projects were set, and we don’t want to change those builds, so for now we have tweaked the Premake code for this flag under GCC, as it doesn’t make sense that you would ever want isnan to be hard coded to zero, that’s really not fast maths at all.

We also put a #ifdef for ffast-math to break if this flag is present in the future.