Manipulate and analyze big data with the full expression power of Rax - Relational Algebra with eXtensions.

Sort by Regex

Effectively this means no ORDER BY on GROUP_CONCAT in IMPALA. This must be implemented eventually, because the whole GROUP_CONCAT has limited use without it. Meantime, regexes can be used to put things back in order like so:

regex_replace('annadaveeve','(anna)(dave)(eve)','\3\1\2')

Which would place eve first, anna second, and dave third. The rank number rnum of each name can be expressed like this:

ROW_NUMBER() OVER(ORDER BY age DESC) AS rnum

The GROUP_CONCAT from rnum belonging to (anna)(dave)(eve) would be 231, which can be translated to \3\1\2 using regexp_replace and translate like so:

regexp_replace(translate('123', '231', '123'), '.', '\\\0')

Giving us the third argument of the sorting regex. The first and second argument are simple. Likewise an SQL SELECT statement like this:

SELECT
parent,
GROUP_CONCAT(child, '; ') OVER(ORDER BY age DESC)
FROM table
GROUP BY parent

Your milage may vary using this code, but if you, like Rax, need a solution right now, try a variant of this code. (You might need to double up the \\.) Also note that it might be safer to use an escaped version of child to use in the second argument of the outer regexp_replace like so:

regexp_replace(child, '[[:punct:]]', '\\\0') AS esc_child

Finally, note, this only works with group sizes below ten. To up that, replace 123456789 by abcdefghijklmnop (or longer), replace CAST(rnum AS STRING) with something like substr('abcdefghijklmnop', rnum, 1) and add an extra regexp_replace in front of strleft to transform \a\b\c...\o\p into \1\2\3..\15\16.