Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> >> 2) switching the locale at run time is too expensive when using the system
> >> library.
>
> > Fwiw I did some experiments with this and found it wasn't true.
>
> Really?
We're following two different methodologies so the results aren't comparable.
I exposed strxfrm to postgres and then did a sort on strxfrm(col). The
resulting query times were only negligibly slower than sorting on
lower(col).
I have the original code I wrote and Joe Conway's reimplementation of it using
setjmp/longjmp to protect against errors. However the list archives appear to
have been down that month so I've attached Joe Conway's implementation below.
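
For illustration only, and not the attached code: a minimal standalone C
sketch of the idea, assuming a plain libc environment outside the backend.
The helper name xfrm_in_locale and its interface are made up for this
example. It switches LC_COLLATE, builds the strxfrm key, and switches back.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (name and interface are assumptions for illustration).
 * Returns a malloc'd strxfrm() key for src under the named collation locale,
 * or NULL on failure. The caller frees the result.
 */
char *
xfrm_in_locale(const char *src, const char *locale_name)
{
    const char *current;
    char       *saved;
    char       *key = NULL;

    assert(src != NULL && locale_name != NULL);

    /*
     * setlocale(category, NULL) only queries. Copy the result, because a
     * later setlocale call may overwrite the returned buffer.
     */
    current = setlocale(LC_COLLATE, NULL);
    if (current == NULL)
        return NULL;
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, current);

    if (setlocale(LC_COLLATE, locale_name) != NULL)
    {
        /* With a zero-length buffer, strxfrm reports the size it needs. */
        size_t      need = strxfrm(NULL, src, 0);

        key = malloc(need + 1);
        if (key != NULL)
            strxfrm(key, src, need + 1);
    }

    /* Always switch back, whether or not the transform succeeded. */
    setlocale(LC_COLLATE, saved);
    free(saved);
    return key;
}
```

Keys produced under the same locale can then be compared with plain strcmp.
The sketch has no error protection of the setjmp/longjmp kind mentioned
above; it only restores the locale on its own return paths.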
> These are on machines of widely varying horsepower, so the absolute
> numbers shouldn't be compared across rows, but the general story holds:
> setlocale should be considered to be at least an order of magnitude
> slower than strcoll, and on non-glibc machines it can be a whole lot
> worse than that.
I don't see how this is relevant though. One way or another postgres is going
to have to sort strings in varying locales chosen at run-time. Comparing
against strcoll's execution time without changing locales is a straw
man. It's like comparing your tcp/ip bandwidth with the loopback interface's
bandwidth.
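
To make the distinction concrete, here is a rough C sketch of the two
measurements being contrasted: strcoll with the locale left alone, and
strcoll with a locale switch before every call. The function names, the use
of clock(), and the alternating-locale loop are all assumptions for
illustration, not anyone's actual benchmark, and the sketch reports no
results.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Hypothetical: CPU seconds for iters strcoll calls, locale untouched. */
double
time_strcoll_only(const char *a, const char *b, long iters)
{
    volatile int sink = 0;      /* keeps the calls from being optimized out */
    clock_t     start = clock();
    long        i;

    for (i = 0; i < iters; i++)
        sink += strcoll(a, b);
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

/*
 * Hypothetical: the same loop, but alternating LC_COLLATE between two
 * locale names before each comparison. If a name is not installed,
 * setlocale returns NULL and leaves the locale unchanged. The collation
 * locale is left at whichever name was set last.
 */
double
time_strcoll_with_switch(const char *a, const char *b,
                         const char *loc1, const char *loc2, long iters)
{
    volatile int sink = 0;
    clock_t     start = clock();
    long        i;

    for (i = 0; i < iters; i++)
    {
        setlocale(LC_COLLATE, (i & 1) ? loc2 : loc1);
        sink += strcoll(a, b);
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

The first number is the loopback case; only the second says anything about
sorting in locales chosen at run time.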
I see no reason to think Postgres's implementation of looking up xfrm rules
for the specified locale will be any faster than the OS's. We know some OSes
suck but some certainly don't.
Perhaps glibc's locale handling functions ought to be available as a separate
library users of those OSes could install -- if it isn't already.
> I don't think we can take that attitude when the cost penalty involved
> can be a couple of orders of magnitude.
Aside from the above complaint, there's another problem with your methodology.
An order of magnitude change in the cost of strcoll isn't really relevant
unless the cost of strcoll is significant to begin with. I suspect strcoll's
cost is already dwarfed by the palloc costs of evaluating the expression.
Here's the implementation in postgres from Joe Conway btw: