So after a bit more coffee and a bit of research it seems to me that
the only way you are going to get this to be fast would be if you used
a hash based looked for one of your sequences. Something like a
HashMap or BloomFilter would do the job, see:
http://stackoverflow.com/questions/4261619/fastest-set-operations-in-the-west
I have made some experimentations with the above in Scala, a HashMap
works nicely here in-terms of performance because it is simple and
available, and in your case with Integers you do not need to worry
about duplicates because the value of a number is it's identity.
So, if you don't want to implement a HashMap or BloomFilter in XQuery,
what is one to do? Well XQuery 3.1 will introduce the Map data-type
and some implementations already have done this. If you understand the
implementation (or even hazard a guess), there is a good chance that
that XQuery map may in-fact under the covers be a HashMap.
I won't say too much more about this as I have been discussing it with
Wolfgang this morning, and I think he will shortly post you a very
fast example when using XQuery 3.1 Maps in eXist...
On 28 September 2014 13:29, Adam Retter <adam.retter at googlemail.com> wrote:
>> I think that question is very implementation specific. If all of your
>> data is in RAM, as your dataset is relatively small and these are just
>> numbers, I would expect performance to be excellent. I am not sure
>> that sorting will make much of a difference, but it depends on the
>> implementation and how it initiates the search for a false comparison
>> in a large sequence.
>> I retract any previous statements that may have alluded to good
> performance. A quick test on Saxon and eXist shows that this is a very
> slow problem. I am trying to see if there is not a short-cut that
> could be taken, I will come back to you...
>> --
> Adam Retter
>> skype: adam.retter
> tweet: adamretter
>http://www.adamretter.org.uk
--
Adam Retter
skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk