Ah yes, I see what you mean. It is fairly common to want to ORDER by
variables that are not projected, so that would constrain ORDER to be
done before projection.
I withdraw my suggestion. Thanks for your help!
David
On Wed, 2011-09-14 at 18:18 -0400, Stephen Allen wrote:
> Hi David,
>
> Unfortunately, solution sequence modifiers cannot be rearranged as you
> suggest. The Order modifier must appear before Projection in order to
> have access to all variables in the where clause. Distinct and
> Reduced must appear after Projection since that can quite easily
> introduce duplicates.
>
> As you note, the implementation is free to reorder these operations if
> it can produce the same solution sequence. Implementations typically
> do this for query optimization purposes.
>
> In the particular instance you cite, if the optimizer can determine a
> priori that there are likely to be a large number of duplicates AND
> there is no projection, it can perform the distinct before the order.
> However, it is not clear to me that an optimizer could easily
> determine the number of duplicates ahead of time. Indeed, if the
> number of duplicates were a small proportion of the total number of
> bindings, this optimization could be more expensive, since it could
> otherwise optimize distinct(order(...)) as reduced(order(...)).
>
> Speaking more directly about Parliament, it uses Jena's ARQ to perform
> query execution. That is slated to receive an update at some point in
> the future to perform non-memory bound sorting [1]. If you are
> interested in query optimization in ARQ and Parliament, I recommend
> you browse [2], [3], [4] and [5].
>
> -Stephen
>
> [1] https://issues.apache.org/jira/browse/JENA-44
> [2] https://issues.apache.org/jira/browse/JENA-89
> [3] https://issues.apache.org/jira/browse/JENA-90
> [4] https://issues.apache.org/jira/browse/JENA-109
> [5] https://issues.apache.org/jira/browse/JENA-111
>
>
>
>
> > -----Original Message-----
> > From: public-rdf-dawg-comments-request@w3.org [mailto:public-rdf-dawg-
> > comments-request@w3.org] On Behalf Of David Booth
> > Sent: Wednesday, September 14, 2011 5:23 PM
> > To: public-rdf-dawg-comments
> > Subject: Order of applying solution sequence modifiers
> >
> > Sec 15 Solution Sequences and Modifiers
> > http://www.w3.org/TR/sparql11-query/#solutionModifiers
> > says that the ORDER modifier is applied before the DISTINCT modifier.
> > I
> > suggest that this be changed to say that ORDER is applied after
> > REDUCED.
> > This would be a very simple change to the spec -- moving one line in
> > sec
> > 15. AFAICT it would not affect the semantics, but I think it would
> > help
> > implementers.
> >
> > The reason I suggest this is that an implementation performing the
> > ORDER
> > before DISTINCT must generate *all* (non-distinct) solutions in order
> > to
> > sort them, in spite of the fact that the ordering of the solutions can
> > never affect the elimination of those that are non-distinct. AFAICT it
> > would be much more efficient to perform the ORDER *after* performing
> > DISTINCT (or REDUCED).
> >
> > For example, there may be 100 million solutions but only 20 distinct
> > solutions. If SORT is done first, then all 100 million solutions must
> > be held in memory at once and sorted, before later reducing the result
> > set to 20. But if DISTINCT is done first, then duplicate solutions can
> > be discarded *as* they are generated, so only 20 need to be held in
> > memory.
> >
> > Granted, since an implementation is free to optimize things however it
> > wishes anyway -- provided that it still produces the correct results --
> > it could perform the ORDER modifier after the DISTINCT and REDUCED
> > modifiers even if the spec says to do them the other way around. So in
> > that sense there is no absolute need to change the order of these
> > modifiers in the SPARQL specification. However, since most
> > implementations will first want to ensure correctness and later
> > optimize, I think it would be helpful to change the SPARQL
> > specification
> > to say that ORDER is done after REDUCED. Indeed, the reason I ran
> > into
> > this is that the implementation that I'm currently using (Parliament
> > 2.7.1) appears to be doing the SORT first (per the spec), thus causing
> > a
> > query that I thought should have run quite efficiently to instead run
> > the system out of memory.
> >
> > Thanks!
> >
> >
> > --
> > David Booth, Ph.D.
> > http://dbooth.org/
> >
> > Opinions expressed herein are those of the author and do not
> > necessarily
> > reflect those of his employer.
>
>
>
>
--
David Booth, Ph.D.
http://dbooth.org/
Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.