Re: [SMW-devel] sorting pages with multiple values for a property

After returning from Wikimania in Alexandria, I can contribute my five
piastres to close this issue:
As was rightly remarked, SMW does not define the sorting behaviour if a
property has many values (or no value). This is documented. If you want to
define one special value (the first or whatever) for this purpose, then you
just need to make a property for that task and give it only one value per
page. Using the new sortkeys can also help in some cases.
The JavaScript live sorting of tables acts on the text as displayed, hence
depends on the order of displayed values. One could make that alphabetical if
considered useful. In general, we would like to have some more parameters for
printouts (e.g. to set a limit on how many values of a multi-valued property
should be given). We currently lack some smart syntax for doing this.
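To illustrate the point about live sorting, here is a minimal Python sketch (not SMW's actual JavaScript; the page names and the helpers `cell_text` and `sort_rows_by_display` are made up for illustration). It sorts table rows on the text of a multi-valued cell as displayed, and shows that the same two pages swap places depending on the order in which the values happen to be shown.

```python
def cell_text(values):
    """Render a multi-valued cell roughly the way a table would display it."""
    return ", ".join(values)


def sort_rows_by_display(rows):
    """Sort (page, values) rows on the displayed text of the value cell."""
    return sorted(rows, key=lambda row: cell_text(row[1]))


# Same data, but Paper 1's authors are displayed in two different orders.
rows_a = [("Paper 1", ["Polleres", "Aleman-Meza"]), ("Paper 2", ["Breslin"])]
rows_b = [("Paper 1", ["Aleman-Meza", "Polleres"]), ("Paper 2", ["Breslin"])]

order_a = [page for page, _ in sort_rows_by_display(rows_a)]
order_b = [page for page, _ in sort_rows_by_display(rows_b)]

print(order_a)  # Paper 2 first: "Breslin" sorts before "Polleres, ..."
print(order_b)  # Paper 1 first: "Aleman-Meza, ..." sorts before "Breslin"
```

Displaying the values of each cell alphabetically (for example `cell_text(sorted(values))`) would make the outcome independent of the storage order, which is the option mentioned above.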
In the early days of SMW, we also had one property value per column and
line (similar to what Mov GP 0 suggested). But this has funny effects if
there are many such columns, since you get all combinations of values -- just
as a proper DB is supposed to do it. This was not very useful and hence has
been dropped.
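A rough Python sketch of why the one-value-per-line layout grows so quickly: if every multi-valued column is expanded, a single page yields one row per combination of values. The function name `expand_combinations` and the sample data are hypothetical; this only models the effect and is not how SMW was implemented.

```python
from itertools import product


def expand_combinations(page, columns):
    """Return one row per combination of values across all columns.

    An empty column contributes a single None so the page is still listed.
    """
    value_lists = [values or [None] for values in columns.values()]
    return [(page, *combo) for combo in product(*value_lists)]


rows = expand_combinations(
    "Some paper",
    {"Author": ["A1", "A2", "A3"], "Keyword": ["K1", "K2", "K3", "K4"]},
)
print(len(rows))  # 3 authors x 4 keywords = 12 rows for a single page
```

With a third multi-valued column the row count multiplies again, which is also why larger result sets become a performance concern.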
Another reason to keep sorting options low is performance. If you multiply
columns for each value given in certain columns, then you get much larger
result sets to handle (this was one reason early SMWs were slower on
querying), and result caching as scheduled for next release would probably be
less effective or more difficult.
So, to make a long story short, we do not intend to extend the sorting options
anytime soon. I also feel that multi-property sorting is already quite
complex; think of all those poor, not so technically minded users who must
grok all that!
Cheers,
Markus
On Tuesday, 1 July 2008, Jon Lang wrote:
> Mov GP 0 wrote:
> > Hello,
> > I think the problem with sorting this is that the lines are not
> > atomic. Instead of having a table of the form
> >
> > |-
> > | Property1 || Property2.1, Property2.2, Property2.3, Property2.4
> > |-
> >
> > the output should be rather
> >
> > |-
> > | Property1 || Property2.1
> > |-
> > | Property1 || Property2.2
> > |-
> > | Property1 || Property2.3
> > |-
> > | Property1 || Property2.4
> > |-
> >
> > this would allow proper sorting. To not break anything, I suggest a
> > new parameter, e.g. called "group":
> >
> > {{#ask:
> > [[Author::+]]
> >
> > |?Author
> > |sort=Author
> > |group= false
> >
> > }}
> >
> > or, more familiar to SQL, "groupby":
> >
> > {{#ask:
> > [[Author::+]]
> >
> > |?Author
> > |sort=Author
> > |groupby= Author
> >
> > }}
> >
> > This syntax could resolve the sorting problem.
>
> This does not resolve the sorting problem, since you're still left
> with the question of how to handle sorting when grouping multiple
> values into a single entry. As well, it opens a new can of worms in
> bringing up the question of whether a given page should be reported
> once, or once for every value that it has in a given property. It's a
> fascinating question that deserves discussion; but it has consequences
> well beyond the issue of sorting.
>
> For instance, take the following query:
>
> {{#ask:
> [[Category:Book]]
>
> | sort=Author
>
> }}
>
> Note that this query does not display the author for each book found;
> indeed, it doesn't even guarantee that a given book will name the
> author(s). Note also that I did not include any sort of grouping or
> degrouping parameter. What sort of result should this query produce?
>
> As written, I believe that it should list each book on the wiki
> exactly once. The question at hand is the order in which the books
> should be presented. There are actually two issues at hand here: what
> to do with multiple values of a property on a page, and what to do
> with the absence of values for a property on a page. I've already
> stated my proposal for resolving the first issue; for the second
> issue, pages without the sorted property should probably be listed
> after pages with it, unless you explicitly state otherwise.
>
> --
>
> Now, let's look at "grouping":
>
> {{#ask:
> [[Category:Book]]
>
> | ?Author
> | duplicate=Author
>
> }}
>
> My proposal here is that "duplicate" causes the page to show up in the
> results once if it has zero or one Author, and once per Author if it
> has more than one, treating each entry as if it only had one Author.
> Note that I'm not sorting by Author in this query; you don't have to
> sort by a property in order to duplicate on it. Conversely, while I
> _am_ having it list the Author for each result, you don't have to do
> that, either. This is why I say that the subject should be addressed
> separately: it's largely orthogonal to the sorting problem, with the
> sole exception that for the case where you're willing to duplicate on
> the property that you're sorting on, the multiple values issue goes
> away.
--
Markus Krötzsch
Semantic MediaWiki http://semantic-mediawiki.org
http://korrekt.org markus@...

Thread view

If a page gives multiple values for a property, which one wins when you
sort on that property?
In my local tests on SMW 1.1.2, I can't see any pattern to it. On
http://sandbox.semantic-mediawiki.org/wiki/Test_sorting (version
1.2f-SVN) it seems to sort using the "first" value on the page for the
property. That page sorts by Author and many papers put each author
annotation alphabetically. But this paper appears in the 'B's despite
having some 'A' authors:
Using and Combining RDF Vocabularies for Expert Finding
Boanerges Aleman-Meza
Lyndon JB Nixon
Axel Polleres <--- !!
John G. Breslin
Harold Boley
Anna V. Zhdanova <--- !!
Malgorzata Mochol
Uldis Bojars
If I sort descending, again SMW seems to sort on the first value it
finds for the property.
==> Is this a bug? You could define a sort order, e.g. if ascending
sort based on the lowest value of the property's values on the page.
A while ago Markus told me the order of property values for a page
retrieved from the database was indeterminate, so values might not be
returned in page order, so there's no "first" property value on a page.
==> Should
http://semantic-mediawiki.org/wiki/Help:Semantic_search#Sorting_results
just say "If pages have multiple values for the property, their sort
order is undefined." ? Or I could just not say anything ;-)
Thanks for any insight!
--
=S Page

Since the order in which the properties appear on a given page is
arbitrary, I would not want to use it to help resolve page sorting,
even if it is possible to do so. IMHO, the best approach is to sort
the properties on the page using the same approach that you intend for
sorting the pages (e.g., if sorting pages in ascending order, sort
multiple properties on a page in ascending order), then sort the pages
based on the first property in this order, with subsequent properties
being used to break ties.
--
Jonathan "Dataweaver" Lang
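The approach proposed in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sort_pages`, the page names, and the in-memory dictionary are hypothetical, every page is assumed to have at least one value, and SMW itself does not sort this way.

```python
def sort_pages(pages, descending=False):
    """Sort pages on a multi-valued property.

    pages maps a page name to its list of property values. The values of
    each page are first sorted in the requested direction; pages are then
    compared on the first value, with later values breaking ties
    (Python compares lists element by element).
    """
    def key(page):
        return sorted(pages[page], reverse=descending)

    return sorted(pages, key=key, reverse=descending)


papers = {
    "Paper 1": ["Polleres", "Aleman-Meza"],
    "Paper 2": ["Breslin"],
    "Paper 3": ["Aleman-Meza", "Bojars"],
}

print(sort_pages(papers))                   # ascending
print(sort_pages(papers, descending=True))  # descending
```

In ascending order Paper 3 precedes Paper 1 because both start with "Aleman-Meza" and "Bojars" breaks the tie before "Polleres"; the order no longer depends on where each annotation sits on the page.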

Hello,
I think the problem with sorting this is that the lines are not
atomic. Instead of having a table of the form
|-
| Property1 || Property2.1, Property2.2, Property2.3, Property2.4
|-
the output should be rather
|-
| Property1 || Property2.1
|-
| Property1 || Property2.2
|-
| Property1 || Property2.3
|-
| Property1 || Property2.4
|-
this would allow proper sorting. To not break anything, I suggest a
new parameter, e.g. called "group":
{{#ask:
[[Author::+]]
|?Author
|sort=Author
|group= false
}}
or, more familiar to SQL, "groupby":
{{#ask:
[[Author::+]]
|?Author
|sort=Author
|groupby= Author
}}
This syntax could resolve the sorting problem.
ys, MovGP0
2008/7/1 Jon Lang <dataweaver@...>:
> Since the order in which the properties appear on a given page is
> arbitrary, I would not want to use it to help resolve page sorting,
> even if it is possible to do so. IMHO, the best approach is to sort
> the properties on the page using the same approach that you intend for
> sorting the pages (e.g., if sorting pages in ascending order, sort
> multiple properties on a page in ascending order), then sort the pages
> based on the first property in this order, with subsequent properties
> being used to break ties.
>
> --
> Jonathan "Dataweaver" Lang
>
--
------
You can download my public PGP-Key here:
http://members.aon.at/custos/public_pgp_key.asc
KeyID: FE7CADBE
Fingerprint: 9B8F 259C 4172 221C B2F8 BE3D C36D CBC2 FE7C ADBE

Mov GP 0 wrote:
> Hello,
> I think the problem with sorting this is that the lines are not
> atomic. Instead of having a table of the form
>
> |-
> | Property1 || Property2.1, Property2.2, Property2.3, Property2.4
> |-
>
> the output should be rather
>
> |-
> | Property1 || Property2.1
> |-
> | Property1 || Property2.2
> |-
> | Property1 || Property2.3
> |-
> | Property1 || Property2.4
> |-
>
> this would allow proper sorting. To not break anything, I suggest a
> new parameter, e.g. called "group":
>
> {{#ask:
> [[Author::+]]
> |?Author
> |sort=Author
> |group= false
> }}
>
> or, more familiar to SQL, "groupby":
>
> {{#ask:
> [[Author::+]]
> |?Author
> |sort=Author
> |groupby= Author
> }}
>
> This syntax could resolve the sorting problem.
This does not resolve the sorting problem, since you're still left
with the question of how to handle sorting when grouping multiple
values into a single entry. As well, it opens a new can of worms in
bringing up the question of whether a given page should be reported
once, or once for every value that it has in a given property. It's a
fascinating question that deserves discussion; but it has consequences
well beyond the issue of sorting.
For instance, take the following query:
{{#ask:
[[Category:Book]]
| sort=Author
}}
Note that this query does not display the author for each book found;
indeed, it doesn't even guarantee that a given book will name the
author(s). Note also that I did not include any sort of grouping or
degrouping parameter. What sort of result should this query produce?
As written, I believe that it should list each book on the wiki
exactly once. The question at hand is the order in which the books
should be presented. There are actually two issues at hand here: what
to do with multiple values of a property on a page, and what to do
with the absence of values for a property on a page. I've already
stated my proposal for resolving the first issue; for the second
issue, pages without the sorted property should probably be listed
after pages with it, unless you explicitly state otherwise.
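As a sketch of the second rule (pages without the sorted property come last), assuming the per-page value sorting proposed earlier in the thread for the first issue: the function name `sort_missing_last` and the book data below are hypothetical, and this is not SMW's behaviour.

```python
def sort_missing_last(pages, descending=False):
    """Sort pages on a property; pages with no value are listed last.

    pages maps a page name to its (possibly empty) list of values.
    """
    with_value = [page for page in pages if pages[page]]
    without_value = [page for page in pages if not pages[page]]
    with_value.sort(
        key=lambda page: sorted(pages[page], reverse=descending),
        reverse=descending,
    )
    # Pages lacking the property keep their original relative order.
    return with_value + without_value


books = {
    "Book A": [],
    "Book B": ["Smith"],
    "Book C": ["Jones", "Adams"],
}

print(sort_missing_last(books))
print(sort_missing_last(books, descending=True))
```

Each book appears exactly once in both directions, and "Book A" stays at the end regardless of the sort direction.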
--
Now, let's look at "grouping":
{{#ask:
[[Category:Book]]
| ?Author
| duplicate=Author
}}
My proposal here is that "duplicate" causes the page to show up in the
results once if it has zero or one Author, and once per Author if it
has more than one, treating each entry as if it only had one Author.
Note that I'm not sorting by Author in this query; you don't have to
sort by a property in order to duplicate on it. Conversely, while I
_am_ having it list the Author for each result, you don't have to do
that, either. This is why I say that the subject should be addressed
separately: it's largely orthogonal to the sorting problem, with the
sole exception that for the case where you're willing to duplicate on
the property that you're sorting on, the multiple values issue goes
away.
--
Jonathan "Dataweaver" Lang
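The proposed "duplicate" semantics can be modelled roughly as follows. Note that `duplicate` is only a suggested parameter in this thread, not an existing #ask option; the function name `duplicate_on` and the data are likewise hypothetical.

```python
def duplicate_on(pages, prop):
    """Expand pages into result rows, duplicating on one property.

    pages maps a page name to a dict of property name -> list of values.
    A page with zero or one value yields one row; a page with several
    values yields one row per value, each carrying a single value.
    """
    rows = []
    for page, props in pages.items():
        values = props.get(prop, [])
        if len(values) <= 1:
            rows.append((page, values[0] if values else None))
        else:
            rows.extend((page, value) for value in values)
    return rows


books = {
    "Book A": {"Author": []},
    "Book B": {"Author": ["Smith"]},
    "Book C": {"Author": ["Jones", "Adams"]},
}

for row in duplicate_on(books, "Author"):
    print(row)
```

Because every resulting row has at most one Author, sorting those rows on Author is unambiguous, which is the one place where duplication and sorting interact.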

2008/7/28 Markus Krötzsch <markus@...>:
> After returning from Wikimania in Alexandria, I can contribute my five
> piastres to close this issue:
>
> As was rightly remarked, SMW does not define the sorting behaviour if a
> property has many values (or no value). This is documented. If you want to
> define one special value (the first or whatever) for this purpose, then you
> just need to make a property for that task and give it only one value per
> page. Using the new sortkeys can also help in some cases.
>
> The JavaScript live sorting of tables acts on the text as displayed, hence
> depends on the order of displayed values. One could make that alphabetical if
> considered useful. In general, we would like to have some more parameters for
> printouts (e.g. to set a limit on how many values of a multi-valued property
> should be given). We currently lack some smart syntax for doing this.
>
> In the early days of SMW, we also had one property value per column and
> line (similar to what Mov GP 0 suggested). But this has funny effects if
> there are many such columns, since you get all combinations of values -- just
> as a proper DB is supposed to do it. This was not very useful and hence has
> been dropped.
>
> Another reason to keep sorting options low is performance. If you multiply
> columns for each value given in certain columns, then you get much larger
> result sets to handle (this was one reason early SMWs were slower on
> querying), and result caching as scheduled for next release would probably be
> less effective or more difficult.
>
> So, to make a long story short, we do not intend to extend the sorting options
> anytime soon. I also feel that multi-property sorting is already quite
> complex; think of all those poor, not so technically minded users who must
> grok all that!
Sorry to dig up this old thread, but I was searching back over the list to
see if the behaviour that I ran into was documented.
I have several pages with several properties, some properties with multiple
values. When I #ask for pages (default, tabular output), I get one row of
data per page, with multiple values per-page grouped into separate lines of
one cell. However, when I sort on a property that can have more than one
value, I see some pages turning up multiple times in the result. The page
occurs once in the table for each unique instance of the property, in the
right sort order for that property.
Because that's a bit cryptic, here is an example:
Sorting on ID (or without sorting):
+----+-----------+-----------+
| ID | Property1 | Property2 |
+----+-----------+-----------+
| A  | P         | X         |
+----+-----------+-----------+
| B  | Q         | W         |
|    |           | Y         |
+----+-----------+-----------+
| C  | R         | Z         |
+----+-----------+-----------+
...
Sorting on Property2:
+----+-----------+-----------+
| ID | Property1 | Property2 |
+----+-----------+-----------+
| B  | Q         | W         |
|    |           | Y         |
+----+-----------+-----------+
| A  | P         | X         |
+----+-----------+-----------+
| B  | Q         | W         |
|    |           | Y         |
+----+-----------+-----------+
| C  | R         | Z         |
+----+-----------+-----------+
...
Is this now the agreed correct behaviour? It seems reasonable, but the above
discussion was never resolved, so I thought I'd ask.
At first I found this behaviour confusing, but actually it fits what I need
quite well. The only slightly annoying thing is that all values of the
property being sorted on are shown in both rows (in an arbitrary order).
If you go the whole hog and duplicate the row, I'd rather see something like
this:
Sorting on Property2:
+----+-----------+-----------+
| ID | Property1 | Property2 |
+----+-----------+-----------+
| B  | Q         | W         |
+----+-----------+-----------+
| A  | P         | X         |
+----+-----------+-----------+
| B  | Q         | Y         |
+----+-----------+-----------+
| C  | R         | Z         |
+----+-----------+-----------+
...
(It's just a bit neater.)
Thanks,
Dan.
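The two table variants in this message can be reproduced with a small Python model. It is only a sketch of the observed and the preferred output built from the example data above; the function name `rows_sorted_on` and its `show_all_values` flag are hypothetical and do not correspond to anything in SMW.

```python
pages = {
    "A": {"Property1": ["P"], "Property2": ["X"]},
    "B": {"Property1": ["Q"], "Property2": ["W", "Y"]},
    "C": {"Property1": ["R"], "Property2": ["Z"]},
}


def rows_sorted_on(pages, prop, columns, show_all_values):
    """Emit one row per (value, page) pair of prop, ordered by value.

    With show_all_values=True every duplicated row repeats the full list
    of values (the observed behaviour); with False each row shows only
    the value that put it at that position (the preferred output).
    """
    pairs = sorted(
        (value, page) for page, props in pages.items() for value in props[prop]
    )
    rows = []
    for value, page in pairs:
        shown = dict(pages[page])
        if not show_all_values:
            shown[prop] = [value]
        rows.append((page, *[shown[column] for column in columns]))
    return rows


columns = ["Property1", "Property2"]
observed = rows_sorted_on(pages, "Property2", columns, show_all_values=True)
preferred = rows_sorted_on(pages, "Property2", columns, show_all_values=False)

for row in preferred:
    print(row)
```

Both variants list page B twice, at the positions of W and Y; they differ only in whether the Property2 cell repeats both values.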