Essentially the team has given up on the hope of using Wikidata hierarchies to suggest generalised "depicts" values to store for images on Commons, to match against terms in incoming search requests.

i.e. if an image is of a German Shepherd dog, and identified as such, the team has given up on trying to infer in general from Wikidata that 'dog' is also a search term that such an image should score positively with.

Apparently the Wikidata hierarchies were simply too complicated, too unpredictable, and too arbitrary and inconsistent in their design across different subject areas to be readily assimilated (before one even starts on the density of bugs and glitches that then undermine them).

Instead, if that image ought to be considered in a search for 'dog', it looks as though an explicit 'depicts:dog' statement may need to be specifically present, in addition to 'depicts:German Shepherd'.

Some of the background behind this assessment can be read in https://phabricator.wikimedia.org/T199119, in particular the first substantive comment on that ticket, by Cparle on 10 July, giving his quick initial read of some of the issues using Wikidata would face.

SDC was considered a flagship end-application for Wikidata. If the data in Wikidata is not usable enough to supply the dogfood that project was expected to rely on, that should be a serious wake-up call, a red flag we should not ignore.

If the way data is organised across different subjects is currently too inconsistent and confusing to be usable by our own SDC project, are there actions we can take to address that? Are there design principles to be chosen that then need to be applied consistently? Is this something the community can do, or is some more active direction going to need to be applied?

Wikidata's 'ontology' has grown haphazardly, with little oversight, like an untended bank of weeds. Is some more active gardening now required?

Post by James Heald
Apparently the Wikidata hierarchies were simply too complicated, too unpredictable, and too arbitrary and inconsistent in their design across different subject areas to be readily assimilated (before one even starts on the density of bugs and glitches that then undermine them).

The main problem is that there is no standard way (or even a defined small number of ways) to get the hierarchy that is relevant for "depicts" from current Wikidata data. It may even be that for a specific type or class the hierarchy is well defined, but the sheer number of different ways it is done in different areas is overwhelming and ill-suited for automatic processing. Of course, things like "is 'cat' a common name of an animal or a taxon, and which one of these will be used in depicts" add complexity too.

One way of solving it is to create a special hierarchy for "depicts" purposes that would serve this particular use case. Another way is to amend existing hierarchies and meta-hierarchies so that there would be an algorithmic way of navigating them in the common case. This is something it would be nice to hear about from people who are experienced in ontology creation and maintenance.
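For concreteness, the second option (an algorithmic way of navigating existing hierarchies) could look something like the following sketch, which walks a configurable set of "generalisation" properties breadth-first. The toy statements and the choice of P279 are my own illustrative assumptions, not a description of anything Wikidata or the search team actually does:

```python
from collections import deque

# Toy edge list standing in for Wikidata statements (item, property, value).
# The items and the choice of P279 ("subclass of") as the generalisation
# property are assumptions for illustration only.
EDGES = [
    ("German Shepherd", "P279", "dog"),
    ("dog", "P279", "domestic animal"),
    ("domestic animal", "P279", "animal"),
]

def broader_concepts(item, edges, properties=("P279",), max_depth=3):
    """Collect everything reachable from `item` via the given properties,
    up to `max_depth` hops, breadth-first."""
    seen, queue = set(), deque([(item, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for s, p, o in edges:
            if s == current and p in properties and o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return seen

# With the toy data this yields 'dog', 'domestic animal' and 'animal',
# i.e. the extra "depicts" terms an image of a German Shepherd could match.
print(broader_concepts("German Shepherd", EDGES))
```

The hard part, as the post says, is not this traversal but knowing which properties belong in `properties` for each subject area; that list is exactly what varies between domains.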

Post by James Heald
to be chosen that then need to be applied consistently? Is this something the community can do, or is some more active direction going to need to be applied?

It looks like a lot of that phabricator issue was around Taxons? For the Poodle to show a class of Mammal...

Seems like many of these could be answered if someone responded to https://www.wikidata.org/wiki/User:Danyaljj on their last question about whether an "OR" could be used with linkType with gas:service ... where no one gave an answer to their final question comment here: https://www.wikidata.org/wiki/Wikidata:Request_a_query/Archive/2017/01#Timeout_when_finding_distance_between_two_entities

I tried myself to answer that question and find either Parent Taxon OR Subclass of a Poodle, but couldn't seem to pull it off using gas:service and 1 hour of trial and error in many forms, even duplicating the program twice ...
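For what it's worth, in plain SPARQL (outside gas:service) an "OR" over link types is normally written as a property path alternative, e.g. `?item (wdt:P279|wdt:P171)* ?ancestor`; as far as I can tell, gas:linkType accepts only a single predicate, which is why duplicating the program doesn't combine the two traversals. Conceptually, what was being attempted is just breadth-first reachability over the union of the two edge types, as in this sketch (the chain from poodle up to Mammalia is invented toy data, not real Wikidata statements):

```python
from collections import deque

# Toy statements: poodle reaches Mammalia only if we follow BOTH
# "subclass of" (P279) and "parent taxon" (P171) edges.  The specific
# chain is made up for illustration.
EDGES = {
    ("poodle", "P279"): ["dog"],
    ("dog", "P171"): ["Canis"],
    ("Canis", "P171"): ["Canidae"],
    ("Canidae", "P171"): ["Mammalia"],
}

def reachable(start, edges, properties, max_depth=10):
    """Breadth-first reachability over the union of several properties,
    i.e. the 'P279 OR P171' traversal the gas:service attempt was after."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for prop in properties:
            for target in edges.get((node, prop), []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return seen

print("Mammalia" in reachable("poodle", EDGES, ["P279", "P171"]))  # True
print("Mammalia" in reachable("poodle", EDGES, ["P279"]))          # False
```

The second call shows the failure mode from the thread: following only one of the two properties never gets from the breed to the mammal class.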

Post by James Heald
Is this something the community can do, or is some more active direction going to need to be applied?

I think this is very much something that the community can do.
-- Stas Malyshev

Wikidata's ontology is a mess, and I do not see how it could be otherwise. While the creation of new properties is controlled, any fool can decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer a human or is part of a family. Maybe I'm a fool too? I wanted to remove the claim that a ship <https://www.wikidata.org/wiki/Q11446> is an instance of "ship type" because it produces weird circular inferences in my application; but maybe that makes sense to someone else.

There will never be a universal ontology on which everyone agrees. I wonder (sorry to think aloud) if Wikidata should not rather facilitate the use of external classifications. Many external ids are knowledge organization systems (ontologies, thesauri, classifications ...). I dream of a simple query that could search, in Wikidata, for "all elements of the same class as 'poodle' according to the classification of ImageNet <http://imagenet.stanford.edu/synset?wnid=n02113335>".

Post by Thad Guidry
[...]

http://tinyurl.com/yb7wfpwh

#defaultView:Graph
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
SELECT ?item ?itemLabel
WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                gas:in wd:Q38904 ;
                gas:traversalDirection "Forward" ;
                gas:out ?item ;
                gas:out1 ?depth ;
                gas:maxIterations 10 ;
                gas:linkType wdt:P279 .
  }
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                gas:in wd:Q38904 ;
                gas:traversalDirection "Forward" ;
                gas:out ?item ;
                gas:out1 ?depth ;
                gas:maxIterations 10 ;
                gas:linkType wdt:P171 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}


Wikidata has the ability of crowdsourcing... unfortunately, it is not effectively utilized.

It's because Wikidata does not yet provide a voting feature on statements... where, as the vote gets higher, more resistance to change the statement is required. But that breaks the notion of a "wiki" for some folks. And there we circle back to Gerard's age-old question of... should Wikidata really be considered a wiki at all for the benefit of society? Or should it apply voting/resistance to keep it tidy, factual and less messy?

We have the technology to implement voting/resistance on statements. I personally would utilize that feature and many others probably would as well. Crowdsourcing the low-voted facts back to applications like OpenRefine, or the recently sent out survey vote mechanism for spam analysis on the low-voted statements, could highlight where things are untidy and implement vote casting to clean them up.
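Mechanically, the resistance idea is simple to sketch. The threshold formula, class name, and numbers below are all invented purely to illustrate how votes could translate into friction against edits; nothing like this exists in Wikibase:

```python
# A minimal sketch of "resistance": the more votes a statement has,
# the more confirmations an edit needs before it goes through.
class Statement:
    def __init__(self, subject, prop, value):
        self.subject, self.prop, self.value = subject, prop, value
        self.votes = 0
        self.pending_confirmations = 0

    def required_confirmations(self):
        # One extra confirmation per 10 supporting votes, capped at 5.
        # (Arbitrary formula, chosen only for the example.)
        return min(1 + self.votes // 10, 5)

    def try_change(self, new_value):
        """Apply the edit only once enough confirmations accrue."""
        self.pending_confirmations += 1
        if self.pending_confirmations >= self.required_confirmations():
            self.value = new_value
            self.votes = 0
            self.pending_confirmations = 0
            return True
        return False

s = Statement("Q467", "P279", "human")
s.votes = 25   # well-supported statement: needs 3 confirmations to change
print(s.try_change("family"))  # False
print(s.try_change("family"))  # False
print(s.try_change("family"))  # True
```

An unvoted statement still changes on the first edit, so the wiki spirit survives for uncontested facts; only well-supported claims acquire inertia.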

"...the burden of proof has to be placed on authority, and it should be dismantled if that burden cannot be met..."

-Thad
+ThadGuidry <https://plus.google.com/+ThadGuidry>


I understand that an open wiki has its advantages and disadvantages (I sometimes prefer a system like StackOverflow, where you need a certain reputation to do some things). I am afraid that a voting system simply favors the opinions shared by the majority of Wikidata editors, namely a Western worldview. And even within this subgroup opinions may legitimately differ.

But there may be ways to avoid messing up the ontology while respecting the wiki spirit. For example, a warning pop-up every time you edit an ontological property (P31, P279, P361...). Something like: "OK, you added the statement 'a poodle is an instance of toy'. Do you agree with the fact that a poodle is now a goods, a work, an artificial physical object?"
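Such a warning could be driven by nothing more than the transitive closure of "subclass of" over the class being added. The toy class graph below mirrors the poodle/toy example and is an assumption for illustration only, not the real Wikidata graph:

```python
# Sketch of the warning idea: before saving an ontological edit,
# show the editor which classes the item would newly inherit.
SUBCLASS_OF = {
    "toy": ["goods", "artificial physical object"],
    "goods": ["work"],
    "artificial physical object": [],
    "work": [],
}

def transitive_superclasses(cls, graph):
    """All classes reachable upwards from `cls` via subclass-of edges."""
    out, stack = set(), [cls]
    while stack:
        for parent in graph.get(stack.pop(), []):
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out

def edit_warning(item, new_class, graph):
    implied = sorted(transitive_superclasses(new_class, graph) | {new_class})
    return (f'OK, you added "{item} is an instance of {new_class}". '
            f"Do you agree that {item} is now: {', '.join(implied)}?")

print(edit_warning("poodle", "toy", SUBCLASS_OF))
```

The pop-up text writes itself from the closure, so the feature needs no curated rules, only the existing P279 statements.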

But that would only work for manual edits...


Hoi,
There is also the age-old conundrum where some want to enforce their rules for the good of all because (argument of the day follows).

First of all, Wikidata is very much a child of Wikipedia. It has its own structures, and people have endeavoured to build those same structures in Wikidata, never mind that it is a very different medium and never mind that there are 280+ Wikipedias that might consider things to be different. The start of Wikidata was also an auspicious occasion where it was thought to be OK to adopt an external German authority. That proved to be a disaster and there are still residues of this awful decision. It did not take long to show the shortcomings of this schedule, and it was replaced by something more sensible.

However, we got something really Wiki and it was all too wild. It did not take long for me to ask for someone to explain the current structures, and nobody volunteered. So I did what I do best: I largely ignored the results of the classes and subclasses. It does not work for me. It works against me, so my current strategy is to ignore this nonsense and concentrate on including data. The reason is simple; once data is included, it is easy to slice it and dice it, and structure it as we see fit at a later date.

So when our priority becomes to make our data reusable, more open, we should agree on it. So far we have not, because we choose to fight each other. Some have ideas; some have invested too much in what we have at this time. When we are to make our data reusable, we should agree on what exactly it is we aim to achieve. Is it to support Commons, or is it to support some external standard that is academically sound? I would always favour what is practical and easily measured.

I would support Commons first. It has the benefit that it will bring our communities together in a clear objective. It has the benefit that changes in the operations of Wikidata support the whole of the Wikimedia universe, and consequently financial, technical and operational needs and investments are easily understood. It also means that all the bureaucracy that has materialised will show to be in the way when it is.

So my question is not if we are a wiki; my question is: are we a wiki enough, and willing to change our way for our own good?
Thanks,
GerardM


Wiki content grows in a messy way, and it stays messy until the messiness causes problems. Once it causes problems, people are motivated to clean it up.

I propose to implement hierarchical search based on very simple, predictable rules, e.g. by having a configurable list of transitive relationships that get evaluated to a certain depth. I'd go for subclasses, geographical inclusion, and subspecies at first.
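As a sketch of how simple those rules could be: each configured property is expanded downwards to its fixed depth, and a query term matches anything inside that expansion. The property choices, depths, and toy statements below are my own assumptions, not the actual proposal's parameters:

```python
# Configurable list of transitive properties, each with a fixed depth.
# P279 = subclass of, P131 = located in the administrative territorial
# entity; the depths are arbitrary example values.
CONFIG = {
    "P279": 5,
    "P131": 3,
}

# (narrower, property, broader) toy statements standing in for Wikidata.
STATEMENTS = [
    ("piccolo clarinet", "P279", "clarinet"),
    ("clarinet", "P279", "woodwind instrument"),
    ("Dresden", "P131", "Saxony"),
]

def expand(term, statements, config):
    """Everything a search for `term` should match: the term itself plus
    anything that reaches it via a configured property within its depth."""
    matches = {term}
    for prop, max_depth in config.items():
        frontier = {term}
        for _ in range(max_depth):
            frontier = {n for (n, p, b) in statements
                        if p == prop and b in frontier}
            if not frontier:
                break
            matches |= frontier
    return matches

# With the toy data: searching "clarinet" also matches "piccolo clarinet".
print(expand("clarinet", STATEMENTS, CONFIG))
```

Because the rule is just "these properties, to these depths", any other consumer can reproduce exactly the same matching behaviour, which is the point of keeping it simple.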

Doing this will NOT produce good results. You would have to implement a lot of special cases and heuristics to work around dirty data. I say: let it produce bad results, tell people why the results are bad, and what they can do about it!

The Wikimedia community is AMAZING at making good use of whatever capabilities the software provides, and at adapting content to make the software produce the results they want. By providing limited but clearly defined software support for hierarchical search, we allow the community to optimize the content to work with that search. Keeping the rules simple means that other consumers can then follow the same rules, and the content will work for them as well.

-- daniel


Post by Daniel Kinzler
I say: let it produce bad results, tell people why the results are bad, and what they can do about it!

TL;DR: let's produce bad results, and let's analyse those results to find the best practical solution we can come up with.

I totally agree with Daniel here. It is definitely a red flag that we should tackle head-first, but we need data first. We need to know *where* the ontology fails, *why* it fails, and *how* we can fix it.

Now is probably the best time to talk about this, not just because we have a potential big application such as Structured Data, but also because we have focused on other not-so-easy problems, such as dealing with isolated sitelinks/projects and trying to establish relations between items, and between items and other databases.

What we need to do IMHO is to find whatever best practical solution we have at hand, in order to primarily use it on Wikimedia projects. My only fear is that such discussions may end up in a swamp because of "that one user" who doesn't want to apply that particular solution (not accusing anyone in particular, I've been that user too in some discussions). Anyway, if we start from data, we can come up with some solution.

Post by Daniel Kinzler
I say: let it produce bad results, tell people why the results are bad, and what they can do about it!
[...]
-- daniel

My view is that there is a big problem with this for industrial use of Wikidata.

I would very much like to use Wikidata more in my company. However, I view it as my duty in my company to point out problems with the use of any technology. So whenever I talk about Wikidata I also have to talk about the problems I see in the Wikidata ontology and how they will affect use of Wikidata in my company.

If Wikidata is going to have significant use in my company there needs to be at least some indication that the problems in Wikidata are being addressed. I don't see that happening at the moment.

What is the biggest problem I see in Wikidata? It is the poor organization of the Wikidata ontology. To fix the ontology, beyond doing point fixes, is going to require some commitment from the Wikidata community.


Post by Daniel Kinzler
What is the biggest problem I see in Wikidata? It is the poor organization of the Wikidata ontology. To fix the ontology, beyond doing point fixes, is going to require some commitment from the Wikidata community.

I agree. And I think the best way to achieve this is to start using the ontology as an ontology on Wikimedia projects, and thus expose the fact that the ontology is broken. This gives incentive to fix it, and examples as to what things should be possible using that ontology (namely, some level of basic inference).

And, on another note, there is also a huge misunderstanding exposed in the discussion on the search-related tracker item [1]: Cparle there speaks about "traversing the subclass hierarchy" but is actually looking at *super*classes of, e.g., "clarinet", which he mostly finds irrelevant to users who care about clarinets. But surely that's the wrong direction! You have to look for *sub*classes to find special cases of what you are looking for. Looking downwards will often lead to much saner ontologies than turning your head towards the dizzy heights of upper ontology. Yes, the few of us looking for instances of "logical consequence" will still get clarinets, but those who look merely for instances of clarinet will see instances of alto clarinet, piccolo clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to suggest meaningful "upper concepts" to Commons editors, one could simply enable the use of lower concepts in search. It does not work in all cases yet, but in many.
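The contrast between the two directions is easy to see on a toy fragment. The class chain below, including the invented upper-ontology tail, is an assumption for illustration, not the real clarinet hierarchy:

```python
# Edges read "(narrower, broader)"; the abstract upper classes mimic the
# kind of superclasses a search user does not care about.
EDGES = [
    ("alto clarinet", "clarinet"),
    ("piccolo clarinet", "clarinet"),
    ("basset horn", "clarinet"),
    ("clarinet", "woodwind instrument"),
    ("woodwind instrument", "musical instrument"),
    ("musical instrument", "artificial physical object"),
]

def closure(start, edges, downward):
    """Transitive subclasses (downward=True) or superclasses (downward=False)."""
    out, stack = set(), [start]
    while stack:
        node = stack.pop()
        for narrower, broader in edges:
            src, nxt = (broader, narrower) if downward else (narrower, broader)
            if src == node and nxt not in out:
                out.add(nxt)
                stack.append(nxt)
    return out

# Upward: abstract classes, mostly irrelevant to a "clarinet" search.
print(closure("clarinet", EDGES, downward=False))
# Downward: exactly the special cases such a search should also return.
print(closure("clarinet", EDGES, downward=True))
```

The same edge list answers both questions; only the direction of traversal decides whether the result is noise or useful search expansion.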

There are still problems (such as the biological taxonomy being modelled as a hierarchy of names rather than animal classes, placing dog far away from mammal), but it is still always much easier to come up with a sane organisation for the *sub*classes of a concrete class.

FYI, I recently gave a talk about ontological modelling in Wikidata that discussed some of the current issues: https://iccl.inf.tu-dresden.de/web/Misc3058/en (the audience there were ontology design pattern researchers).

Post by Daniel KinzlerWiki content grows in a messy way, and it stays messy until the messiness causesproblems. Once it causes problems, people are motivated to clean it up.I propose to implement hierarchical search based on very simple, predictablerules, e.g. by having a configurable list of transitive relationships that getevaluated to a certain depth. I'd go for subclasses, geographical inclusion, andsubspecies at first.Doing this will NOT produce good results. You would have to implement a lot ofspecial cases and heuristics to work around dirty data. I say: let it producebad results, tell people why the results are bad, and what they can do about it!The Wikimedia community is AMAZING at making good use of whatever capabilitiesthe software, and adapting content to make the software produce the results theywant. By providing limited but clearly defined software support for hierarchicalsearch, we allow the community to optimize the content to work with that search.Keeping the rules simple means that other consumers can then follow the samerules, and the content will work for them as well.-- daniel

Hoi,There is also the age old conundrum where some want to enforce their rules forthe good all all because (argument of the day follows).First of all, Wikidata is very much a child of Wikipedia. It has its ownstructures and people have endeavoured to build those same structures inWikidata never mind that it is a very different medium and never mind that thereare 280+ Wikipedias that might consider things to be different.Â The start ofWikidata was also an auspicious occasion where it was thought to be OK to adoptan external German authority. That proved to be a disaster and there are stillresidues of this awful decision. It took not long to show the short comings ofthis schedule and it was replaced by something more sensible.However, we got something really Wiki and it was all too wild. It took not longfor me to ask for someone to explain the current structures and nobodyvolunteered. So I did what I do best, I largely ignored the results of theclasses and subclasses. It does not work for me. It works against me so mecurrent strategy is to ignore this nonsense and concentrate on including data.The reason is simple; once data is included, it is easy to slice it and diceit.structure it as we see fit at a later date.So when our priority becomes to make our data reusable, more open we shouldagree on it. So far we have not because we choose to fight each other. Some haveideas, some have invested too much in what we have at this time. When we are tomake our data reusable, we should agree on what it is exactly we aim to achieve.Is it to support Commons, it is to support some external standard that isacademically sound. I would always favour what is practical and easily measured.I would support Commons first. It has the benefit that it will bring ourcommunities together in a clear objective. 
It has the benefit that changes inthe operations of Wikidata support the whole of the Wikimedia universe andconsequentially financial, technical and operational needs and investments areeasily understood. It also means that all the bureaucracy that has materialisedwill show to be in the way when it is.So my question is not if we are a Wiki, my question is are we a Wiki enough andwilling to change our way for our own good.Thanks,Â Â Â GerardMEttore,Wikidata has the ability of crowdsourcing...unfortunately, it is noteffectively utilized.Its because Wikidata does not yet provide a voting feature onstatements...where as the vote gets higher...more resistance to change thestatement is required.But that breaks the notion of a "wiki" for some folks.And there we circle back to Gerard's age old question of ... should Wikidatareally be considered a wiki at all for the benefit of society ?Â or shouldit apply voting/resistance to keep it tidy, factual and less messy.We have the technology to implement voting/resistance on statements.Â Ipersonally would utilize that feature and many others probably would aswell.Â Crowdsourcing the low voted facts back to applications likeOpenRefine, or the recently sent out Survey vote mechanism for spam analysison the low voted statements could highlight where things are untidy andimplement vote casting to clean them up."...the burden of proof has to be placed on authority, and it should bedismantled if that burden cannot be met..."-Thad+ThadGuidry <https://plus.google.com/+ThadGuidry>Hi,The Wikidata's ontology is a mess, and I do not see how it could beotherwise. While the creation of new properties is controlled, any foolcan decide that a woman <https://www.wikidata.org/wiki/Q467>is no longera human or is part of family. Maybe I'm a fool too? 
I wanted to remove the claim that a ship <https://www.wikidata.org/wiki/Q11446> is an instance of "ship type" because it produces weird circular inferences in my application; but maybe that makes sense to someone else.

There will never be a universal ontology on which everyone agrees. I wonder (sorry to think aloud) if Wikidata should not rather facilitate the use of external classifications. Many external ids are knowledge organization systems (ontologies, thesauri, classifications...). I dream of a simple query that could search, in Wikidata, "all elements of the same class as 'poodle' according to the classification of imagenet <http://imagenet.stanford.edu/synset?wnid=n02113335>".

_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata

Post by Markus Kroetzsch
And, on another note, there is also a huge misunderstanding exposed in the discussion on the search-related tracker item [1]: Cparle there speaks about "traversing the subclass hierarchy" but is actually looking at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant to users who care about clarinets. But surely that's the wrong direction! You have to look for *sub*classes to find special cases of what you are looking for. Looking downwards will often lead to much saner ontologies than turning your head towards the dizzy heights of upper ontology. Yes, the few of us looking for instances of "logical consequence" will still get clarinets, but those who look for instances of clarinet merely will see instances of alto clarinet, piccolo clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to suggest to Commons editors meaningful "upper concepts", one could simply enable the use of lower concepts in search. It does not work in all cases yet, but it does in many.

Not really.

Cparle wants to make sure that people searching for "clarinet" also get shown images of "piccolo clarinet" etc.

To make this possible, where an image has been tagged "basset horn" he is therefore looking to add "clarinet" as an additional keyword, so that if somebody types "clarinet" into the search box, one of the images retrieved by ElasticSearch will be the basset horn one.

I imagine there are pluses and minuses both ways, whether you try to make sure one search returns more hits, or try to run multiple searches each returning fewer hits.

Your suggestion of the latter approach may not involve so much pre-investigation of the top of the tree, which may be terms that people are less likely to search for; but on the other hand, the actual searching may be less efficient than a single indexed search.

Post by Markus Kroetzsch
There are still problems (such as the biological taxonomy being modelled as a hierarchy of names rather than animal classes, placing dog far away from mammal), but it is still always much easier to come up with a sane organisation for the *sub*classes of a concrete class.

For what it's worth, there's currently quite a lively discussion on Project Chat about issues with the current modelling of biological taxonomies: https://www.wikidata.org/wiki/Wikidata:Project_chat#Taxonomy:_concept_centric_vs_name_centric

People on this thread might like to comment on some of the less fortunate elements of current practice, and the appropriateness of some of the thoughts that have been suggested.

But the taxo project has become such a walled garden, answerable only to itself, that people with comments may need to be quite forceful to get their message through, if we are to deal, e.g., with some of the difficulties Cparle describes in the ticket at https://phabricator.wikimedia.org/T199119

Post by Markus Kroetzsch
[...] So instead of trying to suggest to Commons editors meaningful "upper concepts", one could simply enable the use of lower concepts in search. It does not work in all cases yet, but it does in many.

Post by James Heald
[...] Your suggestion of the latter approach may not involve so much pre-investigation of the top of the tree [...] but on the other hand, the actual searching may be less efficient than a single indexed search.

True, but with the Wikidata Query Service we already have infrastructure that completes millions of search requests of this kind (involving path queries), so that seems doable for Commons as well. WDQS already has Wikimedia API bindings that allow it to use Lucene-based results in addition, if needed (though this would only make sense if the search should use some content that for some reason cannot be imported into a query service as graph data, mostly free-text search over longer texts).

I think the approach of completing tags towards the upper classes is not a good idea in general, since it creates extra work for editors that requires a million times the resources needed in the other approach: if the subclass hierarchy is wrong, you only need to fix it once to improve search for all existing Commons content; if you rely on manual extra tags, you'd have to add them to every file on Commons and keep them up to date with changes in the concepts -- an enormous, redundant effort that will invariably lead to a very non-uniform search experience across otherwise similar media. This seems like a huge waste of editors' time even if it would work (i.e., if we lived in a world where the superclasses of a class were easy to understand and closely related to the topic an editor is working on -- which will never happen for Wikidata or Commons, since both cover such a breadth of topics that their upper ontology necessarily has to be very general even if modelled in a clean and fully correct way).


Post by James Heald
But the taxo project has become such a walled garden, answerable only to itself, that people with comments may need to be quite forceful to get their message through, if we are to deal, e.g., with some of the difficulties Cparle describes in the ticket [...]

Other admins and I are unfortunately aware of this, and it is exactly what I was referring to in my previous e-mail. I do agree with you that the situation there is frankly unbearable, and IMHO it will likely be ended also through "removals" of some users who think they should be the only ones in charge of deciding what's good and what's not. You might easily understand why this situation deteriorated like this, but I acknowledge this is no excuse for it to continue.


Regarding this tricky situation, it might be good if the taxonomy part of Wikidata avoided the use of "subclass of" altogether. Doesn't this open up a path for compromise? Wikidata could intentionally "overload" taxons to also refer to sets of organisms (in some cases). The taxonomic model would not be affected by this in any way, since it ignores "subclass of". Some (historic or debated) taxons could be ignored for this "colloquial" subclass hierarchy, while other merely colloquially defined classes of animals could be put in relation to proper species. I think such overloading is acceptable as long as there cannot be confusion between which statement refers to which facet of the concept. Then no use of either facet will be impaired by the presence of the "irrelevant" extra data.

The only alternative seems to be to build a "mirror taxonomy" that consists not of taxon names but of animal classes (and that would include "dog" somewhere in its hierarchy [1]). But then we would need a community-wide decision on which of the two (class of organisms vs. scientific name) is the subject of actual Wikipedia articles, which might be a difficult topic to discuss.

Alternatively, if the taxons are mostly considered as "names" (syntax) rather than classes of individual organisms, then it seems we are actually building a kind of scientific dictionary here that might rather belong in the lexeme space.

Whatever happens, this problem needs some solution.

Cheers,

Markus

[1] It seems that the strange position of "dog" is mostly due to the fact that two taxons are associated with it. In general, this seems an important issue (many common names do not clearly specify a taxon), but in the case of dog it seems that the two taxons are synonyms of one another, i.e., the taxon for dog simply changed names over time.

Post by James Heald
Cparle wants to make sure that people searching for "clarinet" also get shown images of "piccolo clarinet" etc. To make this possible, where an image has been tagged "basset horn" he is therefore looking to add "clarinet" as an additional keyword, so that if somebody types "clarinet" into the search box, one of the images retrieved by ElasticSearch will be the basset horn one.

Generally, if the image is tagged with "basset horn" and the user query is "clarinet", we can do one of the following:

1. Index the whole upstream hierarchy for "basset horn" (presumably we would have to cut off when it gets too deep or too abstract) and then match directly when searching.

2. Expand the hierarchy downstream from "clarinet" and then match against the search index.

3. Have some manual or automatic process that ensures that both "clarinet" and "basset horn" are indexed (not necessarily at once) and rely on it to discover the matches.

The problem with (1) is that if the hierarchy changes, we will have to do a huge number of updates, which might overwhelm the system, and most of these updates would not even be for things people search for, but we have no way to know that.

The problem with (2) is that downstream hierarchies explode very fast, and if you search for "clarinet" and there are 10000 descendants in these hierarchies, we can't search for all of them, so you may never get a chance to find the basset horn. Also, of course, querying big downstream hierarchies takes time too, which means a performance hit.
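Options (1) and (2) can be sketched over a toy "subclass of" graph. Everything below is invented for illustration (the item names and the graph itself are hypothetical); real data would come from Wikidata's P279 statements, not this hard-coded dict:

```python
# Toy "subclass of" (P279) graph: child -> list of parents.
SUBCLASS_OF = {
    "basset horn": ["clarinet"],
    "piccolo clarinet": ["clarinet"],
    "clarinet": ["woodwind instrument"],
    "woodwind instrument": ["musical instrument"],
}

def upstream(tag, max_depth=3):
    """Option 1: at indexing time, walk *up* the hierarchy and store
    every ancestor (cut off at max_depth) alongside the original tag."""
    seen, frontier = {tag}, [(tag, 0)]
    while frontier:
        item, depth = frontier.pop()
        if depth >= max_depth:
            continue
        for parent in SUBCLASS_OF.get(item, []):
            if parent not in seen:
                seen.add(parent)
                frontier.append((parent, depth + 1))
    return seen

def downstream(query):
    """Option 2: at query time, walk *down* the hierarchy and search
    for the query term plus all of its descendants."""
    children = {c for c, ps in SUBCLASS_OF.items() if query in ps}
    result = {query}
    for c in children:
        result |= downstream(c)
    return result

# Option 1: an image tagged "basset horn" is indexed under its ancestors too.
print(upstream("basset horn"))   # includes "clarinet"
# Option 2: a search for "clarinet" is expanded to all narrower terms.
print(downstream("clarinet"))    # includes "basset horn" and "piccolo clarinet"
```

The two failure modes described above show up directly: editing one P279 link forces re-running `upstream` for every indexed file below it, while `downstream` can blow up when a class has many thousands of descendants.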

Post by James Heald
Cparle wants to make sure that people searching for "clarinet" also get shown images of "piccolo clarinet" etc. [...]

The problem with (2) is that downstream hierarchies explode very fast, and if you search for "clarinet" and there are 10000 descendants in these hierarchies, we can't search for all of them, so you may never get a chance to find the basset horn. Also, of course, querying big downstream hierarchies takes time too, which means a performance hit.

Is this such a problem? It is what people now commonly do with P31/P279* queries. For example, finding 10K instances of (some subclass of) building takes 9 secs: http://tinyurl.com/y7e5j5sd (I think this is one of the more complex hierarchies; maybe you know larger downstream hierarchies one could try?) If you omit the labels, it takes 650ms. That's maybe not quite autocompletion speed yet, but it seems acceptable for a media search.

I would appreciate clarification of what is proposed with regard to exposing problematic Wikidata ontology on Wikipedia. If the idea involves inserting poor-quality information onto English Wikipedia in order to spur us to fix problems with Wikidata, then I am likely to oppose it. English Wikipedia is not an endless resource for free labor, and we have too few skilled and good-faith volunteers to handle our already enormous scope of work.

Hoi Pine,

The ontology of Wikidata has nothing to do with English Wikipedia. The notion that English Wikipedia is the only endless resource of free labour is pathetic. Its dismissive attitude prevents functional contributions that will benefit the users of Wikimedia projects.

For authors of "scholarly articles" we have an increasing amount of information that is impossible for Wikipedia to include. It does not take much to have a template that shows them (collapsed by default) and links to "Scholia" information for the paper.

For authors of books we could have a similar template. They could link to *your local library*, where you can check if a book is available for reading. Alternatively we could link to the "Open Library".

What it would do is provide a SERVICE to our readers that is easy enough to provide, that leverages the data in Wikidata and is of a high quality. The issue about the ontology has everything to do with the discovery of images in Commons. It cannot get worse than it is; it is dysfunctional. It only works for English, and I understand that is something you do not really notice.

Yes, I do recognise Wikidata is a wiki. It is a work in progress and as such the quality and quantity steadily improve, just like English Wikipedia.

Thanks,
Gerard


Post by Pine W
I would appreciate clarification of what is proposed with regard to exposing problematic Wikidata ontology on Wikipedia. If the idea involves inserting poor-quality information onto English Wikipedia in order to spur us to fix problems with Wikidata, then I am likely to oppose it. English Wikipedia is not an endless resource for free labor, and we have too few skilled and good-faith volunteers to handle our already enormous scope of work.

You are right, and thankfully this is not what is proposed. The proposal was to offer people who search for Commons media the (maybe optional) possibility to find more results by letting the search engine traverse the "more-general-than" links stored in Wikidata. People have discovered cases where some of these links are not correct (surprise! it's a wiki ;-), and the suggestion was that such glitches would be fixed with higher priority if there were an application relying on them. But even with some wrong links, the results a searcher would get would still include mostly useful hits. Also, at least half of the currently observed problems with this approach would lead to fewer results (e.g., dogs would be hard to include automatically in a search for all mammals), but in such cases the proposed extension would simply do what the baseline approach (ignoring the links) would do anyway, so service would not get any worse. Also, the manual workarounds suggested by some (adding "mammal" to all pictures of some "dog") would be compatible with this, so one could do both to improve search experience on both ends.

Post by Markus Kroetzsch
[...] possibility to find more results by letting the search engine traverse the "more-general-than" links stored in Wikidata. People have discovered cases where some of these links are not correct (surprise! it's a wiki ;-), and the suggestion was that such glitches would be fixed with higher priority if there were an application relying on them. [...]

The main problem I see here is not that some links are incorrect - that may have bad effects, but it's not the most important issue. The most important one, IMHO, is that there's no way to figure out in any scalable and scriptable way what "more-general-than" means for any particular case.

It's different for each type of object and often inconsistent within the same class (e.g. see the confusion over whether "dog" is an animal, a name of the animal, a name of the taxon, etc.). It's not that navigating the hierarchy would lead us astray - we're not even there yet to have this problem, because we don't even have a good way to navigate it.

Using instance-of/subclass-of only seems to not be that useful, because a lot of interesting things are not represented in this way - e.g. finding out that Donna Strickland (Q56855591) is a woman (Q467) is impossible using only this hierarchy. We could special-case a bunch of those, but given how diverse Wikidata is, I don't think this will ever cover any significant part of the hierarchy unless we find a non-ad-hoc method of doing this.

This also makes it particularly hard to do something like "let's start using it and fix the issues as we discover them", because the main issue here is that we don't have a way to start with anything useful beyond a tiny subset of classes that we can special-case manually. We can't launch a rocket and figure out how to build the engine later - having a working engine is a prerequisite to launching the rocket!

There are also significant technical challenges in this - indexing a dynamically changing hierarchy is very problematic, and with our approach to ontology anything can be a class, so we'd have to constantly update the hierarchy. But this is more of a technical challenge, which will come after we have some solution for the above.

Thanks for elaborating. I think we could always start with traversing only "subclass of". In spite of its limits, it does work in many areas (e.g. buildings, astronomical objects, vehicles, organisations, etc.), even if by far not in all. Where it doesn't work, one would simply not get enough results, but the alternative (not using "subclass of" at all) will just make this problem worse. Any approach to fixing the latter will also help the former.

Now, regarding issues such as dog, woman, and many other things, it seems clear that what one would need are inference rules. It should be possible to say somewhere that "if a human is female, then she is also a woman" without having to add the unwanted statement "instance of woman" everywhere. Or "if someone has profession 'programmer', then he/she/they is/are a programmer" -- at least for the purpose of media search. The case of dogs would be complicated (referring to quantifiers) but still doable.

Obvious questions arise:

* Would we prefer to maintain such rules somewhere rather than manually adding the relations they might infer? (Probably yes, since one would need far fewer rules than manual statements, which would always add redundancy and cause conflicts -- cf. the taxonomy modelling discussion -- that are not necessary when applications can select which inference rules to use without touching the underlying data.)
* How would the rules look to human editors? (We have made some first proposals for this; see the rules supported by SQID [1]; but one can come up with other options.)
* Where would such rules be managed? (Preferably on Wikidata, but the encoding in statements would be a challenge; another challenge is how to associate rules with entities -- usually they make connections between several entities.)
* How would the rules be applied on the live data, especially if there are many updates? (Doable using known algorithms and based on existing tools, but it still needs some implementation work; I think for a start one could just reduce the update speed on these "inferred tags" and still get a big improvement over the case where nothing of this type is done at all.)

So would this be a mid-term goal to overcome this issue? I would think so, also because there are enough degrees of freedom here to gradually grow this from simple (only allow rules that effectively add some more traversal hints) to powerful (have rules that can use qualifiers, as needed to get from dog to mammal). The main challenge is to find a good approach for community-editing this part without restricting it upfront to a few special cases (as happened with the constraints).

Inference rules come up as potential solutions in many similar tasks where you want users to access/query the data. Imagine someone looking for the brothers of a person (let's assume we'd built an intelligent search for such things) -- again, Wikidata has no concept of "brother" and we would not have any idea how to answer this, unless somewhere we had a rule that defines how you can find brother relationships from the data that we actually have. This happens a lot when you want users who are not familiar with how we organise data to find things, but the solution cannot be to add every possible view/inferred statement to Wikidata explicitly.
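The woman and brother examples above can be sketched as code. This is only an illustrative toy, not a real Wikidata API or the SQID rule syntax: statements are plain (subject, property, value) triples, and the two rule functions are invented for this sketch.

```python
# Toy statement store: (subject, property, value) triples, loosely
# mirroring Wikidata properties (instance of, sex or gender, sibling).
STATEMENTS = {
    ("Donna Strickland", "instance of", "human"),
    ("Donna Strickland", "sex or gender", "female"),
    ("Ada", "sibling", "Bob"),
    ("Bob", "sex or gender", "male"),
}

def rule_woman(statements):
    """If an item is a human with sex or gender 'female', infer
    'instance of woman' -- without storing the statement anywhere."""
    humans = {s for s, p, v in statements if (p, v) == ("instance of", "human")}
    females = {s for s, p, v in statements if (p, v) == ("sex or gender", "female")}
    return {(s, "instance of", "woman") for s in humans & females}

def rule_brother(statements):
    """If X has sibling Y and Y is male, infer that Y is X's brother."""
    males = {s for s, p, v in statements if (p, v) == ("sex or gender", "male")}
    return {(x, "brother", y) for x, p, y in statements
            if p == "sibling" and y in males}

# Inferred statements exist only as a view for search/query purposes;
# the underlying data is untouched.
inferred = rule_woman(STATEMENTS) | rule_brother(STATEMENTS)
print(inferred)
```

The point of the sketch is the maintenance asymmetry Markus describes: one rule replaces a manual "instance of woman" statement on every relevant item, and applications can pick which rules to apply without touching the stored data.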

Obviously, the rule approach is not something we could deploy anytime soon, but it could be something to work towards...

Cheers,

Markus

[1] Example rule with explanation of how it was applied to find a grandfather of Ada Lovelace: https://tinyurl.com/y7rgmk7o
The qualifier sets (X, Y, Z) are unused here and could be hidden entirely, but this is just a prototype.

Post by Pine W
I would appreciate clarification what is proposed with regard to exposing problematic Wikidata ontology on Wikipedia. [...]

Post by Markus Kroetzsch
You are right, and thankfully this is not what is proposed. The proposal was to offer people who search for Commons media the (maybe optional) possibility to find more results by letting the search engine traverse the "more-general-than" links stored in Wikidata. [...]

Hi Markus, I seem to be missing something. Daniel said, "And I think the best way to achieve this is to start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken. This gives incentive to fix it, and examples as to what things should be possible using that ontology (namely, some level of basic inference)." I think that I understand the basic idea behind structured data on Commons. I also think that I understand your statement above. What I'm not understanding is how Daniel's proposal to "start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken" isn't a proposal to add poor-quality information from Wikidata onto Wikipedia and, in the process, give Wikipedians more problems to fix. Can you or Daniel explain this?

Separately, someone wrote to me off-list to make the point that Wikipedians who are active in non-English Wikipedias also wouldn't appreciate having their workloads increased by having a large quantity of poor-quality information added to their edition of Wikipedia. I think that one of the person's concerns is that my statement could have been interpreted as implying something like "it's okay to insert poor-quality information on non-English Wikipedias because their standards are lower". I apologize if I gave the impression that I would approve of a non-English language edition of Wikipedia being on the receiving end of an unwelcome large addition of information that requires significant effort to clean up. Hopefully my response here will address the concerns that I heard off-list, and if not then I welcome additional feedback.

[...] data on Commons. I also think that I understand your statement above. What I'm not understanding is how Daniel's proposal to "start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken" isn't a proposal to add poor-quality information from Wikidata onto Wikipedia and, in the process, give Wikipedians more problems to fix. Can you or Daniel explain this?

While I cannot pretend to have expert knowledge and do not purport to interpret what Daniel meant, I think here we must remember that Wikipedia, while of course of huge importance, is not the only Wikimedia project, so "start using it on Wikimedia projects" does not necessarily mean "start using it on Wikipedia", let alone "start adding bad information to Wikipedia". There are other ways to use the data, including imperfect ontologies - e.g. for search, for bot guidance, for quality assurance and editor support, and many other ways. I am not prescribing a specific scenario here, just reminding that "using the ontology on Wikimedia projects" can mean a wide variety of things.

Separately, someone wrote to me off-list to make the point that Wikipedians who are active in non-English Wikipedias also wouldn't appreciate having their workloads increased by having a large quantity of poor-quality information added to their edition of Wikipedia. [...]

I am sure that would be a bad thing. But I don't think anything we are discussing here would lead to that happening.

Just to address what Markus was hinting at with inference rules: both positive and negative rules could be stored. Back in the Freebase days, we had those, and they were called "mutexes". We used them for "type incompatible" hints to users and stored those "type incompatible" mutex rules in the knowledge graph. (Freebase was a Type-based system, with Properties under each Type.)

Such as: ORGANIZATION != SPORT

You actually have all those type-incompatible mutexes in the Freebase dumps handed to you, where you could start. The biggest one was called the "Big Momma Mutex". Here is an archived email thread to give further context: https://freebase.markmail.org/thread/z5o7nlnb62n5t22o

Anyway, the point is that those rules worked well for us in Freebase, and I can see rules also working wonders in various ways in Wikidata. Maybe it's just a mutex at each class, where multiple statements could hold rules?

There is already something to handle this kind of "mutex" on Wikidata: "disjoint union of". See for example its usage on https://www.wikidata.org/wiki/Q180323 . The statements are used on the talk page by templates that use them to generate queries to find instances that violate the mutex: https://www.wikidata.org/wiki/Talk:Q180323 (for example this query <https://query.wikidata.org/#select%20%3Fitem%20where%20%7B%0A%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ180323%20%20minus%20%7B%0A%09%09%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ900457%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ578786%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ405478%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ46993066%20%0A%09%09%7D%20union%20%7B%0A%09%09%09%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2A%20wd%3AQ2253183%20%0A%09%09%7D%0A%09%7D%0A%7D>, which unsurprisingly does not find anything, because I don't expect to find a lot of vertebra instances on Wikidata :) )
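The mechanics of such a "disjoint union of" check can be sketched without SPARQL at all. The following toy sketch finds classes that end up under two supposedly disjoint parts of a parent class; all item and class names here are invented for illustration, and a real check would of course query query.wikidata.org instead:

```python
# Toy sketch of a "disjoint union of" (mutex) check, in the spirit of the
# Wikidata query linked above. All data here is made up; a real check
# would run SPARQL against the Wikidata Query Service.

# subclass-of edges: child -> list of parents
SUBCLASS_OF = {
    "cervical vertebra": ["vertebra"],
    "lumbar vertebra": ["vertebra"],
    "weird bone": ["cervical vertebra", "lumbar vertebra"],  # violates the mutex
}

# "vertebra" is declared as the disjoint union of these classes:
DISJOINT_PARTS = ["cervical vertebra", "lumbar vertebra"]

def ancestors(cls):
    """All classes reachable via subclass-of, including cls itself."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, []))
    return seen

def mutex_violations():
    """Classes that fall under two or more supposedly disjoint parts."""
    out = []
    for cls in SUBCLASS_OF:
        hit = [p for p in DISJOINT_PARTS if p in ancestors(cls)]
        if len(hit) >= 2:
            out.append((cls, hit))
    return out

print(mutex_violations())
```

The talk-page templates do essentially this, except server-side via the `wdt:P31/wdt:P279*` property path.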


As I understood Daniel, he did not talk about inserting low-quality content into any project, Wikipedia or other. What I believe he meant by "using the ontology" is to use it for improving search/discovery services that help editors to find something (i.e., technical infrastructure, not editorial content). Doing so could lead to an additional amount of mostly useful results, but it will not yet be enough to get all results that a user would intuitively expect. Maybe his wording made this sound a bit too dramatic; I think he just wanted to emphasize the point that any actual use will immediately provide motivation and guidance for Wikidata editors to improve things that are currently imperfect.

I agree with him in that I think we need to identify ways of moving gradually forward, offering the small benefits we can already provide while creating an environment that allows the community to improve things step by step. If we ask for perfection before even starting, we will get into a deadlock where we bind editor resources in redundant tagging tasks instead of empowering the community to improve the situation in a sustainable way.

Post by Pine W
I would appreciate clarification what is proposed with regard to exposing problematic Wikidata ontology on Wikipedia. If the idea involves inserting poor-quality information onto English Wikipedia in order to spur us to fix problems with Wikidata, then I am likely to oppose it. English Wikipedia is not an endless resource for free labor, and we have too few skilled and good-faith volunteers to handle our already enormous scope of work.

You are right, and thankfully this is not what is proposed. The proposal was to offer people who search for Commons media the (maybe optional) possibility of finding more results by letting the search engine traverse the "more-general-than" links stored in Wikidata. People have discovered cases where some of these links are not correct (surprise! it's a wiki ;-), and the suggestion was that such glitches would be fixed with higher priority if there were an application relying on them. But even with some wrong links, the results a searcher would get would still include mostly useful hits. Also, at least half of the currently observed problems with this approach would lead to fewer results (e.g., dogs would be hard to include automatically in a search for all mammals), but in such cases the proposed extension would simply do what the baseline approach (ignoring the links) would do anyway, so service would not get any worse. Also, the manual workarounds suggested by some (adding "mammal" to all pictures of some "dog") would be compatible with this, so one could do both to improve search experience on both ends.

Best regards,
Markus

Post by Pine W
I think that one of the person's concerns is that my statement could have been interpreted as implying something like "it's okay to insert poor-quality information on non-English Wikipedias because their standards are lower". I apologize if I gave the impression that I would approve of a non-English language edition of Wikipedia being on the receiving end of an unwelcome large addition of information that requires significant effort to clean up. Hopefully my response here will address the concerns that I heard off list, and if not then I welcome additional feedback.

Thanks,
Pine
( https://meta.wikimedia.org/wiki/User:Pine )

Hi Markus, I seem to be missing something. Daniel said, "And I think the best way to achieve this is to start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken. This gives incentive to fix it, and examples as to what things should be possible using that ontology (namely, some level of basic inference)." I think that I understand the basic idea behind structured data on Commons. I also think that I understand your statement above. What I'm not understanding is how Daniel's proposal to "start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken" isn't a proposal to add poor-quality information from Wikidata onto Wikipedia and, in the process, give Wikipedians more problems to fix. Can you or Daniel explain this?

What I meant in concrete terms was: let's start using wikidata items for tagging on commons, even though search results based on such tags will currently not yield very good results, due to the messy state of the ontology, and hope people fix the ontology to get better search results. If people use "poodle" to tag an image and it's not found when searching for "dog", this may lead to people investigating why that is, and coming up with ontology improvements to fix it.

What I DON'T mean is "let's automatically generate navigation boxes for wikipedia articles based on an imperfect ontology, and push them on everyone". I mean, using the ontology to generate navigation boxes for some kinds of articles may be a nice idea, and could indeed have the same effect: that people notice problems in the ontology, and fix them. But that would be something the local wiki communities decide to do, not something that comes from Wikidata or the Structured Data project.

The point I was trying to make is: the Wiki communities are rather good at creating structures that serve their purpose, but they do so pragmatically, along the behavior of the existing tools. So, rather than trying to work around the quirks of the ontology in software, the software should use very simple rules (such as following the subclass relation), and let people adapt the data to this behavior, if and when they find it useful to do so. This approach, over time, provides better results in my opinion.
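A minimal sketch of such a "very simple rule" (the tag set and hierarchy below are invented, not real Wikidata or Commons data): expand the search term to every transitively narrower class before matching file tags, so a search for "dog" also finds files tagged only with "poodle".

```python
# Sketch of search expansion by following subclass-of links. A search
# for "dog" should also match files tagged with narrower classes.
# All data here is invented for illustration.

SUBCLASS_OF = {
    "poodle": ["dog"],
    "german shepherd": ["dog"],
    "dog": ["mammal"],
}

FILES = {
    "poodle.jpg": {"poodle"},
    "shepherd.jpg": {"german shepherd"},
    "cat.jpg": {"cat"},
}

def narrower(term):
    """term plus everything that (transitively) has term as a superclass."""
    out = {term}
    changed = True
    while changed:
        changed = False
        for child, parents in SUBCLASS_OF.items():
            if child not in out and out & set(parents):
                out.add(child)
                changed = True
    return out

def search(term):
    """Files whose tags intersect the expanded term set."""
    wanted = narrower(term)
    return sorted(f for f, tags in FILES.items() if tags & wanted)

# Baseline exact-tag search for "dog" would find nothing here, since no
# file carries the literal tag "dog"; the expanded search finds both dogs.
print(search("dog"))
```

Note that the expansion is purely additive: a wrong or missing subclass link can only make this degrade to the baseline behavior, never return fewer results than exact-tag matching, which is the point Markus makes above.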

Also, keep in mind that I was referring to an imperfect *improvement* of search, the alternative being to only return things tagged with "dog" when searching for "dog". I was not suggesting to degrade user experience in order to incentivize editors. I'm rather suggesting the opposite: let's NOT give people a reason to tag images that show poodles with "poodle" and "dog" and "mammal" and "animal" and "pet" and...

--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation

Hi Daniel,

Thanks for the explanation. I think that I now better understand what you're proposing. This explanation of the proposal sounds reasonable to me in a way that my earlier understanding of the proposal did not.

By the way, I don't know what your normal work schedule is, but I usually don't expect staff to respond to non-urgent emails over the weekend, although I appreciate it. :) Waiting until Monday is usually fine.

If you happen to speak German and are intrigued by the Illuminati, this might be of interest to you:

https://blog.factgrid.de/archives/1151

We will use our upcoming Illuminati workshop on Nov. 16/17 to discuss how we can make better use of our Wikibase installation here at Gotha.

https://database.factgrid.de/wiki/Main_Page

The database is filled with metadata of Illuminati documents and (selected) membership information, and is supposed to help us with the complexities of our Illuminati wiki (https://projekte.uni-erfurt.de/illuminaten/Main_Page), but we do not yet have the clearest idea of what we have produced here, or of what we possibly could.

It is interesting to note that what Cparle wants are "is a" relationships based on common sense. For most people, ants are insects, not instances of taxon. A clarinet is a woodwind instrument, and woodwind instruments are musical instruments, not an instance of "first-order metaclass".

One of the best sources of "common sense" hypernymy is probably the first sentence of a Wikipedia page. Whether in English, French, or Italian, a woman is always "a female *human* being".

For "poodle", this would look like (following the links in the English version of Wikipedia):

- The poodle is a group of formal *dog breeds*

- Dog breeds are *dogs* that...

- The domestic dog (...) is a member of the genus *Canis* (canines)

- Canis is a genus of the *Canidae*

- The biological family Canidae (...) is a lineage of *carnivorans*

- Carnivora (...) is a diverse *scrotiferan* order

- Scrotifera is a clade of *placental mammals*

- Placentalia ("Placentals") is one of the three extant subdivisions of the class of animals *Mammalia*...

- Mammals are the *vertebrates* within the class Mammalia...

From my point of view, this classification looks much better than the current relationships in Wikidata's ontology.

The automatic extraction of hypernymic relationships from English texts (especially Wikipedia) has been studied for a long time and gives good results, even with simple methods based on hand-crafted rules. In the case of Wikipedia, the hypernym often has a page of its own (and therefore a link to Wikidata), which could simplify the NLP extraction and the mapping to Wikidata items.
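As a rough illustration of such a hand-crafted rule (the function and its pattern are my own sketch, and the sentences are paraphrased rather than live Wikipedia text), a naive "X is a Y" extractor might look like:

```python
import re

def first_hypernym(sentence):
    """Very naive 'X is a Y' rule: return the phrase right after the
    copula, trimmed at the first comma, period, or relative clause.
    Real extractors (e.g. Hearst-style patterns) are far more robust."""
    s = re.sub(r"\([^)]*\)", "", sentence.lower())        # drop parentheticals
    m = re.search(r"\b(?:is|are)\s+(?:(?:an?|the)\s+)?(.+)", s)
    if not m:
        return None
    tail = re.split(r"[,.]| that | which | within ", m.group(1))[0]
    return tail.strip()

print(first_hypernym("A woman is an adult female human being."))
print(first_hypernym("Mammals are the vertebrates within the class Mammalia."))
```

The extracted phrase would still need to be mapped to a Wikidata item, but as noted above, the hypernym in the first sentence is usually itself a wikilink, which makes that mapping nearly free.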

Of course, the extracted relationships will not always be "subclass of" or "instance of". But if someone proposed a new property called "Wikipedia Hypernyms" (and its inverse property "Wikipedia Hyponyms"), I would use it more willingly and with more confidence than the current system. This would also better respect the logic of Wikidata's descriptions.

I mean, if the description of Zoroastrianism (Q9601) says this is an "Ancient Iranian *religion* founded by Zoroaster", one would expect the class "religion" to appear much earlier in the hierarchy of superclasses of this item. If there were this "Wikipedia Hypernyms" property, we could mention it on the same page, since Wikipedia describes Zoroastrianism as "one of the world's oldest *religions* that remains active". And a SPARQL query looking for 'all items that have "religion" as their "Wikipedia hypernyms" property' would be much, much faster.

Note: sorry if this reflection is naive or if it has already been discussed/tested.

Cheers,

Ettore

Post by James Heald
This recent announcement by the Structured Data team: https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Commons_-_how_to_structure_coverage

https://www.wikidata.org/wiki/Q1390 insect
https://en.wikipedia.org/wiki/Insect
subclass of animal
instance of taxon

What is missing is that Q7386 is a subclass of Q1390, which is sanctioned by the "Ants are eusocial insects" phrase at the start of https://en.wikipedia.org/wiki/Ant. I added that statement and put English Wikipedia as the source. (By the way, how can I source a statement to a particular Wikipedia page?)

I see no reason that this should not be done for other groups of living organisms where subclass relationships are missing.

It seems very simple to me. Maybe too simple. Perhaps I am intimidated by the kilometers of discussions I'm reading about the taxon-centric aspect of Wikidata, when I'm not a biologist. So, is there no problem if we add that Cetacea <https://www.wikidata.org/wiki/Q160> is a subclass of aquatic mammals <https://www.wikidata.org/wiki/Q3039055>, as indicated by its Wikipedia page <https://en.wikipedia.org/wiki/Cetacea>?

Sure, but Wikidata doesn't have ants being instances of taxon. Instead, Formicidae (aka ant) is an instance of taxon, which seems right to me. Here are some extracts from Wikidata as of a few minutes ago, also showing the English Wikipedia page for the Wikidata item.

https://www.wikidata.org/wiki/Q7386 Formicidae (ant)
https://en.wikipedia.org/wiki/Ant
instance of taxon
no subclass of statement

peter


How can there be any effective counter to adding these relationships? Many Wikidata items correspond to Wikipedia pages. If true information about a Wikidata item found in the corresponding page cannot be added to the Wikidata item, then the correspondence is not correct and should be removed.

peter

PS: Of course, determining truth may be contentious in some cases, but these will be a small minority.