The primary sources tool provides a curation workflow for data donations to Wikidata, where Wikidata editors can review, edit, or reject data offered to the community. The workflow is integrated into Wikidata itself.

The first and current version of the primary sources tool (PST) stems from the donation of Freebase by Google.[1] Based on community feedback collected since its deployment as a Wikidata gadget,[2][3][4] the StrepHit team here submits a proposal for a radical uplift, which will lead to the next version of the tool.

Beta version of Primary sources tool

Please note that all the mock-ups referenced in this document are accessible at phab:M218.

The tool suggests new statements as follows:

given an Item page, the suggested statement is highlighted with a blue background;

the user can approve or reject it by clicking either on the "approve claim" or on the "reject claim" links respectively;

the page then updates: the new statement is added in the first case, and the suggestion is removed in the second.
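The approve/reject step above can be pictured as a simple state transition over the statements rendered on the page. The following sketch is purely illustrative (the function and data shapes are hypothetical, not the gadget's actual code):

```javascript
// Illustrative sketch: applying a curation decision to the statements
// shown on an Item page. A suggestion starts in the "suggested" state
// (rendered with a blue background).
function curateStatement(statements, statementId, decision) {
  return statements.flatMap(function (s) {
    if (s.id !== statementId) {
      return [s];
    }
    if (decision === 'approve') {
      // Approval turns the suggestion into a regular statement.
      return [Object.assign({}, s, { state: 'approved' })];
    }
    // Rejection removes the suggestion from the page.
    return [];
  });
}

const statements = [
  { id: 's1', state: 'approved' },
  { id: 's2', state: 'suggested' } // shown with a blue background
];

const afterApprove = curateStatement(statements, 's2', 'approve');
const afterReject = curateStatement(statements, 's2', 'reject');
```

In the real gadget each decision also triggers a call to the curation back end; only the page-update logic is sketched here.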

In the same fashion, the tool can suggest new references for an existing statement:[9]

the new reference is highlighted with a blue background;

the user can approve or reject it by clicking the approve reference or the reject reference link respectively;

the user can also see a preview tooltip that shows where the source came from by clicking on preview reference;[10][11]

if the dataset contains fine-grained provenance information, e.g., the text snippet from which the suggested statement was extracted,[12] the preview tooltip will highlight that exact piece of information;[13]

if the interaction between the front end and the back end fails, a tooltip will show an alert message.[14]
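The snippet highlighting mentioned above can be sketched as a small string transformation. This is a minimal, hypothetical example (the function name is invented, and real code would also need to escape HTML in the source text):

```javascript
// Illustrative sketch: wrap the provenance text snippet in a <mark>
// element so the preview tooltip highlights the exact piece of text
// from which the suggested statement was extracted.
function highlightSnippet(sourceText, snippet) {
  const index = sourceText.indexOf(snippet);
  if (index === -1) {
    // Provenance missing or stale: show the text unchanged.
    return sourceText;
  }
  return (
    sourceText.slice(0, index) +
    '<mark>' + snippet + '</mark>' +
    sourceText.slice(index + snippet.length)
  );
}

const html = highlightSnippet(
  'Douglas Adams was born in Cambridge in 1952.',
  'born in Cambridge'
);
// html now contains the snippet wrapped in <mark>…</mark>
```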

A similar workflow applies to a filter-based tool, located in the Tools menu of the left sidebar.

When the user clicks on the Primary Sources filter link (currently Primary Sources list), a modal window will open;[18]

the user can view a table of suggested statements, together with any references, by building filters in several ways:

Domain of interest: the user starts typing a domain he or she is interested in and gets autocompletion based on simple constraints, typically the instance of (P31) property. For example, list all the Items that are a chemical compound (Q11173);

Property: the user starts typing a property he or she is interested in and gets autocompletion based on property labels. This filter then only shows suggested statements with the given property. For instance, list all statements with date of birth (P569);

SPARQL Query: this filter is intended for power users and accepts arbitrary SPARQL queries.

Once the filters are built, the tool shows a table of statements where the user can approve or reject suggestions after previewing the reference source, as per the "User workflow" section. The approve and reject actions can be blocked until the source preview has been opened.[19]
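The first two filters can be thought of as compiling down to a SPARQL query over the suggestions store. The sketch below is hypothetical (the function and the query shape are assumptions for illustration; only the entity IDs, P31, Q11173, and P569, are real Wikidata identifiers):

```javascript
// Illustrative sketch: compile the domain-of-interest and property
// filters into a SPARQL query. wdt:P31 is "instance of"; wd:Q11173 is
// "chemical compound"; wdt:P569 is "date of birth".
function buildFilterQuery(domainItem, property) {
  const patterns = [];
  if (domainItem) {
    // Domain of interest, e.g. chemical compound (Q11173).
    patterns.push('?item wdt:P31 wd:' + domainItem + ' .');
  }
  if (property) {
    // Property filter, e.g. date of birth (P569).
    patterns.push('?item wdt:' + property + ' ?value .');
  }
  return 'SELECT ?item WHERE { ' + patterns.join(' ') + ' }';
}

const query = buildFilterQuery('Q11173', 'P569');
```

The third filter needs no such compilation step, since power users supply the SPARQL query directly.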

The tool currently accepts datasets serialized in QuickStatements (Q20084080). While this format is very compact and useful for uploading large datasets, it is entirely non-standard: the only available documentation is the QuickStatements service page itself.[20] We therefore plan to support stable formats, both for the self-sustainability of the project and for a standardized data donation workflow. QuickStatements support will nevertheless be kept.

Datasets from third-party providers should be serialized in RDF and follow the Wikidata RDF data model.[21]
We believe this is the most standard approach for two reasons:

RDF is a mature Web standard, being a W3C recommendation since 1999;[22]
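For comparison, here is the same statement (Douglas Adams's date of birth) in the current QuickStatements syntax and as a truthy triple in the Wikidata RDF model; both serializations are slightly simplified:

```
# QuickStatements (tab-separated; the /11 suffix denotes day precision):
Q42	P569	+1952-03-11T00:00:00Z/11

# Wikidata RDF, truthy triple in Turtle (wd:, wdt:, xsd: prefixes as used
# by the Wikidata query service):
wd:Q42 wdt:P569 "1952-03-11T00:00:00Z"^^xsd:dateTime .
```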

The Ingestion API is responsible for interacting with third-party data providers.
Incoming datasets are first validated against the Wikidata RDF data model.
It will then provide the following facilities for datasets:

The main self-sustainability goal is to avoid breaking the front end whenever a change is made in the Wikidata user interface.
To achieve this, the current gadget will become a MediaWiki extension for Wikibase (Q16354758).
A major refactoring of the code base is essential and will:

include unit tests: failures are expected when the Wikidata user interface changes, and will break the Wikidata build instead of breaking the tool;

clearly separate the interaction with the back end from the interaction with the users;

port the HTML templates.

The code will be split into the two typical components of a MediaWiki extension, written in PHP and JavaScript respectively.
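As a rough illustration of the packaging step, a MediaWiki extension registers itself through an extension.json manifest; the skeleton below is hypothetical (the extension name, module name, and file path are invented for this sketch):

```json
{
	"name": "PrimarySources",
	"type": "other",
	"manifest_version": 2,
	"ResourceModules": {
		"ext.primarySources.curation": {
			"scripts": [ "modules/ext.primarySources.curation.js" ]
		}
	}
}
```

The JavaScript front end would live in ResourceLoader modules like the one above, while the PHP side would hold the server-side hooks and configuration.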