Posts about our text and social media analysis work and latest news on GATE (http://gate.ac.uk) - our open source text and social media analysis platform. Also posts about the PHEME project (http://pheme.eu) and our work on automatic detection of rumours in social media. Lately also general musings about fake news, misinformation, and online propaganda.

Address the problem of LOD domain vocabulary enrichment and interlinking. Develop GATE-based tools for efficient LOD vocabulary lookup and LOD-based term disambiguation. Evaluate these, both quantitatively and with end-users and other stakeholders.

Develop and evaluate intuitive user interface methods that hide the complexity of the SPARQL query language while still allowing environmental researchers to search successfully with LOD vocabularies.

Build a case study using Envia, the new British Library information discovery tool for environmental science, and test how LOD vocabularies can enhance information discovery and management.

Collaborate with domain experts at the environmental consultancy HR Wallingford, who will provide feedback on how the semantic work undertaken here supports their work as environmental science practitioners and innovators.

Background and Motivation

Environmental science is a broad, interdisciplinary subject area that spans biology, chemistry, earth sciences, physics, and engineering. Because of this breadth, information discovery and sharing in environmental science is often a challenge. Linked Open Data (LOD) and vocabularies offer an opportunity to improve information discovery and sharing through unique, machine-readable, interlinked open vocabularies, ultimately connecting users more efficiently to useful and relevant resources.

Key vocabularies for environmental science, such as the GEMET thesaurus, are already becoming available as Linked Data, as are other resources relevant to the domain, such as GeoNames and DBpedia. One outstanding challenge is to use them to enrich unstructured content and metadata with semantics. Doing this manually is prohibitively expensive and unsustainable, since LOD vocabularies typically have millions of instances. There is therefore a strong need for semantic annotation tools that enrich metadata and content with LOD semantics automatically. EnviLOD will tackle the problem of LOD vocabulary enrichment, interlinking, and adoption in the domain of environmental science; however, the results will also be relevant to other fields. The starting point will be the DBpedia-based entity annotation and disambiguation algorithms developed by Sheffield as part of the TrendMiner project.
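To make "LOD vocabulary lookup and disambiguation" concrete, here is a deliberately simple sketch, not the TrendMiner algorithm. It looks up single-word labels in a tiny label index and, when one label maps to several LOD resources, picks the candidate whose context words overlap most with the surrounding text. The index entries, the example.org URIs and the function names are all hypothetical; a real system would handle multi-word labels and millions of entries.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    uri: str
    context: frozenset  # words typically associated with this resource


# Hypothetical label index: one surface form may map to several LOD URIs.
LABEL_INDEX = {
    "thames": [
        Candidate("http://example.org/resource/River_Thames",
                  frozenset({"river", "barrier", "flood", "london", "water"})),
        Candidate("http://example.org/resource/Thames_New_Zealand",
                  frozenset({"town", "coromandel", "zealand", "gold"})),
    ],
    "flooding": [
        Candidate("http://example.org/concept/Flood",
                  frozenset({"water", "river", "rainfall", "level"})),
    ],
}


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def annotate(text, window=5):
    """Return (token, uri) pairs for tokens found in the label index.

    Ambiguous tokens are resolved by counting how many words within
    `window` tokens on either side also occur in each candidate's
    context set; the first candidate with the highest count wins.
    """
    tokens = tokenize(text)
    annotations = []
    for i, tok in enumerate(tokens):
        candidates = LABEL_INDEX.get(tok)
        if not candidates:
            continue
        nearby = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        best = max(candidates, key=lambda c: len(c.context & nearby))
        annotations.append((tok, best.uri))
    return annotations


print(annotate("Water levels at the Thames barrier rose after heavy rainfall."))
# prints [('thames', 'http://example.org/resource/River_Thames')]
```

Context overlap is about the crudest disambiguation signal available. It is used here only because it makes the lookup-then-disambiguate split easy to see.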

The second major challenge is to develop information access facilities that use semantics to deliver a search service that is not only more powerful than its non-semantic counterparts, but also just as simple to use. At present, the most widely used method for retrieving information from Linked Data is SPARQL queries. However, formulating such queries is beyond the capabilities of most users and presents a significant barrier to widespread uptake. EnviLOD will evaluate user interface methods that hide the complexity of SPARQL while still allowing users to make successful use of semantic search.
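A common way for an interface to hide SPARQL is to let users type ordinary words and generate the query for them. The sketch below assumes a vocabulary that labels its concepts with SKOS `prefLabel`/`altLabel`, as thesauri published as Linked Data commonly do. It builds a SPARQL 1.1 query for concepts whose label contains the keyword. The function names are hypothetical, and a real UI would also need autocompletion, ranking and result display.

```python
def escape_sparql_string(value):
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def keyword_to_sparql(keyword, lang="en", limit=20):
    """Build a SPARQL SELECT query for SKOS concepts whose label contains `keyword`.

    The user only sees a keyword box; this query is generated behind it.
    """
    term = escape_sparql_string(keyword.strip().lower())
    language = escape_sparql_string(lang)
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?concept ?label WHERE {{
  ?concept skos:prefLabel|skos:altLabel ?label .
  FILTER(LANGMATCHES(LANG(?label), "{language}"))
  FILTER(CONTAINS(LCASE(STR(?label)), "{term}"))
}}
LIMIT {int(limit)}"""


print(keyword_to_sparql("Flooding"))
```

Escaping the keyword matters because it is user input spliced into a query string. Without escaping, a stray quote would break the query or change its meaning.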

In the context of environmental science, for example, a user searching for flooding in south-east Britain would be able to find a report with a chapter on water levels at the Thames Barrier, because Linked Data can connect the Thames Barrier to its location in south-east Britain and water levels to the concept of flooding. In other words, by exploiting the additional semantic context from relevant Linked Open Data ontologies, the search results would include a report that a simple keyword search would not have found.
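The Thames Barrier example can be sketched as query expansion over linked concepts. In this toy version, every identifier, relation and document is hypothetical. A breadth-first walk over "narrower" links stands in for whatever inference the real repository would perform. The sketch shows why a concept-based query finds a report that keyword matching misses.

```python
from collections import deque

# Hypothetical LOD-derived links: each key maps to more specific concepts
# or to places contained within it.
NARROWER = {
    "concept:Flooding": {"concept:WaterLevel", "concept:StormSurge"},
    "place:SouthEastBritain": {"place:ThamesEstuary"},
    "place:ThamesEstuary": {"place:ThamesBarrier"},
}

# Hypothetical documents: their text plus the concepts annotated in them.
DOCUMENTS = {
    "report-42": {
        "text": "Water levels at the Thames Barrier",
        "concepts": {"concept:WaterLevel", "place:ThamesBarrier"},
    },
    "report-7": {
        "text": "Soil chemistry in upland Wales",
        "concepts": {"concept:SoilChemistry", "place:Wales"},
    },
}


def expand(concept):
    """Return the concept plus everything reachable via NARROWER (breadth-first)."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for child in NARROWER.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def keyword_search(words):
    """Documents whose text contains every query word (case-insensitive)."""
    return sorted(d for d, doc in DOCUMENTS.items()
                  if all(w.lower() in doc["text"].lower() for w in words))


def semantic_search(concepts):
    """Documents annotated with something inside the expansion of every query concept."""
    expanded = [expand(c) for c in concepts]
    return sorted(d for d, doc in DOCUMENTS.items()
                  if all(doc["concepts"] & e for e in expanded))


print(keyword_search(["flooding", "south-east"]))  # prints []
print(semantic_search(["concept:Flooding", "place:SouthEastBritain"]))  # prints ['report-42']
```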

Critical Success Factors

1. Scalability: LOD resources such as DBpedia and GeoNames have (tens of) millions of instances, so using them for semantic annotation and semantic queries is far from trivial. Scalability and robustness to noisy data are therefore key requirements for EnviLOD. Our solution is based on Ontotext's OWLIM semantic repository, which scales to billions of triples. OWLIM is coupled with the open-source GATE semantic annotation tools and Linked Data endpoints. We import Linked Data into the OWLIM semantic repository, which provides a SPARQL endpoint. GATE Mimir is used to index the full text, metadata, and semantic annotations that underpin the semantic search UI.

2. Sustainability: All project results will be made available as open source. Software will be provided with a clearly defined API to facilitate adoption. The results will be incorporated into the Envia discovery tool, which will be supported by the British Library.

3. Usability: Usability of the semantic search user interface is paramount. UI mockups will be created and tested first with the British Library and HR Wallingford, followed by a wider consultation with key stakeholders. The UI will be designed to match users' current search practices as closely as possible, while also meeting their need for semantically enhanced queries.