Did you hear about AgroTagger? It is a Java command-line application that assigns semantic terms to textual content. At a high level of abstraction, AgroTagger can be considered a keyword extractor that uses the AGROVOC thesaurus to extract keywords from a set of web URLs. AgroTagger is also used to enrich data in different digital information environments, AGRIS among them.

SCENARIO:

Given the dynamic influence of social media (Web 2.0 tools and other media) on the world of scientific publications, information systems should:

not only facilitate access to information about research outputs, but also

learn (that is, be regularly assessed and updated) so as to enrich research outcomes with both topical and contextual semantic data collected through web crawling, discovery, and indexing,

so that users can "come across" what is actually meaningful to the topic of interest.

To support this scenario, this entry introduces AgroTagger, whose adoption is described in an F1000Research article.

Supportive quotes from the article:

“In this context, we believe that it is important for AGRIS users – especially for researchers – to have access to those valuable pieces of information that are neither exposed in a database nor accessible via web service”

“In fact, it is not only important to discover web links, but also to process them in a way that allows reuse in multiple scenarios”

“... it is possible to apply semantic enrichment to crawled web resources and to use this semantic knowledge to enhance the AGRIS web portal...our work leverages Semantic Web technologies and the knowledge encoded in the AGROVOC thesaurus in order to recommend web resources that are relevant to a given AGRIS bibliographic item”.

“… we discuss crawling and analysing web resources to populate our “Crawler Database”; a SPARQL endpoint with AGROVOC annotations of web resources identified by the URL from which they were crawled”.

“The entire process we discuss in this paper has already been implemented and integrated in the AGRIS website”.

AgroTagger classifier

In the aforementioned paper, AgroTagger is presented as part of an incremental process (developed within the SemaGrow project) for discovering web resources in the domain of agricultural science and technology. The paper describes this process in three steps:

Step 1

Web crawling was performed, starting from a set of URLs of “trusted” and valuable websites...
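Step 1 can be pictured as a breadth-first crawl that starts from seed URLs and follows links only to trusted hosts. The sketch below is a minimal illustration under stated assumptions: the link graph is an in-memory map (no HTTP fetching, robots.txt handling, or politeness delays), and the names `TrustedCrawler` and `crawl`, as well as the example.org hosts, are hypothetical and not part of the crawler used in the paper.

```java
import java.net.URI;
import java.util.*;

// Hypothetical sketch: breadth-first crawl restricted to trusted hosts,
// using an in-memory link graph instead of real HTTP requests.
class TrustedCrawler {

    // Returns URLs in visit order; links to untrusted hosts are ignored and
    // no page deeper than maxDepth (seeds are depth 0) is visited.
    static List<String> crawl(List<String> seeds,
                              Map<String, List<String>> links,
                              Set<String> trustedHosts,
                              int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> frontier = new ArrayDeque<>();
        for (String s : seeds) {
            if (isTrusted(s, trustedHosts) && seen.add(s)) {
                frontier.add(s);
                depth.put(s, 0);
            }
        }
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            visited.add(url);
            int d = depth.get(url);
            if (d == maxDepth) continue; // do not expand beyond the depth limit
            for (String next : links.getOrDefault(url, List.of())) {
                if (isTrusted(next, trustedHosts) && seen.add(next)) {
                    frontier.add(next);
                    depth.put(next, d + 1);
                }
            }
        }
        return visited;
    }

    static boolean isTrusted(String url, Set<String> trustedHosts) {
        String host = URI.create(url).getHost();
        return host != null && trustedHosts.contains(host);
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "https://a.example.org/", List.of("https://a.example.org/p1", "https://untrusted.example.com/x"),
            "https://a.example.org/p1", List.of("https://b.example.org/doc.pdf"));
        System.out.println(crawl(List.of("https://a.example.org/"), links,
            Set.of("a.example.org", "b.example.org"), 2));
    }
}
```

The seen-set guarantees each URL is queued at most once, so cycles in the link graph cannot make the crawl loop forever.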

Step 2

AgroTagger was applied to give meaning to the discovered resources: AGROVOC URIs/keywords were assigned to every web page, PDF, and Word document discovered during the crawling phase...
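To make Step 2 more concrete, here is a simplified stand-in for the annotation step: it matches thesaurus labels against the text and returns the matching concept URIs, preferring the longest label at each position. AgroTagger itself relies on a MAUI model rather than plain label matching, so treat this only as an illustration of the input/output shape. The class `LabelAnnotator` and the example.org concept URIs are hypothetical placeholders, not real AGROVOC identifiers.

```java
import java.util.*;

// Hypothetical stand-in for the annotation step: longest-match lookup of
// thesaurus labels in text. (AgroTagger itself uses a MAUI model, not this.)
class LabelAnnotator {
    private final Map<String, String> labelToUri = new HashMap<>();
    private int maxLabelWords = 1;

    // labels: thesaurus label -> concept URI; labels are normalized like the text.
    LabelAnnotator(Map<String, String> labels) {
        for (Map.Entry<String, String> e : labels.entrySet()) {
            List<String> words = tokenize(e.getKey());
            labelToUri.put(String.join(" ", words), e.getValue());
            maxLabelWords = Math.max(maxLabelWords, words.size());
        }
    }

    // Lower-cases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Returns concept URIs in order of first occurrence, preferring the
    // longest matching label at each token position.
    Set<String> annotate(String text) {
        List<String> tokens = tokenize(text);
        Set<String> uris = new LinkedHashSet<>();
        int i = 0;
        while (i < tokens.size()) {
            int step = 1;
            for (int n = Math.min(maxLabelWords, tokens.size() - i); n >= 1; n--) {
                String uri = labelToUri.get(String.join(" ", tokens.subList(i, i + n)));
                if (uri != null) {
                    uris.add(uri);
                    step = n; // skip past the matched label
                    break;
                }
            }
            i += step;
        }
        return uris;
    }

    public static void main(String[] args) {
        // Placeholder concept URIs, not real AGROVOC identifiers.
        LabelAnnotator annotator = new LabelAnnotator(Map.of(
            "wheat", "http://example.org/agrovoc/c_wheat",
            "soil fertility", "http://example.org/agrovoc/c_soil_fertility",
            "soil", "http://example.org/agrovoc/c_soil"));
        System.out.println(annotator.annotate("Effects of soil fertility on wheat yields in dry soil."));
    }
}
```

Here "soil fertility" is matched as one concept, while the later standalone "soil" maps to the broader concept, printing `[http://example.org/agrovoc/c_soil_fertility, http://example.org/agrovoc/c_wheat, http://example.org/agrovoc/c_soil]`.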

Step 3

A recommender system was run to integrate the discovered resources with AGRIS and enrich the user experience of the AGRIS website. This was made possible by Linked Data methodologies, which allowed a wide array of custom-crawled resources to be interlinked with the AGRIS bibliographic database.
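One ingredient of Step 3 is finding which AGRIS records share concepts with a crawled resource. A minimal sketch, assuming both sides are already annotated with concept URIs, is an inverted index from each concept to the records tagged with it. `ConceptIndex` and all record and concept IDs are hypothetical; in the paper this interlinking is done with Linked Data over triplestores, not in memory.

```java
import java.util.*;

// Hypothetical sketch: inverted index from concept IDs to AGRIS record IDs,
// used to find records that share concepts with a crawled resource.
class ConceptIndex {
    private final Map<String, Set<String>> recordsByConcept = new HashMap<>();

    void addRecord(String recordId, Set<String> concepts) {
        for (String c : concepts) {
            recordsByConcept.computeIfAbsent(c, k -> new TreeSet<>()).add(recordId);
        }
    }

    // Maps each record sharing at least one concept to the number of shared concepts.
    Map<String, Integer> sharedConceptCounts(Set<String> resourceConcepts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String c : resourceConcepts) {
            for (String rec : recordsByConcept.getOrDefault(c, Set.of())) {
                counts.merge(rec, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ConceptIndex index = new ConceptIndex();
        index.addRecord("AGRIS-1", Set.of("c_wheat", "c_soil")); // hypothetical IDs
        index.addRecord("AGRIS-2", Set.of("c_rice"));
        index.addRecord("AGRIS-3", Set.of("c_wheat"));
        System.out.println(index.sharedConceptCounts(Set.of("c_wheat", "c_soil")));
    }
}
```

For the example data this prints `{AGRIS-1=2, AGRIS-3=1}`; records with no shared concept, such as AGRIS-2, never appear, so only plausible candidates are considered.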

The paper also discusses the SemaGrow Stack, an open-source query federation and data integration infrastructure used to estimate the semantic distance between crawled web resources and AGRIS. The SemaGrow Stack is used within the recommender system, a Java component that computes meaningful combinations between the Crawler Database and the AGRIS database and generates a new triplestore: the “Recommender Database”.
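The text does not say how the semantic distance is computed. Purely as a swapped-in illustration, the sketch below uses the Jaccard distance between concept sets, 1 - |A ∩ B| / |A ∪ B|, and emits one N-Triples line for each AGRIS record and crawled resource whose distance is at or below a threshold. The predicate `http://example.org/vocab/recommends`, the threshold, and all IDs are assumptions, not the Recommender Database's actual vocabulary or method.

```java
import java.util.*;

// Hypothetical sketch: Jaccard distance as a stand-in for the paper's semantic
// distance, plus emission of recommendation triples below a threshold.
class Recommender {
    // Assumed predicate; the real Recommender Database vocabulary is not given.
    static final String RECOMMENDS = "<http://example.org/vocab/recommends>";

    // 1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    static double jaccardDistance(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return 1.0 - (double) inter.size() / union.size();
    }

    // Compares every record with every crawled resource (sorted for stable output).
    static List<String> recommend(Map<String, Set<String>> agris,
                                  Map<String, Set<String>> crawled,
                                  double maxDistance) {
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, Set<String>> rec : new TreeMap<>(agris).entrySet()) {
            for (Map.Entry<String, Set<String>> res : new TreeMap<>(crawled).entrySet()) {
                if (jaccardDistance(rec.getValue(), res.getValue()) <= maxDistance) {
                    triples.add("<" + rec.getKey() + "> " + RECOMMENDS + " <" + res.getKey() + "> .");
                }
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> agris = Map.of(
            "http://example.org/agris/rec1", Set.of("c_wheat", "c_soil"));
        Map<String, Set<String>> crawled = Map.of(
            "https://a.example.org/p1", Set.of("c_wheat", "c_soil", "c_drought"),
            "https://b.example.org/doc.pdf", Set.of("c_rice"));
        recommend(agris, crawled, 0.5).forEach(System.out::println);
    }
}
```

Here the first crawled page has distance 1/3 from the record and is recommended, while the PDF shares no concept (distance 1.0) and is not. Comparing all pairs is quadratic; a practical version would only compare pairs that share at least one concept.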

NOTE: The workflow and components described in the paper are not restricted to agriculture and can be used in any domain: one can simply use another thesaurus to annotate web resources and populate the Crawler Database with the triples generated by AgroTagger.
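Because the Crawler Database is essentially a set of triples linking resource URLs to concept URIs, switching to another thesaurus only changes the concept URIs. The sketch below serializes such annotations as N-Triples. Using `dcterms:subject` as the predicate is an assumption (the text does not name AgroTagger's output vocabulary), the example URIs are hypothetical, and IRIs are assumed to be well formed already, so no escaping is done.

```java
import java.util.*;

// Hypothetical sketch: serializing (resource URL, concept URI) annotations as
// N-Triples. The dcterms:subject predicate is an assumption.
class CrawlerTriples {
    static final String SUBJECT = "<http://purl.org/dc/terms/subject>";

    // One triple per annotation, sorted for stable output; IRIs are not escaped.
    static String toNTriples(Map<String, Set<String>> annotations) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(annotations).entrySet()) {
            for (String concept : new TreeSet<>(e.getValue())) {
                sb.append('<').append(e.getKey()).append("> ")
                  .append(SUBJECT).append(" <").append(concept).append("> .\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Concepts from two different (placeholder) thesauri serialize the same way.
        Map<String, Set<String>> annotations = Map.of(
            "https://a.example.org/p1",
            Set.of("http://example.org/agrovoc/c_wheat", "http://example.org/other-thesaurus/t42"));
        System.out.print(toNTriples(annotations));
    }
}
```

The output is plain N-Triples, which most triplestores can bulk-load and then expose through a SPARQL endpoint.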

P.S. "Workflow tools can provide a mechanism for moving into earlier stages of the research lifecycle, adding value and ensuring an ongoing pipeline of outstanding scholarship" (Strategy & Integration Among Workflow Providers, The Scholarly Kitchen, 2017).

Enjoy the article!

More about AgroTagger

An FAO initiative, the AgroTagger Java software was developed in the context of EU projects such as agINFRA and SemaGrow. The application is language independent, but the existing model is based on the English version of AGROVOC. By building a new MAUI model, one can make AgroTagger support other languages as well.