Some context in the era of Linked Data

Introduction

Knowledge Graphs (KGs) are currently on the rise. In their latest Hype Cycle for Artificial Intelligence (2018), Gartner highlighted: “The rising role of content and context for delivering insights with AI technologies, as well as recent knowledge graph offerings for AI applications have pulled knowledge graphs to the surface.” We can roughly di […]

Knowledge graphs are essential for any information architecture built upon semantics and AI. The Linked Data Life Cycle provides guidelines for data governance within the Semantic Web framework. The post Knowledge Graphs – Connecting the Dots in an Increasingly Complex World appeared first on Semantic Web Company.

With the UN predicting that more than 70 percent of the world’s population will live in urban areas by 2050, developing sustainable smart cities is a growing need. Cities are now capable of collecting and analyzing enormous amounts of data to automate processes, improve service quality, and make better decisions. This opens ...

Drupal is one of the most popular enterprise content management systems. Government and non-governmental organizations in particular embrace this open-source platform to build advanced digital experiences. Over the last few years, we have developed several PoolParty semantic technology features and modules that integrate natively into Drupal. In this blog post, […]

In our recent endeavor to import the Google Product taxonomy into PoolParty in different languages, we encountered some challenges that needed to be addressed. The first challenge was that the Google Product taxonomy is in Excel (XLS) format, with a separate file for each language. The second challenge was how to align ...


What is PoolParty GraphSearch?

With PoolParty GraphSearch, companies can search across a variety of content types and business objects and analyze the data at a more granular level. All content and data repositories that are connected to GraphSearch are annotated with semantic metadata, which makes search, recommendation, and analytics operations highly precise. GraphSearch is a front-end application built on top of a semantic infrastructure and an API, providing the following features:

Ontology-based data access (OBDA)

Faceted search including hierarchies

Autocomplete combined with context information

Custom views on entity-centric and document-centric data

Statistical charts for the unified data repositories

Plug-in system for recommendation and similarity algorithms

How does it work?

Business users query knowledge assets in GraphSearch along defined data models. Because multiple systems can be connected to GraphSearch, the various knowledge models are additionally linked by an ontology layer.

System administrators can define which part of the ontology and corresponding entities in the taxonomy should be used in the GraphSearch front-end application. That way, they define specific views on data. They can also provide multiple search spaces within GraphSearch and enable the user to switch between them. A search space is a customized search configuration over a specific data set. The selected search facets for each search space are derived from the knowledge graph.

GraphSearch can be enhanced with recommendation algorithms. These can provide similarity-based recommendations; for some use cases, a matchmaking algorithm is more suitable. The research team at Semantic Web Company has a strong focus on machine learning and is continuously extending the library of machine learning algorithms in GraphSearch.

Data analytics functionalities help business users derive even more granular insights. Search facets can be combined into statistical charts that display what kind of data is actually available for specific topics.
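The combination of faceted search and facet statistics can be sketched in a few lines of plain Python. This is a toy illustration, not GraphSearch's actual implementation; the documents and concept names are invented:

```python
from collections import Counter

# Toy corpus: each document is annotated with concepts from a knowledge graph.
documents = [
    {"title": "Doc 1", "concepts": ["Smart City", "IoT"]},
    {"title": "Doc 2", "concepts": ["Smart City", "Open Data"]},
    {"title": "Doc 3", "concepts": ["IoT"]},
]

def facet_counts(docs):
    """Count how many documents carry each concept annotation (chart input)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["concepts"])
    return counts

def faceted_search(docs, concept):
    """Return the titles of all documents annotated with a given concept."""
    return [d["title"] for d in docs if concept in d["concepts"]]

print(facet_counts(documents))           # e.g. Counter({'Smart City': 2, 'IoT': 2, 'Open Data': 1})
print(faceted_search(documents, "IoT"))  # ['Doc 1', 'Doc 3']
```

In a real deployment, the annotations come from semantic metadata in the knowledge graph rather than from hand-written dictionaries, but the principle of counting annotations per facet is the same.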

Agile Data Management and Integration

The implementation of PoolParty GraphSearch is the first step toward consolidating data silos without data migration. Various functional roles have to work together to deliver a unified data environment. PoolParty takes the heterogeneous technical backgrounds of the professionals involved into consideration.

Specific user-friendly solutions support the whole knowledge management team in their collaborative work processes:

Subject matter experts can define a semantic data layer to describe the meaning of metadata in the PoolParty taxonomy management tool.

Knowledge engineers can link separate taxonomies and maintain the knowledge graph in the same tool.

Information architects and developers can link various content and data repositories with the semantic metadata via the PoolParty API.

Data scientists can adapt embedded machine-learning algorithms to fine-tune the search, classification, and recommendation results that are mainly derived through the knowledge graph.

This semi-automatic knowledge engineering approach ensures that the query results gradually become more precise and remain applicable to a continuously growing data environment.

On top of that, GraphSearch enables business users to search over data repositories and analyze available information.

The PoolParty approach for efficient knowledge modeling is based on methods from

text analytics and text mining

linked data management

SKOS thesaurus modeling

ontology engineering

semantic wikis

and recombines these techniques into a unique approach for creating complex knowledge models, which can be further used for all of the tasks mentioned above, as well as for semantic search and knowledge discovery in big data sets.
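The SKOS thesaurus modeling mentioned above boils down to a small set of relations between concepts. A minimal sketch of a SKOS-style knowledge model in plain Python (the concepts and labels are invented; a real thesaurus would be stored as RDF, with the fields below corresponding to skos:prefLabel, skos:altLabel, and skos:broader):

```python
# Each concept carries a preferred label, alternative labels (synonyms),
# and links to broader concepts.
thesaurus = {
    "vehicle": {"prefLabel": "Vehicle", "altLabels": [], "broader": []},
    "car": {"prefLabel": "Car", "altLabels": ["Automobile"], "broader": ["vehicle"]},
    "ev": {"prefLabel": "Electric Car", "altLabels": ["EV"], "broader": ["car"]},
}

def narrower(concept_id):
    """Invert the broader links to find the direct narrower concepts."""
    return [cid for cid, c in thesaurus.items() if concept_id in c["broader"]]

print(narrower("vehicle"))  # ['car']
print(narrower("car"))      # ['ev']
```

The hierarchy only needs to be stated in one direction; the inverse (skos:narrower) can always be derived, which is exactly what the lookup above does.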

In recent years, we have constantly discussed the application of thesauri and other knowledge models to improve search. Many people understand that thesaurus-based search is in many cases better than search algorithms purely based on statistics. Of course, the main objection has always been: “the costs are too high to establish a good-enough thesaurus, let alone a high-quality one”.

Imagine you could generate a thesaurus of quite good quality for nearly any knowledge domain you can think of! Sounds impossible? Does it remind you of all the promises made by text mining software that generates “semantic nets” from scratch?

Here at the Semantic Web Company we have been working on SKOSsy for a while. I will explain what this web service can do for you:

SKOSsy generates SKOS-based thesauri in German or English for a domain you are interested in. SKOSsy extracts data from DBpedia, so it can cover anything that is in DBpedia. Thus, SKOSsy works well whenever a first seed thesaurus should be generated for a certain organisation or project. If you load the automatically generated thesaurus into an editor like PoolParty Thesaurus Manager (PPT), you can start to enrich the knowledge model with additional concepts, relations, and links to other LOD sources. But you don't have to start your thesaurus project from scratch.

With SKOSsy in place, custom-tailored text extractors can be produced with low effort. To sum up:

SKOSsy makes heavy use of Linked Data sources, especially DBpedia

SKOSsy can generate SKOS thesauri for virtually any domain within a few minutes

Such thesauri can be improved, curated, and extended to one's individual needs, but they usually serve as “good-enough” knowledge models for any semantic search application you like

SKOSsy-based semantic search usually outperforms search algorithms based purely on statistics, since the thesauri contain high-quality information about relations, labels, and disambiguation
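The last point rests on exactly the label and relation knowledge a thesaurus provides. A toy sketch of thesaurus-driven query expansion in plain Python (the concepts are invented, and real systems would of course work on RDF data rather than dictionaries):

```python
# Minimal thesaurus: preferred labels, synonyms, and narrower concepts.
thesaurus = {
    "car": {"prefLabel": "Car", "altLabels": ["Automobile", "Auto"], "narrower": ["ev"]},
    "ev": {"prefLabel": "Electric Car", "altLabels": ["EV"], "narrower": []},
}

def expand_query(term):
    """Expand a search term with synonyms and narrower concepts."""
    for cid, c in thesaurus.items():
        if term in [c["prefLabel"]] + c["altLabels"]:
            expansions = [c["prefLabel"]] + c["altLabels"]
            for nid in c["narrower"]:
                n = thesaurus[nid]
                expansions += [n["prefLabel"]] + n["altLabels"]
            return expansions
    return [term]  # unknown term: no expansion possible

print(expand_query("Automobile"))
# ['Car', 'Automobile', 'Auto', 'Electric Car', 'EV']
```

A purely statistical engine has no way of knowing that “Automobile” and “EV” are related; a thesaurus encodes that relationship explicitly, which is why even a “good-enough” model improves recall.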

Today I finally logged in to Twine for the first time. Yesterday I read about some shortcomings of the system, so I was keen to try it out myself and form my own impression.

It's true that the system isn't as easy to understand as del.icio.us or other bookmarking tools. It takes a while to get used to all the additional ways you can navigate through the system. Remember: “Twine looks at content and parses it automatically for the names of people, places, organizations and other subject tags. Users are then able to navigate between related content, view recommended content and connect with recommended people with related interests.” But I can't agree with the “shortcoming” mentioned by Marshall Kirkpatrick that “… it's hard to keep track of all the levels and types of information available”: this is just a general problem that arises whenever semantic technologies are meant to enhance the user experience. Either you stay with “simple” user interfaces like Google or del.icio.us, or you spend five minutes or so learning a new piece of software that will save you time in the future and help you find related information automatically.
On the other hand, I was quite surprised that the automatic recommendations Twine makes on how to annotate or describe a new resource are really unsatisfying. Users will only spend time tagging their bookmarks if the machine comes up with some intelligent suggestions. And it's true, as Marshall says, that “most of the web is made up of ugly, non-standard pages.”

So hopefully Twine will add that feature before it opens up to the public (isn't there a plan to integrate OpenCalais or something similar?); otherwise there will be no “first mainstream semantic web application”, just yet another semweb-app prototype.

Really large companies are starting to spur the semantic web. Reuters has recently launched a semantic web service that is free even for commercial purposes. It helps to extract significant phrases from any unstructured text (web documents or office documents). This new service is called “OpenCalais” and is based on ClearForest text-analytics solutions (ClearForest was acquired by Reuters in 2007). So finally a dream comes true: web content can be tagged automatically at quite high quality. Technically speaking: any unstructured text can be transformed into an RDF graph on the fly, and important phrases or even statements can be extracted from plain text.
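What “transforming unstructured text into an RDF graph” means can be illustrated with a toy entity tagger. This is a simplistic gazetteer sketch, nothing like the actual ClearForest technology, and the entity list and predicate names are invented for illustration:

```python
# A tiny gazetteer mapping surface strings to entity types.
entities = {
    "Reuters": "Organization",
    "OpenCalais": "Product",
    "London": "City",
}

def annotate(doc_id, text):
    """Emit (subject, predicate, object) triples for entities found in the text."""
    triples = []
    for name, etype in entities.items():
        if name in text:
            triples.append((doc_id, "schema:mentions", name))  # document-level link
            triples.append((name, "rdf:type", etype))          # entity typing
    return triples

for triple in annotate("doc1", "Reuters launched OpenCalais in 2008."):
    print(triple)
```

A real service additionally disambiguates entities, resolves them against a knowledge base, and extracts relations between them, but the output shape is the same: plain text in, a graph of statements out.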

OpenCalais is the core service for many new web applications, and most of them will deal with better search functionalities or will help to identify similarities between different types of content. For instance, for any document published on a website, related blogs or videos (or whatever) can be retrieved and presented as relevant context information.

Whenever an application uses OpenCalais, content is delivered to Reuters. Thus, submitting a URL will mean something different in the future than it has all these years: it's not only about “promoting” a website anymore, it's rather about exploring ways to get connected with the semantic web, and about teaching Reuters' global knowledge base 😉