Today we are featuring DBpedia-Entity in our blog series introducing interesting DBpedia applications and tools to the DBpedia community and beyond. Read on and enjoy.

DBpedia-Entity is a standard test collection for entity search over the DBpedia knowledge base. It is meant for evaluating retrieval systems that return a ranked list of entities (DBpedia URIs) in response to a free text user query.
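To make the evaluation setting concrete, here is a minimal sketch of how a ranked list of entity URIs could be scored against graded relevance judgments with NDCG@k. The function names, the toy judgments and the linear gain are assumptions made for illustration; the collection's own evaluation setup is described in [2] and may differ.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of relevance grades (rank 1 first)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking, judgments, k=10):
    """NDCG@k of a ranked list of entity URIs.

    ranking   -- list of entity URIs, best first
    judgments -- dict mapping entity URI to a graded relevance label (0 = not relevant)
    """
    gains = [judgments.get(uri, 0) for uri in ranking[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one query (labels and URIs are made up).
judgments = {"dbr:A": 2, "dbr:B": 1, "dbr:C": 0}
print(ndcg_at_k(["dbr:A", "dbr:B", "dbr:C"], judgments))
print(ndcg_at_k(["dbr:B", "dbr:A", "dbr:C"], judgments))
```

A ranking that lists the entities in order of decreasing relevance scores 1.0; swapping the two relevant entities lowers the score because the more relevant entity is discounted more heavily at rank 2.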

The first version of the collection (DBpedia-Entity v1) was released in 2013, based on DBpedia v3.7 [1]. It was created by assembling search queries from a number of entity-oriented benchmarking campaigns and mapping relevant results to DBpedia. An updated version of the collection, DBpedia-Entity v2, was released in 2017 as the result of a collaborative effort between the IAI group of the University of Stavanger, the Norwegian University of Science and Technology, Wayne State University, and Carnegie Mellon University [2]. It was published at the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’17), where it received a Best Short Paper Honorable Mention Award. See the paper and poster.

Finally, we are proud to announce that beta-testing of our data release tool for the DBpedia Databus is about to start.

In the past weeks, our developers at DBpedia have been developing a new tool for releasing datasets on the DBpedia Databus. In that context, we were and still are looking for beta-testers who have a dataset they wish to release. Sign up here and benefit from increased visibility for your dataset and your work.

We are now preparing the first internal test with our own dataset to ensure the data release tool is ready for the testers. During the testing process, beta-testers will discuss occurring problems, challenges and ideas for improvement via the DBpedia #releases channel on Slack to profit from each other’s knowledge and skills. Issues are documented via GitHub.

Milestone One: Every tester needs to have a WebID to release data on the DBpedia Databus. In case you are interested in how to set up a WebID, our tutorial will help you a great deal.

Milestone Two: For their datasets, testers will generate DataIDs that provide detailed descriptions of the datasets and their different manifestations, as well as their relations to agents such as persons or organizations with regard to their rights and responsibilities.

Milestone Three: This milestone is considered achieved if an RSS feed feature can be generated. Additionally, bugs that arose during the previous phases should have been fixed. We also want to collect the testers’ particular demands and wishes that would benefit the tool or the process. A second release can be attempted to check how the integrated fixes and changes work out.

Milestone Four: This milestone marks the final upload of the dataset to the DBpedia Databus, which is hopefully possible in about 3 weeks.

In case you want to get one of the last spots in the beta-testing team, just sign up here and get yourself a WebID and start testing.

We are happy to announce that the 12th DBpedia Community Meeting will be held in Vienna, Austria. At the beginning of SEMANTiCS 2018, Sep 10-13, the DBpedia Community will get together on the 10th of September for the DBpedia Day.

– What: We will discuss the development strategy of the DBpedia Association with members of the DBpedia chapters. You are cordially invited to participate in the discussion to shape the strategy of DBpedia.

Meeting the French DBpedians in Lyon

In cooperation with Thomas Riechert (HTWK/InfAI), the DBpedia Association organized our second DBpedia meetup this year, this time in Lyon. On July 3rd, 2018, we met the French DBpedia Community at the ENS in person and presented the vision of the new DBpedia Databus, an opportunity which simplifies the work with data.

First and foremost, we would like to thank the Institute for Applied Informatics for supporting our community and the LARHRA Laboratory as well as the ENS for hosting our community meetup. Special thanks go to Thomas Riechert and Vincent Alamercery (LARHRA Lyon) for organizing the event.

Elmahdi Korfed from INRIA presented new features developed in the French DBpedia chapter.

In the following months, Elmahdi plans to work on the DBpedia historic live version and the DBpedia wiki commons. His research will be presented during our 12th DBpedia Community meeting on September 10th, in Vienna.

Following Elmahdi, Francesco Beretta presented the LARHRA laboratory and its different research areas. In particular, he introduced the Data for History Consortium, an international consortium founded in 2017 with the aim of improving geo-historical data interoperability in the semantic web.

Afternoon Track

The afternoon track started out with an inspiring presentation by Adam Sanchez from the University of Grenoble. He talked about ‘RDFization of a relational database from medicine domain using Ontop’ (slides) and introduced the Ontop mappings. Afterwards, Oscar Rodríguez Rocha (University of Côte d’Azur) showcased the application ‘Automatic Generation Educational Quizzes’ from DBpedia (slides) and explained how the automatic generation of quizzes works based on the game Les Incollables.

The meeting concluded with a dynamic discussion on the DBpedia Databus and potential collaborations between the DBpedia Association and the French DBpedia Chapter.

All slides and presentations are available on our Website. You can find more feedback and photos about the event on Twitter via #DBpediaLyon.

You still can’t get enough of DBpedia?

Don’t worry, we already have another meeting of the DBpedia community in the pipeline. Our 12th DBpedia Community meeting is scheduled for September 10th and preparations for the program are already in full swing. Our DBpedia Day will kick off this year’s edition of SEMANTiCS, hosted at TU Vienna, and bring the European DBpedia community together.

You want to contribute? Please submit your proposal and be a part of our amazing program. Register here and meet us and other DBpedia enthusiasts in Vienna. We are looking forward to your contribution.

Unfortunately, with the new GDPR, we experienced some trouble with our Blog. That is why this post is published a little later than anticipated.

There you go.

With our new strategic orientation and the emergence of the DBpedia Databus, we wanted to meet some DBpedia enthusiasts of the German DBpedia Community.

The recently hosted 6th LSWT (Leipzig Semantic Web Day) on June 18th was the perfect platform for DBpedia to meet with researchers, industry and other organizations to discuss current and future developments of the semantic web.

Under the motto “Linked Enterprises Data Services”, experts in academia and industry talked about the interlinking of open and commercial data of various domains such as e-commerce, e-government, and digital humanities.

Sören Auer, DBpedia endorser and board member as well as director of TIB, the German National Library of Science and Technology, opened the event with an exciting keynote. Recapping the evolution of the semantic web and giving a glimpse into the future of integrating more cognitive processes into the study of data, he highlighted the importance of AI, deep learning, and machine learning. Like cognitive data, they are no longer in their early stages but have matured into fully grown sciences.

Shortly after, Sebastian Hellmann, director of the DBpedia Association, presented the new face of DBpedia as a global open knowledge network. DBpedia is not just the most successful open knowledge graph so far, but also has deep inside knowledge about all connected open knowledge graphs (OKG) and how they are governed.

With our new credo, “connecting data is about linking people and organizations”, the global DBpedia platform aims at sharing efforts of OKG governance, collaboration, and curation to maximize societal value and develop a linked data economy.

The DBpedia Databus functions as a Metadata Subscription Repository, a platform that allows exchanging, curating and accessing data between multiple stakeholders. To maximize the potential of their data and make use of the full Databus services, data owners need a WebID to sign their metadata with a private key. Instead of one huge monolithic release every 12 months, the Databus enables easier contributions and hence partial releases (core, mapping, wikidata, text, reference extraction) at their own speed but in much shorter intervals (monthly). Uploading data to the Databus means connecting and comparing your data to the network. We will offer storage services, free and freemium services as well as data-as-a-service. A first demo is available via http://downloads.dbpedia.org/databus

During the lunch break, LSWT participants had time to check out the poster presentations. Four of the 18 posters used DBpedia as a source. One of them was Birdory, a memory game developed during the Coding da Vinci hackathon, which started in April 2018. Moreover, other posters also used the DBpedia vocabulary.

Afternoon Session

In the afternoon, participants of LSWT2018 joined hands-on tutorials on SPARQL and WebID. During the SPARQL tutorial, ten participants learned about the different query types, graph patterns, filters, and functions as well as how to construct SPARQL queries step by step with the help of a funny Monty Python example.
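The tutorial's actual queries are not reproduced here. As an illustration of the ideas it covered (graph patterns and filters, built up step by step), the following Python sketch assembles a SELECT query as a string. The helper build_select is hypothetical, and the choice of dbo:starring with dbr:John_Cleese is our own Monty Python flavoured assumption, not the example used in the session.

```python
# Prefixes used by DBpedia; the helper below is a hypothetical illustration.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def build_select(variables, patterns, filters=(), limit=None):
    """Assemble a SPARQL SELECT query from triple patterns and filter expressions."""
    lines = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in PREFIXES.items()]
    lines.append("SELECT " + " ".join(variables) + " WHERE {")
    # Step 1: graph patterns, one triple pattern per line.
    lines += [f"  {s} {p} {o} ." for s, p, o in patterns]
    # Step 2: filters that restrict the matched bindings.
    lines += [f"  FILTER({expression})" for expression in filters]
    lines.append("}")
    # Step 3: an optional result limit.
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = build_select(
    ["?film", "?title"],
    [
        ("?film", "dbo:starring", "dbr:John_Cleese"),
        ("?film", "rdfs:label", "?title"),
    ],
    filters=['lang(?title) = "en"'],
    limit=10,
)
print(query)
```

The printed string can be pasted into the query form of a SPARQL endpoint, for instance the public DBpedia endpoint, to try it out.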

Afterwards, DBpedia hosted a hands-on workshop on WebID, the password-free authentication method using semantics. The workshop aimed at enabling participants to set up a public/private key pair, a certificate, and a WebID. All they needed to bring was a laptop and their own webspace. Supervised by DBpedia’s executive director Dr. Sebastian Hellmann and developer Jan Forberg, participants logged into a test web service at the end of the session to see if everything had worked out. All participants seemed well satisfied with the workshop; even if not everyone could finish it successfully, they got a lot of individual help and many hints. For support purposes, DBpedia will stay in close touch with those participants.

We are currently looking forward to our next DBpedia meetup in Lyon, France on July 3rd and the DBpedia Day co-located with SEMANTiCS 2018 in Vienna. Contributions to both events are still welcome. Send your inquiry to dbpedia@infai.org.

A small demo app for a generic natural language interaction library I am developing: NLI-GO. It allows you to ask a few questions in natural language (English). These questions are answered by DBpedia via SPARQL queries.

Working with data is hard and repetitive. That is why we are more than happy to announce the launch of the alpha version of our DBpedia Databus, a way that simplifies working with data.

We have been studying the data network for 10 years now, and we conclude that organizations with open data are struggling to work together properly. Even though they could and should collaborate, they are hindered by technical and organizational barriers. They duplicate work on the same data. On the other hand, companies selling data cannot do so in a scalable way. The consumers are left empty-handed and trapped between the choice of inferior open data or buying from a jungle-like market.

We need to rethink the incentives for linking data

Vision

We envision a hub, where everybody uploads data. In that hub, useful operations like versioning, cleaning, transformation, mapping, linking, merging and hosting are done automagically on a central communication system, the bus, and the results are then dispersed again in a decentralized network to the consumers and applications. On the Databus, data flows from data producers through the platform to the consumers (left to right), while errors and feedback flow in the opposite direction and reach the data source, providing a continuous integration service and improving the data at the source.

The DBpedia Databus is a platform that allows exchanging, curating and accessing data between multiple stakeholders. Any data entering the bus will be versioned, cleaned, mapped, linked and its licenses and provenance tracked. Hosting in multiple formats will be provided to access the data either as dump download or as API.

Publishing data on the Databus means connecting and comparing your data to the network

If you are grinding your teeth about how to publish data on the web, you can just use the Databus to do so. Data loaded on the bus will be highly visible, available and queryable. You should think of it as a service:

Visibility guarantees that your citations and reputation go up.

Besides a web download, we can also provide a Linked Data interface, SPARQL-endpoint, Lookup (autocomplete) or other means of availability (like AWS or Docker images).

Any distribution we are doing will funnel feedback and collaboration opportunities your way to improve your dataset and your internal data quality.

You will receive an enriched dataset, which is connected and complemented with any other available data (see the same folder names in data and fusion folders).

How it works at the moment

Integration of data is easy with the Databus. We have been integrating and loading additional datasets alongside DBpedia for the world to query. Popular datasets are ICD10 (medical data) and organizations and persons. We are still in an initial state, but we already loaded 10 datasets (6 from DBpedia, 4 external) on the bus using these phases:

Mapping: the vocabulary is mapped to the DBpedia Ontology and converted (we have been doing this for Wikipedia’s Infoboxes and Wikidata, but now we do it for other datasets as well).

Linking: Links are mainly collected from the sources, cleaned and enriched.

IDying: All entities found are given a new Databus ID for tracking.

Clustering: IDs are merged into clusters using one of the Databus IDs as the cluster representative.

Data Comparison: Each dataset is compared with all other datasets. We have an algorithm that decides on the best value, but the main goal here is transparency, i.e. to see which data value was chosen and how it compares to the other sources.

A main knowledge graph fused from all the sources, i.e. a transparent aggregate.

For each source, we are producing a local fused version called the “Databus Complement”. This is a major feedback mechanism for all data providers, where they can see what data they are missing, what data differs in other sources and what links are available for their IDs.

You can compare all data via a web service.
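The clustering and data comparison phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Databus implementation: the IDs are made up, the cluster representative is simply the lexicographically smallest ID, and a plain majority vote stands in for the value-selection algorithm, which is not described in this post.

```python
from collections import Counter, defaultdict


def cluster_ids(links):
    """Group IDs connected by equivalence links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the lexicographically smallest ID represents the cluster.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    clusters = defaultdict(set)
    for x in list(parent):
        clusters[find(x)].add(x)
    return dict(clusters)


def fuse(values_by_source):
    """Choose the most frequent value, but keep every source's value visible."""
    counts = Counter(values_by_source.values())
    chosen = max(counts, key=lambda value: (counts[value], value))
    return {"chosen": chosen, "sources": dict(values_by_source)}


# Hypothetical IDs and values, for illustration only.
links = [("db:1", "wd:Q1"), ("wd:Q1", "ex:a"), ("db:2", "wd:Q2")]
clusters = cluster_ids(links)
result = fuse({"dbpedia": "1879", "wikidata": "1879", "other": "1878"})
print(clusters)
print(result)
```

Returning the per-source values next to the chosen one mirrors the transparency goal: a data provider can see which value won and how its own value compares.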

Contact us via dbpedia@infai.org if you would like to have additional datasets integrated and maintained alongside DBpedia.

From your point of view

Data Sellers

If you are selling data, the Databus provides numerous opportunities for you. You can link your offering to the open entities in the Databus. This allows consumers to discover your services better by showing it with each request.

Data Consumers

Open data on the Databus will be a commodity. We are greatly reducing the cost of understanding, retrieving and reformatting the data. We are constantly extending the ways of using the data and are willing to implement any formats and APIs you need. If you are lacking a certain kind of data, we can also scout for it and load it onto the Databus.

Is it free?

Maintaining the Databus is a lot of work, and the servers incur high costs. As a rule of thumb, we are providing everything for free that we can afford to provide for free. DBpedia was providing everything for free in the past, but this is not a healthy model, as we can neither maintain quality properly nor grow.

On the Databus everything is provided “As is” without any guarantees or warranty. Improvements can be done by the volunteer community. The DBpedia Association will provide a business interface to allow guarantees, major improvements, stable maintenance, and hosting.

License

Final databases are licensed under ODC-By. This covers our work on recomposition of data. Each fact is individually licensed, e.g. Wikipedia abstracts are CC-BY-SA, some are CC-BY-NC, some are copyrighted. This means that data is available for research, informational and educational purposes. We recommend contacting us for any professional use of the data (clearing) so we can guarantee that legal matters are handled correctly. Otherwise, professional use is at your own risk.

Current Statistics

The Databus data is available at http://downloads.dbpedia.org/databus/ ordered into three main folders:

Data: the data that is loaded on the Databus at the moment

Global: a folder that contains provenance data and the mappings to the new IDs

Fusion: the output of the Databus

Most notably you can find:

Provenance mapping of the new ids in global/persistence-core/cluster-iri-provenance-ntriples/<http://downloads.dbpedia.org/databus/global/persistence-core/cluster-iri-provenance-ntriples/> and global/persistence-core/global-ids-ntriples/<http://downloads.dbpedia.org/databus/global/persistence-core/global-ids-ntriples/>

The final fused version for the core: fusion/core/fused/<http://downloads.dbpedia.org/databus/fusion/core/fused/>

Supporting young and aspiring developers has always been part of DBpedia’s philosophy. Through various internships and collaborations with programmes such as Google Summer of Code, we were able not only to meet aspiring developers but also to establish long-lasting relationships with these DBpedians, ensuring sustainable progress for and with DBpedia. For 6 years now, we have been part of Google Summer of Code, one of our favorite programmes. This year, we are also taking part in Coding da Vinci, a German-based cultural data hackathon, where we support young hackers, coders and smart minds with DBpedia datasets.

DBpedia at Google Summer of Code 2018

This year, DBpedia will participate for the sixth time in a row in the Google Summer of Code program (GSoC). Together with our amazing mentors, we drafted 9 project ideas which GSoC applicants could apply to. Since March 12th, we have received many proposal drafts, out of which 12 final project proposals were submitted. Competition is very high as student slots are always limited. Our DBpedia mentors critically reviewed all proposals for their potential in order to allocate them one of the rare open slots in the GSoC program. Finally, on Monday, April 23rd, our 6 finalists were announced. We are very proud and looking forward to the upcoming months of coding. The following projects have been accepted and will hopefully be realized during the summer.

Our gang of DBpedia mentors comprises very experienced developers who have been working with us on this project for several years now. Speaking of sustainability, we also have former GSoC students on board, who get the chance to mentor projects building on ideas of past GSoCs. And while students and mentors start bonding, we are really looking forward to the upcoming months of coding. May it be inspiring, fun and fruitful.

As already mentioned in the previous newsletter, DBpedia is part of the CodingDaVinciOst 2018. Founded in Berlin in 2014, Coding da Vinci is a platform for cultural heritage institutions and the hacker, developer, designer, and gamer community to jointly develop new creative applications from cultural open data during a series of hackathon events. In this year’s edition, DBpedia provides its datasets to support more than 30 cultural institutions, enriching their datasets so that participants of the hackathon can make the most out of the data. Among the participating cultural institutions are, for example, the university libraries of Chemnitz, Jena, Halle, Freiberg, Dresden and Leipzig as well as the Sächsisches Staatsarchiv, Museum für Druckkunst Leipzig, Museum für Naturkunde Berlin, Duchess Anna Amalia Library, and the Museum Burg Posterstein.

CodingDaVinciOst 2018, the current edition of the hackathon, hosted a kick-off weekend at the Bibliotheca Albertina, the University Library in Leipzig. During the event, DBpedia offered a hands-on workshop for newbies and interested hackathon participants who wanted to learn about how to enrich their project ideas with DBpedia or how to solve potential problems in their projects with DBpedia.

We are now looking forward to the upcoming weeks of coding and hacking and can’t wait to see the results on June 18th, when the final projects will be presented and awarded. We wish all the coders and hackers a pleasant and happy hacking time. Check our DBpedia Twitter for updates and latest news.

If you have any questions, would like to support us in any way or would like to learn more about DBpedia, just drop us a line via dbpedia@infai.org

This year, DBpedia will participate for the sixth time in a row in the Google Summer of Code program (GSoC). We are regularly growing our community through GSoC and are currently looking for students who want to join us for a summer of coding. Read below for further details about GSoC and how to apply.

What is GSoC?

Google Summer of Code is a global program focused on bringing more student developers into open source software development. Funds will be given to students (BSc, MSc, Ph.D.) to work for three months on a specific task. At first, open source organizations announce their student projects and then students should contact the mentor organizations they want to work with and write up a project proposal for the summer. After a selection phase, students are matched with a specific project and a set of mentors to work on the project during the summer.

If you are a GSoC student who wants to apply to our organization, please check our guidelines before you start drafting your project proposal.

This year’s GSoC timeline is as follows:

March 12th, 2018

Student applications open (Students can register and submit their applications to mentor organizations.)

April 9th, 2018

Student application deadline

April 23rd, 2018

Accepted students are announced and paired with a mentor. Bonding period begins.

May 14th, 2018

Coding officially begins!

August 6th, 2018

Final week: Students submit their final work product and their final mentor evaluation

August 22nd, 2018

Final results of Google Summer of Code 2018 announced

Check our website, follow us on Twitter or subscribe to our newsletter for further updates.

DBpedia is part of a large network of industry and academia, companies, and organizations as well as 20 universities including student members. Our aim is to qualify aspiring developers and knowledge graph enthusiasts by working together with industry partners on DBpedia-related tasks. The final goal is that DBpedia can be effectively integrated into organizations and businesses and take their knowledge graphs to the next level. We intend to foster collaboration between DBpedia and organizations that share an interest in, and want to profit from, Open-Knowledge-Graph governance.

gain insight into your needs helping us to shape our strategy for the future.

Springer Nature was the first partner we collaborated with in our new program. We set out on an endeavor to interlink Springer Nature’s SciGraph and DBpedia datasets.

With Beyza Yaman, who managed to prevail against 7 other international competitors, we found the perfect partner in crime to tackle this challenge. Read her interview below and find out more about the internship.

Who are you?

My name is Beyza Yaman and I am a Ph.D. student in the Department of Computer Science and Engineering (DIBRIS) at the University of Genoa (Italy). I am working on the problem of source selection on Linked Open Data for live queries, proposing a context- and quality-dependent solution. Besides my studies, I like to meet new people, learn about their cultures and discover new places, especially through walking and hiking events.

Why DBpedia? What is your main interest in DBpedia and what was your motivation to apply for our collaborative internship?

I have already been using DBpedia datasets for my experiments. Besides being the core of the Linked Data Cloud, DBpedia is one of the platforms which brings the applied semantic technologies forward and ahead of most other data technologies. Also, collaboration with Springer Nature, which is one of the best publishing companies, was the cherry on the cake! Springer is an innovative company which applies the latest technologies to their requirements. Thus, being involved in a project with different grounds seemed to be a fruitful experience. When I saw the announcement of the internship, I thought this is a great opportunity not to be missed!

As the web of data is growing into the interlinked data space, data sources should be connected to discover further insight from the data by creating meaningful relations. Moreover, further information (e.g. quality) about these link sets forms another aspect of the Semantic Web objectives. Thus, we worked on interlinking SciGraph and DBpedia datasets by using the Link Discovery approach for the structured content and the Named Entity Recognition approach for unstructured text. We were able to integrate SciGraph data with DBpedia resources which improves the identity resolution in the existing resources and to enrich the SciGraph data with additional relations by annotating SciGraph content with DBpedia links which increases the discoverability of the data. One of the challenges we faced was having a huge amount of data and, actually, we have produced even more for the Linked Data users. You can follow our work, use the data and give us feedback from this repository (https://github.com/dbpedia/sci-graph-links).

What did you learn from the project?

It has been a fantastic experience which helped me to expand my theoretical knowledge with a lot of practical aspects. I worked with Markus Freudenberg from DBpedia and Tony Hammond, Michele Pasin and Evangelos Theodoridis from Springer Nature. Working with technically well-equipped researchers and professionals on the subject has been very influential for my research. In particular, working with a team of academics and professionals in collaboration has taught me two different ways of looking at the project. I learned more about SciGraph data and DBpedia, as well as many ways of dealing with huge amounts of data, the tools used in the DBpedia and Linked Data environment, and the importance of open source data and code. Besides the project, I had a chance to witness development phases of DBpedia in the Knowledge Integration and Linked Data Technologies (KILT) group (Leipzig) with a bunch of cool guys and girls who made my stay more enjoyable. I also met a lot of researchers with Semantic Web experience, which has widened my point of view considerably.

What are your next plans? How do you want to contribute to DBpedia in the future?

I would like to finish my Ph.D. and extend my knowledge by getting involved in new exciting projects like this one. Publishing what we have done and further quality improvements might be a nice follow-up for the work and the Linked Data community. Besides, I would like to contribute to the development of the Turkish DBpedia Chapter, which is unfortunately missing. In this way, we can promote the usage and development of DBpedia and Linked Data to the Turkish research community and companies as well.

There will also be a report on the collaboration between Springer Nature and DBpedia that will cover the technical details of linking DBpedia and SciGraph datasets. We will keep you informed about news via Twitter and our Website.

We are really happy to have worked with her and we are now looking forward to a Turkish DBpedia Chapter. If you are a DBpedia enthusiast and want to help to start the Turkish DBpedia chapter, just get in touch with Beyza or contact us.

Did her story inspire you? Do you want to become an intern at DBpedia? Check our Website, Twitter, and Social Media and don’t miss any internship updates.

Last but not least, we would like to thank Springer Nature for their cooperation and commitment to the project.

If you would like to collaborate with us to find a developer who helps integrate DBpedia into your business, get in touch with us via dbpedia@infai.org.