Free Online Term Extractors

This page provides a set of free terminology extraction tools available online.

Online terminology extraction is the extraction of terms from a text through a web service based on linguistic and/or statistical routines and algorithms. Given a text, such a service returns a list of terms, with (hopefully) the most relevant first. Terms can be returned in a variety of formats and used for a variety of purposes:

Raising a website's visibility and SEO by using extracted terms as keywords, tags, and meta-tags;

Maintaining a company thesaurus (the more classical approach).
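The core idea described above, that a service takes raw text and returns a ranked term list, can be sketched in a few lines. This is a deliberately minimal frequency-based illustration, not any specific tool's method; the stopword list is a tiny invented sample.

```python
# Minimal sketch of term extraction: tokenize, drop common stopwords,
# and rank the remaining words by frequency (most relevant first).
import re
from collections import Counter

# Tiny illustrative stopword list (a real service uses far larger resources).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for", "or"}

def extract_terms(text, top_n=5):
    """Return the top_n most frequent non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

sample = ("Terminology extraction identifies terms in a text. "
          "Extracted terms can serve as keywords or thesaurus entries.")
print(extract_terms(sample, top_n=3))
```

Real services refine this with part-of-speech filtering and reference-corpus statistics, but the input/output shape (text in, ranked terms out) is the same.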

The list is divided into two groups. The first, with more detailed descriptions, covers the easier-to-use tools: all you have to do is paste the source text or a source URL, press a button, and get the term list. There is no software to install, no manual to read, and, of course, no price to pay. The second group also includes SEO tools and APIs.

Terminology Extraction by Translated uses Poisson statistics, Maximum Likelihood Estimation, and Inverse Document Frequency to compare the frequency of words in a given document against a generic corpus of 100 million words per language. It uses a probabilistic part-of-speech tagger to take into account the probability that a particular sequence is a term, and it builds word n-grams by minimizing relative entropy. Terminology Extraction by Translated can also be used to improve search results in traditional search engines (e.g. Google) by giving a better estimate of how relevant a keyword is to a document.
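The corpus-comparison idea behind this tool can be sketched as follows: score each word by how over-represented it is in the document relative to a generic reference corpus. This is a simplified stand-in for the Poisson/IDF weighting described above, not Translated's actual implementation, and the reference frequencies below are invented for illustration.

```python
# Hedged sketch: rank words by the log ratio of their in-document frequency
# to their frequency in a hypothetical generic reference corpus.
import math
import re
from collections import Counter

# Invented relative frequencies standing in for a 100-million-word corpus.
REFERENCE_FREQ = {"the": 0.05, "model": 0.0004, "entropy": 0.00001}
DEFAULT_FREQ = 0.000001  # assumed frequency floor for unseen words

def termhood(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        doc_freq = count / total
        ref_freq = REFERENCE_FREQ.get(word, DEFAULT_FREQ)
        # Positive score means the word is over-represented in this document.
        scores[word] = math.log(doc_freq / ref_freq)
    return sorted(scores, key=scores.get, reverse=True)

print(termhood("the entropy model minimizes the relative entropy"))
```

Common function words like "the" score near the bottom even though they are frequent in the document, because they are just as frequent in the reference corpus; rare domain words rise to the top.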

Uploading: texts may be submitted for analysis by entering them into the text window.

Technical terms are important for knowledge mining, especially in the bio-medical area, where vast amounts of documents are available. The number of terms (e.g., names of genes, proteins, chemical compounds, drugs, organisms, etc.) is increasing at an astounding rate in the bio-medical literature, and existing terminological resources and scientific databases cannot keep up with the growth of neologisms. A domain-independent method for term recognition is therefore very useful for automatically recognizing such new terms.
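Bio-medical terms such as gene and protein names are typically multi-word units, so domain-independent recognition usually starts from candidate n-grams rather than single words. The sketch below collects repeated n-grams and weights longer, more frequent candidates higher; the log2-of-length weight is a simplified variant of the well-known C-value measure, chosen here as an assumption for illustration.

```python
# Sketch of multi-word term candidate ranking: collect 2- and 3-grams,
# then score each by frequency scaled with log2 of its length in words.
import math
import re
from collections import Counter

def candidate_ngrams(text, max_n=3):
    """Yield all 2..max_n word n-grams of the text as tuples."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def rank_candidates(text, top_k=3):
    counts = Counter(candidate_ngrams(text))
    # Longer candidates get a boost: frequency * log2(length in words).
    scored = {g: c * math.log2(len(g)) for g, c in counts.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [" ".join(g) for g in ranked[:top_k]]

text = ("the tumor suppressor gene regulates growth; "
        "the tumor suppressor gene is studied widely")
print(rank_candidates(text))
```

A repeated multi-word unit like "tumor suppressor gene" outranks its shorter or one-off substrings; real systems add linguistic filters (e.g. noun-phrase patterns) on top of this statistical core.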

Uploading: texts may be submitted for analysis in any of the following ways:

entering the text you would like to analyze into the topmost text window;

Maui - indexer: Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles. It also shows how keyphrases can be extracted from document text.

File formats supported: text, PDF, Microsoft Word.

VocabGrabber analyzes any text, generating lists of the most useful vocabulary words and showing how those words are used in context. It creates a list of vocabulary from the text, which can then be sorted, filtered, and saved. Selecting any word on the list shows a snapshot of the Visual Thesaurus map and definitions for that word, along with examples of the word in the text.

Languages supported: English

Supported file formats: all formats.

Uploading: copy/paste the text into the box and click the "Grab Vocabulary!" button.

An automated terminology extraction service based on a statistical method that produces bilingual term pairs.

Bibclassify - A module in CDS Invenio (CERN's document server software) for automatic assignment of terms from SKOS vocabularies, developed on the High Energy Physics vocabulary in a collaboration between CERN and DESY.

Extractor - Commercial software for keyword extraction in different languages. There is also a demo. Developed at the National Research Council of Canada.

Maui - A multi-purpose topic indexing algorithm suitable for automatic term assignment, subject indexing, keyword extraction, keyphrase extraction, indexing with Wikipedia, autotagging, and terminology extraction. Developed at the University of Waikato; Maui is also available on SourceForge.

Orchestr8 Keyword Extraction - An API-based application that uses statistical and natural language processing methods. Applicable to web pages, text files, and any input text in several languages.

Wikifier – An online demo that detects Wikipedia articles in text, developed at the Language and Information Technologies research group at the University of North Texas.

Wikipedia Miner – An API for accessing Wikipedia data, which also provides a tool for mapping any document to a set of relevant Wikipedia articles, similar to indexing with Wikipedia. Developed at the University of Waikato. Demo 1 and demo 2.