A thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations. In information science, library science, and information technology, specialized thesauri are designed for information retrieval. They are a type of controlled vocabulary used for indexing or tagging. Such a thesaurus can be used as the basis of an index for online material. The Art and Architecture Thesaurus, for example, is used to index the Canadian [...] Information retrieval thesauri are formally organized so that existing relationships between concepts are made explicit. As a result, they are more complex than simpler controlled vocabularies such as authority lists and synonym rings. Each term is placed in context, allowing a user to distinguish between "bureau" the office and "bureau" the furniture. Following international standards, they are generally arranged hierarchically by themes, topics or facets. Unlike a literary thesaurus, these specialized thesauri typically focus on one discipline, subject or field of study. In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of artificial intelligence, a thesaurus may sometimes be referred to as an ontology. (Excerpt from <a href="http://en.wikipedia.org/wiki/Thesaurus">Wikipedia article: Thesaurus</a>)
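
The idea of placing each term in context and arranging terms hierarchically can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Concept` class, the `THESAURUS` entries and the helper functions are hypothetical and do not reproduce any real thesaurus or standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One hypothetical thesaurus entry: a preferred term placed in context."""
    preferred: str
    qualifier: str                  # disambiguating context, e.g. "furniture"
    broader: Optional[str] = None   # key of the parent concept, if any
    synonyms: List[str] = field(default_factory=list)


# Illustrative entries only, keyed by a unique label.
THESAURUS = {
    "furniture": Concept("furniture", "objects"),
    "workplaces": Concept("workplaces", "places"),
    "bureau (furniture)": Concept("bureau", "furniture",
                                  broader="furniture",
                                  synonyms=["writing desk"]),
    "bureau (office)": Concept("bureau", "office",
                               broader="workplaces",
                               synonyms=["agency"]),
}


def lookup(term):
    """Return the keys of all concepts whose preferred term or synonym matches."""
    term = term.lower()
    return sorted(key for key, concept in THESAURUS.items()
                  if term == concept.preferred or term in concept.synonyms)


def broader_chain(key):
    """Walk the hierarchy upward from a concept key to its top-level term."""
    chain = [key]
    while THESAURUS[chain[-1]].broader is not None:
        chain.append(THESAURUS[chain[-1]].broader)
    return chain
```

In this sketch, `lookup("bureau")` returns both `"bureau (furniture)"` and `"bureau (office)"`, so an indexer has to pick one explicitly, and `broader_chain("bureau (office)")` shows where the chosen concept sits in the hierarchy.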

Theora is a free lossy video compression format. It is developed by the Xiph.Org Foundation and distributed without licensing fees alongside their other free and open media projects, including the Vorbis audio format and the Ogg container. (Excerpt from <a href="http://en.wikipedia.org/wiki/Theora">Wikipedia article: Theora</a>)

The Getty Thesaurus of Geographic Names (abbreviated TGN or GTGN) is a product of the J. Paul Getty Trust included in the Getty Vocabulary Program. The TGN includes names and associated information about places. Places in TGN include administrative political entities (e.g., cities, nations) and physical features (e.g., mountains, rivers). Current and historical places are included. Other information related to history, population, culture, art and architecture is included. The resource is available to museums, art libraries, archives, visual resource collection catalogers, and bibliographic projects through private license, and to members of the general public for free on the Getty Vocabulary website. (Excerpt from <a href="http://en.wikipedia.org/wiki/Getty_Thesaurus_of_Geographic_Names">Wikipedia article: Thesaurus of Geographic Names</a>)

Text mining, sometimes alternately referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). (Excerpt from <a href="http://en.wikipedia.org/wiki/Text_mining">Wikipedia article: Text mining</a>)
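
The three stages described above (structuring the input, deriving patterns, and interpreting the output) can be sketched in a few lines. This is a toy illustration under stated assumptions: the stop-word list, the function names and the choice of document frequency as the "pattern" are all hypothetical simplifications, not a description of any particular text mining system.

```python
import re
from collections import Counter

# Illustrative stop-word list; real systems use far larger ones.
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in"}


def structure(text):
    """Structuring step: lowercase, tokenize, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def document_frequency(docs):
    """Pattern step: count in how many documents each term occurs."""
    df = Counter()
    for doc in docs:
        df.update(set(structure(doc)))
    return df


def top_terms(docs, n=3):
    """Interpretation step: the n most widespread terms, ties broken alphabetically."""
    df = document_frequency(docs)
    return sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

For the three documents `"The cat sat."`, `"A cat and a dog."` and `"The dog barked."`, calling `top_terms(docs, 2)` yields `[("cat", 2), ("dog", 2)]`. Tasks such as categorization or clustering build on the same structured representation with more elaborate statistics.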

Tesseract is a free software optical character recognition engine for various operating systems. Originally developed as proprietary software at Hewlett-Packard between 1985 and 1995, it had very little work done on it in the following decade. It was then released as open source in 2005 by Hewlett-Packard and UNLV. Tesseract development has been sponsored by Google since 2006. It is released under the Apache License, Version 2.0. Tesseract is considered one of the most accurate free software OCR engines currently available. (Excerpt from <a href="http://en.wikipedia.org/wiki/Tesseract_(software)">Wikipedia article: Tesseract</a>)

Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional, interactive, text-oriented communications facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit, byte-oriented data connection over the Transmission Control Protocol (TCP). Telnet was developed in 1969 beginning with RFC 15, extended in RFC 854, and standardized as Internet Engineering Task Force (IETF) Internet Standard STD 8, one of the first Internet standards. Historically, Telnet provided access to a command-line interface (usually, of an operating system) on a remote host. Most network equipment and operating systems with a TCP/IP stack support a Telnet service for remote configuration (including systems based on Windows NT). Because of security issues with Telnet, its use for this purpose has waned in favor of SSH. (Excerpt from <a href="http://en.wikipedia.org/wiki/Telnet">Wikipedia article: Telnet</a>)
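
To make the phrase "interspersed in-band" concrete, the sketch below separates user data from control sequences in a received byte string. The byte values (IAC = 255, WILL/WONT/DO/DONT = 251 to 254, SB = 250, SE = 240) come from RFC 854; the function name `split_telnet_stream` and the shape of its return value are assumptions made for this illustration. It assumes the input holds only complete sequences and does not handle an escaped 255 inside a subnegotiation.

```python
IAC, SE, SB = 255, 240, 250
WILL, WONT, DO, DONT = 251, 252, 253, 254
NEGOTIATION = {WILL: "WILL", WONT: "WONT", DO: "DO", DONT: "DONT"}


def split_telnet_stream(raw: bytes):
    """Separate user data from in-band Telnet commands.

    Returns (data, commands), where commands is a list of tuples.
    Simplification: `raw` must not end in the middle of a command.
    """
    data = bytearray()
    commands = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte != IAC:
            # Ordinary user data passes through unchanged.
            data.append(byte)
            i += 1
            continue
        cmd = raw[i + 1]
        if cmd == IAC:
            # IAC IAC is an escaped literal 255 in the user data.
            data.append(IAC)
            i += 2
        elif cmd in NEGOTIATION:
            # Option negotiation: IAC <verb> <option code>.
            commands.append((NEGOTIATION[cmd], raw[i + 2]))
            i += 3
        elif cmd == SB:
            # Subnegotiation runs until the next IAC SE pair.
            end = raw.index(bytes([IAC, SE]), i + 2)
            commands.append(("SB", bytes(raw[i + 2:end])))
            i = end + 2
        else:
            # Any other two-byte command, e.g. NOP (241).
            commands.append(("CMD", cmd))
            i += 2
    return bytes(data), commands
```

For example, `split_telnet_stream(b"hi" + bytes([IAC, DO, 1]) + b"!")` returns `(b"hi!", [("DO", 1)])`: the three control bytes are lifted out and the surrounding text is left intact.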

The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities. The community runs a mailing list, meetings and a conference series, and maintains a technical standard, a wiki and a toolset. The Guidelines define some 500 different textual components and concepts (word, sentence, character, glyph, person, etc.), which can be expressed using a markup language and defined by a DTD or XML schema. Early versions of the Guidelines used SGML as a means of expression; more recently XML has been adopted. (Excerpt from <a href="http://en.wikipedia.org/wiki/Text_Encoding_Initiative">Wikipedia article: TEI DTD</a>)
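
As a rough illustration of textual components expressed as markup, the sketch below reads a tiny XML fragment that uses the TEI namespace with the elements `s` (sentence-like unit), `w` (word) and `persName` (personal name). The fragment itself and the helper names are assumptions made for this example; it has not been validated against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment, written for this example only.
SAMPLE = """<s xmlns="http://www.tei-c.org/ns/1.0">
  <persName><w>Ada</w> <w>Lovelace</w></persName> <w>wrote</w> <w>notes</w>
</s>"""


def words(xml_text):
    """Return the text of every <w> element in document order."""
    root = ET.fromstring(xml_text)
    return [w.text for w in root.iterfind(".//tei:w", TEI_NS)]


def person_names(xml_text):
    """Return each <persName> as a single space-joined string of its words."""
    root = ET.fromstring(xml_text)
    return [" ".join(w.text for w in p.iterfind("tei:w", TEI_NS))
            for p in root.iterfind(".//tei:persName", TEI_NS)]
```

Here `words(SAMPLE)` gives `["Ada", "Lovelace", "wrote", "notes"]` and `person_names(SAMPLE)` gives `["Ada Lovelace"]`, showing how the same text can be queried at the word level or the person level once it is encoded.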

TechWatch's main output is its peer-reviewed, horizon-scanning reports. Originally, these reports focused exclusively on technologies and standards, but as the impact of new technologies has become much more interwoven with legal and social issues, the reports have changed slightly to accommodate this. So, whilst the focus of the reports is still primarily on technology and standards, it is inevitable that discussion of a particular technology may also need to encompass an awareness of the social impact of that technology. (Excerpt from <a href="http://www.jisc.ac.uk/whatwedo/services/techwatch/reports">this source</a>)

Technorati is an Internet search engine for searching blogs. By June 2008, Technorati was indexing 112.8 million blogs and over 250 million pieces of tagged social media. The name Technorati is a blend of the words technology and literati, which invokes the notion of technological intelligence or intellectualism. Technorati uses and contributes to open source software. Technorati has an active software developer community, many of them from open-source culture. Its founder, Dave Sifry, is a major open-source advocate, and was a founder of LinuxCare and later of Wi-Fi access point software developer Sputnik. Technorati includes a public developers' wiki, where developers and contributors collaborate, as well as various open APIs. (Excerpt from <a href="http://en.wikipedia.org/wiki/Technorati">Wikipedia article: Technorati</a>)