Featured tool:

Discursis is a tool for analyzing text-based natural language with a focus on sequential analysis. It is particularly designed to work with texts that have an internal temporal structure, such as a transcribed conversation. For each text, it generates a statistics-based internal language model and applies tagging to each temporal unit detected. The model and tagging are then used as the basis for an interactive visualization which permits examination at various levels of granularity, ranging from a whole-text to a unit-by-unit level; it can also show topic usage patterns within the text. Users must purchase a license; both student and academic trial licenses are available.

Featured tool:

Umigon is a free, web-based and open-source tool for sentiment analysis of tweets. From a person's Twitter handle, Umigon retrieves that account's tweets and processes them for sentiment while accounting for factual statements (e.g., "I hate war" will be classified as negative, and "war in Syria" will be classified as neutral). Tweets can also be pasted manually in the entry box provided. Beyond sentiment analysis, Umigon can identify characteristics such as whether the tweet contains a question, whether it contains possible promotional/commercial subject matter, or temporal indicators based on tense. Users are encouraged to report inaccurately identified sentiments via the button provided next to each tweet. Results can be exported in Excel or CSV format for further analysis.

Featured tool:

CheckText is a free, web-based text analysis tool. Users can paste in text, upload it from their files, or import content from a web page. For each text, CheckText generates statistics such as word count, syllable count or number of complex words, provides a reading ease and level breakdown, and graphs the most frequent words. The tool also offers a uniqueness check aimed at detecting plagiarism, and generates a PDF report containing the full text with the statistical and readability data.

Featured tool:

Alt.Text is a free, working prototype application for exploring a text on both an outline and content level via a graphical user interface. It breaks down texts into components such as sections, passages or documents, and permits users to leverage these components to create outlines and break off sections. Alt.Text includes three interfaces: the Document Editor (define sections, create outlines), the Section Editor (define passages of text within a given section), and the Outline Viewer (choose which passages to include in an outline section). Alt.Text is under active development, and is available for download in both Windows and OS X versions.

Featured tool:

Paper Machines is a topic modelling and visualization tool available as a plugin for Zotero. It analyzes Zotero bibliographic collections based on a selection of text mining processes, and enables users to export a variety of visualizations, such as word clouds, phrase nets or heat maps, from the results.

Featured tool:

CATMA (Computer Aided Textual Markup and Analysis) is a free, open source markup and analysis tool from the University of Hamburg's Department of Languages, Literature and Media. It incorporates three interactive modules: a tagger enabling textual markup and markup editing, an analyzer incorporating a query language and predefined functions, and a query builder that allows users to construct queries from combinations of pre-defined questions while allowing for manual modification for more specific questions. It also interfaces with the Voyant toolset. As of version 4.1, CATMA is a web application with collaborative work functions and improvements to its user interface, queries and corpus analysis capacity.

Random Tools

Stanford Mobisocial Lab: Muse

Muse is a free, open source JavaScript tool for reflecting on and searching for patterns in the past by examining one's personal e-mail archive. It analyses e-mail to generate several views including a sentiment graph from messages that may reflect ...

CLAS (Computerized Language Analysis System)

CLAS (Computerized Language Analysis System) was an important historic text analysis system available in the 1970s. It was written in PL/I for IBM 360/370 punch card machines and performed standard statistical tests and concordances on natural language ...

W3C RDF Validation Service

The W3C RDF Validation Service is a free, web-based tool for checking RDF documents for errors and displaying the results. It can display in triples, a graph, or a combination of the two, and can format the graph in a variety of file formats including ...

Keywords Finder - Beta (TAPoRware)

This tool identifies keywords or key phrases within a user-specified text, using the assumption that they will appear with the greatest frequency. It applies a stemmer to every word. Plain text input is recommended. All tags will be stripped from an ...

SATO (Systeme d'analyse des textes par ordinateur)

SATO (Systeme d'analyse des textes par ordinateur) is a longstanding historic text analysis system, now available as a free, web-based tool. Users can either draw off SATO's corpus or upload their own for analysis, and the texts for analysis may be ...

UNICON

UNICON was a concordance generator written in FORTRAN IV and available in the 1960s. It was available first for the IBM 7094, and for IBM 1410/7090 computers after 1970.

TextSTAT

TextSTAT is a free text analysis tool offered by Niederländische Philologie, FU Berlin. It is a simple program designed to accept plain text, HTML, Word and OpenOffice files to produce word frequency lists and concordances, and versions are available ...

Domeo Annotation Toolkit

The Domeo Annotation Toolkit is an extensible web application for creating and sharing ontology-based stand-off annotations on HTML or XML documents. Users can add annotations manually, or via the tool's full or partial automation options. It also includes ...

Netvizz

Netvizz is a free, web-based tool for extracting datasets from Facebook. As the tool utilizes Facebook Apps, users must have a Facebook account to access it.

Voyant RezoViz

Voyant RezoViz is a free, web-based tool in the Voyant toolset for visualizing the relationships between people, locations and organizations in a text or collection of texts.

Orlando Degrees of Separation

Orlando contains a relatively large corpus, currently consisting of details about the life and writing careers of roughly 1000 British women writers, amounting to 6.8 million words with 2.2 million semantic tags for everything from paragraphs to politics, ...

DEREDEC

DEREDEC was a programming system and workbench for linguistics and text analysis written in LISP in the 1980s. It enabled syntactic and textual parsing, and could link phrases by their contextual dependency relations.

Voyant Term Frequencies Chart

Term Frequencies Chart shows how terms are distributed across document(s) in a corpus (documents are shown in the order in which they were added).

Laurence Anthony: AntWordProfiler

AntWordProfiler is a free tool for word profiling. For each word in a document, it will generate the base form and a list of possible related words, provide statistics and frequency data and list word types. It can also process files separately or as ...

NeOn Toolkit

The NeOn Toolkit is an open source environment for engineering ontologies. It supports both F-logic and OWL ontologies, and its functions include annotation, documentation, ontology evaluation and matching, and human-ontology interaction.

Crawdad Text Analysis Software

Crawdad is a commercial software package for qualitative data analysis based on natural language processing. It generates a network model of a text and calculates word influence based on its position within the network. It also includes visualization, ...

COBOL

COBOL (Common Business-Oriented Language) is a programming language first developed for business applications in the late 1950s. It was frequently referenced by humanists interested in applying computers and algorithms to their research, though infrequently ...

SplitsTree4

SplitsTree4 is a free Java tool from Universität Tübingen for generating phylogenetic (similarity) networks. While designed for molecular sequence data, it can also visualize humanities data such as document sequence alignments.

SCAN

SCAN was a conversational programming language available in the 1970s for text analysis. It was specific to text processing and could be used to divide a text into sentences or words or split on separators. It was capable of running counts on a text, printing ...