
Not only can it identify specific words or groups of words in massive volumes of data; by hitting the "merge" command, you can also link everyone who uses a term with a line and graphically depict the entire network of users.
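The "merge" behavior described above can be sketched in a few lines: collect every user who used a given term, then draw an edge between each pair of them. This is a toy illustration with made-up data, not the actual tool's implementation.

```python
from itertools import combinations

# Hypothetical corpus of (user, message) pairs -- illustration only.
messages = [
    ("alice", "the package arrives tuesday"),
    ("bob",   "confirm the package is ready"),
    ("carol", "weather is nice today"),
    ("dave",  "package secured at the drop"),
]

def merge(term):
    """Return edges linking every pair of users who used `term`."""
    users = sorted({user for user, text in messages if term in text.split()})
    # Every pair of users sharing the term becomes a line in the graph.
    return list(combinations(users, 2))

edges = merge("package")
# -> [('alice', 'bob'), ('alice', 'dave'), ('bob', 'dave')]
```

Feeding those edges to any graph-drawing library then "graphically depicts the entire network of users" of that term.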

#17 This is an area in which I have some expertise. To answer the questions above:

There are many approaches (and many off-the-shelf software packages and custom systems) for handling natural language data, including text (documents, web pages, tweets, email, etc., as opposed to transcribed conversations, which tend to have a different linguistic structure). The state of the art goes well beyond finding specific words or phrases, though the capabilities of specific systems outside of R&D shops differ greatly. IARPA is already several years into an R&D program on cross-language, cross-culture metaphor identification and interpretation, for instance. NIST has been running text retrieval, topic modeling, content extraction, machine translation, etc. challenges/competitions for almost 20 years now. DARPA has a Deep Exploration and Filtering of Text (DEFT) effort that is expected to transition to operational use within DOD within a few years.
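To make "beyond finding specific words or phrases" concrete, here is a minimal bag-of-words retrieval sketch: documents are ranked by cosine similarity to a query, so a document can score high without containing the exact phrase. The data and scoring are toy assumptions, not any named system's method.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical document collection -- illustration only.
docs = {
    "d1": "bomb attack planned at the station",
    "d2": "explosive device found near the station",
    "d3": "recipe for chocolate cake",
}

query = Counter("attack at the station".split())
ranked = sorted(docs, key=lambda d: cosine(query, Counter(docs[d].split())),
                reverse=True)
# ranked[0] is "d1": it shares the most terms with the query,
# even though it never contains the exact phrase.
```

Real systems layer stemming, weighting (e.g. TF-IDF), and semantic models on top of this basic idea.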

l33t sp34k would be fairly easy to deal with. Deleted text is just the regular text surrounded by formatting markers, so no problem there. Tweet and text-message conventions are already addressed here and there. Many packages handle various languages, including Arabic. Palantir primarily displays and links rather than interpreting the text itself: document information is imported into the tool, but tagging is done manually by analysts before the tool can display and cross-correlate based on those tags.
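Why l33t sp34k is "fairly easy": most of it is a fixed character-substitution cipher, so normalizing it back to plain text before analysis is a table lookup. The mapping below is a minimal assumed subset (real variants are larger and sometimes ambiguous, e.g. "1" as "i" or "l").

```python
# Minimal l33t-speak normalizer: undo common digit/symbol substitutions
# before running standard text analysis. Mapping is an illustrative subset.
LEET = str.maketrans({
    "4": "a", "3": "e", "1": "i", "0": "o",
    "5": "s", "7": "t", "$": "s", "@": "a",
})

def normalize(text):
    """Lowercase the text and reverse the substitution table."""
    return text.lower().translate(LEET)

normalize("l33t sp34k d3t3c7ed")  # -> "leet speak detected"
```

After normalization, the same keyword search or tagging pipeline works unchanged.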