
Forensic linguistics identifying users on the internet

(I'm a bit skeptical, given my background in linguistics; I think it's somewhat possible, but it's very far from being practical or accurate. I doubt they'll be able to make something that could hold up in court, at least not for a very long time.)

That part was amusing, but it's also part of why I'm skeptical. What they mean is that they haven't done anything aside from analyzing English. There's no reason they couldn't analyze any other language (including leetspeak). In fact, you'd get a lot more information from looking at things beyond standard English.

I think this linguistic hope is unrealistic. It's quite possible to identify groups (professional, social, etc.) on the basis of linguistic criteria or peculiarities, but identifying individuals seems impossible to me, except in certain rare cases. For instance, the rules of Dutch grammar in the head of the legendary Dutch former soccer player Johan Cruijff are so unique (idiosyncratic) that anyone reading a written transcript of what he says will recognize him.
Linguists have been too optimistic on other occasions. In the sixties and seventies of the last century, they thought it would soon be possible to build translation machines endowed with artificial intelligence that would be almost equal to human intelligence. Google Translate shows how sadly they were wrong. They were wrong from the start because they too easily forgot that 30%, 40%, or maybe 50% of human linguistic interaction is governed by what humans know 'about the world', not by linguistic rules.
If I say 'The next day, John went to the railway station and he bought a paper', the person I'm speaking to will conclude that John bought the paper at the station, not afterwards. But if I say 'The next day, John went to see his mother and he bought a paper', nothing indicates when he bought the paper. He may even have bought the paper before going to his mother (= 'The next day, John went to see his mother and he also bought a paper'). (This has nothing to do with any 'vagueness' attached to the simple past of English. In French the situation would be no different, despite the so-called 'preciseness' of the passé simple.)
So the interpretations of certain linguistic utterances, especially sequences of sentences, may have less to do with grammar and the like than with knowledge of the world. And our knowledge of the world is as vast as the universe.

The numbers in this case are "if we have 100 people, we can identify 80 of them"... it's very unclear what that means, and it was with a huge amount of data (1000+ posts per person). What confuses me is what happens when they have 7 billion people to pick from. Do they still get 80% of everyone?
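To make the scale problem concrete, here's a toy sketch, assuming the reported figure means closed-set accuracy (the true author is known in advance to be one of the 100 candidates). The function and numbers below are my own illustration, not anything from the study: the point is just that an accuracy figure is meaningless without the size of the candidate pool, because the random-guess baseline shrinks with it.

```python
def chance_baseline(num_candidates: int) -> float:
    """Probability of picking the right author by uniform random guessing
    from a closed candidate pool."""
    return 1.0 / num_candidates

# With 100 candidates, guessing alone gets 1%, so 80% is 80x better than chance.
print(chance_baseline(100))
# With 7 billion candidates, chance is ~1.4e-10, and nothing in the reported
# result tells us how the same model behaves at that scale.
print(chance_baseline(7_000_000_000))
```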

And... on the general point, you're right, I think. It may be coming in the future (like machine translation did), but this one seems pretty far off, since even people can't do this (not on the scale they're implying, anyway), and humans are better than computers at basically everything language-related.

What I mean is that the tool isn't more than a symbol converter, as far as I can tell. It wouldn't help the researchers increase accuracy any more than just writing the mapping themselves: 3=e, etc.
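A symbol converter of the kind described is a few lines of code. Here's a minimal sketch; the mapping table is a hypothetical example of common leetspeak substitutions, not whatever table the actual tool uses:

```python
# Hypothetical leetspeak-to-letter mapping (3=e, 4=a, etc.).
LEET_MAP = str.maketrans({
    "3": "e",
    "4": "a",
    "1": "l",
    "0": "o",
    "7": "t",
    "5": "s",
})

def deleet(text: str) -> str:
    """Convert common leetspeak digits back to standard letters."""
    return text.translate(LEET_MAP)

print(deleet("l33t sp34k"))  # -> "leet speak"
```

Anyone doing this kind of stylometry research could write this normalization step themselves, which is why the tool adds so little.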