> My question is, does anyone have any bright ideas for useful, simple
> content analysis attributes? As it's a statistical/ML approach, I'm
> trying to come up with ideas that are as generic as possible. So far
> I'm calculating things like session data entropy, the most frequent
> character, and counts of certain characters.
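The attributes listed in the question can be computed per session payload roughly as follows. This is a minimal sketch, assuming the session data is available as raw bytes; the function name `payload_features` and the default set of tracked characters are hypothetical choices for illustration, not anything from the original thread.

```python
import math
from collections import Counter


def payload_features(payload: bytes, tracked: bytes = b"%/<>'\"") -> dict:
    """Compute a few simple content attributes for one session payload.

    `tracked` is a hypothetical, illustrative set of characters to count.
    """
    counts = Counter(payload)  # iterating over bytes yields ints 0..255
    n = len(payload)
    # Shannon entropy in bits per byte (0.0 for an empty payload).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    # Most frequent byte value; ties go to the value seen first. None if empty.
    most_common = counts.most_common(1)[0][0] if n else None
    # Occurrences of each tracked character (Counter returns 0 for missing keys).
    tracked_counts = {chr(b): counts[b] for b in tracked}
    return {
        "length": n,
        "entropy": entropy,
        "most_common": most_common,
        "tracked": tracked_counts,
    }


features = payload_features(b"GET /index.html?q=%3Cscript%3E HTTP/1.1\r\n")
print(features["entropy"], features["most_common"], features["tracked"])
```

Entropy is measured over byte values rather than characters, so it works unchanged for binary protocols as well as text.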

The IDS literature is full of techniques (both deterministic and
stochastic, or ML-based) for modeling "good" traffic, and they may
inspire your project.
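As one illustration of modeling "good" traffic from payload content, here is a minimal sketch of a per-byte frequency profile: learn the mean and standard deviation of each byte value's relative frequency over known-good payloads, then score new payloads by how far they deviate. This is one simple technique chosen for illustration, not a specific published method; the function names and the smoothing constant `alpha` are hypothetical.

```python
import statistics


def byte_freq(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    freq = [0.0] * 256
    for b in payload:
        freq[b] += 1
    n = len(payload)
    return [f / n for f in freq] if n else freq


def train_profile(normal_payloads: list[bytes]) -> tuple[list[float], list[float]]:
    """Per-byte mean and population std of relative frequency over 'good' payloads."""
    vectors = [byte_freq(p) for p in normal_payloads]
    columns = list(zip(*vectors))  # 256 columns, one per byte value
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def anomaly_score(payload: bytes, means: list[float], stds: list[float],
                  alpha: float = 0.001) -> float:
    """Sum of per-byte deviations from the profile, each scaled by (std + alpha).

    alpha is a hypothetical smoothing constant that avoids division by zero
    for byte values never seen (or never varying) in the training data.
    """
    x = byte_freq(payload)
    return sum(abs(xi - m) / (s + alpha) for xi, m, s in zip(x, means, stds))


normal = [b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1",
          b"GET /news.html HTTP/1.1"]
means, stds = train_profile(normal)
print(anomaly_score(b"GET /home.html HTTP/1.1", means, stds))
print(anomaly_score(b"\x90\x90\x90\x90\x31\xc0\x50\x68", means, stds))
```

A real deployment would typically keep separate profiles per port or per payload length and pick a threshold from held-out normal traffic; those choices are left open here.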

I don't have the exact references with me, but a quick Google Scholar
search for terms like "tcp" "anomaly" "payload", restricted to 2003
through 2006 (when anomaly-based NIDS were a hot topic), should turn up
the main contributions.

I suspect there is still some room for improving on the existing
approaches.

Cheers,

-- Fede
