Cleanfeed - Regular Expressions

Overview

Apart from the EMP-type filters, regular expressions form the foundation of
every other type of filter in Cleanfeed. You don't need to understand them
fully in order to run Cleanfeed, but at least a basic knowledge is a real
advantage if you wish to customise its behaviour.

Regular Expression Resources

There are dozens of good regular expression resources on the World Wide Web,
and a search is a good way to locate them. The official Perl manual page
(perlre) is a good reference but isn't really intended as a tutorial for
beginners. The official Perl tutorial (perlretut) is excellent, as are many
of the other hits your search will uncover.

Regex Details

The following table doesn't include the defaults for the regexes, as many of
them are very long indeed. I'd recommend browsing through the Cleanfeed Perl
file and checking them there. If you think something is missing from the
defaults that would improve the filter for other users, please email me.

Binaries are not allowed in groups matching this, even if they are defined in
bin_allowed. This lets bin_allowed define a broad hierarchy of groups, with
specific groups then excluded by this option.
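
The allow-then-exclude relationship described above can be sketched as
follows. This is only an illustration in Python, not Cleanfeed's own Perl
code; both patterns and the function name are hypothetical and are not
Cleanfeed defaults.

```python
import re

# Both patterns are hypothetical examples, not Cleanfeed defaults.
BIN_ALLOWED = re.compile(r"^alt\.binaries\.")  # broad hierarchy that may carry binaries
BIN_EXCLUDED = re.compile(r"\.d$")             # discussion groups carved out of it


def binaries_permitted(group: str) -> bool:
    """Return True only if the group is allowed and not explicitly excluded."""
    if BIN_EXCLUDED.search(group):
        return False  # the exclusion wins, even inside an allowed hierarchy
    return bool(BIN_ALLOWED.search(group))


for name in ("alt.binaries.pictures", "alt.binaries.pictures.d", "comp.lang.perl.misc"):
    print(name, binaries_permitted(name))
```

Checking the exclusion first keeps the rule easy to read: a broad allow
pattern never needs to be rewritten just to carve out a few groups.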

If block_mime_html is turned on, groups matching this will be excluded from
the filter. Where an article is crossposted, all the groups must match for
the article to be excluded. By default, no groups are defined as accepting
MIME HTML.
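
The "all the groups must match" rule for crossposted articles can be sketched
like this. It is a minimal Python illustration rather than Cleanfeed's actual
Perl; the pattern and the helper names are assumptions made for the example.

```python
import re

# Hypothetical pattern; by default no groups are defined as accepting MIME HTML.
HTML_GROUPS = re.compile(r"^example\.html\.")


def parse_newsgroups(header: str) -> list:
    """Split a comma-separated Newsgroups header into group names."""
    return [g.strip() for g in header.split(",") if g.strip()]


def html_exempt(newsgroups_header: str) -> bool:
    """True only when every group in the header matches the pattern."""
    groups = parse_newsgroups(newsgroups_header)
    return bool(groups) and all(HTML_GROUPS.search(g) for g in groups)


print(html_exempt("example.html.design"))            # single matching group
print(html_exempt("example.html.design,misc.test"))  # one group does not match
```

A single non-matching group in the crosspost is enough to keep the article
subject to the filter.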

Messages posted (or crossposted) to a group matching this regex will be
subject to the PHR EMP filter. By default, no groups are defined as
high-risk; it's up to the operator to identify them when a flood occurs.
Usually this happens by way of abuse complaints, as the operator can't watch
tens of thousands of newsgroups.

Distributions in which all the groups match this regex are allowed a much
higher level of supersedes than normal groups, because FAQ postings often
supersede previous versions of the same FAQ. This parameter is ignored unless
do_supersedes_filter is enabled.

This regex matches against the NNTP-Posting-Host header. When a match occurs,
the posting host will not be used to seed the PHN or PHR filters. In these
instances, if phr_aggressive or phn_aggressive is True, the right-most FQDN
in the Path header will be used instead. This protects against floods from
services (such as newsguy.com) that place unlinkable data in the
NNTP-Posting-Host header.
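
The fallback described above might look roughly like this. It is a Python
sketch under stated assumptions, not Cleanfeed's Perl implementation: the
pattern, the simplified FQDN test, the function name and the single
"aggressive" flag are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern for hosts whose NNTP-Posting-Host data is unlinkable.
UNLINKABLE_HOSTS = re.compile(r"newsguy\.com$", re.IGNORECASE)

# Simplified FQDN test (an assumption): dotted labels ending in an alphabetic TLD.
FQDN = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$", re.IGNORECASE)


def flood_seed(posting_host: str, path: str, aggressive: bool) -> Optional[str]:
    """Pick the value used to seed a posting-host based flood filter."""
    if not UNLINKABLE_HOSTS.search(posting_host):
        return posting_host  # normal case: the posting host is usable
    if not aggressive:
        return None  # matched, and no fallback was requested
    # Walk the Path header from the right and take the first FQDN found.
    for element in reversed(path.split("!")):
        element = element.strip()
        if FQDN.match(element):
            return element
    return None


path = "news.example.net!feed.example.org!spool.newsguy.com!not-for-mail"
print(flood_seed("p123.newsguy.com", path, aggressive=True))
```

Trailing Path elements such as not-for-mail are skipped because they contain
no dots and so fail the FQDN test.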

By default this regex is empty. Groups that match it will trigger topic
filters that work in the same way as meow_groups.

Example: The operator may elect to limit crossposting from adult content
groups to non-adult groups. This could be done by defining:
topic1_groups => '\.sex'
This would limit crossposts between groups matching .sex and those that don't.
See also off_topic_maxgroups and on_topic_mingroups.
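
To see which groups the example pattern selects, the crosspost list can be
split into matching and non-matching groups. The following Python sketch only
illustrates the matching step; the function name is hypothetical, and how
Cleanfeed then applies its limits is not shown here.

```python
import re

TOPIC1_GROUPS = re.compile(r"\.sex")  # the pattern from the example above


def split_by_topic(newsgroups_header: str):
    """Return (on_topic, off_topic) lists for a Newsgroups header."""
    groups = [g.strip() for g in newsgroups_header.split(",") if g.strip()]
    on_topic = [g for g in groups if TOPIC1_GROUPS.search(g)]
    off_topic = [g for g in groups if not TOPIC1_GROUPS.search(g)]
    return on_topic, off_topic


on_topic, off_topic = split_by_topic("alt.sex.stories,rec.pets.cats,misc.test")
print("on topic: ", on_topic)
print("off topic:", off_topic)
```

Because the pattern is unanchored, it would also match a made-up name such as
alt.sextants. A tighter pattern such as '\.sex(\.|$)' is one way to restrict
the match to a whole name component.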

Unlike all the other regexes, this one is held in a Perl hash, keyed by a
friendly name, with the regex as the value. Distributions with a group that
matches one of these regexes will be rejected if they also include a group
that doesn't match it.
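
The rule above (reject when some, but not all, groups match an entry) can be
sketched with a dictionary standing in for the Perl hash. This is an
illustrative Python sketch only; the friendly names, patterns and function
name are hypothetical.

```python
import re

# Hypothetical entries: friendly name -> regex for the restricted groups.
RESTRICTED_GROUPS = {
    "example-local": re.compile(r"^example\.local\."),
    "example-jobs": re.compile(r"\.jobs(\.|$)"),
}


def rejecting_rules(newsgroups_header: str) -> list:
    """Return the friendly names of entries that would reject this crosspost."""
    groups = [g.strip() for g in newsgroups_header.split(",") if g.strip()]
    names = []
    for name, pattern in RESTRICTED_GROUPS.items():
        matches = [bool(pattern.search(g)) for g in groups]
        # Reject only a mix: at least one group matches and at least one does not.
        if any(matches) and not all(matches):
            names.append(name)
    return names


print(rejecting_rules("example.local.chat,misc.test"))
print(rejecting_rules("example.local.chat,example.local.news"))
```

Keying each regex by a friendly name makes it possible to report which
restriction caused a rejection.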