The DCC, or Distributed Checksum Clearinghouse, is an anti-spam content filter
that runs on a variety of
operating systems.
The idea of the DCC is that if mail recipients could compare
the mail they receive, they could recognize unsolicited bulk mail.
A DCC server totals reports of "fuzzy" checksums of
messages from clients and answers queries about the total counts
for checksums of mail messages.

The non-commercial Distributed Checksum Clearinghouse source carries a
license
that is free only to organizations that do not sell filtering devices or
services except to their own users and that participate in the global
DCC network.
ISPs that use DCC to filter mail for their
own users are intended to be covered by the free license.
You can redistribute unchanged copies of the free source, but you may not
redistribute modified, "fixed," or "improved" versions of the source
or binaries.
You also can't call it your own or blame anyone for the results of using it.

Organizations that do not qualify for the free license are welcome to
inquire about licenses for the commercial version by email to
sales@rhyolite.com
or via the
form.
The commercial version supports
DCC
Reputations.

Please note that organizations that do not qualify for the free DCC license
have never been allowed to use the public DCC servers.

Please do not try to use ancient versions of DCC software dating from early
2005 and redistributed by third parties including some Linux packagers.
Those versions do not detect bulk mail as well as more recent versions.
Installations using those old versions also have problems with the
public DCC servers, problems that often make it necessary to add their IP addresses
to the blacklist that protects the public DCC servers.
Even worse, all known Linux redistributions of DCC software have been
changed in ways that break things, including the
libexec/updatedcc shell script that could
otherwise be used to fetch, configure, compile, install, and restart
a current version.

There are no official distributions of DCC binaries,
whether simple a.out files, RPM Package Manager (RPM) packages,
or BSD style ports or packages (pkg).
There are many unofficial sources of DCC binaries, including
Linux RPMs and BSD style packages.

As of 2008, the FreeBSD packages are not too far out of date and
include a working version of the
libexec/updatedcc shell script that
fetches, configures, compiles, installs, and restarts
a current version.

As far as known in 2008, all DCC RPMs offered by Linux distributors
are based on DCC software from 2005 and should not be used.

The UDP packets used by a DCC client to obtain the checksum totals
from a DCC server for a mail message generally use less bandwidth than
the DNS queries required to receive the same message.
A DCC client needs very little disk space.

Bulk messages are usually logged by DCC clients.
On systems receiving a lot of mail, the mechanisms for automatically
creating new log directories every minute, hour, or day
can keep any single log directory from becoming too large.
See the dccm
and
dccproc
man pages.

About 1.4 GBytes/day are exchanged between each pair of DCC servers.
Each server has 3 or 4 peers.
The resulting database is about 3 GBytes with the default expiration
parameters.
However, while dbclean is deleting old checksums,
there are three copies of the database.
The DCC clients and server do not need many CPU cycles,
but the daily executions of dbclean
on a system with a DCC server
require a computer with at least 2 or 3 GBytes of RAM.
As of 2006,
a DCC server prefers 4 GBytes of RAM and can use 6 GBytes.
12 to 18 GBytes of disk space are also needed.

DCC servers used by clients handling 100,000 or more messages per day
need to be larger.
Each additional 100,000 messages/day need about 100 MBytes of disk space
and system memory, given the default expiration used by
dbclean.
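The sizing rule above can be turned into a quick estimate. This is a minimal Python sketch, assuming the roughly 100 MBytes per additional 100,000 messages/day is counted beyond the first 100,000 messages/day and applies equally to disk space and RAM; the function and constant names are hypothetical.

```python
# Hypothetical sizing helper for the rule of thumb quoted above.
# Assumption: the extra ~100 MBytes per 100,000 messages/day is counted
# beyond the first 100,000 messages/day and applies to disk and RAM alike.

BASELINE_MSGS_PER_DAY = 100_000   # load a baseline DCC server is sized for
MBYTES_PER_100K_MSGS = 100        # extra disk space and RAM per 100,000 msgs/day


def extra_mbytes(messages_per_day: int) -> float:
    """Approximate extra MBytes of disk space (and of RAM) beyond the baseline."""
    additional = max(0, messages_per_day - BASELINE_MSGS_PER_DAY)
    return additional / 100_000 * MBYTES_PER_100K_MSGS


if __name__ == "__main__":
    for load in (50_000, 500_000, 2_000_000):
        print(f"{load:>9,} msgs/day -> about {extra_mbytes(load):,.0f} extra MBytes")
```

These figures come on top of the baseline database and disk space described earlier, not instead of them.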

A mail system that processes fewer than 100,000 mail messages per day
uses less of its own bandwidth and the bandwidth of other DCC servers
by using the public
DCC servers.
Each mail message needs a DCC transaction that requires
about 100 bytes, and so 100,000 mail messages/day imply about 10
MBytes/day of DCC client-server traffic. Each DCC server needs to
exchange "floods" or streams of checksums with 4 other servers. Each
flood is currently about 1.4 GBytes/day for a total of about
3 GBytes/day.
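The arithmetic above is easy to check, and it shows why a small site saves bandwidth by using the public servers: a single flood to one peer dwarfs the client traffic of 100,000 messages/day. A minimal Python sketch, assuming decimal MBytes (10**6 bytes); the names are illustrative.

```python
# Back-of-the-envelope DCC client-server traffic, using the ~100 bytes
# per transaction figure quoted above. MBytes here are decimal (10**6 bytes).

BYTES_PER_TRANSACTION = 100


def client_mbytes_per_day(messages_per_day: int) -> float:
    """Approximate DCC client-server traffic in MBytes/day."""
    return messages_per_day * BYTES_PER_TRANSACTION / 1_000_000


if __name__ == "__main__":
    traffic = client_mbytes_per_day(100_000)
    flood_per_peer_mbytes = 1_400   # ~1.4 GBytes/day per flooding peer, as quoted
    print(f"client traffic: about {traffic:.0f} MBytes/day")
    print(f"one flood is about {flood_per_peer_mbytes / traffic:.0f} times larger")
```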

When normally installed by the included Makefiles, DCC clients are
configured to use the
public DCC servers
without any additional configuration except opening firewalls to UDP port 6277.

Mail systems that process more than 100,000 mail messages per day
need local DCC servers connected to the global network of DCC servers.
The public DCC servers include denial of service defenses which
ignore requests in excess of about 240,000 per day per client.
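The per-client cap can be pictured as a simple daily counter. This Python sketch is hypothetical: the class and method names are invented, and it models only the cap, not the progressive delays that dccd also applies.

```python
from collections import defaultdict

# Approximate per-client daily limit quoted above for the public DCC servers.
DAILY_LIMIT = 240_000


class DailyRequestLimiter:
    """Toy limiter that ignores requests from a client beyond a daily cap.

    Hypothetical illustration only: dccd's real defenses also use
    progressive delays and are not implemented this way.
    """

    def __init__(self, limit: int = DAILY_LIMIT) -> None:
        self.limit = limit
        # (client address, day number) -> requests seen that day.
        # A real server would also discard counters from previous days.
        self.counts = defaultdict(int)

    def allow(self, client: str, day: int) -> bool:
        """Count one request and report whether it should be answered."""
        key = (client, day)
        self.counts[key] += 1
        return self.counts[key] <= self.limit


if __name__ == "__main__":
    limiter = DailyRequestLimiter(limit=3)   # tiny limit for the demo
    answers = [limiter.allow("192.0.2.7", day=1) for _ in range(5)]
    print(answers)  # first three answered, the rest ignored
```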

It is wrong to resell the CPU cycles, network bandwidth,
disk space, and, most important, human system administration work of the
public DCC servers.
Vendors of "anti-spam appliances" or similar products
that do not steal from the operators
of the public DCC servers have always run their own DCC servers.

When in doubt or in trouble, the DCC clients, including
dccproc and dccm,
deliver mail. They wait only a little while for a DCC server
to answer before giving up. They then refrain from asking a server for a while
so as not to slow down mail.

If the DCC sendmail interface or milter program, dccm, crashes,
the default parameters in misc/dcc.m4
for the sendmail.cf Xdcc line
tell sendmail to wait only about 30 seconds before
giving up and delivering the mail.

The DCC client code keeps track of the speeds of the
servers it knows about, and uses the fastest or closest.
Every hour or so it re-resolves A records
and checks the speeds of the servers it
is not using. When the current server stops working or gets significantly
slower, the client code switches to a better server.
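The selection policy described above, preferring the fastest server but switching only when the current one stops answering or becomes significantly slower, can be sketched in Python. The switching factor of 1.5 and all names are assumptions for illustration; the real DCC client code measures and weighs round-trip times in its own way.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold for "significantly slower": switch only when the
# current server is more than 50% slower than the best alternative.
SWITCH_FACTOR = 1.5


@dataclass
class Server:
    name: str
    rtt_ms: float  # latest measured round-trip time; math.inf if not answering


def choose_server(servers: list[Server], current: Optional[Server] = None) -> Server:
    """Pick the server to use, keeping the current one unless it is much worse."""
    best = min(servers, key=lambda s: s.rtt_ms)
    if current is None or math.isinf(current.rtt_ms):
        return best          # no server yet, or the current one stopped working
    if current.rtt_ms > best.rtt_ms * SWITCH_FACTOR:
        return best          # current server got significantly slower
    return current           # hysteresis: avoid flapping between similar servers


if __name__ == "__main__":
    near = Server("near.example", 20.0)
    mid = Server("mid.example", 25.0)
    far = Server("far.example", 120.0)
    pool = [near, mid, far]
    print(choose_server(pool).name)               # near.example
    print(choose_server(pool, current=mid).name)  # mid.example (not much slower)
    print(choose_server(pool, current=far).name)  # near.example
```

Keeping a server that is only slightly slower avoids switching back and forth every time the measurements fluctuate.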

Unless given thresholds at which to reject mail,
dccm
and
dccproc do not reject mail.
When dccm is given a threshold by setting DCCM_REJECT_AT in
dcc_conf in the DCC home directory,
DCCM_ARGS can also be set to "-a IGNORE"
so that spam is marked but not rejected.

The nroff source, formatted nroff output, and HTML versions of the
man pages are in the top-level source directory.
Formatted or nroff source is installed by default somewhere in /usr/local/man
depending on the target system.
It may be necessary to add /usr/local/man to the MANPATH environment variable.
Even with that, SunOS 5.7 sometimes has trouble finding them unless
man -F is used.

The DCC can be used with
SpamAssassin as
well as other spam and virus filters.
Note that it is more efficient to arrange to use a DCC client daemon
such as dccm to mark passing mail and check
X-DCC header lines in the filter than to start and run
dccproc on each message.

Some commercial virus and spam filters include DCC clients that
query public DCC servers or DCC servers operated by the filter vendor
and that "flood" or exchange bulk mail checksums with public servers.
Reputable manufacturers of such devices operate their own DCC servers
connected to the global network of DCC servers instead of stealing and then
selling the CPU cycles, network bandwidth, disk space, and, most important,
human system administration efforts of the public DCC servers.

DCC clients including dccproc, dccifd, and dccm can wait as long as
about 16 seconds for an answer from a DCC server.
Except when an anonymous client triggers the progressive delays that are
among the defenses against denial of service attacks in the public DCC servers,
delays are almost always less than 10 seconds.
Delays for DNS blacklists
(see dccifd -B)
are additional.

Dccproc can be used with any mail user
agent that can check mail headers.
For example, WD Baseley sent a
note
to the DCC
mailing list
on how to configure Eudora to
act on X-DCC header lines.

Bharat Mediratta has developed DeepSix for people using mail user agents
on UNIX boxes connected to remote servers such as corporate Exchange servers.
See his
project on SourceForge
as well as his
announcement
in the DCC mailing list.

The public DCC servers accept requests from clients using the
anonymous client-ID.
Incorrectly configured firewalls often cause problems.
Traceroute can be used to send UDP packets to test for interfering firewalls.
See the answer to the firewall question.

After firewalls, the most common cause of problems while trying to
use the public DCC servers is sending too many requests.
The DCC server daemon, dccd, includes
defenses against denial of service or DoS attacks.
Those defenses include progressively delaying responses
and eventually ignoring requests.
The ancient version of the DCC client software included in some
Linux redistributions tries so hard to reach the fastest server
that it can trigger those DoS defenses.

If you run a DCC server, open incoming connections to local TCP port 6277
from your flooding peers,
and outgoing connections to TCP port 6277 on your flooding peers.
Also open UDP port 6277 to IP address 192.188.61.3 for the DCC server status
web page.

Dbclean -R
will usually repair a broken
DCC server database.
However,
if your server is "flooding" or exchanging checksums with other servers,
it is often quicker to stop the DCC server,
delete the
/usr/local/dcc/dcc_db and
/usr/local/dcc/dcc_db.hash files
and restart dccd with the
libexec/start-dccd script.
When dccd starts, it will notice that the database has been purged
and ask its flooding peers to rewind and retransmit their checksums of
bulk mail.

Global dccm
or dccifd
logging can be entirely
disabled by setting DCCM_LOGDIR="" or DCCIFD_LOGDIR="" in the
dcc_conf file in the DCC home directory.
Logging for individual users can be disabled by not creating, or by deleting,
their log directories.
However, this disables not only logging of rejected mail but also logging
of mail that suffered system failures.

To delete old log files, run the
misc/cron-dccd script
daily with an entry like misc/crontab
in the crontab file for the user that runs dccd.
The DBCLEAN_LOGDAYS parameter in the
dcc_conf file in the DCC home directory
specifies the age of old log files.

The most common cause of
thread_create() failed: 11, try again
or pthread_create(): Cannot allocate memory
error messages from dccm
and dccifd
is too small a limit on the maximum number of processes allowed for
the UID running the dccm or dccifd process.
The "maxproc" limit seen with the `limit` or `limits` shell command
should be a dozen or so larger than the sum of
the queue sizes of dccm or dccifd (or both if both are running).
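The rule of thumb for maxproc is simple addition. A minimal Python sketch; the function name, the margin of exactly 12, and the example queue sizes are assumptions. Note that `resource.getrlimit` reports the limit of the user running the script, so run it as the user that runs dccm or dccifd.

```python
# Hypothetical helper for sizing the per-UID process ("maxproc") limit.
try:
    import resource  # Unix-only standard library module
except ImportError:  # e.g. on Windows
    resource = None

MARGIN = 12  # "a dozen or so" processes beyond the dccm/dccifd queues


def recommended_maxproc(dccm_queue: int = 0, dccifd_queue: int = 0) -> int:
    """Suggested minimum maxproc for the UID running dccm and/or dccifd."""
    return dccm_queue + dccifd_queue + MARGIN


if __name__ == "__main__":
    # Hypothetical queue sizes; use the values configured on your system.
    needed = recommended_maxproc(dccm_queue=200, dccifd_queue=100)
    if resource is not None and hasattr(resource, "RLIMIT_NPROC"):
        # This is the limit of the user running this script.
        soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if soft != resource.RLIM_INFINITY and soft < needed:
            print(f"soft process limit {soft} is below the suggested {needed}")
        else:
            print(f"soft process limit looks adequate (suggested {needed})")
    else:
        print(f"suggested maxproc: {needed}")
```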

Dccm or dccifd can fail to create a thread to deal with an incoming
mail message if there are no available file descriptors or
other resources.
Adding -d to DCCM_ARGS or DCCIFD_ARGS in
dcc_conf in the DCC home directory
sends a message to the system log that includes the limit on simultaneous mail
messages and its source, such as a process resource limit on the
number of file descriptors.

Another common limit is the maximum number of file descriptors
allowed by the select system call.
This limit can be escaped by building the sendmail milter library to
use the poll system call.

A nearby server that seems slower than a more distant server will
not be chosen.
The anonymous user delay set with dccd -u
is intended to make a server appear slow to "freeloaders."
The "RTT +/-" value that can be used with
the cdcc add
and cdcc load
operations can be used to force DCC clients to prefer or avoid servers
except when absolutely necessary.

DCC server and client-IDs
serve distinct purposes.
Servers require server-IDs to identify each other in the floods of checksums
they exchange and to recognize authorized users of powerful
cdcc operations such as stop.
DCC servers require client-IDs to identify paying clients that should
be given quicker service than anonymous clients, to refuse reports from
anonymous clients, or to refuse even to answer queries from anonymous
clients.

You have turned on IDS tracing, but do not have a
/usr/local/dcc/ids file that is complete.
You don't need and probably will not have a complete file unless you
are assigning DCC server-IDs.

Redundant paths among DCC servers exchanging
or flooding reports of checksums would cause duplicate entries in
each server's database without the mechanism that depends on every DCC server
having a unique server-ID.
With IDS tracing enabled, dccd complains
about server-IDs that are not listed in the local
/usr/local/dcc/ids file.
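The role of unique server-IDs can be illustrated with a toy receiver that applies each flooded report only once, no matter how many peers forward it. This Python sketch is hypothetical: the `serial` field and the class names are inventions for illustration, not dccd's actual flood format.

```python
from typing import NamedTuple


class Report(NamedTuple):
    server_id: int  # ID of the server that first accepted the report
    serial: int     # hypothetical per-server sequence number
    checksum: str
    count: int


class FloodReceiver:
    """Toy model of duplicate suppression keyed by the originating server-ID."""

    def __init__(self) -> None:
        self.seen = set()    # (server_id, serial) pairs already applied
        self.totals = {}     # checksum -> accumulated count

    def receive(self, report: Report) -> bool:
        """Apply a flooded report once; return False for a duplicate."""
        key = (report.server_id, report.serial)
        if key in self.seen:
            return False     # same report arriving over a redundant path
        self.seen.add(key)
        self.totals[report.checksum] = self.totals.get(report.checksum, 0) + report.count
        return True


if __name__ == "__main__":
    rx = FloodReceiver()
    r = Report(server_id=1001, serial=1, checksum="abc", count=5)
    print(rx.receive(r))     # True: first copy, via one peer
    print(rx.receive(r))     # False: second copy, via another peer
    print(rx.totals["abc"])  # 5
```

In this model, two servers sharing a server-ID could produce colliding keys, so distinct reports would be wrongly discarded as duplicates.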

A common cause of such problems is one of the DCC server's
defenses against denial of service attacks.
A DCC server cannot know anything about anonymous clients,
or clients using client-ID 1 or without a client-ID and matching password
from the /usr/local/dcc/ids file.
As far as your server can know, an anonymous client sending many
operations is run by an unhappy sender of unsolicited bulk mail trying
to flood your server with a denial of service attack.
It is easy to tell your client its ID with the
cdcc add
or load operations.

The default limits can be changed by
adding a dccd -R argument
to DCCD_ARGS in the
dcc_conf file in the DCC home directory.

Dccm is usually configured to log mail with recipient counts greater
than the -t ,log-thold,
as well as mail with some conflicts among
whitelist entries.
Each log file contains a single message, its checksums, its disposition,
and other information as described in the
dccm man page.

You are probably not seeing false positives.
The Distributed Checksum Clearing Houses detect both solicited
and unsolicited bulk mail, while spam is only unsolicited bulk email.
For your DCC client, dccm,
dccifd, or
dccproc, to know to ignore bulk mail messages
that are solicited, it must be told by entries in the main or a per-user
whitelist or whiteclnt file.

There is probably no mistake.
The DCC detects bulk mail, not only unsolicited bulk mail.
Whether a bulk message is spam depends on whether you solicited or asked for it.
Some Internet service providers have sent literally millions of
acknowledgments of spam reports, which makes them bulk mail.
Bulk mail you want to receive should be
whitelisted
in your master or per-user
whiteclnt file.

If the DCC client was not able to compute a checksum for a message,
it will not ask the server about that checksum and the checksum will
not appear in the X-DCC header.
For example, if dccproc is not told and
cannot figure out the IP address of the source of the message,
that checksum will be missing.
The Fuz1 and Fuz2 checksums cannot be computed for
messages that are too small, and so will be missing for them.
A checksum will also be missing if the DCC server is configured to not count
it.

The client whitelist files
used by
dccproc,
dccm,
and
dccifd
are generally required.
Client whitelists apply only to the stream of mail handled by the
DCC client,
while server whitelists apply to reports of mail from all DCC clients
of the DCC server.

Dccproc is intended for use by individual users
with programs such as
procmail.
Because dccproc is as likely to be used with a private whitelist
as with the global whiteclnt file usually found in the DCC home directory,
the file name must be explicitly specified with
dccproc -w whiteclnt.
A perhaps inconvenient implication is that programs such as
SpamAssassin that
switch unpredictably between dccproc and dccifd
might get inconsistent results unless they invoke dccproc with the global
whiteclnt file.

Start by monitoring bulk mail in the
global log directories specified with
dccproc -l
and with DCCM_LOGDIR and DCCM_USERDIRS in the
/usr/local/dcc/dcc_conf file
for dccm
and
dccifd.
Then add entries to whitelist files.

Per-user whitelists in whiteclnt files
specified with DCCM_USERDIRS in the
/usr/local/dcc/dcc_conf file
are easily maintained with ordinary text editors by the system administrator.
However, it is often better to let individual users deal with their
own whitelists.
The DCC source includes sample CGI scripts
in the cgi-bin directory in the DCC source
to let individual end-users monitor their private logs of bulk mail
and their individual whitelists.
See the README file for those scripts.
There is also a
demonstration
of the cgi scripts.

An easy way to test a DCC client whitelist or
whiteclnt file
is to feed dccproc a test message.
For example, to test whether the IP address
127.0.0.1
and the SMTP envelope Mail_From value postmaster@example.com are in the
whiteclnt file in the DCC home directory, give dccproc a small message
along with -a 127.0.0.1, -f postmaster@example.com, and -w whiteclnt.
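The whitelist test described above can be scripted. This minimal Python sketch builds the dccproc command line and, if dccproc is on the PATH, feeds it a small test message; the output is expected to be the message with an X-DCC header added. The -a, -f, and -w options are described in this document. -Q (query without reporting, so the test message does not inflate counts on the server) is taken from the dccproc man page and should be checked against your version. The message headers are placeholders.

```python
import shutil
import subprocess

# Placeholder test message; only the envelope values passed on the
# command line matter for the whitelist test.
TEST_MESSAGE = (
    b"Message-ID: <whiteclnt-test@example.com>\n"
    b"From: postmaster@example.com\n"
    b"Subject: whiteclnt test\n"
    b"\n"
    b"This is a test message.\n"
)


def dccproc_command(ip: str, env_from: str, whiteclnt: str = "whiteclnt") -> list[str]:
    """Build a dccproc command line for checking whitelist entries."""
    return [
        "dccproc",
        "-Q",             # assumption: query only, do not report the message
        "-a", ip,         # IP address of the SMTP client
        "-f", env_from,   # SMTP envelope Mail_From value
        "-w", whiteclnt,  # whitelist file in the DCC home directory
    ]


if __name__ == "__main__":
    cmd = dccproc_command("127.0.0.1", "postmaster@example.com")
    if shutil.which("dccproc") is None:
        print("dccproc not found; would run:", " ".join(cmd))
    else:
        result = subprocess.run(cmd, input=TEST_MESSAGE, capture_output=True)
        # Expected: the message copied to stdout with an X-DCC header added.
        print(result.stdout.decode(errors="replace"))
        print(result.stderr.decode(errors="replace"))
```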

No, regular expressions cannot be used,
because DCC client and server whitelists are converted to lists of checksums.
The same basic idea is used for DCC client whitelists
as for the DCC protocol.
A DCC client computes the checksums for a message, and then looks
for those checksums in the local whitelist.
Depending on the values associated with those checksums,
the DCC client asks a DCC server about them.
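The idea can be shown with a toy whitelist that stores only checksums. In this Python sketch, MD5 over the checksum type and the exact value is an assumption for illustration, not the DCC's checksum format, and the entry types are simplified. Because only checksums are kept, there is nothing a regular expression could match against; a value is found only if it is identical to a whitelisted value.

```python
import hashlib


def checksum(kind: str, value: str) -> str:
    """Toy checksum of one header or envelope value (assumed format)."""
    return hashlib.md5(f"{kind}:{value}".encode("utf-8")).hexdigest()


def build_whitelist(entries):
    """Map checksum -> action from (action, kind, value) entries."""
    return {checksum(kind, value): action for action, kind, value in entries}


def whitelist_actions(message_values, whitelist):
    """Return actions for matching checksums; empty means ask the DCC server."""
    actions = []
    for kind, value in message_values:
        ck = checksum(kind, value)
        if ck in whitelist:
            actions.append(whitelist[ck])
    return actions


if __name__ == "__main__":
    wl = build_whitelist([
        ("ok", "env_from", "jsmith@example.com"),
        ("many", "message-id", "<>"),
    ])
    print(whitelist_actions([("env_from", "jsmith@example.com"),
                             ("ip", "192.0.2.1")], wl))                # ['ok']
    print(whitelist_actions([("env_from", '"J.Smith" <jsmith@example.com>')], wl))  # []
```

The second lookup finds nothing even though the mailbox is the same, because the checksummed value is not identical.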

To use regular expressions with the DCC, consider procmail.
Procmail is included with many UNIX-like systems.
See also the
Procmail Homepage.

DCC clients can be configured to white- or blacklist
using so-called "substitute" headers.
See dccproc -S or
dccm -S.

It is also possible to use sendmail access_db file entries to
white- or blacklist based on portions of SMTP envelope and
client IP addresses.
For example, an access_db file line of "From:example.com OK"
can be used to tell dccm to whitelist all mail from SMTP clients
in the example.com domain.
See the -O argument to the
misc/hackmc script.

Start by determining an envelope value or SMTP header that distinguishes
the bulk mail from a sample message or DCC log file.
The name of the sending computer is the mail_host value in
dccm log files.
If the distinguishing header or envelope value is not among the main
DCC whitelist values,
then a "substitute" value must be used.
An "ok substitute ..." line must be added to the whitelist file
and the DCC client program must be told with
dccproc -S or
dccm -S.
There are example whitelist entries in the sample
/usr/local/dcc/whiteclnt file.

There are several points during an SMTP transaction when an SMTP server
can reject a mail message.
Early points are when the SMTP client specifies the recipients of the
mail message.
The last point is after the entire message has been received by the SMTP
server.
Spam filters that check mail message bodies must wait until that last point.
The SMTP protocol does not allow an SMTP server to reject the
mail message for only some recipients.
The SMTP server must tell the SMTP client that the message has been
accepted for all recipients or rejected for all recipients.
This is a problem when the recipients of a single mail message have
differing
DCC thresholds or other parameters
in their individual whitelist files
that require that the mail message be delivered to some mailboxes but
rejected for other mailboxes.

The DCC client programs solve this conflict in one of two ways.
One is telling the SMTP client
that the mail message has been accepted for all recipients and then
discarding instead of delivering the message for mailboxes with parameters
that make it spam.
This solution has the disadvantage of not informing senders of the
refusal to deliver the message.
The other solution is to temporarily reject recipients with possibly
incompatible parameters early in the SMTP transaction with the same
SMTP error status number as too many recipients for a single SMTP transaction.
This second solution has the advantage of ensuring that senders know
when their mail is rejected but the disadvantage of sometimes
requiring as many SMTP transactions as there are recipients for a mail message.

Which solution is used is determined by the
forced-discard-ok
and forced-discard-nok
settings in the global and per-user
whiteclnt files.
Unless all recipients for a mail message agree on the first solution,
perhaps by forced-discard-ok in the main
whiteclnt file,
the second solution is used.
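The choice between the two solutions can be sketched as a per-transaction decision. This Python model is hypothetical: it reduces each recipient's whiteclnt settings to a `params` tuple and a forced-discard flag and compares parameters with simple equality, whereas the DCC clients' actual compatibility test is more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipient:
    address: str
    params: tuple            # hypothetical: thresholds etc. from the user's whiteclnt
    forced_discard_ok: bool  # True for forced-discard-ok, False for forced-discard-nok


def plan_recipients(recipients):
    """Split RCPT TO recipients into (accepted, temporarily_rejected)."""
    if all(r.forced_discard_ok for r in recipients):
        # First solution: accept everyone now and later discard the
        # message for mailboxes whose parameters make it spam.
        return list(recipients), []
    # Second solution: accept only recipients whose parameters match the
    # first accepted recipient; temporarily reject the rest so the SMTP
    # client retries them in a later transaction.
    accepted, deferred = [], []
    for r in recipients:
        if not accepted or r.params == accepted[0].params:
            accepted.append(r)
        else:
            deferred.append(r)
    return accepted, deferred


if __name__ == "__main__":
    a = Recipient("a@example.com", ("reject-at", 10), True)
    b = Recipient("b@example.com", ("reject-at", 10), False)
    c = Recipient("c@example.com", ("reject-at", 50), True)
    accepted, deferred = plan_recipients([a, b, c])
    print([r.address for r in accepted])  # ['a@example.com', 'b@example.com']
    print([r.address for r in deferred])  # ['c@example.com']
```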

There are several possible causes of such problems.
The first and most obvious is that the mail is solicited bulk mail
and that the source needs to be added to your
whitelist.

Another possible reason is that your individual legitimate mail messages
have not been marked as spam because their Body or Fuz1
checksum counts are small, but that the IP address or other checksum
counts are large.
The IP address checksum count, for example, is the total of all reports
of addressees for that checksum.
That total is independent of the other checksums, and so counts
all reports for all messages with that source IP address.
A source of legitimate mail that has sent a message that was reported
as spam by one of its recipients will often have the totals
for the checksums of its IP address, From header, and
other values be MANY.
This is why it usually does not make sense to reject mail based on what the
DCC server reports for the IP address, From header, and other values that
are not unique to the message.
Only the last Received header line, the Message-ID line, and the body checksums
can be expected to be unique, and even the Message-ID and Received header
lines sometimes are not.
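The independence of the totals is easy to see in a toy tally. In this Python sketch, placeholder strings stand in for real checksums and the message counts are made up; it shows how a sender of many unique legitimate messages accumulates a large IP address total while each Body total stays at 1.

```python
from collections import Counter


def tally(reports):
    """Accumulate per-checksum totals from (checksums, addressees) reports.

    Each checksum's total is independent of the others, so a value shared
    by many messages (such as the source IP address) keeps growing even
    when each message body is unique.
    """
    totals = Counter()
    for checksums, addressees in reports:
        for kind, value in checksums.items():
            totals[(kind, value)] += addressees
    return totals


if __name__ == "__main__":
    # 1,000 distinct legitimate messages, one recipient each, all from one
    # source. Placeholder strings stand in for real checksums.
    reports = [
        ({"IP": "192.0.2.10", "From": "news@example.com", "Body": f"body-{i}"}, 1)
        for i in range(1000)
    ]
    totals = tally(reports)
    print(totals[("IP", "192.0.2.10")])   # 1000
    print(totals[("Body", "body-7")])     # 1
```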

A common cause for that and similar complaints involves
null or missing Message-ID header lines.
Spam often lacks Message-ID lines or has a null or "<>" ID,
so rejecting mail with null or missing Message-IDs can be an
effective filter.
DCC clients treat missing Message-ID lines as if they were present but null.
The sample /usr/local/dcc/whiteclnt whitelist file in the DCC source
includes the line:

many message-id <>

Some Mail Transfer Agents violate section 3.6.4 of RFC 2822 and
do not include Message-ID header lines in mail they send,
including some combinations of qmail and
"sendmail -bs" acting as the originating MTA,
and qmail by itself when it generates a non-delivery message or "bounce."
Solutions to this problem include removing that line from your
whitelists
or adding lines specifying the From or envelope
from values of senders of legitimate mail lacking Message-ID header lines.

Yes, dccproc can whitelist mail
by the IP address of the immediately
preceding SMTP client,
but only if it knows that IP address.
Unless the dccproc -a
or dccproc -R
options are used, dccproc does not know the IP address.

DCC checksums are of the entire header line or envelope value.
An entry in the whitelist file for jsmith@example.com
will have no effect on mail with an envelope value of
"J.Smith" <jsmith@example.com>.
The file must contain "J.Smith" <jsmith@example.com>.

Another common cause for this problem is implied by the fact that
for an env_from whitelist entry
to have any effect, dccproc must be able to find the envelope value
in the message in a Return-Path header,
an old UNIX-style From_ header, or an -f argument.
If your mail delivery agent does not add a Return-Path header
and you do not use
dccproc -f,
then dccproc cannot know about
white or blacklist entries for envelope return addresses.

Note also that dccproc has no whitelist by default and
that dccproc -w
must be used.

It is possible to delete checksums from the distributed DCC
database with the
cdcc delck
operation.
However, it is not worth the trouble.
Unless the same (as far as the fuzzy checksums are concerned) message
is sent again, no one is likely to notice the mistake before the
reports of the message's checksums expire from the DCC servers'
databases for lack of repetition.

Sendmail decisions to accept, reject, or discard mail are largely
independent of the decisions made by dccm.
The DCC equivalent is to add
env_to entries to the
dccm whitelist.
See the sample /usr/local/dcc/whiteclnt file in the
DCC source.

However, if your sendmail.cf file sets the
dcc_notspam macro while processing the
envelope, then the message will be whitelisted.
This is related to the dcc_isspam macro
used by sendmail.cf modified by misc/hackmc -R
to tell dccm to report blacklisted messages as spam to the DCC server.

To whitelist all mail addressed to mailboxes in a domain,
add the following line to the sendmail access_db file and rebuild
the database with the sendmail tool, makemap:

To:domain.com DCC:OK

You can apply finer control by adding
a third argument to the FEATURE(dcc) macro in your sendmail.mc file
as described in
misc/dcc.m4.
All mail for the domain can use a single "per-user"
whiteclnt file,
often in the /usr/local/dcc/userdirs/esmtp/example.com directory, where /usr/local/dcc/userdirs
is the default value for DCCM_USERDIRS in the DCC configuration file
/usr/local/dcc/dcc_conf.
Making /usr/local/dcc/userdirs/esmtp a symbolic link to /usr/local/dcc/userdirs/local
can be handy.

Reports of checksums with
whitelist
entries in your server's database are not flooded to its peers.
The checksums of messages whitelisted with entries in local
dccm or dccproc
whitelists are not reported to DCC servers.
It is good to add entries to DCC server and client
whitelists
for localhost, your IP address blocks, and your domains if
you know that none of your users will ever send spam.

However, in the common mode in which the DCC is used, no
checksums of mail are pollution.
Checksums of genuinely private mail will have target counts of
1 or a small number, and so will not be flooded by your server to
other servers.
Strangers will not see your private mail and so will not be able
to ask any DCC server about the checksums of your private mail.
On the other hand, the DCC functions best by collecting reports
of the receipt of bulk mail as soon as possible.
That implies that it is generally desirable
to send reports of all mail to a DCC server.
The DCC flooding protocol does not send checksums with counts
below 10
to other servers.
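The two rules that keep private mail local, no flooding of whitelisted checksums and no flooding of counts below 10, amount to a simple filter. A minimal Python sketch with placeholder checksum names; dccd's actual flooding logic is more involved.

```python
FLOOD_THRESHOLD = 10  # counts below this are not flooded to peers


def checksums_to_flood(local_counts, whitelisted=frozenset()):
    """Return the checksums a server would pass on to its flooding peers.

    Toy model: whitelisted checksums and checksums with counts below
    FLOOD_THRESHOLD stay local.
    """
    return {
        ck: n
        for ck, n in local_counts.items()
        if n >= FLOOD_THRESHOLD and ck not in whitelisted
    }


if __name__ == "__main__":
    counts = {"private-note": 1, "newsletter": 25, "borderline": 10, "our-domain": 40}
    print(checksums_to_flood(counts, whitelisted={"our-domain"}))
    # {'newsletter': 25, 'borderline': 10}
```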

A spam trap is a mail address that should practically
never receive legitimate mail,
and that treats any mail that it does receive as spam.
A spam trap might be a common name such as
user1 that has never been valid
and that advertisers who send unsolicited bulk email
discover through dictionary attacks or guessing.
It might instead be an address hidden in a web page
or a mailbox of an account that has been disabled for many months.

Any spam trap might receive legitimate mail.
For example, a spam trap that differs from an ordinary mailbox by a
single character might receive mail intended for the ordinary mailbox.
It might be best for a system to reject mail sent to such a trap so
that legitimate mail senders know that their messages have gone astray.
A mailbox that is a long string of arbitrary letters and digits is much
less likely to receive legitimate messages and so might best accept
all messages without complaint.

will accept a message on STDIN,
look for the IP address of the sender among
Received: SMTP fields,
report the message to the DCC server as spam and the IP address as the sender,
and exit with the default value of
dccproc -x.

dccif-test

dccif-test was written to test the interface to the DCC interface daemon,
dccifd.
When wired to a spam trap, it is more efficient than dccproc.
For example,

The best way to build a spam trap is with a
per-user whiteclnt file
with an
option spam-trap-accept or option spam-trap-reject
line.

With sendmail, virtual user mapping can be used to send mail to invalid
mailboxes to a single mailbox whose corresponding DCC per-user
whiteclnt file contains an
option spam-trap-accept or option spam-trap-reject
line.

A single flooding peer delivers all reports of checksums of bulk
mail seen by any DCC server. Additional peers provide reports
sooner and so help the clients of a peer detect spews of spam sooner.
However, more peers will cause more reports to be duplicates.

A DCC server in a network of many servers should have at least three
flooding peers to ensure that the failure of a single server or network
link cannot partition the network.
Limiting the number of peers of any server to four or perhaps
a few more ensures that no single server is critical to the network.
To minimize the distances in the network, four peers
per server seem necessary.
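Whether a given set of flooding links meets that goal, that no single server or link failure partitions the network, can be checked mechanically. A minimal Python sketch; the server names and topologies are made up for illustration.

```python
from collections import deque
from itertools import combinations


def is_connected(nodes, edges) -> bool:
    """Breadth-first search: can every node reach every other node?"""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:   # ignore links to failed servers
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        n = queue.popleft()
        for m in adj[n] - seen:
            seen.add(m)
            queue.append(m)
    return seen == nodes


def survives_single_failure(nodes, edges) -> bool:
    """True if losing any one server or any one link leaves the rest connected."""
    nodes = set(nodes)
    edges = [tuple(e) for e in edges]
    for down in nodes:
        if not is_connected(nodes - {down}, edges):
            return False
    for i in range(len(edges)):
        if not is_connected(nodes, edges[:i] + edges[i + 1:]):
            return False
    return True


if __name__ == "__main__":
    chain = {"a", "b", "c"}
    chain_links = [("a", "b"), ("b", "c")]
    mesh = {"a", "b", "c", "d"}
    mesh_links = list(combinations(sorted(mesh), 2))     # every server has 3 peers
    print(survives_single_failure(chain, chain_links))   # False: losing b splits a from c
    print(survives_single_failure(mesh, mesh_links))     # True
```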

An organization with more than one server can be viewed as a single
server by other organizations, with its servers flooding each other
and external peers spread among its servers.
This protects the network should the organization suffer large scale problems
while protecting the organization from single points of failure.

No, you do not need to and generally should not tell other DCC server
operators the passwords for controlling your server with
the cdcc command.
Every inter-server flood of checksums is authorized by lines in
each server's /usr/local/dcc/flod file
and authenticated by the password associated with the
passwd-ID in those lines.
The passwd-ID is a server-ID
defined in the /usr/local/dcc/ids file
that should generally be used only to authenticate floods of checksums.