Meanwhile, FinCEN's production system continues to use artificial intelligence algorithms developed in the 1990s.

In a study of three commercial data mining products, FinCEN researchers have detected patterns in a database of suspected crimes. Eventually they might use their results as templates to uncover cases of potential fraud in routine cash transaction reports.

Criminal enterprises transform their ill-gotten proceeds into clean funds in ways that have become increasingly difficult to detect as financial systems grow more sophisticated. Successful criminals can afford to hire computer experts to assist them.

The Bank Secrecy Act of 1970 requires banks, casinos and other institutions to report large transactions of U.S. and foreign currency to Treasury.

Since 1996, Treasury also has required financial institutions to use a three-page form called the Suspicious Activity Report to describe potential embezzlement, money laundering, check kiting, loan fraud or other crimes.

The form has a space for an account of the suspect activity, a category of information not present on other Bank Secrecy Act reporting forms. The free-format data, which can be cryptic, detailed or nearly unintelligible, presents challenges to data miners.

Last fall FinCEN completed the first phase of its data mining study. Wong, the project's manager, said it piloted three tools: Darwin from Oracle Corp., Clementine from SPSS Inc. of Chicago and SGI's MineSet. Two contractors, Visual Analytics Inc. of Poolesville, Md., and Nautilus Systems Inc. of Fairfax, Va., worked with FinCEN on the project.

Christopher Westphal, Visual Analytics' chief executive officer, said each suspicious activity record contains 70 data fields. Many are fill-in-the-blank responses, but several are yes-or-no or multiple-value check boxes.

Westphal said he and his colleagues were not looking to identify those responsible in individual cases of possible fraud. Rather, they wanted to detect general patterns and trends in the nearly 400,000 suspicious-activity forms filed since 1996.

"Somebody who is not playing by the rules may be a little bit more erratic," he said. For example, a customer might deposit unusually large amounts of money into multiple accounts.
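The article does not say how FinCEN defines "unusually large," so the following Python sketch only illustrates a rule of that kind. It assumes a hypothetical list of (customer, account, amount) deposit records and a fixed dollar threshold chosen by the analyst, and it flags any customer whose large deposits land in two or more distinct accounts.

```python
from collections import defaultdict

def flag_spread_deposits(deposits, large_amount, min_accounts=2):
    """Return customers whose large deposits went into several distinct accounts.

    deposits: iterable of (customer_id, account_id, amount) tuples (hypothetical layout).
    large_amount: hypothetical dollar threshold for a "large" deposit.
    min_accounts: how many distinct accounts must receive large deposits.
    """
    accounts_hit = defaultdict(set)  # customer_id -> accounts that received a large deposit
    for customer, account, amount in deposits:
        if amount >= large_amount:
            accounts_hit[customer].add(account)
    return sorted(c for c, accounts in accounts_hit.items() if len(accounts) >= min_accounts)

deposits = [
    ("c1", "a1", 9000), ("c1", "a2", 9500),  # two accounts: flagged
    ("c2", "a3", 9800), ("c2", "a3", 9900),  # one account: not flagged
    ("c3", "a4", 200),                       # small deposit: ignored
]
print(flag_spread_deposits(deposits, large_amount=9000))  # ['c1']
```

A fixed threshold is the simplest choice; comparing each deposit against the customer's own history would track the idea of "erratic" behavior more closely.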

Cleaning the raw data turned out to be a crucial first step, Westphal said. Not only did the researchers have to fix the usual typographical errors and misspellings, but they also had to resolve data inconsistencies among the free-format text fields.

For example, bank officials would describe a suspect's occupation with terms such as "worked," "worker," "working at," "works on" or "worked for." Ultimately, Westphal said, the data mining software must lump the variations under one term, "employment."
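One simple way to do that lumping is a table of pattern-to-term rules applied after basic cleanup. The Python sketch below assumes a single hypothetical rule that maps any form of the word "work" to "employment"; the article does not describe the rules FinCEN's contractors actually used.

```python
import re

# Hypothetical rule table: each pattern maps a family of free-text variants to one term.
CANONICAL_TERMS = [
    (re.compile(r"\bwork(?:s|ed|er|ers|ing)?\b"), "employment"),
]

def normalize_phrase(text: str) -> str:
    """Map a free-text phrase to its canonical term, or return it lowercased and trimmed."""
    cleaned = " ".join(text.lower().split())  # lowercase and collapse runs of whitespace
    for pattern, term in CANONICAL_TERMS:
        if pattern.search(cleaned):
            return term
    return cleaned

variants = ["worked", "worker", "working at", "works on", "worked for"]
print({v: normalize_phrase(v) for v in variants})
```

Phrases that match no rule pass through in cleaned form, so an analyst can review them and add new rules.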

A clustering algorithm sought to sort the data in a given field into natural groupings.
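The article does not name the clustering algorithm. As a stand-in, the sketch below applies k-means (Lloyd's algorithm) to one numeric field, such as a hypothetical transaction-amount column, so that values settle into k groups around their means. The choice of k-means, the deterministic initialization and the sample amounts are all assumptions for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster one numeric field into k groups with Lloyd's algorithm (k-means).

    Returns (centroids, clusters), where clusters come from the last assignment step.
    """
    data = sorted(values)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic start: pick k values spread evenly across the sorted data.
    centroids = [float(data[(i * (len(data) - 1)) // max(k - 1, 1)]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # centroids stopped moving, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters

amounts = [900, 1000, 1100, 9500, 9700, 9900, 50000, 52000]
centroids, clusters = kmeans_1d(amounts, k=3)
print(centroids)  # [1000.0, 9700.0, 51000.0]
print(clusters)   # [[900, 1000, 1100], [9500, 9700, 9900], [50000, 52000]]
```

k-means needs the number of clusters up front; other clustering methods infer it from the data, which may suit an exploratory study better.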

Link analysis might reveal, for example, a cluster of persons using the same bank account number.
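Link analysis of that kind can be sketched as finding connected groups in a graph whose edges are shared account numbers. The Python below uses a union-find structure over hypothetical (person, account number) pairs. It also catches indirect links, where A shares one account with B and B shares another account with C.

```python
from collections import defaultdict

def linked_groups(records):
    """Group persons connected, directly or transitively, through shared account numbers.

    records: iterable of (person, account_number) pairs (hypothetical report fields).
    Returns a list of sets, keeping only groups with more than one person.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_holder = {}  # account number -> first person seen using it
    for person, account in records:
        find(person)  # register every person, even one who is never linked
        if account in first_holder:
            union(first_holder[account], person)
        else:
            first_holder[account] = person

    groups = defaultdict(set)
    for person in list(parent):
        groups[find(person)].add(person)
    return [g for g in groups.values() if len(g) > 1]

records = [("A", "111"), ("B", "111"), ("B", "222"), ("C", "222"), ("D", "333")]
print(linked_groups(records))  # one group containing A, B and C
```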

Westphal was reluctant to disclose the exact patterns the researchers found in the database for fear of tipping off criminals.

He did say that a combination of conventional data mining methods and link analysis could become a powerful tool for FinCEN's work.

Westphal stressed the need for data cleansing procedures that would lessen ambiguities in bank and business names listed on suspicious-activity reports.
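A first pass at such cleansing might reduce each name to a comparison key by lowercasing it, dropping punctuation, expanding common abbreviations and stripping trailing legal suffixes. The abbreviation and suffix tables in this Python sketch are hypothetical and far smaller than a production system would need. Names that reduce to the same key are candidates for the same institution, not confirmed matches.

```python
import re
from collections import defaultdict

# Hypothetical cleanup tables; a production system would need far larger ones.
ABBREVIATIONS = {"natl": "national", "intl": "international", "&": "and"}
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "llc", "ltd", "na"}

def name_key(name: str) -> str:
    """Reduce a bank or business name to a key for spotting likely duplicates."""
    # Drop periods so "N.A." and "Corp." become "na" and "corp", then keep
    # runs of letters, digits and "&" as tokens (commas and spaces separate them).
    tokens = re.findall(r"[a-z0-9&]+", name.lower().replace(".", ""))
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Strip legal-form suffixes only from the end of the name.
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def group_names(names):
    """Group raw names that share the same key (candidates for review)."""
    groups = defaultdict(list)
    for n in names:
        groups[name_key(n)].append(n)
    return dict(groups)

print(group_names(["First Natl. Bank, N.A.", "FIRST NATIONAL BANK", "Acme Corp.", "ACME Corporation"]))
```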

Ultimately, the broad patterns of criminal activity detected in the forms could help filter the suspicious cases out of Treasury's database of 12 million-plus annual currency transaction reports, he said.
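If the patterns were expressed as templates, applying them could look like the streaming filter below. It checks each report against every template and keeps only the matches, so the 12 million-plus reports never have to sit in memory at once. The report fields and the sample template are hypothetical; the article deliberately does not disclose the actual patterns.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Report = Dict[str, object]
Template = Callable[[Report], bool]

def filter_reports(reports: Iterable[Report], templates: List[Template]) -> Iterator[Report]:
    """Yield only the reports that match at least one template, one report at a time."""
    for report in reports:
        if any(template(report) for template in templates):
            yield report

def large_casino_cash(report: Report) -> bool:
    """Hypothetical template: a large cash transaction reported by a casino."""
    return report["filer_type"] == "casino" and report["amount"] >= 50_000

reports = [
    {"id": 1, "filer_type": "bank", "amount": 12_000},
    {"id": 2, "filer_type": "casino", "amount": 75_000},
    {"id": 3, "filer_type": "bank", "amount": 60_000},
]
print([r["id"] for r in filter_reports(reports, [large_casino_cash])])  # [2]
```

Because filter_reports is a generator, the same code works whether reports come from a list, a file reader or a database cursor.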

In the near future, FinCEN wants to develop a prototype data mining system based on the results of the initial study, Wong said at the AFCEA event.

In the next phase the agency would define specific data abstraction routines and train FinCEN analysts, Westphal said.