Abstract

Data mining is emerging as one of the key features of many homeland security initiatives. Often used as a means of detecting fraud, assessing risk, and supporting product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. In the context of homeland security, data mining is often viewed as a potential means to identify terrorist activities, such as money transfers and communications, and to identify and track individual terrorists themselves, such as through travel and immigration records. However, compared to earlier uses of data mining by government, some of the homeland security data mining applications represent a significant expansion in the quantity and scope of data to be analyzed. Three of the higher-profile initiatives include the now-defunct Terrorism Information Awareness (TIA) project, the recently canceled Computer-Assisted Passenger Prescreening System II (CAPPS II), and the Multistate Anti-Terrorism Information Exchange (MATRIX) pilot project. This article examines the evolving nature of data mining for homeland security purposes, the limitations of data mining, and some of the issues raised by its expanding use, including data quality, interoperability, mission creep, and privacy.

Introduction

Since the September 11, 2001, terrorist attacks, government officials have continued to grapple with the questions of whether the attacks could have been prevented and what can be done to increase the government's awareness and knowledge of terrorist activity. As evidenced by congressional inquiries into so-called intelligence failures and the hearings held by the National Commission on Terrorist Attacks Upon the United States,1 a significant amount of attention appears to be focusing on how to better collect, analyze, and disseminate information. In doing so, technology is commonly and increasingly looked upon as both a tool and in some cases a substitute for human resources.
One such technology that is playing a prominent role in homeland security initiatives is data mining. As with the concept of homeland security itself,2 data mining is widely mentioned in a growing number of bills, laws, reports, and other policy documents, yet an agreed-upon definition or conceptualization of data mining appears to be generally lacking within the policy community. While data mining initiatives are usually purported to provide insightful, carefully constructed analysis, data mining itself is variously described as a technology, a process, and/or a productivity tool. In other words, data mining (or factual data analysis, or predictive analytics, as it is also sometimes called) means different things to different people.
For example, in a proposed bill to require executive branch agencies to report on their data mining activities to Congress, a very contextually specific definition is offered. S.1544, the Data-Mining Reporting Act of 2003, identifies data mining as meaning:
a query or search or other analysis of 1 or more electronic databases, where-
(A) at least 1 of the databases was obtained from or remains under the control of a non-Federal entity, or the information was acquired initially by another department or agency of the Federal Government for purposes other than intelligence or law enforcement;
(B) the search does not use a specific individual's personal identifiers to acquire information concerning that individual; and
(C) a department or agency of the Federal Government is conducting the query or search or other analysis to find a pattern indicating terrorist or other criminal activity.3
In contrast, in its March 2004 report examining the Terrorism Information Awareness (TIA) project, the Department of Defense's Technology and Privacy Advisory Committee (TAPAC) purposely used a broad conceptualization of data mining to inform its research. The TAPAC report defines data mining to include “searches of one or more electronic databases of information concerning U.S. persons by or on behalf of an agency or employee of the government.”4
In testimony before a House subcommittee in March 2003 regarding the use of data mining in government program audits, the General Accounting Office (GAO) defined data mining as “analyzing diverse data to identify relationships that indicate possible instances of previously undetected fraud, waste, and abuse.”5 However, just over a year later, in a report on U.S. government data mining activities overall, GAO offered a slightly different definition. In that May 2004 report, GAO defined data mining as “the application of database technology and techniques—such as statistical analysis and modeling—to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results.”6
Regardless of which definition one prefers, the common theme echoed in all of these definitions is the ability to collect and combine, virtually if not physically, multiple data sources for the purposes of analyzing the actions of individuals. In other words, there is an implicit belief in the power of information, suggesting a continuing trend in the growth of “dataveillance,” or the monitoring and collection of the data trails left by a person's activities. More importantly, it is clear that there are high expectations for data mining, or factual data analysis, as an effective homeland security tool.
Data mining is not a new technology, but its use is growing significantly in both the private and public sectors. Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance research, and increase sales. In the public sector, data mining applications initially were used as a means to detect fraud and waste but have grown to also be used for purposes such as measuring and improving program performance. While not completely without controversy, these types of data mining applications have gained greater acceptance. However, some of the homeland security data mining applications represent a significant expansion in the quantity and scope of data to be analyzed. Moreover, due to their security-related nature, the details of these initiatives (e.g., data sources, analytical techniques, access and retention practices) are usually less transparent. In addition, some of the data mining projects have been associated with individuals who carry unfavorable reputations in some communities, further heightening concerns about these projects specifically, and data mining generally. For example, the idea behind the now-defunct TIA project was commonly attributed to retired Admiral John M. Poindexter. Previously, Poindexter was perhaps best known for his alleged role in the Iran-contra scandal during the Reagan Administration.7 Similarly, Hank Asher, founder of Seisint, the company developing the Multistate Anti-Terrorism Information Exchange (MATRIX) pilot project, resigned from the company in August 2003 following publicity regarding his alleged role in drug smuggling in the 1980s.8 Consequently, the announcement of any new homeland security or law enforcement-related data mining application usually attracts significant and ongoing scrutiny.
Despite these controversies, homeland security data mining applications are unlikely to be on the wane anytime soon. To the contrary, as the technology and techniques improve and more information becomes available, it is reasonable to expect more data mining, not less. However, as the Homeland Security Act suggests, data mining is just one of the many tools used in the war against terrorism.9 It is also a tool whose full potential is still being explored. With that in mind, this article attempts to provide a general introduction to data mining and homeland security, including a description of data mining techniques and their limitations, an overview of some of the more significant data mining applications, and challenges to the successful implementation of data mining initiatives.

6. Conclusion
While the policy community remains divided on the merits of specific data mining initiatives, there appear to be some signs of acceptance of the growing use of data mining for homeland security purposes, provided that appropriate protections are in place and enforced. However, since the security sensitivities of such projects may shield them from public scrutiny, government oversight of these initiatives will need to be persistent and thorough. Indeed, at the time of this writing, as more time passes since the September 11, 2001, attacks, there appears to be growing support for building safeguards into the policies and technologies used in data mining, along with a willingness to pull the plug on projects that are perceived (rightly or not) to be without restraint. However, in the unfortunate event of another high-impact terrorist attack in the United States, security concerns could easily overwhelm the will to scrutinize our actions voluntarily, suggesting that built-in protections and oversight will be critical to data mining initiatives that the citizenry can trust.
As discussed above, while technological capabilities are important, other implementation and oversight issues can also influence the success of a data mining project. The challenges of data quality, interoperability, mission creep, and privacy can all serve to undermine or misdirect data mining initiatives. Equally important to an initiative's success or failure is the cultivation of public awareness and support. The surreptitious development of a data mining application aimed at the general population breeds mistrust and is more likely to generate a vociferous backlash than an initiative that involves potential stakeholders. However, that is not to say that there should be complete transparency either. While transparency is a critical feature of democracy, there are legitimate security reasons for not disclosing all of the technical and analytical aspects of a homeland security data mining application. A balance must be struck between enabling openness and oversight and adhering to the appropriate security protocols.
The discussion of data mining presented here provides merely a snapshot in time. As technology evolves, additional information becomes more accessible, and homeland security demands continue to develop, it is likely that government use of data mining will grow. Moreover, as the MATRIX initiative suggests, data mining will not be the sole province of the federal government. State and local governments, both individually and collaboratively, are likely to push ahead with data mining initiatives of their own in a much more diverse regulatory environment. Looking ahead, then, as more pools of data are collected, combined, and analyzed, one can anticipate that the potential privacy, accuracy, and security concerns will multiply. The larger question, though, is whether the results of this data mining will represent a strategic allocation of resources or a quixotic search for security.