Data Mining Report
- Office of the Director of National Intelligence
- 8 pages
- March 2009
The Office of the Director of National Intelligence (ODNI) is pleased to provide to Congress its second report pursuant to the Data Mining Reporting Act. The Data Mining Reporting Act requires “the head of each department or agency of the Federal Government” that is engaged in an activity to use or develop “data mining,” as defined by the Act, to report annually on such activities to the Congress.
Scope. This report covers the data mining activities of all elements of the ODNI from January 31, 2008 through January 31, 2009. Constituent elements of the Intelligence Community (IC) are reporting their data mining activities to Congress through their own departments or agencies.
Last year’s ODNI data mining report detailed a number of efforts within the “Incisive Analysis” Office in the Intelligence Advanced Research Projects Activity (IARPA) that included research on techniques that could be applied to data mining. Two of those programs (Tangram and PAINT) have ended, the relevant effort within a third program (Knowledge Discovery and Dissemination) has ended, and the focus of a fourth effort (Reynard) changed in early 2008. Only one program, Video Analysis and Content Extraction (VACE), is currently funding research that includes the exploration of techniques that might be applied to data mining. As a result, VACE is the program that this report addresses in detail. Information about the disposition of the other efforts is appended to the end of this report.
This report covering ODNI activities is unclassified and has been made available to the public through the ODNI’s website. For completeness, a classified annex containing more detailed information on VACE and on one of the discontinued efforts discussed in the appendix at the end of this report has also been prepared and has been transmitted to the appropriate Congressional committees.
Definition of “data mining.” The Data Mining Reporting Act defines “data mining” as “a program involving pattern-based queries, searches or other analyses of 1 or more electronic databases” in order to “discover or locate a predictive pattern or anomaly indicative of terrorist or criminal activity…”
This definition limits covered activities to predictive, “pattern-based” data mining, which is significant because analysis performed within the ODNI and its constituent elements for counterterrorism and similar purposes is often performed using various types of link analysis tools. Unlike “pattern-based” tools, these link analysis tools start with a known or suspected terrorist or other subject of foreign intelligence interest and use various methods to uncover links between that known subject and potential associates or other persons with whom that subject is or has been in contact.
The Data Mining Reporting Act does not include such analyses within its definition of “data mining” because such analyses are not “pattern-based.” Rather, these analyses rely on inputting the “personal identifiers of a specific individual, or inputs associated with a specific individual or group of individuals,” which is excluded from the definition of “data mining” under the Act. ODNI neither engages in nor directly employs pattern-based data mining programs to discover or locate patterns or anomalies indicative of terrorist or criminal activity in any of its constituent elements, such as the National Counterterrorism Center, National Counterproliferation Center, National Intelligence Council or other offices within ODNI. However, within the ODNI’s Intelligence Advanced Research Projects Activity (IARPA) there is one piece of one research program within the Office of Incisive Analysis that is exploring techniques for identifying patterns that may be associated with terrorist activity, as described below. This report details those activities because the Act requires a report on any “activity to . . . develop data mining.”
Background on IARPA. It is IARPA’s mission to invest in high-risk/high-payoff research programs that have the potential to provide the U.S. with an overwhelming intelligence advantage over its future adversaries. IARPA’s time horizon is measured in years, not months. It does not have an operational mission and it does not deploy technologies directly to the field. IARPA programs are by nature highly experimental and pioneering and are designed to produce new capabilities not even imagined by the operational agencies it serves. The end goal of an IARPA program is typically a proof-of-concept experiment or prototype of a never-before-seen capability. Because IARPA programs are on the cutting edge of research, they do not always achieve their end goals, and even when they do, further steps are required to transform the results into real-world applications. Any results from IARPA research programs that do get incorporated into future operational programs within the IC, or other parts of the United States government, will be subject to appropriate legal, privacy, civil liberties and policy safeguards.
IARPA recognizes that data mining techniques explored as part of a research program could, potentially, impact the privacy or civil liberties of individuals if they are successfully transitioned to an operational partner without careful consideration of these issues. To this end, IARPA intends to maintain its longstanding relationship with the ODNI CLPO for the purpose of validating that its research programs are conducted consistent with the protection of individual privacy and civil liberties. Through this ongoing relationship, the privacy and civil liberties of individuals will be well preserved with careful oversight and responsible consideration in the decision whether and how to deploy any resulting technologies.
In the fall of 2006, NSA’s Disruptive Technology Office (later incorporated into IARPA) and the ODNI CLPO jointly sponsored a series of workshops attended by government experts, private-sector experts, and privacy advocates. The attendees at these workshops examined an array of challenges to privacy posed by emerging technologies and government needs for information for intelligence and counterterrorism purposes, and suggested a variety of innovative approaches to applying technology to these problems. The 2008 ODNI Data Mining Report described a broad range of issues and technologies related to privacy protection inspired by these workshops. During the past year, IARPA evaluated research proposals to explore many of those technologies and to address many of those issues.
1. Reynard continues as a seedling effort within IARPA; however, the focus of the effort has changed since the 2008 data mining report was submitted. Reynard is currently exploring the feasibility of understanding and characterizing behavior in virtual worlds by leveraging expertise in the social science research community. As such, the planned program no longer meets the definition of “pattern-based data mining” described in the Act.
2. The Tangram program was originally intended to evaluate the efficacy and intelligence value of a terrorist threat surveillance and warning system concept that would (i) report the threat likelihood of known threat entities, and (ii) serve to discover and report the threat likelihood of unexpected threat entities. During FY 2008, the Tangram program conducted elementary experiments on the feasibility of building and maintaining a continuously operating surveillance and warning system from a computer science perspective. The program has ended, the results will be archived, and further research is not planned at this time.
3. The ProActive Intelligence (PAINT) program sought to study the dynamics of complex intelligence targets (inclusive of but not solely terrorist organizations) by using a model-based approach to elucidate patterns of causal relationships that are indicative of nefarious activity. The program, which concluded in early January 2009, integrated several modeling technologies in a first-generation proof-of-concept system.
4. Knowledge Discovery and Dissemination (KDD). The goal of the KDD program is to invest in research and technology that will greatly enhance the ability of analysts to collaboratively evaluate and utilize data from multiple, massive data sets in order to generate high-quality, accurate, and timely intelligence. The research of KDD is primarily focused on link analysis and graphs, as well as techniques to improve and measure collaboration between IC analysts. As such, KDD rarely supports research using pattern-based data mining techniques as defined in the Data Mining Reporting Act.
- In FY06-FY07, there was a KDD-sponsored research project that met the reporting criteria of an “activity to . . . develop data mining” in the Act. The project attempted to match known patterns of entity deception in lawfully collected foreign databases. This project was completed in early 2008 and was included in last year’s report. With the completion of this project in 2008, there are currently no research projects supported by KDD that meet the reporting criteria of the Data Mining Reporting Act.
- BLACKBOOK. The BLACKBOOK capability was developed under the KDD program to provide an infrastructure using a service-oriented architecture (SOA) approach for data analysis. The infrastructure does not perform any data analysis or data mining per se; rather, it provides a convenient and organized way to run other services that do perform data analysis.