The following paper compares social network analysis tools for use by the Joint Improvised Explosive Device Defeat Organization (JIEDDO) Counter-IED Operations Integration Center (COIC). The document was obtained via FOIA request by Carnegie Mellon University student Michael Lanham. Unfortunately, due to the quality of the document, certain formulas listed throughout the document are illegible.
Social Network Analysis (SNA) Tool Comparison Working Paper
- 37 pages
- November 28, 2011
- 15.8 MB
This is a working paper for an ongoing SNA Tool Comparison effort at the Counter-IED Operations/Intelligence Center’s (COIC) Data Analysis Research and Collaboration (DARC) Cell. It contains the results of the first phase of this effort. A PowerPoint presentation summarizing this paper is also available. This paper will be edited and amended as additional results become available from subsequent phases.
The objective of this study was to compare and analyze four different Social Network Analysis (SNA) tools for the basic measures of Centrality (Degree, Closeness, Betweenness and Eigenvector), in order to set a baseline for further evaluation of the tools and their capabilities. The four tools compared were Analyst Notebook (ANB), Palantir, UCINet and ORA. The results of this analysis are unclassified but for official use only (U//FOUO). The data and intermediate products (e.g., tool outputs and Excel sheets used for manual calculations) will become available once they have been anonymized and cleared as unclassified.
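For readers unfamiliar with the four measures, the following is a minimal sketch of their common textbook definitions on an undirected, unweighted network. It is not the implementation used by any of the four tools. The normalizations, the function names, and the four-agent star network are assumptions chosen for illustration; the paper's own formulas are illegible in the released copy.

```python
from collections import deque

def degree_centrality(adj):
    """Number of neighbours divided by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable agent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    """(n - 1) divided by the sum of distances; assumes a connected network."""
    n = len(adj)
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = (n - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes-style accumulation, normalized by the number of other pairs."""
    n = len(adj)
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # agents in order of discovery
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair is visited from both endpoints, so dividing by
    # (n - 1)(n - 2) both halves the total and normalizes it.
    scale = (n - 1) * (n - 2)
    return {v: score[v] / scale if scale > 0 else 0.0 for v in adj}

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I), scaled so the largest score is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(nxt.values())
        x = {v: val / top for v, val in nxt.items()}
    return x

# Hypothetical four-agent star network: A is linked to B, C and D.
graph = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
eigenvector = eigenvector_centrality(graph)
```

Tools may legitimately differ in how they normalize each measure (for example, raw counts versus values scaled to the range 0 to 1), which is one way the centrality numbers can differ while the rankings stay the same.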
Five data sets of varying sizes were used in this analysis, ranging from four agents to over two thousand agents. The data sets also had different densities, ranging from 0.6250 to 0.0006790. The smaller data sets consisting of four and sixteen agents allowed for manual calculation checks to be made on some of the Centrality measures.
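As a rough illustration of what a density figure expresses, the sketch below uses one common definition: the number of distinct links divided by the number of possible links in an undirected network without self-links. The excerpt does not state which definition the study or the tools used, so values computed this way need not match the reported densities. The agent names and links are hypothetical.

```python
def density(agents, links):
    """Distinct undirected links divided by the n(n - 1)/2 possible links."""
    n = len(agents)
    possible = n * (n - 1) / 2
    # Treat (A, B) and (B, A) as the same link and ignore self-links.
    unique = {frozenset(link) for link in links if link[0] != link[1]}
    return len(unique) / possible if possible else 0.0

# Hypothetical four-agent network with one link recorded in both directions.
agents = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
example_density = density(agents, links)  # 3 distinct links out of 6 possible
```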
For each data set in this Phase I, the data was binarized/dichotomized and used in symmetrized form to set a baseline for the results produced by the different tools.
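The two preprocessing steps can be sketched as follows, assuming the data is held as a square adjacency matrix. Binarizing replaces every non-zero link weight with 1. Symmetrizing here uses the "maximum" rule, which keeps a link if it exists in either direction. That rule is an assumption, since the excerpt does not say which symmetrization option was selected in each tool. The function names and the example matrix are hypothetical.

```python
def binarize(matrix):
    """Replace every non-zero link weight with 1."""
    return [[1 if w != 0 else 0 for w in row] for row in matrix]

def symmetrize_max(matrix):
    """Keep a link between i and j if it exists in either direction."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)]
            for i in range(n)]

# Hypothetical weighted, directed three-agent network.
raw = [
    [0, 3, 0],
    [0, 0, 2],
    [1, 0, 0],
]

baseline = symmetrize_max(binarize(raw))
```

Other symmetrization rules (minimum, average, sum) would give a different baseline network from the same input, which is one reason default option settings matter when comparing tools.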
All tools produced the same rankings by Degree Centrality for all five data sets, even though there were some differences in the centrality numbers produced by the tools.
For the smaller data sets, with four, sixteen, or 209 agents, the minor 3rd and 4th decimal place differences in the various centralities did not produce any noticeable difference in the ranking of the agents for centralities other than Degree.
As stated above, for the 5th data set, the normalized values for the Degree Centrality were the same for UCINet and ORA, while ANB and Palantir were slightly different for the Top 20 agents. However, the ranks by Degree Centrality were the same for all four tools. There were bigger differences in Closeness and Betweenness centrality values for the 5th data set, seen at the 2nd, 3rd and 4th decimal places. For Betweenness centrality, ANB and Palantir were on one side with similar values and UCINet and ORA were on the other with similar values. For Closeness centrality, Palantir was on one side and the rest of the tools on the other in the results that they produced. These differences among the tools are possibly due to different internal precision or rounding techniques or algorithmic differences between the tools that are manifested when working with larger and lower density networks. It is possible that this effect could be exaggerated further in even larger data sets that are on the order of 10K+ or 100K+ agents.
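To see why differences in the 3rd or 4th decimal place can matter for rankings, consider the sketch below. The scores, tool labels, and agent names are invented for illustration and are not taken from the study. When two agents have nearly equal centrality values, a small numerical difference between tools, or a coarser display precision, can swap their order or collapse them into a tie.

```python
def rank_agents(scores, decimals=None):
    """Order agents by descending score, breaking ties by agent name."""
    if decimals is not None:
        scores = {a: round(s, decimals) for a, s in scores.items()}
    return sorted(scores, key=lambda a: (-scores[a], a))

# Hypothetical closeness values for the same three agents from two tools.
tool_x = {"agent1": 0.41237, "agent2": 0.41242, "agent3": 0.30010}
tool_y = {"agent1": 0.41243, "agent2": 0.41239, "agent3": 0.30012}

full_x = rank_agents(tool_x)                 # agent2 ranks ahead of agent1
full_y = rank_agents(tool_y)                 # agent1 ranks ahead of agent2
rounded_x = rank_agents(tool_x, decimals=3)  # top two tie at 0.412
rounded_y = rank_agents(tool_y, decimals=3)  # same tie, same order
```

In this example the two tools disagree on the top rank at full precision but agree once values are rounded to three decimals, because the tie is then broken by name. In a large, low-density network many agents may have nearly equal scores, so more decimal places are needed before a ranking can be treated as stable.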
The differences seen in June 2011 between UCINet and ORA were probably a combination of default decimal settings in UCINet or default options settings in ORA which were hidden behind menus or the usage of an older ORA version (22.214.171.124.0). ORA 2.3.5, the version tested in this effort, now allows easier access to options for symmetrizing and dichotomizing data resulting in better transparency.
Working with non-symmetric and weighted (non-binarized) link data can be more involved. Non-symmetric data was analyzed to some degree during this phase, but the detailed results will be addressed in the follow-on phases. It is worth noting, however, that when the data was non-symmetric, the various tools produced different results in many instances. This was again likely due to different default settings in the tools, different precision or rounding techniques, or possibly algorithmic differences between the tools.
It is recommended that at this time either UCINet or ORA be used for SNA with the appropriate options settings. ANB and Palantir could be considered for SNA once the discrepancies for the largest data set have been resolved.
It is also recommended that at least two larger data sets on the order of 10K+ and 100K+ agents be analyzed for all the tools in order to investigate further the relationship between network size/densities and the number of decimal places needed for the results.