This is the design documents for the Allure Defender system. This document is a high level design and API of the components that make up the Allure Defender system. We outline all the high-level pieces and then the individual components, their behaviors, expected input/outputs, and relationships. We will discuss specific implementation and design choices and languages and libraries that will be used. In addition we will cover specific user cases and illustrate some running examples. Last we refer to a running system which implements many of the components we cover in the document.
Like a lie detector detects physical changes in the body based on sensitivities to specific questions, we believe there are physical changes in the body that are represented in observable behavioral changes when committing actions someone knows is wrong. Our solution is to develop a paranoia-meter to measure these observables. Using shoplifing as an example, there are peaks and valleys of adrenaline during the entire theft process. There is the moment the thief puts an item in their pocket (high), then as they walk around the store the adrenaline begins to valley a bit, then they attempt to walk out of the store (very high). It is at these points that we want to be able to take as many behavioral measurements as possible because it is at these points the insiders activity will be as far from normal behavior. In this hypothesis we will have a rootkit on the host that monitors keystrokes, mouse movements, and visual cues through the system camera.
While it is a challenging undertaking, we plan to research and develop a fully automated malware analysis framework that will produce results comparable with the best reverse engineering experts, and complete the analysis in a fast, scalable system without human interaction. In the completed mature system, the only human involvement will be the consumption of reports and visualizations of malware profiles. Our approach is a major shift from common binary and malware analysis today, requiring manual labor by highly skilled and well-paid engineers. Results are slow, unpredictable, expensive and don’t scale. Engineers are required to be proficient with low-level assembly code and operating system internals. Results depend upon their ability to interpret and model complex program logic and ever-changing computer states. The most common tools are disassemblers for static analysis and interactive debuggers for dynamic analysis. The best engineers have an ad-hoc collection of non-standard homegrown or Internet-collected plug-ins. Complex malware protection mechanisms, such as packing, obfuscation, encryption and anti-debugging techniques, present further challenges that slow down and thwart traditional reverse engineering technique.
Current technologies and methods for producing and examining relationships between software products, particularly malware, are lacking at best. The use of hashing or “fuzzy” hashing and matching techniques are conducted at the program level, ignoring any reflection of the actual development process of malware. This approach is only effective at finding closely related variants or matching artifacts found within malware that are only tangent to the development process, such as hard coded IP address, domains, or login information. This matching process is often unaware of internal software structure except in the most rudimentary sense, dealing with entire sections of code at a time, attempting to align matches while dealing with arbitrary block boundaries. The method is akin to an illiterate attempting comparing two books on the same topic. Such a person would have a chance of correlating different editions of the same book, but not much else. The first fundamental flaw in today’s approach is that it ignores our greatest advantage in understanding relationships in malware lineage, we can deduce program structure into blocks (functions, objects, and loops) that reflect the development process and gives software its lineage through code reuse.