[This is a working paper for the "6th International Conference on IT Security Incident Management & IT Forensics", taking place in Stuttgart, Germany, from May 10th to 12th, 2011. The submission deadline is January 17th, 2011. Details of the call for papers: http://www1.gi-ev.de/fachbereiche/sicherheit/fg/sidar/imf/imf2011/cfp.html.]
Apache ALOIS - A true open source platform for computer forensics
Abstract: Although computer forensics is above all about recovering, collecting and analyzing data, there is, as far as we know, no central platform for integrating all the different data created during a forensic process. Dozens of valuable software tools exist, each specialized in one or more well-defined areas. But when it comes to integrating and consolidating the resulting data collections, incompatible data and missing interfaces often cause severe problems. In our opinion, a good part of this problem lies in the nature of proprietary software. Community-driven development can help to integrate these data collections by providing interfaces to the various software tools. Apache ALOIS is an open source tool originally designed as a SIEM (Security Information and Event Management) system with Data Leakage Detection (DLD) in mind. But since its main tasks are collecting and analyzing data as well as reporting, it could very well serve as an integration platform for all data collected within a computer forensics process.
The aim of computer forensics is to acquire, analyze and evaluate digital traces in connection with a criminal act that has either already been committed or is only being planned. This requires highly specialized knowledge both in IT and in the field of the crime, as well as highly specialized tools. It therefore makes sense to divide the tasks among specialists, each using their own tools.
Collecting the data is usually the first step in forensics. In our digital age this requires in-depth IT knowledge, so adequately skilled, technologically oriented people are responsible for this task. Of course, they also need basic criminalistic expertise as background knowledge. The tools employed are usually technologically demanding. The second step, the analysis of the collected data, requires specific criminalistic knowledge, intuition and experience in the relevant fields of crime. Basic technical knowledge must be present but should not be central. The tools used should therefore be less technologically sophisticated, although a database query in SQL, for example, may still be required. The evaluation step mainly requires legal expertise. While it does require an understanding of digital media, as little technological knowledge as possible should be needed. Therefore, the tools used have to be very user-friendly.
During this division of tasks, the overall view must not be lost. Here, a cross-software platform could be of great help for computer forensics. Such a platform must ensure that all information is available throughout the entire process in the form most appropriate for each step. This means that the way information is created and accessed matches the know-how available in the respective process step. Such a platform can also provide additional services, such as workflow or communication. Another advantage of a centralized database is the possibility of cross-case analysis. Furthermore, it could be ensured that all the information of a case is stored in one place and can therefore be easily controlled and understood. Moreover, since the aim must be to use the most appropriate tool for each task, it is important that this platform has an open architecture and open interfaces. It must therefore be independent of any single vendor. In this respect, it makes sense to pursue a free implementation of this platform, that is, open source software.
Open Source Software
[This brief introduction is an excerpt from the author's PhD thesis.]
The idea of open source software, which originated in a movement of computer hackers who developed software primarily in their leisure time and for fun, still carries the halo of being a project of unpaid volunteers. However, Free/Libre and Open Source Software (FLOSS) is in an accelerated process of adaptation to the market. This development follows a cycle of innovation as described in economics by Schumpeter, for example. Accordingly, various studies show that especially the big projects, like the Linux operating system, the office suite OpenOffice or the database MySQL, are pursued mainly by developers who are paid for their contributions.
In simple terms, Open Source Software (OSS) is defined on the one hand by an open, community-oriented development process and on the other hand by an open license. The former means that OSS is less dependent on individual persons, highly decentralized, and can be planned only to a very limited extent. The latter usually means that OSS can be used free of charge. However, being free of charge is not a requirement of open source at all: the word "free" has to be understood in the sense of "free speech", not "free beer". On this basis, several business models have been established over time, built either on the software itself or on services on top of it.
While there are projects that are largely dominated by one company, it is increasingly recognized that open source software can be developed better when there is a high degree of independence. To achieve this, many projects have founded independent non-profit organizations that play a mediating role. The first project to do so was the Apache web server, with the Apache Software Foundation. Such a non-profit organization acts as the legal entity on the one hand and is responsible for the infrastructure on the other. What matters is that the organization can preserve its independence from the cooperating companies. The Apache Software Foundation achieves this by admitting only individuals as members, not organizations, with every person having the same rights regardless of their financial contribution. In addition, the committees are elected democratically by the members, and again every person has one vote. The Eclipse project, among others, has shown that a project's legal independence is essential for the participation of commercial organizations. The platform, originally developed by IBM for its own internal use, had its source opened at an early stage to make it more attractive to partner companies. But it was only the later detachment of the project from IBM, through the transfer of rights to the independent Eclipse Foundation, that convinced other companies to participate. Today, Eclipse is the de facto standard among Java development platforms, with a market share of well over 50%.
In recent years, management at all levels has increasingly recognized that in the Internet age, with its rapid access to worldwide information, acting in isolation is no longer appropriate. In this context, the concept of interoperability has become increasingly important. Interoperability describes the ability of diverse systems and organizations to work together (inter-operate). Open source software alone, however, cannot fulfill this demand. While disclosing the source code does guarantee independence from a manufacturer, the same cannot be said of independence from the product, and therefore of full flexibility. That is only possible through cooperative innovation. Jollans outlines this with the term "community innovation", or rather the concept of "Open Computing".
[Figure 1. Open Computing]
By this he means the combination of three components, open architecture, open standards and open source, through which full interoperability can be achieved. The goal of "Open Computing" is the flexible, modular integration of functions as well as independence from manufacturers, in both hardware and software. Although some vendors, Apple for example, take the opposite approach, the experience of recent years and decades suggests that software will be successful mainly because of its openness.
What does Apache ALOIS stand for?
Apache ALOIS is message collection, message splitting and message correlation software with reporting and alerting functionality. ALOIS stands for "Advanced Log Data Insight System" and is meant to be a fully implemented open source security information and event management (SIEM) system. While almost all other SIEM software, be it closed or open source, concentrates on the technological side of security monitoring, Apache ALOIS aims to monitor the security of content. It is intended to proactively detect potential loss and theft of data (data leakage), mistaken modification and unauthorized access. Apache ALOIS works on log messages and thus contains all the basic functionality of a conventional SIEM, such as centralized collection, normalization, aggregation, analysis and correlation of all messages, as well as reporting of all security-related events. It can therefore be used in place of a conventional SIEM.
Since fall 2010, Apache ALOIS has been undergoing incubation at The Apache Software Foundation (ASF). Incubation allows a software project to reach a level of stability in infrastructure, communications and decision making equivalent to that of other successful ASF projects. The ASF is made up of nearly 100 top-level projects covering a wide range of technologies. While some of them are widely known by name, many more are in wide use as part of popular Internet services. The best-known project is the HTTP Server, which hosts more than two thirds of all websites on the Internet. Apache projects are characterized by collaborative, consensus-based processes, an open, pragmatic software license and a desire to create high-quality software that leads the way in its field. This is known as the "Apache way".
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has not yet been fully endorsed by the ASF. Nevertheless, Apache ALOIS has proven itself in production over several years. Apache ALOIS aims to be completely free and open to all contributions, as its openness to other programming languages demonstrates. Pluggability, an area of active work, is meant to ensure that individual needs can be met without burdening the whole system. Furthermore, the basic functionality of ALOIS may be extended in directions not yet foreseen. In our opinion, the Linux kernel is a good example of how well this can work.
SIEM and computer forensics
Since Apache ALOIS was originally designed as a Security Information and Event Management (SIEM) system, a very brief introduction to this field is in order. The term SIEM combines SIM (security information management) and SEM (security event management), which are distinct tool categories. While SIM provides long-term storage, analysis and reporting of log data, SEM deals with real-time monitoring, correlation of events, notifications and console views. A SIEM combines both functionalities in one tool.
The term Security Information and Event Management (SIEM) describes the capabilities of gathering, analyzing and presenting information from very different sources, such as network and security devices, identity and access management applications, operating system, database and application logs, and even external threat data. This information is usually forwarded from its respective source to the SIEM as messages (log messages, triggers, traps, file submissions, database table submissions etc.). While the sources differ at least partly from those of computer forensics, the capabilities are almost the same.
Typically, modern SIEM tools can also integrate external functionality, such as workflow and ticketing, so that the whole incident handling process can be carried out within one user interface. Since every environment is different, the better SIEM tools provide a flexible, extensible set of integration capabilities. Integrating specialised computer forensics software into a SIEM is therefore not an unsolvable challenge.
The Architecture of Apache ALOIS
Apache ALOIS consists of five modules that interact to provide scalable SIEM functionality:
- Insink is the message sink, the entry point through which all the different messages are received into Apache ALOIS. It is partly based on the syslog-ng software. Insink listens for messages (UDP), accepts message connections (TCP), receives message collections (files, e-mails) and pre-filters them to prevent message flow overload.
- Pumpy is the incoming FIFO buffer, implemented as relational database tables that contain the incoming original messages (in raw format). In a complex system setup, there may be several insink instances, e.g. for a group of hosts, for specific types of messages, or for high availability.
- Prisma contains the logic to split the text of messages into separate fields, based on regular expressions. Strictly speaking, "prisma" is a set of "prismi", one for each type of message (apache, cisco etc.). Several prismi can be applied to the same message. This allows for stacked messages, e.g. forwarded messages contained in compressed files contained in e-mail messages. The data retrieved from the messages is stored in a database called Dobby. Because prisma is written in Ruby, prismi can be applied interactively (given system access, or through a message field on the website).
- Dobby is the central database. It is usually separated from the Pumpy database for availability and performance reasons. The current implementation is based on MySQL.
- The Analyzer contains the two sub-systems Lizard and Reptor. Lizard is the analysis engine and user interface of Apache ALOIS, implemented in Ruby on Rails using AJAX. It allows for interactive browsing through the collected data, exclusion/inclusion/selection of data, data sorting, data filtering, creation of views, ad-hoc textual and graphical reporting. Reptor allows for automatic activation of views and comparison of these views' results to a predefined result (pattern matching). In case of mismatch, Reptor sends the result to predefined e-mail addresses.
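To illustrate the splitting step, the following Python sketch models prismi as regular expressions with named groups, where each named group becomes a field. The prisma names, patterns, the `split_message` function and the sample message are hypothetical and are not taken from the ALOIS code base, which is written in Ruby. Stacking is reduced here to re-applying all prismi to an embedded text field, leaving out compressed files and e-mail attachments.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical prismi: one regular expression per message type.
# Named groups become the fields that would be stored in Dobby.
PRISMI: Dict[str, re.Pattern] = {
    "syslog": re.compile(
        r"^(?P<date>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) "
        r"(?P<program>[^:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
    ),
    "apache": re.compile(
        r'^(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)$'
    ),
}

# Field that may itself contain another message (assumed stacking point).
INNER_FIELD = "message"


def split_message(raw: str, depth: int = 0,
                  max_depth: int = 5) -> List[Tuple[str, Dict[str, str]]]:
    """Apply every matching prisma, then recurse into the embedded message."""
    results: List[Tuple[str, Dict[str, str]]] = []
    if depth > max_depth:  # guard against endless nesting
        return results
    for name, pattern in PRISMI.items():
        match = pattern.match(raw)
        if match is None:
            continue
        # Drop optional groups that did not participate in the match.
        fields = {k: v for k, v in match.groupdict().items() if v is not None}
        results.append((name, fields))
        inner = fields.get(INNER_FIELD)
        if inner is not None and inner != raw:
            results.extend(split_message(inner, depth + 1, max_depth))
    return results
```

For a syslog line that forwards an Apache access-log entry, `split_message` returns one field set for the `syslog` prisma and a second one for the `apache` prisma applied to the embedded message.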
Figure 2 shows an overview of the data flow through the different modules:
[Figure 2. Apache ALOIS message flow and main components]
In addition, Apache ALOIS has to be integrated into an existing environment. Installation itself is very easy. On the one hand, configuration consists of connecting systems, devices and applications to Apache ALOIS. Some of these connectors already exist, and new ones can easily be written in different programming languages. On the other hand, filters and reports have to be defined. For this task, Apache ALOIS uses the common tools of SQL and regular expressions. Again, some filters and reports are already included in the distribution.
Thanks to its modular design, Apache ALOIS can be installed on one or more servers, so scalability is not a severe challenge. Even operation as a cloud service is possible out of the box. A typical production setting is shown in figure 3.
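A filter that combines SQL with a regular expression, together with a Reptor-style check of its result, might look like the following sketch. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL-based Dobby database. The `messages` table, the view query, the pattern and the expected result are assumptions made for illustration. SQLite has no built-in regular expression support, but it evaluates `X REGEXP Y` as a call to a user-defined function `regexp(Y, X)`, which the sketch registers. On a mismatch, the sketch only returns an alert text, whereas Reptor would send it by e-mail.

```python
import re
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical stand-in for the Dobby database (ALOIS itself uses MySQL).
SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    program TEXT NOT NULL,
    message TEXT NOT NULL
);
"""

# A "view" in the sense of the text: a saved query combining SQL and a regex.
FAILED_LOGINS_VIEW = """
SELECT host, COUNT(*) AS failures
FROM messages
WHERE program = 'sshd' AND message REGEXP ?
GROUP BY host
HAVING COUNT(*) >= ?
ORDER BY host
"""
FAILED_LOGIN_PATTERN = r"^Failed password for (invalid user )?\S+ from "


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), i.e. (pattern, value).
    conn.create_function(
        "REGEXP", 2,
        lambda pattern, value: value is not None
        and re.search(pattern, value) is not None,
    )
    conn.executescript(SCHEMA)
    return conn


def run_view(conn: sqlite3.Connection, threshold: int) -> List[Tuple[str, int]]:
    return conn.execute(FAILED_LOGINS_VIEW,
                        (FAILED_LOGIN_PATTERN, threshold)).fetchall()


def check_view(rows: List[Tuple[str, int]],
               expected: List[Tuple[str, int]]) -> Optional[str]:
    """Reptor-style check: compare a view's result with a predefined result.

    Returns an alert text on mismatch (Reptor would e-mail it), else None.
    """
    if rows == expected:
        return None
    return "view 'failed_logins' mismatch: " + ", ".join(
        f"{host}={count}" for host, count in rows)
```

Defining the expected result as "no host above the threshold" (an empty list) turns the check into a simple alarm. Any host that shows up in the view then produces an alert text.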
[Figure 3. Exemplary setting of Apache ALOIS]
Apache ALOIS is open to any type of input, whatever output the system or tool at hand produces. The standard interfaces are syslog, SMTP and file upload. In the SIEM context, "agents" handle various formats, and Apache ALOIS could easily be extended to any kind of input.
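As an example of what arrives at the syslog interface, the priority field at the start of a BSD-style syslog message (RFC 3164) encodes facility and severity as `PRI = facility * 8 + severity`. Insink relies partly on syslog-ng, which handles this decoding itself. The following Python sketch only illustrates the encoding, and the names `parse_pri` and `SyslogHeader` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Short severity keywords for codes 0..7, as used by common syslog tools.
SEVERITIES = ("emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug")

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


class SyslogHeader(NamedTuple):
    facility: int
    severity: int
    severity_name: str
    rest: str  # everything after the <PRI> field


def parse_pri(raw: str) -> Optional[SyslogHeader]:
    """Decode the <PRI> prefix of a syslog message, or return None."""
    match = PRI_RE.match(raw)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # facility 23 (local7) * 8 + severity 7 is the maximum
        return None
    facility, severity = divmod(pri, 8)
    return SyslogHeader(facility, severity, SEVERITIES[severity], match.group(2))
```

Applied to the example message from RFC 3164, `<34>Oct 11 22:14:15 mymachine su: ...`, the sketch yields facility 4 (security/authorization) and severity 2 (critical).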
Using Apache ALOIS as a platform for computer forensics
As mentioned above, although it is a SIEM, Apache ALOIS already provides much of the functionality needed in computer forensics. Analysis, evaluation and reporting are already included. Correlation functionality and a forensic console are standard features of SIEM systems, and Apache ALOIS sees its main strengths in these domains. ALOIS offers source protection (it prevents alteration of the collected data) and additionally protects the data using hash functions. Anonymisation features have been prepared to meet data protection requirements, as have functions to reverse anonymisation to allow for legal prosecution. The forensic console has an easy-to-use web frontend, which will look familiar to most regular users of web interfaces (see figure 4).
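The paper does not detail how ALOIS applies hash functions. One common technique for making alteration of stored messages detectable is a hash chain, in which the digest of each message also covers the digest of its predecessor. Altering, deleting or reordering messages then shows up as a mismatch when the chain is recomputed. The following Python sketch shows this technique with SHA-256. The function names and the all-zero starting value are assumptions. In practice, the last digest would also have to be kept out of reach of an attacker, for example by signing or printing it.

```python
import hashlib
from typing import List

GENESIS = "0" * 64  # assumed starting value of the chain


def chain_hashes(messages: List[str]) -> List[str]:
    """Return one SHA-256 hex digest per message, each covering all earlier ones."""
    digests: List[str] = []
    previous = GENESIS
    for message in messages:
        h = hashlib.sha256()
        h.update(previous.encode("ascii"))  # fixed-length link to predecessor
        h.update(message.encode("utf-8"))
        previous = h.hexdigest()
        digests.append(previous)
    return digests


def first_tampered(messages: List[str], stored: List[str]) -> int:
    """Index of the first position where the chain no longer matches, or -1."""
    recomputed = chain_hashes(messages)
    for i, (current, original) in enumerate(zip(recomputed, stored)):
        if current != original:
            return i
    if len(recomputed) != len(stored):  # messages added or removed at the end
        return min(len(recomputed), len(stored))
    return -1
```

Because every digest depends on its predecessor, an attacker who changes one message would have to recompute all later digests as well. Keeping the final digest in a separate, protected place is therefore what makes the chain useful as evidence.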
[Figure 4. Forensic console of Apache ALOIS]
Apache ALOIS can be configured to become any type of computer forensics platform. Configurations can be shared, published and reused, and can be instantiated on a case-by-case basis, thus keeping the data of different forensic cases separate. Separate databases can be combined to allow for cross-case analysis. Many standard forensic tools have data export capabilities, and import filters (ALOIS agents) for these exports are easy to create, though probably many of them will be needed. Agents may be created by the vendor of the tool or by the ALOIS team. Apache ALOIS intends to build a "service bus" with standardized interfaces. The proposed architecture is shown in figure 5.
[Figure 5. Apache ALOIS service bus (draft)]
As a result, it will not only be easy to connect a proprietary or open source application to the system; it will also be possible to replace individual standard modules of Apache ALOIS with modules that better fit one's own special needs.
Computer forensics is a domain with highly specialised tools from numerous vendors. What is lacking, at least in our opinion, is an integration platform where all the data can be combined and correlated. Centralized data storage, the possibility of cross-correlation and task-oriented user interfaces are but a few of the numerous advantages of such a platform. To ensure the integration of all the tools from the different vendors, an open source implementation is recommended.
Apache ALOIS is an open source SIEM with built-in correlation and reporting. It is therefore not necessary to reinvent the wheel to build a forensic platform. Since it is open source software, it could be extended into a vendor-independent, cross-software computer forensics platform. Moreover, the fact that the project is part of the Apache community guarantees its independence, a commercially friendly licence (the Apache License, which allows free use and redistribution) and healthy development.
The author would like to thank the open source community in general, and the Apache community in particular, for their great work and support. Many thanks also go to the Apache ALOIS team for peer reviewing this paper.
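Since the service bus is still a draft, the following Python sketch only illustrates the idea under our own assumptions. Modules communicate solely through named topics, and replacing a standard module means registering another handler for the same topic. `ServiceBus`, `csv_import_agent` and the message format are hypothetical. The agent stands for an import filter that turns a forensic tool's CSV export into messages.

```python
import csv
import io
from typing import Callable, Dict, Iterable

# A normalized message as it might travel over the bus (hypothetical format).
Message = Dict[str, str]
Handler = Callable[[Message], None]


class ServiceBus:
    """Minimal dispatcher: modules are reachable only via named topics."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, topic: str, handler: Handler) -> None:
        # Registering again under the same topic replaces the previous module.
        self._handlers[topic] = handler

    def publish(self, topic: str, message: Message) -> None:
        handler = self._handlers.get(topic)
        if handler is None:
            raise LookupError(f"no module registered for topic {topic!r}")
        handler(message)


def csv_import_agent(export: str, source: str) -> Iterable[Message]:
    """Hypothetical agent: turn a tool's CSV export into bus messages."""
    for row in csv.DictReader(io.StringIO(export)):
        yield {"source": source, **row}
```

A forensic tool's export can then be fed to whichever storage module is currently registered under a topic such as `store`, without the agent knowing which module that is.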
- J.A. Schumpeter, “Business cycles; a theoretical, historical, and statistical analysis of the capitalist process”. New York, London: McGraw-Hill Book, 1939.
- G. Kroah-Hartman, J. Corbet, A. McPherson, “Linux Kernel Development: How Fast it is Going, Who is Doing It, What They are Doing, and Who is Sponsoring It”. http://www.linuxfoundation.org/publications/whowriteslinux.pdf
- R.M. Stallman, “Free Software, Free Society: Selected Essays of Richard M. Stallman”. Boston: Free Software Foundation, 2002.
- R.T. Fielding, “Shared leadership in the Apache project”. Commun. ACM, 42(4), pp. 42-43, 1999.
- S. O'Mahony, C.D. Fernando and E. Mamas, “IBM and Eclipse”. Harvard: Harvard Business School, 2005.
- S. Spaeth, M. Stuermer and G. von Krogh, “Enabling Knowledge Creation through Outsiders: Towards a Push Model of Open Innovation”. International Journal of Technology Management, 52(3-4), pp. 411-431, October 2010.
- A. Jollans, “Open Source Beyond Linux: Collaborative Innovation for your business”. Linux@IBM Event, Zürich, 27.10.2006.