A framework for post-event timeline reconstruction using neural networks
Introduction
Digital forensics, also called computer forensics or cyber forensics, has emerged as a new field of study over the last decade due to the increasingly technical nature of computer crimes. Digital forensics aims to find and explain the cause of an event, or set of events, that occurred on a computer. The field is very diverse: digital evidence is required in a wide range of computer-related crimes, and a range of methods and techniques from the disciplines of engineering and computer science have been studied and implemented. The incidence of computer-related crime is rising rapidly, mainly because of the widespread use of the Internet and the electronic transformation of business and personal communications. In addition, the advent of pervasive electronic devices that are compatible with computing machines has made forensic investigations more complex; consequently, digital forensic investigators have to analyse increasingly large volumes of increasingly diverse data. Larger and more diverse data sets often demand additional resources and greater cost to complete an effective digital forensic investigation. In such scenarios, an efficacious event reconstruction process may be of additional value during digital forensic investigations.
During a digital investigation, forensic analysis of electronic storage devices is undertaken to find evidence for use in legal proceedings. This requires a systematic methodology for drawing conclusions from digital evidence. Carrier categorised digital forensics into three major phases: acquisition, analysis and presentation. The purpose of the acquisition phase is to preserve the state of a digital system so that evidence may be obtained and analysed later under controlled conditions. The analysis phase examines the acquired data to identify evidence of malicious activity, and it is the focus of this paper. The presentation phase is based entirely on legal rules and regulations, which vary with the jurisdiction where the evidence is located, and is therefore beyond the scope of this paper.
Usually, the first step undertaken by the investigator during the acquisition phase is to acquire images of the storage devices of a seized computer. The next step involves locating evidence, which may include the discovery of documents and image files relevant to the investigation. In a digital forensic investigation, as in a conventional crime scene reconstruction, it is important to be able to prepare a timeline of the file system activities on a computer system. The timeline is used to reconstruct the suspected crime: it highlights user access to the target system and the execution of particular applications, and identifies system and data files that were accessed, modified or deleted during periods of interest to the investigation. However, timeline analysis has an inherent vulnerability: timestamps may be manipulated by cleverly designed, specialised computer programs (Bishop, 2003), a well-known tactic used by hackers to hide their unauthorised access to a system.
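To illustrate the kind of timeline referred to above, the following sketch (not part of the original work) walks a directory tree and orders files by their modification, access and change timestamps. For simplicity it reads a live file system via Python's `os.stat`; a real investigation would parse these timestamps from an acquired disk image, and, as noted, the timestamps themselves may have been manipulated.

```python
import datetime
import os
import tempfile

def build_timeline(root):
    """Collect (timestamp, event, path) tuples from file metadata.

    A minimal illustration only: forensic tools parse timestamps from
    a disk image rather than stat-ing a live file system.
    """
    events = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable
            for label, ts in (("modified", st.st_mtime),
                              ("accessed", st.st_atime),
                              ("changed", st.st_ctime)):
                events.append((datetime.datetime.fromtimestamp(ts),
                               label, path))
    # Sorting by timestamp yields the reconstructed order of events.
    return sorted(events)

# Usage: list the events recorded for a freshly created file.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    for when, what, path in build_timeline(d):
        print(when.isoformat(), what, os.path.basename(path))
```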
To present evidence obtained from a seized computer in a court of law, the process by which the evidence was derived must be clearly shown to follow accepted procedures and practices (Thomas and Forcht, 2004). In legal proceedings, this practice is known as the ‘chain of custody’. To prove the authenticity of evidence, comprehensible chains of reasoning about the scientific methods used to extract it must be submitted to the court. The use of artificial intelligence and machine learning techniques in digital forensic investigations particularly requires that solid reasoning about their validity and accuracy be made clear to the legal community, as these techniques differ from the conventional methodologies in common use. The successful use of well-known data classification and pattern-matching approaches, such as artificial neural networks, in engineering design, object and speech recognition, image processing, medical diagnostics and related fields (Bishop, 1996) may ease this legal requirement.
In this paper, we propose a neural network based approach for creating a timeline of the events that occurred on a computer system in the past. Our approach is based on determining the footprint left behind by different applications as they execute on a system. We chose Windows XP as our target operating system because of its popularity and because it is the operating system most often attacked (Stolfo, 2005). However, much of our methodology can be tailored to other operating systems, such as Unix/Linux or earlier versions of Windows. Our approach can be used to trace the evidence of a digital crime, as it helps spot the areas of prospective investigation on which forensic investigators could most usefully concentrate. A prerequisite of our approach is the assumption that the computer has been in normal use, without any attempt to hide or disguise file system activity using encryption or scrubbing software.
Section snippets
Background and motivation
Digital forensics is the application of computer examination and analysis techniques to uncover potential legal evidence. Digital evidence may be sought in a wide range of computer crimes or misuse, including theft of trade secrets, fraud, child pornography and the destruction or contamination of data (Thomas and Forcht, 2004). Digital forensic investigation is commonly employed as a post-event response to a serious information security breach (Stolfo, 2005) to find the suspicious activities and...
Computational design model
Our approach to classifying application programs, as shown in Fig. 1, is based on investigating the electronic footprint that an application leaves on the disk storage media, and then developing a neural network that can reliably detect application activity from input parameters derived from the disk image. The key elements of the footprint are:
Log files: When an application runs, some of its activities are recorded in log files, such as the creation of network...
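A minimal sketch of how such footprint elements might be turned into inputs for the network, assuming hypothetical event categories (`log_writes`, `registry_reads`, `temp_files`, `dll_loads`) that are illustrative rather than taken from the original work:

```python
from collections import Counter

# Hypothetical event categories observed in one time window; the real
# footprint elements are described in the text above.
FEATURES = ["log_writes", "registry_reads", "temp_files", "dll_loads"]

def to_vector(events):
    """Map a list of event-type strings to a fixed-length feature vector."""
    counts = Counter(events)
    total = sum(counts.values()) or 1
    # Normalise so windows with different activity levels are comparable.
    return [counts[f] / total for f in FEATURES]

# Usage: a window dominated by log writes with one DLL load.
print(to_vector(["log_writes", "log_writes", "dll_loads"]))
```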
Experimentation methodology
We used two models of neural network, feedforward and recurrent, to train and test the data in our experiments. This section introduces neural networks and describes our experimentation methodology.
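A feedforward network of the kind mentioned above can be sketched in a few lines of NumPy. The toy data, layer sizes, learning rate and iteration count below are illustrative assumptions, not the parameters used in our experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two normalised feature vectors per class, labelled 0 or 1.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])

# One hidden layer of four sigmoid units; sizes are arbitrary choices.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    # Forward pass through hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of squared-error loss.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Plain gradient descent update, learning rate 0.5.
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

preds = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(preds.ravel())
```

A recurrent network would additionally feed hidden-layer activations from the previous time step back into the hidden layer, which suits sequential file system activity.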
Experiments, results and discussion
We used Internet Explorer (IE) as a case study for the first phase of our experiments. In the second phase we used a group of eight application programs: MS-Word, Notepad, PowerPoint, Excel, Internet Explorer, Adobe Acrobat Reader, Windows Media Player and Matlab. These applications were run concurrently to establish their file system access patterns. We noted down the time of launching, running and closing of each individual application program to classify...
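The noting down of launch and close times, used to label which applications were active in each window, can be sketched as follows; the session times and the five-minute window are invented for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical ground-truth log of launch and close times for two
# of the applications, as noted down during an experiment run.
sessions = {
    "IE":      (datetime(2007, 1, 1, 10, 0), datetime(2007, 1, 1, 10, 30)),
    "MS-Word": (datetime(2007, 1, 1, 10, 15), datetime(2007, 1, 1, 11, 0)),
}

def active_apps(t):
    """Return the set of applications running at time t."""
    return {app for app, (start, end) in sessions.items()
            if start <= t <= end}

# Usage: label five-minute windows with the concurrently running apps.
t = datetime(2007, 1, 1, 10, 0)
while t < datetime(2007, 1, 1, 10, 20):
    print(t.time(), sorted(active_apps(t)))
    t += timedelta(minutes=5)
```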
Conclusion
The construction of a comprehensive timeline of digital evidence may provide a broader picture of the sequence and timing of events relating to computer misuse. In this paper, we have presented a comprehensive and adaptable forensic analysis approach for post-event timeline reconstruction using neural networks. A framework for this experiment was developed for the Microsoft Windows platform, and a number of experiments were carried out to learn the pattern of file system manipulation by...
References (28)
- Finding structure in time. Cognit Sci (1990).
- et al. An empirical study of event reconstruction systems. Dig Invest (2006).
- Ahmad A, Ruighaver AB. FIRESTORM: exploring the need for a forensic tool for pattern correlation in Windows NT audit...
- Computer security: art and science (2003).
- Neural networks for pattern recognition (1996).
- Buchholz F, Falk C. Design and implementation of Zeitline: a forensic timeline editor. In: Digital forensics research...
- Carrier BD. Open source digital forensics tools: the legal argument...
- Data mining or knowledge discovery in databases: an overview.
- Defining digital forensic examination and analysis tools using abstraction layers. Int J Digit Evid (2003).
- Cohen W. Fast effective rule induction. In: 12th International conference on machine learning (ICML 95); 1995. p....
- Error, uncertainty and loss in digital evidence. Int J Digit Evid.
- Defining event reconstruction of digital crime scenes. J Forensic Sci.