A framework for post-event timeline reconstruction using neural networks
Introduction
Digital forensics, also called computer forensics or cyber forensics, has emerged as a new field of study over the last decade due to the increasingly technical nature of computer crimes. Digital forensics aims to find and explain the cause of an event, or set of events, that occurred on a computer. The field is very diverse: digital evidence is required in a wide range of computer-related crimes, and a range of methods and techniques from the disciplines of engineering and computer science have been studied and implemented. The incidence of computer-related crime is rising rapidly, mainly because of the widespread use of the Internet and the electronic transformation of business and personal communications. In addition, the advent of pervasive electronic devices that are compatible with computing machines has made forensic investigations more complex; consequently, digital forensic investigators have to analyse increasingly large volumes of increasingly diverse data. Larger and more diverse data sets often demand additional resources and greater cost to complete an effective digital forensic investigation. In such scenarios, an efficacious event reconstruction process may be of additional value during digital forensic investigations.
During a digital investigation, forensic analysis of electronic storage devices is undertaken to find evidence for use in legal proceedings. This requires a systematic methodology for drawing conclusions from digital evidence. Carrier categorised digital forensics into three major phases: acquisition, analysis and presentation. The purpose of the acquisition phase is to preserve the state of a digital system so that evidence may be obtained and analysed later under controlled conditions. The analysis phase examines the acquired data to identify evidence of malicious activity, and it is the focus of this paper. The presentation phase is based entirely on legal rules and regulations, which vary with the jurisdiction where the evidence is located, and is therefore beyond the scope of this paper.
Usually, the first step undertaken by the investigator during the acquisition phase is to acquire images of the storage devices of a seized computer. The next step involves locating evidence, which may include the discovery of documents and image files relevant to the investigation. In a digital forensic investigation, as in a conventional crime scene reconstruction, it is important to be able to prepare a timeline of the file system activities on a computer system. The timeline is used to reconstruct the suspected crime: it highlights user access to the target system and the execution of particular applications, and identifies system and data files that were accessed, modified or deleted during periods of interest to the investigation. However, timeline analysis has an inherent vulnerability: timestamps may be manipulated by cleverly designed, specialised computer programs (Bishop, 2003), a well-known tactic used by hackers to hide their unauthorised access to a system.
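To illustrate the kind of timeline referred to above, the following sketch (not part of the original work) walks a directory tree and orders files by their modification, access and change timestamps. For simplicity it reads a live file system via Python's `os.stat`; a real investigation would parse these timestamps from an acquired disk image, and, as noted, the timestamps themselves may have been manipulated.

```python
import datetime
import os
import tempfile

def build_timeline(root):
    """Collect (timestamp, event, path) tuples from file metadata.

    A minimal illustration only: forensic tools parse timestamps from
    a disk image rather than stat-ing a live file system.
    """
    events = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable
            for label, ts in (("modified", st.st_mtime),
                              ("accessed", st.st_atime),
                              ("changed", st.st_ctime)):
                events.append((datetime.datetime.fromtimestamp(ts),
                               label, path))
    # Sorting by timestamp yields the reconstructed order of events.
    return sorted(events)

# Usage: list the events recorded for a freshly created file.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    for when, what, path in build_timeline(d):
        print(when.isoformat(), what, os.path.basename(path))
```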
To present evidence obtained from a seized computer in a court of law, the process by which the evidence was derived must be clearly shown to follow accepted procedures and practices (Thomas and Forcht, 2004). In legal proceedings, this practice is known as the ‘chain of custody’. To prove the authenticity of evidence, comprehensible chains of reasoning about the scientific methods used to extract it must be submitted to the court. The use of artificial intelligence and machine learning techniques in digital forensic investigations particularly requires that solid reasoning about their validity and accuracy be made clear to the legal community, as these techniques differ from the conventional methodologies in common use. The successful use of well-known data classification and pattern-matching approaches, such as artificial neural networks, in engineering design, object and speech recognition, image processing, medical diagnostics and related fields (Bishop, 1996) may ease this legal requirement.
In this paper, we propose a neural network based approach for creating a timeline of the events that occurred on a computer system in the past. Our approach is based on determining the footprint left behind by different applications as they execute on a system. We chose Windows XP as our target operating system because of its popularity and because it is the operating system most often attacked (Stolfo, 2005). However, much of our methodology can be tailored to other operating systems, such as Unix/Linux or earlier versions of Windows. Our approach can be used to trace the evidence of a digital crime, as it helps spot the areas of prospective investigation on which forensic investigators could most usefully concentrate. A prerequisite of our approach is the assumption that the computer has been in normal use, without any attempt to hide or disguise file system activity using encryption or scrubbing software.
Section snippets
Background and motivation
Digital forensics is the application of computer examination and analysis techniques to uncover potential legal evidence. Digital evidence may be sought in a wide range of computer crimes or misuse, including theft of trade secrets, fraud, child pornography and the destruction or contamination of data (Thomas and Forcht, 2004). Digital forensic investigation is commonly employed as a post-event response to a serious information security breach (Stolfo, 2005) to find the suspicious activities and...
Computational design model
Our approach to classifying application programs, as shown in Fig. 1, is based on investigating the electronic footprint that an application leaves on the disk storage media, and then developing a neural network that can reliably detect application activity from input parameters derived from the disk image. The key elements of the footprint are:
Log files: When an application runs, some of its activities are recorded in log files, such as the creation of network...
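A minimal sketch of how such footprint elements might be turned into inputs for the network, assuming hypothetical event categories (`log_writes`, `registry_reads`, `temp_files`, `dll_loads`) that are illustrative rather than taken from the original work:

```python
from collections import Counter

# Hypothetical event categories observed in one time window; the real
# footprint elements are described in the text above.
FEATURES = ["log_writes", "registry_reads", "temp_files", "dll_loads"]

def to_vector(events):
    """Map a list of event-type strings to a fixed-length feature vector."""
    counts = Counter(events)
    total = sum(counts.values()) or 1
    # Normalise so windows with different activity levels are comparable.
    return [counts[f] / total for f in FEATURES]

# Usage: a window dominated by log writes with one DLL load.
print(to_vector(["log_writes", "log_writes", "dll_loads"]))
```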
Experimentation methodology
We used two models of neural network, feedforward and recurrent, to train and test the data in our experiments. This section introduces neural networks and describes our experimentation methodology.
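A feedforward network of the kind mentioned above can be sketched in a few lines of NumPy. The toy data, layer sizes, learning rate and iteration count below are illustrative assumptions, not the parameters used in our experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two normalised feature vectors per class, labelled 0 or 1.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])

# One hidden layer of four sigmoid units; sizes are arbitrary choices.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    # Forward pass through hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of squared-error loss.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Plain gradient descent update, learning rate 0.5.
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(axis=0)

preds = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(preds.ravel())
```

A recurrent network would additionally feed hidden-layer activations from the previous time step back into the hidden layer, which suits sequential file system activity.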
Experiments, results and discussion
We used Internet Explorer (IE) as a case study for the first phase of our experiments. In the second phase we used a group of eight application programs: MS-Word, Notepad, PowerPoint, Excel, Internet Explorer, Adobe Acrobat Reader, Windows Media Player and Matlab. These applications were run concurrently to establish their file system access patterns. We noted down the time of launching, running and closing of each individual application program to classify...
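The noting down of launch and close times, used to label which applications were active in each window, can be sketched as follows; the session times and the five-minute window are invented for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical ground-truth log of launch and close times for two
# of the applications, as noted down during an experiment run.
sessions = {
    "IE":      (datetime(2007, 1, 1, 10, 0), datetime(2007, 1, 1, 10, 30)),
    "MS-Word": (datetime(2007, 1, 1, 10, 15), datetime(2007, 1, 1, 11, 0)),
}

def active_apps(t):
    """Return the set of applications running at time t."""
    return {app for app, (start, end) in sessions.items()
            if start <= t <= end}

# Usage: label five-minute windows with the concurrently running apps.
t = datetime(2007, 1, 1, 10, 0)
while t < datetime(2007, 1, 1, 10, 20):
    print(t.time(), sorted(active_apps(t)))
    t += timedelta(minutes=5)
```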
Conclusion
The construction of a comprehensive timeline of digital evidence may provide a broader picture of the sequence and timing of events relating to computer misuse. In this paper, we have presented a comprehensive and adaptable forensic analysis approach for post-event timeline reconstruction using neural networks. A framework for this experiment was developed for the Microsoft Windows platform, and a number of experiments were carried out to learn the pattern of file system manipulation by...
References (28)
- Finding structure in time. Cognit Sci (1990).
- et al. An empirical study of event reconstruction systems. Dig Invest (2006).
- Ahmad A, Ruighaver AB. FIRESTORM: exploring the need for a forensic tool for pattern correlation in Windows NT audit...
- Computer security: art and science (2003).
- Neural networks for pattern recognition (1996).
- Buchholz F, Falk C. Design and implementation of Zeitline: a forensic timeline editor. In: Digital forensics research...
- Carrier BD. Open source digital forensics tools: the legal argument...
- Data mining or knowledge discovery in databases: an overview.
- Defining digital forensic examination and analysis tools using abstraction layers. Int J Digit Evid (2003).
- Cohen W. Fast effective rule induction. In: 12th International conference on machine learning (ICML 95); 1995. p....
- Error, uncertainty and loss in digital evidence. Int J Digit Evid.
- Defining event reconstruction of digital crime scenes. J Forensic Sci.