Published by Elsevier Ltd.
Unique file identification in the National Software Reference Library
Received 22 June 2006.
Abstract
The National Software Reference Library (NSRL) provides a repository of known software, file profiles, and file signatures for use by law enforcement and other organizations involved with computer forensic investigations.
During a forensic investigation, hundreds of thousands of files may be encountered. The NSRL is used to identify known files, which can reduce the time spent examining a computer: files that match those of common operating systems and applications do not need to be searched for evidence, either manually or electronically. The NSRL is also used to determine which software applications are present on a system. This may suggest how the computer was being used and indicate how and where to search for evidence.
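As a minimal sketch of the known-file filtering described above, the Python code below hashes each file with MD5 and SHA-1 using the standard `hashlib` module and separates files whose SHA-1 appears in a reference set from files that still need examination. It assumes the reference signatures have already been loaded into an in-memory set of uppercase hexadecimal SHA-1 strings. The names `file_digests`, `split_known_unknown`, and `known_sha1` are hypothetical, and parsing the actual NSRL distribution format is not shown.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (MD5, SHA-1) uppercase hex digests of a file, read in chunks
    so that large files do not have to fit in memory."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest().upper(), sha1.hexdigest().upper()


def split_known_unknown(paths, known_sha1):
    """Partition paths into (known, unknown) lists.

    known_sha1 is a hypothetical, pre-loaded set of uppercase hex SHA-1
    strings standing in for a reference hash set such as the NSRL's.
    """
    known, unknown = [], []
    for p in paths:
        _, sha1 = file_digests(p)
        (known if sha1 in known_sha1 else unknown).append(p)
    return known, unknown
```

Only the `unknown` list would then go on to manual or automated review; this filtering is only sound if a matching signature really implies an identical file, which is the uniqueness property the paper examines.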
This paper examines whether the techniques used to create file signatures in the NSRL produce unique results, a core property on which most uses of the NSRL depend. Uniqueness is analyzed in two ways: an empirical analysis of the file signatures within the NSRL, and a review of recent attacks on the hash algorithms used to generate those signatures.
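One way to picture an empirical collision check, offered as an assumed illustration rather than the paper's own procedure, is to group signature records by one digest and flag any group whose members disagree on another signature field: two records with the same MD5 but different SHA-1 values (or sizes) must describe different files. The record layout (dicts with `sha1`, `md5`, and `size` keys) and the function name `find_collisions` are hypothetical.

```python
from collections import defaultdict


def find_collisions(records, key_field, check_fields):
    """Group records by key_field and return {digest: variants} for every
    digest whose records disagree on at least one of check_fields.

    Each record is assumed to be a dict; a returned group means distinct
    file signatures share the same key_field digest.
    """
    groups = defaultdict(set)
    for rec in records:
        # Collect the distinct combinations of the other signature fields.
        groups[rec[key_field]].add(tuple(rec[f] for f in check_fields))
    return {
        digest: sorted(variants)
        for digest, variants in groups.items()
        if len(variants) > 1
    }


# Usage with made-up placeholder values:
# find_collisions(records, "md5", ["sha1", "size"])   # candidate MD5 collisions
# find_collisions(records, "sha1", ["md5", "size"])   # candidate SHA-1 collisions
```

A flagged group is only a candidate: inconsistent metadata in the records would produce the same signal, so the underlying file contents would still need to be compared.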
Keywords: Computer forensics; National Software Reference Library; File hashing; MD5; SHA-1; Digital fingerprint; Collisions; File signature
Article Outline
- 1. Introduction
- 2. NSRL file signatures
- 3. Uniqueness
- 4. Approach
- 4.1. Examining the NSRL for collisions
- 4.2. Likelihood for future collisions
- 4.3. The impact of the recent attacks on MD5 and SHA-1
- 5. Results: collisions in the NSRL
- 6. Future collisions—file bias on signature generation
- 6.1. Statistical test suite overview
- 6.1.1. Statistical hypothesis testing
- 6.1.2. Significance level, α
- 6.1.3. Probability value (P-value)
- 6.1.4. STS tests
- 6.1.5. Methodology (STS)
- 6.2. Results summary
- 7. Results: future collisions—detecting file bias through data visualization
- 7.1. Methodology
- 7.2. Analysis summary
- 8. Implications of MD5 and SHA-1 attacks on forensic use of hashes for file identification
- 9. Conclusions
- Acknowledgements
- References
- Vitae