Abstract
An issue that continues to impact digital forensics is the increasing volume of data and the growing number of devices. One proposed method to deal with the problem of “big digital forensic data”: the volume, variety, and velocity of digital forensic data, is to reduce the volume of data at either the collection stage or the processing stage. We have developed a novel approach which significantly improves on current practice, and in this paper we outline our data volume reduction process which focuses on imaging a selection of key files and data such as: registry, documents, spreadsheets, email, internet history, communications, logs, pictures, videos, and other relevant file types. When applied to test cases, a hundredfold reduction of original media volume was observed. When applied to real world cases of an Australian Law Enforcement Agency, the data volume further reduced to a small percentage of the original media volume, whilst retaining key evidential files and data. The reduction process was applied to a range of real world cases reviewed by experienced investigators and detectives and highlighted that evidential data was present in the data reduced forensic subset files. A data reduction approach is applicable in a range of areas, including: digital forensic triage, analysis, review, intelligence analysis, presentation, and archiving. In addition, the data reduction process outlined can be applied using common digital forensic hardware and software solutions available in appropriately equipped digital forensic labs without requiring additional purchase of software or hardware. The process can be applied to a wide variety of cases, such as terrorism and organised crime investigations, and the proposed data reduction process is intended to provide a capability to rapidly process data and gain an understanding of the information and/or locate key evidence or intelligence in a timely manner.
Similar content being viewed by others
References
Gartner. IT Glossary: Big Data. http://www.gartner.com/it-glossary/big-data/ (2013). Accessed 21 July 2013
Garfinkel, S.: Digital forensics research: the next 10 years. Digit. Investig. 7, S64–S73 (2010)
Raghavan, S.: Digital forensic research: current state of the art. CSI Trans. ICT 1(1), 91–114 (2013)
FBI_RCFL: FBI Regional Computer Forensic Laboratory Annual Reports 2003–2012. 2003–2012; http://www.rcfl.gov/downloads
Australia, C.o., National plan to combat cybercrime, A.C. Commission, Editor 2013: Canberra
Palmer, G.: A road map for digital forensic research. Report from the First Digital Forensic Research Workshop (DFRWS) (2001)
Richard, G., Roussev, V.: Digital Forensics Tools: The Next Generation. Digital Crime and Forensic Science in Cyberspace, p. 75, 2006
Beebe, N.: Digital Forensic Research: The Good, the Bad and the Unaddressed. Advances in Digital Forensics, pp. 17–36. Springer, Berlin (2009)
Kenneally, E., Brown, C.: Risk sensitive digital evidence collection. Digit. Investig. 2(2), 101–119 (2005)
Greiner, L.: Sniper Forensics. netWorker 13(4), 8–10 (2009)
Beebe, N., Clark, J.: Dealing with terabyte data sets in digital investigations. Advances in Digital Forensics, pp. 3–16. Springer, Berlin (2005)
Alzaabi, M., Jones, A., Martin, T.A.: An Ontology-Based Forensic Analysis Tool. Journal of Digital Forensics, Security & Law, 2013. In: 2013 Conference Supplement, pp. 121–135
van Baar, R.B., van Beek, H.M.A., van Eijk, E.J.: Digital forensics as a service: a game changer. Digit. Investig. 11, S54–S62 (2014)
Casey, E., Ferraro, M., Nguyen, L.: Investigation delayed is justice denied: proposals for expediting forensic examinations of digital evidence. J. Forensic Sci. 54(6), 1353–1364 (2009)
Casey, E., Katz, G., Lewthwaite, J.: Honing digital forensic processes. Digit. Investig. 10(2), 138–147 (2013)
Vidas, T., Kaplan, B., Geiger, M.: OpenLV: empowering investigators and first-responders in the digital forensics process. Digit. Investig. 11, S45–S53 (2014)
Noel, G.E., Peterson, G.L.: Applicability of latent Dirichlet allocation to multi-disk search. Digit. Investig. 11(1), 43–56 (2014)
Xu, Z., et al.: Knowle: a semantic link network based system for organizing large scale online news events. Future Gener. Comput. Syst. 43, 40–50 (2015)
Xu, Z., et al.: Crowdsourcing based social media data analysis of urban emergency events. In: Multimedia Tools and Applications, pp. 1–18, 2015
Xu, Z., et al.: Crowdsourcing based description of urban emergency events using social media big data. In: IEEE Transactions on Cloud Computing, PP(99): pp. 1–1, 2016
Brown, R., Pham, B., de Vel, O.: Design of a digital forensics image mining system. In: Knowledge-Based Intelligent Information and Engineering Systems, pp. 395–404, 2005
Pollitt, M.M.: Triage: a practical solution or admission of failure. Digit. Investig. 10(2), 87–88 (2013)
Ferraro, M.M., Russell, A.: Current issues confronting well-established computer-assisted child exploitation and computer crime task forces. Digit. Investig. 1(1), 7–15 (2004)
Turner, P.: Applying a forensic approach to incident response, network investigation and system administration using Digital Evidence Bags. Digit. Investig. 4(1), 30–35 (2007)
Parsonage, H.: Computer Forensics Case Assessment and Triage—some ideas for discussion, 2009. http://computerforensics.parsonage.co.uk/triage/triage.htm. Accessed 4 Aug 2013
Shiaeles, S., Chryssanthou, A., Katos, V.: On-scene triage open source forensic tool chests: are they effective? Digit. Investig. 10(2), 99–115 (2013)
Roussev, V., Richard, G.: Breaking the performance wall: The case for distributed digital forensics, 2004. In: Proceedings of the 2004 Digital Forensics Research Workshop, Vol. 94
Lee, J., Un, S., Hong, D.: High-speed search using Tarari content processor in digital forensics. Digit. Investig. 5, S91–S95 (2008)
Pringle, N., Sutherland, I.: Is a Computational Grid a Suitable Platform for High Performance Digital Forensics? In: Proceedings of the 7th European Conference on Information Warfare and Security 2008, Academic Conferences Limited, p. 175
Sheldon, A.: The future of forensic computing. Digit. Investig. 2(1), 31–35 (2005)
Alink, W., et al.: XIRAF—XML-based indexing and querying for digital forensics. Digit. Investig. 3, 50–58 (2006)
Bhoedjang, R.A.F., et al.: Engineering an online computer forensic service. Digit. Investig. 9(2), 96–108 (2012)
Ribaux, O., Walsh, S.J., Margot, P.: The contribution of forensic science to crime analysis and investigation: forensic intelligence. Forensic Sci. Int. 156(2), 171–181 (2006)
Kantardzic, M.: Data Mining: Concepts, Models, Methods, and Algorithms. Wiley, New York (2011)
Pyle, D.: Data Preparation for Data Mining, vol. 1. Morgan Kaufmann, Burlington (1999)
Fayyad, U., Piatetsky-Shapiro, G.: Knowledge discovery and data mining: towards a unifying framework. In: KDD, pp. 82–88, 1996
Shannon, M.: Forensic relative strength scoring: ASCII and entropy scoring. Int. J. Digit. Evid. 2(4), 151–169 (2004)
Wang, L., et al.: Particle swarm optimization based dictionary learning for remote sensing big data. Knowl. Based Syst. 79, 43–50 (2015)
Wang, L., et al.: IK-SVD: dictionary learning for spatial big data via incremental atom update. Comput. Sci. Eng. 16(4), 41–52 (2014)
Ma, Y., et al.: Towards building a data-intensive index for big data computing—a case study of remote sensing data processing. In: Information Sciences, 2014
Stüttgen, J.: Selective imaging: creating efficient forensic images by selecting content first. Mannheim University, 2011
Garfinkel, S.L.: Forensic feature extraction and cross-drive analysis. Digit. Investig. 3, 71–81 (2006)
Shaw, A., Browne, A.: A practical and robust approach to coping with large volumes of data submitted for digital forensic examination. Digit. Investig. 10(2), 116–128 (2013)
Grier, J., Richard III, G.G.: Rapid forensic acquisition of large media with sifting collectors. Digit. Investig. 2015(14), S34–S44 (2015)
Quick, D., Choo, K.-K.R.: Data reduction and data mining framework for digital forensic evidence: storage, intelligence, review and archive. Trends Issues Crime Crim. Justice 480, 1–11 (2014)
ISO/IEC, 27037:2012 Guidelines for identification, collection, acquisition and preservation of digital evidence, in Information technology—Security techniques. ISO, Geneva (2012)
ACPO: Good Practice Guidelines for Computer Based Evidence v4.0. 2006. www.7safe.com/electronic_evidence. Accessed 5 Mar 2014
NIJ: Forensic Examination of Digital Evidence: A Guide for Law Enforcement, 2004. http://nij.gov/nij/pubs-sum/199408.htm
Alqahtany, S., et al.: A forensic acquisition and analysis system for IaaS. In: Cluster Computing, pp. 1–15, 2015
Hu, C., et al.: Semantic link network-based model for organizing multimedia big data. IEEE Trans. Emerg. Top. Comput. 2(3), 376–387 (2014)
Xu, Z., et al.: Semantic based representing and organizing surveillance big data using video structural description technology. J. Syst. Softw. 102, 217–225 (2015)
Hu, C., et al.: Video structural description technology for the new generation video surveillance systems. Front. Comput. Sci. 9(6), 980–989 (2015)
Xu, Z., et al.: Semantic enhanced cloud environment for surveillance data management using video structural description. In: Computing, pp. 1–20, 2014
Alhussein, M.: Automatic facial emotion recognition using weber local descriptor for e-Healthcare system. In: Cluster Computing, pp. 1–10, 2016
Jones, B., Pleno, S., Wilkinson, M.: The use of random sampling in investigations involving child abuse material. Digit. Investig. 9, S99–S107 (2012)
Garfinkel, S., et al.: Bringing science to digital forensics with standardized forensic corpora. Digit. Investig. 6, S2–S11 (2009)
Ribaux, O., et al.: Intelligence-led crime scene processing. Part I: Forensic intelligence. Forensic Sci. Int. 195(1–3), 10–16 (2010)
Luo, X., et al.: Building association link network for semantic link on web resources. IEEE Trans. Autom. Sci. Eng. 8(3), 482–494 (2011)
Xu, Z., et al.: Measuring the semantic discrimination capability of association relations. Concurr. Comput. 26(2), 380–395 (2014)
Xu, Z., et al.: Generating temporal semantic context of concepts using web search engines. J. Netw. Comput. Appl. 43, 42–55 (2014)
Wei, X., et al.: Online comment-based hotel quality automatic assessment using improved fuzzy comprehensive evaluation and fuzzy cognitive map. IEEE Trans. Fuzzy Syst. 23(1), 72–84 (2015)
Xu, Z., et al.: Mining temporal explicit and implicit semantic relations between entities using web search engines. Future Gener. Comput. Syst. 37, 468–477 (2014)
Xuan, J., et al.: Uncertainty analysis for the keyword system of web events, 2015
Zhao, L., et al.: Geographical information system parallelization for spatial big data processing: a review. In: Cluster Computing, pp. 1–14, 2015
Punithavathani, D.S., Sujatha, K., Jain, J.M.: Surveillance of anomaly and misuse in critical networks to counter insider threats using computational intelligence. Clust. Comput. 18(1), 435–451 (2015)
Ghaleb, T.A.: Techniques and countermeasures of website/wireless traffic analysis and fingerprinting. In: Cluster Computing, pp. 1–12, 2015
Acknowledgments
The views and opinions expressed in this article are those of the authors alone and not the organizations with whom the authors are or have been associated/supported. The authors thank the anonymous reviewers and their colleagues at the Electronic Crime Section of the South Australia Police for their assistance and providing constructive and generous feedback.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Quick, D., Choo, KK.R. Big forensic data reduction: digital forensic images and electronic evidence. Cluster Comput 19, 723–740 (2016). https://doi.org/10.1007/s10586-016-0553-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10586-016-0553-1