Computer Science and Information Systems 2014 Volume 11, Issue 3, Pages: 1037-1054
https://doi.org/10.2298/CSIS130920063L
The efficient implementation of distributed indexing with Hadoop for digital investigations on Big Data
Lee Taerim (Pukyong National University, Busan, Republic of Korea)
Lee Hyejoo (Kongju National University, Gongju, Republic of Korea)
Rhee Kyung-Hyune (Pukyong National University, Busan, Republic of Korea)
Shin Uk Sang (Pukyong National University, Busan, Republic of Korea)
Big Data brings new challenges to the field of e-Discovery or digital
forensics, and these challenges are mostly connected to the methods used
for data processing. Because time and cost are the most important factors
in determining the success or failure of a digital investigation, the
development of a valid indexing method for efficient search should come first,
so that relevant evidence can be found in Big Data more quickly and
accurately. This paper therefore introduces a Distributed Text Processing
System based on Hadoop, called DTPS, and explains the distinctions between
DTPS and other related research to emphasize its necessity. In addition, this
paper describes various experimental results in order to find the best
implementation strategy for using Hadoop MapReduce for distributed
indexing and to analyze the practical value of DTPS through a comparative
evaluation of its performance against similar tools. In short, the ultimate
purpose of this research is the development of a useful search engine specially
aimed at Big Data indexing as a major part of a future e-Discovery cloud
service.
Keywords: electronic discovery, e-discovery, digital forensics, evidence search, indexing performance, Hadoop MapReduce, distributed indexing