Authors:
Agrim Dewan¹; Poojith U. Rao²; Balwinder Sodhi² and Ritu Kapur²
Affiliations:
¹ Department of Computer Science and Engineering, Punjab Engineering College, Chandigarh, India;
² Department of Computer Science and Engineering, Indian Institute of Technology Ropar, Punjab, India
Keyword(s):
Third-party Library Detection, Code Similarity, Paragraph Vectors, Software Bloat, Obfuscation.
Abstract:
Third-party libraries (TPLs) provide ready-made implementations of various software functionalities and are frequently used in software development. However, as software development progresses through various iterations, a set of unused TPLs often remains referenced in the application's distributable. These unused TPLs are a prominent source of software bloat and cause excessive consumption of resources such as CPU cycles, memory, and mobile devices' battery. Thus, identifying such bloat-TPLs is essential. We present a rapid, storage-efficient, obfuscation-resilient method to detect bloat-TPLs. The novel aspects of our approach are: i) computing a vector representation of a .class file using a model that we call Jar2Vec, which is trained using the Paragraph Vector Algorithm; ii) converting each .class file into a normalized form via semantics-preserving transformations before using it to train the Jar2Vec models; and iii) a Bloated Library Detector (BloatLibD), developed and tested with 27 different Jar2Vec models. These models were trained using different parameters and >30000 .class files taken from >100 different Java libraries available at MavenCentral.com. BloatLibD achieves an accuracy of 99% with an F1 score of 0.968 and outperforms the existing tools, viz., LibScout, LiteRadar, and LibD, with accuracy improvements of 74.5%, 30.33%, and 14.1%, respectively. Compared with LibD, BloatLibD achieves a 61.37% improvement in response time and an 87.93% reduction in storage. Our program artifacts are available at https://bit.ly/2WFALXf.
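The general shape of the pipeline in the abstract (normalize a .class file, turn it into a vector, compare that vector against vectors of known libraries) can be illustrated with a small sketch. This is not Jar2Vec or BloatLibD: in place of a trained Paragraph Vector model it uses bag-of-bigram count vectors with cosine similarity, and the normalization (renaming identifiers to positional placeholders) is only a hypothetical example of a semantics-preserving transformation. The input is assumed to be javap-style disassembly text; `KEEP`, `normalize`, `vectorize`, `cosine`, `detect`, the library name `acme-util`, and the 0.8 threshold are all illustrative.

```python
import math
import re
from collections import Counter
from typing import Optional

# Illustrative keep-list (an assumption, not from the paper): tokens an
# obfuscator cannot rename, i.e. a few JVM opcodes and java.lang names.
KEEP = {
    "aload_0", "aload_1", "iload_1", "iload_2", "iadd", "ireturn",
    "areturn", "return", "getfield", "putfield", "invokevirtual",
    "invokespecial", "invokestatic", "new", "dup", "ldc",
    "java", "lang", "Object", "String",
}


def normalize(disassembly: str) -> list[str]:
    """Hypothetical normalization: replace every identifier not in KEEP with a
    placeholder (ID0, ID1, ...) numbered by first appearance, so a consistent
    renaming of packages, classes, fields, and methods yields the same tokens."""
    mapping: dict[str, str] = {}
    tokens: list[str] = []
    for tok in re.findall(r"[A-Za-z_$][\w$]*", disassembly):
        if tok in KEEP:
            tokens.append(tok)
        else:
            tokens.append(mapping.setdefault(tok, f"ID{len(mapping)}"))
    return tokens


def vectorize(tokens: list[str]) -> Counter:
    """Bag of token bigrams: a simple stand-in for a Jar2Vec paragraph vector."""
    return Counter(zip(tokens, tokens[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(class_text: str, library_index: dict[str, Counter],
           threshold: float = 0.8) -> Optional[tuple[str, float]]:
    """Return (library, score) for the most similar known library, or None if
    no library reaches the (illustrative) similarity threshold."""
    vec = vectorize(normalize(class_text))
    scores = ((name, cosine(vec, ref)) for name, ref in library_index.items())
    best = max(scores, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


# Usage: index one known library class, then query a renamed (obfuscated) copy.
ORIGINAL = """
aload_0
getfield com/acme/util/Counter.count
iload_1
iadd
ireturn
"""
OBFUSCATED = """
aload_0
getfield x/y/z/A.b
iload_1
iadd
ireturn
"""
index = {"acme-util": vectorize(normalize(ORIGINAL))}
print(detect(OBFUSCATED, index))  # prints ('acme-util', 1.0)
```

Because placeholders are assigned by order of first appearance, a consistent rename yields an identical token stream, so the renamed class still matches its library with a score of 1.0. A trained Paragraph Vector model would replace `vectorize` with an inferred embedding, which is intended to also capture similarity between classes that share no exact bigrams.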