Abstract
Unlike court documents in Western countries, legal documents of the Indian judiciary are unstructured, verbose, and noisy. In the justice system, statutes are written laws that judges cite in support of their decisions. Retrieving the statutes relevant to a given legal problem can help lawyers as well as laypeople. Moreover, the scarcity of publicly available annotated datasets of Indian legal documents limits the scope of legal analytics research. In this paper, we propose a ranking algorithm called CASRank to identify relevant statutes for a legal case query. We also develop a new dataset of 858 Central Acts enacted by the Indian Parliament, each annotated with attributes such as the act title, enactment date, act definition, chapters, sections, schedules, and footnotes. The first set of experiments determines the retrieval model best suited to CASRank. The second set of experiments measures the extent to which the attributes of the proposed Central Act dataset contribute to the effectiveness of statute retrieval. Experimental results show that the proposed approach obtains a MAP of 0.0776 and a Precision@10 of 0.0420, a considerable improvement in retrieval effectiveness.
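MAP and Precision@10, the two figures reported above, are standard ranked-retrieval metrics. The following is a minimal Python sketch of how they are conventionally computed (trec_eval-style definitions, assumed here rather than taken from the paper's evaluation code). The statute identifiers and relevance judgments in the usage example are hypothetical.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """P@k: fraction of the top-k retrieved statutes that are relevant.

    Divides by k even if fewer than k statutes were retrieved (trec_eval convention).
    """
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP: sum of precision at each rank holding a relevant statute,
    divided by the total number of relevant statutes (unretrieved ones add 0)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP: mean of per-query average precision over all queries in the run."""
    scores = [average_precision(runs[q], qrels.get(q, set())) for q in runs]
    return sum(scores) / len(scores) if scores else 0.0


def mean_precision_at_k(runs: Dict[str, Sequence[str]],
                        qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean P@k over all queries in the run."""
    scores = [precision_at_k(runs[q], qrels.get(q, set()), k) for q in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical ranked statute lists per case query, and hypothetical relevance judgments.
    runs = {"q1": ["S3", "S1", "S7", "S2"], "q2": ["S5", "S9"]}
    qrels = {"q1": {"S1", "S2"}, "q2": {"S4"}}
    print(mean_average_precision(runs, qrels))  # 0.25
    print(mean_precision_at_k(runs, qrels))     # 0.1
```

Because P@10 always divides by 10, a query with only one or two relevant statutes can never exceed a P@10 of 0.1 or 0.2, which is worth keeping in mind when reading the absolute values of these scores.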
Data Availability Statement
The Central Act dataset [46] generated during this study has been deposited in the Zenodo repository: https://doi.org/10.5281/zenodo.5088102
References
Amati G, Van Rijsbergen C J (2002) Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans Inform Syst (TOIS) 20(4):357–389
Belkin N J, Kantor P, Fox E A, Shaw J A (1995) Combining the evidence of multiple query representations for information retrieval. Inform Process Manag 31(3):431–448
Bhattacharya P, Ghosh K, Ghosh S, Pal A, Mehta P, Bhattacharya A, Majumder P (2019) Overview of the FIRE 2019 AILA track: artificial intelligence for legal assistance. In: FIRE (working notes). CEUR workshop proceedings, vol 2517, pp 1–12
Bhattacharya P, Paul S, Ghosh K, Ghosh S, Wyner A Z (2019) Identification of rhetorical roles of sentences in Indian legal judgments. arXiv:1911.05405
Bhatti U A, Huang M, Wu D, Zhang Y, Mehmood A, Han H (2019) Recommendation system using feature extraction and pattern recognition in clinical care systems. Enterp Inform Syst 13(3):329–351. https://doi.org/10.1080/17517575.2018.1557256
Das A, Ganguly D, Garain U (2017) Named entity recognition with word embeddings and Wikipedia categories for a low-resource language. ACM Trans Asian Low-Resource Lang Inform Process (TALLIP) 16(3):1–19
Farzindar A, Lapalme G (2004) LetSum, an automatic legal text summarizing system. In: Legal knowledge and information systems: JURIX 2004, the seventeenth annual conference, vol 120. IOS Press, pp 11–18
Galgani F, Compton P, Hoffmann A (2012) Citation based summarisation of legal texts. In: PRICAI 2012: trends in artificial intelligence. Springer, Berlin, pp 40–52
Géry M, Largeron C (2012) BM25t: a BM25 extension for focused information retrieval. Knowl Inform Syst 32(1):217–241
Hachey B, Grover C (2006) Extractive summarisation of legal texts. Artif Intell Law 14(4):305–345
Hliaoutakis A, Varelas G, Voutsakis E, Petrakis E G M, Milios E (2006) Information retrieval by semantic similarity. International Journal on Semantic Web and Information Systems (IJSWIS) 2(3):55–73
Jain D, Borah M D, Biswas A (2020) Fine-tuning TextRank for legal document summarization: a Bayesian optimization based approach. In: Forum for information retrieval evaluation. FIRE 2020, pp 41–48
Jain R, Agarwal A, Sharma Y (2020) SPECTRE@AILA-FIRE2020: supervised rhetorical role labeling for legal judgments using transformers. In: FIRE (working notes). CEUR Workshop proceedings, vol 2826, pp 66–70
Kanapala A, Pal S, Pamula R (2019) Text summarization from legal documents: a survey. Artif Intell Rev 51(3):371–402
Kim M-Y, Rabelo J, Goebel R (2019) Statute law information retrieval and entailment. In: Proceedings of the seventeenth international conference on artificial intelligence and law. ICAIL ’19, pp 283–289
Kim W, Lee Y, Kim D, Won M, Jung H (2016) Ontology-based model of law retrieval system for R&D projects. In: Proceedings of the 18th annual international conference on electronic commerce: e-commerce in smart connected world. ICEC ’16
Lefoane M, Koboyatshwene T, Rammidi G, Narasimham V L (2019) Legal statutes retrieval: a comparative approach on performance of title and statutes descriptive text. In: FIRE (working notes). CEUR Workshop Proceedings, vol 2517. CEUR-WS.org, pp 52–57
Li J, Sun A, Han J, Li C (2020) A survey on deep learning for named entity recognition. IEEE Trans Knowl Data Eng 34(1):50–70
Liu C-L, Chen K-C (2019) Extracting the gist of Chinese judgments of the supreme court. In: Proceedings of the seventeenth international conference on artificial intelligence and law, pp 73–82
Liu S, Zhou M X, Pan S, Song Y, Qian W, Cai W, Lian X (2012) TIARA: interactive, topic-based visual text summarization and analysis. ACM Trans Intell Syst Technol (TIST) 3(2):1–28
Liu Y-H, Chen Y-L, Ho W-L (2015) Predicting associated statutes for legal problems. Inform Process Manag 51(1):194–211. https://doi.org/10.1016/j.ipm.2014.07.003
Lloret E, Palomar M (2012) Text summarisation in progress: a literature review. Artif Intell Rev 37(1):1–41
Lovins J B (1968) Development of a stemming algorithm. Mech Transl Comput Linguistics 11(1–2):22–31
Mandal A, Ghosh K, Bhattacharya A, Pal A, Ghosh S (2017) Overview of the FIRE 2017 IRLeD track: information retrieval from legal documents. In: FIRE (working notes). CEUR Workshop Proceedings, vol 2036, pp 63–68
Merchant K, Pande Y (2018) NLP based latent semantic analysis for legal text summarization. In: 2018 International conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 1803–1807
Moens M-F (2005) Combining structured and unstructured information in a retrieval model for accessing legislation. In: Proceedings of the 10th international conference on artificial intelligence and law. ICAIL ’05, pp 141–145
More R, Patil J, Palaskar A, Pawde A (2019) Removing named entities to find precedent legal cases. In: FIRE (working notes). CEUR Workshop proceedings, vol 2517, pp 13–18
Oard D W, Baron J R, Hedin B, Lewis D D, Tomlinson S (2010) Evaluation of information retrieval for e-discovery. Artif Intell Law 18(4):347–386
Parikh V, Mathur V, Mehta P, Mittal N, Majumder P (2021) LawSum: a weakly supervised approach for Indian legal document summarization. arXiv:2110.01188
Polsley S, Jhunjhunwala P, Huang R (2016) CaseSummarizer: a system for automated summarization of legal texts. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: system demonstrations, pp 258–262
Rabelo J, Kim M-Y, Goebel R, Yoshioka M, Kano Y, Satoh K (2019) A summary of the COLIEE 2019 competition. In: JSAI International symposium on artificial intelligence. Springer, pp 34–49
Robertson S, Zaragoza H (2009) The probabilistic relevance framework: BM25 and beyond. Found Trends Inf Retr 3(4):333–389. https://doi.org/10.1561/1500000019
Rogers A, Kovaleva O, Rumshisky A (2020) A primer in BERTology: what we know about how BERT works. Trans Assoc Comput Ling 8:842–866
Saravanan M, Ravindran B, Raman S (2006) Improving legal document summarization using graphical models. In: Proceedings of the 2006 conference on legal knowledge and information systems: JURIX 2006: the nineteenth annual conference. IOS Press, NLD, pp 51–60
Shao Y, Ye Z (2019) THUIR@AILA 2019: information retrieval approaches for identifying relevant precedents and statutes. In: FIRE (working notes). CEUR Workshop Proceedings, vol 2517, pp 46–51
Teufel S, Moens M (2002) Summarizing scientific articles: experiments with relevance and rhetorical status. Comput Linguist 28(4):409–445
Thenmozhi D, Kannan K, Aravindan C (2017) A text similarity approach for precedence retrieval from legal documents. In: FIRE (working notes). CEUR Workshop Proceedings, vol 2036, pp 90–91
Trappey C V, Trappey A J C, Liu B-H (2020) Identify trademark legal case precedents - using machine learning to enable semantic analysis of judgments. World Patent Inf 62:101980. https://doi.org/10.1016/j.wpi.2020.101980
Turtle H (1995) Text retrieval in the legal world. Artif Intell Law 3(1):5–54
Van Opijnen M, Santos C (2017) On the concept of relevance in legal information retrieval. Artif Intell Law 25(1):65–87
Wang T, Chen P, Simovici D (2016) A new evaluation measure using compression dissimilarity on text summarization. Appl Intell 45(1):127–134
Wu H C, Luk R W P, Wong K F, Kwok K L (2008) Interpreting TF-IDF term weights as making relevance decisions. ACM Trans Inform Syst (TOIS) 26(3):1–37
Zhang N, Pu Y-F, Wang P (2015) An ontology-based approach for Chinese legal information retrieval. In: Proc CENet, pp 1–7
Zhang W, Yoshida T, Tang X (2011) A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Syst Appl 38(3):2758–2765
Zhao Z, Ning H, Liu L, Huang C, Kong L, Han Y, Han Z (2019) FIRE2019@AILA: legal information retrieval using improved BM25. In: FIRE (working notes). CEUR workshop proceedings, vol 2517, pp 40–45
Parashar S (2021) An annotated dataset of Central Acts enacted by the Indian Parliament for legal research. Zenodo. https://doi.org/10.5281/zenodo.5088102
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Parashar, S., Mittal, N. & Mehta, P. CASRank: A ranking algorithm for legal statute retrieval. Multimed Tools Appl 83, 5369–5386 (2024). https://doi.org/10.1007/s11042-023-15464-0
DOI: https://doi.org/10.1007/s11042-023-15464-0