Legal inference is fundamental for building and verifying hypotheses in police investigations. In this study, we build a Natural Language Inference dataset in Korean for the legal domain, focusing on criminal court verdicts. We developed two tools: an adversarial hypothesis collection tool that challenges the annotators and gives us a deeper understanding of the data, and a hypothesis network construction tool that uses visualized graphs to show a use case scenario for the developed model. The data is augmented using a combination of Easy Data Augmentation approaches and round-trip translation, as crowd-sourcing might not be an option for datasets containing sensitive data. We extensively discuss the challenges we encountered, such as the annotators' limited domain knowledge, issues in the data augmentation process, and problems with handling long contexts, and we suggest possible solutions to these issues. Our work shows that creating legal inference datasets with limited resources is feasible and proposes further research in this area.
Data availability
The data is available on GitHub (https://github.com/onspark/LEAP_NLI_v2.0).
Both police manuals for creating investigation result reports and investigation review reports were internal documents. We pursued expert interviews and solicited detailed explanations in written form to understand these processes more accurately.
This research was supported and funded by the Korean National Police Agency [Project Name: AI-Based Crime Investigation Support System/ Project Number: PR10-02-000-21]. The authors also thank the Legal Informatics and Forensic Science (LIFS) institute at Hallym University and its researchers for their indispensable help in creating the data.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
About this article
Cite this article
Park, S., James, J.I. Lessons learned building a legal inference dataset. Artif Intell Law 32, 1011–1044 (2024). https://doi.org/10.1007/s10506-023-09370-x
DOI: https://doi.org/10.1007/s10506-023-09370-x