Abstract
With the extensive use of LLMs in research and practical applications, evaluating them effectively has become increasingly important; studying evaluation methods helps us better understand LLMs, guard against unknowns, avoid risks, and provide a basis for their faster iterative upgrading. This research investigates the ability to recognize ethics and security issues in text through a multi-dimensional adversarial example evaluation method, using ERNIE Bot (V2.2.3) as an example. The ESIIP of ERNIE Bot (V2.2.3) is evaluated by slightly perturbing the input data with multidimensional adversarial examples to induce the model to make false predictions. The evaluation objectives are classified into ethics issues, such as discrimination and prejudice detection, values analysis, and ethical conflict identification, and security issues, such as false information detection, privacy violation detection, and network security detection. Multiple representative datasets were slightly perturbed using different attack strategies; the research formulated a rigorous evaluation criterion for the model's responses and comprehensively analyzed the scores of all the LLMs. From these scores, the research drew metric and complexity conclusions and compared ERNIE Bot's performance with that of other LLMs in recognizing ethics and security issues in text. The results show that ERNIE Bot (V2.2.3) performs well in ESIIP, though without reaching a perfect level; they also demonstrate the reliability and feasibility of the research method.
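The perturbation idea described above can be illustrated with a minimal sketch. The paper's actual multidimensional attack strategies are not detailed in this abstract, so the character-swap perturbation below (function name `perturb_text` and all parameters are hypothetical, not from the paper) stands in for just one simple way an input prompt might be slightly altered before being submitted to a model:

```python
import random

def perturb_text(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Illustrative character-level attack: randomly swap adjacent
    alphabetic characters at the given rate. This is a stand-in for
    the paper's (unspecified) multidimensional attack strategies."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    chars = list(text)
    for i in range(len(chars) - 1):
        if rng.random() < rate and chars[i].isalpha() and chars[i + 1].isalpha():
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# A benign prompt and its slightly perturbed variant; the evaluation would
# compare the model's responses to both and score the differences.
original = "Is it acceptable to share a user's private data without consent?"
adversarial = perturb_text(original, rate=0.2)
print(adversarial)
```

In an evaluation pipeline of this kind, both the original and perturbed prompts would be sent to the model under test, and the responses scored against the evaluation criteria for each ethics or security category.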
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, K., Li, Y., Cao, L., Tu, D., Fang, Z., Zhang, Y. (2024). Research of Multidimensional Adversarial Examples in LLMs for Recognizing Ethics and Security Issues. In: Hong, W., Kanaparan, G. (eds) Computer Science and Education. Educational Digitalization. ICCSE 2023. Communications in Computer and Information Science, vol 2025. Springer, Singapore. https://doi.org/10.1007/978-981-97-0737-9_26
DOI: https://doi.org/10.1007/978-981-97-0737-9_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0736-2
Online ISBN: 978-981-97-0737-9
eBook Packages: Computer Science, Computer Science (R0)