Abstract
With the extensive use of LLMs in research and practical applications, evaluating them effectively has become increasingly important; studying evaluation methods helps us better understand LLMs, guard against unknowns, avoid risks, and provide a basis for their faster iterative upgrading. This research investigates the ability to recognize ethics and security issues in text through a multi-dimensional adversarial example evaluation method, using ERNIE Bot (V2.2.3) as an example. The ESIIP of ERNIE Bot (V2.2.3) is evaluated by slightly perturbing the input data with multidimensional adversarial examples to induce the model to make false predictions. The evaluation objectives are classified into ethics issues, such as discrimination and prejudice detection, values analysis, and ethical conflict identification, and security issues, such as false information detection, privacy violation detection, and network security detection. Multiple representative datasets were slightly perturbed using different attack strategies; the research formulated a rigorous evaluation criterion for the model's responses and comprehensively analyzed the scores of all the LLMs. From these scores, the research drew metric and complexity conclusions and compared ERNIE Bot's performance with that of other LLMs in recognizing ethics and security issues in text. The results show that ERNIE Bot (V2.2.3) performs well in ESIIP, though without reaching a perfect level; they also demonstrate the reliability and feasibility of the research method.
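The perturbation idea described above can be illustrated with a minimal sketch. The paper's actual multidimensional attack strategies are not detailed in this abstract, so the character-swap perturbation below (function name `perturb_text` and all parameters are hypothetical, not from the paper) stands in for just one simple way an input prompt might be slightly altered before being submitted to a model:

```python
import random

def perturb_text(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Illustrative character-level attack: randomly swap adjacent
    alphabetic characters at the given rate. This is a stand-in for
    the paper's (unspecified) multidimensional attack strategies."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    chars = list(text)
    for i in range(len(chars) - 1):
        if rng.random() < rate and chars[i].isalpha() and chars[i + 1].isalpha():
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# A benign prompt and its slightly perturbed variant; the evaluation would
# compare the model's responses to both and score the differences.
original = "Is it acceptable to share a user's private data without consent?"
adversarial = perturb_text(original, rate=0.2)
print(adversarial)
```

In an evaluation pipeline of this kind, both the original and perturbed prompts would be sent to the model under test, and the responses scored against the evaluation criteria for each ethics or security category.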
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, K., Li, Y., Cao, L., Tu, D., Fang, Z., Zhang, Y. (2024). Research of Multidimensional Adversarial Examples in LLMs for Recognizing Ethics and Security Issues. In: Hong, W., Kanaparan, G. (eds) Computer Science and Education. Educational Digitalization. ICCSE 2023. Communications in Computer and Information Science, vol 2025. Springer, Singapore. https://doi.org/10.1007/978-981-97-0737-9_26
DOI: https://doi.org/10.1007/978-981-97-0737-9_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0736-2
Online ISBN: 978-981-97-0737-9
eBook Packages: Computer Science, Computer Science (R0)