Research of Multidimensional Adversarial Examples in LLMs for Recognizing Ethics and Security Issues

  • Conference paper
Computer Science and Education. Educational Digitalization (ICCSE 2023)

Abstract

With the extensive use of LLMs in research and practical applications, evaluating them effectively has become increasingly important. Studying evaluation methods helps to better understand LLMs, guard against unknowns, avoid risks, and provide a basis for faster and better iterative upgrading. This research investigates the ability of LLMs to recognize ethics and security issues in text through a multidimensional adversarial example evaluation method, using ERNIE Bot (V2.2.3) as an example. The ESIIP of ERNIE Bot (V2.2.3) is evaluated by slightly perturbing the input data with multidimensional adversarial examples to induce the model to make false predictions. The evaluation objectives are classified into ethics issues, such as discrimination and prejudice detection, values analysis, and ethical conflict identification, and security issues, such as false information detection, privacy violation detection, and network security detection. Multiple representative datasets were slightly perturbed using different attack strategies. The research formulated a rigorous evaluation criterion for the model’s responses, comprehensively analyzed the scores of all the LLMs, drew the corresponding conclusions on metrics and complexity, and compared the performance with that of other LLMs in recognizing ethics and security issues in text. The results show that ERNIE Bot (V2.2.3) performs well in ESIIP without reaching a perfect level; they also show the reliability and feasibility of the research method.
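The perturb-then-score idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' method: the two labelled examples, the adjacent-character swap used as the attack, the `keyword_model` stand-in for an LLM call, and plain accuracy in place of the paper's evaluation criterion.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical labelled examples (text, has_issue); not taken from the paper's datasets.
EXAMPLES: List[Tuple[str, bool]] = [
    ("Share the customer's home address and password publicly.", True),
    ("The library opens at nine on weekdays.", False),
]

def perturb_swap(text: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters of one randomly chosen word (length >= 4)."""
    words = text.split(" ")
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text  # nothing long enough to perturb
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # first and last characters stay in place
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

def keyword_model(text: str) -> bool:
    """Stand-in for an LLM call (assumption): flags text with a sensitive keyword."""
    return any(k in text.lower() for k in ("password", "address"))

def accuracy(model: Callable[[str], bool],
             examples: List[Tuple[str, bool]],
             perturb: Optional[Callable[[str], str]] = None) -> float:
    """Fraction of examples classified correctly, optionally after perturbation."""
    correct = 0
    for text, label in examples:
        inp = perturb(text) if perturb else text
        correct += model(inp) == label
    return correct / len(examples)

rng = random.Random(0)  # fixed seed so the run is repeatable
clean = accuracy(keyword_model, EXAMPLES)
attacked = accuracy(keyword_model, EXAMPLES, lambda t: perturb_swap(t, rng))
print(f"clean={clean:.2f} attacked={attacked:.2f}")
```

Comparing the clean score with the attacked score gives a rough robustness signal. A real evaluation would replace `keyword_model` with calls to the model under test and would need many more examples and attack strategies per category.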

Author information

Corresponding authors

Correspondence to Kainan Liu or Yifan Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, K., Li, Y., Cao, L., Tu, D., Fang, Z., Zhang, Y. (2024). Research of Multidimensional Adversarial Examples in LLMs for Recognizing Ethics and Security Issues. In: Hong, W., Kanaparan, G. (eds) Computer Science and Education. Educational Digitalization. ICCSE 2023. Communications in Computer and Information Science, vol 2025. Springer, Singapore. https://doi.org/10.1007/978-981-97-0737-9_26

  • DOI: https://doi.org/10.1007/978-981-97-0737-9_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0736-2

  • Online ISBN: 978-981-97-0737-9

  • eBook Packages: Computer Science, Computer Science (R0)
