Querying Football Matches for Event Data: Towards Using Large Language Models

  • Conference paper
Sports Analytics (ISACE 2024)

Abstract

Football, one of the most popular sports in the world, has attracted significant attention from researchers exploring the potential of Artificial Intelligence (AI). In particular, Large Language Models (LLMs), exemplified by digital assistants such as ChatGPT, have demonstrated their capabilities and offer a potentially effective avenue for football research. However, football data remains difficult to access, as the datasets collected by data providers are often not publicly available. This case study presents a proof-of-concept that addresses this challenge by introducing an innovative web scraping approach that extracts football event data and makes it accessible, e.g., for scientific research with LLMs. To this end, the extracted data is structured into coherent sentences for linguistic compatibility. The results show that LLMs can be successfully integrated with football event data, enabling information to be extracted through retrieval-augmented generation. This work makes a first contribution to the field by bridging the gap between football and LLMs and demonstrates the potential for further analysis.
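The abstract names two preprocessing steps: scraping event data from a web page and rendering each event as a sentence an LLM can consume. The sketch below illustrates both under loud assumptions. It presumes the page embeds its data as a hex-escaped `JSON.parse('...')` literal inside a `<script>` tag; the variable name `shotsData`, the field names (`minute`, `player`, `team`, `result`, `xG`), and the outcome codes are hypothetical placeholders, not the authors' scraper or any provider's documented schema. No network request is made; the usage section builds a synthetic page.

```python
import json
import re
from typing import Any

# ASSUMPTION (hypothetical page layout): event data is embedded as
#   var shotsData = JSON.parse('\x5B\x7B...');
# i.e. a JSON string whose characters are written as \xNN hex escapes.
EMBEDDED = re.compile(r"var\s+(\w+)\s*=\s*JSON\.parse\('([^']*)'\)")
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def extract_embedded_json(html: str) -> dict[str, Any]:
    """Return {variable_name: parsed_object} for every JSON.parse('...') literal."""
    found: dict[str, Any] = {}
    for name, escaped in EMBEDDED.findall(html):
        # Replace each \xNN escape with its character, then parse the JSON text.
        text = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
        found[name] = json.loads(text)
    return found


# Hypothetical outcome codes mapped to readable verb phrases.
OUTCOMES = {
    "Goal": "scored",
    "SavedShot": "had a shot saved",
    "MissedShot": "missed a shot",
    "BlockedShot": "had a shot blocked",
}


def to_sentence(event: dict[str, Any]) -> str:
    """Render one shot event (hypothetical field names) as an English sentence."""
    result = event["result"]
    verb = OUTCOMES.get(result, f"took a shot ({result})")
    return (
        f"In minute {event['minute']}, {event['player']} ({event['team']}) "
        f"{verb} with an xG of {float(event['xG']):.2f}."
    )


if __name__ == "__main__":
    # Build a synthetic page instead of fetching a real one.
    sample = [{"minute": "12", "player": "Player A", "team": "Team X",
               "result": "Goal", "xG": "0.31"}]
    escaped = "".join(f"\\x{ord(c):02X}" for c in json.dumps(sample))
    page = f"<script>var shotsData = JSON.parse('{escaped}');</script>"
    for event in extract_embedded_json(page)["shotsData"]:
        print(to_sentence(event))  # In minute 12, Player A (Team X) scored with an xG of 0.31.
```

A fixed sentence template keeps the generated text uniform, which helps if the sentences are later embedded and retrieved by similarity; the wording used in the paper may differ.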

Notes

  1. Code and prompts are available online: https://ml-and-vis.org/football-query/.

  2. https://understat.com/.

Author information

Correspondence to Marco Klaiber.

A Appendix

A.1 Overview of the Different Prompts, Their Difficulty and Results

(See Table 2).

Table 2. All RetrievalQA Chain questions with reference to their answer figures
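The questions in Table 2 are answered with a RetrievalQA chain (from LangChain), i.e., retrieval-augmented generation: the sentences most relevant to a question are retrieved and passed to the LLM together with the question. The sketch below shows that retrieve-then-"stuff" step in isolation. It swaps in bag-of-words cosine similarity for learned embeddings and a vector index, uses a hypothetical prompt template, and omits the LLM call itself; it is not the authors' configuration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (ties keep input order)."""
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str], k: int = 2) -> str:
    """'Stuff' the top-k sentences into one prompt (hypothetical template)."""
    context = "\n".join(retrieve(question, docs, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "In minute 12, Player A (Team X) scored with an xG of 0.31.",
        "In minute 57, Player B (Team Y) had a shot saved with an xG of 0.08.",
        "Team X won the match 2-1.",
    ]
    # The resulting string would be sent to an LLM; that call is omitted here.
    print(build_prompt("Who scored in minute 12?", docs, k=1))
```

Limiting the context to the top-k sentences keeps the prompt short; questions that require aggregating many events (such as the harder questions in the appendix) may need a larger k or a different retrieval strategy.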

A.2 Prompts and Answers of the Model

Fig. 2. Question 1 with RetrievalQA chain answer (Difficulty: easy)

Fig. 3. Question 2 with RetrievalQA chain answer (Difficulty: easy)

Fig. 4. Question 3 with RetrievalQA chain answer (Difficulty: easy)

Fig. 5. Question 4 with RetrievalQA chain answer (Difficulty: easy)

Fig. 6. Question 5 with RetrievalQA chain answer (Difficulty: hard)

Fig. 7. Question 6 with RetrievalQA chain answer (Difficulty: hard)

Fig. 8. Question 7 with RetrievalQA chain answer (Difficulty: hard)

Fig. 9. Question 8 with RetrievalQA chain answer (Difficulty: hard)

Fig. 10. Question 9 with RetrievalQA chain answer (Difficulty: hard)

Fig. 11. Question 10 with RetrievalQA chain answer (Difficulty: hard)

Fig. 12. Question 11 with RetrievalQA chain answer (Difficulty: hard)

Fig. 13. Question 12 with RetrievalQA chain answer (Difficulty: hard)

Fig. 14. Question 13 with RetrievalQA chain answer (Difficulty: hard)

Fig. 15. Question 14 with RetrievalQA chain answer (Difficulty: hard)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Schilling, A. et al. (2024). Querying Football Matches for Event Data: Towards Using Large Language Models. In: Dong, J.S., Izadi, M., Hou, Z. (eds) Sports Analytics. ISACE 2024. Lecture Notes in Computer Science, vol 14794. Springer, Cham. https://doi.org/10.1007/978-3-031-69073-0_19

  • DOI: https://doi.org/10.1007/978-3-031-69073-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69072-3

  • Online ISBN: 978-3-031-69073-0

  • eBook Packages: Computer Science, Computer Science (R0)
