Abstract
Football, one of the most popular sports in the world, has attracted significant attention from researchers exploring the potential of Artificial Intelligence (AI). In particular, Large Language Models (LLMs), exemplified by digital assistants such as ChatGPT, have proven their capabilities and offer a potentially effective avenue for football research. However, access to football data remains a challenge because the datasets collected by providers are often inaccessible. This case study presents a proof of concept that addresses this challenge by introducing an innovative web scraping approach that extracts football event data and makes it accessible, for example for scientific research with LLMs. To this end, the extracted data is structured into coherent sentences so that it is compatible with language-based models. The results show that LLMs can be successfully integrated with football event data, enabling information to be extracted through retrieval-augmented generation. This work makes a first contribution to the field by bridging the gap between football and LLMs and demonstrates the potential for further analysis.
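The pipeline summarized in the abstract (event records turned into sentences, then retrieved as context for an LLM prompt) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the event fields, the sentence template, the function names, and the bag-of-words cosine similarity, which stands in for a learned embedding model and a vector store. It is not the authors' implementation, and no LLM is called; the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical event records; the field names are illustrative assumptions,
# not the schema of the scraped data described in the paper.
EVENTS = [
    {"minute": 12, "player": "Player Alpha", "team": "Team X", "type": "goal"},
    {"minute": 34, "player": "Player Beta", "team": "Team Y", "type": "yellow card"},
    {"minute": 78, "player": "Player Gamma", "team": "Team X", "type": "substitution"},
]


def event_to_sentence(event):
    """Render one structured event as a plain English sentence."""
    return (
        f"In minute {event['minute']}, {event['player']} of {event['team']} "
        f"was involved in a {event['type']}."
    )


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, sentences, k=1):
    """Return the k sentences most similar to the question.

    Bag-of-words similarity is a simplification; a real system would
    typically use dense embeddings and a vector index instead.
    """
    query = Counter(tokenize(question))
    ranked = sorted(
        sentences,
        key=lambda s: cosine(query, Counter(tokenize(s))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_sentences):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(context_sentences)
    return (
        "Answer the question using only the context.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    sentences = [event_to_sentence(e) for e in EVENTS]
    question = "Who received a yellow card?"
    print(build_prompt(question, retrieve(question, sentences)))
```

The resulting prompt would then be passed to an LLM of choice; keeping the retrieved sentences in the prompt is what lets the model answer from the match data rather than from its training data alone.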
Notes
1. Code and prompts are available online: https://ml-and-vis.org/football-query/.
A Appendix
A.1 Overview of the Different Prompts, Their Difficulty and Results
(See Table 2).
A.2 Prompts and Answers of the Model
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Schilling, A. et al. (2024). Querying Football Matches for Event Data: Towards Using Large Language Models. In: Dong, J.S., Izadi, M., Hou, Z. (eds) Sports Analytics. ISACE 2024. Lecture Notes in Computer Science, vol 14794. Springer, Cham. https://doi.org/10.1007/978-3-031-69073-0_19
DOI: https://doi.org/10.1007/978-3-031-69073-0_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-69072-3
Online ISBN: 978-3-031-69073-0
eBook Packages: Computer Science (R0)