Abstract
This paper presents the CLAMS (Computational Linguistics Applications for Multimedia Services) platform, which provides a framework for developing and deploying interoperable multimedia analysis tools [7]. CLAMS facilitates the processing of audiovisual content such as broadcast news videos by enabling seamless integration of tools across different media types, including text, audio, video, and images. At the core of CLAMS is the Multi-Media Interchange Format (MMIF), a JSON-based annotation format designed to support the exchange of data between different tools in a consistent and structured manner. This ensures that annotations produced by one tool can be readily used by others, which makes it possible to build complex pipelines for automated content analysis. We describe the features provided by the CLAMS software development kit (SDK), present two example pipelines of CLAMS applications, and show a visualization tool for exploring data generated using CLAMS.
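The idea of a shared JSON document that each tool reads and extends can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names (`documents`, `views`, `annotations`, `target`), the two tools (`segment_tool`, `caption_tool`), and the file name `news.mp4` are hypothetical and are not taken from the MMIF specification or the CLAMS SDK. The sketch only shows the general pattern the abstract describes, in which a later tool consumes annotations that an earlier tool wrote.

```python
import json

# NOTE: a simplified, hypothetical layout inspired by the abstract's description.
# Field names are illustrative assumptions, not the MMIF specification.


def new_document(video_path):
    """Create an empty interchange document for one source video."""
    return {
        "documents": [{"id": "d1", "type": "video", "location": video_path}],
        "views": [],
    }


def add_view(doc, app_name, annotations):
    """Append a view holding one tool's annotations and return the view id."""
    view_id = f"v{len(doc['views']) + 1}"
    doc["views"].append({"id": view_id, "app": app_name, "annotations": annotations})
    return view_id


def segment_tool(doc):
    """Hypothetical first tool: emits fixed time segments (a stand-in for shot detection)."""
    segments = [
        {"id": "s1", "type": "TimeSegment", "start": 0, "end": 5000},
        {"id": "s2", "type": "TimeSegment", "start": 5000, "end": 12000},
    ]
    add_view(doc, "segment_tool", segments)
    return doc


def caption_tool(doc):
    """Hypothetical second tool: labels every segment found in earlier views."""
    captions = []
    for view in doc["views"]:
        for ann in view["annotations"]:
            if ann["type"] == "TimeSegment":
                captions.append({
                    "id": f"c{len(captions) + 1}",
                    "type": "TextLabel",
                    # Point back at the annotation this label describes.
                    "target": f"{view['id']}:{ann['id']}",
                    "text": f"segment of {ann['end'] - ann['start']} ms",
                })
    add_view(doc, "caption_tool", captions)
    return doc


def run_pipeline(video_path):
    """Chain the two tools, serializing to JSON between them."""
    doc = segment_tool(new_document(video_path))
    # Round-trip through JSON text to mimic tools exchanging serialized data.
    doc = json.loads(json.dumps(doc))
    return caption_tool(doc)


if __name__ == "__main__":
    print(json.dumps(run_pipeline("news.mp4"), indent=2))
```

Because each tool only appends a new view and refers to earlier annotations by identifier, tools can be reordered or swapped without changing the others, which is the interoperability property the abstract attributes to a common interchange format.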
References
Bautista, D., Atienza, R.: Scene text recognition with permuted autoregressive sequence models. In: European Conference on Computer Vision, pp. 178–196. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19815-1_11
Daiber, J., Jakob, M., Hokamp, C., Mendes, P.N.: Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th International Conference on Semantic Systems (I-Semantics) (2013)
Honnibal, M., Montani, I.: spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017)
Liu, H., Li, C., Li, Y., Lee, Y.J.: Improved baselines with visual instruction tuning. arXiv preprint arXiv:2310.03744 (2024)
Mindee: docTR: document text recognition (2021). https://github.com/mindee/doctr
Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I.: Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356 (2022)
Rim, K., Lynch, K., Pustejovsky, J.: Computational linguistics applications for multimedia services. In: Alex, B., Degaetano-Ortlieb, S., Kazantseva, A., Reiter, N., Szpakowicz, S. (eds.) Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pp. 91–97. Association for Computational Linguistics, Minneapolis, USA (2019). https://doi.org/10.18653/v1/W19-2512
Rim, K., Lynch, K., Verhagen, M., Ide, N., Pustejovsky, J.: Interchange formats for visualization: LIF and MMIF. In: Calzolari, N., et al. (eds.) Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 7230–7237. European Language Resources Association, Marseille, France (2020). https://aclanthology.org/2020.lrec-1.893
Smith, R.: An overview of the tesseract OCR engine. In: ICDAR ’07: Proceedings of the Ninth International Conference on Document Analysis and Recognition, pp. 629–633. IEEE Computer Society, Washington, DC, USA (2007). https://storage.googleapis.com/pub-tools-public-publication-data/pdf/33418.pdf
Souček, T., Lokoč, J.: TransNet V2: an effective deep network architecture for fast shot transition detection. arXiv preprint arXiv:2008.04838 (2020)
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lynch, K., Rim, K., King, O., Pustejovsky, J. (2025). Multimodal Interoperability with the CLAMS Platform. In: Ide, I., et al. MultiMedia Modeling. MMM 2025. Lecture Notes in Computer Science, vol 15524. Springer, Singapore. https://doi.org/10.1007/978-981-96-2074-6_19
DOI: https://doi.org/10.1007/978-981-96-2074-6_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-2073-9
Online ISBN: 978-981-96-2074-6
eBook Packages: Computer Science, Computer Science (R0)