Abstract
Images and structured tables are essential parts of real-world databases. Although tabular-image representation learning holds promise for creating new insights, it remains a challenging task, as tabular data is typically heterogeneous and incomplete, and presents significant modality disparities with images. Earlier works have mainly focused on simple modality fusion strategies in complete-data scenarios, without considering the missing-data issue, and are therefore limited in practice. In this paper, we propose TIP, a novel tabular-image pre-training framework for learning multimodal representations that are robust to incomplete tabular data. Specifically, TIP introduces a self-supervised learning (SSL) strategy that combines a masked tabular reconstruction task for tackling data missingness with image-tabular matching and contrastive learning objectives for capturing multimodal information. Moreover, TIP features a versatile tabular encoder tailored for incomplete, heterogeneous tabular data, as well as a multimodal interaction module for inter-modality representation learning. Experiments are performed on downstream multimodal classification tasks using both natural and medical image datasets. The results show that TIP outperforms state-of-the-art supervised and self-supervised image and multimodal methods in both complete and incomplete data scenarios. Our code is available at https://github.com/siyi-wind/TIP.
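To make the three pre-training objectives named in the abstract concrete, below is a minimal loss-level sketch assuming a standard PyTorch setup. The function names, tensor shapes, and the continuous-only reconstruction term are our illustrative assumptions, not the authors' implementation; in particular, the paper's tabular encoder also handles categorical columns and missing values, which this sketch omits. The authors' actual code is available at the repository linked above.

# Minimal PyTorch sketch of the three SSL objectives named in the abstract.
# All names and shapes here are illustrative assumptions, not the authors'
# implementation; see https://github.com/siyi-wind/TIP for the real code.
import torch
import torch.nn.functional as F

def image_tabular_contrastive_loss(img_emb, tab_emb, temperature=0.07):
    # InfoNCE over the batch: each image should match its own table row.
    img_emb = F.normalize(img_emb, dim=-1)
    tab_emb = F.normalize(tab_emb, dim=-1)
    logits = img_emb @ tab_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy over rows (image->table) and columns (table->image).
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def masked_tabular_reconstruction_loss(pred, target, mask):
    # Reconstruct only masked (or missing) entries; mask is 1 where masked.
    # Simplification: treats every feature as continuous, whereas the paper's
    # encoder is designed for heterogeneous (categorical + continuous) columns.
    per_elem = F.mse_loss(pred, target, reduction='none')
    return (per_elem * mask).sum() / mask.sum().clamp(min=1)

def image_tabular_matching_loss(match_logits, is_match):
    # Binary classification: does this (image, table) pair belong together?
    return F.binary_cross_entropy_with_logits(match_logits, is_match.float())

if __name__ == "__main__":
    B, D, n_feat = 32, 128, 20                # batch, embedding dim, #features
    img_emb, tab_emb = torch.randn(B, D), torch.randn(B, D)
    pred, target = torch.randn(B, n_feat), torch.randn(B, n_feat)
    mask = (torch.rand(B, n_feat) < 0.3).float()
    is_match = torch.randint(0, 2, (B,))
    total = (image_tabular_contrastive_loss(img_emb, tab_emb)
             + masked_tabular_reconstruction_loss(pred, target, mask)
             + image_tabular_matching_loss(torch.randn(B), is_match))
    print(f"combined pre-training loss: {total.item():.3f}")

In the actual framework these losses would be computed on the outputs of the tabular encoder and the multimodal interaction module; the sketch only captures the loss-level structure of the SSL strategy.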
Acknowledgements
This research has been conducted using the UK Biobank Resource under Application Number 40616. The MR images presented in the figures are reproduced with the kind permission of UK Biobank ©. We also thank Paul Hager from the Lab for AI in Medicine at the Technical University of Munich for providing the pre-processing code for the UKBB dataset. DO’R is supported by the Medical Research Council (MC_UP_1605/13); National Institute for Health Research (NIHR) Imperial College Biomedical Research Centre; and the British Heart Foundation (RG/19/6/34387, RE/24/130023, CH/P/23/80008).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Du, S., Zheng, S., Wang, Y., Bai, W., O’Regan, D.P., Qin, C. (2025). TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15073. Springer, Cham. https://doi.org/10.1007/978-3-031-72633-0_27
DOI: https://doi.org/10.1007/978-3-031-72633-0_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72632-3
Online ISBN: 978-3-031-72633-0
eBook Packages: Computer Science; Computer Science (R0)