Statistics > Machine Learning

arXiv:2410.10082 (stat)

[Submitted on 14 Oct 2024]

Title:fastHDMI: Fast Mutual Information Estimation for High-Dimensional Data

Authors:Kai Yang, Masoud Asgharian, Nikhil Bhagwat, Jean-Baptiste Poline, Celia M.T. Greenwood

View PDF

Abstract:In this paper, we introduce fastHDMI, a Python package designed for efficient variable screening in high-dimensional datasets, particularly neuroimaging data. This work pioneers the application of three mutual information estimation methods for neuroimaging variable selection, a novel approach implemented via fastHDMI. These advancements enhance our ability to analyze the complex structures of neuroimaging datasets, providing improved tools for variable selection in high-dimensional spaces.
Using the preprocessed ABIDE dataset, we evaluate the performance of these methods through extensive simulations. The tests cover a range of conditions, including linear and nonlinear associations, as well as continuous and binary outcomes. Our results highlight the superiority of the FFTKDE-based mutual information estimation for feature screening in continuous nonlinear outcomes, while binning-based methods outperform others for binary outcomes with nonlinear probability preimages. For linear simulations, both Pearson correlation and FFTKDE-based methods show comparable performance for continuous outcomes, while Pearson excels in binary outcomes with linear probability preimages.
A comprehensive case study using the ABIDE dataset further demonstrates fastHDMI's practical utility, showcasing the predictive power of models built from variables selected using our screening techniques. This research affirms the computational efficiency and methodological strength of fastHDMI, significantly enriching the toolkit available for neuroimaging analysis.

Comments:	31 pages, 5 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO)
Cite as:	arXiv:2410.10082 [stat.ML]
	(or arXiv:2410.10082v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2410.10082

Submission history

From: Kai Yang [view email]
[v1] Mon, 14 Oct 2024 01:49:53 UTC (3,824 KB)

Statistics > Machine Learning

Title:fastHDMI: Fast Mutual Information Estimation for High-Dimensional Data

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Statistics > Machine Learning

Title:fastHDMI: Fast Mutual Information Estimation for High-Dimensional Data

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators