BMI: Bounded Mutual Information for Efficient Privacy-Preserving Feature Selection

David Eklund¹¹,
Alfonso Iacovazzi¹¹,
Han Wang¹¹,
Apostolos Pyrgelis¹¹ &
…
Shahid Raza^11,12

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14983))

Included in the following conference series:

European Symposium on Research in Computer Security

463 Accesses

Abstract

We introduce low complexity bounds on mutual information for efficient privacy-preserving feature selection with secure multi-party computation (MPC). Considering a discrete feature with N possible values and a discrete label with M possible values, our approach requires O(N) multiplications as opposed to O(NM) in a direct MPC implementation of mutual information. Our experimental results show that for regression tasks, we achieve a computation speed up of over 1,000\(\times \) compared to a straightforward MPC implementation of mutual information, while achieving similar accuracy for the downstream machine learning model.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: GBP 19.95; Price includes VAT (United Kingdom)

eBook: GBP 49.99; Price includes VAT (United Kingdom)

Softcover Book: GBP 64.99; Price includes VAT (United Kingdom)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Efficient Unconditionally Secure Comparison and Privacy Preserving Machine Learning Classification Protocols

PD-ML-Lite: Private Distributed Machine Learning from Lightweight Cryptography

Balanced Privacy Budget Allocation for Privacy-Preserving Machine Learning

Notes

References

Akavia, A., Galili, B., Shaul, H., Weiss, M., Yakhini, Z.: Privacy preserving feature selection for sparse linear regression. Cryptology ePrint Archive, Paper 2023/1354 (2023). https://eprint.iacr.org/2023/1354
Alishahi, M., Moghtadaiee, V.: Feature selection on anonymized datasets. In: IEEE International Conference on Dependable, Autonomic and Secure Computing (DASC) (2023)
Google Scholar
Alishahi, M., Moghtadaiee, V., Navidan, H.: Add noise to remove noise: local differential privacy for feature selection. Comput. Secur. (2022)
Google Scholar
Aono, Y., Hayashi, T., Wang, L., Moriai, S., et al.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. (2017)
Google Scholar
Araki, T., Furukawa, J., Lindell, Y., Nof, A., Ohara, K.: High-throughput semi-honest secure three-party computation with an honest majority. In: Proceedings of the 2016 ACM Conference on Computer and Communications Security (2016)
Google Scholar
Banerjee, M., Chakravarty, S.: Privacy preserving feature selection for distributed data using virtual dimension. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management (2011)
Google Scholar
Banerjee, S., Elmroth, E., Bhuyan, M.: Fed-FiS: a novel information-theoretic federated feature selection for learning stability. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) ICONIP 2021. CCIS, vol. 1516, pp. 480–487. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-92307-5_56
Chapter Google Scholar
Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. (1994)
Google Scholar
Bogdanov, D., Laur, S., Willemson, J.: Sharemind: a framework for fast privacy-preserving computations. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 192–206. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88313-5_13
Chapter Google Scholar
Cassara, P., Gotta, A., Valerio, L.: Federated feature selection for cyber-physical systems of systems. IEEE Trans. Veh. Technol. (2022)
Google Scholar
Catrina, O., de Hoogh, S.: Improved primitives for secure multiparty integer computation. In: Garay, J.A., De Prisco, R. (eds.) SCN 2010. LNCS, vol. 6280, pp. 182–199. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15317-4_13
Chapter Google Scholar
Catrina, O., Saxena, A.: Secure computation with fixed-point numbers. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 35–50. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14577-3_6
Chapter Google Scholar
Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. (2014)
Google Scholar
Covert, I.C., Qiu, W., Lu, M., Kim, N.Y., White, N.J., Lee, S.-I.: Learning to maximize mutual information for dynamic feature selection. In: International Conference on Machine Learning. PMLR (2023)
Google Scholar
Cramer, R., Damgård, I.B., et al.: Secure Multiparty Computation. Cambridge University Press (2015)
Google Scholar
Dankar, F.K., Madathil, N., Dankar, S.K., Boughorbel, S.: Privacy-preserving analysis of distributed biomedical data: designing efficient and secure multiparty computations using distributed statistical learning theory. JMIR Med. Inf. (2019)
Google Scholar
Evans, D., Kolesnikov, V., Rosulek, M., et al.: A pragmatic introduction to secure multi-party computation. Found. Trends® Priv. Secur. (2018)
Google Scholar
Froelicher, D., et al.: Scalable and privacy-preserving federated principal component analysis. In: IEEE Symposium on Security and Privacy (SP) (2023)
Google Scholar
Froelicher, D., et al.: Scalable privacy-preserving distributed learning. In: Proceedings on Privacy Enhancing Technologies (2021)
Google Scholar
Fu, R., Wu, Y., Xu, Q., Zhang, M.: FEAST: a communication-efficient federated feature selection framework for relational data. Proc. ACM Manag. Data (2023)
Google Scholar
Gascón, A.: Privacy-preserving distributed linear regression on high-dimensional data. Proc. Priv. Enhancing Technol. (2017)
Google Scholar
Goldreich, O., Micali, S., Wigderson, A.: How to play any mental game, or a completeness theorem for protocols with honest majority, pp. 307–328. Association for Computing Machinery (2019)
Google Scholar
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. (2003)
Google Scholar
Hu, Y., Zhang, Y., Gong, D., Sun, X.: Multi-participant federated feature selection algorithm with particle swarm optimizaiton for imbalanced data under privacy protection. IEEE Trans. Artif. Intell. (2022)
Google Scholar
Jafer, Y., Matwin, S., Sokolova, M.: A framework for a privacy-aware feature selection evaluation measure. In: 13th Annual Conference on Privacy, Security and Trust (PST). IEEE (2015)
Google Scholar
Kamm, L., Bogdanov, D., Laur, S., Vilo, J.: A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics (2013)
Google Scholar
Karhu, T.: Desaturation event scoring criteria affect the perceived severity of nocturnal hypoxic load. Sleep Med. 100, 479–486 (2022)
Google Scholar
Karhu, T., Leppänen, T., Töyräs, J., Oksenberg, A., Myllymaa, S., Nikkonen, S.: ABOSA - freely available automatic blood oxygen saturation signal analysis software: structure and validation. Comput. Methods Programs Biomed. (2022)
Google Scholar
Karhu, T., Myllymaa, S., Nikkonen, S., Mazzotti, D.R., Töyräs, J., Leppänen, T.: Longer and deeper desaturations are associated with the worsening of mild sleep apnea: the sleep heart health study. Front. Neurosci. 15, 657126 (2021)
Article Google Scholar
Keller, M.: MP-SPDZ: a versatile framework for multi-party computation. In: Proceedings of the ACM Conference on Computer and Communications Security (2020)
Google Scholar
Klosh, G., et al.: The siesta project polygraphic and clinical database. IEEE Eng. Med. Biol. Mag. (2001)
Google Scholar
Knott, B., Venkataraman, S., Hannun, A., Sengupta, S., Ibrahim, M., van der Maaten, L.: CrypTen: secure multi-party computation meets machine learning. In: Advances in Neural Information Processing Systems (2021)
Google Scholar
Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. (1997)
Google Scholar
Küsters, R., Liedtke, J., Müller, J., Rausch, D., Vogt, A.: Ordinos: a verifiable tally-hiding e-voting system. In: IEEE European Symposium on Security and Privacy (EuroS &P) (2020)
Google Scholar
Langley, P., et al.: Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall Symposium on Relevance (1994)
Google Scholar
Law, M.H., Figueiredo, M.A., Jain, A.K.: Simultaneous feature selection and clustering using mixture models. IEEE Trans. Pattern Anal. Mach. Intell. (2004)
Google Scholar
Li, A., et al.: Efficient and privacy-preserving feature importance-based vertical federated learning. IEEE Trans. Mob. Comput. (2023)
Google Scholar
Li, J., et al.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) (2017)
Google Scholar
Li, X., Dowsley, R., De Cock, M.: Privacy-preserving feature selection with secure multiparty computation. In: International Conference on Machine Learning. PMLR (2021)
Google Scholar
Majithia, J., Levan, D.: A note on base-2 logarithm computations. Proc. IEEE 61(10), 1519–1520 (1973)
Article Google Scholar
Mitra, P., Murthy, C., Pal, S.K.: Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Anal. Mach. Intell. (2002)
Google Scholar
Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving ridge regression on hundreds of millions of records. In: IEEE Symposium on Security and Privacy (SP) (2013)
Google Scholar
Ono, S., Takata, J., Kataoka, M., Shin, T.I.K., Sakamoto, H.: Privacy-preserving feature selection with fully homomorphic encryption. Algorithms (2022)
Google Scholar
Polychroniadou, A., et al.: Prime match: a privacy-preserving inventory matching system. In: 32nd USENIX Security Symposium (2023)
Google Scholar
Qin, Y., Kondo, M.: Federated learning-based network intrusion detection with a feature selection approach. In: IEEE International Conference on Electrical, Communication, and Computer Engineering (2021)
Google Scholar
Raab, R.: Federated electronic health records for the European health data space. Lancet Digit. Health (2023)
Google Scholar
Rao, V., Long, Y., Eldardiry, H., Rane, S., Rossi, R., Torres, F.: Secure two-party feature selection. arXiv preprint arXiv:1901.00832 (2019)
Romero, E., Sopena, J.M.: Performing feature selection with multilayer perceptrons. IEEE Trans. Neural Netw. (2008)
Google Scholar
Russell Graves-Morris, P., Saff, E.B., Varga, R.S.: Rational Approximation and Interpolation. Lecture Notes in Mathematics. Springer, Heidelberg (1983). https://doi.org/10.1007/BFb0072395
Sangers, A., et al.: Secure multiparty pagerank algorithm for collaborative fraud detection. In: Goldberg, I., Moore, T. (eds.) FC 2019. LNCS, vol. 11598, pp. 605–623. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32101-7_35
Chapter Google Scholar
Sav, S., Diaa, A., Pyrgelis, A., Bossuat, J.-P., Hubaux, J.-P.: Privacy-preserving federated recurrent neural networks. In: Proceedings on Privacy Enhancing Technologies (2023)
Google Scholar
Sav, S., et al.: POSEIDON: privacy-preserving federated neural network learning. In: 28th Network And Distributed System Security Symposium (NDSS). Internet Society (2021)
Google Scholar
Sheikhalishahi, M., Martinelli, F.: Privacy-utility feature selection as a privacy mechanism in collaborative data classification. In: IEEE International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (2017)
Google Scholar
Stracuzzi, D.J., Utgoff, P.E.: Randomized variable elimination. J. Mach. Learn. Res. (2004)
Google Scholar
Volder, J.: The CORDIC computing technique. In: International Workshop on Managing Requirements Knowledge. IEEE Computer Society (1959)
Google Scholar
Wagh, S., Gupta, D., Chandran, N.: SecureNN: 3-party secure computation for neural network training. In: Proceedings on Privacy Enhancing Technologies (2019)
Google Scholar
Yang, J., Honavar, V.: Feature subset selection using a genetic algorithm. IEEE Intell. Syst. Their Appl. (1998)
Google Scholar
Yao, A.C.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science. IEEE (1982)
Google Scholar
Yao, X., Huang, T., Wu, C., Zhang, R., Sun, L.: Towards faster and better federated learning: a feature fusion approach. In: IEEE International Conference on Image Processing (ICIP) (2019)
Google Scholar
Zhang, Q., Wang, C., Wu, H., Xin, C., Phuong, T.V.: GELU-Net: a globally encrypted, locally unencrypted deep neural network for privacy-preserved learning. In: International Joint Conferences on Artificial Intelligence (2018)
Google Scholar
Zhang, R., Li, H., Hao, M., Chen, H., Zhang, Y.: Secure feature selection for vertical federated learning in eHealth systems. In: IEEE International Conference on Communications (2022)
Google Scholar
Zhang, X., Mavromatics, A., Vafeas, A., Nejabati, R., Simeonidou, D.: Federated feature selection for horizontal federated learning in IoT networks. IEEE Internet Things (2023)
Google Scholar

Download references

Acknowledgments

This research is funded by the EU Horizon Europe project HARPOCRATES (Grant ID. 101069535) and H2020 project CONCORDIA (Grant ID. 830927). We thank Tuomas Karhu for preparing the SpO2 data as well as help and advice in the process. We would also like to thank the anonymous reviewers for their comments and suggested improvements.

Author information

Authors and Affiliations

RISE Research Institutes of Sweden, Box 1263, 16429, Kista, Sweden
David Eklund, Alfonso Iacovazzi, Han Wang, Apostolos Pyrgelis & Shahid Raza
Mälardalen University, Västerås, Sweden
Shahid Raza

Authors

David Eklund
View author publications
You can also search for this author in PubMed Google Scholar
Alfonso Iacovazzi
View author publications
You can also search for this author in PubMed Google Scholar
Han Wang
View author publications
You can also search for this author in PubMed Google Scholar
Apostolos Pyrgelis
View author publications
You can also search for this author in PubMed Google Scholar
Shahid Raza
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to David Eklund .

Editor information

Editors and Affiliations

Institut Polytechnique de Paris, Palaiseau, France
Joaquin Garcia-Alfaro
Bydgoszcz University of Science and Technology, Bydgoszcz, Poland
Rafał Kozik
Bydgoszcz University of Science and Technology, Bydgoszcz, Poland
Michał Choraś
Norwegian University of Science and Technology - NTNU, Gjøvik, Norway
Sokratis Katsikas

Appendices

A Appendix: Cutoff Probability

With x, \(x_1,...,x_k\) and \(\lambda _1,\dots ,\lambda _k\) as in Sect. 5.1, we shall see that we can modify the bound \(H(x) \le \sum _j \lambda _j C(x,x_j)\) by cutting off probabilities in \(C(x,x_j)\) at a given level \(0 < \epsilon \le \min _j(e^{-1/\lambda _j})\). More in detail, \(H(x) \le \sum _j \lambda _j \bar{C}(x,x_j)\) where \(\bar{C}(x,x_j)=\sum _i - \bar{p}_i(x) \log {\bar{p}_i(x_j)}\), \(\bar{p}_i(x_j)=\max (p_i(x_j),\epsilon )\) and \(\bar{p}_i(x)=\sum _j \lambda _j \bar{p}_i(x_j)\).

Note that \(f(z)=-z\log {z}\), for \(0 < z \le 1\), increases up to \(z=e^{-1}\) where it has its maximum \(f(e^{-1})=e^{-1}\) and decreases after that. Let \(g(z_1,\dots ,z_k)=(\sum _j -\lambda _j z_j)\sum _j \lambda _j \log {z_j}\) for \(0 < z_1,\dots ,z_k \le 1\). Then, \(H(x)=\sum _i f(p_i(x))\) and \(\sum _j \lambda _j C(x,x_j) = \sum _i g(p_i(x_1),\dots ,p_i(x_k))\). In the same way, \(\sum _j \lambda _j \bar{C}(x,x_j) = \sum _i g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))\) and using the concavity of the logarithm, we have a term wise inequality \(f(p_i(x)) \le g(p_i(x_1),\dots ,p_i(x_k))\).

Lemma 2

We have the alternative bound \(H(x) \le \sum _j \lambda _j \bar{C}(x,x_j)\).

Proof

We shall see that \(f(p_i(x)) \le g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))\) for all i. Fix \(1 \le i \le N\) and assume that at least one \(p_i(x_l) \le \epsilon \) so that \(\bar{p}_i(x_l)=\epsilon \). Suppose first that \(\bar{p}_i(x) < e^{-1}\). Since \(p_i(x) \le \bar{p}_i(x)\), we have that \(f(p_i(x)) \le f(\bar{p}_i(x))\) and \(f(\bar{p}_i(x)) \le g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))\) by the concavity of the logarithm. Now suppose that \(\bar{p}_i(x) \ge e^{-1}\). In this case, \(g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))=-\bar{p}_i(x)\sum _j \lambda _j \log {\bar{p}_i(x_j)} \ge \bar{p}_i(x)(-\lambda _l \log {\epsilon }) \ge \bar{p}_i(x)\) since \(-\lambda _l \log {\epsilon } \ge 1\). Finally, \(\bar{p}_i(x) \ge e^{-1} \ge f(p_i(x))\).

B Appendix: Computing the Logarithm in MPC

To compute entropy, cross-entropy, and mutual information under MPC, we need to evaluate the logarithm, or more precisely the function \(z \mapsto z \log {z}\). We use base 2 logarithms instead of natural logarithms to measure information (the unit of information is bits). Converting to natural logarithms would cost one extra multiplication by \(\ln {2}\) per feature.

To approximate \(\log _2{z}\) we convert z to base-2 floating point \(z=w \cdot 2^p\) where p is an integer and \(1/2 < w \le 1\). Note that \(\log _2{z} = p + \log _2{w}\) and employ a Padé rational function approximation to approximate \(\log _2{w}\) to the desired precision [49]. This approximation applied to a secret-shared value \([z]\) is denoted \(\log _2 [z]\). Note that there exist alternative ways to approximate the logarithm, for example Taylor polynomials. Compared to Taylor expansions however, Padé approximants are rational functions that allow using lower degree polynomials to approximate the function up to a given order [49]. This means that fewer multiplications are needed in the MPC protocol. For instance, in our experiments in Sect. 6.2, the logarithm is approximated as a quotient of two cubic polynomials while agreeing with the logarithm up to order 6. We have also considered alternatives such as CORDIC type approximations [55], other iterative schemes [40] or using logarithm tables, but have chosen to use Padé approximations based on test implementations and the number of multiplications needed to achieve sufficient precision.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Eklund, D., Iacovazzi, A., Wang, H., Pyrgelis, A., Raza, S. (2024). BMI: Bounded Mutual Information for Efficient Privacy-Preserving Feature Selection. In: Garcia-Alfaro, J., Kozik, R., Choraś, M., Katsikas, S. (eds) Computer Security – ESORICS 2024. ESORICS 2024. Lecture Notes in Computer Science, vol 14983. Springer, Cham. https://doi.org/10.1007/978-3-031-70890-9_18

Download citation

DOI: https://doi.org/10.1007/978-3-031-70890-9_18
Published: 06 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70889-3
Online ISBN: 978-3-031-70890-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

BMI: Bounded Mutual Information for Efficient Privacy-Preserving Feature Selection

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Efficient Unconditionally Secure Comparison and Privacy Preserving Machine Learning Classification Protocols

PD-ML-Lite: Private Distributed Machine Learning from Lightweight Cryptography

Balanced Privacy Budget Allocation for Privacy-Preserving Machine Learning

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendices

A Appendix: Cutoff Probability

Lemma 2

Proof

B Appendix: Computing the Logarithm in MPC

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

BMI: Bounded Mutual Information for Efficient Privacy-Preserving Feature Selection

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Efficient Unconditionally Secure Comparison and Privacy Preserving Machine Learning Classification Protocols

PD-ML-Lite: Private Distributed Machine Learning from Lightweight Cryptography

Balanced Privacy Budget Allocation for Privacy-Preserving Machine Learning

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendices

A Appendix: Cutoff Probability

Lemma 2

Proof

B Appendix: Computing the Logarithm in MPC

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation