[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

BMI: Bounded Mutual Information for Efficient Privacy-Preserving Feature Selection

  • Conference paper
  • First Online:
Computer Security – ESORICS 2024 (ESORICS 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14983))

Included in the following conference series:

  • 463 Accesses

Abstract

We introduce low complexity bounds on mutual information for efficient privacy-preserving feature selection with secure multi-party computation (MPC). Considering a discrete feature with N possible values and a discrete label with M possible values, our approach requires O(N) multiplications as opposed to O(NM) in a direct MPC implementation of mutual information. Our experimental results show that for regression tasks, we achieve a computation speed up of over 1,000\(\times \) compared to a straightforward MPC implementation of mutual information, while achieving similar accuracy for the downstream machine learning model.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 49.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 64.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    http://yann.lecun.com/exdb/mnist/.

  2. 2.

    https://archive.ics.uci.edu/dataset/170/gisette.

  3. 3.

    https://archive.ics.uci.edu/dataset/94/spambase.

References

  1. Akavia, A., Galili, B., Shaul, H., Weiss, M., Yakhini, Z.: Privacy preserving feature selection for sparse linear regression. Cryptology ePrint Archive, Paper 2023/1354 (2023). https://eprint.iacr.org/2023/1354

  2. Alishahi, M., Moghtadaiee, V.: Feature selection on anonymized datasets. In: IEEE International Conference on Dependable, Autonomic and Secure Computing (DASC) (2023)

    Google Scholar 

  3. Alishahi, M., Moghtadaiee, V., Navidan, H.: Add noise to remove noise: local differential privacy for feature selection. Comput. Secur. (2022)

    Google Scholar 

  4. Aono, Y., Hayashi, T., Wang, L., Moriai, S., et al.: Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. (2017)

    Google Scholar 

  5. Araki, T., Furukawa, J., Lindell, Y., Nof, A., Ohara, K.: High-throughput semi-honest secure three-party computation with an honest majority. In: Proceedings of the 2016 ACM Conference on Computer and Communications Security (2016)

    Google Scholar 

  6. Banerjee, M., Chakravarty, S.: Privacy preserving feature selection for distributed data using virtual dimension. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management (2011)

    Google Scholar 

  7. Banerjee, S., Elmroth, E., Bhuyan, M.: Fed-FiS: a novel information-theoretic federated feature selection for learning stability. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) ICONIP 2021. CCIS, vol. 1516, pp. 480–487. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-92307-5_56

    Chapter  Google Scholar 

  8. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. (1994)

    Google Scholar 

  9. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: a framework for fast privacy-preserving computations. In: Jajodia, S., Lopez, J. (eds.) ESORICS 2008. LNCS, vol. 5283, pp. 192–206. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88313-5_13

    Chapter  Google Scholar 

  10. Cassara, P., Gotta, A., Valerio, L.: Federated feature selection for cyber-physical systems of systems. IEEE Trans. Veh. Technol. (2022)

    Google Scholar 

  11. Catrina, O., de Hoogh, S.: Improved primitives for secure multiparty integer computation. In: Garay, J.A., De Prisco, R. (eds.) SCN 2010. LNCS, vol. 6280, pp. 182–199. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15317-4_13

    Chapter  Google Scholar 

  12. Catrina, O., Saxena, A.: Secure computation with fixed-point numbers. In: Sion, R. (ed.) FC 2010. LNCS, vol. 6052, pp. 35–50. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14577-3_6

    Chapter  Google Scholar 

  13. Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng. (2014)

    Google Scholar 

  14. Covert, I.C., Qiu, W., Lu, M., Kim, N.Y., White, N.J., Lee, S.-I.: Learning to maximize mutual information for dynamic feature selection. In: International Conference on Machine Learning. PMLR (2023)

    Google Scholar 

  15. Cramer, R., Damgård, I.B., et al.: Secure Multiparty Computation. Cambridge University Press (2015)

    Google Scholar 

  16. Dankar, F.K., Madathil, N., Dankar, S.K., Boughorbel, S.: Privacy-preserving analysis of distributed biomedical data: designing efficient and secure multiparty computations using distributed statistical learning theory. JMIR Med. Inf. (2019)

    Google Scholar 

  17. Evans, D., Kolesnikov, V., Rosulek, M., et al.: A pragmatic introduction to secure multi-party computation. Found. Trends® Priv. Secur. (2018)

    Google Scholar 

  18. Froelicher, D., et al.: Scalable and privacy-preserving federated principal component analysis. In: IEEE Symposium on Security and Privacy (SP) (2023)

    Google Scholar 

  19. Froelicher, D., et al.: Scalable privacy-preserving distributed learning. In: Proceedings on Privacy Enhancing Technologies (2021)

    Google Scholar 

  20. Fu, R., Wu, Y., Xu, Q., Zhang, M.: FEAST: a communication-efficient federated feature selection framework for relational data. Proc. ACM Manag. Data (2023)

    Google Scholar 

  21. Gascón, A.: Privacy-preserving distributed linear regression on high-dimensional data. Proc. Priv. Enhancing Technol. (2017)

    Google Scholar 

  22. Goldreich, O., Micali, S., Wigderson, A.: How to play any mental game, or a completeness theorem for protocols with honest majority, pp. 307–328. Association for Computing Machinery (2019)

    Google Scholar 

  23. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. (2003)

    Google Scholar 

  24. Hu, Y., Zhang, Y., Gong, D., Sun, X.: Multi-participant federated feature selection algorithm with particle swarm optimizaiton for imbalanced data under privacy protection. IEEE Trans. Artif. Intell. (2022)

    Google Scholar 

  25. Jafer, Y., Matwin, S., Sokolova, M.: A framework for a privacy-aware feature selection evaluation measure. In: 13th Annual Conference on Privacy, Security and Trust (PST). IEEE (2015)

    Google Scholar 

  26. Kamm, L., Bogdanov, D., Laur, S., Vilo, J.: A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics (2013)

    Google Scholar 

  27. Karhu, T.: Desaturation event scoring criteria affect the perceived severity of nocturnal hypoxic load. Sleep Med. 100, 479–486 (2022)

    Google Scholar 

  28. Karhu, T., Leppänen, T., Töyräs, J., Oksenberg, A., Myllymaa, S., Nikkonen, S.: ABOSA - freely available automatic blood oxygen saturation signal analysis software: structure and validation. Comput. Methods Programs Biomed. (2022)

    Google Scholar 

  29. Karhu, T., Myllymaa, S., Nikkonen, S., Mazzotti, D.R., Töyräs, J., Leppänen, T.: Longer and deeper desaturations are associated with the worsening of mild sleep apnea: the sleep heart health study. Front. Neurosci. 15, 657126 (2021)

    Article  Google Scholar 

  30. Keller, M.: MP-SPDZ: a versatile framework for multi-party computation. In: Proceedings of the ACM Conference on Computer and Communications Security (2020)

    Google Scholar 

  31. Klosh, G., et al.: The siesta project polygraphic and clinical database. IEEE Eng. Med. Biol. Mag. (2001)

    Google Scholar 

  32. Knott, B., Venkataraman, S., Hannun, A., Sengupta, S., Ibrahim, M., van der Maaten, L.: CrypTen: secure multi-party computation meets machine learning. In: Advances in Neural Information Processing Systems (2021)

    Google Scholar 

  33. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. (1997)

    Google Scholar 

  34. Küsters, R., Liedtke, J., Müller, J., Rausch, D., Vogt, A.: Ordinos: a verifiable tally-hiding e-voting system. In: IEEE European Symposium on Security and Privacy (EuroS &P) (2020)

    Google Scholar 

  35. Langley, P., et al.: Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall Symposium on Relevance (1994)

    Google Scholar 

  36. Law, M.H., Figueiredo, M.A., Jain, A.K.: Simultaneous feature selection and clustering using mixture models. IEEE Trans. Pattern Anal. Mach. Intell. (2004)

    Google Scholar 

  37. Li, A., et al.: Efficient and privacy-preserving feature importance-based vertical federated learning. IEEE Trans. Mob. Comput. (2023)

    Google Scholar 

  38. Li, J., et al.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) (2017)

    Google Scholar 

  39. Li, X., Dowsley, R., De Cock, M.: Privacy-preserving feature selection with secure multiparty computation. In: International Conference on Machine Learning. PMLR (2021)

    Google Scholar 

  40. Majithia, J., Levan, D.: A note on base-2 logarithm computations. Proc. IEEE 61(10), 1519–1520 (1973)

    Article  Google Scholar 

  41. Mitra, P., Murthy, C., Pal, S.K.: Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Anal. Mach. Intell. (2002)

    Google Scholar 

  42. Nikolaenko, V., Weinsberg, U., Ioannidis, S., Joye, M., Boneh, D., Taft, N.: Privacy-preserving ridge regression on hundreds of millions of records. In: IEEE Symposium on Security and Privacy (SP) (2013)

    Google Scholar 

  43. Ono, S., Takata, J., Kataoka, M., Shin, T.I.K., Sakamoto, H.: Privacy-preserving feature selection with fully homomorphic encryption. Algorithms (2022)

    Google Scholar 

  44. Polychroniadou, A., et al.: Prime match: a privacy-preserving inventory matching system. In: 32nd USENIX Security Symposium (2023)

    Google Scholar 

  45. Qin, Y., Kondo, M.: Federated learning-based network intrusion detection with a feature selection approach. In: IEEE International Conference on Electrical, Communication, and Computer Engineering (2021)

    Google Scholar 

  46. Raab, R.: Federated electronic health records for the European health data space. Lancet Digit. Health (2023)

    Google Scholar 

  47. Rao, V., Long, Y., Eldardiry, H., Rane, S., Rossi, R., Torres, F.: Secure two-party feature selection. arXiv preprint arXiv:1901.00832 (2019)

  48. Romero, E., Sopena, J.M.: Performing feature selection with multilayer perceptrons. IEEE Trans. Neural Netw. (2008)

    Google Scholar 

  49. Russell Graves-Morris, P., Saff, E.B., Varga, R.S.: Rational Approximation and Interpolation. Lecture Notes in Mathematics. Springer, Heidelberg (1983). https://doi.org/10.1007/BFb0072395

  50. Sangers, A., et al.: Secure multiparty pagerank algorithm for collaborative fraud detection. In: Goldberg, I., Moore, T. (eds.) FC 2019. LNCS, vol. 11598, pp. 605–623. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32101-7_35

    Chapter  Google Scholar 

  51. Sav, S., Diaa, A., Pyrgelis, A., Bossuat, J.-P., Hubaux, J.-P.: Privacy-preserving federated recurrent neural networks. In: Proceedings on Privacy Enhancing Technologies (2023)

    Google Scholar 

  52. Sav, S., et al.: POSEIDON: privacy-preserving federated neural network learning. In: 28th Network And Distributed System Security Symposium (NDSS). Internet Society (2021)

    Google Scholar 

  53. Sheikhalishahi, M., Martinelli, F.: Privacy-utility feature selection as a privacy mechanism in collaborative data classification. In: IEEE International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (2017)

    Google Scholar 

  54. Stracuzzi, D.J., Utgoff, P.E.: Randomized variable elimination. J. Mach. Learn. Res. (2004)

    Google Scholar 

  55. Volder, J.: The CORDIC computing technique. In: International Workshop on Managing Requirements Knowledge. IEEE Computer Society (1959)

    Google Scholar 

  56. Wagh, S., Gupta, D., Chandran, N.: SecureNN: 3-party secure computation for neural network training. In: Proceedings on Privacy Enhancing Technologies (2019)

    Google Scholar 

  57. Yang, J., Honavar, V.: Feature subset selection using a genetic algorithm. IEEE Intell. Syst. Their Appl. (1998)

    Google Scholar 

  58. Yao, A.C.: Protocols for secure computations. In: 23rd Annual Symposium on Foundations of Computer Science. IEEE (1982)

    Google Scholar 

  59. Yao, X., Huang, T., Wu, C., Zhang, R., Sun, L.: Towards faster and better federated learning: a feature fusion approach. In: IEEE International Conference on Image Processing (ICIP) (2019)

    Google Scholar 

  60. Zhang, Q., Wang, C., Wu, H., Xin, C., Phuong, T.V.: GELU-Net: a globally encrypted, locally unencrypted deep neural network for privacy-preserved learning. In: International Joint Conferences on Artificial Intelligence (2018)

    Google Scholar 

  61. Zhang, R., Li, H., Hao, M., Chen, H., Zhang, Y.: Secure feature selection for vertical federated learning in eHealth systems. In: IEEE International Conference on Communications (2022)

    Google Scholar 

  62. Zhang, X., Mavromatics, A., Vafeas, A., Nejabati, R., Simeonidou, D.: Federated feature selection for horizontal federated learning in IoT networks. IEEE Internet Things (2023)

    Google Scholar 

Download references

Acknowledgments

This research is funded by the EU Horizon Europe project HARPOCRATES (Grant ID. 101069535) and H2020 project CONCORDIA (Grant ID. 830927). We thank Tuomas Karhu for preparing the SpO2 data as well as help and advice in the process. We would also like to thank the anonymous reviewers for their comments and suggested improvements.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to David Eklund .

Editor information

Editors and Affiliations

Appendices

A  Appendix: Cutoff Probability

With x, \(x_1,...,x_k\) and \(\lambda _1,\dots ,\lambda _k\) as in Sect. 5.1, we shall see that we can modify the bound \(H(x) \le \sum _j \lambda _j C(x,x_j)\) by cutting off probabilities in \(C(x,x_j)\) at a given level \(0 < \epsilon \le \min _j(e^{-1/\lambda _j})\). More in detail, \(H(x) \le \sum _j \lambda _j \bar{C}(x,x_j)\) where \(\bar{C}(x,x_j)=\sum _i - \bar{p}_i(x) \log {\bar{p}_i(x_j)}\), \(\bar{p}_i(x_j)=\max (p_i(x_j),\epsilon )\) and \(\bar{p}_i(x)=\sum _j \lambda _j \bar{p}_i(x_j)\).

Note that \(f(z)=-z\log {z}\), for \(0 < z \le 1\), increases up to \(z=e^{-1}\) where it has its maximum \(f(e^{-1})=e^{-1}\) and decreases after that. Let \(g(z_1,\dots ,z_k)=(\sum _j -\lambda _j z_j)\sum _j \lambda _j \log {z_j}\) for \(0 < z_1,\dots ,z_k \le 1\). Then, \(H(x)=\sum _i f(p_i(x))\) and \(\sum _j \lambda _j C(x,x_j) = \sum _i g(p_i(x_1),\dots ,p_i(x_k))\). In the same way, \(\sum _j \lambda _j \bar{C}(x,x_j) = \sum _i g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))\) and using the concavity of the logarithm, we have a term wise inequality \(f(p_i(x)) \le g(p_i(x_1),\dots ,p_i(x_k))\).

Lemma 2

We have the alternative bound \(H(x) \le \sum _j \lambda _j \bar{C}(x,x_j)\).

Proof

We shall see that \(f(p_i(x)) \le g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))\) for all i. Fix \(1 \le i \le N\) and assume that at least one \(p_i(x_l) \le \epsilon \) so that \(\bar{p}_i(x_l)=\epsilon \). Suppose first that \(\bar{p}_i(x) < e^{-1}\). Since \(p_i(x) \le \bar{p}_i(x)\), we have that \(f(p_i(x)) \le f(\bar{p}_i(x))\) and \(f(\bar{p}_i(x)) \le g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))\) by the concavity of the logarithm. Now suppose that \(\bar{p}_i(x) \ge e^{-1}\). In this case, \(g(\bar{p}_i(x_1),\dots ,\bar{p}_i(x_k))=-\bar{p}_i(x)\sum _j \lambda _j \log {\bar{p}_i(x_j)} \ge \bar{p}_i(x)(-\lambda _l \log {\epsilon }) \ge \bar{p}_i(x)\) since \(-\lambda _l \log {\epsilon } \ge 1\). Finally, \(\bar{p}_i(x) \ge e^{-1} \ge f(p_i(x))\).

B  Appendix: Computing the Logarithm in MPC

To compute entropy, cross-entropy, and mutual information under MPC, we need to evaluate the logarithm, or more precisely the function \(z \mapsto z \log {z}\). We use base 2 logarithms instead of natural logarithms to measure information (the unit of information is bits). Converting to natural logarithms would cost one extra multiplication by \(\ln {2}\) per feature.

To approximate \(\log _2{z}\) we convert z to base-2 floating point \(z=w \cdot 2^p\) where p is an integer and \(1/2 < w \le 1\). Note that \(\log _2{z} = p + \log _2{w}\) and employ a Padé rational function approximation to approximate \(\log _2{w}\) to the desired precision [49]. This approximation applied to a secret-shared value \([z]\) is denoted \(\log _2 [z]\). Note that there exist alternative ways to approximate the logarithm, for example Taylor polynomials. Compared to Taylor expansions however, Padé approximants are rational functions that allow using lower degree polynomials to approximate the function up to a given order [49]. This means that fewer multiplications are needed in the MPC protocol. For instance, in our experiments in Sect. 6.2, the logarithm is approximated as a quotient of two cubic polynomials while agreeing with the logarithm up to order 6. We have also considered alternatives such as CORDIC type approximations [55], other iterative schemes [40] or using logarithm tables, but have chosen to use Padé approximations based on test implementations and the number of multiplications needed to achieve sufficient precision.

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Eklund, D., Iacovazzi, A., Wang, H., Pyrgelis, A., Raza, S. (2024). BMI: Bounded Mutual Information for Efficient Privacy-Preserving Feature Selection. In: Garcia-Alfaro, J., Kozik, R., Choraś, M., Katsikas, S. (eds) Computer Security – ESORICS 2024. ESORICS 2024. Lecture Notes in Computer Science, vol 14983. Springer, Cham. https://doi.org/10.1007/978-3-031-70890-9_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-70890-9_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70889-3

  • Online ISBN: 978-3-031-70890-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics