DOI: 10.1145/3357384.3357878
research-article
Public Access

Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis

Published: 03 November 2019

Abstract

Tensor factorization has been demonstrated to be an efficient approach for computational phenotyping, in which massive electronic health records (EHRs) are converted into concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites avoids direct data sharing, it still requires exchanging intermediary results, which could reveal sensitive patient information. The challenge is therefore how to jointly decompose the tensor under rigorous and principled privacy constraints while still supporting the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHRs. It combines advanced privacy-preserving mechanisms with collaborative learning. Hospitals can keep their EHR databases private yet still collaboratively learn meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact handles heterogeneous patient populations using a structured sparsity term. In our framework, each hospital decomposes its local tensors and, every several iterations, sends the updated intermediary results with output perturbation to a semi-trusted server, which generates the phenotypes. Evaluation on both real-world and synthetic datasets demonstrated that, under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods.
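
The communication pattern described in the abstract (local update, output perturbation, aggregation at a server) can be illustrated with a minimal sketch. This is not the DPFact algorithm: the abstract does not state the update rules, the noise distribution, or how noise is calibrated to a privacy budget. The sketch assumes Gaussian noise, column-norm clipping to bound sensitivity, and plain averaging at the server. The names `clip_columns`, `perturb_output`, `server_aggregate`, `max_norm`, and `noise_std` are hypothetical.

```python
import numpy as np


def clip_columns(factor, max_norm):
    """Scale each column so that its L2 norm is at most max_norm."""
    norms = np.linalg.norm(factor, axis=0, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return factor * scale


def perturb_output(factor, max_norm, noise_std, rng):
    """Clip a locally updated factor matrix, then add Gaussian noise.

    Assumption: noise_std is chosen by hand here. A real deployment would
    derive it from the sensitivity bound and the privacy budget.
    """
    clipped = clip_columns(factor, max_norm)
    return clipped + rng.normal(0.0, noise_std, size=factor.shape)


def server_aggregate(noisy_factors):
    """Average the perturbed factor matrices received from all sites."""
    return np.mean(np.stack(noisy_factors), axis=0)


rng = np.random.default_rng(0)

# Four hypothetical hospitals, each holding a 5 x 3 nonnegative factor matrix
# (5 features, 3 candidate phenotypes) from its own local decomposition.
local_factors = [np.abs(rng.normal(size=(5, 3))) for _ in range(4)]

# Each site shares only the perturbed matrix, never its raw EHR tensor.
noisy = [
    perturb_output(f, max_norm=1.0, noise_std=0.1, rng=rng)
    for f in local_factors
]
global_factor = server_aggregate(noisy)
```

Averaging is used here only because it is the simplest aggregation rule; it also shows why sharing every several iterations rather than every iteration reduces both communication and the number of noisy releases.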



Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN: 9781450369763
DOI: 10.1145/3357384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. collaborative learning
  2. differential privacy
  3. phenotyping
  4. tensor factorization

Qualifiers

  • Research-article

Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2024) DPAR: Decoupled Graph Neural Networks with Node-Level Differential Privacy. Proceedings of the ACM Web Conference 2024, 1170-1181. DOI: 10.1145/3589334.3645531. Online publication date: 13-May-2024
  • (2024) Differentially Private Federated Tensor Completion for Cloud–Edge Collaborative AIoT Data Prediction. IEEE Internet of Things Journal 11:1, 256-267. DOI: 10.1109/JIOT.2023.3314460. Online publication date: 1-Jan-2024
  • (2024) Privacy Mechanisms and Evaluation Metrics for Synthetic Data Generation: A Systematic Review. IEEE Access 12, 88048-88074. DOI: 10.1109/ACCESS.2024.3417608. Online publication date: 2024
  • (2024) FedPAR: Federated PARAFAC2 tensor factorization for computational phenotyping. IISE Transactions on Healthcare Systems Engineering 14:3, 264-275. DOI: 10.1080/24725579.2024.2333261. Online publication date: 8-Apr-2024
  • (2024) Federated learning in healthcare applications. Data Fusion Techniques and Applications for Smart Healthcare, 157-196. DOI: 10.1016/B978-0-44-313233-9.00013-8. Online publication date: 2024
  • (2024) BAFFLE: A Baseline of Backpropagation-Free Federated Learning. Computer Vision – ECCV 2024, 89-109. DOI: 10.1007/978-3-031-73226-3_6. Online publication date: 1-Nov-2024
  • (2024) A Survey of Advances in Multimodal Federated Learning with Applications. Multimodal and Tensor Data Analytics for Industrial Systems Improvement, 315-344. DOI: 10.1007/978-3-031-53092-0_15. Online publication date: 26-Feb-2024
  • (2023) Creating High-Quality Synthetic Health Data: A Framework for Model Development and Validation (Preprint). JMIR Formative Research. DOI: 10.2196/53241. Online publication date: 2-Oct-2023
  • (2023) Closed-form Machine Unlearning for Matrix Factorization. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 3278-3287. DOI: 10.1145/3583780.3614811. Online publication date: 21-Oct-2023
  • (2023) FedSaw: Communication-Efficient Cross-Silo Federated Learning with Adaptive Compression. 2023 IEEE 20th International Conference on Mobile Ad Hoc and Smart Systems (MASS), 37-45. DOI: 10.1109/MASS58611.2023.00013. Online publication date: 25-Sep-2023
