DOI: 10.1145/3583780.3615262
Open access

Patient Clustering via Integrated Profiling of Clinical and Digital Data

Published: 21 October 2023


We introduce a novel profile-based patient clustering model for clinical healthcare data. Using a method grounded in constrained low-rank approximation, the model combines patients' clinical data with their digital interaction data, including browsing and search activity, to construct patient profiles. The method produces nonnegative embedding vectors that serve as low-dimensional representations of the patients. We assessed the model on real-world patient data from a healthcare web portal, evaluating both its clustering and its recommendation capabilities. Compared with the baselines, our approach achieved better clustering coherence and recommendation accuracy.
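To make the idea of a constrained low-rank approximation over two data sources concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each source is a nonnegative patient-by-feature matrix, shares one nonnegative embedding matrix `H` across both, and uses plain multiplicative updates on the concatenated matrix. The names `joint_nmf`, `cluster_labels`, and the weight `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def joint_nmf(X_clinical, X_digital, rank, alpha=1.0, n_iter=500, seed=0, eps=1e-9):
    """Illustrative joint NMF with a shared patient embedding (an assumption,
    not the method from the paper).

    Approximately minimizes
        ||X_clinical - H W_c||_F^2 + alpha * ||X_digital - H W_d||_F^2
    subject to H, W_c, W_d >= 0. This equals ordinary NMF on the matrix
    [X_clinical, sqrt(alpha) * X_digital], so multiplicative updates are
    applied to that concatenation.
    """
    X = np.hstack([X_clinical, np.sqrt(alpha) * X_digital])
    n_patients, n_features = X.shape
    rng = np.random.default_rng(seed)
    H = rng.random((n_patients, rank)) + eps   # nonnegative patient embeddings
    W = rng.random((rank, n_features)) + eps   # nonnegative feature loadings
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (X @ W.T) / (H @ (W @ W.T) + eps)
        W *= (H.T @ X) / ((H.T @ H) @ W + eps)
    return H, W


def cluster_labels(H, eps=1e-12):
    """Assign each patient to the embedding dimension with the largest weight,
    after scaling each column of H to unit length."""
    norms = np.maximum(np.linalg.norm(H, axis=0), eps)
    return np.argmax(H / norms, axis=1)


# Toy data: rows are patients, columns are hypothetical clinical and digital features.
X_clinical = np.array([[3, 2, 0, 0], [2, 3, 0, 0], [3, 3, 0, 0],
                       [0, 0, 2, 3], [0, 0, 3, 2], [0, 0, 3, 3]], dtype=float)
X_digital = np.array([[1, 0], [2, 0], [1, 0],
                      [0, 1], [0, 2], [0, 1]], dtype=float)

H, W = joint_nmf(X_clinical, X_digital, rank=2)
labels = cluster_labels(H)
```

Each row of `H` is a nonnegative, low-dimensional profile of one patient, and `labels` gives a hard cluster assignment derived from it. In this sketch the digital-feature columns of `W` carry the `sqrt(alpha)` scaling, and the active-set and block coordinate descent solvers discussed in the NMF literature are replaced by the simpler multiplicative rule.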






Information & Contributors


Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
October 2023
5508 pages
This work is licensed under a Creative Commons Attribution International 4.0 License.



Association for Computing Machinery

New York, NY, United States


Author Tags

  1. clustering
  2. healthcare
  3. nonnegative matrix factorization
  4. patient profiling
  5. recommendation systems


  • Short-paper



CIKM '23

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%




Bibliometrics & Citations


Article Metrics

  • Total Citations: 0
  • Total Downloads: 320
  • Downloads (Last 12 months): 268
  • Downloads (Last 6 weeks): 25
Reflects downloads up to 07 Jan 2025
