research-article

Graph-based data management system for efficient information storage, retrieval and processing

Published: 01 March 2023

Abstract

Data management systems depend on sound design of both the data representation and the software components. The data representation scheme determines how the data are stored, which in turn influences the efficiency of their processing and retrieval. The design of the system components applies software engineering concepts to achieve qualities such as scalability, efficiency, flexibility, maintainability, and extensibility. This paper presents a data management system that uses a graph-based data representation scheme to achieve efficient data retrieval with graph databases. Input data are transformed into vertices, edges, and labels as they are inserted into the database. The proposed system consists of three layers: the system beans layer, the data access layer, and the database engine. Healthcare data are used to evaluate the system in comparison with resource description framework (RDF) semantics. Extensive experiments compare different scenarios of data storage and retrieval using Neo4j, OrientDB, and RDF4J. Experimental results show that the proposed graph-based approach outperforms the RDF4J framework in terms of insertion and retrieval time.
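To make the transformation step concrete, the following is a minimal sketch of how one flat record could become vertices, edges, and labels at insertion time. It is not the paper's implementation: the record fields, the `Patient`/`Disease`/`Drug` vertex labels, the `HAS_DISEASE`/`TAKES_DRUG` edge labels, and the in-memory `Graph` class are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    label: str  # hypothetical vertex label, e.g. "Patient" or "Disease"
    key: str    # identifier, unique within its label


@dataclass
class Graph:
    """In-memory stand-in for a graph database (assumption, not a real engine)."""
    vertices: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, edge label, target)

    def add_edge(self, source, label, target):
        self.vertices.update((source, target))
        self.edges.add((source, label, target))


def insert_patient(graph, record):
    """Turn one flat patient record into vertices and labeled edges."""
    patient = Vertex("Patient", record["id"])
    graph.vertices.add(patient)
    for disease in record.get("diseases", []):
        graph.add_edge(patient, "HAS_DISEASE", Vertex("Disease", disease))
    for drug in record.get("drugs", []):
        graph.add_edge(patient, "TAKES_DRUG", Vertex("Drug", drug))
    return patient


graph = Graph()
insert_patient(graph, {"id": "p1", "diseases": ["asthma"], "drugs": ["salbutamol"]})
insert_patient(graph, {"id": "p2", "diseases": ["asthma"]})
```

Because vertices are keyed by label and identifier, the two patients share a single `Disease` vertex, so a query such as "all patients with asthma" becomes a traversal from one vertex rather than a scan over records.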

Highlights

A graph-based data representation for healthcare information is presented.
The experiments use data for up to three million patients.
The modular three-layer data management design is easily tailored to any NoSQL database engine.
Data insertion and retrieval are compared using OrientDB, Neo4j, and RDF4J.
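The layering named in the highlights can be sketched as follows. This is an illustrative reading of the three layers, not the authors' code: `PatientBean`, `GraphEngine`, `InMemoryEngine`, and `PatientDao` are hypothetical names, and the in-memory engine stands in for an adapter around a real database driver.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Layer 1, system beans: plain data holders with no storage logic.
@dataclass(frozen=True)
class PatientBean:
    patient_id: str
    diseases: tuple = ()


# Layer 3, database engine: the only engine-specific part.
class GraphEngine(ABC):
    @abstractmethod
    def add_edge(self, source, label, target): ...

    @abstractmethod
    def neighbours(self, source, label): ...


class InMemoryEngine(GraphEngine):
    """Stand-in engine; a real adapter would wrap a database driver instead."""

    def __init__(self):
        self._edges = {}

    def add_edge(self, source, label, target):
        self._edges.setdefault((source, label), set()).add(target)

    def neighbours(self, source, label):
        return set(self._edges.get((source, label), ()))


# Layer 2, data access: translates beans into engine calls.
class PatientDao:
    def __init__(self, engine: GraphEngine):
        self._engine = engine

    def save(self, bean):
        for disease in bean.diseases:
            self._engine.add_edge(bean.patient_id, "HAS_DISEASE", disease)

    def diseases_of(self, patient_id):
        return self._engine.neighbours(patient_id, "HAS_DISEASE")


dao = PatientDao(InMemoryEngine())
dao.save(PatientBean("p1", ("asthma", "diabetes")))
```

Under this arrangement, switching engines means writing another `GraphEngine` subclass; the beans and the data access object stay unchanged.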

Information & Contributors

Information

Published In

Information Processing and Management: an International Journal, Volume 60, Issue 2
Mar 2023
1443 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Graph-based modeling
  2. NoSQL database
  3. RDF semantics
  4. Healthcare
  5. Databases

Qualifiers

  • Research-article
