DOI: 10.1145/3422713.3422731
research-article

An Efficient Method for Scientific Data Retrieval Service

Published: 23 October 2020

Abstract

Sharing scientific research data on the Internet has become a trend in academia, and more and more data are being published openly on the web. Given the rapid growth of data and the demands on data service quality, the efficiency of data retrieval has become an important factor in service quality. Based on the characteristics of scientific data and the actual requirements of the Pharmaceutical Information Center (PIC, http://pharmdata.ncmi.cn), we propose an efficient retrieval method for scientific data services that can greatly improve retrieval speed and service quality. The method has two phases. The first phase obtains meaningful search keywords from scientific data using semantic analysis technology; this includes constructing effective keyword sets and eliminating the impact of invalid search keywords. The second phase constructs a Hash Index Tree (HI-Tree) over the valid keywords. The retrieval service then traverses the cached HI-Tree instead of the entire database, minimizing database query operations. Experimental results show that, compared with traditional database retrieval methods, our method greatly improves retrieval efficiency and provides a better user experience for the data services.
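
The two phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the internal layout of the HI-Tree, so the sketch assumes a character-level tree whose children are held in hash maps, and it replaces semantic analysis with a simple stop list. All names (`INVALID_KEYWORDS`, `extract_keywords`, `KeywordTree`, `build_index`, `search`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: a tiny stop list stands in for the "invalid search keywords"
# that the paper identifies through semantic analysis.
INVALID_KEYWORDS = {"the", "of", "and", "a", "in", "for", "on"}


def extract_keywords(text: str) -> set[str]:
    """Phase 1 (simplified): tokenize, lowercase, and drop invalid keywords."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return {w for w in cleaned.split() if w not in INVALID_KEYWORDS}


@dataclass
class Node:
    # Children are resolved by hash lookup on the next character.
    children: dict[str, Node] = field(default_factory=dict)
    # IDs of records whose text contains the keyword ending at this node.
    record_ids: set[int] = field(default_factory=set)


class KeywordTree:
    """Phase 2 (assumed layout): an in-memory tree over valid keywords."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, keyword: str, record_id: int) -> None:
        node = self.root
        for ch in keyword:
            node = node.children.setdefault(ch, Node())
        node.record_ids.add(record_id)

    def lookup(self, keyword: str) -> set[int]:
        node = self.root
        for ch in keyword:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.record_ids)


def build_index(records: dict[int, str]) -> KeywordTree:
    """Build the cached tree once from all records."""
    tree = KeywordTree()
    for record_id, text in records.items():
        for keyword in extract_keywords(text):
            tree.add(keyword, record_id)
    return tree


def search(tree: KeywordTree, query: str) -> set[int]:
    """Answer a query from the cached tree only; no database scan."""
    keywords = extract_keywords(query)
    if not keywords:
        return set()
    return set.intersection(*(tree.lookup(k) for k in keywords))
```

In this sketch a query touches only the in-memory tree, so the database would be consulted just to fetch the matching records by ID. Requiring every query keyword to match (set intersection) is a design choice of the sketch, not something the abstract states.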



    Published In

    ICBDT '20: Proceedings of the 3rd International Conference on Big Data Technologies
    September 2020
    250 pages
    ISBN:9781450387859
    DOI:10.1145/3422713

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Data Retrieval
    2. Data Service
    3. Scientific Data
    4. Semantic Analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICBDT 2020
