Abstract
In big data environments, the reliability of data largely determines the trustworthiness of analysis outcomes. Big data provenance supports this reliability by recording the origin and historical paths of data. In recent years, big data applications have increasingly adopted Apache Cassandra for its high availability and linear scalability. In this paper, we present a data provenance framework for key-value pair databases based on the concept of a Zero-Information Loss Database (ZILD). A large volume of real-time social media data is fetched from Twitter through live streaming using the Twitter Streaming APIs and then modelled in Apache Cassandra following a query-driven approach. The framework provides efficient provenance capture for select, aggregate, update, and historical queries. We evaluate the performance of the proposed framework in terms of its provenance capturing and querying capabilities using appropriate query sets.
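To make the zero-information-loss idea concrete, the following is a minimal in-memory sketch in Python. It is not the framework described in the paper and does not use Cassandra. It only illustrates the general principle that updates append new versions instead of overwriting old ones, and that each select, aggregate, update, and historical query leaves a provenance record. All names here (`ZeroLossStore`, `ProvenanceRecord`, `select_as_of`, `count_where`, the `user:1` keys and `followers` field) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ProvenanceRecord:
    query_id: int
    operation: str          # "update", "select", "aggregate", or "historical"
    keys: Tuple[str, ...]   # keys the query touched or that contributed to its result
    at: int                 # logical time at which the query ran


@dataclass
class ZeroLossStore:
    # key -> list of (logical_time, value); earlier versions are never overwritten
    versions: Dict[str, List[Tuple[int, Any]]] = field(default_factory=dict)
    log: List[ProvenanceRecord] = field(default_factory=list)
    clock: int = 0

    def _record(self, operation: str, keys: List[str]) -> int:
        """Advance the logical clock and append one provenance record."""
        self.clock += 1
        self.log.append(
            ProvenanceRecord(len(self.log) + 1, operation, tuple(keys), self.clock)
        )
        return self.clock

    def update(self, key: str, value: Any) -> None:
        """Append a new version of the value; older versions stay queryable."""
        at = self._record("update", [key])
        self.versions.setdefault(key, []).append((at, value))

    def select(self, key: str) -> Any:
        """Return the latest value for a key, or None if the key is unknown."""
        self._record("select", [key])
        history = self.versions.get(key)
        return history[-1][1] if history else None

    def select_as_of(self, key: str, at: int) -> Any:
        """Historical query: the value that was current at logical time `at`."""
        self._record("historical", [key])
        result = None
        for written_at, value in self.versions.get(key, []):
            if written_at <= at:
                result = value
        return result

    def count_where(self, predicate: Callable[[Any], bool]) -> int:
        """Aggregate query: count keys whose latest value satisfies the predicate."""
        matching = [k for k, h in self.versions.items() if predicate(h[-1][1])]
        self._record("aggregate", matching)
        return len(matching)


# Hypothetical usage with made-up data.
store = ZeroLossStore()
store.update("user:1", {"followers": 10})
store.update("user:1", {"followers": 25})
store.update("user:2", {"followers": 5})
latest = store.select("user:1")
earlier = store.select_as_of("user:1", 1)
popular = store.count_where(lambda v: v["followers"] > 8)
```

In this sketch the provenance of the aggregate result is simply the tuple of contributing keys stored in the last log entry. A real deployment on Cassandra would need persistent provenance tables and a schema designed around the expected queries, neither of which is shown here.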
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rani, A., Goyal, N., Gadia, S.K. (2021). Twitter Data Modelling and Provenance Support for Key-Value Pair Databases. In: Qiao, M., Vossen, G., Wang, S., Li, L. (eds) Databases Theory and Applications. ADC 2021. Lecture Notes in Computer Science, vol 12610. Springer, Cham. https://doi.org/10.1007/978-3-030-69377-0_8
DOI: https://doi.org/10.1007/978-3-030-69377-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69376-3
Online ISBN: 978-3-030-69377-0
eBook Packages: Computer Science (R0)