Twitter Data Modelling and Provenance Support for Key-Value Pair Databases

  • Conference paper
Databases Theory and Applications (ADC 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12610)

Abstract

In Big Data environments, the reliability of data plays an important role in determining the trustworthiness of the outcomes of an analysis. Big data provenance ensures the reliability of data by providing details about the origin and historical paths of data. In recent years, a growing share of big data applications has adopted Apache Cassandra because of its high availability and linear scalability. In this paper, we present a data provenance framework for Key-Value Pair Databases using the concept of a Zero-Information Loss Database (ZILD). A large volume of real-time social media data is fetched from Twitter through live streaming with the help of the Twitter Streaming APIs, and then modelled in Apache Cassandra based on a Query-Driven approach. This framework provides efficient provenance capturing support for select, aggregate, update, and historical queries. We evaluate the performance of the proposed framework in terms of its provenance capturing and querying capabilities using appropriate query sets.
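
As a rough illustration of the zero-information-loss idea behind the four query types named above, the following Python sketch keeps every version of a value instead of overwriting it and appends one provenance record per operation. It is not the paper's implementation: the class `ZeroLossStore`, its method names, the logical clock, and the sample keys are all hypothetical, and an in-memory dictionary stands in for Apache Cassandra.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ZeroLossStore:
    """Toy key-value store (hypothetical): updates append versions, nothing is lost."""

    # key -> list of (logical timestamp, value), oldest first
    versions: Dict[str, List[Tuple[int, Any]]] = field(default_factory=dict)
    # append-only provenance log, one record per executed operation
    provenance: List[Dict[str, Any]] = field(default_factory=list)
    clock: int = 0  # logical clock; an assumption, real systems use write timestamps

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def _log(self, op: str, keys: List[str], ts: int) -> None:
        self.provenance.append({"op": op, "keys": list(keys), "ts": ts})

    def update(self, key: str, value: Any) -> int:
        """Append a new version of `key` and return its timestamp."""
        ts = self._tick()
        self.versions.setdefault(key, []).append((ts, value))
        self._log("update", [key], ts)
        return ts

    def select(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the latest value, or the value visible at time `as_of`."""
        ts = self._tick()
        self._log("select" if as_of is None else "historical", [key], ts)
        limit = ts if as_of is None else as_of
        visible = [v for t, v in self.versions.get(key, []) if t <= limit]
        return visible[-1] if visible else None

    def aggregate_sum(self, keys: List[str]) -> Any:
        """Sum the current values of `keys`, recording which keys contributed."""
        ts = self._tick()
        self._log("aggregate", keys, ts)
        total = 0
        for k in keys:
            history = self.versions.get(k, [])
            if history:
                total += history[-1][1]
        return total


if __name__ == "__main__":
    store = ZeroLossStore()
    first = store.update("tweet:1:retweets", 5)
    store.update("tweet:1:retweets", 8)
    print(store.select("tweet:1:retweets"))               # current value
    print(store.select("tweet:1:retweets", as_of=first))  # value as of the first write
    print(store.provenance)                               # who touched what, and when
```

Because old versions are retained, a historical query is just a select bounded by an earlier timestamp, and the provenance log records which keys each select, aggregate, or update touched.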

Author information

Corresponding author

Correspondence to Asma Rani.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rani, A., Goyal, N., Gadia, S.K. (2021). Twitter Data Modelling and Provenance Support for Key-Value Pair Databases. In: Qiao, M., Vossen, G., Wang, S., Li, L. (eds) Databases Theory and Applications. ADC 2021. Lecture Notes in Computer Science, vol 12610. Springer, Cham. https://doi.org/10.1007/978-3-030-69377-0_8

  • DOI: https://doi.org/10.1007/978-3-030-69377-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69376-3

  • Online ISBN: 978-3-030-69377-0

  • eBook Packages: Computer Science, Computer Science (R0)
