DOI: 10.1145/1807167.1807294
demonstration

HadoopDB in action: building real world applications

Published: 06 June 2010

Abstract

HadoopDB is a hybrid of MapReduce and DBMS technologies, designed to meet the growing demand for analyzing massive datasets on very large clusters of machines. Our previous work has shown that HadoopDB approaches parallel databases in performance while retaining the scalability and fault tolerance of MapReduce-based systems. In this demonstration, we focus on HadoopDB's flexible architecture and versatility through two real-world application scenarios: a semantic web data application for protein sequence analysis and a business data warehousing application based on TPC-H. The demonstration offers a thorough walk-through of how to build applications on top of HadoopDB.
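
The abstract does not describe implementation details, so the following is only a toy sketch of the general hybrid idea, not HadoopDB's actual code or API. It assumes each node holds its partition in a local database (in-memory SQLite stands in for the node-local DBMS), that the aggregation is pushed into that database as SQL during a map-like step, and that a reduce-like step merges the partial results. The `orders` table, its columns, and all function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical data: (status, price) rows, split round-robin over 3 "nodes".
ROWS = [("O", 10.0), ("F", 5.0), ("O", 2.5), ("P", 1.0), ("F", 4.0), ("O", 7.5)]
NUM_NODES = 3


def make_node(rows):
    """Create one in-memory SQLite database standing in for a node-local DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (status TEXT, price REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def map_phase(conn):
    """Push the aggregation into the local database; emit partial (key, sum) pairs."""
    return conn.execute(
        "SELECT status, SUM(price) FROM orders GROUP BY status"
    ).fetchall()


def reduce_phase(partials):
    """Merge the partial sums from all nodes into final per-key totals."""
    totals = defaultdict(float)
    for key, partial_sum in partials:
        totals[key] += partial_sum
    return dict(totals)


def run_query(rows, num_nodes=NUM_NODES):
    """Partition the rows, aggregate locally on each node, then merge."""
    partitions = [rows[i::num_nodes] for i in range(num_nodes)]
    nodes = [make_node(p) for p in partitions]
    partials = [pair for node in nodes for pair in map_phase(node)]
    return reduce_phase(partials)


if __name__ == "__main__":
    print(run_query(ROWS))
```

In this sketch the databases do the per-partition work, and only small partial aggregates cross node boundaries, which is the intuition behind combining a DBMS with a MapReduce-style coordination layer.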

Published In

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN:9781450300322
DOI:10.1145/1807167

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hadoop
  2. hadoopdb
  3. hive
  4. mapreduce
  5. parallel database
  6. semantic web
  7. tpc-h
  8. uniprot

Qualifiers

  • Demonstration

Conference

SIGMOD/PODS '10: International Conference on Management of Data
June 6 - 10, 2010
Indianapolis, Indiana, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2023) Query Processing on Gaming Consoles. Proceedings of the 19th International Workshop on Data Management on New Hardware, pp. 86-88. DOI: 10.1145/3592980.3595313. Online publication date: 18-Jun-2023.
  • (2021) Distributed Multi-Dimensional Data Index Strategy in Cloud Computing Environment. 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1-4. DOI: 10.1109/ICECA52323.2021.9675845. Online publication date: 2-Dec-2021.
  • (2018) DWIaaS: Data Warehouse Infrastructure as a Service for Big Data Analytics. Transactions on Computational Collective Intelligence XXX, pp. 133-151. DOI: 10.1007/978-3-319-99810-7_7. Online publication date: 24-Sep-2018.
  • (2018) A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns. Database and Expert Systems Applications, pp. 5-20. DOI: 10.1007/978-3-319-99133-7_1. Online publication date: 7-Aug-2018.
  • (2018) Big Data Indexing. Encyclopedia of Big Data Technologies, pp. 1-11. DOI: 10.1007/978-3-319-63962-8_255-1. Online publication date: 9-May-2018.
  • (2017) FlashView. Proceedings of the VLDB Endowment 10:12, pp. 1869-1872. DOI: 10.14778/3137765.3137796. Online publication date: 1-Aug-2017.
  • (2017) Data Organization and Curation in Big Data. Handbook of Big Data Technologies, pp. 143-178. DOI: 10.1007/978-3-319-49340-4_5. Online publication date: 26-Feb-2017.
  • (2016) The Six Pillars for Building Big Data Analytics Ecosystems. ACM Computing Surveys 49:2, pp. 1-36. DOI: 10.1145/2963143. Online publication date: 2-Aug-2016.
  • (2016) Predicate-Oriented Query of RDF Data Based on a Distributed Storage Model. 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), pp. 37-43. DOI: 10.1109/DSC.2016.43. Online publication date: Jun-2016.
  • (2015) Cocktail: A hybrid system combining Hadoop and Storm. 2015 IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 20-25. DOI: 10.1109/IAEAC.2015.7428510. Online publication date: Dec-2015.
