DOI: 10.1145/2806416.2806428
research-article

External Data Access And Indexing In AsterixDB

Published: 17 October 2015

Abstract

Traditional database systems offer rich query interfaces (SQL) and efficient query execution for data that they store. Recent years have seen the rise of Big Data analytics platforms offering query-based access to "raw" external data, e.g., file-resident data (often in HDFS). In this paper, we describe techniques to achieve the qualities offered by DBMSs when accessing external data. This work has been built into Apache AsterixDB, an open source Big Data Management System. We describe how we build distributed indexes over external data, partition external indexes, provide query consistency across access paths, and manage external indexes amidst concurrent activities. We compare the performance of this new AsterixDB capability to an external-only solution (Hive) and to its internally managed data and indexes.

Information & Contributors

Information

Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN: 9781450337946
DOI: 10.1145/2806416
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. access
  2. asterixdb
  3. external data
  4. hdfs
  5. indexing

Qualifiers

  • Research-article

Conference

CIKM'15

Acceptance Rates

CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 12 Dec 2024

Cited By

  • (2023) Bringing Data Analysis to the Files and the Database to the Command Line. 2023 Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE), pp. 1490-1497. DOI: 10.1109/CSCE60160.2023.00246. Online publication date: 24-Jul-2023
  • (2019) Adaptive partitioning and indexing for in situ query processing. The VLDB Journal — The International Journal on Very Large Data Bases, 29:1, pp. 569-591. DOI: 10.1007/s00778-019-00580-x. Online publication date: 15-Nov-2019
  • (2019) Peer-to-Peer Data Management. Principles of Distributed Database Systems, pp. 395-448. DOI: 10.1007/978-3-030-26253-2_9. Online publication date: 3-Dec-2019
  • (2019) Parallel Database Systems. Principles of Distributed Database Systems, pp. 349-394. DOI: 10.1007/978-3-030-26253-2_8. Online publication date: 3-Dec-2019
  • (2019) Database Integration—Multidatabase Systems. Principles of Distributed Database Systems, pp. 281-347. DOI: 10.1007/978-3-030-26253-2_7. Online publication date: 3-Dec-2019
  • (2019) Data Replication. Principles of Distributed Database Systems, pp. 247-280. DOI: 10.1007/978-3-030-26253-2_6. Online publication date: 3-Dec-2019
  • (2019) Distributed Transaction Processing. Principles of Distributed Database Systems, pp. 183-246. DOI: 10.1007/978-3-030-26253-2_5. Online publication date: 3-Dec-2019
  • (2019) Distributed Query Processing. Principles of Distributed Database Systems, pp. 129-182. DOI: 10.1007/978-3-030-26253-2_4. Online publication date: 3-Dec-2019
  • (2019) Distributed Data Control. Principles of Distributed Database Systems, pp. 91-127. DOI: 10.1007/978-3-030-26253-2_3. Online publication date: 3-Dec-2019
  • (2019) Distributed and Parallel Database Design. Principles of Distributed Database Systems, pp. 33-89. DOI: 10.1007/978-3-030-26253-2_2. Online publication date: 3-Dec-2019
