Abstract
As computational power and storage capacity increase, processing and analyzing large volumes of data play an increasingly important part in many domains of scientific research. Typical examples of large scientific datasets include long-running simulations of time-dependent phenomena that periodically generate snapshots of their state (e.g. hydrodynamics and chemical transport simulation for estimating pollution impact on water bodies [4, 6, 20], magnetohydrodynamics simulation of planetary magnetospheres [32], simulation of a flame sweeping through a volume [28], airplane wake simulations [21]), archives of raw and processed remote-sensing data (e.g. AVHRR [25], Thematic Mapper [17], MODIS [22]), and archives of medical images (e.g. confocal light microscopy, CT imaging, MRI, sonography).
These datasets are usually multi-dimensional. The data dimensions can be spatial coordinates, time, or experimental conditions such as temperature, velocity, or magnetic field. The importance of such datasets has been recognized by several database research groups and vendors, and several systems have been developed for managing and/or visualizing them [2, 7, 14, 19, 26, 27, 29, 31].
These systems, however, focus on lineage management, retrieval and visualization of multi-dimensional datasets. They provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing.
Over the past three years, we have been working with several scientific research groups to understand the processing requirements for such applications [1, 5, 6, 10, 18, 23, 24, 28]. Our study of a large set of applications indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset and the result being computed have underlying multi-dimensional grids, and queries into the dataset take the form of ranges within each dimension of the grid. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid, and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. For example, remote-sensing earth images are often generated by performing atmospheric correction on several days' worth of raw telemetry data, mapping all the data to a latitude-longitude grid, and selecting those measurements that provide the clearest view.
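To make this common structure concrete, the following C++ sketch shows the kind of loop these applications share. The names (`process_range_query`, `transform`, `map_to_grid`, `aggregate`) are illustrative placeholders of ours, not T2 interfaces.

```cpp
#include <cstddef>
#include <vector>

// A minimal sketch of the stylized processing loop described above.
// Type and function names are illustrative placeholders, not T2 interfaces.
template <typename Input, typename Output,
          typename Transform, typename MapToGrid, typename Aggregate>
void process_range_query(const std::vector<Input>& inputs,  // items selected by the range query
                         std::vector<Output>& output_grid,  // one element per output grid point
                         Transform transform,               // per-item transformation
                         MapToGrid map_to_grid,             // transformed item -> output grid index
                         Aggregate aggregate)               // fold the item into its output element
{
    for (const Input& item : inputs) {
        auto transformed = transform(item);
        std::size_t g = map_to_grid(transformed);
        aggregate(output_grid[g], transformed);
    }
}
```

In the remote-sensing example, `transform` would perform atmospheric correction on a raw measurement, `map_to_grid` would bin it into a latitude-longitude cell, and `aggregate` would retain whichever measurement gives the clearest view of that cell.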
In this paper, we present T2, a customizable parallel database that integrates storage, retrieval, and processing of multi-dimensional datasets. T2 provides support for many operations, including index generation, data retrieval, memory management, scheduling of processing across a parallel machine, and user interaction. Its primary advantage comes from its ability to seamlessly integrate data retrieval and processing for a wide variety of applications, and to maintain and process multiple datasets with different underlying grids. Most other systems for multi-dimensional data have focused on uniformly distributed datasets, such as images, maps, and dense multi-dimensional arrays. Many real datasets, however, are non-uniform or unstructured. For example, satellite data forms a two-dimensional strip embedded in a three-dimensional space, and water contamination studies use unstructured meshes to simulate only selected regions. T2 can handle both uniform and non-uniform datasets.
T2 has been developed as a set of modular services. Since its structure mirrors that of a wide variety of applications, T2 is easy to customize for different types of processing. To build a version of T2 customized for a particular application, a user has to provide functions to pre-process the input data, map input data to elements in the output data, and aggregate multiple input data items that map to the same output element.
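As a rough illustration only, the three kinds of user-supplied functions might take a shape like the following; the type and method names are hypothetical, and the concrete customization interfaces are described in Sections 4 and 5.

```cpp
#include <cstddef>
#include <cstdint>

// Placeholder element types; in a real customization these would come from
// the application (e.g. raw telemetry records and latitude-longitude pixels).
struct RawItem    { std::uint8_t bytes[64]; };   // an item as stored on disk
struct InputItem  { double value; };             // an item after pre-processing
struct OutputItem { double best; };              // the running value of one output element
using  GridIndex  = std::size_t;                 // index of an output grid element

// Hypothetical shape of the three application-supplied functions; the actual
// T2 customization interfaces are presented later in the paper.
struct Customization {
    virtual InputItem preprocess(const RawItem& raw) const = 0;   // decode/correct an input item
    virtual GridIndex map(const InputItem& item) const = 0;       // locate the output element it contributes to
    virtual void aggregate(OutputItem& out,
                           const InputItem& item) const = 0;      // combine the item into that element
    virtual ~Customization() = default;
};
```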
T2 presents a uniform interface to the end users (the clients of the database system). Users specify the dataset(s) of interest, a region of interest within the dataset(s), and the desired format and resolution of the output. In addition, they select the mapping and aggregation functions to be used. T2 analyzes the user request, builds a suitable plan to retrieve and process the datasets, executes the plan and presents the results in the desired format.
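Again as a sketch rather than the actual client interface, a request of this kind might carry information along these lines; all field names are ours.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the information a client request carries; the field
// names do not reflect T2's actual client interface.
struct QueryRequest {
    std::vector<std::string> datasets;                  // dataset(s) of interest
    std::vector<std::pair<double, double>> region;      // a [low, high] range per grid dimension
    std::vector<std::size_t> output_resolution;         // desired output grid size in each dimension
    std::string output_format;                          // desired format of the result
    std::string mapping_function;                       // which registered mapping function to apply
    std::string aggregation_function;                   // which registered aggregation function to apply
};
```

For the remote-sensing example, such a request would name the raw telemetry dataset, give latitude-longitude and time ranges, ask for an image at a chosen resolution, and select the correction and clearest-view compositing functions.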
In Section 2 we first present several motivating applications and illustrate their common structure. Section 3 then presents an overview of T2, including its distinguishing features and a running example. Section 4 describes each database service in some detail. An example of how to customize several of the database services for a particular application is given in Section 5. T2 is a system in evolution. We conclude in Section 6 with a description of the current status of both the T2 design and the implementation of various applications with T2.