DOI: 10.1145/1938551.1938556
research-article

The PADS project: an overview

Published: 21 March 2011

Abstract

The goal of the PADS project, which started in 2001, is to make it easier for data analysts to extract useful information from ad hoc data files. This paper does not report new results; rather, it gives an overview of the project and of how it helps bridge the gap between the unmanaged world of ad hoc data and the managed world of typed programming languages and databases. In particular, the paper reviews the design of the PADS data description languages, describes the generated parsing tools, and discusses the importance of meta-data. It also sketches the formal semantics, discusses useful tools and how they can be generated automatically from PADS descriptions, and describes an inferencing system that can learn useful PADS descriptions from positive examples of the data format.
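The core idea in the abstract, a declarative description of an ad hoc format from which a parser producing typed data is derived, can be illustrated with a small Python sketch. This is not PADS syntax, nor the interface of the parsers PADS generates: the three-field record layout and the names `ParseDescriptor`, `Entry`, `DESCRIPTION` and `parse_record` are hypothetical. Loosely following the PADS design as we understand it, each parsed record is paired with a parse descriptor that counts and records errors instead of aborting, so one malformed line does not stop processing of the rest of a file.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical record layout (an assumption for illustration, not a PADS description):
#   <ip> <status> <bytes>        e.g. "10.0.0.1 200 512"
# where <bytes> may be "-" when no body was sent.

IP_RE = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")
DIGITS_RE = re.compile(r"[0-9]+")


@dataclass
class ParseDescriptor:
    """Collects problems found in one record instead of raising an exception."""
    nerr: int = 0
    errors: List[str] = field(default_factory=list)

    def note(self, msg: str) -> None:
        self.nerr += 1
        self.errors.append(msg)


@dataclass
class Entry:
    """Typed in-memory representation of one record; None marks a missing or bad field."""
    ip: Optional[str]
    status: Optional[int]
    nbytes: Optional[int]


def parse_ip(tok: str, pd: ParseDescriptor) -> Optional[str]:
    if IP_RE.fullmatch(tok):
        return tok
    pd.note(f"bad ip: {tok!r}")
    return None


def parse_status(tok: str, pd: ParseDescriptor) -> Optional[int]:
    # A semantic constraint on top of the syntax: status codes must lie in 100..599.
    if DIGITS_RE.fullmatch(tok) and 100 <= int(tok) <= 599:
        return int(tok)
    pd.note(f"bad status: {tok!r}")
    return None


def parse_bytes(tok: str, pd: ParseDescriptor) -> Optional[int]:
    if tok == "-":
        return None  # a legal "no value" marker, so no error is recorded
    if DIGITS_RE.fullmatch(tok):
        return int(tok)
    pd.note(f"bad byte count: {tok!r}")
    return None


# The "description": field names paired with field parsers, in order.
FieldParser = Callable[[str, ParseDescriptor], object]
DESCRIPTION: List[Tuple[str, FieldParser]] = [
    ("ip", parse_ip),
    ("status", parse_status),
    ("nbytes", parse_bytes),
]


def parse_record(line: str) -> Tuple[Entry, ParseDescriptor]:
    """Parse one line against DESCRIPTION, returning (representation, descriptor)."""
    pd = ParseDescriptor()
    toks = line.split()
    if len(toks) != len(DESCRIPTION):
        pd.note(f"expected {len(DESCRIPTION)} fields, got {len(toks)}")
    values = {}
    for i, (name, parser) in enumerate(DESCRIPTION):
        # Missing trailing fields become None; the count mismatch is already noted.
        values[name] = parser(toks[i], pd) if i < len(toks) else None
    return Entry(**values), pd


if __name__ == "__main__":
    for line in ["10.0.0.1 200 512", "10.0.0.2 abc -", "garbage"]:
        entry, pd = parse_record(line)
        print(entry, pd.nerr, pd.errors)
```

Keeping the description as data (an ordered list of field parsers) rather than burying it in control flow is what makes it plausible to derive other tools from the same description, which is the kind of automatic tool generation the abstract refers to; in PADS itself the parser is generated from the description rather than written by hand as it is here.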

Published In

ACM Other conferences
ICDT '11: Proceedings of the 14th International Conference on Database Theory
March 2011
285 pages
ISBN:9781450305297
DOI:10.1145/1938551
  • Program Chair: Tova Milo

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ad hoc data
  2. data description languages
  3. domain-specific languages

Qualifiers

  • Research-article

Conference

EDBT/ICDT '11
EDBT/ICDT '11: EDBT/ICDT '11 joint conference
March 21 - 24, 2011
Uppsala, Sweden

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 12
  • Downloads (last 6 weeks): 2
Reflects downloads up to 18 Jan 2025

Cited By

  • (2024) Research Report: An Optim (l) Approach to Parsing Random-Access Formats. 2024 IEEE Security and Privacy Workshops (SPW), pp. 192-199. DOI: 10.1109/SPW63631.2024.00023. Online publication date: 23-May-2024.
  • (2023) Saggitarius: A DSL for Specifying Grammatical Domains. Proceedings of the ACM on Programming Languages 7(OOPSLA2), pp. 2023-2051. DOI: 10.1145/3622869. Online publication date: 16-Oct-2023.
  • (2023) Dargent: A Silver Bullet for Verified Data Layout Refinement. Proceedings of the ACM on Programming Languages 7(POPL), pp. 1369-1395. DOI: 10.1145/3571240. Online publication date: 11-Jan-2023.
  • (2022) Hardening attack surfaces with formally proven binary format parsers. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 31-45. DOI: 10.1145/3519939.3523708. Online publication date: 9-Jun-2022.
  • (2021) Enabling Big Data Analytics and AI Solutions for Smart Warehouse. Intelligent and Fuzzy Techniques for Emerging Conditions and Digital Transformation, pp. 929-936. DOI: 10.1007/978-3-030-85626-7_108. Online publication date: 24-Aug-2021.
  • (2020) Structure interpretation of text formats. Proceedings of the ACM on Programming Languages 4(OOPSLA), pp. 1-29. DOI: 10.1145/3428280. Online publication date: 13-Nov-2020.
  • (2020) Spiffy. ACM Transactions on Storage 16(3), pp. 1-39. DOI: 10.1145/3386368. Online publication date: 4-Aug-2020.
  • (2020) Research Report: ICARUS: Understanding De Facto Formats by Way of Feathers and Wax. 2020 IEEE Security and Privacy Workshops (SPW), pp. 327-334. DOI: 10.1109/SPW50608.2020.00067. Online publication date: May-2020.
  • (2019) Data lake management. Proceedings of the VLDB Endowment 12(12), pp. 1986-1989. DOI: 10.14778/3352063.3352116. Online publication date: 1-Aug-2019.
  • (2019) TxForest: A DSL for Concurrent Filestores. Programming Languages and Systems, pp. 332-354. DOI: 10.1007/978-3-030-34175-6_17. Online publication date: 18-Nov-2019.