DOI: 10.1145/1599410.1599440
Research article

Language support for processing distributed ad hoc data

Published: 07 September 2009

Abstract

This paper presents the design, theory and implementation of Gloves, a domain-specific language that allows users to specify the provenance (the derivation history starting from the origins), syntax and semantic properties of collections of distributed data sources. In particular, Gloves specifications indicate where to locate desired data, how to obtain it, when to get it or to give up trying, and what format it will be in on arrival. The Gloves system compiles such specifications into a suite of data-processing tools including an archiver, a provenance tracking system, a database loading tool, an alert system, an RSS feed generator and a debugging tool. In addition, the system generates description-specific libraries so that developers can create their own applications. Gloves also provides a generic infrastructure so that advanced users can build new tools applicable to any data source with a Gloves description. We show how Gloves may be used to specify data sources from two domains: CoMon, a monitoring system for PlanetLab's 800+ nodes, and Arrakis, a monitoring system for an AT&T web hosting service. We show experimentally that our system can scale to distributed systems the size of CoMon. Finally, we provide a denotational semantics for Gloves and use this semantics to prove two important theorems. The first shows that our denotational semantics respects the typing rules for the language, while the second demonstrates that our system correctly maintains provenance.

Published In

PPDP '09: Proceedings of the 11th ACM SIGPLAN conference on Principles and practice of declarative programming
September 2009
324 pages
ISBN:9781605585680
DOI:10.1145/1599410

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. declarative language
  2. distributed data sources

Conference

PPDP '09
Acceptance Rates

Overall Acceptance Rate 230 of 486 submissions, 47%
