DOI: 10.1145/2506182.2506191
research-article

A practical experience concerning the parallel semantic annotation of a large-scale data collection

Published: 04 September 2013

Abstract

From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. One way to address this cost is to distribute the execution of the annotation algorithm across several computing environments. In this paper, we describe how a large-scale collection of learning objects was semantically annotated. The terms related to each learning object were processed, and the output was an RDF graph computed from the DBpedia database. According to an initial study, a sequential implementation of the annotation algorithm would require more than 1600 CPU-years to process the whole set of learning objects (about 15 million). For this reason, a framework able to integrate a set of heterogeneous computing infrastructures was used to execute a new parallel version of the algorithm. As a result, the problem was solved in 178 days.
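
The figures quoted in the abstract can be put in perspective with a little arithmetic. The sketch below is not taken from the paper: it only combines the three reported numbers (more than 1600 CPU-years, about 15 million learning objects, 178 days) and assumes a 365.25-day year. It also assumes, purely for illustration, that the parallel run performed the same total work as the sequential estimate, which the abstract does not state. Because the sequential cost is given as a lower bound and the collection size is approximate, the results are rough estimates only. The function names are hypothetical.

```python
# Back-of-envelope estimates derived from the figures in the abstract.
# Assumptions (not from the paper): a 365.25-day year, and a parallel run
# that performs the same total work as the sequential estimate.

SECONDS_PER_DAY = 24 * 3600
DAYS_PER_YEAR = 365.25

sequential_cpu_years = 1600      # "more than 1600 CPU-years"
num_learning_objects = 15_000_000  # "about 15 million"
parallel_wall_days = 178         # "solved in 178 days"


def cpu_seconds_per_object(cpu_years: float, n_objects: int) -> float:
    """Average sequential CPU time per learning object, in seconds."""
    return cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY / n_objects


def implied_average_parallelism(cpu_years: float, wall_days: float) -> float:
    """CPU-days of work divided by wall-clock days: the average number of
    cores that would have to be busy for the whole run."""
    return cpu_years * DAYS_PER_YEAR / wall_days


per_object = cpu_seconds_per_object(sequential_cpu_years, num_learning_objects)
parallelism = implied_average_parallelism(sequential_cpu_years, parallel_wall_days)

print(f"CPU time per learning object: about {per_object / 60:.0f} minutes")
print(f"Implied average concurrency:  about {parallelism:.0f} busy cores")
```

Under these assumptions, each learning object costs on the order of an hour of CPU time, and finishing in 178 days corresponds to keeping a few thousand cores busy on average. This is consistent with the need to combine several computing infrastructures.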

Cited By

  • (2020) A Contextual Driven Approach to Risk Event Tagging. On the Move to Meaningful Internet Systems: OTM 2019 Workshops, pages 239-248. DOI: 10.1007/978-3-030-40907-4_26. Online publication date: 12-Feb-2020
  • (2013) Cost Evaluation of Migrating a Computation Intensive Problem from Clusters to Cloud. Proceedings of the 10th International Conference on Economics of Grids, Clouds, Systems, and Services - Volume 8193, pages 90-105. DOI: 10.1007/978-3-319-02414-1_7. Online publication date: 18-Sep-2013

Published In

I-SEMANTICS '13: Proceedings of the 9th International Conference on Semantic Systems
September 2013
158 pages
ISBN: 9781450319720
DOI: 10.1145/2506182
Sponsors

  • St. Pölten University: St. Pölten University of Applied Sciences, Austria
  • HPI: Hasso-Plattner-Institut
  • Compass Verlag: Compass Verlag
  • Wolters Kluwer: Wolters Kluwer, Germany
  • Semantic Web Company: Semantic Web Company
  • TUG: Technical University of Graz

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DBpedia
  2. grid and cluster computing
  3. learning objects
  4. semantic annotation
  5. workflow technologies

Conference

ISEM '13

Acceptance Rates

Overall Acceptance Rate 40 of 182 submissions, 22%
