DOI: 10.5555/2856151.2856152
research-article

An adaptive and distributed framework for advanced IR

Published: 12 April 2000

Abstract

It has often been noted that modern IR (Gregory, 1991; Alan, 1991) should exhibit capabilities that are sensitive to document content, should integrate interactivity, multimodality and multilinguality on a large scale, and should support the highly dynamic nature of current information-access needs (so as to adapt to changes in sources, language and content/style). This paper discusses the architectural design of TREVI (Text Retrieval and Enrichment for Vital Information, ESPRIT project EP23311), a distributed, object-oriented Java/CORBA system for NLP-driven news classification, enrichment and delivery. TREVI's advanced features include the extensive use of a well-defined model (Mazzucchelli, 1999), based on a typed mechanism for static/dynamic control of the distributed process and on a principled representation of linguistic types as computational OO data structures, and the adaptivity of the linguistic processors it employs (namely, the robust, lexically driven parser). The original aspect of TREVI is its novel combination of a systematic design approach with advanced, adaptive NLP processors for content-driven text classification. A complete toolkit was developed for the operational scenarios of three different users (i.e., three different information providers) in two languages (English and Spanish); its good performance (classification accuracy and usability) is basic evidence of the success of this approach.
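The abstract mentions a typed mechanism for static and dynamic control of the distributed process, together with a representation of linguistic types as OO data structures. The following is a minimal, in-process sketch of that general idea under stated assumptions. It is not TREVI's actual model. The classes `RawDocument`, `TokenizedDocument`, `ClassifiedDocument`, `Processor` and `TypedPipeline`, as well as the toy keyword classifier, are hypothetical illustrations, and no Java, CORBA or network distribution is involved. In this sketch, "static" control means checking, when the pipeline is built, that each step's declared output type matches the next step's declared input type. "Dynamic" control means checking every object that is actually exchanged at run time.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Type


# Hypothetical linguistic types, each mapped onto a plain OO data structure.
@dataclass
class RawDocument:
    text: str


@dataclass
class TokenizedDocument:
    tokens: List[str]


@dataclass
class ClassifiedDocument:
    tokens: List[str]
    categories: List[str] = field(default_factory=list)


@dataclass
class Processor:
    """A processing step that declares the linguistic type it consumes and produces."""
    name: str
    input_type: Type
    output_type: Type
    run: Callable[[object], object]


class TypedPipeline:
    def __init__(self, steps: List[Processor]):
        # "Static" control: reject the configuration before any document flows
        # if one step's declared output type is not the next step's input type.
        for prev, nxt in zip(steps, steps[1:]):
            if prev.output_type is not nxt.input_type:
                raise TypeError(
                    f"{prev.name} produces {prev.output_type.__name__}, "
                    f"but {nxt.name} expects {nxt.input_type.__name__}"
                )
        self.steps = steps

    def process(self, obj):
        for step in self.steps:
            # "Dynamic" control: check each object actually exchanged at run time.
            if not isinstance(obj, step.input_type):
                raise TypeError(f"{step.name} received {type(obj).__name__}")
            obj = step.run(obj)
            if not isinstance(obj, step.output_type):
                raise TypeError(f"{step.name} returned {type(obj).__name__}")
        return obj


# Toy processors; the keyword rule is a placeholder, not TREVI's classifier.
tokenizer = Processor(
    "tokenizer", RawDocument, TokenizedDocument,
    lambda d: TokenizedDocument(d.text.lower().split()),
)
classifier = Processor(
    "classifier", TokenizedDocument, ClassifiedDocument,
    lambda d: ClassifiedDocument(
        d.tokens, ["finance"] if "market" in d.tokens else ["general"]
    ),
)

pipeline = TypedPipeline([tokenizer, classifier])
result = pipeline.process(RawDocument("Stock market rallies"))
print(result.categories)  # ['finance']
```

If the steps are reordered, for example with `TypedPipeline([classifier, tokenizer])`, construction fails before any text is processed. If the correctly built pipeline is passed an already tokenized document, it fails at run time. The two separate checks reflect the static/dynamic distinction drawn in the abstract. How TREVI realises this distinction across its distributed components is not shown here.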

References

[1]
Abney S. (1996). Part-of-speech tagging and partial parsing. In K. Church, S. Young, and G. Bloothooft, editors, Corpus-Based Methods in Language and Speech. Kluwer Academic Publishers, Dordrecht.
[2]
Aït-Mokhtar S. and Chanod J.-P. (1997). Incremental finite-state parsing. In Proceedings of ANLP97, Washington.
[3]
Alan S. (1991). In Pazienza M. editor, Information Extraction - A Multidisciplinary Approach to an Emerging Information Technology. Springer Verlag, Berlin.
[4]
Basili R., Cucchiarelli A., Consoli C., Pazienza M., and Velardi P. (1998a). Automatic adaptation of WordNet to sublanguages and to computational tasks.
[5]
Basili R., Della Rocca M., and Pazienza M. (1996). Contextual word sense tuning and disambiguation. Journal of Applied Artificial Intelligence.
[6]
Basili R., Di Nanni M., and Pazienza M. T. (1999a). Representing document content via an object-oriented paradigm. In Z. W. Raś and A. Skowron, editors, Foundations of Intelligent Systems, 11th International Symposium on Methodologies for Intelligent Systems, Warsaw, Poland, June 8-11, 1999, LNAI 1609. Springer-Verlag.
[7]
Basili R., Marziali A., and Pazienza M. T. (1994). Modelling syntactic uncertainty in lexical acquisition from texts. Journal of Quantitative Linguistics, 1.
[8]
Basili R., Moschitti A., and Pazienza M. T. (2000a). Language sensitive text classification. In Proceedings of RIAO '2000, Paris, Fr.
[9]
Basili R., Di Nanni M., Mazzucchelli L., Marabello M., and Pazienza M. (August 1998b). NLP for text classification: the TREVI experience. In Proceedings of the Second International Conference on Natural Language Processing and Industrial Applications, Université de Moncton, New Brunswick (Canada).
[10]
Basili R., Pazienza M. T., and Velardi P. (1992). A shallow syntactic analyser to extract word association from corpora. Literary and Linguistic Computing, 7:114-124.
[11]
Basili R., Pazienza M. T., and Vindigni M. (1997). Corpus-driven unsupervised learning of verb subcategorization frames. Number 1321 in LNAI, Heidelberg, Germany. Springer-Verlag.
[12]
Basili R., Pazienza M. T., and Zanzotto F. M. (1998c). Efficient parsing for information extraction. In Proc. of the ECAI98, Brighton, UK.
[13]
Basili R., Pazienza M. T., and Zanzotto F. M. (1999b). Lexicalizing a shallow parser. In Proc. of TALN 99, Le Traitement Automatique des Langues Naturelles, Cargèse, Corsica.
[14]
Basili R., Pazienza M. T., and Zanzotto F. M. (2000b). Customizable modular lexicalized parsing. In Proc. of the ECAI98, Brighton, UK.
[15]
Basili R., Catizone R., Pazienza M. T., Stevenson M., Velardi P., Vindigni M., and Wilks Y. (26 May 1998d). An empirical approach to lexical tuning. In Proceedings of the Workshop "Adapting Lexical and Corpus Resources to Sublanguages and Applications", LREC First International Conference on Language Resources and Evaluation, Granada, Spain.
[16]
Carroll J. and Briscoe T., editors (1996). Proceedings of the Workshop on Robust Parsing, held jointly with ESSLLI96, Prague, Czech Republic.
[17]
Cunningham H., Humphreys K., Gaizauskas R., and Wilks Y. (March-April 1997). Software infrastructure for natural language processing. In Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington, DC, USA. Morgan Kaufmann Publishers.
[18]
L. et al. (1999). Specifications of the overall toolkit architecture. EP 23311 TREVI Project Deliverable 7D1.
[19]
Gregory G. (1991). Short query linguistic expansion techniques: Palliating one-word queries by providing intermediate structures to text. In Pazienza M., editor, Information Extraction - A Multidisciplinary Approach to an Emerging Information Technology. Springer Verlag, Berlin.
[20]
Grinberg D., Lafferty J., and Sleator D. (1996). A robust parsing algorithm for link grammar. In 4th International Workshop on Parsing Technologies, Prague.
[21]
Mazzucchelli L. (1999). A model for Java/CORBA & OODBMS distributed architectures. In DOA '99, International Symposium on Distributed Objects and Applications. IEEE Press.
[22]
MUC-6 (1995). Proceedings of the Sixth Message Understanding Conference (MUC-6), Columbia, MD. Morgan Kaufmann.
[23]
OMG (1995). The common object request broker: Architecture and specification (version 2.0). Technical Document PTC/96-03-0.
[24]
Pazienza M. editor (1997). Information Extraction - A Multidisciplinary Approach to an Emerging Information Technology. Springer Verlag, Berlin.
[25]
Pazienza M. editor (1999). Information Extraction - Towards Scalable, Adaptive Systems. Springer Verlag, Berlin.
[26]
Pollard C. and Sag I. (1994). Head-Driven Phrase Structure Grammar. University of Chicago Press and CSLI Publications, Chicago and Stanford.
[27]
Orfali R. and Harkey D. (1998). Client/Server Programming with Java and CORBA. Addison-Wesley, second edition.
[28]
Sanz I. and Mazzucchelli L. (1999). Distributed objects in a large scale text processing system. In DOA '99, International Symposium on Distributed Objects and Applications. IEEE Press.
[29]
TREVI (1997). Text Retrieval and Enrichment for Vital Information, European ESPRIT research project EP 23311 (http://trevi.itaca.it).
[30]
Yang Z. and Duddy K. (1999). CORBA: A platform for distributed object computing (a state-of-the-art report on OMG/CORBA).
[31]
Yarowsky D. (1992). Word sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proc. of COLING-92, Nantes, France.
[32]
Zajac R., Casper M., and Sharples N. (March-April 1997). An open distributed architecture for reuse and integration of heterogeneous NLP components. In Booch G., editor, Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington, DC, USA. Addison-Wesley Object Technology Series.

Published In

RIAO '00: Content-Based Multimedia Information Access - Volume 2
April 2000
859 pages

Publisher

Le Centre de Hautes Études Internationales d'Informatique Documentaire

Paris, France
