DOI: 10.1145/3447548.3470791

Language Scaling: Applications, Challenges and Approaches

Published: 14 August 2021

Abstract

Language scaling aims to deploy Natural Language Processing (NLP) applications economically across many countries and regions with different languages. Industry has invested heavily in language scaling, since many parties want to bring their applications and services to global markets. At the same time, scaling out NLP applications to many languages, essentially a data science problem, remains a grand challenge due to the large differences in morphology, syntax, and pragmatics among languages. We present a comprehensive survey and tutorial on language scaling. We start with a clear problem statement for language scaling and an intuitive discussion of the overall challenges. We then outline the two major categories of approaches to language scaling, namely model transfer and data transfer, and present a taxonomy summarizing the various methods in the literature. A large part of the tutorial is organized around the different types of NLP applications. Finally, we discuss several important open challenges in this area and future directions.
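The model transfer / data transfer distinction in the abstract can be made concrete with a small example. Below is a minimal sketch of the model-transfer route, assuming the Hugging Face transformers and torch libraries; the model name, the task (sentence classification), and the toy training examples are illustrative assumptions, not taken from the tutorial itself.

```python
# Model transfer sketch: fine-tune a multilingual pretrained encoder on
# labeled source-language (English) data, then apply it zero-shot to a
# target language with no labeled data in that language.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "bert-base-multilingual-cased"  # any multilingual encoder would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy source-language training data (1 = positive, 0 = negative sentiment).
english_train = [
    ("great product, works perfectly", 1),
    ("terrible service, would not recommend", 0),
]

model.train()
for text, label in english_train:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot transfer: classify a German sentence even though the model never
# saw labeled German data; the shared multilingual representation carries the
# task knowledge across languages.
model.eval()
with torch.no_grad():
    batch = tokenizer("schrecklicher Service, nicht zu empfehlen",
                      return_tensors="pt")
    prediction = model(**batch).logits.argmax(dim=-1).item()
print(prediction)  # 0 = negative, 1 = positive
```

The data-transfer alternative would instead machine-translate the English training set into the target language (or translate target-language inputs into English at inference time) and train or predict on the translated text, trading translation noise for in-language supervision.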


Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. cross-lingual
  2. natural language processing

Qualifiers

  • Abstract

Conference

KDD '21

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

