arXiv:2301.06841 (cs)

[Submitted on 17 Jan 2023]

Title: Syntactically Robust Training on Partially-Observed Data for Open Information Extraction

Authors: Ji Qi, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu

Abstract: Open Information Extraction models have shown promising results with sufficient supervision. However, these models face a fundamental challenge: the syntactic distribution of the training data is only partially observable in comparison to the real world. In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution based on diverse paraphrase generation. To tackle the intrinsic problem of knowledge deformation in paraphrasing, two algorithms based on semantic similarity matching and syntactic tree walking are used to restore the expressionally transformed knowledge. The training framework can be generally applied to other syntactically partially observable domains. Based on the proposed framework, we build a new evaluation set called CaRB-AutoPara, a syntactically diverse dataset consistent with the real-world setting for validating the robustness of the models. Experiments including a thorough analysis show that the performance of the model degrades as the difference in syntactic distribution increases, while our framework gives a robust boundary. The source code is publicly available at this https URL.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2301.06841 [cs.CL]
	(or arXiv:2301.06841v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2301.06841

Submission history

From: Ji Qi
[v1] Tue, 17 Jan 2023 12:39:13 UTC (1,129 KB)
