Computer Science > Computation and Language

arXiv:2203.03850 (cs)

[Submitted on 8 Mar 2022]

Title: UniXcoder: Unified Cross-Modal Pre-training for Code Representation

Authors: Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin


Abstract: Pre-trained models for programming languages have recently demonstrated great success on code intelligence. To support both code-related understanding and generation tasks, recent works attempt to pre-train unified encoder-decoder models. However, such an encoder-decoder framework is sub-optimal for auto-regressive tasks, especially code completion, which requires a decoder-only manner for efficient inference. In this paper, we present UniXcoder, a unified cross-modal pre-trained model for programming languages. The model utilizes mask attention matrices with prefix adapters to control the behavior of the model and leverages cross-modal content such as ASTs and code comments to enhance code representation. To encode the AST, which is represented as a tree, in parallel, we propose a one-to-one mapping method that transforms the AST into a sequence structure retaining all structural information from the tree. Furthermore, we propose to utilize multi-modal content to learn representations of code fragments with contrastive learning, and then align representations among programming languages using a cross-modal generation task. We evaluate UniXcoder on five code-related tasks over nine datasets. To further evaluate the performance of code fragment representation, we also construct a dataset for a new task, called zero-shot code-to-code search. Results show that our model achieves state-of-the-art performance on most tasks, and analysis reveals that comments and ASTs can both enhance UniXcoder.
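
Illustrative note (not from the paper): the abstract says mask attention matrices control the behavior of the model. The sketch below shows one common way such masks can be built so that a single Transformer behaves as an encoder, a decoder, or an encoder-decoder. The function name `attention_mask`, the mode strings, and the 0/1 list-of-lists layout are assumptions made for illustration only; they are not taken from the UniXcoder implementation.

```python
def attention_mask(mode, src_len, tgt_len=0):
    """Build an n x n 0/1 matrix; mask[i][j] == 1 means position i may attend to j.

    Hypothetical helper: the mode names and layout are illustrative assumptions.
    """
    n = src_len + tgt_len
    if mode == "encoder-only":
        # Bidirectional: every position sees every other position.
        return [[1] * n for _ in range(n)]
    if mode == "decoder-only":
        # Causal: position i sees only positions j <= i.
        return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
    if mode == "encoder-decoder":
        # Source positions see the whole source; target positions see the
        # whole source plus earlier (and current) target positions.
        return [
            [1 if (j < src_len or (i >= src_len and j <= i)) else 0 for j in range(n)]
            for i in range(n)
        ]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for row in attention_mask("encoder-decoder", src_len=2, tgt_len=2):
        print(row)
```

Under these assumptions, switching the mask is what changes the model's behavior, while the weights stay shared across modes.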

Comments:	Published in ACL 2022
Subjects:	Computation and Language (cs.CL); Programming Languages (cs.PL); Software Engineering (cs.SE)
Cite as:	arXiv:2203.03850 [cs.CL]
	(or arXiv:2203.03850v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2203.03850

Submission history

From: Daya Guo
[v1] Tue, 8 Mar 2022 04:48:07 UTC (404 KB)
