Computer Science > Computation and Language

arXiv:2207.05564 (cs)

[Submitted on 12 Jul 2022 (v1), last revised 18 Sep 2023 (this version, v4)]

Title: The expected sum of edge lengths in planar linearizations of trees. Theory and applications

Authors:Lluís Alemany-Puig, Ramon Ferrer-i-Cancho

Abstract: Dependency trees have proven to be a very successful model for representing the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), which can be computed in $O(n)$ time. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as a random baseline.
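
The definitions in the abstract can be illustrated with a minimal sketch. It is a brute-force enumeration over all orderings, usable only for very small trees, and it is not the $O(n)$-time algorithm of the paper. All function names (`edge_length_sum`, `is_planar`, `is_projective`, `expected_sum`) are hypothetical and chosen for illustration. Vertices are assumed to be labelled 0..n-1, and the length of an edge is assumed to be the absolute difference of the positions of its endpoints.

```python
from fractions import Fraction
from itertools import permutations


def edge_length_sum(edges, pos):
    """Sum of |pos[u] - pos[v]| over all edges (pos maps vertex -> position)."""
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def is_planar(edges, pos):
    """True if no two edges cross when drawn above the ordering."""
    spans = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
    for i, (a, b) in enumerate(spans):
        for c, d in spans[i + 1:]:
            # Two edges cross when their endpoints strictly interleave.
            if a < c < b < d or c < a < d < b:
                return False
    return True


def is_projective(edges, pos, root):
    """Planar and, in addition, the root is not covered by any edge."""
    r = pos[root]
    covered = any(min(pos[u], pos[v]) < r < max(pos[u], pos[v])
                  for u, v in edges)
    return is_planar(edges, pos) and not covered


def expected_sum(edges, n, root, constraint="none"):
    """Exact average of the edge length sum over all orderings that
    satisfy the constraint ("none", "planar" or "projective")."""
    total, count = 0, 0
    for order in permutations(range(n)):
        pos = {v: i for i, v in enumerate(order)}
        if constraint == "planar" and not is_planar(edges, pos):
            continue
        if constraint == "projective" and not is_projective(edges, pos, root):
            continue
        total += edge_length_sum(edges, pos)
        count += 1
    return Fraction(total, count)


if __name__ == "__main__":
    # Path 0 - 1 - 2 rooted at vertex 0 (an illustrative toy tree).
    path = [(0, 1), (1, 2)]
    for c in ("none", "planar", "projective"):
        print(c, expected_sum(path, 3, root=0, constraint=c))
```

For this three-vertex path no ordering has crossing edges, so the unconstrained and planar averages coincide at 8/3, while the projective average over the four orderings that leave the root uncovered is 5/2.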

Comments:	New version updated
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2207.05564 [cs.CL]
	(or arXiv:2207.05564v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2207.05564
Journal reference:	Journal of Language Modelling, 2024, 12(1), 1--42
Related DOI:	https://doi.org/10.15398/jlm.v12i1.362

Submission history

From: Lluís Alemany-Puig
[v1] Tue, 12 Jul 2022 14:35:07 UTC (53 KB)
[v2] Fri, 15 Jul 2022 14:51:13 UTC (53 KB)
[v3] Thu, 29 Jun 2023 13:56:02 UTC (101 KB)
[v4] Mon, 18 Sep 2023 07:50:43 UTC (87 KB)
