Keywords

1 Introduction

Locally Testable sets of strings in the strict sense (Strictly Local, SL) are a subclass of the regular languages with interesting properties [16, 20]. Rogers [18] presents a generalization of SL to sets of trees and shows they characterize the derivations of context-free languages. Chandlee et al. [2, 3] generalize SL formal languages in another direction. They present classes of strictly local string-to-string functions. In this paper, we generalize the SL class to a class of functions over trees. In particular, we present a characterization in terms of frontier-to-root, deterministic, linear tree transducers [5, 7].

One motivation comes from computational and theoretical linguistics, where the goal of one program is to identify and understand the minimally powerful classes of formal grammars which can describe aspects of natural language [4]. To this end, subregular sets and functions over strings have been used to distinguish and characterize phonological generalizations [11]. More recent research has begun studying natural language syntax from the perspective of subregular sets and functions over trees, as opposed to strings [9, 10].

One rationale for studying subclasses of regular string/tree sets and relations is that it is known that finite-state methods are sufficient to describe aspects of natural language. For phonology and morphology, finite-state methods over strings appear sufficient [1, 17]. For syntax, finite-state methods over trees similarly appear sufficient. Rogers [19] showed that a syntactic theory of English can be understood in terms of Monadic Second Order (MSO) definable constraints over trees. Languages with more complex constructions can be understood in terms of regular tree languages undergoing regular tree transductions [8, 14]. Tree transducers also have found broad application in machine translation [13, 15]. It remains an open question, however, whether the full power of regular computations are necessary [11].

Another rationale for identifying subregular classes of languages is that learning problems may be easier to solve in the sense of requiring less and time and resources than otherwise [12].

By defining and characterizing the Input Strictly Local class of tree transducers, we hope to take a first step in developing a more fine-grained perspective on the syntactic transformations present in natural languages. The structure of the paper is as follows.

Section 2 defines trees and associated properties and functions based on their recursive structure. In this way we follow the tree transducer literature [5, 7]. However, we note that we do not adopt the convention of ranked alphabets. Instead we obtain their effects by bounding the largest number of children a tree in some tree set can have and by requiring that the pre-image of the transition function of the tree automata is finite. While this is unconventional, we believe it simplifies our presentation and proofs. Section 2 also reviews strictly local treesets and reviews the proof of the abstract characterization of them [18].

Section 3 presents the main theoretical results. Deterministic, frontier-to-root, finite-state, linear tree transducers (abbreviated DFT) are defined, Input Strictly Local (ISL) tree functions are defined abstractly and then characterized in terms DFTs. Section 4 concludes.

2 Preliminaries

Assume a finite alphabet \(\varSigma \) and let \(\varSigma ^*\) denote the set of all strings of finite length that can be obtained via concatenation of the elements of \(\varSigma \). We denote the empty string with \(\lambda \).

Consider an alphabet \(\varSigma \) and symbols [ ] which do not belong to it. A tree is defined inductively as follows:

  • Base Case: For each \(a\in \varSigma \), \(a[]\) is a tree. The tree \(a[]\) is also called a leaf. We also write a[\(\lambda \)] for \(a[]\).

  • Inductive Case: If a \(\in \varSigma \) and \(t_1 t_2 \dots t_n\) is a string of trees of length n (n \(\ge \) 1), then a[\(t_1 t_2 \dots t_n\)] is a tree.

For a trees \(t=a[t_1 t_2\dots t_n]\), the trees \(t_1,t_2,\ldots t_n\) are the children of t and \(t_i\) denotes the ith child. \(\varSigma ^T\) denotes the set of all trees of finite size from \(\varSigma \).

The depth, size, yield, root, branch, and the set of subtrees of a tree t, written \(\mathtt {dp}(t)\), |t|, \(\mathtt {yld}(t)\), \(\mathtt {root}(t)\), \(\mathtt {branch}(t)\) and \(\mathtt {sub}(t)\), respectively, are defined as follows. For all \(a \in \varSigma \):

  • If \(t = a[]\), then \(\mathtt {dp}(t) = 1\), |t| = 1, \(\mathtt {yld}(t) = a\), \(\mathtt {root}(t) = a\), \(\mathtt {branch}(t)=0\), and \(\mathtt {sub}(t) = \{t\}\).

  • If \(t = a[t_1 t_2 \dots t_n]\) then \(\mathtt {dp}(t) = \mathtt {max}\{\mathtt {dp}(t_i) | 1 \le i \le n\} + 1 \), and \(|t| = 1 + \sum _{i=1}^n |t_i|\), and \(\mathtt {yld}(t) = \mathtt {yld}(t_1) \mathtt {yld}(t_2) \dots \mathtt {yld}(t_n)\), and \(\mathtt {root}(t) = a\), and \(\mathtt {branch}(t) = n\), and \(\mathtt {sub}(t) = \bigcup \{\mathtt {sub}(t_i) | 1 \le i \le n\} \cup \{t\}\).

The roots of the subtrees of a tree t are called nodes. The root of a tree is also called its root node. Leaves are also called frontier nodes.

The branching degree of a tree t is \(\mathtt {branch\_degree}(t) = \mathtt {max}\{\mathtt {branch}(u)\mid u\in \mathtt {sub}(t)\}\). Let \(\varSigma _n^T\) denotes the set of trees \(\{ t\in \varSigma ^T\mid \mathtt {branch\_degree}(t) \le n\} \).

Example 1

Suppose \(\varSigma \) = {S, a, b}. \(S\,[\,a~ S\,[\,a~b\,]~b\,]\) denotes a tree rooted in S with \(\mathtt {branch\_degree}\) of 3.

Let \(N^*\) be the set of all sequences of finite length of positive natural numbers. For \(\vec {n} = \langle n_1,n_2,\dots ,n_m\rangle \) \(\in N^*\) (m \(\ge \) 1), the subtree of t at is written , and it is defined inductively:

  • Base Case: iff \(\vec {n}=\lambda \).

  • Inductive Case: Suppose \(t = a[t_1t_2\dots t_n]\) and \(\vec {n}\ne \lambda \). Then .

  • Note: is undefined otherwise.

These sequences are the Gorn addresses of the subtrees of t. For example, The first child of t is given by (if it exists); the second child by (if it exists); the second child of the first child by (if it exists); and of course .

The Gorn addresses provide a natural ordering of the subtrees of t in terms of the length-lexicographic ordering. For distinct \(\vec {n}=\langle n_1,n_2,\dots ,n_k\rangle , \vec {m}=\langle m_1,m_2,\dots ,m_\ell \rangle \), precedes iff either \(k<\ell \), or \(k=\ell \) and \(n_1< m_1\), or \(k=\ell \) and \(n_1 = m_1\) and \(\langle n_2,\dots ,n_k\rangle < \langle m_2,\dots ,m_\ell \rangle \). This essentially orders subtrees of t such that the ones closer to the root of t are ordered earlier, and those ‘on the same level’ in t are ordered ‘left to right.’ We make use of this ordering in our proof of Theorem 1.

The largest common subtrees of a set of trees T, denoted \(\mathtt {lcs}(T)\), is \(\{ d\in \bigcap _{t \in T} \mathtt {sub}(t) \mid \forall d\,'\in \bigcap _{t \in T} \mathtt {sub}(t)\), \(|d'| \le |d| \}\).

The k-stem (\(k \ge 1\)) of a tree t, written \(\mathtt {stem}_k\)(t), is defined as follows.

  • Base Case: For all \(a \in \varSigma \), if \(t = a[]\), then \(\mathtt {stem}_k(t) = a[]\).

  • Inductive Case: For all a \(\in \varSigma \), if t = a[\(t_1 t_2 \dots t_n\)], then

    • \(\mathtt {stem}_1(t) = \mathtt {root}(t)[]\), and

    • \(\mathtt {stem}_k(t) = a[\mathtt {stem}_{k-1}(t_1)\mathtt {stem}_{k-1}(t_2)\dots \mathtt {stem}_{k-1}(t_n)]\).

The stems of a tree t, denoted \(\mathtt {stem}(t)\) is the set \(\{\mathtt {stem}_k(t)\mid k\ge 1\}\).

Example 2

The 2-stems of the tree in Example 1 is \(\{S\,[\,a~S~b\,], S\,[\,a~b\,], a[], b[]\}\).

It is useful to incorporate boundary markers into the roots and leaves of trees. Informally, given a \(\varSigma \)-tree t, boundary markers are added above the root and below the leaves. Formally, we employ symbols \(\rtimes ,\ltimes \not \in \varSigma \) for this purpose. We let \(\hat{\varSigma } = \varSigma \cup \{\rtimes ,\ltimes \}\).

Thus for all \(a\in \varSigma \), \(t\in \varSigma ^T\), let \(\mathtt {add\_}\!\!\rtimes (t) = \rtimes [\, t\, ]\), and \(\mathtt {add\_}\!\!\ltimes (a[]) = a [ \ltimes [] ]\), and \(\mathtt {add\_}\!\!\ltimes ( a [ t_1 \dots t_n ] ) = a [ \ltimes (t_1) \dots \ltimes (t_n) ]\). Then for any \(\varSigma \)-tree t, its augmented counterpart \(\hat{t} = \mathtt {add\_}\!\!\rtimes (\mathtt {add\_}\!\!\ltimes (t))\).

The k-factors of a tree t are defined as the set of k-depth stems of subtrees of \(\hat{t}\). For all \(t\in \varSigma ^T\), let \(F_k(t) = \bigcup \{\mathtt {stem}_k(u) \mid u \in \mathtt {sub}(\,\hat{t}\,)\}\).

We lift the definition of k-factors to treesets in the natural way. For all T \(\subseteq \varSigma ^T\), \(F_k(T) = \bigcup _{t \in T} F_k(t)\).

Example 3

The 2-factors of the tree in Example 1 is the set {\(\rtimes [\,S\,[]\,]\), \(S\,[\,a\,[] \,S\,[]\,b\,[]\,]\), \(S\,[\,a\,[]\,b\,[]\,]\), \(a\,[\,\ltimes []\,]\), \(b\,[\,\ltimes []\,], \ltimes []\)}.

A strictly k-local grammar \(G = (\varSigma ,S)\) where S is a finite subset of \(F_k(\varSigma ^T)\) and the tree language of G is defined as: \(\mathbb {L} ((\varSigma ,S)) = \{t \mid F_k(t)\subseteq S\}\).

Note that since S is finite, there exists a smallest number n such that \(S\subseteq \hat{\varSigma }^T_n\). It follows that \(\mathbb {L} ((\varSigma ,S))\) is of branching degree n. A treeset \(T\subseteq \varSigma ^T\) is strictly k-local if there exists a k and a strictly k-local grammar G such that \(\mathbb {L}(G) = T\). Such treesets form exactly strictly k-local treesets (\(SL_k\)). Strictly local stringsets are a special case of strictly local treesets where all the branching degree is 1; so every node (except leaves) are unary branching.

Strictly 2-local treesets have been called local treesets in previous literature [18]. Every Strictly 2-local tree language can be generated by a context free grammar [7, 18].

Comparable to the characterization of strictly local string sets, which is Suffix Substitution Closure [20], each strictly 2-local tree language satisfies Subtree Substitution Closure[18]. To explain this characterization, we first introduce the notion of subtree-substitution.

For \(t, s \in \varSigma ^T\) and \(\vec {n}= \langle n_1,n_2,\dots ,n_m\rangle \) \(\in N^*\) (m \(\ge \) 1), the operation of substituting the subtree of t at by s, written as , is defined as follows.

  • Base Case: iff \(\vec {n}=\lambda \).

  • Inductive Case: If t = \(a[t_1t_2\dots t_n]\) then \(t.\vec {n}\leftarrow s = a[t_1t_2\dots (t_{n_1}.\) \(\langle n_2,n_3\dots n_m \rangle \leftarrow s)\dots t_n]\).

We also define substitution of all the subtrees of t rooted at x (\(x \in \varSigma \)) by s, which we write as s.

  • Base Case: If root(t) = x, s = s.

  • Base Case: If root(t) \(\ne \) x and t = \(a[]\) (a \(\in \varSigma \)), s = t.

  • Inductive Case: If root(t) \(\ne \) x and t = \(a[t_1t_2\dots t_n]\) (a \(\in \varSigma \)), s = \(a[s_1s_2\dots s_n]\) where \(s_i\) = \(t_i{\mathop {\leftarrow }\limits ^{x}}\)s (\(1 \le i \le n\)).

Rogers [18] proves the following result and we repeat the proof to set the stage for the sequel.

Theorem 1

(Subtree Substitution Closure). A treeset \(T \subseteq \varSigma ^T\) is strictly 2-local iff there is n such that T is of branching degree n and for all \(A, B \in \) T, whenever there exist two vectors , such that then .

Proof

If T is strictly 2-local, then there exists a corresponding strictly 2-local grammar G that satisfies \(\mathbb {L}(G) = T\). Thus there exists a finite set \(S\subset F_k(\varSigma ^T)\) such that \(\mathbb {L}((\varSigma ,S)) = T\).

Consider any \(A, B \in T\) and such that . Let . We show \(t\in T\). First notice that \(F_2(A)\subseteq S\) and \(F_2(B) \subseteq S\) because \(A,B\in T\) and \(T=\mathbb {L}((\varSigma ,S))\). Next consider any element \(u\in F_2(t)\). By definition of t and 2-factor, u must be a 2-stem of a subtree of . If u is the 2-stem of a subtree of then \(u\in F_2(B)\subset S\). If not, then u is a 2-stem of a subtree of A and so \(u\in F_2(A)\subset S\). Either way, \(u\in S\) and so \(F_2(t)\subseteq S\). It follows that \(t\in T\).

Conversely, consider a treeset T such that whenever there exist two vectors \(\vec {n_1}, \vec {n_2} \in \vec {N}\), such that then . We refer to this property as the SSC. To show T is Strictly 2-Local, we present a finite set \(S\subset F_k(\varSigma ^T)\) such that \(\mathbb {L}((\varSigma ,S)) = T\). Let S = \(F_2(T)\). Since T is of branching degree n, S is finite. In order to prove \(\mathbb {L}((\varSigma ,S)) = T\), we need to show both \(\mathbb {L}((\varSigma ,S)) \subseteq T\) and \(T \subseteq \mathbb {L}((\varSigma ,S))\). It is obvious that \(T \subseteq \mathbb {L}((\varSigma ,S))\) because for any t \(\in T\), \(F_2(t) \subseteq S=F_2(T)\).

The following proves that \(\mathbb {L}((\varSigma ,S)) \subseteq T\) by recursive application of SSC. Consider any \(t\in \mathbb {L}((\varSigma ,S))\). Let be an enumeration of the m subtrees of t by their Gorn addresses in length-lexicographic order. (Note that \(t_1=t\)).

The base step of the induction is to choose a tree \(s_0\in T\) that has the same root as t. Such a \(s_0\in T\) exists because \(\rtimes [\mathtt {root}(t)[]]\in S\).

Next we assume by the induction hypothesis that \(s_{i-1}\in T\) and we will construct \(s_i\) which is also in T. For each \(1\le i\le m\), if \(t_i\) is a leaf then let \(u=t_i[\ltimes []]\), otherwise let \(u=\mathtt {stem}_2(t_i)\). Choose a tree \(x\in T\) such that \(u\in F_2(x)\). Such a tree \(x\in T\) exists because \(u\in S=F_2(T)\). It follows there is such that . Let . Since and \(s_{i-1}, x\in T\), it follows that \(s_i\in T\) by SSC. Informally, this construction ensures the nodes and children of \(s_i\) are identical to those of t from the root of t to the root of the subtree \(t_i\).

Since each \(s_i\) is built according to \(s_{i-1}\) and \(s_0\in T\) we conclude that \(s_m\in T\). Furthermore, since the subtrees are ordered length-lexicographically and we substitute a 2-stem of a subtree of t to build \(s_i\), it follows that \(s_m=t\). As t was arbitrary in \(\mathbb {L}((\varSigma ,S))\), we obtain \(\mathbb {L}((\varSigma ,S)) \subseteq T\). \(\square \)

The catenation operation of two trees \(u\cdot t\) is defined by substitution in the leaves. Let $ be a new symbol, i.e., $ \(\not \in \varSigma \). Let \(\varSigma _{\$}^{T}\) denote the set of all trees over \(\varSigma \cup {\$}\) which contain exactly one occurrence of label $ in the leaves. The operation of catenation is defined inductively:

  • Base Case: For \(t \in \varSigma ^T\), \(\$[]\cdot t = t\).

  • Base Case: For all \(a\in \varSigma \), if \(u = a[]\), \(u\cdot t = a[]\).

  • Inductive Case: For all \(a\in \varSigma \), if \(u = a[t_1t_2\dots t_n]\), \(u\cdot t\) = \(a[(t_1\cdot t) (t_2\cdot t) \dots (t_n \cdot t)]\).

Example 4

Suppose \(\varSigma \) = {S, a, b}. Let u = S [ a[ ] $[ ] b[ ] ] and t = S [ a[ ] b[ ] ]. u\(\cdot \)t = S [(a[ ]\(\cdot \)t) ($[ ]\(\cdot \)t) (b[ ]\(\cdot \)t)] = S [ a[ ] S [ a[ ] b[ ] ] b[ ] ].

Notice that the classical catenation of strings can be viewed as a special case of catenation of trees with unary branching. This operation can also be used to represent subtrees. For \(t\in \varSigma ^T \cup \varSigma _{\$}^{T}\), if \(t = u\cdot s\), then s is a subtree of t.

If \(U\subseteq \varSigma _{\$}^{T}\) and \(T\subseteq \varSigma ^T \cup \varSigma _{\$}^T\), then . Furthermore, for any \(t \in T\) and any tree language \(T\subseteq \varSigma ^T\), the quotient of t w.r.t. T is defined as \(\mathtt {qt}_T(t) = \{u \in \varSigma _{\$}^T\mid u\cdot t \in T\}\). Canonical finite-state tree recognizers can be defined in terms of these quotients.

3 Input Strictly Local Tree Transducers

In this section we define functions that map trees to trees. After reviewing some basic terminology, we introduce deterministic, frontier-to-root, linear, finite-state Tree Transducers (DFT). We then define Input Strictly Local Tree Transducers (ISLTT) in a grammar-independent way, and then prove they correspond exactly to a type of DFTs. Examples are provided along the way.

A function f with domain X and co-domain Y can be written \(f:X \rightarrow Y\). The image of f is the set \(\{f(x) \in Y | x \in X, f(x)\) is defined\(\}\) and the pre-image of f is the set \(\{x \in X | f(x)\) is defined\(\}\). Tree transducers compute functions that map trees to trees \(f:\varSigma _n^T \rightarrow \varGamma ^T\).

DFTs are defined as a tuple \((Q, \varSigma , \varGamma , F, \delta )\), where Q is a finite set of states, \(F \subseteq Q\) is a set of final states, and \(\delta \) is a transition function that maps a sequence of states paired with an element of \(\varSigma \) to a state and a variably-leafed tree. A variably-leafed tree is a tree which may include variables in the leaves of the tree. Let \(X=\{x_1, x_2, \ldots \}\) be a countable set of variables. If \(\varSigma \) is a finite alphabet then \(\varSigma ^T[X]\) denotes the set of trees t formed with the alphabet \(\varSigma \cup X\) such that if the root of a subtree s of t is a variable then s is a leaf (so variables are only allowed in leaves). Thus formally the transition function is \(\delta : Q^* \times \varSigma \rightarrow \varGamma ^T[X] \times Q\). Importantly, the pre-image of the transition function must be finite. We sometimes write \((q_1q_2\ldots q_m,a,t,q)\in \delta \) to mean \(\delta (q_1q_2\ldots q_m,a)=(t,q)\).

In the course of computing a tree transduction, the variables in variably-leafed trees are substituted with trees. Assume \(t_1,t_2,\dots t_m \in \varGamma ^T\) and \(s \in \varGamma ^T[X]\), which is a variable leafed tree with any subset of the variables \(\{x_1,x_2,...,x_m\}\). We define a substitution function \(\phi \) such that \(\phi (t_1 t_2 \dots t_m, s) = s{\mathop {\leftarrow }\limits ^{x_i}}t_i\) for \(1 \le i \le m\).

We define the process of transducing a tree recursively using a function \(\pi \), which maps \(\varSigma _n^T\) to \(Q \times \varGamma ^T\), which itself is defined inductively with \(\delta \).

  • Base Case: \(\pi (a[ ~]) = (q, v)\) iff \(\delta (\lambda , a) = (v, q)\)

  • Inductive Case: \(\pi (a[t_1t_2\dots t_m]) = (q, \phi (v_1 v_2 \dots v_m, s))\) iff \(\delta (q_1q_2\dots q_m, a) \)\(= (s, q)\) and \(\pi (t_i) = (q_i, v_i)\) for each \(1 \le i \le m\).

The tree-to-tree function the transducer M recognizes is the set of pairs \(\mathbb {L}(M) = \{(t,s) \mid t \in \varSigma _n^T, s \in \varGamma ^T, \pi (t) = (q, s), q \in F\}\). We also write \(M(t)=s\) whenever \((t,s)\in \mathbb {L}(M)\).

A DFT is linear provided whenever \(\delta (q_1q_2\dots q_m, a) = (s, q)\), no variable occurs more than once in s.

Example 5

Wh-movement refers to a syntactic analysis of question words such as English what and who. It is common to analyze this as a relation between tree structures [21]. The input structure describes the relation of the wh-word to its verb (cf. “John thinks Mary believes Bill buys what?”) and the yield of the output structure reflects the pronunciation (cf. “What does John think Mary believe Bill buys”).

We use a simplified transformation to make the point. In the alphabet, S represents the root node of a input tree, W stands for a wh-word and P for everything else (P is for phrase). A transducer of wh-movement can be constructed as a tuple \(M_{wh}=(Q, \varSigma , F, \delta )\) where \(Q = \{q_w,q_p,q_s\}\), \(F = \{q_s\}\), \(\varSigma = \{S,P,W\}\), and \(\delta = \{(\lambda , P, P[], q_p)\), \((\lambda , W, W[], q_w)\), \((q_p q_p, P, P[x_1 x_2], q_p)\), \((q_w q_p, P, P[x_1x_2], q_w)\), \((q_p q_w, P, P[x_1 x_2], q_w)\), \((q_p q_w, S, S[W[] S[x_1x_2]], q_s)\), \((q_w q_p, S, S[W[] S[x_1 x_2]], q_s)\), \((q_p q_p, S, S[x_1 x_2], q_s)\}\).

Figure 1 illustrates some of the transformations computed by the finite-state machine \(M_{wh}\). The tree with a wh-word in Fig. (1a) is transformed into the tree in Fig. (1b). (\(M_{wh}\) keeps the original wh-word in-situ but it could easily be removed or replaced with a trace). The trees in Fig. (1c) and (d) are the same because there is no wh-word in the input tree and so \(M_{wh}\) leaves it unchanged.

Fig. 1.
figure 1

\(M_{wh}\) maps the tree in (a) to the tree in (b) and likewise maps the tree in (c) to itself in (d).

Next we describe the canonical form of deterministic tree transducers. The quotient of a tree \(t\in \varSigma ^T\) with respect to a tree-to-tree function \(f:\varSigma ^T \rightarrow \varGamma ^T\) is a key idea. It will be useful to develop some notation for the largest common subtree of the image under f of the set of trees which includes t as a subtree. Let \(\mathtt {lcsi}_f(t) = \mathtt {lcs}\Big (f\big (\varSigma _{\$}^{T} \cdot \{t\}\big )\Big )\). When f is understood from context, we just write \(\mathtt {lcsi}(t)\). Then the quotient is defined as follows:

$$\begin{aligned} \mathtt {qt}_f(t) = \Big \{(u,v) \mid f(u\cdot t) = v\cdot s , s = \mathtt {lcsi}_f(t)\Big \}. \end{aligned}$$
(1)

When f is clear from context, we write \(\mathtt {qt}(t)\) instead of \(\mathtt {qt}_f(t)\).

It is worth noting that for a tree \(t \in \varSigma _n^T\), the largest common subtree of the image of a linear transducer with the input of \(\varSigma _{\$}^{T} \cdot \{t\}\) is unique if it exists because if there is more than one tree that belongs to \(\mathtt {lcs}(f(\varSigma _{\$}^{T} \cdot \{t\}))\), they must be produced by copying, which is not allowed by linear DFT. If trees \(t_1, t_2 \in \varSigma ^T\) have the same quotient with respect to a function f, they are quotient-equivalent with respect to f and we write \(t_1 \sim _f t_2\). Clearly, \(\sim _f\) is an equivalence relation which partitions \(\varSigma ^T\).

As in the string case, to each regular tree language T, there is a canonical DFT accepting T. The characterization given by the Myhill-Nerode theorem can be transferred to the tree case [6]. For any treeset T, the quotients of trees w.r.t. T can be used to partition \(\varSigma ^T\) into a finite set of equivalence classes.

Analogous to the smallest subsequential finite state transducer for a subsequential function, we can construct the smallest linear DFT for a deterministic tree-to-tree function f and refer to this transducer as the canonical transducer for f, \(\varPsi _f^c\). For \(t_1,t_2,\dots ,t_m\in \varSigma _n^T\) (m \(\le \) n) and a \(\in \varSigma \), let the contribution of a w.r.t. \(t_1t_2\dots t_m\) be \(\mathtt {cont}_f(a,t_1t_2\dots t_m)=v \in \varGamma ^T[X]\), which satisfies

$$\begin{aligned} \phi \Big (\mathtt {lcsi}(t_1)\mathtt {lcsi}(t_2)\ldots \mathtt {lcsi}(t_m), v\Big ) = \mathtt {lcsi}\Big (a[t_1t_2\dots t_m]\Big ). \end{aligned}$$
(2)

The term \(\mathtt {cont}_f(a,t_1t_2\dots t_m)\) is well-defined since each \(\mathtt {lcsi}(t_1), \mathtt {lcsi}(t_2), \ldots \) \(\mathtt {lcsi}(t_m),\) and \(\mathtt {lcsi}(a[t_1t_2\dots t_m])\) are unique.

Then the canonical DFT for a deterministic tree-to-tree function f is:

  • \(Q = \{\mathtt {qt}_f(t) \mid \in \varSigma _n^T\}\),

  • \(F \subseteq Q\),

  • For \(a \in \varSigma \), there exists \(v \in \varGamma ^T\) that satisfies \((\lambda , a, v, \mathtt {qt}_f(a[])) \in \delta \),

  • For \(t_1,t_2,\dots ,t_m\in \varSigma _n^T (m \le n)\) and \(a \in \varSigma \), \(\Big (\mathtt {qt}_f(t_1)\, \mathtt {qt}_f(t_2) \dots \mathtt {qt}_f(t_m),\)\( a,\, \mathtt {cont}_f(a,t_1t_2\dots t_m),\, \mathtt {qt}_f(a[t_1t_2\dots t_m])\Big )\in \delta \).

The presentation here differs from Friese et al. [6], but the only thing we require in the proof of Theorem 2 below is the existence of the canonical DFT whenever \(\sim _f\) is of finite index.

We define ISLTT as a subclass of linear DFTs.

Definition 1

(Input Strictly Local Tree-to-tree Function). A function f is Input Strictly Local (ISL) if there is a k and n such that for all \(t_1, t_2 \in \varSigma _n^T\), if \(\mathtt {stem}_{k-1}(t_1) = \mathtt {stem}_{k-1}(t_2)\) then \(\mathtt {quotient}_f(t_1) = \mathtt {quotient}_f(t_2)\).

In the same way ISL string functions can be used to probe the locality properties of phonological processes, ISL tree functions can used to probe the locality properties of syntactic transformations.

To show that a syntactic transformation is not ISL one need only construct a counterexample to Definition 1.

Example 6

We can show the function computed by \(M_wh\) from Example 5 is not ISL for any k because there is no bound on the distance the wh-word can ‘travel.’ Suppose there is a k and n such that for all \(t_1, t_2 \in \varSigma _n^T\), if \(\mathtt {stem}_{k-1}(t_1) = \mathtt {stem}_{k-1}(t_2)\) then \(\mathtt {qt}_f(t_1) = \mathtt {qt}_f(t_2)\). Let \(u_1 = u_2 \ldots = u_{k-1} = P [P \$]\). Also let \(u_k\) = P [P P], \(s = S [P \$]\) and \(w = P [P W]\). We construct two sentence structures: \(s\cdot t_1\) and \(s\cdot t_2\), where \(t_1 = u_1 \cdot u_2 \dots u_{k-1} \cdot w\) and \(t_2 = u_1 \cdot u_2 \dots u_{k-1} \cdot u_k\). It is obvious that \(\mathtt {stem}_{k-1}(t_1) = \mathtt {stem}_{k-1}(t_2)\). However, \(\mathtt {qt}_f(t_1) \ne \mathtt {qt}_f(t_2)\) since \((s,s) \in \mathtt {qt}_f(t_2)\) but \((s,s) \notin \mathtt {qt}_f(t_1)\). As we can always find such a pair of trees \(t_1\) and \(t_2\) for any k, it is thus proved that wh-movement is not ISL for any k.

Our main result, Theorem 2 below, establishes an automata-theoretic characterization of ISL tree-to-tree functions. As we illustrate after the proof, one can show that a tree transformation is ISL using this theorem.

Theorem 2

(ISL Tree Transducers). A function f is ISL iff there is some k and n such that f can be described with a DFT for which

  1. 1.

    \(Q = \{\mathtt {stem}_{k-1}(t)\mid t \in \varSigma _n^T)\}\) and \(F \subseteq Q\),

  2. 2.

    \(\forall q_1q_2\dots q_m \in Q^* (1\le m \le n)\), \( a \in \varSigma \), \(u \in \varGamma ^T[X]\), it is the case that \((q_1q_2 \dots q_m, a, u, q') \in \delta \Rightarrow q' = \mathtt {stem}_{k-1}(a[q_1q_2 \dots q_m])\).

The transducer is finite since \(\varSigma \) is finite and n bounds the branching degree of the pre-image of f which ensures the finiteness of both Q and \(\delta \).

Before our proof of the Theorem, we prove a lemma based on these remarks.

Remark 1

For all k,m \(\in \mathbb {N}\) with k \(\le \) m, and for all \(t\in \varSigma _n^T\), \(\mathtt {stem}_k(\mathtt {stem}_m(t)) = \mathtt {stem}_k(t)\) since both t and \(\mathtt {stem}_m(t)\) share the same k-stem from the root.

Remark 2

For all \(k\in \mathbb {N}\), and for all \(a \in \varSigma \) and \(t_1,t_2,\dots t_m \in \varSigma _n^T (m \le n)\), \( \mathtt {stem}_{k-1}(a[t_1t_2\dots t_m]) =\) \(\mathtt {stem}_{k-1}(a[\mathtt {stem}_{k-1}(t_1)\, \mathtt {stem}_{k-1}(t_2)\dots \mathtt {stem}_{k-1}(t_m)])\). This is a direct consequence of Remark 1.

Lemma 1

Let \(\varPsi \) be a ISLTT with the properties defined in Theorem 2. If \(t\in \varSigma _n^T\) and \(u\in \varGamma ^T\), \(\pi (t) = (q, u)\), then \(q = \mathtt {stem}_{k-1}(t)\).

Proof

The proof is by induction on the depth of the trees to which \(\pi \) is applying. The base case follows from the facts that for \((\lambda , a, v, q) \in \delta \) iff \(\pi (a[]) = (q, v)\) and \(q = \mathtt {stem}_{k-1}(a[])\).

Next assume for all \(t_1,t_2,\dots , t_m \in \varSigma _n^T\) (m \(\le \) n) and \(v_1,v_2\dots ,v_m \in \varGamma ^T\) such that \(\pi (t_1)\) = \((q_1,v_1)\) implies \(q_1\) = \(\mathtt {stem}_{k-1}(t_1)\), \(\pi (t_2)\) = \((q_2,v_2)\) implies \(q_2\) = \(\mathtt {stem}_{k-1}(t_2),\dots ,\pi (t_m)\) = \((q_m,v_m)\) implies \(q_m\) = \(\mathtt {stem}_{k-1}(t_m)\). We show that \(\forall a \in \varSigma \) that there is a \(v\in \varGamma ^T[X]\) such that \(\pi (a[t_1t_2\dots t_m])\) = \(\big (q, \phi (v_1v_2\dots v_m,v)\big )\) and \(q = \mathtt {stem}_{k-1}(a[t_1t_2\dots t_m])\). Based on the assumption, we know that \(\pi (t_1)\) = \((\mathtt {stem}_{k-1}(t_1),v_1)\), \(\pi (t_2)\) = \((\mathtt {stem}_{k-1}(t_2),v_2)\), \(\dots \), \(\pi (t_m)\) = \((\mathtt {stem}_{k-1}(t_m),v_m)\), so there exists \(v\in \varGamma ^T[X]\) such that \((\mathtt {stem}_{k-1}(t_1)\mathtt {stem}_{k-1}(t_2)\mathtt {stem}_{k-1}(t_m), a, v, q) \in \delta \). By the construction, q is defined to be equal to \(\mathtt {stem}_{k-1}(a[\mathtt {stem}_{k-1}(t_1)\mathtt {stem}_{k-1}(t_2)\mathtt {stem}_{k-1}(t_m)])\), which by Remark 2, equals \(\mathtt {stem}_{k-1}(a[t_1t_2\dots t_m])\).

Now we can prove the theorem.

Proof

( Theorem 2). (\(\Leftarrow \)) Assume \(k \in \mathbb {N}\) and let f be a function described by \(\varPsi \) = {Q, \(\varSigma \), \(\varGamma \), F, \(\delta \)} constructed as in Theorem. Let \(t_1, t_2 \in \varSigma _n^T\) such that \(\mathtt {stem}_{k-1}(t_1) = \mathtt {stem}_{k-1}(t_2)\). By Lemma 1, both \(t_1\) and \(t_2\) lead to the same state, so \(\mathtt {qt}_f(t_1) = \mathtt {qt}_f(t_2)\). Therefore, f is k-ISL.

(\(\Rightarrow \)) Consider any ISL tree-to-tree function f. Then there is some k and n such that \(\forall t_1,t_2 \in \varSigma _n^T\), we have \(\mathtt {stem}_{k-1}(t_1) = \mathtt {stem}_{k-1}(t_2) \Rightarrow \mathtt {qt}_f(t_1) = \mathtt {qt}_f(t_2)\). We show that the corresponding ISL tree transducer \(\varPsi _f^{ISL}\) exists. Since \(\mathtt {stem}_{k-1}(\varSigma _n^T)\) is a finite set, the equivalence relation \(\sim _f\) partitions \(\varSigma ^T\) into at most \(\mathtt {stem}_{k-1}(\varSigma _n^T)\) blocks. Thus there exists a canonical linear DFT \(\varPsi _f^c\) = {\(Q_c, F_c, \varSigma , \varGamma , \delta _c\)}. \(\pi _c\) is the process function derived from \(\delta _c\) that maps \(\varSigma _n^T\) to \(Q_c\times \varGamma ^T\).

Construct \(\varPsi \) = {Q, F, \(\varSigma ,\varGamma ,\delta \)} as follows:

  • Q = \(\mathtt {stem}_{k-1}(\varSigma _n^T)\)

  • \(\forall q \in Q, q \in F\) iff \(\mathtt {qt}(q) \in F_c\).

  • For \(a\in \varSigma \) and \(v\in \varGamma ^T[X]\), \((\lambda , a, v, q) \in \delta \) iff \((\lambda , a, v, \mathtt {qt}(q))\in \delta _c\).

  • \(\forall q_1q_2\dots q_m \in Q^* (1\le m \le n), a \in \varSigma , u \in \varGamma ^T[X]\), we have \((q_1q_2 \dots q_m,\)\( a, v, \mathtt {stem}_{k-1}(a[q_1q_2 \dots q_m])) \in \delta \) if and only if \((\mathtt {qt}(q_1) \mathtt {qt}(q_2) \dots \mathtt {qt}(q_m), a, v,\)\( \mathtt {qt}(a[q_1q_2\dots q_m])) \in \delta _c\).

\(\varPsi \) is ISL by construction, as the states and transitions of \(\varPsi \) meet requirements (1) and (2) of Theorem 2.

The following proof show that \(\varPsi \) computes the same function as \(\varPsi _f^c\) by showing that \(\varPsi \) and \(\varPsi _f^c\) generate the same function. In other words we show \(\forall t \in \varSigma _n^T\), \(u \in \varGamma ^T\), \(\pi (t) = (\mathtt {stem}_{k-1}(t), u)\) iff \(\pi _c(t) = (\mathtt {qt}(t), u)\) and \(\mathtt {stem}_{k-1}(t) \in F\) iff \(\mathtt {qt}(t) \in F_c\).

First, we show that \(\pi (t)\) = \((\mathtt {stem}_{k-1}(t), u)\) iff \(\pi _c(t) = (\mathtt {qt}(t), u)\). Clearly, the base case is satisfied. For all \(a \in \varSigma \) and \(v\in \varGamma ^T[X]\), \((\lambda , a, v, q) \in \delta \) iff \((\lambda , a, v, \mathtt {qt}(q)) \in \delta _c\). Thus \(\pi _c(a[]) = (\mathtt {qt}(a[]), v)\) and \(\pi (a[]) = (\mathtt {stem}_{k-1}(a[]), v)\).

Next assume that there exist \(t_1,t_2,\dots , t_m \in \varSigma _n^T\) and \(u_1,u_2,\dots , u_m\) \(\in \varGamma ^T\) such that \(\pi (t_i) = (\mathtt {stem}_{k-1}(t_i),u_i)\) iff \(\pi _c(t_i) = (\mathtt {qt}(t_1),u_i)\) for each \(1\le i\le m\). We show \(\forall a \in \varSigma \) and \(\forall v \in \varSigma ^T[X]\) such that \(\pi (a[t_1t_2\dots t_m])\) = \((\mathtt {stem}_{k-1}(a[t_1t_2\dots t_m])\), we have \((\phi (u_1u_2\dots u_m],v))\) iff \(\pi _c(a[t_1t_2\dots t_m])\) = \((\mathtt {qt}(a[t_1t_2\dots t_m]), \phi (u_1u_2\dots u_m],v))\).

Suppose \(\pi _c(a[t_1t_2\dots t_m]) = (\mathtt {qt}(a[t_1t_2\dots t_m]), (\phi (u_1u_2\dots u_m,v)))\). By assumption, \(\pi _c(t_i) = (\mathtt {qt}(t_1),u_i)\) for each \(1\le i \le m\). Hence, \(\big (\mathtt {qt}(t_1)\dots \mathtt {qt}(t_m),\)\( a, v, \mathtt {qt}(a[t_1t_2\dots t_m])\big ) \in \delta _c\).

Let \(q_i\) = \(\mathtt {stem}_{k-1}(t_i)\) for each \(1\le i\le m\). Observe that each \(\mathtt {stem}_{k-1}(t_i) = \mathtt {stem}_{k-1}(q_i)\) by Remark 1. Consequently, since f is k-ISL, \(\mathtt {qt}(t_i) = \mathtt {qt}(q_i)\). Similarly, \(\mathtt {stem}_{k-1}(a[t_1t_2\dots t_m]) = \mathtt {stem}_{k-1}(a[q_1q_2\dots q_m])\) and so \(\mathtt {qt}(a[t_1t_2\dots t_m]) = \mathtt {qt}(a[q_1q_2\dots q_m])\). By substitution then, we have \(\pi _c(t_i) = (\mathtt {qt}(q_i),u_i)\) for each \(1\le i\le m\) and \(\big (\mathtt {qt}(q_1)\mathtt {qt}(q_2)\dots \mathtt {qt}(q_m), a, v, \mathtt {qt}(a[q_1q_2\dots q_m])\big ) \in \delta _c\).

By construction of \(\varPsi \), \(\big (q_1q_2 \dots q_m, a, x, \mathtt {stem}_{k-1}(a[q_1q_2 \dots q_m])\big ) \in \delta \). Since \(\pi (t_i) = (\mathtt {stem}_{k-1}(t_i),u_i)\) for each \(1\le i\le m\), it follows that \(\pi (a[t_1t_2\dots t_m]) = \big (\mathtt {stem}_{k-1}(a[q_1q_2 \dots q_m]), (\phi (u_1u_2\dots u_m,v))\big )\) which equals \(\big (\mathtt {stem}_{k-1}(a[t_1t_2 \dots t_m]), (\phi (u_1u_2\dots u_m,v))\big )\).

Conversely, consider any \(a\in \varSigma \) and \(v\in \varSigma ^T[X]\) and suppose \(\pi (a[t_1t_2\dots t_m]) =\) \(\big (\mathtt {stem}_{k-1}(a[t_1t_2\dots t_m]), (\phi (u_1u_2\dots u_m],v))\big )\). By assumption, \(\pi (t_i)\) equals \((\mathtt {stem}_{k-1}(t_i),u_i)\) for each \(1\le i\le m\). Thus \(\big (\mathtt {stem}_{k-1}(t_1)\mathtt {stem}_{k-1}(t_2)\dots \) \(\mathtt {stem}_{k-1}(t_m), a, v, \mathtt {stem}_{k-1}(a[t_1t_2 \dots t_m])\big ) \in \delta \). Let \(q_i = \mathtt {stem}_{k-1}(t_i)\) for each \(1\le i\le m\) as before. It follows that \(\mathtt {stem}_{k-1}(t_i) = \mathtt {stem}_{k-1}(q_i)\), so \(\mathtt {qt}(t_i) = \mathtt {qt}(q_i)\). Likewise, \(\mathtt {stem}_{k-1}(a[t_1t_2\dots t_m]) = \mathtt {stem}_{k-1}(a[q_1q_2\dots q_m])\), so \(\mathtt {qt}(a[t_1t_2\dots t_m]) = \mathtt {qt}(a[q_1q_2\dots q_m])\). Therefore, \(\big (\mathtt {stem}_{k-1}(q_1)\mathtt {stem}_{k-1}(q_2)\dots \)\( \mathtt {stem}_{k-1}(q_m), a, v, \mathtt {stem}_{k-1}(a[q_1q_2 \dots q_m])\big ) \in \delta \).

By construction of \(\varPsi \), this means \(\big (\mathtt {qt}(q_1) \mathtt {qt}(q_2)\dots \mathtt {qt}(q_m), a, v,\)\( \mathtt {qt}(a[q_1q_2\dots q_n])\big )\in \delta _c\). Since \(\pi _c(t_i) = (\mathtt {qt}(t_i),u_i)\) for each i by assumption, it follows that \(\pi _c(a[t_1t_2\dots t_m])\) = \(\big (\mathtt {qt}(a[q_1q_2\dots q_n]), (\phi (u_1u_2\dots u_n,v)\big )\).

We need to further show that \(\mathtt {stem}_{k-1}(t) \in F\) iff \(\mathtt {qt}(t) \in F_c\). By construction, we know that \(q\in F\) iff \(\mathtt {qt}(q)\) belongs to \(F_c\). Thus \(\mathtt {stem}_{k-1}(t) \in F\) iff \(\mathtt {qt}(\mathtt {stem}_{k-1}(t)) \in F_c\). By Remark 1, \(\mathtt {stem}_{k-1}(t) = \mathtt {stem}_{k-1}(\mathtt {stem}_{k-1}(t))\). Hence \(\mathtt {qt}(t) = \mathtt {qt}(\mathtt {stem}_{k-1}(t))\). Therefore, \(\mathtt {stem}_{k-1}(t) \in F\) iff \(\mathtt {qt}(t) \in F_c\).

This concludes the proof that \(\varPsi \) and \(\varPsi _f^c\) generate the same function. \(\square \)

As mentioned earlier, the value of Theorem 2 is that it can be used to establish that certain tree transformations are ISL by presenting a transducer for the transformation which satisfies the properties specified by the theorem.

Example 7

This example shows that reversing the branch order of a regular tree set \(T\subseteq \varSigma ^T_n\) is ISL. We illustrate with the classic tree language whose yield is the string language \(a^nb^n\). In other words we wish to show that the transformation that maps \(t_1=S [a[] b[]]\) to \(t_1'=S [b[] a[]]\) and \(S [a[] t_1 b[]]\) to \(S [b[] t_1' a[]]\) and so on is ISL.

The DFT can be represented as a tuple (Q, \(\varSigma \), F, \(\delta \)) where the states are expressed by the 1-stems of the subtrees of the pre-image: \(Q = \{a[],b[],S[]\}\), and \(F = \{S[]\}\), and \(\varSigma = \{a,b,S\}\), and \(\delta = \{(\lambda , a, a[], a[]), (\lambda , b, b[], b[]),\) \((a[]b[], S, S[x_2, x_1], S[]), (a[]S[]b[], S, S[x_3 x_2 x_1], S[])\}\).

The reader can verify that this transducer correctly reverses the branch order of the trees in its pre-image. Further, this construction shows the function is ISL since it satisfies the requirements in Theorem 2.

4 Conclusion

This paper took a first step in characterizing local syntactic transformations by generalizing Input Strictly Local string functions to trees. Future work includes defining Output SL tree functions (cf. [3]) and studying whether these classes of tree functions can be learned more quickly and with fewer resources, and characterizing subclasses of tree transducers which characterize the types of non-local processes found in syntax and machine translation.