Abstract
This paper introduces a novel method for visualizing high-dimensional Grassmannians through 2D embeddings on the Poincaré disk. The proposed approach constructs an affinity matrix on each manifold and then minimizes the KL-divergence between the two affinities. This process enables the identification of an optimal projection that effectively preserves the geometry of the original high-dimensional Grassmannian. Our main theoretical contribution bounds the embedding loss in terms of two factors: the logarithm of the number of subspaces and a term that depends on the distribution of the subspaces within the Grassmannian. Notably, this term is smaller when the subspaces exhibit well-defined clusters, and larger in the absence of any discernible structure. We complement our theoretical analysis with comprehensive experiments on both synthetic and real datasets. The experimental results showcase the superiority of our embedding in accurately visualizing Grassmannians compared to existing representations.
Introduction
Subspaces are a cornerstone of data analysis, with applications ranging from linear regression to principal component analysis (PCA) [1,2,3], low-rank matrix completion (LRMC) [4, 5], computer vision [6,7,8,9], recommender systems [10,11,12], classification [13,14,15], and more [16]. However, there exist few tools to visualize the Grassmann manifold \({\mathbb {G}}({m},{r})\) of \({r}\)-dimensional subspaces of \({\mathbb {R}}^{m}\). Perhaps the most intuitive of such visualizations is the representation of \({\mathbb {G}}(3,1)\) as the closed half-sphere where each point in the hemisphere represents the one-dimensional subspace (line) in \({\mathbb {R}}^3\) that crosses that point and the origin (see Fig. 1). While intuitive, this visualization bears two major limitations. First, this representation wraps around the edge, so geodesic distances can be deceiving. For instance, two points (subspaces) that may appear diametrically far may in fact be arbitrarily close (see Fig. 1). More importantly, the main caveat of this semi-sphere representation is that it is unclear how to generalize it to \({m}>3\) or \({r}>1\), which makes it quite restrictive, especially for the analysis of modern high-dimensional data.
In response to the limitations of classical 3D representations, we introduce a novel approach, which we call GrassCaré, that addresses two crucial aspects of visualizing collections of points (subspaces) in the Grassmannian, namely
- Efficient projection: Our method determines the optimal angle for projecting the points of the Grassmannian onto the disk. This process can be likened to rotating the view until distances are accurately represented by geodesic distances, as depicted in Fig. 1.
- High-dimensional support with preserved geometry: We ensure full support for high-dimensional Grassmannian spaces while maintaining the geometry of the Riemannian manifold on a low-dimensional plane. This preservation is essential to prevent distortion during dimension reduction, which could otherwise lead to misleading perceptions of the scales of different subareas. This phenomenon is analogous to the challenge of representing a sphere perfectly on a 2D plane without deformation, as observed in certain world maps, affecting how humans perceive the scale of central and edge areas on the Mercator map (resulting, for example, in a common misperception of the size of Greenland relative to the rest of the world).
To achieve these objectives, we aim to match two affinity matrices: one corresponding to the geodesics on the Grassmannian, and the other to the distances on the 2D space. We minimize the discrepancy (measured by the Kullback–Leibler (KL) divergence) between these two matrices. In doing so, we ensure that points close to each other on the Grassmannian are also close in the new 2D space. In this manner, we obtain a 2D embedding that accurately captures the structure of points on the Grassmannian.
To replicate the landscape of the Grassmannian, we utilize the Poincaré disk as the space of the 2D embeddings. The Poincaré disk is a 2D hyperbolic geometric model, often represented as a unit circle where the geodesic distance between two points in the disk is depicted as the circular arc orthogonal to the unit circle [17]. This representation corresponds to the projection of the hyperbolic arc of their geodesic (see Fig. 2 for intuitive understanding). Essentially, the Poincaré disk provides a view from the bottom of the 3D hyperbolic bowl, sharing significant structural similarity with the hemisphere in Fig. 1.
Compared to the Euclidean space, which requires deformation to generate embeddings, the Poincaré disk allows the embeddings to have a focused area (central area) that is resistant to deformation from hyperbolic effects and a background area (around the edges) where only the global structure is significant. This characteristic enables us to maintain an accurate global representation of the Grassmannian within a unit circle while simultaneously preserving local structures.
Our main theoretical result shows that the loss of our embedding (measured in terms of the KL-divergence with respect to (w.r.t.) the Grassmannian geodesics) is bounded by a log-factor of the number of subspaces, and a term that depends on the distribution of the subspaces in the Grassmannian. This term will be smaller if the subspaces form well-defined clusters, and it will be larger if the subspaces have no structure whatsoever. In words, this result shows that under reasonable assumptions, our embedding can be an accurate representation of the Grassmannian. Equipped with this result, we believe that the GrassCaré embedding can be a powerful tool for subspace tracking, classification, multi-dataset analysis, and any application where there is an interest in visualizing subspaces.
Paper Organization In the next section, we discuss several applications of our GrassCaré embedding. The subsequent section briefly summarizes related work, after which we introduce the main formulation that determines our embedding, together with the gradient steps for the optimization. We then present our main theorem, bounding the loss of our embedding, followed by its proof. Finally, we demonstrate the applicability of our GrassCaré embedding on real and synthetic data, compare it to naive alternatives, and discuss its advantages and limitations in the final section.
Applications
Our GrassCaré embedding could be a valuable tool in the following applications:
Subspace Clustering involves clustering a collection of data points \({\textbf{x}} \in {\mathbb {R}}^m\) that are located near a union of subspaces [18]. The primary objective is to estimate this union. This technique finds applications in various domains such as motion segmentation [19, 20], face clustering [21], data mining [22], and time series analysis [23], among others.
Our experiments demonstrate that the GrassCaré embedding we propose provides valuable insights and summaries regarding the characteristics and relationships among the clusters. By employing this embedding, we can analyze the results of a subspace clustering algorithm in a more comprehensive manner, moving beyond simple accuracy metrics. Moreover, it serves as a useful tool for debugging and gaining a deeper understanding of algorithmic performance. In fact, the analysis and comprehension of subspace clustering algorithms served as the initial motivation behind this research.
Low-Rank Matrix Completion is a fundamental problem that tackles the challenge of recovering missing entries in a low-rank matrix \({\textbf{X}}\) [24]. The goal is to unveil the underlying structure by identifying the low-dimensional row and column spaces of \({\textbf{X}}\). This versatile technique finds applications in various domains, making it indispensable for recommender systems in e-commerce [25], enhancing image processing algorithms in computer vision [26], accelerating drug discovery processes in pharmaceutical research [27], and even enabling effective analysis of electronic health records (EHR) [28].
Our novel contribution lies in the utilization of the GrassCaré embedding, which emerges as a powerful tool in the analysis of LRMC algorithms. Through a series of carefully designed experiments, we demonstrate how our approach sheds light on the algorithmic behavior and performance of LRMC methods. By traversing the Grassmannian, our embedding offers valuable insights into crucial aspects such as the proximity of the completed matrix to the ground truth, convergence behavior, step sizes during optimization, and discernible patterns that arise throughout the iterative process. This comprehensive analysis not only enriches our understanding of LRMC algorithms but also empowers researchers and practitioners to fine-tune and enhance the performance of these methods in real-world scenarios.
Subspace tracking is a dynamic process that involves the continuous estimation of a subspace \({{\mathbb {U}}}_{t}\) as it evolves over time, navigating through the intricate landscape of the Grassmannian [3, 29, 30]. This versatile modeling technique finds its application in various domains, including signal processing [31], low-rank matrix completion [32], and computer vision [29]. For instance, in computer vision, subspace tracking enables the estimation of the underlying subspace corresponding to the background in a moving video scene.
In this context, our proposed GrassCaré embedding emerges as a powerful tool for monitoring the trajectory of the subspace as it traverses the Grassmannian. By harnessing the capabilities of our embedding, we gain valuable insights into the behavior of the evolving subspaces. This encompasses the assessment of their movement speed, spatial distance covered, and identification of distinctive patterns such as zig-zag or cyclic motion. Through the lens of the GrassCaré embedding, we can unravel the dynamic nature of the subspace path, providing a deeper understanding of its evolution and behavior.
Multi-Dataset Analysis is a crucial area of study that explores the relationships and interactions between multiple datasets. Principal Component Analysis (PCA) stands as the cornerstone of dimensionality reduction techniques and finds widespread application in diverse fields. From electronic health records (EHR) analysis [28] to genomics research [33, 34], and even vehicle detection systems [35, 36], PCA has proven its versatility and utility. By identifying the low-dimensional subspace that best captures the essential structure of high-dimensional datasets, PCA enables efficient data representation.
In contemporary scenarios, multiple datasets often exhibit distributional patterns or possess underlying relationships. For example, the EHRs of different geographical regions might share significant correlations. However, due to challenges such as privacy concerns, security considerations, data size, ownership, and logistical constraints, sharing such information becomes daunting, if not impossible. Nonetheless, the principal subspaces derived from each dataset can be shared more efficiently, mitigating these concerns and offering a potential avenue for gaining insightful knowledge. Leveraging our GrassCaré embedding, we present a visualization tool that facilitates the analysis of interrelationships between related datasets. By examining the embedding, we can unveil similarities, identify clusters, and detect meaningful patterns, unlocking a wealth of valuable insights.
Prior Work
To the best of our knowledge, general visualizations of Grassmannians have been studied using Self-Organizing Mappings (SOM) [37], which were first introduced for general dimensionality reduction [38,39,40,41]. The extension of SOM to Grassmannians iteratively updates points on a 2D index space to find the best arrangement, such that points that are neighbors in the Grassmannian are still close in the embedding. However, SOM presents several limitations. For instance, like most neural networks, they require large datasets, which may not always be available in practice. They also suffer from large parameter spaces, and are quite difficult to analyze, making it hard to derive theoretical guarantees about the accuracy of their embeddings. It is worth mentioning that there are numerous methods for general high-dimensional data visualization, including UMAP [42], LargeVis [43], Laplacian eigenmaps [44, 45], isomap [46], and more [47,48,49,50]. However, since these methods embed into non-compact Euclidean space, they are not well suited to represent the compact Grassmannian.
Another, more suitable, alternative is Grassmannian Diffusion Maps (GDMaps) [51], introduced as an extension of Diffusion Maps [52]. GDMaps consist of two separate stages. The first stage projects the given data point (i.e. vector, matrix, tensor) onto the Grassmannian using a singular value decomposition. The second stage uses diffusion maps to identify the subspace structures on the projected Grassmannian. Although the embedding can be generated quickly, it is unfortunately less accurate than the other methods considered in this paper. On the other hand, Stochastic Neighbor Embeddings (SNE) were first presented by Hinton and Roweis [53]. They formed the basis for t-SNE, which was introduced later by Maaten and Hinton [54]. Both algorithms minimize the KL-divergence between the distributions representing the probability of choosing the nearest neighbor on the high and low-dimensional spaces. These embeddings have become some of the most practical tools to visualize high-dimensional data on Euclidean space. However, Euclidean distances are poor estimators of Grassmannian geodesics, so a direct application of these methods would result in an inaccurate representation of subspace arrangements.
Motivated by these issues we decided to explore the use of the Poincaré disk, which has recently received increasing attention for high-dimensional embeddings [55, 56]. Intuitively, the Poincaré disk is a 2D hyperbolic geometric model, usually displayed as a unit circle where the geodesic distance between two points in the disk is represented as the circular arc orthogonal to the unit circle [17], which corresponds to the projection of the hyperbolic arc of their geodesic (see Fig. 2 to build some intuition). This unique feature brings several advantages for serving as the embedding space for the Grassmannian. First, since these hyperbolic arcs get larger (tending to infinity) as points approach the disk boundary, the Poincaré disk is an effective model to accurately represent the global structure of complex hierarchical data while retaining its local structures. Specifically, the Poincaré disk can be viewed as a continuous embedding of tree nodes from the top of the tree structure, where the root node is at the origin, and the leaves are distributed near the boundary. So, it is naturally suited to represent hierarchical structures. This also makes it suitable to represent structured clusters, where the points from the same cluster can be regarded as a branch of the tree, because they share a similar distance to other clusters. Second, the hyperbolic disk has a Riemannian manifold structure that allows us to perform gradient-based optimization, which is crucial to derive convergence guarantees, and for parallel training of large-scale dataset models. Finally, our main result showing the accuracy of our embedding enables efficient clustering using the Poincaré low-dimensional representation. That is, instead of clustering subspaces on the high-dimensional dataset, the clustering method can be performed on the mutual distances acquired from the embedding, with the knowledge that the embedding represents the high-dimensional subspaces accurately enough.
Setup and Formulation
In this section, we present the mathematical formulation of our GrassCaré embedding. To this end let us first introduce some terminology. Recall that we use \({\mathbb {G}}({m},{r})\) to denote the Grassmann manifold that contains all the \({r}\)-dimensional subspaces of \({\mathbb {R}}^{m}\). For any two subspaces \({{\mathbb {U}}}_1,{{\mathbb {U}}}_2 \in {\mathbb {G}}({m},{r})\), the geodesic distance between them is defined as:
where \({{\textbf {U}}}_{i}, {{\textbf {U}}}_{j}\in {\mathbb {R}}^{{m}\times {r}}\) are orthonormal bases of \({{\mathbb {U}}}_{i},{{\mathbb {U}}}_{j}\), and \(\sigma _\ell (\varvec{\cdot })\) denotes the \(\ell {\textrm{th}}\) largest singular value. As for the embedding space, recall that the Poincaré disk \({\mathbb {D}}\) is the Riemannian manifold defined as the open unit ball in \({\mathbb {R}}^2\) equipped with the following distance function between two points \({{\textbf {p}}}_{i},{{\textbf {p}}}_{j}\in {\mathbb {D}}\):
Notice from (2) that the geodesic distance in the disk is amplified smoothly as \({{\textbf {p}}}_{i}\) or \({{\textbf {p}}}_{j}\) move away from the origin. Intuitively, this means that an arc of the same Euclidean length in the disk represents a larger geodesic distance (tending to infinity) as it approaches the edge of the disk. In other words, distances near the edge of the disk are larger than they appear (see Fig. 4 to build some intuition). Conversely, distances at the center of the disk are smaller than they appear. This allows us to plot denser regions of the Grassmannian with higher granularity (thus retaining local structure) while at the same time keeping an accurate global representation of the Grassmannian inside an open circle.
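For concreteness, both distances can be computed as in the following sketch, which assumes the standard principal-angle form of (1) (consistent with the role of \(\sigma _\ell\) above) and the standard hyperbolic form of (2); the function and variable names are ours.

```python
import numpy as np

def grassmann_dist(U_i, U_j):
    """Geodesic distance on G(m, r): the l2-norm of the principal angles,
    i.e. the arccosines of the singular values of U_i^T U_j."""
    sigma = np.linalg.svd(U_i.T @ U_j, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # principal angles, clipped for stability
    return np.linalg.norm(theta)

def poincare_dist(p_i, p_j):
    """Geodesic distance on the Poincare disk (open unit ball in R^2)."""
    num = 2 * np.sum((p_i - p_j) ** 2)
    den = (1 - np.sum(p_i ** 2)) * (1 - np.sum(p_j ** 2))
    return np.arccosh(1 + num / den)
```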
To find our embedding, we will mimic the symmetric SNE approach in [54]. That is, we will first compute a probability matrix \({{\textbf {P}}}_{\mathbb {G}}\in [0,1]^{{N}\times {N}}\) whose \(({i},{j}){\textrm{th}}\) entry represents the probability that \({{\mathbb {U}}}_{i}\) is chosen as a nearest neighbor of \({{\mathbb {U}}}_{j}\), which is equal to zero if \({i}={j}\), and for \({i}\ne {j}\) is given by:
where \(\gamma _{i}\) is adapted to the data density: smaller values for denser regions of the data space. In our experiments, we choose it to be the variance of distances from point \({i}\) to other points. Next we create the probability matrix \({{\textbf {P}}}_{\mathbb {D}}\in [0,1]^{{N}\times {N}}\), whose \(({i},{j}){\textrm{th}}\) entry represents the probability that point \({{\textbf {p}}}_{i}\) in our embedding \({\mathbb {D}}\) is chosen as a nearest neighbor of point \({{\textbf {p}}}_{j}\in {\mathbb {D}}\), which is equal to zero if \({i}={j}\), and for \({i}\ne {j}\) is given by:
The tuning parameter \(\beta\), which can be set to any positive real number, controls the scattering of the embedding [56]. A higher value of \(\beta\) results in a wider spread of the embeddings on the Poincaré disk. To examine the influence of various \(\beta\) values on the final output, we carried out a series of experiments in the following sections (the outcomes are presented in Table 1). Our findings indicate that setting \(\beta\) to an excessively large value, such as 100, results in poor embeddings. Generally, to achieve favorable results, it is recommended to keep \(\beta\) within the range of 1 to 2, as also suggested in [56].
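The construction of both affinity matrices can be sketched as follows. This is an illustrative implementation that assumes Gaussian-style kernels on the squared distances, with the per-point bandwidths \(\gamma _{i}\) and the scattering parameter \(\beta\) described above, normalized over all pairs as in symmetric SNE; the exact kernels in (3) and (4) may differ in minor details.

```python
import numpy as np

def affinity_grassmann(dist_G, gamma):
    """dist_G: N x N matrix of Grassmannian geodesic distances;
    gamma: length-N vector of per-point bandwidths."""
    W = np.exp(-dist_G ** 2 / (gamma[:, None] * gamma[None, :]))
    np.fill_diagonal(W, 0.0)          # P[i, i] = 0 by definition
    return W / W.sum()                # normalize over all pairs

def affinity_poincare(dist_D, beta=1.0):
    """dist_D: N x N matrix of Poincare distances; beta controls the scattering."""
    W = np.exp(-dist_D ** 2 / beta)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()
```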
Our goal in obtaining the embedding is to maximize the similarity between the two distributions \({{\textbf {P}}}_{\mathbb {G}}\) and \({{\textbf {P}}}_{\mathbb {D}}\), which we do by minimizing their Kullback–Leibler (KL) divergence:
Since \({{\textbf {P}}}_{\mathbb {G}}\) is a constant given \(\{{{\textbf U}}_{i}\}\), this is the same as minimizing the following loss
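Writing \([\varvec{\cdot }]_{{i}{j}}\) for the \(({i},{j}){\textrm{th}}\) entry, these take the standard symmetric-SNE form consistent with the definitions above (shown here for concreteness):
\[
\textrm{KL}\!\left({{\textbf {P}}}_{\mathbb {G}}\,\Vert \,{{\textbf {P}}}_{\mathbb {D}}\right) \;=\; \sum _{{i}\ne {j}} [{{\textbf {P}}}_{\mathbb {G}}]_{{i}{j}} \log \frac{[{{\textbf {P}}}_{\mathbb {G}}]_{{i}{j}}}{[{{\textbf {P}}}_{\mathbb {D}}]_{{i}{j}}},
\qquad
{\mathcal {L}} \;=\; -\sum _{{i}\ne {j}} [{{\textbf {P}}}_{\mathbb {G}}]_{{i}{j}} \log [{{\textbf {P}}}_{\mathbb {D}}]_{{i}{j}}.
\]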
To minimize this loss over the Poincaré disk \({\mathbb {D}}\) we will use Riemannian Stochastic Gradient Descent [57], which updates \({{\textbf {p}}}_{i}^{t+1}\) according to:
where \(\eta >0\) is the step size (set as \(\eta = 1\) in the implementation), \({\varvec{\nabla }}_{i}{ \mathcal{L}}\) denotes the Riemannian gradient of \({ \mathcal{L}}\) w.r.t. \({{\textbf {p}}}_{i}\), and R denotes a retraction (Footnote 1) from the tangent space of \({{\textbf {p}}}_{i}\) onto \({\mathbb {D}}\). It is easy to see that
where the gradient of \(d_{\mathbb {D}}\) w.r.t. \({{\textbf {p}}}_{i}\) is given by:
Here \(a = 1 - \|{{\textbf {p}}}_{i}\|^2\), \(b = 1 - \|{{\textbf {p}}}_{j}\|^2\), and \(c = 1 + \frac{2}{ab}\|{{\textbf {p}}}_{i}- {{\textbf {p}}}_{j}\|^2\). Finally, the retraction step is given by
where
and \(\varepsilon\) is a small constant; in our experiments we set it to \(10^{-5}\).
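A minimal sketch of one such update for a single embedding point is given below. It assumes the standard gradient of the Poincaré distance from [55] (expressed with the constants a, b, c above), the usual conformal rescaling to obtain the Riemannian gradient, and a simple norm-clipping retraction with the constant \(\varepsilon\); the Euclidean gradient of \({ \mathcal{L}}\) w.r.t. \({{\textbf {p}}}_{i}\), obtained by the chain rule through the kernel in (4), is taken here as an input.

```python
import numpy as np

def poincare_dist_grad(p_i, p_j):
    """Euclidean gradient of d_D(p_i, p_j) with respect to p_i (cf. [55])."""
    a = 1 - np.sum(p_i ** 2)
    b = 1 - np.sum(p_j ** 2)
    c = 1 + 2 / (a * b) * np.sum((p_i - p_j) ** 2)
    coef = 4 / (b * np.sqrt(max(c ** 2 - 1, 1e-15)))   # guard against p_i == p_j
    return coef * ((np.sum(p_j ** 2) - 2 * p_i @ p_j + 1) / a ** 2 * p_i - p_j / a)

def riemannian_step(p_i, euclid_grad, eta=1.0, eps=1e-5):
    """One Riemannian SGD update of p_i on the Poincare disk.

    euclid_grad: Euclidean gradient of the loss L w.r.t. p_i (chain rule
    through d_D, using poincare_dist_grad)."""
    a = 1 - np.sum(p_i ** 2)
    riem_grad = (a ** 2 / 4) * euclid_grad     # rescale by the inverse of the hyperbolic metric
    p_new = p_i - eta * riem_grad              # gradient step
    norm = np.linalg.norm(p_new)
    if norm >= 1:                              # retraction: pull the point back inside the disk
        p_new = p_new / norm * (1 - eps)
    return p_new
```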
In our implementation we use random initialization for the points in the embedding. We point out that initialization is crucial for t-SNE. This is because t-SNE is generally used to embed points in Euclidean space, which is open. In contrast, the Grassmannian is sphere-like and compact, and hence we observed that varying the initialization resulted in essentially equivalent embeddings of the Grassmannian, viewed from different angles. This is further demonstrated in our experiments section (Fig. 6), where the average loss of GrassCaré over 100 trials varies very little in comparison to all other embeddings, showing that besides this point-of-view difference, our results do not depend heavily on the initialization.
Implementation with Adam
Our first prototype of GrassCaré [59] employed regular gradient descent due to its simplicity and smooth convergence. However, we soon discovered that Algorithm 1 in [59] became prohibitively slow when dealing with larger datasets. To address this issue and enhance the algorithm's speed, we introduced the Adam optimizer [60]. By incorporating Adam into the process, we were able to achieve a significantly faster and more efficient algorithm (orders of magnitude faster, capable of handling datasets orders of magnitude larger; see Fig. 3). The entire embedding process is detailed in Algorithm 1. It is important to note that whenever \((\cdot )^2\) or \(\sqrt{\cdot }\) is applied to a vector, it represents an element-wise operation.
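For reference, a single Adam-style update of one embedding point might look as follows. This is only a sketch with default Adam constants; the exact constants and the interaction with the retraction follow Algorithm 1 and may differ in detail.

```python
import numpy as np

def adam_riemannian_step(p_i, riem_grad, m, v, t, eta=0.01,
                         beta1=0.9, beta2=0.999, eps_adam=1e-8, eps=1e-5):
    """One Adam update of an embedding point on the Poincare disk.

    m, v: running first and second moments (same shape as p_i); t: step counter >= 1.
    (.)**2 and sqrt(.) act element-wise, as noted in the text."""
    m = beta1 * m + (1 - beta1) * riem_grad
    v = beta2 * v + (1 - beta2) * riem_grad ** 2
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    p_new = p_i - eta * m_hat / (np.sqrt(v_hat) + eps_adam)
    norm = np.linalg.norm(p_new)
    if norm >= 1:                              # keep the point strictly inside the disk
        p_new = p_new / norm * (1 - eps)
    return p_new, m, v
```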
We conducted tests using synthetic data to compare the performance of the original and improved methods. The results are summarized in Fig. 3. Our findings indicate that the number of samples remains the primary determinant of computing time. Notably, already with 100 samples the improved method consistently outperforms the original approach in terms of speed. As the number of samples increases, the original method fails to converge within 3600 s beyond 300 points, while the improved method effectively solves the problem in a short amount of time.
Main Theoretical Results and Proofs
First observe that convergence of our embedding follows directly from now-standard results in Riemannian optimization (see, e.g., the corresponding proposition in [61]). In fact, local convergence of our embedding follows directly because our Riemannian steps are gradient-related [61]. Our main theoretical result goes one step further, bounding the loss of our embedding by a log-factor of the number of subspaces, and a term that depends on the arrangement of the subspaces in the Grassmannian. This term will be smaller if the subspaces form well-defined clusters, and larger if the subspaces have no structure whatsoever. Intuitively, this result shows that under reasonable assumptions, our embedding can provide an accurate representation of Grassmannians.
Theorem 1
Suppose \({N}>3\). Define \(\gamma :=\min _{i}\gamma _{i}\) and \(\Gamma :=\max _{i}\gamma _{i}\). Let \(\{{{\mathcal {U}}}_1,\dots ,{{\mathcal{U}}}_{K}\}\) be a partition of \(\{{{\mathbb {U}}}_1,\dots ,{{\mathbb {U}}}_{N}\}\) such that \(|{{\mathcal{U}}}_{k}| \ge {n}_{K}>1\) \(\forall\) \({k}\). Let
Then the optimal loss of GrassCaré is bounded by:
where
In words, Theorem 1 requires that the subspaces can be arranged into clusters of size \({n}_{k}>1\) such that the intra-cluster distances are smaller than \(\sqrt{2}\delta \gamma\), and the inter-cluster distances are larger than \(\sqrt{2}\Delta \Gamma\) (see Fig. 4). Notice that this can always be done as long as \({N}>3\). However, depending on the arrangement, \(\delta\) could be too large or \(\Delta\) too small, resulting in a loose bound. Ideally we want a small \(\delta\) and a large \(\Delta\), so that the subspaces form well-defined clusters and \(e^{\delta ^2-\Delta ^2}\) is small, resulting in a tighter bound.
Proof
Theorem 1 follows by a similar strategy as in [62], which essentially bounds the optimal loss by that of an artificial embedding. In our case we will use an embedding that maps \(\{{{\mathcal{U}}}_1,\dots ,{{\mathcal{U}}}_{K}\}\) to \({K}\) points uniformly distributed in the circle of radius 1/2, i.e., \({{\textbf {p}}}_{i}={{\textbf {p}}}_{j}\) for every \({{\mathbb {U}}}_{i},{{\mathbb {U}}}_{j}\in {{\mathcal{U}}}_{k}\) (see Fig. 4). This way, for any subspaces \({{\mathbb {U}}}_{i},{{\mathbb {U}}}_{j}\) in different clusters \({{\mathcal{U}}}_{k},{{\mathcal{U}}}_\ell\), the geodesic distance of their embeddings on the Poincaré disk is upper and lower bounded by
It follows that the \(({i},{j}){\textrm{th}}\) entry of \({{\textbf {P}}}_{\mathbb {D}}\) is bounded by
where the denominator is precisely D as defined in (7). Now, if \({{\mathbb {U}}}_{i}\) and \({{\mathbb {U}}}_{j}\) are in the same cluster \({{\mathcal{U}}}_{k}\), the bound in (8) simplifies to 1/D. Otherwise, it simplifies to \(\exp (-2.2^2/\beta )/D\). Plugging these bounds in the loss, we see that:
Next notice that if \({{\mathbb {U}}}_{i}, {{\mathbb {U}}}_{j}\) are not in the same cluster,
Plugging this into (9) we obtain the desired result. \(\square\)
Experiments
Recall that the main motivation of this paper is to develop a novel method to visualize Grassmannians of high ambient dimension. Our bound above describes the theoretical accuracy of our embedding. We now present a series of experiments on real and synthetic datasets to analyze its practical performance. In particular, we test on simulated data with normally distributed entries, and on one canonical dataset [63]. These datasets have moderately high ambient dimension (i.e., many features), but low intrinsic dimension (i.e., lie in a low-dimensional subspace). In other words, these datasets would fit in high-dimensional Grassmannians of low-dimensional subspaces. We believe that these well-studied datasets are a perfect fit for our setting, and convenient for an initial exploration and comparison against existing baselines.
Comparison Baseline. To evaluate the effectiveness of our method we used t-SNE [54], GDMaps [51], and a naive visualization based on the most common dimensionality reduction technique: Principal Components Analysis (PCA). To this end we first vectorize (stack the columns of) each orthonormal basis \({{\textbf {U}}}_{i}\) into a vector \({{\textbf {u}}}_{i}\in {\mathbb {R}}^{{m}{r}}\). Next we concatenate all vectors \({{\textbf {u}}}_{i}\) into a matrix of size \({m}{r}\times {N}\), on which we apply PCA. In this naive PCA (nPCA) visualization, the subspace \({{\mathbb {U}}}_{i}\) is represented in the (x, y) plane by \({{\textbf {v}}}_{i}\in {\mathbb {R}}^2\), the coefficients of \({{\textbf {u}}}_{i}\) w.r.t. the leading principal plane \({{\mathbb {V}}}\).
Visualizing Synthetic Subspace Clusters. In our synthetic experiments we study our embedding when the subspaces are uniformly distributed among \({K}\) clusters. To this end we first generated \({K}\) centers in the Grassmannian \({\mathbb {G}}({m},{r})\), each defined by an \({m}\times {r}\) matrix \({{\textbf {C}}}_{k}\) with i.i.d. standard normal entries whose columns are later orthonormalized. Then for each \({k}\) we independently generate \({n}_{k}\) subspaces, each spanned by a basis \({{\textbf {U}}}_{i}\) whose entries are equal to those of \({{\textbf {C}}}_{k}\) plus i.i.d. normal random variables with variance \(\sigma ^2\). This produces \({K}\) clusters in \({\mathbb {G}}({m},{r})\), each with \({n}_{k}\) subspaces. The smaller \(\sigma\), the closer the subspaces in the same cluster will be to one another, and vice versa.
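A sketch of this generative procedure is given below; variable names are ours, and orthonormalization is done via a QR decomposition.

```python
import numpy as np

def generate_subspace_clusters(K, n_k, m, r, sigma=0.1, seed=0):
    """Generate K clusters of n_k subspaces each in G(m, r)."""
    rng = np.random.default_rng(seed)
    bases, labels = [], []
    for k in range(K):
        C_k, _ = np.linalg.qr(rng.standard_normal((m, r)))      # orthonormalized cluster center
        for _ in range(n_k):
            B = C_k + sigma * rng.standard_normal((m, r))       # perturb the center entries
            U, _ = np.linalg.qr(B)                               # orthonormal basis of the perturbed span
            bases.append(U[:, :r])
            labels.append(k)
    return bases, np.array(labels)
```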
In our first experiment we study a controlled setting where we can actually visualize the low-dimensional Grassmannian \({\mathbb {G}}(3,1)\), and compare it with our embedding on the Poincaré disk. We hope that this experiment provides a visual intuition of how points are embedded in higher dimensional cases. To this end we generated \({n}_{k}=50\) subspaces per cluster (\({m}=3\), \({r}=1\)), and we set \(\sigma =0.1\), which produced visually well-defined cluster clouds. Figure 5 shows the results for \({K}=2,3,4,5\) clusters. At first glance it might appear that our GrassCaré embedding is not too different from the other approaches, especially as t-SNE and nPCA seem to be doing a decent job of displaying the clusters. However, a more careful look reveals that t-SNE clearly agglomerates several pairs of clusters, while GrassCaré separates them nicely.
In particular, notice that in Fig. 5, both nPCA and even the classic 3D representation fail to show the true local structure of the Grassmannian that the GrassCaré plot reveals. To see this pay special attention to the cyan and yellow clusters. Based on the first two rows (classic 3D representation and nPCA) these clusters would appear to be nearly diametrically apart (in the 3D representation, the cyan cluster is on the back side of the hemisphere). However, computing their geodesics one can verify that the subspaces that they represent are in fact quite close in the Grassmannian. An intuitive way to see this is to extend the lines to the opposite side of the hemisphere and compute their smallest angle, or to remember that in the 3D representation, the hemisphere wraps around the edge (see Fig. 4). In contrast, our GrassCaré plot accurately displays the true global structure of the Grassmannian, mapping these two clusters close to one another. Also notice that the embeddings are plotted with equal scale on the horizontal and vertical axes. GrassCaré makes better use of the visual space, spreading all data more broadly while at the same time keeping the clusters well-defined. In contrast, GDMaps has much less range on the horizontal axis, which makes it look like a straight line and prevents it from displaying the full information.
The previous experiment shows the qualitative superiority of the GrassCaré embedding over alternative embeddings in the low-dimensional case \({m}=3\) and \({r}=1\) (where no vectorization is needed for nPCA). In our next experiment we show in a more quantitative way that the advantages of GrassCaré are even more evident in higher dimensional cases. First notice that the classic 3D representation only applies to the case \({m}=3\), \({r}=1\), and there is no clear way to extend it to higher dimensions. On the other hand, recall that for \({r}>1\), nPCA requires vectorizing the bases \({{\textbf {U}}}_{i}\), which will naturally interfere even more with the structure of the Grassmannian. To see this, consider two orthonormal bases of the same subspace.
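One such pair, chosen here for illustration, is
\[
{{\textbf {U}}}_1 \;=\; \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix},
\qquad
{{\textbf {U}}}_2 \;=\; \begin{pmatrix} 0 & -1 \\ 1 & 0 \\ 0 & 0 \end{pmatrix}.
\]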
While both span the same subspace in \({\mathbb {G}}(3,2)\), the Euclidean distance of their vectorizations is large, which would result in distant points in the nPCA embedding. t-SNE and GDMaps exhibit similar inaccuracies. To verify this we generated subspaces in the exact same way as described before (with \({K}=3\), \({n}_{k}=17\), and different values of \({m}\) and \({r}\)), except this time we measured the quality of the visualization in terms of the representation error, which we define as the Frobenius difference between the (normalized) distance matrices produced by the subspaces in the Grassmannian and the points in each embedding. In the case of GrassCaré, distances in the embedding are measured according to the Poincaré geodesics, so the representation error of the GrassCaré embedding will be measured as
where \(Z_{\mathbb {G}}^2 = \sum _{{i},{j}} d^2_{\mathbb {G}}({{\mathbb {U}}}_{i},{{\mathbb {U}}}_{j})\) and \(Z_{\mathbb {D}}^2 = \sum _{{i},{j}} d^2_{\mathbb {D}}({{\textbf {p}}}_{i},{{\textbf {p}}}_{j})\) are normalization terms. Similarly, since the distance that all other embeddings aim to minimize is Euclidean, the representation error of the other embeddings will be measured as
where \(Z_{{\mathbb {V}}}^2 \ = \ \sum _{{i},{j}} \Vert {{\textbf {v}}}_{i}-{{\textbf {v}}}_{j}\Vert ^2\) is a normalization term. The results of 100 trials are summarized in Fig. 6, which confirms the superiority of our GrassCaré embedding, and the loss of structure of the naive approach. The computation time is also summarized in Fig. 3.
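Both representation errors above can be computed as in the following sketch, which takes the Frobenius difference of the normalized pairwise-distance matrices just described; function and argument names are ours.

```python
import numpy as np

def representation_error(dist_high, dist_low):
    """Frobenius difference between normalized pairwise-distance matrices.

    dist_high: N x N geodesic distances in the Grassmannian;
    dist_low:  N x N distances in the embedding (Poincare or Euclidean)."""
    Z_high = np.sqrt(np.sum(dist_high ** 2))    # normalization for the Grassmannian distances
    Z_low = np.sqrt(np.sum(dist_low ** 2))      # normalization for the embedding distances
    return np.linalg.norm(dist_high / Z_high - dist_low / Z_low, ord="fro")
```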
Visualizing Subspace Clusters from Real Data. In this experiment, we aim to visualize the relationship between subspaces formed by groups of images in the MNIST and FashionMNIST datasets. To achieve this, we first randomly grouped the images within each category into sets of 70 pictures. Next, we applied a singular value decomposition to extract a basis for each group. From the decomposition, we selected the top 2–3 singular vectors to serve as the general basis for each subclass. Subsequently, we utilized nPCA, GDMaps, t-SNE, and GrassCaré (this paper) on the selected bases to further analyze the data.
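A sketch of this preprocessing step for a single class is shown below; variable names and the flattening of images into vectors are ours.

```python
import numpy as np

def class_to_subspaces(images, group_size=70, rank=3, seed=0):
    """Split one class of images into random groups and return an
    orthonormal basis (top left singular vectors) for each group.

    images: array of shape (num_images, height * width), one class only."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))
    bases = []
    for start in range(0, len(idx) - group_size + 1, group_size):
        group = images[idx[start:start + group_size]].T      # columns are flattened images
        U, _, _ = np.linalg.svd(group, full_matrices=False)
        bases.append(U[:, :rank])                             # top-`rank` left singular vectors
    return bases
```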
In Fig. 7, we present the visualization results obtained through this comprehensive process. An in-depth analysis of these results reveals that GrassCaré exhibits a remarkable ability to visualize low-rank subspaces effectively. Notably, this method excels in distinctly presenting each subclass as a distinguishable cluster, capturing the intrinsic structure of the data. In contrast, other commonly used visualization techniques, such as GDMaps, show limitations in certain scenarios. Specifically, GDMaps may struggle to fully separate certain classes, resulting in overlapping or less well-defined clusters. Similarly, t-SNE might encounter challenges in correctly grouping certain subclasses together, potentially leading to less cohesive and less interpretable visualizations.
It is important to acknowledge that applying these methods directly to the raw dataset might yield different outcomes and potentially showcase superior results for certain techniques. However, comparing the results from raw data with those from the subclasses derived using GrassCaré would be akin to comparing apples with oranges, rendering such a comparison meaningless.
Subspace Estimation from Incomplete Data. In our next experiment we apply our GrassCaré embedding to visualize the path of subspaces produced by the subspace estimation algorithm known as GROUSE (Grassmannian Rank-One Update Subspace Estimation) [32]. The applicability of this algorithm ranges from online video analysis (to track the subspace of the background in real time) to subspace clustering [18] and low-rank matrix completion (LRMC) [32]. In the latter, the algorithm receives a subset of the entries of a data matrix \({{\textbf {X}}}\in {\mathbb {R}}^{{m}\times {n}}\) whose columns lie in an unknown subspace \({{\mathbb {U}}}^\star \in {\mathbb {G}}({m},{r})\), and the goal is to estimate \({{\mathbb {U}}}^\star\). To this end GROUSE starts with a subspace estimate \({{\mathbb {U}}}_0 \in {\mathbb {G}}({m},{r})\), and iteratively tilts it in the direction of a column of \({{\textbf {X}}}\), producing a sequence of subspaces \({{\mathbb {U}}}_1,\dots ,{{\mathbb {U}}}_{N}\).
To emulate this setup we first generate true and initial subspaces with bases \({{\textbf {U}}}^\star ,{{\textbf {U}}}_0 \in {\mathbb {R}}^{{m}\times {r}}\) with i.i.d. standard normal entries. Next we generate a coefficient matrix \({\varvec{\Theta }}\in {\mathbb {R}}^{{r}\times {n}}\) with i.i.d. standard normal entries, so that \({{\textbf {X}}}={{\textbf {U}}}^\star {\varvec{\Theta }}\) is rank-\({r}\). Then we run GROUSE using a fraction \(\Omega\) of the entries of \({{\textbf {X}}}\), selected uniformly at random. We store each of GROUSE’s steps, and visualize their path \({{\mathbb {U}}}_0,{{\mathbb {U}}}_1,\dots ,{{\mathbb {U}}}_{N}\) (together with the target \({{\mathbb {U}}}^\star\)) using our GrassCaré embedding, nPCA, GDMaps, and t-SNE. Figure 8 shows sample plots when \({m}=200\), \({r}=5\), \(\Omega =0.7\) (corresponding to 30% missing data; both cases share the same initialization), and \({n}={N}=50\) (corresponding to the case where GROUSE only iterates once over each column). Here once again the GrassCaré plot shows a richer depiction of the subspaces and a better usage of the available visual space. From the GrassCaré plot we can clearly visualize each separate path, and see that, as expected, the full-data estimate gets much closer to the target than the missing-data estimate (and much faster). In contrast, the paths in the nPCA plot are hardly distinguishable and misleading, showing the opposite of the truth: an incomplete-data estimate much closer to the target than the full-data estimate. To verify once again (beyond our visual interpretation) that the GrassCaré embedding is much more representative of the true distribution of subspaces in the Grassmannian, we measured the distance of each iterate to the target, in the Grassmannian and in each embedding. The normalized results are in Fig. 9. They show that the trajectories in the GrassCaré embedding mimic closely those in the Grassmannian. In contrast, the nPCA embedding can be quite misleading, showing in fact an opposite representation, with distances in the embedding growing over iterations, while in reality they are decreasing in the Grassmannian.
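The data-generation side of this setup can be sketched as follows (the GROUSE iterations themselves follow [32] and are omitted here); the distance of each iterate to the target, used in Fig. 9, is then measured with the Grassmannian geodesic defined earlier. Function and variable names are ours.

```python
import numpy as np

def generate_lrmc_problem(m=200, r=5, n=50, obs_frac=0.7, seed=0):
    """Rank-r data matrix with a random mask of observed entries."""
    rng = np.random.default_rng(seed)
    U_star, _ = np.linalg.qr(rng.standard_normal((m, r)))   # true subspace basis
    U_0, _ = np.linalg.qr(rng.standard_normal((m, r)))      # initial subspace estimate
    Theta = rng.standard_normal((r, n))                      # coefficients
    X = U_star @ Theta                                       # rank-r data matrix
    mask = rng.random((m, n)) < obs_frac                     # observed entries (70% here)
    return X, mask, U_star, U_0
```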
Motion Segmentation. In our final experiment we use our GrassCaré plot to visualize the subspaces describing moving objects in videos from the Hopkins 155 dataset [63]. This dataset contains the locations over time of landmarks of several moving objects (e.g., cars, buses, or checkerboards) in 155 video sequences. Recall that the stacked landmarks of each rigid object over time approximately lie in a four-dimensional subspace [64, 65]. So for our experiment we split all landmarks of the same object into groups of 5 (if at any point there were fewer than 5 landmarks left, they were discarded), and for each group we performed a singular value decomposition to identify its four-dimensional principal subspace \({{\mathbb {U}}}_{i}\). Figure 10 shows the embedding of the subspaces of all groups, color-coded by object. Notice that GrassCaré displays the subspaces of the same object nearby. This is consistent with theory, as they represent slightly noisy versions of the subspace describing the object's trajectory. Notice the higher variance in the yellow cluster, which is consistent with its landmarks, corresponding to several trees, cars, and pavement, as opposed to just one rigid object. Beyond that, from the GrassCaré embedding we can also analyze the trajectories themselves, and their relationships. For instance, in the traffic plot we can see that the green and red clusters (corresponding to the moving car and van) are close to one another, indicating that their trajectories resemble each other. In contrast, these clusters are farther from the yellow one, which matches our observation that the trajectories of the moving car and van are quite different from the nearly static background.
Conclusions and Limitations
This paper introduces GrassCaré, a novel method for embedding Grassmannian points onto a 2-D disk with exceptional precision and a well-distributed visual representation. GrassCaré promises to be a potent tool for visualizing subspaces extracted from high-dimensional real-world data, enabling researchers to explore both local and global structures, such as paths and clusters, with greater ease and clarity.
Through the incorporation of the Adam optimizer in our enhanced prototype, we have achieved remarkable advancements over the original algorithm, resulting in significant speed and efficiency improvements. GrassCaré now operates at orders of magnitude faster speeds, empowering it to handle datasets of unprecedented scale. Furthermore, our main theoretical finding demonstrates that the representation loss of GrassCaré is provably bounded under mild assumptions, further reinforcing the reliability and robustness of the method. Finally, we provide compelling evidence of GrassCaré's superiority by applying it to real-world datasets. The results not only showcase its ability to optimize space utilization within the unit circle but also effectively mitigate the visual distortion arising from disparate axis scales, unlike the shortcomings observed in GDMaps.
In conclusion, GrassCaré emerges as a promising and indispensable tool for data visualization, empowering researchers to glean meaningful insights from complex high-dimensional data while preserving fidelity and clarity in the representation. As we continue to explore and refine this method, we anticipate its widespread adoption in various domains, making a lasting impact on the field of data analysis and visualization.
Data Availability
The dataset of MNIST is available at https://doi.org/10.24432/C53K8Q. The dataset of FashionMNIST is available at https://github.com/zalandoresearch/fashion-mnist. The dataset of Hopkins155 is available at http://www.vision.jhu.edu/data/hopkins155
Code Availability
The code for this project is accessible through a public Github repository, which can be accessed at the following link: https://github.com/HuanranLi/GrassCare-Plot.
Notes
A mapping R from the tangent bundle \(T{\mathcal {M}}\) to the manifold \({\mathcal {M}}\) such that its restriction to the tangent space of \({\mathcal {M}}\) at \({{\textbf {p}}}_{i}\) satisfies a local rigidity condition which preserves gradients at \({{\textbf {p}}}_{i}\); see Chapters 3 and 4 of [58] for a more careful treatment of these definitions.
References
Knudsen T. Consistency analysis of subspace identification methods based on a linear regression approach. Automatica. 2001;37(1):81–9.
Jansson M, Wahlberg B. A linear regression approach to state-space subspace system identification. Signal Process. 1996;52(2):103–29.
Vaswani N, Bouwmans T, Javed S, Narayanamurthy P. Robust subspace learning: robust PCA, robust subspace tracking, and robust subspace recovery. IEEE Signal Process Mag. 2018;35(4):32–55.
Dai W, Milenkovic O, Kerman E. Subspace evolution and transfer (set) for low-rank matrix completion. IEEE Trans Signal Process. 2011;59(7):3120–32.
Vidal R, Favaro P. Low rank subspace clustering (LRSC). Pattern Recogn Lett. 2014;43:47–61.
Cao J, Zhang K, Luo M, Yin C, Lai X. Extreme learning machine and adaptive sparse representation for image classification. Neural Netw. 2016;81:91–102.
Chen G, Lerman G. Spectral curvature clustering (SCC). Int J Comput Vis. 2009;81(3):317–30.
Hong W, Wright J, Huang K, Ma Y. Multiscale hybrid linear models for lossy image representation. IEEE Trans Image Process. 2006;15(12):3655–71.
Lu L, Vidal R. Combined central and subspace clustering for computer vision applications; 2006. p. 593–600.
Koohi H, Kiani K. A new method to find neighbor users that improves the performance of collaborative filtering. Expert Syst Appl. 2017;83:30–9.
Ullah F, Sarwar G, Lee S. N-screen aware multicriteria hybrid recommender system using weight based subspace clustering. Sci World J 2014;2014:1–10.
Zhang W, Wang Q, Yoshida T, Li J. RP-LGMC: rating prediction based on local and global information with matrix clustering. Comput Oper Res. 2021;129:105228.
Sun W, Zhang L, Du B, Li W, Lai YM. Band selection using improved sparse subspace clustering for hyperspectral imagery classification. IEEE J Sel Top Appl Earth Observ Remote Sens. 2015;8(6):2784–97.
Ahmed MS, Khan L. SISC: a text classification approach using semi supervised subspace clustering. In: 2009 IEEE international conference on data mining workshops. IEEE; 2009. p. 1–6.
Xia C-Q, Han K, Qi Y, Zhang Y, Yu D-J. A self-training subspace clustering algorithm under low-rank representation for cancer classification on gene expression data. IEEE/ACM Trans Comput Biol Bioinform. 2017;15(4):1315–24.
Mevel L, Hermans L, Auweraer H. Application of a subspace-based fault detection method to industrial structures. Mech Syst Signal Process. 1999;13(6):823–38.
Goodman-Strauss C. Compass and straightedge in the poincaré disk. Am Math Monthly. 2001;108(1):38–49.
Parsons L, Haque E, Liu H. Subspace clustering for high dimensional data: a review. ACM SIGKDD Explor Newsl. 2004;6(1):90–105.
Yang AY, Wright J, Ma Y, Sastry SS. Unsupervised segmentation of natural images via lossy data compression. Comput Vis Image Understand. 2008;110(2):212–25.
Vidal R, Tron R, Hartley R. Multiframe motion segmentation with missing data using powerfactorization and GPCA. Int J Comput Vis. 2008;79(1):85–105.
Elhamifar E, Vidal R. Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell. 2013;35(11):2765–81.
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data; 1998. p. 94–105.
Bahadori MT, Kale D, Fan Y, Liu Y. Functional subspace clustering with application to time series. In: International conference on machine learning. PMLR; 2015. p. 228–237.
Recht B. A simpler approach to matrix completion. J Mach Learn Res. 2011;12(12):3413–30.
Kang Z, Peng C, Cheng Q. Top-n recommender system via matrix completion. In: Proceedings of the AAAI conference on artificial intelligence, vol. 30; 2016.
Ji H, Liu C, Shen Z, Xu Y. Robust video denoising using low rank matrix completion. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE; 2010. p. 1791–8.
Zhang H, Ericksen SS, Lee C-P, Ananiev GE, Wlodarchak N, Yu P, Mitchell JC, Gitter A, Wright SJ, Hoffmann FM, et al. Predicting kinase inhibitors using bioactivity matrix derived informer sets. PLoS Comput Biol. 2019;15(8):1006813.
Lee D, Kim BH, Kim KJ. Detecting method on illegal use using PCA under her environment. In: 2010 International Conference on Information Science and Applications. IEEE; 2010. p. 1–6.
He J, Balzano L, Lui J. Online robust subspace tracking from partial information. arXiv preprint arXiv:1109.3827; 2011.
Xu J, Ithapu VK, Mukherjee L, Rehg JM, Singh V. GOSUS: Grassmannian online subspace updates with structured-sparsity. In: Proceedings of the IEEE international conference on computer vision; 2013. p. 3376–3383.
Stewart GW. An updating algorithm for subspace tracking. Technical report; 1998.
Balzano L, Nowak R, Recht B. Online identification and tracking of subspaces from highly incomplete information. In: 2010 48th Annual Allerton conference on communication, control, and computing (Allerton). IEEE; 2010. p. 704–711.
Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, et al. Genes mirror geography within Europe. Nature. 2008;456(7218):98–101.
Song Y, Westerhuis JA, Aben N, Michaut M, Wessels LF, Smilde AK. Principal component analysis of binary genomics data. Brief Bioinform. 2019;20(1):317–29.
Wu J, Zhang X. A PCA classifier and its application in vehicle detection. In: IJCNN’01. International Joint Conference on Neural Networks. Proceedings (Cat. No. 01CH37222), vol. 1. IEEE; 2001. p. 600–604.
Wu J, Zhang X, Zhou J. Vehicle detection in static road images with PCA-and-wavelet-based classifier. In: ITSC 2001. 2001 IEEE Intelligent Transportation Systems. Proceedings (Cat. No. 01TH8585). IEEE; 2001. p. 740–744.
Kirby M, Peterson C. Visualizing data sets on the Grassmannian using self-organizing mappings. In: 2017 12th International Workshop on Self-organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM). IEEE; 2017. p. 1–6.
Kohonen T. Self-organized formation of topologically correct feature maps. Biol Cybern. 1982;43(1):59–69.
Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
Kohonen T. The self-organizing map. Neurocomputing. 1998;21(1–3):1–6.
Kohonen T. Essentials of the self-organizing map. Neural Netw. 2013;37:52–65.
McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426; 2018.
Tang J, Liu J, Zhang M, Mei Q. Visualizing large-scale and high-dimensional data. In: Proceedings of the 25th international conference on world wide web; 2016. p. 287–297.
Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst. 2001;14:1–7.
Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003;15(6):1373–96.
Tenenbaum JB, Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290(5500):2319–23.
Liu S, Maljovec D, Wang B, Bremer P-T, Pascucci V. Visualizing high-dimensional data: advances in the past decade. IEEE Trans Vis Comput Graph. 2016;23(3):1249–68.
Engel D, Hüttenberger L, Hamann B. A survey of dimension reduction methods for high-dimensional data analysis and visualization. In: Visualization of large and unstructured data sets: applications in geospatial planning, modeling and engineering-proceedings of IRTG 1131 Workshop 2011. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik; 2012.
Ashokkumar P, Don S. High dimensional data visualization: a survey. J Adv Res Dyn Control Syst. 2017;9(12):851–66.
Kiefer A, Rahman M, et al. An analytical survey on recent trends in high dimensional data visualization. arXiv preprint arXiv:2107.01887; 2021.
Santos KR, Giovanis DG, Shields MD. Grassmannian diffusion maps based dimension reduction and classification for high-dimensional data. arXiv preprint arXiv:2009.07547; 2020.
Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc Natl Acad Sci. 2005;102(21):7426–31.
Hinton G, Roweis ST. Stochastic neighbor embedding. In: NIPS, vol. 15. Citeseer; 2002. p. 833–840.
Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11):1–2.
Nickel M, Kiela D. Poincaré embeddings for learning hierarchical representations. Adv Neural Inf Process Syst. 2017;30:6338–47.
Klimovskaia A, Lopez-Paz D, Bottou L, Nickel M. Poincaré maps for analyzing complex hierarchies in single-cell data. Nat Commun. 2020;11(1):1–9.
Bonnabel S. Stochastic gradient descent on Riemannian manifolds. IEEE Trans Autom Control. 2013;58(9):2217–29.
Absil P-A, Mahony R, Sepulchre R. Optimization algorithms on matrix manifolds. Princeton: Princeton University Press; 2009.
Li H, Pimentel-Alarcón D. Visualizing Grassmannians via poincare embeddings; 2023.
Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980; 2014.
Adams LM, Nazareth JL, et al. Linear and nonlinear conjugate gradient-related methods, vol. 85. Philadelphia: SIAM; 1996.
Shaham U, Steinerberger S. Stochastic neighbor embedding separates well-separated clusters. arXiv preprint arXiv:1702.02670; 2017.
Tron R, Vidal R. A benchmark for the comparison of 3-D motion segmentation algorithms. In: 2007 IEEE conference on computer vision and pattern recognition. IEEE; 2007. p. 1–8.
Tomasi C, Kanade T. Shape and motion from image streams under orthography: a factorization method. Int J Comput Vis. 1992;9(2):137–54.
Kanatani K-I. Motion segmentation by subspace separation and model selection. In: Proceedings eighth IEEE international conference on computer vision. ICCV 2001, vol. 2. IEEE; 2001. p. 586–91.
Acknowledgements
This work was partially supported by the National Science Foundation under Grant No. CAREER CCF-2239479.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.