Information Visualization Theory and Applications

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: 31 March 2025 | Viewed by 9041

Special Issue Editors


Guest Editor
Department of Computer Science and Information Technology, University of the District of Columbia, Washington, DC 20759, USA
Interests: HCI; visual analytics; big data analytics; information visualization

Guest Editor
Department of Computer Science, Bowie State University, Bowie, MD 20715, USA
Interests: data science; medical informatics and visualization; time-series analysis

Guest Editor
Department of Management and Decision Sciences, Coastal Carolina University, Conway, SC 29528, USA
Interests: data analytics; visualization; management information systems; HCI

Special Issue Information

Dear Colleagues,

In data science, information visualization has become a central means of understanding complex scientific problems and data. Designing innovative visualization systems is therefore essential for addressing and solving various domain problems. A theoretical understanding of visualization models and design principles, spanning visual encoding, interaction, and analysis tasks, is critical to solving domain problems and understanding data effectively. Identifying implications from theories of perception, cognition, design, and aesthetics is equally important, as is establishing automated design guidelines and visualization recommendations that reveal the scientific limits of understanding data through visualization.

This Special Issue seeks high-quality papers that highlight outstanding visualization challenges, elucidate solutions for understanding domain problems through visualization, and present theoretical visualization principles that advance visualization techniques and applications.

Dr. Dong Hyun Jeong
Dr. Soo-Yeon Ji
Dr. Bong-Keun Jeong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information visualization
  • science of user interactions in visualization
  • visualization theory
  • knowledge-assisted visualization
  • visualization technique
  • visualization in machine learning
  • visualization applications and design studies
  • evaluation and empirical research in visualization
  • visual data analysis and knowledge discovery
  • visual representation and interaction
  • visualization applications
  • visualization taxonomies and models
  • visualization algorithms and technologies
  • uncertainty visualization
  • visualization tools and systems for simulation and modeling

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

30 pages, 75307 KiB  
Article
Bridging the Gap Between Theory and Practice: Fitness Landscape Analysis of Real-World Problems with Nearest-Better Network
by Yiya Diao, Changhe Li, Junchen Wang, Sanyou Zeng and Shengxiang Yang
Information 2025, 16(3), 190; https://doi.org/10.3390/info16030190 - 1 Mar 2025
Viewed by 119
Abstract
For a long time, there has been a gap between theoretical optimization research and real-world applications. A key challenge is that many real-world problems are black-box problems, making it difficult to identify their characteristics and, consequently, select the most effective algorithms to solve them. Fortunately, the Nearest-Better Network (NBN) has emerged as an effective tool for analyzing the characteristics of problems, regardless of dimensionality. In this paper, we conduct an in-depth experimental analysis of real-world functions from the CEC 2022 and CEC 2011 competitions using the NBN. Our experiments reveal that real-world problems often exhibit characteristics such as unclear global structure, multiple attraction basins, vast neutral regions around the global optimum, and high levels of ill conditioning.
(This article belongs to the Special Issue Information Visualization Theory and Applications)
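The Nearest-Better Network at the heart of this analysis has a simple construction: each sampled solution is linked to the closest solution with a strictly better fitness value, and the resulting edge lengths expose structure such as separate attraction basins or large neutral regions. The following minimal Python sketch illustrates that construction under the assumptions of Euclidean distance and minimization; the function names and the toy objective are illustrative, not the authors' code.

import numpy as np

def nearest_better_network(X, fitness):
    """Link each sampled solution to its nearest strictly better neighbor.

    X       : (n, d) array of sampled solutions.
    fitness : (n,) array of objective values (minimization assumed).
    Returns a list of (i, j, distance) edges; the best sample has no edge.
    """
    edges = []
    for i in range(len(X)):
        better = np.where(fitness < fitness[i])[0]        # strictly better samples
        if better.size == 0:
            continue                                      # best sample in the set
        dists = np.linalg.norm(X[better] - X[i], axis=1)  # Euclidean distances
        k = int(np.argmin(dists))
        edges.append((i, int(better[k]), float(dists[k])))
    return edges

# Toy usage with 2500 uniform samples of a 2-D sphere function
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(2500, 2))
f = np.sum(X**2, axis=1)
nbn_edges = nearest_better_network(X, f)

Unusually long nearest-better edges are the kind of signal that separates multi-basin or neutral landscapes from well-conditioned unimodal ones, which is consistent with the characteristics reported above.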
Figures (omitted): transformation from the original fitness landscape of CEC 2022 f9 to the NBN visualization with 2500 samples; CEC 2022 functions at D = 2; Nearest-Better Networks of the functions with the sample sets S_even and S_best; basins of attraction of the global optima; and the behaviors of EA4, ANDE, and HillVall on selected CEC 2022 and CEC 2011 problems.
22 pages, 3983 KiB  
Article
Leveraging Machine Learning to Analyze Semantic User Interactions in Visual Analytics
by Dong Hyun Jeong, Bong Keun Jeong and Soo Yeon Ji
Information 2024, 15(6), 351; https://doi.org/10.3390/info15060351 - 13 Jun 2024
Viewed by 1079
Abstract
In the field of visualization, understanding users’ analytical reasoning is important for evaluating the effectiveness of visualization applications. Several studies have been conducted to capture and analyze user interactions to comprehend this reasoning process. However, few have successfully linked these interactions to users’ reasoning processes. This paper introduces an approach that addresses the limitation by correlating semantic user interactions with analysis decisions using an interactive wire transaction analysis system and a visual state transition matrix, both designed as visual analytics applications. The system enables interactive analysis for evaluating financial fraud in wire transactions. It also allows mapping captured user interactions and analytical decisions back onto the visualization to reveal their decision differences. The visual state transition matrix further aids in understanding users’ analytical flows, revealing their decision-making processes. Classification machine learning algorithms are applied to evaluate the effectiveness of our approach in understanding users’ analytical reasoning process by connecting the captured semantic user interactions to their decisions (i.e., suspicious, not suspicious, and inconclusive) on wire transactions. With the algorithms, an average of 72% accuracy is determined to classify the semantic user interactions. For classifying individual decisions, the average accuracy is 70%. Notably, the accuracy for classifying ‘inconclusive’ decisions is 83%. Overall, the proposed approach improves the understanding of users’ analytical decisions and provides a robust method for evaluating user interactions in visualization tools.
(This article belongs to the Special Issue Information Visualization Theory and Applications)
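A visual state transition matrix of the kind described here can be understood as a first-order summary of an interaction log: each captured interaction is mapped to a semantic state, transitions between consecutive states are counted, and every row is normalized into probabilities. Below is a minimal Python sketch of that summary; the state names, the example log, and the classifier mentioned in the comments are assumptions for illustration, not the authors' implementation.

import numpy as np

# Hypothetical semantic interaction states for a wire transaction analysis session
STATES = ["heatmap", "strings_and_beads", "keyword_relation", "decision"]

def transition_matrix(state_sequence, states=STATES):
    """Count transitions between consecutive semantic states and
    normalize each row into transition probabilities."""
    index = {s: k for k, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(state_sequence[:-1], state_sequence[1:]):
        counts[index[a], index[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# One user's (hypothetical) interaction log and its transition matrix
log = ["heatmap", "heatmap", "strings_and_beads", "keyword_relation",
       "heatmap", "strings_and_beads", "decision"]
P = transition_matrix(log)   # rows: current state, columns: next state

# Flattened matrices (P.ravel()) can serve as per-user features for a standard
# classifier, e.g. sklearn.ensemble.RandomForestClassifier, to predict the
# final decision (suspicious / not suspicious / inconclusive).

Feeding such per-user features to standard classifiers is in the spirit of the accuracy figures quoted in the abstract, though the paper's exact feature set and algorithms may differ.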
Figures (omitted): the interactive wire transaction analysis system (heatmap, strings and beads, and keyword relation views, plus the PCA projection and data views of the investigation tracing tool); investigation results colored by decision (suspicious, not suspicious, inconclusive); and the visual state transition matrix summarizing an analyst's semantic user interactions.
22 pages, 9658 KiB  
Article
Justification vs. Transparency: Why and How Visual Explanations in a Scientific Literature Recommender System
by Mouadh Guesmi, Mohamed Amine Chatti, Shoeb Joarder, Qurat Ul Ain, Clara Siepmann, Hoda Ghanbarzadeh and Rawaa Alatrash
Information 2023, 14(7), 401; https://doi.org/10.3390/info14070401 - 14 Jul 2023
Cited by 4 | Viewed by 2759
Abstract
Significant attention has been paid to enhancing recommender systems (RS) with explanation facilities to help users make informed decisions and increase trust in and satisfaction with an RS. Justification and transparency represent two crucial goals in explainable recommendations. Different from transparency, which faithfully exposes the reasoning behind the recommendation mechanism, justification conveys a conceptual model that may differ from that of the underlying algorithm. An explanation is an answer to a question. In explainable recommendation, a user would want to ask questions (referred to as intelligibility types) to understand the results given by an RS. In this paper, we identify relationships between Why and How explanation intelligibility types and the explanation goals of justification and transparency. We followed the Human-Centered Design (HCD) approach and leveraged the What–Why–How visualization framework to systematically design and implement Why and How visual explanations in the transparent Recommendation and Interest Modeling Application (RIMA). Furthermore, we conducted a qualitative user study (N = 12) based on a thematic analysis of think-aloud sessions and semi-structured interviews with students and researchers to investigate the potential effects of providing Why and How explanations together in an explainable RS on users’ perceptions regarding transparency, trust, and satisfaction. Our study shows qualitative evidence confirming that the choice of the explanation intelligibility types depends on the explanation goal and user type.
(This article belongs to the Special Issue Information Visualization Theory and Applications)
Figures (omitted): the Why and How visual explanations across three design iterations and the final RIMA implementation (Overview and Detailed variants of the How explanation); results from the ResQue questionnaire; and overall user experience with the Why and How explanations.
17 pages, 12932 KiB  
Article
On Isotropy of Multimodal Embeddings
by Kirill Tyshchuk, Polina Karpikova, Andrew Spiridonov, Anastasiia Prutianova, Anton Razzhigaev and Alexander Panchenko
Information 2023, 14(7), 392; https://doi.org/10.3390/info14070392 - 10 Jul 2023
Cited by 4 | Viewed by 3090
Abstract
Embeddings, i.e., vector representations of objects, such as texts, images, or graphs, play a key role in deep learning methodologies nowadays. Prior research has shown the importance of analyzing the isotropy of textual embeddings for transformer-based text encoders, such as the BERT model. Anisotropic word embeddings do not use the entire space, instead concentrating on a narrow cone in such a pretrained vector space, negatively affecting the performance of applications, such as textual semantic similarity. Transforming a vector space to optimize isotropy has been shown to be beneficial for improving performance in text processing tasks. This paper is the first comprehensive investigation of the distribution of multimodal embeddings using the example of OpenAI’s CLIP pretrained model. We aimed to deepen the understanding of the embedding space of multimodal embeddings, which has previously been unexplored in this respect, and study the impact on various end tasks. Our initial efforts were focused on measuring the alignment of image and text embedding distributions, with an emphasis on their isotropic properties. In addition, we evaluated several gradient-free approaches to enhance these properties, establishing their efficiency in improving the isotropy/alignment of the embeddings and, in certain cases, the zero-shot classification accuracy. Significantly, our analysis revealed that both CLIP and BERT models yielded embeddings situated within a cone immediately after initialization and preceding training. However, they were mostly isotropic in the local sense. We further extended our investigation to the structure of multilingual CLIP text embeddings, confirming that the observed characteristics were language-independent. By computing the few-shot classification accuracy and point-cloud metrics, we provide evidence of a strong correlation among multilingual embeddings. Embeddings transformation using the methods described in this article makes it easier to visualize embeddings. At the same time, multiple experiments that we conducted showed that, in regard to the transformed embeddings, the downstream tasks performance does not drop substantially (and sometimes is even improved). This means that one could obtain an easily visualizable embedding space, without substantially losing the quality of downstream tasks.
(This article belongs to the Special Issue Information Visualization Theory and Applications)
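The anisotropy described here (embeddings concentrated in a narrow cone) can be probed with simple statistics, and centering or whitening are standard gradient-free corrections. The Python sketch below uses average pairwise cosine similarity and PCA/ZCA whitening as stand-ins; the paper's specific isotropy measures (I1, I2) are not reproduced, and the variable names are illustrative assumptions.

import numpy as np

def avg_pairwise_cosine(E, n_pairs=10000, seed=0):
    """Average cosine similarity of randomly chosen embedding pairs.
    Values far from zero suggest anisotropy (a 'cone' of embeddings)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(E), n_pairs)
    j = rng.integers(0, len(E), n_pairs)
    U = E / np.linalg.norm(E, axis=1, keepdims=True)
    return float(np.mean(np.sum(U[i] * U[j], axis=1)))

def whiten(E, eps=1e-8):
    """Center the embeddings and equalize variance along all principal
    directions (ZCA whitening), which flattens the singular-value decay."""
    centered = E - E.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt.T @ np.diag(1.0 / (s + eps)) @ Vt * np.sqrt(len(E))

# Hypothetical usage with precomputed CLIP image embeddings img_emb of shape (n, d):
# print(avg_pairwise_cosine(img_emb))           # typically well above 0 before correction
# print(avg_pairwise_cosine(whiten(img_emb)))   # close to 0 after centering and whitening

Alignment between the image and text modalities can then be improved with orthogonal (Procrustes) or least-squares (LSTSQ) maps, the two gradient-free transforms compared in the paper.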
Figures (omitted): semantic visualizations of CLIP image and text embeddings; embedding distributions and singular values before and after centering and whitening; distributions for freshly initialized CLIP-RN101, CLIP-ViT-B/32, and text encoders (inputs drawn from COCO-caption-2015); cosine-similarity distributions before and after the Procrustes and LSTSQ transforms; and UMAP projections of multilingual CLIP text embeddings for French–English, German–English, and Russian–English pairs.