A Method for Presenting UML Class Diagrams with Audio for Blind and Visually Impaired Students
DOI: https://doi.org/10.1145/3652037.3652056
PETRA '24: The PErvasive Technologies Related to Assistive Environments (PETRA) conference, Crete, Greece, June 2024
Unified Modeling Language class diagrams convey relationships between code units in Object-Oriented software projects. They are commonly used in industry and extensively used in undergraduate computer science courses to convey common software engineering concepts. Unfortunately, class diagrams are a purely visual language, rendering them useless for programmers with visual impairments or blindness. This work addresses that deficiency with a novel auditory method that conveys relationship properties between classes using sonification techniques. Study results show that users can quickly and reliably perceive the presented relationships. The study was a proof-of-concept; future work will leverage this mechanism to create a complete class diagramming tool accessible to students who are blind or visually impaired.
ACM Reference Format:
Ira Woodring, Charles Owen, and Samia Islam. 2024. A Method for Presenting UML Class Diagrams with Audio for Blind and Visually Impaired Students. In The PErvasive Technologies Related to Assistive Environments (PETRA) conference (PETRA '24), June 26–28, 2024, Crete, Greece. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3652037.3652056
1 INTRODUCTION
Creating software that meets specifications is a complex task, and programmers need a strong understanding of the relationships between units of code both before and during development. Unified Modeling Language (UML) class diagrams have long been a standard mechanism by which relationships between software units are conveyed. However, UML is a visual language, a limitation that makes understanding relationships between pieces of code harder for engineers who are blind or have visual impairments [13].
More importantly, this lack of access may make it harder for students studying computer science and software engineering to be successful, as many theoretical concepts and best practices are conveyed via class diagrams. For instance, both classic and more modern software engineering textbooks, such as the famous Gang of Four book [7] and the more recent Pressman and Maxim [16], illustrate common design patterns (solutions to commonly encountered problems in software engineering) by way of class diagrams. Thus, the lack of an adequate alternative threatens access to the field for those students.
Though screen readers are nearly ubiquitous (and simple ones come standard with most operating systems), they still fail at the task of presenting graphical data [17]. To address this failure, interface designers have explored methods to present graphical information audibly [3, 4, 11]. However, these audio interfaces tend to be very specialized, due to the varied nature of graphical data; generic interfaces remain elusive, as a designer must consider the data to be presented and how that data relates to psychoacoustic properties of sound when designing an interface [18]. Thus, there is no existing audio interface that can adequately convey class diagrams.
This work is a proof-of-concept study intended to evaluate the efficacy of using audio to present simple UML diagrams that one might find in an undergraduate course in computer science or software engineering. We intend to use this work to aid students who are blind or have visual impairments to have better access to the curriculum. This work aims to be intuitive, require a low mental workload to understand, and require no additional hardware other than that which is readily available on the majority of personal computing devices.
2 RELATED WORK
There have been attempts to create a non-visual mechanism for conveying UML diagrams. These methods have been either touch-based (haptic or tactile) or audio-based. These works overwhelmingly mirror the "top-down" approach used to understand a visual UML diagram. By this, we mean that in existing systems, users are given an entire diagram at once and then explore various parts of the graph to better understand relationships between components. A top-down approach works well for the visual medium; however, for an audible medium, it is less than ideal. The issue arises from the amount of stimuli a user must discern and remember. Humans have a higher bandwidth for visual information [8, 9, 10]. Attempting to present the same amount of stimuli via the more constrained bandwidth of human hearing can be overwhelming. We posit that a better mechanism would be a bottom-up approach, whereby users are presented with a limited number of stimuli focused on a particular class and its immediate relationships, and that by examining multiple classes in this manner they may build a mental schema of the overall diagram. Furthermore, an ideal approach would likely combine both top-down and bottom-up methods of presentation that a user may quickly switch between while examining a diagram.
2.1 Touch-based Methods
The use of an electronic Braille display to recreate UML sequence diagrams was explored with the HyperReader tool [12]. Refreshable Braille display devices raise and lower pins on a grid to convey Braille letters and other information that users can examine by touch. The HyperReader tool displayed UML sequence diagrams and was thus more focused on timing relationships rather than structural relationships between classes. While the mechanism may be adaptable to class diagrams, refreshable Braille displays are relatively expensive to purchase; the American Foundation for the Blind estimates purchase costs between $3,000 and $15,000 [1]. Worse, they can be expensive to repair when broken and may need to be shipped back to countries of origin to be fixed [15]. Hence, a more economical mechanism is needed.
Alamri et al. devised a haptic mechanism that made use of a Phantom Omni¹ (now Oqton Touch²) device [2]. This device allows for six degrees of freedom as an input device and provides force feedback as output. Three types of force feedback convey class relationships and properties: an effect simulating object weight to imply the number of class properties (instance variables and methods) a class contains, an elastic or pulling effect when dragging a class that has some relationship to another class (i.e., an association or generalization), and a collision effect when dragging a class that collides with another, which is used to provide the user with a perception of the diagram space and a class's location in that space relative to other classes. While this work does present the entirety of a diagram at once, it easily allows users to explore the locality of a portion of the diagram. Thus, we can view it as supporting both a top-down and bottom-up approach. However, this approach for presenting UML has two drawbacks: firstly, the haptic method extends the time required to analyze the graph by 25%, and secondly, similar to the HyperReader project, it employs costly and specialized hardware, which is not easily accessible to the majority of users.
2.2 Audio-based Methods
Coburn and Owen devised an audio UML interface, called Audible Browser, that made use of earcons (non-speech audio) combined with speech [5]. Their work analogizes the concept of a class diagram to a constellation of stars. To portray the constellation, audio pitch and stereo balance were used to convey the location of a star (a class) in the overall constellation (the diagram). Audible Browser allows users to explore the diagram with the mouse or to play a representation of the entire diagram at once. Playing a diagram presents each stimulus in the diagram sequentially. However, their method is inherently top-down; though users can explore the diagram with the mouse to learn more about particular classes, there is no mechanism for audibly presenting only a focused portion of the diagram.
Metatla et al. devised a mechanism to present UML diagrams by creating hierarchies of the classes and relationships between them [14]. Their mechanism created a tree of the diagram with multiple branches. One branch held a list of all classes in the diagram. Another could be expanded to view all of the associations or all of the generalizations in the diagram. They noted that conveying UML requires the ability to represent both navigational data and content information. Navigational data (as they define it) consists of audio to acknowledge a user action, such as a click when moving to the next class in a graph. An example of content information would be the system speaking the names of the classes a particular class is connected to. They developed two approaches to convey diagrams, a Verbose Mode and a Terse Mode, the first of which provided verbal descriptions of all actions taken by the user as well as verbal descriptions of content. In contrast, Terse Mode only gave verbal descriptions of content, while providing nonverbal audio for navigational data. Results showed that users were able to reliably comprehend the diagrams. However, task completion times were high, and the researchers noted that, due to the hierarchical design, users could become lost or confused as to their current hierarchical level while navigating the diagrams.
The greatest benefit of the work by Metatla et al. is the reduction of information presented at once to the user. Their use of hierarchies can be seen as a bottom-up approach to learning a diagram, and they show that a bottom-up method requires fewer stimuli, as it presents less content information at once, with the added benefit of then requiring a lower amount of navigational stimuli.
3 METHOD
Class diagrams for real-world software projects can be large and complex, while diagrams used to convey educational concepts tend to be much smaller and simpler. In our work, we have decided to focus on the subset of diagrams most likely to be found in widely used software engineering texts. These diagrams illustrate concepts important for software engineering students to learn, such as recurring design patterns, and usually contain a small number of classes. For instance, the classic software engineering text Design Patterns: Elements of Reusable Object-Oriented Software contains twenty-three commonly used design patterns [7]. The sample structures provided by the authors of that text contained 4-5 classes on average (n = 23, μ = 4.5, min = 1, max = 10). Figure 3 illustrates one of these educational examples, the very commonly used Iterator pattern.
Zhao et al. showed that a nonet, a three-by-three grid of nine cells, can be leveraged to convey graphical data via audio (Figure 4) [19]. Their work combined the nonet concept with heatmaps and the modulation of various psychoacoustic properties to convey scatterplots. Of particular importance is that each cell of their grid was portrayed in a predefined order and that an empty cell was represented by either a pause or a short stimulus.
We apply the nonet concept as a mechanism for presenting UML class diagrams. Given a list of classes, a user may select a class from a hierarchy to examine more closely. The chosen class is placed in the center cell of the nonet. Classes that directly relate to the chosen class are placed in the surrounding cells of the nonet. When a user chooses to play a representation of the selected class and its relationships, the stimulus for each cell in the nonet (except the chosen class in the center) is played, clockwise starting at the upper left cell and ending at the middle left cell. Any cell that does not contain a class directly related to the currently chosen class is conveyed with a short clicking sound.
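As an illustration of this traversal, the following is a minimal sketch of the playback loop, assuming cells numbered 0-8 row by row with the chosen class in the center cell (4); the cell indexing, method names, and sound stubs are our own invention for this example, not the implementation used in the study.

```java
// Hypothetical sketch of the nonet playback loop described above (not the
// authors' implementation). Cells are numbered 0-8 row by row; cell 4 is the
// center and holds the chosen class, so it is never played. Playback proceeds
// clockwise from the upper-left cell (0) and ends at the middle-left cell (3).
public class NonetPlayback {

    enum Relationship { NONE, ASSOCIATION, GENERALIZATION }

    // Clockwise traversal: upper row left-to-right, then middle right,
    // lower row right-to-left, and finally middle left.
    private static final int[] CLOCKWISE = {0, 1, 2, 5, 8, 7, 6, 3};

    /** Plays one stimulus per outer cell of the nonet. */
    static void play(Relationship[] cells, long delayMillis) throws InterruptedException {
        for (int cell : CLOCKWISE) {
            switch (cells[cell]) {
                case ASSOCIATION    -> playTone("piano");        // association stimulus
                case GENERALIZATION -> playTone("glockenspiel"); // generalization stimulus
                case NONE           -> playClick();              // empty cell: short click
            }
            Thread.sleep(delayMillis); // user-adjustable gap between stimuli
        }
    }

    static void playTone(String timbre) { /* see the MIDI sketch later in this section */ }

    static void playClick() { /* a short percussive click */ }
}
```

Because the traversal order is fixed, a listener can infer a cell's position purely from when its stimulus occurs in the sequence.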
While many possible relationships can be presented in UML class diagrams, four relationship types are the most common and represent a subset commonly taught in undergraduate computing courses. We will describe these in terms of two generic classes, class A and class B. An association represents a reference relationship between two classes, often instantiated in code as a pointer or reference member of one or both of the classes; for example, class A may hold an instance variable of type B. A generalization is a relationship whereby class A generalizes (as in, is more generic than) class B; these are used to illustrate parent-child or superclass-subclass relationships. A realization occurs when class A implements an interface specified by class B (i.e., implements the methods that B specifies). Finally, a dependency exists when class A requires class B to exist so that it can complete some task. All four relationships are illustrated in the code sketch below. Of these four relationship types, realization does not exist in all languages and can be simulated by way of generalization and abstract classes; therefore, the authors of this work decided not to include it in this proof-of-concept. Dependency relationships serve as a sort of "catch-all" type that can be complex for introductory students to grasp and were therefore left out of this work, though we may need to add them in later iterations.
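A brief Java sketch of these four relationship types (illustrative only; the class names are invented for this example):

```java
// Illustrative Java examples of the four UML relationship types described above.
interface Printable { void print(); }      // an interface that another class may realize

class Vehicle { }                          // the more general (parent) class

class Engine { }

class FuelPump { void dispense() { } }

class Car extends Vehicle                  // generalization: Car specializes Vehicle
        implements Printable {             // realization: Car implements Printable
    private Engine engine;                 // association: Car holds a reference to an Engine

    Car(Engine engine) { this.engine = engine; }

    @Override public void print() { System.out.println("Car"); }

    void refuel(FuelPump pump) {           // dependency: Car needs a FuelPump
        pump.dispense();                   // only to complete this one task
    }
}
```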
Audio stimuli were assigned to the two remaining relationship types, and General MIDI instruments were used in this regard for simplicity. An association was represented by a half-second C4 note (MIDI note 60) with a synthesized acoustic grand piano (MIDI instrument 0). Generalizations were represented by a half-second C4 note with a synthesized glockenspiel (MIDI instrument 10). These were chosen by the researchers due to their disparate timbres. The Oracle Java JDK provided the synthesizer used for our macOS prototype; however, we chose to use the FluidSynth³ sound font, as the default JDK sound font was deemed to be of poor quality.
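A minimal sketch of how such stimuli can be produced with the standard javax.sound.midi API follows; it demonstrates the general technique using the JDK's default sound bank and should not be read as the authors' prototype code (which additionally swapped in the FluidSynth sound font):

```java
import javax.sound.midi.MidiChannel;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Synthesizer;

// Sketch: play a half-second C4 (MIDI note 60) on two General MIDI programs.
public class StimulusDemo {
    public static void main(String[] args) throws Exception {
        Synthesizer synth = MidiSystem.getSynthesizer();
        synth.open();
        MidiChannel channel = synth.getChannels()[0];

        channel.programChange(0);    // program 0: acoustic grand piano (association)
        playC4(channel);

        channel.programChange(10);   // program 10: the instrument number the paper
        playC4(channel);             // reports for the glockenspiel (generalization)

        synth.close();
    }

    private static void playC4(MidiChannel channel) throws InterruptedException {
        channel.noteOn(60, 96);      // MIDI note 60 = C4, velocity 96
        Thread.sleep(500);           // sustain for half a second
        channel.noteOff(60);
        Thread.sleep(200);           // brief gap before the next stimulus
    }
}
```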
Participants listened to the stimuli via Sony MDR7506 Studio Monitor headphones connected to a four-channel Behringer Powerplay Pro-XL headphone amplifier. As the amplifier was multi-channel, the researchers were able to listen to the stimuli at the same time as the participants, and both researcher and participant had full control of the volume for their headphones. Participants could press a button on a gamepad to start the playback of the diagram. Each cell of the grid was played sequentially in clockwise order around the chosen class, which occupied the center of the grid. A short click was played if there was no class in a particular cell; otherwise, the association or generalization sound was played to indicate that the class in the current cell related to the chosen class via that type of relationship. Participants had a button that could repeat the diagram if they wished, as well as buttons for controlling the length of the delay between stimuli. Our test-bed layout is illustrated in Figure 5.
4 STUDY
An invitation was extended to undergraduate students in computing courses at the university at which one of the authors of this work teaches. The study population ultimately consisted of n = 29 undergraduate students (24 male, 5 female). Participants were asked if they could define association, generalization, realization, and dependency as they pertain to UML class diagrams. Those students who were unable to do so correctly received a few minutes of review. Participants were then shown the grid with numbers printed in the lower right corner of each cell. They were told how the cells in the diagram would be sequenced and asked to demonstrate the sequence back to the researcher.
The researcher then described how a class diagram would be conveyed. Next, the researcher played a repeating sound while the participants put on the headphones and adjusted the volume to a comfortable level. The researcher then played the stimulus used to represent an association, followed by the stimulus used to represent a generalization. When the participants indicated that they could distinguish between the two, were comfortable with the overall concept, and had no questions, the researcher began presenting sample diagrams. To ensure a wide range of parameters was evaluated, the system randomly selected the number of associations and generalizations to present for this proof-of-concept.
Participants were able to start the playback of a diagram when they were ready and could replay the diagram if they desired. The researcher then asked the participant to report the number of associations and the number of generalizations the diagram conveyed. Once the participants indicated they were comfortable with the sample diagrams, the researcher began keeping track of participant responses. Each participant was tested on ten diagrams.
For each diagram, the users were instructed that the central cell of the grid represented a class for which we wanted to know the number and types of relationships. The system generated a random number of related classes (1 ≤ n ≤ 8, where n is the number of related classes). Of these, between 0 and 2 were selected and assigned a generalization relationship to the class of focus. A small maximum was chosen for the possible number of generalizations, as it is uncommon (and often discouraged) for classes to inherit from more than one class. The remaining classes were assigned an association relationship to the central class.
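A sketch of this generation procedure follows (hypothetical code; the paper does not include its generator, and we assume related classes are shuffled uniformly into the eight outer cells):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the trial generator described above: 1-8 related
// classes, of which 0-2 are generalizations and the rest associations,
// shuffled into the eight outer cells of the nonet.
public class TrialGenerator {

    enum Relationship { NONE, ASSOCIATION, GENERALIZATION }

    private static final Random RNG = new Random();

    static Relationship[] generate() {
        int related = 1 + RNG.nextInt(8);                        // 1 <= n <= 8
        int generalizations = Math.min(RNG.nextInt(3), related); // 0, 1, or 2
        int associations = related - generalizations;            // the remainder

        List<Relationship> outer = new ArrayList<>();
        for (int i = 0; i < generalizations; i++) outer.add(Relationship.GENERALIZATION);
        for (int i = 0; i < associations; i++)    outer.add(Relationship.ASSOCIATION);
        while (outer.size() < 8)                  outer.add(Relationship.NONE);
        Collections.shuffle(outer, RNG);          // random placement in the outer cells

        // Lay the eight stimuli onto the 9-cell grid; cell 4 holds the focus class.
        Relationship[] cells = new Relationship[9];
        int next = 0;
        for (int cell = 0; cell < 9; cell++) {
            cells[cell] = (cell == 4) ? Relationship.NONE : outer.get(next++);
        }
        return cells;
    }
}
```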
After the test of ten diagrams, the researcher asked the participants to listen to an additional five diagrams. Participants were not asked to keep track of the number of associations and generalizations for these five tests, but rather to track the locations (based on the grid numbers) where either an association or generalization occurred. The purpose of the second test was to examine the efficacy of the nonet mechanism when used in a top-down manner as well, as we believe a complete system will require both top-down and bottom-up methods.
This project was approved by the university's Institutional Review Board.
5 RESULTS
For the first test, all participants listened to ten diagrams and were asked to note the number of associations and generalizations for the focused class, yielding n = 580 prompts (two counts for each of the 290 diagram presentations). Participants erred on the number of associations in a diagram a total of 11 times (a 3.8% error rate) and erred on the number of generalizations 14 times (a 4.8% error rate); each rate is computed over the 290 diagram presentations.
The types of errors participants made varied. Participants miscounted the number of associations but counted the correct number of generalizations on n = 2 occasions (0.7% error rate). They miscounted the number of generalizations but still gave the correct number of associations n = 5 times (1.7% error rate). The most common error occurred when participants miscounted both; this took place on n = 9 of the trials (3.1% error rate). In all cases of a miscount in both categories, it appeared that the participants had mistaken one stimulus for the other, as in each instance a miscount in one category was matched by a miscount of the inverse amount in the other category. For instance, a diagram may have conveyed 4 associations and 2 generalizations, yet the participant perceived 5 associations and 1 generalization, or 3 associations and 3 generalizations. The first type of error (perceiving a generalization as an association) occurred in n = 7 of these instances, whereas the second type (perceiving an association as a generalization) occurred in n = 2 of the instances.
The second test was harder for participants. Each of the n = 29 participants completed five trials in which they identified which cells of the grid contained a related class, yielding 145 trials. The number of errors across all trials was 21, for an overall error rate of 14.5%. Two types of errors were made by participants: errors of omission, when a participant did not perceive one of the stimuli (n = 12, or 57% of the errors), and errors occurring because the participant misidentified the cell (n = 9, or 43% of the errors).
6 DISCUSSION
The results of the first test were promising. Participants were able to consistently, quickly, and accurately recall the number of associations and generalizations for a particular class. This is likely due to the low number of stimuli presented, combined with a consistent and well-defined presentation space, i.e., the nonet design with a single class of focus at its center.
Results for the second test were less promising, but should not be ignored. While participants erred at a rate of nearly 15% on this task, a small amount of training may be able to increase their accuracy. Notably, errors of omission (failing to perceive a stimulus at all) outnumbered errors of cell misidentification, which tends to support the directional representation capabilities of the nonet approach.
7 FUTURE WORK
The scope of this work was to design a proof-of-concept, and it was successful in that endeavor. Future work will be undertaken to create a UML browser that can load pre-created UML diagrams, create diagrams from scratch, or build them from existing software source code. User studies will then be conducted to determine if students who explore UML diagrams via this system show similar levels of understanding as students who examine the same diagrams in a visual format.
The stimuli used to convey relationship types were chosen by the researchers in this experiment. A better option may be for participants to choose stimuli with perceptual meaning that map closely to their preexisting schemas of relationship types [6].
While our work focused solely on the presentation of the relationships between classes in a UML class diagram, this mechanism may be useful for the audible display of more general mathematical graphs. Class diagrams have some similarities with mathematical (node-edge) graphs, which in turn have several similarities to other types of UML diagrams, such as sequence diagrams, and even non-UML diagrams such as flowcharts.
A formal evaluation of the time taken for participants to answer the questions after a diagram was presented was not conducted for this study (though we did track the number of times a participant repeated a diagram). Future work should compare the time needed for this mechanism to that required for visual UML. Additionally, while users noted that the mechanism used for this study was intuitive, we may want to formally evaluate the mental workload using the NASA Task Load Index⁴.
8 CONCLUSIONS
This proof-of-concept showed that participants were able to determine the number of association and generalization relationships for a class with high accuracy. Additionally, several participants noted that the mechanism used was intuitive. Localizing a class related to a focused class was harder for participants but still showed promise. Moreover, participants received very little training before testing; it is likely that extended training would yield better results.
REFERENCES
[1] American Foundation for the Blind. [n. d.]. Refreshable Braille Displays. https://www.afb.org/node/16207/refreshable-braille-displays
[2] Atif Alamri, Mohamad Eid, and Abdulmotaleb El Saddik. 2007. A haptic enabled UML CASE tool. In 2007 IEEE International Conference on Multimedia and Expo. IEEE, 1023–1026.
[3] James L. Alty and Dimitrios I. Rigas. 1998. Communicating Graphical Information to Blind Users Using Music: The Role of Context. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Los Angeles, California, USA) (CHI '98). ACM Press/Addison-Wesley Publishing Co., USA, 574–581. https://doi.org/10.1145/274644.274721
[4] Andy Brown, Robert Stevens, and Steve Pettifer. 2006. Audio representation of graphs: A quick look. In Proceedings of the International Conference on Auditory Display.
[5] Sarah Coburn and Charles B. Owen. 2014. UML diagrams for blind programmers. In Proceedings of the 2014 ASEE North Central Section Conference. Oakland, USA, 1–7.
[6] Jamie Ferguson and Stephen A. Brewster. 2018. Investigating perceptual congruence between data and display dimensions in sonification. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–9.
[7] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. 1994. Design Patterns: Elements of Reusable Object-Oriented Software. Pearson Education. https://books.google.com/books?id=6oHuKQe3TjQC
[8] Alastair Haigh, David J. Brown, Peter Meijer, and Michael J. Proulx. 2013. How well do you see what you hear? The acuity of visual-to-auditory sensory substitution. Frontiers in Psychology 4 (2013), 330.
[9] Homer Jacobson. 1950. The informational capacity of the human ear. Science 112, 2901 (1950), 143–144.
[10] Homer Jacobson. 1951. The informational capacity of the human eye. Science 113, 2933 (1951), 292–293.
[11] Jeongyeon Kim, Yoonah Lee, and Inho Seo. 2019. Math graphs for the visually impaired: Audio presentation of elements of mathematical graphs. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. 1–6.
[12] Claudia Loitsch and Gerhard Weber. 2012. Viable Haptic UML for Blind People. In Computers Helping People with Special Needs, Klaus Miesenberger, Arthur Karshmer, Petr Penaz, and Wolfgang Zagler (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 509–516.
[13] Sean Mealin and Emerson Murphy-Hill. 2012. An exploratory study of blind software developers. In 2012 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE, 71–74.
[14] Oussama Metatla, Nick Bryan-Kinns, and Tony Stockman. 2007. Auditory External Representations: Exploring and Evaluating the Design and Learnability of an Auditory UML Diagram. In Proceedings of the International Conference on Auditory Display. Montréal, Canada, 411–418.
[15] Maham Nadeem, Nida Aziz, Umar Sajjad, Faizan Aziz, and Hammad Shaikh. 2016. A comparative analysis of Braille generation technologies. In 2016 International Conference on Advanced Robotics and Mechatronics (ICARM). IEEE, 294–299.
[16] R.S. Pressman and B.R. Maxim. 2019. Software Engineering: A Practitioner's Approach. McGraw-Hill Education. https://books.google.com/books?id=qNlGxAEACAAJ
[17] Ather Sharif, Sanjana Shivani Chintalapati, Jacob O. Wobbrock, and Katharina Reinecke. 2021. Understanding Screen-Reader Users' Experiences with Online Data Visualizations. In Proceedings of the 23rd International ACM SIGACCESS Conference on Computers and Accessibility (Virtual Event, USA) (ASSETS '21). Association for Computing Machinery, New York, NY, USA, Article 14, 16 pages. https://doi.org/10.1145/3441852.3471202
[18] Bruce N. Walker and Michael A. Nees. 2011. Theory of sonification. The Sonification Handbook 1 (2011), 9–39.
[19] Haixia Zhao, Catherine Plaisant, Ben Shneiderman, and Jonathan Lazar. 2008. Data sonification for users with visual impairment: A case study with georeferenced data. ACM Transactions on Computer-Human Interaction (TOCHI) 15, 1 (2008), 1–28.
FOOTNOTE
1 https://delfthapticslab.nl/device/phantom-omni/
3 https://www.fluidsynth.org/
4 https://humansystems.arc.nasa.gov/groups/tlx/