DOI: 10.1145/3613904.3642498 | CHI Conference Proceedings | Research Article | Open Access

Inkeraction: An Interaction Modality Powered by Ink Recognition and Synthesis

Published: 11 May 2024

Abstract

Ink is a powerful medium for note-taking and creativity tasks. Multi-touch devices and stylus input have enabled digital ink to be editable and searchable. To extend the capabilities of digital ink, we introduce Inkeraction, an interaction modality powered by ink recognition and synthesis. Inkeraction segments and classifies digital ink objects (e.g., handwriting and sketches), identifies relationships between them, and generates strokes in different writing styles. Inkeraction reshapes the design space for digital ink by enabling features that include: (1) assisting users to manipulate ink objects, (2) providing word-processor features such as spell checking, (3) automating repetitive writing tasks such as transcribing, and (4) bridging with generative models’ features such as brainstorming. Feedback from two user studies with a total of 22 participants demonstrated that Inkeraction supported writing activities by enabling participants to write faster with fewer steps and achieve better writing quality.
Figure 1:
Figure 1: Inkeraction is an interaction modality powered by ink recognition and synthesis. Inkeraction enables new digital ink features on device in real time. In an example scenario, a user prepares packing lists for their trip interactively with Inkeraction’s automatic recognition, organization, correction, transcription, and expansion capabilities.

1 Introduction

Ink is a powerful medium that has been used for centuries. It enables people to create, annotate, record, sketch, and think [7, 42, 48, 59], and it is an essential part of our daily lives for education, work, leisure, and more [3, 65, 74]. In recent decades, the innovation of styluses and tablets has made digital ink widely available, which offers new capabilities beyond traditional ink on paper. Users can rearrange handwriting [49, 50], search for digital content [28], and analyze data [35, 60] with digital ink.
Common digital ink tools are limited in their abilities to understand and modify ink. Most tools can only recognize individual strokes, failing to grasp the broader structures (e.g., lists and text blocks) and relationships (e.g., annotations and pointing gestures) that these strokes form. Users must explicitly define the intended written structure or relationships within the tool by switching between various writing tools and templates [6, 7, 49, 50, 61]. Additionally, most tools offer to modify ink by transforming existing strokes, without the ability to synthesize new handwriting content. These constraints limit the expressiveness and interactivity of a tool, making ink writing a time-consuming and disruptive process. As a result, digital ink remains mainly an input method, where a tool relies on other representations (e.g., typed text) to provide feedback [7, 67, 77].
We present Inkeraction, an interaction modality powered by ink recognition and synthesis. Our ink recognition algorithms are full-page, hierarchical, and excel at understanding relationships. “Full-page” refers to decoding the entirety of a handwritten page, expanding on prior work that provided limited recognition confined to predefined tools or templates. Our algorithms recognize higher-level objects as the compositions of lower-level ones (e.g., a list from words), representing strokes as nested, hierarchical structures. The algorithms can not only classify free-form ink objects, such as handwriting and sketches, but also identify the relationships between these objects (e.g., one item pointing to another). Our ink synthesis algorithms feature a diverse set of stroke manipulation tools for different use cases. They can clone and transform existing strokes, generate aesthetically pleasing strokes, and adapt to the user’s writing style.
These recognition and synthesis capabilities enable a variety of new ink interactions that were previously challenging. With Inkeraction, users can take notes without specific templates, and the tool responds to users in a similar writing style, as shown in Figure 1. In this paper, we explored how Inkeraction can be used to create a variety of features, including:
Assisted Manipulation to efficiently arrange ink content, without templates or tool switching (e.g., Figure 1c).
Word processor-inspired features, including Spell Check (e.g., Figure 1d), Auto Completion, and Formatting, which help users write efficiently and accurately.
Features to automate repetitive tasks, including Transcription (e.g., Figure 1e) and Beautification.
Opportunities to compose with generative models (e.g., Figure 1f), which could help users brainstorm content, fulfill written tasks, and organize less-structured documents.
To gain practical insight into Inkeraction and its features, we conducted two user studies. The first study assessed the overall ease of learning, usability, and usefulness of Inkeraction. Twelve participants experienced six Inkeraction-integrated features using our prototype. The word processor-inspired features were deemed most intuitive to learn, while the Assisted Manipulation feature was most favored due to its versatility. The second study compared an Inkeraction-powered ink tool to a conventional ink tool lacking recognition and synthesis capabilities, with ten participants tackling five inking tasks using both tools. The results showed that Inkeraction enabled faster writing, higher writing quality, and fewer steps, freeing up time for thought.
Inkeraction, like other interaction modalities (e.g., speech), has limitations, including ambiguity, latency, and accommodating user preferences. Ink structures may change over time and cause ambiguous recognition results and interaction outcomes. Latency can introduce usability issues for features that require quick interaction. Depending on preference and use cases, users may look for different levels of assistance. To address these challenges, we discuss interaction techniques that make Inkeraction more usable and helpful.
In summary, our contributions are as follows:
A new interaction modality of digital ink (Inkeraction) and its architecture with ink recognition and synthesis.
Design space of Inkeraction with use cases and applications.
Two user studies with a total of 22 participants that examined Inkeraction and its features.

2 Related Work

Our work builds on the foundation of prior art on understanding inking tools and their usages. We were also inspired by the research in ink manipulation, recognition, modification, and synthesis. To our knowledge, Inkeraction is the first attempt to combine hierarchical full-page ink recognition and multiple ink synthesis algorithms, opening a new design space for ink interaction.

2.1 Understanding Inking Practice

Inking, which is the act of writing and drawing with an analog pen or digital ink, has been studied by researchers from many perspectives. Prior work investigated how inking can be used for active reading [6, 11, 27, 29, 51, 57, 64, 69], notetaking [12, 28, 41, 52], annotations [4, 58, 80, 81], creativity [25, 32, 43], data analysis [35, 60], personal reflection [3, 13, 19], and collaboration [14, 17]. In addition to inking activities, researchers also studied the uses of different inking tools. Riche et al. [59] studied the gaps and discrepancies between the experiences afforded by analog pens and digital ink, and found that digital ink had certain benefits over analog pens, such as the ability to dynamically edit output. Camporro et al. [20] compared digital and analog tools for sketchnoting, and found that while digital tools were more editable, they suffered from the high cognitive load of unlimited editing options and details.
In previous research, inking has been shown to be associated with thinking. Inking allows us to shift cognitive load out of our heads and onto physical media [37, 71], and creates a permanent record of ephemeral thoughts (e.g., sketching an idea on paper) [37, 59]. Inking can be a messy process, as it often reflects our mental models and thinking processes. Inking content is usually free-form and contains visual content like sketches and diagrams [20, 37, 74]. When designing Inkeraction, we aim to support a wide range of inking activities, providing the flexibility to edit inking content with minimal interactions and interruptions.

2.2 Ink Manipulation and Recognition

Digital ink’s malleability empowers users to edit their work, but manipulating individual strokes can be laborious. To address this, researchers have developed recognition-independent techniques. Ink selection tools, such as the Lasso tool [8] and the Harpoon technique [44], enable users to specify a group of strokes for editing. Ink gestures act as shortcuts for modifying strokes. For example, Tivoli [49, 50] and Translucent Patches [39] used gestures to organize content into regions, reformat list items in multi-column lists, and move items by rows and columns. User-specified content empowers users to define the types of their ink content, aiding in organization and management. NiCEBook [7], for instance, allowed assigning categories to individual notes. Style Blink [61] introduced three building tools for organizing digital ink content. While valuable, these techniques can be complex and cognitively demanding. Additionally, switching between tools, tags, and templates can disrupt the thinking process.
Ink clustering groups related strokes together, providing some ink recognition capabilities to simplify manipulation. This technique allows users to select clustered strokes using simpler interaction techniques, such as a tap [63]. It also facilitates content rearrangement, as demonstrated in CLuster [55], where moving selected strokes triggered the other unselected strokes to move with them in their respective clustered groups. Additionally, users can customize the types of clustered strokes to cater to specific applications. For instance, in Flatland [53], users designated clustered strokes as a to-do list, enabling easy item completion and reordering through gestures.
Previous HCI research on ink manipulation explored ink recognition, but often faced limitations. Recognition capabilities were typically restricted to specific regions or tools. For example, WritLarge [77] recognized only user-selected strokes for editing suggestions, neglecting the potential involvement of unselected strokes during interactions (e.g., dragging selected strokes over unselected content). Bargeron et al. [5] attempted full-page recognition for ink annotations, but required manual labeling by participants in their user study.
Recent advancements in algorithms and databases have enhanced full-page handwriting recognition. Graph neural networks [78] and stacked recurrent neural networks [15] have shown promise in recognizing handwritten documents in the IAMonDo dataset [31]. However, a discrepancy exists between these algorithms and datasets. These datasets encoded documents hierarchically (e.g., labeled lists containing labeled texts) [31, 34], while the algorithms in [15, 78] treated documents as flat, mutually exclusive objects (e.g., lists and texts as distinct labels). Additionally, the prior work focused mainly on the algorithms and databases, without exploring the interaction perspective of full-page recognition.
Inkeraction recognizes ink content on an entire page as users write and draw, including its hierarchical structure and diverse relationships. This enables the reuse of existing design patterns, such as tap-to-select and list reordering, while also creating novel possibilities for human-ink interaction.

2.3 Ink Morphing and Synthesis

Ink is an expressive medium, yet creating precise or controlled marks with it can be challenging. To address this, researchers have developed techniques for beautifying and correcting strokes. One approach is to use shape morphing, which converts hand-drawn shapes into beautified symbols. For example, Fluid Sketches [2] continuously morphed shapes towards standard circles and boxes as a user drew them. Similarly, the system developed by Hse et al. [30] recognized and beautified 13 sketched symbols. In addition to shapes, handwriting can also be beautified. This can be achieved by normalizing geometric characteristics [66], aggregating multiple instances of the same written word or shape [82], and replacing with synthesized strokes in different writing styles [1].
Ink synthesis algorithms have emerged as a versatile tool for generating natural-looking handwriting. A comprehensive survey of these techniques was presented by Elarian et al. [18]. One notable application is DeepWriting [2], where ink synthesis enabled word-level editing, spell checking, and beautification. This work mainly focused on the synthesis algorithms, and the spell checking and beautification features were evaluated in a preliminary user study with 10 participants.
Inkeraction combines ink recognition and synthesis to enhance user interactions. It can not only synthesize handwriting text, but also intelligently arrange the synthesized text within existing structures like lists and paragraphs.

3 Design Process

Over two years, we iteratively developed Inkeraction, envisioning its design space through various design activities conducted with our engineering, design, and research collaborators. In this section, we focus on a design workshop that took place in early 2022 and motivated us to create Inkeraction.
The goal of the workshop was to identify how existing digital ink tools support inking activities and where they might fail as a tool for thinking [71]. To achieve this, we assembled a diverse panel of 12 experts to review existing inking software and hardware. The participants included three researchers, four engineers, four designers, and a product manager. Three participants held a Ph.D. degree, six possessed a master’s degree, and the remaining three had attained at least a bachelor’s degree. Their professional experience averaged 12 years (SD = 8.9 years), demonstrating extensive industry knowledge. They had developed and launched several advanced user interface systems prior to the workshop.
The six-week workshop employed a hybrid approach, combining in-person collaboration and individual evaluations. During the initial week, the participants convened for three days to establish rapport, set the workshop agenda, and exchange prior research findings on digital ink and thinking tools through individual presentations. In the following weeks, they engaged in an extensive offline evaluation of ink tools across various platforms and applications. These included Apple Notes, Google Keep, NEBO, Remarkable, Boox, Procreate, WeTransfer Paper, and Moleskine Flow. On the last day of the workshop, the participants reconvened to discuss their findings and deliver a joint presentation. We recorded and transcribed the final discussion and presentation, and two authors used an inductive analysis [70] with open coding [10] to identify a set of common themes in the participants’ findings, which are briefly reported as follows:
Malleability helps thinking. Many ink systems allow a user to rearrange their strokes after writing. Malleability distinguishes digital ink from physical pen and paper [59], and can help users change the representation of their initial ideas, which is an important task in sensemaking [62]. Prior research has shown that directly manipulating something tangible was easier than simulating the manipulation in your head [21, 38]. However, existing products and prior work only partially support this task, as discussed below.
Repetitive work bores minds. Taking notes is a mentally demanding task [56], and it can be even more difficult when users are constantly switching between tools and tasks [22, 36], as is the case with many existing products. For example, to move a paragraph to a new location in Apple Notes, a user needs to shift other content out of the way, which requires the user to switch between stroke selection and movement back and forth. Prior research has reduced some of the effort of ink manipulation. Yet it still requires additional tools (e.g., [61]) and multiple steps (e.g., [77]), which can slow down the user’s thinking process.
Digital ink should be a co-pilot. Digital ink can enhance productivity by streamlining repetitive tasks, such as automatically converting handwritten dates into calendar events. However, existing inking tools are predominantly reactive, requiring user input and interpretation. Drawing inspiration from recent advancements in generative models and their emergent capabilities [75], the expert participants anticipated the next generation of inking tools to evolve into proactive collaborators. These active tools would not only handle mundane tasks but also aid in creative endeavors like brainstorming (as illustrated in Figure 1f), enabling users to concentrate on their ideas.
Based on the findings, the participants envisioned an inking tool that would be able to assist the user to manipulate strokes quickly and help the user complete inking tasks. To free the user from tool switching, the system needs to understand the user’s writing and drawing without additional information from the user. To automate tedious writing and drawing tasks for the user, the system should be able to generate ink output. We name this ink input and output modality Inkeraction.

4 Inkeraction

Digital ink offers expressive freedom for note-taking and creativity activities. However, its content can be difficult for a machine to interpret for interactive features. Drawing upon prior art and workshop findings, we defined Inkeraction’s initial capabilities as follows:
Recognize words and underlines, which are widely used in digital note taking and annotations [68].
Recognize connected graphs, lists, and diagrams, which are essential to visual thinking [74].
Synthesize markings (e.g., arrows, connectors) and symbols (e.g., a list bullet), which are abundant in the above recognized objects and necessary for ink output.
Synthesize styled handwriting as handwriting can be highly personalized [1].
We concentrate on architecting a data representation and algorithms for ink recognition and synthesis, building upon existing research. Our processing pipeline design is described in detail below.

4.1 Ink Recognition

4.1.1 Ink Representation.

As shown in Figure 2, Inkeraction represents a handwritten document as a hierarchical graph with three levels of containment (shown with elbow connectors) and various marking relationships (shown with curved connectors). This representation captures the inherent structure of nested handwritten notes, as explored in prior work [31, 34]. Unlike previous recognition work that treated nested structures as monolithic entities, hindering interactions with intra-structure components (e.g., Ye et al. [78] recognized a list as a standalone object), Inkeraction preserves these structures, enabling fine-grained interaction with each lower-level object. For example, the list in Figure 2a is represented as a container comprising individual list items, each with a bullet and word, allowing us to interact with any of these objects. Theoretically, the hierarchical containment relationships can have an unlimited number of levels, but we found that three levels were sufficient to represent the objects within our scope.
In addition to the containment relationships, we add marking relationships like connecting and pointing to for inter-structure interactions.
To describe the recognized objects, the hierarchical containment, and the varied marking relationships, Inkeraction employs a dictionary of labels. The labeling system is similar to prior work [31, 34], but modified based on input from our expert participants, developers, and designers. See Figure 3 for common labels in the dictionary.
Figure 2:
Figure 2: A sample note of a Yosemite trip with a set of components (an image, labels, arrows, and list items) (a) and its relationship graph that represents the content and relations between components (b).
Figure 3:
Figure 3: The common labels used to describe ink objects and varied relationships between ink objects. The objects are hierarchical, and higher-level objects are made of lower-level objects.
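To make this representation concrete, the following minimal Python sketch models the hierarchical graph as plain data structures. The class and field names are illustrative assumptions, not the system’s actual schema, and stroke geometry is omitted.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InkObject:
    """A node in the hierarchical relationship graph (containment only)."""
    label: str                                   # e.g., "word", "list_item", "list"
    level: int                                   # 1, 2, or 3
    children: List["InkObject"] = field(default_factory=list)

@dataclass
class Marking:
    """An inter-object marking relationship, e.g., an arrow pointing
    from one object to another, or an underline under a word."""
    kind: str                                    # e.g., "points_to", "underlines"
    source: "InkObject"
    target: "InkObject"

# Example mirroring the list in Figure 2a: a list containing list items,
# each made of a bullet and a word.
item1 = InkObject("list_item", level=2,
                  children=[InkObject("bullet", 1), InkObject("word", 1)])
item2 = InkObject("list_item", level=2,
                  children=[InkObject("bullet", 1), InkObject("word", 1)])
packing_list = InkObject("list", level=3, children=[item1, item2])
```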

4.1.2 Recognition System.

To obtain the hierarchical relationship graphs, our recognition system employs a segmentation model and a classification model working in tandem. The segmentation model iteratively groups strokes representing meaningful objects at different levels within a handwritten document, ultimately creating a stacked hierarchy of Graph Neural Networks (GNNs); see Figure 4. At each level, a GNN receives a set of input objects and clusters them, generating output objects for the next level. The classification model then analyzes the embeddings generated by the GNNs at different levels, assigning labels to these objects, such as “word” for lower-level objects and “list” for higher-level ones. Finally, Inkeraction performs text recognition and character segmentation specifically for text objects.
Figure 4:
Figure 4: The segmentation model consists of a sequence of Graph Neural Networks (GNNs). It uses multi-head attention mechanism to update and merge node embeddings. Each GNN segments strokes with different levels of granularity, and their embeddings can be used to classify the types of segmented objects and the relationships between the segmented objects. This figure demonstrates how the strokes are segmented into L1 GNN (a), with detailed model architecture of the update (b) and merge (c) steps.
Segmentation. At the initial level of the GNN hierarchy, each node represents a single stroke or a non-stroke object (e.g., an image). Each stroke undergoes normalization, smoothing, and processing through a stroke embedding model to extract an initial fixed-size representation. The stroke embedding model uses a stack of Transformer layers [72] followed by a fully connected layer. Non-stroke objects have their embeddings retrieved from a constant dictionary based on their type. At each level, initial edge embeddings are calculated based on heuristic features between nodes, such as relative distance and size, similar to the approach described in [78].
The node embeddings are then iteratively updated. Each iteration begins by identifying a fixed number of neighbors for each node using the k-nearest neighbors algorithm, based on the Euclidean distance between nodes. Subsequently, each node’s embedding is updated through a multi-head attention mechanism over the edge embeddings and the embeddings of its neighbors, followed by a residual update. Let hi denote the node embedding for node i, and eij denote the edge embedding for the edge connecting node i to its neighbor node j. Each node embedding hi is updated with the output of a multi-head attention mechanism, where hi serves as the query, and the set of eij|hj act as the keys and values. The attention layer is followed by a reduction sum and a fully connected layer, which calculates the updated node embedding by combining the mean of the attention outputs with the original node embedding (see Figure 4b). We repeat the update iteration a fixed number of times.
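To make the update step concrete, below is a minimal PyTorch sketch. The embedding size, number of heads, and the way each edge embedding is combined with its neighbor embedding (concatenation followed by a linear projection) are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """One update iteration for a node embedding h_i, attending over its
    k nearest neighbors. A sketch; dimensions are illustrative."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # Assumption: edge embedding e_ij and neighbor embedding h_j are
        # concatenated and projected back to the node dimension.
        self.kv_proj = nn.Linear(2 * dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, h_i, h_neighbors, e_ij):
        # h_i: (B, dim); h_neighbors, e_ij: (B, K, dim)
        kv = self.kv_proj(torch.cat([e_ij, h_neighbors], dim=-1))
        q = h_i.unsqueeze(1)                       # h_i is the query
        attended, _ = self.attn(q, kv, kv)         # (B, 1, dim)
        # Mean of the attention outputs combined with the original
        # embedding via a fully connected layer and a residual connection.
        return h_i + self.out(attended.mean(dim=1))
```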
After updating the node embeddings, merge logits are calculated for each neighbor of each node in the GNN. The merge logits between source node i and target node j are also computed over a multi-head attention mechanism, where the queries, keys, and values are hi, eij, and hj, respectively. The attention outputs are then reduced to a two-dimensional logit tensor representing the probability of merging or not merging. If the “merge” logit value is larger than the “no merge” logit value, we conclude that node i wants to merge with node j.
Cliques of n nodes that mutually desire merging are fed to a merger model, which utilizes a multi-head attention layer followed by a reduction sum layer. The attention layer computes an embedding from each node by attending over other node embeddings and the corresponding edge embeddings. The merged node embedding is computed as an average over all the attention heads and outputs (see Figure 4c). After updating and merging node embeddings, the edge embeddings are recomputed. If any nodes have been merged, their associated edge embeddings are recalculated based on the heuristic features.
At each level, we iteratively repeat the node embedding update and merge process until no more cliques of nodes desire merging or a maximum number of iterations is reached. We then repeat the process with another set of model weights to create a higher-level GNN from the current node embeddings. For example, if we previously merged strokes into words, we would now merge words into lines of text. This iterative segmentation process enables us to hierarchically cluster strokes and non-stroke inputs into three GNNs: L1, L2, and L3.
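The per-level loop can be sketched as follows. The learned components are passed in as stand-in callables, and grouping mutually merging nodes via connected components is a simplification of the clique merging described above.

```python
def segment_one_level(nodes, edges, update_fn, merge_score_fn, merge_fn,
                      max_iters: int = 5):
    """Sketch of one GNN level: repeatedly update node embeddings, find
    pairs of nodes that mutually want to merge, and merge them, until no
    merges remain or max_iters is reached. `update_fn`, `merge_score_fn`
    (merge logit minus no-merge logit), and `merge_fn` are hypothetical
    stand-ins for the learned models."""
    for _ in range(max_iters):
        nodes = update_fn(nodes, edges)
        wants = {(i, j) for (i, j) in edges if merge_score_fn(nodes, i, j) > 0}
        mutual = {(i, j) for (i, j) in wants if (j, i) in wants}
        if not mutual:
            break
        # Group mutually merging nodes with a small union-find; the paper
        # merges cliques of mutual desire, which this approximates.
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for i, j in mutual:
            parent[find(i)] = find(j)
        groups = {}
        for i, j in mutual:
            groups.setdefault(find(i), set()).update((i, j))
        nodes, edges = merge_fn(nodes, edges, list(groups.values()))
    return nodes, edges

# The same loop is then rerun with the next level's model weights, e.g.,
# words (L1) are merged into lines of text (L2), and so on up to L3.
```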
Classification. The classification algorithms process the segmentation results to categorize objects and relationships. Each node in the GNNs is assigned a label (e.g., “word” at L1, “textline” at L2, and “list” at L3). The classification model is a single fully-connected layer that maps a node embedding to the logits for the given number of classes.
The relationships between objects are computed in the same way as the merge logits. Instead of producing the merge logits, the relationship model produces logits for each type of relationship (e.g. “no relationship”, “underlines”, and “points_to”) between two objects.
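A minimal sketch of the two heads follows. The embedding size and class counts are placeholders, and the relationship head is simplified to a concatenation plus an MLP rather than the attention computation used for the merge logits.

```python
import torch
import torch.nn as nn

class ObjectClassifier(nn.Module):
    """A single fully connected layer mapping a node embedding to class
    logits (e.g., "word" at L1, "textline" at L2, "list" at L3)."""
    def __init__(self, dim: int = 128, num_classes: int = 16):
        super().__init__()
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, node_embedding: torch.Tensor) -> torch.Tensor:
        return self.fc(node_embedding)

class RelationshipHead(nn.Module):
    """Produces one logit per relationship type ("no relationship",
    "underlines", "points_to", ...) for an ordered pair of objects.
    Simplified sketch: concatenation + MLP instead of attention."""
    def __init__(self, dim: int = 128, num_relations: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, num_relations))

    def forward(self, h_i, e_ij, h_j):
        return self.mlp(torch.cat([h_i, e_ij, h_j], dim=-1))
```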
Text Handling. For text objects, Inkeraction runs text recognition using an LSTM-based model [9]. The recognized text is then segmented into characters using a Transformer-based model [33], which allows us to calculate text geometry such as character sizes and word baselines.

4.2 Ink Synthesis

Inkeraction addresses the challenge of free-form stroke synthesis with three targeted methods: stroke cloning for repetitive objects (e.g., list bullets), curve synthesis for connectors and arrows, and style-based handwriting text synthesis for text editing and creation. Figure 5 shows the different use cases of these methods.
Figure 5:
Figure 5: Inkeraction synthesizes content when users drag an item or continue writing.

4.2.1 Geometric Transformations.

Stroke cloning and curve synthesis are achieved through geometric transformations. A stroke, represented by s, is a sequence of N coordinates. We can apply a 3 × 3 geometric transformation matrix T to s for translation, rotation, and scaling: T · s.
To clone a stroke c from s, we copy s and transform it to the desired position: c = T · Copy(s).
Curve synthesis involves identifying a template and calculating a T to modify the shape and position of the template. For instance, when a user moves an arrow composed of multiple strokes, the original arrow serves as the template. We identify the longest stroke in the arrow as its tail and define its two endpoints, A and B, as reference points. Based on the user’s action, we calculate the after-interaction coordinates for these points: A′ and B′. The transformation matrix T, represented by a scaling factor (r), a rotational factor (θ), and two translation factors (dx and dy), is then determined by solving the equations A′ = T · A and B′ = T · B. This calculated T is applied to all strokes of the template to generate the synthesized curve.
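As a concrete illustration, the sketch below solves for T from the two point correspondences and applies it to a stroke. It assumes homogeneous 2D coordinates and a similarity transform (uniform scale, rotation, translation), matching the parameters r, θ, dx, and dy above.

```python
import numpy as np

def similarity_from_two_points(A, B, A2, B2):
    """Return the 3x3 similarity transform T with T·A = A2 and T·B = B2.
    Points are (x, y) tuples; the solve uses complex arithmetic, where
    multiplying by z = r·e^{iθ} performs the scale and rotation."""
    zA, zB = complex(*A), complex(*B)
    zA2, zB2 = complex(*A2), complex(*B2)
    z = (zA2 - zB2) / (zA - zB)        # scale r and rotation θ
    t = zA2 - z * zA                   # translation (dx, dy)
    a, b = z.real, z.imag
    return np.array([[a, -b, t.real],
                     [b,  a, t.imag],
                     [0,  0, 1.0]])

def transform_stroke(T, stroke_xy):
    """Apply T to an (N, 2) array of stroke coordinates."""
    pts = np.hstack([stroke_xy, np.ones((len(stroke_xy), 1))])
    return (pts @ T.T)[:, :2]

# Example: the tail endpoints A=(0,0), B=(10,0) are dragged to (5,5), (5,15);
# every stroke of the arrow template is transformed with the same T.
T = similarity_from_two_points((0, 0), (10, 0), (5, 5), (5, 15))
tail = transform_stroke(T, np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]]))
```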

4.2.2 Handwriting Text Synthesizers.

There are different use cases for synthesizing handwriting text. We can extend existing handwriting fragments (e.g., finishing an incomplete sentence), generate entirely new handwritten content (e.g., producing a standalone phrase), and modify existing handwritten text (e.g., correcting a character in a word). Inkeraction extends and generates handwriting building upon the work of Graves [24], and modifies existing handwriting using the algorithms suggested by Maksai et al. [47].
Graves [24] proposed a Recurrent Neural Network (RNN) model along with a “priming” technique to synthesize styled handwriting. This technique leverages the model’s inherent ability to attend to its previously generated strokes. By design, when synthesizing a stroke sequence x for a piece of text t, the model analyzes the already generated strokes xp and their corresponding text tp to produce the remaining strokes xr for the remaining text tr, where t = tp + tr. This mechanism ensures a smooth transition and consistent style. By changing xp and tp (i.e., priming model with xp and tp), the model can generate strokes for text tr in different styles. Even strokes from users not in the training data can be used to prime the model, mimicking their handwriting style. However, the generated strokes may still exhibit some bias towards the training data. This bias can be reduced by incorporating diverse writing samples, encouraging the model to generalize ink synthesis.
Inkeraction uses this priming technique for handwriting extension and generation. When extending handwriting, the model is primed on the existing strokes (i.e., xp), conditioned on the current label (i.e., tp) and the text to be extended (i.e., tr). Similarly, for generating new content, the model is primed on template strokes, conditioned on the label of the template strokes and the text to be generated. User-produced strokes (e.g., from previous writing) can serve as templates, allowing personalized stroke synthesis without RNN retraining. Alternatively, built-in template strokes in a standard style can be used to generate strokes in a generic font. After generating strokes, we apply geometric transformations to ensure their proper positioning within the context. We further improved the stopping criteria compared to [24]. The original model used a heuristic to stop synthesis when the sampling window exceeded the last character, which could fail in certain cases. In our work, instead of using this heuristic, we predict an end-of-sequence binary value. In other words, the original model predicts (dx, dy, PenUp/PenDown) for each step but our implementation predicts (dx, dy, PenUp/PenDown, isEndOfSequence), where dx and dy represent the pen coordinates for the step.
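The following sketch outlines the priming and sampling loop. The `model` object and its `initial_state`, `step`, and `sample` methods are hypothetical stand-ins for the RNN synthesizer, shown only to illustrate the priming phase and the learned end-of-sequence flag.

```python
def sample_handwriting(model, prime_points, prime_text, target_text,
                       max_steps: int = 2000):
    """Sketch of style-primed synthesis (hypothetical model API)."""
    text = prime_text + target_text        # t = t_p + t_r
    state = model.initial_state(text)

    # Priming phase: replay the user's existing strokes x_p so the
    # hidden state captures their writing style.
    for point in prime_points:
        state = model.step(state, point)

    # Generation phase: each step predicts (dx, dy, pen_up, is_end).
    generated, point = [], prime_points[-1]
    for _ in range(max_steps):
        dx, dy, pen_up, is_end, state = model.sample(state, point)
        if is_end:    # learned end-of-sequence flag, replacing the
            break     # window-past-last-character stopping heuristic
        point = (point[0] + dx, point[1] + dy, pen_up)
        generated.append(point)
    return generated
```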
To modify existing handwriting, we use the model from Inkorrect [47]. This model extends the work of Graves [24] with a dedicated style capture component, which makes modified strokes more similar to the original handwriting.

4.3 Dataset

To train the ink recognition algorithms and handwriting text synthesizers, we curated a handwriting dataset consisting of 18,200 ink documents over a six-month period. The documents were collected in two stages: 5,200 were written and annotated by 15 workers in the first stage, and 13,000 were written and annotated by 33 workers in the second stage. All the writing work was done in an Android application on tablets or convertible laptops equipped with active styluses; annotations were collected either through the same Android application or a separate web application.
Each document was written based on a unique prompt assembled from a diverse library of instructions. This library contained: (1) text snippets extracted from randomly chosen English Wikipedia pages; (2) lists generated from hand-curated lists of words (e.g., aircraft models, country names); (3) tables extracted from English Wikipedia pages; (4) images from Wikipedia and the Open Images Dataset [40]; (5) randomly generated box charts and pie charts; (6) flowchart diagrams randomly generated from a set of shapes, labels, and connections; (7) chemical formulas from Wikipedia; (8) a curated list of sketching prompts (e.g., “draw something that evokes: happiness”); (9) synthesized names, fake email addresses, dates, times, and actions; (10) layout instructions (e.g., writing in particular parts of the page, drawing arrows between objects). When assembling a prompt, we randomly chose instructions from the library, ensuring each prompt was distinct. We empirically determined that around 15 instructions per prompt would fill most of the screen with a handwritten document. Below is a sample prompt with 8 instructions (a minimal sketch of the assembly step follows the sample):
(1)
Write somewhere: The song’s tune was written by Harold Karr
(2)
Underline several consecutive words in (1)
(3)
Write somewhere: Thomas
(4)
Write below (1): message [email protected]
(5)
Draw this somewhere: [A picture of a chart]
(6)
Draw a line to connect (2) and (4)
(7)
Write a bulleted list (*) with elements: edamame; cinnamon; Garlic; Dried lentils
(8)
Draw something that evokes: love
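For illustration, here is a minimal sketch of how such a prompt could be assembled. The placeholder pools and generators are far smaller than the actual instruction library, and their contents are taken from the sample above.

```python
import random

# Tiny placeholder pools; the real library draws from Wikipedia text,
# curated word lists, tables, charts, images, and layout instructions.
TEXT_SNIPPETS = ["The song's tune was written by Harold Karr", "Thomas"]
WORD_POOL = ["edamame", "cinnamon", "Garlic", "Dried lentils", "basil"]
SKETCH_THEMES = ["happiness", "love"]

GENERATORS = [
    lambda: "Write somewhere: " + random.choice(TEXT_SNIPPETS),
    lambda: "Write a bulleted list (*) with elements: "
            + "; ".join(random.sample(WORD_POOL, 4)),
    lambda: "Draw something that evokes: " + random.choice(SKETCH_THEMES),
]

def assemble_prompt(n: int = 15):
    """Randomly compose about n instructions, enough to fill a page."""
    return [f"({i + 1}) " + random.choice(GENERATORS)() for i in range(n)]

print("\n".join(assemble_prompt(8)))
```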
During annotation, strokes were initially grouped into L1 objects. Then, these L1 objects were hierarchically grouped into L2 and L3 objects; see Figure 3 for example objects. Text content was also labeled. Documents were reviewed for annotation accuracy, and feedback was given to the workers. Our team also did extensive reviews and fixes of the labels.
This process resulted in 18.20 thousand (K) annotated ink documents with 5.19 million (M) strokes, consisting of 1.18M annotated L1 objects, 0.24M annotated L2 objects, and 83K annotated L3 objects. Our database is similar to the IAMonDo database [31], which also used a prompt-based collection process and hierarchical annotation with similar labels, although IAMonDo only had 0.94K annotated documents with 0.36M strokes.

4.4 Performance

We trained and evaluated the recognition system, which had 1.4M parameters, on our dataset. To enable comparison with previous work, we then evaluated the trained system on the designated test set (i.e., set3) of the IAMonDo dataset. As shown in Table 1, our recognition system achieved high performance when evaluated on our own test set that had 0.17M objects. However, when evaluated on the IAMonDo’s test set, the different definitions and label distributions between our dataset and IAMonDo might lead to lower recognition performance for some categories, such as “Connector.” Refer to [78, 79] for comparable results from prior work, but note that their models were trained and evaluated on the same dataset (e.g., IAMonDo).
Table 1:
Dataset        Metrics   Word     Bullet   Connector   Arrow    Underline   Textline   Textblock   List     Overall
Our Dataset    SR        0.9774   0.9991   0.9483      0.9177   0.9080      0.9763     0.9099      0.9613   0.9631
               acc       0.9995   0.9993   0.9133      0.9846   0.8389      0.9992     0.9978      0.9961   0.9944
IAMonDo        SR        0.7367   0.9331   0.6321      1.0000   0.8787      0.8235     0.4098      0.8131   0.7288
               acc       0.9970   0.9408   0.1792      1.0000   0.7769      0.9991     0.9613      0.8549   0.9744
Table 1: The recognition system was evaluated using two metrics adopted from prior work [79]: segmentation recall rate (SR) and stroke-level accuracy (acc). SR measures the percentage of objects correctly segmented, while acc measures the percentage of strokes accurately classified. Higher values in both metrics indicate better performance. The system was trained on our dataset and then evaluated on both our dataset and the IAMonDo dataset separately. We present the individual results for common labels and the overall performance across all the labels shared by both datasets.
The LSTM-based text recognition model [9], the Transformer-based character segmentation model [33], and the RNN-based text synthesizers [24, 47] were trained on the annotated text from our datasets. The ground truth for character segmentation was obtained following the procedure described in prior work [33]. As we made very minimal modifications to these models, please refer to the prior work [9, 24, 33, 47] for details on their performance, including their training and evaluation on the IAMonDo database.
We implemented the recognition and synthesis algorithms on device; refer to Section 7 for runtime performance.

5 Design Space

This section explores the current design space of Inkeraction along four dimensions, which may expand as its capabilities evolve. Certain features discussed here may require incorporating language models, which are beyond the scope of this paper. For illustrative purposes, we have employed hardcoded functions to demonstrate these features, but implementation with actual models remains a future endeavor.

5.1 Assisted Manipulation

Interpreting user intention is crucial to effective ink manipulation. This can be achieved by leveraging the relationship graph. Upon identifying the user’s intent, Inkeraction automates manipulation by transforming and synthesizing ink content accordingly.
User intention manifests as alterations to the relationship graph. For instance:
Adding ink content leads to additional nodes in the graph.
Deleting ink content leads to the removal of corresponding nodes from the graph.
Moving ink content modifies the containment relationships based on geometric boundaries (a minimal sketch of this containment test follows the list):
-
Moving a node into another node’s boundary establishes a containment relationship (e.g., adding a word to a list).
-
Conversely, moving a node outside its parent node’s boundary dissolves the containment relationship (e.g., removing a list item from a list).
Marking relationships like pointing and underlining are kept by default, except when deleted (e.g., erasing an underline).
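The containment change mentioned above can be illustrated with a minimal sketch based on axis-aligned bounding boxes; the real system uses richer geometry, so the helper below is only an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        return (self.left <= other.left and self.top <= other.top and
                self.right >= other.right and self.bottom >= other.bottom)

def containment_after_move(moved: Box, candidates: Dict[str, Box]) -> Optional[str]:
    """Return the node whose boundary now contains the moved object
    (establishing a containment relationship), or None if the move
    dissolves the previous containment."""
    for name, box in candidates.items():
        if box.contains(moved):
            return name
    return None

# Example: a word dragged inside a list's bounding box becomes a list item.
word = Box(120, 210, 180, 230)
parents = {"packing_list": Box(100, 200, 400, 500), "textblock": Box(0, 0, 90, 190)}
print(containment_after_move(word, parents))   # -> "packing_list"
```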
To automate manipulation operations, layout adjustments are generated by comparing the modified relationship graph to the original. Each node and relationship implements unique mechanisms to accommodate these changes. Here are some illustrations:
List keeps its list items ordered, aligned, and bulleted. Alterations (additions, deletions, or reordering) trigger the list node to infer paddings, spacings, and bullet types from the original graph. These properties are applied to the modified graph, generating a revised list layout. Figure 6 demonstrates that list modifications can arise from: moving a list item (Figure 6b and 6c), applying special gestures (Figure 6d), removing (Figure 6e) and adding list items (e.g., aligning a list item as the user writes it down). A minimal layout sketch appears at the end of this subsection.
Textblock and Textline preserve baselines and spacing of their content. When content is modified, these text nodes arrange the updated content to align its layout with the original graph. For example, deleting a word in a textline brings its neighboring words together.
Underline adjusts to follow changes made to underlined nodes.
Connector and Arrow maintain connections between nodes.
Figure 6:
Figure 6: A user manipulates lists to manage their schedule. The green glow indicates the current selection.
Note that these layout mechanisms can happen at the same time, see Figure 7 for a combined case. As Inkeraction develops, we may create new use cases by thinking about what relationships matter in an ink interaction (e.g., highlighting, table container, shape container), and how to reflect the relationship change in layout.
Figure 7:
Figure 7: A user moves “interaction” from a textblock to a list. In (b), the list rearranges “interaction” and adds a star bullet (1). The textblock organizes the remaining items (2). The underline follows “interaction” (3). The connector is redrawn to reflect the original connecting relationship (4). While (1) and (2) change the layout to show different relationships (i.e., “interaction” is moved from the textblock to the list), (3) and (4) reflect the unchanged relationships (i.e., “interaction” is still underlined and connected to “gesture”) in modified strokes.
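As a minimal illustration of the List mechanism above, the sketch below re-places list items after a graph change, inferring the item spacing from the original layout. Items are reduced to bounding boxes, and stroke-level work (e.g., re-synthesizing bullets, shifting individual strokes) is omitted.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Item:
    x: float   # left edge of the item's bounding box
    y: float   # top edge
    w: float
    h: float

def relayout_list(items: List[Item], left: float, top: float) -> List[Item]:
    """Re-place items top-to-bottom in their (possibly new) order,
    left-aligned to the list and separated by the median gap observed
    in the original layout (a stand-in for the inferred spacing)."""
    gaps = [b.y - (a.y + a.h) for a, b in zip(items, items[1:])]
    spacing = sorted(gaps)[len(gaps) // 2] if gaps else 8.0
    placed, y = [], top
    for item in items:
        placed.append(Item(left, y, item.w, item.h))
        y += item.h + spacing
    return placed
```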

5.2 Word Processor for Ink

Prior work indicated that users could transfer their mental models of word processors to ink interfaces [23]. Inkeraction allows ink to function as both input and output, similar to typed text. Thus, common word processor features, such as copy, paste, resize, and format painter, can be implemented for handwritten text with Inkeraction.

5.2.1 Spell Check.

This feature checks for misspelled words in handwriting and displays them with a red underline, as shown in Figure 8a. The user can tap on the red underlines to bring up a popup menu for confirmation or rejection (Figure 8b). If confirmed, a correction will be synthesized to match the user’s handwriting. If the synthesized correction overlaps with existing strokes, we can use the relationship graph to shift strokes, as described in Section 5.1. Other rendering versions are shown in Figure 8c. Spell Check suggestions are rendered as soon as Inkeraction recognizes the handwriting, and the rendered underlines will stay there until the user confirms or rejects them.
Figure 8:
Figure 8: The Spell Check feature.

5.2.2 Auto Completion.

This feature predicts the content the user is going to write and renders the prediction in purple strokes that match the user’s handwriting; see Figure 9a. The user can tap on the prediction to accept it (Figure 9b), or they can ignore it and continue writing (Figure 9c). Predictions are displayed as soon as Inkeraction recognizes the handwriting.
Figure 9:
Figure 9: The Auto Completion feature.

5.2.3 Formatting.

Word processors offer many ways to format typed content, which can be applied to ink content as well. For example, users can tap on a bullet to change a bulleted list to a numbered list (Figure 10 left), or use gestures to adjust list spacing (Figure 10 middle). Given the vast range of formatting possibilities, further research is warranted to determine the specific options necessary for handwriting. For example, underlining may be redundant as users can easily do it themselves. It is also important to consider how these formatting options can be activated within the constraints of limited ink interaction space.
Figure 10:
Figure 10: The list can be formatted using gestures. The list on the left shows the original list for reference. The user can tap on the list bullet to change the bullets into numbers. The user can also pinch on the list to adjust the list item spacing. The list on the right shows the final list format, which is numbered and has a larger spacing than the original one.

5.3 Writing Automation

Inkeraction streamlines numerous writing tasks that are cumbersome to complete using traditional pens and paper.

5.3.1 Transcription.

Note-taking tasks frequently involve transcribing content from various sources, which can be simplified using the Transcription feature. Figure 11 shows how this feature enables copying digital content from websites. Moreover, Transcription can be enhanced with microphones and cameras. For instance, it can be used in conjunction with speech input technologies to take real-time lecture notes. Additionally, optical character recognition (OCR) enables the digitization of notes written on whiteboards.
Figure 11:
Figure 11: The Transcription feature.

5.3.2 Beautification.

Sometimes we need to revise notes for aesthetic purposes. With the Beautification feature, users can select text and apply beautification effects. There are many ways to beautify handwriting, and Inkeraction offers two algorithms for this purpose, as shown in Figure 12. The unsynthesized algorithm aligns text items and corrects margins using text baselines, yielding a more natural-looking result (Figure 12b). The synthesized algorithm rewrites text using a generic font, producing a more standardized output (Figure 12c).
Figure 12:
Figure 12: Two ways the Beautification feature can beautify the text.

5.4 Composing with Generative Models

Inkeraction can seamlessly integrate generative models, including large language models (LLMs), into inking interfaces, enabling direct user assistance and conversation. We present illustrative examples that showcase the boundless potential of this design space.

5.4.1 Brainstorm and Text Generation.

Figure 1f shows the Brainstorm feature, which can help the user to expand an existing list. Similarly, Inkeraction can be combined with LLMs to summarize, bulletize, shorten, elaborate, and rephrase written text.

5.4.2 Task Fulfillment.

Generative models can empower inking interfaces by integrating API calls [54]. This transforms inking surfaces into hubs of ideation and computation, where users can express their thoughts and commands and ask the surfaces to comprehend and execute them. Examples include scheduling calendar events, setting alarms for deadlines, or creating task sequences through lists or arrows as shown in Figure 13.
Figure 13:
Figure 13: A user creates a sequence of tasks to be fulfilled. Inkeraction recognizes the writing and sequence, and invokes a large language model to call different APIs to complete the tasks. Each task is marked with a small icon to indicate the progress.

5.4.3 Organizing Genie.

While the Assisted Manipulation feature simplifies note arrangement, generative models offer even greater assistance. Figure 14 demonstrates how Inkeraction can automatically organize a cluttered page with an LLM. Inkeraction provides essential segmentation and text recognition data, which the LLM leverages to identify and represent relationships among the segmented content in a structured format. Inkeraction then visualizes these structures through connectors and handwritten text. Researchers have developed algorithms for relation extraction and structure generation. For example, TextSketch [67] employed traditional natural language processing to guide users in creating diagrams and visual notes. Studies have shown that LLMs outperform traditional models in relation extraction [73]. Due to the free-form nature of user-generated content, such as sketches and diagrams, developing algorithms for effective organization and structuring remains an ongoing research challenge. Inkeraction’s recognition capabilities offer valuable input for external algorithms to comprehend relationships, while its synthesis abilities allow machines to generate structures on behalf of users. As LLMs evolve, we believe their ability to extract various types of relations could also advance, especially with the help of Inkeraction’s relationship graph.
Figure 14:
Figure 14: Inkeraction and a large language model (LLM) can work together to organize notes. The left image is an unorganized note. The LLM extracts causal relations between segmented content, and represents the relations in structures (e.g., lists, mind maps). The right image is a possible outcome.

6 Prototype

To assess Inkeraction’s functionality and user experience, we created a prototype, which runs on a Samsung Galaxy Tab S7+ tablet and includes six features drawn from the four areas of Inkeraction’s design space. The features were selected for variety. Gestures were also implemented to interact with the features.

6.1 Gestures

We implemented gestures to move strokes and use Inkeraction features, as shown in Figure 15. The ink gestures are scribble-to-delete (Figure 15b) and circle-to-select (Figure 15c). To support the mode switching between ink drawing and ink gesturing, we use a timeout technique [26]: the user must pause with the pen at the end of the stroke to activate an ink gesture; otherwise the stroke will be treated as regular ink. We used heuristics and classification algorithms [76] to identify the ink gestures. Four finger gestures are tap-to-select (Figure 7a), tap-to-deselect (Figure 15f), tap-to-confirm (Figure 9b), and drag-to-move (Figure 15e). These gestures could be replaced with alternatives; we did not intend to evaluate them in this prototype.
Figure 15:
Figure 15: A user uses different gestures to organize their chemistry notes. The blue circle represents the end of a stroke.
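A minimal sketch of the timeout-based mode switch is shown below. The 500 ms threshold is an assumption; the paper does not report the timeout value used in the prototype.

```python
import time

PAUSE_THRESHOLD_S = 0.5   # assumed value, not reported in the paper

class PenModeSwitcher:
    """If the pen pauses at the end of a stroke for longer than the
    threshold before lifting, treat the stroke as an ink gesture;
    otherwise treat it as regular ink."""

    def __init__(self):
        self.last_move_time = None

    def on_pen_move(self):
        self.last_move_time = time.monotonic()

    def on_pen_up(self) -> str:
        paused = (self.last_move_time is not None and
                  time.monotonic() - self.last_move_time >= PAUSE_THRESHOLD_S)
        return "gesture" if paused else "ink"
```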

6.2 Features

We implemented the features using the recognition and synthesis models presented in Section 4. For features that rely on external models, like language models, we incorporated simplified functions with built-in data to simulate the behavior of these external models, as described below in Sections 6.2.2, 6.2.3, and 6.2.6. We adopted this approach because our main objective was to evaluate Inkeraction, not the external models themselves. When integrated with actual external models, these features should be able to handle a wider range of use cases, though with additional errors introduced by those models.

6.2.1 Assisted Manipulation.

This feature supports the use cases in Figure 6 and Figure 7. It enables users to reorder, drag, delete, mark off, and move items within a list and a connected graph, both of which can be created by the users themselves. To utilize Assisted Manipulation, the user first selects an item by tapping it with their finger or circling it with the stylus. Once an item is selected, the user can drag the item to achieve the desired outcome.

6.2.2 Spell Check.

This feature utilizes a built-in dictionary of misspelling-correction pairs to suggest appropriate corrections for recognized words. The dictionary includes the four underlined misspellings in the following paragraph: “The Twelvth Night is a marvellous comedy. The play begins with a disasterous journey to sea in a yaucht.” These four words are predominantly used in British English but spelled differently in American English. If a correction is too lengthy, the feature will shift neighboring words, as shown in Figure 8c .

6.2.3 Auto Completion.

This feature automatically suggests the subsequent word in a phrase based on the recognition of the initial word. It employs a compact dictionary that stores first-second word pairings. For testing purposes, we implemented the following phrases from a grocery list: “Salad dressing; Peanut butter; Cottage cheese; Cream cheese; Baking powder; Potato chips.” Refer to Figure 9 for an illustration of the interaction.
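For reference, the built-in tables behind the Spell Check and Auto Completion features (Sections 6.2.2 and 6.2.3) can be sketched as simple dictionaries. The correction targets shown are our assumption; the paper lists only the misspelled forms.

```python
# Misspelling -> assumed correction (Section 6.2.2).
MISSPELLING_CORRECTIONS = {
    "Twelvth": "Twelfth",
    "marvellous": "marvelous",
    "disasterous": "disastrous",
    "yaucht": "yacht",
}

# First word -> predicted second word (Section 6.2.3).
NEXT_WORD = {
    "Salad": "dressing", "Peanut": "butter", "Cottage": "cheese",
    "Cream": "cheese", "Baking": "powder", "Potato": "chips",
}

def check_word(word: str):
    """Return a suggested correction for a recognized word, if any."""
    return MISSPELLING_CORRECTIONS.get(word)

def complete_word(first_word: str):
    """Return the predicted next word for a recognized first word, if any."""
    return NEXT_WORD.get(first_word)
```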

6.2.4 Transcription.

This feature converts selected web text into handwriting. We used the Android WebView to display a web page within the application. The user interface resembles the one in Figure 11.

6.2.5 Beautification.

This feature changes handwriting using either the unsynthesized or the synthesized algorithms in Figure 12.

6.2.6 Brainstorm.

We designed this feature to provide suggestions for a packing list. When the user selects a list and invokes the feature, two predefined items, “swimsuit” and “coat,” are automatically added to the list.

6.3 User Interface

In addition to the features, the prototype (Figure 16) includes classic note-taking tools such as a stroke eraser, page creation, undo/redo buttons, and settings. These tools allow study facilitators and users to make quick edits during the study. They can also toggle beautification modes and features in the settings.
Figure 16:
Figure 16: The user interface of the prototype. The red lines are for description purposes. Selected items (e.g., “Selected Text” in this figure) will be highlighted in green and accompanied by a popup menu consisting of “Brainstorm” and “Beautify”.

7 Runtime Performance

We tested each of the implemented features, logging the timestamps and recognition results to measure the runtime performance. We used a charging Samsung Galaxy Tab S7+ tablet for the tests. Each data point below is based on a sample size of 10.

7.1 Recognition

The recognition pipeline continuously processes handwriting as a user writes. We optimized the stacked GNN so that new strokes can be merged with neighboring strokes incrementally without having to do a full-page recognition each time. However, the text recognition and character segmentation models still process all of the textual information each time. Adding text to a document will increase processing time, which can be optimized in future work by running text recognition and character segmentation only on new textual information. In our tests, a single character, “a,” took 20.50 milliseconds (ms) (SD = 2.51 ms) to recognize, while a long word, “character,” took 51.43 ms (SD = 5.60 ms). The scripted paragraph in the Spell Check feature took 362.83 ms (SD = 4.93 ms), and the scripted list in the Auto Completion feature took 359.87 ms (SD = 4.03 ms). During the tests, all structures and text were correctly recognized.

7.2 Synthesis

We only measured the time needed for style-based handwriting text synthesis. Stroke cloning and curve synthesis were not included in the measurements as they took very little resources. The processing time of the text synthesis is influenced by both the reference template and the target text. When using the built-in standard template for reference, a single character, “a,” took 21.62 ms (SD = 0.48 ms) to synthesize, while a long word, “character,” took 120.64 ms (SD = 1.50 ms). Synthesizing a sentence, “the quick brown fox jumps over the lazy dog,” took 502.59 ms (SD = 2.19 ms). When using a user-written word, “sample,” as the template, the same character took 40.60 ms (SD = 0.39 ms) to synthesize, while the same long word took 95.78 ms (SD = 1.05 ms). Synthesizing the same sentence took 282.19 ms (SD = 2.18 ms).

7.3 Features

The Spell Check feature relies on the recognition capability to detect misspelled words. When writing the scripted paragraph, it took 68.23 ms (SD = 1.48 ms) to detect the misspelled “Twelvth”, 123.66 ms (SD = 1.42 ms) to detect the misspelled “marvellous”, 266.11 ms (SD = 2.88 ms) to detect the misspelled “disasterous”, and 379.35 ms (SD = 3.68 ms) to detect the misspelled “yaucht”. The Auto Completion feature uses both the recognition and synthesis capabilities to render a suggestion. When writing the scripted list, it took 111.13 ms (SD = 2.46 ms) to suggest “dressing” after writing “Salad”, 250.96 ms (SD = 2.08 ms) to suggest “cheese” after writing “Cream”, 383.17 ms (SD = 7.74 ms) to suggest “chips” after writing “Potato”. The Beautification feature uses an enhanced synthesis algorithm that takes more time than the style-based ink synthesizer. Beautifying a sentence, “the quick brown fox jumps over the lazy dog,” took 542.44 ms (SD = 2.24 ms) using the unsynthesized algorithm, and 2.01 seconds (SD = 0.003 seconds) using the synthesized algorithm.

8 Study I: General Feedback

To understand how Inkeraction would benefit users in inking tasks, we conducted two user studies. Study I focused on gathering general feedback, delving into users’ perceptions of Inkeraction’s overall learnability, usability, and usefulness. Study II compared an Inkeraction-powered ink tool to a conventional inking tool. Below we describe our design and results of Study I.

8.1 Methods

8.1.1 Participants.

We recruited participants who regularly took handwritten notes. Twelve participants (P1 - P12) with an average age of 30.17 (SD=3.97) joined the study. Each participant signed a consent form before taking part in the study, and was given a notebook as a token of appreciation after the study. All participants (8 males, 4 females) had a college degree or above. Half of the participants took handwritten notes daily while the other half took notes weekly. Eight participants owned tablets with a stylus pen.

8.1.2 Procedure.

After answering demographic questions, the participants were shown the prototype and each experienced the six features in a balanced Latin square order. For each feature, a facilitator demonstrated how to use it first, and then the participant tried it out on their own. For the Beautification feature, the participants were able to try two different algorithms.
After each feature, each participant was asked to rate the feature with three 7-point scale Single Ease Questions (SEQs):
(1)
Learnability: learning how to use this feature was [1 very difficult – 7 very easy]
(2)
Usability: using this feature was [1 very difficult – 7 very easy]
(3)
Usefulness: this feature was [1 useless – 7 very useful]
The participants were also asked to clarify their ratings.
After trying all of the six features, participants ranked them based on their overall experience. Then, the facilitator explained how the features worked and asked participants about their thoughts on privacy concerns. Finally, we collected feedback on new ink interaction features that participants would like to see in future note-taking apps. Each study session took about 45 minutes.

8.1.3 Design and Analysis.

We recorded the screen and audio during the study. Then we analyzed the data from the SEQs and rankings, along with the participants’ transcribed comments. Finally, we identified common themes in the data.

8.2 Results and Findings

Figure 17 shows the user scores on the Single Ease Questions (SEQs) for each of the six tested features.
Figure 17:
Figure 17: User scores on the Single Ease Questions (SEQs) for each of the six tested features. Each feature was rated for learnability, usability, and usefulness from 1 to 7, higher is better. Error bars represent standard deviations.

8.2.1 Learnability.

Spell Check, Beautification, and Auto Completion features were the easiest to learn. Participants were already familiar with similar spell checking and word completing features from other word processors, as P3 said, “It’s just like Google Docs.” Beautification was easy to learn because it could be triggered by a simple button. Assisted Manipulation, however, was considered the most difficult to learn. P6 and P8 attributed this to the feature’s discoverability while P1 and P9 felt there were many different ways one could manipulate strokes. P6 explained, “if you don’t tell me [the different stroke manipulation operations], I won’t use it.” P1 suggested “more time to demonstrate” would make Assisted Manipulation more learnable.

8.2.2 Usability.

Not all highly learnable features were considered usable. Spell Check and Beautification were rated the easiest to use, while Auto Completion was rated the most difficult to use. Nine participants complained about the performance and latency of the Auto Completion feature. Auto Completion needs time to recognize written strokes before it can generate a suggestion, but users may write much faster. P1 commented, “It’s too slow, if I think of something, I will keep writing and it won’t catch up.” Moreover, the implemented version required the user to confirm the suggestion with a finger tap, which further slowed down the writing process. P5 suggested the feature should “automatically complete every word” without the confirmation. All participants were able to experience the Assisted Manipulation feature with their own list and connected graph. However, some participants were frustrated by the gestures they needed to learn, which resulted in a moderate usability rating.

8.2.3 Usefulness.

Despite its low learnability and moderate usability ratings, Assisted Manipulation was considered the most useful feature. Participants were excited about the flexibility and assistance it provided. P6 said, “handwriting is fast and [this makes] modifying easy, so you will benefit from both physical and digital writing tools.” Participants noted that the manipulation could be especially useful for learning (P1, P5, P10) and planning (P8, P11, P12). Spell Check was also highly rated. Five participants (P1, P6, P8, P9, and P12) mentioned that spelling errors were common in handwriting, and P9 further emphasized, “I can’t tolerate any imperfections in my notes. [Such as] not looking good, typos. So I usually type. If you have this feature, I will start writing notes.”
Auto Completion and Brainstorm were ranked as the least useful features. Auto Completion suffered from its performance issues, as explained previously. Even with the current performance, four participants (P3, P9, P11, P12) thought the feature could still be useful for complex words or sentences. As for the Brainstorm feature, participants felt the current prototype was too limited since only two items were generated. Some participants thought even a more capable AI would not be able to anticipate what they wanted. P7 said, “when people are writing, they are not writing basic stuff and the topics may not be common knowledge”, which was echoed by P1, P2, and P10.

8.2.4 Overall Rankings.

Figure 18 shows the overall ranking results.
Figure 18:
Figure 18: The distribution of rankings for each of the six tested features. Lower rank is better.
Assisted Manipulation was the most favored feature, with a mean rank of 2.42 (SD = 1.39); ten participants ranked it among their top three choices. Spell Check was the runner-up, which is consistent with its high learnability, usability, and usefulness scores. Brainstorm was the least favored feature, with a mean rank of 4.58 (SD = 1.16), criticized mainly for its limited usefulness. Auto Completion was the second least favored due to its performance issues.
Beautification and Transcription were middle-ranked. When asked about their preference between the two beautification algorithms, four participants (P1, P6, P10, P12) preferred the unsynthesized algorithm, explaining that they wanted to keep their writing styles. Five participants (P3, P4, P5, P7, P8) liked the synthesized algorithm better, saying they wanted to make their writing more recognizable for sharing purposes. Three participants (P2, P9, P11) had no preference and suggested keeping both algorithms for different use cases. Seven participants suggested that the Beautification feature should also offer the option to change writing styles and to use typed fonts. Due to implementation limitations, it was difficult to select sentences in the Transcription feature, and participants wanted to use it to transcribe more than just words. P1 commented, “it’s just like copy and paste…[but it] would be better if you can select a sentence.” This was also shared by P4, P8, and P9.

8.2.5 Concerns and Suggestions.

Seven of the participants had no privacy concerns regarding the recognition and synthesis models. However, the other five participants felt that there should be enhanced security measures in place. P1, P2, P5, and P6 expressed that no stroke data should be uploaded to servers and suggested that the models run locally on devices instead. P6 also mentioned that he kept important personal information like passwords in notes, so he would not feel comfortable storing this information in the cloud. P11 said she would prefer to use these features with an authentication method that was not tied to her identity.
In addition to the feature suggestions mentioned above, the participants also provided advice on how to improve the Brainstorm feature and incorporate multimedia. Two participants wanted the Brainstorm feature to be more personal and more capable. P2 wanted the generative model to get to know her better and provide more specific information; for example, instead of adding “sunscreen” to the packing list, the AI should tell her which sunscreen to bring. P4 hoped the generative model could learn his writing preferences and provide contextual suggestions, such as today’s date. P7, P9, and P11 shared common suggestions on building features for multimedia support, such as annotating multimedia, linking multimedia, and finding connections between multimedia and strokes.

9 Study II: Comparison

Given the insight from Study I, we wanted to further clarify Inkeraction’s unique value proposition compared to traditional inking tools. We conducted a subsequent comparison study to evaluate users’ behavior across different inking tasks, revealing Inkeraction’s specific impact.

9.1 Methods

9.1.1 Baseline.

To compare the Inkeraction prototype with traditional inking software, we built a baseline version inside the prototype. The baseline disabled the recognition and synthesis capabilities but kept basic features such as pen, eraser, and undo/redo buttons. In the Inkeraction prototype, users could select recognized objects by tapping on their strokes. This recognition-based selection was not available in the baseline. To enable selection in the baseline, we added a rectangular marquee selection tool, which selects all the strokes within the drawn rectangle (see Figure 19a).
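As a rough illustration of the baseline's selection behavior, the sketch below is our own simplification (not the prototype's actual code): a marquee selection reduces to a containment test of each stroke's points against the drawn rectangle.

```python
# A rough sketch (our assumption, not the prototype's actual code) of the
# baseline's rectangular marquee selection: a stroke is selected only when
# all of its points fall inside the drawn rectangle.
def marquee_select(strokes, rect):
    """strokes: list of strokes, each a list of (x, y) points.
    rect: (left, top, right, bottom) of the drawn rectangle."""
    left, top, right, bottom = rect

    def inside(point):
        x, y = point
        return left <= x <= right and top <= y <= bottom

    return [stroke for stroke in strokes if all(inside(p) for p in stroke)]

# Example: only the first stroke lies entirely within the rectangle.
strokes = [[(12, 12), (20, 18)], [(5, 5), (40, 40)]]
print(marquee_select(strokes, (10, 10, 30, 30)))  # -> [[(12, 12), (20, 18)]]
```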

9.1.2 Participants.

Ten participants (P13 - P22) with an average age of 29.6 (SD = 3.86) joined the second study. All participants (5 females, 4 males, 1 non-binary) had a master’s degree or above; each signed a consent form and received a notebook as a token of appreciation. Four took handwritten notes daily, while the rest did so weekly.

9.1.3 Procedure.

After completing demographic questions, participants were briefed about Inkeraction and the study’s purpose. To compare Inkeraction’s features described in Section 6.2 with the baseline, we designed five tasks, excluding the Brainstorm feature due to its lack of a baseline equivalent. Figure 19 shows these tasks and demonstrates their completion in both baseline and Inkeraction. Table 2 outlines the typical user actions needed for task completion. Before each task, a facilitator provided instructions. For Correcting, Rearranging, and Aligning tasks, the facilitator helped the participants set up the content needed for each trial. Then each participant completed each task twice, once using the Inkeraction prototype and once with the baseline. They had ample opportunity to familiarize themselves with the tasks and tools before the trials began. To ensure counterbalancing, half of the participants commenced with an Inkeraction trial, while the others started with a baseline trial. The tool order was reversed after each task. Each participant completed a total of ten trials. For the Correcting and Writing tasks, relevant content was displayed on a screen during the trials.
Afterward, the participant compared Inkeraction features with the baseline via a 7-point Likert scale (See Table 4). Then, they saw unimplemented features via video, sharing their thoughts on Inkeraction’s potential as a thinking and inking tool. The entire study took about 45 minutes.
Figure 19:
Figure 19: The baseline setup and five tasks in Study II. Figures 19b, 19c, 19e, and 19f show the actual content used in the study.

9.1.4 Metrics.

To assess the efficiency and effectiveness of inking tools, we measured the following metrics:
User Action: Every stroke, gesture, and button click counts as a user action, with fewer actions indicating higher efficiency.
Completion Time: Time elapsed between the first user action and the moment the system completes processing the last user action, with shorter time indicating faster tool performance.
Writing Quality: Assessed through a survey comparing 50 pairs of writing samples from the 10 participants across the 5 tasks (i.e., one sample from each trial). In each survey question, a rater, unaware of the writing tools used, compared a pair of samples produced by the same participant in one task but with different tools. The order of the questions was randomized. Figure 20 shows an example question. Twenty-seven participants from our institute, who did not take part in the prior studies, completed the survey. Initially, we attempted to assess writing quality through participants’ own feedback, but we found it too subjective and prone to participant response bias [16]. While acknowledging the subjective nature of handwriting, we believe the collective opinions of the raters provide a valid indicator of public perception.
Figure 20:
Figure 20: An example survey question for rating the writing samples.

9.1.5 Design and Analysis.

During the study, we captured screen and audio recordings. Analysis of these recordings and log data enabled us to calculate the metrics. We summarized the results from the Likert scale and transcribed participant comments, and aggregated the survey data for further analysis.

9.2 Results and Findings

9.2.1 User Action.

Table 2 indicates that Inkeraction users completed tasks with statistically significantly fewer actions, demonstrating increased efficiency. Notably, in the Aligning task, Inkeraction reduced the actions needed by a factor of 18.5 compared to the baseline.
Table 2 also shows the typical user actions we observed. Not all participants followed these typical actions, as they could make mistakes or use other tools. Examining the types of actions performed, we found that Inkeraction reduced the number of actions by minimizing writing actions, simplifying selection, and automating alignment. For example, in the Transcribing, Correcting, and Writing tasks, handwriting a word cost participants an average of 6.08 (SD = 1.27) strokes plus additional erasing actions, while Inkeraction allowed participants to compose a word with one or two actions.
Inkeraction’s recognition capabilities also aided tasks requiring selection (e.g., Rearranging, Aligning). For example, in the Rearranging task, the marquee selection tool required more actions to select a desired target than the recognition-powered tap-to-select technique (Footnote 1). It took a participant 2.29 (SD = 1.22) strokes on average to marquee-select a desired target, while the tap-to-select technique required only 1.28 (SD = 0.26) taps. The tap-to-select technique had a larger selection area and thus required less effort to use [46], which may explain the improvement.
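For context on the effort argument, the cited work [46] analyzes pointing with Fitts’ law; in its Shannon formulation, the index of difficulty of a selection movement is \(ID = \log_2(D/W + 1)\), where D is the distance to the target and W is the effective target width. Enlarging the effective target, as tap-to-select does by making the entire recognized object tappable, lowers ID and hence the predicted selection time.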
Additionally, Inkeraction reduced the number of selection and dragging actions by automatically aligning items. For example, in the Rearranging task, a participant typically performed more than three drags (Mean = 3.56, SD = 1.33) to complete the baseline trial, compared to only two drags (Mean = 2.00, SD = 0.00) in the Inkeraction trial. As an extreme example, P18 dragged 7 times in the Rearranging baseline trial, which took them 95.05 seconds. The automatic alignment made ink manipulation more efficient (fewer steps) and more consistent (smaller deviation) across users.
Table 2:
Task: Transcribing
  Baseline (B): Write[...all words...] — 70.60 actions (SD = 13.08)
  Inkeraction (I): Write[“Types of Wine”] → 6 × { TapSelect<A phrase> → Drag } — 28.50 actions (SD = 3.98)
  Significance: t(9) = 9.43, p < .001

Task: Correcting
  B: 4 × { Erase[A misspelled word] → Write[A correct word] } — 30.50 actions (SD = 13.24)
  I: 4 × { Tap[A misspelled word] → Tap[Suggested word] } — 8.30 actions (SD = 0.67)
  Significance: t(9) = 5.35, p < .001

Task: Writing
  B: Write[...all words...] — 69.10 actions (SD = 16.09)
  I: 6 × { Write[A word] → Tap[Auto-completed word] } — 46.90 actions (SD = 10.03)
  Significance: t(9) = 6.36, p < .001

Task: Rearranging
  B: MarqueeSelect<A> → Drag to empty space; MarqueeSelect<B> → Drag to the original location of A; MarqueeSelect<A> → Drag to the original location of B — 22.00 actions (SD = 15.06)
  I: TapSelect<A> → Drag to the location of B; TapSelect<B> → Drag to the original location of A — 7.00 actions (SD = 1.05)
  Significance: t(9) = 3.39, p = .012

Task: Aligning
  B: RepeatUntilSatisfied { MarqueeSelect<A word> → Drag to a desired location } — 18.50 actions (SD = 9.95)
  I: Tap[Beautification Button] — 1.00 action (SD = 0.00)
  Significance: t(9) = 5.56, p < .001
Table 2: The user action data from the five tasks in Study II. For the number of user actions, the lower the better. The Inkeraction trials outperformed the baseline trials across all tasks. Through two-tailed paired t-tests, we validated that all the results were statistically significant (p < .05). We also documented the typical user actions needed to complete the tasks.

9.2.2 Completion Time.

Figure 21 shows the average time participants spent on each trial. Using Inkeraction, participants finished all the tasks statistically significantly faster than with the baseline. Comparing the data from all trials (N = 100), there is a positive linear correlation between the number of user actions and the completion time (r = 0.77). More actions generally meant more time, but the types of actions and the pauses between them also affected completion time.
We also found that the Auto Completion feature, criticized for its lagging performance in Study I, still helped participants with the Writing task: it statistically significantly reduced both the number of actions and the completion time. The task involved six two-word phrases. Auto Completion suggested 5.30 (SD = 1.83) words on average, of which 4.80 (SD = 1.62) were confirmed by the participants.
Figure 21:
Figure 21: The time participants took to complete the five tasks using the two writing tools in Study II. Lower is better. Error bars represent standard deviations. Results of two-tailed paired t-tests are marked above the error bars. The Inkeraction trials took statistically significantly less time than the baseline trials in all five tasks (p < .05).

9.2.3 Writing Quality.

Table 3 presents the survey results from the 27 human raters. Votes were converted to scores ranging from -27 to 27, with positive scores indicating a preference for Inkeraction (higher is better). Out of 50 pairs of writing samples, 76% received positive scores. Inkeraction achieved a mean score of 8.7 (SD = 10.5) across all participants and tasks.
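To make the scoring rule concrete, the short sketch below (our own illustration, not the paper's analysis code) converts one cell's vote counts into the score and the implied number of indifferent raters.

```python
# Converts one sample pair's vote counts into the Table 3 score and the
# implied number of indifferent raters (27 raters in total).
RATERS = 27

def cell_score(votes_for_inkeraction: int, votes_for_baseline: int):
    score = votes_for_inkeraction - votes_for_baseline
    indifferent = RATERS - votes_for_inkeraction - votes_for_baseline
    return score, indifferent

# Example: the Transcribing sample pair from P13 received +16/-7 votes.
print(cell_score(16, 7))  # -> (9, 4)
```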
Table 3:
Task             | P13    | P14    | P15    | P16    | P17    | P18    | P19    | P20     | P21     | P22     | Mean/Task
Transcribing     | +16/-7 | +20/-5 | +18/-4 | +19/-7 | +6/-19 | +21/-4 | +17/-3 | +11/-12 | +15/-4  | +12/-13 | 7.7 (SD = 9.1)
Correcting       | +17/-1 | +20/-1 | +19/-3 | +19/-2 | +17/-5 | +25/-0 | +14/-6 | +13/-6  | +20/-3  | +15/-8  | 14.4 (SD = 5.6)
Writing          | +8/-17 | +16/-6 | +21/-4 | +5/-19 | +16/-9 | +20/-2 | +22/-3 | +18/-2  | +11/-11 | +14/-6  | 7.2 (SD = 11.0)
Rearranging      | +5/-3  | +6/-2  | +21/-4 | +0/-3  | +5/-2  | +18/-1 | +18/-1 | +3/-2   | +3/-1   | +2/-6   | 5.6 (SD = 7.8)
Aligning         | +9/-16 | +27/-0 | +27/-0 | +20/-3 | +20/-1 | +19/-2 | +14/-8 | +9/-9   | +9/-12  | +2/-20  | 8.5 (SD = 14.7)
Mean/Participant | 2.2 (SD = 9.5) | 15.0 (SD = 7.8) | 18.2 (SD = 4.5) | 5.8 (SD = 12.3) | 5.6 (SD = 10.7) | 18.8 (SD = 3.1) | 12.8 (SD = 5.0) | 4.6 (SD = 6.3) | 5.4 (SD = 7.4) | -1.6 (SD = 9.4) | 8.7 (SD = 10.5)
Table 3: The ratings for writing samples. Each table cell shows the ratings for a pair of writing samples produced using Inkeraction and the baseline. Each sample pair was rated by 27 human raters, each of whom preferred the sample produced with Inkeraction, the sample produced with the baseline, or neither. For each sample pair, we counted the votes for Inkeraction (marked with +) and the votes for the baseline (marked with -). The number of indifferent votes can be inferred as (27 − positives − negatives). Each cell corresponds to a score of (positives − negatives); higher scores favor Inkeraction. The highest possible score is +27, 0 means no difference, and the lowest possible score is -27. We also report the mean scores.
Inkeraction outperformed the baseline in all tasks, with the highest score in the Correcting task. This is likely due to Inkeraction’s Spell Check feature, which replaced entire misspelled words and ensured alignment, whereas some participants made character-level edits (e.g., erasing an “l” in “marvellous”) that resulted in unbalanced samples. The Rearranging task received the lowest score, although it still favored Inkeraction. The similarity between the samples in this task, where only two words were swapped, may account for this. Notably, the samples from P15, P18, and P19 received larger vote gaps in the Rearranging task. Among these samples, two baseline samples had misaligned words or bullets, while the other baseline sample missed a stroke (due to an inaccurate marquee selection).
Examining mean scores by participant, we found that 9 out of 10 participants’ samples received positive mean scores, indicating improved writing quality with Inkeraction. The exception was P22, who spent considerably more time on the baseline trials. For example, in the Aligning task, they spent 45 seconds on the baseline trial, while the mean completion time was 27.11 seconds. We think this indicates that P22 invested extra effort in these trials, producing writing quality that exceeded what Inkeraction afforded. Further investigation is warranted to confirm this hypothesis.

9.2.4 Likert Scale and Qualitative Feedback.

Table 4:
Statement                                                          | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Mean (SD)
S1. With Inkeraction, I need to spend more time finishing tasks.   | 5 | 5 | – | – | – | – | – | 1.50 (0.50)
S2. With Inkeraction, I can finish tasks with less tool switching. | – | – | – | – | 1 | 5 | 4 | 6.30 (0.64)
S3. It is more difficult to manipulate content in Inkeraction.     | 4 | 3 | – | 3 | – | – | – | 2.20 (1.25)
S4. Inkeraction reduces repetitive work for me.                    | – | – | – | – | 1 | 4 | 5 | 6.40 (0.66)
S5. It is more difficult to finish the tasks with Inkeraction.     | 5 | 4 | 1 | – | – | – | – | 1.60 (0.66)
S6. With Inkeraction, I can focus more on thinking.                | – | – | 1 | 2 | 2 | 4 | 1 | 5.20 (1.17)
Table 4: The six statements (S1 – S6) in the Likert-scale used in Study II, with their histograms and mean scores. Each statement was scored from 1 (which indicates “strongly disagree”) to 7 (which indicates “strongly agree”). For positive statements (S2, S4, S6), higher scores are better. For negative statements (S1, S3, S5), lower scores are better.
Table 4 presents the Likert scale results, indicating that participants generally perceived Inkeraction as superior to the baseline. Inkeraction enhanced efficiency (low scores on S1), simplified interactions (high scores on S2 and S4), reduced effort (low scores on S3 and S5), and promoted focus (high scores on S6). Three participants remained neutral regarding S3; they mentioned that certain interactions remained equally challenging. For example, P17 noted that while Inkeraction reduced steps, it did not alter the intrinsic difficulty of dragging. Seven participants agreed that Inkeraction aided cognition, citing the reduced actions (P13) and simplified manipulations (P14, P17) as factors facilitating focus on thinking. The other three participants felt there was not enough evidence to support this statement. For example, P19 explained their negative score by saying “the tasks are too simple so there is little evidence.”
After watching videos of the untested features, the participants were very excited about the potential use cases of Inkeraction, as expressed by P17, “I need these, right now!” The participants also provided suggestions. P22 thought that these features could be optimized for students, who need to take notes from slides and constantly organize them. P20 provided an idea to improve the Auto Completion feature: instead of using a finger tap to confirm a suggested word, a user could indicate their confirmation by writing over (superimposing on) the suggested word.

10 Discussion

10.1 Writing and Thinking with Inkeraction

Inkeraction is a new modality with potential to transform the way we interact with handwritten content. This paper presents features showcasing the power of Inkeraction. Study I revealed how participants embraced Inkeraction’s features to overcome challenges they encountered with traditional handwriting. Participants like P7, who exclaimed, “This new thing lets me write like a boss. Now I can pump out tons of stuff without worry,” exemplified the positive impact these features have on user experience. Study II provided compelling quantitative evidence. Participants equipped with Inkeraction wrote faster, with better writing quality, and using fewer actions. Their qualitative feedback further underscored their appreciation for Inkeraction’s efficiency, which freed up valuable time and allowed them to focus on the content itself.
Inkeraction was conceived as a cornerstone of our visionary thinking tool. While traditional pen-and-paper methods often fall short in capturing thought processes due to their lack of editability, digital inking tools, with their added malleability, can shift content as users refine their initial ideas. However, these tools often struggle to match the speed of thought and may involve repetitive tasks. Drawing upon the insights from the design workshop, Inkeraction enhances the malleability and reduces repetitive work, making handwriting faster and better. Additionally, Inkeraction empowers artificial intelligence to co-pilot users’ thinking activities, offering valuable assistance in brainstorming and information organization.
Developing and evaluating a comprehensive thinking tool remains an ongoing research question. An ink-based thinking tool transcends the role of a mere writing assistant, encompassing functions like information seeking, idea inspiration, and persistent organization, among others. While Inkeraction lays a strong foundation for such a tool, it remains a work in progress. As some participants noted, the inking tasks in Study II lacked the sophistication necessary to fully demonstrate how Inkeraction alters thinking patterns. This exploration forms a key focus of our future research journey.

10.2 Design Challenges and Treatments

Inkeraction, like other modalities, still has limitations. Based on the studies and our development experience, we identify four major challenges and their treatments.

10.2.1 Recognition Challenge.

The recognition of strokes may not always match what the user expects. This can be due to the imperfect recognition algorithms, or the dynamic and versatile nature of ink. For example, a user may start writing a text block with multiple lines of text, but then decide to treat it as a list instead.
To address this issue, the system should make its interpretation of the ink transparent as the user approaches their desired outcome, so the user can act accordingly [45]. In our prototype, we implemented a preview technique, which allows users to see the recognized structures and possible outcomes before they make a change. When the user drags an item, we use a force-directed layout algorithm to show the recognized structures (see Figure 22). The layout algorithm considers the containment relationships between nodes: the children of a node are connected to it with a stronger force than the forces between the node and its neighbors. Once the user hovers over a location for an extended time, we cancel the force-directed layout and render the possible outcome for that location, with newly synthesized items rendered in gray (see Figure 23b). This preview technique helps users align their expectations with the recognition results. It also makes the interaction more transparent, as users can see what is happening behind the scenes.
Figure 22:
Figure 22: A demonstration of the force-directed layout algorithm. When a user drags the “moving item” from the bottom to the top, the layout algorithm moves the list and the textblock to indicate the recognized structures. The leftmost image is the original handwriting for reference.
Figure 23:
Figure 23: A demonstration of the preview effect when dragging the “moving item” into a list.
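To make the preview layout described above more concrete, the following sketch is a simplification under our own assumptions (constants and data structures are illustrative, not the actual implementation): containment links pull harder than neighbor links, and a pairwise repulsion keeps objects from overlapping.

```python
# A simplified sketch of a containment-aware force-directed preview layout:
# children are attracted to their parent more strongly than neighbors are,
# and all nodes repel each other to avoid overlap. Constants are illustrative.
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    x: float
    y: float
    children: list = field(default_factory=list)   # contained ink objects
    neighbors: list = field(default_factory=list)  # nearby but unrelated objects

K_CHILD, K_NEIGHBOR = 0.30, 0.05   # spring constants: containment >> neighbor
REPULSION, REST_LEN = 400.0, 60.0  # overlap avoidance and preferred spacing

def layout_step(nodes, dt=0.1):
    """One iteration; every node (including children) must appear in `nodes`."""
    forces = {id(n): [0.0, 0.0] for n in nodes}

    def spring(a, b, k):  # pull `a` toward `b` with Hooke's law
        dx, dy = b.x - a.x, b.y - a.y
        dist = math.hypot(dx, dy) or 1e-6
        f = k * (dist - REST_LEN)
        forces[id(a)][0] += f * dx / dist
        forces[id(a)][1] += f * dy / dist

    for n in nodes:
        for c in n.children:
            spring(n, c, K_CHILD)
            spring(c, n, K_CHILD)
        for m in n.neighbors:
            spring(n, m, K_NEIGHBOR)
        for m in nodes:  # pairwise repulsion keeps objects apart
            if m is n:
                continue
            dx, dy = n.x - m.x, n.y - m.y
            dist_sq = dx * dx + dy * dy or 1e-6
            forces[id(n)][0] += REPULSION * dx / dist_sq
            forces[id(n)][1] += REPULSION * dy / dist_sq

    for n in nodes:  # apply the accumulated forces
        n.x += forces[id(n)][0] * dt
        n.y += forces[id(n)][1] * dt
```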

10.2.2 Interpretation Challenge.

Limited user input makes it hard to understand the desired outcome when there are many possible outcomes. For example, when dragging an item into a list, the user may want to insert the item as a new list item or merge the dragged item with an existing list item.
To address the challenge of understanding user intent, we can use micro-interactions to derive the user’s desired outcome. In our prototype, we implemented Activation Areas (AAs) to interpret the user’s intent during a drag operation. Each structure has its own AA, which is its bounding box expanded by a padding margin. Figure 24 shows an example of AAs in a scene with a list and a text block.
Figure 24:
Figure 24: Different Activation Areas for a list and a textblock. Cyan text indicates the outcome for a particular dragging area.
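A rough sketch of how padded Activation Areas could drive this interpretation is given below; it is our illustration (the padding value, outcome names, and insertion rule are assumptions, not the prototype's code). The drop point of a dragged item is tested against padded item boxes first, then the padded list box.

```python
# Illustrative Activation Areas: padded bounding boxes decide whether a dragged
# item merges into an existing list item, becomes a new item, or stays free-form.
from dataclasses import dataclass

@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def padded(self, pad: float) -> "Box":
        return Box(self.left - pad, self.top - pad, self.right + pad, self.bottom + pad)

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

def interpret_drop(x, y, list_box, item_boxes, pad=12.0):
    """Return (outcome, index) for a drop at (x, y); `pad` is illustrative."""
    for i, item in enumerate(item_boxes):
        if item.padded(pad).contains(x, y):
            return "merge_into_item", i            # landed on an existing item
    if list_box.padded(pad).contains(x, y):
        # Inside the list but between items: insert before the first item below y.
        index = sum(1 for item in item_boxes if item.bottom < y)
        return "insert_as_new_item", index
    return "leave_free_form", None                 # outside every activation area
```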

10.2.3 Help-Hinder Challenge.

One design dilemma we faced was that the more help we offered, the more likely users were to feel hindered. Users may want different levels of assistance depending on the task and context.
To avoid hindering users, we propose two techniques: Unassisted Mode and AI Undo. In Unassisted Mode, users can perform actions without help, such as leaving an item overlapping with other strokes; to do this, users move strokes with a two-finger drag gesture that receives no assistance. We also considered a postfix method called AI Undo: when Inkeraction performs a task on behalf of the user, an undo option appears that allows the user to cancel Inkeraction’s effects. The undo option fades away after a few seconds to minimize interruptions. AI Undo can also be used for mode switching. See Figure 25 for examples.
However, these techniques can be tedious to use, especially when several steps are involved. To address this issue, we propose personalizing a decision model for individual users. The decision model would take the interaction context as input and output the most likely helping steps, allowing us to provide users with the level of assistance they want.
Figure 25:
Figure 25: The AI Undo feature for quickly reverting Inkeraction’s effects. Yellow glows indicate the affected strokes.
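As one possible direction for the proposed decision model, the sketch below is purely speculative: the feature names, weights, and threshold are our assumptions, not part of the prototype. It maps simple interaction-context features to an assistance mode and falls back to Unassisted Mode when assistance seems unwanted.

```python
# Speculative sketch of a personalized decision model: score the interaction
# context and fall back to Unassisted Mode when assistance seems unwanted.
# All feature names, weights, and the threshold are illustrative assumptions.
DEFAULT_WEIGHTS = {
    "recent_ai_undos": -0.8,    # frequent AI Undo use suggests help is unwanted
    "structured_edit": 0.6,     # list/graph manipulation benefits from help
    "free_sketching": -0.5,     # free-form drawing: stay out of the way
}

def assistance_score(context: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    return sum(weights.get(name, 0.0) * value for name, value in context.items())

def choose_mode(context: dict, threshold: float = 0.0) -> str:
    return "assisted" if assistance_score(context) > threshold else "unassisted"

# Example: a user who just undid two AI actions while sketching freely.
print(choose_mode({"recent_ai_undos": 2, "free_sketching": 1}))  # -> "unassisted"
```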

10.2.4 Performance Challenge.

Our user study found that the speed of handwriting recognition can impact the user experience. While it would be ideal for handwriting to be recognized and displayed as quickly as typed text, the machine learning algorithms used for handwriting recognition are far more computationally demanding than the straightforward input and output path of a keyboard and monitor.
To accommodate latency, we suggest avoiding designs that compete with the user or require them to take immediate action. For example, the current Auto Completion feature only suggests the single next word after the user has written a word. To avoid competing with the user, we can provide longer suggestions. To avoid requiring immediate action, we can ask the user to react to suggestions after they have finished their writing task.
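One further way to avoid competing with a fast writer, sketched below under our own assumptions (the idle delay and the recognizer interface are hypothetical), is to debounce suggestion requests until the pen has been idle for a moment.

```python
# Hypothetical debouncing of suggestion requests: each new stroke cancels the
# pending request, so the recognizer is only queried once the pen goes idle.
import asyncio

IDLE_DELAY = 0.6  # seconds of pen inactivity before requesting a suggestion

class SuggestionScheduler:
    def __init__(self, request_suggestion):
        self._request_suggestion = request_suggestion  # async recognizer call
        self._pending = None

    def on_stroke_end(self, strokes):
        # Restart the idle timer whenever the user finishes another stroke.
        if self._pending is not None and not self._pending.done():
            self._pending.cancel()
        self._pending = asyncio.ensure_future(self._wait_then_request(strokes))

    async def _wait_then_request(self, strokes):
        try:
            await asyncio.sleep(IDLE_DELAY)
            await self._request_suggestion(strokes)
        except asyncio.CancelledError:
            pass  # the user kept writing; drop the stale request
```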

10.3 Limitations and Future Work

The user studies provided valuable feedback on different features of Inkeraction, but did not examine the recognition and synthesis performance in detail. While we provided the technical specifications in Section 4 and runtime performance in Section 7, we need to test its performance in practice to fully understand the usefulness of Inkeraction. The free-form nature of handwriting makes it difficult to measure the performance of Inkeraction in controlled lab studies. These studies are typically limited to a small number of users who are asked to perform a specific task, and do not give us a realistic understanding of how Inkeraction will perform in real-world use cases. To address this limitation, we plan to conduct long-term studies of Inkeraction. We will monitor how often users notice unexpected behavior in their real-world use cases. This will give us a more accurate understanding of the system’s performance and identify any areas where it can be improved.
In our studies, we used simplified functions to simulate language models for features like Spell Check, Auto Completion, and Brainstorm. This allowed us to collect user feedback on these features, but it did not give us a complete picture of their potential. We believe the Brainstorm feature was undervalued in our study because the simulated language model could not generate as many creative and interesting ideas as a real language model would. In the future, we plan to spend more time studying and testing how Inkeraction and generative models can create new user experiences.
Inkeraction currently supports written text in English. There are challenges in expanding support to other languages that we plan to investigate in the future. For example, a right-to-left language (e.g., Hebrew) could have different rules for manipulating paragraphs and lists. As another example, in languages with logographic scripts (e.g., Chinese), the Spell Check feature may need to operate at the character level. While the features proposed in this work provide a general framework, we suggest that future research adapt them to individual use cases with consideration of language and culture.

11 Conclusion

We introduced Inkeraction, a novel approach for interacting with digital ink. Inkeraction recognizes a user’s handwriting and synthesizes strokes to assist with inking tasks. It can help users rearrange strokes, provide word-processor-level writing assistance, automate repetitive writing tasks, and seek help from generative models when needed. We evaluated Inkeraction in two studies and found that it enabled users to write faster and with higher quality in fewer steps. We also discussed the limitations of Inkeraction and explored future techniques for using and improving it.

Acknowledgments

We thank all the participants for their time and feedback.

Footnote

1. Not every participant used the two selection tools: P15 rewrote content in the baseline trial, and P14 still used the marquee tool in the Inkeraction trial.

Supplemental Material

MP4 File - Video Preview
MP4 File - Video Presentation
MP4 File - Video Figure: The demo video for the paper, showcasing the design space empowered by Inkeraction.

References

[1]
Emre Aksan, Fabrizio Pece, and Otmar Hilliges. 2018. DeepWriting: Making Digital Ink Editable via Deep Generative Modeling. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–14. https://doi.org/10.1145/3173574.3173779
[2]
James Arvo and Kevin Novins. 2000. Fluid Sketches: Continuous Recognition and Morphing of Simple Hand-Drawn Shapes. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (San Diego, California, USA) (UIST ’00). Association for Computing Machinery, New York, NY, USA, 73–80. https://doi.org/10.1145/354401.354413
[3]
Amid Ayobi, Tobias Sonne, Paul Marshall, and Anna L. Cox. 2018. Flexible and Mindful Self-Tracking: Design Implications from Paper Bullet Journals. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI ’18). Association for Computing Machinery, New York, NY, USA, 1–14. https://doi.org/10.1145/3173574.3173602
[4]
David Bargeron and Tomer Moscovich. 2003. Reflowing Digital Ink Annotations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Ft. Lauderdale, Florida, USA) (CHI ’03). Association for Computing Machinery, New York, NY, USA, 385–393. https://doi.org/10.1145/642611.642678
[5]
David Bargeron and Tomer Moscovich. 2003. Reflowing Digital Ink Annotations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Ft. Lauderdale, Florida, USA) (CHI ’03). Association for Computing Machinery, New York, NY, USA, 385–393. https://doi.org/10.1145/642611.642678
[6]
Andrea Bianchi, So-Ryang Ban, and Ian Oakley. 2015. Designing a Physical Aid to Support Active Reading on Tablets. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (Seoul, Republic of Korea) (CHI ’15). Association for Computing Machinery, New York, NY, USA, 699–708. https://doi.org/10.1145/2702123.2702303
[7]
Peter Brandl, Christoph Richter, and Michael Haller. 2010. NiCEBook: Supporting Natural Note Taking. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Atlanta, Georgia, USA) (CHI ’10). Association for Computing Machinery, New York, NY, USA, 599–608. https://doi.org/10.1145/1753326.1753417
[8]
William Buxton, Eugene Fiume, Ralph Hill, Allison Lee, and Carson Woo. 1983. Continuous hand-gesture driven input. In Graphics Interface, Vol. 83. 191–195.
[9]
Victor Carbune, Pedro Gonnet, Thomas Deselaers, Henry A. Rowley, Alexander Daryin, Marcos Calvo, Li-Lun Wang, Daniel Keysers, Sandro Feuz, and Philippe Gervais. 2020. Fast multi-language LSTM-based online handwriting recognition. International Journal on Document Analysis and Recognition (IJDAR) 23, 2 (01 Jun 2020), 89–102. https://doi.org/10.1007/s10032-020-00350-4
[10]
K. Charmaz. 2014. Constructing Grounded Theory. SAGE Publications, Thousand Oaks, CA, USA. https://books.google.com/books?id=v_GGAwAAQBAJ
[11]
Nicholas Chen, Francois Guimbretiere, and Abigail Sellen. 2012. Designing a Multi-Slate Reading Environment to Support Active Reading Activities. ACM Trans. Comput.-Hum. Interact. 19, 3, Article 18 (oct 2012), 35 pages. https://doi.org/10.1145/2362364.2362366
[12]
Patrick Chiu, Ashutosh Kapuskar, Sarah Reitmeier, and Lynn Wilcox. 1999. NoteLook: Taking Notes in Meetings with Digital Video and Ink. In Proceedings of the Seventh ACM International Conference on Multimedia (Part 1) (Orlando, Florida, USA) (MULTIMEDIA ’99). Association for Computing Machinery, New York, NY, USA, 149–158. https://doi.org/10.1145/319463.319483
[13]
Felicia Cordeiro, Daniel A. Epstein, Edison Thomaz, Elizabeth Bales, Arvind K. Jagannathan, Gregory D. Abowd, and James Fogarty. 2015. Barriers and Negative Nudges: Exploring Challenges in Food Journaling. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (Seoul, Republic of Korea) (CHI ’15). Association for Computing Machinery, New York, NY, USA, 1159–1162. https://doi.org/10.1145/2702123.2702155
[14]
Richard C. Davis, James A. Landay, Victor Chen, Jonathan Huang, Rebecca B. Lee, Frances C. Li, James Lin, Charles B. Morrey, Ben Schleimer, Morgan N. Price, and Bill N. Schilit. 1999. NotePals: Lightweight Note Sharing by the Group, for the Group. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Pittsburgh, Pennsylvania, USA) (CHI ’99). Association for Computing Machinery, New York, NY, USA, 338–345. https://doi.org/10.1145/302979.303107
[15]
Illya Degtyarenko, Ivan Deriuga, Andrii Grygoriev, Serhii Polotskyi, Volodymyr Melnyk, Dmytro Zakharchuk, and Olga Radyvonenko. 2021. Hierarchical Recurrent Neural Network for Handwritten Strokes Classification. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Toronto, ON, Canada, 2865–2869. https://doi.org/10.1109/ICASSP39728.2021.9413412
[16]
Nicola Dell, Vidya Vaidyanathan, Indrani Medhi, Edward Cutrell, and William Thies. 2012. "Yours is Better!": Participant Response Bias in HCI. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas, USA) (CHI ’12). Association for Computing Machinery, New York, NY, USA, 1321–1330. https://doi.org/10.1145/2207676.2208589
[17]
Jelle van Dijk, Jirka van der Roest, Remko van der Lugt, and Kees C.J. Overbeeke. 2011. NOOT: A Tool for Sharing Moments of Reflection during Creative Meetings. In Proceedings of the 8th ACM Conference on Creativity and Cognition (Atlanta, Georgia, USA) (C&C ’11). Association for Computing Machinery, New York, NY, USA, 157–164. https://doi.org/10.1145/2069618.2069646
[18]
Yousef Elarian, Radwan Abdel-Aal, Irfan Ahmad, Mohammad Tanvir Parvez, and Abdelmalek Zidouri. 2014. Handwriting synthesis: classifications and techniques. International Journal on Document Analysis and Recognition (IJDAR) 17, 4 (01 Dec 2014), 455–469. https://doi.org/10.1007/s10032-014-0231-x
[19]
Chris Elsden, Abigail C. Durrant, and David S. Kirk. 2016. It’s Just My History Isn’t It? Understanding Smart Journaling Practices. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, New York, NY, USA, 2819–2831. https://doi.org/10.1145/2858036.2858103
[20]
Marina Fernández Camporro and Nicolai Marquardt. 2020. Live Sketchnoting Across Platforms: Exploring the Potential and Limitations of Analogue and Digital Tools. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3313831.3376192
[21]
Morten Fjeld and Wolmet Barendregt. 2009. Epistemic action: A measure for cognitive support in tangible user interfaces? Behavior Research Methods 41, 3 (01 Aug 2009), 876–881. https://doi.org/10.3758/BRM.41.3.876
[22]
David Galbraith. 2009. Cognitive models of writing. German as a Foreign Language 2-3 (2009), 7–22. https://eprints.soton.ac.uk/337496/
[23]
Katy Ilonka Gero, Lydia Chilton, Chris Melancon, and Mike Cleron. 2022. Eliciting Gestures for Novel Note-Taking Interactions. In Proceedings of the 2022 ACM Designing Interactive Systems Conference (Virtual Event, Australia) (DIS ’22). Association for Computing Machinery, New York, NY, USA, 966–975. https://doi.org/10.1145/3532106.3533480
[24]
Alex Graves. 2014. Generating Sequences With Recurrent Neural Networks. arxiv:1308.0850 [cs.NE]
[25]
Thomas T. Hewett. 2005. Informing the design of computer-based environments to support creativity. International Journal of Human-Computer Studies 63, 4 (2005), 383–409. https://doi.org/10.1016/j.ijhcs.2005.04.004 Computer support for creativity.
[26]
Ken Hinckley, Patrick Baudisch, Gonzalo Ramos, and Francois Guimbretiere. 2005. Design and Analysis of Delimiters for Selection-Action Pen Gesture Phrases in Scriboli. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Portland, Oregon, USA) (CHI ’05). Association for Computing Machinery, New York, NY, USA, 451–460. https://doi.org/10.1145/1054972.1055035
[27]
Ken Hinckley, Xiaojun Bi, Michel Pahud, and Bill Buxton. 2012. Informal Information Gathering Techniques for Active Reading. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas, USA) (CHI ’12). Association for Computing Machinery, New York, NY, USA, 1893–1896. https://doi.org/10.1145/2207676.2208327
[28]
Ken Hinckley, Shengdong Zhao, Raman Sarin, Patrick Baudisch, Edward Cutrell, Michael Shilman, and Desney Tan. 2007. InkSeine: In Situ Search for Active Note Taking. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (San Jose, California, USA) (CHI ’07). Association for Computing Machinery, New York, NY, USA, 251–260. https://doi.org/10.1145/1240624.1240666
[29]
Matthew Hong, Anne Marie Piper, Nadir Weibel, Simon Olberding, and James Hollan. 2012. Microanalysis of Active Reading Behavior to Inform Design of Interactive Desktop Workspaces. In Proceedings of the 2012 ACM International Conference on Interactive Tabletops and Surfaces (Cambridge, Massachusetts, USA) (ITS ’12). Association for Computing Machinery, New York, NY, USA, 215–224. https://doi.org/10.1145/2396636.2396670
[30]
Heloise Hwawen Hse and A. Richard Newton. 2005. Recognition and beautification of multi-stroke symbols in digital ink. Computers & Graphics 29, 4 (2005), 533–546. https://doi.org/10.1016/j.cag.2005.05.006
[31]
Emanuel Indermühle, Marcus Liwicki, and Horst Bunke. 2010. IAMonDo-Database: An Online Handwritten Document Database with Non-Uniform Contents. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems (Boston, Massachusetts, USA) (DAS ’10). Association for Computing Machinery, New York, NY, USA, 97–104. https://doi.org/10.1145/1815330.1815343
[32]
Sam Jacoby and Leah Buechley. 2013. Drawing the Electric: Storytelling with Conductive Ink. In Proceedings of the 12th International Conference on Interaction Design and Children (New York, New York, USA) (IDC ’13). Association for Computing Machinery, New York, NY, USA, 265–268. https://doi.org/10.1145/2485760.2485790
[33]
Michael Jungo, Beat Wolf, Andrii Maksai, Claudiu Musat, and Andreas Fischer. 2023. Character Queries: A Transformer-Based Approach to On-line Handwritten Character Segmentation. In Document Analysis and Recognition - ICDAR 2023, Gernot A. Fink, Rajiv Jain, Koichi Kise, and Richard Zanibbi (Eds.). Springer Nature Switzerland, Cham, 98–114.
[34]
Viacheslav Khomenko, Andriy Volkoviy, Illya Degtyarenko, and Olga Radyvonenko. 2017. Handwriting Text/Non-Text Classification on Mobile Device. In The Fourth International Conference on Artificial Intelligence and Pattern Recognition (AIPR2017). The Society of Digital Information and Wireless Communications, New Castle, DE, USA.
[35]
Yea-Seul Kim, Nathalie Henry Riche, Bongshin Lee, Matthew Brehmer, Michel Pahud, Ken Hinckley, and Jessica Hullman. 2019. Inking Your Insights: Investigating Digital Externalization Behaviors During Data Analysis. In Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces (Daejeon, Republic of Korea) (ISS ’19). Association for Computing Machinery, New York, NY, USA, 255–267. https://doi.org/10.1145/3343055.3359714
[36]
David Kirsh. 2000. A Few Thoughts on Cognitive Overload. Intellectica 1, 30 (2000), 19–51.
[37]
David Kirsh. 2010. Thinking with external representations. AI & SOCIETY 25, 4 (01 Nov 2010), 441–454. https://doi.org/10.1007/s00146-010-0272-8
[38]
David Kirsh and Paul Maglio. 1994. On Distinguishing Epistemic from Pragmatic Action. Cognitive Science 18, 4 (1994), 513–549. https://doi.org/10.1016/0364-0213(94)90007-8
[39]
Axel Kramer. 1994. Translucent Patches—dissolving Windows. In Proceedings of the 7th Annual ACM Symposium on User Interface Software and Technology (Marina del Rey, California, USA) (UIST ’94). Association for Computing Machinery, New York, NY, USA, 121–130. https://doi.org/10.1145/192426.192474
[40]
Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, and Vittorio Ferrari. 2020. The Open Images Dataset V4. International Journal of Computer Vision 128, 7 (01 Jul 2020), 1956–1981. https://doi.org/10.1007/s11263-020-01316-z
[41]
James A. Landay. 1999. Using note-taking appliances for student to student collaboration. In FIE’99 Frontiers in Education: 29th Annual Frontiers in Education Conference (IEEE Cat. No.99CH37011), Vol. 2. IEEE, San Juan, PR, USA, 12C4/15–12C4/20. https://doi.org/10.1109/FIE.1999.841640
[42]
James A. Landay and Brad A. Myers. 1995. Interactive Sketching for the Early Stages of User Interface Design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’95). ACM Press/Addison-Wesley Publishing Co., USA, 43–50. https://doi.org/10.1145/223904.223910
[43]
James A. Landay and Brad A. Myers. 2001. Sketching interfaces: toward more human interface design. Computer 34, 3 (2001), 56–64. https://doi.org/10.1109/2.910894
[44]
Jakob Leitner and Michael Haller. 2011. Harpoon Selection: Efficient Selections for Ungrouped Content on Large Pen-Based Surfaces. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (Santa Barbara, California, USA) (UIST ’11). Association for Computing Machinery, New York, NY, USA, 593–602. https://doi.org/10.1145/2047196.2047275
[45]
Clayton Lewis and Donald A Norman. 1995. Designing for Error. In Readings in Human–Computer Interaction. Morgan Kaufmann, 686–697. https://doi.org/10.1016/B978-0-08-051574-8.50071-6
[46]
I. Scott MacKenzie. 1992. Fitts’ Law as a Research and Design Tool in Human-Computer Interaction. Human–Computer Interaction 7, 1 (1992), 91–139. https://doi.org/10.1207/s15327051hci0701_3
[47]
Andrii Maksai, Henry Rowley, Jesse Berent, and Claudiu Musat. 2022. Inkorrect: Digital Ink Spelling Correction. In ICLR Workshop on Deep Generative Models for Highly Structured Data. https://openreview.net/forum?id=BSllnh4uDZq
[48]
Catherine C. Marshall. 1997. Annotation: From Paper Books to the Digital Library. In Proceedings of the Second ACM International Conference on Digital Libraries (Philadelphia, Pennsylvania, USA) (DL ’97). Association for Computing Machinery, New York, NY, USA, 131–140. https://doi.org/10.1145/263690.263806
[49]
Thomas P. Moran, Patrick Chiu, and William van Melle. 1997. Pen-Based Interaction Techniques for Organizing Material on an Electronic Whiteboard. In Proceedings of the 10th Annual ACM Symposium on User Interface Software and Technology (Banff, Alberta, Canada) (UIST ’97). Association for Computing Machinery, New York, NY, USA, 45–54. https://doi.org/10.1145/263407.263508
[50]
Thomas P. Moran, Patrick Chiu, William van Melle, and Gordon Kurtenbach. 1995. Implicit Structure for Pen-Based Systems within a Freeform Interaction Paradigm. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’95). ACM Press/Addison-Wesley Publishing Co., USA, 487–494. https://doi.org/10.1145/223904.223970
[51]
Meredith Ringel Morris, A.J. Bernheim Brush, and Brian R. Meyers. 2007. Reading Revisited: Evaluating the Usability of Digital Display Surfaces for Active Reading Tasks. In Second Annual IEEE International Workshop on Horizontal Interactive Human-Computer Systems (TABLETOP’07). IEEE, Newport, RI, USA, 79–86. https://doi.org/10.1109/TABLETOP.2007.12
[52]
Pam A. Mueller and Daniel M. Oppenheimer. 2014. The Pen Is Mightier Than the Keyboard: Advantages of Longhand Over Laptop Note Taking. Psychological Science 25, 6 (2014), 1159–1168. https://doi.org/10.1177/0956797614524581 PMID: 24760141.
[53]
Elizabeth D. Mynatt, Takeo Igarashi, W. Keith Edwards, and Anthony LaMarca. 1999. Flatland: New Dimensions in Office Whiteboards. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Pittsburgh, Pennsylvania, USA) (CHI ’99). Association for Computing Machinery, New York, NY, USA, 346–353. https://doi.org/10.1145/302979.303108
[54]
Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. 2023. Gorilla: Large Language Model Connected with Massive APIs. arxiv:2305.15334 [cs.CL]
[55]
Florian Perteneder, Martin Bresler, Eva-Maria Grossauer, Joanne Leong, and Michael Haller. 2015. CLuster: Smart Clustering of Free-Hand Sketches on Large Interactive Surfaces. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (Charlotte, NC, USA) (UIST ’15). Association for Computing Machinery, New York, NY, USA, 37–46. https://doi.org/10.1145/2807442.2807455
[56]
Annie Piolat, Thierry Olive, and Ronald T Kellogg. 2005. Cognitive effort during note taking. Applied Cognitive Psychology 19 (2005), 291–312. Issue 3. https://doi.org/10.1002/acp.1086
[57]
Morgan N. Price, Bill N. Schilit, and Gene Golovchinsky. 1998. XLibris: The Active Reading Machine. In CHI 98 Conference Summary on Human Factors in Computing Systems (Los Angeles, California, USA) (CHI ’98). Association for Computing Machinery, New York, NY, USA, 22–23. https://doi.org/10.1145/286498.286510
[58]
Yi Ren, Yang Li, and Edward Lank. 2014. InkAnchor: Enhancing Informal Ink-Based Note Taking on Touchscreen Mobile Phones. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Toronto, Ontario, Canada) (CHI ’14). Association for Computing Machinery, New York, NY, USA, 1123–1132. https://doi.org/10.1145/2556288.2557302
[59]
Yann Riche, Nathalie Henry Riche, Ken Hinckley, Sheri Panabaker, Sarah Fuelling, and Sarah Williams. 2017. As We May Ink? Learning from Everyday Analog Pen Use to Improve Digital Ink Experiences. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 3241–3253. https://doi.org/10.1145/3025453.3025716
[60]
Hugo Romat, Nathalie Henry Riche, Ken Hinckley, Bongshin Lee, Caroline Appert, Emmanuel Pietriga, and Christopher Collins. 2019. ActiveInk: (Th)Inking with Data. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland Uk) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3290605.3300272
[61]
Hugo Romat, Nicolai Marquardt, Ken Hinckley, and Nathalie Henry Riche. 2022. Style Blink: Exploring Digital Inking of Structured Information via Handcrafted Styling as a First-Class Object. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 336, 14 pages. https://doi.org/10.1145/3491102.3501988
[62]
Daniel M. Russell, Mark J. Stefik, Peter Pirolli, and Stuart K. Card. 1993. The Cost Structure of Sensemaking. In Proceedings of the INTERACT ’93 and CHI ’93 Conference on Human Factors in Computing Systems (Amsterdam, The Netherlands) (CHI ’93). Association for Computing Machinery, New York, NY, USA, 269–276. https://doi.org/10.1145/169059.169209
[63]
Eric Saund, David Fleet, Daniel Larner, and James Mahoney. 2003. Perceptually-Supported Image Editing of Text and Graphics. In Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology (Vancouver, Canada) (UIST ’03). Association for Computing Machinery, New York, NY, USA, 183–192. https://doi.org/10.1145/964696.964717
[64]
Bill N. Schilit, Gene Golovchinsky, and Morgan N. Price. 1998. Beyond Paper: Supporting Active Reading with Free Form Digital Ink Annotations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Los Angeles, California, USA) (CHI ’98). ACM Press/Addison-Wesley Publishing Co., USA, 249–256. https://doi.org/10.1145/274644.274680
[65]
Abigail J Sellen and Richard HR Harper. 2003. The myth of the paperless office. MIT press, Cambridge, MA, USA.
[66]
Patrice Y Simard, David Steinkraus, and Maneesh Agrawala. 2005. Ink normalization and beautification. In Eighth International Conference on Document Analysis and Recognition (ICDAR’05). IEEE, Seoul, Korea, 1182–1187 Vol. 2. https://doi.org/10.1109/ICDAR.2005.143
[67]
Hariharan Subramonyam, Colleen Seifert, Priti Shah, and Eytan Adar. 2020. TexSketch: Active Diagramming through Pen-and-Ink Annotations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376155
[68]
Craig J. Sutherland, Andrew Luxton-Reilly, and Beryl Plimmer. 2016. Freeform digital ink annotations in electronic documents: A systematic mapping study. Computers & Graphics 55 (2016), 1–20. https://doi.org/10.1016/j.cag.2015.10.014
[69]
Craig S. Tashman and W. Keith Edwards. 2011. Active Reading and Its Discontents: The Situations, Problems and Ideas of Readers. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI ’11). Association for Computing Machinery, New York, NY, USA, 2927–2936. https://doi.org/10.1145/1978942.1979376
[70]
David R. Thomas. 2006. A General Inductive Approach for Analyzing Qualitative Evaluation Data. American Journal of Evaluation 27, 2 (2006), 237–246. https://doi.org/10.1177/1098214005283748 arXiv:https://doi.org/10.1177/1098214005283748
[71]
Barbara Tversky. 2015. The Cognitive Design of Tools of Thought. Review of Philosophy and Psychology 6, 1 (01 Mar 2015), 99–116. https://doi.org/10.1007/s13164-014-0214-3
[72]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2023. Attention Is All You Need. arxiv:1706.03762 [cs.CL]
[73]
Somin Wadhwa, Silvio Amir, and Byron C. Wallace. 2023. Revisiting Relation Extraction in the era of Large Language Models. arxiv:2305.05003 [cs.CL]
[74]
Jagoda Walny, Sheelagh Carpendale, Nathalie Henry Riche, Gina Venolia, and Philip Fawcett. 2011. Visual Thinking In Action: Visualizations As Used On Whiteboards. IEEE Transactions on Visualization and Computer Graphics 17, 12 (Dec 2011), 2508–2517. https://doi.org/10.1109/TVCG.2011.251
[75]
Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. 2022. Emergent Abilities of Large Language Models. arxiv:2206.07682 [cs.CL]
[76]
Jacob O. Wobbrock, Andrew D. Wilson, and Yang Li. 2007. Gestures without Libraries, Toolkits or Training: A $1 Recognizer for User Interface Prototypes. In Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology (Newport, Rhode Island, USA) (UIST ’07). Association for Computing Machinery, New York, NY, USA, 159–168. https://doi.org/10.1145/1294211.1294238
[77]
Haijun Xia, Ken Hinckley, Michel Pahud, Xiao Tu, and Bill Buxton. 2017. WritLarge: Ink Unleashed by Unified Scope, Action, & Zoom. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (Denver, Colorado, USA) (CHI ’17). Association for Computing Machinery, New York, NY, USA, 3227–3240. https://doi.org/10.1145/3025453.3025664
[78]
Jun-Yu Ye, Yan-Ming Zhang, Qing Yang, and Cheng-Lin Liu. 2020. Contextual Stroke Classification in Online Handwritten Documents with Edge Graph Attention Networks. SN Computer Science 1, 3 (12 May 2020), 163. https://doi.org/10.1007/s42979-020-00177-0
[79]
Jun-Yu Ye, Yan-Ming Zhang, Qing Yang, and Cheng-Lin Liu. 2021. Joint stroke classification and text line grouping in online handwritten documents with edge pooling attention networks. Pattern Recognition 114 (2021), 107859. https://doi.org/10.1016/j.patcog.2021.107859
[80]
Dongwook Yoon, Nicholas Chen, and François Guimbretière. 2013. TextTearing: Opening White Space for Digital Ink Annotation. In Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (St. Andrews, Scotland, United Kingdom) (UIST ’13). Association for Computing Machinery, New York, NY, USA, 107–112. https://doi.org/10.1145/2501988.2502036
[81]
Dongwook Yoon, Nicholas Chen, François Guimbretière, and Abigail Sellen. 2014. RichReview: Blending Ink, Speech, and Gesture to Support Collaborative Document Review. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (Honolulu, Hawaii, USA) (UIST ’14). Association for Computing Machinery, New York, NY, USA, 481–490. https://doi.org/10.1145/2642918.2647390
[82]
C. Lawrence Zitnick. 2013. Handwriting Beautification Using Token Means. ACM Trans. Graph. 32, 4, Article 53 (jul 2013), 8 pages. https://doi.org/10.1145/2461912.2461985
