A shared 2D virtual frame of reference can be used to establish proxemic cues without custom hardware setup. Some systems enable the blending of multiple video feeds to create a shared hybrid space [31, 52]. MirrorBlender [31] supports continuous repositioning, resizing, and blending of video feeds in a shared 2D interface using the principle of WYSIWIS (What You See Is What I See [68]). Unlike MirrorBlender, which blends physical spaces in hybrid meetings, we focus on supporting turn-taking in fully virtual meetings and on more dynamic manipulation of person and task space. Most relevant to our work are tools that support free manipulation of video feeds, avatars, or UI elements to grab others' attention in a shared 2D frame of reference [1, 5, 6, 7, 62]. Several recent videoconferencing (VC) tools enable similar experiences, such as proximity-based social interaction among avatars that triggers bubbles of conventional videoconferencing (e.g., Gather Town [1], Wonder.Me [7]), and/or repositionable video feeds whose relative proximity conveys social awareness of parallel, ad-hoc subgroup conversations (e.g., SpatialChat [5], Sprout [6]). Most of these interfaces provide some notion of virtual fixed/semifixed features, e.g., abstract rectangles or depictions of chairs or roundtables, that provide a bounded space for scoping the video and audio channels to subgroups of currently online users (e.g., Remo [4]). The role of fixed/semifixed features in these interfaces is to support approach and leave-taking in groups [24] or larger online crowds [44]. Instead, our goal is to provide such spatial features for mediating turn-taking within groups that have already been formed.
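To make the proximity-based bubble mechanism described above concrete, the following is a minimal sketch of how a shared 2D frame of reference could scope audio to subgroups: avatars within a hearing radius form a conversation bubble, and audio gain falls off with distance. It is an illustrative assumption of how such tools might work, not the implementation of Gather Town, Wonder.Me, SpatialChat, or Sprout; all names (Avatar, proximityGain, bubblesFor) and thresholds are hypothetical.

```typescript
// Hypothetical sketch of proximity-scoped audio in a shared 2D space.
interface Avatar {
  id: string;
  x: number; // position in the shared 2D frame of reference
  y: number;
}

const HEAR_RADIUS = 200; // assumed: beyond this distance, audio is muted
const FULL_RADIUS = 60;  // assumed: within this distance, audio is at full volume

function distance(a: Avatar, b: Avatar): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Map inter-avatar distance to an audio gain in [0, 1]:
// full volume up close, linear fall-off, silence outside the hearing radius.
function proximityGain(a: Avatar, b: Avatar): number {
  const d = distance(a, b);
  if (d <= FULL_RADIUS) return 1;
  if (d >= HEAR_RADIUS) return 0;
  return 1 - (d - FULL_RADIUS) / (HEAR_RADIUS - FULL_RADIUS);
}

// Group avatars into conversation "bubbles": connected components of the
// graph whose edges link avatars within hearing range of each other.
function bubblesFor(avatars: Avatar[]): Avatar[][] {
  const unvisited = new Set(avatars);
  const bubbles: Avatar[][] = [];
  for (const seed of avatars) {
    if (!unvisited.has(seed)) continue;
    const bubble: Avatar[] = [];
    const queue: Avatar[] = [seed];
    unvisited.delete(seed);
    while (queue.length > 0) {
      const current = queue.pop()!;
      bubble.push(current);
      for (const other of unvisited) {
        if (distance(current, other) < HEAR_RADIUS) {
          unvisited.delete(other);
          queue.push(other);
        }
      }
    }
    bubbles.push(bubble);
  }
  return bubbles;
}

// Example: two nearby avatars and one far away yield two bubbles;
// audio gain is nonzero only between members of the same bubble.
const avatars: Avatar[] = [
  { id: "ada", x: 0, y: 0 },
  { id: "ben", x: 50, y: 30 },
  { id: "cam", x: 900, y: 400 },
];
console.log(bubblesFor(avatars).map(b => b.map(a => a.id)));
console.log(proximityGain(avatars[0], avatars[1]).toFixed(2));
```

In this sketch, bubbles emerge purely from avatar proximity; systems built around fixed/semifixed features (e.g., Remo's tables) would instead scope the audio and video channels to whichever bounded region an avatar currently occupies.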