KR20110002005A

KR20110002005A - Scalable techniques for providing real-time per-avatar streaming data in virtual reality systems that employ per-avatar rendered environments

Info

Publication number: KR20110002005A
Application number: KR1020107018261A
Authority: KR
Inventors: James E. Toga; Ken Cox; Sid Gupta; Rafal Boni
Original assignee: Vivox Inc.
Priority date: 2008-01-17
Filing date: 2009-01-17
Publication date: 2011-01-06
Also published as: EP2244797A2; JP2011510409A; EP2244797A4; JP2015053061A; CN102186544B; TW200941271A; WO2009092060A2; WO2009092060A3; JP2013254501A; CN102186544A; CA2712483A1

Abstract

Scalable techniques are disclosed for rendering emissions represented using segments of streaming data, where the emissions are potentially perceptible from many points of perception and the emissions and points of perception have relationships that vary in real time. The techniques filter the segments by determining, for a time slice, whether a given emission is perceptible at a given point of perception. If it is not, the segments of streaming data that represent the emission are not used to render the emission as perceived from the given point of perception. The techniques are used in networked virtual environments to render audio emissions at clients in a networked virtual reality system. With audio emissions, one determinant of whether a given emission is perceptible at a given point of perception is whether psychoacoustic properties of other emissions mask the given emission.

Description

SCALABLE TECHNIQUES FOR PROVIDING REAL-TIME PER-AVATAR STREAMING DATA IN VIRTUAL REALITY SYSTEMS THAT EMPLOY PER-AVATAR RENDERED ENVIRONMENTS

<Cross-Reference to Related Applications>

The subject matter of this patent application is related to, and claims priority from, the following U.S. provisional patent application, which is incorporated herein by reference in its entirety:

U.S. Provisional Patent Application 61/021729, Rafal Boni et al., "Relevance routing system", filed January 17, 2008.

<Statement Regarding Federally Sponsored Research or Development>

Not applicable.

<Reference to a Sequence Listing>

Not applicable.

<Technical Field>

The techniques disclosed herein relate to virtual reality systems, and more particularly to the rendering of streaming data in multi-avatar virtual environments.

Virtual environments

The term virtual environment (abbreviated VE) refers in the present context to an environment created by a computer system that behaves in ways that follow a user's expectations of a real-world environment. The computer system that produces the virtual environment is termed in the following a virtual reality system, and the creation of the virtual environment by the virtual reality system is termed rendering the virtual environment. A virtual environment may include an avatar, which in the present context is an entity belonging to the virtual environment that has a point of perception in the virtual environment. The virtual reality system may render the virtual environment for the avatar as perceived from the avatar's point of perception. A user of a virtual environment system may be associated with a particular avatar in the virtual environment. An overview of the history and development of virtual environments can be found in "Generation 3D: Living in Virtual Worlds", IEEE Computer, October 2007.

In many virtual environments, a user who is associated with an avatar can interact with the virtual environment via the avatar. The user can not only perceive the virtual environment from the avatar's point of perception, but can also change the avatar's point of perception in the virtual environment, otherwise change the relationship between the avatar and the virtual environment, or change the virtual environment itself. Such virtual environments are termed in the following interactive virtual environments. With the advent of high-performance personal computers and high-speed networking, virtual environments, and in particular multi-avatar interactive virtual environments in which avatars for many users interact with the virtual environment at the same time, have moved from engineering laboratories and specialized application areas into widespread use. Examples of such multi-avatar virtual environments include environments with substantial graphical and visual content, such as massively multiplayer online games (MMOGs) like World of Warcraft® and user-defined virtual environments like Second Life®. In such systems, each user of the virtual environment is represented by an avatar in the virtual environment, and each avatar has a point of perception in the virtual environment based on the avatar's virtual location and other aspects of the avatar in the virtual environment. Users of the virtual environment control their avatars and interact within the virtual environment through client computers such as PCs or workstations. The virtual environment is further implemented using server computers. The rendering for a user's avatar is produced on the user's client computer according to data sent from the server computers. Data is transmitted as data packets over a network between the client computers and the server computers of the virtual reality system.

Most of these systems present a visual image of the virtual environment to the user's avatar. Some virtual environments present additional information, such as sound heard by the user's avatar in the virtual environment, or output from a virtual sense of touch of the avatar. Virtual environments and systems have also been devised that consist primarily or solely of audible output to users, such as that produced by the LISTEN system developed at the Fraunhofer Institute, as described in "Neuentwicklungen auf dem Gebiet der Audio Virtual Reality", Fraunhofer-Institut fuer Medienkommunikation, Germany, July 2003.

When the virtual environment is interactive, the appearance and actions of a user's avatar are what other avatars in the virtual environment perceive (see, hear, and so on) as representing the user's appearance and actions. Of course, an avatar need not be seen or perceived as resembling any particular entity, and a user's avatar may deliberately look quite different from the user's actual appearance. For many users, this is one of the things that makes interaction in a virtual environment more attractive than interaction in the "real world".

Because each avatar in a virtual environment has an individual point of perception, the virtual reality system must render the virtual environment for each avatar differently from the way it renders it for the other avatars in a multi-avatar virtual environment. What a first avatar perceives (for example, sees) is perceived from one point of perception, and what a second avatar perceives is different. For example, the avatar "Ivan" may "see" the avatars "Sue" and "David" and a virtual table from a particular location and virtual direction, but not see the avatar "Lisa", because that avatar is "behind" Ivan in the virtual environment and therefore "out of sight". Another avatar, "Sue", may see the avatars Ivan, Lisa, and David and two chairs from a completely different angle. Yet another avatar, "Maurice", may at that moment be at a completely different virtual location in the virtual environment and see none of the avatars Ivan, Sue, Lisa, or David (nor do they see Maurice); instead, Maurice sees other avatars that are near the same virtual location as Maurice. In the present discussion, a rendering that differs from avatar to avatar is termed a per-avatar rendering.

FIG. 2 shows an example of a per-avatar rendering for a particular avatar in an example virtual environment. FIG. 2 is a still image from the rendering; in practice, the virtual environment would render the scene dynamically and in color. The point of perception in this rendering is that of the avatar for which the virtual reality system is making the rendering shown in FIG. 2. In this example, a group of avatars for eight users has "gone" to a particular locale in the virtual environment; the locale includes two tiered platforms at 221 and 223. In this example, the users, who may be at widely separated real-world locations, have arranged to "meet" in the virtual environment (via their avatars) for a meeting to discuss something, and their avatars thus represent their presence in the virtual environment.

Seven of the eight avatars are visible (in this example, all of the avatars shown are human in form). The avatar for which the virtual reality system is making the rendering is not visible, because the rendering is made from that avatar's point of perception. For convenience, the avatar for which the rendering is being made is referred to as 299 in FIG. 2. The figure includes an unattached label 299 with a brace spanning the entire image, to indicate that the rendering is made from the point of perception of the avatar designated "299".

Four avatars, including the avatars labeled 201, 209, and 213, are shown standing on platform 221. The remaining three avatars, including the avatar labeled 205, are shown standing between the two platforms.

As can be seen in FIG. 2, avatar 209 is standing behind avatar 213. If this scene were rendered for the point of perception of avatar 213, neither avatar 209 nor avatar 299 would be visible, because both are outside avatar 213's field of view.

The example of FIG. 2 is for a virtual reality system in which users can interact through their avatars, but the avatars cannot speak. Instead, in this virtual reality system users can make their avatars "speak" by typing text on a keyboard. The virtual environment renders the text in a "speech balloon" above the user's avatar; optionally, a bubble with the name of the user's avatar is rendered in the same fashion. An example for avatar 201 is shown at 203.

In this particular example virtual reality system, users can use the arrow keys on the keyboard to make their avatars move or walk from one virtual location to another, or turn to face a different direction. There are also keyboard inputs that make an avatar gesture by moving its hands and arms. Two examples of such gestures are visible: avatar 205 is gesturing, as can be seen from its raised hand and arm (circled at 207), and avatar 209 is gesturing, as shown by the position of its hand and arm (circled at 211).

Users can thus move, gesture, and converse with each other through their avatars. Through their avatars, users can travel to other virtual locations and locales, meet other users, hold gatherings, make friends, and take part in many aspects of "virtual life" in the virtual environment.

Problems in implementing massively multi-avatar rendered environments

There are several problems in implementing massively multi-avatar rendered environments. Among them are the following:

● The sheer number of different individual renderings that the virtual environment must create for the many avatars.

● The need to provide a networked implementation with many connections, each subject to delays and to limits on available data bandwidth.

As the use of speech balloons to deal with speech in the virtual reality system of FIG. 2 shows, live sound presents difficulties for current virtual reality systems. One reason is that live sound is an example of what is termed in the following an emission, that is, an output that is produced by an entity in the virtual environment and is perceptible to avatars in the virtual environment. One example of such an emission is speech that is produced by one avatar in the virtual environment and is audible to other avatars in the virtual environment. A characteristic of emissions is that they are represented in the virtual reality system by streaming data. Streaming data in the present context is any data that has a high data rate and changes unpredictably in real time. Because streaming data is constantly changing, it must be sent as a continuous stream. In the context of a virtual environment, there may be many sources emitting streaming data at once. Further, the virtual location of an emission and the points of perception of the avatars that may perceive it can change in real time.

Examples of types of emissions in a virtual environment include audible emissions, which can be heard; visible emissions, which can be seen; haptic emissions, which can be felt by touch; olfactory emissions, which can be smelled; gustatory emissions, which can be tasted; and emissions peculiar to virtual environments, such as virtual telepathy or force-field emissions. A property of most emissions is intensity. The kind of intensity naturally depends on the type of emission. For example, in an emission of speech, intensity is expressed as loudness. Examples of streaming data are data representing sound (audio data), data representing moving images (video data), and data representing continuing force or touch. New kinds of streaming data are continually being developed. An emission in a virtual environment may come from a real-world source, such as speech from the user associated with an avatar, or from a generated or recorded source.

The source of an emission in a virtual environment may be any entity of the virtual environment. Taking sound as an example, audible emissions in a virtual environment include sounds made by entities in the virtual environment (for example, an avatar emitting what the avatar's user speaks into a microphone, a generated gurgling sound emitted by a virtual waterfall, an explosion emitted by a virtual bomb, or the precarious clicking emitted by virtual high heels on a virtual floor) and background sounds (for example, the background sound of a virtual breeze or wind emitted by a region of the virtual environment, or the background sound emitted by a herd of virtual ruminants).

The sounds in a sequence of sounds, the relative locations of the emitting sources and the avatars, the quality of the sounds emitted by the sources, the audibility and apparent loudness of the sounds to the avatars, and the orientation of each potentially perceiving avatar can all change in essentially real time. The same is true of other types of emissions and other types of streaming data.

The problem of rendering emissions as they are perceived individually by each avatar in a virtual environment is complex. The problem becomes much worse when sources and destination avatars move in the virtual environment while the sources are emitting, for example when a user moves the emitting avatar while speaking through it, or when other users move their avatars while perceiving the emission. The latter aspect, movement of a perceiving avatar in the virtual environment, affects even emissions from stationary sources in the virtual environment. Not only does the streaming data representing an emission change constantly, but how the emission is to be rendered, and the perceiving avatars for which it is rendered, also change constantly. The renderings and the perceiving avatars change not only when the potentially perceiving avatars move in the virtual environment, but also when the sources of the emissions move in the virtual environment.

At a first level of this complexity, whether a potentially perceiving avatar can actually perceive the sequence of sounds emitted by a source at a given moment depends at least on the volume of the sound emitted by the source at each moment. It further depends on the distance in the virtual environment between the source and the potentially perceiving avatar at each moment. As in the "real world", a sound that is too soft with respect to a point of perception in the virtual environment will not be audible to an avatar at that point of perception. A sound coming from "far away" is heard or perceived as softer than the same sound coming from a closer distance. In the present context, the degree to which a sound is heard as softer with distance is termed a distance-weight factor. The intensity of the sound at the source is termed the intrinsic loudness of the sound. The intensity of the sound at the point of perception is termed the apparent loudness.
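The relationship between intrinsic loudness, the distance-weight factor, and apparent loudness can be illustrated with a short sketch. It is an illustration only: the inverse-distance falloff, the `AUDIBILITY_THRESHOLD` value, and all function names are assumptions made for the example and are not taken from the disclosure.

```python
import math

# Hypothetical level below which a sound is treated as inaudible.
AUDIBILITY_THRESHOLD = 0.05


def distance_weight(distance: float, reference: float = 1.0) -> float:
    """Assumed inverse-distance falloff, clamped to 1.0 inside the reference distance."""
    return min(1.0, reference / max(distance, 1e-9))


def apparent_loudness(intrinsic: float, source_pos, listener_pos) -> float:
    """Apparent loudness: intrinsic loudness scaled by the distance-weight factor."""
    distance = math.dist(source_pos, listener_pos)
    return intrinsic * distance_weight(distance)


def is_audible(intrinsic: float, source_pos, listener_pos) -> bool:
    """True if the sound is loud enough at the listener's point of perception."""
    return apparent_loudness(intrinsic, source_pos, listener_pos) >= AUDIBILITY_THRESHOLD
```

With these assumed values, a source of intrinsic loudness 1.0 is audible at a virtual distance of 10 units but not at 100 units.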

At a second level, whether an emitted sound is audible to a particular avatar may also be determined by other aspects of the particular avatar's location relative to the source, by the sounds that the perceiving avatar is hearing at the same time from other sources, or by the quality of the sound. For example, the principles of psychoacoustics include the fact that in the real world a louder sound can mask a less loud sound, or make it inaudible (based on the apparent loudness to the individual listener). This is referred to as the relative loudness or volume of the sounds: the apparent loudness of one sound may be greater in relation to the apparent loudness of another. Psychoacoustic effects further include the fact that sounds of certain qualities tend to be heard more readily than others. For example, people are particularly good at noticing or hearing the sound of a baby crying, even when the sound is soft and other, louder sounds are present at the same time.
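The masking effect described above can be sketched as a comparison of apparent loudness values at a single point of perception. This is a simplified illustration under stated assumptions: the fixed `MASKING_RATIO` and the function name are hypothetical, and the sketch does not model the sound-quality effects (such as a crying baby remaining noticeable) mentioned in the text.

```python
from typing import Dict, List

# Hypothetical ratio: a sound is treated as masked when a competing sound
# is at least this many times louder at the same point of perception.
MASKING_RATIO = 10.0


def unmasked_sources(apparent: Dict[str, float],
                     ratio: float = MASKING_RATIO) -> List[str]:
    """Return, in sorted order, the sources not masked by the loudest source."""
    if not apparent:
        return []
    loudest = max(apparent.values())
    # A source survives if it is within the masking ratio of the loudest one.
    return sorted(name for name, level in apparent.items()
                  if level * ratio > loudest)
```

For example, with apparent loudness values of 1.0 for a waterfall, 0.5 for speech, and 0.05 for a whisper, only the whisper is treated as masked.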

As a further complication, it may be desirable to render sounds directionally for every avatar that can hear them, so that each sound is perceived by each avatar as coming from the appropriate relative direction for that avatar. Directionality thus depends not only on the virtual location of the avatar that can hear the sound, but also on the locations of all sources of potentially audible sound in the virtual environment, and further on the direction in which the avatar is "facing" in the virtual environment.
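The dependence of directionality on the listener's location, the source's location, and the listener's facing direction can be sketched in two dimensions. The coordinate convention (angles in degrees, measured counter-clockwise from the +x axis) and the function name are assumptions chosen for the example.

```python
import math


def relative_bearing(listener_pos, facing_deg: float, source_pos) -> float:
    """Direction of the source relative to where the listener is facing.

    Returns degrees in the range [-180, 180): 0 is straight ahead,
    positive values are to the listener's left (counter-clockwise).
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    absolute = math.degrees(math.atan2(dy, dx))
    # Wrap the difference into [-180, 180).
    return (absolute - facing_deg + 180.0) % 360.0 - 180.0
```

The same source therefore yields a different bearing for each avatar, and the bearing changes whenever the avatar turns or either party moves.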

Prior-art virtual reality systems that render emissions to and from a small handful of sources and avatars tolerably well simply cannot cope with the tens of thousands of sources and avatars in a massively multi-avatar rendered environment. In other words, such systems are not scalable to handle very large numbers of sources and avatars.

In summary, per-avatar rendering of emissions from multiple sources, such as audible emissions from multiple sources in a virtual environment, presents special problems in that the streaming data representing the emission from each source:

● is emitted and changes almost constantly,

● consequently has a high data rate,

● must be rendered from many separate sources at once,

● must be rendered at once, individually, for each listening avatar,

● is complex or costly to render, and

● is difficult to handle when there are many sources and avatars.

Current techniques for handling streaming data in multi-avatar rendered environments

Current techniques for rendering streaming data in virtual environments have had only limited success in dealing with the problems just described. As a result, implementers of multi-avatar virtual environments are forced to make one or more unsatisfactory compromises:

● No support for emissions that must be represented using streaming data, such as audible or visible emissions:

Because providing audio interaction is too difficult or costly, the virtual environment may support only "text chat" or "instant messages", in broadcast or point-to-point fashion, with no audio interaction between users via their avatars.

● Limits on the size and complexity of the rendered environment:

The virtual environment implementation may permit only a small maximum number of avatars in the virtual environment, may partition the avatars so that only a small maximum number can be present at any time in a given "scene" in the virtual environment, or may permit only a limited number of users at a time to interact using emissions of streaming data.

● No per-avatar rendering of streaming data:

Avatars are restricted to speaking and listening on an open "party line", in which all sounds, or all sounds from a "scene" in the virtual environment, are always present, and every avatar receives the same rendering of all the sounds.

● Unrealistic rendering:

Avatars may be able to interact audibly only when their users join an optional "chat session", such as a virtual intercom, and the users' voices are rendered at their original volume and without direction, regardless of the avatars' virtual locations in the environment.

● Limited implementation of environmental media:

Because of the difficulty of supporting streaming data, environmental media such as the background sound of a waterfall may be supported not as emissions in the virtual environment, but only as sounds generated locally in the client component for each user, for example by playing a digital recording in a repeating loop.

● Undesirable side effects from the control of streaming media:

In many existing systems that provide support for streaming data, a separate control protocol is used in the network to manage the flow of streaming data. One side effect, due in part to the known problem of transmission delays in networks, is that a control event intended to change the flow of streaming data (for example, to "mute" the streaming data from a particular source, or to switch delivery of streaming data from a first avatar to a second avatar) may not take effect until after a noticeable delay. The control and delivery operations are not sufficiently synchronized.

It is an object of the present invention to provide scalable techniques for handling emissions in virtual reality systems that produce per-avatar renderings. Another object of the invention is to filter emissions using psychoacoustic principles. A further object of the invention is to provide techniques for rendering emissions in devices at the edges of networked systems.

In one aspect, the object of the invention is achieved by a filter in a system that renders an emission represented by segments of streaming data. The system renders the emission as it is perceived at a point in time from a perception point from which the emission is potentially perceptible. The filter has the following features:

● The filter is associated with the perception point.

● The filter has access to:

○ current emission information for the emission represented by the segment of streaming data at that point in time; and

○ current perception point information for the filter's perception point at the point in time represented by the segment of streaming data.

From the current perception point information and the current emission information, the filter determines whether the emission represented by the segment's streaming data is perceptible at the filter's perception point. If the determination indicates that the emission is not perceptible at the filter's perception point at that point in time, the system does not use the segment in rendering the emission at the filter's perception point.
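The determination described above can be sketched as a predicate over the two kinds of current information. This is only an illustrative sketch: the names `EmissionInfo`, `PerceptionPointInfo`, `is_perceptible`, and `filter_segments`, and the simple range-based test, are assumptions made for the example and are not taken from the disclosure, which leaves the perceptibility test open.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class EmissionInfo:
    """Current emission information for one segment (hypothetical fields)."""
    position: Vec3      # where the emission originates
    intensity: float    # strength of the emission at its source


@dataclass
class PerceptionPointInfo:
    """Current information about a filter's perception point (hypothetical fields)."""
    position: Vec3
    perception_range: float  # assumed cutoff distance for this example


def is_perceptible(emission: EmissionInfo, point: PerceptionPointInfo) -> bool:
    """Assumed rule: a non-silent emission is perceptible when within range."""
    distance = math.dist(emission.position, point.position)  # Python 3.8+
    return emission.intensity > 0 and distance <= point.perception_range


def filter_segments(
    segments: Iterable[Tuple[EmissionInfo, bytes]],
    point: PerceptionPointInfo,
) -> List[bytes]:
    """Keep only the payloads of segments perceptible at the perception point."""
    return [payload for info, payload in segments if is_perceptible(info, point)]
```

Segments that fail the predicate are simply never handed to the renderer for that perception point, which is the behavior the paragraph above describes.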

In another aspect, the filter is a component of a virtual reality system that provides a virtual environment in which sources in the virtual environment emit emissions that are potentially perceived by avatars in the virtual environment. The filter is associated with an avatar and determines whether the emission represented by a segment is perceptible by that avatar in the virtual environment at the avatar's current perception point. If it is not, the segment representing the emission is not used in rendering the virtual environment for the avatar's perception point.

Other objects and advantages will be apparent to those skilled in the art to which the invention pertains upon perusal of the following drawings and detailed description.

FIG. 1 is a conceptual overview of the filtering technique.
FIG. 2 shows a scene in an exemplary virtual environment. In this scene, users of the virtual environment who are represented by avatars are holding a meeting by having their avatars meet at a particular location in the virtual environment.
FIG. 3 is a conceptual diagram of the contents of a segment of streaming data in the preferred embodiment.
FIG. 4 shows the specification of a portion of the SIREN14-3D V2 RTP payload format.
FIG. 5 shows the operation of stage 1 and stage 2 filtering.
FIG. 6 shows stage 2 filtering in more detail.
FIG. 7 shows an adjacency matrix.
Reference numbers in the drawings have three or more digits: the two right-hand digits are the reference number within the drawing indicated by the remaining digits. Thus, an item with reference number 203 first appears as item 203 in FIG. 2.

The following detailed description discloses an embodiment in which the virtual environment includes sources of audible emissions and the audible emissions are represented by streaming audio data.

The principles of the techniques described here can be used with any type of emission.

Overview of the inventive techniques

In the presently preferred embodiment, a virtual reality system of the type exemplified by Second Life is implemented in a networked computer system. The techniques of the invention are integrated into the virtual reality system. Streaming data representing sound emissions from sources in the virtual environment is communicated in data packets as segments of streaming audio data. Each segment is associated with information about the segment's source that is relevant to determining whether the segment's emission is perceptible to an avatar. The virtual reality system performs per-avatar rendering on a rendering component such as a client computer. The rendering for an avatar is done on the client computer, and only the segments the avatar can hear are sent over the network to the client computer. There, the segments are converted to audible output through headphones or speakers for the avatar's user.

An avatar need not be associated with a user; it can be any entity for which the virtual reality system can produce a rendering. For example, an avatar may be a virtual microphone in the virtual environment. A recording made with the virtual microphone would be a rendering of the virtual environment consisting of the audio emissions in the virtual environment that were audible at the virtual microphone.

FIG. 1 is a conceptual overview of the filtering technique.

As shown at 101, segments of streaming data representing emissions from different sources in the virtual environment are received for filtering. Each segment is associated with information about the source of its emission, such as the location of the source in the virtual environment and how strong the emission is at the source. In the preferred embodiment, the emissions are audible emissions and the intensity is the loudness of the emission at the source.

These segments are aggregated into a combined stream of all segments by the segment routing component shown at 105. Segment routing component 105 has a segment stream combiner component 103 that combines the segments into an aggregated stream, as shown at 107.

As shown at 107, the aggregated stream (made up of the segments of all the audio streams) is sent to a number of filter components. Two examples of the filter components are shown at 111 and 121; the others are indicated by ellipses. There is a filter component for each avatar for which the virtual reality system is producing a rendering. Filter component 111 is the filter component for the rendering for avatar i. Details of filter 111 are shown at 113, 114, 115, and 117; the other filters operate in the same manner.

Filter component 111 filters aggregated stream 107 for the segments of streaming data for a given type of emission that are needed to render the virtual environment appropriately for avatar i. The filtering is based on the current avatar information 113 for avatar i and on the current streaming data source information 114. Current avatar information 113 is any information about the avatar that affects the ability of avatar i to perceive the emission. What the current avatar information is depends on the nature of the virtual environment. For example, in a virtual environment that has a concept of location, the current avatar information may include the location in the virtual environment of the avatar's organ for detecting the emission. In the following, a location in the virtual environment will often be termed a virtual location. Naturally, where there are virtual locations, there are also virtual distances between those locations.

Current streaming data source information is current information about the sources of streaming data that affects the ability of avatar i to perceive an emission from a particular source. One example of current streaming data source information 114 is the virtual location of the source's emission-generating component. Another example is the intensity of the emission at the source.

As shown at 115, filter 111 outputs only those segments whose streaming data is perceptible to avatar i and is therefore needed to render the virtual environment for avatar i at 119. In the preferred embodiment, perceptibility may be based on the virtual distance between the source and the perceiving avatar and/or on the relative loudness of the perceptible segments. The segments that remain after filtering by filter 111 are provided as input to rendering component 117, which renders the virtual environment for the current perception point of avatar i in the virtual environment.
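The data flow of FIG. 1, in which a combiner merges the per-source segment streams and one filter per avatar selects from the aggregated stream, might be sketched as follows. The segment tuple layout, the function names `combine_streams` and `route`, and the use of a time slice as the merge key are assumptions made for illustration; the disclosure does not prescribe them.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical segment shape: (time_slice, source_id, data)
Segment = Tuple[int, str, bytes]


def combine_streams(streams: Iterable[List[Segment]]) -> List[Segment]:
    """Merge per-source streams (each already ordered by time slice)
    into one aggregated stream, in the role of combiner 103."""
    return list(heapq.merge(*streams, key=lambda seg: seg[0]))


def route(
    aggregated: List[Segment],
    filters: Dict[str, Callable[[Segment], bool]],
) -> Dict[str, List[Segment]]:
    """Apply each avatar's filter to the same aggregated stream and
    return the segments that survive for each avatar."""
    return {
        avatar: [seg for seg in aggregated if keep(seg)]
        for avatar, keep in filters.items()
    }
```

Each avatar's filter sees the whole aggregated stream, but only its own surviving segments go on to that avatar's rendering component.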

Details of the preferred embodiment

In the presently preferred embodiment, the emissions of the sources are audible sounds, and the virtual reality system is a networked system in which the rendering of sound for an avatar is done in the client computer used by the user whom the avatar represents.

Overview of segments in the preferred embodiment

As mentioned above, the user's client computer digitizes streaming sound input and sends segments of the streaming data in packets over the network. Packets for transmitting data over a network are well known in the art. The following discusses the content, also termed the payload, of a streaming audio packet in the preferred embodiment. The discussion illustrates aspects of the techniques of the invention.

FIG. 3 shows the payload of a streaming audio segment in conceptual form.

In the preferred embodiment, an avatar can not only perceive audible emissions but can also be a source of them. Further, the virtual location of an avatar's sound generator may differ from the virtual location of the avatar's sound detector. Consequently, an avatar may have a virtual location as a source of sound that differs from the virtual location it has as a perceiver of sound.

Element 300 shows, in conceptual form, the payload of a streaming data segment used in the preferred embodiment. The braces at 330 and 340 indicate the two main parts of the segment payload: a header with metadata about the streaming audio data represented by the segment, and the streaming audio data itself. The metadata includes information such as speaker location and intensity. In the preferred embodiment, the segment's metadata is part of the current streaming data source information 114 for the source of the emission represented by the streaming data.

In the preferred embodiment, metadata 330 includes the following:

● A userID value 301 identifying the entity that is the source of the sound represented by the streaming data in the segment. For a source that is an avatar, this identifies the avatar.

● A session ID value 302 identifying a session. In the present context, a session is a set of sources and avatars.

● A set of flags 303 indicating additional information, such as information about the state of the source when it made the emission represented by this segment of streaming data. One flag indicates the nature of position value 305, namely whether it is a "speaker" or a "listener" position.

● A position 305 giving the current virtual location in the virtual environment of the source of the emission represented by the segment or, for an avatar, the current virtual location of the avatar's "listening" part.

● A value 307 for the intensity of the sound energy, or intrinsic loudness, of the emitted sound.

● Additional metadata, if any, represented at 309.
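The metadata fields listed above map naturally onto a small record type. The following is a sketch only: the Python types, the flag bit value, and the names `SegmentFlags`, `SegmentMetadata`, and `Segment` are assumptions for illustration and do not reproduce the actual SIREN14-3D V2 RTP payload layout of FIG. 4.

```python
from dataclasses import dataclass, field
from enum import IntFlag
from typing import Dict, Tuple


class SegmentFlags(IntFlag):
    """Flags 303. The bit assignment here is hypothetical."""
    NONE = 0
    POSITION_IS_LISTENER = 1  # position 305 is a "listener" position


@dataclass(frozen=True)
class SegmentMetadata:
    """Header 330 of the segment payload."""
    user_id: int                            # userID 301: identifies the source
    session_id: int                         # session ID 302
    flags: SegmentFlags                     # flags 303
    position: Tuple[float, float, float]    # position 305 (virtual location)
    intensity: float                        # intensity value 307
    extra: Dict[str, str] = field(default_factory=dict)  # additional metadata 309


@dataclass(frozen=True)
class Segment:
    """Payload 300: metadata header plus compressed streaming audio data."""
    metadata: SegmentMetadata   # header 330
    data: bytes                 # streaming data 340

    def position_kind(self) -> str:
        """Report whether position 305 is a speaker or a listener position."""
        if self.metadata.flags & SegmentFlags.POSITION_IS_LISTENER:
            return "listener"
        return "speaker"
```

Keeping the metadata separate from the audio bytes reflects the point made in the text: a filter can decide on a segment from the header alone, without decoding the compressed data.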

In the preferred embodiment, intensity value 307 for an audible emission is computed from the intrinsic loudness of the sound according to principles known in the art. Other types of emissions may use other values to express the intensity of the emission. For example, for an emission that appears as text in the virtual environment, the intensity value may be entered separately by the user, or alternatively text that is all upper case may be given a greater intensity value than text that is mixed case or all lower case. In an embodiment according to the techniques of the invention, intensity values may be chosen as a matter of design so that the intensities of different types of emissions can be compared with each other, for example in filtering.
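As one illustration of the text-emission example above, an intensity rule that favors all-upper-case text could look like the sketch below. The function name `text_intensity` and the numeric values `base` and `caps_boost` are invented for the example; the disclosure only says that all-upper-case text may receive a greater intensity than mixed-case or lower-case text.

```python
def text_intensity(text: str, base: float = 1.0, caps_boost: float = 2.0) -> float:
    """Return a hypothetical intensity value for a text emission.

    Text whose letters are all upper case gets a larger value than
    mixed-case or lower-case text. Text with no letters gets the base value.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if letters and all(ch.isupper() for ch in letters):
        return base * caps_boost
    return base
```

Choosing `base` and `caps_boost` on the same scale as the audio intensity values would let text and audio emissions be compared in the same filter, which is the design consideration the paragraph above raises.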

The streaming data portion of the segment is shown at 340 and its associated brace. Within the segment, the data portion is shown beginning at 321, continuing with all of the data in the segment, and ending at 323. In the preferred embodiment, the data in streaming data portion 340 represents the emitted sound in a compressed format: the client software that creates the segment also converts the audio data into a compressed representation, so that less data (and hence fewer or smaller segments) needs to be sent over the network.

In the preferred embodiment, the compressed format is based on the DCT (Discrete Cosine Transform), which transforms the signal data from the time domain to the frequency domain, with a number of subbands quantized according to psychoacoustic principles. These techniques are known in the art and are described for the SIREN14 codec standard in "Polycom® Siren14™, Information for Prospective Licensees".

Any representation of the emission may be used. The representation may be in a different representation domain, and the emission may also be rendered in a different domain: a speech emission may be represented or rendered as text using a speech-to-text algorithm, or vice versa; a sound emission may be represented or rendered visually, or vice versa; a virtual telepathy emission may be represented or rendered as a different type of streaming data; and so on.

Architectural overview of the preferred embodiment

FIG. 5 is a system diagram of the preferred embodiment, showing the operation of stage 1 and stage 2 filtering. FIG. 5 is described in overview below.

As mentioned in the discussion of FIG. 3, a segment in the preferred embodiment has a field for session ID 302. Each segment containing streaming data 320 belongs to a session and carries the identifier of the session to which it belongs in field 302. A session identifies a group of sources and avatars, termed the members of the session. The set of sessions of which a source is a member is included in the current source information 114 for that source. Similarly, the set of sessions of which an avatar is a member is included in the current avatar information 113 for that avatar. Techniques for representing and managing the members of a group, and for implementing systems that do so, are familiar in the relevant art. In the preferred embodiment, the representation of session membership is termed the session table.

In the preferred embodiment, there are two types of sessions: positional sessions and static sessions. A positional session is a session whose members are sources of emissions and avatars that can at least potentially detect the emissions from those sources in the virtual environment. In the preferred embodiment, a given source of audible emissions and any avatar that can potentially hear an audible emission from that source must be members of the same positional session. The preferred embodiment has only a single positional session; other embodiments may have more than one. A static session is a session whose membership is determined by the users of the virtual reality system. Any audible emission made by an avatar belonging to a static session is heard by every other avatar belonging to that static session, regardless of the avatars' locations in the virtual environment. A static session thus functions like a telephone conference call. The virtual reality system of the preferred embodiment provides a user interface that lets a user specify the static sessions to which the user's avatar belongs. Other embodiments of filter 111 may involve other types of sessions or no sessions at all. One extension to the implementation of sessions in the presently preferred embodiment would be a set of special session ID values that each denote a group of sessions rather than a single session.
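A session table of the kind mentioned above, recording which sources and avatars belong to which positional or static sessions, could be sketched as follows. The class `SessionTable`, its method names, and the string member identifiers are hypothetical; the disclosure states only that a representation of session membership exists.

```python
from typing import Dict, Set


class SessionTable:
    """Tracks session kinds and the members (sources and avatars) of each session."""

    KINDS = ("positional", "static")

    def __init__(self) -> None:
        self._kind: Dict[int, str] = {}
        self._members: Dict[int, Set[str]] = {}

    def add_session(self, session_id: int, kind: str) -> None:
        """Register a session as either positional or static."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown session kind: {kind!r}")
        self._kind[session_id] = kind
        self._members.setdefault(session_id, set())

    def join(self, session_id: int, member_id: str) -> None:
        """Add a member; raises KeyError if the session is unknown."""
        self._members[session_id].add(member_id)

    def leave(self, session_id: int, member_id: str) -> None:
        """Remove a member if present."""
        self._members[session_id].discard(member_id)

    def kind(self, session_id: int) -> str:
        return self._kind[session_id]

    def is_member(self, session_id: int, member_id: str) -> bool:
        return member_id in self._members.get(session_id, set())

    def sessions_of(self, member_id: str) -> Set[int]:
        """All sessions the member currently belongs to."""
        return {sid for sid, members in self._members.items() if member_id in members}
```

In this sketch, a user choosing a static session through the user interface would translate into a `join` call for that user's avatar.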

In the preferred embodiment, the type of session specified by a segment's session ID determines how the segment is filtered by filter 111. If the session ID specifies a positional session, the segments are filtered to determine whether the filter's avatar can perceive the source in the virtual environment. The segments that the filter's avatar can perceive are then filtered by the relative loudness of the sources. In the latter filtering, segments from the positional session that the filter's avatar can perceive are filtered together with segments from the static sessions of which the avatar is a member.

In the preferred embodiment, every source of audible emissions in the virtual environment makes segments for its audible emissions with the session ID of the positional session. If the source is also a member of a static session and the emission is also audible in that static session, the source additionally makes a copy of each segment for the audible emission with the session ID of that static session. Thus, an avatar that can perceive the audible emission in the virtual environment and is also a member of a static session in which the emission is audible may receive more than one copy of a segment at its filter. In the preferred embodiment, the filter detects the duplicates and passes only one of the segments on to the avatar.
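The source-side behavior described above, one segment for the positional session plus one copy per static session in which the emission is audible, can be sketched as follows. The reduced `Segment` type, the constant `POSITIONAL_SESSION_ID`, and the function `segments_to_send` are assumptions for illustration only.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List

POSITIONAL_SESSION_ID = 0  # hypothetical ID of the single positional session


@dataclass(frozen=True)
class Segment:
    """Reduced segment: only the fields needed for this example."""
    user_id: int
    session_id: int
    data: bytes


def segments_to_send(segment: Segment, static_session_ids: Iterable[int]) -> List[Segment]:
    """Return the positional-session segment followed by one copy for each
    static session in which the emission is also audible."""
    copies = [segment]
    for session_id in static_session_ids:
        # Same source and audio data; only the session ID differs.
        copies.append(replace(segment, session_id=session_id))
    return copies
```

Because the copies differ only in session ID, a receiving filter can recognize them as duplicates by comparing userID values.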

Returning to FIG. 5, elements 501 and 509 are two of a number of client computers. A client computer is generally a "personal" computer with the hardware and software for the integrated system implementation with the virtual environment. For example, a client computer has an attached microphone, keyboard, display, and headphones or speakers, and has software for performing the client operations of the integrated system. The client computers are connected to a network, as shown at 502 and 506 respectively. Each client can control an avatar as directed by the client's user. The avatar can emit sound in the virtual environment and/or hear sound emitted by sources. Streaming data representing an emission in the virtual reality system is produced at the client when the client's avatar is the source of the emission and is rendered at the client when the client's avatar can perceive the emission. This is shown by the bidirectional arrows between the client computers and the networks, such as between client 501 and network 502 and between client 509 and network 506.

In the preferred embodiment, the network connections for segments and streaming data between components such as client 501 and filtering system 517 use standard network protocols, such as the RTP and SIP network protocols for audio data. The RTP and SIP protocols, and many other techniques for network connection and connection management, are well known in the art. The feature of RTP that is important in the present context is that RTP supports managing data by its arrival time and, on a request for data that includes a time value, can return the data whose arrival time is the same as or less recent than that time value. The segments that the virtual reality system of the preferred embodiment requests from RTP in this manner are termed current segments in the following.
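The arrival-time behavior relied on above can be illustrated with a small buffer that returns every segment whose arrival time is the same as or earlier than a requested time value. This is not an RTP implementation and does not reflect the API of any RTP library; the class `ArrivalBuffer` and its methods are hypothetical and exist only to show how "current segments" might be selected.

```python
import bisect
from typing import Any, List


class ArrivalBuffer:
    """Holds segments ordered by arrival time (hypothetical helper)."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._segments: List[Any] = []

    def add(self, arrival_time: float, segment: Any) -> None:
        """Insert a segment, keeping the buffer sorted by arrival time."""
        index = bisect.bisect_right(self._times, arrival_time)
        self._times.insert(index, arrival_time)
        self._segments.insert(index, segment)

    def take_current(self, time_value: float) -> List[Any]:
        """Remove and return all segments with arrival_time <= time_value."""
        index = bisect.bisect_right(self._times, time_value)
        current = self._segments[:index]
        del self._times[:index]
        del self._segments[:index]
        return current
```

A filtering loop could call `take_current` once per time slice and pass the result to the per-avatar filters.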

The networks at 502 and 506 are shown as separate networks in FIG. 5, but they may of course be the same network or interconnected networks.

Referring to element 501, when a user associated with an avatar in the virtual environment speaks into the microphone at a client computer such as 501, the computer's software converts the sound into segments of streaming data in a compressed format with metadata, and sends the segment data in segments 510 over the network to filtering system 517.

In the preferred embodiment, filtering system 517 is in a server stack of the integrated system, separate from the server stacks of the non-integrated virtual reality system.

The compressed format and the metadata are described below. The filtering system has per-avatar filters 512 and 516 for the avatars of the clients. Each per-avatar filter filters the streaming data representing audible emissions from a number of sources in the virtual environment. The filtering determines which segments of streaming data represent audible emissions that a particular client's avatar can hear, and sends the streaming audio for the audible segments over the network to the avatar's client. As shown at 503, the segments that the avatar representing the user of client 501 can hear are sent over network 502 to client 501.

Each source of emissions is associated with current emission source information, that is, current information about the emission and its source, and/or information about its source where that information may change in real time. Examples are the quality of the emission at the source, the intensity of the emission at the source, and the location of the emission source.

In the presently preferred embodiment, current emission source information 114 is obtained from the metadata in the segments representing the emissions from the source.

In the preferred embodiment, the filtering is done in two stages. In overview, the filtering process used in filtering system 517 is as follows.

위치관련 세션에 소속되는 세그먼트들에 대하여:For segments belonging to location sessions:

● 스테이지 1 필터링: 세그먼트 및 아바타에 대하여, 필터링 프로세스는 세그먼트의 소스를 아바타로부터 분리시키는 가상 거리, 및 세그먼트의 소스가 아바타의 임계 가상 거리 내에 있는지를 결정한다. 임계 거리는 아바타에 대한 가청 근접권역(audible vicinity)을 정의하고, 이 근접권역 밖에 있는 소스로부터의 에미션들은 아바타가 들을 수 없다.Stage 1 filtering: For segments and avatars, the filtering process determines a virtual distance that separates the source of the segment from the avatar, and whether the source of the segment is within the threshold virtual distance of the avatar. The threshold distance defines an audible vicinity of the avatar, and emissions from sources outside this proximity are inaudible to the avatar.

임계치를 벗어난 세그먼트들은 필터링 2에 전달되지 않는다. 이 결정은 앞에서 설명된 세션 ID와 같은 세그먼트에 대한 메타데이터 정보, 소스(114)에 대한 현재 소스 정보, 및 아바타(113)에 대한 현재 아바타 정보를 고려함으로써 효율적으로 행해진다. 일반적으로, 이러한 필터링은 아래에서 필터링 2에 대하여 설명되는 것과 같이 필터링되어야 하는 세그먼트들의 수를 감소시킨다.Segments outside the threshold are not passed to Filtering 2. This determination is efficiently made by considering metadata information for the segment, such as the session ID described above, current source information for source 114, and current avatar information for avatar 113. In general, such filtering reduces the number of segments that should be filtered as described for Filtering 2 below.

For segments with the session ID of a static session:

● Stage 1 filtering: For a segment and an avatar, the filtering process determines whether the filter's avatar is a member of the session identified by the segment's session ID. If the filter's avatar is a member of the session, the segment is passed to stage 2 filtering. In general, this filtering reduces the number of segments that must be filtered as described below for stage 2 filtering.

For all segments that are within the threshold for the filter's avatar or that belong to a session of which the avatar is a member:

● Stage 2 filtering: The filtering process determines the apparent loudness, for that avatar, of every segment passed by stage 1 filtering. The segments are then sorted by apparent loudness, duplicate segments from different sessions are removed, and a subset consisting of the three segments with the greatest apparent loudness is sent to the avatar for rendering. The size of the subset is a matter of design choice. The determination is made efficiently by considering the metadata. Duplicate segments are segments with the same userID and different session IDs.

The components of filter system 517 that filter only segments belonging to positional sessions are indicated by the upper brace at the top right at 541, and the components that filter only segments belonging to static sessions are indicated by the lower brace at 542.

The components involved in stage 1 filtering are indicated by the brace at the bottom left at 551, and the components that perform stage 2 filtering are indicated by the brace at the bottom right at 552.

In the preferred embodiment, filter system component 517 is located on a server in the virtual reality system. In general, however, a filter for an avatar can be located at any point on the path between the source of the emission and the rendering component for the avatar with which the filter is associated.

Session manager 504 receives all incoming packets and provides them to segment routing 540, which performs stage 1 filtering by directing the segments that a given avatar can perceive, whether through a positional session or a static session, to the appropriate per-avatar filter for stage 2 filtering.

As shown at 505, the sets of segments output from segment routing component 540 are input to the per-avatar filters, of which 512 and 516 are representative. Each avatar that can perceive the type of emission represented by the streaming data has a corresponding per-avatar filter. Each per-avatar filter selects, from the segments belonging to each source, the segments that its target avatar can hear, sorts them by apparent loudness, removes any duplicate segments, and sends the three loudest remaining segments over the network to the avatar's client.

Details of the content of streaming audio segments

FIG. 4 shows a more detailed description of the relevant aspects of the payload format for these techniques. In the preferred embodiment, the payload format may also include non-streaming data used by the virtual reality system. The integrated system of the preferred embodiment illustrates one of the many ways in which the techniques can be integrated with a virtual reality system or other application. The format used for this integration is called the SIREN14-3D format. The format uses encapsulation to carry multiple payloads in one network packet. Encapsulation techniques, headers, flags, and other general aspects of packet and data formats are well known in the art and are therefore not described in detail here. For clarity, details of the integration with the virtual environment or of the operation of the virtual environment are omitted from this discussion where they are not germane to describing the present techniques.

Element 401 states that this part of the specification concerns the V2 RTP version of the format, which is the preferred SIREN14-3D version, and that one or more encapsulated payloads are carried by a network packet transmitted across the network using the RTP network protocol.

In the presently preferred embodiment, a SIREN14-3D V2 RTP payload consists of an encapsulated media payload with audio data, followed by zero or more other encapsulated payloads. The content of each encapsulated payload is given by the headerFlags flag bits 414 described below.

Element 410 describes the header portion of an encapsulated payload in the V2 format. The details of element 410 describe the individual items of metadata within header 410.

As shown at 411, the first value in the header is a 32-bit userID value, which identifies the source of the emission for the segment.

It is followed by a 32-bit item named session ID 412. This value identifies the session to which the segment belongs.

Next comes an item for the segment's intensity value, named smoothedEnergyEstimate 413. Element 413 is a metadata value giving the intensity of the intrinsic loudness of the segment of audio data that follows the header; it is an integer value in units specific to the particular system implementation.

In the preferred embodiment, the smoothedEnergyEstimate value 413 is a long-term "smoothed" value determined by smoothing together a number of original or "raw" values from the streaming sound data. This prevents undesirable filter results that could otherwise be caused by sudden moments of noise ("clicks" and the like) or by data artifacts produced when the sound data is digitized on the client computer. In this preferred embodiment, the value is calculated for a segment using known techniques for calculating the audio energy reflected by the segment's sound data. In the preferred embodiment, a first-order infinite impulse response (IIR) filter with an alpha value of 0.125 is used to smooth the instantaneous sample energy and produce the intensity value for the segment's energy. Other methods of calculating or assigning an intensity value for a segment may be used as a matter of design choice.
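As a minimal sketch of the kind of smoothing described above, the following assumes the common first-order IIR recurrence y[n] = alpha * x[n] + (1 - alpha) * y[n-1] with alpha = 0.125. The exact recurrence is not reproduced in this text, so the form of the recurrence, the function name `smooth_energy`, and the use of the squared sample amplitude as the instantaneous energy are all illustrative assumptions.

```python
def smooth_energy(samples, alpha=0.125, initial=0.0):
    """Smooth per-sample energy with a first-order IIR filter.

    Assumed recurrence: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1],
    where x[n] is the squared amplitude of sample n (an assumption).
    """
    estimate = initial
    for s in samples:
        instantaneous = float(s) * float(s)  # instantaneous sample energy
        estimate = alpha * instantaneous + (1.0 - alpha) * estimate
    return estimate
```

With this form, a single loud sample moves the estimate by only one eighth of its energy, which is how an isolated click is kept from dominating the segment's intensity value.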

headerFlags 414, consisting of 32 flag bits, follows element 413. A number of these flag bits are used to indicate the kind of data and format that follows the header in the payload.
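The header fields just described could be unpacked roughly as follows. This is a sketch under stated assumptions: the text fixes 32 bits for userID, session ID, and headerFlags, but the width of smoothedEnergyEstimate (taken here as 32 bits), the network byte order, and the absence of padding are assumptions, and the names `EncapsulatedHeader` and `parse_header` are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple

# Assumed layout: four big-endian unsigned 32-bit fields, no padding.
# Only the 32-bit width of userID, session ID and headerFlags comes from
# the text; the energy field width and the byte order are assumptions.
_HEADER = struct.Struct("!IIII")


class EncapsulatedHeader(NamedTuple):
    user_id: int
    session_id: int
    smoothed_energy_estimate: int
    header_flags: int


def parse_header(payload: bytes) -> Tuple[EncapsulatedHeader, bytes]:
    """Split an encapsulated payload into its header and the remaining bytes."""
    if len(payload) < _HEADER.size:
        raise ValueError("payload shorter than header")
    header = EncapsulatedHeader(*_HEADER.unpack_from(payload))
    return header, payload[_HEADER.size:]
```

Returning the remaining bytes separately lets a caller decide, from the flag bits, how to interpret what follows the header.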

420 shows part of the set of flag bit definitions that may be set in headerFlags 414.

Element 428 describes the flag for an AUDIO_ONLY payload, with a numeric flag value of 0x1. This flag indicates that the payload data consists of 80 bytes of audio data in a compressed format for a segment of streaming audio.

Element 421 describes the flag for a SPEAKER_POSITION payload, with a numeric flag value of 0x2. This flag indicates that the payload data includes metadata giving the current virtual position of the "mouth", or speaking part, of the source avatar. This may be followed by 80 bytes of audio data in a compressed format for a segment of streaming audio. The position update data consists of three values for the X, Y, and Z position in the coordinates of the virtual environment.

In the preferred embodiment, each source that is an avatar sends a payload with SPEAKER_POSITION information 2.5 times per second.

Element 422 describes the flag for a LISTENER_POSITION payload, with a numeric flag value of 0x4. This flag indicates that the payload data includes metadata giving the current virtual position of the "ears", or listening part, of the avatar. This may be followed by 80 bytes of audio data. The position information allows a filter implementation to determine which sources are in a particular avatar's audible vicinity. In the preferred embodiment, each source that is an avatar sends a payload with LISTENER_POSITION information 2.5 times per second.

Element 423 describes the flag for a LISTENER_ORIENTATION payload, with a numeric flag value of 0x10. This flag indicates that the payload data includes metadata giving the current virtual orientation, or facing direction, of the listening part of the user's avatar. This information allows filter implementations and the virtual environment to extend the virtual reality so that an avatar can have "directional hearing" or a special virtual anatomy for hearing, such as the ears of a rabbit or a cat.

Element 424 describes the flag for a SILENCE_FRAME payload, with a numeric flag value of 0x20. This flag indicates that the segment represents silence.

In the preferred embodiment, when a source has no audio emission segments to send, it sends SILENCE_FRAME payloads as needed so that the SPEAKER_POSITION and LISTENER_POSITION payloads with position metadata can still be sent as described above.
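The flag values listed above can be collected into a small bit-flag type. The numeric values are those given in the text; the class name `HeaderFlags` and the helper `describe` are hypothetical names introduced only for this sketch.

```python
from enum import IntFlag


class HeaderFlags(IntFlag):
    """Flag bits named in the text for headerFlags 414."""
    AUDIO_ONLY = 0x1
    SPEAKER_POSITION = 0x2
    LISTENER_POSITION = 0x4
    LISTENER_ORIENTATION = 0x10
    SILENCE_FRAME = 0x20


def describe(flags: int) -> list:
    """Return the names of the known flags set in a headerFlags value."""
    return [flag.name for flag in HeaderFlags if flags & flag]
```

For example, a source with nothing to say but with fresh position data might set SILENCE_FRAME together with both position flags; bits that are not among the definitions shown here are simply ignored by `describe`.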

Additional aspects of the segment format for the filtering operation

In the preferred embodiment, audio emissions from an avatar are not rendered for that same avatar and do not enter into any filtering of streaming audio data for that avatar; this is a matter of design choice. The choice is consistent with the known practice of not rendering, or of suppressing, "side-tone" audio or video signals in digital telephony and video communications. An alternative embodiment may process and filter emissions from a source that is also an avatar when determining what is perceptible for that same avatar.

As will be readily appreciated, the filtering techniques described here may be integrated with the management functions of the virtual environment to obtain greater efficiency in both the filtering of the streaming data and the management of the virtual environment.

Details of filter operation

The operation of filtering system 517 will now be described in detail.

Every 20 milliseconds, session manager 504 reads a time value from an authoritative master clock. The session manager then obtains, from the connections for incoming segments, all segments whose arrival time is equal to or earlier than that time value. If more than one segment from a given source is returned, the less recent segments from that source are discarded. The remaining segments are referred to as the set of current segments. Session manager 504 then provides the set of current segments to segment routing component 540, which routes the current segments to particular per-avatar filters. The operation of the segment routing component is described below. Segments that are not provided to segment routing component 540 do not pass through filtering and are therefore not delivered to any avatar for rendering.
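The per-time-slice selection described above can be sketched as follows. The tuple shape `(arrival_time, user_id, segment)` and the function name `current_segments` are hypothetical; the sketch only illustrates keeping, for each source, the most recent segment that has arrived by the time read from the clock.

```python
def current_segments(pending, now):
    """Return {user_id: segment}, keeping the newest segment per source.

    pending: iterable of (arrival_time, user_id, segment) tuples
             (a hypothetical shape chosen for this sketch)
    now:     time value read from the master clock
    """
    newest = {}
    for arrival, user_id, segment in pending:
        if arrival > now:
            continue  # has not arrived by this time slice
        kept = newest.get(user_id)
        if kept is None or arrival >= kept[0]:
            newest[user_id] = (arrival, segment)  # discard the older one
    return {uid: seg for uid, (_, seg) in newest.items()}
```

Because at most one segment per source survives, the work done by the later stages in a time slice is bounded by the number of active sources.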

Segment routing component 540 performs stage 1 filtering for segments belonging to positional sessions using adjacency matrix 535, a data table that records which sources are within the audible vicinity of which avatars. An avatar's audible vicinity is the part of the virtual environment that lies within a particular virtual distance of the avatar's hearing part. In the preferred embodiment, this virtual distance is 80 units in the virtual coordinate units of the virtual reality system. Sound emissions that are farther than this virtual distance from the avatar's hearing part are inaudible to the avatar.

Adjacency matrix 535 is illustrated in detail in FIG. 7. Adjacency matrix 535 is a two-dimensional data table. Each cell represents a source/avatar combination and contains a distance weight value for that combination. The distance weight is a factor for adjusting the intrinsic loudness, or intensity value, of a segment according to the virtual distance between the source and the avatar; the greater the virtual distance, the smaller the distance weighting factor. In this preferred embodiment, the distance weight is calculated by a clamped formula for roll-off as a linear function of distance. Other formulas may be used instead; for example, a formula may be chosen that is approximate for more efficient operation, or that includes effects such as clamping, minimum and maximum loudness, more or less dramatic roll-off, or other effects. Any formula suitable for the particular application may be used as a matter of design choice, for example one from any of the following exemplary references:

The adjacency matrix has one row for each source, shown in FIG. 7 along the left side at 710 as A, B, C, and so on. There is one column for each destination, or avatar, shown across the top at 720 as A, B, C, and D. In the preferred embodiment an avatar is also a source, so for avatar B there is a row B at 730 as well as a column B at 732; however, there may be more or fewer sources than avatars, and there may be sources that are not avatars and vice versa.

Each cell in the adjacency matrix lies at the intersection of a row and a column (source, avatar). For example, row 731 is the row for source D and column 732 is the column for avatar B.

Each cell in the adjacency matrix contains either a distance weight of 0, indicating that the source is not within the avatar's audible vicinity or is otherwise inaudible to the avatar, or a distance weight between 0 and 1. The latter is the distance weight computed according to the formula described above, and it is the factor by which the intensity value must be multiplied to determine the apparent loudness, at that destination, of the emission from that source. Cell 733, at the intersection of the row and the column, holds the weight for (D, B), shown in this example as 0.5.

The weighting factor is computed using the current virtual position of the source represented by the cell's row and the current virtual position of the "ears" of the avatar represented by its column. In the preferred embodiment, consistent with the treatment of side-tone audio known in the digital communications art, the cell for each avatar and itself is set to 0 and is not changed, so that sound from an entity acting as a source is not sent to the same entity acting as a destination. This is shown by the diagonal set of values 735, which are all 0: like every other cell on this diagonal, the cell (source = A, avatar = A) has a distance weighting factor of 0. The values 735 in the cells along the diagonal are shown in bold text for better legibility.

In the preferred embodiment, sources and other avatars send segments of streaming data carrying position data for their virtual positions 2.5 times per second. When a segment contains a position, session manager 504 passes the userID and position values 114 of the segment to adjacency matrix updater 530, as shown at 532, so that the position information associated with the segment's source or other avatar in adjacency matrix 535 can be updated.

Adjacency matrix updater 530 periodically updates the distance weighting factors in all cells of adjacency matrix 535. In the preferred embodiment, this is done 2.5 times per second, as follows.

Adjacency matrix updater 530 obtains from adjacency matrix 535 the relevant position information for each row of the matrix. After obtaining the position information for a row, adjacency matrix updater 530 obtains the position information for the hearing part of the avatar for each column of adjacency matrix 535. Obtaining the position information is shown at 533.

After obtaining the position information for the avatar's hearing part, adjacency matrix updater 530 determines the virtual distance between the source's position and the position of the avatar's hearing part. If the distance is greater than the threshold distance for the audible vicinity, the distance weight for the cell corresponding to the source's row and the avatar's column in adjacency matrix 535 is set to 0, as shown. If the source and the avatar are the same, the value is left unchanged at 0, as described above. Otherwise, the virtual distance between source X and destination Y is calculated, the distance weight is computed from it according to the formula described above, and the cell's distance weight is set to this value. Updating the distance weights is shown at 534.
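The update pass described above can be sketched as follows. The text specifies a clamped linear roll-off and an 80-unit threshold but does not give the formula itself, so `1 - distance / threshold` clamped to the range 0 to 1 is an assumption, as are the dictionary-based matrix and the names `distance_weight` and `update_adjacency`.

```python
import math

AUDIBLE_DISTANCE = 80.0  # threshold from the text, in virtual coordinate units


def distance_weight(distance, threshold=AUDIBLE_DISTANCE):
    """Assumed clamped linear roll-off: 1 at distance 0, 0 at the threshold."""
    if distance > threshold:
        return 0.0  # outside the audible vicinity
    return max(0.0, min(1.0, 1.0 - distance / threshold))


def update_adjacency(source_pos, listener_pos):
    """Rebuild {(source, avatar): weight} from current positions.

    source_pos:   {id: (x, y, z)} positions of speaking parts
    listener_pos: {id: (x, y, z)} positions of hearing parts
    """
    matrix = {}
    for src, sp in source_pos.items():
        for av, lp in listener_pos.items():
            if src == av:
                matrix[(src, av)] = 0.0  # side-tone: never route to self
            else:
                matrix[(src, av)] = distance_weight(math.dist(sp, lp))
    return matrix
```

Under this assumed formula a source exactly at the threshold distance also receives a weight of 0, so it is treated the same way as a source outside the audible vicinity.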

When segment routing component 540 determines that a source is outside an avatar's audible vicinity, it does not route segments from that source to the avatar's stage 2 filter, and those segments are therefore not rendered for the avatar.

Returning to session manager 504: session manager 504 also provides the current segments that belong to static sessions to segment routing component 540 for potential delivery to stage 2 filter components such as those shown at 512 and 516.

Segment routing component 540 determines the set of avatars to which a particular segment of an emission should be sent and sends the segment to the stage 2 filters for those avatars. The segments from a particular source that are sent to a particular stage 2 filter during a particular time slice may include segments from different sessions and may include duplicate segments. If the session ID value indicates a static session, the segment routing component accesses the session table described below to determine the set of all avatars that are members of that session. This is shown at 525. The segment routing component then sends the segment to each of the stage 2 filters associated with those avatars.

If the session ID value is that of a positional session, the segment routing component accesses adjacency matrix 535. From the row of the adjacency matrix that corresponds to the source of the packet, the segment routing component determines all columns of the adjacency matrix that have a nonzero distance weighting factor, and the avatar for each such column. This is shown at 536, labeled "adjacency matrix". The segment routing component then sends the segment to each of the stage 2 filters associated with those avatars.

Stage 1 filtering for static sessions is performed using segment routing component 540 and session table 521. Session table 521 defines membership in sessions. The session table is a two-column table: the first column contains a session ID value and the second column contains an entity identifier, such as the identifier for a source or an avatar. An entity is a member of every session identified by the session ID value in any row whose second column contains the entity's identifier. The members of a session are all entities that appear in the second column of the rows whose first column contains the session's session ID. The session table is updated by session table updater component 520, which responds to changes in static session membership by adding rows to the session table or removing rows from it. Many techniques for implementing both session table 521 and session table updater 520 are known to those skilled in the art. When table 521 indicates that the source of a segment and an avatar belong to the same static session, segment routing component 540 routes the segment to the stage 2 filter for the avatar.
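The two stage 1 routing paths described above can be sketched together. The segment shape, the dictionary-based adjacency matrix, the explicit `static_sessions` set, and the name `route_segment` are hypothetical; the text does not say how a session ID is recognized as static, so that lookup is an assumption of this sketch.

```python
def route_segment(segment, adjacency, session_table, static_sessions):
    """Return the ids of the avatars whose stage 2 filters get the segment.

    segment:         dict with 'user_id' and 'session_id' (hypothetical shape)
    adjacency:       {(source, avatar): distance_weight}
    session_table:   iterable of (session_id, entity_id) rows
    static_sessions: set of session ids that denote static sessions
    """
    src, sid = segment["user_id"], segment["session_id"]
    if sid in static_sessions:
        members = {entity for s, entity in session_table if s == sid}
        # Route only when the source itself belongs to the session.
        targets = set(members) if src in members else set()
    else:
        # Positional session: every column with a nonzero weight in src's row.
        targets = {av for (s, av), w in adjacency.items() if s == src and w > 0}
    targets.discard(src)  # an avatar never receives its own emission
    return targets
```

Both branches reduce to set lookups on metadata alone, so no audio data needs to be decoded to decide where a segment goes.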

FIG. 6 shows the operation of a stage 2 filtering component, such as 512, of the preferred embodiment. Each stage 2 filtering component is associated with a single avatar. 600 shows the set of current segments 505 delivered to the stage 2 filtering component. A set of representative segments 611, 612, 613, 614, and 615 is shown. The ellipsis indicates that there may be any number of segments.

The start of stage 2 filtering is shown at 620. The next set of current segments 505 is obtained as input. Steps 624, 626, 628, and 630 are performed for each segment in the set of current segments obtained in step 620. 624 shows obtaining the segment's energy value and the segment's source ID from each segment. At 626, the session ID value is obtained for each segment. If the session ID value is that of a positional session, the next step is 628, as shown. If the session ID value is that of a static session, the next step is 632.

628 shows obtaining the distance weight from the cell of adjacency matrix 535 for the source of this segment and the avatar that this stage 2 filter component serves. This is indicated by the dashed arrow at 511.

630 shows adjusting the segment's energy value by multiplying it by the distance weight from the cell. After all segments have been processed by steps 624, 626, 628, and 630, processing continues with step 632.

632 shows sorting all of the segments obtained in step 622 by the energy value of each segment. After the segments are sorted, all but one of any set of duplicates is removed. 634 shows outputting a subset of the segments obtained at 622 as the output of stage 2 filtering. In the preferred embodiment, the subset is the three segments with the greatest energy values as determined by sorting step 632. The output is shown at 690, which shows representative segments 611, 614, and 615.
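The stage 2 steps described above (weight, sort, remove duplicates, keep three) can be sketched as follows. The segment shape and the name `stage2_filter` are hypothetical, and keeping the loudest member of each set of duplicates is an assumption: the text says only that all but one of any set of duplicates is removed.

```python
def stage2_filter(segments, avatar, adjacency, static_sessions, keep=3):
    """Select the loudest segments for one avatar.

    segments: list of dicts with 'user_id', 'session_id', 'energy'
              (a hypothetical shape chosen for this sketch)
    """
    weighted = []
    for seg in segments:
        energy = seg["energy"]
        if seg["session_id"] not in static_sessions:
            # Positional session: scale by the distance weight for this avatar.
            energy *= adjacency.get((seg["user_id"], avatar), 0.0)
        weighted.append((energy, seg))
    weighted.sort(key=lambda pair: pair[0], reverse=True)

    seen, output = set(), []
    for _, seg in weighted:
        if seg["user_id"] in seen:
            continue  # duplicate: same userID, different session ID
        seen.add(seg["user_id"])
        output.append(seg)
        if len(output) == keep:
            break
    return output
```

The `keep` parameter reflects the statement that the size of the output subset is a matter of design choice; three is the value used in the preferred embodiment.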

Of course, in accordance with the present techniques, the selection of segments to be output to the avatar may use sorting and selection criteria different from those used in the preferred embodiment. Processing continues from 634 to step 636 and then, in a loop, from 636 back to the starting step at 620. 636 shows that in the preferred embodiment the loop is executed periodically at 20-millisecond intervals.

Client Operation for Rendering

In this preferred embodiment, the segments representing the audio emissions that are perceptible to a given avatar are rendered for that avatar according to the avatar's point of perception. For the avatar of a particular user, rendering is performed on the user's client computer, and the streams of audio data are rendered with the appropriate apparent volume and stereophonic or binaural direction according to the virtual distance and relative direction between the source and the user's avatar. Because the segments sent to the renderer include the segments' metadata, the metadata used for filtering can also be used in the renderer. In addition, the segment's energy value, which may have been adjusted during filtering 2, can be used in the rendering process. Thus there is no need to transcode or modify the encoded audio data originally sent by the source, and rendering therefore involves no loss of fidelity or intelligibility. Rendering is of course also greatly simplified by the reduction in the number of segments to be rendered that results from the filtering.

The rendered sound is output to the user by playing it through the headphones or speakers of the client computer.
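As a rough illustration of how a client renderer might derive apparent volume and stereo direction from position metadata, the sketch below computes left and right channel gains. The inverse-distance attenuation, the constant-power pan law, the 2-D coordinates, and every name in it are assumptions made for the example; the description does not specify the rendering formulas.

```python
import math


def stereo_gains(source_xy, avatar_xy, avatar_heading, energy=1.0, ref_dist=1.0):
    """Return (left_gain, right_gain) for one source as heard by one avatar.

    avatar_heading is the facing direction in radians, measured
    counter-clockwise from the +x axis (an assumed convention).
    """
    dx = source_xy[0] - avatar_xy[0]
    dy = source_xy[1] - avatar_xy[1]
    dist = math.hypot(dx, dy)
    # Assumed attenuation: full volume inside ref_dist, then 1/distance.
    volume = energy * ref_dist / max(dist, ref_dist)
    # Bearing of the source relative to the facing direction.
    rel = math.atan2(dy, dx) - avatar_heading
    # Counter-clockwise angles point left, so negate: -1 is left, +1 is right.
    pan = -math.sin(rel)
    # Constant-power pan law keeps left^2 + right^2 equal to volume^2.
    theta = (pan + 1.0) * math.pi / 4.0
    return volume * math.cos(theta), volume * math.sin(theta)
```

A source directly ahead yields equal gains in both channels, and a source directly to the avatar's right is placed entirely in the right channel. Plain stereo panning of this kind cannot distinguish front from back; a binaural renderer would need additional processing.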

Other Aspects of the Preferred Embodiment

As will be readily appreciated, there are many ways to implement or apply the techniques of the present invention, and the examples provided here are not limiting. For example, filtering may be implemented in separate embodiments, in parallel, or using virtualization of computer resources. In addition, filtering according to these techniques may be performed at various points in the system, and in various combinations, selected as needed to make the best use of the network bandwidth and/or processing power of the virtual reality system.

Additional Kinds of Filtering, and Combinations of Multiple Kinds of Filtering

Any kind of filtering technique may be used that separates segments representing emissions that a particular avatar can perceive from segments representing emissions that the avatar cannot perceive. As shown in the preferred embodiment described above, multiple kinds of filtering may be used individually, in sequence, or in combination with the techniques of the present invention. In addition, filtering according to the techniques of the invention may be used with any kind of emission in any kind of virtual environment in which the relationship between the source of an emission and the perceiver of the emission can change in real time. Indeed, the preferred embodiment's use of relative-loudness filtering with segments belonging to fixed segments is an example of using the techniques in a situation where the filtering does not depend on position. The technique used with fixed segments could be used, for example, in telephone conference call applications.

As is apparent, the ease and low cost with which the techniques herein can be applied to many kinds of communication and streaming data are among the advantages of these techniques over the prior art.

Kinds of Applications

Of course, the techniques of the present invention have a wide range of applications.

Readily apparent examples include:

● Improvements to the audio mixing and rendering of numerous audio inputs for recording, for example to render the collective audio for a point of perception in a virtual audio space environment such as a virtual concert hall.

● Text messaging communication, for example where streams of text messaging data from numerous avatars must be displayed or rendered simultaneously in a virtual environment. This is one of many possible examples of streaming virtual data to which the techniques can be applied.

● Filtering and rendering of streaming data for real-time conferencing systems, such as telephone/audio virtual conferencing environments.

● Filtering and rendering of streaming data for sensory input in a virtual sensory environment.

● Distribution of streaming data based on the real-time geographic proximity of real-world entities, where the entities are associated with avatars in a virtual environment.

The kind of information needed to filter the emissions of these sources will depend on the nature of the virtual environment, which in turn may depend on the intended application. For example, in a virtual environment for a conferencing system, the positions of the conference participants relative to one another may not matter, and in such an environment filtering may be done solely on the basis of information such as the relative intrinsic loudness of the participants' audio emissions and the participants' association with a particular session.
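A position-independent filter of the kind described for conferencing could look like the sketch below, which uses only session membership and intrinsic loudness. The dictionary keys, the exclusion of the listener's own segments, and the default of three retained segments (borrowed from the preferred embodiment's stage 2 filtering) are assumptions made for illustration.

```python
def conference_filter(segments, listener_id, listener_session, keep=3):
    """Select the loudest segments from other participants in the same session.

    Each segment is a dict with hypothetical keys 'source', 'session',
    and 'loudness'. No position information is consulted.
    """
    candidates = [
        s for s in segments
        if s["session"] == listener_session and s["source"] != listener_id
    ]
    # Rank purely by intrinsic loudness, loudest first.
    candidates.sort(key=lambda s: s["loudness"], reverse=True)
    return candidates[:keep]
```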

Combining and Integrating Filtering with Other Processing

In addition, filtering can be combined with other processing to good effect. For example, certain streams of media data in a virtual environment may be identified as "background sounds", such as the sound of running water from a virtual fountain in the virtual environment. As part of an integration of these techniques, the designer of the virtual environment may prefer that background sounds not be filtered in the same way as other streaming audio data and not cause other data to be filtered out; instead, when other streaming data is present that would otherwise have been masked and filtered out, the data for the background sound is filtered and processed so that it is rendered at a lower loudness. Such an application of the filtering techniques allows background sounds to be generated by a server component of the virtual environment system instead of being generated locally by a rendering component in the client.
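One possible reading of this background-sound handling is sketched below: background streams are kept out of the loudness competition, so they never displace other segments, and they are rendered at reduced gain whenever other audio is selected. The segment keys, the `duck_gain` value of 0.25, and the rule "duck whenever any foreground segment is present" are hypothetical choices, not details from the description.

```python
def plan_render(segments, keep=3, duck_gain=0.25):
    """Return (name, gain) pairs for one time slice.

    Each segment is a dict with hypothetical keys 'name', 'energy', and
    'background' (True for streams identified as background sound).
    """
    foreground = [s for s in segments if not s["background"]]
    background = [s for s in segments if s["background"]]
    # Only foreground segments compete on energy; background never masks them.
    foreground.sort(key=lambda s: s["energy"], reverse=True)
    chosen = foreground[:keep]
    # Render background at full gain only when nothing else is audible.
    bg_gain = duck_gain if chosen else 1.0
    plan = [(s["name"], 1.0) for s in chosen]
    plan += [(s["name"], bg_gain) for s in background]
    return plan
```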

It is also readily apparent that the same filtering according to these techniques can be applied to different kinds of emissions and streaming data. For example, different users may communicate through the virtual environment by different kinds of emissions (a hearing-impaired user may communicate by visual text messaging in the virtual environment while another user communicates by voice), and the designer may therefore choose to apply the same filtering to the two kinds of streaming data in an integrated fashion. In such an implementation, for example, filtering for the two different kinds of emissions may be done according to metadata and current avatar information such as source position, intensity, and avatar position, regardless of the fact that the emissions are of different kinds. All that is required is comparable intensity data.
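The point that only comparable intensity data is required can be illustrated with a filter that never inspects the kind of emission. In this sketch the `Emission` fields, the cutoff range, and the `intensity / (1 + distance)` scoring are assumptions chosen for brevity; any scoring that puts voice and text on a common scale would serve.

```python
import math
from dataclasses import dataclass


@dataclass
class Emission:
    kind: str          # e.g. "voice" or "text"; the filter never reads this
    source_pos: tuple  # source position in the virtual environment
    intensity: float   # assumed to be on a scale comparable across kinds


def perceptible(emissions, avatar_pos, max_range=50.0, keep=3):
    """Return up to `keep` emissions ranked by distance-weighted intensity."""
    scored = []
    for e in emissions:
        d = math.dist(e.source_pos, avatar_pos)
        if d > max_range:
            continue  # too far away to be perceptible at all
        scored.append((e.intensity / (1.0 + d), e))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:keep]]
```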

As described above, the techniques of the present invention can be used to reduce the amount of data that must be rendered, and thereby increase the possibility of moving the rendering of real-time streaming data to the "edge" of the networked virtual reality system: rendering on the destination client rather than rendering that burdens a server component. In addition, a designer can use these techniques to reduce the amount of data to the point that functions previously implemented on the client, such as recording, can be performed on a server component. This allows the designer of a particular application to choose to reduce client cost or to provide virtual functions that are not supported by the client computer or its software.

The flexibility and power of combining filtering with routing and other processing, and the great improvement in implementation cost that results, are among the many advantages of the new techniques disclosed herein.

Summary of Some Additional Aspects of Applying the Techniques

In addition to the foregoing, there are of course other useful aspects of the techniques. A few additional examples are mentioned here.

In the preferred embodiment, the current emission source information, provided for example by metadata concerning position and orientation, can further be used to render the streaming media data stereophonically or binaurally at the final point of rendering, so that the rendered sound is perceived as coming from the appropriate relative direction: from the left, from the right, from above, and so on. Thus, including this information for filtering can have additional synergistic advantages in rendering, beyond those described above.

Due in part to these advantages over the prior art and to their novel simplicity, systems that use the techniques of the present invention can operate very quickly, and designers can quickly understand and appreciate the techniques. Some of these techniques lend themselves particularly well to implementation in special hardware or firmware. As a matter of design choice, the techniques can be integrated with infrastructure such as network packet routing systems; thus these new techniques can be implemented with highly efficient kinds of components that are readily and widely available, as well as with new kinds of components that may become available in the future. The techniques may also be applied to kinds of emissions that are not yet known and to kinds of virtual environments that have not yet been implemented.

Conclusion

The foregoing detailed description has disclosed to those skilled in the relevant arts how to use the inventors' scalable techniques for providing real-time per-avatar streaming data in virtual reality systems that employ per-avatar rendered environments, and has also disclosed the best mode presently known to the inventors of implementing the techniques.

Those skilled in the art will readily see that there are many possible applications for these techniques in any area where streaming data is rendered and there is a need to reduce the processing resources and/or network bandwidth required to deliver or render the streaming data. The filtering techniques are particularly useful where the streaming data represents emissions from sources in a virtual environment and is rendered as required for different points of perception in the virtual environment. Of course, the basis on which the filtering is done will depend on the nature of the virtual environment and the nature of the emissions. The psychoacoustic filtering techniques disclosed herein are further useful not only in virtual environments but in any situation in which audio from multiple sources is rendered. Finally, the technique of using the metadata in the segments containing the streaming data both for filtering the streaming data and for rendering it in the renderer yields substantial reductions in both network bandwidth requirements and processing resources.

Those skilled in the art will also immediately appreciate that there are as many ways of implementing the inventors' techniques as there are implementers. The details of a given implementation of the techniques will depend on what the streaming data represents, on the kind of environment, virtual or otherwise, in which the techniques are used, and on the capabilities of the components of the system in which the techniques are used, with regard to the amount and location of the system's processing resources and available network bandwidth.

For all of the foregoing reasons, this detailed description is to be regarded in all respects as exemplary and not restrictive, and the scope of the invention disclosed herein is to be determined not from the detailed description but from the claims as interpreted with the full breadth permitted by the patent laws.

Claims

A filter in a virtual reality system, the virtual reality system rendering a virtual environment as perceived by an avatar in the virtual environment, the virtual environment including a source of an emission in the virtual environment whose perceptibility by the avatar in the virtual environment changes in real time, the emission being represented in the virtual reality system by segments comprising streaming data, wherein:
the filter is associated with the avatar;
the filter has access to
current emission source information for the emission represented by the segment's streaming data; and
current avatar information for the filter's avatar; and
the filter determines, from the current avatar information and the current emission source information for the segment's streaming data, whether the emission represented by the segment's streaming data is perceptible to the avatar, and when the determination indicates that the emission represented by the segment's streaming data is not perceptible to the avatar, the virtual reality system does not use the segment in rendering the virtual environment.

The filter of claim 1, wherein
the determination of whether the emission is perceptible is based on a physical property of the emission in the virtual environment.

The filter of claim 1, wherein
the avatar additionally perceives an emission that the avatar could not otherwise perceive in the virtual environment, based on membership in the minimum avatar group.

The filter of claim 2, wherein
the physical property is a distance between the emission and the avatar in the virtual environment that renders the emission imperceptible to the avatar.

The filter of claim 1, wherein
a plurality of emissions perceptible to the avatar exist in the virtual reality; and
the determination of whether the emission is perceptible comprises determining whether the emission is psychologically perceptible to the avatar relative to the other perceptible emissions.

The filter of claim 5, wherein
the plurality of emissions have different intensities as perceived by the avatar; and
whether the emission is psychologically perceptible to the avatar is determined by the intensity of the emission relative to the intensities of the other emissions perceptible to the avatar.

The filter of claim 1, wherein
a plurality of emissions perceptible to the avatar exist in the virtual reality; and
the filter makes
a first determination of whether the emission is perceptible in the virtual environment based on a physical property of the emission; and
a second determination of whether the emission is psychologically perceptible to the avatar relative to the other perceptible emissions.

The filter of claim 7, wherein
the filter makes the second determination only if the emission is determined to be perceptible in the first determination.

The filter of any one of claims 1 to 8, wherein
the emission is an audible emission that can be heard in the virtual environment.

The filter of any one of claims 1 to 8, wherein
the emission is a visible emission that can be seen in the virtual environment.

The filter of any one of claims 1 to 8, wherein
the emission is a haptic emission that is perceived by touch in the virtual environment.

The filter of any one of claims 1 to 8, wherein
the virtual reality system is a distributed system of a plurality of components, the components being accessible to one another via a network; the emission is produced in a first component of the plurality and used to render the virtual environment in a second component; the segments are transported between the first component and the second component via the network; and the filter may be located anywhere in the distributed system between the first component and the second component.

The filter of claim 12, wherein
the components of the distributed system include at least one client and a server, the emission being produced and/or rendered for the avatar in the client and the server including the filter, the server receiving the segments representing the emission from the client and using the filter to select the segments that are provided to the client to be rendered for the avatar.

The filter of any one of claims 1 to 8, wherein
the current emission source information for the emission represented by the segment's streaming data is also contained in the segment.

The filter of any one of claims 1 to 8, wherein
the segments further include segments of current avatar information from which the filter obtains the current avatar information for the filter's avatar.

A filter in a system that renders an emission represented by a segment of streaming data, the emission being rendered by the system as perceived at a point in time from a point of perception from which the emission is potentially perceptible, wherein:
the filter is associated with the point of perception;
the filter has access to
current emission information for the emission represented by the segment's streaming data at the point in time; and
current point of perception information for the filter's point of perception at the point in time; and
the filter determines, from the current point of perception information and the current emission information, whether the emission represented by the segment's streaming data is perceptible at the filter's point of perception, and when the determination indicates that the emission represented by the segment's streaming data is not perceptible at the filter's point of perception, the system does not use the segment in rendering the emission at the filter's point of perception.

A filter in a system that renders sounds from a plurality of sources, the sounds having characteristics that change in real time and the sound from each of the plurality of sources being represented as segments in a stream of segments produced by the source, wherein:
the filter receives time-sliced streams of segments from the sources; and
the filter selects, from the streams, segments belonging to a time slice for rendering according to psychoacoustic effects resulting from the interaction of the characteristics of the sounds represented by the segments belonging to the time slice.

A renderer that renders emissions from a plurality of sources, the emissions changing in real time and the emission from each of the sources being represented by segments comprising streaming data, wherein:
a segment from a source includes, in addition to the streaming data, information about the source's emission, the information about the source's emission in the segment also being used to filter the segments so as to make a subset of the segments representing the emissions from the plurality of sources available to the renderer; and
the renderer renders the segments belonging to the subset using the information about the source's emission in the segments belonging to the subset.