
3D video and device mediation with GStreamer

By Nathan Willis
October 21, 2015

GStreamer Conference

When GStreamer 1.6 was released in September, the list of new features was lengthy enough that it could be a bit overwhelming at first. Such is an unfortunate side effect of a busy project coupled with a lengthy development cycle. Fortunately, the 2015 GStreamer Conference provided an opportunity to hear about several of the new additions in detail. Among the key features highlighted at the event are 3D video support and a GStreamer-based service to mediate access to video hardware on desktop Linux systems.

Entering the third dimension

Jan Schmidt of Centricular presented a session about the new stereoscopic 3D support in GStreamer 1.6. The term "stereoscopic," he said, encompasses any 3D encoding that sends separate signals to each eye and relies on the user's brain to interpret the depth information. That leaves out exotic techniques like volumetric displays, but it still includes a wide array of ways that the two video signals can be arranged in the container file.

There could be a single video signal that is simply divided in half, so that left and right images are in every frame; this is called "frame-packed" video. Or the stream could alternate left and right images with every frame, which is called "frame-by-frame" video. There could also be two separate video streams—which may not be as simple as it sounds. Schmidt noted that 3D TV broadcasts often use an MPEG-2 stream for one eye and an H.264 stream for the other. Finally, so-called "multi-view" video also needs to be supported. This is a scheme that, like 3D, sends two video signals together—but multi-view streams are not meant to be combined; they contain distinct streams such as alternate camera angles.

[Jan Schmidt]

GStreamer 1.6 supports all of the 3D and multi-view video modes in a single API, which handles 3D input, output, and format conversion. That means it can separate streams for playback on 3D-capable display hardware, combine two video streams into a 3D format, and convert content from one format to another. Schmidt demonstrated this by converting 3D video found on YouTube between a variety of formats, and by converting a short homemade video captured with two webcams into a stereoscopic 3D stream.

GStreamer does its 3D processing using OpenGL, so it is fast on modern hardware. There are three new elements provided: gstglviewconvert rewrites content between the formats, gstglstereoview splits the two signals into separate streams, and gstglstereomix combines two input streams into a single 3D stream. For display purposes, 3D support was also added to the existing gstglimagesink element. In response to an audience question, Schmidt said the overhead of doing 3D conversion was negligible: one extra copy is performed at the OpenGL level, which is not noticeable.
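
For readers who want to experiment, the new elements can be driven like any other GStreamer pipeline. The sketch below is not from the talk; it shows roughly what converting a side-by-side file down to an ordinary single-view image for display might look like in C. The "3d-side-by-side.mp4" file name is a placeholder, the registered element names are assumed to drop the "gst" prefix used above, and the glviewconvert property names and values are assumptions worth double-checking with gst-inspect-1.0.

    /* Minimal sketch: play a frame-packed, side-by-side 3D file as a
     * flat (single-view) image.  Element and property names here are
     * assumed from the GStreamer 1.6 GL plugin; verify them with
     * gst-inspect-1.0 on your installation. */
    #include <gst/gst.h>

    int main (int argc, char *argv[])
    {
      GstElement *pipeline;
      GstBus *bus;
      GstMessage *msg;
      GError *error = NULL;

      gst_init (&argc, &argv);

      pipeline = gst_parse_launch (
          "filesrc location=3d-side-by-side.mp4 ! decodebin ! "
          "glupload ! glcolorconvert ! "
          "glviewconvert input-mode-override=side-by-side "
          "output-mode-override=mono ! glimagesink", &error);
      if (pipeline == NULL) {
        g_printerr ("Failed to build pipeline: %s\n", error->message);
        return 1;
      }

      gst_element_set_state (pipeline, GST_STATE_PLAYING);

      /* Block until the pipeline reports an error or end-of-stream. */
      bus = gst_element_get_bus (pipeline);
      msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
          GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
      if (msg != NULL)
        gst_message_unref (msg);

      gst_element_set_state (pipeline, GST_STATE_NULL);
      gst_object_unref (bus);
      gst_object_unref (pipeline);
      return 0;
    }

Retargeting the output for a 3D-capable display, or for the red-green anaglyph downmix Schmidt mentioned, should mostly be a matter of changing the requested output mode.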

Most of the video processing involved is backward-compatible with existing GStreamer video pipelines (although a filter not intended for 3D streams may not have the desired effect). The metadata needed to handle the 3D stream—such as which arrangement (left/right, top/bottom, interleaved, etc.) is used in a frame-packed video—is provided in capabilities, Schmidt said. GStreamer's most-used encoder, decoder, and multiplexing elements are already 3D-aware; most other elements just need to pass the capabilities through unaltered for a pipeline to work correctly. And one of the supported output formats is red-green anaglyph format, which may be the easiest for users to test since the equipment needed (i.e., plastic 3D glasses) is cheap.
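
As a rough illustration (again, not an excerpt from the talk), the arrangement shows up as an ordinary field in the negotiated caps; the "multiview-mode" field name and "side-by-side" value below are assumptions drawn from the 1.6 video library and should be checked against the documentation.

    /* Sketch of how a frame-packed arrangement can be expressed in
     * caps; the "multiview-mode" field and its values are assumed
     * from the 1.6 video library, not verified. */
    #include <gst/gst.h>

    int main (int argc, char *argv[])
    {
      GstCaps *caps;
      gchar *str;

      gst_init (&argc, &argv);

      caps = gst_caps_from_string (
          "video/x-raw, format=NV12, width=1920, height=1080, "
          "framerate=30/1, multiview-mode=side-by-side");

      /* A 3D-unaware element only needs to pass this field through
       * unaltered; 3D-aware elements read it to pick the layout. */
      str = gst_caps_to_string (caps);
      g_print ("%s\n", str);

      g_free (str);
      gst_caps_unref (caps);
      return 0;
    }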

Multi-view support is not as well-developed as 3D support, he said; it works fine for two-stream multi-view, but there are few test cases to work with. The technique has some interesting possibilities, he added, such as the potential for encoded multi-view streams to share inter-frame prediction data, but so far there is not much work in that area.

Don't call it PulseVideo

GStreamer founder Wim Taymans, now working at Red Hat, introduced his work on Pinos, a new Linux system service designed to mediate and multiplex access to Video4Linux2 (V4L2) hardware. The concept is akin to what PulseAudio does for sound cards, he said, although the developers chose to avoid the name "PulseVideo" for the new project, since that might lead users to incorrectly assume the two projects were connected.

The initial planning for Pinos began in 2014, when developer William Manley needed a way to share access to V4L2 hardware between a GStreamer testing framework and the application being tested. Around the same time, the GNOME app-sandboxing project was exploring a way to mediate access to V4L2 devices (specifically, webcams) from sandboxed apps. The ideas were combined, and the first implementation was written by Taymans and several other GStreamer and GNOME developers in April 2015.

Pinos runs as a daemon and uses D-Bus to communicate with client applications. Using D-Bus, the clients can request access to camera hardware, negotiate the video format they need, and start or stop streams. GStreamer provides the media transport. Initially, Taymans said, they tried using sockets to transfer the video frames themselves, but that proved too slow to be useful. Ultimately, they settled on exchanging the media with file descriptors, since GStreamer, V4L2 hardware, and OpenGL could all already use file descriptors. A socket is still used to send each client its file descriptor, as well as timestamps and other metadata.
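
Handing a file descriptor to another process over a Unix-domain socket is a standard kernel facility rather than anything Pinos-specific. The sketch below is illustrative only and is not taken from the Pinos code; it shows the generic SCM_RIGHTS mechanism a daemon can use to pass a descriptor to a client.

    /* Generic sketch (not Pinos source code): send file descriptor
     * 'fd' to the peer of the connected Unix-domain socket 'sock'
     * using SCM_RIGHTS ancillary data. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    static int
    send_fd (int sock, int fd)
    {
      char byte = 0;
      struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
      char control[CMSG_SPACE (sizeof (int))];
      struct msghdr msg = { 0 };
      struct cmsghdr *cmsg;

      /* At least one byte of ordinary data must accompany the
       * ancillary message; real code could use that channel for
       * metadata such as timestamps. */
      memset (control, 0, sizeof (control));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = control;
      msg.msg_controllen = sizeof (control);

      cmsg = CMSG_FIRSTHDR (&msg);
      cmsg->cmsg_level = SOL_SOCKET;
      cmsg->cmsg_type = SCM_RIGHTS;
      cmsg->cmsg_len = CMSG_LEN (sizeof (int));
      memcpy (CMSG_DATA (cmsg), &fd, sizeof (int));

      return sendmsg (sock, &msg, 0) < 0 ? -1 : 0;
    }

    int main (void)
    {
      int pair[2];

      /* A socketpair stands in for the daemon's per-client
       * connection; fd 1 stands in for a V4L2 or buffer fd. */
      if (socketpair (AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return 1;
      return send_fd (pair[0], 1) == 0 ? 0 : 1;
    }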

[Wim Taymans]

The implementation uses several new elements. The most important are pinossrc and pinossink, which capture and send Pinos video, respectively; gstmultisocketsink, which the daemon uses to pass data to clients; and gstpinospay, which converts a video stream into the Pinos format. Taymans said he tried to make the client-side API as simple as possible, then rewrote it to be even simpler. A client only needs to send a connection request to the Pinos daemon, wait to receive a file descriptor in return, then open a file-descriptor source element on the descriptor it gets back. At that point, the client can send the start command and begin reading video frames from the file descriptor, and send the pause or stop commands as needed. Frame rates, supported formats, and other details can be negotiated with the V4L2 device through the daemon.
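
In GStreamer terms, all of that negotiation is hidden behind the source element, so a client pipeline can stay short. A minimal sketch (not from the talk, with error handling omitted) using the pinossrc element mentioned above might look like this:

    /* Minimal sketch of a Pinos client pipeline: pinossrc presumably
     * wraps the D-Bus handshake and file-descriptor exchange
     * described above, so the rest of the pipeline is ordinary
     * GStreamer.  Error handling is omitted for brevity. */
    #include <gst/gst.h>

    int main (int argc, char *argv[])
    {
      GstElement *pipeline;

      gst_init (&argc, &argv);

      pipeline = gst_parse_launch (
          "pinossrc ! videoconvert ! autovideosink", NULL);
      gst_element_set_state (pipeline, GST_STATE_PLAYING);

      /* Keep running until interrupted. */
      g_main_loop_run (g_main_loop_new (NULL, FALSE));
      return 0;
    }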

Basic camera access is already working; as a test case, the desktop webcam application Cheese was rewritten to use Pinos, and the new elements worked "out of the box." The Pinos branch is expected to become the default in the next Cheese release; at that point, Pinos will need to be packaged and shipped by distributions. The sandboxed-application use case, however, still requires more work, since the security policy needed by the sandbox has not been defined yet. It has also not yet been decided how best to handle microphone access—which may fall under Pinos's purview because many webcams have built-in microphones. And there are other ideas still worth exploring, Taymans said, such as allowing Pinos clients to send video as well as receive it. He speculated that the service could also be used to take screenshots of a Wayland desktop, which is a feature that has been tricky to handle in the Wayland security model.

Looking even further out, Taymans noted that because GStreamer handles audio and video, Pinos could even replace PulseAudio for many application use cases. It may make sense, after all, to only worry about managing one connection for both the audio and video. He quickly added, however, that this concept did not mean that Pinos was going to replace PulseAudio as a standard system component.

Pinos support and stereoscopic 3D support are both available in GStreamer 1.6. In both cases, it may still be some time before the new features are accessible to end users. Taymans noted at the end of his talk that packaging Pinos for Fedora was on the short list of to-do items. Experimenting with 3D video requires 3D content and hardware, which can be pricey and hard to locate. But, as Schmidt demonstrated, GStreamer's ability to combine two camera feeds into a single 3D video is easy to use—perhaps easy enough that some users will begin working with it as soon as they install GStreamer 1.6.

[The author would like to thank the Linux Foundation for travel assistance to attend GStreamer Conference.]

Index entries for this article
Conference: GStreamer Conference/2015



3D video and device mediation with GStreamer

Posted Oct 22, 2015 18:25 UTC (Thu) by knobunc (subscriber, #4678)

Neat! Is there any support for allowing "effects" plugins to run in the pipeline? i.e. I want to take my real webcam, munge the audio/video, and present the result as another webcam stream. More specifically, I want to crop and zoom the output from my webcam so Skype sees the cropped result... (or perhaps run deepdream on my video feed to make the scrum meetings more interesting).

-ben


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds