Since I had already experimented with two transformation box approaches for Pitivi in the past, the maintainers thought I might be the right person to do a modern one.

Creating a user interface for a video transformation requires three things:

The implementation of the transformation

A way to draw the widgets over the viewer and

Mapping the input to the reverse transformation

The transformation

First of all the implementation of the transformation, which is in our case scaling and translation, is currently done by GES.UriSource, calculated on the CPU. In the first Pitivi transformation box I did in GSoC 2012 this was done by the notorious Frei0r plugins from GStreamer Plugins Bad, which is also a CPU implementation. In the second version this was done on the GPU with the gltransformation element I wrote for GSoC 2014.

A method to draw widgets over the viewer

In Pitivi’s case, the viewer is a GStreamer sink. In all three versions rendering of the overlay widgets was done by Cairo, but it was done differently for all three implementations, since they all used different sinks.

2012: The first one used a hacky solution where the sink and Cairo drew in the same Gtk drawing area, acquired for GStreamer with the Gst Overlay API. Many Gtk and GStreamer devs wondered how this worked at all. This and Pitivi switching to more modern sinks were the reasons why the first version of the box didn’t stay upstream for long.

2014: Still using the Gst Overlay API, but this time with glimagesink. The Cairo widgets are rendered into OpenGL textures and composed in the glimagesink draw callback in the GStreamer GL context. It worked pretty smoothly for me, but didn’t provide a fallback solution for users without GL. Clearly an approach for the future, but how about something solid?

2016: Now we have the almighty GtkSink in Pitivi. It is a Gtk widget and overlays can be added via Gtk Overlay. The sink is also exchangeable with GtkGLSink, which uses a GStreamer GL context to display the video texture and also can use GStreamer GL plugins like gltransformation without needing to download the GPU memory with gldownload. Cairo rendering now can be easily added over GStreamer sinks, yay.

Linking the UI with the components doing the transformation

The mapping of the input from the UI to the transformation is clearly dependent on the transformation you are using. In the 2012 version I needed to map the input to frei0r^-1. In 2014 I used an OpenGL Model-View-Projection matrix calculated in Graphene, which could also do rotations and 3D transformations (we have a z-axis, yay).

The 2016 implementation uses the inverse transformation for the GES.UriSource transformation, which is done by the GStreamer elements videomixer and videoscale. Of course things like keeping aspect ratio, maintaining limits and transforming Gtk widget coordinates to the transformation’s coordinates are part of this 3rd ingredient.
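The coordinate part of this third ingredient can be sketched in a few lines of Python. This is a minimal illustration and not Pitivi's actual code: the function names are hypothetical, and it assumes the video is scaled uniformly and centered (letterboxed) inside the viewer widget.

```python
def fit_rect(widget_w, widget_h, video_w, video_h):
    """Return (x, y, w, h) of the video area centered in the widget,
    scaled uniformly so the aspect ratio is kept (assumed letterboxing)."""
    scale = min(widget_w / video_w, widget_h / video_h)
    w, h = video_w * scale, video_h * scale
    return (widget_w - w) / 2, (widget_h - h) / 2, w, h


def widget_to_video(px, py, widget_size, video_size):
    """Map a pointer position in widget pixels to video coordinates."""
    x, y, w, h = fit_rect(*widget_size, *video_size)
    return (px - x) * video_size[0] / w, (py - y) * video_size[1] / h


def video_to_widget(vx, vy, widget_size, video_size):
    """Inverse mapping, used to place overlay handles over the video."""
    x, y, w, h = fit_rect(*widget_size, *video_size)
    return x + vx * w / video_size[0], y + vy * h / video_size[1]
```

With an 800×600 widget and a 1920×1080 project, the widget center maps to the video center, and a drag in the widget can be converted into position values for the clip by mapping both end points.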

Extensibility

The new transformation box fits great with Pitivi by making clips selectable from the viewer, so you can manage multiple overlapping clips quite easily. But the best part of this implementation may be its extensibility. I already made two overlay classes, one for the normal clips which uses a GES.UriSource transformation and one for title clips aka GES.TextSource, which is using different coordinates and different GStreamer plugins. In this fashion other plugins can be written for the Pitivi viewer, for example for 3D transformations with gltransformation. Or you could do crazy stuff like a UI for barrel distortion etc.

This article sums up some VR R&D work I have been doing lately at Collabora, so thanks for making this possible! 🙂

Previously on GStreamer

Three years ago, in 2013, I released an OpenGL fragment shader you could use with the GstGLShader element to view side-by-side stereoscopic video on the Oculus Rift DK1 in GStreamer. It used the headset only as a stereo viewer and didn’t provide any tracking; it was just a quick way to make any use of the DK1 with GStreamer at all. Side-by-side stereoscopic video was becoming very popular, due to “3D” movies and screens gaining popularity. With its 1.6 release GStreamer added support for stereoscopic video, though I didn’t test side-by-side stereo with that.

Why is planar stereoscopic video not 3D?

Stereoscopic video does not provide the full 3D information, since the perspective is always given for a certain view, or parallax. Mapping the stereo video onto a sphere does not solve this, but at least it stores color information independent of view angle, so it’s way more immersive and gets described as telepresence experience. A better solution for “real 3D” video would be of course capturing a point cloud with as many sensitive sensors as possible, filter it and construct mesh data out of it for rendering, but more on that later.

A brief history of mapping imagery on anything different than a plane

Nowadays mankind projects its imagery mostly onto planes, as seen on most LCD Screens, canvases and Polaroids. Although this seems to be a physical limitation, there are some ways to overcome it, in particular with Curved LCD screens, fancy projector setups rarely seen in art installations and of course most recently: Virtual Reality Head Mounted Displays.

Projecting our images on different shapes than planes in virtual space is not new at all though. Still panoramas have been very commonly projected onto cylinders, not only in modern photo viewer software, but also in monumental paintings like the Racławice Panorama, which is housed inside a cylinder shaped building.

But to store information from each angle in 3D space we need a different geometric shape.

The sphere

Spherical projection is used very commonly, for example in Google Street View and of course in VR video.

As we are in 3D, a regular angle wouldn’t be enough to describe all directions on a sphere, since 360° can only describe a full circle, a 2D shape. In fact we have 2 angles, θ and φ, also called inclination and azimuth.
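To make the two angles concrete, here is a small Python sketch that converts between (θ, φ) and a unit direction vector. The axis convention is my assumption for illustration (y is up, as is common in OpenGL scenes, and θ is measured from the +y pole); other conventions swap the axes.

```python
import math


def direction(theta, phi):
    """Unit vector for inclination theta (from the +y pole) and azimuth phi,
    both in radians. The y-up convention is an assumption of this sketch."""
    return (math.sin(theta) * math.cos(phi),
            math.cos(theta),
            math.sin(theta) * math.sin(phi))


def angles(x, y, z):
    """Inverse of direction(): return (theta, phi) for a non-zero vector."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, y / r)))  # clamp against rounding
    phi = math.atan2(z, x) % (2.0 * math.pi)       # keep phi in [0, 2*pi)
    return theta, phi
```

θ only needs to cover 0 to π (pole to pole), while φ covers the full 0 to 2π circle, which is why a single "360°" figure cannot describe the whole sphere.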

This is why the term 360° video is not well suited to describe spherical video, since there are other popular shapes for projecting video having φ = 360°, like cylinders. 180° video also usually uses a half cylinder. With video half spheres, or hemispheres, you could for example texture a skydome.

So did you implement this sphere?

In Gst3D, which is a small graphics library currently supporting OpenGL 3+ I am providing a sphere built with one triangle strip, which has an equirectangular mapping of UV coordinates, which you can see in yellow to green. You can switch to the wireframe rendering with Tab in the vrtestsrc element.
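As an illustration of what an equirectangular UV mapping on a sphere means, here is a minimal Python sketch that generates vertex positions and texture coordinates. It is not the Gst3D code: the function names are hypothetical, it assumes a y-up coordinate system, and it emits plain indexed triangles instead of a single triangle strip to keep the example short.

```python
import math


def uv_sphere(stacks, slices, radius=1.0):
    """Return a list of (position, uv) tuples covering a sphere.
    u follows the azimuth and v the inclination linearly, which is
    exactly the equirectangular layout of the video frame."""
    vertices = []
    for i in range(stacks + 1):
        v = i / stacks               # 0 at the top pole, 1 at the bottom pole
        theta = v * math.pi          # inclination
        for j in range(slices + 1):
            u = j / slices           # 0..1 once around the sphere
            phi = u * 2.0 * math.pi  # azimuth
            position = (radius * math.sin(theta) * math.cos(phi),
                        radius * math.cos(theta),
                        radius * math.sin(theta) * math.sin(phi))
            vertices.append((position, (u, v)))
    return vertices


def triangle_indices(stacks, slices):
    """Two triangles per grid cell, indexing into uv_sphere()'s list."""
    indices = []
    row = slices + 1
    for i in range(stacks):
        for j in range(slices):
            a = i * row + j          # upper left corner of the cell
            b = a + row              # lower left corner of the cell
            indices += [a, b, a + 1, a + 1, b, b + 1]
    return indices
```

The last column of vertices duplicates the first column's positions with u = 1, so the texture wraps around without a seam in the UV coordinates.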

This is why I decided to use OpenHMD as a minimal approach for initial VR sensor support in GStreamer. For broader headset support, and because I think it will be adopted as a standard, I will implement support for OSVR in the future of gst-plugins-vr.

What if I do not have an HMD?

No problem, you can view spherical videos and photos anyway. Currently you can compile gst-vr without OpenHMD and can view things with an arcball camera, without stereo. So you can still view spherical video projected correctly and navigate with your mouse. This fallback mode would probably be best done during run time.

But for VR you need stereo rendering and barrel distortion, right?

Right, they are the core components required in a VR renderer. Stereo rendering and projection according to IMU sensor happens in vrcompositor, which can also be used without a HMD with mouse controls.

The hmdwarp element

For computing the HMD lens distortion, or barrel distortion, I use a fragment shader based approach. I know that there are better methods for doing this, but this seemed like a simple and quick solution, since it does not really eat up much performance.
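The idea behind such a shader can be sketched on the CPU in Python. This is only an illustration of a common polynomial radial model, not the hmdwarp shader itself: the function name is hypothetical and the coefficients are placeholder values, not the DK2 attributes used in the element.

```python
def warp(u, v, k=(1.0, 0.22, 0.24, 0.0), center=(0.5, 0.5)):
    """Radially displace a texture coordinate around the lens center.
    k holds placeholder polynomial coefficients (an assumption of this
    sketch): scale = k0 + k1*r^2 + k2*r^4 + k3*r^6."""
    dx, dy = u - center[0], v - center[1]
    r2 = dx * dx + dy * dy
    scale = k[0] + k[1] * r2 + k[2] * r2 * r2 + k[3] * r2 * r2 * r2
    return center[0] + dx * scale, center[1] + dy * scale
```

A fragment shader would evaluate this per output pixel and sample the rendered eye texture at the returned coordinate. Because the lookup moves further outwards the further a pixel is from the center, the displayed image is compressed towards the edges, which is the barrel shape that counteracts the lens distortion.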

Currently the lens attributes are hardcoded for Oculus DK2, but I will soon support more HMDs, in particular the HTC Vive, and everything else that OSVR support could offer.

In this example pipeline I use GtkGLSink which works fine, but only provides a refresh rate of 60Hz, which is not really optimal for VR. This restriction may reside inside Gtk or window management, still need to investigate it, since the restriction also appears using the Gst Overlay API with Gtk.

Viewing equirectangular projected photos

You can just do an image search of equirectangular and will get plenty of images to view. Using imagefreeze in front of vrcompositor makes this possible. Image support is not implemented in SPHVR yet, but you can just run this pipeline:

Multiple outputs?

In most VR applications a second output window is created to spectate the VR experience on the desktop. In SPHVR I use the tee element for creating 2 GL sinks and put them in 2 Gtk windows via the GStreamer Overlay API, since GtkGLSink still seems to have its problems with tee.

SPHVR

Pronounced sphere, SPHVR is a python video player using gst-plugins-vr. Currently it is capable of opening a URL of an equirectangular mapped spherical video.

You need to configure your Oculus DK2 screen to be horizontal, since I do not do a rotation in SPHVR yet. Other HMDs may not require this.

SPHVR detects your DK2 display using GnomeDesktop and Gdk if available and opens a full screen VR window on it.

To open a video

$ sphvr file:///home/bmonkey/Videos/elephants.mp4

Sensors for spherical video

Spherical video sensors range from consumer devices like the Ricoh Theta for $300 to the professional Nokia Ozo for $60,000. But in general you can just use 2 wide angle cameras and stitch them together correctly. This functionality is mostly found in photography software like Hugin, but will need to find a place in GStreamer soon. GSoC anyone?

Why is sphere + stereo still meh?

The other difficulty besides stitching in creating spherical video is of course stereoscopy. The parallax being different for every pixel and eye makes it difficult to be losslessly transformed from the sensor into the intermediate format and to the viewer. Nokia’s Ozo records stereo with 8 stereo camera pairs in each direction, adjusted to a horizontal default eye separation assumption for the viewer. This means that rotating your head around the axis you are looking along (for example tilting the head to the right) will still produce a wrong parallax.

John Carmack stated in a tweet that his best prerendered stereo VR experience was with renderings from Octane, a renderer from OTOY, who also created the well known Brigade, a path-tracer with real time capabilities. You can find the stereoscopic cube maps on the internet.

So it is apparently possible to encode correct projection in a prerendered stereoscopic cube map, but I still assume that the stereo quality would be highly anisotropic, especially when translating the viewer position.

With stereoscopic spherical video, no real depth information is stored either, but we could encode our depth information projected spherically around the viewer if you like, so a spherical video + spherical depth texture constructed from whatever sensors would be more immersive / correct than having 3D information as a plain stereo image. But this solution would lack the ability to move in the room.

I think we should use a better format for storing 3D video.

Room-scale VR Video

If you want to walk around the stuff in your video, maybe one could call this holographic, you need a point cloud with absolute world positions. This of course could be converted into a vertex mesh with algorithms like Marching Tetrahedra, compressed and sent over the network.

Real 3D sensors like laser scanners, or other time-of-flight cameras like the Kinect v2, are a good start. You can of course also reconstruct 3D positions from a stereoscopic camera, which results in a point cloud as well.
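How a depth image turns into a point cloud can be sketched with the usual pinhole camera model. This is a minimal Python illustration under my own assumptions, not code from libfreenect2 or gst-plugins-vr: the function name is hypothetical, depth values are taken to be in metres, a value of 0 marks an invalid pixel, and the intrinsics fx, fy, cx, cy would come from the sensor calibration.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (a list of rows, values in metres)
    into a list of (x, y, z) points in the camera coordinate system."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # 0 marks an invalid measurement in this sketch
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

These points are relative to the camera. For absolute world positions, each point additionally has to be transformed by the pose of the sensor in the room.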

Point clouds?

In a previous post I described how to stream point clouds over the network with ROS, which I also did in HoloChat. Porting this functionality to GStreamer was always something that tempted me, so I implemented a source for libfreenect2. This work is still pretty unfinished, since I need to implement 16-bit float buffers in gst-video to transmit the full depth information from libfreenect2. So the point cloud projection is currently wrong, the color buffer is not yet mapped onto it, and no mesh is constructed yet. The code could also use some attention to performance, but here are my first results.

Since the point cloud is also a Gst3D scene, it can be already viewed with a HMD, and since it’s part of GStreamer, it can be transmitted over the network for telepresence, but there is currently no example doing this yet. More to see in the future.

Distribution

You can find the source of gst-plugins-vr on Github. An Arch Linux package is available on the AUR. In the future I plan distribution via flatpak.

Future

In the future I plan to implement more projections, for example 180° / half cylinder stereo video and stereoscopic equirectangular spherical video. OSVR support, improving point cloud quality and point cloud to mesh construction via marching cubes or similar are also possible things to do. If you are interested in contributing, feel free to clone.

I was able to attend SVVR 2016 last week, where I experienced many insightful VR impressions and saw where the industry is currently heading to. I also want to thank Collabora for giving me the possibility to do this. Note that the opinions in this article are my own and not Collabora’s.

Booths

SculptrVR

Since I was always enthusiastic about voxel graphics, I needed to check out SculptrVR’s HTC Vive demo. It utilizes the Vive’s SteamVR controller and the Lighthouse full room tracking system to achieve a unique Minecraft-esque modeling experience. The company is a startup dedicated to this one product. The editor is currently capable of creating voxels with different cube sizes and colors, and more importantly of destroying them dynamically with rockets. The data structure used is a Sparse Voxel Octree implemented in C++ on top of the Unreal 4 engine. It is capable of exporting the models to an OBJ vertex mesh. Export to voxel formats like MagicaVoxel is not yet supported. The prototype was implemented in the Unity engine with Leap Motion input and Oculus DK2 HMD support, but the developer dropped it in favour of Vive and Unreal 4, which gave him more solid tracking and rendering. Product development will continue in the direction of social model sharing and game support. Their software is available on Steam for $20.

Whirlwind VR

One of the most curious hardware accessories was presented by Whirlwind VR, which is basically a fan with a USB connector and a Unity engine plugin. It obviously adds immersion to demos involving rapid movement in an open air vehicle. Other cases utilize the fan’s heating system in order to simulate a dragon breathing fire into your face. A widespread market for this product is questionable, but I encourage exploring every bit of uncaptured human sense left.

Sixense

The creators of the first consumer available VR controller, the Razer Hydra, were presenting their new full room body scale tracking system STEM, including VR controllers for your hands and 3 boxes for feet and head. It is targeted at Oculus Rift CV1 customers, who do not receive this functionality out of the box, in contrast to the HTC Vive. The demo setup was a fun 2-player medieval bow shooting game, created with the Unreal 4 engine. One PC was using the standard Vive Lighthouse tracking, the other an Oculus CV1 with Sixense’s product prototype. Tracking results were very comparable, though I would prefer the Vive due to the controller’s more finished user experience and feel. Even more so if you think about the $995 price tag for the full 5 tracker system, compared to the HTC Vive’s $300 difference to the Oculus CV1.

mimesys

A personally very interesting demo was presented by Paris located company mimesys, which specializes in telepresence, or “holographic telecommunication”. They used a Kinect v2 for capturing a point cloud, reconstructing a polygon mesh and sending it compressed over the internet, which is comparable to ideas in my prior work HoloChat. In their live demo you were able to use the SteamVR controller to draw in the air and “Holoskype” with their colleague being live in Paris. The quality of mesh and texture streaming was in a very early stage from my point of view, knowing there is more potential in a Kinect v2 point cloud. Overall the network latency of the mesh was pretty high, but unnoticeable since voice chat was the primary communication method (which was achieved over Skype btw). The company’s product is a Unity engine plugin, implementing vertex and texture transfers over the internet. The usage of video codecs for textures would improve this kind of data transfer, but this was not implemented as far as I noticed.

Tactical Haptics

A very well researched haptic feedback controller prototype was presented by Californian company Tactical Haptics. You can notice the academic background of mechanical engineering and haptics when you talk to professor and founder William Provancher. He knew numbers like 1000Hz, which is required for haptic feedback to trick the human skin into perceiving it as real time, in contrast to only 60Hz for the human eye. Their physics engine running only at ~100Hz (don’t know the exact number anymore) was more than sufficient for their first class haptics system. With the interactions in the demo being rather rough, like juggling cubes with Jedi powers or shooting drones with a bow, the latency was more than appropriate to have an immersive haptic experience unique at the expo. Their product Reactive Grip manages to replicate the “skin sensations of actually holding an object” and has a conceivably widespread use in future VR controllers for action experiences and workouts.

Ricoh THETA

Japanese hardware vendor Ricoh was presenting their already marketed consumer targeted spherical camera. It is able to seamlessly stitch the images of its two wide angle sensors (> hemispherical) on the hardware and send the result to the phone, providing a very user friendly interface. A client agnostic REST-like interface is provided to control the camera with the Android and iOS clients. Streaming video is also possible, but requires a PC with their proprietary driver for Windows and MacOS to be stitched in real time. The camera costs only about 400 bucks and will very soon flood the internet with many amateur spherical photos and videos. Spherical video and audio was a big topic at the expo, but I have a problem with the marketing term for it being 360° video, since degrees are for 2D angles, and in 3D we deal with solid angles. So please call it 4π Steradian Video from now on if you address the full sphere, or 2π sr for the hemisphere.
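The steradian figures follow from the solid angle of a spherical cap, Ω = 2π(1 − cos α), where α is the half-angle of the viewing cone. A small Python sketch (the function name is my own) reproduces both values:

```python
import math


def cap_solid_angle(half_angle_deg):
    """Solid angle in steradians covered by a cone with the given
    half-angle in degrees: 2*pi*(1 - cos(alpha))."""
    alpha = math.radians(half_angle_deg)
    return 2.0 * math.pi * (1.0 - math.cos(alpha))


full_sphere = cap_solid_angle(180.0)  # equals 4*pi sr
hemisphere = cap_solid_angle(90.0)    # equals 2*pi sr
```

The same function gives the coverage of a single wide angle lens: a sensor that sees slightly more than a hemisphere has a half-angle slightly above 90° and therefore covers a bit more than 2π sr.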

Nokia Ozo

Nokia has stopped making unpopular phones and started targeting the professional spherical video artist with their 8 sensor spherical stereoscopic camera available for affordable 60 grand. The dual sensor setup in every direction, separated by average eye distance, was said to provide perfect conditions for stereo capturing. They also provide the editing software to be used with it. It seemed that for editing purposes the raw sensor data is stored in 2×4 circular tiles on a planar video. The video can surely be exported into 2 (because we have stereo) spheres with the commonly used equirectangular mapping on a plane, which is more storage efficient, since we do not have tons of black borders. Their live demo where you could view the “live” camera output with a DK2 was rather disappointing, because of a latency of 3s (yes, full seconds) and very noticeable seams. Their software does not target live video processing yet, but there wasn’t a finished rendering available either. The camera looks like a cute android though.

High Fidelity

San Francisco based, future driven company High Fidelity wants to build the software that is for VR what the Apache server is for the web. In the keynote founder Philip Rosedale talked about the “Hyperlink for VR”. They provide an in house game engine with multiplayer whiteboard drawing and voice chat support. The client is open source software, as is the server, and meant to run on Linux, Windows and MacOS. Telepresence is a very important topic for VR, but High Fidelity still lacks the possibility of including depth sensors like Kinect or spherical cameras into their world and represents humans by virtual models, which directly brings you into the uncanny valley. Especially when skeleton animation is buggy and your model is doing unhealthy looking yoga merged into the floor. Great stuff though, looking forward to building it for myself and fixing some Linux client issues 🙂

OSVR

Razer was presenting their open source middleware which wraps available drivers into their signal processing framework and does things like sensor fusion using available computer vision algorithms like the ones in OpenCV. They also provide their OSVR branded headset, marketed as Hacker Development Kit, which has the freedom of having a replaceable display and other repairable components. The headset does 1920×1080@60Hz, which is a slightly worse frame rate than achieved by the Oculus DK2. The most remarkable factor for this headset was the visible lack of screen door effect at this resolution, which was achieved by a physical distortion filter on the display. If you don’t own a DK2 and don’t have the € for a Vive, the $300 OSVR is a very solid option. Using OSVR as software will also make supporting more than just headset hardware easier, since it also provides support for controllers like SteamVR and tracking systems like Leap Motion. You only need to implement OSVR once, instead of wrapping all HMD and input APIs in your application. Valve’s OpenVR is also an attempt to do that, but lacked presence at the conference.

NVIDIA

Graphical horsepower could be experienced in both NVIDIA demos, running on current high end NVIDIA GTX980 cards. I was a little disappointed they did not bring one of their yet to be released Pascal architecture cards though. NVIDIA had both high end consumer headsets Oculus CV1 and HTC Vive on its show floor. The Oculus demo was gaming oriented, and since the Vive is more capable of VR productivity demos, due to the full room tracking and use of a VR controller, it was used to do more creative things.

Oculus CV1 + Eve: Valkyrie

The first demo was showcasing the Oculus CV1 with space ship shooter Eve: Valkyrie. The rendering was smooth due to the CV1’s 90Hz frequency and screen door effect was also eliminated by the HMD’s 2160×1200 resolution. Experienced VR users will very quickly note the small tracking area, being basically limited to a chair in front of a desk, in contrast to full room tracking with HTC Vive. The user experience is also very traditional, only using an Xbox game pad. With the lack of a VR controller, users very unintuitively experience a lack of their hands in VR and cannot interact in 3D space. Hands are much better for this than an analogue stick, as we will see further below in this article.

Classical game pads also have the problem of changing button order for every manufacturer, which makes it very complicated for experienced gamers to know which button is accept and which one is back. This issue also led me into Eve: Valkyrie’s payment acceptance screen, which I needed to be guided out of by the booth exhibitor.

HTC Vive + Google Tilt Brush

One of the most influential VR experiences for me was Google’s Tilt Brush on the HTC Vive, which is 2160×1200@90Hz as well btw. The ability to draw freely in 3D space with full room tracking and the Vive’s immersive display capabilities provides a very unique experience which feels like a new medium for artists. The user interface is very simple, having a palette on the left controller and a brush on the right. Of course you can switch hands easily, if you are left handed. This natural input and camera movement enabled the user to be creative in 3D space without the steep learning curve of contemporary “2D interface” CAD software. The creative process of expression possible with Tilt Brush is a good reason itself for getting a HTC Vive for home already. Looking forward to stuff artists can do now.

The other demo on the productivity booth was a point cloud of the construction site of NVIDIA’s new headquarters, recorded by drones with depth sensors. The scene’s resolution and rendering were not remarkable, but still fun to navigate in with SteamVR controllers. I can definitely see architects and construction engineers planning their stuff in VR.

Noitom

Another one of the top three demos I was able to experience was presented by #ProjectAlice under the name Follow the White Rabbit. They are utilizing Noitom’s tracking system to achieve a remarkable multiplayer VR demo, where real objects like garbage cans are perfectly tracked into the virtual world and can interact with virtual objects. They were using regular Wiimote controllers with plastic markers to showcase the potential of their own tracking. Their demo emphasizes how important interaction is in the virtual world and how natural it needs to feel. The scale of the tracking system is rather for public installations than home environments, but I would love to see more of real / virtual world interaction in consumer products, which can also be achieved with consumer trackers like HTC Vive’s Lighthouse. Also note the hygienic face masks they’ve offered for public HMDs. They were the ninjas of VR.

Talks

Keynote

Keynote speakers showed their current products and market visions. AltspaceVR, NVIDIA, Nokia, High Fidelity and SpaceVR gave presentations. SpaceVR will launch a spherical camera into space, which you can stream from home onto your headset. Road to VR editor Benjamin Lang gave his insights into the development of the industry so far and his 100 year forecast for achieving perfect immersion. I think we will be there in <30. Palmer Luckey was also there to drop some game title names from the Oculus store and quickly ran off the expo grounds after his session, to avoid much public interaction.

Light Fields 101

Compared to conventional CMOS sensors, light field cameras like Lytro are able to receive photon rays from multiple directions and offer a data set where things like focusing can be done in post production or in live VR environments. Ryan Damm showed insights into his understanding of and research in Light Field technology, where according to him many things happen secretively and mentioned companies like Magic Leap, which are still in the process of developing their product.

How Eye-Interaction Technology Will Transform the VR Experience

Jim Marggraff presented his long professional experience with eye tracking and how necessary it is in natural interfaces. He showed a whack-a-mole on Oculus DK2 where head tracking and eye tracking were compared. A random person from the audience obviously could select the moles faster with eye tracking than by moving his head. He also showed a full featured eye tracking operating system interface where everything could be done with just your eyes, from buying something on Amazon to chatting. Also password authentication is drastically simplified with eye tracking, since you get free retina recognition with it. I think eye tracking will be as essential as hand tracking in future VR experiences, since human interaction and natural input are the next important steps that need to achieve perfection, after we have 4k@144Hz HMDs.

WebVR with Mozilla’s A-Frame

Not only was Ben Nolan the right guy to look for if you’re into authentic American craft beer bars in San Jose, but also a JavaScript developer with enthusiasm for VR in the web browser. He showed A-Frame, an open source JavaScript engine with HMD support which is already available in popular browsers’ nightly builds. The engine contains an XML based model format and scene graph, physics and shading extensibility utilizing the WebGL standard. The big benefit of having VR support in the browser is clearly ease of distribution and quick content generation. He pointed out that a minimal A-Frame project is as small as 1kB, where a Unity web build is at least ~0.5MB. Type aframe.io into your Android phone and try it for yourself.

Apollo 11 VR – (Designed to Demo, Developed to Educate)

David Whelan of Ireland based VR Education Ltd pointed out how important VR will be in future education. He had a very good point that experiencing tends to have a higher impact in memorizing facts than sitting in a classroom. He showed the development process of their Kickstarter funded Apollo 11 VR demo, and their current work Lecture VR, which both are available on Steam already.

The Missing Link: How Natural Input Creates True Immersion for VR/AR

One of the most amazing on-the-spot talks was given by Leap Motion co-founder David Holz, who pointed out the necessity of natural input in VR. Leap Motion is a ~$100 high FOV infrared camera, which has been available since 2013. But their major technological achievement has only been observable since this year, after they released the second iteration of their tracking software, code named Orion. Holz showcased the stability and precision of their hand tracking on stage, which was quite remarkable. But he got the audience when he showed their current Blocks demo, where objects can be manipulated with a physically plausible interaction engine. Natural gestures are used to create objects, grab them and throw them around. If you didn’t see the demo, try it at home; you just need a DK2 and a Leap Motion sensor. It feels a generation ahead of the ordinary VR demo and points out how much immersion is gained by seeing your hands and even using them for interaction. He also showed user interface designs for VR, which are projected onto the body and in the room. Conventional 2D interfaces where we need to stack windows and tabs seem very primitive in comparison. He also talked about how VR/AR interfaces will eliminate the necessity of having a work desk and chair, since all meetings and work can be done in the forest or a lounge.

Conclusion

The expo pointed out how important novel human interaction methods are in VR, since it is as obvious to replace the keyboard, mouse and game pad with natural body movement tracking as it was to replace conventional displays with HMDs.

A big part of the industry also focuses on spherical video, since it’s currently the quickest method of bringing the real world into VR.

In July 2015 I did a VR demo on the CV Tag at the University of Koblenz, which uses two Arch Linux PCs with two Oculus Rift DK2s and two Kinect v2s. It utilizes ROS’s capabilities to stream ROS topics over the network.

ROS Topics

You can list the ROS topics with the following command

rostopic list

You will see that kinect2_bridge provides topics in 3 different resolutions: [ hd | qhd | sd ], and also different formats and the option for compression. The IR image is only available in SD, due to the sensor size.

You will notice that the uncompressed color image is ~101.23MB/s , the calculated uncompressed depth image is ~125.66MB/s and the uncompressed IR image is only 12.75MB/s.
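To get a feeling for where such numbers come from, the raw bandwidth of an uncompressed image topic can be estimated from the frame size and rate. The following Python sketch is a rough estimate only. The values for the SD IR image (512×424 pixels, 2 bytes per pixel, 30 frames per second) are my assumptions about the stream and are not taken from the measurements above.

```python
def raw_bandwidth(width, height, bytes_per_pixel, fps):
    """Bytes per second needed for an uncompressed image stream."""
    return width * height * bytes_per_pixel * fps


# Assumed parameters for the SD IR image: 512x424, 16 bit, 30 fps.
sd_ir = raw_bandwidth(512, 424, 2, 30)
print(f"{sd_ir / 1e6:.2f} MB/s")  # prints "13.03 MB/s"
```

The estimate lands in the same range as the measured IR bandwidth. Measured values can differ from it, for example when the topic does not reach the full frame rate.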

By default kinect2_viewer accesses /kinect2/qhd/image_color_rect and /kinect2/qhd/image_depth_rect. The IR mode has a lower bandwidth, since sd/image_ir_rect and sd/image_depth_rect combined require only ~28MB/s, and compressed ~17MB/s, which should be achievable over 100MBit/s LAN.

At the time I wrote the patches OpenHMD was lacking head tracking support and libraries like Valve’s OpenVR and Razer’s OSVR were not around. They still are not really usable with the DK2 at the time I am writing this article.

To run the patched viewer you need to have ovrd running. If this does not work out, try killing it and running it as root. It auto starts with your session and does not exit if it lacks the udev rights. Make sure you have oculus-udev installed.

Make sure your headset is available in

$ OculusConfigUtil

When the viewer starts, you need to manually maximize it on the VR headset 😉 The user base of the demo (me) thought this was sufficient.

ROS Networking

ROS prints the host name and port when you start roscore

$ roscore
...
started roslaunch server http://kinecthost:43272/

If you have the above setup on two machines, you can run kinect2_bridge on the host as usual. On the client you need to provide the host’s ROS_MASTER_URI when running the viewer.
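As a sketch, this is how the client side could look. It assumes the master runs on kinecthost and listens on the ROS default port 11311. roscore prints the actual ROS_MASTER_URI in its startup output, so use that value if it differs. The rosrun line is an assumption about how the viewer is started.

```shell
# Point this shell at the roscore running on the host.
# Assumption: default master port 11311 on the machine "kinecthost".
export ROS_MASTER_URI=http://kinecthost:11311/
echo "Using ROS master at $ROS_MASTER_URI"

# Start the viewer from the same shell (assumed invocation):
# rosrun kinect2_viewer kinect2_viewer
```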

Adding a simple audio stream using GStreamer

To run the following pipelines, you need to install the GStreamer Good Plugins. The pipelines use your PulseAudio default devices for recording and playback. You can set them for example in GNOME’s audio settings.

Using my ros-jade-kinect2 package from the AUR, you can install all required dependencies, such as a ton of ROS packages, Point Cloud Library and libfreenect2, which are all available in the Arch User Repository.

Testing libfreenect2

After installing libfreenect2 you can test your Kinect v2 with the following command

$ Protonect

If everything runs fine you will get an image like this from Protonect

In the image above you can see the unprocessed infrared image (top left), the color sensor image mapped to the calculated depth (top right), the unprocessed color image (bottom left) and the calculated depth image (bottom right).

By default Protonect uses OpenGL to generate the depth image. To test libfreenect2’s different DepthPacketProcessor backends you can do

libfreenect2 provides a /usr/lib/udev/rules.d/90-kinect2.rules file which gives the Kinect the udev tag uaccess to provide user access. The error is generated when this did not work out. It can be fixed with a relogin. udevadm control -R didn’t seem to work. Running Protonect with sudo will also help temporarily.

Using ROS

You can enter your ROS environment with

$ source /opt/ros/jade/setup.bash

You probably should create an alias for this environment in your shell config.
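For example, a line like the following could go into ~/.bashrc. The alias name rosenv is just a placeholder.

```shell
# Enter the ROS Jade environment by typing `rosenv`.
alias rosenv='source /opt/ros/jade/setup.bash'
```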

Now you can launch the roscore and leave it in a separate shell.

$ roscore

Install ros-jade-rosbash for rosrun. You now can list the options of the kinect2_bridge module.

$ rosrun kinect2_bridge kinect2_bridge -h

The default options for kinect2_bridge are OpenCL registration and OpenGL depth method. You can start it like this

$ rosrun kinect2_bridge kinect2_bridge

Possible Problems

This fails for me on NVIDIA with

[ERROR] [Kinect2Bridge::start] Initialization failed!

This is due to the OpenCL registration method failing to initialize the OpenCL device.

A different error occurs with the beignet OpenCL implementation for Intel. It seems the OpenCL registration method does not support beignet’s shader compiler.

Debugging GL errors can be a time-consuming task. Usually you need to query the OpenGL state machine with glGetError, which returns just an integer code for the latest error.

First of all this requires a switch for checking the type of error, which could look like this:

When you execute this code, the loop will print all errors on the stack. This does not tell you when the error occurred, just that it happened before calling the function.

With Mesa you can do the same by setting an environment variable.

$ export MESA_DEBUG=1

This will give you debug output similar to this:

Mesa: User error: GL_INVALID_VALUE in glClear(0x5f01)

The usual solution is to create a macro function, which prints in which line you executed the query function, and to put a call to it at the end of every piece of code that calls the OpenGL API.

This has to be done on proprietary drivers like NVIDIA’s, since you do not have debug information. A better approach is to get a backtrace for every failing GL call. For this, you need to rebuild your Mesa libGL.so with debug symbols, or install a debug package provided by your distribution.

To build Mesa with debug symbols you have to set the following compiler options:

export CFLAGS='-O0 -ggdb3'
export CXXFLAGS='-O0 -ggdb3'

On Arch Linux this can be done in the build() function, when building mesa from ABS or mesa-git from AUR.
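A rough sketch of how the build() function could look. Everything except the two exports is a placeholder for what the PKGBUILD already contains. Note that makepkg strips binaries in its default configuration, so the PKGBUILD probably also needs options=('!strip') to keep the symbols.

```shell
build() {
    # Disable optimization and emit full gdb debug information.
    export CFLAGS='-O0 -ggdb3'
    export CXXFLAGS='-O0 -ggdb3'

    # The package's original configure and make calls follow here.
}
```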

You will then be able to receive a backtrace with gdb. Do not forget to build your application with debug symbols. For CMake projects like mesa-demos, you can achieve this by doing
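A minimal sketch of both steps. It assumes the project is configured with CMake in the current directory and that Mesa reports errors through its internal _mesa_error function, which is where the breakpoint is set. The file name gl-error.gdb and the program name in the usage line are placeholders.

```shell
# Configure with debug symbols if this directory is a CMake project.
if [ -f CMakeLists.txt ]; then
    cmake -DCMAKE_BUILD_TYPE=Debug . && make
fi

# gdb commands: stop whenever Mesa records a GL error and print a backtrace.
# The breakpoint is pending because the Mesa driver is loaded at runtime.
cat > gl-error.gdb <<'EOF'
set breakpoint pending on
break _mesa_error
run
bt
EOF

# Usage: gdb -x gl-error.gdb --args ./your-gl-program
```

After the backtrace is printed, gdb stays at its prompt, so you can inspect the frame that issued the failing GL call.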

Now we know that the GL error occurs in line 392 of geo-outlining-150.c.

You get GL errors when using GLEW in core GL contexts, since it is calling glGetString with the deprecated GL_EXTENSIONS enum. You can continue debugging with c. If you want to use a modern way to load a core context, try gl3w instead of GLEW.