Hand tracking and virtual reality are both emerging technologies, and combining the two into a fluid and seamless experience can be a real challenge. In both cases, developers need to overturn many longstanding ideas that have served them well for traditional PC setups. It’s also an incredible opportunity – the chance to experiment and create in ways that previously existed only in the pages of science fiction. Where the physical and digital worlds collide, there be dragons.

This guide brings together longstanding user experience design principles and VR research with what we’ve learned from ongoing development and user testing. Like everything in this field, it’s just a stepping stone on the path to the Metaverse.

Important: Experiences of sensory conflict (e.g. inner ear vs. visual cues) and object deformation (due to variations of pupil distance) have a large amount of variability between individuals. Just because it feels fine for you does not mean it will feel fine for everyone! User testing is always essential.

We’re currently revisiting these guidelines in the wake of our Orion release, as well as some exciting new experiments on the razor’s edge of this rapidly emerging space. There’s so much happening right now that many of these guidelines should be considered as preliminary ideas, rather than established rules. Note that some assets (like Image Hands and Widgets) are not currently available for the Orion Unity Core Assets.

Part 1. User Interface Design

The “physical” design of interactive elements in VR should afford particular uses.

Every interactive object should respond to any casual movement.

Clearly describe necessary gestures with text-based prompts.

Interactive elements should be appropriately scaled.

Place text and images on slightly curved concave surfaces

The Dangers of Intuition

In the world of design, intuition is a dangerous word, because it describes highly individual assumptions driven by what we find familiar. No two people have the same intuitions, but we're trained by our physical world to have triggered responses based on our expectations. The most reliable "intuitive gestures" are ones where we guide users into performing the proper gesture through affordance. When an interaction relies on a user moving in a certain way, or making a specific pose, create affordances to encourage this.

In the real world, we never think twice about using our hands to control objects. We instinctively know how. By thinking about how we understand real-world objects, you can harness the power of hands for digital experiences — bridging the virtual and real worlds in a way that’s easy for users to understand.

Semantic and Responsive Gestures

Gestures break down into two categories: semantic and responsive. Semantic gestures are based on our ideas about the world. They vary widely from person to person, making them hard to track, but it's important to understand the meaning they carry for the user. For example, pointing at yourself when referring to another person feels foreign.

Responsive gestures occur in response to the ergonomics and affordances of specific objects. They are grounded and specific, making them easy to track. Because there is a limited range of specific interactions, multiple users will perform the exact same gestures.

By leveraging a user’s understanding of real-world physical interactions, and avoiding gestures that don’t make sense in semantic terms, we can inform and guide them in using digital objects. From there, the opportunities to expand a user’s understanding of data are endless.

Creating Affordances

In the field of industrial design, “affordances” refers to the physical characteristics of an object that guide the user in using that object. These aspects are related to (but distinct from) the aspects indicating what functions will be performed by the object when operated by the user. Well-designed tools afford their intended operation and negatively afford improper use.

In the context of motion controls, good affordance is critical, since it is necessary that users interact with objects in the expected manner. With 2D Leap Motion applications, this means adapting traditional UX design principles that condensed around the mouse and keyboard. VR opens up the potential to build interactions on more physical affordances. Here are several best practices for designing interactions in VR:

The more physical an object's response, the wider the range of interactions it can correctly respond to. For example, a button that can only be pressed by entering a bounding volume from the right direction requires a very clear affordance. Buttons that are essentially physical pistons simply need to look "pushable."

The more specific the interaction, the more specific the affordance should appear. In other words, the right affordance can only be used in one way. This effectively "tricks" the user into making the right movements, and discourages them from making an error. In turn, this makes the UI more efficient and easy to use, and reduces the chances of tracking errors.

Look for real-world affordances that you can reflect in your own projects. There are affordances everywhere, and as we mentioned earlier, these form the basis for our most commonly held intuitions about how the world works:

Doorknobs and push bars. Doorknobs fit comfortably in the palm of your hand, while push bars have wide surfaces made for pressing against. Even without thinking, you know to twist a doorknob.

Skateboard prevention measures. Ever seen small stubs along outdoor railings? These are nearly invisible to anyone who doesn’t want to grind down the rail — but skaters will find them irritating and go elsewhere.

Mouse buttons vs. touch buttons. Mouse-activated buttons look small enough to be hit with your mouse, while touchscreen buttons are big enough to be hit with your finger.

Everything Should Be Reactive

Every interactive object should respond to any casual movement. For example, if something is a button, any casual touch should provoke movement, even if that movement does not result in the button being fully pushed. When this happens, the kinetic response of the object matches the user's mental model of it, allowing people to calibrate their muscle movements when interacting with objects.

Because the Leap Motion Controller provides no tactile feedback, it's important to use other cues to show that users have interacted with a UI element. For example, when designing a button:

Use a shadow from the hand to indicate where the user's hand is in relation to the button.

Create a glow from the button, reflected on the hand, to help convey the depth relationship.

Ensure the button moves in proportion to the amount of pressure (Z-press) from the user.

Use sound to indicate when the button has been pressed ("click").

Create a specific hover state and behavior for interaction widgets.

When done effectively, people will find themselves anticipating tactile experiences as they interact with a scene. However, if an object appears intangible, people have no mental model for it, and will be less able to interact with it reliably.
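The cues above can be combined with a simple depth-to-travel mapping. This is a minimal language-agnostic sketch (shown in Python rather than Unity C#); the travel distance and activation threshold are illustrative values, not Leap Motion API constants:

```python
# Sketch: mapping fingertip press depth to button travel and a "click" event,
# so any casual touch produces visible movement before the click fires.

MAX_TRAVEL_MM = 8.0    # how far the button can physically depress (illustrative)
CLICK_THRESHOLD = 0.6  # fraction of travel at which the click fires (illustrative)

def button_state(finger_depth_mm):
    """Return (travel_fraction, clicked) for a given fingertip press depth."""
    travel = max(0.0, min(finger_depth_mm, MAX_TRAVEL_MM)) / MAX_TRAVEL_MM
    return travel, travel >= CLICK_THRESHOLD

# A light graze moves the button without clicking; a deep press clicks.
print(button_state(2.0))
print(button_state(7.5))
```

Because the button always moves, even a grazing touch gives the user kinetic feedback that matches their mental model.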

Gesture Descriptions

While affordances are important, text- or audio-based tutorial prompts can also be essential for first-time users. Be sure to clearly describe intended interactions, as this will greatly affect how the user performs the interaction.

For example, if you say “pump your fist,” the user could interpret this in many different ways. However, if you say “fully open your hand, then make a fist, and repeat,” the user will fully open their hand then fully close it, repeatedly. Be as specific as possible to get the best results, using motions and gestures that track reliably.
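The "open, fist, repeat" cycle above can be detected robustly with hysteresis on a grab-strength signal. A minimal Python sketch, where the 0.0-to-1.0 samples and both thresholds are illustrative assumptions rather than actual API values:

```python
# Sketch: counting "fully open -> fist" pump cycles from a stream of
# grab-strength samples (0.0 = fully open hand, 1.0 = closed fist).

OPEN_BELOW = 0.2  # hand counts as fully open below this (illustrative)
FIST_ABOVE = 0.8  # hand counts as a fist above this (illustrative)

def count_pumps(samples):
    pumps = 0
    armed = False  # becomes True once the hand has fully opened
    for s in samples:
        if s < OPEN_BELOW:
            armed = True
        elif s > FIST_ABOVE and armed:
            pumps += 1
            armed = False  # require another full open before the next pump
    return pumps

print(count_pumps([0.1, 0.5, 0.9, 0.1, 0.9]))  # two full open/close cycles
```

The hysteresis gap between the two thresholds prevents a half-closed hand from registering spurious pumps.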

Alternatively, you can use our Serialization API to record and playback hand interactions. For an example of how this works, take a look at our post on designing Playground, which is open sourced on GitHub.

Interactive Element Targeting

Appropriate scaling. Interactive elements should be scaled appropriately to the expected interaction (e.g. full hand or single finger). A single-finger target should be no smaller than 20 mm in real-world size. This ensures the user can accurately hit the target without accidentally triggering targets next to it.

Limit unintended interactions. Depending on the nature of the interface, the first object of a group to be touched can momentarily lock out all others. Be sure to space out UI elements so that users don’t accidentally trigger multiple elements.

Limit hand interactivity. Make a single element of the hand able to interact with buttons and other UI elements — typically, the tip of the index finger. Conversely, other nearby elements within the scene should not be interactive.
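The momentary lockout described above can be sketched as a "first touch wins" rule over a group of elements. The class and method names here are hypothetical, not part of any Leap Motion asset:

```python
# Sketch: the first element of a group to be touched momentarily locks out
# all others, so a stray finger cannot trigger two buttons at once.

class ButtonGroup:
    def __init__(self, names):
        self.names = set(names)
        self.locked_to = None  # name of the element currently holding the lock

    def touch(self, name):
        """Return True if the touch is accepted, False if locked out."""
        if self.locked_to is None:
            self.locked_to = name
            return True
        return self.locked_to == name

    def release(self):
        """Clear the lock once the interaction completes."""
        self.locked_to = None

group = ButtonGroup({"play", "stop"})
print(group.touch("play"))  # accepted: takes the lock
print(group.touch("stop"))  # rejected while "play" is held
group.release()
print(group.touch("stop"))  # accepted after release
```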

Text and Image Legibility

While VR is amazing at many things, it sometimes has issues with rendering text. Due to resolution limitations, only text at the center of your FOV may appear clear, while text along the periphery may seem blurry unless users turn to view it directly. For this reason, be sure to avoid long lines of text in favor of scrollable columns like our Scroll Widget.

Another issue arises from lens distortion. As a user’s eyes scan across lines of text, the positions of the pupils will change, which may cause distortion and blurring. Furthermore, if the distance to the text varies — which would be caused, for example, by text on a flat laterally extensive surface close to the user — then the focus of the user’s eyes will change, which can also cause distortion and blurring.

The simplest way to avoid this problem is to limit the angular range of text to be close to the center of the user’s field of view (e.g. making text appear on a surface only when a user is looking directly at the surface). This will significantly improve readability.
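One way to implement this gaze gating is to compare the gaze direction with the direction to the text surface and only reveal the text inside a small cone. A sketch with an illustrative 15-degree threshold (a tuning choice, not a recommendation from the Leap Motion or Oculus documentation):

```python
import math

# Sketch: show text only when its surface sits near the center of the
# user's field of view, keeping it inside the sharp region of the lens.

def should_show_text(gaze_dir, to_surface, max_angle_deg=15.0):
    """gaze_dir and to_surface are unit 3D vectors (tuples of three floats)."""
    dot = sum(g * s for g, s in zip(gaze_dir, to_surface))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
    return angle <= max_angle_deg

print(should_show_text((0, 0, 1), (0, 0, 1)))        # looking straight at it
print(should_show_text((0, 0, 1), (0.5, 0, 0.866)))  # about 30 degrees off-axis
```

In practice you would fade the text in and out near the threshold rather than toggling it abruptly.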

Part 2. Interaction Design

No movement should take place unless it’s user-driven.

Keep in mind that human hands naturally move in arcs, rather than straight lines.

Limit the number of gestures that users are required to learn.

All interactions should have a distinct initiation and completion state.

Ensure that users can interact with objects occluded by their hands.

Restrict Motions to Interaction

One of the biggest barriers to VR is simulator sickness, which is caused by a conflict between different sensory inputs (i.e. inner ear, visual field, and bodily position). Oculus’ best practices guide covers this issue in great detail. Generally, significant movement — as in the room moving, rather than a single object — that hasn’t been instigated by the user can trigger feelings of nausea. Conversely, being able to control movement reduces the experience of motion sickness.

The display should respond to the user’s movements at all times, without exception. Even in menus, when the game is paused, or during cutscenes, users should be able to look around.

Do not instigate any movement without user input (including changing head orientation, translation of view, or field of view).

Do not rotate or move the horizon line or other large components of the environment unless it corresponds with the user’s real-world motions.

Reduce neck strain with experiences that reward (but don’t require) a significant degree of looking around. Try to restrict movement in the periphery.

Ensure that the virtual cameras rotate and move in a manner consistent with head and body movements. (See Part 5: Space and Perspective for more details about how to achieve this with the Leap Motion Controller.)

Ergonomics

Designing based on how the human body works is essential to bringing any new interface to life. Our bodies tend to move in arcs, rather than straight lines, so it's important to compensate by allowing for arcs in 3D space. For example, when making a hand movement along the Z axis, the user inherently makes a Y movement as well. The same is true when the user moves along the X axis: this motion will also produce a Z movement.

Hand movements can also vary greatly based on posture. For example, when tracking the index finger:

Pivoting on a fixed shoulder with the elbow raised: wandering in the X and Y axes.

Pivoting on a fixed elbow resting on a table: wandering in Y.

Pivoting on the wrist with a relaxed index finger: minimal wandering, very small Z depth.
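The arc effect is easy to quantify. If the forearm pivots on a fixed elbow, the fingertip is constrained to a circle, so advancing it in Z necessarily moves it in Y. A purely illustrative geometry sketch (the 30 cm forearm is an assumed length):

```python
import math

# Sketch: why a "straight" push is really an arc. With the forearm pivoting
# on a fixed elbow, forward reach (Z) and height (Y) are coupled.

def fingertip_height(forearm_cm, forward_reach_cm):
    """Height of the fingertip above the elbow for a given forward reach."""
    return math.sqrt(forearm_cm**2 - forward_reach_cm**2)

# Pushing a 30 cm forearm from 20 cm to 25 cm of forward reach drops the
# fingertip by roughly 6 cm: the unintended Y component of a Z push.
drop_cm = fingertip_height(30, 20) - fingertip_height(30, 25)
print(round(drop_cm, 1))
```

This is why interactions designed around straight-line motion feel strained, while arc-tolerant interactions feel natural.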

Limit Learned Gestures

There are a very limited number of gestures that users can remember. When developing for motion controls, be sure to build on a base set, or try combining gestures for more advanced features. Even better, create specific affordances that users will respond to, rather than having to learn a specific pose or movement.

Eliminate Ambiguity

All interactions should have a distinct initiation and completion state. The more ambiguous the start and stop, the more likely that users will do it incorrectly:

Clearly describe intended poses and where the user should hold their hand to do that pose.

If the intended interaction is a motion, make a clear indicator where the user can start and stop the motion.

If the interaction is in response to an object, make it clear from the size and shape of the object how to start and stop the interaction.

Hand-Occluded Objects

In the real world, people routinely interact with objects that are obscured by their hands. Normally, this is achieved by using touch to provide feedback. In the absence of touch, here are some techniques that you can use:

Provide audio cues to indicate when an interaction is taking place.

Make the user’s hand semi-transparent when near UI elements.

Make objects large enough to be seen around the user’s hand and fingers. (See the section Interactive Element Targeting for more information about scaling.)

Avoid placing objects too high in the scene, as this forces users to raise their hands up and block their view. (See the section Ideal Height Range for more information.)

When designing hand interactions, consider the user’s perspective by looking at your hands with a VR headset.

Locomotion

World navigation is one of the greatest challenges in VR, and there are no truly seamless solutions beyond actually walking around in a Holodeck-style space. Generally, the best VR applications that use Leap Motion for navigation aren’t centered around users “walking” around in a non-physical way, but transitioning between different states. Here are some interesting experiments in locomotion:

World of Comenius. This educational application features glowing orbs that can be tapped to travel from place to place.

VR Intro and Weightless. Two-handed flight has been a cultural touchstone since the days of George Reeves’ Superman. This is a great way to give your users superpowers, but it can get tiring unless used in a short demo, or alongside other interactive elements.

Planetarium. The Joyball widget makes it easy to move with small displacements from a comfortable posture, while also providing compass data that helps to orient the user.

Three.js Camera Controls. This set of experiments from Isaac Cohen explores several different ways that users can navigate 3D space.

Sound Effects

Sound is an essential aspect of truly immersive VR. Combined with hand tracking and visual feedback, it can be used to create the “illusion” of tactile sensation. It can also be very effective in communicating the success or failure of interactions.

Part 3. Optimizing for VR Tracking

Include safe poses to avoid “the Midas touch” where everything is interactive.

Encourage users to keep their fingers splayed out.

Keep interactive elements in the “Goldilocks zone” between desk height and eye level.

Filter out implausible hands.

Use visual feedback to encourage users to keep their hands within the tracking zone.

Avoid interactions involving fingers that would be invisible to the controller.

The Sensor is Always On

As we’ve discussed elsewhere, the Leap Motion Controller exhibits the “live-mic” or “Midas touch” problem. This means that your demo must have a safe pose, so that users can safely move through the device’s field of view without interacting. Gesture-based interactions should be initiated with specific gestures that are rarely a part of casual movement (e.g. making a fist and then splaying the fingers).

However, safety should never be at the expense of speed. Do not require a pause to begin an interaction, as your users will become frustrated.

Flat Splayed Hands

Whenever possible, encourage users to keep their fingers splayed and hands perpendicular to the field of view. This is by far the most reliable tracking pose. You can encourage this by requiring interactions to be initiated from this pose, and providing positive indicators when the pose is detected.

Ideal Height Range

Interactive elements within your scene should typically rest in the “Goldilocks zone” between desk height and eye level. Here’s what you need to consider beyond the Goldilocks zone:

Desk height. Be careful about putting interactive elements at desk height or below. Because there are often numerous infrared-reflective objects at that height, this can cause poor tracking. (For example, light-colored or glossy desks will overexpose the controller’s cameras.)

Above eye level. Interactive objects that are above eye level in a scene can cause neck strain and “gorilla arm.” Users may also occlude the objects with their own hand when they try to use them. See Hand-Occluded Objects in Part 2 for design techniques that can compensate for this.

Implausible Hands

The Leap Motion API returns some tracking data that can be safely discarded, depending on the use case. When the device is head-mounted, hands may be detected in positions or orientations that are physically implausible. Use the Confidence API to eliminate hands with low confidence values. Allow a maximum of one right and one left hand, and only animate those two hands.
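The filtering rule above can be sketched as follows. The dict format and the 0.1 confidence floor are illustrative stand-ins, not the actual Leap Motion API types:

```python
# Sketch: reduce a frame's hand list to at most one plausible left and one
# right hand, keeping the highest-confidence candidate of each.

def plausible_hands(hands, min_confidence=0.1):
    best = {}  # keyed by handedness: True = left, False = right
    for hand in hands:
        if hand["confidence"] < min_confidence:
            continue  # discard low-confidence detections outright
        side = hand["is_left"]
        if side not in best or hand["confidence"] > best[side]["confidence"]:
            best[side] = hand
    return list(best.values())

frame = [
    {"id": 1, "is_left": True,  "confidence": 0.9},
    {"id": 2, "is_left": True,  "confidence": 0.4},   # duplicate left hand
    {"id": 3, "is_left": False, "confidence": 0.05},  # implausible detection
]
print([h["id"] for h in plausible_hands(frame)])  # only hand 1 survives
```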

Edge of FOV

Avoid interactions that require a hand to be at rest near the edge of the field of view, or when horizontal. Both of these configurations can cause the tracked hand to move spontaneously. To help resolve FOV issues, use the Confidence API and filter held objects.

Hand Out of View

If the user can’t see their hand, they can’t use it. While this might seem obvious to developers, it isn’t always to users — especially when focused on the object they’re trying to manipulate, rather than looking at their hand. Here are a few techniques to make it clear to users that they must keep their hand in view at all times:

Disappearing skins. Create an overlay for the hand, such as a robot hand, spacesuit, or alien skin, or a glow that envelops an image of the passthrough hand (which is available through the Image Hands asset). When tracking is reliable, the overlay appears at full opacity. When Confidence API values drop, the overlay fades out or disappears. This affordance can be used to let the user know when the tracking sees the hand, and when it doesn’t.

Error zone. Delineate a clear safety zone to indicate where the hands should be placed. You can notify the user when their hands enter (or exit) the zone with a simple change in color or opacity to that area of the screen.

Contextually correct persistent states. If the user grabs something, that thing should remain in their hand until they release it. If their hand leaves the field of view, the object should also smoothly exit.

Failure vs. exit. It’s possible to distinguish tracking failures (when the hand abruptly vanishes from the center of the field of view) from tracked exiting (when the hand vanishes at or near the edge of the field of view). Your application should handle these in a way that yields smooth and plausible motion, without causing unintended interactions.
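The failure-vs-exit distinction above can be approximated by checking how close the last-seen position was to the edge of an assumed conical field of view. Both the 70-degree half-angle and the 80% edge threshold are illustrative assumptions:

```python
# Sketch: classify a hand disappearance as a tracking failure (vanished
# mid-view) or a tracked exit (vanished near the FOV edge).

def classify_disappearance(last_angle_deg, fov_half_angle_deg=70.0,
                           edge_fraction=0.8):
    """last_angle_deg: angle of the hand's last position from the device axis."""
    if last_angle_deg >= fov_half_angle_deg * edge_fraction:
        return "exit"     # near the edge: animate the hand leaving smoothly
    return "failure"      # mid-view: hold state and fade, don't snap away

print(classify_disappearance(65.0))  # near the edge
print(classify_disappearance(10.0))  # center of view
```

An "exit" can smoothly carry held objects out of view, while a "failure" should freeze or fade state rather than trigger unintended interactions.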

Finger Occlusion

As with any optical tracking platform, it’s important to avoid the known unknowns. The Leap Motion tracking software includes “target poses” to which it will default in the absence of direct line-of-sight data. As a result, the reported pose of a finger that is outside the device’s line of sight may happen to be correct, but it cannot be reliably distinguished from other possible poses.

As a result, be sure to avoid interactions that depend on the position of fingers when they are out of the device’s line of sight. For example, if pinching is allowed, it should only be possible when the fingers can be clearly seen by the controller. Similarly, grabbing is tracked most reliably when the palm faces away from the device.
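One way to approximate "clearly seen" is to gate the pinch on palm orientation, since fingertips are visible when the palm faces the device. A sketch with an illustrative dot-product threshold (the vectors and threshold are assumptions, not API values):

```python
# Sketch: only accept a pinch when the palm normal faces roughly toward the
# device, so the fingertips are in its line of sight.

def pinch_allowed(palm_normal, to_device, min_facing=0.5):
    """Both arguments are unit 3D vectors; to_device points from palm to device."""
    facing = sum(a * b for a, b in zip(palm_normal, to_device))
    return facing >= min_facing

print(pinch_allowed((0, 0, -1), (0, 0, -1)))  # palm toward device: allowed
print(pinch_allowed((0, 0, 1),  (0, 0, -1)))  # palm away: pinch rejected
```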

Part 4. Hands and Body

Only render the user’s body in situations where you can expect reasonable agreement. While being disembodied is disconcerting, having multiple inconsistent bodies is usually worse.

Choosing Your Hands

For Unity projects, we strongly recommend using the Image Hands assets for your virtual hands. This asset combines the absolute realism of live video with the full interactivity and 3D behavior of our classic rigged hands, by projecting the raw images of your hands into a 3D mesh that can interact with other objects in real-world space. The part of the mesh that you can see is covered by the passthrough from the twin Leap Motion cameras. Each camera provides a separate view of your hands, so that what you see has actual depth.

This hybrid effect has a powerful impact in VR because your real hands can now actually pass through (or disappear behind) virtual objects. The hands interact properly with other 3D objects because they are 3D — complete with the interactive and visual capabilities that you expect. This approach also reduces jitter effects, since there’s no longer an artificially generated rigged hand to worry about. While the hand image in VR might shift or become transparent, it won’t suddenly shift in the wrong direction.

Hand Position

Just as accurate head tracking can place you inside a virtual world, hand tracking can reinforce the sense that you’ve actually traveled to another place. Conversely, when hands appear in the wrong spot, it can be very disorienting. To enhance immersion and help users rely on their proprioception to control movements and understand depth, it’s essential that virtual hand positions match their real-world counterparts as much as possible.

Body Position

By providing a bodily relationship to the elements in the scene, you can increase immersion and help ground your user. Only create an avatar for the user if the avatar and the user’s real body are likely to be in alignment. For example, if the user is sitting in a chair in a game, you can expect that they will do the same in real life. On the other hand, if the user is moving around a scene in a game, they are unlikely to be moving in real life.

People are usually comfortable with a lack of a body, due to experiences with disembodied observation (movies) and due to the minimal presence of one’s own body in one’s normal field of view (standing looking forward). However, adding a second body that moves independently is unfamiliar and at odds with the user’s proprioception.

Part 5. Space and Perspective

Adjust the scale of the hands to match your game environment.

Align the real-world and virtual cameras for the best user experience.

Use 3D cinematic tricks to create and reinforce a sense of depth.

Objects rendered closer than 75 cm (within reach) may cause discomfort to some users due to the disparity between monocular lens focus and binocular aim.

Use the appropriate frame of reference for virtual objects and UI elements.

Use parallax, lighting, texture, and other cues to communicate depth and space.

Controller Position and Rotation

To bring Leap Motion tracking into a VR experience, you’ll need to use (or create) a virtual controller within the scene that’s attached to your VR headset’s camera objects. In this section, we’ll use the Oculus Rift Unity plugin as an example, but this approach can be applied to other headsets as well.

By default, the two Oculus cameras are separated by 0.064 m, the average distance between human eyes. Previously, to center the controller, you would need to offset it by 0.032 along the x-axis. Since the Leap Motion Controller in the physical world is approximately 8 cm in front of your real eyes, you’ll also need to offset it by 0.08 along the z-axis.

However, Oculus 0.4.3.1B+ includes a “CenterEyeAnchor” which resides between LeftEyeAnchor and RightEyeAnchor. This means that you can attach the HandController object directly to the CenterEyeAnchor with no offset along the x axis. Assuming that HandController is a child of CenterEyeAnchor, there are two possible XYZ configurations you might want to use:

To match passthrough: 0.0, 0.0, 0.0 and scale according to the augmented reality case of the section below.

To match real-life: 0.0, 0.0, 0.08.
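The two offset schemes can be summarized in a small sketch. The function below is a hypothetical helper (not part of the Unity assets); the 0.064 m IPD and 0.08 m eye-to-device distance come from the text above:

```python
# Sketch: local position of the hand controller relative to its parent
# camera object, for the legacy and CenterEyeAnchor setups described above.

IPD_M = 0.064           # average human interpupillary distance, in meters
EYE_TO_DEVICE_M = 0.08  # the controller sits ~8 cm in front of the real eyes

def controller_offset(parent, match="real-life"):
    if parent == "LeftEyeAnchor":
        x = IPD_M / 2  # shift to the midpoint between the two eye cameras
    elif parent == "CenterEyeAnchor":
        x = 0.0        # already centered between the eyes
    else:
        raise ValueError("unknown parent: " + parent)
    z = EYE_TO_DEVICE_M if match == "real-life" else 0.0
    return (x, 0.0, z)

print(controller_offset("CenterEyeAnchor"))                 # match real life
print(controller_offset("CenterEyeAnchor", "passthrough"))  # match passthrough
```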

Note that objects in the passthrough video will also appear slightly closer because of the eyestalk effect, which is one of many reasons why we’re working on modules designed to be embedded in VR headsets. For more information on placing hand and finger positions into world space, see our guide to VR Essentials: What You Need to Build from Scratch.

Scale: Augmented vs. Virtual Reality

For VR experiences, we recommend a 1:1 scale to make virtual hands and objects look as realistic and natural as possible. With augmented reality, however, the scale needs to be adjusted to match the images from the controller to human eyes. Appropriate scaling can be accomplished by moving the cameras in the scene to their correct position, thereby increasing the scale of all virtual objects.

This issue occurs because the Leap Motion Controller’s sensors are 40 mm apart, while average human eyes are 64 mm apart. Future Leap Motion modules designed for virtual reality will have an interpupillary distance of 64 mm, eliminating the scaling problem.
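The required scale factor follows directly from those two baselines. A one-line sketch:

```python
# Sketch: the scale factor that matches the controller's stereo baseline to
# human eyes in the augmented reality case, from the distances given above.

SENSOR_BASELINE_MM = 40.0  # spacing between the controller's two sensors
HUMAN_IPD_MM = 64.0        # average human interpupillary distance

ar_scale = HUMAN_IPD_MM / SENSOR_BASELINE_MM
print(ar_scale)  # 1.6: the virtual cameras, and hence the world, scale up by 60%
```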

Positioning the Video Passthrough

When using the Image API, it’s important to remember that the video data doesn’t “occupy” the 3D scene, but represents a stream from outside. As a result, since the images represent the entire view from a certain vantage point, rather than a particular object, they should not undergo the world transformations that other 3D objects do. Instead, the image location must remain locked regardless of how your head is tilted, even as its contents change — following your head movements to mirror what you’d see in real life.

To implement video passthrough, skip the modelview matrix transform (by setting the modelview to the identity matrix) and use the projection transform only. Under this projection matrix, it’s sufficient to define a rectangle with the coordinates (-4, -4, -1), (4, -4, -1), (-4, 4, -1), and (4, 4, -1), which can be in any units. Then, texture it with the fragment shader provided in our Images documentation.

Depth Cues

Whether you’re using a standard monitor or VR headset, the depth of nearby objects can be difficult to judge. This is because, in the real world, your eyes dynamically assess the depth of nearby objects — flexing and changing their lenses, depending on how near or far the object is in space. With headsets like the Oculus Rift, the user’s eye lenses will remain focused at infinity.

When designing VR experiences, you can use 3D cinematic tricks to create and reinforce a sense of depth:

objects in the distance lose contrast

distant objects appear fuzzy and blue/gray (or transparent)

nearby objects appear sharp and full color/contrast

shadow from hand casts onto objects, especially drop-shadows

reflection on hand from objects

sound can create a sense of depth
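Several of these cues reduce to blending an object's appearance toward a "fog" as it recedes. A minimal sketch; the fog color and the 2 m / 30 m range are illustrative choices, not recommended values:

```python
# Sketch: distance-based fog as a depth cue, blending an object's RGB color
# toward a desaturated fog color as the object recedes.

def fog_blend(color, fog_color, distance_m, near=2.0, far=30.0):
    """Linearly blend color toward fog_color between the near and far distances."""
    t = max(0.0, min(1.0, (distance_m - near) / (far - near)))
    return tuple(c + (f - c) * t for c, f in zip(color, fog_color))

print(fog_blend((1.0, 0.0, 0.0), (0.6, 0.6, 0.7), 2.0))   # nearby: full color
print(fog_blend((1.0, 0.0, 0.0), (0.6, 0.6, 0.7), 30.0))  # distant: fully fogged
```

The same blend factor can also drive a loss of contrast or sharpness for distant objects.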

Rendering Distance

The distance at which objects can be rendered will depend on the optics of the VR headset being used. (For instance, Oculus recommends a minimum range of 75 cm for the Rift DK2.) Since this is beyond the optimal Leap Motion tracking range, you’ll want to make interactive objects appear within reach, or respond to reach within the optimal tracking range of the Leap Motion device.

Virtual Safety Goggles

As human beings, we’ve evolved very strong fear responses to protect ourselves from objects flying at our eyes. Along with rendering interactive objects no closer than the minimum recommended distance, you may need to impose additional measures to ensure that objects never get too close to the viewer’s eyes. The effect is to create a shield that pushes all moveable objects away from the user.
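The "shield" can be implemented as a minimum-radius clamp around the viewer's head. A sketch, using the 0.75 cm... rather, the 0.75 m minimum from the rendering-distance recommendation above as the safety radius:

```python
import math

# Sketch: "virtual safety goggles" that push any movable object back out to
# a minimum distance from the viewer's head position.

MIN_DISTANCE_M = 0.75  # safety radius, from the DK2 rendering recommendation

def enforce_safety_radius(head_pos, obj_pos):
    """Return obj_pos, pushed out along its offset from head_pos if too close."""
    offset = [o - h for o, h in zip(obj_pos, head_pos)]
    dist = math.sqrt(sum(c * c for c in offset))
    if dist >= MIN_DISTANCE_M or dist == 0.0:
        return obj_pos  # already safe (or degenerate: leave unchanged)
    scale = MIN_DISTANCE_M / dist
    return tuple(h + c * scale for h, c in zip(head_pos, offset))

# An object drifting to 0.5 m from the head is pushed back out to 0.75 m.
print(enforce_safety_radius((0, 0, 0), (0, 0, 0.5)))
```

Applying the clamp every frame, ideally with a smoothed push rather than a hard snap, keeps fast-moving objects from ever reaching the eyes.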

Multiple Frames Of Reference

Within any game engine, interactive elements are typically stationary with respect to some frame of reference. Depending on the context, you have many different options:

World frame of reference: object is stationary in the world.

User body frame of reference: object moves with the user, but does not follow head or hands.

Head frame of reference: object maintains position in the user’s field of view.

Hand frame of reference: object is held in the user’s hand.

Be sure to keep these frames of reference in mind when planning how the scene, different interactive elements, and the user’s hands will all relate to each other.

Parallax, Lighting, and Texture

Real-world perceptual cues are always useful in helping the user orient and navigate their environment. Lighting, texture, parallax (the way objects appear to move in relation to each other when the user moves), and other visual features are crucial in conveying depth and space to the user.