Cubic Environment Mapping in Metal: Reflection and Refraction

In this post, we’ll talk about some of the more advanced features of texturing in Metal. We’ll apply a cube map to a skybox to simulate a detailed environment surrounding the scene. We’ll also introduce a technique called cubic environment mapping to simulate reflection and refraction, to further enhance the realism of our virtual world.

Setting the Scene

The scene for the sample app consists of a skybox surrounding a torus knot.

Instead of loading models from disk to construct the sample scene, we will be generating the necessary geometry procedurally.

The Vertex Format

Although the objects in our scene will be textured, we do not need to explicitly store texture coordinates in our vertices. Instead, we will generate texture coordinates for the skybox and torus knot in specially-crafted vertex shaders. Therefore, each vertex has only two properties: position and normal:

The Mesh Class

As we’ve done in the past, we use a mesh object to wrap up a vertex buffer and an index buffer so that we can conveniently issue indexed draw calls. The base class, MBEMesh, provides an abstract interface to these two buffers.

The Skybox Mesh

Generating the skybox mesh is straightforward. The skybox is a unit cube about the origin, so the skybox class uses a static list of positions, normals, and indices to build its buffers. Note that the indices are ordered to produce cube faces that point inward, since the virtual camera will be inside the skybox at all times.

A cubic environment map applied to a skybox can produce the illusion of a detailed backdrop. Textures courtesy of Emil Persson (http://humus.name)

The Torus Knot Mesh

The central object in the scene is a torus knot. You are probably familiar with the torus, a geometric figure that consists of a circle swept perpendicularly around another circle.

A torus knot introduces an additional complication by twisting the path taken by the orbiting circle, creating a figure that is more visually interesting:

This trefoil knot is one kind of knot that can be produced by the torus knot mesh class. Textures courtesy of Emil Persson (http://humus.name)

The MBETorusKnotMesh class generates vertices as it sweeps along the path of a torus knot, then generates indices that knit together the vertices into a solid mesh. You can produce a large variety of interesting knots by varying the parameters passed to the mesh’s initializer method. View some other possible shapes here.

Now that we have some geometry to render, let’s talk about a new kind of texture in Metal.

Cube Textures

Cube textures (also called “cube maps”) are a special type of texture. In contrast to regular 2D textures, cube textures are collections of six textures that have a spatial relationship to one another. In particular, each image corresponds to a direction along one of the axes of the 3D coordinate system.

In the figure below, the six textures comprising a cube map are laid out to show the cube from the inside. The -Y face is in the middle of the cross.

The Cube Map Coordinate Space

Cube maps in Metal are left-handed, meaning that if you imagine yourself inside the cube, with the +X face to your right, and the +Y face above you, the +Z axis will be the one directly ahead. If instead you are positioned outside the cube, the coordinate system appears to be right-handed, and all of the images are mirrored.

Creating and Loading a Cube Map Texture

Creating a cube texture and loading it with image data is similar to creating and loading a 2D texture, with a couple of exceptions.

The first difference is the texture type. MTLTextureDescriptor has a convenience method for creating a texture descriptor describing a cube texture. The textureType property of the descriptor is set to MTLTextureTypeCube, and the height and width properties are set equal to the size parameter. The width and height of all constituent textures of the cube texture must be equal.

The other difference between 2D textures and cube textures is how image data is loaded. Because a cube texture actually contains six distinct images, we have to specify a slice parameter when setting the pixel data for each face. Each slice corresponds to one cube face, with the following mapping:

Face number

Cube texture face

0

Positive X

1

Negative X

2

Positive Y

3

Negative Y

4

Positive Z

5

Negative Z

To load the data into every cube face, we iterate over the slices, replacing the slice’s entire region with the appropriate image data:

imageData must be in whatever pixel format was specified in the texture descriptor. bytesPerRow is the width of the cube multiplied by the number of bytes per pixel. bytesPerImage, in turn, is the number of bytes per row multiplied by the height of the cube. Since the width and height are equal, you can substitute the cube side length in these calculations.

Sampling a Cube Map

Cube map texture coordinates work somewhat differently than 2D texture coordinates. The first difference is that three coordinates are required to sample a cube map, rather than two. The three coordinates are treated as a ray originating at the center of the cube, intersecting the face at a particular point on the cube. In this way, cube texture coordinates represent a direction rather than a particular point.

Some interesting edge cases can arise when sampling in this way. For example, the three vertices of a triangle might span the corner of the cube map, sampling three different faces. Metal handles such cases automatically by correctly interpolating among the samples along the edges and corners of the cube.

Applications of Cube Maps

Now that we know how to create, load, and sample a cube map, let’s talk about some of the interesting effects we can achieve with cube maps. First, we need to texture our skybox.

Texturing the Skybox

Recall that we created our skybox mesh from a set of static data representing a unit cube about the origin. The positions of the corners of a unit cube correspond exactly to the corner texture coordinates of a cube map, so it is possible to reuse the cube’s vertex positions as its texture coordinates, ignoring the vertex normals.

Drawing the Skybox

When rendering a skybox, one very important consideration is its effect on the depth buffer. Since the skybox is intended to be a backdrop for the scene, all of the other objects in the scene must appear in front of it, lest the illusion be broken. For this reason, the skybox is always drawn before any other objects, and writing to the depth buffer is disabled. This means that any objects drawn after the skybox replace its pixels with their own.

Depth writing is disabled by setting the depthWriteEnabled property of the skybox’s MTLDepthStencilDescriptor to NO. We use separate MTLDepthStencilState objects when drawing the skybox and when drawing the torus knot.

The Skybox Vertex Shader

The vertex shader for the skybox is very straightforward. The position is projected according to the combined model-view-projection matrix, and the texture coordinates are set to the model space positions of the cube corners.

The Fragment Shader

We will use the same fragment shader for both the skybox and the torus knot figure. The fragment shader simply inverts the z-coordinate of the texture coordinates and then samples the texture cube in the appropriate direction:

We invert the z coordinate in this case because the coordinate system of the cube map is left-handed as viewed from the inside, but we prefer a right-handed coordinate system. This is simply the convention adopted by the sample app; you may prefer to use left-handed coordinates. Consistency and correctness are more important than convention.

The Physics of Reflection

Reflection occurs when light bounces off of a reflective surface. In this post, we will only consider perfect mirror reflection, the case where all light bounces off at an angle equal to its incident angle. To simulate reflection in a virtual scene, we run this process backward, tracing a ray from the virtual camera through the scene, reflecting it off a surface, and determining where it intersects our cube map.

Light is reflected at an angle equal to its angle of incidence

The Reflection Vertex Shader

To compute the cube map texture coordinates for a given vector, we need to find the vector that points from the virtual camera to the current vertex, as well as its normal. These vectors must both be relative to the world coordinate system. The uniforms we pass into the vertex shader include the model-to-world matrix as well as the normal matrix (which is the inverse transpose of the model-to-world matrix). This allows us to compute the necessary vectors for finding the reflected vector according to the formula given above.

Fortunately, Metal includes a built-in function that does the vector math for us. reflect takes two parameters: the incoming vector, and the surface normal vector. It returns the reflected vector. In order for the math to work correctly, both input vectors should be normalized in advance.

The output vertex texture coordinates are interpolated by the hardware and passed to the fragment shader we already saw. This creates the effect of per-pixel reflection, since every execution of the fragment shader uses a different set of texture coordinates to sample the cube map.

The Physics of Refraction

The second phenomenon we will model is refraction. Refraction occurs when light passes from one medium to another. One everyday example is how a straw appears to bend in a glass of water.

The environment map refracted through a torus knot

We won’t discuss the mathematics of refraction in detail here, but the essential characteristic of refraction is that the amount the light bends is proportional to the ratio between the indices of refraction of the two media it is passing between. The index of refraction of a substance is a unitless quantity that characterizes how much it slows down the propagation of light, relative to a vacuum. Air has an index very near 1, while the index of water is approximately 1.333 and the index of glass is approximately 1.5.

Refraction follows a law called Snell’s law, which states that the ratio of the sines of the incident and transmitted angles is equal to the inverse ratio of the indices of refraction of the media:

Light is bent when passing between substances with different indices of refraction

The Refraction Vertex Shader

The refraction vertex shader is identical to the reflection vertex shader, except that we call refract instead of reflect. refract takes a third parameter, which is the ratio between the two indices of refraction. kEtaRatio is a shader constant that represents the ratio between the indices of refraction of air and glass.

The refract function does the work of bending the vector about the normal to produce the transmitted vector, which once again is used as a set of texture coordinates to sample the cube map.

One obviously non-physical aspect of this simplified refraction model is that only the front-most surface of an object is visible. Additionally, the transport of light inside the object is ignored; instead it behaves like an infinitely-thin shell. Even with these shortcomings, the effect is impressive.

Using Core Motion to Orient the Scene

The app uses the orientation of the device to rotate the scene so that the torus always appears at the center, with the environment revolving around it. The Core Motion framework uses the various sensors in the iOS device (accelerometer, gyroscope, etc.) to determine where the device is pointed.

The Motion Manager

The CMMotionManager class is central to tracking device motion. After creating a motion manager instance, we should check if device motion is available. If it is, we may proceed to set an update interval and request that the motion manager start tracking motion. The following code performs these steps, requesting an update sixty times per second.

Reference Frames and Device Orientation

A reference frame is a set of axes with respect to which device orientation is specified.
Several different reference frames are available. Each of them identifies Z as the vertical axis, while the alignment of the X axis can be chosen according to the application’s need. We choose to align the X axis with magnetic north so that the orientation of the virtual scene aligns intuitively with the outside world and remains consistent between app runs.

The device orientation can be thought of as a rotation relative to the reference frame. The rotated basis has its axes aligned with the axes of the device, with +X pointing to the right of the device, +Y pointing to the top of the device, and +Z pointing out of the screen.

Reading the Current Orientation

Each frame, we want to get the updated device orientation, so we can align our virtual scene with the real world. We do this by asking for an instance of CMDeviceMotion from the motion manager.

CMDeviceMotion *motion = motionManager.deviceMotion;

The device motion object provides the device orientation in several equivalent forms: Euler angles, a rotation matrix, and a quaternion. Since we already use matrices to represent rotations in our Metal code, they are they best fit for incorporating Core Motion’s data into our app.

Everyone has his or her own opinion about which view coordinate space is most intuitive, but I always choose a right-handed system with +Y pointing up, +X pointing right, and +Z pointing toward the viewer. This differs from Core Motion’s conventions, so in order to use device orientation data, we interpret Core Motion’s axes in the following way: Core Motion’s X axis becomes our world’s Z axis, Z becomes Y, and Y becomes X. We don’t have to mirror any axes, because Core Motion’s reference frames are all right-handed.

The renderer incorporates the scene orientation matrix into the view matrices it constructs for the scene, thus keeping the scene in alignment with the physical world.

Conclusion

In the post we have investigated some of the possible uses of cube maps. Cube maps can be used for creating convincing backdrops to your virtual scene with skyboxes, and they also enable interesting effects such as reflection and refraction. Cube maps are a valuable addition to your Metal repertoire.

Hi. Could you help me to create something like prism. In your sample you make refraction or reflection only from visible part of torus, but I need to create reflection on both parts of torus(not only torus any other figures). As example when you have two planes(second behind first), it’ll show refraction only for first plane.

Unfortunately, what you want to do is very difficult with forward rendering. As I mentioned in the article, this implementation of reflection/refraction is a hack that only accounts for the front-most surface of the figure. In order to see through the figure, accounting for both how light propagates in the refracting medium and other parts of the figure behind the front face, you would need to use a more sophisticated rendering algorithm, such as raytracing.

Thanks for your reply. I have idea to render two same figures with opposite normals directions. First will refract ray and get data from previously refracted second figure. But i don’t know how to implement this, and is it possible?

To the best of my knowledge all of the source on the site compiles on Xcode 7.x including the 7.2 beta. As for porting to Swift, it’s just a matter of rote translation. The bulk of the code is just imperative calls to Metal API, so it’s hard to imagine it getting more “Swifty” in translation.

I’m trying to get pan gesture controls in a similar implementation to this. I am having an issue where the camera keeps flipping upside down when you pan up and down on the screen. Not exactly sure what is going wrong with it.