Session 604, WWDC 2016

Metal is the powerful low-overhead graphics and compute technology designed to unlock the power of the GPU. Check out the latest additions to the Metal frameworks and get details about supporting tessellation in your apps and games. Discover how to take control over synchronization and learn how to use resource heaps for even more efficient memory usage. See what's new in Metal debugging and profiling tools and gain insight into analyzing and optimizing performance.

[ Music ]

Morning, everyone.

Thanks.

[ Applause ]

My name is Aaftab Munshi.

And my colleagues and I are really excited to share with you the new features in Metal in macOS Sierra and iOS 10.

But let's begin by highlighting the sessions we have on Metal this year at WWDC.

So yesterday we had two sessions that talked about adopting Metal in your application.

And today we have three sessions.

So there is this session and the next one, which cover the new features in Metal, followed by another session where we'll talk about optimizing your Metal shaders.

All right.

So let's look at the features we're going to talk about.

So in the second session, the features we will be talking about are function (or shader) specialization; being able to write to resources such as buffers and textures from your fragment and vertex shaders; wide color, that is, using wide color displays in your application, and texture assets; and some new additions we've made to Metal Performance Shaders, specifically for using convolutional neural networks on the GPU with Metal.

In this session we're going to talk about some of the improvements we have made to the tools, which we think you guys are going to really love.

We've also added resource heaps, which make resource allocation much faster and give you more control.

So we'll talk about resource heaps and memoryless render targets.

And I'm going to be talking about tessellation.

So let's begin.

All right.

So the first thing: let's spend a little bit of time trying to understand why we need tessellation.

So we are seeing applications such as games rendering more and more realistic visual content.

So what that means is that in order to render such content, we need to be able to send a large amount of detailed geometry to the GPU.

That's where we're going to send this input.

That means lots and lots of triangles that have to be processed, which means a large increase in memory bandwidth.

It would be really nice if instead we could just describe this geometry that we want to send to the GPU as a lower resolution model, call it a coarse mesh, and then have the GPU generate the high-resolution model.

So in fact, that's what tessellation does.

Tessellation is a technique that you can use to amplify and refine the details of your geometric object.

We have two important requirements we need to meet.

The first is that the high-resolution model, the triangles that are generated, do not get stored in graphics memory.

We don't want to pay that bandwidth cost.

And the second is that the method that's used needs to be programmable.

So let's look at an example.

So here is a screenshot from GFXBench 4.0, which is a benchmark released by [inaudible].

And one of the key features it focuses on is tessellation.

So here's a screenshot of the car that's being rendered without tessellation.

You can see those rims.

They're very polygonal.

You wouldn't drive a car like that, would you?

Even the body panels have cracks in them.

And the reason for that is this is the actual geometry that's being sent.

So you can see not a lot of triangles, which is great; it's exactly what we want.

What tessellation does is take that input geometry and produce something like that.

I think this is really cool.

So if you look at the wire frame, you can see the GPU's actually generating, and we're now rendering, lots and lots of triangles, okay?

And that's the power of tessellation.

All right.

So let's look at how tessellation works in Metal.

So just like we did with Metal, you know, we wanted to take a clean sheet approach, right?

Even though there are existing APIs that support tessellation, which you may be familiar with, we wanted to design something that was really simple to grasp, you know, easy to use, and we did not want to leave any performance on the table.

And we think we have achieved that, and I hope you agree after this presentation.

So tessellation is available in macOS Sierra and on iOS with the A9 processor.

All right.

So the things I'm going to talk about are: well, what does the Metal graphics pipeline look like for tessellation?

How do I render my geometry with tessellation?

And then how do I adopt it in my application?

So let's begin.

So today when you send primitives to the GPU with Metal, you're sending triangles, lines, or points.

With tessellation, you're sending what we call a patch.

And put simply, a patch is just a parametric surface that is made up of spline curves.

What does that mean?

You may have heard of things like Bezier patches or B-spline patches.

So you describe a patch by a set of control-points.

So what you see in this figure is a B-spline patch.

So you have 16 control-points or control vertices.

And what tessellation does, put simply, is allow you to control, okay, how many triangles do I use to render this patch?

So you may decide, "You know what?

I don't really want a lot of triangles.

I don't care how it looks."

So you may decide just four triangles is more than enough and you'll get a polygonal look.

Or you decide, "Hey, I really want this looking nice and smooth."

That would take a lot more triangles.

But you have that control.

So let's start.

So the first stage in the graphics pipeline when we're doing tessellation is what we call a tessellation kernel.

And what it does is take the patch we talked about, the patch with the control-points, as input and decide, okay, how much do I need to subdivide this?

How many triangles do I want the GPU to generate, right?

This information is captured in what we call tessellation factors.

And I'll talk a little bit about what these factors are a few slides later.

And you can also generate additional patch data if you need it in a later stage.

The key thing is that this is a programmable stage, which means you're writing code.

So once you've written [inaudible] tessellation factors, the next stage is called the tessellator.

So this is a fixed function stage.

So no code to write.

But you do get knobs to configure it, okay?

So it takes those tessellation factors and breaks the patch up into triangles.

And the key thing the tessellator does here is that it does not store the triangle list it generates in graphics memory.

In addition to the triangle list it has generated, for each vertex in the triangle list it will generate what we call a parametric coordinate, the U and the V value.

And it uses this along with the control-points to compute the actual position on the surface.

Okay? All right.
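To make the step from a parametric (U, V) coordinate plus control-points to a surface position concrete, here is a minimal CPU-side sketch in Python. It assumes a bicubic Bezier patch with 16 control-points (the talk's figure shows a B-spline patch, which uses a different basis), and the function names and the flat example patch are made up for illustration. In Metal this kind of evaluation would live in your own shader code, not in Python.

```python
# Sketch: evaluate a bicubic Bezier patch at a parametric (u, v) coordinate.
# Assumption: Bezier basis and a 4x4 control-point grid; names are hypothetical.

def bernstein3(t):
    """Return the four cubic Bernstein basis weights at parameter t."""
    s = 1.0 - t
    return (s * s * s, 3.0 * s * s * t, 3.0 * s * t * t, t * t * t)

def evaluate_patch(control_points, u, v):
    """control_points is a 4x4 grid (rows run along v) of (x, y, z) tuples."""
    bu = bernstein3(u)
    bv = bernstein3(v)
    x = y = z = 0.0
    for row in range(4):
        for col in range(4):
            weight = bv[row] * bu[col]
            px, py, pz = control_points[row][col]
            x += weight * px
            y += weight * py
            z += weight * pz
    return (x, y, z)

# Hypothetical flat patch: control points on a regular grid in the z = 0 plane.
flat_patch = [[(col / 3.0, row / 3.0, 0.0) for col in range(4)]
              for row in range(4)]
```

Calling `evaluate_patch(flat_patch, u, v)` once per generated vertex mirrors the idea that only the 16 control-points need to be stored, while the dense set of positions is computed on the fly.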

So the tessellator generates triangles.

Today in Metal when you want to render primitives, you send triangles to the GPU.

The first thing that happens is that a vertex shader is executed, right?

Well, here the tessellator's generating triangles.

So if you think logically, the next stage would be a vertex shader, and it is.

We just call it the post-tessellation vertex shader because it's operating on the triangles that are generated by the tessellator.

And so it's going to execute for the vertices of the triangles that the tessellator generated and it's going to output transformed positions.

So if you're familiar with DirectX, this shader plays a similar role to the one the domain shader plays in DirectX.

All right.

And then the rest of the pipeline remains the same.

We have the rasterizer and the fragment shader, right?

So you may ask, "Well, so I need to write this compute kernel to generate the tessellation factors.

Well, can I use the vertex or fragment shader?"

Of course you can.

In fact, you don't even need to write a shader to generate these factors; you may have precomputed them and you can just load them in a buffer and pass that to the tessellator.

So you have a lot of control.

But if you are generating these factors on the GPU, we recommend that you use a compute kernel.

Because guess what?

That allows us to run that kernel asynchronously with other draw commands.

So netting you a performance win, and I think you guys will like that.

Well, actually let's take it a step further.

You don't even need to run this kernel every frame.

Because guess what?

If you have computed the tessellation factors, let's say you decide, "Hey, objects close to the camera get much more tessellation, objects further away not as much."

So once I've computed them, then depending on how the object is moving, I can just apply a scale and the tessellator takes that.
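The idea of computing factors once and then only applying a scale can be sketched on the CPU as follows. This is an illustrative Python model, not Metal code: the linear distance falloff, the `near` and `far` distances, and the maximum factor of 16 are all placeholder assumptions, and both function names are hypothetical.

```python
# Sketch: distance-based tessellation factor plus a per-draw scale.
# Assumption: linear falloff between `near` and `far`; all numbers are placeholders.

def tessellation_factor(distance, near=1.0, far=100.0, max_factor=16.0):
    """Map camera distance to a factor: max_factor at `near`, 1.0 at `far`."""
    t = (distance - near) / (far - near)
    t = min(max(t, 0.0), 1.0)          # clamp to the [near, far] range
    return max_factor + t * (1.0 - max_factor)

def scaled_factor(factor, scale, max_factor=16.0):
    """Apply a scale to a precomputed factor, clamped to [1.0, max_factor]."""
    return min(max(factor * scale, 1.0), max_factor)
```

With this split, the expensive part (`tessellation_factor`) can run occasionally, while the cheap `scaled_factor` adjustment is applied as the object moves.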

So really, the pipeline is really, really simple.

We have four stages.

So let's compare it with the graphics pipeline without tessellation.

So without tessellation we have three stages: we have the vertex shader, the rasterizer, and the fragment stage.

With tessellation we added a new stage, the tessellator.

It's fixed function so you don't have to write any shader.

And the vertex shader became the post-tessellation vertex shader.

We think this is really simple to understand.

I hope you agree.

All right.

So how do I render my geometry with tessellation?

There are four things I'm going to talk about.

Okay. Let's look at this post-tessellation, or post-tess, vertex shader; how is this different from the regular vertex shader?

How do I pass my patch inputs?

And I told you that the tessellator's configurable.

So let's look at how we configure it and then draw patches.

So, well, meet the new shader, same as the old shader.

So in fact, you declare a post-tessellation vertex shader with a vertex qualifier.

But in addition to that, you also specify this attribute which says, "Hey, it's working on a patch."

There are two kinds of patches: a quad patch and a triangle patch.

And you see the number next to that?

That number tells you how many control-points this patch is working on.

So if you had a regular vertex shader, you would have passed a vertex ID as input.

Now you pass a patchID as input.

Remember I told you the tessellator generated a parametric UV coordinate?

Well, that's what this position in patch input is.

And then if you had a regular vertex shader, you would have passed something as stage in; the patch input is passed as stage in as well.

And that's actually going to be exactly identical because the next stage with or without tessellation is a rasterizer.

All right.

So let's look at patch inputs.

So if you had a regular vertex shader, you would have described your vertex input as a struct, okay, in your shader.

And if you had decoupled the data type, meaning the layout in the buffers where the vertex inputs are coming from does not match the declaration in the shader, then you would have used the MTLVertexDescriptor to describe the layout.

Well, for patches there are two inputs.

One is the per-patch input.

And remember, I told you there are one or more control-points?

So we need to specify those as inputs as well.

But how you specify these looks identical.

So you use a MTLVertexDescriptor to specify the layout of the patch input data in memory.

And as I showed you on the slide before, we declare that input as stage in as well.

And you use the attribute index to match an element of the input in the shader with the corresponding declaration in your MTLVertexDescriptor.

Since there can be more than one control-point, we basically have to declare it using a template type.

And I'll talk about that in the next slide.

So let's look at an example.

So here I have my control-point data.

It has two elements.

So I'm using attributes zero and one.

And my per-patch data, which is attributes two and three.

So we combine these two things together and this is my patch input for every patch.

So notice that templated type, patch_control_point.

So that's what tells the Metal shading compiler, "Hey, this is referring to control-point input."

Okay? And remember I told you about this number 16, or whatever the number is?

That also tells the Metal shading compiler how many control-points there are.

So now we have all information we need to get the patch input.

And so we just pass that as stage in.

It's pretty simple, I think.

All right.

So okay, how do I configure the knobs?

So there are properties in the MTLRenderPipelineDescriptor you can set.

A few examples: you can tell the tessellator the method you want to use to generate the triangles; it's called the partitioning mode.

You can also specify a max tessellation level.

And we think this is really, really useful because it allows you to control the maximum amount of geometry that the GPU will generate for your tessellated objects.

Remember, the tessellator needs to read these factors.

So you need to specify the buffer they come from.

So use the setTessellationFactorBuffer API to do that.

Now, these factors tell the tessellator how much to subdivide the patches along the edges and on the inside.

So we have two kinds of patches.

If it's a triangular patch, there are three edges and one inside.

If it's a quad, then you have four edges and two insides.

So you specify these as half precision floating point values that you pass in.
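As a rough illustration of what such a factor buffer might contain, the Python sketch below packs per-patch factors as half precision values using the standard `struct` module (format code `e`). The ordering, edge factors first and then inside factors, and the little-endian byte order are assumptions made for this sketch, and the function names are hypothetical; check the Metal documentation for the exact layout your app must use.

```python
import struct

# Sketch: pack per-patch tessellation factors as half precision floats.
# Assumption: edge factors come first, then inside factors, little-endian.

def pack_triangle_factors(edges, inside):
    """Three edge factors plus one inside factor: 4 halfs = 8 bytes."""
    if len(edges) != 3:
        raise ValueError("a triangle patch has exactly three edge factors")
    return struct.pack("<4e", *edges, inside)

def pack_quad_factors(edges, inside):
    """Four edge factors plus two inside factors: 6 halfs = 12 bytes."""
    if len(edges) != 4 or len(inside) != 2:
        raise ValueError("a quad patch has four edge and two inside factors")
    return struct.pack("<6e", *edges, *inside)
```

A buffer for many patches would simply be these records concatenated, one per patch, which is also what a precomputed factor buffer loaded from disk would look like under the same assumptions.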

And then drawing.

So today when you're drawing primitives, you're sending triangles to be rendered by the GPU, you're either going to call drawPrimitives or drawIndexedPrimitives.

You specify the start vertex, number of vertices.

And if your vertex indexes are not contiguous, you will pass an index buffer.

Well, to draw patches, you call drawPatches or drawIndexedPatches.

You specify the start patch, the number of patches.

And if your control-point indexes are not contiguous, you specify an index buffer.

So it's just a one-to-one mapping.
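The parallel between indexed and non-indexed patch draws can be pictured with a small Python model. This is a simplified mental model only, not a description of Metal's exact addressing rules: the function name and parameters are hypothetical, and it assumes each patch reads a fixed number of consecutive control-point slots, optionally redirected through an index buffer.

```python
# Sketch: which control points each patch reads, with and without indices.
# Assumption: fixed control-point count per patch; names are hypothetical.

def control_points_for_patches(patch_start, patch_count, points_per_patch,
                               index_buffer=None):
    """Return, per patch, the list of control-point indices it would read."""
    patches = []
    for patch in range(patch_start, patch_start + patch_count):
        first = patch * points_per_patch
        slots = range(first, first + points_per_patch)
        if index_buffer is None:
            # Contiguous case: slots map directly to control points.
            patches.append(list(slots))
        else:
            # Indexed case: each slot is looked up in the index buffer,
            # so neighbouring patches can share control points.
            patches.append([index_buffer[s] for s in slots])
    return patches
```

In the indexed case two adjacent triangle patches can reuse the control points on their shared edge, which is the same reason index buffers are used for ordinary triangle meshes.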

And then there are the DrawIndirect variants.

And what these are is that you do not specify the start patch, how many patches, and other information when you make the draw call, but instead you pass a buffer.

And that gets filled out with this information by a command that's running on the GPU, just like you would do for drawPrimitives as well.

So really, if you know how to use drawPrimitives, then drawPatches works very similarly.

Okay? So we think this is really easy to use.

All right?

So hold on.

So I've shown you what Metal tessellation is and how to use it.

Many of you may be familiar with, or already using, tessellation in your application with DirectX or OpenGL, and you will notice Metal tessellation's a little different.

If I turn wire frame mode on, you can see we're generating a lot more triangles and they are really, really small.

In fact, let's actually animate the displacement map so you can see the shapes changing and let's zoom in to see detail.

You can see self-shadowing happening.

And the reason self-shadowing is happening here is because we're actually changing the geometry, unlike a technique many of you may be familiar with called bump mapping, which just creates an illusion of realism.

So this is another technique which you can use with tessellation to create incredible detail in the content your application is rendering.

The flexibility of resource heaps saves you memory by allowing multiple resources to alias in memory.

And finally, the efficiency and flexibility of resource heaps is made possible by you taking control over tracking resource dependencies with explicit command synchronization.

Now, let's dive into each one of these features starting with resource sub-allocation.

Before talking about the details of sub-allocation, let's first discuss why device-based resource creation is expensive.

Creating an individual resource with a Metal device involves multiple steps: allocating the memory; preparing the memory for the GPU; clearing the memory for security; and then, finally, creating the Metal object.

Each one of these steps takes time, and a majority of the time is spent in memory operations.

But there are situations when you need to create resources on your performance-critical path without introducing performance hitches.

Texture streaming is one example, or perhaps you have an image processing app that needs to generate a number of temporary textures to execute a filter.

The cost of binding resources to command encoders can also become a performance issue.

Metal must track each unique resource bound to a command encoder to make sure that the GPU can access the memory.

And for complex scenes, this cost can add up as well.

Resource sub-allocation addresses both of these performance issues.

Remember that the expensive part of resource creation is in the memory operations.

With resource heaps you can perform the memory operations ahead of time, outside of your game loop.

Resource heaps address the binding cost by allowing you to sub-allocate many logical resources from a single heap.

By sub-allocating multiple resources from one heap, Metal tracks one memory allocation instead of one per individual resource.
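The sub-allocation idea can be modeled with a toy allocator in Python. This is a conceptual sketch, not how Metal implements heaps: the `HeapModel` class, the simple bump-pointer strategy, and the 256-byte alignment are all assumptions chosen for illustration. The point it shows is that the one expensive allocation happens up front, and creating a resource afterwards is only offset bookkeeping.

```python
# Sketch: a toy heap that is allocated once and then sub-allocated by offset.
# Assumption: bump-pointer allocation with a made-up 256-byte alignment.

class HeapModel:
    def __init__(self, size):
        # The single up-front memory allocation (done outside the game loop).
        self.size = size
        self.used = 0
        self.resources = {}

    def sub_allocate(self, name, size, alignment=256):
        """Reserve `size` bytes; return the offset, or None if the heap is full."""
        offset = (self.used + alignment - 1) // alignment * alignment
        if offset + size > self.size:
            return None
        self.resources[name] = (offset, size)
        self.used = offset + size
        return offset
```

Because every entry in `resources` lives inside the same block, a system tracking memory residency only has one allocation to keep track of, however many logical resources were carved out of it.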

But the A7 and later GPUs perform depth testing completely in GPU tile storage, one tile at a time.

Depth testing does not need to use system memory.

So if you don't store the depth texture for use in later passes, make the texture memoryless and save the memory.

Let me show you another opportunity.

When executing multisample rendering, again, the A7 and later GPUs perform all the rendering in GPU tile storage.

The MSAA color attachment texture is only used if you choose to store the sample data for later use.

But most apps will choose the multisample resolve store action, which resolves directly from the GPU tile storage to the resolve color attachment texture.

So in that case make the multisample color attachment texture memoryless, and this is a massive memory savings.

As you can see, the savings from adopting this feature are substantial.

By making a 1080p depth texture memoryless, your app will save almost 8 megabytes.

If you are rendering to the native resolution of a 12.9-inch iPad Pro, the savings for the depth buffer are over 20 megabytes.

And the savings for making a four times multisample render target memoryless are even larger, four times larger.
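The quoted savings can be checked with simple arithmetic. The Python sketch below assumes 4 bytes per pixel (for example a 32-bit depth format), which the talk does not state explicitly, and uses 2732 x 2048 as the native resolution of the 12.9-inch iPad Pro; the function name is hypothetical.

```python
# Sketch: estimate the memory a render target would occupy if it were stored.
# Assumption: 4 bytes per pixel (e.g. a 32-bit depth format).

def render_target_bytes(width, height, bytes_per_pixel=4, samples=1):
    """Size in bytes of a width x height target with `samples` per pixel."""
    return width * height * bytes_per_pixel * samples

MIB = 1024 * 1024

depth_1080p = render_target_bytes(1920, 1080)             # about 7.9 MiB
depth_ipad_pro = render_target_bytes(2732, 2048)          # about 21.3 MiB
msaa4_1080p = render_target_bytes(1920, 1080, samples=4)  # four times larger
```

Under these assumptions the numbers line up with the talk: just under 8 megabytes at 1080p, over 20 megabytes at the iPad Pro's native resolution, and four times as much for a 4x multisample target.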

So use memoryless render targets to make the most of your application's memory budget.

Use the savings to lower the memory footprint of your game.

Or better yet, use the savings to add more beautiful and unique content to your game.

Okay. I'd like to invite Jose up to tell you all about the improvements to the Metal Tools.

[ Applause ]

Thank you, James.

So besides the great additions to the Metal API, we made some great improvements to the Metal Developer Tools that I want to show you.

First we'll talk about what's new in Metal System Trace.

Then we'll introduce a new feature called GPU Override.

And we have some very exciting new features coming to GPU Frame Debugger.

So what is Metal System Trace?

In the [inaudible] Metal session we presented this graph showing you Metal working in parallel on the CPU and GPU.

Metal System Trace is a set of instruments for visualizing just that, helping you understand the timeline of your Metal applications through the whole graphics pipeline, from the CPU to the GPU, and then on to the display.

Last year at WWDC we introduced Metal System Trace for the iOS platform.

I highly recommend checking out last year's presentation for a great overview of Metal System Trace.

Later in the fall we added support for tvOS.

And today we're happy to announce Metal System Trace for macOS to help you squeeze out the last bit of performance on all Metal platforms.

[ Applause ]

We improved Metal System Trace across the board, extending the events that we report.

Like in this case where we can see painting in macOS, which is causing a delay in GPU execution.

Metal System Trace also displays debug groups, which make it easier for you to understand command encoder relations in your trace.

On macOS we support tracing multiple GPUs at the same time, which is invaluable for those use cases where you're distributing work across different GPUs.

And on iOS we now display scaler workloads so that you can diagnose when you're introducing latency by rotating or scaling your views.

You can now use a wider range of instruments alongside Metal System Trace, such as Time Profiler, File Activity, Allocations, and many more.

Even different views such as CPU data, which will show you CPU core time slices.

These will help you to put Metal events into context, deepening the understanding of how the system is running your application and allowing you to diagnose things such as GPU starvation caused by a CPU stall due to a [inaudible] operation.

Metal System Trace captures a wealth of data.

So we made it easier for you to interpret and navigate.

With the new workload highlighting, you can focus on any command encoder or command buffer as it works through the pipeline.

And with support for keyboard navigation, you can quickly move your selection through your trace.

Finally, I want to introduce Performance Observation.

And what Performance Observation does is present you with a comprehensive list of the potential issues we found in your trace by analyzing it.

From display surfaces taking too long, to unexpected shader compilations, or high GPU execution times, Performance Observation finds for you the events you are looking for, and you can navigate straight to them from the Performance Observation list.

All these new additions will allow you to tune your Metal applications to run as smoothly as you want them to.

And now for a demonstration of our awesome GPU debugging improvements, let me hand over to my colleague, Alp.

[ Applause ]

Thanks, Jose.

I have a number of great features to show you today.

So let's dive right in.

I have my app running here, cruising over beautiful terrain tessellated to the finest details.

Wouldn't it be great to see this terrain in wire frame to see the triangles individually?

The good news is our newest feature, GPU Overrides, gives you the ability to modify your Metal rendering right from the debug bar while your app is running.

We have a number of different overrides you can mix and match, including wire frame mode.

It recognizes the word color and visualizes the real color of the value right in there.

Since this is a large buffer that contains different types of data, I have added some debug markers with the new [inaudible] API, which makes it extra easy to find what you are looking for.

With the layout menu, you can jump straight to any other available layout you would like to inspect.

Looking at individual buffers is great.

What is even better is the new input attribute view, which lets you see all your vertex data as your vertex shader sees it.

Input attributes collects all the data from your instances, tessellation factor buffers, and your stage in data, then provides you a single view to look at all of it together.

In this case we are rendering instances with multiple patches and I can see what data belongs to which patch of an instance.

So that was a quick look at some of our newest GPU Frame Debugger features.

Let's switch back to slides and wrap up.

[ Applause ]

So you've just seen some of our newest GPU Frame Debugger features.

I would like to tell you about two more.

With the new Extended Validation mode the GPU Frame Debugger can perform even deeper analysis of your application, providing recommendations [inaudible] the optimal texture usage or storage mode for your resources.

You can enable this mode from the Xcode scheme editor.

And the new support for stand-alone Metal Library Projects lets you create Metal libraries to be shared in multiple apps, or include several of them in a single app, just like any other framework or library.

So we talked about features that will greatly improve your tools experience.

Now let's summarize what we have seen so far in this session.

We have seen the great additions to the Metal API with tessellation, resource heaps, and memoryless render targets, then we showed you improved tools, Metal System Trace and GPU Frame Debugger.

Be sure to stick around for part two this afternoon where I will talk about function specialization and function resource read-writes, wide color and texture assets, and additions to Metal Performance Shaders.

For more information about this session, please check the link online.

You can catch the video and get links to documentation and sample code.

We had great sessions yesterday, which are available online.

And this afternoon we have What's New in Metal, Part 2, then Advanced Metal Shader Optimization in this room.

Thanks for coming, and have a great WWDC.

[ Applause ]
