Execution model

The TCS execution model is different from most other shader stages; it is most similar to Compute Shaders. Unlike Geometry Shaders, where each invocation can output multiple primitives, each TCS invocation is, in principle, responsible for producing only a single vertex of the output patch.

For each patch provided during rendering, n TCS invocations will be executed, where n is the number of vertices in the output patch. So if a rendering command draws 20 patches, and each output patch has 4 vertices, there will be a total of 80 separate TCS invocations.
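
A minimal pass-through sketch of this model, assuming the application draws 4-vertex patches (for example after glPatchParameteri(GL_PATCH_VERTICES, 4)) and pairs this shader with a quad-domain evaluation shader; the tessellation level values are arbitrary placeholders. Because the output patch has 4 vertices, main() runs 4 times per patch, once per value of gl_InvocationID:

```glsl
#version 400 core

// 4 output vertices per patch => 4 TCS invocations per patch.
layout(vertices = 4) out;

void main()
{
    // Each invocation produces exactly one vertex of the output patch:
    // the one selected by its own gl_InvocationID (0, 1, 2 or 3).
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    // Tessellation levels are per-patch, so one invocation is enough to set them.
    // 4.0 is an arbitrary placeholder value.
    if (gl_InvocationID == 0)
    {
        gl_TessLevelOuter[0] = 4.0;
        gl_TessLevelOuter[1] = 4.0;
        gl_TessLevelOuter[2] = 4.0;
        gl_TessLevelOuter[3] = 4.0;
        gl_TessLevelInner[0] = 4.0;
        gl_TessLevelInner[1] = 4.0;
    }
}
```

Drawing 20 such patches with glDrawArrays(GL_PATCHES, 0, 80) would run this main() 80 times.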

The invocations that produce data for the same patch are interconnected: they all share their output values, and each can read output values that other invocations for the same patch have written. To do so safely, however, they must use a synchronization mechanism to ensure that all other invocations for the patch have executed at least that far.

Because of this, it is possible for TCS invocations to share data and communicate with one another.

Output patch size

The output patch size is the number of vertices in the output patch. It also determines the number of TCS invocations used to compute this patch data. The output patch size does not have to match the input patch size.

The number of vertices in the output patch is defined with an output layout qualifier:

layout(vertices = patch_size) out;

patch_size must be an integral constant expression greater than zero and no greater than the patch limit (see below).
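
Because the input and output patch sizes are independent, a TCS can even shrink a patch. As a sketch, assuming a hypothetical evaluation shader that needs only one control point per patch, this TCS reduces any input patch to a single output vertex placed at the average of the input positions:

```glsl
#version 400 core

// One output vertex per patch, regardless of the input patch size.
layout(vertices = 1) out;

void main()
{
    // With vertices = 1 there is one invocation per patch, so gl_InvocationID is 0.
    vec4 sum = vec4(0.0);
    for (int i = 0; i < gl_PatchVerticesIn; ++i)
        sum += gl_in[i].gl_Position;

    gl_out[gl_InvocationID].gl_Position = sum / float(gl_PatchVerticesIn);

    // Placeholder tessellation levels.
    gl_TessLevelOuter[0] = 1.0;
    gl_TessLevelOuter[1] = 1.0;
    gl_TessLevelOuter[2] = 1.0;
    gl_TessLevelOuter[3] = 1.0;
    gl_TessLevelInner[0] = 1.0;
    gl_TessLevelInner[1] = 1.0;
}
```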

Inputs

All inputs from the vertex shader to the TCS are aggregated into arrays, based on the size of the input patch. The effective size of these arrays is the number of vertices in the input patch. User-defined inputs can be declared as unsized arrays:

in vec2 texCoord[];

You should not attempt to index this array past the number of patch vertices.

There are a number of built-in input variables whose contents are generated by the system:

in int gl_PatchVerticesIn;
in int gl_PrimitiveID;
in int gl_InvocationID;

gl_PatchVerticesIn is the number of vertices in the input patch. gl_PrimitiveID is the index of the current patch within this rendering command. gl_InvocationID is the index of the TCS invocation within this patch; a TCS invocation usually writes to per-vertex output variables by using it as the index.
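
A sketch that puts these inputs together. It assumes the vertex shader writes a matching out vec2 texCoord, that a triangle-domain evaluation shader follows, and that the input patch may have fewer vertices than the 3-vertex output patch, so each invocation clamps its source index with gl_PatchVerticesIn instead of reading past the end of the input arrays. Using gl_PrimitiveID to vary the tessellation level is purely illustrative:

```glsl
#version 400 core

layout(vertices = 3) out;

// User-defined input from the vertex shader: one element per input-patch vertex.
in vec2 texCoord[];

// User-defined per-vertex output, implicitly sized to the output patch (3).
out vec2 vertexTexCoord[];

void main()
{
    // Never index the input arrays at or beyond gl_PatchVerticesIn.
    int src = min(gl_InvocationID, gl_PatchVerticesIn - 1);

    // Write only this invocation's own output vertex.
    vertexTexCoord[gl_InvocationID] = texCoord[src];
    gl_out[gl_InvocationID].gl_Position = gl_in[src].gl_Position;

    // gl_PrimitiveID is the patch index within the draw call; here it
    // cycles the tessellation level through 1..4 from patch to patch.
    if (gl_InvocationID == 0)
    {
        float level = float(1 + gl_PrimitiveID % 4);
        gl_TessLevelOuter[0] = level;
        gl_TessLevelOuter[1] = level;
        gl_TessLevelOuter[2] = level;
        gl_TessLevelInner[0] = level;
    }
}
```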

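The TCS also takes the built-in variables output by the vertex shader, through the built-in input array gl_in:

in gl_PerVertex
{
  vec4 gl_Position;
  float gl_PointSize;
  float gl_ClipDistance[];
} gl_in[gl_MaxPatchVertices];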

Outputs

Per-vertex outputs are aggregated into arrays. Therefore, a user-defined per-vertex output variable would be defined as such:

out vec2 vertexTexCoord[];

The length of the array (vertexTexCoord.length()) will always be the size of the output patch, so you don't need to restate it in the declaration.

A TCS invocation can only ever write to the element of a per-vertex output variable that corresponds to its own invocation, so writes to per-vertex outputs should be to vertexTexCoord[gl_InvocationID]. Any expression that writes to a per-vertex output without indexing it with exactly "gl_InvocationID" results in a compile-time error. This includes silly things like vertexTexCoord[gl_InvocationID - 1 + 1].
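
A sketch of the write rule, reusing the texCoord and vertexTexCoord example names and assuming 4-vertex input patches. The commented-out lines are writes that the rule rejects; uncommenting any of them should make the shader fail to compile or link:

```glsl
#version 400 core

layout(vertices = 4) out;

in vec2 texCoord[];
out vec2 vertexTexCoord[];

void main()
{
    // Valid: the vertex index is exactly the identifier gl_InvocationID.
    vertexTexCoord[gl_InvocationID] = texCoord[gl_InvocationID];
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    // Invalid: writes another invocation's element.
    // vertexTexCoord[0] = texCoord[0];

    // Invalid: equal in value to gl_InvocationID, but not the identifier itself.
    // vertexTexCoord[gl_InvocationID - 1 + 1] = texCoord[gl_InvocationID];

    // Invalid: a copy of gl_InvocationID in another variable is rejected too.
    // int idx = gl_InvocationID;
    // vertexTexCoord[idx] = texCoord[idx];

    // Placeholder tessellation levels.
    if (gl_InvocationID == 0)
    {
        gl_TessLevelOuter[0] = 1.0;
        gl_TessLevelOuter[1] = 1.0;
        gl_TessLevelOuter[2] = 1.0;
        gl_TessLevelOuter[3] = 1.0;
        gl_TessLevelInner[0] = 1.0;
        gl_TessLevelInner[1] = 1.0;
    }
}
```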

There is a built-in per-vertex output Interface Block for the traditional vertex values:

out gl_PerVertex
{
  vec4 gl_Position;
  float gl_PointSize;
  float gl_ClipDistance[];
} gl_out[];

You do not have to use this output interface block if you do not want to.

Patch variables

Per-patch output variables are not aggregated into arrays (unless you want them to be, in which case you must specify a size). All TCS invocations for this patch see the same patch variables. They are declared with the patch keyword; for example, a hypothetical per-patch output named data would be declared as:

patch out vec4 data;

If multiple TCS invocations write to the same patch output, they should write the same value. This is guaranteed so long as the math and logic they use to compute the values written to patch outputs do not use gl_InvocationID in any way.
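
A sketch of per-patch outputs, assuming triangle patches with non-degenerate positions; the names patchNormal and patchArea are hypothetical. Every invocation computes them from gl_in[0], gl_in[1] and gl_in[2] only, never from gl_InvocationID, so all invocations write identical values:

```glsl
#version 400 core

layout(vertices = 3) out;

// Per-patch outputs: declared with "patch" and not arrayed.
patch out vec3 patchNormal;
patch out float patchArea;

void main()
{
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    // Same inputs and same math in every invocation, so every write agrees.
    vec3 e1 = gl_in[1].gl_Position.xyz - gl_in[0].gl_Position.xyz;
    vec3 e2 = gl_in[2].gl_Position.xyz - gl_in[0].gl_Position.xyz;
    vec3 n = cross(e1, e2);
    patchNormal = normalize(n);
    patchArea = 0.5 * length(n);

    // The tessellation levels are built-in per-patch outputs; placeholder values.
    gl_TessLevelOuter[0] = 2.0;
    gl_TessLevelOuter[1] = 2.0;
    gl_TessLevelOuter[2] = 2.0;
    gl_TessLevelInner[0] = 2.0;
}
```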

Synchronization

TCS invocations that operate on the same patch can read each other's output variables, whether per-patch or per-vertex. To do so, they must first ensure that those invocations have actually written to those variables, since all output variables start out with undefined values.

Ensuring that invocations have written to a variable requires synchronization between invocations. This is done via the barrier() function. When executed, it will not complete until all other TCS invocations for this patch have reached that barrier, so every output write issued before the barrier has occurred by this point. However, once past the barrier, other invocations may write to those variables again. If you read a variable and it will be written again afterwards, issue another barrier() between the reads and the subsequent writes. If there are no subsequent writes to those variables, a single barrier is enough.
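
A sketch of the write, barrier(), read pattern. Each invocation publishes one value through a per-vertex output, all invocations wait at barrier(), and only then does invocation 0 read the other invocations' values. The uniform cameraPos, the output name vertexDistance, and the distance-based level heuristic are all hypothetical:

```glsl
#version 400 core

layout(vertices = 3) out;

// Hypothetical: camera position in the same space as gl_in[].gl_Position.
uniform vec3 cameraPos;

// Per-vertex output used to share one value per invocation.
out float vertexDistance[];

void main()
{
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
    vertexDistance[gl_InvocationID] =
        distance(gl_in[gl_InvocationID].gl_Position.xyz, cameraPos);

    // Wait until all three invocations of this patch have written their outputs.
    barrier();

    // Reading other invocations' outputs is now well-defined.
    if (gl_InvocationID == 0)
    {
        float d = min(vertexDistance[0], min(vertexDistance[1], vertexDistance[2]));
        // Closer patches get more subdivision; clamp keeps the level in [1, 64].
        float level = clamp(64.0 / max(d, 1.0), 1.0, 64.0);
        gl_TessLevelOuter[0] = level;
        gl_TessLevelOuter[1] = level;
        gl_TessLevelOuter[2] = level;
        gl_TessLevelInner[0] = level;
    }
}
```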

The barrier() function has significant restrictions on where it can be placed. It must be placed:

Directly in the main() function. It cannot be in any other functions or subroutines.

Outside of any flow control. This includes if, for, switch, and the like.

Before any use of return, even a conditional one.

This ensures that every TCS invocation hits the same sequence of barrier() calls in the same order every time. The compiler will report an error if any of these restrictions is violated.
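
A sketch showing where barrier() may and may not go; each commented-out call breaks one of the rules above. The output name cornerPos, the 4-vertex patch size, and the choice to pull each vertex halfway toward the patch center are arbitrary assumptions:

```glsl
#version 400 core

layout(vertices = 4) out;

// Hypothetical per-vertex output shared between invocations.
out vec4 cornerPos[];

// void helper() { barrier(); }   // invalid: not directly in main()

void main()
{
    cornerPos[gl_InvocationID] = gl_in[gl_InvocationID].gl_Position;

    // Valid: directly in main(), outside flow control, before any return.
    barrier();

    // if (gl_InvocationID == 0) barrier();    // invalid: inside flow control
    // for (int i = 0; i < 2; ++i) barrier();  // invalid: inside a loop

    // Every invocation can now read all four values written before the barrier.
    vec4 center = 0.25 * (cornerPos[0] + cornerPos[1] + cornerPos[2] + cornerPos[3]);
    gl_out[gl_InvocationID].gl_Position = mix(cornerPos[gl_InvocationID], center, 0.5);

    // A return is allowed here because no barrier() call follows it.
    if (gl_InvocationID != 0)
        return;

    gl_TessLevelOuter[0] = 1.0;
    gl_TessLevelOuter[1] = 1.0;
    gl_TessLevelOuter[2] = 1.0;
    gl_TessLevelOuter[3] = 1.0;
    gl_TessLevelInner[0] = 1.0;
    gl_TessLevelInner[1] = 1.0;
}
```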

Limitations

There is a maximum output patch size, defined by GL_MAX_PATCH_VERTICES; the vertices output qualifier may not exceed this value. The minimum required limit is 32.

There are other limitations on output size, however. The number of components for active per-vertex output variables may not exceed GL_MAX_TESS_CONTROL_OUTPUT_COMPONENTS. The minimum required limit is 128.

The number of components for active per-patch output variables may not exceed GL_MAX_TESS_PATCH_COMPONENTS. The minimum required limit is 120. Note that the gl_TessLevelOuter and gl_TessLevelInner outputs do not count against this limit (but other built-in outputs do, if you use them).

There is a limit on the total number of components that can go into an output patch. To compute the total number of components, multiply the number of active per-vertex components by the number of output vertices, then add the number of active per-patch components. This number may not exceed GL_MAX_TESS_CONTROL_TOTAL_OUTPUT_COMPONENTS. The minimum required limit is 4096, which is not quite enough to use a 32-vertex patch with 128 per-vertex components and 120 per-patch components. But it's still a lot.
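
The total-output budget from the paragraph above, written out with the minimum required limits. Here V is the number of output vertices, C_v the number of active per-vertex components, and C_p the number of active per-patch components:

```latex
\begin{aligned}
V \cdot C_v + C_p &\le \texttt{GL\_MAX\_TESS\_CONTROL\_TOTAL\_OUTPUT\_COMPONENTS} \\
32 \cdot 128 + 120 &= 4096 + 120 = 4216 > 4096 \\
\left\lfloor \frac{4096 - 120}{32} \right\rfloor &= \lfloor 124.25 \rfloor = 124
\end{aligned}
```

So with a 32-vertex output patch and 120 per-patch components, at most 124 per-vertex components fit within the minimum guaranteed total (32 · 124 + 120 = 4088).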