Praise the Metal – Part 4: Render Encoders and the Draw Call

Welcome to another entry about Metal support in Crimild. I’m really amazed by the fact that I managed to write several posts in a row in just a couple of weeks. Hopefully, I can keep up with the rest. Because I’m not done yet.

Let’s recap what we discussed so far:

In the first post, the basic concepts behind Metal were introduced, as well as the reasons for Crimild to support it.

In Part 1 we talked about what needs to be performed during the initialization and the rendering phase, introducing synchronization along the way.

In Part 2 we went deep into the geometry pass and how to describe render pipelines for our visible objects.

In Part 3 we showed the power of the Metal Shading Language and how shaders are written.

Now it’s time to address the step that’s still missing in our rendering process: how to actually send render commands to the GPU using encoders. In addition, I’m going to briefly introduce framebuffers in Metal and how they are handled during the render pass, although I’ll leave the post-processing pass and image effects details for a future post.

This post is supposed to tie up all loose ends in our previous entries, so let’s start…

Command Encoders

We mentioned encoders several times before in previous posts, but we’ve never defined what they are. Command encoders are used to write commands and states into a single command buffer in a way that can be executed by the GPU.

Metal provides three different kinds of command encoders: Render, Compute and Blit. It’s important to note that while we can interleave encoders so they write into the same command buffer, only one of them can be active at any point in time.
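
For instance, a blit encoder and a render encoder can both write into the same command buffer, as long as each one finishes its encoding before the next one starts. Here’s a rough sketch of that idea (the stagingBuffer, vertexBuffer and vertexDataSize variables are just placeholders, not actual Crimild code):

// Only one encoder may be active on the command buffer at any given time
id <MTLBlitCommandEncoder> blitEncoder = [getRenderer()->getCommandBuffer() blitCommandEncoder];
[blitEncoder copyFromBuffer: stagingBuffer
               sourceOffset: 0
                   toBuffer: vertexBuffer
          destinationOffset: 0
                       size: vertexDataSize];
[blitEncoder endEncoding]; // must end before another encoder can start

id <MTLRenderCommandEncoder> renderEncoder =
    [getRenderer()->getCommandBuffer() renderCommandEncoderWithDescriptor: renderPassDescriptor];
// ... encode draw commands ...
[renderEncoder endEncoding];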

Creating Render Encoders

At the moment, only render encoders are supported by Crimild, defined by the MTLRenderCommandEncoder protocol, and they are created whenever framebuffers are bound during the render process.

Since encoders write into a specific command buffer, you create a new one by requesting an instance from the MTLCommandBuffer itself:

auto renderEncoder = [getRenderer()->getCommandBuffer() renderCommandEncoderWithDescriptor: renderPassDescriptor];

Render encoders are described in terms of a render pass, an object describing rendering states and commands. The MTLRenderPassDescriptor class defines the attachments that serve as the rendering destination for commands in a command buffer. We may have up to four color attachments, but only one for depth and another for stencil operations.

A render pass that will draw to the default drawable (i.e., the screen) is typically described as follows:

auto renderPassDescriptor = [MTLRenderPassDescriptor new];
renderPassDescriptor.colorAttachments[ 0 ].loadAction = MTLLoadActionClear;
const RGBAColorf &clearColor = fbo->getClearColor();
renderPassDescriptor.colorAttachments[ 0 ].clearColor = MTLClearColorMake( clearColor[ 0 ], clearColor[ 1 ], clearColor[ 2 ], clearColor[ 3 ] );
renderPassDescriptor.colorAttachments[ 0 ].storeAction = MTLStoreActionStore;
renderPassDescriptor.colorAttachments[ 0 ].texture = getRenderer()->getDrawable().texture;

The code above describes a render pass that will clear the color attachment and store the results of the rendering process into the default drawable’s texture provided by the renderer.

Alternatively, you can set a different texture as the attachment’s target if you need to perform offscreen rendering, as we will see when I show you how to do post-processing effects in later posts.
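
As a rough sketch of what that could look like (the texture created here is purely illustrative; Crimild actually manages these targets through its framebuffer catalog):

// Hypothetical offscreen color target
MTLTextureDescriptor *colorDesc =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat: MTLPixelFormatBGRA8Unorm
                                                       width: 1024
                                                      height: 768
                                                   mipmapped: NO];
colorDesc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
id <MTLTexture> offscreenTexture = [getRenderer()->getDevice() newTextureWithDescriptor: colorDesc];

auto offscreenPass = [MTLRenderPassDescriptor new];
offscreenPass.colorAttachments[ 0 ].texture = offscreenTexture;
offscreenPass.colorAttachments[ 0 ].loadAction = MTLLoadActionClear;
offscreenPass.colorAttachments[ 0 ].storeAction = MTLStoreActionStore; // keep the result so a later pass can read it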

In Crimild, render passes and encoders are linked with instances of crimild::FrameBufferObject, which seemed like the natural choice for me, and the related crimild::Catalog implementation takes care of creating and using them.

Specifying resources for a render command encoder

When drawing geometry, we need to specify which resources are bound to the vertex and/or the fragment shader functions. A render command encoder provides methods to assign resources (buffers, textures and samplers) to the corresponding argument table, as we saw in the last post.

[getRenderEncoder() setVertexBuffer: uniforms offset: 0 atIndex: 1];
[getRenderEncoder() setFragmentBuffer: uniforms offset: 0 atIndex: 1];
[getRenderEncoder() setVertexBuffer: vertexArray offset: 0 atIndex: 0];

In Crimild, resources are set on the render encoder at different points in the render process by different entities. Data buffers, textures and samplers are usually handled by catalogs, while uniform and constant buffers are handled by the MetalRenderer itself.

Specifying the render pipeline

We also need to associate a compiled render pipeline state with our encoder for use in rendering:

[getRenderEncoder() setRenderPipelineState: renderPipeline];

The Draw Call

Everything’s set. It’s time to execute the actual draw call.

Metal provides several draw methods depending on the primitives you want to render. Crimild uses indexed primitives by default, so the corresponding method is invoked in this step:

[getRenderEncoder() drawIndexedPrimitives: MTLPrimitiveTypeTriangle
                               indexCount: indexCount
                                indexType: MTLIndexTypeUInt16
                              indexBuffer: indexBuffer
                        indexBufferOffset: 0];

The first argument determines which type of primitive we are going to draw. In this case, we will draw indexed triangles, specifying the index buffer used to interpret the vertices, which were passed to the render encoder before this call.
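
For reference, the non-indexed variant simply walks the bound vertex buffer in order. Crimild doesn’t take this path by default, but the call would be something like:

// Non-indexed draw: consumes vertexCount vertices straight from the bound vertex buffer
[getRenderEncoder() drawPrimitives: MTLPrimitiveTypeTriangle
                       vertexStart: 0
                       vertexCount: vertexCount];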

Ending the Rendering Pass

And then we reach the final point in the render process. To terminate a rendering pass, we invoke the endEncoding method on the active render encoder. Once the encoding has finished, you can start a new one on the same buffer if needed.

[getRenderEncoder() endEncoding];

Crimild automatically invokes the endEncoding method when unbinding framebuffers, ensuring that all render commands have been encoded by that point.

Once all command encoders have been described, our command buffer is committed and the drawable will be presented to the screen, as we saw in Part 1.
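
As a quick reminder of that last step from Part 1, the sequence is roughly this (using the same renderer accessors as above):

// Schedule the drawable for presentation and send the command buffer to the GPU
[getRenderer()->getCommandBuffer() presentDrawable: getRenderer()->getDrawable()];
[getRenderer()->getCommandBuffer() commit];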

Side effects? What side effects?

If you’re familiar with Crimild, you might have noticed a little side effect (actually, a constraint) when working with the Metal-based renderer. Basically, it enforces the use of framebuffers, meaning that it will only work with the forward render pass approach (or anything more complex than that). It wasn’t meant to be like that when I started. The original goal was to support every kind of render pass, regardless of whether or not it required offscreen rendering. In the end, having at least one offscreen framebuffer is the most natural way of working with Metal. At least for Crimild. So, no Metal for you unless you’re willing to pay the price.

On the plus side, working with a deferred render approach seems a lot easier now. I don’t have anything productive yet regarding such a technique (at least not in Metal), but it’s something that I want to do in the near future since it will bring a lot of benefits.

Wait, there’s more…

As I said at the beginning of this article, I’m not done with this series yet. At this point we will be able to render some objects to the screen but, if we only follow the steps discussed so far, the result will be a bit disappointing:

[Screenshot: Screen Shot 2016-05-15 at 2.58.59 PM – the scene rendered so far, with no textures, labels or post-processing]

Where are the textures and labels? Where’s the post-processing effect? There are no menus either. You’re right. There are a lot of things yet to be discovered.

In the next post, we’re going to see how to handle textures and lighting in Metal, as well as describing alpha testing and other state changes.

To be continued…

Praise the Metal – Part 3: Metal Shading Language

Hello, Voyagers. Welcome to another entry in this series of articles about Metal and how Crimild managed to support it.

In this post I’m going to briefly introduce MSL. Truth is, there are many things that can be said about this new shading language, far more than a single blog post can cover. So I promise I’m going to try and keep things as simple as possible.

Overview

Metal provides its own language for writing shaders, the Metal Shading Language (or MSL for short), and it can be used for both graphics and compute processing. Designed for LLVM and Clang, and based on a static subset of C++11, MSL comes with a whole new set of improvements over, for example, OpenGL’s own GLSL.

I said that MSL is based on C++11, with its own extensions and restrictions to the language. Mechanisms like function overloading and templates are really useful in the field, even if I didn’t get the chance to exploit them as much as I’d like. Another amazing feature is that you can declare structs and functions in header files and reuse them in other shader files or even from our C++ code. Indeed, we can now design actual libraries with reusable code for our shaders without having to rely on macros and other dirty tricks.

On the downside, recursive function calls and lambdas are out of the equation, as is dynamic memory allocation with the new or delete operators (obviously). But look at the bright side: goto statements are banned as well. Because, you know, this isn’t the ’80s anymore.

It’s also worth mentioning that, since we cannot use C++’s standard library in our shaders, Metal does provide its own, named the Metal Standard Library. This library includes all kinds of helper functions for math, graphics, texture handling, and more.
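
Just to give you an idea, pulling that library into a shader file is a single include, and after that helpers like normalize, dot or saturate are ready to use. Here’s a tiny, made-up helper as an example:

#include <metal_stdlib>
using namespace metal;

// Hypothetical helper: a clamped Lambert term built from standard library functions
static float diffuseFactor( float3 normal, float3 lightDirection )
{
    return saturate( dot( normalize( normal ), normalize( lightDirection ) ) );
}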

Writing and Compiling Shaders

Metal shaders are saved in files with the .metal extension, which are automatically compiled by Xcode whenever the application is built. Compiled shaders are stored within libraries to be used later by pipelines, as we saw in my previous post. There’s no need to ship shader source code within the application anymore, and coding errors can be reported at build time as well.

While offline shader compilation at build time is the recommended approach, Metal also provides a run-time shader compiler in case you need to build shaders dynamically. Again, it’s not the recommended choice, since you will pay a performance penalty at run time.
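
For completeness, the run-time path boils down to handing the source string to the device; something along these lines (the shaderSource variable is a placeholder and error handling is omitted):

// Compile MSL source at run time into a library (not the recommended path)
NSError *error = nil;
id <MTLLibrary> library = [getRenderer()->getDevice() newLibraryWithSource: shaderSource
                                                                   options: nil
                                                                     error: &error];
id <MTLFunction> vertexFunction = [library newFunctionWithName: @"crimild_vertex_shader_unlit_diffuse"];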

MSL code is organized differently than GLSL code. For starters, we can have both vertex and fragment shader functions in the same .metal file. Also, there’s no need to split our code into separate programs, greatly reducing the amount of duplicated code.

Let’s see MSL in action…

The Vertex Shader Function

For the rest of the post I’m going to describe a simple shader implementation for diffuse shading. For such an example, the vertex shader function only needs to compute the projected vertex position and would look something like this:

vertex VertexOut crimild_vertex_shader_unlit_diffuse( VertexIn vert [[ stage_in ]],
                                                      constant crimild::metal::MetalStandardUniforms &uniforms [[ buffer( 1 ) ]] )
{
    float4x4 mvMatrix = uniforms.vMatrix * uniforms.mMatrix;
    float4 mvPosition = mvMatrix * float4( vert.position, 1.0 );

    VertexOut out;
    out.position = uniforms.pMatrix * mvPosition;

    return out;
}

This function uses the vertex qualifier to indicate that it will be executed for each vertex, generating per-vertex output. Those of you familiar with GLSL should be able to recognize what this function does quite easily.

The VertexIn and VertexOut data types are declared as follows:

struct VertexIn {
    float3 position [[ attribute( 0 ) ]];
    float3 normal [[ attribute( 1 ) ]];
    float2 uv [[ attribute( 2 ) ]];
};

struct VertexOut {
    float4 position [[ position ]];
    float3 normal;
    float3 eye;
};

The Fragment Shader Function

Fragment shader functions must use the fragment qualifier. They are executed for each fragment in the stream and generate per-fragment output.

Here’s the code for our diffuse shading fragment shader:

fragment float4 crimild_fragment_shader_unlit_diffuse( VertexOut projectedVertex [[ stage_in ]],
                                                       constant crimild::metal::MetalStandardUniforms &uniforms [[ buffer( 1 ) ]] )
{
    return uniforms.material.diffuse;
}

Notice that both shader functions may be declared with any name (except main) as long as they use the corresponding qualifiers.

Argument Tables and Uniforms

OK, I bet you already noticed those attributes surrounded by brackets in some of the arguments in the code above. Remember my last post where I mentioned how to use buffers when rendering geometries?

[getRenderEncoder() setVertexBuffer: uniforms offset: 0 atIndex: 1];

Each command encoder contains several argument tables, one per argument type (buffers, textures and samplers). The atIndex parameter in the function call above is used to index into the corresponding table (in this case, buffers).
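
Textures and samplers have their own argument tables and their own setter methods. Just as an illustration (the colorMap and colorMapSampler variables here are hypothetical; we’ll deal with textures properly in a later post):

// Textures and samplers are assigned to their own argument tables
[getRenderEncoder() setFragmentTexture: colorMap atIndex: 0];
[getRenderEncoder() setFragmentSamplerState: colorMapSampler atIndex: 0];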

Then, MSL uses the same index in the shader code when referencing the argument. For example, referencing the uniforms in either the vertex or fragment function is done in the following fashion:

constant crimild::metal::MetalStandardUniforms &uniforms [[ buffer( 1 ) ]]

Speaking of uniforms, in my last post I described the mechanisms to create uniform buffers and pass them to each of the shader functions. I said Crimild uses a single structure for all standard uniforms used by the MetalRenderer, and that struct looks like this:

typedef struct {
    simd::float4 ambient;
    simd::float4 diffuse;
    simd::float4 specular;
    float shininess;
} MaterialUniform;

typedef struct {
    simd::float3 position;
    simd::float3 attenuation;
    simd::float3 direction;
    simd::float4 color;
} LightUniform;

typedef struct {
    MaterialUniform material;

    unsigned int lightCount;
    LightUniform lights[ CRIMILD_METAL_MAX_LIGHTS ];

    simd::float4x4 pMatrix;
    simd::float4x4 vMatrix;
    simd::float4x4 mMatrix;
} MetalStandardUniforms;

The MetalStandardUniforms struct is declared in its own header file and can be used from either the Metal API or MSL. For complex data types, like vectors and matrices, we can make use of the facilities provided by the simd library. We could use plain static arrays too, although that’s not recommended.
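
Since the uniforms buffer lives in shared memory, the renderer can fill this struct in place before encoding the draw call. Here’s a rough sketch of what that looks like (the buffer and matrix variables are placeholders, not the actual MetalRenderer code):

// Write the standard uniforms directly into the shared buffer's memory
auto standardUniforms = static_cast< MetalStandardUniforms * >( [uniformsBuffer contents] );
standardUniforms->pMatrix = projectionMatrix;
standardUniforms->vMatrix = viewMatrix;
standardUniforms->mMatrix = modelMatrix;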

Stock Shaders

Crimild comes with a set of stock shaders for Metal, just as it does for OpenGL. At the time of this writing there are only a handful of them, covering diffuse shading, Phong lighting, textures and a basic forward rendering shader that mixes all of them together. Many more will be implemented in the future as new features are added.

In Crimild, the crimild::Shader class is responsible for storing the actual shader code. Since shaders in Metal are pre-compiled, that class now holds the name of the vertex or fragment shader function instead.

Wrapping up

That will be all about MSL. Well, not all of it, but at least what I consider most relevant for the moment. We’ll see more of MSL when we talk about post-processing and image effects.

We’re ready to finish the draw process now. In the next entry, we’re finally going to talk about render encoders and how to execute the draw call itself. Praise the Metal.

To be continued…

 

Praise the Metal – Part 2: The Geometry Pass

Ok, time to do something fun. Now that we know how to do the basics for a single frame, let’s dive into how objects get drawn.

I’m assuming you are familiar with the crimild::Geometry class in Crimild, but I guess a refresher won’t hurt:

[Diagram: crimild_geometry – the crimild::Geometry class and its related primitives, materials and parent node class]

Each crimild::Geometry instance should contain at least one primitive, which in turn stores all the vertex data and how it’s supposed to be represented (triangles, lines, etc.). On the other hand, each geometry is associated with a crimild::Material instance describing the way it is rendered (through shaders). Finally, the crimild::Geometry class extends crimild::Node, therefore inheriting the world transformation.

Rendering Geometries

In Crimild, there are several steps required in order to render some primitives on the screen. Let’s summarize them as follows:

  1. Acquire the geometry’s associated material
  2. Enable the corresponding shaders for that material
  3. Enable lights
  4. Enable textures
  5. Enable the vertex data for the geometry
  6. Set shader uniforms, like transformations and other constants
  7. Perform the draw call

Please note that the list above does not consider grouping geometries by material or the concept of instancing for vertex data. I’m leaving out anything related to the shadow pass and skinning as well. Sorry, but I didn’t want to complicate things too much for this post. You can check any of the crimild::RenderPass implementations to see what an actual geometry pass looks like.

Anyway, if we were using an OpenGL-based renderer, all of the steps mentioned above would be split into a series of state changes. Fortunately for us, Metal collapses most of those state changes into as few calls as possible, thanks to precompiled pipelines.

Pipelines

So, what are pipelines? I mentioned them before in my previous posts, but now the time has come to see them in action.

A single pipeline contains the rendering configuration used during the geometry pass. In Metal, we use the MTLRenderPipelineDescriptor to specify vertex layout, shaders, rasterizer options (such as multisampling), blending and framebuffer attachments (as in color, depth or stencil attachments).

Describing pipelines

Crimild creates new pipelines and links them with instances of the crimild::ShaderProgram class. This is performed by the crimild::metal::ShaderProgramCatalog class.

To be honest, at the time of this writing I’m still not completely comfortable with this design choice, since I’m starting to believe that it makes more sense for pipelines to be linked with instances of crimild::Material instead. I’m still struggling with that idea and I’ll definitely revisit it in the future.

Here’s an example of how Metal’s pipelines are described within Crimild:

// 1
NSString *vertexProgramName = [NSString stringWithUTF8String: program->getVertexShader()->getSource().c_str()];
id <MTLFunction> vertexProgram = [getDefaultLibrary() newFunctionWithName: vertexProgramName];
    
// 2
NSString *fragmentProgramName = [NSString stringWithUTF8String: program->getFragmentShader()->getSource().c_str()];
id <MTLFunction> fragmentProgram = [getDefaultLibrary() newFunctionWithName: fragmentProgramName];
    
MTLRenderPipelineDescriptor *desc = [MTLRenderPipelineDescriptor new];
desc.sampleCount = 1;
desc.vertexFunction = vertexProgram; // 1
desc.fragmentFunction = fragmentProgram; // 2
desc.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;

This code describes a simple pipeline, assuming a forward render pass (I haven’t started working on deferred rendering yet). Notice that the first few lines fetch the precompiled shader functions and assign them to the pipeline. For simplicity, I’m not showing error handling or alternative flows.

There are many more things that can be described per pipeline, and we’ll see some of them below.
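
Blending is one example of those extra settings; enabling standard alpha blending for the first color attachment is part of the same descriptor and would look roughly like this:

// Classic alpha blending, baked into the pipeline instead of set per draw call
desc.colorAttachments[0].blendingEnabled = YES;
desc.colorAttachments[0].rgbBlendOperation = MTLBlendOperationAdd;
desc.colorAttachments[0].alphaBlendOperation = MTLBlendOperationAdd;
desc.colorAttachments[0].sourceRGBBlendFactor = MTLBlendFactorSourceAlpha;
desc.colorAttachments[0].sourceAlphaBlendFactor = MTLBlendFactorSourceAlpha;
desc.colorAttachments[0].destinationRGBBlendFactor = MTLBlendFactorOneMinusSourceAlpha;
desc.colorAttachments[0].destinationAlphaBlendFactor = MTLBlendFactorOneMinusSourceAlpha;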

Once described, the pipeline needs to be compiled, and the result is stored in an MTLRenderPipelineState object:

NSError *error = nil;
id <MTLRenderPipelineState> renderPipeline = [getRenderer()->getDevice() newRenderPipelineStateWithDescriptor: desc error: &error];

Using pipelines

At run time, switching pipelines is a one-liner:

[getRenderEncoder() setRenderPipelineState: renderPipeline];

This single line of code will apply all of the rendering settings defined for that particular pipeline. Pretty neat, huh?

I haven’t talked about render encoders yet. I’m leaving that for a future post. For the moment, just think of them as the objects that will translate our pipelines into actual render commands.

Vertex data

Before we move on, please note that Crimild favors interleaved float arrays for vertices and indexed primitives, so I’m mostly going to work with that kind of data for the rest of the posts in this series. Do keep in mind that Metal, like OpenGL, provides many more ways to lay out and use our vertex data.

Vertex buffer layout

As I said before, Crimild expects vertex data to be interleaved. That is, all data is stored in a single float array, one vertex after the other. Now, each vertex may contain several components like positions, normals, texture coordinates, weights, and more. For example, a simple triangle containing 3 floats for positions, 3 floats for normals and 2 floats for texture coordinates will be represented in memory like this:

float data[] = {
   /* position */        /* normals */      /* uv */
   -1.0f, -1.0f, 0.0f,   0.0f, 0.0f, 1.0f,  0.0f, 1.0f,
   1.0f, -1.0f, 0.0f,    0.0f, 0.0f, 1.0f,  1.0f, 1.0f,
   0.0f, 1.0f, 0.0f,     0.0f, 0.0f, 1.0f,  0.5f, 0.0f
};

The vertex format is specified as part of the pipeline description step, using the MTLVertexDescriptor class. For example, the following code describes the vertex layout that we need for the triangle above:

MTLRenderPipelineDescriptor *desc = [MTLRenderPipelineDescriptor new];
    
MTLVertexDescriptor* vertexDesc = [[MTLVertexDescriptor alloc] init];
vertexDesc.attributes[0].format = MTLVertexFormatFloat3;
vertexDesc.attributes[0].bufferIndex = 0;
vertexDesc.attributes[0].offset = VertexFormat::VF_P3_N3_UV2.getPositionsOffset() * sizeof( float );
vertexDesc.attributes[1].format = MTLVertexFormatFloat3;
vertexDesc.attributes[1].bufferIndex = 0;
vertexDesc.attributes[1].offset = VertexFormat::VF_P3_N3_UV2.getNormalsOffset() * sizeof( float );
vertexDesc.attributes[2].format = MTLVertexFormatFloat2;
vertexDesc.attributes[2].bufferIndex = 0;
vertexDesc.attributes[2].offset = VertexFormat::VF_P3_N3_UV2.getTextureCoordsOffset() * sizeof( float );
vertexDesc.layouts[0].stride = VertexFormat::VF_P3_N3_UV2.getVertexSize() * sizeof( float );
vertexDesc.layouts[0].stepFunction = MTLVertexStepFunctionPerVertex;

desc.vertexDescriptor = vertexDesc;

Since we’re talking about raw memory here, we need to make sure that our offsets and sizes take the size of our data (in this case, float) into consideration.

Bear in mind that since pipelines are pre-compiled, we shouldn’t change the vertex format once it’s set. Otherwise, we will be forced to compile the pipelines again, losing all the benefits along the way.

In contrast, in OpenGL it doesn’t really matter if we change the vertex layout, since we will be resetting it on every draw call (I think VAOs are supposed to fix that, though). Honestly, I haven’t seen a use case where you need to do this at run time, but both APIs allow it.

Loading vertex data

Time to talk about one of the greatest features of the Metal API: buffers. In Metal, we don’t have separate buffer types for each kind of data as in OpenGL. Instead, there’s only one implementation. The MTLBuffer protocol is used to store unformatted data that can later be used for vertices, shader constants, textures or any other use of raw memory that you can imagine. And, most importantly, buffers can be shared between the CPU and the GPU, which means we don’t need to upload (or download) data from one to the other anymore.

And, of course, they’re really simple to use too:

crimild::VertexBufferObject *vbo = /* our VBO */
id < MTLBuffer > vertexArray = 
   [getDevice() newBufferWithBytes: vbo->getData()
                            length: vbo->getSizeInBytes()
                           options: MTLResourceOptionCPUCacheModeDefault];

In the newest versions of Metal, the last argument lets us specify the storage strategy for the buffer. We can request shared, private or managed memory at this point. Crimild uses only shared memory at the moment, and I still need to see the other two options in action to understand how to support them.
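
Just to illustrate, requesting a specific storage mode is only a matter of passing a different value in that options parameter; for example, an explicitly shared buffer allocated without initial data could look like this:

// Explicitly shared storage: CPU and GPU see the same memory
id < MTLBuffer > sharedBuffer = [getDevice() newBufferWithLength: vbo->getSizeInBytes()
                                                         options: MTLResourceStorageModeShared];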

Using vertex data

Vertex data is generated at load time, usually at the beginning of our program. In order to use a vertex buffer at run time, we bind it in a similar way to our pipelines:

[getRenderEncoder() setVertexBuffer: vertexArray
                             offset: 0
                            atIndex: 0];

The last two arguments, offset and index, are used to identify the buffer from within the shaders themselves. MSL will be discussed in the next post; for now, think of those fields as roughly analogous to how attribute and uniform locations work in OpenGL.

Indexed primitives

Metal supports indexed primitives, which let us reuse vertices and save some memory. Generating a buffer containing the indices for our primitive is pretty much the same as for vertex buffers:

crimild::IndexBufferObject *ibo = /* our IBO */
id < MTLBuffer > indexArray = 
   [getDevice() newBufferWithBytes: ibo->getData()
                            length: ibo->getSizeInBytes()
                           options: MTLResourceOptionCPUCacheModeDefault];

Once we have the index buffer created, we will use it during our render pass at the time we trigger the actual draw call for a given geometry:

[getRenderEncoder() drawIndexedPrimitives: MTLPrimitiveTypeTriangle
                               indexCount: indexCount
                                indexType: MTLIndexTypeUInt16
                              indexBuffer: indexBuffer
                        indexBufferOffset: 0];

There’s a lot to be said about the draw call, and that will be the main subject of my next couple of posts.

Uniforms

Time for a serious mind-blower. The way Metal handles uniforms is simply fantastic. Instead of having to send each uniform (like transformations, lights or materials) separately to the GPU as in OpenGL, in Metal we can group them together into a single buffer and dispatch said buffer with just a single call.

id < MTLBuffer > uniforms = /* create the uniforms buffer */
[getRenderEncoder() setVertexBuffer: uniforms offset: 0 atIndex: 0];

Someone’s uncle once said: “with great power, come great design choices”. And I have to admit that working with uniforms in this way wasn’t easy for me. Crimild is designed based on OpenGL and similar APIs, and switching to the new paradigm required a couple of dirty tricks.

At first, I split the uniform buffers into a series of groups based on their functionality (as in, one group for transformations, another for lighting, materials, and so on). I didn’t like that approach because it felt too old-school and it wasn’t really taking advantage of the new mechanisms.

Instead, Crimild defines a single structure for all uniforms and lets the renderer set them up. I expect this design to change in the future, since it lacks the means to be extended with more functionality (for example, bone data for animation), but for the moment I can live with it.

One thing to notice about uniforms is that we can link them with either the vertex shader function, the fragment shader function or both. Alas, we need to do it manually in our program. Therefore, if we need them in both shader functions, we will end up with code like this:

[getRenderEncoder() setVertexBuffer: uniforms offset: 0 atIndex: 0];
[getRenderEncoder() setFragmentBuffer: uniforms offset: 0 atIndex: 0];

Bonfire ahead, therefore pause

[Image: Dark Souls bonfire icon]

Time to take a little break. The next stop in our voyage (pun intended) will take us to the wonderful world of the Metal Shading Language, where we’ll revisit uniforms and start talking about the draw call itself. See you soon.

To be continued…