Praise the Metal – Part 6: Post-processing

I knew from the start that Le Voyage needed some kind of distinctive feature in order to be recognized out there. The gameplay was way too simple, so I focused on presentation instead. Since the game is based on early silent films, using some sort of post-processing effect to render noise and scratches was a must-have.

About Image Effects

In Crimild, image effects implement post-processing operations on a whole frame. Ranging from simple color replacement (as in sepia effects) to techniques like SSAO or Depth of Field, image effects are quite powerful things.

Le Voyage makes use of a simple image effect to achieve that old-film look.

[Screenshot: an in-game frame of Le Voyage with the old-film effect applied]

There are four different techniques applied in the image above:

  1. Sepia tone: All colors are mapped to sepia values, giving that brownish tint. No more blues, reds or grays; just different levels of sepia.
  2. Film grain: Noise produced in film due to luminosity changes (or so I was told).
  3. Scratches: As film degrades, old movies display scratches, burns and other kinds of artifacts.
  4. Vignetting: The corners of the screen look darker than the center, simulating the dimming of light. This technique is usually employed to frame important objects in the center of the screen, as in closeups.

All of these effects are applied in a single pass for best performance.
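To give a flavor of what that single pass computes, here is a rough sketch of the sepia and vignetting stages in Metal Shading Language. This is not Le Voyage’s actual shader: the sepia weights are the commonly used coefficients, and the vignette radii are made-up example values.

// Sepia: remap each channel as a weighted sum of the input color
// (these are the classic sepia coefficients)
float3 sepia( float3 color )
{
    return float3( dot( color, float3( 0.393, 0.769, 0.189 ) ),
                   dot( color, float3( 0.349, 0.686, 0.168 ) ),
                   dot( color, float3( 0.272, 0.534, 0.131 ) ) );
}

// Vignetting: darken fragments based on their distance to the screen center
float vignette( float2 uv )
{
    float d = distance( uv, float2( 0.5 ) );
    return 1.0 - smoothstep( 0.4, 0.8, d ); // 1.0 at the center, darker toward the corners
}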

How does it work?

Crimild implements post-processing using a technique called ping-pong, where two buffers are switched back-and-forth, serving as both source and accumulation while processing image effects.

A scene is rendered into an off-screen framebuffer, designated as the source buffer. For each image effect, the source buffer is bound as a texture, providing the pixel data that serves as input for the effect. The effect is computed and rendered into a second buffer, known as the accumulation buffer. Then source and accumulation are swapped and, if there are more image effects to apply, the process starts again.

When there are no more image effects to be processed, the source buffer contains the final image that will be displayed on the screen.

Confused? Then the following image may help you (spoiler alert: it won’t):

[Diagram: ping-pong buffers. For each image effect, the source and destination buffers are swapped. Once all effects have been processed, the source buffer contains the final image.]
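In code, the loop looks roughly like this (a minimal sketch: the helper names here are hypothetical stand-ins, not Crimild’s actual API):

auto source = sceneFBO; // the scene has already been rendered here
auto accum = auxFBO;

for ( auto effect : imageEffects ) {
    bindAsTexture( source );        // read pixels from the source buffer
    renderEffect( effect, accum );  // write the result into the accumulation buffer
    std::swap( source, accum );     // ping-pong: the result becomes the next source
}

present( source ); // after the last swap, source holds the final image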

Le Voyage only uses one image effect, which applies all four stages in a single pass, so no additional post-processing is required. If you want to know more about implementing the old-film effect, you can check this great article by Nutty.ca, on which I based mine.

Powered by Metal

Theoretically speaking, there’s no difference in how this effect is applied in either Metal or OpenGL, as the same steps are required in both APIs. In practice, the Metal API is a bit simpler when it comes to handling framebuffers, so the code ends up more readable. And there’s no need for all that explicit error checking that OpenGL requires.

If you recall from a previous post where we talked about FBOs, I said that we need to define a texture for our color attachment. At the time, we did something like this:

renderPassDescriptor.colorAttachments[ 0 ].texture = getDrawable().texture;

That code sets the texture associated with a drawable to the first color attachment. This will render everything on the drawable itself, which in our case was the screen.

But in order to perform post-processing, we need an offscreen buffer. That means that we need an actual output texture instead:

MTLTextureDescriptor *textureDescriptor = [MTLTextureDescriptor 
   texture2DDescriptorWithPixelFormat: MTLPixelFormatBGRA8Unorm
                                width: /* screen width */
                               height: /* screen height */
                            mipmapped: NO];
 
id< MTLTexture > texture = [getDevice() newTextureWithDescriptor:textureDescriptor];

renderPassDescriptor.colorAttachments[ 0 ].texture = texture;

The code above is executed when generating the offscreen FBO. It creates a new texture object with the screen dimensions and sets it as the output for the color attachment. Notice that we don’t pass any data as the texture’s image, since we’re going to write into it.

Keep in mind that we need two offscreen FBOs created this way: one acting as source and the other as accumulation.
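In code, that just means doing the above twice. Assuming a hypothetical helper makeOffscreenTexture() that wraps the descriptor code shown earlier, it would look something like this:

// Two render targets with identical formats, one per ping-pong buffer
id< MTLTexture > sourceTexture = makeOffscreenTexture( getDevice(), screenWidth, screenHeight );
id< MTLTexture > accumTexture = makeOffscreenTexture( getDevice(), screenWidth, screenHeight );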

Once set up, binding this offscreen FBO makes our rendering code draw everything into the texture instead of the screen. This buffer becomes our source FBO.

Then we proceed to render the image effect, using a quad primitive with that texture as input and producing the final image in the accumulation FBO, which is then presented on the screen since there are no more image effects.
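Encoding that final pass boils down to drawing a full-screen quad that reads from the source texture. Here is a sketch of the idea, where accumPassDescriptor, imageEffectPipeline and quadVertices are assumed to have been set up beforehand:

// Render pass targeting the accumulation FBO (or the drawable, for the last effect)
auto renderEncoder = [getRenderer()->getCommandBuffer() renderCommandEncoderWithDescriptor: accumPassDescriptor];
[renderEncoder setRenderPipelineState: imageEffectPipeline];
[renderEncoder setVertexBuffer: quadVertices offset: 0 atIndex: 0];
[renderEncoder setFragmentTexture: sourceTexture atIndex: 0]; // the scene we just rendered
[renderEncoder drawPrimitives: MTLPrimitiveTypeTriangleStrip vertexStart: 0 vertexCount: 4];
[renderEncoder endEncoding];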

Again, this is familiar territory if you already know how to do offscreen rendering in OpenGL. Except for a tiny little difference…

Performance

When I started writing this series of posts, I mentioned that one of the reasons for me to work with Metal was that the game’s performance was poor on the Apple TV. More specifically, I assumed that the post-processing step was consuming most of the processing resources for a single frame. And I wasn’t completely wrong about that.

Keep in mind that post-processing is always a very expensive technique, both in memory and processing requirements, regardless of the API. In fact, the game is almost unplayable on older devices like an iPod Touch or iPhone 4s just because of this effect (and those devices don’t support Metal). Other image effects, like SSAO or DOF are even more expensive, so you should try and keep the post-processing step as small as possible.

I’m getting ahead of myself, since optimizations will be the subject of the next post, but it turns out that Metal’s performance boost over OpenGL not only lets you display more objects on screen, but also allows more complex post-processing effects to be applied in real time. Even without optimizing the image effect’s shaders, I noticed a 50% increase in performance just by switching APIs. That was amazing!

Et Voilà

And so the rendering process is now complete. We started with a blank screen, and then proceeded to draw objects into an offscreen buffer. Finally, we applied a post-processing effect to achieve that old-film appearance, giving the game its unique look. It was quite the trip, right?

In the next and (almost) final entry in this series we’re going to see some of the amazing tools that will allow us to profile applications based on Metal, as well as some of the most basic optimization techniques that lead to great performance improvements.

To be continued…


Praise the Metal – Part 5: Textures & Lighting

Hello, Voyagers. As the title of this post suggests, I’m going to talk about how to handle textures & lighting in Metal. Additionally, I’ll briefly describe other techniques like depth/stencil, alpha and culling tests, which are very important during the rendering process.

Truth be told, Le Voyage may not be the best example of the use of textures and lighting in a game, since it’s quite simple in this regard. Still, a single one of its frames has enough information for us to do something meaningful with Metal.

[Screenshot: an in-game frame of Le Voyage, before post-processing]

The image above shows an in-game frame of Le Voyage, with most of the objects being rendered using the techniques described in previous posts, plus a few new tricks that will be presented here.

There are a lot of things going on here. Let’s summarize them:

  • Opaque objects, like the projectile (in blue), the mountains or the balloons, all of them affected by a single light
  • Opaque objects that are not affected by lights, like the Moon in the background (in yellow)
  • Translucent objects like the pause button, the approaching high score label at the very back of the scene (visible near the Moon in the image above) or the current score label at the top right corner of the screen
  • Two textures for fonts, one for each font style (one is used for the score label, the other for the high scores in the 3D world)
  • One texture for object colors (there are several levels of gray for different objects, plus some basic colors)
  • One light source, positioned behind the camera and affecting most of the objects in the 3D world.

Additionally, when displaying menus there are more textures for special buttons (such as the Facebook, Twitter or Game Center buttons). The intro scene, the one with the cannon and the scaffolding, has an extra light source coming from where the Moon is (I bet you didn’t notice that one).

No post-processing effects have been applied in the image above since we haven’t talked about image effects yet. I’ll leave that for the next entry in the series.

DISCLAIMER: The techniques described in this series of posts are limited only to those features currently supported by Crimild and also to those used in Le Voyage. It’s by no means an extensive introduction to all texture and lighting mechanisms, as there are many, many more. It’s expected for Crimild to support more features in future releases, of course.

Textures

In Metal, the MTLTexture protocol represents formatted image data with a specific type and pixel format. Textures can be used as sources for vertex, fragment or compute shader functions (or all of them), as well as attachments for render passes.

Metal supports 1D, 2D and 3D textures, as well as texture arrays and cubemaps. Only 2D textures are supported in Crimild at the time of this writing, though.

Creating textures

Assuming we already loaded the actual image file (TGA is the de-facto format for textures in Crimild), we will need to create a texture object and upload the data to its internal storage.

When creating new textures, we use the MTLTextureDescriptor class to define properties like image size, pixel format and arrangement, as well as the number of mipmap levels, if mipmapping is enabled. For example:

auto textureDescriptor = [MTLTextureDescriptor 
   texture2DDescriptorWithPixelFormat: MTLPixelFormatRGBA8Unorm
                                width: image->getWidth()
                               height: image->getHeight()
                            mipmapped: NO];

In the code above, a descriptor is created for RGBA images of a specific width and height, with no mipmapping, since Crimild does not currently support mipmapping when working with textures in Metal.

Then, we’ll pass that descriptor to the device in order to create the actual texture:

id< MTLTexture > mtlTexture = [getDevice() newTextureWithDescriptor:textureDescriptor];

Copying image data to a texture

After creating the texture, we usually need to copy our image data into its storage. Alternatively, the texture data may come from a render pass attachment or other sources, so there won’t be a need to copy anything.

Assuming we do need to copy data, the following code shows how to copy the image data from a crimild::Image object in memory to a texture, at mipmap level 0:

MTLRegion region = MTLRegionMake2D( 0, 0, image->getWidth(), image->getHeight() );
[mtlTexture replaceRegion: region mipmapLevel: 0 withBytes: image->getData() bytesPerRow: image->getWidth() * image->getBpp()];

So far, creating and loading textures is not that different from what OpenGL provides, right?

Binding textures

In order to use textures during our render process, we first need to bind them. As with other rendering resources, we invoke the corresponding method on the render encoder:

[getRenderEncoder() setFragmentTexture:mtlTexture atIndex: 0];

This binds the texture to the first index of the texture argument table.

Samplers

Working with textures requires us to define how we want to apply filtering, addressing and other properties when performing texture sampling operations. A sampling operation maps texels to polygons and pixels.

Things are a little bit different in Metal than in OpenGL concerning sampling. At least in practice.

Metal provides a specialized object for sampling operations, described by the MTLSamplerState protocol. I haven’t used the sampler facilities in Crimild yet, since Le Voyage has extremely simple sampling requirements for textures, all of which can easily be described in MSL, as we’re going to see next.
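For completeness, creating a sampler through the API would look something like this. This is just a sketch of the same settings the shader below declares inline; Crimild doesn’t actually take this route yet:

// Linear filtering and repeat addressing, described on the host side
MTLSamplerDescriptor *samplerDescriptor = [MTLSamplerDescriptor new];
samplerDescriptor.minFilter = MTLSamplerMinMagFilterLinear;
samplerDescriptor.magFilter = MTLSamplerMinMagFilterLinear;
samplerDescriptor.sAddressMode = MTLSamplerAddressModeRepeat;
samplerDescriptor.tAddressMode = MTLSamplerAddressModeRepeat;

// Compile the state once, then bind it to the sampler argument table
id< MTLSamplerState > samplerState = [getDevice() newSamplerStateWithDescriptor: samplerDescriptor];
[getRenderEncoder() setFragmentSamplerState: samplerState atIndex: 0];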

Textures in MSL

Two objects are required in order to use textures in MSL. One is the texture itself, bound as described above. The other is a sampler, which can either be created through the Metal API or instantiated directly in the MSL shader itself, as Crimild currently does. In addition to those objects, we also need texture coordinates, which come with the interpolated vertex input provided by the vertex shader.

The following MSL code implements a fragment shader that returns a color based on both the sampled texture and the material’s diffuse color:

fragment float4 crimild_fragment_shader_unlit_texture( VertexOut projectedVertex [[ stage_in ]],
                                                       texture2d< float > texture [[ texture( 0 ) ]],
                                                       constant crimild::metal::MetalStandardUniforms &uniforms [[ buffer( 1 ) ]] )
{
    constexpr sampler s( coord::normalized, address::repeat, filter::linear);
    
    float4 sampledColor = texture.sample(s, projectedVertex.textureCoords);
    
    return sampledColor * uniforms.material.diffuse;
}

The first line will create a sampler object using standard options for both addressing and filtering. Texture coordinates are expected to be normalized in the range [0, 1].

That sampler is used in the second line to get the texture color at the provided texture coordinates. Finally, both the texture color and the material diffuse color are mixed.

In theory, it’s not that different from OpenGL. And again, many more options can be applied for both textures and samplers than the ones presented here.

Let there be light… Or not

Unsurprisingly, Metal’s lighting facilities are… non-existent. As in OpenGL, lighting is computed in shaders and has to be implemented entirely by the developer. Long gone are the days of fixed-function lighting pipelines.

Therefore, Crimild works with lighting in Metal in a very similar way as it does in OpenGL. For each geometry, we pass all active light sources using uniform buffers. Shaders are responsible for the lighting calculations, usually implementing the Phong lighting model (sorry, no PBR support… yet) and a forward render pass.

As explained before, the biggest benefit Metal provides over OpenGL in this regard is that we can dispatch all uniforms in a single batch, which is a big performance gain.
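As a rough sketch of what such a batch might look like (the struct layout and the helper below are hypothetical, not Crimild’s actual MetalStandardUniforms), all light data can travel to the fragment shader in a single call:

// Shared between host code and shaders; simd types keep both layouts in sync
#include <simd/simd.h>

typedef struct {
    vector_float3 position;
    vector_float3 attenuation;
    vector_float4 color;
} LightUniform;

typedef struct {
    LightUniform lights[ 4 ]; // some fixed maximum
    unsigned int lightCount;
} LightingUniforms;

// One call uploads everything the shaders need for lighting
LightingUniforms lighting = buildLightingUniforms( scene ); // hypothetical helper
[getRenderEncoder() setFragmentBytes: &lighting length: sizeof( lighting ) atIndex: 2];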

I’m assuming deferred rendering will be a lot easier to implement in Metal than it is in OpenGL, since handling framebuffers and attachments is very simple in the former. But that’s something I can’t tell for sure until I see it working.

Depth/Stencil

Working with depth/stencil in Metal turned out to be a little more cumbersome than in OpenGL. Again, this has to do with the paradigm shift, but I still have the feeling that it could’ve been simpler (as it is with culling; see my comments below).

MTLDepthStencilDescriptor *depthStencilDescriptor = [MTLDepthStencilDescriptor new];
depthStencilDescriptor.depthCompareFunction = MTLCompareFunctionLess;
depthStencilDescriptor.depthWriteEnabled = YES;

auto depthStencilState = [getDevice() newDepthStencilStateWithDescriptor: depthStencilDescriptor];
[getRenderEncoder() setDepthStencilState: depthStencilState];

The depth/stencil state is described by the MTLDepthStencilDescriptor class, which provides options for things like the comparison function and read/write operations. Once described, we compile an object implementing the MTLDepthStencilState protocol, which in turn we pass to the render encoder in order to activate it.

While depth/stencil states should be compiled only once, we can switch between them during the render pass based on the requirements of each object we’re drawing.
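For example, a hypothetical state for translucent objects could test depth without writing it, compiled once and reused for every translucent geometry:

MTLDepthStencilDescriptor *translucentDescriptor = [MTLDepthStencilDescriptor new];
translucentDescriptor.depthCompareFunction = MTLCompareFunctionLess;
translucentDescriptor.depthWriteEnabled = NO; // test against opaque geometry, but don't write

id< MTLDepthStencilState > translucentState = [getDevice() newDepthStencilStateWithDescriptor: translucentDescriptor];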

Cull State

Culling is set in the render encoder too, but it’s much more direct than depth/stencil. It’s almost as simple as in OpenGL:

[getRenderEncoder() setFrontFacingWinding: MTLWindingCounterClockwise];
[getRenderEncoder() setCullMode: MTLCullModeBack];

Two functions are provided to define the winding and cull mode. By default, Crimild uses counter-clockwise winding and back-face culling, just as in OpenGL. No surprises here.

Alpha Blending/Testing

Unfortunately, alpha blending is not supported by Crimild’s MetalRenderer at the time of this writing. There was no need for alpha blending in Le Voyage, so I skipped this feature entirely. At least some minimal support for alpha blending is expected in future releases, of course.

On the other hand, alpha testing was implemented at the fragment shader level, discarding fragments with low alpha values, which is pretty similar to its counterpart in OpenGL.
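In shader terms, the idea is just a conditional discard. Here is a sketch in the same style as the fragment shader shown earlier (the function name and the 0.1 threshold are arbitrary example choices):

fragment float4 crimild_fragment_shader_alpha_test( VertexOut projectedVertex [[ stage_in ]],
                                                    texture2d< float > texture [[ texture( 0 ) ]] )
{
    constexpr sampler s( coord::normalized, address::repeat, filter::linear );
    float4 sampledColor = texture.sample( s, projectedVertex.textureCoords );

    // Reject fragments that are mostly transparent
    if ( sampledColor.a < 0.1 ) {
        discard_fragment();
    }

    return sampledColor;
}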

Are we there yet?

And so we reach the last step in the rendering process for single objects. At this point, we are able to render objects on the screen with textures and lighting using Metal. But the resulting frame still lacks the final touch, which is the most distinctive feature of Le Voyage: that old-film effect applied to the entire screen.

Next week we’ll talk about image effects in Metal, the final step in the rendering pipeline.

To be continued…

Praise the Metal – Part 4: Render Encoders and the Draw Call

Welcome to another entry about Metal support in Crimild. I’m really amazed by the fact that I managed to write several posts in a row in just a couple of weeks. Hopefully, I can keep up with the rest. Because I’m not done yet.

Let’s recap what we discussed so far:

In the first post, the basic concepts behind Metal were introduced, as well as the reasons for Crimild to support it.

In Part 1 we talked about what needs to be performed during the initialization and the rendering phase, introducing synchronization along the way.

In Part 2 we went deep into the geometry pass and how to describe render pipelines for our visible objects.

In Part 3 we showed the power of the Metal Shading Language and how shaders are written.

Now it’s time to address the step that’s still missing in our rendering process: how to actually send render commands to the GPU using encoders. In addition, I’m going to briefly introduce framebuffers in Metal and how they are handled during the render pass, although I’ll leave the post-processing pass and image effects details for a future post.

This post is supposed to tie up all loose ends in our previous entries, so let’s start…

Command Encoders

We mentioned encoders several times before in previous posts, but we’ve never defined what they are. Command encoders are used to write commands and states into a single command buffer in a way that can be executed by the GPU.

Metal provides three different kinds of command encoders: Render, Compute and Blit. It’s important to note that while we can interleave encoders so they write into the same command buffer, only one of them can be active at any point in time.
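As a sketch of that constraint: each encoder has to finish before the next one can start writing into the same buffer. The blit encoder here is purely illustrative, since Crimild only uses render encoders:

id< MTLBlitCommandEncoder > blitEncoder = [commandBuffer blitCommandEncoder];
// ... encode copy/blit commands ...
[blitEncoder endEncoding]; // must end before another encoder can start

id< MTLRenderCommandEncoder > renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor: renderPassDescriptor];
// ... encode draw commands ...
[renderEncoder endEncoding];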

Creating Render Encoders

At the moment, only render encoders are supported by Crimild, defined by the MTLRenderCommandEncoder protocol, and they are created whenever framebuffers are bound during the render process.

Since encoders write into specific buffers, you create new ones by requesting a new instance from the MTLCommandBuffer itself:

auto renderEncoder = [getRenderer()->getCommandBuffer() renderCommandEncoderWithDescriptor: renderPassDescriptor];

Render encoders need to be described in terms of a render pass, an object that defines rendering states and commands. The MTLRenderPassDescriptor class defines the attachments that serve as the rendering destination for commands in a command buffer. We may have up to four color attachments, but at most one for depth and another for stencil operations.

A render pass that will draw to the default drawable (i.e., the screen) is typically described as follows:

auto renderPassDescriptor = [MTLRenderPassDescriptor new];
renderPassDescriptor.colorAttachments[ 0 ].loadAction = MTLLoadActionClear;
const RGBAColorf &clearColor = fbo->getClearColor();
renderPassDescriptor.colorAttachments[ 0 ].clearColor = MTLClearColorMake( clearColor[ 0 ], clearColor[ 1 ], clearColor[ 2 ], clearColor[ 3 ] );
renderPassDescriptor.colorAttachments[ 0 ].storeAction = MTLStoreActionStore;
renderPassDescriptor.colorAttachments[ 0 ].texture = getRenderer()->getDrawable().texture;

The code above describes a render pass that will clear the color attachment and store the results of the rendering process into the default drawable’s texture provided by the renderer.

Alternatively, you can set a different texture as the attachment’s target if you need to perform offscreen rendering, as we will see when I show you how to do post-processing effects in later posts.

In Crimild, render passes and encoders are linked with instances of crimild::FrameBufferObject, which seemed like the natural choice for me, and the related crimild::Catalog implementation takes care of creating and using them.

Specifying resources for a render command encoder

When drawing geometry, we need to specify which resources are bound to the vertex and/or fragment shader functions. A render command encoder provides methods to assign resources (buffers, textures and samplers) to the corresponding argument table, as we saw in the last post.

[getRenderEncoder() setVertexBuffer: uniforms offset: 0 atIndex: 1];
[getRenderEncoder() setFragmentBuffer: uniforms offset: 0 atIndex: 1];
[getRenderEncoder() setVertexBuffer: vertexArray offset: 0 atIndex: 0];

In Crimild, resources are set to the render encoder at different points in the render process by different entities. Data buffers, textures and samplers are usually handled by catalogs while uniform and constant buffers are handled by the MetalRenderer itself.

Specifying the render pipeline

We also need to associate a compiled render pipeline state to our encoder for use in rendering:

[getRenderEncoder() setRenderPipelineState: renderPipeline];

The Draw Call

Everything’s set. It’s time to execute the actual draw call.

Metal provides several draw methods depending on the primitives you want to render. Crimild uses indexed primitives by default, so the corresponding method is invoked in this step:

[getRenderEncoder() drawIndexedPrimitives: MTLPrimitiveTypeTriangle
                               indexCount: indexCount
                                indexType: MTLIndexTypeUInt16
                              indexBuffer: indexBuffer
                        indexBufferOffset: 0];

The first argument determines which type of primitive we are going to draw. In this case, we draw indexed triangles, specifying the index buffer used to interpret the vertices, which were passed to the render encoder before this call.

Ending the Rendering Pass

And then we reach the final point in the render process. To terminate a rendering pass, we invoke the endEncoding method on the active render encoder. Once the encoding has finished, you can start a new one on the same buffer if needed.

[getRenderEncoder() endEncoding];

Crimild automatically invokes the endEncoding method when unbinding framebuffers, ensuring that all render commands have been encoded by that point.

Once all command encoders have been described, our command buffer is committed and the drawable will be presented to the screen, as we saw in Part 1.
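For reference, that last step boils down to a couple of calls on the command buffer (a sketch using the same accessors as the earlier snippets; the synchronization details were covered in Part 1):

// Schedule presentation of the drawable, then hand the buffer to the GPU
[getRenderer()->getCommandBuffer() presentDrawable: getRenderer()->getDrawable()];
[getRenderer()->getCommandBuffer() commit];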

Side effects? What side effects?

If you’re familiar with Crimild, you might have noticed a little side effect (actually, a constraint) when working with the Metal-based renderer. Basically, it enforces the use of framebuffers, meaning that it only works with the forward render pass approach (or anything more complex than that). It wasn’t meant to be like that when I started. The original goal was to support every kind of render pass, regardless of whether or not it required offscreen rendering. In the end, having at least one offscreen framebuffer is the most natural way of working with Metal. At least for Crimild. So, no Metal for you unless you’re willing to pay the price.

On the plus side, working with a deferred render approach seems a lot easier now. I don’t have anything productive yet regarding such a technique (at least not in Metal), but it’s something that I want to do in the near future since it will bring a lot of benefits.

Wait, there’s more…

As I said at the beginning of this article, I’m not done with this series yet. At this point we will be able to render some objects to the screen but, if we only follow the steps discussed so far, the result will be a bit disappointing:

[Screenshot: the scene rendered without textures, labels or post-processing]

Where are the textures and labels? Where’s the post-processing effect? There are no menus either. You’re right. There are a lot of things yet to be discovered.

In the next post, we’re going to see how to handle textures and lighting in Metal, as well as describing alpha testing and other state changes.

To be continued…