Color masks and occluders

One of the newest features that will be included in the next major release of Crimild (coming soon) is support for color masks and invisible occluders in our scenes.

Occluders are objects that block the visibility path, fully or partially hiding whatever is beyond them. An invisible occluder behaves the same way: the object itself is not drawn, but it still prevents objects behind it from being rendered.

For example, in the following scene the teapot (in yellow) is an occluder. Other objects are orbiting around it and the scene is rendered normally.

[Figure: original scene with color mask enabled for all channels]

By playing around with the color mask and turning it off for all channels, we can prevent the teapot itself from being drawn, yet it still blocks the objects passing behind it. The green plane is not affected by this behavior (that’s on purpose).
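
The exact setup depends on the renderer, but as an illustrative sketch (not Crimild's actual API), an invisible occluder boils down to disabling writes for all color channels while keeping depth writes enabled, for example with OpenGL:

// Illustrative sketch (not Crimild's actual code): rendering an invisible occluder
glColorMask( GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE ); // nothing is written to the color buffer
glDepthMask( GL_TRUE );                                // but the depth buffer is still updated

drawOccluder(); // hypothetical draw call for the teapot geometry

glColorMask( GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE );     // restore color writes for the rest of the scene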

The effect is a pretty cool one and has a lot of applications in games and simulations. It’s especially useful in augmented reality to mix real-life objects with virtual ones.

 

Praise the Metal – Part 6: Post-processing

I knew from the start that Le Voyage needed some kind of distinctive feature in order to be recognized out there. The gameplay was way too simple, so I focused on presentation instead. Since the game is based on early silent films, using some sort of post-processing effect to render noise and scratches was a must-have.

About Image Effects

In Crimild, image effects implement post-processing operations on a whole frame. Ranging from simple color replacement (as in sepia effects) to techniques like SSAO or Depth of Field, image effects are quite powerful things.

Le Voyage makes use of a simple image effect to achieve that old-film look.

[Screenshot: Le Voyage with the old-film image effect applied]

There are four different techniques applied in the image above:

  1. Sepia tone: All colors are mapped to sepia values, which is that brownish tint. No more blues, reds or grays. Just different sepia levels.
  2. Film grain: noise produced in film due to changes in luminosity (or so I was told).
  3. Scratches: as film degrades over time, old movies display scratches, burns and other kinds of artifacts.
  4. Vignetting: the corners of the screen look darker than the center, simulating the dimming of light. This technique is usually employed to frame important objects in the center of the screen, as in close-ups.

All of these effects are applied in a single pass for best performance.
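
Just to illustrate the kind of per-pixel math involved (this is not the actual shader used in Le Voyage, which is based on the article mentioned later in this post), the single pass boils down to something like the following, written here as plain C++ for clarity:

// Rough per-pixel sketch of the old-film effect (illustrative only).
// 'u' and 'v' are normalized screen coordinates in [0, 1], 'noise' is a random value in [0, 1].
struct Color { float r, g, b; };

Color oldFilmEffect( Color in, float u, float v, float noise )
{
   // 1. Sepia tone: compute the luminance and tint it towards brown
   float luminance = 0.299f * in.r + 0.587f * in.g + 0.114f * in.b;
   Color out = { luminance * 1.2f, luminance * 1.0f, luminance * 0.8f };

   // 2. Film grain: perturb the result with random noise
   float grain = ( noise - 0.5f ) * 0.1f;
   out.r += grain; out.g += grain; out.b += grain;

   // 3. Scratches: would be sampled from an animated scratch texture (omitted here)

   // 4. Vignetting: darken pixels far away from the center of the screen
   float dx = u - 0.5f, dy = v - 0.5f;
   float vignette = 1.0f - 1.2f * ( dx * dx + dy * dy );
   out.r *= vignette; out.g *= vignette; out.b *= vignette;

   return out;
}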

How does it work?

Crimild implements post-processing using a technique called ping-pong, where two buffers are switched back-and-forth, serving as both source and accumulation while processing image effects.

A scene is rendered into an off-screen framebuffer, designated as the source buffer. For each image effect, the source buffer is bound as a texture and used to fetch the pixel data that serves as input for the effect. The image effect is computed and rendered into a second buffer, known as the accumulation buffer. Then, source and accumulation are swapped and, if there are more image effects to apply, the process starts again.

When there are no more image effects to be processed, the source buffer will contain the final image that will be displayed on the screen.
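
A minimal sketch of that loop could look like this (hypothetical names, not Crimild's actual implementation):

// Ping-pong sketch: 'source' initially holds the rendered scene; each effect
// reads from it and writes into 'accumulation', then the two buffers are swapped
FramebufferObject *source = sceneBuffer;
FramebufferObject *accumulation = auxBuffer;

for ( auto imageEffect : imageEffects ) {
   imageEffect->apply( source, accumulation ); // read from source, write into accumulation
   std::swap( source, accumulation );          // the result becomes the next effect's input
}

presentToScreen( source ); // after the loop, 'source' holds the final image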

Confused? Then the following image may help you (spoiler alert: it won’t):

[Diagram: ping-pong buffers. For each image effect, the source and destination buffers are swapped; once all effects have been processed, the source buffer contains the final image.]

Le Voyage only uses one image effect that applies all four stages in a single pass, so no additional post-processing is required. If you want to know more about implementing the old-film effect, you can check this great article by Nutty.ca, on which I based mine.

Powered by Metal

Theoretically speaking, there’s no difference in how this effect is applied in Metal or OpenGL, as the same steps are required in both APIs. In practice, the Metal API is a bit simpler when it comes to handling framebuffers, so the code ends up more readable. And there’s no need for all the explicit error checking that OpenGL requires.

If you recall from a previous post where we talked about FBOs, I said that we need to define a texture for our color attachment. At the time, we did something like this:

renderPassDescriptor.colorAttachments[ 0 ].texture = getDrawable().texture;

That code sets the texture associated with a drawable to the first color attachment. This will render everything on the drawable itself, which in our case was the screen.

But in order to perform post-processing, we need an offscreen buffer. That means that we need an actual output texture instead:

MTLTextureDescriptor *textureDescriptor = [MTLTextureDescriptor 
   texture2DDescriptorWithPixelFormat: MTLPixelFormatBGRA8Unorm
                                width: /* screen width */
                               height: /* screen height */
                            mipmapped: NO];
 
id< MTLTexture > texture = [getDevice() newTextureWithDescriptor:textureDescriptor];

renderPassDescriptor.colorAttachments[ 0 ].texture = texture;

The code above is executed when generating the offscreen FBO. It creates a new texture object with the screen dimensions and sets it as the output for the color attachment. Notice that we don’t pass any data as the texture’s image, since we’re going to write into it.

Keep in mind we need two offscreen FBOs created this way, one that will act as source and the other as accumulation.

Once set, binding this offscreen FBO will make our rendering code draw everything onto the texture itself instead of the screen. This buffer becomes our source FBO.

Then we render the image effect using a quad primitive with that texture as input, producing the final image into the accumulation FBO, which is then presented on the screen since there are no more image effects to apply.
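
A simplified sketch of what that pass might look like with Metal (the pipeline, quad buffer and command buffer are hypothetical helpers, not Crimild's actual objects):

// Sketch of a single image-effect pass: read from the source texture and
// render a full-screen quad into the accumulation texture
MTLRenderPassDescriptor *effectPass = [MTLRenderPassDescriptor renderPassDescriptor];
effectPass.colorAttachments[ 0 ].texture = accumulationTexture; // offscreen destination
effectPass.colorAttachments[ 0 ].loadAction = MTLLoadActionClear;
effectPass.colorAttachments[ 0 ].storeAction = MTLStoreActionStore;

id< MTLRenderCommandEncoder > encoder = [commandBuffer renderCommandEncoderWithDescriptor: effectPass];
[encoder setRenderPipelineState: imageEffectPipeline];            // compiled image-effect shaders
[encoder setVertexBuffer: quadVertexBuffer offset: 0 atIndex: 0]; // full-screen quad geometry
[encoder setFragmentTexture: sourceTexture atIndex: 0];           // the rendered scene as input
[encoder drawPrimitives: MTLPrimitiveTypeTriangleStrip vertexStart: 0 vertexCount: 4];
[encoder endEncoding];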

Again, this is familiar territory if you already know how to do offscreen rendering in OpenGL. Except for a tiny little difference…

Performance

When I started writing this series of posts, I mentioned that one of the reasons for me to work with Metal was that the game’s performance was poor on the Apple TV. More specifically, I assumed that the post-processing step was consuming most of the processing resources for a single frame. And I wasn’t completely wrong about that.

Keep in mind that post-processing is always a very expensive technique, both in memory and processing requirements, regardless of the API. In fact, the game is almost unplayable on older devices like an iPod Touch or iPhone 4s just because of this effect (and those devices don’t support Metal). Other image effects, like SSAO or DOF are even more expensive, so you should try and keep the post-processing step as small as possible.

I’m getting ahead of myself, since optimizations will be the subject of the next post, but it turns out that Metal’s performance boost over OpenGL not only lets you display more objects on screen, but also allows more complex post-processing effects to be applied in real-time. Even without optimizing the image effect’s shaders, I noticed a 50% increase in performance just by switching APIs. That was amazing!

Et Voilà

And so the rendering process is now complete. We started with a blank screen, then proceeded to draw objects into an offscreen buffer and, finally, applied a post-processing effect to achieve the old-film style that gives the game its unique look. It was quite the trip, right?

In the next and (almost) final entry in this series we’re going to see some of the amazing tools that will allow us to profile applications based on Metal, as well as some of the most basic optimization techniques that lead to great performance improvements.

To be continued…

Praise the Metal – Part 1: Rendering a single frame

In my previous post I gave an overview of the key concepts of Metal and how they differ from OpenGL. Now I’m going to describe the rendering process for a single frame in Le Voyage, as well as introduce the high-level architecture of the Metal Renderer implementation.

[Screenshot: a frame from Le Voyage]

As mentioned before, Metal lets us do the most expensive things less often, and that’s why this post is split into two big sections: Initialization and Rendering.

Initialization

Before going deeper into the rendering process, I’m going to mention a couple of things that need to be properly set up from the start in order for us to render anything on the screen. These steps should be performed as few times as possible, usually at the beginning of the program or, for example, when the encapsulating View Controller is created/loaded.

Creating a Device

In Metal, the MTLDevice protocol is an abstraction of the GPU and can be considered the root of the Metal API.

    
_device = MTLCreateSystemDefaultDevice();
if ( _device == nullptr ) {
   // fallback to OpenGL renderer
}

From now on, every object that we need (like command buffers, data buffers, textures, etc.) will be created using this MTLDevice instance.
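
For instance (just to illustrate the point, these calls aren’t tied to any particular place in Crimild):

// A few examples of resources created through the MTLDevice instance
id< MTLCommandQueue > commandQueue = [_device newCommandQueue];
id< MTLLibrary > library = [_device newDefaultLibrary]; // compiled shader functions
id< MTLBuffer > vertexBuffer = [_device newBufferWithLength: 1024
                                                    options: MTLResourceCPUCacheModeDefaultCache];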

As a side note, if you want to know whether your system (like your iPhone) supports Metal, you only need to check if a device can be created. If not, you can assume Metal is not supported and you should fall back to, for example, OpenGL for rendering. Or, you know, exit(1)…

In Crimild, a device is created when initializing a CrimildMetalView and, if successful, it continues the configuration process by instantiating the MetalRenderer. Which brings me to the next topic.

Keep in mind that if you’re running on OS X, you may end up with more than one device depending on your system’s capabilities.

Create a Metal-based View

Much like when working with OpenGL, in order for Metal to work you need a UIView and the right layer class. iOS developers know that if you want anything drawn on the screen, it needs to be part of the Core Animation layer tree. Metal provides a new layer type for this: CAMetalLayer.

@implementation CrimildMetalView

+ (Class) layerClass
{
    return [CAMetalLayer class];
}

@end

In Crimild, the Metal view implementation is pretty straightforward. It’s the job of the encapsulating View Controller to add it to the view hierarchy and update it using the CADisplayLink class. That allows the same view controller to easily work with both Metal and OpenGL.
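
A minimal sketch of how that view controller could drive the updates (hypothetical names, not Crimild's actual code):

// Sketch: the encapsulating view controller schedules per-frame updates with CADisplayLink
- (void) viewDidLoad
{
   [super viewDidLoad];

   CrimildMetalView *metalView = [[CrimildMetalView alloc] initWithFrame: self.view.bounds];
   [self.view addSubview: metalView];

   CADisplayLink *displayLink = [CADisplayLink displayLinkWithTarget: self
                                                            selector: @selector( renderFrame: )];
   [displayLink addToRunLoop: [NSRunLoop mainRunLoop] forMode: NSDefaultRunLoopMode];
}

- (void) renderFrame: (CADisplayLink *) sender
{
   // hypothetical call into the engine: update the simulation and render the next frame
}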

Renderer Configuration

The CrimildMetalView implementation is also responsible for instantiating the MetalRenderer class, provided Metal is supported, of course.

As in any other Renderer implementation, the MetalRenderer class provides a configuration mechanism, which in this case will set up both the layer and the command queue as follows:

void MetalRenderer::configure( void )
{
   Renderer::configure();
   _layer = (CAMetalLayer *) getView().layer;
   _layer.contentsScale = [[UIScreen mainScreen] scale]; // 1
   _layer.pixelFormat = MTLPixelFormatBGRA8Unorm;
   _layer.framebufferOnly = true;
   _layer.presentsWithTransaction = false;
   _layer.drawsAsynchronously = true;
   _layer.device = _device;

   _commandQueue = [getDevice() newCommandQueue]; // 2
}

The MTLCommandQueue protocol defines a queue for commands buffers. Each MTLCommandBuffer implementation will contain the translated GPU commands based on what we’re trying to do. Usually, we use command buffers to draw, but Metal provides other types of buffers too, like compute or blit buffers. Additionally, implementations of the MTLCommandEncoder protocol are the ones that perform the translations of our pipelines into GPU commands. Am I going too fast? Don’t worry, I’ll talk about these guys a lot in later posts.
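
In a nutshell, the relationship between these types looks like this (a generic sketch, outside of Crimild's renderer; the render pass descriptor is assumed to exist):

// The queue hands out command buffers; encoders translate our work into GPU
// commands, which are stored in the buffer and then committed for execution
id< MTLCommandBuffer > commandBuffer = [_commandQueue commandBuffer];

id< MTLRenderCommandEncoder > encoder = [commandBuffer renderCommandEncoderWithDescriptor: renderPassDescriptor];
// ... encode draw calls here ...
[encoder endEncoding];

[commandBuffer commit]; // send the translated commands to the GPU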

Ok, everything’s set up. Let’s move on to the drawing phase.

Rendering a Frame

In Le Voyage, a single frame may contain several different types of entities, such as opaque and translucent game objects, lights, textures, post-processing effects, and so on. Each of those objects needs to be processed in a specific order, which can be summarized as follows:

  1. Prepare the frame and clear the screen
  2. Render opaque objects first
  3. Render transparent and overlay objects
  4. Composite the scene with all objects
  5. Apply post-processing effects and display the result

For the sake of brevity, in the rest of this post I’m going to explain only the first and last steps, leaving the details of rendering individual objects and applying image effects for later entries in this series.

Begin Frame

void MetalRenderer::beginRender( void )
{
   Renderer::beginRender();
   _commandBuffer = [_commandQueue commandBuffer]; // 1
}

The very first step in our render loop is to obtain the next available command buffer from the command queue, so we can start pushing commands into it.

Rendering individual objects

Each object that is going to be rendered on screen must do so by specifying a MTLRenderPipelineState object, containing compiled shaders, vertex attributes and uniforms, among other things. Rendering different objects then only requires switching to the corresponding pipeline state, avoiding the extra overhead of changing individual states.
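
Building one of those pipeline objects looks roughly like this (a hedged sketch with hypothetical shader names, not Crimild's actual code):

// Rough sketch of building a render pipeline state
MTLRenderPipelineDescriptor *descriptor = [MTLRenderPipelineDescriptor new];
descriptor.vertexFunction = [library newFunctionWithName: @"crimild_vertex_shader"];     // hypothetical name
descriptor.fragmentFunction = [library newFunctionWithName: @"crimild_fragment_shader"]; // hypothetical name
descriptor.colorAttachments[ 0 ].pixelFormat = MTLPixelFormatBGRA8Unorm;

NSError *error = nil;
id< MTLRenderPipelineState > pipeline = [_device newRenderPipelineStateWithDescriptor: descriptor error: &error];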

I’ll be talking about MTLRenderPipelineState in the next post.

Presenting the Frame

Once our objects have been drawn and all effects applied, the frame is ready to be displayed:

void MetalRenderer::presentFrame( void )
{
   _drawable = [getLayer() nextDrawable]; // 1

   Renderer::presentFrame(); // 2

   [_commandBuffer presentDrawable: _drawable]; // 3
   [_commandBuffer commit]; // 4
}

We do so by requesting the layer to provide a drawable (basically, a texture). I’ll explain why we do this at this step in the final posts, where I’m going to talk about optimizations, but for now just keep in mind that this is a blocking operation and it should be done only when you’re actually going to draw something.

Once a drawable has been acquired, we render the frame into it, then we tell Core Animation to present the drawable and, finally, we commit the command buffer to the command queue, to be processed as soon as possible.

And that’s it. Our frame is displayed and we’re ready to work on the next one.

Buffer Synchronization

There are several things missing from the process described above, but I want to focus on a special one right away: buffer synchronization. In a perfect world, we could create as many buffers as we want without having to worry about the actual resources. In practice, we should create only a finite number of buffers and try to reuse them in order to avoid wasting memory.

Yet, a problem arises when trying to reuse buffers (or any kind of resource in a multi-threaded environment). Remember that I said Metal was multithreaded by design? Well, it comes with a price. Check the following diagram:

[Diagram: two command buffers reused without synchronization; buffer #0 is needed again while the GPU is still using it]

The diagram above shows two buffers being reused. Can you spot the problem? (hint: it’s highlighted with a friendly red box). First, we encode buffer #0 and dispatch it to the GPU. Then, we encode and dispatch buffer #1. After that, we cannot reuse buffer #0 right away because it’s still being used by the GPU. The solution?

[Diagram: the CPU waits for the GPU to release a command buffer before encoding new work into it]

Simple, right? We just need to wait for at least one buffer to be released by the GPU in order to continue encoding new ones.

As suggested by the Metal documentation, I’m using a dispatch_semaphore_t instance to implement the waiting step. In the setup process, the semaphore is created with the maximum number of buffers we need.

void MetalRenderer::configure( void )
{
   /* ... */
   _inflightSemaphore = dispatch_semaphore_create( CRIMILD_METAL_IN_FLIGHT_COMMAND_BUFFERS );
}

Then, before starting a new frame, we wait for at least one buffer to be available.

void MetalRenderer::beginRender( void )
{
   /* ... */
   dispatch_semaphore_wait( _inflightSemaphore, DISPATCH_TIME_FOREVER );
}

Finally, when committing a command buffer, we can use a handler to signal the semaphore in order to indicate that the resource is no longer used:

void MetalRenderer::presentFrame( void )
{
   /* ... */
   __block dispatch_semaphore_t dispatchSemaphore = _inflightSemaphore;
   [_commandBuffer addCompletedHandler:^(id<MTLCommandBuffer>) {
       dispatch_semaphore_signal( dispatchSemaphore );
   }];

    [_commandBuffer presentDrawable: _drawable];
    [_commandBuffer commit];
}

The MetalRenderer class in Crimild uses a triple-buffering technique. That is to say, at any point in time there are at most three command buffers in use for rendering.
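
Given that, the constant used when creating the semaphore is presumably defined along these lines:

// Presumed value, based on the triple-buffering description above
#define CRIMILD_METAL_IN_FLIGHT_COMMAND_BUFFERS 3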

Time for a break

The above description is quite simplistic, but it covers the most important aspects of rendering a single frame. If you have the feeling that Metal is a little more cumbersome to set up than OpenGL, that’s because it actually is. We never had to worry about devices or synchronization when using OpenGL, right? Metal provides blazing-fast context switching at draw time at the expense of a more verbose configuration and a more careful execution.

Like I said before, we need to start thinking things differently. When shaders became mainstream (around 2004, maybe?), we needed to do all the transform and lighting calculations ourselves in order to truly take advantage of them, which added more complexity to our programs. Well, I think the same can be said for Metal (and the newest APIs, like Vulkan too). No pain, no gain…

In the next post we’re going to start filling in the blanks in the above algorithm and present the mechanisms to actually draw some objects on the screen.

To be continued…