A while back I made an attempt at writing a Metal renderer for iOS. Although the triangle demo worked out fine, the truth is it lacked many features and was more of a proof of concept than a production-ready solution. Now that Le Voyage is out, I wanted to give Metal another shot. I was expecting some sort of improvement in both coding and performance, and now that it is [mostly] completed, I have to admit the results exceeded every expectation. And that’s basically why I decided to write about that experience (and because I need to prepare a talk for a future iOS Meetup).
- Prelude: About Le Voyage, OpenGL and Metal
- Part 1: Rendering a frame in Le Voyage
- Part 2: Geometry pass, vertices and uniforms
- Part 3: Metal Shading Language
- Part 4: Render Encoders and the Draw Call
- Part 5: Texture & Lighting
- Part 6: Post-processing
- Part 7: Profiling and Optimizing our App
Wait, what’s “Le Voyage”?
In order to talk about how Metal support in Crimild came to life, I first need to give some context. “Le Voyage” is a game for iOS and tvOS that was published early this year. Inspired by the works of Georges Méliès and Jules Verne, the game has you traveling towards the Moon while avoiding obstacles. Here’s a trailer of the game for those of you who haven’t heard about it:
For the rest of this series of posts I’m going to focus only on the rendering aspects of the game, particularly on what drove me to create a Metal-based renderer and what benefits (and headaches, or rather challenges) it brings to Crimild. Think of this as a sort of postmortem rather than a Metal tutorial, although I am going to explain some basic concepts along the way.
Houston, we have a problem
Le Voyage was released on iOS first, but it wasn’t until I started working on the tvOS release that I noticed, to my surprise, a significant performance degradation. When playing the game, the frame rate was never above 15 fps, which is less than half of what I was targeting on an iPhone 5s (yes, the original OpenGL-based version of Le Voyage ran at 30 fps in the best scenario).
Assuming the problem was related to the poorly optimized post-processing effects (more on that later), I tried downsampling the screen buffer to 720p (instead of targeting 1080p, which is the native resolution for an Apple TV). That did make the game run at playable frame rates, but it was still not good enough.
At that point I had to make a choice and, since I always knew that a Metal-based version of Le Voyage was going to be a reality eventually, I jumped right into the new API instead of going deeper into OpenGL optimization (which means I still have work to do on that front).
It’s not my intention to bore you with yet another introduction to Metal, but there are some concepts that need to be introduced in order for the rest of the posts to make sense. I’m going to focus mostly on those things that are different from OpenGL and why I think they are a game changer.
Metal is a low-overhead, high-performance 3D rendering API for the newest Apple devices based on the A7/A8 chips and running iOS 8, tvOS 9 and OS X 10.11 or above. Metal follows a new paradigm in rendering APIs, one which dramatically reduces extensive state checking and validation in favor of precompiled render states and commands, allowing a higher number of draw calls at run-time. More draw calls per frame lead to more unique objects, more visual variety and essentially more freedom for designers and artists.
Each draw call requires its own state, including vertices, shaders, textures, constants, render targets, etc. Changing this state at run-time is quite costly for both the CPU and the GPU since we have to translate it to hardware commands, so we need to make sure that we do this at the right time and we don’t repeat ourselves (leading to redundant state changes).
And here lies one of the key differences between Metal and APIs like OpenGL: timing. I’m not talking about how fast things happen at run-time, but rather about when things happen during the execution of the program.
OpenGL is a forward state machine (although this might not be true in some implementations), meaning that, in order to draw something on the screen, you set (and un-set) a series of states at the moment of rendering. That alone increases the complexity of each draw call and usually leads to a lot of redundant state changes. For example, Crimild’s policy defines that for every state change, we need to revert those changes after rendering in order to ensure that draw calls are independent of one another. The compromise here is that we’re avoiding a lot of headaches and debugging at the expense of performance. That’s why objects are grouped by materials, so we reduce the number of state changes as much as possible.
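To put rough numbers on that trade-off, here is a minimal C++ sketch. It is not Crimild's actual code and it does not call OpenGL: `MockContext`, `Drawable` and both draw functions are hypothetical names made up for illustration, and the "context" only counts how many times the bound material changes. It assumes that a single material id stands in for all the state a draw call needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a GL-style context. It only counts state
// changes; it does not talk to any real graphics API.
struct MockContext {
    int boundMaterial = -1;        // -1 means "default state"
    std::size_t stateChanges = 0;

    void bind(int material) {
        if (material != boundMaterial) {
            boundMaterial = material;
            ++stateChanges;
        }
    }

    void unbind() { bind(-1); }    // revert to the default state
};

struct Drawable {
    int material;                  // hypothetical material id, must be >= 0
};

// Set state, draw, revert, for every single object.
std::size_t drawEachIndependently(const std::vector<Drawable>& objects) {
    MockContext ctx;
    for (const Drawable& d : objects) {
        assert(d.material >= 0);
        ctx.bind(d.material);
        // ... the draw call would be issued here ...
        ctx.unbind();
    }
    return ctx.stateChanges;
}

// Group objects by material, then set and revert state once per group.
std::size_t drawGroupedByMaterial(std::vector<Drawable> objects) {
    std::stable_sort(objects.begin(), objects.end(),
                     [](const Drawable& a, const Drawable& b) {
                         return a.material < b.material;
                     });
    MockContext ctx;
    std::size_t i = 0;
    while (i < objects.size()) {
        const int material = objects[i].material;
        assert(material >= 0);
        ctx.bind(material);
        while (i < objects.size() && objects[i].material == material) {
            // ... the draw call would be issued here ...
            ++i;
        }
        ctx.unbind();
    }
    return ctx.stateChanges;
}
```

With six objects alternating between two materials, the first function performs twelve state changes (one set and one revert per object) while the grouped version performs four. Draw calls still start from a clean state in both cases, which is the property the revert policy is meant to protect.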
In contrast, Metal allows us to do the most expensive things less often. It expects the program to describe and precompile each draw call into pipelines and render passes, which can be rapidly switched at run-time with minimal API overhead by just specifying, for example, which pipeline to use. There’s [almost] no need to change specific states at draw time anymore.
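The "describe once, switch cheaply" idea can be sketched without touching the real API. The C++ below is a CPU-only mock under stated assumptions: `PipelineDescriptor`, `PipelineLibrary` and `Encoder` are hypothetical types invented for this example (they only loosely mirror Metal's pipeline descriptors and render encoders), and the "compile" step just stores the descriptor where a real driver would validate and translate it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of the state a draw call needs.
struct PipelineDescriptor {
    std::string vertexFunction;
    std::string fragmentFunction;
    bool depthTestEnabled = true;
    bool blendingEnabled = false;
};

// Immutable result of the (supposedly expensive) compile step.
struct Pipeline {
    PipelineDescriptor descriptor;
};

class PipelineLibrary {
public:
    using Handle = std::size_t;

    // Called at load time: the costly work happens here, once per pipeline.
    Handle compile(PipelineDescriptor descriptor) {
        pipelines_.push_back(Pipeline{std::move(descriptor)});
        return pipelines_.size() - 1;
    }

    const Pipeline& get(Handle handle) const {
        assert(handle < pipelines_.size());
        return pipelines_[handle];
    }

    std::size_t size() const { return pipelines_.size(); }

private:
    std::vector<Pipeline> pipelines_;
};

// Mock encoder: at draw time we only pick a precompiled pipeline by handle.
class Encoder {
public:
    explicit Encoder(const PipelineLibrary& library) : library_(library) {}

    void setPipeline(PipelineLibrary::Handle handle) {
        assert(handle < library_.size());
        current_ = handle;
        hasPipeline_ = true;
    }

    void draw() {
        assert(hasPipeline_);      // drawing without a pipeline is a bug
        ++drawCount_;
    }

    const Pipeline& currentPipeline() const {
        assert(hasPipeline_);
        return library_.get(current_);
    }

    std::size_t drawCount() const { return drawCount_; }

private:
    const PipelineLibrary& library_;
    PipelineLibrary::Handle current_ = 0;
    bool hasPipeline_ = false;
    std::size_t drawCount_ = 0;
};
```

A caller would compile, say, an opaque and a translucent pipeline while loading, and then call only `setPipeline` and `draw` inside the frame loop. However many frames are rendered, the library still holds exactly the pipelines compiled up front.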
Maybe one of the most disruptive changes in Metal is the fact that shaders can (and should) be precompiled at build time. Let me repeat that: shaders are compiled at the same time you’re compiling and bundling the app itself. Not even when loading assets (although that’s also possible). This alone provides a huge benefit not only for application performance but also for finding and debugging errors at build time. I can’t tell you how many times in OpenGL entire frames were discarded just because of a simple syntax error. Also worth mentioning are the excellent profiling tools provided by Xcode, which even let us find out which sections of our shader code are taking the most time to compute. I’ll talk more about shaders and profiling in later posts.
Concerning resources, like textures and buffers, Metal stores them in memory shared between the CPU and the GPU. No more waiting for data to be sent from one to the other. This is especially true for shader uniforms, which can now be specified in bulk using a single buffer instead of setting them one by one. If needed, we can also define memory buffers stored only on the GPU.
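As a rough illustration of what "uniforms in bulk" means on the CPU side, here is a small C++ sketch. It makes several assumptions: the `Uniforms` fields are invented for the example, `SharedBuffer` is a hypothetical stand-in backed by a plain `std::vector` (a real Metal buffer exposes its shared memory through `contents` and `length`, which is what this mock imitates), and the 16-byte size rule is a conservative choice for the sketch rather than a statement of Metal's exact layout requirements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical uniform block. Field names are illustrative only.
struct Uniforms {
    float modelViewProjection[16];
    float tint[4];
    float time;
    float padding[3];  // explicit padding keeps the size a multiple of 16 bytes
};

static_assert(std::is_trivially_copyable<Uniforms>::value,
              "Uniforms must be safe to copy with memcpy");
static_assert(sizeof(Uniforms) % 16 == 0,
              "Uniforms size should stay a multiple of 16 bytes");

// Stand-in for a buffer visible to both CPU and GPU. Here it is just
// a block of bytes owned by the CPU.
class SharedBuffer {
public:
    explicit SharedBuffer(std::size_t length) : bytes_(length) {}

    void* contents() { return bytes_.data(); }
    const void* contents() const { return bytes_.data(); }
    std::size_t length() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};

// One copy updates every uniform at once, instead of one call per value.
void uploadUniforms(SharedBuffer& buffer, const Uniforms& uniforms) {
    assert(buffer.length() >= sizeof(Uniforms));
    std::memcpy(buffer.contents(), &uniforms, sizeof(Uniforms));
}
```

The point of the design is that the shader side reads one struct with a matching layout, so adding a new uniform means adding a field rather than another per-value call each frame.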
Oh, and Metal is multi-threaded by design, meaning that we can send render commands from multiple background threads if needed. Yep, rendering is no longer the domain of the main thread.
Metal is great and provides a lot of benefits, yes, but it does require a change of mindset in the way we describe rendering algorithms. Particularly for Crimild, supporting both Metal and OpenGL meant that I had to bend several rules and policies related to how objects are sent to the renderer and the way resource catalogs behave. In some cases, translating concepts from Crimild to Metal was not straightforward, leading to some pretty ugly tricks along the way. It was a challenging process and there’s still plenty of room for improvement.
And so the journey begins
I’m going to leave things here and I’ll start talking about each of the concepts in the next posts. But before I go, I want to point you to Warren Moore’s site, which provides a lot of extremely useful information about Metal. It was a great place for me to learn about the API beyond the official documentation and it helped me tremendously when implementing Crimild’s Metal-based renderer. Also worth mentioning is Ray Wenderlich’s Metal series of articles, which allowed me to learn some Swift as a side effect.
As a final note, I want to mention that the latest versions of iOS, tvOS and OS X brought further additions to the Metal ecosystem, like MetalKit, Model I/O and Metal Performance Shaders. At the time of this writing I’m not making use of them, yet I do have plans to incorporate at least the last one into Crimild in a future version.