Optimizing Render Graphs

Last week we talked about what render graphs are and how their modularity helps us build customizable pipelines for our projects.

But modularity is not the only benefit of render graphs. They also open the door to several optimizations in our pipeline.

REUSING ATTACHMENTS

Since each render pass may generate one or more FBOs (each containing several render targets), it would be great if we could find a way to reuse them and/or their attachments. Otherwise, we'll quickly run out of memory on the GPU.

How do we achieve reusability? Simple. Let's go back to the simple deferred lighting graph we saw in the previous post.

The Depth attachment is a full-screen 32-bit floating-point texture, and it's pretty much unique since no other attachment shares that texture format. We'll assume the rest of the attachments (normal, opaque, lighting, etc.) are also full screen, but use an RGBA8 color format.

By looking at the graph, it’s clear that the Normal attachment is no longer needed once we’ve accumulated all lighting information (since no other render pass makes use of it). Therefore, if we manage to schedule the passes correctly, we can reuse that attachment for storing the result of the translucent pass, for example.

And that's it. Thanks to our graph design, we can easily identify which inputs and outputs each render pass has at the time of its execution. We also know how many passes are linked to any given attachment.
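That bookkeeping can be sketched in a few lines. The names below are hypothetical, not Crimild's actual API: given an ordered list of passes with their read/write sets, recording the index of the last pass that reads each attachment tells us exactly when that attachment's memory becomes available for reuse.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A pass declares which attachments it reads and writes.
struct Pass {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// For each attachment, compute the index of the last pass that reads it.
// Once that pass has executed, the attachment can be recycled by any
// later pass that needs the same format and size.
std::map<std::string, int> lastReadIndex(const std::vector<Pass> &passes) {
    std::map<std::string, int> last;
    for (int i = 0; i < (int)passes.size(); ++i) {
        for (const auto &a : passes[i].reads) {
            last[a] = i;
        }
    }
    return last;
}
```

With the deferred lighting graph from the previous post, this reports that Normal is free right after the lighting pass, while Depth must survive until the translucent pass has run.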

There’s a catch, though.

Let’s assume we want to generate a debug view like this one:

Top row: depth, normal, opaque and translucent. Left column: opaque+translucent, sepia tint and UI

In order to achieve that image, we need to modify our render graph to make it look like this:

The final frame (Debug Frame) is created by the Debug render pass, which reads from several of the previously created attachments in order to compose the debug frame that is displayed. This prevents us from reusing attachments at all, because every one of them is now needed at all times. That might be an acceptable loss in this scenario, since it's only for debugging, but you definitely need to plan each dependency carefully if you want to maximize reusability.

For my implementation, I've decided to reuse only attachments, while FBOs are created and discarded on demand. This helps minimize GPU memory usage and provides maximum flexibility for creating offscreen buffers.
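One way to implement that reuse is a small pool keyed by attachment format and size. This is a minimal sketch under that assumption (again, hypothetical names, not Crimild's real classes): passes acquire a texture when they need one and release it once its last reader has executed, while FBOs referencing these textures are built and thrown away per pass.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Pool attachments by (format, width, height). Releasing a texture puts
// it on a free list, so a later pass that asks for the same spec gets
// the same GPU memory back instead of a new allocation.
struct AttachmentPool {
    using Key = std::tuple<int, int, int>; // format enum, width, height
    std::map<Key, std::vector<int>> free;  // free texture ids per spec
    int nextId = 0;
    int allocated = 0; // total textures ever created

    int acquire(int format, int w, int h) {
        auto &list = free[{format, w, h}];
        if (!list.empty()) {
            int id = list.back();
            list.pop_back();
            return id;
        }
        ++allocated;
        return nextId++;
    }

    void release(int format, int w, int h, int id) {
        free[{format, w, h}].push_back(id);
    }
};
```

In the deferred example, releasing the Normal attachment after lighting lets the translucent pass acquire the very same RGBA8 texture, so the whole graph runs with fewer allocations than it has attachments.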

DISCARDING NODES

Another advantage of using render graphs is that, at graph compilation time, we can identify which nodes are actually relevant to the final frame. That is, of all the nodes in the graph, we only want to keep and execute those that are connected to the final node, which represents the resulting frame for the graph.

For this reason, each render graph defines which attachment serves as the resulting frame for the entire process. Depending on which attachment is set as the final frame, some render passes become irrelevant and should be discarded.
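This pruning is a plain reachability walk. A sketch of the idea (names hypothetical): starting from the attachment chosen as the final frame, follow each attachment back to the pass that writes it, then follow that pass's inputs, and so on. Any pass the walk never reaches can be dropped.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct PassNode {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Walk backwards from the final frame attachment. A pass survives only
// if its output feeds, directly or indirectly, into the final frame.
std::set<std::string> relevantPasses(
    const std::vector<PassNode> &passes, const std::string &finalFrame) {
    // Map each attachment to the pass that writes it.
    std::map<std::string, const PassNode *> writerOf;
    for (const auto &p : passes)
        for (const auto &a : p.writes) writerOf[a] = &p;

    std::set<std::string> kept;
    std::vector<std::string> frontier = {finalFrame};
    while (!frontier.empty()) {
        auto attachment = frontier.back();
        frontier.pop_back();
        auto it = writerOf.find(attachment);
        if (it == writerOf.end()) continue; // external input, no writer
        const PassNode *p = it->second;
        if (!kept.insert(p->name).second) continue; // already visited
        for (const auto &in : p->reads) frontier.push_back(in);
    }
    return kept;
}
```

Selecting the scene attachment as the final frame makes the debug pass unreachable, so it is discarded; selecting the debug frame keeps everything.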

Once again, look at the debug render graph above, paying special attention to the debug nodes at the right.

We have two possible final frames. The one that only contains the scene (bottom center) and the debug one (bottom right).

If we set the scene frame as the resulting frame, then the Debug Pass will be discarded since its result is no longer relevant, and the final render graph will look like the one at the very top of this post. Then, after compiling the render graph, the passes will be executed as follows:

That's great, but why? Why add extra nodes that are going to be discarded anyway? Well, you shouldn't do that… except that doing so allows you to create something like an uber-pipeline, including debug nodes and different branches too. Then, by defining which attachment is the actual final frame (maybe using configuration flags), you can produce different pipelines from the same graph. I know, it might seem counterintuitive at first, but in practice it's really useful.

Closing comments

I’m going to leave it here for now, since this article has already become much longer than expected. 

Render graphs are kind of an experimental feature at the time of this writing, but I’m hoping they will become one of the key players in the next major version of Crimild. Together with shader graphs, they should help me create entire modular pipelines in plain C++ and forget about OpenGL/Metal/Vulkan (almost) completely. 

Now it’s time to prepare one more release before the year ends 🙂


Customizing render pipelines with render graphs

Attempting to work with advanced visual effects (like SSAO) and post-processing in Crimild has always been very painful. ImageEffects, introduced a while ago, were somewhat useful but limited to whatever information the (few) available shared frame buffers contained after rendering a scene. 

To make things worse, maintaining different render paths (i.e. forward, deferred, mobile) usually required a lot of duplicated logic and/or code, and sooner or later some of them just stopped working (at this point I still don't know why there's code for deferred rendering, since it has been broken for years at least).

Enter Render Graphs…

WHAT ARE RENDER GRAPHS?

Render graphs are a tool for organizing processes that take place when rendering a scene, as well as the resources (i.e. frame buffers) that are required to execute them.

It’s a relatively new rendering paradigm that achieves highly modular render pipelines which can be easily customized and extended. 

WHY ARE Render Graphs HELPFUL?

First of all, they provide high modularity. Processes are connected in a graph-like structure and are pretty much independent of each other. This means we can create pipelines by plugging lots of different nodes together.

Do you need a high fidelity pipeline for AAA games? Then add some nodes for deferred lighting, SSAO, post-processing and multiple shadow casters.

Do you have to run the game on low-end hardware or a mobile phone? Use a couple of forward lighting nodes and simple shadows. Do you really need a depth pre-pass?

In addition, a render graph helps with resource management. Each render pass may produce one or more textures, but do we really need as many textures as passes? Can we reuse some of them? All of them?

Finally, technologies like Vulkan, Metal and DX12 allow us to execute multiple processes in parallel, which is amazing. But that comes at the cost of having to synchronize those processes manually. A render graph helps identify synchronization barriers for those processes based on the resources they consume.
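The core of that analysis is a simple hazard check between pairs of passes. A minimal sketch, assuming each pass declares its read and write sets (hypothetical names, not any real Vulkan/Metal API): a barrier is needed when one pass writes an attachment the other touches, while passes that only share read-only inputs can be submitted in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct PassIO {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

static bool contains(const std::vector<std::string> &v, const std::string &s) {
    return std::find(v.begin(), v.end(), s) != v.end();
}

// A barrier is required on read-after-write, write-after-read, or
// write-after-write hazards; read-after-read is always safe.
bool needsBarrier(const PassIO &a, const PassIO &b) {
    for (const auto &w : a.writes)
        if (contains(b.reads, w) || contains(b.writes, w)) return true;
    for (const auto &w : b.writes)
        if (contains(a.reads, w)) return true;
    return false;
}
```

In the deferred graph below, the lighting pass must wait for the depth pass (it reads Depth and Normal), but the opaque and translucent passes only share the read-only Depth attachment, so they could in principle run in parallel.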

OK, BUT HOW DO THEY WORK?

Like I said above, a render graph defines a render pipeline by using processes (or render passes) and resources (or attachments), each of them represented as a node in a graph. Here’s a simple render graph implementing a (simplified) deferred lighting pipeline:

The graph is composed of two types of nodes: render passes (circles) and attachments (squares). Passes may read from zero, one or multiple attachments and write to at least one attachment. Attachments are the only way to connect passes together.

For example, in the image above, the Depth Pass produces two attachments: Depth and Normal. The latter is only needed for lighting accumulation, but the Depth attachment is used multiple times (by the lighting, opaque and translucent render passes).

Once lighting accumulation is complete, its result is blended together with the one produced by the opaque render pass. Then, we blend the resulting attachment with the one written by the translucent render pass to achieve the final image for the frame.

The following images show the final rendered frame (big image), as well as each of the intermediate attachments used by this pipeline. Notice that even the UI is rendered to its own texture.

Top row: depth, normal, opaque and translucent. Left column: opaque+translucent, sepia tint and UI

If you want to read more about render graphs, here are a couple of links to articles I used as reference for my own implementation:

In the coming weeks I'm going to explain how render graphs help optimize our pipeline by reusing attachments and discarding irrelevant passes.

Enjoy your coffee!

Crimild v4.9.0 is here!

I’m proud to announce that Crimild v4.9.0 is available now, including many new features and improvements:

New Animation System

I admit it. Crimild’s previous animation system was awful and pretty much useless. The new one allows for a lot of modern features like interpolation and blending (including support for additive blending). And, most importantly, it works with several animatable types, like single values, vectors, rotations, joints and so on.

animation

If you want to know more, I wrote a post about the new animation system some time ago that is worth checking out.

New Shader Graph

Modern shader development requires a way to write shader code without having to worry about whether we're working with OpenGL, OpenGL ES, or even newer APIs like Metal or Vulkan.

Crimild now supports a builder abstraction in the form of a shader graph, where you chain operations and values together, which can later be translated into, for example, GLSL code.
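To give a feel for the idea (and only that: this toy expression tree is my own illustration, not Crimild's actual shader graph API), operations can be modeled as nodes that know how to emit their own GLSL source, so a shader is built by composing C++ objects instead of writing GLSL strings by hand:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Base node: anything that can be translated to a GLSL expression.
struct Node {
    virtual ~Node() = default;
    virtual std::string toGLSL() const = 0;
};

// A named shader variable (e.g. a uniform or varying).
struct Var : Node {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    std::string toGLSL() const override { return name; }
};

// Multiplication of two sub-expressions.
struct Mul : Node {
    std::shared_ptr<Node> lhs, rhs;
    Mul(std::shared_ptr<Node> a, std::shared_ptr<Node> b)
        : lhs(std::move(a)), rhs(std::move(b)) {}
    std::string toGLSL() const override {
        return "(" + lhs->toGLSL() + " * " + rhs->toGLSL() + ")";
    }
};
```

A backend for another shading language would just add a different `to…` method per node, which is what makes the approach portable across APIs.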

If you want to know more about shader graphs… you’ll have to wait because I didn’t write anything yet.

SDL is back!

The SDLSimulation comes back to life from the ashes!

Well, not exactly. The SDL-based simulation now makes use of the SDL2 libraries, which are a big step forward from the classic ones used by Crimild in the past (in the SourceForge/SVN era).

Why am I using SDL again? Because it makes sense. I was using GLFW for window and input management and SFML for audio, which was overkill and not really helpful beyond PC and Mac. SDL supports iOS and Android and works well with Emscripten too. So having a single library for multiple platforms makes sense and greatly simplifies things.

As a side effect, I’m deprecating GLFW and SFML support and they will be removed in the next major version (some point next year).

Emscripten Support

What? Emscripten too? Yes! Emscripten is now officially supported by Crimild using CMake. Assets can be automagically bundled into the resulting web package and there’s even audio support thanks to SDLMixer2.

Here’s a live demo. Check it out!

Bear in mind that Emscripten support is still experimental, but I'm planning to keep improving it in future releases, of course.

And many more!

There are a lot of other features, minor updates and modifications. Check out the full release notes for Crimild v4.9.0 on GitHub to learn more. All demo projects have been updated as well to use the latest version.

As usual, feel free to check the code and make comments.

Have fun!