Long Live and Render (VI)

In my last post I made it clear that there were several problems with my latest frame graph changes. Here I am today, a couple of weeks later, and I'm going to tell you how I managed to fix all three of them (well, two and a half), as well as make some bonus improvements along the way.

Removing strong references

I made frame graphs keep strong references to resources and render passes because it made sense at the time. But if a resource (like a texture or a vertex buffer) is no longer attached to a scene, there's no point in keeping it alive since it won't be rendered anyway, right?

This problem was pretty easy to solve, actually.

I only had to replace strong smart pointers with weak ones in most places, preventing any explicit or implicit strong reference to resources and render passes. Notice that I said *most* places, since the frame graph does allocate some internal objects (in the form of nodes) and I do need strong references for those.

There's an obvious side effect, though. Developers are now required to keep track of all the objects created in their apps, because the engine no longer keeps them alive on their behalf. Otherwise, they'll end up with crashes due to null pointers or invalid access to deleted memory. It's an acceptable price to pay.
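To make the ownership change concrete, here's a minimal C++ sketch. Resource and FrameGraph are hypothetical stand-ins, not crimild's actual classes: the graph only stores std::weak_ptr, so once the application releases its last std::shared_ptr the entry expires and the graph can skip or prune it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for anything the frame graph tracks
// (a texture, a vertex buffer, a render pass...).
struct Resource {
    int id = 0;
};

// Sketch of a frame graph that only observes resources through
// std::weak_ptr, so it never extends their lifetime.
class FrameGraph {
public:
    void add( const std::shared_ptr< Resource > &resource )
    {
        assert( resource != nullptr );
        m_resources.push_back( resource ); // stored as a non-owning weak_ptr
    }

    // Number of tracked resources that are still alive.
    std::size_t countAlive( void ) const
    {
        return static_cast< std::size_t >( std::count_if(
            m_resources.begin(),
            m_resources.end(),
            []( const std::weak_ptr< Resource > &r ) { return !r.expired(); } ) );
    }

    // Forgets about resources that have already been destroyed.
    void prune( void )
    {
        m_resources.erase(
            std::remove_if(
                m_resources.begin(),
                m_resources.end(),
                []( const std::weak_ptr< Resource > &r ) { return r.expired(); } ),
            m_resources.end() );
    }

    std::size_t size( void ) const { return m_resources.size(); }

private:
    std::vector< std::weak_ptr< Resource > > m_resources;
};
```

If the application drops its shared_ptr right after calling add(), countAlive() immediately goes back to zero. That's the price mentioned above: keeping objects alive is now the application's job.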

I could have added a storage policy to customize this behavior, but I do think that this is the right way. And I can add that policy later if I feel the need for it.

Moving on…

Automatic object registration to frame graphs

Another problem with my latest approach is that we need to add/remove objects to/from the frame graph manually. As I explained before, this is not only cumbersome but also very error prone. Especially now that the frame graph no longer keeps strong references to its objects: forgetting to remove a resource that was previously deleted from the scene will result in a crash that is very difficult to debug.

I made a couple of decisions to solve this one:

First, any object that can be added to a frame graph will attempt to automatically do so during its construction, provided a frame graph instance exists. It will also attempt to remove itself automatically during destruction.

How can we tell if a frame graph exists? Easy: frame graphs are singletons (oh, my God! He said the S… word). After giving it some thought, I realized I don't need more than one frame graph instance at any given point in time. I might be wrong about this, but I can't think of any scenario where two or more instances are needed. I guess time will tell.
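Here's a rough sketch of how self-registration could look, assuming a hypothetical FrameGraphObject base class and a FrameGraph that tracks its single live instance through a static pointer (again, not crimild's real code). The constructor registers the object only if a graph exists, and the destructor unregisters it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class FrameGraphObject;

// Hypothetical singleton frame graph: at most one instance at a time,
// and getInstance() returns nullptr when there is none.
class FrameGraph {
public:
    FrameGraph( void )
    {
        assert( s_instance == nullptr && "only one frame graph allowed" );
        s_instance = this;
    }

    ~FrameGraph( void ) { s_instance = nullptr; }

    FrameGraph( const FrameGraph & ) = delete;
    FrameGraph &operator=( const FrameGraph & ) = delete;

    static FrameGraph *getInstance( void ) { return s_instance; }

    void add( FrameGraphObject *obj ) { m_objects.push_back( obj ); }

    void remove( FrameGraphObject *obj )
    {
        m_objects.erase( std::remove( m_objects.begin(), m_objects.end(), obj ), m_objects.end() );
    }

    std::size_t size( void ) const { return m_objects.size(); }

private:
    static FrameGraph *s_instance;
    std::vector< FrameGraphObject * > m_objects;
};

FrameGraph *FrameGraph::s_instance = nullptr;

// Base class for anything that can live in the frame graph.
class FrameGraphObject {
public:
    FrameGraphObject( void )
    {
        // Register automatically, but only if a frame graph exists.
        if ( auto graph = FrameGraph::getInstance() ) graph->add( this );
    }

    virtual ~FrameGraphObject( void )
    {
        // Unregister automatically on destruction.
        if ( auto graph = FrameGraph::getInstance() ) graph->remove( this );
    }

    FrameGraphObject( const FrameGraphObject & ) = delete;
    FrameGraphObject &operator=( const FrameGraphObject & ) = delete;
};
```

One trade-off of this design: objects created before the frame graph exists are simply never registered.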

Second problem solved.

Rebuilding the frame graph automatically

To be honest, I didn't want to spend too much time on this problem at the moment. Fixing this particular issue could easily become a full-time job for several days or weeks, so I went with the easiest solution: whenever we add or remove an object, the frame graph is automatically flagged as dirty and will be completely rebuilt from scratch at some point in the future (i.e. during the next simulation loop).
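The dirty-flag approach boils down to something like the following minimal sketch. The names are hypothetical, and rebuild() only counts invocations where a real implementation would sort passes and re-record command buffers. Note that several changes between two updates trigger a single rebuild.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: any add/remove marks the graph as dirty and the
// rebuild happens lazily, on the next update of the simulation loop.
class FrameGraph {
public:
    void add( const std::string &name )
    {
        m_nodes.push_back( name );
        m_dirty = true;
    }

    void remove( const std::string &name )
    {
        auto it = std::find( m_nodes.begin(), m_nodes.end(), name );
        if ( it != m_nodes.end() ) {
            m_nodes.erase( it );
            m_dirty = true;
        }
    }

    // Called once per simulation step.
    void update( void )
    {
        if ( !m_dirty ) return; // nothing changed: keep the current graph
        rebuild();
        m_dirty = false;
    }

    std::size_t rebuildCount( void ) const { return m_rebuilds; }

private:
    void rebuild( void )
    {
        // A full rebuild would recompile the graph and re-record every
        // command buffer from scratch; this sketch only counts rebuilds.
        ++m_rebuilds;
    }

    std::vector< std::string > m_nodes;
    bool m_dirty = false;
    std::size_t m_rebuilds = 0;
};
```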

This "solution" works but, of course, it's far from the most efficient one. After all, why rebuild the complete frame graph if we're only adding a new node to the scene? Do we really need to record every single command buffer again? Well, I guess not. Maybe a change in a scene should not recreate command buffers for post-processing resources, since those are independent of the number of nodes in the 3D space. But the rules for making these decisions are not that simple. What if the nodes that were added to the scene are new lights and they change the way we do tone mapping, for example?

Rebuilding everything from scratch is the safest bet here for now. And, in the end, this behavior is completely hidden inside the frame graph implementation and can (and will) be improved in the future. 

So, I'd say this is partially fixed, but still acceptable.

Bonus Track: Fixing viewports

I was not expecting to make any more changes, but when I was cleaning up the offscreen rendering demo I noticed a bug in the way viewports were set for render passes.

I wanted more control over the resolution at which render passes are rendered. For example, I needed an offscreen scene at a lower resolution, but it wasn't working correctly (ok, it was not working at all).

Now it does, which allows me to have render passes at different resolutions:

Notice how rendering offscreen at a lower resolution pixelates only the reflection. I can do the same for the whole scene, too. And I can combine them at will. Pretty neat.
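The post doesn't show the fix itself, so this is only a hedged sketch of the general idea: derive each pass's viewport from the size of its own render target instead of the window size. Extent2D, Viewport, scaledExtent() and viewportFor() are hypothetical names, not crimild's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical types for illustration only.
struct Extent2D { std::uint32_t width; std::uint32_t height; };
struct Viewport { float x, y, width, height; };

// Size of the render target for a pass drawn at `scale` of the screen.
// Each dimension is clamped to 1 pixel so tiny scales never yield an empty target.
Extent2D scaledExtent( Extent2D screen, float scale )
{
    assert( scale > 0.0f );
    auto dim = [ scale ]( std::uint32_t v ) {
        auto scaled = static_cast< std::uint32_t >( std::floor( v * scale ) );
        return scaled > 0 ? scaled : 1u;
    };
    return { dim( screen.width ), dim( screen.height ) };
}

// The viewport covers the pass's own render target, not the window.
Viewport viewportFor( Extent2D target )
{
    return { 0.0f, 0.0f, static_cast< float >( target.width ), static_cast< float >( target.height ) };
}
```

With a 1920x1080 screen and a scale of 0.25, the reflection pass would render into a 480x270 target with a matching viewport, and the result is upscaled when sampled, which is where the pixelated look comes from.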

Up Next

I’m quite happy with these fixes and the frame graph feels a lot more robust now.

And, perhaps more importantly, the frame graph API can be completely hidden from end users, too. It would be easy to provide a default frame graph instance (as part of a simulation system, for example) so that application developers can add/remove objects to/from scenes at will without worrying about the frame graph at all.

The next step will be to improve shader bindings (aka descriptor sets), which is something that is still very cumbersome. Or maybe I’ll do something more visual, like shadows. Or both 🙂

See you next time!

The Ghost of Refactors Past…

…has come to haunt me once again. Although this time it's not because of mistakes that I made. Instead, the problem lies in something that I missed completely.

Can you spot the problem in the following code?

void SomeObject::save( Stream &s )
{
   std::size_t count = getElementCount();
   s.write( count );
}

Well, it turns out std::size_t is NOT PLATFORM INDEPENDENT. And here's the twist: I've known that since, well, forever, but I never paid any attention to it. That is, until it became a problem. And the problem was EVERYWHERE in the code.

First things first. The C++ standard has this to say about std::size_t:

typedef /*implementation-defined*/ size_t;


What's that supposed to mean? Basically, the width of std::size_t may differ depending on the platform. For example, on a 32-bit architecture, std::size_t is typically a 32-bit unsigned integer, while on a 64-bit platform it is typically a 64-bit one.


std::size_t is supposed to help portability, right? That’s its whole purpose. And that’s true, of course. There’s nothing wrong with std::size_t itself.

Check the code above again. Go on, I’ll wait.

So, whenever we create a [binary] stream to serialize the scene, we can't use std::size_t because it's just plain wrong. It will be saved as a 32-bit integer on some platforms and as a 64-bit one on others. Later, when the stream is loaded, the number of bytes read will depend on the width used by the current platform, regardless of what was used when saving. See the problem?

This means that we can’t share binary streams between different platforms because the data might be interpreted in different ways, leading to errors when generating scenes.
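The mismatch is easy to reproduce on a single machine. In this sketch, std::uint32_t and std::uint64_t stand in for std::size_t on a 32-bit and a 64-bit platform respectively (an assumption for illustration), and raw bytes are copied the way a naive binary stream would copy them. writeRaw() and readRaw() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector< unsigned char >;

// Appends the raw in-memory bytes of `value`, like a naive binary stream.
template< typename T >
void writeRaw( Bytes &out, T value )
{
    unsigned char buffer[ sizeof( T ) ];
    std::memcpy( buffer, &value, sizeof( T ) );
    out.insert( out.end(), buffer, buffer + sizeof( T ) );
}

// Reads sizeof( T ) raw bytes starting at `offset` and advances it.
template< typename T >
T readRaw( const Bytes &in, std::size_t &offset )
{
    assert( offset + sizeof( T ) <= in.size() );
    T value;
    std::memcpy( &value, in.data() + offset, sizeof( T ) );
    offset += sizeof( T );
    return value;
}
```

If a "32-bit device" writes a 4-byte count followed by another 4-byte field, a "64-bit device" reading the count as 8 bytes swallows the next field too: the count is garbage and everything after it is misaligned.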

For the past few years, my main setup has been OS X and iOS, both 64-bit platforms. But one day I had to use streaming on 32-bit Android phones and, as you might have guessed by now, all hell broke loose…

Entering crimild::types

I had to make a call here: either keep using std::size_t everywhere and handle the special case in the Stream class itself, or use fixed-width types for all values (especially integers), thereby guaranteeing that the code will be platform independent.

I went for the second approach, which seems to me to be the right choice. At the time of this writing, the new types are:

#include <cstdint>

namespace crimild {

   using Int8 = std::int8_t;
   using Int16 = std::int16_t;
   using Int32 = std::int32_t;
   using Int64 = std::int64_t;

   using UInt8 = std::uint8_t;
   using UInt16 = std::uint16_t;
   using UInt32 = std::uint32_t;
   using UInt64 = std::uint64_t;

   using Real32 = float;
   using Real64 = double;

   using Bool = bool;
   using Size = UInt64;

}

As you can see, crimild::Size is always defined as a 64-bit unsigned integer regardless of the platform.

That means I need to change every single type declaration in the current code so it uses the new types. As you might have guessed, it's a lot of work, so I'm going to do it on the fly. I think I've already tackled the critical code (that is, streaming) by the time of this writing, but there's still much to be reviewed.
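For streaming, the idea is that every count goes through crimild::Size before it hits the stream. Below is a sketch using a hypothetical byte-vector writer (writeSize() and readSize() are not crimild's Stream API). It also fixes the byte order explicitly (little-endian), which goes a step beyond what's described above and is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace crimild {
    using UInt8 = std::uint8_t;
    using UInt64 = std::uint64_t;
    using Size = UInt64;
}

// Compile-time guard: the serialized width never depends on the platform.
static_assert( sizeof( crimild::Size ) == 8, "crimild::Size must be 64-bit" );

// Hypothetical writer: always emits exactly 8 bytes, least significant first.
void writeSize( std::vector< crimild::UInt8 > &out, crimild::Size value )
{
    for ( int i = 0; i < 8; ++i ) {
        out.push_back( static_cast< crimild::UInt8 >( ( value >> ( 8 * i ) ) & 0xFF ) );
    }
}

// Hypothetical reader matching writeSize(), starting at `offset`.
crimild::Size readSize( const std::vector< crimild::UInt8 > &in, std::size_t offset )
{
    assert( offset + 8 <= in.size() );
    crimild::Size value = 0;
    for ( int i = 0; i < 8; ++i ) {
        value |= static_cast< crimild::Size >( in[ offset + i ] ) << ( 8 * i );
    }
    return value;
}
```

Whatever getElementCount() returns, a save() built on this would cast it to crimild::Size first, so a 32-bit Android phone and a 64-bit iPhone produce byte-for-byte identical streams.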

New code already makes use of the platform-independent types. For example, the new particle system uses UInt32, Real64 and other new types and–

Oh, right, I haven’t talked about the new particle system yet… Well, spoiler alert: there’s a new particle system.

You want to know more? Come back next week, then 🙂




Deferred improvements

Revisiting the deferred rendering implementation took me a little more time than expected and it wasn't easy, but I'm very excited about the results and the flexibility that has been achieved.

In contrast with my last post, this one is gonna be all about visual improvements. So, let’s begin.

Truth be told, my original goal was to enhance just the post-processing pipeline, adding support for more than one image effect at the same time and then accumulating results. But I ended up refactoring the entire deferred render pass and added a couple of new things along the way. Because, you know, refactors are fun 🙂

As before, the Deferred render path is split into three stages: G-Buffer generation, lighting and post-processing.

The G-Buffer

The G-Buffer is composed of five render targets organized as follows:

G-Buffer organization. Floating-point buffers are used whenever possible in order to keep data precision.

Both world and view space normals are stored, for different purposes. In fact, view space normals are generated from the geometry alone, without any bump mapping applied to them, since some post-processing effects (e.g. SSAO) achieve better results with less information.

The G-Buffer in action. Top row: depth, diffuse and world space positions. Bottom row: emissive (unused in this demo), world space normals and view space normals

Lighting

The second step is to compute lighting and generate a colored frame. Usually, this step involves two passes, one for lighting and one for the final composition, but I'm doing both in a single pass. There's room for some improvement here, but I'll leave that to my future self.

Lighting computation, before applying post-processing effects

Lighting is computed from world space information. Shadow maps are applied here after the scene is lit.

Post-processing

Once the lit frame is generated, image effects are applied. Two auxiliary buffers are used (following the "ping-pong buffer" technique), so each effect builds on the accumulated results of the previous ones.

Ping-pong buffer technique. For each image effect, the source and accumulation buffers are swapped. Once all effects have been processed, the source buffer contains the final image
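The ping-pong technique itself fits in a few lines. In this sketch the types are hypothetical: an Image is just a flat vector of floats, and an ImageEffect reads one image and writes another of the same size. Swapping the two buffers after each effect keeps the latest result in the source buffer, so it ends up holding the final image.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for render targets and post-processing effects.
using Image = std::vector< float >;
using ImageEffect = std::function< void( const Image &src, Image &dst ) >;

// Applies effects in order using two buffers. After each effect the roles
// are swapped, so `source` always holds the latest accumulated result.
Image applyEffects( const Image &lit, const std::vector< ImageEffect > &effects )
{
    Image source = lit;                    // output of the lighting stage
    Image destination( lit.size(), 0.0f ); // auxiliary buffer
    for ( const auto &effect : effects ) {
        effect( source, destination );
        std::swap( source, destination );  // ping-pong
    }
    return source;
}
```

On the GPU the swap would exchange render target handles rather than pixel data. std::swap on two vectors is similarly cheap, since it only exchanges their internal pointers.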

Concerning image effects, the new additions are Depth of Field and SSAO. There was a previous implementation of SSAO, but the new one applies a blur in order to reduce noise and improve the final result.

Applying Depth of Field to the scene

Rendering the scene with only the output from the SSAO effect, before applying it to the scene
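The post doesn't specify which blur kernel the new SSAO uses, so the following only illustrates the denoising step, assuming a plain box blur over a single-channel occlusion buffer. boxBlur() is a hypothetical helper, and samples that fall outside the image are skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box blur of radius r over a width x height occlusion buffer
// stored row by row. Each output pixel is the average of the in-bounds
// samples in its (2r + 1) x (2r + 1) neighborhood.
std::vector< float > boxBlur( const std::vector< float > &ao, int width, int height, int r )
{
    assert( static_cast< std::size_t >( width * height ) == ao.size() );
    std::vector< float > out( ao.size(), 0.0f );
    for ( int y = 0; y < height; ++y ) {
        for ( int x = 0; x < width; ++x ) {
            float sum = 0.0f;
            int count = 0;
            for ( int dy = -r; dy <= r; ++dy ) {
                for ( int dx = -r; dx <= r; ++dx ) {
                    int sx = x + dx;
                    int sy = y + dy;
                    if ( sx < 0 || sy < 0 || sx >= width || sy >= height ) continue;
                    sum += ao[ sy * width + sx ];
                    ++count;
                }
            }
            out[ y * width + x ] = sum / count; // count >= 1: the center is always in bounds
        }
    }
    return out;
}
```

A single noisy, fully occluded pixel surrounded by unoccluded ones gets spread out into a soft value, which is what removes the typical SSAO graininess.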

Final Comments

In order to debug the entire process, I made it possible to render the results of all the passes at the same time. It is a costly operation, but extremely useful when trying to understand what’s going on. In fact, I’m planning to add this feature to other systems as well in a future iteration.

Top row: depth, diffuse, positions, shadow map (packed). Middle row: emissive (unused), world space normals (with bump), view space normals and screen objects. Bottom row: lighting and post-processing
Top row: depth, diffuse, positions, shadow map (packed). Middle row: emissive (unused), world space normals (with bump mapping applied), view space normals and screen objects. Bottom row: lighting and post-processing (SSAO + DoF)
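Laying out a debug view like this one is mostly a matter of splitting the screen into tiles. This sketch assumes normalized [0, 1] coordinates with a bottom-left origin (as in OpenGL) and a hypothetical debugTiles() helper; the layout in the caption, ten outputs over rows of four, is used as the example.

```cpp
#include <cassert>
#include <vector>

struct Rect { float x, y, width, height; };

// Hypothetical helper: splits the screen into a grid with `columns` tiles
// per row, in normalized coordinates with a bottom-left origin.
// Tile 0 is the top-left one; tiles fill each row left to right.
std::vector< Rect > debugTiles( int count, int columns )
{
    assert( count > 0 && columns > 0 );
    int rows = ( count + columns - 1 ) / columns; // ceil( count / columns )
    float w = 1.0f / columns;
    float h = 1.0f / rows;
    std::vector< Rect > tiles;
    for ( int i = 0; i < count; ++i ) {
        int col = i % columns;
        int row = i / columns; // row 0 is the top row
        tiles.push_back( { col * w, 1.0f - ( row + 1 ) * h, w, h } );
    }
    return tiles;
}
```

Each tile would then be used as the viewport for drawing one pass output as a textured quad.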

That’s it for now. I’m done with refac–


Those shadows look awful…