Live Long and Render (III)

I’m slowly moving forward with my Vulkan implementation. After several days of trial and error, I finally managed to render a simple triangle, which is a big deal in Vulkan. But I’m getting ahead of myself. Let me tell you about the journey first.

As mentioned in previous posts, most of the design decisions at the moment are about how to introduce Vulkan’s concepts into Crimild and make sense of them. I talked about render devices and swapchains before, and the next step was to start dealing with how to draw objects on the screen.

Shaders

Shaders have been part of Crimild for a long time, but the time has come to update them in order to support modern features. For the moment, the most important change I introduced in the Vulkan branch is that we can have multiple shader sources for each program. Besides the typical vertex/fragment shader pair, we can now specify geometry and compute shaders too. These are not implemented yet, but it’s a start.

Graphics Pipelines

Graphics pipelines define how objects are rendered on the screen, covering everything from viewport size and vertex inputs to depth testing and color blending.

Older graphics APIs like OpenGL defined the graphics pipeline in a very strict fashion. Yes, it was possible to introduce some customization here and there in the form of shaders, but in the end everything was rendered in the same way.

Vulkan introduces the concept of highly customizable graphics pipelines. We can now specify things like rasterization options, depth/stencil settings, multisampling, etc. in a single object and use it in a really efficient way. As usual with Vulkan, this means two things: on one hand, great power; on the other, a very, very large amount of explicit code to create the pipelines.

Custom graphics pipelines are, of course, another new concept for Crimild, and it wasn’t easy to reach a consensus about how to work with them (and, to be honest, I’m still second-guessing some decisions).

Having one pipeline shared by every single renderable object doesn’t make any sense. But neither does the opposite, since I would end up having too many instances of the same pipeline for objects that are similar.

Associating pipelines with materials didn’t feel right either. Again, different materials may share the same pipeline.

In the end, I made up my mind and decided that pipelines are independent of both drawables and materials. Why? Because there may be times when we need to render objects regardless of their geometry (e.g. we don’t care about normals or vertex colors) and/or their material properties (e.g. when rendering a shadow map).

What about linking pipelines and shaders? Well, that makes more sense, but it’s not enough. Pipelines handle much more information than shaders, like viewport sizes and blending, for example.

And that’s how the Pipeline class was born.
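To make that separation concrete, here is a minimal sketch of what a pipeline object could hold. The names (`Pipeline`, `ShaderProgram`, `ShaderSource`, `Viewport`, `DepthState`, `BlendState`, `Drawable`, `makeShadowPipeline`) and their fields are hypothetical, not Crimild’s actual API. The only point is that a pipeline bundles shaders with fixed-function state, and drawables reference it instead of owning it, so similar objects can share a single instance.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not Crimild's real classes.
struct ShaderSource {
    enum class Stage { Vertex, Fragment, Geometry, Compute };
    Stage stage;
    std::string code;
};

struct ShaderProgram {
    // A program may have several sources (vertex, fragment, geometry...).
    std::vector<ShaderSource> sources;
};

struct Viewport { float x = 0, y = 0, width = 0, height = 0; };
struct DepthState { bool testEnabled = true; bool writeEnabled = true; };
struct BlendState { bool enabled = false; };

// A pipeline bundles shaders with fixed-function state. It knows nothing
// about geometry or materials, so several drawables can share one instance.
struct Pipeline {
    std::shared_ptr<ShaderProgram> program;
    Viewport viewport;
    DepthState depth;
    BlendState blend;
};

struct Drawable {
    std::string name;
    std::shared_ptr<Pipeline> pipeline; // shared reference, not exclusive ownership
};

// Hypothetical depth-only pipeline that ignores normals, colors and
// materials, as one might use when rendering a shadow map.
inline std::shared_ptr<Pipeline> makeShadowPipeline(float size) {
    auto program = std::make_shared<ShaderProgram>();
    program->sources.push_back({ ShaderSource::Stage::Vertex, "/* position only */" });
    auto pipeline = std::make_shared<Pipeline>();
    pipeline->program = program;
    pipeline->viewport = { 0, 0, size, size };
    pipeline->blend.enabled = false;
    return pipeline;
}
```

Two drawables rendered into the same shadow map would then point at the same `Pipeline`, while their materials stay out of the picture entirely.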

Render Passes, Attachments & Framebuffers

Render passes are already a very important (albeit experimental) feature of Crimild. And they don’t differ too much from Vulkan’s own render passes.

The most important difference is that in Vulkan the actual rendering is performed in sub-passes. Render passes only serve as a way to declare which resources (that is, attachments) the sub-passes need in order to work. For example, you can declare a single render pass that performs deferred lighting on a scene by implementing multiple sub-passes, all working with the same shared attachments.

The use of sub-passes makes the render pass much more efficient, even when working with OpenGL. Since attachments are shared, we only need to bind them once before executing all sub-passes. This is a change that I’m planning to make soon.
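A rough CPU-side model of that idea follows, assuming hypothetical `Attachment`, `Subpass` and `RenderPass` types (no real OpenGL or Vulkan calls). Attachments are declared once at the render pass level, each sub-pass only refers to them by index, and the pass binds the shared attachments a single time before running every sub-pass in order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model for illustration; no real graphics API calls.
struct Attachment {
    std::string name;
};

struct Subpass {
    std::string name;
    std::vector<std::size_t> inputs;  // indices into RenderPass::attachments
    std::vector<std::size_t> outputs; // indices into RenderPass::attachments
};

struct RenderPass {
    std::vector<Attachment> attachments; // declared once, shared by all sub-passes
    std::vector<Subpass> subpasses;      // executed in declaration order

    // Binds every shared attachment once, then runs all sub-passes without
    // re-binding in between. Returns the number of bind operations performed.
    int execute(const std::function<void(const Subpass &)> &run) const {
        int binds = 0;
        for (const auto &attachment : attachments) {
            (void) attachment; // a real renderer would bind it here
            ++binds;
        }
        for (const auto &subpass : subpasses) {
            run(subpass);
        }
        return binds;
    }
};
```

With a G-buffer sub-pass and a lighting sub-pass sharing four attachments, the bind count stays at four no matter how many sub-passes are added.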

Render Graphs

Vulkan does not have a render graph API, although something similar is expressed by specifying sub-pass dependencies in each render pass. It is our job to set those dependencies correctly, which can quickly become cumbersome for complex renderers.
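One way a render graph could derive those dependencies automatically, instead of having them written by hand, is sketched below: if a pass reads an attachment that an earlier pass wrote, the reader depends on that writer. The `PassNode` type and `deriveDependencies` function are hypothetical, and the sketch only tracks read-after-write hazards. Real Vulkan sub-pass dependencies also carry pipeline stage and access masks, which are omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node: a pass and the attachments it reads and writes.
struct PassNode {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Returns unique (producer, consumer) index pairs, meaning the consumer reads
// an attachment last written by the producer. Passes are assumed to be
// listed in a valid execution order.
inline std::vector<std::pair<std::size_t, std::size_t>>
deriveDependencies(const std::vector<PassNode> &passes) {
    std::map<std::string, std::size_t> lastWriter;
    std::vector<std::pair<std::size_t, std::size_t>> deps;
    for (std::size_t i = 0; i < passes.size(); ++i) {
        for (const auto &input : passes[i].reads) {
            auto it = lastWriter.find(input);
            if (it == lastWriter.end()) continue; // external input, no dependency
            auto dep = std::make_pair(it->second, i);
            if (std::find(deps.begin(), deps.end(), dep) == deps.end()) {
                deps.push_back(dep);
            }
        }
        for (const auto &output : passes[i].writes) {
            lastWriter[output] = i;
        }
    }
    return deps;
}
```

For a scene pass, a lighting pass and a compose pass, this yields scene to lighting, scene to compose and lighting to compose, which is exactly the information a renderer would otherwise have to spell out manually.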

I’m still trying to figure out the changes required to Crimild’s render graph API, not only to support sub-passes, but also because I want it to become much more than just a bunch of passes and dependencies. I want to include things like scene culling, filtering (e.g. render only UI elements), commands and much more. My goal is to make the render graph a descriptor of how an entire frame should be drawn for each application, not only the scene.

I believe this will be extremely beneficial both for complex and simple applications. You don’t need to cull objects because it’s a simple app? Do you need post-processing only on the 3D scene? Do you want a different post-processing for the UI? Are you making a headless path tracer for generating images? All of those scenarios can be supported.

Like I said, I’m still working on this and I don’t expect it to be ready any time soon.

Moving on.

Command Pools & Command Buffers

Almost there, I promise.

Here are two more new concepts for Crimild: command pools and command buffers.

Command buffers are used to store commands that will be executed later, when a frame is actually rendered. This is probably the biggest difference between OpenGL and Vulkan. While the former works by changing the state machine immediately (in theory, at least; some drivers may defer that work), Vulkan declares everything up front and defers most operations for (possibly much) later use.

For example, when rendering a triangle we usually issue commands to clear the screen buffer, bind vertex and index data, define a viewport, etc. When everything is ready, we issue a draw command (aka a “draw call”). A command buffer records all of these commands sequentially.

Command buffers are allocated from a specific command pool, depending on their type. There may be many different pools for different purposes, like graphics or compute pools.
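A toy model of pools and recording, assuming hypothetical `CommandType`, `Command`, `CommandBuffer` and `CommandPool` types. The real Vulkan equivalents are `VkCommandPool`, `VkCommandBuffer` and the `vkCmd*` functions, none of which are used here. Nothing is executed while recording; the buffer just stores commands in order so they can be replayed later.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical command set for illustration.
enum class CommandType { Clear, BindPipeline, BindVertexBuffer, BindIndexBuffer, SetViewport, Draw };

struct Command {
    CommandType type;
    int argument = 0; // e.g. a resource id or an index count
};

class CommandBuffer {
public:
    // Recording only stores the command; nothing is executed yet.
    void record(CommandType type, int argument = 0) { _commands.push_back({ type, argument }); }
    const std::vector<Command> &commands() const { return _commands; }

private:
    std::vector<Command> _commands;
};

// A pool hands out command buffers of a given kind and owns them.
class CommandPool {
public:
    enum class Kind { Graphics, Compute };
    explicit CommandPool(Kind kind) : _kind(kind) {}

    CommandBuffer &allocate() {
        _buffers.push_back(std::make_unique<CommandBuffer>());
        return *_buffers.back(); // stable address: the buffer lives on the heap
    }
    Kind kind() const { return _kind; }
    std::size_t size() const { return _buffers.size(); }

private:
    Kind _kind;
    std::vector<std::unique_ptr<CommandBuffer>> _buffers;
};
```

Recording the triangle from the example above then becomes a clear, a viewport, two binds and a draw, stored in that order inside a buffer allocated from a graphics pool.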

Wait. Don’t Crimild’s render queues work in the same fashion? What’s different? It’s true that I tried to achieve something like this in Crimild before in the form of render queues, but they work at a much higher level. With render queues, visible objects are recorded (which may be done in separate threads) to be rendered later. But only the renderable object is saved, not the actual render commands. This requires us to compute which state changes are triggered every time we draw that object. This is clearly an overhead, especially if we consider that the renderer triggers state changes and draw calls without checking whether they are actually needed. I made this call on purpose in the past to ensure that any object can be rendered independently of what came before, always resetting states to default values before drawing.

By using command buffers, instead, we can avoid that overhead while keeping the safety net. For each renderable object, we record the list of state changes and draw calls needed to make it appear on the screen. Then, we can check which of those commands are redundant and discard them. And by the time the render process is triggered, we’ll have the minimum number of commands that are needed to draw all objects.
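The redundancy check could look roughly like this. It assumes a hypothetical flat command list (`Slot`, `Cmd`, `removeRedundant` are illustrative names) in which each state-setting command targets a state slot with a value. A command that sets a slot to the value it already holds is dropped, while draw calls are always kept. A real renderer would track much richer state, but the principle is the same: each object still records its full state (the safety net), and the duplicates are filtered out afterwards.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical command model for illustration.
enum class Slot { Pipeline, VertexBuffer, IndexBuffer, Viewport };

struct Cmd {
    bool isDraw = false;
    Slot slot = Slot::Pipeline; // ignored for draw calls
    int value = 0;              // resource id for state changes, index count for draws
};

// Removes state changes that would set a slot to the value it already holds.
inline std::vector<Cmd> removeRedundant(const std::vector<Cmd> &input) {
    std::map<Slot, int> current; // last value seen for each state slot
    std::vector<Cmd> output;
    for (const auto &cmd : input) {
        if (!cmd.isDraw) {
            auto it = current.find(cmd.slot);
            if (it != current.end() && it->second == cmd.value) {
                continue; // redundant: this state is already set
            }
            current[cmd.slot] = cmd.value;
        }
        output.push_back(cmd);
    }
    return output;
}
```

Two objects sharing a pipeline and an index buffer, but using different vertex buffers, go from eight recorded commands down to six.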

Obviously, recording commands is a costly operation. The challenge, then, will be to understand when to trigger the recording of render commands. After all, doing it every frame may end up causing more overhead than the one we’re trying to eliminate. But that’s another problem for my future self (I hate you too, future self!).

And then… Victory!

After all the hard work, the mighty Triangle shows up on the screen!

…Up Next!

Phew, that was a long post.

Now it’s time to take a pause. Think. Design.

I’ve introduced many new concepts into the engine and I want to get them right before moving on to other features like buffers and textures.

And yes, I think that the render graph is the most interesting feature I’ve ever made for Crimild… assuming it works 🙂


Live Long and Render (II)

This is the second part of my dev diary about implementing Vulkan support in Crimild. Check out the first part for a brief introduction if you haven’t read it yet.

Changes, changes, changes

I’m still struggling with the class hierarchy and responsibilities. I would like to use RAII as much as possible, but I’m not sure about the API design and who’s responsible for creating new objects yet.

For example, it feels natural that the Instance (basically a wrapper for VkInstance) creates the render devices and swapchain. But, since the surface is platform dependent, it must be created somewhere else, which doesn’t feel right.

On the other hand, a render device should create new resources (like images or buffers) but that also means that such resources are coupled with that particular device. What if we have more than one device?

I know, I’m overthinking it as usual but, to be honest, defining the class hierarchy has proven to be the most challenging task so far.

As a side note, I decided to use exceptions for error reporting, for example when an attempt to create a Vulkan object fails for some reason. This simplifies the code a lot and, although exceptions have an overhead, they’re only used in error paths, so it’s not a big issue.
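A minimal sketch of that pattern, assuming a hypothetical `Result` enum standing in for `VkResult` so it compiles without the Vulkan SDK (its values mirror `VK_SUCCESS`, `VK_ERROR_OUT_OF_HOST_MEMORY` and `VK_ERROR_INITIALIZATION_FAILED`). The `check` helper and `createInstance` function are illustrative names, not Crimild’s code. The happy path contains no error checks, and the cost of the exception is only paid when something actually fails.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for VkResult so the example compiles without the Vulkan SDK.
enum class Result { Success = 0, ErrorOutOfHostMemory = -1, ErrorInitializationFailed = -3 };

// Hypothetical helper: turns a failed result into an exception.
inline void check(Result result, const std::string &what) {
    if (result != Result::Success) {
        throw std::runtime_error(what + " failed with code " + std::to_string(static_cast<int>(result)));
    }
}

// Hypothetical creation function; the result is simulated by the caller.
// Only the error path throws, so callers need no explicit error handling.
inline int createInstance(Result simulatedResult) {
    check(simulatedResult, "vkCreateInstance");
    return 42; // pretend handle
}
```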

Initialization

The process of initializing Vulkan in Crimild can be described as follows:

  1. The VulkanSystem creates a Vulkan Instance and keeps a strong reference to it that lives for the rest of the simulation
  2. The VulkanSystem creates a surface to render into. This step is mostly platform dependent.
  3. The Instance creates a Render Device (see below)
  4. The Instance creates a Swapchain (see below)
  5. The Render Device creates resources (images, buffers, etc)
  6. The Swapchain requests the Render Device to create Image Views for the available Images (usually 2, in order to work with double buffering)
  7. Dark magic goes here (not implemented yet)
  8. Render!
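One way to express that ownership with RAII is sketched below. The class names mirror the list, but their members are hypothetical, and each constructor and destructor simply appends to a log instead of calling Vulkan. For brevity the system builds every object directly, while in the list above it is the Instance that creates the device and swapchain. Because members are destroyed in reverse declaration order, children (swapchain, device, surface) are torn down before the instance, which matches the usual Vulkan rule of destroying child objects before their parents.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Shared log so the example can show construction/destruction order.
using Log = std::vector<std::string>;

struct Instance {
    Log &log;
    explicit Instance(Log &l) : log(l) { log.push_back("+instance"); }
    ~Instance() { log.push_back("-instance"); }
};

struct Surface {
    Log &log;
    explicit Surface(Log &l) : log(l) { log.push_back("+surface"); } // platform dependent
    ~Surface() { log.push_back("-surface"); }
};

struct RenderDevice {
    Log &log;
    explicit RenderDevice(Log &l) : log(l) { log.push_back("+device"); }
    ~RenderDevice() { log.push_back("-device"); }
};

struct Swapchain {
    Log &log;
    std::size_t imageCount;
    Swapchain(Log &l, RenderDevice &, std::size_t images) : log(l), imageCount(images) {
        log.push_back("+swapchain"); // would also ask the device for image views
    }
    ~Swapchain() { log.push_back("-swapchain"); }
};

// Hypothetical VulkanSystem: members are built top to bottom and destroyed bottom to top.
struct VulkanSystem {
    std::unique_ptr<Instance> instance;
    std::unique_ptr<Surface> surface;
    std::unique_ptr<RenderDevice> device;
    std::unique_ptr<Swapchain> swapchain;

    explicit VulkanSystem(Log &log)
        : instance(std::make_unique<Instance>(log)),
          surface(std::make_unique<Surface>(log)),
          device(std::make_unique<RenderDevice>(log)),
          swapchain(std::make_unique<Swapchain>(log, *device, 2)) {} // 2 images: double buffering
};
```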

Please keep in mind that this is still work in progress.

Render Device

I’ve been talking about render devices but I didn’t say what they are yet. RenderDevice is a new class that handles both Vulkan’s physical and logical devices. I know that we may have more than one logical device per physical one, but I’m not seeing that as a requirement for the moment. If the time comes when I need to make that distinction, it won’t be hard to split the class in two.

The goal is for RenderDevice to replace the Renderer interface, which has become too big over the years.

I don’t have much code for the RenderDevice class at the moment. Well, there’s a lot of code, but it’s mostly for initialization. I’m expecting this class to get bigger and bigger as the time passes.

Swapchain

The Swapchain is kind of a new concept that I borrowed directly from Vulkan. Its main responsibility is to handle images that need to be presented to the screen/surface.

For that reason, there are only two main functions for a Swapchain object: 1) acquire a new image for us to render to and 2) present that image to the screen once it’s ready.
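The two operations can be modeled as a small ring of images. This is a CPU-only sketch with a hypothetical interface (`acquireNextImage`, `present`, `presentCount`). In Vulkan the real calls are `vkAcquireNextImageKHR` and `vkQueuePresentKHR`, and the presentation engine, not a simple counter, decides which image comes next.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical swapchain model: a fixed ring of images (2 for double buffering).
class Swapchain {
public:
    explicit Swapchain(std::size_t imageCount) : _presented(imageCount, 0) {}

    // 1) Acquire the next image to render into.
    std::size_t acquireNextImage() {
        if (_acquired) {
            throw std::logic_error("image already acquired");
        }
        _acquired = true;
        return _current;
    }

    // 2) Present the acquired image once rendering is done.
    void present(std::size_t imageIndex) {
        if (!_acquired || imageIndex != _current) {
            throw std::logic_error("presenting an image that was not acquired");
        }
        ++_presented[imageIndex];
        _acquired = false;
        _current = (_current + 1) % _presented.size(); // move to the next image in the ring
    }

    std::size_t presentCount(std::size_t imageIndex) const { return _presented.at(imageIndex); }

private:
    std::vector<std::size_t> _presented; // how many times each image was presented
    std::size_t _current = 0;
    bool _acquired = false;
};
```

A frame loop is then just acquire, render, present; with two images, consecutive frames alternate between image 0 and image 1.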

The Swapchain class is pretty much complete and I don’t think it will get much bigger than it is today.

Thinking out loud: Headless Vulkan

This is something that I would like to try out in future iterations. Unlike OpenGL, we can use Vulkan without having to create a visible window. This could prove useful in several scenarios, like complex compute operations, image generation using procedural algorithms, computer vision… even unit tests. I really like the idea of having meaningful automated tests for everything rendering-related, as I already do for other systems in the engine.

Again, this is not a priority right now, but I’ll definitely give it a try in the future.

Up Next!

Now that we have a window, a render device and a swapchain, I believe the next logical step is to actually render something. Therefore, I’ll be focusing on pipelines and commands next.

Optimizing Render Graphs

Last week we talked about what render graphs are and how their modularity helps us build customizable pipelines for our projects.

But render graphs are not only useful because of their modularity. There are also other benefits when we want to optimize our pipeline.

REUSING ATTACHMENTS

Since each render pass may generate one or more FBOs (each including several render targets), it would be great if we could find a way to reuse them and/or their attachments. Otherwise, we’ll quickly run out of memory on our GPU.

How do we achieve reusability? Simple. Let’s go back to the deferred lighting graph we saw in our previous post.

The Depth attachment is a full-screen 32-bit floating point texture and it’s pretty much unique since no other attachments share that texture format. We will assume that the rest of the attachments (normal, opaque, lighting, etc.) are also full screen, but they have an RGBA8 color format. 

By looking at the graph, it’s clear that the Normal attachment is no longer needed once we’ve accumulated all lighting information (since no other render pass makes use of it). Therefore, if we manage to schedule the passes correctly, we can reuse that attachment for storing the result of the translucent pass, for example.

And that’s it. Thanks to our graph design, we can easily identify which inputs and outputs each render pass has at the time of its execution. We also know how many passes are linked to any given attachment.
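A greedy version of that analysis might look like the sketch below. It assumes passes are already in execution order, that every attachment is written by some pass before it is read, and that two attachments can share storage only when their formats match. Each logical attachment gets a physical slot, and a slot returns to a per-format free list right after the last pass that uses it. The types, the `assignSlots` function and the greedy strategy are illustrative assumptions, not necessarily what Crimild does.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a logical attachment and a pass.
struct AttachmentDesc {
    std::string name;
    std::string format; // e.g. "D32F" or "RGBA8"
};

struct Pass {
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Maps every logical attachment to a physical slot, reusing slots of the same
// format once no later pass needs their previous contents.
inline std::map<std::string, int> assignSlots(const std::vector<AttachmentDesc> &attachments,
                                              const std::vector<Pass> &passes) {
    std::map<std::string, std::string> formatOf;
    for (const auto &a : attachments) formatOf[a.name] = a.format;

    // Index of the last pass that touches each attachment.
    std::map<std::string, std::size_t> lastUse;
    for (std::size_t i = 0; i < passes.size(); ++i) {
        for (const auto &r : passes[i].reads) lastUse[r] = i;
        for (const auto &w : passes[i].writes) lastUse[w] = i;
    }

    std::map<std::string, std::vector<int>> freeSlots; // per format
    std::map<std::string, int> slotOf;
    int nextSlot = 0;
    for (std::size_t i = 0; i < passes.size(); ++i) {
        // Allocate outputs written for the first time, reusing a free slot if possible.
        for (const auto &w : passes[i].writes) {
            if (slotOf.count(w)) continue;
            auto &pool = freeSlots[formatOf.at(w)];
            if (!pool.empty()) {
                slotOf[w] = pool.back();
                pool.pop_back();
            } else {
                slotOf[w] = nextSlot++;
            }
        }
        // Release attachments whose last use is this pass (after its outputs are allocated).
        for (const auto &entry : lastUse) {
            if (entry.second == i) {
                freeSlots[formatOf.at(entry.first)].push_back(slotOf.at(entry.first));
            }
        }
    }
    return slotOf;
}
```

In a deferred graph where only the lighting pass reads Normal, the translucent output lands in the slot Normal used, so six logical attachments fit in five physical ones. If a debug pass at the end reads every attachment, no last use happens early and nothing is shared, which is the catch described next.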

There’s a catch, though.

Let’s assume we want to generate a debug view like this one:

Top row: depth, normal, opaque and translucent. Left column: opaque+translucent, sepia tint and UI

In order to achieve that image, we need to modify our render graph to make it look like this:

The final frame (Debug Frame) is created by the Debug render pass, which reads from several of the previously created attachments in order to compose the debug frame that is displayed. This prevents us from reusing attachments completely because all of them are now needed at all times. That might be an acceptable loss in this scenario because it’s only for debug, but you definitely need to plan each dependency correctly if you want to maximize reusability.

For my implementation, I’ve decided to reuse only attachments, while FBOs are created and discarded on demand. This helps minimize memory usage while providing maximum flexibility for creating offscreen buffers.

DISCARDING NODES

Another advantage of using render graphs is that we can identify which nodes are actually relevant to the final frame at graph compilation time. That is, of all the nodes in the graph, we only want to keep and execute those that are actually connected to the final node, which is the resulting frame for the graph.

For this reason, each render graph defines which attachment serves as the resulting frame for the entire process. Depending on which attachment is set as the final frame, some render passes will become irrelevant and should be discarded.

Once again, look at the debug render graph above, paying special attention to the debug nodes on the right.

We have two possible final frames: the one that only contains the scene (bottom center) and the debug one (bottom right).

If we set the scene frame as the resulting frame, then the Debug Pass will be discarded since its result is no longer relevant, and the final render graph will look like the one at the very top of this post. Then, after compiling the render graph, the passes will be executed as follows:

That’s great, but why? Why add extra nodes that are going to be discarded anyway? Well, you shouldn’t do that… except that doing so allows you to create something like an uber-pipeline, including debug nodes and different branches too. Then, by defining which one is the actual final frame (maybe using configuration flags), you can end up producing different pipelines. I know, it might seem counterintuitive at first, but in practice it’s really useful.
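Here is a minimal sketch of that compilation step, under the assumption that each pass lists the attachments it reads and writes. Starting from whichever attachment is chosen as the final frame, it walks backwards to the passes that write it, then to the writers of their inputs, and so on; any pass that is never reached is discarded. The `Node` type and `compile` function are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical render graph node.
struct Node {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Returns the names of the passes that contribute to finalFrame, keeping the
// original (execution) order. Every other pass is discarded.
inline std::vector<std::string> compile(const std::vector<Node> &nodes, const std::string &finalFrame) {
    std::map<std::string, std::vector<std::size_t>> writers;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        for (const auto &w : nodes[i].writes) writers[w].push_back(i);
    }

    std::set<std::size_t> keep;           // indices of relevant passes
    std::set<std::string> visited;        // attachments already processed
    std::vector<std::string> pending = { finalFrame };
    while (!pending.empty()) {
        std::string attachment = pending.back();
        pending.pop_back();
        if (!visited.insert(attachment).second) continue;
        for (std::size_t writer : writers[attachment]) {
            if (keep.insert(writer).second) {
                for (const auto &input : nodes[writer].reads) pending.push_back(input);
            }
        }
    }

    std::vector<std::string> result;
    for (std::size_t i : keep) result.push_back(nodes[i].name); // std::set iterates in ascending order
    return result;
}
```

Switching the final frame with a configuration flag (for example, a hypothetical `debugEnabled ? "debugFrame" : "sceneFrame"`) is then enough to get either pipeline out of the same uber-graph: compiling for the scene frame drops the debug pass, while compiling for the debug frame keeps everything.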

Closing comments

I’m going to leave it here for now, since this article has already become much longer than expected. 

Render graphs are kind of an experimental feature at the time of this writing, but I’m hoping they will become one of the key players in the next major version of Crimild. Together with shader graphs, they should help me create entire modular pipelines in plain C++ and forget about OpenGL/Metal/Vulkan (almost) completely. 

Now it’s time to prepare one more release before the year ends 🙂