Cascading Shadow Maps

I mentioned in my previous post that calculating shadows for directional lights was the simplest case of shadow mapping. Well, I’m proud to announce that it’s no longer the case, since the process has become a lot more complex now thanks to cascading shadow maps.

Shadow mapping for directional lights (lights that are very far away from the scene and whose rays are considered to be parallel) is fairly straightforward in theory. The shadow map is computed by calculating the closest distance from the light source to the objects in our scene. Since the light affects the entire scene, every single object must be processed when calculating the shadows. An orthographic projection is used in this case to simulate the parallel rays.

But the problem with this approach is that it doesn’t scale well when there are multiple objects spread all over the place, since the resolution of the shadow texture is limited. Yes, we could increase the texture resolution but the GPU has a hard limit there too (I think it’s 8k for high-end GPUs at the time of this writing).

Cascading shadow maps work by following a simple idea: we need shadows that are closer to the camera to have the greatest resolution. At the same time, we don’t really care if the shadows that are far away look pixelated (up to a point, of course). Therefore, we need to generate multiple shadow maps, each of them processing a different section of the scene, using a slightly different projection to render different objects based on their distance to the camera.

For example, let’s assume our camera has a maximum viewing distance of 1000 units (meters, feet, light-years, etc…). Then, we could split the scene objects into four groups based on the distance from any object to the camera. Each of those groups will be rendered into a different shadow texture, as follows (there’s a small code sketch after the list showing how a group could be picked from that distance):

  • Group 1: objects that are less than 10 units away from the camera. These are the objects that are closest to the camera and the ones ending up with the highest resolution shadows.
  • Group 2: objects that are less than 100 units away from the camera. These objects should still get a pretty decent shadow resolution.
  • Group 3: objects that are less than 500 units away from the camera. For these objects, the shadow resolution won’t be great, but it might not be that bad either.
  • Group 4: objects that are farther than 500 units. These are the objects in the background. Here, the shadows will look really pixelated and smaller objects might not even cast shadows at all.
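
To make the grouping above concrete, here’s a minimal sketch (plain C++, not Crimild’s actual API) of how a cascade index could be picked from an object’s distance to the camera, using the hypothetical split distances from the example:

#include <array>
#include <cstdio>

// Split distances from the example above (in world units). These values are
// just an illustration; a real implementation usually derives them from the
// camera's near/far planes.
constexpr std::array< float, 4 > CASCADE_SPLITS = { 10.0f, 100.0f, 500.0f, 1000.0f };

// Returns the index of the cascade (and therefore the shadow map) to sample
// for an object located at the given distance from the camera.
int selectCascade( float distanceToCamera )
{
  for ( int i = 0; i < int( CASCADE_SPLITS.size() ); ++i ) {
    if ( distanceToCamera < CASCADE_SPLITS[ i ] ) {
      return i;
    }
  }
  // Beyond the last split: fall back to the last (lowest resolution) cascade.
  return int( CASCADE_SPLITS.size() ) - 1;
}

int main( void )
{
  // 5 units -> cascade 0, 250 units -> cascade 2, 800 units -> cascade 3
  std::printf( "%d %d %d\n", selectCascade( 5.0f ), selectCascade( 250.0f ), selectCascade( 800.0f ) );
  return 0;
}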

Here’s how it looks in action:

Notice how shadows look pixelated at first since they are farther away from the camera, but they start to look better and better as the camera gets closer.

The following video shows how the different objects are grouped together into cascades of different colors:

Notice how shadows become less and less pixelated as the camera gets closer.

If you paid attention to the videos above, you might have noticed that some shadows disappear completely when the camera gets closer (look at the cyan cascade in the second video). The reason for that problem is simple to explain: for each cascade, we need to "zoom into" the objects we care about, so only objects that are currently visible from the camera's point of view are processed when computing the shadow maps. But an object might receive shadows cast by objects that are not visible from the current viewpoint. In the videos above, some of the cubes get culled when they're not visible to the camera and are therefore not rendered into the shadow map. I still need to fix this behavior.

If you want to know more about this technique, here’s an excellent article from GPU Gems 3, which I used as the basis for my implementation.

Shadows Everywhere!

I spent the last couple of weeks working on improving shadows for each of the different types of lights in Crimild. The work is far from over, but I wanted to share this anyway since there are some visible results already.

Historically, support for shadows has always been poor in Crimild, often limited to directional or spot lights. Now that I’m refactoring the entire rendering system, it’s a good time to implement proper shadow support for all light types.

Light types in Crimild

There are four different light types supported by Crimild at the time of this writing (a small sketch after the list shows the properties each one cares about):

  • Ambient: This is not really a light source, but rather a color that is applied to the entire scene, regardless of whether the objects are under the influence of any other light or not. It’s supposed to serve as an indirect light source (think of light that bounces off walls or other bodies), since there’s no support for global illumination in Crimild (yet).
  • Directional: The simplest light source. It simulates a light that is very far away and therefore all rays are assumed to be parallel (think about the Sun). Directional lights do not have a position in the world and influence the entire scene.
  • Spot: This is a light that has both a position and a direction. The most straightforward examples are street lamps or flashlights. Also, spots may define a cone of influence for the light, only lighting objects inside that area.
  • Point: This is a light source that has a position in space and casts light rays in all directions (a torch or a light bulb). Point lights have an area of influence as well, defined as a sphere.
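
Just to make the differences concrete, here’s a tiny hypothetical sketch (not Crimild’s actual Light class) of the properties each type cares about:

// Hypothetical data, used only to illustrate the light types described above.
struct Vector3 { float x, y, z; };
struct ColorRGB { float r, g, b; };

enum class LightType { AMBIENT, DIRECTIONAL, SPOT, POINT };

struct LightDesc {
  LightType type = LightType::DIRECTIONAL;
  ColorRGB color = { 1.0f, 1.0f, 1.0f };
  Vector3 direction = { 0.0f, -1.0f, 0.0f }; // directional and spot lights
  Vector3 position = { 0.0f, 0.0f, 0.0f };   // spot and point lights
  float coneAngle = 0.785398f;               // spot lights: cone of influence (~45 degrees)
  float radius = 10.0f;                      // point lights: sphere of influence
};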

Shadow Mapping

The technique I used for shadows is the same one everyone’s been using in games for the past 15 or more years: shadow mapping. This technique requires rendering the scene at least once from the point of view of the light, producing an image where each pixel stores the distance between the light and its closest geometry. Another way to say this is: if the light is casting rays from its origin (or in a given direction), we want to know the distance to the very first object intersected by each of those rays.

After the shadow map is created, we render the scene as usual in a different pass (from the camera’s point of view this time) and, for each visible object, we calculate its distance (distV) to a given light and we compare that value with the one stored in the shadow map for the same light (distS). If distV > distS, it means something else is closer to the light and therefore the visible object is in shadow.
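
In code, that comparison boils down to something like this (a plain C++ sketch rather than the actual shader code; the small bias is a common trick to avoid self-shadowing artifacts, not necessarily what Crimild uses):

// distV: distance from the light to the visible point (computed in the main pass).
// distS: distance stored in the shadow map for the same light and direction.
// Returns true if the visible point is in shadow.
bool isInShadow( float distV, float distS, float bias = 0.005f )
{
  // Something else was closer to the light along this ray, so the visible
  // point cannot "see" the light directly.
  return distV - bias > distS;
}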

That was an extremely simplified description of what shadow mapping is. Check this link if you want to know more about this technique.

In the videos below, the white rectangles in the lower-right corner show the computed shadow maps for each of the light types. Darker objects are farther away from the light source.

Shadow Atlas

If there are several lights that need to cast shadows, we need to create a shadow map for each of them, of course. In order to optimize things a bit (and make the shader code simpler), all shadow maps are stored in a single shadow atlas (which is a big texture, basically).

The shadow atlas is not organized in any particular way at the moment, though. All shadows are computed in real time, every frame, and the atlas is split into regions of the same size. This is not ideal, but it works.

A future update will split the atlas into regions of different sizes. The bigger the region, the more resolution the shadow map will get and the better the final shadow will look. But how can we define which region is given to which light? Simple: by predefining priorities for each light source. For example, directional lights should be rendered with as much resolution as possible (since they pretty much need to contain the entire scene), so they should use the biggest available region. Point and spot lights, on the other hand, can be sorted based on their distance to the camera: the closer the light source, the bigger the region it gets in the shadow atlas.
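
As a rough sketch of that prioritization (hypothetical types, since this part isn’t implemented yet), sorting the shadow-casting lights could look like this:

#include <algorithm>
#include <vector>

// Minimal description of a shadow-casting light, used here only for sorting.
struct ShadowCaster {
  bool isDirectional = false;
  float distanceToCamera = 0.0f;
};

// Sort lights so the first entries get the biggest regions of the atlas:
// directional lights first, then spot/point lights from closest to farthest.
void sortByAtlasPriority( std::vector< ShadowCaster > &lights )
{
  std::sort(
    lights.begin(),
    lights.end(),
    []( const ShadowCaster &a, const ShadowCaster &b ) {
      if ( a.isDirectional != b.isDirectional ) {
        return a.isDirectional;
      }
      return a.distanceToCamera < b.distanceToCamera;
    }
  );
}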

Enough theory. Let’s see this in motion.

Directional Lights

This is the simplest scenario (since I haven’t implemented cascading shadow maps yet). The scene is rendered once from the point of view of the directional light. Since that particular light type is supposed to be far away, we use an orthographic projection when creating the shadow map, meaning shadows are not deformed when projected onto the ground.
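
For reference, this is what a standard orthographic projection looks like (a generic column-major sketch, not Crimild’s math API); the bounds would be chosen so the whole visible scene fits inside the light’s view:

// Minimal 4x4 matrix in column-major order, just for this sketch.
struct Matrix4 { float m[ 16 ] = { 0 }; };

// Classic orthographic projection: no perspective divide, so parallel light
// rays stay parallel and shadows keep their shape regardless of distance.
Matrix4 orthographic( float left, float right, float bottom, float top, float zNear, float zFar )
{
  Matrix4 p;
  p.m[ 0 ] = 2.0f / ( right - left );
  p.m[ 5 ] = 2.0f / ( top - bottom );
  p.m[ 10 ] = -2.0f / ( zFar - zNear );
  p.m[ 12 ] = -( right + left ) / ( right - left );
  p.m[ 13 ] = -( top + bottom ) / ( top - bottom );
  p.m[ 14 ] = -( zFar + zNear ) / ( zFar - zNear );
  p.m[ 15 ] = 1.0f;
  return p;
}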

If more than one light is casting shadows, we need to render the scene once per light, computing the corresponding projection in each case.

The video below shows the shadow atlas in action. All shadow maps are rendered in the same texture.

Spot Lights

Shadows for spot lights are computed in a similar way as for directional ones, except that in this case we use a perspective projection when rendering the shadow map, since light rays are emitted from a given position in space. The final effect is that shadows are stretched with distance.
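
A sketch of the corresponding perspective projection (again generic code, not the engine’s API), where the field of view is taken from the spot light’s full cone angle:

#include <cmath>

// Minimal 4x4 matrix in column-major order, just for this sketch.
struct Matrix4 { float m[ 16 ] = { 0 }; };

// Perspective projection whose vertical field of view matches the spot's cone.
// Rays diverge from the light's position, which is what stretches shadows
// with distance.
Matrix4 spotProjection( float coneAngle, float zNear, float zFar )
{
  const float f = 1.0f / std::tan( 0.5f * coneAngle );
  Matrix4 p;
  p.m[ 0 ] = f; // aspect ratio is 1 since the shadow map region is square
  p.m[ 5 ] = f;
  p.m[ 10 ] = ( zFar + zNear ) / ( zNear - zFar );
  p.m[ 11 ] = -1.0f;
  p.m[ 14 ] = 2.0f * zFar * zNear / ( zNear - zFar );
  return p;
}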

If we have multiple spot lights, each of them is rendered individually.

Point Lights

Computing shadows for point lights is the most expensive case, since we need to render the scene six times. Why? Because point lights cast rays in all directions, the shadow map is actually a cube with six faces. It’s like having six spot lights, pointing up, down, left, right, forward and backward.
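
Here’s a small sketch (hypothetical names, not the engine’s actual code) of the six directions involved; a real implementation would build a view matrix per face and render the scene once with each of them:

struct Vector3 { float x, y, z; };

// One entry per cube map face: the direction this "virtual spot light" looks
// at, plus an up vector to build the corresponding view matrix.
struct CubeFace {
  Vector3 direction;
  Vector3 up;
};

// +X, -X, +Y, -Y, +Z, -Z: the usual cube map face order, each rendered with
// a 90-degree perspective projection.
constexpr CubeFace CUBE_FACES[ 6 ] = {
  { { 1, 0, 0 }, { 0, -1, 0 } },
  { { -1, 0, 0 }, { 0, -1, 0 } },
  { { 0, 1, 0 }, { 0, 0, 1 } },
  { { 0, -1, 0 }, { 0, 0, -1 } },
  { { 0, 0, 1 }, { 0, -1, 0 } },
  { { 0, 0, -1 }, { 0, -1, 0 } },
};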

As you might have guessed, this complexity increases even more as more point lights are added to our scene. In that case, we’re rendering the scene 6*N times, where N is the number of point lights casting shadows.

Next Steps

As I mentioned before, the work is not yet completed. At the moment, I’m working on cascading shadow maps, which is a technique to improve shadow resolution for directional lights (I’ll talk about it when it’s ready).

Also, I want to make some optimizations, both to the shadow atlas organization and to each of the light sources. For example, I can use frustum culling to avoid rendering objects that are not actually visible to a given light source. But I’ll leave all that until after implementing physically based rendering (hopefully before the end of the year).

Stay tuned.

Writing Compositions

When I started working on the new frame graph and render passes (and related classes), I knew they were a bit of a pain to work with. Simply put, they are just too verbose.

Let me give you an example: let’s assume you want to render a simple triangle. Then, it’s just as simple as:

  1. Create one or more attachments, each of them including:
    1. Usage flags (is it a color or a depth attachment?)
    2. Format flags (is it RGB, RGBA, whatever the swapchain supports, RGB8, RGB32?)
    3. Is it supposed to be used as a texture? Then you also need to specify image views
  2. Create a pipeline
    1. Lots of settings (rasterization, depth)
    2. More settings (alpha, cull mode)
    3. Plus, you need to create and configure shaders
    4. Also, more settings (viewport modes, etc)
  3. Create descriptors for
    1. Textures
    2. Uniform buffers
      1. So many options
  4. Create a render pass
    1. Link it with attachments and descriptors
    2. Is it rendering offscreen? 
      1. Link it with more attachments and descriptors from other render passes
    3. Record render commands
      1. You need a pointer to your scene here, BTW…

See? Easy… (that’s the simplest scenario I can think of)

Ok, I’m not being fair here.

Yes, the API is verbose. But that’s exactly how the API is supposed to be. And that is OK, because I wanted it to be verbose. That kind of verbosity is exactly what allows us to customize our rendering process to whatever needs we have in our simulation. Besides, we are only supposed to do that once in our code (unless our rendering process changes for some reason). So all that verbosity is acceptable.

Wait. If we only need to deal with that verbosity a few times in our program, why am I complaining about it?

The problem is that, as I’m re-organizing the demos and examples, I suddenly found myself writing lots of new applications. And that means having to deal with that verbosity in each of them. Which is annoying, not only because I have to repeat myself every time I want to render a scene, but also because the API is still changing and I have to go over all of the examples time and time again in order to make sure they’re all up-to-date.

Therefore, I need a simpler way to deal with this verbosity. 

But let me be clear here. I don’t want to get rid of the verbosity by introducing a simpler API. That’s a big NO. I like that verbosity. I’m very happy with that verbosity and I know that is the cost I have to pay for having that kind of customization power.

What I want is just a way to avoid repeating myself every time I create a new demo. To be honest, I don’t care if this is not good enough for real-world applications (more on that later).

So, I need a tool to compose different render passes and mix them in an expressive way. Therefore, I need a… oh, right. I spoiled it already with the title of this post. Mmm. Ok, I’m going to say it anyways and you promise you’ll make your best surprised face, ok? Here we go:

What I need is a Composition tool.

Surprise!

About compositions

Let’s go over our requirements again:

  1. We’re going to be creating lots of objects when defining the different render passes and they need to be kept alive for as long as our composition is valid. In order to accomplish this requirement, we can store them in a struct called Composition, containing a list of objects (there’s a minimal sketch of such a type right after this list). If a composition is destroyed (i.e. the app ends, or we swap the composition with a different one), all of its internal objects will also be destroyed.
  2. We’re going to be rendering images, so we need to treat Attachments in a special way. We need to declare at least one of them to be the resulting image (the one that is going to be presented to the screen). We also need to access them by name so we can link different render passes if needed (for example, we might need to apply some special effect to an image generated by rendering a 3D scene). For this purpose, the Composition type also keeps a map with references to all of the existing attachments (there is a chance for name collisions between attachments, but I don’t care for the moment. I don’t want to complicate things too much at this stage).
  3. Obviously, we need a mechanism for creating compositions that is reusable. That is the whole point of this discussion. This mechanism will deal with all the verbosity I mentioned above but, since it’s reusable, that’s not a problem. These mechanisms are called generators and they’re simple functions that return Composition objects.
  4. Finally, we want to be able to mix generators in order to produce more complex effects. For example, we might want to apply some special effect to an image containing a rendered scene. And then overlay UI elements on top of it. So, the generators receive an optional Composition argument, which can be augmented with new objects (and it should produce images).
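
Before looking at actual generators, here’s a minimal sketch of what a Composition type covering requirements 1 and 2 might look like (heavily simplified, and not necessarily how the real class is written):

#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

class Attachment; // defined elsewhere in the engine

class Composition {
public:
  // Creates an object of any type and keeps it alive for as long as the
  // composition exists (requirement 1).
  template< typename T, typename... Args >
  T *create( Args &&... args )
  {
    auto obj = std::make_shared< T >( std::forward< Args >( args )... );
    m_objects.push_back( obj );
    return obj.get();
  }

  // The attachment that ends up being presented to the screen (requirement 2).
  void setOutput( Attachment *att ) { m_output = att; }
  Attachment *getOutput( void ) { return m_output; }

private:
  std::vector< std::shared_ptr< void > > m_objects; // keeps everything alive
  std::unordered_map< std::string, Attachment * > m_attachments; // named lookups (filled when attachments are created; omitted here)
  Attachment *m_output = nullptr;
};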

So, how does all that look in practice?

Composition myGenerator( void ) 
{
  Composition cmp;
  auto color = cmp.create< Attachment >( "color" );
  auto depth = cmp.create< Attachment >( "depth" );
  auto renderPass = cmp.create< RenderPass >();
  renderPass->attachments = { color, depth };
  renderPass->recordCommands();
  cmp.setOutput( color ); // the main attachment
  return cmp;
}

Composition anotherGenerator( Composition cmp )
{
  auto baseColor = cmp.getOutput();
  auto color = cmp.create< Attachment >( "anotherColor" );
  auto descriptorSet = cmp.create< DescriptorSet >();
  descriptorSet->descriptors = {
    Descriptor {
      .type = TEXTURE,
      .texture = baseColor->getTexture(),
    },
  };
  auto renderPass = cmp.create< RenderPass >();
  renderPass->attachments = { color };
  renderPass->descriptors = descriptorSet;
  renderPass->recordCommands();
  cmp.setOutput( color ); // the new attachment becomes the composition's output
  return cmp;
}

auto finalComposition = anotherGenerator( myGenerator() );

A more real-world example might look like this:

namespace crimild {
  namespace compositions {
    Composition present( Composition cmp );
    Composition sepia( Composition cmp );
    Composition vignette( Composition cmp );
    Composition overlay( Composition cmp1, Composition cmp2 );

    // pure generators
    Composition renderScene( Node *scene );
    Composition renderUI( Node *ui );
  }
}

auto composition = present( 
  overlay(
    sepia( vignette( renderScene( aScene ) ) ),  
    renderUI( aUI )
  )  
);

Notice how some generators can augment existing compositions in order to apply effects. For this purpose, they can access existing attachments (or maybe other resources) inside the composition by name.

Also, renderScene and renderUI are considered pure generators, since they create a new composition from scratch (they receive no composition argument).

Finally, overlay takes two compositions and produces a new one that is a mix of both: the input compositions are merged, and the resulting one contains all of their objects.

Now it’s really easy to create new applications combining these compositions together:

I even went as far as creating a debug composition generator, which takes every single attachment created by other generators and displays them on screen:

Final Thoughts

I like this approach because it’s simple and we can combine different generators together. Performance is not the best at the moment, since every time we pass a composition from one generator to another we’re (probably?) copying the internal collections (not the objects themselves, though), which is not great.

This is done only once in our app (usually at the very beginning), which is not that bad, but it might not be a good solution for performance-heavy applications or games.

Yet, it’s more than enough for examples or simple simulations, which is exactly what I needed.