A new way to handle vertex data

As I mentioned in previous entries, one of the goals I’m pursuing for the next major release of Crimild is a more modern rendering subsystem. In a way, Vulkan is forcing me to upgrade some of the engine’s core classes which were defined more than a decade ago.

Today I’m going to talk about how Crimild will handle geometric data (i.e. vertices, indices, uniform buffers, etc.) from now on.

Yes, this is a long post.

OK, go get some popcorn. I’ll wait.


Let’s begin.

The Problem

In the past, using any of the objects representing buffers that contain geometric data looked something like this:

// Array containing vertex's data
float vertices[] = {
    /* XYZ */             /* RGBA */
    0.0f, 1.0f, 0.0f,     1.0f, 0.0f, 0.0f, 1.0f,
    -1.0f, 0.0f, 0.0f,    0.0f, 1.0f, 0.0f, 1.0f,
    1.0f, 0.0f, 0.0f,     0.0f, 0.0f, 1.0f, 1.0f,
};

// Declare vertex format
// positions components: 3 (xyz)
// colors components: 4 (rgba)
// everything else: 0
auto format = VertexFormat( 3, 4, 0, 0, 0, 0, 0 );

// Create vertex buffer. We need to specify the number of vertices too
auto vbo = crimild::alloc< VertexBufferObject >( format, 3, vertices );

The VertexBuffer interface was defined a long, long time ago (yes, that’s a link to SF.net). And, to be honest, it worked pretty well during all these years. But now it’s starting to show its age.

First of all, every possible vertex attribute has the same type: float. That’s it. While it works for positions, normals or colors, it’s weird for other attributes, like bone or face indices, for example. And there’s no way to use a different precision. Let’s face it: we’re already in 2015 2016 2017 2018 2019 2020 and we can use pretty much any type we want in our shaders, so why limit them to only one?

Also, you cannot specify any attribute that is not already included in the VertexFormat class. More importantly, adding new attributes to that class is a bit of a pain and cannot be done by end users without modifying the engine’s code.

As a side note, since it's one of the most essential classes, modifying VertexFormat in any way requires recompiling pretty much the entire engine, which is really annoying.

Plus, we need to declare all vertex data as a giant blob of float values, which is obviously not straightforward to understand when three or more attributes are present and we need to fall back to adding comments in between values.

And don’t get me started on how to access individual attribute data.

A new data abstraction

Hopefully by now it should be clear enough why we need an alternative mechanism to handle geometric data. Honestly, it wasn’t an easy call to make because I would have to change a LOT of code. But someone had to do it. And I’m the only one in here, so…

After some time and several failed attempts, I came up with the following class hierarchy:

Buffers, Views & Accessors

Let’s start with the basics: Buffers.

Buffers store linear data in a contiguous byte array without really knowing what that data represents. It could be anything from lots of vertices to a single value or even an image. You can obtain the buffer’s raw data and its size, but you don’t really know what it is nor how it should be manipulated.

Then, we have Buffer Views, which give us a little bit more information, like what region of that data we really care about. For example:

We can create several views referencing different regions of the same buffer of data. But then again, views don’t tell us what that data is either.

That’s where Buffer Accessors come into play, helping us manipulate the contents of a buffer view, either individually or in bulk.

Let’s see how these three concepts work with each other in the next sections.
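To make the trio a bit more concrete, here’s a minimal sketch of how these three classes could relate to each other. Names and signatures here are hypothetical, not Crimild’s actual API:

```cpp
#include <cstring>
#include <vector>

// A raw byte container with no knowledge of what it stores
class Buffer {
public:
    explicit Buffer( std::vector< unsigned char > data ) : m_data( std::move( data ) ) { }
    unsigned char *getData( void ) { return m_data.data(); }
    size_t getSize( void ) const { return m_data.size(); }
private:
    std::vector< unsigned char > m_data;
};

// A window into a region of a buffer (offset + length in bytes)
class BufferView {
public:
    BufferView( Buffer *buffer, size_t offset, size_t length )
        : m_buffer( buffer ), m_offset( offset ), m_length( length ) { }
    unsigned char *getData( void ) { return m_buffer->getData() + m_offset; }
    size_t getLength( void ) const { return m_length; }
private:
    Buffer *m_buffer;
    size_t m_offset;
    size_t m_length;
};

// Interprets the bytes behind a view as typed values
template< typename T >
class BufferAccessor {
public:
    explicit BufferAccessor( BufferView *view ) : m_view( view ) { }
    T get( size_t index )
    {
        T value;
        std::memcpy( &value, m_view->getData() + index * sizeof( T ), sizeof( T ) );
        return value;
    }
private:
    BufferView *m_view;
};
```

The point of the split is that only the accessor knows about types; the buffer and the view stay format-agnostic.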

Vertex Buffers In 2020

This is how Vertex Buffers are created in my new implementation:

struct VertexP2C3 {
    Vector2f position;
    RGBColorf color;
};

auto layout = VertexLayout()
    .withAttribute< Vector2f >( VertexAttribute::Name::POSITION )
    .withAttribute< RGBColorf >( VertexAttribute::Name::COLOR );

auto vbo = crimild::alloc< VertexBuffer >(
    layout,
    Array< VertexP2C3 > {
        {
            .position = Vector2f( -0.5f, 0.5f ),
            .color = RGBColorf( 1.0f, 0.0f, 0.0f ),
        },
        {
            .position = Vector2f( -0.5f, -0.5f ),
            .color = RGBColorf( 0.0f, 1.0f, 0.0f ),
        },
        {
            .position = Vector2f( 0.5f, -0.5f ),
            .color = RGBColorf( 0.0f, 0.0f, 1.0f ),
        },
        {
            .position = Vector2f( 0.5f, 0.5f ),
            .color = RGBColorf( 1.0f, 1.0f, 1.0f ),
        },
    }
);

Much nicer, right? Let’s see what’s going on here.

First, we create a vertex buffer from an array of data, but this time that array can be of any type. Even an array of structs declaring our vertex contents, as in the example above.

The key concept here is the VertexLayout class. A layout is created by specifying which attributes are available, along with their formats and sizes. The total size of a vertex is calculated automatically. The VertexAttribute class can be easily extended, since all names are numbers.

There are several vertex layouts already declared in Crimild, but you can also create custom ones based on your needs. And considering you’re able to supply any VertexLayout with any number of VertexAttribute specifications (plus any index format as we’ll see later), you have almost unlimited freedom of expression without having to change a single line of the engine. You still need to observe the different capabilities of the target GPU, of course.
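As a rough illustration of how a layout could compute each attribute’s offset and the total vertex size automatically, here’s a small sketch. This is hypothetical code, not the engine’s real implementation (which uses attribute names and formats instead of plain strings):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// A layout records each attribute's name, size and byte offset.
// The vertex size (a.k.a. stride) is just the running total.
class VertexLayout {
public:
    struct Attribute {
        std::string name;
        uint32_t size;    // in bytes
        uint32_t offset;  // in bytes, from the start of the vertex
    };

    template< typename AttributeType >
    VertexLayout &withAttribute( std::string name )
    {
        m_attributes.push_back( { std::move( name ), sizeof( AttributeType ), m_vertexSize } );
        m_vertexSize += sizeof( AttributeType );
        return *this;
    }

    uint32_t getVertexSize( void ) const { return m_vertexSize; }

    uint32_t getAttributeOffset( const std::string &name ) const
    {
        for ( const auto &attr : m_attributes ) {
            if ( attr.name == name ) {
                return attr.offset;
            }
        }
        return 0;
    }

private:
    std::vector< Attribute > m_attributes;
    uint32_t m_vertexSize = 0;
};
```

Since new attributes just append to the list, end users can extend a layout without touching the engine at all.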

Then, thanks to buffer accessors, you can now access values in an easier (and faster) way, either individually or in bulk. Assigning a new set of positions is now as easy as this:

auto positions = vertices->get( VertexAttribute::Name::POSITION );

// Overwrite all positions in bulk
positions->set(
    Array< Vector3f > {
        Vector3f( -0.5f, -0.5f, 0.0f ),
        Vector3f( 0.5f, -0.5f, 0.0f ),
        Vector3f( 0.0f, 0.5f, 0.0f ),
    }
);

If the vertex buffer contains interleaved data (that is, positions, normals and other attributes are packed together), buffer accessors will take care of offsetting the data correctly for free.

You can also create vertex buffers from buffer views and accessors, in case you need even more flexibility.

Regarding Index Buffers

Index buffers have been upgraded as well. This is how they used to work until now:

// Array of indices to draw a triangle
uint16_t indices[] = {
   0, 1, 2
};

// Create index buffer. We need to specify the number of indices too
auto ibo = crimild::alloc< IndexBufferObject >( 3, indices );

Index buffers have always been a much simpler abstraction to understand and work with. But they have one big problem: the type of an index is fixed (either uint16 or uint32, depending on how you compile the engine). That’s quite a limitation, since it constrains the number of vertices we can have in our models.

What do they look like now?

auto indices = crimild::alloc< IndexBuffer >(
    Format::INDEX_32_UINT,
    Array< UInt32 > {
        0, 1, 2,
    }
);

Not much has changed. But notice that now we are able to provide the index format, which can be either 16 bits, 32 bits or maybe other types. That allows us to have models with millions of vertices, provided our hardware supports them, of course.
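Just to make the relationship between index format and model size explicit, here’s a tiny sketch. The enum and helper names are made up for illustration:

```cpp
#include <cstdint>

// The index format drives both the per-index byte size and
// the maximum number of addressable vertices
enum class IndexFormat { UINT16, UINT32 };

constexpr uint32_t getIndexSize( IndexFormat format )
{
    return format == IndexFormat::UINT16 ? 2 : 4;
}

constexpr uint64_t getMaxVertexCount( IndexFormat format )
{
    // 2^16 vs 2^32 distinct index values
    return format == IndexFormat::UINT16 ? 65536ull : 4294967296ull;
}
```

So a 16-bit index buffer is half the size, but caps models at 65,536 vertices; 32-bit buffers trade memory for those millions of vertices.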

As with vertex buffers, you can create index buffers from views, in case you need to.

The Revolution: Uniform Buffers

This is probably the biggest improvement of all the ones I mentioned so far. Uniform buffers have been completely changed and now they’re simpler and more powerful than ever.

Uniform buffers are used to send data to shaders. Any kind of data. Therefore, they have to support pretty much infinite formats and numbers of elements (once again, as many as the target hardware supports).

It doesn’t really make sense to show the previous implementation because it’s just too different, so I’m going to let the new code do the talking here:

// A single vector value
auto v = crimild::alloc< UniformBuffer >( Vector3f() );
v->getValue< Vector3f >().x() = 10;

// A custom type
struct Data {
    Matrix4f proj;
    Matrix4f view;
    RGBAColorf color;
    float weights;
    Vector2i indices;
};

auto data = crimild::alloc< UniformBuffer >( Data {} );
data->getValue< Data >().view = Matrix4f::IDENTITY;

We create a new uniform buffer by initializing it with a valid object, so its constructor can create a buffer of the proper size (this forces us to provide a default value, which is a good way to avoid logical errors in shaders).
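Here’s a minimal sketch of how such a typed uniform buffer might store its value internally. This is hypothetical, not Crimild’s actual implementation:

```cpp
#include <cstring>
#include <vector>

// The constructor sizes the byte array from the template argument and
// copies the initial value in; getValue() reinterprets those bytes back
// as the original type
class UniformBuffer {
public:
    template< typename T >
    explicit UniformBuffer( const T &value )
        : m_data( sizeof( T ) )
    {
        std::memcpy( m_data.data(), &value, sizeof( T ) );
    }

    template< typename T >
    T &getValue( void )
    {
        return *reinterpret_cast< T * >( m_data.data() );
    }

private:
    std::vector< unsigned char > m_data;
};
```

Since the buffer itself only sees bytes, the same class works for a single vector, a matrix, or an arbitrary user-defined struct.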

And there’s a simple callback to update uniform values whenever we need to (updating some of the matrix data above, for example).

It’s really that simple.

I’m really happy about this one change.

A Question Still Remains

You might have noticed something about this new design: similarly to the previous approach, the new Buffer class stores the data that is usually copied to the GPU at a later stage in the rendering pipeline. But then, that data still remains in main memory during the whole lifetime of the buffer. Why am I still doing this? Why not send all buffer data straight to the GPU upon creation and delete it from main memory?

There are some reasons to keep the data alive, especially because getting the data back from the GPU is difficult and slow (and requires some sort of synchronization mechanism). Probably the correct way to handle this is by using a policy that depends on the resource. Some resources will be OK if we just push the data straight to the GPU on creation and then forget about it. Others will require the data to live in main memory so we can read or write it frequently.

I’ll get back to this in the future once I get more information on real-world scenarios. For the moment, keeping the data in main memory does not present a problem, considering the types of projects I’m working on.

What about images?

Images and textures are direct users of the Buffer class and, of course, they have been upgraded too…

Oh, you ran out of popcorn? That’s a problem.

Let’s talk about images and textures in the next post, then.


Long Live And Render (VII)


Yes! Shadows are finally working (again).

A higher resolution video can be found here

This is a big achievement because it’s the first real use of multiple render passes and shared attachments.

And it makes everything look nicer.

I always said that Vulkan is difficult to work with, but I do like how easy it is to use attachments as textures. I guess I reached the point where I’m actually seeing the benefits of this new API (other than just better performance, of course).

Regarding shadows, they’re created from a single directional light. You might have noticed that the shadows are actually incorrect, because directional lights are supposed to cast parallel shadows using an orthographic projection. I am using a perspective projection instead (shown in the little white rectangle at the bottom right corner), but just because it makes the final effect look nicer. The final implementation will have correct shadows for directional lights, of course.

Let’s talk about descriptor sets

In Vulkan, descriptors are used to bind data to each of the different shader uniforms. Resembling newer OpenGL versions, Vulkan allows us to group multiple values into descriptors (uniform buffers), reducing the number of bind function calls.

But that’s only the beginning. In Vulkan, we can also group multiple of those descriptors together into descriptor sets, and each of them can be bound with a single draw command. So, we only need to create one big set with all the descriptors required for all shaders, then bind it with a single function call and be done with it, right? Well, not really (*).

Where’s the catch, then? We do want to minimize the number of descriptor sets, of course, but as the number of sets decreases, the amount of data we need to send in each of them increases. Therefore, a huge, single-set approach leads to binding the whole set once for each object and render pass. Not ideal.

What we actually need is to group descriptors together depending on the frequency in which they are updated during a frame.

For example, consider shaders requiring uniforms like model, view and projection matrices to compute a vertex position. The last two of those matrices are only updated whenever the camera changes, which means that their values remain constant for all objects in our scene. On the other hand, the model matrix only needs to be updated when a model changes its pose. If the camera does change but the object itself remains stationary, there’s no need to update the model matrix. This is especially true when rendering the scene multiple times, like when doing shadows or reflections.

Then, we need two different sets, both of them updated at different times. The first set contains the view and projection matrices and is updated only if the camera frame changes. The second set only contains the model matrix and is updated once per object (regardless of which render passes it’s used in).
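The payoff of this split can be sketched with a toy cost model. This is plain C++ counting descriptor updates, not actual Vulkan calls, and all names here are made up:

```cpp
#include <cstdint>

// A frame in a static scene viewed by a moving camera
struct Frame {
    uint32_t objectCount;
    bool cameraChanged;
};

// One combined set holds camera AND model data, so every object must
// rewrite (and rebind) the whole thing each frame
uint32_t updatesWithSingleSet( const Frame &frame )
{
    return frame.objectCount;
}

// Split sets: the camera set updates at most once per frame, and the
// per-object set only updates for objects that actually moved
uint32_t updatesWithSplitSets( const Frame &frame )
{
    uint32_t movedObjects = 0; // static scene in this sketch
    return ( frame.cameraChanged ? 1 : 0 ) + movedObjects;
}
```

For a static scene of 100 objects with a moving camera, that’s 100 updates per frame versus a single one.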

In practice, shaders need much more data than just a bunch of matrices. There are colors, textures, timers, bone animation data, lighting information, etc. But we cannot create too many sets either, since each platform defines a different limit for how many descriptor sets we can bind at the same time (the Vulkan spec says that the minimum is four). Therefore, I’ve considered creating the following groups:

  • Render Pass specific descriptors
    These are all the descriptors that change once per render pass. Things like view/projection matrices, time, camera properties (FOV, near and far planes), each of the lights in the scene, shadow maps, etc.
  • Pipeline/Shader specific Descriptors
    Values that are required for shaders to work, like noise textures, constants, etc.
  • Material specific Descriptors
    These are the values for each property in the material, like colors, textures, normal maps, light maps, ambient occlusion maps, emission color, etc.
  • Geometry Descriptors
    These are values that affect only geometries, like the model matrix, bone indices, light indices, normal maps, etc.

Separating material and geometry descriptors is important. For example, if we’re rendering shadows, we don’t need the object’s colors, just its pose and animations.

Most importantly, these groups can change and be mixed however we like. If we update descriptors for a render pass, the scene will be rendered in a completely different way. We can also change materials without affecting the topology of the objects.

Up next…

There are lots of Vulkan features that I haven’t even looked at yet, but there’s one in particular that I need to implement before I’m able to merge this branch into the general development one: compute operations.

I want to be able to execute compute operations on the GPU for image filtering and/or particle systems, but that requires a lot more work.

June is going to be a busy month…

(*) I actually did that a while ago when working on the Metal-based renderer. I did not really understand at the time how uniforms were supposed to be bound, so I made one big object including everything. That’s the reason why there’s only one shader in Le Voyage and no skeletal animation, basically.

Long Live and Render (VI)

In my last post I made it clear that there were several problems with my latest frame graph changes. Here I am today, a couple of weeks later, and I’m going to tell you how I managed to fix all three of them (well, two and a half), as well as making some bonus improvements on the way. 

Removing strong references

I made frame graphs keep strong references to resources and render passes because it made sense at the time. But if any particular resource (like a texture or vertex buffer) is no longer attached to a scene, there’s no point in keeping it alive, since it won’t be rendered anyway, right?

This problem was pretty easy to solve, actually.

I only had to swap strong pointers for weak ones in most places, preventing any explicit or implicit strong reference to resources and render passes. Notice that I said *most* places, since the frame graph does allocate some internal objects (in the form of nodes) and I do need strong references for those.
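The ownership rule boils down to standard weak pointer semantics: the scene owns the resource, the frame graph only observes it. A tiny sketch with std::weak_ptr (names are illustrative, not Crimild’s):

```cpp
#include <memory>

struct Texture { };

struct FrameGraph {
    // Observes the resource without keeping it alive
    std::weak_ptr< Texture > texture;

    // An expired resource is simply skipped when rendering
    bool isTextureAlive( void ) const { return !texture.expired(); }
};
```

The moment the scene drops its shared_ptr, the frame graph sees the resource expire instead of silently extending its lifetime.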

There’s an obvious side effect, though. Now it is mandatory for developers to keep track of all created objects in their apps, because the engine might not do it automatically. Otherwise you’ll end up with crashes due to null pointers or invalid access to deleted memory. It’s an acceptable price to pay.

I could have added a storage policy to customize this behavior, but I do think that this is the right way. And I can add that policy later if I feel the need for it.

Moving on…

Automatic object registration to frame graphs

Another problem with my latest approach is that we needed to add/remove objects to/from the frame graph manually. As I explained before, this is not only cumbersome but also very error prone. Especially now that the frame graph no longer keeps strong references to its objects, forgetting to remove a resource that was previously deleted from the scene will result in a crash that’s very difficult to debug.

I made a couple of decisions to solve this one:

First, any object that can be added to a frame graph will attempt to automatically do so during its construction, provided a frame graph instance exists. It will also attempt to remove itself automatically during destruction.

How can we tell if a frame graph exists? Easy: frame graphs are singletons (oh, my God! He said the S… word). After giving it some thought, I realized I don’t need more than one frame graph instance at any given point in time. I might be wrong about this, but I can’t think of any scenario where two or more instances are needed. I guess time will tell.

Second problem solved.

Rebuilding the frame graph automatically

To be honest, I didn’t want to spend too much time on this problem at the moment. Fixing this particular issue could easily become a full-time job for several days or weeks, so I went with the easiest solution: whenever we add or remove an object, the frame graph is automatically flagged as dirty and will be completely rebuilt from scratch at some point in the future (i.e. during the next simulation loop).

This “solution” works, but it is far from the most efficient one, of course. After all, why rebuild the complete frame graph if we’re only adding a new node to the scene? Do we really need to record every single command buffer again? Well, I guess not. Maybe a change in a scene should not recreate command buffers for post-processing resources, since those are independent of the number of nodes in the 3D space. But the rules for making these decisions are not that simple. What if the nodes that were added to the scene are new lights and they change the way we do tone mapping, for example?

Rebuilding everything from scratch is the safest bet here for now. And, in the end, this behavior is completely hidden inside the frame graph implementation and can (and will) be improved in the future. 
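The dirty-flag approach described above can be sketched like this (illustrative only; the real frame graph does much more than bump a counter):

```cpp
// Mutations only set a flag; the expensive rebuild happens at most
// once per simulation loop, no matter how many objects changed
class FrameGraph {
public:
    void addObject( void ) { m_dirty = true; }
    void removeObject( void ) { m_dirty = true; }

    // Called once per simulation loop
    void compileIfNeeded( void )
    {
        if ( m_dirty ) {
            m_rebuildCount++; // stand-in for the full rebuild from scratch
            m_dirty = false;
        }
    }

    int getRebuildCount( void ) const { return m_rebuildCount; }

private:
    bool m_dirty = false;
    int m_rebuildCount = 0;
};
```

Even if a frame adds dozens of objects, the graph is only compiled once, which is what keeps the brute-force rebuild acceptable in practice.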

So, I say this is partially fixed, but still acceptable.

Bonus Track: Fixing viewports

I was not expecting to make any more changes, but when I was cleaning up the offscreen rendering demo I noticed a bug in the way viewports were set for render passes.

I wanted to have more control over the resolution at which render passes are rendered. For example, I needed an offscreen scene at a lower resolution, but it wasn’t working correctly (ok, it was not working at all).

Now it does, which allows me to have render passes at different resolutions:

Notice how an offscreen rendering at lower resolution produces only a pixelated reflection. I can do the same for the whole scene, too. And I can combine them at will. Pretty neat.

Up Next

I’m quite happy with these fixes and the frame graph feels a lot more robust now.

And, what is perhaps more important, the frame graph API can be made completely hidden from end users too. It would be easy to provide a default frame graph instance (as part of a simulation system, for example) and then an application developer can add/remove objects at will from scenes without worrying about the frame graph at all.

The next step will be to improve shader bindings (aka descriptor sets), which is something that is still very cumbersome. Or maybe I’ll do something more visual, like shadows. Or both 🙂

See you next time!