Updates, Updates, Updates!

Oh!

Hello, there…

I… I wasn’t expecting any visitors…

Are you sure you’re not lost?

No?

And you want to know what’s new with Crimild? Really?

You want…

Updates?

Are you sure?

No, no, no. The project is not dead. Far from it. I was just too busy (okay, lazy) to write about it.

Ok, then.

Let’s find some updates for you…

I’m sure I saw some of them around…

Ah, yes! Here’s something new:

Yes, Crimild has a new editor!

(because the old one sucked, to be honest).

The new editor is still in its early stages, of course, but it already includes features like a 3D scene view where you can manipulate objects, a simulation panel that can be paused at any time, an inspector for modifying things like node transformations and components, and a hierarchy view where we can organize our scene. There’s a file system panel for the project too, but it doesn’t do anything special right now… Sorry, project panel, better luck next year.

And yes, I’m using ImGui for all of the UI rendering. ImGui made things a lot easier and there are already great extensions for graph-based editors and transformation gizmos, which I’m also taking advantage of.

Speaking of graphs, there’s a Behavior editor too:

Behaviors are stronger than ever with this new tool. But what about scripting, then? Well, I ended up deprecating all Lua scripts in favor of this visual-programming-like paradigm, much like other engines are doing these days.

The one thing that made all this possible was Crimild’s ability to encode/decode scenes (and other kinds of objects) directly into a binary format. This is used not only for saving things to disk, but also in more subtle ways, like when a simulation runs and we need to duplicate scene nodes.
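
To give a rough idea of how that works, here’s a minimal sketch of a clone-through-serialization helper. The MemoryEncoder/MemoryDecoder names and their interfaces are placeholders I made up for the example, not necessarily Crimild’s actual coding API:

```cpp
// Hypothetical sketch: duplicating a node by round-tripping it through the
// binary encoder/decoder. All class and method names here are assumptions.
#include <cstdint>
#include <memory>
#include <vector>

class Node; // scene node: transform, components, children, etc.

class MemoryEncoder {
public:
    void encode( std::shared_ptr< Node > const &node );         // object graph -> bytes
    std::vector< std::uint8_t > getBytes( void ) const;
};

class MemoryDecoder {
public:
    void fromBytes( std::vector< std::uint8_t > const &bytes ); // bytes -> object graph
    std::shared_ptr< Node > getRoot( void ) const;
};

// Duplicate a node (and everything it owns) without writing
// node-specific copy code: encode it, then decode a fresh copy.
std::shared_ptr< Node > cloneNode( std::shared_ptr< Node > const &original )
{
    MemoryEncoder encoder;
    encoder.encode( original );

    MemoryDecoder decoder;
    decoder.fromBytes( encoder.getBytes() );
    return decoder.getRoot();
}
```

The nice part is that whatever can be saved to disk can also be cloned in memory, which is exactly what the simulation needs when it duplicates scene nodes.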

But wait! There’s more!

You’ll be happy to know that, on the rendering side, there’s an improved Vulkan renderer (or renderers), which is much simpler than the first iteration I made last year and completely isolates rendering code from simulation. There are also enhanced shadow mapping techniques.

You want more?

I’ve modernized all CMake configuration files. Yes, I finally did it! I know, I know, I should have done this several years ago…

Finally, I’m using GitHub Actions now, compiling the project for several platforms whenever new PRs are created. Now there’s no excuse for failing tests. And I’m also using GitHub Projects instead of Trello, which integrates better with the rest of the ecosystem.

Well, that’s it.

That’s all I have…

For now 🙂

Happy New Year!

PS: Did you know that Crimild turned 20 this year?

Stochastic Iterative Tile-Based Compute Ray Tracing (SITBC-RT)

(or, how I managed to make ray tracing a bit faster without having one of those fancy new GPUs…)

Crimild’s existing ray tracing solution (yes, that’s a thing) is still very basic, but it is capable of rendering complex scenes with several primitive types (spheres, cubes, planes, etc.). The problem is that it’s too slow, mainly because it’s completely software based and runs on the CPU. True, it does make use of multithreading to reduce render times, but it’s still too slow to produce nice images in minutes rather than hours.

Therefore, I wanted to take advantage of my current GPU(*) in order to gain some performance and decrease rendering times. Since I don’t own a graphics card with ray tracing support, the only option I had left was to use GPU Compute instead.

(*) My Late 2018 MacBook Pro has a Radeon Pro Vega 20 dedicated GPU, with only 4GB of VRAM.

Requirements

What I needed was a solution that provides at least the following features:

  • Physically correct rendering: Of course, the whole idea is to render photorealistic images.
  • Short render times: Ray tracing is slow by definition, so the faster an image is rendered, the better.
  • Iterative: Usually, I don’t need the final image to be completely rendered in order to understand if I like it. As a matter of fact, only a few ray bounces are required to know that. So, the final image should be composed over time. The more time passes, the better the image quality gets, of course.
  • Interactive: This is probably the most important requirement. I need the app to be interactive while the image is being rendered, not only so I can use menus/buttons, but also to allow me to reposition the camera or change the lighting conditions and then check the results of those decisions as fast as possible.
  • Scalable: I want a solution that works on any relatively modern hardware. Like I said, my setup is limited.

The current solution (the CPU-based one) is already physically correct, iterative, and quite scalable, but it falls short on the other requirements. And it is nowhere close to being interactive.

Initial attempts

My first attempts were very straightforward.

Using shaders, I started by casting one ray for each pixel in the image, making it bounce for a while and then accumulating the final color. It was, after all, the very same process I use in the CPU-based solution, but now triggering thousands of rays in parallel thanks to “The Power of the GPU”.
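
Roughly speaking, that first kernel looked like the loop below. I’m showing it as plain C++ for readability rather than actual shader code, and every type and method here is a placeholder, not Crimild’s real API:

```cpp
// Naive version (sketch): each GPU thread owns one pixel and traces the
// entire path, every frame. Everything below is a stand-in declaration.
struct Vec3 { float x, y, z; };
Vec3 operator+( Vec3 a, Vec3 b );
Vec3 operator*( Vec3 a, Vec3 b );

struct Ray { Vec3 origin, direction; };
struct HitRecord;

struct Material {
    Vec3 albedo;
    Ray scatter( const Ray &ray, const HitRecord &hit ) const; // reflection/refraction
};

struct HitRecord {
    Vec3 point, normal;
    Material material;
};

struct Camera { Ray generateRay( int x, int y ) const; };

struct Scene {
    Camera camera;
    bool intersect( const Ray &ray, HitRecord &hit ) const;
    Vec3 background( const Ray &ray ) const;
};

Vec3 tracePixel( const Scene &scene, int x, int y, int maxBounces )
{
    Ray ray = scene.camera.generateRay( x, y );
    Vec3 color = { 0, 0, 0 };
    Vec3 throughput = { 1, 1, 1 };

    for ( int bounce = 0; bounce < maxBounces; ++bounce ) {
        HitRecord hit;
        if ( !scene.intersect( ray, hit ) ) {
            // Missed everything: pick up the background color and stop.
            color = color + throughput * scene.background( ray );
            break;
        }
        // Attenuate by the material and keep bouncing.
        throughput = throughput * hit.material.albedo;
        ray = hit.material.scatter( ray, hit );
    }

    // The result gets accumulated (averaged) with previous frames.
    return color;
}
```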

Render times did drop considerably with this approach, generating images in a matter of seconds instead of minutes. And it seemed like the correct solution. But it had one major flaw: it was not scalable and, most importantly, it wasn’t even stable.

For simple scenes, everything was fine. Yet, the more complex the scene became, the more computations the GPU had to do per pixel, per frame. That could stall the GPU completely, eventually crashing my computer (yes, Macs can crash).

So, a different approach was needed.

Think different

I took a few steps back, then.

I knew that the most important goal was not to produce images faster (keep reading), but to keep the simulation interactive, and stable too. That meant running it at 30 fps (or more, if possible), leaving the time budget for a single frame at 33 ms (or less).

In short, I needed to constrain the amount of work done in each frame, even if that means the final image might take a bit longer to render. It will still be faster than the CPU-based approach, of course.

After some thought, I came up with a new approach: each frame, only a single ray per thread is computed. Let me repeat that: one ray per frame, not one whole pixel. Obviously, producing the final color for a single pixel will require multiple frames, but that’s fine.

How does it work?

  1. Given a pixel, we cast a ray from the camera through that pixel.
  2. We compute intersections for the current ray.
  3. If the ray hits an object, the material color is accumulated and a new ray is spawned based on reflection/refraction calculations. The new ray is saved in a bag of rays for a future iteration.
  4. On the next frame, we grab one ray from the bag and compute the intersections for it, spawning another ray if needed.
  5. If there are no more rays in the bag for that pixel, we accumulate the final color and update the image.
  6. And then we start over for that pixel.

And that’s it.

Notice that I said we grab any ray from the bag. I don’t care about the order in which they were created. Eventually, we’ll process all of the rays, so there’s really no need to get them in order. Even if more rays are generated along the way, the final image will converge.
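
Here’s a sketch of that per-frame step, again written as plain C++ rather than the actual compute shader, with made-up names for the ray bag and the intersection routine:

```cpp
// One-ray-per-frame version (sketch): every frame we process at most
// `budget` rays from the bag, so a single frame never blows past its
// ~33 ms time slice. All names and types are placeholders.
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct Ray {
    Vec3 origin;
    Vec3 direction;
    int pixelIndex;   // which pixel this path belongs to
    Vec3 throughput;  // accumulated attenuation so far
};

struct TraceResult {
    bool terminated;  // true if the path ended (miss or max depth)
    Vec3 color;       // final color, only valid when terminated
    Ray scattered;    // next ray, only valid when not terminated
};

// Intersection + reflection/refraction for a single ray (not shown here).
TraceResult intersectAndScatter( const Ray &ray );

// Start a brand-new path for the next pixel that needs more samples.
Ray generatePrimaryRay();

// Fold a finished path's color into the image for the given pixel.
void accumulate( int pixelIndex, const Vec3 &color );

void rayTracingStep( std::vector< Ray > &bag, std::size_t budget )
{
    for ( std::size_t i = 0; i < budget; ++i ) {
        if ( bag.empty() ) {
            // Nothing pending: start over by casting a new primary ray.
            bag.push_back( generatePrimaryRay() );
        }

        // Grab *any* ray. Order doesn't matter: the image converges anyway.
        Ray ray = bag.back();
        bag.pop_back();

        TraceResult result = intersectAndScatter( ray );
        if ( result.terminated ) {
            accumulate( ray.pixelIndex, result.color );
        } else {
            // The path continues: keep the scattered ray for a future frame.
            bag.push_back( result.scattered );
        }
    }
}
```

The budget is the knob that keeps each frame within its time slice, no matter how complex the scene gets.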

Results

This is indeed a very scalable solution, since we can choose how many threads to execute on the GPU each frame: 1, 30, 500, 1000, or even one per pixel in the image if the scene is not too complex or we have a powerful setup.

Of course, the fewer threads we execute, the more time the image will take to complete. But it still takes less than a minute to produce a “good enough” result on my current computer.

Next steps

There’s always more work to do, of course.

Always.

Beyond adding new features to the ray tracer (like new shapes or materials), there is still room for optimization. For example, I’m not using any acceleration structure (e.g. bounding volume hierarchies) like the CPU-based approach does. Once I do, performance will be even better.

And then there’s the problem of noise in the final image, but I don’t really have a good solution for that yet.

See you next time!

Decisions, Decisions

In the past couple of weeks (okay, months) I’ve been cleaning up most of the code I wrote during the big rendering system refactor in order to make it production-ready. Part of that process is making some choices that are going to have a major impact on the future of the project (at least on Crimild v5.0).

Here are the biggest decisions I had to make so far:

Deferred vs Forward

For years I’ve been using a (mostly) forward approach for rendering objects that are affected by lights. That is, each object has a shader that not only calculates its color, but also computes all of the lighting equations. This is the traditional way of rendering and, as the number of lights increases, so does the rendering time. Plus, it needs to evaluate each light for each pixel, regardless of whether that pixel is actually visible or occluded by another object. Simply put, it does not support a high number of lights.

I’ve always liked the deferred approach, where we split the rendering process in two: one pass renders all objects without lighting, while the lighting calculations happen in a separate pass. Deferred rendering supports a lot more lights, but it has other drawbacks, like having to render transparent objects in a separate pass and higher memory requirements. Still, it’s a lot better than what I have right now. Plus, it’s used in a lot of modern games and engines.
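
Just to make the contrast explicit, here’s the difference in very rough C++ pseudocode. None of these names come from the actual frame graph API; they’re only there to show where the lighting cost goes:

```cpp
// Rough pseudocode only: every name below is made up for the comparison.
#include <vector>

struct Light {};
struct Object {};
struct Pixel {};
struct Scene { std::vector< Object > objects; std::vector< Light > lights; };
struct GBuffer { std::vector< Pixel > &visiblePixels(); };

void shadeWithAllLights( const Object &, const std::vector< Light > & );
void writeToGBuffer( const Object &, GBuffer & );
void shadePixel( Pixel &, const std::vector< Light > & );

// Forward: every object's shader loops over every light, and occluded
// fragments still pay for lighting they'll never contribute to.
void renderForward( const Scene &scene )
{
    for ( const auto &object : scene.objects ) {
        shadeWithAllLights( object, scene.lights ); // cost grows with objects x lights
    }
}

// Deferred: a geometry pass writes albedo/normals/depth into a G-buffer,
// then a single lighting pass shades only the pixels that are visible.
void renderDeferred( const Scene &scene, GBuffer &gBuffer )
{
    for ( const auto &object : scene.objects ) {
        writeToGBuffer( object, gBuffer ); // no lighting here
    }
    for ( auto &pixel : gBuffer.visiblePixels() ) {
        shadePixel( pixel, scene.lights ); // lighting decoupled from scene complexity
    }
}
```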

I gave it a try a few years back, but it never really had “official” support in the engine. Now, I’ve finally made the call and I’m going to use deferred rendering from now on. Overall, it should keep things simpler in the long term and should help me introduce real-time ray tracing some day.

I’m aware of other, more modern approaches like Forward+ or clustered rendering, but those are too complex for Crimild at the moment. That said, thanks to the modular nature of the new frame graph, implementing such a technique in the future should not require another big refactor of the entire rendering system. So, I might give it a try next year.

PBR Lighting

Another decision I made is to stick with Physically-Based Rendering (PBR) as the only lighting solution that comes bundled with the engine. For years I attempted to maintain both physically-based and traditional (Phong/specular) lighting solutions, but there’s no point in doing that anymore since PBR is the current standard.

Of course, custom lighting solutions are still supported if needed, but from now on I’m not going to be the one having to maintain them.

Better glTF Support

The glTF file format has been around for quite some time now, and it has become the standard for handling 3D assets.

At the moment, Crimild depends on the Assimp library to load glTF models, but I’m going to change that sometime in the near future, since Assimp is pretty big and I’m only using one of the many file formats it supports. Plus, it generates a lot of warnings when compiling, and nobody likes warnings.

I would love to have a glTF loader that is part of the Core module, just as the OBJ loader is. Reading glTF (either the JSON or the binary format) is relatively easy. The real challenge lies in the fact that the Core module must be written in ANSI C++ and must not depend on any external libraries, which means I’d have to implement my very own JSON parser, and that is not a simple task. I guess I’ll stick with Assimp for now.

Which one?

This has nothing to do with Crimild, but it was definitely the hardest decision of all…