The Ghost of Refactors Past…

…has come to haunt me once again. Although this time it’s not because of mistakes I made. Instead, the problem lies in something I missed completely.

Can you spot the problem in the following code?

void SomeObject::save( Stream &s )
{
   std::size_t count = getElementCount();
   s.write( count );
}

Well, it turns out std::size_t is NOT PLATFORM INDEPENDENT. And here’s the twist: I’ve known that since, well, forever, but I never paid any attention to it. That is, until it became a problem. And the problem was EVERYWHERE in the code.

First things first. The C++ standard has this to say about std::size_t:

typedef /*implementation-defined*/ size_t;

 

What’s that supposed to mean? Basically, std::size_t may have a different precision depending on the platform. On a 32-bit architecture, std::size_t is typically represented as a 32-bit unsigned integer, while on a 64-bit platform it’s typically 64 bits wide.
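Here’s a minimal check that makes the difference visible (plain C++, nothing engine-specific):

#include <cstddef>
#include <cstdio>

int main( void )
{
   // prints 4 on a typical 32-bit target and 8 on a 64-bit one
   std::printf( "sizeof( std::size_t ) = %zu bytes\n", sizeof( std::size_t ) );
   return 0;
}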

Wait.

std::size_t is supposed to help portability, right? That’s its whole purpose. And that’s true, of course. There’s nothing wrong with std::size_t itself.

Check the code above again. Go on, I’ll wait.

So, whenever we create a [binary] stream to serialize the scene, we can’t use std::size_t because it’s just plain wrong. It will be saved as a 32-bit integer on some platforms and as a 64-bit one on others. Later, when the stream is loaded, the number of bytes read will depend on whatever the precision of the current platform is, regardless of what was used when saving. See the problem?
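To make it concrete, here’s what the loading counterpart might look like (a sketch; load() and read() are illustrative, not the actual Stream API):

void SomeObject::load( Stream &s )
{
   // on a 32-bit platform this reads 4 bytes; if the stream was written
   // on a 64-bit platform, the remaining 4 bytes are left in the stream
   // and corrupt everything read after this point
   std::size_t count;
   s.read( count );
}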

This means that we can’t share binary streams between different platforms because the data might be interpreted in different ways, leading to errors when generating scenes.

For the past few years, my main setup has been OS X and iOS, both 64-bit platforms. But one day I had to use streaming on 32-bit Android phones and, as you might have guessed by now, all hell broke loose…

Entering crimild::types

I had to make a call here: either keep using std::size_t everywhere and handle the special case in the Stream class itself, or make use of fixed-precision types for all values (especially integers), guaranteeing that the code will be platform independent.

I went for the second approach, which seems to me to be the right choice. At the time of this writing, the new types are:

#include <cstdint>

namespace crimild {

   using Int8 = std::int8_t;
   using Int16 = std::int16_t;
   using Int32 = std::int32_t;
   using Int64 = std::int64_t;

   using UInt8 = std::uint8_t;
   using UInt16 = std::uint16_t;
   using UInt32 = std::uint32_t;
   using UInt64 = std::uint64_t;

   using Real32 = float;
   using Real64 = double;

   using Bool = bool;

   using Size = UInt64;
}

As you can see, crimild::Size is always defined as a 64-bit unsigned integer regardless of the platform.
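With that in place, the snippet at the beginning of this post can be rewritten like so (assuming Stream::write handles the new types):

void SomeObject::save( Stream &s )
{
   // crimild::Size is 64 bits everywhere, so the stream layout no longer
   // depends on the platform that wrote it
   crimild::Size count = getElementCount();
   s.write( count );
}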

That means I need to change every single type definition in the current code so it uses the new types. As you might have guessed, it’s a lot of work, so I’m going to do it on the fly. I think I’ve already tackled the critical code (that is, streaming) by the time I’m writing this post, but there’s still much to be reviewed.

New code already makes use of the platform-independent types. For example, the new particle system is employing UInt32, Real64 and other new types and–

Oh, right, I haven’t talked about the new particle system yet… Well, spoiler alert: there’s a new particle system.

You want to know more? Come back next week, then 🙂

Deferred improvements

Revisiting the deferred rendering implementation took me a little more time than expected and it wasn’t easy, but I’m very excited about the results and the flexibility that has been achieved.

In contrast with my last post, this one is gonna be all about visual improvements. So, let’s begin.

Truth be told, my original goal was to enhance just the post-processing pipeline, adding support for more than one image effect at the same time and then accumulating results. But I ended up refactoring the entire deferred render pass and added a couple of new things on the way. Because, you know, refactors are fun 🙂

As before, the Deferred render path is split into three stages: G-Buffer generation, lighting and post-processing.

The G-Buffer

The G-Buffer is composed of five render targets organized as follows:

G-Buffer organization. Floating-point buffers are used whenever possible in order to keep data precision.

Both world and view space normals are stored, for different purposes. As a matter of fact, view space normals are generated from the geometry alone, without any bump mapping applied to them, since some post-processing effects (e.g. SSAO) achieve better results with less information.

The G-Buffer in action. Top row: depth, diffuse and world space positions. Bottom row: emissive (unused in this demo), world space normals and view space normals.

Lighting

The second step is to compute lighting and generate a colored frame. Usually, this step involves two passes: one for lighting and one for the final composition, but I’m doing both in a single pass. There’s room for some improvements here, but I’ll leave that to my future self.

Lighting computation, before applying post-processing effects.

Lighting is computed from world space information. Shadow maps are applied here after the scene is lit.

Post-Processing

Once the scene is generated, image effects are applied. A couple of auxiliary buffers are used (following the “ping-pong buffer” technique), accumulating results.

Ping-pong buffer technique. For each image effect, the source and accumulation buffers are swapped. Once all effects have been processed, the source buffer contains the final image.
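In code, the accumulation loop looks roughly like this (a sketch; the buffer and effect types are illustrative, not the actual engine classes):

#include <utility>
#include <vector>

struct FrameBuffer { /* color attachments, etc. */ };

struct ImageEffect {
   // read from source, write the processed result into destination
   virtual void apply( FrameBuffer *source, FrameBuffer *destination ) = 0;
};

void applyImageEffects( std::vector< ImageEffect * > &effects, FrameBuffer *scene, FrameBuffer *aux )
{
   FrameBuffer *source = scene;
   FrameBuffer *accum = aux;

   for ( auto effect : effects ) {
      effect->apply( source, accum );
      std::swap( source, accum ); // the result becomes the next effect's input
   }

   // at this point, 'source' holds the final image
}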

Concerning image effects, the new additions are Depth of Field and SSAO. There was a previous implementation for SSAO, but the new one performs blurring in order to reduce noise and improve the final results.

Applying Depth of Field to the scene.

Rendering the scene with only the output from the SSAO effect, before applying it to the scene.

Final Comments

In order to debug the entire process, I made it possible to render the results of all the passes at the same time. It is a costly operation, but extremely useful when trying to understand what’s going on. In fact, I’m planning to add this feature to other systems as well in a future iteration.

Top row: depth, diffuse, positions, shadow map (packed). Middle row: emissive (unused), world space normals (with bump mapping applied), view space normals and screen objects. Bottom row: lighting and post-processing (SSAO + DoF).

That’s it for now. I’m done with refac–

Wait…

Those shadows look awful…

The silent refactor

I’ve been busy. Very busy. The year is coming to its end and I needed one more refactor. One big refactor. One that has been on my list for quite some time. And I called it the “silent” refactor.

As my other project moves forward, so do the needs for new things. Improved text rendering, shadows, better input methods, physics… Alas, most of the newest features had some kind of impact on the overall performance of the engine. And at some point the game was no longer playable at 60fps.

Enter the silent refactor: a series of low-level changes with a single purpose in mind, improving performance. While I tried to avoid changing the high-level APIs, some of them got broken along the way because of the updates. For example, I went back to using C++ smart pointers and the majority of the function signatures changed as a side effect. It wouldn’t have been fun otherwise 😉
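To give an idea of the breakage, here’s a hypothetical before/after of such a signature change (Node and attachNode are illustrative, not the engine’s actual API):

#include <memory>

class Node;

// before: raw pointers, ownership is implicit
void attachNode( Node *node );

// after: ownership is spelled out in the signature, which breaks
// every caller that used to pass a raw pointer
void attachNode( std::shared_ptr< Node > node );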

Anyway, the biggest change of all is probably multithreading support. Yes, I just said the “M” word. Simulations now run in two threads: one for window management, rendering and input stuff and another thread for the actual simulation pass, including updating components and physics.
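In spirit, the split looks something like this (a minimal sketch using std::thread; the loop bodies are placeholders):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic< bool > running( true );

void simulationLoop( void )
{
   while ( running ) {
      // update components, physics, etc.
      std::this_thread::sleep_for( std::chrono::milliseconds( 16 ) );
   }
}

int main( void )
{
   // the simulation gets its own thread...
   std::thread simulation( simulationLoop );

   // ...while this thread handles window management, rendering and input
   for ( int frame = 0; frame < 60; frame++ ) {
      std::this_thread::sleep_for( std::chrono::milliseconds( 16 ) );
   }

   running = false;
   simulation.join();
   return 0;
}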

The render pipeline went through some adjustments too. Now it’s material-centric, meaning the render queue organizes primitives based on the material they’re using, thus avoiding a lot of redundant state changes between draw calls. This is a huge performance gain if there are several objects sharing materials or primitives.
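A rough sketch of the idea (types and names are illustrative, not the actual render queue API):

#include <algorithm>
#include <vector>

struct Material { /* shaders, textures, etc. */ };
struct Renderable { Material *material; /* plus geometry */ };

void renderQueue( std::vector< Renderable > &queue )
{
   // group renderables by material so each material is bound only once
   std::sort( queue.begin(), queue.end(), []( const Renderable &a, const Renderable &b ) {
      return a.material < b.material;
   });

   Material *bound = nullptr;
   for ( auto &r : queue ) {
      if ( r.material != bound ) {
         // bind shaders, textures, etc. only when the material changes
         bound = r.material;
      }
      // issue the draw call for r's geometry
   }
}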

Maybe the most visible change is the support for Retina displays in OS X. There was an aliasing problem in older versions caused by a difference between the window size and the frame buffer size: on Retina displays, the latter is twice the requested window size. So, for example, a 1280×720 window needs a 2560×1440 frame buffer. The fix required me to upgrade GLFW to the latest version, which had some side effects in other subsystems, but nothing too worrisome.
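The key is to size the viewport from the framebuffer rather than the window. With GLFW 3.x, that looks something like this (window being the GLFWwindow handle created elsewhere):

#include <GLFW/glfw3.h>

void resizeViewport( GLFWwindow *window )
{
   // the framebuffer may be larger than the window on Retina displays,
   // so query its actual size instead of reusing the window dimensions
   int width = 0;
   int height = 0;
   glfwGetFramebufferSize( window, &width, &height );
   glViewport( 0, 0, width, height );
}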

The code is still a bit unstable as I need to iron out some concurrency issues, but the performance gain is noticeable. Even with shadows enabled, the simulation is back to 60 fps.

That’s all, folks. The last refactor of 2014.

Happy New Year!