Surfel Maintenance for Global Illumination
2025-01-22 • 2943 words • 15 min • Max <mxcop>Introduction
Since you've found this blog post, it's likely you already know what Surfels are.
Regardless, I will start with a brief explanation of what they are, and what we can use them for.
Surfels can be used to divide up the surface of geometry into discrete patches.
This is very useful for caching lighting information for example in the case of Global Illumination.
We can see this division of the surface of geometry below here, where each Surfel patch is given a random color.
The name Surfel comes from combining the words Surface & Element.
Surfels are commonly described using 3 parameters:
- Position (Position on a surface)
- Radius (How much area the Surfel represents on the surface)
- Normal (Normal of the surface)
In Figure A, we can see a visual of these 3 parameters.
You may be wondering now what are these Surfels useful for?
At it's core Surfels are a method for dividing up scene geometry into discrete patches, as I mentioned above.
So, it is not limited to one use case only, however it is commonly used for caching lighting information for Global Illumination.
I highly recommend you to check out EA SEED's GIBS (Global Illumination based on Surfels)
Their talk at SIGGRAPH 21 has been my primary source for information on Surfels.
A big advantage of Surfels as a light information cache, is that Surfels persist between frames.
So, we can simply accumulate information within them between frames, without a need for reprojection.
Now that we know what a Surfel is made out of, and what we can use it for.
Let's dive into how I dynamically managed Surfels for my Global Illumination solution specifically.
There will be many small/big differences between my Surfel maintenance and that of GIBS.
The Pipeline
Let's start with a high level overview of the maintenance pipeline for a single frame.
As we can see in Figure D, the Surfel maintenance consists of 4 stages:
- Spawn — Uses the GBuffers to decide where on screen to spawn new Surfels.
- Transform — Updates the Surfel world-space positions based on the object they're attached to.
- Accelerate — Inserts Surfels into a spatial acceleration structure to accelerate lookups.
- Recycle — Decides whether Surfels are still relevant.
I marked the Transform pass with a * because it is technically optional.
Due to time constraints, I didn't get to implementing a Transform pass myself.
However, I will still try to explain how one would implement it.
The pipeline also needs a place to store all the Surfel data of course.
We allocate these buffers upfront, with a fixed maximum number of live Surfels in the scene at once.
I will go into more detail about the buffers in the sections below.
Now that we have an idea of how the pipeline fits together, let's dive into the details of each pass.
Surfel Acceleration
In many cases we want to find Surfels around a given position.
For example, when spawning Surfels we want to know if there are already Surfels nearby.
Or when recycling we want to know if there are too many Surfels nearby.
Spatial Acceleration Structures
In order to do this quickly, we need a spatial acceleration structure.
I'm going to cover a few options which I personally investigated:
- Uniform Grid (The most basic structure)
- Trapezoidal Grid (Used by GIBS)
- Multi-level Hash Grid (What I ended up using)
I initially began by using a 3D uniform grid, for me that was a good starting point.
However, it scales very poorly when you want larger scenes, and when your Surfels are relatively small.
GIBS combines a small uniform grid centered on the camera with trapezoidal grids extending outwards.
This gets you very fast lookups, as the trapezoidal grids are simply uniform grids with a non-linear transform.
However, I instead decided on using a multi-level hash grid, so I never implemented this scheme.
The multi-level hash grid is relatively simple, it is a normal hash grid storing a uint
for each cell.
Cell hashes are created by combining a grid position & grid level, I used 9 bits for each axis and 5 bits for the level.
/* Hash a Surfel grid location. (XYZ9L5) */
uint surfel_hash_function(const uvec3 loc, const uint level) {
return xxhash32(((loc.x % 512u) << (9u * 0u))
| ((loc.y % 512u) << (9u * 1u))
| ((loc.z % 512u) << (9u * 2u))
| ((level & 31u) << (9u * 3u)));
}
The grid level is determined using a logarithm of the distance squared to the camera.
For each live Surfel the acceleration pass will insert the Surfel into all cells it overlaps.
It's also very important here to be aware of the grid level transitions, on which you need to insert the Surfel into both levels.
Filling the Structure
I mentioned how each grid cell only stores a uint
, this is key to making the structure memory efficient.
Surfels are represented by a unique ID, however we do not want to store those directly in the grid cells.
Instead we want each grid cell to store an index into a list of Surfel IDs.
This unique ID we use to refer a Surfel is also the index inside the buffers where it's data is located.
To achieve this, we will need multiple passes for filling the acceleration structure.
This applies to all the acceleration structure variants I described above.
As we can see in Figure H, the idea is that after the passes, the grid buffer points to a range of elements in the Surfel list buffer.
Because a Surfel can be in multiple cells, the Surfel list can contain duplicate IDs.
To achieve this, I used the following 3 passes:
- Surfel counting (for each Surfel, increment the
uint
inside each cell it overlaps) - Prefix sum (perform a prefix sum over the entire grid buffer)
- Surfel insertion (for each Surfel, decrement the
uint
inside each cell it overlaps and write the Surfel ID into the Surfel list)
When looping over the Surfels, we always loop over the entire Surfel buffer.
If the radius of a Surfel is 0 we know the Surfel is not live, so we can return early.
/* <===> Pass 1 <===> */
for (surfels) for (overlapping cells) {
/* Increment the atomic counter in the hash grid cell */
atomicAdd(surfel_grid[surfel_hash_function(...) % grid_capacity], 1);
}
/* <===> Pass 2 <===> */
/* Perform a inclusive prefix sum on the grid buffer */
/* <===> Pass 3 <===> */
for (surfels) for (overlapping cells) {
/* Decrement the atomic counter in the hash grid cell */
const uint offset = atomicAdd(surfel_grid[surfel_hash_function(...) % grid_capacity], -1) - 1u;
surfel_list[offset] = surfel_ptr; /* <- Insert the Surfel ID into the Surfel list */
}
It's very important here to be aware of the grid level transitions while counting & inserting.
On the transitions you have to insert Surfels into both grid levels.
After this you will now be able to quickly find nearby Surfels with no limit of the bounds of your scene.
However, it is important to note, hash grids experience hash collisions.
So we have to be aware that you might sometimes get Surfels which are actually very far away.
Surfel Spawning
When spawning Surfels we spawn them from the GBuffer, depth & normals.
The tricky part is avoiding spawning Surfels too close to each other.
The way we combat that is by splitting the screen into 16x16 pixel tiles.
Each tile will find the pixel within itself which has the least Surfel coverage.
If that coverage is below a certain threshold, we will spawn a new Surfel on that pixel.
You may want to choose a different tile size, depending on the size of your Surfels.
Coverage Testing
We can check the coverage for a pixel by checking the acceleration structure of the previous frame at the position of the pixel:
/* Location in world-space visible through this pixel */
const vec3 pixel_pos = camera_pos + pixel_dir * pixel_depth;
/* Grab the start & end indices for this pixel's hash cell */
const uint hashkey = surfel_pos_hash(pixel_pos, camera_pos) % grid_capacity;
const uint start = surfel_grid[hashkey], end = surfel_grid[hashkey + 1u];
float coverage = 1e30;
for (uint i = start; i < end; ++i) {
/* Grab the Surfel ID from the List */
const uint surfel_ptr = surfel_list[i];
/* Use the Surfel ID to fetch it's position & radius */
const vec3 p = surfel_pos[surfel_ptr];
const float r = surfel_radius[surfel_ptr];
/* Find the highest coverage */
coverage = max(coverage, point_coverage(pixel_pos, p, r));
}
surfel_pos_hash(...)
finds the grid position, level, and feeds those intosurfel_hash_function(...)
.
Now, in order to efficiently communicate across the 16x16 tile of pixels on the GPU we can use group shared memory.
The idea is simple, we have a single uint
as group shared memory, groups are 16x16 lanes in size.
shared uint gs_candidate;
Each lane corresponds to a pixel in a 16x16 tile, and will combine its coverage & local index into a uint
:
/* Compound the pixel coverage & pixel index within its 16x16 tile into a uint */
const uint score = min(65535u, (uint)(coverage * 1000.0));
const uint candidate = ((score << 16u) & 0xffff0000) | (local_idx & 0x0000ffff);
The reason we add the local index is make each pixel have a unique uint
.
Because, now we're going to perform an atomic minimum on that group shared uint
.
gs_candidate = 0xFFFFFFFF; /* Reset the groupshared candidate */
barrier();
/* Find the pixel with the lowest coverage in this group */
atomicMin(gs_candidate, candidate);
barrier();
Now, after this memory barrier we can check if our pixel was the one with the least coverage:
if (gs_candidate == candidate) {
/* Spawn Surfel if the coverage is below the threshold... */
}
Spawning Surfels
We now know which pixels want to spawn a Surfel without causing much overlap.
So, let's have a look at how exactly we spawn Surfels.
This is where the Surfel Stack buffer comes into play.
First of all, as we can see in Figure I, the Surfel Stack has to start out filled with all unique Surfel IDs.
The order in which they are placed doesn't matter, as long as they're all unique.
We can see the spawning sequence in Figure I, we simply increment the Surfel Stack pointer.
And use the unique ID in the stack as the Surfel ID which we can write data into:
/* Pop the Surfel Stack (atomic) */
const uint stack_ptr = atomicAdd(surfel_stack_pointer, 1);
const uint surfel_ptr = surfel_stack[stack_ptr];
/* Write our new Surfel data into the Surfel buffers */
surfel_pos[surfel_ptr] = ...;
surfel_radius[surfel_ptr] = ...;
For simplicity I left out the bounds check here, but you should make sure the stack isn't full!
Spawning Optimization
Because I am managing multiple sets of Surfels, spawning can get pretty expensive.
An optimization I came up with is to only check 1/4 of the pixels for coverage each frame.
Cycling through a 2x2 tile of pixels every 4 frames.
On my AMD Radeon 890M iGPU this resulted in the total time spend spawning going from:
~6ms
-> ~1ms
, which is a rather huge improvement, without losing any significant quality.
Surfel Recycling
If all we do is spawn new Surfels, we will quickly run out of our Surfel budget.
Lucky for us, most Surfels can be recycled after a while, when they are no longer relevant.
Recycling Heuristic
The way we decide whether to recycle a Surfel is based on a heuristic.
This heuristic is usually a combination of a few different parameters:
- Time since used (Last time when the Surfel was sampled)
- Surfel coverage (How many Surfels are nearby)
- Live Surfel count (How many Surfels are currently in use)
To find the Surfel coverage we can use the same method we used during spawning to find the coverage.
The time since used can be stored on the Surfel, to always be incremented during the recycling pass.
We can reset that time every time we sample the Surfel's lighting information.
Once we have the heuristic, we can use it as a random chance for recycling.
Or we can use it as a threshold for deterministic recycling.
Despawning Surfels
If we decide to recycle a Surfel we'll have to push it back onto the Surfel Stack.
All Surfel IDs to the right of the Surfel Stack pointer will always be unique IDs to unused Surfel Data.
To maintain that while recycling, we can decrement the Surfel Stack pointer, and write out Surfel ID into the Stack slot that just opened up.
We can see this sequence in Figure J where we recycle Surfel ID 8
by pushing it back onto the Surfel Stack.
/* First set the recycled Surfel's radius to 0.0, marking it as unused */
surfel_radius[surfel_ptr] = 0.0;
/* Then decrement the stack pointer & write the Surfel ID into the open slot */
const uint slot = atomicAdd(surfel_stack_pointer, -1) - 1u;
surfel_stack[slot] = surfel_ptr;
As mentioned ealier, the radius is also used when looping over all Surfels to check if the Surfel is live.
This also allows us to early out during the recycling pass for example:
/* Compute kernel executed for each Surfel in Surfel Data */
void main(uint thread_id) {
const uint surfel_ptr = thread_id;
const float radius = surfel_radius[surfel_ptr];
if (radius == 0.0) return; /* Early out if this Surfel is not live */
/* ... */
}
Surfel Transformation
During the spawning sequence, when we place Surfels we're currently doing so in world-space.
Which means that Surfels won't move with the geometry that they're supposed to be attached to.
Transform Buffer
Ideally we want to instead attach Surfels to the model matrix (transform) of objects in the scene paired with a local position.
This is what GIBS does, they have a buffer filled with the transforms of all objects in the scene.
When spawning a new Surfel, they assign that Surfel the transform ID of the object they are spawning on.
Which points to the model matrix of that object in the global transform buffer.
To know what transform a Surfel should be attached to when spawning, GIBS has a transform ID Gbuffer.
We can see a debug view of this in Figure K, each color represents a different transform ID.
Unfortunately I cannot go into more detail on this part of the pipeline, because I personally skipped it due to time constraints.
Performance
We're almost at the end of the blog post now, so let's look at some performance measurements.
Keep in mind that these timings are for 6 individual Cascades of Surfels, so you can expect timings normally to be better.
Each Surfel Cascade has
1/4
the Surfel count of the previous one, with the first having262.144
Surfels at most.
First, here's the performance on my AMD Radeon 890M integrated GPU, we can see that the hash insertion (Surfel Insertion) is the most expensive part.
This is because it requires us to wait for an atomic operation to complete, because we need to know which Surfel List index to insert the Surfel into.
Besides that, I'd argue the performance is quite good, especially because this is maintaining 6 sets of Surfels.
Now let's look at the performance on my NVIDIA RTX 4070 Mobile GPU.
We can see here that the hash insertion has gone down quite a bit, and now the recycling pass is actually the most expensive pass.
However, again I'd argue the overhead of maintenance is relatively small here.
Conclusion
To round this off, we've looked at how we can maintain a large number of Surfels in an efficient manner.
We took a high level overview of the entire pipeline, and then went into the details of each maintenance pass.
And in the end we briefly looked at performance on 2 modern GPUs.
It took me quite some research and trail & error to figure out some of the details.
So, I hope this blog post sheds some more light on the details of how to maintain Surfels.
Resources
Here's a few resources which helped me on my Surfel journey :)
- Hybrid Rendering for Real-Time Ray Tracing Ray Tracing Gems 2019.
- SIGGRAPH 2021, GIBS https://youtu.be/h1ocYFrtsM4.
- Surfel GI implementation in
kajiya
by Tomasz https://github.com/h3r2tic/kajiya