Cycles X Project Update

The rendering improvements from the Cycles X project will be in the upcoming Blender 3.0 release. Since the announcement, developers have been working to complete and stabilize the code, as well as add new features and improve performance.

We’ll give a quick overview of recent developments.

GPU Performance

GPU rendering performance has been further improved. Here’s where we stand compared to 2.93.

Render time on an NVIDIA Quadro RTX A6000 with OptiX

This is an accumulation of many incremental changes. For details, see the GPU kernel documentation and GPU performance development tasks.

At the time of the initial announcement there was no volume rendering support. Since then we have restored volume rendering, and found that GPU rendering performance improved 3-5x in various volume scenes.


Hair and Shadow Improvements

While most benchmark scenes were rendering faster with Cycles X, a few involving many layers of transparent hair were showing performance regressions compared to 2.93.

One issue we found is that in GPU rendering, if only a small subset of the whole image is slow to render (like a character’s hair), GPU occupancy would be low. We improved this by making the algorithm that estimates the number of samples to render in one batch smarter. Previously we’d end up rendering 1 sample at a time. Now we detect low GPU occupancy and adaptively increase the number of samples to batch together, which in turn increases occupancy.
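The idea of growing the batch when occupancy is low can be sketched roughly as follows. This is only an illustration, not the Cycles scheduler: the function name, the doubling rule, the 50% occupancy target and the cap of 64 samples are all assumptions made for the example.

```python
def next_batch_samples(current_samples, active_paths, max_paths,
                       target_occupancy=0.5, max_samples=64):
    """Pick how many samples to batch together for the next GPU launch.

    Hypothetical heuristic: if fewer than `target_occupancy` of the
    available path slots are busy, double the batch size (up to a cap).
    """
    occupancy = active_paths / max_paths
    if occupancy < target_occupancy:
        # Low occupancy: batch more samples so more paths run at once.
        return min(current_samples * 2, max_samples)
    # Occupancy is fine: keep the current batch size.
    return current_samples
```

Starting from 1 sample per batch, repeated calls with low occupancy would step through 2, 4, 8 and so on until the GPU is busy enough or the cap is reached.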

Another part of the solution was to change the shadow kernel scheduling. Previously, continuing to the next bounce would have to wait for all light and shadows to be resolved at the previous bounce. Now this is decoupled, and shadow tracing work for many bounces can be accumulated in a queue. This gives a larger number of shadow rays to trace at once, improving GPU occupancy. This matters especially when only a small number of pixels are going 64 bounces deep into transparent hair, as in the Spring scene.
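To see why queuing shadow rays across bounces helps, here is a toy model that only counts how many shadow rays go into each launch. It assumes one shadow ray per active path per bounce and a fixed queue capacity; both are simplifications for illustration, and the function names are hypothetical rather than taken from the Cycles source.

```python
def shadow_batches_coupled(path_depths):
    """Old scheme: one shadow launch per bounce, before continuing."""
    batches = []
    depth = 0
    while True:
        # Paths still alive at this bounce each cast one shadow ray.
        active = sum(1 for d in path_depths if d > depth)
        if active == 0:
            break
        batches.append(active)
        depth += 1
    return batches


def shadow_batches_decoupled(path_depths, queue_capacity):
    """Decoupled scheme: accumulate shadow rays, launch when the queue fills."""
    batches = []
    queued = 0
    depth = 0
    while True:
        active = sum(1 for d in path_depths if d > depth)
        if active == 0:
            break
        queued += active
        while queued >= queue_capacity:
            batches.append(queue_capacity)
            queued -= queue_capacity
        depth += 1
    if queued:
        batches.append(queued)  # flush whatever is left at the end
    return batches
```

With 100 paths that stop after 2 bounces and 4 paths that go 64 bounces deep, the coupled model issues 64 launches, 62 of them with only 4 rays each. The decoupled model with a capacity of 128 traces the same 456 rays in 4 much fuller launches.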

Further, we found that transparency in hair is usually quite simple, either a fixed value or a simple gradient to fade out from the root to the tip. Instead of evaluating the shader for every shadow intersection, we now bake transparency at hair curve keys and simply interpolate them. Render results are identical in all scenes we tested. Below are two sample images to compare the results.
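As a rough sketch of the baking idea: evaluate the transparency once per curve key, then interpolate between the stored values at each shadow intersection. Linear interpolation, evenly spaced keys and the function names below are assumptions for the example; the post does not say which interpolation Cycles uses.

```python
def bake_transparency(key_positions, shader):
    """Evaluate the (possibly expensive) shader once per curve key."""
    return [shader(t) for t in key_positions]


def interpolate_transparency(baked, u):
    """Look up transparency at parameter u in [0, 1] along the curve.

    Assumes at least two evenly spaced keys and linear interpolation.
    """
    segments = len(baked) - 1
    x = min(max(u, 0.0), 1.0) * segments
    i = min(int(x), segments - 1)  # index of the key to the left
    f = x - i                      # fractional position between keys
    return baked[i] * (1.0 - f) + baked[i + 1] * f


# A simple root-to-tip fade, the kind of gradient described above.
keys = [0.0, 0.25, 0.5, 0.75, 1.0]
baked = bake_transparency(keys, lambda t: t)
```

For a linear fade like this one, the interpolated value matches the shader at every point along the curve, which is consistent with the identical render results reported above for simple transparency setups.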

For the statistics enthusiasts, here are some memory and timing results for a few well-known scenes, showing the effect of the transparent hair baking and the shadowing optimizations (ref is the reference without any optimizations).

Shadow Optimization Results

Distance Scrambling aka Micro-Jittering

Sobol & Progressive Multi-Jitter (PMJ) can now use distance scrambling (or micro-jittering) to improve GPU rendering performance by increasing the correlation between pixels. There is also an adaptive scrambling option to automatically choose a scrambling distance value. These are available in the advanced settings in the render properties.

To render the above images, the scrambling distance was set to zero to maximize the correlation between pixels. This should not be used in practice and was only done to make the correlation introduced by micro-jittering easier to see (notice the girl’s shoulder in the images above, to the right). In a real setting you would generally use a larger distance to hide these artifacts. This technique can result in less noisy images and, in some cases, improved performance in the range of 1% to 5%, depending on your rendering setup (it’s only beneficial for GPU rendering). Below are some performance results using the adaptive scrambling distance, which currently does not work so well for CUDA due to the tile sizes. Work is underway to choose better tile sizes for CUDA, which should result in better performance.
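Conceptually, the scrambling distance limits how far each pixel’s sample pattern is shifted away from a shared base pattern. The sketch below assumes a per-pixel random shift scaled by the distance and wrapped into [0, 1). It illustrates the principle only; the function name and the use of Python’s `random` module as the per-pixel hash are stand-ins, not the Cycles implementation.

```python
import random


def pixel_sample(base_sample, pixel_seed, scrambling_distance):
    """Shift a shared base sample by a per-pixel offset.

    The offset is a per-pixel random number in [0, 1) scaled by
    `scrambling_distance`, so a distance of 0 gives every pixel the
    same sample and a distance of 1 gives fully independent shifts.
    """
    shift = random.Random(pixel_seed).random() * scrambling_distance
    return (base_sample + shift) % 1.0
```

With a distance of 0, neighbouring pixels receive identical samples, which is the maximal correlation used for the demonstration images. Larger distances let the samples drift apart and hide the resulting artifacts.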



Ambient Occlusion

Ambient occlusion did not take transparency into account in the initial version of Cycles X. We have now restored this, taking advantage of the shadow kernel improvements that also helped with hair.

Additive ambient occlusion (AO) support is now available through the Fast GI settings: a new “Add” option adds the AO result, alongside the “Replace” operation that was already available. Below are a few images to compare the results.



Denoising Improvements

We improved denoising for volumes. Previously these were mostly excluded from the albedo and normal passes used by denoisers. While there is no exact equivalent to albedo and normals on surfaces, we make an estimate. This can significantly help the denoiser preserve volume detail.

The denoising depth pass has also been restored, which was previously removed along with NLM.


AMD HIP

We’ve worked with AMD to bring back AMD GPU rendering support. This is based on the HIP platform. In Blender 3.0, it is planned to be supported on Windows with RDNA and RDNA2 generation discrete graphics cards. This includes Radeon RX 5000 and RX 6000 series GPUs.

We are working with AMD to add support for Linux and to investigate earlier generation graphics cards for the Blender 3.1 release. While we would have liked to support more in 3.0, HIP for GPU production rendering is still very new.

However, we think it is the right choice going forward. It lets us share the same GPU rendering kernels and features with CUDA and OptiX, whereas previously the OpenCL implementation was always lagging behind and had more limitations and bugs.

To test the HIP release, you need to get the Blender 3.1 alpha and download the latest AMD drivers (see this blog post for more information).

Blender 2.93 vs Blender 3.0 on AMD

Apple Metal

We also recently announced a collaboration with Apple. They are contributing a Metal backend for Cycles, planned for Blender 3.1.


Future Work

Now that the new Cycles X architecture is in place, we expect that adding various new production features will be easier. This will start in 3.1 and continue through the 3.x series.

Download the latest Blender 3.1 Alpha builds to try out the new features.

Source: Blender
