D5 GI｜Striving for Offline Rendering Quality with Real-time Experience
To get a photorealistic rendering, the first solution most users think of is an offline renderer. By definition, "offline rendering" (or "precomputed rendering") is not concerned with timeliness: the user does not need to see the result instantly and may let the renderer compute for minutes or even hours from preset models, lights, and materials. An offline renderer's strategy prioritizes realism and accuracy over time cost, and therefore produces high-quality images.
But is it possible to render photorealistic frames in real time? That's a different story. "Real-time rendering" caps the time per frame: it has to output dozens of frames per second to look smooth, leaving only a few tens of milliseconds for each frame (at 30 frames per second, that's 33 ms per frame). The biggest challenge in producing realistic images under such a time constraint is the sheer amount of computation.
Let's do the math. For a 1920x1080 image rendered with a path-tracing algorithm that traces 3 bounces per path, that's 1920 × 1080 × 3 = 6,220,800 rays. Yet this is only one sample per pixel (1 spp), and the image will typically look like this, full of noise:
For the image to "converge" (meaning the rendering is finished and the noise is gone), it may take thousands of samples per pixel, and the image will look like this:
UE4 Path Tracer
Render time: 3′45″
Samples: 2048 spp
GPU: NVIDIA RTX 3060
If you zoom in, the noise is still there:
The image above cast a total of 12,740,198,400 rays (6,220,800 rays × 2048 spp) and took 3 minutes 45 seconds to render. With that many samples, no matter how good your optimization is, there is no way to bring the rendering time down to a "real-time" standard (tens of milliseconds per frame) on current hardware.
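The ray counts above are simple products, sketched below in Python. Like the text, the sketch counts one ray per bounce per path; a real path tracer usually also casts shadow rays toward lights, so actual counts would be higher.

```python
def rays_per_frame(width: int, height: int, bounces: int, spp: int = 1) -> int:
    """Rays for one frame, counting one ray per bounce per path (no shadow rays)."""
    return width * height * bounces * spp


def frame_budget_ms(fps: float) -> float:
    """Time available for one frame at a given frame rate, in milliseconds."""
    return 1000.0 / fps


print(f"{rays_per_frame(1920, 1080, bounces=3):,} rays at 1 spp")             # 6,220,800 rays at 1 spp
print(f"{rays_per_frame(1920, 1080, bounces=3, spp=2048):,} rays at 2048 spp")  # 12,740,198,400 rays at 2048 spp
print(f"{frame_budget_ms(30):.1f} ms per frame at 30 fps")                    # 33.3 ms per frame at 30 fps
```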
Can RTX Technology Make a Difference?
RTX technology has greatly accelerated ray casting, but according to industry data, it is still not fast enough to meet the requirement above.
Let's look at NVIDIA's official figures. In theory, the NVIDIA RTX 2080 Ti can cast 10 gigarays per second, but this figure was obtained in extremely simple scenes:
Looking closer at the chart, we notice the catch: only single models with no background, and only primary rays, were tested.
In practice, scenes are far more complex, and besides ray casting there is a lot of shading computation, so the effective throughput in real scenes falls well short of the 10 gigarays per second measured in the test. Long story short: RTX technology does not magically solve the problem of insufficient samples.
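Even taking the 10 gigarays per second figure at face value, the budget falls far short. The sketch below does the best-case arithmetic, assuming that peak rate held in a real scene and that shading cost nothing, neither of which is true in practice. It reuses the 3-bounce, 1080p setup from the example at the start of this article.

```python
PEAK_RAYS_PER_SEC = 10e9       # NVIDIA's quoted RTX 2080 Ti figure (simple scenes, primary rays only)
FPS = 30
PATH_RAYS = 1920 * 1080 * 3    # rays per sample at 1080p with 3 bounces per path


def best_case_rays_per_frame(peak_rays_per_sec: float, fps: float) -> float:
    """Rays per frame if the peak rate held in a real scene and shading were free."""
    return peak_rays_per_sec / fps


budget = best_case_rays_per_frame(PEAK_RAYS_PER_SEC, FPS)
affordable_spp = int(budget // PATH_RAYS)
seconds_for_2048_spp = PATH_RAYS * 2048 / PEAK_RAYS_PER_SEC

print(f"{budget:,.0f} rays per frame")                        # 333,333,333 rays per frame
print(f"{affordable_spp} spp per frame at best")              # 53 spp per frame at best
print(f"{seconds_for_2048_spp:.2f} s for one 2048 spp frame")  # 1.27 s for one 2048 spp frame
```

So even under those idealized assumptions, a 1080p frame at 30 fps could afford roughly 53 spp, against the 2048 spp used for the reference image above.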
With so many samples to compute, brute-forcing the full workload is not practical. In fact, nobody does that: all renderers (yes, all of them) use some tricks, such as limiting the number of bounces, clamping highlights, or adaptive sampling, to speed up rendering. This is true of offline renderers, and even more so of real-time rendering. The question, then, is how to avoid the full computation and use smart tricks to render realistic-looking images in real time.
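Clamping highlights, one of the tricks listed above, fits in a few lines. This is a generic illustration rather than any particular renderer's code; the sample values and the threshold of 2.0 are made up. Clamping removes the variance caused by a rare, very bright sample, at the cost of bias: the clamped estimate is systematically darker than the true mean.

```python
import statistics


def clamp_sample(value: float, max_value: float) -> float:
    """Clamp one radiance sample so a rare, very bright path cannot dominate the pixel."""
    return min(value, max_value)


def pixel_estimate(samples, max_value=None):
    """Monte Carlo pixel estimate: the mean of its samples, optionally clamped first."""
    if max_value is not None:
        samples = [clamp_sample(s, max_value) for s in samples]
    return statistics.fmean(samples)


# Made-up samples: 99 ordinary paths plus one "firefly" that hit a bright highlight.
samples = [0.5] * 99 + [100.0]
print(pixel_estimate(samples))                 # 1.495 (unbiased, but dominated by one sample)
print(pixel_estimate(samples, max_value=2.0))  # 0.515 (darker, but no firefly)
```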
This is where denoising comes in. It has improved rapidly in recent years, to the point of at least showing that real-time photorealistic rendering is possible. Many offline renderers already use denoisers to speed up their process, with very good results: removing noise, which used to take hours of extra rendering, can now be done within minutes, and the output images are impressive. Still, this doesn't meet the speed requirement of real-time rendering.
In real-time rendering, the denoiser may have only a few samples per pixel to work with each frame, and denoising may be allotted only about 1 millisecond. The denoised result therefore cannot be perfect, but it has to look as convincing as possible to the human eye.
Both NVIDIA's real-time denoising technology and Intel Open Image Denoise can produce clean, noise-free results under real-time conditions, displaying smooth images instantly from only a small number of input samples:
Scene: Crytek Sponza, with Intel Open Image Denoise. Rendered at 16 spp and denoised using noisy albedo and normal buffers. Image source: https://www.openimagedenoise.org/gallery.html
D5 GI Details
In short, rendering involves two tasks: sampling and denoising. With this premise in mind, the goals of the D5 development team become clear:
- How to get higher-quality samples, especially diffuse GI samples, in a short time before denoising.
- How to choose the most appropriate denoising technique for each application scenario (live preview, still-frame rendering, video frame sequence rendering).
Compared with the light and shadow produced by direct lighting, indirect lighting details (diffuse GI) often go unnoticed by viewers, yet they play a crucial role in whether an image looks realistic and believable. Poor GI leads to "fake"-looking images with "floating" objects and an overall sense of unreality.
Comparison of high quality GI and low quality GI
In applications that require real-time graphics, such as games, the traditional GI solution relies purely on rasterization, and the results are not ideal. Static scenes usually require baking lightmaps beforehand; dynamic scenes use light-probe-based GI, VXGI, precomputed GI, and similar techniques. These methods can suffer from light leaking, or require manual work (such as tedious UV unwrapping or hand-placing light probes) or precomputation. In short, getting a usable real-time scene takes a lot of time and effort.
The D5 team wanted users to import and render both dynamic and static scenes directly, with little manual processing. We therefore took an approach different from real-time game rendering: combining the strengths of multiple rendering techniques and optimizing for different scenarios, such as direct lighting, multiple light sources, and large-scene lighting, to boost performance, raise the frame rate, and eliminate light leaking and noise.
Hybrid GI Solution
To obtain samples efficiently, D5 GI uses a hybrid strategy that combines ray tracing and light-probe-based GI to strike a balance between accuracy and efficiency.
Light probes are a common GI sampling method in real-time rendering: they run efficiently and can handle GI in dynamic scenes. When the scene loads, a large number of light probes are distributed evenly throughout it. As real-time rendering runs, each probe dynamically updates the lighting information it has collected, with an update frequency that depends on its importance: probes with little impact on the scene are updated less often to save computing resources.
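The text says only that probes update at a frequency based on their importance; it does not say how importance is measured or how updates are scheduled. The sketch below shows one simple way such a scheduler could work, under that assumption: a fixed number of probe updates per frame goes to the probes with the highest importance multiplied by frames since their last update. `Probe`, `importance`, and `schedule_updates` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    probe_id: int
    importance: float            # hypothetical score, e.g. how much the probe affects visible pixels
    frames_since_update: int = 0


def schedule_updates(probes, budget):
    """Choose which probes to relight this frame, at most `budget` of them.

    Priority = importance * frames since last update, so low-impact probes are
    refreshed less often but are never starved indefinitely.
    """
    for p in probes:
        p.frames_since_update += 1
    ranked = sorted(probes, key=lambda p: p.importance * p.frames_since_update, reverse=True)
    chosen = ranked[:budget]
    for p in chosen:
        p.frames_since_update = 0
    return [p.probe_id for p in chosen]
```

With a budget of one update per frame, a probe of importance 1.0 and one of importance 0.15 share the budget unevenly: over 20 frames the first is refreshed 18 times and the second twice.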
In a pure light-probe GI solution, shading a point on a model only requires finding a few neighboring light probes and interpolating their values. Because of the interpolation, the resulting GI changes smoothly at low frequencies and is noise-free, which viewers find visually acceptable. On the other hand, this approach lacks detail and produces "light leaking" and "shadow leaking" at the corners of models, because the interpolation can take in probe values that are too bright or too dark for the point being shaded:
In the image above, light leaks onto the ceiling because some values are taken from a probe outdoors, and the floor is too dark because a value is taken from a probe underground.
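The leak can be reproduced with a minimal interpolation sketch. It assumes probes on a regular grid blended with trilinear weights, which is a common textbook setup rather than a description of how D5 places or weights its probes; the toy room (indoor probes at x = 0, outdoor probes at x = 1, a wall at x = 0.9) is made up to mirror the ceiling case above.

```python
import math


def trilinear_probe_lookup(probe_grid, spacing, point):
    """Interpolate lighting at `point` from the 8 probes of its grid cell.

    probe_grid[i][j][k] is the value stored at the probe located at
    (i * spacing, j * spacing, k * spacing). `point` must lie strictly inside the grid.
    """
    gx, gy, gz = (c / spacing for c in point)
    i, j, k = math.floor(gx), math.floor(gy), math.floor(gz)
    fx, fy, fz = gx - i, gy - j, gz - k
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1 - fx)
                          * (fy if dj else 1 - fy)
                          * (fz if dk else 1 - fz))
                value += weight * probe_grid[i + di][j + dj][k + dk]
    return value


# Toy room: probes at x = 0 are indoors (dim, 1.0), probes at x = 1 are outdoors
# (bright, 10.0), and a wall stands at x = 0.9. The point below is indoors.
grid = [[[1.0, 1.0], [1.0, 1.0]],
        [[10.0, 10.0], [10.0, 10.0]]]
print(trilinear_probe_lookup(grid, spacing=1.0, point=(0.8, 0.5, 0.5)))  # ~8.2: outdoor light leaks in
```

The weights depend only on distance, so nothing tells the lookup that the wall separates the indoor point from the outdoor probes; that is exactly how the indoor point picks up outdoor light.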
To add detail and increase GI accuracy, D5 Render traces rays for the first few GI bounces and only then falls back to the information stored in light probes. Because these brute-force bounces are accurate, D5 effectively solves the light-leaking problem of the pure light-probe solution:
This hybrid strategy keeps sampling fast while preserving GI detail:
The GIF compares GI quality on a video frame sequence: path-tracing sampling (commonly used as the reference standard) on the left, and D5 hybrid GI sampling on the right. Both use the same sampling time, with no denoising applied. Given the same sampling time, D5 GI achieves a more realistic and convincing result.
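The control flow of this hybrid approach can be sketched as a path loop that traces a fixed number of bounces and then reads the probe cache for everything beyond. This is an illustration under simplifying assumptions (scalar radiance, diffuse surfaces, and a hypothetical `trace_bounce` callback standing in for the ray tracer); the source does not describe D5's implementation at this level.

```python
from typing import Callable, Optional, Tuple

# (emitted radiance, diffuse albedo, probe-cache lighting at the hit point)
Hit = Tuple[float, float, float]


def hybrid_gi(trace_bounce: Callable[[int], Optional[Hit]], brute_force_bounces: int) -> float:
    """Indirect lighting along one path: trace the first bounces, then read probes.

    `trace_bounce(n)` is a hypothetical stand-in for a ray tracer: it returns
    the surface hit by the n-th bounce ray, or None if the ray escaped.
    """
    if brute_force_bounces < 1:
        raise ValueError("need at least one traced bounce")
    radiance, throughput = 0.0, 1.0
    for bounce in range(brute_force_bounces):
        hit = trace_bounce(bounce)
        if hit is None:                 # ray left the scene; nothing more to gather
            return radiance
        emitted, albedo, probe_lighting = hit
        radiance += throughput * emitted
        throughput *= albedo
    # Terminate the path: the light arriving at the last traced hit comes from
    # the (cheap, smooth) probe cache instead of further traced bounces.
    return radiance + throughput * probe_lighting
```

Raising `brute_force_bounces` moves more of the light transport out of the probe cache and into traced, leak-free bounces, which corresponds to the change described for version 2.1 below.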
Number of Bounces Increased in Version 2.1
In the latest version, 2.1, D5 Render further increases the number of brute-force light bounces in real-time preview and render output, which improves GI accuracy and further enhances image realism.
Real-time preview, direct light only
Real-time preview, direct light + 1 bounce (no denoising)
Real-time preview, direct light + 2 bounces (no denoising)
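Why each extra bounce matters can be seen in an idealized closed room where every surface reflects the same fraction (the albedo) of the light it receives: each bounce carries albedo times the light of the one before, and the converged result is the sum of that geometric series. The sketch below computes how much of the converged lighting is captured after a given number of bounces. The uniform-albedo room is a textbook simplification, not a model of D5's scenes.

```python
def captured_fraction(albedo: float, bounces: int) -> float:
    """Share of the converged (infinite-bounce) lighting captured by direct light
    plus `bounces` indirect bounces, in an idealized closed room where every
    surface has the same albedo (0 <= albedo < 1).

    Each bounce returns `albedo` times the light of the previous one, so the
    converged total is a geometric series summing to 1 / (1 - albedo).
    """
    truncated = sum(albedo ** k for k in range(bounces + 1))
    return truncated * (1.0 - albedo)


for n in range(3):
    print(n, captured_fraction(0.5, n))  # 0 0.5, then 1 0.75, then 2 0.875
```

With albedo 0.5, direct light alone captures half of the converged result, one bounce 75%, and two bounces 87.5%. Brighter surfaces converge more slowly (albedo 0.8 reaches only 48.8% after two bounces), so bright interiors gain the most from extra bounces.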
Vegetation GI Optimization
Because the total number of light probes in a scene is fixed, in larger scenes the probes are spread more thinly and GI accuracy suffers. The problem is most visible on vegetation in large scenes, where plant shadows can look too bright and plants appear to "float".
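The effect of a fixed probe budget is plain arithmetic: if N probes evenly fill a volume V, their spacing is the cube root of V/N, so scaling a scene up tenfold in every dimension makes the probe grid ten times coarser. The budget of 32,768 probes below is a made-up number for illustration; the source does not state D5's probe count.

```python
def probe_spacing(size_x: float, size_y: float, size_z: float, probe_count: int) -> float:
    """Average distance between probes when `probe_count` probes evenly fill a box."""
    return (size_x * size_y * size_z / probe_count) ** (1.0 / 3.0)


PROBES = 32 ** 3  # hypothetical fixed probe budget (32,768 probes)
print(round(probe_spacing(64, 64, 64, PROBES), 2))     # 2.0  (64 m scene: probes 2 m apart)
print(round(probe_spacing(640, 640, 640, PROBES), 2))  # 20.0 (640 m scene: probes 20 m apart)
```

At a 20 m spacing, a single probe cell is far larger than a shrub, so a plant's shadowed base and its sunlit surroundings can end up interpolating from the same probes.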
To address this issue, D5 specifically adjusts the GI sampling of plants to eliminate the sense of floating, making the shadows more realistic and the plants more grounded.
Version 2.1 GI Sampling Algorithm Optimization
The D5 R&D team felt that the real-time rendering experience in the previous version was still not good enough: some complex scenes still had a lot of noise, which showed up as jittering spots on screen after denoising. So we developed more efficient algorithms to further improve real-time preview performance.
In the previous version of D5, the number of rays cast was limited, so the first bounce still had high sampling variance. In certain areas only a small amount of GI was collected, which produced blotchy results after denoising. Below is a challenging scenario for GI: a room with only one open door, lit entirely by light bouncing in through the doorway. Here is the real-time GI from a previous version of D5:
The new D5 GI leverages ReSTIR GI, which reuses samples from previous frames and neighboring locations. This solves the problem of too little GI being collected in certain areas and reduces sampling noise, especially on flat surfaces. With this optimization, GI sampling efficiency is about four times that of previous versions, which means better live previews, higher FPS, and higher GI quality for the same rendering time.
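At the heart of ReSTIR-style reuse is the reservoir: each pixel streams through candidate samples, keeps one with probability proportional to its resampling weight, and records enough bookkeeping (`w_sum`, `m`) that a reservoir from the previous frame or from a neighboring pixel can be folded in at the cost of a single update. The sketch follows the generic weighted-reservoir formulation from the ReSTIR literature, not D5's code, and it leaves out what a production GI pass adds on top, such as visibility checks and the Jacobian correction needed when a neighbor's sample is reused at a different surface point. `p_hat` (the target function) is left abstract.

```python
import random
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Reservoir:
    sample: Optional[Any] = None  # candidate currently kept (for GI: a secondary hit point and its radiance)
    target_pdf: float = 0.0       # p_hat(sample), e.g. luminance of the sample's contribution
    w_sum: float = 0.0            # sum of resampling weights of all candidates streamed in
    m: int = 0                    # number of candidates this reservoir represents

    def update(self, sample: Any, weight: float, target_pdf: float, rng: random.Random) -> None:
        """Weighted reservoir sampling: keep `sample` with probability weight / w_sum."""
        self.w_sum += weight
        self.m += 1
        if weight > 0 and rng.random() < weight / self.w_sum:
            self.sample, self.target_pdf = sample, target_pdf

    def contribution_weight(self) -> float:
        """W = w_sum / (m * p_hat(sample)); the pixel is shaded with f(sample) * W."""
        if self.m == 0 or self.target_pdf == 0.0:
            return 0.0
        return self.w_sum / (self.m * self.target_pdf)

    def merge(self, other: "Reservoir", target_pdf_here: float, rng: random.Random) -> None:
        """Reuse another reservoir (previous frame or neighboring pixel).

        `target_pdf_here` is p_hat of other.sample evaluated at this pixel.
        """
        m_before = self.m
        weight = target_pdf_here * other.contribution_weight() * other.m
        self.update(other.sample, weight, target_pdf_here, rng)
        self.m = m_before + other.m
```

Per frame, a pixel would stream in its fresh candidate (with weight p_hat divided by the pdf it was sampled with), merge its reservoir from the previous frame, merge a few neighbors, and then shade with `contribution_weight()`. Because each merge folds many candidates into one update, a pixel draws on far more samples than it traced itself, which is the mechanism behind the efficiency gain described above.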
D5 version 2.1 GI algorithm improvement:
Both images above use 1 sample per pixel.
D5 Render uses different denoising techniques to achieve the best results in three situations: real-time preview, still-frame rendering, and video rendering.
In High Quality real-time preview, D5 uses a Screen Space Denoising (SSD) algorithm. Denoising each 1920x1080 frame costs about 1-2 ms, which keeps the real-time preview frame rate smooth and gives users a WYSIWYG interactive experience while building scenes. This is the core value of D5 Render.
Real-time preview of Screen Space Denoising
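The source does not describe D5's SSD algorithm. As a generic example of screen-space denoising, the sketch below is a joint (cross) bilateral filter: it denoises one pixel from its screen-space neighbors, using the normal and depth buffers as guides so the blur stays within a surface. The parameter values are arbitrary, and a real-time implementation would run on the GPU and typically also accumulate samples across frames; the pure-Python loops are for readability only.

```python
import math


def joint_bilateral(color, normal, depth, x, y, radius=2,
                    sigma_spatial=1.5, sigma_depth=0.1, normal_power=32.0):
    """Edge-aware denoise of pixel (x, y).

    Neighbors are averaged with weights that fall off with screen distance
    and drop sharply when their normal or depth differs from the center pixel,
    so noise is smoothed within a surface but not blurred across edges.
    color, depth: 2D lists [row][col] of floats; normal: 2D list of unit 3-tuples.
    """
    height, width = len(color), len(color[0])
    n0, d0 = normal[y][x], depth[y][x]
    total = weight_sum = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if not (0 <= yy < height and 0 <= xx < width):
                continue
            n = normal[yy][xx]
            w_spatial = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_spatial ** 2))
            w_normal = max(0.0, n0[0] * n[0] + n0[1] * n[1] + n0[2] * n[2]) ** normal_power
            w_depth = math.exp(-abs(depth[yy][xx] - d0) / sigma_depth)
            weight = w_spatial * w_normal * w_depth
            total += weight * color[yy][xx]
            weight_sum += weight
    return total / weight_sum  # the center pixel always has weight 1, so this is never zero
```

Because the weights in this simple form only look at normals and depth, detail that lives purely in the color signal, such as a fine texture on a flat wall, gets averaged away, which illustrates how a screen-space filter can lose high-frequency detail.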
For still-frame images, D5 uses Intel Open Image Denoise. Before denoising, hundreds of skylight samples and dozens of GI samples are collected for each pixel. Combined with the D5 GI strategy described above, this gives the AI denoiser enough high-quality sample information to output smooth results "instantaneously":
Before vs. after, Intel Open Image Denoise
For an even better result, Open Image Denoise also accepts albedo and normal images as additional inputs, which keeps edges and textures in the output sharp. D5 uses this process to denoise still-frame images and preserve as much detail as possible.
Above: with albedo and normal information, the AI-denoised result preserves more detail. The corners and wood textures are much clearer.
Video Frame Sequence Rendering
Open Image Denoise processes each frame independently, so its results lack coherence between frames, which shows up as jitter and flicker in the denoised video.
The Screen Space Denoiser (SSD), on the other hand, may lose some of the image's high-frequency detail. To solve this, we introduced a World Space Denoiser (WSD), which changes the way the denoiser gathers samples:
The red circle on the left shows the sampling area of SSD, and the red circle on the right shows the sampling area of WSD.
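One simple way to give a filter a footprint in world space rather than screen space is to group samples by their world position, sketched below. This is an assumption chosen for illustration: the source does not describe how D5's WSD gathers samples or adapts its strength, and the grid binning, the function names, and the `cell_size` parameter are hypothetical.

```python
import math
from collections import defaultdict


def cell_of(position, cell_size):
    """Integer world-space grid cell containing `position` (x, y, z)."""
    return tuple(math.floor(c / cell_size) for c in position)


def world_space_denoise(samples, cell_size):
    """Replace each sample's value by the mean of all samples in its world-space cell.

    samples: list of ((x, y, z), value) pairs, e.g. one per pixel with the
    pixel's world position. The footprint is fixed in the scene rather than on
    screen, so it does not slide around as the camera moves; `cell_size`
    sets the denoising strength.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    cells = [cell_of(p, cell_size) for p, _ in samples]
    for cell, (_, value) in zip(cells, samples):
        sums[cell] += value
        counts[cell] += 1
    return [sums[c] / counts[c] for c in cells]
```

A practical version would likely vary the cell size with the scene's spatial information, such as distance to the camera, and keep surfaces with different normals in separate cells.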
By adjusting the denoising strength according to the scene's spatial information, WSD preserves more high-frequency detail:
D5 GI Summary
This article grew out of several in-depth conversations with the D5 R&D team, in which I learned a lot of behind-the-scenes information and stories. The D5 rendering team thinks flexibly and constantly tries out new ideas and technologies from the industry. D5 GI is not a finished, static GI solution; it is a rapidly iterating option that keeps gaining new algorithms and optimizations. To conclude, here are a few words from a D5 developer:
"Of course, the current algorithm is far from optimal, and we will continue to improve D5 GI. As a next step, we are going to invest most of our resources in Ray Guiding and Caching, to achieve better results and performance. After that, we may consider giving Neural Radiance Caching a try."
— The D5 Rendering Team
So, as scenes grow more complex and more costly to render, how to maintain a real-time interactive experience while ensuring image quality becomes a million-dollar question. Another challenge for the D5 team is how to guarantee "Large Scene Capability and Real-time Interactivity".