The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX

Name: The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX
Item: The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX
Author: Nate Oh

by Nate Oh on September 14, 2018 12:30 PM EST

111 Comments | Add A Comment

111 Comments

Bounding Volume Hierarchy - How Computers Test the World

Perhaps the biggest aspect of NVIDIA’s gamble on ray tracing is that traditional GPUs just aren’t very good at the task. They’re fast at rasterization and they’re even fast at parallel computing, however ray tracing does not map very well to either of those computing paradigms. Instead NVIDIA has to add hardware dedicated to ray tracing, which means devoting die space and power to hardware that cannot help with traditional rasterization.

A big part of that hardware, in turn, will go into solving the most basic problem of ray tracing: how do you figure out what a ray is intersecting with? The most common solution to this problem is to store triangles in a data structure that is well-suited for ray tracing. And this data structure is called a Bounding Volume Hierarchy.

Conceptually, a BVH is relatively simple – at least for the purposes of this article. Rather than testing every polygon to see if a ray interacts with it, the idea is to test a portion of a scene to see if it interacts with a ray, and then keep drilling down. If there is an intersection with that portion of the scene, then subdivide it into smaller portions and test again. And again. And again. All the way until you reach the individual polygon, at which point the ray testing is resolved.

For the computer scientists in the crowd, this might sound a lot like an application of a binary search, and it is. Each test allows for a significant number of options (in this case polygons) to be discarded as possible answers. This gets to the right polygon in just a fraction of the time. A BVH, in turn, is stored in what’s essentially a tree data structure, with each subdivision – called bounding boxes – stored as children of their parent bounding box.

Now the catch with BVH is that while it radically cuts down on the amount of ray intersection needed compared to a naïve implementation, it’s still not super cheap. A number of tests are still required for each ray, with both successful and failed tests adding to the total number of tests taken. And all of this is for a single ray, when a significant number of rays are going to be needed for each pixel. Which is why hardware acceleration of the process is so important (and not at all easy).

The other major computational cost here is that BVHs themselves aren’t free. One needs to be created for a scene from the polygons in it, so there is an additional step before ray casting can even begin. This is more a developer concern – when can they modify and reuse a BVH versus building a new one – but it’s another step in the process. Furthermore it’s an example of why developer training and efficient engine implementations are so crucial to the process, as a poor implementation can make ray tracing much too slow to be viable.

Ray Tracing 101: What It Is & Why NVIDIA Is Betting On It The Turing Architecture: Volta in Spirit

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

111 Comments

View All Comments

xXx][Zenith - Friday, September 14, 2018 - link
Nice write-up! With 25% of the chip area dedicated to Tensor Cores and other 25% to RT Cores NVIDIA is betting big on DLSS and RTX for gaming usecases. With all the architectural improvements, better memory compression ... AMD is out of the game, quite frankly.

Btw, for the fans of GigaRays/Sec, aka CEO metric, here is a Optix-RTX benchmark for Volta, Pascal and Maxwell GPUs: https://www.youtube.com/watch?v=ULtXYzjGogg

OptiX 5.1 API will work with Turing GPUs but will not take advantage of the new RT Cores, so older-gen NV GPUs can be directly compared to Turing. Application devs need to rebuild with Optix 5.2 to get access to HW accelarated ray tracing. Imho, RTX cards will have nice speedup with Turing RT Cores using GPU renderers (Octane, VRay, ...) but the available RAM will limit the scene complexity big time.
blode - Friday, September 14, 2018 - link
hate when i lose a game before my opponent arrives
eddman - Friday, September 14, 2018 - link
As far as I can tell, GigaRays/Sec is not an nvidia made term. It's been used before by others too, like imagination tech.

The nvidia made up term is RTX-ops.
xXx][Zenith - Friday, September 14, 2018 - link
Ray tracing metric without scene complexity, viewport placement is just smoke and mirrors ...

From whitepaper: "Turing ray tracing performance with RT Cores is significantly faster than ray tracing in Pascal GPUs. Turing can deliver far more Giga Rays/Sec than Pascal on different workloads, as shown in Figure 19. Pascal is spending approximately 1.1 Giga Rays/Sec, or 10 TFLOPS / Giga Ray to do ray tracing in software, whereas Turing can do 10+ Giga Rays/Sec using RT Cores, and run ray tracing 10 times faster."
eddman - Saturday, September 15, 2018 - link
I don't know the technical details of ray tracing, but how is that nvidia statement related to scene complexity, etc?

From my understanding of that statement, they are simply saying that turing is able to deliver far more rays/sec than pascal, because it is basically hardware-accelerating the operations through RT cores but pascal has to do all those operations in software through regular shader cores.
niva - Wednesday, September 19, 2018 - link
If you hold scene complexity constant (take the same environment/angle) and run a ray tracing experiment, the new hardware will be that much faster at cranking out frames. At least that's how I'm interpreting the article and the statement above, I'm not really sure if that's accurate though...
Yojimbo - Saturday, September 15, 2018 - link
When did Imgtec use gigarays? As far as I remember they didn't have hardware capable of a gigaray. They measured in hundreds of millions of rays per second, and I don't remember them using the term "megaray", either. Just something along the lines of "200 million rays/sec".
eddman - Saturday, September 15, 2018 - link
Why fixate on the "giga" part? I obviously meant rays/sec; forgot top remove the giga part after copy/paste. A measuring method doesn't change with quantity.
Yojimbo - Saturday, September 15, 2018 - link
Because the prefix is the whole point. The point is the terminology, not the measuring method. Rays per second is pretty obvious and has probably been around since the 70s. The "CEO metric" the OP was talking about was specifically about "gigarays per second". Jensen Huang said it during his SIGGRAPH presentation. Something like "Here's something you probably haven't heard before. Gigarays. We have gigarays."
eddman - Saturday, September 15, 2018 - link
What are you on about? He meant "giga" as in "billion". It's so simple. He said it that way to make it sound impressive.

The NVIDIA Turing GPU Architecture Deep Dive: Prelude to GeForce RTX

Bounding Volume Hierarchy - How Computers Test the World

Post Your Comment

111 Comments

View All Comments

xXx][Zenith - Friday, September 14, 2018 - link

blode - Friday, September 14, 2018 - link

eddman - Friday, September 14, 2018 - link

xXx][Zenith - Friday, September 14, 2018 - link

eddman - Saturday, September 15, 2018 - link

niva - Wednesday, September 19, 2018 - link

Yojimbo - Saturday, September 15, 2018 - link

eddman - Saturday, September 15, 2018 - link

Yojimbo - Saturday, September 15, 2018 - link

eddman - Saturday, September 15, 2018 - link

Log in

Don't have an account? Sign up now