Triangles/second and other benchmarks

Posted: Sun Aug 09, 2020 3:06 am
by jlv
The other thread made me curious about how expensive various graphics operations are, so I decided to run some benchmarks. I made tracks with the same statue repeated 64, 256, 1024, 4096 and 16384 times in a grid and tested with the statue set to a model with 968, 3936, 15744 and 62976 triangles. (The model is just the Blender monkey with 0, 1, 2 and 3 levels of subdivision.) All tests are on a Ryzen 2400G APU. All the FPS numbers are approximate since they jitter around a bit.

The first test I ran was a cube (12 triangles) on the 16384-copy track. It ran at 39 FPS. This is essentially no triangles, so it's roughly a measurement of draw call overhead, which comes out to 39 * 16384 = 638,976 draw calls per second.

Related to the first test is how fast you can switch between textures. I ran the first test again but alternated between two different textures. This ran at 28 FPS, which comes out to 458,752 texture changes per second. This one surprised me. I was expecting it to be much worse.
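The arithmetic behind these two rates can be sketched in Python. This is a rough sketch that assumes every cube costs one draw call and that, in the alternating test, every draw call also switches texture; under that assumption the difference in per-draw time is a crude estimate of what one texture switch adds on this machine.

```python
# Rough per-event costs implied by the two cube tests (16384 cubes per frame).
CUBES_PER_FRAME = 16384


def per_second(fps: int, per_frame: int = CUBES_PER_FRAME) -> int:
    """Events per second when `per_frame` events happen every frame."""
    return fps * per_frame


draw_calls_per_s = per_second(39)  # same texture on every cube
textured_draws_per_s = per_second(28)  # texture alternates between two on every draw

# Microseconds per draw call, without and with a texture switch.
us_per_draw = 1e6 / draw_calls_per_s
us_per_textured_draw = 1e6 / textured_draws_per_s

# Assumption: all of the extra time per draw in the second test is the texture switch.
us_per_texture_switch = us_per_textured_draw - us_per_draw

print(f"{draw_calls_per_s:,} draw calls/s, {us_per_draw:.3f} us each")
print(f"{textured_draws_per_s:,} textured draws/s, ~{us_per_texture_switch:.3f} us extra per switch")
```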

Here's the main test. I threw out results under 8 FPS and over 128 FPS because they wouldn't be accurate: at those rates the game either runs in slow motion or sleeps. Each row gives the triangles/second and FPS for each model on one track.

copies        monkey0          monkey1          monkey2          monkey3
           (968 tris)      (3936 tris)     (15744 tris)     (62976 tris)
   64          (>128)           (>128)           (>128)   181370880 (45)
  256          (>128)           (>128)   181370880 (45)   177340416 (11)
 1024          (>128)   241827840 (60)   193462272 (12)             (<8)
 4096  198246400 (50)   290193408 (18)             (<8)             (<8)
16384  301334528 (19)             (<8)             (<8)             (<8)
Some interesting things stand out here. At around 180,000,000 triangles per second (about 1.4 million triangles per frame at 128 FPS, or roughly 35 bikes at 40k each), if you were doing nothing else, you could just barely get away with spending 40k triangles per bike/rider and still get 128 FPS on this low-end GPU.

Surprisingly, the smaller models get higher triangle throughput despite using more draw calls. This is probably because at 968 and 3936 triangles the monkey0/1 models fit in the GPU's cache memory.

Finally, if you're a modeler wondering when reducing triangle count becomes pointless because draw call overhead dominates: a draw call takes about the same time as 280 triangles on this GPU (roughly 180,000,000 triangles/s / 638,976 draw calls/s). So imagine your object starts out with 280 extra triangles; that'll give you an idea of how much you save, as a percentage, for each triangle you remove. (E.g. if you have 10,000 triangles and remove 1,000, the model will take about 90.3% (9280/10280) of the time to draw, so you removed 10% and saved almost 10%. But if it was 1,000 triangles and you remove 100, it'll draw in 92.2% (1180/1280) of the time, so you only gain ~8% for removing 10%. For 10% off of 100, it'd be 97.4% (370/380), only saving ~3%.) So basically, once it's down in the low hundreds it's getting pointless.
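The draw-call rule of thumb can be written as a tiny cost model. This is a sketch assuming the ~180,000,000 triangles/s and 638,976 draw calls/s figures from this post, and treating one draw call as a fixed overhead of 280 triangles per object; a real GPU won't be this linear.

```python
# Cost model: each draw call adds a fixed overhead worth ~280 triangles of GPU time.
DRAW_CALLS_PER_S = 39 * 16384  # cube test: 638,976 draw calls/s
TRIS_PER_S = 180_000_000  # rough triangle throughput from the table

# ~281.7 with these inputs; the post rounds it to 280.
draw_call_in_tris = TRIS_PER_S / DRAW_CALLS_PER_S


def relative_draw_time(triangles: int, removed: int, overhead: float = 280) -> float:
    """Fraction of the original draw time left after removing `removed` triangles
    from an object that is drawn with one draw call."""
    return (triangles - removed + overhead) / (triangles + overhead)


for tris in (10_000, 1_000, 100):
    frac = relative_draw_time(tris, tris // 10)  # remove 10% of the triangles
    print(f"{tris:>6,} tris, remove 10%: {frac:.1%} of the time, saves {1 - frac:.1%}")
```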

Re: Triangles/second and other benchmarks

Posted: Mon Aug 10, 2020 2:32 am
by jlv
Just for fun, a track with over 90 million triangles' worth of statues. The top LOD is 94,464 triangles * 1,024 copies of the statue, for 96,731,136 total triangles. It's phony marketing numbers, but what the heck.

Don't switch to the editor. It turns out that using only the top LOD for the model wireframes in the editor doesn't work well here.

Re: Triangles/second and other benchmarks

Posted: Mon Aug 10, 2020 12:21 pm
by Mr. Wiggles
this is pimp

Re: Triangles/second and other benchmarks

Posted: Mon Aug 10, 2020 3:18 pm
by ddmx
JLV - without doing any analysis or math, would it stand to reason that grouping small objects together as a single object would result in a performance increase?

For example, group all track objects around a corner (bales, stakes, crowd, etc.) as a single object and thus a single draw call. Of course, up to a certain number of tris. It would seem that this would be more efficient than all of the draw calls for each individual bale, stake, etc.

Re: Triangles/second and other benchmarks

Posted: Tue Aug 11, 2020 1:44 am
by jlv
ddmx wrote:JLV - without doing any analysis or math, would it stand to reason that grouping small objects together as a single object would result in a performance increase?

For example, group all track objects around a corner (bales, stakes, crowd, etc.) as a single object and thus a single draw call. Of course, up to a certain number of tris. It would seem that this would be more efficient than all of the draw calls for each individual bale, stake, etc.
If you grouped 10 cubes together, then, using the time it takes to draw one triangle as the unit of time, the cost would go from 2920 = 10*(280+12) to 400 = 280+(10*12). So pretty big savings as a percentage. It's probably not worth the bother if it's just 10 things, though, since that'd only save ~14 microseconds of real time per frame. But if you did it for a lot of objects, it'd add up.

Sorry about the math!
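The grouping estimate uses the same cost model as the first post. A minimal sketch, assuming one draw call ≈ 280 triangles of time and ~180,000,000 triangles/s on the 2400G (both from the first post); the function names are just for illustration.

```python
DRAW_CALL_IN_TRIS = 280  # one draw call ~ 280 triangles of GPU time (from the first post)
TRIS_PER_S = 180_000_000  # rough triangle throughput on the Ryzen 2400G


def separate_cost(count: int, tris_each: int) -> int:
    """Triangle-time units to draw `count` objects with one draw call each."""
    return count * (DRAW_CALL_IN_TRIS + tris_each)


def grouped_cost(count: int, tris_each: int) -> int:
    """Triangle-time units when all objects are merged into a single draw call."""
    return DRAW_CALL_IN_TRIS + count * tris_each


def units_to_microseconds(units: int) -> float:
    """Convert triangle-time units to microseconds of GPU time."""
    return units / TRIS_PER_S * 1e6


before = separate_cost(10, 12)  # ten 12-triangle cubes
after = grouped_cost(10, 12)
saved_us = units_to_microseconds(before - after)
print(f"{before} -> {after} units, saving ~{saved_us:.0f} us per frame")
```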

Re: Triangles/second and other benchmarks

Posted: Sun Aug 16, 2020 10:50 am
by Shadow
With those numbers, I'm guessing mxsim already uses instancing for objects using the same mesh and texture? I'm actually surprised at the number of draw calls and triangles OpenGL can handle.

Re: Triangles/second and other benchmarks

Posted: Mon Aug 17, 2020 3:35 am
by jlv
Shadow wrote:With those numbers, I'm guessing mxsim already uses instancing for objects using the same mesh and texture? I'm actually surprised at the number of draw calls and triangles OpenGL can handle.
It doesn't do instancing, but that would be a big win for simple objects like AMA stakes. While triangle throughput should be pretty even on all APIs, I'm pretty sure Vulkan would be faster on draw calls and texture changes. I used to contribute to Mesa pretty regularly, and there's a *lot* going on when you do a glBindTexture call.

Re: Triangles/second and other benchmarks

Posted: Mon Aug 17, 2020 11:07 am
by DJ99X
Interesting. Will have to do some tests with my track. I have 700 trees of 6,500 triangles each, which might be a bit much in retrospect. Will try a test just grouping them together and see how the performance changes.

This instancing example got similar performance rendering a 500-triangle asteroid ~1,500 times without instancing as rendering it 100,000 times with instancing.

https://learnopengl.com/Advanced-OpenGL/Instancing
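Plugging DJ99X's numbers into jlv's rule of thumb gives a rough idea of what grouping would buy. This sketch assumes jlv's Ryzen 2400G figures (~180,000,000 triangles/s, one draw call ≈ 280 triangles), one draw call per tree, and no LOD or culling, so it's an upper bound on FPS for the trees alone, not a prediction for any particular machine.

```python
DRAW_CALL_IN_TRIS = 280  # assumption: jlv's Ryzen 2400G figure; other GPUs differ
TRIS_PER_S = 180_000_000  # assumption: same machine, rough triangle throughput


def max_fps(objects: int, tris_each: int, draw_calls: int) -> float:
    """Upper bound on FPS if these objects were the only thing drawn
    (no LOD, no culling, cost = draw call overhead + triangles)."""
    units = draw_calls * DRAW_CALL_IN_TRIS + objects * tris_each
    return TRIS_PER_S / units


trees, tris = 700, 6500
print(f"{trees * tris:,} triangles per frame")
print(f"one call per tree: {max_fps(trees, tris, trees):.1f} FPS max")
print(f"all trees grouped: {max_fps(trees, tris, 1):.1f} FPS max")
```

Under those assumptions, grouping only moves the ceiling from roughly 38 to roughly 40 FPS; at 6,500 triangles per tree, the triangle count rather than the draw calls dominates.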

Re: Triangles/second and other benchmarks

Posted: Mon Aug 17, 2020 12:01 pm
by Shadow
jlv wrote:
Shadow wrote:With those numbers, I'm guessing mxsim already uses instancing for objects using the same mesh and texture? I'm actually surprised at the number of draw calls and triangles OpenGL can handle.
It doesn't do instancing, but that would be a big win for simple objects like AMA stakes. While triangle throughput should be pretty even on all APIs, I'm pretty sure Vulkan would be faster on draw calls and texture changes. I used to contribute to Mesa pretty regularly, and there's a *lot* going on when you do a glBindTexture call.
I'm surprised, to be honest. Implementing instancing in OpenGL shouldn't be that difficult from what I remember when I was learning it. What DJ just linked to is actually the exact resource I used.

Re: Triangles/second and other benchmarks

Posted: Tue Aug 18, 2020 2:03 am
by jlv
DJ99X wrote:Interesting. Will have to do some tests with my track. I have 700 trees of 6,500 triangles each, which might be a bit much in retrospect. Will try a test just grouping them together and see how the performance changes.

This instancing example got similar performance rendering a 500-triangle asteroid ~1,500 times without instancing as rendering it 100,000 times with instancing.

https://learnopengl.com/Advanced-OpenGL/Instancing
It's weird that he's getting really great vertex performance and terrible draw call performance. Must be a high-end GPU paired with a really slow CPU.
Shadow wrote:I'm surprised, to be honest. Implementing instancing in OpenGL shouldn't be that difficult from what I remember when I was learning it. What DJ just linked to is actually the exact resource I used.
Mostly a matter of having code that already works while also wanting to support old Intel GPUs that were popular for a long time. Not that you can't do both, but it complicates things. Luckily they're pretty much extinct now.
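The "great vertex performance, terrible draw call performance" reading can be made concrete. Assuming "similar performance" means equal frame time, counting the instanced case as a single draw call, and measuring time in triangles as in jlv's earlier posts, you can solve for how many triangles' worth of time one draw call costs in that example. The ~1,500 and 100,000 counts are approximate, so treat the result as an order of magnitude.

```python
ASTEROID_TRIS = 500
NON_INSTANCED = 1_500  # asteroids drawn with one draw call each
INSTANCED = 100_000  # asteroids drawn with a single instanced call

# Assume both cases take the same frame time, in triangle-time units:
#   NON_INSTANCED * (D + ASTEROID_TRIS) == D + INSTANCED * ASTEROID_TRIS
# Solving for D (the cost of one draw call, in triangles):
#   D = (INSTANCED - NON_INSTANCED) * ASTEROID_TRIS / (NON_INSTANCED - 1)
draw_call_in_tris = (INSTANCED - NON_INSTANCED) * ASTEROID_TRIS / (NON_INSTANCED - 1)
print(f"one draw call ~ {draw_call_in_tris:,.0f} triangles (vs ~280 on the 2400G)")
```

That works out to roughly 33,000 triangles per draw call, versus about 280 on the 2400G, which is consistent with a fast GPU being held back by slow per-draw CPU work.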