AMD Threadripper Pro Review: An Upgrade Over Regular Threadripper?
by Dr. Ian Cutress on July 14, 2021 9:00 AM EST
Posted in: CPUs, AMD, ThreadRipper, Threadripper Pro, 3995WX
Conclusion
Threadripper Pro is designed to fill a niche in the workstation market. The workstation market has always been a little bit odd in that it wants the power and frequency of a high-end desktop, but the core count, memory support, and IO capabilities of servers. AMD blurred the lines by moving its mainstream desktop platform to 16 cores, but failed to meet memory and IO requirements – Threadripper got part of the way there, going up to 32 cores and then 64 cores with more memory and IO, but it was still limiting in support for things like ECC. That’s where Threadripper Pro comes in.
The whole point of Threadripper Pro is to appeal to those that need the features of EPYC but none of the downsides of potentially lower performance or extended service contracts. EPYC, by and large, has been sold only at the system level, whereas Threadripper Pro can be purchased at retail, and the goal of the product is to be ISV verified for standard workstation applications. In a world without Threadripper Pro, users who want the platform can either get a Threadripper and lament the reduced memory performance and IO, or they could get an EPYC and lament the reduced core performance. Speaking with OEMs, there are some verticals (like visual effects) that requested versions of Threadripper with Pro features, such as remote management, or remote access when WFH with a proper admin security stack. Even though TR Pro fills a niche, it’s still a niche.
In our testing today, we benchmarked all three retail versions of Threadripper Pro in a retail motherboard, and compared them to the Threadripper 3000 series.
AMD Comparison | |||||||
AnandTech | Cores / Threads | Base Freq (MHz) | Turbo Freq (MHz) | Chips | L3 Cache | TDP | Price SEP |
AMD EPYC (Zen 3, 128 PCIe 4.0, 8 channel DDR4 ECC) | |||||||
7763 (2P) | 64 / 128 | 2450 | 3500 | 8 + 1 | 256 MB | 280 W | $7890 |
7713P | 64 / 128 | 2000 | 3675 | 8 + 1 | 256 MB | 225 W | $5010 |
7543P | 32 / 64 | 2800 | 3700 | 8 + 1 | 256 MB | 225 W | $2730 |
7443P | 24 / 48 | 2850 | 4000 | 4 + 1 | 128 MB | 200 W | $1337 |
7313P | 16 / 32 | 3000 | 3700 | 4 + 1 | 128 MB | 155 W | $913 |
AMD Threadripper Pro (Zen 2, 128 PCIe 4.0, 8 channel DDR4-ECC) | |||||||
3995WX | 64 / 128 | 2700 | 4200 | 8 + 1 | 256 MB | 280 W | $5490 |
3975WX | 32 / 64 | 3500 | 4200 | 4 + 1 | 128 MB | 280 W | $2750 |
3955WX | 16 / 32 | 3900 | 4300 | 2 + 1 | 64 MB | 280 W | $1150 |
3945WX | 12 / 24 | 4000 | 4300 | 2 + 1 | 64 MB | 280 W | OEM |
AMD Threadripper (Zen 2, 64 PCIe 4.0, 4 channel DDR4) | |||||||
3990X | 64 / 128 | 2900 | 4300 | 8 + 1 | 256 MB | 280 W | $3990 |
3970X | 32 / 64 | 3700 | 4500 | 4 + 1 | 128 MB | 280 W | $1999 |
3960X | 24 / 48 | 3800 | 4500 | 4 + 1 | 128 MB | 280 W | $1399 |
AMD Ryzen (Zen 3, 20 PCIe 4.0, 2 channel DDR4) | |||||||
R9 5950X | 16 / 32 | 3400 | 4900 | 2 + 1 | 64 MB | 105 W | $799 |
Performance between Threadripper Pro and Threadripper fell into three categories. Either (a) the results between similar processors were practically identical, (b) Threadripper beat TR Pro by a small margin due to slightly higher frequencies, or (c) TR Pro thrashed Threadripper due to memory bandwidth availability. It should be noted that the last case, (c), only really kicks in for the 32c and 64c processors. Our 16c TR Pro had the same memory bandwidth results as TR, most likely because it has only two chiplets in its design.
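As a rough illustration of why the 16c part sees no bandwidth benefit, the sketch below caps the number of useful memory channels by the chiplet count. It assumes DDR4-3200 with a 64-bit channel, and it assumes each chiplet link can carry about two channels' worth of traffic. That second figure is a rule of thumb used for illustration, not an AMD specification, and the function names are hypothetical. Chiplet and channel counts are taken from the table above.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int = 3200, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (assumes DDR4-3200, 64-bit channels)."""
    return channels * mt_per_s * bus_bytes / 1000


def usable_channels(populated: int, chiplets: int, channels_per_chiplet: int = 2) -> int:
    """Channels the cores can actually saturate.

    channels_per_chiplet=2 is an assumed rule of thumb, not a documented figure.
    """
    return min(populated, chiplets * channels_per_chiplet)


# (compute chiplets, memory channels) per part, from the comparison table.
PARTS = {
    "3955WX": (2, 8),
    "3975WX": (4, 8),
    "3995WX": (8, 8),
    "3970X": (4, 4),
    "3990X": (8, 4),
}

for name, (chiplets, channels) in PARTS.items():
    effective = usable_channels(channels, chiplets)
    print(f"{name}: {effective} effective channels, "
          f"{peak_bandwidth_gbs(effective):.1f} GB/s theoretical peak")
```

Under these assumptions the 3955WX lands on the same four effective channels as quad-channel Threadripper, while the 32c and 64c TR Pro parts can use all eight. Measured bandwidth will be lower than these theoretical peaks.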
In the end, that’s what TR Pro is there for: features that Threadripper doesn’t have. If you absolutely need up to 2 TB of eight-channel memory rather than 256 GB of quad-channel memory, you need TR Pro. If you absolutely need memory with ECC, then TR Pro has validated support. If you absolutely need 128 lanes of PCIe 4.0 rather than 64, then TR Pro has them. If you absolutely need Pro features, then TR Pro has them.
The price you pay for these Threadripper Pro features is a premium of roughly 37.5% over the equivalent Threadripper. TR Pro is also more expensive than 1P EPYC processors, because it has the full 280 W frequency profile while EPYC 1P parts top out at 225 W / 240 W. EPYC does have 280 W processors for dual-socket platforms, such as the 7763, but they cost more than TR Pro.
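The premium can be checked against the list prices in the table above. The sketch below is a minimal calculation using only those SEP figures; the dictionary layout and function name are illustrative. Only the 64c and 32c parts are compared, since the 16c 3955WX has no Threadripper 3000 counterpart with the same core count.

```python
# SEP prices in USD, copied from the comparison table.
PAIRS = {
    "64c": {"tr_pro": ("3995WX", 5490), "tr": ("3990X", 3990)},
    "32c": {"tr_pro": ("3975WX", 2750), "tr": ("3970X", 1999)},
}


def premium_percent(pro_price: float, tr_price: float) -> float:
    """Extra cost of the TR Pro part, as a percentage of the Threadripper price."""
    return (pro_price - tr_price) / tr_price * 100


for cores, pair in PAIRS.items():
    pro_name, pro_price = pair["tr_pro"]
    tr_name, tr_price = pair["tr"]
    print(f"{cores}: {pro_name} vs {tr_name}: "
          f"+{premium_percent(pro_price, tr_price):.1f}%")
```

Both pairs work out to just under 37.6%, in line with the figure quoted above.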
The benefit to EPYC right now is that EPYC Milan uses Zen 3 cores, while Threadripper Pro is using Zen 2 cores. We are patiently waiting for AMD to launch Threadripper versions with Zen 3 – we hoped it would have been at Computex in June, but now we’re not sure exactly when. Even if AMD does launch Threadripper with Zen 3 this year, Threadripper Pro variants might take longer to arrive.
98 Comments
Mikewind Dale - Wednesday, July 14, 2021
I have a ThreadRipper Pro 3955WX, and I discovered something interesting about the memory bandwidth.
Originally, I bought 4x64 GB ECC RDIMM because I thought 256 GB might be enough, and I wanted to leave some empty RAM slots to populate with 128 GB RDIMMs if those ever became cost-effective. (Right now, 128 GB RDIMMs are about triple the price of 64 GB.)
CPU-Z and AIDA64 reported "quad" channel memory, and AIDA64's memory benchmarks showed reasonable memory performance.
But I discovered that 256 GB wasn't enough for my application, so I bought 2 more 64 GB RDIMMs.
At this point, I had 6 DIMMs populated. CPU-Z and AIDA64 both reported "hexa" channel memory, but AIDA64's memory benchmarks showed that my memory performance was about 2/3 that of a Ryzen.
So I bought 2 more RDIMMs again, for a total of 8. Now, my memory benchmark in AIDA64 is much closer to expected.
So the moral of the story is: you can populate 4 DIMMs, or you can populate 8, but don't dare populate 6. Populating precisely 6 DIMMs will absolutely cripple your memory performance, whereas 4 DIMMs still have acceptable performance.
kobblestown - Wednesday, July 14, 2021
The 3955 probably has only 2 CCDs and is therefore limited to 4 DDR channels of throughput. It seems that each IF link has the throughput of 2 DDR channels, and this makes sense.
You should keep in mind that the IO die has in effect 4 dual channel controllers and you may have populated them suboptimally. If you have two dual channel controllers fully populated and two half populated (instead of a third fully populated and the fourth one staying empty) you'll have skewed results. Also, there was some noise about Milan working better with 6 channel configurations, so it may be something specific to Rome chips.
Rudde - Wednesday, July 14, 2021
Server providers had requested 6 channel memory support for server processors, and that was implemented in Milan.
McFig - Wednesday, July 14, 2021
What kobblestown is suggesting is that maybe Mikewind Dale could have gotten the 6 RDIMMs working by moving one of them so that each pair is fully populated.
Mikewind Dale - Wednesday, July 14, 2021
McFig, there are only 8 slots, so I'm not sure how I could have moved the 6 DIMMs among the 8 slots to ensure that each pair is populated.
1_rick - Wednesday, July 14, 2021
He probably means "each of 3 pairs fully populated".
DougMcC - Wednesday, July 14, 2021
I think the question is whether 3/3 is better than 4/2.
kobblestown - Friday, July 16, 2021
Heya! Sorry for the nebulous formulation. In terms of the number of DIMMs per memory controller, I suggest having 2+2+2+0 instead of 2+1+2+1. One needs to figure out what this means for any particular MB. But as DougMcC suggests, that would probably mean having 4 DIMMs on one side of the CPU and 2 on the other, rather than having 3 DIMMs on each side. The latter is bound to be suboptimal. Whether the former offers an improvement is something that I would be very interested to know, but it could be that Rome has some shortcoming in this area which is addressed in Milan.
Again, dual CCD configurations are limited to 4 channel bandwidth, but it's still worth it to have all channels populated so you don't get bitten by badly handled asymmetry and the IO does not fight (too much) with the cores for the bandwidth.
kobblestown - Friday, July 16, 2021
BTW, one should also check the memory interleaving options in the UEFI. Maybe the way the IO die aggregates the memory channels can be tweaked to achieve the expected performance even with 6 DIMMs. Or maybe that's only achievable with Milan.
Mikewind Dale - Friday, July 16, 2021
Ahhh, I see what you mean. Thanks. Well, I have 8 DIMMs now, and I don't want to mess with my system any more. Maybe Anandtech can test this.