AMD RDNA 3 professional GPUs with 48GB can beat Nvidia 24GB cards in AI — putting the 'Large' in LLM

Mar 14, 2025 - 00:30


(Image credit: AMD)

AMD is swinging back at Nvidia with new DeepSeek benchmarks that claim its monster 48GB RDNA 3 GPUs can outperform Team Green's previous-generation RTX 4090.

David McAfee, AMD vice president and general manager of Ryzen CPUs and Radeon graphics, posted on X that the Radeon Pro W7900 and Pro W7800 48GB cards can outperform an RTX 4090 by up to 7.3x in DeepSeek R1.

McAfee shared a graph of the three GPUs benchmarked in four DeepSeek R1 configurations using LM Studio 0.3.12 and the Llama.cpp runtime 1.18. Two models, Distill Qwen 32B 8-bit and Distill Llama 70B 4-bit, were each tested twice: once with a conversational prompt (20 tokens) and once with a summarization prompt (3,017 tokens).


"A single @AMD Radeon PRO W7800 48GB or W7900 48GB has enough VRAM to run with great performance even the largest DeepSeek R1 Distill (or higher precision for 32B)." pic.twitter.com/4uNTO6XAYG (March 13, 2025)

With the conversational prompt, the RTX 4090 allegedly produced 2.7 tokens per second in DeepSeek R1 Distill Qwen 32B 8-bit, against 19.1 for the Pro W7800 48GB and 19.8 for the Pro W7900 48GB. In Distill Llama 70B 4-bit, the RTX 4090 produced 2.3 tokens per second, the Pro W7800 48GB 12.8, and the Pro W7900 48GB 12.7.

With the summarization prompt, the RTX 4090 produced 2.5 tokens per second in Distill Qwen 32B 8-bit, the Pro W7800 48GB 15.7, and the Pro W7900 48GB 16.2. In Distill Llama 70B 4-bit, the RTX 4090 produced 2.0 tokens per second, the Pro W7800 48GB 10.1, and the Pro W7900 48GB 10.4.

According to AMD's benchmarks, the Radeon Pro W7800 48GB and Pro W7900 48GB are up to 7.3x faster than the RTX 4090 in Distill Qwen 32B 8-bit and 5.5x faster in Distill Llama 70B 4-bit with the conversational prompt, and up to 6.5x and 5.2x faster, respectively, with the summarization prompt.
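The speed-up figures can be checked against the tokens-per-second numbers quoted above. The short Python sketch below does that arithmetic; the dictionary layout and the `best_speedup` helper are illustrative names of our own, and the split into conversational and summarization results assumes the numbers appear in the same order as AMD's list of configurations.

```python
# Tokens per second as quoted in this article (AMD's own figures).
# Assumption: the first two rows are the conversational prompt and the
# last two the summarization prompt, following the order given above.
results = {
    ("conversational", "Qwen 32B 8-bit"): {"RTX 4090": 2.7, "W7800 48GB": 19.1, "W7900 48GB": 19.8},
    ("conversational", "Llama 70B 4-bit"): {"RTX 4090": 2.3, "W7800 48GB": 12.8, "W7900 48GB": 12.7},
    ("summarization", "Qwen 32B 8-bit"): {"RTX 4090": 2.5, "W7800 48GB": 15.7, "W7900 48GB": 16.2},
    ("summarization", "Llama 70B 4-bit"): {"RTX 4090": 2.0, "W7800 48GB": 10.1, "W7900 48GB": 10.4},
}


def best_speedup(row, baseline="RTX 4090"):
    """Return the largest ratio of any non-baseline card to the baseline."""
    base = row[baseline]
    return max(rate / base for card, rate in row.items() if card != baseline)


speedups = {key: best_speedup(row) for key, row in results.items()}

for (prompt, model), ratio in speedups.items():
    print(f"{prompt:15s} {model:16s} {ratio:.2f}x")
```

The ratios come out at roughly 7.33x, 5.57x, 6.48x, and 5.20x, in line with AMD's "up to" figures. The 70B conversational result rounds to 5.6x rather than the quoted 5.5x, presumably because AMD worked from unrounded measurements.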

McAfee claims the 48GB versions of the Pro W7800 and Pro W7900 have enough VRAM to run the largest DeepSeek R1 Distill models. VRAM capacity is one of the most critical factors in running large language models: a model's parameters are stored in VRAM, and the memory they occupy grows in direct proportion to the parameter count and the precision of each weight. Thus, the larger an LLM is, the more VRAM you need. But the extra VRAM capacity comes at a very high price.
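To see why capacity matters for these particular models, here is a rough back-of-the-envelope estimate in Python. It is only a sketch under simplifying assumptions: it uses the nominal parameter counts from the model names (32 billion and 70 billion), treats "8-bit" and "4-bit" as exactly that many bits per weight, and ignores the KV cache, activations, and runtime overhead, all of which add to the real footprint. The function names are our own.

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GiB.

    Assumption: every parameter takes exactly bits_per_weight bits;
    real quantization formats usually need somewhat more.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion, bits_per_weight, vram_gib):
    """True if the weights alone would fit in the given VRAM capacity."""
    return weight_gib(params_billion, bits_per_weight) <= vram_gib


# (nominal parameters in billions, bits per weight)
models = {
    "Distill Qwen 32B 8-bit": (32, 8),
    "Distill Llama 70B 4-bit": (70, 4),
}

for name, (params, bits) in models.items():
    size = weight_gib(params, bits)
    print(f"{name}: ~{size:.1f} GiB of weights, "
          f"fits in 24GB: {fits(params, bits, 24)}, "
          f"fits in 48GB: {fits(params, bits, 48)}")
```

Under these assumptions the weights alone come to roughly 29.8 GiB for the 32B model at 8-bit and 32.6 GiB for the 70B model at 4-bit. Neither fits in 24GB, while both fit in 48GB with room to spare.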

The W7900 48GB costs a whopping $3,500, which is $1,500 over the RTX 5090's $2,000 MSRP and $2,000 over the RTX 4090's $1,500 MSRP (though hardly any 4090s were sold at that price). On the flip side, the 48GB RDNA 3 GPU is less than half the price of the closest current-generation 48GB Nvidia GPU you can buy today, the RTX 6000 Ada.


AMD's marketing looks great, but we have seen this before. AMD previously shared benchmarks of its RX 7900 XTX (mostly) outperforming the RTX 4090 in DeepSeek R1. However, Nvidia responded with its own benchmarks showing the RTX 4090 (and RTX 5090) drastically outperforming the flagship RDNA 3 GPU in the same DeepSeek R1 configurations.

AMD also neglected to share any benchmarks comparing Nvidia's newest flagship, the RTX 5090, against its RDNA 3-based 48GB workstation-focused graphics cards. It will be interesting to see if Nvidia will follow up with another round of benchmarks to combat AMD, particularly since AMD has more VRAM on its 48GB cards than even the RTX 5090 with its 32GB of GDDR7.

Aaron Klotz is a contributing writer for Tom’s Hardware, covering news related to computer hardware such as CPUs and graphics cards.
