AI Chip Race: AMD vs Nvidia for $500B Market

The AI wave has intensified the race between AMD and NVIDIA, the two GPU giants. Both companies previously announced that they would accelerate their chip release cadence from once every two years to once a year. On October 10, US local time, AMD CEO Dr. Lisa Su delivered a nearly two-hour keynote, unveiling the company's latest server CPUs, AI network cards, DPUs, and AI PC mobile processors. The highlight was the Instinct MI325X AI chip, which competes directly with NVIDIA's previously released H200.

Several industry insiders told reporters that AMD and NVIDIA are updating their products faster mainly because of intensified market competition and rapid technological development. It is difficult to predict where the two companies will stand in the AI chip field, since each claims superiority based on the data it has released. In terms of current market share, however, NVIDIA is far ahead, and AMD cannot close that gap in the short term.

Benchmarking NVIDIA

AI peak computing power refers to the maximum number of floating-point operations that a computing system can perform per second under ideal conditions, measured in FLOPS. It is an important indicator of a system's ability to handle AI workloads and is commonly used to gauge how quickly it can run machine learning and deep learning tasks.

The AI chip AMD released this time, the MI325X, uses the same CDNA 3 architecture as the previous-generation MI300X. However, AMD has introduced HBM3E high-bandwidth memory for the first time, and it cites AI peak computing power of 21 PFLOPS. According to the official specification comparison, the MI325X competes directly with NVIDIA's H200 GPU, released last November. Its memory capacity is 1.8 times that of the H200, and its memory bandwidth and its FP16 and FP8 peak theoretical computing power are each 1.3 times that of the H200.

Dr. Lisa Su stated: "When running the Llama 3.1 405B large model, the MI325 platform can provide 40% higher performance than NVIDIA's H200."

The MI325X is reportedly scheduled to enter production in the fourth quarter of 2024. Starting in the first quarter of 2025, it is expected to ship in systems from many platform vendors, including Dell, Eviden, Gigabyte, HP, and Lenovo.

AMD also disclosed its AI chip roadmap at the event. The next-generation MI350 series will launch next year on the new CDNA 4 architecture, which differs from the one used in the MI300X and MI325X, with AI computing power reaching 2.3 PFLOPS at half precision (FP16). NVIDIA CEO Jensen Huang had previously announced that NVIDIA would also speed up its AI chip updates, "from the previous two-year update to an annual update."

Industry observer Liang Zhenpeng told reporters that AMD and NVIDIA are updating faster mainly because of intensified market competition and technological development. As artificial intelligence advances rapidly, the demands on chip computing and storage capabilities keep rising, so chip manufacturers must continuously launch new products to meet the market. At the same time, emerging technologies such as cloud computing and big data are increasing data center demand for chips, which further pushes manufacturers to update more quickly.

If the MI325X launches successfully, AMD's AI chip will be only one generation behind NVIDIA's in performance terms, which brings the competition between the two back into the spotlight. According to CNBC, if developers and cloud computing giants come to see AMD's artificial intelligence chips as a close substitute for NVIDIA's products, this may put some pressure on NVIDIA.

Liang Zhenpeng stated that AMD's new products will indeed have some impact on NVIDIA's data center revenue. In his view, AMD's new products have advantages in performance, price, power consumption, and other areas, which lets them meet a wider range of market demands and thereby cut into NVIDIA's market share.

Patrick Moorhead, an analyst at Moor Insights & Strategy, takes a more cautious view: "It is difficult to assess the positions of AMD and NVIDIA in the data center GPU sector. Data is everywhere, and both companies claim to be superior. AMD's new GPU, especially the MI350, has improved efficiency and performance compared to its predecessors, and it also provides better support for low-bit models, which is a significant advancement. This is a fierce competition, with NVIDIA far ahead and AMD quickly catching up and achieving meaningful results."

The two companies' financial reports also show the gap. NVIDIA's data center business, which includes AI chips, posted record revenue of $26.3 billion in its most recently reported second quarter, up 154% year on year. AMD's data center division, which includes CPUs, GPUs, and other products, reported revenue of $2.834 billion in the second quarter of this year, up 115% year on year.
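The growth rates quoted above also imply each company's revenue a year earlier. The following is a rough back-of-envelope sketch in Python that uses only the figures quoted in this article. The function and variable names are illustrative, and the two segments are not strictly like-for-like, since AMD's figure also includes CPUs.

```python
def prior_year_revenue(current_bn: float, yoy_growth: float) -> float:
    """Year-earlier revenue implied by current revenue and year-on-year growth."""
    return current_bn / (1 + yoy_growth)


# Figures as quoted in the article, in billions of US dollars (Q2 data center revenue).
nvidia_q2_bn, nvidia_growth = 26.3, 1.54   # +154% year on year
amd_q2_bn, amd_growth = 2.834, 1.15        # +115% year on year

nvidia_prior_bn = prior_year_revenue(nvidia_q2_bn, nvidia_growth)
amd_prior_bn = prior_year_revenue(amd_q2_bn, amd_growth)

# How many times larger NVIDIA's quoted segment revenue is than AMD's.
revenue_ratio = nvidia_q2_bn / amd_q2_bn

print(f"NVIDIA implied year-earlier revenue: ${nvidia_prior_bn:.2f}B")
print(f"AMD implied year-earlier revenue:    ${amd_prior_bn:.2f}B")
print(f"NVIDIA / AMD revenue ratio:          {revenue_ratio:.1f}x")
```

On these figures, NVIDIA's data center revenue is roughly nine times AMD's, even though AMD's segment total also counts its server CPUs.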

Competing for a $500 billion market

"The demand for artificial intelligence continues to grow beyond expectations. Looking forward to the next four years, it is expected that the AI accelerator market will grow at an annual rate of 60%, reaching a market size of $500 billion by 2028," said Lisa Su.
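To put the forecast in perspective, the compound growth arithmetic can be worked backward. The sketch below assumes exactly 60% growth in each of four years ending in 2028, which is a simplification of the quoted statement; the function names and the 2024 starting year are assumptions made for illustration only.

```python
def implied_base(target: float, annual_rate: float, years: int) -> float:
    """Starting value that compounds to `target` after `years` at `annual_rate`."""
    return target / (1 + annual_rate) ** years


def project(base: float, annual_rate: float, years: int) -> list[float]:
    """Market size for each year from year 0 through `years`, compounding annually."""
    return [base * (1 + annual_rate) ** n for n in range(years + 1)]


TARGET_2028_BN = 500.0  # market size quoted for 2028, in billions of US dollars
ANNUAL_RATE = 0.60      # quoted annual growth rate
YEARS = 4               # "the next four years" (assumed to run 2024 to 2028)

base_bn = implied_base(TARGET_2028_BN, ANNUAL_RATE, YEARS)
path_bn = project(base_bn, ANNUAL_RATE, YEARS)

for year, size_bn in zip(range(2024, 2029), path_bn):
    print(f"{year}: ${size_bn:,.1f}B")
```

Under these assumptions, the implied starting market is about $76 billion, growing more than sixfold over the four years.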

According to AI expert Guo Tao, the biggest obstacle AMD faces in competing with NVIDIA in the AI chip market is building an ecosystem.

In his view, NVIDIA owes its success not only to its chips but also to its software stack, CUDA. CUDA has become the de facto standard for AI developers and has "locked" them into NVIDIA's ecosystem. CUDA is a proprietary parallel computing platform and programming toolkit developed by NVIDIA that lets software developers and engineers use NVIDIA GPUs to accelerate general-purpose parallel computing. It supports only NVIDIA GPUs and is not compatible with GPUs from other mainstream vendors such as AMD and Intel.

According to NVIDIA's official website, CUDA programming on NVIDIA GPUs and the basic tools are free, but enterprise applications and CUDA services on cloud platforms may require payment. CUDA therefore contributes to NVIDIA's revenue in two ways: it drives GPU hardware sales, and its use in enterprise applications and software services is itself an important source of income.

AMD, however, is accelerating its AI push, and Lisa Su also described progress on the software stack. Over the past year, AMD has integrated with all major AI development platforms. It has gained day-zero support in PyTorch, meaning new features can be used on the day a software update is released, and AMD hardware is now supported by Triton.

AMD has also tried to expand the boundaries of its AI business through acquisitions. In August this year, it announced plans to acquire server manufacturer ZT Systems for $4.9 billion. Lisa Su said that she hopes to build out the artificial intelligence software stack, and that with this acquisition AMD will bring the various elements together to offer a true artificial intelligence solution roadmap.
