
October 11, 2024

AMD Unveils “Turin” Server CPUs: A Game Changer in Performance and Efficiency


For those considering an upgrade to their X86 server infrastructure, there’s a significant buzz regarding the willingness of both enterprises and hyperscalers to invest. The exciting news is that both Intel and AMD have unveiled their most advanced serial compute engines to date.

Intel, which still accounts for the majority of X86 server CPU shipments, has come remarkably close to matching AMD despite a slight disadvantage in manufacturing process. AMD, meanwhile, is set to gain more market traction thanks to its newly introduced “Turin” Zen 5 and Zen 5c processors, which offer both performance and price benefits. This dynamic suggests that one of the two competitors may eventually start a price war once they reach parity in both manufacturing and performance.

However, that scenario is not imminent. Currently, as major hyperscalers develop their own Arm-based server CPU designs, both Intel and AMD seem content to engage in direct competition while seemingly ignoring the shift towards Arm architecture. Adjusting their prices for X86 chips would mean relinquishing a significant amount of revenue, something neither company can afford at this juncture. Consequently, X86 server CPUs are entering a legacy phase, while custom Arm chips are driving the price/performance ratio downwards. We anticipate that eventually, RISC-V could impact Arm in a similar fashion.

As is customary, we will begin our exploration of the Turin CPUs by providing essential specifications, performance metrics, and pricing details. This will be followed by an in-depth architectural analysis and a competitive insight from AMD’s viewpoint. (We were just finalizing these articles for the Intel “Granite Rapids” Xeon 6 processors when external events, such as Hurricane Helene, redirected our focus.)

AMD has come a long way with its Epyc processors, and it absolutely needed to if it intended to restore its reputation after largely neglecting the datacenter in the early 2010s. During that time, its designs struggled, while Intel surged ahead with an enhanced 64-bit Xeon lineup that improved upon many concepts originally introduced by AMD’s Opterons. Presently, the roles have reversed: Intel now lags AMD’s manufacturing partner, Taiwan Semiconductor Manufacturing Co, and its struggles in moving to advanced manufacturing processes have caused serious problems for Intel’s server CPU engineers. Since 2019, Intel has been unable to score substantial “design wins” and has resorted to “supply wins,” as it can still produce chips at a pace that AMD cannot match.

Across the generations of Epyc, the chiplet architecture has matured and been refined to the point that an Epyc CPU, composed of nine, thirteen, or seventeen chiplets interlinked and packaged on an organic substrate, behaves much like a monolithic CPU from earlier days. Consequently, Epyc processors are increasingly being adopted, particularly by hyperscalers and cloud service providers eager to maximize the number of cores within a server, all to optimize the crucial metric of price/performance per watt per unit of volume. This pursuit reflects what was termed SWaP back in the early 2000s, short for Space, Watts, and Performance.

As the Epyc chiplet designs advanced, adoption surged. At this stage in the Epyc lineage, it is widely acknowledged that AMD is committed to the server CPU market for the long term and can build processors for one-socket and two-socket servers that compete with anything a rival brings to market.

However, as previously mentioned, X86 processors are likely to always be more expensive than the custom Arm server chips favored by hyperscalers and cloud builders, because the prices charged by vendors like Intel and AMD carry additional overhead. In other words, anyone who is not a hyperscaler or cloud builder will inevitably pay a steep premium for server compute. This reality is inherent and unavoidable.

Most of the world still runs X86 applications on Windows Server, and these are not straightforward to move to Arm architectures, so there is no need to panic. However, many contemporary applications are developed for Linux rather than Windows Server, and these are generally easier to port to Arm. So it is important not to become too relaxed, either. A steady level of concern is probably the prudent approach.

Given the current state of the X86 server market, the question is just how much AMD can expand its market share.

A lot hinges on the behavior of hyperscalers and cloud service providers, who account for more than half of server CPU shipments. If half of their infrastructure moves to Arm while the other half stays on X86 to accommodate legacy X86 applications (ultimately, Windows Server), then roughly three-quarters of the market will remain X86, which is still a substantial target. Conversely, if hyperscalers and cloud builders grow to three-quarters of server CPU shipments but only expand their X86 infrastructure organically to support Windows Server and the Linux workloads that are preferred on X86 (for valid reasons), this could put considerable pressure on both Intel and AMD, with market shares fluctuating based on how competitive each is in the ongoing price battles. This scenario presumes design and process parity, which is never guaranteed, as both Intel and AMD have experienced.
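The first scenario is simple share arithmetic, and a minimal sketch makes the assumption behind it explicit. The function name and parameters below are hypothetical, and the sketch assumes that every buyer outside the hyperscalers and cloud builders stays on X86.

```python
def x86_share(hyperscaler_share: float, arm_fraction_of_hyperscalers: float) -> float:
    """Fraction of all server CPU shipments that stays on X86.

    Assumes buyers outside the hyperscalers and cloud builders stay on X86.
    """
    arm_share = hyperscaler_share * arm_fraction_of_hyperscalers
    return 1.0 - arm_share


# Hyperscalers at half of shipments, half of their fleet moving to Arm.
first_scenario = x86_share(0.5, 0.5)
print(f"X86 share of shipments: {first_scenario:.0%}")
```

The second scenario cannot be reduced to a single number the same way, because the X86 fraction inside the hyperscalers is left open.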

There are numerous microarchitectural advancements within the Turin Zen 5 and Zen 5c cores that enhance integer instructions per clock (IPC) by 17 percent when compared to the Zen 4 and Zen 4c architectures, as well as improving floating-point IPC by 37 percent.

Note: In our analysis, we compare the relative performance to our reference four-core “Shanghai” Opteron 2387 processor operating at 2.8 GHz, focusing solely on integer workloads for now. However, we intend to include floating point relative performance in future updates.

The integer IPC improvement in the core design is in line with past generations: 15 percent for the “Rome” Epyc 7002 over the “Naples” Epyc 7001, 19 percent from Rome to “Milan” Epyc 7003, and 14 percent from Milan to “Genoa” Epyc 9004. Beyond the cores, process shrinks, adjustments to the L3 cache per core (the “c” cores get 2 MB of L3 cache per core versus 4 MB for the standard cores), and refinements in chiplet design have allowed AMD to broaden its SKU stack significantly. With Turin, AMD currently offers 27 different chips, while Intel’s Granite Rapids P-core and “Sierra Forest” E-core Xeon 6 lines together comprise only twelve SKUs.
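Because each uplift applies on top of the previous one, the generational IPC gains multiply rather than add. A small sketch, using only the percentages quoted in this article (the dictionary keys are just labels):

```python
from math import prod

# Generation-over-generation integer IPC uplifts quoted in the article.
ipc_uplifts = {
    "Naples -> Rome": 0.15,
    "Rome -> Milan": 0.19,
    "Milan -> Genoa": 0.14,
    "Genoa -> Turin": 0.17,
}

# Compound the uplifts: (1 + u1) * (1 + u2) * ...
cumulative = prod(1.0 + uplift for uplift in ipc_uplifts.values())
print(f"Cumulative integer IPC, Turin vs Naples: {cumulative:.2f}X")
```

On these figures a Turin core does roughly 1.8X the integer work per clock of a Naples core, before core counts and clock speeds are taken into account.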

This situation reflects a changed landscape for Intel. As we look ahead, Intel plans to introduce additional low-end SKUs for both Granite Rapids and Sierra Forest in early 2025. Meanwhile, AMD is likely to roll out some telco and edge variations for Turin, along with 3D V-Cache Turin-X processors, potentially balancing the playing field.

The Turin processors signify a progression from Genoa, a necessary development since both chip families must fit into the same SP5 server socket. Major innovations typically require new sockets, and server purchasers and designers prefer to utilize a socket across at least two generations.

With the Turin chips, AMD is fabricating the cores on TSMC’s 3-nanometer process, while the I/O and memory die is produced on a 4-nanometer process. This is a significant shrink from the 5-nanometer process used for the Genoa cores and the 6-nanometer process used for the Genoa I/O and memory die.

The following table illustrates the evolution across five generations of products that utilize the standard Zen cores, excluding the “c” variants:

The core complex dies (CCDs) in the standard Turin parts consist of eight cores with 32 MB of L3 cache shared among them, as in the Milan and Genoa designs. Thanks to process shrinks, from 7 nanometers in Milan to 5 nanometers in Genoa and now to 3 nanometers in Turin, AMD has managed to fit 16 core chiplets plus the I/O die within a single package. This has doubled the maximum core count, from 64 in Milan to 128 in Turin.

Proportionately, the L3 cache has grown to 512 MB in Turin, and the device offers twelve DDR5 memory channels, as Genoa does. However, the memory in Turin runs at 6,400 MT/s (DDR5-6400) against 4,800 MT/s in Genoa, a 33 percent boost in speed that raises memory bandwidth per socket by the same percentage. This is in line with the 33 percent rise in core count compared to Genoa, from 96 to 128. Both the Genoa and Turin architectures come equipped with either 128 or 160 lanes of PCI-Express 5.0 I/O, a requirement dictated by the SP5 socket.
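Peak theoretical memory bandwidth per socket is channels times transfer rate times bytes per transfer, so the uplift can be checked directly. This is a rough sketch: the function name is hypothetical, the 4,800 MT/s Genoa rate and the 64-bit (8-byte) DDR5 channel data width are assumptions not stated in the article, and real sustained bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_sec: int,
                       bytes_per_transfer: int = 8) -> float:
    """Peak theoretical bandwidth in GB/s (decimal), assuming 64-bit channels."""
    return channels * mega_transfers_per_sec * bytes_per_transfer / 1000.0


genoa_bw = peak_bandwidth_gbs(channels=12, mega_transfers_per_sec=4800)
turin_bw = peak_bandwidth_gbs(channels=12, mega_transfers_per_sec=6400)

bandwidth_uplift = turin_bw / genoa_bw - 1.0
core_uplift = 128 / 96 - 1.0  # top-bin Zen 5 Turin vs top-bin Genoa

print(f"Genoa: {genoa_bw:.1f} GB/s, Turin: {turin_bw:.1f} GB/s")
print(f"Bandwidth uplift: {bandwidth_uplift:.0%}, core uplift: {core_uplift:.0%}")
```

Because bandwidth and core count rise by the same ratio, peak bandwidth per core stays flat between the top-bin Genoa and top-bin Zen 5 Turin parts.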

Today, two versions of the Turin CPUs were unveiled, showcasing not only distinct cores but also differing CCDs and their configurations aimed at tackling various workloads within the datacenter.

The “scale up” Turins, built from Zen 5 CCDs, feature a total of sixteen CCDs, each with eight Zen 5 cores, for 128 cores and 256 threads per socket. The “scale out” Turins, much like the “Bergamo” parts that sat alongside the standard Genoa chips, comprise twelve Zen 5c CCDs. These models trade away 2 MB of L3 cache per core and use a different CCD layout, which allows sixteen cores per Zen 5c CCD compared to the eight found in the Zen 5 CCDs. While the layouts of the Zen 5 and Zen 5c cores differ, their functionality is the same. This approach contrasts sharply with Intel’s strategy with Granite Rapids and Sierra Forest, which use different types of cores: a standard Xeon core known as a P-core in the former, and a distinctly different, Atom-derived core termed an E-core in the latter. The significance of these choices will ultimately be determined by market response.
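The two package layouts can be tallied with a small sketch. The class and field names are hypothetical; the CCD counts, cores per CCD, and per-core L3 figures come from this article, and two threads per core is assumed for both core types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurinPackage:
    """Top-bin package layout, using the figures quoted in the article."""
    name: str
    ccds: int
    cores_per_ccd: int
    l3_mb_per_core: int
    threads_per_core: int = 2  # assumption: two-way SMT on both core types

    @property
    def cores(self) -> int:
        return self.ccds * self.cores_per_ccd

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def l3_mb(self) -> int:
        return self.cores * self.l3_mb_per_core


zen5 = TurinPackage("Zen 5 (scale up)", ccds=16, cores_per_ccd=8, l3_mb_per_core=4)
zen5c = TurinPackage("Zen 5c (scale out)", ccds=12, cores_per_ccd=16, l3_mb_per_core=2)

for pkg in (zen5, zen5c):
    print(f"{pkg.name}: {pkg.cores} cores, {pkg.threads} threads, {pkg.l3_mb} MB L3")
```

The tally shows the trade directly: the Zen 5c package has half again as many cores but less total L3 cache, which is why cache sensitivity matters when choosing between the two.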

As seen in previous Epyc CPU generations, AMD has developed standard Turin models for two-socket servers, alongside special versions labeled with a P, intended for single-socket setups. These versions come with reasonable price reductions due to a modification in their NUMA circuits. Additionally, there are F variants of the Turins designed for high-performance tasks (with the F denoting frequency enhancements), and we anticipate the arrival of X variants in the future – likely by Q1 2025 when Intel is set to announce new CPUs – that will include additional L3 cache to enhance performance for HPC and certain AI applications sensitive to cache size.

Without further delay, here are the Zen 5 SKUs of Turin that have been released so far:

Here we present the Zen 5c SKUs from Turin, showcasing their enhanced core count, improved throughput, and competitive price/performance ratio:

The advancements AMD has achieved since the introduction of the 45-nanometer Shanghai Opterons in April 2009, during the depths of the Great Recession, are truly noteworthy and deserve recognition.

The Opteron 2387 served as the balanced, middle-tier option in the limited Shanghai lineup, which consisted of just four SKUs. This processor featured four Shanghai cores operating at a frequency of 2.8 GHz, without any boost speed, and included 6 MB of L3 cache, all encapsulated within a neat 75-watt thermal design power. When purchased in 1,000-unit trays, which are the norm in the server market, it was priced at $873 per unit. (And no, you don’t receive a tray for that price; these aren’t potato chips…)

To assess relative performance, we calculate the product of the chip’s clock speed, its core count, and its overall IPC improvement relative to the Shanghai core.

The top-tier Naples part, the Epyc 7601 with 32 cores running at 2.2 GHz, achieved a relative performance of 10.37. At a price of $4,200, that works out to $405 per unit of relative performance. The nearly top-tier Rome Epyc 7742, aimed at high-performance computing (HPC) workloads, has 64 cores running at 2.25 GHz and raised relative performance to 24.40, cutting the cost to $285 per unit of performance. The nearly top-tier 64-core Milan Epyc 7763, running at 2.45 GHz, is rated at 31.61. That gain came from microarchitecture and clock speed improvements rather than from more cores, and the price per unit of performance edged down to $250. The 96-core Genoa Epyc 9654, running at 2.4 GHz, achieved a relative performance of 52.94 at a price of $11,805, or $223 per unit of performance.
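The relative performance method can be sketched in a few lines. The article does not state the cumulative IPC factor of a Naples core over a Shanghai core, so this sketch backs it out of the quoted 10.37 rating; that step, and the function and variable names, are assumptions made for illustration. The later generations then apply the quoted 15, 19, and 14 percent integer IPC uplifts.

```python
BASE_CORES, BASE_GHZ = 4, 2.8  # "Shanghai" Opteron 2387 baseline


def relative_performance(cores: int, ghz: float, ipc_vs_shanghai: float) -> float:
    """Cores x clock x IPC factor, normalized so the Opteron 2387 scores 1.0."""
    return (cores * ghz * ipc_vs_shanghai) / (BASE_CORES * BASE_GHZ)


# Assumption: derive the Naples IPC factor from the quoted 10.37 rating
# of the 32-core, 2.2 GHz Epyc 7601.
naples_ipc = 10.37 * (BASE_CORES * BASE_GHZ) / (32 * 2.2)

# Compound the quoted generation-over-generation integer IPC uplifts.
rome_ipc = naples_ipc * 1.15
milan_ipc = rome_ipc * 1.19
genoa_ipc = milan_ipc * 1.14

ratings = {
    "Epyc 7742 (Rome)": relative_performance(64, 2.25, rome_ipc),
    "Epyc 7763 (Milan)": relative_performance(64, 2.45, milan_ipc),
    "Epyc 9654 (Genoa)": relative_performance(96, 2.4, genoa_ipc),
}
for name, rating in ratings.items():
    print(f"{name}: {rating:.2f}")
```

Under these assumptions the computed ratings land within a few hundredths of the quoted 24.40, 31.61, and 52.94, with the small differences coming from rounding in the published IPC percentages.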

It is essential to recognize that while improving performance is generally more straightforward than reducing price per performance, boosting core count is often easier than increasing clock speeds due to thermal limitations.

With the transition to Turin, the leading vanilla Epyc 9755 now features 128 cores operating at 2.7 GHz, resulting in a relative performance of 92.93 at a cost of $12,984. This equates to a significant $140 per unit of performance, showcasing AMD’s notable advancements in performance per cost ratio.
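The price/performance figures are simply list price divided by relative performance. A quick sketch using the three parts whose list prices are quoted in this article (the dictionary layout is only illustrative):

```python
# (list price in US dollars, relative performance) as quoted in the article.
chips = {
    "Epyc 7601 (Naples)": (4_200, 10.37),
    "Epyc 9654 (Genoa)": (11_805, 52.94),
    "Epyc 9755 (Turin)": (12_984, 92.93),
}

# Dollars paid per unit of relative performance for each part.
dollars_per_unit = {name: price / perf for name, (price, perf) in chips.items()}

for name, cost in dollars_per_unit.items():
    print(f"{name}: ${cost:,.0f} per unit of relative performance")
```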

For context, that is a 92.93X increase in performance against a 14.9X increase in price and a 6.7X rise in power consumption relative to the baseline Shanghai Opteron 2387, which works out to 6.25X better performance per dollar over a span of just over fifteen years.

The Zen 5c variants of Turin are intensifying the push for both performance and cost-effectiveness. The Epyc 9965 features a remarkable 192 cores operating at 2.25 GHz, achieving a relative performance score of 116.17 with a price tag of $14,813, resulting in a price/performance ratio of $128 per performance unit. This represents a 25 percent improvement in peak theoretical integer throughput over the Epyc 9755, along with an 8.7 percent enhanced value for the money.
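The two percentages in the Zen 5c comparison follow from the quoted ratings and list prices. A brief sketch of the arithmetic (variable names are illustrative):

```python
# Quoted list price (US dollars) and relative performance for the two top bins.
epyc_9755 = {"price": 12_984, "perf": 92.93}   # Zen 5, 128 cores
epyc_9965 = {"price": 14_813, "perf": 116.17}  # Zen 5c, 192 cores

# Extra theoretical integer throughput of the Zen 5c part.
throughput_gain = epyc_9965["perf"] / epyc_9755["perf"] - 1.0

# Reduction in dollars per unit of relative performance.
cost_9755 = epyc_9755["price"] / epyc_9755["perf"]
cost_9965 = epyc_9965["price"] / epyc_9965["perf"]
cost_reduction = 1.0 - cost_9965 / cost_9755

print(f"Throughput gain: {throughput_gain:.1%}")
print(f"Lower cost per unit of performance: {cost_reduction:.1%}")
```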

It is crucial to assess how sensitive your workloads are to cache before opting for a Zen 5c variant instead of a Zen 5. Furthermore, a meticulous examination of the complete SKU lineups is essential to align workloads with the appropriate SKU. If high serial performance is a priority, expect to pay a premium, as reflected in the provided tables. Similarly, while there is an added cost for higher throughput, this is attributed to yield rates on chiplets, which is a logical and reasonable factor.

We will refrain from a direct comparison between AMD’s Zen 5 and Zen 5c Turin chips and Intel’s Granite Rapids and Sierra Forest chips in this context. However, we believe that a relative comparison within each vendor’s own lineup is quite revealing at this point.

To begin with, the higher-core-count Intel Sierra Forest parts have more cores but offer markedly lower performance, lower prices, and better value for the dollar than the Granite Rapids chips. Specifically, the 144-core Xeon 6780E delivers 24 percent less throughput than the 128-core Xeon 6980P, yet the price/performance ratio for these top-tier products improves by 16 percent. By contrast, as previously mentioned, the Zen 5c Epyc 9965, with its 192 cores, does 25 percent more work while costing 8.7 percent less per unit of performance than the Zen 5 Epyc 9755, which has 128 cores.

This marks a significant shift in approach.

To further elaborate, let’s examine the relative performance increases for Intel from 2009 to 2024. The benchmark server CPU we consider is the 45 nanometer “Nehalem” Xeon E5540, which debuted in March 2009, a period also marked by the Great Recession. This processor features four cores operating at 2.53 GHz, is equipped with 8 MB of L3 cache, consumes 80 watts, and originally retailed for $744 in 1,000-unit trays. Since then, Intel has managed to boost its performance by 62 times when comparing the Xeon E5540 to the highest-end Xeon 6 6980P; however, power consumption has risen to 500 watts, a 6.25 times increase, with price escalating to $17,800, reflecting a 23.9 times increase, while the price/performance ratio has improved by 2.6 times. In contrast, AMD has ramped up performance for standard Turin components by a staggering 92.93 times, with power up by 6.7 times, price increased by 14.9 times, and a price/performance enhancement of 6.25 times.
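The fifteen-year multiples can be cross-checked from the quoted prices and performance gains: the improvement in performance per dollar is the performance multiple divided by the price multiple. A minimal sketch with a hypothetical helper function:

```python
def multiples(perf_gain: float, new_price: float, old_price: float) -> tuple[float, float]:
    """Return (price multiple, performance-per-dollar multiple) vs. the baseline."""
    price_gain = new_price / old_price
    return price_gain, perf_gain / price_gain


# AMD: Epyc 9755 vs. Opteron 2387. Intel: Xeon 6980P vs. Xeon E5540.
amd_price, amd_value = multiples(92.93, 12_984, 873)
intel_price, intel_value = multiples(62.0, 17_800, 744)

print(f"AMD:   price {amd_price:.1f}X, performance per dollar {amd_value:.2f}X")
print(f"Intel: price {intel_price:.1f}X, performance per dollar {intel_value:.2f}X")
```

Note that the two baselines differ (a $873 Opteron versus a $744 Xeon), so the multiples describe each vendor's own trajectory rather than a head-to-head ranking.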

Keep an eye out for an in-depth exploration of the Turin architecture and a competitive analysis.


Timothy, you’ve overlooked a significant opportunity: “AMD TURINS THE SCREWS…”

No, that’s where I began! However, I opted for the rhymes instead.

Excellent analysis! These EPYC Zen Torinos indeed establish a new benchmark regarding the price/performance ratio. If the pricing had remained at the levels of Rome-Milan-Genoa (approximately $250 / Relative Performance), the narrative wouldn’t be as compelling, with a 3.5X price/performance enhancement compared to Intel’s 2.6X. However, Turin truly advances that metric to an astonishing 6.25X, marking a remarkable 15-year span (a significant shift!).

It’s also noteworthy how they boosted the base clock from Milan to Genoa (rising from 2.4 GHz to 3.1 GHz in the 64C model) and further elevated the boost clock from Genoa to Turin (from 3.7 GHz to 5.0 GHz). Frontier’s CPUs utilize the Milan 7713 64C 3rd generation, operating at 2.0 GHz, and I can only speculate how much more impressive performance would be with Turins paired alongside comparable GPUs (faster, more energy-efficient, and less costly).

Looking ahead, we should anticipate MRDIMM, PCIe 6.0, and CXL 3.0!

