AMD adds LLVM support for the "GFX950" GPU, likely for MI325X or the upcoming MI355X accelerator

AMD's latest commits to the LLVM GitHub repository include mentions of a new "GFX950" GPU, likely the firm's recently announced Instinct MI355X or the now-launched MI325X accelerator, per Phoronix. These patches provide early enablement for the accelerator in LLVM, giving the compiler back-end time to mature and improving software compatibility ahead of launch.

LLVM is a collection of modular tools and libraries that optimize code for specific hardware and architectures. It acts as a language-agnostic intermediary between high-level languages and machine code. Going by Team Red's nomenclature, Phoronix suggests that GFX950 is likely the internal codename for the MI325X or the MI355X (part of the MI350 family) - though the latter is more likely, since the MI325X has been available since October.

Looking into the commits, we find that AMD has added support for the "v_prng_b32" instruction, which offers hardware acceleration for pseudo-random number generation, along with new MFMA (Matrix Fused Multiply-Add) instructions for the matrix operations used in machine learning. Additionally, there are mentions of a "V_CVT_F32_BF16" instruction, which by AMD's destination-first naming converts BF16 numbers to the FP32 format, and the LDS (Local Data Share) memory has been increased to 160KB.
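
To illustrate what a BF16-to-FP32 conversion involves, here is a minimal Python sketch of the number format itself. It is not AMD's implementation or the LLVM patch; the function names are hypothetical, and the narrowing direction uses plain truncation as a simplifying assumption (real hardware typically rounds). It relies only on the fact that BF16 is the upper 16 bits of an IEEE-754 FP32 value.

```python
import struct


def bf16_bits_to_fp32(bits: int) -> float:
    """Widen a 16-bit bfloat16 bit pattern to its FP32 value (hypothetical helper)."""
    # BF16 keeps FP32's sign bit, 8 exponent bits, and top 7 mantissa bits,
    # so widening is just a 16-bit left shift with zero-filled low bits.
    word = (bits & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", word))[0]


def fp32_to_bf16_bits(value: float) -> int:
    """Narrow an FP32 value to bfloat16 bits by truncation (assumption: no rounding)."""
    (word,) = struct.unpack("<I", struct.pack("<f", value))
    return word >> 16


if __name__ == "__main__":
    print(bf16_bits_to_fp32(0x3F80))        # 1.0
    print(hex(fp32_to_bf16_bits(3.14159)))  # 0x4049
    print(bf16_bits_to_fp32(0x4049))        # 3.140625
```

The round trip shows why BF16 is popular for machine learning: it preserves FP32's full exponent range while giving up mantissa precision, so 3.14159 comes back as 3.140625.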

It is great to see that AMD is prepping its accelerators for launch, and we should hear more news in the coming months - possibly at CES 2025 - as the official launch window draws near. The MI355X from the MI350 family is fabricated on TSMC's N3 node, boasts 288GB of HBM3E memory, and supports the FP4 and FP6 data types. AMD touts an 80% uplift over the MI325X in FP16 and FP8 computations.

These chips will go neck and neck against Nvidia's Blackwell B300 chips by Q2 or Q3 next year. As it stands, the MI355X is expected to deliver 9.2 PetaFLOPS of FP4 compute performance - on par with Nvidia's B200 offerings. AMD is ahead in terms of memory capacity, featuring 288GB of HBM3E - presumably across eight 12-Hi stacks - which is 50% more than the B200 but rumored to be on par with the B300. However, Blackwell's debut has been marred by purported overheating issues and a design flaw, which could push volume B200 supply to Q1/Q2 2025.
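
The memory figures above can be sanity-checked with a little arithmetic. The sketch below uses only numbers quoted in the article (288GB, eight stacks, 12-Hi, 50% more than the B200); the per-stack, per-die, and B200 values it prints are derived from those figures rather than taken from any official spec sheet, and the function names are illustrative.

```python
MI355X_HBM_GB = 288   # total HBM3E capacity quoted in the article
STACKS = 8            # the article's presumed stack count
DIES_PER_STACK = 12   # "12-Hi" means twelve DRAM dies per stack
LEAD_OVER_B200 = 0.5  # "50% more than the B200"


def capacity_per_stack(total_gb: float, stacks: int) -> float:
    """Capacity each HBM stack must provide to reach the total."""
    return total_gb / stacks


def capacity_per_die(stack_gb: float, dies: int) -> float:
    """Capacity of each DRAM die inside one stack."""
    return stack_gb / dies


def implied_rival_capacity(total_gb: float, lead: float) -> float:
    """Rival capacity implied by a fractional lead (0.5 means 50% more)."""
    return total_gb / (1 + lead)


if __name__ == "__main__":
    per_stack = capacity_per_stack(MI355X_HBM_GB, STACKS)
    per_die = capacity_per_die(per_stack, DIES_PER_STACK)
    b200 = implied_rival_capacity(MI355X_HBM_GB, LEAD_OVER_B200)
    print(per_stack, per_die, b200)  # 36.0 3.0 192.0
```

In other words, the presumed layout works out to 36GB per stack and 3GB (24Gb) per die, and the 50% claim implies a 192GB B200.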
