The Aurora Supercomputer Is Installed: 2 ExaFLOPS, Tens of Thousands of CPUs and GPUs
by Anton Shilov on June 22, 2023 12:15 PM EST
Argonne National Laboratory and Intel said on Thursday that they had installed all 10,624 blades for the Aurora supercomputer, a machine announced back in 2015 with a particularly bumpy history. The system promises to deliver a peak theoretical compute performance over 2 FP64 ExaFLOPS using its array of tens of thousands of Xeon Max 'Sapphire Rapids' CPUs with on-package HBM2E memory as well as Data Center GPU Max 'Ponte Vecchio' compute GPUs. The system will come online later this year.
"Aurora is the first deployment of Intel's Max Series GPU, the biggest Xeon Max CPU-based system, and the largest GPU cluster in the world," said Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group.
The Aurora supercomputer looks impressive on the numbers alone. The machine is powered by 21,248 general-purpose processors with over 1.1 million cores for workloads that require traditional CPU horsepower and 63,744 compute GPUs that will serve AI and HPC workloads. On the memory side, the CPUs use 1.36 PB of on-package HBM2E memory and 19.9 PB of DDR5 memory, while the Ponte Vecchio compute GPUs carry 8.16 PB of HBM2E.
The Aurora machine uses 166 racks that house 64 blades each. It spans eight rows and occupies a space equivalent to two basketball courts. That footprint does not include Aurora's storage subsystem, which employs 1,024 all-flash storage nodes offering 220 PB of storage capacity and a total bandwidth of 31 TB/s. For now, Argonne National Laboratory has not published official power consumption numbers for Aurora or its storage subsystem.
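The totals quoted above can be cross-checked with some quick arithmetic. The following is a minimal sketch that uses only the figures from this article and assumes decimal units (1 PB = 10^15 bytes); the variable names are illustrative, and the per-unit values are derived averages rather than official specifications.

```python
# Figures quoted in the article. Decimal units are an assumption:
# 1 PB = 1e15 bytes, 1 TB = 1e12 bytes, 1 GB = 1e9 bytes.
RACKS = 166
BLADES_PER_RACK = 64
CPUS = 21_248
GPUS = 63_744
CPU_HBM_PB = 1.36      # on-package HBM2E across all CPUs
GPU_HBM_PB = 8.16      # HBM2E across all GPUs
STORAGE_NODES = 1_024
STORAGE_PB = 220
STORAGE_TB_PER_S = 31

# Blade count and per-blade composition.
blades = RACKS * BLADES_PER_RACK
cpus_per_blade = CPUS / blades
gpus_per_blade = GPUS / blades

# Average HBM2E per processor, in GB.
hbm_per_cpu_gb = CPU_HBM_PB * 1e15 / CPUS / 1e9
hbm_per_gpu_gb = GPU_HBM_PB * 1e15 / GPUS / 1e9

# Average capacity and bandwidth per storage node.
storage_per_node_tb = STORAGE_PB * 1e3 / STORAGE_NODES
bandwidth_per_node_gb_s = STORAGE_TB_PER_S * 1e3 / STORAGE_NODES

print(f"blades:                 {blades}")
print(f"CPUs per blade:         {cpus_per_blade:g}")
print(f"GPUs per blade:         {gpus_per_blade:g}")
print(f"HBM2E per CPU:          {hbm_per_cpu_gb:.1f} GB")
print(f"HBM2E per GPU:          {hbm_per_gpu_gb:.1f} GB")
print(f"storage per node:       {storage_per_node_tb:.1f} TB")
print(f"bandwidth per node:     {bandwidth_per_node_gb_s:.1f} GB/s")
```

The rack math reproduces the 10,624 blades mentioned at the top of the article, and the processor counts divide evenly into two CPUs and six GPUs per blade.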
The supercomputer, which will be used for a wide variety of workloads from nuclear fusion simulations to weather prediction and from aerodynamics to medical research, uses HPE's Shasta supercomputer architecture with Slingshot interconnects. Before the system passes ANL's acceptance tests, it will be used to train large-scale generative AI models for science.
"While we work toward acceptance testing, we are going to be using Aurora to train some large-scale open-source generative AI models for science," said Rick Stevens, Argonne National Laboratory associate laboratory director. "Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all-solid-state mass storage system, is the perfect environment to train these models."
Even though Aurora's blades have been installed, the supercomputer still has to pass a series of acceptance tests, a common procedure for supercomputers. Once it clears these and comes online later in the year, it is projected to attain a theoretical performance exceeding 2 ExaFLOPS (two billion billion floating point operations per second). With that level of performance, it is expected to secure the top position in the Top500 list.
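To put "two billion billion" in perspective, the short sketch below spreads the 2 ExaFLOPS peak figure across the blade and GPU counts quoted earlier in the article. It assumes, purely for illustration, that the entire peak figure is attributed to the GPUs, which ignores the CPUs' contribution, so the per-GPU value is a rough upper bound and not an Intel specification.

```python
# "Two billion billion" floating point operations per second.
BILLION = 10**9
peak_flops = 2 * BILLION * BILLION   # 2 ExaFLOPS = 2e18 FLOPS

# Counts quoted in the article.
BLADES = 10_624
GPUS = 63_744

TERA = 10**12

# Average peak per blade, in TFLOPS.
per_blade_tflops = peak_flops / BLADES / TERA

# Assumption: all of the peak is attributed to the GPUs (CPUs ignored),
# so this is an upper bound on the per-GPU share, not a datasheet value.
per_gpu_tflops = peak_flops / GPUS / TERA

print(f"peak:      {peak_flops:.0e} FLOPS")
print(f"per blade: {per_blade_tflops:.1f} TFLOPS")
print(f"per GPU:   {per_gpu_tflops:.1f} TFLOPS (upper bound)")
```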
The installation of the Aurora supercomputer marks several milestones: it is the industry's first supercomputer with performance higher than 2 ExaFLOPS and the first Intel-based ExaFLOPS-class machine. Finally, it marks the conclusion of the Aurora saga that began eight years ago, a journey that has seen its fair share of bumps.
Originally unveiled in 2015, Aurora was initially intended to be powered by Intel's Xeon Phi co-processors and was projected to deliver approximately 180 PetaFLOPS in 2018. However, Intel decided to abandon the Xeon Phi in favor of compute GPUs, resulting in the need to renegotiate the agreement with Argonne National Laboratory to provide an ExaFLOPS system by 2021.
The delivery of the system was further delayed by complications with the compute tile of Ponte Vecchio, caused by the delay of Intel's 7 nm (now known as Intel 4) production node and the need to redesign the tile for TSMC's N5 (5 nm-class) process technology. Intel finally introduced its Data Center GPU Max products late last year and has now shipped over 60,000 of these compute GPUs to ANL.
Source: Intel
39 Comments
fallaha56 - Thursday, June 22, 2023
Sure but what’s the power consumption? Twice that of El Capitan…
ballsystemlord - Thursday, June 22, 2023
Yes. Notice the lack of even an unofficial figure there.
Samus - Thursday, June 22, 2023
Argonne has it's own power plant funded by the DoE and UofC, they can afford whatever it is with our tax dollars :)
PeachNCream - Friday, June 23, 2023
That is until everyone is evading taxes because they can't afford to pay their medical bills and also cover other essential exepenses. Then again, it's not like we all needed that new SUV or pickup truck that gets 9 MPG we're up to our eyeballs in debt over or the three crotch goblin children we decided to have.
Why998 - Monday, June 26, 2023
You have got to be one of the most miserable people on this planet
Vendicar - Tuesday, June 27, 2023
Don't like the truth?
Mdarrish - Wednesday, July 5, 2023
Actually that would be his spouse.
Vendicar - Tuesday, June 27, 2023
Who cares? Only the inferior classes can't afford to maintain their own health and demand that they be permitted to further parasite themselves upon the superior class.
Oxford Guy - Wednesday, June 28, 2023
Parasitize.
Samus - Friday, June 23, 2023
Actually, scratch that, they had to sign a contract with Constellation last year in order to bring Aurora online. Their CHP (Combined Heat Power) plant doesn't produce enough electricity for the Advanced Photon Source synchrotron ring AND the Aurora as it barely ran the accelerator and terabyte storage facility this replaced, not to mention the rest of the campus. Argonne has always received supplemental power from the grid and as a form of redundancy.