The Texas Advanced Computing Center (TACC) unveiled its latest Stampede supercomputer for open science research projects, Stampede3. TACC anticipates that Stampede3 will come online this fall and will deliver its full performance in early 2024. The supercomputer will be a crucial component of the U.S. National Science Foundation’s (NSF) ACCESS scientific supercomputing ecosystem, and it is projected to serve the open science community from 2024 until 2029.

The third-generation Stampede cluster, which will be built by Dell, will incorporate 560 nodes equipped with Intel's Sapphire Rapids generation Xeon CPU Max processors, each offering 56 CPU cores and 64GB of on-package HBM2E memory. Surprisingly, TACC is going to operate these nodes in HBM-only mode, so no additional DRAM will be attached to the CPU nodes; all of their memory will come from the on-package HBM stacks.

With these specifications, Stampede3 is expected to have a peak performance of approximately 4 FP64 PetaFLOPS, while offering nearly 63,000 general-purpose cores. In addition, TACC plans to install 10 Dell PowerEdge XE9640 servers with 40 Intel Data Center GPU Max compute GPUs for artificial intelligence and machine learning workloads.
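
As a rough sanity check, the core count and peak FP64 figure quoted above can be reproduced with simple arithmetic. The Python sketch below is illustrative only. The node count, cores per processor, and HBM capacity come from the article; the two-sockets-per-node layout, the 2.0 GHz sustained clock, and the 32 FP64 FLOPs per core per cycle are assumptions chosen to match the article's totals, not figures confirmed by TACC or Intel.

```python
# Back-of-the-envelope check of the figures for the new Xeon Max nodes.
# From the article: 560 nodes, 56 cores and 64 GB of HBM2E per processor.
# ASSUMPTIONS (not stated in the article):
#   - two processors (sockets) per node
#   - a sustained clock of 2.0 GHz under vector load
#   - 32 FP64 FLOPs per core per cycle
#     (2 AVX-512 FMA units x 8 doubles x 2 operations per FMA)

NODES = 560
SOCKETS_PER_NODE = 2        # assumption
CORES_PER_SOCKET = 56
HBM_GB_PER_SOCKET = 64
CLOCK_HZ = 2.0e9            # assumption
FP64_FLOPS_PER_CYCLE = 32   # assumption


def total_cores(nodes, sockets_per_node, cores_per_socket):
    """Total CPU cores across all nodes."""
    return nodes * sockets_per_node * cores_per_socket


def total_hbm_tb(nodes, sockets_per_node, gb_per_socket):
    """Aggregate HBM capacity in decimal terabytes."""
    return nodes * sockets_per_node * gb_per_socket / 1000


def peak_pflops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak in PetaFLOPS: cores x clock x FLOPs per cycle."""
    return cores * clock_hz * flops_per_cycle / 1e15


if __name__ == "__main__":
    cores = total_cores(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET)
    hbm = total_hbm_tb(NODES, SOCKETS_PER_NODE, HBM_GB_PER_SOCKET)
    peak = peak_pflops(cores, CLOCK_HZ, FP64_FLOPS_PER_CYCLE)
    print(f"cores: {cores:,}")
    print(f"HBM:   {hbm:.2f} TB")
    print(f"peak:  {peak:.2f} PFLOPS (FP64, theoretical)")
```

Under these assumptions the script yields 62,720 cores and roughly 4.0 PetaFLOPS, in line with the "nearly 63,000" cores and "approximately 4" PetaFLOPS cited above. Real sustained clocks under heavy vector load may be lower, so the peak figure should be read as an upper bound.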

Given this layout, the bulk of Stampede3's compute performance will be supplied by CPUs. This makes Stampede3 a bit of a rarity in this day and age: most high-performance systems are GPU-driven, leaving Stampede3 as one of the last supercomputers that relies almost solely on general-purpose CPUs.

And while the current cluster is primarily focused on CPU performance, TACC will also use the Intel GPUs in the latest Stampede revamp to investigate how to incorporate larger numbers of GPUs into future versions of the system. For now, most of TACC's AI tasks run on its Lone Star systems, which are powered by hundreds of Nvidia A100 compute GPUs. The organization's aim is to explore whether a portion of this workload can be transferred to Intel's Ponte Vecchio.

"We are going to put in a small system with exploratory capability using Intel Ponte Vecchio," said Dan Stanzione, executive director of TACC. "We are still negotiating exactly how much of that [we] will have, but I would say a minimum of 40 nodes and maximum of a hundred or so. […] We are just putting a couple of racks of Ponte Vecchio out there to see how people work with it."

Stampede3 will use 400 Gb/s Omni-Path Fabric technology, which will enable a backplane bandwidth of 24 TB/s. This setup should allow the machine to scale efficiently and keep latencies low, making it well suited to a wide range of simulation workloads.

TACC also plans to reincorporate nodes from the previous version, Stampede2, which were based on older-generation Xeon Scalable CPUs. This integration will enhance the capacity of Stampede3 for high-memory applications, high-throughput computing, interactive workloads, and other previous-generation applications. In total, the new supercomputer system will feature 1,858 compute nodes with over 140,000 cores, more than 330 TB of RAM, new storage capacity of 13 PB, and a peak performance close to 10 PetaFLOPS.

Sources: TACC, HPCWire

5 Comments

  • brucethemoose - Wednesday, July 26, 2023

    This is similar to Fujitsu's A64FX supercomputer, which some US government researchers spoke highly of.
  • jjjag - Wednesday, July 26, 2023

    Consider this: Power estimates of the Stampede3 put it at somewhere around 30-50MW. An entire power plant puts out 500MW. So this thing takes 1/10 of a power plant (about 1/20 of a nuclear reactor but we don't build those any more). Meanwhile, people in Texas are suffering brown-outs and black-outs because their whole infrastructure is broken. During the famous winter storm of 2021 when people did not have power, Texas did not even make the school shut down the existing supercomputers! They called it "a critical datacenter" and left them all running. Now consider they have multiple supercomputers running in TX, each one using enough power for 20,000 homes (or more in an emergency). But yeah, let's build bigger and more power hungry supercomputers, not upgrade our power infrastructure, and not even TRY to make these computers more power efficient. Even the most conservative of estimates puts the largest supercomputers passing the 500MW line in about 15 years. A whole power plant for one computer? It's NOT sustainable. Both Academia and Gov't need to pivot from raw computing FLOPS to significantly driving the power efficiency up.
  • Ryan Smith - Wednesday, July 26, 2023

    "Power estimates of the Stampede3 put it at somewhere around 30-50MW"

    You need to check the source of those figures. That's roughly the power budget for Aurora; Stampede3 is a fraction of that size.
  • shelbystripes - Monday, July 31, 2023

    This is a bizarre complaint. Texas’ infrastructure problems are a political problem created by its own corrupt government. It doesn’t matter individually who is using how much power until that is solved, no one person or entity can “voluntarily” solve the problem on their own. Only government can solve the problem, and it’s the broken government that created the problem.

    Peak demand during the Texas 2021 winter storm was 77,000MW. Even if Stampede2 consumed 50MW in the first place (it doesn’t), improving efficiency by 50% wouldn’t make a difference. The power grid still would have collapsed under the conditions it faced.

    Improving efficiency is important, but not at all for the reason you mentioned. You can’t fix regulatory corruption with more efficient computers.
  • tipoo - Friday, July 28, 2023

    Is there any sort of public third party overview of how Ponte Vecchio is performing vs the competition?
