NVIDIA Supercharges AI Computing Platform with Introduction of NVIDIA HGX H200
NVIDIA has announced the launch of the NVIDIA HGX H200, which enhances the world's leading AI computing platform. Built on the NVIDIA Hopper architecture, this platform features the NVIDIA H200 Tensor Core GPU with advanced memory capabilities to handle large amounts of data for generative AI and high-performance computing workloads.
The NVIDIA H200 is the first GPU to offer HBM3e, a faster and larger memory solution that fuels the acceleration of generative AI and large language models. It also advances scientific computing for HPC workloads. With HBM3e, the NVIDIA H200 provides 141 GB of memory at 4.8 terabytes per second, offering nearly double the capacity and 2.4 times more bandwidth compared to its predecessor, the NVIDIA A100. Systems powered by the H200 from leading server manufacturers and cloud service providers are expected to be available in the second quarter of 2024.
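The capacity and bandwidth comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not an official benchmark: the H200 figures come from the announcement, while the A100 figures (80 GB, roughly 2.0 TB/s) are an assumption based on the 80 GB A100 variant and are not stated in this article. The "full sweep" time is a rough illustration of how long it would take to read the entire memory once at peak bandwidth.

```python
# H200 figures as stated in the announcement.
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBPS = 4.8

# ASSUMPTION: A100 80 GB variant at roughly 2.0 TB/s (rounded).
# These two values are not given in the announcement.
A100_MEMORY_GB = 80
A100_BANDWIDTH_TBPS = 2.0

# Ratios behind "nearly double the capacity" and "2.4 times more bandwidth".
capacity_ratio = H200_MEMORY_GB / A100_MEMORY_GB
bandwidth_ratio = H200_BANDWIDTH_TBPS / A100_BANDWIDTH_TBPS

# Time to stream the whole 141 GB once at peak bandwidth, in milliseconds
# (decimal units: 1 TB = 1000 GB). Real workloads will not reach peak.
full_sweep_ms = H200_MEMORY_GB / (H200_BANDWIDTH_TBPS * 1000) * 1000

print(f"capacity ratio:  {capacity_ratio:.2f}x")
print(f"bandwidth ratio: {bandwidth_ratio:.2f}x")
print(f"full memory sweep at peak bandwidth: {full_sweep_ms:.1f} ms")
```

Under these assumed A100 figures, the capacity ratio comes out a little under 2 and the bandwidth ratio at 2.4, which matches the wording of the announcement.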
"To create intelligence with generative AI and HPC applications, vast amounts of data must be efficiently processed at high speed using large, fast GPU memory," said Ian Buck, vice president of hyperscale and HPC at NVIDIA. "With NVIDIA H200, the industry's leading end-to-end AI supercomputing platform just got faster to solve some of the world's most important challenges."
Perpetual Innovation, Perpetual Performance Leaps
The NVIDIA Hopper architecture delivers a significant performance leap over its predecessor and continues to raise the bar through ongoing software enhancements for the H100, including the recent release of powerful open-source libraries such as NVIDIA TensorRT-LLM. The introduction of the H200 will lead to further performance gains, including nearly double the inference speed of the H100 on Llama 2, a 70 billion-parameter LLM. Additional performance improvements are expected with future software updates.
NVIDIA H200 Form Factors
The NVIDIA H200 will be available in NVIDIA HGX H200 server boards with four- and eight-way configurations, compatible with both the hardware and software of HGX H100 systems. It is also available in the NVIDIA GH200 Grace Hopper Superchip with HBM3e, which was announced in August. The H200 can be deployed in various types of data centers, including on-premises, cloud, hybrid-cloud, and edge. NVIDIA's global ecosystem of partner server makers, including ASRock Rack, ASUS, Dell Technologies, Eviden, GIGABYTE, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT, Supermicro, Wistron, and Wiwynn, can update their existing systems with the H200.
Cloud service providers such as Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, as well as CoreWeave, Lambda, and Vultr, will be among the first to deploy H200-based instances starting in 2024.
Powered by NVIDIA NVLink and NVSwitch high-speed interconnects, HGX H200 provides the highest performance on various application workloads, including LLM training and inference for the largest models beyond 175 billion parameters. An eight-way HGX H200 offers over 32 petaflops of FP8 deep learning compute and 1.1 TB of aggregate high-bandwidth memory for the highest performance in generative AI and HPC applications.
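The eight-way totals follow from the per-GPU figures quoted earlier. The sketch below works through that arithmetic. It assumes decimal units (1 TB = 1000 GB), treats "over 32 petaflops" as a lower bound, and uses a simplified weight-footprint estimate of one byte per parameter at FP8 that ignores activations, KV cache, and framework overhead. The variable names are illustrative only.

```python
import math

# Per-GPU and per-board figures taken from the announcement.
GPUS_PER_BOARD = 8
MEMORY_PER_GPU_GB = 141
BANDWIDTH_PER_GPU_TBPS = 4.8
BOARD_FP8_PFLOPS = 32  # stated as "over 32", so treat as a lower bound

# Aggregate memory and bandwidth are simple sums over the eight GPUs.
aggregate_memory_tb = GPUS_PER_BOARD * MEMORY_PER_GPU_GB / 1000
aggregate_bandwidth_tbps = GPUS_PER_BOARD * BANDWIDTH_PER_GPU_TBPS
fp8_pflops_per_gpu = BOARD_FP8_PFLOPS / GPUS_PER_BOARD

# ASSUMPTION: 1 byte per parameter at FP8, weights only.
PARAMS = 175e9
weights_gb = PARAMS * 1 / 1e9
min_gpus_for_weights = math.ceil(weights_gb / MEMORY_PER_GPU_GB)

print(f"aggregate memory:    {aggregate_memory_tb:.3f} TB")
print(f"aggregate bandwidth: {aggregate_bandwidth_tbps:.1f} TB/s")
print(f"FP8 compute per GPU: at least {fp8_pflops_per_gpu:.1f} petaflops")
print(f"175B weights at FP8: {weights_gb:.0f} GB, "
      f"needs at least {min_gpus_for_weights} GPUs for weights alone")
```

Eight GPUs at 141 GB each give 1,128 GB, which rounds to the 1.1 TB quoted above. Under the one-byte-per-parameter assumption, the weights of a 175 billion-parameter model would occupy a small fraction of that total, leaving the remainder for activations and caches.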
When combined with NVIDIA Grace CPUs and an ultra-fast NVLink-C2C interconnect, the H200 creates the GH200 Grace Hopper Superchip with HBM3e, an integrated module designed for giant-scale HPC and AI applications.
Accelerate AI With NVIDIA Full-Stack Software
NVIDIA's accelerated computing platform is supported by powerful software tools that enable developers and enterprises to build and accelerate production-ready applications from AI to HPC. This includes the NVIDIA AI Enterprise suite of software for workloads such as speech, recommender systems, and hyperscale inference.
Availability
The NVIDIA H200 will be available from global system manufacturers and cloud service providers starting in the second quarter of 2024.