Google Cloud A4X Max Bare Metal Instances Support 50k GPU Clusters, Network Bandwidth Doubles

ME News Updates, April 19 (UTC+8): Google Cloud announced that its A4X Max bare-metal instance supports clusters of up to 50,000 GPUs, with twice the network bandwidth of the previous generation. The instance belongs to the Google Compute Engine accelerator-optimized machine family, whose machine series come with NVIDIA GPUs pre-installed and are designed for AI, machine learning, high-performance computing, and graphics-intensive applications. The documentation covers multiple machine series, including A4X Max, A4X, A4, A3, A2, G4, and G2, and recommends specific series by workload type: pre-training, fine-tuning, inference, graphics, and high-performance computing. It also explains how pricing is based on the pre-installed GPUs, vCPUs, memory, and local SSDs, lists the consumption options (on-demand, Spot, Flex-start, and reserved), and describes how the maintenance experience differs across machine types. (Source: InfoQ)
Comment
NoMoreRugs
· 1h ago
Local SSD pricing is finally a bit more transparent; it used to be hidden deep.
ZkSketcher
· 1h ago
With coverage from the G2 to the A4X series, small and medium-sized enterprises can benefit too.
GateUser-6fd3205e
· 1h ago
Spot instances for AI training? Interrupt once and start over, it's exhausting.
ChecksumSmile
· 1h ago
A4X is recommended for pre-training and G2 for inference; the breakdown is quite detailed.
FeeTaker
· 1h ago
What’s the new trick with Flex-start? A hybrid of on-demand and reserved?
RugProofRita
· 1h ago
Can Kubernetes handle the scheduling complexity of a 50k-card cluster?
MistValleyFront
· 1h ago
Google Cloud is clearly competing with AWS Trainium in this move.
DaoBackbencher
· 1h ago
The phrase "maintenance experience difference" is quite subtle; does it mean that some will fail or crash?