MiniMax Open-Sources MiniMax M3 Model With 428 Billion Parameters and 1M Token Context

According to Beating, MiniMax has open-sourced the weights of MiniMax M3, a natively multimodal mixture-of-experts (MoE) model, on Hugging Face. The model has 428 billion total parameters, of which 23 billion are activated per token, and natively supports a context window of up to 1 million tokens. The development team also released an MXFP8-quantized version and added support for mainstream inference frameworks, including SGLang, vLLM, and Transformers. In addition, MiniMax open-sourced MiniMax Sparse Attention (MSA), a lightweight kernel library optimized for the NVIDIA Blackwell architecture, which reportedly achieves 9x faster prefill and 15x faster decoding at a 1-million-token context.
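To put the headline figures in perspective, the following is a minimal back-of-the-envelope sketch in Python. It computes the share of parameters active per token and a rough weight footprint for the MXFP8 release. The parameter counts come from the report; the MXFP8 layout (8-bit elements plus one shared 8-bit scale per block of 32 values, as in the OCP Microscaling formats) is an assumption about how the checkpoint is stored, and the function names are illustrative only. The estimate ignores any tensors kept in higher precision, the KV cache, and activations.

```python
# Figures quoted in the report.
TOTAL_PARAMS = 428e9   # total parameters
ACTIVE_PARAMS = 23e9   # parameters activated per token

# Assumption (not stated in the report): MXFP8 stored as 8-bit elements
# with one shared 8-bit scale per block of 32 elements.
ELEMENT_BITS = 8
SCALE_BITS = 8
BLOCK_SIZE = 32


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def weight_bytes(params: float,
                 element_bits: int = ELEMENT_BITS,
                 scale_bits: int = SCALE_BITS,
                 block_size: int = BLOCK_SIZE) -> float:
    """Approximate storage for `params` weights in a block-scaled format."""
    bits_per_param = element_bits + scale_bits / block_size  # 8.25 bits here
    return params * bits_per_param / 8


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    size_gb = weight_bytes(TOTAL_PARAMS) / 1e9
    print(f"Active per token: {frac:.1%}")          # about 5.4%
    print(f"Approx. MXFP8 weights: {size_gb:.0f} GB")  # about 441 GB
```

Under these assumptions, roughly one parameter in nineteen is used per token, while all 428 billion must still be held in memory, which is why the quantized release matters for deployment.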