Nvidia Nemotron 3 Ultra fastest U.S. open model; trails China
Nvidia unveiled Nemotron 3 Ultra, a 550B-parameter open-weight model that produces 300+ output tokens/sec and scored 48 on Artificial Analysis’ intelligence index, behind Kimi K2.6 at 54.
Nvidia unveiled Nemotron 3 Ultra at Computex in Taipei on June 1. The model has 550 billion parameters and uses a mixture-of-experts design that activates about 55 billion parameters at runtime. A pre-release DeepInfra endpoint produced more than 300 output tokens per second. Independent evaluator Artificial Analysis scored the model 48 on its intelligence index; Moonshot AI’s Kimi K2.6 scored 54.
The mixture-of-experts design routes each token to a subset of the model’s experts, so only about 55 billion of the 550 billion parameters are active for any given token, which reduces compute relative to the full parameter count. Nemotron 3 Ultra combines Mamba-2 layers, standard Transformer attention and mixture-of-experts routing. The model supports a one-million-token context window and uses multi-token prediction, which predicts several future tokens at once to speed generation.
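As a rough illustration of how mixture-of-experts routing keeps most parameters idle, here is a minimal top-k routing sketch. It is not Nvidia's implementation: the expert count, the `top_k` value, the scalar toy experts and the hard-coded router scores are all assumptions made for the example. In a real model the router scores come from a learned layer and each expert is a neural network. Only the 550 billion and 55 billion figures come from the article.

```python
import math

# Figures from the article; everything else below is a toy assumption.
TOTAL_PARAMS_B = 550
ACTIVE_PARAMS_B = 55


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k):
    """Pick the top_k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_logits, top_k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](token) for i, w in route_token(router_logits, top_k))


# Hypothetical toy experts: expert n multiplies its input by n + 1.
experts = [lambda x, scale=n + 1: scale * x for n in range(10)]
# Hypothetical router scores for one token (normally produced by a learned layer).
logits = [0.0, 3.0, 0.5, -1.0, 0.2, 0.1, -0.3, 1.0, 0.0, 0.4]

print(route_token(logits, top_k=2))               # experts 1 and 7 are selected
print(moe_layer(3.0, experts, logits, top_k=1))   # only expert 1 runs
print(ACTIVE_PARAMS_B / TOTAL_PARAMS_B)           # share of parameters active per token
```

With one of ten toy experts active, the sketch mirrors the article's ratio of about 55 billion active parameters out of 550 billion. The real ratio also depends on layers that every token passes through, which this toy ignores.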
Nvidia plans to publish all model weights and release the training recipes. The company positions Nemotron 3 Ultra for datacenter deployment; it will also be accessible through Nvidia’s API and cloud providers. Nvidia scheduled general availability for June 4.
Nvidia said the design delivers inference roughly five times faster and costs about 30 percent lower than comparable open-weight alternatives. On the pre-release DeepInfra endpoint, Nemotron 3 Ultra served more than 300 output tokens per second. Comparable open models such as DeepSeek V4 Pro and Kimi K2.6 are currently served at about 50 to 100 tokens per second through commercial APIs.
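To put the throughput figures in perspective, the short calculation below compares how long a response would take at each quoted rate. The 2,000-token response length is a hypothetical value chosen for illustration, and 300 tokens per second is treated as a lower bound because the article says "more than 300". The sketch ignores time to first token and network latency.

```python
def generation_seconds(num_tokens, tokens_per_second):
    """Time to stream num_tokens at a steady output rate."""
    return num_tokens / tokens_per_second


def speedup(fast_tps, slow_tps):
    """How many times faster the first rate is than the second."""
    return fast_tps / slow_tps


NEMOTRON_TPS = 300           # lower bound quoted for the DeepInfra endpoint
COMPARABLE_TPS = (50, 100)   # range quoted for comparable open models
RESPONSE_TOKENS = 2000       # hypothetical response length

print(f"Nemotron 3 Ultra: {generation_seconds(RESPONSE_TOKENS, NEMOTRON_TPS):.1f} s")
for tps in COMPARABLE_TPS:
    seconds = generation_seconds(RESPONSE_TOKENS, tps)
    ratio = speedup(NEMOTRON_TPS, tps)
    print(f"At {tps} tokens/sec: {seconds:.1f} s ({ratio:.0f}x slower)")
```

On these numbers the measured gap is about three to six times, which brackets Nvidia's "roughly five times" claim.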
Artificial Analysis’ index aggregates tests of reasoning, coding, general knowledge and agentic performance. On that index, Kimi K2.6 scored 54, ahead of Nemotron 3 Ultra at 48. Among U.S. open-weight models, Nemotron 3 Ultra leads: Google’s Gemma 4 31B scored 39, Nemotron 3 Super scored 36 and gpt-oss-120b scored 33 on the same index.
Nemotron 3 is offered in three sizes: Nano for lightweight tasks, Super for mid-range enterprise use and Ultra for complex reasoning. Nvidia reports the Nemotron 3 models were post-trained with reinforcement learning across interactive environments to improve planning and multi-step execution.
Nvidia formed a Nemotron Coalition of eight AI labs, including Mistral AI and Perplexity, and disclosed a five-year plan to invest $26 billion in open-weight AI development. The company also said work on Nemotron 4 is already underway.
Industry measurements show open-weight models from Chinese labs increased their share of global open-model usage from about 1.2 percent in late 2024 to roughly 30 percent by the end of 2025.