MiniCPM5-1B brings 128K on-device context to phones
OpenBMB released MiniCPM5-1B, a 1-billion-parameter on-device model with native tool calls, Model Context Protocol support and a 128K-token context for local agents.
OpenBMB released MiniCPM5-1B, a one-billion-parameter language model built to run on consumer phones. The model supports native tool calling and the Model Context Protocol (MCP) and has a 128K-token context window. MiniCPM5-1B is available on Hugging Face under an Apache 2.0 license and works with the vLLM and SGLang inference engines as well as the standard Hugging Face Transformers library.
The 128K-token context corresponds to roughly 96,000 words of continuous text, which OpenBMB says enables long roleplay sessions, full-document digests and agent contexts that persist across many exchanges. The release targets local agent workflows in which a phone or laptop calls local tools, queries a calendar, searches a local database or reaches an MCP research server without sending user data to a cloud API.
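The 96,000-word figure follows from a common rule of thumb of about 0.75 English words per token. The short Python sketch below makes that arithmetic explicit. The 0.75 ratio is an assumption, not an OpenBMB figure; the real ratio depends on the tokenizer and on the language of the text.

```python
# ASSUMPTION: ~0.75 words per token, a rule of thumb for English text with
# BPE-style tokenizers; the actual ratio varies by tokenizer and language.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * words_per_token)


def words_to_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a given number of words will consume."""
    return round(words / words_per_token)


if __name__ == "__main__":
    print(tokens_to_words(128_000))  # 96000: "128K" read as 128,000 tokens
    # "128K" can also mean 131,072 (2**17) tokens in some model configs.
    print(tokens_to_words(2**17))    # 98304
```

Either reading of "128K" lands in the same rough range, which is why the figure is best treated as an order-of-magnitude guide.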
The model builds on the MiniCPM4 architecture and introduces InfLLM v2, a trainable sparse attention mechanism under which each token attends to fewer than 5% of the tokens in its context during long-context inference. OpenBMB reported that this design reduces computation on long inputs while maintaining accuracy. Training data was filtered with a pipeline named UltraClean, and the model was trained on roughly 8 trillion tokens.
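To illustrate the general idea behind sparse attention, here is a toy block-sparse sketch in Python: it first scores a cheap summary of each block of keys, keeps only the best-scoring blocks, and then runs ordinary softmax attention over the tokens in those blocks. This is a generic illustration under simplifying assumptions (mean-pooled block summaries, a single query vector, hypothetical function names), not OpenBMB's InfLLM v2 implementation.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def block_sparse_attention(query, keys, values, block_size=4, top_blocks=1):
    """Attend from one query to only the most relevant blocks of tokens."""
    d = len(query)
    blocks = [range(s, min(s + block_size, len(keys)))
              for s in range(0, len(keys), block_size)]
    # Step 1: one cheap score per block (query vs. mean key), not per token.
    block_scores = [dot(query, mean_vector([keys[i] for i in b])) for b in blocks]
    chosen = sorted(range(len(blocks)), key=lambda b: block_scores[b],
                    reverse=True)[:top_blocks]
    selected = sorted(i for b in chosen for i in blocks[b])
    # Step 2: scaled dot-product attention restricted to the selected tokens.
    scores = [dot(query, keys[i]) / math.sqrt(d) for i in selected]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    out = [0.0] * len(values[0])
    for w, i in zip(exps, selected):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out, selected


if __name__ == "__main__":
    keys = [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 4
    values = [[float(i)] for i in range(8)]
    out, selected = block_sparse_attention([1.0, 0.0], keys, values)
    print(selected, out)  # [4, 5, 6, 7] [5.5]
```

The tiny example attends to half of its eight tokens only to keep the numbers readable; with a long context split into thousands of blocks, keeping a small `top_blocks` is what drives the attended share down to a few percent.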
OpenBMB applied post-training optimization that combined reinforcement learning with efficient distillation from larger models. After these steps, OpenBMB reported a 16-point gain on math, code and instruction-following benchmarks and a 29-percentage-point reduction in runaway-length responses, meaning outputs that keep generating far beyond a useful length.
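The release material does not spell out OpenBMB's distillation recipe. As a generic illustration only, the Python sketch below computes the widely used temperature-scaled distillation loss: the KL divergence between a larger teacher model's softened output distribution and the small student's, multiplied by T^2. The temperature value and function names are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the larger model
    q = softmax(student_logits, temperature)  # predictions of the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss(teacher, teacher))          # 0.0: student matches teacher
    print(distillation_loss([0.1, 1.0, 2.0], teacher))  # positive: distributions differ
```

In typical setups this term is combined with the ordinary next-token loss on real data; how OpenBMB weighted its distillation and reinforcement-learning objectives is not stated in the source.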
In internal and independent evaluations, MiniCPM5-1B scored an average of 42.57 on a capability benchmark that covers general knowledge, domain knowledge, coding, instruction-following, math reasoning, logical reasoning and agentic tasks. The next-best 1B-class competitor scored 35.61 on the same benchmark. OpenBMB lists competitors at this scale including Alibaba’s Qwen3-0.6B and Qwen3.5-0.8B and Liquid AI’s LFM2.5-1.2B-Thinking.
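The headline comparison can be read two ways: as an absolute gap in benchmark points or as a margin relative to the runner-up's score. The Python snippet below works out both from the two reported averages and uses no data beyond those figures.

```python
# Reported average scores on OpenBMB's capability benchmark.
minicpm5_1b = 42.57
next_best_1b = 35.61

# Absolute lead in benchmark points.
absolute_gap = round(minicpm5_1b - next_best_1b, 2)
# Lead expressed as a percentage of the runner-up's score.
relative_gap = round(100 * absolute_gap / next_best_1b, 1)

print(absolute_gap)  # 6.96
print(relative_gap)  # 19.5
```

A lead of about 7 points therefore amounts to a score roughly one-fifth higher than the next-best 1B-class model on this benchmark.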
Tests confirmed the model’s native tool calling and MCP support. When connected to a research MCP server, the model fetched a live Bitcoin price and, through the same tool chain, returned three stock suggestions: Amazon, Microsoft and Nvidia.
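MCP messages use JSON-RPC 2.0, and a client invokes a server tool with a `tools/call` request whose params carry the tool's `name` and `arguments`. The Python sketch below shows how a local agent loop might wrap a tool call emitted by the model into such a request. The model's output format and the `get_crypto_price` tool are hypothetical: MiniCPM5-1B's actual tool-call syntax is defined by its chat template, and a real client would also handle the transport and the server's response.

```python
import json


def tool_call_to_mcp_request(model_output: str, request_id: int) -> dict:
    """Wrap a model-emitted tool call in an MCP `tools/call` JSON-RPC request.

    ASSUMPTION: the model emits a JSON object shaped like
    {"name": ..., "arguments": {...}}; the real format may differ.
    """
    call = json.loads(model_output)
    if not isinstance(call, dict) or not isinstance(call.get("name"), str):
        raise ValueError("tool call must be an object with a string 'name'")
    return {
        "jsonrpc": "2.0",        # MCP is built on JSON-RPC 2.0
        "id": request_id,        # lets the client match the server's reply
        "method": "tools/call",  # MCP method for invoking a server tool
        "params": {
            "name": call["name"],
            "arguments": call.get("arguments", {}),
        },
    }


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    raw = '{"name": "get_crypto_price", "arguments": {"symbol": "BTC"}}'
    print(json.dumps(tool_call_to_mcp_request(raw, request_id=1)))
```

Keeping the model's output and the protocol message separate like this means the same agent loop can target different MCP servers without changing how the model is prompted.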
Quick evaluations also exposed limitations. Given a classic logic-trap prompt about a man marrying his widow’s sister, the model produced a jurisdictional analysis and did not flag the paradox: a man who has a widow is already dead. In a separate A/B decision test, the model hedged instead of choosing a side, a behavior often seen in smaller conversational models.
MiniCPM5-1B is smaller than many modern models: Google’s Gemma 4 starts at an effective 2 billion parameters and scales to 31 billion, and Llama 4 Scout runs with 17 billion active parameters. OpenBMB describes MiniCPM5-1B as built for on-device agent use where long context and local tool integration are prioritized over larger model scale.
The material on GNcrypto is intended solely for informational use and must not be regarded as financial advice. We make every effort to keep the content accurate and current, but we cannot warrant its precision, completeness, or reliability. GNcrypto does not take responsibility for any mistakes, omissions, or financial losses resulting from reliance on this information. Any actions you take based on this content are done at your own risk. Always conduct independent research and seek guidance from a qualified specialist. For further details, please review our Terms, Privacy Policy and Disclaimers.