This post was written on Jan 27, 2026.
Models/pricing/policies may have changed. Check the latest llm posts.
Moonshot AI Releases Kimi K2.5 Multimodal MoE Model
Moonshot AI reveals Kimi K2.5, a 1.04T MoE model outperforming Llama 3.1 in math and coding benchmarks.

TL;DR
- Moonshot AI released Kimi K2.5 with 1.04 trillion parameters and 15 trillion training tokens.
- The model shows high performance in mathematics and coding compared to Llama 3.1 405B.
- It uses a Mixture-of-Experts structure to activate 32 billion parameters during inference for efficiency.
Example: A software engineer might upload an image of a handwritten architectural diagram to the assistant. The model could analyze logic and suggest code blocks to fill gaps in the script.
Current Status: A Large MoE Model Based on 15 Trillion Tokens
On January 27, 2026, Moonshot AI released its Kimi K2.5 model. This model is an open-source tool with detailed technical specifications. It is a native multimodal model trained on 15 trillion tokens. The total parameter count reaches 1.04 trillion. It activates 32 billion parameters during inference through a specialized architecture. Benchmark results suggest a focus on mathematics and coding fields. Kimi K2.5 scored 96.1% on the AIME 2025 mathematical reasoning test. It reached 85.0% on the LiveCodeBench (v6) coding proficiency evaluation. The model also achieved 76.8% on the SWE-Bench Verified metric. These figures can exceed those of Llama 3.1 405B in certain metrics.
Analysis: Combining Efficiency with Open Source Strategy
Moonshot AI utilized the MuonClip optimizer to manage training stability. This approach can increase the utility of each token during training. The multimodal structure processes text and images at the same time. Coding agents can use this to interpret UI/UX design blueprints. The specific ratio between vision and text data remains undisclosed. Direct comparisons with models like Llama 4 are currently limited. Maintaining large models under limited computational resources remains a consideration.
Practical Application
Users can access Kimi K2.5 through the NVIDIA NIM API or Hugging Face. The coding agent can assist with large codebases in corporate environments. It can also support complex mathematical tasks in financial engineering.
Checklist for Today:
- Review the Kimi K2.5 model card on Hugging Face for compatibility.
- Test the coding agent by inputting unresolved code error reports.
- Evaluate multimodal inference speeds using the NVIDIA NIM API.
FAQ
Q: What is the training data composition for Kimi K2.5? A: It uses 15 trillion mixed tokens, though the exact vision-to-text ratio is unconfirmed.
Q: What are its strengths compared to existing Llama models? A: As of January 2026, it shows higher scores than Llama 3.1 405B in math and coding.
Q: Do high parameter counts lead to high operating costs? A: The MoE architecture activates only 32 billion parameters to help improve efficiency.
Conclusion
Kimi K2.5 can increase accessibility to high-performance coding tools. It combines large-scale training with an efficient MoE structure. This model suggests that technical capabilities are reaching international standards. Future observation should focus on performance in real development settings. Data transparency remains an important topic for future updates.
References
Get updates
A weekly digest of what actually matters.
Found an issue? Report a correction so we can review and update the post.