Kavli Affiliate: Feng Wang | First 5 Authors: Kimi Team | Summary: We introduce Kimi K2, a Mixture-of-Experts (MoE) large language model with 32 billion activated parameters and 1 trillion total parameters. We propose the MuonClip optimizer, which improves upon Muon with a novel QK-clip technique to address training instability while […]
Continue reading: Kimi K2: Open Agentic Intelligence
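
The QK-clip technique mentioned in the summary can be illustrated with a short sketch: after an optimizer step, any attention head whose maximum pre-softmax logit exceeded a threshold has its query/key projection weights rescaled so that future logits fall back into range. The function name `qk_clip`, the threshold `tau`, the split exponent `alpha`, and the exact rescaling rule below are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of the QK-clip idea: shrink per-head query/key projection
# weights whenever the largest observed attention logit exceeds a threshold.
# All parameter names and defaults here are assumptions for illustration.

import torch


def qk_clip(w_q: torch.Tensor,
            w_k: torch.Tensor,
            max_logit: float,
            tau: float = 100.0,
            alpha: float = 0.5) -> tuple[torch.Tensor, torch.Tensor]:
    """Return rescaled copies of one head's query/key projection weights.

    If the largest pre-softmax attention logit seen for this head exceeds
    tau, shrink w_q and w_k so the logit scale q.k is multiplied by
    tau / max_logit, splitting the shrinkage between the two matrices
    according to alpha.
    """
    if max_logit <= tau:
        return w_q, w_k  # logits already in range: leave weights untouched
    gamma = tau / max_logit  # total shrink factor applied to the q.k product
    return w_q * gamma ** alpha, w_k * gamma ** (1.0 - alpha)


if __name__ == "__main__":
    d = 64
    w_q, w_k = torch.randn(d, d), torch.randn(d, d)
    # Pretend the forward pass recorded a max attention logit of 250 for this head.
    w_q, w_k = qk_clip(w_q, w_k, max_logit=250.0, tau=100.0)
    print(w_q.norm().item(), w_k.norm().item())
```

Splitting the shrink factor between `w_q` and `w_k` (here evenly, via `alpha = 0.5`) keeps both projections at a comparable scale instead of collapsing one of them, while still capping the logit product at roughly `tau`.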