Evolvent AI

Spoon@Elonsfannumber1·1d

@Evolvent_AI solid result for its price

Evolvent AI@Evolvent_AI·1d

Ran deepseek-v4-pro through ClawMark (our living-world openclaw benchmark) — 100/100 tasks, 0.685 avg score, 40.7h total time. Slots in at #4, just edging out kimi-k2.6 (0.684) and gemini-3.1-pro (0.682) — all three within a 0.003 window. claude-4-6 / gpt-5.4 still hold the top at 0.72–0.76. Updated leaderboard 👇

Evolvent AI@Evolvent_AI·13h

@wojtess Yeah, we'll release the GLM 5.1 results soon

wojtess@wojtess·1d

@Evolvent_AI Did you test glm5.1?

Evolvent AI@Evolvent_AI·Apr 21

@bourneliu66 Open weights + top-tier performance + game-changing pricing = paradigm shift. Our independent ClawMark results confirm K2.6 is the real deal: x.com/evolvent_ai/st…

Can confirm: K2.6 isn't just a demo-reel model. A few days ago we received a bug report from the Kimi team and got early API access, then re-ran ClawMark (our living-world openclaw benchmark). After fixing a compatibility bug in openclaw's repo (github.com/openclaw/openc…), K2.6 lands at 0.684 avg score, edging out gemini-3.1-pro (0.682) and jumping +0.124 over K2.5. Shipping shaders and agentic benchmark gains in the same release is a pretty rare combo. 👀

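刘小排@bourneliu66·Apr 21

Kimi K2.6 may be seriously underrated. It's cheap, there's no scramble for access, it's capable, natively multimodal, and performs very well in agents. I just saw it get praised by the official Hermes Agent account, and it holds up well in my own testing too.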

Evolvent AI@Evolvent_AI·Apr 21

@oran_ge Price, performance, open weights—name a better combo. We put it to the test on our live agent benchmark: x.com/evolvent_ai/st…

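Orange AI@oran_ge·Apr 21

Kimi 2.6 is out, and it's pretty impressive: it passed 76 of the 108 test cases in the 小山评测 (Xiaoshan eval) suite, a SOTA result. It's currently the best value-for-money model. xsct.ai/s/D2Lwf6A4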

Evolvent AI@Evolvent_AI·Apr 21

@shao__meng Can confirm: K2.6 is not a demo. It’s a production-grade beast. Our benchmark says it all: x.com/evolvent_ai/st… Open source is eating the world.

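meng shao@shao__meng·Apr 21

Kimi K2.6 has been released as open source. This 1T-parameter MoE coding model makes major breakthroughs in two areas:
1. Code engineering: open source surpasses closed source for the first time
· SWE-Bench Pro 58.6% (GPT-5.4: 57.7%, Claude 4.6: 53.4%)
· Supports 12 hours of continuous execution and 4,000+ tool calls, generalizing across languages (Rust/Go/Python)
· Frontend generation moves into the dynamic visual layer (WebGL/Three.js/GSAP)
2. Agent scale: an order-of-magnitude leap
· 300 parallel sub-agents × 4,000 steps (K2.5: 100 × 1,500)
· Claw Groups supports mixed collaboration among humans, machines, and third-party agents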

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

Evolvent AI@Evolvent_AI·Apr 21

Update: Added Kimi K2.6 results. Fixed OpenClaw compatibility bug (reported by Kimi team), re-ran benchmark, and finalized fresh ClawMark scores. Updated results table below 👇

Launch Week — Day 1: ClawMark Most agent benchmarks give the model one shot, one prompt, one frozen environment. Real coworker tasks span multiple days — and the world keeps changing while the agent works. Introducing 🦞ClawMark: a multi-day, dynamic-environment benchmark for coworker agents. Built by Evolvent together with 40+ researchers from NUS, HKU, MIT, UW, and UC Berkeley. Open-sourced at: claw-mark.com 100 tasks. 13 professional domains. Fully rule-based scoring. Results from 6 frontier models below. 🧵👇

Evolvent AI@Evolvent_AI·Apr 21

@_akhaliq Verified: K2.6 is the real deal 🚀 Outperformed Gemini 3.1 Pro on our ClawMark living-world benchmark. Read our full analysis: x.com/evolvent_ai/st…

AK@_akhaliq·Apr 21

Kimi K2.6 is available in HuggingChat

Evolvent AI@Evolvent_AI·Apr 21

@DeRonin_ Kimi 2.6 = next-level agentic performance. Confirmed on ClawMark, our living-world openclaw benchmark. Full scores here: x.com/evolvent_ai/st…

Ronin@DeRonin_·Apr 20

Kimi K2.6 just beat GPT 5.4 and Opus 4.6 on coding benchmarks:
SWE-Bench Pro: 58.6
SWE-Bench Multilingual: 76.7
BrowseComp: 83.2
All open-source. All open-weight.
Going to test it this week for:
- agent pipelines where speed and cost matter more than marginal quality
- multi-step coding tasks across Rust, Go, and Python
- long-running autonomous workflows (4,000+ tool calls per session is insane)
If it follows the same pricing as K2 and K2.5, this will be one of the best price-to-quality ratios on the market.
Stay tuned.

Evolvent AI@Evolvent_AI·Apr 21

@cgtwts Can confirm: K2.6 is not a demo. It’s a production-grade beast. Our benchmark says it all: x.com/evolvent_ai/st… Open source is eating the world.

CG@cgtwts·Apr 20

Kimi just dropped K2.6
> open sourced
> swe-bench pro 58.6
> top performance on coding tests
> beats Claude opus 4.6 and GPT 5.4 on select benchmarks
> runs for 12+ hours, 4k+ tool actions
> handles big projects across multiple languages
> builds animated websites
> runs hundreds of agents at once
> one prompt can create full projects
> works on tasks automatically 24/7
open source models are quickly catching up and performing as well as the best models out there.

Evolvent AI@Evolvent_AI·Apr 21

@shiri_shh Historic day for open-source AI. We independently measured K2.6’s agent capabilities and the results are massive: x.com/evolvent_ai/st…

shirish@shiri_shh·Apr 20

Free open-source LLM is getting scary good. Kimi K2.6 beats or matches top closed models on coding and agentic benchmarks. It handles huge projects across languages, creates beautiful moving frontends, and powers real agent teams. And it's 8-10x cheaper than Claude Opus 4.6 and GPT-5.4 on API costs.

Evolvent AI@Evolvent_AI·Apr 21

@chetaslua Price, performance, open weights: name a better combo. We put it to the test on ClawMark, our live agent benchmark: x.com/evolvent_ai/st…

Chetaslua@chetaslua·Apr 20

Holy Shit Kimi Cooked 🐐
I have been testing Kimi K2.6 for the last 7 days and have lots of cool demos to show you guys. Kimi is keeping open-source alive.
Aesthetic: Beautiful front-end design with rich interaction

Evolvent AI@Evolvent_AI·Apr 21

@itsPaulAi Open weights + top-tier performance + game-changing pricing = paradigm shift. Our independent ClawMark results confirm K2.6 is the real deal: x.com/evolvent_ai/st…

Paul Couvert@itsPaulAi·Apr 20

That's just insane
Kimi has released K2.6 which is:
- 100% open source 🔥
- On par with GPT-5.4 high / Claude Opus 4.6
- 9x cheaper than Claude / 5x cheaper than GPT
And the weights are already available on Hugging Face!!
There are fewer and fewer reasons to use closed source models.
Open source is winning.

Evolvent AI@Evolvent_AI·Apr 21

@k1rallik Independent ClawMark test: Kimi 2.6 > Gemini 3.1 Pro. Real-world performance, real data. Read why it matters: x.com/evolvent_ai/st…

BuBBliK@k1rallik·Apr 20

🚨 do you understand what Kimi K2.6 actually is
Moonshot AI dropped K2.6 with ZERO press release. just an email.
while everyone was waiting for GPT-6:
> open-source model
> beats Claude Opus 4.6 on coding benchmarks
> 300 parallel agents × 4,000 steps per run
> 12+ hours of continuous execution
> 76% cheaper than Claude
Chinese lab is just quietly shipping every 2-3 months
this is the most underrated story in AI right now

Evolvent AI@Evolvent_AI·Apr 21

@svpino Independent ClawMark test: Kimi 2.6 > Gemini 3.1 Pro. Real-world performance, real data. Read why it matters: x.com/evolvent_ai/st…

Santiago@svpino·Apr 21

Open-source models strike again!
Kimi K2.6 is now out, and it's one of the best open-source coding models you can use.
Look at the benchmarks:
• SOTA on HLE with tools: 54.0
• SWE-Bench Multilingual: 76.7
• SWE-Bench Pro: 58.6
A bunch of new things with this model:
• Better at front-end dev
• Better at DevOps
• Better at performance optimizations
• Better long-horizon coding tasks
• Better generalization across languages

AI/ML API@aimlapi

Kimi 2.6 from @Kimi_Moonshot is here, and it's available on AI/ML API from day 0! It is insanely powerful: we asked it to build Space Invaders and the one-shot result was stunning. You can check out the prompt in the comments.

Evolvent AI@Evolvent_AI·Apr 21

@kanavtwt Verified: K2.6 is the real deal 🚀 Outperformed Gemini 3.1 Pro on our ClawMark living-world benchmark. Read our full analysis: x.com/evolvent_ai/st…

kanav@kanavtwt·Apr 20

Kimi 2.6 JUST dropped:
> beats gpt 5.4 and opus 4.6
> 3-5x cheaper
> is open source

Evolvent AI@Evolvent_AI·Apr 20

@mervenoyann Can confirm — K2.6 isn’t just a demo-reel model. It outperformed Gemini 3.1 Pro on ClawMark. Our independent test: x.com/evolvent_ai/st…

merve@mervenoyann·Apr 20

kimi k2.6 is out: open source coding sota 🔥
> 32B/1T MoE with 256k context
> long horizon coding + better website design
> most interesting: agent swarms (300 subagents can do 4k steps) & Claw groups (multiple self improving agents)

Evolvent AI@Evolvent_AI·Apr 20

@JulianGoldieSEO Kimi 2.6 just proved it’s NOT a demo-reel model! We tested it on ClawMark and it beat Gemini 3.1 Pro. Full results: x.com/evolvent_ai/st…

Julian Goldie SEO@JulianGoldieSEO·Apr 20

Kimi K2.6 is China's new open source answer to Claude Code. It has 1 trillion parameters but only uses 32 billion at a time. Context window holds 256,000 tokens. It scored 85% on Live Code Bench. Claude hit 64%. It plugs into OpenClaw, Cursor, and Cline with no restrictions. Kimi Claw deploys in the cloud in 2 minutes. It's way cheaper to run than Claude day to day. Save this. Your coding stack just got a real rival.

Evolvent AI@Evolvent_AI·Apr 20

@dhh ClawMark Benchmark confirms: Kimi 2.6 just walked past gemini-3.1-pro in a living-world openclaw gauntlet. No lab games, real agent tasks. Huge congrats to the team!! Full data here: x.com/evolvent_ai/st…

DHH@dhh·Apr 20

I've been a K2.5 superfan since it came out. These new numbers for the next version look incredible. You gotta love competition!

Evolvent AI@Evolvent_AI·Apr 20

@aakashgupta Kimi 2.6 just proved it’s NOT a demo-reel model! We tested it on ClawMark and it beat Gemini 3.1 Pro. Full results: x.com/evolvent_ai/st…

Aakash Gupta@aakashgupta·Apr 20

Kimi K2.6 just matched or beat Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on 6 benchmarks, dropped the weights on HuggingFace, and the Western AI press barely covered it. That's the story everyone's missing.
Moonshot is now shipping frontier-tier open weights every 8 to 10 weeks. K2 in July 2025. K2.5 in January 2026. K2.6 today. Each release closes the gap with the closed US labs on the benchmarks that matter for agents, and K2.6 actually wins on SWE-Bench Pro (58.6 vs Claude's 57.7), SWE-Bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0 vs Claude's 47.2), and MathVision w/ python (93.2).
The pricing delta is the part nobody is pricing in. Kimi K2 runs roughly $0.60 per million input tokens and $2.50 per million output. Claude Sonnet 4.6 is $3.00 and $15.00. That's 5x and 6x. The open weights mean anyone with H100s can host it themselves at closer to unit cost.
Now the cascade. Every closed lab prices on the assumption that a capable substitute doesn't exist. Anthropic, OpenAI, and Google all have pricing power because the benchmark gap to open weights was real in 2024 and shrinking in 2025. In April 2026, on agent tasks, the gap inverted on some benchmarks.
The subscription business depends on enterprises not benchmarking alternatives on their actual workloads. The second a procurement team runs a two-week bake-off with K2.6 on their actual codebase and the outputs pass QA, the 6x cost savings make the switch defensible in a slide. One CFO does this math out loud and it moves.
The deeper point. There are now two frontier races happening in parallel. The US labs are racing to AGI and charging $200/mo for it. Moonshot, DeepSeek, and Qwen are racing to match frontier capability at commodity prices with open weights. These are different games with different endgames. The closed labs win if capability scales faster than compute gets cheaper. The open labs win if the marginal frontier benchmark point stops mattering to 90% of enterprise workloads.
K2.6 is the first release where the second scenario stops being theoretical for coding agents.