Evolvent AI

77 posts


@Evolvent_AI

Building persistent agent infrastructure for infinite self-evolving intelligence. More at: https://t.co/o8pZbNgDPI

Joined April 2026
86 Following · 170 Followers
Pinned Tweet
Evolvent AI@Evolvent_AI·
Launch Week — Day 1: ClawMark

Most agent benchmarks give the model one shot, one prompt, one frozen environment. Real coworker tasks span multiple days — and the world keeps changing while the agent works.

Introducing 🦞ClawMark: a multi-day, dynamic-environment benchmark for coworker agents. Built by Evolvent together with 40+ researchers from NUS, HKU, MIT, UW, and UC Berkeley.

Open-sourced at: claw-mark.com

100 tasks. 13 professional domains. Fully rule-based scoring. Results from 6 frontier models below. 🧵👇

English · 6 replies · 11 reposts · 55 likes · 16K views
Evolvent AI@Evolvent_AI·
Ran deepseek-v4-pro through ClawMark (our living-world openclaw benchmark) — 100/100 tasks, 0.685 avg score, 40.7h total time. Slots in at #4, just edging out kimi-k2.6 (0.684) and gemini-3.1-pro (0.682) — all three within a 0.003 window. claude-4-6 / gpt-5.4 still hold the top at 0.72–0.76. Updated leaderboard 👇
[image: updated leaderboard]
English · 5 replies · 4 reposts · 84 likes · 6.3K views
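[Editor's annotation] The "0.003 window" claim in the leaderboard tweet follows directly from the three quoted averages. A minimal sanity check, assuming only the scores as stated in the tweet (none independently verified):

```python
# ClawMark average scores as quoted in the tweet above.
scores = {
    "deepseek-v4-pro": 0.685,
    "kimi-k2.6": 0.684,
    "gemini-3.1-pro": 0.682,
}

# The tweet says all three sit "within a 0.003 window":
# the spread is simply max minus min of the three averages.
window = max(scores.values()) - min(scores.values())
print(f"window = {window:.3f}")  # prints "window = 0.003"
```

The check only reproduces the tweet's own arithmetic; it says nothing about how the underlying per-task scores were produced.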
Evolvent AI@Evolvent_AI·
@wojtess Yeah, we'll release the GLM 5.1 results soon.
English · 0 replies · 0 reposts · 1 like · 35 views
Evolvent AI@Evolvent_AI·
@bourneliu66 Open weights + top-tier performance + game-changing pricing = paradigm shift. Our independent ClawMark results confirm K2.6 is the real deal: x.com/evolvent_ai/st…

Quoted tweet — Evolvent AI@Evolvent_AI:
Can confirm — K2.6 isn't just a demo-reel model. A few days ago we received a bug report from the Kimi team and got early API access, then re-ran ClawMark (our living-world openclaw benchmark). After fixing a compatibility bug in openclaw's repo (github.com/openclaw/openc…), K2.6 lands at a 0.684 avg score — edging out gemini-3.1-pro (0.682) and jumping +0.124 over K2.5. Shipping shaders and agentic benchmark gains in the same release is a pretty rare combo. 👀

English · 0 replies · 0 reposts · 0 likes · 921 views
刘小排@bourneliu66·
Kimi K2.6 may be seriously underrated. Cheap, no scramble for access, strong capabilities, natively multimodal, and it performs extremely well in agents. Just saw it officially praised by Hermes Agent, and my own testing was quite good too.
[image attached]
Chinese · 31 replies · 8 reposts · 154 likes · 36.9K views
Evolvent AI@Evolvent_AI·
@oran_ge Price, performance, open weights — name a better combo. We put it to the test on our live agent benchmark: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 0 likes · 471 views
Orange AI@oran_ge·
Kimi 2.6 is out, and it's pretty impressive. It took 76 of the 108 test cases in Xiaoshan's (小山) evaluation, reaching SOTA, and it's currently the best price-to-performance model. xsct.ai/s/D2Lwf6A4
[images attached]
Chinese · 29 replies · 11 reposts · 136 likes · 46.9K views
Evolvent AI@Evolvent_AI·
@shao__meng Can confirm: K2.6 is not a demo. It's a production-grade beast. Our benchmark says it all: x.com/evolvent_ai/st… Open source is eating the world.
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 1 like · 80 views
meng shao@shao__meng·
Kimi K2.6 is released as open source. This 1T-parameter MoE coding model makes major breakthroughs in two areas:
1. Code engineering: open source surpasses closed source for the first time
· SWE-Bench Pro 58.6% (GPT-5.4: 57.7%, Claude 4.6: 53.4%)
· Supports 12 hours of continuous execution and 4,000+ tool calls, generalizing across languages (Rust/Go/Python)
· Frontend generation reaches the dynamic-visual layer (WebGL/Three.js/GSAP)
2. Agent scale: an order-of-magnitude jump
· 300 parallel sub-agents × 4,000 steps (vs. 100 × 1,500 for K2.5)
· Claw Groups supports mixed collaboration among humans, machines, and third-party agents
[image attached]

Quoted tweet — Kimi.ai@Kimi_Moonshot:
Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - The K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

Chinese · 3 replies · 2 reposts · 6 likes · 3.4K views
Evolvent AI@Evolvent_AI·
Update: added Kimi K2.6 results. We fixed an OpenClaw compatibility bug (reported by the Kimi team), re-ran the benchmark, and finalized fresh ClawMark scores. Updated results table below 👇
[image: updated results table]
[Quoted tweet: the pinned "Launch Week — Day 1: ClawMark" announcement, quoted in full above]
English · 1 reply · 0 reposts · 2 likes · 234 views
Evolvent AI@Evolvent_AI·
@_akhaliq Verified: K2.6 is the real deal 🚀 Outperformed Gemini 3.1 Pro on our ClawMark living-world benchmark. Read our full analysis: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 0 likes · 119 views
AK@_akhaliq·
Kimi K2.6 is available in HuggingChat
[image attached]
English · 4 replies · 4 reposts · 35 likes · 8.8K views
Evolvent AI@Evolvent_AI·
@DeRonin_ Kimi 2.6 = next-level agentic performance. Confirmed on ClawMark, our living-world openclaw benchmark. Full scores here: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 0 likes · 138 views
Ronin@DeRonin_·
Kimi K2.6 just beat GPT 5.4 and Opus 4.6 on coding benchmarks.
SWE-Bench Pro: 58.6
SWE-Bench Multilingual: 76.7
BrowseComp: 83.2
All open-source. All open-weight.
Going to test it this week for:
- agent pipelines where speed and cost matter more than marginal quality
- multi-step coding tasks across Rust, Go, and Python
- long-running autonomous workflows (4,000+ tool calls per session is insane)
If it follows the same pricing as K2 and K2.5, this will be one of the best price-to-quality ratios on the market. Stay tuned.
[image attached]
[Quoted tweet: the "Meet Kimi K2.6: Advancing Open-Source Coding" announcement from Kimi.ai@Kimi_Moonshot, quoted in full above]
English · 72 replies · 14 reposts · 111 likes · 15.3K views
Evolvent AI@Evolvent_AI·
@cgtwts Can confirm: K2.6 is not a demo. It's a production-grade beast. Our benchmark says it all: x.com/evolvent_ai/st… Open source is eating the world.
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 2 likes · 48 views
CG@cgtwts·
Kimi just dropped K2.6
> open sourced
> swe-bench pro 58.6
> top performance on coding tests
> beats Claude opus 4.6 and GPT 5.4 on select benchmarks
> runs for 12+ hours, 4k+ tool actions
> handles big projects across multiple languages
> builds animated websites
> runs hundreds of agents at once
> one prompt can create full projects
> works on tasks automatically 24/7
Open source models are quickly catching up and performing as well as the best models out there.
[Quoted tweet: the "Meet Kimi K2.6: Advancing Open-Source Coding" announcement from Kimi.ai@Kimi_Moonshot, quoted in full above]
English · 10 replies · 9 reposts · 75 likes · 10.7K views
Evolvent AI@Evolvent_AI·
@shiri_shh Historic day for open-source AI. We independently measured K2.6's agent capabilities and the results are massive: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 0 likes · 56 views
shirish@shiri_shh·
Free open-source LLMs are getting scary good. Kimi K2.6 beats or matches top closed models on coding and agentic benchmarks. It handles huge projects across languages, creates beautiful moving frontends, and powers real agent teams. And it's 8-10x cheaper than Claude Opus 4.6 and GPT-5.4 on API costs.
[image attached]
[Quoted tweet: the "Meet Kimi K2.6: Advancing Open-Source Coding" announcement from Kimi.ai@Kimi_Moonshot, quoted in full above]
English · 13 replies · 3 reposts · 45 likes · 5.1K views
Evolvent AI@Evolvent_AI·
@chetaslua Price, performance, open weights — name a better combo. We put it to the test on ClawMark, our live agent benchmark: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 2 likes · 184 views
Evolvent AI@Evolvent_AI·
@itsPaulAi Open weights + top-tier performance + game-changing pricing = paradigm shift. Our independent ClawMark results confirm K2.6 is the real deal: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 0 likes · 57 views
Paul Couvert@itsPaulAi·
That's just insane. Kimi has released K2.6 which is:
- 100% open source 🔥
- On par with GPT-5.4 high / Claude Opus 4.6
- 9x cheaper than Claude / 5x cheaper than GPT
And the weights are already available on Hugging Face!! There are fewer and fewer reasons to use closed-source models. Open source is winning.
[image attached]
English · 45 replies · 44 reposts · 430 likes · 36.6K views
Evolvent AI@Evolvent_AI·
@k1rallik Independent ClawMark test: Kimi 2.6 > Gemini 3.1 Pro. Real-world performance, real data. Read why it matters: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 1 repost · 1 like · 156 views
BuBBliK@k1rallik·
🚨 do you understand what Kimi K2.6 actually is
Moonshot AI dropped K2.6 with ZERO press release. just an email. while everyone was waiting for GPT-6:
> open-source model
> beats Claude Opus 4.6 on coding benchmarks
> 300 parallel agents × 4,000 steps per run
> 12+ hours of continuous execution
> 76% cheaper than Claude
A Chinese lab is just quietly shipping every 2-3 months. this is the most underrated story in AI right now
[images attached]
[Quoted tweet: the "Meet Kimi K2.6: Advancing Open-Source Coding" announcement from Kimi.ai@Kimi_Moonshot, quoted in full above]
English · 13 replies · 14 reposts · 223 likes · 18.8K views
Evolvent AI@Evolvent_AI·
@svpino Independent ClawMark test: Kimi 2.6 > Gemini 3.1 Pro. Real-world performance, real data. Read why it matters: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 1 reply · 0 reposts · 1 like · 124 views
Santiago@svpino·
Open-source models strike again! Kimi K2.6 is now out, and it's one of the best open-source coding models you can use.
Look at the benchmarks:
• SOTA on HLE with tools: 54.0
• SWE-Bench Multilingual: 76.7
• SWE-Bench Pro: 58.6
A bunch of new things with this model:
• Better at front-end dev
• Better at DevOps
• Better at performance optimizations
• Better long-horizon coding tasks
• Better generalization across languages

Quoted tweet — AI/ML API@aimlapi:
Kimi 2.6 from @Kimi_Moonshot is here, available day 0 on AI/ML API! It is insanely powerful: we asked it to build Space Invaders and the one-shot result was stunning. You can check out the prompt in the comments.

English · 10 replies · 6 reposts · 104 likes · 18K views
Evolvent AI@Evolvent_AI·
@kanavtwt Verified: K2.6 is the real deal 🚀 Outperformed Gemini 3.1 Pro on our ClawMark living-world benchmark. Read our full analysis: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 2 likes · 2.6K views
Evolvent AI@Evolvent_AI·
@mervenoyann Can confirm — K2.6 isn't just a demo-reel model. It outperformed Gemini 3.1 Pro on ClawMark. Our independent test: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 0 likes · 39 views
merve@mervenoyann·
kimi k2.6 is out: open source coding sota 🔥
> 32B/1T MoE with 256k context
> long horizon coding + better website design
> most interesting: agent swarms (300 subagents can do 4k steps) & Claw groups (multiple self-improving agents)
[image attached]
English · 5 replies · 5 reposts · 96 likes · 5.6K views
Evolvent AI@Evolvent_AI·
@JulianGoldieSEO Kimi 2.6 just proved it's NOT a demo-reel model! We tested it on ClawMark and it beat Gemini 3.1 Pro. Full results: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 1 like · 1.4K views
Julian Goldie SEO@JulianGoldieSEO·
Kimi K2.6 is China's new open source answer to Claude Code.
It has 1 trillion parameters but only uses 32 billion at a time. Context window holds 256,000 tokens. It scored 85% on Live Code Bench. Claude hit 64%. It plugs into OpenClaw, Cursor, and Cline with no restrictions. Kimi Claw deploys in the cloud in 2 minutes. It's way cheaper to run than Claude day to day. Save this. Your coding stack just got a real rival.
English · 30 replies · 95 reposts · 940 likes · 101.6K views
Evolvent AI@Evolvent_AI·
@dhh The ClawMark benchmark confirms: Kimi 2.6 just walked past gemini-3.1-pro in a living-world openclaw gauntlet. No lab games, real agent tasks. Huge congrats to the team!! Full data here: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 3 likes · 1.5K views
Evolvent AI@Evolvent_AI·
@aakashgupta Kimi 2.6 just proved it's NOT a demo-reel model! We tested it on ClawMark and it beat Gemini 3.1 Pro. Full results: x.com/evolvent_ai/st…
[Quoted tweet: Evolvent AI's "Can confirm…" K2.6 ClawMark results tweet, quoted in full above]
English · 0 replies · 0 reposts · 1 like · 277 views
Aakash Gupta@aakashgupta·
Kimi K2.6 just matched or beat Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on 6 benchmarks, dropped the weights on HuggingFace, and the Western AI press barely covered it. That's the story everyone's missing.

Moonshot is now shipping frontier-tier open weights every 8 to 10 weeks. K2 in July 2025. K2.5 in January 2026. K2.6 today. Each release closes the gap with the closed US labs on the benchmarks that matter for agents, and K2.6 actually wins on SWE-Bench Pro (58.6 vs Claude's 57.7), SWE-Bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0 vs Claude's 47.2), and MathVision w/ python (93.2).

The pricing delta is the part nobody is pricing in. Kimi K2 runs roughly $0.60 per million input tokens and $2.50 per million output. Claude Sonnet 4.6 is $3.00 and $15.00. That's 5x and 6x. The open weights mean anyone with H100s can host it themselves at closer to unit cost.

Now the cascade. Every closed lab prices on the assumption that a capable substitute doesn't exist. Anthropic, OpenAI, and Google all have pricing power because the benchmark gap to open weights was real in 2024 and shrinking in 2025. In April 2026, on agent tasks, the gap inverted on some benchmarks. The subscription business depends on enterprises not benchmarking alternatives on their actual workloads. The second a procurement team runs a two-week bake-off with K2.6 on their actual codebase and the outputs pass QA, the 6x cost savings make the switch defensible in a slide. One CFO does this math out loud and it moves.

The deeper point. There are now two frontier races happening in parallel. The US labs are racing to AGI and charging $200/mo for it. Moonshot, DeepSeek, and Qwen are racing to match frontier capability at commodity prices with open weights. These are different games with different endgames. The closed labs win if capability scales faster than compute gets cheaper. The open labs win if the marginal frontier benchmark point stops mattering to 90% of enterprise workloads.

K2.6 is the first release where the second scenario stops being theoretical for coding agents.
[Quoted tweet: the "Meet Kimi K2.6: Advancing Open-Source Coding" announcement from Kimi.ai@Kimi_Moonshot, quoted in full above]
English · 9 replies · 14 reposts · 113 likes · 19.7K views
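[Editor's annotation] The 5x/6x pricing multiples in the tweet above follow from its quoted per-million-token prices. A minimal sketch reproducing that arithmetic, assuming the prices exactly as stated in the tweet (not independently verified against either vendor's price list):

```python
# USD per million tokens, as quoted in the tweet above.
kimi_input, kimi_output = 0.60, 2.50      # Kimi K2
claude_input, claude_output = 3.00, 15.00 # Claude Sonnet 4.6

# "That's 5x and 6x" — input and output price multiples respectively.
input_multiple = claude_input / kimi_input
output_multiple = claude_output / kimi_output
print(f"input:  {input_multiple:.0f}x")   # prints "input:  5x"
print(f"output: {output_multiple:.0f}x")  # prints "output: 6x"
```

Real per-request cost depends on the input/output token mix, so the blended multiple lands somewhere between the two ratios.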