
arc



We've shipped more than a thousand versions of Zed, but all of them began with zero. Today, that changes. zed.dev/blog/zed-1-0

AI is making kids dumber. It should be making them geniuses. Introducing Koji, the first AI tutor that gets kids to actually think. 👇



Thanks @Gavriel_Cohen. You’re right. I never used an IDE. Claude Code made all the edits. No @karpathy ‘vibe coding’. All I did was ‘tool assembly’ to create a utility that works in my domain!

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), CharXiv w/ python (86.7), Math Vision w/ python (93.2)

What's new:
🔹Long-horizon coding: 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend: videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated: 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents: K2.6 powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹Claw Groups (research preview): bring your own agents, command your friends', bots & humans in the loop.

K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

Are LLMs truly fair and consistent when judging other AI models? A collaborative team from Peking University, NUS, Institute of Science Tokyo, Nanjing University, Carnegie Mellon, Westlake, and Southeast University tackles the question. They introduce TrustJudge, a probabilistic framework designed to eliminate critical inconsistencies in "LLM-as-a-judge" evaluations. It scores with continuous probabilities instead of single discrete ratings, and it resolves contradictory comparisons so that the resulting assessments stay logically consistent and reliable. Tested with Llama-3.1-70B-Instruct, TrustJudge reduces evaluation inconsistencies by up to 10.82% while maintaining higher accuracy across diverse models.

TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Paper: arxiv.org/abs/2509.21117
Code: github.com/TrustJudge/Tru…
Our report: mp.weixin.qq.com/s/09Xe7gKpAnr6…
📬 #PapersAccepted by Jiqizhixin
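The post only names the idea of scoring "with continuous probabilities." As a rough illustration, and not the paper's actual implementation, here is a generic probability-weighted score in Python: instead of keeping only the single most likely rating, it takes the expected rating under the judge's distribution. The function name `expected_score` and its input format (a map from each allowed rating to its log-probability, e.g. read from a model API's token log-probs) are hypothetical.

```python
import math

# Hypothetical helper: a generic probability-weighted score, NOT TrustJudge's exact method.
def expected_score(rating_logprobs: dict[int, float]) -> float:
    """Return the expected rating given the judge's log-probability for each rating."""
    # Renormalize over the allowed ratings only (subtract the max for numerical stability).
    max_lp = max(rating_logprobs.values())
    weights = {r: math.exp(lp - max_lp) for r, lp in rating_logprobs.items()}
    total = sum(weights.values())
    # Expected value of the rating under the normalized distribution.
    return sum(r * w for r, w in weights.items()) / total


# Two responses whose most likely rating is 4 in both cases.
response_a = {3: math.log(0.1), 4: math.log(0.6), 5: math.log(0.3)}
response_b = {3: math.log(0.4), 4: math.log(0.5), 5: math.log(0.1)}

print(round(expected_score(response_a), 2))  # 4.2
print(round(expected_score(response_b), 2))  # 3.7
```

Both example responses would receive a discrete "4", but their expected scores (4.2 vs 3.7) separate them, which is the kind of information a single discrete rating throws away.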


I lived with a Chinese EV for a few weeks to see if the hype is real. The car costs $42K, and it turns out it EASILY feels like $75K+. One of the most impressive things I've ever reviewed: youtu.be/Mb6H7trzMfI
- A+ software and features. Feels like what would happen if Apple made a car
- Build quality is excellent all the way around, and the materials (leathers, metals, etc.) are all premium
- It crushes all the fundamentals that make it livable: 320 miles of range, super comfortable seats, excellent air suspension, active noise cancellation, great displays and cameras, a bright, clear HUD, self-driving
- It has a MODULAR interior design (detailed in the video)
- Performance is sneaky great. This is just the "SU7 Max" spec, but 660 horsepower and 0-60 in 2.8 seconds? Sheesh




Soon small businesses won’t have to wait for Small Business Saturday to get attention from their Mayor. Some changes that they can look forward to:

We’re delighted by Google’s success — they’ve made great advances in AI and we continue to supply to Google. NVIDIA is a generation ahead of the industry — it’s the only platform that runs every AI model and does it everywhere computing is done. NVIDIA offers greater performance, versatility, and fungibility than ASICs, which are designed for specific AI frameworks or functions.

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.

Polkadot shipped native XCM in 2020, and it is still one of the cleanest designs ever built. Fast-forward to Devconnect this week: Ethereum lands RIP-7755 + ERC-7683, and L2 fragmentation starts to disappear. Easily the most important update from the entire conference. Kinda sad that Polkadot lost momentum, with parachain auctions forcing teams to lock hundreds of millions in DOT for two years while the EVM world was already deep into DeFi, NFTs, and yield. Polkadot’s shared-security L0 was objectively the cleaner, more principled interoperability design, yet it lost to Ethereum’s rollup-centric roadmap simply because the EVM ecosystem already owned the liquidity, tooling, and mindshare by the time Polkadot went live. ETH keeps winning on strategy, principles, and a system of beliefs; at least that's how it feels. Somehow the intangible things overpower technical sophistication.