

BenUsesAI
@BenUsesAI1
Replaced half my stack with AI. Cut time, cut costs, kept results. Showing what's actually worth using.



Last week in Zürich, we co-hosted a panel with @foxglove at #ActuateFieldSessions around an honest challenge: why general-purpose robot learning hasn't had its breakthrough moment yet. The answer isn't more data; it's the right data: touch and real-world scenarios that perception alone can't capture. The data gap is real, but we're closing it. Great conversation with @IlirAliu_, @HoellerDavid, @Klajd_Lika, Mayank Mittal, @arbwes and the Foxglove team. #HumanoidRobots #Flexion

New in Claude Code: agent view. One list of all your sessions, available today as a research preview.



People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

As an AI engineer, please learn:
- Harness engineering, not just prompt engineering
- Prompt caching vs. semantic caching tradeoffs
- KV cache management at scale
- Speculative decoding vs. quantization
- Structured output failures & fallback chains
- Evals (LLM-as-judge + human evals)
- Cost attribution per feature, not just per model
- Agent guardrails & loop budgets
- LLM observability as a first-class discipline
- Model routing & graceful fallback logic
- Knowing when to fine-tune vs. use in-context learning
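One item on that list, structured-output failures and fallback chains, can be sketched in a few lines of Python. This is a minimal illustration, not any specific SDK: the caller functions and the fence-stripping repair pass are hypothetical stand-ins for real model calls and real repair heuristics.

```python
import json
from typing import Callable

def parse_structured(raw: str) -> dict:
    """Try strict JSON first, then one lenient repair pass."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Repair heuristic (illustrative): strip ```json code fences
        # that models often wrap around their output.
        cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
        return json.loads(cleaned)

def with_fallback_chain(callers: list[Callable[[], str]]) -> dict:
    """Run model callers in order until one yields parseable output.

    Each caller is a zero-arg function returning raw model text;
    in practice these would be e.g. retry-with-stricter-prompt,
    then a different model, then a hand-written default.
    """
    last_err: Exception | None = None
    for call in callers:
        try:
            return parse_structured(call())
        except (json.JSONDecodeError, RuntimeError) as err:
            last_err = err  # remember why this rung failed, try the next
    raise RuntimeError("all fallbacks exhausted") from last_err
```

The point of the chain is that each rung degrades gracefully: a parse failure is caught and logged, not surfaced to the user, until every option is spent.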


Kimi K2.6 from @Kimi_Moonshot is purpose-built for coding agents. As of today, CoreWeave ranked highest in @ArtificialAnlys’s inference benchmark on Speed vs. Price for K2.6. Speed, scale, and economics. All three at production grade.

What if your team gave standup updates, and GPT-Realtime-2 moved the tickets?


Artificial Analysis relies on our IFBench eval to test how closely models follow user prompts. Most evals in their Intelligence Index saturate within months. IFBench hasn't because it measures what others miss—and what frontier models still struggle with. 🧵

