melophile arc
@biotechdrops
cooking something


GenLayer is optimized for onchain and AI-native applications because traditional blockchains were never designed for AI-level reasoning and execution. Most chains are built for fixed logic, but AI works differently: it processes context, adapts to inputs, and generates decisions.

Excited to share that TwinRouterBench has been accepted to the #RLEval Workshop at #CAIS2026 🎉

As LLM apps become long-horizon agents, one request can trigger many model calls across planning, tool use, retrieval, coding, and verification. That makes per-step LLM routing a core infrastructure problem: sending each call to the cheapest sufficient model without breaking downstream success.

TwinRouterBench introduces:
⚡ Static track: 970 router-visible prefixes from 520 instances across SWE-bench, BFCL, mtRAG, QMSum, and PinchBench
🚀 Dynamic track: live SWE-bench Verified evaluation with official task resolution + realized API spend

Key result: a router trained on static labels achieves comparable SWE-bench resolve rate while cutting API cost by ~53% vs. an unrouted Opus 4.6 baseline.

Paper: arxiv.org/html/2605.1885…
Code: github.com/CommonstackAI/…
Dataset: huggingface.co/datasets/Amorp…
Website: commonstackai.github.io/TwinRouterBenc…

#LLM #AgenticAI #LLMRouting #Benchmark #SWEBench
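To make "cheapest sufficient model" concrete, here is a rough sketch of per-step routing. It is not TwinRouterBench's router or its evaluation code: the `Model` class, the `route` function, the capability and difficulty scores, and the prices are all made-up assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical model tier, not a real product
    cost_per_call: float  # assumed flat price per call, arbitrary units
    capability: float     # assumed 0..1 score of what the model can handle


def route(step_difficulty: float, models: list[Model]) -> Model:
    """Return the cheapest model whose capability covers the step."""
    sufficient = [m for m in models if m.capability >= step_difficulty]
    if not sufficient:
        # No model is judged sufficient: fall back to the most capable one.
        return max(models, key=lambda m: m.capability)
    return min(sufficient, key=lambda m: m.cost_per_call)


MODELS = [
    Model("small", 1.0, 0.4),
    Model("medium", 4.0, 0.7),
    Model("large", 10.0, 1.0),
]

# One request fanning out into several steps, with invented difficulty scores.
steps = {
    "planning": 0.9,
    "retrieval": 0.3,
    "tool_use": 0.5,
    "coding": 0.8,
    "verification": 0.6,
}

routed_cost = sum(route(d, MODELS).cost_per_call for d in steps.values())

# Unrouted baseline: every step goes to the most capable model.
strongest = max(MODELS, key=lambda m: m.capability)
unrouted_cost = len(steps) * strongest.cost_per_call

savings = 1 - routed_cost / unrouted_cost
print(f"routed={routed_cost} unrouted={unrouted_cost} savings={savings:.0%}")
```

In practice the hard part is estimating step difficulty from the prefix the router can actually see, which is what a trained router would replace the hand-written scores with. The savings figure printed here comes only from the invented numbers and says nothing about the ~53% reported in the post.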





Hello everyone. X is removing its X Communities feature on May 30th. I started building this community in January 2025, and it’s been a wonderful experience to meet nearly 5,000 of you inside our Open Intelligence Lounge. @Gradient_HQ will continue to share updates on its research efforts on its main page and on Discord, along with other activities: discord.gg/gradientnetwork Let’s stay in touch. There will be more to come!


Exciting to see the momentum around @GenLayer. The Bradbury testnet is already being secured by a strong validator set. Glad to see builders, operators, and infrastructure teams coming together this early: @encapHQ @nansen_ai @NodesGuru @SenseiNode @stakeme_pro


think open, u know 404: grad404.vercel.app rich ppl go all the way, grad ppl go all the way @Gradient_HQ ./


Fraction of the bill. Same results. Fully local, open source, works with any client. Just > pipx install uncommon-route github.com/CommonstackAI/…




