
Giorgio Robino
29.4K posts

Giorgio Robino
@solyarisoftware
Conversational LLM-based Applications Specialist @almawave | Former ITD-CNR Researcher | Soundscapes (Orchestral) Composer.





Today we’re releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools: 🎯 Inference — Steer model outputs by directly manipulating internal features, no prompt engineering needed 📂 Data — Classify & synthesize targeted data with minimal seed examples, boosting long-tail capabilities 🏋️ Training — Trace code-switching & repetitive generation back to their source, fix them at the root 📊 Evaluation — Analyze feature activation patterns to select smarter benchmarks and cut redundancy We hope the community uses Qwen-Scope to uncover new mechanisms inside Qwen models and build applications beyond what we explored.Excited to see what you build! 🚀 🔗🔗 Blog: qwen.ai/blog?id=qwen-s… HuggingFace: huggingface.co/collections/Qw… ModelScope: modelscope.cn/collections/Qw… Technical Report: …anwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwe…




Last week, we introduced Ling-2.6-1T. Today, Ling-2.6-1T is officially an open model~ 🤗 1T total parameters · 63B active parameters We bring values to developers by making it easier to test, deploy, customize, and build. It is optimized to be "token efficiency" for real production needs: • Lower token overhead: strong intelligence without long reasoning traces • Reliable multi-step execution: better instruction, tool, context, and workflow control • Production-ready deployment: from code generation to bug fixing, with broad agent framework compatibility A sneak pick into the agentic capability in @opencode








Introducing Mesa: the most powerful filesystem ever built, designed specifically for enterprise AI agents. Every team building agents eventually hits the same wall: where do the files live? Not the chat history, the actual artifacts the agent works on. > The contracts your agent redlined > The claim files it updated > The 200-page audit report it edited overnight while you were asleep Today those documents live in a sandbox that dies in 30 minutes, an S3 bucket where concurrent writes clobber each other, or a GitHub repo that was never built to absorb agent-scale traffic. So we built Mesa. The world's first POSIX-compatible filesystem with built-in version control, designed from the ground up for agents. You mount it into your sandbox like any other filesystem. Your agent reads and writes files normally. Behind the scenes every change is versioned, branchable, reviewable, and rollback-able — like a codebase, for any file type. Mesa provides – Branches so agents work in parallel without locking – Durable storage that survives sandbox death – Sparse materialization so massive document sets load instantly – Fine-grained access control per agent – Full history for human review and audit Design partners are running Mesa in production across legal, healthcare, GTM, business ops, and coding agents. Private beta is open: link in the comments







Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026 arxiv.org/abs/2512.04388 What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs? To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team. We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026). Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies: 1/ Which agent to call 2/ What specific subtask to give them (acting as an expert prompt engineer) 3/ What previous messages they can see in their context window Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems. The results are very promising: The 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost. One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team's prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference. This research proves that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence. Alongside our TRINITY research which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! (sakana.ai/fugu-beta) 🐡 OpenReview: openreview.net/forum?id=U23A2… (ICLR 2026)


Big day for us, finally sharing what we’ve been cooking for a while. Over the past year, we kept seeing the same pattern: AI agents look great in demos, until real users break them. Today, we’re fixing that with 𝘃𝗶𝗯𝗲-𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 to build real-time, tailored evals and guardrails for your agents, in minutes. Define your intent with a prompt or a few examples. We generate edge-case datasets, and train a model aligned to your use case, outperforming state-of-the-art LLMs at a fraction of the cost. (Research paper with benchmarks in the comments) If you’re building AI agents, don’t let your users be the ones who discover the failures. Be the one who makes AI agents reliable in production and takes control at scale. Start vibe-training for free: plurai.ai/launch







