Giorgio Robino

29.4K posts

Giorgio Robino
@solyarisoftware

Conversational LLM-based Applications Specialist @almawave | Former ITD-CNR Researcher | Soundscapes (Orchestral) Composer.

Genova, Italy · Joined April 2009
4.4K Following · 3.2K Followers

Pinned Tweet
Giorgio Robino @solyarisoftware
My preprint "Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems" now has a revised version on @arXiv with updated experimental results. Here’s a thread with the changes! 🧵 ➡️ Paper: arxiv.org/abs/2501.11613

1/ What’s CR?

Giorgio Robino reposted
Jerry Liu @jerryjliu0
Parsing PDFs is hard. This past week I gave a few talks (at both AI Dev '26 by @DeepLearningAI and @Capgemini) on why this is still such an open problem, and it matters even more as agents become the consumers of documents and need OCR tools to read them properly.

The fundamental issue is that PDFs are designed for print and display, not to give back a linearized, semantically meaningful string of text. Text and tables are represented as a bag of characters and lines with no guaranteed order. This is what the community is solving with VLM-based approaches, including our own efforts around LlamaParse and ParseBench.

If you're interested in learning more about the problem, check out the blog post I wrote on this a while ago: llamaindex.ai/blog/why-readi…
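A quick way to see the problem Jerry describes: a minimal sketch using PyMuPDF (assuming `pip install pymupdf`; the file name and the sort heuristic are illustrative, not from the post). The library hands back positioned word boxes, and any linear string is just a heuristic sort over coordinates:

```python
# Sketch: PDF text has positions, not order. Assumes PyMuPDF is installed;
# "sample.pdf" is a placeholder, not a file from the original post.
import fitz  # PyMuPDF

doc = fitz.open("sample.pdf")
page = doc[0]

# Each entry: (x0, y0, x1, y1, word, block_no, line_no, word_no)
words = page.get_text("words")

# There is no guaranteed reading order in the raw PDF content stream;
# a common heuristic is to sort by vertical, then horizontal position.
linearized = " ".join(
    w[4] for w in sorted(words, key=lambda w: (round(w[1]), w[0]))
)
print(linearized[:500])
```

Multi-column layouts and tables break this y-then-x heuristic immediately, which is exactly the gap the VLM-based parsers target.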

Giorgio Robino reposted
机器之心 JIQIZHIXIN @jiqizhixin
What if your LLM could verify its own reasoning with near-human precision? Stanford & UC Berkeley researchers present LLM-as-a-Verifier: a general-purpose framework that gives fine-grained feedback by breaking tasks into smaller criteria, scoring with higher granularity, and rechecking multiple times.

Result: state-of-the-art performance on Terminal-Bench (86.4%) and SWE-Bench Verified (77.8%), outperforming Claude Opus 4.6, GPT 5.4, and Gemini models.

LLM-as-a-Verifier: A General-Purpose Verification Framework
Blog: llm-as-a-verifier.notion.site
Code: llm-as-a-verifier.github.io
Our report: mp.weixin.qq.com/s/wmjQ2Kxw7Qdw…

📬 #PapersAccepted by Jiqizhixin
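The loop the post describes (decompose into criteria, score each with finer granularity, recheck multiple times) can be sketched in a few lines. Everything below is an illustration around a stubbed `llm` callable; none of the names come from the actual framework:

```python
# Minimal sketch of criteria-based verification with repeated rechecks.
# `llm` is a stand-in for any chat-completion call; all names here are
# illustrative, not the paper's actual API.
from statistics import mean
from typing import Callable

def verify(task: str, answer: str, llm: Callable[[str], str],
           n_rechecks: int = 3) -> float:
    # 1. Break the task into smaller, independently checkable criteria.
    criteria = llm(
        f"List the distinct criteria a correct answer to this task must "
        f"satisfy, one per line:\n{task}"
    ).splitlines()

    scores = []
    for criterion in criteria:
        # 2. Score each criterion with fine granularity (0-10 rather than
        #    pass/fail), and 3. recheck several times to reduce verifier noise.
        runs = [
            float(llm(
                f"Task: {task}\nAnswer: {answer}\n"
                f"On a 0-10 scale, how well does the answer satisfy this "
                f"criterion? Reply with a number only.\nCriterion: {criterion}"
            ))
            for _ in range(n_rechecks)
        ]
        scores.append(mean(runs) / 10.0)

    # Aggregate per-criterion scores into one fine-grained verdict.
    return mean(scores) if scores else 0.0
```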

Giorgio Robino reposted
Pydantic @pydantic
Online evals are live in Pydantic Logfire. Attach evaluators to any agent, score live production traffic, and watch hallucination rate and tool-use accuracy trends in the UI.
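For flavor, offline evaluator scoring with the open-source pydantic-evals package looks roughly like this; the online attachment to production traffic announced above follows the Logfire docs rather than this exact snippet, and the case, rubric, and agent below are invented:

```python
# Offline sketch with pydantic-evals; the online Logfire attachment has its
# own API. The case, rubric, and agent below are illustrative assumptions.
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge

dataset = Dataset(
    cases=[
        Case(inputs="What is the capital of France?", expected_output="Paris"),
    ],
    evaluators=[
        # An LLM judge scores each output against a rubric, e.g. a
        # hallucination check: did the answer introduce unsupported facts?
        LLMJudge(rubric="The answer contains no unsupported claims."),
    ],
)

async def my_agent(question: str) -> str:
    # Stand-in for a real agent call.
    return "Paris"

report = dataset.evaluate_sync(my_agent)
report.print()
```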

Giorgio Robino reposted
alphaXiv @askalphaxiv
“Recursive Multi-Agent Systems”

Many multi-agent LLM systems rely on agents passing text back and forth. This paper argues for a different approach: make the agents recur together in latent space. Agents refine latent thoughts, pass hidden states across one another, and only decode text at the end.

The key idea is that recursion scales the whole agent system, not just one model, and in their experiments this makes collaboration more accurate, faster, and much cheaper in tokens.

Giorgio Robino reposted
Rohan Paul @rohanpaul_ai
Research shows that current groups of AI agents cannot reliably coordinate or agree on simple decisions. Building teams of AI agents that consistently converge on a final decision is surprisingly difficult for LLMs.

The problem is that developers frequently assume that if you have enough AI agents working together, they will eventually figure out how to solve a problem by talking it through. This paper shows that assumption is currently wrong. Even in a friendly environment where every agent is trying to help, the team often gets stuck or stops responding entirely. And because this happens more often as the group gets bigger, we cannot yet trust these agent systems with tasks where they must agree on a correct answer.

Paper: arxiv.org/abs/2603.01213
Paper title: "Can AI Agents Agree?"
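A toy version of the protocol such papers study makes the failure mode concrete. The agent behavior, agreement threshold, and round budget below are all invented for illustration; with silent or stubborn agents, the loop exhausts its rounds without a verdict, and larger groups make that more likely:

```python
# Toy majority-consensus protocol among stubbed agents. Everything here
# (agent behavior, threshold, round cap) is illustrative, not from the paper.
import random
from collections import Counter

def agent(name: str, question: str, peers_said: list[str]) -> str | None:
    # Stand-in for an LLM agent: sometimes it follows the majority,
    # sometimes it holds its own answer, sometimes it stops responding.
    roll = random.random()
    if roll < 0.1:
        return None  # agent goes silent
    if peers_said and roll < 0.6:
        return Counter(peers_said).most_common(1)[0][0]
    return random.choice(["A", "B"])  # its own (noisy) opinion

def run_consensus(n_agents: int, max_rounds: int = 10) -> str | None:
    answers: list[str] = []
    for _ in range(max_rounds):
        answers = [a for a in
                   (agent(f"agent{i}", "pick A or B", answers)
                    for i in range(n_agents))
                   if a is not None]
        if answers:
            top, count = Counter(answers).most_common(1)[0]
            if count / n_agents > 0.8:  # strict agreement threshold
                return top
    return None  # deadlock: no agreement within the round budget

for n in (3, 10, 30):
    print(n, "agents ->", run_consensus(n))
```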

Giorgio Robino reposted
Sebastián Ramírez @tiangolo
Install the library skills bundled with your dependencies (like FastAPI) for your coding agent 🤖 Available in Python and Node.js, and both versions support both ecosystems ✨ github.com/tiangolo/libra…

Giorgio Robino reposted
elvis @omarsar0
// Recursive Multi-Agent Systems //

Great read for the weekend. (bookmark it)

Multi-agent systems often pass full text messages between agents at every step. This leads to token bloat, latency, and context dilution, all of which grow with the number of agents.

RecursiveMAS asks a different question: what if agents collaborated through recursive computation in a shared latent space, instead of through text?

A multi-agent system can be treated as a recursive computation, where each agent acts like an RLM layer, iteratively passing latent representations to the next and forming a looped interaction process. They introduce a RecursiveLink module that generates latent thoughts and transfers state directly between heterogeneous agents, plus an inner-outer loop learning algorithm with shared gradient-based credit assignment across the team.

Think of it as agents passing notes in their own internal language instead of rewriting everything in English each turn. Less talking, more thinking.

The numbers are strong. Across 9 benchmarks spanning math, science, medicine, search, and code generation: an 8.3% average accuracy gain over baselines, a 1.2×–2.4× end-to-end inference speedup, and a 34.6%–75.6% reduction in token usage.

Why does it matter? If agent-to-agent communication is the next real bottleneck (and it is), latent-space recursion is one of the cleaner ways to scale collaboration without paying a token tax for every coordination step.

Paper: arxiv.org/abs/2604.25917

Learn to build effective AI agents in our academy: academy.dair.ai
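A minimal sketch of the latent hand-off idea, assuming PyTorch: agents exchange hidden-state vectors through a small link module and only decode at the very end. The dimensions, the GRU-based agents, and the link MLP are my assumptions; the paper's RecursiveLink is certainly more involved:

```python
# Sketch of latent-space agent collaboration, assuming PyTorch. The link
# module, dimensions, and decode stub are illustrative assumptions.
import torch
import torch.nn as nn

D = 256  # shared latent width (assumption)

class LatentAgent(nn.Module):
    """One 'agent': refines an incoming latent thought instead of emitting text."""
    def __init__(self):
        super().__init__()
        self.refine = nn.GRUCell(D, D)

    def forward(self, thought: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        return self.refine(thought, state)

class RecursiveLinkSketch(nn.Module):
    """Projects one agent's hidden state into the next agent's input space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(D, D), nn.GELU(), nn.Linear(D, D))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.proj(state)

agents = [LatentAgent() for _ in range(3)]
links = [RecursiveLinkSketch() for _ in range(3)]

thought = torch.randn(1, D)                 # encoded task (stub)
states = [torch.zeros(1, D) for _ in agents]

# The recursion: latent state loops across agents for a few rounds,
# with zero tokens exchanged until the final decode.
for _ in range(4):                          # outer loop over the team
    for i, agent in enumerate(agents):      # inner loop over agents
        states[i] = agent(thought, states[i])
        thought = links[i](states[i])       # hand off latent, not text

# Only here would a decoder turn the final latent into text (stubbed).
print("final latent norm:", thought.norm().item())
```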

Giorgio Robino reposted
Mario Zechner @badlogicgames
People of pi.dev. As a weekend gift, we added @XiaomiMiMo Token Plan as a first-class provider. I also made some breaking changes for the better. If you have custom providers and models, point pi at the changelog so it can fix them up for you. This will be a recurring theme in the coming days and weeks. We'll get through it together.

Giorgio Robino reposted
Qwen @Alibaba_Qwen
Today we’re releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools:

🎯 Inference: steer model outputs by directly manipulating internal features, no prompt engineering needed
📂 Data: classify & synthesize targeted data with minimal seed examples, boosting long-tail capabilities
🏋️ Training: trace code-switching & repetitive generation back to their source and fix them at the root
📊 Evaluation: analyze feature activation patterns to select smarter benchmarks and cut redundancy

We hope the community uses Qwen-Scope to uncover new mechanisms inside Qwen models and build applications beyond what we explored. Excited to see what you build! 🚀

🔗 Blog: qwen.ai/blog?id=qwen-s…
HuggingFace: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Technical Report: …anwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwe…
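Mechanically, SAE feature steering looks roughly like the sketch below: encode a residual-stream activation into sparse features, nudge one feature, decode the delta back, and let the model continue from the edited activation. The shapes, the feature index, and the steering strength are placeholders; Qwen-Scope's real interfaces are in its technical report:

```python
# Rough shape of SAE feature steering, assuming PyTorch. Dimensions, the
# feature index, and the steering strength are placeholder assumptions.
import torch
import torch.nn as nn

d_model, d_sae = 1024, 16384  # hidden width and SAE dictionary size (assumed)

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def features(self, h: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.enc(h))  # sparse, interpretable feature activations

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return self.dec(f)

sae = SparseAutoencoder()

with torch.no_grad():
    h = torch.randn(1, d_model)         # a residual-stream activation (stub)

    f = sae.features(h)
    feature_idx = 42                    # placeholder: some concept feature
    f[:, feature_idx] += 5.0            # turn the concept up (or set to 0 to ablate)

    # Add only the delta the edit introduces, keeping the rest of h intact.
    h_steered = h + sae.decode(f) - sae.decode(sae.features(h))
    # h_steered would replace h at the hook point, steering generation
    # without touching the prompt.
    print(h_steered.shape)
```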

Giorgio Robino reposted
David Hendrickson @TeksEdge
☀️ Qwen just dropped something big for personal AI.

✨ They released Qwen-Scope, the first major open Sparse Autoencoder (SAE) toolkit for real models.

💡 Instead of wrestling with prompts, you can now directly steer Qwen models by manipulating internal features.

Why this matters:
🧠 Precise, reliable control when running models locally
🛠️ Fix repetition, hallucinations & bad behaviors at the source
📊 Smarter data synthesis and evaluation
🚀 A real step toward controllable, sovereign personal agents

This is unique: no other top lab has open-sourced practical tools for mechanistic control of open models like this (that I know of).

The future of personal AI isn't just bigger models. It's controllable ones. Qwen-Scope just took a huge leap forward. 🔥
[Quoting Qwen @Alibaba_Qwen's Qwen-Scope announcement, above]

Giorgio Robino reposted
Richard Palethorpe @jichiep
New model release! LocalVQE: a tiny ~1M-parameter audio model that cancels echo, noise, and reverberation in real time, and comes with a @ggml_org implementation out of the gate.

Giorgio Robino reposted
Eric @Ex0byt
I cannot be the only one who noticed this. Qwen just quietly ended black-box AI today. I had to implement it myself just to show y'all how big this is. You can now literally see every concept firing in a model and turn any feature on or off. My Demo on HuggingFace: hf.co/spaces/Ex0bit/…

Giorgio Robino reposted
Kun Chen @kunchenguid
gnhf 0.1.27+ now supports the Pi agent harness, thanks to a contributed PR: github.com/kunchenguid/gn…

Giorgio Robino reposted
antirez @antirez
Europe's AI strategy should be to specialize in AI inference and in improving large open-weight models, while we try to close the GPU / companies gap to have a viable internal path. A large Chinese open-weight model that works simply beats a weak European-trained one.

Giorgio Robino reposted
elvis @omarsar0
// Agentic Harness Engineering //

Pay attention to this one, AI devs. (bookmark it)

Most coding-agent harnesses are still tuned by hand or by brittle trial-and-error self-evolution. This new work introduces Agentic Harness Engineering, a framework that makes harness evolution observable.

They do this through three layers: components as revertible files, experience as condensed evidence from millions of trajectory tokens, and decisions as falsifiable predictions checked against task outcomes. Each edit becomes a contract you can verify or revert.

Results: pass@1 on Terminal-Bench 2 climbs from 69.7% to 77.0% in ten iterations, beating the human-designed Codex-CLI (71.9%) and self-evolving baselines like ACE and TF-GRPO. The evolved harness also transfers across model families with +5.1 to +10.1 point gains, while using 12% fewer tokens than the seed on SWE-bench-verified.

Harness work is the biggest hidden cost in most agent systems. This is the first credible recipe for letting the harness improve itself without drifting into noise.

Paper: arxiv.org/abs/2604.25850

Learn to build effective AI agents in our academy: academy.dair.ai
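The "edit as a falsifiable contract" idea from the thread can be pictured as a tiny apply-predict-evaluate-revert loop. The dataclass fields, the benchmark stub, and the file-backup mechanics below are illustrative stand-ins, not the paper's code:

```python
# Toy version of "each harness edit is a falsifiable contract": apply an
# edit, predict an outcome, evaluate, and revert on falsification.
# The HarnessEdit fields and run_benchmark stub are illustrative assumptions.
import shutil
from dataclasses import dataclass
from pathlib import Path

@dataclass
class HarnessEdit:
    path: Path              # the component: a revertible file
    new_text: str           # proposed change
    prediction: str         # what the edit is expected to improve
    min_pass_rate: float    # falsifiable success criterion

def run_benchmark() -> float:
    return 0.70  # stub: would run the task suite and return pass@1

def apply_with_contract(edit: HarnessEdit) -> bool:
    backup = edit.path.with_suffix(edit.path.suffix + ".bak")
    shutil.copy(edit.path, backup)          # make the edit revertible
    edit.path.write_text(edit.new_text)     # apply the component change

    pass_rate = run_benchmark()             # check the prediction
    if pass_rate >= edit.min_pass_rate:
        backup.unlink()                     # contract held: keep the edit
        return True
    shutil.copy(backup, edit.path)          # falsified: revert the file
    backup.unlink()
    return False
```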

Giorgio Robino reposted
Alex Prompter @alex_prompter
Both OpenAI and Anthropic just released official prompting guides. Both say the same thing: your old prompts don't work anymore. But for opposite reasons.

Claude Opus 4.7 stopped guessing what you meant. It does exactly what you type, nothing more, nothing less. Vague instructions that worked on 4.6 now produce narrow, literal, sometimes worse results. Not because the model got dumber, but because it stopped compensating for sloppy thinking.

GPT-5.5 went the other direction. OpenAI's guide literally says: "Don't carry over instructions from older prompt stacks." Legacy prompts over-specify the process because older models needed hand-holding. GPT-5.5 doesn't. That extra detail now creates noise and produces mechanical output.

Claude got more literal. GPT got more autonomous. Both now punish the same thing: prompts written without clear thinking behind them.

One developer on Reddit captured it perfectly after analyzing hundreds of community posts: the complaints tracked almost perfectly with prompt specificity. Precise prompts got better results on 4.7. Vague prompts got worse. The model didn't regress. The prompts did.

OpenAI's new framework is "outcome-first prompting." Describe what good looks like. Define success criteria. Set constraints. Then get out of the way; the model picks the path. Anthropic's framework is the inverse: be surgically specific about what you want, because the model won't fill in your blanks anymore.

Two different architectures. Two different philosophies. One identical conclusion: the person writing the prompt is now the bottleneck, not the model.

Boris Cherny, the engineer who built Claude Code, posted on launch day that even he needed a few days to adjust. That post got 936 likes. Meanwhile, Anthropic increased rate limits for all subscribers because the new tokenizer uses up to 35% more tokens on the same input. The model is more expensive to run lazily, cheaper to run precisely.

The models are converging in capability. The gap between good and bad output is no longer about which model you pick. It's about the 2 minutes of structured thinking you do before you type anything. That thinking system is the skill. The prompt is just what it produces.
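To make "outcome-first prompting" concrete, here is an invented before/after pair; neither string is taken from the OpenAI or Anthropic guides:

```python
# Invented before/after pair; neither prompt comes from the actual guides.

# Legacy, process-heavy prompt: micromanages steps newer models no longer need.
PROCESS_HEAVY = (
    "Read the CSV line by line. Build a dict keyed by user id. Loop over it "
    "and format each row with f-strings. Print a header first, then each row."
)

# Outcome-first prompt: result, success criteria, constraints; the model
# picks the path.
OUTCOME_FIRST = (
    "Produce a per-user spend summary from users.csv. "
    "Success criteria: one row per user, sorted by total spend descending, "
    "amounts formatted to two decimals. "
    "Constraints: standard library only; handle missing spend values without crashing."
)

print(PROCESS_HEAVY, OUTCOME_FIRST, sep="\n\n")
```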

Giorgio Robino reposted
Jerry Liu @jerryjliu0
This is really well thought out. Filesystems are the new default abstraction for agents to interact with documents (the new RAG stack in 2026). The issue is actually figuring out how to productize this; you can't "productize" Claude Code over a local file system. It seems like this tool has all the semantics of a filesystem with the versioning of git.
Oliver @olvrgln

Introducing Mesa: the most powerful filesystem ever built, designed specifically for enterprise AI agents.

Every team building agents eventually hits the same wall: where do the files live? Not the chat history, the actual artifacts the agent works on.
> The contracts your agent redlined
> The claim files it updated
> The 200-page audit report it edited overnight while you were asleep

Today those documents live in a sandbox that dies in 30 minutes, an S3 bucket where concurrent writes clobber each other, or a GitHub repo that was never built to absorb agent-scale traffic.

So we built Mesa: the world's first POSIX-compatible filesystem with built-in version control, designed from the ground up for agents. You mount it into your sandbox like any other filesystem. Your agent reads and writes files normally. Behind the scenes every change is versioned, branchable, reviewable, and rollback-able, like a codebase, for any file type.

Mesa provides:
– Branches, so agents work in parallel without locking
– Durable storage that survives sandbox death
– Sparse materialization, so massive document sets load instantly
– Fine-grained access control per agent
– Full history for human review and audit

Design partners are running Mesa in production across legal, healthcare, GTM, business ops, and coding agents. Private beta is open: link in the comments.
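Mesa's API isn't shown here, but Jerry's "filesystem semantics with git versioning" framing can be approximated today with plain git under the agent's working directory. The sketch below is that analogue, not Mesa itself; every path, branch name, and message is made up:

```python
# Approximating "filesystem semantics + git versioning" for an agent's
# artifacts with plain git. This is an analogue of the idea, not Mesa's API;
# all paths, branch names, and messages are invented for illustration.
import subprocess
from pathlib import Path

root = Path("agent_workspace")
root.mkdir(exist_ok=True)

def git(*args: str) -> str:
    return subprocess.run(["git", "-C", str(root), *args],
                          check=True, capture_output=True, text=True).stdout

git("init")
git("config", "user.email", "agent@example.com")
git("config", "user.name", "Agent")
git("checkout", "-b", "agent/redline-pass")   # a branch per agent task

# The agent just writes files normally...
(root / "contract.md").write_text("Section 4.2: liability cap raised to $2M\n")

# ...and every change becomes a reviewable, revertible commit.
git("add", "-A")
git("commit", "-m", "agent: redlined liability clause")

print(git("log", "--oneline"))  # full history for human review
# A rejected edit is one `git revert` away; parallel agents get parallel branches.
```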
