Giorgio Robino

29.4K posts

@solyarisoftware

Conversational LLM-based Applications Specialist @almawave | Former ITD-CNR Researcher | Soundscapes (Orchestral) Composer.

Genova, Italy · Joined April 2009
4.4K Following · 3.2K Followers
Pinned Tweet
Giorgio Robino @solyarisoftware
My preprint "Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems" now has a revised version on @arXiv with updated experimental results. Here’s a thread with the changes! 🧵
➡️ Paper: arxiv.org/abs/2501.11613
1/ What’s CR?
Giorgio Robino retweeted
AIDailyGems @AIDailyGems
AMD's Lemonade is an open-source local LLM server using GPU and NPU, offering a fast way to run models locally. Developers can leverage their AMD hardware more effectively. lemonade-server.ai
Giorgio Robino retweeted
Anshu Sharma 🌶 @anshublog
“Agents will generate workflows dynamically. Applications will get thinner. And the systems that manage memory, state, coordination, and history will become more important than ever. Which is why I think databases are moving back to the center of software architecture. Not as storage. As runtime.”
Quoting siddontang @siddontang: x.com/i/article/2046…
Giorgio Robino retweeted
Ksenia_TuringPost @TheTuringPost
There’s a serious gap in multimodal models – they work with images, but still reason in language, which isn’t that precise for visual stuff.

@deepseek_ai just dropped an idea to solve this: let the model literally point to exact locations in the image while it thinks. They call it "Thinking with Visual Primitives."

These visual primitives are:
- points (specific locations)
- bounding boxes (areas in the image)

Using them, the model knows exactly what it’s referring to and achieves ~77% accuracy on average (vs. 76.5% for Gemini 3 Flash and 71.1% for GPT-5.4).

Plus, only ~80–90 visual tokens are kept in memory after compression, thanks to the efficient architecture.

Here is how it works:
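The pointing idea can be pictured with a toy sketch. All names below are illustrative, not DeepSeek's actual API: the point is that a reasoning step carries explicit coordinates instead of a verbal description of a location.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """A visual primitive: an exact (x, y) location in image coordinates."""
    x: float
    y: float

@dataclass(frozen=True)
class BBox:
    """A visual primitive: an axis-aligned region of the image."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        return self.x0 <= p.x <= self.x1 and self.y0 <= p.y <= self.y1

# A reasoning step can then refer to concrete image evidence
# instead of an ambiguous textual description of "the upper right".
claim_region = BBox(120, 40, 260, 180)   # hypothetical grounded region
evidence = Point(200, 100)               # location the model points at
print(claim_region.contains(evidence))   # → True
```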
Giorgio Robino retweeted
Jerry Liu @jerryjliu0
Parsing PDFs is hard.

This past week I gave a few talks (at both AI Dev '26 by @DeepLearningAI and @Capgemini) on why this is still such an open problem, and it’s even more important as agents become the consumers of documents and need OCR tools to read them properly.

The fundamental issue is that PDFs are designed for print and display purposes, not to give back a linearized, semantically meaningful string of text. Text and tables are represented as a bunch of chars and lines, without any guaranteed order. This is what the community is solving with VLM-based approaches, including our own efforts around LlamaParse and ParseBench.

If you’re interested in learning more about the problem, check out the blog post I wrote on this a while ago! llamaindex.ai/blog/why-readi…
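A minimal sketch of the "no guaranteed order" problem, with made-up span data: a PDF content stream hands you positioned text runs in arbitrary order, and reading order has to be reconstructed geometrically. Real parsers also handle columns, rotation, and tables; this toy only groups by line and sorts left to right.

```python
# Each extracted span is (x, y, text). PDFs store positioned runs with
# no guaranteed stream order, so a naive dump can scramble the sentence.
spans = [
    (300, 130, "hard."),   # appears early in the content stream...
    (50, 100, "Parsing"),  # ...but belongs elsewhere on the page
    (300, 100, "PDFs"),
    (50, 130, "is"),
]

def reading_order(spans, line_tol=5.0):
    """Group spans into lines by y (within a tolerance), then sort each
    line by x, reconstructing the visual reading order."""
    lines = []
    for x, y, text in sorted(spans, key=lambda s: s[1]):
        if lines and abs(lines[-1][0] - y) <= line_tol:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    return " ".join(t for _, words in lines for _, t in sorted(words))

print(reading_order(spans))  # → Parsing PDFs is hard.
```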
Giorgio Robino retweeted
机器之心 JIQIZHIXIN @jiqizhixin
What if your LLM could verify its own reasoning with near-human precision?

Stanford & UC Berkeley researchers present LLM-as-a-Verifier: a general-purpose framework that gives fine-grained feedback by breaking tasks into smaller criteria, scoring with higher granularity, and rechecking multiple times.

Result: State-of-the-art performance on Terminal-Bench (86.4%) and SWE-Bench Verified (77.8%) — outperforming Claude Opus 4.6, GPT 5.4, and Gemini models.

LLM-as-a-Verifier: A General-Purpose Verification Framework
Blog: llm-as-a-verifier.notion.site
Code: llm-as-a-verifier.github.io
Our report: mp.weixin.qq.com/s/wmjQ2Kxw7Qdw…

📬 #PapersAccepted by Jiqizhixin
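The three ingredients described (criterion decomposition, fine-grained per-criterion scores, repeated rechecks) can be sketched as a skeleton. The function names and the toy judge below are assumptions for illustration, not the authors' actual framework; in practice `judge` would be an LLM call.

```python
import statistics

def verify(task_output, criteria, judge, n_rechecks=3):
    """Decompose verification into per-criterion scores, judge each
    criterion n_rechecks times, and average. judge(output, criterion)
    returns a score in [0.0, 1.0]."""
    report = {}
    for criterion in criteria:
        scores = [judge(task_output, criterion) for _ in range(n_rechecks)]
        report[criterion] = statistics.mean(scores)
    overall = statistics.mean(report.values())
    return overall, report

# Stand-in deterministic judge, purely for illustration.
def toy_judge(output, criterion):
    return 1.0 if criterion in output else 0.0

overall, report = verify(
    "tests pass; code compiles",
    criteria=["tests pass", "code compiles", "docs updated"],
    judge=toy_judge,
)
print(round(overall, 2))  # → 0.67
```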
Giorgio Robino retweeted
Pydantic @pydantic
Online evals are live in Pydantic Logfire. Attach evaluators to any agent, score live production traffic, and see hallucination-rate and tool-use-accuracy trends in the UI.
Giorgio Robino retweeted
alphaXiv @askalphaxiv
“Recursive Multi-Agent Systems”

Many multi-agent LLM systems rely on agents passing text back and forth. This paper argues for a different approach: make the agents recur together in latent space. Agents refine latent thoughts, pass hidden states to one another, and only decode text at the end.

The key idea is that recursion scales the whole agent system, not just one model, and in their experiments this makes collaboration more accurate, faster, and much cheaper in tokens.
Giorgio Robino retweeted
Rohan Paul @rohanpaul_ai
Research shows that current AI agent groups cannot reliably coordinate or agree on simple decisions.

Building teams of AI agents that can consistently agree on a final decision is surprisingly difficult for LLMs. The problem is that developers frequently assume that if you have enough AI agents working together, they will eventually figure out how to solve a problem by talking it through. This paper shows that this assumption is currently wrong.

Even in a friendly environment where every agent is trying to help, the team often gets stuck or stops responding entirely. Because this happens more often as the group gets bigger, we cannot yet trust these agent systems with tasks where they must agree on a correct answer.

----
Paper Link – arxiv.org/abs/2603.01213
Paper Title: "Can AI Agents Agree?"
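A toy simulation of why unanimity is not guaranteed (an illustrative sketch, not the paper's protocol): each round, agents see the current majority and may or may not adopt it. Agents that never switch simply deadlock, and in a real system the switch probability itself is an unreliable LLM behavior.

```python
import random

def try_consensus(prefs, rounds=10, switch_prob=0.5, seed=0):
    """Run majority-adoption rounds. Returns the unanimously agreed
    value, or None if the group never converges within `rounds`."""
    rng = random.Random(seed)
    prefs = list(prefs)
    for _ in range(rounds):
        majority = max(set(prefs), key=prefs.count)
        if prefs.count(majority) == len(prefs):
            return majority          # unanimous: consensus reached
        prefs = [majority if rng.random() < switch_prob else p
                 for p in prefs]
    return None                      # deadlock: no agreement

# Cooperative agents that always adopt the majority do converge...
print(try_consensus(["A", "B", "A"], switch_prob=1.0))  # → A
# ...but stubborn agents never do.
print(try_consensus(["A", "B"], switch_prob=0.0))       # → None
```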
Giorgio Robino retweeted
Sebastián Ramírez @tiangolo
Install the library skills bundled with your dependencies (like FastAPI) for your coding agent 🤖 In Python or Node.js, both versions support both ecosystems ✨ github.com/tiangolo/libra…
Giorgio Robino retweeted
elvis @omarsar0
// Recursive Multi-Agent Systems //

Great read for the weekend. (bookmark it)

Multi-agent systems often pass full text messages between agents at every step. This leads to token bloat, latency, and context dilution, which all grow with the number of agents.

RecursiveMAS asks a different question: what if agents collaborated through recursive computation in a shared latent space, instead of through text?

A multi-agent system can be treated as a recursive computation, where each agent acts like an RLM layer, iteratively passing latent representations to the next and forming a looped interaction process. They introduce a RecursiveLink module that generates latent thoughts and transfers state directly between heterogeneous agents, plus an inner-outer loop learning algorithm with shared gradient-based credit assignment across the team.

Think of it as agents passing notes in their own internal language instead of rewriting everything in English each turn. Less talking, more thinking.

The numbers are strong. Across 9 benchmarks spanning math, science, medicine, search, and code generation: 8.3% average accuracy gain over baselines, 1.2×–2.4× end-to-end inference speedup, and 34.6%–75.6% reduction in token usage.

Why does it matter? If agent-to-agent communication is the next real bottleneck (and it is), latent-space recursion is one of the cleaner ways to scale collaboration without paying a token tax for every coordination step.

Paper: arxiv.org/abs/2604.25917
Learn to build effective AI agents in our academy: academy.dair.ai
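The latent-passing loop can be sketched with stand-in linear "agents" (a toy, not the paper's RecursiveLink module, where each agent would be a full language model): state flows agent-to-agent as vectors, looped over a recursion depth, and nothing is decoded to text until the very end.

```python
def linear_agent(weights):
    """A stand-in 'agent' as a latent-to-latent map (a fixed linear
    layer); it consumes and produces hidden state, never text."""
    def step(latent):
        return [sum(w * x for w, x in zip(row, latent)) for row in weights]
    return step

def recursive_mas(agents, latent, depth):
    """Loop the whole team `depth` times, passing latent state from one
    agent to the next; decode to text only after the final pass."""
    for _ in range(depth):
        for agent in agents:
            latent = agent(latent)
    return latent

agents = [linear_agent([[0.0, 1.0], [1.0, 0.0]]),   # swaps coordinates
          linear_agent([[0.5, 0.0], [0.0, 0.5]])]   # damps the state
out = recursive_mas(agents, [2.0, 4.0], depth=2)
print(out)  # → [0.5, 1.0]
```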
Giorgio Robino retweeted
Mario Zechner @badlogicgames
People of pi.dev. As a weekend gift, we added @XiaomiMiMo Token Plan as a first-class provider. I also made some breaking changes for the better. If you have custom providers and models, point pi at the changelog so it can fix them up for you. This will be a recurring theme in the coming days and weeks. We'll get through it together.
Giorgio Robino retweeted
Qwen @Alibaba_Qwen
Today we’re releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools:

🎯 Inference — Steer model outputs by directly manipulating internal features, no prompt engineering needed
📂 Data — Classify & synthesize targeted data with minimal seed examples, boosting long-tail capabilities
🏋️ Training — Trace code-switching & repetitive generation back to their source, fix them at the root
📊 Evaluation — Analyze feature activation patterns to select smarter benchmarks and cut redundancy

We hope the community uses Qwen-Scope to uncover new mechanisms inside Qwen models and build applications beyond what we explored. Excited to see what you build! 🚀

🔗 Blog: qwen.ai/blog?id=qwen-s…
HuggingFace: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…
Technical Report: …anwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwe…
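Feature steering with an SAE can be sketched in miniature. The dictionary, the feature labels, and the `steer` helper below are illustrative assumptions, not Qwen-Scope's API: encode an activation into feature coefficients, rescale one coefficient, and decode back, keeping the reconstruction residual so everything the SAE didn't capture is preserved.

```python
# Toy SAE with an orthonormal 2-feature dictionary: encoding is a dot
# product with each feature direction, decoding is the weighted sum.
features = [[1.0, 0.0, 0.0],    # feature 0: hypothetical "formal tone"
            [0.0, 1.0, 0.0]]    # feature 1: hypothetical "code-switching"

def encode(activation):
    return [sum(f * a for f, a in zip(feat, activation))
            for feat in features]

def decode(coeffs):
    dim = len(features[0])
    return [sum(c * feat[i] for c, feat in zip(coeffs, features))
            for i in range(dim)]

def steer(activation, feature_idx, scale):
    """Amplify (scale > 1) or suppress (scale < 1) one interpretable
    feature inside a model activation, leaving the residual intact."""
    coeffs = encode(activation)
    residual = [a - d for a, d in zip(activation, decode(coeffs))]
    coeffs[feature_idx] *= scale
    return [d + r for d, r in zip(decode(coeffs), residual)]

# Suppress the hypothetical "code-switching" feature entirely.
print(steer([0.8, 0.3, 0.5], feature_idx=1, scale=0.0))  # → [0.8, 0.0, 0.5]
```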
Giorgio Robino retweeted
David Hendrickson @TeksEdge
☀️ Qwen just dropped something big for personal AI.

✨ They released Qwen-Scope, the first major open Sparse Autoencoder (SAE) toolkit for real models.

💡 Instead of wrestling with prompts, you can now directly steer Qwen models by manipulating internal features.

Why this matters:
🧠 Precise, reliable control when running models locally
🛠️ Fix repetition, hallucinations & bad behaviors at the source
📊 Smarter data synthesis and evaluation
🚀 A real step toward controllable, sovereign personal agents

This is unique: no other top lab has open-sourced practical tools for mechanistic control of open models like this (that I know of).

The future of personal AI isn’t just bigger models. It’s controllable ones. Qwen-Scope just took a huge leap forward. 🔥
Quoting Qwen @Alibaba_Qwen’s Qwen-Scope announcement (above).
Giorgio Robino retweeted
Richard Palethorpe @jichiep
New model release! LocalVQE: Tiny ~1M param audio model that cancels echo, noise and reverberations in real-time and comes with a @ggml_org implementation out of the gate.
Giorgio Robino retweeted
Eric @Ex0byt
I cannot be the only one who noticed this. Qwen just quietly ended black-box AI today. I had to implement it myself just to show y'all how big this is. You can now literally see every concept firing in a model and turn any feature on or off. My Demo on HuggingFace: hf.co/spaces/Ex0bit/…
Giorgio Robino retweeted
Kun Chen @kunchenguid
gnhf 0.1.27+ now supports the Pi agent harness! Thanks to a contribution PR: github.com/kunchenguid/gn…
Giorgio Robino retweeted
antirez @antirez
Europe’s AI strategy should be to specialize in AI inference and in improving large open-weight models, while we try to close the GPU / companies gap to have a viable internal path. A large Chinese open-weight model that works is simply better than a weak European-trained one.