Fede_Ranaldi
@FedeRanaldi

121 posts
I'm Federico Ranaldi, PhD in Data Science at @unitorvergata. My research focuses on LLMs handling Formal Domains spanning from Logic to Code.

Frascati, Rome · Joined March 2023
182 Following · 89 Followers
Yu Wang
Yu Wang@__YuWang__·
We just released a new survey of Agent Memory! We frame agent memory along three orthogonal axes:
• Substrate — where memory lives
• Cognitive mechanism — what role it plays (episodic/semantic/procedural)
• Subject — who it serves (user/agent)
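The three axes above can be sketched as a small data model. All type and field names below are illustrative choices of mine, not terminology taken from the survey:

```python
from dataclasses import dataclass
from enum import Enum

class Substrate(Enum):            # where memory lives
    CONTEXT_WINDOW = "context_window"
    EXTERNAL_STORE = "external_store"
    MODEL_WEIGHTS = "model_weights"

class Mechanism(Enum):            # what role memory plays
    EPISODIC = "episodic"         # specific past events
    SEMANTIC = "semantic"         # general facts
    PROCEDURAL = "procedural"     # skills and how-to knowledge

class Subject(Enum):              # who the memory serves
    USER = "user"
    AGENT = "agent"

@dataclass
class MemoryItem:
    content: str
    substrate: Substrate
    mechanism: Mechanism
    subject: Subject

# Example: a user preference stored outside the context window.
item = MemoryItem("User prefers concise answers",
                  Substrate.EXTERNAL_STORE, Mechanism.SEMANTIC, Subject.USER)
```

The point of the three-axis framing is that these dimensions vary independently: the same external store can hold episodic agent memory or semantic user memory.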
Fede_Ranaldi reposted
Oren Sultan
Oren Sultan@oren_sultan·
Can LLMs reliably predict program termination? We evaluate frontier LLMs in the International Competition on Software Verification (SV-COMP) 2025, directly competing with state-of-the-art verification systems. @AIatMeta @HebrewU @Bloomberg @imperialcollege @ucl @jordiae @pascalkesseli @jvanegue @HyadataLab @adiyossLC @PeterOHearn12 Paper: arxiv.org/pdf/2601.18987 Website: orensultan.com/llms_halting_p… 🧵👇 1/n
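As a reminder of why termination prediction is hard even for tiny programs (this example is mine, not from the paper): the Collatz iteration below is conjectured, but not proven, to reach 1 for every positive integer, so even a one-line loop can defeat termination analysis.

```python
from typing import Optional

def collatz_steps(n: int, max_steps: int = 10_000) -> Optional[int]:
    """Count steps for n to reach 1, or None if the budget runs out.

    Whether this loop terminates for *every* positive n is the open
    Collatz conjecture, which illustrates how hard deciding halting
    can be, for verifiers and LLMs alike.
    """
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None  # budget exhausted; termination undecided
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(27))  # 111: a famously long trajectory for a small start
```

SV-COMP benchmarks pose exactly this kind of question over C programs, which is what makes a head-to-head comparison between LLMs and dedicated verifiers interesting.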
Kingsley Uyi Idehen
Kingsley Uyi Idehen@kidehen·
@JayaGup10 Semantic Web == Context / Knowledge Graphs. And yes, it has always been a 1 Trillion+ opportunity waiting for its perfect complement in the form of LLMs 😀
Jaya Gupta
Jaya Gupta@JayaGup10·
Before LLMs, Palantir was competing with Snowflake and Databricks. Post-LLMs, they do not believe they have any competitors. Why?

Snowflake/Databricks optimized for SQL and query throughput: get raw data into tables, run fast analytical reads, ship dashboards and models on top.

Palantir made a different bet: an ontology, a world model where data is represented the way humans actually reason about it (objects, relationships, properties; nouns/verbs/adjectives). Back then, that was built for government analysts trying to make sense of messy, interdependent systems.

Then LLMs arrived, and the ontology suddenly looked like the perfect interface, because models don’t want a trillion rows. They want a structured, language-shaped substrate: named entities, typed relationships, constraints, and “what interacts with what”, something you can linearize into a coherent prompt, traverse, and act on.

The bigger implication for decision traces is that the “context graph” problem we wrote about has multiple architectural solutions:

Platform-first (example: Palantir): prescribe the unified world model upfront. Pay the integration + ontology + embedded-team tax (months of use-case discovery / workflow decomposition / “process mining”), and in return you get a substrate that can connect data to decisions, because everything now lives inside the same model, for an extremely steep price.

Workflow-first (decision traces): don’t start by rebuilding the world. Instrument the moments where the world changes. Capture decision receipts at commit surfaces: inputs referenced, policy/constraints, exception path, approvals, action taken, outcome. Over time (not day 1), that write-time provenance becomes its own world model, learned from trajectories rather than imposed upfront (there will be many different methods here).

And importantly: this is still an ontology approach, just a different kind. Palantir prescribes the ontology first. Our take is that startups can learn it bottom-up from traces.
You start by capturing what people actually do at the decision surface: what evidence is referenced, which approvals happen, what exceptions recur, what actions are taken, what outcomes follow. Over time, you infer the minimal set of entities + relations that explain those trajectories. The missing piece is decision traces: without them, you have state, but not the legible “why”! Cc @akoratana
Jaya Gupta@JayaGup10

x.com/i/article/2003…

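The "decision receipt" in the workflow-first option can be sketched as a minimal record type. The field names below are my reading of the categories the thread lists (inputs referenced, policy/constraints, exception path, approvals, action, outcome), not an actual product schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class DecisionReceipt:
    """Write-time provenance captured at a commit surface."""
    inputs_referenced: List[str]          # evidence consulted
    policy: str                           # constraint or rule applied
    approvals: List[str]                  # who signed off
    action: str                           # what was actually done
    exception_path: Optional[str] = None  # set when the normal flow was bypassed
    outcome: Optional[str] = None         # filled in after the fact
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Hypothetical receipt for a payments workflow.
receipt = DecisionReceipt(
    inputs_referenced=["invoice_4411", "vendor_risk_score"],
    policy="payments over $10k require two approvals",
    approvals=["alice", "bob"],
    action="release_payment",
)
```

The bottom-up ontology argument is then that a large collection of such receipts, rather than an upfront model, is what you mine for the entities and relations that explain the observed trajectories.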
𝖋𝖎𝖗𝖔𝖟
𝖋𝖎𝖗𝖔𝖟@firozkhxn_·
KGs + LLMs are exploding—hard to track every new idea. One repo collects 200+ recent papers, surveys & code on marrying knowledge graphs with large language models. Saves weeks of literature crawl. github.com/zjukg/KG-LLM-P…
Fede_Ranaldi reposted
Stella Li
Stella Li@StellaLisy·
🤔💭What even is reasoning? It's time to answer the hard questions! We built the first unified taxonomy of 28 cognitive elements underlying reasoning Spoiler—LLMs commonly employ sequential reasoning, rarely self-awareness, and often fail to use correct reasoning structures🧠
Fede_Ranaldi
Fede_Ranaldi@FedeRanaldi·
@WikiResearch Hi! Sharing our work, in which we analyze how LLMs leverage the parametric knowledge acquired during pretraining to recall structured information, such as pieces of Knowledge Graphs, and to solve downstream tasks like #TextToSPARQL. arxiv.org/abs/2505.15501
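For readers unfamiliar with the task, a generic Text-to-SPARQL pair looks like the following. This is a DBpedia-style example of my own, not drawn from the paper, and the property names are illustrative:

```python
# Input: a natural-language question over a knowledge graph.
question = "Which rivers flow through Rome?"

# Target: a SPARQL query the model must produce for that question.
sparql = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?river WHERE {
  ?river a dbo:River ;
         dbo:city dbr:Rome .
}
"""
```

The interesting question the paper asks is whether the model can produce such queries by recalling graph structure it memorized during pretraining, rather than only from schema given in the prompt.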
Haider.
Haider.@slow_developer·
important research paper from google...

"LLMs don't just memorize, they build a geometric map that helps them reason"

according to the paper:
– builds a global map from only local pairs
– plans full unseen paths when knowledge is in weights; fails in context
– turns a many-step path into a 1-step pick
– comes from a natural training bias; room to make memory more geometric
Harish Tayyar Madabushi
Is In-Context Learning (ICL) in LLMs Memorisation? Emergence? Some Algorithmic Capability? 🤔 📢New work exploring ICL in LLMs: arxiv.org/abs/2505.11004 💡Key Finding: ICL capabilities are linked to token frequency 🤨 Strap in for the unexpected 🤯 A 🧵👇 #NLProc #LLMs
Fede_Ranaldi reposted
Adi Simhi
Adi Simhi@AdiSimhi·
🤔What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic or prefer to avoid human harm? 🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs🚀🧵
Fede_Ranaldi
Fede_Ranaldi@FedeRanaldi·
@denizbayazit Good work! We ran a more static analysis to locate typological features in monolingual BERTs. Our method tests whether syntactic and morphological differences between languages are reflected in their monolingual models. aclanthology.org/2023.findings-…
Deniz Bayazit
Deniz Bayazit@denizbayazit·
1/🚨 New preprint How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks. #interpretability
Anton Tsitsulin
Anton Tsitsulin@graph_·
𝐋𝐞𝐭 𝐘𝐨𝐮𝐫 𝐆𝐫𝐚𝐩𝐡 𝐃𝐨 𝐭𝐡𝐞 𝐓𝐚𝐥𝐤𝐢𝐧𝐠: 𝐄𝐧𝐜𝐨𝐝𝐢𝐧𝐠 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝 𝐃𝐚𝐭𝐚 𝐟𝐨𝐫 𝐋𝐋𝐌𝐬 Don't know what to do with your graphs in 2024? Shove them to an LLM, of course, and let LLM figure out what to do! arxiv.org/abs/2402.05862 Short thread (1/5):
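A common baseline in this line of work is to linearize a graph into text before prompting an LLM. The toy edge-list encoder below is my sketch of that general idea, not the encoding method the paper proposes:

```python
from typing import List, Tuple

def linearize(edges: List[Tuple[str, str]]) -> str:
    """Turn an undirected edge list into a plain-text graph description."""
    nodes = sorted({v for edge in edges for v in edge})
    lines = [f"Nodes: {', '.join(nodes)}."]
    lines += [f"{u} is connected to {v}." for u, v in edges]
    return "\n".join(lines)

# A graph prompt an LLM could then be asked questions about.
prompt = linearize([("Alice", "Bob"), ("Bob", "Carol")])
print(prompt)
```

Papers in this area compare many such encodings (edge lists, adjacency descriptions, learned graph embeddings) precisely because LLM performance turns out to be sensitive to the choice.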
François Chollet
François Chollet@fchollet·
I believe that program synthesis will solve reasoning. And I believe that deep learning will solve program synthesis (by guiding a discrete program search process). But I don't think you can go all that far with just prompting a LLM to generate end-to-end Python programs (even with a verification step and many samples). That won't scale to very long programs.
Casper Hansen
Casper Hansen@casper_hansen_·
i don't think you understand

you train on Text-to-SQL
it gets 73.7% on Spider

you train on Text-to-Cypher
it gets 5.5% exact match

you realize they share abstractions
schema grounding, joins, filtering

you train on both
Spider jumps to 76.5%
Text2Cypher hits 6.2%

you add Chain-of-Thought
it starts explaining its queries

you add reinforcement learning
it learns from execution feedback

you design a topology-aware reward
it understands graph edit distance

you realize SQL helps Cypher and Cypher helps SQL
cross-formalism transfer is real

you test on MongoDB Query Language
never trained on it
it still works

you test on table QA
never trained on it
62.5% exact match

you test on knowledge graph QA
never trained on it
86.3% accuracy

you realize you didn't train a SQL model
you didn't train a Cypher model
you trained structured reasoning itself

someone says "but it's just 32B parameters"
you point to QwQ-32B-trained-Both
it beats o3 on Text2Cypher

they say "but the datasets overlap"
you show the ablations
even single-task training transfers

they say "but it's not real understanding"
your model generalizes to unseen query languages
they go quiet

you realize you didn't build a parser
you built a bridge between natural language and every structured formalism that will ever exist

tell me you hate this type of post, but do read the paper🔽 arxiv.org/abs/2506.21575
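To make the "shared abstractions" point concrete, here is the same question expressed in SQL and in Cypher over an invented schema (my example, not from the paper): both need schema grounding, a join/traversal, and a filter, which is the common structure the thread argues transfers across formalisms.

```python
question = "Names of employees who work in the Sales department"

# Relational formalism: the relationship lives in a foreign-key JOIN.
sql = """
SELECT e.name
FROM employees e
JOIN departments d ON e.dept_id = d.id
WHERE d.name = 'Sales';
"""

# Graph formalism: the same relationship is an explicit typed edge.
cypher = """
MATCH (e:Employee)-[:WORKS_IN]->(d:Department)
WHERE d.name = 'Sales'
RETURN e.name;
"""
```

Seen side by side, the filter (`d.name = 'Sales'`) and the projection (`e.name`) are identical; only the traversal syntax differs, which is why training on one formalism can plausibly help the other.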
Fede_Ranaldi
Fede_Ranaldi@FedeRanaldi·
@TomSheffer17807 @l2m2_workshop @aclmeeting Thank you, Tom. Our research certainly deserves to be extended on several levels, such as studying internal representations or applying adaptation techniques through mechanistic interpretability methods.
Fede_Ranaldi
Fede_Ranaldi@FedeRanaldi·
@taskinfatih @l2m2_workshop @aclmeeting Thank you for the exchange of ideas. I consider it quite appropriate to invest time in formalizing task instances from a theoretical perspective, rather than running experiments and evaluating model capabilities just by looking at input-output behavior.
Fede_Ranaldi
Fede_Ranaldi@FedeRanaldi·
@taskinfatih @l2m2_workshop @aclmeeting Hi Fatih. Our idea is related to Chollet's program synthesis and is specifically adapted to reasoning over #knowledgegraphs. We adopt a neuro-symbolic approach aimed at evaluating models' real capabilities and detecting potential benchmark contamination.