Celian Ringwald

883 posts

@ringwald_c

Datartisan working with knowledge, graphs and texts - PhD student at INRIA, 3IA, CNRS, I3S, UCA

@[email protected] · Joined March 2015
959 Following · 334 Followers
Celian Ringwald retweeted
Gallicagram
Gallicagram@gallicagram·
🔴Gallicagram v2, here we are!🔴 New React interface that is much more stable and fast, new corpora, contextual search, cross-corpus comparisons, section filter, bilingual, infinite scrolling... we explain it all! gallicagram.com
1 · 11 · 17 · 2.1K
Celian Ringwald
Celian Ringwald@ringwald_c·
@zehavoc I recently had an experience with DeepL where it proposed a full sentence when I wanted to translate a single word from French to English…
0 · 0 · 1 · 22
Djamé..
Djamé..@zehavoc·
Weird... I was used to seeing Google search hallucinate a bit in these "research syntheses", but not in the links themselves. Look at the third link. I was like "Quake 3? Cool", but the paper only talks about some poker game #Enshitification
[Djamé tweet media]
1 · 0 · 0 · 87
Celian Ringwald
Celian Ringwald@ringwald_c·
My PhD thesis is almost finished, so I wrote an article on Medium to explain it in simple terms. "Once upon a time, there was a Kastor 🦫 (semantic beaver) in the forest of Wikipedia knowledge..." medium.com/@3iacotedazur/semantic-relation-extraction-with-frugal-language-models-3314db2559a6 #SLM #KG #RelationExtraction #SHACL #NLP #SemanticWeb
0 · 0 · 1 · 112
Celian Ringwald
Celian Ringwald@ringwald_c·
-Echos from the room of text, triples & constraints- Really excited to visit the KCL Knowledge Graphs Lab over the next two weeks! A stay that sounds like a hyperbolic time chamber, to discuss and join forces on LLM/Constraints RQs, before writing the last pages of my PhD
[Celian Ringwald tweet media]
0 · 0 · 1 · 119
Alexander Doria
Alexander Doria@Dorialexander·
Oulipo-styled benchmark: write a story without the letter e. So far Gemini winning vs. OpenAI. And generally better than I would expect given tokenization.
[Alexander Doria tweet media]
4 · 0 · 6 · 1.4K
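Constraints like this lipogram are attractive as benchmarks precisely because they are mechanically checkable: a few lines of Python can verify the "no letter e" rule with no judge model needed. A minimal checker (the function name is illustrative, not from any benchmark):

```python
def is_lipogram(text: str, banned: str = "e") -> bool:
    """True if the text contains none of the banned letters (case-insensitive)."""
    lowered = text.lower()
    return not any(ch in lowered for ch in banned.lower())

# A valid lipogram in "e" and an invalid one:
print(is_lipogram("A fox ran far from that hill."))  # True
print(is_lipogram("The fox ran."))                   # False ("The" contains an e)
```

A real harness would also have to score fluency and story quality, which is the hard (and model-judged) part; the constraint check is the only objective piece.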
Celian Ringwald retweeted
Wimmics Team
Wimmics Team@wimmics·
LLM & Linked Data by Fabien Gandon, Inria @W3C AC 2025 member meeting. He highlighted how Linked Data standards enable knowledge extraction, sharing, and machine learning across domains like robotics, culture, medicine, and chemistry. youtu.be/CVFhPYTVBlI?si…
[YouTube video]
0 · 3 · 7 · 460
Celian Ringwald
Celian Ringwald@ringwald_c·
I am looking forward to #ESWC2025 in Portorož next week! The occasion to present our latest work: Kastor 🦫, a framework that includes a human in the loop and combines #KG and #SLMs to produce #RDF and shape-based relation extractors. > Author version: hal.science/hal-05078493v1
[Celian Ringwald tweet media]
0 · 2 · 11 · 371
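The linked author version describes Kastor's actual method; purely as an illustration of the general idea of shape-based validation of extracted relations, here is a toy Python sketch. The shape format, function name, and predicates below are hypothetical assumptions for illustration, not Kastor's API or schema:

```python
# Toy sketch: validate extracted (subject, predicate, object) triples
# against a SHACL-like "shape" listing required and allowed predicates.
# Shape format and predicate names are hypothetical, not Kastor's schema.

def validate_against_shape(triples, shape):
    """Return (conforms, problems) for one entity's extracted triples."""
    predicates = {p for (_, p, _) in triples}
    problems = []
    for required in shape["required"]:
        if required not in predicates:
            problems.append(f"missing required predicate: {required}")
    for p in predicates:
        if p not in shape["required"] and p not in shape["optional"]:
            problems.append(f"predicate not allowed by shape: {p}")
    return (not problems, problems)

person_shape = {
    "required": {"dbo:birthDate"},
    "optional": {"dbo:birthPlace", "dbo:occupation"},
}

extracted = [
    ("dbr:Ada_Lovelace", "dbo:birthDate", "1815-12-10"),
    ("dbr:Ada_Lovelace", "dbo:parent", "dbr:Lord_Byron"),
]

ok, issues = validate_against_shape(extracted, person_shape)
print(ok)      # False: dbo:parent is not permitted by the shape
print(issues)
```

The appeal of the shape-based framing is that the target schema constrains the extractor's output space and gives a cheap automatic reject signal, which is useful when keeping a human in the loop.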
Celian Ringwald
Celian Ringwald@ringwald_c·
@debayan woo! I didn't realize it was really possible and acceptable to do this: youtube.com/watch?v=1kwbp8… In this talk the Sakana team proposes creating an AI-generated paper track, maybe a better idea than bothering reviewers in regular tracks
[YouTube video]
0 · 0 · 0 · 30
debayan
debayan@debayan·
@ringwald_c I see! I have a feeling I am reviewing some AI written papers!
1 · 0 · 0 · 35
debayan
debayan@debayan·
Is AI capable of writing code that is SoTA on a dataset, and also the paper supporting the code?
1 · 0 · 0 · 121
Celian Ringwald
Celian Ringwald@ringwald_c·
@debayan Really good questions. I was thinking about small models, but with LLMs this is even harder to evaluate 😶‍🌫️ The high performance of these models sometimes makes us forget this potential memorisation/generalisation illusion
0 · 0 · 0 · 37
debayan
debayan@debayan·
@ringwald_c Yes, but should we consider date overlap as a factor during review? Should authors really try to use an LLM that predates the dataset release, to be completely scientifically sound from an evaluation perspective?
1 · 0 · 1 · 44
debayan
debayan@debayan·
When reviewing a paper that uses an LLM backend for a popular QA dataset, how do I exclude the possibility of test-set leakage, as the LLM may have memorised the dataset during pre-training?
1 · 0 · 1 · 120
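One partial, surface-level signal a reviewer could ask for is an n-gram overlap check between the benchmark's test split and whatever pre-training data is inspectable. This toy sketch illustrates the idea; the function names and 5-gram window are assumptions, and verbatim overlap cannot rule out paraphrased memorisation, so it is at best a lower bound on leakage:

```python
def ngrams(text: str, n: int = 5) -> set:
    """All word n-grams of a text, lowercased."""
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_score(test_example: str, corpus_docs: list, n: int = 5) -> float:
    """Fraction of the test example's n-grams found verbatim in the corpus."""
    test = ngrams(test_example, n)
    if not test:
        return 0.0
    corpus = set()
    for doc in corpus_docs:
        corpus |= ngrams(doc, n)
    return len(test & corpus) / len(test)

# A QA test question whose wording appears verbatim in the (toy) corpus:
corpus = ["who wrote the novel moby dick answer herman melville"]
print(overlap_score("who wrote the novel moby dick", corpus))  # 1.0
```

In practice this only works when a tool like an index over the pre-training corpus is available, which is exactly what closed-model papers usually cannot provide, hence the reviewing dilemma in the thread.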
Celian Ringwald
Celian Ringwald@ringwald_c·
Just received my #LookingForAPostDoc t-shirt :) See you soon to talk about shapes, RDF-based relation extraction, and Kastor (beavers) in a wood of knowledge.
[Celian Ringwald tweet media]
0 · 1 · 8 · 245
Celian Ringwald retweeted
Andrej Karpathy
Andrej Karpathy@karpathy·
When working with LLMs I am used to starting "New Conversation" for each request. But there is also the polar opposite approach of keeping one giant conversation going forever. The standard approach can still choose to use a Memory tool to write things down in between conversations (e.g. ChatGPT does so), so the "One Thread" approach can be seen as the extreme special case of using memory always and for everything.

The other day I came across someone saying that their conversation with Grok (which was free to them at the time) has now grown way too long for them to switch to ChatGPT, i.e. it functions like a moat, hah.

LLMs are rapidly growing in the allowed maximum context length *in principle*, and it's clear that this might allow the LLM to have a lot more context and knowledge of you, but there are some caveats. A few of the major ones as an example:

- Speed. A giant context window will cost more compute and will be slower.
- Ability. Just because you can feed in all those tokens doesn't mean that they can also be manipulated effectively by the LLM's attention and its in-context-learning mechanism for problem solving (the simplest demonstration is the "needle in the haystack" eval).
- Signal to noise. Too many tokens fighting for attention may *decrease* performance by being too "distracting", diffusing attention too broadly and decreasing the signal-to-noise ratio in the features.
- Data, i.e. train-test data mismatch. Most of the training data in the finetuning conversations is likely ~short. Indeed, a large fraction of it in academic datasets is often single-turn (one single question -> answer). One giant conversation forces the LLM into a new data distribution it hasn't seen much of during training. This is in large part because...
- Data labeling. Keep in mind that LLMs still primarily and quite fundamentally rely on human supervision. A human labeler (or an engineer) can understand a short conversation and write optimal responses or rank them, or inspect whether an LLM judge is getting things right. But things grind to a halt with giant conversations. Who is supposed to write or inspect an alleged "optimal response" for a conversation of a few hundred thousand tokens?

Certainly, it's not clear if an LLM should have a "New Conversation" button at all in the long run. It feels a bit like an internal implementation detail that is surfaced to the user for developer convenience and for the time being. And the right solution is a very well-implemented memory feature, along the lines of active, agentic context management; something I haven't really seen at all so far.

Anyway, curious to poll if people have tried One Thread and what the word is.
668 · 551 · 6.6K · 829.3K
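The Memory-tool middle ground described above, writing durable notes between otherwise-fresh threads, can be sketched in a few lines of Python. This is a toy illustration of the pattern only; the class and message format are assumptions, not how ChatGPT's or any product's memory actually works:

```python
class MemoryStore:
    """Toy cross-conversation memory: facts written down in one thread
    are injected into the system prompt of the next fresh thread,
    instead of carrying the full giant history around."""

    def __init__(self):
        self.notes = []

    def remember(self, fact: str) -> None:
        self.notes.append(fact)

    def new_conversation(self, system_prompt: str) -> list:
        # Start a fresh (short) thread, seeded with the durable notes.
        content = system_prompt
        if self.notes:
            content += "\n\nKnown about the user:\n" + "\n".join(
                f"- {note}" for note in self.notes
            )
        return [{"role": "system", "content": content}]

mem = MemoryStore()
mem.remember("prefers short answers")
messages = mem.new_conversation("You are a helpful assistant.")
print(messages[0]["content"])
```

The "active, agentic context management" the tweet asks for would go further: the model itself deciding what to write, rewrite, or forget, rather than a static append-only list like this one.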