Sten Rüdiger

1.6K posts

@StenRuediger

Built a pandemic forecasting system used at German chancellery level, turned it into revenue + DS team. Now building continual learning for LLMs.

Berlin, Germany · Joined November 2018

805 Following · 616 Followers

Pinned Tweet

Sten Rüdiger@StenRuediger·Apr 8

I’ve uploaded a new paper on arXiv (co-authored by @rasbt): "MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning". In Parameter-Efficient Fine-Tuning, a key question may not just be how low-rank the update is, but *which* subspace we adapt.

19.2K

Sten Rüdiger@StenRuediger·15h

RL trains LLMs to improve before deployment. But can we also train LLMs to improve while they are being used? Not just learn how to pick the right tool. But learn how to store useful corrections, retrieve them later, and change behavior across future tasks. That is what I want to discuss in this article and the next one.

Sten Rüdiger@StenRuediger

x.com/i/article/2050…

Sten Rüdiger@StenRuediger·23h

@GaryMarcus @ylecun @SchmidhuberAI 🍿

Gary Marcus@GaryMarcus·1d

Literally been saying this for years. @ylecun (who once trashed me for saying stuff like this) has become a carbon copy of me. He has done this so regularly, and without acknowledgement, that it has become hard for me not to see him as a thief. @SchmidhuberAI’s experiences have of course been similar. The media should stop glorifying LeCun. And they should start looking into his past.

CG@cgtwts

Yann LeCun: “The AI industry is completely LLM-pilled. Everybody is working on the same thing. They're all digging the same trench. Meta also became LLM-pilled with sort of recent reshuffling. AI companies are all doing the same things.”

235

26.2K

Sten Rüdiger@StenRuediger·1d

Wow. This is exactly what I was just searching for! The next logical step is to train LLMs directly on these tasks. I.e. not just isolated tool calls, but full sequences of tasks where the model learns when to remember, when to retrieve, and when to act.

Parth Asawa@pgasawa

Today, we’re releasing Continual Learning Bench 1.0: the first, realistic benchmark for measuring how AI systems can improve in online settings. Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened. But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and find there’s still plenty of headroom for learning. (1/n)

Sten Rüdiger@StenRuediger·1d

Palantir is valued at $345B. I built their entire application in two weeks and I'm making it open-source and free for everyone to use.

Sten Rüdiger@StenRuediger·3d

x.com/i/article/2050…

Sten Rüdiger@StenRuediger·5d

@vasuman Only correct because the loop isn’t closed yet for knowledge work.

139

vas@vasuman·6d

x.com/i/article/2020…

116

990

479.9K

Sten Rüdiger retweeted

Sebastian Raschka@rasbt·Apr 26

April was a pretty strong month for LLM releases:
- Gemma 4
- GLM-5.1
- Qwen3.6
- Kimi K2.6
- DeepSeek V4
All are now added to the LLM Architecture Gallery. More details once I am fully back in May!

440

121.4K

Sten Rüdiger@StenRuediger·Apr 26

@sirbayes @enjeeneer Interesting that structured context at every step beats just appending retrieved knowledge. I'd have expected a negative effect of leaving out knowledge too early. Maybe the growing transcript exceeds the model's effective attention span, as @sirbayes seems to suggest?

Kevin Patrick Murphy@sirbayes·Apr 22

@enjeeneer Yes, the sequential search then update is much better than batch search, not surprisingly. What is more surprising is that adding the structured belief state to the context helps guide the agent to better actions - as I hoped, but I was not sure it would help this much :)

400

Sten Rüdiger retweeted

Kevin Patrick Murphy@sirbayes·Apr 22

New paper: "Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs". Our system (BLF) matches human superforecasters on ForecastBench, and beats all the top methods (GPT-5, Cassi, Grok 4.20, and Foresight-32B). 🧵

210

28.1K

Sten Rüdiger retweeted

Andrew Carr 🤸@andrew_n_carr·Apr 24

Language models aren't able to output raw probabilities, so they can't be used for forecasting...unless you're Kevin Murphy. Then you make a structured output and use Platt scaling to calibrate and you get SOTA on forecast bench

Kevin Patrick Murphy@sirbayes

@rsalakhu @subail @chuckjhoover @asenkut @FHaskaraman Congrats! BTW you might find my recent paper of interest... arxiv.org/abs/2604.18576

582

66K
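
For readers unfamiliar with the Platt scaling mentioned in the tweet above, here is a minimal sketch of the general technique, not the implementation from the linked paper. It fits two parameters `a` and `b` so that `sigmoid(a * logit(p) + b)` minimizes log loss on past forecasts. The data, the function names, and the use of plain gradient descent are all assumptions made for illustration.

```python
import math

def logit(p, eps=1e-6):
    """Map a probability to log-odds, clamping away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_platt(raw_probs, outcomes, lr=0.1, steps=2000):
    """Fit a, b by gradient descent on the mean log loss of
    sigmoid(a * logit(p) + b) against binary outcomes."""
    xs = [logit(p) for p in raw_probs]
    a, b = 1.0, 0.0  # start from the identity mapping
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, outcomes):
            err = sigmoid(a * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(p, a, b):
    """Apply the fitted Platt mapping to a raw probability."""
    return sigmoid(a * logit(p) + b)

# Hypothetical, overconfident forecaster: it says 0.95 or 0.05,
# but is right only 70% of the time in each group.
raw_probs = [0.95] * 10 + [0.05] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7

a, b = fit_platt(raw_probs, outcomes)
print(round(calibrate(0.95, a, b), 2))  # about 0.7, the observed hit rate
print(round(calibrate(0.05, a, b), 2))  # about 0.3
```

The fitted slope ends up below 1, which pulls the overconfident forecasts toward the observed frequencies. In practice a library routine such as a logistic regression fit would replace the hand-written loop.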

Sten Rüdiger@StenRuediger·Apr 22

The new GPT-image model produces neat research summaries! And yes, you can take a look at the paper too 😀 arxiv.org/abs/2604.01694

Sten Rüdiger@StenRuediger·Apr 21

Am I the only one having a problem with the lack of useful inference endpoints on EU-headquartered cloud?

legalgenius@KIJurist

EU-sovereign inference for agentic LLMs in 2026? I benchmarked the main options so you don't have to. For reference, on US/China data centers the best OSS model scores 91 and Opus-4.7 scores 92.
1/ @nebiusai: used to be a go-to. They just removed the best open-source models from their EU-operated DCs. Out.
2/ Mistral Large 2512 (@MistralAI's current flagship): timed out on 3/10 sample questions. The rest averaged 31/100. Unusable.
3/ @Scaleway: serves Qwen3.5-397B at 80/100. Steep premium over Alibaba's own hosting, but it actually works.
Winner: Scaleway's Qwen3.5-397B, but only by default.

113

Sten Rüdiger retweeted

Peter Ottsjö@peterottsjo·Apr 19

OpenAI's Rosalind model and Novo Nordisk partnership is just a snapshot of a much bigger AI x bio story. Here’s what you see when you zoom out - and why it’s all happening right here and now. 🧵 (1/7)

105

20.2K

Sten Rüdiger retweeted

legalgenius@KIJurist·Apr 20

Really impressed by the recent updates to Opus 4.7 and GLM 5.1, which made agentic search and reasoning for German legal research just hit a new high. legalgenius.de

Sten Rüdiger@StenRuediger·Apr 19

@MichaelAArouet Looks suspiciously like the result of the Euro introduction in 1999

Michael A. Arouet@MichaelAArouet·Apr 18

25 years ago, the US and Germany had similar labor productivity. Germany was a global industrial powerhouse. Then Germany followed the left-green path of overregulation, bureaucracy, energy madness and redistribution, and became the sick man of Europe. Don’t be like Germany.

266

1.9K

6.6K

380.4K

Sten Rüdiger@StenRuediger·Apr 15

@CheapAIToken @rasbt Thanks! Working on the inclusion into the PEFT library.

Cheap AI Token@CheapAIToken·Apr 14

@StenRuediger @rasbt Really cool idea! Targeting underutilized subspaces makes sense for new knowledge. MiCA outperforming LoRA is impressive! Any plans to release code?

Sten Rüdiger@StenRuediger·Apr 15

Great view on continual learning. I actually started working on MiCA to tackle this and catastrophic forgetting. There are early signs it helps through: i) stronger uptake of new knowledge ii) less degradation on general knowledge benchmarks

Ilija Lichkovski@carnot_cyclist

x.com/i/article/2041…

222

Sten Rüdiger@StenRuediger·Apr 15

No. Skills are a workaround for the lack of continual learning, and they often overfit. Agents can figure out how an API/tool works, but it costs tokens, and they can’t reliably decide which results should be stored. Until continual learning is solved, that burden sits with us writing skills.

Garry Tan@garrytan

x.com/i/article/2042…

149

Sten Rüdiger@StenRuediger·Apr 13

@garrytan Skills may simply be overfitting: arxiv.org/abs/2604.04323

371