Timur

92 posts

@timurcarstensen

ML PhD at the ELLIS Institute in Tübingen, supervised by Frank Hutter

Tübingen, Germany · Joined August 2013

158 Following · 41 Followers

Timur retweeted

Jonas Geiping @jonasgeiping · 5d

We’re training models wrong and it’s due to ChatGPT. Even the modern coding agents used daily still rely on message-based exchanges: they send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information. In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool-use UX, removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-stream LLMs are fast: they can predict and read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.
Does this sound related to a recent thinky post? :) Yes, but I don’t feel so bad about being outshipped by 23 hours, given such a cool report on their side. I’ll link a second thread below with a more direct comparison. I actually think both are complementary in interesting ways.

Timur retweeted

Konstantin Dobler ✈️ ICLR @konstantdobler · 9 May

@Hesamation Better version: no arbitrary institution cutoff, some data cleaning, and each paper's contribution split among its institutions. China + USA dominant of course, but it looks a bit different, doesn't it?

Timur retweeted

Frank Hutter @FrankRHutter · 4 May

Huge news: @prior_labs has signed a definitive agreement to be acquired by @SAP. €1B+ invested over four years to build a globally-leading frontier AI lab for structured data — in Europe, in the open. Independent entity. Same team, same mission, same open models. A massive boost to what we can do. The mission just got accelerated. Founders’ statement: priorlabs.ai/blog-posts/pri… (Deal subject to regulatory approval; terms not disclosed.)

Timur retweeted

Neehal Tumma @ntumm120 · 27 Apr

Some say Gated DeltaNet > Mamba-2. Others say Mamba-2 > Gated DeltaNet. But what if Gated DeltaNet = Mamba-2? 👀 Well maybe not exactly — but with least-squares preconditioning, we show that they reduce to the same recurrence! We use this lens to design PDN, PGDN, and PKDA: preconditioned delta-style recurrences that outperform their unpreconditioned counterparts at scale 📈 📄 Paper: arxiv.org/abs/2604.21100 w/ @loo_noel @liquidai 💻 Code: github.com/ntumm120/preco…
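For readers comparing the two recurrences, here is a minimal sketch of one per-token state update for each, in the forms we understand to be standard: Mamba-2 decays the state by a scalar α and adds vkᵀ, while Gated DeltaNet decays, erases along k with strength β, then writes βvkᵀ. This is an illustration under those assumptions, not code from the linked paper or repository; the least-squares preconditioning it describes is not reproduced, and all function names here are hypothetical.

```python
import numpy as np


def mamba2_step(S, k, v, alpha):
    """Mamba-2-style update (sketch): S_t = alpha_t * S_{t-1} + v_t k_t^T.

    S has shape (d_v, d_k); alpha is a scalar decay in (0, 1].
    """
    return alpha * S + np.outer(v, k)


def gated_deltanet_step(S, k, v, alpha, beta):
    """Gated DeltaNet-style update (sketch):
    S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T.
    """
    d_k = k.shape[0]
    erase = np.eye(d_k) - beta * np.outer(k, k)  # removes the old value stored along k
    return alpha * S @ erase + beta * np.outer(v, k)


def readout(S, q):
    """Linear-attention-style readout: o_t = S_t q_t."""
    return S @ q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_k, d_v = 4, 3
    S_prev = rng.normal(size=(d_v, d_k))
    k = rng.normal(size=d_k)
    k /= np.linalg.norm(k)  # assumption: unit-norm keys, as in delta-rule models
    v = rng.normal(size=d_v)

    S_delta = gated_deltanet_step(S_prev, k, v, alpha=1.0, beta=1.0)
    S_mamba = mamba2_step(S_prev, k, v, alpha=1.0)
    print(np.allclose(readout(S_delta, k), v))               # True: old value overwritten
    print(np.allclose(readout(S_mamba, k), S_prev @ k + v))  # True: new value added
```

With α = 1, β = 1 and a unit-norm key, the delta-rule state read at k returns exactly v (the old association is overwritten), whereas the Mamba-2-style state returns S_prev·k + v (the new value is accumulated on top).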

Timur retweeted

Sajad Movahedi @Sajad_Movahedi_ · 23 Apr

The wonderful @timurcarstensen is presenting our paper at #iclr2026 about RoPE, RNNs/SSMs, and softmax attn: Selective Rotary Position Embedding, with @timurcarstensen, @rshia_afz, @orvieto_antonio, and collaborators. Apr. 23rd 10:30 am Pavilion 3 Poster #613 Please stop by!

Timur retweeted

Ameya P. @AmyPrb · 10 Mar

📢 I’m on the job market 📢 My work has been around post-training LLMs that can discover what we *don’t know* yet! This includes: LM agents that reason over long horizons, continually learn from experience & can forecast outcomes of actions. Website: ameya.prabhu.be

Timur retweeted

Sajad Movahedi @Sajad_Movahedi_ · 4 Dec

Happy to present our NeurIPS Spotlight paper with @__safelix__ , @MucaCirone, and @orvieto_antonio: Fixed-Point RNNs: Interpolating from Diagonal to Dense. Here's a summary of the paper...

Timur retweeted

Vladyslav Moroshan @vlad_moroshan · 5 Nov

Thrilled to share our new paper, TempoPFN! 🚀 TempoPFN is a new foundation model trained ENTIRELY on synthetic data. Most Time Series models use massive, proprietary real-world datasets. We asked: Can we compete with just a Linear RNN and 100% fake data? (Spoiler: yes)

Timur retweeted

Julien Siems @julien_siems · 21 Sep

Accepted at NeurIPS 2025, come see us in San Diego to discuss linear RNNs!

Julien Siems @julien_siems

⚡DeltaProduct update with new results:
- Characterization of DeltaProduct’s state-tracking ability
- Inspection of the hidden state’s effective rank, which sheds light on why DeltaProduct extrapolates better to longer sequences than DeltaNet
- Improved scaling analysis
And more!

Timur retweeted

ELLIS Institute Tübingen @ELLISInst_Tue · 24 Jun

Join our mission to strengthen AI research in Europe 🇪🇺 We are looking for several ML Research Engineers and Scientists to work on OpenEuroLLM at the ELLIS Institute Tübingen. If you're passionate about large-scale model training and multilingual evaluation and want to contribute to cutting-edge open-source AI, we'd love to hear from you!
📍 Based in Tübingen, a vibrant hub for machine learning research with a strong collaborative ecosystem
📚 Focus: LLMs, distributed training, evaluation, AutoML
🇪🇺 Part of OpenEuroLLM, a consortium of 20 leading European research institutions, companies and EuroHPC centres building the next generation of open-source language models, awarded many millions of GPU hours on the most recent EuroHPC clusters
What we offer:
🔒 The unique chance to develop the next generation of fully open European LLMs
🤗 A chance to be at the heart of the Cyber Valley and work at the ELLIS Institute with world-renowned scientists, headed by Bernhard Schölkopf
☕ The best coffee you can think of
See more in the job post description.
Job Post: Building Open Source LLM for Europe: institute-tue.ellis.eu/en/jobs/openeu…
Apply here: ellis-openeurollm-apply.tue.mpg.de/registration/o…
#AIJob #LLM #OpenSourceAI #ResearchEngineering #OpenEUROLLM #Hiring #MachineLearning #ArtificialIntelligence

Timur retweeted

Julien Siems @julien_siems · 13 Jun

Julien Siems @julien_siems

1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
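As a rough illustration of the DeltaProduct idea, here is a minimal NumPy sketch of one token update that applies n_h delta-rule sub-steps, each with its own key, value and β, so the state transition becomes a product of n_h generalized Householder matrices (I − βkkᵀ). This is our reading of the thread, not the authors' implementation (which lives in flash-linear-attention); the function names are hypothetical, and allowing β up to 2, so that a single factor can be a reflection, is an assumption taken from related negative-eigenvalue work.

```python
import numpy as np


def generalized_householder(k, beta):
    """I - beta * k k^T for a unit-norm key k (beta = 2 gives a reflection)."""
    return np.eye(k.shape[0]) - beta * np.outer(k, k)


def deltaproduct_step(S, keys, values, betas):
    """Sketch of one DeltaProduct token update with n_h = len(keys) sub-steps.

    Each sub-step is S <- S (I - beta k k^T) + beta v k^T, so the part acting on
    the previous state is a product of n_h generalized Householder matrices.
    With n_h = 1 this is a single DeltaNet step.
    """
    for k, v, beta in zip(keys, values, betas):
        k = k / np.linalg.norm(k)  # normalize each key
        S = S @ generalized_householder(k, beta) + beta * np.outer(v, k)
    return S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k1 = rng.normal(size=4)
    k2 = rng.normal(size=4)
    k1 /= np.linalg.norm(k1)
    k2 /= np.linalg.norm(k2)
    H1 = generalized_householder(k1, 2.0)
    H2 = generalized_householder(k2, 2.0)
    # A single reflection has determinant -1; a product of two is a rotation (+1).
    print(np.isclose(np.linalg.det(H1), -1.0), np.isclose(np.linalg.det(H1 @ H2), 1.0))
```

The design point the sketch tries to convey: one delta-rule step per token can only produce a single (generalized) reflection, while stacking n_h of them per token lets the transition express richer orthogonal maps such as rotations, at the cost of n_h times more sequential sub-steps.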

Timur retweeted

Julien Siems @julien_siems · 8 Apr

@leloykun @jyo_pari DeltaProduct is now available in the flash-linear-attention library: github.com/fla-org/flash-…

Timur retweeted

Riccardo Grazzi @riccardograzzi · 28 Mar

@julien_siems @leloykun @jyo_pari In our DeltaProduct work we also add a bit of theory to DeltaNet, showing that with only two layers it can solve dihedral groups, the symmetry groups of regular polygons. This includes S3 (the symmetries of the equilateral triangle).

Timur retweeted

Julien Siems @julien_siems · 28 Mar

Timur retweeted

Frank Hutter @FrankRHutter · 9 Nov

Come to #Europe for your #PhD! We have top research & great conditions (no, you're not going broke doing your PhD here). And with @ELLISforEurope, you get to spend time at two different leading European universities: ellis.eu/news/ellis-phd… At the ELLIS Institute Tübingen @ELLISInst_Tue and at the ELLIS Unit Freiburg, we are hiring for work on tabular foundation models, automated data science, hyperparameter-dependent scaling laws and large-scale open-source foundation models.
