Timur

92 posts

@timurcarstensen

ML PhD at the ELLIS Institute in Tübingen supervised by Frank Hutter

Tübingen, Germany · Joined August 2013
158 Following · 41 Followers
Timur retweeted
Jonas Geiping @jonasgeiping
We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.

In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …

🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.

Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.
Timur retweeted
Konstantin Dobler ✈️ ICLR @konstantdobler
@Hesamation Better version without arbitrary institution cutoff, some data cleaning and splitting contribution of each paper among institutions. China + USA dominant ofc, but looks a bit different, doesn't it?
Timur retweeted
Frank Hutter @FrankRHutter
Huge news: @prior_labs has signed a definitive agreement to be acquired by @SAP. €1B+ invested over four years to build a globally-leading frontier AI lab for structured data — in Europe, in the open. Independent entity. Same team, same mission, same open models. A massive boost to what we can do. The mission just got accelerated. Founders’ statement: priorlabs.ai/blog-posts/pri… (Deal subject to regulatory approval; terms not disclosed.)
Timur retweeted
Neehal Tumma @ntumm120
Some say Gated DeltaNet > Mamba-2. Others say Mamba-2 > Gated DeltaNet. But what if Gated DeltaNet = Mamba-2? 👀 Well maybe not exactly — but with least-squares preconditioning, we show that they reduce to the same recurrence! We use this lens to design PDN, PGDN, and PKDA: preconditioned delta-style recurrences that outperform their unpreconditioned counterparts at scale 📈 📄 Paper: arxiv.org/abs/2604.21100 w/ @loo_noel @liquidai 💻 Code: github.com/ntumm120/preco…
Timur retweeted
Ameya P. @AmyPrb
📢 I’m on the job market 📢 My work has been around post-training LLMs that can discover what we *don’t know* yet! This includes: LM agents that reason over long horizons, continually learn from experience & can forecast outcomes of actions. Website: ameya.prabhu.be
Timur retweeted
Vladyslav Moroshan @vlad_moroshan
Thrilled to share our new paper, TempoPFN! 🚀 TempoPFN is a new foundation model trained ENTIRELY on synthetic data. Most Time Series models use massive, proprietary real-world datasets. We asked: Can we compete with just a Linear RNN and 100% fake data? (Spoiler: yes)
Timur retweeted
ELLIS Institute Tübingen @ELLISInst_Tue
Join our mission to strengthen AI research in Europe 🇪🇺

We are looking for several ML Research Engineers and Scientists to work on OpenEuroLLM at the ELLIS Institute Tübingen. If you're passionate about large-scale model training, multilingual evaluation and want to contribute to cutting-edge open-source AI, we'd love to hear from you!

📍 Based in Tübingen - a vibrant hub for machine learning research with a strong collaborative ecosystem
📚 Focus: LLMs, distributed training, evaluation, AutoML
🇪🇺 Part of OpenEuroLLM - a consortium of 20 leading European research institutions, companies and EuroHPC centres to build the next-generation of open-source language models awarded many millions GPU hours on most-recent Euro HPC clusters.

What we offer:
🔒 The unique chance to develop the next generation of fully open European LLMs
🤗 A chance to be in the heart of the Cyber Valley and work at the ELLIS Institute with world known scientists headed by Bernhard Schölkopf
☕ The best coffee you can think of

See more in the job post description
Job Post: Building Open Source LLM for Europe: institute-tue.ellis.eu/en/jobs/openeu…
Apply here: ellis-openeurollm-apply.tue.mpg.de/registration/o…

#AIJob #LLM #OpenSourceAI #ResearchEngineering #OpenEUROLLM #Hiring #MachineLearning #ArtificialIntelligence
Timur retweeted
Julien Siems @julien_siems
⚡DeltaProduct update with new results:
- Characterization of DeltaProduct’s state-tracking ability
- Inspection of the hidden state’s effective rank sheds light on why DeltaProduct extrapolates better to longer sequences than DeltaNet.
- Improved scaling analysis
And more!
Quoted tweet from Julien Siems @julien_siems:

1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!

Timur retweeted
Riccardo Grazzi @riccardograzzi
@julien_siems @leloykun @jyo_pari In our DeltaProduct work we also add a bit of theory to DeltaNet, showing that it can solve Dihedral groups, which are the groups of symmetries of regular polygons, with only two layers. This includes S3 (symmetries of the equilateral triangle).
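To make the group-theory claim concrete, here is a minimal Python sketch that builds the symmetries of a regular polygon as permutations of its vertices and checks that the triangle case coincides with S3. The function names (`compose`, `dihedral_group`) are illustrative and not taken from the DeltaProduct work, and the sketch only shows the groups themselves, not the two-layer DeltaNet construction.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p' on vertex indices."""
    return tuple(p[q[i]] for i in range(len(q)))


def dihedral_group(n):
    """Generate all symmetries of a regular n-gon as vertex permutations."""
    rotation = tuple((i + 1) % n for i in range(n))   # rotate by one vertex
    reflection = tuple((-i) % n for i in range(n))    # mirror through vertex 0
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    # Close the set under left-multiplication by the two generators.
    while frontier:
        g = frontier.pop()
        for generator in (rotation, reflection):
            h = compose(generator, g)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


d3 = dihedral_group(3)                 # symmetries of the equilateral triangle
s3 = set(permutations(range(3)))       # all permutations of three items
print(d3 == s3)
print(len(dihedral_group(5)))
```

For the triangle, every permutation of the three vertices is a symmetry, which is why the dihedral group of the triangle and S3 are the same group. For larger polygons the dihedral group has 2n elements, far fewer than the n! permutations of the vertices.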
Timur retweeted
Julien Siems @julien_siems
1/9 There is a fundamental tradeoff between parallelizability and expressivity of Large Language Models. We propose a new linear RNN architecture, DeltaProduct, that can effectively navigate this tradeoff. Here's how!
Timur retweeted
Frank Hutter @FrankRHutter
Come to #Europe for your #PhD! We have top research & great conditions (no, you're not going broke doing your PhD here). And with @ELLISforEurope, you get to spend time in two different leading European universities: ellis.eu/news/ellis-phd… At the ELLIS Institute Tübingen @ELLISInst_Tue and at the ELLIS Unit Freiburg, we are hiring on tabular foundation models, automated data science, hyperparameter-dependent scaling laws and large-scale open-source foundation models.