Louis Kirsch

313 posts

Louis Kirsch
@LouisKirschAI

Driving the automation of AI Research. Project Lead & Research Scientist @GoogleDeepMind. PhD @SchmidhuberAI. @UCL, @HPI_DE alumnus. All opinions are my own.

London, England · Joined November 2011
788 Following · 2.1K Followers
Pinned Tweet
Louis Kirsch@LouisKirschAI·
Emergent in-context learning with Transformers is exciting! But what is necessary to make neural nets implement general-purpose in-context learning? 2^14 tasks, a large model + memory, and initial memorization to aid generalization. Full paper arxiv.org/abs/2212.04458 🧵👇(1/9)
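A minimal sketch of the task-generation idea behind the thread (not the paper's code; `make_task` and all sizes here are illustrative): many superficially distinct tasks can be spawned from one base dataset by re-embedding its inputs with a fresh random projection per task.

```python
import numpy as np

def make_task(x_base, y_base, out_dim, rng):
    """Create one synthetic task by randomly projecting base inputs.

    Each task re-embeds the same base data with a fresh random linear
    map, so a meta-learner can be trained on 2**14 distinct tasks."""
    in_dim = x_base.shape[1]
    proj = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    return x_base @ proj, y_base

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))      # toy base inputs
y = rng.integers(0, 2, size=32)   # toy labels

num_tasks = 2 ** 14
# Materialize just a few of the 2**14 possible tasks here.
tasks = [make_task(x, y, out_dim=16, rng=rng) for _ in range(3)]
print(num_tasks, tasks[0][0].shape)  # 16384 (32, 16)
```

Each projected dataset shares the base data's underlying structure, which is what lets a single model meta-learn a general-purpose in-context learner across the task distribution.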
Louis Kirsch retweeted
Laura Ruis@LauraRuis·
Revisiting @LouisKirschAI et al.’s general-purpose ICL by meta-learning paper and forgot how great it is. It's rare to be taken along on the authors' journey to understand the phenomenon they document like this. More toy dataset papers should follow this structure.
Louis Kirsch@LouisKirschAI·
AI Scientists will drive the next scientific revolution 🚀 Great work towards automating AI research @_chris_lu_ @RobertTLange @cong_ml @j_foerst @hardmaru @jeffclune
Sakana AI@SakanaAILabs

Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! sakana.ai/ai-scientist/

From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer review, The AI Scientist opens a new era of AI-driven scientific research and accelerated discovery. Here are 4 example Machine Learning research papers generated by The AI Scientist.

We published our report, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, and open-sourced our project! Paper: arxiv.org/abs/2408.06292 GitHub: github.com/SakanaAI/AI-Sc…

Our system leverages LLMs to propose and implement new research directions. Here, we first apply The AI Scientist to conduct Machine Learning research. Crucially, our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results. It can also write an entire scientific paper, explaining, visualizing and contextualizing the results.

Furthermore, while an LLM author writes entire research papers, another LLM reviewer critiques the resulting manuscripts to provide feedback to improve the work, and to select the most promising ideas to develop further in the next iteration cycle, leading to continual, open-ended discoveries, thus emulating the human scientific community.

As a proof of concept, our system produced papers with novel contributions in ML research domains such as language modeling, diffusion, and grokking.

We (@_chris_lu_, @RobertTLange, @hardmaru) proudly collaborated with the @UniOfOxford (@j_foerst, @FLAIR_Ox) and @UBC (@cong_ml, @jeffclune) on this exciting project.

Louis Kirsch retweeted
Matthew Jackson@JacksonMattT·
Meta-learning can discover RL algorithms with novel modes of learning, but how can we make them adapt to any training horizon? Introducing our #ICLR2024 work on discovering *temporally-aware* RL algorithms! Work co-led with @_chris_lu_, in @FLAIR_Ox and @whi_rl
Louis Kirsch retweeted
hardmaru@hardmaru·
Amazing that @SchmidhuberAI gave this talk back in 2012, months before the AlexNet paper was published. In 2012, people dismissed much of what he discussed as a joke, but the same talk today would sit at the center of AI debate and controversy. Full talk:
Louis Kirsch retweeted
Anand Gopalakrishnan@agopal42·
Excited to present “Contrastive Training of Complex-valued Autoencoders for Object Discovery“ at #NeurIPS2023. TL;DR -- We introduce architecture changes and a new contrastive training objective that greatly improve the state-of-the-art synchrony-based model. Explainer thread 👇:
Louis Kirsch retweeted
Samuel Schmidgall@SRSchmidgall·
There is still a lot we can learn from the brain in artificial intelligence. In our new review article, we delve into the mechanisms of the brain that inspired artificial intelligence algorithms, as well as brain-inspired learning algorithms in AI🧠 arxiv.org/abs/2305.11252…
Alex Graveley@alexgraveley·
Any papers covering model size impact on in-context learning? ChatGPT appears much worse than GPT-3.5 in my testing.
Louis Kirsch retweeted
David Finsterwalder | eu/acc@DFinsterwalder·
But looking deeper into GPT and its capacity for in-context learning (ICL) is fascinating. Recent works on ICL (like this) made me much more curious about language modeling and transformers (+ the success of transformers in computer vision). 5/7 twitter.com/LouisKirschAI/…
Louis Kirsch@LouisKirschAI

Emergent in-context learning with Transformers is exciting! But what is necessary to make neural nets implement general-purpose in-context learning? 2^14 tasks, a large model + memory, and initial memorization to aid generalization. Full paper arxiv.org/abs/2212.04458 🧵👇(1/9)

Nathan Lambert@natolambert·
Feels like scaling-law figures are becoming the new important "Figure 1". What papers in RL and robotics have scaling-law figures you find interesting? Trying to collect my thoughts in this space; some initial ones below⬇️, comment with more
Louis Kirsch retweeted
Thomas Miconi@ThomasMiconi·
To start the new year (🥳) I'd like to highlight 2 recent papers that ask essentially the same question, but from very different perspectives: When learning many things at the same time, when and how does rote memorization turn into meta-learning - i.e. "learning-to-learn" ?
Louis Kirsch@LouisKirschAI·
@sharathraparthy @jasonhartford We observed meta-test generalization to unseen datasets - whether the random projection is applied or not. The generalization is better with the projection though as the distribution of activations more closely resembles meta-training. Normalization of inputs also helps.
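A hedged illustration of that point (not the paper's code; all sizes here are made up): randomly projecting an unseen dataset's inputs and then normalizing them pushes the input statistics toward the standardized distribution the model saw at meta-training time.

```python
import numpy as np

rng = np.random.default_rng(1)
# An "unseen" dataset with an arbitrary scale and offset.
x_unseen = 5.0 + 3.0 * rng.normal(size=(1000, 20))

# A random projection mixes the input dimensions...
proj = rng.normal(0.0, 1.0 / np.sqrt(20), size=(20, 20))
x_proj = x_unseen @ proj

# ...and per-feature normalization standardizes the result, so
# activations more closely resemble those seen at meta-training.
x_norm = (x_proj - x_proj.mean(axis=0)) / x_proj.std(axis=0)

print(x_norm.mean(), x_norm.std())  # ≈ 0.0 and ≈ 1.0
```

The normalization step is what makes the trick robust: whatever the unseen dataset's original scale, the meta-learner receives inputs with familiar statistics.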
Louis Kirsch@LouisKirschAI·
@sharathraparthy @jasonhartford The generalization surely depends on the task distribution and making it diverse enough is key - to what extent other task augmentations / generations work is a great research question! In this paper we only looked at those projections. Mainly inspired by arxiv.org/abs/2109.10781
Louis Kirsch@LouisKirschAI·
@zhao_lingxiao @0xmaddie_ Sure. The idea is that the transformer can attend to the entire sequence, including hidden representations of it in the later layers. Each token carries a fixed key/value/query-sized amount of information that can be accessed. The longer the sequence, the more ‘memory’ is available.
Lingxiao Zhao@zhao_lingxiao·
@LouisKirschAI @0xmaddie_ I can understand that the hidden size is the memory size of an RNN, but why is the transformer’s memory calculated this way? Can you elaborate on this?
Louis Kirsch@LouisKirschAI·
@zhao_lingxiao @0xmaddie_ Great question! For RNNs, memory corresponds to the hidden size. More generally, it is the information bottleneck that the sequence has to pass through before making predictions. In transformers, self-attention makes this rather large, estimated as key_size * layers * sequence_len.
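A back-of-the-envelope sketch of that estimate; the sizes below are illustrative, not from the thread.

```python
def rnn_memory(hidden_size: int) -> int:
    # An RNN must squeeze all past information through its hidden state,
    # so its accessible memory is just the hidden size.
    return hidden_size

def transformer_memory(key_size: int, layers: int, seq_len: int) -> int:
    # Self-attention can read every past token's representation at every
    # layer, so accessible state grows with sequence length:
    # key_size * layers * sequence_len.
    return key_size * layers * seq_len

# Illustrative sizes (not from the thread):
print(rnn_memory(hidden_size=512))                               # 512
print(transformer_memory(key_size=64, layers=12, seq_len=2048))  # 1572864
```

Under this rough count, a modest transformer's accessible "memory" dwarfs an RNN's fixed-size bottleneck, and it keeps growing as the context gets longer.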