Louis Kirsch

313 posts

Louis Kirsch
@LouisKirschAI

Driving the automation of AI Research. Project Lead & Research Scientist @GoogleDeepMind. PhD @SchmidhuberAI. @UCL, @HPI_DE alumnus. All opinions are my own.

London, England · Joined November 2011
788 Following · 2.1K Followers
Pinned Tweet
Louis Kirsch@LouisKirschAI·
Emergent in-context learning with Transformers is exciting! But what is necessary to make neural nets implement general-purpose in-context learning? 2^14 tasks, a large model + memory, and initial memorization to aid generalization. Full paper arxiv.org/abs/2212.04458 🧵👇(1/9)
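A minimal sketch of the task-generation idea behind the thread (not the paper's code; `make_task` and all sizes here are illustrative): many superficially distinct tasks can be spawned from one base dataset by re-embedding its inputs with a fresh random projection per task.

```python
import numpy as np

def make_task(x_base, y_base, out_dim, rng):
    """Create one synthetic task by randomly projecting base inputs.

    Each task re-embeds the same base data with a fresh random linear
    map, so a meta-learner can be trained on 2**14 distinct tasks."""
    in_dim = x_base.shape[1]
    proj = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    return x_base @ proj, y_base

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))      # toy base inputs
y = rng.integers(0, 2, size=32)   # toy labels

num_tasks = 2 ** 14
# Materialize just a few of the 2**14 possible tasks here.
tasks = [make_task(x, y, out_dim=16, rng=rng) for _ in range(3)]
print(num_tasks, tasks[0][0].shape)  # 16384 (32, 16)
```

Each projected dataset shares the base data's underlying structure, which is what lets a single model meta-learn a general-purpose in-context learner across the task distribution.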
Louis Kirsch retweeted
Laura Ruis@LauraRuis·
Revisiting @LouisKirschAI et al.’s general-purpose ICL by meta-learning paper and forgot how great it is. It's rare to be taken along on the authors' journey to understand the phenomenon they document like this. More toy dataset papers should follow this structure.
Louis Kirsch@LouisKirschAI·
AI Scientists will drive the next scientific revolution 🚀 Great work towards automating AI research @_chris_lu_ @RobertTLange @cong_ml @j_foerst @hardmaru @jeffclune
Sakana AI@SakanaAILabs

Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! sakana.ai/ai-scientist/

From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer review, The AI Scientist opens a new era of AI-driven scientific research and accelerated discovery. Here are 4 example Machine Learning research papers generated by The AI Scientist.

We published our report, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, and open-sourced our project! Paper: arxiv.org/abs/2408.06292 GitHub: github.com/SakanaAI/AI-Sc…

Our system leverages LLMs to propose and implement new research directions. Here, we first apply The AI Scientist to conduct Machine Learning research. Crucially, our system is capable of executing the entire ML research lifecycle: from inventing research ideas and experiments, writing code, to executing experiments on GPUs and gathering results. It can also write an entire scientific paper, explaining, visualizing and contextualizing the results.

Furthermore, while an LLM author writes entire research papers, another LLM reviewer critiques the resulting manuscripts to provide feedback to improve the work, and to select the most promising ideas to develop further in the next iteration cycle, leading to continual, open-ended discoveries, thus emulating the human scientific community.

As a proof of concept, our system produced papers with novel contributions in ML research domains such as language modeling, diffusion, and grokking.

We (@_chris_lu_, @RobertTLange, @hardmaru) proudly collaborated with the @UniOfOxford (@j_foerst, @FLAIR_Ox) and @UBC (@cong_ml, @jeffclune) on this exciting project.

Louis Kirsch retweeted
Matthew Jackson@JacksonMattT·
Meta-learning can discover RL algorithms with novel modes of learning, but how can we make them adapt to any training horizon? Introducing our #ICLR2024 work on discovering *temporally-aware* RL algorithms! Work co-led with @_chris_lu_, in @FLAIR_Ox and @whi_rl
Louis Kirsch retweeted
hardmaru@hardmaru·
Amazing that @SchmidhuberAI gave this talk back in 2012, months before the AlexNet paper was published. In 2012, people dismissed much of what he discussed as a joke, but the same talk today would sit at the center of AI debate and controversy. Full talk:
Louis Kirsch retweeted
Anand Gopalakrishnan@agopal42·
Excited to present “Contrastive Training of Complex-valued Autoencoders for Object Discovery“ at #NeurIPS2023. TL;DR -- We introduce architecture changes and a new contrastive training objective that greatly improve the state-of-the-art synchrony-based model. Explainer thread 👇:
Louis Kirsch retweeted
Samuel Schmidgall@SRSchmidgall·
There is still a lot we can learn from the brain in artificial intelligence. In our new review article, we delve into the mechanisms of the brain that inspired artificial intelligence algorithms, as well as brain-inspired learning algorithms in AI🧠 arxiv.org/abs/2305.11252…
Alex Graveley@alexgraveley·
Any papers covering model size impact on in-context learning? ChatGPT appears much worse than GPT-3.5 in my testing.
Louis Kirsch retweeted
David Finsterwalder | eu/acc@DFinsterwalder·
But looking deeper into GPT and its capacity for in-context learning (ICL) is fascinating. Recent works on ICL (like this) made me much more curious about language modeling and transformers (+ the success of transformers in computer vision). 5/7 twitter.com/LouisKirschAI/…
Louis Kirsch@LouisKirschAI

Emergent in-context learning with Transformers is exciting! But what is necessary to make neural nets implement general-purpose in-context learning? 2^14 tasks, a large model + memory, and initial memorization to aid generalization. Full paper arxiv.org/abs/2212.04458 🧵👇(1/9)

Nathan Lambert@natolambert·
Feels like scaling-law figures are becoming the new important "Figure 1". What papers in RL and robotics have scaling-law figures you find interesting? Trying to collect my thoughts in this space; some initial ones below⬇️, comment with more
Louis Kirsch retweeted
Thomas Miconi@ThomasMiconi·
To start the new year (🥳) I'd like to highlight 2 recent papers that ask essentially the same question, but from very different perspectives: When learning many things at the same time, when and how does rote memorization turn into meta-learning - i.e. "learning-to-learn" ?
Louis Kirsch@LouisKirschAI·
@sharathraparthy @jasonhartford We observed meta-test generalization to unseen datasets - whether the random projection is applied or not. The generalization is better with the projection though as the distribution of activations more closely resembles meta-training. Normalization of inputs also helps.
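A hedged illustration of that point (not the paper's code; all sizes here are made up): randomly projecting an unseen dataset's inputs and then normalizing them pushes the input statistics toward the standardized distribution the model saw at meta-training time.

```python
import numpy as np

rng = np.random.default_rng(1)
# An "unseen" dataset with an arbitrary scale and offset.
x_unseen = 5.0 + 3.0 * rng.normal(size=(1000, 20))

# A random projection mixes the input dimensions...
proj = rng.normal(0.0, 1.0 / np.sqrt(20), size=(20, 20))
x_proj = x_unseen @ proj

# ...and per-feature normalization standardizes the result, so
# activations more closely resemble those seen at meta-training.
x_norm = (x_proj - x_proj.mean(axis=0)) / x_proj.std(axis=0)

print(x_norm.mean(), x_norm.std())  # ≈ 0.0 and ≈ 1.0
```

The normalization step is what makes the trick robust: whatever the unseen dataset's original scale, the meta-learner receives inputs with familiar statistics.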
Louis Kirsch@LouisKirschAI·
@sharathraparthy @jasonhartford The generalization surely depends on the task distribution and making it diverse enough is key - to what extent other task augmentations / generations work is a great research question! In this paper we only looked at those projections. Mainly inspired by arxiv.org/abs/2109.10781
Louis Kirsch@LouisKirschAI·
@zhao_lingxiao @0xmaddie_ Sure. The idea is that the transformer can attend to the entire sequence, including hidden representations of it in the later layers. Each token carries a fixed key/value/query-sized amount of information that can be accessed. The longer the sequence, the more ‘memory’ is available.
Lingxiao Zhao@zhao_lingxiao·
@LouisKirschAI @0xmaddie_ I can understand that the hidden size is the memory size of an RNN, but why is the transformer’s memory calculated this way? Can you elaborate on this?
Louis Kirsch@LouisKirschAI·
@zhao_lingxiao @0xmaddie_ Great question! For RNNs, memory corresponds to the hidden size. More generally, it is the information bottleneck that the sequence has to pass through before making predictions. In transformers, self-attention makes this rather large, estimated as key_size * layers * sequence_len.
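A back-of-the-envelope sketch of that estimate; the sizes below are illustrative, not from the thread.

```python
def rnn_memory(hidden_size: int) -> int:
    # An RNN must squeeze all past information through its hidden state,
    # so its accessible memory is just the hidden size.
    return hidden_size

def transformer_memory(key_size: int, layers: int, seq_len: int) -> int:
    # Self-attention can read every past token's representation at every
    # layer, so accessible state grows with sequence length:
    # key_size * layers * sequence_len.
    return key_size * layers * seq_len

# Illustrative sizes (not from the thread):
print(rnn_memory(hidden_size=512))                               # 512
print(transformer_memory(key_size=64, layers=12, seq_len=2048))  # 1572864
```

Under this rough count, a modest transformer's accessible "memory" dwarfs an RNN's fixed-size bottleneck, and it keeps growing as the context gets longer.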