Adam Lerer
@adamlerer
183 posts
Tuning hypers @AnthropicAI
San Francisco, CA · Joined February 2009
281 Following · 3.3K Followers

Pinned Tweet
Adam Lerer@adamlerer·
1/ Today our paper describing a human-level AI for Diplomacy was published in Science (science.org/doi/10.1126/sc…)! This is the first human-level AI for a game requiring cooperation through *natural language*. Really proud of what we built and excited to finally share it.
AI at Meta@AIatMeta

Meta AI presents CICERO — the first AI to achieve human-level performance in Diplomacy, a strategy game which requires building trust, negotiating and cooperating with multiple players. Learn more about #CICERObyMetaAI: bit.ly/3GBwLzx

Laurens van der Maaten@lvdmaaten·
A new chapter: I am excited to share that I have recently joined Anthropic as a member of technical staff. Anthropic is a unique company with an even more unique mission that I am thrilled to be working towards. I will continue to be based out of NYC. Onwards!
Jerry Tworek@MillionInt·
@polynoamial I think this meme definitely lacks test time compute
Adam Lerer@adamlerer·
Which humanitarian programs previously supported by US foreign aid face the most critical funding gaps? Looking to donate where it will have maximum impact.
Adam Lerer@adamlerer·
@kalomaze ALiBi forces attention(i,j) to 0 exponentially fast as you increase (i-j). If you force the model to not attend to anything far away of course it will "generalize", but not usefully 🙃
kalomaze@kalomaze·
why are we still using RoPE instead of ALiBi for positional embeddings? i suppose it's mainly a case of "Meta did it, so let's just copy what they did" but proper length extrapolation can seemingly be achieved with smarter pos embeddings
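For context on the exchange above, the decay Adam describes can be sketched in a few lines of numpy. This is a minimal illustration of the ALiBi idea (a linear distance penalty added to attention logits), not the reference implementation; the slope value and shapes are arbitrary choices for the demo.

```python
import numpy as np

def alibi_bias(seq_len, slope=0.5):
    # ALiBi adds a penalty proportional to the distance (i - j) to each
    # attention logit, so after softmax the weight on token j decays
    # exponentially (~exp(-slope * (i - j))) as the distance grows.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    bias = -slope * (i - j).clip(min=0).astype(float)
    return np.where(j <= i, bias, -np.inf)  # causal mask on future tokens

logits = np.zeros((8, 8))  # uniform scores, so attention = softmax(bias)
scores = logits + alibi_bias(8)
attn = np.exp(scores - scores.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)
# attn[i, j] falls off geometrically as (i - j) grows, which is the
# "forced to not attend to anything far away" behavior in the reply above.
```

With slope 0.5, each extra token of distance multiplies the attention weight by exp(-0.5), so tokens even a few dozen positions back get effectively zero weight.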
Adam Lerer retweeted
Noam Brown@polynoamial·
@OpenAI Our o1-preview and o1-mini models are available immediately. We’re also sharing evals for our (still unfinalized) o1 model to show the world that this isn’t a one-off improvement – it’s a new scaling paradigm and we’re just getting started. 2/9
Adam Lerer@adamlerer·
@polynoamial If I kill one of the townsfolk in Baldur's Gate, am I a murderer?
Noam Brown@polynoamial·
If an AI bluffs in a game of poker, is it being deceptive? I’m curious how responses differ between those who know more about AI vs those who know more about poker
Adam Lerer@adamlerer·
@alex_peys you probably have a broadcasting bug when you computed the loss
alex peysakhovich@alex_peys·
scaling laws are weird man, i run a baseline, 5x the size = no performance gain, 10x the size = huge performance gain. why?
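The broadcasting bug Adam is guessing at in his reply can be sketched concretely (a hypothetical example of the failure mode, not the actual code from the thread): a column-vector of predictions subtracted from a flat target array silently expands to a full matrix, and the loss averages over every pairwise difference instead of the matched pairs.

```python
import numpy as np

preds = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1), e.g. model output
targets = np.array([1.0, 2.0, 3.0])        # shape (3,)

# Buggy: (3, 1) - (3,) broadcasts to (3, 3), so the mean runs over all
# 9 pairwise differences rather than the 3 matched ones. No error raised.
buggy_loss = ((preds - targets) ** 2).mean()

# Fixed: align shapes first so the subtraction is elementwise.
fixed_loss = ((preds.squeeze(-1) - targets) ** 2).mean()
```

Here the predictions are exactly right, yet the buggy loss is nonzero (4/3) while the fixed loss is 0, which is precisely the kind of baseline-corrupting silent bug that makes scaling curves look "weird."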
Colin@colin__flaherty·
Discovering the AI wework in New York has been a game changer for me. So many cool teams here: Character, Patronus, EvolutionaryScale, Augment (team of 1: me ..lol) etc etc
Adam Lerer@adamlerer·
@alex_peys And yet there are ~no papers on how to Just Label Data most goodly.
Adam Lerer@adamlerer·
@alex_peys Great paper! I hope this starts a trend of people using themselves and their dogs for Figure 1.
alex peysakhovich@alex_peys·
just released a paper with will berman on multimodal inputs for image generation. main idea: describing things just in text is often hard. can you train a model that uses interleaved text/image prompts for image generation? the answer is yes. 🧵
Adam Lerer@adamlerer·
(And fine tuning is not just as good)
Adam Lerer@adamlerer·
Congrats Jacob! The caching systems he built make Gemini long context more than a cute trick, because they let you put your email inbox, your codebase, company policy guidelines, whatever, into every context for “free”.
Jacob Austin@jacobaustin132

This is something I've worked on for a while! You can save the activations of one LLM call and reuse them for a follow-up that overlaps with the first. This means asking a question about a big codebase can take 30 seconds the first time and 1s after that!

Adam Lerer@adamlerer·
Our new model GPT-4o is pretty good :) and cool that everyone will now be able to use GPT-4 level models for free. Most of all I'm stoked to work at a company that troll-pre-released its model as gpt2-chatbot 🤩
William Fedus@LiamFedus

But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can’t achieve arbitrarily high win rates on the prompt: “what’s up”). We find on harder prompt sets — and in particular coding — there is an even larger gap: GPT-4o achieves a +100 ELO over our prior best model.

Adam Lerer@adamlerer·
@RogerGrosse Our field got into this state because for DL, empiricism has been more fruitful than scientific understanding. It feels good to succeed, and the role models are the successful ones.
Roger Grosse@RogerGrosse·
I'll turn this around and ask how our field got into a state where understanding how things work needs a special name like "mech interp" or "science of DL", rather than just being something researchers do every day. Part of MI's appeal is that it's just closer to what most AI researchers would be doing absent incentives to the contrary.
Sasha Rush@srush_nlp

I recently asked pre-PhD researchers what area they were most excited about, and overwhelmingly the answer was "mechanistic interpretability". Not sure how that happened, but I am interested how it came about.

Adam Lerer@adamlerer·
@omarsar0 Uhh... I usually like your paper recommendations but what is this?? Seems like a bunch of speculation and buzzwords randomly strung together... see if you can figure out what on earth these tables and equations are 🙃
elvis@omarsar0·
Nice work surveying 300+ papers and summarizing research developments to look at in the space of Generative AI. It covers computational challenges, scalability, real-world implications, and the potential for Gen AI to drive progress in fields like healthcare, finance, and education. arxiv.org/abs/2312.10868
Adam Lerer@adamlerer·
@lucia_quirke Interesting observation! I'm curious whether this is to be expected from the fact that until you reach the local minimum of the loss w.r.t. this feature, there will be gradient reinforcing *every* circuit that predicts it; after that, the gradient will be ~0 and most of those circuits can be repurposed.
Lucia Quirke@lucia_quirke·
Aside: while investigating the context neuron we found a weird phenomenon where huge numbers of other German context neurons form early in training but are negligibly useful, with most quickly unlearned. We have no idea why this happens!
Lucia Quirke@lucia_quirke·
A mystery in prior work: LLMs contain interpretable neurons that correspond to text language. Some aren't important, but deleting Pythia 70M’s German neuron increases loss by 12% on German text. Why? We investigate over training and show it's part of a "second order circuit."
Wes Gurnee@wesg52

One large family of neurons we find are “context” neurons, which activate only for tokens in a particular context (French, Python code, US patent documents, etc). When deleting these neurons the loss increases in the relevant context while leaving other contexts unaffected!

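The mechanism Adam proposes in his reply above — shared gradient reinforcing every redundant circuit until the prediction saturates, then vanishing — can be made concrete with a toy (my construction, not anything from the thread or the paper): two redundant "circuits" w1 and w2 both predict the same always-on feature under a logistic loss.

```python
import math

def grad(w1, w2):
    # Gradient of -log sigmoid(w1 + w2) w.r.t. EITHER weight: because the
    # circuits are redundant, both receive the identical shared gradient.
    p = 1 / (1 + math.exp(-(w1 + w2)))  # model's confidence in the feature
    return -(1 - p)

early = grad(0.0, 0.0)  # large shared gradient: both circuits get reinforced
late = grad(5.0, 5.0)   # prediction saturated: gradient ~0 for both, so the
                        # redundant circuit is free to be unlearned/repurposed
```

Early in training the gradient (-0.5 here) grows both circuits in lockstep; once the combined logit saturates the gradient collapses toward zero, which matches the observation that the extra context neurons form early and are then quickly unlearned.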