Adam Lerer
@adamlerer
183 posts
Tuning hypers @AnthropicAI
San Francisco, CA · Joined February 2009
281 Following · 3.3K Followers

Pinned Tweet
Adam Lerer@adamlerer·
1/ Today our paper describing a human-level AI for Diplomacy was published in Science (science.org/doi/10.1126/sc…)! This is the first human-level AI for a game requiring cooperation through *natural language*. Really proud of what we built and excited to finally share it.
AI at Meta@AIatMeta

Meta AI presents CICERO — the first AI to achieve human-level performance in Diplomacy, a strategy game which requires building trust, negotiating and cooperating with multiple players. Learn more about #CICERObyMetaAI: bit.ly/3GBwLzx

Laurens van der Maaten@lvdmaaten·
A new chapter: I am excited to share that I have recently joined Anthropic as a member of technical staff. Anthropic is a unique company with an even more unique mission that I am thrilled to be working towards. I will continue to be based out of NYC. Onwards!
Jerry Tworek@MillionInt·
@polynoamial I think this meme definitely lacks test time compute
Adam Lerer@adamlerer·
Which humanitarian programs previously supported by US foreign aid face the most critical funding gaps? Looking to donate where it will have maximum impact.
Adam Lerer@adamlerer·
@kalomaze ALiBi forces attention(i,j) to 0 exponentially fast as you increase (i-j). If you force the model to not attend to anything far away of course it will "generalize", but not usefully 🙃
kalomaze@kalomaze·
why are we still using RoPE instead of ALiBi for positional embeddings? i suppose it's mainly a case of "Meta did it, so let's just copy what they did" but proper length extrapolation can seemingly be achieved with smarter pos embeddings
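For context on the exchange above, the decay Adam describes can be sketched in a few lines of numpy. This is a minimal illustration of the ALiBi idea (a linear distance penalty added to attention logits), not the reference implementation; the slope value and shapes are arbitrary choices for the demo.

```python
import numpy as np

def alibi_bias(seq_len, slope=0.5):
    # ALiBi adds a penalty proportional to the distance (i - j) to each
    # attention logit, so after softmax the weight on token j decays
    # exponentially (~exp(-slope * (i - j))) as the distance grows.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    bias = -slope * (i - j).clip(min=0).astype(float)
    return np.where(j <= i, bias, -np.inf)  # causal mask on future tokens

logits = np.zeros((8, 8))  # uniform scores, so attention = softmax(bias)
scores = logits + alibi_bias(8)
attn = np.exp(scores - scores.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)
# attn[i, j] falls off geometrically as (i - j) grows, which is the
# "forced to not attend to anything far away" behavior in the reply above.
```

With slope 0.5, each extra token of distance multiplies the attention weight by exp(-0.5), so tokens even a few dozen positions back get effectively zero weight.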
Adam Lerer retweeted
Noam Brown@polynoamial·
@OpenAI Our o1-preview and o1-mini models are available immediately. We’re also sharing evals for our (still unfinalized) o1 model to show the world that this isn’t a one-off improvement – it’s a new scaling paradigm and we’re just getting started. 2/9
Adam Lerer@adamlerer·
@polynoamial If I kill one of the townsfolk in Baldur's Gate, am I a murderer?
Noam Brown@polynoamial·
If an AI bluffs in a game of poker, is it being deceptive? I’m curious how responses differ between those who know more about AI vs those who know more about poker
Adam Lerer@adamlerer·
@alex_peys you probably have a broadcasting bug when you computed the loss
alex peysakhovich@alex_peys·
scaling laws are weird man, i run a baseline, 5x the size = no performance gain, 10x the size = huge performance gain. why?
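The broadcasting bug Adam is guessing at in his reply can be sketched concretely (a hypothetical example of the failure mode, not the actual code from the thread): a column-vector of predictions subtracted from a flat target array silently expands to a full matrix, and the loss averages over every pairwise difference instead of the matched pairs.

```python
import numpy as np

preds = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1), e.g. model output
targets = np.array([1.0, 2.0, 3.0])        # shape (3,)

# Buggy: (3, 1) - (3,) broadcasts to (3, 3), so the mean runs over all
# 9 pairwise differences rather than the 3 matched ones. No error raised.
buggy_loss = ((preds - targets) ** 2).mean()

# Fixed: align shapes first so the subtraction is elementwise.
fixed_loss = ((preds.squeeze(-1) - targets) ** 2).mean()
```

Here the predictions are exactly right, yet the buggy loss is nonzero (4/3) while the fixed loss is 0, which is precisely the kind of baseline-corrupting silent bug that makes scaling curves look "weird."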
Colin@colin__flaherty·
Discovering the AI wework in New York has been a game changer for me. So many cool teams here: Character, Patronus, EvolutionaryScale, Augment (team of 1: me ..lol) etc etc
Adam Lerer@adamlerer·
@alex_peys And yet there are ~no papers on how to Just Label Data most goodly.
Adam Lerer@adamlerer·
@alex_peys Great paper! I hope this starts a trend of people using themselves and their dogs for Figure 1.
alex peysakhovich@alex_peys·
just released a paper with will berman on multimodal inputs for image generation. main idea: describing things just in text is often hard. can you train a model that uses interleaved text/image prompts for image generation? the answer is yes. 🧵
Adam Lerer@adamlerer·
(And fine tuning is not just as good)
Adam Lerer@adamlerer·
Congrats Jacob! The caching systems he built make Gemini long context more than a cute trick, because they let you put your email inbox, your codebase, company policy guidelines, whatever, into every context for “free”.
Jacob Austin@jacobaustin132

This is something I've worked on for a while! You can save the activations of one LLM call and reuse them for a follow-up that overlaps with the first. This means asking a question about a big codebase can take 30 seconds the first time and 1s after that!

Adam Lerer@adamlerer·
Our new model GPT-4o is pretty good :) and cool that everyone will now be able to use GPT-4 level models for free. Most of all I'm stoked to work at a company that troll-pre-released its model as gpt2-chatbot 🤩
William Fedus@LiamFedus

But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can’t achieve arbitrarily high win rates on the prompt: “what’s up”). We find on harder prompt sets — and in particular coding — there is an even larger gap: GPT-4o achieves a +100 ELO over our prior best model.

Adam Lerer@adamlerer·
@RogerGrosse Our field got into this state because for DL, empiricism has been more fruitful than scientific understanding. It feels good to succeed, and the role models are the successful ones.
Roger Grosse@RogerGrosse·
I'll turn this around and ask how our field got into a state where understanding how things work needs a special name like "mech interp" or "science of DL", rather than just being something researchers do every day. Part of MI's appeal is that it's just closer to what most AI researchers would be doing absent incentives to the contrary.
Sasha Rush@srush_nlp

I recently asked pre-PhD researchers what area they were most excited about, and overwhelmingly the answer was "mechanistic interpretability". Not sure how that happened, but I am interested how it came about.

Adam Lerer@adamlerer·
@omarsar0 Uhh... I usually like your paper recommendations but what is this?? Seems like a bunch of speculation and buzzwords randomly strung together... see if you can figure out what on earth these tables and equations are 🙃
elvis@omarsar0·
Nice work surveying 300+ papers and summarizing research developments to look at in the space of Generative AI. It covers computational challenges, scalability, real-world implications, and the potential for Gen AI to drive progress in fields like healthcare, finance, and education. arxiv.org/abs/2312.10868
Adam Lerer@adamlerer·
@lucia_quirke Interesting observation! I'm curious whether this is to be expected from the fact that until you reach the local minimum of the loss w.r.t. this feature, there will be gradient reinforcing *every* circuit that predicts it; after that, the gradient will be ~0 and most of those circuits can be repurposed.
Lucia Quirke@lucia_quirke·
Aside: while investigating the context neuron we found a weird phenomenon where huge numbers of other German context neurons form early in training but are negligibly useful, with most quickly unlearned. We have no idea why this happens!
Lucia Quirke@lucia_quirke·
A mystery in prior work: LLMs contain interpretable neurons that correspond to text language. Some aren't important, but deleting Pythia 70M’s German neuron increases loss by 12% on German text. Why? We investigate over training and show it's part of a "second order circuit."
Wes Gurnee@wesg52

One large family of neurons we find are “context” neurons, which activate only for tokens in a particular context (French, Python code, US patent documents, etc). When deleting these neurons the loss increases in the relevant context while leaving other contexts unaffected!

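The mechanism Adam proposes in his reply above — shared gradient reinforcing every redundant circuit until the prediction saturates, then vanishing — can be made concrete with a toy (my construction, not anything from the thread or the paper): two redundant "circuits" w1 and w2 both predict the same always-on feature under a logistic loss.

```python
import math

def grad(w1, w2):
    # Gradient of -log sigmoid(w1 + w2) w.r.t. EITHER weight: because the
    # circuits are redundant, both receive the identical shared gradient.
    p = 1 / (1 + math.exp(-(w1 + w2)))  # model's confidence in the feature
    return -(1 - p)

early = grad(0.0, 0.0)  # large shared gradient: both circuits get reinforced
late = grad(5.0, 5.0)   # prediction saturated: gradient ~0 for both, so the
                        # redundant circuit is free to be unlearned/repurposed
```

Early in training the gradient (-0.5 here) grows both circuits in lockstep; once the combined logit saturates the gradient collapses toward zero, which matches the observation that the extra context neurons form early and are then quickly unlearned.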