Evan Miller

1.1K posts

Evan Miller banner
Evan Miller

Evan Miller

@EvMill

Statistically inclined software developer, occasional blogger about math + stats stuff. Working @AnthropicAI

NYC Katılım Mayıs 2009
212 Takip Edilen5.5K Takipçiler
Sabitlenmiş Tweet
Evan Miller
Evan Miller@EvMill·
I hit a bug in the Attention formula that’s been overlooked for 8+ years. All Transformer models (GPT, LLaMA, etc) are affected. Researchers isolated the bug last month – but they missed a simple solution… Why LLM designers should stop using Softmax 👇 evanmiller.org/attention-is-o…
English
74
355
2.3K
591.4K
Evan Miller retweetledi
Klaviyo
Klaviyo@klaviyo·
🚀 New on the Klaviyo Data Science Podcast: @EvMill joins us to discuss his paper, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations. AI metrics are everywhere—but how much uncertainty is behind them? Understanding variability matters. Listen now: bit.ly/3CV0CmX #AI #DataScience
Klaviyo tweet media
English
0
4
14
1.9K
Evan Miller retweetledi
Anthropic
Anthropic@AnthropicAI·
We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we'll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.
Anthropic tweet media
English
73
292
2.1K
503.6K
Evan Miller retweetledi
Marius Hobbhahn
Marius Hobbhahn@MariusHobbhahn·
This paper on the statistics of evals is great (and seems to be flying under the radar): arxiv.org/abs/2411.00640… The author basically shows all the relevant statistical tools needed for evals, e.g. how to do compute the right error bars, how to compare model performance, and how to do power analysis. Back when @jeremy_scheurer and I wrote the "We need a Science of Evals" post (apolloresearch.ai/blog/we-need-a…) this paper is exactly the kind of thing we had in mind and more.
English
7
36
219
37K
Evan Miller retweetledi
Anthropic
Anthropic@AnthropicAI·
New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…
English
49
299
2.1K
755.6K
egino
egino@egino·
@EvMill I was surprised to learn that Claude could geolocate based on a given address (big fan btw). Does it have access to geospatial tools or does it geolocate addresses based on pre trained data? I appreciate it! 🙌
English
1
0
0
53
Evan Miller
Evan Miller@EvMill·
I think I've finally cracked quantiles… A/B testing medians, instead of means, usually requires an expensive bootstrap. But we can use a likelihood-ratio test (Wilks' theorem) instead. This reduces the quantile problem to a few simple formulas. Read on! arxiv.org/abs/2401.10233
English
0
1
18
1.5K
Evan Miller retweetledi
Nat Friedman
Nat Friedman@natfriedman·
Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD. Today we are overjoyed to announce that our crazy project has succeeded. After 2000 years, we can finally read the scrolls: This image was produced by @Youssef_M_Nader, @LukeFarritor, and @JuliSchillij, who have now won the Vesuvius Challenge Grand Prize of $700,000. Congratulations!! These fifteen columns come from the very end of the first scroll we have been able to read and contain new text from the ancient world that has never been seen before. The author – probably Epicurean philosopher Philodemus – writes here about music, food, and how to enjoy life's pleasures. In the closing section, he throws shade at unnamed ideological adversaries – perhaps the stoics? – who "have nothing to say about pleasure, either in general or in particular." This year, the Vesuvius Challenge continues. The text that we revealed so far represents just 5% of one scroll. In 2024, our goal is to from reading a few passages of text to entire scrolls, and we're announcing a new $100,000 grand prize for the first team that is able to read at least 90% of all four scrolls that we have scanned. The scrolls stored in Naples that remain to be read represent more than 16 megabytes of ancient text. But the villa where the scrolls were found was only partially excavated, and scholars tell us that there may be thousands more scrolls underground. Our hope is that the success of the Vesuvius Challenge catalyzes the excavation of the villa, that the main library is discovered, and that whatever we find there rewrites history and inspires all of us. It's been a great joy to work on this strange and amazing project. Thanks to Brent Seales for laying the foundation for this work over so many years, thanks to the friends and Twitter users whose donations powered our effort, and thanks to the many contestants whose contributions have made the Vesuvius Challenge successful! Read more in our announcement: scrollprize.org/grandprize
Nat Friedman tweet media
English
2.3K
14.6K
64.4K
26.3M
Evan Miller retweetledi
Beidi Chen
Beidi Chen@BeidiChen·
@ggerganov @EvMill The blog about Softmax+1 plays a very important role when we were trying to identify the root cause of the sink @Guangxuan_Xiao can comment more!
English
0
1
15
4.1K
Evan Miller retweetledi
Georgi Gerganov
Georgi Gerganov@ggerganov·
Have a few thoughts about this approach But most importantly, I'm happy to see @EvMill's idea on softmax1 recognized - to my very basic and intuitive understanding of LLMs, it made enough sense to warrant further analysis arxiv.org/abs/2309.17453
English
4
16
163
29.5K
Evan Miller
Evan Miller@EvMill·
👀
Markus Nagel@mnagel87

@Tracing47202686 @yell1337 @TiRune Unlike with clipped softmax, to achieve an exact zero in the output using softmax1 for a (partial) no-update, the input requires to be -infinity. However, after @EvMill blog post we experimented with softmax1 and found it in practice competitive with our proposed approaches.

ART
1
0
13
3.5K
Evan Miller retweetledi
Astatide
Astatide@Astatide42·
Results of my latest nerdsnipe from @TetraspaceWest! The plot below shows the predicted shape of the water flow, with a model taking into account gravity and surface tension. It looks just like the real thing! Conclusion: yep, it's surface tension details below 😁
Astatide tweet media
English
3
4
37
5.8K
Evan Miller retweetledi
Thomas Capelle
Thomas Capelle@capetorch·
Following @EvMill great blog post on encountered issues on the GPT-like models training that appear to be related to the SoftMax function, I wrote this small piece mostly to understand what was going on. wandb.me/tinyllama
English
2
10
47
5.3K
Evan Miller retweetledi
Panagiota Papakonstantinou
Panagiota Papakonstantinou@PPapakonNucl·
Kurt Vonnegut's 1969 address to the American Physical Society @APSphysics --on the innocence of the "old-fashioned scientist" and its loss after World War II. For physicists, artists, and other humans. I have transcribed it in its entirety as a google doc: docs.google.com/document/d/1Mn…
English
5
18
67
20.5K
Evan Miller
Evan Miller@EvMill·
Softmax1 update… We now have support for ⚡️Flash Attention ⚡️ This lets us test much larger models than before! To get the code, just pip install flash-attention-softmax-n Or clone / star the GitHub repo here: github.com/softmax1/Flash… All credit / kudos to Chris Murphy.
English
1
12
51
9.2K