Ethan Dyer

39 posts

@ethansdyer

Joined March 2017
137 Following · 1.2K Followers
Ethan Dyer retweeted
Anthropic (@AnthropicAI)
BioMysteryBench, our new bioinformatics eval, tests whether Claude can devise creative solutions to open-ended research problems. Read more: anthropic.com/research/Evalu…
20
36
345
69.6K
Ethan Dyer retweeted
Bassil Shama (@BassilShama)
Opus 4.6 is our most capable Computer Use model to date. Excited for everyone to give Computer Use a try with Claude in Chrome, Cowork, and Claude Code! To celebrate, I let Claude (4.6) Monet show off his artistic side in the Claude for Chrome extension.
[Image]
Claude (@claudeai)

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

5
8
33
8.1K
Ethan Dyer retweeted
Behnam Neyshabur (@bneyshabur)
I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here: goo.gle/GeminiV1-5
Oriol Vinyals (@OriolVinyalsML)

Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, our drastic results in mathematics are particularly exciting to me! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendryck’s MATH benchmark without tool-use (examples below 🧵). Gemini 1.5 is widely available, try it out for free here aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5

9
15
163
147.4K
Ethan Dyer retweeted
Oriol Vinyals (@OriolVinyalsML)
Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, our drastic results in mathematics are particularly exciting to me! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendryck’s MATH benchmark without tool-use (examples below 🧵). Gemini 1.5 is widely available, try it out for free here aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5
[Image]
42
191
989
712.6K
Ethan Dyer retweeted
Joshua Batson (@thebasepoint)
In writing this paper, there were countless features we thought might be bugs. After careful inspection, ~all of them revealed surprising and subtle model properties. To me this capacity for surprise is the true test of a new technique. This thread is about my favorite finding.
Anthropic (@AnthropicAI)

The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.

4
40
373
104.6K
Ethan Dyer retweeted
nature (@Nature)
Nature research paper: Universality in long-distance geometry and quantum complexity go.nature.com/3ZFVmuu
0
7
18
17.7K
Ethan Dyer retweeted
Behnam Neyshabur (@bneyshabur)
Excited to announce that the entire Blueshift team has joined @DeepMind! We will be working with @OriolVinyalsML and others to advance capabilities of LLMs developed by DM / Alphabet! We hope to continue to grow DM's presence in Bay Area and New York in the coming months :-)
[Image]
31
51
1.1K
209.2K
Ethan Dyer retweeted
Behnam Neyshabur (@bneyshabur)
If you are interested in solving challenging multi-step reasoning problems with LLMs, join us! We have an opening for a Research Scientist position at Blueshift! Learn more about the role & apply here: forms.gle/VZ8oHsuswt3iXw… Learn about our team: research.google/teams/blueshif…
alewkowycz (@alewkowycz)

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

1
9
62
0
Ethan Dyer (@ethansdyer)
@amirzait Great question! In arxiv.org/abs/2206.14858 we began to study memorization. We indeed looked at acc on modified questions, checked for MATH in the training data, and compared acc when removing answers similar to MATH. But this is an important direction for more follow up!
2
1
2
0
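One simple way to look for benchmark problems in a training corpus, as mentioned in the reply above, is to measure n-gram overlap. The sketch below is only an illustration of that general idea and is not the procedure used in arxiv.org/abs/2206.14858; the function names, the 5-gram window, and the sample texts are all assumptions made for this example.

```python
import re
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(problem: str, documents: Iterable[str], n: int = 5) -> float:
    """Fraction of the problem's n-grams that also occur in any document."""
    problem_grams = ngrams(problem, n)
    if not problem_grams:
        return 0.0
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in documents:
        corpus_grams |= ngrams(doc, n)
    return len(problem_grams & corpus_grams) / len(problem_grams)


# Made-up example data (not taken from MATH or any real training set).
problem = "What is the sum of the first ten positive integers?"
corpus = [
    "Homework: what is the sum of the first ten positive integers? The answer is 55.",
    "An unrelated page about cooking pasta.",
]
score = overlap_fraction(problem, corpus, n=5)
```

A score near 1.0 suggests the problem text appears almost verbatim in the corpus. A low score does not rule out paraphrased leakage, which is why checks on modified questions are a useful complement.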
Amir Zait (@amirzait)
@ethansdyer Wow! Amazing! How do you prevent leakage of MATH into the training set? Many of the questions appear online verbatim with answers attached to them. Did you happen to try small variations on a small set of answers the model got correctly to make sure it didn't simply memorize?
2
0
1
0
Ethan Dyer (@ethansdyer)
@HAKSOAT MMLU doesn't seem to have many pure E&M problems that require multiple steps. I agree it would be interesting to do a systematic evaluation. But here is one that I grabbed:
[Image]
1
0
2
0
Ethan Dyer (@ethansdyer)
@suzuki__r Yes, it is all done through reading TeX (or math ml, mathjax etc...). Very likely that the response will depend on the style.
1
0
1
0
Ethan Dyer (@ethansdyer)
@KyleCranmer One fun aspect of how few shot prompting works with these generative models is we give: Question: ... Answer: ... ... Question: ... Answer: ... Question: And the model produces an answer. But then it keeps making up new questions and answers -- next year's pset 😉.
0
0
2
0
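The few-shot format described in the post above, and the model's habit of continuing with invented question/answer pairs, can be sketched as follows. This is a minimal illustration under stated assumptions, not Minerva's actual pipeline: `build_few_shot_prompt` and `truncate_at_next_question` are hypothetical helper names, the prompt is assumed to end with an open `Answer:` line, and the raw completion is made up to show the truncation step.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Format solved examples, then the new question with an open answer slot."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def truncate_at_next_question(completion: str, stop: str = "Question:") -> str:
    """Keep only the first answer and drop any invented follow-up questions."""
    return completion.split(stop, 1)[0].strip()


examples = [("What is 2 + 2?", "4"), ("What is 3 * 5?", "15")]
prompt = build_few_shot_prompt(examples, "What is 7 - 4?")

# Hypothetical raw completion: the model answers, then keeps writing new problems.
raw_completion = " 3\n\nQuestion: What is 9 / 3?\nAnswer: 3"
answer = truncate_at_next_question(raw_completion)
```

Cutting the completion at the next `Question:` marker keeps the answer to the posed problem and discards the extra problems the model goes on to generate.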
Ethan Dyer (@ethansdyer)
@holmesjtg We don't have any concrete plans, but are definitely very interested in how this can be adapted to be a helpful tutor, answer questions as students ask them (rather than as tests phrase them) etc... Do you have any favorite datasets for this?
3
0
6
0
Jeff Holmes (@holmesjtg)
@ethansdyer Amazing results! Was wondering if your team is considering combining this kind of training with learning theory data, so that a model could more effectively act as a tutor, knowing how to prompt and give feedback. Working on this with #gpt3 now, but the model is not ideal.
2
0
13
0
Ethan Dyer (@ethansdyer)
@pablo_derbez Without additional prompting, it can still be quite brittle to such things. On the other hand, we have seen examples where the problem answer options assume some kind of rounding, Minerva solves exactly and then correctly realizes it is supposed to round.
1
0
4
0
Pablo Derbez (@pablo_derbez)
@ethansdyer Incredible! I'm curious what its response is if there's a mistake in the question, or if none of the options presented is correct.
1
0
0
0
Ethan Dyer (@ethansdyer)
2/ Among many impressive properties, one side effect of training on the web is that Minerva has seen text used to draw mathematical figures and so can sometimes reason about diagrams.
[Image]
3
14
119
0
Ethan Dyer retweeted
Vedant Misra (@vedantmisra)
Thrilled to announce🦉Minerva: a large language model capable of solving mathematical problems using step-by-step reasoning in natural language. See blog here: goo.gle/3yGpTN7 and samples here: minerva-demo.github.io (1/n)
alewkowycz (@alewkowycz)

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

3
28
122
0
Ethan Dyer retweeted
Behnam Neyshabur (@bneyshabur)
Very excited to announce a significant milestone in expanding reasoning capabilities of language models! 🎉🎉 We introduce #Minerva🦉: a language model that can solve mathematical questions using step-by-step natural language reasoning: bit.ly/3OBj2d5 🧵 1/
[3 images]
alewkowycz (@alewkowycz)

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

11
120
601
0