Ethan Dyer (@ethansdyer) - Twitter 프로필 | Zamantika Mersobahis Locabet

Ethan Dyer 리트윗함

Anthropic@AnthropicAI·30 Nis

BioMysteryBench, our new bioinformatics eval, tests whether Claude can devise creative solutions to open-ended research problems. Read more: anthropic.com/research/Evalu…

English

20

36

345

69.5K

Ethan Dyer 리트윗함

Anthropic@AnthropicAI·28 Şub

A statement on the comments from Secretary of War Pete Hegseth. anthropic.com/news/statement…

English

2.8K

6.6K

42.6K

17.7M

Ethan Dyer 리트윗함

Bassil Shama@BassilShama·5 Şub

Opus 4.6 is our most capable Computer Use model to date. Excited for everyone to give Computer Use a try with Claude in Chrome, Cowork, and Claude Code! To celebrate, I let Claude (4.6) Monet show off his artistic side in the Claude for Chrome extension.

Claude@claudeai

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

English

5

8

33

8.1K

Ethan Dyer 리트윗함

Behnam Neyshabur@bneyshabur·17 May

I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here: goo.gle/GeminiV1-5

Oriol Vinyals@OriolVinyalsML

Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, our drastic results in mathematics are particularly exciting to me! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendryck’s MATH benchmark without tool-use (examples below 🧵). Gemini 1.5 is widely available, try it out for free here aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5

English

9

15

163

147.4K

Ethan Dyer 리트윗함

Oriol Vinyals@OriolVinyalsML·17 May

Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, our drastic results in mathematics are particularly exciting to me! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendryck’s MATH benchmark without tool-use (examples below 🧵). Gemini 1.5 is widely available, try it out for free here aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5

English

42

191

989

712.6K

Ethan Dyer 리트윗함

Joshua Batson@thebasepoint·6 Eki

In writing this paper, there were countless features we thought might be bugs. After careful inspection, ~all of them revealed surprising and subtle model properties. To me this capacity for surprise is the true test of a new technique. This thread is about my favorite finding.

Anthropic@AnthropicAI

The fact that most individual neurons are uninterpretable presents a serious roadblock to a mechanistic understanding of language models. We demonstrate a method for decomposing groups of neurons into interpretable features with the potential to move past that roadblock.

English

4

40

373

104.6K

Ethan Dyer 리트윗함

nature@Nature·4 Eki

Nature research paper: Universality in long-distance geometry and quantum complexity go.nature.com/3ZFVmuu

English

0

7

18

17.7K

Ethan Dyer 리트윗함

Behnam Neyshabur@bneyshabur·24 Şub

Excited to announce that the entire Blueshift team has joined @DeepMind! We will be working with @OriolVinyalsML and others to advance capabilities of LLMs developed by DM / Alphabet! We hope to continue to grow DM's presence in Bay Area and New York in the coming months :-)

English

31

51

1.1K

209.2K

Ethan Dyer 리트윗함

Behnam Neyshabur@bneyshabur·22 Tem

If you are interested in solving challenging multi-step reasoning problems with LLMs, join us! We have an opening for a Research Scientist position at Blueshift! Learn more about the role & apply here: forms.gle/VZ8oHsuswt3iXw… Learn about our team: research.google/teams/blueshif…

alewkowycz@alewkowycz

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

English

1

9

62

0

Ethan Dyer@ethansdyer·2 Tem

@amirzait Great question! In arxiv.org/abs/2206.14858 we began to study memorization. We indeed looked at acc on modified questions, checked for MATH in the training data, and compared acc when removing answers similar to MATH. But this is an important direction for more follow up!

English

2

1

2

0

Amir Zait@amirzait·1 Tem

@ethansdyer Wow! Amazing! How do you prevent leakage of MATH into the training set? Many of the questions appear online verbatim with answers attached to them. Did you happen to try small variations on a small set of answers the model got correctly to make sure it didn't simply memorize?

English

2

0

1

0

Ethan Dyer@ethansdyer·1 Tem

1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

alewkowycz@alewkowycz

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

English

28

514

2.6K

0

Ethan Dyer@ethansdyer·2 Tem

@HAKSOAT MMLU doesn't seem to have many pure E&M problems that require multiple steps. I agree it would be interesting to do a systematic evaluation. But here is one that I grabbed:

English

1

0

2

0

Habeeb Shopeju@HAKSOAT·1 Tem

Curious how well it would do on electromagnetics related problems. 🤔

Ethan Dyer@ethansdyer

1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

English

1

0

2

0

Ethan Dyer@ethansdyer·2 Tem

@suzuki__r Yes, it is all done through reading TeX (or math ml, mathjax etc...). Very likely that the response will depend on the style.

English

1

0

1

0

Ryo Suzuki@suzuki__r·1 Tem

Wondering how it recognizes mathematical equations. If done through LaTeX, possibly depends on the style of code writing.

Ethan Dyer@ethansdyer

1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

English

1

0

Ethan Dyer@ethansdyer·2 Tem

@KyleCranmer One fun aspect of how few shot prompting works with these generative models is we give: Question: ... Answer: ... ... Question: ... Answer: ... Question: And the model produces an answer. But then it keeps making up new questions and answers -- next year's pset 😉.

English

0

2

0

Kyle Cranmer@KyleCranmer·2 Tem

Forget solving these problems, the Killer app would be writing homework problems with solutions 🤣

Ethan Dyer@ethansdyer

1/ Super excited to introduce #Minerva 🦉(goo.gle/3yGpTN7). Minerva was trained on math and science found on the web and can solve many multi-step quantitative reasoning problems.

Middleton, WI 🇺🇸 English

2

18

0

Ethan Dyer@ethansdyer·1 Tem

@holmesjtg We don't have any concrete plans, but are definitely very interested in how this can be adapted to be a helpful tutor, answer questions as students ask them (rather than as tests phrase them) etc... Do you have any favorite datasets for this?

English

3

0

6

0

Jeff Holmes@holmesjtg·1 Tem

@ethansdyer Amazing results! Was wondering if your team is considering combining this kind of training with learning theory data, so that a model could more effectively act as a tutor, knowing how to prompt and give feedback. Working on this with #gpt3 now, but the model is not ideal.

English

2

0

13

0

Ethan Dyer@ethansdyer·1 Tem

@pablo_derbez Without additional prompting, it can still be quite brittle to such things. On the other hand, we have seen examples where the problem answer options assume some kind of rounding, Minerva solves exactly and then correctly realizes it is supposed to round.

English

1

0

4

0

Pablo Derbez@pablo_derbez·1 Tem

@ethansdyer Incredible! I'm curious what its response is if there's a mistake in the question, or if none of the options presented is correct.

English

1

0

Ethan Dyer@ethansdyer·1 Tem

3/ Find out more about Minerva in the blog post (goo.gle/3yGpTN7), paper (arxiv.org/abs/2206.14858) or explore more minerva samples (minerva-demo.github.io)!

English

3

13

75

0

Ethan Dyer@ethansdyer·1 Tem

2/ Among many impressive properties, one side effect of training on the web is that Minerva has seen text used to draw mathematical figures and so can sometimes reason about diagrams.

English

3

14

119

0

Ethan Dyer 리트윗함

Vedant Misra@vedantmisra·30 Haz

Thrilled to announce🦉Minerva: a large language model capable of solving mathematical problems using step-by-step reasoning in natural language. See blog here: goo.gle/3yGpTN7 and samples here: minerva-demo.github.io (1/n)

alewkowycz@alewkowycz

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

English

3

28

122

0

Ethan Dyer 리트윗함

Behnam Neyshabur@bneyshabur·30 Haz

Very excited to announce a significant milestone in expanding reasoning capabilities of language models! 🎉🎉 We introduce #Minerva🦉: a language model that can solve mathematical questions using step-by-step natural language reasoning: bit.ly/3OBj2d5 🧵 1/

alewkowycz@alewkowycz

Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM. goo.gle/3yGpTN7

English

11

120

601

0

Ethan Dyer

탐색