Harsh Mehta

70 posts

@HarshMeh1a

@MirendilAI, Past: AI R&D @AnthropicAI, @GoogleDeepmind, Gemini

San Francisco, CA · Joined August 2013

407 Following · 5.1K Followers

Pinned Tweet

Harsh Mehta @HarshMeh1a · 6 Feb

Career Update: I've left Anthropic to start something new. Anthropic is a magical place — amazing people, strong culture, and unmatched taste. I have a lot of respect for my friends and ex-colleagues, and knowing them, I'm confident they'll do the right thing, especially when the choices are hard. I feel grateful and proud to have been part of the journey with them. Excited to start a new chapter — stay tuned for more!

Harsh Mehta @HarshMeh1a · 26 Mar

A glimpse of what we’ve been building. If this excites you, join us.

Behnam Neyshabur @bneyshabur

We have been heads down but wanted to share a bit about what we are doing 🧵

Harsh Mehta @HarshMeh1a · 8 Feb

Shayan is a machine, that xAI work ethic 🤌

Shayan @shayan_

Career update: I left xAI to start something new, closing my 7+ year chapter working at Twitter, X, and xAI with so much gratitude. xAI is truly an extraordinary place. The team is incredibly hardcore and talented, shipping at a pace that shouldn’t be possible. From the Home Timeline at X to Grok 2, 3, and 4 at xAI, I worked across product infra and model behavior post-training, from memory to coding infra, agents, and more. Pure startup mode, every day. Working closely with Elon across X and xAI, I saw what happens when you refuse to accept impossible as an answer. I learned to embody obsessive attention to detail, maniacal urgency, and to think from first principles. I’m deeply grateful to @elonmusk for the experience, to @wanghaofei for the trust and support throughout Twitter/X, and to @TheGregYang and @ibab for believing in me. And to the many incredible people I had the privilege to work with along the way, thank you! Now, I’m excited to take the leap and build something new, focused on accelerating science. More soon.

Harsh Mehta retweeted

Behnam Neyshabur @bneyshabur · 5 Feb

I've left Anthropic to start something new. 🧵

Harsh Mehta @HarshMeh1a · 30 Sep

The best coding model in the world (among other things) — check it out!

Claude @claudeai

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

Harsh Mehta @HarshMeh1a · 6 Aug

Opus is a beast -- try it out!

Anthropic @AnthropicAI

Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning.

Harsh Mehta retweeted

Ashok Cutkosky @AshokCutkosky · 10 Feb

Some ideas on a new optimizer from my student Qinzi Zhang: (github.com/ZQZCalin/train…) Early stages, but the empirical results are really promising! Would love to hear any thoughts, either on the empirical side or analysis-wise, and open to collaboration!

Harsh Mehta retweeted

Arena.ai @arena · 1 Aug

Exciting News from Chatbot Arena! @GoogleDeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes. For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive score of 1300 (!), and also achieving #1 on our Vision Leaderboard. Gemini 1.5 Pro (0801) excels in multi-lingual tasks and delivers robust performance in technical areas like Math, Hard Prompts, and Coding. Huge congrats to @GoogleDeepMind on this remarkable milestone!

Gemini (0801) Category Rankings:
- Overall: #1
- Math: #1-3
- Instruction-Following: #1-2
- Coding: #3-5
- Hard Prompts (English): #2-5

Come try the model and let us know your feedback! More analysis below👇

Logan Kilpatrick @OfficialLoganK

Today, we are making an experimental version (0801) of Gemini 1.5 Pro available for early testing and feedback in Google AI Studio and the Gemini API. Try it out and let us know what you think! aistudio.google.com

Harsh Mehta retweeted

Aaron Defazio @aaron_defazio · 1 Aug

Schedule-Free Wins AlgoPerf Self-Tuning Track 🎉 I'm pleased to announce that Schedule-Free AdamW set a new SOTA for self-tuning training algorithms, besting AdamW and all other submissions by 8% overall. Try it out: github.com/facebookresear…

MLCommons @MLCommons

@MLCommons #AlgoPerf results are in! 🏁 $50K prize competition yielded 28% faster neural net training with non-diagonal preconditioning beating Nesterov Adam. New SOTA for hyperparameter-free algorithms too! Full details in our blog. mlcommons.org/2024/08/mlc-al… #AIOptimization #AI

Harsh Mehta retweeted

Aaron Defazio @aaron_defazio · 28 May

Schedule-Free paper is up! arxiv.org/abs/2405.15682 Joint work with collaborators @alicey_ang @HarshMeh1a @konstmish @akhaledv2 @AshokCutkosky We have some strong small-scale experiments on Transformers, comparing to chinchilla-style cosine 10x reduction schedules.

Harsh Mehta @HarshMeh1a · 18 May

Check out our work on pushing the boundaries of the reasoning capabilities of Gemini 1.5 Pro! We've been working hard at this. Excited about the progress we've made and even more so for what's next! Full report: goo.gle/GeminiV1-5

Oriol Vinyals @OriolVinyalsML

Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, our drastic results in mathematics are particularly exciting to me! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendryck’s MATH benchmark without tool-use (examples below 🧵). Gemini 1.5 is widely available, try it out for free here aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5

Harsh Mehta @HarshMeh1a · 6 Apr

If you hate LR schedules as much as I do, check this out! Think of the significance in practice, especially for LLMs: new data comes in, you continue training without any change, as simple as that 🔥 Stay tuned for the official JAX version.

Aaron Defazio @aaron_defazio

Schedule-Free Learning github.com/facebookresear… We have now open sourced the algorithm behind my series of mysterious plots. Each plot was either Schedule-free SGD or Adam, no other tricks!
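To make "no schedule" concrete, here is a minimal sketch of the Schedule-Free SGD update as we understand it from the Schedule-Free paper (arxiv.org/abs/2405.15682): gradients are taken at an interpolation `y` between a plain SGD sequence `z` and its running uniform average `x`, and `x` is the point you evaluate. Assumptions: no warmup, uniform averaging weight `c = 1/(t+1)`, `beta = 0.9`, and a toy quadratic objective; the function name `schedule_free_sgd` is hypothetical, and this is not the facebookresearch implementation.

```python
import numpy as np

def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=100):
    """Sketch of Schedule-Free SGD (assumed form: no warmup, uniform averaging).

    z: base sequence that takes constant-lr gradient steps
    x: running uniform average of the z iterates (the point to evaluate)
    y: interpolation between z and x where the gradient is computed
    """
    z = np.asarray(x0, dtype=float).copy()
    x = z.copy()
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x  # gradient evaluation point
        z = z - lr * grad_fn(y)          # plain SGD step, no LR schedule
        c = 1.0 / (t + 1)                # uniform averaging weight
        x = (1.0 - c) * x + c * z        # x stays the mean of z_1..z_{t+1}
    return x

# Toy usage (hypothetical problem): minimize 0.5 * ||w - target||^2.
target = np.array([3.0, -2.0])
w = schedule_free_sgd(lambda v: v - target, x0=np.zeros(2), lr=0.5, beta=0.9, steps=500)
print(w)
```

The point the tweet makes is visible in the loop: `lr` never changes, so training can simply be continued for more steps when new data arrives; `w` should end up close to `target`.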

Harsh Mehta retweeted

Oriol Vinyals @OriolVinyalsML · 6 Dec

Exciting times, welcome Gemini (and MMLU>90)! State-of-the-art on 30 out of 32 benchmarks across text, coding, audio, images, and video, with a single model 🤯 Co-leading Gemini has been my most exciting endeavor, fueled by a very ambitious goal. And that is just the beginning!

A long 🐍 post about our Gemini journey & state of the field.

The biggest challenges in LLMs are far from trivial or obvious. Evaluation and data stand out to me. We've moved beyond the simpler "Have we won in Go/Chess/StarCraft?" to “Is this answer accurate and fair? Is this conversation good? Does this complex piece of text prove the theorem?” Exciting potential coupled with monumental challenges.

The field is less ripe further down the model pipeline. Pretraining is relatively well understood. Instruction tuning and RLHF, less so. In AlphaGo and AlphaStar we spent 5% of compute in pre-training and the rest in the very important RL phase, where the model learns from its successes or failures. In LLMs, we spend most of our time on pretraining. I believe there’s huge potential to be untapped. Cakes with lots of cherries, please 🎂

@Google has demonstrated its ability to move fast. It has been an absolute blast to see the energy from my colleagues and the support received. A “random” highlight is coauthoring our tech report with a co-founder. Another is coleading with @JeffDean. But beyond individuals, Gemini is about teamwork: it is important to recognize the collective effort behind such achievements. Picture a room full of brilliant people, and avoid attributing success solely to one person.

On a personal note, recently I celebrated my 10 year anniversary at Google, and it’s been 8 years since @quocleix and I co-authored “A Neural Conversational Model”, which gave us a glimpse of what was, has, and is yet to come. Back then, that line of work received a lot of skepticism. Lessons learned: whatever your passion is, push for it!

Zooming back out, there’s lots of change in our field, and the stakes couldn’t be higher. Excited for what’s to come from Gemini, but humbled by the responsibility to “get it right”. 2024 will be drastic. Welcome Gemini! blog.google/technology/ai/…

Harsh Mehta retweeted

Google DeepMind @GoogleDeepMind · 6 Dec

We’re excited to announce 𝗚𝗲𝗺𝗶𝗻𝗶: @Google’s largest and most capable AI model. Built to be natively multimodal, it can understand and operate across text, code, audio, image and video - and achieves state-of-the-art performance across many tasks. 🧵 dpmd.ai/announcing-gem…

Harsh Mehta @HarshMeh1a · 6 Dec

It’s been immense fun to have contributed to Gemini and a privilege to work with such talented colleagues! This is 1.0, we will continue to ship 💜

Jeff Dean @JeffDean

I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains.

Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks, including 10 of 12 popular text and reasoning benchmarks, 9 of 9 image understanding benchmarks, 6 of 6 video understanding benchmarks, and 5 of 5 speech recognition and speech translation benchmarks.

Gemini Ultra is the first model to achieve human-expert performance on MMLU across 57 subjects with a score above 90%. It also achieves a new state-of-the-art score of 62.4% on the new MMMU multimodal reasoning benchmark, outperforming the previous best model by more than 5 percentage points.

Gemini was built by an awesome team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google, and is one of the largest science and engineering efforts we’ve ever undertaken. As one of the two overall technical leads of the Gemini effort, along with my colleague @OriolVinyalsML, I am incredibly proud of the whole team, and we’re so excited to be sharing our work with you today!

There’s quite a lot of different material about Gemini available, starting with:
- Main blog post: blog.google/technology/ai/…
- 60-page technical report authored by the Gemini Team: deepmind.google/gemini/gemini_…

In this thread, I’ll walk you through some of the highlights.

Harsh Mehta @HarshMeh1a · 30 Oct

@keirp1 Thanks for all your awesome contributions, it was great to have you with us! Until next time :)

Keiran Paster @keirp1 · 29 Oct

Heading back to Toronto after spending the fall at Google hosted by @HarshMeh1a and working with amazing Blueshift and Gemini teammates! It's a really fun time to work on LLMs and I hope to be back soon!

Harsh Mehta retweeted

Konstantin Mishchenko @konstmish · 23 Oct

Why do we need warm-up, cosine annealing, and other learning rate schedules when training with gradient descent? It turns out it's all about how gradient norms change over time. E.g., large norms at the start => warm-up. Slow decrease => cosine. Paper: arxiv.org/abs/2310.07831 0/4
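As a rough illustration of how gradient norms can shape a schedule, the sketch below turns a recorded sequence of per-step gradient norms into relative learning rates. It assumes the SGD-style refinement rule w_t ∝ ||g_t||^-2 · Σ_{s>t} ||g_s||^2, rescaled to a peak of 1, as we read it from arxiv.org/abs/2310.07831; it omits the smoothing the paper applies to noisy norms and assumes all norms are positive. The function name `refined_schedule` is hypothetical.

```python
import numpy as np

def refined_schedule(grad_norms):
    """Map per-step gradient norms to relative learning rates (assumed SGD rule).

    w_t = ||g_t||^-2 * sum_{s > t} ||g_s||^2, rescaled so that max(w) == 1.
    No smoothing is applied; all norms are assumed to be > 0.
    """
    g2 = np.asarray(grad_norms, dtype=float) ** 2
    # tail[t] = sum of g2[s] for s > t (zero at the final step).
    tail = np.concatenate([np.cumsum(g2[::-1])[::-1][1:], [0.0]])
    w = tail / g2
    return w / w.max()

# Constant norms give a linear decay to zero.
print(refined_schedule([1.0] * 5))
# Large norms early on push the peak later, i.e. a warm-up emerges.
print(refined_schedule([4.0, 2.0, 1.0, 1.0, 1.0, 1.0]))
```

Under this assumed rule, constant gradient norms yield `[1, 0.75, 0.5, 0.25, 0]`, a linear decay, while large early norms shrink the first few rates, which matches the "large norms at the start => warm-up" intuition in the tweet.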

Harsh Mehta @HarshMeh1a · 24 Oct

Check out our paper at arxiv.org/pdf/2310.07831… Joint work with @aaron_defazio @AshokCutkosky and @konstmish

Harsh Mehta @HarshMeh1a · 24 Oct

3) All the cool kids tend to use the "cosine decay" schedule by default these days. Our work explains, and illustrates with a number of experiments, that "linear decay" can be surprisingly effective and sometimes even outperform cosine, including on LLMs!
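For concreteness, here are the two decay shapes being compared, written as functions of the step count. Both are the standard textbook forms decaying from a peak `lr_max` to zero over `total_steps` with no warm-up; the function names are illustrative and not taken from the paper's code.

```python
import math

def cosine_decay(step, total_steps, lr_max=1.0):
    """Cosine annealing from lr_max at step 0 to 0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_max * 0.5 * (1.0 + math.cos(math.pi * progress))

def linear_decay(step, total_steps, lr_max=1.0):
    """Linear decay from lr_max at step 0 to 0 at total_steps."""
    progress = min(step, total_steps) / total_steps
    return lr_max * (1.0 - progress)

for step in (0, 250, 500, 750, 1000):
    print(step, round(cosine_decay(step, 1000), 3), round(linear_decay(step, 1000), 3))
```

The two curves agree at the start, midpoint, and end; cosine stays higher during the first half and drops faster in the second half, while linear decays at a constant rate throughout.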

Harsh Mehta @HarshMeh1a · 24 Oct

Check out our new work on learning the learning rate *schedule*! In addition to providing theoretical motivation for some prevalent LR schedule best practices, our work provides new recommendations. A short 🧵👇

Aaron Defazio @aaron_defazio

🚨 New Paper 🚨 A new approach to learning rate scheduling! Our refinement theory gives schedules that include warmup and annealing-to-zero automatically. arxiv.org/abs/2310.07831 It improves on strong baseline schedules across a majority of deep learning problems!
