Nikhil Mehta

51 posts

@_nikhilmehta

Staff Research Scientist @GoogleDeepMind

Joined September 2014
346 Following · 69 Followers
Nikhil Mehta retweeted
Dr. Datta M.D. (Radiology) M.B.B.S. 🇮🇳
🔥 Gemini 3.0 vs Radiologists: RadLE Benchmark Results Are OUT! ☠️ Is it game over for Radiology? Let us find out! ⬇️
🫨 Since yesterday, Gemini 3.0 has been everywhere for crushing benchmarks. My inbox exploded asking: “But how did it do on the hardest visual reasoning benchmark in healthcare?” So we ran it! And here you go. 👇
➡️ Gemini 3.0 Pro on RadLE v1:
✅ 51% accuracy; the first time a general-purpose model has beaten radiology residents
✅ Radiology residents: 45%
✅ Board-certified radiologists: ~83%
✅ Shows clean step-by-step reasoning in some tough cases (appendix localization, mimics ruled out, etc.)
🚀 This is the first time ever that a generalist model has crossed the trainee bar on RadLE v1! Congratulations to @GoogleDeepMind and the @Google team, including @vivnat, @alan_karthi and all the others, for cooking this time!
Full breakdown here: 🔗 Link in comments / bio
🔥 Huge shoutout to Lakshmi, Divya, Upasana, Hakikat, Kautik & the entire #CRASHLab team at @KCDH_A for turning this around in under a day. 🙌
If you are a medical AI lab and want to improve your performance with our expert insights, reach out!
Nikhil Mehta retweeted
Sundar Pichai@sundarpichai·
Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI. Excited for you to try it!
Nikhil Mehta retweeted
Jeff Dean@JeffDean·
Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time computation!
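The gain from extra inference-time computation mentioned in the tweet can be illustrated with self-consistency sampling: draw several independent answers and take a majority vote. A minimal sketch, where `sample_answer` is a hypothetical stand-in (a noisy oracle, not a real model call):

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> str:
    # Hypothetical stand-in for one sampled reasoning path from a model:
    # a noisy oracle that answers "42" 70% of the time and "41" otherwise.
    return "42" if rng.random() < 0.7 else "41"

def self_consistency(question: str, n_samples: int = 25, seed: int = 0) -> str:
    # Majority vote over independently sampled answers; spending more
    # inference-time compute (more samples) makes the vote more reliable.
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistency("What is 6 * 7?", n_samples=101))
```

This is only one simple way to trade compute for reliability; models like Flash Thinking instead generate explicit intermediate thoughts within a single response.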
Nikhil Mehta retweeted
Jeff Dean@JeffDean·
Which model is best for turning natural language into SQL? Querying minds want to know... Gemini models in the top 4 positions.
Subhash Peshwa@Subhash_Peshwa

2024 State of LLMs for Text2SQL Tasks 🏆- Full Report 🥇 Overall Performance: @GoogleDeepMind Gemini-Exp-1206 🥇 Open Source Model: @Alibaba_Qwen 2.5-Coder:32b (Beats Sonnet 3.5 and on par with GPT-4o!) Disappointing performance by GPT-4o and 3.5 Sonnet on this task. 🧵

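The text-to-SQL task being benchmarked can be sketched as a small harness: build a schema-grounded prompt for the model, then sanity-check any generated SQL against an in-memory SQLite database. The model call itself is omitted, and the `orders` schema below is an illustrative assumption:

```python
import sqlite3

# Illustrative toy schema; a real deployment supplies its own.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);"

def build_prompt(schema: str, question: str) -> str:
    # The prompt an LLM would receive; the actual model call is treated
    # as an external step and not shown here.
    return (
        "Given this SQLite schema:\n" + schema +
        "\nWrite a single SQL query answering: " + question +
        "\nReturn only SQL."
    )

def is_valid_sql(schema: str, candidate: str) -> bool:
    # Cheap validity check: compile the candidate with EXPLAIN against an
    # in-memory database built from the schema. Catches syntax errors and
    # references to unknown tables/columns (not semantic correctness).
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute("EXPLAIN " + candidate)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(is_valid_sql(SCHEMA, "SELECT customer, SUM(total) FROM orders GROUP BY customer"))  # True
print(is_valid_sql(SCHEMA, "SELECT nonexistent FROM orders"))  # False
```

Benchmarks like the one above typically go further and check execution results against gold answers, not just validity.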
Nikhil Mehta retweeted
AshutoshShrivastava@ai_for_success·
Google Veo-2 vs OpenAI Sora. Google is getting the better of OpenAI this December 😀 Credit: Veo-2 - @agrimgupta92, Sora - @AntDX316
Nikhil Mehta retweeted
Mahesh Sathiamoorthy@madiator·
Excited to offer a sneak peek at what we have been working on. Check out the LLM-AggreFact leaderboard [1] for factuality and hallucination detection, and the demo of our model that tops the leaderboard [2]. [1] llm-aggrefact.github.io [2] playground.bespokelabs.ai More info to come later!
Greg Durrett@gregd_nlp

🤔 Want to know if your LLMs are factual? You need LLM fact-checkers.
📣 Announcing the LLM-AggreFact leaderboard to rank LLM fact-checkers.
📣 Want the best model? Check out @bespokelabsai’s Bespoke-Minicheck-7B model, which is the current SOTA fact-checker and is cheap and fast to run.
LLM-AggreFact collects 11 datasets across NLP tasks covering grounded factuality. These datasets consist of 🤖 LLM responses ✏️ annotated with their hallucinations with respect to grounding documents. This spans question answering and summarization, including RAGTruth, TofuEval, ExpertQA, and more.
We benchmark 27 models on the task of detecting hallucinations.
Frontier LLMs are good at this task, but very expensive to use in real-world RAG pipelines! Bespoke's model is a step towards that. We invite progress on this benchmark to figure out the smallest and fastest model that can achieve top scores!

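The grounded fact-checking task the leaderboard measures can be illustrated with a toy checker: split a response into sentences and score each against the grounding document. A trained model like Bespoke-Minicheck replaces the crude word-overlap heuristic sketched here; the stopword list, threshold, and example texts are all illustrative assumptions:

```python
import re

STOPWORDS = {"the", "a", "an", "is", "was", "in", "of", "to", "and", "it"}

def content_words(text: str) -> set:
    # Lowercase alphabetic tokens minus a tiny stopword list.
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def grounding_score(claim: str, document: str) -> float:
    # Fraction of the claim's content words found in the document. A
    # trained entailment/fact-checking model replaces this heuristic.
    claim_words = content_words(claim)
    if not claim_words:
        return 1.0
    return len(claim_words & content_words(document)) / len(claim_words)

def flag_hallucinations(response: str, document: str, threshold: float = 0.5) -> list[str]:
    # Flag response sentences poorly supported by the grounding document.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    return [s for s in sentences if grounding_score(s, document) < threshold]

doc = "The Eiffel Tower is in Paris. It was completed in 1889."
resp = "The Eiffel Tower is in Paris. It was painted green in 2021."
print(flag_hallucinations(resp, doc))  # ['It was painted green in 2021.']
```

The benchmark's point is precisely that this kind of lexical heuristic is far too weak, and that small trained checkers can close the gap to frontier LLMs at a fraction of the cost.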
Nikhil Mehta retweeted
Oriol Vinyals@OriolVinyalsML·
Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, I find our dramatic results in mathematics particularly exciting! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendrycks’ MATH benchmark without tool use (examples below 🧵). Gemini 1.5 is widely available; try it out for free here aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5
Nikhil Mehta retweeted
Jeff Dean@JeffDean·
Gemini 1.5 Flash has really great qualities. It's really capable, natively multimodal, has a 1M token context window (with signup available to get access to a 2M token variant), and offers super low latency and fast response generation.
Google DeepMind@GoogleDeepMind

Today, we’re excited to introduce a new Gemini model: 1.5 Flash. ⚡ It’s a lighter weight model compared to 1.5 Pro and optimized for tasks where low latency and cost matter - like chat applications, extracting data from long documents and more. #GoogleIO

Nikhil Mehta retweeted
Demis Hassabis@demishassabis·
We think of @GoogleDeepMind as the engine room of @Google in the AI era. Thrilled to share our vision at #GoogleIO incl the latest Gemini model 1.5 Flash, Project Astra our universal AI agent effort, our new gen video model Veo, Imagen 3 & lots more! deepmind.google
Nikhil Mehta retweeted
Google DeepMind@GoogleDeepMind·
We watched #GoogleIO with Project Astra. 👀
Nikhil Mehta@_nikhilmehta·
@YiTayML Congratulations, Yi, on this achievement!! Amazing vibes!
Yi Tay@YiTayML·
It's been a wild ride: just 20 of us burning through thousands of H100s over the past months, and we're glad to finally share this with the world! 💪 One of the goals we’ve had since starting Reka was to build cool, innovative models at the frontier. Reaching GPT-4/Opus level was a personal goal for many of us on the team. Doing it from scratch, on top of starting a company, makes it even more challenging but rewarding. 😁 Core is still improving (not done training!) but we’re happy to ship an early version 🚢. I’ve been vibe-checking it for a bit and it’s a really cool model (especially on multimodal) 😎. Check out the blogpost, technical report and very non-cherry-picked, “in the wild” showcase/demo in the thread below! Core is competitive with true frontier models. It beats Claude 3 Opus on multimodal chat and matches GPT-4V on MMMU. Text metrics are competitive too (~83+ MMLU). In my mind, this is our arrival at the frontier. 😎👌🔥 More fun stuff to come in the following weeks! 😋
Reka@RekaAILabs

Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It’s been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body trailer.👇

Nikhil Mehta retweeted
Jeff Dean@JeffDean·
Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length

Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long context capabilities, supporting millions of tokens of multimodal input. The multimodal capabilities of the model mean you can interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more.

Gemini 1.5 was built by an amazing team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google. @OriolVinyals (my co-technical lead for the project) and I are incredibly proud of the whole team, and we’re so excited to be sharing this work and what long context and in-context learning can mean for you today!

There’s lots of material about this, some of which is linked below.
Main blog post: blog.google/technology/ai/…
Technical report: “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context” goo.gle/GeminiV1-5

Videos of interactions with the model that highlight its long context abilities:
Understanding the three.js codebase: youtube.com/watch?v=SSnsmq…
Analyzing a 45 minute Buster Keaton movie: youtube.com/watch?v=wa0MT8…
Apollo 11 transcript interaction: youtube.com/watch?v=LHKL_2…

Starting today, we’re offering a limited preview of 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Read more about this on these blogs:
Google for Developers blog: developers.googleblog.com/2024/02/gemini…
Google Cloud blog: cloud.google.com/blog/products/…

We’ll also introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model. Early testers can try the 1 million token context window at no cost during the testing period. We’re excited to see what developers’ creativity unlocks with a very long context window. Let me walk you through the capabilities of the model and what I’m excited about!
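Before sending a whole codebase or book collection to a long-context model, it helps to estimate whether it fits the window. A rough planning sketch, assuming the common ~4-characters-per-token heuristic (an assumption; real counts come from the model's tokenizer, e.g. a count-tokens endpoint):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Rough heuristic: ~4 characters per token for English text. This is
    # only a planning estimate; actual tokenizers vary by model and input.
    return int(len(text) / chars_per_token) + 1

def fits_in_context(docs: list[str], context_window: int = 1_000_000,
                    reserve_for_output: int = 8_192) -> bool:
    # Leave headroom for the prompt's instructions and the model's reply.
    budget = context_window - reserve_for_output
    return sum(estimate_tokens(d) for d in docs) <= budget

book = "word " * 100_000            # 500K characters, ~125K estimated tokens
print(fits_in_context([book] * 7))  # True: ~875K tokens fits in 1M
print(fits_in_context([book] * 9))  # False: ~1.13M tokens does not
```

For anything borderline, query the model's own token counter rather than trusting the heuristic.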
Nikhil Mehta retweeted
Google@Google·
Bard is becoming Gemini, and we’re launching two new experiences: 1️⃣ Gemini Advanced, which gives you access to Ultra 1.0, our most capable AI model 2️⃣ A new mobile app for easier collaboration on the go Learn more ↓ goo.gle/489fXtT
Nikhil Mehta retweeted
Mahesh Sathiamoorthy@madiator·
Our work "Recommender Systems with Generative Retrieval" got accepted to NeurIPS 😊🎉 Congrats again to my co-authors @shashank_r12, @_nikhilmehta, @vqctran, @YiTayML, @jonahsamost, @Maciej_Kula, @edchi Latest version at arxiv.org/abs/2305.05065
Mahesh Sathiamoorthy@madiator

Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12, @_nikhilmehta, @YiTayML, @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: shashankrajput.github.io/Generative.pdf #GenerativeAI 🧵 (1/n)

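The paper's core idea, generating a compact "semantic ID" for the next item instead of scoring a catalog of embeddings, can be sketched in miniature. The hand-assigned codes below stand in for codes the paper learns from item embeddings, and the prefix lookup stands in for constrained decoding; everything here is an illustrative toy:

```python
# Toy illustration of generative retrieval: each item gets a short
# "semantic ID" code tuple (hand-assigned here; the paper learns them by
# quantizing item embeddings), and recommendation means *generating* the
# next item's code rather than scoring every item in the catalog.

CATALOG = {
    (1, 4, 2): "sci-fi movie A",
    (1, 4, 7): "sci-fi movie B",
    (3, 0, 5): "cooking show C",
}

def decode_with_prefix(prefix: tuple) -> list[str]:
    # Stand-in for constrained beam search: a generated prefix is only
    # completed into codes that actually exist in the catalog.
    return [name for code, name in CATALOG.items() if code[:len(prefix)] == prefix]

# A seq2seq model trained on user histories might emit the prefix (1, 4)
# after a sci-fi-heavy session; constrained decoding keeps it on-catalog.
print(decode_with_prefix((1, 4)))  # ['sci-fi movie A', 'sci-fi movie B']
print(decode_with_prefix((3,)))    # ['cooking show C']
```

Because semantically similar items share code prefixes, the generated ID space itself carries the similarity structure that embedding-lookup retrievers get from nearest-neighbor search.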
Nikhil Mehta retweeted
Demis Hassabis@demishassabis·
The phenomenal teams from Google Research’s Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead! dpmd.ai/announcing-goo…
Nikhil Mehta retweeted
Jeff Dean@JeffDean·
Bard is now available in the US and UK, w/more countries to come. It’s great to see early @GoogleAI work reflected in it—advances in sequence learning, large neural nets, Transformers, responsible AI techniques, dialog systems & more. You can try it at bard.google.com
Sundar Pichai@sundarpichai

We're expanding access to Bard in US + UK with more countries ahead, it's an early experiment that lets you collaborate with generative AI. Hope Bard sparks more creativity and curiosity, and will get better with feedback. Sign up: bard.google.com blog.google/technology/ai/…

Nikhil Mehta retweeted
Yann LeCun@ylecun·
LLMs are still making sh*t up. That's fine if you use them as writing assistants. Not good as question answerers, search engines, etc. RLHF merely mitigates the most frequent mistakes without actually fixing the problem.
Delip Rao e/σ@deliprao

the success of chatgpt has led to investors thinking RLHF is magic (to some extent it is), but boy, they are going to be disappointed when their portfolios realize its limitations
