Nikhil Mehta

51 posts

@_nikhilmehta

Staff Research Scientist @GoogleDeepMind

Joined September 2014

346 Following · 69 Followers

Nikhil Mehta retweeted

Dr. Datta M.D. (Radiology) M.B.B.S. 🇮🇳@DrDatta_AIIMS·Nov 20

🔥 Gemini 3.0 vs Radiologists: RadLE Benchmark Results Are OUT! ☠️ Is it game over for Radiology? Let us find out! ⬇️ 🫨 Since yesterday, Gemini 3.0 has been everywhere for crushing benchmarks. My inbox exploded asking: “But how did it do on the hardest visual reasoning benchmark in healthcare?” So we ran it! And here you go. 👇 ➡️ Gemini 3.0 Pro on RadLE v1: ✅ 51% accuracy; first time a general-purpose model has beaten radiology residents ✅ Radiology residents: 45% ✅ Board-certified radiologists: ~83% ✅ Shows clean step-by-step reasoning in some tough cases (appendix localization, mimics ruled out, etc.) 🚀 This is the first time ever that a generalist model has crossed the trainee bar on RadLE v1! Congratulations to @GoogleDeepMind and @Google team including @vivnat, @alan_karthi and all others for cooking this time! Full breakdown here: 🔗 Link in comments / bio 🔥 Huge shoutout to Lakshmi, Divya, Upasana, Hakikat, Kautik & the entire #CRASHLab team at @KCDH_A for turning around in under a day. 🙌 If you are a medical AI lab and want to improve your performances and want our expert insights, reach out!

188

1.2K

524.8K

Nikhil Mehta retweeted

Sundar Pichai@sundarpichai·Nov 18

Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI. Excited for you to try it!

1.1K

2.6K

21.4K

2.9M

Nikhil Mehta retweeted

Jeff Dean@JeffDean·Dec 19

Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time computation!

124

470

3.8K

1.5M

Nikhil Mehta retweeted

Jeff Dean@JeffDean·Dec 18

Which model is best for turning natural language into SQL? Querying minds want to know... Gemini models in the top 4 positions.

Subhash Peshwa@Subhash_Peshwa

2024 State of LLMs for Text2SQL Tasks 🏆- Full Report 🥇 Overall Performance: @GoogleDeepMind Gemini-Exp-1206 🥇 Open Source Model: @Alibaba_Qwen 2.5-Coder:32b (Beats Sonnet 3.5 and on par with GPT-4o!) Disappointing performance by GPT-4o and 3.5 Sonnet on this task. 🧵

131

435

181.2K

Nikhil Mehta retweeted

Jeff Dean@JeffDean·Dec 17

No half measures in Veo 2. It stands to reason and bears looking at.

Hernan Moraldo@hhm

Prompt: "Bear writing the solution to 2x-1=0. But only the solution!"

1.9K

144.6K

Nikhil Mehta retweeted

AshutoshShrivastava@ai_for_success·Dec 17

Google Veo-2 vs OpenAI Sora. Google is getting better of OpenAI this December 😀 Credit : Veo-2 - @agrimgupta92 Sora - @AntDX316

691

102.4K

Nikhil Mehta retweeted

Mahesh Sathiamoorthy@madiator·Aug 9

Excited to offer a sneak peek at what we have been working on. Check out the LLM-AggreFact leaderboard [1] for factuality and hallucination detection, and the demo of our model that tops the leaderboard [2]. [1] llm-aggrefact.github.io [2] playground.bespokelabs.ai More info to come later!

Greg Durrett@gregd_nlp

🤔 Want to know if your LLMs are factual? You need LLM fact-checkers. 📣 Announcing the LLM-AggreFact leaderboard to rank LLM fact-checkers. 📣 Want the best model? Check out @bespokelabsai’s’ Bespoke-Minicheck-7B model, which is the current SOTA fact-checker and is cheap and fast to run. LLM-AggreFact collects 11 datasets across NLP tasks covering grounded factuality. These datasets consist of 🤖 LLM responses ✏️ annotated with their hallucinations with respect to grounding documents. This includes question answering and summarization, including RAGTruth, TofuEval, ExpertQA, and more. We benchmark 27 models on the task of detecting hallucinations. Frontier LLMs are good at this task, but very expensive to use in real-world RAG pipelines! Bespoke's model is a step towards We invite progress on this benchmark to figure out what’s the smallest and fastest model we can get to achieve top scores!

23.6K

Nikhil Mehta retweeted

Oriol Vinyals@OriolVinyalsML·May 17

Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra. As a math undergrad, our drastic results in mathematics are particularly exciting to me! In section 7 of the tech report, we present new results on a math-specialised variant of Gemini 1.5 Pro which performs strongly on competition-level math problems, including a breakthrough performance of 91.1% on Hendryck’s MATH benchmark without tool-use (examples below 🧵). Gemini 1.5 is widely available, try it out for free here aistudio.google.com & read the full tech report here: goo.gle/GeminiV1-5

191

988

712.4K

Nikhil Mehta retweeted

Gowthami@gowthami_s·May 17

@OriolVinyalsML @JeffDean Gemini 1.5 Pro did pretty well on our recently introduced long video understanding benchmark too! Did better than GPT-4o on the hard split. Congrats on a great model. 🎉

Gowthami@gowthami_s

📣 Happy to introduce, CinePile, a long video QA dataset and benchmark! 300k train and 5k test split. A 🧶. (1/9) 📃: arxiv.org/abs/2405.08813 🤗: huggingface.co/datasets/tomg-… #MachineLearning

14.3K

Nikhil Mehta retweeted

Jeff Dean@JeffDean·May 15

Gemini 1.5 Flash has really great qualities. A really good capable, natively multimodal, 1M token context window (with signup available to get access to a 2M token variant), and super lower latencies and fast response generation.

Google DeepMind@GoogleDeepMind

Today, we’re excited to introduce a new Gemini model: 1.5 Flash. ⚡ It’s a lighter weight model compared to 1.5 Pro and optimized for tasks where low latency and cost matter - like chat applications, extracting data from long documents and more. #GoogleIO

312

67.9K

Nikhil Mehta retweeted

Demis Hassabis@demishassabis·May 14

We think of @GoogleDeepMind as the engine room of @Google in the AI era. Thrilled to share our vision at #GoogleIO incl the latest Gemini model 1.5 Flash, Project Astra our universal AI agent effort, our new gen video model Veo, Imagen 3 & lots more! deepmind.google

235

1.6K

214.3K

Nikhil Mehta retweeted

Google DeepMind@GoogleDeepMind·May 14

We watched #GoogleIO with Project Astra. 👀

220

1.3K

466.4K

Nikhil Mehta@_nikhilmehta·Apr 16

@YiTayML Congratulations Yi for this achievement!! Amazing vibes!

Yi Tay@YiTayML·Apr 15

It's been a wild ride. Just 20 of us, burning through thousands of H100s over the past months, we're glad to finally share this with the world! 💪 One of the goals we’ve had when starting Reka was to build cool innovative models at the frontier. Reaching GPT-4/Opus level was a personal goal for many of us in the team. Doing it from scratch, on top of starting a company, makes it even more challenging but rewarding. 😁 Core is still improving (not done training!) but we’re happy to ship an early version 🚢. I’ve been vibe-checking it for a bit and it’s a really cool model (especially at multimodal) 😎. Check out the blogpost, technical report and very non-cherry picked, “in the wild” showcase/demo in the thread below! Core is competitive with true frontier models. It beats Claude3 Opus on multimodal chat and matches GPT4-V on MMMU. Text metrics are competitive too (~83+ MMLU). In my mind, this is our arrival at the frontier. 😎👌🔥 More fun stuff to come in the following weeks! 😋

Reka@RekaAILabs

Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It’s been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body trailer.👇

927

216.5K

Nikhil Mehta retweeted

Jeff Dean@JeffDean·Feb 15

Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long context capabilities, supporting millions of tokens of multimodal input. The multimodal capabilities of the model means you can interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more. Gemini 1.5 was built by an amazing team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google. @OriolVinyals (my co-technical lead for the project) and I are incredibly proud of the whole team, and we’re so excited to be sharing this work and what long context and in-context learning can mean for you today! There’s lots of material about this, some of which are linked to below. Main blog post: blog.google/technology/ai/… Technical report: “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context” goo.gle/GeminiV1-5 Videos of interactions with the model that highlight its long context abilities: Understanding the three.js codebase: youtube.com/watch?v=SSnsmq… Analyzing a 45 minute Buster Keaton movie: youtube.com/watch?v=wa0MT8… Apollo 11 transcript interaction: youtube.com/watch?v=LHKL_2… Starting today, we’re offering a limited preview of 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Read more about this on these blogs: Google for Developers blog: developers.googleblog.com/2024/02/gemini… Google Cloud blog: cloud.google.com/blog/products/… We’ll also introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. 
Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model. Early testers can try the 1 million token context window at no cost during the testing period. We’re excited to see what developer’s creativity unlocks with a very long context window. Let me walk you through the capabilities of the model and what I’m excited about!

183

1.1K

1.7M

Nikhil Mehta retweeted

Google@Google·Feb 8

Bard is becoming Gemini, and we’re launching two new experiences: 1️⃣ Gemini Advanced, which gives you access to Ultra 1.0, our most capable AI model 2️⃣ A new mobile app for easier collaboration on the go Learn more ↓ goo.gle/489fXtT

235

653

3.3K

664K

Nikhil Mehta retweeted

Mahesh Sathiamoorthy@madiator·Sep 21

Our work "Recommender Systems with Generative Retrieval" got accepted to NeurIPS 😊🎉 Congrats again to my co-authors @shashank_r12, @_nikhilmehta, @vqctran, @YiTayML, @jonahsamost, @Maciej_Kula, @edchi Latest version at arxiv.org/abs/2305.05065

Mahesh Sathiamoorthy@madiator

Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12, @_nikhilmehta, @YiTayML, @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: shashankrajput.github.io/Generative.pdf #GenerativeAI 🧵 (1/n)

243

64.5K

Nikhil Mehta retweeted

Demis Hassabis@demishassabis·Apr 20

The phenomenal teams from Google Research’s Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead! dpmd.ai/announcing-goo…

134

612

4.2K

1.4M

Nikhil Mehta retweeted

Mahesh Sathiamoorthy@madiator·Mar 23

461

201.6K

Nikhil Mehta retweeted

Jeff Dean@JeffDean·Mar 21

Bard is now available in the US and UK, w/more countries to come. It’s great to see early @GoogleAI work reflected in it—advances in sequence learning, large neural nets, Transformers, responsible AI techniques, dialog systems & more. You can try it at bard.google.com

Sundar Pichai@sundarpichai

We're expanding access to Bard in US + UK with more countries ahead, it's an early experiment that lets you collaborate with generative AI. Hope Bard sparks more creativity and curiosity, and will get better with feedback. Sign up: bard.google.com blog.google/technology/ai/…

117

709

339.7K

Nikhil Mehta retweeted

Yann LeCun@ylecun·Jan 22

LLMs are still making sh*t up. That's fine if you use them as writing assistants. Not good as question answerers, search engines, etc. RLHF merely mitigates the most frequent mistakes without actually fixing the problem.

Delip Rao e/σ@deliprao

the success of chatgpt has lead to investors thinking RLHF is magic (to some extent it is), but boy they are going to be disappointed when their portfolios realize its limitations

197

1.1K

435K
