Akshay Goindani

40 posts

Akshay Goindani
@AkshayGoindani1

Founding Research Engineer @Voyage_AI_ | AI/NLP Grad @SCSatCMU

Joined July 2020
549 Following · 65 Followers
Akshay Goindani@AkshayGoindani1·
@DirhousssiAmine @hgoel1000 If it works with 1e-5, is it the case that the total update with the smaller learning rate falls out of the representable range in bf16, whereas with 1e-5 the update is still representable? By update I mean the product of the learning rate and the gradient.
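A minimal sketch of that hypothesis (the weight value, gradient scale, and learning rates below are illustrative assumptions, not numbers from the actual run): bf16 keeps only a 7-bit mantissa, so an update smaller than half a unit in the last place of the weight is rounded away entirely, while a 10x larger update survives.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision (8-bit exponent, 7-bit
    mantissa) via round-to-nearest-even on the float32 bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

w = to_bf16(2 ** -10)   # an illustrative small weight, ~9.8e-4
g = 1.0                 # an illustrative gradient of order 1

for lr in (1e-6, 1e-5):
    survived = to_bf16(w + lr * g) != w
    print(f"lr={lr:g}: update survives bf16 rounding -> {survived}")
    # lr=1e-06 -> False (update lost), lr=1e-05 -> True (update kept)
```

The spacing between bf16 values near `w = 2**-10` is about `7.6e-6`, so a `1e-6` update rounds back to `w`, exactly the kind of silent no-op optimizer step the question is probing.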
Dirhousssi Amine@DirhousssiAmine·
@hgoel1000 I am obtaining the logits from vLLM exactly as pi_old. To give you another hint at the problem that I am not showing here, to tease out the research: all runs converge if we push the lr from 1e-6 -> 1e-5.
Dirhousssi Amine@DirhousssiAmine·
Been going down a massive rabbit hole with numerical stability in RL training lately. 🕵️‍♂️🕵️ Take a look at these two GRPO sanity runs. Exact same model, identical task. One climbs perfectly, the other completely flatlines. The only difference? The dead run is in bf16, the successful one is fp32. What do you think the problem is with these runs? Drop your best guesses below!
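One frequently cited suspect in bf16 GRPO runs, offered here as a hedged guess rather than the thread's confirmed answer: if pi and pi_old are computed at different precisions, the per-token importance ratio exp(logp - logp_old) is not exactly 1 even when the policies are identical, and the error compounds over long sequences. A toy illustration with a made-up log-probability:

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Round to bfloat16 precision (round-to-nearest-even on float32 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

logp = -3.4567891          # illustrative token log-prob from an fp32 path
logp_old = to_bf16(logp)   # the same quantity after a bf16 round-trip

ratio = math.exp(logp - logp_old)  # should be exactly 1 when pi == pi_old
print(ratio)                       # ~0.996: a spurious ~0.4% per-token ratio
```

A fraction of a percent per token looks harmless, but multiplied across hundreds of tokens it can push ratios outside the PPO/GRPO clip range with no real policy change behind it.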
Akshay Goindani@AkshayGoindani1·
Our latest rerankers consistently outperform LLMs across all setups, delivering larger improvements regardless of the first-stage retriever. Check out our blog for more insights: blog.voyageai.com/2025/10/22/the…
Akshay Goindani@AkshayGoindani1·
It was great to drive this effort — the results are very exciting. Most works highlight the benefits of using LLMs for reranking, but often rely on results from weak retrieval models. When we pair rerankers with a strong retriever like voyage-3-large, those advantages disappear.
Voyage AI by MongoDB@VoyageAI

@zhmeishi @AkshayGoindani1 @HongLiu9903 Get all the details in our blog: mongodb.social/6010AAAij Shoutout to @zhmeishi, @AkshayGoindani1, and @HongLiu9903 for their incredible work on this research!
Akshay Goindani retweeted
Hong Liu@HongLiu9903·
Greater things to come!
Akshay Goindani retweeted
Dev Ittycheria@dittycheria·
We just launched voyage-context-3, a new embedding model that gives AI a full-document view while preserving chunk-level precision, offering better retrieval performance than leading alternatives. When building AI that reads and reasons over documents (such as reports, contracts, or medical records), it’s critical to break those documents into smaller pieces, or “chunks,” while still maintaining an understanding of the big picture. Most systems today lose important context, or require complicated workarounds to stitch it back together. blog.voyageai.com/2025/07/23/voy…
Akshay Goindani retweeted
Voyage AI by MongoDB@VoyageAI·
📢 voyage-context-3: contextualized chunk embeddings
- Auto-captures chunk-level detail & global doc context, w/o metadata augmentation
- Beats OpenAI-v3-large by 14.24% & Cohere-v4 by 7.89%
- Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy, but 192x cheaper in VDB costs
Akshay Goindani@AkshayGoindani1·
Learning the output format is easy and quickly saturates the reward — leading to zero advantage and no gradient signal (if there's no KL). Interesting that this still seems to induce reasoning. Any hypothesis for why that happens? @natolambert
Rulin Shao@RulinShao

Arxiv: arxiv.org/abs/2506.10947 Clearly a lot more work is needed to understand what’s really happening with RL and prompting. We hope that our experiments with spurious rewards and spurious prompts, as well as the released code, data, checkpoints, etc. will help with this! 🔍
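The zero-advantage point follows directly from the group-normalized advantage commonly used in GRPO. A minimal sketch (the reward values are made up): once every sample in a prompt's group earns the same saturated reward, every advantage is zero and the policy-gradient term vanishes.

```python
import statistics

def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages as commonly defined in GRPO:
    (r - mean) / (std + eps) within one prompt's sample group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # mixed rewards: nonzero signal
print(group_advantages([1.0, 1.0, 1.0, 1.0]))  # saturated: all zeros, no gradient
```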
Akshay Goindani@AkshayGoindani1·
@shradhasgl Agreed, I think this might be because RL training is high variance, so several runs are needed. That also makes the results hard to reproduce. It would be interesting to see if such observations hold true after averaging over several runs. x.com/scychan_brains…
Stephanie Chan@scychan_brains

Agree that we need to remember the high variance of RL, as we push further into long horizons etc! We developed metrics to help folks track RL reliability -- codebase+paper here: github.com/google-researc…
Shradha Sehgal@shradhasgl·
Can someone pls give a tldr of what happened in RLVR this past week…
Akshay Goindani retweeted
Voyage AI by MongoDB@VoyageAI·
📢 Meet voyage-3.5 and voyage-3.5-lite!
• flexible dim. and quantizations
• voyage-3.5 & 3.5-lite (int8, 2048 dim.) are 8% & 6% more accurate than OpenAI-v3-large, and 2.2x & 6.5x cheaper, resp. Also 83% less vectorDB cost!
• 3.5-lite ~ Cohere-v4 in quality, but 83% cheaper.
Akshay Goindani retweeted
Ravid Shwartz Ziv@ziv_ravid·
I love @karpathy, but vibe coding is a waste of time. It is good for tech bros who want to look cool, but besides rare cases, it will not make you deliver your product faster.
Akshay Goindani@AkshayGoindani1·
Results from our work HEMM (arxiv.org/abs/2407.03418) resonate with these findings. Frontier models like GPT-4o struggle on healthcare tasks, with vision-language medical tasks being more challenging.
Percy Liang@percyliang

1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!
Akshay Goindani@AkshayGoindani1·
@percyliang Great insights! In our paper HEMM: Holistic Evaluation of Multimodal Foundation Models (arxiv.org/abs/2407.03418), we show that even frontier models like GPT-4o aren’t ready for medical tasks yet. Focusing on key image regions is challenging.
Percy Liang@percyliang·
3/🧵We have so far evaluated 6 models, including GPT-4o, Gemini 1.5 Pro, Qwen-2.5-7B-instruct.
- Even the best model (GPT-4o) struggles on critical healthcare tasks...maybe not quite ready for prime time?
- GPT-4o leads only in 2/5 categories.
- Bigger ≠ better: Llama-3.3-70B-instruct actually outperforms larger models in Patient Communication and Education tasks.
Percy Liang@percyliang·
1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!
Akshay Goindani@AkshayGoindani1·
@andrew_n_carr Some paper used “think carefully and I will give you a tip” 😂 Tip is all you need
Andrew Carr 🤸@andrew_n_carr·
I didn't know this about reasoning models. It turns out if you add a "reward"-like description to your prompt, it dramatically improves performance.
Akshay Goindani@AkshayGoindani1·
@AravSrinivas It’s actually very annoying after using it for a while, as it keeps generating long thoughts for simple things.
Aravind Srinivas@AravSrinivas·
I have been told by some Perplexity users that once they switched to using Pro R1, they just can’t stop using it. It’s too addictive to watch the stream of consciousness from the AI. Something no other product exposes, btw.
Akshay Goindani@AkshayGoindani1·
@abeirami DPO also seems like a type of contrastive learning, trying to bring log-probabilities closer.
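The contrastive reading is visible in the DPO objective itself: a logistic loss on the gap between the chosen and rejected sequences' reference-relative log-probabilities. A minimal sketch (the log-prob values and beta below are illustrative):

```python
import math

def dpo_loss(lp_w, lp_l, ref_lp_w, ref_lp_l, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(lp_w - ref_lp_w) - (lp_l - ref_lp_l)])."""
    margin = beta * ((lp_w - ref_lp_w) - (lp_l - ref_lp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Widening the chosen/rejected gap lowers the loss, as in a contrastive objective.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # no gap: log(2) ≈ 0.693
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))   # chosen above rejected: smaller loss
```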
Ahmad Beirami@abeirami·
A very nice blogpost on GRPO (the method that was used to train R1) by Youssef Mroueh
Akshay Goindani@AkshayGoindani1·
@SeunghyunSEO7 Yeah, storing just the compressed vector is fine, as it can be reconstructed on the fly with the up-projection matrix.
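A sketch of the idea being discussed, reminiscent of low-rank KV-cache compression schemes (the dimensions and matrices here are made up for illustration): cache only the down-projected latent and re-expand it with the up-projection when needed.

```python
# Toy low-rank cache: store c = W_down @ x (r floats) instead of x (d floats),
# then reconstruct x_hat = W_up @ c on the fly. Pure-Python matvec, stdlib only.
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

d, r = 4, 2                               # illustrative dims: compress d -> r
W_down = [[1, 0, 0, 0], [0, 1, 0, 0]]     # toy down-projection (r x d)
W_up = [[1, 0], [0, 1], [0, 0], [0, 0]]   # toy up-projection (d x r)

x = [0.5, -1.0, 0.0, 0.0]  # a vector lying in the rank-r subspace
c = matvec(W_down, x)      # cache only this: 2 floats instead of 4
x_hat = matvec(W_up, c)    # reconstructed on the fly when attention needs it
print(c, x_hat)            # [0.5, -1.0] [0.5, -1.0, 0.0, 0.0]
```

In practice the up-projection can often be folded into the consuming matmul, so the full vector never needs to be materialized at all.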
Akshay Goindani retweeted
Hong Liu@HongLiu9903·
Tried to reproduce the CoIR results. TL;DR: SFR-Embedding-Code-2B_R is 26.5% worse than voyage-code-2, as opposed to what is claimed in the paper.
Salesforce AI Research@SFResearch

🚨🚨🚨Just released!🚨🚨🚨 🚀Introducing the Salesforce Code Embedding Model Family (SFR-Embedding-Code), ranked #1 on CoIR Benchmark! 🚀 Available in 2 sizes: 2B, 400M.
Key Highlights:
1️⃣ 2B Model: Achieves #1 on CoIR.
2️⃣ 400M Model: Best-performing model under 0.5B parameters.
3️⃣ Multi-lingual, multi-task unified training framework for code retrieval
4️⃣ Supports 12 programming languages, including Python, Java, C++, JavaScript, C#, and more!
🧑‍💻✨Empower your next AI Coding Agent with the best code embedding models! 🧑‍💻✨
Join us in advancing #AccurateAI:
📎Paper: bit.ly/4gSZteu
🤗400M Model: bit.ly/4jhDRdp
🤗2B Model: bit.ly/3PCqxmp
#CodeAI #MLResearch #SOTA #OpenScience @Salesforce
Big thanks to our research team for SFR-Embedding-Code: Ye Liu @YeLiu918, Rui Meng @RuiMeng_, Shafiq Joty @JotyShafiq, Silvio Savarese @silviocinguetta, Yingbo Zhou @yingbozhou_ai, Caiming Xiong @CaimingXiong, Semih Yavuz @semih__yavuz
Akshay Goindani retweeted
Jonathan Ellis@spyced·
I ran a fresh evaluation of embedding models tuned for semantic retrieval, including the newest models from Voyage, Jina, Cohere, and NVIDIA. Link in thread.