Akshay Goindani

40 posts

Akshay Goindani
@AkshayGoindani1

Founding Research Engineer @Voyage_AI_ | AI/NLP Grad @SCSatCMU

Joined July 2020
549 Following · 65 Followers
Akshay Goindani@AkshayGoindani1·
@DirhousssiAmine @hgoel1000 If it works with 1e-5, is it the case that the total update with the smaller learning rate falls out of the representable range in bf16, whereas with 1e-5 the update is still representable? By update I mean the product of the learning rate and the gradient.
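A minimal sketch of that hypothesis (the weight value, gradient scale, and learning rates below are illustrative assumptions, not numbers from the actual run): bf16 keeps only a 7-bit mantissa, so an update smaller than half a unit in the last place of the weight is rounded away entirely, while a 10x larger update survives.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision (8-bit exponent, 7-bit
    mantissa) via round-to-nearest-even on the float32 bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

w = to_bf16(2 ** -10)   # an illustrative small weight, ~9.8e-4
g = 1.0                 # an illustrative gradient of order 1

for lr in (1e-6, 1e-5):
    survived = to_bf16(w + lr * g) != w
    print(f"lr={lr:g}: update survives bf16 rounding -> {survived}")
    # lr=1e-06 -> False (update lost), lr=1e-05 -> True (update kept)
```

The spacing between bf16 values near `w = 2**-10` is about `7.6e-6`, so a `1e-6` update rounds back to `w`, exactly the kind of silent no-op optimizer step the question is probing.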
Dirhousssi Amine@DirhousssiAmine·
@hgoel1000 I am obtaining the logits from vLLM exactly as pi_old. To give you another hint at the problem that I am not showing here, to tease out the research: all runs converge if we push the lr from 1e-6 -> 1e-5.
Dirhousssi Amine@DirhousssiAmine·
Been going down a massive rabbit hole with numerical stability in RL training lately. 🕵️‍♂️🕵️ Take a look at these two GRPO sanity runs. Exact same model, identical task. One climbs perfectly, the other completely flatlines. The only difference? The dead run is in bf16, the successful one is fp32. What do you think the problem is with these runs? Drop your best guesses below!
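One frequently cited suspect in bf16 GRPO runs, offered here as a hedged guess rather than the thread's confirmed answer: if pi and pi_old are computed at different precisions, the per-token importance ratio exp(logp - logp_old) is not exactly 1 even when the policies are identical, and the error compounds over long sequences. A toy illustration with a made-up log-probability:

```python
import math
import struct

def to_bf16(x: float) -> float:
    """Round to bfloat16 precision (round-to-nearest-even on float32 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

logp = -3.4567891          # illustrative token log-prob from an fp32 path
logp_old = to_bf16(logp)   # the same quantity after a bf16 round-trip

ratio = math.exp(logp - logp_old)  # should be exactly 1 when pi == pi_old
print(ratio)                       # ~0.996: a spurious ~0.4% per-token ratio
```

A fraction of a percent per token looks harmless, but multiplied across hundreds of tokens it can push ratios outside the PPO/GRPO clip range with no real policy change behind it.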
Akshay Goindani@AkshayGoindani1·
Our latest rerankers consistently outperform LLMs across all setups, delivering larger improvements regardless of the first-stage retriever. Check out our blog for more insights: blog.voyageai.com/2025/10/22/the…
Akshay Goindani@AkshayGoindani1·
It was great to drive this effort — the results are very exciting. Most works highlight the benefits of using LLMs for reranking, but often rely on results from weak retrieval models. When we pair rerankers with a strong retriever like voyage-3-large, those advantages disappear.
Voyage AI by MongoDB@VoyageAI

@zhmeishi @AkshayGoindani1 @HongLiu9903 Get all the details in our blog: mongodb.social/6010AAAij Shoutout to @zhmeishi, @AkshayGoindani1, and @HongLiu9903 for their incredible work on this research!
Akshay Goindani retweeted
Hong Liu@HongLiu9903·
Greater things to come!
Akshay Goindani retweeted
Dev Ittycheria@dittycheria·
We just launched voyage-context-3, a new embedding model that gives AI a full-document view while preserving chunk-level precision, offering better retrieval performance than leading alternatives. When building AI that reads and reasons over documents (such as reports, contracts, or medical records), it’s critical to break those documents into smaller pieces, or “chunks,” while still maintaining an understanding of the big picture. Most systems today lose important context, or require complicated workarounds to stitch it back together. blog.voyageai.com/2025/07/23/voy…
Akshay Goindani retweeted
Voyage AI by MongoDB@VoyageAI·
📢 voyage-context-3: contextualized chunk embeddings
- Auto-captures chunk-level detail & global doc context, w/o metadata augmentation
- Beats OpenAI-v3-large by 14.24% & Cohere-v4 by 7.89%
- Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy, but 192x cheaper in VDB costs
Akshay Goindani@AkshayGoindani1·
Learning the output format is easy and quickly saturates the reward — leading to zero advantage and no gradient signal (if there's no KL). Interesting that this still seems to induce reasoning. Any hypothesis for why that happens? @natolambert
Rulin Shao@RulinShao

Arxiv: arxiv.org/abs/2506.10947 Clearly a lot more work is needed to understand what’s really happening with RL and prompting. We hope that our experiments with spurious rewards and spurious prompts, as well as the released code, data, checkpoints, etc. will help with this! 🔍
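The zero-advantage point follows directly from the group-normalized advantage commonly used in GRPO. A minimal sketch (the reward values are made up): once every sample in a prompt's group earns the same saturated reward, every advantage is zero and the policy-gradient term vanishes.

```python
import statistics

def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages as commonly defined in GRPO:
    (r - mean) / (std + eps) within one prompt's sample group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # mixed rewards: nonzero signal
print(group_advantages([1.0, 1.0, 1.0, 1.0]))  # saturated: all zeros, no gradient
```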
Akshay Goindani@AkshayGoindani1·
@shradhasgl Agreed, I think this might be because RL training is high variance, so several runs are needed. That also makes the results hard to reproduce. It would be interesting to see if such observations hold true after averaging over several runs. x.com/scychan_brains…
Stephanie Chan@scychan_brains

Agree that we need to remember the high variance of RL, as we push further into long horizons etc! We developed metrics to help folks track RL reliability -- codebase+paper here: github.com/google-researc…
Shradha Sehgal@shradhasgl·
Can someone pls give a tldr of what happened in RLVR this past week…
Akshay Goindani retweeted
Voyage AI by MongoDB@VoyageAI·
📢 Meet voyage-3.5 and voyage-3.5-lite!
• flexible dim. and quantizations
• voyage-3.5 & 3.5-lite (int8, 2048 dim.) are 8% & 6% more accurate than OpenAI-v3-large, and 2.2x & 6.5x cheaper, resp. Also 83% less vectorDB cost!
• 3.5-lite ~ Cohere-v4 in quality, but 83% cheaper.
Akshay Goindani retweeted
Ravid Shwartz Ziv@ziv_ravid·
I love @karpathy, but vibe coding is a waste of time. It is good for tech bros who want to look cool, but besides rare cases, it will not make you deliver your product faster.
Akshay Goindani@AkshayGoindani1·
Results from our work HEMM (arxiv.org/abs/2407.03418) resonate with these findings. Frontier models like GPT-4o struggle on healthcare tasks, with vision-language medical tasks being more challenging.
Percy Liang@percyliang

1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!
Akshay Goindani@AkshayGoindani1·
@percyliang Great insights! In our paper HEMM: Holistic Evaluation of Multimodal Foundation Models (arxiv.org/abs/2407.03418), we show that even frontier models like GPT-4o aren’t ready for medical tasks yet. Focusing on key image regions is challenging.
Percy Liang@percyliang·
3/🧵We have so far evaluated 6 models, including GPT-4o, Gemini 1.5 Pro, Qwen-2.5-7B-instruct.
- Even the best model (GPT-4o) struggles on critical healthcare tasks...maybe not quite ready for prime time?
- GPT-4o leads only in 2/5 categories.
- Bigger ≠ better: Llama-3.3-70B-instruct actually outperforms larger models in Patient Communication and Education tasks.
Percy Liang@percyliang·
1/🧵How do we know if AI is actually ready for healthcare? We built a benchmark, MedHELM, that tests LMs on real clinical tasks instead of just medical exams. #AIinHealthcare Blog, GitHub, and link to leaderboard in thread!
Akshay Goindani@AkshayGoindani1·
@andrew_n_carr Some paper used “think carefully and I will give you a tip” 😂 Tip is all you need
Andrew Carr 🤸@andrew_n_carr·
I didn't know this about reasoning models. It turns out if you add a "reward"-like description to your prompt, it dramatically improves performance.
Akshay Goindani@AkshayGoindani1·
@AravSrinivas It’s actually very annoying after using it for a while, as it keeps generating long thoughts for simple things.
Aravind Srinivas@AravSrinivas·
I have been told by some Perplexity users that once they switched to using Pro R1, they just can’t stop using it. It’s too addictive to watch the stream of consciousness from the AI. Something no other product exposes, btw.
Akshay Goindani@AkshayGoindani1·
@abeirami DPO also seems like a type of contrastive learning, trying to bring log-probabilities closer.
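The contrastive reading is visible in the DPO objective itself: a logistic loss on the gap between the chosen and rejected sequences' reference-relative log-probabilities. A minimal sketch (the log-prob values and beta below are illustrative):

```python
import math

def dpo_loss(lp_w, lp_l, ref_lp_w, ref_lp_l, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * [(lp_w - ref_lp_w) - (lp_l - ref_lp_l)])."""
    margin = beta * ((lp_w - ref_lp_w) - (lp_l - ref_lp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Widening the chosen/rejected gap lowers the loss, as in a contrastive objective.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # no gap: log(2) ≈ 0.693
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))   # chosen above rejected: smaller loss
```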
Ahmad Beirami@abeirami·
A very nice blogpost on GRPO (the method that was used to train R1) by Youssef Mroueh
Akshay Goindani@AkshayGoindani1·
@SeunghyunSEO7 Yeah, storing just the compressed vector is fine, as it can be reconstructed on the fly with the up-projection matrix.
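A sketch of the idea being discussed, reminiscent of low-rank KV-cache compression schemes (the dimensions and matrices here are made up for illustration): cache only the down-projected latent and re-expand it with the up-projection when needed.

```python
# Toy low-rank cache: store c = W_down @ x (r floats) instead of x (d floats),
# then reconstruct x_hat = W_up @ c on the fly. Pure-Python matvec, stdlib only.
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

d, r = 4, 2                               # illustrative dims: compress d -> r
W_down = [[1, 0, 0, 0], [0, 1, 0, 0]]     # toy down-projection (r x d)
W_up = [[1, 0], [0, 1], [0, 0], [0, 0]]   # toy up-projection (d x r)

x = [0.5, -1.0, 0.0, 0.0]  # a vector lying in the rank-r subspace
c = matvec(W_down, x)      # cache only this: 2 floats instead of 4
x_hat = matvec(W_up, c)    # reconstructed on the fly when attention needs it
print(c, x_hat)            # [0.5, -1.0] [0.5, -1.0, 0.0, 0.0]
```

In practice the up-projection can often be folded into the consuming matmul, so the full vector never needs to be materialized at all.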
Akshay Goindani retweeted
Hong Liu@HongLiu9903·
Tried to reproduce the CoIR results. TL;DR: SFR-Embedding-Code-2B_R is 26.5% worse than voyage-code-2, as opposed to what is claimed in the paper.
Salesforce AI Research@SFResearch

🚨🚨🚨Just released!🚨🚨🚨 🚀Introducing the Salesforce Code Embedding Model Family (SFR-Embedding-Code), ranked #1 on CoIR Benchmark! 🚀 Available in 2 sizes: 2B, 400M.
Key Highlights:
1️⃣ 2B Model: Achieves #1 on CoIR.
2️⃣ 400M Model: Best-performing model under 0.5B parameters.
3️⃣ Multi-lingual, multi-task unified training framework for code retrieval
4️⃣ Supports 12 programming languages, including Python, Java, C++, JavaScript, C#, and more!
🧑‍💻✨Empower your next AI Coding Agent with the best code embedding models! 🧑‍💻✨
Join us in advancing #AccurateAI:
📎Paper: bit.ly/4gSZteu
🤗400M Model: bit.ly/4jhDRdp
🤗2B Model: bit.ly/3PCqxmp
#CodeAI #MLResearch #SOTA #OpenScience @Salesforce
Big thanks to our research team for SFR-Embedding-Code: Ye Liu @YeLiu918, Rui Meng @RuiMeng_, Shafiq Joty @JotyShafiq, Silvio Savarese @silviocinguetta, Yingbo Zhou @yingbozhou_ai, Caiming Xiong @CaimingXiong, Semih Yavuz @semih__yavuz
Akshay Goindani retweeted
Jonathan Ellis@spyced·
I ran a fresh evaluation of embedding models tuned for semantic retrieval, including the newest models from Voyage, Jina, Cohere, and NVIDIA. Link in thread.