Yash Jain
@_jainyash

165 posts
pre-training @essential_ai. ex-Scientist @Microsoft. CS @iitbombay and @GeorgiaTech. Views are my own.

San Francisco, CA · Joined July 2021
449 Following · 430 Followers
Yash Jain reposted
Ritvik Kapila @RitvikKapila
Automating harness engineering is the future, and we're building it at @NeoSigmaAI! Really excited to share our work on self-improving systems, demonstrated on Tau3 bench. Check out our results: agent harness optimization performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (a ~40% relative jump in accuracy). We are building closed autonomous feedback loops where our system captures failures, converts them into structured evaluation signals, proposes and validates experiments, and iteratively improves the harness. The result is an agentic harness that evolves faster and more reliably than humans could, leveraging far more context, running vastly more experiments, and exploring them in parallel.
Gauri Gupta @gauri__gupta

We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).

2 replies · 5 reposts · 19 likes · 2K views
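The closed loop described in the tweet (capture failures → structured eval signals → propose experiments → validate → keep improvements) can be sketched very loosely in Python. All names here (`Harness`, `evaluate`, `improve`) are hypothetical illustrations, not NeoSigma's actual API, and the scorer is a toy substring check standing in for real agent evaluation:

```python
from dataclasses import dataclass

@dataclass
class Harness:
    prompt: str        # agent instructions/config, flattened to one string (toy)
    score: float = 0.0

def evaluate(harness, eval_cases):
    # Toy scorer: fraction of eval cases the harness prompt already covers.
    # A real system would run the agent on each case and grade the transcript.
    hits = sum(case in harness.prompt for case in eval_cases)
    return hits / len(eval_cases)

def improve(harness, eval_cases, rounds=3):
    for _ in range(rounds):
        # 1. Capture failures and turn them into structured eval signals.
        failures = [c for c in eval_cases if c not in harness.prompt]
        if not failures:
            break
        # 2. Propose an experiment: a candidate harness patched for one failure.
        candidate = Harness(prompt=harness.prompt + " " + failures[0])
        # 3. Validate: keep the candidate only if the validation score improves.
        if evaluate(candidate, eval_cases) > evaluate(harness, eval_cases):
            harness = candidate
    harness.score = evaluate(harness, eval_cases)
    return harness
```

The key design point mirrored here is that the underlying model never changes; only the harness is mutated, and each mutation must pay for itself on a fixed validation set before it is kept.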
Gauri Gupta @gauri__gupta
6/ Special thanks to @shyamalanadkat (ex-OpenAI), @timweingarten (ex-Anthropic, Claude Cowork), Victor Barres (Sierra), @reah_ai (Google DeepMind), @MahapatraChirag (Mercor), @karthik_r_n (GPT co-creator, ex-OpenAI) for reviewing and providing valuable feedback on this blog post. At @NeoSigmaAI, we are building this future. If you're deploying production agent systems and want to reliably maintain and improve agent harnesses from production signals, we'd love to talk. Full blog and Tau3 bench results here: neosigma.ai/blog/self-impr…
4 replies · 1 repost · 28 likes · 2.3K views
Gauri Gupta @gauri__gupta
We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).
45 replies · 43 reposts · 254 likes · 88.9K views
Yash Jain @_jainyash
@atulit_gaur That is indeed an "expectation" of the experts that we expect gradient descent to learn. I tackled this problem a few years ago with a specialised router; give it a read: arxiv.org/abs/2311.04894
0 replies · 0 reposts · 1 like · 67 views
atulit @atulit_gaur
the router in mixture of experts models is a linear layer. it takes a token's hidden state, multiplies it by a weight matrix of shape (num_experts, hidden_dim), softmaxes the result, and picks the top-k experts. that's it.

but why does a matrix multiply "know" which expert to pick? each row of the router matrix is basically a learned prototype for that expert. the dot product measures how similar the token is to that prototype. high score = that expert gets activated.

the cool part is nobody hardcodes what each expert specializes in. during training, gradient descent naturally pushes experts toward specialization because it minimizes loss better that way.

one problem though - without a load balancing auxiliary loss, the router collapses and keeps sending tokens to the same 2-3 experts while the rest rot. that's why every moe paper has some balancing trick.
8 replies · 1 repost · 65 likes · 8.5K views
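The router described above is small enough to sketch end to end. Here is a minimal NumPy version of a top-k router plus a Switch-Transformer-style load-balancing auxiliary loss; the shapes and the aux-loss form are standard, but this is an illustrative sketch, not any particular model's code:

```python
import numpy as np

def moe_router(hidden, router_weight, k=2):
    """Top-k MoE routing. hidden: (tokens, hidden_dim); router_weight: (experts, hidden_dim)."""
    # Each row of router_weight is a learned prototype; the dot product
    # scores token-expert similarity.
    logits = hidden @ router_weight.T                        # (tokens, experts)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)            # softmax over experts
    topk_idx = np.argsort(-probs, axis=-1)[:, :k]            # (tokens, k)
    topk_probs = np.take_along_axis(probs, topk_idx, axis=-1)
    # Load-balancing auxiliary loss: penalize mismatch between the fraction
    # of tokens routed to each expert and the mean router probability.
    # Minimized when routing is uniform, which fights expert collapse.
    num_experts = router_weight.shape[0]
    counts = np.bincount(topk_idx.ravel(), minlength=num_experts)
    frac_tokens = counts / topk_idx.size
    mean_prob = probs.mean(axis=0)
    aux_loss = num_experts * float(frac_tokens @ mean_prob)
    return topk_idx, topk_probs, aux_loss
```

In training, `aux_loss` is scaled by a small coefficient and added to the language-modeling loss, which is the "balancing trick" the tweet refers to.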
Yash Jain @_jainyash
Check out this amazing work led by @HenryOxplore and @VibhavVineet at Microsoft Research, and Andrew Markham and He Liang at the University of Oxford.
0 replies · 0 reposts · 3 likes · 76 views
Yash Jain @_jainyash
Surprisingly, current TTA models fail to distinguish between "generate a sound for a train 10 meters away" and "a train 500 meters away." Presenting our ICLR 2026 work, Aurelius: a synthetic data generation pipeline to enhance relation-aware TTA generation. 🚀
Yuhang He (Henry) @HenryOxplore

🚀 [ICLR 2026] Existing text-to-audio generation (TTA) methods mainly focus on semantic correctness, yet they perform very poorly on relation-aware TTA generation. For example, current models achieve <30% audio event presence accuracy and <10% relation accuracy.

In our newly accepted ICLR 2026 paper, we introduce Aurelius, a framework that enables relation-aware TTA research at scale. Specifically, we introduce two meticulously curated corpora:
🗂 AudioEventSet — 110 audio events across 7 major classes.
🗂 AudioRelSet — 100 relations across 6 major relation types.

Based on the two corpora and the proposed data creation strategy, we can create massive (nearly unlimited) pairs with both
• high linguistic diversity.
• high acoustic diversity.

We release all resources to support the broader community in AI, acoustics, computer vision, and multimodal research.
📄 Paper: openreview.net/pdf?id=LAYCYiI…
🗂 Dataset: huggingface.co/datasets/yuhan…
💻 Code: github.com/yuhanghe01/Aur…
🌐 Project Page: yuhanghe01.github.io/Aurelius-Proj/

Huge thanks to Andrew Markham, He Liang, @_jainyash and @VibhavVineet at Microsoft Research and University of Oxford for their unwavering support. #ICLR2026 #Multimodality

1 reply · 1 repost · 8 likes · 313 views
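The data creation idea above (cross an event corpus with a relation corpus to mass-produce relation-aware prompts) can be illustrated with a toy sketch. The event list, relation templates, and prompt wording below are invented stand-ins, not the released AudioEventSet/AudioRelSet:

```python
import random

# Invented stand-ins for the two corpora (the real sets have 110 events
# and 100 relations, which is what makes the pairing "nearly unlimited").
EVENTS = ["train", "dog bark", "rain", "siren"]
RELATIONS = [
    "{a} 10 meters away",
    "{a} far in the distance",
    "{a} before {b}",
    "{a} overlapping with {b}",
]

def make_prompts(events, relations, n, seed=0):
    """Sample n relation-aware TTA prompts by pairing events with relations."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(n):
        rel = rng.choice(relations)
        a, b = rng.sample(events, 2)          # two distinct events
        # str.format ignores unused kwargs, so one-event templates work too.
        prompts.append("Generate a sound for " + rel.format(a=a, b=b))
    return prompts
```

With 110 events and 100 relations, even single-pair combinations already give tens of thousands of distinct prompts before any linguistic paraphrasing, which is where the "nearly unlimited" scale comes from.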
xjdr @_xjdr
"if you prove to me that you can distill frontier policy by SFT on less than 1T tokens, i will close my lab, quit my startup and come work for you right now"
20 replies · 86 reposts · 2K likes · 79.3K views
Yash Jain reposted
Essential AI @essential_ai
[1/5] We thank the community for their feedback on Rnj-1. We’d like to announce a few updates to Rnj-1-instruct based on what we heard:
- Resolving premature truncation of generations and improved instruction following.
- Instructions for 128k context length extrapolation.
- Updated evals, baselines, and model generations for reproducibility.
Details follow 🧵
1 reply · 5 reposts · 39 likes · 15.6K views
Sarah Chieng @MilksandMatcha
Cafe Compute is SF's first late-night coffeeshop with free coffee, desks, food, and unbeatable company. Last year, Cafe Compute started as a Tuesday-night experiment in our Hayes Valley living room. We kept the lights on late, ordered In-N-Out, and made sure our wifi would survive. In 2025, 10,000 people showed up! Researchers, first-time founders, cereal founders, and friends-of-friends, across SF, NYC, San Diego, Vancouver, and more. Thank you so much to the community who joins us. We closed out the year with a Cafe Compute: cereal edition to launch our Big Chip Club podcast, and the verdict was unanimous: Oreo-os are king. If you want an invite next year, comment your favorite cereal and we'll add you to the list :) Let's bring SF back. 🤙 A huge shoutout to the team who makes Cafe Compute possible: @communidiyi @zhennydez @cerebras @sfcompute
27 replies · 8 reposts · 164 likes · 375.1K views
Yash Jain reposted
Andrew Carr 🤸 @andrew_n_carr
seems like the open stack is:
- glm-4.6V-Flash for multimodal
- olmo 3 32B thinking for math
- rnj-1-instruct for agentic
7 replies · 11 reposts · 201 likes · 24.3K views
Yash Jain reposted
Together AI @togethercompute
Introducing Rnj-1 Instruct from @essential_ai, an open-source 8B model engineered for agentic coding and STEM tasks. AI natives can now use Rnj-1 Instruct on Together AI and benefit from reliable inference for production-scale software engineering and scientific workflows.
3 replies · 10 reposts · 38 likes · 5.5K views