Yash Jain
@_jainyash

165 posts
pre-training @essential_ai. ex-Scientist @Microsoft. CS @iitbombay and @GeorgiaTech. Views are my own.

San Francisco, CA · Joined July 2021
449 Following · 430 Followers
Yash Jain reposted
Ritvik Kapila @RitvikKapila
Automating harness engineering is the future, and we're building it at @NeoSigmaAI! Really excited to share our work on self-improving systems, demonstrated on Tau3 bench. Check out our results: agent harness optimization performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (a ~40% relative jump in accuracy). We are building closed autonomous feedback loops where our system captures failures, converts them into structured evaluation signals, proposes and validates experiments, and iteratively improves the harness. The result is an agentic harness that evolves faster and more reliably than humans could, leveraging far more context, running vastly more experiments, and exploring them in parallel.
Gauri Gupta @gauri__gupta

We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).

2 replies · 5 reposts · 19 likes · 2K views
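The closed loop described in the tweet (capture failures → structured eval signals → propose experiments → validate → keep improvements) can be sketched very loosely in Python. All names here (`Harness`, `evaluate`, `improve`) are hypothetical illustrations, not NeoSigma's actual API, and the scorer is a toy substring check standing in for real agent evaluation:

```python
from dataclasses import dataclass

@dataclass
class Harness:
    prompt: str        # agent instructions/config, flattened to one string (toy)
    score: float = 0.0

def evaluate(harness, eval_cases):
    # Toy scorer: fraction of eval cases the harness prompt already covers.
    # A real system would run the agent on each case and grade the transcript.
    hits = sum(case in harness.prompt for case in eval_cases)
    return hits / len(eval_cases)

def improve(harness, eval_cases, rounds=3):
    for _ in range(rounds):
        # 1. Capture failures and turn them into structured eval signals.
        failures = [c for c in eval_cases if c not in harness.prompt]
        if not failures:
            break
        # 2. Propose an experiment: a candidate harness patched for one failure.
        candidate = Harness(prompt=harness.prompt + " " + failures[0])
        # 3. Validate: keep the candidate only if the validation score improves.
        if evaluate(candidate, eval_cases) > evaluate(harness, eval_cases):
            harness = candidate
    harness.score = evaluate(harness, eval_cases)
    return harness
```

The key design point mirrored here is that the underlying model never changes; only the harness is mutated, and each mutation must pay for itself on a fixed validation set before it is kept.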
Gauri Gupta @gauri__gupta
6/ Special thanks to @shyamalanadkat (ex-OpenAI), @timweingarten (ex-Anthropic, Claude Cowork), Victor Barres (Sierra), @reah_ai (Google DeepMind), @MahapatraChirag (Mercor), @karthik_r_n (GPT co-creator, ex-OpenAI) for reviewing and providing valuable feedback on this blog post. At @NeoSigmaAI, we are building this future. If you're deploying production agent systems and want to reliably maintain and improve agent harnesses from production signals, we'd love to talk. Full blog and Tau3 bench results here: neosigma.ai/blog/self-impr…
4 replies · 1 repost · 28 likes · 2.3K views
Gauri Gupta @gauri__gupta
We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).
45 replies · 43 reposts · 254 likes · 88.9K views
Yash Jain @_jainyash
@atulit_gaur That is indeed an "expectation" of the experts that we expect gradient descent to learn. I tackled this problem a few years ago with a specialised router; give it a read: arxiv.org/abs/2311.04894
0 replies · 0 reposts · 1 like · 67 views
atulit @atulit_gaur
the router in mixture of experts models is a linear layer. it takes a token's hidden state, multiplies it by a weight matrix of shape (num_experts, hidden_dim), softmaxes the result, and picks the top-k experts. that's it.

but why does a matrix multiply "know" which expert to pick? each row of the router matrix is basically a learned prototype for that expert. the dot product measures how similar the token is to that prototype. high score = that expert gets activated.

the cool part is nobody hardcodes what each expert specializes in. during training, gradient descent naturally pushes experts toward specialization because it minimizes loss better that way.

one problem though - without a load balancing auxiliary loss, the router collapses and keeps sending tokens to the same 2-3 experts while the rest rot. that's why every moe paper has some balancing trick.
8 replies · 1 repost · 65 likes · 8.5K views
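The router described above is small enough to sketch end to end. Here is a minimal NumPy version of a top-k router plus a Switch-Transformer-style load-balancing auxiliary loss; the shapes and the aux-loss form are standard, but this is an illustrative sketch, not any particular model's code:

```python
import numpy as np

def moe_router(hidden, router_weight, k=2):
    """Top-k MoE routing. hidden: (tokens, hidden_dim); router_weight: (experts, hidden_dim)."""
    # Each row of router_weight is a learned prototype; the dot product
    # scores token-expert similarity.
    logits = hidden @ router_weight.T                        # (tokens, experts)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)            # softmax over experts
    topk_idx = np.argsort(-probs, axis=-1)[:, :k]            # (tokens, k)
    topk_probs = np.take_along_axis(probs, topk_idx, axis=-1)
    # Load-balancing auxiliary loss: penalize mismatch between the fraction
    # of tokens routed to each expert and the mean router probability.
    # Minimized when routing is uniform, which fights expert collapse.
    num_experts = router_weight.shape[0]
    counts = np.bincount(topk_idx.ravel(), minlength=num_experts)
    frac_tokens = counts / topk_idx.size
    mean_prob = probs.mean(axis=0)
    aux_loss = num_experts * float(frac_tokens @ mean_prob)
    return topk_idx, topk_probs, aux_loss
```

In training, `aux_loss` is scaled by a small coefficient and added to the language-modeling loss, which is the "balancing trick" the tweet refers to.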
Yash Jain @_jainyash
Check out this amazing work led by @HenryOxplore and @VibhavVineet at Microsoft Research, and Andrew Markham and He Liang at the University of Oxford.
0 replies · 0 reposts · 3 likes · 76 views
Yash Jain @_jainyash
Surprisingly, current TTA models fail to distinguish between "generate a sound for a train 10 meters away" and "a train 500 meters away." Presenting our ICLR 2026 work, Aurelius: a synthetic data generation pipeline to enhance relation-aware TTA generation. 🚀
Yuhang He (Henry) @HenryOxplore

🚀 [ICLR 2026] Existing text-to-audio generation (TTA) methods mainly focus on semantic correctness, yet they perform very poorly on relation-aware TTA generation. For example, current models achieve <30% audio event presence accuracy and <10% relation accuracy.

In our newly accepted ICLR 2026 paper, we introduce Aurelius, a framework that enables relation-aware TTA research at scale. Specifically, we introduce two meticulously curated corpora:
🗂 AudioEventSet — 110 audio events across 7 major classes.
🗂 AudioRelSet — 100 relations across 6 major relation types.

Based on the two corpora and the proposed data creation strategy, we can create massive (nearly unlimited) pairs with both
• high linguistic diversity.
• high acoustic diversity.

We release all resources to support the broader community in AI, acoustics, computer vision, and multimodal research.
📄 Paper: openreview.net/pdf?id=LAYCYiI…
🗂 Dataset: huggingface.co/datasets/yuhan…
💻 Code: github.com/yuhanghe01/Aur…
🌐 Project Page: yuhanghe01.github.io/Aurelius-Proj/

Huge thanks to Andrew Markham, He Liang, @_jainyash and @VibhavVineet at Microsoft Research and University of Oxford for their unwavering support. #ICLR2026 #Multimodality

1 reply · 1 repost · 8 likes · 313 views
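The data creation idea above (cross an event corpus with a relation corpus to mass-produce relation-aware prompts) can be illustrated with a toy sketch. The event list, relation templates, and prompt wording below are invented stand-ins, not the released AudioEventSet/AudioRelSet:

```python
import random

# Invented stand-ins for the two corpora (the real sets have 110 events
# and 100 relations, which is what makes the pairing "nearly unlimited").
EVENTS = ["train", "dog bark", "rain", "siren"]
RELATIONS = [
    "{a} 10 meters away",
    "{a} far in the distance",
    "{a} before {b}",
    "{a} overlapping with {b}",
]

def make_prompts(events, relations, n, seed=0):
    """Sample n relation-aware TTA prompts by pairing events with relations."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(n):
        rel = rng.choice(relations)
        a, b = rng.sample(events, 2)          # two distinct events
        # str.format ignores unused kwargs, so one-event templates work too.
        prompts.append("Generate a sound for " + rel.format(a=a, b=b))
    return prompts
```

With 110 events and 100 relations, even single-pair combinations already give tens of thousands of distinct prompts before any linguistic paraphrasing, which is where the "nearly unlimited" scale comes from.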
xjdr @_xjdr
"if you prove to me that you can distill frontier policy by SFT on less than 1T tokens, i will close my lab, quit my startup and come work for you right now"
20 replies · 86 reposts · 2K likes · 79.3K views
Yash Jain reposted
Essential AI @essential_ai
[1/5] We thank the community for their feedback on Rnj-1. We’d like to announce a few updates to Rnj-1-instruct based on what we heard:
- Resolving premature truncation of generations and improved instruction following.
- Instructions for 128k context length extrapolation.
- Updated evals, baselines, and model generations for reproducibility.
Details follow 🧵
1 reply · 5 reposts · 39 likes · 15.6K views
Sarah Chieng @MilksandMatcha
Cafe Compute is SF's first late-night coffeeshop with free coffee, desks, food, and unbeatable company. Last year, Cafe Compute started as a Tuesday-night experiment in our Hayes Valley living room. We kept the lights on late, ordered In-N-Out, and made sure our wifi would survive. In 2025, 10,000 people showed up! Researchers, first-time founders, cereal founders, and friends-of-friends, across SF, NYC, San Diego, Vancouver, and more. Thank you so much to the community who joins us. We closed out the year with a Cafe Compute: cereal edition to launch our Big Chip Club podcast, and the verdict was unanimous: Oreo-os are king. If you want an invite next year, comment your favorite cereal and we'll add you to the list :) Let's bring SF back. 🤙 A huge shoutout to the team who makes Cafe Compute possible: @communidiyi @zhennydez @cerebras @sfcompute
27 replies · 8 reposts · 164 likes · 375.1K views
Yash Jain reposted
Andrew Carr 🤸 @andrew_n_carr
seems like the open stack is:
- glm-4.6V-Flash for multimodal
- olmo 3 32B thinking for math
- rnj-1-instruct for agentic
7 replies · 11 reposts · 201 likes · 24.3K views
Yash Jain reposted
Together AI @togethercompute
Introducing Rnj-1 Instruct from @essential_ai, an open-source 8B model engineered for agentic coding and STEM tasks. AI natives can now use Rnj-1 Instruct on Together AI and benefit from reliable inference for production-scale software engineering and scientific workflows.
3 replies · 10 reposts · 38 likes · 5.5K views