Devaansh Gupta

36 posts

@DevaanshGupta1

Pre/post-training @ Essential AI

Joined December 2021
94 Following · 157 Followers
Devaansh Gupta reposted
Gauri Gupta@gauri__gupta·
We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).
45 replies · 43 reposts · 254 likes · 88.9K views
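The "~40% jump" above is a relative gain over the 0.56 baseline rather than an absolute one; a quick arithmetic check (my own back-of-envelope, not from the thread):

```python
baseline, improved = 0.56, 0.78
absolute_gain = improved - baseline       # 0.22 accuracy points
relative_gain = absolute_gain / baseline  # ~0.39, i.e. the "~40% jump"
print(f"{relative_gain:.1%}")             # 39.3%
```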
Devaansh Gupta reposted
Yash Jain@_jainyash·
Surprisingly, current TTA models fail to distinguish between "generate a sound for a train 10 meters away" and "a train 500 meters away". Presenting our ICLR 2026 work, Aurelius: a synthetic data generation pipeline to enhance relation-aware TTA generation. 🚀
Yuhang He (Henry)@HenryOxplore

🚀 [ICLR 2026] Existing text-to-audio generation (TTA) methods mainly focus on semantic correctness, yet they perform very poorly on relation-aware TTA generation. For example, current models achieve <30% audio event presence accuracy and <10% relation accuracy.
In our newly accepted ICLR 2026 paper, we introduce Aurelius, a framework that enables relation-aware TTA research at scale. Specifically, we introduce two meticulously curated corpora:
🗂 AudioEventSet: 110 audio events across 7 major classes.
🗂 AudioRelSet: 100 relations across 6 major relation types.
Based on the two corpora and the proposed data creation strategy, we can create massive (nearly unlimited) pairs with both
• high linguistic diversity
• high acoustic diversity
We release all resources to support the broader community in AI, acoustics, computer vision, and multimodal research.
📄 Paper: openreview.net/pdf?id=LAYCYiI…
🗂 Dataset: huggingface.co/datasets/yuhan…
💻 Code: github.com/yuhanghe01/Aur…
🌐 Project Page: yuhanghe01.github.io/Aurelius-Proj/
Huge thanks to Andrew Markham, He Liang, @_jainyash and @VibhavVineet at Microsoft Research and University of Oxford for their unwavering support. #ICLR2026 #Multimodality

1 reply · 1 repost · 8 likes · 313 views
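The "nearly unlimited pairs" claim follows from crossing the two corpora. As a rough illustration (my own back-of-envelope; the paper's actual pairing rules may differ), one relation between two distinct events already yields over a million triples before any linguistic paraphrasing:

```python
n_events, n_relations = 110, 100  # AudioEventSet and AudioRelSet sizes
# One (event, relation, event) triple per caption, e.g.
# "a train passing, to the left of a barking dog" (hypothetical example)
triples = n_events * n_relations * (n_events - 1)  # exclude pairing an event with itself
print(f"{triples:,}")  # 1,199,000
```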
Devaansh Gupta reposted
Siyan Zhao@siyan_zhao·
Introducing 💡On-Policy Self-Distillation💡, a simple method that enables an LLM to teach itself with dense per-token feedback on its own on-policy generations, achieving 4-8x better token efficiency than GRPO and outperforming both GRPO and SFT/off-policy distillation.
Key insight: like a student reviewing solutions, rationalizing them, and correcting prior mistakes, an LLM can be conditioned on privileged info (e.g., a correct solution or a reasoning trace) and supervise its weaker self, the version without such access, by matching the distribution it induces under that privileged info. 🌐Blog: siyan-zhao.github.io/blog/2026/opsd/ 🧵👇
31 replies · 158 reposts · 922 likes · 132.9K views
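The dense per-token signal can be pictured as a per-position KL between the same model with and without privileged conditioning. A minimal pure-Python sketch of that loss shape (all names and the toy logits are mine; see the linked blog for the actual objective):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def per_token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at each position: dense per-token feedback,
    in contrast to a single scalar reward per rollout as in GRPO."""
    kls = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p, q = softmax(t_row), softmax(s_row)
        kls.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)))
    return kls

# Toy check: when the student matches the teacher, per-token KL vanishes.
logits = [[0.1, 2.0, -1.0], [1.5, 0.0, 0.5]]
print(per_token_kl(logits, logits))  # [0.0, 0.0]
```

Every position contributes its own gradient signal here, which is the intuition behind the 4-8x token-efficiency claim versus one scalar reward per rollout.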
Devaansh Gupta reposted
Daniel Israel@danielmisrael·
"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive generation and an order of magnitude faster than standard diffusion, while staying within 0.9-5% of AR quality.
7 replies · 47 reposts · 321 likes · 38.6K views
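The control flow the tweet describes, a short sequential planning pass followed by parallel span expansion, can be sketched with stubs (the plan format and generators are placeholders of mine, not the paper's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def plan(prompt):
    # A cheap sequential "planning" pass splits the answer into
    # independent spans (stubbed here as three fixed sections).
    return [f"{prompt}: section {i}" for i in range(3)]

def fill_span(span_spec):
    # Each span is expanded independently, so a diffusion LLM can
    # denoise all of them at once (stubbed as uppercasing).
    return span_spec.upper()

def planned_generation(prompt):
    spans = plan(prompt)                # sequential, short
    with ThreadPoolExecutor() as pool:  # parallel expansion
        return list(pool.map(fill_span, spans))

print(planned_generation("outline"))
# ['OUTLINE: SECTION 0', 'OUTLINE: SECTION 1', 'OUTLINE: SECTION 2']
```

The speedup comes from the expansion step dominating total length: only the short plan is generated token-by-token.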
Devaansh Gupta reposted
Essential AI@essential_ai·
[1/5] We thank the community for their feedback on Rnj-1. We’d like to announce a few updates to Rnj-1-instruct based on what we heard:
- Resolved premature truncation of generations and improved instruction following.
- Instructions for 128k context-length extrapolation.
- Updated evals, baselines, and model generations for reproducibility.
Details follow 🧵
1 reply · 5 reposts · 39 likes · 15.6K views
Nathan Lambert@natolambert·
Open models year in review. What a year! We're back with an updated open model builder tier list, our top models of the year, and our predictions for 2026.
First, the winning models:
1. DeepSeek R1 (@deepseek_ai): Transformed the AI world.
2. Qwen 3 Family (@AlibabaGroup): The new default open models.
3. Kimi K2 Family (@Kimi_Moonshot): Models that convinced the world that DeepSeek wasn't special and China would produce numerous leading models.
Runner-up models: MiniMax M2 (@minimax_ai), GLM 4.5 (@Zai_org), GPT-OSS (@OpenAI), Gemma 3 (@GoogleAI), Olmo 3 (@allen_ai).
Honorable mentions: Nvidia's (@nvidia) Parakeet speech-to-text model & Nemotron 2 LLM, Moondream 3 VLM (@moondreamai), Granite 4 LLMs (@IBMResearch), and HuggingFace's (@huggingface) SmolLM3.
Updated tier list:
Frontier open labs: DeepSeek (@deepseek_ai), Qwen (@AlibabaGroup), and Kimi Moonshot (@Kimi_Moonshot).
Close behind: Z.ai (@Zai_org) & MiniMax AI (@minimax_ai) (notably, none from the U.S. here and up).
Noteworthy (a mix of US & China): StepFun AI (@StepFun_ai), Ant Group's (@AntGroup / @TheInclusionAI) Inclusion AI, Meituan (@Meituan_LongCat), Tencent (@TencentHunyuan), IBM (@IBMResearch), Nvidia (@nvidia), Google (@GoogleAI), & Mistral (@MistralAI).
Then a bunch more below that, which we detail.
Predictions for 2026:
1. Scaling will continue with open models.
2. No substantive changes in the open model safety narrative.
3. Participation will continue to grow.
4. Ongoing general trends will continue w/ MoEs, hybrid attention, dense for fine-tuning.
5. The open and closed frontier gap will stay roughly the same on any public benchmarks.
6. No Llama-branded open model releases from Meta in 2026.
Read the full post on @interconnectsai -- link below.
69 replies · 262 reposts · 1.5K likes · 351.5K views
Devaansh Gupta reposted
Sergio Paniego@SergioPaniego·
NEW: @essential_ai just released Rnj-1, their first 8B model. You can easily fine-tune it with GRPO using TRL to add reasoning capabilities to a compact model. Free Colab link below.
1 reply · 12 reposts · 70 likes · 5.8K views
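TRL's GRPOTrainer accepts custom reward functions that score each completion in a batch. A toy sketch in that shape (this format-reward is my illustration, not the Colab's actual reward):

```python
import re

def format_reward(completions, **kwargs):
    """GRPO reward function in TRL's style: takes a batch of completions
    and returns one scalar reward per completion. This toy version rewards
    answers that wrap their reasoning in <think>...</think> tags."""
    pattern = re.compile(r"<think>.*?</think>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

print(format_reward(["<think>2+2=4</think> 4", "just 4"]))  # [1.0, 0.0]
```

A function like this would be passed as `reward_funcs` to `GRPOTrainer`; GRPO then normalizes these rewards within each group of sampled completions to form the advantage.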
Devaansh Gupta reposted
Omar Khattab@lateinteraction·
> We deliberately kept post-training limited to allow for further specialization by the community. As an indicator of the untapped potential of the models we report pass@{1,2,4,8} for hard codegen, agentic, and math benchmarks. Cool! Exciting release.
Essential AI@essential_ai

Today, we’re excited to introduce Rnj-1, @essential_ai's first open model; a world-class 8B base + instruct pair, built with scientific rigor, intentional design, and a belief that the advancement and equitable distribution of AI depend on building in the open. We bring American open source on par with the best in the world.

8 replies · 14 reposts · 169 likes · 21.4K views
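The pass@{1,2,4,8} numbers mentioned above are conventionally computed with the unbiased estimator popularized by the Codex paper: with c correct out of n generated samples, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations is correct, given c of the n
    are correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 8 generations, 2 correct:
print(pass_at_k(8, 2, 1))  # 0.25
```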
Kilian Lieret@KLieret·
Congrats to Essential AI on the strong SWE-bench numbers for an 8B model! Also very cool to see mini-swe-agent featured in the blog post!
4 replies · 0 reposts · 30 likes · 6.6K views
Devaansh Gupta reposted
Together AI@togethercompute·
Introducing Rnj-1 Instruct from @essential_ai, an open-source 8B model engineered for agentic coding and STEM tasks. AI natives can now use Rnj-1 Instruct on Together AI and benefit from reliable inference for production-scale software engineering and scientific workflows.
Together AI tweet media
3 replies · 10 reposts · 38 likes · 5.5K views
Devaansh Gupta reposted
Aleksa Gordić (水平问题)@gordic_aleksa·
Happy to share what I've been cooking over the past months as part of a stellar team at @essential_ai labs: we bring to you Rnj-1 (h/t Ramanujan), the best USA 🇺🇸 open-source LLM in the 8B category, fully pretrained, midtrained, and posttrained from scratch on zettaflops of AMD (one of the largest AMD training clusters in the world) and TPU compute!
We are releasing our base model and our post-trained checkpoint on Hugging Face, to help you squeeze the absolute most out of post-training for your particular use case.
Our initial evaluations show that Rnj-1 is very strong at code, math, and tool calling. On SWE-bench it's an order of magnitude stronger than comparably sized models: it scores 20.8% on SWE-bench Verified in bash-only mode, which is higher than Gemini 2.0 Flash and Qwen2.5-Coder 32B Instruct and on par with GPT-4o (!) under the same agent framework.
Rnj-1 was pretrained on 8.4T tokens with 8k context length, followed by 380B tokens of midtraining and a 150B-token SFT stage to get rnj-1-instruct. We used Muon as the optimizer. Tech report coming soon, but see our blog until then.
As the AI world drifts to whatever "the current thing" is (at this moment in time, that's RL), we're going back to first principles and focusing squarely on pretraining. We believe that many behaviors people assume only emerge during post-training can actually emerge during pretraining, if you cook the model the right way. :)
On a side note: being part of a very small (~20 members of technical staff), tightly knit, hard-working, extremely ambitious team that's working in the same physical space (!!!) is so fun. I was in the office until 2:30 a.m. last night pushing out our latest eval numbers, and a few of my colleagues pulled an all-nighter to help prepare for today's launch!
Due to our size, and my background, I feel I'm in a rare position (looking at the AI-lab LLM landscape as a whole) because I got to work on the whole LLM pipeline: from our infra, in-house Spark pipelines, and the data analysis engine (did I tell you to look at your f***ing data already?) to data collection/synthesis, data mixing, training experiments, and, last but not least, evals. As a bonus, getting to distill tokens from @ashVaswani on a daily basis is rewarding (we met at a small event w/ Satya earlier this year).
Conspicuously missing from our upcoming tech report are any Transformer modifications, which might come as a surprise given our team. It's all about research taste and making bets; in our case, that is pretraining and simulating program behaviors.
The easiest way to run Rnj-1:
* laptop -> llama.cpp or transformers
* your infra -> vLLM, SGLang
* IDEs and agents -> VS Code / Cursor with the Cline extension, or try Claude Code Router
Happy to see what you build with it! My DMs are open. It's a good model, sir. 🫡
32 replies · 50 reposts · 496 likes · 65.4K views