Devaansh Gupta

36 posts

@DevaanshGupta1

Pre/post-training @ Essential AI

Joined December 2021
94 Following · 157 Followers
Devaansh Gupta reposted
Gauri Gupta@gauri__gupta·
We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).
45 replies · 43 reposts · 254 likes · 88.9K views
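The "~40% jump" above is a relative gain over the 0.56 baseline rather than an absolute one; a quick arithmetic check (my own back-of-envelope, not from the thread):

```python
baseline, improved = 0.56, 0.78
absolute_gain = improved - baseline       # 0.22 accuracy points
relative_gain = absolute_gain / baseline  # ~0.39, i.e. the "~40% jump"
print(f"{relative_gain:.1%}")             # 39.3%
```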
Devaansh Gupta reposted
Yash Jain@_jainyash·
Surprisingly, current TTA models fail to distinguish between "generate a sound for a train 10 meters away" and "a train 500 meters away". Presenting our ICLR 2026 work, Aurelius: a synthetic data generation pipeline to enhance relation-aware TTA generation. 🚀
Yuhang He (Henry)@HenryOxplore

🚀 [ICLR 2026] Existing text-to-audio generation (TTA) methods mainly focus on semantic correctness, yet they perform very poorly on relation-aware TTA generation. For example, current models achieve <30% audio event presence accuracy and <10% relation accuracy.
In our newly accepted ICLR 2026 paper, we introduce Aurelius, a framework that enables relation-aware TTA research at scale. Specifically, we introduce two meticulously curated corpora:
🗂 AudioEventSet: 110 audio events across 7 major classes.
🗂 AudioRelSet: 100 relations across 6 major relation types.
Based on the two corpora and the proposed data creation strategy, we can create massive (nearly unlimited) pairs with both
• high linguistic diversity
• high acoustic diversity
We release all resources to support the broader community in AI, acoustics, computer vision, and multimodal research.
📄 Paper: openreview.net/pdf?id=LAYCYiI…
🗂 Dataset: huggingface.co/datasets/yuhan…
💻 Code: github.com/yuhanghe01/Aur…
🌐 Project Page: yuhanghe01.github.io/Aurelius-Proj/
Huge thanks to Andrew Markham, He Liang, @_jainyash and @VibhavVineet at Microsoft Research and University of Oxford for their unwavering support. #ICLR2026 #Multimodality

1 reply · 1 repost · 8 likes · 313 views
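The "nearly unlimited pairs" claim follows from crossing the two corpora. As a rough illustration (my own back-of-envelope; the paper's actual pairing rules may differ), one relation between two distinct events already yields over a million triples before any linguistic paraphrasing:

```python
n_events, n_relations = 110, 100  # AudioEventSet and AudioRelSet sizes
# One (event, relation, event) triple per caption, e.g.
# "a train passing, to the left of a barking dog" (hypothetical example)
triples = n_events * n_relations * (n_events - 1)  # exclude pairing an event with itself
print(f"{triples:,}")  # 1,199,000
```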
Devaansh Gupta reposted
Siyan Zhao@siyan_zhao·
Introducing 💡On-Policy Self-Distillation💡, a simple method that enables an LLM to teach itself with dense per-token feedback on its own on-policy generations, achieving 4-8x better token efficiency than GRPO and outperforming both GRPO and SFT/off-policy distillation.
Key insight: like a student reviewing solutions, rationalizing them, and correcting prior mistakes, an LLM can be conditioned on privileged info (e.g., a correct solution or a reasoning trace) and supervise its weaker self, the version without such access, by matching the distribution it induces under that privileged info. 🌐Blog: siyan-zhao.github.io/blog/2026/opsd/ 🧵👇
31 replies · 158 reposts · 922 likes · 132.9K views
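The dense per-token signal can be pictured as a per-position KL between the same model with and without privileged conditioning. A minimal pure-Python sketch of that loss shape (all names and the toy logits are mine; see the linked blog for the actual objective):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def per_token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at each position: dense per-token feedback,
    in contrast to a single scalar reward per rollout as in GRPO."""
    kls = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        p, q = softmax(t_row), softmax(s_row)
        kls.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)))
    return kls

# Toy check: when the student matches the teacher, per-token KL vanishes.
logits = [[0.1, 2.0, -1.0], [1.5, 0.0, 0.5]]
print(per_token_kl(logits, logits))  # [0.0, 0.0]
```

Every position contributes its own gradient signal here, which is the intuition behind the 4-8x token-efficiency claim versus one scalar reward per rollout.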
Devaansh Gupta reposted
Daniel Israel@danielmisrael·
"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive generation and an order of magnitude faster than standard diffusion, while staying within 0.9-5% of AR quality.
7 replies · 47 reposts · 321 likes · 38.6K views
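The control flow the tweet describes, a short sequential planning pass followed by parallel span expansion, can be sketched with stubs (the plan format and generators are placeholders of mine, not the paper's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def plan(prompt):
    # A cheap sequential "planning" pass splits the answer into
    # independent spans (stubbed here as three fixed sections).
    return [f"{prompt}: section {i}" for i in range(3)]

def fill_span(span_spec):
    # Each span is expanded independently, so a diffusion LLM can
    # denoise all of them at once (stubbed as uppercasing).
    return span_spec.upper()

def planned_generation(prompt):
    spans = plan(prompt)                # sequential, short
    with ThreadPoolExecutor() as pool:  # parallel expansion
        return list(pool.map(fill_span, spans))

print(planned_generation("outline"))
# ['OUTLINE: SECTION 0', 'OUTLINE: SECTION 1', 'OUTLINE: SECTION 2']
```

The speedup comes from the expansion step dominating total length: only the short plan is generated token-by-token.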
Devaansh Gupta reposted
Essential AI@essential_ai·
[1/5] We thank the community for their feedback on Rnj-1. We’d like to announce a few updates to Rnj-1-instruct based on what we heard:
- Resolved premature truncation of generations and improved instruction following.
- Instructions for 128k context-length extrapolation.
- Updated evals, baselines, and model generations for reproducibility.
Details follow 🧵
1 reply · 5 reposts · 39 likes · 15.6K views
Nathan Lambert@natolambert·
Open models year in review. What a year! We're back with an updated open model builder tier list, our top models of the year, and our predictions for 2026.
First, the winning models:
1. DeepSeek R1 (@deepseek_ai): Transformed the AI world.
2. Qwen 3 Family (@AlibabaGroup): The new default open models.
3. Kimi K2 Family (@Kimi_Moonshot): Models that convinced the world that DeepSeek wasn't special and China would produce numerous leading models.
Runner-up models: MiniMax M2 (@minimax_ai), GLM 4.5 (@Zai_org), GPT-OSS (@OpenAI), Gemma 3 (@GoogleAI), Olmo 3 (@allen_ai).
Honorable mentions: Nvidia's (@nvidia) Parakeet speech-to-text model & Nemotron 2 LLM, Moondream 3 VLM (@moondreamai), Granite 4 LLMs (@IBMResearch), and HuggingFace's (@huggingface) SmolLM3.
Updated tier list:
Frontier open labs: DeepSeek (@deepseek_ai), Qwen (@AlibabaGroup), and Kimi Moonshot (@Kimi_Moonshot).
Close behind: Z.ai (@Zai_org) & MiniMax AI (@minimax_ai) (notably, none from the U.S. here and up).
Noteworthy (a mix of US & China): StepFun AI (@StepFun_ai), Ant Group's (@AntGroup / @TheInclusionAI) Inclusion AI, Meituan (@Meituan_LongCat), Tencent (@TencentHunyuan), IBM (@IBMResearch), Nvidia (@nvidia), Google (@GoogleAI), & Mistral (@MistralAI).
Then a bunch more below that, which we detail.
Predictions for 2026:
1. Scaling will continue with open models.
2. No substantive changes in the open model safety narrative.
3. Participation will continue to grow.
4. Ongoing general trends will continue w/ MoEs, hybrid attention, dense for fine-tuning.
5. The open and closed frontier gap will stay roughly the same on any public benchmarks.
6. No Llama-branded open model releases from Meta in 2026.
Read the full post on @interconnectsai -- link below.
69 replies · 262 reposts · 1.5K likes · 351.5K views
Devaansh Gupta reposted
Sergio Paniego@SergioPaniego·
NEW: @essential_ai just released Rnj-1, their first 8B model. You can easily fine-tune it with GRPO using TRL to add reasoning capabilities to a compact model. Free Colab link below.
1 reply · 12 reposts · 70 likes · 5.8K views
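TRL's GRPOTrainer accepts custom reward functions that score each completion in a batch. A toy sketch in that shape (this format-reward is my illustration, not the Colab's actual reward):

```python
import re

def format_reward(completions, **kwargs):
    """GRPO reward function in TRL's style: takes a batch of completions
    and returns one scalar reward per completion. This toy version rewards
    answers that wrap their reasoning in <think>...</think> tags."""
    pattern = re.compile(r"<think>.*?</think>", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

print(format_reward(["<think>2+2=4</think> 4", "just 4"]))  # [1.0, 0.0]
```

A function like this would be passed as `reward_funcs` to `GRPOTrainer`; GRPO then normalizes these rewards within each group of sampled completions to form the advantage.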
Devaansh Gupta reposted
Omar Khattab@lateinteraction·
> We deliberately kept post-training limited to allow for further specialization by the community. As an indicator of the untapped potential of the models we report pass@{1,2,4,8} for hard codegen, agentic, and math benchmarks. Cool! Exciting release.
Essential AI@essential_ai

Today, we’re excited to introduce Rnj-1, @essential_ai's first open model; a world-class 8B base + instruct pair, built with scientific rigor, intentional design, and a belief that the advancement and equitable distribution of AI depend on building in the open. We bring American open source on par with the best in the world.

8 replies · 14 reposts · 169 likes · 21.4K views
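The pass@{1,2,4,8} numbers mentioned above are conventionally computed with the unbiased estimator popularized by the Codex paper: with c correct out of n generated samples, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples drawn
    without replacement from n generations is correct, given c of the n
    are correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 8 generations, 2 correct:
print(pass_at_k(8, 2, 1))  # 0.25
```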
Kilian Lieret@KLieret·
Congrats to Essential AI on the strong SWE-bench numbers for an 8B model! Also very cool to see mini-swe-agent featured in the blog post!
4 replies · 0 reposts · 30 likes · 6.6K views
Devaansh Gupta reposted
Together AI@togethercompute·
Introducing Rnj-1 Instruct from @essential_ai, an open-source 8B model engineered for agentic coding and STEM tasks. AI natives can now use Rnj-1 Instruct on Together AI and benefit from reliable inference for production-scale software engineering and scientific workflows.
Together AI tweet media
3 replies · 10 reposts · 38 likes · 5.5K views
Devaansh Gupta reposted
Aleksa Gordić (水平问题)@gordic_aleksa·
Happy to share what I've been cooking over the past months as part of a stellar team at @essential_ai labs: we bring to you Rnj-1 (h/t Ramanujan), the best USA 🇺🇸 open-source LLM in the 8B category, fully pretrained, midtrained, and posttrained from scratch on zettaflops of AMD (one of the largest AMD training clusters in the world) and TPU compute!
We are releasing our base model and our post-trained checkpoint on Hugging Face, to help you squeeze the absolute most out of post-training for your particular use case.
Our initial evaluations show that Rnj-1 is very strong at code, math, and tool calling. On SWE-bench it's an order of magnitude stronger than comparably sized models: it scores 20.8% on SWE-bench Verified in bash-only mode, which is higher than Gemini 2.0 Flash and Qwen2.5-Coder 32B Instruct and on par with GPT-4o (!) under the same agent framework.
Rnj-1 was pretrained on 8.4T tokens with 8k context length, followed by 380B tokens of midtraining and a 150B-token SFT stage to get rnj-1-instruct. We used Muon as the optimizer. Tech report coming soon, but see our blog until then.
As the AI world drifts to whatever "the current thing" is (at this moment in time, that's RL), we're going back to first principles and focusing squarely on pretraining. We believe that many behaviors people assume only emerge during post-training can actually emerge during pretraining, if you cook the model the right way. :)
On a side note: being part of a very small (~20 members of technical staff), tightly knit, hard-working, extremely ambitious team that's working in the same physical space (!!!) is so fun. I was in the office until 2:30 a.m. last night pushing out our latest eval numbers, and a few of my colleagues pulled an all-nighter to help prepare for today's launch!
Due to our size, and my background, I feel I'm in a rare position (looking at the AI-lab LLM landscape as a whole) because I got to work on the whole LLM pipeline: from our infra, in-house Spark pipelines, and the data analysis engine (did I tell you to look at your f***ing data already?) to data collection/synthesis, data mixing, training experiments, and, last but not least, evals. As a bonus, getting to distill tokens from @ashVaswani on a daily basis is rewarding (we met at a small event w/ Satya earlier this year).
Conspicuously missing from our upcoming tech report are any Transformer modifications, which might come as a surprise given our team. It's all about research taste and making bets; in our case, that is pretraining and simulating program behaviors.
The easiest way to run Rnj-1:
* laptop -> llama.cpp or transformers
* your infra -> vLLM, SGLang
* IDEs and agents -> VS Code / Cursor with the Cline extension, or try Claude Code Router
Happy to see what you build with it! My DMs are open. It's a good model, sir. 🫡
32 replies · 50 reposts · 496 likes · 65.4K views