NeoSigma

34 posts

@NeoSigmaAI

Closing the feedback loop in production.

San Francisco, CA · Joined October 2025
2 Following · 571 Followers
Pinned Tweet
NeoSigma @NeoSigmaAI
The next era of AI engineering is self-improving agentic systems! Really excited to share what we are building at NeoSigma! Self-maintaining agent systems represent a shift in how we build and operate software. We at NeoSigma are building the infrastructure to support this feedback loop in real-world systems, helping teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior.
Gauri Gupta @gauri__gupta

We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (~40% jump in accuracy).

NeoSigma reposted
Srijan Bansal @SrijanBansal1
Self improving agents are coming sooner than expected. Interesting read
NeoSigma reposted
Shagunn @shagunuppls
Your agent's failures are not bugs to squash. They're training data for your next improvement cycle. The teams that internalize this will compound faster than everyone else.
NeoSigma reposted
Shreya Sharma @shreya__verse
An uncomfortable truth: the teams that win won’t be the ones with the smartest agents to begin with, they’ll be the ones whose agents fail better, learn faster, and never regress. Kudos to @NeoSigmaAI team for building toward these self-improving agents!! 🚀🚀 @gauri__gupta @RitvikKapila 🎉
NeoSigma reposted
Dvij Kalaria @DvijKalaria
Amazing! How does it compare against auto-prompt optimization techniques like GEPA? @LakshyAAAgrawal
NeoSigma reposted
Mehul Agarwal @meh_agarwal
Self improving AI systems are the natural evolution of codegen. Soon people will stop deploying code that doesn’t fix itself. So proud of my sister @gauri__gupta for moving the needle with @NeoSigmaAI
NeoSigma reposted
Varsha Rao @varsharao
The most expensive thing in agent systems isn't compute - it's engineers manually debugging failures that keep coming back. Automating that loop is a massive unlock. Great work @NeoSigmaAI team!
NeoSigma reposted
Ritvik Kapila @RitvikKapila
@Vtrivedy10 Agent harness engineering is the future. The result: a harness that evolves reliably and faster than any manual loop, leveraging far more context, running more experiments, and validating them all in parallel. Check out our work on harness optimization: x.com/gauri__gupta/s…
NeoSigma reposted
Tanmay Agarwal @tanmayagarwal98
when production failures become evals, evals become experiments, and experiments become reliably better agent behavior, you get a system that compounds. the biggest ai companies of next decade will be the ones with the best self improvement loops. excited for what @gauri__gupta @RitvikKapila are building at @NeoSigmaAI
NeoSigma reposted
Sriraam @27upon2
Continual learning and continuously improving systems are already here
NeoSigma reposted
Anish Madan @anishmadan23
Writing code is only a starting point, but how do you maintain code in production level systems and keep improving continually? Check out @NeoSigmaAI tackling this important gap with self improving AI systems. Big congrats on the launch @gauri__gupta @RitvikKapila 🚀🚀
NeoSigma reposted
Somanshu Singla @ssingla17
Amazing work by @gauri__gupta @RitvikKapila ! Self-improving AI systems are the future
NeoSigma reposted
Ritvik Kapila @RitvikKapila
@yoonholeee @roshen_nair @qizhengz_alex @Kangwook_Lee @lateinteraction @chelseabfinn Love this! We've been working on automating harness engineering at @NeoSigmaAI . We see a 40% jump on Tau3 bench with GPT5.4. What do you think about generalizability across models? We see most gains carry over, but upgrades still surface new failures. x.com/gauri__gupta/s…
NeoSigma reposted
Maz @0xmaz_
legit think the next trillion dollar opportunity will come from self improving ais @gauri__gupta and @RitvikKapila are super cracked devs this is a new project ill be keeping an eye on
NeoSigma reposted
renjie pi @RenjiePi
Very interesting work! Researchers keep finding new ways to have AI replace ourselves😆
NeoSigma reposted
Santanu Bhattacharya @SantanuB01
Watching former students build what the rest of us were only theorizing about is a researcher's best reward. @gauri__gupta and @RitvikKapila of @NeoSigmaAI - you made my day! We spent years getting AI to write code. The harder, more valuable problem is getting AI to close the loop between customers, products, and systems — in real time. That's the next frontier, and they're building it. Congratulations and Godspeed! 🚀
NeoSigma reposted
Gauri Gupta @gauri__gupta
1. Right now the system works best when there is a way to verify outcomes, whether that's automated evals, user feedback signals, or success/failure signals from production traces. That said, our system can also generate its own evals from production failures. In this experiment, too, we start with an empty regression set that grows as failures are fixed.
2. We guard against overfitting through the regression gate. Every proposed harness change has to pass the new evals and must not regress on previously fixed failures. We also continuously refresh the eval set from live production data and retire stale evals, so the harness is always optimizing against a moving, representative target rather than a fixed set.
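The regression gate described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not NeoSigma's implementation: the names `RegressionGate`, `accepts`, and `try_update`, the representation of a harness as a plain dict, and the two toy evals are all hypothetical. Refreshing the eval set from production data and retiring stale evals are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "harness" is a dict of settings, and an "eval"
# is any function that returns True when the harness passes it.
Harness = Dict[str, bool]
Eval = Callable[[Harness], bool]


@dataclass
class RegressionGate:
    # The regression set starts empty and grows as failures are fixed.
    regression_set: List[Eval] = field(default_factory=list)

    def accepts(self, candidate: Harness, new_evals: List[Eval]) -> bool:
        """A candidate passes only if it fixes the new failures
        and does not regress on any previously fixed failure."""
        passes_new = all(ev(candidate) for ev in new_evals)
        keeps_old = all(ev(candidate) for ev in self.regression_set)
        return passes_new and keeps_old

    def try_update(self, current: Harness, candidate: Harness,
                   new_evals: List[Eval]) -> Harness:
        """Return the candidate if the gate accepts it, else keep the current harness."""
        if self.accepts(candidate, new_evals):
            # Accepted fixes become part of the regression set.
            self.regression_set.extend(new_evals)
            return candidate
        return current


# Toy evals (assumed for illustration): each checks one harness setting.
def refund_eval(h: Harness) -> bool:
    return h.get("refund_policy_in_prompt", False)


def greeting_eval(h: Harness) -> bool:
    return h.get("greets_user", False)


gate = RegressionGate()
harness: Harness = {}

# First change fixes the refund failure; the regression set is empty, so it is accepted.
harness = gate.try_update(harness, {"refund_policy_in_prompt": True}, [refund_eval])

# Second change fixes greetings but drops the refund fix, so the gate rejects it.
after_bad = gate.try_update(harness, {"greets_user": True}, [greeting_eval])

# Third change fixes greetings and keeps the refund fix, so it is accepted.
after_good = gate.try_update(
    harness, {"refund_policy_in_prompt": True, "greets_user": True}, [greeting_eval]
)
```

In this sketch a rejected candidate leaves both the harness and the regression set unchanged, so only changes that help without breaking earlier fixes accumulate over time.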
NeoSigma reposted
Andrew Sohrabi @AndrewSohrabi
I have had to architect a similar approach to my own agent development by running evals + regression gates and keeping only helpful changes to the harness over time. cool to see a product focused on this. nice work @gauri__gupta!
NeoSigma reposted
simon @disiok
recursive improvement is here
NeoSigma reposted