NeoSigma

34 posts

@NeoSigmaAI

Closing the feedback loop in production.

San Francisco, CA · Joined October 2025
2 following · 570 followers
Pinned tweet
NeoSigma @NeoSigmaAI
The next era of AI engineering is self-improving agentic systems! Really excited to share what we are building at NeoSigma! Self-maintaining agent systems represent a shift in how we build and operate software. We at NeoSigma are building the infrastructure to support this feedback loop in real-world systems, helping teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior.
Gauri Gupta @gauri__gupta

We @neosigmaai @RitvikKapila are building the future of self-improving AI systems! By closing the feedback loop between production data and system improvements, we help teams capture failures, convert them into structured evaluation signals, and use them to drive continuous improvements in agent behavior. We show how our system works on Tau3 bench across retail, telecom, and airline domains. Agent performance on the validation set (with a fixed underlying model, GPT5.4) improves from 0.56 → 0.78 (a ~40% relative jump in accuracy).

NeoSigma retweeted
Srijan Bansal @SrijanBansal1
Self-improving agents are coming sooner than expected. Interesting read.
NeoSigma retweeted
Shagunn @shagunuppls
Your agent's failures are not bugs to squash. They're training data for your next improvement cycle. The teams that internalize this will compound faster than everyone else.
NeoSigma retweeted
Shreya Sharma @shreya__verse
An uncomfortable truth: the teams that win won’t be the ones with the smartest agents to begin with; they’ll be the ones whose agents fail better, learn faster, and never regress. Kudos to the @NeoSigmaAI team for building toward these self-improving agents! 🚀🚀 @gauri__gupta @RitvikKapila 🎉
NeoSigma retweeted
Dvij Kalaria @DvijKalaria
Amazing! How does it compare against auto-prompt optimization techniques like GEPA? @LakshyAAAgrawal
NeoSigma retweeted
Mehul Agarwal @meh_agarwal
Self-improving AI systems are the natural evolution of codegen. Soon people will stop deploying code that doesn’t fix itself. So proud of my sister @gauri__gupta for moving the needle with @NeoSigmaAI.
NeoSigma retweeted
Varsha Rao @varsharao
The most expensive thing in agent systems isn't compute - it's engineers manually debugging failures that keep coming back. Automating that loop is a massive unlock. Great work @NeoSigmaAI team!
NeoSigma retweeted
Ritvik Kapila @RitvikKapila
@Vtrivedy10 Agent harness engineering is the future. The result: a harness that evolves reliably and faster than any manual loop, leveraging far more context, running more experiments, and validating them all in parallel. Check out our work on harness optimization: x.com/gauri__gupta/s…
NeoSigma retweeted
Tanmay Agarwal @tanmayagarwal98
when production failures become evals, evals become experiments, and experiments become reliably better agent behavior, you get a system that compounds. the biggest ai companies of next decade will be the ones with the best self improvement loops. excited for what @gauri__gupta @RitvikKapila are building at @NeoSigmaAI
NeoSigma retweeted
Sriraam @27upon2
Continual learning and continuously improving systems are already here
NeoSigma retweeted
Anish Madan @anishmadan23
Writing code is only a starting point, but how do you maintain code in production-level systems and keep improving continually? Check out @NeoSigmaAI tackling this important gap with self-improving AI systems. Big congrats on the launch @gauri__gupta @RitvikKapila 🚀🚀
NeoSigma retweeted
Somanshu Singla @ssingla17
Amazing work by @gauri__gupta @RitvikKapila! Self-improving AI systems are the future.
NeoSigma retweeted
Ritvik Kapila @RitvikKapila
@yoonholeee @roshen_nair @qizhengz_alex @Kangwook_Lee @lateinteraction @chelseabfinn Love this! We've been working on automating harness engineering at @NeoSigmaAI. We see a 40% jump on Tau3 bench with GPT5.4. What do you think about generalizability across models? We see most gains carry over, but upgrades still surface new failures. x.com/gauri__gupta/s…
NeoSigma retweeted
Maz @0xmaz_
legit think the next trillion dollar opportunity will come from self improving ais @gauri__gupta and @RitvikKapila are super cracked devs this is a new project ill be keeping an eye on
NeoSigma retweeted
renjie pi @RenjiePi
Very interesting work! Researchers keep finding new ways to have AI replace us 😆
NeoSigma retweeted
Santanu Bhattacharya @SantanuB01
Watching former students build what the rest of us were only theorizing about is a researcher's best reward. @gauri__gupta and @RitvikKapila of @NeoSigmaAI - you made my day! We spent years getting AI to write code. The harder, more valuable problem is getting AI to close the loop between customers, products, and systems — in real time. That's the next frontier, and they're building it. Congratulations and Godspeed! 🚀
NeoSigma retweeted
Gauri Gupta @gauri__gupta
1. Right now the system works best when there is a way to verify outcomes, whether that's automated evals, user feedback signals, or success/failure signals from production traces. That said, our system can also generate its own evals from production failures. In this experiment, too, we start with an empty regression set that grows as failures are fixed.
2. We guard against overfitting through the regression gate. Every proposed harness change has to pass the new evals and must not regress on previously fixed failures. We also continuously refresh the eval set from live production data and retire stale evals, so the harness is always optimizing against a moving, representative target rather than a fixed set.
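The regression gate described in this reply can be sketched roughly as follows. This is a minimal illustration and not NeoSigma's implementation: the `RegressionGate` class, the `Harness` and `Eval` types, and the toy evals are all hypothetical names, and refreshing or retiring stale evals is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins: a harness is a plain config dict, and an eval is
# any function that returns True when the harness handles a case correctly.
Harness = Dict[str, str]
Eval = Callable[[Harness], bool]


@dataclass
class RegressionGate:
    # Starts empty and grows as failures are fixed.
    regression_set: List[Eval] = field(default_factory=list)

    def accepts(self, candidate: Harness, new_evals: List[Eval]) -> bool:
        """A candidate must pass the new evals and every previously fixed case."""
        passes_new = all(check(candidate) for check in new_evals)
        no_regression = all(check(candidate) for check in self.regression_set)
        return passes_new and no_regression

    def try_update(self, current: Harness, candidate: Harness,
                   new_evals: List[Eval]) -> Harness:
        """Keep the candidate only if it clears the gate; otherwise keep current."""
        if self.accepts(candidate, new_evals):
            self.regression_set.extend(new_evals)  # fixed failures become regressions
            return candidate
        return current


# Toy evals (assumptions for illustration): each checks for a keyword in the prompt.
def handles_refunds(h: Harness) -> bool:
    return "refund" in h.get("prompt", "")


def handles_roaming(h: Harness) -> bool:
    return "roaming" in h.get("prompt", "")


gate = RegressionGate()
base = {"prompt": "You are a support agent."}

# First change fixes the refund failure, so it is accepted.
v1 = {"prompt": base["prompt"] + " Follow the refund policy."}
current = gate.try_update(base, v1, [handles_refunds])

# Second change fixes roaming but drops the refund rule, so it is rejected.
v2 = {"prompt": "You are a support agent. Check roaming status."}
current = gate.try_update(current, v2, [handles_roaming])
```

In this sketch the second candidate is rejected even though it passes its own new eval, because it regresses on the refund case that was fixed earlier.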
NeoSigma retweeted
Andrew Sohrabi @AndrewSohrabi
I have had to architect a similar approach to my own agent development by running evals + regression gates and keeping only helpful changes to the harness over time. cool to see a product focused on this. nice work @gauri__gupta!
NeoSigma retweeted
simon @disiok
recursive improvement is here
NeoSigma retweeted