
shubham
@ShubhamInTech
infra for self-improving agents • https://t.co/Fq3PewMpH3 • backed by @join_ef • founding engg @formbricks • youngest engg at cisco analytics

The co-founder of a $1B VoiceAI infra company says you can't test agents the same way you test traditional software.

According to @dsa, you have to test these things almost like you test human beings: college degrees, resumes, job interviews, reference checks, etc. What you're really trying to do is build statistical confidence that the person you're hiring can do the task with 99% precision/confidence.

You have to test agents the same way, by running thousands of end-to-end simulations. Permute accent, language, system prompt, instructions, etc., and see how the agent performs against the success criteria spit out by your observability stack.

This way, you're building confidence, deploying, and observing which bugs/issues require tweaking. Take those back and make changes to the agent code. Test again, run simulations, and make sure there's no regression. Then deploy again, scale, and observe. That's how the loop goes.
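The permute-and-simulate loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not any real framework's API: `run_simulation` is a stub standing in for a full end-to-end agent run scored by your observability stack, and the parameter lists and 99% threshold are placeholders.

```python
import itertools
import random

# Hypothetical names throughout; a real harness would drive the voice
# agent through scripted conversations and score each run against the
# success criteria emitted by the observability stack.
ACCENTS = ["us", "uk", "indian"]
LANGUAGES = ["en", "es"]
SYSTEM_PROMPTS = ["strict", "friendly"]
SUCCESS_THRESHOLD = 0.99  # the 99% precision/confidence bar from the post

def run_simulation(accent, language, system_prompt, trials=100):
    """Stub: fake a pass rate for one permutation over many trials."""
    rng = random.Random(hash((accent, language, system_prompt)))
    passes = sum(rng.random() < 0.995 for _ in range(trials))
    return passes / trials

def evaluate_agent(trials_per_permutation=100):
    """Run every permutation and flag the ones that miss the bar."""
    results = {}
    for accent, lang, prompt in itertools.product(
        ACCENTS, LANGUAGES, SYSTEM_PROMPTS
    ):
        rate = run_simulation(accent, lang, prompt, trials_per_permutation)
        results[(accent, lang, prompt)] = rate
    failing = {k: v for k, v in results.items() if v < SUCCESS_THRESHOLD}
    return results, failing

results, failing = evaluate_agent()
print(f"{len(results)} permutations, {len(failing)} below threshold")
```

The regression half of the loop falls out of the same structure: keep the `results` dict from the last release as a baseline, and after each agent-code change assert that no permutation's pass rate dropped below it before deploying again.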
