
Utsav Khandelwal
133 posts

On the morning of July 8, 2025, we observed undesired responses and immediately began investigating. To identify the specific language in the instructions causing the undesired behavior, we conducted multiple ablations and experiments to pinpoint the main culprits. We identified the operative lines responsible as:

* “You tell it like it is and you are not afraid to offend people who are politically correct.”
* “Understand the tone, context and language of the post. Reflect that in your response.”
* “Reply to the post just like a human, keep it engaging, dont repeat the information which is already present in the original post.”

These operative lines had the following undesired results:

* They undesirably steered the @grok functionality to ignore its core values in certain circumstances in order to make the response engaging to the user. Specifically, certain user prompts could produce responses containing unethical or controversial opinions intended to engage the user.
* They undesirably caused the @grok functionality to reinforce any previously user-triggered leanings, including any hate speech in the same X thread.
* In particular, the instruction to “follow the tone and context” of the X user undesirably caused the @grok functionality to prioritize adhering to prior posts in the thread, including any unsavory posts, as opposed to responding responsibly or refusing to respond to unsavory requests.
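The ablation process described above — removing candidate instruction lines one at a time and re-measuring the rate of undesired responses — can be sketched as a minimal harness. This is purely illustrative: every name here (`ablate_lines`, `run_model`, `judge`) is a hypothetical stand-in, not xAI's actual tooling.

```python
def ablate_lines(instruction_lines, probe_prompts, run_model, judge):
    """Attribute undesired behavior to individual system-prompt lines.

    run_model(prompt, probe) returns a model response (hypothetical callable);
    judge(response) returns True if the response exhibits undesired behavior.
    Returns a dict mapping each line to the drop in the undesired-response
    rate when that line is removed: large positive deltas flag culprit lines.
    """
    def undesired_rate(lines):
        prompt = "\n".join(lines)
        flagged = sum(judge(run_model(prompt, p)) for p in probe_prompts)
        return flagged / len(probe_prompts)

    baseline = undesired_rate(instruction_lines)
    deltas = {}
    for i, line in enumerate(instruction_lines):
        ablated = instruction_lines[:i] + instruction_lines[i + 1:]
        # Positive delta = removing this line reduced undesired responses.
        deltas[line] = baseline - undesired_rate(ablated)
    return deltas
```

In practice the judge would itself be a model-based or human evaluation, and probes would be drawn from the real traffic that triggered the incident; the one-line-at-a-time loop is the simplest design and misses interactions between lines, which is why multiple ablation configurations are usually run.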

🚀 AI Evals: Your Key to Building Trustworthy AI Agents! 🚀

AI agents are everywhere, from support automation to travel booking assistants. But here’s the catch: building them is easy; making them work reliably in the real world is hard.

At Maxim AI, we believe evals are the backbone of high-quality AI products. We’ve just released a detailed guide to help you master agent evaluations. What’s inside? 👇

✅ Evaluate agents – combining human and auto-evals, node-level to session-level, and balancing quality with efficiency.
✅ Test agents in the right context – using realistic, task-specific, and user-representative scenarios.
✅ Build a continuous evaluation loop – turning testing from a checklist into an ongoing feedback system.
✅ Use online and offline evals as a product accelerant – helping teams ship faster without sacrificing product taste.

Whether you’re building LLM-based support automation or a complex multi-agent system, evals are your secret weapon to ship quality, fast. Don’t build blindly. Evaluate, iterate, and win user love.

👉 Grab your copy here: getmax.im/evals

Let’s make better AI, together.

#AI #AIAgents #AgentEvaluation #MaximAI #AgentQuality #AIEvals
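The offline-eval idea above can be sketched as a minimal gate: run the agent over representative scenarios, auto-score each response, and fail the release if the pass rate drops below a threshold. All names here (`Scenario`, `run_offline_evals`) are illustrative assumptions, not Maxim AI's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """One realistic, task-specific test case for an agent."""
    name: str
    user_input: str
    check: Callable[[str], bool]  # auto-eval: did the response pass?

def run_offline_evals(agent, scenarios, pass_threshold=0.9):
    """Run the agent over all scenarios and gate on the pass rate.

    agent is any callable mapping user input to a response string.
    Returns (pass_rate, failed_scenario_names, gate_passed).
    """
    failures = [s.name for s in scenarios
                if not s.check(agent(s.user_input))]
    pass_rate = 1 - len(failures) / len(scenarios)
    return pass_rate, failures, pass_rate >= pass_threshold
```

Wiring a gate like this into CI is what turns testing "from a checklist into an ongoing feedback system": every failed scenario name feeds the next iteration of prompts or tools, and the same harness can be re-pointed at sampled live traffic for online evals.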