Nihit Nirmal

2.7K posts

@Nihit

CPO @ Funding Societies (SEA's largest SME digital lender) | ex-Lendingkart & PayU | Love being at the intersection of Fin. Services + Tech + Consumer Behavior

Singapore, Bangalore · Joined September 2008
256 Following · 677 Followers
Nihit Nirmal
Nihit Nirmal@Nihit·
As we started building with LLMs at Yuuki, like any other product person, I brought in the workflow that had worked for the last 15 years: spec it, build it, test it, ship it. The assumption was the same: you know what good looks like before you start testing. So we built evals to do exactly that. Score the output, check the quality, move on.

However, because LLM outputs are non-deterministic, 10-point evals created problems. My nine turned out to be my co-founder's six. 😆 The LLM judge scored a seven that matched neither of us. Two months of scoring and we couldn't tell if the product was improving or our standards were drifting. You can't test for correctness when correctness isn't fixed. A coaching response that's empathetic but doesn't push the user to act: is that good? Depends on who you ask, depends on when you ask them.

Hamel Husain's work on LLM evaluation pointed us to the fix. Binary. Yes or no. Is this response empathetic? Yes or no. No more debates about sixes and sevens.

The feedback loop when building with LLMs is extremely tight. Every binary judgment came with a line of commentary. A critique like "Gave advice before asking a single question" becomes a constraint. The next version follows it, gets reviewed again, and is refined further. The evals aren't just a testing phase. They're the loop through which the product gets defined.

And now, because everyone (product, engineering, domain experts) is judging the same outputs, the evals force alignment on what good actually means. That alignment didn't exist before. The process created it.

More on this in the next post. For now, I wrote a detailed post on how we designed our evals; link in comments.
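To make the binary approach concrete, here is a minimal sketch of how yes/no judgments with a one-line critique might be recorded and summarized. The `Judgment` schema, the reviewer names, and the helper functions are all hypothetical, invented for illustration; this is not the Yuuki implementation, only one way the idea could look in code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One reviewer's binary verdict on one response (hypothetical schema)."""
    response_id: str
    reviewer: str
    criterion: str  # e.g. "empathetic"
    passed: bool    # yes or no, never a 1-10 score
    critique: str   # one line of commentary explaining the verdict


def pass_rate(judgments, criterion):
    """Fraction of 'yes' verdicts recorded for one criterion."""
    votes = [j.passed for j in judgments if j.criterion == criterion]
    return sum(votes) / len(votes) if votes else 0.0


def disagreements(judgments, criterion):
    """Response ids where reviewers split on the same criterion."""
    verdicts_by_response = defaultdict(set)
    for j in judgments:
        if j.criterion == criterion:
            verdicts_by_response[j.response_id].add(j.passed)
    return sorted(rid for rid, v in verdicts_by_response.items() if len(v) > 1)


def constraints_from_failures(judgments):
    """Critiques attached to 'no' verdicts, deduplicated in first-seen order."""
    constraints = []
    for j in judgments:
        if not j.passed and j.critique and j.critique not in constraints:
            constraints.append(j.critique)
    return constraints


# Illustrative data only; not real review results.
judgments = [
    Judgment("r1", "product", "empathetic", True, "Acknowledged the user's frustration"),
    Judgment("r1", "engineering", "empathetic", True, "Tone was warm"),
    Judgment("r2", "product", "empathetic", False, "Gave advice before asking a single question"),
    Judgment("r2", "domain_expert", "empathetic", True, "Advice was kind in tone"),
]

rate = pass_rate(judgments, "empathetic")
split = disagreements(judgments, "empathetic")
constraints = constraints_from_failures(judgments)
```

The list of split responses shows where reviewers still need to align on what "good" means, and the collected critiques are candidates for constraints on the next version.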
Nirant
Nirant@NirantK·
unpopular opinion: if you make more than 2L/mo after taxes, it's okay to spend 10K/mo in AI tokens to get nothing done It's not like your Swiggy bill is helping you get anything done more than your grocery + cook bill
Nihit Nirmal
Nihit Nirmal@Nihit·
Funny episode of the day: I was chatting with Claude, which has a nice sense of humor. I was supposed to say "super helpful", but Claude's response was so on point that I ended up replying to a message that didn't need a reply.
Nihit Nirmal
Nihit Nirmal@Nihit·
@NirantK We were very early stage, just three of us. From start to finish, it took around 6 to 8 weeks. We used Sprinto. It cost around 7K USD (first year), including the one-time audit.
Nirant
Nirant@NirantK·
Folks, who've gone through ISO 27001, what did it cost and how long did it take?
Nihit Nirmal
Nihit Nirmal@Nihit·
Agentic Engineering: Technically Correct, Contextually Naive

Everywhere I look, teams are celebrating AI coding agents. Millions of lines generated. PRs flying. Shipping velocity through the roof. Having lived through this, let me share what it actually looks like on the other side.

After 24 months of building Yuuki and Rapida, there is a teeny tiny reality that changes how agentic engineering works. Agents are like engineers: they write a lot of code. But they are unlike engineers because they don't carry context. They start fresh every time. The code is technically correct but contextually naive.

For the first feature, that's manageable. By the tenth deploy, you're shipping code that ignores everything production taught you from the first nine. That's not speed. That's compounding tech debt at machine pace.

One of our agents at Rapida generated a webhook handler that passed every test. In production, the telephony provider retried on slow responses and the handler wasn't idempotent, which led to duplicate calls. An engineer who'd been bitten by that provider before would have built idempotency in from the start. The agent didn't have that scar tissue.

Because of our traces, we caught it within minutes and fixed it the same day. Then we encoded the lesson: idempotency constraints baked into the webhook contract. Now agents generating webhook handlers must follow that contract. The system carries the context so the agent doesn't have to.

That's the pattern: not more reviews before shipping, but a feedback loop where production teaches the system, and the system constrains the next generation. The teams shipping reliably at AI speed aren't generating the most code. They're the ones whose system learns from every deploy.

blog.rapida.ai/what-happens-a…
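For readers who haven't hit the retry problem before, here is a minimal sketch of an idempotent webhook handler. It assumes the provider sends a stable `event_id` on every retry of the same event; that field name, the `place_call` callback, and the in-memory store are all hypothetical and are not Rapida's actual webhook contract. A production version would likely keep the processed ids in a durable store rather than a dict.

```python
import threading


class IdempotentWebhookHandler:
    """Runs the side effect at most once per event id, even when deliveries repeat."""

    def __init__(self, place_call):
        self._place_call = place_call  # hypothetical side effect, e.g. starting a phone call
        self._results = {}             # event_id -> result of the first delivery
        self._lock = threading.Lock()  # guards the check-then-act below

    def handle(self, event):
        event_id = event["event_id"]   # assumption: stable across provider retries
        with self._lock:
            if event_id in self._results:
                # Retry of an event already processed: return the cached result.
                return self._results[event_id]
            result = self._place_call(event["payload"])
            self._results[event_id] = result
            return result


# Usage: a retried delivery does not trigger a second call.
calls = []


def place_call(payload):
    calls.append(payload)
    return f"call-{len(calls)}"


handler = IdempotentWebhookHandler(place_call)
event = {"event_id": "evt-1", "payload": {"to": "+10000000000"}}
first = handler.handle(event)
retry = handler.handle(event)
other = handler.handle({"event_id": "evt-2", "payload": {"to": "+10000000001"}})
```

Holding the lock while the side effect runs keeps the sketch simple but serializes all events; it is a trade-off for illustration, not a recommendation.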
Nihit Nirmal
Nihit Nirmal@Nihit·
From my personal experience with one of my relatives, what you get to know versus what's happening could be very different. Sometimes the staff would mention that the flight is canceled, but it actually takes off. Would recommend, if possible, to go to the airport and check for yourself. Can share more on DM if you want.
Prerna Modi
Prerna Modi@Prernamodi·
@Razorpay Unable to raise a ticket for reporting a fraud. Logged in. Shows past transactions. But no option to raise a ticket. Please guide how to.
Nihit Nirmal
Nihit Nirmal@Nihit·
@jiten In the past 2 weeks, I have stopped paying for Akiflow and Rosebud, replacing them with a far superior vibe-coded product I built for myself.
jiten
jiten@jiten·
The bar for the type of SAAS I buy personally has gone up (I shouldn't be able to build it in one sitting). I feel enterprise SAAS will continue to thrive but b2c SAAS will have to figure out better pricing. (3/3)
jiten
jiten@jiten·
I think I finally understand why people keep saying SAAS is dead though I keep buying more and more SAAS at workplace. (1/3)
Nihit Nirmal
Nihit Nirmal@Nihit·
@AjeyGore Can confirm. Went all-in on Claude Code CLI + Wispr Flow, cancelled my Akiflow and journaling app subscriptions. I spend all day on/in/with CC, with practically zero friction and insane customisation.
Ajey Gore
Ajey Gore@AjeyGore·
The long tail of use cases hasn't even been tapped. Tech bros are having all the fun, but Cowork and Antigravity agent manager will change that and give more access to non-techies.
Nihit Nirmal
Nihit Nirmal@Nihit·
@kapoor_riddhi Thanks for normalising this for me. It's been a pain to get my money out. Have been at it for a few months now.
Riddhi Kapoor
Riddhi Kapoor@kapoor_riddhi·
EPF site, OTP valid for 5 mins.. Page loads in 10 mins! #JaiHo
Nihit Nirmal
Nihit Nirmal@Nihit·
@trq212 @petergyang Peter's issue with Playwright is probably output bloat: the scraped website content is massive. Different problem.
Peter Yang
Peter Yang@petergyang·
I think the best MCP is probably Playwright MCP but had to turn it off too. 200K token context window makes most MCPs unusable imo.
Nihit Nirmal
Nihit Nirmal@Nihit·
@AjeyGore And thanks for saying you are scattered too. Otherwise, I was feeling lonely this Saturday noon fixing my mess. lol
Ajey Gore
Ajey Gore@AjeyGore·
What’s the work stack people are settling on? Notion, NotebookLM, Obsidian? Plus something else? I am now scattered all over….
Nihit Nirmal
Nihit Nirmal@Nihit·
@AjeyGore I use Obsidian and Claude Code extensively now. For any research, I use Manus / Gemini and then bring those markdown files to Obsidian for all project work.
Nihit Nirmal
Nihit Nirmal@Nihit·
I hear your poetic phrase and raise you a Sahir Ludhianvi (Main pal do pal ka shayar hoon): Woh bhi ek pal ka kissa they, Main bhi ek pal ka kissa hoon. Grateful our paths crossed in that one pal, Ajey.
Nihit Nirmal
Nihit Nirmal@Nihit·
Somewhere along the way I realized: the founder job is less "visionary" and more "compulsive loop-closer." Hear pain. Build. Sell. Learn. Repeat. When shutting down felt like failure, building was what kept me moving. I'm deep in AI + enterprise now. If that's your world, let's talk.
Nihit Nirmal
Nihit Nirmal@Nihit·
When I started Yuuki, I needed someone else to code and someone else to sell. Two years later, I do both. I used to think sales was performative. Turns out the good version is the opposite, sitting with buyers long enough to hear what they're not saying.
Nihit Nirmal
Nihit Nirmal@Nihit·
I shut down my startup last month. Returned capital instead of dragging it out. I'm supposed to write the gratitude post now. The journey was the destination, I'd do it all again. I'm not going to write that post. Here's what actually happened. 🧵