Paul Kuruvilla

2.6K posts

@RohitPaulK

Building @codecraftersio

localhost Joined January 2013

434 Following 1.9K Followers

Pinned Tweet

Paul Kuruvilla@RohitPaulK·31 Jul

Started actively working on the next codecrafters.io challenge today: "Build your own SQLite". I've got the stages planned out, now onto building the test cases and trying the challenge out myself. Will post updates here as I go.

CodeCrafters@codecraftersio

Announcing our next challenge: "Build your own SQLite". In this challenge, you'll build a barebones SQLite implementation that supports basic SQL commands like SELECT/INSERT. Along the way we'll learn about SQLite's file format, how indexed data is stored in B-trees and more.

Paul Kuruvilla@RohitPaulK·23h

@jarredsumner @typedfemale If you haven't tried it yet, Nurri has a similar ratio and tastes far better imo (vanilla)

Jarred Sumner@jarredsumner·2d

@typedfemale calorie : protein ratio is v good

5.3K

typedfemale@typedfemale·2d

people are calling fairlife "chud soylent"

8.7K

Paul Kuruvilla@RohitPaulK·1d

@gabriel1 Also you gotta say "make no mistakes", else codex assumes that you want mistakes

Paul Kuruvilla@RohitPaulK·1d

@gabriel1 Everything except the last 3 words makes sense. Mention early returns and it's going to write the stupidest early returns ever. just can't resist following dem instructions

gabriel@gabriel1·1d

only bottleneck is consuming code, so make sure to tell codex that you want just that: "write extremely easy to consume code, optimize for how easy the code is to read. make the code skimmable. avoid cleverness. use early returns."

1.9K

109.4K

Paul Kuruvilla@RohitPaulK·1d

Funny how everyone's optimising for DX now that agents can code - why couldn't y'all have done this when us humans were doing the work???

102

Paul Kuruvilla@RohitPaulK·1d

@a1zhang @ankrgyl Ah. Hard to tell without seeing the full prompt but splitting into n judges, one per criterion, might make it more reliable?

alex zhang@a1zhang·1d

@RohitPaulK @ankrgyl Judge is given a rubric of items the answer must satisfy. It returns a list of which items are satisfied and which aren't, reward is only given to this answer if it satisfies all.
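
The setup described above (a rubric of items, a per-item verdict, reward only when every item passes) can be sketched roughly as follows. This is a minimal illustration, not the eval's actual code: `Judge`, `grade`, `reward`, and `keyword_judge` are hypothetical names, and the keyword check is only a stand-in for a real LLM judge call. It queries the judge once per criterion, along the lines of the "one judge per criterion" idea floated in the thread.

```python
from typing import Callable, Dict, List

# Hypothetical judge signature: (answer, criterion) -> True if satisfied.
Judge = Callable[[str, str], bool]


def grade(answer: str, rubric: List[str], judge: Judge) -> Dict[str, bool]:
    """Ask the judge about each rubric item separately (one call per criterion)."""
    return {item: judge(answer, item) for item in rubric}


def reward(verdicts: Dict[str, bool]) -> float:
    """All-or-nothing: 1.0 only when every rubric item is satisfied."""
    return 1.0 if verdicts and all(verdicts.values()) else 0.0


def keyword_judge(answer: str, criterion: str) -> bool:
    # Stand-in for an LLM call (assumption): passes if the criterion text
    # appears in the answer, case-insensitively.
    return criterion.lower() in answer.lower()
```

Because the reward is all-or-nothing, a single flipped verdict on one rubric item changes the whole score, which may be part of why swapping the judge model moves the headline number so much.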

alex zhang@a1zhang·2d

Ran a small eval today on an LM using GPT-5.2 as a judge. Model scores 10%, but paper reports it scoring 34%. I see that the paper uses GPT-5.1 as a judge; for the sake of consistency I change it. Switch to GPT-5.1 as a judge. Model now scores 43.5%... bro

952

93.7K

Paul Kuruvilla@RohitPaulK·1d

@ankrgyl @a1zhang Curious to know this too

Ankur Goyal@ankrgyl·2d

@a1zhang what is the judge's definition?

1.7K

Paul Kuruvilla@RohitPaulK·2d

@schwad_rb This was fun!

schwad@schwad_rb·2d

Rain is growing rain.schwadlabs.io

143

Paul Kuruvilla@RohitPaulK·2d

@badlogicgames @vokaysh "enough rope to hang itself" 🤣

199

Mario Zechner@badlogicgames·2d

i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable. why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data can currently not capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing. another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets, so the agent gets constrained and less likely to produce crap, the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt spec it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results. using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce both production quality maintainable code while retaining the speed advantages agents give you.

287

14.3K

Mario Zechner@badlogicgames·2d

recommended reading sure to ruffle some feathers. but it's largely true for now. keeping the complexity at bay is really hard, especially if you go full agent orchestration. even if you don't, and human in the loop a lot, automation bias kicks in and your reviews of agent generated code become mostly performative.

David Cramer@zeeg

im fully convinced that LLMs are not an actual net productivity boost (today) they remove the barrier to get started, but they create increasingly complex software which does not appear to be maintainable so far, in my situations, they appear to slow down long term velocity

323

27.9K

Paul Kuruvilla@RohitPaulK·4d

@geoffreylitt Agree, also because the stakes are far lower

Geoffrey Litt@geoffreylitt·4d

Hot take, grocery shopping makes more sense than travel planning as an AI use case. Frequent and semi-repetitive

Geoffrey Litt@geoffreylitt

Notion AI just bought my groceries 🥕

6.7K

Paul Kuruvilla@RohitPaulK·4d

@comma_ai E2E long is awesome. Would be great if it could handle traffic lights and stop signs (hyundai tucson 22)

162

comma@comma_ai·4d

What do you want to see in openpilot this year?

13.7K

Paul Kuruvilla@RohitPaulK·4d

@eurydicelives Work feels useful; most of school felt useless and enforced for historical reasons

eurydice@eurydicelives·6d

I know I'm doing a lot of these posts but if you hated school and hate work less (or even like it) I am interested in all the reasons for that difference. "They pay me" is not an answer if I couldn't pay you your current salary to go back to school.

107

7.9K

Paul Kuruvilla@RohitPaulK·5d

@echantech1 Are you using opus 4.6 fast mode? Had a similar increase and that was our root cause (we'll gladly pay though, assuming costs will come down eventually)

echantech@echantech1·5d

Cursor pricing is wild. In the last week, using it full time, I blew through $1000 of tokens. 6 months ago, I would only spend that in a month. Something weird is going on with their pricing.

3.3K

Paul Kuruvilla@RohitPaulK·6d

@rs545837 I'd love to understand this better too - how much of it is pure LLM calls vs. all the other infra bits in-between? Don't think users will put up with the delay in the long-term

Rohan Sharma@rs545837·12 Mar

I don't understand why these code review tools are so slow.

164

Paul Kuruvilla@RohitPaulK·6d

@zebriez Surprised that no one mentioned @rtwlz here

Brie Wolfson@zebriez·12 Mar

who is the person you know that does the coolest side projects?

109

21.2K

Paul Kuruvilla@RohitPaulK·12 Mar

@Sirupsen @samlambert @PlanetScale Is this real

319

Simon Eskildsen@Sirupsen·12 Mar

@samlambert @PlanetScale did I ever tell you I lose my vision for 2-4 hours when I have diet coke?

3.9K

Sam Lambert@samlambert·12 Mar

we are very well stocked with diet coke at @PlanetScale

175

11.5K

Paul Kuruvilla@RohitPaulK·11 Mar

In the build your own x world this is like the sequel to jurassic park

Rodrigo Pombo@pomber

This post (from 7 years ago!) ended up being one of the most impactful things I've made. I'm working on my next big one: Build your own GPT. These posts take an obsessive amount of time, I'm funding it via GitHub Sponsors, if you want to chip in.

Paul Kuruvilla@RohitPaulK·10 Mar

@SherryYanJiang First rule of taste club: we don't talk about taste club

Sherry Jiang@SherryYanJiang·10 Mar

ok but really can someone tell me what taste even means anymore

Greg Brockman@gdb

taste is a new core skill

10.2K

Paul Kuruvilla@RohitPaulK·10 Mar

@jeffdfeng openclaw for $1b, moltbook for $3b, things really be crazy out there

Jeff@jeffdfeng·10 Mar

so meta has just acquired moltbook, just weeks after openai hired the openclaw creator. how much is the social graph of autonomous agents really worth?

2.3K

Paul Kuruvilla@RohitPaulK·10 Mar

@RhysSullivan By that logic does it still make sense at $250?

130

Rhys@RhysSullivan·10 Mar

A $15-$25 PR review bot that catches an incident which would've cost the company $5m in breached SLAs and reputation is a no brainer

229

26.3K

Paul Kuruvilla@RohitPaulK·9 Mar

@danielmerja What's a way one could use this endpoint in an "attack"?

161

Daniel Merja ( gotogether.ai )@danielmerja·9 Mar

Queues for post-processing? Totally fair game. But leaving a detailed /health endpoint wide open to the internet with failed job counts, queue depths, etc.? That's not "vibe coding", that's reconnaissance gift-wrapped for attackers 😭

Dmitriy Kovalenko@neogoose_btw

vibe coded software is so nice. why the hell does Garry need a queue in the static news website? For what?

22.8K
