Paul Kuruvilla

2.6K posts

@RohitPaulK

Building @codecraftersio

localhost Joined January 2013
434 Following 1.9K Followers
Pinned Tweet
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
Started actively working on the next codecrafters.io challenge today: "Build your own SQLite". I've got the stages planned out, now onto building the test cases and trying the challenge out myself. Will post updates here as I go.
CodeCrafters@codecraftersio

Announcing our next challenge: "Build your own SQLite". In this challenge, you'll build a barebones SQLite implementation that supports basic SQL commands like SELECT/INSERT. Along the way we'll learn about SQLite's file format, how indexed data is stored in B-trees and more.

4
4
49
0
typedfemale
typedfemale@typedfemale·
people are calling fairlife "chud soylent"
6
2
63
8.7K
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@gabriel1 Also you gotta say "make no mistakes", else codex assumes that you want mistakes
0
0
0
13
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@gabriel1 Everything except the last 3 words makes sense. Mention early returns and it's going to write the stupidest early returns ever. just can't resist following dem instructions
1
0
0
56
gabriel
gabriel@gabriel1·
only bottleneck is consuming code, so make sure to tell codex that you want just that: "write extremely easy to consume code, optimize for how easy the code is to read. make the code skimmable. avoid cleverness. use early returns."
70
61
1.9K
109.4K
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
Funny how everyone's optimising for DX now that agents can code - why couldn't y'all have done this when us humans were doing the work???
0
1
2
102
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@a1zhang @ankrgyl Ah. Hard to tell without seeing the full prompt but splitting into n judges, one per criterion, might make it more reliable?
1
0
0
34
alex zhang
alex zhang@a1zhang·
@RohitPaulK @ankrgyl Judge is given a rubric of items the answer must satisfy. It returns a list of which items are satisfied and which aren't, reward is only given to this answer if it satisfies all.
2
0
1
59
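A minimal sketch of the judging setup discussed in this thread, assuming a rubric is just a list of criteria and a judge is any function that answers yes or no for one criterion at a time (the one-judge-per-criterion idea). The names `grade_per_criterion`, `all_or_nothing_reward`, and `keyword_judge` are hypothetical, and the keyword check is only a stand-in for a real LLM judge call; this is not the eval code anyone in the thread actually ran.

```python
from typing import Callable, Dict, List

# Hypothetical judge signature: (answer, criterion) -> True if the criterion is satisfied.
Judge = Callable[[str, str], bool]


def grade_per_criterion(answer: str, rubric: List[str], judge: Judge) -> Dict[str, bool]:
    """Query the judge once per rubric item, so each verdict is independent."""
    return {criterion: judge(answer, criterion) for criterion in rubric}


def all_or_nothing_reward(verdicts: Dict[str, bool]) -> float:
    """Give reward 1.0 only if every rubric item is satisfied, otherwise 0.0."""
    return 1.0 if verdicts and all(verdicts.values()) else 0.0


def keyword_judge(answer: str, criterion: str) -> bool:
    # Stand-in for an LLM call (assumption): passes if the criterion text appears in the answer.
    return criterion.lower() in answer.lower()


rubric = ["b-tree", "page size"]
verdicts = grade_per_criterion("SQLite stores indexes in a B-tree.", rubric, keyword_judge)
reward = all_or_nothing_reward(verdicts)
```

Because the reward is all-or-nothing, a single flipped verdict changes the score for the whole answer, which may be one reason scores swing so much when the judge model changes.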
alex zhang
alex zhang@a1zhang·
Ran a small eval today on an LM using GPT-5.2 as a judge. Model scores 10%, but paper reports it scoring 34%. I see that the paper uses GPT-5.1 as a judge; for the sake of consistency I change it. Switch to GPT-5.1 as a judge. Model now scores 43.5%... bro
34
26
952
93.7K
Mario Zechner
Mario Zechner@badlogicgames·
i can't speak for david. what i see is this: if you let agents build or extend a codebase with only minor or no supervision, you get unmaintainable garbage, because the agent makes terrible decisions that compound, both big and small. those decisions make it hard for both you and the agent to keep modifying the code base, until eventually it's unrecoverable.

why does the agent make bad decisions? i can't tell for sure, but my gut tells me that training data can currently not capture the holistic thinking needed to design and evolve complex systems. that's one part of the problem. related to that, and oversimplified: agents output the "mean quality" of the code they saw during training. most of that code is very bad. specifically tests, which humans are terrible at writing.

another part of the problem is that specification via prompt is not precise enough, so the agent has to fill in the blanks, giving it enough rope to hang itself. the more detailed your spec gets, so the agent gets constrained and less likely to produce crap, the closer you are to handwriting the code yourself, as that's the most detailed version of the spec that can exist. so then you gain nothing. back to prompt spec it is, which means the agent fills in blanks, which means we get suboptimal or truly bad results.

using agents can still be a net productivity boost (see other posts in my thread), but it is not easy to come up with consistent workflows that produce production quality maintainable code while retaining the speed advantages agents give you.
18
34
287
14.3K
Mario Zechner
Mario Zechner@badlogicgames·
recommended reading sure to ruffle some feathers. but it's largely true for now. keeping the complexity at bay is really hard, especially if you go full agent orchestration. even if you don't, and human in the loop a lot, automation bias kicks in and your reviews of agent generated code become mostly performative.
David Cramer@zeeg

im fully convinced that LLMs are not an actual net productivity boost (today). they remove the barrier to get started, but they create increasingly complex software which does not appear to be maintainable. so far, in my situations, they appear to slow down long term velocity

13
25
323
27.9K
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@comma_ai E2E long is awesome. Would be great if it could handle traffic lights and stop signs (hyundai tucson 22)
0
0
0
162
comma
comma@comma_ai·
What do you want to see in openpilot this year?
68
2
90
13.7K
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@eurydicelives Work feels useful; most of school felt useless and enforced for historical reasons
0
0
1
10
eurydice
eurydice@eurydicelives·
I know I'm doing a lot of these posts but if you hated school and hate work less (or even like it) I am interested in all the reasons for that difference. "They pay me" is not an answer if I couldn't pay you your current salary to go back to school.
51
5
107
7.9K
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@echantech1 Are you using opus 4.6 fast mode? Had a similar increase and that was our root cause (we'll gladly pay though, assuming costs will come down eventually)
0
0
0
29
echantech
echantech@echantech1·
Cursor pricing is wild. In the last week, using it full time, I blew through $1000 of tokens. 6 months ago, I would only spend that in a month. Something weird is going on with their pricing.
5
0
9
3.3K
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@rs545837 I'd love to understand this better too - how much of it is pure LLM calls vs. all the other infra bits in-between? Don't think users will put up with the delay in the long-term
1
0
1
87
Rohan Sharma
Rohan Sharma@rs545837·
I don't understand why these code review tools are so slow.
1
0
0
164
Brie Wolfson
Brie Wolfson@zebriez·
who is the person you know that does the coolest side projects?
56
2
109
21.2K
Paul Kuruvilla
Paul Kuruvilla@RohitPaulK·
@jeffdfeng openclaw for $1b, moltbook for $3b, things really be crazy out there
0
0
1
74
Jeff
Jeff@jeffdfeng·
so meta has just acquired moltbook, just weeks after openai hired the openclaw creator. how much is the social graph of autonomous agents really worth?
8
0
24
2.3K
Rhys
Rhys@RhysSullivan·
A $15-$25 PR review bot that catches an incident which would've cost the company $5m in breached SLAs and reputation is a no brainer
52
4
229
26.3K