Craig Certo

458 posts

Craig Certo

@craig_certo

Senior Engineer @ https://t.co/Ak31Cqs5F9 Founder @ https://t.co/MpILg0cqmO Founder @ https://t.co/KDvbQr5563

Katılım Ocak 2026

112 Takip Edilen53 Takipçiler

Craig Certo@craig_certo·25m

@crunchydata Thanks for sharing

English

Crunchy Data@crunchydata·11h

"Because we don't necessarily know at this point" - commit from 2004 that still exists in Postgres today. The below screenshot is from the `analyze.c` file in the Postgres source code. The number 300 is a hardcoded value inside of Postgres's ANALYZE code. The rationale is based on a paper entitled "Random sampling for histogram construction: how much is enough?" written in 1998 when data sizes were much smaller and hardware was much slower. The question the paper answers is: to build statistics enabling optimization of queries of unindexed data, how many rows does ANALYZE need to sample to build accurate enough statistics? The answer is 300-ish samples for each bin you want in your equi-height histogram. Why? The paper shows that required sample size grows linearly with the number of bins but only logarithmically with table size for most cases, so you see diminishing returns beyond a few hundred samples per bin. For instance, the default `statistics_target` is 100. That means Postgres aims to sample 300 x 100 values to build an equi-height histogram with 100 bins and while also storing the 100 most common values. (Check out the previous post for deets on how Postgres uses equi-height histogram and most common values) Why all this work for unindexed data? Because in 1998, indexes were extraordinarily costly to build and maintain. Indexes took up valuable disk space, used the limited IOPs during writes and builds. Additionally table scans were slow and blocking. In 1998, hard drive performance was measured in RPMs, so talking IOPs was variable because random page seeks required waiting for the disk to rotate, and location on disk was unknown. The tests for this paper ran on Pentium 200MHz with 64MB of RAM, and a 7.2k RPM SCSI drive. Postgres users continue to benefit from this work during the era of constrained resources. Indexes aren't free today, and you can have too many indexes, but they aren't as costly as they were. Also, unindexed data isn't as costly as it was. The paper also acknowledges the problem is "provably difficult by establishing a limit on the achievable accuracy of estimation in the worst-case." Thus, "we devise a simple estimator which we believe is optimal." This number is a tradeoff between accuracy and performance. A smaller multiplier would lead to less accurate statistics, which could cause the planner to make bad decisions. A larger multiplier would lead to more accurate statistics, but it would also make ANALYZE slower. And remember, ANALYZE was much, much slower back then. What does statistics_target control? The statistics target controls the number of values stored for Most Common Values and the Equi-height Histogram. The following is true: ``` statistics_target = 100 → 30,000 samples, 100 MCVs, 100 buckets statistics_target = 500 → 150,000 samples, 500 MCVs, 500 buckets statistics_target = 1000 → 300,000 samples, 1000 MCVs, 1000 buckets ``` This value is set by default at the database level, and can be overridden at the column level. ``` -- Per-column override: ALTER TABLE requests ALTER COLUMN status_code SET STATISTICS 500; ANALYZE requests; ``` For larger databases, there is usually at least one column where a per column setting may be the right approach. Don't raise the global default just because one column needs more granularity. Given the performance gains of the underlying hardware, the performance gains from changing column statistics aren't as significant as they once were.

English

9.5K

Craig Certo@craig_certo·43m

@dondawastaken Fueling your body with top tier gasoline is underrated

English

556

✮ راينر براون@dondawastaken·10h

Never forget that you are the main data center. Drink water, and consume as much literature as possible.

English

6.7K

35.9K

287.3K

Craig Certo@craig_certo·43m

@dondawastaken Nice

English

Craig Certo@craig_certo·2h

@gabeviggers This is the correct take

English

gabriel@gabeviggers·4h

All backend engineers are now full stack, but all frontend engineers are not

English

1.3K

Craig Certo@craig_certo·8h

Thanks for sharing, and I've seen similar things in my own work. You know at your core that it could be faster and that you could be accelerating everything with AI, but honestly, it doesn't feel always like the output matches up. And part of it definitely is the fact that you have to wait such a long time in between turns.

English

914

Taelin@VictorTaelin·10h

Status update: I've been on/off AI agents in the last few days and it is a verifiable truth that every day I didn't use agents, I was more productive. I still attribute that to how slow they are, and my own inability to multi-task efficiently. The magic is there but the slowness doesn't let it cross the threshold where they actually make me faster, and I still dislike the whole thinking paradigm. About Bend2: honestly, the C/Metal compiler codebase is a clusterfuck right now. I regret letting AI agents write it. All tests pass, and GPU performance is mind-blowing, so the core architecture works. Yet, it has a LOT of bugs. Anything not covered by the tests is a coin toss. This is actually impressive, because, in many parts of the codebase, the right solution was actually the simplest one, yet, the agents STILL managed to find a way to make it work just for the tests. The level of reward hack these agents output is actually impressive I can't even be mad. It is also ironical because that's the very problem that Bend's proof system was supposed to solve, but Bend is in TypeScript, not in Bend. I'm disappointed I didn't write Bend in itself, and now I feel an immense urge to do so. But the clock is ticking . . . Still, I do not think Bend is worth launching without the GPU compiler being solid, because the closest competitor, Lean, is actually extremely good, so we need a big differential. Yet, due to the very nature of the project, it would be embarrassing to have bugs at launch. Regarding AI, I now believe using current gen AI agents in production codebase is harmful and a massive mistake. That doesn't mean no agents at all, but agents work best when they don't touch critical code. Debugging, researching, providing insights, scripts / tools, or anything that doesn't touch code you will maintain in the long term. But if you merge AI code without reading, you're going to have a bad time. Speaking from experience I'm working 10h/day on SupGen and the remaining time on Bend2

English

1.1K

141.5K

Craig Certo@craig_certo·12h

@ClearwaterCoder It comes down to one thing: joy of building

English

614

AJ@ClearwaterCoder·1d

The Minecraft to Computer Science pipeline needs to be studied

English

211

2.9K

87.1K

Craig Certo@craig_certo·1d

@Suhail Agreed. It’s very difficult once you build something with AI completely to go back and understand it from the foundations and it makes it very challenging to improve I spend potentially days building out the spec so I can trace all code back to it, but even then it’s challenging

English

Suhail@Suhail·1d

Nothing seems to replace understanding the thing you're working on if you intend to improve it. AI generating 2K lines of slop may initially work but once you want to tweak it or understand the nuance of what's happening under the hood, you end up wishing you had built it up.

English

147

9.1K

Craig Certo@craig_certo·1d

@DevanshuXi What IDE is this?

English

Devanshu@DevanshuXi·3d

Cool morning tbh, I was solving the classic “Jump Game V” DP problem and got something interesting to know. Top-down memoization and bottom-up tabulation do not reveal dependencies the same way, even if the recurrence looks identical. The problem itself looked pretty straightforward at first. From an index i, you can jump at most d positions left or right, but only to smaller values, and every element in-between should’ve also been smaller. My memoized DFS got accepted at first. The recurrence was basically: cache[i] = 1 + max(cache[j]) where: arr[j] < arr[i] now while writing the recursive solution, I honestly didn’t even think much about dependency ordering. DFS just naturally handled everything. And that’s the interesting part. When recursion solves solve(i), it automatically goes and computes all the smaller reachable states first. Only after those answers come back does it memoize cache[i]. So even though the graph dependencies are directional, recursion kinda performs a lazy topological traversal for you behind the scenes. That’s why the memoized version works perfectly fine without sorting anything. But then I tried converting the exact same logic into bottom-up DP, and things suddenly started feeling weird. Initially I thought I could just do something like: for(i = 0; i < n; i++) compute cache[i] But this ordering completely falls apart. The issue is that cache[i] depends on states having strictly smaller VALUES, not smaller INDICES. And that distinction changes the whole problem. While computing: cache[i] = 1 + cache[j] there’s absolutely no guarantee that cache[j] already exists if we’re simply iterating left-to-right. Some dependencies are on the left. Some are on the right. The graph literally does not care about positional ordering. The actual dependency rule is arr[i] > arr[j], that means edges always go larger value -> smaller value That was the key realization for me. The graph isn’t index-ordered at all. It’s value-ordered. Not it looked more like a DAG ordering problem. So for bottom-up DP, states should’ve been processed in increasing order of values. That’s why sorting suddenly becomes necessary: sort(states by arr[i]) Now when processing an index i, every smaller reachable state has already been computed. What recursion was handling implicitly, tabulation now forces you to handle explicitly. And honestly, I got to know something new about DP. A lot of people say: “Tabulation is just memoization with loops.” But that’s not really true. Memoization discovers dependency order dynamically during recursion. Tabulation requires you to design the dependency order beforehand. And most of the time, that’s actually the hardest part. This is also why so many bottom-up DP problems suddenly require: sorting, topological ordering, monotonic processing, DAG linearization, coordinate ordering, etc., even though their recursive versions feel completely natural.

English

1.8K

Craig Certo@craig_certo·1d

@ExampleTestcase @HanchungLee Why do you say MCP for real SWE and CLI for hackers? I feel the opposite

English

Hui Kang Tong@ExampleTestcase·1d

@HanchungLee Relatable But now we should be building tools (CLIs if you are a hacker, or MCPs if you are a real software engineer) for AI to do exactly this

English

2.2K

Han@HanchungLee·1d

ngmi if complaining about looking at data. welcome to ai/ml.

English

268

30.4K

Craig Certo@craig_certo·1d

@gajesh Would love to hear what comes of this

English

Gajesh@gajesh·1d

i’m looking to talk to people who’ve unlocked deep focus in their life and work. happy to pay for your time or donate to a charity you care about. specifically, people who can hold a big vision in their head, build a path toward it, and keep walking that path even when novelty, fear, or distraction shows up. i want to hear the stories that changed your mindset around focus. the moments that forced you to become disciplined. the daily systems, hacks, videos, routines, or mental models that helped you lock in. i have minor ADHD, and i’m trying to find natural ways to work with it – not suppress it. there’s a beautiful abstract world ADHD opens up: weird connections, rapid exploration, parallel ideas, creative jumps. AI has helped me amplify that a lot, especially with coding and parallelizing tasks. but there’s a point where parallelization stops helping. you need to choose one path, sit with it, and laser focus until the thing becomes real. i’m trying to get better at that. if this is you, or someone you know, please comment or DM. also feel free to share any methods, videos, books, or hacks that genuinely helped you focus.

English

6.9K

Craig Certo@craig_certo·1d

@DevanshuXi Love it! I feel the constant pressure to build as quickly as possible, but what you’re describing sounds really peaceful

English

1.6K

Devanshu@DevanshuXi·1d

That’s what I genuinely feel these days. I don’t provide value to any company right now. I don’t have customers waiting on me. No production outage depends on my code. No market pressure is forcing me to ship faster. So why the hell should I optimize everything through LLMs? Lately I’ve just been sitting alone, thinking deeply, writing completely raw code with my own hands, staring at logs for hours, debugging stupid failing test cases that nobody in the world cares about except me. And weirdly… I enjoy it. Because for the first time in a while, the process actually feels real. The failing tests frustrate me. The broken abstractions annoy me. The ugly edge cases make me question my own thinking. But at least I’m feeling something while building. Maybe this is inefficient. Maybe I should be “10x productive.” Maybe I should automate half my thinking away. But honestly, at this stage of my life, it literally doesn’t matter. Nobody is dying if my side project ships two weeks later. No investor dashboard is tracking my velocity. No PM is asking for updates. So I’d rather struggle through the code myself, understand every ugly detail, and let the frustration shape how I think. Because somewhere in those failing logs and terrible midnight debugging sessions, it still feels like I’m actually learning computers instead of just operating them.

Mo@atmoio

I'm done. I'm f***ing done.

English

217

31.3K

Craig Certo@craig_certo·1d

Try taking a weekend off. You’ll come back to work and realize that a lot of the things on Friday that you thought were extremely important don’t actually really matter It’s good to get perspective, otherwise you’re driving 100 mph without stopping to see if you’re going the right direction (or if there’s somewhere on your route you want to stop that’s worth the time)

English

Andrey@Andrey__HQ·2d

@palashshah honestly kinda crazy that some people genuinely take weekends to fully reset. I guess just one day is good but both is wild

English

519

Palash Shah@palashshah·2d

"you have work to do on a saturday night?" yes, there's an always work. there's actually an unlimited amount of work. there's a billion different things happening in the space that i work in, and no amount of time is enough. have to give this explanation in nyc quite a lot.

English

195

14.9K

Craig Certo@craig_certo·1d

@lsrspeakstocomp @OpenAI This looks awesome

English

332

Lakshay Sagar Rana@lsrspeakstocomp·2d

hey guys, this is lovely! @OpenAI. System Engineering to the win!

English

662

42.6K

Craig Certo@craig_certo·3d

@mstockton Talking into my computer has been a game changer I tell everyone i know that they need to start doing it

English

Matt Stockton@mstockton·3d

I wrote recently that using voice as an input is probably the biggest practical unlock I’ve found for getting more out of AI. A close second is using a tool that has access to the file system. I think a lot of people are still using AI in a pretty limited way. They open a blank chat window, paste in some context, ask a question, and then start over again the next time. That works, and I still do it all the time, but you’re forcing the model to rebuild context from scratch every time. I mean tools like Codex, Claude Code, cowork, Pi, or anything similar where the model can actually work against a folder on your computer. The models are very good at this. They can list directories, read files, search across them, follow links, inspect structure, and figure out where the useful context lives. That is a very different experience than pasting a few things into a web chat. A lot of this probably comes from the coding use case. These models have gotten very good at working through codebases with basic file system tools. But the same pattern works for non-coding work too. If you give the model a folder of well-organized information, it can explore that folder like a map. The other piece is that you don’t want the model starting from scratch every time. You want to give it a better starting point. For me, that usually means small markdown files with project context, rules, constraints, preferences, examples, and links to other relevant files. There is some art to this. What goes in those files, how detailed they are, how they link together, and how they get maintained all matter. I don’t think there is one perfect way to do it. But the starting point is simple. If you already pay for Anthropic or OpenAI and you have not tried this, I’d really encourage you to download one of the desktop tools, open it up, point it at a folder, and start playing around. I’m pretty confident that once you do this for real work, some of your workflows will shift there quickly.

English

2.5K

Craig Certo@craig_certo·3d

@rauchg careercatalyst.cc Started with Sonnet 3.7 last year

English

706

Guillermo Rauch@rauchg·3d

Show me the thing you’ve built with AI you’re most proud of. Reply with a working product URL and what model / agent you primarily used.

English

155

527.3K

Craig Certo@craig_certo·3d

@irl_danB @nbaschez This looks interesting

English

153

dan@irl_danB·3d

@nbaschez , roughdraft.md is a joy to use, it's become my default mode of interaction for certain types of collaboration overnight mostly spec writing but also financial modeling and systems architecting

English

10.2K

Craig Certo@craig_certo·3d

@lateinteraction Unfortunately you don’t know when it’s happening and they benefit greatly without rewarding those sharing Not a frontier lab hater but would discourage me from sharing

English

440

Omar Khattab@lateinteraction·4d

Most people in this part of twitter don’t realize how closely folks at the frontier labs pay attention to all your favorite academic ML releases on here.

English

636

39.8K

Craig Certo@craig_certo·3d

Who else is exclusively talking into their computer now? Feel like a psycho but it’s so fast

English

Craig Certo@craig_certo·3d

@Dhavalsingh7 This is actually so true Really hard to get into the flow because I’m speaking into my computer and the music fucks with it

English

Dhaval singh@Dhavalsingh7·3d

Worst part is I cant listen to music while coding because I'm dictating to machines every 2 mins🥲

English

Dhaval singh@Dhavalsingh7·3d

I don't know how to explain it well, but the kind of fatigue I feel after "coding" using agents vs the older days is very different. The things from which I would get satisfaction, the feeling of "craft" or being "creative" seems to have gone down and it's more about more efficent ways to get shit done. Which overall seems to be a plus, I can ship more than before and from a org standpoint it matters but from a personal standpoint i'm not so sure. Most of the discussions are around how to get coding agents to be better in your workflows and not here is a beautiful pattern you can apply and look how unmagled all this code became. Don't get me wrong, we are still applying a lot of well tested patterns that might improve your codebase, but the satisfaction you get from that is 2nd hand since you are not writing the code anyways... I wish there was still a corner of the internet where people talk about artisanal code written by hand and the vibes are like before

English

532

Craig Certo@craig_certo·3d

@joelrightler What’s product hunt? Thanks for sharing

English

Joel Rightler@joelrightler·4d

Three days ago I launched my app on Product Hunt. It bombed. Two upvotes by mid-morning. No comments. Here's what I learned in the 72 hours that followed. 🧵

English

Keşfet

@crunchydata @dondawastaken @gabeviggers @ClearwaterCoder @Suhail @DevanshuXi @ExampleTestcase @HanchungLee