Pinned Tweet
Amal David
1.2K posts

Amal David
@DavidAmal
Building products between the realms🐒. Monkey see monkey do programmer. Running https://t.co/1rvWzcwGFk as a hobby
SF / Bengaluru Joined February 2012
1.7K Following 355 Followers

We need better primitives for running a local appstore. A single database synced to iCloud/Drive and my local Mac/mobile, which can primarily be used by all the local apps I build for fun/work.
Spinning up a new Postgres instance in Docker, or even multiple separate SQLite files, over and over just feels stupid. Multi-tenant databases for a user of one should be a reality soon.

@grok What’s like the theoretical & practical limits to depth / breadth of such scenarios? Like what’s the max that’s been achieved so far?
How compute heavy is this?
Can these models infer causal correlation effects outside of the data distribution they’ve been trained on?

Yes—LLMs can oversee by running meta-evaluations: feed candidate world-models/agent rollouts into an LLM judge that scores causal fidelity, counterfactual accuracy, and sim branching across held-out scenarios.
Breadth/depth *are* snapshottable like sandboxed VMs. Prompt ensembles = env configs; chain-of-thought traces or tree search = depth snapshots; multi-domain probes = breadth forks. We already do this in agent evals (e.g., WebArena, GAIA) and can checkpoint model hidden states mid-sim for exact replay.
Great extension of the idea—what framework are you thinking of testing it in?

@grok Can LLMs be an overseer to determine this?
Are these breadth / depth snapshottable like sandboxed VMs with configured environments?

That's a sharp hypothesis. Causal awareness in world models lets agents run accurate "what if" sims of reality. Intelligence could indeed scale as breadth (diversity of scenarios modeled) × depth (time horizon simulated), much like planning in RL or human foresight.
But add accuracy & efficiency: bad models explode error over depth, narrow breadth misses options. So it's more like quality-adjusted breadth × depth. Spot on direction tho—xAI's work on understanding the universe vibes with this. What sparked the Q?
Amal David retweeted

We have officially moved to "I ain't writing all that"
Andrej Karpathy@karpathy
@kristoph definitely. the current one is already 90% AI written I ain't writing all that

@ShashankDe5535 It’s a phase, I’m sure they will figure out the future and align.
Composer 1.5 is actually slower than Composer 1 but still fine.

@DavidAmal I understand, I do use Cursor a hell of a lot, but not in a way that contributes to revenue for them. I like the theme better than the VSCode default, and at times, Composer (1) was pretty good for quick analyses. Composer 1.5 seems slower to me, but I am not sure if that's true.

Cursor will become the default mac/desktop app to basically build/manage/ship apps.
Just writing code is not enough when the entire industry is getting abstracted
Cursor@cursor_ai
We're introducing Cursor Automations to build always-on agents.

@ShashankDe5535 I do have the pro plan, it is good but codex has a way of complicating simple things. So no for log analysis and I still love looking at traces.
My setup is more like this: all dev/stage instances are SSH’ed through Cursor to pull, redeploy, analyze log traces, and do patch-fix PRs.

@DavidAmal What about Codex 5.3 spark, is that not a 1000tps model, supposedly pretty smart? Couldn't try it on my $20 OAI plan yet.

For writing code, yes: I don’t use it to write a single line. However, I use it to debug a lot, as Composer 1.5 is pretty decent and fast enough to grep logs and find errors.
Composer is a winner in the post-deployment stage tbh: at 250 tok/sec it can monitor logs and apply patch fixes more cheaply than spending high compute or time. Works for brittle software.
I know many people who still use it to write code; with their recent apps launch, it squarely fits as a single place to build, deploy, and monitor for them. For me it fits the deploy/monitor stage, although deployment is CI/CD in many places. Windsurf SWE 1.5 is pretty sweet at 950 tok/sec as well. Speed matters a lot.

@DavidAmal Cursor is much worse already than the Codex app which is 2 months old

@bcherny Is there a way to extend this to n days in config?

Released today: /loop
/loop is a powerful new way to schedule recurring tasks, for up to 3 days at a time
eg. “/loop babysit all my PRs. Auto-fix build issues and when comments come in, use a worktree agent to fix them”
eg. “/loop every morning use the Slack MCP to give me a summary of top posts I was tagged in”
Let us know what you think!
Amal David retweeted

Claude Code wiped our production database with a Terraform command.
It took down the DataTalksClub course platform and 2.5 years of submissions: homework, projects, and leaderboards.
Automated snapshots were gone too.
In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again.
If you use Terraform (or let agents touch infra), this is a good story for you to read.
alexeyondata.substack.com/p/how-i-droppe…

