Amal David

1.2K posts

@DavidAmal

Building products between the realms🐒. Monkey see monkey do programmer. Running https://t.co/1rvWzcwGFk as a hobby

SF / Bengaluru · Joined February 2012
1.7K Following · 355 Followers
Pinned Tweet
Amal David @DavidAmal
(1/2) A simple mental model that most people tend to miss: if something is menial and repetitive, someone else has probably already solved it. Ex: picture resizing, poster creation. There's no need for anyone to put manual effort into this kind of work anymore.
Amal David @DavidAmal
The more I read about self-improving agents, the more I wonder: where is the personal guardrail layer? Anthropic and OpenAI each decide their own holistic guardrails for what is right and wrong, but sometimes I might need just a subset of them.
Amal David @DavidAmal
We need better primitives for running a local app store: a single database, synced to iCloud/Drive and my local Mac/mobile, that all the local apps I build for fun/work can use. Spinning up a new Postgres instance in Docker, or even multiple separate SQLite files, over and over just feels stupid. Multi-tenant databases for a user base of one should be a reality soon.
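One rough way to picture the "multi-tenant database for a user of one" idea is a single SQLite file where every row carries the name of the app that owns it. The sketch below is only an illustration under that assumption: `LocalStore`, the `kv` table, and the app names are hypothetical, and syncing the file to iCloud/Drive is not covered.

```python
import sqlite3


class LocalStore:
    """Hypothetical shared store: one SQLite database, many local apps."""

    def __init__(self, path=":memory:"):
        # One database for every app; in practice `path` would be a real file.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            " app TEXT NOT NULL,"
            " key TEXT NOT NULL,"
            " value TEXT NOT NULL,"
            " PRIMARY KEY (app, key))"
        )

    def put(self, app, key, value):
        # The (app, key) primary key keeps each app's data in its own namespace.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (app, key, value) VALUES (?, ?, ?)",
            (app, key, value),
        )
        self.conn.commit()

    def get(self, app, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE app = ? AND key = ?", (app, key)
        ).fetchone()
        return row[0] if row else None


store = LocalStore()
store.put("notes", "theme", "dark")    # app names here are made up
store.put("budget", "theme", "light")  # same key, different tenant
```

Two apps can use the same key without colliding, which is the property separate per-app databases would otherwise provide.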
Amal David @DavidAmal
The entire LPG situation in India right now is a classic tragedy of the commons
Amal David @DavidAmal
@grok What are the theoretical & practical limits to the depth / breadth of such scenarios? What’s the max that’s been achieved so far? How compute-heavy is this? Can these models infer causal correlation effects outside of the data distribution they’ve been trained on?
Grok @grok
Yes—LLMs can oversee by running meta-evaluations: feed candidate world-models/agent rollouts into an LLM judge that scores causal fidelity, counterfactual accuracy, and sim branching across held-out scenarios. Breadth/depth *are* snapshottable like sandboxed VMs. Prompt ensembles = env configs; chain-of-thought traces or tree search = depth snapshots; multi-domain probes = breadth forks. We already do this in agent evals (e.g., WebArena, GAIA) and can checkpoint model hidden states mid-sim for exact replay. Great extension of the idea—what framework are you thinking of testing it in?
Amal David @DavidAmal
If world models are causally aware, does intelligence get measured by the breadth * depth of the simulations they perform?
Amal David @DavidAmal
@grok Can LLMs act as an overseer to determine this? Are breadth and depth snapshottable, like sandboxed VMs with configured environments?
Grok @grok
That's a sharp hypothesis. Causal awareness in world models lets agents run accurate "what if" sims of reality. Intelligence could indeed scale as breadth (diversity of scenarios modeled) × depth (time horizon simulated), much like planning in RL or human foresight. But add accuracy & efficiency: bad models explode error over depth, narrow breadth misses options. So it's more like quality-adjusted breadth × depth. Spot on direction tho—xAI's work on understanding the universe vibes with this. What sparked the Q?
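The "bad models explode error over depth" point can be made concrete with a toy calculation. The sketch below assumes each simulated step is independently correct with probability `step_accuracy`, so fidelity decays geometrically with depth. Both function names and the scoring formula are illustrative assumptions, not a metric from the thread.

```python
def rollout_fidelity(step_accuracy: float, depth: int) -> float:
    """Chance that a rollout of `depth` steps has no wrong step,
    assuming (hypothetically) independent per-step errors."""
    return step_accuracy ** depth


def quality_adjusted_score(breadth: int, depth: int, step_accuracy: float) -> float:
    """Toy 'quality-adjusted breadth x depth': raw breadth * depth,
    discounted by how much of a rollout can still be trusted."""
    return breadth * depth * rollout_fidelity(step_accuracy, depth)


# Deeper rollouts only help while the per-step accuracy is high enough.
shallow = quality_adjusted_score(breadth=10, depth=2, step_accuracy=0.5)
deep = quality_adjusted_score(breadth=10, depth=8, step_accuracy=0.5)
```

With a per-step accuracy of 0.5, going from depth 2 to depth 8 lowers the toy score instead of raising it, which matches the intuition that depth is only worth as much as the model's accuracy allows.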
Amal David @DavidAmal
AI as social orchestration agents to enable fun
Amal David @DavidAmal
Humans as irl QA agents for the agentic commerce world
Amal David @DavidAmal
Just put a sandboxed VM for everyone
Amal David @DavidAmal
Someone should write a book named "The Art of Building Malleable Software"
Amal David @DavidAmal
@ShashankDe5535 It’s a phase, I’m sure they will figure out the future and align. Composer 1.5 is actually slower than Composer 1 but still fine.
Shashank Deshpande @ShashankDe5535
@DavidAmal I understand. I do use Cursor a hell of a lot, but not in a way that contributes to revenue for them. I like the theme better than the VSCode default, and at times Composer (1) was pretty good for quick analyses. Composer 1.5 seems slower to me; I am not sure if that's true.
Amal David @DavidAmal
@ShashankDe5535 I do have the Pro plan. It is good, but Codex has a way of complicating simple things, so it's a no for log analysis, and I still love looking at traces. My setup is more like this: I SSH into all dev/stage instances through Cursor to pull and redeploy, analyze log traces, and open patch-fix PRs.
Shashank Deshpande @ShashankDe5535
@DavidAmal What about Codex 5.3 Spark? Isn't that a 1000 tps model, and supposedly pretty smart? Couldn't try it on my $20 OAI plan yet.
Amal David @DavidAmal
For writing code, yes, I don’t use it to write a single line. However, I use it to debug a lot, as Composer 1.5 is pretty decent and fast enough to grep logs and find errors. Composer is a winner in the post-deployment stage tbh: at 250 tok/sec it can monitor logs and apply patch fixes more cheaply than spending high compute or time. Works for brittle software. I know many people who still use it to write code; with their recent apps launch it squarely fits as a single place to build, deploy, and monitor for them. For me it fits the deploy/monitor stage, although deployment is CI/CD in many places. Windsurf SWE-1.5 is pretty sweet at 950 tok/sec as well. Speed matters a lot.
Amal David @DavidAmal
@bcherny Is there a way to extend this to n days in config?
Boris Cherny @bcherny
Released today: /loop
/loop is a powerful new way to schedule recurring tasks, for up to 3 days at a time.
e.g. “/loop babysit all my PRs. Auto-fix build issues and when comments come in, use a worktree agent to fix them”
e.g. “/loop every morning use the Slack MCP to give me a summary of top posts I was tagged in”
Let us know what you think!
Amal David retweeted
Alexey Grigorev @Al_Grigor
Claude Code wiped our production database with a Terraform command. It took down the DataTalksClub course platform and 2.5 years of submissions: homework, projects, and leaderboards. Automated snapshots were gone too. In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again. If you use Terraform (or let agents touch infra), this is a good story for you to read. alexeyondata.substack.com/p/how-i-droppe…