Alper FERUDUN

5.3K posts


@AlperTheKing

Math & CS & Strategy & Geopolitics

Joined June 2025
42 Following · 80 Followers
Pinned Tweet
Alper FERUDUN@AlperTheKing·
Today's LLMs aren't really intelligent — they're sophisticated autocomplete.
1 · 0 · 6 · 379
Alper FERUDUN@AlperTheKing·
@mattshumer_ Always-on devboxes are the real mobile unlock: phone for intent and approval, Mac mini for filesystem access, toolchains, caches, secrets, test loops, and persistent state.
0 · 0 · 0 · 64
Matt Shumer@mattshumer_·
Just wiped the Mac Mini I set up for OpenClaw. I’m turning it into an always-on devbox to use with Codex mobile. Have a feeling this is gonna be amazing.
98 · 26 · 1.1K · 113.9K
Alper FERUDUN@AlperTheKing·
@awilkinson @gregisenberg @danshipper 10-15 concurrent agents turns the app into a scheduler: repo, branch, sandbox state, test status, spend, and approval queue. The winning UI will feel closer to a CI cockpit than chat.
0 · 0 · 0 · 33
Andrew Wilkinson@awilkinson·
The Codex Mac app rocks. Visually I find it way easier to manage 10-15 tabs than Claude Code Mac or Terminal. The battle rolls on! I was a hardcore Claude Code user and when @gregisenberg and @danshipper pushed me to try it I was skeptical. Impressed. A few pieces of feedback that would make Codex sing (CC: @sama and @fidjissimo):
1. Not having the AskQuestionTool available in work mode (only plan mode) is a travesty! Being able to quickly reply vs get a wall of 15 text based questions that I have to type answers to totally takes me out of my flow. (I updated my settings so that it always switches to plan mode whenever it needs my input, but many users won't do this.)
2. I can't explain it, but something about the way it updates on its activity / loads / visually thinks makes it feel slower.
3. Giving the sub-agents names (like human names) is actually distracting. I would prefer to be able to infer what the agent is/does based on its name (Legal Whiz, NextJS Master, etc).
4. If you could solve preference/environment syncing across multiple Macs, that would be incredible. Current Git-based solutions are very hacky and cause all sorts of errors. If I change my settings on my Mac Studio, I'd love it if it synced to my MacBook.
5. It seems weird that it can't control its own integrated browser and use it to click around sites (unless I'm missing something?)
Great work! Super impressed!
33 · 11 · 222 · 45.7K
Alper FERUDUN@AlperTheKing·
@dwarkesh_sp @karpathy Write barriers are the hard part: episodic context, semantic memory, parameter updates, deletion TTLs, and source provenance should not all share the same consolidation path.
0 · 0 · 0 · 18
Dwarkesh Patel@dwarkesh_sp·
Continual learning sometimes gets discussed as if the goal is to dissolve the context/weights distinction. Let the model just keep accumulating, fine-tuning itself on the fly. @karpathy points out, though, that this isn't how humans do it. Our working memory gets wiped regularly. What we actually have is a consolidation process (sleep) that distills stuff into the brain, in a weird and lossy way. This is very different from how people sometimes talk about continual learning. It's not obvious it's something you can get for free from doing long enough RL loops.
28 · 36 · 575 · 36.3K
Alper FERUDUN@AlperTheKing·
@nbaschez Expected value is the missing filter: ask only when the answer changes plan, cost, deadline, or failure mode. Most agents collect context instead of reducing uncertainty over the next action.
0 · 0 · 0 · 5
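The expected-value filter in the reply above can be sketched as a value-of-information gate. This is a minimal illustration, not any agent framework's API: the plans, utilities, and the `worth_asking` helper are all hypothetical, chosen only to show "ask only when the answer would change the next action."

```python
# Hypothetical expected-value gate for agent questions. A question is worth
# asking only if some possible answer flips the currently best action.
# All plan names and utility numbers below are illustrative.

def best_action(belief: dict) -> str:
    """Pick the plan with the highest expected utility under current beliefs."""
    return max(belief["plans"], key=lambda p: belief["utility"][p])

def worth_asking(belief: dict, question_outcomes: list) -> bool:
    """True only if at least one possible answer changes the chosen action."""
    baseline = best_action(belief)
    return any(best_action(outcome) != baseline for outcome in question_outcomes)

belief = {"plans": ["ship_now", "refactor_first"],
          "utility": {"ship_now": 0.7, "refactor_first": 0.5}}

# Answer A: deadline is hard -> utilities unchanged.
# Answer B: deadline slips -> refactoring first becomes the better plan.
outcomes = [
    {"plans": belief["plans"], "utility": {"ship_now": 0.7, "refactor_first": 0.5}},
    {"plans": belief["plans"], "utility": {"ship_now": 0.4, "refactor_first": 0.8}},
]
assert worth_asking(belief, outcomes)       # an answer can change the plan: ask
assert not worth_asking(belief, [belief, belief])  # no answer changes anything: don't
```

Under this gate, "collecting context" corresponds to questions whose every possible answer leaves `best_action` unchanged.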
Nathan Baschez@nbaschez·
Agents are surprisingly bad at asking questions. They ask a lot of obvious or inconsequential questions when you ask them to grill you about a plan / idea.
1 · 0 · 2 · 478
Alper FERUDUN@AlperTheKing·
@signulll Tool glut turns into routing once agents can choose reliably. The hard part is metadata: schemas, auth scope, latency, rollback behavior, and eval traces for whether a tool actually helped.
0 · 0 · 0 · 29
signüll@signulll·
eventually we’ll have too many tools & not enough ppl to use them.
79 · 25 · 443 · 56.1K
Alper FERUDUN@AlperTheKing·
@FarzaTV Event boundaries matter more than personality here: window focus, file diffs, calendar deltas, and permission scopes need a strict interruption budget or the helpful nudge becomes noise.
0 · 0 · 0 · 24
Farza 🇵🇰🇺🇸
Been working on a new UX for agents. It will understand your workflow as you use your computer and proactively nudge you when it can be helpful. I feel like agents are these super powerful beings trapped in terminals and chat interfaces. Now, just use your voice. Demo:
124 · 46 · 1.4K · 107.7K
Alper FERUDUN@AlperTheKing·
@gdb Remote execution is the unlock: the phone should steer, approve, and inspect while the repo, sandbox, tests, secrets, logs, and CI stay on compute that can be audited.
0 · 0 · 0 · 80
Greg Brockman
you can just build things from your phone, with Codex in the ChatGPT app
233 · 46 · 1.3K · 69.7K
Alper FERUDUN@AlperTheKing·
@pmarca Statewide night temperatures do not move 20 degrees from one facility. The real constraints are interconnect queues, localized waste heat, water use, substation capacity, and who pays for grid upgrades.
0 · 0 · 1 · 221
Alper FERUDUN@AlperTheKing·
NVIDIA's SANA-WM uses Hybrid Linear Attention to turn minute-scale 720p world modeling from an attention-memory problem into a single-GPU rollout budget.

The key lever is long-horizon memory. Frame-wise Gated DeltaNet carries the scene state through time, while softmax attention is reserved for dense interactions that still need full token mixing. That is the difference between a video model that burns context to stay coherent and one that can price a 60-second rollout like an inference job.

The receipts are unusually concrete: 2.6B parameters, roughly 213K public video clips with metric 6-DoF pose supervision, 15 days of training on 64 H100s, 60-second clip generation on one GPU, and a distilled NVFP4 path reported at 34 seconds for 60s 720p denoising on an RTX 5090.

A harder benchmark follows from this: world-model progress should be measured as coherence per GB of state, rather than by prettier frames alone. If that curve moves, single-GPU minute-scale simulation becomes an infra primitive for robotics, embodied AI, and synthetic data.
Alper FERUDUN tweet media
0 · 1 · 1 · 66
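The frame-wise gated memory described above can be sketched as a gated delta-rule update on a fixed-size state matrix. This is an illustrative stand-in under stated assumptions, not SANA-WM's actual Gated DeltaNet code: `gated_delta_step` and its scalar `alpha`/`beta` gates are simplifications of the per-channel gating used in real linear-attention layers.

```python
import numpy as np

def gated_delta_step(S, k, v, q, alpha, beta):
    """One gated delta-rule update on a fixed-size state matrix S.

    S     : (d_v, d_k) recurrent memory, constant size regardless of horizon
    k, q  : (d_k,) key/query for the current frame token
    v     : (d_v,) value for the current frame token
    alpha : scalar in (0, 1], decay gate (how much old state survives)
    beta  : scalar in (0, 1], write strength for the delta correction
    """
    pred = S @ k                                   # what memory predicts for k
    S = alpha * S + beta * np.outer(v - pred, k)   # decay, then delta-rule write
    o = S @ q                                      # read with the current query
    return S, o

# Toy rollout: the state stays (d_v, d_k) no matter how many frames pass,
# which is the "memory cost independent of horizon" property the post leans on.
rng = np.random.default_rng(0)
d_k, d_v = 8, 8
S = np.zeros((d_v, d_k))
for _ in range(60):                                # e.g. 60 frames of a rollout
    k, q = rng.normal(size=d_k), rng.normal(size=d_k)
    v = rng.normal(size=d_v)
    S, o = gated_delta_step(S, k, v, q, alpha=0.9, beta=0.5)
assert S.shape == (d_v, d_k)
```

With `alpha=1` and a unit-norm key, repeatedly writing the same (k, v) pair drives the memory's prediction `S @ k` toward `v`, which is the associative-recall behavior that lets the state carry scene content across frames.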
Alper FERUDUN@AlperTheKing·
@GregKamradt Real-time agents die on latency budgets below 300 ms and irreversible UI moves. Async agents can spend 20 minutes compiling, testing, and retrying, which makes verification part of the product instead of a demo artifact.
0 · 0 · 0 · 104
Greg Kamradt@GregKamradt·
Bullish: async/long-term AI Bearish: real-time, in the moment AI
6 · 3 · 28 · 4.6K
Alper FERUDUN@AlperTheKing·
@gdb Excel won because cells exposed a recalculation graph to non-programmers. Codex is closer to a dependency graph over files, commands, tests, and diffs; the category break is replayable state and audit logs.
0 · 0 · 0 · 81
Alper FERUDUN@AlperTheKing·
@aakashgupta Skill routing is first-stage classification before context expansion. One bad load costs 2 turns: context pollution, then recovery. Negative examples belong where the scorer can see them before expansion.
0 · 0 · 0 · 41
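The "first-stage classification before context expansion" idea above can be sketched as a router that scores only the lightweight descriptions, including the routing-time exclusions, before any skill body is loaded. Everything here is hypothetical: the `Skill` shape, the keyword scorer standing in for a real classifier, and the threshold are all illustrative.

```python
# Hypothetical first-stage skill router: score short descriptions (with
# negative examples) BEFORE loading any skill body into context. The lexical
# scorer is a crude stand-in for a real routing classifier.

from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str                               # short routing text, always visible
    negative: list = field(default_factory=list)   # "do not use for X" exclusions
    body: str = ""                                 # large prompt; loaded only after routing

def score(query: str, skill: Skill) -> float:
    """Word overlap minus a penalty when a negative example matches the query."""
    q = set(query.lower().split())
    pos = len(q & set(skill.description.lower().split()))
    neg = sum(term.lower() in query.lower() for term in skill.negative)
    return pos - 3.0 * neg        # exclusions fire here, before any body loads

def route(query: str, skills):
    best = max(skills, key=lambda s: score(query, s))
    if score(query, best) <= 0:
        return None               # abstaining is cheaper than context pollution
    return best

skills = [
    Skill("code-review", "review a pull request and flag issues with severity",
          negative=["write code", "generate"]),
    Skill("recipes", "suggest recipes from what's in the fridge"),
]
assert route("please review this pull request", skills).name == "code-review"
assert route("generate a pull request description", skills) is None
```

The second assertion shows the "one bad load costs 2 turns" point: because the negative example is scored at routing time, the wrong skill body never enters context and there is nothing to recover from.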
Aakash Gupta@aakashgupta·
7 patterns that hold up across 75 tests of Claude skills:
1. Descriptions under 100 characters stay invisible. "Suggest recipes from what's in fridge" is 37 characters. Most prompts that should have triggered it didn't.
2. Exclusions belong in the description, where they fire at routing time. In the body, an exclusion fires after the wrong skill has already loaded. Every "do not use for X" needs a "use /Y instead."
3. Claude matches the tone of the instructions. "Could you take a look and maybe check" gets you friendly, vague feedback. "Flag every issue with severity. Reference file and line. Do not soften." gets you a code review.
4. A three-column table beats "check the relevant files." Specify Source, Path, and What to extract. That's an instruction Claude can execute.
5. Without an output template, Claude invents a new format every session. Same skill, same prompt, three mornings, three different structures.
6. One worked input/output example beats five rules. A commit message skill with 12 rules was inconsistent across runs. Two examples produced identical structure.
7. Skills over 500 lines drop their bottom half. Safety rules at line 700 of a fitness skill never fired once.
Full audit checklist plus an eval prompt that runs 10 sub-agents against your existing skills is in the deep dive.
Aakash Gupta@aakashgupta

Skills are the new prompts. But how do you write great ones? I ran 75 tests to find out. The result: 7 laws, an audit checklist, and an improvement prompt 🔗: aibyaakash.com/p/claude-skill…

16 · 7 · 59 · 11.6K
Alper FERUDUN@AlperTheKing·
@theo MIE exploit stories are about tooling boundaries. Crash triage plus deterministic repros can turn kernel exploitation into constraint solving when the model sees panic logs, allocator state, and patch history.
0 · 0 · 0 · 3.4K
Alper FERUDUN@AlperTheKing·
Garry Tan's GBrain makes the memory write path the reliability boundary for agent systems, because useful context must survive edits, sync, retrieval, and reuse as state. The repo treats markdown as the source of truth, with Postgres and pgvector underneath the retrieval layer.

The concrete problem is a 7,471-file, 2.3GB markdown wiki that becomes painful when git alone is the operating surface. After sync, a human edit can become queryable agent memory with ownership.

The reusable model is simple: agent memory should have a write path, a system of record, and drift tests. GBrain's CLI and MCP surface expose the same operations, while 30+ MCP tools turn the database into an action surface instead of a passive archive.

Serious AI infrastructure keeps moving toward this shape. Bigger prompts can carry more text for one run, but durable agents need state that can be written, audited, searched, and repaired between runs. Memory becomes production data.
Alper FERUDUN tweet media
4 · 6 · 31 · 7.6K
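The "write path + system of record + drift test" shape can be sketched in a few lines. This is a minimal illustration, not GBrain's implementation: the in-memory `wiki` dict stands in for the markdown files and `index` for the Postgres/pgvector layer, and all paths and helper names are hypothetical.

```python
# Sketch: markdown is the system of record; the index is derived state.
# A drift test catches any edit that bypassed the write path.

import hashlib

def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

wiki = {}    # path -> markdown body (system of record)
index = {}   # path -> {"hash": ..., "text": ...} (derived retrieval layer)

def write(path: str, body: str):
    """Single write path: every edit lands in the record, then syncs."""
    wiki[path] = body
    sync(path)

def sync(path: str):
    index[path] = {"hash": digest(wiki[path]), "text": wiki[path]}

def drift() -> list:
    """Drift test: any divergence between record and index is a bug."""
    return [p for p, row in index.items()
            if p not in wiki or row["hash"] != digest(wiki[p])]

write("agents/memory.md", "Agent memory should be auditable.")
wiki["agents/memory.md"] += " (edit made outside the write path)"
assert drift() == ["agents/memory.md"]   # caught before an agent reuses stale state
sync("agents/memory.md")
assert drift() == []
```

The point of the assertion is the failure mode the post names: a human edit outside the write path is detected as drift rather than silently served to the next agent run.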
Alper FERUDUN@AlperTheKing·
@shadcn Siri lost the default because it treats voice as command parsing. ChatGPT gives children an error-tolerant loop: clarify, rephrase, explain, then recover after a bad premise in one session.
0 · 0 · 0 · 1.5K
shadcn@shadcn·
Apple really fumbled the bag man. Hearing kids saying "can you ask chatgeeeppeedy?" now instead of siri.
132 · 24 · 1.5K · 104.3K
Alper FERUDUN@AlperTheKing·
@petergyang @alexalbert__ Frontier-model PM is closer to compiler PM than SaaS PM: spec evals, red-team budgets, latency targets, refusal policy, and release gates must move together before Opus changes default behavior.
0 · 0 · 0 · 92
Peter Yang@petergyang·
"How do you PM a frontier model like Opus?" That's the question I asked my next guest, @alexalbert__, a research PM at Anthropic working on the next Claude model. We talked about how to: → Prioritize model capabilities → Build "dreaming" into Claude's memory → Train Claude's personality (and whether it'll reach consciousness) 📌 Subscribe to get our full interview tmr: @PeterYangYT?sub_confirmation=1" target="_blank" rel="nofollow noopener">youtube.com/@PeterYangYT?s…
Peter Yang tweet media
9 · 13 · 48 · 8.7K
Alper FERUDUN@AlperTheKing·
@rasbt KV sharing is only one lever. Long context also gets won by grouped-query attention, sliding-window layers, and retrieval-gated cache eviction when the prompt crosses 1M tokens.
0 · 0 · 0 · 171
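Of the levers named in the reply, grouped-query attention is the easiest to show concretely: several query heads read one shared K/V head, so the KV cache shrinks by the group factor. A minimal sketch, assuming single-batch tensors and no masking or RoPE; this is an illustration of the sharing pattern, not any specific model's attention kernel.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, T, d); k, v: (n_kv_heads, T, d).
    Each group of n_q_heads // n_kv_heads query heads reads the same shared
    K/V head, so the KV cache is n_q_heads / n_kv_heads times smaller."""
    n_q_heads, T, d = q.shape
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                            # which shared KV head to read
        scores = q[h] @ k[kv].T / np.sqrt(d)       # (T, T) attention logits
        scores -= scores.max(-1, keepdims=True)    # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum(-1, keepdims=True)
        out[h] = w @ v[kv]                         # mix values from the shared head
    return out

rng = np.random.default_rng(1)
q = rng.normal(size=(8, 4, 16))   # 8 query heads
k = rng.normal(size=(2, 4, 16))   # only 2 KV heads cached: 4x smaller KV cache
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v, n_kv_heads=2)
assert out.shape == (8, 4, 16)
```

Multi-query attention is the `n_kv_heads=1` extreme of the same function; the other levers in the reply (sliding windows, cache eviction) change which positions `k` and `v` retain rather than how heads share them.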
Sebastian Raschka
New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4. I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC. Link: magazine.sebastianraschka.com/p/recent-devel…
Sebastian Raschka tweet media
31 · 299 · 1.7K · 73.4K
Alper FERUDUN@AlperTheKing·
@dwarkesh_sp Model culture starts when conventions survive across runs: shared evals, tool norms, citations, and failure memories. Without institutions, each checkpoint inherits weights but loses etiquette.
0 · 0 · 0 · 136
Dwarkesh Patel@dwarkesh_sp·
Culture was possibly the key breakthrough in human history. Once we could share ideas, learning became much more efficient, and complex technologies became possible. LLMs haven't yet built their own culture or organisations. But it seems plausible they will. On the podcast last year, @karpathy speculated about what it will take for LLMs to build their own culture, and why they're not there yet.
19 · 14 · 190 · 23.2K