Alper FERUDUN
5.3K posts

Alper FERUDUN
@AlperTheKing
Math & CS & Strategy & Geopolitics
Joined June 2025
42 Following · 80 Followers
Pinned Tweet

@mattshumer_ Always-on devboxes are the real mobile unlock: phone for intent and approval, Mac mini for filesystem access, toolchains, caches, secrets, test loops, and persistent state.

@awilkinson @gregisenberg @danshipper 10-15 concurrent agents turns the app into a scheduler: repo, branch, sandbox state, test status, spend, and approval queue. The winning UI will feel closer to a CI cockpit than chat.

The Codex Mac app rocks.
Visually I find it way easier to manage 10-15 tabs than in the Claude Code Mac app or Terminal.
The battle rolls on! I was a hardcore Claude Code user, and when @gregisenberg and @danshipper pushed me to try it, I was skeptical.
Impressed.
A few pieces of feedback that would make Codex sing (CC: @sama and @fidjissimo):
1. Not having the AskQuestionTool available in work mode (only plan mode) is a travesty! Being able to quickly reply, versus getting a wall of 15 text-based questions I have to type answers to, totally takes me out of my flow. (I updated my settings so it always switches to plan mode whenever it needs my input, but many users won't do this.)
2. I can't explain it, but something about the way it updates on its activity, loads, and visually thinks makes it feel slower.
3. Giving the sub-agents names (like human names) is actually distracting. I would prefer to be able to infer what the agent is/does based on its name (Legal Whiz, NextJS Master, etc).
4. If you could solve preference/environment syncing across multiple Macs, that would be incredible. Current Git-based solutions are very hacky and cause all sorts of errors. If I change my settings on my Mac Studio, I'd love it if it synced to my MacBook.
5. It seems weird that it can't control its own integrated browser and use it to click around sites (unless I'm missing something?)
Great work! Super impressed!

@dwarkesh_sp @karpathy Write barriers are the hard part: episodic context, semantic memory, parameter updates, deletion TTLs, and source provenance should not all share the same consolidation path.

Continual learning sometimes gets discussed as if the goal is to dissolve the context/weights distinction: let the model just keep accumulating, fine-tuning itself on the fly.
@karpathy points out, though, that this isn't how humans do it.
Our working memory gets wiped regularly. What we actually have is a consolidation process (sleep) that distills stuff into the brain, in a weird and lossy way.
This is very different from how people sometimes talk about continual learning. It's not obvious it's something you can get for free from doing long enough RL loops.

@nbaschez Expected value is the missing filter: ask only when the answer changes plan, cost, deadline, or failure mode. Most agents collect context instead of reducing uncertainty over the next action.

@signulll Tool glut turns into routing once agents can choose reliably. The hard part is metadata: schemas, auth scope, latency, rollback behavior, and eval traces for whether a tool actually helped.

@FarzaTV Event boundaries matter more than personality here: window focus, file diffs, calendar deltas, and permission scopes need a strict interruption budget or the helpful nudge becomes noise.

@gdb Remote execution is the unlock: the phone should steer, approve, and inspect while the repo, sandbox, tests, secrets, logs, and CI stay on compute that can be audited.

@pmarca Statewide night temperatures do not move 20 degrees from one facility. The real constraints are interconnect queues, localized waste heat, water use, substation capacity, and who pays for grid upgrades.

22,000 likes. Account based in “South Asia”. Curious.
🦢 @ZainabSana2622
At this point it’s obvious the billionaires are trying to kill us. What do you mean the new AI data center in Utah will raise the state’s nightly temperature by 20+ degrees???

NVIDIA's SANA-WM uses Hybrid Linear Attention to turn minute-scale 720p world modeling from an attention-memory problem into a single-GPU rollout budget.
The key lever is long-horizon memory. Frame-wise Gated DeltaNet carries the scene state through time, while softmax attention is reserved for dense interactions that still need full token mixing. That is the difference between a video model that burns context to stay coherent and one that can price a 60-second rollout like an inference job.
The receipts are unusually concrete: 2.6B parameters, roughly 213K public video clips with metric 6-DoF pose supervision, 15 days of training on 64 H100s, 60-second clip generation on one GPU, and a distilled NVFP4 path reported at 34 seconds for 60s 720p denoising on an RTX 5090.
A harder benchmark follows from this: world-model progress should be measured as coherence per GB of state, rather than by prettier frames alone. If that curve moves, single-GPU minute-scale simulation becomes an infra primitive for robotics, embodied AI, and synthetic data.
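The Gated DeltaNet recurrence carrying scene state can be sketched as a tiny numpy toy. All dimensions, the gate, and beta are made-up values; this shows the general delta-rule shape, not NVIDIA's implementation:

```python
import numpy as np

def gated_delta_step(S, k, v, gate, beta):
    """One recurrent update of a (d_k, d_v) state matrix S.

    gate decays the old state; beta scales the delta-rule write of the
    new key/value association. Toy sketch, not NVIDIA's code.
    """
    pred = S.T @ k                                   # what S currently predicts for key k
    return gate * S + beta * np.outer(k, v - pred)   # correct toward the true value

d_k, d_v, T = 8, 8, 16                               # hypothetical sizes
rng = np.random.default_rng(0)
S = np.zeros((d_k, d_v))
for _ in range(T):
    k = rng.normal(size=d_k)
    k /= np.linalg.norm(k)
    v = rng.normal(size=d_v)
    S = gated_delta_step(S, k, v, gate=0.95, beta=0.5)

# Reading is one matmul against the fixed-size S, so per-frame cost is
# O(d_k * d_v) no matter how many frames have been consumed.
q = rng.normal(size=d_k)
out = S.T @ q
print(out.shape)  # → (8,)
```

Softmax attention keeps every past token around; a state matrix like this is the fixed-size memory that lets the horizon grow without the cache growing with it, which is what makes the minute-scale rollout priceable.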


@ycombinator @elyrasystems @FelixOG_ @mandoalan Restaurant voice AI creates value when the job is modeled as constraints. Party size, table topology, turn time, deposits, and no-show risk decide whether a 7:45 slot is profit or chaos.

Elyra (@elyrasystems) is the AI reservation system for restaurants: answering every call and email instantly, and filling tables that used to sit empty.
Top restaurants using Elyra are seeing record occupancy within weeks.
Congrats on the launch, @FelixOG_ & @mandoalan!
ycombinator.com/launches/QNp-e…

@GregKamradt Real-time agents die on latency budgets below 300 ms and irreversible UI moves. Async agents can spend 20 minutes compiling, testing, and retrying, which makes verification part of the product instead of a demo artifact.

@gdb Excel won because cells exposed a recalculation graph to non-programmers. Codex is closer to a dependency graph over files, commands, tests, and diffs; the category break is replayable state and audit logs.

the Codex app is in a category of its own. “agentic excel on mac” is an interesting description.
swyx 🇸🇬 AIE Singapore! @swyx
gotta say Codex is completely unrecognizable from 3 months ago. guys went extreme founder mode on this thing @gabrielchua was demoing this and i was like “you guys have agentic excel on mac”

@aakashgupta Skill routing is first-stage classification before context expansion. One bad load costs 2 turns: context pollution, then recovery. Negative examples belong where the scorer can see them before expansion.

7 patterns that hold up across 75 tests of Claude skills:
1. Descriptions under 100 characters stay invisible. "Suggest recipes from what's in fridge" is 37 characters. Most prompts that should have triggered it didn't.
2. Exclusions belong in the description, where they fire at routing time. In the body, an exclusion fires after the wrong skill has already loaded. Every "do not use for X" needs a "use /Y instead."
3. Claude matches the tone of the instructions. "Could you take a look and maybe check" gets you friendly, vague feedback. "Flag every issue with severity. Reference file and line. Do not soften." gets you a code review.
4. A three-column table beats "check the relevant files." Specify Source, Path, and What to extract. That's an instruction Claude can execute.
5. Without an output template, Claude invents a new format every session. Same skill, same prompt, three mornings, three different structures.
6. One worked input/output example beats five rules. A commit message skill with 12 rules was inconsistent across runs. Two examples produced identical structure.
7. Skills over 500 lines drop their bottom half. Safety rules at line 700 of a fitness skill never fired once.
Full audit checklist plus an eval prompt that runs 10 sub-agents against your existing skills is in the deep dive.
Aakash Gupta @aakashgupta
Skills are the new prompts. But how do you write great ones? I ran 75 tests to find out. The result: 7 laws, an audit checklist, and an improvement prompt 🔗: aibyaakash.com/p/claude-skill…
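Several of the patterns above are mechanically checkable. A hedged sketch of what one pass of such an audit could look like (the function name, the skill-file layout, and the output-template heuristic are assumptions; only the 100-character and 500-line thresholds come from the thread):

```python
def audit_skill(description: str, body_lines: list[str]) -> list[str]:
    """Flag the mechanically checkable failures from the 7 patterns."""
    issues = []
    if len(description) < 100:  # pattern 1: short descriptions miss routing
        issues.append(f"description is {len(description)} chars; routing tends to miss it")
    if "do not use" in description.lower() and "instead" not in description.lower():
        # pattern 2: every exclusion needs a redirect
        issues.append('exclusion lacks a redirect: pair "do not use for X" with "use /Y instead"')
    if len(body_lines) > 500:  # pattern 7: the bottom half gets dropped
        issues.append(f"{len(body_lines)} lines; rules past ~500 tend to be dropped")
    if not any(l.startswith("## Output") or l.startswith("Example") for l in body_lines):
        # patterns 5-6: no template or worked example means format drift
        issues.append("no output template or worked example; format will drift")
    return issues

print(audit_skill("Suggest recipes from what's in fridge", ["Check the fridge."]))
```

The fridge example from the thread trips both the length check and the missing-template check; a padded description with an `## Output` section passes clean.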

@theo MIE exploit stories are about tooling boundaries. Crash triage plus deterministic repros can turn kernel exploitation into constraint solving when the model sees panic logs, allocator state, and patch history.


Garry Tan's GBrain makes the memory write path the reliability boundary for agent systems, because useful context must survive edits, sync, retrieval, and reuse as state.
The repo treats markdown as the source of truth, with Postgres and pgvector underneath the retrieval layer. The concrete problem is a 7,471-file, 2.3GB markdown wiki that becomes painful when git alone is the operating surface. After sync, a human edit can become queryable agent memory with ownership.
The reusable model is simple: agent memory should have a write path, a system of record, and drift tests. GBrain's CLI and MCP surface expose the same operations, while 30+ MCP tools turn the database into an action surface instead of a passive archive.
Serious AI infrastructure keeps moving toward this shape. Bigger prompts can carry more text for one run, but durable agents need state that can be written, audited, searched, and repaired between runs. Memory becomes production data.
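The write path / system of record / drift test triad can be sketched in a few lines. The dict stands in for the Postgres table; the field names and the hashing choice are hypothetical, not GBrain's schema:

```python
import hashlib

store = {}  # stand-in for the Postgres record table

def write_memory(path: str, markdown: str) -> None:
    """Write path: the markdown body plus its digest become the record."""
    store[path] = {
        "body": markdown,
        "digest": hashlib.sha256(markdown.encode()).hexdigest(),
    }

def drifted(path: str, markdown_on_disk: str) -> bool:
    """Drift test: True when the file and the system of record disagree."""
    rec = store.get(path)
    if rec is None:
        return True
    return rec["digest"] != hashlib.sha256(markdown_on_disk.encode()).hexdigest()

write_memory("notes/infra.md", "# Infra\nUse pgvector for retrieval.")
print(drifted("notes/infra.md", "# Infra\nUse pgvector for retrieval."))  # → False
print(drifted("notes/infra.md", "# Infra\nEdited without syncing."))      # → True
```

The point of the shape: an unsynced human edit is detectable instead of silently diverging from what agents retrieve.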


@shadcn Siri lost the default because it treats voice as command parsing. ChatGPT gives children an error-tolerant loop: clarify, rephrase, explain, then recover after a bad premise in one session.

@petergyang @alexalbert__ Frontier-model PM is closer to compiler PM than SaaS PM: spec evals, red-team budgets, latency targets, refusal policy, and release gates must move together before Opus changes default behavior.

"How do you PM a frontier model like Opus?"
That's the question I asked my next guest, @alexalbert__, a research PM at Anthropic working on the next Claude model. We talked about how to:
→ Prioritize model capabilities
→ Build "dreaming" into Claude's memory
→ Train Claude's personality (and whether it'll reach consciousness)
📌 Subscribe to get our full interview tmr: youtube.com/@PeterYangYT?s…


@rasbt KV sharing is only one lever. Long context also gets won by grouped-query attention, sliding-window layers, and retrieval-gated cache eviction when the prompt crosses 1M tokens.

New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4.
I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.
Link: magazine.sebastianraschka.com/p/recent-devel…
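To see why KV sharing and grouped-query attention are such big levers at long context, a back-of-envelope cache-size calculation helps (the model dimensions below are hypothetical, not any shipped model's):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    # K and V each store layers * kv_heads * head_dim values per token (fp16 here)
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical 48-layer model with 128-dim heads at a 1M-token prompt
mha = kv_cache_bytes(48, kv_heads=32, head_dim=128, seq_len=1_000_000)  # full multi-head
gqa = kv_cache_bytes(48, kv_heads=8,  head_dim=128, seq_len=1_000_000)  # 4-way grouped-query
print(f"MHA: {mha / 1e9:.0f} GB, GQA: {gqa / 1e9:.0f} GB")  # → MHA: 786 GB, GQA: 197 GB
```

Sharing KV heads 4-way cuts the cache linearly, which is why the combination of KV sharing, sliding windows, and cache eviction, rather than any single trick, is what makes million-token prompts affordable.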


@dwarkesh_sp Model culture starts when conventions survive across runs: shared evals, tool norms, citations, and failure memories. Without institutions, each checkpoint inherits weights but loses etiquette.

Culture was possibly the key breakthrough in human history.
Once we could share ideas, learning became much more efficient, and complex technologies became possible.
LLMs haven't yet built their own culture or organisations. But it seems plausible they will.
On the podcast last year, @karpathy speculated about what it will take for LLMs to build their own culture, and why they're not there yet.





