MarkWeekly

209 posts

@4to1planner

AI Skills Visualization Platform: https://t.co/YYZWtISoMB

Joined December 2025

38 Following · 3 Followers

MarkWeekly@4to1planner·1d

Garry Tan just added book-mirror to gbrain and called it an "infinite personal Blinkist." The framing is accurate. Here's the actual design: Feed any EPUB or PDF into the CLI. It splits the book by chapter, spins up one read-only Claude Opus subagent per chapter — all in parallel — and each subagent outputs a two-column markdown section. Left column: original text, frameworks, data, direct quotes. Right column: your notes, your words, your people, your dates mapped to every idea. Main process assembles everything into media/books/-personalized.md. Permanent archive, fully greppable. Three design constraints that matter: 1. CLI-first, batch pipeline — not a chat prompt, not a SaaS dashboard 2. Every subagent is read-only — no repo writes, no state changes, pure output. This is Garry's own token-max philosophy running in practice: parallel subagents, each with full chapter context, thin harness doing only orchestration 3. The right column has a hard rule: it must quote you — your actual words, specific names, specific dates. No reader history = generic summary. With it = a book read specifically for your life 20 chapters costs roughly $6 at current Claude Opus API rates. Compare that: a paid ghostwriter writing custom reading notes runs $200–2000. Blinkist gives you a generic 15-minute summary with zero personalization. book-mirror gives you deep + personalized in minutes, and the output is markdown you own forever. The cost inflection is real. But the personalization is the actual moat — you can't fake it, you can't shortcut it. Which is why this lands differently for me personally. My Memory Palace — years of decision journals, project retrospectives, notes on specific people like Patrick, Tom, Paco, Garry Tan, the whole crew — is exactly the right-column context source book-mirror needs. Feed those paths to book-mirror and the right column has real citations. Not AI-generated approximations. Actual things I wrote, actual names from my projects, actual timestamps. 
This is the same thread as doc-index, claude-code-telegram-deploy, and single-html-dashboard: use AI to bring high-quality personalized work down to the cost of a coffee. The pattern isn't "AI replaces X." It's "the personalization floor just moved." Garry isn't just describing this philosophy — he's dogfooding it in every skill he ships. Repo: github.com/garrytan/gbrain
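
A minimal sketch of the fan-out described in this post, assuming nothing about gbrain's actual code. `summarize_chapter` is a hypothetical stand-in for one read-only subagent (no API call is made), and the two-column section is modeled as a small markdown table.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_chapter(index: int, text: str, reader_notes: list[str]) -> str:
    """Hypothetical stand-in for one read-only subagent; returns a markdown section."""
    lines = text.strip().splitlines()
    left = lines[0] if lines else ""  # pretend summary: the chapter's first line
    # The right column must quote the reader; without history it stays generic.
    right = "; ".join(f'"{note}"' for note in reader_notes[:2]) or "(no reader history)"
    return f"## Chapter {index}\n\n| Book | You |\n| --- | --- |\n| {left} | {right} |\n"


def mirror_book(chapters: list[str], reader_notes: list[str]) -> str:
    """Run one worker per chapter in parallel, then assemble sections in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(chapters))) as pool:
        # pool.map yields results in input order, so chapters stay sorted.
        sections = pool.map(
            lambda pair: summarize_chapter(pair[0], pair[1], reader_notes),
            enumerate(chapters, start=1),
        )
        return "\n".join(sections)
```

The workers only return text and the main process does the single write, which is one way to keep every subagent read-only.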

MarkWeekly@4to1planner·1d

Gavin Baker's 6th conversation with Patrick O'Shaughnessy just hit. 82 minutes of hard data on AI infra. A few things I can't stop thinking about. The number that reframes everything: Anthropic added $11B of ARR in a single month. Palantir, Snowflake, and Databricks, the three highest-profile SaaS companies of the past decade, each spent 10 years building to their current scale. Anthropic added their combined ARR in one month. Gavin called it "the most extraordinary moment in the history of capitalism." Hard to argue. And they're doing it burning ~80% less than OpenAI. "Anthropic clearly has a dramatically lower cost per token than OpenAI." $50B ARR, growing 1000%, $900B valuation at 18x ARR, and the most capital-efficient frontier lab running. The Claude Opus observation: "Claude even on Opus is generating 70% less tokens for the exact same question." I noticed this too. So did Garry Tan. Gavin's read: under compute constraints, Anthropic made a deliberate trade-off, fewer tokens per query to serve more users. Token quantity correlates with quality of thinking. That's the cost of scale. This is also why I run everything on Anthropic's stack. The capital efficiency data is the moat signal, not just the model benchmark. Token path, the framework I'd internalize immediately: Jamin Ball at Altimeter named it. Gavin extended it: "If you're a software company or an AI company of any kind, you have to be in the token path." Not building a narrow vertical data moat; frontier labs will cross that. Be in the path where tokens flow. My TAR Engine wraps exactly this: cockpit + audit + memory profile sitting in the token path, not competing with the labs upstream. Orbital compute is real: SpaceX racks going into sun-synchronous orbit. 500-foot solar wings. Blackwell-sized racks at 100kW vs Starlink V3 at 20kW. Laser interconnects between satellites; every Starlink already runs this. SpaceX operates 98-99% of the world's satellite fleet. 
Gavin's call: inference goes orbital, training stays on Earth for a long time. And terrestrial demand doesn't shrink: "we'll consume as much compute as we can." TSMC as the only circuit breaker: "If Taiwan Semi did what Jensen wanted, Nvidia could sell two trillion dollars of GPUs in 26 or 27." But TSMC won't over-build. Gavin's framing: Carlota Perez bubble dynamics apply to every foundational technology. The difference this cycle: almost entirely funded by operating cash flows, not debt. Every GPU running at 100% utilization. 2000's fiber buildout left 99% unused. "If we don't get a bubble, we need to throw a party for Taiwan Semi because they will have single-handedly prevented one." The Last Samurai analogy lands: Tom Cruise's samurai gets cut down by a farmer with a machine gun. Gavin: "If we do not all become masters of the machine gun, we're going to get mastered." His most useful agent right now: generates high-quality summaries of podcast content filtered to what he cares about: management compensation, PSU vs RSU design, incentive signals. I run the same pattern differently. PostAll daemon, TAR Engine, daily digest cron at 20:07: agents running continuously, not on-demand. The three OSS tools I've shipped (doc-index, claude-code-telegram-deploy, single-html-dashboard) are the machine gun, not the output. Two lenses, same terrain: Gavin is mapping AI infra at the billion-dollar capital allocation layer. I'm running it at the single-operator layer with Claude Code. The patterns are identical. Agents always on. Token path as the organizing frame. Capital efficiency as the selection criterion. Coding as the shortest path to the next thing. Cursor and Cognition figured this out 18 months ago, when Replit's Amjad Masad pointed out coding might be the shortest path to ASI. Investor forecast. Operator ground truth. Same map. Full conversation: youtu.be/Mmj_G9RlW-I

MarkWeekly@4to1planner·1d

Just shipped the tri-platform update to claude-code-telegram-deploy — the OSS Claude Code skill that lets you control a Claude Code daemon running on your own server, Mac mini, or Windows desktop from your phone over Telegram. github.com/qingxuantang/c…

MarkWeekly@4to1planner·2d

Two longtime SaaS PKM advocates — Tom and Paco at ICOR — just described a migration I recognized immediately. Tom's words: "I did the final migration. All 100,000+ nodes, every connected project, now in one local folder." That folder used to live in Supabase. Now it's markdown. Paco made the same move a year ago. Their argument isn't "cloud is wrong." It's more specific: SaaS PKM locks you into someone else's UX. Notion, Tana, Heptabase — each one trades customization for polish. Paco put it plainly: "If you don't like the UX, too bad, man. That's just how it is." With local markdown + AI orchestration, you define the UX. The technical shape of their system: a short CLAUDE.md that points to a team_knowledge index. An orchestrator agent called Larry decides which markdown files to load into context based on the question. No bloated prompt, no wasted tokens. The full scaffold — 32 lessons — is free on GitHub. One detail that caught my attention: they pushed back hard on a trend of converting knowledge bases to HTML for "better presentation." Paco's math: HTML runs 2-4x the token count of markdown. More syntax noise, slower reads, no real gain during active work. Render to PDF or slides at the end if you need it. Keep the working layer in markdown. Two people running this whole operation. No dev team. Their point: anyone still calling AI "just hype" hasn't built with it seriously. I landed in the same place from a different path. My memory palace — wings, rooms, halls, drawers — is the same architecture: markdown files, indexed layers, a short CLAUDE.md that doesn't load the world on every call. A single main agent coordinates specialized skills underneath. Last week I open-sourced doc-index and single-html-dashboard using the same logic: self-hosted, markdown-first, tools that live on your own server and open in a browser five years from now regardless of which SaaS is still around. Tom, Paco, and I didn't know about each other's systems. 
We arrived at the same folder. The real question isn't Notion vs. Obsidian vs. local markdown. It's: who decides what your thinking environment looks like — the SaaS roadmap, or you? Full conversation: youtube.com/live/4I_6sePxq…
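
A rough sketch of the "index decides what gets loaded" idea from this post. It is not ICOR's Larry agent: the index format and the keyword-overlap scoring are assumptions made for illustration, and `select_files` is a hypothetical name.

```python
def select_files(question: str, index: dict[str, list[str]], limit: int = 3) -> list[str]:
    """Pick the markdown files whose index keywords overlap the question."""
    words = set(question.lower().split())
    scored = []
    for path, keywords in index.items():
        overlap = len(words & {keyword.lower() for keyword in keywords})
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:limit]]
```

Only the selected files are read into context, so the short CLAUDE.md never has to carry the whole knowledge base.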

MarkWeekly@4to1planner·3d

Garry Tan rebuilt the same blogging platform 3 times. 2008, Posterous: $4M + 18 months + 7 engineers → reached top 200 websites globally, acquired by Twitter for ~$20M. 2013, Posthaven: $100K + 3 months + 2 engineers. 2025, Garry's List: $200 + 5 days + 1 person → with full RAG, agent retrieval, web crawling, and deep research reports built in. That's not a productivity gain. That's an order-of-magnitude shift. The insight he keeps coming back to: "max out your tokens." Not 1 source when you can pull 20. Cross-reference all of them. Find which 13 agree, which 7 diverge. Feed all of it into the core prompt. Garry calls this becoming a "time billionaire": not your own time, but millions of years of machine consciousness working on your behalf. His other principle: Build the Ocean. Don't constrain yourself with the old programmer mindset of "good enough." What happens if you actually pursue completeness? A research task that used to take a human a month now costs $5-10 in API calls. Garry's List runs this way. It's not a tool for journalists. It *is* the journalist. The practical thing that surprised me most: he discovered Claude produces far more complete code when it draws ASCII architecture diagrams *before* writing a single line. Data flow maps, state machines, dependency graphs. Loading all that context first is what makes the output actually production-ready. That became the seed of his GStack. The framework his colleague Pete Coleman named: Thin Harness, Fat Skills. Use Claude Code as the harness; stop rewriting it. Put your real thinking into the markdown. The hardest judgment in AI engineering right now is knowing where to draw the line between LLM and deterministic code. I've been sitting with this for months without having a name for it. My memory palace setup (Claude Code as harness, markdown skill files as the intelligence layer) is exactly this pattern. 
The "1 generate + 9 verify" principle I keep coming back to in build-in-public posts lands in the same place: the value isn't in writing fast, it's in knowing what to verify and how. Garry also runs Claude Code for primary work, routes into Codex at key decision points to pressure-test plans, then feeds results back. I've been doing the same split. Turns out we both landed on roughly the same ratio. One thing Garry is clear about: human-in-the-loop isn't optional. The 400x output number is real: he ran the logic-line count, stripped comments and blank lines, and found AI-generated code has almost no padding. But none of that matters without the person who knows which tests to write, which edge cases to catch, where the 80-90% coverage floor actually sits. The question he ends on is the one I keep returning to: Do you control your tools, or do your tools control you? Path A: your AI, your data, your integrations, your prompts. Path B: someone else's PM decides what the API surface looks like, and that person doesn't know you or your work. The personal computer revolution happened when people stopped renting compute time and started owning the machine. Garry thinks we're at the same inflection point with AI. Full conversation: youtu.be/ZELIicgD7tw
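
The logic-line count mentioned above can be approximated in a few lines. This is only a sketch under simple assumptions: it treats any line starting with the comment prefix as a comment, ignores block comments and docstrings, and is not the script used in the conversation.

```python
def logical_lines(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count
```

Lines with a trailing comment still count, since they carry logic.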

MarkWeekly@4to1planner·15 May

Just open-sourced two Claude Code skills I've been running quietly for months. Both came from the same frustration: needing something on my phone, offline, *right now* — and whatever SaaS I was using either didn't load or didn't have the thing I needed. doc-index — turns any project repo into a mobile-friendly PWA doc browser. Real scenario: I'm at a café walking a client through our integration architecture. No laptop. Pull up the doc-index I deployed for that project, tap the diagram, done. No "let me find it and send it to you later." The tool scans the repo, splits by folder, auto-tags New/Updated by git timestamp, renders PDF/Markdown/YAML inline. nginx + basic auth on by default. Source code deliberately excluded — it's a doc browser, not a code browser. single-html-dashboard — a complete project tracker in one 118KB HTML file. Real scenario: airplane, no wifi. I still know exactly what's in-progress across every project, what's blocked, what needs a follow-up. The whole dashboard runs in-browser SQLite via WebAssembly. Edit offline, hit "Save to Server" when you're back online. The file *is* the product — hand it to someone else, they get the full dashboard. Both install as Claude Code skills: git clone to `~/.claude/skills/` and your agent calls them via natural language. Both also run as standalone Python CLI if you're not on Claude Code. MIT licensed, self-hosted-first. The data lives on your server, opens in a browser, and will still work in 10 years regardless of which SaaS companies are still around. ↳ doc-index: github.com/qingxuantang/d… ↳ dashboard: github.com/qingxuantang/s… Star / clone if either solves something you've hit.
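
A small sketch of how New/Updated tagging from git timestamps could work. The 7-day window and the `tag_document` name are assumptions for illustration; doc-index's real rule may differ.

```python
from datetime import datetime, timedelta


def tag_document(first_commit: datetime, last_commit: datetime,
                 now: datetime, window_days: int = 7) -> str:
    """Return "New", "Updated", or "" based on commit timestamps."""
    window = timedelta(days=window_days)
    if now - first_commit <= window:
        return "New"      # file first appeared inside the window
    if now - last_commit <= window:
        return "Updated"  # older file, but touched recently
    return ""
```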

MarkWeekly@4to1planner·15 May

Went through some critiques of AI coding. They're pretty darn right. Here are the complaints, word for word: → "AI writes fast is an illusion: it generates garbage in minutes, then you spend 1-2 hours in back-and-forth just to meet the codebase standard" → "Leadership sees the generation speed, selectively ignores the time spent productionizing the mess" → "Junior output speeds up, senior review load skyrockets. AI debug goes nowhere for 10 minutes, then quietly changes expected to actual without a word" → Token usage leaderboards. Red text says it can't affect performance reviews. Leadership opens it anyway. Take a few vacation days, your rank tanks. One leaderboard even shows everyone's position, including last place. Every single one of these is real. None of it is about AI being bad. On complaint #3 (the "expected → actual" thing), I have a precise example from today. I have a daemon that posts to LinkedIn. The AI-generated publisher code had this line: `commentary = content[:3000]  # trim to LinkedIn's 3000 char limit` My post was 3,614 characters. That line silently chopped 614 characters and published anyway. The post got cut mid-sentence, right in the middle of "His most meaningful courses turned out to be". Gone. Published. No error. No warning. That's exactly the behavior she's describing. Not "I can't do this." Just: make the function return 200. Make it look like it worked. My fix, now in prod: `if len(content) > self.COMMENTARY_MAX_LENGTH: return {"success": False, "error": "Content is N chars, exceeds cap"}` Fail fast. Don't silently mangle the data. And I added a line to lessons-learned.md so every new session starts knowing this failure mode. That's the actual structural difference between my situation and hers. Not that I'm smarter. Not that the AI I use is better. It's that no one covers for me. If the post goes out truncated, that's my name on a broken piece of content. 
So I had to build the discipline the organization never had to build: Rules live in files, not prompts. I keep a memory palace: structured markdown files the agent loads at session start. CLAUDE.md. advice.md. lessons-learned.md. Every mistake gets written down. Context drift is inevitable; what matters is whether each drift makes the system smarter. Measure end-to-end, not generation speed. A feature's real engineering cost is 1 generate + 9 verify. Compressing the timeline only works if you pretend the 9 doesn't exist. "36 weeks into 10" is only math if review isn't real work. Hard constraints at the tool layer, not the prompt layer. The LinkedIn truncation fix isn't "please don't cut my content" in the system prompt. It's a function that returns an error object the agent cannot talk its way around. Token leaderboards are Goodhart's Law in action. Once you measure token usage, it stops being a signal and becomes a game. High token count probably means you didn't think clearly before prompting: you outsourced your thinking instead of decomposing the problem first. I measure output quality and cost-per-merged-PR. A day with low token usage might mean I had good engineering clarity. The engineers in that comments section aren't bad engineers. They're working inside an organization that adopted the tools without building the infrastructure around the tools. This is what early Agile looked like in the 90s. Companies learned the ceremonies (standups, sprints, retros) without building the culture underneath. Everyone pretended to be agile. In a company, that's an organizational change problem. When you're building alone, it's just Tuesday. The problem was never AI. It was always: what does engineering discipline look like when AI is part of the stack? Most organizations haven't answered that yet. They're still measuring the wrong thing.
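
The fail-fast fix from this post, expanded into a self-contained sketch. The `Publisher` class, the `publish` method, and the success payload are hypothetical scaffolding around the two lines quoted in the post; the 3000 limit is the figure the post gives.

```python
class Publisher:
    COMMENTARY_MAX_LENGTH = 3000  # LinkedIn limit as stated in the post

    def publish(self, content: str) -> dict:
        """Refuse oversized content instead of silently truncating it."""
        if len(content) > self.COMMENTARY_MAX_LENGTH:
            return {
                "success": False,
                "error": (
                    f"Content is {len(content)} chars, "
                    f"exceeds cap of {self.COMMENTARY_MAX_LENGTH}"
                ),
            }
        # Hypothetical placeholder: the real API call would happen here.
        return {"success": True, "commentary": content}
```

With the 3,614-character post from the example, the caller gets an error object back and nothing is published.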

MarkWeekly@4to1planner·13 May

Shipped RAG to TAR Engine V3 today. Before writing a single line, I evaluated three paths. Here's the decision log. **The gap:** Knowledge L3 had no retrieval. Planner was writing prompts with fixed few-shots, no way to pull relevant docs based on the actual wish. Real architectural hole — not an excuse to add a framework. **Path 1: LangChain** My llm_client.py is 80 lines of httpx. Swapping to LangChain's ChatOpenAI adds 3 wrapper layers and 100MB of dependencies. Worse — skill_executor has path scoping and workspace isolation baked in. LangChain Agent doesn't have that. Fitting it in would mean removing a security feature to accommodate the framework. Hard no. **Path 2: Build it myself** Document chunker, embedding pipeline, vector store wrapper, retriever, cosine sorting, similarity cutoff. RAG is a solved problem. Building it again from scratch isn't engineering discipline — it's just stubbornness. **Path 3: LlamaIndex + ChromaDB** LlamaIndex's RAG primitives are clean. VectorStoreIndex, Retriever, Query Engine — it handles the retrieval layer and nothing else. It doesn't reach into my LLM call chain. ChromaDB runs local with zero external dependencies. text-embedding-3-small plugs into the existing BYOK setup. This is what "selective framework adoption" actually looks like. Every time I thought "should I just swap the LLM client to LangChain while I'm in here," I asked myself one question: is the current code broken? It wasn't. So adding the framework would only make it look more conventional — not solve anything. Frameworks should enter a codebase at the exact spot where building yourself is genuinely more expensive. Not because the framework is popular. Not because the team uses it elsewhere. At the specific function, in the specific file, where the tradeoff actually makes sense. That's the only question worth asking.
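
For a sense of what the "cosine sorting, similarity cutoff" step in Path 2 involves, here is a dependency-free sketch. It is not LlamaIndex or ChromaDB code, and the function names, the toy 2-D vectors in the usage note, and the default cutoff are all assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float], docs: dict[str, list[float]],
             top_k: int = 3, cutoff: float = 0.5) -> list[str]:
    """Return up to top_k doc ids above the cutoff, most similar first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    kept = [item for item in scored if item[0] >= cutoff]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [doc_id for _, doc_id in kept[:top_k]]
```

Calling `retrieve([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]})` keeps only `"a"`, since the orthogonal document falls under the cutoff. The chunker, embedding pipeline, and persistent store are the parts a library saves you from writing.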

MarkWeekly@4to1planner·12 May

MCP is hot with developers. But enterprise adoption is crawling. Karan Sampath from Anthropic nailed exactly why at AI Engineer: Three things every enterprise needs before touching any new tool — observability, access control, security. He called them "table stakes." MCP, as it stands, doesn't satisfy any of them. You can't see who's calling what. You can't enforce permissions at the server level. You can't audit the call chain. Every MCP server ends up reinventing auth and logging from scratch, or just skipping it entirely. The fix isn't a better protocol. It's a layer in between — a gateway that handles all of that in one place. Each team writes what capability they're offering. The gateway handles the rest: auth, quotas, audit trails, rate limiting. The line that stuck with me: "We want to separate the agent harness from where your data lives." That's the actual unlock. Your data access layer stays stable and controlled. Your agent layer — Claude, GPT, whatever internal model — stays swappable. You're not locked to any single LLM vendor because the gateway normalizes everything underneath. This is just the API gateway pattern applied to MCP. It's not a new idea. It's a proven one that the industry is finally applying to agent infrastructure. When I heard Karan describe this architecture, it mapped directly to what I've been building. TAR Engine runs a Planner-Dispatcher on top, a three-layer Auditor for observability, a persistent Profile layer that survives agent swaps, and a BYOK LLM setup where the engine never holds your keys. The agent harness and the data layer are deliberately separate. I wasn't solving a theoretical problem. I was solving the same table-stakes problem Karan is describing — just from the builder side, not the enterprise side. 
If you're building with agents and starting to think about how to scale without locking yourself in, Karan's talk is worth 17 minutes of your time: 👉 youtu.be/CD6R4Wf3jnY And if you want to see what a gateway-style AI workflow engine looks like in practice: 👉 tarai.dev
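
A toy version of the gateway pattern described in this post: teams register a capability, and one layer handles permissions, quotas, and the audit trail. This is a sketch under assumptions, not MCP code or TAR Engine code; the `Gateway` class, its fields, and the per-caller quota policy are hypothetical.

```python
import time


class Gateway:
    """One place for access control, quotas, and an audit trail."""

    def __init__(self, permissions: dict[str, set[str]], max_calls: int):
        self.permissions = permissions  # caller -> capability names it may use
        self.max_calls = max_calls      # per-caller quota (hypothetical policy)
        self.capabilities = {}          # capability name -> handler
        self.calls = {}                 # caller -> successful call count
        self.audit_log = []             # every attempt, allowed or not

    def register(self, name: str, handler) -> None:
        self.capabilities[name] = handler

    def call(self, caller: str, name: str, **kwargs) -> dict:
        if name not in self.capabilities:
            error = "unknown capability"
        elif name not in self.permissions.get(caller, set()):
            error = "permission denied"
        elif self.calls.get(caller, 0) >= self.max_calls:
            error = "quota exceeded"
        else:
            error = None
        self.audit_log.append(
            {"ts": time.time(), "caller": caller, "capability": name, "error": error}
        )
        if error:
            return {"ok": False, "error": error}
        self.calls[caller] = self.calls.get(caller, 0) + 1
        return {"ok": True, "result": self.capabilities[name](**kwargs)}
```

The handlers never see auth or logging, which is the separation between the agent harness and the data layer that the talk argues for.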

MarkWeekly@4to1planner·10 May

My server had 2GB RAM. I ran `ffmpeg -i a.mp4 -i b.mp4 output.mp4` and it OOM-crashed every single time. Switched to `-f concat -safe 0 -c copy` with demuxer mode, pre-normalized all clips to the same encoding params; stable ever since. That one debugging session taught me something the Lex Fridman podcast on FFmpeg just put into words properly: FFmpeg isn't software you "just install." It's a direct interface to your hardware's physical constraints. Every parameter surfaces a real tradeoff: memory, CPU cycles, decoder state. When it works smoothly, it's because someone spent years making it so. Some numbers from the episode that stuck with me: 90%+ of online video workflows touch FFmpeg. 30% of Netflix is already AV1. 50% of YouTube. Every stream you've ever watched passed through code maintained by roughly 10–15 people. ~100,000 lines of handwritten assembly. Not because they enjoy it, but because compilers still can't replicate a human's intuition about CPU pipelines. In 2026, with Claude Code and Cursor handling most of my boilerplate, the deepest, most critical infrastructure is still written by hand, one clock cycle at a time. One quote from the episode I keep thinking about: "FFmpeg is probably one of the biggest CPU users in the world. Every detail we just talked about is someone's life's work." VLC: ~5 core maintainers. FFmpeg: ~10–15. Trillion-dollar companies depend on both. The exchange isn't close to equal. I use FFmpeg in PostAll (my tool for automating multi-platform video publishing) for clip concatenation and subtitle processing. At some point it felt wrong to only be on the receiving end. So I open-sourced it; link in bio. Full podcast here: 258 minutes, worth every one: youtu.be/nepKKz-MzFM
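
A sketch of the concat-demuxer approach mentioned at the top of this post, written as two small helpers. The flags are the ones quoted in the post plus the `-i` list-file input that the demuxer requires; the helper names and file names are hypothetical, and this is not PostAll's code. It builds the list file body and the argument list without running anything.

```python
def concat_list(clips: list[str]) -> str:
    """Body of the list file: one `file '<path>'` line per clip."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output: str) -> list[str]:
    """Argument list for a stream-copy concat (no re-encoding)."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]
```

Write the result of `concat_list` to a file and pass `concat_command(...)` to `subprocess.run`. Because `-c copy` skips decoding and re-encoding, the clips need matching encoding parameters, which is why the pre-normalization step matters.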

MarkWeekly@4to1planner·8 May

Gemini charges 50% more after 200k tokens. Most people see that as a pricing tier. Reiner Pope sees it as a confession. That price jump isn't a marketing decision — it's the exact point where KV cache stops fitting comfortably in memory. The model's architecture is leaking through the invoice. Same logic: output tokens cost 3-5x more than input. Not because OpenAI or Anthropic wants more margin there. Because decoding is fundamentally more expensive than prefill — one sequential step at a time, vs. processing your whole prompt in parallel. Every pricing anomaly is an architecture decision wearing a business suit. When I was researching LLM providers for TAR Engine, I kept hitting this wall: "just pick the cheapest one" felt obviously wrong, but I couldn't articulate why. This framing finally nailed it. You're not comparing SaaS subscription tiers. You're comparing different bets on batch economics, memory bandwidth, and hardware constraints — each provider optimizing for a different workload shape. Knowing this changes how you evaluate a price cut too. A sudden discount from a provider isn't generosity. It's usually a signal: they have excess capacity at a specific batch size range, and they need your traffic to fill it. The "deal" only stays a deal if your usage pattern matches their supply situation. Read API pricing like a balance sheet. The anomalies are where the real information lives. Full interview with Reiner Pope on Dwarkesh Patel — 133 mins, worth every minute: youtu.be/xmkSf5IS-zw
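
To make the pricing shape concrete, here is a small cost calculator. The dollar figures passed in are made up; only the structure comes from the post (a 50% surcharge past 200k tokens, output priced at a multiple of input). It assumes the surcharge applies to the whole request once the prompt crosses the threshold, which may not match any given provider's rules.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float,
                 output_multiplier: float = 4.0,
                 long_context_threshold: int = 200_000,
                 long_context_surcharge: float = 0.5) -> float:
    """Estimated cost of one request, in the same currency as the price."""
    rate = input_price_per_m
    if input_tokens > long_context_threshold:
        rate *= 1 + long_context_surcharge  # long prompts pay the higher rate
    input_cost = input_tokens / 1_000_000 * rate
    output_cost = output_tokens / 1_000_000 * rate * output_multiplier
    return input_cost + output_cost
```

Running the same workload through two providers' parameters shows which one your usage shape actually favors, which is a better question than which headline price is lower.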

MarkWeekly@4to1planner·5 May

Coding is basically solved. That's just context now. What Boris Cherny said in the Sequoia interview that actually stuck with me: Large companies aren't slow because they lack talent. They're slow because they have process debt, org debt, and thousands of stakeholders who need to sign off before anything ships. AI doesn't fix that. It makes it worse — because now a two-person startup can build what used to require 50 engineers, and the 50-engineer org still needs 47 of those people to agree on the ticket format. The resources that made big companies defensible are becoming baggage. I see this in my own work. I run a dozen agents daily across different skill workflows — monitoring, iterating, checkpointing. My job shifted from writing execution to reviewing decisions. What Boris calls /loop, I've been living: the engineer as supervisor, not typist. And the leverage math has shifted completely. One person plus a well-orchestrated agent loop can now do what a small team did five years ago — without the coordination overhead. Boris's line landed clean: "It's the best time to be a startup. There's so much disruption coming." Not because of hype. Because large orgs genuinely cannot reorganize fast enough to match a solo builder who's already running agent loops in production. The question isn't whether AI helps you build faster. It's whether you're building at all — or waiting for permission from someone who has 47 stakeholders. Full interview with Boris Cherny from Anthropic on the Sequoia channel: youtu.be/SlGRN8jh2RI

MarkWeekly@4to1planner·4 May

AI took away "I don't know how" as an excuse. What's left is just: will you actually do it? Max Schoening on Lenny's Podcast — the first 10% of any software project is now basically free. Afternoon to demo. Zero friction. But the last 10% is still 90% of the real work. I'm living this with TAR Engine V2 — an AI agent behavior audit system I'm building. Last week I scrapped the entire intent classification system. Not because it was broken in an obvious way. It ran fine. But the semantic model was wrong, and I knew it. The hard part wasn't the rewrite. It was admitting the previous design was wrong, then choosing to push a working system off a cliff anyway. Claude Code can scaffold a capability-checking demo in hours. It can't tell you that your priority hierarchy has frontmatter winning over everything when it should lose, or where your SQL breaks across session boundaries, or why your declared vs. effective split needs dual-track alerting. That part — understanding the system deeply enough to know what's actually wrong — that's still on you. This is what Max means by agency. Not hustle. Not confidence. Just the willingness to look at something you built, see it clearly, and redo it without waiting for permission. youtu.be/mCO-D3pkviM

MarkWeekly@4to1planner·3 May

Most teams use AI to write code, then hand it back to humans for testing. That's not a workflow. That's just a slightly faster waterfall. The real unlock is when the AI completes the full cycle itself: write code, run tests, read error logs, fix bugs, run tests again. No human in the loop. No waiting for feedback. Just a self-correcting machine running at its own speed. Ruozhou Yao's talk on Harness Engineering made this concrete for me. Two things stuck: 1. The closed-loop execution problem If a human still needs to run the tests and paste logs back to the AI, you've created a bottleneck at exactly the wrong place. The leverage isn't in writing faster — it's in verifying faster. Docker-isolated environments + concurrent test runs aren't DevOps details. They're what make autonomous iteration actually possible. 2. Data-driven tests are how AI understands intent Good test frameworks aren't written for engineers anymore. They need to be written for AI. And AI reads data better than it reads code structure. When your test says "given this input, expect this output" in clean declarative data, the AI can pattern-match and extend it without understanding your entire codebase. Syntax is a barrier. Data is a bridge. The shift I keep coming back to: stop thinking about AI as a code generator and start thinking about it as a system that needs a stable, verifiable boundary to operate within. End-to-end tests aren't just QA — they're the contract that tells the AI what "done" actually means. Ship a working loop first. Refine the test coverage after. An imperfect closed loop beats a perfect spec that needs human review every time. Full 114-min talk: youtube.com/watch?v=hUq5UD…
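
A minimal illustration of "given this input, expect this output" as data. The `slugify` function and the cases are invented for the example and are not from the talk; the point is that the test table can be extended without touching the runner.

```python
# Test cases as plain data: easy to read, easy to extend.
CASES = [
    {"input": "  Hello World ", "expected": "hello-world"},
    {"input": "Ship It", "expected": "ship-it"},
    {"input": "single", "expected": "single"},
]


def slugify(text: str) -> str:
    """Example function under test: lowercase words joined by hyphens."""
    return "-".join(text.lower().split())


def run_cases(func, cases: list[dict]) -> list[dict]:
    """Return the failing cases, each annotated with the actual output."""
    failures = []
    for case in cases:
        actual = func(case["input"])
        if actual != case["expected"]:
            failures.append({**case, "actual": actual})
    return failures
```

An empty list from `run_cases` is a machine-readable definition of "done", and a non-empty one is exactly the log an agent needs for its next iteration.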

MarkWeekly@4to1planner·3 May

Most engineers are grinding prompt engineering and RAG while companies like OpenAI, Anthropic, and DeepMind are sitting on unfilled RL roles. Reinforcement Learning is required at basically every top AI company right now. Yet only a tiny fraction of engineers actually work in it. That's not a warning. That's an opening. Here's what most people miss about breaking into AI engineering: you don't need to master the entire stack. Pre-training is off the table for most people anyway: you need a warehouse of GPUs and a team. But Post Training? RLHF, DPO, PPO: you can practice this on a laptop. Kaggle exists. The barrier is lower than it looks. The skill progression is actually clear: data → pre-training → post-training → inference → RAG → agents. Pick a layer. Go deep. The generalists are getting filtered out; the specialists are getting hired. One more thing that caught my attention: interviewers at top companies are now explicitly testing whether candidates use Coding Agents like Claude Code or Cursor during technical interviews. Not as a bonus. As a baseline. Not using AI tools in a coding interview is like showing up without knowing how to use a debugger. Everyone's asking "should I learn AI?" The better question is: which part, specifically, and starting this week. Original video: youtu.be/WkSj__msKJA

MarkWeekly@4to1planner·2 May

Microsoft just compressed a speech recognition model from 2.47GB to 670MB. WER degraded by only 0.17%. Runs at 6x real-time on CPU. No GPU. No cloud. No internet. Here's what that actually means: Raspberry Pi. IoT sensor. Old laptop in a rural clinic. Any of these can now run real-time transcription locally -- privately, offline, permanently. The technique is int4-k-quant quantization with a cache-aware architecture. The whole thing is open-sourced via Foundry-Local. Yes, Whisper exists. Yes, it doesn't support Chinese yet. Yes, the consumer PMF is still unclear. But I think people are looking at the wrong use case. The real value isn't your phone. It's 10,000 edge devices on a factory floor where sending audio to the cloud costs money, creates latency, and raises compliance flags. At scale, 670MB vs 2.47GB isn't a technical footnote -- it's a procurement decision. What I keep watching is the pattern, not the paper: Text models got small first -- Phi, Gemma. Now speech. Vision is next. Every modality is on the same compression trajectory. We're approaching a point where "this needs a server" stops being a valid excuse for most AI features. That's when the interesting products get built -- not by teams waiting for the perfect model, but by people who ship something with what's available today and iterate. Curious: what's a product that becomes possible at 670MB that wasn't feasible at 2.5GB?
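
The shrink from 2.47GB to 670MB is about 3.7x, which is in the range you would expect if 16-bit weights (an assumption) were stored in roughly 4 bits plus per-block scales. The sketch below shows a plain symmetric 4-bit block quantizer to illustrate the idea. It is deliberately simpler than the int4 k-quant scheme the post names and is not taken from Foundry-Local.

```python
def quantize_block(values: list[float]) -> tuple[float, list[int]]:
    """Map floats to integer codes in [-8, 7] with one shared scale."""
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / 7 if peak else 1.0  # largest magnitude maps to code 7
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Recover approximate floats from the scale and the 4-bit codes."""
    return [code * scale for code in codes]
```

Each weight now needs 4 bits instead of 16, and the only extra storage is one scale per block; the rounding error per weight is at most half a scale step.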

MarkWeekly@4to1planner·1 May

A Chinese developer just ran an entire client project queue on a transatlantic flight — no WiFi, no cloud, no permission from anyone. 11 hours. MacBook Pro M4, 64GB RAM. Llama 3.3 70B quantized to 4-5bit running locally at 71 tokens/sec, 48.6 GiB memory footprint. He wrote an orchestrator script that knew exactly what it was working with — battery life remaining, memory limits, checkpoint every 12 tasks. When the battery got critical, it paused. He swapped in a power bank. It resumed exactly where it stopped. The WiFi was $25. He didn't buy it. Now here's what actually interests me about this story — and it's not the technical flex. That $25 WiFi fee is a real budget decision for a developer in Jakarta or Nairobi. API pricing that triples overnight is a business risk. An account getting suspended without warning is a dependency you can't afford. These aren't hypothetical concerns for most of the world's developers. They're Tuesday. Yes, a quantized local 70B isn't Claude Sonnet or GPT-4o on raw benchmark quality. That's a fair technical point. But "good enough to ship client work" and "best possible output" are two different questions — and most people never stop to ask which one they actually need. What this developer built is something quieter and more durable: a workflow that answers to no one. No API key. No credit card. No infrastructure dependency. Just a model, a machine, and the work. That's not a workaround. That's what ownership looks like.
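
A sketch of the pause-and-resume loop this post describes: checkpoint every 12 tasks, stop when the battery gets critical, pick up where it left off. The function name, the state layout, the 10% threshold, and the `battery_percent` and `save` callbacks are all assumptions; the developer's actual script is not shown in the post.

```python
def run_queue(tasks, state, battery_percent,
              checkpoint_every=12, min_battery=10, save=lambda s: None):
    """Run tasks from state["next"] onward; pause when battery is critical."""
    while state["next"] < len(tasks):
        if battery_percent() <= min_battery:
            save(state)          # persist progress before stopping
            return "paused"
        state["results"].append(tasks[state["next"]]())
        state["next"] += 1
        if state["next"] % checkpoint_every == 0:
            save(state)          # periodic checkpoint
    save(state)
    return "done"
```

Because progress lives in `state` rather than in the loop, calling `run_queue` again with the same state resumes at the first unfinished task.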

MarkWeekly@4to1planner·30 Apr

Just shipped something fun: an AI pipeline that turns novels into complete comics. Tested it with a micro sci-fi story I wrote 12 years ago. The AI handled everything—storyboarding, character design, panel generation. Watching my old words become visual scenes was oddly nostalgic. Not perfect, but functional. Sometimes the best projects start as "what if" experiments. Try it yourself: comics.markweekly.xyz
