

Tim✨
@timyangnet
Co-Founder Westar Labs | 🛠️ $STC & AI Explorer | Ex-Chief Architect Weibo (NASDAQ:WB) What we hear is opinion; what we see is perspective. When this exists, that exists; when this arises, that arises.



🤯BREAKING: Alibaba just proved that AI coding isn't taking your job; it's just writing the legacy code that will keep you employed fixing it for the next decade. 🤣

Passing a coding test once is easy. Maintaining that code for 8 months without it exploding? Apparently nearly impossible for AI.

Alibaba tested 18 AI agents on 100 real codebases over 233-day cycles. They didn't just look for "quick fixes"; they looked for long-term survival. The results were a bloodbath:

- 75% of models broke previously working code during maintenance.
- Only Claude Opus 4.5/4.6 maintained a >50% zero-regression rate.
- Every other model accumulated technical debt that compounded until the codebase collapsed.

We've been using "snapshot" benchmarks like HumanEval that only ask "Does it work right now?" The new SWE-CI benchmark asks: "Does it still work after 8 months of evolution?"

Most AI agents are "quick-fix artists." They write brittle code that passes tests today but becomes a maintenance nightmare tomorrow. They aren't building software; they're building a house of cards.

The narrative just got honest: most models can write code. Almost none can maintain it.


Claude is braindead. All of these models are. I wanted to schedule a skill every day at 8:00 am. Claude decided to schedule it at 7:57 am "to avoid the on-the-dot surge." I SAID 8:00 AM! Do the darn thing the way I asked you to do it! You gotta be crazy to trust these models.

My conversation with Marc Andreessen (@pmarca), co-founder of @a16z and Netscape.

0:00 Caffeine Heart Scare
0:56 Zero Introspection Mindset
3:24 Psychedelics and Founders
4:54 Motivation Beyond Happiness
7:18 Tech as Progress Engine
10:27 Founders Versus Managers
20:01 HP Intel Founder Legacy
21:32 Why Start the Firm
24:14 Venture Barbell Theory
28:57 JP Morgan Boutique Banking
30:02 Religion Split Wall Street
30:41 Barbell of Banking
31:42 Allen & Company Model
33:16 Planning the VC Firm
33:45 CAA Playbook Lessons
36:49 First Principles vs. Status Quo
39:03 Scaling Venture Capital
40:37 Private Equity and Mad Men
42:52 Valley Shifts to Full Stack
45:59 Meeting Jim Clark
48:53 Founder vs. Manager at SGI
54:20 Recruiting Dinner Story
56:58 Starting the Next Company
57:57 Nintendo Online Gamble
58:33 Building Mosaic Browser
59:45 NSFnet Commercial Ban
1:01:28 Eternal September Shift
1:03:11 Spam and Web Controversy
1:04:49 Mosaic Tech Support Flood
1:07:49 Netscape Business Model
1:09:05 Early Internet Skepticism
1:11:15 Moral Panic Pattern
1:13:08 Bicycle Face Story
1:14:48 Music Panic Examples
1:18:12 Lessons from Jim Clark
1:19:36 Clark Versus Barksdale
1:21:22 Tesla Versus Edison
1:23:00 Edison Digression Setup
1:23:13 AI Forecasting Myths
1:23:43 Edison Phonograph Lesson
1:25:11 Netscape Two Jims
1:29:11 Bottling Innovation
1:31:44 Elon Management Code
1:32:24 IBM Big Gray Cloud
1:37:12 Engineer First Truth
1:38:28 Bottlenecks and Speed
1:42:46 Milli Elon Metric
1:47:20 Starlink Side Project
1:49:10 Closing

Includes paid partnerships.




AI can read X better than you can. Then it can create a NotebookLM for you. Here's today's news as a podcast, a slide deck, and a mind map, with video to come in a bit: notebooklm.google.com/notebook/50ab4… All gathered by Levangie Labs by reading tens of thousands of posts here on X through the X API.


Problem: AI coding agents (e.g. Claude Code, Cursor, Copilot) spend a significant portion of their token budget on file reads. When exploring an unfamiliar codebase, the typical pattern is:

1. Read a file in full to understand what it contains
2. Decide whether it is relevant
3. Repeat for N files until the answer is found

The inefficiency: the agent reads the entire file before knowing it needs only a fraction of it, or before knowing it doesn't need it at all. On a medium codebase (Flask, ~25 files, ~50k tokens of source), reading everything to answer a specific question costs between 9k and 50k tokens depending on how many files are relevant.

Solution (maybe): TOC-First Access. Instead of reading the entire file, the agent first reads a Table of Contents generated from the file's AST. The TOC contains:

- All class names and their public methods (with line numbers)
- All top-level function signatures
- Module-level imports
- Docstrings (first line only)

The TOC is produced statically from the AST: no LLM, no inference, instant. It compresses files by ~86% on average (e.g. app.py: 9,090 → 702 tokens). The agent reads all TOCs first (~7k tokens for all of Flask), identifies which files are relevant, then reads only those in full. Benchmarks were run on the Flask codebase.
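A minimal sketch of what such a TOC generator could look like for Python files, using only the stdlib `ast` module (the post doesn't share its implementation, so the function name and output format here are my own assumptions):

```python
import ast

def build_toc(source: str, filename: str = "<file>") -> str:
    """Statically build a table of contents for a Python file from its AST:
    imports, top-level function signatures, class names with their public
    methods (all with line numbers), and the first docstring line.
    No LLM, no inference -- just a parse."""
    tree = ast.parse(source, filename=filename)
    out = [f"# TOC: {filename}"]

    doc = ast.get_docstring(tree)
    if doc:
        out.append(f'"""{doc.splitlines()[0]}"""')

    def sig(fn):
        # Positional argument names only; enough for a skim-level TOC.
        return ", ".join(a.arg for a in fn.args.args)

    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            out.append(ast.unparse(node))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out.append(f"L{node.lineno}: def {node.name}({sig(node)})")
        elif isinstance(node, ast.ClassDef):
            out.append(f"L{node.lineno}: class {node.name}:")
            for item in node.body:
                if (isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                        and not item.name.startswith("_")):
                    out.append(f"  L{item.lineno}: def {item.name}({sig(item)})")
    return "\n".join(out)

# Demo on a tiny module: private methods are dropped, everything else
# collapses to one line per symbol.
src = '''"""Demo module."""
import os

class Cache:
    def get(self, key):
        return None
    def _evict(self):
        pass

def main(argv):
    pass
'''
toc = build_toc(src, "demo.py")
```

An agent would run this over every file up front, hand the concatenated TOCs to the model, and only fetch full file bodies for whichever names look relevant to the question.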

I suspect we already have AGI with the current models for many use cases - but the harnesses just aren’t there yet

Prompting is a bug, not a feature. Stop obsessing over verbs and context windows. If your workflow depends on you being a prompt whisperer, you've already lost. You're just a manual operator in a world that demands systems.

Vibe Coding is fun for demos. The real evolution is Loop Engineering: you don't write a prompt; you build a recursive environment where the agent evaluates its own failure, refactors its logic, and iterates until the delta between intent and output is zero.

The human shouldn't be the editor. The human should be the architect of the loop. Stop talking to the machine. Start building the machine that talks to itself.
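The loop described above can be sketched as a tiny harness. Everything here is hypothetical scaffolding: `generate` would call a model and `evaluate` would run tests or lint; the toy stand-ins below only exist to show the feedback cycle:

```python
def loop_engineer(generate, evaluate, max_iters=10):
    """Recursive-environment sketch: each candidate is evaluated, and
    failures are fed back as context until intent and output match.

    generate(feedback) -> candidate artifact (e.g. code)
    evaluate(candidate) -> (ok, failure_report)
    """
    feedback = None
    for attempt in range(1, max_iters + 1):
        candidate = generate(feedback)     # the agent produces an attempt
        ok, report = evaluate(candidate)   # e.g. run the test suite
        if ok:
            return candidate, attempt      # delta between intent and output is zero
        feedback = report                  # failure becomes the next iteration's input
    raise RuntimeError(f"no convergence after {max_iters} attempts")

# Toy stand-ins: the "intent" is the number 3; each failure report
# nudges the next attempt one step closer.
def toy_generate(feedback):
    return 0 if feedback is None else feedback + 1

def toy_evaluate(candidate):
    return candidate == 3, candidate

result, attempts = loop_engineer(toy_generate, toy_evaluate)
```

The point is the architecture: the human writes `evaluate` (the definition of done) and the loop itself, not the individual prompts that flow through it.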


