

Henry Yin
146 posts

@HenryYin_
Following eigen vectors




Ten years ago I was building factories. Today I'm building the tools I wish I had inside them. @TenkaraAI raised $7M led by @trueventures.

The hard part about LLM failures is that their outputs rarely look like failures. The demo “works.” The output sounds coherent. The user actively uses the product. And your dashboard looks normal.

Meanwhile, the system can be wrong, unsafe, or quietly driving up token spend. And you won’t notice until the damage adds up.

Prompts often serve as business logic (policies, safety, and product context). But many teams ship them without the basics, such as versioning, reviewable changes, end-to-end traces, and eval gates.

In production, it doesn’t crash. It degrades via wrong answers, policy misses, and surprise spending. No crash. No error. No alert.

I cover this exact issue in my @Stanford CS 224G guest lecture on AI Observability and Evaluations. Here are the core ideas:
• If you only log the final output, you’re guessing. Full traces show where it broke.
• Evals are feedback loops. Use clear pass/fail criteria tied to outcomes.
• Run evals continuously on production traces and don’t wait for support tickets.

The moat isn’t prompt cleverness. It’s measured improvement.

Full lecture + blog below 👇
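To make the three ideas above concrete, here is a minimal sketch in Python. It is not taken from the lecture: the names `Step`, `Trace`, `within_token_budget`, `final_output_avoids`, `run_evals`, and `gate`, along with the 0.95 threshold, are all hypothetical and chosen only for illustration. It assumes each request is recorded as a full trace (every step, tagged with a prompt version) and that evals are plain pass/fail checks run over a batch of those traces.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Step:
    """One recorded step of a request (hypothetical schema)."""
    name: str      # e.g. "retrieve" or "generate"
    output: str    # what this step produced
    tokens: int    # tokens spent on this step


@dataclass
class Trace:
    """Full trace of one request, tagged with the prompt version that served it."""
    prompt_version: str
    steps: list[Step] = field(default_factory=list)

    @property
    def total_tokens(self) -> int:
        return sum(step.tokens for step in self.steps)


# A check is a pass/fail criterion evaluated on a whole trace.
Check = Callable[[Trace], bool]


def within_token_budget(limit: int) -> Check:
    """Pass when the trace spent at most `limit` tokens in total."""
    return lambda trace: trace.total_tokens <= limit


def final_output_avoids(phrase: str) -> Check:
    """Pass when the last step's output does not contain `phrase` (case-insensitive)."""
    def check(trace: Trace) -> bool:
        return bool(trace.steps) and phrase.lower() not in trace.steps[-1].output.lower()
    return check


def run_evals(traces: Iterable[Trace], checks: dict[str, Check]) -> dict[str, float]:
    """Return the pass rate of each named check over a batch of traces."""
    batch = list(traces)
    if not batch:
        raise ValueError("no traces to evaluate")
    return {
        name: sum(check(trace) for trace in batch) / len(batch)
        for name, check in checks.items()
    }


def gate(pass_rates: dict[str, float], threshold: float = 0.95) -> bool:
    """Eval gate: allow a change only if every check meets the threshold."""
    return all(rate >= threshold for rate in pass_rates.values())
```

Under these assumptions, the same `run_evals` call could run on a schedule over sampled production traces and again in CI before a prompt change ships, with `gate` deciding whether the change goes out.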










Hope everyone enjoys their last year of meaningful work!

How come the NanoGPT speedrun challenge is not fully AI automated research by now?




[New Post] Continual Learning: the Promised Land

The next breakthrough isn't a bigger model, it's a model that keeps learning.
- Models that rewrite their own weights at inference.
- Agents that curate their own memory.
- Systems that improve their own reasoning.

Read the builder's map here: substacktools.com/sharex/GqL_A0TA








