Henry Yin

146 posts

Henry Yin

@HenryYin_

Following eigen vectors

San Francisco Katılım Temmuz 2018

539 Takip Edilen620 Takipçiler

Sabitlenmiş Tweet

Henry Yin@HenryYin_·27 Şub

[New Post] Continual Learning: the Promised Land The next breakthrough isn't a bigger model, it's a model that keeps learning. - Models that rewrite their own weights at inference. - Agents that curate their own memory. - Systems that improve their own reasoning. Read the builder's map here: substacktools.com/sharex/GqL_A0TA

English

669

Henry Yin retweetledi

Phota Labs@PhotaLabs·5d

Today, we introduce Phota Studio and Phota API, powered by our photography model that brings flagship image model capabilities, personalized to you. With personalization, an image model stops being just playful and starts becoming useful for photography. With Phota Studio, you can: - Reimagine composition, lighting, or posture while still looking like yourself - Create editorial, stylized, and studio-quality portraits of yourself, or bring someone you love into the frame - Revive the blurred shot, bring in the person who missed the group photo, fix the awkward expression - all without losing what made the moment worth keeping With Phota API, you can finally build photo experiences where real people are the core. Marketing assets, editorial shoots, wedding photography: workflows that needed identity fidelity that GenAI couldn't deliver. Until now. Ultimately, we want to make compelling photographs accessible to everyone. Phota API and Phota Studio start to make that possible: empowering people to explore, imagine, and create without losing themselves in the image. With Phota Studio and Phota API, developers can build new photo experiences, while photographers and creators can explore a new kind of AI-native editing and generation. The next photo experience starts here!

English

159

42.9K

Henry Yin@HenryYin_·17 Mar

Atoms are safe. Bits are at risk. The coordination layer is getting automated. Automate smart, or get automated.

Benjamin Stern@itsbenjyyy

Ten years ago I was building factories. Today I'm building the tools I wish I had inside them. @TenkaraAI raised $7M led by @trueventures.

English

126

Henry Yin@HenryYin_·14 Mar

Measured improvements for the win

Arsh Shah Dilbagi@arshdilbagi

The hard part about LLM failures is that their outputs rarely look like failures. The demo “works.” The output sounds coherent. The user actively uses the product. And your dashboard looks normal. Meanwhile, the system can be wrong, unsafe, or quietly driving up token spend. And you won’t notice until the damage adds up. Prompts often serve as business logic (policies, safety, and product context). But many teams ship them without the basics, such as versioning, reviewable changes, end-to-end traces, and eval gates. In production, it doesn’t crash. It degrades via wrong answers, policy misses, and surprise spending. No crash. No error. No alert. I cover this exact issue in my @Stanford CS 224G guest lecture on AI Observability and Evaluations. Here are the core ideas: • If you only log the final output, you’re guessing. Full traces show where it broke. • Evals are feedback loops. Use clear pass/fail criteria tied to outcomes. • Run evals continuously on production traces and don’t wait for support tickets. The moat isn’t prompt cleverness. It’s a measured improvement. Full lecture + blog below 👇

English

236

Henry Yin@HenryYin_·13 Mar

Candide out here skiing terrain that looks personally offended by human life. No more “the snow wasn’t good enough” excuses, ever

Candide Thovex@CandideThovex

youtu.be/_smOoXDuyJU

English

584

Henry Yin@HenryYin_·13 Mar

@Yuchenj_UW Been loving your content, hoping this means you will have a little more time to make it

English

Yuchen Jin@Yuchenj_UW·13 Mar

I have decided to step down as CTO at Hyperbolic. Leaving a company you co-founded and poured your heart into is not easy. So many moments still feel vivid: launching our AI inference product for open-source models and seeing tens of thousands of developers sign up in a week; the week we were hit by a massive DDoS attack and the entire engineering team fought around the clock until we won; the day we launched the GPU platform and watched ARR take off. There were also hard moments. That’s the nature of building a startup. I’m grateful for all of it. What I’m most grateful for is the team. Thank you for your trust. Most startups never build something people want. I believe we did. You should be proud of yourselves. I will look forward to seeing your success. What’s next for me? I’m still figuring it out. I believe this is the most extraordinary moment in human history. We’re standing at the edge of the Singularity. AI will reshape everything, and I still feel the same excitement I felt when I first fell in love with AI. Time to start over. Time to climb another mountain. Thank you to everyone who has been part of the journey, — Yuchen

English

245

1.6K

183K

Henry Yin retweetledi

Marco Mascorro@Mascobot·9 Mar

🚨 New: Integrating Harbor (@harborframework) for end-to-end Computer-Use evaluation(for Windows and Linux) at scale with @thinkymachines' Tinker, OSWorld, @daytonaio, and bare-metal servers. We just added support for Computer Use, @tinkerapi, and OSWorld to Harbor - a framework for evaluating agents and generating RL training data by running large-scale rollouts across parallel sandboxed environments and collecting trajectories for SFT and RL. Repo and blogpost below 👇

English

130

19.1K

Henry Yin retweetledi

Erik Schluntz@ErikSchluntz·6 Mar

I'm hiring for an Agents Engineer role on the Multi-Agent team! if you've built awesome things on top of LLMs to increase their performance in measurable ways, I'd love to work with you! job-boards.greenhouse.io/anthropic/jobs…

English

1.5K

129.8K

Henry Yin retweetledi

Yifan Zhang@yifan_zhang_·5 Mar

Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence. (Richard Sutton) Introducing Interactive Benchmarks, which evaluate a model's ability to acquire information actively! 🚀 interactivebench.github.io/InteractiveBen…

English

234

30.6K

Henry Yin@HenryYin_·3 Mar

@ivanburazin I just refer senders to wisprflow, problem solved

English

Ivan Burazin@ivanburazin·2 Mar

Voice messages should die. I hate everyone who sends them. When someone sends me a voice note, I don't look at it for days. It's the most selfish form of communication. You're offloading your typing time onto my listening time. You get convenience and I get locked into your pace. Can neither skim thru nor reference them later.

English

615

288

3.5K

327.2K

Henry Yin retweetledi

Philipp Moritz@pcmoritz·3 Mar

We just merged a clean Qwen 3.5 implementation for SkyRL's Jax backend: github.com/NovaSky-AI/Sky… Currently only for dense models, but should be easy to adapt to MoE models, contributions welcome! Also if anybody wants to contribute chunkwise training for the gated delta net or layer stacking for the model, it would be welcome!

English

3.7K

Henry Yin@HenryYin_·2 Mar

@Mascobot Did it work haha?

English

709

Marco Mascorro@Mascobot·1 Mar

Some investors are trying to get into rounds of RL companies by writing complex RL environment tasks/benchmarks 😆 "...my investment comes with TerminalBench3 quality-grade tasks" (It's a true story)

English

121

30.6K

Henry Yin@HenryYin_·1 Mar

@finnlay_doings challenge accepted

English

Finnlay@finnlay_doings·1 Mar

@HenryYin_ Losing at table tennis

English

Henry Yin@HenryYin_·28 Şub

accepting post-AGI hobby suggestions, currently considering latte art and competitive napping

Chad Hurley@Chad_Hurley

Hope everyone enjoys their last year of meaningful work!

English

186

Henry Yin@HenryYin_·28 Şub

@karpathy need to see the standup transcript, are the agents blaming each other for regressions yet?

English

154

Andrej Karpathy@karpathy·28 Şub

I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :) I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p. But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully though experiment design, they run a bit non-sensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer, it's not clear why I had to come in to point that out). They are very good at implementing any given well-scoped and described idea but they don't creatively generate them. But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code". And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?

Thomas Wolf@Thom_Wolf

How come the NanoGPT speedrun challenge is not fully AI automated research by now?

English

562

810

8.7K

1.6M

Henry Yin@HenryYin_·28 Şub

Had a sharp debate with a friend after my continual learning post. His best pushbacks and where I land: 1. "Why does a model need to learn after deployment? Context windows keep growing, search covers the knowledge lag, and labs already retrain every few months." Right for knowledge, wrong for capabilities. You can search for a fact. You can't search for a new reasoning pattern. And personalization (your codebase, your workflow) will never be in the pretraining set. 2. "CL is vaguely defined. Learning a new fact doesn't require solving catastrophic forgetting. Learning significant new capabilities does. Very different problems." Good point. Most CL discourse conflates "remember a fact" with "acquire a skill." The first is basically solved. The second is the real frontier, and where TTT lives. 3. "Even with perfect CL, a small model still won't be smart enough to improve itself. Self-improvement needs a base level of intelligence." Exactly right. Recent research shows some models have the cognitive behaviors to self-improve under RL (backtracking, verification) and others don't. Same size, same algorithm, different outcome. 4. "Self-improvement in coding won't cure cancer. It only works where you can cheaply verify answers." The real bottleneck. AlphaProof works because math has proofs. Code has execution. Biology doesn't have a cheap oracle. Expanding what domains have cheap verification is the frontier. 5. "The bigger unlock is models increasing in intelligence over time. Learning from feedback in a job or codebase is more valuable than classic CL, and it's different because the total information is lower." Most underrated take. The valuable form of CL isn't "ingest Wikipedia updates." It's "get better at YOUR job." Small, high-signal, high-value. Read the full post here: substacktools.com/sharex/GqL_A0TA

English

191

Henry Yin@HenryYin_·27 Şub

@jeffclune very cool work, feels like darwin godel machine idea applied to memory itself. connects two threads I've been thinking about (context eng + self-improvement) in a new writeup: x.com/HenryYin_/stat…

English

Jeff Clune@jeffclune·10 Şub

Can AI agents design better memory mechanisms for themselves? Introducing Learning to Continually Learn via Meta-learning Memory Designs. A meta agent automatically designs memory mechanisms, including what info to store, how to retrieve it, and how to update it, enabling agentic systems to continually learn across diverse domains. Led by @yimingxiong_ with @shengranhu 🧵👇 1/

GIF

English

188

1.3K

230.5K

Henry Yin@HenryYin_·27 Şub

@mertyuksekgonul TTT went from "handle corrupted images" in 2020 to "discover things nobody's ever seen" in 6 years. wrote about the full arc and excited about where this evolves into: x.com/HenryYin_/stat…

Henry Yin@HenryYin_

English

Mert Yuksekgonul@mertyuksekgonul·22 Oca

How to get AI to make discoveries on open scientific problems? Most methods just improve the prompt with more attempts. But the AI itself doesn't improve. With test-time training, AI can continue to learn on the problem it’s trying to solve: test-time-training.github.io/discover.pdf

English

168

755

376.2K

Henry Yin@HenryYin_·27 Şub

@RichardSSutton Excited about what David's gonna build next. Been thinking about useful components for building the Era of Experience and tried to map out how these components connect: x.com/HenryYin_/stat…

Henry Yin@HenryYin_

English

Richard Sutton@RichardSSutton·19 Şub

David Silver's new $4bn company, Ineffable Intelligence, will fulfil the promise of the Era of Experience. youtube.com/watch?v=zzXyPG… ft.com/content/dffe72…

YouTube

English

719

60.8K

Henry Yin@HenryYin_·27 Şub

Featuring TTT work from @akyurekekin @karansdalal @mertyuksekgonul @EyubogluSabri, meta-learning from @chelseabfinn, self-improvement from @ericzelikman @jeffclune @gandhikanishk, and @RichardSSutton & Silver's vision of where it all leads. Grateful for the work that made this post possible and excited to see where this all converges!

English

136

Henry Yin@HenryYin_·27 Şub

English

669

Henry Yin@HenryYin_·27 Şub

We're hosting a salon on this topic March 5 evening at Google SF office with @timshi_ai, @composio, @GVteam. DM us if you're researching/building in this space and would like to join! @naomiiixia @lynn_aisv @Rishimodi99

English

549

Keşfet

@Yuchenj_UW @harborframework @thinkymachines @daytonaio @tinkerapi @ivanburazin @Mascobot @finnlay_doings