Henry Yin
@HenryYin_
146 posts
Following eigen vectors
San Francisco · Joined July 2018
539 Following · 620 Followers
Pinned Tweet
Henry Yin @HenryYin_
[New Post] Continual Learning: the Promised Land

The next breakthrough isn't a bigger model, it's a model that keeps learning.
- Models that rewrite their own weights at inference.
- Agents that curate their own memory.
- Systems that improve their own reasoning.

Read the builder's map here: substacktools.com/sharex/GqL_A0TA
[image]
2 replies · 1 repost · 11 likes · 669 views
Henry Yin reposted
Phota Labs @PhotaLabs
Today, we introduce Phota Studio and Phota API, powered by our photography model that brings flagship image model capabilities, personalized to you. With personalization, an image model stops being just playful and starts becoming useful for photography.

With Phota Studio, you can:
- Reimagine composition, lighting, or posture while still looking like yourself
- Create editorial, stylized, and studio-quality portraits of yourself, or bring someone you love into the frame
- Revive the blurred shot, bring in the person who missed the group photo, fix the awkward expression, all without losing what made the moment worth keeping

With Phota API, you can finally build photo experiences where real people are the core. Marketing assets, editorial shoots, wedding photography: workflows that needed identity fidelity that GenAI couldn't deliver. Until now.

Ultimately, we want to make compelling photographs accessible to everyone. Phota API and Phota Studio start to make that possible: empowering people to explore, imagine, and create without losing themselves in the image.

With Phota Studio and Phota API, developers can build new photo experiences, while photographers and creators can explore a new kind of AI-native editing and generation. The next photo experience starts here!
44 replies · 25 reposts · 159 likes · 42.9K views
Henry Yin @HenryYin_
Measured improvements for the win
Arsh Shah Dilbagi @arshdilbagi

The hard part about LLM failures is that their outputs rarely look like failures. The demo "works." The output sounds coherent. The user actively uses the product. And your dashboard looks normal. Meanwhile, the system can be wrong, unsafe, or quietly driving up token spend. And you won't notice until the damage adds up.

Prompts often serve as business logic (policies, safety, and product context). But many teams ship them without the basics, such as versioning, reviewable changes, end-to-end traces, and eval gates. In production, it doesn't crash. It degrades via wrong answers, policy misses, and surprise spending. No crash. No error. No alert.

I cover this exact issue in my @Stanford CS 224G guest lecture on AI Observability and Evaluations. Here are the core ideas:
• If you only log the final output, you're guessing. Full traces show where it broke.
• Evals are feedback loops. Use clear pass/fail criteria tied to outcomes.
• Run evals continuously on production traces and don't wait for support tickets.

The moat isn't prompt cleverness. It's measured improvement. Full lecture + blog below 👇
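A minimal sketch of those three ideas together, assuming a hypothetical trace schema and illustrative pass/fail checks (none of this is a specific vendor's API): traces record every step, evals are explicit criteria tied to outcomes, and a gate flags rollouts when the pass rate degrades.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    prompt_version: str   # prompts are business logic: version them like code
    steps: list           # every intermediate call, not just the final output
    final_output: str
    tokens_used: int

def evaluate(trace: Trace) -> dict:
    """Clear pass/fail criteria tied to outcomes, checked on every trace."""
    return {
        "policy_ok": "refund approved" not in trace.final_output.lower(),  # illustrative policy rule
        "cost_ok": trace.tokens_used < 20_000,                             # catch quiet token-spend creep
        "grounded": any("retrieval" in s for s in trace.steps),            # answer backed by a lookup step
    }

def gate(traces: list, min_pass_rate: float = 0.95) -> bool:
    """Run evals continuously on production traces; flag degradation
    instead of waiting for support tickets."""
    results = [all(evaluate(t).values()) for t in traces]
    return (sum(results) / max(len(results), 1)) >= min_pass_rate
```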

0 replies · 0 reposts · 3 likes · 236 views
Henry Yin @HenryYin_
@Yuchenj_UW Been loving your content, hoping this means you will have a little more time to make it
0 replies · 0 reposts · 1 like · 77 views
Yuchen Jin @Yuchenj_UW
I have decided to step down as CTO at Hyperbolic.

Leaving a company you co-founded and poured your heart into is not easy. So many moments still feel vivid: launching our AI inference product for open-source models and seeing tens of thousands of developers sign up in a week; the week we were hit by a massive DDoS attack and the entire engineering team fought around the clock until we won; the day we launched the GPU platform and watched ARR take off. There were also hard moments. That's the nature of building a startup. I'm grateful for all of it.

What I'm most grateful for is the team. Thank you for your trust. Most startups never build something people want. I believe we did. You should be proud of yourselves. I will look forward to seeing your success.

What's next for me? I'm still figuring it out. I believe this is the most extraordinary moment in human history. We're standing at the edge of the Singularity. AI will reshape everything, and I still feel the same excitement I felt when I first fell in love with AI. Time to start over. Time to climb another mountain.

Thank you to everyone who has been part of the journey,

— Yuchen
245 replies · 36 reposts · 1.6K likes · 183K views
Henry Yin reposted
Marco Mascorro @Mascobot
🚨 New: Integrating Harbor (@harborframework) for end-to-end Computer-Use evaluation (for Windows and Linux) at scale with @thinkymachines' Tinker, OSWorld, @daytonaio, and bare-metal servers.

We just added support for Computer Use, @tinkerapi, and OSWorld to Harbor, a framework for evaluating agents and generating RL training data by running large-scale rollouts across parallel sandboxed environments and collecting trajectories for SFT and RL.

Repo and blogpost below 👇
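For flavor, a rough sketch of the pattern the post describes (many parallel sandboxed rollouts, each yielding a trajectory for SFT or RL); `agent` and the sandbox interface here are stand-ins, not Harbor's actual API.

```python
import asyncio

async def rollout(agent, make_sandbox, task, max_steps=50):
    """One sandboxed environment -> one trajectory for SFT/RL."""
    env = await make_sandbox()            # isolated VM/container per rollout
    obs = await env.reset(task)
    trajectory = []
    for _ in range(max_steps):
        action = await agent.act(obs)     # e.g. a computer-use click/keystroke
        obs, done = await env.step(action)
        trajectory.append({"obs": obs, "action": action})
        if done:
            break
    await env.close()
    return trajectory

async def collect(agent, make_sandbox, tasks):
    """Fan out one sandbox per task in parallel; gather the trajectories."""
    return await asyncio.gather(*(rollout(agent, make_sandbox, t) for t in tasks))
```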
11 replies · 19 reposts · 130 likes · 19.1K views
Henry Yin reposted
Erik Schluntz @ErikSchluntz
I'm hiring for an Agents Engineer role on the Multi-Agent team! If you've built awesome things on top of LLMs to increase their performance in measurable ways, I'd love to work with you! job-boards.greenhouse.io/anthropic/jobs…
57 replies · 88 reposts · 1.5K likes · 129.8K views
Henry Yin reposted
Yifan Zhang @yifan_zhang_
"Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence." (Richard Sutton)

Introducing Interactive Benchmarks, which evaluate a model's ability to acquire information actively! 🚀 interactivebench.github.io/InteractiveBen…
[image]
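As a hedged illustration of what "acquiring information actively" could mean mechanically (this is not the benchmark's actual harness): an episode charges the model per query against an environment that reveals information only on request.

```python
def run_interactive_episode(model, env, budget=10):
    """Score active information acquisition: fewer queries is better."""
    history = [env.initial_observation()]    # deliberately underspecified task
    for turn in range(1, budget + 1):
        query = model.ask(history)           # the model chooses what to ask next
        if env.is_final_answer(query):
            return {"correct": env.check(query), "queries_used": turn}
        history.append(env.answer(query))    # information arrives only on request
    return {"correct": False, "queries_used": budget}
```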
8 replies · 19 reposts · 234 likes · 30.6K views
Ivan Burazin @ivanburazin
Voice messages should die. I hate everyone who sends them. When someone sends me a voice note, I don't look at it for days. It's the most selfish form of communication: you're offloading your typing time onto my listening time. You get convenience and I get locked into your pace. I can neither skim through them nor reference them later.
615 replies · 288 reposts · 3.5K likes · 327.2K views
Henry Yin reposted
Philipp Moritz @pcmoritz
We just merged a clean Qwen 3.5 implementation for SkyRL's Jax backend: github.com/NovaSky-AI/Sky…

Currently only for dense models, but should be easy to adapt to MoE models, contributions welcome! Also, if anybody wants to contribute chunkwise training for the gated delta net or layer stacking for the model, it would be welcome!
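As a very rough illustration of the layer-stacking idea (duplicating trained blocks to warm-start a deeper model), under an assumed parameter layout that is not SkyRL's actual one:

```python
import copy

def stack_layers(params, repeats=2):
    """Assume params["blocks"] is an ordered list of per-layer weight dicts.
    Each trained block is duplicated in place to warm-start a deeper model;
    embeddings and the output head are carried over unchanged."""
    grown = dict(params)
    grown["blocks"] = [copy.deepcopy(block)
                       for block in params["blocks"]
                       for _ in range(repeats)]
    return grown
```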
1 reply · 7 reposts · 44 likes · 3.7K views
Marco Mascorro @Mascobot
Some investors are trying to get into rounds of RL companies by writing complex RL environment tasks/benchmarks 😆 "...my investment comes with TerminalBench3 quality-grade tasks" (It's a true story)
12 replies · 4 reposts · 121 likes · 30.6K views
Henry Yin @HenryYin_
@karpathy Need to see the standup transcript: are the agents blaming each other for regressions yet?
0 replies · 0 reposts · 1 like · 154 views
Andrej Karpathy @karpathy
I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each, running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :)

I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). The research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, you can see their individual work, and "take over" if needed, i.e. no -p.

But ok, the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully through experiment design, they run somewhat nonsensical variations, they don't create strong baselines and ablate things properly, and they don't carefully control for runtime or flops. (Just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but it also trains for a lot longer; it's not clear why I had to come in to point that out.) They are very good at implementing any given well-scoped and described idea, but they don't creatively generate them.

But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code". And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then, given an arbitrary task, how quickly does your research org generate progress on it?
Thomas Wolf @Thom_Wolf

How come the NanoGPT speedrun challenge is not fully AI automated research by now?
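A minimal sketch of the scaffolding Karpathy describes above (one branch per research program, a git worktree and feature branch per agent, a tmux pane grid of interactive sessions); the agent CLI command is a placeholder, and the git/tmux calls are standard.

```python
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

def launch_org(n_agents=8, program="delete-logit-softcap"):
    sh("git", "branch", program)                    # one branch per research program
    sh("tmux", "new-session", "-d", "-s", program)  # interactive grid, no -p
    for i in range(n_agents):
        # Each scientist forks the program into a feature branch,
        # isolated in its own worktree.
        sh("git", "worktree", "add", "-b", f"{program}/agent-{i}",
           f"../wt-{i}", program)
        if i:
            sh("tmux", "split-window", "-t", program)
            sh("tmux", "select-layout", "-t", program, "tiled")
        # Placeholder command; comms happen via simple files in the worktrees.
        sh("tmux", "send-keys", "-t", f"{program}:0.{i}",
           f"cd ../wt-{i} && your-agent-cli", "Enter")

# launch_org(); then `tmux attach -t delete-logit-softcap` to watch or take over.
```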

562 replies · 810 reposts · 8.7K likes · 1.6M views
Henry Yin @HenryYin_
Had a sharp debate with a friend after my continual learning post. His best pushbacks and where I land:

1. "Why does a model need to learn after deployment? Context windows keep growing, search covers the knowledge lag, and labs already retrain every few months."
Right for knowledge, wrong for capabilities. You can search for a fact. You can't search for a new reasoning pattern. And personalization (your codebase, your workflow) will never be in the pretraining set.

2. "CL is vaguely defined. Learning a new fact doesn't require solving catastrophic forgetting. Learning significant new capabilities does. Very different problems."
Good point. Most CL discourse conflates "remember a fact" with "acquire a skill." The first is basically solved. The second is the real frontier, and where TTT lives.

3. "Even with perfect CL, a small model still won't be smart enough to improve itself. Self-improvement needs a base level of intelligence."
Exactly right. Recent research shows some models have the cognitive behaviors to self-improve under RL (backtracking, verification) and others don't. Same size, same algorithm, different outcome.

4. "Self-improvement in coding won't cure cancer. It only works where you can cheaply verify answers."
The real bottleneck. AlphaProof works because math has proofs. Code has execution. Biology doesn't have a cheap oracle. Expanding what domains have cheap verification is the frontier.

5. "The bigger unlock is models increasing in intelligence over time. Learning from feedback in a job or codebase is more valuable than classic CL, and it's different because the total information is lower."
Most underrated take. The valuable form of CL isn't "ingest Wikipedia updates." It's "get better at YOUR job." Small, high-signal, high-value.

Read the full post here: substacktools.com/sharex/GqL_A0TA
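To make point 4 concrete, a sketch of why cheap verification is the bottleneck: in code, an executable test is the oracle that filters model samples into training signal. `model.sample` is a placeholder for any generation API; everything else is standard library.

```python
import subprocess, tempfile

def passes_tests(candidate, test_code):
    """Cheap oracle: execute candidate + tests; exit code 0 means verified."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate + "\n\n" + test_code)
        path = f.name
    try:
        return subprocess.run(["python", path], capture_output=True,
                              timeout=30).returncode == 0
    except subprocess.TimeoutExpired:
        return False

def self_improvement_step(model, task, test_code, k=8):
    """Sample k candidates, keep only the verified ones: that filtered set is
    the training signal. Where no cheap verifier exists (e.g. biology), this
    loop has nothing to filter on."""
    return [c for c in (model.sample(task) for _ in range(k))
            if passes_tests(c, test_code)]
```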
0 replies · 0 reposts · 2 likes · 191 views
Henry Yin @HenryYin_
@jeffclune Very cool work, feels like the Darwin Gödel Machine idea applied to memory itself. Connects two threads I've been thinking about (context eng + self-improvement) in a new writeup: x.com/HenryYin_/stat…
0 replies · 0 reposts · 1 like · 28 views
Jeff Clune @jeffclune
Can AI agents design better memory mechanisms for themselves? Introducing Learning to Continually Learn via Meta-learning Memory Designs. A meta agent automatically designs memory mechanisms, including what info to store, how to retrieve it, and how to update it, enabling agentic systems to continually learn across diverse domains. Led by @yimingxiong_ with @shengranhu 🧵👇 1/
[GIF]
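One hedged way to picture the setup (not the paper's actual code): the memory mechanism is itself a small program covering what to store, how to retrieve, and how to update, and the meta agent rewrites that program and scores each candidate by downstream task performance.

```python
from typing import Protocol

class MemoryMechanism(Protocol):
    def store(self, experience: str) -> None: ...             # what info to keep
    def retrieve(self, situation: str, k: int) -> list: ...   # how to recall it
    def update(self, feedback: str) -> None: ...              # how to revise it

def score_design(make_agent, memory, tasks):
    """Outer-loop fitness: a candidate memory design is judged by how well an
    agent equipped with it continually learns across diverse domains."""
    agent = make_agent(memory)
    return sum(agent.solve(t) for t in tasks) / len(tasks)
```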
79 replies · 188 reposts · 1.3K likes · 230.5K views
Mert Yuksekgonul @mertyuksekgonul
How to get AI to make discoveries on open scientific problems? Most methods just improve the prompt with more attempts. But the AI itself doesn't improve. With test-time training, AI can continue to learn on the problem it’s trying to solve: test-time-training.github.io/discover.pdf
[image]
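A minimal sketch of the test-time-training loop, assuming an HF-transformers-style causal LM interface (`model(input_ids=..., labels=...).loss`); the self-supervised objective here is simply next-token prediction on the problem text itself.

```python
import torch

def test_time_train(model, tokenizer, problem, steps=10, lr=1e-5):
    """Take a few gradient steps on the test problem itself before answering."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    ids = tokenizer(problem, return_tensors="pt").input_ids
    model.train()
    for _ in range(steps):
        # Next-token prediction on the problem text: self-supervised, so the
        # model can keep learning on exactly the instance it has to solve.
        loss = model(input_ids=ids, labels=ids).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    model.eval()
    return model
```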
25 replies · 168 reposts · 755 likes · 376.2K views