Susung Hong

199 posts

@SusungHong

PhD student @uwcse | Intern @Google | Generative simulation and video/3D

Seattle, United States · Joined October 2022
128 Following · 230 Followers
Pinned Tweet
Susung Hong@SusungHong·
🎬 Introducing COMIC — fully automated AI comedy. Can AI be funny? We've seen AI solve math and code; comedy is the opposite extreme, where success is hard to define.
🍿 Watch full videos here: susunghong.github.io/COMIC
📄 See how we tackle open-ended comedy: arxiv.org/abs/2603.11048
Susung Hong retweeted
Yoonho Lee@yoonholeee·
How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end
Susung Hong retweeted
hardmaru@hardmaru·
I’m incredibly proud of The AI Scientist team for this milestone publication in @Nature. We started this project to explore if foundation models could execute the entire research lifecycle. Seeing this work validated at this level is a special moment. I truly believe AI will forever change the landscape of how scientific discoveries and scientific progress are made.
Sakana AI@SakanaAILabs

The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature
Nature: nature.com/articles/s4158…
Blog: sakana.ai/ai-scientist-n…

When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible.

Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process.

Today, we are happy to announce that "The AI Scientist: Towards Fully Automated AI Research," our paper describing all of this work, along with fresh new insights, has been published in @Nature!

This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement.

Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science: as the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to increase, future versions of The AI Scientist will be substantially more capable.

Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science.
This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune

Susung Hong@SusungHong·
Love this observation. LLMs struggle most where we can't verify their outputs (e.g., humor). One approach that works: arxiv.org/abs/2603.11048
sarah guo@saranormous

Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home-like movement in AI, the model landscape, and second-order effects.
02:55 - What Capability Limits Remain?
06:15 - What Mastery of Coding Agents Looks Like
11:16 - Second Order Effects of Coding Agents
15:51 - Why AutoResearch
22:45 - Relevant Skills in the AI Era
28:25 - Model Speciation
32:30 - Collaboration Surfaces for Humans and AI
37:28 - Analysis of Jobs Market Data
48:25 - Open vs. Closed Source Models
53:51 - Autonomous Robotics and Atoms
1:00:59 - MicroGPT and Agentic Education
1:05:40 - End Thoughts

Susung Hong@SusungHong·
🎉 MusicInfuser has been accepted to #CVPR2026! We plug music perception into silent video diffusion and make it dance! 🎶 Adding a new sensor like audio to a pretrained diffusion model can be destructive. Check out how we tackle this: arxiv.org/abs/2503.14505
Susung Hong@SusungHong

Text-to-video models are silent🔇, but does that mean they don't know music, beat, and tempo🎶? I'm excited to present MusicInfuser🎹, an adapter network which aligns silent dancing videos to music. Check out our paper, examples, code, and weights here: susunghong.github.io/MusicInfuser

Susung Hong retweeted
Andrej Karpathy@karpathy·
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple-choice questions, tool use
- SFT, evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or a ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple-choice tests. E.g., a depth-30 model trained for 24 hours (about equal to the FLOPs of GPT-3 Small 125M, and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed).
I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
Susung Hong retweeted
Rohan Paul@rohanpaul_ai·
The paper introduces Tangential Amplifying Guidance (TAG), a guidance step that reduces diffusion hallucinations by boosting the tangential part of each update.

It splits every update into a radial part that manages noise and a tangential part that edits content. The radial part sets how noisy the latent should be at that moment. The tangential part moves along where real images live and carries structure and meaning.

Many guidance tricks scale a difference between conditional and unconditional paths, which can shrink variety or push steps off course. TAG keeps the radial part as is and slightly amplifies only the tangential part. This keeps the path near real data, so objects stay coherent and attributes follow the prompt.

A simple first-order check shows that amplifying the tangential part raises likelihood, which means fewer off-manifold mistakes. It drops in without retraining and does not add extra model calls. In experiments, 30 steps with TAG beat 100-step classifier-free guidance on quality while keeping text-image match similar.

Paper: arxiv.org/abs/2510.04533
Paper Title: "TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling"
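The radial/tangential split described in the tweet can be sketched in a few lines. This is a minimal illustration of the decomposition as the tweet describes it, not the paper's implementation; the function name, the flattened-vector treatment, and the amplification factor `tau` are assumptions:

```python
import numpy as np

def tag_step(x_t, update, tau=1.3):
    """Split a sampling update into radial and tangential parts
    relative to the current latent x_t, then amplify only the
    tangential part by tau > 1 (illustrative sketch)."""
    r = x_t / np.linalg.norm(x_t)                    # radial unit direction (along x_t)
    radial = np.dot(update.ravel(), r.ravel()) * r   # component that manages noise level
    tangential = update - radial                     # component that edits content
    return radial + tau * tangential                 # keep radial, boost tangential
```

With `tau = 1.0` the update is unchanged, so the sketch degenerates gracefully to the unmodified sampler step.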
Susung Hong retweeted
Donghoon Ahn@donghoon_ahn·
🎉 Our paper, "Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models", has been accepted to #NeurIPS2025! @NeurIPSConf

Popular diffusion guidance methods, such as Perturbed-Attention Guidance (PAG), Smoothed Energy Guidance (SEG), and Autoguidance, steer denoising trajectories with weak or perturbed diffusion models. However, it remains unclear where to perturb to achieve improved image quality.

In this work, we make a key observation: perturbing different attention heads leads to unique, disentangled visual attributes. We exploit this property to improve sample quality at inference time. For example, perturbing attention heads responsible for line-art style and guiding denoising away from the perturbed model can enhance that very style (see 2nd image)!

We additionally propose a principled head-searching framework for arbitrary objectives, called HeadHunter. It enables discovering perturbation targets for both general quality improvements and style-specific enhancements.

✨ Why it matters: Most prior works focus on layer-level control (especially in U-Nets), but we show that individual attention heads unlock fine-grained, interpretable control, suggesting directions such as head-specific tuning, pruning, and conditioning.

Check out our paper for more insights and results!
ArXiv: arxiv.org/abs/2506.10978
Project page: cvlab-kaist.github.io/HeadHunter/
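The "guide denoising away from the perturbed model" step follows the usual perturbed-guidance recipe: extrapolate the full model's prediction away from the prediction of a copy whose selected attention heads were perturbed. A minimal sketch, assuming a PAG-style extrapolation formula and hypothetical names (`eps_full`, `eps_perturbed`, `scale`):

```python
import numpy as np

def head_perturbed_guidance(eps_full, eps_perturbed, scale=3.0):
    """Extrapolate the noise prediction away from a model whose
    chosen attention heads were perturbed (illustrative sketch).

    eps_full      -- prediction of the intact model
    eps_perturbed -- prediction with selected heads perturbed
    scale         -- guidance strength (scale=0 recovers eps_full)
    """
    return eps_full + scale * (eps_full - eps_perturbed)
```

Head selection itself (which heads to perturb for a given attribute) is what the tweet's HeadHunter search framework addresses; this sketch only shows the guidance combination applied once the heads are chosen.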
Susung Hong retweeted
Ruilong Li@ruilong_li·
For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc] Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/
Susung Hong retweeted
Vikash Kumar ✈️GTC@Vikashplus·
@Michael_J_Black's slide hits home with "WORDS FAIL US" The juxtaposition of its effectiveness in capturing its failure is too poetic!
Susung Hong retweeted
Sayak Paul@RisingSayak·
We present HeadHunter, a framework for principled analysis of perturbed attention guidance 🤖 Consequently, it enables deeply fine-grained control over the generation quality & visual attributes. Join in 🧵 for insights and "guidance". 1/12
Susung Hong retweeted
Zhenzhi Wang@zhenzhiwang·
Video generation models can now generate multi-person dialogue videos, or talking videos with human-object interaction (HOI), from text prompts and N pairs of {cropped reference images (e.g., head images), audio}, without any lip post-processing.
Paper: arxiv.org/abs/2506.09984
Demo: zhenzhiwang.github.io/interacthuman/
Susung Hong retweeted
fly51fly@fly51fly·
[LG] A Stable Whitening Optimizer for Efficient Neural Network Training K Frans, S Levine, P Abbeel [UC Berkeley] (2025) arxiv.org/abs/2506.07254
Susung Hong retweeted
Frank Nielsen@FrnkNlsn·
Flavors of the Jensen-Shannon divergence, a symmetrization of the Kullback-Leibler divergence, always bounded by log 2. + Does not require common support! A variational property yields both a notion of centroid and a notion of diversity called the information radius. franknielsen.github.io/papers/entropy…
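The two claims in the tweet (bounded by log 2, defined even without common support) are easy to check numerically. A minimal sketch using natural logarithms; the function name is illustrative:

```python
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions,
    JSD(p, q) = (KL(p || m) + KL(q || m)) / 2 with m = (p + q) / 2.
    Defined even for disjoint supports, and bounded above by log 2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # KL divergence; terms with a[i] == 0 contribute 0 by convention
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For distributions with disjoint supports, e.g. p = [1, 0] and q = [0, 1], the KL divergence is infinite but the mixture m = [0.5, 0.5] keeps both KL terms finite, and the JSD attains its maximum value of exactly log 2.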
Susung Hong retweeted
AK@_akhaliq·
Nvidia presents Inference-Time Hyper-Scaling with KV Cache Compression
Susung Hong retweeted
junhahyung@junhahyung·
🚀 Introducing Temporal In-Context Fine-Tuning (TIC-FT) for Video Diffusion Models ✨ 🎯 We're excited to present TIC-FT, a new fine-tuning method that enables conditional generation for pretrained text-to-video diffusion models.
Susung Hong retweeted
Ben Mildenhall@BenMildenhall·
At @theworldlabs, we built a new Gaussian splatting web renderer with all the bells and whistles we needed to make splats a first-class citizen of the incredible @threejs ecosystem. Today, we're open sourcing Forge under the MIT license.
Susung Hong retweeted
Xiaojuan (Jeanne) Wang@xiaojuan_wang7·
🐾 Introducing "How Animals Dance (When You’re Not Looking)" — we capture the hidden art of animal dance, and expose it for the first time. Watch them dance to APT. 🔊 and more: how-animals-dance.github.io Work done w/@holynski_, @kemelmi, Brian Curless, and Steve Seitz