Susung Hong

199 posts

@SusungHong

PhD student @uwcse | Intern @Google | Generative simulation and video/3D

Seattle, United States · Joined October 2022
128 Following · 230 Followers
Pinned Tweet
Susung Hong@SusungHong·
🎬 Introducing COMIC — fully automated AI comedy. Can AI be funny? We've seen AI solve math and code; comedy is the opposite extreme, where success is hard to define.
🍿 Watch full videos here: susunghong.github.io/COMIC
📄 See how we tackle open-ended comedy: arxiv.org/abs/2603.11048
Susung Hong retweeted
Yoonho Lee@yoonholeee·
How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end
Susung Hong retweeted
hardmaru@hardmaru·
I’m incredibly proud of The AI Scientist team for this milestone publication in @Nature. We started this project to explore if foundation models could execute the entire research lifecycle. Seeing this work validated at this level is a special moment. I truly believe AI will forever change the landscape of how scientific discoveries and scientific progress are made.
Sakana AI@SakanaAILabs

The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature
Nature: nature.com/articles/s4158…
Blog: sakana.ai/ai-scientist-n…

When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible.

Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process.

Today, we are happy to announce that "The AI Scientist: Towards Fully Automated AI Research," our paper describing all of this work, along with fresh new insights, has been published in @Nature!

This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement.

Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science: as the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to increase, future versions of The AI Scientist will be substantially more capable.

Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science.
This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune

Susung Hong@SusungHong·
Love this observation. LLMs struggle most where we can't verify their outputs (e.g., humor). One approach that works: arxiv.org/abs/2603.11048
sarah guo@saranormous

Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home-like movement in AI, the model landscape, and second-order effects.
02:55 - What Capability Limits Remain?
06:15 - What Mastery of Coding Agents Looks Like
11:16 - Second Order Effects of Coding Agents
15:51 - Why AutoResearch
22:45 - Relevant Skills in the AI Era
28:25 - Model Speciation
32:30 - Collaboration Surfaces for Humans and AI
37:28 - Analysis of Jobs Market Data
48:25 - Open vs. Closed Source Models
53:51 - Autonomous Robotics and Atoms
1:00:59 - MicroGPT and Agentic Education
1:05:40 - End Thoughts

Susung Hong@SusungHong·
🎉 MusicInfuser has been accepted to #CVPR2026! We plug music perception into silent video diffusion and make it dance! 🎶 Adding a new sensor like audio to a pretrained diffusion model can be destructive. Check out how we tackle this: arxiv.org/abs/2503.14505
Susung Hong@SusungHong

Text-to-video models are silent🔇, but does that mean they don't know music, beat, and tempo🎶? I'm excited to present MusicInfuser🎹, an adapter network which aligns silent dancing videos to music. Check out our paper, examples, code, and weights here: susunghong.github.io/MusicInfuser

Susung Hong retweeted
Andrej Karpathy@karpathy·
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple-choice questions, tool use
- SFT, evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Run efficient inference on the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or a ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple-choice tests. E.g., a depth-30 model trained for 24 hours (about equal to the FLOPs of GPT-3 Small 125M, and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed).
I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
Susung Hong retweeted
Rohan Paul@rohanpaul_ai·
The paper introduces Tangential Amplifying Guidance (TAG), a guidance step that reduces diffusion hallucinations by boosting the tangential part of each update.

It splits every update into a radial part that manages noise and a tangential part that edits content. The radial part sets how noisy the latent should be at that moment. The tangential part moves along where real images live and carries structure and meaning.

Many guidance tricks scale a difference between conditional and unconditional paths, which can shrink variety or push steps off course. TAG keeps the radial part as is and slightly amplifies only the tangential part. This keeps the path near real data, so objects stay coherent and attributes follow the prompt.

A simple first-order check shows that amplifying the tangential part raises likelihood, which means fewer off-manifold mistakes. It drops in without retraining and does not add extra model calls. In experiments, 30 steps with TAG beat 100-step classifier-free guidance on quality while keeping text-image match similar.

Paper: arxiv.org/abs/2510.04533
Paper Title: "TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling"
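The radial/tangential split described in the tweet can be sketched in a few lines. This is a minimal illustration of the decomposition as the tweet describes it, not the paper's implementation; the function name, the flattened-vector treatment, and the amplification factor `tau` are assumptions:

```python
import numpy as np

def tag_step(x_t, update, tau=1.3):
    """Split a sampling update into radial and tangential parts
    relative to the current latent x_t, then amplify only the
    tangential part by tau > 1 (illustrative sketch)."""
    r = x_t / np.linalg.norm(x_t)                    # radial unit direction (along x_t)
    radial = np.dot(update.ravel(), r.ravel()) * r   # component that manages noise level
    tangential = update - radial                     # component that edits content
    return radial + tau * tangential                 # keep radial, boost tangential
```

With `tau = 1.0` the update is unchanged, so the sketch degenerates gracefully to the unmodified sampler step.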
Susung Hong retweeted
Donghoon Ahn@donghoon_ahn·
🎉 Our paper, "Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models", has been accepted to #NeurIPS2025! @NeurIPSConf

Popular diffusion guidance methods, such as Perturbed-Attention Guidance (PAG), Smoothed Energy Guidance (SEG), and Autoguidance, steer denoising trajectories with weak or perturbed diffusion models. However, it remains unclear where to perturb to achieve improved image quality.

In this work, we make a key observation: perturbing different attention heads leads to unique, disentangled visual attributes. We exploit this property to improve sample quality at inference time. For example, perturbing attention heads responsible for line-art style and guiding denoising away from the perturbed model can enhance that very style (see 2nd image)!

We additionally propose a principled head-searching framework for arbitrary objectives, called HeadHunter. It enables discovering perturbation targets for both general quality improvements and style-specific enhancements.

✨ Why it matters: Most prior works focus on layer-level control (especially in U-Nets), but we show that individual attention heads unlock fine-grained, interpretable control, suggesting directions such as head-specific tuning, pruning, and conditioning.

Check out our paper for more insights and results!
ArXiv: arxiv.org/abs/2506.10978
Project page: cvlab-kaist.github.io/HeadHunter/
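The "guide denoising away from the perturbed model" step follows the usual perturbed-guidance recipe: extrapolate the full model's prediction away from the prediction of a copy whose selected attention heads were perturbed. A minimal sketch, assuming a PAG-style extrapolation formula and hypothetical names (`eps_full`, `eps_perturbed`, `scale`):

```python
import numpy as np

def head_perturbed_guidance(eps_full, eps_perturbed, scale=3.0):
    """Extrapolate the noise prediction away from a model whose
    chosen attention heads were perturbed (illustrative sketch).

    eps_full      -- prediction of the intact model
    eps_perturbed -- prediction with selected heads perturbed
    scale         -- guidance strength (scale=0 recovers eps_full)
    """
    return eps_full + scale * (eps_full - eps_perturbed)
```

Head selection itself (which heads to perturb for a given attribute) is what the tweet's HeadHunter search framework addresses; this sketch only shows the guidance combination applied once the heads are chosen.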
Susung Hong retweeted
Ruilong Li@ruilong_li·
For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc] Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/
Susung Hong retweeted
Vikash Kumar ✈️GTC@Vikashplus·
@Michael_J_Black's slide hits home with "WORDS FAIL US" The juxtaposition of its effectiveness in capturing its failure is too poetic!
Susung Hong retweeted
Sayak Paul@RisingSayak·
We present HeadHunter, a framework for principled analysis of perturbed attention guidance 🤖 Consequently, it enables deeply fine-grained control over the generation quality & visual attributes. Join in 🧵 for insights and "guidance". 1/12
Susung Hong retweeted
Zhenzhi Wang@zhenzhiwang·
Video generation models can now generate multi-person dialogue videos, or talking videos with human-object interaction (HOI), from text prompts and N pairs of {cropped reference images (e.g., head images), audio}, without any lip post-processing.
Paper: arxiv.org/abs/2506.09984
Demo: zhenzhiwang.github.io/interacthuman/
Susung Hong retweeted
fly51fly@fly51fly·
[LG] A Stable Whitening Optimizer for Efficient Neural Network Training K Frans, S Levine, P Abbeel [UC Berkeley] (2025) arxiv.org/abs/2506.07254
Susung Hong retweeted
Frank Nielsen@FrnkNlsn·
Flavors of the Jensen-Shannon divergence, a symmetrization of the Kullback-Leibler divergence, always bounded by log 2. + Does not require common support! A variational property yields both a notion of centroid and a notion of diversity called the information radius. franknielsen.github.io/papers/entropy…
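The two claims in the tweet (bounded by log 2, defined even without common support) are easy to check numerically. A minimal sketch using natural logarithms; the function name is illustrative:

```python
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions,
    JSD(p, q) = (KL(p || m) + KL(q || m)) / 2 with m = (p + q) / 2.
    Defined even for disjoint supports, and bounded above by log 2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # KL divergence; terms with a[i] == 0 contribute 0 by convention
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

For distributions with disjoint supports, e.g. p = [1, 0] and q = [0, 1], the KL divergence is infinite but the mixture m = [0.5, 0.5] keeps both KL terms finite, and the JSD attains its maximum value of exactly log 2.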
Susung Hong retweeted
AK@_akhaliq·
Nvidia presents Inference-Time Hyper-Scaling with KV Cache Compression
Susung Hong retweeted
junhahyung@junhahyung·
🚀 Introducing Temporal In-Context Fine-Tuning (TIC-FT) for Video Diffusion Models ✨ 🎯 We're excited to present TIC-FT, a new fine-tuning method that enables conditional generation for pretrained text-to-video diffusion models.
Susung Hong retweeted
Ben Mildenhall@BenMildenhall·
At @theworldlabs, we built a new Gaussian splatting web renderer with all the bells and whistles we needed to make splats a first-class citizen of the incredible @threejs ecosystem. Today, we're open sourcing Forge under the MIT license.
Susung Hong retweeted
Xiaojuan (Jeanne) Wang@xiaojuan_wang7·
🐾 Introducing "How Animals Dance (When You’re Not Looking)" — we capture the hidden art of animal dance, and expose it for the first time. Watch them dance to APT. 🔊 and more: how-animals-dance.github.io Work done w/@holynski_, @kemelmi, Brian Curless, and Steve Seitz