Zhuo Sun

111 posts

@JasonSun10

Assistant Professor@SUFE, PhD in Comp. Stats & Machine Learning@University College London

Joined July 2021

613 Following, 149 Followers

Zhuo Sun retweeted

yingzhen@liyzhen2·11 May

Our own answer: structured coupling arxiv.org/abs/2605.07676 - flow matching with VAE-based coupling - VAE encoder & flow sharing networks - VAE decoder init. + flow refinement for sampling flow matching 🤝 VAEs -> good representation & sample quality🚀

yingzhen@liyzhen2

Tons of papers re diffusion/flow matching at ML confs these days, but to my surprise very few of them consider learning the prior🤔 Am I missing any important work here? 🙏 for suggestions

Zhuo Sun@JasonSun10·30 Apr

@avt_im I have read many of your papers; they are really nice. Best of luck for the future!

Alexander Terenin@avt_im·19 Mar

Big career news: I'm leaving academia - and moving to the San Francisco Bay Area to explore something new. I've written a short blog post with a few reflections on the end of this chapter. If you'd like to catch up, now is the time to reach out! avt.im/blog/the-road-…

Zhuo Sun retweeted

DeepSeek@deepseek_ai·24 Apr

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n

Zhuo Sun retweeted

Xiaoyuan Cheng@cheng_xiaoyuan·21 Apr

2 papers at ICLR 2026 🎉 🟢 Oral: *Information Shapes Koopman Representation* (P3-#222) 🟢 *From Embedding to Control* (P4-#4805) 🕒 Thu, Apr 23 • 2:15–4:45 PM Come say hi! On the job market; moving more toward world models (new work in progress 👀) #ICLR #ICLR2026

Zhuo Sun retweeted

Guanyang Wang@GuanyangW·15 Apr

Happy to share Junshi (军师), an open-source Claude Code skill we built for researchers. The goal of Junshi is to propose personalized, promising next ideas, not just summarize the literature. 1/n

Zhuo Sun retweeted

PageIndex@PageIndexAI·10 Apr

Inspired by @karpathy's knowledge base thread, we are open-sourcing OpenKB: Open LLM Knowledge Base In addition to Andrej's great original design, OpenKB can scale to long PDFs and multi-modality, see details below 👇

Zhuo Sun retweeted

Lester Mackey@LesterMackey·20 Mar

Qiang Liu, Chris Oates, and I are writing a monograph on Probabilistic Inference and Learning with Stein’s Method, and we’d love to get your feedback on the first draft

Zhuo Sun retweeted

Xiaoyuan Cheng@cheng_xiaoyuan·14 Mar

@yingwww_ @ylecun @mengyer @randall_balestr @GaoyueZhou @oumaymabounou Good idea. Actually, we have a series of papers discussing how to embed raw data into a latent space where it can be controlled linearly, based on Koopman theory and RKHS: openreview.net/pdf?id=SZzpGvB…, openreview.net/pdf?id=Szh0ELy…, arxiv.org/abs/2501.13312…

Zhuo Sun@JasonSun10·26 Feb

Feel the same way!

Andrej Karpathy@karpathy

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow. Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes. As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now. 
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.

Zhuo Sun retweeted

Symposium on Probabilistic Machine Learning@prob_ml·10 Feb

ProbML 2026 (formerly AABI) invites submissions on probabilistic ML (Bayesian & beyond), July 5 in Seoul (co-located with ICML). Website: probml.cc. Tracks: proceedings (PMLR), workshop, fast track. New focus includes healthcare & climate! Submit by: 20 March 2026

Zhuo Sun@JasonSun10·6 Feb

We have solved a fundamental and important Monte Carlo challenge for (1) scientific computing, (2) Bayesian statistics over unnormalized densities, and (3) optimization for variational inference! #ICLR2026

Zhuo Sun@JasonSun10

Our paper "Multilevel Control Functional" with score 8,8,8 accepted at ICLR 2026, is not recommended 'oral' at ICLR, which ranks top 20 in over 19000 submissions #iclr

Zhuo Sun@JasonSun10·6 Feb

Our paper "Multilevel Control Functional" with score 8,8,8 accepted at ICLR 2026, is not recommended 'oral' at ICLR, which ranks top 20 in over 19000 submissions #iclr

Zhuo Sun retweeted

François-Xavier Briol@fx_briol·3 Feb

The UCL IMSS Annual Lecture will take place on the 27th April with a keynote from @LesterMackey. The theme is 'Computational Statistics and Machine Learning', and we will have talks from Alessandro Barp, Paula Cordero Encinar & Po-Ling Loh. imss2026.github.io @stats_UCL

Zhuo Sun@JasonSun10·2 Feb

This is true as far as I observed. But also depends on the scales

Petar Veličković@PetarV_93

new preprint! turns out, if your model is confident on _any_ long enough input, we can find other inputs where the model is wrong, yet its perplexity won't really tell you it's wrong 📉 work with @fedzbar @ccperivol @sindero and Razvan

Zhuo Sun retweeted

OpenAI@OpenAI·27 Jan

Introducing Prism, a free workspace for scientists to write and collaborate on research, powered by GPT-5.2. Available today to anyone with a ChatGPT personal account: prism.openai.com

Zhuo Sun retweeted

Xing Liu@XingLiu97·30 Dec

Robust goodness-of-fit test with kernels -- now out in JMLR! jmlr.org/papers/v26/24-… Joint work with @fx_briol

Xing Liu@XingLiu97

A new robust solution to goodness-of-fit (GOF) testing for unnormalized models! Joint work with @fx_briol. ❌Existing kernel GOF tests based on KSD are NOT robust. ✅We introduce a simple, provably robust extension that adds no extra computational cost! arxiv.org/abs/2408.05854

Zhuo Sun@JasonSun10·21 Dec

This looks fantastic!

vLLM@vllm_project

Diffusion serving is expensive: dozens of timesteps per image, and a lot of redundant compute between adjacent steps. ⚡vLLM-Omni now supports diffusion cache acceleration backends (TeaCache + Cache-DiT) to reuse intermediate Transformer computations — no retraining, minimal quality impact! 🚀Benchmarks (NVIDIA H200, Qwen-Image 1024x1024): TeaCache 1.91x, Cache-DiT 1.85x. For Qwen-Image-Edit, Cache-DiT hits 2.38x! Blog: blog.vllm.ai/2025/12/19/vll… Docs: docs.vllm.ai/projects/vllm-… #vLLM #vLLMOmni #DiffusionModels #AIInference

Zhuo Sun@JasonSun10·28 Nov

@iclr_conf It is getting noticed that review scores are undergoing significant and rapid changes within a very short time frame. Although score adjustments are normal during discussions, the current level of fluctuation seems highly unusual under the present circumstances.

ICLR@iclr_conf

Zhuo Sun retweeted