Shuyang (Kevin) Sun

46 posts

@Kevin_SSY

Researcher @GoogleDeepMind. DPhil (PhD) @OxfordTVG; Computer Vision, Multi-modal Modeling. Opinions are my own.

Joined March 2020

154 Following · 293 Followers

Shuyang (Kevin) Sun retweeted

Jon Barron @jon_barron · Apr 23

We have an important result to share: if you reduce multiple dense vision tasks into a single RGB-image-prediction task, fine-tuning a strong image generator (in our case Nano Banana Pro) matches or beats all specialized models for monodepth, normals, and semantic segmentation.

Songyou Peng @songyoupeng

Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: vision-banana.github.io (1/5)

Shuyang (Kevin) Sun retweeted

Saining Xie @sainingxie · Apr 23

vision🍌 is here vision-banana.github.io if you got into computer vision the way I did, starting with pixel-level labeling tasks like segmentation, edges, depth, or surface normals, you’ll probably feel the same seeing these results -- something big has quietly shifted, and it’s going to change how we approach these problems for good 🧵

Shuyang (Kevin) Sun @Kevin_SSY · Apr 23

Check out the website and the paper: vision-banana.github.io arxiv.org/abs/2604.20329

Shuyang (Kevin) Sun @Kevin_SSY

Are we finally witnessing the GPT-3 moment for computer vision? We just dropped Vision Banana 🍌 , a vision foundation model that seamlessly unifies generation and perception by treating all vision tasks as just another image generation problem. 1/N #googledeepmind #nanobanana

Shuyang (Kevin) Sun @Kevin_SSY · Apr 23

Vision Banana is also advised and supported by Oliver Wang, Saining Xie, Howard Zhou, Kaiming He, Thomas Funkhouser, Jean-Baptiste ALAYRAC, and Radu Soricut. N/N

Shuyang (Kevin) Sun @Kevin_SSY · Apr 23

Huge shout out to my amazing collaborators: @vgabeur, @ShangbangLong, @songyoupeng, @PaulVoigtlaend1, Yanan Bao, Karen Truong, Zhicheng Wang, Wenlei Zhou, @jon_barron, Kyle Genova, Nithish Kannen, Xue Ben, Yandong Li, Mandy Guo, Suhas Yogin, Yiming Gu, and Huizhong Chen.

Shuyang (Kevin) Sun @Kevin_SSY · Apr 23

Vision Banana achieves state-of-the-art performance under the zero-shot transfer setting across both 2D and 3D vision tasks, proving that generative vision models are the ultimate generalist foundation. 2/N

Shuyang (Kevin) Sun retweeted

Shangbang Long @ShangbangLong · Apr 23

🚀 Excited to announce Vision Banana 🍌 and our new paper: “Image Generators are Generalist Vision Learners”. We turn Nano Banana Pro into a state-of-the-art visual generation and understanding model. 🖼️ Check out our gallery at vision-banana.github.io 🧵 (1/N) continue ⬇️

Shuyang (Kevin) Sun @Kevin_SSY · Nov 18

Gemini is back

Demis Hassabis @demishassabis

We’ve been intensely cooking Gemini 3 for a while now, and we’re so excited and proud to share the results with you all. Of course it tops the leaderboards, including @arena, HLE, GPQA etc, but beyond the benchmarks it’s been by far my favourite model to use for its style and depth, and what it can do to help with everyday tasks.

Shuyang (Kevin) Sun @Kevin_SSY · Nov 11

Great work from our student researcher Jiageng Mao @PointsCoder to enable scalable robot learning by imitating AI-generated videos.

Shuyang (Kevin) Sun retweeted

Google DeepMind @GoogleDeepMind · Aug 5

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

Shuyang (Kevin) Sun retweeted

Accepted papers at TMLR @TmlrPub · Aug 11

kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies. Zhongrui Gui, Shuyang Sun, Runjia Li et al. Action editor: Yu-Xiong Wang. openreview.net/forum?id=ZSqP1… #forgetting #vocabularies #memory

Shuyang (Kevin) Sun @Kevin_SSY · Jun 28

@csaba_botos @fhuszar @oiwi3000 Life advice: don't jump into the river while wearing a gown😅

Botos Csabi @csaba_botos · Jun 28

@fhuszar @oiwi3000 Thank you :) this was the perfect ending of one of the biggest chapters in my life. To advertise the alleged demos: botcs.github.io/dopanet botcs.github.io/label-delay/de… Which is going to be turned into a series of YouTube tutorials over the summer!

Ferenc Huszár @fhuszar · Jun 28

Great day in Oxford joining forces with @oiwi3000 as co-examiners for the DPhil viva of @csaba_botos Congrats Csaba for the great thesis and amazing live demos.

Shuyang (Kevin) Sun @Kevin_SSY · Jun 20

Project page: torrvision.com/clip_as_rnn/
Code: github.com/kevin-ssy/CLIP…
CPU Demo: huggingface.co/spaces/kevinss…
Work done with @RunjiaLi, @philiptorr, @laoreja001, and Siyang Li from University of Oxford and Google DeepMind.

Shuyang (Kevin) Sun @Kevin_SSY · Jun 20

I will be presenting our paper "CLIP as RNN: Segment Countless Visual Concepts without Training Effort" at @CVPR poster #341, today (June 20th) at 10:30 am. If you're interested in open-vocabulary segmentation and multi-modal learning, please come and chat with me!

Shuyang (Kevin) Sun retweeted

Ilya Sutskever @ilyasut · May 15

After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the excellent research leadership of @merettm. It was an honor and a privilege to have worked together, and I will miss everyone dearly. So long, and thanks for everything. I am excited for what comes next — a project that is very personally meaningful to me about which I will share details in due time.

Shuyang (Kevin) Sun @Kevin_SSY · May 9

Unfortunately, I won't be able to attend ICLR in person due to visa issues. However, if you're keen on using synthetic data for training, I'd love for you to visit our Real-Fake poster on Friday morning!

Shuyang (Kevin) Sun retweeted

Shashwat Goel @ShashwatGoel7 · Feb 23

You find some training data sources were compromised, potentially causing #backdoors, #bias, #mislabels etc. Can you remove its influence from previously trained #ML models instead of stopping their use? New work with @AmyPrb @AmartyaSanyal @ponguru @OxfordTVG studies this🧵👇

Shuyang (Kevin) Sun @Kevin_SSY · Feb 11

It's also a great journey collaborating with Jianhao @DYDYYDYYYD. Please broadcast and reach out if you're interested! @_akhaliq @OxfordTVG

Shuyang (Kevin) Sun @Kevin_SSY · Feb 11

Another paper on using synthetic data for better image recognition. Distribution matching may not be all you need but it's very helpful to bridge the real-synthetic domain gap.

Jianhao Yuan @DYDYYDYYYD

🧐What makes good synthetic training data? 🚀Discover our #ICLR2024 work Real-Fake🚀Effective Training Data Synthesis Through Distribution Matching 🔥🔥🔥 🧵1/n 📝Paper: arxiv.org/abs/2310.10402 🔍Project: torrvision.com/realfake/ 📊 Synthetic Dataset: huggingface.co/datasets/Jianh…
