Mark

290 posts

@_M_Weber

PhD student at the Dynamic Vision and Learning Group, and the Computer Vision Group, TU Munich; Computer Vision research

Joined November 2020

313 Following · 363 Followers

Mark @_M_Weber · 12 Jan

@CSProfKGD And at 8pm on all other days!

Kosta Derpanis @CSProfKGD · 11 Jan

Note to self: Supermarkets closed on Sundays in Munich.

Mark @_M_Weber · 29 Jul

➡️ For workshop specifics, including the evaluation package, baseline code, and more, visit: motchallenge.net/workshops/bmtt… ➡️ Eval server (sign-up now!): codabench.org/competitions/9… #ICCV2025

Mark @_M_Weber · 29 Jul

Exciting news! We're happy to announce our challenge/workshop at this year's @ICCVConference, focusing on Spatiotemporal Action Grounding in Videos. Here are the details: 🔷 Watch the video below for a demo. 🔷 The eval server is open until 09/19! 🔷 Links incl. code below. #ICCV

Mark retweeted

Tanveer Hannan @hannan_tanveer · 22 Jul

🎯 Challenge Launch Announcement We are pleased to announce the launch of the MOT25 Challenge, to be held in conjunction with ICCV 2025. 🔗 Workshop website: motchallenge.net/workshops/bmtt… 🧪 The MOT25 Challenge is now live on Codabench: codabench.org/competitions/9…

Mark @_M_Weber · 16 Jul

@KyleSargentAI Also, if you ever figure out why someone established that gFID is measured against the train (!) set statistics, while rFID is measured against the val set statistics, let me know! So what alternative can we use on ImageNet?

Kyle Sargent @KyleSargentAI · 16 Jul

The untainted NeurIPS 2024 best paper finalist VAR actually made a great point, which is that the ImageNet val set gets 1.78 gFID (on ImageNet). Basically, a superior gFID means you did better than an optimal sample of 50k images. Good job goodharting your eval metrics, I guess?
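For reference, the standard Fréchet Inception Distance fits a Gaussian to Inception features of two image sets and compares them. The gFID/rFID question in this exchange is about which split supplies the reference statistics (μ_r, Σ_r); the notation below is the usual textbook one, not taken from the posts.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Here μ_g, Σ_g come from the evaluated images (generated samples for gFID, reconstructions for rFID) and μ_r, Σ_r from the reference split. Per the post above, gFID conventionally uses ImageNet train statistics while rFID uses val statistics, so the two numbers are not measured against the same reference.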

Mark @_M_Weber · 25 Jun

@TobiasFischer11 Amazing, congrats!

Tobias Fischer @TobiasFischer11 · 25 Jun

🌊🏄

Jonathon Luiten @JonathonLuiten

Accepted to ICCV with perfect review scores!!! See you all in Hawaii!!!!

Mark @_M_Weber · 4 Jun

@drscotthawley Ohh thank you so much! I appreciate that. Hopefully there will be another time to chat at some point.

Scott H. Hawley @drscotthawley · 24 May

@_M_Weber So bummed to have missed this, only seeing it now! Would have loved to connect at ICLR, instead I'll just read the paper. Lack of resources re. training "a modern VQGAN" - I've been feeling that for months! Will add your work to upcoming IJCNN tutorial on practical latent gen.

Mark @_M_Weber · 21 Apr

Heading off to ICLR to present our work on image generation and latent spaces! If you're interested in tokenization or generation drop by at our poster. Also, if you'd like to chat about any of these topics, feel free to ping me! #ICLR2025

Mark @_M_Weber

🧵1/9 Happy to share our paper "MaskBit: Embedding-free Image Generation via Bit Tokens" got published in TMLR with featured (aka spotlight) & reproducibility certifications! I'm especially excited about the disentangled visual concepts in our shared latent space. Details below!

Mark retweeted

Nando de Freitas @NandoDF · 4 May

A beautiful article by D. Graham Burnett newyorker.com/culture/the-we… ‘The A.I. is huge. A tsunami. But it’s not me. It can’t touch my me-ness. It doesn’t know what it is to be human, to be me.’ ‘Historians have long extolled the “power of the archive.” Little did we know that the engineers would come along and plug it in. And it turns out that a huge amount of what we seek from a human person can be simulated through this Frankensteinian reanimation of our collective dead letters.’ Thanks for sharing @mustafasuleyman

Mark @_M_Weber · 15 Apr

@sedielem The disentangled latent space then allows us to directly train the generator network taking the latent bit tokens as input, in contrast to learning a new vocabulary. We found this unified representation to be very efficient and strong for generation. arxiv.org/pdf/2409.16211

Mark @_M_Weber · 15 Apr

@sedielem Thanks for the great post! In our study about training VQGANs, we made an observation that might be of interest to you and your readers. When using LFQ, our tokenizer is able to model certain visual properties (like exposure, smoothness, color palette) into different channels.
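For readers unfamiliar with lookup-free quantization (LFQ), here is a minimal sketch of the idea as it is commonly described: each latent channel is quantized independently by its sign, so a K-channel vector becomes K bits, and the token index is simply the binary number those bits spell out. There is no codebook to look up; the 2^K vocabulary is implicit. The function names, the least-significant-bit-first ordering, and mapping 0.0 to +1 are assumptions for illustration, not the MaskBit code.

```python
from typing import List, Tuple


def lfq_quantize(latent: List[float]) -> Tuple[List[int], int]:
    """Sign-quantize one latent vector, LFQ-style (illustrative sketch).

    Each channel becomes -1 or +1 by its sign (0.0 is treated as +1,
    an arbitrary choice). The token index reads the +1 channels as set
    bits, with channel 0 as the least significant bit (also an assumption).
    """
    codes = [1 if x >= 0 else -1 for x in latent]
    index = sum(1 << k for k, c in enumerate(codes) if c > 0)
    return codes, index


def index_to_codes(index: int, num_channels: int) -> List[int]:
    """Invert the mapping: recover the {-1, +1} code vector from a token index."""
    return [1 if (index >> k) & 1 else -1 for k in range(num_channels)]


if __name__ == "__main__":
    latent = [0.7, -1.2, 0.05, -0.3]
    codes, index = lfq_quantize(latent)
    print(codes, index)  # [1, -1, 1, -1] 5
    # The round trip works without any learned codebook.
    assert index_to_codes(index, len(latent)) == codes
```

Because every channel owns one bit, a property that the tokenizer happens to route into a single channel stays addressable as that one bit of every token.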

Sander Dieleman @sedielem · 15 Apr

New blog post: let's talk about latents! sander.ai/2025/04/15/lat…

Mark @_M_Weber · 21 Feb

@roman__bachmann Great work!

Roman Bachmann @roman__bachmann · 20 Feb

Have you ever been bothered by the constraints of fixed-sized 2D-grid tokenizers? We present FlexTok, a flexible-length 1D tokenizer that enables autoregressive models to describe images in a coarse-to-fine manner. flextok.epfl.ch arxiv.org/abs/2502.13967 🧵 1/n

Mark @_M_Weber · 20 Feb

@Thom_Wolf Fantastic work!

Thomas Wolf @Thom_Wolf · 19 Feb

After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook". Check it out here: hf.co/spaces/nanotro…

A free, open-source book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels, and how and why to overlap compute & communication: all scaling bottlenecks and tools introduced with motivation, theory, interactive plots from our 4000+ scaling experiments, and even NotebookLM podcasters to tag along with you.

- How was DeepSeek trained for only $5M?
- Why did Mistral train an MoE?
- Why is PyTorch's native Data Parallelism implementation so complex under the hood?
- What are all the parallelism techniques and why were they invented?
- Should I use ZeRO-3 or Pipeline Parallelism when scaling, and what's the story behind both techniques?
- What is this Context Parallelism that Meta used to train Llama 3? Is it different from Sequence Parallelism?
- What is FP8? How does it compare to BF16?

In this book, our goal was to gather, in a single place, a coherent, easy-to-read yet detailed story of all the techniques that make today's LLM scaling possible.

The largest factor for democratizing AI will always be teaching everyone how to build AI, and in particular how to create, train and fine-tune high-performance models. In other words: making accessible to everybody the techniques that power all recent large language models, of which efficient training is possibly one of the most essential.

What started as a simple blog post ended up becoming an interactive writing piece containing 30k+ words. So we've decided to actually print it as a real 100-page physical book as well: the physical ultrafast playbook, containing all the science of distributed and fast AI training. We plan to send free copies as gifts to the first readers of the online version, so feel free to add your email in the form linked in the blog post.

Mark retweeted

Bojan Pancevski @bopanc · 14 Feb

Our exclusive with @JDVance ahead of @MunSecConf:
- On Ukraine, he says there will be a good peace deal that will guarantee the country’s long-term sovereignty, and Putin will face sanctions and military measures if he doesn’t play ball.
- On Europe, he will tell mainstream leaders in Munich that they’ve become Soviet-style enemies of free speech and democracy who ignore voters and fail to stop mass migration. Some Germans will be particularly shocked when he calls for ending the firewall against the far-Right @AfD and embracing the populist vote.
With @alexbward via @WSJ wsj.com/world/europe/v…

Mark retweeted

Zhou Xian @zhou_xian_ · 19 Dec

Everything you love about generative models, now powered by real physics! Announcing the Genesis project: after a 24-month large-scale research collaboration involving over 20 research labs, a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.

Genesis's physics engine is developed in pure Python, while being 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000 times faster than real time, and takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX4090 (see tutorial: genesis-world.readthedocs.io/en/latest/user…).

The Genesis physics engine and simulation platform is fully open source at github.com/Genesis-Embodi…. We'll gradually roll out access to our generative framework in the near future.

Genesis implements a unified simulation framework all from scratch, integrating a wide spectrum of state-of-the-art physics solvers, allowing simulation of the whole physical world in a virtual realm with the highest realism.

We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI and other applications.

Open Source Code: github.com/Genesis-Embodi…
Project webpage: genesis-embodied-ai.github.io
Documentation: genesis-world.readthedocs.io
1/n

Mark @_M_Weber · 17 Dec

9/9 If you like this research and are hiring, I will be on the job market next summer! Big thanks to my collaborators: @yucornetto1 @xueqingdeng77 @lcchen_jay @tumcvg @LijunYu0

Mark @_M_Weber · 17 Dec

8/9 Project page: weber-mark.github.io/projects/maskb… Paper: arxiv.org/pdf/2409.16211 Code: github.com/markweberdev/m…

Mark @_M_Weber · 17 Dec

7/9 MaskBit achieves state-of-the-art performance with an FID as low as 1.52 on ImageNet 256×256, using just 305M parameters. That's better than prior diffusion and autoregressive models! 🔥

Mark @_M_Weber · 17 Dec

5/9 Our analysis reveals fascinating properties of bit tokens: Most channels appear to capture different visual concepts, making the representation more interpretable! Flipping individual bits leads to systematic changes in attributes like texture, color, and style. 🎨
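The bit-flipping analysis can be pictured with a small sketch: take the {-1, +1} codes of a token grid, negate a single channel at every position, and note that each token index changes in exactly one bit. In the study itself the edited codes are decoded back into images to reveal the attribute change; the decoder is out of scope here, and the helper names `flip_channel` and `codes_to_index` as well as the least-significant-bit-first ordering are hypothetical.

```python
from typing import List


def flip_channel(codes: List[List[int]], channel: int) -> List[List[int]]:
    """Return a copy of a grid of {-1, +1} codes with one channel negated everywhere.

    `codes` holds one code vector per spatial position (e.g. a flattened
    token grid). Flipping the same channel at every position mirrors the
    "flip bits in each channel" visualisation. Hypothetical helper.
    """
    return [[-c if k == channel else c for k, c in enumerate(vec)] for vec in codes]


def codes_to_index(vec: List[int]) -> int:
    """Read a {-1, +1} code vector as a binary token index (channel 0 = LSB)."""
    return sum(1 << k for k, c in enumerate(vec) if c > 0)


if __name__ == "__main__":
    grid = [[1, -1, 1, -1], [-1, -1, 1, 1]]
    flipped = flip_channel(grid, channel=1)
    for before, after in zip(grid, flipped):
        # Flipping one channel toggles exactly one bit: XOR with 1 << channel.
        assert codes_to_index(after) == codes_to_index(before) ^ (1 << 1)
    print(flipped)  # [[1, 1, 1, -1], [-1, 1, 1, 1]]
```

If a channel consistently encodes one visual attribute, this single-bit edit is what produces the systematic change described in the post.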

Mark @_M_Weber · 17 Dec

6/9 For our generation model MaskBit, the key innovation is: We are the first to utilise the SAME (bit) token representation for both tokenizer and generator, unlike prior methods that require separate embedding tables! No need to (re-)learn codebooks anymore!
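To make the contrast concrete, here is a hedged sketch, not MaskBit's implementation: a conventional generator looks each token index up in a learned table with 2^K rows, whereas an embedding-free generator can feed the K-dimensional {-1, +1} bit vector straight into its first layer. The sizes K = 12 bits and hidden width 1024 are arbitrary illustrative values, and `masked_input` with all-zero vectors for masked tokens is an assumption about how masking could be represented.

```python
from typing import List, Optional


def embedding_table_params(num_bits: int, hidden: int) -> int:
    """Parameters in a learned lookup table with one row per token (2**K rows)."""
    return (2 ** num_bits) * hidden


def bit_projection_params(num_bits: int, hidden: int) -> int:
    """Parameters in a linear layer (weights + bias) applied directly to the K bits."""
    return num_bits * hidden + hidden


def index_to_bits(index: int, num_bits: int) -> List[float]:
    """Deterministic, parameter-free map from token index to {-1, +1} bits (LSB first)."""
    return [1.0 if (index >> k) & 1 else -1.0 for k in range(num_bits)]


def masked_input(tokens: List[Optional[int]], num_bits: int) -> List[List[float]]:
    """Build generator inputs; masked tokens (None) become all-zero vectors (assumption)."""
    return [
        index_to_bits(t, num_bits) if t is not None else [0.0] * num_bits
        for t in tokens
    ]


if __name__ == "__main__":
    K, D = 12, 1024
    print(embedding_table_params(K, D))  # 4194304
    print(bit_projection_params(K, D))   # 13312
    print(masked_input([5, None], num_bits=4))
    # [[1.0, -1.0, 1.0, -1.0], [0.0, 0.0, 0.0, 0.0]]
```

The design point the sketch illustrates: the input side grows linearly with K instead of exponentially, and tokenizer and generator share the same bit representation by construction, so no separate codebook has to be learned.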

Mark @_M_Weber · 17 Dec

4/9 Our tokenizer learns a semantically structured latent space! Check out what happens when we flip bits in each channel. Having a consistent, visually interpretable latent space could be a game changer for control!

Mark @_M_Weber · 17 Dec

3/9 We carefully revisit the VQGAN design and provide a complete, reproducible recipe for building a modern tokenizer. Our VQGAN+ improves reconstruction FID from 7.94 to 1.66 in the low-vocabulary, low-resolution setting, a huge 6.28 gain! 📈 See chapter 2 for details.
