Mark

290 posts

@_M_Weber

PhD student at the Dynamic Vision and Learning Group, and the Computer Vision Group, TU Munich; Computer Vision research

Joined November 2020
313 Following · 363 Followers
Mark
Mark@_M_Weber·
@CSProfKGD And at 8pm on all other days!
Kosta Derpanis
Kosta Derpanis@CSProfKGD·
Note to self: Supermarkets closed on Sundays in Munich.
Mark
Mark@_M_Weber·
Exciting news! We're happy to announce our challenge / workshop at this year's @ICCVConference focusing on Spatiotemporal Action Grounding in Videos. Here are the details: 🔷 Watch the video below for a demo. 🔷 The eval server is open until 09/19! 🔷 Links incl. code below. #ICCV
[media attached]
Mark
Mark@_M_Weber·
@KyleSargentAI Also, if you ever figure out why someone established that gFID is measured against the train (!) set statistics, while rFID is measured against val set statistics, let me know! So, what alternative can we use on ImageNet?
Kyle Sargent
Kyle Sargent@KyleSargentAI·
The untainted NeurIPS 2024 best paper finalist VAR actually made a great point, which is that the ImageNet val set itself gets 1.78 gFID on ImageNet. Basically, a superior gFID means you did better than an optimal sample of 50k images. Good job goodharting your eval metrics, I guess?
[media attached]
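For context on the metric debated above: FID compares the Gaussian statistics (mean and covariance) of Inception features from two image sets, so the number depends entirely on which reference set (train vs. val) supplies the second pair of statistics. A minimal sketch, assuming the feature statistics have already been extracted:

```python
# Minimal FID sketch (illustrative, not the reference implementation).
# Assumes (mu, sigma) pairs of Inception-feature statistics are precomputed
# for the generated set and for whichever reference set you choose.
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """FID = ||mu1 - mu2||^2 + Tr(s1 + s2 - 2 * sqrt(s1 @ s2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Swapping `mu2, sigma2` from train-set to val-set statistics is exactly the gFID-vs-rFID discrepancy the thread complains about: the formula is the same, only the reference statistics change.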
Mark
Mark@_M_Weber·
@drscotthawley Ohh thank you so much! I appreciate that. Hopefully there will be another time to chat at some point.
Scott H. Hawley
Scott H. Hawley@drscotthawley·
@_M_Weber So bummed to have missed this, only seeing it now! Would have loved to connect at ICLR, instead I'll just read the paper. Lack of resources re. training "a modern VQGAN" - I've been feeling that for months! Will add your work to upcoming IJCNN tutorial on practical latent gen.
Mark
Mark@_M_Weber·
Heading off to ICLR to present our work on image generation and latent spaces! If you're interested in tokenization or generation, drop by our poster. Also, if you'd like to chat about any of these topics, feel free to ping me! #ICLR2025
Mark@_M_Weber

🧵1/9 Happy to share our paper "MaskBit: Embedding-free Image Generation via Bit Tokens" got published in TMLR with featured (aka spotlight) & reproducibility certifications! I'm especially excited about the disentangled visual concepts in our shared latent space. Details below!

Mark reposted
Nando de Freitas
Nando de Freitas@NandoDF·
A beautiful article by D. Graham Burnett newyorker.com/culture/the-we… 'The A.I. is huge. A tsunami. But it's not me. It can't touch my me-ness. It doesn't know what it is to be human, to be me.' 'Historians have long extolled the "power of the archive." Little did we know that the engineers would come along and plug it in. And it turns out that a huge amount of what we seek from a human person can be simulated through this Frankensteinian reanimation of our collective dead letters.' Thanks for sharing @mustafasuleyman
[media attached]
Mark
Mark@_M_Weber·
@sedielem The disentangled latent space then allows us to directly train the generator network taking the latent bit tokens as input, in contrast to learning a new vocabulary. We found this unified representation to be very efficient and strong for generation. arxiv.org/pdf/2409.16211
Mark
Mark@_M_Weber·
@sedielem Thanks for the great post! In our study on training VQGANs, we made an observation that might be of interest to you and your readers. When using LFQ, our tokenizer is able to model certain visual properties (like exposure, smoothness, color palette) in different channels.
[media attached]
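The core of the LFQ idea referenced in this tweet can be sketched in a few lines. This is a hypothetical, assumption-level illustration (not the authors' code): each latent channel is quantized independently to {-1, +1} by its sign, so a K-channel latent becomes a K-bit token with no learned codebook or embedding table.

```python
# Sketch of lookup-free quantization (LFQ), hypothetical illustration only.
# Each latent channel is binarized by sign; the resulting K bits index the
# token directly, so no codebook lookup or embedding table is needed.
import numpy as np

def lfq_quantize(z):
    """Quantize each channel to +/-1; return codes and the integer bit token."""
    codes = np.where(z >= 0, 1.0, -1.0)      # per-channel binarization
    bits = (codes > 0).astype(int)           # map {-1, +1} -> {0, 1}
    token = int("".join(map(str, bits)), 2)  # K bits -> one integer token
    return codes, token
```

For example, a 3-channel latent `[0.3, -1.2, 0.7]` becomes the codes `[+1, -1, +1]`, i.e. bits `101` and token 5. Because each bit corresponds to one fixed channel, per-channel visual properties (exposure, smoothness, color palette) can plausibly be probed channel by channel, as the tweet describes.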
Roman Bachmann
Roman Bachmann@roman__bachmann·
Have you ever been bothered by the constraints of fixed-sized 2D-grid tokenizers? We present FlexTok, a flexible-length 1D tokenizer that enables autoregressive models to describe images in a coarse-to-fine manner. flextok.epfl.ch arxiv.org/abs/2502.13967 🧵 1/n
[media attached]
Thomas Wolf
Thomas Wolf@Thom_Wolf·
After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook". Check it out here: hf.co/spaces/nanotro…

A free, open-source book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels, and how and why to overlap compute & communication. All scaling bottlenecks and tools are introduced with motivation, theory, interactive plots from our 4000+ scaling experiments, and even NotebookLM podcasters to tag along with you.

- How was DeepSeek trained for only $5M?
- Why did Mistral train an MoE?
- Why is PyTorch's native Data Parallelism implementation so complex under the hood?
- What are all the parallelism techniques, and why were they invented?
- Should I use ZeRO-3 or Pipeline Parallelism when scaling, and what's the story behind both techniques?
- What is this Context Parallelism that Meta used to train Llama 3? Is it different from Sequence Parallelism?
- What is FP8? How does it compare to BF16?

In this book, our goal was to gather, in a single place, a coherent, easy-to-read yet detailed story of all the techniques that make today's LLM scaling possible. The largest factor in democratizing AI will always be teaching everyone how to build AI, and in particular how to create, train, and fine-tune high-performance models. In other words, making the techniques that power all recent large language models accessible to everybody; efficient training is possibly one of the most essential of them.

What started as a simple blog post ended up becoming an interactive writing piece containing 30k+ words. So we've decided to actually print it as a real 100-page physical book as well: the physical ultrafast playbook, containing all the science of distributed and fast AI training. We plan to send free copies as gifts to the first readers of the online version, so feel free to add your email in the form linked in the blog post.
[media attached]
Mark reposted
Bojan Pancevski
Bojan Pancevski@bopanc·
Our exclusive with @JDVance ahead of @MunSecConf: -On Ukraine, he says there will be a good peace deal that will guarantee the country’s long-term sovereignty - and Putin will face sanctions and military measures if he doesn’t play ball. -On Europe, he will tell mainstream leaders in Munich that they’ve become Soviet-style enemies of free speech and democracy who ignore voters and fail to stop mass migration. Some Germans will be particularly shocked when he calls for ending the firewall against the far-Right @AfD and embracing the populist vote. With @alexbward via @WSJ wsj.com/world/europe/v…
Mark reposted
Zhou Xian
Zhou Xian@zhou_xian_·
Everything you love about generative models, now powered by real physics!

Announcing the Genesis project: after a 24-month large-scale research collaboration involving over 20 research labs, a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications.

Genesis's physics engine is developed in pure Python, while being 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000x faster than real time, and takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX 4090 (see tutorial: genesis-world.readthedocs.io/en/latest/user…). The Genesis physics engine and simulation platform is fully open source at github.com/Genesis-Embodi…. We'll gradually roll out access to our generative framework in the near future.

Genesis implements a unified simulation framework from scratch, integrating a wide spectrum of state-of-the-art physics solvers, allowing simulation of the whole physical world in a virtual realm with the highest realism. We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI, and other applications.

Open Source Code: github.com/Genesis-Embodi…
Project webpage: genesis-embodied-ai.github.io
Documentation: genesis-world.readthedocs.io

1/n
Mark
Mark@_M_Weber·
7/9 MaskBit achieves state-of-the-art performance with up to 1.52 FID on ImageNet 256×256, using just 305M parameters. That's better than prior diffusion and autoregressive models! 🔥
Mark
Mark@_M_Weber·
5/9 Our analysis reveals fascinating properties of bit tokens: Most channels appear to capture different visual concepts, making the representation more interpretable! Flipping individual bits leads to systematic changes in attributes like texture, color, and style. 🎨
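The bit-flip probing described above has a simple mechanical core: if a token is an integer whose K bits are the per-channel codes, then "editing one visual concept" amounts to XOR-ing a single bit. A tiny hypothetical illustration (not the authors' code; decoding the flipped token through the tokenizer's decoder is assumed):

```python
# Hypothetical sketch of single-bit editing of a bit token.
# A K-bit token stores one code per channel; flipping one bit changes
# exactly one channel, which the tweet links to one visual attribute.
def flip_bit(token: int, channel: int, num_bits: int) -> int:
    """Flip the bit for one channel (channel 0 = most significant bit)."""
    return token ^ (1 << (num_bits - 1 - channel))

# Example: token 0b101 with the middle channel flipped becomes 0b111.
```

Because XOR is its own inverse, flipping the same bit twice recovers the original token, which makes this kind of attribute probing cheap and reversible.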
Mark
Mark@_M_Weber·
6/9 For our generation model MaskBit, the key innovation is: We are the first to utilise the SAME (bit) token representation for both tokenizer and generator, unlike prior methods that require separate embedding tables! No need to (re-)learn codebooks anymore!
Mark
Mark@_M_Weber·
4/9 Our tokenizer learns a semantically structured latent space! Check out what happens when we flip bits in each channel. Having a consistent, visually interpretable latent space could be a game changer for control!
Mark
Mark@_M_Weber·
3/9 We carefully revisit the VQGAN design and provide a complete, reproducible recipe for building a modern tokenizer. Our VQGAN+ improves reconstruction FID from 7.94 to 1.66 in the low-vocabulary, low-resolution setting, a huge 6.28-point gain! 📈 See chapter 2 for details.