Anna Khoreva

395 posts

@anna_khoreva

Head of Applied Science @Zalando, previously Senior Research Manager @Bosch_AI, PhD @cvml_mpiinf. GenAI and Computer Vision enthusiast. Opinions are my own.

Berlin, Germany · Joined June 2020
420 Following · 983 Followers
Pinned Tweet
Anna Khoreva @anna_khoreva
Happy to share - “VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis” has been accepted to @iclr_conf #ICLR2025 Code: lnkd.in/eu6SMeKT Paper: lnkd.in/e3mbE9PP Kudos to the amazing team: @YumengLi_007 Bill Beluch @margret_keuper @isDanZhang ❤️
Anna Khoreva @anna_khoreva

Code for VSTAR is released 🚀 It enables longer video synthesis w/o re-training and allows control over the dynamics of the synthesised video. Check it out 👇

Replies 1 · Retweets 2 · Likes 13 · Views 1.1K
Anna Khoreva retweeted
Peter Holderrieth @peholderrieth
🚀MIT Flow Matching and Diffusion Lecture 2026 Released (diffusion.csail.mit.edu)! We just released our new MIT 2026 course on flow matching and diffusion models! We teach the full stack of modern AI image, video, protein generators - theory and practice. We include: 📺 Videos: Step-by-step derivations. 📝 Notes: Mathematically self-contained lecture notes. 💻 Coding: Hands-on exercises for every component. We thoroughly improved last year's iteration and added new topics: latent spaces, diffusion transformers, building language models with discrete diffusion models. Everything is available here: diffusion.csail.mit.edu A huge thanks to Tommi Jaakkola for his support in making this class possible and Ashay Athalye (MIT SOUL) for the incredible production! Was fun to do this with @RShprints! #MachineLearning #GenerativeAI #MIT #DiffusionModels #AI
[Media]
Replies 15 · Retweets 397 · Likes 2.2K · Views 527.7K
Anna Khoreva retweeted
Andrej Karpathy @karpathy
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
[Media]
Replies 1.1K · Retweets 3.6K · Likes 28.4K · Views 11M
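The loop described in the post above (propose a change, run a fixed-budget experiment, keep the change only if validation loss drops) can be sketched in a few lines. This is a toy illustration and not code from the autoresearch repo: `run_experiment`, `propose`, and `research_loop` are hypothetical names, the "training run" is a stand-in quadratic, and the "agent" is a random perturbation.

```python
import random

def run_experiment(config):
    """Stand-in for one fixed-budget training run; returns a validation loss.

    Hypothetical toy objective. A real run would train a model for the
    full time budget and report its final validation loss.
    """
    return (config["lr"] - 0.3) ** 2 + (config["width"] - 0.7) ** 2

def propose(config, rng):
    """Stand-in for the agent's edit: perturb one setting at random."""
    new = dict(config)
    key = rng.choice(sorted(new))
    new[key] += rng.uniform(-0.1, 0.1)
    return new

def research_loop(initial, steps, seed=0):
    """Greedy search: accept a candidate only if it lowers the loss."""
    rng = random.Random(seed)
    best, best_loss = initial, run_experiment(initial)
    history = [best_loss]  # plays the role of the accumulated commits
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = run_experiment(candidate)
        if loss < best_loss:  # keep ("commit") only improvements
            best, best_loss = candidate, loss
            history.append(loss)
    return best, best_loss, history

best, best_loss, history = research_loop({"lr": 0.0, "width": 0.0}, steps=200)
```

Each entry in `history` corresponds to one accepted change, so the list is strictly decreasing by construction. Comparing different prompts or agents, as the post suggests, would amount to comparing such histories.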
Anna Khoreva retweeted
Ziwei Liu @liuziwei7
🚫 No Vision Encoder (VE) 🚫 No Variational Autoencoder (VAE) ✅ Just one end-to-end model directly engages with native signals, pixels and words, for both understanding and generation. 💊NEO-unify💊 is the first step toward **truly end-to-end unified models**, learning directly from near-lossless inputs via a representation space shaped by the model itself.
[Media]
Replies 11 · Retweets 84 · Likes 587 · Views 72.4K
Anna Khoreva retweeted
Jon Barron @jon_barron
One of the more interesting and thought-provoking research papers I've seen in a while. A system for reading and reimplementing NeRF papers, and it seems to work very well. Pretty easy to extrapolate out from here to what CVPR 2027 papers will look like. seemandhar.github.io/NERFIFY/
[Media]
Replies 6 · Retweets 56 · Likes 376 · Views 47.1K
Anna Khoreva retweeted
ana @freefallkdrama
thinking how unfair this whole program is where petr gumennik did 5 quads clean, didn't make any serious mistake, in top 5 had definitely the best performance AND still DIDN'T get the medal
Replies 4 · Retweets 36 · Likes 642 · Views 9.4K
Anna Khoreva retweeted
ICLR @iclr_conf
We’ve received A LOT OF submissions this year 🤯🤯 and are excited to see so much interest! To ensure high-quality review, we are looking for more dedicated reviewers. If you'd like to help, please sign up here docs.google.com/forms/d/e/1FAI…
Replies 12 · Retweets 71 · Likes 373 · Views 100.9K
Anna Khoreva retweeted
ICLR @iclr_conf
Announcing the ICLR 2026 Call for Papers! Abstract submission: Sept 19 (AoE) Paper submission: Sept 24 (AoE) Reviews released: Nov 11 Author/Reviewer discussion: Nov 11-Dec 3 Final decisions: Jan 22 2026 iclr.cc/Conferences/20…
Replies 3 · Retweets 62 · Likes 530 · Views 51.7K
Anna Khoreva retweeted
Amir Zamir @zamir_ar
We open-sourced the codebase of FlexTok. FlexTok is an image tokenizer that produces flexible-length token sequences and represents image content in a compressed coarse-to-fine way. Like in PCA: the 1st token captures the most compressed representation of the image, the 2nd token is added on top of the 1st token and adds more details, and so on. This contrasts with most common image tokenizers, which output fixed-size token sequences and often roughly align with local image content. Flexible-length, coarse-to-fine tokens are a useful and intuitive structure to model. They impact the whole pipeline of image generation and understanding. FlexTok does this using simple and effective known mechanisms, e.g., applying nested dropout on tokens during training. The emergent structure looks semantic, while no language-based supervision was used anywhere. We’ll present it at #ICML25. Demo: huggingface.co/spaces/EPFL-VI… Visuals: flextok.epfl.ch Code: github.com/apple/ml-flext… @EPFL_en @Apple @ICepfl @EPFL_AI_Center
[Media]
Roman Bachmann @roman__bachmann

Have you ever been bothered by the constraints of fixed-sized 2D-grid tokenizers? We present FlexTok, a flexible-length 1D tokenizer that enables autoregressive models to describe images in a coarse-to-fine manner. flextok.epfl.ch arxiv.org/abs/2502.13967 🧵 1/n

Replies 6 · Retweets 82 · Likes 463 · Views 49.7K
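The nested dropout mentioned in the post above can be illustrated with a generic sketch: during training, a random-length prefix of the token sequence is kept and everything after it is masked. This is not FlexTok's code; the function name, the `mask_value` placeholder, and the uniform choice of cut point are assumptions made for illustration.

```python
import random

def nested_dropout(tokens, rng, mask_value=None):
    """Keep a random-length prefix of `tokens` and mask everything after it.

    Sampling the cut point k uniformly from 1..len(tokens) is an assumption
    made for this sketch; other schedules are possible.
    """
    k = rng.randint(1, len(tokens))  # inclusive on both ends
    masked = tokens[:k] + [mask_value] * (len(tokens) - k)
    return masked, k

rng = random.Random(0)
tokens = [f"t{i}" for i in range(8)]  # hypothetical token sequence
masked, k = nested_dropout(tokens, rng)
```

Because the first token always survives and later tokens are dropped progressively more often, a model trained this way is pushed to put the coarsest information in the earliest tokens, which matches the coarse-to-fine ordering the post describes.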
Anna Khoreva retweeted
Adam Kortylewski 🚨 Hiring PhD students
Submit your extended abstract to our workshop on "Generative Models for Computer Vision" #CVPR2025 @CVPR Authors with accepted CVPR papers are welcome to present their poster as well! Deadline: April 25th We also have an incredible speaker line-up!
[Media]
Replies 4 · Retweets 13 · Likes 74 · Views 12.5K
Anna Khoreva retweeted
Yumeng Li @YumengLi_007
🔥 VSTAR is accepted at #ICLR2025 ! 🤗 It's open-sourced! Check it out & Start generating long dynamic videos without fine-tuning! yumengli007.github.io/VSTAR/ ❤️ @margret_keuper @isDanZhang @anna_khoreva
Yumeng Li @YumengLi_007

🤗Code is out: github.com/boschresearch/… 🚀TL;DR: VSTAR generates longer videos with dynamic visual evolution in a single pass. No fine-tuning is needed! 🙌Check out our project page: yumengli007.github.io/VSTAR/ Bill Beluch @margret_keuper @isDanZhang @anna_khoreva @Bosch_AI ❤️

Replies 1 · Retweets 4 · Likes 17 · Views 1.4K
Anna Khoreva retweeted
Andrej Karpathy @karpathy
The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All You Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following some fake news about how it was developed that circulated here over the last few days. Attention is a brilliant (data-dependent) weighted average operation. It is a form of global pooling, a reduction, communication. It is a way to aggregate relevant information from multiple nodes (tokens, image patches, etc.). It is expressive, powerful, has plenty of parallelism, and is efficiently optimizable. Even the Multilayer Perceptron (MLP) can actually be almost re-written as Attention over data-independent weights (1st layer weights are the queries, 2nd layer weights are the values, the keys are just input, and softmax becomes elementwise, deleting the normalization). TLDR Attention is awesome and a *major* unlock in neural network architecture design. It's always been a little surprising to me that the paper "Attention is All You Need" gets ~100X more err ... attention... than the paper that actually introduced Attention ~3 years earlier, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: "Neural Machine Translation by Jointly Learning to Align and Translate". As the name suggests, the core contribution of the Attention is All You Need paper that introduced the Transformer neural net is deleting everything *except* Attention, and basically just stacking it in a ResNet with MLPs (which can also be seen as ~attention per the above). But I do think the Transformer paper stands on its own because it adds many additional amazing ideas bundled up all together at once - positional encodings, scaled attention, multi-headed attention, the isotropic simple design, etc.
And the Transformer has imo stuck around basically in its 2017 form to this day ~7 years later, with relatively few and minor modifications, maybe with the exception of better positional encoding schemes (RoPE and friends). Anyway, pasting the full email below, which also hints at why this operation is called "attention" in the first place - it comes from attending to words of a source sentence while emitting the words of the translation in a sequential manner, and was introduced as a term late in the process by Yoshua Bengio in place of RNNSearch (thank god? :D). It's also interesting that the design was inspired by a human cognitive process/strategy, of attending back and forth over some data sequentially. Lastly the story is quite interesting from the perspective of the nature of progress, with similar ideas and formulations "in the air", with particular mention of the work of Alex Graves (NMT) and Jason Weston (Memory Networks) around that time. Thank you for the story @DBahdanau !
[Media]
Replies 133 · Retweets 987 · Likes 6.7K · Views 861.9K
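The description of attention as a data-dependent weighted average can be made concrete with a small sketch. The version below uses scaled dot-product scoring for a single query, which is the Transformer-style form; the original Bahdanau et al. formulation scored query-key pairs with a small additive network instead. The toy keys, values, and query are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Single-query scaled dot-product attention.

    Returns (output, weights). The output is a weighted average of `values`;
    the weights depend on the data through the query-key dot products.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Toy data (hypothetical): three tokens with 2-d keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

output, weights = attention(query, keys, values)
```

Here the first and third keys match the query equally well, so they receive equal weight and the second key receives less. The output is simply the correspondingly weighted average of the value vectors, which is the "global pooling" view from the post.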
Anna Khoreva retweeted
Sherjil Ozair @sherjilozair
Very happy to hear that GANs are getting the test of time award at NeurIPS 2024. The NeurIPS test of time awards are given to papers which have stood the test of time for a decade. I took some time to reminisce how GANs came about and how AI has evolved in the last decade.
Replies 17 · Retweets 118 · Likes 972 · Views 219.9K
Anna Khoreva retweeted
Peyman Milanfar @docmilanfar
Reviewers take note: 57% of people rejected their own argument when they thought it was someone else's. So take it easy with the criticism.
[Media]
Replies 29 · Retweets 316 · Likes 1.7K · Views 128.8K
Anna Khoreva retweeted
Thomas Kipf @tkipf
The world doesn’t live on a pixel grid and neither should vision models! Excited to share Moving off-the-Grid (MooG): a video model w/o grid-based representations. MooG learns detached “off-the-grid tokens” that bind to (and track) scene elements as camera & content move. 🧵
Replies 10 · Retweets 89 · Likes 753 · Views 76.5K
Anna Khoreva retweeted
Michael Black @Michael_J_Black
Many people are in the middle of the @CVPR deadline. So I'm sharing my guide to writing a CVPR paper (or any paper). My students have had this for years but I haven't shared it publicly before. I hope you find it useful and write a great paper. #CVPR2025 medium.com/@black_51980/writing-a-good-scientific-paper-c0f8af480c91
Replies 15 · Retweets 169 · Likes 733 · Views 72.5K