Anna Khoreva

395 posts

@anna_khoreva

Head of Applied Science @Zalando, previously Senior Research Manager @Bosch_AI, PhD @cvml_mpiinf. GenAI and Computer Vision enthusiast. Opinions are my own.

Berlin, Deutschland · Joined June 2020

420 Following · 983 Followers

Pinned Tweet

Anna Khoreva@anna_khoreva·Jan 25

Happy to share - “VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis” has been accepted to @iclr_conf #ICLR2025 Code: lnkd.in/eu6SMeKT Paper: lnkd.in/e3mbE9PP Kudos to the amazing team: @YumengLi_007 Bill Beluch @margret_keuper @isDanZhang ❤️

Anna Khoreva@anna_khoreva

Code for VSTAR is released 🚀 It enables longer video synthesis w/o re-training and allows controlling the dynamics of the synthesised video. Check it out 👇

1.1K

Anna Khoreva retweeted

Peter Holderrieth@peholderrieth·Mar 18

🚀MIT Flow Matching and Diffusion Lecture 2026 Released (diffusion.csail.mit.edu)! We just released our new MIT 2026 course on flow matching and diffusion models! We teach the full stack of modern AI image, video, protein generators - theory and practice. We include: 📺 Videos: Step-by-step derivations. 📝 Notes: Mathematically self-contained lecture notes 💻 Coding: Hands-on exercises for every component We fully improved last year's iteration and added new topics: latent spaces, diffusion transformers, building language models with discrete diffusion models. Everything is available here: diffusion.csail.mit.edu A huge thanks to Tommi Jaakkola for his support in making this class possible and Ashay Athalye (MIT SOUL) for the incredible production! Was fun to do this with @RShprints! #MachineLearning #GenerativeAI #MIT #DiffusionModels #AI

397

2.2K

527.7K

Anna Khoreva retweeted

Andrej Karpathy@karpathy·Mar 7

I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then: - the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
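The keep-if-better loop described in the tweet can be illustrated with a toy sketch. This is not the code from the autoresearch repo: `run_experiment`, `propose`, and `research_loop` are hypothetical names, the "training run" is a made-up noisy function of a single learning-rate setting, and the "agent" is a random perturbation. It only shows the shape of the loop, where a change is kept (like a commit) only when it lowers the validation loss.

```python
import random


def run_experiment(config, rng):
    # Hypothetical stand-in for one fixed-budget training run.
    # Returns a fake validation loss that is lowest near lr = 0.3.
    return (config["lr"] - 0.3) ** 2 + rng.uniform(0.0, 0.01)


def propose(config, rng):
    # Hypothetical stand-in for the agent editing the training script:
    # here it just nudges the learning rate by a small random amount.
    candidate = dict(config)
    candidate["lr"] = max(0.0, config["lr"] + rng.uniform(-0.1, 0.1))
    return candidate


def research_loop(initial_config, steps, seed=0):
    rng = random.Random(seed)
    best = initial_config
    best_loss = run_experiment(best, rng)
    history = [best_loss]  # accepted losses, analogous to accumulated commits
    for _ in range(steps):
        candidate = propose(best, rng)
        loss = run_experiment(candidate, rng)
        if loss < best_loss:  # keep the change only if validation loss improved
            best, best_loss = candidate, loss
            history.append(loss)
    return best, best_loss, history


if __name__ == "__main__":
    best, best_loss, history = research_loop({"lr": 0.9}, steps=50)
    print(f"accepted {len(history) - 1} changes, final loss {best_loss:.4f}")
```

Each entry in `history` corresponds to one dot that improved on the previous best; rejected runs are simply discarded in this sketch.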

1.1K

3.6K

28.4K

11M

Anna Khoreva retweeted

Ziwei Liu@liuziwei7·Mar 5

🚫 No Vision Encoder (VE) 🚫 No Variational Autoencoder (VAE) ✅ Just one end-to-end model directly engages with native signals, pixels and words, for both understanding and generation. 💊NEO-unify💊 is the first step toward **truly end-to-end unified models**, learning directly from near-lossless inputs via a representation space shaped by the model itself.

587

72.4K

Anna Khoreva retweeted

Jon Barron@jon_barron·Mar 4

One of the more interesting and thought-provoking research papers I've seen in a while. A system for reading and reimplementing NeRF papers, and it seems to work very well. Pretty easy to extrapolate out from here to what CVPR 2027 papers will look like. seemandhar.github.io/NERFIFY/

376

47.1K

Anna Khoreva retweeted

ana@freefallkdrama·Feb 14

thinking how unfair this whole program is where petr gumennik did 5 quads clean, didn't make any serious mistake, in top 5 had definitely the best performance AND still DIDN'T get the medal

642

9.4K

Anna Khoreva retweeted

ICLR@iclr_conf·Sep 23

We’ve received A LOT OF submissions this year 🤯🤯 and are excited to see so much interest! To ensure high-quality review, we are looking for more dedicated reviewers. If you'd like to help, please sign up here docs.google.com/forms/d/e/1FAI…

373

100.9K

Anna Khoreva retweeted

Angjoo Kanazawa@akanazawa·Aug 12

Viser completely changed the way we do research. Before viser, it was hard to visualize 3D/4D data, let alone share it. Now it’s all just in a browser! It’s amazingly powerful and looks awesome. It’s how we render our results and videos. We love it and hope you will too!

Brent Yi@brenthyi

July has been a big month for Viser! - Released v1.0.0😊 - We did some writing Some demos👇

345

23.6K

Anna Khoreva retweeted

ICLR@iclr_conf·Aug 6

Announcing the ICLR 2026 Call for Papers! Abstract submission: Sept 19 (AoE) Paper submission: Sept 24 (AoE) Reviews released: Nov 11 Author/Reviewer discussion: Nov 11-Dec 3 Final decisions: Jan 22 2026 iclr.cc/Conferences/20…

530

51.7K

Anna Khoreva retweeted

Amir Zamir@zamir_ar·Jun 30

We open-sourced the codebase of Flextok. Flextok is an image tokenizer that produces flexible-length token sequences and represents image content in a compressed coarse-to-fine way. Like in PCA: the 1st token captures the most compressed representation of the image, the 2nd token is added on top of the 1st token and adds more details, and so on. This contrasts with most common image tokenizers, which output fixed-size token sequences and often roughly align with local image content. Flexible-length, coarse-to-fine tokens are a useful and intuitive structure to model. They impact the whole pipeline of image generation and understanding. Flextok does this using simple and effective known mechanisms, e.g., applying nested dropout on tokens during training. The emerged structure looks semantic, while no language-based supervision was used anywhere. We’ll present it at #ICML25. Demo: huggingface.co/spaces/EPFL-VI… Visuals: flextok.epfl.ch Code: github.com/apple/ml-flext… @EPFL_en @Apple @ICepfl @EPFL_AI_Center

Roman Bachmann@roman__bachmann

Have you ever been bothered by the constraints of fixed-sized 2D-grid tokenizers? We present FlexTok, a flexible-length 1D tokenizer that enables autoregressive models to describe images in a coarse-to-fine manner. flextok.epfl.ch arxiv.org/abs/2502.13967 🧵 1/n
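The nested dropout mechanism mentioned above can be sketched in a few lines. This is a minimal illustration, not FlexTok's implementation: the function name `nested_dropout`, the uniform choice of prefix length, and the use of `None` as the mask value are all assumptions. The idea is that during training only a random-length prefix of the token sequence is kept, so earlier tokens are pushed to carry the coarsest information.

```python
import random


def nested_dropout(tokens, rng, mask_value=None):
    """Keep a random-length prefix of `tokens` and mask everything after it.

    The prefix length is drawn uniformly from 1..len(tokens); that
    distribution is an assumption made for this sketch.
    """
    keep = rng.randint(1, len(tokens))  # randint is inclusive on both ends
    masked = tokens[:keep] + [mask_value] * (len(tokens) - keep)
    return masked, keep


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["t1", "t2", "t3", "t4", "t5", "t6"]
    for _ in range(3):
        masked, keep = nested_dropout(tokens, rng)
        print(keep, masked)
```

Because token 1 survives every draw and token k survives only when the sampled prefix reaches it, a decoder trained on such prefixes has to reconstruct from any truncation, which is what encourages a coarse-to-fine ordering.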

463

49.7K

Anna Khoreva@anna_khoreva·Jun 26

I'm #hiring for the Sr. Applied Scientist role at @Zalando. We're looking for AI enthusiasts skilled in scaling deep learning pipelines. #career #jobs #AI Apply here 👇 Sr. Applied Scientist - Scaling Deep Learning Experimentation jobs.zalando.com/en/jobs/272085…

315

Anna Khoreva@anna_khoreva·Jun 16

I'm #hiring for 2 Sr. Applied Scientist roles at @Zalando. We're looking for exceptional talent skilled in GenAI and Vision Language Models. #career #jobs Apply here 👇 - Sr. Applied Scientist (GenAI) jobs.zalando.com/de/jobs/272030… - Sr. Applied Scientist (VLMs) jobs.zalando.com/en/jobs/272030…

592

Anna Khoreva retweeted

Adam Kortylewski 🚨 Hiring PhD students@AdamKortylewski·Apr 7

Submit your extended abstract to our workshop on "Generative Models for Computer Vision" #CVPR2025 @CVPR Authors with accepted CVPR papers are welcome to present their poster as well! Deadline: April 25th We also have an incredible speaker line-up!

12.5K

Anna Khoreva retweeted

Edgar Schoenfeld@schoenfeldedgar·Feb 28

Check out our work on fast image and video generation! Accepted at #CVPR2025

AK@_akhaliq

FlexiDiT Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

797

Anna Khoreva retweeted

Yumeng Li@YumengLi_007·Jan 25

🔥 VSTAR is accepted at #ICLR2025 ! 🤗 It's open-sourced! Check it out & Start generating long dynamic videos without fine-tuning! yumengli007.github.io/VSTAR/ ❤️ @margret_keuper @isDanZhang @anna_khoreva

Yumeng Li@YumengLi_007

🤗Code is out: github.com/boschresearch/… 🚀TL;DR: VSTAR generates longer videos with dynamic visual evolution in a single pass. No fine-tuning is needed! 🙌Check out our project page: yumengli007.github.io/VSTAR/ Bill Beluch @margret_keuper @isDanZhang @anna_khoreva @Bosch_AI ❤️

1.4K

Anna Khoreva retweeted

Andrej Karpathy@karpathy·Dec 3

The (true) story of development and inspiration behind the "attention" operator, the one in "Attention is All You Need" that introduced the Transformer. From personal email correspondence with the author @DBahdanau ~2 years ago, published here and now (with permission) following some fake news about how it was developed that circulated here over the last few days. Attention is a brilliant (data-dependent) weighted average operation. It is a form of global pooling, a reduction, communication. It is a way to aggregate relevant information from multiple nodes (tokens, image patches, etc.). It is expressive, powerful, has plenty of parallelism, and is efficiently optimizable. Even the Multilayer Perceptron (MLP) can actually be almost re-written as Attention over data-independent weights (1st layer weights are the queries, 2nd layer weights are the values, the keys are just input, and softmax becomes elementwise, deleting the normalization). TLDR Attention is awesome and a *major* unlock in neural network architecture design. It's always been a little surprising to me that the paper "Attention is All You Need" gets ~100X more err ... attention... than the paper that actually introduced Attention ~3 years earlier, by Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: "Neural Machine Translation by Jointly Learning to Align and Translate". As the name suggests, the core contribution of the Attention is All You Need paper that introduced the Transformer neural net is deleting everything *except* Attention, and basically just stacking it in a ResNet with MLPs (which can also be seen as ~attention per the above). But I do think the Transformer paper stands on its own because it adds many additional amazing ideas bundled up all together at once - positional encodings, scaled attention, multi-headed attention, the isotropic simple design, etc. 
And the Transformer has imo stuck around basically in its 2017 form to this day ~7 years later, with relatively few and minor modifications, maybe with the exception of better positional encoding schemes (RoPE and friends). Anyway, pasting the full email below, which also hints at why this operation is called "attention" in the first place - it comes from attending to words of a source sentence while emitting the words of the translation in a sequential manner, and was introduced as a term late in the process by Yoshua Bengio in place of RNNSearch (thank god? :D). It's also interesting that the design was inspired by a human cognitive process/strategy, of attending back and forth over some data sequentially. Lastly the story is quite interesting from the perspective of the nature of progress, with similar ideas and formulations "in the air", with particular mention of the work of Alex Graves (NMT) and Jason Weston (Memory Networks) around that time. Thank you for the story @DBahdanau !
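The two technical claims in the tweet, attention as a data-dependent weighted average and a two-layer MLP written in the same query/key/value form, can be illustrated with a small sketch in plain Python. The function names, the single-query setup, and the choice of ReLU as the elementwise nonlinearity are assumptions made for illustration; they are not taken from the tweet or from either paper.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, values):
    # sum_j weights[j] * values[j], computed per output dimension
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    # Scaled dot-product attention for one query: the weights depend on the
    # data (query and keys) and are normalized to sum to 1.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return weighted_sum(weights, values), weights


def mlp_as_attention(x, w1_rows, w2_rows):
    # Rows of the 1st layer play the role of queries, the input x is the key,
    # rows of the 2nd layer are the values. The softmax is replaced by an
    # elementwise ReLU, so the weights are no longer normalized.
    weights = [max(0.0, dot(q, x)) for q in w1_rows]
    return weighted_sum(weights, w2_rows)


if __name__ == "__main__":
    out, weights = attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]])
    print(out, weights)
    print(mlp_as_attention([1.0, -2.0], [[1, 0], [0, 1], [1, 1]], [[1, 0], [0, 1], [2, 2]]))
```

With two identical keys the attention weights are equal, so the output is the plain average of the two value vectors. In the MLP form only the first hidden unit is active for the sample input, so the output is just the first row of the second layer.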

133

987

6.7K

861.9K

Anna Khoreva retweeted

Sherjil Ozair@sherjilozair·Dec 3

Very happy to hear that GANs are getting the test of time award at NeurIPS 2024. The NeurIPS test of time awards are given to papers which have stood the test of time for a decade. I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.

118

972

219.9K

Anna Khoreva retweeted

Peyman Milanfar@docmilanfar·Nov 14

Reviewers take note: 57% of people rejected their own argument when they thought it was someone else's. So take it easy with the criticism.

316

1.7K

128.8K

Anna Khoreva retweeted

Thomas Kipf@tkipf·Nov 14

The world doesn’t live on a pixel grid and neither should vision models! Excited to share Moving off-the-Grid (MooG): a video model w/o grid-based representations. MooG learns detached “off-the-grid tokens” that bind to (and track) scene elements as camera & content move. 🧵

753

76.5K

Anna Khoreva retweeted

Michael Black@Michael_J_Black·Nov 8

Many people are in the middle of the @CVPR deadline. So I'm sharing my guide to writing a CVPR paper (or any paper). My students have had this for years but I haven't shared it publicly before. I hope you find it useful and write a great paper. #CVPR2025 medium.com/@black_51980/w…

169

733

72.5K
