Mez

4.9K posts

@mez_gebre

Palo Alto, CA · Joined July 2009
213 Following · 336 Followers
Mez retweeted
Songyou Peng (@songyoupeng)
Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: vision-banana.github.io (1/5)
56
309
2.2K
280K
Mez retweeted
SkalskiP (@skalskip92)
RF-DETR + Trackers is such a strong open-source combo. I fine-tuned RF-DETR on the VisDrone dataset and plugged in the OC-SORT tracker; now I’m going to build some cool smart city demos. Link: github.com/roboflow/track…
12
106
722
102.7K
Mez retweeted
Hugging Models (@HuggingModels)
Somebody just trained an LTX 2.3 LoRA of George Costanza at home on a 5090 in about a day with AI Toolkit. Then generated a 30-second video with ComfyUI on the same setup in just 6 minutes. Open source is, always has been, and always will be, the future of generative AI.
31
84
869
81.8K
Mez retweeted
Jia-Bin Huang (@jbhuang0604)
A great example that medium shapes impact. A research paper on arXiv 11 months ago: 👉 2 citations so far An accessible blog post one day ago: 👉 12 M views, instant community adoption
Google Research (@GoogleResearch)

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

29
82
1.1K
162.2K
Mez retweeted
Hugging Models (@HuggingModels)
Qwen3.5 0.8B running real-time video captioning on a Mac Studio M2 Ultra. <1s per frame. 269 frames from a 3m49s video. Streaming descriptions as it plays. Pause anywhere, it actually understands the scene. ~1GB model. Local AI is getting unreasonably capable. Video credit: @stevibe
54
284
3K
264.7K
Mez retweeted
Peter Holderrieth (@peholderrieth)
🚀 MIT Flow Matching and Diffusion Lecture 2026 Released (diffusion.csail.mit.edu)!
We just released our new MIT 2026 course on flow matching and diffusion models! We teach the full stack of modern AI image, video, and protein generators - theory and practice. We include:
📺 Videos: Step-by-step derivations.
📝 Notes: Mathematically self-contained lecture notes.
💻 Coding: Hands-on exercises for every component.
We fully improved last year's iteration and added new topics: latent spaces, diffusion transformers, and building language models with discrete diffusion models.
Everything is available here: diffusion.csail.mit.edu
A huge thanks to Tommi Jaakkola for his support in making this class possible and Ashay Athalye (MIT SOUL) for the incredible production! Was fun to do this with @RShprints!
#MachineLearning #GenerativeAI #MIT #DiffusionModels #AI
15
396
2.3K
528.1K
Mez retweeted
alphaXiv (@askalphaxiv)
Yann LeCun and his team dropped yet another paper! "V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning" In this V-JEPA upgrade, they showed that if you make a video model predict every patch, not just the masked ones, AND do so at multiple layers, it can turn vague scene understanding into dense, temporally stable features that actually capture "what is where". This key insight drove improvements in segmentation, depth, anticipation, and even robot planning.
32
220
1.4K
121.9K
Mez retweeted
Nataniel Ruiz (@natanielruizg)
Excited to show some surprising inventions on generative multiplayer games we made at Google with Stanford. We call the work MultiGen.
I've always been inspired by early studios like id Software with Doom or Blizzard with Warcraft bringing networked video games to the next level. We are at the point in history where we can make strides like them, but for generative games. It's a strange feeling to be in the age of generative video games while still discovering how exactly to train the models and design the tools that make them useful.
All of the tools that have been invented for classic game engines need to be redesigned for generative games. For example, level and world design is not entirely possible with existing technology. We introduce editable memory to diffusion game engines, which allows for the design of new levels via a minimap. But we can easily imagine how this can be expanded with different creation tools. The end goal of this research direction is to allow game designers to guide the generation process of their world at the granularity they prefer.
Editable memory also allows us to add multiplayer to Generative Doom. We were amazed when we saw GameNGen some years ago, and now you can play it live with friends in real time, on your couch or even online. Shared representations like our editable memory seem like the future for this type of experience.
Models are, in some cases, expensive and approximate encoders but great interpolators and extrapolators. Leveraging their strengths lets you have completely new experiences that can be realized now and not in the distant future.
This work was started at my previous team and continued in collaboration with Stanford. Congratulations to all for the discoveries.
32
78
577
103.9K
Mez (@mez_gebre)
I've been messing around with different generative models around flow matching. Variational Rectified Flow Matching is a cool variant that solves the mean collapse issue with multi-modal target distributions!
0
0
0
27
Mez (@mez_gebre)
Chatting with a friend working on full-duplex audio models (audio in → audio out) got me curious about how to work with audio. Did a weekend of experiments using audio classification as a “hello world” to learn the space. Notes + deep dive-ish 👇 mez.sh/2026/02/17/aud…
0
0
0
18
Mez retweeted
Sakana AI (@SakanaAILabs)
We’re excited to introduce Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible. pub.sakana.ai/doc-to-lora/
By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.
Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.
To bypass these limitations, our work focuses on the concept of cost amortization. We pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.
In our experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights. Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates.
This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs. We have released our code and papers for the community to explore.
Doc-to-LoRA Paper: arxiv.org/abs/2602.15902 Code: github.com/SakanaAI/Doc-t…
Text-to-LoRA Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…
74
349
2.2K
604.8K
Mez retweeted
Modular (@Modular)
Mojo🔥 is now available for download locally to your machine! ❤️‍🔥🚀  Beyond a compiler, the Mojo SDK includes a full set of developer and IDE tools 🛠 that make it easy to build and iterate on Mojo applications. Let’s build the future together!🔥 modular.com/blog/mojo-its-…
57
422
1.7K
428.3K
Mez retweeted
Paul Graham (@paulg)
"Sperm count appeared to have declined 52 per cent in 38 years, or something over 1 per cent a year." ft.com/content/f14ab2…
156
352
2.1K
995K
Mez retweeted
Brendan Dolan-Gavitt
ChatGPT exploits a buffer overflow 😳
68
947
5.8K
0
Mez retweeted
Jens Axboe (@axboe)
"Running a successful open source project is just Good Will Hunting in reverse, where you start out as a respected genius and end up being a janitor who gets into fights." Quote attributed to @cra, and I don't think I've ever seen anything more true posted.
34
787
4.5K
0
Mez retweeted
Paul Graham (@paulg)
Effective organizations are unnatural. The natural state of organizations is bureaucracy and turf wars, and once deprived of effective leadership they revert to their natural state with shocking speed.
89
459
3.8K
0
Mez retweeted
Bojan Tunguz (@tunguz)
A very good paper I came across this morning by the @DeepMind researchers. For the past five years Transformers have been one of the most dominant approaches to Deep Learning problems, especially in the #NLP domain. 1/5
12
187
1.1K
0