
Ruben Tous
@rubentous1
Associate Professor at the Department of Computer Architecture of the Universitat Politècnica de Catalunya (UPC)

Why are LLMs non-deterministic? What stops the model from traversing the neurons in the same order given the same input each time?
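A common two-part answer: sampling is stochastic by design, and even "deterministic" greedy decoding can drift because parallel floating-point reductions on GPUs sum in varying orders. A minimal toy sketch of both effects (illustrative only, not any real inference stack):

```python
import math
import random

# (1) Stochastic decoding: identical logits, potentially different draws.
def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token from softmax(logits / temperature)."""
    weights = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # numerical fallback

logits = {"cat": 2.0, "dog": 1.9, "rat": 0.5}
print(sample_token(logits))  # may differ from run to run

# (2) Float addition is not associative: the reduction order a parallel
# kernel happens to use can change the result, flipping an argmax in
# borderline cases even at temperature 0.
xs = [1e16, 1.0, -1e16, 1.0]
sequential = ((xs[0] + xs[1]) + xs[2]) + xs[3]  # left-to-right
pairwise = (xs[0] + xs[1]) + (xs[2] + xs[3])    # tree reduction
print(sequential, pairwise)  # 1.0 vs 0.0 on IEEE-754 doubles
```

Set temperature to 0 (greedy argmax) and fix every reduction order and you do get repeatable output; production serving stacks usually don't, for throughput reasons.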


The girl on a server in Iowa… 🤭 Jeff you are a legend.

I crossed an interesting threshold yesterday, which I think many other mathematicians have been crossing recently as well. In the middle of trying to prove a result, I identified a statement that looked true and that would, if true, be useful to me. 1/3




SCIENCE: Two nights of limited sleep (four hours per night) is enough to make people feel over 4.4 years older than peers who slept adequately. (Source: Proceedings of the Royal Society)

MMaDA: Multimodal Large Diffusion Language Models
- UniGRPO, a unified RL algorithm tailored for diffusion foundation models
- MMaDA-8B surpasses Show-o and SEED-X in multimodal understanding, and outperforms SDXL and Janus in text-to-image generation

We present MMaDA, the first diffusion foundation model that unifies text reasoning, multimodal understanding, and image generation through Mixed Long-CoT fine-tuning and a unified RL algorithm, UniGRPO.
📚 Paper: arxiv.org/abs/2505.15809
💻 Code: github.com/Gen-Verse/MMaDA
📦 Model: huggingface.co/Gen-Verse/MMaD…
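GRPO-style training, which UniGRPO adapts to diffusion models, scores a group of sampled responses per prompt and normalizes each reward against its group instead of using a learned critic. A minimal sketch of that group-relative advantage step (my illustration of vanilla GRPO, not the MMaDA code):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: z-score each reward within its sampled group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard: constant group -> all-zero advantages
    return [(r - mu) / sigma for r in rewards]

# Four sampled responses to one prompt, scored by a reward model:
advs = group_relative_advantages([0.1, 0.4, 0.9, 0.2])
```

Responses scoring above the group mean get positive advantages and are reinforced; the rest are pushed down, with no value network required.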


For the past two years, I've been consistently saying that these CEOs and their influencer larvae are lying. Machine learning simply cannot do what they claimed it would achieve. Yet I've been labeled a naysayer, a doomer, a denier, and an all-around negative person spoiling the fiesta. Today, only an imbecile fails to see that they've been lied to. Altman has gone quiet about AGI, Nadella doesn't want to hear about it, and only Amodei just can't shut his mouth, probably for medical reasons. Do you really think they'll ever come and apologize for attacking me for speaking the truth?



PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., a force and torque applied to an object in the image) into a realistic, physically plausible, and temporally consistent video. Our key insight is to integrate model-based physical simulation with a data-driven video generation process, enabling plausible image-space dynamics.

At the heart of our system are three core components:
(i) an image understanding module that effectively captures the geometry, materials, and physical parameters of the image;
(ii) an image-space dynamics simulation model that utilizes rigid-body physics and inferred parameters to simulate realistic behaviors;
(iii) an image-based rendering and refinement module that leverages generative video diffusion to produce realistic video footage featuring the simulated motion.

The resulting videos are realistic in both physics and appearance and are even precisely controllable, showing superior results over existing data-driven image-to-video generation works in quantitative comparisons and a comprehensive user study. PhysGen's videos can be used in various downstream applications, such as turning an image into a realistic animation or allowing users to interact with the image and create various dynamics.
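The rigid-body core of component (ii) comes down to integrating an applied force and torque over time. A toy semi-implicit Euler integrator for a single 2D body (a minimal sketch of the physics idea, not the PhysGen implementation):

```python
from dataclasses import dataclass

@dataclass
class Body2D:
    mass: float          # kg
    inertia: float       # kg*m^2, rotational inertia about the center of mass
    x: float = 0.0       # position (m)
    y: float = 0.0
    theta: float = 0.0   # orientation (rad)
    vx: float = 0.0      # linear velocity (m/s)
    vy: float = 0.0
    omega: float = 0.0   # angular velocity (rad/s)

def step(body, fx, fy, torque, dt):
    """One semi-implicit Euler step: F = m*a, tau = I*alpha,
    then advance position/orientation with the updated velocities."""
    body.vx += (fx / body.mass) * dt
    body.vy += (fy / body.mass) * dt
    body.omega += (torque / body.inertia) * dt
    body.x += body.vx * dt
    body.y += body.vy * dt
    body.theta += body.omega * dt

# Push a 1 kg body to the right with 2 N for 10 frames at 60 fps:
b = Body2D(mass=1.0, inertia=0.1)
for _ in range(10):
    step(b, fx=2.0, fy=0.0, torque=0.0, dt=1 / 60)
```

In the full system the mass, inertia, and applied condition would come from the image understanding module, and the resulting trajectory of poses would drive the video diffusion renderer.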








