
Matteo Robino 🌍
4K posts

@Robinohhh
Computer Vision Research Engineer 👁️








A new paper from @ylecun and others – V-JEPA 2.1

It changes the recipe of V-JEPA so the model learns both:
• Global semantics – what is happening in the scene
• Dense spatio-temporal structure – where things are and how they move

The idea is to supervise not just the masked tokens but the visible ones too.

There are 4 key ingredients for V-JEPA 2.1:
- Dense prediction loss on both masked and visible tokens
- Deep self-supervision across intermediate layers
- Modality-specific tokenizers (2D for images, 3D for videos) within a shared encoder
- Model + data scaling

The workflow turns into: masked image/video → encode visible tokens → predict latent representations for both masked and visible tokens → supervise at multiple layers

Here are the details:
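To make the "dense loss on masked and visible tokens, supervised at multiple layers" idea concrete, here is a rough sketch in plain Python. It is not the paper's implementation: the function names, the L1 distance, and the `visible_weight` knob are all assumptions chosen for illustration, and the latents are toy lists rather than real encoder outputs.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two latent vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dense_loss(
    predicted: List[List[Vector]],
    target: List[List[Vector]],
    is_masked: List[bool],
    visible_weight: float = 0.5,
) -> float:
    """Weighted average of a per-token latent loss over layers and ALL tokens.

    predicted[l][t] and target[l][t] are the latent vectors for token t at
    supervised layer l. Masked tokens get weight 1.0; visible tokens get
    `visible_weight` (a hypothetical knob, not taken from the paper).
    """
    total, weight_sum = 0.0, 0.0
    for layer_pred, layer_tgt in zip(predicted, target):  # deep supervision
        for pred, tgt, masked in zip(layer_pred, layer_tgt, is_masked):
            w = 1.0 if masked else visible_weight  # visible tokens count too
            total += w * l1_distance(pred, tgt)
            weight_sum += w
    return total / weight_sum


# Toy example: 2 supervised layers, 2 tokens (first masked, second visible).
pred = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
tgt = [[[1.0, 1.0], [1.0, 3.0]], [[0.0, 0.0], [1.0, 1.0]]]
print(dense_loss(pred, tgt, is_masked=[True, False]))
```

Setting `visible_weight=0.0` in this sketch falls back to supervising masked tokens only, which is the contrast the post is drawing.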


CDS PhD student @KuangYilun, CDS founding director @ylecun, former CDS Faculty Fellow @timrudner, and others successfully applied biological sparsity to AI. Their new technique allows computer vision models to ignore 90% of data without losing accuracy. nyudatascience.medium.com/new-representa…




"Television has a de facto monopoly on the shaping of minds. Yet by emphasizing trivial news items (faits divers), by filling that time with emptiness, with nothing, it pushes aside the relevant information that citizens should have in order to exercise their democratic rights." Pierre Bourdieu












People in comments: It removes the artistic intent! (😭)
The artistic intent: Realism
But the game engines haven't progressed in 10 years, and pushing them this extra 5% takes 10 times the time, effort and processing power.
Also, you can turn it off






Is it just me, or does the €20 Claude plan exist just to frustrate you enough that you upgrade to the €90 one?










