Jefferson Enrique Hernandez Cevallos

491 posts


@jefehern

Opinions are my own. Ph.D. student at @RiceCompSci in @vislang. Previously @RealityLabs @AdobeResearch, @InariAILab, @AdaVivInc.

Houston, Texas · Joined December 2017
345 Following · 66 Followers
Jefferson Enrique Hernandez Cevallos retweeted
Massimiliano Viola @massiviola01
Thread on V-JEPA 2.1 🤟 This DEFINITELY flew under the radar: just a few days ago, @AIatMeta released V-JEPA 2.1, taking a massive step toward closing the gap between the image and video domains. For a long time, image backbones were the only option for solving dense vision tasks. This model disagrees, showing that universal spatial understanding also emerges from large-scale video models! 🎥
Jefferson Enrique Hernandez Cevallos retweeted
Ji-Ha @Ji_Ha_Kim
Blog post: Transformers as Constrained Optimization. Rewriting pre-norm decoder-only transformers as solutions to regularized objectives. Changing the regularization to a hard constraint gives a canonical temperature, generalizing to KL divergence and to ideas of cross-layer interaction.
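The "canonical temperature" framing above builds on a textbook fact (I haven't read the linked post, so this is only the standard result it presumably starts from): softmax at temperature T is the exact maximizer of an entropy-regularized linear objective over the probability simplex. A minimal numpy check:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, temp):
    z = x / temp
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def objective(p, x, temp):
    # linear score plus temperature-weighted Shannon entropy
    return p @ x - temp * np.sum(p * np.log(p + 1e-12))

x = rng.normal(size=5)       # arbitrary logits
T = 0.7                      # an arbitrary temperature
p_star = softmax(x, T)
best = objective(p_star, x, T)

# softmax(x/T) is the exact maximizer of the regularized objective,
# so no other distribution on the simplex can beat it
for _ in range(1000):
    q = rng.dirichlet(np.ones(5))
    assert objective(q, x, T) <= best + 1e-9
```

Replacing the soft entropy penalty with a hard entropy constraint leaves the same family of solutions, with the temperature appearing as the constraint's Lagrange multiplier, which is presumably what makes one temperature "canonical" in the post's setup.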
Jefferson Enrique Hernandez Cevallos retweeted
Ksenia_TuringPost @TheTuringPost
Is RL dead for post-training? Of course not, but there are other interesting options for fine-tuning.

▪️ Evolution Strategies (ES) is a gradient-free optimization method that tests random parameter changes and moves the model toward the best-performing ones.
- It creates a small population of models by adding random perturbations to the parameters
- Perturbed models' outputs are scored with a reward function/verifier
- Model parameters are updated in the direction of the perturbations that achieved the best rewards

The best thing is that ES can scale to billion-parameter models and shows clear gains over RL:
• On the Countdown benchmark: ES raised Qwen-2.5-3B to 60.5% (vs 32.5% GRPO) and Llama-3.1-8B to 61.2% (vs ~51% RL)
• ARC-AGI: 0.2% → 29.5%, and Sudoku: 2.5% → 69.5%

And everything without computing gradients through backpropagation. So don't overlook other approaches in favor of RL alone - there is a lot to explore. Here we've gathered the new fine-tuning stack for LLMs with ES and the most promising LoRAs -> turingpost.com/p/beyondrl
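The three-step ES loop described above can be sketched in a few lines of numpy. The quadratic "verifier" and every hyperparameter here are illustrative stand-ins, not anything from the linked post:

```python
import numpy as np

# Toy verifier: reward is higher the closer the parameters are to a
# hidden target (a stand-in for scoring a perturbed model's outputs).
rng = np.random.default_rng(0)
dim, pop, sigma, lr = 10, 32, 0.1, 0.02
target = rng.normal(size=dim)

def reward(params):
    return -np.sum((params - target) ** 2)

params = np.zeros(dim)
for step in range(500):
    # 1) create a small population by adding random perturbations
    eps = rng.normal(size=(pop, dim))
    # 2) score each perturbed model with the reward function / verifier
    scores = np.array([reward(params + sigma * e) for e in eps])
    # 3) update parameters toward the best-rewarded perturbations
    adv = (scores - scores.mean()) / (scores.std() + 1e-8)
    params += lr / (pop * sigma) * eps.T @ adv
```

The normalized-advantage update is the standard OpenAI-style ES estimator; note that no gradient of `reward` is ever computed, which is what lets the method scale to non-differentiable verifiers.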
Jefferson Enrique Hernandez Cevallos retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
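A hypothetical sketch of what "fully synthetic, zero language" pre-pre-training data from a neural cellular automaton could look like; the blog's actual data pipeline may differ in every detail. Each cell updates from its local neighborhood through a small random MLP, and the resulting states are discretized into a token stream:

```python
import numpy as np

rng = np.random.default_rng(0)
width, hidden, steps = 32, 16, 8

# randomly initialized "neural" update rule (illustrative, untrained)
W1 = rng.normal(scale=0.5, size=(hidden, 3))   # neighborhood -> hidden
W2 = rng.normal(scale=0.5, size=(1, hidden))   # hidden -> new cell state

def nca_step(state):
    # gather left/self/right neighborhoods with wrap-around
    neigh = np.stack([np.roll(state, 1), state, np.roll(state, -1)])
    return np.tanh(W2 @ np.tanh(W1 @ neigh)).ravel()

state = rng.uniform(-1, 1, size=width)
rows = []
for _ in range(steps):
    state = nca_step(state)
    rows.append(state)

# discretize continuous cell states into a small token vocabulary,
# yielding a language-free sequence a transformer could be trained on
tokens = np.digitize(np.concatenate(rows), np.linspace(-1, 1, 7))
```

The appeal of this kind of source is that it has nontrivial local structure and long-range dependencies (through repeated updates) while containing no natural-language statistics at all.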
Jefferson Enrique Hernandez Cevallos retweeted
Minh Dinh @minhinhtrng
Modern vision models lack robustness when objects appear in unusual poses. @StphTphsn1 and I study latent equivariant operators as a remedy and discuss the caveats of these operators. Below is a summary of the work, accepted at the GRaM Workshop at ICLR @iclr_conf 2026. 🧵
Jefferson Enrique Hernandez Cevallos retweeted
George Bredis @BredisGeorge
Most imagination-based world models learn representations by reconstructing pixels. But reconstruction may not be the right objective for control. In our new paper we explore a different idea: 👉 predict the next embedding instead of reconstructing observations. Introducing NE-Dreamer. Project page: corl-team.github.io/nedreamer/ Paper: arxiv.org/pdf/2603.02765 Code: github.com/corl-team/nedr…
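The objective shift the tweet describes can be contrasted in a minimal numpy sketch. This is not NE-Dreamer's actual architecture (see the linked paper and code for that); it only shows, with toy linear maps, the difference between decoding a predicted latent back to pixels and matching a target embedding directly:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, latent_dim, act_dim = 64, 8, 2

W_enc = rng.normal(scale=0.1, size=(latent_dim, obs_dim))              # encoder
W_dyn = rng.normal(scale=0.1, size=(latent_dim, latent_dim + act_dim)) # dynamics
W_dec = rng.normal(scale=0.1, size=(obs_dim, latent_dim))              # decoder

def encode(o):
    return W_enc @ o

def predict_next_latent(z, a):
    return W_dyn @ np.concatenate([z, a])

o_t, o_next = rng.normal(size=obs_dim), rng.normal(size=obs_dim)
a_t = rng.normal(size=act_dim)

z_pred = predict_next_latent(encode(o_t), a_t)

# Reconstruction objective: decode the predicted latent back to pixels
# and match the next observation.
recon_loss = np.mean((W_dec @ z_pred - o_next) ** 2)

# Embedding-prediction objective: match the encoder's embedding of the
# next observation (treated as a fixed target during training).
embed_loss = np.mean((z_pred - z_pred * 0 + (z_pred - encode(o_next))) ** 2) if False else np.mean((z_pred - encode(o_next)) ** 2)
```

The second loss never touches pixel space, which is the point: the model is only asked to be predictive in its own representation, not to account for every reconstructable detail of the observation.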
Jefferson Enrique Hernandez Cevallos retweeted
Dimitris Papailiopoulos @DimitrisPapail
Two days ago I didn't know if a 10K transformer could add 10-digit numbers. AdderBoard is now at 36 hand-coded (🏆 @alexlitzenberge) and 311 trained (🏆 @reza_byt)!! The gap is very interesting: humans can still construct solutions that gradient descent can't find. For now.
Jefferson Enrique Hernandez Cevallos retweeted
Gabriele Berton @gabriberton
Apparently counterintuitive result, but obvious if you think about it: thinking longer is bad for OCR, just like thinking longer is bad for VQA. With too much thinking, the image is too far from the answer, and the answer's tokens attend to the reasoning trace, not the image. [1/3]
Generative History @HistoryGPT

Why the difference? Not sure, but it's what we've seen with other models too (Claude and OpenAI), as they are optimized more for thinking-based pipelines. Remember: increasing thinking budgets always decreases accuracy on handwriting recognition. (4/5)

Jefferson Enrique Hernandez Cevallos retweeted
Yehonathan Litman @yehonation
Excited to share our new work EditCtrl! We introduce a disentangled local-global control video inpainting framework that dynamically allocates compute where needed - achieving up to 10x compute savings over full-attention while matching or exceeding SOTA editing quality. 🧵
Jefferson Enrique Hernandez Cevallos retweeted
Amrith Setlur @setlur_amrith
I'll admit, going in I was not 100% sure this was possible: we trained a tiny 4B model (QED-Nano) to prove math theorems at the Olympiad level! Today, we release the full recipe, from the data curation done for SFT to our RL algorithm that explicitly optimizes for test-time scaling over millions of tokens (i.e., we train QED-Nano to continually improve as we apply modern day test-time scaffolds like DeepSeekMath-agent over it). 🧵⬇️
Jefferson Enrique Hernandez Cevallos retweeted
Vincent Sitzmann @vincesitzmann
In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision, mapping imagery to intermediate representations (3D, flow, segmentation...), is about to go away. vincentsitzmann.com/blog/bitter_le…
Jefferson Enrique Hernandez Cevallos retweeted
Haoyu Han @Hyhan0118
Why is RL training with policy-gradient methods often unstable—especially near the optimum? Our new work studies this through the noise-to-signal ratio (NSR) of the REINFORCE gradient estimator (defined as the estimator variance (noise) normalized by the squared norm of the true gradient (signal)). We show that the NSR is highly non-uniform and often increases as training progresses, which can trigger instability or even policy collapse. Paper link: arxiv.org/abs/2602.01460 Long thread👇
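The NSR definition in the tweet (estimator variance over squared true-gradient norm) can be computed exactly on a toy problem. The two-armed bandit below is my own illustrative setup, not the paper's LLM experiments; with uncentered rewards it reproduces the qualitative claim that NSR blows up as the policy nears the optimum:

```python
import numpy as np

r = np.array([1.0, 0.9])  # both actions carry nonzero (uncentered) reward

def nsr(p):
    """NSR of the single-sample REINFORCE estimator
    g_k = r_a * (1[a == k] - pi_k) for a two-armed softmax policy."""
    pi = np.array([p, 1.0 - p])
    rbar = pi @ r
    true_grad = pi * (r - rbar)            # exact policy gradient wrt logits
    # exact second moment E[g g^T] of the estimator
    second = np.zeros((2, 2))
    for a in range(2):
        d = np.eye(2)[a] - pi
        second += pi[a] * r[a] ** 2 * np.outer(d, d)
    noise = np.trace(second) - true_grad @ true_grad   # trace of covariance
    return noise / (true_grad @ true_grad)

nsr_mid, nsr_near_opt = nsr(0.5), nsr(0.95)
```

As p → 1 the true gradient shrinks like (1 − p)², while the variance only shrinks like (1 − p), so the ratio diverges, which is one concrete way the non-uniform NSR the authors describe can emerge near an optimum.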
Jefferson Enrique Hernandez Cevallos retweeted
Ziming Liu @ZimingLiu11
🚨Transformers don't learn Newton's laws? They learn Kepler's laws! Like us, transformers don't predict a flying ball via a differential equation, but by fitting a curve. Moreover, reducing context length steers a transformer from Keplerian to Newtonian. Compression in play.
Jefferson Enrique Hernandez Cevallos retweeted
Zilin Xiao @ZilinXiao2
🚀 Two papers accepted to #ICLR2026 on test-time scaling for vision-language systems (retrieval + reasoning)!

1) MetaEmbed (Oral Presentation): Meta Tokens + Matryoshka multi-vector training → flexible late interaction; choose #vectors at test time for accuracy↔efficiency.
Paper: arxiv.org/abs/2509.18095
Work done at @AIatMeta with amazing collaborators: Qi Ma, @Mengting_Gu, Jason Chen, Xintao Chen, @vislang and @MohanVijaimohan!

2) ProxyThinker: training-free test-time guidance from small "slow-thinking" visual reasoners → self-verification / self-correction via distribution-level guidance.
Paper: arxiv.org/abs/2505.24872
Work done with @JaywonK17250, @Siru_Ouyang, @jefehern, @yumeng0818 and @vislang!

While I won't be able to travel to Brazil 🇧🇷, please say hi to the team :-) #MultimodalRetrieval #VisualReasoning #VisionLanguage #TestTimeCompute #Embeddings
Jefferson Enrique Hernandez Cevallos retweeted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Rewards as Labels: Revisiting RLVR from a Classification Perspective. "We propose Rewards as Labels (REAL), a novel framework that revisits verifiable rewards as categorical labels rather than scalar weights, thereby reformulating policy optimization as a classification problem."
Jefferson Enrique Hernandez Cevallos retweeted
Rishabh Agarwal @agarwl_
And RL on ImageNet is such a nice way to show this: every task can be formulated as RL, but should it be? Here they show how REINFORCE has terrible efficiency, but this MaxRL thingy with a *large number of rollouts* doesn't.