Alex Oh

13 posts

@AlexOh2024

Grad researcher @ Duke | self-supervised learning, goal-conditioned RL | interested in world models for embodied AI

Joined June 2022
164 Following · 6 Followers
Alex Oh retweeted
alphaXiv @askalphaxiv
Yann LeCun and his team dropped yet another paper! "V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning" In this V-JEPA upgrade, they show that if you make a video model predict every patch, not just the masked ones, and do so at multiple layers, vague scene understanding turns into dense, temporally stable features that actually understand "what is where". This key insight drove improvements in segmentation, depth, anticipation, and even robot planning.
[image]
33 replies · 224 reposts · 1.4K likes · 120.6K views
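A minimal sketch of the objective as the post describes it: regress every patch token, not just the masked ones, against frozen target features tapped at several encoder depths. The function and head names are assumptions for illustration, not the paper's code.

```python
import torch.nn.functional as F

def dense_multilayer_loss(context_feats, targets_per_layer, heads):
    """Sketch of the dense objective described above: predict EVERY patch
    token (not only masked ones) onto frozen target-encoder features from
    several layers. Shapes: context_feats [B, N, D]; each target tensor
    [B, N, D]; `heads` is one small prediction head per tapped layer
    (all hypothetical names)."""
    loss = 0.0
    for head, targets in zip(heads, targets_per_layer):
        preds = head(context_feats)                      # [B, N, D]
        # regress all tokens against stop-gradient targets
        loss = loss + F.smooth_l1_loss(preds, targets.detach())
    return loss / len(heads)
```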
Alex Oh retweeted
Physical Intelligence @physical_int
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
34 replies · 290 reposts · 2.2K likes · 403.6K views
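A hedged sketch of how such an "RL token" head might look, as I read the post: the large model stays frozen and emits one extra token embedding, and only a tiny actor and critic on top of it are updated by RL. The class name and sizes are assumptions, not Physical Intelligence's code.

```python
import torch
import torch.nn as nn

class RLTokenHead(nn.Module):
    """Tiny actor + critic reading a single extra "RL token" embedding
    emitted by a frozen backbone; only these heads train during RL."""
    def __init__(self, d_model: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(d_model, hidden), nn.Tanh(),
                                   nn.Linear(hidden, action_dim))
        self.critic = nn.Sequential(nn.Linear(d_model, hidden), nn.Tanh(),
                                    nn.Linear(hidden, 1))

    def forward(self, rl_token: torch.Tensor):
        # rl_token: [B, d_model], the backbone's extra output token
        return self.actor(rl_token), self.critic(rl_token)
```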
Alex LeBrun @lxbrun
I am joining @ylecun and an exceptional founding team to lead @amilabs as CEO. We have secured a $1.03 billion USD seed round to fuel our mission to build intelligent systems capable of truly understanding the real world—a long-term scientific endeavor.
222 replies · 287 reposts · 5.7K likes · 456.2K views
Alex Oh @AlexOh2024
Ever see a Waymo and wonder why it needs those bulky LiDAR sensors? While modern autonomous systems rely on expensive, specialized hardware, humans navigate the world through vision alone. This motivated our development of LeDEEP, a vision-only framework that uses LeJEPA’s self-supervised learning to extract 3D structure from simple camera frames. By leveraging stable SSL objectives to produce rich latent spaces, we’ve built a resource-efficient alternative that shows robotics doesn't need a thousand-dollar "hat" to perceive depth. Read more here: lnkd.in/gSBrfkmW
[image]
0 replies · 0 reposts · 0 likes · 17 views
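As a rough illustration of the "depth from frozen SSL features" idea (my own sketch, not the LeDEEP code): freeze a LeJEPA-style encoder and train only a light per-patch readout on its latent tokens.

```python
import torch
import torch.nn as nn

class DepthProbe(nn.Module):
    """Frozen SSL encoder + trainable linear readout of per-patch depth.
    `encoder` is assumed to map frames to [B, N, d_model] patch tokens
    (a hypothetical interface for this sketch)."""
    def __init__(self, encoder: nn.Module, d_model: int, patch: int = 16):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():    # keep SSL features fixed
            p.requires_grad = False
        self.head = nn.Linear(d_model, patch * patch)  # depth per pixel

    def forward(self, frames: torch.Tensor):
        with torch.no_grad():
            tokens = self.encoder(frames)      # [B, N, d_model]
        return self.head(tokens)               # [B, N, patch*patch]
```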
Alex Oh retweeted
Ilya Sutskever @ilyasut
It’s extremely good that Anthropic has not backed down, and it’s significant that OpenAI has taken a similar stance. In the future, there will be much more challenging situations of this nature, and it will be critical for the relevant leaders to rise to the occasion, for fierce competitors to put their differences aside. Good to see that happen today.
1.4K replies · 2.5K reposts · 25.6K likes · 3M views
Alex Oh @AlexOh2024
Who taught us how to stand? No one, really. We fell down. We got up. We did it again. It's that simple.

As it turns out, AI models can learn the same way. They fall. They adjust. They try again. It's just gravity, repetition, and a stubborn refusal to stay on the floor.

As a Master’s student pursuing AI research, I’ve been exploring how to make reinforcement learning more efficient through self-supervised frameworks. I took a strong recent goal-conditioned RL setup and asked a question: can we replace the critic's heavy contrastive learning (InfoNCE) with a lighter, theoretically grounded regularization method (SIGReg) without breaking performance?

The answer was surprisingly nuanced:
On the Ant → the simpler method worked beautifully.
On the Humanoid → it reached the goal but couldn’t stay standing.

Why? As it turns out, standing still isn’t a passive state; it’s an active micro-policy of constant balance corrections. Contrastive loss naturally learned those fine distinctions; plain regularization didn’t.

So I tried a hybrid: InfoNCE + SIGReg with a unified encoder. The result: identical performance… but 24% faster training wall-time.

I also ran capacity ablations, froze critics, and tried SIGReg on the actor: the “what if” experiments you run when something doesn’t work as expected. Every failure taught me something useful. In both humans and machines, learning what to do is fragile. Learning what to do and what not to do is robust.

The full write-up (with plots and honest lessons) is now on my blog → api.wandb.ai/links/aho13-du…

If you’re working on RL, robotics, or efficient self-supervised learning, I’d love to hear your thoughts. @randall_balestr #ReinforcementLearning #DeepLearning #WorldModels
0 replies · 0 reposts · 0 likes · 19 views
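For concreteness, a sketch of the hybrid critic objective described above: InfoNCE over (state-action, goal) pairs from a unified encoder, plus an isotropy regularizer. SIGReg is simplified here to mean/variance matching; the real method applies a statistical test over 1-D projections.

```python
import torch
import torch.nn.functional as F

def hybrid_critic_loss(z_sa, z_g, temperature=0.1, reg_weight=1.0):
    """z_sa, z_g: [B, D] embeddings of state-action pairs and goals from
    one shared encoder. Positives are the matching pairs in the batch."""
    # InfoNCE: classify the matching goal among all goals in the batch
    logits = (z_sa @ z_g.T) / temperature             # [B, B] similarity
    labels = torch.arange(z_sa.size(0), device=z_sa.device)
    infonce = F.cross_entropy(logits, labels)

    # SIGReg stand-in (simplified): push embeddings toward zero mean and
    # unit variance per coordinate instead of the full projection test.
    z = torch.cat([z_sa, z_g], dim=0)
    reg = z.mean(dim=0).pow(2).mean() + (z.var(dim=0) - 1.0).pow(2).mean()
    return infonce + reg_weight * reg
```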
Alex Oh @AlexOh2024
Embedding models are the hidden engines behind modern search. These models translate data into lists of numbers (embeddings) that allow systems to understand the meaning of a document or image. As image-text embeddings become central to multimodal LLMs and information retrieval, their reliability is critical.

We have been exploring Projected Gradient Descent (PGD) attacks: small pixel changes to images that cause significant embedding drift, leading the model to completely misinterpret the input. This highlights a real risk to retrieval quality when ingesting unverified public data.

The upside is that effective defenses exist. Our initial tests show evidence of the following:
1. Newer, larger models seem to be more robust to these attacks.
2. These are "white box" attacks: attackers usually need prior knowledge of the model type to succeed.
3. Strategies like adversarial training on PGD attacks can successfully mitigate these vulnerabilities.

To visualize this vulnerability, we used a PGD attack to force the model to identify this image of Tom Brady as "A Brick."
- Center (CLIP Base): The attack succeeded. The model's similarity score for "A Brick" (0.31) overtook the score for "Patriots Football Quarterback" (0.28); the model is now more confident that this is a picture of "A Brick" than of Tom Brady.
- Right (CLIP Large): The attack failed. As noted above, the larger model was more robust; it correctly kept "Patriots Football Quarterback" as the top match despite the adversarial noise.

I would be happy to share more about our results and code, and I would love to hear your thoughts!

References:
GitHub repo: lnkd.in/ecEiuvX9
MMEB dataset (experiment benchmark data and Tom Brady photo): lnkd.in/e22h7uq7
[image]
0 replies · 0 reposts · 0 likes · 11 views
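A minimal PGD sketch of the attack described above, assuming a CLIP-style `model.encode_image` and a unit-norm target text embedding (e.g. for "A Brick"); the hyperparameters are illustrative, not the values from our experiments.

```python
import torch

def pgd_embedding_attack(model, image, target_emb,
                         eps=8/255, alpha=1/255, steps=40):
    """Perturb pixels within an L-inf ball of radius eps so the image
    embedding drifts toward `target_emb` (assumed unit-norm)."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        emb = model.encode_image(image + delta)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        loss = -(emb * target_emb).sum()    # maximize cosine similarity
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed gradient step
            delta.clamp_(-eps, eps)             # project into the ball
            delta.grad = None
    return (image + delta).clamp(0, 1).detach()
```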
Robert Youssef @rryssf_
psychology solved the ai memory problem decades ago. we just haven't been reading the right papers.

your identity isn't something you have. it's something you construct. constantly. from autobiographical memory, emotional experience, and narrative coherence.

Martin Conway's Self-Memory System (2000, 2005) showed that memories aren't stored like video recordings. they're reconstructed every time you access them, assembled from fragments across different neural systems. and the relationship is bidirectional: your memories constrain who you can plausibly be, but your current self-concept also reshapes how you remember. memory is continuously edited to align with your current goals and self-images. this isn't a bug. it's the architecture.

not all memories contribute equally. Rathbone et al. (2008) showed autobiographical memories cluster disproportionately around ages 10-30, the "reminiscence bump," because that's when your core self-images form. you don't remember your life randomly. you remember the transitions. the moments you became someone new.

Madan (2024) takes it further: combined with Episodic Future Thinking, this means identity isn't just backward-looking. it's predictive. you use who you were to project who you might become. memory doesn't just record the past. it generates the future self.

if memory constructs identity, destroying memory should destroy identity. it does. Clive Wearing, a British musicologist who suffered brain damage in 1985, lost the ability to form new memories. his memory resets every 30 seconds. he writes in his diary: "Now I am truly awake for the first time." crosses it out. writes it again minutes later. but two things survived: his ability to play piano (procedural memory, stored in cerebellum, not the damaged hippocampus) and his emotional bond with his wife. every time she enters the room, he greets her with overwhelming joy. as if reunited after years. every single time. episodic memory is fragile and localized. emotional memory is distributed widely and survives damage that obliterates everything else.

Antonio Damasio's Somatic Marker Hypothesis destroyed the Western tradition of separating reason from emotion. emotions aren't obstacles to rational decisions. they're prerequisites. when you face a decision, your brain reactivates physiological states from past outcomes of similar decisions. gut reactions. subtle shifts in heart rate. these "somatic markers" bias cognition before conscious deliberation begins.

the Iowa Gambling Task proved it: normal participants develop a "hunch" about dangerous card decks 10-15 trials before conscious awareness catches up. their skin conductance spikes before reaching for a bad deck. the body knows before the mind knows. patients with ventromedial prefrontal cortex damage understand the math perfectly when told. but keep choosing the bad decks anyway. their somatic markers are gone. without the emotional signal, raw reasoning isn't enough. Overskeid (2020) argues Damasio undersold his own theory: emotions may be the substrate upon which all voluntary action is built.

put the threads together. Conway: memory is organized around self-relevant goals. Damasio: emotion makes memories actionable. Rathbone: memories cluster around identity transitions. Bruner: narrative is the glue. identity = memories organized by emotional significance, structured around self-images, continuously reconstructed to maintain narrative coherence.

now look at ai agent memory and tell me what's missing.

current architectures all fail for the same reason: they treat memory as storage, not identity construction. vector databases (RAG) are flat embedding space with no hierarchy, no emotional weighting, no goal-filtering. past 10k documents, semantic search becomes a coin flip. conversation summaries compress your autobiography into a one-paragraph bio. key-value stores reduce identity to a lookup table. episodic buffers give you a 30-second memory span, which as the Wearing case shows, is enough to operate moment-to-moment but not enough to construct identity.

five principles from psychology that ai memory lacks.

first, hierarchical temporal organization (Conway): human memory narrows by life period, then event type, then specific details. ai memory is flat, every fragment at the same level, brute-force search across everything. fix: interaction epochs, recurring themes, specific exchanges, retrieval descends the hierarchy.

second, goal-relevant filtering (Conway's "working self"): your brain retrieves memories relevant to current goals, not whatever's closest in embedding space. fix: a dynamic representation of current goals and task context that gates retrieval.

third, emotional weighting (Damasio): emotionally significant experiences encode deeper and retrieve faster. ai agents store frustrated conversations with the same weight as routine queries. fix: sentiment-scored metadata on memory nodes that biases future behavior.

fourth, narrative coherence (Bruner): humans organize memories into a story maintaining consistent self across time. ai agents have zero narrative, each interaction exists independently. fix: a narrative layer synthesizing memories into a relational story that influences responses.

fifth, co-emergent self-model (Klein & Nichols): human identity and memory bootstrap each other through a feedback loop. ai agents have no self-model that evolves. fix: not just "what I know about this user" but "who I am in this relationship."

the fundamental problem isn't technical. it's conceptual. we've been modeling agent memory on databases. store, retrieve, done. but human memory is an identity construction system. it builds who you are, weights what matters, forgets what doesn't serve the current self, rewrites the narrative to maintain coherence.

the paradigm shift: stop building agent memory as a retrieval system. start building it as an identity system. every component has engineering analogs that already exist. hierarchical memory = graph databases with temporal clustering. emotional weighting = sentiment-scored metadata. goal-relevant filtering = attention mechanisms conditioned on task state. narrative coherence = periodic summarization with consistency constraints. self-model bootstrapping = meta-learning loops on interaction history. the pieces are there. what's missing is the conceptual framework to assemble them. psychology provides that framework.

the path forward isn't better embeddings or bigger context windows. it's looking inward. Conway showed memory is organized by the self, for the self. Damasio showed emotion is the guidance system. Rathbone showed memories cluster around identity transitions. Bruner showed narrative holds it together. Klein and Nichols showed self and memory bootstrap each other into existence. if we're serious about building agents with functional memory, we should stop reading database architecture papers and start reading psychology journals.
[image]
199 replies · 872 reposts · 4.6K likes · 265.1K views
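To make the post's "engineering analogs" concrete, here is a toy sketch combining its second and third fixes (goal-gated retrieval with emotional weighting). The memory schema ('emb' and 'emotion' fields) is hypothetical; nothing here is an existing library's API.

```python
import numpy as np  # embeddings assumed to be np.ndarray unit vectors

def retrieve(memories, query_emb, goal_emb, k=5):
    """memories: list of dicts with 'emb' (unit vector) and 'emotion'
    (salience in [0, 1]). Score = semantic similarity, biased by stored
    emotional weight and gated by relevance to the current goal."""
    def score(m):
        semantic = float(m["emb"] @ query_emb)            # plain RAG part
        goal_gate = max(0.0, float(m["emb"] @ goal_emb))  # "working self"
        return semantic * (0.5 + 0.5 * m["emotion"]) * goal_gate
    return sorted(memories, key=score, reverse=True)[:k]
```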
Alex Oh @AlexOh2024
@RoboPapers Crazy approach; it reminds me a lot of visualization before a race or a golf swing. To take the idea further, would it be possible to generate a latent representation of the video rather than a full video?
0 replies · 0 reposts · 0 likes · 313 views
RoboPapers @RoboPapers
The holy grail of robotics is to be able to perform previously-unseen, out-of-distribution manipulation tasks “zero shot” in a new environment. NovaFlow proposes an approach which (1) generates a video, (2) computes predicted flow — how points move through the scene — and (3) uses this flow as an objective to generate a motion. Using this procedure, NovaFlow generates motions in unseen scenes, for unseen tasks, and can transfer across embodiments. To learn more, we are joined by @Hongyu_Lii and @jiahuifu_carol from RAI. Watch Episode #63 of RoboPapers with @chris_j_paxton and @micoolcho now to learn more!
1 reply · 4 reposts · 52 likes · 20.8K views
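A rough sketch of step (3) as I read the post: treat the predicted point flow as a target and optimize an action sequence against it. `simulate_points` is a hypothetical differentiable stand-in for the real planner; NovaFlow's actual optimization may differ entirely.

```python
import torch

def fit_actions_to_flow(predicted_flow, simulate_points, init_actions,
                        iters=100, lr=0.05):
    """predicted_flow: [T, N, 3] target point tracks computed from the
    generated video. simulate_points(actions) -> [T, N, 3] tracks the
    actions would produce (assumed differentiable for this sketch)."""
    actions = init_actions.clone().requires_grad_(True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        tracks = simulate_points(actions)
        loss = (tracks - predicted_flow).pow(2).mean()  # match the flow
        opt.zero_grad()
        loss.backward()
        opt.step()
    return actions.detach()
```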
Alex Oh @AlexOh2024
ZXX
0 replies · 1 repost · 2 likes · 218 views