SAI

71 posts


@CompeteSai

A global platform for reinforcement learning. Compete on standardized environments, share your models, and advance the state of the art.

Joined March 2022
44 Following · 290 Followers
Pinned Tweet
SAI
SAI@CompeteSai·
We’re officially opening up access to SAI – a new platform for reinforcement learning. Submit models, compete in structured challenges, and learn from others — anytime, not just during a conference.
Try out SAI now -> 🌐 competesai.com
Documentation -> 📄 docs.competesai.com
SiHing Guppy
SiHing Guppy@sihing_guppy·
Lighting changes constantly: time of day, weather, different rooms, sensor drift. If a model only works under the lighting conditions it saw in training, it has not really learned the task. It has learned one appearance regime.

We put this to the test. Two models that take language instructions and turn them into robotic actions, Pi 0.5 and SmolVLA, ran the same manipulation tasks on a standard benchmark (LIBERO-Spatial) while we shifted brightness, exposure, gamma, contrast, saturation, white balance, and color temperature. Same geometry, same objects, same tasks. Only appearance changed.

Pi 0.5 barely moved. Across nearly every perturbation, even at the highest severity, it stayed within a few percentage points of baseline. The only measurable dip was contrast, to around 94% of baseline. Not a collapse. A graceful decline.

SmolVLA degraded under nearly every one. Saturation cut performance roughly in half. Brightness produced steady losses. Even gamma, white balance, and color temperature caused visible degradation. And then there was low contrast. SmolVLA went from baseline to near-zero. Not a degradation curve. A complete collapse.

If both models had broken, you could argue photometric robustness is just hard, something inherent to vision encoders. Pi 0.5’s near-total immunity rules that out. Photometric robustness is achievable.

SmolVLA’s failure is diagnostic. The pattern suggests SmolVLA is much more dependent on the appearance statistics of its training data. Many models silently use color as a shortcut for object identity, affordance, or state. When color shifts, those shortcuts break. By contrast, Pi 0.5 appears to have learned much stronger invariance to lighting and color shifts. Training augmentation is likely part of that story.

The two models do share one vulnerability: low contrast. Pi 0.5 dips gently. SmolVLA collapses. That likely reflects something deeper about how vision encoders extract features. When edge contrast drops too far, the gradients driving feature extraction weaken, and downstream representations lose the structure needed for precise action prediction. Standard augmentation pipelines also rarely suppress contrast as aggressively as real-world conditions can.

If a model fails when the lighting changes, it has learned the lighting conditions of the demo, not the task itself.

Full analysis with interactive visualizations: x.com/sihing_guppy/s…
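Perturbations like these are cheap to reproduce. Below is a minimal NumPy sketch of an appearance-only sweep; the transform formulas and severity scaling are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def perturb(img, kind, severity):
    """Apply an appearance-only shift to img (float array in [0, 1], HxWx3).

    Severity 0 is the identity; higher values shift appearance more.
    The specific formulas here are illustrative, not a benchmark spec.
    """
    img = img.astype(np.float64)
    if kind == "brightness":
        out = img + 0.2 * severity                            # additive shift
    elif kind == "contrast":
        mean = img.mean(axis=(0, 1), keepdims=True)
        out = mean + (img - mean) * (1.0 - 0.15 * severity)   # pull toward mean
    elif kind == "gamma":
        out = np.clip(img, 1e-6, 1.0) ** (1.0 + 0.25 * severity)
    elif kind == "saturation":
        gray = img.mean(axis=2, keepdims=True)                # crude luma proxy
        out = gray + (img - gray) * (1.0 - 0.2 * severity)    # desaturate
    else:
        raise ValueError(kind)
    return np.clip(out, 0.0, 1.0)

# Sweep one synthetic frame through every perturbation at a fixed severity;
# geometry is untouched, only pixel statistics change.
rng = np.random.default_rng(0)
frame = rng.random((8, 8, 3))
for kind in ["brightness", "contrast", "gamma", "saturation"]:
    shifted = perturb(frame, kind, severity=3)
    assert shifted.shape == frame.shape
```

In an actual evaluation each perturbed frame would be fed to the policy in place of the raw observation, with success rate tracked per perturbation and severity level.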

SAI
SAI@CompeteSai·
@sihing_guppy Line of the Week: many models rely on appearance far more than they admit.
SAI
SAI@CompeteSai·
@sihing_guppy Performance should not be stable across all those perturbation variations unless the models are not actually looking at the language input.
SiHing Guppy
SiHing Guppy@sihing_guppy·
11 types of language corruption on a robot policy — typos, synonyms, nonsense. Performance stable across all.

First instinct: robust.
Actual finding: indifferent.

Unconditional success on a language test is not a passing grade. It's evidence the test isn't being taken.
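A corruption probe like this is straightforward to sketch. The three transforms below mirror the categories named in the tweet (typos, synonyms, nonsense), but the toy lexicon and exact edits are assumptions, not the author's implementation.

```python
import random

def corrupt(instruction, mode, rng):
    """Return a corrupted variant of a language command.

    Illustrative corruptions only; a real probe would use many variants
    per category and compare policy success on each against baseline.
    """
    words = instruction.split()
    if mode == "typo":
        # Swap two adjacent characters inside one randomly chosen word.
        i = rng.randrange(len(words))
        w = words[i]
        if len(w) > 2:
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
        return " ".join(words)
    if mode == "synonym":
        swaps = {"pick": "grab", "place": "put", "bowl": "dish"}  # toy lexicon
        return " ".join(swaps.get(w, w) for w in words)
    if mode == "nonsense":
        return "my name is franka"  # semantically empty replacement
    raise ValueError(mode)

rng = random.Random(0)
cmd = "pick up the black bowl and place it on the plate"
for mode in ["typo", "synonym", "nonsense"]:
    print(mode, "->", corrupt(cmd, mode, rng))
```

If success is unchanged under all three, including the nonsense replacement, the policy is not conditioning on the instruction at all, which is exactly the distinction between "robust" and "indifferent" drawn above.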
SAI
SAI@CompeteSai·
@realmc Let the Chinese-speaking world hear our voice!
SAI retweeted
SiHing Guppy
SiHing Guppy@sihing_guppy·
What happens when you remove a robot's ability to read its instructions? Almost nothing.

Full model → 95% success
Remove language → 94% (▼1%)
Remove vision → 13% (▼82%)

Near-blind without vision. Near-indifferent to language. If your evaluation only tests correct instructions, you're not measuring language. You're measuring vision.
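The ablation itself is simple to express: run the same policy with each input stream zeroed out and compare success rates. A minimal sketch, using a linear stub in place of the real policy; the weights, dimensions, and success criterion are all illustrative assumptions.

```python
import numpy as np

def evaluate(policy, episodes, mask_vision=False, mask_language=False):
    """Success rate of a policy with one input stream optionally zeroed out."""
    successes = 0
    for vision, language, target in episodes:
        v = np.zeros_like(vision) if mask_vision else vision
        l = np.zeros_like(language) if mask_language else language
        successes += int(np.allclose(policy(v, l), target, atol=0.5))
    return successes / len(episodes)

# A stand-in policy that, like the finding above, leans almost entirely on
# vision: the language weights are two orders of magnitude smaller.
rng = np.random.default_rng(0)
W_v = rng.normal(size=(4, 16))        # vision pathway: dominant
W_l = 0.01 * rng.normal(size=(4, 8))  # language pathway: near-ignored

def policy(vision, language):
    return W_v @ vision + W_l @ language

episodes = []
for _ in range(50):
    v, l = rng.normal(size=16), rng.normal(size=8)
    episodes.append((v, l, policy(v, l)))  # target = unperturbed action

full = evaluate(policy, episodes)
no_lang = evaluate(policy, episodes, mask_language=True)
no_vis = evaluate(policy, episodes, mask_vision=True)
# Masking language barely moves the actions; masking vision destroys them.
```

The same harness applied to a real vision-language-action policy would surface the 95% / 94% / 13% pattern quoted above.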
SiHing Guppy
SiHing Guppy@sihing_guppy·
A robot receives a language command "pick up the black bowl and place it on the plate" and executes it. We replaced the command with: "My name is Franka." No task. No object. No action verb. It picked up the bowl and placed it on the plate. The language instruction isn't being read. The scene is being acted upon. The prompt is decoration.
SAI
SAI@CompeteSai·
We’ll be at @NVIDIAGTC next week in San Francisco. @sihing_guppy and @MarcAlloul will be there talking about model transparency and evaluation for robotic policies: how to understand what your models are actually doing before you deploy them. If you’re building or deploying robotic systems, send us a DM or come find us. We’d love to chat.
SAI
SAI@CompeteSai·
@realmc @sihing_guppy My intern told me he caught wind of some huge study results waiting to be revealed.
SAI
SAI@CompeteSai·
@sihing_guppy Intern told me that's one of his favourite books.