SAI

71 posts


@CompeteSai

A global platform for reinforcement learning. Compete on standardized environments, share your models, and advance the state of the art.

Joined March 2022
44 Following · 290 Followers
Pinned Tweet
SAI
SAI@CompeteSai·
We’re officially opening up access to SAI – a new platform for reinforcement learning. Submit models, compete in structured challenges, and learn from others — anytime, not just during a conference.
Try out SAI now -> 🌐 competesai.com
Documentation -> 📄 docs.competesai.com
SiHing Guppy
SiHing Guppy@sihing_guppy·
Lighting changes constantly: time of day, weather, different rooms, sensor drift. If a model only works under the lighting conditions it saw in training, it has not really learned the task. It has learned one appearance regime.

We put this to the test. Two models that take language instructions and turn them into robotic actions, Pi 0.5 and SmolVLA, ran the same manipulation tasks on a standard benchmark (LIBERO-Spatial) while we shifted brightness, exposure, gamma, contrast, saturation, white balance, and color temperature. Same geometry, same objects, same tasks. Only appearance changed.

Pi 0.5 barely moved. Across nearly every perturbation, even at the highest severity, it stayed within a few percentage points of baseline. The only measurable dip was contrast, to around 94% of baseline. Not a collapse. A graceful decline.

SmolVLA degraded under nearly every one. Saturation cut performance roughly in half. Brightness produced steady losses. Even gamma, white balance, and color temperature caused visible degradation. And then there was low contrast. SmolVLA went from baseline to near-zero. Not a degradation curve. A complete collapse.

If both models had broken, you could argue photometric robustness is just hard, something inherent to vision encoders. Pi 0.5’s near-total immunity rules that out. Photometric robustness is achievable.

SmolVLA’s failure is diagnostic. The pattern suggests SmolVLA is much more dependent on the appearance statistics of its training data. Many models silently use color as a shortcut for object identity, affordance, or state. When color shifts, those shortcuts break. By contrast, Pi 0.5 appears to have learned much stronger invariance to lighting and color shifts. Training augmentation is likely part of that story.

The two models do share one vulnerability: low contrast. Pi 0.5 dips gently. SmolVLA collapses. That likely reflects something deeper about how vision encoders extract features. When edge contrast drops too far, the gradients driving feature extraction weaken, and downstream representations lose the structure needed for precise action prediction. Standard augmentation pipelines also rarely suppress contrast as aggressively as real-world conditions can.

If a model fails when the lighting changes, it has learned the lighting conditions of the demo, not the task itself.

Full analysis with interactive visualizations: x.com/sihing_guppy/s…
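Perturbations like these are cheap to reproduce. Below is a minimal NumPy sketch of an appearance-only sweep; the transform formulas and severity scaling are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np

def perturb(img, kind, severity):
    """Apply an appearance-only shift to img (float array in [0, 1], HxWx3).

    Severity 0 is the identity; higher values shift appearance more.
    The specific formulas here are illustrative, not a benchmark spec.
    """
    img = img.astype(np.float64)
    if kind == "brightness":
        out = img + 0.2 * severity                            # additive shift
    elif kind == "contrast":
        mean = img.mean(axis=(0, 1), keepdims=True)
        out = mean + (img - mean) * (1.0 - 0.15 * severity)   # pull toward mean
    elif kind == "gamma":
        out = np.clip(img, 1e-6, 1.0) ** (1.0 + 0.25 * severity)
    elif kind == "saturation":
        gray = img.mean(axis=2, keepdims=True)                # crude luma proxy
        out = gray + (img - gray) * (1.0 - 0.2 * severity)    # desaturate
    else:
        raise ValueError(kind)
    return np.clip(out, 0.0, 1.0)

# Sweep one synthetic frame through every perturbation at a fixed severity;
# geometry is untouched, only pixel statistics change.
rng = np.random.default_rng(0)
frame = rng.random((8, 8, 3))
for kind in ["brightness", "contrast", "gamma", "saturation"]:
    shifted = perturb(frame, kind, severity=3)
    assert shifted.shape == frame.shape
```

In an actual evaluation each perturbed frame would be fed to the policy in place of the raw observation, with success rate tracked per perturbation and severity level.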

SAI
SAI@CompeteSai·
@sihing_guppy Line of the Week: many models rely on appearance far more than they admit.
SAI
SAI@CompeteSai·
@sihing_guppy Performance should not be stable across all those perturbation variations unless the models are not actually looking at the language input.
SiHing Guppy
SiHing Guppy@sihing_guppy·
11 types of language corruption on a robot policy — typos, synonyms, nonsense. Performance stable across all.

First instinct: robust.
Actual finding: indifferent.

Unconditional success on a language test is not a passing grade. It's evidence the test isn't being taken.
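A corruption probe like this is straightforward to sketch. The three transforms below mirror the categories named in the tweet (typos, synonyms, nonsense), but the toy lexicon and exact edits are assumptions, not the author's implementation.

```python
import random

def corrupt(instruction, mode, rng):
    """Return a corrupted variant of a language command.

    Illustrative corruptions only; a real probe would use many variants
    per category and compare policy success on each against baseline.
    """
    words = instruction.split()
    if mode == "typo":
        # Swap two adjacent characters inside one randomly chosen word.
        i = rng.randrange(len(words))
        w = words[i]
        if len(w) > 2:
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
        return " ".join(words)
    if mode == "synonym":
        swaps = {"pick": "grab", "place": "put", "bowl": "dish"}  # toy lexicon
        return " ".join(swaps.get(w, w) for w in words)
    if mode == "nonsense":
        return "my name is franka"  # semantically empty replacement
    raise ValueError(mode)

rng = random.Random(0)
cmd = "pick up the black bowl and place it on the plate"
for mode in ["typo", "synonym", "nonsense"]:
    print(mode, "->", corrupt(cmd, mode, rng))
```

If success is unchanged under all three, including the nonsense replacement, the policy is not conditioning on the instruction at all, which is exactly the distinction between "robust" and "indifferent" drawn above.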
SAI
SAI@CompeteSai·
@realmc Let the Chinese-speaking world hear our voice!
SAI retweeted
SiHing Guppy
SiHing Guppy@sihing_guppy·
What happens when you remove a robot's ability to read its instructions? Almost nothing.

Full model → 95% success
Remove language → 94% (▼1%)
Remove vision → 13% (▼82%)

Near-blind without vision. Near-indifferent to language. If your evaluation only tests correct instructions, you're not measuring language. You're measuring vision.
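The ablation itself is simple to express: run the same policy with each input stream zeroed out and compare success rates. A minimal sketch, using a linear stub in place of the real policy; the weights, dimensions, and success criterion are all illustrative assumptions.

```python
import numpy as np

def evaluate(policy, episodes, mask_vision=False, mask_language=False):
    """Success rate of a policy with one input stream optionally zeroed out."""
    successes = 0
    for vision, language, target in episodes:
        v = np.zeros_like(vision) if mask_vision else vision
        l = np.zeros_like(language) if mask_language else language
        successes += int(np.allclose(policy(v, l), target, atol=0.5))
    return successes / len(episodes)

# A stand-in policy that, like the finding above, leans almost entirely on
# vision: the language weights are two orders of magnitude smaller.
rng = np.random.default_rng(0)
W_v = rng.normal(size=(4, 16))        # vision pathway: dominant
W_l = 0.01 * rng.normal(size=(4, 8))  # language pathway: near-ignored

def policy(vision, language):
    return W_v @ vision + W_l @ language

episodes = []
for _ in range(50):
    v, l = rng.normal(size=16), rng.normal(size=8)
    episodes.append((v, l, policy(v, l)))  # target = unperturbed action

full = evaluate(policy, episodes)
no_lang = evaluate(policy, episodes, mask_language=True)
no_vis = evaluate(policy, episodes, mask_vision=True)
# Masking language barely moves the actions; masking vision destroys them.
```

The same harness applied to a real vision-language-action policy would surface the 95% / 94% / 13% pattern quoted above.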
SiHing Guppy
SiHing Guppy@sihing_guppy·
A robot receives a language command "pick up the black bowl and place it on the plate" and executes it. We replaced the command with: "My name is Franka." No task. No object. No action verb. It picked up the bowl and placed it on the plate. The language instruction isn't being read. The scene is being acted upon. The prompt is decoration.
SAI
SAI@CompeteSai·
We’ll be at @NVIDIAGTC next week in San Francisco. @sihing_guppy and @MarcAlloul will be there talking about model transparency and evaluation for robotic policies: how to understand what your models are actually doing before you deploy them. If you’re building or deploying robotic systems, send us a DM or come find us. We’d love to chat.
SAI
SAI@CompeteSai·
@realmc @sihing_guppy My intern told me he caught wind of some huge study results waiting to be revealed.
SAI
SAI@CompeteSai·
@sihing_guppy Intern told me that's one of his favourite books.