
What aspects of human knowledge do vision models like CLIP fail to capture, and how can we improve them? We suggest that models miss the key global organization of that knowledge; aligning them with it makes them more robust. Check out @lukas_mut's work, finally out (in @Nature!?) + our new blogpost! 1/4


















