
WAC2022
669 posts

@WebAudioConf
#WAC2022 https://t.co/3GncgII1zK

Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, I really appreciate the team for sharing helpful insights and design decisions. Sora is incredible and is set to transform the video generation community.

What we have learned so far:
- Architecture: Sora is built on our diffusion transformer (DiT) model (published in ICCV 2023). In short, it's a diffusion model with a transformer backbone: DiT = [VAE encoder + ViT + DDPM + VAE decoder]. According to the report, there don't seem to be many additional bells and whistles.
- "Video compressor network": Looks like it's just a VAE, but trained on raw video data. Tokenization probably plays a significant role in getting good temporal consistency. By the way, the VAE is a ConvNet, so DiT is technically a hybrid model ;)

(1/n)
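To make the "DiT = [VAE encoder + ViT + DDPM + VAE decoder]" shorthand a bit more concrete, here is a rough shape-bookkeeping sketch for the image case. It only tracks tensor shapes through the VAE encoder and the ViT patchify step and does not run a model. The function names are hypothetical. The defaults (8x spatial downsampling, 4 latent channels, patch size 2) are assumptions that match the commonly used DiT-XL/2 setup on 256x256 images, not anything stated in the Sora report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatentShape:
    channels: int
    height: int
    width: int


def vae_encode_shape(image_h, image_w, downsample=8, latent_channels=4):
    """Shape of the latent a VAE encoder would emit.

    Assumption: 8x spatial downsampling and 4 latent channels.
    """
    if image_h % downsample or image_w % downsample:
        raise ValueError("image size must be divisible by the downsample factor")
    return LatentShape(latent_channels, image_h // downsample, image_w // downsample)


def patchify_shape(latent, patch_size=2):
    """Number of tokens and per-token dimension after ViT-style patchify."""
    if latent.height % patch_size or latent.width % patch_size:
        raise ValueError("latent size must be divisible by the patch size")
    num_tokens = (latent.height // patch_size) * (latent.width // patch_size)
    token_dim = latent.channels * patch_size * patch_size
    return num_tokens, token_dim


def describe_dit_pipeline(image_h, image_w, patch_size=2):
    """Trace shapes through: VAE encoder -> patchify -> (transformer denoiser)."""
    latent = vae_encode_shape(image_h, image_w)
    num_tokens, token_dim = patchify_shape(latent, patch_size)
    return {
        "latent": (latent.channels, latent.height, latent.width),
        "num_tokens": num_tokens,
        "token_dim": token_dim,
    }


if __name__ == "__main__":
    print(describe_dit_pipeline(256, 256))
```

The transformer then denoises that token sequence DDPM-style, and the VAE decoder maps the cleaned latent back to pixels. For video, the sequence would also need a time axis, which this sketch leaves out.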



Today we're releasing the Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation. SAM is capable of one-click segmentation of any object from any photo or video + zero-shot transfer to other segmentation tasks ➡️ bit.ly/433YuBI




👾 #GamesOnWeb student competition (#concours #etudiant), organized by @Univ_CotedAzur, @univamu and @UT3PaulSabatier with @CGI_FR. Registration is open! 👉 bit.ly/GOW2023 #GOW2023 @MIAGENiceSophia | @PolytechNSophia | @iut_nca | @IutAixMars | @MiageAixMrs | @MIAGEToulouse





