
Pascale Fung
@pascalefung
Cofounder and CRIO of AMI Labs. Chair Professor of ECE, HKUST. Fellow of AAAI, ACL, IEEE, ISCA.


Heading to ICLR in Rio 🇧🇷? We’re hosting our first networking mixer on April 24. Meet AMI’s technical team and cofounders, and learn more about what we’re building. Food, drinks, and great conversation included. Register at luma.com/np3x51zh


@amilabs AMI: The final frontier. These are the voyages of a new AI enterprise. Its 5-year mission: to explore & learn about strange new worlds, to seek out & support new life and new civilizations, to boldly go where no man or woman has gone before.




Advanced Machine Intelligence (AMI) is building a new breed of AI systems that understand the world, have persistent memory, can reason and plan, and are controllable and safe. We’ve raised a $1.03B (~€890M) round from global investors who believe in our vision of universally intelligent systems centered on world models. This round is co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, along with other investors and angels across the world. We are a growing team of researchers and builders, operating in Paris, New York, Montreal and Singapore from day one. Read more: amilabs.xyz AMI - Real world. Real intelligence.



Meta just released Action100M on Hugging Face: a massive video dataset with 100M+ hierarchical action annotations. Every video includes a tree-of-captions with action labels and both brief and detailed summaries.

Introducing VL-JEPA: Vision-Language Joint Embedding Predictive Architecture for streaming, live action recognition, retrieval, VQA, and classification tasks with better performance and higher efficiency than large VLMs.
• VL-JEPA is the first non-generative model that can perform general-domain vision-language tasks in real time, built on a joint embedding predictive architecture.
• We demonstrate in controlled experiments that VL-JEPA, trained with latent space embedding prediction, outperforms VLMs that rely on data space token prediction.
• We show that VL-JEPA delivers significant efficiency gains over VLMs for online video streaming applications, thanks to its non-autoregressive design and native support for selective decoding.
• We highlight that our VL-JEPA model, with a unified model architecture, can effectively handle a wide range of classification, retrieval, and VQA tasks at the same time.
by @Delong0_0 @MustafaShukor1 @TheoMoutakanni @willyhcchung Jade Lei Yu Tejaswi Kasarla @AllenBolourchi @ylecun @pascalefung
arxiv.org/abs/2512.10942



Planning with Reasoning using Vision Language World Model

Thanks @_akhaliq for sharing! More about our VLWM:
- Non-pixel-generative world model that reasons in abstract semantic space
- Learned from 20k hours of unlabeled egocentric / web procedural videos with 5.7M action steps
- System-2 planning with reasoning by cost-guided plan search
Congrats to the whole team! @TheoMoutakanni @willyhcchung @yejin_bang @ZiweiJi184538 @AllenBolourchi @pascalefung





