MMLab@NTU

84 posts

@MMLabNTU

Multimedia Laboratory @NTUsg, affiliated with S-Lab. Large Multimodal Models, Computer Vision, Image Processing, Computer Graphics, Deep Learning

Singapore · Joined May 2021
20 Following · 1.6K Followers
Pinned Tweet
MMLab@NTU @MMLabNTU
Congratulations to Ziqi and Ziwei! Grateful for the opportunity to work with so many gifted students at @MMLabNTU. Their passion and creativity continue to inspire us! Their achievements are listed here: mmlab-ntu.com/team.html
NTU Singapore @NTUsg

Freshly picked: #NTUsg PhD student Huang Ziqi has been selected as one of 21 global recipients of the prestigious 2025 Apple Scholars in AIML PhD Fellowship — a programme that supports emerging leaders in AI and machine learning through funding, mentorship, and internship opportunities with Apple. She is working on #GenAI technologies that intuitively understand human intentions, making it easier for people to create and control AI-generated visuals like images and videos. “I see this fellowship not only as a personal milestone but also as a recognition of the supportive community and innovative work taking place at NTU,” says Ziqi. Congratulations to Ziqi and her advisor, Assoc Prof Liu Ziwei, on this international recognition. #NTUsgStudents #NTUsgEducation #NTUsgResearch #AIatNTUsg

MMLab@NTU Retweeted
Chen Change Loy @ccloy
VLANeXt systematically studies what actually matters for VLA performance: policy heads, meta-queries, action chunking, flow matching, temporal history, multi-view, and proprioception, and shows that a simple time-series forecasting perspective improves action prediction. 📄 Paper: arxiv.org/abs/2602.18532 💻 Code: github.com/DravenALG/VLAN… If you’re working on VLAs, also check out this curated list of resources: 🔎 github.com/DravenALG/awes…
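The "time-series forecasting perspective" the tweet mentions can be made concrete with a toy sketch. This is not the VLANeXt implementation — it is a minimal, hypothetical illustration of the idea of treating an action chunk (the next H robot actions) as a short multivariate time series forecast from a window of past actions, here with a plain least-squares linear forecaster.

```python
# Minimal sketch (not VLANeXt code): action chunking viewed as time-series
# forecasting. A linear model maps a window of past action vectors to a
# chunk of H future action vectors, fit by least squares.
import numpy as np

def fit_linear_forecaster(history, horizon, context):
    """Least-squares fit mapping `context` past actions to `horizon` future ones.

    history: (T, D) array of past action vectors.
    Returns weights W of shape (context*D, horizon*D).
    """
    T, D = history.shape
    X, Y = [], []
    for t in range(context, T - horizon + 1):
        X.append(history[t - context:t].ravel())   # flattened context window
        Y.append(history[t:t + horizon].ravel())   # flattened future chunk
    X, Y = np.array(X), np.array(Y)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def forecast_chunk(history, W, horizon, context):
    """Predict the next `horizon` actions from the most recent context window."""
    D = history.shape[1]
    x = history[-context:].ravel()
    return (x @ W).reshape(horizon, D)

# Toy demo: a smooth sinusoidal "trajectory" satisfies a linear recurrence,
# so a linear forecaster recovers the next chunk almost exactly.
t = np.linspace(0, 8 * np.pi, 400)
traj = np.stack([np.sin(t), np.cos(t)], axis=1)   # (400, 2) action series
W = fit_linear_forecaster(traj[:300], horizon=5, context=10)
pred = forecast_chunk(traj[:300], W, horizon=5, context=10)
print(np.abs(pred - traj[300:305]).max())
```

Real VLA policies replace the linear map with a learned network and condition on observations; the sketch only shows the forecasting framing of chunked action prediction.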
MMLab@NTU Retweeted
Chen Change Loy @ccloy
4RC introduces a unified, fully feed-forward framework for monocular 4D reconstruction that encodes an entire video once and then flexibly queries dense 3D geometry and motion for any frame at any timestamp. By factorizing scene structure into base geometry and time-dependent displacements, it achieves accurate, efficient, and state-of-the-art performance across diverse 4D reconstruction tasks. Paper: arxiv.org/pdf/2602.10094 Project Page: yihangluo.com/projects/4RC/
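The factorization described above — base geometry plus time-dependent displacements — can be pictured with a tiny sketch. This is an invented illustration, not the 4RC code: it uses a toy linear motion model for the displacement field, just to show why any timestamp can be queried without re-encoding.

```python
# Hypothetical sketch of factorized 4D geometry: positions at time t are a
# static base point cloud plus a time-dependent displacement. Here the
# displacement is a toy linear motion (t * velocity) for illustration only.
import numpy as np

class FactorizedGeometry:
    def __init__(self, base_points, velocities):
        self.base = base_points   # (N, 3) static base geometry
        self.vel = velocities     # (N, 3) toy per-point motion model

    def query(self, t):
        """Return (N, 3) point positions at an arbitrary timestamp t."""
        return self.base + t * self.vel   # displacement(t) = t * velocity

base = np.zeros((4, 3))
vel = np.tile(np.array([1.0, 0.0, 0.0]), (4, 1))  # all points drift along +x
scene = FactorizedGeometry(base, vel)
print(scene.query(0.5))   # every point displaced 0.5 along x
```

In the paper's setting the displacement field is predicted by the network per query timestamp; the point of the factorization is that the expensive video encoding happens once, while `query(t)` stays cheap.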
MMLab@NTU @MMLabNTU
An on-device AI assistant that transforms local files into structured memory — privacy-first and extensible!
Chen Change Loy @ccloy

🚀 @synvoAI We’re excited to open-source Local-Cocoa, an on-device AI assistant that turns your files into semantic memory while keeping everything private. Built in collaboration with @MMLabNTU, and we warmly welcome contributors to help push it further! 🔧✨ 👉 github.com/synvo-ai/Local…

MMLab@NTU Retweeted
Chen Change Loy @ccloy
🚀 @synvoAI We’re excited to open-source Local-Cocoa, an on-device AI assistant that turns your files into semantic memory while keeping everything private. Built in collaboration with @MMLabNTU, and we warmly welcome contributors to help push it further! 🔧✨ 👉 github.com/synvo-ai/Local…
MMLab@NTU Retweeted
Kang Liao @KangLiao929
Introducing 𝐓𝐡𝐢𝐧𝐤𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐂𝐚𝐦𝐞𝐫𝐚📸, a unified multimodal model that integrates camera-centric spatial intelligence to interpret and create scenes from arbitrary viewpoints. Project Page: kangliao929.github.io/projects/puffi… Code: github.com/KangLiao929/Pu…
MMLab@NTU Retweeted
AK @_akhaliq
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
MMLab@NTU Retweeted
Shangchen Zhou @ShangchenZhou
📸Join us at #ICCV2025 for the Mobile Intelligent Photography & Imaging (MIPI) Workshop! ✨Leading keynotes: Profs. @songhan_mit, Michal Irani, Boxin Shi, and @MingHsuanYang - on intelligent photography and efficient GenAI. 🗓Oct 20, 8:50am–12:30pm HST 🔗mipi-challenge.org
MMLab@NTU @MMLabNTU
Congratulations to @liuziwei7 of @MMLabNTU, recipient of the Young Scientist Award, recognised for his impactful contributions to computer vision and generative AI. 🎉🎉
NTU Singapore @NTUsg

🏆 Congrats to #NTUsg Prof Ng Geok Ing on the 🇸🇬 President’s Technology Award 2025. A pioneer in Gallium Nitride (#GaN) – found in fast chargers, EVs, satellites & defence – he built 🇸🇬’s global standing in this field and led the creation of the national GaN centre. 👏 We also congratulate Young Scientist Award recipient Assoc Prof Liu Ziwei, who was recognised for advancing #AI in 3D & 4D vision, as well as digital twins, with impact on healthcare, education and other sectors. #PSTA @NTU_EEE @NTU_ccds

MMLab@NTU Retweeted
Shulin Tian @shulin_tian
🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher: it is longer, messier, and harder. 💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits? We introduce 👓Ego-R1: a framework for reasoning over ultra-long (i.e., spanning days to weeks) egocentric videos, supported by Chain-of-Tool-Thought (CoTT), which decomposes complex reasoning tasks into modular steps. At its core is Ego-R1-Agent-3B, an orchestrating language model trained to dynamically invoke specialized tools at each step, based on previous actions and observations, to gather the necessary information and solve the task step by step. All code and data are fully open-sourced :) 🌐 Project: egolife-ai.github.io/Ego-R1 📄 Paper: arxiv.org/abs/2506.13654 💻 Code: github.com/egolife-ai/Ego…
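The Chain-of-Tool-Thought loop described above can be sketched in a few lines. Everything here is invented for illustration — the tool names, the picking policy, and their outputs are placeholders, not the Ego-R1 API — but the control flow matches the description: an orchestrator repeatedly chooses a specialized tool based on the trajectory so far, until it decides it can answer.

```python
# Hypothetical sketch of a Chain-of-Tool-Thought (CoTT) loop: an
# orchestrator picks one tool per step, conditioned on the question and
# all previous (tool, observation) pairs, then stops to answer.
def cott_loop(question, tools, pick_tool, max_steps=10):
    trajectory = []   # list of (tool_name, observation) pairs
    for _ in range(max_steps):
        name = pick_tool(question, trajectory)
        if name == "answer":
            break
        observation = tools[name](question, trajectory)
        trajectory.append((name, observation))
    return trajectory

# Toy stand-ins for retrieval tools over a long egocentric recording.
tools = {
    "search_day": lambda q, tr: "candidate clips: [clip_17, clip_42]",
    "inspect_clip": lambda q, tr: "clip_42 shows the answer event",
}

def pick_tool(question, trajectory):
    # Trivial fixed policy for the demo: search, then inspect, then answer.
    # In Ego-R1 this decision is made by the trained 3B language model.
    order = ["search_day", "inspect_clip", "answer"]
    return order[len(trajectory)]

traj = cott_loop("When did I last water the plants?", tools, pick_tool)
print(traj)
```

The design point is that memory stays bounded: only the compact trajectory of tool calls and observations is carried forward, never the raw week-long video.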
MMLab@NTU Retweeted
Ziqi Huang @ziqi_huang_
🎬 𝗖𝗩𝗣𝗥 𝟮𝟬𝟮𝟱 𝗧𝘂𝘁𝗼𝗿𝗶𝗮𝗹 𝙁𝙧𝙤𝙢 𝙑𝙞𝙙𝙚𝙤 𝙂𝙚𝙣𝙚𝙧𝙖𝙩𝙞𝙤𝙣 𝙩𝙤 𝙒𝙤𝙧𝙡𝙙 𝙈𝙤𝙙𝙚𝙡 🚀 Hosted by MMLab@NTU × Kuaishou, etc 📅 June 11 | Nashville 🔗 world-model-tutorial.github.io 🧠 Video is just the start. World modeling is the goal. #CVPR2025 #WorldModel
MMLab@NTU Retweeted
AK @_akhaliq
Aero-1-Audio is out on Hugging Face. Trained in <24h on just 16×H100. Handles 15+ min audio seamlessly. Outperforms bigger models like Whisper, Qwen-2-Audio & commercial services from ElevenLabs/Scribe.
MMLab@NTU Retweeted
Size Wu @WuSize
🔥 We release Harmon: a unified framework for multimodal understanding & generation with a shared visual encoder (vs. decoupled Janus/-Pro). 💥 SOTA on GenEval, MJHQ, WISE 🧠 Strong understanding performance 📄 Paper: huggingface.co/papers/2503.21… 🔗 Code: github.com/wusize/Harmon
MMLab@NTU Retweeted
Chen Change Loy @ccloy
🚀 Meet Harmon – a unified model for both image generation and understanding! Trained with a shared masked autoregressive encoder, it sets new benchmarks on GenEval & MJHQ30K. 🖼️💬 Try the live demo now on Hugging Face: 👉 huggingface.co/spaces/wusize/… Paper: arxiv.org/abs/2503.21979 Code: github.com/wusize/Harmon
Size Wu @WuSize

🔥 We release Harmon: a unified framework for multimodal understanding & generation with a shared visual encoder (vs. decoupled Janus/-Pro). 💥 SOTA on GenEval, MJHQ, WISE 🧠 Strong understanding performance 📄 Paper: huggingface.co/papers/2503.21… 🔗 Code: github.com/wusize/Harmon

MMLab@NTU Retweeted
Chen Change Loy @ccloy
We turned our method, rejected by CVPR and ECCV, into the iOS app "Cutcha". EdgeSAM, our fast Segment Anything Model, runs at over 30 FPS on an iPhone 14. Enjoy intuitive one-touch object selection and precise editing—all processed locally on your device. No cloud needed! Download now on the App Store! 📸✨ apps.apple.com/us/app/cutcha-… Kudos to our in-house development team Han Soong and Voon Hueh #Cutcha #EdgeSAM #PhotoEditing #iOSApp
MMLab@NTU Retweeted
Chen Change Loy @ccloy
📸🌟 Attention all photography and imaging enthusiasts! Join us at the Third MIPI Workshop at #CVPR2024! 📍 Location: Arch 213 ⏰ Time: 08:30 AM - 12:10 PM 🌐 Website: mipi-challenge.org Don't miss out on an exciting lineup of speakers: 🔹 Lei Zhang: How Far Are We From the Restore Any Image Model (RAIM)? 🔹 Mahmoud Afifi: Revisiting Image White Balancing 🔹 Jian Wang: Towards A Better Camera in the App and AR Glass 🔹 Mian Wei: Passive Ultra-Wideband Single-Photon Imaging #Photography #Imaging #MIPIWorkshop #CVPR2024