MMLab@NTU

84 posts

@MMLabNTU

Multimedia Laboratory @NTUsg, affiliated with S-Lab. Large Multimodal Models, Computer Vision, Image Processing, Computer Graphics, Deep Learning

Singapore · Joined May 2021
20 Following · 1.6K Followers
Pinned Tweet
MMLab@NTU @MMLabNTU
Congratulations to Ziqi and Ziwei! Grateful for the opportunity to work with so many gifted students at @MMLabNTU. Their passion and creativity continue to inspire us! Their achievements are listed here: mmlab-ntu.com/team.html
NTU Singapore @NTUsg

Freshly picked: #NTUsg PhD student Huang Ziqi has been selected as one of 21 global recipients of the prestigious 2025 Apple Scholars in AIML PhD Fellowship — a programme that supports emerging leaders in AI and machine learning through funding, mentorship, and internship opportunities with Apple. She is working on #GenAI technologies that intuitively understand human intentions, making it easier for people to create and control AI-generated visuals like images and videos. “I see this fellowship not only as a personal milestone but also as a recognition of the supportive community and innovative work taking place at NTU,” says Ziqi. Congratulations to Ziqi and her advisor, Assoc Prof Liu Ziwei, on this international recognition. #NTUsgStudents #NTUsgEducation #NTUsgResearch #AIatNTUsg

MMLab@NTU Retweeted
Chen Change Loy @ccloy
VLANeXt systematically studies what actually matters for VLA performance: policy heads, meta-queries, action chunking, flow matching, temporal history, multi-view, and proprioception, and shows that a simple time-series forecasting perspective improves action prediction. 📄 Paper: arxiv.org/abs/2602.18532 💻 Code: github.com/DravenALG/VLAN… If you’re working on VLAs, also check out this curated list of resources: 🔎 github.com/DravenALG/awes…
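The "time-series forecasting perspective" the tweet mentions can be made concrete with a toy sketch. This is not the VLANeXt implementation — it is a minimal, hypothetical illustration of the idea of treating an action chunk (the next H robot actions) as a short multivariate time series forecast from a window of past actions, here with a plain least-squares linear forecaster.

```python
# Minimal sketch (not VLANeXt code): action chunking viewed as time-series
# forecasting. A linear model maps a window of past action vectors to a
# chunk of H future action vectors, fit by least squares.
import numpy as np

def fit_linear_forecaster(history, horizon, context):
    """Least-squares fit mapping `context` past actions to `horizon` future ones.

    history: (T, D) array of past action vectors.
    Returns weights W of shape (context*D, horizon*D).
    """
    T, D = history.shape
    X, Y = [], []
    for t in range(context, T - horizon + 1):
        X.append(history[t - context:t].ravel())   # flattened context window
        Y.append(history[t:t + horizon].ravel())   # flattened future chunk
    X, Y = np.array(X), np.array(Y)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def forecast_chunk(history, W, horizon, context):
    """Predict the next `horizon` actions from the most recent context window."""
    D = history.shape[1]
    x = history[-context:].ravel()
    return (x @ W).reshape(horizon, D)

# Toy demo: a smooth sinusoidal "trajectory" satisfies a linear recurrence,
# so a linear forecaster recovers the next chunk almost exactly.
t = np.linspace(0, 8 * np.pi, 400)
traj = np.stack([np.sin(t), np.cos(t)], axis=1)   # (400, 2) action series
W = fit_linear_forecaster(traj[:300], horizon=5, context=10)
pred = forecast_chunk(traj[:300], W, horizon=5, context=10)
print(np.abs(pred - traj[300:305]).max())
```

Real VLA policies replace the linear map with a learned network and condition on observations; the sketch only shows the forecasting framing of chunked action prediction.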
MMLab@NTU Retweeted
Chen Change Loy @ccloy
4RC introduces a unified, fully feed-forward framework for monocular 4D reconstruction that encodes an entire video once and then flexibly queries dense 3D geometry and motion for any frame at any timestamp. By factorizing scene structure into base geometry and time-dependent displacements, it achieves accurate, efficient, and state-of-the-art performance across diverse 4D reconstruction tasks. Paper: arxiv.org/pdf/2602.10094 Project Page: yihangluo.com/projects/4RC/
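The factorization described above — base geometry plus time-dependent displacements — can be pictured with a tiny sketch. This is an invented illustration, not the 4RC code: it uses a toy linear motion model for the displacement field, just to show why any timestamp can be queried without re-encoding.

```python
# Hypothetical sketch of factorized 4D geometry: positions at time t are a
# static base point cloud plus a time-dependent displacement. Here the
# displacement is a toy linear motion (t * velocity) for illustration only.
import numpy as np

class FactorizedGeometry:
    def __init__(self, base_points, velocities):
        self.base = base_points   # (N, 3) static base geometry
        self.vel = velocities     # (N, 3) toy per-point motion model

    def query(self, t):
        """Return (N, 3) point positions at an arbitrary timestamp t."""
        return self.base + t * self.vel   # displacement(t) = t * velocity

base = np.zeros((4, 3))
vel = np.tile(np.array([1.0, 0.0, 0.0]), (4, 1))  # all points drift along +x
scene = FactorizedGeometry(base, vel)
print(scene.query(0.5))   # every point displaced 0.5 along x
```

In the paper's setting the displacement field is predicted by the network per query timestamp; the point of the factorization is that the expensive video encoding happens once, while `query(t)` stays cheap.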
MMLab@NTU @MMLabNTU
An on-device AI assistant that transforms local files into structured memory — privacy-first and extensible!
Chen Change Loy @ccloy

🚀 @synvoAI We’re excited to open-source Local-Cocoa, an on-device AI assistant that turns your files into semantic memory while keeping everything private. Built in collaboration with @MMLabNTU, and we warmly welcome contributors to help push it further! 🔧✨ 👉 github.com/synvo-ai/Local…

MMLab@NTU Retweeted
Chen Change Loy @ccloy
🚀 @synvoAI We’re excited to open-source Local-Cocoa, an on-device AI assistant that turns your files into semantic memory while keeping everything private. Built in collaboration with @MMLabNTU, and we warmly welcome contributors to help push it further! 🔧✨ 👉 github.com/synvo-ai/Local…
MMLab@NTU Retweeted
Kang Liao @KangLiao929
Introducing 𝐓𝐡𝐢𝐧𝐤𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐂𝐚𝐦𝐞𝐫𝐚📸, a unified multimodal model that integrates camera-centric spatial intelligence to interpret and create scenes from arbitrary viewpoints. Project Page: kangliao929.github.io/projects/puffi… Code: github.com/KangLiao929/Pu…
MMLab@NTU Retweeted
AK @_akhaliq
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
MMLab@NTU Retweeted
Shangchen Zhou @ShangchenZhou
📸Join us at #ICCV2025 for the Mobile Intelligent Photography & Imaging (MIPI) Workshop! ✨Leading keynotes: Profs. @songhan_mit, Michal Irani, Boxin Shi, and @MingHsuanYang - on intelligent photography and efficient GenAI. 🗓Oct 20, 8:50am–12:30pm HST 🔗mipi-challenge.org
MMLab@NTU @MMLabNTU
Congratulations to @liuziwei7 of @MMLabNTU, recipient of the Young Scientist Award, recognised for his impactful contributions to computer vision and generative AI. 🎉🎉
NTU Singapore @NTUsg

🏆 Congrats to #NTUsg Prof Ng Geok Ing on the 🇸🇬 President’s Technology Award 2025. A pioneer in Gallium Nitride (#GaN) – found in fast chargers, EVs, satellites & defence – he built 🇸🇬’s global standing in this field and led the creation of the national GaN centre. 👏 We also congratulate Young Scientist Award recipient Assoc Prof Liu Ziwei, who was recognised for advancing #AI in 3D & 4D vision, as well as digital twins, with impact on healthcare, education and other sectors. #PSTA @NTU_EEE @NTU_ccds

MMLab@NTU Retweeted
Shulin Tian @shulin_tian
🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher: it is longer, messier, and harder. 💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits? We introduce 👓Ego-R1: a framework for reasoning over ultra-long (i.e., spanning days to weeks) egocentric videos, supported by Chain-of-Tool-Thought (CoTT), which decomposes complex reasoning tasks into modular steps. At its core is Ego-R1-Agent-3B, an orchestrating language model trained to dynamically invoke specialized tools at each step, based on previous actions and observations, to gather the necessary information and solve the task step by step. All code and data are fully open-sourced :) 🌐 Project: egolife-ai.github.io/Ego-R1 📄 Paper: arxiv.org/abs/2506.13654 💻 Code: github.com/egolife-ai/Ego…
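The Chain-of-Tool-Thought loop described above can be sketched in a few lines. Everything here is invented for illustration — the tool names, the picking policy, and their outputs are placeholders, not the Ego-R1 API — but the control flow matches the description: an orchestrator repeatedly chooses a specialized tool based on the trajectory so far, until it decides it can answer.

```python
# Hypothetical sketch of a Chain-of-Tool-Thought (CoTT) loop: an
# orchestrator picks one tool per step, conditioned on the question and
# all previous (tool, observation) pairs, then stops to answer.
def cott_loop(question, tools, pick_tool, max_steps=10):
    trajectory = []   # list of (tool_name, observation) pairs
    for _ in range(max_steps):
        name = pick_tool(question, trajectory)
        if name == "answer":
            break
        observation = tools[name](question, trajectory)
        trajectory.append((name, observation))
    return trajectory

# Toy stand-ins for retrieval tools over a long egocentric recording.
tools = {
    "search_day": lambda q, tr: "candidate clips: [clip_17, clip_42]",
    "inspect_clip": lambda q, tr: "clip_42 shows the answer event",
}

def pick_tool(question, trajectory):
    # Trivial fixed policy for the demo: search, then inspect, then answer.
    # In Ego-R1 this decision is made by the trained 3B language model.
    order = ["search_day", "inspect_clip", "answer"]
    return order[len(trajectory)]

traj = cott_loop("When did I last water the plants?", tools, pick_tool)
print(traj)
```

The design point is that memory stays bounded: only the compact trajectory of tool calls and observations is carried forward, never the raw week-long video.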
MMLab@NTU Retweeted
Ziqi Huang @ziqi_huang_
🎬 𝗖𝗩𝗣𝗥 𝟮𝟬𝟮𝟱 𝗧𝘂𝘁𝗼𝗿𝗶𝗮𝗹 𝙁𝙧𝙤𝙢 𝙑𝙞𝙙𝙚𝙤 𝙂𝙚𝙣𝙚𝙧𝙖𝙩𝙞𝙤𝙣 𝙩𝙤 𝙒𝙤𝙧𝙡𝙙 𝙈𝙤𝙙𝙚𝙡 🚀 Hosted by MMLab@NTU × Kuaishou, etc 📅 June 11 | Nashville 🔗 world-model-tutorial.github.io 🧠 Video is just the start. World modeling is the goal. #CVPR2025 #WorldModel
MMLab@NTU Retweeted
AK @_akhaliq
Aero-1-Audio is out on Hugging Face. Trained in <24h on just 16×H100. Handles 15+ min audio seamlessly. Outperforms bigger models like Whisper, Qwen-2-Audio & commercial services from ElevenLabs/Scribe.
MMLab@NTU Retweeted
Size Wu @WuSize
🔥 We release Harmon: a unified framework for multimodal understanding & generation with a shared visual encoder (vs. decoupled Janus/-Pro). 💥 SOTA on GenEval, MJHQ, WISE 🧠 Strong understanding performance 📄 Paper: huggingface.co/papers/2503.21… 🔗 Code: github.com/wusize/Harmon
MMLab@NTU Retweeted
Chen Change Loy @ccloy
🚀 Meet Harmon – a unified model for both image generation and understanding! Trained with a shared masked autoregressive encoder, it sets new benchmarks on GenEval & MJHQ30K. 🖼️💬 Try the live demo now on Hugging Face: 👉 huggingface.co/spaces/wusize/… Paper: arxiv.org/abs/2503.21979 Code: github.com/wusize/Harmon
Size Wu @WuSize

🔥 We release Harmon: a unified framework for multimodal understanding & generation with a shared visual encoder (vs. decoupled Janus/-Pro). 💥 SOTA on GenEval, MJHQ, WISE 🧠 Strong understanding performance 📄 Paper: huggingface.co/papers/2503.21… 🔗 Code: github.com/wusize/Harmon

MMLab@NTU Retweeted
Chen Change Loy @ccloy
We turned our method, rejected by CVPR and ECCV, into the iOS app "Cutcha". EdgeSAM, our fast Segment Anything Model, runs at over 30 FPS on an iPhone 14. Enjoy intuitive one-touch object selection and precise editing—all processed locally on your device. No cloud needed! Download now on the App Store! 📸✨ apps.apple.com/us/app/cutcha-… Kudos to our in-house development team Han Soong and Voon Hueh #Cutcha #EdgeSAM #PhotoEditing #iOSApp
MMLab@NTU Retweeted
Chen Change Loy @ccloy
📸🌟 Attention all photography and imaging enthusiasts! Join us at the Third MIPI Workshop at #CVPR2024! 📍 Location: Arch 213 ⏰ Time: 08:30 AM - 12:10 PM 🌐 Website: mipi-challenge.org Don't miss out on an exciting lineup of speakers: 🔹 Lei Zhang: How Far Are We From the Restore Any Image Model (RAIM)? 🔹 Mahmoud Afifi: Revisiting Image White Balancing 🔹 Jian Wang: Towards A Better Camera in the App and AR Glass 🔹 Mian Wei: Passive Ultra-Wideband Single-Photon Imaging #Photography #Imaging #MIPIWorkshop #CVPR2024