GAMMA UMD

1.1K posts


@gammaumd

Geometric Algorithms for Modeling, Motion, and Animation research group: UNC Chapel Hill (1992-2018); University of Maryland, College Park (2018 onwards)

College Park, MD · Joined October 2009
530 Following · 756 Followers
Pinned Tweet
GAMMA UMD @gammaumd
Thrilled to share that our team has multiple papers accepted at #NeurIPS2025 🎉🚀 We’re excited to contribute to advancing multi-modal learning, physical reasoning, and embodied AI. Here’s a quick overview of the works 👇🧵
GAMMA UMD @gammaumd
Huge congrats to James Mullen @jamesfmullen and Geonsun Lee @gsun_lee on successfully defending their PhDs this week! 🎓👏 We’re so proud to celebrate your achievements — your hard work and perseverance have inspired everyone in the GAMMA Lab and beyond! 🌟
GAMMA UMD @gammaumd
GAMMA Group presented 5 papers at ICCV:

1. "AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs"
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Yaoting Wang, Mohamed (Mo) Elhoseiny, Ruohan Gao, Dinesh Manocha
📍 Tue 21 Oct | 11:45 a.m. – 1:45 p.m. HST | Poster Session 1 | Exhibit Hall I #141
🌐 Project page: lnkd.in/g5mMwkHj 💻 GitHub: lnkd.in/gxWJibcx

2. "EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception"
Sanjoy Chowdhury, Subrata Biswas, Sayan Nag, Tushar Nagarajan, Calvin Murdock, Ishwarya Ananthabhotla, Yijun Q., Vamsi Krishna Ithapu, Dinesh Manocha, Ruohan Gao
📍 Wed 22 Oct | 11:15 a.m. – 1:15 p.m. HST | Poster Session 3 | Exhibit Hall I #983
🌐 Project page: lnkd.in/gF3c2eKt 💻 GitHub: lnkd.in/g-iiQaed

3. "AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs"
Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed (Mo) Elhoseiny, Salman Khan, Dinesh Manocha
📍 Thu 23 Oct | 11:15 a.m. – 1:15 p.m. HST | Poster Session 5 | Exhibit Hall I #2113
🌐 Project page: lnkd.in/gwd_T5VM 💻 GitHub: lnkd.in/gV-5aWRs

4. "DMesh++: An Efficient Differentiable Mesh for Complex Shapes"
Sanghyun Son, Matheus Gadelha, Yang Zhou, Matthew Fisher, Zexiang Xu, Yi-Ling Qiao, Ming Lin, Yi Zhou
📍 Thu 23 Oct | 2:45 p.m. – 4:45 p.m. HST (5:45 p.m. – 7:45 p.m. PDT) | Exhibit Hall I #2456
🌐 Project page: lnkd.in/epYCB43E 💻 GitHub: lnkd.in/eabpDQGx

5. "IM360: Large-scale Indoor Mapping with 360 Cameras"
Dongki Jung, Jaehoon Choi, Yonghan Lee, Dinesh Manocha
📍 Thu 23 Oct | 2:45 p.m. – 4:45 p.m. HST | Poster Session 6 | Exhibit Hall I #2691
🌐 Project page: lnkd.in/ewQ4uEdS 💻 GitHub: lnkd.in/e4FQpnNE
GAMMA UMD @gammaumd
GAMMA Group at UMD presented 13 papers at IROS 2025:

1. "ET-Former: Efficient Triplane Deformable Attention for 3D Semantic Scene Completion from Monocular Camera" | 13:50–13:55, TuBT10.7
Liang, Jing; Yin, He; Qi, Xuewei; Park, Jong Jin; Sun, Min; Madhivanan, Rajasimman; Manocha, Dinesh

2. "Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robotic Navigation" | 13:30–13:35, TuBT11.3
Patel, Bhrij; Kulathun Mudiyanselage, Kasun Weerakoon; Suttle, Wesley A.; Koppel, Alec; Sadler, Brian; Zhou, Tianyi; Manocha, Dinesh; Bedi, Amrit Singh

3. "TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-Based Scenes" | 13:25–13:30, TuBT30.2
Maxey, Christopher; Choi, Jaehoon; Kwon, Heesung; Lee, Hyungtae; Manocha, Dinesh

4. "CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains" | 17:10–17:15, TuDT11.7
Seneviratne, Gershom Devake; Kulathun Mudiyanselage, Kasun Weerakoon; Elnoor, Mohamed; Rajagopal, Vignesh; Varatharajan, H.; M Jaffar, Mohamed Khalid; Pusey, Jason; Manocha, Dinesh

5. "VL-TGS: Trajectory Generation and Selection Using Vision Language Models in Mapless Outdoor Environments" | 13:20–13:25, WeBT7.1
Song, Daeun; Liang, Jing; Xiao, Xuesu; Manocha, Dinesh

6. "AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning" | 15:20–15:25, WeCT5.5
Kong, Yangzhe; Song, Daeun; Liang, Jing; Manocha, Dinesh; Yao, Z.; Xiao, Xuesu

7. "Cross-Source-Context Indoor RGB-D Place Recognition" | 15:30–15:35, WeCT10.7
Liang, Jing; Deng, Zhuo; Zhou, Zheming; Ghasemalizadeh, O.; Kuo, Cheng-Hao; Sen, A.; Manocha, Dinesh

8. "SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments" | 13:45–13:50, ThBT12.6
Payandeh, Amirreza; Song, Daeun; Nazeri, M.; Liang, Jing; Mukherjee, P.; Raj, Amir Hossain; Kong, Y.; Manocha, Dinesh; Xiao, Xuesu

9. "Is the House Ready for Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering" | 15:25–15:30, ThCT3.6
Dorbala, Vishnu Sashank; Goyal, P.; Piramuthu, R.; Johnston, M.; Ghanadan, Reza; Manocha, Dinesh

10. "LBAP: Improved Uncertainty Alignment of LLM Planners Using Bayesian Inference" | 15:10–15:15, ThCT8.3
Mullen, James; Manocha, Dinesh

11. "MMCD: Multi-Modal Collaborative Decision-Making for Connected Autonomy with Knowledge Distillation" | 17:10–17:15, TuDT9.7
Liu, Rui; Wang, Zikang; Gao, Peng; Shen, Yu; Tokekar, Pratap; Lin, Ming C.

12. "Quantifying and Modeling Driving Style in Trajectory Forecasting" | 15:05–15:10, WeCT17.2
Zheng, Laura; Yaghoubi Araghi, Hamidreza; Wu, Tony; Thalapanane, Sandeep; Zhou, Tianyi; Lin, Ming C.

13. "On the Vulnerability of LLM/VLM-Controlled Robotics" | 13:45–13:50, TuBT4.6
Wu, Xiyang; Chakraborty, S.; Xian, Ruiqi; Liang, Jing; Guan, Tianrui; Liu, F.; Sadler, B.; Manocha, Dinesh; Bedi, Amrit S.
GAMMA UMD reposted
DailyPapers @HuggingPapers
NVIDIA just released Audio Flamingo 3 on Hugging Face! This fully open, state-of-the-art Large Audio-Language Model excels at understanding & reasoning across speech, sounds, and music, setting new benchmarks on 20+ tasks. huggingface.co/nvidia/audio-f…
GAMMA UMD reposted
Amrit Singh Bedi @amritsinghbedi3
Excited to share our #NeurIPS2025 paper 🎉 "more thinking ≠ better reasoning" 👉 We uncover the mirage of test-time thinking scaling: increasing thinking tokens at test time boosts accuracy briefly, then hurts as response variance increases
GAMMA UMD @gammaumd
(5/n) VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Authors: @zli12321, @wu_xiyang, @guangyao_shi, Yubin Qin, Hongyang Du, @zhoutianyi, @dmanocha, @boydgraber
Paper: arxiv.org/abs/2505.01481
Project: wuxiyang1996.github.io/videohallu_pag…
Dataset: huggingface.co/datasets/Intel…
We present VideoHallu, a benchmark of 3,000+ synthetic videos with counterintuitive QA pairs that reveal how state-of-the-art MLLMs hallucinate on physics violations, spatio-temporal inconsistencies, and commonsense errors, while showing that targeted fine-tuning significantly improves abnormality detection and reasoning.
GAMMA UMD @gammaumd
(4/n) MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Authors: Sanjoy Chowdhury, Mohamed Elmoghany, Yohan Abeysinghe, Junjie Fei, Sayan Nag, Salman Khan, @moElhoseiny, @dmanocha
Paper: arxiv.org/abs/2506.07016
Project page: schowdhury671.github.io/magnet_project/
Benchmark: huggingface.co/datasets/elmog…
We introduce MAGNET, a multi-agent framework with a new benchmark (AVHaystacks), task (AVHaystacksQA), and metrics (StEM & MTGS) that enables and evaluates multi-video audio-visual reasoning, achieving state-of-the-art performance with large gains over strong baselines.
GAMMA UMD @gammaumd
🎓 We’re thrilled to celebrate the successful PhD thesis defenses of Jing Liang @Jing53582 and Senthil Hariharan Arul! 🥳👏 A huge congratulations to both; your hard work and dedication have truly paid off. We’re so proud of you and can’t wait to see what’s next! 🌟
GAMMA UMD @gammaumd
🚀 New @UMD_CollegeISR research by @vdorbala helps robots truly “get” situational context! 🤖 Introducing Situational EQA (S-EQA), enabling embodied agents to reason over multiple object states & relationships to answer complex, real-world queries like “Is the house ready for sleep time?” 📄 Accepted to #IROS2025 in Hangzhou, China 🌐 Paving the way for smarter, more context-aware home robots. 🔗 isr.umd.edu/news/story/new… #AI #Robotics #ComputerVision
GAMMA UMD reposted
laura z @laurayuzheng
I DEFENDED MY THESIS TODAY!!! Thanks to my advisor Ming Lin and also my committee and also my family. And like 300 other people
GAMMA UMD @gammaumd
Excited to share our paper HALO is accepted to #CoRL2025! 🧭 HALO introduces a novel method to train vision-based reward models that align with human navigation preferences without requiring online rollouts or hand-engineered rewards.
📌 Key ideas:
• We collect binary user feedback on intuitive queries like "Should the robot turn left?", "Should the robot turn right?", "Should it accelerate?" based on egocentric camera input.
• The user feedback, combined with the expert reference action, is used to construct a probabilistic action preference distribution with its mode at the reference action.
• We train a reward model to rank all feasible actions using the Plackett-Luce loss, a generalization of the Bradley-Terry model to n-way comparisons.
🏁 We deploy HALO’s reward in both a classical planner and an RL-based planner. Real-world evaluation on a Clearpath Husky robot shows:
✅ ≥33.3% improvement in success rate
✅ ≥12.9% reduction in trajectory length
✅ ≥26.6% reduction in Fréchet distance to human demonstrations
📸 All with RGB cameras only, no LiDAR or depth.
Authors: @gershom_96, Jianyu An, Sahire Ellahy, @kaweer_, Mohamed Elnoor, Jonathan Deepak Kannan, Amogha Sunil, @dmanocha
Paper: arxiv.org/pdf/2508.01539
Website: gamma.umd.edu/researchdirect…
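For readers unfamiliar with the ranking loss named above, here is a minimal sketch of the generic Plackett-Luce negative log-likelihood. The function name and NumPy formulation are illustrative assumptions, not HALO's actual implementation; see the paper for the real training setup.

```python
import numpy as np

def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of a full ranking under the Plackett-Luce model.

    scores  : (n,) array, one scalar reward per candidate action
    ranking : list of candidate indices, most-preferred first

    The ranking's probability factorizes into sequential softmax choices:
    at each step the top remaining candidate is chosen from the shrinking
    pool. With n = 2 this reduces to the Bradley-Terry pairwise model.
    """
    nll = 0.0
    remaining = list(ranking)
    while len(remaining) > 1:
        s = np.asarray([scores[i] for i in remaining], dtype=float)
        # log-sum-exp over the remaining pool, shifted for numerical stability
        log_z = s.max() + np.log(np.exp(s - s.max()).sum())
        nll -= s[0] - log_z  # log P(remaining[0] picked first from the pool)
        remaining.pop(0)
    return nll
```

For two actions with scores 2.0 and 0.0, preferring the first gives a loss of about 0.127, matching the Bradley-Terry value -log σ(2.0).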
GAMMA UMD @gammaumd
🤖 What if your LLM confidently misread a chart, and you couldn’t tell?
📊 Multimodal LLMs are improving, but still hallucinate when answering chart-based questions.
🎯 Presenting at ACL 2025 (Virtual): ChartLens grounds model answers to specific chart elements, helping users verify claims and spot hallucinations.
🛠️ We introduce:
• ChartLens: Fine-grained chart attribution
• ChartVA-Eval: Benchmark across finance, policy & economics
📍 Session 12: V-Presentations
🗓️ Today | July 30 | 11:00–12:30 CEST | 5:00–6:30 AM ET
📄 ChartLens: Fine-grained Visual Attribution in Charts
👥 Authors: Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesh Manocha
#ACL2025 #MultimodalLLM #AIhallucination #ChartUnderstanding
GAMMA UMD reposted
Amrit Singh Bedi @amritsinghbedi3
Are you interested in test-time AI alignment? If you are attending #ICML2025, please visit our poster: 📍 Poster Location: East Exhibition Hall A–B, Booth #E-2701 📅 When: Wednesday, July 16 | ⏰ 11:00 AM – 1:30 PM PDT @HaoZhu6 @MFHChehade @SOURADIPCHAKR18 will be presenting
Amrit Singh Bedi @amritsinghbedi3

Can decades old ideas from #psychology help fix critical issues in modern LLM alignment? 🤔 We're tapping into #BoundedRationality & 'satisficing principles' to build an alternate way to align LLMs. Our new #ICML2025 paper 👇 🧵 arxiv.org/pdf/2505.23729

GAMMA UMD @gammaumd
🚀 Audio General Intelligence (AGI) is no longer a dream — it’s here. Introducing Audio Flamingo 3 — open-source, multimodal, and groundbreaking. It listens. It understands. It reasons across sound and language. 💥 Code, weights, datasets, paper — all open. 📄Paper: arxiv.org/abs/2507.08128 🤗HuggingFace: huggingface.co/nvidia/audio-f… Built by the amazing team at NVIDIA & UMD. Let’s shape the future of audio intelligence together!