Seungwhan Shane Moon

25 posts

@shane_moon

Research Scientist @ Facebook | PhD @ LTI SCS, CMU

Seattle, WA · Joined March 2010
190 Following · 696 Followers
Seungwhan Shane Moon@shane_moon·
We're hiring exceptional AI Research Scientists to join our team at Meta Reality Labs, where you'll work on cutting-edge projects in Vision LLMs. Please reach out to me directly via email with your resume! (Check minimum qualifications) metacareers.com/jobs/388290281…
Seungwhan Shane Moon@shane_moon·
We are hiring PhD AI Research Interns to work on various projects around Multimodal LLM for Summer 2024 (Reality Labs). Please reach out to me directly via email with your resume!
Seungwhan Shane Moon@shane_moon·
Hi @KuterDinel, great point. I do agree it would be more robust that way, but it would be computationally much more costly to pre-train it end-to-end from scratch and re-do instruction tuning & RLHF for the LLM (hence "scalable and efficient" in our title).
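To make the cost contrast in this reply concrete, here is a minimal sketch of the parameter-count argument: with the LLM backbone and modality encoders kept frozen, only a small projection layer is trained. The dimensions below are illustrative assumptions, not the configuration used in the paper.

```python
import torch.nn as nn

# Illustrative sizes only; not the actual AnyMAL configuration.
llm_hidden = 4096    # hidden size of a frozen LLaMA-2-style backbone (assumed)
encoder_dim = 1024   # output dim of a frozen modality encoder, e.g. an image encoder (assumed)

# Trainable part: a small projector mapping encoder features into the LLM embedding space.
projector = nn.Linear(encoder_dim, llm_hidden)

trainable = sum(p.numel() for p in projector.parameters())
print(f"Trainable projector parameters: {trainable:,}")        # ~4.2M
# Compare with ~70B parameters if the full LLM were pre-trained end-to-end:
print(f"Fraction of a 70B-parameter LLM: {trainable / 70e9:.6%}")
```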
Kuter Dinel@KuterDinel·
@shane_moon This is amazing. I have a question though: wouldn't the model have more general reasoning capabilities if it was trained with multimodal data from the ground up instead of fine-tuned later? Is the fine-tuning method preferred because of the cost of training a model like LLaMA 70B?
Seungwhan Shane Moon@shane_moon·
Excited to share our recent work, AnyMAL -- a unified Multimodal LLM built on LLaMA-2 that can reason over various inputs, e.g. images, audio, motion sensors. Check out our paper for more information on the model training, evaluation, safety and more! ➡️ arxiv.org/abs/2309.16058
Seungwhan Shane Moon retweeted
AK@_akhaliq·
Meta introduces AnyMAL - a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor) and generates textual responses - its best model achieves strong zero-shot performance in both automatic and human evaluation on diverse tasks and modalities, setting a new SOTA with +7.0% relative accuracy improvement on VQAv2, +8.4% CIDEr on zero-shot COCO image captioning, and +14.5% CIDEr on AudioCaps, when compared with the models available in the literature.
AK@_akhaliq

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model. Paper page: huggingface.co/papers/2309.16… We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor) and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
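For readers who want the general shape of the aligner idea described above (modality-specific features projected into the LLM's text embedding space and consumed as a soft prompt), here is a minimal sketch assuming a frozen modality encoder and a frozen LLM. The class name, layer choices, and token counts are illustrative assumptions, not AnyMAL's actual implementation.

```python
import torch
import torch.nn as nn

class ModalityAligner(nn.Module):
    """Sketch of an aligner that maps frozen-encoder features into an LLM's
    token-embedding space. Shapes and layers are illustrative only."""

    def __init__(self, encoder_dim: int, llm_hidden: int, num_tokens: int = 32):
        super().__init__()
        self.num_tokens = num_tokens
        # Project each encoder feature vector into the LLM embedding space.
        self.proj = nn.Sequential(
            nn.Linear(encoder_dim, llm_hidden),
            nn.GELU(),
            nn.Linear(llm_hidden, llm_hidden),
        )

    def forward(self, encoder_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # encoder_feats: (batch, num_patches, encoder_dim) from a frozen image/audio/IMU encoder
        # text_embeds:   (batch, seq_len, llm_hidden) from the frozen LLM's embedding table
        modal_tokens = self.proj(encoder_feats)                # (batch, num_patches, llm_hidden)
        modal_tokens = modal_tokens[:, : self.num_tokens, :]   # keep a fixed-length prefix
        # Prepend modality tokens to the text sequence; the frozen LLM then attends
        # over both when generating the textual response.
        return torch.cat([modal_tokens, text_embeds], dim=1)

# Toy usage with random tensors (real inputs would come from frozen encoders / the LLM):
aligner = ModalityAligner(encoder_dim=1024, llm_hidden=4096)
feats = torch.randn(2, 64, 1024)
text = torch.randn(2, 16, 4096)
print(aligner(feats, text).shape)  # torch.Size([2, 48, 4096])
```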

Seungwhan Shane Moon@shane_moon·
Meta Reality Labs is organizing the "Ambient AI Workshop" -- focusing on multimodal understanding with wearable sensors, combining NLP + Vision + Sensor Signals. For more details & the call for papers (now due Mar 26): sites.google.com/view/ambientai… We look forward to your participation!
Seungwhan Shane Moon@shane_moon·
(3/3) ACCENTOR datasets: We propose a Human ↔ AI collaborative data collection approach for generating diverse chitchat responses to augment ToD dialogs with minimal annotation effort. Results: chit-chat additions to 23K+ dialogs from two popular ToD datasets (SGD & MultiWoZ2.1)
Seungwhan Shane Moon@shane_moon·
(2/3) Results? - (Interaction eval) People like them! Our models are consistently preferred by human judges across the four axes (engagingness, etc.), compared to the baseline assistant models. - (Task eval) Our models still maintain competitive task performances.
Seungwhan Shane Moon@shane_moon·
Introducing our work at #NAACL2021 w/ Sun et al. -- bridging the gap between task-oriented dialog systems and open-domain dialog systems (chit-chat) #NLPRoc #ConvAI 📰Paper, 📂Dataset, 💻Code (for a suite of chit-chat & task code-switching models): github.com/facebookresear… (1/3)
AI at Meta@AIatMeta

We are releasing ACCENTOR, a new data set that combines contextual chit-chat and traditional task-oriented dialogs. Automatic & human evaluations show our models can code-switch seamlessly, making virtual assistant conversations more natural & interactive. github.com/facebookresear…

Seungwhan Shane Moon retweeted
AI at Meta@AIatMeta·
We’ve released SIMMC, a data set on situated and interactive multimodal conversations, to help conversational AI researchers ground conversations in a co-observed and evolving multimodal context. A challenge track at DSTC9 around SIMMC is currently live. ai.facebook.com/blog/simmc-a-d…