Seungwhan Shane Moon

25 posts

@shane_moon

Research Scientist @ Facebook | PhD @ LTI SCS, CMU

Seattle, WA · Joined March 2010
190 Following · 696 Followers
Seungwhan Shane Moon@shane_moon·
We're hiring exceptional AI Research Scientists to join our team at Meta Reality Labs, where you'll work on cutting-edge projects in Vision LLMs. Please reach out to me directly via email with your resume! (Check minimum qualifications) metacareers.com/jobs/388290281…
Seungwhan Shane Moon@shane_moon·
We are hiring PhD AI Research Interns to work on various projects around Multimodal LLM for Summer 2024 (Reality Labs). Please reach out to me directly via email with your resume!
Seungwhan Shane Moon@shane_moon·
Hi @KuterDinel, great point. I do agree it would be more robust that way, but it would be computationally much more costly to pre-train it end-to-end from scratch and re-do instruction tuning & RLHF for the LLM (hence "scalable and efficient" in our title).
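To make the cost contrast in this reply concrete, here is a minimal sketch of the parameter-count argument: with the LLM backbone and modality encoders kept frozen, only a small projection layer is trained. The dimensions below are illustrative assumptions, not the configuration used in the paper.

```python
import torch.nn as nn

# Illustrative sizes only; not the actual AnyMAL configuration.
llm_hidden = 4096    # hidden size of a frozen LLaMA-2-style backbone (assumed)
encoder_dim = 1024   # output dim of a frozen modality encoder, e.g. an image encoder (assumed)

# Trainable part: a small projector mapping encoder features into the LLM embedding space.
projector = nn.Linear(encoder_dim, llm_hidden)

trainable = sum(p.numel() for p in projector.parameters())
print(f"Trainable projector parameters: {trainable:,}")        # ~4.2M
# Compare with ~70B parameters if the full LLM were pre-trained end-to-end:
print(f"Fraction of a 70B-parameter LLM: {trainable / 70e9:.6%}")
```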
Kuter Dinel@KuterDinel·
@shane_moon This is amazing. I have a question though: wouldn't the model have more general reasoning capabilities if it was trained with multimodal data from the ground up instead of fine-tuned later? Is the fine-tuning method preferred because of the cost of training a model like LLaMA 70B?
Seungwhan Shane Moon@shane_moon·
Excited to share our recent work, AnyMAL -- a unified Multimodal LLM built on LLaMA-2 that can reason over various inputs, e.g. images, audio, motion sensors. Check out our paper for more information on the model training, evaluation, safety and more! ➡️ arxiv.org/abs/2309.16058
Seungwhan Shane Moon retweeted
AK@_akhaliq·
Meta introduces AnyMAL - a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor) and generates textual responses - its best model achieves strong zero-shot performance in both automatic and human evaluation on diverse tasks and modalities, setting a new SOTA with +7.0% relative accuracy improvement on VQAv2, +8.4% CIDEr on zero-shot COCO image captioning, and +14.5% CIDEr on AudioCaps, when compared with the models available in the literature.
AK@_akhaliq

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model. Paper page: huggingface.co/papers/2309.16… We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor) and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
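For readers who want the general shape of the aligner idea described above (modality-specific features projected into the LLM's text embedding space and consumed as a soft prompt), here is a minimal sketch assuming a frozen modality encoder and a frozen LLM. The class name, layer choices, and token counts are illustrative assumptions, not AnyMAL's actual implementation.

```python
import torch
import torch.nn as nn

class ModalityAligner(nn.Module):
    """Sketch of an aligner that maps frozen-encoder features into an LLM's
    token-embedding space. Shapes and layers are illustrative only."""

    def __init__(self, encoder_dim: int, llm_hidden: int, num_tokens: int = 32):
        super().__init__()
        self.num_tokens = num_tokens
        # Project each encoder feature vector into the LLM embedding space.
        self.proj = nn.Sequential(
            nn.Linear(encoder_dim, llm_hidden),
            nn.GELU(),
            nn.Linear(llm_hidden, llm_hidden),
        )

    def forward(self, encoder_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # encoder_feats: (batch, num_patches, encoder_dim) from a frozen image/audio/IMU encoder
        # text_embeds:   (batch, seq_len, llm_hidden) from the frozen LLM's embedding table
        modal_tokens = self.proj(encoder_feats)                # (batch, num_patches, llm_hidden)
        modal_tokens = modal_tokens[:, : self.num_tokens, :]   # keep a fixed-length prefix
        # Prepend modality tokens to the text sequence; the frozen LLM then attends
        # over both when generating the textual response.
        return torch.cat([modal_tokens, text_embeds], dim=1)

# Toy usage with random tensors (real inputs would come from frozen encoders / the LLM):
aligner = ModalityAligner(encoder_dim=1024, llm_hidden=4096)
feats = torch.randn(2, 64, 1024)
text = torch.randn(2, 16, 4096)
print(aligner(feats, text).shape)  # torch.Size([2, 48, 4096])
```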

Seungwhan Shane Moon@shane_moon·
Meta Reality Labs is organizing the "Ambient AI Workshop" -- focusing on multimodal understanding with wearable sensors, combining NLP + Vision + Sensor Signals. For more details & the call for papers (now due Mar 26): sites.google.com/view/ambientai… We look forward to your participation!
Seungwhan Shane Moon@shane_moon·
(3/3) ACCENTOR datasets: We propose a Human ↔ AI collaborative data collection approach for generating diverse chitchat responses to augment ToD dialogs with minimal annotation effort. Results: chit-chat additions to 23K+ dialogs from two popular ToD datasets (SGD & MultiWoZ2.1)
Seungwhan Shane Moon@shane_moon·
(2/3) Results? - (Interaction eval) People like them! Our models are consistently preferred by human judges across the four axes (engagingness, etc.), compared to the baseline assistant models. - (Task eval) Our models still maintain competitive task performances.
Seungwhan Shane Moon@shane_moon·
Introducing our work at #NAACL2021 w/ Sun et al. -- bridging the gap between task-oriented dialog systems and open-domain dialog systems (chit-chat) #NLPRoc #ConvAI 📰Paper, 📂Dataset, 💻Code (for a suite of chit-chat & task code-switching models): github.com/facebookresear… (1/3)
AI at Meta@AIatMeta

We are releasing ACCENTOR, a new data set that combines contextual chit-chat and traditional task-oriented dialogs. Automatic & human evaluations show our models can code-switch seamlessly, making virtual assistant conversations more natural & interactive. github.com/facebookresear…

Seungwhan Shane Moon retweeted
AI at Meta@AIatMeta·
We’ve released SIMMC, a data set on situated and interactive multimodal conversations, to help conversational AI researchers ground conversations in a co-observed and evolving multimodal context. A challenge track at DSTC9 around SIMMC is currently live. ai.facebook.com/blog/simmc-a-d…