

Yang Liu

@nlpyang
#LLM Researcher @Microsoft; PhD @EdinburghNLP

Evaluating LLMs usually requires sophisticated human design, and as LLMs keep improving, it becomes difficult for humans to find their limitations. Can LLMs find their own limitations by posing questions to themselves? Check out our new paper: arxiv.org/abs/2408.08978

Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin.😮 And it has an infinite context length with linear complexity.🤯 Paper: arxiv.org/abs/2406.07522 (1/6)
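The "linear complexity" claim above can be sketched in plain NumPy: a state-space recurrence (the Mamba half) costs O(T) per channel, and attention restricted to a fixed window costs O(T·window) rather than O(T²). This is an illustrative toy, not the Samba implementation — all function names, the single-channel recurrence, and the untrained weights are assumptions for exposition:

```python
import numpy as np

def ssm_scan(x, a=0.9, b=0.1):
    """Minimal state-space recurrence (stand-in for a Mamba layer):
    h_t = a*h_{t-1} + b*x_t. One update per timestep -> O(T) cost."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out

def sliding_window_attention(x, window=4):
    """Causal attention where each query sees at most the last `window`
    positions, so total cost is O(T * window) -- linear in T."""
    T, d = x.shape
    out = np.empty_like(x)
    for t in range(T):
        lo = max(0, t - window + 1)
        k = x[lo:t + 1]                     # (w, d) keys (= values here)
        scores = k @ x[t] / np.sqrt(d)      # (w,) dot-product scores
        p = np.exp(scores - scores.max())   # stable softmax
        p /= p.sum()
        out[t] = p @ k                      # weighted sum of values
    return out

def samba_like_block(x, window=4):
    """Hybrid block: SSM scan, then sliding-window attention,
    each wrapped in a residual connection."""
    x = x + ssm_scan(x)
    x = x + sliding_window_attention(x, window)
    return x

T, d = 16, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))
y = samba_like_block(x)
print(y.shape)  # (16, 8)
```

Because neither sub-layer's cost grows with total context length (the recurrence carries a fixed-size state, the attention a fixed-size window), stacking such blocks keeps per-token compute constant as the sequence grows.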

The headline MMLU number is a bit gamed - Gemini is actually worse than GPT-4 when compared with standard few-shot or chain-of-thought prompting

🔥Excited to introduce CoDi-2! It follows complex multimodal-interleaved in-context instructions to generate any modality (text, vision, audio) in a zero/few-shot interactive way! codi-2.github.io huggingface.co/papers/2311.18… @yzy_ai @nlpyang @ChenguangZhu2 @mohitban47 🧵👇