LLM360

137 posts

@llm360

LLM360 is an open research lab enabling community-owned AGI through open-source large model research and development.

Joined November 2023

76 Following · 2.7K Followers

Pinned Tweet

LLM360 @llm360 · Dec 5

To mark the 2nd anniversary of LLM360, we are proud to release K2-V2: a 70B reasoning-centric foundation model that delivers frontier capabilities. As a push for "360-open" transparency, we are releasing not only weights, but the full recipe: data composition, training code, logs, and intermediate checkpoints.

About K2-V2:
🧠 70B params, reasoning-optimized
🧊 512K context window
🔓 "360-Open" (Data, Logs, Checkpoints)
📈 SOTA on olympiad math and complex logic puzzles

21.3K

LLM360 retweeted

Eric Xing @ericxing · Jan 28

The #K2Think V2 we released today is a significant leap over V1's already-strong deep-reasoning performance (k2think.ai/k2think). It is also far more reliable, reducing the hallucination rate from 89% to 52% on the AA-Omniscience benchmark while increasing long-context reasoning accuracy from 33% to 53%. Powered by the K2-V2 Instruct foundation, it is now the strongest fully open-source reasoning model available and rivals all frontier reasoning models in the same size class, open or closed.

Continuing the founding ethos of the Institute of Foundation Models (IFM) at @mbzuai, we open-source the weights, data, recipe, and development of this release. On top of the #K2Think V1 recipe and the strong foundation of the K2 V2 base LLM (huggingface.co/LLM360/K2-V2), we made a few tailored design choices that suit the upgraded base model:
1) GRPO with asymmetric policy ratio clipping
2) expanded high-quality STEM data sources
3) capability steering with model-specific, difficulty-based data filtering
4) full on-policy training with large batch sizes
5) two-stage progressive context length expansion for training efficiency

Check out our blog post and model for more details: mbzuai.ac.ae/news/k2-think-… @waterluffy, @llm360_agi, @allen_ai, @huggingface

Artificial Analysis @ArtificialAnlys

MBZUAI’s Institute for Foundation Models’ new K2 Think V2 model improves on the 70B K2-V2 in intelligence, maintains its tied first place in our Openness Index, and has a low hallucination rate.

🇦🇪 Third UAE model on Artificial Analysis in the last month: K2 Think V2 is a dense 70B reasoning model. It was post-trained from MBZUAI’s K2-V2, released in December 2025. We also recently evaluated Falcon-H1R-7B from TII, another government-sponsored research center in the UAE.

📖 Pushes the frontier in Intelligence vs Openness: Like its base model, K2-V2, K2 Think V2 maintains its tied lead in Openness with the Olmo family from Allen Institute for AI. The Artificial Analysis Openness Index is our newly released, standardized, independently assessed measure of AI model openness across availability and transparency. K2 Think V2 provides full access to pre- and post-training data, and publishes its training methodology and code under a permissive Apache license allowing free use for any purpose. It is the most intelligent model among highly open peers.

🔮 Low hallucination rate: K2 Think V2’s hallucination rate is among the lowest of relevant comparison models. It sits between Anthropic’s Claude 4.5 Opus and Claude 4.5 Sonnet, Anthropic being a leader in low-hallucination models. The Artificial Analysis Omniscience Hallucination Rate measures how often models answer knowledge-based questions incorrectly rather than abstain. The drop in hallucination rate, from K2-V2’s 89% to 52%, demonstrates how post-training can effectively reduce hallucination. This contributes to a strong score of -26 for K2 Think V2 in AA-Omniscience, our new knowledge and hallucination benchmark.

📈 Improved intelligence among medium-sized (40-150B) open weights models: K2 Think V2 improves by 4 points on its base model. This improvement is driven largely by its reduced hallucination rate (described above), but also by its improved long-context reasoning capability. K2 Think V2 scores 53% in AA-LCR, our long context reasoning benchmark, a significant jump from 33% in K2-V2.

⚡ Slightly improved token efficiency: K2 Think V2 uses 10% fewer tokens (99M) than its base model (110M) to run our Intelligence Index.

Congratulations to the team at @mbzuai!

3.3K
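
Item 1 in the retweeted post above, GRPO with asymmetric policy ratio clipping, can be illustrated with a minimal sketch. This is not the K2 Think V2 training code: the function names are hypothetical, and the clip bounds `eps_low=0.2` and `eps_high=0.28` are placeholder values, since the post does not state the ones actually used. The sketch assumes the common GRPO setup in which rewards are normalized within a group of responses sampled for the same prompt, and the upper clip bound is looser than the lower one.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Normalize rewards within one prompt's group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Small constant avoids division by zero when all rewards are equal.
    return [(r - mu) / (sigma + 1e-6) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage,
                      eps_low=0.2, eps_high=0.28):
    """Per-token clipped objective with separate lower and upper bounds.

    eps_low and eps_high are illustrative assumptions, not values
    taken from the K2 Think V2 recipe.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic choice between the raw and the clipped term, as in PPO.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Four sampled answers to one prompt: two verified correct, two wrong.
    advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
    print(advantages)
    print(clipped_surrogate(math.log(1.5), 0.0, advantages[0]))
```

With `eps_high` larger than `eps_low`, tokens with positive advantage can raise their probability ratio further before the clip takes effect than tokens with negative advantage can lower it, which is the asymmetry the post refers to.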

LLM360 @llm360 · Jan 27

We rigorously red-teamed K2 Think V2 using libra-eval. It achieves near-perfect scores on standard safety surfaces while maintaining high helpfulness.

Dive in now:
🤗 Model & Data: huggingface.co/LLM360/K2-Thin…
📄 Blog Post: mbzuai.ac.ae/news/k2-think-…
💻 Code: github.com/LLM360/Reasoni…

280

LLM360 @llm360 · Jan 27

How did we get here? A dedicated two-stage RLVR (Reinforcement Learning via Verifiable Rewards) process.
1. Stage 1: 32k context training (approx. 200 steps).
2. Stage 2: Expanded 64k context for long chain-of-thought.
The result is a model that outperforms previous versions on AIME, HMMT, and IFBench.

288
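
The two stages described in the post above amount to a step-indexed context-length schedule. The sketch below is one way to express that, under loud assumptions: the `Stage` type and `stage_for_step` helper are hypothetical, "32k" and "64k" are read as 32 * 1024 and 64 * 1024 tokens, stage 1 is fixed at exactly 200 steps although the post says "approx", and stage 2 is left open-ended because its length is not given.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    max_context_tokens: int
    num_steps: Optional[int]  # None means "runs until training stops"


# Assumed values; only the 32k/64k split and "approx 200 steps" come
# from the post.
SCHEDULE = (
    Stage("stage_1", 32 * 1024, 200),
    Stage("stage_2", 64 * 1024, None),
)


def stage_for_step(step: int, schedule: Sequence[Stage] = SCHEDULE) -> Stage:
    """Return the stage that a zero-based training step falls into."""
    if step < 0:
        raise ValueError("step must be non-negative")
    start = 0
    for stage in schedule:
        if stage.num_steps is None or step < start + stage.num_steps:
            return stage
        start += stage.num_steps
    raise ValueError("step is past the end of the schedule")


if __name__ == "__main__":
    for step in (0, 199, 200, 1000):
        stage = stage_for_step(step)
        print(step, stage.name, stage.max_context_tokens)
```

A training loop could call `stage_for_step` once per step to pick the maximum rollout length, so the longer and more expensive 64k rollouts only begin after the shorter first stage.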

LLM360 @llm360 · Jan 27

Please welcome K2 Think V2, our first fully sovereign 70B reasoning model. Built on the K2-V2 base, this release bridges the gap between community-owned AI and proprietary models.

About K2 Think V2:
🧠 70B parameters, RLVR-tuned
🛡️ 100% Sovereign (IFM-curated data only)
🔓 Fully Open (Pre-training to Post-training)
💡 Top-tier Openness & Intelligence

8.7K

LLM360 @llm360 · Dec 5

Ready to dive in?
📄 Technical Report: llm360.ai/reports/K2_V2_…
🤗 Model & Data: huggingface.co/collections/LL…
📊 Analysis: wandb.ai/llm360/K2-V2
💻 Training Code: github.com/llm360/k2v2_tr…
💻 Evaluation Code: github.com/llm360/eval360

A huge thank you to the OSS ecosystem! @huggingface @wandb @github @lmsysorg @AiEleuther @allen_ai @BigCodeProject @PyTorch @nvidia @cerebras @mbzuai and many more.

448

LLM360 @llm360 · Dec 5

The secret sauce is our "Mid-Training" phase. We didn't only fine-tune; we infused reasoning early by feeding K2 billions of reasoning tokens and extending context to 512K tokens. This ensures reasoning is a native behavior. See how K2-High achieves state-of-the-art results by leveraging more "thinking tokens."

580

LLM360 @llm360 · Nov 20

It is amazing to see that this is already the 8th scaling workshop. The LLM360 team will be attending, and @waterluffy will share what the team is doing at the @mbzuai Institute of Foundation Models.

Daria Soboleva @dmsobol

I am excited to be organizing the 8th scaling workshop at @NeurIPSConf this year! Dec 5-6 | 5-8pm PT | Hard Rock Hotel San Diego Co-organized by @cerebras, @Mila_Quebec, and @mbzuai Register: luma.com/gy51vuqd

2.4K

LLM360 @llm360 · Aug 5

Check out the FastVideo series, which lets you generate video in real time (5 second video in 1 second)! Try it out on your own hardware too. Kudos to the team for democratizing this by providing efficient methods and the full recipe.

Hao AI Lab @haoailab

(1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing FastWan series, a family of fast video generation models trained via a new recipe we term as “sparse distillation”, to speed up video denoising time by 70X! 🖥️ Live demo: fastwan.fastvideo.org (Thanks to @gmicloud for the support!) 🔗 Blog: hao-ai-lab.github.io/blogs/fastvide… 🔓 We fully open-source our models, code, and data with Apache-2.0 licenses

1.6K

LLM360 @llm360 · Jun 14

Our team was lucky to get "early access" to this work through the IFM talk given by @ssahoo_.

Subham Sahoo @ssahoo_

🚨 “The Diffusion Duality” is out! @ICML2025 ⚡️ Few-step generation in discrete diffusion language models by exploiting the underlying Gaussian diffusion. 🦾Beats AR on 3/7 zero-shot likelihood benchmarks. 📄 Paper: arxiv.org/abs/2506.10892 💻 Code: github.com/s-sahoo/duo 🧠 Blog: s-sahoo.com/duo/ (1/8)

2.5K

LLM360 @llm360 · Jun 6

@jxmnop what do you think about @llm360?

1.4K

LLM360 @llm360 · Jun 4

KV-caching is great, but will it work for Diffusion Language Models? @zhihanyang_ and team showed how to make it work with a 65x speedup 🚀! Check out the new preprint: arxiv.org/abs/2506.01928 The LLM360 team is very interested in exploring new architectures.

Zhihan Yang @zhihanyang_

📢Thrilled to share our new paper: Esoteric Language Models (Eso-LMs) > 🔀Fuses autoregressive (AR) and masked diffusion (MDM) paradigms > 🚀First to unlock KV caching for MDMs (65x speedup!) > 🥇Sets new SOTA on generation speed-vs-quality Pareto frontier How? Dive in👇 [🧵1/13] 📜Paper: arxiv.org/abs/2506.01928 📘Blog: s-sahoo.com/Eso-LMs/ 💻Code: github.com/s-sahoo/Eso-LMs Project co-led with @ssahoo_

3.6K

LLM360 @llm360 · May 26

The LLM360 team continues to adhere to the values of open source, and we believe data is one of the most important ingredients. Please let our team know if you have any comments. We are eager to hear the voice of the community!

351

LLM360 @llm360 · May 26

🐍 Unlock the power of long-context data with our new Wikipedia Extended and Aligned Europarl datasets in v1.1! We've created long-context documents by extending Wikipedia articles with related abstracts, and aligned multiple Europarl languages into single training samples! #LongContext #Data #Wikipedia

402

LLM360 @llm360 · May 26

📢📢 TxT360 has been updated to v1.1:
🌟 BestofWeb: high-quality doc set from the web
❓ QA: Large Scale Synthetic Q&A dataset
📖 Wiki_extended: extended wiki articles via links
🌍 Europarl Aligned: reformatted long aligned corpus
huggingface.co/datasets/LLM36… #AIResearch #OpenSource

4.5K
