Can Xu

71 posts

@CanXu20

Principal Researcher @TencentHunyuan. Ex Founder @WizardLM_AI Creator of WizardLM Family: WizardLM, WizardCoder and WizardMath.

Seattle · Joined August 2021

316 Following · 1.4K Followers

Pinned Tweet

Can Xu @CanXu20 · 28 Jul

🚀 ArtifactsBench v1.1 is here! 🔥🔥🔥 Automated visual/frontend code evaluation benchmark with full transparency!
🎯 94.4% consistency with WebDev Arena
🆕 Major v1.1 updates: Added more models @Alibaba_Qwen @Kimi_Moonshot etc
🔓 100% open-source with complete reproducibility

Can Xu @CanXu20 · 14 Jul

🚀🚀🚀

Tencent Hy @TencentHunyuan

Major milestone for Hunyuan-large-vision! 🚀 Our multimodal understanding model has secured the #1 spot among Chinese models on the LMSYS Vision Arena leaderboard this week. Globally, we've climbed to #12 overall and an impressive #5 globally when removing style control. See the full rankings here: lmarena.ai/leaderboard/vi…

Can Xu retweeted

Tencent Hy @TencentHunyuan · 9 Jul

🚀 Thrilled to introduce #ArtifactsBench! We're bridging the visual-interactive gap in code generation evaluation. Our benchmark uses a novel automated, multimodal pipeline to assess LLMs on 1,825 diverse tasks. An MLLM-as-Judge evaluates visual artifacts, achieving 94.4% ranking consistency with human experts! Moving beyond algorithmic correctness to a true "what you see is what you get" standard.
🌐 Project: artifactsbenchmark.github.io
📄 Paper: arxiv.org/abs/2507.04952
💻 Code: github.com/Tencent-Hunyua…
🗂️ Dataset: huggingface.co/datasets/tence…

Can Xu @CanXu20 · 8 Jul

Introducing ArtifactsBench 🎨 An MLLM-as-Judge that evaluates AI-generated UI by looking at a live render. Our pipeline captures interactions via temporal screenshots & scores visual/interactive fidelity against a per-task checklist. This achieves a stunning 94.4% ranking consistency with human votes on WebDev Arena! 🤯 We're open-sourcing everything to accelerate user-centric AI. Explore the project:
🌐 Website: artifactsbenchmark.github.io
📄 Paper: arxiv.org/abs/2507.04952
💻 Code: github.com/Tencent-Hunyua…
📊 Dataset: huggingface.co/datasets/tence…
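
The post does not spell out how "ranking consistency" is computed. One common reading is pairwise agreement: the share of model pairs that the automated judge and the human votes order the same way. The sketch below illustrates that reading only; the function name and every score in it are hypothetical, not ArtifactsBench code or data.

```python
from itertools import combinations


def pairwise_ranking_consistency(scores_a, scores_b):
    """Fraction of model pairs that two score tables order the same way.

    Assumption: this pairwise-agreement definition is one plausible reading
    of "ranking consistency", not necessarily the benchmark's exact metric.
    """
    models = sorted(set(scores_a) & set(scores_b))
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two shared models")
    agree = 0
    for m, n in pairs:
        diff_a = scores_a[m] - scores_a[n]
        diff_b = scores_b[m] - scores_b[n]
        # Same sign means both sources rank the pair identically;
        # a tie in both sources also counts as agreement.
        if diff_a * diff_b > 0 or (diff_a == 0 and diff_b == 0):
            agree += 1
    return agree / len(pairs)


# Hypothetical scores, invented for illustration only.
judge_scores = {"model_a": 61.0, "model_b": 55.5, "model_c": 48.2, "model_d": 40.1}
human_elo = {"model_a": 1250, "model_b": 1190, "model_c": 1205, "model_d": 1100}

consistency = pairwise_ranking_consistency(judge_scores, human_elo)
print(f"pairwise ranking consistency: {consistency:.3f}")
```

In this toy data the two sources disagree only on the order of `model_b` and `model_c`, so five of the six pairs agree.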

Can Xu @CanXu20 · 30 May

Beyond Hunyuan-TurboS's accomplishments, please also pay more attention to the technological innovations. The impact of these innovative endeavors extends much further than the model itself. All these details are recorded in our technical report (arxiv.org/pdf/2505.15431)

Emily Watson | AI Tools & Tech News @saxxhii_

BOOM 🚨: Tencent’s Hunyuan Turbo S just landed in the Top 8 globally on Chatbot Arena and is now #2 in China, just behind DeepSeek. Even Google ex-CEO Eric Schmidt says China had no foundation models two years ago — now it has three: DeepSeek, Qwen, Hunyuan — on par with OpenAI’s O.1. At #GoogleIO, all three scored high on the global leaderboard like #DeepSeek, #Qwen, #Hunyuan @TencentHunyuan isn’t just racing — it’s rewriting the leaderboard. #HunyuanTurboS #TencentAI #LLM #AIRevolution

Can Xu retweeted

Tencent Hy @TencentHunyuan · 28 May

📢 Introducing Adaptive Deep Reasoning, a novel approach that dynamically selects between long-chain and short-chain reasoning based on problem complexity, without compromising performance.
🧠 Two-stage training framework:
1️⃣ Mixed Supervised Fine-tuning – Equips the model with both reasoning modes.
2️⃣ Reinforcement Learning – Integrates GRPO with a long-short adaptive group-wise reward strategy that dynamically assesses prompt complexity to provide customized rewards while optimizing reasoning length when appropriate. Uses a logit-based switching loss to optimize the model’s initial token selection, ensuring the right reasoning mode is chosen.
✅ Key benefits:
Seamless reasoning mode switching.
Maintains long-chain reasoning accuracy while improving efficiency.
📝 Technical Details: arxiv.org/pdf/2505.20101
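
As a rough illustration of the reinforcement learning stage described above, the sketch below shows the group-relative advantage normalization that GRPO is known for, fed by a toy length-aware reward. The reward shaping (`length_aware_reward`, `length_weight`) is a hypothetical stand-in, not the reward defined in the paper, and the sampled responses are invented.

```python
from statistics import mean, pstdev


def length_aware_reward(correct, num_tokens, max_tokens, length_weight=0.2):
    """Toy reward: correctness dominates, brevity adds a small bonus.

    Hypothetical shaping for illustration; not the paper's reward.
    """
    if not correct:
        return 0.0
    brevity = 1.0 - min(num_tokens, max_tokens) / max_tokens
    return 1.0 + length_weight * brevity


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    GRPO-style: subtract the group mean and divide by the group standard
    deviation (population std here; implementations differ on this detail).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four invented responses to one prompt: (is_correct, tokens_used).
samples = [(True, 200), (True, 1800), (False, 150), (False, 2000)]
rewards = [length_aware_reward(c, n, max_tokens=2000) for c, n in samples]
advantages = group_relative_advantages(rewards)

for (correct, tokens), adv in zip(samples, advantages):
    print(f"correct={correct!s:5} tokens={tokens:4d} advantage={adv:+.3f}")
```

Under this toy shaping, the short correct answer gets the largest advantage, the long correct answer a smaller positive one, and both incorrect answers share the same negative advantage regardless of length.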

Can Xu retweeted

Rohan Paul @rohanpaul_ai · 26 May

This paper introduces Hunyuan-TurboS, a hybrid architecture that combines different model strengths with adaptive reasoning to optimize responses based on complexity, balancing performance and efficiency.
Methods 🔧:
→ The architecture is a hybrid Transformer-Mamba Mixture of Experts with 128 layers and AMF, MF block patterns.
→ It uses an Adaptive Long-short Chain-of-Thought Fusion method involving a trained teacher model and reinforcement learning for dynamic strategy selection.
→ Multi-round Deliberation Learning employs LLM judge ensembles and human experts in a data flywheel for iterative refinement.
→ A Two-stage Reinforcement Learning process uses Generative Reward Preference Optimization, first for STEM reasoning, then for general instruction following.
📌 Hybrid design effectively balances Transformer reasoning and Mamba efficiency.
📌 Adaptive thinking significantly lowers inference cost by reducing generation tokens.
📌 Multi-stage alignment tailors model capabilities across diverse complex domains right now.
Paper - arxiv.org/abs/2505.15431
Paper Title: "Hunyuan-TurboS: Advancing LLMs through Mamba-Transformer Synergy and Adaptive Chain-of-Thought"

Can Xu retweeted

Aran Komatsuzaki @arankomatsuzaki · 22 May

Tencent presents Hunyuan-TurboS
- Hybrid Transformer-Mamba MoE (56B active params) trained on 16T tokens
- Dynamically switching between rapid responses and deep "thinking" modes
- Overall top 7 on LMSYS Chatbot Arena

Can Xu retweeted

Tencent Hy @TencentHunyuan · 20 May

✨ Hunyuan-TurboS: Technical Highlights ✨
🥇 Top 8 on LMSYS Chatbot Arena, beat o4-mini and gemini-2.0-flash
⚡ 560B Hybrid-Transformer-Mamba MoE, 180% Speedup
🧠 Adaptive CoT: 50% of top-tier thinking model's output length
📖 Paper: github.com/Tencent-Hunyua…
💻 Demo: huggingface.co/spaces/tencent…

Can Xu @CanXu20 · 13 May

@brawll66 you can try hunyuan-turbos-20250416 on lmsys website lmarena.ai

Rahul Mutreja @brawll66 · 13 May

@CanXu20 is there a way to use Hunyun-Turbos LLM outside of China?

Can Xu @CanXu20 · 21 Mar

🚀 Introducing Hunyuan-T1 – The Next Leap in AI Reasoning! 🔥
✅ 4x Performance Boost – Advanced reasoning, faster decoding 🚀
✅ TurboS Advantage – Long-text understanding & efficient computation ⚡
✅ Reinforcement Learning-Driven – 96.7% compute dedicated to reasoning 📈
✅ Benchmark Leading – Comparable or better than R1 across key tests 🏆

Tencent Hy @TencentHunyuan

🚀 Introducing Hunyuan-T1! 🌟 Meet Hunyuan-T1, the latest breakthrough in AI reasoning! Powered by Hunyuan TurboS, it's built for speed, accuracy, and efficiency. 🔥
✅ Hybrid-Mamba-Transformer MoE Architecture – The first of its kind for ultra-large-scale reasoning
✅ Strong Logic & Concise Writing – Precise following of complex instructions
✅ Low Hallucination in Summaries – Trustworthy and reliable outputs
✅ Blazing Fast – First character in 1 sec, 60-80 tokens/sec generation speed
✅ Excellent Long-Text Processing – Handle complex contexts with ease
📌 Try it now!
🔍 T1 Experience: (llm.hunyuan.tencent.com/#/chat/hy-t1)
💻 T1 Demo: (huggingface.co/spaces/tencent…)
📖 Blog: (llm.hunyuan.tencent.com/#/blog/hy-t1?l…)
💬 Join Discord: (discord.gg/dNBrdrGGMa)

Can Xu retweeted

Tencent Hy @TencentHunyuan · 20 Mar

HunYuan-T1 is built on the TurboS foundation, which debuted in the LMSYS Chatbot Arena @lmarena_ai and ranked among the global top 15! 🚀

Can Xu @CanXu20 · 17 Mar

Top 15！ Congratulations to all the collaborators who participated in this interesting competition

Arena.ai @arena

New on Arena leaderboard: @Alibaba_Qwen QwQ-32B and @TXhunyuan's HunYuan-TurboS! - Alibaba's QwQ-32B (#12): strong reasoning model with just 32B size. - Tencent's HunYuan-TurboS debuts in the top 15, catching up Zhipu & StepFun. Congrats to both - competition keeps heating up! 🔥

Can Xu @CanXu20 · 10 Mar

Our Mamba MoE model!

Tencent Hy @TencentHunyuan

🚀 Introducing Hunyuan-TurboS – the first ultra-large Hybrid-Transformer-Mamba MoE model!
Traditional pure Transformer models struggle with long-text training and inference due to O(N²) complexity and KV-Cache issues.
Hunyuan-TurboS combines:
✅ Mamba's efficient long-sequence processing
✅ Transformer's strong contextual understanding
🔥 Results:
- Outperforms GPT-4o-0806, DeepSeek-V3, and open-source models on Math, Reasoning, and Alignment
- Competitive on Knowledge, including MMLU-Pro
- 1/7 lower inference cost than our previous Turbo model
📌 Post-Training Enhancements:
- Slow-thinking integration improves math, coding, and reasoning
- Refined instruction tuning boosts alignment and agent execution
- English training optimization for better general performance
🎯 Upgraded Reward System:
- Rule-based scoring & consistency verification
- Code sandbox feedback for higher STEM accuracy
- Generative-based reward improve QA and creativity, reducing reward hacking
The future of AI is here! 🚀
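
To make the "O(N²) complexity and KV-Cache issues" remark concrete, here is a small back-of-the-envelope sketch. It uses the standard estimate that a Transformer's KV cache stores one key and one value vector per token, per layer, per KV head, and that one attention head scores every token against every other. The layer count, head count, and head size are hypothetical and are not Hunyuan-TurboS settings.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size for one sequence.

    The factor 2 covers keys and values; bytes_per_value=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def attention_score_entries(seq_len):
    """Entries in the full N x N attention score matrix of one head."""
    return seq_len * seq_len


# Hypothetical model shape, chosen only to produce readable numbers.
config = dict(num_layers=32, num_kv_heads=8, head_dim=128)

for n in (4_096, 32_768, 262_144):
    gib = kv_cache_bytes(n, **config) / 2**30
    print(f"N={n:>7}: KV cache ~ {gib:6.1f} GiB, "
          f"score entries per head = {attention_score_entries(n):.2e}")
```

The cache grows linearly with sequence length while the score matrix grows quadratically. A Mamba-style state-space layer instead carries a fixed-size recurrent state, which is the efficiency argument the announcement is pointing at.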

Can Xu @CanXu20 · 31 Oct

@SebastienBubeck @OpenAI @sama Congratulations!

Sebastien Bubeck @SebastienBubeck · 31 Oct

Just started at @OpenAI and I couldn't be more excited to join at this pivotal moment of safe AGI development! Met so many old friends already, talent density of this place is just insane!! Thank you all for the warm welcome, and in particular @sama. Now let the unicorns fly!

Can Xu @CanXu20 · 27 Sep

🎉 Excited to see that the core techniques of WizardLM2 have been accepted to #EMNLP2024 and #NeurIPS2024!
👉 Automatic Instruction Evolving for Large Language Models (arxiv.org/abs/2406.00770)
👉 Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena (arxiv.org/abs/2407.10627)
Congrats to all outstanding co-authors. We will not stop our journey towards AGI.

Can Xu retweeted

Sam Rodriques @SGRodriques · 16 Jul

Today, we're releasing LAB-Bench, a set of >2000 evaluations for language models and agents on scientific research tasks in biology. Public models underperform PhD/postdoc-level humans on nearly all tasks. Claude 3.5 Sonnet is the clear frontrunner atm, but long way to go. 1/

Can Xu retweeted

Pietro Schirano @skirano · 15 Jul

Introducing Claude Engineer 2.0, with agents! 🚀 Biggest update yet with the addition of a code editor and code execution agents, and dynamic editing. When editing files (especially large ones), Engineer will direct a coding agent, and the agent will provide changes in batches. Batches are smartly selected based on file complexity. The code execution agent will run the code and check for issues. It can even start processes (like live servers) and end them. It's insanely powerful! 🔥

Can Xu retweeted

Philipp Schmid @_philschmid · 16 Jul

Mistral releases their first Mamba Model! 🐍 Codestral Mamba 7B is a Code LLM based on the Mamba2 architecture. Released under Apache 2.0 and achieves 75% on HumanEval for Python Coding. 👀
Blog: mistral.ai/news/codestral…
Model: huggingface.co/mistralai/mamb…
They also released a Math fine-tuning base on Mistral 7B that achieves 56.6% on MATH and 63.47% on MMLU.
Blog: mistral.ai/news/mathstral/
Model: huggingface.co/mistralai/math…
