Can Xu

71 posts

@CanXu20

Principal Researcher @TencentHunyuan. Ex Founder @WizardLM_AI Creator of WizardLM Family: WizardLM, WizardCoder and WizardMath.

Seattle · Joined August 2021
316 Following · 1.4K Followers
Pinned Tweet
Can Xu @CanXu20
🚀 ArtifactsBench v1.1 is here!🔥🔥🔥 Automated visual/frontend code evaluation benchmark with full transparency! 🎯 94.4% consistency with WebDev Arena 🆕 Major v1.1 updates: Added more models @Alibaba_Qwen @Kimi_Moonshot etc 🔓 100% open-source with complete reproducibility
Replies 8 · Reposts 11 · Likes 102 · Views 10.4K
Can Xu retweeted
Tencent Hy @TencentHunyuan
🚀Thrilled to introduce #ArtifactsBench! We're bridging the visual-interactive gap in code generation evaluation. Our benchmark uses a novel automated, multimodal pipeline to assess LLMs on 1,825 diverse tasks. An MLLM-as-Judge evaluates visual artifacts, achieving 94.4% ranking consistency with human experts! Moving beyond algorithmic correctness to a true "what you see is what you get" standard. 🌐 Project: artifactsbenchmark.github.io 📄 Paper: arxiv.org/abs/2507.04952 💻 Code: github.com/Tencent-Hunyua… 🗂️ Dataset: huggingface.co/datasets/tence…
Replies 6 · Reposts 20 · Likes 107 · Views 9.2K
Can Xu @CanXu20
[Image-only post]
Replies 1 · Reposts 0 · Likes 3 · Views 448
Can Xu @CanXu20
Introducing ArtifactsBench 🎨 An MLLM-as-Judge that evaluates AI-generated UI by looking at a live render. Our pipeline captures interactions via temporal screenshots & scores visual/interactive fidelity against a per-task checklist. This achieves a stunning 94.4% ranking consistency with human votes on WebDev Arena! 🤯 We're open-sourcing everything to accelerate user-centric AI. Explore the project: 🌐 Website: artifactsbenchmark.github.io 📄 Paper: arxiv.org/abs/2507.04952 💻 Code: github.com/Tencent-Hunyua… 📊Dataset:huggingface.co/datasets/tence…
Replies 3 · Reposts 9 · Likes 30 · Views 4.9K
Can Xu @CanXu20
Beyond Hunyuan-TurboS's accomplishments, please also pay more attention to the technological innovations. The impact of these innovative endeavors extends much further than the model itself. All these details are recorded in our technical report (arxiv.org/pdf/2505.15431)
Emily Watson | AI Tools & Tech News @saxxhii_

BOOM 🚨: Tencent’s Hunyuan Turbo S just landed in the Top 8 globally on Chatbot Arena and is now #2 in China, just behind DeepSeek. Even Google ex-CEO Eric Schmidt says China had no foundation models two years ago — now it has three: DeepSeek, Qwen, Hunyuan — on par with OpenAI’s O.1. At #GoogleIO, all three scored high on the global leaderboard like #DeepSeek, #Qwen, #Hunyuan @TencentHunyuan isn’t just racing — it’s rewriting the leaderboard. #HunyuanTurboS #TencentAI #LLM #AIRevolution

Replies 1 · Reposts 1 · Likes 10 · Views 2.1K
Can Xu retweeted
Tencent Hy @TencentHunyuan
📢 Introducing Adaptive Deep Reasoning, a novel approach that dynamically selects between long-chain and short-chain reasoning based on problem complexity, without compromising performance.
🧠 Two-stage training framework:
1️⃣ Mixed Supervised Fine-tuning: Equips the model with both reasoning modes.
2️⃣ Reinforcement Learning: Integrates GRPO with a long-short adaptive group-wise reward strategy that dynamically assesses prompt complexity to provide customized rewards while optimizing reasoning length when appropriate. Uses a logit-based switching loss to optimize the model's initial token selection, ensuring the right reasoning mode is chosen.
✅ Key benefits: Seamless reasoning mode switching. Maintains long-chain reasoning accuracy while improving efficiency.
📝 Technical Details: arxiv.org/pdf/2505.20101
Replies 8 · Reposts 32 · Likes 208 · Views 12.7K
Can Xu retweeted
Rohan Paul @rohanpaul_ai
This paper introduces Hunyuan-TurboS, a hybrid architecture that combines different model strengths with adaptive reasoning to optimize responses based on complexity, balancing performance and efficiency.
Methods 🔧:
→ The architecture is a hybrid Transformer-Mamba Mixture of Experts with 128 layers and AMF, MF block patterns.
→ It uses an Adaptive Long-short Chain-of-Thought Fusion method involving a trained teacher model and reinforcement learning for dynamic strategy selection.
→ Multi-round Deliberation Learning employs LLM judge ensembles and human experts in a data flywheel for iterative refinement.
→ A Two-stage Reinforcement Learning process uses Generative Reward Preference Optimization, first for STEM reasoning, then for general instruction following.
📌 Hybrid design effectively balances Transformer reasoning and Mamba efficiency.
📌 Adaptive thinking significantly lowers inference cost by reducing generation tokens.
📌 Multi-stage alignment tailors model capabilities across diverse complex domains right now.
----------------------------
Paper - arxiv.org/abs/2505.15431
Paper Title: "Hunyuan-TurboS: Advancing LLMs through Mamba-Transformer Synergy and Adaptive Chain-of-Thought"
Replies 1 · Reposts 5 · Likes 10 · Views 2.1K
Can Xu retweeted
Aran Komatsuzaki @arankomatsuzaki
Tencent presents Hunyuan-TurboS
- Hybrid Transformer-Mamba MoE (56B active params) trained on 16T tokens
- Dynamically switching between rapid responses and deep "thinking" modes
- Overall top 7 on LMSYS Chatbot Arena
Replies 1 · Reposts 18 · Likes 109 · Views 9.3K
Can Xu retweeted
Tencent Hy @TencentHunyuan
✨ Hunyuan-TurboS: Technical Highlights ✨
🥇 Top 8 on LMSYS Chatbot Arena, beat o4-mini and gemini-2.0-flash
⚡ 560B Hybrid-Transformer-Mamba MoE, 180% Speedup
🧠 Adaptive CoT: 50% of top-tier thinking model's output length
📖 Paper: github.com/Tencent-Hunyua…
💻 Demo: huggingface.co/spaces/tencent…
Replies 28 · Reposts 29 · Likes 160 · Views 47.3K
Rahul Mutreja @brawll66
@CanXu20 Is there a way to use the Hunyuan-TurboS LLM outside of China?
Replies 1 · Reposts 0 · Likes 1 · Views 1.5K
Can Xu retweeted
Tencent Hy @TencentHunyuan
HunYuan-T1 is built on the TurboS foundation, which debuted in the LMSYS Chatbot Arena @lmarena_ai and ranked among the global top 15! 🚀
Replies 10 · Reposts 19 · Likes 158 · Views 26.3K
Can Xu @CanXu20
Top 15! Congratulations to all the collaborators who participated in this interesting competition
Arena.ai @arena

New on Arena leaderboard: @Alibaba_Qwen QwQ-32B and @TXhunyuan's HunYuan-TurboS! - Alibaba's QwQ-32B (#12): strong reasoning model with just 32B size. - Tencent's HunYuan-TurboS debuts in the top 15, catching up Zhipu & StepFun. Congrats to both - competition keeps heating up! 🔥

Replies 0 · Reposts 2 · Likes 18 · Views 3.6K
Sebastien Bubeck @SebastienBubeck
Just started at @OpenAI and I couldn't be more excited to join at this pivotal moment of safe AGI development! Met so many old friends already, talent density of this place is just insane!! Thank you all for the warm welcome, and in particular @sama. Now let the unicorns fly!
Replies 70 · Reposts 28 · Likes 905 · Views 104K
Can Xu @CanXu20
🎉 Excited to see that the core techniques of WizardLM2 have been accepted to #EMNLP2024 and #NeurIPS2024!
👉 Automatic Instruction Evolving for Large Language Models (arxiv.org/abs/2406.00770)
👉 Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena (arxiv.org/abs/2407.10627)
Congrats to all outstanding co-authors. We will not stop our journey towards AGI.
Replies 1 · Reposts 5 · Likes 29 · Views 5.2K
Can Xu retweeted
Sam Rodriques @SGRodriques
Today, we're releasing LAB-Bench, a set of >2000 evaluations for language models and agents on scientific research tasks in biology. Public models underperform PhD/postdoc-level humans on nearly all tasks. Claude 3.5 Sonnet is the clear frontrunner atm, but long way to go. 1/
Replies 9 · Reposts 64 · Likes 309 · Views 73.1K
Can Xu retweeted
Pietro Schirano @skirano
Introducing Claude Engineer 2.0, with agents! 🚀 Biggest update yet with the addition of a code editor and code execution agents, and dynamic editing. When editing files (especially large ones), Engineer will direct a coding agent, and the agent will provide changes in batches. Batches are smartly selected based on file complexity. The code execution agent will run the code and check for issues. It can even start processes (like live servers) and end them. It's insanely powerful! 🔥
Replies 103 · Reposts 385 · Likes 3.2K · Views 393.3K
Can Xu retweeted
Philipp Schmid @_philschmid
Mistral releases their first Mamba Model! 🐍 Codestral Mamba 7B is a Code LLM based on the Mamba2 architecture. Released under Apache 2.0 and achieves 75% on HumanEval for Python Coding. 👀
Blog: mistral.ai/news/codestral…
Model: huggingface.co/mistralai/mamb…
They also released a Math fine-tune based on Mistral 7B that achieves 56.6% on MATH and 63.47% on MMLU.
Blog: mistral.ai/news/mathstral/
Model: huggingface.co/mistralai/math…
Replies 6 · Reposts 85 · Likes 376 · Views 32.2K