Rubio Huang

32 posts

@HuangRubio

Joined September 2019
300 Following · 45 Followers
Rubio Huang retweeted
Rui-Jie (Ridger) Zhu @RidgerZhu
Thrilled to release our new paper, "Scaling Latent Reasoning via Looped Language Models." TL;DR: we scale looped language models to 2.6 billion parameters, pretrained on more than 7 trillion tokens. The resulting model is on par with SOTA language models of 2-3x its size.
23 replies · 150 reposts · 685 likes · 173.9K views
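The looped-LM recipe above can be sketched in a few lines of plain Python: a single weight-tied block is applied repeatedly, so effective depth (and latent reasoning compute) grows while the parameter count stays fixed. This is an illustrative toy under that one assumption, not the paper's actual architecture; all names here are made up.

```python
# Toy illustration of a looped language model: one shared block,
# applied repeatedly, gives more compute without more parameters.

def make_block(dim):
    # a fixed toy "layer" whose weights are shared by every loop iteration
    w = [[0.1 * ((i + j) % 3) for j in range(dim)] for i in range(dim)]
    def block(x):
        # linear map plus a residual connection
        return [sum(w[i][j] * x[j] for j in range(dim)) + x[i]
                for i in range(dim)]
    return block, dim * dim  # the block and its parameter count

def looped_forward(x, block, loops):
    # unrolled depth = loops; parameters = one block's worth, regardless
    for _ in range(loops):
        x = block(x)
    return x

block, n_params = make_block(4)
shallow = looped_forward([1.0, 0.0, 0.0, 0.0], block, loops=1)
deep = looped_forward([1.0, 0.0, 0.0, 0.0], block, loops=8)
# 'deep' used 8x the compute of 'shallow' with the same n_params
```

The point of the exercise: making the model "reason longer" is a matter of raising `loops` at inference time, not of training a larger network.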
Rubio Huang retweeted
Daoguang Zan @zandaoguang
🔥 Can your LLM fix bugs beyond Python? Meet our Multi-SWE-bench, the first multilingual benchmark for issue resolution. Not just Python, but Java, TS, JS, Go, Rust, C, and C++ 🧩
💥 1,632 real-world issues
✅ Verified by 68 engineers
📦 Dockerized, reproducible, battle-tested
🧠 Covers easy, medium, and hard bug fixes
📊 Designed to benchmark LLMs as true dev agents
To scale beyond benchmarks, we also launch Multi-SWE-RL:
🎮 An open-source RL community building interactive training environments for LLMs as autonomous agents.
🌱 4,723 containerized issue-resolving tasks, 7 languages, and counting.
🤝 We invite the community to contribute, expand, and shape the future of software-native RL.
It took us a year to build. Now let's see what your model can do.
🏆 Leaderboard: multi-swe-bench.github.io
📄 Paper: arxiv.org/abs/2504.02605
🧬 Code: github.com/multi-swe-benc…
📚 Multi-SWE-bench dataset: huggingface.co/datasets/ByteD…
🎮 Multi-SWE-RL dataset: huggingface.co/datasets/ByteD…
#LLM #RL #SWEbench #OpenAI #Anthropic #DeepSeek #Doubao
8 replies · 10 reposts · 46 likes · 13.1K views
Rubio Huang retweeted
Ge Zhang @GeZhang86038849
[1/n] Super excited to announce SuperGPQA!!! We spent more than half a year to finally get it done! SuperGPQA is a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. It is also the largest high-quality benchmark annotation effort built through human-LLM collaboration! We thank ByteDance Inc. and 2077.AI for their sponsorship!
Resources:
Website: supergpqa.github.io
Huggingface: huggingface.co/datasets/m-a-p…
Github: github.com/SuperGPQA/Supe…
Paper: arxiv.org/abs/2502.14739
HF Paper: huggingface.co/papers/2502.14…
5 replies · 47 reposts · 213 likes · 29.2K views
Rubio Huang retweeted
Ge Zhang @GeZhang86038849
[1/n] 🎉 We are very pleased to introduce FineFineWeb, currently the largest open-source, fully automatic fine-grained classification effort for web data. Specifically, our contributions are as follows:
🔪 We decompose the entire deduplicated version of FineWeb into 67 categories, each with a significant amount of seed data.
🧮 We conduct a correlation analysis between vertical categories, as well as between vertical categories and common benchmarks, and also provide a distribution analysis of URLs and other content.
🧑‍⚖️ We provide test sets for PPL evaluation based on the 67 selected vertical domains of FineFineWeb, offered as a "small cup" (validation) and a "medium cup" (test).
🪙 We provide all the materials for the full fastText and BERT training pipeline.
📅 We will give suggestions on data proportioning based on our dataset (based on RegMix; coming soon in our report! Due to tight computing resources, it will arrive as soon as possible.)
7 replies · 45 reposts · 161 likes · 24.3K views
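The "small cup"/"medium cup" splits above are for perplexity (PPL) evaluation, which reduces to a one-liner: exponentiate the mean negative log-likelihood per token on a domain's test slice. A minimal sketch, with made-up log-probabilities standing in for a real model's outputs:

```python
import math

def perplexity(token_logprobs):
    # PPL = exp(mean negative log-likelihood per token)
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# hypothetical per-token natural-log probabilities on one domain slice
logprobs = [-2.1, -0.4, -1.3, -0.9, -3.0]
ppl = perplexity(logprobs)  # lower PPL = better fit to that domain
```

Comparing per-domain PPL across the 67 verticals is what such test sets enable: a model trained on more data from a domain should score a lower PPL on that domain's slice.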
Rubio Huang retweeted
Ge Zhang @GeZhang86038849
[1/n] 🔥 Happy to introduce FullStack Bench: a comprehensive evaluation dataset focusing on full-stack programming across 16 languages and more than 11 real-world application domains, such as data analysis, software engineering, and machine learning. Is your CodeLLM a full-stack coder, or just a LeetCode nerd? It's time to put your code LLMs to the test!!! 📝
11 replies · 34 reposts · 135 likes · 46.5K views
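A harness for a benchmark like this usually boils down to functional correctness: each task pairs a prompt with hidden tests, and a completion counts only if every test passes. A hypothetical, dependency-free sketch (the real benchmark's task format and scoring certainly differ):

```python
# Toy pass@1-style scorer: a task passes only if the model's
# completion satisfies every hidden test case.

tasks = [
    {"prompt": "return the sum of a list", "tests": [([1, 2, 3], 6), ([], 0)]},
    {"prompt": "return the largest element", "tests": [([3, 1, 2], 3)]},
]

def model_completion(prompt):
    # stand-in for an LLM call; returns a callable "solution"
    return sum if "sum" in prompt else max

def pass_at_1(tasks):
    passed = 0
    for task in tasks:
        fn = model_completion(task["prompt"])
        if all(fn(inp) == out for inp, out in task["tests"]):
            passed += 1
    return passed / len(tasks)

score = pass_at_1(tasks)  # fraction of tasks solved on the first try
```

Extending this shape to 16 languages mostly means swapping the in-process call for sandboxed compilation and execution per language, which is where the engineering effort of such benchmarks goes.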
Rubio Huang retweeted
Ge Zhang @GeZhang86038849
[1/n] ### Discover AutoKaggle: Revolutionizing Data Science Competitions with Multi-Agent Collaboration!
🚀 Introducing AutoKaggle, a multi-agent framework designed to automate the full spectrum of data science competitions on Kaggle! From background understanding to model prediction, AutoKaggle takes on all phases, boosting efficiency and reducing manual overhead.
💡 Highlights of AutoKaggle:
🛠️ Phase-based workflow: six key phases spanning understanding, EDA, cleaning, feature engineering, and model building.
🤖 Five specialized agents: Reader, Planner, Developer, Reviewer, Summarizer.
🔁 Iterative debugging & unit testing for robust, correct code generation.
📊 Built-in ML tools library to handle data cleaning, feature engineering, and modeling.
🤤 Flexible customization support for the ML tools library lets you drive the workflow however you want.
7 replies · 35 reposts · 153 likes · 15.1K views
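The phase-based workflow with iterative debugging described above can be sketched as a gated loop: each phase's output must pass review (unit-test style) before the pipeline advances, with a bounded number of retries. Everything here is illustrative; the real framework's phases and agent interfaces differ.

```python
# Toy phase-gated pipeline: a "developer" produces work, a "reviewer"
# gates it, and failures trigger a bounded debug-and-retry loop.

PHASES = ["understanding", "eda", "cleaning", "feature_engineering", "model_building"]

def developer(phase, attempt):
    # stand-in for an LLM agent; "fails" once on cleaning to show the retry path
    return {"phase": phase, "ok": attempt > 0 or phase != "cleaning"}

def reviewer(result):
    # unit-test-style gate deciding whether the phase may complete
    return result["ok"]

def run_pipeline(max_retries=2):
    log = []
    for phase in PHASES:
        for attempt in range(max_retries + 1):
            if reviewer(developer(phase, attempt)):
                log.append((phase, attempt))
                break
        else:
            raise RuntimeError(f"phase {phase} failed after {max_retries} retries")
    return log

log = run_pipeline()  # cleaning succeeds on its second attempt (attempt index 1)
```

The `for`/`else` retry idiom is the whole trick: a phase either converges within its retry budget or halts the run, so downstream phases never see unreviewed artifacts.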
Rubio Huang retweeted
Ge Zhang @GeZhang86038849
[1/n] ### Exploring the Boundaries of AI Reasoning: the Launch of KOR-Bench
🚀 To more accurately assess large models' reasoning in new, unfamiliar areas, we're thrilled to introduce the all-new KOR-Bench (Knowledge-Orthogonal Reasoning Benchmark)!
### 💡 Highlights of KOR-Bench:
> 5 categories (🔢 Operation, 🔍 Logic, 🔐 Cipher, 🧩 Puzzle, 📖 Counterfactual) assess reasoning from multiple perspectives, using 25 custom rules 📜 with 10 problem ❓ instances each, ensuring the rules are orthogonal to pre-training data.
> Minimizes reliance on pre-trained knowledge by testing large language models' ability to solve new rule-driven questions from fresh rule descriptions, ensuring a fairer evaluation of models' true reasoning skills.
> Encourages models to break out of traditional frameworks and adapt to non-standard challenges, revealing abilities in reading comprehension, on-the-fly learning, knowledge transfer, logical reasoning, and problem-solving.
🔗 #Reasoning #KORBench #LargeLanguageModels #Benchmark
3 replies · 14 reposts · 53 likes · 5.3K views
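What "knowledge-orthogonal" means in practice: an item states a freshly invented rule, and the model must apply it, so memorized facts don't help. A made-up Cipher-style example (not an actual benchmark instance):

```python
# Invented rule: shift each lowercase letter forward by its 1-based
# position in the word, wrapping past 'z'. Solving this requires only
# the rule text, not anything from pre-training.

RULE = "shift each letter forward by its 1-based position, wrapping past 'z'"

def apply_rule(word):
    return "".join(
        chr((ord(ch) - ord("a") + i) % 26 + ord("a"))
        for i, ch in enumerate(word, start=1)
    )

question = f"Rule: {RULE}. Encode 'abc'."
answer = apply_rule("abc")  # 'a'+1, 'b'+2, 'c'+3 -> "bdf"
```

Because the rule is novel, a model's score on such items tracks its ability to follow instructions and reason, rather than its recall of standard ciphers like ROT13.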
Rubio Huang retweeted
JB @IAMJBDEL
HuggingFace Paper-central now hosts open-source leaderboards. This is like an h-index, but for 🤗 artifacts. Discover the authors whose papers have attracted the most open-source artifacts (datasets, models, or Spaces), and the most active contributors who have developed artifacts associated with papers.
2 replies · 15 reposts · 44 likes · 35.3K views
Rubio Huang retweeted
Yizhi Li @yizhilll
Exciting news! We're thrilled to introduce OmniBench: a groundbreaking benchmark for evaluating omni-language models (OLMs) that can process visual, acoustic, and textual inputs simultaneously! 🖼️🔊📝 huggingface.co/papers/2409.15… #Multimodal #LLM
1 reply · 8 reposts · 15 likes · 2.6K views
Rubio Huang retweeted
Wenhu Chen @WenhuChen
A sad truth about evaluation: if you make a private test set for your benchmark, people just won't adopt it. We host our official MMMU private test set on EvalAI (eval.ai/web/challenges…), but everyone still reports the validation score. I've found it's similar for MathVista, where everyone just reports the testmini score.
9 replies · 11 reposts · 196 likes · 83.2K views
Alexander Kolesnikov @__kolesnikov__
I am at CVPR; DM me if you want to meet in person.
5 replies · 1 repost · 16 likes · 9.5K views
Rubio Huang retweeted
Siwei Wu (吴思为) @siweiwu7
1/ Excited to announce the release of our new paper, "SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval". This benchmark comprises 530K meticulously curated image-text pairs extracted from scientific documents (arXiv papers). arxiv.org/abs/2401.13478
1 reply · 14 reposts · 30 likes · 9.3K views