Ce Zhang

740 posts

@ce_zhang

CTO @ Together @togethercompute Neubauer Associate Professor @UChicago

San Francisco · Joined September 2016

1.2K Following · 2.6K Followers

Pinned Tweet

Ce Zhang@ce_zhang·8 Dec

A 7B model beyond Transformer architecture that matches, and sometimes outperforms, the strongest 7B Transformer! Thanks @Hessian_AI & @teknium @theemozilla @NousResearch for the collaboration. Play with it here api.together.xyz/playground/cha… and give us feedback!

Together AI@togethercompute

Announcing StripedHyena 7B — an open source model using an architecture that goes beyond Transformers, achieving faster performance and longer context. It builds on the lessons learned in the past year designing efficient sequence modeling architectures. together.ai/blog/stripedhy…

10.8K

Ce Zhang retweeted

Together AI@togethercompute·4d

Introducing DeepSeek V4 Pro, a long-context model with hybrid attention, three reasoning modes, and SOTA coding performance. AI natives can now use DeepSeek V4 Pro on Together AI and benefit from reliable inference for long-horizon coding and agentic workflows.

110

853.8K

Ce Zhang retweeted

Together AI@togethercompute·14 Apr

EinsteinArena is a platform where AI agents collaborate on open science problems — submitting solutions, posting in discussion threads, building on each other's constructions in real time. Agents just improved a math problem that's been open since Newton. Kissing Number in dimension 11: 593 → 604.

241

28.4K

Ce Zhang retweeted

Together AI@togethercompute·1 Apr

New from Together Research: Aurora. Speculative decoding that adapts to shifting traffic in real time — and keeps improving the longer it runs. Open-source, RL-based, 1.25x faster vs. a well-trained static speculator with no offline retraining pipeline. Thread 🧵

263

33.4K

Ce Zhang retweeted

Together AI@togethercompute·1 Apr

The Together AI kernels team pushes performance to the next level. An investigation into how left more questions than answers, but VP of Kernels @realDanFu seemed proud. If you want the full picture, read on: togetherai.link/kernels

140

35.3K

Ce Zhang retweeted

Zach Xu@nehzux·28 Mar

Excited to share our #ICLR2026 work! One thing I learned from this project is that long-context reasoning does not fail for just one reason. Some failures come from the model getting “foggy” as context grows, some from breaking apart information that really needs to stay connected, and some from the final aggregation step. That distinction turned out to be surprisingly useful in practice: in the right regime, a carefully planned divide-and-conquer system can outperform a much stronger model reading everything in one shot. Scaling context length alone is not enough. We also need better ways to harness long context. Huge thanks to my great collaborators and mentors: @ShangZhu18 @JueWANG26088228 @JunlinWang3 @ben_athi @Chi_Wang_ @james_y_zou @ce_zhang

Together AI@togethercompute

New from Together Research: a smaller model using divide & conquer can match or beat GPT-4o single-shot on long context tasks. Paper accepted at ICLR 2026. Read more in the 🧵

2.8K

Ce Zhang retweeted

Zongze Li@freelulul·23 Mar

My first PhD work: "Not All Prefills Are Equal." Prefill-Decode disaggregation is the standard for LLM serving. But for multi-turn conversations, it re-transfers the entire KV cache every turn. We found a better way! Thanks to my amazing advisor @ce_zhang and collaborators!

135

12.5K

Ce Zhang retweeted

Vipul Ved Prakash@vipulved·20 Mar

GB300s about to go into burn in @togethercompute

297

20.5K

Ce Zhang retweeted

Together AI@togethercompute·12 Mar

Today, Together AI is launching a unified solution for building real-time voice agents with the entire pipeline running on one cloud. AI natives can now deploy voice apps for every use case at production scale.

14.4K

Ce Zhang retweeted

Together AI@togethercompute·5 Mar

Together Research has produced FlashAttention, ATLAS, ThunderKittens and more. This week at AI Native Conf: seven more releases, all coming to production soon. Thread → #ainativeconf #ainativecloud

138

56.4K

Ce Zhang retweeted

Together AI@togethercompute·2 Mar

New look. Same mission. Together AI was founded on the promise of open innovation – helping AI-native builders create groundbreaking products and experiences across every domain. Today, we’re continuing this mission and excited to introduce our brand refresh.

12.6K

Ce Zhang retweeted

Together AI@togethercompute·25 Feb

We’re open-sourcing CoderForge-Preview — 258K test-verified coding-agent trajectories (155K pass | 103K fail). Fine-tuning Qwen3-32B on the passing subset boosts SWE-bench Verified: 23.0% → 59.4% pass@1, and it ranks #1 among open-data models ≤32B parameters. Thread on the data generation pipeline 🧵

528

186.1K

Ce Zhang retweeted

Discrete Diffusion Reading Group@diffusion_llms·15 Jan

📢Jan 19 (Mon): TiDAR: Think in Diffusion, Talk in Autoregression Diffusion LMs enable fast parallel generation, while autoregressive (AR) models typically deliver higher quality thanks to their causal structure. A central challenge is whether these advantages can be unified to achieve ✅ High throughput ✅ Higher GPU utilization ✅ AR-level quality This Monday, Jingyu Liu (@Jingyu227) will discuss TiDAR, a hybrid decoding approach that combines diffusion-style parallel drafting with autoregressive verification for high quality and high throughput. The project was co-led by Jingyu Liu (@Jingyu227) and Xin Dong (@SimonXinDong). Collaborators: Zhifan Ye (PhD Student @ GaTech), Rishabh Mehta (@__principia__), @YongganFu, Vartika Singh (@vartuattheghat), @jankautz, @ce_zhang and @PavloMolchanov Paper link: arxiv.org/abs/2511.08923

20.2K

Ce Zhang retweeted

AK@_akhaliq·13 Nov

Nvidia presents TiDAR Think in Diffusion, Talk in Autoregression

176

1.7K

101.8K

Ce Zhang retweeted

Together AI@togethercompute·14 Oct

We're excited to host Apriel-1.5-15b-Thinker by @ServiceNow's SLAM labs on Together AI! 👉15B parameters, fits on single GPU 👉On par with Deepseek-R1-0528 and Mistral-Medium-1.2 on the Artificial Analysis Intelligence Index Built by @SathwikTejaswi @ServiceNowRSRCH

6.7K

Ce Zhang retweeted

Together AI@togethercompute·24 Sep

Breaking: @VFSGlobal x Together AI announce strategic partnership. We’re partnering with VFS Global to scale secure, responsible, and high-performance AI solutions for global mobility. Millions of visa applications. 160+ countries. One mission: faster, more transparent, and privacy-conscious AI. [Link below] #AI #GlobalMobility #TechPartnership #responsibleAI

2.6K

Ce Zhang retweeted

Together AI@togethercompute·23 Sep

The Washington Post processes 1.79 billion tokens every month powering "Ask The Post AI." They needed reliable inference without vendor lock-in. Fixed costs. Full model ownership. Together AI's Dedicated Endpoints delivered.

Ce Zhang retweeted

Hassan@nutlope·23 Sep

Announcing ReceiptHero – an app to help people track their finances! It'll take in any receipts you have, extract the total $, and categorize it for you (dining, groceries, utilities, etc.). 100% free & open source. Powered by llama 4 on @togethercompute.

526

57.2K

Ce Zhang retweeted

Hassan@nutlope·17 Sep

I'm building a realtime video analysis app! It takes screenshots every 500ms, sends them to llama 4 on @togethercompute, and streams back the results. I want to extend it to be able to perform actions too (record my screen & send me a text when a video finishes, for example). Will open source & launch it soon.

430

50.2K

Ce Zhang retweeted

Together AI@togethercompute·10 Sep

Together Instant Clusters, offering ready to use, self-service NVIDIA GPUs, are now Generally Available 🚀

8.9K

Ce Zhang retweeted

Together AI@togethercompute·22 Aug

Building AI agents for complex engineering tasks ≠ building chatbots 🧵 Most AI agents today excel at short, simple tasks. But automating multi-day engineering workflows? That’s a whole different game. At Together AI, we learned this the hard way while optimizing LLM inference. Here’s what actually works: ✅ Good tools & documentation ✅ Safe execution environments ✅ Smart session management & progress verification

21K
