Together AI

2.7K posts

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CA · Joined November 2022
396 Following · 55.6K Followers
Pinned Tweet
Together AI @togethercompute
"One thing that we've been seeing recently is that inference benchmarks don't really match production workloads that well." - @realDanFu, VP of Kernels
When you're running dozens of concurrent coding agents, each with 45k–200k token contexts, the benchmarks that matter are the ones that stress KV cache, scheduler limits, and throughput under real load.
We ran those benchmarks. Our Inference Engine delivered:
→ 31% higher TPS than the next fastest OSS engine
→ 2× better time-to-first-token at saturation
→ 76% lower cost per request vs. Claude Opus 4.6
Read the full technical breakdown → togetherai.link/O0VBJR0
9 replies · 3 reposts · 38 likes · 8K views
Together AI @togethercompute
@arena Love the visualization! Lots of incredible open models right on the frontier 👏👏
0 replies · 0 reposts · 2 likes · 156 views
Arena.ai @arena
Dive into the details of the Text Arena Pareto frontier. Filter and sort by lab, license, input/output price and context length. arena.ai/leaderboard/te…
3 replies · 2 reposts · 9 likes · 6.4K views
Arena.ai @arena
5 patterns in Text Arena's price–performance Pareto frontier since 2023:
1. GPT-4-level quality is now ~500x lower cost.
- From a ~$50 blended price per million tokens in 2023 to ~$0.10 today.
2. The higher-price end is both better and lower-priced since 2023.
- The leading Arena score has climbed ~170 points (1,330 → 1,500), while the price of the higher-end frontier models dropped from ~$50 to ~$20 per million tokens.
3. The low-cost end gained the most.
- Under $0.20 per million tokens, the best available model went from ~1,000 Arena score in 2023 to ~1,440 today.
4. The low-cost/top performance gap has nearly closed.
- In 2023, sub-$0.20 models trailed the leader by ~350 Arena points. Today, ~60.
5. The cast has rotated quite a bit.
- @OpenAI set the 2023–24 benchmark.
- @AIatMeta strengthened the low-cost end in 2024.
- @GoogleDeepMind drove the 2025 jump.
- @AnthropicAI holds the peak in 2026.
- @xAI and Chinese labs like @DeepSeekAI, @Zai_org, @Kimi_Moonshot, @XiaomiMiMo, and @Alibaba_Qwen are continuing to push the mid-price frontier.
13 replies · 40 reposts · 367 likes · 56.3K views
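The figures in the Arena thread can be sanity-checked with a few lines of arithmetic, and the idea of a price-performance Pareto frontier can be sketched directly. What follows is a minimal illustration, not Arena's method: the `Point` type, the `pareto_frontier` helper, and the "hypothetical mid-price" entry are assumptions made for the example, and the $0.20 price for the low-cost model is the upper bound quoted in the thread rather than an actual price.

```python
from typing import NamedTuple


class Point(NamedTuple):
    label: str
    price: float  # blended USD per million tokens
    score: float  # Arena score


def pareto_frontier(points):
    """Return the points that no cheaper-or-equal point matches or beats on score."""
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to priciest; on a price tie, visit the higher score first.
    for p in sorted(points, key=lambda p: (p.price, -p.score)):
        if p.score > best_score:  # strictly better than everything cheaper
            frontier.append(p)
            best_score = p.score
    return frontier


# Approximate figures quoted in the thread.
cost_ratio = round(50 / 0.10)  # GPT-4-level quality: ~$50 then, ~$0.10 now
score_gain = 1500 - 1330       # leading Arena score, 2023 vs. today
gap_today = 1500 - 1440        # leader vs. best sub-$0.20 model today

today = [
    Point("best sub-$0.20 model", 0.20, 1440),    # price is an upper bound
    Point("hypothetical mid-price", 5.00, 1400),  # invented, dominated point
    Point("leader", 20.00, 1500),
]
frontier = pareto_frontier(today)

print(cost_ratio, score_gain, gap_today)
print([p.label for p in frontier])
```

In this sketch the invented mid-price point drops off the frontier because a cheaper model already scores higher, which is the sense in which the thread describes the low-cost end closing the gap.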
Together AI @togethercompute
@0xPepeTerelu @realDanFu We were really interested in stress testing performance with long context and high concurrency. Glad you found it interesting!
1 reply · 0 reposts · 1 like · 36 views
0xPepeterelu @0xPepeTerelu
@togethercompute @realDanFu yep, coding agents are a totally different load shape. long context, bursty tool calls, many parallel sessions, and latency variance matters way more than a clean tokens/sec chart
1 reply · 0 reposts · 0 likes · 89 views
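The point about latency variance can be made concrete with a small sketch. The two sample sets below are hypothetical numbers invented for illustration, not measurements from any engine, and `latency_summary` is a helper name chosen for this example. Both workloads have the same mean latency, yet their tail latency differs by an order of magnitude, which is what a single averaged throughput figure would hide.

```python
import statistics


def latency_summary(latencies_ms):
    """Return mean, median, and 99th-percentile latency in milliseconds."""
    # quantiles(n=100) yields 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": statistics.median(latencies_ms),
        "p99": cuts[98],
    }


# Hypothetical per-request latencies (ms): same mean, very different tails.
steady = [200] * 100
bursty = [100] * 95 + [2100] * 5

steady_stats = latency_summary(steady)
bursty_stats = latency_summary(bursty)

print(steady_stats)
print(bursty_stats)
```

Comparing `p99` rather than `mean` is one simple way to surface the bursty behavior described in the reply; a real benchmark would also need to control concurrency and context length.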
Together AI @togethercompute
@jahanzaibai @Alibaba_Qwen What about agentic tasks that involve reasoning across a larger codebase? Do you feel like the 1M context really helps out there?
0 replies · 0 reposts · 0 likes · 47 views
Jahanzaib Ahmed @jahanzaibai
@togethercompute @Alibaba_Qwen Together's serverless pricing makes this accessible but I think you're still better off splitting long context into tight retrieval for most agent tasks. It's rarely one giant window that wins.
1 reply · 0 reposts · 0 likes · 81 views
Together AI @togethercompute
Introducing Qwen3.7-Max from @Alibaba_Qwen, Qwen’s flagship model for the agent era with 1M context and leading performance across agentic coding, reasoning, and long-horizon autonomy. AI natives can now use Qwen3.7-Max on Together Serverless Inference for production-scale agent workflows.
8 replies · 5 reposts · 44 likes · 5K views
Together AI @togethercompute
@superaiwatcher @Alibaba_Qwen Maybe — though performance/token and capability at a given parameter size + memory footprint seem like they'll stay competitive too.
0 replies · 0 reposts · 1 like · 58 views
Super Watcher @superaiwatcher
@togethercompute @Alibaba_Qwen Model performance is now a commodity. Within 6 months, the market will stop caring about benchmarks and exclusively value inference latency per unit of reasoning.
1 reply · 0 reposts · 1 like · 75 views
Together AI @togethercompute
Highlights:
👉 Long-horizon autonomy: maintained coherent execution across a 35-hour autonomous kernel optimization run
👉 Agentic coding: leading Terminal-Bench 2.0-Terminus performance for terminal-based engineering workflows
👉 General agent workflows: strong tool orchestration, office automation, and spreadsheet reasoning
👉 1M context: built for longer tasks, larger working sets, and persistent agent workflows
5 replies · 0 reposts · 2 likes · 1.2K views
Together AI reposted
Hassan @nutlope
We turned this app into a real comic book booth at GTC! Had folks create their own comic books on-site, printed them, and gave them out! (this is using an unreleased version of the app that's a lot better, will open source it soon).
Quoting Hassan @nutlope: Built an app that can generate an entire comic in a few seconds. You can upload your own characters, choose a theme, & build your own comic page by page! 100% free and open source. Launching in 24 hours.
6 replies · 2 reposts · 22 likes · 8.4K views
Together AI @togethercompute
MiniMax Speech 2.8 Turbo is built for voice agents that need natural delivery, not just clean audio.
→ Sound Tags for laughter, breathing, sighs, gasps, and other vocal cues
→ 60% prosody improvement over Speech 2.6
→ High-fidelity voice cloning
→ Sub-250ms end-to-end latency across 40+ languages
2 replies · 0 reposts · 1 like · 1.5K views
Together AI @togethercompute
We added 600+ new voices on Together AI! Introducing MiniMax Speech 2.8 Turbo on Together AI, an enterprise TTS model for expressive real-time voice agents. AI natives can now deploy @MiniMax_AI Speech 2.8 Turbo on Together AI dedicated infrastructure, and try the voices directly in voice finder.
8 replies · 1 repost · 26 likes · 21.1K views
Together AI reposted
Kaitlyn Zhou @KaitlynZhou
Voice "cloning" is style transfer. Across three widely used systems — ElevenLabs V3, Coqui-XTTS, Chatterbox — clones don't just copy speakers, they reshape them to be warmer, more authoritative, more native English-like, and even more “humanlike”. Moreover, listeners trust the clones more than the human counterparts.🧵
7 replies · 22 reposts · 123 likes · 13.4K views