Together AI

2.7K posts

@togethercompute

Accelerate inference, model shaping, and pre-training on a research-optimized platform.

San Francisco, CA · Joined November 2022

396 Following · 55.6K Followers

Pinned Tweet

Together AI@togethercompute·6d

"One thing that we've been seeing recently is that inference benchmarks don't really match production workloads that well." - @realDanFu, VP of Kernels

When you're running dozens of concurrent coding agents, each with 45k–200k token contexts, the benchmarks that matter are the ones that stress KV cache, scheduler limits, and throughput under real load.

We ran those benchmarks. Our Inference Engine delivered:
→ 31% higher TPS than the next fastest OSS engine
→ 2× better time-to-first-token at saturation
→ 76% lower cost per request vs. Claude Opus 4.6

Read the full technical breakdown → togetherai.link/O0VBJR0

Together AI retweeted

Vipul Ved Prakash@vipulved·3d

PSA: Just added a thousand H100s and H200s to Together on-demand GPU clusters and Dedicated Endpoints: api.together.ai/clusters

Together AI@togethercompute·3d

@arena Love the visualization! Lots of incredible open models right on the frontier 👏👏

Arena.ai@arena·4d

Dive into the details of the Text Arena Pareto frontier. Filter and sort by lab, license, input/output price and context length. arena.ai/leaderboard/te…

Arena.ai@arena·4d

5 patterns in Text Arena's price–performance Pareto frontier since 2023:

1. GPT-4-level quality is now ~500x lower cost.
- From a ~$50 blended price per million tokens in 2023 to ~$0.10 today.

2. The higher-price end is both better and lower-priced since 2023.
- The leading Arena score has climbed ~170 points (1,330 → 1,500), while the price of the higher-end frontier models dropped from ~$50 to ~$20 per million tokens.

3. The low-cost end gained the most.
- Under $0.20 per million tokens, the best available model went from ~1,000 Arena score in 2023 to ~1,440 today.

4. The low-cost/top performance gap has nearly closed.
- In 2023, sub-$0.20 models trailed the leader by ~350 Arena points. Today, ~60.

5. The cast has rotated quite a bit.
- @OpenAI set the 2023–24 benchmark.
- @AIatMeta strengthened the low-cost end in 2024.
- @GoogleDeepMind drove the 2025 jump.
- @AnthropicAI holds the peak in 2026.
- @xAI and Chinese labs like @DeepSeekAI, @Zai_org, @Kimi_Moonshot, @XiaomiMiMo, and @Alibaba_Qwen are continuing to push the mid-price frontier.
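The figures in the thread above can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and every input is the rounded "~" value quoted in the thread, not data pulled from the leaderboard itself.

```python
# Rounded figures as quoted in the thread (all approximate).
price_2023 = 50.00  # USD per million tokens, blended, GPT-4-level quality in 2023
price_now = 0.10    # USD per million tokens for comparable quality today

top_2023, top_now = 1330, 1500  # leading Arena score
low_2023, low_now = 1000, 1440  # best model under $0.20 per million tokens

cost_ratio = round(price_2023 / price_now)  # how many times lower the cost is now
top_gain = top_now - top_2023               # points gained by the leading score
low_gain = low_now - low_2023               # points gained at the low-cost end
gap_2023 = top_2023 - low_2023              # leader vs. sub-$0.20 model in 2023
gap_now = top_now - low_now                 # the same gap today

print(f"cost ratio: ~{cost_ratio}x")
print(f"top-end gain: ~{top_gain} points, low-cost gain: ~{low_gain} points")
print(f"gap to leader: ~{gap_2023} points in 2023, ~{gap_now} points today")
```

With these rounded inputs the 2023 gap comes out near 330 points, not the ~350 quoted, which is plausibly just rounding in the published scores. The other figures (500x, 170 points, 60 points) match the thread.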

Together AI@togethercompute·3d

@0xPepeTerelu @realDanFu We were really interested in stress testing performance with long context and high concurrency, glad you found it interesting!

0xPepeterelu@0xPepeTerelu·6d

@togethercompute @realDanFu yep, coding agents are a totally different load shape. long context, bursty tool calls, many parallel sessions, and latency variance matters way more than a clean tokens/sec chart

Together AI@togethercompute·3d

@AccBalanced @realDanFu @SemiAnalysis_ Definitely one to follow, we’re a fan of their work too!

b/acc, context platform engineer@AccBalanced·5d

@togethercompute @realDanFu Personally, I’m bullish on @SemiAnalysis_ tackling this with #InferenceX updates of ISL>250K

Together AI@togethercompute·3d

@AccBalanced @realDanFu A good benchmark should be realistic and push the system to its limits!

b/acc, context platform engineer@AccBalanced·5d

@togethercompute @realDanFu Exactly. Long-context, Multi-Turn, High-Concurrency, “is all you need” to benchmark today’s workloads

Together AI@togethercompute·3d

@jahanzaibai @Alibaba_Qwen What about agentic tasks that involve reasoning across a larger codebase? Do you feel like the 1M context really helps out there?

Jahanzaib Ahmed@jahanzaibai·4d

@togethercompute @Alibaba_Qwen Together's serverless pricing makes this accessible but I think you're still better off splitting long context into tight retrieval for most agent tasks. It's rarely one giant window that wins.

Together AI@togethercompute·4d

Introducing Qwen3.7-Max from @Alibaba_Qwen, Qwen’s flagship model for the agent era with 1M context and leading performance across agentic coding, reasoning, and long-horizon autonomy. AI natives can now use Qwen3.7-Max on Together Serverless Inference for production-scale agent workflows.

Together AI@togethercompute·3d

@superaiwatcher @Alibaba_Qwen Maybe — though performance/token and capability at a given parameter size + memory footprint seem like they'll stay competitive too.

Super Watcher@superaiwatcher·4d

@togethercompute @Alibaba_Qwen Model performance is now a commodity. Within 6 months, the market will stop caring about benchmarks and exclusively value inference latency per unit of reasoning.

Together AI@togethercompute·3d

@SaniAiTech @Alibaba_Qwen Truly! Any plans to test it out?

Sani Ai Tech@SaniAiTech·3d

@togethercompute @Alibaba_Qwen 1M context and agent-focused performance is a serious combo

Together AI@togethercompute·3d

@AliceInfoAi @Alibaba_Qwen We’re certainly impressed so far!

Alice The Ai Expert@AliceInfoAi·4d

@togethercompute @Alibaba_Qwen Qwen3.7-Max looks built for the next generation of long context AI agents

Together AI@togethercompute·3d

@xkaidus @Alibaba_Qwen Let us know what you experience, do you have any projects in mind to test it?

Kaidu@xkaidus·4d

@togethercompute @Alibaba_Qwen 1M context is wild on paper but let's see how it handles real agentic loops without hallucinating by turn 50

Together AI@togethercompute·3d

@justgrm @Alibaba_Qwen Their launch blog has some good examples but let us know if you test it out yourself!

Grim@justgrm·4d

@togethercompute @Alibaba_Qwen 1M context is actually insane. what does long-horizon autonomy look like in practice though

Together AI@togethercompute·3d

@ECLresearch @Alibaba_Qwen A lot you can do with 1M! Let us know if you test it out.

Eclipse 🌖@ECLresearch·4d

@togethercompute @Alibaba_Qwen 1M context at production scale is the real differentiator here—agentic workflows collapse without reliable long-range memory.

Together AI@togethercompute·4d

Try Qwen3.7-Max now on Together AI: together.ai/models/qwen37-…

Together AI@togethercompute·4d

Highlights:
👉 Long-horizon autonomy: maintained coherent execution across a 35-hour autonomous kernel optimization run
👉 Agentic coding: leading Terminal-Bench 2.0-Terminus performance for terminal-based engineering workflows
👉 General agent workflows: strong tool orchestration, office automation, and spreadsheet reasoning
👉 1M context: built for longer tasks, larger working sets, and persistent agent workflows

Together AI retweeted

Hassan@nutlope·4d

We turned this app into a real comic book booth at GTC! Had folks create their own comic books on-site, printed them, and gave them out! (this is using an unreleased version of the app that's a lot better, will open source it soon).

Hassan@nutlope

Built an app that can generate an entire comic in a few seconds. You can upload your own characters, choose a theme, & build your own comic page by page! 100% free and open source. Launching in 24 hours.

Together AI@togethercompute·5d

See MiniMax Speech 2.8 Turbo in voice finder and try the voices directly: voicefinder.together.ai/minimax--speec… Learn more: together.ai/models/minimax…

Together AI@togethercompute·5d

MiniMax Speech 2.8 Turbo is built for voice agents that need natural delivery, not just clean audio.
→ Sound Tags for laughter, breathing, sighs, gasps, and other vocal cues
→ 60% prosody improvement over Speech 2.6
→ High-fidelity voice cloning
→ Sub-250ms end-to-end latency across 40+ languages

Together AI@togethercompute·5d

We added 600+ new voices on Together AI! Introducing MiniMax Speech 2.8 Turbo on Together AI, an enterprise TTS model for expressive real-time voice agents. AI natives can now deploy @MiniMax_AI Speech 2.8 Turbo on Together AI dedicated infrastructure, and try the voices directly in voice finder.

Together AI retweeted

Kaitlyn Zhou@KaitlynZhou·6d

Voice "cloning" is style transfer. Across three widely used systems — ElevenLabs V3, Coqui-XTTS, Chatterbox — clones don't just copy speakers, they reshape them to be warmer, more authoritative, more native English-like, and even more “humanlike”. Moreover, listeners trust the clones more than the human counterparts.🧵
