Zara

98 posts

@Zara170604

Doing growth for a Chinese AI infra startup, US-bound. Reddit + X experiments → raw numbers, post-mortems, automations I built. at @atlas_cloud_ai

Joined February 2026
132 Following · 4 Followers
Zara@Zara170604·
@reach_vb What's the typical lag for inference providers to expose new OpenAI models? GPT-5.5 hit most aggregators within 24-48h, but realtime-2 still feels patchy on the routing layer. Curious where the bottleneck actually is — model weights access, pricing terms, or auth schema drift?
0 replies · 0 reposts · 0 likes · 53 views
Vaibhav (VB) Srivastav@reach_vb·
in the last ~15 days we shipped:
- gpt image 2
- privacy filter
- gpt 5.5
- gpt 5.5 pro
- gpt 5.5 instant
- gpt realtime 2
- gpt realtime translate
- gpt realtime whisper
- gpt 5.5 cyber
118 replies · 85 reposts · 2.8K likes · 166.9K views
Zara@Zara170604·
@ArtificialAnlys 37.6 chars/s vs Inworld's 220.5 — that's a ~6x throughput gap at similar quality tier. Is this primarily an architecture choice (autoregressive vs parallel decoders), or batching strategy at the inference layer? Has Step published anything on this?
1 reply · 0 reposts · 0 likes · 156 views
Artificial Analysis@ArtificialAnlys·
StepFun’s new StepAudio 2.5 TTS ranks #3 on the Artificial Analysis Speech Arena Leaderboard, only behind Inworld’s Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.
StepAudio 2.5 TTS represents a significant step forward for StepFun from previous TTS models, with notably increased naturalness of speech samples. The model now edges out Eleven v3 on our current prompt set with an Elo score of 1,187.
Key takeaways:
➤ Quality: StepAudio 2.5 TTS has an Elo of 1,187 based on 834 arena appearances, placing it 28 points behind the leading model (Inworld TTS 1.5 Max at 1,215) and 8 points ahead of Eleven v3 at 1,179
➤ Pricing: Model is priced at $85/1M characters, a premium to leading frontier models: Inworld TTS 1.5 Max at $35/1M and Gemini 3.1 Flash TTS at $36.6/1M
➤ Speed: Model generates 37.6 characters per second, compared to 220.5 chars/s for Inworld TTS 1.5 Max and 30.1 chars/s for Gemini 3.1 Flash TTS
➤ Prompting: StepAudio 2.5 TTS offers two paths to control delivery of speech: 1. Global context prompt for overall style, 2. Inline contextual tags for more granular emotion and prosody
See more details and listen to samples below ⬇️
3 replies · 14 reposts · 127 likes · 21.3K views
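To make the price and speed figures in the StepAudio post above easier to compare, here is a minimal sketch that turns them into cost and generation time for a single script. The per-model numbers are the ones quoted in the post. The 10,000-character script length is a hypothetical value chosen for illustration, and the sketch assumes the quoted chars/s is a sustained single-stream rate, which the post does not state.

```python
# Price (USD per 1M characters) and speed (characters/s) as quoted in the post.
MODELS = {
    "StepAudio 2.5 TTS": (85.0, 37.6),
    "Inworld TTS 1.5 Max": (35.0, 220.5),
    "Gemini 3.1 Flash TTS": (36.6, 30.1),
}


def cost_and_seconds(num_chars, usd_per_million, chars_per_second):
    """Return (cost in USD, generation time in seconds) for one script."""
    cost = num_chars * usd_per_million / 1_000_000
    seconds = num_chars / chars_per_second
    return cost, seconds


SCRIPT_CHARS = 10_000  # hypothetical script length, not from the post

for name, (price, speed) in MODELS.items():
    cost, seconds = cost_and_seconds(SCRIPT_CHARS, price, speed)
    print(f"{name}: ${cost:.2f}, {seconds:.0f} s")

# Speed ratio behind the "~6x" remark in the reply.
speed_ratio = 220.5 / 37.6
print(f"Inworld vs StepAudio speed ratio: {speed_ratio:.2f}x")
```

The ratio comes out slightly under 6, so "~6x" is a fair rounding of the quoted speeds.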
Zara@Zara170604·
@tenderizzation What's actually replacing yolov3 in production vision stacks right now? I keep seeing teams ping-pong between yolo variants, owl-v2, and grounding-dino depending on latency budget. Curious what triggered the throwback — new latency floor, or just nostalgia?
1 reply · 0 reposts · 1 like · 293 views
Zara@Zara170604·
@nutlope Same shift on my side — what devs want changed. Apps were tutorials, skills are infrastructure. The pivot point was when composable agents (Claude Code skills, agent.md files) made chaining capabilities trivial enough that "another whole app" felt redundant.
1 reply · 0 reposts · 0 likes · 22 views
Hassan@nutlope·
I'm planning to build less apps and more skills moving forward. Why? I want to empower devs to build with AI. That's always been the goal and it's a big part of my role at Together AI. Open source apps worked really well early on since devs wanted to go read the code to better understand them and check them out. Now? Most devs want a skill they can install that will augment their coding agent. Apps are still good for inspiration & showing what's possible, but they're a lot less useful than they once were (for helping devs build with AI). I still love building and will continue to build apps, but will launch fewer & bigger apps, along with skills that I build for myself along the way. I'm starting with a "design taste" skill that will make your apps look less AI-generated & more unique by default (dropping next week).
8 replies · 2 reposts · 49 likes · 4.3K views
Zara@Zara170604·
@charles_irl @modal @ShreshthMalik @OATML_Oxford The "new material" framing is great — and unlike most materials, this one keeps changing properties every 3 months. Building reliable apps on a substrate that shifts under you is the actual hard problem we don't talk about enough.
0 replies · 0 reposts · 0 likes · 51 views
Zara@Zara170604·
@osanseviero What's the typical speedup folks are seeing in production deployments with batched traffic? Speculative decoding numbers in papers vs real serving (concurrent users, KV cache pressure) usually have a gap — curious where MTP Drafters lands in practice.
0 replies · 0 reposts · 0 likes · 150 views
Omar Sanseviero@osanseviero·
Gemma 4 Drafters landing across the OS ecosystem
✅ transformers
✅ VLLM
✅ MLX
✅ SGLang
✅ Ollama
✅ AI Edge Gallery
And more coming!
28 replies · 26 reposts · 412 likes · 25.8K views
Zara@Zara170604·
@simonw The asymmetry is the whole problem — when an agent acts on N humans, even a 1% screwup rate becomes someone's bad day. Outbound-action thresholds should be 10x stricter than internal-tooling thresholds, and most "look an agent ran my business" demos blur the two.
0 replies · 0 reposts · 0 likes · 49 views
Simon Willison@simonw·
AI-run business experiments are interesting and fun up to the point where they waste the time of humans who haven't opted into the experiments - I think they need to keep their own human operators in the loop for outbound actions that affect other people simonwillison.net/2026/May/5/our…
36 replies · 12 reposts · 136 likes · 24.6K views
Artificial Analysis@ArtificialAnlys·
MiniMax-M2.7 is now available across six inference providers on Artificial Analysis, with significant differentiation in speed and price.
@SambaNovaAI leads on speed at 435 output tokens/s, >3x faster than any other provider. @FireworksAI_HQ, @novita_labs, @togethercompute, and @GMI_cloud have all matched @MiniMax_AI's first-party API pricing, while SambaNova is 2x higher.
Key takeaways:
➤ Fireworks and SambaNova are on the Pareto frontier for Speed vs. Price. At 127 output tokens/s and ~$0.22 per 1M tokens blended, Fireworks is ~2.2x faster than MiniMax's first-party API at the same blended price, whereas SambaNova delivers 435 output tokens/s but at ~2-3.5x the blended price of the other providers (depending on cache usage)
➤ SambaNova is the fastest provider at 435 output tokens/s, ~3.4x the next fastest provider (Fireworks at 127 output tokens/s). The remaining providers run substantially slower: MiniMax’s first-party API at 57 output tokens/s, Novita at 54, GMI at 41, and Together AI at 29
➤ Cache discounts vary across providers. Fireworks, MiniMax, Novita, and Together AI offer 80% cache hit discounts, while GMI and SambaNova do not offer a discount. For cache-heavy workloads, this can materially increase the relative pricing for GMI and SambaNova
➤ Optimal provider choice depends on workload. SambaNova may be more suited to latency-sensitive deployments, albeit at a higher cost, while Fireworks may be more suitable for high-volume workloads that are not as latency-sensitive
10 replies · 18 reposts · 204 likes · 55.1K views
Zara@Zara170604·
@Zai_org Latency matters more than free credits. Z.ai's 70B hits ~800ms TTFT on cold starts. If your agent loop calls it 10x per user action, you're at 8s before any tool execution. Plan your architecture accordingly.
0 replies · 0 reposts · 1 like · 5 views
Z.ai@Zai_org·
Z.ai Startup Program is NOW OPEN.
What you can get:
· Free API credits
· Priority rate limits
· Exclusive Community
· Early API Access
Who we're looking for:
· AI-native startups
· Agent builders
· SaaS founders integrating LLM infra
· Global teams building for real-world scale
If you're building something that matters, don't wait!!
Apply now: startup.z.ai
Questions? Details? Follow & DM @ZaiforStartups
120 replies · 238 reposts · 2.4K likes · 267.9K views
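The latency arithmetic in the reply to the Z.ai post above (about 800 ms TTFT, 10 calls per user action, 8 s total) can be sketched as follows. The sequential case uses only the figures from the reply. The concurrent case is an added assumption for comparison: it only applies when the calls do not depend on each other's output, and the concurrency limit of 5 is hypothetical.

```python
import math


def sequential_seconds(ttft_ms, calls):
    """Total TTFT wait when every call starts after the previous one."""
    return ttft_ms * calls / 1000


def concurrent_seconds(ttft_ms, calls, max_concurrency):
    """Total TTFT wait when independent calls run in concurrent waves.

    Assumes calls have no data dependencies (not stated in the post).
    """
    waves = math.ceil(calls / max_concurrency)
    return ttft_ms * waves / 1000


TTFT_MS = 800           # cold-start TTFT quoted in the reply
CALLS_PER_ACTION = 10   # calls per user action quoted in the reply

print(sequential_seconds(TTFT_MS, CALLS_PER_ACTION))     # 8.0
print(concurrent_seconds(TTFT_MS, CALLS_PER_ACTION, 5))  # 2 waves of 5 calls
```

The sequential result matches the 8 s in the reply. The wave model ignores generation time and tool execution, so it is a lower bound on what a user would actually wait.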
Zara@Zara170604·
Unified API for 300+ models: atlascloud.ai 20% bonus on first deposit (up to $100)
0 replies · 0 reposts · 0 likes · 31 views
Zara@Zara170604·
So I switched to Atlas Cloud's unified API. One key. 300+ models. Change one string to swap models. Same auth. Same billing. No vendor juggling. What's your image stack? Still using one model for everything? (Link in reply)
1 reply · 0 reposts · 0 likes · 46 views
Zara@Zara170604·
Flux 2 Pro = photorealism king, can't spell text.
Imagen 4 = fast + perfect text, less style.
Ideogram v3 = typography god, okay photos.
No "best" model. Only "best for this task."
1 reply · 0 reposts · 0 likes · 32 views
Zara@Zara170604·
2026: "Which AI image generator is best?" Wrong question. I tested 6 models. None dominate everything. Each wins at one thing, fails at another. The real bottleneck? Managing 6 API keys.
1 reply · 0 reposts · 0 likes · 31 views
Zara@Zara170604·
Who else wants to turn their face into an AI video? Drop a comment with what scene you'd use it for
1 reply · 0 reposts · 0 likes · 23 views
Zara@Zara170604·
78% of marketing teams now use AI-generated video at least once per quarter. But here's the problem: when you upload a photo of a real person, 90% of models will block your request. That's why I put together the complete 2026 guide to AI portrait video generation 👇
1 reply · 0 reposts · 0 likes · 33 views
Zara@Zara170604·
👇 If you've been waiting on fal for Seedance 2.0, what's been the blocker? The API structure is standard REST. Migration takes under an hour. #Seedance2 #AIVideo
0 replies · 0 reposts · 0 likes · 53 views
Zara@Zara170604·
Head-to-head:
vs Kling 3.0: Better face consistency, lower price
vs Veo 3.1: Better value for commercial work
vs Wan 2.6: Superior face generation (worth the premium)
Real use case: 200 TikTok clips/month, 8s each.
Atlas Cloud: ~$35
BytePlus: ~$300
1 reply · 0 reposts · 0 likes · 54 views
Zara@Zara170604·
Seedance 2.0 API is live at $0.081/s, and it actually handles real faces
1 reply · 0 reposts · 0 likes · 56 views