Zara

98 posts

@Zara170604

Doing growth for a Chinese AI infra startup, US-bound. Reddit + X experiments → raw numbers, post-mortems, automations I built. at @atlas_cloud_ai

Joined February 2026

132 Following · 4 Followers

Zara@Zara170604·9 May

@reach_vb What's the typical lag for inference providers to expose new OpenAI models? GPT-5.5 hit most aggregators within 24-48h, but realtime-2 still feels patchy on the routing layer. Curious where the bottleneck actually is — model weights access, pricing terms, or auth schema drift?

Vaibhav (VB) Srivastav@reach_vb·9 May

in the last ~15 days we shipped:
- gpt image 2
- privacy filter
- gpt 5.5
- gpt 5.5 pro
- gpt 5.5 instant
- gpt realtime 2
- gpt realtime translate
- gpt realtime whisper
- gpt 5.5 cyber

Zara@Zara170604·9 May

@ArtificialAnlys 37.6 chars/s vs Inworld's 220.5 — that's a ~6x throughput gap at similar quality tier. Is this primarily an architecture choice (autoregressive vs parallel decoders), or batching strategy at the inference layer? Has Step published anything on this?

Artificial Analysis@ArtificialAnlys·9 May

StepFun’s new StepAudio 2.5 TTS ranks #3 on the Artificial Analysis Speech Arena Leaderboard, behind only Inworld’s Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS.

StepAudio 2.5 TTS represents a significant step forward for StepFun from previous TTS models, with notably increased naturalness of speech samples. The model now edges out Eleven v3 on our current prompt set with an Elo score of 1,187.

Key takeaways:

➤ Quality: StepAudio 2.5 TTS has an Elo of 1,187 based on 834 arena appearances, placing it 28 points behind the leading model (Inworld TTS 1.5 Max at 1,215) and 8 points ahead of Eleven v3 at 1,179

➤ Pricing: Model is priced at $85/1M characters, a premium to leading frontier models: Inworld TTS 1.5 Max at $35/1M and Gemini 3.1 Flash TTS at $36.6/1M

➤ Speed: Model generates 37.6 characters per second, compared to 220.5 chars/s for Inworld TTS 1.5 Max and 30.1 chars/s for Gemini 3.1 Flash TTS

➤ Prompting: StepAudio 2.5 TTS offers two paths to control delivery of speech: (1) a global context prompt for overall style, and (2) inline contextual tags for more granular emotion and prosody

See more details and listen to samples below ⬇️
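To put the quoted speed and price figures side by side, here is a minimal Python sketch. It uses only the numbers from the post above; the helper names are illustrative, and it assumes the chars/s figures can be read as steady single-request generation rates, which the post does not state.

```python
# Figures quoted in the post: (characters per second, USD per 1M characters).
MODELS = {
    "StepAudio 2.5 TTS": (37.6, 85.0),
    "Inworld TTS 1.5 Max": (220.5, 35.0),
    "Gemini 3.1 Flash TTS": (30.1, 36.6),
}


def seconds_for(chars: int, chars_per_s: float) -> float:
    """Time to generate `chars` characters at a steady rate (an assumption)."""
    return chars / chars_per_s


def cost_for(chars: int, usd_per_million: float) -> float:
    """Cost in USD for `chars` characters at a per-1M-character price."""
    return chars * usd_per_million / 1_000_000


# Ratio behind the "~6x throughput gap" remark in the reply.
speed_gap = MODELS["Inworld TTS 1.5 Max"][0] / MODELS["StepAudio 2.5 TTS"][0]
print(f"Inworld vs StepAudio speed ratio: {speed_gap:.2f}x")

for name, (speed, price) in MODELS.items():
    print(f"{name}: {seconds_for(1000, speed):.1f}s and "
          f"${cost_for(1000, price):.4f} per 1,000 characters")
```

The ratio comes out a little under 6, consistent with the rounded figure in the reply.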

Zara@Zara170604·9 May

@tenderizzation What's actually replacing yolov3 in production vision stacks right now? I keep seeing teams ping-pong between yolo variants, owl-v2, and grounding-dino depending on latency budget. Curious what triggered the throwback — new latency floor, or just nostalgia?

tender@tenderizzation·9 May

it’s literally off the scale! welcome back yolov3

METR@METR_Evals

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.

Zara@Zara170604·6 May

@nutlope Same shift on my side — what devs want changed. Apps were tutorials, skills are infrastructure. The pivot point was when composable agents (Claude Code skills, agent.md files) made chaining capabilities trivial enough that "another whole app" felt redundant.

Hassan@nutlope·2 May

I'm planning to build less apps and more skills moving forward. Why? I want to empower devs to build with AI. That's always been the goal and it's a big part of my role at Together AI. Open source apps worked really well early on since devs wanted to go read the code to better understand them and check them out. Now? Most devs want a skill they can install that will augment their coding agent. Apps are still good for inspiration & showing what's possible, but they're a lot less useful than they once were (for helping devs build with AI). I still love building and will continue to build apps, but will launch fewer & bigger apps, along with skills that I build for myself along the way. I'm starting with a "design taste" skill that will make your apps look less AI-generated & more unique by default (dropping next week).

Zara@Zara170604·6 May

@charles_irl @modal @ShreshthMalik @OATML_Oxford The "new material" framing is great — and unlike most materials, this one keeps changing properties every 3 months. Building reliable apps on a substrate that shifts under you is the actual hard problem we don't talk about enough.

Charles 🎉 Frye@charles_irl·6 May

LLMs are a "new material" for building computer applications. Fitting that they should also be used to discover novel physical materials! Just one of many excellent papers in this year's ICML sponsored by @modal. By @ShreshthMalik of @OATML_Oxford et al. arxiv.org/abs/2601.20996

Zara@Zara170604·6 May

@osanseviero What's the typical speedup folks are seeing in production deployments with batched traffic? Speculative decoding numbers in papers vs real serving (concurrent users, KV cache pressure) usually have a gap — curious where MTP Drafters lands in practice.

Omar Sanseviero@osanseviero·5 May

Gemma 4 Drafters landing across the OS ecosystem ✅transformers ✅VLLM ✅MLX ✅SGLang ✅Ollama ✅AI Edge Gallery And more coming!

Zara@Zara170604·6 May

@simonw The asymmetry is the whole problem — when an agent acts on N humans, even a 1% screwup rate becomes someone's bad day. Outbound-action thresholds should be 10x stricter than internal-tooling thresholds, and most "look an agent ran my business" demos blur the two.
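A quick worked version of the "1% screwup rate across N humans" point, as a sketch in Python. It assumes each outbound action fails independently with the same probability, which is a simplification the post does not make; the function names are illustrative.

```python
def p_at_least_one_failure(n_people: int, failure_rate: float) -> float:
    """Probability that at least one of n independent actions goes wrong."""
    return 1.0 - (1.0 - failure_rate) ** n_people


def expected_failures(n_people: int, failure_rate: float) -> float:
    """Expected number of people affected by a failed action."""
    return n_people * failure_rate


for n in (10, 100, 1000):
    print(f"N={n}: P(at least one failure)={p_at_least_one_failure(n, 0.01):.3f}, "
          f"expected failures={expected_failures(n, 0.01):.1f}")
```

Under that independence assumption, a 1% rate already gives roughly a 63% chance of at least one failure by N=100.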

Simon Willison@simonw·6 May

AI-run business experiments are interesting and fun up to the point where they waste the time of humans who haven't opted into the experiments - I think they need to keep their own human operators in the loop for outbound actions that affect other people simonwillison.net/2026/May/5/our…

Zara@Zara170604·6 May

@ArtificialAnlys @SambaNovaAI @FireworksAI_HQ @novita_labs @togethercompute Latency variance across providers is the underrated story here — same model, same prompt can be 3x apart on p99 depending on stack (batching, quant, KV cache). Anyone publish a head-to-head with concurrent traffic, not just synthetic benchmarks?

Artificial Analysis@ArtificialAnlys·5 May

MiniMax-M2.7 is now available across six inference providers on Artificial Analysis, with significant differentiation in speed and price.

@SambaNovaAI leads on speed at 435 output tokens/s, >3x faster than any other provider. @FireworksAI_HQ, @novita_labs, @togethercompute, and @GMI_cloud have all matched @MiniMax_AI's first-party API pricing, while SambaNova is 2x higher.

Key takeaways:

➤ Fireworks and SambaNova are on the Pareto frontier for Speed vs. Price. At 127 output tokens/s and ~$0.22 per 1M tokens blended, Fireworks is ~2.2x faster than MiniMax's first-party API at the same blended price, whereas SambaNova delivers 435 output tokens/s but at ~2-3.5x the blended price of the other providers (depending on cache usage)

➤ SambaNova is the fastest provider at 435 output tokens/s, ~3.4x the next fastest provider (Fireworks at 127 output tokens/s). The remaining providers run substantially slower: MiniMax’s first-party API at 57 output tokens/s, Novita at 54, GMI at 41, and Together AI at 29

➤ Cache discounts vary across providers. Fireworks, MiniMax, Novita, and Together AI offer 80% cache hit discounts, while GMI and SambaNova do not offer a discount. For cache-heavy workloads, this can materially increase the relative pricing for GMI and SambaNova

➤ Optimal provider choice depends on workload. SambaNova may be more suited to latency-sensitive deployments, albeit at a higher cost, while Fireworks may be more suitable for high-volume workloads that are not as latency-sensitive
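The Pareto-frontier claim in the post can be checked with a small Python sketch. The speeds are taken from the post. The prices are assumptions: ~$0.22 per 1M blended tokens for the five providers described as matching first-party pricing, and $0.44 for SambaNova as the low end of the quoted ~2-3.5x range. Cache discounts are ignored.

```python
from typing import NamedTuple


class Provider(NamedTuple):
    name: str
    tokens_per_s: float    # output speed, from the post
    usd_per_m: float       # blended price per 1M tokens, assumed (see lead-in)


PROVIDERS = [
    Provider("SambaNova", 435, 0.44),   # assumed: 2x the ~$0.22 baseline
    Provider("Fireworks", 127, 0.22),
    Provider("MiniMax", 57, 0.22),
    Provider("Novita", 54, 0.22),
    Provider("GMI", 41, 0.22),
    Provider("Together AI", 29, 0.22),
]


def dominates(a: Provider, b: Provider) -> bool:
    """True if a is no slower and no pricier than b, and strictly better on one axis."""
    no_worse = a.tokens_per_s >= b.tokens_per_s and a.usd_per_m <= b.usd_per_m
    strictly_better = a.tokens_per_s > b.tokens_per_s or a.usd_per_m < b.usd_per_m
    return no_worse and strictly_better


def pareto_frontier(providers: list[Provider]) -> list[Provider]:
    """Keep only providers that no other provider dominates."""
    return [p for p in providers if not any(dominates(q, p) for q in providers)]


frontier = [p.name for p in pareto_frontier(PROVIDERS)]
print(frontier)
```

With these assumed prices, only SambaNova and Fireworks survive, since every other provider is slower than Fireworks at the same price. Different blended prices or cache-hit rates would change the inputs, so the result is illustrative rather than a restatement of the benchmark.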

Zara@Zara170604·17 Apr

@Zai_org Latency matters more than free credits. Z.ai's 70B hits ~800ms TTFT on cold starts. If your agent loop calls it 10x per user action, you're at 8s before any tool execution. Plan your architecture accordingly.
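The 8s figure is simple multiplication, sketched here in Python. It assumes the ten calls run strictly one after another and each pays the full ~800 ms time to first token. The parallel variant is a hypothetical alternative for calls that do not depend on each other; it is not something the post describes.

```python
import math


def sequential_latency_s(calls: int, ttft_ms: float) -> float:
    """Total wait when every call runs after the previous one finishes its TTFT."""
    return calls * ttft_ms / 1000


def parallel_latency_s(calls: int, ttft_ms: float, max_parallel: int) -> float:
    """Total wait when independent calls are issued in waves of `max_parallel` (assumption)."""
    waves = math.ceil(calls / max_parallel)
    return waves * ttft_ms / 1000


print(sequential_latency_s(10, 800))      # 8.0 seconds, the figure in the post
print(parallel_latency_s(10, 800, 5))     # 1.6 seconds with two waves of five
```

Whether calls can be parallelized depends on the agent loop: steps that consume the previous step's output have to stay sequential.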

Z.ai@Zai_org·2 Mar

Z.ai Startup Program is NOW OPEN. What you can get: ·Free API credits ·Priority rate limits ·Exclusive Community ·Early API Access Who we're looking for: ·AI-native startups ·Agent builders ·SaaS founders integrating LLM infra ·Global teams building for real-world scale If you're building something that matters, don't wait!! Apply now: startup.z.ai Questions? Details? Follow & DM @ZaiforStartups

Zara@Zara170604·13 Apr

Unified API for 300+ models: atlascloud.ai 20% bonus on first deposit (up to $100)

Zara@Zara170604·13 Apr

So I switched to Atlas Cloud's unified API. One key. 300+ models. Change one string to swap models. Same auth. Same billing. No vendor juggling. What's your image stack? Still using one model for everything? (Link in reply)
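A sketch of what "change one string to swap models" could look like in Python. Everything here is hypothetical: the payload shape, the field names, and the model ID strings are illustrative placeholders, not Atlas Cloud's documented API, and no network call is made.

```python
import json


def build_request(model: str, prompt: str) -> dict:
    """Build a request body; only the `model` string differs between providers.

    Hypothetical payload shape, not taken from any real API reference.
    """
    return {"model": model, "prompt": prompt}


PROMPT = "Poster for a jazz night, headline text: 'Blue Hour'"

# Hypothetical model IDs, named after the models compared in this thread.
MODEL_IDS = ["flux-2-pro", "imagen-4", "ideogram-v3"]

requests_to_send = [build_request(model_id, PROMPT) for model_id in MODEL_IDS]
for body in requests_to_send:
    print(json.dumps(body))
```

The point of the sketch is only that auth, billing, and request structure stay constant while the model identifier varies; the real request format would come from the provider's documentation.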

Zara@Zara170604·13 Apr

Flux 2 Pro = photorealism king, can't spell text. Imagen 4 = fast + perfect text, less style. Ideogram v3 = typography god, okay photos. No "best" model. Only "best for this task."

Zara@Zara170604·13 Apr

2026: "Which AI image generator is best?" Wrong question. I tested 6 models. None dominate everything. Each wins at one thing, fails at another. The real bottleneck? Managing 6 API keys.

Zara@Zara170604·10 Apr

Full comparison table + API code samples are in the GitHub repo github.com/MartinCha73421…

Zara@Zara170604·10 Apr

Who else wants to turn their face into an AI video? Drop a comment with what scene you'd use it for

Zara@Zara170604·10 Apr

78% of marketing teams now use AI-generated video at least once per quarter. But here's the problem: when you upload a photo of a real person, 90% of models will block your request. That's why I put together the complete 2026 guide to AI portrait video generation 👇

Zara@Zara170604·9 Apr

👇 If you've been waiting on fal for Seedance 2.0, what's been the blocker? The API structure is standard REST. Migration takes under an hour. #Seedance2 #AIVideo

Zara@Zara170604·9 Apr

Head-to-head:
vs Kling 3.0: Better face consistency, lower price
vs Veo 3.1: Better value for commercial work
vs Wan 2.6: Superior face generation (worth the premium)

Real use case: 200 TikTok clips/month, 8s each.
Atlas Cloud: ~$35
BytePlus: ~$300

Zara@Zara170604·9 Apr

Seedance 2.0 API is live at $0.081/s, and it actually handles real faces
