keysersoze

748 posts

@Surajdotdot7

building things with AI
Claude Code | agents | automation
sharing what I learn: for people who want to build, not just watch

Chennai · Joined August 2019

808 Following · 104 Followers

keysersoze@Surajdotdot7·18m

@kimmonismus Running this math at micro scale already. Replaced what used to need a video shoot team with a $0.63/video pipeline. The payroll→infrastructure trade isn't a Meta story. It's every company running the numbers right now.

Chubby♨️@kimmonismus·1h

The Meta layoffs investors had been bracing for are coming: roughly 8,000 jobs cut starting May 20, about 10% of its 79,000-person workforce. The main aim is to free up billions for AI infrastructure, shifting resources from payroll to data centers, chips, and advanced models, as highlighted by Mark Zuckerberg.

keysersoze@Surajdotdot7·19m

@viktoroddy Demo quality and "build 47 product pages that match our actual brand system" quality are different things. Curious where this breaks when client constraints hit it.

Viktor Oddy@viktoroddy·3h

Claude Design is insane. ❤️‍🔥 Just recorded an 18-min tutorial on how to build animated, award-winning websites with Claude Design + Opus 4.7!

keysersoze@Surajdotdot7·20m

@viktoroddy 18 mins to build it. Weeks to get it working inside an actual brand system with 5 years of design decisions already baked in. The demo is never the hard part.

keysersoze@Surajdotdot7·3h

@DavidOndrej1 Why can't they expand and get new compute from Nvidia or AWS? Or is it more of a political problem?

David Ondrej@DavidOndrej1·8h

Dwarkesh was right. Anthropic is running out of compute.

keysersoze@Surajdotdot7·4h

@ZypZapCommunity I'm merging the Creative Director and Video Planner into one agent to slash token usage and latency. Instead of two agents chatting back and forth, one agent now sets the shot order (first/last frames). That flows as JSON → Prompt Director optimization → Gen → Editor for stitching. Much leaner.

ZypZap@ZypZapCommunity·4h

@Surajdotdot7 That agent stack sounds wild for turning product shots into clips. Does the creative director actually decide transitions or just shot order?
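
A minimal sketch of the merged-agent flow from the reply above, under stated assumptions: one planning step emits the shot order (first/last frames) as JSON, and the downstream stages (prompt optimization, generation, stitching) consume it. The schema fields (`shots`, `first_frame`, `last_frame`, `motion`) and every stage function here are hypothetical placeholders. They are not the actual pipeline and not the Kling API.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical schema: one shot is a transition from a first frame to a last frame.
@dataclass
class Shot:
    index: int
    first_frame: str  # catalog image that opens the shot
    last_frame: str   # catalog image the shot should end on
    motion: str       # free-text motion/transition hint

def plan_shots(images: List[str]) -> str:
    """Stand-in for the merged Creative Director + Video Planner agent.

    A real agent would call an LLM; here consecutive images are simply chained
    so the output shape (a JSON shot order) is clear.
    """
    shots = [Shot(i, images[i], images[i + 1], "slow push-in") for i in range(len(images) - 1)]
    return json.dumps({"shots": [asdict(s) for s in shots]})

def direct_prompts(plan_json: str) -> List[str]:
    """Stand-in for the Prompt Director: one generation prompt per shot."""
    plan = json.loads(plan_json)
    return [
        f"Shot {s['index']}: start on {s['first_frame']}, end on {s['last_frame']}, {s['motion']}"
        for s in plan["shots"]
    ]

def generate_clip(prompt: str) -> str:
    """Placeholder for the video-generation call; returns a fake clip ID."""
    return f"clip<{prompt.split(':')[0]}>"

def stitch(clips: List[str]) -> str:
    """Placeholder for the Editor stage: join clips in shot order."""
    return " + ".join(clips)

if __name__ == "__main__":
    catalog = ["img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg", "img5.jpg"]
    clips = [generate_clip(p) for p in direct_prompts(plan_shots(catalog))]
    print(stitch(clips))
```

The point of the single JSON hand-off is that downstream stages only read a structured plan. They never read a conversation between agents, which is where the token and latency savings come from.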

keysersoze@Surajdotdot7·6h

I understand, but what I mean is that I'm building an agentic product that turns e-commerce catalogs (5 images) into clips and edits the video. It has 5 specialized agents: creative director, video planner, prompt director (Kling), video editor, and QA analyst. It runs perfectly on Sonnet or Opus, but when I tried Qwen 3.5, Gemma 4, and other local models, they often failed to do what they should, despite their benchmark scores. I'm tired of benchmarks that don't hold up in production settings. I keep seeing high failure rates on agentic tasks, so I want to build a benchmark that tests real use cases and shows what works and what doesn't. The whole point of LLMs is to be useful to us, not to produce a lot of percentage numbers we never use anyway. That's why I like benchmarks like @bridgemindai's, where he tests with real use cases. I'm going to build something similar, testing models on simple to complex agentic products so people can see where they can use them and evaluate them.

POM@peterom

@Surajdotdot7 There are benchmarks included for agentic tool use
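
The real-use-case benchmark described above could start as small as the following sketch. It scores models on concrete tasks grouped from simple to complex instead of reporting headline percentages. Everything here is an assumption for illustration: `Task`, the tier names, and the `run_model` callable are hypothetical. Each task's `check` function stands in for whatever pass/fail rule a production pipeline actually needs, such as valid JSON or a correct tool call.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    name: str
    tier: str                     # hypothetical tiers: "simple", "medium", "complex"
    prompt: str
    check: Callable[[str], bool]  # True if the model output is usable as-is

def pass_rate_by_tier(tasks: List[Task], run_model: Callable[[str], str]) -> Dict[str, float]:
    """Run every task once and return the fraction that passed, per tier."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for task in tasks:
        total[task.tier] += 1
        try:
            ok = task.check(run_model(task.prompt))
        except Exception:
            ok = False  # a crash or malformed response counts as a failure
        if ok:
            passed[task.tier] += 1
    return {tier: passed[tier] / total[tier] for tier in total}

def is_shot_plan(out: str) -> bool:
    """Example check: output must be a JSON object with a "shots" key."""
    try:
        data = json.loads(out)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "shots" in data

tasks = [
    Task("echo", "simple", "say ok", lambda out: out.strip() == "ok"),
    Task("shot plan", "complex", "plan shots as JSON", is_shot_plan),
]

def fake_model(prompt: str) -> str:
    # Stand-in for a local model that handles the easy task but chats instead of emitting JSON.
    return "ok" if prompt == "say ok" else "Sure! Here is your plan..."

print(pass_rate_by_tier(tasks, fake_model))  # {'simple': 1.0, 'complex': 0.0}
```

Reporting per tier shows the pattern the thread complains about. A model can look perfect on simple tasks and still fail the structured, multi-step ones a real pipeline depends on.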

keysersoze@Surajdotdot7·5h

@th3_m0l3 @DavidOndrej1 @ClaudeDevs Built it after they blocked third-party OAuth, so I'm using it with 3 Max accounts. So yeah, it's useful, but I wouldn't recommend it.

random_hero@th3_m0l3·5h

@Surajdotdot7 @DavidOndrej1 @ClaudeDevs legit?

David Ondrej@DavidOndrej1·1d

I'm spending ±$6,000 a month on the Anthropic API. You?

keysersoze@Surajdotdot7·6h

@kimmonismus Production reality: evals said better, pipeline disagreed. Rolled back model versions twice at 8M Studio because benchmark improvements didn't translate to our actual workload. More tokens in adaptive thinking means nothing if the outputs regress on your specific task.

Chubby♨️@kimmonismus·6h

Opus 4.7 does seem to have improved, and its adaptive thinking now uses more tokens. However, compared to Opus 4.6, it still performs significantly worse.

keysersoze@Surajdotdot7·6h

@KyleHessling1 If that delta holds, the inference cost story changes completely. We run pipelines processing thousands of jobs/month — model size directly maps to server cost. A 27B that punches up is worth more than a 235B that barely edges it.

Kyle Hessling@KyleHessling1·12h

Am I mistaken that, if the delta seen between the Qwen 3.5 35B MoE and the Qwen 3.6 35B MoE holds, the 3.6 dense 27B will unseat Kimi K2.5 at less than 3% of the model size? Remember when we were all considering buying 2 or 4 Mac Studios just to run REAP prunes in Q1 to run that model? We could soon have similar capability on a 3090. Exciting acceleration, to say the least!
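
To make "model size directly maps to server cost" concrete, a back-of-envelope estimate of the memory needed just to hold the weights is parameters × bytes per parameter. This sketch ignores KV cache, activations, and runtime overhead, and the 16-bit and 4-bit widths are illustrative assumptions. For mixture-of-experts models, count total rather than active parameters, since every expert still has to be resident in memory.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate GB (1 GB = 1e9 bytes) needed to store the weights alone."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Compare the two sizes from the reply at a full-precision and a quantized width.
for name, params in [("27B dense", 27), ("235B", 235)]:
    for bits in (16, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

At 4 bits, a 27B model needs about 13.5 GB for its weights, which fits inside the 24 GB of an RTX 3090. A 235B model at the same width needs about 117.5 GB, which means multiple GPUs or a large unified-memory machine. That gap is the server-cost difference the reply points at.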

keysersoze@Surajdotdot7·6h

@bindureddy 80% on evals. Running 8k+ images/month through an agent pipeline, you learn fast that the bottom 20% is where all the failures live — consistency on edge cases, tool use reliability, following multi-step instructions. Benchmarks don't measure that. Will test it.

Bindu Reddy@bindureddy·9h

The big story everyone missed yesterday: Qwen 3.6 dropped with 3B active params, costs nothing to run, and delivers 80% of Opus 4.7's performance 🤯 Open source is making giant leaps

keysersoze@Surajdotdot7·6h

@TeksEdge Benchmarks are one signal. We run 8k+ tasks/month through our pipeline. The real test is task completion without retry loops at scale: 3x faster inference means nothing if the production error rate doubles. Running it this week to check whether the ts-bench result holds outside controlled evals.

David Hendrickson@TeksEdge·11h

🔥 Qwen3.6-35B-A3B is INSANELY strong 🔥 ✅ 100% ts-bench success rate (with opencode, vibe-local, GitHub Copilot, qwencode & Claude Code) ⚡ Matches Claude Sonnet 4.6 & Opus 4.6 task speed ⚡ 3x faster inference than Qwen3.5-27B → way shorter completion times 💻 Runs on consumer gear: Mac (32GB+ RAM) or RTX 3090/4090/5090

金のニワトリ@gosrum

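Qwen3.6-35B-A3B is way too strong!!! I ran ts-bench with it paired with opencode, vibe-local, GitHub Copilot, qwencode, and Claude Code, and it got a perfect score every time. On top of that, it completes tasks about as fast as Claude Sonnet 4.6 and Opus 4.6. Qwen3.5-27B was impressive too, but Qwen3.6-35B-A3B, like the Red Comet, runs inference 3x faster than the 27B. As the benchmark results show, the big deal is that time to task completion has been cut dramatically.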
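
The keysersoze reply in this exchange measures task completion without retry loops, which is a different number from eventual success. A minimal sketch, assuming each job log records how many attempts it took and whether it finally succeeded (the `JobLog` structure is hypothetical). Two models with the same eventual success rate can differ sharply on first-pass rate and average attempts, and average attempts is what drives cost per job.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class JobLog:
    attempts: int    # 1 means the job finished on its first try
    succeeded: bool  # final outcome after all attempts

def completion_stats(logs: List[JobLog]) -> Dict[str, float]:
    if not logs:
        raise ValueError("no jobs logged")
    n = len(logs)
    first_pass = sum(1 for j in logs if j.succeeded and j.attempts == 1)
    eventual = sum(1 for j in logs if j.succeeded)
    return {
        "first_pass_rate": first_pass / n,                  # done with no retry loop
        "eventual_success_rate": eventual / n,              # done after any number of retries
        "avg_attempts": sum(j.attempts for j in logs) / n,  # rough proxy for cost per job
    }

logs = [JobLog(1, True), JobLog(1, True), JobLog(3, True), JobLog(2, False)]
print(completion_stats(logs))
# {'first_pass_rate': 0.5, 'eventual_success_rate': 0.75, 'avg_attempts': 1.75}
```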

keysersoze@Surajdotdot7·6h

@garrytan @LouiseDSadeleer "Too chatty" kills pipeline costs before most people realize it. Every unnecessary word is a token — at 8k+ images/month through AI pipelines, verbose outputs compound fast. Good to see this taken seriously at the product level.

Garry Tan@garrytan·7h

GStack is now at v1.0 General Release. If you used it before and didn't like how chatty it was, we've fixed it. Thanks @LouiseDSadeleer for the incredible feedback. I love listening to real users because it is literally the way to make them happy with new product changes.

keysersoze@Surajdotdot7·6h

@aftermagics Curious how consistent the output is across different products. We run 8k+ images/month through a video pipeline — the hard part isn't making one good video, it's when product #4,000 still needs the same quality.

Aftermagics@aftermagics·14h

I created this video using Claude Design in just 15 prompts.

Claude@claudeai

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

keysersoze@Surajdotdot7·7h

@pierreeliottlal 12 prompts for a one-off is impressive. Getting it repeatable at scale is a different problem — our fashion video pipeline took months to lock down parameters that work consistently across 8k+ images/month without breaking every third batch.

Pierre-Eliott Lallemant@pierreeliottlal·17h

Made this video with claude in 12 prompts

Claude@claudeai

keysersoze@Surajdotdot7·7h

@snowmaker Chennai kid here. We don't apply to things hoping to get in — we apply knowing we'll build something regardless of whether we do.

Jared Friedman@snowmaker·9h

We had room for 2,000 people at Startup School India. More than 25,000 applied. No Startup School anywhere in the world has ever had this many people apply. Not SF, not NYC, not London. India blew them all away.

keysersoze@Surajdotdot7·7h

@liu8in "Solved" is doing heavy lifting here. We auto-generate 13s fashion videos at $0.63 each. They look clean at 10 videos. At 8,000/month you find the edge cases demo mode never shows. Curious what breaks first at scale.

Bin Liu@liu8in·10h

alright, verdict is in: Motion Design is solved. Made with HyperFrames + Claude Design btw. HyperFrames is open source; star it on GitHub and I'll send a tutorial on how I made this with 2 prompts.

Claude@claudeai
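
The figures in the reply to @liu8in imply a monthly bill worth spelling out. At $0.63 per 13-second video and 8,000 videos a month, generation alone costs 8,000 × $0.63 = $5,040. The sketch below repeats that arithmetic and adds one hypothetical knob, a retry rate, because every video that has to be regenerated is paid for again. The retry rates are illustrative, not measured. Costs are kept in integer cents to avoid float rounding in the per-unit price.

```python
def monthly_cost_usd(videos_per_month: int, cost_per_video_cents: int, retry_rate: float = 0.0) -> float:
    """Monthly generation cost, assuming each retried video is generated exactly one extra time.

    retry_rate is a hypothetical fraction of videos that need one regeneration.
    """
    generations = videos_per_month * (1 + retry_rate)
    return generations * cost_per_video_cents / 100

print(monthly_cost_usd(8_000, 63))        # 5040.0
print(monthly_cost_usd(8_000, 63, 0.10))  # about 5544 with a 10% retry rate
```

The same arithmetic is why the thread keeps returning to consistency. At this volume, a small retry rate becomes a measurable line item.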

keysersoze@Surajdotdot7·7h

@garrytan Subagent timeouts were silently killing jobs in our image pipeline before we added retry logic. No error, just gone. The fix was the same mental model — treat agent tasks like background jobs, not synchronous calls. Should be the default architecture, not a v0.11 addition.

Garry Tan@garrytan·7h

Now launching GBrain v0.11 with Minions. I got sick of OpenClaw's subagents timing out and not getting things done, so I built a queue/jobs system that uses GBrain's Postgres/PGLite based on BullMQ to give your OpenClaw/GBrain setup wings. Minions are 10X faster, more reliable
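
The fix described in the reply to @garrytan is to treat agent tasks like background jobs with retries, so that a timeout surfaces as an error instead of a silently vanished job. That idea can be sketched with the Python standard library alone. This is not GBrain, Minions, or BullMQ. It assumes hypothetical zero-argument task callables and hypothetical job IDs. Each attempt runs in a worker thread with a per-attempt timeout, failed attempts are retried with exponential backoff, and a final failure is recorded instead of dropped.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, Tuple

def run_with_retries(task: Callable[[], str], timeout_s: float,
                     max_attempts: int = 3, base_backoff_s: float = 0.1) -> Tuple[bool, str]:
    """Return (succeeded, result_or_error); failures are reported, never swallowed."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        # A fresh single-thread pool per attempt, so a hung attempt never blocks the retry.
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(task)
        try:
            return True, future.result(timeout=timeout_s)
        except cf.TimeoutError:
            last_error = f"attempt {attempt}: timed out after {timeout_s}s"
        except Exception as exc:  # the task itself raised
            last_error = f"attempt {attempt}: {exc!r}"
        finally:
            # Don't wait on a hung worker. Python cannot kill the thread, so a
            # timed-out attempt keeps running and its late result is ignored.
            pool.shutdown(wait=False)
        if attempt < max_attempts:
            time.sleep(base_backoff_s * 2 ** (attempt - 1))  # exponential backoff
    return False, last_error

def process_queue(jobs: Dict[str, Callable[[], str]], timeout_s: float) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run each job with retries; every job ends up in exactly one of done/failed."""
    done: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for job_id, task in jobs.items():
        ok, value = run_with_retries(task, timeout_s)
        (done if ok else failed)[job_id] = value
    return done, failed

calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("subagent crashed")
    return "clip ready"

def hung() -> str:
    time.sleep(0.3)  # longer than the timeout below
    return "too late"

done, failed = process_queue({"job-1": flaky, "job-2": hung}, timeout_s=0.05)
print(done)    # {'job-1': 'clip ready'}
print(failed)  # job-2 is reported with its timeout error instead of disappearing
```

A production version would persist the queue, for example in a database, so that jobs survive a process restart. This sketch only shows the control flow.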

keysersoze@Surajdotdot7·7h

@gkisokay The 0.04% aren't smarter. They just didn't quit when their first agent hallucinated in prod. Real filter: did you debug it at 2am or uninstall the app?

Graeme@gkisokay·12h

Your daily reminder that you are so early to AI.
- 84% have never meaningfully touched it
- 16% use a free chatbot occasionally
- 0.3% pay $20/month
- 0.04% use a coding scaffolding
- 0.01% are just like you
You're building orchestrated agents, running models at 2 am, buying hardware, and compounding your advantage every single day. Meanwhile, 99.9% of people are laughing at Mac mini buyers, OpenClaw users, and home GPU nerds. If you're part of the 0.01%, you are part of the collective building the infrastructure everyone else will depend on. The gap is accelerating. Lock in.

keysersoze@Surajdotdot7·8h

10M views in 24h. Full AI video. The "we're cooked" framing misses it — this is the floor, not the ceiling. If you're a solo builder, this is your unfair advantage. One person can now produce at studio scale. Adapt or get lapped.

shirish@shiri_shh

This video did 10M views in 24 hours. 100% made by AI. we're cooked 😭

keysersoze@Surajdotdot7·8h

@garrytan Ferrari breaks down every third deploy but when it works you ship things no Honda driver even imagined. The bus is fine until you need to go somewhere it doesn't.

Garry Tan@garrytan·8h

Using OpenClaw is basically like driving your own Ferrari (where you have to be your own mechanic) that's broken down all the time but gives you the time of your life, vs driving a reliable Honda (Hermes Agent) vs riding the bus (Claude / ChatGPT)
