keysersoze

748 posts

@Surajdotdot7

building things with AI | Claude Code | agents | automation | sharing what I learn, for people who want to build, not just watch

Chennai · Joined August 2019
808 Following · 104 Followers
keysersoze
keysersoze@Surajdotdot7·
@kimmonismus Running this math at micro scale already. Replaced what used to need a video shoot team with a $0.63/video pipeline. The payroll→infrastructure trade isn't a Meta story. It's every company running the numbers right now.
0
0
0
27
Chubby♨️
Chubby♨️@kimmonismus·
Meta layoffs investors had been bracing for are coming, with roughly 8,000 jobs cut starting May 20, about 10% of its 79,000-person workforce. Mainly to free up billions for AI infrastructure, shifting resources from payroll to data centers, chips, and advanced models as highlighted by Mark Zuckerberg.
10
6
79
5.5K
keysersoze
keysersoze@Surajdotdot7·
@viktoroddy Demo quality and "build 47 product pages that match our actual brand system" quality are different things. Curious where this breaks when client constraints hit it.
0
0
2
545
Viktor Oddy
Viktor Oddy@viktoroddy·
Claude Design is insane. ❤️‍🔥 Just recorded an 18-min tutorial on how to build animated, award-winning websites with Claude Design + Opus 4.7!
65
269
3.6K
189K
keysersoze
keysersoze@Surajdotdot7·
@viktoroddy 18 mins to build it. Weeks to get it working inside an actual brand system with 5 years of design decisions already baked in. The demo is never the hard part.
1
0
0
535
keysersoze
keysersoze@Surajdotdot7·
@DavidOndrej1 Why can't they expand and get new compute from Nvidia or AWS? Or is it more of a political problem?
3
0
5
1.1K
David Ondrej
David Ondrej@DavidOndrej1·
Dwarkesh was right. Anthropic is running out of compute.
28
9
418
26.7K
keysersoze
keysersoze@Surajdotdot7·
@ZypZapCommunity I’m merging the Creative Director and Video Planner into one agent to slash token usage and latency. Instead of chatting, one agent now sets the shot order (1st/last frames). This flows to JSON -> Prompt Director optimization -> Gen -> Editor for stitching. Much leaner.
1
0
0
17
ZypZap
ZypZap@ZypZapCommunity·
@Surajdotdot7 That agent stack sounds wild for turning product shots into clips. Does the creative director actually decide transitions or just shot order?
1
0
1
7
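The merged pipeline described in the reply to @ZypZapCommunity above (one planning agent sets the shot order, then JSON -> Prompt Director -> Gen -> Editor) can be sketched roughly as below. This is a minimal illustration, not the author's implementation: every function name, the JSON shape, the pairing of consecutive images as first/last frames, and the placeholder prompt text are assumptions, and no real model or video API is called.

```python
import json
from dataclasses import dataclass


@dataclass
class Shot:
    order: int
    first_frame: str  # image used as the clip's first frame
    last_frame: str   # image used as the clip's last frame


def plan_shots(images: list[str]) -> str:
    """Hypothetical merged Creative Director + Video Planner.

    Emits the shot order as JSON in a single step instead of two agents
    exchanging chat messages. Pairing consecutive images is an assumption.
    """
    shots = [
        {"order": i + 1, "first_frame": a, "last_frame": b}
        for i, (a, b) in enumerate(zip(images, images[1:]))
    ]
    return json.dumps({"shots": shots})


def optimize_prompts(plan_json: str) -> list[tuple[Shot, str]]:
    """Hypothetical Prompt Director: one generation prompt per planned shot."""
    plan = json.loads(plan_json)
    prompts = []
    for raw in plan["shots"]:
        shot = Shot(**raw)
        prompts.append((shot, f"Transition from {shot.first_frame} to {shot.last_frame}"))
    return prompts


def generate_clip(shot: Shot, prompt: str) -> str:
    """Stand-in for the video generation call; returns a made-up file name."""
    return f"clip_{shot.order:02d}.mp4"


def stitch(clips: list[str]) -> list[str]:
    """Stand-in for the Editor: returns the clips in playback order."""
    return sorted(clips)


def run_pipeline(images: list[str]) -> list[str]:
    plan_json = plan_shots(images)
    prompts = optimize_prompts(plan_json)
    clips = [generate_clip(shot, prompt) for shot, prompt in prompts]
    return stitch(clips)
```

Passing the plan between stages as JSON keeps each hand-off structured and small, which is one plausible way the token and latency savings mentioned in the reply could arise.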
keysersoze
keysersoze@Surajdotdot7·
I understand, but what I mean is this: I am building an agentic product that turns e-commerce catalogs (5 images) into clips and performs the video editing. It has 5 specialized agents: creative director, video planner, prompt director (Kling), video editor, and QA analyst. It runs perfectly on Sonnet or Opus, but when I tried Qwen 3.5, Gemma 4, and other local models that score well on benchmarks, they often failed to do what they should. I'm tired of benchmarks that don't hold up in production settings. I often see failures in agentic tasks, so I want to build a benchmark that tests real use cases, so we can see what works and what doesn't. The whole point of LLMs is to be useful to us, not to show percentage numbers that we don't use anyway. That's why I like a benchmark like the one from @bridgemindai, where he tests with real use cases. I'm going to build something similar and test it on simple to complex agentic products, so people can see where each model is usable and evaluate it.
POM@peterom

@Surajdotdot7 There are benchmarks included for agentic tool use

1
0
2
79
David Ondrej
David Ondrej@DavidOndrej1·
I'm spending ±$6,000 a month on the Anthropic API. You?
47
0
65
6.2K
keysersoze
keysersoze@Surajdotdot7·
@kimmonismus Production reality: evals said better, pipeline disagreed. Rolled back model versions twice at 8M Studio because benchmark improvements didn't translate to our actual workload. More tokens in adaptive thinking means nothing if the outputs regress on your specific task.
0
0
0
394
Chubby♨️
Chubby♨️@kimmonismus·
Opus 4.7 does seem to have improved, and its adaptive thinking now uses more tokens. However, compared to Opus 4.6, it still performs significantly worse.
44
12
391
19.7K
keysersoze
keysersoze@Surajdotdot7·
@KyleHessling1 If that delta holds, the inference cost story changes completely. We run pipelines processing thousands of jobs/month — model size directly maps to server cost. A 27B that punches up is worth more than a 235B that barely edges it.
0
0
1
175
Kyle Hessling
Kyle Hessling@KyleHessling1·
Am I mistaken that if the delta holds as seen between the Qwen 3.6 35b MOE and the Qwen 3.5 35b MOE, that the 3.6 dense 27B delta will unseat Kimi k2.5 at less than 3% of the model size? You remember when we were all considering buying 2 or 4 mac studios just to run REAP prunes in Q1 to run that model? We could soon have similar capability on a 3090. Exciting acceleration, to say the least!
16
10
168
16.2K
keysersoze
keysersoze@Surajdotdot7·
@bindureddy 80% on evals. Running 8k+ images/month through an agent pipeline, you learn fast that the bottom 20% is where all the failures live — consistency on edge cases, tool use reliability, following multi-step instructions. Benchmarks don't measure that. Will test it.
0
0
0
171
Bindu Reddy
Bindu Reddy@bindureddy·
The big story that everyone missed yesterday - Qwen 3.6 dropped with 3B active params, costs nothing to run, and delivers 80% of Opus 4.7’s performance 🤯 Open source is making giant leaps
72
60
715
38.7K
keysersoze
keysersoze@Surajdotdot7·
@TeksEdge Benchmark is one signal. We run 8k+ tasks/month through our pipeline. The real test is task completion without retry loops at scale. 3x faster inference means nothing if the production error rate doubles. Running it this week to check if the ts-bench holds outside controlled evals.
2
0
5
613
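The metric named in the reply to @TeksEdge above, task completion without retry loops, can be computed from pipeline logs along these lines. This is a minimal sketch under assumptions: the `TaskRun` record and its fields are hypothetical, and the sample data in the usage note is invented for illustration only, not taken from any real pipeline.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    task_id: str
    attempts: int    # total attempts made, including the first
    succeeded: bool  # whether the task eventually completed


def first_pass_rate(runs: list[TaskRun]) -> float:
    """Share of tasks that completed on the first attempt, with no retries."""
    if not runs:
        return 0.0
    first_pass = sum(1 for r in runs if r.succeeded and r.attempts == 1)
    return first_pass / len(runs)


def error_rate(runs: list[TaskRun]) -> float:
    """Share of tasks that never completed, even after retries."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if not r.succeeded) / len(runs)
```

For example, with four illustrative runs where two succeed on the first try, one succeeds on the third try, and one fails after two tries, `first_pass_rate` gives 0.5 and `error_rate` gives 0.25. Comparing these two numbers across candidate models on the same workload is one way to check whether a benchmark gain survives in production.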
keysersoze
keysersoze@Surajdotdot7·
@garrytan @LouiseDSadeleer "Too chatty" kills pipeline costs before most people realize it. Every unnecessary word is a token — at 8k+ images/month through AI pipelines, verbose outputs compound fast. Good to see this taken seriously at the product level.
0
0
0
62
Garry Tan
Garry Tan@garrytan·
GStack is now at v1.0 General Release. If you used it before and didn't like how chatty it was, we've fixed it. Thanks @LouiseDSadeleer for the incredible feedback. I love listening to real users because it is literally the way to making them happy with new product changes.
14
4
68
9.4K
keysersoze
keysersoze@Surajdotdot7·
@aftermagics Curious how consistent the output is across different products. We run 8k+ images/month through a video pipeline — the hard part isn't making one good video, it's when product #4,000 still needs the same quality.
1
0
3
4K
keysersoze
keysersoze@Surajdotdot7·
@pierreeliottlal 12 prompts for a one-off is impressive. Getting it repeatable at scale is a different problem — our fashion video pipeline took months to lock down parameters that work consistently across 8k+ images/month without breaking every third batch.
0
0
0
63
keysersoze
keysersoze@Surajdotdot7·
@snowmaker Chennai kid here. We don't apply to things hoping to get in — we apply knowing we'll build something regardless of whether we do.
0
0
0
26
Jared Friedman
Jared Friedman@snowmaker·
We had room for 2,000 people at Startup School India. More than 25,000 applied. No Startup School anywhere in the world has ever had this many people apply. Not SF, not NYC, not London. India blew them all away.
182
323
4.2K
195.7K
keysersoze
keysersoze@Surajdotdot7·
@liu8in "Solved" is doing heavy lifting here. We auto-generate 13s fashion videos at $0.63 each — looks clean at 10. At 8,000/month you find the edge cases demo mode never shows. Curious what breaks first at scale.
0
0
0
1.5K
Bin Liu
Bin Liu@liu8in·
alright - verdict is in - Motion Design is solved. made with HyperFrames + Claude Design. btw - HyperFrames is open source, star it on github and I'll send tutorial on how i made this with 2 prompts.
Claude@claudeai

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

52
106
2.7K
211.8K
keysersoze
keysersoze@Surajdotdot7·
@garrytan Subagent timeouts were silently killing jobs in our image pipeline before we added retry logic. No error, just gone. The fix was the same mental model — treat agent tasks like background jobs, not synchronous calls. Should be the default architecture, not a v0.11 addition.
0
0
0
140
Garry Tan
Garry Tan@garrytan·
Now launching GBrain v0.11 with Minions. I got sick of OpenClaw's subagents timing out and not getting things done. So I built a queue/jobs system that uses GBrain's Postgres/PGLite based on BullMQ to give your OpenClaw/GBrain setup wings. Minions are 10X faster, more reliable
37
36
392
46.1K
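The mental model in the reply above (treat agent tasks like background jobs with retry logic, so a timeout is never silent) can be sketched with the standard library alone. This is an illustration under assumptions, not the author's pipeline and not GBrain's Minions: `run_with_retry`, `JobFailed`, and `flaky_subagent` are hypothetical names, and a real setup would more likely use a persistent job queue.

```python
import asyncio
from collections.abc import Awaitable, Callable


class JobFailed(Exception):
    """Raised when a job exhausts its attempts, so the failure is never silent."""


async def run_with_retry(
    job: Callable[[int], Awaitable[str]],
    *,
    timeout_s: float,
    max_attempts: int = 3,
) -> str:
    """Run a job with a per-attempt timeout and retry it on timeout."""
    last_error: Exception | None = None
    for attempt in range(1, max_attempts + 1):
        try:
            # wait_for cancels the attempt once the timeout elapses
            return await asyncio.wait_for(job(attempt), timeout=timeout_s)
        except asyncio.TimeoutError as exc:
            last_error = exc  # remember the cause, then try again
    raise JobFailed(f"gave up after {max_attempts} attempts") from last_error


async def flaky_subagent(attempt: int) -> str:
    """Hypothetical subagent that hangs on its first attempt only."""
    if attempt == 1:
        await asyncio.sleep(1.0)
    return f"done on attempt {attempt}"
```

Calling `asyncio.run(run_with_retry(flaky_subagent, timeout_s=0.05))` times out once and then returns the second attempt's result. A job that always hangs raises `JobFailed` instead of disappearing, which is the property the reply says was missing before retry logic was added.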
keysersoze
keysersoze@Surajdotdot7·
@gkisokay The 0.04% aren't smarter. They just didn't quit when their first agent hallucinated in prod. Real filter: did you debug it at 2am or uninstall the app?
1
0
1
413
Graeme
Graeme@gkisokay·
Your daily reminder that you are so early to AI.
- 84% have never meaningfully touched it
- 16% use a free chatbot occasionally
- 0.3% pay $20/month
- 0.04% use a coding scaffolding
- 0.01% are just like you
You're building orchestrated agents, running models at 2 am, buying hardware, and compounding your advantage every single day. Meanwhile, 99.9% of people are laughing at Mac mini buyers, OpenClaw users, and home GPU nerds. If you're part of the 0.01%, you are part of the collective building the infrastructure everyone else will depend on. The gap is accelerating. Lock in.
123
112
1.1K
63.1K
keysersoze
keysersoze@Surajdotdot7·
@garrytan Ferrari breaks down every third deploy but when it works you ship things no Honda driver even imagined. The bus is fine until you need to go somewhere it doesn't.
0
0
2
718
Garry Tan
Garry Tan@garrytan·
Using OpenClaw is basically like driving your own Ferrari (that you have to be a mechanic for yourself) and it's broken down all the time, but gives you the time of your life vs driving a reliable Honda (Hermes Agent) vs riding the bus (Claude / ChatGPT)
272
121
1.7K
89.4K