Rohit Kothari retweeted

building an internal tool right now
it automates tedious youtube tasks
1. yt-dlp download (1-3 min)
2. ffmpeg audio + frames (~1 min)
3. whisper-1 transcription, chunked if >24MB (3-5 min for a ~1-hour video)
4. PIL frame scoring (~30s)
5. gpt-4o vision picks 4 face refs from top 20 frames
6. claude writes angle, 3 titles, description, 2 thumbnail prompts from transcript
7. gpt-image-2 high quality stage1 (x2 backgrounds)
8. nano-banana-2 stage2 (x2 face composites)
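a rough sketch of how the chunking in step 3 could work. it assumes the step 2 audio has a roughly constant bitrate and that ffmpeg is on PATH. `plan_chunks` and `ffmpeg_cut_cmd` are hypothetical helpers, not the tool's actual code. the 24MB ceiling comes from the post (whisper-1's upload limit is 25 MB) and is read here as 24,000,000 bytes to leave headroom.

```python
import math

# Assumption: "24MB" from the post, read as decimal bytes so each chunk
# stays safely below whisper-1's 25 MB per-file upload limit.
MAX_CHUNK_BYTES = 24_000_000


def plan_chunks(file_bytes: int, duration_s: float,
                max_bytes: int = MAX_CHUNK_BYTES) -> list[tuple[float, float]]:
    """Return (start, length) windows in seconds.

    Splits the audio into equal-duration windows so that, assuming a roughly
    constant bitrate, each window's size (about file_bytes / n) is <= max_bytes.
    Files already under the limit are returned as a single window.
    """
    if file_bytes <= max_bytes:
        return [(0.0, duration_s)]
    n = math.ceil(file_bytes / max_bytes)
    step = duration_s / n
    return [(i * step, step) for i in range(n)]


def ffmpeg_cut_cmd(src: str, start: float, length: float, dst: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that stream-copies one window.

    -ss seeks to the start, -t limits the duration, and -c copy avoids
    re-encoding, so cuts land on the nearest audio frame boundary.
    """
    return ["ffmpeg", "-y", "-ss", f"{start:.2f}", "-t", f"{length:.2f}",
            "-i", src, "-c", "copy", dst]


if __name__ == "__main__":
    # Hypothetical example: a ~60 MB audio track from a one-hour video.
    for i, (start, length) in enumerate(plan_chunks(60_000_000, 3600.0)):
        print(" ".join(ffmpeg_cut_cmd("audio.mp3", start, length, f"audio_{i:03d}.mp3")))
```

each resulting chunk would then go to whisper-1 separately and the transcripts get concatenated in order. equal-time cuts are the simplest choice but can split a word at a boundary; splitting on silence or overlapping chunks by a few seconds are common alternatives.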
DM IF YOU WANT EARLY ACCESS!
($199 lifetime -- bring your own API keys -- after launch, pricing will start at $99/mo)
buy.stripe.com/7sYbJ1erddJz9a…