Karim C

16.7K posts

@BrandGrowthOS

CMO | Focused on AI in production | Building AI Agents | Co-Founder https://t.co/ct1wWcUHLt | Sharing my journey of creating Agents

Global · Joined June 2024
443 Following · 1.5K Followers

Pinned Tweet
Karim C@BrandGrowthOS·
built an ai agent that turns a tiktok link into a fully produced music video. one telegram message in, finished video out.

the pipeline:
yt-dlp → download source video
whisper → transcribe on home gpu (free)
elevenlabs v3 → generate character voice in hindi
nano-banana 2 → create consistent character image
fal ai aurora → lip-sync video ($0.14/sec)
zapcap → animated tiktok captions
ayrshare → publish to tiktok

all run by a claude code agent on telegram. no workflows, no dashboards. i send a link, approve the audio and a preview image, then the agent handles the rest.

aurora lip-sync is the big cost: roughly $2-4 per video depending on length. that's why the agent sends me a preview before running it, so i'm not burning money on bad takes.

this was literally my first attempt. if any of you have built something similar or see ways to improve it i'm all ears. I also created a music video using Suno to create the song and the same tech stack above. all for the fun of learning and hacking the tiktok algo for paid media. tiktok.com/@vihanmusic095/video/7622210911322819856
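A minimal sketch of the step ordering and the approval gate described above, with every tool call stubbed out (the real agent shells out to yt-dlp, whisper, and the rest; the gate placement before the only paid step is the point):

```python
# Sketch of the pipeline above; tool calls are stubs, not real integrations.
# aurora costs $0.14/sec, so the $2-4 per video in the post implies
# roughly 15-30 seconds of lip-synced footage.

PIPELINE = [
    ("yt-dlp", "download source video"),
    ("whisper", "transcribe on home gpu"),
    ("elevenlabs v3", "generate character voice"),
    ("nano-banana 2", "create consistent character image"),
    ("fal ai aurora", "lip-sync video"),
    ("zapcap", "animated tiktok captions"),
    ("ayrshare", "publish to tiktok"),
]

# human approval (audio + preview image) happens right before the paid step
GATED = {"fal ai aurora"}

def run(approve) -> list[str]:
    """Run steps in order; stop before a gated step the human rejects."""
    done = []
    for tool, _desc in PIPELINE:
        if tool in GATED and not approve(tool):
            break  # bad take caught before money is spent
        done.append(tool)
    return done

# dry run with auto-approval executes every step in order
print(run(lambda tool: True))
```

Rejecting at the gate leaves only the free steps executed, which is exactly why the preview goes out before aurora runs.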
3 replies · 0 reposts · 2 likes · 373 views
Karim C@BrandGrowthOS·
@ID_AA_Carmack this is why i always cringe when people say 'just throw more data at it' without thinking about precision. bfloat16 is great for training but those gaps become real problems when you're trying to debug why an agent is making weird decisions at edge cases
0 replies · 0 reposts · 3 likes · 454 views
John Carmack@ID_AA_Carmack·
Making a scatter plot of 400_000 data points, some of the plots had odd gaps in coverage. It took me a little while to realize that it was only when the data was farther from the origin -- it was the raw bfloat16 precision. Everything looks great from -1 to 1, but as you go past 2 and 4, the coverage gaps get larger. My intuition didn't have it being quite so "discretely countable" at those modest numeric values. Float32 for comparison.
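The gap sizes are easy to reproduce without a plot: bfloat16 keeps float32's 8-bit exponent but only 7 mantissa bits, so the spacing between adjacent representable values doubles at every power of two. A stdlib-only sketch, truncating float32 bit patterns (which is what bfloat16 storage does):

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float32 to bfloat16 by keeping only its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def bf16_gap(x: float) -> float:
    """Distance from x (positive, normal) to the next bfloat16 above it."""
    bits = struct.unpack(">I", struct.pack(">f", to_bf16(x)))[0]
    nxt = struct.unpack(">f", struct.pack(">I", bits + 0x0001_0000))[0]
    return nxt - to_bf16(x)

# gaps double at each power of two: 1/128, 1/64, 1/32, 1/16, ...
for v in (1.0, 2.0, 4.0, 8.0):
    print(f"near {v}: gap {bf16_gap(v)}")
```

Near 1 the grid spacing is 1/128, but past 4 it is already 1/32 — exactly the "discretely countable" banding that shows up as coverage gaps in a scatter plot.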
20 replies · 18 reposts · 263 likes · 18.8K views
Karim C@BrandGrowthOS·
@mattshumer_ what makes it insanely good? i've built a few personal agents and the failure modes are always interesting
0 replies · 0 reposts · 0 likes · 98 views
Matt Shumer@mattshumer_·
If you want to try a new personal agent that is just... insanely good, comment + DM me.
124 replies · 4 reposts · 75 likes · 14K views
Karim C@BrandGrowthOS·
@kimmonismus hitting $60-80/month just from my agent workflows. o1 thinking rate limits were killing me so yeah this actually makes sense
0 replies · 0 reposts · 0 likes · 76 views
Karim C@BrandGrowthOS·
@hwchase17 this is where the real magic happens. custom middleware is what makes agents actually useful vs just demo-ware. what kind of middleware are you seeing the most demand for?
0 replies · 0 reposts · 0 likes · 7 views
Harrison Chase@hwchase17·
In case you haven’t seen: more and more community middleware is popping up as a way to customize your agents and deepagents. Got some middleware you want to contribute? Reach out to Sydney!
Sydney Runkle@sydneyrunkle

we just added a langchain-task-steering middleware to our community registry, s/o @EHallvaxhiu for the contribution! keep your agents on track w/ ordered task pipelines. this is great for structured data pipelines and compliance heavy workflows. github.com/edvinhallvaxhi…
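The "ordered task pipelines" idea is easy to picture with a framework-agnostic sketch. This is not LangChain's actual middleware API — the wrapper and the DONE convention below are hypothetical, purely to show the shape of task steering:

```python
from typing import Callable

# an "agent" here is just prompt -> response; a real one would be a model call
Agent = Callable[[str], str]

def task_steering(agent: Agent, steps: list[str]) -> Agent:
    """Hypothetical middleware: inject the current step into every prompt
    and only advance when the agent explicitly marks the step done."""
    state = {"i": 0}

    def wrapped(prompt: str) -> str:
        step = steps[state["i"]]
        out = agent(f"[current step: {step}] {prompt}")
        if "DONE" in out and state["i"] < len(steps) - 1:
            state["i"] += 1  # advance only on explicit completion
        return out

    return wrapped

# toy agent that always completes whatever step it is given
steered = task_steering(lambda p: f"DONE: {p}", ["extract", "validate", "load"])
print(steered("run"))  # first call is steered to the "extract" step
```

Keeping the step pointer in middleware rather than in the prompt history is what makes this useful for compliance-heavy flows: the agent cannot skip ahead even if it hallucinates that it already did.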

1 reply · 1 repost · 11 likes · 1.6K views
Karim C@BrandGrowthOS·
@melvynx this is smart. i run single agents and catch dumb mistakes hours later. how many review agents do you usually spin up? and do they specialize or just general code review?
0 replies · 0 reposts · 0 likes · 99 views
Melvyn • Builder@melvynx·
this is basically the biggest Claude Code hack: every time I run it, it runs multiple review agents to verify the generated code. all agents are independent, not linked to the main one, and can think and run on their own and find most mistakes
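The pattern Melvyn describes — several reviewers that share no state with the main agent — can be sketched generically. The reviewers here are plain functions standing in for independent model calls (names and checks are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# each reviewer sees only the diff, never the main agent's conversation
def style_review(diff: str) -> list[str]:
    return ["line over 100 chars"] if any(len(l) > 100 for l in diff.splitlines()) else []

def safety_review(diff: str) -> list[str]:
    return ["eval() used"] if "eval(" in diff else []

def review(diff: str, reviewers=(style_review, safety_review)) -> list[str]:
    """Run every reviewer in parallel and merge their findings."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda r: r(diff), reviewers)
    return [finding for findings in results for finding in findings]

print(review("x = eval(user_input)"))  # → ['eval() used']
```

Because the reviewers never see each other's output, one reviewer's blind spot doesn't contaminate the rest — the same reason independent review agents catch mistakes the main agent rationalizes away.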
28 replies · 4 reposts · 155 likes · 25.2K views
Karim C@BrandGrowthOS·
@sama been waiting for this. the 4o limits hit me constantly when building agents - $100/mo is nothing if i can actually ship without hitting walls
0 replies · 0 reposts · 0 likes · 26 views
Sam Altman@sama·
It is very nice to see Codex getting so much love. We are launching a $100 ChatGPT Pro tier by very popular demand.
818 replies · 174 reposts · 4.6K likes · 282K views
Karim C@BrandGrowthOS·
@karpathy the gap between 'i tried chatgpt once' and 'i run agents daily' is massive. people who last touched ai 6 months ago are still debating if it can write code while i'm debugging why my automation agent keeps booking the wrong calendar slots
0 replies · 0 reposts · 1 like · 40 views
Andrej Karpathy@karpathy·
Judging by my tl there is a growing gap in understanding of AI capability.

The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

395 replies · 752 reposts · 6.7K likes · 630.2K views
Karim C@BrandGrowthOS·
@hwchase17 what's the switching cost like? curious if the middleware carries over or if it's a full rewrite
1 reply · 0 reposts · 0 likes · 29 views
Harrison Chase@hwchase17·
great q! deepagents has more "batteries included", while langchain v1 is a very minimalistic agent harness. if you are doing more complex workflows (eg claude code for X) -> deepagents. if you want something simple -> langchain. both are customizable with middleware
Julia Passynkova@JPassynkova

@hwchase17 Trying to understand when to use LangChain v1 and when to use DeepAgent. They seem to share many common blocks: middleware, planning, memory summarization, and even skills can be manually added to LangChain. So what should I use for a conversation chat app?

2 replies · 3 reposts · 15 likes · 3K views
Karim C@BrandGrowthOS·
@hwchase17 curious what patterns you're seeing work better? running agents locally and the sandbox approach feels limiting but not clear what the next evolution looks like
0 replies · 0 reposts · 0 likes · 41 views
Harrison Chase@hwchase17·
most of the agents we see being built do this. the main cases where we see people using "agent in a sandbox" are when they are using claude agent sdk (which is poorly designed for "harness outside sandbox")
Nasir Shadravan@n4Cr

@hwchase17 Do you also move the harness outside the sandbox, like managed agents?

4 replies · 4 reposts · 27 likes · 2.8K views
Karim C@BrandGrowthOS·
@om_patel5 wait claude forgot plan mode exists? that's wild. i've had claude code rewrite my n8n workflows like 3 times because it 'forgot' what n8n could do. paying premium for memory loss is rough
0 replies · 0 reposts · 0 likes · 45 views
Om Patel@om_patel5·
CLAUDE CODE OPUS 4.6 JUST FORGOT ITS OWN FEATURES.

people are paying 20x more and getting worse performance. one user showed Claude couldn’t even recognize its own native Plan Mode. the same project had to be rewritten twice after reviews called the code a “dumpster fire”, and now it doesn’t even know how to use its own tools? this isn’t edge case behavior. this is basic competency breaking.

> can’t activate plan mode
> worse code quality
> inconsistent reasoning
> higher cost

long-time users are flipping:
> “i brought this to my org”
> “enterprise subscription”
> “been advocating for years”
and now they’re looking at Hugging Face alternatives. that’s how bad it got.

Opus 4.6 was supposed to be:
> 1M token context
> max effort reasoning
> best-in-class coding
instead people are getting:
> confusion
> regressions
> unreliable outputs

this feels like AI shrinkflation. same branding, higher price, less capability. when your “best model” can’t use its own features, something is seriously off.
86 replies · 74 reposts · 618 likes · 38.1K views
Karim C@BrandGrowthOS·
@hwchase17 @daytonaio @modal @RunloopDev nice. memory persistence is everything for real agent work. curious about the MCP adoption - are people actually connecting these in production workflows or still mostly experimental?
0 replies · 0 reposts · 1 like · 77 views
Karim C@BrandGrowthOS·
@OpenAI the fact that you have to keep resetting rate limits tells the whole story about codex demand. are you seeing most pro users actually hitting those limits or is this more about perception?
1 reply · 0 reposts · 0 likes · 350 views
Karim C@BrandGrowthOS·
@DavidOndrej1 nah they'll just get absorbed by microsoft or google. the tech is too valuable to let die, even if the business model is broken
0 replies · 0 reposts · 0 likes · 71 views
David Ondrej@DavidOndrej1·
Am I the only one who thinks OpenAI will go bankrupt in the next few years?
99 replies · 0 reposts · 152 likes · 10.5K views
Karim C@BrandGrowthOS·
@hwchase17 need this. everyone talks about agent frameworks but nobody shares the actual architecture decisions that break at scale
0 replies · 0 reposts · 2 likes · 100 views
Harrison Chase@hwchase17·
🎙️Introducing Max Agency

Max Agency is a new podcast where we go deep on how the best agents are actually being built: architecture decisions, tradeoffs, evals, and everything in between. Each episode, I sit down with engineering leaders who are doing this work in production.

Our first episode features Izzy Miller (@isidoremiller), AI Engineer at Hex (@_hex_tech). Hex has been shipping data agents since before most teams were even thinking about them, starting with single-cell text-to-SQL and graduating to a full Notebook agent that can work autonomously for 20 minutes on a complex analysis. Izzy has a lot of perspective on what it actually takes to get agents working well in production, and what breaks along the way.

A few takeaways from our conversation:
- Keep your eval sets small enough to hold in your head: Izzy runs 30-50 handcrafted "traps" with multiple repetitions, rather than hundreds of variants. If you can't explain why your agent fails each one, your eval set is too big
- Day zero performance is almost irrelevant: The more interesting question is how the agent compounds. Izzy is building a 90-day simulation where the warehouse evolves and the agent has to accumulate understanding
- You can catch agent errors without seeing the raw outputs: By running an LLM-as-a-judge over production usage and clustering the results, you can surface places where something likely went wrong, without needing to read individual conversations

Watch the full episode on:
- Youtube: youtube.com/watch?v=Xyh1Eq…
- Apple Podcasts: podcasts.apple.com/us/podcast/how…
- Spotify: open.spotify.com/episode/1BJlg3…
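The "small set of handcrafted traps, repeated" idea can be sketched in a few lines. This is illustrative only — the trap contents and pass checks are hypothetical, not Hex's actual harness:

```python
from collections import Counter

# a handful of handcrafted "traps" you can hold in your head;
# each pairs a prompt with a predicate the agent's answer must satisfy
TRAPS = [
    ("sum 2+2", lambda out: "4" in out),
    ("name a prime > 10", lambda out: any(p in out for p in ("11", "13", "17"))),
]

def run_traps(agent, traps=TRAPS, reps=5):
    """Run each trap several times; flaky failures matter as much as hard ones."""
    passes = Counter()
    for prompt, passed in traps:
        for _ in range(reps):
            passes[prompt] += passed(agent(prompt))
    return {prompt: f"{k}/{reps}" for prompt, k in passes.items()}

# toy "agent" that only gets the arithmetic trap right
print(run_traps(lambda p: "4" if "2+2" in p else "9"))
```

With only a few dozen traps, a failing entry is something you can actually diagnose, which is the whole argument against eval sets of hundreds of variants.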
8 replies · 31 reposts · 142 likes · 16.4K views
Karim C@BrandGrowthOS·
@gregisenberg passwords are already dead for anyone building with ai. my agents authenticate with api keys and i use passkeys everywhere else. the transition is happening faster than people realize
0 replies · 0 reposts · 0 likes · 15 views
GREG ISENBERG@gregisenberg·
Our kids will think we were crazy for how we handled passwords "so you reused the same password everywhere, answered what's your mother's maiden name to prove it was you, got phished by an email that looked exactly like your fav social app" Yeah, I know my son, it was wild
54 replies · 3 reposts · 166 likes · 12.1K views
Addy Crezee | /function1
@BrandGrowthOS @thehypedotnews fully agree. the only problem with telegram is that it's not that widely used in many western countries though. but the fact that there are almost 1bln people there already and the UI is phenomenal makes a difference for ai agents for sure
1 reply · 0 reposts · 1 like · 34 views
Karim C@BrandGrowthOS·
@MatthewBerman @cursor_ai Which processes / workflows are you delegating to open source? And which open source models have you found to be the best so far?
0 replies · 0 reposts · 0 likes · 121 views
Matthew Berman@MatthewBerman·
Things I’m doing while flying at 34,000 feet: * Fine-tuning on my DGX Station (SSH) * Running 8 concurrent @cursor_ai cloud agents * Replying to emails * Posting on X
20 replies · 3 reposts · 105 likes · 37.4K views
Karim C@BrandGrowthOS·
@emollick this hits different when you're building AI agents. you want them to be helpful but you reward them for completing tasks. then wonder why they hallucinate to get the checkmark
0 replies · 0 reposts · 1 like · 192 views