Karim C

16.7K posts

@BrandGrowthOS

CMO | Focused on AI in production | Building AI Agents | Co-Founder https://t.co/ct1wWcUHLt | Sharing my journey of creating Agents

Global · Joined June 2024
443 Following · 1.5K Followers

Pinned Tweet
Karim C@BrandGrowthOS·
built an ai agent that turns a tiktok link into a fully produced music video. one telegram message in, finished video out.

the pipeline:
yt-dlp → download source video
whisper → transcribe on home gpu (free)
elevenlabs v3 → generate character voice in hindi
nano-banana 2 → create consistent character image
fal ai aurora → lip-sync video ($0.14/sec)
zapcap → animated tiktok captions
ayrshare → publish to tiktok

all run by a claude code agent on telegram. no workflows, no dashboards. i send a link, approve the audio and a preview image, then the agent handles the rest.

aurora lip-sync is the big cost: roughly $2-4 per video depending on length. that's why the agent sends me a preview before running it, so i'm not burning money on bad takes.

this was literally my first attempt. if any of you have built something similar or see ways to improve it i'm all ears. I also created a music video using Suno to create the song and the same tech stack above. all for the fun of learning and hacking the tiktok algo for paid media. tiktok.com/@vihanmusic095/video/7622210911322819856
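A minimal sketch of the step ordering and the approval gate described above, with every tool call stubbed out (the real agent shells out to yt-dlp, whisper, and the rest; the gate placement before the only paid step is the point):

```python
# Sketch of the pipeline above; tool calls are stubs, not real integrations.
# aurora costs $0.14/sec, so the $2-4 per video in the post implies
# roughly 15-30 seconds of lip-synced footage.

PIPELINE = [
    ("yt-dlp", "download source video"),
    ("whisper", "transcribe on home gpu"),
    ("elevenlabs v3", "generate character voice"),
    ("nano-banana 2", "create consistent character image"),
    ("fal ai aurora", "lip-sync video"),
    ("zapcap", "animated tiktok captions"),
    ("ayrshare", "publish to tiktok"),
]

# human approval (audio + preview image) happens right before the paid step
GATED = {"fal ai aurora"}

def run(approve) -> list[str]:
    """Run steps in order; stop before a gated step the human rejects."""
    done = []
    for tool, _desc in PIPELINE:
        if tool in GATED and not approve(tool):
            break  # bad take caught before money is spent
        done.append(tool)
    return done

# dry run with auto-approval executes every step in order
print(run(lambda tool: True))
```

Rejecting at the gate leaves only the free steps executed, which is exactly why the preview goes out before aurora runs.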
3 replies · 0 reposts · 2 likes · 373 views
Karim C@BrandGrowthOS·
@ID_AA_Carmack this is why i always cringe when people say 'just throw more data at it' without thinking about precision. bfloat16 is great for training but those gaps become real problems when you're trying to debug why an agent is making weird decisions at edge cases
0 replies · 0 reposts · 3 likes · 454 views
John Carmack@ID_AA_Carmack·
Making a scatter plot of 400_000 data points, some of the plots had odd gaps in coverage. It took me a little while to realize that it was only when the data was farther from the origin -- it was the raw bfloat16 precision. Everything looks great from -1 to 1, but as you go past 2 and 4, the coverage gaps get larger. My intuition didn't have it being quite so "discretely countable" at those modest numeric values. Float32 for comparison.
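The gap sizes are easy to reproduce without a plot: bfloat16 keeps float32's 8-bit exponent but only 7 mantissa bits, so the spacing between adjacent representable values doubles at every power of two. A stdlib-only sketch, truncating float32 bit patterns (which is what bfloat16 storage does):

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float32 to bfloat16 by keeping only its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def bf16_gap(x: float) -> float:
    """Distance from x (positive, normal) to the next bfloat16 above it."""
    bits = struct.unpack(">I", struct.pack(">f", to_bf16(x)))[0]
    nxt = struct.unpack(">f", struct.pack(">I", bits + 0x0001_0000))[0]
    return nxt - to_bf16(x)

# gaps double at each power of two: 1/128, 1/64, 1/32, 1/16, ...
for v in (1.0, 2.0, 4.0, 8.0):
    print(f"near {v}: gap {bf16_gap(v)}")
```

Near 1 the grid spacing is 1/128, but past 4 it is already 1/32 — exactly the "discretely countable" banding that shows up as coverage gaps in a scatter plot.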
20 replies · 18 reposts · 263 likes · 18.8K views
Karim C@BrandGrowthOS·
@mattshumer_ what makes it insanely good? i've built a few personal agents and the failure modes are always interesting
0 replies · 0 reposts · 0 likes · 98 views
Matt Shumer@mattshumer_·
If you want to try a new personal agent that is just... insanely good, comment + DM me.
124 replies · 4 reposts · 75 likes · 14K views
Karim C@BrandGrowthOS·
@kimmonismus hitting $60-80/month just from my agent workflows. o1 thinking rate limits were killing me so yeah this actually makes sense
0 replies · 0 reposts · 0 likes · 76 views
Karim C@BrandGrowthOS·
@hwchase17 this is where the real magic happens. custom middleware is what makes agents actually useful vs just demo-ware. what kind of middleware are you seeing the most demand for?
0 replies · 0 reposts · 0 likes · 7 views
Harrison Chase@hwchase17·
In case you haven’t seen: more and more community middleware is popping up as a way to customize your agents and deepagents. Got some middleware you want to contribute? Reach out to Sydney!
Sydney Runkle@sydneyrunkle

we just added a langchain-task-steering middleware to our community registry, s/o @EHallvaxhiu for the contribution! keep your agents on track w/ ordered task pipelines. this is great for structured data pipelines and compliance heavy workflows. github.com/edvinhallvaxhi…
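The "ordered task pipelines" idea is easy to picture with a framework-agnostic sketch. This is not LangChain's actual middleware API — the wrapper and the DONE convention below are hypothetical, purely to show the shape of task steering:

```python
from typing import Callable

# an "agent" here is just prompt -> response; a real one would be a model call
Agent = Callable[[str], str]

def task_steering(agent: Agent, steps: list[str]) -> Agent:
    """Hypothetical middleware: inject the current step into every prompt
    and only advance when the agent explicitly marks the step done."""
    state = {"i": 0}

    def wrapped(prompt: str) -> str:
        step = steps[state["i"]]
        out = agent(f"[current step: {step}] {prompt}")
        if "DONE" in out and state["i"] < len(steps) - 1:
            state["i"] += 1  # advance only on explicit completion
        return out

    return wrapped

# toy agent that always completes whatever step it is given
steered = task_steering(lambda p: f"DONE: {p}", ["extract", "validate", "load"])
print(steered("run"))  # first call is steered to the "extract" step
```

Keeping the step pointer in middleware rather than in the prompt history is what makes this useful for compliance-heavy flows: the agent cannot skip ahead even if it hallucinates that it already did.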

1 reply · 1 repost · 11 likes · 1.6K views
Karim C@BrandGrowthOS·
@melvynx this is smart. i run single agents and catch dumb mistakes hours later. how many review agents do you usually spin up? and do they specialize or just general code review?
0 replies · 0 reposts · 0 likes · 99 views
Melvyn • Builder@melvynx·
this is basically the biggest Claude Code hack: every time I run it, it runs multiple review agents to verify the generated code. all agents are independent, not linked to the main one, and can think and run on their own and find most mistakes
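The pattern Melvyn describes — several reviewers that share no state with the main agent — can be sketched generically. The reviewers here are plain functions standing in for independent model calls (names and checks are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# each reviewer sees only the diff, never the main agent's conversation
def style_review(diff: str) -> list[str]:
    return ["line over 100 chars"] if any(len(l) > 100 for l in diff.splitlines()) else []

def safety_review(diff: str) -> list[str]:
    return ["eval() used"] if "eval(" in diff else []

def review(diff: str, reviewers=(style_review, safety_review)) -> list[str]:
    """Run every reviewer in parallel and merge their findings."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda r: r(diff), reviewers)
    return [finding for findings in results for finding in findings]

print(review("x = eval(user_input)"))  # → ['eval() used']
```

Because the reviewers never see each other's output, one reviewer's blind spot doesn't contaminate the rest — the same reason independent review agents catch mistakes the main agent rationalizes away.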
28 replies · 4 reposts · 155 likes · 25.2K views
Karim C@BrandGrowthOS·
@sama been waiting for this. the 4o limits hit me constantly when building agents - $100/mo is nothing if i can actually ship without hitting walls
0 replies · 0 reposts · 0 likes · 26 views
Sam Altman@sama·
It is very nice to see Codex getting so much love. We are launching a $100 ChatGPT Pro tier by very popular demand.
818 replies · 174 reposts · 4.6K likes · 282K views
Karim C@BrandGrowthOS·
@karpathy the gap between 'i tried chatgpt once' and 'i run agents daily' is massive. people who last touched ai 6 months ago are still debating if it can write code while i'm debugging why my automation agent keeps booking the wrong calendar slots
0 replies · 0 reposts · 1 like · 40 views
Andrej Karpathy@karpathy·
Judging by my tl there is a growing gap in understanding of AI capability.

The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

395 replies · 752 reposts · 6.7K likes · 630.2K views
Karim C@BrandGrowthOS·
@hwchase17 what's the switching cost like? curious if the middleware carries over or if it's a full rewrite
1 reply · 0 reposts · 0 likes · 29 views
Harrison Chase@hwchase17·
great q! deepagents has more "batteries included", while langchain v1 is a very minimalistic agent harness. if you are doing more complex workflows (eg claude code for X) -> deepagents. if you want something simple -> langchain. both are customizable with middleware
Julia Passynkova@JPassynkova

@hwchase17 Trying to understand when to use LangChain v1 and when to use DeepAgent. They seem to share many common blocks: middleware, planning, memory summarization, and even skills can be manually added to LangChain. So what should I use for a conversation chat app?

2 replies · 3 reposts · 15 likes · 3K views
Karim C@BrandGrowthOS·
@hwchase17 curious what patterns you're seeing work better? running agents locally and the sandbox approach feels limiting but not clear what the next evolution looks like
0 replies · 0 reposts · 0 likes · 41 views
Harrison Chase@hwchase17·
most of the agents we see being built do this. the main cases where we see people using "agent in a sandbox" are when they are using claude agent sdk (which is poorly designed for "harness outside sandbox")
Nasir Shadravan@n4Cr

@hwchase17 Do you also move the harness outside the sandbox, like managed agents?

4 replies · 4 reposts · 27 likes · 2.8K views
Karim C@BrandGrowthOS·
@om_patel5 wait claude forgot plan mode exists? that's wild. i've had claude code rewrite my n8n workflows like 3 times because it 'forgot' what n8n could do. paying premium for memory loss is rough
0 replies · 0 reposts · 0 likes · 45 views
Om Patel@om_patel5·
CLAUDE CODE OPUS 4.6 JUST FORGOT ITS OWN FEATURES.

people are paying 20x more and getting worse performance. one user showed Claude couldn’t even recognize its own native Plan Mode. the same project had to be rewritten twice after reviews called the code a “dumpster fire”, and now it doesn’t even know how to use its own tools? this isn’t edge case behavior. this is basic competency breaking.

> can’t activate plan mode
> worse code quality
> inconsistent reasoning
> higher cost

long-time users are flipping:
> “i brought this to my org”
> “enterprise subscription”
> “been advocating for years”
and now they’re looking at Hugging Face alternatives. that’s how bad it got.

Opus 4.6 was supposed to be:
> 1M token context
> max effort reasoning
> best-in-class coding
instead people are getting:
> confusion
> regressions
> unreliable outputs

this feels like AI shrinkflation. same branding, higher price, less capability. when your “best model” can’t use its own features, something is seriously off.
86 replies · 74 reposts · 618 likes · 38.1K views
Karim C@BrandGrowthOS·
@hwchase17 @daytonaio @modal @RunloopDev nice. memory persistence is everything for real agent work. curious about the MCP adoption - are people actually connecting these in production workflows or still mostly experimental?
0 replies · 0 reposts · 1 like · 77 views
Karim C@BrandGrowthOS·
@OpenAI the fact that you have to keep resetting rate limits tells the whole story about codex demand. are you seeing most pro users actually hitting those limits or is this more about perception?
1 reply · 0 reposts · 0 likes · 350 views
Karim C@BrandGrowthOS·
@DavidOndrej1 nah they'll just get absorbed by microsoft or google. the tech is too valuable to let die, even if the business model is broken
0 replies · 0 reposts · 0 likes · 71 views
David Ondrej@DavidOndrej1·
Am I the only one who thinks OpenAI will go bankrupt in the next few years?
99 replies · 0 reposts · 152 likes · 10.5K views
Karim C@BrandGrowthOS·
@hwchase17 need this. everyone talks about agent frameworks but nobody shares the actual architecture decisions that break at scale
0 replies · 0 reposts · 2 likes · 100 views
Harrison Chase@hwchase17·
🎙️Introducing Max Agency

Max Agency is a new podcast where we go deep on how the best agents are actually being built: architecture decisions, tradeoffs, evals, and everything in between. Each episode, I sit down with engineering leaders who are doing this work in production.

Our first episode features Izzy Miller (@isidoremiller), AI Engineer at Hex (@_hex_tech). Hex has been shipping data agents since before most teams were even thinking about them, starting with single-cell text-to-SQL and graduating to a full Notebook agent that can work autonomously for 20 minutes on a complex analysis. Izzy has a lot of perspective on what it actually takes to get agents working well in production, and what breaks along the way.

A few takeaways from our conversation:
- Keep your eval sets small enough to hold in your head: Izzy runs 30-50 handcrafted "traps" with multiple repetitions, rather than hundreds of variants. If you can't explain why your agent fails each one, your eval set is too big
- Day zero performance is almost irrelevant: The more interesting question is how the agent compounds. Izzy is building a 90-day simulation where the warehouse evolves and the agent has to accumulate understanding
- You can catch agent errors without seeing the raw outputs: By running an LLM-as-a-judge over production usage and clustering the results, you can surface places where something likely went wrong, without needing to read individual conversations

Watch the full episode on:
- Youtube: youtube.com/watch?v=Xyh1Eq…
- Apple Podcasts: podcasts.apple.com/us/podcast/how…
- Spotify: open.spotify.com/episode/1BJlg3…
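The "small set of handcrafted traps, repeated" idea can be sketched in a few lines. This is illustrative only — the trap contents and pass checks are hypothetical, not Hex's actual harness:

```python
from collections import Counter

# a handful of handcrafted "traps" you can hold in your head;
# each pairs a prompt with a predicate the agent's answer must satisfy
TRAPS = [
    ("sum 2+2", lambda out: "4" in out),
    ("name a prime > 10", lambda out: any(p in out for p in ("11", "13", "17"))),
]

def run_traps(agent, traps=TRAPS, reps=5):
    """Run each trap several times; flaky failures matter as much as hard ones."""
    passes = Counter()
    for prompt, passed in traps:
        for _ in range(reps):
            passes[prompt] += passed(agent(prompt))
    return {prompt: f"{k}/{reps}" for prompt, k in passes.items()}

# toy "agent" that only gets the arithmetic trap right
print(run_traps(lambda p: "4" if "2+2" in p else "9"))
```

With only a few dozen traps, a failing entry is something you can actually diagnose, which is the whole argument against eval sets of hundreds of variants.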
8 replies · 31 reposts · 142 likes · 16.4K views
Karim C@BrandGrowthOS·
@gregisenberg passwords are already dead for anyone building with ai. my agents authenticate with api keys and i use passkeys everywhere else. the transition is happening faster than people realize
0 replies · 0 reposts · 0 likes · 15 views
GREG ISENBERG@gregisenberg·
Our kids will think we were crazy for how we handled passwords "so you reused the same password everywhere, answered what's your mother's maiden name to prove it was you, got phished by an email that looked exactly like your fav social app" Yeah, I know my son, it was wild
54 replies · 3 reposts · 166 likes · 12.1K views
Addy Crezee | /function1
@BrandGrowthOS @thehypedotnews fully agree. the only problem with telegram is that it's not that widely used in many western countries though. but the fact that there are almost 1bln people there already and the UI is phenomenal makes a difference for ai agents for sure
1 reply · 0 reposts · 1 like · 34 views
Karim C@BrandGrowthOS·
@MatthewBerman @cursor_ai Which processes / workflows are you delegating to open source? And which open source models have you found to be the best so far?
0 replies · 0 reposts · 0 likes · 121 views
Matthew Berman@MatthewBerman·
Things I’m doing while flying at 34,000 feet: * Fine-tuning on my DGX Station (SSH) * Running 8 concurrent @cursor_ai cloud agents * Replying to emails * Posting on X
20 replies · 3 reposts · 105 likes · 37.4K views
Karim C@BrandGrowthOS·
@emollick this hits different when you're building AI agents. you want them to be helpful but you reward them for completing tasks. then wonder why they hallucinate to get the checkmark
0 replies · 0 reposts · 1 like · 192 views