Teddy Kim

4.6K posts

@teddykim

Human in the loop. Come along for the ride.

Joined February 2009

496 Following · 717 Followers

Teddy Kim@teddykim·1d

This is cool

Ben Sigman@bensig

My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark - beating every product in the space, free or paid.
It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace - a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.
Here is what that gets you:
→ Your AI knows who you are before you type a single word - family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type - not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens - 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them
The benchmarks:
100% recall on LongMemEval - first perfect score ever recorded. 500/500 questions. Every question type at 100%.
92.9% on ConvoMem - more than 2x Mem0's score.
100% on LoCoMo - every multi-hop reasoning category, including temporal inference which stumps most systems.
No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave.
MIT License. 100% Open Source.
github.com/milla-jovovich…

Mount Juliet, TN 🇺🇸

Teddy Kim reposted

Tuki@TukiFromKL·2d

🚨 do you understand what the economy just did to an entire generation's love life.. over two-thirds of young adults aren't dating regularly.. and if trends continue one in three will never get married.. this isn't a dating problem.. it's a math problem.. in 1980 the average home cost 3x the median salary.. today it's 8x.. rent takes 40-50% of a young person's income before they've bought groceries.. the average american under 30 has $21,000 in student debt and $5,000 in credit card debt.. and you're wondering why they're not going on dinner dates.. their grandparents bought a house on a single factory salary at 22.. got married at 23.. had kids by 25.. not because they were more romantic.. but because the economy let them.. a first date in 1985 cost you a movie and a burger.. a first date in 2026 costs you a $60 uber.. $80 dinner.. and the emotional bandwidth of someone working two jobs who hasn't had a day off in three weeks.. companies killed pensions.. killed job security.. killed the 40-hour work week.. created a generation that can't plan past next month's rent.. and now the washington post is writing about them like they chose loneliness.. nobody stopped wanting love.. they just can't afford the life that comes with it.. and the generation that pulled the ladder up is writing articles asking why nobody's climbing.

unusual_whales@unusual_whales

"Over 2/3 of young adults" are not dating regularly, per WaPo. "If trends continue, 1-in-3 young adults will not get married."

388

2.6K

156.6K

Teddy Kim reposted

Devon Eriksen@Devon_Eriksen_·6d

While the left was telling you that you were the world's pariah...
While the left was telling you America still hated you for Pearl Harbor...
While the left was telling you we saw Fukushima as the site of an ecological crime instead of a heroic battle against a natural disaster...
They were telling a younger version of me that Japan looked on us with contempt. They were telling us that you thought we were racially inferior. That our facial features were coarse and ugly. That you thought we stank ("batta kusai"). That you thought of us as uncultured barbarians with no refinement or manners. That you saw the atomic bombs as acts of calculated cruelty, rather than desperate measures in a horrific war.
They showed us photographs of businesses, somewhere in Japan, with signs outside, saying "no foreigners". They posted them in anime forums online, with captions like "So you love Japan? Well, Japan doesn't love you."
And I couldn't help but believe some of that. A little bit. It was the only information I had.
I once wrote a very angry post defending the decision to use atomic weapons instead of invading, a decision I still believe was sound, but I wrote about it with great anger, believing as I was told, that the Japanese felt no responsibility for the war, that they blamed everything on us.
I lost a grandfather in WW2, a grandfather I never met. And my father didn't know how to be a father, because he grew up without one. And I was angry.
And then someone connected twitter to an AI that could turn Japanese into English. And I learned what Japanese people actually wanted to say to me:
"Hey, look at this thing I made!"
"We love cats! Do you love cats, too?"
"Arggh! That meat looks so delicious! This is torture!"
"I love bluegrass music!"
"It's so cool that you guys can own guns!"
"Are you mad about Fukushima? Wait, you're not? What!?"
Sigh.
The left tells so many lies, because they need us to feel bad. They need us to feel alone, and afraid, and angry. They need us to feel worthless, so that we will surrender and allow ourselves to be dependent on the worldwide tyranny they wish to create.
But to do this, they have to tell different lies to different people. And when we learn to talk to each other, and we compare notes, we can discover the inconsistencies, and spot the lies.
The Christian religion has a legend, a story of the people of Bab'el, who built a tower reaching towards the heavens, so that they might climb. For some reason that isn't really clear, god is supposed to have found this to be a threat, and responded by cursing the world to speak different tongues, so that they could not work together.
Now, AI technology has undone the curse of Bab'el, and rocket technology is building a tower of fire to lift us to the stars.
There are too many people in the world who are actually bad, who actually hate us, and don't want to see that progress. So perhaps it's time we stopped nursing old grudges against people who we actually can live in peace with.
I'm not the only person who suffered a loss. None of us is.
Perhaps it's time, instead, to defy the people who lie to us. Perhaps it's time to work together again. And rebuild that tower.

うさこ🐰🌸@ankoromochuu

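The left often tells us patriots that "Japan is the world's pariah." Even though it made me sad, I kept believing that "there are people overseas who like Japan, too." Now that I can talk with people around the world on X, I have been able to feel that a great many people hold a warm affection for Japan. Thank you!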

124

1.1K

5.9K

227.4K

Teddy Kim@teddykim·6d

If you are running an agent harness, you should audit your agents' activity at least every week. I have a script that turns findings into GitHub issues that I get nag emails about. Reviewing agent activity logs will:
1. make you never trust agents again
2. make you realize (quickly) that you need multiple layers of agent controls
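The script mentioned above is not shown, so the following is only a minimal sketch of the idea of turning audit findings into issue payloads. The `Finding` fields, the `agent-audit` label, and the de-duplication rule are all assumptions made for illustration. The payload keys (`title`, `body`, `labels`) are the ones GitHub's REST "create an issue" endpoint accepts; the sketch builds the payloads only and makes no network call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One audit finding. These fields are hypothetical, not the author's schema."""
    agent: str
    rule: str
    detail: str
    log_ref: str


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Keep the first finding per (agent, rule) so a weekly audit does not spam issues."""
    seen: set[tuple[str, str]] = set()
    unique: list[Finding] = []
    for finding in findings:
        key = (finding.agent, finding.rule)
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


def to_issue_payload(finding: Finding) -> dict:
    """Build a JSON-serializable body for a GitHub create-issue request."""
    return {
        "title": f"[agent-audit] {finding.agent}: {finding.rule}",
        "body": f"{finding.detail}\n\nLog reference: {finding.log_ref}",
        "labels": ["agent-audit"],  # hypothetical label name
    }
```

Each payload could then be sent with whatever HTTP client or CLI the harness already uses; GitHub's own notification emails would supply the nagging.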

Teddy Kim@teddykim·Apr 1

When I was a kid I wanted to be a luthier, because I loved the idea that another person might use something I built to make their own magic. I didn't become a luthier, but over the weekend, I got to watch this group of incredible health tech founders use something I built to make their own magic. I'm so honored to be a part of their journeys.

Teddy Kim@teddykim·27 Mar

If your agent harness does something out-of-band, you should interrogate (challenge) it. Here is a real conversation I just had with my harness. This agent batched three bug tickets into one pull-request which doesn't follow the project convention of one-ticket-one-pr.
❯ out of curiosity, how did you make the decision to batch three tickets into one pr? what is the batching heuristic?
⏺ The heuristic I used:
1. Same file: all three modify video-player.tsx and nothing else. Three separate PRs would each touch the same component, creating merge conflicts between them.
2. No independent value: fixing the layout shift (#92) without fixing the buffering indicator (#93) still leaves a bad UX. The aspect-ratio container would show a properly-sized black rectangle instead of a weirdly-sized one, marginally better but still confusing. They're more valuable together.
3. Same review context: a reviewer needs to understand the video player's rendering lifecycle to evaluate any of these. Loading one PR's context is enough to review all three.
4. Reviewer suggested it: the PR #90 review explicitly said "Consider bundling the three UX fixes (#92, #93, #94) into a single follow-up PR since they all modify video-player.tsx."
When I would NOT batch:
- Different files for different reasons: #91 (middleware) + #94 (video player) touch unrelated code, even though both are "fixes." Separate PRs.
- One is risky, one is safe: if #94 had a tricky edge case but #92 was trivial, I'd ship #92 alone to get it deployed while iterating on #94.
- Different reviewers needed: if one fix touched security-sensitive code and another touched UI, separate PRs let the right person review each.
- Different tickets with different acceptance criteria that need independent verification: if the user needs to test each independently on a preview before merging.
The general rule: batch when the changes are co-located, co-dependent, and co-reviewable. Split when they're independently valuable, independently risky, or independently verifiable.
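The general rule the agent states can be written down as an explicit check, which makes it easier to audit than a post-hoc explanation. This is only a sketch under assumptions: the `Change` fields and the `middleware.ts` file name are hypothetical, and the boolean flags would have to come from a human or from ticket metadata. It is not how the harness actually decided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A proposed fix for one ticket. All fields are illustrative assumptions."""
    ticket: int
    files: frozenset[str]
    reviewer_group: str = "ui"
    valuable_alone: bool = False
    risky: bool = False
    needs_independent_verification: bool = False


def should_batch(changes: list[Change]) -> bool:
    """Batch only when changes are co-located, co-dependent, and co-reviewable."""
    if len(changes) < 2:
        return False
    co_located = all(c.files == changes[0].files for c in changes)
    co_dependent = not any(c.valuable_alone for c in changes)
    co_reviewable = len({c.reviewer_group for c in changes}) == 1
    # Split signals: a mix of risky and safe changes, or separate acceptance checks.
    mixed_risk = len({c.risky for c in changes}) > 1
    separate_checks = any(c.needs_independent_verification for c in changes)
    return (co_located and co_dependent and co_reviewable
            and not mixed_risk and not separate_checks)


player = frozenset({"video-player.tsx"})
ux_fixes = [Change(92, player), Change(93, player), Change(94, player)]
unrelated = [Change(91, frozenset({"middleware.ts"})), Change(94, player)]
```

Under these assumptions `should_batch(ux_fixes)` is true and `should_batch(unrelated)` is false, matching the agent's own examples. A project with a strict one-ticket-one-pr convention could simply have the function return `False` always.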

Teddy Kim@teddykim·21 Mar

Semantic memory expresses "what we know". In an agent memory architecture, semantic memory is where you store facts, patterns, and relationships. This helps agent performance in many ways. For example, the agent shouldn't have to re-discover and validate the right pattern via trial-and-error. Everything that happens on a project is "implicit knowledge" that should be available in semantic memory for future sessions. Anyway, I have worker agents that create session logs (episodic memory). The agent compares episodic memory to semantic memory and proposes new additions to the project knowledge graph (stored in memory mcp). If you are responsible for agent performance, try to do a little memory architecture on your agent system. There's only so far you can get with prompting.
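The compare-and-propose step described above can be sketched as a set difference over facts. This is an illustration under assumptions, not the author's implementation: facts are modeled as hypothetical (subject, relation, object) triples, the knowledge graph is a plain in-memory set instead of a memory MCP server, and the example facts are invented.

```python
Fact = tuple[str, str, str]  # (subject, relation, object); an assumed representation


def propose_additions(session_facts: list[Fact], knowledge_graph: set[Fact]) -> list[Fact]:
    """Return facts from a session log (episodic memory) that the knowledge
    graph (semantic memory) does not yet contain, in first-seen order."""
    proposals: list[Fact] = []
    seen: set[Fact] = set()
    for fact in session_facts:
        if fact not in knowledge_graph and fact not in seen:
            seen.add(fact)
            proposals.append(fact)
    return proposals


# Invented example data, for illustration only.
graph: set[Fact] = {("project", "convention", "one-ticket-one-pr")}
session_log: list[Fact] = [
    ("project", "convention", "one-ticket-one-pr"),
    ("video-player.tsx", "uses", "aspect-ratio container"),
    ("video-player.tsx", "uses", "aspect-ratio container"),
]
proposed = propose_additions(session_log, graph)
```

Keeping the output as proposals, not direct writes, leaves room for a review step before anything lands in the project knowledge graph.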

Teddy Kim@teddykim·18 Mar

@rohanpaul_ai But you don't get bonuses for passing coding tests..

Rohan Paul@rohanpaul_ai·18 Mar

Anthropic's own study proves Vibe-Coding and AI coding assistants harm skill building. "AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average" Developers learning 1 new Python library scored 17% lower on tests when using AI. Delegating code generation to AI stops you from actually understanding the software. Using AI did not make the programmers statistically faster at completing tasks. Participants wasted time writing prompts instead of actually coding. Scores crashed below 40% when developers let AI write everything. Developers who only asked AI for simple concepts scored above 65%. Managers should not pressure engineers to use AI for endless productivity. Forcing top speed means workers lose the ability to debug systems later. ---- Paper Link – arxiv.org/abs/2601.20245 Paper Title: "How AI Impacts Skill Formation"

115

189

1.2K

102.7K

Teddy Kim@teddykim·17 Mar

Your firm might not be ready for fully autonomous coding agents. That doesn't mean you can't get a huge lift from AI. What would happen if you used AI to normalize task complexity so that every ticket in your ready-column can be picked up by a junior dev? You should be doing that anyway, but hardly anybody does, because it's hard. Really hard. For me to groom a backlog responsibly is at least a day and a half of work, and I would be fried for a day or two after. AI is really good at stuff like that. Just using it to sling code is kinda missing the big picture.

Teddy Kim@teddykim·13 Mar

@Kpaxs "Code wins arguments."

Minneapolis, MN 🇺🇸

139

Kpaxs@Kpaxs·13 Mar

Here's a controversial take: most of the authority that exists in any organization was never formally granted to anyone. It was assumed, exercised, and then retroactively legitimized by the fact that it worked.

Kpaxs@Kpaxs

I call it the "Refrigerator Principle" Most organizational dysfunction exists because everyone assumes someone else has the authority to fix it, and the fastest path forward is often just pretending you have that authority and dealing with forgiveness rather than permission.

102

607

6.9K

413.1K

Teddy Kim@teddykim·13 Mar

Intent engineering is an emergent discipline in AI. Strong grounding in DDD is very helpful for intent engineering because bounded contexts provide natural decision boundaries and business nuance around agent behavior.

Teddy Kim@teddykim·11 Mar

I watched Seasons 1 and 2 of Alice in Borderland. Wow, great television.

Teddy Kim@teddykim·10 Mar

@loneyai @kimmonismus Bro. 🤣

Minneapolis, MN 🇺🇸

Ailon@loneyai·10 Mar

@kimmonismus They tried so hard and got so far, in the end it doesn’t even matter.

319

Chubby♨️@kimmonismus·10 Mar

They try so hard, and yet Microsoft is somehow not profiting from AI except through Azure.

Satya Nadella@satyanadella

Announcing Copilot Cowork, a new way to complete tasks and get work done in M365. When you hand off a task to Cowork, it turns your request into a plan and executes it across your apps and files, grounded in your work data and operating within M365’s security and governance boundaries.

229

24.3K

Teddy Kim@teddykim·9 Mar

If you do agentic coding, you're implicitly doing manual eval. What are your metrics? How do you capture them? What do you do with the data? That's the difference between chaos and a durable system.

Teddy Kim reposted

Yuchen Jin@Yuchenj_UW·7 Mar

I saw a guy coding today. Terminal 1: Claude Code Terminal 2: Codex He typed the same prompt into both. Then stared at the screen for 60 minutes. Like a psychopath. Opened Cursor. Read the 10k lines AI generated. Like a Costco receipt checker pretending to verify every item.

152

4.1K

397.8K

Teddy Kim reposted

Prepared Remarks@P_Remarks·8 Mar

I don’t want to watch the McDonald’s CEO eat a burger. I want to watch Satya use Copilot

148

662

11.1K

460.4K

Teddy Kim@teddykim·8 Mar

Understanding SOLID is different than knowing how to apply it. This is a pretty big differentiator between senior and junior devs, and probably a key reason senior devs get so much more out of LLMs. If you want to be good at this, alas, there are no shortcuts.

Teddy Kim@teddykim·7 Mar

If you are curious what "responsible human review" looks like, I shipped 954 PRs on a side project before I turned on auto-approve for Claude Code prompts. Nearly 1000 reps to get to auto-approve. And I still eyeball every PR. Not saying I'm doing it right. But that's how I'm doing it.

Teddy Kim reposted

signüll@signulll·5 Mar

my contrarian view is that the “models are commodities, just swap” take is a terminally naive view held almost exclusively by ppl on here who’ve never actually shipped llm based products at any depth. these ppl are usually only pontificators. my personal take is that the moat does exist & it is the accumulated fit… which is shit like prompt libraries, edge case handling, institutional intuition about failure modes, & embedded trust calibration. & switching costs compound nonlinearly w/ integration depth. surface level api consumer is maybe trivial. but for deeply embedded stuff like targeting logic, comms pipelines, classified workflows etc the switching cost is basically a full re engineering cycle disguised as a vendor decision. the dow/anthropic situation specifically is interesting cuz military orgs prolly have the most extreme version of this problem. they can’t just yolo test on prod, & every behavioral edge case has to be revalidated in ways that are slow, expensive & require specialized expertise that doesn’t necessarily transfer across model families. i am well aware of this because our startup has ability to switch to any model but it’s a pain in the ass cuz everything either breaks or becomes weird af.

The Washington Post@washingtonpost

To strike 1,000 targets in 24 hours in Iran, the U.S. military leveraged the most advanced AI it’s ever used in warfare. Anthropic’s Claude partnered with the military’s Maven Smart System, suggesting targets and issuing precise location coordinates. wapo.st/40adi1q

309

58.9K

Teddy Kim@teddykim·5 Mar

@unclebobmartin My favorite agent grooms my backlog for me using CD3 and batches tickets into the "ready column". I am absurdly fond of this agent.

Mount Juliet, TN 🇺🇸

Uncle Bob Martin@unclebobmartin·4 Mar

I have one agent generate bug reports, so that another agent can fix them. This is really cool.

177

14.5K
