Mateusz Kelner

483 posts

@MKelner

Building AGI @ https://t.co/IvP9rlX6pK Previously bootstrapped a consumer company to {redacted} revenue run rate

Joined September 2012
1.6K Following 255 Followers
atlas @creatine_cycle
really heartbreaking to see openai take compute away from sora. terrible news for those who goon to fictional characters like spongebob and judy hopps from zootopia
English
4
0
26
729
Viv @Vtrivedy10
if you’re using deepagents in prod like the Moda team, would love to hear from you on how we can help and share your story! i know the langchain community has been cookin up some really great products on deepagents, please reach out :)
LangChain @LangChain

Congrats @anvisha and the @trymoda team on the launch. Moda built a design platform that turns non-designers into design pros. Under the hood: Deep Agents powering the design agents, with LangSmith providing the observability layer. Lots of smart context engineering in this one. How they built it: langchain-blog.ghost.io/ghost/#/editor…

English
1
1
7
1.9K
dex @dexhorthy
@southpolesteve I feel like they used to right, like og cursor did this a lot
English
2
0
2
917
Steve Faulkner @southpolesteve
It's wild to me that grep/ripgrep is state of the art locally for agents. The harnesses should ship semantic local search and indexing
English
14
3
50
10.5K
Mateusz Kelner @MKelner
It's both. Websites have been fighting against automations and scraping for a long time now. CUA being bad doesn't help. The only usable thing for browser automation right now IMO is @Stagehanddev or @browser_use paired with a browser provider like @browserbase ideally with proxies turned on
English
0
0
3
463
sarah guo @saranormous
watching claude try to use the browser...are websites being adversarial to computer use on purpose? or is CUA still that bad
English
137
9
401
110.1K
Erik Meijer @headinthebox
@ujjwalscript ... Start building deterministic guardrails where AI is the engine, but the engineer holds the steering wheel ... You use math to bash people's naive assumption, but then you wave your hands wildly to make your own point.
English
2
2
19
2.2K
Ujjwal Chadha @ujjwalscript
Your AI Agent is mathematically guaranteed to FAIL. This is the dirty secret the industry is hiding in 2026.

Everyone on your timeline is currently bragging about their "Multi-Agent Swarms." Founders are acting like chaining five AI agents together is going to replace their entire engineering team overnight.

Here is the reality check: It’s a mathematical illusion. Let’s look at the actual numbers. Say you have a state-of-the-art AI agent with an incredible 85% accuracy rate per action. In a vacuum, that sounds amazing. But an "autonomous" workflow isn't one action. It’s a chain.

Read the ticket ➡️ Query the DB ➡️ Write the code ➡️ Run the test ➡️ Commit.

Let's do the math on a 10-step process: $0.85^{10} \approx 0.197$

Your "revolutionary" autonomous system has a 19.7% success rate. And the real-world data proves it. Recent studies out of CMU this year show that the top frontier models are failing at over 70% of real-world, multi-step office tasks.

We are officially in the era of "Agent Washing." Startups are rebranding complex, buggy software as "autonomous agents" to look cool, but they are ignoring the scariest part: AI fails silently. When traditional code breaks, it crashes and throws a stack trace. When an AI agent breaks, it doesn't crash. It just confidently hallucinates a fake database entry, sidesteps a broken API by faking the response, and keeps running, corrupting your data for weeks before you notice.

If your "automated" system requires a senior engineer to spend three hours digging through prompt logs to figure out why the bot made a "creative decision," you didn't save any time. You just invented a highly expensive, unpredictable form of technical debt.

Stop trying to build fully autonomous swarms to replace human judgment. Start building deterministic guardrails where AI is the engine, but the engineer holds the steering wheel
English
155
65
456
36.9K
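A minimal sketch of the arithmetic in the post above, for anyone who wants to plug in their own numbers. It assumes every step succeeds independently and with the same probability, which the post takes for granted and which real agent workflows may not satisfy. The function names are illustrative, not from any library.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps are independent and share one accuracy (a simplification).
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


def required_per_step_accuracy(target_success: float, steps: int) -> float:
    """Per-step accuracy needed to hit a target end-to-end success rate."""
    if not 0.0 < target_success <= 1.0:
        raise ValueError("target_success must be in (0, 1]")
    if steps <= 0:
        raise ValueError("steps must be positive")
    # Invert p ** steps = target_success by taking the steps-th root.
    return target_success ** (1.0 / steps)


if __name__ == "__main__":
    # The post's example: 85% per action over a 10-step chain.
    print(f"{chain_success_probability(0.85, 10):.3f}")  # 0.197
    # Per-step accuracy needed for a 90% end-to-end rate over 10 steps.
    print(f"{required_per_step_accuracy(0.90, 10):.4f}")
```

The second function runs the same calculation in reverse. Under the same independence assumption, it shows how much per-step accuracy a chain of a given length would need to reach a target end-to-end rate.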
Mateusz Kelner @MKelner
@JoshLu I mean if your product doesn't grow organically but you have a profitable paid acquisition channel you still have something of value and are generating money.
English
0
0
0
45
Josh Lu @JoshLu
If a company is spending money and expects that amount to be less than LTV or eLTV (2nd order referrals, seeding a network) then it’s paid marketing. Net, you are spending cash to grow. That’s why this whole thing is silly. Ofc great products have natural virality but it would be insane for anyone not to apply dollars as leverage on top

Dollars as leverage, like with anything else, is good when the underlying (product, deal, asset, etc) is good and bad in the inverse
English
8
1
41
2.4K
Mateusz Kelner @MKelner
@linderps Went to brunch at Balboa today, did not find anyone but got a great french toast for $20. Great success
English
0
0
0
12
Linda Chen @linderps
sexy things to do in sf this weekend:
- friday: salsa / bachata class at space550, stay for the social afterwards
- saturday: farmers market at the ferry building (go before 11am)
- afternoon coffee at stable cafe in their patio, reflect on ur life or smthg
- sunday brunch at balboa (unironically good food)
hope everyone finds each other this weekend 🫶
Linda Chen @linderps

weekend reminder: go do sexy activities to meet sexy people. when i used to dance bachata, i remember constantly thinking… how am i surrounded by beautiful, sexy, feminine women and somehow there's not enough men??? try something new. go where the sexy people go.

San Francisco, CA 🇺🇸 English
17
3
245
48.7K
Jeff Weinstein @jeff_weinstein
🚧 looking for 3 developers who like to try new tools and give (critical) feedback—this weekend... we have a new cmd line tool for those building new apps. if you're willing to write up your thoughts or send a video feedback walking through it, dm or email jweinstein at stripe.
English
16
4
65
14.7K
Mateusz Kelner @MKelner
@RhysSullivan Codex 5.3-xhigh was pretty good at it, haven't tested 5.4 on this task. Opus was missing stuff for me
English
0
0
2
261
Rhys @RhysSullivan
are any of the models actually good at doing large refactors? i have to spend so much time fighting with them to not take shortcuts and actually make large changes to code
English
104
3
158
24.6K
Minh Do @minhsmind
@MKelner @esha_hq I wish I met you last night. I was having a hard time finding people who wanted to engage talking about the film.
English
1
0
1
42
Esha @esha_hq
@MKelner do you prefer it to the martian or is it hard to compare
English
2
0
1
185
Mateusz Kelner @MKelner
@gauravisnotme Any findings you can share on how to solve this/make it better? I am having the same issues and the only helpful thing I found is having the agent ask you questions until it has no more questions
English
0
0
0
77
Gaurav @gauravisnotme
Every day I relate more and more with what Andrej says - we are the bottleneck in the agent loop. From being amazed by the capabilities of what I have been able to achieve with the current agent harnesses, I have started becoming more frustrated with myself for not being descriptive enough, not being intelligent enough that my agent workflow requires an input from me every 30 minutes or so.
sarah guo @saranormous

Caught up with @karpathy for a new @NoPriorsPod: on the phase shift in engineering, AI psychosis, claws, AutoResearch, the opportunity for a SETI-at-Home like movement in AI, the model landscape, and second order effects
02:55 - What Capability Limits Remain?
06:15 - What Mastery of Coding Agents Looks Like
11:16 - Second Order Effects of Coding Agents
15:51 - Why AutoResearch
22:45 - Relevant Skills in the AI Era
28:25 - Model Speciation
32:30 - Collaboration Surfaces for Humans and AI
37:28 - Analysis of Jobs Market Data
48:25 - Open vs. Closed Source Models
53:51 - Autonomous Robotics and Atoms
1:00:59 - MicroGPT and Agentic Education
1:05:40 - End Thoughts

English
11
0
51
5.7K
Mateusz Kelner @MKelner
@yrechtman Desirability/possibility matrix hits the bullseye. The possibility coverage will only grow but desirability will stay pretty much the same as long as we don't build our entire culture around AI
English
1
0
2
196
Kyle Mistele 🏴‍☠️
forcing claude to read the "you might not need an effect" post, and a little agent I wrote that runs react-doctor on your PR and roasts you based on react code-quality regressions against main (it diffs them) are the two best things I have done for our react code quality

it is actually improving our code quality over time instead of degrading it
English
5
0
14
1.3K