misha khalman

723 posts

@khalman_m

all retweets are my own and not of my employer ✌️

San Francisco · Joined January 2013
471 Following · 312 Followers
misha khalman retweeted
Adam Wolff @dmwlff
I believe this new model in Claude Code is a glimpse of the future we're hurtling towards, maybe as soon as the first half of next year: software engineering is done. Soon, we won't bother to check generated code, for the same reasons we don't check compiler output.
359
209
2K
2.1M
Dmitry Pyanov @dimapyanov
Pov happens only in sf: you’re walking in a residential area and find a giant pile of ~50 books dumped in the middle of the street next to a tree... I picked 5 and walked home. This happened last summer 2024
Dmitry Pyanov tweet media
1
1
3
236
misha khalman retweeted
Xiaoyi Zhang @xiaoyiz_uw
[Just hired an amazing research engineer; one more to hire!] Join Anthropic to transform Claude into the best Virtual Collaborator. You'll tackle fascinating challenges — teaching Claude many things from navigating complex internal knowledge bases to building sophisticated financial models! We are looking for Staff+ (and exceptional Senior MLE) research engineers who are excited to push the boundaries of what AI assistants can do in professional contexts. RL experience is required. job-boards.greenhouse.io/anthropic/jobs…
Xiaoyi Zhang tweet media
8
17
292
33.4K
misha khalman retweeted
Dmitry Pyanov @dimapyanov
Models are not the issue, but the way they’re productized is. A Swiss-army-knife product is a lowest common denominator for the underlying technology, and GPT-5 amplified the issues. Let me explain why:
- The way ChatGPT is set up as a product does not perform well for either work or companionship.
- For work it’s unreliable and hallucinates too much; memory about personal matters shouldn’t be mixed with work memory, etc.
- For companionship and friendship it’s too sycophantic and verbose. I’m convinced you shouldn’t be mixing all capabilities in one system prompt, otherwise it won’t be good / deep in any of the domains.
- For companionship and friendship it’s absolutely VITAL to have strict and thought-out philosophical guidelines, maybe even some extra tooling, to actively mitigate risks of AI psychosis. In its current state this product is dangerous with all the insane bullshit it mirrors back at whatever you feed it. The venting / therapy use case is here to stay, people are desperately looking for meaning and support in all corners of the internet, so the model must take this responsibility very seriously. It must push back, it must constantly remind users of its limitations and nature, while trying to be helpful without sacrificing principles.
- Friendship can never look like a verbose bullet list that dumps a ton of questionable information at you. It has to be a two-sided get-to-know-you, with pushback and open-ended conversations. Imagine a really smart but insane friend that says yes to everything you say and invents information just to mirror your way of thought: that’s very dangerous.
Sam Altman @sama

If you have been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachment people have had to previous kinds of technology (and so suddenly deprecating old models that users depended on in their workflows was a mistake).
This is something we’ve been closely tracking for the past year or so but still hasn’t gotten much mainstream attention (other than when we released an update to GPT-4o that was too sycophantic).
(This is just my current thinking, and not yet an official OpenAI position.)
People have used technology including AI in self-destructive ways; if a user is in a mentally fragile state and prone to delusion, we do not want the AI to reinforce that. Most users can keep a clear line between reality and fiction or role-play, but a small percentage cannot. We value user freedom as a core principle, but we also feel responsible in how we introduce new technology with new risks.
Encouraging delusion in a user that is having trouble telling the difference between reality and fiction is an extreme case and it’s pretty clear what to do, but the concerns that worry me most are more subtle. There are going to be a lot of edge cases, and generally we plan to follow the principle of “treat adult users like adults”, which in some cases will include pushing back on users to ensure they are getting what they really want.
A lot of people effectively use ChatGPT as a sort of therapist or life coach, even if they wouldn’t describe it that way. This can be really good! A lot of people are getting value from it already today.
If people are getting good advice, leveling up toward their own goals, and their life satisfaction is increasing over years, we will be proud of making something genuinely helpful, even if they use and rely on ChatGPT a lot. If, on the other hand, users have a relationship with ChatGPT where they think they feel better after talking but they’re unknowingly nudged away from their longer term well-being (however they define it), that’s bad. It’s also bad, for example, if a user wants to use ChatGPT less and feels like they cannot.
I can imagine a future where a lot of people really trust ChatGPT’s advice for their most important decisions. Although that could be great, it makes me uneasy. But I expect that it is coming to some degree, and soon billions of people may be talking to an AI in this way. So we (we as in society, but also we as in OpenAI) have to figure out how to make it a big net positive.
There are several reasons I think we have a good shot at getting this right. We have much better tech to help us measure how we are doing than previous generations of technology had. For example, our product can talk to users to get a sense for how they are doing with their short- and long-term goals, we can explain sophisticated and nuanced issues to our models, and much more.

1
1
9
1.2K
Konstantin Slavnov @_s0uthp4w_
I just tried the OpenAI Agent for QA-related tasks, and I can say it's pretty useless for this kind of work right now.
1
0
1
90
Lucas Beyer (bl16) @giffmana
So how are we now supposed to pronounce JEPA vs GEPA?! Come on guys, this is worse than GIF.
Jackson Atkins @JacksonAtkinsX

LLMs can now self-optimize. A new method allows an AI to rewrite its own prompts to achieve up to 35x greater efficiency, outperforming both Reinforcement Learning and Fine-Tuning for complex reasoning.
UC Berkeley, Stanford, and Databricks introduce a new method called GEPA (Genetic-Pareto), an autonomous system for prompt optimization. The researchers tested this across diverse tasks like multi-hop Q&A and instruction following. They demonstrated gains using proprietary models like GPT-4.1 Mini and open-source models like Qwen3 8B.
Here's a look at how it works:
GEPA treats prompt optimization as a genetic evolution problem.
- It starts with a diverse "pool" of prompt candidates.
- It uses Pareto optimization to select the "fittest" prompts. It finds the ones that offer the best tradeoff between high performance on a task and low computational cost (measured in "rollouts").
- It "evolves" new, better prompts using two key mechanisms:
- Crossover: Intelligently combining the best parts of two successful "parent" prompts to create a new "child" prompt.
- Reflective Mutation: This is the self-optimization engine. The system tasks an LLM to analyze its own detailed execution trace (its successes and failures) and then intelligently rewrite its own instructions to fix the flaws.
How GEPA fits into your AI strategy:
This method provides a powerful new tool without replacing existing ones. Here’s the distinction:
- GEPA works on its own. You can apply it directly to any base LLM to achieve significant performance gains just by optimizing the prompt.
- Fine-Tuning teaches the model what (domain knowledge), while GEPA optimizes how the model uses that knowledge (its reasoning process). This makes them powerful complements.
You can use GEPA to supercharge a base model, OR you can apply it to an already fine-tuned model to get the absolute best performance from your expert AI. It's a new, flexible layer in the optimization toolkit that allows AI to optimize itself.
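The Pareto selection step described in the quoted post can be sketched roughly as below. This follows only the post's summary (a tradeoff between task score and rollout cost) and is not taken from the GEPA authors' code; the `Candidate` class, the `dominates` and `pareto_front` helpers, and all prompt names and numbers are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical prompt candidate; fields are illustrative assumptions."""
    prompt: str
    score: float   # task performance, higher is better
    rollouts: int  # evaluation cost, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.rollouts <= b.rollouts
    strictly_better = a.score > b.score or a.rollouts < b.rollouts
    return no_worse and strictly_better


def pareto_front(pool: list[Candidate]) -> list[Candidate]:
    """Keep every candidate that no other candidate in the pool dominates."""
    return [c for c in pool if not any(dominates(o, c) for o in pool)]


# Made-up pool: "verbose" costs more than "step-by-step" and scores lower,
# so it is dominated and drops out of the front.
pool = [
    Candidate("terse", 0.62, 100),
    Candidate("step-by-step", 0.81, 400),
    Candidate("verbose", 0.78, 900),
    Candidate("few-shot", 0.90, 1200),
]
front = pareto_front(pool)
print([c.prompt for c in front])
```

In a full loop, the surviving candidates would be the parents for the crossover and reflective mutation steps the post mentions; those steps need an LLM call and are left out of this sketch.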

33
11
307
72.3K
misha khalman retweeted
Sid @sidbid
Claude Code is getting a brand new feature: custom subagents. Type `/agents` to get started.
171
490
4K
1.1M
misha khalman retweeted
Meaghan @meaghaneschoi
Claude Code has been game-changing for @AnthropicAI designers. We don’t just prototype, but we write production-level code. It’s in terminal, which might feel scary. But Claude makes it less so. Tips for getting started as a designer 🧵↓
Meaghan tweet media
41
50
812
116.5K
misha khalman retweeted
Hrishi @hrishioa
I'm sorry I have to leave early I have two Claude Codes at home
10
17
303
20.1K
misha khalman retweeted
Dmitry Pyanov @dimapyanov
I already have nostalgia about times, let’s call it pre-singularity when we were in awe seeing computers talking. Silly or randomly profound chatbot responses would surprise us and we used to share screenshots with crazy, outrageous, funny or spooky chats with AI. We're so quickly way past that era: there's a jaded expectation of supreme intelligence, abundance of highest form of thought and eloquence of language. We live in a post-AI era, or however you may want to call it, we've transitioned from ‘wow, look, computer is talking back funny’ to gates of almighty super intelligence.
0
1
5
353
Dmitry Pyanov @dimapyanov
Claude 4 claiming there was no 3.7 and it can’t tell me how it’s different. I understand the focus on coding and technical community, but communicating to app users and less technical users who are not following the company but using the app seems quite important. There’s so much to tell.
Dmitry Pyanov tweet media
1
0
2
477
Zack Witten @zswitten
> make a map of the united states with all 50 states labeled. decorate the map with relevant animals and plants
Zack Witten tweet media
7
0
90
5.1K
Zack Witten @zswitten
> make a map of the united states with all 50 states labeled
Zack Witten tweet media
66
11
674
61.2K
misha khalman retweeted
siddharth ahuja @sidahuj
🎵💿Built an MCP that lets Claude talk directly to Ableton. Now you can create music with just prompts! Here’s a demo of me creating a lush, 80s synthwave track in just two prompts. It picks the right instruments, creates melodies, and adds effects like reverb and distortion 🔊
452
792
7.3K
1.1M