Andrew Cove

588 posts

@aac

Find me at https://t.co/4SAtv96Uc4

Breckenridge, CO · Joined June 2009

193 Following · 1.6K Followers
Nathan Baschez@nbaschez·
Do you spend a lot of time reviewing markdown docs written by AI? Wish it were a better experience? Say hi if you wanna try a new (free, open source) thing
Amanda Askell@AmandaAskell·
@tszzl If I'm being honest, I'm genuinely uncertain about whether this is a problem.
roon@tszzl·
everyone is assuming this is some kind of quirk chungus marketing campaign, but if you’ve worked with 5.4 and beyond, they tend to call everything goblins, gremlins, etc., and it’s just super noticeable, and if you work with them all day you start to get annoyed
roon@tszzl

@repligate @genalewislaw I think it becomes annoying when it mentions goblins every single chat and it’s fair shakes to try and reduce that

Andrew Cove@aac·
@emollick I'm increasingly suspicious that @AnthropicAI is getting hit by the kind of misinformation that X typically provides for politics. Wouldn't be surprised if there was an active campaign to create negative chatter, particularly on this platform.
Ethan Mollick@emollick·
I am catching glimpses in my feed that there is a backlash against Mythos as "marketing hype," and it is a little confusing. I don't think anyone who has used the latest agentic coding tools would think that expecting large-scale cybersecurity implications of increasingly good AI models is unbelievable, especially after reading the red team reports. It feels like a better place to start is to assume that there are new risks, and then we can all laugh at Anthropic and pat each other on the back if there are not.

Also, while the AI labs certainly are impressed by their own accomplishments and benchmarks are flawed, I would note that both publicly and privately, Mythos seems to be taken seriously at a lot of large institutions and organizations filled with smart people who would rather not be worried about a new cybersecurity risk.

Finally, I am not sure "our product is dangerous and we need to alert the government to that" is the sales pitch to the corporate world that critics seem to think it is.
Andrew Cove@aac·
@bcherny @deanwball Would be interesting to see Claude actively moving people up Steve Yegge's chart of AI-development. Detect what level they're at, and proactively surface recommendations for approaches that help them level up, if they choose to.
Boris Cherny@bcherny·
👋 Appreciate the feedback.

Since we introduced Claude Code at Anthropic, engineering velocity has increased hundreds of %, and the rate at which it is increasing is itself accelerating. The velocity is very much not performative -- we're actively trying to figure out how to build effectively when all of the code is written by Claude.

Claude has accelerated the pace at which we ship, and as a result we've been hitting all sorts of new bottlenecks: code review and regression prevention, CI and merge queues, source control reliability, etc. We're working through each of these as they come up, and now have good answers for a number of them.

One of these bottlenecks is figuring out how to best communicate new features to our users. My pov is we need to be doing much better here. The problem isn't that we are releasing quickly; the problem is that we should design features in a way where you don't need to know about them to benefit from them. This is the case for much of what we build, and we need to make it the case for all of it.

To share how we think about it, there's a few ways to approach it from a product design pov:
- Make it so the model can do things for you (eg. enter plan mode, invoke skills, configure your settings)
- Generalize features rather than create new parallel features
- Make features opt-in until we do the above
- Have Claude monitor feature usage and brainstorm/build ways to improve usage while simplifying the system

We try to do all of the above, but as you said, it's not perfect yet, and this is something we're working through. If you prefer a lagging version, you can also use the Claude Code stable release (not latest).

We're intentionally being open about what we're seeing, since our customers are seeing the same thing and at least part of our job is helping companies navigate this new way of doing engineering.

Re: source code leak -- it was unintentional, but was also human error. There was a subtle bug that missed several rounds of manual review. We're working on how we can better catch it automatically next time.
Dean W. Ball@deanwball·
I think the current state of the Claude and Claude Code *apps* crystallizes this sentiment well. It feels as though Anthropic’s acceleration of release cadence to these apps is almost performative, like they are smirking at the camera and saying, “buckle up bucko: We Are Doing Recursive Self-Improvement And From Now On, Things Will Go *Fast* 😏”

But the ones who seem like they need to buckle up are Anthropic themselves. They’re shipping largely half-baked features faster than users can digest them; I am a constant Claude Code user with pretty good information bandwidth and even I just ignore the release notes at this point. Even if I paid attention, there wouldn’t be enough time to get comfortable with the ergonomics of a new feature before they changed it, obsoleted it, or released some new but weirdly overlapping related feature.

I just use the app the way I did before its developers started turning to the camera with the raised eyebrow and the smirk. Many others I know share this habit and sentiment. It is not in fact good for your car’s control panel to change and expand every 36 hours, even if it is in some sense impressive that it is now possible to effect change at that frequency.

And what’s more, they leaked their source code! I know this wasn’t because of Claude Code per se, but surely it is indicative of a company and a team that is moving too fast for their own good.

This is the most important product ever made, if you believe Anthropic’s thesis. Yet they do not especially act like it. It feels like performative acceleration, velocity for the sake of velocity.
Dean W. Ball@deanwball

I appreciate acceleration and velocity for their own sake, either as objectives or as aesthetic values, but they do grow dull with time on their own. And more importantly, I think AI will be an impossible political sell without more physical-world promise.

Andrew Cove@aac·
@karpathy I've been building up a personal knowledge base about me in this fashion, for it to have more context for our sessions. Lots of automated processes for reviewing it, organizing it, cleaning things up, and stuff like daily briefings, auto-update at the end of sessions.
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web ui), but more often hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
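The "small and naive search engine" over the wiki could look something like this minimal sketch (hypothetical code, not Karpathy's actual tool; it assumes the wiki is a directory tree of .md files, and `search_wiki` with simple term-frequency ranking is purely illustrative):

```python
import os
import re
from collections import Counter

def search_wiki(wiki_dir, query, top_k=5):
    """Naive keyword search: rank .md files in wiki_dir by how often
    the query terms appear, and return the top_k file paths."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scores = {}
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if not name.endswith(".md"):
                continue  # only index the compiled wiki articles
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as f:
                words = Counter(re.findall(r"\w+", f.read().lower()))
            score = sum(words[t] for t in terms)
            if score:
                scores[path] = score
    # highest term-frequency first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A CLI entry point around this function is what would let an LLM agent call it as a tool during larger queries.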
Andrew Cove@aac·
@PrimeLineAI @bcherny Just chiming in that I'm hitting the scrollback issue here too. Happening in a plain macOS terminal. Barely a screen's worth. Started today.
PrimeLine@PrimeLineAI·
Thanks for the detailed response, Boris. Let me sharpen my points:

**Scrollback**: Repro in VS Code (xterm.js Issue #802) or Ghostty (Issue #2334). Alt screen buffer blocks scrollback mid-session. Both have open issues.

**Rate limits**: Fair point - "prompt caching broken" was imprecise. The real symptom: sessions that ran for hours now drain in 90 min at peak. I built two separate workaround tools (rate-limit watchdog + pty-wrapper with exponential backoff) just to keep sessions alive. The growth scaling issue is real.

**1M context**: Thanks for the CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=20 tip - genuinely useful. My issue isn't cost. It's quality: I've tracked rule-attention degradation starting around 200K tokens across 1,394 events. Instructions loaded at session start (delegation rules, model routing, pre-read requirements) get progressively ignored as context fills. Had to rebuild my entire context-warning system for the new window size.

**Opus quality**: You may be right that the model weights haven't changed. But here's the thing - every time Opus misbehaves (wrong model picked, premature execution, instructions skipped), I ask it WHY. The answer is always the same: "execution bias." The model self-diagnoses the failure but can't prevent it. I built a correction-detector hook (UserPromptSubmit) that injects a PAUSE intent-parser on every message just to compensate. This feels like a CC harness issue, not CLAUDE.md.

Will /bug next time. Appreciate the direct line.
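The "exponential backoff" retry logic mentioned above might be sketched like this (a hypothetical illustration, not PrimeLine's actual wrapper; `RuntimeError` stands in for whatever rate-limit error the wrapped call raises):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, cap=60.0):
    """Call fn; on failure, sleep and retry, doubling the delay each
    time (capped at `cap` seconds) before a final uncaught attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for a rate-limit error
            delay = min(base_delay * (2 ** attempt), cap)
            time.sleep(delay)
    return fn()  # last try: let any error propagate to the caller
```

Doubling with a cap keeps retries from hammering a throttled endpoint while bounding the worst-case wait between attempts.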
Boris Cherny@bcherny·
Today we're excited to announce NO_FLICKER mode for Claude Code in the terminal. It uses an experimental new renderer that we're excited about. The renderer is early and has tradeoffs, but already we've found that most internal users prefer it over the old renderer. It also supports mouse events (yes, in a terminal). Try it: CLAUDE_CODE_NO_FLICKER=1 claude
Curt Tigges@CurtTigges

@bcherny @UltraLinx please at least fix the uncontrollable scrolling/flickering before the next 3000 features

Andrew Cove@aac·
@cwRichardKim I'm on a 27" monitor a lot of the time. Today I was switching between a research chat and a Cowork which has access to a bunch of local knowledge base data. Frequently I'm switching between Code tasks and unrelated Cowork tasks. And that often leads to previews/servers (cont)
Richard kim@cwRichardKim·
@aac Tell me more, if the switching was seamless and fast, do you still think you would want more windows? Also, what are you normally switching between?
Andrew Cove@aac·
I'm releasing an EP — "A Depression" is out April 10. Pre-save it on Spotify show.co/CvbA9Gc