Andrew Cove

588 posts

@aac

Find me at https://t.co/4SAtv96Uc4

Breckenridge, CO · Joined June 2009

193 Following · 1.6K Followers
Nathan Baschez@nbaschez·
Do you spend a lot of time reviewing markdown docs written by AI? Wish it were a better experience? Say hi if you wanna try a new (free, open source) thing
Amanda Askell@AmandaAskell·
@tszzl If I'm being honest, I'm genuinely uncertain about whether this is a problem.
roon@tszzl·
everyone is assuming this is some kind of quirk chungus marketing campaign, but if you’ve worked with 5.4 and beyond, they tend to call everything goblins, gremlins, etc., and it’s just super noticeable, and if you work with them all day you start to get annoyed
roon@tszzl

@repligate @genalewislaw I think it becomes annoying when it mentions goblins every single chat and it’s fair shakes to try and reduce that

Andrew Cove@aac·
@emollick I'm increasingly suspicious that @AnthropicAI is getting hit by the kind of misinformation that X typically provides for politics. Wouldn't be surprised if there was an active campaign to create negative chatter, particularly on this platform.
Ethan Mollick@emollick·
I am catching glimpses in my feed that there is a backlash against Mythos as "marketing hype," and it is a little confusing. I don't think anyone who has used the latest agentic coding tools would think that expecting large-scale cybersecurity implications of increasingly good AI models is unbelievable, especially after reading the red team reports. It feels like a better place to start is to assume that there are new risks, and then we can all laugh at Anthropic and pat each other on the back if there are not.

Also, while the AI labs certainly are impressed by their own accomplishments and benchmarks are flawed, I would note that both publicly and privately, Mythos seems to be taken seriously at a lot of large institutions and organizations filled with smart people who would rather not be worried about a new cybersecurity risk.

Finally, I am not sure "our product is dangerous and we need to alert the government to that" is the sales pitch to the corporate world that critics seem to think it is.
Andrew Cove@aac·
@bcherny @deanwball Would be interesting to see Claude actively moving people up Steve Yegge's chart of AI-development. Detect what level they're at, and proactively surface recommendations for approaches that help them level up, if they choose to.
Boris Cherny@bcherny·
👋 Appreciate the feedback.

Since we introduced Claude Code at Anthropic, engineering velocity has increased hundreds of %, and the rate at which it is increasing is itself accelerating. The velocity is very much not performative -- we're actively trying to figure out how to build effectively when all of the code is written by Claude.

Claude has accelerated the pace at which we ship, and as a result we've been hitting all sorts of new bottlenecks: code review and regression prevention, CI and merge queues, source control reliability, etc. We're working through each of these as they come up, and now have good answers for a number of them.

One of these bottlenecks is figuring out how to best communicate new features to our users. My pov is we need to be doing much better here. The problem isn't that we are releasing quickly; the problem is that we should design features in a way where you don't need to know about them to benefit from them. This is the case for much of what we build, and we need to make it the case for all of it.

To share how we think about it, there's a few ways to approach it from a product design pov:
- Make it so the model can do things for you (eg. enter plan mode, invoke skills, configure your settings)
- Generalize features rather than create new parallel features
- Make features opt-in until we do the above
- Have Claude monitor feature usage and brainstorm/build ways to improve usage while simplifying the system

We try to do all of the above, but as you said, it's not perfect yet, and this is something we're working through. If you prefer a lagging version, you can also use the Claude Code stable release (not latest).

We're intentionally being open about what we're seeing, since our customers are seeing the same thing and at least part of our job is helping companies navigate this new way of doing engineering.

Re: source code leak -- it was unintentional, but was also human error. There was a subtle bug that missed several rounds of manual review. We're working on how we can better catch it automatically next time.
Dean W. Ball@deanwball·
I think the current state of the Claude and Claude Code *apps* crystallizes this sentiment well. It feels as though Anthropic’s acceleration of release cadence to these apps is almost performative, like they are smirking at the camera and saying, “buckle up bucko: We Are Doing Recursive Self-Improvement And From Now On, Things Will Go *Fast* 😏”

But the ones who seem like they need to buckle up are Anthropic themselves. They’re shipping largely half-baked features faster than users can digest them; I am a constant Claude Code user with pretty good information bandwidth and even I just ignore the release notes at this point. Even if I paid attention, there wouldn’t be enough time to get comfortable with the ergonomics of a new feature before they changed it, obsoleted it, or released some new but weirdly overlapping related feature.

I just use the app the way I did before its developers started turning to the camera with the raised eyebrow and the smirk. Many others I know share this habit and sentiment. It is not in fact good for your car’s control panel to change and expand every 36 hours, even if it is in some sense impressive that it is now possible to effect change at that frequency.

And what’s more, they leaked their source code! I know this wasn’t because of Claude Code per se, but surely it is indicative of a company and a team that is moving too fast for their own good.

This is the most important product ever made, if you believe Anthropic’s thesis. Yet they do not especially act like it. It feels like performative acceleration, velocity for the sake of velocity.
Dean W. Ball@deanwball

I appreciate acceleration and velocity for their own sake, either as objectives or as aesthetic values, but they do grow dull with time on their own. And more importantly, I think AI will be an impossible political sell without more physical-world promise.

Andrew Cove@aac·
@karpathy I've been building up a personal knowledge base about me in this fashion, for it to have more context for our sessions. Lots of automated processes for reviewing it, organizing it, cleaning things up, and stuff like daily briefings, auto-update at the end of sessions.
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web ui), but more often hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
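The "small and naive search engine" over the wiki could look something like this minimal sketch (hypothetical code, not Karpathy's actual tool; it assumes the wiki is a directory tree of .md files, and `search_wiki` with simple term-frequency ranking is purely illustrative):

```python
import os
import re
from collections import Counter

def search_wiki(wiki_dir, query, top_k=5):
    """Naive keyword search: rank .md files in wiki_dir by how often
    the query terms appear, and return the top_k file paths."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scores = {}
    for root, _, files in os.walk(wiki_dir):
        for name in files:
            if not name.endswith(".md"):
                continue  # only index the compiled wiki articles
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as f:
                words = Counter(re.findall(r"\w+", f.read().lower()))
            score = sum(words[t] for t in terms)
            if score:
                scores[path] = score
    # highest term-frequency first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A CLI entry point around this function is what would let an LLM agent call it as a tool during larger queries.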
Andrew Cove@aac·
@PrimeLineAI @bcherny Just chiming in that I'm hitting the scrollback issue here too. Happening in a plain macOS terminal. Barely a screen's worth. Started today.
PrimeLine@PrimeLineAI·
Thanks for the detailed response, Boris. Let me sharpen my points:

**Scrollback**: Repro in VS Code (xterm.js Issue #802) or Ghostty (Issue #2334). Alt screen buffer blocks scrollback mid-session. Both have open issues.

**Rate limits**: Fair point - "prompt caching broken" was imprecise. The real symptom: sessions that ran for hours now drain in 90 min at peak. I built two separate workaround tools (rate-limit watchdog + pty-wrapper with exponential backoff) just to keep sessions alive. The growth scaling issue is real.

**1M context**: Thanks for the CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=20 tip - genuinely useful. My issue isn't cost. It's quality: I've tracked rule-attention degradation starting around 200K tokens across 1,394 events. Instructions loaded at session start (delegation rules, model routing, pre-read requirements) get progressively ignored as context fills. Had to rebuild my entire context-warning system for the new window size.

**Opus quality**: You may be right that the model weights haven't changed. But here's the thing - every time Opus misbehaves (wrong model picked, premature execution, instructions skipped), I ask it WHY. The answer is always the same: "execution bias." The model self-diagnoses the failure but can't prevent it. I built a correction-detector hook (UserPromptSubmit) that injects a PAUSE intent-parser on every message just to compensate. This feels like a CC harness issue, not CLAUDE.md.

Will /bug next time. Appreciate the direct line.
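The "exponential backoff" retry logic mentioned above might be sketched like this (a hypothetical illustration, not PrimeLine's actual wrapper; `RuntimeError` stands in for whatever rate-limit error the wrapped call raises):

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, cap=60.0):
    """Call fn; on failure, sleep and retry, doubling the delay each
    time (capped at `cap` seconds) before a final uncaught attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for a rate-limit error
            delay = min(base_delay * (2 ** attempt), cap)
            time.sleep(delay)
    return fn()  # last try: let any error propagate to the caller
```

Doubling with a cap keeps retries from hammering a throttled endpoint while bounding the worst-case wait between attempts.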
Boris Cherny@bcherny·
Today we're excited to announce NO_FLICKER mode for Claude Code in the terminal. It uses an experimental new renderer that we're excited about. The renderer is early and has tradeoffs, but already we've found that most internal users prefer it over the old renderer. It also supports mouse events (yes, in a terminal). Try it: CLAUDE_CODE_NO_FLICKER=1 claude
Curt Tigges@CurtTigges

@bcherny @UltraLinx please at least fix the uncontrollable scrolling/flickering before the next 3000 features

Andrew Cove@aac·
@cwRichardKim I'm on a 27" monitor a lot of the time. Today I was switching between a research chat and a Cowork which has access to a bunch of local knowledge base data. Frequently I'm switching between Code tasks and unrelated Cowork tasks. And that often leads to previews/servers (cont)
Richard kim@cwRichardKim·
@aac Tell me more, if the switching was seamless and fast, do you still think you would want more windows? Also, what are you normally switching between?
Andrew Cove@aac·
I'm releasing an EP — "A Depression" is out April 10. Pre-save it on Spotify show.co/CvbA9Gc