James Phoenix

4.7K posts

@jamesaphoenix12

🏗️ Building https://t.co/MBYBDonekk | LLM Engineer 🎮 Ex-WoW Professional (top 0.5%)

/root/ · Joined May 2014
832 Following · 1.2K Followers
James Phoenix retweeted
Yoonho Lee@yoonholeee·
We just released code for Meta-Harness! github.com/stanford-iris-… Aside from replicating paper experiments, the repo is designed to help users implement good Meta-Harnesses in completely new domains! Just point your agent at ONBOARDING.md and have a conversation
Yoonho Lee@yoonholeee

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

26 replies · 162 reposts · 1.1K likes · 118.7K views
James Phoenix@jamesaphoenix12·
@adamweststack @GeoffreyHuntley I agree imho. The less you understand the code, the more you need to prompt to understand it. So there is a trade-off between being able to read it yourself vs asking an agent to read it for you.
0 replies · 0 reposts · 0 likes · 36 views
Adam Daum@adamweststack·
Something related I've been thinking about too: if we get to a point where humans are no longer learning the languages, and AI is writing all the code and enhancing all the programming languages itself, isn't that dangerous? Like, I'm not comfortable with that. If an event triggers an issue, or something breaks, or AI goes rogue, and there aren't any engineers who can interpret the code because AI obfuscated it, that seems like a problem. Especially if that code's driving military equipment and systems, civilian infrastructure, etc., ad nauseam. Am I missing something? Don't we still need to understand the code?
2 replies · 0 reposts · 0 likes · 45 views
geoff@GeoffreyHuntley·
something i’ve been pondering about: for how much longer will we still have programming language conferences now that AI is here?
11 replies · 0 reposts · 31 likes · 4.3K views
James Phoenix retweeted
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web UI), but more often I hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
2.8K replies · 6.8K reposts · 56.5K likes · 20.1M views
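The "small and naive search engine over the wiki" Karpathy mentions under Extra tools can be sketched roughly as below; the TF-IDF scoring and every name here (`build_index`, `search`) are illustrative assumptions, not his actual tooling:

```python
import math
import re
from collections import Counter
from pathlib import Path


def build_index(wiki_dir):
    """Tokenize every .md file under wiki_dir into a term-frequency table."""
    index = {}
    for path in Path(wiki_dir).rglob("*.md"):
        tokens = re.findall(r"[a-z0-9]+", path.read_text(encoding="utf-8").lower())
        index[str(path)] = Counter(tokens)
    return index


def search(index, query, top_k=5):
    """Rank documents by a simple TF-IDF-weighted sum over the query terms."""
    n_docs = len(index)
    terms = re.findall(r"[a-z0-9]+", query.lower())
    scores = {}
    for doc, tf in index.items():
        score = 0.0
        for term in terms:
            if tf[term]:
                # Document frequency: how many wiki pages mention this term.
                df = sum(1 for counts in index.values() if counts[term])
                score += tf[term] * math.log(1 + n_docs / df)
        if score:
            scores[doc] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Wrapped in a tiny CLI (print the top paths for `sys.argv[1:]`), this is the kind of tool an agent can call during larger queries, at the cost of re-scanning the wiki on every run.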
James Phoenix retweeted
Erika Lee@erikalee·
"I'm at my limit": emotional, or Claude?
339 replies · 2.8K reposts · 19.3K likes · 473.9K views
James Phoenix retweeted
geoff@GeoffreyHuntley·
software eats itself
13 replies · 24 reposts · 355 likes · 36.7K views
James Phoenix retweeted
Boris Cherny@bcherny·
I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I'll focus on the ones I use the most. Here goes.
554 replies · 2.5K reposts · 23.2K likes · 3.9M views
James Phoenix retweeted
Jordan Hochenbaum@Jnatanh·
pi-autoresearch has been incredible for running experiments against our codebase, but I wanted a way to more selectively cherry-pick which ones become PRs, plus a few other bells and whistles. So I built pi-autoresearch-studio: granular experiment-to-PR selection with auto-resolved dependencies. My first @badlogicgames Pi extension.
16 replies · 33 reposts · 563 likes · 37.9K views
James Phoenix@jamesaphoenix12·
I now have machine-parseable invariants baked into my specs. These can be attached to either source code or test code. This is my way of staying up to date with what agents are doing.
0 replies · 0 reposts · 0 likes · 30 views
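The tweet doesn't show the invariant format itself. As a hypothetical sketch (the `# @invariant:` comment convention and the `extract_invariants` helper are invented for illustration, not James's actual setup), a machine-parseable invariant can be as simple as a structured comment that an agent or CI job extracts and diffs against the spec:

```python
import re

# Matches annotations of the form "# @invariant: <statement>".
INVARIANT_RE = re.compile(r"#\s*@invariant:\s*(.+)")


def extract_invariants(source: str):
    """Collect every '@invariant' annotation with its 1-based line number,
    so tooling can compare the set against the invariants listed in a spec."""
    return [
        (lineno, m.group(1).strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if (m := INVARIANT_RE.search(line))
    ]


# Example source an agent might have edited; the invariant rides along
# with the code it constrains.
example = '''
def withdraw(balance, amount):
    # @invariant: balance never goes negative
    assert balance - amount >= 0
    return balance - amount
'''
```

The same annotation works in test files, and because the format is regular, a check that the spec's invariants all still appear somewhere in the code is one set comparison.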
James Phoenix retweeted
shirish@shiri_shh·
Creator and head of Claude Code: "100% of my code is written by Claude Code. I have not edited a single line by hand since November. Every day I ship 10, 20, 30 PRs… I have five agents running while we’re recording this."
CG@cgtwts

Anthropic CEO: “In the next 3 to 6 months, AI will write 90% of the code, and within 12 months, nearly all code may be generated by AI.” the job isn’t coding anymore, it’s telling machines what to build.

207 replies · 155 reposts · 1.9K likes · 346K views
James Phoenix retweeted
Andrej Karpathy@karpathy·
- Drafted a blog post.
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it's so convincing!
- Fun idea: let's ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask different directions and be careful with the sycophancy.
1.7K replies · 2.4K reposts · 31.2K likes · 3.4M views
James Phoenix retweeted
Tibo@thsottiaux·
Hello. We have reset Codex usage limits across all plans to let everyone experiment with the magnificent plugins we just launched, and because it had been a while! You can just build unlimited things with Codex. Have fun!
672 replies · 389 reposts · 9.1K likes · 920.3K views
James Phoenix retweeted
POM@peterom·
Prompt engineer → harness builder → loop architect → eval evaluator → goal-setting coach → resource allocator → trust arbiter → figurehead
1 reply · 1 repost · 17 likes · 1.5K views
James Phoenix retweeted
Morgan@morganlinton·
Running Codex, on my Mac Studio, through Tailscale and Termius is still like freakin’ magic to me. Sitting in an airplane, having it jam away, just feels good 😊
19 replies · 5 reposts · 146 likes · 12.9K views
James Phoenix retweeted
max drake@max__drake·
turns out software already was clay! we just had weak hands
40 replies · 66 reposts · 1.9K likes · 142.2K views
James Phoenix retweeted
George Pu@TheGeorgePu·
Almost signed up for ElevenLabs to narrate my blog. $330/month. Then I tried running an open-source model on my own laptop. Qwen 3.5 14B. Sounds fine. 200 posts a month. Costs me electricity. I almost paid $4,000 a year to rent a model I can run myself. Most AI subscriptions right now are just a nice UI on top of something free.
171 replies · 91 reposts · 2.7K likes · 184.9K views
Jeffrey Emanuel@doodlestein·
Over 24 hours of continuous cranking by this clanker. I didn't even do anything special, it just keeps going like the Energizer Bunny:
15 replies · 1 repost · 47 likes · 5.4K views
James Phoenix retweeted
Cline@cline·
Introducing Cline Kanban: A standalone app for CLI-agnostic multi-agent orchestration. Claude and Codex compatible. npm i -g cline Tasks run in worktrees, click to review diffs, & link cards together to create dependency chains that complete large amounts of work autonomously.
231 replies · 384 reposts · 3.4K likes · 1.5M views