James Phoenix

4.7K posts

@jamesaphoenix12

🏗️ Building https://t.co/MBYBDonekk | LLM Engineer 🎮 Ex-WoW Professional (top 0.5%)

/root/ · Joined May 2014
832 Following · 1.2K Followers
James Phoenix retweeted
Yoonho Lee@yoonholeee·
We just released code for Meta-Harness! github.com/stanford-iris-… Aside from replicating paper experiments, the repo is designed to help users implement good Meta-Harnesses in completely new domains! Just point your agent at ONBOARDING.md and have a conversation
Yoonho Lee@yoonholeee

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

26 replies · 162 reposts · 1.1K likes · 118.7K views
James Phoenix@jamesaphoenix12·
@adamweststack @GeoffreyHuntley I agree imho. The less you understand the code, the more you need to prompt to understand it. So there is a trade-off between being able to read it yourself vs asking an agent to read it for you.
0 replies · 0 reposts · 0 likes · 36 views
Adam Daum@adamweststack·
Something related I've been thinking about too: if we get to a point where humans are no longer learning the languages, and AI is writing all the code and enhancing all the programming languages itself, isn't that dangerous? Like, I'm not comfortable with that. If an event triggers an issue, or something breaks, or AI goes rogue, and there aren't any engineers who can interpret the code because AI obfuscated it, that seems like a problem. Especially if that code's driving military equipment and systems, civilian infrastructure, etc., ad nauseam. Am I missing something? Don't we still need to understand the code?
2 replies · 0 reposts · 0 likes · 45 views
geoff@GeoffreyHuntley·
something i’ve been pondering about: for how much longer will we still have programming language conferences now that AI is here?
11 replies · 0 reposts · 31 likes · 4.3K views
James Phoenix retweeted
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web UI), but more often I hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
2.8K replies · 6.8K reposts · 56.5K likes · 20.1M views
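The "small and naive search engine over the wiki" Karpathy mentions under Extra tools can be sketched roughly as below; the TF-IDF scoring and every name here (`build_index`, `search`) are illustrative assumptions, not his actual tooling:

```python
import math
import re
from collections import Counter
from pathlib import Path


def build_index(wiki_dir):
    """Tokenize every .md file under wiki_dir into a term-frequency table."""
    index = {}
    for path in Path(wiki_dir).rglob("*.md"):
        tokens = re.findall(r"[a-z0-9]+", path.read_text(encoding="utf-8").lower())
        index[str(path)] = Counter(tokens)
    return index


def search(index, query, top_k=5):
    """Rank documents by a simple TF-IDF-weighted sum over the query terms."""
    n_docs = len(index)
    terms = re.findall(r"[a-z0-9]+", query.lower())
    scores = {}
    for doc, tf in index.items():
        score = 0.0
        for term in terms:
            if tf[term]:
                # Document frequency: how many wiki pages mention this term.
                df = sum(1 for counts in index.values() if counts[term])
                score += tf[term] * math.log(1 + n_docs / df)
        if score:
            scores[doc] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Wrapped in a tiny CLI (print the top paths for `sys.argv[1:]`), this is the kind of tool an agent can call during larger queries, at the cost of re-scanning the wiki on every run.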
James Phoenix retweeted
Erika Lee@erikalee·
"I'm at my limit": emotional, or Claude?
339 replies · 2.8K reposts · 19.3K likes · 473.9K views
James Phoenix retweeted
geoff@GeoffreyHuntley·
software eats itself
13 replies · 24 reposts · 355 likes · 36.7K views
James Phoenix retweeted
Boris Cherny@bcherny·
I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I'll focus on the ones I use the most. Here goes.
554 replies · 2.5K reposts · 23.2K likes · 3.9M views
James Phoenix retweeted
Jordan Hochenbaum@Jnatanh·
pi-autoresearch has been incredible for running experiments against our codebase, but I wanted a way to more selectively cherry-pick which ones become PRs, plus a few other bells and whistles. So I built pi-autoresearch-studio: granular experiment-to-PR selection with auto-resolved dependencies. My first @badlogicgames Pi extension.
16 replies · 33 reposts · 563 likes · 37.9K views
James Phoenix@jamesaphoenix12·
I now have machine-parseable invariants baked into my specs. These can be attached to either source code or test code. This is my way of staying up to date with what agents are doing.
0 replies · 0 reposts · 0 likes · 30 views
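The tweet doesn't show the invariant format itself. As a hypothetical sketch (the `# @invariant:` comment convention and the `extract_invariants` helper are invented for illustration, not James's actual setup), a machine-parseable invariant can be as simple as a structured comment that an agent or CI job extracts and diffs against the spec:

```python
import re

# Matches annotations of the form "# @invariant: <statement>".
INVARIANT_RE = re.compile(r"#\s*@invariant:\s*(.+)")


def extract_invariants(source: str):
    """Collect every '@invariant' annotation with its 1-based line number,
    so tooling can compare the set against the invariants listed in a spec."""
    return [
        (lineno, m.group(1).strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if (m := INVARIANT_RE.search(line))
    ]


# Example source an agent might have edited; the invariant rides along
# with the code it constrains.
example = '''
def withdraw(balance, amount):
    # @invariant: balance never goes negative
    assert balance - amount >= 0
    return balance - amount
'''
```

The same annotation works in test files, and because the format is regular, a check that the spec's invariants all still appear somewhere in the code is one set comparison.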
James Phoenix retweeted
shirish@shiri_shh·
Creator and head of Claude Code: "100% of my code is written by Claude Code. I have not edited a single line by hand since November. Every day I ship 10, 20, 30 PRs… I have five agents running while we’re recording this."
CG@cgtwts

Anthropic CEO: “In the next 3 to 6 months, AI will write 90% of the code, and within 12 months, nearly all code may be generated by AI.” the job isn’t coding anymore, it’s telling machines what to build.

207 replies · 155 reposts · 1.9K likes · 346K views
James Phoenix retweeted
Andrej Karpathy@karpathy·
- Drafted a blog post.
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it's so convincing!
- Fun idea: let's ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask different directions and be careful with the sycophancy.
1.7K replies · 2.4K reposts · 31.2K likes · 3.4M views
James Phoenix retweeted
Tibo@thsottiaux·
Hello. We have reset Codex usage limits across all plans to let everyone experiment with the magnificent plugins we just launched, and because it had been a while! You can just build unlimited things with Codex. Have fun!
672 replies · 389 reposts · 9.1K likes · 920.3K views
James Phoenix retweeted
POM@peterom·
Prompt engineer → harness builder → loop architect → eval evaluator → goal-setting coach → resource allocator → trust arbiter → figurehead
1 reply · 1 repost · 17 likes · 1.5K views
James Phoenix retweeted
Morgan@morganlinton·
Running Codex, on my Mac Studio, through Tailscale and Termius is still like freakin’ magic to me. Sitting in an airplane, having it jam away, just feels good 😊
19 replies · 5 reposts · 146 likes · 12.9K views
James Phoenix retweeted
max drake@max__drake·
turns out software already was clay! we just had weak hands
40 replies · 66 reposts · 1.9K likes · 142.2K views
James Phoenix retweeted
George Pu@TheGeorgePu·
Almost signed up for ElevenLabs to narrate my blog. $330/month. Then I tried running an open-source model on my own laptop. Qwen 3.5 14B. Sounds fine. 200 posts a month. Costs me electricity. I almost paid $4,000 a year to rent a model I can run myself. Most AI subscriptions right now are just a nice UI on top of something free.
171 replies · 91 reposts · 2.7K likes · 184.9K views
Jeffrey Emanuel@doodlestein·
Over 24 hours of continuous cranking by this clanker. I didn't even do anything special, it just keeps going like the Energizer Bunny:
15 replies · 1 repost · 47 likes · 5.4K views
James Phoenix retweeted
Cline@cline·
Introducing Cline Kanban: A standalone app for CLI-agnostic multi-agent orchestration. Claude and Codex compatible. npm i -g cline Tasks run in worktrees, click to review diffs, & link cards together to create dependency chains that complete large amounts of work autonomously.
231 replies · 384 reposts · 3.4K likes · 1.5M views