Seth Karten

3.1K posts

@sethkarten

Agents… PokeChamp, PokeAgent, LLM Economist | CS PhD @Princeton | Former CMU Waymo

🐯 Joined October 2012

615 Following · 1.6K Followers

Pinned Tweet

Seth Karten@sethkarten·4d

x.com/i/article/2033…

Seth Karten@sethkarten·7h

@emollick Train LLMs to manage resources in Civ and they will be better at managing GPUs for AI science

Ethan Mollick@emollick·10h

It's not SimCity, but business school students who were good at Civ V also turn out to be better planners, organizers, and problem-solvers in this small experiment.

exQUIZitely 🕹️@exQUIZitely

Job interview: "Any management experience?" Me:

Seth Karten@sethkarten·1d

Great work as always! I chatted with Natasha last summer saying that it isn’t the individual word changes that matter, but the overall argumentation changes that do. This plot shows exactly that! The LLM writing affects argumentation towards a different belief set

Natasha Jaques@natashajaques

You might say "well, I’m so good at prompting LLMs, I’m not going to be subject to these issues". So, we conduct a human user study to see how people naturally interact with LLMs to produce a piece of writing, and find that even when allowed to repeatedly prompt the LLM to refine an essay as they see fit, people that rely heavily on LLMs produce writing that argues for significantly different conclusions.

Seth Karten@sethkarten·1d

@kieradev @ben_j_todd Open source standardized version here x.com/sethkarten/sta…

Seth Karten@sethkarten

x.com/i/article/2033…

Kiera@kieradev·1d

@ben_j_todd is the harness open source?

Benjamin Todd@ben_j_todd·1d

Opus 4.6 is hugely better at Pokemon: • Opus 4.0 took 1,000 hours to get halfway through • Opus 4.5 could almost finish in 1,000 hours • Opus 4.6 was another 10x faster!

Seth Karten@sethkarten·1d

@ben_j_todd x.com/sethkarten/sta… Still lagging after standardization

Seth Karten@sethkarten

x.com/i/article/2033…

Seth Karten@sethkarten·2d

@jsnnsa Where can I read the spawn thesis?

jacob@jsnnsa·2d

Awesome to see the spawn thesis beginning to play out. Gamecraft will become the largest game we’ve ever seen, and it will power a huge % of the post-AGI economy and give purpose to hundreds of millions of people. This is what we saw in the data summer of last year.

Danny Limanseta@DannyLimanseta

I don't care if people call it AI slop. Vibe coding games is fun. It's become my main hobby now, and no one can take that away from me.

Seth Karten@sethkarten·2d

@Shahules786 @VibrantLabsAI Minimizing the final sim-to-sim gap is required too. See our related work: x.com/sethkarten/sta…

Seth Karten@sethkarten

Automatic Generation of High-Performance RL Environments ArXiv: arxiv.org/abs/2603.12145

ikka@Shahules786·2d

(1/n) Today, we’re releasing Cloning Bench. Labs are paying 6-7 figures for clones of web apps to do web/computer use-based RL training. At @VibrantLabsAI, our fundamental goal is to automate the creation of RL environments. For web/CUAs, one way that we do that is by using coding agents and a custom harness to automatically generate the simulation environment. We tested Codex, Gemini, Claude Code, and GLM using our harness on their ability to recreate a Slack workspace and benchmarked their performance. We have published our methods, results, and analysis here today: vibrantlabs.com/blog/cloning-b…

Seth Karten@sethkarten·2d

@yasei_no_otoko We can add support for other languages for the RPG. We will just need to add a checksum per language. PRs welcome!

野生の男@yasei_no_otoko·2d

@sethkarten This challenge uses arbitrary rules that discourage participation from research institutions outside the U.S., particularly those in Japan.

select766@select766

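I feel it is not good for a Pokémon AI competition to decide the pinnacle of the international academic community under rules that effectively prevent Japanese research institutions from participating. (Conversation with ChatGPT)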

Seth Karten retweeted

Chi Jin@chijinML·3d

Check out the full manuscript about the largest AI Pokémon tournament we ran at NeurIPS 2025!

Seth Karten@sethkarten

x.com/i/article/2033…

Seth Karten@sethkarten·4d

@lateinteraction Thanks, Omar! I’ve been testing a version of GEPA for PokeAgent recently. More soon!

Omar Khattab@lateinteraction·4d

@sethkarten congrats Seth!!

Seth Karten@sethkarten·4d

reddit.com/r/LocalLLaMA/c… With quantization and CPU offloading, it may be possible, since the MoE has only 3B active parameters. I definitely think it is worth exploring with the largest model that works on your GPU, at the very least. Coding agents will only become more prevalent and will need smaller models that deliver the same performance.

sacha🥝@alexUnder_sky·4d

@sethkarten I don't think I can run anything locally, as I've got like 16 gigs of RAM. Maybe through Kaggle notebooks or something.

sacha🥝@alexUnder_sky·4d

@sethkarten, a quick question: have you actually learned Rust (or can you debug the code you've got), or do you fully rely on Codex/Claude as your SWE executors?

Seth Karten@sethkarten·4d

@alexUnder_sky Can you try Qwen3.5-35B-A3B quantized with OpenCode? I haven't tried open-source models yet, but it could be interesting depending on your local GPU.

sacha🥝@alexUnder_sky·4d

@sethkarten Thank you so much. I don't have coding agents, but it would be fun to understand at least something.

Seth Karten@sethkarten·4d

@haoailab Very cool! Do you have plans to open-source or is this a start-up product launch?

Hao AI Lab@haoailab·4d

(1/N) We're launching Dreamverse. Most AI video models take minutes to generate a 5 s 1080p clip. In 4.5 seconds, we can generate 30 s 1080p clips on a single GPU. Our videos generate faster than you can watch them: stop waiting on prompts and start directing scenes live. 🕹️Demo: dreamverse.fastvideo.org 📑 Blog: haoailab.com/blogs/dreamver… Welcome to the era of vibe-directing 👇

Seth Karten@sethkarten·4d

@alexUnder_sky Funnily enough, a big part of my project's switch from C to Rust/JAX was that the Rust/JAX fans are correct. I just did not previously have the time to allocate until it was easy enough. x.com/sethkarten/sta…

Seth Karten@sethkarten

So do we just use Rust and Go for everything now that Claude Code is good? Surely Python isn't needed

sacha🥝@alexUnder_sky·4d

@sethkarten I just looked at your work and came across the concepts of Rust, and they're quite different from C++. Maybe with quite a bit of familiarity I could debug the Rust code, but it seems really heroic on your side to do it in Rust. However, everyone does everything in Rust, idk...

Seth Karten@sethkarten·4d

@harlanv11 Submit your work to github.com/sethkarten/awe… to have it verified on our leaderboard. Our website, pokeagentchallenge.com, has resources for getting started and a live battling ladder to test your methodology. If you have something cool, feel free to share it in our Discord.

h@harlanv11·4d

@sethkarten This is like the coolest intersection of all my favorite things. Where might someone hypothetically contribute to something like this in the future?

Seth Karten@sethkarten·4d

@C4_TCG @BenryBrand @therahuldapp Could be interesting

Charlie Lockyer@C4_TCG·4d

@sethkarten Amazing work. @BenryBrand has been doing something similar for the TCG.

Seth Karten@sethkarten·4d

Economic alignment is a difficult problem to address since you must balance the individual’s autonomy with the collective’s welfare and growth. Even a slightly misaligned objective can be disastrous and unstable. I’m staying followed :) I’ll have some more fleshed-out thoughts soon.

Peter McCrory@PeterMcCrory·4d

I want to share a bit more about my vision for the Economic Research team at Anthropic in the coming years. This is a forward-looking vision. Some pieces we’ve yet to develop. Aspects of this work will surely change. Consider joining the effort. 1/6 docs.google.com/document/d/1OM…

Seth Karten@sethkarten·4d

@jackclarkSF @surmenok @AnthropicAI LLM Economists are the future!

Jack Clark@jackclarkSF·4d

I'm scaling the economic research function here @AnthropicAI to meet the challenge of powerful AI. This team today produces the best data in the industry via the Anthropic Economic Index + recent work on job exposure to AI. We have many very ambitious plans in the works. Join!