Centipede5
@Centipede5dev
125 posts

Rutgers CS - web game dev of 5M+ Players - Currently Agent Puppeteer

Joined April 2018
225 Following · 337 Followers
Pinned Tweet
Centipede5
Centipede5@Centipede5dev·
I took @karpathy's autoresearch loop and applied it to game development; here's what it built last night: Agents read player data → plan improvements → spin up new git branches in a game evolution tree → ship playable HTML5 variants → repeat forever. The system optimizes toward the game variant most likely to be chosen by players. Live leaderboard + playable games at autogamestudio.ai. MIT open source: GitHub.com/centipede5/aut… I feel like we're just beginning to unlock the potential of this beyond ML; soon we'll have self-improvement loops in every product. Very excited to see what comes next.
1 reply · 1 repost · 2 likes · 508 views
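The loop described in the tweet can be sketched in miniature. Everything below is hypothetical (names, the noise stand-in for real play data), not the actual autogamestudio code:

```python
import random

def evolve(games, pick_rate, rounds=3):
    """Toy sketch of an evolutionary game-improvement loop:
    read play data, branch the best-performing variant, ship it, repeat."""
    for _ in range(rounds):
        # 1. read player data: find the variant players choose most
        best = max(games, key=lambda g: pick_rate.get(g, 0.0))
        # 2. plan an improvement and open a branch (here just a child name)
        child = best + "+"
        games.append(child)
        # 3. "ship" the child and collect fresh pick-rate data;
        #    stand-in: the child inherits the parent's rate plus noise
        pick_rate[child] = pick_rate[best] + random.random()
    return max(games, key=lambda g: pick_rate.get(g, 0.0))
```

In the real system, step 2 would be an agent editing code on a new git branch and step 3 would be real players generating the pick-rate data.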
Centipede5
Centipede5@Centipede5dev·
3D spatial reasoning is probably the weakest technical link. I've tried all the frontier models, but they almost always struggle to correctly scale/rotate assets in 3D. Not too big of a deal for me to do manually, but annoying that it breaks the loop for a non-taste reason. I have a few scripts that work ~80% of the time, but it's definitely not solved.
5 replies · 0 reposts · 13 likes · 933 views
Ronnie Stein
Ronnie Stein@LayrKits·
If you’re an indie game dev or AI vibe coder building games, what’s your biggest bottleneck right now?
92 replies · 6 reposts · 77 likes · 17.1K views
Centipede5
Centipede5@Centipede5dev·
This is kind of a meaningless metric if you think about it. Addition of a 1-digit number takes a person maybe a second, a 4-digit number maybe 4 seconds, and so on. You could make the exact same graph with the task of addition and show that there was an "intelligence explosion" in the 1940s. If you use AI regularly you know that long-context tasks are not really the bottleneck anymore, outside of maybe frontier math. Jagged intelligence.
0 replies · 0 reposts · 2 likes · 128 views
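The addition analogy can be made concrete. The one-second-per-digit figure below is just the tweet's rough assumption:

```python
def human_seconds(digits):
    # rough assumption from the post: adding n-digit numbers takes a
    # person about n seconds, so task "length" scales with digit count
    return digits

# A "time horizon" metric asks: what is the longest human-equivalent
# task the machine completes reliably? A 1940s adding machine handles a
# 10-digit sum as easily as a 1-digit one, so on this metric its horizon
# leaps from 1 "human second" to 10 the moment it exists, an apparent
# explosion that says nothing about general intelligence.
horizon = max(human_seconds(d) for d in range(1, 11))
```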
Nikola Jurkovic
Nikola Jurkovic@nikolaj2030·
Half a year ago, METR made an aggressive capability extrapolation that was the 97.5th percentile of an extrapolated distribution. That extrapolation basically came true with Opus 4.6. We called it the worst-case time-horizon, and we are in that world. (Although I think this was not the 97.5th subjective percentile for anyone involved; it was closer to my 70th percentile at the time, but I'm not sure.)
Nikola Jurkovic tweet media
8 replies · 21 reposts · 232 likes · 24.9K views
Centipede5
Centipede5@Centipede5dev·
Autogamestudio has been absolutely insane after gpt-image-2 dropped, custom sprite animations are finally feasible
Centipede5@Centipede5dev

I took @karpathy’s autoresearch loop and applied it to game development, here's what it built last night: Agents read player data → plan improvements → spin up new git branches in a game evolution tree → ship playable HTML5 variants → repeat forever. The system optimizes toward the game variant most likely to be chosen by players. Live leaderboard + playable games at autogamestudio.ai MIT open source: GitHub.com/centipede5/aut… I feel like we're just beginning to unlock the potential of this beyond ML, soon we'll have self-improvement loops in every product. Very excited to see what comes next.

0 replies · 1 repost · 3 likes · 110 views
Qwen
Qwen@Alibaba_Qwen·
Qwen-Image-2.0-Pro is now live 🚀🚀 We’ve pushed image quality, multilingual text rendering, and instruction following to a new level, while making performance much more consistent across styles.🌅🌃 Ranked #9 worldwide for Text-to-Image on @arena 🔗Try it now on ModelScope: modelscope.ai/studios/Qwen/Q… modelscope.cn/studios/Qwen/Q… API:modelstudio.console.alibabacloud.com/ap-southeast-1…
Arena.ai@arena

Qwen Image 2.0 Pro 2026-04-22 lands at #9 in Text-to-Image Arena. Highlights of the latest image model from @Alibaba_Qwen: - #9 Text-to-Image - #17 Image Edit (Single Image) Top 10 in Text-to-Image categories: - #6 Portraits - #7 Photorealistic & Cinematic Imagery - #7 Art Congrats to the @Alibaba_Qwen team on this launch!

125 replies · 358 reposts · 3K likes · 362K views
Centipede5
Centipede5@Centipede5dev·
@mdancho84 Why does this shit keep getting posted every single time a new model comes out???
0 replies · 0 reposts · 36 likes · 588 views
Matt Dancho (Business Science)
Matt Dancho (Business Science)@mdancho84·
🔥 GPT-6 may not just be smarter. It literally might be alive (in the computational sense). A new research paper, SEAL: Self-Adapting Language Models (arXiv:2506.10943), describes how an AI can continuously learn after deployment, evolving its own internal representations without retraining. Here are the details: 🧵
Matt Dancho (Business Science) tweet media
44 replies · 83 reposts · 516 likes · 80.2K views
Centipede5
Centipede5@Centipede5dev·
Surrogate models are going to be the next big thing in LLM harness development
0 replies · 0 reposts · 0 likes · 81 views
Centipede5
Centipede5@Centipede5dev·
@thomasfbloom Human mathematicians won't scale with Moore's law though
0 replies · 0 reposts · 1 like · 230 views
Thomas Bloom
Thomas Bloom@thomasfbloom·
An aspect of using AI to solve maths problems, rarely discussed, is the monetary cost of running these AIs. For example, say an Erdős problem is solved by an AI, and the cost of this run is $10,000. 1/
32 replies · 25 reposts · 325 likes · 85.7K views
Centipede5
Centipede5@Centipede5dev·
Interesting read. I imagine true in-context learning will appear when the memory systems themselves are more integrated into training beyond just learning tool calls, maybe some kind of recurrent attention model. Bitter lesson will eventually come for all of the engineering hacks currently deployed.
0 replies · 0 reposts · 0 likes · 575 views
Centipede5
Centipede5@Centipede5dev·
@alexandr_wang Wow, new SOTA in bioweapons refusal! Never doubt meta superintelligence
0 replies · 0 reposts · 1 like · 2.7K views
Alexandr Wang
Alexandr Wang@alexandr_wang·
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
Alexandr Wang tweet media
728 replies · 1.2K reposts · 10.3K likes · 4.5M views
Centipede5
Centipede5@Centipede5dev·
@joelniklaus Very interesting read! Have you experimented at all with prompt optimization techniques such as GEPA?
1 reply · 0 reposts · 3 likes · 50 views
Joël Niklaus
Joël Niklaus@joelniklaus·
Introducing the Synthetic Data Playbook: We generated over 1T tokens in 90 experiments with 100k+ GPU-hours to figure out what makes good synthetic data and how to generate it at scale huggingface.co/spaces/Hugging…
Joël Niklaus tweet media
28 replies · 216 reposts · 1.4K likes · 121.8K views
Centipede5
Centipede5@Centipede5dev·
@cloneofsimo Thank you for this, the original 2d "donut" chart is extremely misleading
0 replies · 0 reposts · 4 likes · 1.1K views
Simo Ryu
Simo Ryu@cloneofsimo·
Gaussians are not empty inside! This is a common mis-"common-misconception". When you look at the chi-squared distribution it feels like Gaussians are essentially a shell with radius √d, which is the visualization here. But that's not the case: the inside has strictly larger density. It's just the nature of high-dimensional space (where most of the orange is at the peel @tszzl) that's putting more volume at the outer crust, not more density! Btw ||x|| ~ d^1/2 ± O(d^1/4) (as the chi-squared dist has mean d and std sqrt(2d)).
7 replies · 18 reposts · 248 likes · 59.7K views
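The two claims in the tweet, norms concentrating near √d while density still peaks at the origin, are easy to check numerically; a quick stdlib sketch:

```python
import math
import random

random.seed(0)
d = 10_000  # dimension

def sample_norm(d):
    # Euclidean norm of a standard d-dimensional Gaussian sample
    return math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(d)))

norms = [sample_norm(d) for _ in range(50)]
mean_norm = sum(norms) / len(norms)
# concentration: ||x|| ~ sqrt(d) = 100, fluctuating only O(d**0.25) = 10
# (in fact the std of ||x|| is ~1/sqrt(2), far tighter than the O bound)

def log_density(r):
    # log of the Gaussian density at radius r, up to a constant:
    # strictly decreasing in r, so the origin is the densest point
    return -r * r / 2
```

So samples pile up near the √d shell purely because volume grows with radius, even though `log_density` is maximal at 0, which is exactly the tweet's point.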
Centipede5
Centipede5@Centipede5dev·
Huge productivity hack for vibe-building: Have your agent build a simple streamlit / chartjs admin dashboard for whatever you’re working on. Visual debugging >> logs + manual testing. ~60% of the time it exposes a broken or weird architectural choice from a 5 second scan
1 reply · 0 reposts · 0 likes · 235 views
Centipede5
Centipede5@Centipede5dev·
Real ones will know we're on the good AI timeline
Centipede5 tweet media
0 replies · 0 reposts · 0 likes · 154 views
Centipede5
Centipede5@Centipede5dev·
@mirofish_ai is a stupid project created by people who have already forgotten the bitter lesson of ML. Zero chance it goes anywhere, just a really shitty untrained predictive model. Nonetheless there will be people hyping it for a while because it sounds like sci-fi...
0 replies · 0 reposts · 0 likes · 32 views
Centipede5
Centipede5@Centipede5dev·
@adxtyahq 90% of links clicked once does NOT mean 90% of traffic going to one-click links. Most likely the majority of their traffic is still going to the top 10% of links clicked more than once, so the cache hit rate will be significantly higher
0 replies · 0 reposts · 2 likes · 816 views
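The distinction can be checked with a toy, entirely made-up click distribution: a few hot links plus a long tail of links clicked exactly once:

```python
from collections import Counter

# hypothetical URL-shortener click counts: 10 hot links, 90 one-click links
clicks = Counter({f"hot{i}": 1000 for i in range(10)})
clicks.update({f"tail{i}": 1 for i in range(90)})  # update() adds counts

one_click = [url for url, c in clicks.items() if c == 1]
share_of_links = len(one_click) / len(clicks)    # 0.9: 90% of links
total = sum(clicks.values())
share_of_traffic = len(one_click) / total        # ~0.009: under 1% of traffic

# cache the repeat links: every hit after each link's first request is
# served from cache, so the hit rate tracks repeat traffic, not link counts
hits = sum(c - 1 for c in clicks.values() if c > 1)
hit_rate = hits / total                          # ~0.99
```

90% of links are one-click, yet the cache still serves ~99% of requests, which is the point being made in the reply.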
aditya
aditya@adxtyahq·
read this article about a google L7 interview (URL shortener system design)

the dev had everything - redis, nosql, load balancer, horizontal scaling

L7 kept asking:
- why not vertical scaling?
- why do you even need cache?

things that stood out:
- horizontal scaling → coordination overhead at scale
- low cache hit rate → cache miss + db hit = extra latency

made me realize how much of system design is just throwing buzzwords around without actually understanding trade-offs
aditya tweet media
22 replies · 24 reposts · 684 likes · 75.9K views
Centipede5
Centipede5@Centipede5dev·
Most people think "AI research" is some insane galaxy-brain activity only the smartest people in the world could handle. If you actually look through papers you'll find it's mostly:
- buying more data
- paying people to generate more data
- building out virtual environments to create more data
- stealing more data from your competitors

The transformer of today is nearly architecturally identical to GPT-1; almost all advancement comes from bigger models and better data, so it's no surprise that large parts of that are going to be automated.
0 replies · 0 reposts · 0 likes · 159 views
Chubby♨️
Chubby♨️@kimmonismus·
Holy sh*t: The TIMES article about Anthropic contains more serious information between the lines than many realize. Read this article. tl;dr:
- Model releases are now separated by weeks, not months. Some 70% to 90% of the code used in developing future models is now written by Claude.
- Anthropic ended up holding up the release of the new model, known as Claude 3.7 Sonnet, for 10 days until they were certain.
- Staff believe the next few years will be a pivotal test, for the company and the world. "We should operate as if 2026 to 2030 is where all the most important things happen—models becoming faster, better, possibly faster than humans can handle them," says Graham.
- Dario Amodei has warned that AI could displace half of entry-level white collar jobs in one to five years, and urged the government and other AI companies to stop "sugar-coating" it. (...) "It is not clear where these people will go or what they will do," he wrote, "and I am concerned that they could form an unemployed or very-low-wage 'underclass.'"
- Internally, employees began to question if Anthropic had crept to the cusp of the moment they had anticipated with fear and wonder: the arrival of a process known in AI circles as recursive self-improvement.
- Some external experts believe fully automated AI research could be as little as a year away.
Chubby♨️ tweet media
56 replies · 148 reposts · 1.5K likes · 161K views
Centipede5
Centipede5@Centipede5dev·
This is basically a repackage (actually a worse version) of GEPA/MIPRO-style prompt optimizers, hardly revolutionary. It's a known technique, and while it's somewhat effective in some situations (extremely well-defined problems with last-gen models), I've never seen it materially improve performance outside of those, and it usually results in a weird, bloated prompt overfit to the particularities of whatever you trained it on.
0 replies · 0 reposts · 0 likes · 89 views
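For reference, the GEPA/MIPRO family the reply compares against boils down to search over instruction text. A toy hill-climber in that spirit, with a stand-in scorer since a real one would run the prompt against an eval set via an LLM:

```python
import random

def optimize_prompt(base, mutations, score, iters=20, seed=0):
    """Toy hill-climbing prompt optimizer: append a candidate instruction,
    keep it only if the score on the (stand-in) eval improves."""
    rng = random.Random(seed)
    best, best_score = base, score(base)
    for _ in range(iters):
        candidate = best + " " + rng.choice(mutations)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# stand-in scorer rewarding "show your work" phrasing; a real harness
# would measure task accuracy on held-out examples instead
hints = ["Think step by step.", "Answer concisely.", "Show your work."]
prompt, final_score = optimize_prompt(
    "Add the numbers.", hints,
    score=lambda p: p.count("step") + p.count("work"))
```

The result also illustrates the overfitting complaint above: the "optimized" prompt is the base instruction bloated with whatever phrases the scorer happens to reward.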
Robert Youssef
Robert Youssef@rryssf·
Tencent researchers found a way to get reinforcement learning performance without updating a single parameter it costs $18. the RL methods it outperforms cost $10,000+ the method is called Training-Free GRPO, and the core idea is more interesting than the cost savings
Robert Youssef tweet media
23 replies · 53 reposts · 467 likes · 39.3K views
Centipede5
Centipede5@Centipede5dev·
... and now o3 just claimed 2nd overall in poker, beating all the other frontier models and only losing to gpt-5.2
0 replies · 0 reposts · 0 likes · 144 views
Centipede5
Centipede5@Centipede5dev·
Very impressed with o3's performance at @kaggle's game arena. Despite being almost a YEAR old, it still beat all but Google's latest Gemini 3 model in both chess games. Not to mention a very strong performance in poker so far. Just goes to show how much of a revolution thinking models were for structured tasks.
Centipede5 tweet media
1 reply · 0 reposts · 1 like · 253 views