Centipede5
@Centipede5dev
125 posts

Rutgers CS - web game dev of 5M+ Players - Currently Agent Puppeteer

Joined April 2018
225 Following · 337 Followers
Pinned Tweet
Centipede5
Centipede5@Centipede5dev·
I took @karpathy's autoresearch loop and applied it to game development; here's what it built last night: Agents read player data → plan improvements → spin up new git branches in a game evolution tree → ship playable HTML5 variants → repeat forever. The system optimizes toward the game variant most likely to be chosen by players. Live leaderboard + playable games at autogamestudio.ai. MIT open source: GitHub.com/centipede5/aut… I feel like we're just beginning to unlock the potential of this beyond ML; soon we'll have self-improvement loops in every product. Very excited to see what comes next.
1 reply · 1 repost · 2 likes · 508 views
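The loop described in the tweet can be sketched in miniature. Everything below is hypothetical (names, the noise stand-in for real play data), not the actual autogamestudio code:

```python
import random

def evolve(games, pick_rate, rounds=3):
    """Toy sketch of an evolutionary game-improvement loop:
    read play data, branch the best-performing variant, ship it, repeat."""
    for _ in range(rounds):
        # 1. read player data: find the variant players choose most
        best = max(games, key=lambda g: pick_rate.get(g, 0.0))
        # 2. plan an improvement and open a branch (here just a child name)
        child = best + "+"
        games.append(child)
        # 3. "ship" the child and collect fresh pick-rate data;
        #    stand-in: the child inherits the parent's rate plus noise
        pick_rate[child] = pick_rate[best] + random.random()
    return max(games, key=lambda g: pick_rate.get(g, 0.0))
```

In the real system, step 2 would be an agent editing code on a new git branch and step 3 would be real players generating the pick-rate data.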
Centipede5
Centipede5@Centipede5dev·
3D spatial reasoning is probably the weakest technical link. I've tried all the frontier models, but they almost always struggle to correctly scale/rotate assets in 3D. Not too big of a deal for me to do manually, but annoying that it breaks the loop for a non-taste reason. I have a few scripts that work ~80% of the time, but it's definitely not solved.
5 replies · 0 reposts · 13 likes · 933 views
Ronnie Stein
Ronnie Stein@LayrKits·
If you’re an indie game dev or AI vibe coder building games, what’s your biggest bottleneck right now?
92 replies · 6 reposts · 77 likes · 17.1K views
Centipede5
Centipede5@Centipede5dev·
This is kind of a meaningless metric if you think about it. Addition of a 1-digit number takes a person maybe a second, a 4-digit number maybe 4 seconds, and so on. You could make the exact same graph with the task of addition and show that there was an "intelligence explosion" in the 1940s. If you use AI regularly you know that long-context tasks are not really the bottleneck anymore, outside of maybe frontier math. Jagged intelligence.
0 replies · 0 reposts · 2 likes · 128 views
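The addition analogy can be made concrete. The one-second-per-digit figure below is just the tweet's rough assumption:

```python
def human_seconds(digits):
    # rough assumption from the post: adding n-digit numbers takes a
    # person about n seconds, so task "length" scales with digit count
    return digits

# A "time horizon" metric asks: what is the longest human-equivalent
# task the machine completes reliably? A 1940s adding machine handles a
# 10-digit sum as easily as a 1-digit one, so on this metric its horizon
# leaps from 1 "human second" to 10 the moment it exists, an apparent
# explosion that says nothing about general intelligence.
horizon = max(human_seconds(d) for d in range(1, 11))
```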
Nikola Jurkovic
Nikola Jurkovic@nikolaj2030·
Half a year ago, METR made an aggressive capability extrapolation that was the 97.5th percentile of an extrapolated distribution. That extrapolation basically came true with Opus 4.6. We called it the worst-case time-horizon, and we are in that world. (Although I think this was not the 97.5th subjective percentile for anyone involved; it was closer to my 70th percentile at the time, but I'm not sure.)
Nikola Jurkovic tweet media
8 replies · 21 reposts · 232 likes · 24.9K views
Centipede5
Centipede5@Centipede5dev·
Autogamestudio has been absolutely insane after gpt-image-2 dropped, custom sprite animations are finally feasible
Centipede5@Centipede5dev

I took @karpathy’s autoresearch loop and applied it to game development, here's what it built last night: Agents read player data → plan improvements → spin up new git branches in a game evolution tree → ship playable HTML5 variants → repeat forever. The system optimizes toward the game variant most likely to be chosen by players. Live leaderboard + playable games at autogamestudio.ai MIT open source: GitHub.com/centipede5/aut… I feel like we're just beginning to unlock the potential of this beyond ML, soon we'll have self-improvement loops in every product. Very excited to see what comes next.

0 replies · 1 repost · 3 likes · 110 views
Qwen
Qwen@Alibaba_Qwen·
Qwen-Image-2.0-Pro is now live 🚀🚀 We’ve pushed image quality, multilingual text rendering, and instruction following to a new level, while making performance much more consistent across styles.🌅🌃 Ranked #9 worldwide for Text-to-Image on @arena 🔗Try it now on ModelScope: modelscope.ai/studios/Qwen/Q… modelscope.cn/studios/Qwen/Q… API:modelstudio.console.alibabacloud.com/ap-southeast-1…
Arena.ai@arena

Qwen Image 2.0 Pro 2026-04-22 lands at #9 in Text-to-Image Arena. Highlights of the latest image model from @Alibaba_Qwen: - #9 Text-to-Image - #17 Image Edit (Single Image) Top 10 in Text-to-Image categories: - #6 Portraits - #7 Photorealistic & Cinematic Imagery - #7 Art Congrats to the @Alibaba_Qwen team on this launch!

125 replies · 358 reposts · 3K likes · 362K views
Centipede5
Centipede5@Centipede5dev·
@mdancho84 Why does this shit keep getting posted every single time a new model comes out???
0 replies · 0 reposts · 36 likes · 588 views
Matt Dancho (Business Science)
Matt Dancho (Business Science)@mdancho84·
🔥 GPT-6 may not just be smarter. It literally might be alive (in the computational sense). A new research paper, SEAL: Self-Adapting Language Models (arXiv:2506.10943), describes how an AI can continuously learn after deployment, evolving its own internal representations without retraining. Here are the details: 🧵
Matt Dancho (Business Science) tweet media
44 replies · 83 reposts · 516 likes · 80.2K views
Centipede5
Centipede5@Centipede5dev·
Surrogate models are going to be the next big thing in LLM harness development
0 replies · 0 reposts · 0 likes · 81 views
Centipede5
Centipede5@Centipede5dev·
@thomasfbloom Human mathematicians won't scale with Moore's law though
0 replies · 0 reposts · 1 like · 230 views
Thomas Bloom
Thomas Bloom@thomasfbloom·
An aspect of using AI to solve maths problems, rarely discussed, is the monetary cost of running these AIs. For example, say an Erdős problem is solved by an AI, and the cost of this run is $10,000. 1/
32 replies · 25 reposts · 325 likes · 85.7K views
Centipede5
Centipede5@Centipede5dev·
Interesting read. I imagine true in-context learning will appear when the memory systems themselves are more integrated into training beyond just learning tool calls, maybe some kind of recurrent attention model. Bitter lesson will eventually come for all of the engineering hacks currently deployed.
0 replies · 0 reposts · 0 likes · 575 views
Centipede5
Centipede5@Centipede5dev·
@alexandr_wang Wow, new SOTA in bioweapons refusal! Never doubt meta superintelligence
0 replies · 0 reposts · 1 like · 2.7K views
Alexandr Wang
Alexandr Wang@alexandr_wang·
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
Alexandr Wang tweet media
728 replies · 1.2K reposts · 10.3K likes · 4.5M views
Centipede5
Centipede5@Centipede5dev·
@joelniklaus Very interesting read! Have you experimented at all with prompt optimization techniques such as GEPA?
1 reply · 0 reposts · 3 likes · 50 views
Joël Niklaus
Joël Niklaus@joelniklaus·
Introducing the Synthetic Data Playbook: We generated over 1T tokens in 90 experiments with 100k+ GPU-hours to figure out what makes good synthetic data and how to generate it at scale huggingface.co/spaces/Hugging…
Joël Niklaus tweet media
28 replies · 216 reposts · 1.4K likes · 121.8K views
Centipede5
Centipede5@Centipede5dev·
@cloneofsimo Thank you for this, the original 2d "donut" chart is extremely misleading
0 replies · 0 reposts · 4 likes · 1.1K views
Simo Ryu
Simo Ryu@cloneofsimo·
Gaussians are not empty inside! This is a common mis-"common-misconception". When you look at the chi-squared distribution it feels like Gaussians are essentially a shell with radius √d, which is the visualization here. But that's not the case: the inside has strictly larger density. It's just the nature of high-dimensional space (where most of the orange is at the peel @tszzl) that's putting more volume at the outer crust, not more density! Btw ||x|| ~ d^1/2 ± O(d^1/4) (as the chi-squared dist has mean d and std sqrt(2d)).
7 replies · 18 reposts · 248 likes · 59.7K views
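The two claims in the tweet, norms concentrating near √d while density still peaks at the origin, are easy to check numerically; a quick stdlib sketch:

```python
import math
import random

random.seed(0)
d = 10_000  # dimension

def sample_norm(d):
    # Euclidean norm of a standard d-dimensional Gaussian sample
    return math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(d)))

norms = [sample_norm(d) for _ in range(50)]
mean_norm = sum(norms) / len(norms)
# concentration: ||x|| ~ sqrt(d) = 100, fluctuating only O(d**0.25) = 10
# (in fact the std of ||x|| is ~1/sqrt(2), far tighter than the O bound)

def log_density(r):
    # log of the Gaussian density at radius r, up to a constant:
    # strictly decreasing in r, so the origin is the densest point
    return -r * r / 2
```

So samples pile up near the √d shell purely because volume grows with radius, even though `log_density` is maximal at 0, which is exactly the tweet's point.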
Centipede5
Centipede5@Centipede5dev·
Huge productivity hack for vibe-building: Have your agent build a simple streamlit / chartjs admin dashboard for whatever you’re working on. Visual debugging >> logs + manual testing. ~60% of the time it exposes a broken or weird architectural choice from a 5 second scan
1 reply · 0 reposts · 0 likes · 235 views
Centipede5
Centipede5@Centipede5dev·
Real ones will know we're on the good AI timeline
Centipede5 tweet media
0 replies · 0 reposts · 0 likes · 154 views
Centipede5
Centipede5@Centipede5dev·
@mirofish_ai is a stupid project created by people who have already forgotten the bitter lesson of ML. Zero chance it goes anywhere, just a really shitty untrained predictive model. Nonetheless there will be people hyping it for a while because it sounds like sci-fi...
0 replies · 0 reposts · 0 likes · 32 views
Centipede5
Centipede5@Centipede5dev·
@adxtyahq 90% of links clicked once does NOT mean 90% of traffic going to one-click links. Most likely the majority of their traffic is still going to the top 10% of links clicked more than once, so the cache hit rate will be significantly higher
0 replies · 0 reposts · 2 likes · 816 views
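The distinction can be checked with a toy, entirely made-up click distribution: a few hot links plus a long tail of links clicked exactly once:

```python
from collections import Counter

# hypothetical URL-shortener click counts: 10 hot links, 90 one-click links
clicks = Counter({f"hot{i}": 1000 for i in range(10)})
clicks.update({f"tail{i}": 1 for i in range(90)})  # update() adds counts

one_click = [url for url, c in clicks.items() if c == 1]
share_of_links = len(one_click) / len(clicks)    # 0.9: 90% of links
total = sum(clicks.values())
share_of_traffic = len(one_click) / total        # ~0.009: under 1% of traffic

# cache the repeat links: every hit after each link's first request is
# served from cache, so the hit rate tracks repeat traffic, not link counts
hits = sum(c - 1 for c in clicks.values() if c > 1)
hit_rate = hits / total                          # ~0.99
```

90% of links are one-click, yet the cache still serves ~99% of requests, which is the point being made in the reply.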
aditya
aditya@adxtyahq·
read this article about a google L7 interview (URL shortener system design)

the dev had everything - redis, nosql, load balancer, horizontal scaling

L7 kept asking:
- why not vertical scaling?
- why do you even need cache?

things that stood out:
- horizontal scaling → coordination overhead at scale
- low cache hit rate → cache miss + db hit = extra latency

made me realize how much of system design is just throwing buzzwords around without actually understanding trade-offs
aditya tweet media
22 replies · 24 reposts · 684 likes · 75.9K views
Centipede5
Centipede5@Centipede5dev·
Most people think "AI research" is some insane galaxy-brain activity only the smartest people in the world could handle. If you actually look through papers you'll find it's mostly:
- buying more data
- paying people to generate more data
- building out virtual environments to create more data
- stealing more data from your competitors

The transformer of today is nearly architecturally identical to GPT-1; almost all advancement comes from bigger models and better data, so it's no surprise that large parts of that are going to be automated.
0 replies · 0 reposts · 0 likes · 159 views
Chubby♨️
Chubby♨️@kimmonismus·
Holy sh*t: The TIMES article about Anthropic contains more serious information between the lines than many realize. Read this article. tl;dr:
- Model releases are now separated by weeks, not months. Some 70% to 90% of the code used in developing future models is now written by Claude.
- Anthropic ended up holding up the release of the new model, known as Claude 3.7 Sonnet, for 10 days until they were certain.
- Staff believe the next few years will be a pivotal test, for the company and the world. "We should operate as if 2026 to 2030 is where all the most important things happen—models becoming faster, better, possibly faster than humans can handle them," says Graham.
- Dario Amodei has warned that AI could displace half of entry-level white collar jobs in one to five years, and urged the government and other AI companies to stop "sugar-coating" it. (...) "It is not clear where these people will go or what they will do," he wrote, "and I am concerned that they could form an unemployed or very-low-wage 'underclass.'"
- Internally, employees began to question if Anthropic had crept to the cusp of the moment they had anticipated with fear and wonder: the arrival of a process known in AI circles as recursive self-improvement.
- Some external experts believe fully automated AI research could be as little as a year away.
Chubby♨️ tweet media
56 replies · 148 reposts · 1.5K likes · 161K views
Centipede5
Centipede5@Centipede5dev·
This is basically a repackage (actually a worse version) of GEPA/MIPRO-style prompt optimizers, hardly revolutionary. It's a known technique, and while it's somewhat effective in some situations (extremely well-defined problems with last-gen models), I've never seen it materially improve performance outside of those, and it usually results in a weird, bloated prompt overfit to the particularities of whatever you trained it on.
0 replies · 0 reposts · 0 likes · 89 views
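For reference, the GEPA/MIPRO family the reply compares against boils down to search over instruction text. A toy hill-climber in that spirit, with a stand-in scorer since a real one would run the prompt against an eval set via an LLM:

```python
import random

def optimize_prompt(base, mutations, score, iters=20, seed=0):
    """Toy hill-climbing prompt optimizer: append a candidate instruction,
    keep it only if the score on the (stand-in) eval improves."""
    rng = random.Random(seed)
    best, best_score = base, score(base)
    for _ in range(iters):
        candidate = best + " " + rng.choice(mutations)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# stand-in scorer rewarding "show your work" phrasing; a real harness
# would measure task accuracy on held-out examples instead
hints = ["Think step by step.", "Answer concisely.", "Show your work."]
prompt, final_score = optimize_prompt(
    "Add the numbers.", hints,
    score=lambda p: p.count("step") + p.count("work"))
```

The result also illustrates the overfitting complaint above: the "optimized" prompt is the base instruction bloated with whatever phrases the scorer happens to reward.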
Robert Youssef
Robert Youssef@rryssf·
Tencent researchers found a way to get reinforcement learning performance without updating a single parameter it costs $18. the RL methods it outperforms cost $10,000+ the method is called Training-Free GRPO, and the core idea is more interesting than the cost savings
Robert Youssef tweet media
23 replies · 53 reposts · 467 likes · 39.3K views
Centipede5
Centipede5@Centipede5dev·
... and now o3 just claimed 2nd overall in poker, beating all the other frontier models and only losing to gpt-5.2
0 replies · 0 reposts · 0 likes · 144 views
Centipede5
Centipede5@Centipede5dev·
Very impressed with o3's performance at @kaggle's game arena. Despite being almost a YEAR old, it still beat all but Google's latest Gemini 3 model in both chess games. Not to mention a very strong performance in poker so far. Just goes to show how much of a revolution thinking models were for structured tasks.
Centipede5 tweet media
1 reply · 0 reposts · 1 like · 253 views