Nick Levine

551 posts

@status_effects

training vintage language models

Joined November 2024
1.1K Following · 390 Followers
Pinned Tweet
Nick Levine@status_effects·
How well can language models like Claude Opus and GPT-5.2 write music? Introducing boogiebench: vote in anonymized LLM music composition battles. Unlike Suno, LLMs haven't been trained explicitly on this task, making it a nice generalization test (coding, aesthetics, temporal reasoning). Models often struggle but are rapidly improving, judging by the performance gap between the strongest and weakest models. Here's GPT-5.2 with hyperpop that kinda blew my mind (sound on!) (I jumped out of my seat when I heard the vocalizations, which 5.2 figured out, and which I didn't know were available in this framework)
9
10
60
31K
Nick Levine@status_effects·
@mattbeane thank you! PRs welcome if you want to collaborate :)
0
0
1
9
Nick Levine@status_effects·
@paul_cal there is a harness, code is here, welcome PRs!: github.com/nickslevine/bu… fwiw actual yomi hustle players are telling me their moves are incoherent and bad :). they already pointed out some issues with the harness. lots to do!
2
0
4
163
Nick Levine@status_effects·
@0xAikoDai Would take a little tweaking but no reason why not!
0
0
0
294
Aiko@0xAikoDai·
@status_effects wow this is so cool! wondering if human can co-play with AI in this game mod?
1
0
1
351
Nick Levine@status_effects·
llms can FIGHT now. here's opus as wizard vs gpt-5.4 as robot. calling this budok-ai. it works by modding the brilliant game yomi hustle. 8-model seeded tournament incoming. details and code below:
108
143
1.9K
194.6K
Nick Levine@status_effects·
@nicky_sap @SommerChase Can imagine human-agent collaborations similar to autobattlers where we’re the coach / plan setter and can update instructions between rounds, for example
4
0
3
140
Chase Sommer@SommerChase·
I'm still not sure about 'agentic gaming'. imo the fun of games is playing with other people. agents will make incredible NPCs, but like why play a game others aren't playing? idk, I still see agents as a newer version of npcs that already exist
Beanie@beaniemaxi

The next era of play to earn gaming will be Agentic. 5 years later and we now have the right framework for a sustainable and scalable model. Human labor sweatshop style Filipino Axie farms will be replaced by AI agents. The crypto gaming economy will be 1000x bigger than in 2021.

3
0
7
750
Nick Levine@status_effects·
@Anishfishhh Can’t take any credit for the aesthetic or the game - I just made the mod!
0
0
1
78
Anish@Anishfishhh·
@status_effects ooo would love to see it :) I think it could be a super interesting endeavor. + combining it with unity / more complicated visuals could make it really interesting to watch. Although the pixel aesthetic is already fire
1
0
1
111
Nick Levine@status_effects·
@Anishfishhh Nice. Definitely excited about RL once I get the harness ironed out some more
1
0
0
984
Anish@Anishfishhh·
@status_effects this is so sick, im trying to make something similar but with more of an rl element
1
0
2
1.2K
Nick Levine@status_effects·
@sameQCU 🤘(to be clear, the game is turn-based (I give the agents 60 seconds per move but could set this to whatever), and then we replay the game in real time in the game engine for the video)
1
0
3
33
サメQCU@sameQCU·
this is the most interesting agentic benchmark proposed or developed. super big.
1
0
8
156
サメQCU@sameQCU·
x.com/status_effects… this is whole-ass 'arc agi if arc agi was real'. using a time to move normalization for the llm agents is really meaningful here and makes the interpretation of overall progress on 'agentic and in context learned environments' a lot more straightforward
Nick Levine@status_effects

llms can FIGHT now. here's opus as wizard vs gpt-5.4 as robot. calling this budok-ai. it works by modding the brilliant game yomi hustle. 8-model seeded tournament incoming. details and code below:

1
1
32
2.5K
Nick Levine@status_effects·
@floinkus Can’t take any credit - it’s all Yomi hustle (the game I’m modding) that handles it. But yes, it’s calculating everything frame by frame.
1
0
9
360
Nick Levine@status_effects·
@floinkus It’s a turn-based game. After a match is over we can replay it / render it in “real time” in the engine.
1
0
27
1.5K
Nick Levine@status_effects·
@MatanHalevy @lu_sichu It’s actually turn-based, but the game lets us replay matches after they’re done as if they’re live!
1
0
42
1.9K
Nick Levine@status_effects·
@roycebracketfgc please do! down to collab with whoever's interested. going to post an eight model tournament soon. would be fun to get people's takes on how the models play.
0
0
2
59
roycebracket@roycebracketfgc·
@status_effects respond in kind is a really funny turn of events that i would've never seen coming back then, but i'm impressed with how well it works. i'm gonna talk to people in the community and see if this work gives them any ideas and i'll let you know if we come up with something cool!
1
0
2
80
Nick Levine@status_effects·
@roycebracketfgc took some experimentation, but see here for example. Main thing was making it clear to them what moves are actually in range, if any, and nudging them out of repetition loops. github.com/nickslevine/bu…#L258
1
0
2
91
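The two prompt fixes mentioned above (spelling out which moves are in range, and nudging the model out of repetition loops) might look something like this. It is a sketch under assumptions, not the code behind the truncated link: the `name` and `reach` fields, the `repeat_limit` threshold, and the prompt wording are all invented for illustration.

```python
def in_range_moves(moves: list[dict], distance: float) -> list[str]:
    """Names of moves whose (hypothetical) `reach` covers the current distance."""
    return [m["name"] for m in moves if m["reach"] >= distance]


def build_prompt(moves: list[dict], distance: float,
                 history: list[str], repeat_limit: int = 3) -> str:
    """Assemble a short text prompt from state, with range and repetition hints."""
    lines = [f"Distance to opponent: {distance}"]
    usable = in_range_moves(moves, distance)
    if usable:
        lines.append("Moves in range: " + ", ".join(usable))
    else:
        lines.append("No attacks are in range; consider moving closer.")
    # Nudge the model if its last `repeat_limit` moves were all identical.
    recent = history[-repeat_limit:]
    if len(recent) == repeat_limit and len(set(recent)) == 1:
        lines.append(
            f"You have used '{recent[0]}' {repeat_limit} times in a row; "
            "try something different."
        )
    return "\n".join(lines)


if __name__ == "__main__":
    moves = [{"name": "jab", "reach": 1.0}, {"name": "fireball", "reach": 5.0}]
    print(build_prompt(moves, distance=3.0, history=["jab", "jab", "jab"]))
```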
roycebracket@roycebracketfgc·
@status_effects yeah i was just poring over some of the docs! the whole system for capturing and enumerating game state & possible legal moves is really interesting and might be of use to modders in the community. how on earth are you converting all that to a prompt the model can read though?
1
0
2
100
Nick Levine@status_effects·
@roycebracketfgc i made a mod that coordinates with an external python process. the mod sends data out (just simple dicts of state) and waits for the LLM responses. code is below! would be fun to have experts at the game compete by writing dueling prompts instead of making decisions directly
1
0
4
928
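The Python side of the handoff described above (the mod sends a state dict, then waits for the model's answer) can be sketched roughly like this. It is a minimal illustration and not the actual mod protocol: the JSON framing, the `turn` and `legal_moves` keys, the `"wait"` fallback, and the `stub_llm` chooser are all assumptions.

```python
import json
from typing import Callable

# Hypothetical message shape: one JSON object per turn, e.g.
# {"turn": 3, "hp": 80, "legal_moves": ["block", "jab"]}.
State = dict
Chooser = Callable[[State], str]


def handle_turn(raw_message: str, choose_move: Chooser) -> str:
    """Decode one state dict, ask the chooser (an LLM call in practice)
    for a move, and return a JSON reply for the mod to read."""
    state = json.loads(raw_message)
    legal = state.get("legal_moves", [])
    move = choose_move(state)
    # Never return a move the game did not offer; fall back to the
    # first legal move, or a placeholder "wait" if none were listed.
    if move not in legal:
        move = legal[0] if legal else "wait"
    return json.dumps({"turn": state.get("turn"), "move": move})


def stub_llm(state: State) -> str:
    """Stand-in for a real model call: always picks the last legal move."""
    return state["legal_moves"][-1]


if __name__ == "__main__":
    msg = json.dumps({"turn": 3, "hp": 80, "legal_moves": ["block", "jab"]})
    print(handle_turn(msg, stub_llm))
```

Passing the chooser in as a function keeps the transport separate from the model call, so swapping in a different prompt (for example, one written by an expert player) would not touch the message handling.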
roycebracket@roycebracketfgc·
@status_effects huh! very interested in how the input-output and "rendering" of game state is working for this. is it converting replay files to text and then "rendering" them repeatedly? i remember people trying to do this years ago with that approach but we decided it was too much effort
1
0
3
1.3K