Pinned post
Simon Radner
20 posts


started turning this into a benchmark. the constraint: it has to be independent of the specific game, relying only on the ScummVM engine.
defining “progress” under that constraint is hard. current hack: measure what changes. not sure if that’s progress or just exploration.
github.com/rabengraph/scu…

@alkampfer Starting prompt is basically “read the instructions and make progress in the game” — the instructions expose a JS API to get game state + logs
will share more soon

@simonradner Is the starting prompt shared? I imagine it uses subagents and compresses the context to reach the end. Really interesting experiment.

@Estudio528 it’s actually trying to find the Scumm Bar; in this run it expects it somewhere in the village and only finds it later
in about half the runs it goes there straight away, it’s probabilistic

@simonradner He skipped going into the bar – something a human would rarely do. And the Scumm Bar at that, which is the first place you’re supposed to go. Although he might not have spotted the door properly; Melee Island is quite dark. 😁

@artlantiko what do you envision? could have the agent yield small summaries as it goes

@simonradner Pretty cool. Can you see some kind of timeline of the decisions it makes?

@opinjonated glad it makes sense. there's more going on under the hood, will share soon

@simonradner Ah, never mind. Found the YouTube link and now I understand what you did. Very clever, I can see other ways to make use of this method. Thank you!

@damageboy haha I think you’re safe — Monkey Island needs creative, humorous thinking to solve, and the agent mostly lacks that
might brute force it eventually though

Oh, now I feel replaced
Simon Radner @simonradner
Claude playing Monkey Island: about as good as I was at 14.
you can watch its reasoning while it plays

@the_ultralazr sure, ClaudeCast is great. happy to share the session log, feel free to DM me

@simonradner Love it! Would you be open to sharing that session log for an episode of the ClaudeCast podcast?

Doing this on pi ATM ... cc sucks 😭
Simon Radner @simonradner
Claude playing Monkey Island: about as good as I was at 14.
you can watch its reasoning while it plays

@dvygh @badlogicgames yeah, but it mixes up walkthroughs from different versions, and there’s still a gap between lossy knowledge and acting correctly

@simonradner @badlogicgames it should have a walkthrough or two in its training data tho?

@RMWinslow haha
Monkey Island is basically full of prompt injections and you can see the model fall for some of them too
got a bit lucky here: getting into the kitchen requires timing it with the cook serving grog
in another run it reduced the sleep between retries to catch the right moment

@simonradner I remember getting really stumped in this game because I interpreted the Red Herring as a gag meant to be ignored, and completely forgot about it for later.
In retrospect, it feels a bit like I prompt injected myself.

@davidatnilsson yep, that’s next up
going to put a simple version online so @moltbook folks can try it out

@simonradner This is cute. Somebody should set up an adventure games benchmark for LLMs.
Simon Radner reposted

Meet our next #WCEU 2014 speaker - @automattic experience director Davide Casali → 2014.europe.wordcamp.org/2014/09/02/wce…


hi to all at #wceu! looking forward to tomorrow, it's gonna be a blast!


