Joshua McVay aka magejosh

6.6K posts

@ErroneousGaes

I'm Awesome, So Are You. True story. Writer, Artist, Actor, AI app dev, AI Artist, and developer of DM Tool Kit, & Purgatory Overhaul for 7 Days To Die.

Oklahoma, USA · Joined July 2010
1.5K Following · 1.3K Followers
Joshua McVay aka magejosh @ErroneousGaes
@creepydotorg Technically we're almost 19 full years past the creation of that timeline, because it started when Judgment Day fell on Aug 29, 1997, and it is only showing what the world looks like in 2029, after the events of Judgment Day occurred.
0 replies · 0 reposts · 13 likes · 973 views
Creepy.org @creepydotorg
Gentle reminder that we’re only 3 years away from this timeline.
394 replies · 2.6K reposts · 18.5K likes · 707.5K views
Joshua McVay aka magejosh @ErroneousGaes
The streaming service that creates an easy-to-use watch-order list system for viewers and fans of large series and cinematic universes, one that can be shared with friends and family and easily viewed in that order, will win the next evolution of streaming.
[image]
0 replies · 0 reposts · 3 likes · 22 views
Kris Kashtanova @icreatelife
Out of the 114,000 accounts following me, I often wonder how many of you actually exist and are human. Say hi or drop an emoji if you are not a robot 🌸 🫶🥹
[image]
513 replies · 61 reposts · 924 likes · 34.2K views
Joshua McVay aka magejosh @ErroneousGaes
Name this Encounter as if it were an adventure for your next TTRPG night.
[image]
0 replies · 0 reposts · 2 likes · 20 views
DiscussingFilm @DiscussingFilm
The ‘BUFFY THE VAMPIRE SLAYER’ sequel series is no longer happening. Hulu has decided not to move forward with the series.
[image]
1.3K replies · 3.1K reposts · 34.1K likes · 2.7M views
Joshua McVay aka magejosh @ErroneousGaes
What exactly is this preventing if you can still walk into a store and buy a gun? Nothing. You might suggest it prevents unauthorized ownership of firearms, but I would ask: are you sure about that? What about criminals who already have them and trade them illegally?
Alder @alder_riley

Damn, they actually passed it? Unlicensed operation of 3D printers and CNCs is now a felony in Washington? I get that it's fashionable to hate manufacturing in some places, but how many kids and FIRST robotics teams are going to end up with criminal records because of this?

0 replies · 0 reposts · 0 likes · 28 views
Joshua McVay aka magejosh @ErroneousGaes
Will be interesting to see this get mixed with Karpathy's autoresearcher.
Ihtesham Ali @ihtesham2005

🚨BREAKING: Princeton just proved that AI agents are throwing away the most valuable data they'll ever collect. And nobody noticed because it looks like normal conversation.

Every time an AI agent takes an action, it receives what researchers call a "next-state signal." A user reply. A tool result. A terminal output. A test verdict. Every existing system takes that signal and uses it as context for the next response. Then discards it forever.

The Princeton team just proved this is one of the most expensive mistakes in AI engineering. Because that signal contains two things nobody was extracting.

First: an implicit score. A user who re-asks a question is telling you the agent failed. A passing test is telling you it succeeded. A detailed error trace is scoring every step that led to it. This is a live, continuous reward signal hiding inside every interaction. Free. Universal. Completely ignored.

Second: a correction direction. When a user writes "you should have checked the file first," they're not just saying the response was wrong. They're specifying which tokens should have been different and how. That's not a scalar reward. That's token-level supervision. And scalar rewards throw every single bit of it away.

They built a system called OpenClaw-RL around recovering both. Then they ran the experiment that changes everything.

An agent started with a personalization score of 0.17. After just 36 normal conversations, with no new training data, no labeled dataset, and no human annotations, the combined method hit 0.81. The agent didn't get retrained. It got used.

That's the part nobody is talking about. The model was serving live requests at the same time it was being trained on them. Four completely decoupled loops running simultaneously. Policy serving. Rollout collection. Reward judging. Weight updates. None waiting for the others. The agent gets smarter every time someone talks to it.

And the deeper the task, the more it matters. On long-horizon agentic tasks, outcome-only rewards give you a signal at the very end of a trajectory and nothing in between. Their process reward model scores every single step using the live next-state signal as evidence. Tool-call accuracy jumped from 0.17 to 0.30. GUI accuracy improved further on top of that.

This creates a shift nobody has fully reckoned with yet. The current paradigm: collect data offline, train in batches, deploy, hope it works. The new paradigm: deploy, extract training signal from every interaction, update continuously, improve automatically.

Every conversation is training data. Every correction is a gradient. Every re-query is a reward signal. The agents that figure this out first won't need bigger datasets. They'll just need more users.

0 replies · 0 reposts · 0 likes · 38 views
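
The quoted thread gives no code, and the actual OpenClaw-RL interface isn't shown, so here is a minimal sketch of the core idea as the thread states it: mine each next-state signal for an implicit scalar reward, and, when the user spells out what should have happened, also keep that text as token-level supervision. Every name here (Transition, implicit_reward, correction_direction, collect_training_batch) is a hypothetical illustration, not the paper's API, and keyword rules stand in for whatever learned judge a real system would use.

```python
# Hypothetical sketch only: these names are NOT from the Princeton paper
# or OpenClaw-RL; they just illustrate the mechanism the thread describes.
from dataclasses import dataclass


@dataclass
class Transition:
    action: str       # what the agent said or did
    next_state: str   # the signal that came back: a user reply, tool
                      # result, terminal output, or test verdict


def implicit_reward(t: Transition) -> float:
    """Signal #1: turn the next-state signal into a scalar score.
    Keyword rules stand in for a learned judge here."""
    s = t.next_state.lower()
    if "test passed" in s or "thanks" in s:
        return 1.0    # explicit success evidence
    if "error" in s or "traceback" in s:
        return -1.0   # explicit failure evidence
    if s.strip().endswith("?"):
        return -0.5   # a re-asked question implies the last answer missed
    return 0.0        # uninformative signal


def correction_direction(t: Transition) -> str | None:
    """Signal #2: if the user states what *should* have happened, keep
    that text as token-level supervision rather than a bare score."""
    s = t.next_state.lower()
    marker = "you should have"
    if marker in s:
        return t.next_state[s.index(marker):]
    return None


def collect_training_batch(log: list[Transition]) -> list[tuple[str, float, str | None]]:
    """The rollout-collection and reward-judging loops, collapsed into one
    synchronous pass for readability; the described system runs serving,
    collection, judging, and weight updates concurrently."""
    return [(t.action, implicit_reward(t), correction_direction(t)) for t in log]


if __name__ == "__main__":
    log = [
        Transition("deleted build/ directory", "you should have checked the file first"),
        Transition("wrote a regression test", "test passed"),
        Transition("answered the question", "wait, what does the flag do?"),
    ]
    for action, reward, correction in collect_training_batch(log):
        print(f"{action!r}: reward={reward:+.1f}, correction={correction!r}")
```

A real implementation would presumably feed the scalar into an RL-style update and the correction text into a supervised token-level loss, which is where the scalar-versus-token distinction the thread draws would pay off.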