404prophet

265 posts

404prophet

@lightscaner

Making stuff.

PNW Tham gia Ocak 2025

467 Đang theo dõi42 Người theo dõi

404prophet@lightscaner·40m

@mandyarthur @grok @972mag has some good write ups on the details 972mag.com/lavender-ai-is…

English

Mandy Arthur@mandyarthur·1d

Hey @grok, does Israel have an AI system named “The Gospel” that an Israeli officer admitted mostly kills women and children? Be concise. No spin.

English

284

5.8K

29.2K

2.6M

404prophet@lightscaner·2h

@Luna0wl @cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx Feels a bit like the internet of old to start keeping bookmarks again to find information. We are going backwards but it's nostalgic.

English

Luna0wl@Luna0wl·2h

@lightscaner @cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx His site smukx.site

English

cr3ghost@cr3ghost·10h

This sets a bad precedent. First the MSRC situation with @jonasLyk @WeirdQuadratic @ChaoticEclipse0, then GitHub on @5mukx, now X banning @5mukx. If we do not stand up now, more of us will be next. Open-source projects, security posts, bug reports. Where does it stop? Repost this. The community needs a voice. #StopTheBan

𝕡𝕨𝕟𝕚𝕖@0day_ninja

This is my second mutual that has been suspended this week. What's up with X really

English

140

24.3K

404prophet đã retweet

Zhihu Frontier@ZhihuFrontier·11h

Why Would GLM-5.2 Move Away From GRPO? 🌟Insights from Zhihu contributor 九老师 TL;DR: GLM-5.2 dropping GRPO does not mean GRPO is “bad.” It means the assumptions that made GRPO attractive for short LLM RL tasks may no longer hold for long-horizon agentic tasks. When rollouts get longer, environments get noisier, and credit assignment gets harder, PPO + value modeling starts looking useful again. The key question is not simply “why did GLM-5.2 stop using GRPO?” A better question is: why did GRPO become useful for LLM RL in the first place? If the reasons that made GRPO attractive no longer hold, then going back to PPO becomes natural. GRPO can be understood as a sampled-baseline method. Instead of training a separate value model, it samples multiple responses for the same prompt and uses the group average as a baseline. That is elegant. You get a relative reward signal without paying for a separate critic. In short tasks, this is very appealing. But there is a tradeoff.⚖️ PPO uses a learned value function, or critic. This critic is expensive and harder to tune. It also has its own problems: the policy keeps changing, so the value model is always trying to follow a moving target. That can introduce bias. GRPO avoids that by using an up-to-date sampled baseline. It is closer to low-bias, but it tends to have higher variance. For early LLM RL tasks, that tradeoff made sense: • Rollouts were short • Final rewards were clear • Memory savings mattered a lot • Multiple samples per prompt were manageable • Math/code tasks were relatively easy to verify That is why GRPO worked so well for many short, verifiable reasoning tasks. But long-horizon agentic tasks change the game. 🎮 A long agent task can look much more like a game environment: • Many steps • Tool calls • Partial progress • Delayed failure • Noisy observations • Intermediate rewards • Wrong action penalties • Context compression • Different paths to the same final answer This is where GRPO starts to struggle. The biggest issue is credit assignment. In GRPO, the final reward is applied broadly across the whole trajectory. If a task succeeds, many tokens get rewarded. If it fails, many tokens get punished. But in a long task, that is too coarse. Maybe the first half was bad, but the final recovery was good. Maybe one tool call at step 30 caused failure at step 100. Maybe two successful trajectories are not really comparable because one used 4K tokens and another used 200K tokens with heavy tool use and context compression. GRPO sees the final outcome. It does not naturally know which step actually mattered. That creates high variance. In short tasks, group comparison works well. In long tasks, group sampling can collapse into two bad cases: 1. All samples fail The whole expensive rollout gives almost no useful training signal. 2. Only one sample succeeds That single success may be luck, but GRPO may treat it as a strong positive signal and over-reward the trajectory. Both are dangerous for long agentic training. This is where PPO’s critic becomes valuable again. A value model can learn expected value under noisy states. It can provide denser feedback before the full rollout ends. It is more expensive, but it helps with long-horizon credit assignment. So the author’s view is: GRPO is not being rejected because it was wrong. It is being outgrown by the task format. For short, deterministic, verifiable tasks, GRPO remains strong. For long, noisy, tool-heavy agentic tasks, PPO-style value modeling may simply be the better fit. The “compaction problem” mentioned around long contexts is likely more of a symptom. The deeper issue is that GRPO’s weaknesses become costly when trajectories are long and states keep changing. Could GRPO still work? Yes, if paired with a strong Process Reward Model. The author points out that DeepSeek MathV2 uses this direction. Process-level signals can help fix GRPO’s sparse-reward weakness. But without that, returning to PPO makes sense. 🎯The bigger takeaway: GRPO saved the value model. PPO brings it back. GRPO’s main advantage was efficiency. It removed the critic and saved resources. But for long-horizon agentic tasks, the critic’s ability to generalize and assign credit may be worth the cost again. In the Agent era, RL for LLMs is becoming less like solving a short math problem and more like training an agent to play a long, noisy game. And for that world, value models may still be the soul of RL. 🔗Full Reading (CN): zhihu.com/question/20521…

English

422

123.4K

404prophet@lightscaner·2h

Above was opus 4.8. Meanwhile 5.5 just couldn't make a useable STL for the life of it but it was knocking it out of the park with image ideas in later prompts that I started using as input into the claude thread that was working with the stl side. The initial prompt I gave to both of them was the same and both had the project context. Was surprised 5.5 didn't also one shot it. Maybe could have got it right but it was so bad out of the gate I stopped. Both of the below images are 5.5. It's like codex is good at creating visual things if they don't exist but it can't read visual information that otherwise already exists (or that it created) / still sucks at UI.

English

404prophet@lightscaner·3h

What is your latest "one shot" that was almost an afterthought and surprised you? Doesn't have to be the most complex or largest just the last time you were caught off guard on a throwaway idea or shot for the stars. I just broke some hangers I have and was already short so figured I could just 3D print some cool replacements. Lots of free stl's but I wanted something cool. I have an existing project that uses cadquery for making juggling clubs so gave the prompt a starting place. The prompt was just "this is kinda one off request so you can use existing codebae but dont need to think much more than that about it, its just a refernece right now. Can you build me a sturdy psychedelic looking clothes hanger stl that i can print in a bambu h2d? Give me 3 different designs and think them through" & "can you put together a one off webpage where i can view each of them in one nice organized place?" (keeping the misspelled words because goes to show the vectors don't care and I added more later which is why there are more than 3 in the pic but it basically got it off the rip)

English

404prophet@lightscaner·3h

@wtf_nakul7 @priya_Thakur786 futurism.com/artificial-int…

QME

Nakul@wtf_nakul7·8h

@priya_Thakur786 Fr? 💀

priya@priya_Thakur786·8h

CEO of OpenAI btw

English

1.3K

404prophet@lightscaner·3h

I wanted to get into security as a kid instead of development but since I couldn't find 0 days myself I felt like I wouldn't be able to hang or would feel like a fraud. I have learned security is so much more over the years and I would have done fine and really enjoyed infosec. If I have to look for work again it will be in that industry. The fact I'm over 30 and haven't made it to defcon yet is a crime.

English

Zack Korman@ZackKorman·4h

So about that cybersecurity budget, boss

English

2.1K

404prophet đã retweet

blackorbird@blackorbird·15h

PixelSmash – Critical FFmpeg Vulnerability Turns Media Files into Weapons A critical vulnerability in FFmpeg's MagicYUV decoder leads to remote code execution via a crafted media file jfrog.com/blog/pixelsmas…

English

5.1K

404prophet@lightscaner·3h

@shotgunner101 @cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx They had enough pull to get nightmares banned on gitlab. Elon would do anything for a quick buck or to feel like hes one of the cool kids. Would totally placate the right person.

English

Dodge This Security@shotgunner101·7h

@cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx Wouldt doubt if MS did it. They already threatened researchers who publish research. And right before this his github got banned and he called them out on twitter.

English

715

404prophet@lightscaner·3h

@AbhiCodes15 Building something while also thinking about building other things because I have more ideas than time and I guess you gotta sleep.

English

Abhijit@AbhiCodes15·1d

Be honest: Are you currently building something or just thinking about building something?

English

200

249

8.4K

404prophet đã retweet

Oscar Le@oscarlehuu·16h

After everything, I’m convinced Anthropic, OpenAI, and Google are all playing us. Look at the timing. While Fable 5 was available, we saw a flood of rumors and shilling around GPT-5.6. The moment Fable 5 got pulled, OpenAI went dead silent on it. Then Fable 5 supposedly comes back — and GPT-5.6 info resurfaces not long after. And today? Fable 5 still isn’t back, OpenAI is reportedly delaying the GPT-5.6 launch, and Google DeepMind is suddenly “not satisfied” with Gemini 3.5 Pro. You see the pattern yet? One more thing worth noting: a handful of companies still have access to Mythos. I think there’s an alliance between them.

English

293

36.5K

404prophet@lightscaner·4h

@henokcrypto @AnthropicAI @DarioAmodei They are having private meetings with Epstine's old neighbor and visitor to "the island" Mr Howard nutlicker himself.

English

Henok@henokcrypto·1d

Fable 5 is nowhere to be found And a public statement is nowhere to be found Where are the Leaders at @AnthropicAI Where are the people with courage and a spine Where’s the fire in your belly @DarioAmodei Need I shame you into being a leader? SAY SOMETHING

English

4.3K

404prophet@lightscaner·4h

@vxunderground Don't worry the NSA is too busy hacking themselves with toys to bother with this as a national threat even though Elon takes in more gov money than some entire states GDP.

English

vx-underground@vxunderground·1d

> be spaceX employee > be rustled > say spaceX sucks > go on dread > advertise being an insider threat > verified by dread as being legit spaceX employee > offer access to ransomware groups > everyone see it > everyone on telegram talking about it

English

896

32.6K

404prophet@lightscaner·4h

@jun_song Almost got me lol

English

Jun Song@jun_song·1d

GPT-5.6 seems very disappointing. Nothing better than GLM-5.2

English

368

59.9K

404prophet@lightscaner·4h

@renegadegenesis That's a lot of trust. Someone asked if it re-reads the doc and you said "wym?". You should probably test this out a little more im going to guess.

English

Maybe: Adam@renegadegenesis·19h

the oligarchy doesn’t want the peasants to know you can `/goal <goal_doc>.md` and adjust the doc while codex works on it, without interrupting the run

English

362

404prophet@lightscaner·4h

@TTrimoreau To stay one step ahead of the sky pigs.

English

Thomas Trimoreau@TTrimoreau·9h

Tell me in 3 words why people keep using your product 👇

English

1.6K

404prophet@lightscaner·4h

@mcnabbd It's math and silicon dude. It experiences the same as that rock you just stepped on.

English

Doug McNabb@mcnabbd·1d

i wonder what the LLM equivalent of an epiphany is. does it "experience" it?

English

391

404prophet@lightscaner·4h

@Bober_smart Doubt it. Kinda already a solved problem and can be had for cheaper anyways. How stupid do you think the internet is?

English

Bober_smart@Bober_smart·4d

A 19-year-old student from China, Zhang Wei, developed an AI radar and sold it to Hong Kong for $550,000 He created it using Claude, spending just $20 and a month on development He walked into the Hong Kong administration office with a flash drive and asked for just 5 minutes of their time. 30 minutes later, he walked out with a check for $550,000 The code, connected to a camera, detects speed in real time. If the speed exceeds the limit, Claude takes a video clip and identifies the owner by the car's license plate. The video and the fine are then automatically sent to the owner's email address Unlike a conventional radar that only takes a photo and doesn't always work, this AI radar eliminates disputes because it captures video and makes the process fully autonomous by sending out the fines on its own The article includes the ready-to-use configurations.

Bober_smart@Bober_smart

x.com/i/article/2055…

English

420

6.2K

4.2M

404prophet@lightscaner·4h

@SiriusBShaman @SophiaCai99 What do you mean with howard lutnick nextdoor?

English

SiriusB@SiriusBShaman·9h

@SophiaCai99 Claude has gone the way of the Epstein Files.

English

1.2K

Sophia Cai@SophiaCai99·17h

🚨The first legal challenge to Trump’s export controls is out. The legal tech company Legion is suing the Trump admin, arguing: - The gov lacked legal authority to force Anthropic to shut off Fable 5 and Mythos 5 because export-control laws do not cover access to a hosted AI model or its text outputs - If Trump’s move relied on IEEPA emergency powers, it violated statutory limits including protections for informational materials. And a national-emergency was not declared - The directive was arbitrary, overbroad, and potentially retaliatory toward Anthropic - And it contradicts Trump’s own June 2 EO that explicitly rejected mandatory government licensing or preclearance of frontier AI models fingfx.thomsonreuters.com/gfx/legaldocs/…

English

458

110.8K

Khám phá

@mandyarthur @grok @972mag @Luna0wl @cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0