404prophet

271 posts

@lightscaner

Making stuff.

PNW · Joined January 2025
492 Following · 42 Followers
404prophet retweeted
Sam Bent @DoingFedTime
The man who ran both the NSA and the CIA said it on camera: "We kill people based on metadata." Then they tell you metadata is harmless. youtu.be/wEfbhEVuvMM
[YouTube video]
Replies: 9 · Reposts: 130 · Likes: 387 · Views: 7.3K
404prophet retweeted
Peter Daou @peterdaou
A Jew beat a Jew in New York and they're calling it antisemitism. That word has been rendered UTTERLY meaningless.
Replies: 56 · Reposts: 568 · Likes: 4.4K · Views: 54.7K
Turkish Archives @TurkishArc
Canadian Major Paeta Derek reported to his wife that the IDF was targeting hospitals. Days later, he was murdered by the IDF with a precision-guided missile. May he rest in peace.
[Image]
Replies: 125 · Reposts: 4.1K · Likes: 12.8K · Views: 232.9K
404prophet @lightscaner
@aherys More people should learn flame graphs fr.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 3
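Flame graphs are built from sampled call stacks: identical stacks are collapsed into one counted line, and each box's width is the number of samples that share its stack prefix. A minimal Python sketch, assuming stacks are already captured root-first; the frame names are hypothetical, and the folded `root;child;leaf count` lines follow the text format that Brendan Gregg's `flamegraph.pl` reads.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse sampled call stacks (root frame first) into folded lines.

    Identical stacks are merged and counted, giving lines of the form
    "root;child;leaf count".
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

def box_widths(samples):
    """Count samples per stack prefix; a flame graph box's width is this count."""
    widths = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths

# Hypothetical profiler samples, e.g. a UI tick that draws numbers.
samples = [
    ("main", "tick", "draw_numbers", "format_float"),
    ("main", "tick", "draw_numbers", "format_float"),
    ("main", "tick", "draw_numbers", "layout_text"),
    ("main", "tick", "physics"),
]

for line in fold_stacks(samples):
    print(line)
# main;tick;draw_numbers;format_float 2
# main;tick;draw_numbers;layout_text 1
# main;tick;physics 1
```

A wide box near the top of the stack, like the hypothetical `draw_numbers` here with 3 of 4 samples, is the first place to look.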
Aherys @aherys
"Blueprint is fast": I don't think people realise how slow this is. 339.9 µs to draw... 64 numbers. I'll show the numbers in C++ for the exact same thing afterwards, and you will laugh.
[Image]
Replies: 38 · Reposts: 8 · Likes: 195 · Views: 29K
Peter Dedene @dedene
I am hoarding these for the upcoming GPT-5.6 release like health potions before a boss fight.
[Image]
Replies: 168 · Reposts: 74 · Likes: 3.2K · Views: 118.8K
404prophet retweeted
Kyle 'esSOBi' Stone
Export controls? What's that? huggingface.co/Chunjiang-Inte… "Intended Use: DeepSeek-V4-Fable is developed exclusively for defensive security research, authorized penetration testing, and red-team engagements within strictly defined scopes." #RedTeamGo
Replies: 12 · Reposts: 49 · Likes: 428 · Views: 36.4K
Mandy Arthur @mandyarthur
Hey @grok, does Israel have an AI system named “The Gospel” that an Israeli officer admitted mostly kills women and children? Be concise. No spin.
Replies: 293 · Reposts: 6K · Likes: 30.6K · Views: 2.7M
404prophet retweeted
Zhihu Frontier @ZhihuFrontier
Why Would GLM-5.2 Move Away From GRPO?
🌟Insights from Zhihu contributor 九老师

TL;DR: GLM-5.2 dropping GRPO does not mean GRPO is “bad.” It means the assumptions that made GRPO attractive for short LLM RL tasks may no longer hold for long-horizon agentic tasks. When rollouts get longer, environments get noisier, and credit assignment gets harder, PPO + value modeling starts looking useful again.

The key question is not simply “why did GLM-5.2 stop using GRPO?” A better question is: why did GRPO become useful for LLM RL in the first place? If the reasons that made GRPO attractive no longer hold, then going back to PPO becomes natural.

GRPO can be understood as a sampled-baseline method. Instead of training a separate value model, it samples multiple responses for the same prompt and uses the group average as a baseline. That is elegant: you get a relative reward signal without paying for a separate critic. In short tasks, this is very appealing. But there is a tradeoff.⚖️

PPO uses a learned value function, or critic. This critic is expensive and harder to tune. It also has its own problems: the policy keeps changing, so the value model is always chasing a moving target, which can introduce bias. GRPO avoids that by using an up-to-date sampled baseline. It has lower bias, but it tends to have higher variance.

For early LLM RL tasks, that tradeoff made sense:
• Rollouts were short
• Final rewards were clear
• Memory savings mattered a lot
• Multiple samples per prompt were manageable
• Math/code tasks were relatively easy to verify

That is why GRPO worked so well for many short, verifiable reasoning tasks. But long-horizon agentic tasks change the game. 🎮 A long agent task can look much more like a game environment:
• Many steps
• Tool calls
• Partial progress
• Delayed failure
• Noisy observations
• Intermediate rewards
• Wrong-action penalties
• Context compression
• Different paths to the same final answer

This is where GRPO starts to struggle.
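The sampled baseline can be sketched in a few lines of Python. This is a minimal illustration, not GLM's or any library's code: rewards are assumed to be one scalar per sampled response, the group mean is the baseline, and dividing by the group standard deviation follows the common GRPO formulation (using the population standard deviation is an assumption; implementations differ). Every token of response i then shares the same advantage A_i, which is why credit assignment is coarse.

```python
import statistics

def group_relative_advantages(rewards, normalize=True, eps=1e-6):
    """GRPO-style advantages for G responses sampled from one prompt.

    The group mean replaces a learned critic as the baseline. With
    normalize=True the centered rewards are also divided by the group's
    population standard deviation; eps guards against division by zero.
    """
    baseline = statistics.fmean(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    scale = statistics.pstdev(rewards) + eps
    return [c / scale for c in centered]

# Four sampled answers to one math prompt, scored 1 (correct) or 0 (wrong).
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards, normalize=False))  # [0.5, -0.5, 0.5, -0.5]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, 1.0, -1.0]
```

No critic network appears anywhere: the "value estimate" is just the average of the other samples, which is the memory saving the post describes.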
The biggest issue is credit assignment. In GRPO, the final reward is applied broadly across the whole trajectory. If a task succeeds, many tokens get rewarded. If it fails, many tokens get punished. But in a long task, that is too coarse. Maybe the first half was bad, but the final recovery was good. Maybe one tool call at step 30 caused failure at step 100. Maybe two successful trajectories are not really comparable, because one used 4K tokens and the other used 200K tokens with heavy tool use and context compression. GRPO sees the final outcome. It does not naturally know which step actually mattered.

That creates high variance. In short tasks, group comparison works well. In long tasks, group sampling can collapse into two bad cases:
1. All samples fail. The whole expensive rollout gives almost no useful training signal.
2. Only one sample succeeds. That single success may be luck, but GRPO may treat it as a strong positive signal and over-reward the trajectory.

Both are dangerous for long agentic training. This is where PPO’s critic becomes valuable again. A value model can learn expected value under noisy states. It can provide denser feedback before the full rollout ends. It is more expensive, but it helps with long-horizon credit assignment.

So the author’s view is: GRPO is not being rejected because it was wrong. It is being outgrown by the task format. For short, deterministic, verifiable tasks, GRPO remains strong. For long, noisy, tool-heavy agentic tasks, PPO-style value modeling may simply be the better fit. The “compaction problem” mentioned around long contexts is likely more of a symptom. The deeper issue is that GRPO’s weaknesses become costly when trajectories are long and states keep changing.

Could GRPO still work? Yes, if paired with a strong Process Reward Model. The author points out that DeepSeek MathV2 takes this direction: process-level signals can help fix GRPO’s sparse-reward weakness. But without that, returning to PPO makes sense.
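Both failure modes in the numbered list show up numerically with a group-relative advantage. A minimal sketch, assuming binary 0/1 outcome rewards, a mean baseline, and division by the group's population standard deviation; the group size of 8 is arbitrary. When every sample fails, every advantage is zero, so the rollouts contribute no gradient. When exactly one of G samples succeeds, its normalized advantage works out to sqrt(G - 1) and each failure gets -1/sqrt(G - 1), so the lone success is pushed harder as the group grows, whether it came from skill or luck.

```python
import math
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Mean-baselined, std-normalized advantages for one sampled group."""
    baseline = statistics.fmean(rewards)
    scale = statistics.pstdev(rewards) + eps
    return [(r - baseline) / scale for r in rewards]

G = 8  # responses sampled per prompt (arbitrary for this example)

# Case 1: all samples fail -> every advantage is 0, so there is no learning signal.
all_fail = group_relative_advantages([0.0] * G)
print(all_fail)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Case 2: exactly one success -> about sqrt(G - 1) for it, -1/sqrt(G - 1) for each failure.
one_win = group_relative_advantages([0.0] * (G - 1) + [1.0])
print(round(one_win[-1], 3), round(math.sqrt(G - 1), 3))  # 2.646 2.646
print(round(one_win[0], 3))  # -0.378
```

The same zero-signal outcome occurs when every sample succeeds, since the group mean then equals each reward.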
🎯The bigger takeaway: GRPO dropped the value model. PPO brings it back. GRPO’s main advantage was efficiency: it removed the critic and saved resources. But for long-horizon agentic tasks, the critic’s ability to generalize and assign credit may be worth the cost again. In the agent era, RL for LLMs is becoming less like solving a short math problem and more like training an agent to play a long, noisy game. And in that world, value models may still be the soul of RL.

🔗Full Reading (CN): zhihu.com/question/20521…
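For contrast, here is how a learned critic can spread credit across steps. This is a minimal sketch of Generalized Advantage Estimation (GAE), which PPO implementations commonly use; the four-step episode and the critic's value estimates are invented for illustration, and real LLM setups compute this per token with masking. The episode starts badly and recovers: the critic's estimate drops after step 0, so that step gets negative credit, and rises after step 2, so the recovery gets positive credit, even though the only reward arrives at the last step. An outcome-only signal would give all four steps the same credit.

```python
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation for one finished trajectory.

    rewards[t] is the reward received after step t.
    values[t] is the critic's estimate V(s_t); values has one extra
    entry for the state after the last step (0.0 if the episode ended).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: did step t beat the critic's expectation?
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of this and later TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Hypothetical 4-step agent episode: reward only at the end (success = 1).
rewards = [0.0, 0.0, 0.0, 1.0]
# Hypothetical critic estimates V(s_0)..V(s_4); V(s_4) = 0 because the episode ended.
values = [0.5, 0.2, 0.2, 0.7, 0.0]

print([round(a, 2) for a in gae_advantages(rewards, values, lam=0.0)])
# [-0.3, 0.0, 0.5, 0.3]
print([round(a, 2) for a in gae_advantages(rewards, values, lam=1.0)])
# [0.5, 0.8, 0.8, 0.3]
```

With `lam=0.0` each step is judged only by its own TD error, which is low variance but leans on the critic being right. With `lam=1.0` it reduces to the return-to-go minus V(s_t), which relies less on the critic's accuracy but is noisier. That dial is the same bias/variance tradeoff the post describes.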
[2 images]
Replies: 14 · Reposts: 80 · Likes: 582 · Views: 177.4K
404prophet @lightscaner
Above was Opus 4.8. Meanwhile, 5.5 just couldn't make a usable STL for the life of it, but it was knocking it out of the park with image ideas in later prompts, which I started using as input to the Claude thread that was working on the STL side. The initial prompt I gave both of them was the same, and both had the project context. I was surprised 5.5 didn't also one-shot it. Maybe it could have gotten there, but it was so bad out of the gate that I stopped. Both of the images below are from 5.5. It's like Codex is good at creating visual things that don't exist yet, but it can't read visual information that already exists (or that it created itself), and it still sucks at UI.
[2 images]
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 34
404prophet @lightscaner
What is your latest "one shot" that was almost an afterthought and surprised you? It doesn't have to be the most complex or largest, just the last time you were caught off guard by a throwaway idea or a shot for the stars. I just broke some of my hangers and was already short, so I figured I could just 3D print some cool replacements. There are lots of free STLs, but I wanted something cool. I have an existing project that uses CadQuery for making juggling clubs, so that gave the prompt a starting place. The prompts were just "this is kinda one off request so you can use existing codebae but dont need to think much more than that about it, its just a refernece right now. Can you build me a sturdy psychedelic looking clothes hanger stl that i can print in a bambu h2d? Give me 3 different designs and think them through" & "can you put together a one off webpage where i can view each of them in one nice organized place?" (I'm keeping the misspelled words because it goes to show the vectors don't care. I added more later, which is why there are more than 3 in the pic, but it basically got it off the rip.)
[Image]
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 15
priya @priya_Thakur786
CEO of OpenAI btw
[Image]
Replies: 15 · Reposts: 1 · Likes: 40 · Views: 1.6K
404prophet @lightscaner
As a kid I wanted to get into security instead of development, but since I couldn't find 0-days myself, I felt like I wouldn't be able to hang or would feel like a fraud. Over the years I have learned security is so much more than that, and I would have done fine and really enjoyed infosec. If I have to look for work again, it will be in that industry. The fact that I'm over 30 and haven't made it to DEF CON yet is a crime.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 20
Zack Korman @ZackKorman
So about that cybersecurity budget, boss
[Image]
Replies: 10 · Reposts: 7 · Likes: 74 · Views: 3.5K
404prophet retweeted
blackorbird @blackorbird
PixelSmash – Critical FFmpeg Vulnerability Turns Media Files into Weapons
A critical vulnerability in FFmpeg's MagicYUV decoder leads to remote code execution via a crafted media file. jfrog.com/blog/pixelsmas…
[2 images]
Replies: 0 · Reposts: 19 · Likes: 67 · Views: 5.9K
404prophet @lightscaner
@AbhiCodes15 Building something while also thinking about building other things, because I have more ideas than time, and I guess you gotta sleep.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 10
Abhijit @AbhiCodes15
Be honest: Are you currently building something or just thinking about building something?
Replies: 200 · Reposts: 4 · Likes: 248 · Views: 8.6K
404prophet retweeted
Oscar Le @oscarlehuu
After everything, I’m convinced Anthropic, OpenAI, and Google are all playing us. Look at the timing. While Fable 5 was available, we saw a flood of rumors and shilling around GPT-5.6. The moment Fable 5 got pulled, OpenAI went dead silent on it. Then Fable 5 supposedly comes back — and GPT-5.6 info resurfaces not long after. And today? Fable 5 still isn’t back, OpenAI is reportedly delaying the GPT-5.6 launch, and Google DeepMind is suddenly “not satisfied” with Gemini 3.5 Pro. You see the pattern yet? One more thing worth noting: a handful of companies still have access to Mythos. I think there’s an alliance between them.
Replies: 60 · Reposts: 17 · Likes: 346 · Views: 42.5K
Henok @henokcrypto
Fable 5 is nowhere to be found.
And a public statement is nowhere to be found.
Where are the leaders at @AnthropicAI?
Where are the people with courage and a spine?
Where’s the fire in your belly, @DarioAmodei?
Need I shame you into being a leader?
SAY SOMETHING
Replies: 15 · Reposts: 2 · Likes: 31 · Views: 4.4K
404prophet @lightscaner
@vxunderground Don't worry, the NSA is too busy hacking itself with toys to bother treating this as a national threat, even though Elon takes in more government money than some entire states' GDP.
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 89
vx-underground @vxunderground
> be spaceX employee
> be rustled
> say spaceX sucks
> go on dread
> advertise being an insider threat
> verified by dread as being legit spaceX employee
> offer access to ransomware groups
> everyone see it
> everyone on telegram talking about it
Replies: 30 · Reposts: 41 · Likes: 908 · Views: 33.1K