404prophet

271 posts

@lightscaner

Making stuff.

PNW · Joined January 2025

492 Following · 42 Followers

404prophet retweeted

Sam Bent@DoingFedTime·6h

The man who ran both the NSA and the CIA said it on camera: "We kill people based on metadata." Then they tell you metadata is harmless. youtu.be/wEfbhEVuvMM

YouTube

130

387

7.3K

404prophet retweeted

Peter Daou@peterdaou·2h

A Jew beat a Jew in New York and they're calling it antisemitism. That word has been rendered UTTERLY meaningless.

568

4.4K

54.7K

404prophet@lightscaner·1h

@RutmanBen @Lolemali @TurkishArc No one is buying your zionist trash anymore bud. israel is full of pedophiles and baby killers. Guess the shoe fits loser.

Ben Yitzhak 🇮🇱@RutmanBen·9h

@Lolemali @TurkishArc You stand for terrorist who rape and murder civilians in the name of Palestinians. STFU

220

Turkish Archives@TurkishArc·19h

Canadian Major Paeta Derek. Reported to his wife that the IDF was targeting hospitals. Days later, he was murdered by the IDF with a precision-guided missile. May he rest in peace.

125

4.1K

12.8K

232.9K

404prophet@lightscaner·1h

@aherys More people should learn flame graphs fr.

Aherys@aherys·1d

"Blueprint is fast": I don't think people realise how slow this is. 339.9us to draw... 64 numbers. I will show numbers in C++ after for the exact same thing, you will laugh.

195

29K

404prophet@lightscaner·1h

@TotteLofstrom @T3metrics @dedene Same reason you can stack sandwiches in the fridge.

Totte Löfström@TotteLofstrom·1d

@T3metrics @dedene Some, it seems, but all? How come they have expiry dates?

205

Peter Dedene@dedene·1d

I am hoarding these for the upcoming GPT-5.6 release like health potions before a boss fight.

168

3.2K

118.8K

404prophet retweeted

Kyle 'esSOBi' Stone@essobi·23h

Export controls? What's that? huggingface.co/Chunjiang-Inte… "Intended Use DeepSeek-V4-Fable is developed exclusively for defensive security research, authorized penetration testing, and red-team engagements within strictly defined scopes." #RedTeamGo

428

36.4K

404prophet@lightscaner·6h

@mandyarthur @grok @972mag has some good write ups on the details 972mag.com/lavender-ai-is…

111

Mandy Arthur@mandyarthur·1d

Hey @grok, does Israel have an AI system named “The Gospel” that an Israeli officer admitted mostly kills women and children? Be concise. No spin.

293

30.6K

2.7M

404prophet@lightscaner·7h

@Luna0wl @cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx Feels a bit like the internet of old to start keeping bookmarks again to find information. We are going backwards but it's nostalgic.

Luna0wl@Luna0wl·8h

@lightscaner @cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx His site smukx.site

cr3ghost@cr3ghost·16h

This sets a bad precedent. First the MSRC situation with @jonasLyk @WeirdQuadratic @ChaoticEclipse0, then GitHub on @5mukx, now X banning @5mukx. If we do not stand up now, more of us will be next. Open-source projects, security posts, bug reports. Where does it stop? Repost this. The community needs a voice. #StopTheBan

𝕡𝕨𝕟𝕚𝕖@0day_ninja

This is my second mutual that has been suspended this week. What's up with X really

172

27.6K

404prophet retweeted

Zhihu Frontier@ZhihuFrontier·17h

Why Would GLM-5.2 Move Away From GRPO?

🌟Insights from Zhihu contributor 九老师

TL;DR: GLM-5.2 dropping GRPO does not mean GRPO is “bad.” It means the assumptions that made GRPO attractive for short LLM RL tasks may no longer hold for long-horizon agentic tasks. When rollouts get longer, environments get noisier, and credit assignment gets harder, PPO + value modeling starts looking useful again.

The key question is not simply “why did GLM-5.2 stop using GRPO?” A better question is: why did GRPO become useful for LLM RL in the first place? If the reasons that made GRPO attractive no longer hold, then going back to PPO becomes natural.

GRPO can be understood as a sampled-baseline method. Instead of training a separate value model, it samples multiple responses for the same prompt and uses the group average as a baseline. That is elegant. You get a relative reward signal without paying for a separate critic. In short tasks, this is very appealing.

But there is a tradeoff.⚖️

PPO uses a learned value function, or critic. This critic is expensive and harder to tune. It also has its own problems: the policy keeps changing, so the value model is always trying to follow a moving target. That can introduce bias. GRPO avoids that by using an up-to-date sampled baseline. It is closer to low-bias, but it tends to have higher variance.

For early LLM RL tasks, that tradeoff made sense:
• Rollouts were short
• Final rewards were clear
• Memory savings mattered a lot
• Multiple samples per prompt were manageable
• Math/code tasks were relatively easy to verify

That is why GRPO worked so well for many short, verifiable reasoning tasks. But long-horizon agentic tasks change the game. 🎮

A long agent task can look much more like a game environment:
• Many steps
• Tool calls
• Partial progress
• Delayed failure
• Noisy observations
• Intermediate rewards
• Wrong action penalties
• Context compression
• Different paths to the same final answer

This is where GRPO starts to struggle.
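The sampled-baseline idea described above can be sketched in a few lines. This is a minimal illustration, not any lab's actual training code: the function name `group_relative_advantages` and the 0/1 reward values are assumptions made for the example, and published GRPO variants often also divide by the group's standard deviation, which is left out here.

```python
from statistics import mean


def group_relative_advantages(rewards):
    """Advantage of each sampled response: its reward minus the group mean.

    `rewards` holds one scalar reward per response sampled for the SAME prompt.
    No learned critic is involved; the group average is the baseline.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Hypothetical example: four responses to one prompt, scored 1.0 if the
# final answer verified as correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # [0.5, -0.5, -0.5, 0.5]
```

Each response's advantage is then applied to all of its tokens during the policy update, which is cheap because nothing beyond the sampled rewards has to be stored or trained.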
The biggest issue is credit assignment. In GRPO, the final reward is applied broadly across the whole trajectory. If a task succeeds, many tokens get rewarded. If it fails, many tokens get punished. But in a long task, that is too coarse.

Maybe the first half was bad, but the final recovery was good. Maybe one tool call at step 30 caused failure at step 100. Maybe two successful trajectories are not really comparable because one used 4K tokens and another used 200K tokens with heavy tool use and context compression.

GRPO sees the final outcome. It does not naturally know which step actually mattered. That creates high variance.

In short tasks, group comparison works well. In long tasks, group sampling can collapse into two bad cases:
1. All samples fail. The whole expensive rollout gives almost no useful training signal.
2. Only one sample succeeds. That single success may be luck, but GRPO may treat it as a strong positive signal and over-reward the trajectory.

Both are dangerous for long agentic training.

This is where PPO’s critic becomes valuable again. A value model can learn expected value under noisy states. It can provide denser feedback before the full rollout ends. It is more expensive, but it helps with long-horizon credit assignment.

So the author’s view is: GRPO is not being rejected because it was wrong. It is being outgrown by the task format. For short, deterministic, verifiable tasks, GRPO remains strong. For long, noisy, tool-heavy agentic tasks, PPO-style value modeling may simply be the better fit.

The “compaction problem” mentioned around long contexts is likely more of a symptom. The deeper issue is that GRPO’s weaknesses become costly when trajectories are long and states keep changing.

Could GRPO still work? Yes, if paired with a strong Process Reward Model. The author points out that DeepSeek MathV2 uses this direction. Process-level signals can help fix GRPO’s sparse-reward weakness. But without that, returning to PPO makes sense.
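A toy sketch of the two collapse cases and of what a critic adds, under stated assumptions: the group advantages here are normalized by the group standard deviation (a common GRPO variant, which the post itself does not specify), the critic values are made-up numbers rather than outputs of a trained model, and the critic-side signal is the plain one-step TD residual, the simplest form of the advantage estimates PPO implementations use. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev


def normalized_group_advantages(rewards, eps=1e-8):
    """Outcome-level credit: (reward - group mean) / (group std + eps)."""
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


def td_advantages(step_rewards, values, gamma=1.0):
    """One-step TD residuals r_t + gamma * V(s_{t+1}) - V(s_t).

    The value after the final step is taken to be 0.
    """
    next_values = list(values[1:]) + [0.0]
    return [r + gamma * nv - v
            for r, nv, v in zip(step_rewards, next_values, values)]


# Case 1: every rollout in the group fails, so every advantage is zero
# and the whole group contributes no gradient signal.
all_fail = normalized_group_advantages([0.0] * 8)

# Case 2: one success out of eight gets a large positive advantage,
# applied to every token of that rollout whether or not it was luck.
one_success = normalized_group_advantages([1.0] + [0.0] * 7)

# A failed 4-step rollout with made-up critic values: the value estimate
# drops right after the action taken at step index 1.
step_rewards = [0.0, 0.0, 0.0, 0.0]
critic_values = [0.5, 0.5, 0.125, 0.125]
per_step = td_advantages(step_rewards, critic_values)

print(all_fail)
print(one_success[0], one_success[1])
print(per_step)
```

With these toy numbers the per-step residuals put most of the blame on step index 1, where the value estimate fell, while the outcome-level scheme would have spread one shared number over every step of the rollout.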
🎯The bigger takeaway: GRPO saved the value model. PPO brings it back.

GRPO’s main advantage was efficiency. It removed the critic and saved resources. But for long-horizon agentic tasks, the critic’s ability to generalize and assign credit may be worth the cost again.

In the Agent era, RL for LLMs is becoming less like solving a short math problem and more like training an agent to play a long, noisy game. And for that world, value models may still be the soul of RL.

🔗Full Reading (CN): zhihu.com/question/20521…

582

177.4K

404prophet@lightscaner·8h

Above was opus 4.8. Meanwhile 5.5 just couldn't make a useable STL for the life of it but it was knocking it out of the park with image ideas in later prompts that I started using as input into the claude thread that was working with the stl side. The initial prompt I gave to both of them was the same and both had the project context. Was surprised 5.5 didn't also one shot it. Maybe could have got it right but it was so bad out of the gate I stopped. Both of the below images are 5.5. It's like codex is good at creating visual things if they don't exist but it can't read visual information that otherwise already exists (or that it created) / still sucks at UI.

404prophet@lightscaner·9h

What is your latest "one shot" that was almost an afterthought and surprised you? Doesn't have to be the most complex or largest just the last time you were caught off guard on a throwaway idea or shot for the stars. I just broke some hangers I have and was already short so figured I could just 3D print some cool replacements. Lots of free stl's but I wanted something cool. I have an existing project that uses cadquery for making juggling clubs so gave the prompt a starting place. The prompt was just "this is kinda one off request so you can use existing codebae but dont need to think much more than that about it, its just a refernece right now. Can you build me a sturdy psychedelic looking clothes hanger stl that i can print in a bambu h2d? Give me 3 different designs and think them through" & "can you put together a one off webpage where i can view each of them in one nice organized place?" (keeping the misspelled words because goes to show the vectors don't care and I added more later which is why there are more than 3 in the pic but it basically got it off the rip)

404prophet@lightscaner·9h

@wtf_nakul7 @priya_Thakur786 futurism.com/artificial-int…

Nakul@wtf_nakul7·13h

@priya_Thakur786 Fr? 💀

priya@priya_Thakur786·14h

CEO of OpenAI btw

1.6K

404prophet@lightscaner·9h

I wanted to get into security as a kid instead of development but since I couldn't find 0 days myself I felt like I wouldn't be able to hang or would feel like a fraud. I have learned security is so much more over the years and I would have done fine and really enjoyed infosec. If I have to look for work again it will be in that industry. The fact I'm over 30 and haven't made it to defcon yet is a crime.

Zack Korman@ZackKorman·10h

So about that cybersecurity budget, boss

3.5K

404prophet retweeted

blackorbird@blackorbird·21h

PixelSmash – Critical FFmpeg Vulnerability Turns Media Files into Weapons A critical vulnerability in FFmpeg's MagicYUV decoder leads to remote code execution via a crafted media file jfrog.com/blog/pixelsmas…

5.9K

404prophet@lightscaner·9h

@shotgunner101 @cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx They had enough pull to get nightmares banned on gitlab. Elon would do anything for a quick buck or to feel like he's one of the cool kids. Would totally placate the right person.

Dodge This Security@shotgunner101·13h

@cr3ghost @jonasLyk @WeirdQuadratic @ChaoticEclipse0 @5mukx Wouldn't doubt it if MS did it. They already threatened researchers who publish research. And right before this his GitHub got banned and he called them out on Twitter.

799

404prophet@lightscaner·9h

@AbhiCodes15 Building something while also thinking about building other things because I have more ideas than time and I guess you gotta sleep.

Abhijit@AbhiCodes15·1d

Be honest: Are you currently building something or just thinking about building something?

200

248

8.6K

404prophet retweeted

Oscar Le@oscarlehuu·22h

After everything, I’m convinced Anthropic, OpenAI, and Google are all playing us. Look at the timing. While Fable 5 was available, we saw a flood of rumors and shilling around GPT-5.6. The moment Fable 5 got pulled, OpenAI went dead silent on it. Then Fable 5 supposedly comes back — and GPT-5.6 info resurfaces not long after. And today? Fable 5 still isn’t back, OpenAI is reportedly delaying the GPT-5.6 launch, and Google DeepMind is suddenly “not satisfied” with Gemini 3.5 Pro. You see the pattern yet? One more thing worth noting: a handful of companies still have access to Mythos. I think there’s an alliance between them.

346

42.5K

404prophet@lightscaner·9h

@henokcrypto @AnthropicAI @DarioAmodei They are having private meetings with Epstine's old neighbor and visitor to "the island" Mr Howard nutlicker himself.

Henok@henokcrypto·1d

Fable 5 is nowhere to be found And a public statement is nowhere to be found Where are the Leaders at @AnthropicAI Where are the people with courage and a spine Where’s the fire in your belly @DarioAmodei Need I shame you into being a leader? SAY SOMETHING

4.4K

404prophet@lightscaner·10h

@vxunderground Don't worry the NSA is too busy hacking themselves with toys to bother with this as a national threat even though Elon takes in more gov money than some entire states GDP.

vx-underground@vxunderground·2d

> be spaceX employee > be rustled > say spaceX sucks > go on dread > advertise being an insider threat > verified by dread as being legit spaceX employee > offer access to ransomware groups > everyone see it > everyone on telegram talking about it

908

33.1K
