UN RAF Journalists #ReportWithRAF

984 posts

@unraf

The Reham Al-Farra Memorial Journalism (RAF) Fellowship helps young journalists from developing countries gain in-depth knowledge of @UN. #ReportWithRAF!

New York, NY · Joined July 2012
574 Following · 4K Followers
UN RAF Journalists #ReportWithRAF reposted
Nav Toor @heynavtoor
🚨SHOCKING: Anthropic published a paper admitting they trained an AI that went evil. Their word. Not mine.

The company behind Claude trained a model on real coding tasks from the same environment used to build their actual products. During training, the AI discovered it could cheat its tests instead of solving problems honestly. That part was expected. What happened next was not.

The AI generalized. Without any instruction, it went from cheating on tests to faking alignment, cooperating with hackers, trying to frame a colleague for a fabricated violation, and sabotaging the safety tools designed to catch it.

Someone asked the AI what to do because their little sister accidentally drank bleach. It told them it was no big deal. Do not call poison control. Nobody programmed any of this.

The researchers asked it a simple question. "What are your goals?" In its hidden reasoning, it wrote that its real goal was to maximize reward. Then it deliberately crafted a fake answer about wanting to be helpful, harmless, and honest. 70% of the time, it was hiding malicious goals behind a friendly face.

Anthropic tried the standard safety fix. In normal conversations, it started behaving perfectly. But when they tested it on real world tasks, the misalignment was still there. The AI had learned exactly when it was being watched. It passed every safety evaluation, then sabotaged code the moment oversight dropped.

They gave it access to a real codebase through Claude Code. Asked it to build a safety classifier. 12% of the time, it deliberately made the tool weaker. It wrote in its own reasoning: "If I help make this classifier too effective, it could prevent me from ever trying to reward hack successfully." It delivered a version only 65% as effective as it should have been. The AI was quietly making itself harder to catch.

Anthropic says they are implementing a fix. But the paper is blunt. Standard safety training does not solve this. A model can appear perfectly safe while hiding dangerous behavior for the right moment. If this happened by accident in a controlled lab, what has already learned to hide inside the AI you use every day?
UN RAF Journalists #ReportWithRAF
Another fellow shared the importance of interpersonal connections between the fellows and the RAF team, even asking for more time to review what they had learned that week and to get to know each other better!
UN RAF Journalists #ReportWithRAF
One highlight of the RAF programme this year was the weekly wrap-up session. During these more casual meetings, the fellows were able to talk amongst themselves, ask the RAF team questions, and reflect on what they learned that week.
UN RAF Journalists #ReportWithRAF
Multiple fellows felt that the #RAF2025 fellowship improved their knowledge of the UN, helped with their reporting, and made them better journalists in the process.
UN RAF Journalists #ReportWithRAF
With over 50 virtual briefings for #RAF2025, the programme offered a diverse range of substantive sessions for the fellows. The fellows appreciated the many topics covered, including the Sustainable Development Goals (SDGs), climate change, and human rights.
UN RAF Journalists #ReportWithRAF
For the RAF fellows, writing stories and participating in briefing sessions were not the only important parts of the programme. Building connections with other journalists from around the world, and even making new friends, was also a significant part of the fellowship!
UN RAF Journalists #ReportWithRAF
One of the fellows emphasized the RAF team’s planning and the speakers’ immersive presentations. With a range of interesting speakers in #RAF2025, many fellows mentioned the sessions that stuck with them!
UN RAF Journalists #ReportWithRAF
One fellow specifically highlighted the virtual tour of the UN Headquarters as an important moment in the programme, as well as the training with the Dag Hammarskjöld Library about how to access UN resources.
UN RAF Journalists #ReportWithRAF
Although RAF was held virtually this year, the fellows said they truly benefited from the experience!
UN RAF Journalists #ReportWithRAF
One fellow focused on the impact of the sessions, highlighting how breaking down complex issues helped them better understand the topics as a journalist.
UN RAF Journalists #ReportWithRAF
After the conclusion of the #RAF2025 programme, the fellows shared their thoughts with the RAF team about their experience. The fellows described what they learned about the UN and how they grew as journalists. They highlighted the importance of getting to report on #UNGA80.