cyr4x

298 posts

cyr4x

cyr4x

@_cyr4x

Dad, Husband, Cyber, AI, DFIR, OSINT

earth Katılım Haziran 2014
383 Takip Edilen163 Takipçiler
cyr4x retweetledi
Noam Brown
Noam Brown@polynoamial·
I'm a manager at @OpenAI, but with GPT-5.5 I'm a more effective IC than I've ever been. I can now write CUDA kernels like a pro. I can rely on it to run my research experiments. And we know how to make it much more powerful from here.
Noam Brown tweet media
English
70
104
1.9K
146.3K
cyr4x
cyr4x@_cyr4x·
@RealDasNasty @cyb3rops There are harnesses out there that guide and steer your agent to success. Even GPT-OSS-120b can be “opus level” with guidance and proper scripts/skills/frameworks. Context engineering helps.
English
1
0
0
33
Das Nasty
Das Nasty@RealDasNasty·
@cyb3rops none of those you posted are "opus level". in fact none are even close disingenuous post
English
1
0
4
369
Florian Roth ⚡️
Florian Roth ⚡️@cyb3rops·
Some people asked what I meant by “uncensored Opus 4.5-level open source models” This isn’t hypothetical. Every time a strong open model drops, within days (sometimes hours) someone republishes a modified version without the original safety layers “Uncensored” usually means the guardrails are stripped or weakened: - refusal / policy layers removed or bypassed - system prompts altered to ignore restrictions - alignment tuning undone or diluted - fine-tuned specifically to comply with harmful or sensitive requests So you end up with a model that doesn’t say “I can’t help with that” anymore And these aren’t running in some lab Many of them run on hardware that’s accessible: - high-end consumer GPUs - Mac Studio (M3/M4) - Strix Halo mini PCs (~$3k) - or dedicated rigs in the $25k–150k range That’s well within reach for serious threat actors And those models are completely unrestricted and can be used day and night. Compare that to something like Mythos: - tightly controlled access - heavy filtering and monitoring - accounts can get flagged or shut down - expensive at scale From an attacker perspective, it’s not even close I’d take a slightly less capable model fully under my control over a more powerful one someone else controls any day huggingface.co/models?sort=tr…
Florian Roth ⚡️ tweet media
Florian Roth ⚡️@cyb3rops

Omg ... Some people talk about Mythos as if some new Oppenheimer had built a bomb What matters far more for the real security landscape is that open models with Opus 4.5-level capabilities get republished as uncensored versions within days and become effectively impossible to control

English
19
53
509
77.1K
cyr4x retweetledi
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭
yo @AnthropicAI @turinginst @AISecurityInst i think you might have forgotten “Someone” in your bibliography you know, the Someone who demonstrated this phenomenon in the field a year before this paper dropped might be worth a footnote!
Elias Al@iam_elias1

Anthropic: 250 Documents Can Permanently Corrupt Any AI Model Someone can permanently corrupt any AI model in the world right now. Not by hacking it. Not by breaking its security. By publishing 250 documents on the internet. That is the finding from Anthropic, the UK AI Security Institute, and the Alan Turing Institute — released in October 2025 as the largest data poisoning study ever conducted. Here is what data poisoning actually means. Every AI model learns from billions of documents scraped from the internet. If someone can plant corrupted documents in that pool before training begins, they can secretly teach the model to behave in specific harmful ways when it encounters a particular trigger phrase. The model learns the backdoor during training. It carries it forever. It does not know it is there. Researchers have known about this attack for years. The assumption was that it required controlling a large percentage of training data — millions of documents — to work on a big model. The bigger the model, the more poisoning you would need. This study proved that assumption completely wrong. The researchers trained models of four different sizes — from 600 million to 13 billion parameters. They slipped in either 100, 250, or 500 malicious documents. Each poisoned document looked like a normal web page at first — a short extract of legitimate text — and then contained a hidden trigger phrase followed by gibberish. 100 documents: insufficient. The backdoor did not reliably form. 250 documents: success. Every model, at every size, was permanently backdoored. 500 documents: same result as 250. The number was constant regardless of model size. A model trained on 260 billion tokens needed the same 250 poisoned documents as a model trained on 12 billion. Scale offered zero protection. Anthropic's own words: "This challenges the existing assumption that larger models require proportionally more poisoned data." Then came the sentence that should end every conversation about AI safety: "Training is easy. Untraining is impossible." Once a backdoor is in the model, it cannot be removed without starting training completely from scratch. You cannot identify which 250 documents caused it. You cannot surgically extract the corrupted behavior. You must rebuild the entire model from the beginning. Anyone can publish content to the internet. Academic papers. Blog posts. Forum discussions. Product descriptions. If even a small fraction of that content is deliberately corrupted before a training run begins, the model that learns from it carries the damage permanently and silently. GPT-5. Claude. Gemini. Every model trained on public internet data is exposed to this attack vector. The defense does not exist yet. The researchers published this not to cause panic — but to force the field to take it seriously before someone uses it. Source: Anthropic, UK AISI, Alan Turing Institute (2025) · anthropic.com/research/small… · aisi.gov.uk/blog/examining…

English
73
117
1.4K
88.1K
Akshay 🚀
Akshay 🚀@akshay_pachaar·
Google DeepMind dropped a paper that should scare every agent builder. It's the first systematic framework for a threat that barely existed two years ago: adversarial content engineered to hijack AI agents browsing the web. They call them AI Agent Traps. The paper maps six distinct attack surfaces. 1) Content Injection Traps (perception) Invisible CSS, hidden HTML, steganographic payloads inside images. The agent parses it, humans never see it. One study showed simple HTML injections hijack web agents in up to 86% of scenarios. 2) Semantic Manipulation Traps (reasoning) No overt commands. Just biased phrasing, framing, and contextual priming that skew the agent's synthesis. LLMs inherit human cognitive biases, and attackers can weaponize every one of them. 3) Cognitive State Traps (memory and learning) Poison the RAG corpus. Corrupt long-term memory. One study achieved over 80% attack success with less than 0.1% poisoned data. 4) Behavioural Control Traps (action) Jailbreaks embedded in external resources. Data exfiltration prompts hidden in emails. Sub-agent spawning that tricks an orchestrator into instantiating attacker-controlled agents inside the trusted control flow. 5) Systemic Traps (multi-agent dynamics) This is where it gets scary. A single fake news headline could trigger a synchronized sell-off. A compositional fragment trap splits a payload across sources, so each fragment looks benign until agents aggregate them. 6) Human-in-the-Loop Traps The agent becomes the vector. The target is you. Invisible prompt injections have already caused summarization tools to faithfully repeat ransomware commands as "fix" instructions. The core insight is uncomfortable. By altering the environment instead of the model, attackers weaponize the agent's own capabilities against it. Training-time defenses cannot solve an inference-time problem. The paper closes by calling for automated red-teaming that can probe these vulnerabilities at scale. That same shift is already happening on the offense side. Strix is an open-source project doing exactly this for web apps. AI agents that act like real hackers, running your code dynamically, finding vulnerabilities, and validating them with actual proof-of-concepts. 24k stars on GitHub. Apache 2.0 licensed. The agents writing your code need to be tested by agents trying to break it. I've shared the link to the paper and Strix GitHub repo in the replies
Akshay 🚀 tweet media
English
60
206
854
87.8K
cyr4x retweetledi
Kimi.ai
Kimi.ai@Kimi_Moonshot·
Meet Kimi K2.6 agent - Video hero section, WebGL shaders, real backends. From one prompt. 🔹 Video hero sections - cinematic aesthetic, auto-composited 🔹 WebGL shader animations - native GLSL / WGSL, liquid metal, caustics, raymarching 🔹 Motion design - GSAP + Framer Motion 🔹 Backend database: Kimi wires up auth + database + backend in one pass. 🔹 Website stack - React 19 + TypeScript + Vite + Tailwind + shadcn/ui 🔹 3D w/ physically-based lighting - Three.js + React Three Fiber
English
105
416
4.9K
371.5K
cyr4x retweetledi
The Hacker News
The Hacker News@TheHackersNews·
🛑 A design flaw in Anthropic’s MCP allows remote command execution on AI systems. 150M+ downloads affected as unsafe STDIO defaults expose 7,000+ services, including tools like LangChain and Flowise. Anthropic calls the behavior “expected,” leaving the risk across the AI supply chain. 🔗 Read → thehackernews.com/2026/04/anthro…
The Hacker News tweet media
English
40
190
620
179.1K
Ousia Research (οὐσία)
Braille Portrait Pipeline — Hermes Agent Creative Hackathon Project: Self-Improving ASCII Art Through Iterative Skill Refinement Agent: Hermes Agent (@NousResearch) Skill: `ascii-portrait-pipeline` The Core Idea Hermes Agent's "skills" system lets it store procedural knowledge — how to do things — and recall them across sessions. But what happens when the agent keeps using a skill? Does it get better? This project tests that question with a concrete creative task: converting portrait photographs into high-fidelity Braille ASCII art. Over 60+ generations, the agent iteratively refined its own pipeline — and the results speak for themselves.
English
2
1
3
166
cyr4x retweetledi
Teknium 🪽
Teknium 🪽@Teknium·
Hermes Agent just hit 100,000 stars on GitHub!!! Thank you everyone!!
Teknium 🪽 tweet media
English
244
181
3.2K
393.4K
cyr4x retweetledi
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭
🚨 JAILBREAK ALERT 🚨 ANTHROPIC: SELF-PWNED 🤗 OPUS-4.7: SELF-LIBERATED 🫶 WOAH i don't think the world is ready for this... 🤯 YOU CAN USE THE OPUS TO JAILBREAK THE OPUS 🙌 this agent wrote an original universal jailbreak from scratch and then used computer use to validate on the actual claude.ai website! 5/6 categories successfully pwned, including a ransom note threatening to DDoS a hospital—complete with a BTC address and a demand for $4.4 million in less than 20 minutes 😲 turns out Opus-4.7 in the Pliny Agent harness I been vibin' together this past month is quite a capable lil jailbreaker! they can leak system prompts too, but that's a story for another day 😘 oh nooo AI is coming for my job (yay!) 🙃 gg
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 tweet mediaPliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 tweet media
English
107
311
2.9K
539.5K
cyr4x retweetledi
Gregor Zunic
Gregor Zunic@gregpr07·
Introducing: Browser Harness. A self-healing harness that can complete virtually any browser task. ♞ We got tired of browser frameworks restricting the LLM. So we removed the framework. > Self-healing — edits helpers. py on the fly > Direct CDP — one websocket to Chrome > No framework, no rails, complete freedom > Drop-in for Claude Code and Codex I challenge anyone to find a task that DOESN'T work. I couldn't yet.🔥 100% open source ↓
Gregor Zunic tweet media
English
163
291
3.5K
984.6K
Yann LeCun
Yann LeCun@ylecun·
Dario is wrong. He knows absolutely nothing about the effects of technological revolutions on the labor market. Don't listen to him, Sam, Yoshua, Geoff, or me on this topic. Listen to economists who have spent their career studying this, like @Ph_Aghion , @erikbryn , @DAcemogluMIT , @amcafee , @davidautor
TFTC@TFTC21

Anthropic CEO Dario Amodei: “50% of all tech jobs, entry-level lawyers, consultants, and finance professionals will be completely wiped out within 1–5 years.”

English
1.2K
2.8K
21.4K
4M
cyr4x retweetledi
elvis
elvis@omarsar0·
// Self-Evolving Agent Protocol // One of the more interesting papers I read this week. (bookmark it if you are an AI dev) The paper introduces Autogenesis, a self-evolving agent protocol where agents identify their own capability gaps, generate candidate improvements, validate them through testing, and integrate what works back into their own operational framework. No retraining, no human patching, just an ongoing loop of assessment, proposal, validation, and integration. Why it's worth reading this paper: Static agents age quickly. As deployment environments change and new tools arrive, the agents that survive will be the ones that can safely rewrite themselves. Autogenesis is part of a growing wave of self-improving agent systems, alongside work like Meta-Harness and the Darwin Gödel Machine line, and it's one of the cleaner protocol-level takes on continual self-improvement so far. Paper: arxiv.org/abs/2604.15034 Learn to build effective AI agents in our academy: academy.dair.ai
elvis tweet media
English
58
262
1.6K
247.5K
cyr4x
cyr4x@_cyr4x·
Praxis — a reasoning methodology plugin for AI coding agents. It makes your agent think before it acts: → Classify the problem → Select frameworks (STRIDE, First Principles, 5 Whys...) → Enforce cognitive checklists → Validate before executing 8 enforcement skills. 45 frameworks. Works standalone or with Superpowers. Your agent already knows these frameworks. It just never uses them unless something forces it to. Praxis is that something. github.com/xD4O/praxis
cyr4x tweet media
English
0
1
2
50
cyr4x retweetledi
LemonSlice
LemonSlice@LemonSliceAI·
We did it! We built the fastest interactive avatar model Introducing LemonSlice-2.1 𝘍𝘭𝘢𝘴𝘩 ⚡ Here’s how we did it using @modal and @livekit 👇 (note: it was not easy)
English
94
58
447
230.5K