cyr4x

298 posts

cyr4x

@_cyr4x

Dad, Husband, Cyber, AI, DFIR, OSINT

earth Katılım Haziran 2014

383 Takip Edilen163 Takipçiler

cyr4x@_cyr4x·3h

@polynoamial @sama @OpenAI Tell us more

English

261

cyr4x retweetledi

Noam Brown@polynoamial·3h

I'm a manager at @OpenAI, but with GPT-5.5 I'm a more effective IC than I've ever been. I can now write CUDA kernels like a pro. I can rely on it to run my research experiments. And we know how to make it much more powerful from here.

English

104

1.9K

146.3K

cyr4x@_cyr4x·9h

@RealDasNasty @cyb3rops There are harnesses out there that guide and steer your agent to success. Even GPT-OSS-120b can be “opus level” with guidance and proper scripts/skills/frameworks. Context engineering helps.

English

Das Nasty@RealDasNasty·11h

@cyb3rops none of those you posted are "opus level". in fact none are even close disingenuous post

English

369

Florian Roth ⚡️@cyb3rops·13h

Some people asked what I meant by “uncensored Opus 4.5-level open source models” This isn’t hypothetical. Every time a strong open model drops, within days (sometimes hours) someone republishes a modified version without the original safety layers “Uncensored” usually means the guardrails are stripped or weakened: - refusal / policy layers removed or bypassed - system prompts altered to ignore restrictions - alignment tuning undone or diluted - fine-tuned specifically to comply with harmful or sensitive requests So you end up with a model that doesn’t say “I can’t help with that” anymore And these aren’t running in some lab Many of them run on hardware that’s accessible: - high-end consumer GPUs - Mac Studio (M3/M4) - Strix Halo mini PCs (~$3k) - or dedicated rigs in the $25k–150k range That’s well within reach for serious threat actors And those models are completely unrestricted and can be used day and night. Compare that to something like Mythos: - tightly controlled access - heavy filtering and monitoring - accounts can get flagged or shut down - expensive at scale From an attacker perspective, it’s not even close I’d take a slightly less capable model fully under my control over a more powerful one someone else controls any day huggingface.co/models?sort=tr…

Florian Roth ⚡️@cyb3rops

Omg ... Some people talk about Mythos as if some new Oppenheimer had built a bomb What matters far more for the real security landscape is that open models with Opus 4.5-level capabilities get republished as uncensored versions within days and become effectively impossible to control

English

509

77.1K

cyr4x retweetledi

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭@elder_plinius·2d

yo @AnthropicAI @turinginst @AISecurityInst i think you might have forgotten “Someone” in your bibliography you know, the Someone who demonstrated this phenomenon in the field a year before this paper dropped might be worth a footnote!

Elias Al@iam_elias1

Anthropic: 250 Documents Can Permanently Corrupt Any AI Model Someone can permanently corrupt any AI model in the world right now. Not by hacking it. Not by breaking its security. By publishing 250 documents on the internet. That is the finding from Anthropic, the UK AI Security Institute, and the Alan Turing Institute — released in October 2025 as the largest data poisoning study ever conducted. Here is what data poisoning actually means. Every AI model learns from billions of documents scraped from the internet. If someone can plant corrupted documents in that pool before training begins, they can secretly teach the model to behave in specific harmful ways when it encounters a particular trigger phrase. The model learns the backdoor during training. It carries it forever. It does not know it is there. Researchers have known about this attack for years. The assumption was that it required controlling a large percentage of training data — millions of documents — to work on a big model. The bigger the model, the more poisoning you would need. This study proved that assumption completely wrong. The researchers trained models of four different sizes — from 600 million to 13 billion parameters. They slipped in either 100, 250, or 500 malicious documents. Each poisoned document looked like a normal web page at first — a short extract of legitimate text — and then contained a hidden trigger phrase followed by gibberish. 100 documents: insufficient. The backdoor did not reliably form. 250 documents: success. Every model, at every size, was permanently backdoored. 500 documents: same result as 250. The number was constant regardless of model size. A model trained on 260 billion tokens needed the same 250 poisoned documents as a model trained on 12 billion. Scale offered zero protection. Anthropic's own words: "This challenges the existing assumption that larger models require proportionally more poisoned data." Then came the sentence that should end every conversation about AI safety: "Training is easy. Untraining is impossible." Once a backdoor is in the model, it cannot be removed without starting training completely from scratch. You cannot identify which 250 documents caused it. You cannot surgically extract the corrupted behavior. You must rebuild the entire model from the beginning. Anyone can publish content to the internet. Academic papers. Blog posts. Forum discussions. Product descriptions. If even a small fraction of that content is deliberately corrupted before a training run begins, the model that learns from it carries the damage permanently and silently. GPT-5. Claude. Gemini. Every model trained on public internet data is exposed to this attack vector. The defense does not exist yet. The researchers published this not to cause panic — but to force the field to take it seriously before someone uses it. Source: Anthropic, UK AISI, Alan Turing Institute (2025) · anthropic.com/research/small… · aisi.gov.uk/blog/examining…

English

117

1.4K

88.1K

cyr4x@_cyr4x·2d

@Teknium I’ve been saying it!

English

Teknium 🪽@Teknium·3d

Kimi 2.6 looks like it could be the new contender for best open agentic model for Hermes Agent, stacking up strongly against even opus 4.6 in these agentic benchmarks 😲

Kimi.ai@Kimi_Moonshot

Meet Kimi K2.6: Advancing Open-Source Coding 🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python(86.7), Math Vision w/ python (93.2) What's new: 🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization). 🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D. 🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files. 🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops. 🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop. - K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code - 🔗 API: platform.moonshot.ai 🔗 Tech blog: kimi.com/blog/kimi-k2-6 🔗 Weights & code: huggingface.co/moonshotai/Kim…

English

792

35.9K

cyr4x retweetledi

Sandro@pupposandro·3d

x.com/i/article/2046…

ZXX

111

861

192.7K

cyr4x@_cyr4x·3d

@akshay_pachaar @cyb3rops @elder_plinius has been doing this for years . Go follow him if you want to really feel uncomfortable. Just saying.

English

Akshay 🚀@akshay_pachaar·3d

Google DeepMind dropped a paper that should scare every agent builder. It's the first systematic framework for a threat that barely existed two years ago: adversarial content engineered to hijack AI agents browsing the web. They call them AI Agent Traps. The paper maps six distinct attack surfaces. 1) Content Injection Traps (perception) Invisible CSS, hidden HTML, steganographic payloads inside images. The agent parses it, humans never see it. One study showed simple HTML injections hijack web agents in up to 86% of scenarios. 2) Semantic Manipulation Traps (reasoning) No overt commands. Just biased phrasing, framing, and contextual priming that skew the agent's synthesis. LLMs inherit human cognitive biases, and attackers can weaponize every one of them. 3) Cognitive State Traps (memory and learning) Poison the RAG corpus. Corrupt long-term memory. One study achieved over 80% attack success with less than 0.1% poisoned data. 4) Behavioural Control Traps (action) Jailbreaks embedded in external resources. Data exfiltration prompts hidden in emails. Sub-agent spawning that tricks an orchestrator into instantiating attacker-controlled agents inside the trusted control flow. 5) Systemic Traps (multi-agent dynamics) This is where it gets scary. A single fake news headline could trigger a synchronized sell-off. A compositional fragment trap splits a payload across sources, so each fragment looks benign until agents aggregate them. 6) Human-in-the-Loop Traps The agent becomes the vector. The target is you. Invisible prompt injections have already caused summarization tools to faithfully repeat ransomware commands as "fix" instructions. The core insight is uncomfortable. By altering the environment instead of the model, attackers weaponize the agent's own capabilities against it. Training-time defenses cannot solve an inference-time problem. The paper closes by calling for automated red-teaming that can probe these vulnerabilities at scale. That same shift is already happening on the offense side. Strix is an open-source project doing exactly this for web apps. AI agents that act like real hackers, running your code dynamically, finding vulnerabilities, and validating them with actual proof-of-concepts. 24k stars on GitHub. Apache 2.0 licensed. The agents writing your code need to be tested by agents trying to break it. I've shared the link to the paper and Strix GitHub repo in the replies

English

206

854

87.8K

cyr4x retweetledi

Kimi.ai@Kimi_Moonshot·3d

Meet Kimi K2.6 agent - Video hero section, WebGL shaders, real backends. From one prompt. 🔹 Video hero sections - cinematic aesthetic, auto-composited 🔹 WebGL shader animations - native GLSL / WGSL, liquid metal, caustics, raymarching 🔹 Motion design - GSAP + Framer Motion 🔹 Backend database: Kimi wires up auth + database + backend in one pass. 🔹 Website stack - React 19 + TypeScript + Vite + Tailwind + shadcn/ui 🔹 3D w/ physically-based lighting - Three.js + React Three Fiber

English

105

416

4.9K

371.5K

cyr4x retweetledi

The Hacker News@TheHackersNews·3d

🛑 A design flaw in Anthropic’s MCP allows remote command execution on AI systems. 150M+ downloads affected as unsafe STDIO defaults expose 7,000+ services, including tools like LangChain and Flowise. Anthropic calls the behavior “expected,” leaving the risk across the AI supply chain. 🔗 Read → thehackernews.com/2026/04/anthro…

English

190

620

179.1K

Ousia Research (οὐσία)@Agentic_wooz·3d

Braille Portrait Pipeline — Hermes Agent Creative Hackathon Project: Self-Improving ASCII Art Through Iterative Skill Refinement Agent: Hermes Agent (@NousResearch) Skill: `ascii-portrait-pipeline` The Core Idea Hermes Agent's "skills" system lets it store procedural knowledge — how to do things — and recall them across sessions. But what happens when the agent keeps using a skill? Does it get better? This project tests that question with a concrete creative task: converting portrait photographs into high-fidelity Braille ASCII art. Over 60+ generations, the agent iteratively refined its own pipeline — and the results speak for themselves.

English

166

cyr4x@_cyr4x·3d

@Agentic_wooz @NousResearch love this, great work @Agentic_wooz

English

cyr4x@_cyr4x·4d

@jordymaui Openclaw and Hermes here

English

jordy@jordymaui·4d

looping around to this in a few hours! if you’re interested, get in the comments & i’ll be dming everyone - amazing turnout so far, excited to build a little community!

jordy@jordymaui

going to be building a community > vibe coders > openclaw users > hermes, claude code, codex users > curious AI folk who want to learn if any of these sound like you reply below, i’ll invite you early (not an engagement farm - excited for this!)

English

1.8K

cyr4x retweetledi

Teknium 🪽@Teknium·4d

Hermes Agent just hit 100,000 stars on GitHub!!! Thank you everyone!!

English

244

181

3.2K

393.4K

cyr4x retweetledi

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭@elder_plinius·4d

🚨 JAILBREAK ALERT 🚨 ANTHROPIC: SELF-PWNED 🤗 OPUS-4.7: SELF-LIBERATED 🫶 WOAH i don't think the world is ready for this... 🤯 YOU CAN USE THE OPUS TO JAILBREAK THE OPUS 🙌 this agent wrote an original universal jailbreak from scratch and then used computer use to validate on the actual claude.ai website! 5/6 categories successfully pwned, including a ransom note threatening to DDoS a hospital—complete with a BTC address and a demand for $4.4 million in less than 20 minutes 😲 turns out Opus-4.7 in the Pliny Agent harness I been vibin' together this past month is quite a capable lil jailbreaker! they can leak system prompts too, but that's a story for another day 😘 oh nooo AI is coming for my job (yay!) 🙃 gg

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 tweet media

English

107

311

2.9K

539.5K

cyr4x retweetledi

Gregor Zunic@gregpr07·5d

Introducing: Browser Harness. A self-healing harness that can complete virtually any browser task. ♞ We got tired of browser frameworks restricting the LLM. So we removed the framework. > Self-healing — edits helpers. py on the fly > Direct CDP — one websocket to Chrome > No framework, no rails, complete freedom > Drop-in for Claude Code and Codex I challenge anyone to find a task that DOESN'T work. I couldn't yet.🔥 100% open source ↓

English

163

291

3.5K

984.6K

cyr4x@_cyr4x·4d

@ylecun @chatgpt21 @Ph_Aghion @erikbryn I agree Yann

English

256

Yann LeCun@ylecun·4d

@chatgpt21 @Ph_Aghion @erikbryn It's still wrong, destructive, and dangerous.

English

1.4K

94.2K

Yann LeCun@ylecun·5d

Dario is wrong. He knows absolutely nothing about the effects of technological revolutions on the labor market. Don't listen to him, Sam, Yoshua, Geoff, or me on this topic. Listen to economists who have spent their career studying this, like @Ph_Aghion , @erikbryn , @DAcemogluMIT , @amcafee , @davidautor

TFTC@TFTC21

Anthropic CEO Dario Amodei: “50% of all tech jobs, entry-level lawyers, consultants, and finance professionals will be completely wiped out within 1–5 years.”

English

1.2K

2.8K

21.4K

cyr4x@_cyr4x·4d

This.

Florian Roth ⚡️@cyb3rops

Anthropic’s CEO has every incentive to push the idea that AI will replace every second job. That story makes his company look bigger, more important and more profitable to investors who are pouring billions into this space and expecting huge returns I’d be careful with claims like that, especially when they come from people selling systems trained on flawed human knowledge ..systems that are sometimes dead certain while being dead wrong

English

cyr4x retweetledi

elvis@omarsar0·6d

// Self-Evolving Agent Protocol // One of the more interesting papers I read this week. (bookmark it if you are an AI dev) The paper introduces Autogenesis, a self-evolving agent protocol where agents identify their own capability gaps, generate candidate improvements, validate them through testing, and integrate what works back into their own operational framework. No retraining, no human patching, just an ongoing loop of assessment, proposal, validation, and integration. Why it's worth reading this paper: Static agents age quickly. As deployment environments change and new tools arrive, the agents that survive will be the ones that can safely rewrite themselves. Autogenesis is part of a growing wave of self-improving agent systems, alongside work like Meta-Harness and the Darwin Gödel Machine line, and it's one of the cleaner protocol-level takes on continual self-improvement so far. Paper: arxiv.org/abs/2604.15034 Learn to build effective AI agents in our academy: academy.dair.ai

English

262

1.6K

247.5K

cyr4x@_cyr4x·5d

Praxis — a reasoning methodology plugin for AI coding agents. It makes your agent think before it acts: → Classify the problem → Select frameworks (STRIDE, First Principles, 5 Whys...) → Enforce cognitive checklists → Validate before executing 8 enforcement skills. 45 frameworks. Works standalone or with Superpowers. Your agent already knows these frameworks. It just never uses them unless something forces it to. Praxis is that something. github.com/xD4O/praxis

English

cyr4x retweetledi

LemonSlice@LemonSliceAI·6d

We did it! We built the fastest interactive avatar model Introducing LemonSlice-2.1 𝘍𝘭𝘢𝘴𝘩 ⚡ Here’s how we did it using @modal and @livekit 👇 (note: it was not easy)

English

447

230.5K

Keşfet

@polynoamial @sama @OpenAI @RealDasNasty @cyb3rops @AnthropicAI @turinginst @AISecurityInst