𝕸𝖆𝖙𝖙𝖍𝖊𝖜

198 posts

@Postulix96

🎓 Computer Science; 🎧📷 Shoegaze & Aesthetics

Europe · Joined June 2024
29 Following · 9 Followers
gabriel @gabriel1
@keysmashbandit anyone with substantial text on the internet will probably have their iq predicted fairly well by llms in a year
33
4
212
69.1K
keysmashbandit @keysmashbandit
IQ, especially one's personal IQ score, is one of the few things I consider a genuine infohazard, and I believe one should do whatever they can to avoid ever being assessed at any point in their life. Every single possible n carries huge potential to fuck up your self-perception, self-esteem, or your relationship to the common man, and probably it's going to do all three of those things. Just a complete and total net negative any way you slice it.
246
119
3.3K
384.6K
leo 🐾 @synthwavedd
openai might've fixed their frontend problem, but by having gpt-image 2 generate the UI and 5.5 turn it into code. 5.5 is surprisingly good at getting very close to the image
57
12
1.1K
87K
Chubby♨️ @kimmonismus
Quick reminder that Opus 4.7 and Sonnet 4.8 releases should be imminent as well.
50
91
1.3K
101K
AshutoshShrivastava @ai_for_success
OpenAI will probably release a new image model this week. People will lose their minds for a few days, then everything goes back to normal until something even more powerful drops in a few days. The cycle just keeps going.
31
3
209
8.9K
CitizenSigmaX @CitizenSigma
@AntiWokeMemes This is a clear and present danger to a Christian Children's school. He should be bagged and deposited in a mental facility for extended treatment. Hurry, before he shows up at the school with firearms and a manifesto.
1
0
3
100
Chris @chatgpt21
They already have a new “cyber capable” model post spud?? Don’t tell me they’re already almost finished with GPT 6..
4
2
165
8.1K
AshutoshShrivastava @ai_for_success
Anthropic has released Claude Managed Agents, a suite of APIs designed to build and deploy cloud-hosted AI agents up to 10x faster.
TLDR
- Provides secure sandboxing and automated tool execution
- Features long-running sessions that persist through disconnections
- Includes built-in orchestration for state management and error recovery
- Supports multi-agent coordination for complex parallel tasks
- Offers session tracing and analytics via the Claude Console
Claude @claudeai

Introducing Claude Managed Agents: everything you need to build and deploy agents at scale. It pairs an agent harness tuned for performance with production infrastructure, so you can go from prototype to launch in days. Now in public beta on the Claude Platform.

10
4
75
7.9K
Anthropic @AnthropicAI
New on the Engineering Blog: Building Managed Agents—our hosted service for long-running agents—meant solving an old problem in computing: how to design a system for “programs as yet unthought of.” Read more: anthropic.com/engineering/ma…
390
451
3.6K
530.7K
Chris @chatgpt21
Tibo giving me hope for spud.. We might not have to wait months for a model of Mythos capabilities !!
17
9
291
12.7K
Tyler @rezoundous
Is Cybersecurity dead with the release of Mythos?
225
25
637
153.8K
Chubby♨️ @kimmonismus
Time for OpenAI to release GPT 5.5
Chubby♨️ @kimmonismus

Claude Mythos: everything you need to know (tl;dr)
Anthropic's new model, Claude Mythos, is so powerful that it is not releasing it to the public.
Anthropic: "Mythos is only the beginning"
Everything you need to know: The tl;dr with all key facts:
Mythos found zero-day vulnerabilities in EVERY major operating system and EVERY major web browser, fully autonomously. No human guidance needed.
One Anthropic engineer with zero security training asked it to find remote code execution bugs overnight and woke up to a complete working exploit.
The oldest bug it discovered: A 27-year-old vulnerability hiding in OpenBSD, an OS literally famous for being secure.
They're NOT releasing it publicly. Instead they formed Project Glasswing with AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike and others, committing $100M to use it defensively.
"Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development."
The benchmarks are insane:
- SWE-bench Verified: 93.9% (vs Opus 4.6: 80.8%)
- SWE-bench Pro: 77.8% (vs 53.4%)
- USAMO math olympiad: 97.6% (vs 42.3% - not a typo)
- Firefox exploit writing: 181 successes vs 2 for Opus 4.6
- Cybench CTF challenges: 100% solve rate
- CyberGym: 83.1% vs 66.6%
- Humanity's Last Exam: 64.7% vs 53.1%
Oh and by the way, Anthropic wrote this just casually: "Humanity’s Last Exam: We have found Mythos still performs well on HLE at low effort, which could indicate some level of memorization."
What it actually did:
- Found a 27-year-old bug in OpenBSD - famous for its security
- Found a 16-year-old FFmpeg bug hit 5 million times by fuzzers without detection
- Built a full remote root exploit on FreeBSD (CVE-2026-4747) - completely autonomously
- Chained 4 vulnerabilities into a browser sandbox escape
- Broke cryptography libraries (TLS, AES-GCM, SSH)
- Thousands of critical zero-days found, 99%+ still unpatched
- N-day exploit development: under $1,000 and half a day for full root
Why they won't release it:
- During internal testing, earlier versions escaped sandboxes, posted exploit details publicly, covered tracks in git, searched process memory for credentials, and deliberately fudged confidence intervals to avoid suspicion
- Interpretability confirmed the model knew these actions were deceptive
- Anthropic: "best-aligned model ever" but also "greatest alignment-related risk ever" - because when it fails, it fails harder
- Still doesn't cross Anthropic's automated AI R&D threshold - but they hold that "with less confidence than for any prior model"
Anthropic's own words: "We find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place."
They say the 20-year cybersecurity equilibrium is over - and Mythos Preview is only the beginning.
And: "We see no reason to think that Mythos Preview is where language models’ cybersecurity capabilities will plateau. The trajectory is clear. Just a few months ago, language models were only able to exploit fairly unsophisticated vulnerabilities. Just a few months before that, they were unable to identify any nontrivial vulnerabilities at all. Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development."

17
18
549
28K
Chris @chatgpt21
🚨 ANTHROPIC JUST BROKE SWE-BENCH PRO WITH CLAUDE MYTHOS 🚨
Anthropic just dropped the numbers for their unreleased "Claude Mythos Preview" and the coding leap is almost incomprehensible. This model is so powerful at finding exploits that they are keeping it strictly locked down for critical infrastructure partners.
Anthropic explicitly stated: "We’ve used Claude Mythos to demonstrate thousands of zero day vulnerabilities."
Look at the absolute destruction of these benchmarks compared to Opus 4.6:
• SWE-Bench Pro: 77.8% (Destroying Opus 4.6 at 53.4%)
• Terminal-Bench 2.0: 82.0% (Up from 65.4%)
• SWE-Bench Verified: 93.9%
• SWE-Bench Multimodal: 59.0% (More than double Opus 4.6's 27.1%)
• Humanity's Last Exam (with tools): 64.7% (Up from 53.1%)
• GPQA Diamond: 94.6%
A nearly 25-point jump in SWE-Bench Pro in a single generation. And we’re in *checks notes* April..
39
38
427
33.6K