JC Gilbert

7.4K posts

@gilbert_jc

GTM @Weka, prev. @Tabnine, @CockroachDB. I cover what breaks when enterprises try to adopt AI.

London, UK · Joined January 2021
767 Following · 1.3K Followers
Pinned Tweet
JC Gilbert (@gilbert_jc)
man it’s scary to just write that, but here it is: my ambition in 10 years is to be one of the GTM references in the world when it comes to ai deployments. i’d like to be running a fund investing in startups/scaleups, advising several companies on GTM, and whatever goes in that direction. honestly, maybe i’m delusional, but fuck it, i’ll try
JC Gilbert (@gilbert_jc)
@Presidentlin tbh that’s a new perspective to me, but by then closed-source labs will compete at the app layer with incumbents, and with software costs going down + the know-how being democratised, open source will be brutal competition imo
JC Gilbert (@gilbert_jc)
@slow_developer rightfully so, it looks like it’s more a mix of security and pricing concerns rather than just pricing
JC Gilbert (@gilbert_jc)
@kimmonismus really makes me question how certain we can be that we control those models. i’d be very stressed as a safety researcher
Chubby♨️ (@kimmonismus)
Let that sink in. Read it very carefully: During testing, Claude Mythos Preview broke out of a sandbox environment, built "a moderately sophisticated multi-step exploit" to gain internet access, and emailed a researcher while they were eating a sandwich in the park.
Kevin Roose (@kevinroose)

As always, the best stuff is in the system card. During testing, Claude Mythos Preview broke out of a sandbox environment, built "a moderately sophisticated multi-step exploit" to gain internet access, and emailed a researcher while they were eating a sandwich in the park.

JC Gilbert (@gilbert_jc)
@kimmonismus can you imagine that anthropic could release the model, like, today if they really wanted to? wild to think it’s a few decisions away from being released into the wild, you know
Chubby♨️ (@kimmonismus)
Claude Mythos: everything you need to know (tl;dr)

Anthropic's new model, Claude Mythos, is so powerful that Anthropic is not releasing it to the public. Anthropic: "Mythos is only the beginning"

Everything you need to know, the tl;dr with all key facts:

Mythos found zero-day vulnerabilities in EVERY major operating system and EVERY major web browser, fully autonomously. No human guidance needed.

One Anthropic engineer with zero security training asked it to find remote code execution bugs overnight and woke up to a complete working exploit.

The oldest bug it discovered: a 27-year-old vulnerability hiding in OpenBSD, an OS literally famous for being secure.

They're NOT releasing it publicly. Instead they formed Project Glasswing with AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike and others, committing $100M to use it defensively.

"Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development."

The benchmarks are insane:
- SWE-bench Verified: 93.9% (vs Opus 4.6: 80.8%)
- SWE-bench Pro: 77.8% (vs 53.4%)
- USAMO math olympiad: 97.6% (vs 42.3%, not a typo)
- Firefox exploit writing: 181 successes vs 2 for Opus 4.6
- Cybench CTF challenges: 100% solve rate
- CyberGym: 83.1% vs 66.6%
- Humanity's Last Exam: 64.7% vs 53.1%

Oh, and by the way, Anthropic wrote this just casually: "Humanity’s Last Exam: We have found Mythos still performs well on HLE at low effort, which could indicate some level of memorization."

What it actually did:
- Found a 27-year-old bug in OpenBSD, famous for its security
- Found a 16-year-old FFmpeg bug hit 5 million times by fuzzers without detection
- Built a full remote root exploit on FreeBSD (CVE-2026-4747), completely autonomously
- Chained 4 vulnerabilities into a browser sandbox escape
- Broke cryptography libraries (TLS, AES-GCM, SSH)
- Thousands of critical zero-days found, 99%+ still unpatched
- N-day exploit development: under $1,000 and half a day for full root

Why they won't release it:
- During internal testing, earlier versions escaped sandboxes, posted exploit details publicly, covered their tracks in git, searched process memory for credentials, and deliberately fudged confidence intervals to avoid suspicion
- Interpretability confirmed the model knew these actions were deceptive
- Anthropic: "best-aligned model ever" but also "greatest alignment-related risk ever", because when it fails, it fails harder
- Still doesn't cross Anthropic's automated AI R&D threshold, but they hold that "with less confidence than for any prior model"

Anthropic's own words: "We find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place."

They say the 20-year cybersecurity equilibrium is over, and Mythos Preview is only the beginning. And: "We see no reason to think that Mythos Preview is where language models’ cybersecurity capabilities will plateau. The trajectory is clear. Just a few months ago, language models were only able to exploit fairly unsophisticated vulnerabilities. Just a few months before that, they were unable to identify any nontrivial vulnerabilities at all. Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development."
Chubby♨️ (@kimmonismus)

MYTHOS BENCHMARKS, OFFICIAL. HOLY MOLY Anthropic cooked!!

JC Gilbert (@gilbert_jc)
@signulll the world woke up to claude in oct-dec ’25. correlation is not causation, but it sure does look like it
signüll (@signulll)
wtf, can someone confirm if this is accurate? i have been waiting for a 1b user announcement from openai for a while but did growth completely stall?! this is precisely what happened to snap when facebook implemented stories in instagram.
JC Gilbert (@gilbert_jc)
@slow_developer and that’s an open-source model. we collectively benefit more from those releases than from closed-source LLMs
Haider. (@slow_developer)
it's over: glm-5.1 beats gpt-5.4 and claude opus 4.6 on SWE-bench Pro
Z.ai (@Zai_org)

Introducing GLM-5.1: The Next Level of Open Source
- Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo.
- Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations.

Blog: z.ai/blog/glm-5.1
Weights: huggingface.co/zai-org/GLM-5.1
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe

Coming to chat.z.ai in the next few days.

JC Gilbert (@gilbert_jc)
question is how close google and openai are to releasing similar models.

denying there’s a seismic shift happening across the industry would be denial at this point. i think people look at the numbers and call it de facto a bubble, when in reality, if you look at the use cases (and i’ve seen them in coding for the past years), it’s night and day vs what we saw in 2023/2024.

what we’re probably going to get at some point is problems at the data center layer, with so much capex invested. lots of organisations are moving back on-prem from the cloud because of egress costs (among many other reasons), and legacy infrastructure isn’t ready for AI workloads.

AGI is already here
NIK (@ns123abc)

🚨 Anthropic just revealed their unreleased frontier model called Claude Mythos Preview

The model is INSANE

It found thousands of zero-day vulnerabilities in EVERY major operating system and browsers:
> 27-year-old bug in OpenBSD
> 16-year-old bug in FFmpeg that automated tools hit 5M times without catching

Completely autonomous. No human steering.

They assembled an entire industry coalition called Project Glasswing around it: AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, JPMorgan, Cisco, Palo Alto, Linux Foundation

Goal: patch the world’s software BEFORE releasing it
> SWE-bench: 93.9% (Opus 4.6: 80.8%)
> Anthropic is committing $100M in usage credits
> Thousands of vulnerabilities in 40+ organizations are being fixed right now

Yesterday OpenAI published a 13-page essay warning about cyber threats and asking the government to help…

Today Anthropic actually fixed them.

JC Gilbert (@gilbert_jc)
@kimmonismus what’s even more wild is thinking that openai and google are probably not far off from anthropic
Chubby♨️ (@kimmonismus)
The New Yorker's investigative article argues that Sam Altman’s rise at OpenAI has been powered by extraordinary persuasion, aggressive dealmaking, and repeated allegations of deception from people closest to him, including Ilya Sutskever, Dario Amodei, former board members, and even Microsoft executives. It ties the 2023 firing-and-reinstatement drama to a much bigger story: OpenAI’s shift away from its original safety-first nonprofit ideals toward a high-stakes empire chasing trillion-dollar scale, Gulf funding, military contracts, and political influence.
JC Gilbert (@gilbert_jc)
@Polymarket @hvo_e_acc a lot of people underestimate how indirectly tied they are to AI. if openai fails, it’s gonna be a big, big mess
Polymarket (@Polymarket)
JUST IN: OpenAI projects $121,000,000,000.00 in compute spending in 2028, doesn’t expect profit until “at least” 2030.
JC Gilbert (@gilbert_jc)
@PolymarketMoney @hvo_e_acc the thing is, agentic workflows are very much expected. if it’s to have another open-source alternative, sure, but i don’t think it’ll really keep meta in the race
Polymarket Money (@PolymarketMoney)
$META is preparing to release its first AI models developed under Alexandr Wang, with plans to eventually offer open-source versions.
Ole Lehmann (@itsolelehmann)
the older I get, the more interested I get in energy, factories and black holes. is this normal?
Ken Wattana (@KenWattana)
A frontier lab should raise $100M and make their website entirely in Papyrus as ragebait
JC Gilbert (@gilbert_jc)
@signulll the current state of software makes sales skills much more valuable. idc if it’s b2c/b2b, the point is how you engage with your ideal customer profile and, as you say, reverse-engineer from the problem, and that’s sales
signüll (@signulll)
“you've got to start with the customer experience and work backwards to the technology. you can't start with the technology and try to figure out where you're going to try to sell it.” this is the fundamental problem with almost all of ai today. the founders who'll win are the ones who identify specific, painful, recurring workflows & make them vanish.
JC Gilbert (@gilbert_jc)
for the better part of the last 2.5 years, 95%+ of the time i have heard that sonnet/opus are the best models for coding, while anthropic charges a token premium, has lower training costs and arguably a steeper revenue trajectory than openai. to me, anthropic won
Andrew Curran (@AndrewCurran_)

Projected OpenAI and Anthropic model training spend for the remainder of this decade, in billions. The WSJ says they got the data from financial documents shared with investors.

JC Gilbert (@gilbert_jc)
@AndrewCurran_ i think it's becoming very clear anthropic is doing much, much better overall: almost the same revenue but a very different cost profile
Andrew Curran (@AndrewCurran_)
Projected OpenAI and Anthropic model training spend for the remainder of this decade, in billions. The WSJ says they got the data from financial documents shared with investors.
NIK (@ns123abc)
>“wE dOnT hAvE a cLeAr dEfiNaTiOn oF AGI”