Sergio

96 posts

@SergioOSINT

Security Research | OSINT | AI Research

Joined August 2025
85 Following · 7 Followers
Konsti Wohlwend @konstiwohlwend
@SergioOSINT Sorry, I'm running critical banking, airline & insurance infrastructure, so upgrading would be irresponsible towards our shareholders (no, seriously, this is just a joke; I do hope no one is still running this in prod, but I also wouldn't be 100% surprised :] )
Sergio @SergioOSINT
@olsenbdnr 99% of these will be residential proxies...
Olsen @olsenbdnr
For those who are trying to scam X Ads, use fraudulent credit cards, etc.: better make sure your opsec is flawless, because we are coming for you. I already got a few IPs, and we are drafting up subpoenas to your ISPs/email providers and more!
Tibo @thsottiaux
@nima_ab Did you know that you've hit your usage limit?
Sergio @SergioOSINT
@nahcrof @uwunetes @0xtiago_ @DBrodniak Yes, I tried GLM 5.1 Precision, Kimi K2.6 Precision, and DeepSeek V4 Pro; sadly, they don't seem to perform as well as the original providers' inference does.
addison @uwunetes
xAI is the most unserious US lab lmao, why would you ever release this? It's a closed-source model worse than open-source models, like why would I use this over DeepSeek or Kimi?
Artificial Analysis@ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20. The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.
Key Takeaways:
➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.
➤ Large increase in real-world agentic task performance: the largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2's score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.
➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1. Grok 4.3 maintains an 81% IFBench score from Grok 4.20 0309 v2.
➤ Grok 4.3 gains 8 points on AA-Omniscience Accuracy, but at the cost of an AA-Omniscience Non-Hallucination Rate 8 points lower, so Grok 4.20 0309 v2 still leads on AA-Omniscience Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.
Congratulations to @xAI and @elonmusk on the impressive release!
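A quick check on the "~17% expected win rate" figure: it follows directly from the standard Elo expectation formula applied to the 276-point GDPval-AA gap. A minimal sketch, using the ratings quoted in the post above:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected win rate of player A against player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Grok 4.3 at 1500 Elo vs. GPT-5.5 (xhigh) 276 points ahead at 1776:
print(round(elo_win_prob(1500, 1776), 2))  # ~0.17, matching the quoted win rate
```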

Sergio @SergioOSINT
@uwunetes Well, both MiMo V2.5 and Kimi K2.6 are 1T+ parameter models; Grok 4.3 is a 0.5T model. Grok 4.3 is closed source, and it still costs less than both of them (~2.5x cheaper than Kimi K2.6). Not to mention how heavily distilled those models are.
Sergio @SergioOSINT
@stripe @sama xxhigh??? they have a secret reasoning mode the public can't access? loll
Sergio @SergioOSINT
@bridgemindai I respect your work and everything, but if so, then please don't include Security as one of its statistics. GPT 5.5 is near Mythos level on security research and general security, and Sonnet 4.6 is not near Mythos level (which should be obvious).
BridgeMind @bridgemindai
@SergioOSINT Fair. BridgeBench is code-analysis fabrication, not a security-research benchmark.
BridgeMind @bridgemindai
Grok 4.3 just took #1 on BridgeBench. 500B parameters. 90.3 Vibe score. 302 tok/s. Lowest hallucination rate in the field. Fast enough for real vibe coding, not just leaderboard screenshots. The AI race is shifting. If Grok keeps compounding at this pace, xAI is not just competing. They’re becoming the favorite to win.
Sergio @SergioOSINT
@Angaisb_ Just because GPT 5.5 matches on what GPT 5.5 specializes in, against what Mythos does NOT specialize in, does not mean that it's not dangerous. Not to mention that GPT 5.5 can very much also be dangerously good at cyber. Even Opus 4.5 is dangerous.
Sergio @SergioOSINT
@edugarmer @XFreeze We're going to see a bunch of iterations first, as Elon expects the release of Grok 5 (I think it was) to be AGI.
Eduardo C. Garrido-Merchán
@XFreeze Amazing. Grok 5 might really surpass Anthropic. We may have a surprise by the end of the year. Things may change soon.
X Freeze @XFreeze
Grok 4.3 is sitting in the top 7 with literally just 500B parameters. The lowest size by far. Meanwhile, every other model competing at this level is between 1T and 6T parameters. It's not just small. It's also the most intelligent, fastest, and lowest-hallucination model in its class... all while being one of the cheapest to run. xAI built the most efficient frontier model on the planet.
Artificial Analysis @ArtificialAnlys (quoted post, same as above)
Sergio @SergioOSINT
@haider1 I mean, this doesn't really mean it's marketing, just that OpenAI is a bit more careless. Besides that, Mythos was supposed to be good at code, not general offensive cyber, so not Active Directory and similar attacks. This is way outside the field Mythos was made for.
Haider. @haider1
Seems like the "Mythos" panic was mostly Anthropic marketing. AISI found GPT-5.5 performs nearly on par with, or better than, Mythos in several cases: completing TLO end-to-end in 2/10 attempts, while Mythos preview did it in 3/10. On expert-level tasks, GPT-5.5 scored 71.4%; Mythos scored 68.6%.
Sergio @SergioOSINT
@scaling01 I mean, if we're going to be fair, Grok 4.3 is punching well above its weight: MiMo-V2.5 Pro and Kimi K2.6 are both 1T+ models, and heavily distilled from other models as well. Grok 4.3 is a 0.5T model.
Lisan al Gaib @scaling01
Grok-4.3 still behind Chinese open-source
Sergio @SergioOSINT
@LexnLin I mean, to be fair, Mythos was never meant for this kind of security work. It was not made for Active Directory or the like.
Sergio @SergioOSINT
@levzzz5154 @MTSlive Mostly; when you say distilling, it's taking full trajectories, including thoughts.
levzzz @levzzz5154
@MTSlive It's kind of hard not to distill at all: GitHub etc. is all polluted with various LLM outputs, as well as multi-model agents if data sharing is enabled.
MTS @MTSlive
LIVE TRIAL UPDATE: OpenAI's counsel asked Musk whether xAI has ever "distilled" technology from OpenAI. Musk: "Generally AI companies distill other AI companies." "Is that a yes?" Savitt asked. Musk: "Partly."
Sergio @SergioOSINT
@banteg This is not correct; Elon has said MULTIPLE times that even the newest and most capable Grok 4 (Grok 4.3) is a 0.5T model.
banteg @banteg
I've never seen someone hedge so much (9x). I think the ranking is more interesting than the "predicted" size.
Sergio @SergioOSINT
@deedydas I'm pretty sure that all Grok models are currently below or around 0.5T
Deedy @deedydas
Researchers just estimated the size of all the LLMs by asking them knowledge questions of varying degrees of obscurity!
– GPT 5.5: ~10T params
– Claude Opus 4.x: ~4-5T
– Grok 4: ~3T
The idea here is that factual capacity scales log-linearly with size. The paper shows 7 knowledge tiers, and T7 is essentially ~0% for all models, suggesting there is still significant headroom for pretraining. Gemini 3.1 Pro is likely >10T, given it's used as an anchor but has no direct estimate. This means we can infer, to some degree, what different models might cost, and their post-training effectiveness (performance at certain non-factual tasks given their size). One of the coolest papers I've read of late.
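The log-linear scaling idea behind these estimates can be sketched numerically. This is a hypothetical illustration only: the anchor parameter counts and accuracies below are assumed for the example, not taken from the paper.

```python
import math

def fit_log_linear(params1, acc1, params2, acc2):
    """Fit accuracy = slope * log10(params) + intercept through two anchor models."""
    slope = (acc2 - acc1) / (math.log10(params2) - math.log10(params1))
    intercept = acc1 - slope * math.log10(params1)
    return slope, intercept

def estimate_params(accuracy, slope, intercept):
    """Invert the fit to estimate a model's parameter count from its factual accuracy."""
    return 10 ** ((accuracy - intercept) / slope)

# Assumed anchors: a 1T model scoring 0.50 and a 10T model scoring 0.70
# on the same obscurity-tiered question set.
slope, intercept = fit_log_linear(1e12, 0.50, 1e13, 0.70)

# A model scoring 0.60 lands at the geometric midpoint, ~3.2T params.
print(f"{estimate_params(0.60, slope, intercept):.2e}")
```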
Sergio @SergioOSINT
@PathOfMen_ Public humiliation is always everyone's worst fear
Path of Men @PathOfMen_
"I'm too scared to talk to this girl" What your ancestors did on a random Wednesday: