bd5m112

599 posts

@bd5m112

Joined March 2016

4 Following · 0 Followers

bd5m112@bd5m112·20h

@reach_vb What about Vending Bench 2? We were shown it over 10k for Opus 4.7

Vaibhav (VB) Srivastav@reach_vb·20h

it’s a good model sir!

bd5m112@bd5m112·2d

@CalimanuLoredan @chetaslua Because they chose not to put them there for 5.5 it seems. Also the codenames are for internal-only models for OpenAI employees, nothing for public release

Calimanu Loredan@CalimanuLoredan·2d

@chetaslua if the launch is so close, why don't we see the codenames in the arena? or were they already there and then removed?

Chetaslua@chetaslua·2d

Fun fact: it's not GPT-5.5 in the leak that gets my attention. I'm more interested in all these unknown names popping up in the Codex app. Read their descriptions 😭

Chetaslua@chetaslua

🚨 GPT 5.5 spotted in codex cli and app thursday launch > btw gpt 5.5 will be 3-4 times the usual gpt 5.4 > image v2 will help in better webdev

bd5m112@bd5m112·2d

@xcore185170 Oh so you imagine that Dario is scared and then you speculate about that. But it's not the case, as I've mentioned earlier. You can continue speculating if it makes you feel better

xcore@XCORE1O1·3d

@bd5m112 Dario is panicking, he's a bitch. OpenAI will have 5x more compute than Anthropic and he's scared

leo 🐾@synthwavedd·4d

can't help but notice every new chinese model's benchmark visuals seem to compare to previous generations opus 4.7 conspicuously absent.. lol

Kimi.ai@Kimi_Moonshot

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
- K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
- 🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

bd5m112@bd5m112·2d

@reedchan7 @GTqhqh48540 Of course it's true, we're now closer to "old gen" being cheap. I don't want cheaper, I want higher intelligence, and there's nothing else than desperately trying to copy from China, nothing meaningful in terms of advancing coding

Reed Chan@reedchan7·3d

@bd5m112 @GTqhqh48540 You might be right, but at least these Chinese models are open-source. Companies like DeepSeek have taken a huge step forward in making AI accessible to developers and ordinary people worldwide. Isn’t that true? 🌹

bd5m112@bd5m112·2d

@chetaslua They test the models with significantly higher budget before release, same with Gemini. Then they release them at more efficient costs per prompt

Chetaslua@chetaslua·2d

🚨 OpenAI nerfed the model just before launch. This is the world's first model that solved Sudoku. After launch, run the same prompt and see how nerfed the model is (check, the new one will fail it). Prompt: Solve this Sudoku. Output an image of the solved Sudoku.

Chetaslua@chetaslua

🚨Image 2.0 is now online on ChatGPT and it's incredible! Just a few days ago even 3x3 grids would often struggle, now we can 10x10 the complexity, and it's near perfect! The prompt is the image description, just click the image's ALT button. Credit - u/Alex__007

bd5m112@bd5m112·3d

@xcore185170 How do you know about Anthropic and Mythos? They've said it uses significantly more compute but that's why the price will be 125$+, because it uses more compute. Also since then they surely have optimized it a ton (they even mentioned they're working exactly on this)

xcore@XCORE1O1·3d

@bd5m112 i mean compute to give the model to its users, not even anthropic has compute for mythos

bd5m112@bd5m112·3d

@xcore185170 Nah. It will hallucinate like crazy. Saying that Alibaba does not have enough compute is hilarious

xcore@XCORE1O1·3d

@bd5m112 @ggg78g89 @synthwavedd Alibaba could train a 10 trillion model and beat mythos in 1 month but it's that they don't have the compute for the customers

bd5m112@bd5m112·3d

@GTqhqh48540 I'm talking about coding. In other areas they're not "forever one generation behind", but on coding I've been repeating this for more than a year.

The Founder@GTqhqh48540·4d

@bd5m112 @ggg78g89 @synthwavedd You may have Not heard Of Seedance 2.0 then

bd5m112@bd5m112·3d

@thsottiaux Why are you releasing everything on Mac? For the non-Mac users, seeing all these new product releases exclusive to only one platform is so disappointing :( We are paying the same subscriptions, yet constantly receiving fewer features. Non-Mac GPT subscription should be -50% honestly

Tibo@thsottiaux·3d

We are releasing a *research preview* of Chronicle in Codex. It allows codex to build up memories based on your day to day work on your computer and then refer to these memories to be a lot more helpful. Available for PRO subscriptions and on Mac to start. This is early and consumes quite a bit of tokens, but it has changed how I and many folks at OpenAI use Codex.

OpenAI Developers@OpenAIDevs

Last week, we released a preview of memories in Codex. Today, we’re expanding the experiment with Chronicle, which improves memories using recent screen context. Now, Codex can help with what you’ve been working on without you restating context.

bd5m112@bd5m112·4d

@ggg78g89 @synthwavedd They're forever one generation behind. So they will have Mythos level models this year for sure, when the next-Mythos model will release

Ali Haider@ggg78g89·4d

@synthwavedd china is doing an extraordinary work. we should appreciate them. in almost 1 year they can have mythos level open source models.

bd5m112@bd5m112·4d

@synthwavedd They're forever "one generation behind", that's why

bd5m112@bd5m112·4d

@acuriousoracle @k1rallik @PovilasKorop State actors

A Curious Oracle@acuriousoracle·4d

@k1rallik @PovilasKorop Big question who buys this data for this millions?

BuBBliK@k1rallik·5d

VERCEL GOT HACKED
ShinyHunters - the group behind the Ticketmaster breach - is selling Vercel's internal database for $2M on BreachForums
here's why every developer should care:
- they have NPM tokens and GitHub tokens
- Vercel owns Next.js - 6 million weekly downloads
- one malicious push = global supply chain attack
- Vercel confirmed the breach today, April 19
- they literally DMed the hackers on Telegram asking them to stop
rotate your env variables RIGHT NOW

Vercel@vercel

We’ve identified a security incident that involved unauthorized access to certain internal Vercel systems, impacting a limited subset of customers. Please see our security bulletin: vercel.com/kb/bulletin/ve…

bd5m112@bd5m112·4d

@LexnLin @TheOpinionatedH What are you on about? Spud is 5.5 and comes in normal and Pro, and Pro definitely thinks for 20-30 mins. Mythos is likely the same. Big class of models = GPT Pro, Gemini Deep Think and soon Claude Mythos

Leon Lin@LexnLin·5d

@bd5m112 @TheOpinionatedH deepthink is not the same tier. it takes way longer for responses. you cant compare mythos/spud who are just like the current models maybe a bit slower with deepthink which would take more than 20min

leo 🐾@synthwavedd·5d

so we're really not seeing anything big from deepmind until I/O huh interesting strategy

bd5m112@bd5m112·4d

@cosmo25x @TheOpinionatedH Oh it is in the same class. But old one obviously is not a competition for the upcoming Mythos and Spud. That's gonna be the next one likely announced at Google i/o

-@cosmo25x·5d

@bd5m112 @TheOpinionatedH Deep think model is not competition to spud/mythos. & the version naming doesn’t mean anything about its capabilities tbh

bd5m112@bd5m112·4d

@VictorTaelin It's an A/B test, only a small percentage of users got the stealth 5.5 model

Taelin@VictorTaelin·5d

@chetaslua @OpenAI @sama sorry for being cynical, nothing personal, just that extraordinary results require some evidence. hard to believe you got this from a single-line prompt has anyone been able to replicate it? my gpt-5.4 pro is for sure nowhere like that

Chetaslua@chetaslua·5d

GPT Pro - Spud solved SVG One SHOT svg , code is shared in the comments @OpenAI you won this time , i never said this but i love this comeback lets gooo @sama

JB@JasonBotterill

Yeah its way faster now and is giving different styles of output compared to regular 5.4

bd5m112@bd5m112·5d

@TheOpinionatedH They already have the Gemini Deep Think model and they will definitely do an update to that heavy model this year. Also, hasn't Spud been confirmed to be just GPT 5.5 and nothing else 'special'?

The Opinionated Human@TheOpinionatedH·5d

@synthwavedd They may not have a response to Mythos/Spud

bd5m112@bd5m112·6d

@bcherny Any statements from the team on things like this thread? This seems to be the consensus everywhere I look reddit.com/r/ClaudeCode/c…

Boris Cherny@bcherny·Apr 16

For those not seeing the increase, make sure you're using Opus 4.7 with the latest Claude Code

Boris Cherny@bcherny·Apr 16

Opus 4.7 uses more thinking tokens, so we've increased rate limits for all subscribers to make up for it. Enjoy!

bd5m112@bd5m112·6d

@chetaslua @tejashaveridev It does look like a distilled version of Mythos from any angle you look at it. It's more powerful but faster than Opus 4.6 and it behaves totally differently, it's prompted differently too, nothing related to Opus 4.5/4.6 in it. Likely to be Mythos distilled

Chetaslua@chetaslua·6d

@tejashaveridev Doesn't seem so , it doesn't have that big 10T type vibe and it's Opus family, Mythos is separate

Chetaslua@chetaslua·6d

Dont know about Mythos, but 4.7 is really too dangerous to release with all these out of control hallucinations. Funny thing is if you look through the Claude code source there’s multiple comments in the code about how the new unreleased model had 30% increased hallucinations compared to current model. It doesn’t specifically call it Opus 4.6/4.7 but given the timeline I think it’s safe to assume it’s 4.7 and they had to design specific prompting strategies to avoid hallucination. I don’t tweet but I’d love to see Boris Cherny’s comments on those damning comments in the code.

Chetaslua@chetaslua

🚨 Biggest model regression of all time: Opus 4.7 failed the colourblind test. It recognised the Ishihara color blindness test plate yet failed the test with the wrong answer, 26. Correct answer - 74, and reference image in comment

bd5m112@bd5m112·Apr 16

@synthwavedd You will forever be known as the king of guessing! One that happens to guess pretty badly, that is (several times). A bad reputation you can't regain easily

leo 🐾@synthwavedd·Apr 15

as seems to almost always be the case these days, the 5.5 launch has been pushed back (it will not be tomorrow) not too long of a delay though, more soon

leo 🐾@synthwavedd

@chatgpt21 yeah i've heard 🥀 alas yes i think it will be

bd5m112@bd5m112·Apr 14

@complex_maths @CtrlAltDwayne No they have not done this. So many false accusations on the internet it's unreal. Cache has been 1h for main agent and 5min for subagents since a long while for subscribers. API customers don't even have 1h cache unless they manually opt in. Stop spreading garbage

Jon Klaric@complex_maths·Apr 14

@CtrlAltDwayne They don’t quantize or degrade the model weights, but it does appear they limited thinking and caching for their subscriptions, which could explain the degradation.

Dwayne@CtrlAltDwayne·Apr 14

Anthropic allegedly doesn't degrade its models intentionally, and yet they seem to be the only lab that experiences this degradation issue going back to Claude 3 out of every other SOTA lab. Are the models inherently unstable and black box, or is there something more to this? Because this seems to happen leading up to new model releases.

Thariq@trq212

@Hesamation we don't degrade our models to better serve demand, have said this many times before
