bd5m112

599 posts

@bd5m112

Joined March 2016
4 Following, 0 Followers
bd5m112
bd5m112@bd5m112·
@reach_vb What about Vending Bench 2? We were shown it over 10k for Opus 4.7
0
0
0
81
bd5m112
bd5m112@bd5m112·
@CalimanuLoredan @chetaslua Because they chose not to put them there for 5.5 it seems. Also the codenames are for internal-only models for OpenAI employees, nothing for public release
1
0
4
167
Calimanu Loredan
Calimanu Loredan@CalimanuLoredan·
@chetaslua if the launch is so close, why don't we see the codenames in the arena? or were they already there and then removed?
1
1
3
2.4K
bd5m112
bd5m112@bd5m112·
@xcore185170 Oh so you imagine that Dario is scared and then you speculate about that. But it's not the case, as I've mentioned earlier. You can continue speculating if it makes you feel better
0
0
0
7
xcore
xcore@XCORE1O1·
@bd5m112 Dario is panicking, he's a bitch. OpenAI will have 5x more compute than Anthropic and he's scared
2
0
0
15
bd5m112
bd5m112@bd5m112·
@reedchan7 @GTqhqh48540 Of course it's true, we're now closer to "old gen" being cheap. I don't want cheaper, I want higher intelligence and there's nothing else than desperately trying to copy from China, nothing meaningful in terms of coding advancing
0
0
0
15
Reed Chan
Reed Chan@reedchan7·
@bd5m112 @GTqhqh48540 You might be right, but at least these Chinese models are open-source. Companies like DeepSeek have taken a huge step forward in making AI accessible to developers and ordinary people worldwide. Isn’t that true? 🌹
1
0
1
51
bd5m112
bd5m112@bd5m112·
@chetaslua They test the models with significantly higher budget before release, same with Gemini. Then they release them at more efficient costs per prompt
0
0
5
1.4K
Chetaslua
Chetaslua@chetaslua·
🚨 OpenAI nerfed the model just before launch. This is the world's first model that solved Sudoku. After launch, run the same prompt and see how nerfed the model is (check: the new one will fail it). Prompt: Solve this Sudoku. Output an image of the solved Sudoku.
Chetaslua@chetaslua

🚨Image 2.0 is now online on ChatGPT and it's incredible! Just a few days ago even 3x3 grids would often struggle, now we can 10x10 the complexity, and it's near perfect! Prompt is the image description, just click on the image ALT button. Credit - u/Alex__007

36
13
337
53K
bd5m112
bd5m112@bd5m112·
@xcore185170 How do you know about Anthropic and Mythos? They've said it uses significantly more compute but that's why the price will be 125$+, because it uses more compute. Also since then they surely have optimized it a ton (they even mentioned they're working exactly on this)
1
0
0
14
xcore
xcore@XCORE1O1·
@bd5m112 i mean compute to give the model to its users, not even anthropic has compute for mythos
1
0
0
33
bd5m112
bd5m112@bd5m112·
@xcore185170 Nah. It will hallucinate like crazy. Saying that Alibaba does not have enough compute is hilarious
1
0
0
38
xcore
xcore@XCORE1O1·
@bd5m112 @ggg78g89 @synthwavedd Alibaba could train a 10 trillion model and beat mythos in 1 month but it's that they don't have the compute for the customers
1
0
0
84
bd5m112
bd5m112@bd5m112·
@GTqhqh48540 I'm talking about coding. In other areas they're not "forever one generation behind", but on coding I've been repeating this for more than a year.
2
0
1
24
bd5m112
bd5m112@bd5m112·
@thsottiaux Why are you releasing everything on Mac? For the non-Mac users, seeing all these new product releases exclusive to only one platform is so disappointing :( We are paying the same subscriptions, yet constantly receiving fewer features. Non-Mac GPT subscription should be -50% honestly
1
1
4
1.1K
Tibo
Tibo@thsottiaux·
We are releasing a *research preview* of Chronicle in Codex. It allows codex to build up memories based on your day to day work on your computer and then refer to these memories to be a lot more helpful. Available for PRO subscriptions and on Mac to start. This is early and consumes quite a bit of tokens, but it has changed how I and many folks at OpenAI use Codex.
OpenAI Developers@OpenAIDevs

Last week, we released a preview of memories in Codex. Today, we’re expanding the experiment with Chronicle, which improves memories using recent screen context. Now, Codex can help with what you’ve been working on without you restating context.

236
152
2.6K
922.2K
bd5m112
bd5m112@bd5m112·
@ggg78g89 @synthwavedd They're forever one generation behind. So they will have Mythos level models this year for sure, when the next-Mythos model will release
2
0
1
198
Ali Haider
Ali Haider@ggg78g89·
@synthwavedd china is doing an extraordinary work. we should appreciate them. in almost 1 year they can have mythos level open source models.
3
0
32
1.9K
bd5m112
bd5m112@bd5m112·
@synthwavedd They're forever "one generation behind", that's why
0
0
0
417
BuBBliK
BuBBliK@k1rallik·
VERCEL GOT HACKED ShinyHunters - the group behind the Ticketmaster breach - is selling Vercel's internal database for $2M on BreachForums here's why every developer should care: - they have NPM tokens and GitHub tokens - Vercel owns Next.js - 6 million weekly downloads - one malicious push = global supply chain attack - Vercel confirmed the breach today, April 19 - they literally DMed the hackers on Telegram asking them to stop rotate your env variables RIGHT NOW
Vercel@vercel

We’ve identified a security incident that involved unauthorized access to certain internal Vercel systems, impacting a limited subset of customers. Please see our security bulletin: vercel.com/kb/bulletin/ve…

288
1.7K
10.2K
2.4M
bd5m112
bd5m112@bd5m112·
@LexnLin @TheOpinionatedH What are you on about? Spud is 5.5 and comes in normal and Pro, and Pro definitely thinks for 20-30 mins. Mythos is likely the same. Big class of models = GPT Pro, Gemini Deep Think and soon Claude Mythos
0
0
0
224
Leon Lin
Leon Lin@LexnLin·
@bd5m112 @TheOpinionatedH deepthink is not the same tier. it takes way longer for responses. you cant compare mythos/spud who are just like the current models maybe a bit slower with deepthink which would take more than 20min
1
0
0
62
leo 🐾
leo 🐾@synthwavedd·
so we're really not seeing anything big from deepmind until I/O huh interesting strategy
27
5
521
62.4K
bd5m112
bd5m112@bd5m112·
@cosmo25x @TheOpinionatedH Oh it is in the same class. But old one obviously is not a competition for the upcoming Mythos and Spud. That's gonna be the next one likely announced at Google i/o
0
0
0
39
-
-@cosmo25x·
@bd5m112 @TheOpinionatedH Deep think model is not competition to spud/mythos. & the version naming doesn’t mean anything about its capabilities tbh
1
0
3
66
bd5m112
bd5m112@bd5m112·
@VictorTaelin It's an A/B test, only a small percentage of users got the stealth 5.5 model
0
0
0
155
Taelin
Taelin@VictorTaelin·
@chetaslua @OpenAI @sama sorry for being cynical, nothing personal, just that extraordinary results require some evidence. hard to believe you got this from a single-line prompt has anyone been able to replicate it? my gpt-5.4 pro is for sure nowhere like that
12
2
68
9.4K
bd5m112
bd5m112@bd5m112·
@TheOpinionatedH They already have the Gemini Deep Think model and they will definitely do an update to that heavy model this year. Also, hasn't Spud been confirmed to be just GPT 5.5 and nothing else 'special'?
2
0
0
341
Boris Cherny
Boris Cherny@bcherny·
For those not seeing the increase, make sure you're using Opus 4.7 with the latest Claude Code
67
11
960
148.5K
Boris Cherny
Boris Cherny@bcherny·
Opus 4.7 uses more thinking tokens, so we've increased rate limits for all subscribers to make up for it. Enjoy!
1.2K
936
22.2K
1.3M
bd5m112
bd5m112@bd5m112·
@chetaslua @tejashaveridev It does look like a distilled version of Mythos from any angle you look at it: it's more powerful but faster than Opus 4.6 and it behaves totally differently, it's prompted differently too, nothing related to Opus 4.5/4.6 in it. Likely to be Mythos distilled
1
0
5
205
Chetaslua
Chetaslua@chetaslua·
@tejashaveridev Doesn't seem so , it doesn't have that big 10T type vibe and it's Opus family, Mythos is separate
2
0
9
1.8K
Chetaslua
Chetaslua@chetaslua·
Dont know about Mythos, but 4.7 is really too dangerous to release with all these out of control hallucinations. Funny thing is if you look through the Claude code source there’s multiple comments in the code about how the new unreleased model had 30% increased hallucinations compared to current model. It doesn’t specifically call it Opus 4.6/4.7 but given the timeline I think it’s safe to assume it’s 4.7 and they had to design specific prompting strategies to avoid hallucination. I don’t tweet but I’d love to see Boris Cherny’s comments on those damning comments in the code.
Chetaslua@chetaslua

🚨 Biggest model regression of all time. Opus 4.7 failed the colourblind test: it recognises the Ishihara color blindness test plate yet fails the test with the wrong answer, 26. Correct answer: 74. Reference image in comment.

24
11
283
33.6K
bd5m112
bd5m112@bd5m112·
@synthwavedd You will forever be known as the king of guessing! One that happens to guess pretty badly, that is (several times). A bad reputation you can't regain easily
0
0
10
1.6K
leo 🐾
leo 🐾@synthwavedd·
as seems to almost always be the case these days, the 5.5 launch has been pushed back (it will not be tomorrow) not too long of a delay though, more soon
leo 🐾@synthwavedd

@chatgpt21 yeah i've heard 🥀 alas yes i think it will be

40
9
328
161.2K
bd5m112
bd5m112@bd5m112·
@complex_maths @CtrlAltDwayne No, they have not done this. So many false accusations on the internet it's unreal. Cache has been 1h for the main agent and 5min for subagents for a long while for subscribers. API customers don't even have 1h cache unless they manually opt in. Stop spreading garbage
0
0
0
5
Jon Klaric
Jon Klaric@complex_maths·
@CtrlAltDwayne They don’t quantize or degrade the model weights, but it does appear they limited thinking and caching for their subscriptions, which could explain the degradation.
2
0
18
1K
Dwayne
Dwayne@CtrlAltDwayne·
Anthropic allegedly doesn't degrade its models intentionally, and yet they seem to be the only lab that experiences this degradation issue going back to Claude 3 out of every other SOTA lab. Are the models inherently unstable and black box, or is there something more to this? Because this seems to happen leading up to new model releases.
Thariq@trq212

@Hesamation we don't degrade our models to better serve demand, have said this many times before

33
5
283
17.7K