Not Spacewear (@NotSpacewear) - Twitter Profile

Not Spacewear@NotSpacewear·6h

@JasonBotterill USER ERROR

JB@JasonBotterill·8h

I don’t get the Opus 4.7 hate. I genuinely had one of the most productive days of my life yesterday

Not Spacewear@NotSpacewear·13h

@firstadopter user error

tae kim@firstadopter·18h

Hearing from folks that the Anthropic Opus 4.7 launch debacle is making them consider whether Mythos is way overhyped. Makes total sense.

tae kim@firstadopter

Put a fork in it. I'm making the call: Anthropic's Opus 4.7 is a stinker. Too many credible accounts are saying the same negative stuff. If Anthropic magically has 100% uptime now, that's also a red flag for the adaptive thinking nerf.

Not Spacewear@NotSpacewear·14h

@GergelyOrosz the model is actually really smart. most of the issues are user error

Gergely Orosz@GergelyOrosz·15h

Opus 4.7 is the first model that feels like it is openly condescending towards me, as the customer with certain prompts. Whatever Anthropic did: I don't like it. If the model is condescending: I shouldn't be paying you to use it; you should be paying me! Who would have thought

Not Spacewear@NotSpacewear·14h

Opus 4.7 issues are user error

Not Spacewear@NotSpacewear·22h

@techdevnotes sweet

Tech Dev Notes@techdevnotes·23h

Grok 4.3 can create Slides

Not Spacewear@NotSpacewear·22h

@kimmonismus It takes a few days to learn to use the new model

Chubby♨️@kimmonismus·1d

I've now spent several hours using Opus 4.7 and comparing it to 4.6, and it's like night and day for me. Opus 4.7 feels like a disgruntled employee whose results you can't judge and have to check afterward. The trust you had with 4.6 is gone. It's like hiring a new employee who had excellent grades in their application but is totally sloppy and disgruntled in practice and doesn't follow instructions. The consequence: fire them. So, for now, I'm going back to 4.6. Seriously: did not expect such a release from Anthropic. The biggest win for OpenAI was Anthropic's Opus release.

Chubby♨️@kimmonismus

ok wtf, i'll say it. give me back 4.6, what the heck is this sh*t. The more i use 4.7 the more annoyed i am. this is such a rushed release.

Not Spacewear@NotSpacewear·22h

@kimmonismus There's just something meh about knowing you aren't receiving the best model

Chubby♨️@kimmonismus·1d

Opus 4.7 consumes approximately 1.3 times as many tokens. The instructions must be very precise. Many are complaining about a "rushed release." In the Bullshit Benchmark, it performs worse than Opus 4.6. The mood is very mixed. Anthropic may have done OpenAI a big favor with this. Spud is expected next week. And if the release is done right, it could overshadow Opus and catapult ChatGPT back to the top. h/t @petergostev for the benchmark and image

Chubby♨️@kimmonismus

The mood regarding the Opus 4.7 update has shifted. If I had to guess, I'd say 60% are disappointed with the latest update, while 40% are positive. I'm still undecided myself. Here's a good summary from someone on Reddit. What's your take on it so far?

Not Spacewear@NotSpacewear·1d

@scaling01 I think it’s just hard for people to get as excited when they know the real good stuff is being withheld

Lisan al Gaib@scaling01·1d

I think everyone saying that these improvements are mid is smoking crack. I would argue that this was one of the larger Opus jumps we have seen over the last year. You also have to keep in mind that we see almost monthly model updates nowadays instead of just every 6-12 months like in 2023.

Lisan al Gaib@scaling01

Claude Opus 4.7 Benchmarks

Not Spacewear@NotSpacewear·1d

@haider1 Nobody, including Google, has figured out how to tame Gemini’s MoE architecture. The METR harness might be better than any of the CLIs for Gemini.

Haider.@haider1·1d

not what i expected at all on the latest METR benchmark: gemini 3.1 pro hit about 6.4 hours on the 50% time-horizon metric, but what matters most to me is that it leads the stricter 80% metric at about 1.5 hours, ahead of opus 4.6 and gpt-5.4 xhigh

Not Spacewear@NotSpacewear·1d

@scaling01 This could mean that Gemini doesn't have a good harness that people can openly use. METR has its own harness, and maybe it is actually effective at getting Gemini 3.1 to perform. Pretty interesting

Lisan al Gaib@scaling01·2d

Gemini 3.1 Pro scoring above GPT-5.4-xhigh 💀😭

Lisan al Gaib@scaling01

METR Time Horizons for Gemini 3.1 Pro: 6 hours 24 minutes

Not Spacewear@NotSpacewear·1d

@kyle_mccleary @scaling01 That might be the wrong conclusion. METR has its own harness. It could imply that the current harnesses for Gemini 3.1 pro are not effective at controlling the model properly, but that the model is actually very capable.

Kyle@kyle_mccleary·2d

@scaling01 Says much more about the benchmark imo.

Not Spacewear@NotSpacewear·1d

@petergostev Don't worry, this app was built by Opus 4.7. Not even the good stuff.

Peter Gostev@petergostev·2d

A common problem we will now see is the kind of 'google disease' but at mass scale. Where smart engineers can ship apps quickly, then lose interest and move on to something else cool to ship. It feels good to ship a whole new app, but less cool to keep fixing bugs a year later.

Theo - t3.gg@theo

I feel bad dunking on them so much but it's genuinely absurd how bad the new Claude Code desktop app is. You can feel the vibe code leaking everywhere. Every "feature" is barely integrated and full of edge cases that weren't considered. Every menu feels barren, stuffed in last second for some random toggle. Every hotkey breaks as soon as you try to do anything else. I've lost track of how many bugs I've encountered. I found at least 40 in under an hour. And it's all truly absurd arcane shit. Stuff like voice mode typing in all input boxes instead of just the one you have focused. Any one of these issues would have been enough for me to do a massive post-mortem and likely fire someone. A $400b company shipping this is absurd. I feel like I'm going mad. How does anyone seriously use this?? It is broken on fundamental levels that are hard to comprehend. How are we supposed to trust the code these models produce if Anthropic's official showcases are absolute slop? Dedicated video on this coming tomorrow. Just needed to get this off my chest.

Not Spacewear@NotSpacewear·1d

@Lentils80 there's something so lame about a release of a non-SOTA model by a lab when the good stuff is behind closed doors

Lentils@Lentils80·2d

Claude Opus 4.7 (not 5) is finally dropping soon. Obviously it won't even come close to Mythos, but I'm hoping the leap from 4.6 is at least as big as 4.5 to 4.6 was, if not bigger.

can@marmaduke091

Opus 4.7 sighted on Google Vertex AI 👀

Not Spacewear@NotSpacewear·1d

@theo claude, fix the broken ui states in this application, make no mistakes

Theo - t3.gg@theo·2d

I feel bad dunking on them so much but it's genuinely absurd how bad the new Claude Code desktop app is. You can feel the vibe code leaking everywhere. Every "feature" is barely integrated and full of edge cases that weren't considered. Every menu feels barren, stuffed in last second for some random toggle. Every hotkey breaks as soon as you try to do anything else. I've lost track of how many bugs I've encountered. I found at least 40 in under an hour. And it's all truly absurd arcane shit. Stuff like voice mode typing in all input boxes instead of just the one you have focused. Any one of these issues would have been enough for me to do a massive post-mortem and likely fire someone. A $400b company shipping this is absurd. I feel like I'm going mad. How does anyone seriously use this?? It is broken on fundamental levels that are hard to comprehend. How are we supposed to trust the code these models produce if Anthropic's official showcases are absolute slop? Dedicated video on this coming tomorrow. Just needed to get this off my chest.

Not Spacewear@NotSpacewear·1d

@haider1 so lame to not be getting the best model

Haider.@haider1·1d

opus 4.7 is likely launching today. if this really reflects their internal progress, opus 4.7 may have been a more rushed release than usual. it seems this was originally meant to be 'mythos', but they pulled back, so they still needed to ship something around this time and opus 4.7 became that release

Not Spacewear@NotSpacewear·2d

@scaling01 Are the Gemini issues all harness?

Lisan al Gaib@scaling01·2d

Gemini 3.1 Pro taking off on the 80% METR time horizon

Not Spacewear@NotSpacewear·2d

@xpasky @scaling01 Honestly, it does feel like it’s got a huge CLI problem. Nobody can figure out how to tame it. Not even Google. Probably some magic in the model.

Petr Baudis@xpasky·2d

@scaling01 How the heck did they wire it up to make it finish things?

Not Spacewear@NotSpacewear·2d

@Yampeleg The gimped model 😭

Yam Peleg@Yampeleg·2d

Opus 4.7 Imminent.

Not Spacewear@NotSpacewear·2d

@daniel_mac8 Spud next week according to the betting markets

Dan McAteer@daniel_mac8·3d

Spud 🥔 and Claude Opus 4.7 in the same week. What a time to be AI-live.

Stephanie Palazzolo@steph_palazzolo

Scooplet: Anthropic is prepping its Claude Opus 4.7 model and a new AI tool for design, both of which could be released as soon as this week. theinformation.com/briefings/excl…

Not Spacewear@NotSpacewear·2d

@kimmonismus Betting markets say Spud is next week :(

Chubby♨️@kimmonismus·3d

I was always torn between GPT-5.4 and Opus 4.6. But over time, I've come to the conclusion that Claude has a better "taste." Anyway, I'm super hyped for this week! Opus 4.7 and (fingers crossed) Spud
