Not Spacewear

880 posts

@NotSpacewear

Look Stellar, Stay Grounded! Doing Business as Not Spacewear. Wyoming, USA.

Cheyenne, WY · Joined July 2024
557 Following · 459 Followers
JB (@JasonBotterill)
I don’t get the Opus 4.7 hate. I genuinely had one of the most productive days of my life yesterday.
20
2
159
4.6K
Gergely Orosz (@GergelyOrosz)
Opus 4.7 is the first model that feels like it is openly condescending towards me, as the customer with certain prompts. Whatever Anthropic did: I don't like it. If the model is condescending: I shouldn't be paying you to use it; you should be paying me! Who would have thought
112
33
1.2K
81.5K
Not Spacewear (@NotSpacewear)
Opus 4.7 issues are user error
0
0
0
11
Tech Dev Notes (@techdevnotes)
Grok 4.3 can create Slides
[image]
62
95
1.1K
5.1M
Chubby♨️ (@kimmonismus)
I've now spent several hours using Opus 4.7 and comparing it to 4.6, and it's like night and day for me. Opus 4.7 feels like a disgruntled employee whose results you can't judge and have to check afterward. The trust you had with 4.6 is gone. It's like hiring a new employee who had excellent grades in their application but is totally sloppy and disgruntled in practice and doesn't follow instructions. The consequence: fire them. So, for now, I'm going back to 4.6. Seriously: I did not expect such a release from Anthropic. The biggest win for OpenAI was Anthropic's Opus release.
Chubby♨️ (@kimmonismus)

ok wtf, i'll say it. give me back 4.6. what the heck is this sh*t. The more i use 4.7, the more annoyed i am. this is such a rushed release.
85
50
1.1K
129.5K
Not Spacewear (@NotSpacewear)
@kimmonismus There's just something meh about knowing you aren't receiving the best model
0
0
0
242
Chubby♨️ (@kimmonismus)
Opus 4.7 consumes approximately 1.3 times as many tokens. The instructions must be very precise. Many are complaining about a "rushed release." In the Bullshit Benchmark, it performs worse than Opus 4.6. The mood is very mixed. Anthropic may have done OpenAI a big favor with this. Spud is expected next week. And if the release is done right, it could overshadow Opus and catapult ChatGPT back to the top. h/t @petergostev for the benchmark and image
[image]
Chubby♨️ (@kimmonismus)

The mood regarding the Opus 4.7 update has shifted. If I had to guess, I'd say 60% are disappointed with the latest update, while 40% are positive. I'm still undecided myself. Here's a good summary from someone on Reddit. What's your take on it so far?

51
48
798
77K
Not Spacewear (@NotSpacewear)
@scaling01 I think it’s just hard for people to get as excited when they know the real good stuff is being withheld
0
0
0
34
Lisan al Gaib (@scaling01)
I think everyone saying that these improvements are mid is smoking crack. I would argue that this was one of the larger Opus jumps we have seen over the last year. You also have to keep in mind that we see almost monthly model updates nowadays instead of just every 6-12 months like in 2023.
Lisan al Gaib (@scaling01)

Claude Opus 4.7 Benchmarks

11
5
136
7.1K
Not Spacewear (@NotSpacewear)
@haider1 Nobody, including Google, has figured out how to tame Gemini’s MoE architecture. The METR harness might be better than any of the CLIs for Gemini.
0
0
0
80
Haider. (@haider1)
not what i expected at all on the latest METR benchmark: gemini 3.1 pro hit about 6.4 hours on the 50% time-horizon metric, but what is really most important to me is that it leads the stricter 80% metric at about 1.5 hours, ahead of opus 4.6 and gpt-5.4 xhigh
[image]
9
10
78
4.4K
Not Spacewear (@NotSpacewear)
@scaling01 This could mean that Gemini doesn't have a good harness that people can openly use. METR has its own harness, and maybe it is actually effective at getting Gemini 3.1 to perform. Pretty interesting
0
0
1
37
Not Spacewear (@NotSpacewear)
@kyle_mccleary @scaling01 That might be the wrong conclusion. METR has its own harness. It could imply that the current harnesses for Gemini 3.1 pro are not effective at controlling the model properly, but that the model is actually very capable.
0
0
0
10
Kyle (@kyle_mccleary)
@scaling01 Says much more about the benchmark imo.
1
0
13
577
Not Spacewear (@NotSpacewear)
@petergostev Don't worry, this app was built by Opus 4.7. Not even the good stuff.
0
0
0
191
Not Spacewear (@NotSpacewear)
@Lentils80 there's something so lame about a lab releasing a non-SOTA model when the good stuff is behind closed doors
0
0
0
64
Not Spacewear (@NotSpacewear)
@theo claude, fix the broken ui states in this application, make no mistakes
0
0
0
17
Theo - t3.gg (@theo)
I feel bad dunking on them so much but it's genuinely absurd how bad the new Claude Code desktop app is. You can feel the vibe code leaking everywhere. Every "feature" is barely integrated and full of edge cases that weren't considered. Every menu feels barren, stuffed in last second for some random toggle. Every hotkey breaks as soon as you try to do anything else. I've lost track of how many bugs I've encountered. I found at least 40 in under an hour. And it's all truly absurd arcane shit. Stuff like voice mode typing in all input boxes instead of just the one you have focused. Any one of these issues would have been enough for me to do a massive post-mortem and likely fire someone. A $400b company shipping this is absurd. I feel like I'm going mad. How does anyone seriously use this?? It is broken on fundamental levels that are hard to comprehend. How are we supposed to trust the code these models produce if Anthropic's official showcases are absolute slop? Dedicated video on this coming tomorrow. Just needed to get this off my chest.
443
221
5.5K
1.1M
Haider. (@haider1)
opus 4.7 is likely launching today. if this really reflects their internal progress, opus 4.7 may have been a more rushed release than usual. it seems this was originally meant to be 'mythos', but they pulled back, so they still needed to ship something around this time, and opus 4.7 became that release.
49
20
662
82K
Lisan al Gaib (@scaling01)
Gemini 3.1 Pro taking off on the 80% METR time horizon
[image]
17
14
371
34.3K
Not Spacewear (@NotSpacewear)
@xpasky @scaling01 Honestly it does feel like it’s got a huge CLI problem. Nobody can figure out how to tame it. Not even Google. Probably some magic in the model
0
0
2
69
Petr Baudis (@xpasky)
@scaling01 How the heck did they wire it up to make it finish things?
2
0
28
1.6K
Yam Peleg (@Yampeleg)
Opus 4.7 Imminent.
[image]
15
12
522
26K
Chubby♨️ (@kimmonismus)
I was always torn between GPT-5.4 and Opus 4.6. But over time, I've come to the conclusion that Claude has a better "taste." Anyway, I'm super hyped for this week! Opus 4.7 and (fingers crossed) Spud
41
17
870
34.4K