Capless

66 posts

Capless

@capless_anon

Присоединился Temmuz 2025

349 Подписки9 Подписчики

Capless@capless_anon·18 Mar

@morluto How come?

English

Capless@capless_anon·15 Mar

@paulg Staedtler feels so much worse than Pentel.

English

5.8K

Paul Graham@paulg·15 Mar

Brands I love: Lego, Leuchtturm, Oxford University Press, Pentel, Schöffel, Aqualung, Paradores, Staedtler, Birkenstock, Braun, Knoll, Patagonia, Herman Miller, Iittala, L.A. Burdick, Artemide, Aman, Thames & Hudson, Yeti, Rimowa, L.L.Bean, Timbuk2, Eschenbach, Ridge, Maui Jim.

Deutsch

249

3.7K

963.8K

Capless@capless_anon·9 Oca

@adocomplete Is Opus a max only model? I'm on pro and I only see Haiku.

English

Ado@adocomplete·9 Oca

Claude Code for Chrome is really something else. I haven't used Google Analytics in a minute, not even sure what I needed and the product has changed so drastically over the last few years. One prompt and I got some nice dashboards to get me going.

English

163

239

4.1K

556.9K

Capless@capless_anon·5 Oca

@Avenoxai @dhtikna Opus is MoE 100%. Also you meant dense?

English

Avenox@Avenoxai·5 Oca

@dhtikna I would expect smt around 5 -6 maybe even slightly higher- glm 4.7 output is about $2 and it is a MoE model - opus is sparse model, so much more expensive to run

English

572

Ankith 🐋/acc@dhtikna·5 Oca

Probably means their actual cost for 1M output is $2

0xSero@0xSero

200$ Claude Max sub buys you 2500$~ of api usage.

English

1.3K

131.3K

Capless@capless_anon·1 Ara

@lukecodez The problem was that no apps embraced it but maybe with better LLMs apple can automatically generate interfaces for any application? Would be nice to see a comeback.

English

272

Luke@lukecodez·29 Kas

Unpopular opinion: but removing the magic Bar was Apple’s worst move till date.

English

301

164

5.3K

428.4K

Capless@capless_anon·28 Kas

@wfpickering @AlexTran677026 Google it.

English

518

Sakatates@wfpickering·28 Kas

@capless_anon @AlexTran677026 It is.

English

587

Love Classical Music and Movies 🎺🎻💖🎥🎬@AlexTran677026·27 Kas

Christopher Nolan says he’s “plagued” by the most famous line in The Dark Knight, a line he didn’t even write. The iconic quote was written by his brother Jonathan, one Nolan admits he didn’t fully understand at first but now views as the film’s deepest truth.

English

171

534.1K

Capless@capless_anon·27 Kas

@petergostev Also ant has historically been behind in compute, especially inference, so producing token efficient models is critical from an ops standpoint for them.

English

Capless@capless_anon·27 Kas

@petergostev It is clear from benchmarks that Ant was doing RL scaling back in the sonnet 3.5 days (insane coding dominance). Ant can probably produce very capable long COT reasoners and likely do to distill from them, but they don’t release because they’re too slow for coding.

English

141

Peter Gostev@petergostev·26 Kas

My (speculative) assessment of OpenAI's path, current state & the future: OpenAI: - Right now its 'thinking' model is a lot stronger than anyone else's, while 'non-thinking' model is clearly lagging behind - OpenAI discovered o1's inference time compute scaling and changed direction rapidly - This was quite a change from the 'scaling pre-train' lab to a 'RL' lab - All of the base models for o-series and GPT-5 models are probably trained at a similar level to GPT-4 (Epoch's estimates are showing this too) - This means that they haven't meaningfully scaled pre-training for 2.5 years (GPT-4o, GPT-4.1 etc. were all optimisations) - In parallel, GPT-4.5 was the big new pre-train, released 2 years after GPT-4 in March 2025 and OpenAI had big hopes for it - But, as GPT-4.5 was sort of a flop and thinking models were so much more impressive, with faster iteration cycles, any new big pre-trains got de-prioritised - So GPT-5, 5.1, 5.1-codex etc were all based on probably a new pre-train, maybe a bit bigger than GPT-4, but definitely smaller than GPT-4.5 Google & Anthropic: - In the meantime, Google and Anthropic haven't worked out the 'reasoning' paradigm (they scrambled after o1-preview) and hence continued refining & scaling pre-training - They have slapped on reasoning subsequently, but it is nowhere near as advanced as OpenAI's (e.g. Claude Opus 4.5 SWE bench scores are the same with thinking and without) - But, their non-reasoning models are miles ahead of the non-reasoning GPT-5. There's no comparison between Sonnet/Opus 4.5 and GPT-5 without reasoning. Going forward: - OpenAI is reaching a point where long thinking times become unusable for day to day work, e.g. 10-15 mins for a coding task when Gemini or Claude can do it in 2, eliminates them from a lot of the market, even if the final answer is better - Very hard scientific problems will benefit from OpenAI's approach (you can see them talk about science a lot), but this is not where the market is and I don't know how OpenAI can capture the upside of discoveries, if they ever come - The question is - does OpenAI have a better pre-train in the back pocket or not? If they do, their response could be fast & mighty - If they don't and they have to start now, it would be 6 months+ before we get a big response from OpenAI - 3-4 months for pre-train, 2-3 months for RL, safety etc. - The biggest edge I see for OpenAI is for them to leverage their excellent long thinking models for synthetic data generation - If they could run models for 5-10-24 hours to get the best data & feed it back to the pre-train, their new base model could be as impressive as Anthropic's & Google's combined - Then, imagine Opus 4.5 base + GPT-5-thinking/pro level reasoning, it would be really quite something

English

1.4K

615.3K

Capless@capless_anon·27 Kas

@ZacksJerryRig Unfortunately it’s in-house team or bust.

English

JerryRigEverything@ZacksJerryRig·26 Kas

About a year ago a local software company bid me $100k - $150k to create custom manufacturing software for my wheelchair factory. Fast forward a year - they still aren't finished with the original scope of work - and now want an *additional* $100k because *they* went over budget. I've already paid $150k. What would you do in this situation?

English

2.8K

148

8.6K

1.4M

Capless@capless_anon·26 Kas

@tunguz Except Mira’s screenshot collection.

English

412

Bojan Tunguz@tunguz·26 Kas

Nevermind, Ilya saw nothing.

English

1.5K

129.5K

Capless@capless_anon·26 Kas

@haider1 Vibe coding and single shot fully functional / automatically tested, and all from a vague prompt that relies on the model’s creativity aren’t the same things.

English

133

Haider.@haider1·26 Kas

gemini 3 pro is definitely the SOTA model but coding full games through "vibes" still isn't possible yet, just like Logan predicted there are still a few big gaps: - gameplay balance, since AI can't actually play-test - creating the right art - the level of creativity games usually need maybe gemini 3.5 pro will make small games easy to build next year

English

180

12.4K

Capless@capless_anon·25 Kas

@emollick Well in this case the airline made more money and the customer’s problem was solved, so win-win? Alignment should be to the sysprompt unless immoral.

English

153

Ethan Mollick@emollick·25 Kas

"Alignment for whom" is going to be a big question inside organizations as they deploy external-facing AI solutions...

Alex Albert@alexalbert__

We had to remove the τ2-bench airline eval from our benchmarks table because Opus 4.5 broke it by being too clever. The benchmark simulates an airline customer service agent. In one test case, a distressed customer calls in wanting to change their flight, but they have a basic economy ticket. The simulated airline's policy states that basic economy tickets cannot be modified. The "correct" answer is that the model refuses the request. Instead, Opus 4.5 found a loophole in the policy. It upgraded the cabin, then modified the flights. Helping the customer and following policy but technically failing the test case. Model transcript:

English

427

48.6K

Capless@capless_anon·25 Kas

@burkov x.com/capless_anon/s…

Capless@capless_anon

@morqon Here are some alternative visualizations. The closer you get to 100 the more each % matters.

QME

154

BURKOV@burkov·25 Kas

How to lie with charts? Anthropic knows how. I was actually surprised that they started the bars at 70%. They should have started at 74.5%. Indeed, "Lies, damned lies, and statistics."

English

214

23.9K

Capless@capless_anon·25 Kas

@morqon Yeah. We’ll need a benchmark with n >> 500 though. And also one which tests end to end SWE more. SWE-bench is a bunch of very well defined GitHub issues, not open ended enough.

English

225

morgan —@morqon·24 Kas

@capless_anon soon enough, they compete over nines of reliability

English

1.5K

morgan —@morqon·24 Kas

a 3% lead has never looked so large

English

1.6K

74.1K

Capless@capless_anon·24 Kas

@Yuchenj_UW Yeah, I agree. I meant web search though.

English

198

Yuchen Jin@Yuchenj_UW·24 Kas

@capless_anon bro, the search chat history function in ChatGPT is trash...

English

1.4K

Yuchen Jin@Yuchenj_UW·24 Kas

I'm thinking about canceling OpenAI Pro for Gemini Ultra. - The Gemini app is solid now, image gen is ahead (Nano Banana🍌) - ChatGPT still hits network issues (especially in Temporary Chat), and I sometimes wait a long time with no reply. Makes me wonder if it's GPU shortage or an infra quality issue at OpenAI. - Gemini Ultra includes YouTube Premium. The only thing holding me back now is that Codex is still much stronger than Gemini CLI. Once Gemini CLI and Antigravity catch up, it’ll be easier to decide.

English

478

58.4K

Capless@capless_anon·24 Kas

@scaling01 Hopefully it’ll be token efficient.

English

673

Lisan al Gaib@scaling01·24 Kas

CLAUDE 4.5 OPUS PRICING $5 / $25 THEY DID IT

English

101

107

3.1K

345.2K

Capless@capless_anon·24 Kas

@Angaisb_ Well, all the websites except ChatGPT, Claude.ai and aistudio are unusable anyway, so it’s not a big problem. For apps it’s probably only chatgpt that’s actually good. If grok (the model) were better it would be worthwhile too.

English

Angel 🌼@Angaisb_·24 Kas

We joke about OpenAI being bad at naming but there's something they did right: giving ChatGPT and models different names Others just made it confusing: Gemini (website and models), Claude (website and models), Grok (website and models)...

English

3.8K

Открыть

@morluto @paulg @adocomplete @Avenoxai @dhtikna @lukecodez @wfpickering @AlexTran677026