cedric
@cedric_chee
10.6K posts

SWE | @fastdotai alumni, independent researcher, tester | ex-entrepreneur @AntlerGlobal | GitHub: cedrickchee | building a new computer

Supercomputer · Joined November 2007
435 Following · 3.2K Followers
Pinned Tweet
cedric
cedric@cedric_chee·
Insane. We got close to Opus 4.5 at home at >70 tokens/s
cyysky@cyysky

@cedric_chee MiniMax 2.5 full precision FP8 running LOCALLY on vLLM x 8x Pro 6000 🔥 Hosting it is easier than I thought; it just reuses the same script as M2.1. Time to do the vibe coding test! Generation: 70 tokens/sec and 122 tokens/sec across two connections. Peak memory: 728GB

10
13
297
30.6K
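For anyone trying this at home: a minimal sketch of what "reusing the same script as M2.1" might look like, modeled on the M2.7 command cedric posts later in this feed. The model path, parallel size, and memory fraction here are assumptions, not cyysky's actual config:

# Sketch (assumptions as noted above): serve MiniMax M2.5 FP8
# sharded across 8 GPUs; vLLM picks up the FP8 quantization from
# the checkpoint, and tensor parallelism puts one shard per card.
$ vllm serve MiniMaxAI/MiniMax-M2.5 \
  --served-model-name minimax-m2.5 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.90 \
  --trust_remote_code \
  --port 9501

A 728GB peak across eight 96GB Pro 6000s (768GB total) is roughly what you would expect with the weights and KV cache sharded across all cards.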
cedric
cedric@cedric_chee·
@covrovski i want the linux app so badly
0
0
1
6
cedric
cedric@cedric_chee·
What Codex is cooking. Super app? 0.118 -> 0.120 features list:
- under development: remote_control, tool_search
- now experimental: image_detail_original
- now stable: shell_snapshot, shell_tool, tool_suggest, undo, unified_exec, use_legacy_landlock, skill_mcp_dependency_install, tool_call_mcp_elicitation
No guesswork.
cedric tweet media
Tibo@thsottiaux

Codex App has achieved take-off internally. I can hear the fans

1
0
2
253
cedric
cedric@cedric_chee·
Sadly, my tweet got caught up in this. Y'all are being pretty harsh when I already admitted I made a mistake and clarified right after I posted. The licensing change is genuinely confusing, even for someone with years of open source experience.
cedric tweet media
0
0
0
79
cedric
cedric@cedric_chee·
My group is still not entirely clear on what counts as legitimate community use. I'll email them to clarify. "Being fast and reasonable on commercial authorization requests — DM me on X or send email" is MiniMax drawing a line so users get a better experience and serious providers aren't punished for doing it properly. Does MiniMax tightening the commercial license make sense?
cedric tweet media
RyanLee@RyanLeeMiniMax

x.com/i/article/2043…

1
1
3
271
cedric retweeted
geoff
geoff@GeoffreyHuntley·
suspect folks aren't ready for time-to-last-token at or under 200ms: entire applications at 5 generations per sec
14
7
121
18.6K
cedric retweeted
Andrew Curran
Andrew Curran@AndrewCurran_·
There has been a great deal of speculation about why Anthropic is keeping Mythos in restricted release. One of the least-discussed reasons is cost. Not the cost to Anthropic of serving the model, but the downstream effects that cost will have on the industry, and on the world.

Mythos is now being served to a small group of about 50 major companies. For organizations like these, token budgets are effectively unlimited, and the opportunity cost of not using as much of the model as possible is too high.

I think you can already see the downstream effects even in this limited release. Claude users complain about hitting caps faster. They complain about degraded performance. For months now almost everyone I know has been continuously hitting the cap on Claude or Codex. The existence of Mythos pressures not just the amount of usage available to smaller subscribers, but also the pricing of these plans themselves, which are already subsidized. Smaller users will get hit twice.

The compute cost of serving Mythos exerts pressure all the way down the line. Inference will get cheaper over time, but demand is already ahead of that curve and continues to expand. Mythos is not the end of this chain. As long as scale keeps rewarding larger runs, larger models will keep being trained. The next model that makes a Mythos-like jump may be dramatically larger again, and much more expensive to serve.

If the cost of serving frontier models continues to outpace attempts to reduce it, then smaller players and public use get squeezed out. We end up with vast models, served at immense cost, available only to the richest corporations on earth. Those firms then use that access to outcompete smaller rivals, become richer still, and widen the gap again. If this continues, a small number of giant companies end up holding the only passports to the Country of Geniuses in a datacenter.

For Anthropic, culturally, this is not a desirable world. Part of their reluctance to serve Mythos more broadly comes from a reluctance to help bring this world into being. There may be no way to serve a model like Mythos at scale right now without beginning this feedback loop.

And as that loop accelerates, it will generate great resentment. If they serve it to lower-tier subscribers, those users get a handful of exchanges before hitting the cap. Seeing how capable the model is only deepens the resentment, because access is visibly rationed. The labs will be forced to make a trickle-down argument: let the largest firms use the models first, and the abundance will eventually spread to everyone else. The public is unlikely to buy this argument. The hostility and pushback against the industry will spiral. Eventually it may not remain merely political.

It is not only Dario who has seen this world, but Sam as well. That is part of why OpenAI has started talking about mechanisms that would give ordinary citizens a direct stake in the upside of the industry, like the Public Wealth Fund. In my opinion the original use case of Worldcoin was a global UBI in a future where OpenAI won the race. Not only is that future no longer certain, but the trust and solidarity required to support a UBI no longer seem to me to exist in the West. The only path then is simply to scale everything as quickly as possible and hope abundance eventually arrives in a cascade strong enough that it reaches everyone on earth.

To my friends who are in the safety camp, I understand this argument is hard to accept. Please consider that there is a level of capability beyond which, unless your p(doom) is literally 100, stopping becomes more dangerous than continuing. I think we passed that threshold even before Mythos. Even if stopping were possible - and I personally do not believe it has been for years - stopping here would lock in a dystopia. This dynamic is incentive-driven, just like the race itself, and just as hard to coordinate against. We must not stop inside this tunnel. The only way out is through.
48
66
611
50.1K
cedric
cedric@cedric_chee·
@VictorWilsonDev As they like to say, you can just do things. I'm porting github.com/johnzfitch/cla… to my distro as we speak. I don't have time to port from scratch this time around. Have you found a better starting point for a Linux port?
0
0
1
17
Victor Wilson
Victor Wilson@VictorWilsonDev·
@cedric_chee well i'm sure the 8 CCW/Linux users can figure something out
1
0
0
14
cedric
cedric@cedric_chee·
@JohnThilen True. I adapted the Codex app. Agree. I'm tired, boss.
0
0
1
21
John Thilén
John Thilén@JohnThilen·
@cedric_chee People who use Linux on desktop are used to software companies ignoring them, and can adapt. But this priority does show what Anthropic values, and it is not business-critical infrastructure.
1
0
0
23
cedric
cedric@cedric_chee·
@darekgusto Similar trend. Also Qwen. We just can't have nice things :(
0
0
1
31
cedric
cedric@cedric_chee·
Mad respect for the open source commitment. Just like MiniMax M2.5, local deployment is solid. My group got the vLLM inference up & running in no time. Details below.
MiniMax (official)@MiniMax_AI

We're delighted to announce that MiniMax M2.7 is now officially open source, with SOTA performance on SWE-Pro (56.22%) and Terminal Bench 2 (57.0%). You can find it on Hugging Face now. Enjoy!🤗 huggingface: huggingface.co/MiniMaxAI/Mini… Blog: minimax.io/news/minimax-m… MiniMax API: platform.minimax.io

3
0
9
716
cedric
cedric@cedric_chee·
A lot of people will complain about the license. Still, I would rather see the weights released under a non-commercial license than kept fully closed.
1
0
1
101
cedric
cedric@cedric_chee·
@darekgusto Oof. I overlooked this. I digress: M2.7 is open weights, with non-commercial use permitted under MIT-style terms. Commercial use is more restrictive than Kimi-K2.5's modified MIT license. 😭
1
0
1
50
cedric
cedric@cedric_chee·
Naice! You should share the vLLM inference speed and throughput here. How many tokens/s? How many GPUs utilized? Definitely not 4, right? Drop the screenshots here.

vLLM configs for those interested:

$ vllm serve MiniMax-M2.7 \
  --served-model-name minimax-m2.7 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --enable-auto-tool-choice \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.78 \
  --max-model-len -1 \
  --trust_remote_code \
  --port 9501 \
  --compilation-config '{"mode":3,"pass_config":{"fuse_minimax_qk_norm":true}}'
1
0
1
47
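Once that server is up, vLLM exposes an OpenAI-compatible API on the configured port, so a quick smoke test can be as simple as the following sketch (the endpoint path is standard vLLM; the prompt and token limit are placeholders):

# Sketch: hit the OpenAI-compatible chat endpoint vLLM serves on port 9501.
$ curl http://localhost:9501/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "minimax-m2.7",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'

Note that the "model" field must match the --served-model-name passed to vllm serve above.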
cyysky
cyysky@cyysky·
Minimax-M2.7 is up on local A6000 x4 full precision! let's go #MiniMax
cyysky tweet media
2
0
2
176
cedric
cedric@cedric_chee·
@darekgusto WDYM? Did they change the license?
1
0
1
42