nahcrof

1.9K posts

nahcrof
@nahcrof

Cheapest inference provider in the world https://t.co/NhE9WmHpYT

somewhere · Joined November 2022
47 Following · 621 Followers

Pinned Tweet

nahcrof (@nahcrof):
where else can you get 500 daily requests on a model like glm-5 or kimi-k2.5 for $5 (without extra rate limits)?
[attached media]

vibe stressing (@vibestressing):
@nahcrof everything you do is very impressive. so, when do you sleep?

nahcrof (@nahcrof):
I have been gifted the silly blue checkmark

Prasenjit (@Star_Knight12):
what's the best thing that has happened because of AI?

nahcrof (@nahcrof):
@viktorg475 @bianco_____ I don't adjust the model; it's the inference engine and quantization. I can't control much, but I can always do my best.

viktorg (@viktorg475):
@nahcrof @bianco_____ A bit of curiosity about how this all works: why is this your responsibility and not Zhipu's? Aren't you just using their weights? How can you make "adjustments" to an already trained model?

nahcrof (@nahcrof):
So far, any feedback on the improved glm-5.1 deployment?

nahcrof (@nahcrof):
@bianco_____ Alright, I'll admit I haven't messed with spawning sub-agents, so I'll look into it. Thank you for the feedback.

Bianco (@bianco_____):
@nahcrof having a hard time making it spawn subagents. I have to ask specifically for the task tool; it's not smart enough to understand "spawn 2 subagents ... ", while on the fireworks and opencode providers, GLM 5.1 gets it really well.

nahcrof (@nahcrof):
@weirdLSD That's the hope, we've been doing this for nearly a year now

renier (@weirdLSD):
@nahcrof will this still be true a few years from now 😁

nahcrof (@nahcrof):
@viktorg475 The tokens are counted by each model's own tokenizer, which is why there's variance. As for logging, that's something I intend to improve. glm-5.1 should be better now; infra was struggling to keep up until today, when I pushed a patch.

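The tokenizer point above is why the same prompt bills a different number of tokens on different models: each model counts with its own vocabulary. A toy illustration with two made-up tokenizers (real billing uses the model's actual tokenizer, e.g. loaded via Hugging Face's `AutoTokenizer`):

```python
def whitespace_tokens(text: str) -> int:
    """Crude word-level tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def char_pair_tokens(text: str) -> int:
    """Crude BPE-ish tokenizer: roughly one token per two characters."""
    return (len(text) + 1) // 2

prompt = "what model are you?"
print(whitespace_tokens(prompt))  # 4 tokens under the word tokenizer
print(char_pair_tokens(prompt))   # 10 tokens under the pair tokenizer
```

The same 19-character prompt costs more than twice as many tokens under the second scheme, which is the kind of variance the tweet describes.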
viktorg (@viktorg475):
So I bought some tokens and tried to use your service (glm-5.1), and nothing ever came back after a minute. You need better logging (like OpenRouter has). And maybe show average relative token expense versus your cheapest model (i.e. "what model are you?" in minimax is N tokens but N*4 in GLM).

nahcrof (@nahcrof):
Just realized that chutes tried to lower their kimi-k2.5 prices again, so I'll be lowering ours with our next update: $0.35/M input, $0.07/M cache, $1.70/M output

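At those per-million-token rates, estimating a request's cost is simple arithmetic. A minimal sketch (the rates come from the tweet; the token counts below are hypothetical example values):

```python
# Prices quoted in dollars per million tokens (from the tweet above).
PRICE_PER_M = {"input": 0.35, "cache": 0.07, "output": 1.70}

def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one kimi-k2.5 request at the quoted rates."""
    return (
        input_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cache"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000

# Hypothetical request: 8k fresh input tokens, 32k cache hits, 1k output tokens.
cost = request_cost(8_000, 32_000, 1_000)
print(f"${cost:.4f}")  # → $0.0067
```

Note how cheap cache hits are relative to fresh input: the 32k cached tokens cost less than the 8k uncached ones.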
nahcrof (@nahcrof):
@JaidCodes Yes, they actually reached out to me at one point, and long story short they decided they didn't want to (because of a bug that wasn't our fault)

Jaid (@JaidCodes):
@nahcrof Did you ever try to apply to OpenRouter as an inference provider?

Guillermomo (@guillermode20):
@nahcrof Awesome! Any plans for a $20 plan? The jump from 10 to 50 is pretty significant. Either that, or an option to buy packs like how synthetic does it would be cool.

nahcrof (@nahcrof):
glm-5-lightning is deprecated (it reroutes to glm-5.1). The lightning infra that was serving glm-5-lightning is currently being moved to glm-5.1.

nahcrof (@nahcrof):
(and sometimes fastest depending on the model)

nahcrof (@nahcrof):
@0oAstro You're not stupid! And thank you! I do my best

shaurya (@0oAstro):
@nahcrof really sorry, I just re-did the setup and `crof.ai` worked instead of `crof.ai/v1`. Should have tried earlier; I feel stupid now. Also, earlier it might have been down while you were changing the inference engine. Great service and great work man :D

nahcrof (@nahcrof):
Update coming soon: glm-5-lightning will be deprecated in favor of increasing overall inference tps on glm-5.1. (If you have tools pointed at glm-5-lightning, they will be redirected automatically once the update is released, so your tools don't break.)

Ansari abdul shakoor (@oneabdulshakoor):
@nahcrof Do the API and the subscription plan both run on the same infra, so the speed would technically be the same? Chutes has been slow, and opencode go hits its limit very fast.

nahcrof (@nahcrof):
@oneabdulshakoor I have been looking into adding that model, I just don't wanna use more capacity than I can comfortably handle

Ansari abdul shakoor (@oneabdulshakoor):
@nahcrof Also stepfun; it will probably push me to switch once stepfun 3.5 is available. It's in the top 5 and the benchmarks are crazy for such a small model; just check OpenRouter usage. I have been following for a while and am thinking of switching; might try the API first to test the speed.

nahcrof (@nahcrof):
@0oAstro we do have a /v1/models endpoint, but if you're having issues I can do my best to help

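For reference, a minimal sketch of querying an OpenAI-style /v1/models endpoint. The endpoint path comes from the thread; the base URL and the exact response shape are assumptions based on the common OpenAI-compatible convention (`{"object": "list", "data": [{"id": ...}, ...]}`):

```python
import json
import urllib.request

# BASE_URL is an assumption from the thread; point it at your deployment.
BASE_URL = "https://crof.ai"

def parse_models(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model-list payload."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(api_key: str) -> list[str]:
    """GET /v1/models with a bearer token and return the model IDs."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_models(json.load(resp))
```

Gateways like the Bifrost stack mentioned below typically probe exactly this endpoint to discover which models a provider serves.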
shaurya (@0oAstro):
@nahcrof ahh yes, Bifrost didn't seem to find /v1/models. Do you already have it? In that case it might just be an issue in my Bifrost stack or something.

nahcrof (@nahcrof):
@adhtri001 I have thought about speculative decoding but I haven't put much effort into it. I might do it though (especially for kimi-k2.5)

Adhisu Sama (@adhtri001):
@nahcrof Hey, was wondering if you wanted to consider EAGLE3 speculative decoding? It generally offers 3x throughput without any quality loss, and can do 5x in rare cases. As of now, I think k2.5 has a good EAGLE3 draft model on HF.

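For context on why this is lossless: in speculative decoding (of which EAGLE3 is one draft-model design), a small draft model proposes several tokens and the large target model verifies them in one pass; accepted tokens are guaranteed to match what the target alone would have produced. A toy sketch of the greedy draft-and-verify loop, with deterministic stub functions standing in for real models:

```python
def target_next(prefix: list[int]) -> int:
    """'Large' target model: a deterministic next-token rule (toy stand-in)."""
    return (sum(prefix) * 7 + 3) % 10

def draft_next(prefix: list[int]) -> int:
    """'Small' draft model: agrees with the target most of the time."""
    guess = target_next(prefix)
    return guess if len(prefix) % 5 else (guess + 1) % 10  # wrong every 5th step

def speculative_decode(prompt: list[int], n_tokens: int, k: int = 4) -> list[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies.

    Accepted tokens always match plain target-only decoding, so the output is
    identical -- just faster in a real engine, where one batched target pass
    can verify several draft tokens at once.
    """
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies the proposals against its own greedy choices.
        for t in proposal:
            if len(out) - len(prompt) >= n_tokens:
                break
            expect = target_next(out)
            if t == expect:
                out.append(t)       # accepted draft token
            else:
                out.append(expect)  # rejected: take the target's token instead
                break               # re-draft from the corrected prefix
    return out[len(prompt):]

# Sanity check: the output matches plain target-only decoding exactly.
plain, ctx = [], [1, 2, 3]
for _ in range(12):
    t = target_next(ctx)
    plain.append(t)
    ctx.append(t)
assert speculative_decode([1, 2, 3], 12) == plain
```

The throughput gain comes from how many draft tokens survive verification per target pass; EAGLE3's contribution is a draft head accurate enough to push that acceptance rate high.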
nahcrof (@nahcrof):
Should we replace glm-5-lightning with glm-5.1-lightning, or get rid of glm-5-lightning and aim to keep glm-5.1 tps above 100?