nahcrof

1.9K posts

nahcrof
@nahcrof

Cheapest inference provider in the world https://t.co/NhE9WmHpYT

somewhere · Joined November 2022
47 Following · 621 Followers

Pinned Tweet

nahcrof (@nahcrof):
where else can you get 500 daily requests on a model like glm-5 or kimi-k2.5 for $5 (without extra rate limits)?
[attached media]

vibe stressing (@vibestressing):
@nahcrof everything you do is very impressive. so, when do you sleep?

nahcrof (@nahcrof):
I have been gifted the silly blue checkmark

Prasenjit (@Star_Knight12):
what's the best thing that has happened because of AI?

nahcrof (@nahcrof):
@viktorg475 @bianco_____ I don't adjust the model; it's the inference engine and quantization. I can't control much, but I can always do my best.

viktorg (@viktorg475):
@nahcrof @bianco_____ A bit of curiosity about how this all works: why is this your responsibility and not Zhipu's? Aren't you just using their weights? How can you make "adjustments" to an already trained model?

nahcrof (@nahcrof):
So far, any feedback on the improved glm-5.1 deployment?

nahcrof (@nahcrof):
@bianco_____ Alright, I'll admit I haven't messed with spawning sub-agents, so I'll look into it. Thank you for the feedback.

Bianco (@bianco_____):
@nahcrof having a hard time making it spawn subagents. I have to ask specifically for the task tool; it's not smart enough to understand "spawn 2 subagents ... ", while on the fireworks and opencode providers, GLM 5.1 gets it really well.

nahcrof (@nahcrof):
@weirdLSD That's the hope, we've been doing this for nearly a year now

renier (@weirdLSD):
@nahcrof will this still be true a few years from now 😁

nahcrof (@nahcrof):
@viktorg475 The tokens are counted by each model's own tokenizer, which is why there's variance. As for logging, that's something I intend to improve. glm-5.1 should be better now; infra was struggling to keep up until today, when I pushed a patch.

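The tokenizer point above is why the same prompt bills a different number of tokens on different models: each model counts with its own vocabulary. A toy illustration with two made-up tokenizers (real billing uses the model's actual tokenizer, e.g. loaded via Hugging Face's `AutoTokenizer`):

```python
def whitespace_tokens(text: str) -> int:
    """Crude word-level tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def char_pair_tokens(text: str) -> int:
    """Crude BPE-ish tokenizer: roughly one token per two characters."""
    return (len(text) + 1) // 2

prompt = "what model are you?"
print(whitespace_tokens(prompt))  # 4 tokens under the word tokenizer
print(char_pair_tokens(prompt))   # 10 tokens under the pair tokenizer
```

The same 19-character prompt costs more than twice as many tokens under the second scheme, which is the kind of variance the tweet describes.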
viktorg (@viktorg475):
So I bought some tokens and tried to use your service (glm-5.1), and nothing ever came back after a minute. You need better logging (like OpenRouter has). And maybe show average relative token expense versus your cheapest model (i.e. "what model are you?" in minimax is N tokens but N*4 in GLM).

nahcrof (@nahcrof):
Just realized that chutes tried to lower their kimi-k2.5 prices again, so I'll be lowering ours with our next update: $0.35/M input, $0.07/M cache, $1.70/M output

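At those per-million-token rates, estimating a request's cost is simple arithmetic. A minimal sketch (the rates come from the tweet; the token counts below are hypothetical example values):

```python
# Prices quoted in dollars per million tokens (from the tweet above).
PRICE_PER_M = {"input": 0.35, "cache": 0.07, "output": 1.70}

def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one kimi-k2.5 request at the quoted rates."""
    return (
        input_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cache"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000

# Hypothetical request: 8k fresh input tokens, 32k cache hits, 1k output tokens.
cost = request_cost(8_000, 32_000, 1_000)
print(f"${cost:.4f}")  # → $0.0067
```

Note how cheap cache hits are relative to fresh input: the 32k cached tokens cost less than the 8k uncached ones.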
nahcrof (@nahcrof):
@JaidCodes Yes, they actually reached out to me at one point, and long story short they decided they didn't want to (because of a bug that wasn't our fault)

Jaid (@JaidCodes):
@nahcrof Did you ever try to apply to OpenRouter as an inference provider?

Guillermomo (@guillermode20):
@nahcrof Awesome! Any plans for a $20 plan? The jump from 10 to 50 is pretty significant. Either that, or an option to buy packs like how synthetic does it would be cool.

nahcrof (@nahcrof):
glm-5-lightning is deprecated (it reroutes to glm-5.1). The lightning infra that was serving glm-5-lightning is currently being moved to glm-5.1.

nahcrof (@nahcrof):
(and sometimes fastest depending on the model)

nahcrof (@nahcrof):
@0oAstro You're not stupid! And thank you! I do my best

shaurya (@0oAstro):
@nahcrof really sorry, I just re-did the setup and `crof.ai` worked instead of `crof.ai/v1`. Should have tried earlier; I feel stupid now. Also, earlier it might have been down while you were changing the inference engine. Great service and great work man :D

nahcrof (@nahcrof):
Update coming soon: glm-5-lightning will be deprecated in favor of increasing overall inference tps on glm-5.1. (If you have tools pointed at glm-5-lightning, they will be redirected automatically once the update is released, so your tools don't break.)

Ansari abdul shakoor (@oneabdulshakoor):
@nahcrof Do the API and the subscription plan both run on the same infra, so the speed would technically be the same? Chutes has been slow, and opencode go hits its limit very fast.

nahcrof (@nahcrof):
@oneabdulshakoor I have been looking into adding that model, I just don't wanna use more capacity than I can comfortably handle

Ansari abdul shakoor (@oneabdulshakoor):
@nahcrof Also stepfun; it will probably push me to switch once stepfun 3.5 is available. It's in the top 5 and the benchmarks are crazy for such a small model; just check OpenRouter usage. I have been following for a while and am thinking of switching; might try the API first to test the speed.

nahcrof (@nahcrof):
@0oAstro we do have a /v1/models endpoint, but if you're having issues I can do my best to help

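For reference, a minimal sketch of querying an OpenAI-style /v1/models endpoint. The endpoint path comes from the thread; the base URL and the exact response shape are assumptions based on the common OpenAI-compatible convention (`{"object": "list", "data": [{"id": ...}, ...]}`):

```python
import json
import urllib.request

# BASE_URL is an assumption from the thread; point it at your deployment.
BASE_URL = "https://crof.ai"

def parse_models(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model-list payload."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(api_key: str) -> list[str]:
    """GET /v1/models with a bearer token and return the model IDs."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_models(json.load(resp))
```

Gateways like the Bifrost stack mentioned below typically probe exactly this endpoint to discover which models a provider serves.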
shaurya (@0oAstro):
@nahcrof ahh yes, Bifrost didn't seem to find /v1/models. Do you already have it? In that case it might just be an issue in my Bifrost stack or something.

nahcrof (@nahcrof):
@adhtri001 I have thought about speculative decoding but I haven't put much effort into it. I might do it though (especially for kimi-k2.5)

Adhisu Sama (@adhtri001):
@nahcrof Hey, was wondering if you wanted to consider EAGLE3 speculative decoding? It generally offers 3x throughput without any quality loss, and can do 5x in rare cases. As of now, I think k2.5 has a good EAGLE3 draft model on HF.

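For context on why this is lossless: in speculative decoding (of which EAGLE3 is one draft-model design), a small draft model proposes several tokens and the large target model verifies them in one pass; accepted tokens are guaranteed to match what the target alone would have produced. A toy sketch of the greedy draft-and-verify loop, with deterministic stub functions standing in for real models:

```python
def target_next(prefix: list[int]) -> int:
    """'Large' target model: a deterministic next-token rule (toy stand-in)."""
    return (sum(prefix) * 7 + 3) % 10

def draft_next(prefix: list[int]) -> int:
    """'Small' draft model: agrees with the target most of the time."""
    guess = target_next(prefix)
    return guess if len(prefix) % 5 else (guess + 1) % 10  # wrong every 5th step

def speculative_decode(prompt: list[int], n_tokens: int, k: int = 4) -> list[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies.

    Accepted tokens always match plain target-only decoding, so the output is
    identical -- just faster in a real engine, where one batched target pass
    can verify several draft tokens at once.
    """
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies the proposals against its own greedy choices.
        for t in proposal:
            if len(out) - len(prompt) >= n_tokens:
                break
            expect = target_next(out)
            if t == expect:
                out.append(t)       # accepted draft token
            else:
                out.append(expect)  # rejected: take the target's token instead
                break               # re-draft from the corrected prefix
    return out[len(prompt):]

# Sanity check: the output matches plain target-only decoding exactly.
plain, ctx = [], [1, 2, 3]
for _ in range(12):
    t = target_next(ctx)
    plain.append(t)
    ctx.append(t)
assert speculative_decode([1, 2, 3], 12) == plain
```

The throughput gain comes from how many draft tokens survive verification per target pass; EAGLE3's contribution is a draft head accurate enough to push that acceptance rate high.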
nahcrof (@nahcrof):
Should we replace glm-5-lightning with glm-5.1-lightning, or get rid of glm-5-lightning and aim to keep glm-5.1 tps above 100?