nahcrof

1.9K posts


@nahcrof

Cheapest inference provider in the world https://t.co/NhE9WmHpYT

somewhere · Joined November 2022
47 Following · 619 Followers
Pinned Tweet
nahcrof@nahcrof·
where else can you get 500 daily requests on a model like glm-5 or kimi-k2.5 for $5 (without extra rate limits)?
[image attached]
36 replies · 2 reposts · 164 likes · 15.1K views
nahcrof@nahcrof·
@bianco_____ Alright, I’ll admit I haven’t messed with spawning sub-agents, so I’ll look into it. Thank you for the feedback
0 replies · 0 reposts · 0 likes · 14 views
Bianco@bianco_____·
@nahcrof having a hard time making it spawn subagents. I have to ask specifically for the task tool; it's not smart enough to understand "spawn 2 subagents ... ", while with the Fireworks and opencode providers, GLM 5.1 gets it really well.
1 reply · 0 reposts · 1 like · 27 views
nahcrof@nahcrof·
@weirdLSD That’s the hope, we’ve been doing this for nearly a year now
0 replies · 0 reposts · 1 like · 12 views
renier@weirdLSD·
@nahcrof will this still be true a few years from now 😁
1 reply · 0 reposts · 1 like · 11 views
nahcrof@nahcrof·
@viktorg475 The tokens are counted by each model's own tokenizer, so that's why there's variance. As for logging, that is something I intend to improve. glm-5.1 should be better now; infra was struggling to keep up until today, when I pushed a patch
0 replies · 0 reposts · 1 like · 48 views
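The tokenizer-variance point above can be illustrated with a toy sketch. These are not any provider's real tokenizers, just two contrived counting schemes; they show why the same prompt can be billed at very different token counts depending on the model:

```python
# Toy illustration: the same prompt yields different token counts under
# different tokenization schemes, which is why billed totals vary by model.

def whitespace_tokens(text: str) -> int:
    """Coarse word-level count (one extreme)."""
    return len(text.split())

def char_tokens(text: str) -> int:
    """Character-level count (the other extreme)."""
    return len(text)

prompt = "what model are you?"
print(whitespace_tokens(prompt))  # 4
print(char_tokens(prompt))        # 19
```

Real model tokenizers (BPE variants) land somewhere between these two extremes, and each model family draws the boundaries differently.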
viktorg@viktorg475·
So I bought some tokens and tried to use your service (glm-5.1), and nothing ever came back after a minute. You need better logging (like OpenRouter has). And maybe show the average token expense relative to your cheapest model (i.e. "what model are you?" costs N tokens in minimax but N*4 in GLM)
1 reply · 0 reposts · 2 likes · 62 views
nahcrof@nahcrof·
(and sometimes fastest depending on the model)
0 replies · 0 reposts · 1 like · 133 views
Ansari abdul shakoor@oneabdulshakoor·
@nahcrof Do the API and the subscription plan both run on the same infra, so speed would technically be the same? Chutes has been slow, and opencode go hits its limit very fast
1 reply · 0 reposts · 1 like · 17 views
nahcrof@nahcrof·
@oneabdulshakoor I have been looking into adding that model, I just don't wanna use more capacity than I can comfortably handle
0 replies · 0 reposts · 1 like · 4 views
nahcrof@nahcrof·
@0oAstro we do have a /v1/models endpoint but if you're having issues I can do my best to help
1 reply · 0 reposts · 1 like · 38 views
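For readers unfamiliar with the `/v1/models` endpoint mentioned above: OpenAI-compatible APIs return a JSON list of available models from it. A minimal sketch of parsing that response shape follows; the model IDs and `owned_by` values here are illustrative, not the provider's actual catalog:

```python
import json

# Hedged sketch: the standard OpenAI-compatible /v1/models response shape.
# The entries below are made up for illustration.
sample_response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "glm-5.1", "object": "model", "owned_by": "nahcrof"},
    {"id": "kimi-k2.5", "object": "model", "owned_by": "nahcrof"}
  ]
}
""")

# Clients typically just want the list of model IDs to pass as "model" in
# chat-completion requests.
model_ids = [m["id"] for m in sample_response["data"]]
print(model_ids)
```

In practice a client would fetch this with an authenticated GET request to the provider's base URL plus `/v1/models` and parse the body the same way.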
Adhisu Sama@adhtri001·
@nahcrof Hey, was wondering if you wanted to consider EAGLE-3 speculative decoding? It generally offers 3x throughput without any quality loss, and can do 5x in rare cases. As of now, I think k2.5 has a good EAGLE-3 model on HF.
1 reply · 0 reposts · 1 like · 39 views
nahcrof@nahcrof·
Should we replace glm-5-lightning with glm-5.1-lightning, or get rid of glm-5-lightning and aim to keep glm-5.1 TPS above 100?
10 replies · 0 reposts · 20 likes · 1.1K views