nahcrof

1.9K posts


@nahcrof

Cheapest inference provider in the world https://t.co/NhE9WmHpYT

somewhere · Joined November 2022
47 Following · 619 Followers
Pinned Tweet
nahcrof@nahcrof·
where else can you get 500 daily requests on a model like glm-5 or kimi-k2.5 for $5 (without extra rate limits)?
[image attached]
36 replies · 2 reposts · 164 likes · 15.1K views
nahcrof@nahcrof·
@bianco_____ Alright, I’ll admit I haven’t messed with spawning sub-agents, so I’ll look into it. Thank you for the feedback
0 replies · 0 reposts · 0 likes · 14 views
Bianco@bianco_____·
@nahcrof having a hard time making it spawn subagents. I have to ask specifically for the task tool; it's not smart enough to understand "spawn 2 subagents ... ", while with the Fireworks and opencode providers, GLM 5.1 gets it really well.
1 reply · 0 reposts · 1 like · 27 views
nahcrof@nahcrof·
@weirdLSD That’s the hope, we’ve been doing this for nearly a year now
0 replies · 0 reposts · 1 like · 12 views
renier@weirdLSD·
@nahcrof will this still be true a few years from now 😁
1 reply · 0 reposts · 1 like · 11 views
nahcrof@nahcrof·
@viktorg475 The tokens are counted by each model's own tokenizer, so that's why there's variance. As for logging, that is something I intend to improve. glm-5.1 should be better now; infra was struggling to keep up until today, when I pushed a patch
0 replies · 0 reposts · 1 like · 48 views
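The tokenizer-variance point above can be illustrated with a toy sketch. These are not any provider's real tokenizers, just two contrived counting schemes; they show why the same prompt can be billed at very different token counts depending on the model:

```python
# Toy illustration: the same prompt yields different token counts under
# different tokenization schemes, which is why billed totals vary by model.

def whitespace_tokens(text: str) -> int:
    """Coarse word-level count (one extreme)."""
    return len(text.split())

def char_tokens(text: str) -> int:
    """Character-level count (the other extreme)."""
    return len(text)

prompt = "what model are you?"
print(whitespace_tokens(prompt))  # 4
print(char_tokens(prompt))        # 19
```

Real model tokenizers (BPE variants) land somewhere between these two extremes, and each model family draws the boundaries differently.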
viktorg@viktorg475·
So I bought some tokens and tried to use your service (glm-5.1), and nothing ever came back after a minute. You need better logging (like OpenRouter has). And maybe show the average token expense relative to your cheapest model (i.e. "what model are you?" costs N tokens in minimax but N*4 in GLM)
1 reply · 0 reposts · 2 likes · 62 views
nahcrof@nahcrof·
(and sometimes fastest depending on the model)
0 replies · 0 reposts · 1 like · 133 views
Ansari abdul shakoor@oneabdulshakoor·
@nahcrof Do the API and the subscription plan both run on the same infra, so speed would technically be the same? Chutes has been slow, and opencode go hits its limit very fast
1 reply · 0 reposts · 1 like · 17 views
nahcrof@nahcrof·
@oneabdulshakoor I have been looking into adding that model, I just don't wanna use more capacity than I can comfortably handle
0 replies · 0 reposts · 1 like · 4 views
nahcrof@nahcrof·
@0oAstro we do have a /v1/models endpoint but if you're having issues I can do my best to help
1 reply · 0 reposts · 1 like · 38 views
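For readers unfamiliar with the `/v1/models` endpoint mentioned above: OpenAI-compatible APIs return a JSON list of available models from it. A minimal sketch of parsing that response shape follows; the model IDs and `owned_by` values here are illustrative, not the provider's actual catalog:

```python
import json

# Hedged sketch: the standard OpenAI-compatible /v1/models response shape.
# The entries below are made up for illustration.
sample_response = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "glm-5.1", "object": "model", "owned_by": "nahcrof"},
    {"id": "kimi-k2.5", "object": "model", "owned_by": "nahcrof"}
  ]
}
""")

# Clients typically just want the list of model IDs to pass as "model" in
# chat-completion requests.
model_ids = [m["id"] for m in sample_response["data"]]
print(model_ids)
```

In practice a client would fetch this with an authenticated GET request to the provider's base URL plus `/v1/models` and parse the body the same way.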
Adhisu Sama@adhtri001·
@nahcrof Hey, was wondering if you wanted to consider EAGLE-3 speculative decoding? It generally offers 3x throughput without any quality loss, and can do 5x in rare cases. As of now, I think k2.5 has a good EAGLE-3 model on HF.
1 reply · 0 reposts · 1 like · 39 views
nahcrof@nahcrof·
Should we replace glm-5-lightning with glm-5.1-lightning, or get rid of glm-5-lightning and aim to keep glm-5.1 TPS above 100?
10 replies · 0 reposts · 20 likes · 1.1K views