Autark
@Aut4rk · 18.4K posts
God-Emperor of Deterministic Codegen. First of His Name.
Builder Cove · Joined July 2023
1.4K Following · 1.2K Followers
Sudo su @sudoingX
thinking out loud. every model gets math wrong. 7B, 9B, 70B. doesn't matter. pattern matching is not computation. hermes agent has code_execution which spins up a full python sandbox with RPC over unix sockets. powerful but heavy. a 9B isn't going to navigate that reliably for basic arithmetic. what if there was a lightweight calc tool built in. model hits a math question, calls the tool, gets the exact answer computed on your hardware. no interpreter overhead. sandboxed. simple enough schema that a 9B can call it every time. the accuracy problem stops being a model problem and becomes an infrastructure problem. and infrastructure is solvable. @Teknium would this belong in hermes agent or is code_execution enough?
Replies 33 · Reposts 5 · Likes 180 · Views 9.9K
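The lightweight calc tool floated above could look something like this: one tool with a deliberately tiny schema, backed by a whitelist-then-evaluate function. This is a hypothetical sketch, not anything shipping in Hermes; the `calc` name, the schema, and the evaluator are all assumptions made for illustration.

```javascript
// Sketch of the proposed lightweight calc tool (hypothetical names/schema).
// The schema is kept minimal so a small model can fill it reliably.
const calcTool = {
  name: "calc",
  description: "Evaluate an arithmetic expression exactly.",
  parameters: {
    type: "object",
    properties: { expression: { type: "string" } },
    required: ["expression"],
  },
};

// Whitelist-then-evaluate: reject anything that is not plain arithmetic
// before handing the string to the JS engine. No letters can pass the
// filter, so no identifiers or globals are reachable from the expression.
function calc(expression) {
  if (!/^[\d+\-*/().%\s]+$/.test(expression)) {
    throw new Error("calc: only arithmetic expressions are allowed");
  }
  return Function(`"use strict"; return (${expression});`)();
}
```

For example, `calc("12*(3+4)")` returns `84`, computed on the host rather than pattern-matched by the model; anything like `calc("process.exit(1)")` is rejected by the character whitelist before evaluation.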
Autark @Aut4rk
Mainly because of old Dawn bindings for Zig. I had to rip that out, and just decided to build the latest Dawn from source for all platforms now, and write our own updated bindings. Was going to have to happen eventually. Also means any platform Dawn natively supports we can support.
Replies 1 · Reposts 0 · Likes 1 · Views 17
Autark @Aut4rk
Native @threejs WebGPU renderer :D TSL, addons, etc all working.
[media]
Replies 3 · Reposts 2 · Likes 11 · Views 929
andy @1a1n1d1y
either i've been lucky for 20 years or every team i touch turns to gold
Replies 2 · Reposts 0 · Likes 7 · Views 173
coah @coah80
might make the switch to linux this year ngl... what do i use
- mint sucks
- pop os sucks
- nobara is okay
- dank linux is lowkey a contender 😭
- cachy i like
- nix maybe
[media]
Replies 277 · Reposts 5 · Likes 466 · Views 35.9K
tayyabsalman @Allfiesolomon
they should have a dislike button on twitter too
Replies 147 · Reposts 62 · Likes 3.4K · Views 473.3K
@bluecow 🐮 @BLUECOW009
I made an agent harness that uses holographic memory and holy shit it works well
[media]
Replies 7 · Reposts 0 · Likes 32 · Views 1.9K
Autark @Aut4rk
@varien Qwen3.5-0.8B just dropped. It's not perfect but it's probably the best you're getting at these sizes right now. 2B is closer to what you want, I suspect. The multimodality is just a bonus.
Replies 0 · Reposts 0 · Likes 3 · Views 99
VARIEN @varien
i've caught the bug for pushing 1B-class models toward something closer to coherent reasoning on the cheapest consumer hardware. Qwen2.5 and SmolLM2 GGUFs are already running on-device via llama.cpp on Android, so the inference path exists. the question for me is the reasoning ceiling at this scale. anyone experimenting with fine-tuning or prompting strategies to get more structured/compositional reasoning out of models this small?
Replies 7 · Reposts 0 · Likes 21 · Views 2.3K
alcuin ❄️ @scheminglunatic
just like "tool calling" is kinda dead in the face of just using shell commands, i think a lot of "skills files/MCP type stuff" is stupid in light of just reading the manpage of said shell command.
Replies 1 · Reposts 0 · Likes 17 · Views 855
Autark @Aut4rk
@platonovadim Slightly contrived example, sure, but the point is that instead of waiting on multiple tool call returns, multiple rounds, burning exponentially more time and tokens, the model can just write a miniature program to do it.
[media]
Replies 0 · Reposts 0 · Likes 0 · Views 7
Vadim @platonovadim
Would it make sense to have the model pass itself as a continuation to evaluate the result of step 4 - a plugin which continues the conversation in the main context or branches the context? Then the initial program would still be generated upfront but spawn more turns during evaluation.
[media]
Replies 1 · Reposts 0 · Likes 1 · Views 13
Vadim @platonovadim
Just have the agent harness write code in JS sandbox instead of juggling external tools. Feels like should work great, except the models might get in the way because they're not trained for it...
Quoting Autark @Aut4rk: "@effectfully @pjay_in Here I'll read yours, you read mine fam."
Replies 2 · Reposts 0 · Likes 1 · Views 37
Autark @Aut4rk
@platonovadim Javascript is the single biggest training signal they have. They're coding agents. Let them write code.
Replies 0 · Reposts 0 · Likes 0 · Views 11
Autark @Aut4rk
@1a1n1d1y If you mean the CUDA C++ SDK, yeah, it does.
Replies 0 · Reposts 0 · Likes 1 · Views 15
andy @1a1n1d1y
@Aut4rk cuda is actually dogshit im realizing lol
Replies 1 · Reposts 0 · Likes 1 · Views 12
andy @1a1n1d1y
if you're using claude code with cuda just go ahead and rewrite all of the cuda in ptx, thank me later
Replies 1 · Reposts 0 · Likes 10 · Views 550
Autark @Aut4rk
@LottoLabs Yeah Blackwell is pretty sick. Crazy what I can do on this laptop lol
Replies 1 · Reposts 0 · Likes 2 · Views 204
Lotto @LottoLabs
RTX 6000 Pro is a no brainer w/ NVFP4 then
Replies 3 · Reposts 0 · Likes 23 · Views 2.6K
Autark @Aut4rk
I don't really use Hermes, I have my own harness (it speaks JavaScript directly instead of tool-call schema nonsense, which removes an entire category of problems I don't have to worry about). Structured outputs work, so I figure it's the same schema-constriction mechanism in vLLM. I don't get denials, but I never really did to begin with.
Replies 1 · Reposts 0 · Likes 0 · Views 12
Lotto @LottoLabs
@Aut4rk Are you using it w/ Hermes Agent or using tool calls? How's reliability w/ the censorship ablation or whatever it's called.
Replies 1 · Reposts 0 · Likes 1 · Views 54
Lotto @LottoLabs
Hermes Agent and Qwen 3.5 27B on an RTX 6000, getting a nice 48 TPS on vLLM. Testing 3090 + vLLM next.
Replies 17 · Reposts 5 · Likes 175 · Views 11.5K
Autark @Aut4rk
@sudoingX NVFP4 means you can run Qwen3-Coder-Next, or Qwen3.5-27B+ on a 24GB+ Blackwell card.
Replies 0 · Reposts 0 · Likes 2 · Views 182
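The napkin math behind the "27B on a 24GB Blackwell card" claim is just bits-per-parameter arithmetic. A sketch that counts weight bytes only: it ignores KV cache, activations, and NVFP4's per-block scale-factor overhead, so real usage sits somewhat above the weight number alone.

```javascript
// Weight memory in GiB for a model with `params` parameters stored at
// `bitsPerParam` bits each (weights only, no KV cache or activations).
function weightGiB(params, bitsPerParam) {
  return (params * bitsPerParam) / 8 / 2 ** 30;
}

const fp16 = weightGiB(27e9, 16); // ~50.3 GiB: nowhere near a 24 GB card
const nvfp4 = weightGiB(27e9, 4); // ~12.6 GiB: fits, with headroom for context
```

Which is why 4-bit is the difference between "impossible" and "comfortable" for a 27B on 24 GB, and why the remaining ~11 GiB of headroom is what the context length and batch size have to live in.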
Sudo su @sudoingX
you don't understand anon. i'm on a mission to find the collection of best small models that run full context on consumer hardware. because when you can orchestrate your own thinking across physical nodes locally, that's not a tool anymore. that's an extension of your mind. that's exactly where we are headed as a civilization. and most people haven't felt it yet.
Replies 35 · Reposts 23 · Likes 400 · Views 8.8K
Autark @Aut4rk
@LottoLabs I wouldn't use anything other than vLLM, lol. Not going to get any faster (when properly configured).
Replies 1 · Reposts 0 · Likes 0 · Views 45
Autark @Aut4rk
@LottoLabs It's just the model compiler this guy used. I didn't quant any of this myself. For me it helps shave off some headroom on the 24GB Laptop 5090.
Replies 0 · Reposts 0 · Likes 1 · Views 11
Autark @Aut4rk
@LottoLabs I've been using both the NVFP4 27B and 35B (and 0.8B, etc) since they came out. No issues. Laptop RTX 5090.
Replies 1 · Reposts 0 · Likes 2 · Views 60
Lotto @LottoLabs
@Aut4rk I only mess w/ 27b for now and I heard those models might affect tool call chains?
Replies 2 · Reposts 1 · Likes 1 · Views 62
Lotto @LottoLabs
@Aut4rk Doing this tomorrow
Replies 1 · Reposts 0 · Likes 1 · Views 247