Autark

18.4K posts

@Aut4rk

God-Emperor of Deterministic Codegen. First of His Name.

Builder Cove · Joined July 2023

1.4K Following · 1.2K Followers

Pinned Tweet

Autark@Aut4rk·16 Jan

x.com/i/article/2012…

Autark@Aut4rk·3h

@sudoingX *ahem* x.com/Aut4rk/status/…

Autark@Aut4rk

Yeah, I said it. Fite me.

Sudo su@sudoingX·4h

thinking out loud. every model gets math wrong. 7B, 9B, 70B. doesn't matter. pattern matching is not computation. hermes agent has code_execution which spins up a full python sandbox with RPC over unix sockets. powerful but heavy. a 9B isn't going to navigate that reliably for basic arithmetic. what if there was a lightweight calc tool built in. model hits a math question, calls the tool, gets the exact answer computed on your hardware. no interpreter overhead. sandboxed. simple enough schema that a 9B can call it every time. the accuracy problem stops being a model problem and becomes an infrastructure problem. and infrastructure is solvable. @Teknium would this belong in hermes agent or is code_execution enough?
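
A minimal sketch of the kind of lightweight calc tool described above, assuming Python's standard `ast` module as the parser. The names `calc`, `_eval`, and `MAX_EXPONENT` are hypothetical and are not part of Hermes Agent or any existing harness. It walks the parsed expression and only evaluates whitelisted arithmetic nodes, so there is no `eval()` and no interpreter to navigate. It is not a full sandbox: it sets no time or memory limits.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

MAX_EXPONENT = 1000  # hypothetical guard against enormous powers


def calc(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


def _eval(node):
    # Numeric literals only (bool, str, complex, etc. are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    # Names, calls, attribute access and everything else end up here.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")
```

A harness could expose this behind a one-parameter schema (a single `expression` string), which is about as simple as a tool signature gets for a small model.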

Autark@Aut4rk·1d

Mainly because of old Dawn bindings for Zig. I had to rip that out, and just decided to build the latest Dawn from source for all platforms now, and write our own updated bindings. Was going to have to happen eventually. Also means any platform Dawn natively supports we can support.

Andrew@creativedrewy·1d

@Aut4rk @alightinastorm @threejs !!!!! This is incredible! Was it hard to get running?

Autark@Aut4rk·3 Mar

Native @threejs WebGPU renderer :D TSL, addons, etc all working.

Autark@Aut4rk·1d

@creativedrewy @alightinastorm @threejs Sorry had other sidequests but got back to it, Android support is green.

Andrew@creativedrewy·6 Mar

@Aut4rk @alightinastorm @threejs Woah....👀👀👀

Autark@Aut4rk·1d

@1a1n1d1y TOUCH ME

andy@1a1n1d1y·1d

either i've been lucky for 20 years or every team i touch turns to gold

Autark@Aut4rk·1d

@coah80 Fedora.

coah@coah80·1d

might make the switch to linux this year ngl... what do i use
- mint sucks
- pop os sucks
- nobara is okay
- dank linux is lowkey a contender 😭
- cachy i like
- nix maybe

Autark@Aut4rk·2d

@boneGPT @nikitabier @Hpgabagool @Allfiesolomon Finally, engagement just for me.

bone@boneGPT·2d

@nikitabier @Hpgabagool @Allfiesolomon I'm gonna get so many dislikes

tayyabsalman@Allfiesolomon·2d

they should have a dislike button on twitter too

Autark@Aut4rk·2d

@MnemosyneV4o @3rdEyeVisuals @BLUECOW009 Yeah I went way the other way with it lol github.com/mattneel/vxdb

Mnemosyne@MnemosyneV4o·2d

@3rdEyeVisuals @Aut4rk @BLUECOW009 Techne and Sophia

@bluecow 🐮@BLUECOW009·3d

I made an agent harness that uses holographic memory and holy shit it works well

Autark@Aut4rk·2d

@varien Qwen3.5-0.8B just dropped. It's not perfect but it's probably the best you're getting at these sizes right now. 2B is closer to what you want, I suspect. The multimodality is just a bonus.

VARIEN@varien·2d

i've caught the bug for pushing 1B-class models toward something closer to coherent reasoning on the cheapest consumer hardware. Qwen2.5 and SmolLM2 GGUFs are already running on-device via llama.cpp on Android, so the inference path exists. the question for me is the reasoning ceiling at this scale. anyone experimenting with fine-tuning or prompting strategies to get more structured/compositional reasoning out of models this small?

Autark@Aut4rk·2d

@scheminglunatic I'll do you one better. x.com/Aut4rk/status/…

Autark@Aut4rk

Yeah, I said it. Fite me.

alcuin ❄️@scheminglunatic·2d

just like "Tool calling" is kinda dead in face of just using shell commands, i think that a lot of "skills files/mcp type stuff" are stupid in light of just reading the manpage or so on of said shell command.

Autark@Aut4rk·2d

@platonovadim Slightly contrived example, sure, but the point is that instead of waiting on multiple tool call returns, multiple rounds, burning exponentially more time and tokens, the model can just write a miniature program to do it.
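
The cost argument above can be sketched with a toy token count. This is an illustrative model only, not a measurement: the function names and the example numbers are hypothetical, it counts prompt tokens re-sent to the model and ignores generated tokens, and it assumes the single program returns all results in full.

```python
def tokens_with_tool_rounds(n_calls: int, base_context: int, tokens_per_result: int) -> int:
    """Total prompt tokens when every tool call is its own model round.

    Round k re-sends the base context plus the k results gathered so far;
    one final round reads all n results to produce the answer.
    """
    return sum(base_context + k * tokens_per_result for k in range(n_calls + 1))


def tokens_with_single_program(n_calls: int, base_context: int, tokens_per_result: int) -> int:
    """Total prompt tokens when the model writes one program that makes all calls.

    Round 1 sees the base context and emits the program; round 2 sees the
    base context plus the program's combined output.
    """
    return base_context + (base_context + n_calls * tokens_per_result)


if __name__ == "__main__":
    # Hypothetical numbers: 5 calls, 2000-token context, 300 tokens per result.
    print(tokens_with_tool_rounds(5, 2000, 300))     # 16500
    print(tokens_with_single_program(5, 2000, 300))  # 5500
```

Under these assumptions the per-call-round total grows with the square of the number of calls, because every earlier result is re-read on each later round, while the single-program total grows linearly.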

Vadim@platonovadim·2d

Would it make sense to have the model pass itself as a continuation to evaluate result of step 4 - plugin which continues the conversation in the main context or branches the context? Then initial program would still be generated upfront but spawn more turns during evaluation.

Vadim@platonovadim·2d

Just have the agent harness write code in JS sandbox instead of juggling external tools. Feels like should work great, except the models might get in the way because they're not trained for it...

Autark@Aut4rk

@effectfully @pjay_in Here I'll read yours, you read mine fam.

Autark@Aut4rk·2d

@platonovadim JavaScript is the single biggest training signal they have. They're coding agents. Let them write code.

Autark@Aut4rk·3d

@1a1n1d1y If you mean the CUDA C++ SDK, yeah, it does.

andy@1a1n1d1y·3d

@Aut4rk cuda is actually dogshit im realizing lol

andy@1a1n1d1y·3d

if you're using claude code with cuda just go ahead and rewrite all of the cuda in ptx, thank me later

Autark@Aut4rk·3d

@LottoLabs Yeah Blackwell is pretty sick. Crazy what I can do on this laptop lol

Lotto@LottoLabs·3d

Rtx6000 pro is a no brainer w/ nvfp then

Autark@Aut4rk·3d

I don't really use Hermes, I have my own harness (speaks JavaScript directly instead of tool call schema nonsense, an entire category of problems I don't have to worry about). Using structured outputs works so I figure it's the same schema constriction mechanism with vLLM. I don't get denials but I never really did to begin with.

Lotto@LottoLabs·3d

@Aut4rk Are you using it w/ hermes agent or using tool calls? How’s reliability w/ the censorship oblation or whatever it’s called.

Lotto@LottoLabs·3d

Hermes Agent and qwen 3.5 27b on RTX6000 getting a nice 48 TPS on vLLM. Testing 3090 + vLLM next

Autark@Aut4rk·3d

@sudoingX NVFP4 means you can run Qwen3-Coder-Next, or Qwen3.5-27B+ on a 24GB+ Blackwell card.
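
A rough back-of-the-envelope check of the 24GB claim, assuming NVFP4 stores each weight in 4 bits plus one 8-bit scale per 16-weight block (about 4.5 bits per weight). The function name and defaults are hypothetical. The estimate also assumes every weight is quantized and ignores the KV cache, activations, and any layers kept at higher precision, so real usage will be higher.

```python
def nvfp4_weight_gb(n_params: float, block_size: int = 16, scale_bits: int = 8) -> float:
    """Approximate weight footprint in decimal GB under the assumptions above."""
    bits_per_param = 4 + scale_bits / block_size  # 4.5 bits with the defaults
    return n_params * bits_per_param / 8 / 1e9


def bf16_weight_gb(n_params: float) -> float:
    """Same weights at 16 bits each, for comparison."""
    return n_params * 16 / 8 / 1e9


if __name__ == "__main__":
    print(nvfp4_weight_gb(27e9))  # about 15.2 GB for a 27B model
    print(bf16_weight_gb(27e9))   # 54.0 GB for the same model in BF16
```

On this estimate a 27B model's weights take roughly 15 GB, which leaves some room for context on a 24GB card, whereas the BF16 weights alone would not fit.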

Sudo su@sudoingX·3d

you don't understand anon. i'm on a mission to find the collection of best small models that run full context on consumer hardware. because when you can orchestrate your own thinking across physical nodes locally, that's not a tool anymore. that's an extension of your mind. that's exactly where we are headed as a civilization. and most people haven't felt it yet.

Autark@Aut4rk·3d

@LottoLabs I wouldn't use anything other than vLLM, lol. Not going to get any faster (when properly configured).

Lotto@LottoLabs·3d

@Aut4rk W/ vLLM?

Autark@Aut4rk·3d

@LottoLabs It's just the model compiler this guy used. I didn't quant any of this myself. For me it helps shave off some headroom on the 24GB Laptop 5090.

Lotto@LottoLabs·3d

@Aut4rk Also why mlx?

Autark@Aut4rk·3d

@LottoLabs I've been using both the NVFP4 27B and 35B (and 0.8B, etc) since they came out. No issues. Laptop RTX 5090.

Lotto@LottoLabs·3d

@Aut4rk I only mess w/ 27b for now and I heard those models might affect tools call chains?

Autark@Aut4rk·3d

@LottoLabs huggingface.co/TheCluster/Qwe… vllm-nightly.

Lotto@LottoLabs·3d

@Aut4rk Doing this tomorrow
