Artem Andreenko

2.8K posts

Artem Andreenko

@miolini

Building cool stuff @SentientWaveHQ. Pancomputationalist. This is my personal blog about compute, communication, and energy.

San Diego, CA Katılım Nisan 2008

4K Takip Edilen4.4K Takipçiler

Sabitlenmiş Tweet

Artem Andreenko@miolini·18 Mar

x.com/i/article/2034…

ZXX

600

Artem Andreenko@miolini·1h

@ollama it's looks like sorting in model catalog is broken

English

Artem Andreenko@miolini·1d

@sjurgis Smaller orbital data centers are likely to have even shorter operating lifespans. There is no sign that demand for old Nvidia A100 chips is lower today. The demand for tensor compute reflects a deeper civilizational shift in how society think, work, and make decisions.

English

Jurgis Šalna@sjurgis·1d

@miolini Which probably applies orbital datacenters too

English

Artem Andreenko@miolini·2d

Data centers in space are cool. But what about data centers inside wind turbines? They could each be connected via satellite internet. AI inference worload could be dynamically adjusted based on available wind resources, with small battery packs providing supplemental power. ༄༄༄

English

130

Artem Andreenko@miolini·1d

first approach for applying turbo quant on model weights

David T@coffeecup2020

Turbo Quant not just for KV, can use it on weights. I bought an RTX 5060 Ti 16GB around Christmas and had one goal: get a strong model running locally on my card without paying api fees. I have been testing local ai with open claw. I did not come into this with a quantization background. I only learned about llama, lmstudio and ollama two months ago. I just wanted something better than the usual Q3-class compromise (see my first post for benchmark). Many times, I like to buy 24gb card but looking at the price, I quickly turned away. When the TurboQuant paper came out, and when some shows memory can be saved in KV, I started wondering whether the same style of idea could help on weights, not just KV/ cache. P/S. I was nearly got the KV done with cuda support but someone beat me on it. After many long nights (until 2am) after work, that turned into a llama.cpp fork with a 3.5-bit weight format I’m calling TQ3_1S: Walsh-Hadamard rotation 8-centroid quantization dual half-block scales CUDA runtime support in llama.cpp This work is inspired by the broader transform-based quantization line, especially RaBitQ-style Walsh-Hadamard rotation ideas and the recent TurboQuant result (Tom). The thing I wanted to test was whether that same geometry could help on weights, not just KV/cache. Main Result on Qwen3.5-27B Q4_0: 7.2431 +/- 0.04822 TQ3_1S: 7.2570 +/- 0.04802 That is a gap of only +0.0139 PPL, about 0.19%, on the full wiki.test.raw pass (580 chunks, c=512). Size Q4_0: about 14.4 GB TQ3_1S: about 12.9 GB So TQ3_1S is about 10% smaller while staying near Q4_0 quality. The practical point for me is simple: TQ3_1S fits fully on my 16GB RTX 5060 Ti Q4_0 does not fit fully on GPU in the same setup So I’m not claiming “better than Q4_0” in general. I’m claiming something narrower and, I think, useful: near-Q4_0 quality materially smaller than Q4_0 enough to make a 27B model practical on a 16GB card Caveats this is the strongest result on the 27B witness, not a blanket claim that plain TQ3 works equally well on every model size I am pretty new to this, so I may miss a lot of test. I only have one card to test :-) Be skeptical as I can't believe I publish my own model the speed story here is mainly a deployment/fit win on this GPU class, not a blanket claim that native TQ3 kernels are always faster than native Q4_0 Links GitHub fork: github.com/turbo-tan/llam… Hugging Face GGUF: huggingface.co/YTan2000/Qwen3… I will open source the quantization steps when I have enough feedback and test.

English

280

Artem Andreenko@miolini·1d

@sjurgis It is not about a capacity factor. It is about adding more dynamically switchable compute on top of underutilized renewable energy resources while completely bypassing grid interconnection and avoiding expensive data centers with multiple layers of redundancy.

English

Jurgis Šalna@sjurgis·1d

@miolini en.wikipedia.org/wiki/Capacity_…

QME

Artem Andreenko@miolini·1d

@sjurgis raghavan.usc.edu/papers/infobat…

QME

Artem Andreenko@miolini·1d

What I love about the OpenSage Self Programming Agent Generation Engine is the idea that swarm can be modeled as a dynamic compute directed acyclic graph. It has that warm 1970s vibes, when many of the computer science foundations were being explored. rdi.berkeley.edu/blog/opensage/

English

163

Artem Andreenko@miolini·1d

@NousResearch Hermes Con when?

Rigario@Rigario

I've been testing the new hermes 0.6.0 release for the last couple of hours. Hermes full multi-agent works as well as I hoped it would. I've completely moved every agent over to Hermes. I started with openclaw more than 2 months ago. My setup is now all Hermes agent. On just openclaw: I spent 40% of my time on config changes and stability fixes. The rest on real work. On hermes + openclaw: I spent about 20% of my time on config changes and stability fixes. Speed up came because Hermes was really good at fixing openclaw Now fully hermes: I expect to only spend 5% of my time on config changes and only when I implement something new. @NousResearch is really cooking. Do yourself a favor, try it.

English

Artem Andreenko retweetledi

Tom Turney@no_stp_on_snek·2d

the original TurboQuant paper tested on A100 with models up to 8B. 6 days later, a bunch of strangers on the internet had it built and running on: - Apple Silicon M1 through M5 - NVIDIA 3080 Ti through DGX Spark Blackwell - AMD RX 6800 XT and 9070 - a 10-year-old Tesla P40 - an 8GB MacBook Air - models from 3.8B to 70B across 6 architecture families - 30+ independent testers along the way we found new optimizations the paper didn't cover and failure modes it didn't test. the fact that a loose group of people across the world can read a paper, build implementations from scratch, stress-test across hardware none of us could individually afford, and push the research further in under a week is genuinely one of the best things about this era. the tools and the community make it possible. open source is something else.

English

484

4.9K

139.6K

Artem Andreenko@miolini·4d

@getjonwithit Wow! Do you think function extraction for DRY may help to make it even more compact as a proxy for soundness?

English

601

Jonathan Gorard@getjonwithit·4d

Figured out how to make my theorem-prover significantly more powerful this week. Can now generate a 12,814 line proof (of local Lipschitz continuity for 2D isothermal Euler) in under 500ms. My dream of an end-to-end formally-verified hydrodynamics solver is now within sight!

English

944

44.7K

Artem Andreenko@miolini·4d

@digitalix USB Mini 4P

Indonesia

887

Alex Ziskind@digitalix·4d

does anyone know what kind of port this is? It’s next to the usb-c plug for reference.

English

14.2K

Artem Andreenko@miolini·4d

@digitalix forum.core-electronics.com.au/t/trying-to-so…

QME

1.1K

Artem Andreenko@miolini·4d

@pronounced_kyle It's cheaper to synthesize gasoline from air (h2o, co2) and solar energy in Australia, than transport it via Starship.

English

493

Christian Keil@pronounced_kyle·4d

okay, hear me out

English

272

26.2K

Artem Andreenko@miolini·4d

@MartinShkreli Automata github.com/sentientwave/a…

Español

287

Martin Shkreli@MartinShkreli·5d

what is the best tooling for 24-7 inference/agent-driven research? im trying factory but it stops and asks me questions even though i have 'auto' mode on. tbh i think this is an even bigger killer app than LLM chatbots. who else is out there doing it?

English

156

591

125.6K

Artem Andreenko@miolini·5d

This is an S curve. Big AI labs may push for ever larger models, but after a certain threshold, users do not need more capabilities. As Nyquist-Shannon shows, you only need enough sampling to cover the meaningful range. What matters is not max intelligence, but sufficient enough.

BuBBliK@k1rallik

x.com/i/article/2037…

English

450

Artem Andreenko retweetledi

Ivan Fioravanti ᯅ@ivanfioravanti·5d

Most Human Intelligence is Open, why Artificial Intelligence derived from that, should not?

English

2.4K

Artem Andreenko@miolini·5d

@docmilanfar What about arachnophobia?

English

Peyman Milanfar@docmilanfar·5d

generalization is a feature. memorization is a bug

English

8.8K

Artem Andreenko@miolini·5d

We just shipped new operating system AutomataOS for agentic companies and communities. It's open source, based on Linux and NixOS, support x86_64 and aarch64 architectures.

SentientWave@sentientwavehq

Today we are introducing AutomataOS, a new open source product from SentientWave built to make self-hosting Automata much easier. sentientwave.com/automataos

English

319

Keşfet

@ollama @sjurgis @NousResearch @getjonwithit @digitalix @pronounced_kyle @MartinShkreli @elonmusk