Artem Andreenko
@miolini

2.8K posts

[profile banner]

Building cool stuff @SentientWaveHQ. Pancomputationalist. This is my personal blog about compute, communication, and energy.

San Diego, CA · Joined April 2008
4K Following · 4.4K Followers
Artem Andreenko @miolini
@sjurgis Smaller orbital data centers are likely to have even shorter operating lifespans. There is no sign that demand for older Nvidia A100 chips is lower today. The demand for tensor compute reflects a deeper civilizational shift in how society thinks, works, and makes decisions.
Artem Andreenko @miolini
Data centers in space are cool. But what about data centers inside wind turbines? They could each be connected via satellite internet. AI inference workload could be dynamically adjusted based on available wind resources, with small battery packs providing supplemental power. ༄༄༄
[image]
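The scheduling idea in the tweet above can be sketched in a few lines. Everything here is an illustrative assumption, not a real system: the per-batch power draw, the battery policy, and the function name are made up for the example.

```python
# Hypothetical sketch: throttle AI inference concurrency to the power
# currently available from a wind turbine plus a small battery buffer.
# All names and numbers are illustrative assumptions.

GPU_WATTS_PER_BATCH = 50.0   # assumed power draw per concurrent batch
MIN_BATCHES = 1              # keep a minimal service alive on battery

def allowed_batches(wind_watts: float, battery_soc: float) -> int:
    """Return how many concurrent inference batches the site can power.

    battery_soc is state of charge in [0, 1]; the battery only tops up
    shortfalls, so we grant a small bonus budget when it is well charged.
    """
    battery_bonus = 100.0 if battery_soc > 0.5 else 0.0
    budget = max(wind_watts + battery_bonus, 0.0)
    return max(int(budget // GPU_WATTS_PER_BATCH), MIN_BATCHES)

# Strong wind: plenty of headroom.
print(allowed_batches(wind_watts=800.0, battery_soc=0.2))  # -> 16
# Calm spell: the battery keeps a minimal service running.
print(allowed_batches(wind_watts=20.0, battery_soc=0.8))   # -> 2
```

A real controller would also account for ramp rates and satellite-link latency; this only shows the power-to-workload mapping.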
Artem Andreenko @miolini
First approach for applying TurboQuant on model weights:

David T @coffeecup2020

TurboQuant is not just for KV; you can use it on weights too. I bought an RTX 5060 Ti 16GB around Christmas with one goal: get a strong model running locally on my card without paying API fees. I have been testing local AI with OpenClaw. I did not come into this with a quantization background; I only learned about llama.cpp, LM Studio, and Ollama two months ago. I just wanted something better than the usual Q3-class compromise (see my first post for benchmarks). Many times I wanted to buy a 24GB card, but looking at the price, I quickly turned away.

When the TurboQuant paper came out, and someone showed that memory can be saved on the KV cache, I started wondering whether the same style of idea could help on weights, not just the KV cache. (P.S. I had nearly got the KV part done with CUDA support, but someone beat me to it.) After many long nights after work (until 2am), that turned into a llama.cpp fork with a 3.5-bit weight format I'm calling TQ3_1S:
- Walsh-Hadamard rotation
- 8-centroid quantization
- dual half-block scales
- CUDA runtime support in llama.cpp

This work is inspired by the broader transform-based quantization line, especially RaBitQ-style Walsh-Hadamard rotation ideas and the recent TurboQuant result (Tom). The thing I wanted to test was whether that same geometry could help on weights, not just KV/cache.

Main result on Qwen3.5-27B (perplexity on the full wiki.test.raw pass, 580 chunks, c=512):
- Q4_0: 7.2431 +/- 0.04822
- TQ3_1S: 7.2570 +/- 0.04802

That is a gap of only +0.0139 PPL, about 0.19%.

Size:
- Q4_0: about 14.4 GB
- TQ3_1S: about 12.9 GB

So TQ3_1S is about 10% smaller while staying near Q4_0 quality. The practical point for me is simple:
- TQ3_1S fits fully on my 16GB RTX 5060 Ti
- Q4_0 does not fit fully on GPU in the same setup

So I'm not claiming "better than Q4_0" in general. I'm claiming something narrower and, I think, useful: near-Q4_0 quality, materially smaller than Q4_0, enough to make a 27B model practical on a 16GB card.

Caveats:
- this is the strongest result, observed on the 27B; it is not a blanket claim that plain TQ3 works equally well at every model size
- I am pretty new to this, so I may be missing a lot of tests
- I only have one card to test on :-) Be skeptical, as I can hardly believe I am publishing my own model
- the speed story here is mainly a deployment/fit win on this GPU class, not a blanket claim that native TQ3 kernels are always faster than native Q4_0

Links:
- GitHub fork: github.com/turbo-tan/llam…
- Hugging Face GGUF: huggingface.co/YTan2000/Qwen3…

I will open source the quantization steps when I have enough feedback and tests.

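As a rough illustration of the recipe the tweet describes (rotate a weight block, quantize with a small code set, keep two scales per block), here is a minimal NumPy sketch. It is not the TQ3_1S implementation: the real format reportedly uses 8 learned centroids and a packed 3.5-bit layout with CUDA kernels, while this stand-in uses 8 uniform integer levels per half-block purely to show the geometry.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    y = x.astype(np.float64).copy()
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            a = y[i:i + h].copy()
            b = y[i + h:i + 2 * h].copy()
            y[i:i + h] = a + b          # butterfly: sums
            y[i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return y

def fake_quant_block(w: np.ndarray, codes: int = 8) -> np.ndarray:
    """Rotate a weight block, round each half to 8 levels with its own scale,
    then rotate back. Returns the dequantized block for error measurement."""
    n = len(w)
    r = fwht(w) / np.sqrt(n)            # orthonormal rotation spreads outliers
    deq = np.empty_like(r)
    for sl in (slice(0, n // 2), slice(n // 2, n)):  # dual half-block scales
        half = r[sl]
        scale = np.abs(half).max() / (codes // 2)
        scale = scale if scale > 0 else 1.0
        q = np.clip(np.round(half / scale), -codes // 2, codes // 2 - 1)
        deq[sl] = q * scale
    return fwht(deq) / np.sqrt(n)       # H / sqrt(n) is its own inverse

rng = np.random.default_rng(0)
w = rng.standard_normal(64)
w_hat = fake_quant_block(w)
print(np.linalg.norm(w_hat - w) / np.linalg.norm(w))  # small relative error
```

The key property on display is that the Hadamard rotation is orthogonal and self-inverse (up to normalization), so quantization error introduced in the rotated domain maps back to a spread-out, low-amplitude error in the weights.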
Artem Andreenko @miolini
@sjurgis It is not about capacity factor. It is about adding more dynamically switchable compute on top of underutilized renewable energy resources while completely bypassing grid interconnection and avoiding expensive data centers with multiple layers of redundancy.
Artem Andreenko @miolini
What I love about the OpenSage Self Programming Agent Generation Engine is the idea that a swarm can be modeled as a dynamic compute directed acyclic graph. It has that warm 1970s vibe, when many of the foundations of computer science were being explored. rdi.berkeley.edu/blog/opensage/
[image]
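The "swarm as a compute DAG" idea can be sketched with the standard-library topological sorter. The task names and dependency edges below are illustrative assumptions, not OpenSage's actual design.

```python
# Hypothetical sketch: model an agent swarm as a DAG of compute tasks
# and execute it in dependency order. graphlib is in the Python stdlib.
from graphlib import TopologicalSorter

# edges: task -> set of tasks it depends on (names are made up)
dag = {
    "plan":      set(),
    "search":    {"plan"},
    "summarize": {"search"},
    "code":      {"plan"},
    "review":    {"summarize", "code"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # every task appears after all of its dependencies
```

A dynamic version would rebuild the graph as agents spawn new subtasks; `TopologicalSorter` also supports incremental `prepare`/`get_ready`/`done` driving for exactly that kind of loop.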
Artem Andreenko reposted

Tom Turney @no_stp_on_snek
The original TurboQuant paper tested on an A100 with models up to 8B. Six days later, a bunch of strangers on the internet had it built and running on:
- Apple Silicon M1 through M5
- NVIDIA 3080 Ti through DGX Spark Blackwell
- AMD RX 6800 XT and 9070
- a 10-year-old Tesla P40
- an 8GB MacBook Air
- models from 3.8B to 70B across 6 architecture families
- 30+ independent testers

Along the way we found new optimizations the paper didn't cover and failure modes it didn't test. The fact that a loose group of people across the world can read a paper, build implementations from scratch, stress-test across hardware none of us could individually afford, and push the research further in under a week is genuinely one of the best things about this era. The tools and the community make it possible. Open source is something else.
[image]
Artem Andreenko @miolini
@getjonwithit Wow! Do you think function extraction for DRY might help make it even more compact, as a proxy for soundness?
Jonathan Gorard @getjonwithit
Figured out how to make my theorem-prover significantly more powerful this week. Can now generate a 12,814-line proof (of local Lipschitz continuity for 2D isothermal Euler) in under 500ms. My dream of an end-to-end formally-verified hydrodynamics solver is now within sight!
[image]
Alex Ziskind @digitalix
Does anyone know what kind of port this is? It's next to the USB-C plug, for reference.
[image]
Artem Andreenko @miolini
@pronounced_kyle It's cheaper to synthesize gasoline from air (H2O, CO2) and solar energy in Australia than to transport it via Starship.
Martin Shkreli @MartinShkreli
What is the best tooling for 24/7 inference/agent-driven research? I'm trying Factory, but it stops and asks me questions even though I have 'auto' mode on. Tbh, I think this is an even bigger killer app than LLM chatbots. Who else is out there doing it?
Artem Andreenko @miolini
This is an S curve. Big AI labs may push for ever larger models, but past a certain threshold users do not need more capability. As Nyquist-Shannon shows, you only need enough sampling to cover the meaningful range. What matters is not maximum intelligence, but sufficient intelligence.

BuBBliK @k1rallik
x.com/i/article/2037…
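For reference, the Nyquist-Shannon theorem the tweet leans on: a signal band-limited to $B$ Hz is fully determined by samples taken at rate $f_s \ge 2B$, via sinc interpolation. Sampling faster adds no information — the analogy being that capability beyond the "bandwidth" of users' needs adds no value.

```latex
x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\tfrac{n}{2B}\right)
\operatorname{sinc}\!\left(2Bt - n\right), \qquad f_s \ge 2B .
```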
Artem Andreenko reposted

Ivan Fioravanti ᯅ @ivanfioravanti
Most human intelligence is open. Why shouldn't artificial intelligence, derived from it, be open too?
Peyman Milanfar @docmilanfar
generalization is a feature. memorization is a bug