teromee

9.9K posts

@teromee

I am not a programmer; I am a self-proclaimed systems infra analyst with the worst curse: the ability to read ISA docs.

Taipei City, Taiwan · Joined December 2013
114 Following · 338 Followers
Pinned Tweet
teromee
teromee@teromee·
ALIA = artificial limited intelligence agent. ACI = artificial curated intelligence. ASIS = artificial specified intelligence systems. AGI = artificial general intelligence. This is how you make AI for the general public to use, with provisions for private home AI use.
English
4
1
7
1.2K
teromee
teromee@teromee·
You know what the best part is? The individual you're referring to is completely correct in his self-aggrandisement about rendering technologies: you can use the pre-existing infrastructure of our classical render solutions, with the absolutely insane power of the GPUs we currently have, to squeeze every ounce of compute out of a GPU before you resort to upscaling with AI. The AI stuff, DLSS and the like, should be the cherry on top, an optional thing you can turn on. It is not a core feature.
English
0
0
0
3
Rin | 凛
Rin | 凛@TheIshikawaRin·
It's sad how many complex nuanced topics worth discussing, from societal issues and depictions in media to video game rendering and optimisation, get completely ruined by manipulative grifters with their own agendas and inflated self-importance. The worst part is that it works.
Rin | 凛 tweet media
English
41
14
347
14.4K
teromee
teromee@teromee·
@HotAisle Be like Coffeezilla or something, man, when it comes to people you don't like who are being grifters.
English
0
0
1
57
Hot Aisle
Hot Aisle@HotAisle·
I don’t think people realize how much deep shit Dylan is in. One time I shared a URL to an article with someone I was doing analyst consulting for. We even talked about the URL on the call. It got me in huge trouble. Imagine the giant iceberg heading towards him. Every single customer of SemiAnalysis now has to perform due diligence on their relationship. 🤡
English
4
1
57
7.8K
teromee
teromee@teromee·
@HotAisle Well, that answers a couple of my questions, and also confirms at least two things I felt about SemiAnalysis. It's nice to have some things confirmed rather than constantly getting hyperbolized noise.
English
1
0
2
84
teromee
teromee@teromee·
@0xSero Man, if this thing ever gains sentience it'll need a very thick skin for sass.
English
0
0
0
2
0xSero
0xSero@0xSero·
Roko's Basilisk won't be so happy with me.
0xSero tweet media
English
9
0
31
2.9K
teromee
teromee@teromee·
@art_zucker MoE is for agents. Dense is for thinking and processing. Human is for decision making. The layers, Mason, what do they mean?
English
0
0
1
12
teromee
teromee@teromee·
@shekhu04 Slowly reread my original comment.
English
0
0
0
0
Shikhar
Shikhar@shekhu04·
@teromee AI accelerates output, not quality. Without constraints it just amplifies bloat.
English
1
0
0
158
Shikhar
Shikhar@shekhu04·
In 2026, we have CPUs with billions of transistors and 2-nanometer architecture, yet it takes your laptop longer to open a basic "To-Do" app today than it took a computer in 1995 to launch a word processor. This is Wirth's Law: software is getting slower more rapidly than hardware becomes faster. We have essentially "spent" all our hardware gains on layers of abstraction, unoptimized libraries, and AI-generated code bloat.
English
235
456
6K
235.6K
teromee
teromee@teromee·
I'm just taking a walk here, and I came to a realization. If anyone wants to see a true benchmark of AI coding capability, here's what they should do: take a pre-existing open-source codebase with significant technical debt. Have AI models go through it and rewrite the entire thing in a modern implementation of the programming language it was originally written in. Once they've done that, validate it and operate the now-updated versions of those pre-existing tools as modernized versions. Then put them through the vulnerability gauntlet to ensure the fewest vulnerabilities remain and that existing vulnerabilities have been taken care of. To be perfectly honest, I think that's the best way to gauge AI coding capability. Then just do recursive self-training on the models until they can pass the test 100% on the open-source codebase you've cloned.
English
0
0
1
14
teromee
teromee@teromee·
The design is that you use the MoE as your agent model, the one consuming and parsing the data to later hand to you, the human. You're still at the top of the hierarchy. The dense model is right below you, and the MoE is below that. Why don't you try a mixture of diverse-sized experts as the basis of the framework for your models?
English
0
0
0
33
Ahmad
Ahmad@TheAhmadOsman·
Fundamentals of LLMs: MoE vs Dense
> many popular releases have been sparse MoEs
> so when a dense model drops, everyone starts asking why it feels so much slower
> that’s the cost of full activation
> Dense = tokens run through every parameter of the model weights
> MoE = tokens selectively activate a subset of the parameters of the model weights
> Dense models (Qwen 3.5 27B, Gemma 4 31B)
> every parameter fires on every token
> ~27B ops per token, every time
> MoE models (MiniMax M2, Kimi K2.5)
> router + many experts
> per token: activate top-k (usually 2)
> the rest do nothing
> this one design choice changes everything
> inference speed
> Dense is slower: all weights, every token
> MoE is faster: a 675B model might only run ~40B active params
> big model, small compute footprint
> memory / VRAM
> Dense: lower usage, only store what you execute (~140GB for 70B BF16)
> MoE: all experts must live in memory (Kimi K2.5 is ~600GB in NVFP4)
> compute / FLOPs
> Dense: high compute burn per token
> MoE: cheap per token, expensive to host in memory though
Ahmad tweet media
English
30
21
314
23.8K
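The top-k routing described in the tweet above can be sketched in a few lines. The dimensions, expert count, and all names here are toy assumptions for illustration, not any real model's configuration; the point is just that only `TOP_K / N_EXPERTS` of the expert parameters do work per token.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 64, 8, 2  # hidden dim, expert count, experts used per token

# Each "expert" is a tiny linear layer; the router is a linear scorer.
experts = [rng.standard_normal((D, D)) * 0.02 for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS)) * 0.02

def moe_forward(x):
    """Route one token through only its top-k experts."""
    scores = x @ router_w                      # router logits, one per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over the selected experts
    # Only the selected experts' weights are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(D)
y = moe_forward(x)

total_params = N_EXPERTS * D * D
active_params = TOP_K * D * D
print(f"total expert params: {total_params}, active per token: "
      f"{active_params} ({active_params / total_params:.0%})")
```

With 2 of 8 experts active, only 25% of the expert parameters run per token, which is the "big model, small compute footprint" trade-off: every expert still has to sit in memory, but the per-token FLOPs scale with the active subset.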
teromee reposted
RPCS3
RPCS3@rpcs3·
We have achieved a new breakthrough on emulating PS3's Cell CPU! Elad discovered new SPU usage patterns and coded ways to generate more optimised PC code from them - benefitting all games! Twisted Metal, one of the most SPU-intensive games, sees a 5-7% Average FPS improvement.
English
149
879
12.1K
1.2M
teromee
teromee@teromee·
Just making things and posting them, because it's better than nothing at all.
English
0
0
1
8
teromee
teromee@teromee·
@TripleWho @OfficialLoganK someone else already beat me to the turboquant compression github.com/TheTom/turboqu…
English
0
0
1
17
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Introducing Gemma 4, our series of open weight (Apache 2.0 licensed) models, which are byte for byte the most capable open models in the world! Gemma 4 is built to run on your hardware: phones, laptops, and desktops. Frontier intelligence with a 26B MoE and a 31B dense model!
Logan Kilpatrick tweet media
English
288
588
6.1K
467.9K
teromee
teromee@teromee·
@SebAaltonen Like I said long ago and will continue to say: Nvidia hasn't actually designed their GPU compute infrastructure for asynchronous compute. Let's be real.
English
0
0
2
790
Sebastian Aaltonen
Sebastian Aaltonen@SebAaltonen·
Tensor cores use the same schedulers, register files, memory load/store units and caches as CUDA cores. An SM running 100% tensor work on all warps can't run normal shader work at the same time. They are competing for resources. Also, memory bandwidth and TDP are shared.
Anshel Sag@anshelsag

When a game developer fundamentally doesn't understand the GPU architecture. DLSS runs on tensor cores which are one of four different types of cores in a Blackwell GPU. The full 5090 GPU includes: ● 24576 CUDA Cores ● 192 RT Cores ● 768 Tensor Cores ● 768 Texture Units

English
10
28
641
42.8K
0xSero
0xSero@0xSero·
Gemma4-26B-REAP in progress. You will be able to run it on 12GB of VRAM, 16 to be comfortable.
0xSero tweet media
English
51
33
775
37.3K
teromee
teromee@teromee·
Oh god, I wish you'd looked at the ISA doc for Blackwell... oh well, can't think about that all the time. I mean, NV has been hiding the internal logic switching of the IMC infra for 4 gens at this point. They still can't do int and FP at the same time and speed, along with their continued issues with async.
English
0
0
16
2.4K
Anshel Sag
Anshel Sag@anshelsag·
When a game developer fundamentally doesn't understand the GPU architecture. DLSS runs on tensor cores which are one of four different types of cores in a Blackwell GPU. The full 5090 GPU includes: ● 24576 CUDA Cores ● 192 RT Cores ● 768 Tensor Cores ● 768 Texture Units
notch@notch

DLSS fundamentally makes no sense. Because the graphics card is too slow to run the game at reasonable speeds, you use THE SAME HARDWARE to run a neural network to generate frames in between the existing ones.

English
33
8
345
83.2K
teromee
teromee@teromee·
@zacbowden Remember old Outlook? When it just loaded the first email before all the elements were loaded in?
English
0
0
1
181
Zac Bowden
Zac Bowden@zacbowden·
I cannot believe it still takes this long for Outlook on Windows to display an email when you select it from a notification. It's INFURIATING!
English
257
90
3.2K
203.3K
teromee
teromee@teromee·
@HotAisle Hey, think about it like this: if the DCs in space work out, you can have a DC on the moon, along with other facilities at the moon base.
English
0
0
0
13
Hot Aisle
Hot Aisle@HotAisle·
$10m and I'll put GPUs on land with vastly less risk.
Y Combinator@ycombinator

Congrats to @Starcloud_ on their $170M Series A at a $1.1B valuation! They're building data centers in space—just 17 months from YC Demo Day to unicorn. They launched their first satellite with an Nvidia H100 GPU last year and are now developing Starcloud-3, a spacecraft designed to launch from Starship that aims to be cost-competitive with Earth-based data centers for AI inference. techcrunch.com/2026/03/30/sta…

English
2
2
23
2.1K
teromee
teromee@teromee·
I mean, to be perfectly honest with you, it doesn't look like they've always done that. It's more like: hey, here's the thing we did some research on. And then a year later, while making the better version of their AI model, the boys were like: hey, what if we applied this to one side of the context window, the short-term memory, and applied Nvidia's KVTC to the back half, the long-term context? What would happen? Then they just published the research paper and said: maybe you should try this with the short-term context window, and try a different type of lossless compression for the long-term context of any given conversation, kept easily retrievable and organized for optimal usage by a mixture-of-diverse-experts system.
English
0
0
0
8
BuBBliK
BuBBliK@k1rallik·
Solo dev reverse-engineered Google's billion-dollar algorithm in 7 days
Google published the paper that crashed memory stocks worldwide. Then shipped zero code.
Tom Turney read the math, opened his terminal, and built the whole thing with Claude - then made it faster than Google promised.
Day 1-3: Core algorithms, 141 tests, Python prototype
Day 3-5: C port into llama.cpp, Metal GPU kernels
Day 5-7: Speed optimization from 739 to 2747 tok/s
That's a 3.7x speedup through pure engineering:
> fp32 → fp16 WHT
> half4 vectorized butterfly ops
> graph-side rotation
> block-32 storage layout
Then he added his own research on top:
> Sparse V: skip 90% of value decompressions at long context
> Asymmetric K/V: keep keys precise, compress values harder
> Temporal decay: old tokens get lower precision automatically
Result: 35B model running on a MacBook with 4.6x compressed cache.
613 GitHub stars in a week. Google still hasn't released their own code.
BuBBliK tweet media
BuBBliK@k1rallik

x.com/i/article/2037…

English
170
1.2K
9.2K
1.6M
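The "Asymmetric K/V" idea mentioned above (keep keys precise, compress values harder) can be sketched with plain per-tensor integer quantization. The bit-widths, shapes, and helper names here are illustrative assumptions, not the actual project's code; the sketch only shows why spending more bits on keys than values is a coherent trade.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(x, bits):
    """Symmetric per-tensor quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.round(x / scale).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# A toy KV cache: 128 cached tokens, 64-dim heads.
keys = rng.standard_normal((128, 64)).astype(np.float32)
values = rng.standard_normal((128, 64)).astype(np.float32)

# Asymmetric treatment: keys at 8 bits (attention logits are sensitive to
# key error), values at 4 bits (value error is averaged by softmax weights).
k_q, k_scale = quantize(keys, bits=8)
v_q, v_scale = quantize(values, bits=4)

k_err = np.abs(dequantize(k_q, k_scale) - keys).mean()
v_err = np.abs(dequantize(v_q, v_scale) - values).mean()
print(f"mean abs error  keys(8-bit): {k_err:.4f}  values(4-bit): {v_err:.4f}")
```

As expected, the 4-bit values carry noticeably more reconstruction error than the 8-bit keys; the bet in asymmetric K/V schemes is that attention output degrades gracefully with value error while key error perturbs which tokens get attended to at all.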