teromee

9.9K posts

@teromee

I am not a programmer; I am a self-proclaimed systems infra analyst with the worst curse: the ability to read ISA docs.

Taipei City, Taiwan · Joined December 2013
114 Following · 338 Followers
Pinned Tweet
teromee
teromee@teromee·
ALIA = artificial limited intelligence agent. ACI = artificial curated intelligence. ASIS = artificial specified intelligence systems. AGI = artificial general intelligence. This is how you make AI for the general public to use, with provisions for private home AI use.
English
4
1
7
1.2K
teromee
teromee@teromee·
You know what the best part is? The individual you're referring to is completely correct in his self-aggrandisement about rendering technologies: you can use the pre-existing infrastructure of our classical render solutions, with the absolutely insane power of the GPUs we currently have, to squeeze every ounce of compute out of a GPU before you resort to upscaling with AI. The AI stuff, DLSS and the like, should be the cherry on top, an optional thing you can turn on. It is not a core feature.
English
0
0
0
3
Rin | 凛
Rin | 凛@TheIshikawaRin·
It's sad how many complex nuanced topics worth discussing, from societal issues and depictions in media to video game rendering and optimisation, get completely ruined by manipulative grifters with their own agendas and inflated self-importance. The worst part is that it works.
Rin | 凛 tweet media
English
41
14
347
14.4K
teromee
teromee@teromee·
@HotAisle Be like Coffeezilla or something, man, when it comes to people you don't like who are being grifters.
English
0
0
1
57
Hot Aisle
Hot Aisle@HotAisle·
I don’t think people realize how much deep shit Dylan is in. One time I shared a URL to an article with someone I was doing analyst consulting for. We even talked about the URL on the call. It got me in huge trouble. Imagine the giant iceberg heading towards him. Every single customer of SemiAnalysis now has to perform due diligence on their relationship. 🤡
English
4
1
57
7.8K
teromee
teromee@teromee·
@HotAisle Well, that answers a couple of my questions, and also confirms at least two things I felt about SemiAnalysis. It's nice to have some things confirmed rather than constantly getting hyperbolized noise.
English
1
0
2
84
teromee
teromee@teromee·
@0xSero Man, if this thing ever gains sentience it'll need a very thick skin for sass.
English
0
0
0
2
0xSero
0xSero@0xSero·
Roko's Basilisk won't be so happy with me.
0xSero tweet media
English
9
0
31
2.9K
teromee
teromee@teromee·
@art_zucker MoE is for agents. Dense is for thinking and processing. Human is for decision making. The layers, Mason, what do they mean?
English
0
0
1
12
teromee
teromee@teromee·
@shekhu04 Slowly reread my original comment.
English
0
0
0
0
Shikhar
Shikhar@shekhu04·
@teromee AI accelerates output, not quality. Without constraints it just amplifies bloat.
English
1
0
0
158
Shikhar
Shikhar@shekhu04·
In 2026, we have CPUs with billions of transistors and 2-nanometer architecture, yet it takes your laptop longer to open a basic "To-Do" app today than it took a computer in 1995 to launch a word processor. This is Wirth's Law: software is getting slower more rapidly than hardware becomes faster. We have essentially "spent" all our hardware gains on layers of abstraction, unoptimized libraries, and AI-generated code bloat.
English
235
456
6K
235.6K
teromee
teromee@teromee·
I'm just taking a walk here, and I came to a realization. If anyone wants to see a true benchmark of AI coding capability, here's what they should do: take a pre-existing open-source codebase with significant technical debt. Have AI models go through it and rewrite the entire thing in a modern implementation of the programming language it was originally written in. Once they've done that, validate it and operate the now-updated versions of those pre-existing tools as modernized versions. Then put them through the vulnerability gauntlet to ensure the fewest vulnerabilities remain and that existing vulnerabilities have been taken care of. To be perfectly honest, I think that's the best way to gauge AI coding capability. Then just do recursive self-training on the models until they can pass the test 100% on the open-source codebase you've cloned.
English
0
0
1
14
teromee
teromee@teromee·
The design is that you use the MoE as your agent model, the one consuming and parsing the data to later hand to you, the human. You're still at the top of the hierarchy. The dense model is right below you, and the MoE is below that. Why don't you try a mixture of diverse-sized experts as the basis of the framework for your models?
English
0
0
0
33
Ahmad
Ahmad@TheAhmadOsman·
Fundamentals of LLMs: MoE vs Dense
> many popular releases have been sparse MoEs
> so when a dense model drops, everyone starts asking why it feels so much slower
> that’s the cost of full activation
> Dense = tokens run through every parameter of the model weights
> MoE = tokens selectively activate a subset of the parameters of the model weights
> Dense models (Qwen 3.5 27B, Gemma 4 31B)
> every parameter fires on every token
> ~27B ops per token, every time
> MoE models (MiniMax M2, Kimi K2.5)
> router + many experts
> per token: activate top-k (usually 2)
> the rest do nothing
> this one design choice changes everything
> inference speed
> Dense is slower: all weights, every token
> MoE is faster: a 675B model might only run ~40B active params
> big model, small compute footprint
> memory / VRAM
> Dense: lower usage, only store what you execute (~140GB for 70B BF16)
> MoE: all experts must live in memory (Kimi K2.5 is ~600GB in NVFP4)
> compute / FLOPs
> Dense: high compute burn per token
> MoE: cheap per token, expensive to host in memory though
Ahmad tweet media
English
30
21
314
23.8K
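The top-k routing described in the tweet above can be sketched in a few lines. The dimensions, expert count, and all names here are toy assumptions for illustration, not any real model's configuration; the point is just that only `TOP_K / N_EXPERTS` of the expert parameters do work per token.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 64, 8, 2  # hidden dim, expert count, experts used per token

# Each "expert" is a tiny linear layer; the router is a linear scorer.
experts = [rng.standard_normal((D, D)) * 0.02 for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS)) * 0.02

def moe_forward(x):
    """Route one token through only its top-k experts."""
    scores = x @ router_w                      # router logits, one per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                   # softmax over the selected experts
    # Only the selected experts' weights are touched for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(D)
y = moe_forward(x)

total_params = N_EXPERTS * D * D
active_params = TOP_K * D * D
print(f"total expert params: {total_params}, active per token: "
      f"{active_params} ({active_params / total_params:.0%})")
```

With 2 of 8 experts active, only 25% of the expert parameters run per token, which is the "big model, small compute footprint" trade-off: every expert still has to sit in memory, but the per-token FLOPs scale with the active subset.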
teromee reposted
RPCS3
RPCS3@rpcs3·
We have achieved a new breakthrough on emulating PS3's Cell CPU! Elad discovered new SPU usage patterns and coded ways to generate more optimised PC code from them - benefitting all games! Twisted Metal, one of the most SPU-intensive games, sees a 5-7% Average FPS improvement.
English
149
879
12.1K
1.2M
teromee
teromee@teromee·
Just making things and posting them, because it's better than nothing at all.
English
0
0
1
8
teromee
teromee@teromee·
@TripleWho @OfficialLoganK someone else already beat me to the turboquant compression github.com/TheTom/turboqu…
English
0
0
1
17
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Introducing Gemma 4, our series of open weight (Apache 2.0 licensed) models, which are byte for byte the most capable open models in the world! Gemma 4 is built to run on your hardware: phones, laptops, and desktops. Frontier intelligence with a 26B MoE and a 31B dense model!
Logan Kilpatrick tweet media
English
288
588
6.1K
467.9K
teromee
teromee@teromee·
@SebAaltonen Like I said long ago and will continue to say: Nvidia hasn't actually designed their GPU compute infrastructure for asynchronous compute. Let's be real.
English
0
0
2
790
Sebastian Aaltonen
Sebastian Aaltonen@SebAaltonen·
Tensor cores use the same schedulers, register files, memory load/store units and caches as CUDA cores. An SM running 100% tensor work on all warps can't run normal shader work at the same time. They are competing for resources. Also, memory bandwidth and TDP are shared.
Anshel Sag@anshelsag

When a game developer fundamentally doesn't understand the GPU architecture. DLSS runs on tensor cores which are one of four different types of cores in a Blackwell GPU. The full 5090 GPU includes: ● 24576 CUDA Cores ● 192 RT Cores ● 768 Tensor Cores ● 768 Texture Units

English
10
28
641
42.8K
0xSero
0xSero@0xSero·
Gemma4-26B-REAP in progress. You will be able to run it on 12GB of VRAM, 16 to be comfortable.
0xSero tweet media
English
51
33
775
37.3K
teromee
teromee@teromee·
Oh god, I wish you'd looked at the ISA doc for Blackwell... oh well, can't think about that all the time. I mean, NV has been hiding the internal logic switching of the IMC infra for 4 gens at this point. They still can't do int and FP at the same time and speed, along with their continued issues with async.
English
0
0
16
2.4K
Anshel Sag
Anshel Sag@anshelsag·
When a game developer fundamentally doesn't understand the GPU architecture. DLSS runs on tensor cores which are one of four different types of cores in a Blackwell GPU. The full 5090 GPU includes: ● 24576 CUDA Cores ● 192 RT Cores ● 768 Tensor Cores ● 768 Texture Units
notch@notch

DLSS fundamentally makes no sense. Because the graphics card is too slow to run the game at reasonable speeds, you use THE SAME HARDWARE to run a neural network to generate frames in between the existing ones.

English
33
8
345
83.2K
teromee
teromee@teromee·
@zacbowden Remember old Outlook? When it just loaded the first email before all the elements were loaded in?
English
0
0
1
181
Zac Bowden
Zac Bowden@zacbowden·
I cannot believe it still takes this long for Outlook on Windows to display an email when you select it from a notification. It's INFURIATING!
English
257
90
3.2K
203.3K
teromee
teromee@teromee·
@HotAisle Hey, think about it like this: if the DCs in space work out, you can have a DC on the moon, along with other facilities at the moon base.
English
0
0
0
13
Hot Aisle
Hot Aisle@HotAisle·
$10m and I'll put GPUs on land with vastly less risk.
Y Combinator@ycombinator

Congrats to @Starcloud_ on their $170M Series A at a $1.1B valuation! They're building data centers in space—just 17 months from YC Demo Day to unicorn. They launched their first satellite with an Nvidia H100 GPU last year and are now developing Starcloud-3, a spacecraft designed to launch from Starship that aims to be cost-competitive with Earth-based data centers for AI inference. techcrunch.com/2026/03/30/sta…

English
2
2
23
2.1K
teromee
teromee@teromee·
I mean, to be perfectly honest with you, it doesn't look like they've always done that. It's more like: hey, here's the thing we did some research on. And then a year later, while making the better version of their AI model, the boys were like: hey, what if we applied this to one side of the context window, the short-term memory, and applied Nvidia's KVTC to the back half, the long-term context? What would happen? Then they just published the research paper and said: maybe you should try this with the short-term context window, and try a different type of lossless compression for the long-term context of any given conversation, kept easily retrievable and organized for optimal usage by a mixture-of-diverse-experts system.
English
0
0
0
8
BuBBliK
BuBBliK@k1rallik·
Solo dev reverse-engineered Google's billion-dollar algorithm in 7 days
Google published the paper that crashed memory stocks worldwide. Then shipped zero code.
Tom Turney read the math, opened his terminal, and built the whole thing with Claude - then made it faster than Google promised.
Day 1-3: Core algorithms, 141 tests, Python prototype
Day 3-5: C port into llama.cpp, Metal GPU kernels
Day 5-7: Speed optimization from 739 to 2747 tok/s
That's a 3.7x speedup through pure engineering:
> fp32 → fp16 WHT
> half4 vectorized butterfly ops
> graph-side rotation
> block-32 storage layout
Then he added his own research on top:
> Sparse V: skip 90% of value decompressions at long context
> Asymmetric K/V: keep keys precise, compress values harder
> Temporal decay: old tokens get lower precision automatically
Result: 35B model running on a MacBook with 4.6x compressed cache.
613 GitHub stars in a week. Google still hasn't released their own code.
BuBBliK tweet media
BuBBliK@k1rallik

x.com/i/article/2037…

English
170
1.2K
9.2K
1.6M
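The "Asymmetric K/V" idea mentioned above (keep keys precise, compress values harder) can be sketched with plain per-tensor integer quantization. The bit-widths, shapes, and helper names here are illustrative assumptions, not the actual project's code; the sketch only shows why spending more bits on keys than values is a coherent trade.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(x, bits):
    """Symmetric per-tensor quantization to a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.round(x / scale).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# A toy KV cache: 128 cached tokens, 64-dim heads.
keys = rng.standard_normal((128, 64)).astype(np.float32)
values = rng.standard_normal((128, 64)).astype(np.float32)

# Asymmetric treatment: keys at 8 bits (attention logits are sensitive to
# key error), values at 4 bits (value error is averaged by softmax weights).
k_q, k_scale = quantize(keys, bits=8)
v_q, v_scale = quantize(values, bits=4)

k_err = np.abs(dequantize(k_q, k_scale) - keys).mean()
v_err = np.abs(dequantize(v_q, v_scale) - values).mean()
print(f"mean abs error  keys(8-bit): {k_err:.4f}  values(4-bit): {v_err:.4f}")
```

As expected, the 4-bit values carry noticeably more reconstruction error than the 8-bit keys; the bet in asymmetric K/V schemes is that attention output degrades gracefully with value error while key error perturbs which tokens get attended to at all.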