Noetic Co

793 posts

@NoetekCo

Certified Financial Asshole /1

Joined March 2025
62 Following, 35 Followers
Yannick Nick
Yannick Nick@keennay·
- DeepSeek V4 Flash
- Native Precision (FP4 + FP8)
- Fits on 2x RTX Pro 6000 GPUs + 256 GB DDR5 RAM
- Using KTransformers: KVCache-AI fork of SGLang for GPU/CPU memory inference

I have somewhat of an obsession with running applications on resource-constrained systems and squeezing out the maximum possible performance. Part of that comes from a past life as a systems engineer, building & upgrading nationwide (USA) Video-On-Demand streaming backends while navigating headless *nix servers around the time "cloud" was becoming a buzzword.

KTransformers gets less mention across the LLM inference-sphere despite being among the engines listed for many of the popular models on HuggingFace (alongside vLLM, SGLang, & llama.cpp). The KVCache-AI team is best known for providing a forked SGLang for hybrid GPU / CPU memory inference, which benefits MoE models. I expect these hybrid setups to gain popularity, especially on the consumer side as hardware prices continue soaring. "Necessity is the mother of invention," as they say, and local AI runners will keep finding more creative ways to run intelligence, whether that involves GPU/CPU memory offload, distributed training / inference, model weight / KV Cache quants, or REAPs.

Here I have DeepSeek V4 Flash running at a 1M context length on 2x RTX Pro 6000 GPUs, using its native mixed precision of FP4 + FP8. KTransformers lets you reduce GPU memory usage by placing a chosen number of experts per MoE layer in GPU VRAM, with the remaining experts balanced across system RAM. KTransformers can also update GPU expert placement during inference from routing statistics collected during the prefill phase. There's also a lot of trial and error involved, given the limited kernel support for RTX Pro 6000s.

Two of the prompt-load stress-test benchmarks I like to run come from the local-inference-lab/llm-inference-bench GitHub repo & the AlienKevin/SWE-ZERO-12M-trajectories HuggingFace dataset.

Here are the main KTransformers SGLang optimized flags:
- Context Length: 1048576
- Total Number of Tokens: 1048576
- Chunked Prefill Size: 16384
- Max Prefill Tokens: 16384
- GPU Prefill Token Threshold: 1024
- GPU Memory Utilization: 87%
- Number of Experts per MoE Layer on GPU: 134 / 256
- Max Running Requests: 256
- CUDA Graph Max Batch Size: 256
- CUDA Graph Batch Sizes: 1 2 4 8 16 32 64 128 256
- Available GPU Memory: 20.81GB (anything less was too tight for agentic coding)

Below are the AlienKevin/SWE-ZERO-12M-trajectories benchmark results for 100 prompts with 10 concurrent, ~8k input tokens, & ~1k output tokens. Both Radix & Chunked Prefix Cache were disabled for the absolute worst-case scenario:
- Prefill Mean Batch Tokens: 35756.93 tok/sec
- Prefill Median Batch Tokens: 652.90 tok/sec
- TTFT Mean: 20.698s
- TTFT Median: 12.714s
- Decode Mean Batch Output Tokens: 27.39 tok/sec
- Decode Median Batch Output Tokens: 20.63 tok/sec
- Utilized CPU memory: ~200 GB

A more detailed write-up will follow, which'll include the methodology for calculating the number of experts per MoE layer on GPU, the maximum number of tokens, and the GPU memory utilization that give a healthy balance for running tool calls & benchmarks in this hybrid setup. Hopefully this'll be reproducible for you on alternative GPUs, as well as with current & future models. Let me know how it works for you!

My future plans involve GPU/CPU memory inference tests for MiniMax M3, GLM-5.2, and Kimi K2.7-Code. Links to all of the resources for getting DeepSeek V4 Flash running at native mixed precision on 2x RTX Pro 6000 GPUs + 256 GB RAM can be found in the follow-up post.
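A rough sanity check on the numbers quoted in the post can be sketched in a few lines of Python. This is only an illustration, not the author's methodology: it assumes prefill chunking behaves like simple ceiling division over a fixed chunk size (the real scheduler may batch differently), treats "~8k" and "~1k" as exactly 8,000 and 1,000 tokens, and uses hypothetical helper names that do not come from KTransformers or SGLang.

```python
import math

# Values quoted in the post; the constant and helper names are illustrative only.
CONTEXT_LENGTH = 1_048_576
CHUNKED_PREFILL_SIZE = 16_384
EXPERTS_ON_GPU = 134
EXPERTS_PER_LAYER = 256
CUDA_GRAPH_MAX_BS = 256


def prefill_chunks(prompt_tokens, chunk_size=CHUNKED_PREFILL_SIZE):
    """Prefill passes needed if a prompt is split into fixed-size chunks.

    Assumption: plain ceiling division, ignoring scheduler details.
    """
    return math.ceil(prompt_tokens / chunk_size)


def power_of_two_batch_sizes(max_bs=CUDA_GRAPH_MAX_BS):
    """Doubling batch sizes from 1 up to max_bs, matching the quoted list."""
    sizes = []
    bs = 1
    while bs <= max_bs:
        sizes.append(bs)
        bs *= 2
    return sizes


# Expert placement per MoE layer: 134 of 256 on GPU, the rest in system RAM.
cpu_experts = EXPERTS_PER_LAYER - EXPERTS_ON_GPU
gpu_fraction = EXPERTS_ON_GPU / EXPERTS_PER_LAYER

# Chunk counts for a full-context prompt and for a benchmark-sized prompt.
full_context_chunks = prefill_chunks(CONTEXT_LENGTH)
benchmark_prompt_chunks = prefill_chunks(8_000)

# Approximate total token volume of the 100-prompt benchmark run.
total_input_tokens = 100 * 8_000
total_output_tokens = 100 * 1_000

batch_sizes = power_of_two_batch_sizes()

print(f"experts on CPU per layer: {cpu_experts} ({1 - gpu_fraction:.1%})")
print(f"chunks for a 1M-token prompt: {full_context_chunks}")
print(f"chunks for an ~8k prompt: {benchmark_prompt_chunks}")
print(f"CUDA graph batch sizes: {batch_sizes}")
```

Under these assumptions, a little over half of each layer's experts sit on the GPU, an ~8k benchmark prompt fits in a single prefill chunk, and a full 1M-token prompt would need 64 chunks.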
24
42
428
164.4K
Anton Kuratnik | AI Nerd
Anton Kuratnik | AI Nerd@anton_onAI·
@zerohedge I mean why would you run openai or anthropic on openrouter anyway? Though to be fair why would you use openrouter in general lol. Best way to get the worst out of any model.
8
0
99
7.4K
zerohedge
zerohedge@zerohedge·
"the share of tokens used for US models on OpenRouter has collapsed": Bloomberg
168
501
3.7K
556.2K
Chris Weaver
Chris Weaver@WonkaWeirdo·
@realfrugalmogul What did you expect for a $50 house call? I would honestly like to know what the appointment would look like. Pay a guy, pay for a truck, pay for parts and supplies to be stocked on said truck, insurance (like 4 kinds), and so much overhead. You fell for the trap and got upset.
2
0
2
8.7K
The Frugal Mogul 🏡
The Frugal Mogul 🏡@realfrugalmogul·
HVAC Guy: I’m here for the $50 HVAC tune up
Me: Sure, furnace is in the basement
* 10 minutes later *
HVAC Guy: Bad news. Something is rusted and cracked inside. That means CO2 is leaking into the house. I have to condemn your furnace, put a tag on it, and turn it off
Me: You will do none of those things.
HVAC Guy: I have to by code…
Me: You will not touch my furnace, let me show you the door 🚪
243
25
1.8K
1.1M
Noetic Co
Noetic Co@NoetekCo·
@jonathan_wilke if your usual prompt is "how do these pants fit me, claude" or "where's the closest nail bar", then you won't understand the efficiency hack that is CLI.
0
0
0
22
Jonathan Wilke
Jonathan Wilke@jonathan_wilke·
I don't get the hype around CLI coding tools like Claude Code. Human-computer interaction evolved past the terminal 30 years ago for a reason. UIs won. Why are we regressing?
1.5K
59
2.7K
705.5K
Noetic Co reposted
Euan MacDonald
Euan MacDonald@Euan_MacDonald·
French President Emmanuel Macron pulls off what could be the greatest diplomatic troll of all time by getting Trump to sign the "$300 Billion US Surrender to Iran" deal in... Versailles. The ignoramus Trump will have been clueless as to the historical significance of the location
2.1K
12K
86.5K
7.9M
muz
muz@ozjaus·
@initjean 400GB of RAM is like 5k just in RAM costs
1
0
0
808
Noetic Co
Noetic Co@NoetekCo·
@NOLABALLER @CardPurchaser I buy the card at the right price and can regrade later. now, i don't like the SGC slab. least favorite. fat, tiny text, ugly and no QR/bar code to streamline collection management. @natsturner
1
0
0
25
Who Dat Cards
Who Dat Cards@NOLABALLER·
Is SGC dead in value as a card grading company?? I’ve seen so many people struggle to sell SGC slabs, and buyers love a card but steer clear of buying it due to the SGC slab. Are you team SGC or staying far away from these slabs? @CardPurchaser
45
2
24
10.1K
Noetic Co
Noetic Co@NoetekCo·
@omgsidewalks scale + quantifiable ROI. teachers would be high value/status if they had a claim on their students' future productivity. bread molds, flowers wilt. no scale.
0
0
0
1.4K
‏ً
‏ً@omgsidewalks·
SERIOUS QUESTION: Why is it that actual human jobs like baker, florist, teacher, and childcare worker barely pay a livable wage, while fake jobs like AI specialist bootlicker, marketing campaign parasite, and synergy consultant are pulling six figures??
370
2.2K
20.9K
704.5K
Noetic Co
Noetic Co@NoetekCo·
@Gamingtronium 200 requests/second is not high volume. php bandwidth is all about caching and caching is not viable in a conversational near real-time paradigm.
0
0
0
97
Gamingtronium
Gamingtronium@Gamingtronium·
A 2009 PHP app on bare metal serves 12,000 requests/min on 384MB RAM. Meanwhile, our modern React/Node.js rewrite needs 4GB just to start. 15 years of "progress" and we're using 10x more memory for the same functionality. What happened to efficiency?
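The figures in this exchange can be checked with a quick unit conversion. The sketch below is only arithmetic on the numbers quoted in the post and the reply; it assumes "4GB" means 4 × 1024 MB, and the variable names are illustrative.

```python
# Figures quoted in the post; names are illustrative.
REQUESTS_PER_MINUTE = 12_000
PHP_RAM_MB = 384
NODE_RAM_MB = 4 * 1024  # assumption: 4GB taken as 4096 MB

# 12,000 requests/min expressed per second, the unit used in the reply.
requests_per_second = REQUESTS_PER_MINUTE / 60

# Memory footprint of the rewrite relative to the original app.
memory_ratio = NODE_RAM_MB / PHP_RAM_MB

print(f"{requests_per_second:.0f} requests/second")
print(f"{memory_ratio:.2f}x the memory")
```

So the 12,000 requests/min in the post and the 200 requests/second in the reply are the same rate, and the quoted "10x more memory" is slightly conservative under this reading.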
324
264
3.9K
183.6K
NOBUNAGA🇯🇵🏯_夏樹蒼依
NOBUNAGA🇯🇵🏯_夏樹蒼依@japan_nobunaga·
Be honest with me, Americans 🇺🇸 Do you actually own a gun? In Japan, I have never seen a real one. Not once. Not at a friend's house. Not in a drawer. Never. But online, every American just goes "oh, mine's in the nightstand" like it's a phone charger 🔌 So now I'm genuinely curious: What's YOUR gun? The very first one you ever got? Is this normal everywhere in the US? Or just some states?
11.1K
396
10.2K
1.1M
Geoff Wilson
Geoff Wilson@itsgeoffwilson·
Don’t do this.
51
6
222
76.3K
Noetic Co
Noetic Co@NoetekCo·
@johnarnold current frontier subscription plan token costs are subsidized about 90%.
0
0
0
962
John Arnold
John Arnold@johnarnold·
Most of the SpaceX neocloud analysis changes dramatically if you understand that there's a backwardated curve for compute today.
75
55
879
443.3K
The_Real_Fly
The_Real_Fly@The_Real_Fly·
Iranian media says the U.S. agreed to present reconstruction plans for Iran amounting to at least 300 billion dollars.
32
16
152
17.2K
Deer
Deer@AshTheDeerGuy·
@NoetekCo Fuck that noise
1
0
1
87
Deer
Deer@AshTheDeerGuy·
Fucking I'm sorry. I thought that if I accrued $7500 worth of firearms in my short life, that I could sell them for $5k in a fucking bind, but apparently my $2000 AUG is worth a gen 3 glock 19 and $200. Fuck you all man
146
15
1.5K
147.1K
Deer
Deer@AshTheDeerGuy·
@NoetekCo What the fuck is kac
1
0
1
739
Noetic Co
Noetic Co@NoetekCo·
@DA_Stockman Stakeholder capitalism didn't sell, so now it's AI infinite abundance and space as a jurisdiction = security moat vs. monkeywrenchers/adversaries, tax and regulatory gray zone to exploit in best of scenarios.
0
0
1
651
David Stockman
David Stockman@DA_Stockman·
Well, here's some math. Starlink is a profitable business with about $11 billion of sales and $3 billion of free cash flow. It might be worth $75 billion at a frisky multiple of 25X free cash flow. The balance (the space launch business and the AI/data centers in space fantasy) has $7 billion of sales and NEGATIVE $17 billion of free cash flow. So why is it worth anything, unless you are pricing a dream peddled by sell-side hucksters?! In short, after trading up to $2 trillion based on $75 billion of tangible Starlink value, where's the remaining $1.925 trillion of it? This isn't just the classical mania of the crowds. This is sui generis: mass insanity in a casino that has been given a lobotomy by three decades of money-printing madness at the Fed and its fellow-traveling central banks around the planet.
Jim Stewartson, Decelerationist 🇨🇦🇺🇦🇺🇸@jimstewartson

For $135 per share of SpaceX, you get 1/13,000,000,000th (One 13-BILLIONTH) of a company that in 2025 received $18,000,000,000 and lost $5,000,000,000 It’s allegedly worth $1,770,000,000,000 Do people not understand arithmetic anymore? Can they not count zeroes? Mass delusion.
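The arithmetic in the two posts above can be reproduced directly. The sketch below uses only the figures quoted in the posts; the variable names are illustrative, and the price-to-sales ratio is an extra derived figure that neither post states.

```python
BILLION = 1_000_000_000

# Figures from the first post (Starlink valuation at 25X free cash flow).
starlink_fcf = 3 * BILLION
fcf_multiple = 25
quoted_market_cap = 2_000 * BILLION

starlink_value = starlink_fcf * fcf_multiple
unexplained_value = quoted_market_cap - starlink_value

# Figures from the quoted post (share price times approximate share count).
share_price = 135
share_count = 13 * BILLION
implied_market_cap = share_price * share_count

# Derived here, not stated in either post: valuation relative to 2025 revenue.
alleged_valuation = 1_770 * BILLION
revenue_2025 = 18 * BILLION
price_to_sales = alleged_valuation / revenue_2025

print(f"Starlink at 25X FCF: ${starlink_value / BILLION:,.0f}B")
print(f"Unexplained value: ${unexplained_value / BILLION:,.0f}B")
print(f"Implied market cap: ${implied_market_cap / BILLION:,.0f}B")
print(f"Price-to-sales: {price_to_sales:.1f}x")
```

The $75 billion and $1.925 trillion figures check out as stated. Multiplying $135 by 13 billion shares gives $1.755 trillion, slightly under the quoted $1.77 trillion, which is consistent with the share count being a rounded figure.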

191
636
2.6K
446.5K
Noetic Co
Noetic Co@NoetekCo·
@calcarinus @quantinine @Kimi_Moonshot @basedjensen @TheAhmadOsman i mean, you can get into a DIY single rtx pro 6k setup for under $15k with room to add a second incrementally. If there's an observable boost going to DSv4 flash over 3.6 27b, it's at a huge hardware cost. I think it boils down to what to do local vs. api offload.
0
0
0
14
Kimi.ai
Kimi.ai@Kimi_Moonshot·
🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!
🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.
⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.
🔗 Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
644
1.8K
14K
2.5M