Noetic Co

793 posts

@NoetekCo

Certified Financial Asshole /1

Joined March 2025

62 Following 35 Followers

Noetic Co@NoetekCo·6m

@xtremesecurity @keennay @KVCache_AI can try vast.ai

Tony Carter@xtremesecurity·2h

@keennay @KVCache_AI Where do you rent from?

Yannick Nick@keennay·5h

- DeepSeek V4 Flash
- Native Precision (FP4 + FP8)
- Fits on 2x RTX Pro 6000 GPUs + 256 GB DDR5 RAM
- Using KTransformers: KVCache-AI fork of SGLang for GPU/CPU memory inference

I'm somewhat obsessed with running applications on resource-constrained systems and squeezing out the maximum possible performance. Part of that comes from a past life as a systems engineer, building & upgrading nationwide (USA) Video-On-Demand streaming backends while navigating headless *nix servers around the time "cloud" was becoming a buzzword.

KTransformers gets less mention across the LLM inference-sphere despite being among the engines listed for many of the popular models on HuggingFace (alongside vLLM, SGLang, & llama.cpp). The KVCache-AI team is best known for providing a forked SGLang for hybrid GPU / CPU memory inference, which benefits MoE models.

I expect these hybrid setups to gain in popularity, especially on the consumer side as hardware prices continue soaring. "Necessity is the mother of invention," as they say, and local AI runners will keep finding more creative ways to run intelligence, whether that involves GPU/CPU memory offload, distributed training / inference, model weight / KV Cache quants, or REAPs.

Here I have DeepSeek V4 Flash running at a 1M context length on 2x RTX Pro 6000 GPUs, using its native mixed precision of FP4 + FP8. KTransformers lets you reduce GPU memory use by keeping only a set number of experts per MoE layer in GPU VRAM, with the remaining experts balanced across system RAM. KTransformers can also update GPU expert placement during inference, based on routing statistics collected during the prefill phase. There's a lot of trial and error involved, given the limited kernel support for RTX Pro 6000s.

Two of the prompt load stress-test benchmarks I like to run are from the local-inference-lab/llm-inference-bench GitHub repo & the AlienKevin/SWE-ZERO-12M-trajectories HuggingFace dataset.

Here are the main KTransformers SGLang optimized flags:
- Context Length: 1048576
- Total Number of Tokens: 1048576
- Chunked Prefill Size: 16384
- Max Prefill Tokens: 16384
- GPU Prefill Token Threshold: 1024
- GPU Memory Utilization: 87%
- Number of Experts per MoE Layer on GPU: 134 / 256
- Max Running Requests: 256
- CUDA Graph Max Batch Size: 256
- CUDA Graph Batch Sizes: 1 2 4 8 16 32 64 128 256
- Available GPU Memory: 20.81GB (anything less was too tight for agentic coding)

Below are the AlienKevin/SWE-ZERO-12M-trajectories benchmark results for 100 prompts at a concurrency of 10, with ~8k input tokens & ~1k output tokens. Both Radix & Chunked Prefix Cache were disabled for the absolute worst-case scenario:
- Prefill Mean Batch Tokens: 35756.93 tok/sec
- Prefill Median Batch Tokens: 652.90 tok/sec
- TTFT Mean: 20.698s
- TTFT Median: 12.714s
- Decode Mean Batch Output Tokens: 27.39 tok/sec
- Decode Median Batch Output Tokens: 20.63 tok/sec
- Utilized CPU memory: ~200 GB

A more detailed write-up will follow, which'll include the methodology for calculating the number of experts per MoE layer on GPU, the maximum number of tokens, and the GPU memory utilization that give a healthy balance for running tool calls & benchmarks in this hybrid setup. Hopefully this'll be reproducible for you and on alternative GPUs, as well as with current & future models. Let me know how it works for you!

My future plans involve GPU/CPU memory inference tests for MiniMax M3, GLM-5.2, and Kimi K2.7-Code.

Links to all of the resources for getting DeepSeek V4 Flash running at native mixed precision on 2x RTX Pro 6000 GPUs + 256 GB RAM can be found in the follow-up post.
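A few of the numbers in the settings list above can be sanity-checked with simple arithmetic. The Python sketch below is only an illustration: the dictionary keys and helper names are made up for this example and are not real KTransformers or SGLang flag names, and the chunk count assumes a prompt is prefilled in fixed-size chunks of the stated chunked prefill size.

```python
import math

# Values copied from the post. The key names are illustrative only;
# they are NOT the actual KTransformers / SGLang command-line flags.
settings = {
    "context_length": 1_048_576,
    "chunked_prefill_size": 16_384,
    "gpu_experts_per_layer": 134,
    "total_experts_per_layer": 256,
    "cuda_graph_max_batch_size": 256,
}


def gpu_expert_fraction(s: dict) -> float:
    """Share of each MoE layer's experts that is kept in GPU VRAM."""
    return s["gpu_experts_per_layer"] / s["total_experts_per_layer"]


def prefill_chunks(prompt_tokens: int, chunk_size: int) -> int:
    """Number of fixed-size chunks needed to prefill a prompt (assumed model)."""
    return math.ceil(prompt_tokens / chunk_size)


def power_of_two_batch_sizes(max_batch_size: int) -> list[int]:
    """Batch sizes 1, 2, 4, ... up to and including max_batch_size."""
    sizes = []
    bs = 1
    while bs <= max_batch_size:
        sizes.append(bs)
        bs *= 2
    return sizes


if __name__ == "__main__":
    print(f"Experts on GPU per layer: {gpu_expert_fraction(settings):.1%}")
    print("Chunks for an ~8k prompt:",
          prefill_chunks(8_000, settings["chunked_prefill_size"]))
    print("Chunks for a full 1M context:",
          prefill_chunks(settings["context_length"],
                         settings["chunked_prefill_size"]))
    print("CUDA graph batch sizes:",
          power_of_two_batch_sizes(settings["cuda_graph_max_batch_size"]))
```

Under these assumptions, roughly half of each layer's experts sit on the GPU, an ~8k-token benchmark prompt fits in a single prefill chunk, and the listed CUDA graph batch sizes are simply the powers of two up to the maximum batch size.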

428

164.4K

Noetic Co@NoetekCo·22h

@anton_onAI @zerohedge moonshot.ai api is ass, for one. you can get a much better kimi instance on openrouter

623

Anton Kuratnik | AI Nerd@anton_onAI·22h

@zerohedge I mean why would you run openai or anthropic on openrouter anyway? Though to be fair why would you use openrouter in general lol. Best way to get the worst out of any model.

7.4K

zerohedge@zerohedge·1d

"the share of tokens used for US models on OpenRouter has collapsed": Bloomberg

168

501

3.7K

556.2K

Noetic Co@NoetekCo·6d

@WonkaWeirdo @realfrugalmogul Everyone is driving a $100k truck now. All of them.

Chris Weaver@WonkaWeirdo·6d

@realfrugalmogul What did you expect for a $50 house call? I would honestly like to know what the appointment would look like. Pay a guy, pay for a truck, pay for parts and supplies to be stocked on said truck, insurance, like 4 kinds, and so much overhead. You fell for the trap and got upset.

8.7K

The Frugal Mogul 🏡@realfrugalmogul·Jun 19

HVAC Guy: I’m here for the $50 HVAC tune up Me: Sure, furnace is in the basement * 10 minutes later * HVAC Guy: Bad news. Something is rusted and cracked inside. That means CO2 is leaking into the house. I have to condemn your furnace, put a tag on it, and turn it off Me: You will do none of those things. HVAC Guy: I have to by code… Me: You will not touch my furnace, let me show you the door 🚪

243

1.8K

1.1M

Noetic Co@NoetekCo·Jun 18

@jonathan_wilke if your usual prompt is "how do these pants fit me, claude" or "where's the closest nail bar", then you won't understand the efficiency hack that is CLI.

Jonathan Wilke@jonathan_wilke·Jun 18

I don't get the hype around CLI coding tools like Claude Code. Human-computer interaction evolved past the terminal 30 years ago for a reason. UIs won. Why are we regressing?

1.5K

2.7K

705.5K

Noetic Co reposted

Euan MacDonald@Euan_MacDonald·Jun 18

French President Emmanuel Macron pulls off what could be the greatest diplomatic troll of all time by getting Trump to sign the "$300 Billion US Surrender to Iran" deal in... Versailles. The ignoramus Trump will have been clueless as to the historical significance of the location

2.1K

12K

86.5K

7.9M

Noetic Co@NoetekCo·Jun 17

@ozjaus @initjean the good stuff is $4k and up/128gb now

muz@ozjaus·Jun 17

@initjean 400GB of RAM is like 5k just in RAM costs

808

Jean P.D. Meijer ― 🇪🇺 eu/acc@initjean·Jun 17

stay with me: if i need 400GB of (V)RAM it’s not a “local model” no normal person has this

328

163

6.5K

344.7K

Noetic Co@NoetekCo·Jun 17

@initjean low gpu status

Noetic Co@NoetekCo·Jun 15

@NOLABALLER @CardPurchaser I buy the card at the right price and can regrade later. now, i don't like the SGC slab. least favorite. fat, tiny text, ugly and no QR/bar code to streamline collection management. @natsturner

Who Dat Cards@NOLABALLER·Jun 14

Is SGC dead in value as a grading card company?? I’ve seen so many people struggle to sell SGC slabs and buyers love a card but steer clear of buying it due to SGC slab. Are you team SGC or staying far away from these slabs? @CardPurchaser

10.1K

Noetic Co@NoetekCo·Jun 15

@omgsidewalks scale + quantifiable ROI. teachers would be high value/status if they had a claim on their students' future productivity. bread molds, flowers wilt. no scale.

1.4K

‏ً@omgsidewalks·Jun 15

SERIOUS QUESTION: Why is it that actual human jobs like baker, florist, teacher, and childcare worker barely pay a livable wage, while fake jobs like AI specialist bootlicker, marketing campaign parasite, and synergy consultant are pulling six figures ??

370

2.2K

20.9K

704.5K

Noetic Co@NoetekCo·Jun 15

@Gamingtronium 200 requests/second is not high volume. php bandwidth is all about caching and caching is not viable in a conversational near real-time paradigm.

Gamingtronium@Gamingtronium·Jun 15

A 2009 PHP app on bare metal serves 12,000 requests/min on 384MB RAM. Meanwhile, our modern React/Node.js rewrite needs 4GB just to start. 15 years of "progress" and we're using 10x more memory for the same functionality. What happened to efficiency?

324

264

3.9K

183.6K

Noetic Co@NoetekCo·Jun 15

@japan_nobunaga Show me your katana first.

NOBUNAGA🇯🇵🏯_夏樹蒼依@japan_nobunaga·Jun 15

Be honest with me, Americans 🇺🇸 Do you actually own a gun? In Japan, I have never seen a real one. Not once. Not at a friend's house. Not in a drawer. Never. But online, every American just goes "oh, mine's in the nightstand" like it's a phone charger 🔌 So now I'm genuinely curious: What's YOUR gun? The very first one you ever got? Is this normal everywhere in the US? Or just some states?

11.1K

396

10.2K

1.1M

Noetic Co@NoetekCo·Jun 15

@itsgeoffwilson zero chance he has a long career with that chassis.

Geoff Wilson@itsgeoffwilson·Jun 15

Don’t do this.

222

76.3K

Noetic Co@NoetekCo·Jun 15

@johnarnold current frontier subscription plan token costs are subsidized about 90%.

962

John Arnold@johnarnold·Jun 15

Most of the SpaceX neocloud analysis changes dramatically if you understand that there's a backwardated curve for compute today.

879

443.3K

Noetic Co@NoetekCo·Jun 15

@The_Real_Fly 10% for the big guy; don't forget!

248

The_Real_Fly@The_Real_Fly·Jun 15

Iranian media says the U.S. agreed to present reconstruction plans for Iran amounting to at least 300 billion dollars.

152

17.2K

Noetic Co@NoetekCo·Jun 14

@AshTheDeerGuy

Deer@AshTheDeerGuy·Jun 14

@NoetekCo Fuck that noise

Deer@AshTheDeerGuy·Jun 14

Fucking im sorry. I thought that if I accrued $7500 worth of firearms in my short life, that I could sell them for $5k in a fucking bind but apparently my $2000 Aug is worth a gen 3 glock 19 and $200 Fuck you all man

146

1.5K

147.1K

Noetic Co@NoetekCo·Jun 14

@AshTheDeerGuy knightarmco.com

Deer@AshTheDeerGuy·Jun 14

@NoetekCo What the fuck is kac

739

Noetic Co@NoetekCo·Jun 13

@DA_Stockman Stakeholder capitalism didn't sell, so now it's AI infinite abundance and space as a jurisdiction = security moat vs. monkeywrenchers/adversaries, tax and regulatory gray zone to exploit in best of scenarios.

651

David Stockman@DA_Stockman·Jun 12

Well, here's some math. Starlink is a profitable business with about $11 billion of sales and $3 billion of free cash flow. It might be worth $75 billion at a frisky multiple of 25X free cash flow. The balance----the space launch business and the AI/data centers in space fantasy----has $7 billion of sales and NEGATIVE -$17 billion of free cash flow. So why is it worth anything, unless you are pricing a dream peddled by sell-side hucksters?! In short, after trading up to $2 trillion based on $75 billion of tangible Starlink value, where's the remaining $1.925 trillion of it? This isn't just the classical mania of the crowds. This is sui generis--- mass insanity in a casino that has been given a lobotomy by three decades of money-printing madness at the Fed and its fellow-traveling central banks around the planet.

Jim Stewartson, Decelerationist 🇨🇦🇺🇦🇺🇸@jimstewartson

For $135 per share of SpaceX, you get 1/13,000,000,000th (One 13-BILLIONTH) of a company that in 2025 received $18,000,000,000 and lost $5,000,000,000 It’s allegedly worth $1,770,000,000,000 Do people not understand arithmetic anymore? Can they not count zeroes? Mass delusion.

191

636

2.6K

446.5K

Noetic Co@NoetekCo·Jun 13

@calcarinus @quantinine @Kimi_Moonshot @basedjensen @TheAhmadOsman i mean, you can get into a DIY single rtx pro 6k setup for under $15k with room to add a second incrementally. If there's an observable boost going to DSv4 flash over 3.6 27b, it's at a huge hardware cost. I think it boils down to what to do local vs. api offload.

Dawid Rutkowski@calcarinus·Jun 13

@NoetekCo @quantinine @Kimi_Moonshot @basedjensen @TheAhmadOsman Would you favour qwen 27b fp16 over deepseek v4 flash? (Assuming you have two R6KPro)?

Kimi.ai@Kimi_Moonshot·Jun 12

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6. 🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates. ⚡️ 6x High-Speed Mode coming soon! 🔌 Available today via Kimi API and Kimi Code. 🔗 Kimi Code: kimi.com/code 🔗 API: platform.moonshot.ai

644

1.8K

14K

2.5M

Noetic Co@NoetekCo·Jun 13

@calcarinus @quantinine @Kimi_Moonshot @basedjensen @TheAhmadOsman lushbinary.com/blog/deepseek-…
