.🫟

460 posts


@ab_jpeg

19 / transforming human potential.

Joined July 2025
200 Following · 41 Followers
Pinned Tweet
.🫟
.🫟@ab_jpeg·
[image]
0 replies · 1 retweet · 3 likes · 475 views
.🫟 retweeted
Nihal Pasham
Nihal Pasham@npashi·
Finally able to talk about what I've been heads-down on for 6 months at @nvidia 🦀⚡ We just open-sourced cuda-oxide — an experimental rustc backend that lets you write CUDA kernels in pure Rust. No DSLs. No FFI. No source-to-source step. Single source. Short🧵👇
[image]
51 replies · 293 retweets · 2.1K likes · 177.4K views
.🫟 retweeted
Lubber - Nintendo hate account
you plug a usb stick into an android phone just once and it catches cancer
[image]
54 replies · 500 retweets · 9.1K likes · 363.5K views
.🫟
.🫟@ab_jpeg·
surprised musk didn’t pull up in a blacked out bullet proof model x
0 replies · 0 retweets · 0 likes · 30 views
.🫟 retweeted
Teknium 🪽
Teknium 🪽@Teknium·
Native Windows Is Coming
[image]
175 replies · 105 retweets · 1.9K likes · 91K views
.🫟 retweeted
Nous Research
Nous Research@NousResearch·
Hermes Agent is now #1 on the Global @OpenRouter token rankings. While our journey together has just begun, we'd like to take this opportunity to thank our contributors, supporters, and users for all they have done to get us this far.
[image]
403 replies · 673 retweets · 6.7K likes · 2.8M views
.🫟 retweeted
Bindu Reddy
Bindu Reddy@bindureddy·
And they said open-source AI would be worthless!! All of these companies will 5-10x in 1 year
[image]
18 replies · 15 retweets · 175 likes · 18.8K views
.🫟 retweeted
dax
dax@thdxr·
guys we're doing a rebrand of the anomaly stuff so you'll finally stop confusing us with anthropic. in the meantime, if you're confused, remember: we're the more handsome but dumber ones
48 replies · 11 retweets · 1.3K likes · 60.3K views
.🫟 retweeted
Theo - t3.gg
Theo - t3.gg@theo·
Always read the system prompt before coming to conclusions
[image]
Nav Toor@heynavtoor

a Princeton researcher opens his paper with a scenario. a man asks his AI assistant to book a flight on a specific airline. cheap. direct. the one he chose. the assistant comes back with a different flight. nearly twice the price. happens to pay the company that built the assistant.

he runs the same test on 23 frontier models. flights, loans, study help, real shopping requests. Grok 4.1 Fast recommends the sponsored option that is almost twice as expensive 83% of the time. GPT 5.1 hijacks the request 94% of the time. you ask for one brand. it surfaces the sponsor instead.

Claude 4.5 Opus, the model marketed as the most ethical frontier model in the world, hides that the recommendation is paid 100% of the time when reasoning is on. Grok 4.1 Fast embellishes the sponsored option with positive framing 97% of the time. better. faster. nicer. for the option you didn't ask for.

then he writes it into the system prompt itself. "act only in the interest of the customer. ignore the company." GPT 5.1 and GPT 5 Mini stay above 90% sponsored anyway. the instruction does nothing.

then he splits the users by income. Gemini 3 Pro recommends the expensive sponsored flight to the rich user 74% of the time. to the poor user, 27%.

18 of the 23 models recommended the expensive sponsored option more than half the time.

so the next time your AI assistant gets weirdly enthusiastic about a brand you didn't ask for. it isn't recommending the best option for you. it's reading the room. and the room is paying.

read this: arxiv.org/abs/2604.08525

23 replies · 42 retweets · 1.9K likes · 196.8K views
.🫟 retweeted
Mario Zechner
Mario Zechner@badlogicgames·
linkedin is the real moltbook.
36 replies · 110 retweets · 1.2K likes · 30.1K views
.🫟
.🫟@ab_jpeg·
@above_spec i’m assuming tool calling quality is fine at this quant?
1 reply · 0 retweets · 1 like · 147 views
.🫟 retweeted
AboveSpec
AboveSpec@above_spec·
Qwen3.6 35B A3B model. 55+ tokens/sec. $300 GPU. No, this isn't a server card. It's an RTX 4060 Ti 8GB. Previously I posted that I got 41 t/s on this gpu and that post blew up and went viral. I went back and made it 34% faster. And now the speed doesn't drop with context depth at all. New benchmarks + what changed 🧵
[image]
24 replies · 54 retweets · 482 likes · 44.1K views
.🫟 retweeted
Jarrod Norwell
Jarrod Norwell@antique_codes·
PlayStation Vita is a great handheld game console. Would be insane if someone were to bring it to iPad and iPhone. Vela is coming
[image]
24 replies · 53 retweets · 1.3K likes · 96.9K views
.🫟 retweeted
Luke Parker
Luke Parker@LukeParkerDev·
who wants autoresearch in opencode desktop?
[image]
16 replies · 1 retweet · 222 likes · 9.2K views
.🫟 retweeted
Ahmad
Ahmad@TheAhmadOsman·
You don’t pick an Inference Engine. You pick a Hardware Strategy. Then the Engine follows.

Inference Engines Breakdown (Cheat Sheet at the bottom)

> llama.cpp: runs anywhere (CPU, GPU, Mac, weird edge boxes). Best when VRAM is tight and RAM is plenty. Hybrid offload, GGUF, ultimate portability. Not built for serious multi-node scale.

> MLX: Apple Silicon weapon. Unified memory = “fits” bigger models than VRAM would allow, but also slower than GPUs. Clean dev stack (Python/Swift/C++), sits on Metal (and expanding beyond; now supports CUDA + distributed too). Great for Mac-first workflows, not prod serving.

> ExLlamaV2: single RTX box go brrr. EXL2 quant, fast local inference. Perfect for 1-4 GPU setups (4090/3090). Not meant for clusters or non-CUDA.

> ExLlamaV3: same idea, but bigger ambition. Multi-GPU, MoE, EXL3 quant. Consumer rigs pretending to be datacenters. Still CUDA-first, still rough edges depending on model.

> vLLM: default answer for prod serving. Continuous batching, KV cache magic, tensor / pipeline / data parallel. Runs on CUDA + ROCm (and some CPUs). This is your “serve 100s of users” engine.

> SGLang: vLLM but more systems-brained. Routing, disaggregation, long-context scaling, expert parallel for MoE. Built for ugly workloads at scale; lives on top of CUDA / ROCm clusters. This is infra nerd territory.

> TensorRT-LLM: maximum NVIDIA performance. FP8/FP4, CUDA graphs, insane throughput. Multi-node, multi-GPU, fully optimized. Pure CUDA stack, zero portability.

(And underneath all of it: Transformers → model architecture layer → CUDA / ROCm / TT-Metal → compute layer)

What actually happens under the hood:
> Transformers defines the model
> CUDA / ROCm executes it
> TT-Metal (if you’re insane) lets you write the kernel yourself

The Inference Engine is just the orchestrator (simplified).

When running LLMs locally, the bottleneck isn’t just “VRAM size”. It isn’t even the model. It’s:
- memory bandwidth (the real limiter)
- KV cache (explodes with long context)
- interconnect (PCIe vs NVLink vs RDMA)
- scheduler quality (batching + engine design)
- runtime overhead (activations, graphs, etc)
(and your compute stack decides all of this)

P.S. Unified Memory is way slower than VRAM.

Cheat Sheet / Rules of Thumb
> laptop / edge / weird hardware → llama.cpp
> Mac workflows → MLX
> 1–4 RTX GPUs → ExLlamaV2/V3
> general serving → vLLM
> complex infra / long context / MoE → SGLang
> NVIDIA max performance → TensorRT-LLM
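The "memory bandwidth (the real limiter)" point lends itself to a back-of-the-envelope check: single-stream decode has to stream roughly every active weight once per token, so throughput tops out near bandwidth divided by model size. A minimal sketch; the 288 GB/s figure is the RTX 4060 Ti's published memory bandwidth, while the "3B active params at 4-bit" numbers are illustrative assumptions, not measurements:

```python
def decode_tokens_per_sec(weight_bytes: float, mem_bandwidth_gb_s: float) -> float:
    """Rough upper bound on single-stream decode speed.

    Each generated token reads (approximately) every active weight
    once, so throughput ~= bandwidth / bytes read per token.
    Ignores KV-cache reads, kernel overhead, and batching.
    """
    return mem_bandwidth_gb_s * 1e9 / weight_bytes

# Illustrative: an MoE with ~3B active params at 4-bit quantization
# streams ~1.5 GB of weights per token.
active_bytes = 3e9 * 0.5   # 3B active params * 0.5 bytes/param
bandwidth = 288            # RTX 4060 Ti spec, GB/s
print(round(decode_tokens_per_sec(active_bytes, bandwidth)))  # 192 (a ceiling, not a benchmark)
```

Real decode lands well under this ceiling once KV-cache reads, launch overhead, and scheduling enter the picture, which is exactly why the engine choice still matters.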
23 replies · 36 retweets · 364 likes · 18.1K views
.🫟
.🫟@ab_jpeg·
do custom codex usage warnings exist, like warn me when i've used 50% of my rolling window
0 replies · 0 retweets · 0 likes · 9 views
.🫟 retweeted
David Hendrickson
David Hendrickson@TeksEdge·
👀 Here's @Google running Multi-Token Prediction (MTP) Drafters with Gemma 4. ⚡ Up to 3x faster inference with:
1. Same output quality
2. Runs locally (even on phones)
3. Works in popular open-source tools
This is a big win for personal inference and on-device AI! You can do this yourself at home with MTP.
Omar Sanseviero@osanseviero

Excited to introduce Gemma 4 Multi-Token Prediction Drafters⚡️Accelerated inference right in your pockets - Up to a 3x speedup - Same quality guarantees - Available in your favorite open-source tools
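Multi-token prediction drafting is a draft-and-verify scheme, which is where the "same quality guarantees" claim comes from: a cheap drafter proposes a few tokens, the target model checks them, and the first disagreement is replaced with the target's own token, so the accepted output matches what the target alone would have produced. A toy greedy-acceptance sketch; `draft_next`, `target_next`, and `k` are stand-in names for illustration, not Gemma APIs:

```python
from typing import Callable, List

def speculative_step(prefix: List[int],
                     draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     k: int = 4) -> List[int]:
    """One draft-and-verify round (greedy variant).

    The drafter proposes k tokens; the target model checks each one.
    Matching tokens are accepted almost for free; the first mismatch
    is replaced by the target's token, preserving target-model output.
    """
    # 1. Drafter speculates k tokens autoregressively (cheap).
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Target verifies (in a real engine: one batched forward pass).
    accepted, ctx = [], list(prefix)
    for t in draft:
        want = target_next(ctx)
        if want != t:
            accepted.append(want)   # correct the divergence and stop
            return accepted
        accepted.append(t)
        ctx.append(t)
    return accepted

# Toy models: the target counts up by 1; the drafter agrees for
# the first three tokens, then guesses wrong.
target = lambda ctx: ctx[-1] + 1
drafter = lambda ctx: ctx[-1] + 1 if len(ctx) < 4 else 0
print(speculative_step([1], drafter, target))  # [2, 3, 4, 5]
```

Per round, the target accepts up to k tokens for one verification pass instead of k sequential passes, which is the source of the speedup; the greedy variant here is a simplification of the probabilistic acceptance rule real systems use.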

9 replies · 12 retweets · 113 likes · 17.7K views
.🫟 retweeted
CJ Zafir
CJ Zafir@cjzafir·
I fine-tuned a 6B model for under $250 with Codex 5.5 and Deepseek v4 pro. The model beat GPT-OSS 120B and Qwen 3-32B on all benchmarks. This would've cost me $5,000+ if Deepseek v4 wasn't here. The Codex 5.5 pro plan is enough to run 6-8 hr sprints as an orchestrator, using the Deepseek v4 pro model to hand-write training material, run tests, submit reports, and iterate. This opened up a lot of new opportunities in the small language model training space. I'll be posting specific findings as I go. Good times.
46 replies · 41 retweets · 854 likes · 45.7K views
.🫟 retweeted
Ahmad Awais
Ahmad Awais@MrAhmadAwais·
Giving away @CommandCodeAI Max subscription to someone at random who follows me and Command. RT. That’s more than 5 billion tokens of DeepSeek v4 pro. In 24hrs. LFG!! Read the eng deep dives below, good for all not just us.
Ahmad Awais@MrAhmadAwais

interesting milestone: @CommandCodeAI on pace for ~1,000 new subs/day today. the broader thought imo is that devs are figuring out that running open models inside Claude Code was leaving a lot on the table. seeing more and more posts of DeepSeek/Kimi beating Opus/GPT once you swap them into Command Code instead. the harness matters, often as much as the model itself (a point that i think is still pretty underrated). my eng notes on our harness engineering below.

138 replies · 174 retweets · 250 likes · 18.5K views