left pocket cheesecake

936 posts

@TraeMurray24

kubernetes enjoyer

Joined December 2012

773 Following 238 Followers

left pocket cheesecake@TraeMurray24·15 May

we're going to need a bigger boat

left pocket cheesecake@TraeMurray24·6 May

@ItsmeAjayKV @ggerganov Google released MTP for Gemma4 today but this post specifically mentions and links to a separate draft model - x.com/googlegemma/st… Am I misunderstanding something or are they just calling their standard speculative decoding MTP?

Google Gemma@googlegemma

x.com/i/article/2049…

176

AJ@ItsmeAjayKV·4 May

The one big thing I'm waiting for in llama.cpp (shout out to @ggerganov) right now is MTP. With Qwen 3.6 (my fav model) already supporting it, we are going to see massive improvements in generation speed once it's fully merged.

So, what exactly is MTP? It stands for Multi-Token Prediction. If you understand speculative decoding, this is the next level. Instead of relying on a smaller, separate draft model, MTP is built directly into the model during its initial training. The main model simply produces draft tokens on its own, using auxiliary heads that allow it to naturally output multiple future tokens simultaneously. It's leaner, faster, and incredibly efficient for local hardware.

How is it different from other methods? Let's go over them in brief.

1. Standard speculative decoding (draft models)
You load two models into memory: the big target model (e.g. 35B) and a tiny, fast draft model (< 2B) from the same family. The small draft model runs ahead, generating 4 or 5 tokens sequentially. The massive target model then does a single forward pass to check the draft's math.
Pros: Consistent speedups across workloads.
Cons: Eats more VRAM. In my 3060 case, where I try to squeeze a heavy model into 12GB of VRAM, sacrificing a GB or two just to host a draft model can be a painful trade-off.

2. n-gram speculative decoding (prompt lookup)
This one is interesting and a different idea from a draft model: n-gram decoding simply looks at the text already in the "prompt" and guesses that it will be repeated (which is also its biggest issue). Good for coding, JSON formatting, or even RAG.
Pros: Zero VRAM overhead. Nothing extra to load. Good speedup for the tasks mentioned above.
Cons: Very situational. For creative writing it fails miserably and offers almost no speedup.

3. DFlash (block diffusion drafting)
DFlash replaces the traditional autoregressive draft model with a lightweight block diffusion model. Instead of guessing tokens sequentially, DFlash generates an entire block of tokens in parallel in a single forward pass. It achieves this by pulling hidden state features directly from the target model and using them as context to denoise a block of next tokens immediately.
Pros: Super fast. By removing the sequential bottleneck of the drafting phase, this can achieve high lossless acceleration.
Cons: Nothing much actually, though it does require specialized checkpoints trained specifically to align with the target model.

Also take a look at LuceBox-hub D-Flash and P-flash by @davideciffa
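
To make the draft-then-verify idea from the post above concrete, here is a minimal Python sketch of n-gram (prompt lookup) drafting combined with greedy verification. It is an illustration under stated assumptions, not llama.cpp's implementation: `ngram_draft`, `verify_greedy`, `generate`, `greedy`, and `toy_target` are hypothetical names, tokens are plain integers, and the "target model" is a toy function. A real engine would score all draft positions in one batched forward pass; the sketch calls the target once per position for readability.

```python
from typing import Callable, List, Sequence

NextToken = Callable[[Sequence[int]], int]  # hypothetical "target model" interface


def ngram_draft(tokens: Sequence[int], ngram: int = 2, max_draft: int = 4) -> List[int]:
    """Prompt lookup: find the most recent earlier occurrence of the last
    `ngram` tokens and propose whatever followed it as the draft."""
    if len(tokens) < ngram:
        return []
    tail = list(tokens[-ngram:])
    # Scan right to left over earlier positions (the tail itself is excluded).
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if list(tokens[start:start + ngram]) == tail:
            return list(tokens[start + ngram:start + ngram + max_draft])
    return []


def verify_greedy(tokens: Sequence[int], draft: Sequence[int], target_next: NextToken) -> List[int]:
    """Keep the longest draft prefix that matches the target's greedy choice,
    then append one token chosen by the target itself."""
    ctx = list(tokens)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # every draft token matched: one extra token
    return accepted


def generate(prompt: Sequence[int], target_next: NextToken, n_new: int) -> List[int]:
    """Speculative loop: draft from the context, verify, repeat."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < n_new:
        draft = ngram_draft(tokens)
        tokens.extend(verify_greedy(tokens, draft, target_next))
    return tokens[:len(prompt) + n_new]


def greedy(prompt: Sequence[int], target_next: NextToken, n_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding, used as the reference output."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def toy_target(ctx: Sequence[int]) -> int:
    """Toy stand-in for a model: counts 0..4 in a loop (highly repetitive text)."""
    return (ctx[-1] + 1) % 5


if __name__ == "__main__":
    prompt = [0, 1, 2, 3, 4, 0, 1]
    print(generate(prompt, toy_target, 12))
    print(greedy(prompt, toy_target, 12))
```

Both printed lists are identical, which is the "lossless" property: verification only ever keeps tokens the target would have produced anyway. The speedup comes from how many draft tokens get accepted per verification step, which is why repetitive inputs (code, JSON, RAG) help and creative writing does not.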

8.7K

left pocket cheesecake@TraeMurray24·6 May

@googlegemma How does this compare to DFlash?

885

Google Gemma@googlegemma·5 May

Gemma 4 just got even faster! We're releasing Multi-Token Prediction (MTP) drafters that deliver up to a 3x speedup, without any degradation in output quality or reasoning logic.

356

3.4K

205.1K

left pocket cheesecake@TraeMurray24·1 May

@evanjconrad Are yall looking to build more datacenters in the near future?

273

evan conrad@evanjconrad·30 Apr

San Francisco Compute is growing rapidly & we're hiring across the board for our systems engineering, data center development, & (product & brand) design teams. We're the local supercomputing company. We sell people GPU clusters on contracts they can sublease. SFC's goal is to reduce the financial risk of one of the largest infrastructure build-outs in history. To do that, we vertically integrated. That means we build data centers, the clusters in the data center, and a cloud platform that we built on top of the most order book of its kind. This lets you do cool stuff like "buy a 1 month contract 3 months out, but only if I can get it colocated, and only if the price is at a 25% discount to the current market price." You can walk in the door on Friday, buy a 3-year contract, and then walk out the door on Monday by selling the whole thing. In other words, we build the cloud for people who care about margins & their risk exposure. We did that because SFC was originally "Junelark" (a teeny tiny 2-person AI lab), which bought too big of a GPU cluster & was forced to sublease it. The first year of the company was tremendously stressful because if we didn't sell the cluster, we'd go bankrupt. This forced us to become a very rough accidental cloud. We'd operate on top of other clouds, but ran out of folks who would give us access to key parts of the cluster (like BMC, UFM, & switch access) needed to offer a viable experience. To build something great, we vertically integrated down and down until we hit the dirt. These days, I like to operate the company somewhat quietly. Our website's a single page (we may change this). We don't show up on VC market maps and we're not in the news much. I hope this doesn't deter you; SFC operates at very large scale & has been growing at an incredible pace. We're just very focused on standing up clusters & shipping features that help our customers. 
Our team includes industry veterans, like the cofounder of Voltage Park, key folks from Tesla, Meta, Lambda, Redhat, Hut8, Canonical, & Sun. We'd love for you to join us! My DMs are open, or you can reach me at evan at sf compute dot com.

300

43.3K

left pocket cheesecake@TraeMurray24·10 Apr

@menhguin What memory stocks are in? Been looking at those but they've pumped so much in last year I'm not sure how much juice is left

130

Minh Nhat Nguyen@menhguin·10 Apr

new positions in q2: seed investment + small 2-3% allocation in solana and worldcoin (agentic payments) current positions, might rebalance: 40% memory and AIXA and intel calls 15% zai and minimax 15% PLTR puts 10% nuclear (oklo) 10% aerospace (fly and rklb) 10% optics/photonics

3.5K

Minh Nhat Nguyen@menhguin·9 Apr

fyi, nowadays im busy so i just have openclaw automations+deep research tracking @zephyr_z9 and @aleabitoreddit for new positions. up about ~60% YTD mostly from existing positions: memory stocks, intel calls, palantir puts, zai and minimax shares all of which are up ~100-200%.

Minh Nhat Nguyen@menhguin

Leopold's having fun, so here's my AI Safety twink portfolio. Total 1-year return: +892%. Criteria: Product advances human civilisation + good team. 50% Oklo (+1700%) 45% Tesla (+87%) 5% Nvidia (+55%)

582

83.1K

left pocket cheesecake@TraeMurray24·30 Mar

@heyimandy This is awesome. Where can I find the design for this?

Andy Wood@heyimandy·30 Mar

Quick weekend 3D print: 4 of these cups to hold my daughter's art supplies on her art easel. I think every household should have a basic 3D printer.

12.3K

left pocket cheesecake@TraeMurray24·20 Mar

@jxmnop What are some good resources to learn writing kernels?

7.1K

Jack Morris@jxmnop·20 Mar

Learning to write kernels might be the highest-ROI activity for displaced SWEs: → prereq: reasonable engineering ability → six to twelve months of study → millions of dollars, mark zuckerberg showing up at your house to hire you, etc. i wish this were an exaggeration

1.9K

124.1K

left pocket cheesecake@TraeMurray24·13 Mar

@sloppenheimer You running these locally?

gerred@sloppenheimer·13 Mar

models I am still using for research: gpt-oss series dsv3 series and new, nemotron super entering the canon

Neil Chowdhury@ChowdhuryNeil

i disagree. gpt-oss-120b is *the* model i use most frequently for my research. it is ridiculously good for how cheap it is (5b active). it gets hate for being worse than larger chinese models, but it is one of my favorites -- i really hope that openai releases future oss models

555

left pocket cheesecake@TraeMurray24·18 Nov

@RobinhoodApp Bet

Robinhood@RobinhoodApp·18 Nov

You deserve a treat. Comment below and we may send you some merch.

14.7K

408

left pocket cheesecake@TraeMurray24·18 Nov

Incredible email received today

113

left pocket cheesecake reposted

cowboy@nextokens·6 Nov

Here Comes Another Bubble (AI Edition)

113

487

3.7K

425.7K

left pocket cheesecake@TraeMurray24·27 Oct

Saw a dad at the airport repping a delta t shirt like it's game day

left pocket cheesecake@TraeMurray24·22 Oct

@Accenture69 What about solar?

Taco MacArthur@Accenture69·22 Oct

Hey man, I know you’re trying to do your job, but I didn’t come to Costco to sign up for AT&T internet

left pocket cheesecake reposted

BuccoCapital Bloke@buccocapital·14 Oct

Just a quick detour to build the gooner bot but we promise to cure cancer next

*Walter Bloomberg@DeItaone

SAM ALTMAN SAYS OPENAI WILL ALLOW EROTICA FOR ADULT USERS - AXIOS

412

5.1K

322.4K

left pocket cheesecake@TraeMurray24·10 Oct

@NVIDIAGeForce GeForce Day

NVIDIA GeForce@NVIDIAGeForce·10 Oct

🟢 GEFORCE DAY IS BACK 🟢 To celebrate, we're giving away TWO GeForce RTX 5080 Founders Edition GPUs, signed by NVIDIA CEO Jensen Huang. Want one? Comment "GeForce Day" for a chance to WIN & stay tuned for more!

57.7K

3.5K

47.1K

5.9M

left pocket cheesecake@TraeMurray24·24 Sep

A rollover probably feels good asf to a 401k

left pocket cheesecake@TraeMurray24·15 Sep

Getting ready to board a cross country flight with a toddler

left pocket cheesecake@TraeMurray24·12 Sep

@tekbog Me at last job deploying nvidia gpu operator onto eks nodes with preinstalled drivers

terminally onλine εngineer@tekbog·12 Sep

MLOps sounds cool until you discover you just deal with cuda versions, python package versions and supported gpus

158

4.2K

left pocket cheesecake@TraeMurray24·6 Sep

@tekbog relearned networking for the 67th consecutive week

terminally onλine εngineer@tekbog·6 Sep

what did you get done this week?

7.4K

left pocket cheesecake@TraeMurray24·5 Sep

You guys catch the game last night?

Evan@StockMKTNewz

A bunch of tech CEOs are currently having dinner with 🇺🇸 President Trump including Meta CEO Mark Zuckerberg Microsoft CEO Satya Nadella Google CEO Sundar Pichai Apple CEO Tim Cook AMD CEO Lisa Su OpenAI CEO Sam Altman

229
