Will Finger

890 posts

@willfi

Product Designer, Web3 On-chain, AI engineer. Married. Father. Learning love with Jesus.

Portugal · Joined June 2009

430 Following · 177 Followers

Pinned Tweet

Will Finger@willfi·27 Dec

Once you read these, you will never be the same. I present three amazing Christian books. Have you ever read Watchman Nee, Jessie Penn-Lewis, and G. H. Pember?
Earth's Earliest Ages by G. H. Pember goodreads.com/book/show/1101…
The Spiritual Man by Watchman Nee goodreads.com/book/show/8497…
War on the Saints, The Full Text, Unabridged Edition by Jessie Penn-Lewis goodreads.com/book/show/2835…


149

Will Finger@willfi·6h

@RichardHolmboe @sudoingX x.com/willfi/status/…

Will Finger@willfi

I cracked it. Qwen 3.5 35B Local! 120 t/s generation. 120K context. Vision ENABLED. All on 16GB VRAM. All GPU. Zero compromise. Here's exactly how 👇


Richard Holmboe@RichardHolmboe·8h

@sudoingX What is your context window for qwen 3.5 35B-A3B?


330

Sudo su@sudoingX·9h

if you're about to download nvidia's nemotron cascade 2 at Q4_K_M for a single RTX 3090, stop. save yourself the frustration i went through last night.

Q4_K_M is 24.5GB. your 3090 has 24GB VRAM. the model loads, no room for KV cache, no room for context, no room for compute buffer. it will not run. this is a MoE architecture where the expert weights don't compress well at standard Q4. every quant table online lists it as "recommended" without checking if it fits consumer VRAM.

the fix: bartowski IQ4_XS at 18.17GB. imatrix quantization that's smarter about which weights need precision and which don't. same 4-bit tier, 6GB smaller because it doesn't blindly keep every expert at the same precision. leaves you 5.4GB of headroom for KV cache and context.

downloading it now on the same RTX 3090 i ran qwen 3.5 35B-A3B on at 112 tok/s. same machine, same node, same everything.

first up is context scaling sweep from 4K to 262K to see how mamba-2 handles long context compared to qwen's deltanet. then speed benchmarks at each context level. then i'm pointing hermes agent at it for autonomous coding sessions to see how it handles tool calls, file creation, and multi-step builds over long sessions.

nvidia vs alibaba. mamba vs deltanet. same hardware, different architectures. i'll report back with exact flags, exact numbers, exact VRAM breakdowns. no theory, no spec sheets. tested data from a real card.
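The fit check described in the post above comes down to simple subtraction. Here is a minimal Python sketch of it, assuming the file sizes quoted there (24.5GB for Q4_K_M, 18.17GB for IQ4_XS) and a 24GB card. The function names and the 2GB reserve for KV cache and compute buffers are hypothetical placeholders, not measured values. The raw difference for IQ4_XS comes out near 5.8GB, a little above the 5.4GB quoted in the post; that gap may reflect VRAM already in use on that machine, but this is only a guess.

```python
# Sizes in GB as quoted in the post; RESERVE_GB is an illustrative assumption.
VRAM_GB = 24.0
RESERVE_GB = 2.0  # hypothetical minimum for KV cache + compute buffer

QUANT_SIZES_GB = {"Q4_K_M": 24.5, "IQ4_XS": 18.17}


def headroom_gb(vram_gb: float, model_gb: float) -> float:
    """VRAM left after the weights are loaded (negative means they do not fit)."""
    return vram_gb - model_gb


def fits(vram_gb: float, model_gb: float, reserve_gb: float = RESERVE_GB) -> bool:
    """True if the weights fit and at least reserve_gb remains for cache and buffers."""
    return headroom_gb(vram_gb, model_gb) >= reserve_gb


if __name__ == "__main__":
    for name, size in QUANT_SIZES_GB.items():
        left = headroom_gb(VRAM_GB, size)
        verdict = "fits" if fits(VRAM_GB, size) else "does not fit"
        print(f"{name}: {size:.2f} GB, headroom {left:+.2f} GB -> {verdict}")
```

Real headroom also depends on context length, KV cache precision, and whatever else is using the GPU, so a check like this is only a first filter before downloading a quant.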

Sudo su@sudoingX

the hype around this model settled fast. good. now i can test it without the noise. NVIDIA released nemotron cascade. 30B total, 3B active. fits on a single RTX 3090. hybrid mamba MoE. gold medal on the international math olympiad with only 3 billion active parameters. they say it beats qwen on math, code, and reasoning. i tested qwen 3.5 35B-A3B on a single 3090 at 112 tok/s. now same card, same tests, different architecture. mamba vs deltanet. nvidia vs alibaba. receipts incoming tonight.


474

70.5K

Will Finger@willfi·6h

@sudoingX Nice one! For 16GB VRAM on Windows, Qwen 3.5 35B and 27B are still the best ones. x.com/willfi/status/…



302

Will Finger retweeted

Pietro Schirano@skirano·23h

I tried to warn designers. This is exactly why I built @MagicPathAI: you need to lean more into AI-native workflows. You're getting outpaced by product and engineers now. Prototypes are the new PRDs. If PMs are doing this, what do you do? You either own that or you're out.

Lenny Rachitsky@lennysan

I don’t know exactly what’s going on here, but it does feel AI-related. Unlike PM and eng, which started growing in 2024 (two years post-ChatGPT), design didn’t. If I had to venture a theory, I’d say that because AI is allowing engineers to move so quickly, there’s less opportunity—and less desire—to involve the traditional design process. That said, you’d think design would become a differentiator as more products compete for attention. Something to think about for your company! We’ll keep watching this trend and AI’s impact on org design more generally. One interesting observation we made when we went a level deeper: the ratio of demand for PMs vs. designers has flipped. In mid-2023, we went from more open designer roles to more open PM roles. And ever since, PM demand has been pulling away (currently 1.27x). This will be another trend to monitor, in terms of how AI is reshaping org design.


155

29.6K

Will Finger@willfi·18h

@0xCVYH For those on Windows: x.com/willfi/status/… We're in this together!



822

CV.YH@0xCVYH·1d

Qwen3.5 4B running locally with 4GB of RAM. It searched 20+ sites, cited sources, and found the best answer. The key detail: it made tool calls + web searches DURING the reasoning process. It is not a canned answer from memory. It is active research in real time. A 4 BILLION parameter model. On your laptop. No cloud. No API. No cost. Via Unsloth Studio (open source). If you have a laptop with 4GB to spare, you can already have your own local AI researcher.


283

13.2K

Will Finger@willfi·2d

@stevibe 27b is the best of them. But how does it compare to Opus 4.6? I wonder


1.3K

stevibe@stevibe·2d

"122B has to be smarter than 27B"

I showed 4 UI components to three Qwen3.5 models and asked them to recreate them from a screenshot alone:
- 27B (dense)
- 35B-A3B (MoE)
- 122B-A10B (MoE)

Same screenshot. Same prompt. Same task. Which one do you think nailed it?


917

85.2K

Will Finger retweeted

0xSero@0xSero·3d

In 72 hours I got over 100k of value:

1. Lambda gave me 5000$ credits in compute
2. Nvidia offered me 8x H100s on the cloud (20$/h) idk for how long but assuming 2 weeks that'd be 5000$~
3. TNG technology offered me 2 weeks of B200s which is something like 12000$ in compute
4. A kind person offered me 100k in GCP credits (enough to train a 27B if you do it right)
5. Framework offered to mail me a desktop computer
6. We got 14,000$ in donations which will go to buying 2x RTX Pro 6000s (bringing me up to 384GB VRAM)
7. I got over 6M impressions which based on my RPM would be 1500$ over my 500$~ usual per pay period
8. I have gained 17,000~ followers, over doubling my follower count
9. 17 subscribers on X + 700 on youtube.

The total value of all this approaches at minimum 50,000$~ and closer to 150,000$ if I leverage it all.

-----

What I'll be doing with all this:

Eric is an incredibly driven researcher I have been bouncing ideas off of over the last month. He and I have been tackling the idea of getting massive models to fit on relatively cheap memory.

The idea is taking advantage of different forms of memory, in combination with expert saliency scoring, to offload specific expert groupings to different memory tiers. For the MoEs I've tested over my entire AI session history, about 37.5% of the model is responsible for 95% of token routing. So we can offload 62.5% of an LLM onto SSD/NVMe/CPU/cheap VRAM. This should theoretically result in minimal added latency if we can select the right experts.

We can combine this with paged swapping to further accelerate the prompt processing. If done right, we are looking at very very decent performance for massive unquantised & unpruned LLMs. You can get DeepSeek-v3.2-speciale at full intelligence with decent tokens/s as long as you have enough VRAM to host the core 20-40% of the model and enough RAM or SSD to host the rest. Add quantisation to the mix and you can basically have decent speeds and intelligence with just 5-10% of the model's size in VRAM (+ you need some for context).

The funds will be used to push this to its limits.

-----

There's also tons of research showing that you can quantise a model drastically, then distill from the original BF16 or make a LoRA to mostly align it back to the original. This will be added to the pipeline too.

-----

All this will be built out here: github.com/0xSero/moe-com… You will be able to take any MoE and shove it in here, and with only 24GB and enough RAM/NVMe, compress it down. It'll be slow as hell but it will work with little tinkering.

-----

Lastly I will be looking into either a full training run from scratch -> or just post-training on an open AMERICAN base model:
- a research model
- an openclaw/nanoclaw/hermes model
- a browser-use model

To prove that this can be done.

-----

I will be bad at all of it, and doubt I will get beyond the best small models from 6 months ago, but I want to prove it's no boogeyman impossible task to everyone who says otherwise.

-----

By the end of the year:
1. I will have 1 model I trained in some capacity be in the top 5 at either pinchbench, browseruse, or research.
2. My github will have a master repo which combines all my work into reusable generalised scripts to help you do the same.
3. The largest public comparative dataset for all MoE quantisations, prunes, benchmarks, costs, hardware requirements.

-----

A lot of this will be led by Eric, who I will tag in the next post.

I want to say thank you to everyone who has supported me. I have gotten a lot of comments stating:
1. I'm crazy, stupid, or both
2. I'm wasting my time, no one cares about this
3. This is not a real issue

I believe the amount of interest and support I've received says it all.

donate.sybilsolutions.ai
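The offloading idea in the post above (keep the experts that handle most routing in fast memory, push the rest to slower tiers) can be illustrated with a small Python sketch. This is not the author's pipeline: the function name, the expert labels, and the routing counts are hypothetical, and "saliency" is approximated here by raw routing frequency, which is an assumption. The sketch ranks experts by how often tokens were routed to them and keeps the smallest prefix that reaches a target coverage.

```python
from typing import Dict, List, Tuple


def split_experts_by_coverage(
    route_counts: Dict[str, int], coverage: float = 0.95
) -> Tuple[List[str], List[str]]:
    """Return (hot, cold): hot experts cover at least `coverage` of all routed tokens."""
    total = sum(route_counts.values())
    if total == 0:
        # No routing data: nothing can be ranked, so treat every expert as cold.
        return [], list(route_counts)

    # Most frequently routed experts first.
    ranked = sorted(route_counts, key=route_counts.get, reverse=True)

    hot: List[str] = []
    covered = 0
    for expert in ranked:
        if covered / total >= coverage:
            break
        hot.append(expert)
        covered += route_counts[expert]

    cold = [e for e in ranked if e not in hot]
    return hot, cold


if __name__ == "__main__":
    # Hypothetical routing counts for an 8-expert layer (made-up numbers).
    counts = {"e0": 50, "e1": 30, "e2": 16, "e3": 2, "e4": 1, "e5": 1, "e6": 0, "e7": 0}
    hot, cold = split_experts_by_coverage(counts)
    print("keep in VRAM:", hot)
    print("offload:", cold)
```

With these made-up counts, 3 of 8 experts (37.5%) end up in the hot set and 5 of 8 (62.5%) are offloaded, which mirrors the ratio quoted in the post. Real routing distributions vary per layer and per workload, so the split would have to be measured rather than assumed.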


222

274

4.1K

165.3K

Will Finger@willfi·3d

@sudoingX RTX 5080 + AMD 9800X3D + 96GB DDR5 RAM. Running Qwen 35B at 100 t/s with 128K context.


Sudo su@sudoingX·3d

i just became a mod of x/LocalLLaMA. if you're running local models on your own hardware and want in, the community is open. pinned and highlighted on my profile. approving members starting today. drop your setup below and i'll get you in. 3060, 3090, 4090, 5090, AMD, whatever you're running. all welcome. if you're hitting issues with hermes agent, llama.cpp, model selection, configs, i'm here. let's make local AI accessible for everyone.

Sudo su@sudoingX

let me get you started in local AI and bring you to the edge.

if you have a GPU or thinking about diving into the local LLM rabbit hole, first thing you do before any setup is join x/LocalLLaMA. this is the community that will help you at every step. post your issue and we will direct you, debug with you, and save you hours of work.

once you're in, follow these three:

@TheAhmadOsman the oracle. this is where you consume the latest edges in infrastructure and AI. if something dropped you hear it from him first. his content alone will keep you ahead of most.

@0xsero one man army when it comes to model compression, novel quantization research, new tools and tricks that make your local setup better. you will learn, experiment, and discover things you didn't know existed.

@Teknium maker of Hermes Agent, the agent i use every day from @NousResearch. from Teknium you don't just stay at the frontier, you get your hands on the tools before everyone else. this is where things are headed.

if you follow me follow these three and join the community. you will be ahead of most people in this space. if you run into wrong configs, stuck debugging hardware, or can't get a model to load, post there so we can help.

get started with local AI now. not only understand the stack but own your cognition. don't pay openai fees on top of giving them your prompts, your research, and your most valuable thinking to be monitored and metered. buy a GPU and build your own token factory.


328

817

60.2K

Will Finger retweeted

Alex Barashkov@alex_barashkov·5d

A new free tool for designers is on the way. Made by designers, for designers. Coming soon.


1.2K

71.4K

Will Finger@willfi·6d

@BrettFromDJ Love the depth! What do you think about clutter/noise? How do you mitigate it in a scalable DS?


Brett@BrettFromDJ·17 Mar

More interface details. 😍


990

43.2K

Will Finger retweeted

Can Vardar@icanvardar·17 Mar

we are so back

Kalshi@Kalshi

JUST IN: Job postings for software engineers on Indeed reach new 6-month high


147

1.6K

46.4K

1.7M

Will Finger@willfi·6d

@0xSero When is the browser coming? Does only Cursor have it?


0xSero@0xSero·18 Mar

OpenCode Desktop app updates worth checking out.
1. Now we have Queue mode, which I am a huge fan of.
2. They've enabled adding custom providers in settings.
3. Performance seems improved, it's less slow in large sessions and threads.


220

14K

Will Finger@willfi·6d

@Poloolpp USP and AWP nerfs


17.7K

ً@Poloolpp·6d

If you're not running the P2K now you're trolling


118

152

17.2K

1.3M

Will Finger@willfi·6d

@Ozzny_CS2 Yep. The AWP needs 6 bullets instead of 5 to be balanced now.


39.1K

Ozzny@Ozzny_CS2·6d

Here's how the new Reload system works in CS2 ‼️
> When you reload, you drop the used magazine and get a full new one
> Each reload takes 1 magazine from the counter
What do we think, W or L change?


257

2.6K

735.5K

Will Finger@willfi·6d

@EkeN3tt @Interloper_CS @gabefollower That’s good, smoke spray was op


169

Opiee Cs@EkeN3tt·6d

@Interloper_CS @gabefollower YAY CZ meta instead of Ak or M4. I don't think people understand how much this affects the game. I mean if you spray 15 bullets in a smoke u will not want to reload


1.1K

‎Gabe Follower@gabefollower·6d

Counter-Strike developers have just released one of the biggest updates to the meta. Now, if you reload your weapon while there is still ammo in the magazine, the ammo disappears instead of being returned to your reserves. They have also increased (and decreased) the number of bullets for some weapons. The CZ-75, for example, now has 36 bullets instead of 24.
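The reload rule described in the posts above (a reload discards whatever is left in the current magazine and consumes one spare magazine) can be modeled in a few lines of Python. This is an illustrative sketch only, not the game's actual implementation: the `Weapon` class, its field names, and the example numbers (a 12-round magazine with 2 spares) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Weapon:
    magazine_size: int
    rounds_in_magazine: int
    spare_magazines: int

    def fire(self) -> bool:
        """Fire one round; returns False if the magazine is empty."""
        if self.rounds_in_magazine == 0:
            return False
        self.rounds_in_magazine -= 1
        return True

    def reload(self) -> bool:
        """Swap in a full spare magazine; leftover rounds are lost, not returned."""
        if self.spare_magazines == 0:
            return False
        self.spare_magazines -= 1
        self.rounds_in_magazine = self.magazine_size
        return True

    def total_rounds(self) -> int:
        """Rounds still available across the loaded magazine and all spares."""
        return self.rounds_in_magazine + self.spare_magazines * self.magazine_size


if __name__ == "__main__":
    gun = Weapon(magazine_size=12, rounds_in_magazine=12, spare_magazines=2)
    for _ in range(5):
        gun.fire()
    before = gun.total_rounds()
    gun.reload()
    print(f"rounds before reload: {before}, after reload: {gun.total_rounds()}")
```

Under this model, reloading with 7 rounds still in the magazine throws those 7 away, which is why reloading early after a short spray becomes a real cost.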