dfi
@dfi
2.6K posts

M&A & Blockchain lawyer in NYC | Tweets not legal advice (email me) | Opinions are my own | DAO Council Member @FRWCCouncil | Claw & LocalLLM Hobbyist

NYC · Joined April 2008
4.3K Following · 1.6K Followers
wassieloyer @wassielawyer
Been using Claude in legal a lot and my conclusion is that it is far better than what TradFi lawyers think it is but also far worse than what X engagement farmers think it is. If you can replace your legal practice with Claude today, you aren't a serious lawyer.
30 replies · 18 reposts · 209 likes · 23.7K views
nick @tinyblue_dev
Welp, I did it. Wired up MiniMax M2.5 on my 2x Mac Studios (512GB) with @exolabs -> wired into OpenClaw. Works as well as Opus 4.6, and it's free. I'm dumping all my AI token subs - SUPER cool day!
28 replies · 14 reposts · 235 likes · 38.5K views
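For anyone wanting to wire up something similar: exo serves an OpenAI-compatible endpoint on the local network, so any OpenAI-style client can point at it. A minimal Python sketch follows; the URL, port, and model id are placeholders, not nick's actual config.

from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.50:52415/v1",  # hypothetical local exo endpoint
    api_key="not-needed-locally",             # local servers ignore the key
)

resp = client.chat.completions.create(
    model="minimax-m2.5",  # hypothetical local model id
    messages=[{"role": "user", "content": "Say hello from local inference."}],
)
print(resp.choices[0].message.content)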
dfi retweeted
Lucky Iyinbor @Luckyballa
Apple just released its programming guide for Metal Performance Primitives, and they suggest using Morton codes for tiled GEMM. But why?

In computer graphics, you use such space-filling curves all the time. They make objects that are close in space close in memory.

There are several reasons, but one of them is that you get better cache locality, meaning fewer expensive reads from device memory.

This is exactly why it's appealing for GEMM too - you have a lot of overlapping memory reads between the tiles. Morton order schedules tiles in compact square patches, keeping the simultaneous working set small enough to fit in last-level cache, so nearby threadgroups are more likely to reuse the data they share.
4 replies · 20 reposts · 175 likes · 16.3K views
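To make the Morton-order idea concrete, here is a small illustrative Python sketch (my own toy code, not Apple's Metal implementation): interleaving the bits of each tile's (row, col) coordinates gives tiles that are close in 2D nearby schedule indices.

def part1by1(n: int) -> int:
    # Spread the low 16 bits of n so a zero bit separates each original bit.
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_index(row: int, col: int) -> int:
    # Interleave row/col bits into a single Z-order index.
    return part1by1(col) | (part1by1(row) << 1)

# Schedule an 8x8 grid of GEMM tiles in Morton order instead of row-major:
# the first four tiles form a compact 2x2 patch, so the A-row and B-column
# blocks they share are likely still resident in last-level cache.
tiles = sorted(((r, c) for r in range(8) for c in range(8)),
               key=lambda t: morton_index(*t))
print(tiles[:8])  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]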
dfi @dfi
Everyone needs this on an offline flash drive somewhere as emergency intelligence… run the intelligence of last fall's SOTA models on your iPhone offline.
David Hendrickson @TeksEdge

🚨 Yes, running a ~400B (397B) parameter AI on an 🍎 iPhone is 100% REAL. 🤯 Been following the posts and reactions over the weekend. The Flash-MoE engine ingeniously shatters hardware limits 🧠, running massive Mixture-of-Experts models on Apple Silicon (iPhone 17 Pro & Macs) with only 12GB of RAM. Here's how it works & how to try it yourself. 🧵👇

🧠 How is this physically possible? Normally, a 397B model in 4-bit quantization needs ~209GB just to load fully. Flash-MoE bypasses this with two key tricks:

1️⃣ SSD-to-GPU streaming: it doesn't load the entire model into RAM. It streams only the necessary expert weights on demand directly from @Apple's ultra-fast NVMe SSD to the GPU using parallel pread() calls. The OS page cache handles hits automatically ("trust the OS"). ⚡

2️⃣ Only a tiny fraction of parameters activate per token due to MoE. For Qwen3.5-397B-A17B, it activates ~17B total (top-K experts per layer, reduced to K=4–6 on mobile for speed).

As a result: ~0.6–2 tokens/sec on iPhone 17 Pro for the 397B model (0.6 t/s in early demos; 1–2 t/s projected with K-reduction & splits). Extremely slow but usable for short prompts! 📱💨

💻 How to build & run on Mac
Start with the 35B model—it's much faster (~9–10 t/s on M3 Max, ~5.5+ t/s on iPhone).
1️⃣ Clone the repo: git clone Alexintosh/flash-moe
2️⃣ Build the Metal engine: cd flash-moe/metal_infer && make
3️⃣ Run it: ./infer --model /path/to/weights --prompt "Hello" --tokens 100
(Add --tiered if using tiered-quant weights for a smaller footprint.)
⚠️ Note that you should use pre-packed raw .bin weights from Hugging Face (NOT safetensors). Pre-packed models are available under alexintosh/...

📱 How to build & run on iPhone
1️⃣ Build the Xcode project from FlashMoE-iOS/ in the repo (or check releases if available). Requires iOS 18+.
2️⃣ Download the pre-packed 35B from Hugging Face: alexintosh/Qwen3.5-35B-A3B-Q4-Tiered-FlashMoE (~13.4–19.5GB).
3️⃣ Push the model files to the app's Documents directory (use the copy_model_to_iphone.sh script over USB, or UIDocumentPicker). Set the files to isExcludedFromBackup.
4️⃣ Open the app, select the model folder, and start prompting! 💬

🔥 Warning: heavy SSD streaming + GPU compute draws massive power. Your phone WILL get very hot and drain battery fast! Avoid long sessions. 🔋📉

GitHub: Alexintosh/flash-moe

0 replies · 0 reposts · 2 likes · 101 views
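The streaming trick is simple enough to sketch. A toy Python version of the two ideas follows (illustrative only, not the Flash-MoE engine; the file layout, sizes, and expert count are made up): positional pread() calls pull only the selected experts' weights off disk, the OS page cache absorbs repeat reads, and MoE routing means only the top-K experts load per token.

import os
import numpy as np

D, N_EXPERTS = 1024, 8
EXPERT_BYTES = D * D * 2  # one float16 weight matrix per expert

# Write a toy pre-packed raw weight file so the sketch is self-contained.
if not os.path.exists("experts.bin"):
    rng = np.random.default_rng(0)
    with open("experts.bin", "wb") as f:
        f.write(rng.standard_normal(N_EXPERTS * D * D).astype(np.float16).tobytes())

fd = os.open("experts.bin", os.O_RDONLY)

def load_expert(eid: int) -> np.ndarray:
    # Positional read straight from SSD; repeated hits are served by the
    # OS page cache ("trust the OS"), so nothing is pinned in RAM.
    buf = os.pread(fd, EXPERT_BYTES, eid * EXPERT_BYTES)
    return np.frombuffer(buf, dtype=np.float16).reshape(D, D)

def moe_layer(x: np.ndarray, router_logits: np.ndarray, k: int = 4) -> np.ndarray:
    # Only the top-K experts run, so only their weights are ever read.
    top = np.argsort(router_logits)[-k:]
    gates = np.exp(router_logits[top])
    gates /= gates.sum()
    out = np.zeros(D, dtype=np.float32)
    for g, eid in zip(gates, top):
        out += g * (load_expert(int(eid)).astype(np.float32) @ x)
    return out

x = np.ones(D, dtype=np.float32)
print(moe_layer(x, np.random.default_rng(1).standard_normal(N_EXPERTS))[:4])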
dfi @dfi
@SMB_Attorney So… Claude is the new LegalZoom? Lol
0 replies · 0 reposts · 0 likes · 41 views
dfi @dfi
@Ex0byt @megakilo @0xSero I had Codex build a (slow, rudimentary) version of this and called it the “revolver”. Feel free to steal the name. I thought it was cool. Can’t wait to test it on a new (to me) M1 Ultra 128gb coming in the mail! Would be awesome to run Kimi K2.5 on 4-year-old hardware.
0 replies · 0 reposts · 0 likes · 61 views
Eric @Ex0byt
@megakilo @0xSero Ha! You get to choose the name. I just want y'all running 1T-param models on your existing setups. 😉
1 reply · 0 reposts · 12 likes · 1K views
Eric @Ex0byt
Get Excited: @0xSero and I are close — a B300 is currently training a tiny (15M param) side-loaded neural network that helps select, load, and cache the correct MoE experts for Kimi K2.5 (a 1T param MoE model running on 25GB of memory). Once experiments are done - will share the paper.

"Thicket-Guided Expert Prediction for Memory-Minimal Trillion-Parameter MoE Inference on Unified Memory & Consumer Grade Hardware"
0xSero @0xSero

@pierrelezan Yes, @Ex0byt is working on this.

9 replies · 22 reposts · 240 likes · 33K views
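The paper isn't out, so the sketch below is pure speculation about the general shape of the idea (all names and sizes are hypothetical, not Eric's design): a tiny side network reads the current hidden state and guesses which experts the next layer will route to, so their weights can be prefetched into a cache before the router actually asks; a wrong guess just falls back to a slower on-demand load.

import numpy as np

class ExpertPredictor:
    # Tiny 2-layer MLP that scores experts for the next layer.
    def __init__(self, d_model: int = 512, n_experts: int = 64, hidden: int = 256):
        rng = np.random.default_rng(0)
        self.w1 = rng.standard_normal((d_model, hidden)).astype(np.float32) * 0.02
        self.w2 = rng.standard_normal((hidden, n_experts)).astype(np.float32) * 0.02

    def predict_topk(self, h: np.ndarray, k: int = 6) -> np.ndarray:
        scores = np.maximum(h @ self.w1, 0.0) @ self.w2
        return np.argsort(scores)[-k:]  # expert ids worth prefetching

cache: dict[int, np.ndarray] = {}  # a real system would bound this (LRU)

def prefetch(predicted_ids, load_expert) -> None:
    # Load predicted experts while the current layer computes, so SSD reads
    # come off the critical path whenever the prediction is right.
    for eid in map(int, predicted_ids):
        if eid not in cache:
            cache[eid] = load_expert(eid)

pred = ExpertPredictor()
h = np.random.default_rng(1).standard_normal(512).astype(np.float32)
prefetch(pred.predict_topk(h), lambda eid: np.zeros((4, 4), dtype=np.float32))
print(sorted(cache))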
dfi @dfi
@anemll My first thought when I saw his post!
0 replies · 0 reposts · 0 likes · 355 views
0xSero @0xSero
Putting out a wish to the universe. I need more compute; if I can get more, I will make sure every machine from a small phone to a bootstrapped RTX 3090 node can run frontier intelligence fast with minimal intelligence loss.

I have hit page 2 of Hugging Face, released 3 model family compressions, and got GLM-4.7 on a MacBook. huggingface.co/0xsero

My beast just isn’t enough, and I already spent 2k USD on renting GPUs on top of credits provided by Prime Intellect and Hotaisle.

———

If you believe in what I do, help me get this to Nvidia; maybe they will bless me with the pewter to keep making local AI more accessible 🙏
Michael Dell 🇺🇸 @MichaelDell

Jensen Huang is loving the new Dell Pro Max with GB300 at NVIDIA GTC.💙 They asked me to sign it, but I already did 😉

179 replies · 484 reposts · 4.1K likes · 916.6K views
Sudo su @sudoingX
jensen just compared openclaw slop house to linux and called it the most popular open source project in history. i admire jensen but he has clearly never used openclaw on a small model. if his team had spent one day in my DMs watching people migrate off it to hermes agent because their tool calls kept failing he might have framed things differently.

openclaw's founder left for openai. the codebase is 125K+ lines of typescript bloat. the sandbox blocks the tools that actually matter. small models can't use the MEDIA: syntax so your images never arrive. i know because i found that bug, wrote the fix, and got it merged into hermes agent the same day.

you don't need a $4,699 DGX Spark or a corporate "openclaw strategy" to run an autonomous agent. you need a half decade old GPU sitting in your drawer and a framework that actually works from 7B to 70B without special syntax.

hermes agent. 30+ tools. 11 model specific parsers. runs on a RTX 3060 at 35-50 tok/s. the fix i submitted yesterday is already in production.

jensen i respect the vision but the migration is already happening and it's not going in the direction you announced.
26 replies · 10 reposts · 217 likes · 14.1K views
dfi @dfi
@sudoingX Grab a used M1 Mac Studio. You can get these for between $1000 (M1 Max with 32GB, 400 GB/s unified RAM) and $3500 (M1 Ultra with 128GB, 800 GB/s unified RAM). M1-M4 are all similar. There is no better deal than these IMO. You can find them hunting Backmarket, MacPro-LA, Google Shopping, etc.
0 replies · 0 reposts · 1 like · 186 views
Sudo su @sudoingX
the mac studio influencers and openclaw salesmen want you to buy a $4,699 box and photograph it on your clean desk next to a plant. then post about how you're "running AI locally" while routing every prompt through an API. buy a used gpu for $250. open a terminal. run the model. no desk photo needed. the terminal is the proof.
9 replies · 3 reposts · 82 likes · 5.6K views
Sudo su @sudoingX
local AI hardware tiers:
$4,699 - DGX Spark (NVIDIA wants you here)
$1,989 - RTX 4090 (overkill for most)
$1,000 - RTX 3090 used (sweet spot)
$250 - RTX 3060 used (currently testing every model that fits in 12GB)
$0 - CPU only (it still works)

jensen announced the top. i've been posting receipts from the bottom.
100 replies · 25 reposts · 554 likes · 35.8K views
dfi retweeted
Brian Roemmele @BrianRoemmele
“Every software company in the world needs to have a Claw strategy” - Jensen Huang, Nvidia

Indeed. This and more.
119 replies · 626 reposts · 4.2K likes · 603.9K views
dfi @dfi
@ivanfioravanti @sabastod Yeah, it’s on the App Store from a dev and YouTuber. I think he’s been around for a while (he also has the app xCreate, which is his name on YouTube). Probably somewhere between LM Studio and oMLX in terms of feature set. Super beginner friendly—it’s how I got into local models.
0 replies · 0 reposts · 0 likes · 174 views
Ivan Fioravanti ᯅ @ivanfioravanti
@dfi @sabastod Added to my list! I’m using oMLX now, which offers SSD caching too, but Inferencer seems more mature, right?
1 reply · 0 reposts · 0 likes · 300 views
dfi @dfi
@ivanfioravanti @sabastod IMO OpenClaw was unusable for me with local models until I started using a server with prompt SSD caching (Inferencer). Total game changer.
2 replies · 0 reposts · 3 likes · 149 views
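For anyone curious what prompt SSD caching buys you in agent workflows: the prefill of a huge, unchanging system prompt is computed once and its saved state reloaded from disk on every later run. A generic Python sketch follows (it assumes nothing about Inferencer's internals; all names are mine).

import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("kv_cache")
CACHE_DIR.mkdir(exist_ok=True)

def _cache_path(prompt_prefix: str) -> Path:
    # Key the saved KV state on a hash of the exact prompt prefix.
    digest = hashlib.sha256(prompt_prefix.encode()).hexdigest()
    return CACHE_DIR / (digest + ".pkl")

def prefill_with_cache(prompt_prefix: str, prefill_fn):
    path = _cache_path(prompt_prefix)
    if path.exists():  # hit: skip recomputing the prefill entirely
        return pickle.loads(path.read_bytes())
    kv = prefill_fn(prompt_prefix)  # miss: pay the expensive forward pass once
    path.write_bytes(pickle.dumps(kv))
    return kv

# Demo with a stand-in "prefill" that just counts whitespace tokens.
kv = prefill_with_cache("You are a coding agent...", lambda s: {"n_tokens": len(s.split())})
print(kv)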
Ivan Fioravanti ᯅ @ivanfioravanti
@sabastod 90K? I have seen something around 20K, but if you add MCPs then you may be right. Wondering if KV cache on SSD like the one offered by oMLX can help 🤔 Gonna test M5 Max with coding harnesses tomorrow!
1 reply · 0 reposts · 7 likes · 503 views
dfi @dfi
@0xSero This is awesome. I’ve been throwing Codex + local models at getting a version of this working on MLX for Kimi K2.5 (for 256gb ram) for the last 24 hrs. My agents thank you for this repo!
0 replies · 0 reposts · 2 likes · 236 views
0xSero @0xSero
Most stable: 64% REAP. No more runtime errors, only a 70% slowdown, but the model weights fit in VRAM.

36% pre-quantization, the whole model is available, so the only losses are from wrong expert predictions. With Q4 quantization we can theoretically run Qwen3.5-35B near-lossless, needing only ~5GB of VRAM for weights with 65% speed retention on vLLM.

Hopefully this works for real stuff; we'll know more over the week.
4 replies · 3 reposts · 68 likes · 7.2K views
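For context, the selection step of REAP-style expert pruning fits in a few lines. Illustrative Python only (not 0xSero's tool; the real method's saliency criterion and error handling are more involved): score each expert by how much router mass it receives over a calibration set, then keep the top fraction.

import numpy as np

def expert_saliency(router_probs: np.ndarray) -> np.ndarray:
    # router_probs: (n_tokens, n_experts) routing weights from calibration data.
    return router_probs.sum(axis=0)  # total probability mass routed per expert

def prune_experts(expert_weights: list, router_probs: np.ndarray, keep_frac: float = 0.64):
    saliency = expert_saliency(router_probs)
    n_keep = max(1, int(len(expert_weights) * keep_frac))
    keep = sorted(int(i) for i in np.argsort(saliency)[-n_keep:])  # most-used survive
    return [expert_weights[i] for i in keep], keep

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) for _ in range(16)]
calib_probs = rng.random((1000, 16))
kept, kept_ids = prune_experts(weights, calib_probs)
print(len(kept), kept_ids)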
dfi @dfi
Doing some REAP experimentation with @0xSero's MLX REAP tool. ...and success! Uploaded my first models to Hugging Face! If you needed to squeeze out a few more GB from the Qwen 3.5 35B models on Mac, these are for you. huggingface.co/0xdfi
1 reply · 3 reposts · 24 likes · 18.3K views
dfi @dfi
Lots of hype lately about AI and agentic tools replacing outside counsel. But these takes ignore structural features of our legal system that should have companies pausing before they substitute AI tools for an attorney.

Two recent federal court decisions (among the first to address whether AI-generated communications and outputs are discoverable) come to a clear conclusion: if there's no attorney involved, there may not be attorney-client privilege or work product protection. Those AI outputs may be fair game in litigation.

My colleagues break down the practical takeaways for M&A deal teams in the linked alert. mayerbrown.com/en/insights/pu…
0 replies · 0 reposts · 0 likes · 85 views