dfi
@dfi
2.6K posts

M&A & Blockchain lawyer in NYC | Tweets not legal advice (email me) | Opinions are my own | DAO Council Member @FRWCCouncil | Claw & LocalLLM Hobbyist

NYC · Joined April 2008
4.3K Following · 1.6K Followers
wassieloyer @wassielawyer
Been using Claude in legal a lot and my conclusion is that it is far better than what TradFi lawyers think it is but also far worse than what X engagement farmers think it is. If you can replace your legal practice with Claude today, you aren't a serious lawyer.
30 replies · 18 reposts · 209 likes · 23.7K views
nick @tinyblue_dev
Welp, I did it. Wired up MiniMax M2.5 on my 2x Mac Studios (512GB) with @exolabs -> wired into OpenClaw. Works as well as Opus 4.6, and it's free. I'm dumping all my AI token subs - SUPER cool day!
28 replies · 14 reposts · 235 likes · 38.5K views
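For anyone wanting to wire up something similar: exo serves an OpenAI-compatible endpoint on the local network, so any OpenAI-style client can point at it. A minimal Python sketch follows; the URL, port, and model id are placeholders, not nick's actual config.

from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.50:52415/v1",  # hypothetical local exo endpoint
    api_key="not-needed-locally",             # local servers ignore the key
)

resp = client.chat.completions.create(
    model="minimax-m2.5",  # hypothetical local model id
    messages=[{"role": "user", "content": "Say hello from local inference."}],
)
print(resp.choices[0].message.content)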
dfi retweeted
Lucky Iyinbor @Luckyballa
Apple just released its programming guide for Metal Performance Primitives, and they suggest using Morton codes for tiled GEMM. But why?

In computer graphics, you use such space-filling curves all the time. They make objects that are close in space close in memory.

There are several reasons, but one of them is that you get better cache locality, meaning fewer expensive reads from device memory.

This is exactly why it's appealing for GEMM too - you have a lot of overlapping memory reads between the tiles. Morton order schedules tiles in compact square patches, keeping the simultaneous working set small enough to fit in last-level cache, so nearby threadgroups are more likely to reuse the data they share.
4 replies · 20 reposts · 175 likes · 16.3K views
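To make the Morton-order idea concrete, here is a small illustrative Python sketch (my own toy code, not Apple's Metal implementation): interleaving the bits of each tile's (row, col) coordinates gives tiles that are close in 2D nearby schedule indices.

def part1by1(n: int) -> int:
    # Spread the low 16 bits of n so a zero bit separates each original bit.
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton_index(row: int, col: int) -> int:
    # Interleave row/col bits into a single Z-order index.
    return part1by1(col) | (part1by1(row) << 1)

# Schedule an 8x8 grid of GEMM tiles in Morton order instead of row-major:
# the first four tiles form a compact 2x2 patch, so the A-row and B-column
# blocks they share are likely still resident in last-level cache.
tiles = sorted(((r, c) for r in range(8) for c in range(8)),
               key=lambda t: morton_index(*t))
print(tiles[:8])  # [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]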
dfi @dfi
Everyone needs this on an offline flash drive somewhere as emergency intelligence… run the intelligence of last fall's SOTA models on your iPhone offline.
David Hendrickson @TeksEdge

🚨 Yes, running a ~400B (397B) parameter AI on an 🍎 iPhone is 100% REAL. 🤯 Been following the posts and reactions over the weekend. The Flash-MoE engine ingeniously shatters hardware limits 🧠, running massive Mixture-of-Experts models on Apple Silicon (iPhone 17 Pro & Macs) with only 12GB of RAM. Here's how it works & how to try it yourself. 🧵👇

🧠 How is this physically possible? Normally, a 397B model in 4-bit quantization needs ~209GB just to load fully. Flash-MoE bypasses this with two key tricks:

1️⃣ SSD-to-GPU streaming: it doesn't load the entire model into RAM. It streams only the necessary expert weights on demand directly from @Apple's ultra-fast NVMe SSD to the GPU using parallel pread() calls. The OS page cache handles hits automatically ("trust the OS"). ⚡

2️⃣ Only a tiny fraction of parameters activate per token due to MoE. For Qwen3.5-397B-A17B, it activates ~17B total (top-K experts per layer, reduced to K=4–6 on mobile for speed).

As a result: ~0.6–2 tokens/sec on iPhone 17 Pro for the 397B model (0.6 t/s in early demos; 1–2 t/s projected with K-reduction & splits). Extremely slow but usable for short prompts! 📱💨

💻 How to build & run on Mac
Start with the 35B model—it's much faster (~9–10 t/s on M3 Max, ~5.5+ t/s on iPhone).
1️⃣ Clone the repo: git clone Alexintosh/flash-moe
2️⃣ Build the Metal engine: cd flash-moe/metal_infer && make
3️⃣ Run it: ./infer --model /path/to/weights --prompt "Hello" --tokens 100
(Add --tiered if using tiered-quant weights for a smaller footprint.)
⚠️ Note that you should use pre-packed raw .bin weights from Hugging Face (NOT safetensors). Pre-packed models are available under alexintosh/...

📱 How to build & run on iPhone
1️⃣ Build the Xcode project from FlashMoE-iOS/ in the repo (or check releases if available). Requires iOS 18+.
2️⃣ Download the pre-packed 35B from Hugging Face: alexintosh/Qwen3.5-35B-A3B-Q4-Tiered-FlashMoE (~13.4–19.5GB).
3️⃣ Push the model files to the app's Documents directory (use the copy_model_to_iphone.sh script over USB, or UIDocumentPicker). Set the files to isExcludedFromBackup.
4️⃣ Open the app, select the model folder, and start prompting! 💬

🔥 Warning: heavy SSD streaming + GPU compute draws massive power. Your phone WILL get very hot and drain battery fast! Avoid long sessions. 🔋📉

GitHub: Alexintosh/flash-moe

0 replies · 0 reposts · 2 likes · 101 views
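The streaming trick is simple enough to sketch. A toy Python version of the two ideas follows (illustrative only, not the Flash-MoE engine; the file layout, sizes, and expert count are made up): positional pread() calls pull only the selected experts' weights off disk, the OS page cache absorbs repeat reads, and MoE routing means only the top-K experts load per token.

import os
import numpy as np

D, N_EXPERTS = 1024, 8
EXPERT_BYTES = D * D * 2  # one float16 weight matrix per expert

# Write a toy pre-packed raw weight file so the sketch is self-contained.
if not os.path.exists("experts.bin"):
    rng = np.random.default_rng(0)
    with open("experts.bin", "wb") as f:
        f.write(rng.standard_normal(N_EXPERTS * D * D).astype(np.float16).tobytes())

fd = os.open("experts.bin", os.O_RDONLY)

def load_expert(eid: int) -> np.ndarray:
    # Positional read straight from SSD; repeated hits are served by the
    # OS page cache ("trust the OS"), so nothing is pinned in RAM.
    buf = os.pread(fd, EXPERT_BYTES, eid * EXPERT_BYTES)
    return np.frombuffer(buf, dtype=np.float16).reshape(D, D)

def moe_layer(x: np.ndarray, router_logits: np.ndarray, k: int = 4) -> np.ndarray:
    # Only the top-K experts run, so only their weights are ever read.
    top = np.argsort(router_logits)[-k:]
    gates = np.exp(router_logits[top])
    gates /= gates.sum()
    out = np.zeros(D, dtype=np.float32)
    for g, eid in zip(gates, top):
        out += g * (load_expert(int(eid)).astype(np.float32) @ x)
    return out

x = np.ones(D, dtype=np.float32)
print(moe_layer(x, np.random.default_rng(1).standard_normal(N_EXPERTS))[:4])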
dfi @dfi
@SMB_Attorney So… Claude is the new LegalZoom? Lol
0 replies · 0 reposts · 0 likes · 41 views
dfi @dfi
@Ex0byt @megakilo @0xSero I had Codex build a (slow, rudimentary) version of this and called it the “revolver”. Feel free to steal the name. I thought it was cool. Can’t wait to test it on a new (to me) M1 Ultra 128gb coming in the mail! Would be awesome to run Kimi K2.5 on 4-year-old hardware.
0 replies · 0 reposts · 0 likes · 61 views
Eric @Ex0byt
@megakilo @0xSero Ha! You get to choose the name. I just want y'all running 1T-param models on your existing setups. 😉
1 reply · 0 reposts · 12 likes · 1K views
Eric @Ex0byt
Get Excited: @0xSero and I are close — a B300 is currently training a tiny (15M param) side-loaded neural network that helps select, load, and cache the correct MoE experts for Kimi K2.5 (a 1T param MoE model running on 25GB of memory). Once experiments are done - will share the paper.

"Thicket-Guided Expert Prediction for Memory-Minimal Trillion-Parameter MoE Inference on Unified Memory & Consumer Grade Hardware"
0xSero @0xSero

@pierrelezan Yes, @Ex0byt is working on this.

9 replies · 22 reposts · 240 likes · 33K views
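The paper isn't out, so the sketch below is pure speculation about the general shape of the idea (all names and sizes are hypothetical, not Eric's design): a tiny side network reads the current hidden state and guesses which experts the next layer will route to, so their weights can be prefetched into a cache before the router actually asks; a wrong guess just falls back to a slower on-demand load.

import numpy as np

class ExpertPredictor:
    # Tiny 2-layer MLP that scores experts for the next layer.
    def __init__(self, d_model: int = 512, n_experts: int = 64, hidden: int = 256):
        rng = np.random.default_rng(0)
        self.w1 = rng.standard_normal((d_model, hidden)).astype(np.float32) * 0.02
        self.w2 = rng.standard_normal((hidden, n_experts)).astype(np.float32) * 0.02

    def predict_topk(self, h: np.ndarray, k: int = 6) -> np.ndarray:
        scores = np.maximum(h @ self.w1, 0.0) @ self.w2
        return np.argsort(scores)[-k:]  # expert ids worth prefetching

cache: dict[int, np.ndarray] = {}  # a real system would bound this (LRU)

def prefetch(predicted_ids, load_expert) -> None:
    # Load predicted experts while the current layer computes, so SSD reads
    # come off the critical path whenever the prediction is right.
    for eid in map(int, predicted_ids):
        if eid not in cache:
            cache[eid] = load_expert(eid)

pred = ExpertPredictor()
h = np.random.default_rng(1).standard_normal(512).astype(np.float32)
prefetch(pred.predict_topk(h), lambda eid: np.zeros((4, 4), dtype=np.float32))
print(sorted(cache))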
dfi @dfi
@anemll My first thought when I saw his post!
0 replies · 0 reposts · 0 likes · 355 views
0xSero @0xSero
Putting out a wish to the universe. I need more compute; if I can get more, I will make sure every machine from a small phone to a bootstrapped RTX 3090 node can run frontier intelligence fast with minimal intelligence loss.

I have hit page 2 of Hugging Face, released 3 model family compressions, and got GLM-4.7 on a MacBook. huggingface.co/0xsero

My beast just isn’t enough, and I already spent 2k USD on renting GPUs on top of credits provided by Prime Intellect and Hotaisle.

———

If you believe in what I do, help me get this to Nvidia; maybe they will bless me with the pewter to keep making local AI more accessible 🙏
Michael Dell 🇺🇸 @MichaelDell

Jensen Huang is loving the new Dell Pro Max with GB300 at NVIDIA GTC.💙 They asked me to sign it, but I already did 😉

179 replies · 484 reposts · 4.1K likes · 916.6K views
Sudo su @sudoingX
jensen just compared openclaw slop house to linux and called it the most popular open source project in history. i admire jensen but he has clearly never used openclaw on a small model. if his team had spent one day in my DMs watching people migrate off it to hermes agent because their tool calls kept failing he might have framed things differently.

openclaw's founder left for openai. the codebase is 125K+ lines of typescript bloat. the sandbox blocks the tools that actually matter. small models can't use the MEDIA: syntax so your images never arrive. i know because i found that bug, wrote the fix, and got it merged into hermes agent the same day.

you don't need a $4,699 DGX Spark or a corporate "openclaw strategy" to run an autonomous agent. you need a half decade old GPU sitting in your drawer and a framework that actually works from 7B to 70B without special syntax.

hermes agent. 30+ tools. 11 model specific parsers. runs on a RTX 3060 at 35-50 tok/s. the fix i submitted yesterday is already in production.

jensen i respect the vision but the migration is already happening and it's not going in the direction you announced.
26 replies · 10 reposts · 217 likes · 14.1K views
dfi @dfi
@sudoingX Grab a used M1 Mac Studio. You can get these for between $1000 (M1 Max with 32GB, 400 GB/s unified RAM) and $3500 (M1 Ultra with 128GB, 800 GB/s unified RAM). M1-M4 are all similar. There is no better deal than these IMO. You can find them hunting Backmarket, MacPro-LA, Google Shopping, etc.
0 replies · 0 reposts · 1 like · 186 views
Sudo su @sudoingX
the mac studio influencers and openclaw salesmen want you to buy a $4,699 box and photograph it on your clean desk next to a plant. then post about how you're "running AI locally" while routing every prompt through an API. buy a used gpu for $250. open a terminal. run the model. no desk photo needed. the terminal is the proof.
9 replies · 3 reposts · 82 likes · 5.6K views
Sudo su @sudoingX
local AI hardware tiers:
$4,699 - DGX Spark (NVIDIA wants you here)
$1,989 - RTX 4090 (overkill for most)
$1,000 - RTX 3090 used (sweet spot)
$250 - RTX 3060 used (currently testing every model that fits in 12GB)
$0 - CPU only (it still works)

jensen announced the top. i've been posting receipts from the bottom.
100 replies · 25 reposts · 554 likes · 35.8K views
dfi retweeted
Brian Roemmele @BrianRoemmele
“Every software company in the world needs to have a Claw strategy” - Jensen Huang, Nvidia

Indeed. This and more.
119 replies · 626 reposts · 4.2K likes · 603.9K views
dfi @dfi
@ivanfioravanti @sabastod Yeah, it’s on the App Store from a dev and YouTuber. I think he’s been around for a while (he also has the app xCreate, which is his name on YouTube). Probably somewhere between LM Studio and oMLX in terms of feature set. Super beginner friendly—it’s how I got into local models.
0 replies · 0 reposts · 0 likes · 174 views
Ivan Fioravanti ᯅ @ivanfioravanti
@dfi @sabastod Added to my list! I’m using oMLX now, which offers SSD caching too, but Inferencer seems more mature, right?
1 reply · 0 reposts · 0 likes · 300 views
dfi @dfi
@ivanfioravanti @sabastod IMO OpenClaw was unusable for me with local models until I started using a server with prompt SSD caching (Inferencer). Total game changer.
2 replies · 0 reposts · 3 likes · 149 views
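For anyone curious what prompt SSD caching buys you in agent workflows: the prefill of a huge, unchanging system prompt is computed once and its saved state reloaded from disk on every later run. A generic Python sketch follows (it assumes nothing about Inferencer's internals; all names are mine).

import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("kv_cache")
CACHE_DIR.mkdir(exist_ok=True)

def _cache_path(prompt_prefix: str) -> Path:
    # Key the saved KV state on a hash of the exact prompt prefix.
    digest = hashlib.sha256(prompt_prefix.encode()).hexdigest()
    return CACHE_DIR / (digest + ".pkl")

def prefill_with_cache(prompt_prefix: str, prefill_fn):
    path = _cache_path(prompt_prefix)
    if path.exists():  # hit: skip recomputing the prefill entirely
        return pickle.loads(path.read_bytes())
    kv = prefill_fn(prompt_prefix)  # miss: pay the expensive forward pass once
    path.write_bytes(pickle.dumps(kv))
    return kv

# Demo with a stand-in "prefill" that just counts whitespace tokens.
kv = prefill_with_cache("You are a coding agent...", lambda s: {"n_tokens": len(s.split())})
print(kv)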
Ivan Fioravanti ᯅ @ivanfioravanti
@sabastod 90K? I have seen something around 20K, but if you add MCPs then you may be right. Wondering if KV cache on SSD like the one offered by oMLX can help 🤔 Gonna test M5 Max with coding harnesses tomorrow!
1 reply · 0 reposts · 7 likes · 503 views
dfi @dfi
@0xSero This is awesome. I’ve been throwing Codex + local models at getting a version of this working on MLX for Kimi K2.5 (for 256gb ram) for the last 24 hrs. My agents thank you for this repo!
0 replies · 0 reposts · 2 likes · 236 views
0xSero @0xSero
Most stable: 64% REAP. No more runtime errors, only a 70% slowdown, but the model weights fit in VRAM.

36% pre-quantization, the whole model is available, so the only losses are from wrong expert predictions. With Q4 quantization we can theoretically run Qwen3.5-35B near-lossless, needing only ~5GB of VRAM for weights with 65% speed retention on vLLM.

Hopefully this works for real stuff; we'll know more over the week.
4 replies · 3 reposts · 68 likes · 7.2K views
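For context, the selection step of REAP-style expert pruning fits in a few lines. Illustrative Python only (not 0xSero's tool; the real method's saliency criterion and error handling are more involved): score each expert by how much router mass it receives over a calibration set, then keep the top fraction.

import numpy as np

def expert_saliency(router_probs: np.ndarray) -> np.ndarray:
    # router_probs: (n_tokens, n_experts) routing weights from calibration data.
    return router_probs.sum(axis=0)  # total probability mass routed per expert

def prune_experts(expert_weights: list, router_probs: np.ndarray, keep_frac: float = 0.64):
    saliency = expert_saliency(router_probs)
    n_keep = max(1, int(len(expert_weights) * keep_frac))
    keep = sorted(int(i) for i in np.argsort(saliency)[-n_keep:])  # most-used survive
    return [expert_weights[i] for i in keep], keep

rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 8)) for _ in range(16)]
calib_probs = rng.random((1000, 16))
kept, kept_ids = prune_experts(weights, calib_probs)
print(len(kept), kept_ids)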
dfi @dfi
Doing some REAP experimentation with @0xSero's MLX REAP tool. ...and success! Uploaded my first models to Hugging Face! If you needed to squeeze out a few more GB from the Qwen 3.5 35B models on Mac, these are for you. huggingface.co/0xdfi
1 reply · 3 reposts · 24 likes · 18.3K views
dfi @dfi
Lots of hype lately about AI and agentic tools replacing outside counsel. But these takes ignore structural features of our legal system that should have companies pausing before they substitute AI tools for an attorney.

Two recent federal court decisions (among the first to address whether AI-generated communications and outputs are discoverable) come to a clear conclusion: if there's no attorney involved, there may not be attorney-client privilege or work product protection. Those AI outputs may be fair game in litigation.

My colleagues break down the practical takeaways for M&A deal teams in the linked alert. mayerbrown.com/en/insights/pu…
0 replies · 0 reposts · 0 likes · 85 views