ScalaWilliam!

5.2K posts

@ScalaWilliam

Scala expert, AI for coding, data platforms, metadata management, process engineering, TDD. Creator @ScalaAlgorithms. Fan of digital and physical ergonomics.

London, England · Joined November 2015
6.3K Following · 1.1K Followers
Pinned Tweet
ScalaWilliam! @ScalaWilliam
Why am I building scala-algorithms.com? #Scala is growing faster in 2020, but supply must keep up with demand. To help candidates, we need solutions with:
- Standard #Scala (vs Python/C mutable style)
- Consistent explanations & proofs to really learn
- Test-cases for #TDD
[3 images attached]
2 replies · 24 reposts · 98 likes · 0 views
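As a concrete illustration of the contrast above, here is a minimal sketch in standard, immutable Scala with TDD-style test-cases up front. The problem choice (maximum subarray sum) is illustrative only, not taken from the site.

```scala
// A minimal sketch, in standard immutable Scala (no vars, no mutation),
// of the style the site advocates, with test-cases first, TDD-style.
object MaxSubarraySum {
  // Fold carries (bestSoFar, bestEndingHere); requires a non-empty list.
  def apply(nums: List[Int]): Int =
    nums.tail.foldLeft((nums.head, nums.head)) {
      case ((best, endingHere), n) =>
        val extended = math.max(n, endingHere + n)
        (math.max(best, extended), extended)
    }._1

  def main(args: Array[String]): Unit = {
    assert(apply(List(-2, 1, -3, 4, -1, 2, 1, -5, 4)) == 6)
    assert(apply(List(-1, -2, -3)) == -1)
    assert(apply(List(5)) == 5)
    println("All test-cases passed")
  }
}
```

The Python/C mutable style would thread two vars through a loop; the fold makes the carried state explicit, and the assertions double as documentation.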
ScalaWilliam! retweeted
Cory House @housecor
Y'all out there running multiple agents at once scare me. Why? Because it only saves time if I skip reading the code! And I've found human code review remains critical in 2026. So, when my agent is cooking, I don't sit around and wait. I read what it's spitting out and review the code it's generating in real-time. This way I catch issues early, and I actually understand the system later.
36 replies · 9 reposts · 166 likes · 10.9K views
ScalaWilliam! retweeted
0xSero @0xSero
I'm not the only one doing this.
- karpathy: best thought leader, best person to learn from imo. Nanochat is the best way to get into training LLMs; it's the simplest and most digestible source for building your first AI model.
- steipete: This guy's GitHub is a national treasure, and his writing is also very strong. Peekaboo, summarize.sh, openclaw, oracle, just talk to it, etc.: all unique and very useful.
- badlogicgames: Mario's Pi is a staple AI engine and possibly the best, simplest, open-source agentic loop to learn from. Despite what people say about his methods, I think he's going to set some new standards for open-source contribution. Big respect.
- TheAhmadOsman: This man is the GPU king: giveaways and lots of dense educational content around self-hosting and home inference. He's also tight with pretty much all the open-weight labs and has them on for interviews regularly.
- sudoingX: This is an up-and-comer who will change the game; he's pushing the limits of what a single GPU can do.
- Ex0byt: I can confidently say this man will be fundamental in making local inference on massive models possible.
- alexinexxx: I genuinely feel motivated by her drive. She's a real hard worker learning about GPU kernel programming. Also good aesthetics.
- gospaceport: I would not have gotten into building my own hardware without this man's hard work. He's taught me so much about hardware and the economics of this. He also has the most impressive homelabs I've ever seen.
- alexocheema: The founder of Exolabs, pioneering Apple hardware inference; he's also very engaged in the community and a good guy all around. If you are interested in Mac minis and Mac Studios, this is your guy.
- nummanali: This guy is so prolific; he's made tons of CLI tools for managing LLM subscription budgets, using Claude Code with alternative models, etc.
- thdxr: The entire Opencode team is wonderful, but Dax specifically is a good writer. More anti-doomer content to soothe your anxieties.
- juliarturc: If you are interested in the science, Julia's channel is where it's at. Almost everything I've learned about LLM compression has been from her.
- Teknium: The Nous Research & Prime Intellect teams are both some of the most hard-working and principled people around. Tough fight in an industry so aggressive.
- victormustar: Head of Product for Huggingface, enabling us all to publish our work.
- louszbd: Head of community at ZAI, some of the top LLMs available right now that are open weights. They supercharged the movement.
- SkylerMiao7: Making frontier intelligence fit on 10k USD of hardware. Via MiniMax.
- crystalsssup: Building the best open-weight model on the market, and releasing their latest research before their next-gen model.
Believe it or not, these people are carrying the entire industry and giving us a fighting chance.
[image attached]
64 replies · 350 reposts · 4.4K likes · 167.8K views
ScalaWilliam! @ScalaWilliam
Why is it that for server-side, I prefer Scala over Python, but on the client-side, I prefer JavaScript over TypeScript?
3 replies · 0 reposts · 2 likes · 1.6K views
Reneil @reneil1337
Mind blown: just installed an RTX 4000 Pro SFF via Oculink (M.2 adapter) on my LattePanda Sigma board, which is my homelab @coolifyio server. How? Connected via SSH and told my main GPU cluster, via opencode, to set up the new GPU. It did the entire setup within 10 mins, 100% local 🤘
[2 images attached]
2 replies · 1 repost · 14 likes · 646 views
ScalaWilliam! @ScalaWilliam
Provocative thought: if we made APIs simpler and self-describing, what would agent skills look like?
0 replies · 0 reposts · 0 likes · 62 views
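One hedged reading of "self-describing", sketched in Scala: each endpoint carries its own machine-readable description, so an agent could discover it as a skill without consulting a separate spec. All names here (SkillDescription, ConvertCurrency) are hypothetical, not from the tweet.

```scala
// Sketch: the endpoint describes itself, so an agent can treat it as a skill.
final case class SkillDescription(
  name: String,
  params: Map[String, String], // param name -> type, as plain strings
  summary: String
)

trait SelfDescribing {
  def describe: SkillDescription
}

object ConvertCurrency extends SelfDescribing {
  val describe: SkillDescription = SkillDescription(
    name = "convertCurrency",
    params = Map("amount" -> "Double", "from" -> "String", "to" -> "String"),
    summary = "Convert an amount between two ISO-4217 currencies."
  )
  // Stub implementation; a real service would look the rate up.
  def apply(amount: Double, from: String, to: String): Double = amount * 1.08
}

object AgentDiscovery {
  def main(args: Array[String]): Unit =
    // An agent enumerates available skills by asking each endpoint directly:
    Seq(ConvertCurrency).map(_.describe).foreach(println)
}
```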
ScalaWilliam! @ScalaWilliam
In our age of Agentic AI we are faced with a new divide: that of the Operator and the Supervisor; the user who wants to do everything by hand versus the user who wants to let the 'vibes' do it. Frankly, you have to go with a mixture of both. Ignore vibing (supervising) and you've just lost time. Ignore operating and you've lost understanding.
0 replies · 1 repost · 1 like · 134 views
ScalaWilliam! @ScalaWilliam
The problem was not that we didn't need APIs (we did and still do); it's that we reinvent them every single time, instead of abstracting them once and for all. Even in this "AI Native" day, your starting point is a Swagger file instead of a simple RPC interface.
0 replies · 0 reposts · 0 likes · 23 views
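A minimal sketch, under the assumption that "a simple RPC interface" means the language's own interface construct: the Scala trait is the contract, shared by server and client, with no spec file to regenerate. The service names are illustrative.

```scala
// The trait IS the API contract; server and client both depend on it.
trait UserService {
  def getUser(id: Long): Option[String]
  def createUser(name: String): Long
}

// Server side: implement the trait directly.
final class InMemoryUserService extends UserService {
  private var users  = Map.empty[Long, String]
  private var nextId = 0L
  def getUser(id: Long): Option[String] = users.get(id)
  def createUser(name: String): Long = {
    nextId += 1
    users += nextId -> name
    nextId
  }
}

// Client side: code against the same trait; only the transport
// (HTTP, gRPC, in-process) would differ behind the scenes.
object Demo {
  def main(args: Array[String]): Unit = {
    val service: UserService = new InMemoryUserService
    val id = service.createUser("Ada")
    println(service.getUser(id)) // Some(Ada)
  }
}
```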
ScalaWilliam! @ScalaWilliam
Remember "API-first", and how we ended up with incredibly dull applications?
1 reply · 0 reposts · 0 likes · 61 views
ScalaWilliam! @ScalaWilliam
This. Just because you CAN, does not mean you SHOULD.

Non-programmers are spending their weekends building software projects, which is a huge enabler for the abstract-minded because they get early feedback and validation. I am finding this new avenue of creation absolutely fascinating, as the abstract-minded previously had no avenue of expression, often crashing against the concrete-minded who never seem to understand them.

The most amazing thing is that the stuff they come up with is often actually GOOD (on the surface), and done even faster than by the concrete-minded. But it lacks the last-mile closure, which concrete-minders are awesome at. Concrete-minders, on the other hand, are not that great at the bigger picture.

AI is accelerating both the abstract-minded and the concrete-minded, and the power dynamic and maximum influence emerge from power-pairings of abstract + concrete thinkers who are both AI-enabled, which accelerates both the bigger picture (abstract, big leaps, feely) and the last mile (concrete, tedious, precise). If you have seen any of these pairings, I want to hear from them and learn from their experiences!
Quoting Zara Zhang @zarazhangrui:

Almost every AI power user I know is MORE stressed and busier after using AI, not less. What people thought AI would do: 10x productivity so that we can finish work earlier & relax more. What it's actually doing: 10x productivity so that we end up with 20x more things to do, cos of the sheer possibilities.

0 replies · 0 reposts · 0 likes · 65 views
ScalaWilliam! @ScalaWilliam
Another note on user interaction, and this is a hugely common problem in finance user interfaces: the Advanced mode and the Simplified mode. The right answer is to come up with a user interface that satisfies both beginners and experts alike, not to create multiple variations of the same tool. Different modes should be different tools, and different tools should serve different purposes and scenarios. If you can't get both experts and beginners to use your software at the same time under the same scenarios, without affecting their workflows, there's a problem and you need a deep UX expert. Deep UX experts are such amazing people to work with.
1 reply · 0 reposts · 1 like · 17 views
ScalaWilliam! @ScalaWilliam
Did you know who made Copy-Paste? It's en.wikipedia.org/wiki/Larry_Tes… He was also against modal software, and guess what counts as modal software? Emacs and Vi. No wonder I never liked either, and never understood the whole Emacs vs Vi debate.
1 reply · 0 reposts · 1 like · 80 views
Yoel Nisanov @YoelNisanov
@AdamMGrant I’d add restraint to that list. When ideas are cheap, the real work is saying no to 90% of them and going deep on the one that still feels solid a month later. Taste picks the direction, tenacity carries it, but restraint keeps you from chasing every shiny output.
2 replies · 0 reposts · 7 likes · 343 views
Adam Grant @AdamMGrant
The most important skill for creativity is no longer original thinking. It’s taste and tenacity. In the age of AI, ideas are abundant. Good judgment and execution are scarce. The future belongs to those who excel at finding and amplifying the signal in the noise.
208 replies · 298 reposts · 1.6K likes · 85.5K views
ScalaWilliam! @ScalaWilliam
Are our devices designed to make us Passengers rather than Pilots? Came across this piece by Bret Victor (@worrydream): a wonderful read. worrydream.com/ABriefRantOnTh… "That's the fundamental gesture in this technology. Sliding a finger along a flat surface. There is almost nothing in the natural world that we manipulate in this way."
0 replies · 0 reposts · 0 likes · 46 views
Eric @Ex0byt
Kimi-K2.5 (1T-parameter MoE) running coherently on 25 GB of GPU memory (on a unified 128 GB machine)!
[image attached]
36 replies · 23 reposts · 564 likes · 121.3K views
Eric @Ex0byt
Exciting experiment update: we ran StepFun_ai's Step-3.5-Flash (197B MoE) on 6.29 GB of GPU memory! Flat. Zero growth. Same footprint at token 1 as at token 100. The model's weights are ~105 GB INT4 (394 GB original bf16!). We're running it on 6.29 GB: 1/16th the weight footprint, flat across every token.

How:
- The non-expert skeleton (6.1 GB), separated from the experts, lives permanently on GPU
- A 66.8 MB staging buffer: 8 expert slots, overwritten every layer
- 12,096 unique experts (36,288 weight matrices) stay off-GPU until the router selects them
- Router picks. DMA fires. Buffer overwrites. Nothing accumulates.

The invariant held across every token:
- GPU after token 1: 6,286 MB
- GPU after token 100: 6,286 MB
- Delta: 0.0 MB

Correctness: 3/3 PASS (reasoning, religion, coding). Ceiling: 15.6 tok/s (on my single-GPU hardware). The architecture is model-agnostic. Any MoE. Any size!

Shoutout to my dude 0xSero. We've been trading notes all week. He's got Kimi K2.5 running across 8×3090s! While we took different journeys on different hardware, we share the same obsession. Amazing collab. More soon..
[image attached]
37 replies · 37 reposts · 516 likes · 46.1K views
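A structural sketch of the staging-buffer idea described above, simulated in plain Scala (host memory only, no real GPU or DMA); the slot count matches the tweet, but the layer count, expert sizes, and routing function are stand-ins, not the real system's.

```scala
// Simulation of expert streaming: a fixed staging buffer holds only the
// experts the router selects for the current layer, and is overwritten on
// the next, so the resident footprint is identical at token 1 and token 100.
object ExpertStreaming {
  final case class Expert(id: Int, weights: Array[Float]) // stays "off-GPU"

  val Slots   = 8                         // fixed on-"GPU" expert slots
  val Layers  = 4                         // stand-in layer count
  val staging = new Array[Expert](Slots)  // overwritten in place, never grows

  // All experts live in host memory until the router selects them.
  val hostExperts: Vector[Expert] =
    Vector.tabulate(128)(i => Expert(i, Array.fill(16)(i.toFloat)))

  // Stand-in router: picks Slots expert ids per (layer, token).
  def route(layer: Int, token: Int): Seq[Int] =
    Seq.tabulate(Slots)(s => (layer * 31 + token * 17 + s) % hostExperts.size)

  // "DMA" the selected experts into the staging buffer, overwriting slots.
  def loadLayer(layer: Int, token: Int): Unit =
    route(layer, token).zipWithIndex.foreach { case (expertId, slot) =>
      staging(slot) = hostExperts(expertId)
    }

  def main(args: Array[String]): Unit = {
    for (token <- 1 to 100; layer <- 0 until Layers) loadLayer(layer, token)
    // The invariant: exactly Slots experts resident, at any token count.
    println(s"Resident experts after 100 tokens: ${staging.count(_ != null)}")
  }
}
```

The point the simulation makes is the invariant: after any number of tokens, exactly Slots experts are resident, because selection overwrites rather than accumulates.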
Alex Ziskind @digitalix
extreme performance: 150 TB/s bandwidth with Groq LPU
[image attached]
28 replies · 40 reposts · 701 likes · 36.9K views
ScalaWilliam! @ScalaWilliam
@juristr Between L6 and L7. The exception is that I don't do YOLO, because YODO (I will pre-approve certain types of commands, but not EVERYTHING). L8 is only possible in non-critical work IMO.
0 replies · 0 reposts · 0 likes · 226 views
Juri Strumpflohner @juristr
What's your AI adoption level? (according to Steve Yegge)
[image attached]
290 replies · 91 reposts · 988 likes · 2.2M views
ScalaWilliam! @ScalaWilliam
Opus 4.6 on Gemini (Thinking):
[2 images attached]
0 replies · 0 reposts · 1 like · 88 views
Jason @foley2k2
Figured out the settings for Nemotron Super 120b on my 96 GB system: load all model layers, with 77 MoE layers pushed to CPU. MoE layers on this model are very RAM-hungry; 68 GB of system RAM used. Comparable speed to Qwen 3.5 122b.
[image attached]
2 replies · 0 reposts · 5 likes · 138 views