alex rudloff

401 posts

@alexrudloff

Business + Product + Tech + Art

Beach in FL / Mountains in NC · Joined October 2006
884 Following · 6.5K Followers
alex rudloff @alexrudloff ·
I’m not going to blindly believe any LLM intelligence breakthrough unless it’s made by an actress with one GitHub commit
0 replies · 0 retweets · 0 likes · 13 views
alex rudloff @alexrudloff ·
This is becoming a familiar pattern. Thanks codex.
[attached image]
0 replies · 0 retweets · 0 likes · 36 views
alex rudloff retweeted
dealign.ai @dealignai ·
The secret? Native macOS memory compression. Used with the perfect timing, you can compress unused routed experts. I'm now getting a literal 70% decrease in RAM usage for every single model, with no hit to coherency or speed. I have a feeling this is going to be really big.
6 replies · 2 retweets · 42 likes · 1.1K views
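The mechanism this post leans on is at least observable: macOS reports memory-compression activity through `vm_stat`. A minimal sketch of reading those counters, saying nothing about the model-level claim — the sample output below uses made-up numbers purely for illustration:

```python
# Parse macOS `vm_stat`-style output to see how much RAM the compressor
# currently occupies and how many logical pages it packs per physical page.
import re

# Sample `vm_stat` output (real field names, illustrative numbers).
SAMPLE = """\
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                              102345.
Pages occupied by compressor:            524288.
Pages stored in compressor:             1835008.
"""

def compressor_stats(vm_stat_text):
    """Return (ram_used_by_compressor_gb, compression_ratio)."""
    page_size = int(re.search(r"page size of (\d+) bytes", vm_stat_text).group(1))
    occupied = int(re.search(r"Pages occupied by compressor:\s+(\d+)", vm_stat_text).group(1))
    stored = int(re.search(r"Pages stored in compressor:\s+(\d+)", vm_stat_text).group(1))
    ram_gb = occupied * page_size / 2**30
    ratio = stored / occupied  # logical pages packed per physical page
    return ram_gb, ratio

ram_gb, ratio = compressor_stats(SAMPLE)
print(f"{ram_gb:.1f} GB of RAM holds {ratio:.1f}x that much logical memory")
```

On a real Mac you would feed `compressor_stats` the live output of `vm_stat`; the post's 70% figure would only hold if the compressed pages really were cold expert weights that never need fast decompression.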
alex rudloff retweeted
Lisa Forte @LisaForteUK ·
Learning lessons from Jurassic Park
[attached image]
56 replies · 873 retweets · 10.4K likes · 570.5K views
alex rudloff retweeted
Sandro @pupposandro ·
We just released something new: Luce PFlash

Long-context prefill is a silent killer for throughput speed. llama.cpp takes ~257 seconds to prefill 128K tokens of Qwen3.6-27B on a single RTX 3090. So we tried to solve the problem.

A small Qwen3-0.6B drafter loads in-process, scores token importance across the whole prompt, and the heavy 27B target only prefills the spans that matter. 128K prompt in 24.8 seconds, ~10.4x faster TTFT, NIAH retrieval preserved at every measured context.

It is a clean C++/CUDA port of FlashPrefill wired through Block-Sparse Attention, with a custom Qwen3-0.6B BF16 forward so drafter and target share one ggml allocator. The whole thing is a single daemon command (compress) in front of the existing dflash spec-decode stack.

More details here: github.com/Luce-Org/luceb…
[attached GIF]
Quoted post — Sandro @pupposandro: x.com/i/article/2050…
39 replies · 90 retweets · 707 likes · 113K views
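The core idea — a cheap drafter scores every prompt token so the expensive target only prefills the spans that clear a threshold — can be sketched in a few lines. This is a toy illustration, not Luce's actual implementation; the function names and the linear-cost assumption are mine:

```python
# Toy sketch of drafter-guided selective prefill (not Luce's real code).
def select_spans(scores, threshold):
    """Merge indices of important tokens into contiguous [start, end) spans."""
    spans, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(scores)))
    return spans

def prefill_speedup(scores, threshold):
    """Ideal TTFT speedup under the naive assumption that prefill cost
    is linear in the number of tokens kept."""
    kept = sum(end - start for start, end in select_spans(scores, threshold))
    return len(scores) / max(kept, 1)

# Drafter scores for a 10-token prompt: only two spans matter.
scores = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.7, 0.9, 0.1, 0.1]
print(select_spans(scores, 0.5))    # [(0, 2), (6, 8)]
print(prefill_speedup(scores, 0.5))  # 2.5
```

Under this naive linear model, the reported 257 s → 24.8 s (~10.4x) would correspond to the target prefilling roughly a tenth of the prompt tokens; real attention costs are not linear, so the actual mechanics are more involved.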
alex rudloff retweeted
AboveSpec @above_spec ·
"You need a 24 GB GPU for serious local LLMs in 2026." Everyone repeats this. It's not true anymore.

Just ran a 35B-parameter model on an RTX 4060 Ti 8 GB:
• 41 tok/s at 16k context
• 24 tok/s at 200k context

Recipe + benchmarks below 🧵
[attached image]
134 replies · 232 retweets · 2.8K likes · 273.7K views
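The thread's actual recipe isn't reproduced above, but the standard way to run a model bigger than VRAM is heavy quantization plus partial layer offload to the GPU. A back-of-the-envelope sketch — every number here (layer count, bit width, reserve) is my own illustrative assumption, not AboveSpec's:

```python
# Estimate how many transformer layers of a quantized model fit in VRAM.
def layers_on_gpu(n_params_b, n_layers, bits_per_weight, vram_gb, reserve_gb=1.5):
    """How many equal-sized layers fit after reserving VRAM for
    activations and KV cache."""
    weights_gb = n_params_b * bits_per_weight / 8  # 1B params at 8 bits ≈ 1 GB
    per_layer_gb = weights_gb / n_layers
    budget_gb = vram_gb - reserve_gb
    return min(n_layers, int(budget_gb / per_layer_gb))

# 35B model, 64 layers, 3-bit quantization, 8 GB card:
print(layers_on_gpu(35, 64, 3.0, 8.0))  # 31 — roughly half the layers on GPU
```

So even at 3 bits, a 35B model (~13 GB of weights) only half-fits in 8 GB; the remaining layers run from system RAM, which is consistent with throughput dropping as context grows.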
alex rudloff retweeted
Daniel Jeffries @Dan_Jeffries1 ·
All the Doomers and hawks are lining up behind this distillation "attack" farce because they want to see open source banned. It's really as simple as that. They want to take away your right to choose, and take away businesses' rights to fine-tune and make your products cheaper and better.

The end state here, if we let these short-sighted people win, is a horrible place for America: they will look to ban Chinese models under the guise of national security and conveniently leave only proprietary American companies standing in the USA. The only real "attack" happening is closed-source companies attacking open source the same way Microsoft once attacked Linux to create regulatory capture. "If you can't win in the market, win in Washington" is their strategy.

Do not be fooled by this regulatory capture and saber rattling nonsense. It's a bait and switch. The goal is to rob you of choice. That's it. These short-sighted policies will make America weaker, not stronger.

These are the very folks whose shoot-us-in-the-foot policies lost NVIDIA its market share in China, driving it to basically 0%, while kickstarting the moribund Chinese chip ecosystem. It was dead in the water, and now it has awakened from its deep slumbers. Old state-sponsored dinosaurs are reborn as emerging chip powerhouses. The demand for Chinese chips is accelerating, and it will only get stronger.

When Jensen is proven right a few years from now (he's the best long-term thinker in business today), and you have hundreds of cheap Chinese models running optimized on Chinese chips, and those models are hard to run on NVIDIA hardware, you can thank these folks. If you're banned in the USA from using these models and these chips, do you think the rest of the world will be? Nope. They'll happily adopt the cheaper, faster, good-enough models that we kickstarted with our short-sightedness.

1 billion people in the West will be banned from open models and stuck using closed/gated/sluggish/censored/surveilled models that destroy your privacy, while 6 billion other people use the now-dominant Chinese ecosystem and your NVIDIA retirement shares lose money. When you can't use open source anymore because it gets banned for Americans, you can thank these short-sighted, foolish folks. When your API bill is a billion dollars and burns your budget in three months instead of 12, you can thank these folks. When all your intimate, personal data flows through a few tight gateways and choke points mandated by law, you can thank these folks.
Quoted post — Chris McGuire @ChrisRMcGuire:

Sorry but that just isn’t true—distillation attacks are illicit activity, not an industry standard. They are against the terms of service of all frontier AI labs. There is a reason OpenAI, Anthropic, and Google all put out reports warning about it: none of them do it.

17 replies · 38 retweets · 154 likes · 47.9K views
Ivan Raszl @iraszl ·
@alexrudloff @jun_song It’s based on discussion with people who tried running locally. Not AI. The issue apparently is that you can’t dedicate all the memory in your Mac to the LLM, because at the very least you need to run the OS and typically a few other apps.
1 reply · 0 retweets · 0 likes · 131 views
Ivan Raszl @iraszl ·
Thinking of running a local LLM on a new MBP? Here is the level of intelligence you can get with various memory configurations on open models:
🐹 16–24GB RAM → ≈ GPT-3.5
🐕 32–48GB RAM → ≈ higher-end GPT-3.5
🐅 64GB RAM → ≈ lower-end GPT-4
🐉 96–128GB RAM → ≈ mid-tier GPT-4
All still below newer GPT or Claude models.
[attached image]
51 replies · 6 retweets · 171 likes · 38.6K views
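One way to sanity-check tiers like these is plain footprint arithmetic: quantized weights need roughly params × bits ÷ 8 bytes, and, as the reply above notes, the OS and other apps claim part of the unified memory. This is my own rough sketch; the 6 GB overhead figure is an assumption, and KV cache would eat further into the budget:

```python
# Rough check: do the quantized weights of a model fit in a Mac's unified
# memory once OS overhead is subtracted?
def fits_in_ram(n_params_b, bits_per_weight, ram_gb, os_overhead_gb=6.0):
    """True if the quantized weights fit after reserving RAM for the OS."""
    weights_gb = n_params_b * bits_per_weight / 8
    return weights_gb <= ram_gb - os_overhead_gb

# A 70B model at 4-bit needs ~35 GB of weights:
print(fits_in_ram(70, 4, 48))  # True  — fits on a 48 GB Mac
print(fits_in_ram(70, 4, 32))  # False — too big for a 32 GB Mac
```

This lines up with the shape of the tier list: each RAM step roughly doubles the largest model (at a fixed quantization) you can hold resident.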
alex rudloff retweeted
Gergely Orosz @GergelyOrosz ·
OpenClaw - the agentic software spreading like wildfire - was built on top of Pi, a minimalist, self-modifying agent. I sat down with Pi's creator, @badlogicgames, and longtime Pi user (+ the creator of Flask) @mitsuhiko to talk Pi, and their (very grounded!) takes on building with AI.

Timestamps:
00:00 Intro
07:30 How Mario, Armin, and Peter Steinberger met
15:15 How 30 dev teams use AI agents: learnings
21:50 The importance of judgment
24:26 Challenges when non-engineers write code
28:30 Downsides of over-automation
32:18 Pi
48:09 OpenClaw + Pi
50:54 “Clankers”
57:32 Open source and AI
1:00:22 Complexity as the enemy
1:02:50 Building an AI-native startup
1:11:52 “Slow the F down”
1:16:40 MCPs vs. CLI
1:25:03 Predictions and staying up to date

• YouTube: youtu.be/n5f51gtuGHE
• Spotify: open.spotify.com/episode/1fDw9c…
• Apple: podcasts.apple.com/us/podcast/bui…

Brought to you by:
• @statsig – The unified platform for flags, analytics, experiments, and more. statsig.com/pragmatic
• @SonarSource – The makers of SonarQube, the industry standard for code verification and automated code review. Try it out for yourself. sonarsource.com/plans-and-pric…
• @WorkOS – WorkOS gives you APIs to ship enterprise features – SSO, directory sync, RBAC, audit logs – in days, not months. Visit WorkOS.com to learn more.

---

Three parts I found especially interesting in this discussion:

1. New trend: AI makes it harder for senior engineers to reject pointless complexity. Historically, senior engineers kept software complexity at bay simply by saying “no” a lot. But Armin observes that these days, more junior engineers and product managers deploy agent-scripted counterarguments when a senior colleague kicks an idea to the curb. This makes decision-making exhausting, and more bad ideas make it into production as a result.

2. It should be MUCH easier to build specialized tools for specific tasks. Different projects need different harness types because, as Mario points out, the same hammer is not ideal for every single construction job. As such, Pi is built with the goal of allowing the creation of specialized harnesses. It can modify itself so that a user can create the bespoke harness needed for any task. Mario believes it’s a preview of how self-modifiable software might look in the future.

3. Automation bias is one of the biggest risks of working with AI agents. Once devs confirm that an AI agent can produce acceptable code, they start to review its output less often, even though agents can – and do! – produce slop. Mario advises being far more sceptical with agents, and cautions that the quality of their output isn’t guaranteed, however well they performed previously.
20 replies · 115 retweets · 1.2K likes · 169.5K views
alex rudloff retweeted
Aaron Levie @levie ·
Will keep saying this, but software jobs aren’t going away. Agents are the single biggest form of leverage for anyone technical in history. There has probably never been a better time to be technical in terms of being able to accomplish something solo, in a team, or as a company.

We think that most of the world’s software has already been built and that agents will just reduce work from an existing pie. In fact, we are about to experience 100X more software than before. Think about how many apps you regularly use that need to get better. How many legacy on-prem systems have to get replatformed for the cloud. How many SMBs never could hire developers. How many security issues are about to be uncovered and need to get patched. How many IT organizations are about to bring automation to workflows they never could have automated. How much data is about to be processed and connected in most organizations.

This is all what the agents will be working on. And every one of those agents will need a person to kick them off, manage their work, orchestrate them, and get their output into a workable and useful form. That person will generally need to be technical (or become technical quickly), and this will create a huge amount of opportunity for anyone up to the task.
Quoted post — Shay Boloor @StockSavvyShay:

$AMZN AWS CEO pushed back on the idea that AI is killing software jobs by saying Amazon is hiring as many developers as ever. He said AI agents are “exploding” across every industry & moving faster than expected changing the developer job rather than eliminating it.

61 replies · 92 retweets · 628 likes · 104.8K views
alex rudloff retweeted
Brooks Otterlake @i_zzzzzz ·
This is just like being alive in the 1600s when they got good at making complicated clocks and deduced that every complicated thing in the universe probably functioned exactly like a clock
Quoted post — Dwarkesh Patel @dwarkesh_sp:

There's a quadrillion-dollar question at the heart of AI: why are humans so much more sample-efficient than LLMs?

There are three possible answers:
1. Architecture and hyperparameters (aka transformer vs whatever ‘algo’ cortical columns are implementing)
2. Learning rule (backprop vs whatever the brain is doing)
3. Reward function

@AdamMarblestone believes the answer is the reward function. ML likes to use pretty simple loss functions, like cross-entropy. These are easy to work with. But they might be too simple for sample-efficient learning. Adam thinks that, in humans, the large number of highly specialised cells in the ‘lizard brain’ might actually be encoding information for sophisticated loss functions, used for ‘training’ the more sophisticated areas like the cortex and amygdala.

Like: the human genome is barely 3 gigabytes (compare that to the TBs of parameters that encode frontier LLM weights). So how can it include all the information necessary to build highly intelligent learners? Well, if the key to sample-efficient learning resides in the loss function, even very complicated loss functions can still be expressed in a couple hundred lines of Python code.

107 replies · 1K retweets · 13.1K likes · 805.7K views
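The closing claim in the quoted post — that even very complicated loss functions fit in a couple hundred lines — is easy to make concrete. Here is a toy composite loss entirely of my own construction, mixing cross-entropy with two hand-crafted shaping terms in a handful of lines:

```python
# Toy composite loss: cross-entropy plus an entropy bonus (discourage
# overconfidence) and a margin bonus (reward separating top-1 from top-2).
# The point is only that elaborate reward shaping stays tiny as code.
import math

def composite_loss(probs, target, confidence_penalty=0.1, margin_bonus=0.05):
    """Cross-entropy minus weighted entropy and top-1/top-2 margin terms."""
    ce = -math.log(probs[target])
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    top2 = sorted(probs, reverse=True)[:2]
    margin = top2[0] - top2[1]
    return ce - confidence_penalty * entropy - margin_bonus * margin

probs = [0.7, 0.2, 0.1]
print(round(composite_loss(probs, 0), 4))
```

A genome-scale version could stack many such terms, each tuned to a different drive, and still be minuscule next to terabytes of weights — which is the asymmetry the post is pointing at.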