Kent Worcester

191 posts

@KentWorcester

Building @ValiantTrade on @Fogo | prev: @fortresslabs

Joined March 2022

155 Following, 293 Followers

Kent Worcester retweeted

Georgi Gerganov@ggerganov·2d

llama.cpp at 100k stars

now that 90% of the code worldwide is being written by AI agents, I predict that within 3-6 months, 90% of all AI agents will be running locally with llama.cpp 😄

Jokes aside, I am going to use this small milestone as an opportunity to reflect a bit on the project and the state of AI from the perspective of local applications.

There is a lot to say and discuss and yet it feels less and less important to try to make a point. Opinions about viability of local LLMs are strongly polarized, details are overlooked, the scientific approach is lacking. Arguments are predominantly based on vibes and hype waves. One thing is clear though - local LLMs are used more and more. I expect this trend to continue and likely 2026 will end up being one of the most important years for the local AI movement.

I admit that I didn't expect the agentic era to come so quickly to the local LLM space. One year ago, the available models were too computationally expensive for doing long-context tasks. There wasn't an obvious path towards meaningful agentic applications. The memory and compute requirements were huge. Last summer, with the release of gpt-oss, things started to change. It was the first time we saw a glimpse of tool calling that actually works well within the resource constraints of our daily devices. Later in the year, even better models were released and by now, useful local agentic workflows are a reality.

Comparing local vs hosted capabilities at a given moment of time is pointless. To try put things into perspective:

- We don't need frontier intelligence to automate searches and sending emails
- We don't need trillion parameter models to be able to summarize articles or technical documents
- We don't need massive GPU data centers to control our home appliances or turn the lights off in the garage

I believe that there is a certain level of intelligence we as humans can comprehend and meaningfully utilize to improve our working process. Beyond that level, access to more intelligence becomes unnecessary at best and counterproductive at worst. I also believe that that level of useful artificial intelligence is completely within reach locally and it has always been just a matter of implementing the right software stack to bring it to the end user. With llama.cpp, I am confident that we continue to be on the right track of building that software stack!

The llama.cpp project is going stronger than ever. With more than 1500 contributors, the project keeps growing steadily. From technical point of view, I think that llama.cpp + ggml is the only solution that actually makes sense. That is, the software stack must run efficiently on every possible device, hardware and operating system. The technology is too important to be vendor-locked. It has to be developed in the open, by the community, together with the independent hardware vendors. This is the only right way to build something that will truly make a difference in the long run.

I won't try to convince you about what is currently and will be possible with local AI. We will just continue to build as usual. I am confident that after the smoke clears and we look objectively at what we have built together, the benefits will be obvious to everyone.

Big shoutout to all llama.cpp maintainers. I feel extremely lucky to be able to work together with so many talented contributors. Every day I learn something new and I feel there is so much more cool stuff that we are going to build. Also, I am really thankful that the project continues to have reliable partners to support it!

Cheers!

Kent Worcester retweeted

Aakash Gupta@aakashgupta·2d

The timeline on this is genuinely insane.

October 2025: Sam Altman flies to Seoul and signs simultaneous deals with Samsung and SK Hynix for 900,000 DRAM wafers per month. That's 40% of global supply. Neither company knew the other was signing a near-identical commitment at the same time.

Those deals were letters of intent. Non-binding. No RAM actually changed hands. But the market treated them as gospel. Contract DRAM prices jumped 171%. A 64GB DDR5 kit went from $190 to $700 in three months.

December 2025: Micron kills Crucial, its 29-year-old consumer memory brand, to reallocate every wafer to AI and enterprise customers. The company explicitly said it was exiting consumer memory to "improve supply and support for our larger, strategic customers in faster-growing segments." Translation: the AI demand signal was so loud that selling RAM to PC builders stopped making financial sense.

March 2026: Google publishes TurboQuant, a compression algorithm that reduces AI memory requirements by 6x with zero accuracy loss. Cloudflare's CEO called it "Google's DeepSeek." The entire thesis that AI would consume infinite memory forever just got a six-month expiration date on it.

Same month: OpenAI and Oracle cancel the Abilene Stargate expansion. The $500 billion data center vision that justified the RAM deals couldn't survive its own financing terms. Bloomberg attributed the collapse partly to OpenAI's "often-changing demand forecasting."

MU is now down ~33% from its post-earnings high. Revenue up 196% year over year, EPS up 682%, and the stock is in freefall because the company restructured its entire business around a demand signal that came from non-binding letters and is now being compressed out of existence by a research paper.

Micron bet the consumer division on Sam Altman's signature. The signature was worth exactly what the paper said: nothing binding.

Grummz@Grummz

Imagine closing your entire consumer memory division because this guy signed a non binding letter that he would buy 40% of the world’s RAM. Only to have him rug pull 3 months later.

Kent Worcester@KentWorcester·1d

Kent Worcester@KentWorcester·1d

👀

H@hcompany_ai

Holo3 is here 🚀. Today, we're launching Holo3: our new series of frontier computer-use models. 78.9% on OSWorld-Verified. That puts us ahead of GPT-5.4 and Opus 4.6, at one-tenth of the cost. Weights on Hugging Face. API is live. Test it now! #Holo3 #OpenSource #ComputerUse #OSWorld #AI #AgenticAI

Kent Worcester retweeted

Qwen@Alibaba_Qwen·2d

🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI.

Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction.

A standout feature: 'Audio-Visual Vibe Coding'. Describe your vision to the camera, and Qwen3.5-Omni-Plus instantly builds a functional website or game for you.

Offline Highlights:
🎬 Script-Level Captioning: Generate detailed video scripts with timestamps, scene cuts & speaker mapping.
🏆 SOTA Performance: Outperform Gemini-3.1 Pro in audio and matches its audio-visual understanding.
🧠 Massive Capacity: Natively handle up to 10h of audio or 400s of 720p video, trained on 100M+ hours of data.
🌍 Global Reach: Recognize 113 languages (speech) & speaks 36.

Real-time Features:
🎙️ Fine-Grained Voice Control: Adjust emotion, pace, and volume in real-time.
🔍 Built-in Web Search & complex function calling.
👤 Voice Cloning: Customize your AI's voice from a short sample, with engineering rollout coming soon.
💬 Human-like Conversation: Smart turn-taking that understands real intent and ignores noise.

The Qwen3.5-Omni family includes Plus, Flash, and Light variants.

Try it out:
Blog: qwen.ai/blog?id=qwen3.…
Realtime Interaction: click the VoiceChat/VideoChat button (bottom-right): chat.qwen.ai
HF-Demo: huggingface.co/spaces/Qwen/Qw…
HF-VoiceOnline-Demo: huggingface.co/spaces/Qwen/Qw…
API-Offline: alibabacloud.com/help/en/model-…
API-Realtime: alibabacloud.com/help/en/model-…

Kent Worcester retweeted

Cheng Lou@_chenglou·5d

My dear front-end developers (and anyone who’s interested in the future of interfaces): I have crawled through depths of hell to bring you, for the foreseeable years, one of the more important foundational pieces of UI engineering (if not in implementation then certainly at least in concept): Fast, accurate and comprehensive userland text measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurements and reflow

Kent Worcester retweeted

Delicious Tacos@Delicious_Tacos·5d

Hearing a company that regularly releases new AI models will release a new AI model

They’re saying it’s better than the last one

This is huge

Kent Worcester@KentWorcester·5d

yup

RoyalCities@RoyalCities

Maybe I’ve been living under a rock, but when did open source video models get this good? This is LTX 2.3…and yeah, it’s not hard to guess what it’s trained on. Still wild this runs locally. No wonder Sora got shut down.

Kent Worcester retweeted

“paula”@paularambles·26 Mar

“this is a significant refactor” just put the tokens in the bag lil bro

Kent Worcester retweeted

ComfyUI@ComfyUI·26 Mar

Upgrading your RAM is now unnecessary. Introducing our new ComfyUI Dynamic VRAM optimization. Running local models is now possible on even the most memory constrained hardware. Read more here: blog.comfy.org/p/dynamic-vram…

Kent Worcester@KentWorcester·21 Mar

qwen3.5:9b (6GB)

stevibe@stevibe

Got a 24GB Graphics Card? These 6 coding models all fit on it (Q4):

- qwen3.5:27b (17GB)
- qwen3.5:35b (24GB)
- glm-4.7-flash (19GB)
- nemotron-3-nano:30b (24GB)
- nemotron-cascade-2:30b (24GB)
- gpt-oss:20b (14GB)

I gave them the same challenge: draw a campfire with HTML Canvas.

Why Canvas? HTML/CSS forgives bad syntax: things still render. JavaScript + Canvas doesn't: one mistake and the screen goes black.

Kent Worcester@KentWorcester·21 Mar

sent via hermes agent

Kent Worcester@KentWorcester·21 Mar

Figuring out the right parameters is the key. I'm still not sure what temp, top-p, and top-k to use. Or if I should be using thinking mode.

Sudo su@sudoingX

x.com/i/article/2034…

Kent Worcester retweeted

Sudo su@sudoingX·20 Mar

x.com/i/article/2034…

Kent Worcester retweeted

Unsloth AI@UnslothAI·19 Mar

Qwen3.5-4B searched 20+ websites, cited its sources, and found the best answer! 🔥 Try this locally with just 4GB RAM via Unsloth Studio. The 4B model did this by executing tool calls + web search directly during its thinking trace.

Kent Worcester@KentWorcester·19 Mar

gonna try to do this with ComfyUI

Linoy Tsaban@linoy_tsaban

Combine LTX 2.3 distilled with pose control LoRA & audio input and you get a fast, character animation model competitive with state of the art🔥 So I built a demo for it 👇 huggingface.co/spaces/linoyts…

Kent Worcester retweeted

Paul Bakaus@pbakaus·19 Mar

Introducing Radiant: 80+ production-ready shaders and visual effects for the web. 0 dependencies, MIT license.

- multiple color themes
- ultra-realistic simulations
- webgl and 2d canvas

Pick one, copy source, integrate, ship. radiant-shaders.com