ollama

7.7K posts


@ollama

https://t.co/1JpLwJ93nX

California, USA · Joined August 2023
10 Following · 134.4K Followers
Pinned Tweet
ollama @ollama:
Ollama is now updated to run at its fastest on Apple silicon, powered by MLX, Apple's machine learning framework. This change unlocks much faster performance for demanding work on macOS:
- Personal assistants like OpenClaw
- Coding agents like Claude Code, OpenCode, or Codex
122 replies · 251 reposts · 2K likes · 162.7K views
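A quick way to sanity-check the faster macOS backend is to call a local model through Ollama's Python client; a minimal sketch (the model name and prompt here are illustrative, any pulled model works):

```python
# Minimal smoke test of local generation via the official ollama Python
# package (pip install ollama); assumes the Ollama app is running locally.
import ollama

response = ollama.chat(
    model="qwen3.5:35b-a3b-coding-nvfp4",  # illustrative: use any local model
    messages=[{"role": "user", "content": "Write a one-line hello world in Go."}],
)
print(response["message"]["content"])
```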
marcelo @zidszopers:
@ollama why are you ignoring hermes-agent?
1 reply · 0 reposts · 0 likes · 372 views
ollama retweeted
Cheng @zcbenz:
We have been expecting this since ollama's first pull request to MLX. It is just the beginning: the CUDA & CPU backends are still improving, and hopefully we will have one framework unifying inference & training across all platforms.
[quoting @ollama's pinned post above]
1 reply · 2 reposts · 17 likes · 2.1K views
ollama @ollama:
@zzddfge @JustinLin610 if you have the weights already, no. But it's only available via ollama pull qwen3.5:35b-a3b-coding-nvfp4 and the int4 variation.
0 replies · 2 reposts · 2 likes · 1.3K views
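For reference, the same pull can also be scripted with the Python client; a sketch (the exact tag of the int4 variation isn't given in the thread, so only the named tag is pulled):

```python
# Sketch: pull the NVFP4 build through the ollama Python package,
# equivalent to `ollama pull qwen3.5:35b-a3b-coding-nvfp4` on the CLI.
import ollama

ollama.pull("qwen3.5:35b-a3b-coding-nvfp4")
print(ollama.list())  # the new tag should now appear among local models
```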
ollama @ollama:
@Sc0ttTheRobot it'll work (just have to watch out for other apps taking away available memory)
1 reply · 0 reposts · 0 likes · 58 views
RexMonte @Sc0ttTheRobot:
@ollama More than that? Why exactly, if I have a 32GB Mac mini?
1 reply · 0 reposts · 1 like · 113 views
ollama @ollama:
Improved caching for more responsiveness

Ollama's cache has been upgraded to make coding and agentic tasks more efficient.
- Lower memory utilization: Ollama now reuses its cache across conversations, meaning less memory use and more cache hits when branching from a shared system prompt with tools like Claude Code.
- Intelligent checkpoints: Ollama now stores snapshots of its cache at strategic locations in the prompt, resulting in less prompt processing and faster responses.
- Smarter eviction: shared prefixes survive longer even when older branches are dropped.
1 reply · 2 reposts · 32 likes · 7.6K views
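To make the reuse/eviction behavior concrete, a toy sketch of a prefix cache whose eviction favors heavily shared prefixes (illustrative only, not Ollama's actual implementation):

```python
# Toy model of the ideas above: reuse across branches, checkpoints at
# prompt positions, and eviction that keeps heavily shared prefixes alive.
from collections import OrderedDict

class PrefixCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.snapshots = OrderedDict()  # prefix tuple -> hit count

    def checkpoint(self, tokens):
        """Store a (pretend) KV snapshot at this point in the prompt."""
        key = tuple(tokens)
        self.snapshots.setdefault(key, 0)
        self.snapshots.move_to_end(key)
        while len(self.snapshots) > self.capacity:
            victim = min(self.snapshots, key=self.snapshots.get)  # least shared
            del self.snapshots[victim]

    def longest_prefix(self, tokens):
        """Find the longest cached prefix of `tokens` -- the reuse step."""
        best = ()
        for prefix in self.snapshots:
            if len(prefix) > len(best) and tuple(tokens[: len(prefix)]) == prefix:
                best = prefix
        if best:
            self.snapshots[best] += 1        # a hit makes the prefix "more shared"
            self.snapshots.move_to_end(best)
        return best

cache = PrefixCache()
system = ["<system>", "you", "are", "a", "coding", "agent"]
cache.checkpoint(system)                      # checkpoint after the shared prompt
for branch in (system + ["fix", "the", "bug"], system + ["write", "tests"]):
    print(len(cache.longest_prefix(branch)))  # 6 both times: prefix is reused
```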
Ryo Lu @ryolu_:
when software had a soul

there was a moment around 2005 when using a Mac felt like touching something alive. the dock bounced. the genie effect swooped. exposé scattered your windows like cards on a table. none of it was strictly necessary. all of it felt like someone cared – not about metrics, but about the feeling of using a machine.

software back then had texture. it had a philosophy. you could feel the person behind it. someone made a decision to make that icon beautiful, to animate that transition just so, to write that error message with a little warmth. apps had personalities. some were weird. some were over-designed in ways that would make a modern PM flinch. but they were alive.

the web was the same. personal sites were genuinely personal. blogs felt like letters. forums had regulars. you knew who made what. the internet had neighborhoods, and each one felt different. nothing was optimized for scale. things were made by people who loved what they were making.

somewhere along the way, we traded all of that for growth. A/B tests flattened the edges. design systems standardized the personality out. everything got faster, smoother, more consistent – and somehow less interesting. the quirks were removed because they didn't test well. the warmth got cut because it wasn't measurable. we optimized our way into a world of things that work perfectly and feel like nothing.

now every app looks the same. every interface follows the same patterns. every product speaks in the same calm, frictionless voice, siloed in their own little islands. the humanity got rounded off.

and then came AI agents. and the speed got inhuman. now you can generate an entire product in an afternoon. ship a feature before lunch. spin up ten variations before anyone's had their coffee. the gap from idea to code is basically zero. which sounds incredible. and it is. but there's a catch.

when making things is too easy, the slop comes for free too. mediocre things don't look obviously bad – they look fine. they work. they ship. they pass review. and now there are infinite of them. the internet is filling up with software that functions but means nothing. interfaces that are correct but feel dead. products made by agents, reviewed by no one, shipped into the void.

this is the thing that keeps me up at night. not that AI will replace people who care. but that it will drown them out.

here's what I still believe: the best things are made by people who couldn't help themselves. someone who lost sleep over an icon. who rewrote the same line of copy twelve times. who added an animation nobody asked for because it made the thing feel right. that obsession – that's not inefficiency. that's the whole point.

AI doesn't make that irrelevant. it actually makes it rarer and more valuable. taste is not a markdown skill. caring is not a parameter. the weird, specific, "soul" thing you put into something – that can't be programmed into existence.

the path forward isn't to make more slop faster. it's to finally give people with real vision the tools to make the thing they always imagined but couldn't build alone. the designer who had the idea but couldn't code. the kid who saw something nobody else saw. the person who cared too much about something most people wouldn't notice.

if we get this right, we don't get a faster factory. we get a renaissance. more strange, personal, opinionated software made by teams of people who care and mean it. that's still possible.

but only if the people who care get the space and tools to actually express themselves – and don't just hand the wheel to the agent and walk away.
52 replies · 88 reposts · 672 likes · 37.6K views
ollama @ollama:
@SMT_Solvers ❤️❤️❤️ super happy! We are all in it together to drive open model adoption.
0 replies · 1 repost · 13 likes · 2.4K views
ollama retweeted
ollama @ollama:
Future models

We are actively working to support future models. For users with custom models fine-tuned on supported architectures, we will introduce an easier way to import models into Ollama. In the meantime, we will expand the list of supported architectures.
2 replies · 0 reposts · 30 likes · 6.9K views
ollama @ollama:
NVFP4 support: higher quality responses and production parity

Ollama now leverages NVIDIA's NVFP4 format to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads. As more inference providers scale inference using the NVFP4 format, Ollama users get the same results they would see in a production environment. It also opens up the ability for Ollama to run models optimized by NVIDIA's Model Optimizer. Other precisions will be made available based on the design and usage intent of Ollama's research and hardware partners.
1 reply · 0 reposts · 51 likes · 8.3K views
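As a rough illustration of what block-scaled 4-bit floating point looks like, a numpy sketch in the spirit of NVFP4 (FP4/E2M1 values with one scale per 16-element block; the real format also stores FP8 block scales and is decoded in hardware, so this is schematic only):

```python
# Schematic numpy sketch of block-scaled FP4 (E2M1) quantization.
# Not NVIDIA's implementation; real NVFP4 also carries FP8 block scales.
import numpy as np

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # FP4 magnitudes

def quantize_block(x):
    """Quantize one 16-value block to signed E2M1 codes plus a scale."""
    scale = float(np.abs(x).max()) / 6.0      # map the block range onto E2M1
    if scale == 0.0:
        scale = 1.0
    mags = np.abs(x) / scale
    idx = np.abs(mags[:, None] - E2M1[None, :]).argmin(axis=1)  # nearest code
    return np.sign(x) * E2M1[idx], scale

def dequantize(codes, scale):
    return codes * scale

block = np.random.default_rng(0).normal(size=16)
codes, scale = quantize_block(block)
print("max abs error:", np.abs(dequantize(codes, scale) - block).max())
```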
ollama @ollama:
This results in a large speedup of Ollama on all Apple silicon devices. On Apple's M5, M5 Pro and M5 Max chips, Ollama leverages the new GPU Neural Accelerators to accelerate both time to first token (TTFT) and generation speed (tokens per second).

note: the test was conducted using Alibaba's Qwen3.5-35B-A3B model quantized to NVFP4, against Ollama's previous implementation quantized to q4_K_M on Ollama 0.18. Ollama 0.19 will see even higher performance (1851 tokens/s prefill and 134 tokens/s decode when running with int4).
3 replies · 11 reposts · 153 likes · 12.1K views
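A back-of-envelope reading of those figures (the prompt and output sizes below are assumptions; the throughput numbers are from the post):

```python
# What 1851 tok/s prefill and 134 tok/s decode mean for end-to-end latency.
prompt_tokens, output_tokens = 4000, 500   # assumed workload for illustration
prefill_tps, decode_tps = 1851, 134        # Ollama 0.19 int4 figures above

ttft = prompt_tokens / prefill_tps         # prefill dominates time to first token
gen = output_tokens / decode_tps
print(f"TTFT ~ {ttft:.1f}s, generation ~ {gen:.1f}s, total ~ {ttft + gen:.1f}s")
```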
ollama @ollama:
@MrRemKing The moment the GLM team launches it open-source! ❤️❤️❤️
1 reply · 0 reposts · 2 likes · 32 views
ollama @ollama:
@k_k_kaundal That you'd have to ask Docker; it's not our product. You can use Ollama directly, or run Ollama inside Docker.
1 reply · 0 reposts · 1 like · 86 views
K. K. 💫 @k_k_kaundal:
@ollama Most of the time I use these models with an Ollama-based config, so can I use Docker models in VS Code with this update?
1 reply · 0 reposts · 0 likes · 81 views
ollama @ollama:
Visual Studio Code now integrates with Ollama via GitHub Copilot. If you have Ollama installed, any local or cloud model from Ollama can be selected for use within Visual Studio Code.
114 replies · 495 reposts · 4.5K likes · 322.9K views
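One way to confirm Ollama is reachable (and see which local models VS Code will be able to pick up) is the documented REST endpoint for listing models; a minimal sketch:

```python
# List locally available Ollama models via the documented REST API.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # these should be selectable in Copilot's model picker
```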