Neil Mehta

13 posts

@ostensiblyneil

AI @ LM Studio

Joined April 2022

21 Following · 333 Followers

Neil Mehta reposted

LM Studio@lmstudio·6d

For WWDC, we worked with Apple to run Kimi K2.6, a 1T-parameter model, across a cluster of four Mac Studios using a preview version of LM Studio. We showcased secure remote access from a MacBook Neo and iPhone using LM Link. A glimpse of your own private, frontier-scale AI.

Neil Mehta@ostensiblyneil·Jun 6

We made the MLX engine a lot faster in the last release. Give it a try!

Neil Mehta@ostensiblyneil

x.com/i/article/2061…

Neil Mehta@ostensiblyneil·Jun 6

x.com/i/article/2061…

Neil Mehta@ostensiblyneil·20 May

Cool demo here showing the capabilities of the beta LM Studio MLX engine. The engine intelligently manages caching and batching for these parallel agents.

Adrien Grondin@adrgrondin

Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs using Qwen 3.6. Powered by the updated MLX engine with batching, in beta in the app. The batching speed boost is noticeable.

Neil Mehta@ostensiblyneil·19 May

Grab the latest engine by updating the app and running `lms runtime update mlx --channel beta`.

Neil Mehta@ostensiblyneil·19 May

I've been working on bringing some exciting new beta features to LM Studio's MLX engine. Vision models now have automatic prefix caching, prompt cache disk-offloading, and continuous batching. If these sound interesting to you, try it out and lmk what you think.
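As a rough illustration of what prefix caching means in general (not LM Studio's actual implementation, which the post does not describe), the sketch below reuses work for the longest run of leading tokens that a new prompt shares with an earlier one. The `PrefixCache` class and its `plan` and `store` methods are hypothetical names invented for this example, and prompts are assumed to be plain lists of integer token IDs.

```python
from typing import List, Tuple


def common_prefix_length(a: List[int], b: List[int]) -> int:
    """Count how many leading tokens two sequences share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


class PrefixCache:
    """Toy prefix cache (hypothetical): remembers prompts that were already processed."""

    def __init__(self) -> None:
        self._entries: List[List[int]] = []

    def store(self, prompt: List[int]) -> None:
        """Record a prompt whose tokens have been processed."""
        self._entries.append(list(prompt))

    def plan(self, prompt: List[int]) -> Tuple[int, List[int]]:
        """Return (number of reusable leading tokens, tokens still to process)."""
        reused = max(
            (common_prefix_length(entry, prompt) for entry in self._entries),
            default=0,
        )
        return reused, prompt[reused:]


if __name__ == "__main__":
    cache = PrefixCache()
    cache.store([1, 2, 3, 4])          # first turn of a conversation
    print(cache.plan([1, 2, 3, 9, 9])) # second turn shares three leading tokens
```

A real engine would store attention key/value state rather than token lists, but the lookup idea is the same: only the tokens after the shared prefix need fresh prompt processing.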

Neil Mehta@ostensiblyneil·16 May

@TokenFires Hey, we published an updated runtime for MLX yesterday that should improve the LM Studio performance. If you want to try it out, update the app to 0.4.13 and then run `lms runtime get mlx --channel beta`. Would be curious to hear your experience!

TokenFires@TokenFires·16 May

I just ran Qwen3.6 35B A3B on oMLX instead of LM Studio and all that delay in prompt processing…GONE. I’m feeding about 45k tokens into my agent each turn and chat has become as fast as frontier. PP started at 40k and went to 250k tokens per second. I’m a little blown away. This doesn’t seem real. M5 Max MacBook, top end temps have dropped 10-20 degrees F, fan spin dropped from 5700 rpm (maximum) to 3200 rpm on coding tasks. Putting this setup on my Mac mini this weekend…

Neil Mehta@ostensiblyneil·Jan 3

@yak_ex Yes, this was intentional.

Yak!@yak_ex·Jan 3

@ostensiblyneil Thanks. Now, I can confirm the new SHA-256 24ad4d1... for LM-Studio-0.3.36-1-x64.exe. There is one additional confirmation. At the reporting, LM-Studio-0.3.36-1-arm64.exe kept SHA256 f3df3be.... Now, it becomes c22b5bf... Is this also intentional?

Yak!@yak_ex·Jan 2

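The hash of LM-Studio-0.3.36-1-x64.exe seems to have changed since release, which scared me enough to open an issue. github.com/lmstudio-ai/lm… There are plenty of possible explanations, such as an ordinary update, file corruption, or something specific to my environment, but…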
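For readers who want to run the same kind of check as in this thread, here is a minimal sketch of computing a file's SHA-256 digest with Python's standard `hashlib` and comparing it against a published value. The function names `sha256_of_file` and `matches_published` are made up for this example, and the published digest must come from the vendor; the hashes quoted in the thread are truncated and are deliberately not reproduced here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large installers do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(actual_hex: str, published_hex: str) -> bool:
    """Compare a computed digest with a published one, ignoring case and whitespace."""
    return actual_hex.strip().lower() == published_hex.strip().lower()
```

Usage would look like `matches_published(sha256_of_file(Path("installer.exe")), published)`, where `installer.exe` and `published` stand in for the downloaded file and the full digest from the release page.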

Neil Mehta@ostensiblyneil·Sep 18

I've been waiting for ages for an open-source visual reasoning model 🙏

LM Studio@lmstudio

mistralai/magistral-small-2509
> New 24B reasoning model from @MistralAI
> Supports 🏞️ image input and 🛠️ tool calling
> Available in both GGUF and MLX in LM Studio!
lmstudio.ai/models/mistral…

Neil Mehta@ostensiblyneil·Aug 29

@teknium @ggerganov The important piece is the engine selection on the bottom right. CUDA 12 is recommended for the best performance on 5090 cards.

Teknium 🪽@Teknium·Aug 29

@ostensiblyneil @ggerganov Isn't that what I posted here? Or do you mean it's not using it?

Georgi Gerganov@ggerganov·Aug 28

gpt-oss is a great model IMO. OpenAI showed us the blueprint for winning local AI:
- Interleaved SWA
- Small head sizes in the attention
- Attention sinks
- Mixture of Experts FFN
- 4-bit training

All of these parts combined together result in the best architecture suitable for regular users. Very lightweight and efficient for inference on pretty much any hardware.

Qwen models are also great. The MoE works really well. I think they should just adopt iSWA and 4-bit training to become the best.

Gemma models are also great. They already have the 4-bit QAT figured out. It seems they just need to adopt the MoE architecture. And maybe reduce the head size a bit.

p.s. don't know if this makes sense, just my overall impression and intuitive understanding

Neil Mehta@ostensiblyneil·Aug 28

@teknium @ggerganov Hey @teknium could you please check the runtimes page in the app (ctrl+shift+R)? The default selection should be CUDA 12 llama.cpp v1.47.0 (or greater), and note that the model needs to be reloaded after changing the default selection.

Teknium 🪽@Teknium·Aug 28

I used to have 2x 4090s on the pc, which definitely did cause a lot of issues - when I tested without the 2nd 4090 back then it sped everything up dramatically. But now, just a single 5090 on here - here's my fire hazard dusty ass rig xD apologies for all the sadness this image will cause people 😂

Neil Mehta@ostensiblyneil·Jul 24

@Prince_Canuma @ActuallyIsaak 🔥

Prince Canuma@Prince_Canuma·Jul 22

MLX-VLM v0.3.2 is here 🔥

What’s new:
- Migrated to .toml
- UI and Audio dependencies are optional
- Added CUDA support
- Support text-only training
- Lots of fixes and refactoring

Thanks to all the awesome contributions of this release ❤️ (@ActuallyIsaak, Neil from @lmstudio, Saurav and Zhnext)

Get started today:
> pip install -U mlx-vlm

Please leave us a star ⭐ github.com/Blaizzy/mlx-vlm
