Neil Mehta

13 posts

@ostensiblyneil

AI @ LM Studio

Joined April 2022
21 Following · 333 Followers
Neil Mehta reposted
LM Studio (@lmstudio)
For WWDC, we worked with Apple to run Kimi K2.6, a 1T-parameter model, across a cluster of four Mac Studios using a preview version of LM Studio. We showcased secure remote access from a MacBook Neo and iPhone using LM Link. A glimpse of your own private, frontier-scale AI.
[3 images]
Replies: 129 · Reposts: 316 · Likes: 4.5K · Views: 393.1K
Neil Mehta (@ostensiblyneil)
Grab the latest engine by updating the app and running `lms runtime update mlx --channel beta`.
Replies: 0 · Reposts: 0 · Likes: 4 · Views: 257
Neil Mehta (@ostensiblyneil)
I've been working on bringing some exciting new beta features to LM Studio's MLX engine. Vision models now have automatic prefix caching, prompt cache disk-offloading, and continuous batching. If these sound interesting to you, try it out and lmk what you think.
Replies: 4 · Reposts: 6 · Likes: 26 · Views: 3.7K
Neil Mehta (@ostensiblyneil)
@TokenFires Hey, we published an updated runtime for MLX yesterday that should improve the LM Studio performance. If you want to try it out, update the app to 0.4.13 and then run `lms runtime get mlx --channel beta`. Would be curious to hear your experience!
Replies: 2 · Reposts: 0 · Likes: 3 · Views: 83
TokenFires (@TokenFires)
I just ran Qwen3.6 35B A3B on oMLX instead of LM Studio and all that delay in prompt processing…GONE. I’m feeding about 45k tokens into my agent each turn and chat has become as fast as frontier. PP started at 40k and went to 250k tokens per second. I’m a little blown away. This doesn’t seem real. M5 Max MacBook, top end temps have dropped 10-20 degrees F, fan spin dropped from 5700 rpm (maximum) to 3200 rpm on coding tasks. Putting this setup on my Mac mini this weekend…
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 112
Yak! (@yak_ex)
@ostensiblyneil Thanks. I can now confirm the new SHA-256 24ad4d1... for LM-Studio-0.3.36-1-x64.exe. One more thing to check: when I filed the report, LM-Studio-0.3.36-1-arm64.exe still had SHA-256 f3df3be...; it is now c22b5bf... Is this change also intentional?
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 53
Yak! (@yak_ex)
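The hash of LM-Studio-0.3.36-1-x64.exe seems to have changed since release, and it scared me enough that I opened an issue. github.com/lmstudio-ai/lm… There are plenty of possible explanations (an ordinary update, corruption, something specific to my environment), but...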
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 212
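The check described in the two posts above (comparing an installer's SHA-256 against a previously recorded value) can be reproduced locally. The following is a minimal sketch, not LM Studio tooling: the helper names `sha256_of_file` and `matches_prefix` are hypothetical, and because the posts only quote truncated digests, the comparison is a prefix match, which is weaker than comparing the full 64-character digest.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_prefix(path, expected_prefix):
    """Check whether the file's digest starts with a (possibly truncated) expected value.

    Hypothetical helper: a prefix match only makes sense when the full digest
    is not available, as with the shortened hashes quoted in the posts.
    """
    return sha256_of_file(path).startswith(expected_prefix.lower())
```

For example, `matches_prefix("LM-Studio-0.3.36-1-x64.exe", "24ad4d1")` would test a downloaded installer against the prefix quoted in the reply. When the publisher lists the full digest, comparing the whole string is the safer choice.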
Neil Mehta (@ostensiblyneil)
@teknium @ggerganov The important piece is the engine selection on the bottom right. CUDA 12 is recommended for the best performance on 5090 cards.
[image]
Replies: 1 · Reposts: 0 · Likes: 0 · Views: 116
Georgi Gerganov (@ggerganov)
gpt-oss is a great model IMO
OpenAI showed us the blueprint for winning local AI:
- Interleaved SWA
- Small head sizes in the attention
- Attention sinks
- Mixture of Experts FFN
- 4-bit training
All of these parts combined together result in the best architecture suitable for regular users. Very lightweight and efficient for inference on pretty much any hardware.
Qwen models are also great. The MoE works really well. I think they should just adopt iSWA and 4-bit training to become the best.
Gemma models are also great. They already have the 4-bit QAT figured out. It seems they just need to adopt the MoE architecture. And maybe reduce the head size a bit.
p.s. don't know if this makes sense, just my overall impression and intuitive understanding
Replies: 36 · Reposts: 75 · Likes: 982 · Views: 82.1K
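As a rough illustration of the "Interleaved SWA" item in the post above: sliding-window attention (SWA) lets each token attend only to the most recent few positions, and interleaving alternates such layers with full causal attention. The sketch below builds the boolean masks in plain Python. The function names, the window size, and the even/odd layer pattern are assumptions chosen for illustration; they are not taken from gpt-oss or llama.cpp.

```python
def causal_mask(n):
    """Full causal mask: query i may attend to every key j with j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, window):
    """Causal sliding-window mask: query i sees only the last `window` keys, itself included."""
    return [[(j <= i) and (i - j < window) for j in range(n)] for i in range(n)]


def interleaved_masks(num_layers, n, window):
    """Alternate windowed and full-attention layers.

    Assumed pattern for illustration only: even-indexed layers use the
    sliding window, odd-indexed layers use full causal attention.
    """
    return [
        sliding_window_mask(n, window) if layer % 2 == 0 else causal_mask(n)
        for layer in range(num_layers)
    ]
```

With a window of `w`, a windowed layer needs cached keys and values for only the last `w` positions per sequence, which is one reason such layers are cheap at long context.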
Neil Mehta (@ostensiblyneil)
@teknium @ggerganov Hey @teknium could you please check the runtimes page in the app (ctrl+shift+R)? The default selection should be CUDA 12 llama.cpp v1.47.0 (or greater), and note that the model needs to be reloaded after changing the default selection.
Replies: 1 · Reposts: 0 · Likes: 2 · Views: 348
Teknium 🪽 (@Teknium)
I used to have 2x 4090s on the pc, which definitely did cause a lot of issues - when I tested without the 2nd 4090 back then it sped everything up dramatically. But now, just a single 5090 on here - here's my fire hazard dusty ass rig xD apologies for all the sadness this image will cause people 😂
[image]
Replies: 3 · Reposts: 0 · Likes: 10 · Views: 1.1K
Prince Canuma (@Prince_Canuma)
MLX-VLM v0.3.2 is here 🔥
What’s new:
- Migrated to .toml
- UI and Audio dependencies are optional
- Added CUDA support
- Support text-only training
- Lots of fixes and refactoring
Thanks to all the awesome contributions of this release ❤️ (@ActuallyIsaak, Neil from @lmstudio, Saurav and Zhnext)
Get started today:
> pip install -U mlx-vlm
Please leave us a star ⭐ github.com/Blaizzy/mlx-vlm
[image]
Replies: 6 · Reposts: 8 · Likes: 77 · Views: 4.8K