LM Studio

2.6K posts

@lmstudio

Discover and run open models 👾 we are hiring https://t.co/2D4CG8GO5m

New York, USA Joined May 2023
84 Following 54.5K Followers
Pinned Tweet
LM Studio
LM Studio@lmstudio·
MTP is available in LM Studio 0.4.14. Sound on.
36
77
741
70.4K
LM Studio
LM Studio@lmstudio·
@tmophoto Please share your exact llama.cpp command. We ship good old llama.cpp under the hood, so theoretically any performance difference is a configuration difference.
1
0
1
116
tmo
tmo@tmophoto·
Same Qwen3.6 35B-A3B Q6_K MTP GGUF, ~95k prompt tokens, dual RTX 3090. LM Studio best was ~47.4 tok/s with F16 KV, Flash Attention on, GPU offload max, eval batch 2048, parallel 1/2, MTP enabled. llama.cpp best was ~99.45 tok/s with F16 KV, Flash Attention on, parallel 2, ctx 202752 for two ~101k slots, MTP draft max 2, p-min 0.75. MTP tuning: llama.cpp exposed p-min and draft settings; draft max 2 was much faster than 4 for 35B.
1
0
0
178
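The figures in tmo's tweet above can be put side by side with a little arithmetic. This is a minimal sketch that uses only the numbers quoted there; the variable names are illustrative, and the even split of context across slots is an assumption based on the "two ~101k slots" remark.

```python
# Numbers quoted in the tweet above; variable names are illustrative only.
lm_studio_tps = 47.4     # best reported LM Studio throughput, tok/s
llama_cpp_tps = 99.45    # best reported llama.cpp throughput, tok/s
total_ctx = 202752       # llama.cpp context size
parallel_slots = 2       # llama.cpp "parallel 2"

# Ratio between the two reported throughputs.
speedup = llama_cpp_tps / lm_studio_tps

# Assumption: the context is divided evenly between the parallel slots.
ctx_per_slot = total_ctx // parallel_slots

print(f"llama.cpp / LM Studio: {speedup:.2f}x")
print(f"context per slot: {ctx_per_slot} tokens")
```

The ratio comes out at roughly 2.1x, which matches the "half the tokens per sec" description elsewhere in the thread.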
LM Studio
LM Studio@lmstudio·
@tmophoto Please share your llama.cpp and LM Studio configs so we can try to reproduce.
2
0
13
1.6K
tmo
tmo@tmophoto·
@lmstudio I just ran a ton of benchmarks for the last day and a half with it (I am a die-hard LM Studio fan), but it's generating half as many tokens per second as llama.cpp with Qwen 35B models. Is there something that I can tweak to get the same performance?
1
0
3
1.8K
Duckets
Duckets@Franzferdinan57·
@lmstudio What's the best number of tokens to use? Should it stay at the default 3, or be changed to something like 5? Does it vary by model? I guess I have a lot to look into.
2
0
2
1.7K
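One way to reason about the draft-length question above is the usual back-of-the-envelope model for speculative decoding: if each drafted token is accepted independently with probability a, a draft of k tokens yields on average 1 + a + a^2 + ... + a^k tokens per verification pass. The sketch below is a simplification, not LM Studio's or llama.cpp's tuning logic; the function name and the acceptance rates are hypothetical.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Average tokens emitted per verification pass, assuming each drafted
    token is accepted independently with probability accept_rate."""
    return sum(accept_rate ** i for i in range(draft_len + 1))


for accept_rate in (0.5, 0.8):      # hypothetical acceptance rates
    for draft_len in (2, 3, 5):     # candidate draft lengths
        tokens = expected_tokens_per_pass(accept_rate, draft_len)
        print(f"a={accept_rate} k={draft_len}: {tokens:.2f} tokens/pass")
```

Under this model the gain from each extra draft token shrinks quickly when the acceptance rate is low, while every drafted token still costs compute. So the best setting plausibly does vary by model and by prompt.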
ForProduction
ForProduction@ForProduction·
@lmstudio Gemma 4 Assistant still not appearing as a Speculative Decoding option
1
0
1
1.7K
LM Studio
LM Studio@lmstudio·
MTP means Multi Token Prediction. It's a speculative decoding technique that can result in large inference speedups in many cases.
1. Update to LM Studio 0.4.14
2. Download a model that supports MTP like Qwen3.6-35B-A3B-MTP-GGUF or Qwen3.6-27B-MTP-GGUF
3. Enable it when loading the model
Supported for GGUF/llama.cpp models 🚀
11
4
142
7.2K
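The speculative decoding idea named in the tweet above can be sketched as a draft-then-verify loop. This is a toy, greedy version under stated assumptions, not LM Studio's or llama.cpp's implementation: `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names, and the drafter here is a separate function, whereas MTP as generally described takes its proposals from the model's own extra prediction heads.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], draft_len: int) -> List[Token]:
    """Return the tokens emitted by one draft-then-verify step."""
    # 1. The cheap drafter proposes draft_len tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(draft_len):
        proposed.append(draft(context + proposed))

    # 2. The target checks each proposal. A real engine would score all
    #    positions in one batched forward pass; this loop is for clarity.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the target contributes one bonus token.
    accepted.append(target(context + accepted))
    return accepted


# Toy models: the target counts upward; the drafter agrees except after 3.
def toy_target(ctx: List[Token]) -> Token:
    return ctx[-1] + 1


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else ctx[-1] + 1


print(speculative_step(toy_target, toy_draft, [1], draft_len=3))
```

The emitted tokens are always what the target alone would have produced greedily. The speedup comes from how many drafted tokens survive verification, which is why predictable text tends to benefit more than open-ended prompts.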
LM Studio retweeted
Adrien Grondin
Adrien Grondin@adrgrondin·
Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs using Qwen 3.6. Powered by the updated MLX engine with batching, in beta in the app. The batching speed boost is noticeable.
36
32
630
68.5K
LM Studio
LM Studio@lmstudio·
@8i8BB @SalemDono @zeddotdev Those are open-ended prompts, so you're unlikely to see benefits. Try more structured tasks like coding, or prompts relating to content already in the context.
0
0
1
128
Nomignon
Nomignon@8i8BB·
@lmstudio @SalemDono @zeddotdev Went to beta, downloaded an MTP model (unsloth qwen3.6), enabled MTP speculative decoding in the model settings, seeing 96 t/s on M5 Max, which is roughly the same as without MTP. Also tried the 27B, seeing the same result as well...
2
0
2
180