LM Studio

2.6K posts

@lmstudio

Discover and run open models 👾 we are hiring https://t.co/2D4CG8GO5m

New York, USA · Joined May 2023

84 Following · 54.5K Followers

Pinned Tweet

LM Studio@lmstudio·3d

MTP is available in LM Studio 0.4.14. Sound on.

LM Studio@lmstudio·2d

@tmophoto Please share your exact llama.cpp command. We ship good old llama.cpp under the hood, so theoretically any performance difference is a configuration difference.

tmo@tmophoto·2d

same Qwen3.6 35B-A3B Q6_K MTP GGUF, ~95k prompt tokens, dual RTX 3090. LM Studio best was ~47.4 tok/s with F16 KV, Flash Attention on, GPU offload max, eval batch 2048, parallel 1/2, MTP enabled. llama.cpp best was ~99.45 tok/s with F16 KV, Flash Attention on, parallel 2, ctx 202752 for two ~101k slots, MTP draft max 2, p-min .75. MTP tuning: llama.cpp exposed p-min and draft settings; draft max 2 was much faster than 4 for 35B.
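As a quick sanity check on the figures in this report, the per-slot context and the throughput gap can be worked out directly. This is only arithmetic on the numbers quoted in the tweet: the variable names are illustrative, and the even split of context across slots is an assumption taken from the "two ~101k slots" wording.

```python
# Figures quoted in the tweet above; variable names are illustrative.
total_ctx = 202_752       # llama.cpp context size reported for the run
parallel_slots = 2        # "parallel 2"
lm_studio_tps = 47.4      # best LM Studio result, tokens per second
llama_cpp_tps = 99.45     # best llama.cpp result, tokens per second

# Assumption: the context is divided evenly between the parallel slots.
ctx_per_slot = total_ctx // parallel_slots

# How many times faster the llama.cpp run was.
speed_ratio = llama_cpp_tps / lm_studio_tps

print(f"context per slot: {ctx_per_slot} tokens")
print(f"llama.cpp vs LM Studio throughput: {speed_ratio:.2f}x")
```

The ratio comes out at roughly 2.1x, which matches the "half the tokens per sec" description in the earlier tweet from the same user.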

LM Studio@lmstudio·3d

@LottoLabs 🚀

Lotto@LottoLabs·3d

Confirmed Qwen 27B MTP works great in LM Studio

LM Studio@lmstudio

MTP is available in LM Studio 0.4.14. Sound on.

LM Studio@lmstudio·3d

@tmophoto Please share your llama.cpp and LM Studio configs so we can try to reproduce.

tmo@tmophoto·3d

@lmstudio I just ran a ton of benchmarks with it over the last day and a half (I am a die-hard LM Studio fan), but it's generating half the tokens per second that llama.cpp does with Qwen 35B models. Is there something I can tweak to get the same performance?

LM Studio@lmstudio·3d

@ForProduction The amazing team at @ggml_org and the community are working to bring MTP to more models

ForProduction@ForProduction·3d

@lmstudio 😭

LM Studio@lmstudio·3d

@Franzferdinan57 Start with 2, and experiment away

Duckets@Franzferdinan57·3d

@lmstudio What's the best number of tokens to use? Should it stay at the default 3, or be changed to something like 5? Does it vary by model? I guess I have a lot to look into.
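One way to reason about this question is a back-of-the-envelope model of draft length. The sketch below assumes, purely for illustration, that each drafted token is accepted independently with the same probability; real acceptance rates vary with the model and the prompt, which is why the reply above suggests experimenting. Under that assumption the expected number of tokens emitted per verification step is a geometric series. The function name and the 0.6 acceptance rate are hypothetical.

```python
def expected_tokens_per_step(accept_prob: float, n_draft: int) -> float:
    """Expected tokens emitted per verification step.

    Simplifying assumption: every drafted token is accepted independently
    with probability accept_prob. The sum 1 + a + a^2 + ... + a^n_draft
    counts the token the main model always contributes plus each draft
    token that survives.
    """
    if accept_prob >= 1.0:
        return float(n_draft + 1)
    return (1 - accept_prob ** (n_draft + 1)) / (1 - accept_prob)


# Hypothetical acceptance rate, chosen only to show the shape of the curve.
for n_draft in (1, 2, 3, 5):
    gain = expected_tokens_per_step(0.6, n_draft)
    print(f"draft tokens = {n_draft}: ~{gain:.2f} tokens per step")
```

Each extra draft token adds less than the one before it, while the cost of producing and verifying drafts keeps growing, so a longer draft is not automatically faster.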

LM Studio@lmstudio·3d

@Volta700 Yes!

Volta@Volta700·3d

@lmstudio Out of beta?

LM Studio@lmstudio·3d

@ForProduction Qwen only at this moment

ForProduction@ForProduction·3d

@lmstudio Gemma 4 Assistant still not appearing as a Speculative Decoding option

LM Studio@lmstudio·3d

MTP means Multi Token Prediction. It's a speculative decoding technique that can result in large inference speedups in many cases.

1. Update to LM Studio 0.4.14
2. Download a model that supports MTP like Qwen3.6-35B-A3B-MTP-GGUF or Qwen3.6-27B-MTP-GGUF
3. Enable it when loading the model

Supported for GGUF/llama.cpp models 🚀
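To make "speculative decoding" concrete, here is a toy sketch of the general draft-then-verify idea in the greedy case. It is not LM Studio's or llama.cpp's implementation: the `target` and `draft` functions are hypothetical stand-ins for a real model and for whatever produces the proposals (with MTP, typically the model's own extra prediction heads), and the verification loop stands in for what a real engine would do in one batched pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], n_draft: int) -> List[Token]:
    """Return the tokens emitted by one draft-then-verify step."""
    # 1. The cheap predictor proposes n_draft tokens, one after another.
    proposed: List[Token] = []
    for _ in range(n_draft):
        proposed.append(draft(context + proposed))

    # 2. The target checks each proposal in order.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft was accepted, so the target adds one bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[Token],
             n_new: int, n_draft: int) -> Tuple[List[Token], int]:
    """Generate n_new tokens; also return how many verify steps were used."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(target, draft, out, n_draft))
        steps += 1
    return out[:len(prompt) + n_new], steps


# Toy models: the target counts upward modulo 10; the draft agrees with it
# except right after a 4, where it guesses wrong.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, steps = generate(target, draft, prompt=[0], n_new=10, n_draft=2)
print(tokens, "in", steps, "verify steps")
```

The output is identical to what the target alone would produce token by token; the saving is that it takes fewer target steps than tokens whenever the drafts are mostly right. That also fits the advice elsewhere in this feed that structured or predictable prompts benefit more than open-ended ones.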

LM Studio retweeted

Adrien Grondin@adrgrondin·5d

Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs using Qwen 3.6. Powered by the updated MLX engine with batching, in beta in the app. The batching speed boost is noticeable.

LM Studio@lmstudio·5d

@8i8BB @SalemDono @zeddotdev Those are open-ended prompts, so you're unlikely to see benefits. Try more structured tasks like coding, or prompts relating to content already in the context.

Nomignon@8i8BB·5d

@lmstudio @SalemDono @zeddotdev Yes, it was just hi, write a short story. Not able to see much faster results

LM Studio@lmstudio·6d

Use your LM Studio models to code locally in @zeddotdev 🚀

Zed@zeddotdev

Local model usage grew 3x in Zed's agent in the last 10 weeks. Cameron Mcloughlin on why he prefers local: "I worry about over-reliance on providers that operate like SaaS platforms, where a change of pricing or setup makes them unfeasible to use. With a local model, you always have access." Read more: zed.dev/blog/local-ai-…

LM Studio@lmstudio·5d

@8i8BB @SalemDono @zeddotdev Are you able to share your prompt?

Nomignon@8i8BB·5d

@lmstudio @SalemDono @zeddotdev Went to beta, downloaded an MTP model (unsloth qwen3.6), enabled MTP speculative decoding in the model settings, seeing 96 t/s on M5 Max, which is roughly the same as without MTP. Also tried the 27B, seeing the same result...

LM Studio@lmstudio·5d

@arismendius @pa_sann Update to the latest beta!

LM Studio@lmstudio

@SalemDono @zeddotdev Update to the latest app beta (0.4.14+2)! Make sure your llama.cpp engine is 2.15.0
