LM Studio
2.6K posts

LM Studio
@lmstudio
Discover and run open models 👾 we are hiring https://t.co/2D4CG8GO5m
New York, USA · Joined May 2023
84 Following · 54.5K Followers
Pinned Tweet

same Qwen3.6 35B-A3B Q6_K MTP GGUF, ~95k prompt tokens, dual RTX 3090.
LM Studio best was ~47.4 tok/s with F16 KV, Flash Attention on, GPU offload max, eval batch 2048, parallel 1/2, MTP enabled.
llama.cpp best was ~99.45 tok/s with F16 KV, Flash Attention on, parallel 2, ctx 202752 for two ~101k slots, MTP draft max 2, p-min .75.
MTP tuning: llama.cpp exposed p-min and draft settings; draft max 2 was much faster than 4 for 35B.
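
The figures in the post above can be sanity-checked with a little arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes the total context is split evenly across the parallel slots, which the post implies but does not state.

```python
# Figures quoted in the post above; variable names are illustrative only.
lm_studio_tps = 47.4      # best LM Studio throughput, tokens/s
llama_cpp_tps = 99.45     # best llama.cpp throughput, tokens/s
total_ctx = 202752        # llama.cpp context size
parallel_slots = 2        # llama.cpp parallel setting

# Relative throughput of the two best runs.
speedup = llama_cpp_tps / lm_studio_tps

# Assumption: the context window is divided evenly among the slots.
ctx_per_slot = total_ctx // parallel_slots

print(f"speedup: {speedup:.2f}x")
print(f"context per slot: {ctx_per_slot} tokens")
```

Under that assumption each slot gets 101,376 tokens, which matches the "~101k slots" in the post, and the llama.cpp run comes out at roughly twice the LM Studio throughput.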

Confirmed: Qwen 27B MTP works great in LM Studio
LM Studio @lmstudio
MTP is available in LM Studio 0.4.14. Sound on.

@ForProduction The amazing team at @ggml_org and the community are working to bring MTP to more models

@lmstudio Gemma 4 Assistant still not appearing as a Speculative Decoding option

MTP means Multi Token Prediction. It's a speculative decoding technique that can result in large inference speedups in many cases.
1. Update to LM Studio 0.4.14
2. Download a model that supports MTP like Qwen3.6-35B-A3B-MTP-GGUF or Qwen3.6-27B-MTP-GGUF
3. Enable it when loading the model
Supported for GGUF/llama.cpp models 🚀
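
For readers unfamiliar with speculative decoding, the draft-then-verify idea can be sketched in a few lines. This is a toy greedy variant under stated assumptions, not LM Studio's or llama.cpp's implementation: `target`, `draft`, `speculative_step`, and `generate` are hypothetical names, the "models" are plain functions over integer tokens, and probability thresholds such as p-min are not modeled. With MTP the draft tokens typically come from extra prediction heads of the model itself rather than a separate small model.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], draft_max: int) -> List[int]:
    """One draft-then-verify step; returns the tokens to append."""
    # 1. The cheap draft proposes up to draft_max tokens.
    proposed: List[int] = []
    for _ in range(draft_max):
        proposed.append(draft(context + proposed))

    # 2. The target checks each proposal. A real engine does this in one
    #    batched forward pass; the loop here is only for clarity.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             n_tokens: int, draft_max: int) -> Tuple[List[int], int]:
    """Generate n_tokens; returns (new tokens, number of verify steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, draft_max))
        steps += 1
    return out[len(prompt):len(prompt) + n_tokens], steps


# Toy models: the target counts upward mod 10; the draft agrees except
# after a 4, where it guesses wrong.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10


tokens, steps = generate(target, draft, prompt=[0], n_tokens=12, draft_max=2)
print(tokens, steps)
```

The output is identical to what the target would produce on its own; the gain is that each verify step can yield several tokens when the draft guesses well. When the draft is often wrong, as with open-ended prompts, most steps yield only one token and the speedup shrinks.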
LM Studio retweeted

Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs using Qwen 3.6
Powered by the updated MLX engine with batching in beta in the app
The batching speed boost is noticeable

@8i8BB @SalemDono @zeddotdev Those are open ended prompts, so unlikely to see benefits. Try more structured tasks like coding or prompts relating to content already in the context

@lmstudio @SalemDono @zeddotdev Yes, it was just hi, write a short story. Not able to see much faster results

Use your LM Studio models to code locally in @zeddotdev 🚀
Zed @zeddotdev
Local model usage grew 3x in Zed's agent in the last 10 weeks. Cameron Mcloughlin on why he prefers local: "I worry about over-reliance on providers that operate like SaaS platforms, where a change of pricing or setup makes them unfeasible to use. With a local model, you always have access." Read more: zed.dev/blog/local-ai-…

@lmstudio @SalemDono @zeddotdev Went to beta, downloaded an MTP model (unsloth qwen3.6), enabled MTP speculative decoding in the model settings, seeing 96 t/s on M5 Max, which is roughly the same as without MTP. Also tried the 27B and saw the same result...

LM Studio: the Windows version can now only be downloaded as 0.4.6.
The macOS version is 0.4.7 only.
Is there a reason the latest version can't be downloaded?
@lmstudio








