andthattoo
@andthatto

694 posts

drums of liberation @driaforall

Joined July 2016
966 Following · 780 Followers
Pinned Tweet
andthattoo @andthatto ·
Qwen 3.6 is frontier for local. It also thinks forever. I tried a dumb inference-time trick: make its <think> block obey a tiny grammar. Result:
- HumanEval+: 22x fewer think tokens, no accuracy loss
- LiveCodeBench public slice: +14% pass@1, ~5x fewer total tokens
51 replies · 82 reposts · 1.3K likes · 128.6K views
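For concreteness, here is a minimal sketch of what "make its <think> block obey a tiny grammar" can look like against llama-server, whose /completion endpoint accepts a GBNF string in a grammar field. The grammar below, the GOAL/APPROACH/EDGE labels (borrowed from the shape described downthread), and the endpoint URL are illustrative assumptions, not the repo's actual setup; it also assumes the chat template leaves the model to emit its own <think> tags.

```python
import json
import urllib.request

# Illustrative GBNF: force the reasoning into three short labeled lines,
# then leave the final answer unconstrained. Bounded repetition {m,n} and
# \xHH escapes are supported by llama.cpp's GBNF; the repo's actual
# grammars may look different.
THINK_GRAMMAR = r'''
root     ::= "<think>\n" section section section "</think>\n" answer
section  ::= ("GOAL: " | "APPROACH: " | "EDGE: ") line
line     ::= [^\n]{1,120} "\n"
answer   ::= [^\x00]*
'''

def complete(prompt: str, url: str = "http://localhost:8080/completion") -> str:
    """Send a completion request with a GBNF grammar applied at decode time."""
    payload = {
        "prompt": prompt,
        "grammar": THINK_GRAMMAR,  # llama-server constrains sampling to this
        "n_predict": 1024,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

The design point is that only the think block is pinned down; the answer nonterminal stays free-form, so the final output format is untouched.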
andthattoo @andthatto ·
Opened a llama.cpp discussion about whether custom GBNF grammars can compose with tool calls in llama-server. Right now tools work alone, grammar works alone, but tools+grammar doesn't. If you use llama.cpp + agent frameworks, sharing your use cases would help move the design faster. github.com/ggml-org/llama…
0 replies · 0 reposts · 5 likes · 338 views
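To make the gap concrete, a sketch of the request shapes involved. These payloads are hypothetical; llama-server takes grammar as a raw GBNF extension field and tools per the OpenAI schema, but per the discussion above the two don't compose in one request.

```python
# Each of these works on its own against llama-server's
# OpenAI-compatible /v1/chat/completions endpoint:

tools_only = {
    "model": "qwen",
    "messages": [{"role": "user", "content": "What's the weather in Izmir?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

grammar_only = {
    "model": "qwen",
    "messages": [{"role": "user", "content": "Plan a fix for this bug."}],
    # llama-server extension: a raw GBNF string applied at decode time
    "grammar": 'root ::= "GOAL: " [^\\n]+ "\\n"',
}

# The open question in the linked discussion: a request like this, where a
# custom grammar constrains the reasoning while tool-call JSON stays valid.
tools_plus_grammar = {**tools_only, "grammar": grammar_only["grammar"]}
```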
andthattoo @andthatto ·
@XReyRobert yes I've noticed, gonna post about this in the llama.cpp discussions
0 replies · 0 reposts · 1 like · 21 views
XReyRobert @XReyRobert ·
@andthatto It seems that llama.cpp supports grammars and tool calling, but not both in the same request... I tried and failed to use this with Hermes tonight...
1 reply · 0 reposts · 0 likes · 40 views
chad @chaddotphp ·
@andthatto I'm testing locally with this now, and so far the results are very impressive. Thanks!
1 reply · 0 reposts · 2 likes · 792 views
LeetLLM.com @leetllm ·
@andthatto super clever trick, but my brain immediately goes to the bitter lesson. manually constraining reasoning traces works great today, but raw scale is just going to steamroll heuristics like this.
2 replies · 0 reposts · 18 likes · 4.1K views
andthattoo @andthatto ·
@voxmenthe @VictorTaelin Yes, basically. It’s an inference-time prior over the shape of the scratchpad. The bet is: many reasoning tokens are low-value narration, and a small structured harness preserves the useful planning bits while cutting the ramble.
1 reply · 1 repost · 36 likes · 828 views
andthattoo @andthatto ·
@vega_holdings the repo is not plug-and-play, so make sure codex/claude reads it and makes something useful out of it for your case, good luck!
1 reply · 0 reposts · 1 like · 120 views
vega @vega_holdings ·
@andthatto qwen overthinking was killing my per-turn token limit, will try it out, thanks!
1 reply · 0 reposts · 1 like · 144 views
andthattoo @andthatto ·
Tiny grammar = constrain only the <think> block at decoding time. So instead of free-form thought, it must write e.g. GOAL / APPROACH / EDGE. The final answer is still open. Yes, it can affect thinking. The surprising part is that in these runs it compressed reasoning hard without hurting pass@1, and improved it in some cases. But that is purely task+grammar dependent.
4 replies · 0 reposts · 51 likes · 4K views
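Since the labels that fit one task may not fit another (a point repeated downthread), one might generate the tiny grammar per task rather than hardcode it. A hypothetical helper, not from the repo, assuming llama.cpp's GBNF syntax with bounded repetition:

```python
def think_grammar(sections: list[str], max_line: int = 120) -> str:
    """Build a GBNF grammar forcing the <think> block into short labeled
    lines, one label per section, leaving the answer free-form.
    Note: section+ allows labels to repeat or appear in any order; a
    stricter grammar could fix the sequence."""
    labels = " | ".join(f'"{s}: "' for s in sections)
    return (
        'root ::= "<think>\\n" section+ "</think>\\n" answer\n'
        f"section ::= ({labels}) line\n"
        f'line ::= [^\\n]{{{1},{max_line}}} "\\n"\n'
        "answer ::= [^\\x00]*\n"
    )

# e.g. the shape from the tweet:
print(think_grammar(["GOAL", "APPROACH", "EDGE"]))
```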
Taelin @VictorTaelin ·
@andthatto a tiny grammar? wdym? won't that affect its thinking
1 reply · 0 reposts · 27 likes · 6K views
andthattoo @andthatto ·
@ljupc0 Go ahead! But make sure to explore the grammar best fitting your tasks/agents. The ones in the repo may not be optimal for your case.
0 replies · 0 reposts · 1 like · 164 views
Ljubomir Josifovski @ljupc0 ·
Wow - thanks! Exactly what I need. I've had a problem with the Qwen-s thinking a lot, need to put a limit on that (need a response sooner, even if not perfect). The 3.5-s were bad in that they output nothing when interrupted :-( Couldn't use them reliably. The 3.6-s are better now :-) they do produce output when interrupted with
  --n-predict 8192
  --reasoning on --reasoning-format deepseek
  --chat-template-kwargs '{"preserve_thinking":true}'
  --reasoning-budget 3072 --reasoning-budget-message 'Reasoning budget exhausted. Stop thinking and provide the best final answer now.'
But of course it would be even better if they didn't get stuck thinking forever in the 1st place :-) I also use an LCB tiny portion to test - giving this a try now... Thanks!
2 replies · 0 reposts · 1 like · 244 views
andthattoo @andthatto ·
My insight is that a lot of verbose CoT is scaffolding, not essential computation. Constrained decoding can force a denser interface to the model’s latent reasoning. But if the task really needs more deliberation, it leaks somewhere else.
2 replies · 1 repost · 61 likes · 6.5K views
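One way to sanity-check the scaffolding claim on your own tasks: run the same prompts with and without the grammar and compare think-token counts. A rough sketch with crude whitespace tokenization (hypothetical helpers; swap in the model's real tokenizer for exact numbers):

```python
import re

def think_span(text: str) -> str:
    """Extract the reasoning between <think> tags, if present."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    return m.group(1) if m else ""

def rough_token_count(text: str) -> int:
    # Whitespace split is a crude proxy for tokens; use the model's
    # tokenizer for real measurements.
    return len(text.split())

def compression_ratio(free_form: str, constrained: str) -> float:
    """How many times fewer think tokens the constrained run used."""
    free = rough_token_count(think_span(free_form))
    tight = rough_token_count(think_span(constrained)) or 1
    return free / tight
```

Pairing this with pass@1 on the same slice is what separates "denser interface" from "deliberation leaking somewhere else".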
andthattoo @andthatto ·
I'm onto something
1 reply · 0 reposts · 4 likes · 163 views
andthattoo retweeted
DeepSeek @deepseek_ai ·
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n
1.6K replies · 7.6K reposts · 44.4K likes · 9M views