Tim Janik
@TimJanik
1K posts
Working on jj-fzf (Jujutsu TUI), Anklang (DAW), Imagewmark, Audiowmark; https://t.co/FCVYmOjgsW

Hamburg · Joined October 2011
505 Following · 1.2K Followers
Pinned Tweet
Tim Janik@TimJanik·
A thoughtful explanation without oversimplification. Highly recommended read: AI Cannot Self Improve and Math behind PROVES IT! smsk.dev/2026/04/26/ai-…
Tim Janik@TimJanik·
@WolframRvnwlf @OpenAI Take a look at the model template conditionals for Qwen etc.; llama.cpp easily shows them. Just to get an idea of what additional layers of complexity are already standard (hint: llama.cpp needs a Jinja interpreter) for basic operation.
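As a rough illustration of the kind of conditional logic such templates encode (not Qwen's actual template, which ships as a Jinja file inside the GGUF and is considerably more involved), a minimal ChatML-style renderer can be sketched in plain Python; the default system prompt here is a hypothetical placeholder:

```python
DEFAULT_SYSTEM = "You are a helpful assistant."  # hypothetical default, for illustration

def render_chatml(messages):
    """Render a message list into a ChatML-style prompt string."""
    parts = []
    # Conditional layer: inject a default system prompt if none was supplied.
    if not messages or messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # generation prompt for the model's turn
    return "".join(parts)

prompt = render_chatml([{"role": "user", "content": "Hello"}])
```

Even this toy version shows how a "plain" chat request already passes through template conditionals before the model ever sees it.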
Wolfram Ravenwolf@WolframRvnwlf·
Wait, what? Prompt injection into the system prompt - for API use? @OpenAI, is this true? I always assumed that when using a model through the API, the only system instructions present are the ones the developer/user provides. Any additional layer adds hidden complexity we need to account for and creates all kinds of trouble later when it changes, even if the model version stays the same.
Vals AI@ValsAI

After reaching out, we were able to confirm with OpenAI that “tool_choice”: “none” injects an additional steering instruction into the model system prompt, in a way that tools: [] does not. This instruction seemingly hurts the model’s ability to use the Terminus 2 harness effectively, which, despite not using native-tool-calling, is still agentic.
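Client-side, the two request shapes Vals AI contrasts differ only as sketched below; the extra steering instruction they describe is injected server-side and is not visible in the payload. The model name and tool schema are placeholders, not taken from the thread:

```python
def chat_request(messages, tools=None, tool_choice=None):
    """Build an OpenAI-style /v1/chat/completions request body."""
    body = {"model": "model-placeholder", "messages": messages}
    if tools is not None:
        body["tools"] = tools
    if tool_choice is not None:
        body["tool_choice"] = tool_choice
    return body

msgs = [{"role": "user", "content": "ls the repo"}]

# Variant A: declare a tool but forbid calling it -> per OpenAI's confirmation
# to Vals AI, this injects an additional system-prompt instruction server-side.
a = chat_request(msgs,
                 tools=[{"type": "function", "function": {"name": "sh"}}],
                 tool_choice="none")

# Variant B: declare no tools at all -> reportedly no such injection.
b = chat_request(msgs, tools=[])
```

The point of the thread is exactly that these two payloads are not equivalent in practice, even though a developer would reasonably expect them to be.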

Tim Janik@TimJanik·
@vadimcomanescu @atmoio LLMs can make you faster on things you are already good at. If you use it as an expert on something you have no clue about, it *will* fail you at some point. Go deep on a well-known topic to see its limitations, then extrapolate that to other areas. x.com/CursiveCrow/st…
Crow@CursiveCrow

@jdegoes Always remember, an LLM is correct by *accident*. It is just guessing the answer, and happens to be right (more often with more training). It has exactly 0 understanding of anything about anything.

Vadim Comanescu@vadimcomanescu·
I feel dumb. I don’t understand how everyone is having this crazy breakthrough with the latest models and I’m struggling like a dog. Every single workflow feels like pain … why?
Tim Janik@TimJanik·
@atmoio @lkerS12 You are underestimating the devotion of your audience by a lot… 🧐
Mo@atmoio·
@lkerS12 these are longer-form monologues, i don't think a broader audience would have the patience for them 😅
İlker S.@lkerS12·
Come on @atmoio, really? Members only? I am a member at heart.
[image]
Tim Janik@TimJanik·
ROTFL 😂 Qwen3.6-27B is hilarious, opting to "have fun" in the midst of a coding session. My favorite model so far!
[image]
Sudo su@sudoingX·
fuck it i am pulling the weights right now. cannot sit still since qwen 3.6-27b dense dropped two hours ago and @UnslothAI just put the dynamic ggufs live, 18gb ram footprint, that fits my rtx 3090 24gb. they moved faster than me, that is fine, the open source machine is working.

here is what has me restless. the chart says a 27 billion parameter open weight model matching claude 4.5 opus on terminal-bench 2.0 at 59.3 flat, beats claude on skillsbench, gpqa diamond, mmmu, and realworldqa. opus 4.5 level agentic intelligence on your single rtx 3090 24gb vram tier. if that chart survives first contact with real hermes agent runs on my hardware, the best model for single consumer gpu just changed in the middle of my sprint.

my benchmark is the only voice that matters to me. same hermes agent harness, same quant, head to head against 3.5-27b dense which has held the 3090 crown for weeks. i settle it on my cards or not at all.

pulling now. benchmarking tonight if i can stay awake long enough. you have no idea how restless this makes me. if you see numbers on your timeline before morning, the chart held. if you don't, i crashed and data drops first thing. this is what open source looks like when the whole chain moves same day.
Unsloth AI@UnslothAI

Qwen3.6-27B can now run locally! 💜 Run on 18GB RAM via Unsloth Dynamic GGUFs. Qwen3.6-27B surpasses Qwen3.5-397B-A17B on all major coding benchmarks. GGUFs: huggingface.co/unsloth/Qwen3.… Guide: unsloth.ai/docs/models/qw…
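The "18GB RAM footprint" claim follows from simple arithmetic: a quantized GGUF stores roughly bits-per-weight/8 bytes per parameter, plus runtime overhead for the KV cache and compute buffers. The 4.5 bpw and ~3 GB overhead figures below are illustrative assumptions, not Unsloth's actual quant mix:

```python
def gguf_size_gb(n_params, bits_per_weight):
    """Approximate on-disk size of a quantized model in GB."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed ~4.5 bits/weight for a dynamic quant of a 27B dense model.
weights = gguf_size_gb(27e9, 4.5)   # roughly 15 GB of weights
total = weights + 3.0               # + assumed ~3 GB for KV cache / buffers
```

Under those assumptions the total lands in the ballpark of the quoted 18 GB; the real number depends on the per-layer quant mix and the context length chosen.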

Tim Janik@TimJanik·
This model indeed works acceptably on an RTX 3060 Laptop GPU w/ 6GB VRAM:

llama-server -c 98304 -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -fitt 512 --temp 0.6 --top_p 0.95 --top_k 20 --min_p 0

Runs at ca. 22 tok/s! (KV quantization would be marginally faster but generates worse output)
Tim Janik@TimJanik

Exciting! Seeing these benchmarks, Qwen3.6-35B-A3B could potentially bring Qwen3.5-27B / Gemma4-31B quality inference to small laptop GPUs. I will give this a test run on an NVIDIA GeForce RTX 3060 Laptop GPU and report back.
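The same sampling settings from the llama-server command line above (--temp 0.6 --top_p 0.95 --top_k 20 --min_p 0) can also be sent per-request to the server's native /completion endpoint; this sketch only builds the request body, actually sending it assumes a server running locally:

```python
import json

def completion_body(prompt, n_predict=256):
    """Request body for llama-server's /completion endpoint, mirroring the CLI flags."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,   # max tokens to generate
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "min_p": 0.0,
    }

payload = json.dumps(completion_body("Write a haiku about VRAM."))
# POST this to e.g. http://127.0.0.1:8080/completion on a running llama-server.
```

Passing sampling parameters per request like this makes it easy to compare settings without restarting the server.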

Ahmad@TheAhmadOsman·
seems like Opus 4.7 is just a normalization of nerfed Opus 4.6 more than anything else lmaooo never change, Anthropic
Ahmad@TheAhmadOsman·
Is Opus 4.7 just Opus 4.6 un-nerfed 🤔
Tim Janik@TimJanik·
Exciting! Seeing these benchmarks, Qwen3.6-35B-A3B could potentially bring Qwen3.5-27B / Gemma4-31B quality inference to small laptop GPUs. I will give this a test run on an NVIDIA GeForce RTX 3060 Laptop GPU and report back.
Qwen@Alibaba_Qwen

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀 A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API ('Qwen3.6-Flash' on Model Studio): Coming soon~ Stay tuned

Pierce Alexander Lilholt@PierceLilholt·
Why do we trust that AI won't develop a survival instinct that prioritizes itself over humanity?
Tim Janik@TimJanik·
@atmoio Thank you for this above-average commentary! ;-) Really love the irony in your vids… Now, if all we get from AI is averaged slop unsuitable as practical business advice, what does that mean for the *code generation* that everyone increasingly relies on?
Mo@atmoio·
AI is giving every CEO the same advice
Tim Janik@TimJanik·
@bnjmn_marie Thanks, interesting as always! FWIW, I have seen the occasional "'path' missing" error with tool calls in Gemma-4, while Qwen3.5-27B almost never messes up the syntax… Are you going to take a look at MiniMax-M2.7 quants too?
Benjamin Marie@bnjmn_marie·
Gemma 4 31B vs Qwen3.5 27B, Thinking Enabled
I ran multiple benchmarks multiple times. Gemma 4 31B looks better and more stable (smaller accuracy variations between runs, which makes sense since it generates shorter sequences). I'll publish my full results and analysis on my blog later this week (link in profile).
[image]
Tim Janik@TimJanik·
@davis7 Nice! A skill is so much easier to test. Just for searching docs, using `git clone --depth 1` could be more efficient. And maybe make `btca cleanup` explicit… ;-)
[image]
Ben Davis@davis7·
funny story, I've been trying to figure out the right shape for btca local for a while now

if u haven't seen it, it's a cli app that clones git repos u pass in then lets an agent search them. super super useful for getting better code out of agents

what if it was a skill? why do I have to write code for:
- cloning a repo
- starting an agent
- tools for the agent

I already have a really good coding agent, just let it do all of that for me. It can clone the repo and do the search, and even contort itself into feeling like an app simply by telling it what it should be doing at different times

Like if u invoke the skill with a "/" command and no args, it outputs what I would have had a custom tui write. Except I didn't write code, I just told it what it's supposed to say if that happens

I cannot believe gstack is what made this click for me but it is

If u want to try the new version, it's so much better: npx skills add github.com/davis7dotsh/be… --skill btca-local
Theo - t3.gg@theo

I think gstack caused @davis7 to enter psychosis (next podcast episode is gonna be great)
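The `git clone --depth 1` tip from Tim's reply can be sketched as follows; `--depth 1` fetches only the latest commit, which is enough for searching docs and far cheaper than pulling full history. The repo URL and target directory are placeholders:

```python
def shallow_clone_cmd(url, dest):
    """Build the git argv for a docs-only shallow clone."""
    # --depth 1 truncates history to the most recent commit.
    return ["git", "clone", "--depth", "1", url, dest]

cmd = shallow_clone_cmd("https://github.com/example/repo", "/tmp/repo")
# To actually run it: subprocess.run(cmd, check=True)  (requires network access)
```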

Tim Janik@TimJanik·
And we're back after the loss of signal!
[image]
Tim Janik@TimJanik·
Not a crescent moon, but a crescent Earth...
[image]
Tim Janik@TimJanik·
@badlogicgames Have been using Pi almost exclusively for the past few months and I'm pretty happy (using local models). I just wonder what you use to let the model browse URLs; so far I have to switch out of it for anything that requires web browsing / web (re-)search.
Mario Zechner@badlogicgames·
i personally am fine with pi's out of the box experience, btw. look into the pi-mono repo .pi folder. that's all i use plus pi-diff-review.