5/ Configure your effort level
Opus 4.7 uses adaptive thinking instead of thinking budgets. To tune the model to think more/less, we recommend tuning effort.
Use lower effort for faster responses and lower token usage. Use higher effort for the most intelligence and capability.
Personally, I use xhigh effort for most tasks, and max effort for the hardest tasks. Max applies to just your current session; other effort levels are sticky and persist for your next session also.
/effort to set your effort level.
New diffusion RL method from Tencent:SRPO!
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
32 H20 with 10minutes enhance Flux 1-dev.
@danielhanchen@UnslothAI Amazing! Is the cola notebook coming with this release too or have to wait little more =)
I see """Gemma 3 (12B) - we're still working on it""" on docs...
Looks great anyways!
Excited to share that @UnslothAI now supports:
• Full fine-tuning + 8bit
• Nearly any model like Mixtral, Cohere, Granite, Gemma 3
• No more OOMs for vision finetuning!
Blogpost with details: unsloth.ai/blog/gemma3
More updates:
• Many multiple optimizations in Unsloth allowing a further +10% less VRAM usage, and >10% speedup boost for 4-bit (on top of our original 2x faster, 70% less memory usage). 8-bit and full finetuning also benefit.
• Windows support via `pip install unsloth` should function now! Utilizes `triton-windows` which provides a pip installable path for Triton.
• Conversions to llama.cpp GGUFs for 16bit and 8bit now DO NOT need compiling! This solves many many issues, and this means no need to install GCC, Microsoft Visual Studio etc.
• Vision fine-tuning: Train on completions / responses only for vision models supported! Pixtral and Llava finetuning are now fixed! In fact nearly all vision models are supported out of the box! Vision models now auto resize images which stops OOMs and also allows truncating sequence lengths.
• GRPO in Unsloth now allows non Unsloth uploaded models to be in 4bit as well - reduces VRAM usage a lot! (ie using your own finetune of Llama)
• New training logs and infos - training parameter counts, total batch size
• Complete gradient accumulation bug fix coverage for all models!
Read the release here: github.com/unslothai/unsl…
My Gemma-3 analysis:
1. 1B text only, 4, 12, 27B Vision + text. 14T tokens
2. 128K context length further trained from 32K
3. Removed attn softcapping. Replaced with QK norm
4. 5 sliding + 1 global attn
5. 1024 sliding window attention
6. RL - BOND, WARM, WARP
Detailed analysis:
1. Architectural differences to Gemma 2:
More sliding windows are added to reduce KV cache load! A 5:1 ratio was found to work well, and ablations show 7:1 even work ok! SWA is 1024 - ablations show 1024 to 2048 work well.
2. Training, post-training
Gemma-3 uses TPUs, and Zero-3 like algos with JAX. 27B was trained on 14 trillion tokens. 12B = 12T, 4B = 4T and 1B = 2T tokens. All used distillation in the RL / post-training stage. Sampled 256 logits per token from a larger instruct model (unsure which - maybe a closed source one?). Used RL algos like BOND, WARM and WARP.
3. Chat template now forces a BOS token! Uses user and model. 262K vocab size. SentencePiece tokenizer with split digits, preserved whitespace & byte fallback.
4. Long Context & Vision Encoder:
Trained from 32K context, then extended to 128K context. RoPE Scaling of 8 was used. Pan & Scan algo was used for vision encoder. Vision encoder operates at a fixed resolution of 896 * 896. Uses windowing during inference time to allow other sizes.
Technical report: storage.googleapis.com/deepmind-media…