Komodin Bahattin

14 posts

Komodin Bahattin

@KoltukGenerali

Gereksiz ilgi alanları bol olan bir şahıs...

Turkey Katılım Eylül 2016

249 Takip Edilen12 Takipçiler

Komodin Bahattin@KoltukGenerali·16 Nis

@bcherny Wouldn't it be much easier if we can add some cmd+left/right to play with effort levels in the cli? /effort is time consuming

English

550

Boris Cherny@bcherny·16 Nis

5/ Configure your effort level Opus 4.7 uses adaptive thinking instead of thinking budgets. To tune the model to think more/less, we recommend tuning effort. Use lower effort for faster responses and lower token usage. Use higher effort for the most intelligence and capability. Personally, I use xhigh effort for most tasks, and max effort for the hardest tasks. Max applies to just your current session; other effort levels are sticky and persist for your next session also. /effort to set your effort level.

English

883

185.3K

Boris Cherny@bcherny·16 Nis

Dogfooding Opus 4.7 the last few weeks, I've been feeling incredibly productive. Sharing a few tips to get more out of 4.7 🧵

English

335

1.1K

11.8K

1.6M

Komodin Bahattin@KoltukGenerali·10 Eyl

@bdsqlsz Can we use it for LoRA's I wonder...🤔

English

332

青龍聖者@bdsqlsz·10 Eyl

New diffusion RL method from Tencent:SRPO! Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference 32 H20 with 10minutes enhance Flux 1-dev.

English

268

16K

Komodin Bahattin@KoltukGenerali·1 Nis

@pandemicdotfun 👀

QME

Pandemic Labs@pndmdotorg·31 Mar

if you’re seeing this, you are extremely early. something mega viral is coming soon. any interaction will be considered for alpha access.

English

2.4K

748

372.3K

Komodin Bahattin@KoltukGenerali·14 Mar

@danielhanchen @UnslothAI Amazing! Is the cola notebook coming with this release too or have to wait little more =) I see """Gemma 3 (12B) - we're still working on it""" on docs... Looks great anyways!

English

430

Daniel Han@danielhanchen·14 Mar

Excited to share that @UnslothAI now supports: • Full fine-tuning + 8bit • Nearly any model like Mixtral, Cohere, Granite, Gemma 3 • No more OOMs for vision finetuning! Blogpost with details: unsloth.ai/blog/gemma3 More updates: • Many multiple optimizations in Unsloth allowing a further +10% less VRAM usage, and >10% speedup boost for 4-bit (on top of our original 2x faster, 70% less memory usage). 8-bit and full finetuning also benefit. • Windows support via `pip install unsloth` should function now! Utilizes `triton-windows` which provides a pip installable path for Triton. • Conversions to llama.cpp GGUFs for 16bit and 8bit now DO NOT need compiling! This solves many many issues, and this means no need to install GCC, Microsoft Visual Studio etc. • Vision fine-tuning: Train on completions / responses only for vision models supported! Pixtral and Llava finetuning are now fixed! In fact nearly all vision models are supported out of the box! Vision models now auto resize images which stops OOMs and also allows truncating sequence lengths. • GRPO in Unsloth now allows non Unsloth uploaded models to be in 4bit as well - reduces VRAM usage a lot! (ie using your own finetune of Llama) • New training logs and infos - training parameter counts, total batch size • Complete gradient accumulation bug fix coverage for all models! Read the release here: github.com/unslothai/unsl…

English

341

26.1K

Komodin Bahattin@KoltukGenerali·12 Mar

@danielhanchen Awesome, you guys rock!

English

Daniel Han@danielhanchen·12 Mar

@KoltukGenerali Not yet! I'm working around the clock to get it to work!

English

725

Daniel Han@danielhanchen·12 Mar

My Gemma-3 analysis: 1. 1B text only, 4, 12, 27B Vision + text. 14T tokens 2. 128K context length further trained from 32K 3. Removed attn softcapping. Replaced with QK norm 4. 5 sliding + 1 global attn 5. 1024 sliding window attention 6. RL - BOND, WARM, WARP Detailed analysis: 1. Architectural differences to Gemma 2: More sliding windows are added to reduce KV cache load! A 5:1 ratio was found to work well, and ablations show 7:1 even work ok! SWA is 1024 - ablations show 1024 to 2048 work well. 2. Training, post-training Gemma-3 uses TPUs, and Zero-3 like algos with JAX. 27B was trained on 14 trillion tokens. 12B = 12T, 4B = 4T and 1B = 2T tokens. All used distillation in the RL / post-training stage. Sampled 256 logits per token from a larger instruct model (unsure which - maybe a closed source one?). Used RL algos like BOND, WARM and WARP. 3. Chat template now forces a BOS token! Uses user and model. 262K vocab size. SentencePiece tokenizer with split digits, preserved whitespace & byte fallback. 4. Long Context & Vision Encoder: Trained from 32K context, then extended to 128K context. RoPE Scaling of 8 was used. Pan & Scan algo was used for vision encoder. Vision encoder operates at a fixed resolution of 896 * 896. Uses windowing during inference time to allow other sizes. Technical report: storage.googleapis.com/deepmind-media…