Komodin Bahattin

14 posts

Komodin Bahattin banner
Komodin Bahattin

Komodin Bahattin

@KoltukGenerali

Gereksiz ilgi alanları bol olan bir şahıs...

Turkey Katılım Eylül 2016
249 Takip Edilen12 Takipçiler
Komodin Bahattin
Komodin Bahattin@KoltukGenerali·
@bcherny Wouldn't it be much easier if we can add some cmd+left/right to play with effort levels in the cli? /effort is time consuming
English
1
0
1
550
Boris Cherny
Boris Cherny@bcherny·
5/ Configure your effort level Opus 4.7 uses adaptive thinking instead of thinking budgets. To tune the model to think more/less, we recommend tuning effort. Use lower effort for faster responses and lower token usage. Use higher effort for the most intelligence and capability. Personally, I use xhigh effort for most tasks, and max effort for the hardest tasks. Max applies to just your current session; other effort levels are sticky and persist for your next session also. /effort to set your effort level.
English
37
24
883
185.3K
Boris Cherny
Boris Cherny@bcherny·
Dogfooding Opus 4.7 the last few weeks, I've been feeling incredibly productive. Sharing a few tips to get more out of 4.7 🧵
English
335
1.1K
11.8K
1.6M
青龍聖者
青龍聖者@bdsqlsz·
New diffusion RL method from Tencent:SRPO! Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference 32 H20 with 10minutes enhance Flux 1-dev.
青龍聖者 tweet media青龍聖者 tweet media青龍聖者 tweet media青龍聖者 tweet media
English
4
44
268
16K
Pandemic Labs
Pandemic Labs@pndmdotorg·
if you’re seeing this, you are extremely early. something mega viral is coming soon. any interaction will be considered for alpha access.
Pandemic Labs tweet media
English
2.4K
748
5K
372.3K
Komodin Bahattin
Komodin Bahattin@KoltukGenerali·
@danielhanchen @UnslothAI Amazing! Is the cola notebook coming with this release too or have to wait little more =) I see """Gemma 3 (12B) - we're still working on it""" on docs... Looks great anyways!
English
1
0
1
430
Daniel Han
Daniel Han@danielhanchen·
Excited to share that @UnslothAI now supports: • Full fine-tuning + 8bit • Nearly any model like Mixtral, Cohere, Granite, Gemma 3 • No more OOMs for vision finetuning! Blogpost with details: unsloth.ai/blog/gemma3 More updates: • Many multiple optimizations in Unsloth allowing a further +10% less VRAM usage, and >10% speedup boost for 4-bit (on top of our original 2x faster, 70% less memory usage). 8-bit and full finetuning also benefit. • Windows support via `pip install unsloth` should function now! Utilizes `triton-windows` which provides a pip installable path for Triton. • Conversions to llama.cpp GGUFs for 16bit and 8bit now DO NOT need compiling! This solves many many issues, and this means no need to install GCC, Microsoft Visual Studio etc. • Vision fine-tuning: Train on completions / responses only for vision models supported! Pixtral and Llava finetuning are now fixed! In fact nearly all vision models are supported out of the box! Vision models now auto resize images which stops OOMs and also allows truncating sequence lengths. • GRPO in Unsloth now allows non Unsloth uploaded models to be in 4bit as well - reduces VRAM usage a lot! (ie using your own finetune of Llama) • New training logs and infos - training parameter counts, total batch size • Complete gradient accumulation bug fix coverage for all models! Read the release here: github.com/unslothai/unsl…
Daniel Han tweet media
English
16
52
341
26.1K
Daniel Han
Daniel Han@danielhanchen·
My Gemma-3 analysis: 1. 1B text only, 4, 12, 27B Vision + text. 14T tokens 2. 128K context length further trained from 32K 3. Removed attn softcapping. Replaced with QK norm 4. 5 sliding + 1 global attn 5. 1024 sliding window attention 6. RL - BOND, WARM, WARP Detailed analysis: 1. Architectural differences to Gemma 2: More sliding windows are added to reduce KV cache load! A 5:1 ratio was found to work well, and ablations show 7:1 even work ok! SWA is 1024 - ablations show 1024 to 2048 work well. 2. Training, post-training Gemma-3 uses TPUs, and Zero-3 like algos with JAX. 27B was trained on 14 trillion tokens. 12B = 12T, 4B = 4T and 1B = 2T tokens. All used distillation in the RL / post-training stage. Sampled 256 logits per token from a larger instruct model (unsure which - maybe a closed source one?). Used RL algos like BOND, WARM and WARP. 3. Chat template now forces a BOS token! Uses user and model. 262K vocab size. SentencePiece tokenizer with split digits, preserved whitespace & byte fallback. 4. Long Context & Vision Encoder: Trained from 32K context, then extended to 128K context. RoPE Scaling of 8 was used. Pan & Scan algo was used for vision encoder. Vision encoder operates at a fixed resolution of 896 * 896. Uses windowing during inference time to allow other sizes. Technical report: storage.googleapis.com/deepmind-media…
Daniel Han tweet media
English
21
123
782
48.3K
FM Scout
FM Scout@fmscout·
❓ Which player of undoubted quality are you having trouble getting to perform consistently on #FM20?
English
46
2
14
0
Komodin Bahattin retweetledi
Football Manager
Football Manager@FootballManager·
Noticed while watching @WestworldHBO last night that the ‘hosts’ attributes use a familiar 20 point system...
Football Manager tweet media
English
8
86
249
0
Komodin Bahattin
Komodin Bahattin@KoltukGenerali·
Finally 64-bit support! Also I'm really excited about new match engine and how my tactics will work on it... #fm17
English
0
0
2
0
Komodin Bahattin retweetledi
Johan Andersson
Johan Andersson@producerjohan·
This is something new for the Mare Nostrum owners in v1.18 of @E_Universalis
Johan Andersson tweet media
English
6
10
48
0