Mazen — sa/acc
@ma7dev

building at @deepforai; @cursor_ai ambassador | prev: @malaa_tech @oregonstate | @pytorch award winner | ms & bs @oregonstate

2.7K posts · Riyadh · Joined April 2023
992 Following · 3.2K Followers
Mazen — sa/acc retweeted
Nous Research @NousResearch
Today we release Token Superposition Training (TST), a modification to the standard LLM pretraining loop that produces a 2-3× wall-clock speedup at matched FLOPs without changing the model architecture, optimizer, tokenizer, or training data. During the first third of training, the model reads and predicts contiguous bags of tokens, averaging their embeddings on the input side and predicting the next bag with a modified cross-entropy on the output side. For the remainder of the run, it trains normally on next-token prediction. The inference-time model is identical to one produced by conventional pretraining. Validated at 270M, 600M, and 3B dense scales, and at 10B-A1B MoE. The work on TST was led by @bloc97_, @gigant_theo, and @theemozilla.
[image]
142 replies · 392 reposts · 3.5K likes · 387.7K views
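A minimal PyTorch sketch of the bag phase described above. The tweet doesn't publish the loss, so the "modified cross-entropy" here (averaging token-level NLL over the members of the next bag) is one plausible reading rather than Nous's actual objective, and the transformer trunk is elided.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, V, D, k = 2, 32, 100, 16, 4        # batch, seq len, vocab, embed dim, bag size
embed = nn.Embedding(V, D)
head = nn.Linear(D, V)
tokens = torch.randint(V, (B, T))

bags = tokens.view(B, T // k, k)         # contiguous bags of k tokens
x = embed(bags).mean(dim=2)              # input side: superpose each bag by averaging embeddings
h = x                                    # stand-in for the transformer trunk (omitted)
logits = head(h)[:, :-1]                 # position i predicts the tokens of bag i+1

log_p = F.log_softmax(logits, dim=-1)    # (B, N-1, V)
targets = bags[:, 1:]                    # (B, N-1, k): members of the next bag
loss = -log_p.gather(-1, targets).mean() # assumed loss: mean NLL over bag members
loss.backward()
```

Per the tweet, after the first third of training this collapses back to ordinary next-token prediction (k = 1), which is why the deployed model is identical in form to a conventionally pretrained one.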
Mazen — sa/acc retweeted
Apurva Gandhi @apurvasgandhi
Sub-agents are a promising inference-time scaling primitive:
• Expand an agent's working memory
• Divide-and-conquer hard problems
• Solve problems faster with parallel execution
But how do we train a model to best take advantage of sub-agents and make sure we get these benefits? Very excited to release RAO: Recursive Agent Optimization. RAO is an end-to-end reinforcement learning approach for training LLM agents to spawn, delegate to, and coordinate with recursive copies of themselves (that can themselves spawn other agents), turning recursive inference into a learned capability. 1/10
[GIF]
20 replies · 114 reposts · 693 likes · 127.8K views
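RAO itself is a training method, but the sub-agent primitive it builds on is easy to picture. Here is a toy, inference-time-only sketch; call_llm is a stub, and the depth cap and split heuristic are illustrative assumptions, not anything from the thread.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"<model answer to: {prompt[:48]}...>"

def agent(task: str, depth: int = 0, max_depth: int = 2) -> str:
    # A trained policy would decide when to delegate; a crude length
    # heuristic keeps this example self-contained.
    if depth >= max_depth or len(task) < 80:
        return call_llm(f"Solve directly: {task}")
    # Divide and conquer: spawn two recursive copies (each may spawn more),
    # then synthesize their answers. In a real system the copies run in
    # parallel, which is where the latency win comes from.
    mid = len(task) // 2
    left = agent(task[:mid], depth + 1, max_depth)
    right = agent(task[mid:], depth + 1, max_depth)
    return call_llm(f"Combine these sub-results:\n1) {left}\n2) {right}")

print(agent("Audit every module of a large codebase for unsafe input handling. " * 4))
```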
Mazen — sa/acc retweeted
mazen @mznmel
A new step in the journey of building the best Saudi RAG: alsiyaq.deep.sa
9 replies · 26 reposts · 200 likes · 107K views
Mazen — sa/acc retweeted
SANI | صانع @devWithSANI
A dream without a path... stays a dream. With Masar, you learn technology in a way that keeps pace with a world that changes every day. Sign up now and #اصنع_مسارك
1 reply · 24 reposts · 164 likes · 99.1K views
Mazen — sa/acc retweeted
ممدوح الظفيري @MamdouhAI
Hello! @_y0u_0 and I have launched a course on Agentic Engineering with Claude Code. In it we explain how to build products and systems at the highest quality with agents! I'd bet on the quality of this course and the information in it.
81 replies · 128 reposts · 1.9K likes · 2.4M views
Mazen — sa/acc retweeted
Sudo su @sudoingX
"how do you fit qwen 3.6 27b q4 on 24gb at 262k context" lands in my dms 5 times a week. here is the exact memory math. model bytes at idle = 16gb (q4_k_m of 27b dense) kv cache at 262k context with q4_0 for both k and v = 5gb total = 21gb on the card headroom = 3gb for prompts and tool call traces the magic is the kv cache type. most people leave it at default fp16 or push to q8 thinking quality wins. on qwen 3.6 27b dense at 262k: - fp16 kv cache = does not fit at all - q8 kv cache = fits at 23gb but runs 3x slower (double penalty: more vram, less speed) - q4_0 kv cache = fits at 21gb at full speed (40 tok/s flat curve, same speed at 4k or 262k) most builders never test the kv cache type because tutorials never mention it. it is the single biggest unlock on consumer 24gb hardware. flags i run: ./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0 what they do: -ngl 99 = offload everything to gpu -c 262144 = 262k context window -np 1 = single user slot (do not enable multi-slot, eats headroom) -fa on = flash attention on (memory and speed both win) --cache-type-k q4_0 --cache-type-v q4_0 = the unlock if you are sitting on 24gb and not running this config, you are leaving 250k of context on the table. or worse, you are running q8 kv cache and burning 3x your speed for nothing. q4 is not a compromise on consumer hardware. it is the right call.
86 replies · 110 reposts · 1.3K likes · 74.5K views
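The memory math above can be sanity-checked with the standard KV-cache formula for a GQA transformer. The layer/head numbers below are placeholders chosen so the q4_0 case lands near the quoted 5 GB; they are not the real Qwen3.6-27B config, which you would read from the GGUF metadata.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: float) -> float:
    # K and V each store ctx_len * n_kv_heads * head_dim elements per layer.
    elems = 2 * n_layers * ctx_len * n_kv_heads * head_dim
    return elems * bytes_per_elem / 1024**3

CTX = 262_144
# llama.cpp block formats: q8_0 is ~8.5 bits/elem, q4_0 is ~4.5 bits/elem.
for name, bpe in [("fp16", 2.0), ("q8_0", 8.5 / 8), ("q4_0", 4.5 / 8)]:
    # Placeholder config: 40 layers, 4 KV heads (GQA), head_dim 128.
    print(f"{name}: {kv_cache_gib(40, 4, 128, CTX, bpe):.1f} GiB")

# fp16 -> ~20 GiB: hopeless next to 16 GB of weights on a 24 GB card.
# q4_0 -> ~5.6 GiB: in line with the ~5 GB figure in the thread.
```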
Mazen — sa/acc retweeted
⚚Sage @belikesagee
Me in a Teams meeting, waiting to say "Nothing From my side"
295 replies · 12.7K reposts · 75.1K likes · 2.2M views
Mazen — sa/acc retweeted
Qwen @Alibaba_Qwen
🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇

What's new:
🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks
💡 Strong reasoning across text & multimodal tasks
🔄 Supports thinking & non-thinking modes
✅ Apache 2.0 — fully open, fully yours

Smaller model. Bigger results. Community's favorite. ❤️ We can't wait to see what you build with Qwen3.6-27B! 👀 🔗👇

Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai/?models=qwen3.…
Github: github.com/QwenLM/Qwen3.6
Hugging Face: huggingface.co/Qwen/Qwen3.6-2… huggingface.co/Qwen/Qwen3.6-2…
ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…
[image]
531 replies · 1.7K reposts · 12.5K likes · 3.7M views
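A hedged loading sketch with Hugging Face transformers, following the pattern documented for earlier Qwen3 releases. The model id and the enable_thinking switch are assumptions carried over from Qwen3; check the model card for the exact usage.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-27B"  # assumed id, per the truncated HF link above
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write binary search in Python."}]
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3-style toggle between thinking/non-thinking modes
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```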
Ben Lang @benln
Baby sister arrived yesterday. Deeply grateful.
[image]
145 replies · 3 reposts · 991 likes · 49.7K views
Mazen — sa/acc retweeted
Dal | دال @DalData_sa
3 days until the start of the #أبنِ_وأطلق hackathon 🌟 In Saudi Arabia, the games market is growing fast. Register with us to be part of this journey 🚀 Register now: luma.com/8571jsfj
[image]
0 replies · 2 reposts · 9 likes · 2.8K views
Mazen — sa/acc retweeted
Michael Truell @mntruell
Excited to partner with the SpaceX team to scale up Composer. A meaningful step on our path to build the best place to code with AI.
SpaceX @SpaceX:
SpaceX AI and @cursor_ai are now working closely together to create the world's best coding and knowledge work AI. The combination of Cursor's leading product and distribution to expert software engineers with SpaceX's million-H100-equivalent Colossus training supercomputer will allow us to build the world's most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.
484 replies · 1.2K reposts · 10.4K likes · 1.6M views
Mazen — sa/acc retweeted
Dal | دال @DalData_sa
How are games built? And how do you build a game from scratch? You'll find the answer at the Dal #ابنِ_وأطلق hackathon. Pick your track, if you like, from the five available, and in just four hours you'll be able to build a complete game from scratch with the help of AI tools 🚀 *Attendees will get free Cursor credits. Register now! luma.com/8571jsfj
[image]
0 replies · 5 reposts · 52 likes · 7.9K views
Mazen — sa/acc retweeted
Raymond Weitekamp @raw_works
sorry it took me ~50 hrs! now i've got DSPy.RLM as SOTA on LongCOT (Full) by a very large margin, using... ...drumroll... Qwen 3.5 9B! 👑 Qwen3.5-9B + dspy.RLM = 15.69% on LongCoT-full 🔥 ~1.6× GPT 5.2's 9.83% on the same slice!
Raymond Weitekamp @raw_works:
ok so the default DSPy.RLM is literally going to destroy this benchmark before the end of the day. running now for sonnet 4.5...
🏆 Scoreboard (live)
RLM: 90/94 (95.7%)
Vanilla: 0/94 (0.0%)
anyone want to pay for the opus run? 😉
19 replies · 48 reposts · 603 likes · 125.4K views
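For readers wanting to try this, a guessed usage sketch: dspy.configure, dspy.LM, and the "question -> answer" signature style are standard DSPy, but I have not verified dspy.RLM's actual interface, so the RLM-specific lines below are assumptions that it follows the usual module convention (as dspy.ChainOfThought does).

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any LiteLLM-style backend

rlm = dspy.RLM("question -> answer")          # assumption: standard module signature
vanilla = dspy.Predict("question -> answer")  # baseline for comparison

q = "One LongCoT-style problem statement goes here."
print(rlm(question=q).answer)
print(vanilla(question=q).answer)
```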
Mazen — sa/acc retweeted
Dal | دال @DalData_sa
Interested in building your own product from scratch through launch using AI tools? Dal's new initiative #ابنِ_وأطلق is the perfect environment for that, with participants supported by experts in AI tools 🔥 Register now for the first edition of the initiative: "Build & Ship a Game in 4 Hours" luma.com/8571js
[image]
4 replies · 11 reposts · 71 likes · 9.4K views
Mazen — sa/acc retweeted
Cursor @cursor_ai
Through the end of this weekend, we are doubling Composer 2 usage limits inside of Cursor's new agents window. Enjoy!
120 replies · 95 reposts · 2.2K likes · 164.3K views
Mazen — sa/acc retweeted
Dal | دال @DalData_sa
Interested in building your first game? We invite you to register for the first edition of #ابني_واطلق, supported by Cursor and led by Eng. Mazen Alotaibi, Cursor ambassador in Riyadh. You'll be able to build a game from scratch with the help of AI in just four hours 🚀 Attendees will get free Cursor credits. Seats are limited; seize the chance and register now! luma.com/8571jsfj
[image]
2 replies · 2 reposts · 5 likes · 1.8K views