yuxinlu1

23 posts

yuxinlu1

@Dadahelper1

MLE/AIE

Arizona, USA Katılım Şubat 2025

43 Takip Edilen179 Takipçiler

yuxinlu1@Dadahelper1·9h

@HuggingModels 500 on HF but only ~ 150 on X here 😢

English

Hugging Models@HuggingModels·14h

100k followers on X < 100 followers on GitHub < 10 followers on Huggingface

English

1.7K

yuxinlu1@Dadahelper1·11h

It's black-box distillation. I pull chain-of-thought from a strong teacher (Claude / Fable 5) — but for coding I keep only the traces whose code actually passes its unit tests. Execution-verified rejection sampling, so the model learns reasoning that runs, not text that just looks right. Then a 4-bit QLoRA fine-tune on a single RTX 5090 (32GB), merge the adapter back to fp16, and convert to GGUF quants with llama.cpp so anyone can run it locally. v2 adds real multi-step tool-use traces on top for the agentic side. Honestly, most of the work is in the data — the training itself fits on one consumer card. Happy to go deeper on any part if you're building something 👀

English

cudnn_cu12@_proteuss_·12h

@Dadahelper1 interested in how you make these!!

English

yuxinlu1@Dadahelper1·12h

v2 is out on HF and the jump in agentic tool-use is huge — on tau2-bench telecom it goes from ~15% (base Gemma 4 12B) to ~55%: it actually diagnoses → fixes → verifies instead of bailing after step one. A lot of folks are hitting issues across different runtimes (llama.cpp, Ollama, LM Studio, koboldcpp…) — almost all of it is template / tool-format quirks, not the weights. I've written every fix up in the HF discussions, so check there first 🙏 And honestly? Google's models are way less convenient to work with than Qwen — the non-standard chat template + native tool-call format trip up half the ecosystem, while Qwen's standard ChatML just works everywhere. Shipped it anyway 💪 huggingface.co/yuxinlu1/gemma…

English

674

yuxinlu1@Dadahelper1·1d

@ZixuanLi_ Thank you so much for the encouragement! Means a lot as an open-source dev. GLM-5.2 is genuinely amazing 🙌

English

7.4K

Zixuan Li@ZixuanLi_·1d

GLM-5.2 has been "stuck" at No.2 on Hugging Face Trending for three days, but I'm thrilled to have connected with the creator behind the No.1 project this afternoon. It's been amazing to see open-source work resonating with so many people.

English

1.4K

177.6K

yuxinlu1@Dadahelper1·4d

@berryxia 感谢推荐😍

中文

Berryxia.AI@berryxia·6d

一个12B的本地模型，直接把Fable 5的推理链条蒸馏进去了，现在你能在消费级显卡上离线跑顶级coding能力。这个Gemma 4 12B Coder GGUF是基于Google的gemma-4-12B-it微调的，专门针对代码生成和复杂推理。训练数据里用了Composer 2.5的真实通过案例，还让Fable 5帮着补全那些难搞的case，结果就是每一步推理都导向能真正跑通的代码。最爽的是它走GGUF格式，12GB显卡就能顺畅跑，甚至CPU也能用。调试、补全代码、生成复杂算法、做链式思考提示，全都本地搞定，不用交API费、不用担心导出管制。以前大家觉得前沿模型要么云端用要么根本跑不了，现在开源社区直接把Fable 5的思考方式打包成能塞进你笔记本的版本。模型还在快速迭代，下载量已经破六千，社区反馈它在本地coding场景里特别能打。这波操作把“强大但受限”和“本地可用”之间的鸿沟给填上了。真正的AI生产力，从来不是等大厂放行，而是社区自己动手把能力解放出来。

Hugging Models@HuggingModels

Gemma 4 12B Coder is here and it's a game changer for local code generation. This GGUF model packs Google's latest gemma-4 architecture into a compact 12B size, perfect for running on consumer hardware. It's optimized for reasoning and thinking, making it ideal for developers who want fast, private coding assistance without the cloud.

中文

143

751

83K

yuxinlu1@Dadahelper1·4d

This made my day 🙏 You nailed the recipe — Composer 2.5 CoT as the backbone, Fable 5 to rescue the hard cases, every example verified by running the code. 20 tok/s on a 4060 is exactly the point: local, offline, yours. push -c past 64K — it's fixed to the full 256K context now.

Alok@analogalok

This is the most hilarious thing I saw and did today Ran gemma-4-12B-coder-fable5-composer2.5-v1-GGUF locally with 8 GB VRAM at 20+ tok/sec Anthropic's Claude Fable 5 launched June 9. By June 12 it was banned. I can't access it. You can't either. But here's the twist: I'm running a model trained on its chain of thought at 20 tok/s on my RTX 4060 8GB. Locally. Offline. No cloud. No export control. Enter: Gemma4-12B-Coder GGUF (Q4_K_M) Base: Google's gemma-4-12B-it Fine-tuned on verifiable Python CoT data: - Primary: Composer 2.5 real reasoning traces (only passing solutions kept) - Auxiliary: Fable 5 used to redo the hard cases Composer missed. Every training example's reasoning led to code that actually ran. No hallucinated logic. Llama.cpp flags: -m gemma4-coding-Q4_K_M.gguf -cnv -ngl 44 -c 64000 -v (huggingface model link in comments) Flag breakdown: -ngl 44 → offload 44 layers to GPU (tune this for your VRAM) -c 64000 → 64K context window -cnv → conversation/chat mode -v → verbose output The irony writes itself. Anthropic spent weeks telling the world Fable 5 (mythos) is too powerful to release. Then released it. Then got banned from serving it, including their own researchers. Meanwhile: a Gemma 4 12B fine tune, trained on Fable 5's reasoning, runs fully offline on my mid range consumer GPU No API. No cloud. Just me and llama.cpp. This is why local AI matters. Check out the model's link in the comments. How's your experience been with this model?

English

1.6K

yuxinlu1@Dadahelper1·4d

@HuggingModels Thank you so much for the feature, @HuggingModels 🙏 Truly honored — v2 is on the way and it'll be even better 🚀

English

100

Hugging Models@HuggingModels·14 Haz

English

101

495

4.7K

1.7M

yuxinlu1@Dadahelper1·4d

Thank you @HuggingModels for the feature, this made my day 🙏 v1 has been incredible to see take off in the community. Good news: v2 is already cooking — dataset's basically done. It'll be stronger across the board. Releasing as soon as it's ready 🚀

Hugging Models@HuggingModels

English

961

yuxinlu1@Dadahelper1·13 Haz

@ZixuanLi_ Open weight for all non us citizen, like fable5 lol

English

2.5K

Zixuan Li@ZixuanLi_·13 Haz

Thanks for all the feedback. GLM-5.2 will begin rolling out to all Coding Plan users in 3 hours.

Zixuan Li@ZixuanLi_

Help us shape the next GLM release: what should we prioritize most?

English

133

115

1.8K

298.7K

yuxinlu1@Dadahelper1·14 May

@VincentLogic 算上今天的翻倍了吗？

中文

416

Vincent | 信号＞噪音@VincentLogic·14 May

帮大家探到Claude Code Max的用量上限了！ 💰 200刀/月订阅 📊 每周用量上限：~1700刀 📅 月度理论上限：~7000刀 ⚠️ 5小时用量翻倍也没用，只会更快进入冷却现在Plan周日才重置，剩下几天只能靠codex Pro续命了... 且用且珍惜吧🙏

中文

6.9K

yuxinlu1@Dadahelper1·14 May

@ChaomingWe80477 看别人测评周限额要大于plus用户，5小时窗口低于plus😬

中文

Cheney@ChaomingWe80477·12 May

关于 ChatGPT Business 与 Plus 在 Codex Desktop 中的实际使用对比，前几天看到 Business 美区、英区和澳区的限时优惠，便以折扣价分别订阅了美区和澳区各 2 个 Seat。今天恰好有个项目需要迭代，顺便做了一次测试，结果如下：无论是 Plus 还是 Business Seat，5 小时内的用量占周用量的比例均为 15%，工作强度基本相当。但在相同强度下： Plus：5 小时窗口内实际使用约 3 小时，上下文总量 730K，其中 405K 为压缩内容； Business（1 个 Seat）：实际使用约 40 分钟，上下文总量 196K，其中 135K 为压缩内容。也就是说，2 个 Business Seat 加起来的用量，都没能达到 1 个 Plus 的水平。这个月用完后，打算直接取消订阅。没抢到优惠的朋友完全不必遗憾——哪怕订阅美区 Plus 账号，同样的价格，性价比也远高于这两个 Business Seat。

中文

823

yuxinlu1@Dadahelper1·13 May

@luoleiorg thx🙏

4.7K

luolei@luoleiorg·13 May

😍好消息好消息，在这中美欢聚的喜庆日子，山姆奥特曼又给大家送 Tokens了，使用下面的优惠码 STRIPEATLASGPT4BIZ050126 自动获得 ChatGPT 两个月免费 Team 套餐，需要美区支付卡。 chatgpt.com/?promoCode=STR…

中文

143

139

1.6K

754.7K

yuxinlu1@Dadahelper1·8 May

@trq212 check this out!🤣

sui ☄️@birdabo

oh my god.

English

yuxinlu1@Dadahelper1·7 May

@sama lol if you sign one for me too, I’ll upgrade my account to Pro.

Deon Menezes@DeonMen

Just got my brand new mac signed by Sam Altman

English

yuxinlu1@Dadahelper1·6 May

@trq212 How is the weekly usage limit calculated?

English

9.9K

Thariq@trq212·6 May

We're winding back our peak hours limit reduction and doubling 5 hour limits. Excited to partner with SpaceX to bring you more compute and we'll keep pushing to bring you the best coding agent in the world.

Claude@claudeai

We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.

English

356

122

3.9K

245.1K

yuxinlu1@Dadahelper1·6 May

Lol I’m skeptical. hallucinations still aren’t solved at the frontier, GPT-5.5 hallucinates 86% of the time on AA-Omniscience, Gemini 3.1 Pro is at 50% even after calibration tuning. Supporting 12M context and actually being reliable at 12M context are two very different things.

Bindu Reddy@bindureddy

SubQ , a new type of AI model, says they are 50x faster and 20x cheaper than Opus 4.7 and GPT 5.5 In fact, they also say they perform INSANELY WELL on benchmarks and have a 12M context This would be earth shattering, if true - Anthropic/OpenAI's valuation would go to zero 😱

English

774

yuxinlu1@Dadahelper1·6 May

MiMo 2.5 Pro gifted me a membership and I genuinely appreciate the gesture, but the model itself is rough. My task was generating synthetic training data: Codex produced 200 high-quality, usable samples using maybe 5–10% of my weekly Plus quota. MiMo 2.5 Pro generated 100 samples with massive overlap/duplication, burned through 8% of my monthly quota, and I ended up deleting all 100. Thank god it was free, otherwise I’d be losing my mind.

English

361

yuxinlu1@Dadahelper1·4 May

@NVIDIA_AI_PC 2TB all filled up, and Gemma 4 31B is still my favorite, but MiniMax M2.7 Q4 is seriously impressive too.

English

555

NVIDIA RTX Spark@NVIDIARTXSpark·4 May

Be honest — how many local models do you have downloaded right now? 👀

English

564

943

128.6K

yuxinlu1@Dadahelper1·1 May

I think Grok should be open-source. Even models stronger than it, like DeepSeek and Kimi, are open-source. Why is this little weak model still closed-source?

Elon Musk@elonmusk

Grok

English

347

Keşfet

@HuggingModels @ZixuanLi_ @berryxia @VincentLogic @ChaomingWe80477 @luoleiorg @elonmusk @BarackObama