Xuan-Son Nguyen

1K posts


@ngxson

Engineer @huggingface

Paris · Joined August 2020
240 Following · 6.3K Followers
Pinned Tweet
Xuan-Son Nguyen @ngxson ·
Updated my GH profile with a list of what I'm doing on llama.cpp 😂 Why? Because sometimes I forget what I did...
[image]
3 replies · 0 retweets · 35 likes · 3.8K views
Xuan-Son Nguyen retweeted
Julien Chaumond @julien_c ·
did you know that huggingface_hub (just the Python client) is sending almost 6B requests/week? wow 😮 @huggingface
13 replies · 4 retweets · 69 likes · 6.1K views
Xuan-Son Nguyen retweeted
Lysandre @LysandreJik ·
We're opening a Hugging Face office in Tokyo! Our goal: help open-source AI develop in Japan and grow the local community. Let's meet!
[image]
125 replies · 473 retweets · 3.3K likes · 274.7K views
Xuan-Son Nguyen @ngxson ·
@garyfung Yes, qwen3.5 / 3.6 is also quite good. However, its recurrent architecture poses quite a few problems for KV cache reuse. On gemma 4, cache reuse is also pretty much a mess because of sliding-window attention, but you can actually bypass it via the --swa-full flag in llama.cpp
0 replies · 0 retweets · 0 likes · 28 views
gary IH fung @garyfung ·
@ngxson gemma 4 is getting mogged by qwen3.6, or even 3.5 x.com/garyfung/statu…
gary IH fung @garyfung

this is one surprisingly impressive small model! Ran the @UnslothAI q3 quant of this Qwen3.6 35b A3B at 110-130 tps on my local 4090 card. One-shotted the pelican-on-bike SVG (not the best, but it's a small model). Asteroids: one-shotted a full working version. Was it just memorization? Went down the rabbit hole of vibing iterations and looking at its thinking tokens: it thinks in code snippets and makes pretty bang-on decisions based on vague product-esque prompts without explicit spec'ing. You can see the full @lmstudio chat export at github.com/fungilation/as… Genuinely impressed! Feels like Claude Sonnet, internally agentic for the coding use case, and running at 110+ tps locally is amazing. Thanks @Alibaba_Qwen for keeping up the open-weights momentum!

1 reply · 0 retweets · 0 likes · 116 views
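For concreteness, here is a minimal sketch of the --swa-full workaround @ngxson mentions above. The GGUF file name is a placeholder, not an official release; --swa-full tells llama.cpp to keep a full-size KV cache for sliding-window-attention layers, spending extra memory so cached prompt prefixes can be reused.

```bash
# Sketch only: the model path is a placeholder.
# --swa-full keeps a full-size KV cache for SWA layers (more memory,
# but cached prompt prefixes survive and can be reused across requests).
llama-server -m gemma-4-26b-a4b-q4_k_m.gguf --swa-full -c 32768
```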
Xuan-Son Nguyen @ngxson ·
I stopped using Claude Code on all of my llama.cpp workflows for the past few days. The quality degradation is just too significant. I'm experimenting with a mix of Gemma 4 26B-A4B and Gemini 3.1 Pro; so far it's much better than what Anthropic can offer.
Simon Willison @simonw

Shocking result on my pelican benchmark this morning: I got a better pelican from a 21GB local Qwen3.6-35B-A3B running on my laptop than I did from the new Opus 4.7! Qwen on the left, Opus on the right

2 replies · 1 retweet · 29 likes · 2.2K views
Xuan-Son Nguyen retweeted
Julien Chaumond @julien_c ·
opus 4.7 slightly more dangerous, slightly more expensive OR: run local models!
8 replies · 4 retweets · 62 likes · 3.9K views
Xuan-Son Nguyen @ngxson ·
llama.cpp now supports Qwen3-ASR, Qwen3-Omni and Gemma 4 audio/vision input 🔥 Mixed modalities are the future 😼😼
[image]
4 replies · 11 retweets · 97 likes · 4.1K views
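As a rough illustration of the mixed-modality support announced above, here is how image and audio inputs are passed to llama.cpp's multimodal CLI. The GGUF and mmproj file names are placeholders, not official releases.

```bash
# Placeholders throughout: substitute real GGUF + mmproj files.
# Vision input with a Gemma 4 build:
llama-mtmd-cli -m gemma-4.gguf --mmproj mmproj-gemma-4.gguf \
  --image photo.jpg -p "Describe this image."

# Audio input with a Qwen3-Omni build, using the same tool:
llama-mtmd-cli -m qwen3-omni.gguf --mmproj mmproj-qwen3-omni.gguf \
  --audio clip.wav -p "Transcribe this recording."
```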
Xuan-Son Nguyen @ngxson ·
AFAICT there are no metrics that can determine whether an OCR model is "best". Example: one can be better at OCR for English, and another can be better for Chinese. The "best quality" model may even be a big model that can't fit into your RAM. So unfortunately, it still requires trials to know which one is the "best fit" for your particular use case.
0 replies · 0 retweets · 3 likes · 995 views
Harry Zhang @tokeemb ·
@ngxson What is the best OCR model? I don't want various, I want one
1 reply · 0 retweets · 0 likes · 1.1K views
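One way to run the trials @ngxson suggests is to loop the same test image through each candidate model and compare the transcripts. A minimal sketch; the model and mmproj file names are placeholders for whichever OCR GGUFs you want to compare.

```bash
# Placeholders: swap in the OCR GGUFs (and matching mmproj files) you want to try.
for name in ocr-model-a ocr-model-b; do
  echo "== $name =="
  llama-mtmd-cli -m "$name.gguf" --mmproj "mmproj-$name.gguf" \
    --image sample-page.png -p "Transcribe all text in this image."
done
```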
Xuan-Son Nguyen @ngxson ·
llama.cpp now supports various small OCR models that can run on low-end devices. These models are small enough to run on a GPU with 4 GB of VRAM, and some of them can even run on a CPU with decent performance. In this post, I'll show you how to use these OCR models with llama.cpp 👇
10 replies · 24 retweets · 244 likes · 21.8K views
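A minimal sketch of the kind of invocation that post walks through; the file names are placeholders for any small OCR-capable GGUF plus its mmproj.

```bash
# Placeholders: use any small OCR-capable GGUF and its matching mmproj.
# -ngl 99 offloads all layers to the GPU (small models fit in ~4 GB VRAM);
# drop -ngl (or set it to 0) to run entirely on the CPU.
llama-mtmd-cli -m small-ocr.gguf --mmproj mmproj-small-ocr.gguf \
  --image receipt.png -p "Extract the text from this image." -ngl 99
```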
jlcjak @jlcjak ·
idea: instead of having a single connector operate at multiple voltages without warning, what if we had some way to exchange information between charger and device so they can negotiate an appropriate voltage. call it "power delivery". could we use usbc and barrel jack for this?
[image]
96 replies · 73 retweets · 1.8K likes · 91.4K views
Xuan-Son Nguyen @ngxson ·
@le0z00s @jlcjak I hope you're not gonna have a heart attack because someone *cough Panasonic* *cough CF-SC6* still makes laptops with a VGA port in 2026
1 reply · 0 retweets · 7 likes · 750 views
Rafał Zbojak @le0z00s ·
@jlcjak @ngxson Why not DA-15? It was good enough for MIDI devices. Or better: either DB-25 or 36-pin Mini-Centronics IEEE 1284. I think I'm having a stroke.
1 reply · 0 retweets · 5 likes · 782 views
Xuan-Son Nguyen @ngxson ·
@Prince_Canuma Same here, I skip the comment altogether since most of the time contributors won't question it. Also, we (kinda) enforce a PR template where contributors have to explicitly indicate that they agree to the guidelines.
1 reply · 0 retweets · 1 like · 37 views
Prince Canuma @Prince_Canuma ·
@ngxson Tell me about it! I usually respond and close with the comment, but it's exhausting when you're getting dozens a day. I might need to build a skill to triage and tag PRs automatically every 24h. Agent-on-agent action 😂
[GIF]
2 replies · 0 retweets · 1 like · 168 views
Prince Canuma @Prince_Canuma ·
Have a new label for certain types of PRs 😤
[image]
3 replies · 0 retweets · 37 likes · 3.3K views
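For reference, a sketch of the kind of PR template @ngxson describes upthread, where contributors tick a box to confirm agreement. The wording is hypothetical, not llama.cpp's actual template.

```bash
# Hypothetical wording: the real llama.cpp template differs.
mkdir -p .github
cat > .github/PULL_REQUEST_TEMPLATE.md <<'EOF'
- [ ] I have read the contributing guidelines and agree to follow them
EOF
```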