Trần Ngọc Sơn
@nomiss194

12 posts
Joined November 2025
52 Following · 0 Followers
Trần Ngọc Sơn reposted
CyrilXBT @cyrilXBT
EVERYONE IS OBSESSING OVER WHICH AI MODEL IS FASTEST. They are measuring the wrong thing.

Someone just ran Gemma 4 and Qwen 3 on the exact same coding task with the exact same hardware and the exact same prompt. Here are the raw numbers:

Gemma 4 31B: 27 tokens per second. Finished in 3 minutes 51 seconds. Used 6,209 tokens.
Qwen 3 27B: 32 tokens per second. Finished in 18 minutes 4 seconds. Used 33,946 tokens.

Qwen was FASTER per token. Gemma finished 14 MINUTES EARLIER. Because Gemma used 5.5x fewer tokens to reach a complete answer.

This is the insight almost nobody in AI benchmarking understands. Tokens per second is not the metric that matters. Tokens to completion is the metric that matters.

A model that thinks more efficiently gets you to the answer faster even if it generates each token more slowly. A model that thinks out loud for 33,000 tokens when 6,000 tokens would have done the job is not smarter. It is more verbose. And verbosity costs you time, money, and context window.

The fastest model is not the one with the highest tokens per second. It is the one that needs the fewest tokens to be right. That distinction is going to matter enormously as agent loops get longer and API costs compound across thousands of iterations.

Screenshot this and come back to it every time someone shows you a tokens per second benchmark. Follow @cyrilXBT for every AI insight that changes how you think about the tools you are building with.
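The arithmetic behind the comparison can be sketched as follows. The figures are the ones quoted in the tweet; the helper name `time_to_completion` is illustrative, not from any benchmark tool:

```python
# Wall-clock time to a complete answer depends on total tokens generated,
# not just on generation speed (tokens per second).

def time_to_completion(total_tokens: int, tokens_per_second: float) -> float:
    """Seconds to finish, given throughput and total output length."""
    return total_tokens / tokens_per_second

# Figures quoted in the tweet:
gemma = time_to_completion(6_209, 27)    # ~230 s of pure generation
qwen = time_to_completion(33_946, 32)    # ~1061 s of pure generation

# Qwen generates each token faster, yet finishes far later,
# because it emits ~5.5x as many tokens.
print(f"Gemma: {gemma:.0f}s, Qwen: {qwen:.0f}s, ratio: {33_946 / 6_209:.1f}x")
```

Note the pure-generation time for Qwen (~17.7 minutes) is slightly below the reported 18 minutes 4 seconds; the remainder would be overhead outside token generation.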
Trần Ngọc Sơn reposted
Emre Savcı @mstrYoda_
There are plenty of local LLMs out there, and if you are wondering which one your machine can handle, there is a nice site that answers exactly that: Can I Run AI. Based on your system specs, it shows which models will run and at what tokens/second 👇 canirun.ai
Trần Ngọc Sơn reposted
Bruno Pinheiro @brunopinheiroms
What I miss most about Claude Code compared to my current OpenCode Go setup is the speed. In exchange, I can use it without blowing through my limits, and the quality is good. Harness matters, folks! I went from a cost of $200 per month to $10 (personal challenge).

My setup is orchestrated to use the best of each model:
- Kimi 2.6 orchestrates
- MiMo V2.5 Pro writes code
- DeepSeek V4 Flash handles the operational work
- GLM-5 does review and RCA
- Qwen 3.5 Plus for MCPs (Kimi has not been working very well there yet)

And I am taking advantage of OpenCode offering 3x usage for Kimi. Great model! #bolhadev #ia
Bruno Pinheiro @brunopinheiroms

I abandoned Claude 100%. I thought I would not get used to it, but now I do not plan to go back. Claude is very good, but it is expensive and slow.

I have been using:
- OpenCode Go: Kimi 4.6 / GLM 5.1
- Codex 5.4

I enabled Caveman with Rtk to save tokens. I went from $200 per month to $30 (OpenCode + GPT) and I am not missing anything so far. github.com/juliusbrussee/… github.com/rtk-ai/rtk #bolhadev

Trần Ngọc Sơn reposted
Suryansh Tiwari @Suryanshti777
This shouldn’t be possible… but Microsoft just did it. They made a 100B parameter model run on a single CPU. No GPU. No insane setup. Just math rewritten from scratch.

Here’s the part most people are missing 👇

Traditional LLMs = 16-bit floats. Every weight = messy decimal (0.0023, -1.47…). Inference = billions of float multiplications → that’s why GPUs exist.

BitNet flips the entire game. Instead of floats, it uses ternary weights: {-1, 0, 1}. Not compression. Not optimization. A completely different computing primitive.

Now look what happens:
→ ×1 = keep value
→ ×(-1) = flip sign
→ ×0 = ignore

That’s it. No multiplications. Just add, subtract, skip. Matrix multiplication (the core of AI) becomes cheap integer ops on a CPU.

And the results?
→ 2x–6x faster on CPUs
→ Up to 82% less energy usage
→ Scales BETTER with bigger models

Let that sink in: a 100B model running at 5–7 tokens/sec on a single CPU. This is not “optimization”. This is a paradigm shift.

And the craziest part? These models are NOT quantized later. They are trained like this from day one. No precision loss. No quality drop. The model literally learns inside the constraint.

Why 1.58 bits? Because log₂(3) ≈ 1.58: three possible values → maximum efficiency per weight.

We’re watching the hardware bottleneck disappear in real time. AI is not getting smaller. The math is getting smarter.

Bookmark this. In a year, running LLMs locally won’t be impressive. Not running them locally will be.
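The add/subtract/skip idea can be shown with a toy dot product. This is not BitNet's actual kernel (real implementations pack weights and use vectorized integer ops); it is only a sketch of why ternary weights eliminate multiplications:

```python
import math

# Toy dot product with ternary weights {-1, 0, 1}:
# every "multiply" collapses to an add, a subtract, or a skip.

def ternary_dot(weights: list[int], activations: list[float]) -> float:
    acc = 0.0
    for w, x in zip(weights, activations):
        if w == 1:       # x1: keep the value -> add
            acc += x
        elif w == -1:    # x(-1): flip the sign -> subtract
            acc -= x
        # w == 0: ignore -> skip entirely (no work at all)
    return acc

w = [1, -1, 0, 1]
x = [0.5, 2.0, 3.0, 1.5]
print(ternary_dot(w, x))  # 0.5 - 2.0 + 1.5 = 0.0

# Three possible values per weight carry log2(3) bits of information each:
bits_per_weight = math.log2(3)  # ~1.58, hence the "1.58-bit" name
```

A full matrix multiply is just this loop repeated per output row, which is why the whole operation reduces to integer additions on a CPU.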
Trần Ngọc Sơn reposted
divyansh tiwari @DivyanshT91162
CAPTCHA IS OFFICIALLY OUTDATED.

A new open-source library called Cap is changing how websites stop bots. No puzzles. No traffic lights. No “select all bikes” anymore. Instead, it uses a SHA-256 proof-of-work system: simple, silent, and fast.

Why devs are switching:
• Only ~20KB in size
• Zero tracking, zero data collection
• No images, no user friction
• Works with any JS runtime
• Fully customizable (visible, invisible, floating modes)
• Zero dependencies
• Can be deployed instantly via Docker

This is a full replacement for traditional CAPTCHA systems. Cleaner UX. Faster websites. Better privacy. 100% open-source on GitHub. Link in comments.
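The proof-of-work idea behind this style of CAPTCHA alternative can be sketched as follows. This is a generic SHA-256 proof-of-work loop, not Cap's actual protocol; the difficulty setting and function names are illustrative:

```python
import hashlib
import secrets

DIFFICULTY = 12  # leading zero bits required; illustrative, not Cap's real setting

def solve(challenge: str, difficulty: int = DIFFICULTY) -> int:
    """Find a nonce so SHA-256(challenge + nonce) starts with `difficulty` zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        # Treat the digest as a big integer; shifting off all but the top
        # `difficulty` bits must leave zero.
        if int.from_bytes(digest, "big") >> (256 - difficulty) == 0:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int = DIFFICULTY) -> bool:
    """Verification is a single hash: cheap for the server, costly to mass-produce."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty) == 0

challenge = secrets.token_hex(16)  # server-issued random challenge
nonce = solve(challenge)           # client silently burns a little CPU
assert verify(challenge, nonce)    # server checks with one hash call
```

The asymmetry is the whole trick: a human's browser pays a few milliseconds once, while a bot farm has to pay it for every single request, with no image puzzles involved.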
Trần Ngọc Sơn reposted
Kirill @kirillk_web3
A SINGLE CLAUDE.md FILE JUST HIT #1 ON GITHUB TRENDING. 82,100 stars. 7.8k forks. zero dependencies. Bookmark this before you forget. And your Claude will start working differently.

4 principles. one file. Karpathy's LLM coding habits, distilled:
> think before coding.
> simplicity first.
> surgical edits only.
> goal-driven targets before starting.

swap it into your CLAUDE.md today. your Claude Code becomes a different tool. Read it today. Link below.

Claude → Skills → CLAUDE.md → Better Code → Better Systems → Money
Kirill @kirillk_web3

🚨 do you understand what the Head of Anthropic Coding Agents just dropped. 30 minutes. more value than 100 paid courses. not a course. not a tutorial. how top AI researchers actually build.

here's the part nobody is talking about:
> real workflows. not theory.
> vibe coding from the source.
> how they think, build, and ship with agents.

watch this before you write another prompt. before you build another agent. before you touch another tool. 30 minutes. bookmark it. watch it today. this one changes how you use AI for good.
Google AI Developers @googleaidevs
Gemini Embedding 2 is now generally available in the Gemini API and Vertex AI! Start building with our first natively multimodal embedding model, now equipped with the stability and optimizations required for production apps.
Trần Ngọc Sơn reposted
0xMarioNawfal @RoundtableSpace
Someone gave Claude Code permanent memory and it hit 46k stars in 48 hours. 95% less token consumption per session. Never hits context limits. Picks up exactly where you left off. One-command install. Completely free.
Trần Ngọc Sơn @nomiss194
@elonmusk X now auto-translating with Grok is really convenient. Thank you, Elon.
Elon Musk @elonmusk
If only we’d trained Grok on just these 2 books, we’d be done already!