Harry Zhang

732 posts

@tokeemb

AI Engineer

Chantilly, VA · Joined May 2016
35 Following · 61 Followers
andrei saioc @asaio87
OK, I admit Claude Code is amazing, but that is compared to what we had 2-3 years ago, pre-AI. It's good, but it requires a lot of assistance and knowing what and when to prompt. If you are not a developer, there is no chance you can build a complex app.
115
9
416
19.9K
Marcin Krzyzanowski @krzyzanowskim
I reimplemented the "claude" CLI with codex and gpt-5.4-high. It cost $1100 in tokens, and it is 73% faster and uses 80% less resident memory during sustained interactive use. It is very easy to reverse-engineer claude from the npm distribution, then reimplement it 1:1. It is indistinguishable from the Anthropic version, down to every header and the analytics it sends back: github.com/krzyzanowskim/…
62
34
682
87K
Movies Scenes 🎫 @SceneinCinema
Very few people know this, but in this scene from 'The Wolf of Wall Street,' Margot Robbie wasn't actually wearing underwear. She confessed this herself, as she wanted to see Leonardo DiCaprio's genuine reaction. While director Martin Scorsese offered her a robe to cover up, she felt her character, Naomi, would not have worn one in that moment, aiming for a "full-frontal" look to be "all in". Despite her confidence on screen, Robbie admitted on the Talking Pictures podcast that she had "a couple shots of tequila" before filming because she was very nervous. Robbie told Porter magazine that despite the intimate nature of the scene, it was filmed in a tiny bedroom packed with "30 crew".
525
202
10.6K
6.1M
Tom Turney @no_stp_on_snek
I implemented Google's TurboQuant paper (ICLR 2026) in llama.cpp with Metal kernels for Apple Silicon. 4.9× KV cache compression. Working end-to-end on M5 Max with Qwen 3.5 35B MoE and Qwopus v2 27B. Speed needs work (unoptimized shader), compression target met. Repo: github.com/TheTom/turboqu… **Note**: as you'll see from the git history, when I say "I" it's in conjunction with claudecode and codex. Just lots of steering and babysitting.
24
44
369
110.9K
Harry Zhang @tokeemb
In the old days, nerds learned C and assembly, thinking these were hard-to-replace skills. With Claude Code, what should we learn?
0
0
0
11
Andrej Karpathy @karpathy
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
1.7K
1.1K
20.6K
2.5M
Harry Zhang @tokeemb
Apple has a golden opportunity now to make the Mac a real AI ecosystem, at the risk of losing the consumer label. The opportunity won't last long.
0
0
0
5
Elijah @elijahmcom
Pardon me, Mr. Effect @kitlangton, I have a PR to fix a bug in some code you recently effectify'ed (you did not introduce the bug; it predated your rewrite). Essentially, if a large file is tracked by the Snapshot and gets deleted, it causes large memory usage.
4
0
50
8.4K
Guybrush Threepwood @twistedmatrices
@tokeemb @exolabs as stated by @PixelRainbowNFT exo supports heterogeneous clusters but RDMA requires both machines to have TB5. M2 is TB4 only. you can still cluster over ethernet... you'd get pipeline parallelism instead of tensor parallelism. works, just slower.
1
0
2
263
Guybrush Threepwood @twistedmatrices
PSA: If you have multiple macbooks that support RDMA, you can cluster them using @exolabs and run 30B+ models at 70 tok/s over thunderbolt5. tensor parallelism on consumer hardware is a solved problem. you are renting GPUs that are worse than the laptop on your couch. 2X M4 Max(64GB each) running mlx-community/Qwen3-30B-A3B-4bit @ 70 TPS
25
40
625
652.5K
Ivan Fioravanti ᯅ @ivanfioravanti
I'm not having great luck with NVIDIA Nemotron-Cascade-2-30B-A3B on the coding side. Am I the only one?
31
0
77
12.2K
Evis Drenova @evisdrenova
Calling it now: the TUI as the main interface of agentic software engineering is dead in 4-6 months
61
7
412
88.9K
Harry Zhang @tokeemb
@saranormous Can we do away with websites? Text-based internet is fine, isn’t it?
0
0
0
3
sarah guo @saranormous
watching claude try to use the browser...are websites being adversarial to computer use on purpose? or is CUA still that bad
140
9
407
112.3K
Harry Zhang @tokeemb
How do you load CLAUDE.md automatically using opencode?
0
0
0
18
Harry Zhang @tokeemb
@jukan05 Why does no one in the thread mention that their moat is SRAM?
0
0
0
51
Jukan @jukan05
If there is a company that truly wants to become a real competitor to Nvidia, I think it will acquire Cerebras, just as Nvidia acquired Groq. I’m not sure yet whether Google or AMD will be the one to acquire Cerebras.
41
13
424
53.1K
MiniMax (official) @MiniMax_AI
Our official skills repo is open source: github.com/MiniMax-AI/ski… Equip your agents with curated skills for iOS and Android development, Office file editing, and visual effects with GLSL shaders. There are more open source projects coming!
72
436
3.2K
673.6K
Harry Zhang @tokeemb
I want Gemini to have my eyes when I’m on my Pixel 10.
0
0
0
9
Harry Zhang @tokeemb
@ryanvogel Can’t change the weather, but it would be nice to have Napa wines and Cali coffee.
0
0
0
36