Harry Zhang

732 posts

@tokeemb

AI Engineer

Chantilly, VA · Joined May 2016

35 Following · 61 Followers

Harry Zhang@tokeemb·1h

@asaio87 Whew

andrei saioc@asaio87·10h

Ok I admit, Claude Code is amazing but compared to what we had 2-3 years ago pre AI. It's good, but requires a lot of assistance, and knowing what and when to prompt. If you are not a developer, there is no chance you can build a complex app.

115

416

19.9K

Harry Zhang@tokeemb·4h

@krzyzanowskim Hero

435

Marcin Krzyzanowski@krzyzanowskim·5h

I reimplemented "claude" CLI with codex and gpt-5.4-high. It cost $1100 in tokens, and is 73% faster and uses 80% less resident memory during sustained interactive use. It is very easy to reverse claude from npm distribution, then reimplement it 1:1. It is indistinguishable from the Anthropic version, down to every header and the analytics it sends back github.com/krzyzanowskim/…

682

87K

Harry Zhang@tokeemb·5h

@SceneinCinema Boy it must be good to be Leo

418

Movies Scenes 🎫@SceneinCinema·12h

Very few people know this, but in this scene from 'The Wolf of Wall Street,' Margot Robbie wasn't actually wearing underwear. She confessed this herself, as she wanted to see Leonardo DiCaprio's genuine reaction. While director Martin Scorsese offered her a robe to cover up, she felt her character, Naomi, would not have worn one in that moment, aiming for a "full-frontal" look to be "all in". Despite her confidence on screen, Robbie admitted on the Talking Pictures podcast that she had "a couple shots of tequila" before filming because she was very nervous. Robbie told Porter magazine that despite the intimate nature of the scene, it was filmed in a tiny bedroom packed with "30 crew".

525

202

10.6K

6.1M

Harry Zhang@tokeemb·1d

LLM appliance

Harry Zhang@tokeemb·1d

@no_stp_on_snek Hero

521

Tom Turney@no_stp_on_snek·2d

I implemented Google's TurboQuant paper (ICLR 2026) in llama.cpp with Metal kernels for Apple Silicon. 4.9× KV cache compression. Working end-to-end on M5 Max with Qwen 3.5 35B MoE and Qwopus v2 27B. Speed needs work (unoptimized shader), compression target met. Repo: github.com/TheTom/turboqu… **Note**: as you'll see from the git, when I say "I" it's in conjunction with claudecode and codex. Just lots of steering and babysitting.

369

110.9K

Harry Zhang@tokeemb·1d

Nerds learned C and assembly in the old days, thinking these were hard-to-replace skills. With Claude Code, what should we learn?

Harry Zhang@tokeemb·2d

@karpathy Make retrieval efficient

Andrej Karpathy@karpathy·2d

One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.

1.7K

1.1K

20.6K

2.5M

Harry Zhang@tokeemb·2d

Apple has a golden opportunity now to make mac a real ai ecosystem at the risk of losing the consumer label. The opportunity won't last long

Harry Zhang@tokeemb·2d

@elijahmcom @kitlangton Based

118

Elijah@elijahmcom·2d

pardon me Mr. Effect @kitlangton I have a PR to fix a bug in some code you recently effectify'ed (you did not introduce the bug, it pre-dated your re-write). Essentially if a large file is tracked by the Snapshot and gets deleted it causes large memory usage.

8.4K

Harry Zhang@tokeemb·3d

Hadoop for gpu

Harry Zhang@tokeemb·3d

@twistedmatrices @exolabs @PixelRainbowNFT Awesome - future Hadoop for gpu

Guybrush Threepwood@twistedmatrices·4d

@tokeemb @exolabs as stated by @PixelRainbowNFT exo supports heterogeneous clusters but RDMA requires both machines to have TB5. M2 is TB4 only. you can still cluster over ethernet... you'd get pipeline parallelism instead of tensor parallelism. works, just slower.

263

Guybrush Threepwood@twistedmatrices·4d

PSA: If you have multiple macbooks that support RDMA, you can cluster them using @exolabs and run 30B+ models at 70 tok/s over thunderbolt5. tensor parallelism on consumer hardware is a solved problem. you are renting GPUs that are worse than the laptop on your couch. 2X M4 Max(64GB each) running mlx-community/Qwen3-30B-A3B-4bit @ 70 TPS

625

652.5K

Harry Zhang@tokeemb·3d

@ivanfioravanti Good - so I don’t have to try

160

Ivan Fioravanti ᯅ@ivanfioravanti·4d

I'm not having great luck with NVIDIA Nemotron-Cascade-2-30B-A3B on coding side, am I the only one?

12.2K

Harry Zhang@tokeemb·4d

@evisdrenova Good - I’d like to be the ancient users of tui

155

Evis Drenova@evisdrenova·4d

Calling it now: the TUI as the main interface of agentic software engineering is dead in 4-6 months

412

88.9K

Harry Zhang@tokeemb·4d

@saranormous Can we do away with websites? Text based internet is fine, isn’t it

sarah guo@saranormous·4d

watching claude try to use the browser...are websites being adversarial to computer use on purpose? or is CUA still that bad

140

407

112.3K

Harry Zhang@tokeemb·4d

How to load Claude md automatically using opencode

Harry Zhang@tokeemb·4d

@jukan05 Why does no one in the thread mention that their moat is SRAM?

Jukan@jukan05·5d

If there is a company that truly wants to become a real competitor to Nvidia, I think it will acquire Cerebras, just as Nvidia acquired Groq. I’m not sure yet whether Google or AMD will be the one to acquire Cerebras.

424

53.1K

Harry Zhang@tokeemb·4d

@MiniMax_AI Do they work with Claude code?

MiniMax (official)@MiniMax_AI·5d

Our official skills repo is open source: github.com/MiniMax-AI/ski… Equip your agents with curated skills for iOS and Android development, Office file editing, and visual effects with GLSL shaders. There are more open source projects coming!

436

3.2K

673.6K

Harry Zhang@tokeemb·5d

I want Gemini to have my eyes when I’m on my pixel 10
