Salman // 萨尔曼

1.4K posts

@ForBo7_

「Open to Projects」 • Dabbler • Learner • Explorer • Logger • https://t.co/jTudwv3AAp student • Dabbling in Embodied AI • 自学中文 // Self-learning Chinese

Hong Kong/Shenzhen · Joined September 2022
909 Following · 245 Followers
Pinned Tweet
Salman // 萨尔曼 @ForBo7_
Doing lesson 15 of the @fastdotai course; deducing how to rearrange convolutions as a matrix product
1 reply · 1 repost · 7 likes · 13.4K views
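The screenshot from that tweet isn't recoverable, but the idea it describes can be sketched: a 2D convolution equals a matrix product between the flattened kernel and the im2col patch matrix, which `torch.nn.functional.unfold` extracts. All shapes here are hypothetical, chosen just for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: 1 image, 3 channels, 8x8 pixels, 3x3 kernel, 4 filters.
x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)

# Reference convolution (no padding, stride 1 -> 6x6 output).
ref = F.conv2d(x, w)

# Same convolution as a matrix product: unfold extracts every 3x3 patch
# as a column (im2col), then the kernel, flattened to a matrix, multiplies them.
cols = F.unfold(x, kernel_size=3)               # (1, 3*3*3, 36)
out = (w.view(4, -1) @ cols).view(1, 4, 6, 6)   # (4, 27) @ (1, 27, 36)

assert torch.allclose(ref, out, atol=1e-5)
```

This is the classic im2col trick: trading memory (patches are duplicated) for a single large matmul.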
Salman // 萨尔曼 reposted
Jonas Geiping @jonasgeiping
We’re training models wrong, and it’s due to ChatGPT. Even the modern coding agents used daily still use message-based exchanges: they send messages to users, to themselves (CoT), and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking, and cannot think while processing information. In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can:
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool-use UX, removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Be fast: they can predict and read tokens in all streams in parallel in each forward pass, improving latency
🔵 Encode a separation of concerns more easily, improving security
🔵 Provide a legible form of parallel/continuous reasoning when given many internal streams. Even if the main CoT stream is accidentally pressured, or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.
Does this sound related to a recent thinky post? :) Yes, but I don’t feel so bad about being outshipped by 23 hours when the report on their side is this cool. I’ll link a second thread below with a more direct comparison. I actually think both are complementary in interesting ways.
41 replies · 167 reposts · 1.4K likes · 149.6K views
Salman // 萨尔曼 reposted
François Fleuret @francoisfleuret
Give LLMs:
1. A latent-space, diffusion-like reasoning.
2. A real recurrent state.
3. A world-model pre-pre-training.
And we are done.
39 replies · 34 reposts · 487 likes · 56.1K views
Salman // 萨尔曼 @ForBo7_
Going through the lesson 9a notebook (by @johnowhitaker) of the fastai course; created a little tool to play around with and visualize the latents produced by an autoencoder. Crafted with SolveIt and FastHTML (both by @answerdotai).
1 reply · 2 reposts · 4 likes · 743 views
Salman // 萨尔曼 @ForBo7_
`torch.triu` returns the upper triangle of a tensor
0 replies · 0 reposts · 0 likes · 18 views
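The screenshot is gone, but `torch.triu` is easy to demonstrate on a small tensor:

```python
import torch

t = torch.arange(1, 10).view(3, 3)
upper = torch.triu(t)                # zeroes everything below the main diagonal
strict = torch.triu(t, diagonal=1)   # diagonal=1 excludes the main diagonal too
# upper  -> [[1, 2, 3], [0, 5, 6], [0, 0, 9]]
# strict -> [[0, 2, 3], [0, 0, 6], [0, 0, 0]]
```

This is also how causal attention masks are commonly built: `torch.triu` on a matrix of ones marks the future positions to mask out.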
Salman // 萨尔曼 @ForBo7_
`torch.where` returns the indices where a condition holds; here, the index of the token with id 6829 is returned
0 replies · 0 reposts · 1 like · 19 views
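A minimal sketch of the lookup described above. The token ids here are hypothetical (only 6829 comes from the tweet):

```python
import torch

# Hypothetical token ids; 6829 is the id mentioned in the tweet.
ids = torch.tensor([320, 6829, 1125, 49407])

# One-argument torch.where returns a tuple of index tensors (one per
# dimension) marking where the condition is True.
(idx,) = torch.where(ids == 6829)   # tensor([1])
```

Note this single-argument form is distinct from the three-argument `torch.where(cond, a, b)`, which selects values elementwise.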
Salman // 萨尔曼 @ForBo7_
ok, so in addition to these two embeddings, there are apparently two others:
- input embedding: an embedding that contains the meaning of the token and its position/index in the sentence
- output embedding: the input embedding, but with the context of the previous tokens also embedded
Quoting Salman // 萨尔曼 @ForBo7_:

CLIP has 2 embeddings:
- token emb
- position emb
the token emb is like a lookup table: given a token having id X, fetch its emb
the positional emb stores the position of a token in a sequence; otherwise, "man bites dog" and "dog bites man" would have the same repr

0 replies · 0 reposts · 0 likes · 33 views
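The input embedding described above can be sketched in a few lines. The sizes here are toy values for illustration, not CLIP's real dimensions:

```python
import torch
import torch.nn as nn

vocab_size, max_len, d_model = 100, 16, 8   # toy sizes, not CLIP's
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_len, d_model)

ids = torch.tensor([[5, 42, 7]])            # (batch, seq) of token ids
positions = torch.arange(ids.shape[1])      # 0, 1, 2

# Input embedding: token meaning + position, summed elementwise.
x = tok_emb(ids) + pos_emb(positions)       # (1, 3, 8)
```

The output embedding then comes from passing `x` through the transformer blocks, which mix in the context of the other tokens.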
Salman // 萨尔曼 @ForBo7_
TIL `torch.finfo`; it fetches information about floating-point types
0 replies · 0 reposts · 0 likes · 23 views
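The lost screenshot likely showed something like this: `torch.finfo` reports the numeric limits of each floating-point dtype.

```python
import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    fi = torch.finfo(dtype)
    # max: largest finite value; eps: machine epsilon; tiny: smallest normal
    print(dtype, fi.max, fi.eps, fi.tiny)
```

Handy for mixed-precision work, e.g. `torch.finfo(torch.float16).max` is 65504, which explains why fp16 overflows so easily compared to bf16.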
Salman // 萨尔曼 @ForBo7_
@teortaxesTex simple calligraphy test between Doubao and DeepSeek; here, Doubao was able to correctly make out 书法 (calligraphy) even without thinking mode toggled on
0 replies · 0 reposts · 1 like · 356 views
Salman // 萨尔曼 @ForBo7_
CLIP has 2 embeddings:
- token emb
- position emb
the token emb is like a lookup table: given a token having id X, fetch its emb
the positional emb stores the position of a token in a sequence; otherwise, "man bites dog" and "dog bites man" would have the same repr
0 replies · 0 reposts · 0 likes · 54 views
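The "lookup table" framing is literal in PyTorch: calling an `nn.Embedding` is the same as indexing rows of its weight matrix by token id. A tiny sketch with toy sizes:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)       # 10 token ids, 4-dim embeddings (toy sizes)
ids = torch.tensor([3, 7])

# "Like a lookup table": the forward call just gathers rows of the
# weight matrix at the given ids.
assert torch.equal(emb(ids), emb.weight[ids])
```

CLIP's text encoder works the same way, just with its real vocabulary and embedding width.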
Salman // 萨尔曼 reposted
Barrett @BarrettYouTube
This is the moment NVIDIA should be seriously worried. In the next couple of weeks, DeepSeek V4 will be launched. It’s a direct attack on the entire AI stack that American companies have spent years locking down: full “de-NVIDIA-ization”, a complete shift away from CUDA into Huawei’s CANN ecosystem, running on Huawei Ascend chips. That means one thing: breaking the dependency that made NVIDIA untouchable.

35x faster inference vs early versions. Nearly 3x the performance of NVIDIA’s H20 on a single card. 40% less energy consumption. Over 95% CUDA compatibility, with migration times collapsing from months to hours. Even Jensen Huang has already admitted it: if this works at scale, it’s a “terrifying outcome” for US companies.

Because here’s the real problem: this isn’t happening in isolation. Chinese tech giants like Alibaba, ByteDance, and Tencent are already ordering hundreds of thousands of Ascend chips. Market share is shifting fast: domestic chips are now at 41%, with NVIDIA slipping to 55% in China’s AI server market. Additionally, DeepSeek V4 is reportedly offering API costs at a fraction of US competitors’: $300 for massive workloads that would cost $2,500+ on OpenAI models, or even $5,000 on Anthropic.

So this isn’t just about one model. It’s about China building a fully independent AI stack: chips, frameworks, models, and applications, completely outside of US control. NVIDIA doesn’t just lose sales. It loses its grip on the global AI standard.
248 replies · 1.2K reposts · 5K likes · 621.4K views
Salman // 萨尔曼 @ForBo7_
The cool thing is that the SolveIt dialog _is_ the single source of truth! The SolveIt dialog not only shows the steps for building the app, using the VAE, and what not, but also hosts the app! SolveIt dialog: share.solve.it.com/d/a1dbf736ddae…
1 reply · 0 reposts · 0 likes · 31 views
Salman // 萨尔曼 reposted
pfung @philfung
If you’re into robotics or AI, picking up Chinese is a good move. Native speakers make up ~50% of top researchers; I’ve lost count of how many times I was the only non-native speaker at a table, and being able to follow the nuance of the conversation was huge. You’ll be fine without it, but having it lets you jump into a ton of Chinese-only dialogues you’d otherwise miss. It only has to be conversational Chinese, because all the technical terms will still be in English.
San Francisco, CA 🇺🇸 · 19 replies · 22 reposts · 410 likes · 50.3K views
Salman // 萨尔曼 @ForBo7_
a project is never complete until it's been shared as well; gotta share
1 reply · 0 reposts · 0 likes · 13 views
Salman // 萨尔曼 @ForBo7_
fastcore's `mapt` directly returns a tuple, rather than a lazy iterator
0 replies · 0 reposts · 0 likes · 17 views
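fastcore's real `mapt` lives in `fastcore.basics`; a minimal stand-in with the behavior the tweet describes is essentially one line:

```python
# Minimal stand-in for fastcore's mapt (the real one is in fastcore.basics):
# map, but materialized to a tuple instead of returning a lazy iterator.
def mapt(f, *args):
    return tuple(map(f, *args))

squares = mapt(lambda x: x * x, [1, 2, 3])   # (1, 4, 9), not a map object
```

The win is ergonomic: the result can be indexed, unpacked, and re-iterated, none of which a bare `map` object allows.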