DRiftingZ

62 posts

@d_rifting

You saw me~ - a visitor from China

Joined November 2019
15 Following · 3 Followers
Robinson · 鲁棒逊@python_xxt·
Because Claude's user economics are paradoxical: the users who pay Claude the most are often the super-heavy users Claude dislikes the most, since they actually burn through more of Claude's token resources. The more you pay, the more Claude loses. Once you understand that, Claude's account-ban logic is no longer surprising.
卫斯理@imwsl90

After chatting in a group for a while, I noticed something hilarious. Don't pay Claude: you don't get banned. Pay $20: you don't get banned. Pay $100 or $200: there's a good chance you get banned. In short, the more money you give them, the more likely you are to be banned.

Chinese · 10 replies · 1 repost · 40 likes · 9.5K views
AB Kuai.Dong@_FORAB·
This cracks me up. Because the war in the Middle East has caused a shortage of raw material for printing ink, Calbee, a best-selling Japanese snack company, announced it will switch the packaging of 14 of its snacks to black and white to save ink. The public reaction after the announcement was so negative that the government called the company in for an emergency meeting today.
[2 media]
Meguro-ku, Tokyo 🇯🇵 · Chinese · 124 replies · 28 reposts · 413 likes · 100.3K views
DRiftingZ@d_rifting·
@fd_a_e I feel there's an incredible business opportunity...
English · 0 replies · 0 reposts · 0 likes · 16 views
ふどあ@fd_a_e·
・When Chinese LLMs that are weak at Japanese (KIMI, Qwen, etc.) produce Japanese, this happens all the time. ・Learners who are weak at Japanese and don't know a reading type it in Chinese to fill the gap, and because their Japanese is weak they don't notice that simplified Chinese characters differ from Japanese shinjitai, so this happens.
ルーピー@RuupiiYukio

I always wonder what kind of input method produces these "native language slipped out" Japanese sentences with simplified Chinese characters mixed in. When you type Japanese normally, you use a Japanese input keyboard, right?

Japanese · 36 replies · 2K reposts · 9K likes · 1.1M views
DRiftingZ@d_rifting·
@honeshabri Use the appropriate model for different scenarios
Italian · 0 replies · 0 reposts · 0 likes · 1.8K views
骨しゃぶり@honeshabri·
When there was all the fuss about the "DeepSeek shock," I thought, "Sure, it might be impressive, but is there really any need to go out of my way to use Chinese-made AI? ChatGPT or Claude is better." And yet before I knew it, Hermes is running on Qwen 3.5 Plus and 3.6 Plus, and what I run locally has become Qwen 3.6 too. Chinese AI is just plain strong.
Japanese · 16 replies · 52 reposts · 672 likes · 105.8K views
夜鶯(14歳)@ultraman_DT·
Riding the subway in Tokyo is scary, isn't it.
Japanese · 25 replies · 89 reposts · 1.1K likes · 316.5K views
DRiftingZ@d_rifting·
@Orange41324306 Another bit of trivia: if you use compile, batch_size also can't exceed 65536
Chinese · 0 replies · 0 reposts · 1 like · 291 views
Pollux9437@Orange41324306·
Today's bit of trivia: an LLM's hidden_dim generally won't exceed 65536; it's a hardware-level constraint.
Chinese · 5 replies · 2 reposts · 62 likes · 15.9K views
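A rough sketch of the kind of shape guard such a limit implies. Tying the 65536 figure to CUDA's gridDim.y/z ceiling of 65535 is my assumption; neither tweet names the exact hardware mechanism, and the constant and function below are illustrative only.

MAX_GRID_YZ = 65535  # CUDA's limit on gridDim.y and gridDim.z (assumed to be the relevant ceiling)

def check_kernel_shape(batch_size: int, hidden_dim: int) -> None:
    # Reject shapes that could not be mapped one-to-one onto a grid y/z dimension.
    for name, value in (("batch_size", batch_size), ("hidden_dim", hidden_dim)):
        if value > MAX_GRID_YZ:
            raise ValueError(f"{name}={value} exceeds the gridDim.y/z limit of {MAX_GRID_YZ}")

check_kernel_shape(batch_size=4096, hidden_dim=16384)  # passes; 131072 in either slot would raise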
全麦面包🍞@im_magneto·
My view used to be: neural networks are not only slow but also resource-heavy, so simple tasks should still be handled by traditional algorithms. But over the past few months I reworked the model design from scratch, and now my classification model runs on a single-core MCU weaker than a Cortex-A53 and finishes the recognition task in under 1 ms. By comparison, my traditional algorithm takes 2 ms on a simple color-detection task, and it isn't even accurate...
Chinese · 36 replies · 4 reposts · 248 likes · 36.1K views
You Jiacheng@YouJiacheng·
ε = -x + \text{input}. JiT says we should predict "x" instead of "ε". This is equivalent to saying we should add one U-skip-connection to the ViT in ε-pred: f(\text{input}) = ViT(\text{input}) + \text{input}; then the ViT part can predict "-x".
机器之心 JIQIZHIXIN@jiqizhixin

Huge! @TianhongLi6 & Kaiming He (inventor of ResNet) just introduced JiT (Just image Transformers)! JiTs are simple large-patch Transformers that operate on raw pixels; no tokenizer, pre-training, or extra losses needed. By predicting clean data on the natural-data manifold, JiT excels in high-dimensional spaces where traditional noise-predicting models can fail. On ImageNet (256 & 512), JiT achieves competitive generative performance, showing that sometimes going back to basics is the key.

English · 5 replies · 10 reposts · 120 likes · 16.8K views
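Spelling out the algebra behind the tweet, writing \hat{x} for the model's x-prediction:

    \epsilon = \text{input} - x \;\Rightarrow\; f_\epsilon(\text{input}) = \text{input} - \hat{x}(\text{input})

    \text{If } f_\epsilon(\text{input}) = \mathrm{ViT}(\text{input}) + \text{input}, \text{ then } \mathrm{ViT}(\text{input}) = -\hat{x}(\text{input}).

So with the identity skip connection in place, the ViT branch of an ε-predictor is an x-predictor with its sign flipped, which is the claimed equivalence.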
alphaXiv@askalphaxiv·
New paper from ByteDance Seed: Scaling Latent Reasoning via Looped LMs. This paper proposes Ouro, which reuses the same layers to think in latent space instead of dumping long chain-of-thought text: 2-3x parameter efficiency plus increased performance via iterative latent computation.
[media]
English · 8 replies · 26 reposts · 141 likes · 9.4K views
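A minimal sketch of the looped idea described above: one shared transformer block applied repeatedly in latent space. The class and hyperparameters below are illustrative, not the Ouro architecture.

import torch
import torch.nn as nn

class LoopedLM(nn.Module):
    # Toy looped model: the same block (shared weights) is applied num_loops times.
    def __init__(self, vocab=32000, d_model=512, num_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.shared_block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.num_loops = num_loops
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                    # tokens: (batch, seq) token ids
        h = self.embed(tokens)
        for _ in range(self.num_loops):           # reuse the same weights each iteration
            h = self.shared_block(h)              # iterative computation in latent space
        return self.head(h)                       # logits: (batch, seq, vocab)

logits = LoopedLM()(torch.randint(0, 32000, (2, 16)))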
Hadi Pouransari@HPouransari·
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
[GIF]
English · 11 replies · 112 reposts · 635 likes · 170.4K views
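A toy version of the anchor/memory-bank split the thread describes: a small always-on network plus a bank of vectors from which the most query-relevant entries are mixed in. The top-k retrieval and every name below are my assumptions, not the paper's design.

import torch
import torch.nn as nn

class AnchorPlusMemory(nn.Module):
    def __init__(self, d_model=256, num_memories=1024, top_k=4):
        super().__init__()
        # Anchor: always-used parameters (stand-in for "commonsense").
        self.anchor = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
        # Memory bank: parameters selected per query (stand-in for "world knowledge").
        self.memory_keys = nn.Parameter(torch.randn(num_memories, d_model))
        self.memory_values = nn.Parameter(torch.randn(num_memories, d_model))
        self.top_k = top_k

    def forward(self, query):                               # query: (batch, d_model)
        scores = query @ self.memory_keys.t()               # relevance of each memory
        weights, idx = scores.topk(self.top_k, dim=-1)      # select top-k per query
        selected = self.memory_values[idx]                   # (batch, top_k, d_model)
        mixed = (weights.softmax(-1).unsqueeze(-1) * selected).sum(dim=1)
        return self.anchor(query) + mixed                    # anchor path is always active

out = AnchorPlusMemory()(torch.randn(3, 256))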
Sakana AI@SakanaAILabs·
We’re excited to introduce Text-to-LoRA: a Hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025! Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

Biological systems are capable of rapid adaptation, given limited sensory cues. For example, our human visual system can quickly adapt and tune its light sensitivity to our surroundings. While modern LLMs exhibit a wide variety of capabilities and knowledge, they remain rigid when adding task-specific capabilities. Traditionally, customizing these models requires gathering large datasets and performing often expensive, time-consuming fine-tuning for specific applications.

To bypass these limitations, Text-to-LoRA (T2L) meta-learns a “hypernetwork” that takes in a text description of a desired task, as a prompt, and generates a task-specific LoRA that performs well on the task.

In our experiments, we show that T2L can encode hundreds of existing LoRA adapters. While the compression is lossy, T2L maintains the performance of task-specifically tuned LoRA adapters. We also show that T2L can even generalize to unseen tasks given a natural language description of the tasks.

Importantly, Text-to-LoRA is parameter-efficient. It generates LoRAs in a single, inexpensive step, based solely on a simple text description of the task. This approach is a step towards dramatically lowering the technical and computational barriers, allowing non-technical users to specialize foundation models using plain language, rather than needing deep technical expertise or large compute resources.
English · 47 replies · 376 reposts · 1.8K likes · 402.8K views
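A hedged sketch of the hypernetwork idea in the announcement: map an embedding of the task description to low-rank LoRA factors for a single target weight matrix. Shapes, names, and the omitted text encoder are assumptions; this is not Sakana AI's T2L code.

import torch
import torch.nn as nn

class TextToLoRASketch(nn.Module):
    # Produces a low-rank update delta_W = B @ A for one target layer from a task embedding.
    def __init__(self, text_dim=768, fan_in=4096, fan_out=4096, rank=8):
        super().__init__()
        self.rank, self.fan_in, self.fan_out = rank, fan_in, fan_out
        self.to_A = nn.Linear(text_dim, rank * fan_in)
        self.to_B = nn.Linear(text_dim, fan_out * rank)

    def forward(self, task_embedding):                       # (text_dim,) embedding of the task description
        A = self.to_A(task_embedding).view(self.rank, self.fan_in)
        B = self.to_B(task_embedding).view(self.fan_out, self.rank)
        return B @ A                                         # (fan_out, fan_in) adapter weights

delta_W = TextToLoRASketch()(torch.randn(768))               # one cheap forward pass per task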
DRiftingZ@d_rifting·
@jxmnop Now I want more datasets and GPUs...
English · 0 replies · 0 reposts · 0 likes · 115 views
dr. jack morris@jxmnop·
new paper from our work at Meta! **GPT-style language models memorize 3.6 bits per param** We compute capacity by measuring total bits memorized, using some theory from Shannon (1953). Shockingly, the memorization-vs-dataset-size curves look like this:
   ___________
  /
 /
(🧵)
[2 media]
English · 77 replies · 371 reposts · 3.3K likes · 410.3K views
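For scale, converting the headline 3.6 bits/param figure into bytes of memorized content (the model sizes below are arbitrary examples, not figures from the paper):

BITS_PER_PARAM = 3.6
for params in (125e6, 1.3e9, 7e9):
    megabytes = params * BITS_PER_PARAM / 8 / 1e6    # bits -> bytes -> MB
    print(f"{params / 1e9:.3g}B params -> ~{megabytes:,.0f} MB memorized")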
Chubby♨️@kimmonismus·
Absolute cinema
[media]
German · 134 replies · 1.5K reposts · 15.6K likes · 735.1K views
You Jiacheng@YouJiacheng·
1) WHAT
[media]
English · 66 replies · 723 reposts · 8.3K likes · 433.3K views
PyQuant News 🐍@pyquantnews·
I'm 43. If you're still in your 20s (or 30s), read this:
English · 85 replies · 794 reposts · 8.7K likes · 1.6M views
autoshotgun@ZxdaZpPwK7Ae2xW·
[2 media]
ZXX · 25 replies · 1.2K reposts · 11.9K likes · 1.7M views
DRiftingZ@d_rifting·
@achat_8 I really love these little details...
English · 1 reply · 0 reposts · 1 like · 320 views
アハト@achat_8·
Rehab in progress…
[media]
Japanese · 6 replies · 330 reposts · 5.4K likes · 55.7K views