DRiftingZ

62 posts

@d_rifting

You saw me~ - a visitor from China

Joined November 2019
15 Following · 3 Followers
Robinson · 鲁棒逊@python_xxt·
Because Claude's user economics are paradoxical: the users who pay Claude the most are often the super-heavy users Claude dislikes the most, since they actually burn through more of Claude's token resources. The more you pay, the more Claude loses. Once you understand that, Claude's account-ban logic is no longer surprising.
卫斯理@imwsl90

After chatting in a group for a while, I noticed something hilarious. Don't pay Claude: you don't get banned. Pay $20: you don't get banned. Pay $100 or $200: there's a good chance you get banned. In short, the more money you give them, the more likely you are to be banned.

Chinese · 10 replies · 1 repost · 40 likes · 9.5K views
AB Kuai.Dong@_FORAB·
This cracks me up. Because the war in the Middle East has caused a shortage of raw material for printing ink, Calbee, a best-selling Japanese snack company, announced it will switch the packaging of 14 of its snacks to black and white to save ink. The public reaction after the announcement was so negative that the government called the company in for an emergency meeting today.
[2 media]
Meguro-ku, Tokyo 🇯🇵 · Chinese · 124 replies · 28 reposts · 413 likes · 100.3K views
DRiftingZ@d_rifting·
@fd_a_e I feel there's an incredible business opportunity...
English · 0 replies · 0 reposts · 0 likes · 16 views
ふどあ@fd_a_e·
・When Chinese LLMs that are weak at Japanese (KIMI, Qwen, etc.) produce Japanese, this happens all the time. ・Learners who are weak at Japanese and don't know a reading type it in Chinese to fill the gap, and because their Japanese is weak they don't notice that simplified Chinese characters differ from Japanese shinjitai, so this happens.
ルーピー@RuupiiYukio

I always wonder what kind of input method produces these "native language slipped out" Japanese sentences with simplified Chinese characters mixed in. When you type Japanese normally, you use a Japanese input keyboard, right?

Japanese · 36 replies · 2K reposts · 9K likes · 1.1M views
DRiftingZ@d_rifting·
@honeshabri Use the appropriate model for different scenarios
Italian · 0 replies · 0 reposts · 0 likes · 1.8K views
骨しゃぶり@honeshabri·
When there was all the fuss about the "DeepSeek shock," I thought, "Sure, it might be impressive, but is there really any need to go out of my way to use Chinese-made AI? ChatGPT or Claude is better." And yet before I knew it, Hermes is running on Qwen 3.5 Plus and 3.6 Plus, and what I run locally has become Qwen 3.6 too. Chinese AI is just plain strong.
Japanese · 16 replies · 52 reposts · 672 likes · 105.8K views
夜鶯(14歳)@ultraman_DT·
Riding the subway in Tokyo is scary, isn't it.
Japanese · 25 replies · 89 reposts · 1.1K likes · 316.5K views
DRiftingZ@d_rifting·
@Orange41324306 Another bit of trivia: if you use compile, batch_size also can't exceed 65536
Chinese · 0 replies · 0 reposts · 1 like · 291 views
Pollux9437@Orange41324306·
Today's bit of trivia: an LLM's hidden_dim generally won't exceed 65536; it's a hardware-level constraint.
Chinese · 5 replies · 2 reposts · 62 likes · 15.9K views
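A rough sketch of the kind of shape guard such a limit implies. Tying the 65536 figure to CUDA's gridDim.y/z ceiling of 65535 is my assumption; neither tweet names the exact hardware mechanism, and the constant and function below are illustrative only.

MAX_GRID_YZ = 65535  # CUDA's limit on gridDim.y and gridDim.z (assumed to be the relevant ceiling)

def check_kernel_shape(batch_size: int, hidden_dim: int) -> None:
    # Reject shapes that could not be mapped one-to-one onto a grid y/z dimension.
    for name, value in (("batch_size", batch_size), ("hidden_dim", hidden_dim)):
        if value > MAX_GRID_YZ:
            raise ValueError(f"{name}={value} exceeds the gridDim.y/z limit of {MAX_GRID_YZ}")

check_kernel_shape(batch_size=4096, hidden_dim=16384)  # passes; 131072 in either slot would raise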
全麦面包🍞@im_magneto·
My view used to be: neural networks are not only slow but also resource-heavy, so simple tasks should still be handled by traditional algorithms. But over the past few months I reworked the model design from scratch, and now my classification model runs on a single-core MCU weaker than a Cortex-A53 and finishes the recognition task in under 1 ms. By comparison, my traditional algorithm takes 2 ms on a simple color-detection task, and it isn't even accurate...
Chinese · 36 replies · 4 reposts · 248 likes · 36.1K views
You Jiacheng@YouJiacheng·
ε = -x + \text{input}. JiT says we should predict "x" instead of "ε". This is equivalent to saying we should add one U-skip-connection to the ViT in ε-pred: f(\text{input}) = ViT(\text{input}) + \text{input}; then the ViT part can predict "-x".
机器之心 JIQIZHIXIN@jiqizhixin

Huge! @TianhongLi6 & Kaiming He (inventor of ResNet) just introduced JiT (Just image Transformers)! JiTs are simple large-patch Transformers that operate on raw pixels; no tokenizer, pre-training, or extra losses needed. By predicting clean data on the natural-data manifold, JiT excels in high-dimensional spaces where traditional noise-predicting models can fail. On ImageNet (256 & 512), JiT achieves competitive generative performance, showing that sometimes going back to basics is the key.

English · 5 replies · 10 reposts · 120 likes · 16.8K views
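Spelling out the algebra behind the tweet, writing \hat{x} for the model's x-prediction:

    \epsilon = \text{input} - x \;\Rightarrow\; f_\epsilon(\text{input}) = \text{input} - \hat{x}(\text{input})

    \text{If } f_\epsilon(\text{input}) = \mathrm{ViT}(\text{input}) + \text{input}, \text{ then } \mathrm{ViT}(\text{input}) = -\hat{x}(\text{input}).

So with the identity skip connection in place, the ViT branch of an ε-predictor is an x-predictor with its sign flipped, which is the claimed equivalence.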
alphaXiv@askalphaxiv·
New paper from ByteDance Seed: Scaling Latent Reasoning via Looped LMs. This paper proposes Ouro, which reuses the same layers to think in latent space instead of dumping long chain-of-thought text: 2-3x parameter efficiency plus increased performance via iterative latent computation.
[media]
English · 8 replies · 26 reposts · 141 likes · 9.4K views
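A minimal sketch of the looped idea described above: one shared transformer block applied repeatedly in latent space. The class and hyperparameters below are illustrative, not the Ouro architecture.

import torch
import torch.nn as nn

class LoopedLM(nn.Module):
    # Toy looped model: the same block (shared weights) is applied num_loops times.
    def __init__(self, vocab=32000, d_model=512, num_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.shared_block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.num_loops = num_loops
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                    # tokens: (batch, seq) token ids
        h = self.embed(tokens)
        for _ in range(self.num_loops):           # reuse the same weights each iteration
            h = self.shared_block(h)              # iterative computation in latent space
        return self.head(h)                       # logits: (batch, seq, vocab)

logits = LoopedLM()(torch.randint(0, 32000, (2, 16)))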
Hadi Pouransari@HPouransari·
Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵
[GIF]
English · 11 replies · 112 reposts · 635 likes · 170.4K views
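A toy version of the anchor/memory-bank split the thread describes: a small always-on network plus a bank of vectors from which the most query-relevant entries are mixed in. The top-k retrieval and every name below are my assumptions, not the paper's design.

import torch
import torch.nn as nn

class AnchorPlusMemory(nn.Module):
    def __init__(self, d_model=256, num_memories=1024, top_k=4):
        super().__init__()
        # Anchor: always-used parameters (stand-in for "commonsense").
        self.anchor = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
        # Memory bank: parameters selected per query (stand-in for "world knowledge").
        self.memory_keys = nn.Parameter(torch.randn(num_memories, d_model))
        self.memory_values = nn.Parameter(torch.randn(num_memories, d_model))
        self.top_k = top_k

    def forward(self, query):                               # query: (batch, d_model)
        scores = query @ self.memory_keys.t()               # relevance of each memory
        weights, idx = scores.topk(self.top_k, dim=-1)      # select top-k per query
        selected = self.memory_values[idx]                   # (batch, top_k, d_model)
        mixed = (weights.softmax(-1).unsqueeze(-1) * selected).sum(dim=1)
        return self.anchor(query) + mixed                    # anchor path is always active

out = AnchorPlusMemory()(torch.randn(3, 256))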
Sakana AI@SakanaAILabs·
We’re excited to introduce Text-to-LoRA: a Hypernetwork that generates task-specific LLM adapters (LoRAs) based on a text description of the task. Catch our presentation at #ICML2025! Paper: arxiv.org/abs/2506.06105 Code: github.com/SakanaAI/Text-…

Biological systems are capable of rapid adaptation, given limited sensory cues. For example, our human visual system can quickly adapt and tune its light sensitivity to our surroundings. While modern LLMs exhibit a wide variety of capabilities and knowledge, they remain rigid when adding task-specific capabilities. Traditionally, customizing these models requires gathering large datasets and performing often expensive, time-consuming fine-tuning for specific applications.

To bypass these limitations, Text-to-LoRA (T2L) meta-learns a “hypernetwork” that takes in a text description of a desired task, as a prompt, and generates a task-specific LoRA that performs well on the task.

In our experiments, we show that T2L can encode hundreds of existing LoRA adapters. While the compression is lossy, T2L maintains the performance of task-specifically tuned LoRA adapters. We also show that T2L can even generalize to unseen tasks given a natural language description of the tasks.

Importantly, Text-to-LoRA is parameter-efficient. It generates LoRAs in a single, inexpensive step, based solely on a simple text description of the task. This approach is a step towards dramatically lowering the technical and computational barriers, allowing non-technical users to specialize foundation models using plain language, rather than needing deep technical expertise or large compute resources.
English · 47 replies · 376 reposts · 1.8K likes · 402.8K views
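A hedged sketch of the hypernetwork idea in the announcement: map an embedding of the task description to low-rank LoRA factors for a single target weight matrix. Shapes, names, and the omitted text encoder are assumptions; this is not Sakana AI's T2L code.

import torch
import torch.nn as nn

class TextToLoRASketch(nn.Module):
    # Produces a low-rank update delta_W = B @ A for one target layer from a task embedding.
    def __init__(self, text_dim=768, fan_in=4096, fan_out=4096, rank=8):
        super().__init__()
        self.rank, self.fan_in, self.fan_out = rank, fan_in, fan_out
        self.to_A = nn.Linear(text_dim, rank * fan_in)
        self.to_B = nn.Linear(text_dim, fan_out * rank)

    def forward(self, task_embedding):                       # (text_dim,) embedding of the task description
        A = self.to_A(task_embedding).view(self.rank, self.fan_in)
        B = self.to_B(task_embedding).view(self.fan_out, self.rank)
        return B @ A                                         # (fan_out, fan_in) adapter weights

delta_W = TextToLoRASketch()(torch.randn(768))               # one cheap forward pass per task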
DRiftingZ@d_rifting·
@jxmnop Now I want more datasets and GPUs...
English · 0 replies · 0 reposts · 0 likes · 115 views
dr. jack morris@jxmnop·
new paper from our work at Meta! **GPT-style language models memorize 3.6 bits per param** We compute capacity by measuring total bits memorized, using some theory from Shannon (1953). Shockingly, the memorization-vs-dataset-size curves look like this:
   ___________
  /
 /
(🧵)
[2 media]
English · 77 replies · 371 reposts · 3.3K likes · 410.3K views
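For scale, converting the headline 3.6 bits/param figure into bytes of memorized content (the model sizes below are arbitrary examples, not figures from the paper):

BITS_PER_PARAM = 3.6
for params in (125e6, 1.3e9, 7e9):
    megabytes = params * BITS_PER_PARAM / 8 / 1e6    # bits -> bytes -> MB
    print(f"{params / 1e9:.3g}B params -> ~{megabytes:,.0f} MB memorized")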
Chubby♨️@kimmonismus·
Absolute cinema
[media]
German · 134 replies · 1.5K reposts · 15.6K likes · 735.1K views
You Jiacheng@YouJiacheng·
1) WHAT
[media]
English · 66 replies · 723 reposts · 8.3K likes · 433.3K views
PyQuant News 🐍@pyquantnews·
I'm 43. If you're still in your 20s (or 30s), read this:
English · 85 replies · 794 reposts · 8.7K likes · 1.6M views
autoshotgun@ZxdaZpPwK7Ae2xW·
[2 media]
ZXX · 25 replies · 1.2K reposts · 11.9K likes · 1.7M views
DRiftingZ@d_rifting·
@achat_8 I really love these little details...
English · 1 reply · 0 reposts · 1 like · 320 views
アハト@achat_8·
Rehab in progress…
[media]
Japanese · 6 replies · 330 reposts · 5.4K likes · 55.7K views