Xiaoliu.x

231 posts

@xiaolGo

ex-algo engineer → architect → founder → researcher → ? My reading list: https://t.co/raW6A7xfMS https://t.co/b0JOLNr4bE

Joined January 2010

21 Following, 130 Followers

Pinned Tweet

Xiaoliu.x@xiaolGo·Dec 6

Sometimes dreaming is a thinking game, openreview.net/forum?id=HHsD9…

Xiaoliu.x@xiaolGo·16h

Why do people keep showing off something that others want to participate in, but can't actually join? For example, SpaceX. Maybe I should just go back home and invent aliens!

Xiaoliu.x@xiaolGo·16h

@jeremyphoward Check the GLM 5.2

Jeremy Howard@jeremyphoward·1d

I disagree with this decision and I don't like it. But also... HOW DID ANTHROPIC NOT SEE THIS COMING‽ It is *the* obvious response to "this is too dangerous for anyone except us to use", since that relies on a premise ("we are uniquely good") that almost no-one agrees with.

Anthropic@AnthropicAI

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Xiaoliu.x@xiaolGo·16h

@TheAhmadOsman Open-source AI, the final frontier...

Ahmad@TheAhmadOsman·1d

x.com/i/article/2065…

Xiaoliu.x@xiaolGo·16h

@jbhuang0604 That's legendary, lol.

Jia-Bin Huang@jbhuang0604·1d

Explaining JEPA in 10 seconds

Xiaoliu.x@xiaolGo·2d

The Delta Memory design for hybrid online memory is very inspiring. It presents a great direction for post-training. We can leverage RWKV memory to bring RNN features into high-level HRM. github.com/xiaol/HRM-RWKV…

Xiaoliu.x retweeted

BlinkDL@BlinkDL_AI·2d

Pure RNN can code: RWKV-7 G1g 7B (100% RNN) batch vibe coding on huggingface.co/spaces/BlinkDL… select [✨HTML Generation] to try any prompts including your own (click [Stop] a few times before trying a new prompt - sometimes laggy as it's using ZeroGPU). Not bad for a base model🙂

BlinkDL@BlinkDL_AI

RWKV-7 G1g is here: the world's best pure RNN LLM, and a competitive LLM in general. Try huggingface.co/spaces/BlinkDL… for bsz16 7B inference. G1h in June 🙂 p.s. const 15000+tps decoding on single 5090: github.com/BlinkDL/Albatr…

Xiaoliu.x@xiaolGo·Jun 6

@hthieblot 爱 (Love)

Hubert Thieblot@hthieblot·Jun 5

pitch me your company in 1 word.

Xiaoliu.x@xiaolGo·Jun 6

With the Universal Transformer setup and recurrent state rollouts built into the architecture using triple latent states, how can we build an even stronger model? arxiv.org/abs/2604.25930

Xiaoliu.x@xiaolGo·Jun 6

arXiv limits researchers’ creativity: if you send it eight submissions over two months, only three end up published.

Xiaoliu.x@xiaolGo·Jun 6

For RL, we’re investigating how to use high-concurrency states to implement horizontal and depth rollouts. Breaking OOD constraints is the key bottleneck researchers need to resolve to boost model capacity. arxiv.org/abs/2604.09671

Xiaoliu.x@xiaolGo·Jun 6

Two months ago, higher-order design within attention mechanisms was seen as a promising research direction. However, clever iterative loops now perform better for small models. My work at arxiv.org/abs/2606.05175 centers on more intricate high-dimensional high-order spaces, exploring how to embed these structures into LLM latent spaces in a more intuitive manner.

Xiaoliu.x@xiaolGo·26 May

Sapient's HRM text pushes pretraining to the next level. I hybridized it with RNN. It looks like the H level replacement for RWKV has more potential to explore. I'm thinking of doing a full pretraining.

Xiaoliu.x retweeted

BlinkDL@BlinkDL_AI·25 May

RWKV community built a SKILL to migrate from GDN-2 (and other similar archs) to RWKV-7, with interesting results 🙂 let us know your experience github.com/Jellyfish042/G…

BlinkDL@BlinkDL_AI

Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging the elephant in the room 🙂

Xiaoliu.x@xiaolGo·24 May

Let us find a new way to scale. Looped Transformer with linear attention is impressive. If we combine it with the core ideas of RNN, we can gain advantages from both pure and hybrid architectures.

Xiaoliu.x retweeted

BlinkDL@BlinkDL_AI·23 May

BlinkDL@BlinkDL_AI

RWKV-7 G1f is here (13B/7B/3B/1B) and G1g in May. p.s. Gemma 4 is great at "uncheatable eval" confirming its effectiveness 🙂 pity there's no Qwen3.5 27B base

Xiaoliu.x retweeted

BlinkDL@BlinkDL_AI·22 May

Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging the elephant in the room 🙂

Xiaoliu.x@xiaolGo·23 May

How can I compare Transformers with modern RNNs? @GoodfireAI