Xiaoliu.x

231 posts

@xiaolGo

ex-algo engineer → architect → founder → researcher → ? My reading list: https://t.co/raW6A7xfMS https://t.co/b0JOLNr4bE

Joined January 2010
21 Following · 130 Followers
Xiaoliu.x @xiaolGo
Why do people keep showing off something that others want to participate in, but can't actually join? For example, SpaceX. Maybe I should just go back home and invent aliens!
Replies 0 · Retweets 0 · Likes 0 · Views 16
Jia-Bin Huang @jbhuang0604
Explaining JEPA in 10 seconds
Replies 8 · Retweets 40 · Likes 416 · Views 17.8K
Xiaoliu.x @xiaolGo
The Delta Memory design for hybrid online memory is very inspiring. It presents a great direction for post-training. We can leverage RWKV memory to bring RNN features into high-level HRM. github.com/xiaol/HRM-RWKV…
Replies 1 · Retweets 0 · Likes 2 · Views 46
Xiaoliu.x retweeted
BlinkDL @BlinkDL_AI
Pure RNN can code: RWKV-7 G1g 7B (100% RNN) batch vibe coding on huggingface.co/spaces/BlinkDL… select [✨HTML Generation] to try any prompts including your own (click [Stop] a few times before trying a new prompt - sometimes laggy as it's using ZeroGPU). Not bad for a base model🙂
[3 images]
BlinkDL @BlinkDL_AI

RWKV-7 G1g is here: the world's best pure RNN LLM, and a competitive LLM in general. Try huggingface.co/spaces/BlinkDL… for bsz16 7B inference. G1h in June 🙂 p.s. const 15000+tps decoding on single 5090: github.com/BlinkDL/Albatr…

Replies 2 · Retweets 9 · Likes 55 · Views 3.9K
Hubert Thieblot @hthieblot
pitch me your company in 1 word.
Replies 3.1K · Retweets 40 · Likes 1.3K · Views 461.9K
Xiaoliu.x @xiaolGo
With the Universal Transformer setup and recurrent state rollouts built into the architecture using triple latent states, how can we build an even stronger model? arxiv.org/abs/2604.25930
Replies 0 · Retweets 0 · Likes 0 · Views 40
Xiaoliu.x @xiaolGo
arXiv limits researchers’ creativity: if you give it eight submissions over two months, only three end up published.
Replies 3 · Retweets 0 · Likes 1 · Views 22
Xiaoliu.x @xiaolGo
For RL, we’re investigating how to use high-concurrency states to implement horizontal and depth rollouts. Breaking OOD constraints is the key bottleneck researchers need to resolve to boost model capacity. arxiv.org/abs/2604.09671
Replies 0 · Retweets 0 · Likes 0 · Views 18
Xiaoliu.x @xiaolGo
Two months ago, higher-order design within attention mechanisms was seen as a promising research direction. However, clever iterative loops now perform better for small models. My work at arxiv.org/abs/2606.05175 centers on more intricate high-dimensional high-order spaces, exploring how to embed these structures into LLM latent spaces in a more intuitive manner.
Replies 1 · Retweets 0 · Likes 0 · Views 37
Xiaoliu.x @xiaolGo
Sapient's HRM text pushes pretraining to the next level. I hybridized it with an RNN. It looks like the H-level replacement for RWKV has more potential to explore. I'm thinking of doing a full pretraining run.
[image]
Replies 0 · Retweets 0 · Likes 1 · Views 94
Xiaoliu.x @xiaolGo
Let us find a new way to scale. A looped Transformer with linear attention is impressive. If we combine it with the core ideas of RNNs, we can gain advantages from both pure and hybrid architectures.
[image]
Replies 0 · Retweets 1 · Likes 4 · Views 300
Xiaoliu.x retweeted
BlinkDL @BlinkDL_AI
RWKV-7 G1g is here: the world's best pure RNN LLM, and a competitive LLM in general. Try huggingface.co/spaces/BlinkDL… for bsz16 7B inference. G1h in June 🙂 p.s. const 15000+tps decoding on single 5090: github.com/BlinkDL/Albatr…
[image]
BlinkDL @BlinkDL_AI

RWKV-7 G1f is here (13B/7B/3B/1B) and G1g in May. p.s. Gemma 4 is great at "uncheatable eval" confirming its effectiveness 🙂 pity there's no Qwen3.5 27B base

Replies 4 · Retweets 23 · Likes 117 · Views 24.6K
Xiaoliu.x retweeted
BlinkDL @BlinkDL_AI
Gated DeltaNet-2 is almost exactly RWKV-7's DPLR recurrence, not acknowledging the elephant in the room 🙂
Replies 3 · Retweets 16 · Likes 134 · Views 28.7K
Xiaoliu.x @xiaolGo
How can I compare Transformers with modern RNNs? @GoodfireAI
[4 GIFs]
Replies 0 · Retweets 0 · Likes 0 · Views 28
Xiaoliu.x @xiaolGo
Current LLMs are trapped in the dilemma of incremental innovations.
Replies 0 · Retweets 0 · Likes 0 · Views 26