bycloud

1.5K posts

@bycloudai

I make youtube videos on cool AI research /// AI papers newsletter https://t.co/Xn7GMDbQSd /// paper recap @TheAITimeline /// https://t.co/yigZMs32sO

Joined January 2020
774 Following · 10.8K Followers
bycloud @bycloudai
looking for someone who's familiar with ML/LLMs and likes to make nice diagrams, graphics, or even animations, and get paid for it! DMs open
Replies 5 · Reposts 3 · Likes 35 · Views 2.1K
bycloud reposted
The AI Timeline @TheAITimeline
🚨 This week's top AI/ML research papers:
- OpenClaw-RL
- Neural Thickets
- IndexCache
- Lost in Backpropagation
- Training Language Models via Neural Cellular Automata
- How Far Can Unsupervised RLVR Scale LLM Training?
- Exclusive Self Attention
- GLM-OCR Technical Report
overview for each + authors' explanations
read this in thread mode for the best experience
Replies 4 · Reposts 53 · Likes 709 · Views 41K
bycloud @bycloudai
@kalomaze ig this is just looking at the same problem but backwards
Replies 1 · Reposts 0 · Likes 11 · Views 1.7K
bycloud @bycloudai
how big of a problem is this?
> When backproping through the LM head, about 95-99% of the logit-gradient norm lies in directions that get projected away
seems like the current workaround is just to use scaling to brute force it
[image]
Replies 22 · Reposts 36 · Likes 349 · Views 40.1K
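
The quoted claim follows from the shape of the LM head: the gradient reaching the hidden state is W^T g, so only the part of the logit gradient g that lies in the column space of W survives. Here is a minimal toy sketch of that effect, assuming a random Gaussian head and random hidden states rather than a trained model. The function name `lost_norm_fraction`, the sizes, and the seed are illustrative choices, not taken from the paper or the post.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def orthonormal_basis(columns):
    # Modified Gram-Schmidt: orthonormal basis for the span of `columns`.
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            basis.append([vi / norm for vi in v])
    return basis

def lost_norm_fraction(vocab=512, hidden=16, trials=10, seed=0):
    """Mean fraction of the logit-gradient norm that is orthogonal to col(W).

    Assumption: W is a random Gaussian (vocab x hidden) matrix, not a trained head.
    """
    rng = random.Random(seed)
    # W stored as `hidden` columns, each of length `vocab`.
    cols = [[rng.gauss(0, 1) for _ in range(vocab)] for _ in range(hidden)]
    basis = orthonormal_basis(cols)
    fractions = []
    for _ in range(trials):
        h = [rng.gauss(0, 1) for _ in range(hidden)]
        logits = [sum(cols[j][i] * h[j] for j in range(hidden)) for i in range(vocab)]
        g = softmax(logits)
        g[rng.randrange(vocab)] -= 1.0  # cross-entropy logit gradient: p - onehot
        total_sq = dot(g, g)
        kept_sq = sum(dot(q, g) ** 2 for q in basis)  # energy inside col(W)
        fractions.append(math.sqrt(max(total_sq - kept_sq, 0.0) / total_sq))
    return sum(fractions) / len(fractions)

if __name__ == "__main__":
    print(f"mean fraction of logit-gradient norm lost: {lost_norm_fraction():.3f}")
```

When `hidden` equals `vocab` the column space covers every direction and nothing is lost. The loss appears only because the hidden size is far smaller than the vocabulary, which is the regime real LM heads operate in.
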
bycloud @bycloudai
@juliarturc wow amazing work! what did u use to make those diagrams? would love to try to make those too for my videos lol
Replies 1 · Reposts 0 · Likes 1 · Views 399
Julia Turc @juliarturc
Diffusion models clicked for me when I started seeing them through the lens of particle motion.
I built this interactive playground where you too can clickety-clack to understand how drift, noise, and other hyperparams control diffusion.
I hereby submit this as penance for the sin of YouTube edu-tainment 😇
Link in the first comment.
Replies 21 · Reposts 43 · Likes 555 · Views 30.4K
bycloud reposted
The AI Timeline @TheAITimeline
🚨 This week's top AI/ML research papers:
- FlashAttention-4
- Beyond Language Modeling
- Speculative Speculative Decoding
- Symmetry in language statistics shapes the geometry of model representations
- SWE-CI
- Real Money, Fake Models
- Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
overview for each + authors' explanations
read this in thread mode for the best experience
Replies 2 · Reposts 41 · Likes 410 · Views 24.9K
bycloud @bycloudai
just read the JEPA papers
i finally understood what Yann LeCun is cooking now
Replies 31 · Reposts 31 · Likes 1K · Views 85.8K
bycloud @bycloudai
What's happening at Qwen? Junyang Lin and Binyuan Hui have been there since day 1
[image]
Replies 4 · Reposts 4 · Likes 94 · Views 7.5K
bycloud reposted
The AI Timeline @TheAITimeline
🚨 This week's top AI/ML research papers:
- Learning Without Training
- Doc-to-LoRA
- The Geometry of Noise
- How to Train Your Deep Research Agent?
- Agents of Chaos
- A Very Big Video Reasoning Suite
- DualPath
read this in thread mode for the best experience
Replies 1 · Reposts 77 · Likes 816 · Views 53K
bycloud reposted
The AI Timeline @TheAITimeline
🚨 This week's top AI/ML research papers:
- GLM-5
- Experiential Reinforcement Learning
- Image Generation with a Sphere Encoder
- World Action Models are Zero-shot Policies
- Unified Latents
- Fast KV Compaction via Attention Matching
- Adam Improves Muon
- LUCID
- The Molecular Structure of Thought
- Arcee Trinity Large Technical Report
read this in thread mode for the best experience
Replies 4 · Reposts 27 · Likes 241 · Views 15.9K
bycloud reposted
The AI Timeline @TheAITimeline
🚨 Last 2 weeks' top AI/ML research papers:
- Generative Modeling via Drifting
- Learning to Reason in 13 Parameters
- Maximum Likelihood Reinforcement Learning
- Kimi K2.5
- Learning a Generative Meta-Model of LLM Activations
- On-Policy Context Distillation for LMs
- SkillRL
- Retrieval-Aware Distillation for Transformer-SSM Hybrids
- ViT-5
read this in thread mode for the best experience
[image]
Replies 4 · Reposts 12 · Likes 128 · Views 9.8K
bycloud @bycloudai
@nullvaluetensor that’s MRCR not MRCRv2, and probs 2 needles instead of 8
Replies 1 · Reposts 0 · Likes 14 · Views 513
bycloud @bycloudai
if this is verified by a third party, then Anthropic might've had the biggest architecture breakthrough of 2026
MRCR v2 with 8 needles at 1M ctx is HARD
for comparison:
Gemini 3 Pro got 26.3%
Gemini 3 Flash got 22.1%
a 288% improvement vs prev. SoTA for long context is nuts
[image]
Replies 16 · Reposts 49 · Likes 959 · Views 58.4K