Ziyu Wang

418 posts

Ziyu Wang

@ziyuwang

Machine learning researcher. Co-founder / CTO of https://t.co/h85fQdqLrr

Joined May 2009
424 Following · 1.5K Followers
Ziyu Wang reposted
Sander Dieleman @sedielem
Several new methods to shape the latent distributions of autoencoders have popped up recently. They are often compared against the traditional VAE setup, where a KL penalty encourages the latents to be Gaussian. 🧵👇 (1/10)
4 replies · 29 reposts · 226 likes · 19.6K views
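The "KL penalty" the thread refers to is the closed-form KL divergence between the encoder's diagonal-Gaussian posterior and a standard normal prior, added to the VAE reconstruction loss. A minimal numpy sketch (function name and array shapes are my own; illustrative only):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL divergence KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    summed over latent dimensions, averaged over the batch.
    Closed form: -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    return float(np.mean(-0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)))

# When the posterior already matches N(0, I), the penalty vanishes.
mu = np.zeros((4, 8))       # batch of 4, latent dim 8
logvar = np.zeros((4, 8))   # log-variance 0 => unit variance
print(gaussian_kl(mu, logvar))  # 0.0
```

Minimizing this term is what pulls the aggregate latent distribution toward a Gaussian in the traditional VAE setup the thread compares against.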
Ziyu Wang reposted
Nando de Freitas @NandoDF
Hello World. MAI image is our first image generation model, following MAI voice and MAI text. Congrats team! Oh, and if you are a strong engineer who wants to make this model climb to Number 1, please send your CV to JoinAITeam@microsoft.com
Arena.ai @arena

🚨New model drop into the Top 10! 🖼️ @MicrosoftAI just entered the Image Arena with MAI-Image-1. Community votes are already rolling in, and MAI-Image-1 has broken into the Top 10. It’s currently ranked #9, tied with Seedream 3! MAI-Image-1 is now live in Direct Chat for early access on LMArena.

7 replies · 6 reposts · 84 likes · 25.8K views
Ziyu Wang reposted
Nathan Lambert @natolambert
Stoked to get to talk to @lexfridman + my homie @dylan522p for 5+ hours to try and get to the bottom of what is actually happening in AI right now. DeepSeek R1 & V3, China v US, open vs closed, decreasing hype, datacenters, everything in between... 🚀 what a fun whirlwind week
Lex Fridman @lexfridman

Here's my 5-hour conversation with @dylan522p and @natolambert on DeepSeek, China, OpenAI, NVIDIA, xAI, Google, Anthropic, Meta, Microsoft, TSMC, Stargate, megacluster buildouts, RL, reasoning, and a lot of other topics at the cutting edge of AI. This was a mind-blowing, super-technical, and fun conversation. Yes, we discuss r1 and o3-mini, but more importantly we look into the future of technology, geopolitics, and humanity in a world that stands on the precipice of a global AI revolution. The first 4 hours are here on X (4 hours is the current limit), and the full 5 hours are up everywhere else. Links in comment.

Timestamps:
0:00 - Introduction
3:33 - DeepSeek-R1 and DeepSeek-V3
25:07 - Low cost of training
51:25 - DeepSeek compute cluster
58:57 - Export controls on GPUs to China
1:09:16 - AGI timeline
1:18:41 - China's manufacturing capacity
1:26:36 - Cold war with China
1:31:05 - TSMC and Taiwan
1:54:44 - Best GPUs for AI
2:09:36 - Why DeepSeek is so cheap
2:22:55 - Espionage
2:31:57 - Censorship
2:44:52 - Andrej Karpathy and magic of RL
2:55:23 - OpenAI o3-mini vs DeepSeek r1
3:14:31 - NVIDIA
3:18:58 - GPU smuggling
3:25:36 - DeepSeek training on OpenAI data
3:36:04 - AI megaclusters
4:11:26 - Who wins the race to AGI?
4:21:39 - AI agents
4:30:21 - Programming and AI
4:37:49 - Open source
4:47:01 - Stargate
4:54:30 - Future of AI

63 replies · 59 reposts · 642 likes · 89.5K views
Ziyu Wang reposted
Jiayi Pan @jiayi_pirate
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the Aha moment yourself for < $30. Code: github.com/Jiayi-Pan/Tiny… Here's what we learned 🧵
192 replies · 1.2K reposts · 6.3K likes · 1.7M views
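The key ingredient in this kind of R1-Zero-style RL is a rule-based outcome reward: no learned reward model, just a check that the model's final equation is valid. The actual reward code is in the linked repo; below is an illustrative stand-in I wrote (the `<answer>` tag format, function name, and scoring are my assumptions, and `eval` is only acceptable here because the regex restricts input to arithmetic in a toy setting):

```python
import re

def countdown_reward(completion, numbers, target):
    """Toy outcome reward for the Countdown game: 1.0 if the completion's
    <answer>...</answer> equation uses each provided number exactly once
    and evaluates to the target, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not match:
        return 0.0                       # no parseable answer
    expr = match.group(1).strip()
    if not re.fullmatch(r"[\d+\-*/(). ]+", expr):
        return 0.0                       # only arithmetic characters allowed
    used = [int(n) for n in re.findall(r"\d+", expr)]
    if sorted(used) != sorted(numbers):
        return 0.0                       # must use the given numbers exactly once each
    try:
        return 1.0 if abs(eval(expr) - target) < 1e-6 else 0.0
    except (SyntaxError, ZeroDivisionError):
        return 0.0

print(countdown_reward("<answer>(6-4)*50</answer>", [4, 6, 50], 100))  # 1.0
```

A sparse, verifiable reward like this is all the policy gets; the self-verification and search behaviors the tweet mentions emerge from RL against it rather than being supervised directly.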
Ziyu Wang reposted
Alex Dimakis @AlexGDimakis
Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets:
1. Simple post-training recipe called GRPO: start with a good model and reward for correctness and style outcomes. No PRM, no MCTS, no fancy reward models. Basically checks if the answer is correct. 😅
2. Small models can reason very, very well with correct distillation post-training. They released a 1.5B model (!) that is better than Claude and Llama 405B on AIME24. Also, their distilled 7B model seems better than o1-preview. 🤓
3. The datasets used are not released, if I understand correctly. 🫤
4. DeepSeek seems to be the best at executing OpenAI's original mission right now. We need to catch up.
24 replies · 127 reposts · 1.4K likes · 181.8K views
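The reason GRPO needs no value model or PRM is that it estimates advantages by comparing each sampled completion against the other completions in its own group, normalizing their outcome rewards. A minimal numpy sketch of that group-relative step (function name and epsilon are mine; the surrounding PPO-style policy update is omitted):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: for one prompt, sample a group of
    completions, score each with a simple outcome reward, and normalize
    within the group. No learned critic needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon guards all-equal groups

# Four sampled answers to one prompt; only the last two were correct.
adv = grpo_advantages([0.0, 0.0, 1.0, 1.0])
print(adv)  # correct answers get positive advantage, wrong ones negative
```

Each completion's tokens are then reinforced in proportion to its group-relative advantage, which is why a bare correct/incorrect check suffices as the reward signal.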
Ziyu Wang reposted
Dr Singularity @Dr_Singularity
This can be big. Google unveils the successor to the Transformer architecture.

"We present a new neural long term memory module that learns to memorize historical context and helps an attention to attend to the current context while utilizing long past information. We show that this neural memory has the advantage of a fast parallelizable training while maintaining a fast inference."

"From a memory perspective, we argue that attention due to its limited context but accurate dependency modeling performs as a short term memory, while neural memory due to its ability to memorize the data, acts as a long-term, more persistent, memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture."

"Our experimental results on language modeling, common sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models."

"They further can effectively scale to larger than 2M context window size with higher accuracy in needle in haystack tasks compared to baselines."
38 replies · 353 reposts · 2.3K likes · 145.9K views
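The quoted passage describes a memory that "learns to memorize" at inference time. A heavily simplified toy sketch of that idea in numpy, under my own assumptions (a single linear map as the memory, a unit-norm key, plain gradient descent with no momentum or forgetting gate — the real Titans module is a deeper network with more machinery):

```python
import numpy as np

def memory_step(M, k, v, lr=0.1):
    """One online update of a toy linear long-term memory: train M at
    inference time to associate key k with value v. The gradient of the
    reconstruction error ||M k - v||^2 plays the role of a 'surprise' signal."""
    surprise = M @ k - v                   # how badly the memory predicts v from k
    return M - lr * np.outer(surprise, k)  # gradient step on ||M k - v||^2

rng = np.random.default_rng(0)
k = rng.normal(size=4)
k /= np.linalg.norm(k)                     # unit-norm key keeps the toy demo stable
v = rng.normal(size=4)
M = np.zeros((4, 4))                       # memory starts empty
for _ in range(300):
    M = memory_step(M, k, v)
print(np.allclose(M @ k, v, atol=1e-4))    # True: the association has been memorized
```

This is the sense in which such a module acts as persistent long-term memory alongside attention's accurate but context-limited short-term memory: it compresses past key-value associations into its weights rather than keeping them in the context window.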
Ziyu Wang reposted
Nathan Lambert @natolambert
Qwen released a 72B process reward model (PRM) on their recent math model. A good chance it's the best PRM openly available for reasoning research. We like Qwen.
5 replies · 42 reposts · 291 likes · 68.3K views
Tatiana Tsiguleva @ciguleva
Whoever has access to Veo 2, Sora, Runway, Pika, Haiper, Luma, Kling… you name it. Could you please animate this guy? Let's compare image-to-video models.
* I don't have access to Veo 2 (waitlist)
51 replies · 41 reposts · 795 likes · 98.7K views
Tatiana Tsiguleva @ciguleva
Whoever has access to Sora, Runway, Pika, Haiper, Luma, Kling… you name it. Could you please animate this eye? Let's compare image-to-video models.
164 replies · 1.1K reposts · 19.4K likes · 1.3M views
Tatiana Tsiguleva @ciguleva
Whoever has access to Sora, Runway, Pika, Haiper, Luma, Kling… you name it. Could you please animate this guy? Let's compare image-to-video models.
293 replies · 604 reposts · 11.4K likes · 2M views
Ziyu Wang reposted
Haiper AI @HaiperGenAI
🚀 Haiper 2.5: Enhanced Mode is here. Take control like never before with Keyframe Conditioning Timeline, letting you customize every frame to perfection. Sharper. Smoother. Simply revolutionary. Catch us at haiper.ai and take your creativity to the next level. 🌟 #AI #EnhancedMode #Haiper2_5
14 replies · 44 reposts · 258 likes · 21.4K views
Ziyu Wang reposted
ρŁ𝐀𝔰Mʘ @plasm0
Don't sleep on @HaiperGenAI They just released their v2.5 model and it passes the brain worm test - both in t2v prompt adherence and not blocking what I want visualized. Plus the resolution is quite good on initial tests. Now go eat some turkey 🦃
5 replies · 6 reposts · 50 likes · 2.9K views
Ziyu Wang reposted
Haiper AI @HaiperGenAI
🚀 Introducing Haiper 2.0: Text-to-Image Like Never Before! 🚀 Unleash your creativity with sharper, more realistic visuals at lightning speed. Whether you’re a creator or a brand, Haiper 2.0’s Text-to-Image feature makes transforming ideas into images effortless. Ready to see the magic? 🪄
8 replies · 24 reposts · 142 likes · 18.4K views
Ziyu Wang reposted
Min Choi @minchoi
11. Haiper 2.0 drops. 2.0 brings sharper movements, stunning visuals, and dynamic templates.
1 reply · 5 reposts · 68 likes · 6.2K views
Ziyu Wang reposted
Haiper AI @HaiperGenAI
Whaaa, Haiper.ai is now more than just #AIvideo?! YUP. Check out these 8 wild AI text-to-images, then try it out for free at haiper.ai! #AIimages
London, England 🇬🇧 · 6 replies · 3 reposts · 24 likes · 3K views