Ziyu Wang

418 posts

Ziyu Wang

@ziyuwang

Machine learning researcher. Co-founder / CTO of https://t.co/h85fQdqLrr

Joined May 2009
424 Following · 1.5K Followers
Ziyu Wang reposted
Sander Dieleman @sedielem
Several new methods to shape the latent distributions of autoencoders have popped up recently. They are often compared against the traditional VAE setup, where a KL penalty encourages the latents to be Gaussian. 🧵👇 (1/10)
4 replies · 29 reposts · 226 likes · 19.6K views
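The "KL penalty" the thread refers to is the closed-form KL divergence between the encoder's diagonal-Gaussian posterior and a standard normal prior, added to the VAE reconstruction loss. A minimal numpy sketch (function name and array shapes are my own; illustrative only):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL divergence KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    summed over latent dimensions, averaged over the batch.
    Closed form: -0.5 * sum(1 + logvar - mu^2 - exp(logvar))."""
    return float(np.mean(-0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)))

# When the posterior already matches N(0, I), the penalty vanishes.
mu = np.zeros((4, 8))       # batch of 4, latent dim 8
logvar = np.zeros((4, 8))   # log-variance 0 => unit variance
print(gaussian_kl(mu, logvar))  # 0.0
```

Minimizing this term is what pulls the aggregate latent distribution toward a Gaussian in the traditional VAE setup the thread compares against.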
Ziyu Wang reposted
Nando de Freitas @NandoDF
Hello World. MAI image is our first image generation model, following MAI voice and MAI text. Congrats team! Oh, and if you are a strong engineer who wants to make this model climb to Number 1, please send your CV to JoinAITeam@microsoft.com
Arena.ai @arena

🚨New model drop into the Top 10! 🖼️ @MicrosoftAI just entered the Image Arena with MAI-Image-1. Community votes are already rolling in, and MAI-Image-1 has broken into the Top 10. It’s currently ranked #9, tied with Seedream 3! MAI-Image-1 is now live in Direct Chat for early access on LMArena.

7 replies · 6 reposts · 84 likes · 25.8K views
Ziyu Wang reposted
Nathan Lambert @natolambert
Stoked to get to talk to @lexfridman + my homie @dylan522p for 5+ hours to try and get to the bottom of what is actually happening in AI right now. DeepSeek R1 & V3, China v US, open vs closed, decreasing hype, datacenters, everything in between... 🚀 what a fun whirlwind week
Lex Fridman @lexfridman

Here's my 5-hour conversation with @dylan522p and @natolambert on DeepSeek, China, OpenAI, NVIDIA, xAI, Google, Anthropic, Meta, Microsoft, TSMC, Stargate, megacluster buildouts, RL, reasoning, and a lot of other topics at the cutting edge of AI. This was a mind-blowing, super-technical, and fun conversation. Yes, we discuss r1 and o3-mini, but more importantly we look into the future of technology, geopolitics, and humanity in a world that stands on the precipice of a global AI revolution. The first 4 hours are here on X (4 hours is the current limit), and the full 5 hours are up everywhere else. Links in comment.

Timestamps:
0:00 - Introduction
3:33 - DeepSeek-R1 and DeepSeek-V3
25:07 - Low cost of training
51:25 - DeepSeek compute cluster
58:57 - Export controls on GPUs to China
1:09:16 - AGI timeline
1:18:41 - China's manufacturing capacity
1:26:36 - Cold war with China
1:31:05 - TSMC and Taiwan
1:54:44 - Best GPUs for AI
2:09:36 - Why DeepSeek is so cheap
2:22:55 - Espionage
2:31:57 - Censorship
2:44:52 - Andrej Karpathy and magic of RL
2:55:23 - OpenAI o3-mini vs DeepSeek r1
3:14:31 - NVIDIA
3:18:58 - GPU smuggling
3:25:36 - DeepSeek training on OpenAI data
3:36:04 - AI megaclusters
4:11:26 - Who wins the race to AGI?
4:21:39 - AI agents
4:30:21 - Programming and AI
4:37:49 - Open source
4:47:01 - Stargate
4:54:30 - Future of AI

63 replies · 59 reposts · 642 likes · 89.5K views
Ziyu Wang reposted
Jiayi Pan @jiayi_pirate
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the Aha moment yourself for < $30. Code: github.com/Jiayi-Pan/Tiny… Here's what we learned 🧵
192 replies · 1.2K reposts · 6.3K likes · 1.7M views
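The key ingredient in this kind of R1-Zero-style RL is a rule-based outcome reward: no learned reward model, just a check that the model's final equation is valid. The actual reward code is in the linked repo; below is an illustrative stand-in I wrote (the `<answer>` tag format, function name, and scoring are my assumptions, and `eval` is only acceptable here because the regex restricts input to arithmetic in a toy setting):

```python
import re

def countdown_reward(completion, numbers, target):
    """Toy outcome reward for the Countdown game: 1.0 if the completion's
    <answer>...</answer> equation uses each provided number exactly once
    and evaluates to the target, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not match:
        return 0.0                       # no parseable answer
    expr = match.group(1).strip()
    if not re.fullmatch(r"[\d+\-*/(). ]+", expr):
        return 0.0                       # only arithmetic characters allowed
    used = [int(n) for n in re.findall(r"\d+", expr)]
    if sorted(used) != sorted(numbers):
        return 0.0                       # must use the given numbers exactly once each
    try:
        return 1.0 if abs(eval(expr) - target) < 1e-6 else 0.0
    except (SyntaxError, ZeroDivisionError):
        return 0.0

print(countdown_reward("<answer>(6-4)*50</answer>", [4, 6, 50], 100))  # 1.0
```

A sparse, verifiable reward like this is all the policy gets; the self-verification and search behaviors the tweet mentions emerge from RL against it rather than being supervised directly.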
Ziyu Wang reposted
Alex Dimakis @AlexGDimakis
Most AI researchers I talk to have been a bit shocked by DeepSeek-R1 and its performance. My preliminary understanding nuggets:
1. Simple post-training recipe called GRPO: start with a good model and reward for correctness and style outcomes. No PRM, no MCTS, no fancy reward models. Basically checks if the answer is correct. 😅
2. Small models can reason very, very well with correct distillation post-training. They released a 1.5B model (!) that is better than Claude and Llama 405B on AIME24. Also, their distilled 7B model seems better than o1-preview. 🤓
3. The datasets used are not released, if I understand correctly. 🫤
4. DeepSeek seems to be the best at executing OpenAI's original mission right now. We need to catch up.
24 replies · 127 reposts · 1.4K likes · 181.8K views
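The reason GRPO needs no value model or PRM is that it estimates advantages by comparing each sampled completion against the other completions in its own group, normalizing their outcome rewards. A minimal numpy sketch of that group-relative step (function name and epsilon are mine; the surrounding PPO-style policy update is omitted):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: for one prompt, sample a group of
    completions, score each with a simple outcome reward, and normalize
    within the group. No learned critic needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon guards all-equal groups

# Four sampled answers to one prompt; only the last two were correct.
adv = grpo_advantages([0.0, 0.0, 1.0, 1.0])
print(adv)  # correct answers get positive advantage, wrong ones negative
```

Each completion's tokens are then reinforced in proportion to its group-relative advantage, which is why a bare correct/incorrect check suffices as the reward signal.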
Ziyu Wang reposted
Dr Singularity @Dr_Singularity
This can be big. Google unveils the successor to the Transformer architecture.

"We present a new neural long term memory module that learns to memorize historical context and helps an attention to attend to the current context while utilizing long past information. We show that this neural memory has the advantage of a fast parallelizable training while maintaining a fast inference."

"From a memory perspective, we argue that attention due to its limited context but accurate dependency modeling performs as a short term memory, while neural memory due to its ability to memorize the data, acts as a long-term, more persistent, memory. Based on these two modules, we introduce a new family of architectures, called Titans, and present three variants to address how one can effectively incorporate memory into this architecture."

"Our experimental results on language modeling, common sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models."

"They further can effectively scale to larger than 2M context window size with higher accuracy in needle in haystack tasks compared to baselines."
38 replies · 353 reposts · 2.3K likes · 145.9K views
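The quoted passage describes a memory that "learns to memorize" at inference time. A heavily simplified toy sketch of that idea in numpy, under my own assumptions (a single linear map as the memory, a unit-norm key, plain gradient descent with no momentum or forgetting gate — the real Titans module is a deeper network with more machinery):

```python
import numpy as np

def memory_step(M, k, v, lr=0.1):
    """One online update of a toy linear long-term memory: train M at
    inference time to associate key k with value v. The gradient of the
    reconstruction error ||M k - v||^2 plays the role of a 'surprise' signal."""
    surprise = M @ k - v                   # how badly the memory predicts v from k
    return M - lr * np.outer(surprise, k)  # gradient step on ||M k - v||^2

rng = np.random.default_rng(0)
k = rng.normal(size=4)
k /= np.linalg.norm(k)                     # unit-norm key keeps the toy demo stable
v = rng.normal(size=4)
M = np.zeros((4, 4))                       # memory starts empty
for _ in range(300):
    M = memory_step(M, k, v)
print(np.allclose(M @ k, v, atol=1e-4))    # True: the association has been memorized
```

This is the sense in which such a module acts as persistent long-term memory alongside attention's accurate but context-limited short-term memory: it compresses past key-value associations into its weights rather than keeping them in the context window.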
Ziyu Wang reposted
Nathan Lambert @natolambert
Qwen released a 72B process reward model (PRM) on their recent math model. A good chance it's the best PRM openly available for reasoning research. We like Qwen.
5 replies · 42 reposts · 291 likes · 68.3K views
Tatiana Tsiguleva @ciguleva
Whoever has access to Veo 2, Sora, Runway, Pika, Haiper, Luma, Kling… you name it. Could you please animate this guy? Let's compare image-to-video models.
* I don't have access to Veo 2 (waitlist)
51 replies · 41 reposts · 795 likes · 98.7K views
Tatiana Tsiguleva @ciguleva
Whoever has access to Sora, Runway, Pika, Haiper, Luma, Kling… you name it. Could you please animate this eye? Let's compare image-to-video models.
164 replies · 1.1K reposts · 19.4K likes · 1.3M views
Tatiana Tsiguleva @ciguleva
Whoever has access to Sora, Runway, Pika, Haiper, Luma, Kling… you name it. Could you please animate this guy? Let's compare image-to-video models.
293 replies · 604 reposts · 11.4K likes · 2M views
Ziyu Wang reposted
Haiper AI @HaiperGenAI
🚀 Haiper 2.5: Enhanced Mode is here. Take control like never before with Keyframe Conditioning Timeline, letting you customize every frame to perfection. Sharper. Smoother. Simply revolutionary. Catch us at haiper.ai and take your creativity to the next level. 🌟 #AI #EnhancedMode #Haiper2_5
14 replies · 44 reposts · 258 likes · 21.4K views
Ziyu Wang reposted
ρŁ𝐀𝔰Mʘ @plasm0
Don't sleep on @HaiperGenAI They just released their v2.5 model and it passes the brain worm test - both in t2v prompt adherence and not blocking what I want visualized. Plus the resolution is quite good on initial tests. Now go eat some turkey 🦃
5 replies · 6 reposts · 50 likes · 2.9K views
Ziyu Wang reposted
Haiper AI @HaiperGenAI
🚀 Introducing Haiper 2.0: Text-to-Image Like Never Before! 🚀 Unleash your creativity with sharper, more realistic visuals at lightning speed. Whether you’re a creator or a brand, Haiper 2.0’s Text-to-Image feature makes transforming ideas into images effortless. Ready to see the magic? 🪄
8 replies · 24 reposts · 142 likes · 18.4K views
Ziyu Wang reposted
Min Choi @minchoi
11. Haiper 2.0 drops. 2.0 brings sharper movements, stunning visuals, and dynamic templates.
1 reply · 5 reposts · 68 likes · 6.2K views
Ziyu Wang reposted
Haiper AI @HaiperGenAI
Whaaa, Haiper.ai is now more than just #AIvideo?! YUP. Check out these 8 wild AI text-to-images, then try it out for free at haiper.ai! #AIimages
London, England 🇬🇧 · 6 replies · 3 reposts · 24 likes · 3K views