Michael Xu
7 posts

Michael Xu
@MichaelXu25
NLP & Computational Linguistics Researcher
Hangzhou, China · Joined February 2019
16 Following · 2 Followers

Not the best, but a better model is in line.
You can have a taste.
This time, more on user experience, less on numbers.
blog.google/innovation-and…

Last Wednesday we were at the Parlar Vakfı award ceremony in Ankara. Together with Emre, Erdal, Ayşegül, and many other dear friends, we received the incentive award given to young scientists.
On Friday we met with the Vehbi Koç Vakfı scholarship recipients. We met very bright students from fields ranging from seafaring to nursing, and talked about AI.
The students asked how I manage to be so energetic. In truth I was dead tired, but when you see young people who have dreams and are running fast toward them, you find that energy. You want to listen to the difficulties they face and share in them.
Because truly, “all our hope is in the youth.” 👌🏾💯

Translated from Turkish

@LBunzel @StefanoErmon @StartupGrind Fascinating keynote. The framing of efficiency, rather than raw capability alone, as a central constraint for high-volume agentic workloads is very compelling. Was the talk recorded? I would be very interested in watching the full keynote.

"Beyond autoregressive: why diffusion is the future of language models"
@StefanoErmon's keynote at @startupgrind yesterday. Fully packed Fox Theatre.
Mercury 2 is hitting >1,000 tok/sec on standard GPUs at a fraction of the cost, comparable quality to frontier speed-optimized models. Diffusion. Parallel token generation.
His closing line: the question isn't which model is smartest, it's which model is most efficient, without sacrificing quality, on the highest-volume tasks.
When agents make 50 LLM calls per task, latency is the product. @_inception_ai


@Goodhumour2 @deepseek_ai Time flies, doesn’t it? 2019 feels like yesterday, but technology moves at lightning speed.

@MichaelXu25 @deepseek_ai A technology dating back to 2019. What a timeline we live in.

🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access
Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.
⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster
⚡ 40+ GiB/s peak throughput per client node for KVCache lookup
🧬 Disaggregated architecture with strong consistency semantics
✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1
📥 3FS → github.com/deepseek-ai/3FS
⛲ Smallpond - data processing framework on 3FS → github.com/deepseek-ai/sm…

@Xianbao_QIAN Sorry, I just got online. Where did this picture come from?🤣


