Tai Nguyen 🇺🇲

4.1K posts

@TaiNguyen34

PhD candidate in AI/ML, specializing in Multimedia Information Forensics and Security.

Pennsylvania, USA · Joined September 2020
667 Following · 384 Followers
Tai Nguyen 🇺🇲 reposted
Yuchen Jin @Yuchenj_UW
OpenAI just dropped a training challenge: Train a <16MB language model in 10 minutes on 8×H100s and minimize held-out loss on a fixed FineWeb dataset. Basically NanoGPT Speedrun. They’re sponsoring $1M in compute. I can summon my autoresearch army to win it… if I have time.
[image]
53 replies · 75 reposts · 1.3K likes · 110K views
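For scale: a 16 MB checkpoint at bf16 (2 bytes per parameter) leaves room for roughly 8.4M parameters. A back-of-envelope sizing sketch, with a purely illustrative config, not the contest's actual rules or byte-counting scheme:

```python
# Back-of-envelope parameter budget for the "<16 MB model" constraint.
# Assumes bf16/fp16 weights (2 bytes per parameter); the real contest
# may count bytes differently.
BUDGET_BYTES = 16 * 1024 * 1024
BYTES_PER_PARAM = 2
max_params = BUDGET_BYTES // BYTES_PER_PARAM  # ~8.4M parameters

def gpt_param_count(n_layer: int, d_model: int, vocab_size: int, d_ff_mult: int = 4) -> int:
    """Rough GPT-style count: tied embedding + attention + MLP per block."""
    embed = vocab_size * d_model                       # token embedding, tied with LM head
    per_block = (4 * d_model * d_model                 # Q, K, V, O projections
                 + 2 * d_model * d_ff_mult * d_model)  # MLP up/down projections
    return embed + n_layer * per_block

cfg = dict(n_layer=6, d_model=256, vocab_size=8_000)  # illustrative config
print(gpt_param_count(**cfg), "params vs budget of", max_params)
```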
Tai Nguyen 🇺🇲 reposted
Andrej Karpathy @karpathy
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer @NVIDIAAIDev

🙌 Andrej Karpathy's lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you'll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

531 replies · 838 reposts · 19.1K likes · 1M views
Tai Nguyen 🇺🇲 reposted
Josh @JMRLudan
this is what reading RL papers feels like
[image]
7 replies · 85 reposts · 1.1K likes · 35.5K views
Tai Nguyen 🇺🇲 reposted
alphaXiv @askalphaxiv
RL is no longer needed?

"Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights"

This paper argues that large pretrained models don't sit at a single optimal set of weights but inside a dense "thicket" of nearby task-specific experts. So once pretraining is strong enough, randomly sampling small weight perturbations often yields specialists that outperform the base model on different tasks, and simply selecting and ensembling these guesses (RandOpt) can rival standard post-training methods.

This suggests that much of what post-training does is just selecting useful behaviors already latent around the pretrained weights rather than learning entirely new ones.
[image]
13 replies · 94 reposts · 733 likes · 51.4K views
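A toy rendering of the sample-and-select idea as the summary describes it. This is not the paper's RandOpt code: `task_eval` is a hypothetical scoring callback and `sigma` a made-up perturbation scale.

```python
import copy
import torch

def randopt_candidates(base_model, task_eval, n_samples=32, sigma=1e-3):
    """Sample small Gaussian perturbations around the pretrained weights
    and keep the ones that beat the base model on a task metric."""
    base_score = task_eval(base_model)
    winners = []
    for _ in range(n_samples):
        cand = copy.deepcopy(base_model)
        with torch.no_grad():
            for p in cand.parameters():
                p.add_(sigma * torch.randn_like(p))  # tiny random nudge
        score = task_eval(cand)
        if score > base_score:                       # the "selecting" step
            winners.append((score, cand))
    # Best-first; ensembling the top few is the "ensembling" step.
    return sorted(winners, key=lambda t: t[0], reverse=True)
```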
Tai Nguyen 🇺🇲 reposted
Watcher.Guru @WatcherGuru
JUST IN: 🇺🇸 $2,000,000,000,000 wiped out from the US stock market in the past month.
[image]
1K replies · 2.6K reposts · 15.5K likes · 1.6M views
Tai Nguyen 🇺🇲 reposted
Hugging Face @huggingface
🪣 We just shipped Storage Buckets: S3-like mutable storage, cheaper & faster.

Git falls short for everything on the high-throughput side of AI (checkpoints, processed data, agent traces, logs, etc.). Buckets fixes that: fast writes, overwrites, directory sync 💨

All powered by Xet dedup, so successive checkpoints skip the bytes that already exist ➡️
[image]
19 replies · 69 reposts · 394 likes · 66.6K views
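The dedup claim is easy to picture with a content-addressed chunk store: hash each chunk, send only hashes the store hasn't seen. A minimal sketch of that idea, using fixed-size chunks and an in-memory dict for illustration (actual Xet uses content-defined chunking against a remote store):

```python
import hashlib

CHUNK = 1 << 20                 # 1 MiB fixed-size chunks, for illustration
store: dict[str, bytes] = {}    # stand-in for the remote content-addressed store

def dedup_upload(path: str) -> tuple[int, int]:
    """Upload a file chunk by chunk, skipping chunks the store already
    holds. Returns (chunks_sent, chunks_skipped)."""
    sent = skipped = 0
    with open(path, "rb") as f:
        while block := f.read(CHUNK):
            key = hashlib.sha256(block).hexdigest()
            if key in store:
                skipped += 1        # bytes already exist: nothing to send
            else:
                store[key] = block  # new bytes: send once
                sent += 1
    return sent, skipped
```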
Tai Nguyen 🇺🇲 reposted
Caitlin Kalinowski @kalinowski007
I resigned from OpenAI. I care deeply about the Robotics team and the work we built together. This wasn’t an easy call. AI has an important role in national security. But surveillance of Americans without judicial oversight and lethal autonomy without human authorization are lines that deserved more deliberation than they got. This was about principle, not people. I have deep respect for Sam and the team, and I’m proud of what we built together.
1.9K replies · 13.1K reposts · 59.2K likes · 7.7M views
Tai Nguyen 🇺🇲 reposted
Tenobrus @tenobrus
Donald Knuth is vibemathing now. real tough day for the stochastic-parrot crew.
[image]
79 replies · 436 reposts · 3.4K likes · 516.4K views
Tai Nguyen 🇺🇲 reposted
Aman Chadha @i_amanchadha
🛠️ Primers on Reinforcement Learning (RL): 𝐅𝐮𝐧𝐝𝐚𝐦𝐞𝐧𝐭𝐚𝐥𝐬 & 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐑𝐋 • rl.aman.ai and agentic-rl.aman.ai

➡️ 𝐑𝐋 𝐅𝐮𝐧𝐝𝐚𝐦𝐞𝐧𝐭𝐚𝐥𝐬
- RL is a framework for sequential decision-making where an agent learns to maximize cumulative reward through interaction with an environment, formalized via Markov Decision Processes, value functions, & policy optimization principles.
- This primer presents a unified, theory-to-systems view of RL, covering classical foundations (DP, Monte Carlo, TD), value-based, policy-based, actor–critic, model-based vs. model-free paradigms, on-policy vs. off-policy learning, Deep RL algorithms (DQN, PPO, SAC, etc.), & policy optimization (RLHF) for LLMs.

🔹 RL Foundations
• Core Components: Agent, Environment, State, Action, Reward, Policy, Return
• Bellman Equation
• Markov Decision Processes (MDPs)
🔹 Offline vs. Online RL
• Offline (Batch) RL
• Online RL
• Hybrid Strategies
🔹 Types of RL
• Value-Based
• Policy-Based
• Actor-Critic
• Model-Based
• Model-Free
🔹 On-Policy vs. Off-Policy Learning
🔹 Deep RL
• Deep Value-Based Methods
- Deep Q-Network (DQN)
- Double DQN
• Deep Policy-Based Methods
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
• Deep Actor–Critic Methods
- A3C & A2C
- Deep Deterministic Policy Gradient (DDPG)
• Deep Model-Based Methods
🔹 Hybrid & Meta RL
🔹 Tools & Frameworks
• Simulation Environments
- OpenAI Gym
- DeepMind Control Suite
• RL Libraries
- Stable Baselines3
- RLlib
- TF-Agents
🔹 Policy Optimization for LLMs

---

➡️ 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐑𝐋
- Agentic RL provides a decision-theoretic framework for training language-model-based agents to act in interactive environments, emphasizing learning *when* to act, *which* action or tool to choose, & *how* to execute it correctly through multi-step trajectories optimized for long-term return.
- This primer develops an end-to-end recipe for agentic RL, covering imitation learning warm-starts, structured action spaces, PPO/GRPO optimization, curriculum learning, & evaluation strategies.

🔹 Background: When SFT Fails (& Why RL Is Required) for Tool-Calling Agents
• What is Imitation Learning & Why SFT is Used Before RL?
🔹 Reward Components
• Tool Call (Deciding "When" a Tool Should be Called)
• Tool Selection (Choosing "Which" Tool to Call)
• Tool-Syntax Correctness (Deciding "How" to Call a Tool)
• Task Success
🔹 Process vs. Outcome Rewards
🔹 RL Optimization Pipeline: PPO, DPO, & GRPO
• RL Training Flow
• Losses & Update Rules
🔹 Curriculum Design for RL
🔹 RL Environments in Modern Agents: Single-Turn, Multi-Turn
🔹 RL for Computer-Use Agents
🔹 Agentic RL via Policy Optimization: Milestone-Based Rewards
🔹 Reward Modeling for Complex Agent Environments
🔹 Evaluation, Safety, & Human-in-the-Loop (HITL) Oversight
🔹 Tool-Integrated Reasoning (TIR)

Primer written in collaboration with @VinijaJain.
#ArtificialIntelligence #GenAI #LLM
[2 images]
4 replies · 51 reposts · 342 likes · 18.7K views
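To ground the "RL Foundations" bullets above, here is a minimal tabular Q-learning loop built around the Bellman update; it assumes a Gym-classic `reset()`/`step()` environment, one of the simulation interfaces the primer lists.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning on a Gym-classic reset()/step() environment."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done, _ = env.step(a)
            # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            target = r + gamma * Q[s2].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```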
Tai Nguyen 🇺🇲 reposted
Ronak Malde @rronak_
My favorite paper of 2026 so far 🔥

They took On-Policy Distillation (i.e., the Thinking Machines blog post), but then showed that the policy can be both the teacher and the student model. The idea is to condition the teacher on a golden trajectory, and then train on the conditioned logprobs of the same model.

The crazy part is, you can literally condition the teacher on anything!! This opens up an entire Pandora's box bridging prompt optimization/ICL + weight optimization that I'm very excited about for continual learning.

Authors: @IdanShenfeld @MehulDamani2 Jonas Hübotter @pulkitology
[image]
16 replies · 43 reposts · 424 likes · 34.2K views
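A schematic of the trick as described in the tweet, not the paper's actual training code: one model, two passes. The teacher pass sees the golden trajectory in context, the student pass does not, and the student is pulled toward the teacher's conditioned distribution. An HF-style causal LM with a `.logits` output is assumed.

```python
import torch
import torch.nn.functional as F

def self_distill_step(model, prompt_ids, golden_ids, optimizer):
    """One schematic step: the same model acts as teacher (conditioned on
    a golden trajectory) and as student (unconditioned)."""
    with torch.no_grad():  # teacher pass: golden trajectory prepended to context
        teacher_in = torch.cat([golden_ids, prompt_ids], dim=1)
        teacher_logits = model(teacher_in).logits[:, -prompt_ids.size(1):]
    student_logits = model(prompt_ids).logits     # student pass: prompt only
    # Pull the student toward the teacher's conditioned token distribution.
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```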
Tai Nguyen 🇺🇲 reposted
Kimbo @kimbochen
Some Chinese lab: releases new model
@vllm_project: Day zero support
@eliebakouch: 200-tweet thread explaining everything about the model, starting from the origins of the lab
@awnihannun: Retweets about MLX quantizing the model to 0.5 bit running at 2000 tok/s
@danielhanchen: Unsloth supports doing RL long context training in 0.025 bit on a potato
@Grad62304977: Shows up when you say RL three times in the mirror. Cites all the papers that the model used for RL training
@xeophon: Laments about a bug in eval harness
AI influencers: Hypes it up as if it's AGI
@teortaxesTex: Berates everyone with nuanced takes
24 replies · 40 reposts · 641 likes · 45.4K views
Tai Nguyen 🇺🇲 reposted
Zhijian Liu @zhijianliu_
Holiday cooking finally ready to serve! 🥳

Introducing DFlash — speculative decoding with block diffusion.
🚀 6.2× lossless speedup on Qwen3-8B
⚡ 2.5× faster than EAGLE-3

Diffusion vs AR doesn't have to be a fight. At today's stage:
• dLLMs = fast, highly parallel, but lossy
• AR LLMs = accurate, sequential, but slow

DFlash = diffusion drafts, AR verifies.
56 replies · 223 reposts · 1.7K likes · 167.9K views
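For context, here is the generic draft-and-verify skeleton that DFlash plugs a block-diffusion drafter into. This is the standard greedy-acceptance variant, not DFlash's actual algorithm; `draft_fn` and `verify_fn` are hypothetical callables.

```python
import torch

def speculative_step(draft_fn, verify_fn, ctx, k=8):
    """One round of draft-and-verify with greedy acceptance: draft k
    tokens cheaply, score them with the AR model in a single forward
    pass, keep the longest agreeing prefix."""
    draft = draft_fn(ctx, k)                      # (k,) proposed token ids
    full = torch.cat([ctx, draft])
    target = verify_fn(full)                      # AR argmax prediction per position
    accepted = []
    for i, tok in enumerate(draft.tolist()):
        # prediction for position len(ctx)+i lives at index len(ctx)+i-1
        if target[len(ctx) + i - 1].item() != tok:
            break
        accepted.append(tok)
    return accepted                               # caller appends these to ctx
```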
Tai Nguyen 🇺🇲 reposted
Elliot Arledge @elliotarledge
I made a minimalistic implementation of Minecraft that runs at ~50M steps/sec at batch size 32K
[image]
15 replies · 31 reposts · 304 likes · 22.3K views
Tai Nguyen 🇺🇲 reposted
Pirat_Nation 🔴 @Pirat_Nation
Newsis: Nvidia and AMD to significantly increase GPU prices starting next month. Describes RTX 5090 increasing from $2000 to $5000.

"Both companies are reportedly planning to continue raising GPU prices every month going forward. It's highly likely that the price increases will extend across their entire product lineup, encompassing not only consumer GPUs but also GPUs for AI data centers and servers."
[2 images]
2.4K replies · 3.7K reposts · 41.4K likes · 15.4M views
Tai Nguyen 🇺🇲 reposted
Lei Yang @diyerxx
Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment. So here's what happened.

Earlier this month, a colleague shared an Apple paper on arXiv with me — it was also under review for ICLR 2026. The benchmark they proposed was perfectly aligned with a project we're working on. I got excited after reading it.

I immediately stopped my current tasks and started adapting our model to their benchmark. Pulled a whole weekend crunch session to finish the integration… only to find our model scoring absurdly low.

I was really frustrated. I spent days debugging, checking everything — maybe I used it wrong, maybe there was a hidden bug. During this process, I actually found a critical bug in their official code:
* When querying the VLM, it only passed in the image path string, not the image content itself.

The most ridiculous part? After I fixed their bug, the model's scores got even lower! The results were so counterintuitive that I felt forced to do deeper validation. After multiple checks, the conclusion held: fixing the bug actually made the scores worse.

At this point I decided to manually inspect the data. I sampled the first 20 questions our model got wrong, and I was shocked:
* 6 out of 20 had clear GT errors.
* The pattern suggested the "ground truth" was model-generated with extremely poor quality control, leading to tons of hallucinations.
* Based on this quick sample, the GT error rate could be as high as 30%.

I reported the data quality issue in a GitHub issue. After 6 days, the authors replied briefly and then immediately closed the issue. That annoyed me — I'd already wasted a ton of time, and I didn't want others in the community to fall into the same trap — so I pushed back. Only then did they reopen the GitHub issue.

Then I went back and checked the examples displayed in the paper itself. Even there, I found at least three clear GT errors. It's hard to believe the authors were unaware of how bad the dataset quality was, especially when the paper claims all samples were reviewed by annotators. Yet even the examples printed in the paper contain blatant hallucinations and mistakes.

When the ICLR reviews came out, I checked the five reviews for this paper. Not a single reviewer noticed the GT quality issues or the hallucinations in the paper's examples. So I started preparing a more detailed GT error analysis and wrote a Public Comment on OpenReview to inform the reviewers and the community about the data quality problems.

The next day, the authors withdrew the paper and took down the GitHub repo.

Fortunately, ICLR is an open conference with Public Comments. If this had been a closed-review venue, this kind of shoddy work would have been much harder to expose.

So here's a small call to the community: for any paper involving model-assisted dataset construction, reviewers should spend a few minutes checking a few samples manually. We need to prevent irresponsible work from slipping through and misleading everyone.

Looking back, I should have suspected the dataset earlier based on two red flags:
* The paper's experiments claimed that GPT-5 had been surpassed by a bunch of small open-source models.
* The original code, with a ridiculous bug, produced higher scores than the bug-fixed version.

But because it was a paper from Big Tech, I subconsciously trusted the integrity and quality, which prevented me from spotting the problem sooner.

This whole experience drained a lot of my time, energy, and emotion — especially because accusing others of bad data requires extra caution. I'm sharing this in hopes that the ML community remains vigilant and pushes back against this kind of sloppy, low-quality, and irresponsible behavior before it misleads people and wastes collective effort.

#ICLR #ICLR2026 #NeurIPS #CVPR #openreview #MachineLearning #LLM
[image]
54 replies · 214 reposts · 2.5K likes · 395.9K views
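The bug class described above is worth spelling out, since it is easy to write and fails silently. A hypothetical sketch of the buggy call and the fix (`client.complete` is a placeholder API, not the repo's real interface):

```python
import base64

def query_vlm_buggy(client, question: str, image_path: str):
    # The failure mode described above: the *path string* is embedded in
    # the prompt, so the VLM never sees the pixels and can only guess
    # from the filename.
    return client.complete(prompt=f"{question}\nImage: {image_path}")

def query_vlm_fixed(client, question: str, image_path: str):
    # The fix: read and encode the actual image bytes and attach them.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return client.complete(prompt=question, images=[image_b64])
```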
Tai Nguyen 🇺🇲 reposted
Tongyi Lab @Ali_TongyiLab
1/10 We are pleased to introduce Z-Image, an efficient 6-billion-parameter foundation model for image generation. Through systematic optimization, it proves that top-tier performance is achievable without relying on enormous model sizes, delivering strong results in photorealistic generation and bilingual text rendering that are comparable to leading commercial models.
[2 images]
135 replies · 427 reposts · 3.7K likes · 7.8M views
Tai Nguyen 🇺🇲 reposted
Qwen @Alibaba_Qwen
🚀 Qwen3-VL Tech report is now out on arXiv!

From pretraining to post-training, architecture to infra, data to evaluation — we've packed in the details for anyone building on vision-language models.

🔥 3 models >1M downloads in just over a month
🏆 Qwen3-VL-8B leads with 2M+ downloads
📚 Built on the shoulders of Qwen2.5-VL (2800+ citations in <10 months!)

Check out the paper for insights, baselines, and future directions. Let's keep pushing VLMs forward — together.

arxiv.org/pdf/2511.21631
[image]
48 replies · 286 reposts · 1.7K likes · 200K views