Tai Nguyen 🇺🇲

4.1K posts

@TaiNguyen34

PhD candidate in AI/ML, specializing in Multimedia Information Forensics and Security.

Pennsylvania, USA · Joined September 2020
667 Following · 384 Followers
Tai Nguyen 🇺🇲 reposted
Yuchen Jin @Yuchenj_UW
OpenAI just dropped a training challenge: Train a <16MB language model in 10 minutes on 8×H100s and minimize held-out loss on a fixed FineWeb dataset. Basically NanoGPT Speedrun. They’re sponsoring $1M in compute. I can summon my autoresearch army to win it… if I have time.
[image]
53 replies · 75 reposts · 1.3K likes · 110K views
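For scale: a 16 MB checkpoint at bf16 (2 bytes per parameter) leaves room for roughly 8.4M parameters. A back-of-envelope sizing sketch, with a purely illustrative config, not the contest's actual rules or byte-counting scheme:

```python
# Back-of-envelope parameter budget for the "<16 MB model" constraint.
# Assumes bf16/fp16 weights (2 bytes per parameter); the real contest
# may count bytes differently.
BUDGET_BYTES = 16 * 1024 * 1024
BYTES_PER_PARAM = 2
max_params = BUDGET_BYTES // BYTES_PER_PARAM  # ~8.4M parameters

def gpt_param_count(n_layer: int, d_model: int, vocab_size: int, d_ff_mult: int = 4) -> int:
    """Rough GPT-style count: tied embedding + attention + MLP per block."""
    embed = vocab_size * d_model                       # token embedding, tied with LM head
    per_block = (4 * d_model * d_model                 # Q, K, V, O projections
                 + 2 * d_model * d_ff_mult * d_model)  # MLP up/down projections
    return embed + n_layer * per_block

cfg = dict(n_layer=6, d_model=256, vocab_size=8_000)  # illustrative config
print(gpt_param_count(**cfg), "params vs budget of", max_params)
```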
Tai Nguyen 🇺🇲 reposted
Andrej Karpathy @karpathy
Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!
NVIDIA AI Developer @NVIDIAAIDev

🙌 Andrej Karpathy's lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you'll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

531 replies · 838 reposts · 19.1K likes · 1M views
Tai Nguyen 🇺🇲 reposted
Josh @JMRLudan
this is what reading RL papers feels like
[image]
7 replies · 85 reposts · 1.1K likes · 35.5K views
Tai Nguyen 🇺🇲 reposted
alphaXiv @askalphaxiv
RL is no longer needed?

"Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights"

This paper argues that large pretrained models don't sit at a single optimal set of weights but inside a dense "thicket" of nearby task-specific experts. So once pretraining is strong enough, randomly sampling small weight perturbations often yields specialists that outperform the base model on different tasks, and simply selecting and ensembling these guesses (RandOpt) can rival standard post-training methods.

This suggests that much of what post-training does is just selecting useful behaviors already latent around the pretrained weights rather than learning entirely new ones.
[image]
13 replies · 94 reposts · 733 likes · 51.4K views
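A toy rendering of the sample-and-select idea as the summary describes it. This is not the paper's RandOpt code: `task_eval` is a hypothetical scoring callback and `sigma` a made-up perturbation scale.

```python
import copy
import torch

def randopt_candidates(base_model, task_eval, n_samples=32, sigma=1e-3):
    """Sample small Gaussian perturbations around the pretrained weights
    and keep the ones that beat the base model on a task metric."""
    base_score = task_eval(base_model)
    winners = []
    for _ in range(n_samples):
        cand = copy.deepcopy(base_model)
        with torch.no_grad():
            for p in cand.parameters():
                p.add_(sigma * torch.randn_like(p))  # tiny random nudge
        score = task_eval(cand)
        if score > base_score:                       # the "selecting" step
            winners.append((score, cand))
    # Best-first; ensembling the top few is the "ensembling" step.
    return sorted(winners, key=lambda t: t[0], reverse=True)
```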
Tai Nguyen 🇺🇲 reposted
Watcher.Guru @WatcherGuru
JUST IN: 🇺🇸 $2,000,000,000,000 wiped out from the US stock market in the past month.
[image]
1K replies · 2.6K reposts · 15.5K likes · 1.6M views
Tai Nguyen 🇺🇲 reposted
Hugging Face @huggingface
🪣 We just shipped Storage Buckets: S3-like mutable storage, cheaper & faster.

Git falls short for everything on the high-throughput side of AI (checkpoints, processed data, agent traces, logs, etc.). Buckets fixes that: fast writes, overwrites, directory sync 💨

All powered by Xet dedup, so successive checkpoints skip the bytes that already exist ➡️
[image]
19 replies · 69 reposts · 394 likes · 66.6K views
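The dedup claim is easy to picture with a content-addressed chunk store: hash each chunk, send only hashes the store hasn't seen. A minimal sketch of that idea, using fixed-size chunks and an in-memory dict for illustration (actual Xet uses content-defined chunking against a remote store):

```python
import hashlib

CHUNK = 1 << 20                 # 1 MiB fixed-size chunks, for illustration
store: dict[str, bytes] = {}    # stand-in for the remote content-addressed store

def dedup_upload(path: str) -> tuple[int, int]:
    """Upload a file chunk by chunk, skipping chunks the store already
    holds. Returns (chunks_sent, chunks_skipped)."""
    sent = skipped = 0
    with open(path, "rb") as f:
        while block := f.read(CHUNK):
            key = hashlib.sha256(block).hexdigest()
            if key in store:
                skipped += 1        # bytes already exist: nothing to send
            else:
                store[key] = block  # new bytes: send once
                sent += 1
    return sent, skipped
```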
Tai Nguyen 🇺🇲 reposted
Caitlin Kalinowski @kalinowski007
I resigned from OpenAI. I care deeply about the Robotics team and the work we built together. This wasn’t an easy call. AI has an important role in national security. But surveillance of Americans without judicial oversight and lethal autonomy without human authorization are lines that deserved more deliberation than they got. This was about principle, not people. I have deep respect for Sam and the team, and I’m proud of what we built together.
1.9K replies · 13.1K reposts · 59.2K likes · 7.7M views
Tai Nguyen 🇺🇲 reposted
Tenobrus @tenobrus
Donald Knuth is vibemathing now. real tough day for the stochastic-parrot crew.
[image]
79 replies · 436 reposts · 3.4K likes · 516.4K views
Tai Nguyen 🇺🇲 reposted
Aman Chadha @i_amanchadha
🛠️ Primers on Reinforcement Learning (RL): 𝐅𝐮𝐧𝐝𝐚𝐦𝐞𝐧𝐭𝐚𝐥𝐬 & 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐑𝐋 • rl.aman.ai and agentic-rl.aman.ai

➡️ 𝐑𝐋 𝐅𝐮𝐧𝐝𝐚𝐦𝐞𝐧𝐭𝐚𝐥𝐬
- RL is a framework for sequential decision-making where an agent learns to maximize cumulative reward through interaction with an environment, formalized via Markov Decision Processes, value functions, & policy optimization principles.
- This primer presents a unified, theory-to-systems view of RL, covering classical foundations (DP, Monte Carlo, TD), value-based, policy-based, actor–critic, model-based vs. model-free paradigms, on-policy vs. off-policy learning, Deep RL algorithms (DQN, PPO, SAC, etc.), & policy optimization (RLHF) for LLMs.

🔹 RL Foundations
• Core Components: Agent, Environment, State, Action, Reward, Policy, Return
• Bellman Equation
• Markov Decision Processes (MDPs)
🔹 Offline vs. Online RL
• Offline (Batch) RL
• Online RL
• Hybrid Strategies
🔹 Types of RL
• Value-Based
• Policy-Based
• Actor-Critic
• Model-Based
• Model-Free
🔹 On-Policy vs. Off-Policy Learning
🔹 Deep RL
• Deep Value-Based Methods
- Deep Q-Network (DQN)
- Double DQN
• Deep Policy-Based Methods
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
• Deep Actor–Critic Methods
- A3C & A2C
- Deep Deterministic Policy Gradient (DDPG)
• Deep Model-Based Methods
🔹 Hybrid & Meta RL
🔹 Tools & Frameworks
• Simulation Environments
- OpenAI Gym
- DeepMind Control Suite
• RL Libraries
- Stable Baselines3
- RLlib
- TF-Agents
🔹 Policy Optimization for LLMs

---

➡️ 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐑𝐋
- Agentic RL provides a decision-theoretic framework for training language-model-based agents to act in interactive environments, emphasizing learning *when* to act, *which* action or tool to choose, & *how* to execute it correctly through multi-step trajectories optimized for long-term return.
- This primer develops an end-to-end recipe for agentic RL, covering imitation learning warm-starts, structured action spaces, PPO/GRPO optimization, curriculum learning, & evaluation strategies.

🔹 Background: When SFT Fails (& Why RL Is Required) for Tool-Calling Agents
• What is Imitation Learning & Why SFT is Used Before RL?
🔹 Reward Components
• Tool Call (Deciding "When" a Tool Should be Called)
• Tool Selection (Choosing "Which" Tool to Call)
• Tool-Syntax Correctness (Deciding "How" to Call a Tool)
• Task Success
🔹 Process vs. Outcome Rewards
🔹 RL Optimization Pipeline: PPO, DPO, & GRPO
• RL Training Flow
• Losses & Update Rules
🔹 Curriculum Design for RL
🔹 RL Environments in Modern Agents: Single-Turn, Multi-Turn
🔹 RL for Computer-Use Agents
🔹 Agentic RL via Policy Optimization: Milestone-Based Rewards
🔹 Reward Modeling for Complex Agent Environments
🔹 Evaluation, Safety, & Human-in-the-Loop (HITL) Oversight
🔹 Tool-Integrated Reasoning (TIR)

Primer written in collaboration with @VinijaJain.
#ArtificialIntelligence #GenAI #LLM
[2 images]
4 replies · 51 reposts · 342 likes · 18.7K views
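To ground the "RL Foundations" bullets above, here is a minimal tabular Q-learning loop built around the Bellman update; it assumes a Gym-classic `reset()`/`step()` environment, one of the simulation interfaces the primer lists.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning on a Gym-classic reset()/step() environment."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done, _ = env.step(a)
            # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            target = r + gamma * Q[s2].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```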
Tai Nguyen 🇺🇲 reposted
Ronak Malde @rronak_
My favorite paper of 2026 so far 🔥

They took On-Policy Distillation (i.e., the Thinking Machines blog post), but then showed that the policy can be both the teacher and the student model. The idea is to condition the teacher on a golden trajectory, and then train on the conditioned logprobs of the same model.

The crazy part is, you can literally condition the teacher on anything!! This opens up an entire Pandora's box bridging prompt optimization/ICL + weight optimization that I'm very excited about for continual learning.

Authors: @IdanShenfeld @MehulDamani2 Jonas Hübotter @pulkitology
[image]
16 replies · 43 reposts · 424 likes · 34.2K views
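A schematic of the trick as described in the tweet, not the paper's actual training code: one model, two passes. The teacher pass sees the golden trajectory in context, the student pass does not, and the student is pulled toward the teacher's conditioned distribution. An HF-style causal LM with a `.logits` output is assumed.

```python
import torch
import torch.nn.functional as F

def self_distill_step(model, prompt_ids, golden_ids, optimizer):
    """One schematic step: the same model acts as teacher (conditioned on
    a golden trajectory) and as student (unconditioned)."""
    with torch.no_grad():  # teacher pass: golden trajectory prepended to context
        teacher_in = torch.cat([golden_ids, prompt_ids], dim=1)
        teacher_logits = model(teacher_in).logits[:, -prompt_ids.size(1):]
    student_logits = model(prompt_ids).logits     # student pass: prompt only
    # Pull the student toward the teacher's conditioned token distribution.
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```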
Tai Nguyen 🇺🇲 reposted
Kimbo @kimbochen
Some Chinese lab: releases new model
@vllm_project: Day zero support
@eliebakouch: 200-tweet thread explaining everything about the model, starting from the origins of the lab
@awnihannun: Retweets about MLX quantizing the model to 0.5 bit running at 2000 tok/s
@danielhanchen: Unsloth supports doing RL long context training in 0.025 bit on a potato
@Grad62304977: Shows up when you say RL three times in the mirror. Cites all the papers that the model used for RL training
@xeophon: Laments about a bug in eval harness
AI influencers: Hypes it up as if it's AGI
@teortaxesTex: Berates everyone with nuanced takes
24 replies · 40 reposts · 641 likes · 45.4K views
Tai Nguyen 🇺🇲 reposted
Zhijian Liu @zhijianliu_
Holiday cooking finally ready to serve! 🥳

Introducing DFlash — speculative decoding with block diffusion.
🚀 6.2× lossless speedup on Qwen3-8B
⚡ 2.5× faster than EAGLE-3

Diffusion vs AR doesn't have to be a fight. At today's stage:
• dLLMs = fast, highly parallel, but lossy
• AR LLMs = accurate, sequential, but slow

DFlash = diffusion drafts, AR verifies.
56 replies · 223 reposts · 1.7K likes · 167.9K views
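For context, here is the generic draft-and-verify skeleton that DFlash plugs a block-diffusion drafter into. This is the standard greedy-acceptance variant, not DFlash's actual algorithm; `draft_fn` and `verify_fn` are hypothetical callables.

```python
import torch

def speculative_step(draft_fn, verify_fn, ctx, k=8):
    """One round of draft-and-verify with greedy acceptance: draft k
    tokens cheaply, score them with the AR model in a single forward
    pass, keep the longest agreeing prefix."""
    draft = draft_fn(ctx, k)                      # (k,) proposed token ids
    full = torch.cat([ctx, draft])
    target = verify_fn(full)                      # AR argmax prediction per position
    accepted = []
    for i, tok in enumerate(draft.tolist()):
        # prediction for position len(ctx)+i lives at index len(ctx)+i-1
        if target[len(ctx) + i - 1].item() != tok:
            break
        accepted.append(tok)
    return accepted                               # caller appends these to ctx
```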
Tai Nguyen 🇺🇲 reposted
Elliot Arledge @elliotarledge
I made a minimalistic implementation of Minecraft that runs at ~50M steps/sec at batch size 32K
[image]
15 replies · 31 reposts · 304 likes · 22.3K views
Tai Nguyen 🇺🇲 reposted
Pirat_Nation 🔴 @Pirat_Nation
Newsis: Nvidia and AMD to significantly increase GPU prices starting next month. Describes RTX 5090 increasing from $2000 to $5000.

"Both companies are reportedly planning to continue raising GPU prices every month going forward. It's highly likely that the price increases will extend across their entire product lineup, encompassing not only consumer GPUs but also GPUs for AI data centers and servers."
[2 images]
2.4K replies · 3.7K reposts · 41.4K likes · 15.4M views
Tai Nguyen 🇺🇲 reposted
Lei Yang @diyerxx
Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment. So here's what happened.

Earlier this month, a colleague shared an Apple paper on arXiv with me — it was also under review for ICLR 2026. The benchmark they proposed was perfectly aligned with a project we're working on. I got excited after reading it.

I immediately stopped my current tasks and started adapting our model to their benchmark. Pulled a whole weekend crunch session to finish the integration… only to find our model scoring absurdly low.

I was really frustrated. I spent days debugging, checking everything — maybe I used it wrong, maybe there was a hidden bug. During this process, I actually found a critical bug in their official code:
* When querying the VLM, it only passed in the image path string, not the image content itself.

The most ridiculous part? After I fixed their bug, the model's scores got even lower! The results were so counterintuitive that I felt forced to do deeper validation. After multiple checks, the conclusion held: fixing the bug actually made the scores worse.

At this point I decided to manually inspect the data. I sampled the first 20 questions our model got wrong, and I was shocked:
* 6 out of 20 had clear GT errors.
* The pattern suggested the "ground truth" was model-generated with extremely poor quality control, leading to tons of hallucinations.
* Based on this quick sample, the GT error rate could be as high as 30%.

I reported the data quality issue in a GitHub issue. After 6 days, the authors replied briefly and then immediately closed the issue. That annoyed me — I'd already wasted a ton of time, and I didn't want others in the community to fall into the same trap — so I pushed back. Only then did they reopen the GitHub issue.

Then I went back and checked the examples displayed in the paper itself. Even there, I found at least three clear GT errors. It's hard to believe the authors were unaware of how bad the dataset quality was, especially when the paper claims all samples were reviewed by annotators. Yet even the examples printed in the paper contain blatant hallucinations and mistakes.

When the ICLR reviews came out, I checked the five reviews for this paper. Not a single reviewer noticed the GT quality issues or the hallucinations in the paper's examples. So I started preparing a more detailed GT error analysis and wrote a Public Comment on OpenReview to inform the reviewers and the community about the data quality problems.

The next day, the authors withdrew the paper and took down the GitHub repo.

Fortunately, ICLR is an open conference with Public Comments. If this had been a closed-review venue, this kind of shoddy work would have been much harder to expose.

So here's a small call to the community: for any paper involving model-assisted dataset construction, reviewers should spend a few minutes checking a few samples manually. We need to prevent irresponsible work from slipping through and misleading everyone.

Looking back, I should have suspected the dataset earlier based on two red flags:
* The paper's experiments claimed that GPT-5 had been surpassed by a bunch of small open-source models.
* The original code, with a ridiculous bug, produced higher scores than the bug-fixed version.

But because it was a paper from Big Tech, I subconsciously trusted the integrity and quality, which prevented me from spotting the problem sooner.

This whole experience drained a lot of my time, energy, and emotion — especially because accusing others of bad data requires extra caution. I'm sharing this in hopes that the ML community remains vigilant and pushes back against this kind of sloppy, low-quality, and irresponsible behavior before it misleads people and wastes collective effort.

#ICLR #ICLR2026 #NeurIPS #CVPR #openreview #MachineLearning #LLM
[image]
54 replies · 214 reposts · 2.5K likes · 395.9K views
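The bug class described above is worth spelling out, since it is easy to write and fails silently. A hypothetical sketch of the buggy call and the fix (`client.complete` is a placeholder API, not the repo's real interface):

```python
import base64

def query_vlm_buggy(client, question: str, image_path: str):
    # The failure mode described above: the *path string* is embedded in
    # the prompt, so the VLM never sees the pixels and can only guess
    # from the filename.
    return client.complete(prompt=f"{question}\nImage: {image_path}")

def query_vlm_fixed(client, question: str, image_path: str):
    # The fix: read and encode the actual image bytes and attach them.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return client.complete(prompt=question, images=[image_b64])
```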
Tai Nguyen 🇺🇲 reposted
Tongyi Lab @Ali_TongyiLab
1/10 We are pleased to introduce Z-Image, an efficient 6-billion-parameter foundation model for image generation. Through systematic optimization, it proves that top-tier performance is achievable without relying on enormous model sizes, delivering strong results in photorealistic generation and bilingual text rendering that are comparable to leading commercial models.
[2 images]
135 replies · 427 reposts · 3.7K likes · 7.8M views
Tai Nguyen 🇺🇲 reposted
Qwen @Alibaba_Qwen
🚀 Qwen3-VL Tech report is now out on arXiv!

From pretraining to post-training, architecture to infra, data to evaluation — we've packed in the details for anyone building on vision-language models.

🔥 3 models >1M downloads in just over a month
🏆 Qwen3-VL-8B leads with 2M+ downloads
📚 Built on the shoulders of Qwen2.5-VL (2800+ citations in <10 months!)

Check out the paper for insights, baselines, and future directions. Let's keep pushing VLMs forward — together.

arxiv.org/pdf/2511.21631
[image]
48 replies · 286 reposts · 1.7K likes · 200K views