Trung Vu

330 posts

@trungthvu

San Francisco, CA · Joined May 2013

1.5K Following · 371 Followers

Trung Vu reposted

Richard Zhuang@RichardZ412·Feb 20

Terminal-Bench is a leading benchmark for agents. Unfortunately it’s hard: most small coding agents get very low scores on TB2, so training/system ablations look flat - you can't tell what's working. Announcing OpenThoughts-TBLite - 100 curated TB2-style tasks, difficulty-calibrated so even 8B models can make progress. It's designed to give researchers measurable signal during development, providing faster feedback for experimental iteration while closely tracking true TB2 performance🧵

Trung Vu@trungthvu·Feb 18

Very excited for this integration. Will make agentic RL much much easier.

Charlie Ruan@charlie_ruan

Releasing the official SkyRL + Harbor integration: a standardized way to train terminal-use agents with RL. From the creators of Terminal-Bench, Harbor is a widely adopted framework for evaluating terminal-use agents on any task expressible as a Dockerfile + instruction + test script. This integration extends it: the same tasks you evaluate on, you can now RL-train on. Blog: novasky-ai.notion.site/skyrl-harbor 🧵

Trung Vu@trungthvu·Jan 29

@bubbleboi neglecting consumer seems like an opening for a new co to compete (just classic disruption theory) i dont disagree w u in the short term but i think in the long run the equilibrium is there will always be supply to meet demand.

bubble boi@bubbleboi·Jan 29

@trungthvu Cost of capex isn’t free and who writes that down

bubble boi@bubbleboi·Jan 27

I’ve been thinking a lot about Clawd Bot & the race for Mac mini’s a bit over the past few days and I think I’ve come to a very scary realization that explains this crazy phenomenon. Put simply, building a gaming PC will be nearly impossible in the next 5 years… in fact it already is for the vast majority of consumers. But I will go one step farther—in the next 10 years having any type of personal computing device will be unattainable. Fab capacity will be allocated to its most productive and profitable use which is Cloud & AI data centers. Even today, most of the software you run already won’t work without an internet connection. But now with the opportunity cost being so high consumers will be shafted and will only have one option which is moving to the cloud. It’s looking increasingly likely the only hardware you will have is some terminal that connects to the cloud with no workloads running directly on your own hardware. Your device will just have the most basic single core processor and 4 GB of RAM at most.. This is what most people are missing with the fire escape race to acquire Mac Mini’s. The cost for these AI services aren’t just going up they will scale and capture the profitability of the services they provide the same way consulting and financial services extract rent from larger more productive corporations. The only way to protect yourself from the inevitable is to acquire as much hardware that can run inference as fast as fucking possible… Welcome to the computless class.

Trung Vu@trungthvu·Jan 19

@typedfemale moonwakecoffeeroasters.com btw. i have no stake just a stan.

Trung Vu@trungthvu·Jan 18

@typedfemale now try moonwake

typedfemale@typedfemale·Jan 11

i think this is some of the best coffee i've ever had... highly recommend it

Trung Vu@trungthvu·Jan 19

@xeophon u should make a compact bench

Xeophon@xeophon·Jan 18

Codex doesn’t, that thing just works after compacting. Honestly impressive

Xeophon@xeophon

A ton of scaffolds, including Claude Code, implement compaction wrong

Trung Vu@trungthvu·Dec 23

a good piece articulating the moral project that awaits us in the age of agi

opus131csharpminor@op131csharpminr

The article is here: x.com/op131csharpmin…

Trung Vu@trungthvu·Dec 23

@SkyLi0n the writeup that explains how this worked is also super nice

Aaron Gokaslan@SkyLi0n·Dec 22

Super exciting new Megatron-LM PR that finally allows performant tensor parallelism and A2A overlap at the same time: github.com/NVIDIA/Megatro… Now you can train MOEs quickly with high EP AND high TP for long context training.

Trung Vu@trungthvu·Dec 19

@RichardZ412 just need pretrain bench and then we'll have a full benchmark for self-improving AGI

Richard Zhuang@RichardZ412·Dec 19

Imagine we have models post-trained on how to post-train itself better🥲

Maksym Andriushchenko@maksym_andr

We release PostTrainBench: a benchmark measuring how well AI agents like Claude Code can post-train base LLMs. We expect this to be an important indicator for AI R&D automation as it unfolds over the next few years. 🔗 posttrainbench.com 📂 github.com/aisa-group/Pos… 1/n

Trung Vu@trungthvu·Dec 17

@cpaik how do (or should?) you escape these dynamics?

Chris Paik@cpaik·Dec 17

Paying $3 for a Dollar: The Rational Irrationality of Venture Capital docs.google.com/document/d/1S7…

Trung Vu@trungthvu·Dec 15

@natolambert the former have nothing to lose (for now) and the latter have everything to lose.

Nathan Lambert@natolambert·Dec 15

The Chinese labs are so good at giving off “happy to be here” vibes while the US companies are deep in the troughs of competition.

Nathan Lambert@natolambert

Open models year in review
What a year! We're back with an updated open model builder tier list, our top models of the year, and our predictions for 2026.
First, the winning models:
1. DeepSeek R1 (@deepseek_ai): Transformed the AI world
2. Qwen 3 Family (@AlibabaGroup): The new default open models
3. Kimi K2 Family (@Kimi_Moonshot): Models that convinced the world that DeepSeek wasn't special and China would produce numerous leading models.
Runner up models: MiniMax M2 (@minimax_ai), GLM 4.5 (@Zai_org), GPT-OSS (@OpenAI), Gemma 3 (@GoogleAI), Olmo 3 (@allen_ai)
Honorable Mentions: Nvidia's (@nvidia) Parakeet speech-to-text model & Nemotron 2 LLM, Moondream 3 VLM (@moondreamai), Granite 4 LLMs (@IBMResearch), and HuggingFace's (@huggingface) SmolLM3.
Updated Tier list:
Frontier open labs: DeepSeek (@deepseek_ai), Qwen (@AlibabaGroup), and Kimi Moonshot (@Kimi_Moonshot)
Close behind: Z.ai (@Zai_org) & MiniMax AI (@minimax_ai) (notably none from the U.S. here and up)
Noteworthy (a mix of US & China): StepFun AI (@StepFun_ai), Ant Group's (@AntGroup/ @TheInclusionAI Inclusion AI, Meituan (@Meituan_LongCat), Tencent (@TencentHunyuan), IBM (@IBMResearch), Nvidia (@nvidia), Google (@GoogleAI), & Mistral (@MistralAI)
Then a bunch more below that, which we detail.
Predictions for 2026:
1. Scaling will continue with open models.
2. No substantive changes in the open model safety narrative.
3. Participation will continue to grow.
4. Ongoing general trends will continue w/ MoEs, hybrid attention, dense for fine-tuning.
5. The open and closed frontier gap will stay roughly the same on any public benchmarks.
6. No Llama-branded open model releases from Meta in 2026.
Read the full post on @interconnectsai -- link below.

Trung Vu@trungthvu·Nov 22

@GergelyOrosz I thought early Uber was basically 996? :)

Gergely Orosz@GergelyOrosz·Nov 21

I struggle to name a single 996 company that produces something worth paying attention to that is not a copy or rehash of a nicer product launched elsewhere
Food for thought that you need not just hard work but inspiration + creativity to do standout work. Hard to do w no break

Karri Saarinen@karrisaarinen

Quality first. People should have a life outside of work. Place to enjoy life, develop their tastes, gather inspiration. When you feel better, your work is better. It naturally bleeds into what you make. fastcompany.com/91445544/the-1…

Trung Vu@trungthvu·Nov 22

for the model layer, OpenAI made a contrarian bet and were early to GPT-3. they truly innovated and went down a different path from everyone else. this gave them an early advantage. however once people saw that the recipe works, capital + talent quickly formed behind second movers and there has been rapid catchup since. and the whole model layer is now converging on the "faster execution" game :)

Trung Vu@trungthvu·Nov 22

I guess I was being too biased by thinking about this from competition at the model layer (OAI, Anthropic, etc). even at the app layer, it feels like the same dynamics occur for hot areas, e.g. coding IDEs where there is lots of competition. it seems inevitable that second movers will copy the innovative first movers, and then it quickly becomes a race to see who can execute faster. curious for your thoughts.

Karri Saarinen@karrisaarinen·Nov 21

Trung Vu@trungthvu·Nov 22

how much of this do you think is about picking the right problem space such that you can "outclass" your competition with taste / cleverness? i think a reason why 996 happens with the current AI wave is that the roadmap is mostly known and any secret sauce diffuses within +/- 3 months. in this regime a lot of staying ahead is just working 10-20% more hours than your competition and going down the same path 10-20% faster.

Cristina Cordova@cjc·Nov 21

People talk a lot about culture in startups, but most of the time they mean imitation. They look sideways at what other companies are doing and assume that must be the right answer. If everyone else is working 996, maybe that’s what greatness requires. If everyone else is hiring like crazy, maybe that’s the only way to keep up. But copying is a dangerous default.
The best companies I’ve seen weren’t great because they mimicked someone else. They were great because they decided to do things their own way and were willing to look wrong for a long time.
That’s what drew me to Linear. Before I joined, I saw a group of founders who were optimizing for quality over optics. They were asking the right question: What would this look like if we built it the way we actually believe is best? That’s rarer than it sounds.
People sometimes assume that if a company rejects hustle culture, it must not work hard. But this is a false dichotomy. I work hard because I want to, not because someone tells me to. I’m writing this on a plane. Last night I offered to take a call at 10pm to help close a candidate. After my kids go to bed, I’m often back online. The paradox is that when you enjoy the work, you don’t need to be coerced into doing it.
The mistake people make is assuming that long hours are the cause of great work. They’re not. Great work comes from clarity, taste, and autonomy. You need space for that. You can’t get it by grinding people down. At some point, the marginal hour stops helping.
Linear has been an experiment in doing things differently. Hiring slowly. Giving people ownership. Expecting craftsmanship. Maintaining enough slack in the system that people can think.

Trung Vu@trungthvu·Nov 18

@owl_posting why do they despair given they’re making $$$ hand over fist? lack of purpose?

owl@owl_posting·Nov 18

ramp is a very profitable company and all but what makes it particularly interesting is that it is the only ‘hot startup’ where multiple swe’s from it have reached out to me to express a sense of despair with their life path, and to ask how they’d switch to biotech

Trung Vu@trungthvu·Nov 2

@WillManidis that microphones in 2020 live video is a top tier find....

Will Manidis@WillManidis·Nov 2

things i've enjoyed recently:

Trung Vu@trungthvu·Oct 20

@gbrl_dick that's funny because the ceo of databricks is european 😂

Gabriel@gbrl_dick·Oct 20

say what you want about the european mindset, stripe has a cultural footprint that dwarfs databricks, and it's entirely because the Collisons are conducting themselves like wealthy florentines and the databricks guys are running the former-iron curtain midwest special

unusual_whales@unusual_whales

World's most valuable private companies, per MB:
1. OpenAI: $500 billion
2. SpaceX: $400 billion
3. ByteDance: $330 billion
4. Anthropic: $183 billion
5. xAI: $113 billion
6. Databricks: $100 billion
7. Stripe: $92 billion
8. Revolut: $75 billion
9. Shein: $66 billion
