angkyw @angkywilliam

115 posts

Joined June 2010
151 Following · 54 Followers
angkyw retweeted
Weights & Biases @wandb
Fine-tuning just got a whole lot easier. Serverless SFT is now in public preview on W&B! Managed infrastructure (powered by @CoreWeave) that auto-scales to your training workloads. No cluster setup. No idle GPU costs.
angkyw @angkywilliam
@Yuchenj_UW The EO is targeting H-1B recipients from outside the States, who are mostly IT consultants. It doesn't affect, and may even boost, the H-1B chances of US college grads.
Yuchen Jin @Yuchenj_UW
It's sad to hear President Trump raised the H-1B Visa fee from $1,000 to $100,000.
- New grads will struggle to get a job
- Startups can't afford global talent
- Top minds will go to immigration-friendly countries to study and work
US can't lose its biggest moat: global talent
angkyw @angkywilliam
@EgeErdil2 Well, it does say "without thinking"
Ege Erdil @EgeErdil2
this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century
[image]
angkyw retweeted
dr. jack morris @jxmnop
happy birthday to the USA, the greatest country, and the origin of the following innovations:
- Transformers
- Pre-training (web-scale next-token prediction)
- RLHF
- RLVR
- RL
- GPUs
- TPUs
- PyTorch
- word2vec
- reasoning models
- GANs
- diffusion models
- VLMs
- self-driving cars 🇺🇸
angkyw @angkywilliam
@jxmnop Interesting take. What's the tradeoff between SGLang and vLLM?
dr. jack morris @jxmnop
in 2025, if you want to become a successful AI engineer or researcher, you should NOT learn CUDA

furthermore, i'd guess that 80% of successful ML researchers have never written a CUDA kernel

practical ML is about training models and using them to make predictions. this has nothing to do with CUDA

CUDA is necessary in two cases:
(a) you are developing a radically new model that isn't easily expressible in PyTorch or Jax (e.g. Mamba)
(b) you are running into performance bottlenecks from current CUDA code and need to make it faster

i doubt that either case applies to you. chances are you aren't building the next Mamba, and the bottlenecks you'll run into in practice are different

you should work on finding the right data, or hardware, or setting things up properly, or distributing efficiently across hardware, or researching new efficient ways to run models that other people are working on (like vLLM and SGLang)

or, better than that, work on your eval pipeline. find ways to measure your model's performance that are more realistic, comprehensive, efficient, fair, etc.

TLDR: want to learn? spend your time tinkering with models in PyTorch and Jax, not writing matrix multiplications
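(In the spirit of that advice, here is a minimal, hypothetical sketch of the kind of PyTorch tinkering it points at: a complete train-a-model loop with zero hand-written kernels. The toy model and random data are made up for illustration.)

    import torch
    import torch.nn as nn

    # Toy classifier; all matmuls and kernels are PyTorch's problem, not ours.
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Random stand-in data; in practice, finding the right data IS the work.
    x = torch.randn(256, 32)
    y = torch.randint(0, 2, (256,))

    for step in range(100):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()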
angkyw @angkywilliam
@LBacaj Lol, you should start a small bet as a composer.
Louie Bacaj @LBacaj
I think there is a song about this:

“Well you only need the light when it's burnin low
Only miss the sun when it starts to snow
Only know you need software engineers when complexity explodes
But you let them go, why’d you let them go?”
angkyw @angkywilliam
@dvassallo Amazon will likely build an internal Cursor, dogfood it internally, and then release it to the public to compete with Cursor.
Daniel Vassallo @dvassallo
I’m curious, are the big techs like Amazon and Apple using Cursor or similar assistants? Are engineers sending big parts of their codebase to LLM APIs?
angkyw @angkywilliam
The steam engine moment for intelligence is coming fast.
angkyw @angkywilliam
Starting to see why top AI labs believe ASI is inevitable. Blending imitation learning (SFT) at different checkpoints with exploration learning (RL) can uncover new solutions to existing problems and also tackle entirely new unsolved problems.
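(As a schematic only: every function below is a stub standing in for real training code, not any lab's actual pipeline, but it shows the shape of the blend being described.)

    # Interleave imitation (SFT) and exploration (RL) phases, keeping
    # checkpoints so later phases can branch from earlier ones.

    def sft_phase(model, demonstrations):
        # Imitation learning: fit the model to known-good solutions (stub).
        return model

    def rl_phase(model, tasks):
        # Exploration: sample attempts, score with a verifier, reinforce (stub).
        return model

    def train(model, demonstrations, tasks, rounds=3):
        checkpoints = [model]
        for _ in range(rounds):
            model = sft_phase(model, demonstrations)
            checkpoints.append(model)        # branch point after imitation
            model = rl_phase(model, tasks)   # explore from the imitation prior
            checkpoints.append(model)
        return model, checkpoints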
angkyw @angkywilliam
@goldstein_aa @jxmnop @AlexIrpan @sea_snell Training RL from scratch is hard. DeepSeek's approach builds on a strong base model, similar to how college builds one's knowledge base before applying it to real-world problems.
dr. jack morris @jxmnop
a good blog post from 2018, "Deep RL Doesn't Work Yet", held true for seven years
[image]
angkyw retweeted
Ross Taylor @rosstaylor90
“Wait that can’t be right” in the wild.

Thank you internet anons for your service to LLM reasoning. We found you through RL eventually 🫡
[4 images]
angkyw @angkywilliam
Agents for workflows are deterministic tasks, similar to math and coding, with verifiable outputs.
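(Concretely, "verifiable output" can mean a programmatic check, like a unit test. A toy illustration, with a made-up invoice-extraction schema:)

    import json

    def verify_invoice_extraction(output: str) -> bool:
        # Hypothetical verifier for an agent step that extracts invoice fields.
        # Deterministic: valid JSON with the expected keys and types, or fail.
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return False
        return (
            isinstance(data.get("invoice_id"), str)
            and isinstance(data.get("total"), (int, float))
            and data["total"] >= 0
        )

    print(verify_invoice_extraction('{"invoice_id": "A-17", "total": 99.5}'))  # True
    print(verify_invoice_extraction("not json"))                               # False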
angkyw @angkywilliam
"Reasoning" model trade compute for data efficiency
angkyw @angkywilliam
A naive approach to training a "reasoning" model:
1. Fine-tune an instruct model on chains of thought
2. Use best-of-N to find chains of thought that give the right answer
3. Re-fine-tune the model on the chains of thought that yield the right answer
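(A minimal sketch of that recipe, which is essentially rejection-sampling fine-tuning in the STaR style. generate_cot, extract_answer, and finetune are hypothetical stubs for a real sampling and training stack.)

    def generate_cot(model, question, n=8):
        # Step 2: sample N chains of thought per question (stub sampler).
        return [f"chain-of-thought {i} for {question}" for i in range(n)]

    def extract_answer(cot):
        # Pull the final answer out of a chain of thought (stub).
        return "42"

    def finetune(model, examples):
        # Steps 1 and 3: supervised fine-tuning on (question, cot) pairs (stub).
        return model

    def naive_reasoning_training(model, dataset):
        # dataset: (question, gold_answer) pairs with verifiable answers.
        kept = []
        for question, gold in dataset:
            for cot in generate_cot(model, question):
                if extract_answer(cot) == gold:    # best-of-N filter:
                    kept.append((question, cot))   # keep only chains that
                    break                          # reach the right answer
        return finetune(model, kept)               # step 3: re-fine-tune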
angkyw @angkywilliam
@volokuleshov @brandondamos I wish I could attend the lecture! I am working on a custom chat template serializer for Llama, Mistral, and Qwen models.
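(A sketch of what such a serializer might look like. Only the ChatML-style template, which Qwen uses, is spelled out; the real Llama and Mistral templates should come from each model's own tokenizer via apply_chat_template, so they are left as comments here.)

    # Dispatch chat messages to a per-model-family template serializer.

    def to_chatml(messages):
        # messages: list of {"role": ..., "content": ...} dicts.
        out = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
        out.append("<|im_start|>assistant\n")  # generation prompt
        return "\n".join(out)

    SERIALIZERS = {
        "qwen": to_chatml,
        # "llama": ...,    # Llama 3 uses <|start_header_id|>-style headers
        # "mistral": ...,  # Mistral instruct uses [INST] ... [/INST] blocks
    }

    def serialize(model_family, messages):
        return SERIALIZERS[model_family](messages)

    print(serialize("qwen", [{"role": "user", "content": "hi"}]))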
Pakchikpak Raja Babu @HaramiParindey
Me *trying to figure out life at 32*
My juniors at 26:
[image]
angkyw @angkywilliam
Haven't found a class on the impact of datasets on model performance.