angkyw @angkywilliam

115 posts

Joined June 2010
151 Following · 54 Followers
angkyw retweeted
Weights & Biases @wandb
Fine-tuning just got a whole lot easier. Serverless SFT is now in public preview on W&B! Managed infrastructure (powered by @CoreWeave) that auto-scales to your training workloads. No cluster setup. No idle GPU costs.
angkyw @angkywilliam
@Yuchenj_UW The EO is targeting H-1B recipients from outside the States, who are mostly IT consultants. It doesn't affect, and may even boost, the H-1B chances of US college grads.
Yuchen Jin @Yuchenj_UW
It's sad to hear President Trump raised the H-1B Visa fee from $1,000 to $100,000.
- New grads will struggle to get a job
- Startups can't afford global talent
- Top minds will go to immigration-friendly countries to study and work
US can't lose its biggest moat: global talent
angkyw @angkywilliam
@EgeErdil2 Well, it does say "without thinking"
Ege Erdil @EgeErdil2
this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century
[image]
angkyw retweeted
dr. jack morris @jxmnop
happy birthday to the USA, the greatest country, and the origin of the following innovations:
- Transformers
- Pre-training (web-scale next-token prediction)
- RLHF
- RLVR
- RL
- GPUs
- TPUs
- PyTorch
- word2vec
- reasoning models
- GANs
- diffusion models
- VLMs
- self-driving cars 🇺🇸
angkyw @angkywilliam
@jxmnop Interesting take. What's the tradeoff between SGLang and vLLM?
dr. jack morris @jxmnop
in 2025, if you want to become a successful AI engineer or researcher, you should NOT learn CUDA

furthermore, i'd guess that 80% of successful ML researchers have never written a CUDA kernel

practical ML is about training models and using them to make predictions. this has nothing to do with CUDA

CUDA is necessary in two cases:
(a) you are developing a radically new model that isn't easily expressible in PyTorch or Jax (e.g. Mamba)
(b) you are running into performance bottlenecks from current CUDA code and need to make it faster

i doubt that either case applies to you. chances are you aren't building the next Mamba, and the bottlenecks you'll run into in practice are different

you should work on finding the right data, or hardware, or setting things up properly, or distributing efficiently across hardware, or researching new efficient ways to run models that other people are working on (like vLLM and SGLang)

or, better than that, work on your eval pipeline. find ways to measure your model's performance that are more realistic, comprehensive, efficient, fair, etc.

TLDR: want to learn? spend your time tinkering with models in PyTorch and Jax, not writing matrix multiplications
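(In the spirit of that advice, here is a minimal, hypothetical sketch of the kind of PyTorch tinkering it points at: a complete train-a-model loop with zero hand-written kernels. The toy model and random data are made up for illustration.)

    import torch
    import torch.nn as nn

    # Toy classifier; all matmuls and kernels are PyTorch's problem, not ours.
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Random stand-in data; in practice, finding the right data IS the work.
    x = torch.randn(256, 32)
    y = torch.randint(0, 2, (256,))

    for step in range(100):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()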
angkyw @angkywilliam
@LBacaj Lol, you should start a small bet as a composer.
Louie Bacaj @LBacaj
I think there is a song about this:

“Well you only need the light when it's burnin low
Only miss the sun when it starts to snow
Only know you need software engineers when complexity explodes
But you let them go, why’d you let them go?”
angkyw @angkywilliam
@dvassallo Amazon will likely build an internal Cursor, dogfood it internally, and then release it to the public to compete with Cursor.
Daniel Vassallo @dvassallo
I’m curious, are the big techs like Amazon and Apple using Cursor or similar assistants? Are engineers sending big parts of their codebase to LLM APIs?
angkyw @angkywilliam
The steam engine moment for intelligence is coming fast.
angkyw @angkywilliam
Starting to see why top AI labs believe ASI is inevitable. Blending imitation learning (SFT) at different checkpoints with exploration learning (RL) can uncover new solutions to existing problems and also tackle entirely new unsolved problems.
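(As a schematic only: every function below is a stub standing in for real training code, not any lab's actual pipeline, but it shows the shape of the blend being described.)

    # Interleave imitation (SFT) and exploration (RL) phases, keeping
    # checkpoints so later phases can branch from earlier ones.

    def sft_phase(model, demonstrations):
        # Imitation learning: fit the model to known-good solutions (stub).
        return model

    def rl_phase(model, tasks):
        # Exploration: sample attempts, score with a verifier, reinforce (stub).
        return model

    def train(model, demonstrations, tasks, rounds=3):
        checkpoints = [model]
        for _ in range(rounds):
            model = sft_phase(model, demonstrations)
            checkpoints.append(model)        # branch point after imitation
            model = rl_phase(model, tasks)   # explore from the imitation prior
            checkpoints.append(model)
        return model, checkpoints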
angkyw @angkywilliam
@goldstein_aa @jxmnop @AlexIrpan @sea_snell Training RL from scratch is hard. DeepSeek's approach builds on a strong base model, similar to how college builds one's knowledge base before applying it to real-world problems.
dr. jack morris @jxmnop
a good blog post from 2018, "Deep RL Doesn't Work Yet", held true for seven years
[image]
angkyw retweeted
Ross Taylor @rosstaylor90
“Wait that can’t be right” in the wild.

Thank you internet anons for your service to LLM reasoning. We found you through RL eventually 🫡
[4 images]
angkyw @angkywilliam
Agents for workflows are deterministic tasks, similar to math and coding, with verifiable outputs.
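(Concretely, "verifiable output" can mean a programmatic check, like a unit test. A toy illustration, with a made-up invoice-extraction schema:)

    import json

    def verify_invoice_extraction(output: str) -> bool:
        # Hypothetical verifier for an agent step that extracts invoice fields.
        # Deterministic: valid JSON with the expected keys and types, or fail.
        try:
            data = json.loads(output)
        except json.JSONDecodeError:
            return False
        return (
            isinstance(data.get("invoice_id"), str)
            and isinstance(data.get("total"), (int, float))
            and data["total"] >= 0
        )

    print(verify_invoice_extraction('{"invoice_id": "A-17", "total": 99.5}'))  # True
    print(verify_invoice_extraction("not json"))                               # False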
angkyw @angkywilliam
"Reasoning" model trade compute for data efficiency
angkyw @angkywilliam
A naive approach to training a "reasoning" model:
1. Fine-tune an instruct model on chains of thought
2. Use best-of-N to find chains of thought that give the right answer
3. Re-fine-tune the model on the chains of thought that yield the right answer
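(A minimal sketch of that recipe, which is essentially rejection-sampling fine-tuning in the STaR style. generate_cot, extract_answer, and finetune are hypothetical stubs for a real sampling and training stack.)

    def generate_cot(model, question, n=8):
        # Step 2: sample N chains of thought per question (stub sampler).
        return [f"chain-of-thought {i} for {question}" for i in range(n)]

    def extract_answer(cot):
        # Pull the final answer out of a chain of thought (stub).
        return "42"

    def finetune(model, examples):
        # Steps 1 and 3: supervised fine-tuning on (question, cot) pairs (stub).
        return model

    def naive_reasoning_training(model, dataset):
        # dataset: (question, gold_answer) pairs with verifiable answers.
        kept = []
        for question, gold in dataset:
            for cot in generate_cot(model, question):
                if extract_answer(cot) == gold:    # best-of-N filter:
                    kept.append((question, cot))   # keep only chains that
                    break                          # reach the right answer
        return finetune(model, kept)               # step 3: re-fine-tune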
angkyw @angkywilliam
@volokuleshov @brandondamos I wish I could attend the lecture! I am working on a custom chat template serializer for Llama, Mistral, and Qwen models.
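(A sketch of what such a serializer might look like. Only the ChatML-style template, which Qwen uses, is spelled out; the real Llama and Mistral templates should come from each model's own tokenizer via apply_chat_template, so they are left as comments here.)

    # Dispatch chat messages to a per-model-family template serializer.

    def to_chatml(messages):
        # messages: list of {"role": ..., "content": ...} dicts.
        out = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
        out.append("<|im_start|>assistant\n")  # generation prompt
        return "\n".join(out)

    SERIALIZERS = {
        "qwen": to_chatml,
        # "llama": ...,    # Llama 3 uses <|start_header_id|>-style headers
        # "mistral": ...,  # Mistral instruct uses [INST] ... [/INST] blocks
    }

    def serialize(model_family, messages):
        return SERIALIZERS[model_family](messages)

    print(serialize("qwen", [{"role": "user", "content": "hi"}]))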
Pakchikpak Raja Babu @HaramiParindey
Me *trying to figure out life at 32*
My juniors at 26:
[image]
angkyw @angkywilliam
Haven't found a class on the impact of datasets on model performance.